
IC99534: RESTART OF SERVER WITH LARGE AMOUNTS OF ATTACHED STORAGE CAN CAUSE INVALID REPORTING THAT STANDBY SERVER IS DISCONNECTED

Fixes are available

Refresh Pack 5.2.2 (June 2014) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.5 (July 2014) for Tivoli Storage Productivity Center
Refresh Pack 5.2.3 (August 2014) for Tivoli Storage Productivity Center
Fix Pack 5.2.4 (November 2014) for Tivoli Storage Productivity Center
Fix Pack 5.2.4.1 (December 2014) for Tivoli Storage Productivity Center
Refresh Pack 5.2.5 (March 2015) for Tivoli Storage Productivity Center (withdrawn)
Fix Pack 5.1.1.6 (March 2015) for Tivoli Storage Productivity Center
Fix Pack 5.2.5.1 (April 2015) for Tivoli Storage Productivity Center (withdrawn)
Refresh Pack 5.2.6 (June 2015) for Tivoli Storage Productivity Center
Refresh Pack 5.2.7 (August 2015) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.9 (October 2015) for Tivoli Storage Productivity Center
IBM Spectrum Control V5.2.8 (December 2015)
IBM Spectrum Control V5.2.9 (February 2016)
IBM Spectrum Control V5.2.10 (May 2016)
IBM Spectrum Control V5.2.10.1 (July 2016)
IBM Spectrum Control V5.2.11 (August 2016)
Fix Pack 5.1.1.12 (October 2016) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.13 (February 2017) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.14 (June 2017) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.15 (September 2017) for Tivoli Storage Productivity Center
IBM Spectrum Control V5.2.12 (November 2016)
IBM Spectrum Control V5.2.13 (March 2017)
IBM Spectrum Control V5.2.14 (May 2017)
IBM Spectrum Control V5.2.15 (August 2017)
IBM Spectrum Control V5.2.15.2 (November 2017)
IBM Spectrum Control V5.2.16 (March 2018)
IBM Spectrum Control V5.2.17 (May 2018)
Fix Pack 5.1.1.8 (July 2015) for Tivoli Storage Productivity Center
Fix Pack 5.2.7.1 (February 2016) for Tivoli Storage Productivity Center


APAR status

  • Closed as program error.

Error description

  • When TPC-R is shut down, it does not properly remove all of the
    ESSNI configurations from the standby server. This creates a
    timing window: if the High Availability reconnect is issued
    before the hardware layer has finished initializing, a second
    thread creates a new connection to the hardware. When the
    hardware layer then finishes initializing, it cannot connect,
    because the second thread already holds a connection. The HMC
    connection on the standby shows as connected, but the clusters
    are marked as either "Authentication Failure" or "Controller
    Disconnected". This is only a reporting issue: on a Takeover
    command, the hardware client is initialized properly and the
    storage again shows as connected to the clusters.
    

Local fix

  • On the active server, either change the site location or update
    the password for the HMC connection. Either change drives an
    update that sets the status back to connected.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Any user with a large amount of attached storage using TPC-R *
    * High Availability                                            *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * Once the High Availability relationship has been             *
    * synchronized, both TPC-R servers show as connected to local  *
    * and remote storage. If the standby server is restarted, it   *
    * goes through a sequence of re-initializing its local         *
    * connections to the storage. This sequence takes longer the   *
    * more storage is attached. If this startup sequence has not   *
    * completed before the High Availability reconnect sequence    *
    * runs, two simultaneous connections are opened to the HMC     *
    * ESSNI client using the same GUID. The second connection is   *
    * refused by the server, and an Authentication Failure is      *
    * reported.                                                    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Wait until the Standby has completely connected to the       *
    * storage prior to issuing the High Availability reconnect     *
    * command.                                                     *
    ****************************************************************
    

Problem conclusion

  • The startup sequence now detects whether a second connection is
    being opened and closes the first one before making a new
    connection.
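The race and the fix described above can be illustrated with a minimal Python model; it is not TPC-R code. The classes `HmcServer`, `Session`, and `StandbyClient` and the `close_stale_first` flag are hypothetical names, and the model assumes, as the problem description states, that the server refuses a second open session that uses the same GUID. Calling `connect()` twice on one client stands in for the High Availability reconnect and the slower startup sequence each opening a connection.

```python
from __future__ import annotations

import threading


class AuthenticationFailure(Exception):
    """Raised when a GUID that already has an open session tries to open another."""


class Session:
    """Hypothetical stand-in for one ESSNI client session to the HMC."""

    def __init__(self, guid: str) -> None:
        self.guid = guid
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


class HmcServer:
    """Hypothetical HMC model: at most one open session per client GUID."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}
        self._lock = threading.Lock()  # serialize bookkeeping for concurrent connects

    def open_session(self, guid: str) -> Session:
        with self._lock:
            existing = self._sessions.get(guid)
            if existing is not None and existing.is_open:
                # A second connection with the same GUID is refused.
                raise AuthenticationFailure(f"GUID {guid} already has an open session")
            session = Session(guid)
            self._sessions[guid] = session
            return session


class StandbyClient:
    """Hypothetical standby-side client; close_stale_first models the fix."""

    def __init__(self, server: HmcServer, guid: str, close_stale_first: bool) -> None:
        self.server = server
        self.guid = guid
        self.close_stale_first = close_stale_first
        self.session: Session | None = None

    def connect(self) -> str:
        if self.close_stale_first and self.session is not None:
            # Fixed behaviour: close the earlier connection before opening a new one.
            self.session.close()
        try:
            self.session = self.server.open_session(self.guid)
            return "Connected"
        except AuthenticationFailure:
            return "Authentication Failure"


def demo(close_stale_first: bool) -> list[str]:
    server = HmcServer()
    client = StandbyClient(server, "standby-guid", close_stale_first)
    # First call: the HA reconnect. Second call: the slow startup sequence finishing.
    return [client.connect(), client.connect()]


print(demo(False))  # ['Connected', 'Authentication Failure']
print(demo(True))   # ['Connected', 'Connected']
```

With `close_stale_first=False` the second call is refused, matching the reported Authentication Failure; with `close_stale_first=True` the earlier session is closed first, so both calls report Connected.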
    

Temporary fix

  • If the standby server is reporting an authentication failure,
    the customer can try the following workarounds:
    1) Update the password for the HMC connection in the storage
    details panel.
    2) Change the site location on the active server (the site
    location can be changed back to the original value afterward).
    3) Prevent the startup procedure on the standby server from
    making a connection before the reconnect:
    (A) Stop the TPC-R server on the STANDBY ONLY and remove the
    csmdb directory:
    z/OS:
    -path_prefix- {/var | /opt}/Tivoli/RM/database/csmdb/
    Distributed systems:
    <TPC_HOME>/ewas/profiles/ReplicationServerProfile/database/csmdb
    <WAS_HOME>/
    (B) Restart the server. This leaves a clean standby server. Set
    the server as a standby and allow High Availability to
    synchronize.
    

Comments

APAR Information

  • APAR number

    IC99534

  • Reported component name

    TPC

  • Reported component ID

    5608TPC00

  • Reported release

    510

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-02-21

  • Closed date

    2014-03-07

  • Last modified date

    2014-03-07

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TPC

  • Fixed component ID

    5608TPC00

Applicable component levels

  • R511 PSY

       UP

  • R520 PSY

       UP


Document Information

Modified date:
23 March 2022