Why doesn't my client reconnect to the SSM after SSM failover?

If VEMKD, SD, and SSM are running on the primary host during failover, the primary candidate takes over within two minutes and starts VEMKD, SD, and SSM within another two minutes to complete the SSM recovery. If the SSM is running on another management host when failover occurs, the SSM recovery should be complete within two minutes. If the client is unable to reconnect to the SSM after failover, the cause may be the settings of the client reconnection environment variables. Each of the following environment variables plays a role in the client reconnection process:
  • PLATCOMMDRV_TCP_KEEPALIVE_TIME
  • SOAM_RELOCATED_RECONNECTION_RETRY_LIMIT
  • SOAM_RELOCATED_RECONNECTION_RETRY_INTERVAL
  • SOAM_RECONNECTION_RETRY_LIMIT
  • SOAM_RECONNECTION_RETRY_INTERVAL
Verify that the environment variables are properly configured by checking that the result of the following calculation:
PLATCOMMDRV_TCP_KEEPALIVE_TIME
+ (SOAM_RELOCATED_RECONNECTION_RETRY_LIMIT * SOAM_RELOCATED_RECONNECTION_RETRY_INTERVAL)
+ (SOAM_RECONNECTION_RETRY_LIMIT * SOAM_RECONNECTION_RETRY_INTERVAL)
is greater than the failover time (for VEMKD, SD, and SSM).

Note that the failover time will be less than four minutes for an SSM running on a primary host, and less than two minutes for an SSM running on a management host.
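
The check described above can be sketched as a short Python script. This is only an illustration, not part of the product: the function names and the sample values are hypothetical, the retry intervals are assumed to be expressed in seconds like PLATCOMMDRV_TCP_KEEPALIVE_TIME, and the four SOAM_* variables are required to be set because this topic does not state their defaults.

```python
import os

# Default for PLATCOMMDRV_TCP_KEEPALIVE_TIME as stated in this topic (seconds).
KEEPALIVE_DEFAULT_SECONDS = 180

# Upper bounds on failover time given in this topic (seconds).
FAILOVER_PRIMARY_HOST = 4 * 60      # SSM running on the primary host
FAILOVER_MANAGEMENT_HOST = 2 * 60   # SSM running on another management host


def reconnection_window(env):
    """Return the total time (seconds) the client keeps trying to reconnect.

    `env` is any mapping of variable names to values, for example os.environ.
    Assumption: the retry intervals are in seconds.
    """
    keepalive = int(env.get("PLATCOMMDRV_TCP_KEEPALIVE_TIME",
                            KEEPALIVE_DEFAULT_SECONDS))
    relocated = (int(env["SOAM_RELOCATED_RECONNECTION_RETRY_LIMIT"])
                 * int(env["SOAM_RELOCATED_RECONNECTION_RETRY_INTERVAL"]))
    regular = (int(env["SOAM_RECONNECTION_RETRY_LIMIT"])
               * int(env["SOAM_RECONNECTION_RETRY_INTERVAL"]))
    return keepalive + relocated + regular


def outlasts_failover(env, failover_seconds):
    """True if the reconnection window is greater than the failover time."""
    return reconnection_window(env) > failover_seconds


# Hypothetical sample values, chosen only to demonstrate the arithmetic.
EXAMPLE_ENV = {
    "SOAM_RELOCATED_RECONNECTION_RETRY_LIMIT": "3",
    "SOAM_RELOCATED_RECONNECTION_RETRY_INTERVAL": "10",
    "SOAM_RECONNECTION_RETRY_LIMIT": "5",
    "SOAM_RECONNECTION_RETRY_INTERVAL": "20",
}

if __name__ == "__main__":
    # 180 (default keepalive) + 3 * 10 + 5 * 20 = 310 seconds
    print(reconnection_window(EXAMPLE_ENV))
    print(outlasts_failover(EXAMPLE_ENV, FAILOVER_PRIMARY_HOST))
```

To check the settings of a real client, pass os.environ instead of EXAMPLE_ENV from the shell in which the client is started. With the sample values, the window is 310 seconds, which is greater than the four-minute (240-second) bound for an SSM on the primary host.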

If the calculated time is less than the failover time, the client cannot reconnect to the SSM and exits once the calculated time has elapsed. The client detects that the connection is broken (it is notified by the operating system) only after the PLATCOMMDRV_TCP_KEEPALIVE_TIME interval (in seconds) has passed, and this detection triggers the reconnection process. Thus, when failover occurs, the client remains idle for the duration of the PLATCOMMDRV_TCP_KEEPALIVE_TIME interval, without receiving any new task output from the newly started SSM.

Note: If PLATCOMMDRV_TCP_KEEPALIVE_TIME is not configured, IBM® Spectrum Symphony will set this value to 180 seconds (that is, three minutes) by default.