Restart light fails on one host, but completes on another host

During a restart light, a member fails over to a guest host so that the recovery process can complete. Use the information in this topic to help you diagnose why a restart light fails on the first guest host but then completes successfully on a second guest host.

Symptoms

The following sample output from the db2instance -list command shows an environment with three members and two cluster caching facilities:
ID        TYPE         STATE                 HOME_HOST    CURRENT_HOST    ALERT   PARTITION_NUMBER        LOGICAL_PORT    NETNAME
--        ----         -----                 ---------    ------------    -----   ----------------        ------------    -------
0         MEMBER       WAITING_FOR_FAILBACK  hostA        hostC           YES                    0                   1    hostC-ib0
1         MEMBER       STARTED               hostB        hostB           NO                     0                   0    hostB-ib0
2         MEMBER       STARTED               hostC        hostC           NO                     0                   0    hostC-ib0
128       CF           PRIMARY               hostD        hostD           NO                     -                   0    hostD-ib0
129       CF           PEER                  hostE        hostE           NO                     -                   0    hostE-ib0

HOSTNAME    STATE      INSTANCE_STOPPED ALERT
--------    -----      ---------------- -----
hostA       INACTIVE   NO               YES
hostB       ACTIVE     NO               NO
hostC       ACTIVE     NO               NO
hostD       ACTIVE     NO               NO
hostE       ACTIVE     NO               NO
Member 0 experienced a problem with its home host, hostA, and attempted a restart light on hostB. However, the restart light failed on hostB. The member then attempted a restart light on hostC, which was successful.
If hostA becomes available again, its state changes from INACTIVE to ACTIVE. Member 0 then fails back to hostA, and the state of the member changes from WAITING_FOR_FAILBACK to STARTED, as the following sample output shows. The alert for member 0 remains set until you clear it:
ID        TYPE             STATE           HOME_HOST   CURRENT_HOST    ALERT   PARTITION_NUMBER        LOGICAL_PORT    NETNAME
--        ----             -----           ---------   ------------    -----   ----------------        ------------    -------
0         MEMBER           STARTED         hostA       hostA           YES                    0                   0    hostA-ib0
1         MEMBER           STARTED         hostB       hostB           NO                     0                   0    hostB-ib0
2         MEMBER           STARTED         hostC       hostC           NO                     0                   0    hostC-ib0
128       CF               PRIMARY         hostD       hostD           NO                     -                   0    hostD-ib0
129       CF               PEER            hostE       hostE           NO                     -                   0    hostE-ib0

HOSTNAME              STATE      INSTANCE_STOPPED ALERT
--------              -----      ---------------- -----
hostA                 ACTIVE     NO               NO
hostB                 ACTIVE     NO               NO
hostC                 ACTIVE     NO               NO
hostD                 ACTIVE     NO               NO
hostE                 ACTIVE     NO               NO
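
The symptom described above can also be spotted programmatically. The following Python sketch parses text in the layout of the first db2instance -list sample output and reports members whose CURRENT_HOST differs from their HOME_HOST, along with any rows that carry an alert. The function names (parse_members, members_on_guest_host) are hypothetical helpers, not part of any Db2 tooling, and the parser assumes the whitespace-separated column layout shown in this topic; treat it as an illustration rather than a supported interface.

```python
# Sample text copied from the first db2instance -list output in this topic.
SAMPLE = """\
ID        TYPE         STATE                 HOME_HOST    CURRENT_HOST    ALERT   PARTITION_NUMBER        LOGICAL_PORT    NETNAME
--        ----         -----                 ---------    ------------    -----   ----------------        ------------    -------
0         MEMBER       WAITING_FOR_FAILBACK  hostA        hostC           YES                    0                   1    hostC-ib0
1         MEMBER       STARTED               hostB        hostB           NO                     0                   0    hostB-ib0
2         MEMBER       STARTED               hostC        hostC           NO                     0                   0    hostC-ib0
128       CF           PRIMARY               hostD        hostD           NO                     -                   0    hostD-ib0
129       CF           PEER                  hostE        hostE           NO                     -                   0    hostE-ib0

HOSTNAME    STATE      INSTANCE_STOPPED ALERT
--------    -----      ---------------- -----
hostA       INACTIVE   NO               YES
hostB       ACTIVE     NO               NO
"""


def parse_members(text):
    """Return one dict per row of the member/CF table (hypothetical helper)."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None          # a blank line ends the current table
            continue
        if fields[0] == "ID":
            header = fields        # column names of the member/CF table
            continue
        if header is None:
            continue               # skip the host table and anything else
        if set(line.strip()) <= {"-", " "}:
            continue               # skip the dashed separator row
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def members_on_guest_host(rows):
    """Members that are currently running away from their home host."""
    return [r for r in rows
            if r["TYPE"] == "MEMBER" and r["HOME_HOST"] != r["CURRENT_HOST"]]


if __name__ == "__main__":
    rows = parse_members(SAMPLE)
    for r in members_on_guest_host(rows):
        print(f"member {r['ID']}: home={r['HOME_HOST']} "
              f"current={r['CURRENT_HOST']} state={r['STATE']}")
    print("rows with alerts:", [r["ID"] for r in rows if r["ALERT"] == "YES"])
```

Note that this output alone does not reveal that hostB was tried first; that detail comes from the db2diag log file, as described in the troubleshooting steps.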

Troubleshooting steps

To help troubleshoot the restart light failure on hostB, take one or more of the following steps:
  • Check the db2diag log file for information about the failure, and then investigate it.
    The following sample output shows the restart light attempt on hostB:
    2009-08-27-23.37.52.416270-240 I6733A457            LEVEL: Event
    PID     : 1093874              TID  : 1             KTID : 2461779
    PROC    : db2star2
    INSTANCE:                      NODE : 000
    HOSTNAME: hostB
    EDUID   : 1
    FUNCTION: Db2, base sys utilities, DB2StartMain, probe:3368
    MESSAGE : Idle process taken over by member
    DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
    996
    DATA #2 : Database Partition Number, PD_TYPE_NODE, 2 bytes
    0
    Check the db2diag messages to analyze the errors that correspond to the restart light failure on hostB.
  • See Diagnosing a host reboot with a restart light for steps to diagnose the host failure on hostA.
  • See Diagnosing a cluster file system failure that occurred during restart light for an example of how to troubleshoot this scenario.
  • After you diagnose the problem, clear the alert for the member.
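
When the db2diag log file is large, it can help to narrow it to the records written on the host where the restart light failed. The following Python sketch splits log text into records and keeps those whose HOSTNAME field matches a given host. It assumes that each record begins with a timestamp line in the format shown in the sample above; the helper names (split_records, summarize, records_for_host) are hypothetical, and the sample text is the record from this topic. It is an illustration only and does not replace the db2diag tool.

```python
import re

# A record is assumed to start with a timestamp such as
# 2009-08-27-23.37.52.416270-240 (format taken from the sample in this topic).
RECORD_START = re.compile(r"^\s*\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}")
FIELD = re.compile(r"^\s*(HOSTNAME|FUNCTION|MESSAGE)\s*:\s*(.*\S)\s*$")
LEVEL = re.compile(r"LEVEL:\s*(\S+)")

SAMPLE_LOG = """\
2009-08-27-23.37.52.416270-240 I6733A457            LEVEL: Event
PID     : 1093874              TID  : 1             KTID : 2461779
PROC    : db2star2
INSTANCE:                      NODE : 000
HOSTNAME: hostB
EDUID   : 1
FUNCTION: Db2, base sys utilities, DB2StartMain, probe:3368
MESSAGE : Idle process taken over by member
DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
996
DATA #2 : Database Partition Number, PD_TYPE_NODE, 2 bytes
0
"""


def split_records(text):
    """Group lines into records; each record starts at a timestamp line."""
    records = []
    for line in text.splitlines():
        if RECORD_START.match(line):
            records.append([line])
        elif records and line.strip():
            records[-1].append(line)
    return records


def summarize(record):
    """Extract a few fields of interest from one record (list of lines)."""
    info = {"TIMESTAMP": record[0].split()[0]}
    level = LEVEL.search(record[0])
    if level:
        info["LEVEL"] = level.group(1)
    for line in record[1:]:
        field = FIELD.match(line)
        if field:
            info[field.group(1)] = field.group(2)
    return info


def records_for_host(text, hostname):
    """Summaries of all records whose HOSTNAME field equals hostname."""
    summaries = (summarize(r) for r in split_records(text))
    return [s for s in summaries if s.get("HOSTNAME") == hostname]


if __name__ == "__main__":
    for rec in records_for_host(SAMPLE_LOG, "hostB"):
        print(rec["TIMESTAMP"], rec.get("LEVEL"), rec.get("MESSAGE"))
```

In practice you would read the text from the db2diag log file on hostB and then look through the matching records around the time of the failed restart light for error-level entries.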