Restart light fails on one host, but completes on another host
During a restart light, a member fails over to a guest host so that the recovery process can complete. Use the information in this topic to diagnose why a restart light is unsuccessful on the first guest host but completes successfully on a second guest host.
Symptoms
The following sample output from the db2instance -list command shows an environment with three members and two cluster caching facilities:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostC YES 0 1 hostC-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
HOSTNAME STATE INSTANCE_STOPPED ALERT
-------- ----- ---------------- -----
hostA INACTIVE NO YES
hostB ACTIVE NO NO
hostC ACTIVE NO NO
hostD ACTIVE NO NO
hostE ACTIVE NO NO
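In the listing above, the symptom appears as a member whose CURRENT_HOST (hostC) differs from its HOME_HOST (hostA) and whose state is WAITING_FOR_FAILBACK. The following Python sketch picks such members out of saved db2instance -list output. It assumes the whitespace-separated column layout shown in the sample. The Member class and the functions parse_members and guest_members are hypothetical helpers and are not part of Db2.

```python
from dataclasses import dataclass


@dataclass
class Member:
    id: int
    type: str          # MEMBER or CF
    state: str
    home_host: str
    current_host: str
    alert: bool


def parse_members(text: str) -> list[Member]:
    """Parse the member/CF rows of db2instance -list output (assumed layout)."""
    members = []
    for line in text.splitlines():
        fields = line.split()
        # Member/CF rows have nine columns and a numeric ID. This skips the
        # header, the dashed separator lines, and the four-column host rows.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        members.append(Member(
            id=int(fields[0]),
            type=fields[1],
            state=fields[2],
            home_host=fields[3],
            current_host=fields[4],
            alert=(fields[5] == "YES"),
        ))
    return members


def guest_members(members: list[Member]) -> list[Member]:
    """Return members that are running on a host other than their home host."""
    return [m for m in members
            if m.type == "MEMBER" and m.current_host != m.home_host]


# Sample output copied from the first listing in this topic.
SAMPLE = """\
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostC YES 0 1 hostC-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
HOSTNAME STATE INSTANCE_STOPPED ALERT
-------- ----- ---------------- -----
hostA INACTIVE NO YES
hostB ACTIVE NO NO
hostC ACTIVE NO NO
hostD ACTIVE NO NO
hostE ACTIVE NO NO
"""

if __name__ == "__main__":
    for m in guest_members(parse_members(SAMPLE)):
        # Prints: 0 WAITING_FOR_FAILBACK hostA -> hostC alert=True
        print(m.id, m.state, m.home_host, "->", m.current_host, f"alert={m.alert}")
```

Real output can pad columns with extra spaces. Splitting on any whitespace tolerates that, but a different column order would require changes to the field indexes.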
Member 0 experienced a problem with its home host, hostA, and attempted a restart light on hostB. However, the restart light failed on hostB. The member then attempted a restart light on hostC, which was successful.
If hostA becomes available again, its state changes from INACTIVE to ACTIVE. Member 0 then fails back to hostA, and the state of the member changes from WAITING_FOR_FAILBACK to STARTED.
The following sample output shows the instance after member 0 fails back to hostA. The ALERT column for member 0 still shows YES:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER STARTED hostA hostA YES 0 0 hostA-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
HOSTNAME STATE INSTANCE_STOPPED ALERT
-------- ----- ---------------- -----
hostA ACTIVE NO NO
hostB ACTIVE NO NO
hostC ACTIVE NO NO
hostD ACTIVE NO NO
hostE ACTIVE NO NO
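The failback behavior described for member 0 amounts to a single state transition. While the home host is INACTIVE, the member stays in WAITING_FOR_FAILBACK on its guest host. When the home host becomes ACTIVE, the member returns to it and becomes STARTED. The following Python sketch encodes only that transition. MemberView and expected_after_host_change are hypothetical names, and the model ignores every other Db2 member state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemberView:
    state: str
    home_host: str
    current_host: str


def expected_after_host_change(member: MemberView,
                               host_states: dict[str, str]) -> MemberView:
    """Simplified model of the failback described in this topic.

    Only the WAITING_FOR_FAILBACK -> STARTED transition is modeled. Any other
    combination is returned unchanged.
    """
    if (member.state == "WAITING_FOR_FAILBACK"
            and host_states.get(member.home_host) == "ACTIVE"):
        # The member leaves the guest host and runs on its home host again.
        return replace(member, state="STARTED", current_host=member.home_host)
    return member


if __name__ == "__main__":
    m0 = MemberView("WAITING_FOR_FAILBACK", home_host="hostA", current_host="hostC")
    # Prints: WAITING_FOR_FAILBACK
    print(expected_after_host_change(m0, {"hostA": "INACTIVE"}).state)
    # Prints: MemberView(state='STARTED', home_host='hostA', current_host='hostA')
    print(expected_after_host_change(m0, {"hostA": "ACTIVE"}))
```

The dataclass is frozen, and the function returns a new value rather than mutating its input. This makes it easy to compare the state before and after failback, as the two listings in this topic do.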
Troubleshooting steps
To help troubleshoot the restart light failure on hostB, take one or more of the following steps:
- Check the db2diag log file for information about the failure, and then investigate it. Analyze the diagnostic messages that correspond to the restart light failure on hostB. The following sample output shows the restart light attempt on hostB:
  2009-08-27-23.37.52.416270-240 I6733A457          LEVEL: Event
  PID     : 1093874              TID  : 1           KTID : 2461779
  PROC    : db2star2
  INSTANCE:                      NODE : 000
  HOSTNAME: hostB
  EDUID   : 1
  FUNCTION: Db2, base sys utilities, DB2StartMain, probe:3368
  MESSAGE : Idle process taken over by member
  DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
  996
  DATA #2 : Database Partition Number, PD_TYPE_NODE, 2 bytes
  0
- See Diagnosing a host reboot with a restart light for steps to diagnose the host failure on hostA.
- See Diagnosing a cluster file system failure that occurred during restart light for an example of how to troubleshoot this scenario.
- After you diagnose the problem, clear the alert for the member.
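When the diagnostic log is large, it helps to narrow it to the records written on the host where restart light failed, which is hostB in this example. The db2diag command-line tool has its own filtering options, and these are generally the better choice. The following Python sketch only illustrates the idea. It assumes that records are separated by blank lines and are laid out like the sample record in the first troubleshooting step. parse_records and records_for_host are hypothetical helper names.

```python
import re

# Fields extracted by this sketch when they start a line (assumed layout).
LINE_FIELD = re.compile(r"^(PROC|HOSTNAME|FUNCTION|MESSAGE)[ \t]*:[ \t]*(.*)$",
                        re.MULTILINE)
LEVEL = re.compile(r"LEVEL:[ \t]*(\S+)")


def parse_records(log_text: str) -> list[dict[str, str]]:
    """Split db2diag-style text into records on blank lines and extract a few fields."""
    records = []
    for chunk in re.split(r"\n[ \t]*\n", log_text.strip()):
        lines = chunk.splitlines()
        if not lines:
            continue
        # The first line starts with the record timestamp and ends with LEVEL.
        record = {"TIMESTAMP": lines[0].split()[0]}
        level = LEVEL.search(lines[0])
        if level:
            record["LEVEL"] = level.group(1)
        for name, value in LINE_FIELD.findall(chunk):
            record[name] = value.strip()
        records.append(record)
    return records


def records_for_host(records: list[dict[str, str]], hostname: str) -> list[dict[str, str]]:
    """Keep only the records whose HOSTNAME field matches the given host."""
    return [r for r in records if r.get("HOSTNAME") == hostname]


# The sample record from this topic.
SAMPLE_LOG = """\
2009-08-27-23.37.52.416270-240 I6733A457          LEVEL: Event
PID     : 1093874              TID  : 1           KTID : 2461779
PROC    : db2star2
INSTANCE:                      NODE : 000
HOSTNAME: hostB
EDUID   : 1
FUNCTION: Db2, base sys utilities, DB2StartMain, probe:3368
MESSAGE : Idle process taken over by member
DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
996
DATA #2 : Database Partition Number, PD_TYPE_NODE, 2 bytes
0
"""

if __name__ == "__main__":
    for r in records_for_host(parse_records(SAMPLE_LOG), "hostB"):
        print(r["TIMESTAMP"], r["LEVEL"], r["FUNCTION"], "|", r["MESSAGE"])
```

The sketch deliberately ignores the multi-field PID line and the DATA sections. Extend LINE_FIELD if you need more fields.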