Checking restart status for members

If you know a member has failed, perhaps because of a power loss or other hardware problem that you have since corrected, you might want to know whether it has restarted successfully.

About this task

You can use the DB2_MEMBER administrative view to examine the operational status of all members in a Db2® pureScale® instance. You can also use the DB2_GET_INSTANCE_INFO table function, which provides options for querying specific hosts.

To check member restart status, follow the same process as shown in Example 1 of Viewing status information for members and cluster caching facilities in a Db2 pureScale instance. Specifically, write an SQL query against the DB2_MEMBER administrative view (or the DB2_GET_INSTANCE_INFO table function) that retrieves values for the following columns:
  • ID
  • HOME_HOST
  • CURRENT_HOST
  • STATE
  • ALERT
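
If you prefer the table function, a query along the following lines might be used to look at a single host. This is a sketch only: it assumes the five-argument form DB2_GET_INSTANCE_INFO(id, home_host, current_host, type, db_partition_num), where a null or empty argument means "no filter", and it uses HOSTA as a hypothetical home host name. Verify the signature against the documentation for your Db2 version before relying on it.

```sql
-- Sketch: status of the members whose home host is HOSTA (hypothetical name).
-- Assumed argument order: id, home_host, current_host, type, db_partition_num.
SELECT ID,
       varchar(STATE,21) AS STATE,
       varchar(HOME_HOST,10) AS HOME_HOST,
       varchar(CURRENT_HOST,10) AS CUR_HOST,
       ALERT
FROM TABLE(DB2_GET_INSTANCE_INFO(null,'HOSTA','','',null)) AS T
```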

Procedure

  1. Formulate the SQL query using whichever interface you prefer.
    This example uses the DB2_MEMBER administrative view:
    SELECT ID,  
           varchar(STATE,21) AS STATE, 
           varchar(HOME_HOST,10) AS HOME_HOST, 
           varchar(CURRENT_HOST,10) AS CUR_HOST, 
           ALERT 
    FROM SYSIBMADM.DB2_MEMBER
  2. Run the query.
    The results returned will look like the following:
    
    ID     STATE                 HOME_HOST  CUR_HOST   ALERT
    ------ --------------------- ---------- ---------- --------
         0 STARTED               HOSTA      HOSTA      NO
         1 STARTED               HOSTB      HOSTB      NO
         2 STARTED               HOSTC      HOSTC      NO
    
      3 record(s) selected.
    In the previous example, all members are running on their own hosts with no alerts.

Results

When looking at the restart status for the members, check that:
  • The value for the STATE column is either RESTARTING or STARTED. The former is an indication that the member is in the process of being restarted; the latter indicates that it has successfully restarted. If the state is RESTARTING, check the status again in a few minutes to see if the state has changed to STARTED.
  • The value for CUR_HOST is the same as the value for HOME_HOST. This indicates that the member is running on its home host.
  • There is no YES value in the ALERT column for the member you are interested in.
If CUR_HOST is different from HOME_HOST, if the state does not move beyond RESTARTING or remains WAITING_FOR_FAILBACK, or if there is a YES value in the ALERT column, then there might be a problem that requires further investigation.
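
The checks above can also be expressed as a filter, so that the query returns only the members that need attention. The following is a minimal sketch that uses the same DB2_MEMBER columns as the procedure; the WHERE clause is an illustration of the three checks, not a prescribed query.

```sql
-- Sketch: list only the members that fail at least one of the checks.
SELECT ID,
       varchar(STATE,21) AS STATE,
       varchar(HOME_HOST,10) AS HOME_HOST,
       varchar(CURRENT_HOST,10) AS CUR_HOST,
       ALERT
FROM SYSIBMADM.DB2_MEMBER
WHERE STATE <> 'STARTED'            -- includes RESTARTING and WAITING_FOR_FAILBACK
   OR HOME_HOST <> CURRENT_HOST     -- member is not running on its home host
   OR ALERT = 'YES'                 -- an alert is raised for the member
```

An empty result means that every member is started on its home host with no alert. A member in the RESTARTING state also appears in the result, which is expected while a restart is in progress.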

Example

Example 1: Failed member in the process of restarting
In this example, member 0 is in the process of restarting on its home host, HOSTA.
ID     STATE                 HOME_HOST  CUR_HOST   ALERT
------ --------------------- ---------- ---------- --------
     0 RESTARTING            HOSTA      HOSTA      NO
     1 STARTED               HOSTB      HOSTB      NO
     2 STARTED               HOSTC      HOSTC      NO

  3 record(s) selected.

To see if the restart is ultimately successful, run the query again in a few minutes.

Example 2: Failed member that is not able to restart
In this example, member 0 is waiting to fail back to its home host. Currently, it is running in light mode on HOSTB.
ID     STATE                 HOME_HOST  CUR_HOST   ALERT
------ --------------------- ---------- ---------- --------
     0 WAITING_FOR_FAILBACK  HOSTA      HOSTB      NO
     1 STARTED               HOSTB      HOSTB      NO
     2 STARTED               HOSTC      HOSTC      NO

  3 record(s) selected.
In this case, you might want to check the host status for HOSTA to see if there is an issue. Querying the DB2_CLUSTER_HOST_STATE administrative view might return the following results:
HOST       STATE    STOPPED ALERT
---------- -------- ------- --------
HOSTD      ACTIVE   NO      NO
HOSTB      ACTIVE   NO      NO
HOSTA      INACTIVE NO      YES
HOSTC      ACTIVE   NO      NO

  4 record(s) selected.
This report shows that there is an alert on HOSTA and that the host is inactive. However, the STOPPED value of NO indicates that the instance was not stopped using the db2stop command. Further investigation might reveal an incident such as a loss of power to this host. After the problem with the host is resolved, check the restart status again to see if the member is able to restart.
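
A host status report like the one in Example 2 might be produced by a query similar to the following sketch. The underlying column names HOSTNAME and INSTANCE_STOPPED are assumptions based on the headings in the report; confirm them against the DB2_CLUSTER_HOST_STATE view description for your Db2 version.

```sql
-- Sketch: host status for all hosts in the instance.
-- HOSTNAME and INSTANCE_STOPPED are assumed column names.
SELECT varchar(HOSTNAME,10) AS HOST,
       varchar(STATE,8) AS STATE,
       varchar(INSTANCE_STOPPED,7) AS STOPPED,
       ALERT
FROM SYSIBMADM.DB2_CLUSTER_HOST_STATE
```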