Interpretation of status information

When you query hosts, members or cluster caching facilities for status information, the system presents state and alert information that tells you about the status of the various components in your Db2® pureScale® environment. When problems arise, you generally need to examine both states and alerts to understand what is happening in the system.

The state of a host, member, or cluster caching facility (also known as a CF) reflects its operational status. When everything is operating normally, the states reported for hosts, members, and CFs give you a general idea of the status of your system. A state such as RESTARTING or WAITING_FOR_FAILBACK on a member does not, by itself, indicate a problem. There are several valid reasons why a member might fail over to another host or restart on its home host; for example, hosts might have been taken offline for maintenance. However, if a member fails over frequently and repeatedly, there might be a problem that warrants further investigation.
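One way to act on the advice about repeated failovers is to poll member states periodically and count how often each member enters the RESTARTING state within a time window. The following Python sketch assumes that you already collect (timestamp, member ID, STATE) snapshots yourself, for example by periodically running a query against the DB2_MEMBER administrative view. The snapshot format, the function name frequent_restarters, and the default window and threshold are hypothetical and are not part of any Db2 interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical snapshot format: (epoch_seconds, member_id, STATE value),
# for example collected by periodically querying SYSIBMADM.DB2_MEMBER.
Snapshot = Tuple[float, int, str]


def frequent_restarters(snapshots: Iterable[Snapshot],
                        window_seconds: float = 3600.0,
                        threshold: int = 3) -> Set[int]:
    """Return IDs of members that entered RESTARTING at least `threshold`
    times within some `window_seconds` interval."""
    entries: Dict[int, List[float]] = defaultdict(list)
    previous: Dict[int, str] = {}
    for ts, member, state in sorted(snapshots):
        # Count only transitions into RESTARTING, not repeated observations.
        if state == "RESTARTING" and previous.get(member) != "RESTARTING":
            entries[member].append(ts)
        previous[member] = state

    flagged: Set[int] = set()
    for member, times in entries.items():
        start = 0
        for end in range(len(times)):
            # Move the window start forward until the window is short enough.
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(member)
                break
    return flagged


# Example: member 0 restarts three times in under a minute; member 1 once.
snapshots = [
    (0, 0, "STARTED"), (10, 0, "RESTARTING"), (20, 0, "STARTED"),
    (30, 0, "RESTARTING"), (40, 0, "STARTED"), (50, 0, "RESTARTING"),
    (0, 1, "STARTED"), (10, 1, "RESTARTING"), (20, 1, "STARTED"),
]
print(frequent_restarters(snapshots))  # {0}
```

Counting transitions rather than raw observations keeps a single long restart, seen across several polls, from being counted as several failovers.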

An alert for a host, member or CF is an indication that a problem exists that might require investigation or intervention. Looking at alerts in the context of the state of a given system component can reveal additional information about the source of the problem. The sections that follow outline the various combinations of state and alert information that you might encounter for hosts, members or cluster caching facilities, and how to interpret different combinations of states and alerts.

Remember: The completeness of state and alert information returned by the interfaces that report on this information depends on the following factors:
  • The type of instance in which the table function, administrative view, or command is being run (for example, Db2 pureScale instances or other database instances)
  • Whether a supported cluster manager is employed in that instance. All Db2 pureScale Feature deployments use a cluster manager.
See Differences in reporting for data-sharing and environments other than Db2 pureScale environments for details.

Host status

You can view information about the hosts in a Db2 pureScale environment using a number of different interfaces. One such interface is the DB2_CLUSTER_HOST_STATE administrative view. For example, consider this SQL query:
SELECT varchar(HOSTNAME,10) AS HOST,
       varchar(STATE,8) AS STATE, 
       varchar(INSTANCE_STOPPED,7) AS STOPPED, 
       ALERT  
FROM SYSIBMADM.DB2_CLUSTER_HOST_STATE
Running the preceding query returns output similar to the following:

HOST       STATE    STOPPED ALERT
---------- -------- ------- --------
HOSTD      ACTIVE   NO      NO
HOSTB      ACTIVE   NO      NO
HOSTA      ACTIVE   YES     NO
HOSTC      ACTIVE   NO      NO

  4 record(s) selected.

(In the preceding example, the STOPPED column corresponds to the INSTANCE_STOPPED column returned by the administrative view.)
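If you capture this output as text (for example, from the command line processor) rather than reading it through a database API, a small parser can turn it into records for further processing. The following Python sketch assumes the fixed-width layout shown above, in which the dashed separator line defines the column widths. The function name parse_fixed_width is hypothetical, and the sketch is not a general parser for every output format.

```python
from typing import Dict, List


def parse_fixed_width(output: str) -> List[Dict[str, str]]:
    """Parse fixed-width query output into one dict per row.

    Assumes a header line, a separator line of dashes, data rows, and an
    optional trailing "n record(s) selected." line.
    """
    lines = [ln.rstrip() for ln in output.splitlines() if ln.strip()]
    header, separator, *body = lines

    # Derive column boundaries from the runs of dashes in the separator line.
    spans = []
    pos = 0
    for dashes in separator.split(" "):
        if dashes:
            spans.append((pos, pos + len(dashes)))
        pos += len(dashes) + 1

    names = [header[start:end].strip() for start, end in spans]
    rows = []
    for line in body:
        if line.strip().endswith("record(s) selected."):
            continue
        values = [line[start:end].strip() for start, end in spans[:-1]]
        # Let the last column run to the end of the line.
        values.append(line[spans[-1][0]:].strip())
        rows.append(dict(zip(names, values)))
    return rows


SAMPLE = """\
HOST       STATE    STOPPED ALERT
---------- -------- ------- --------
HOSTD      ACTIVE   NO      NO
HOSTB      ACTIVE   NO      NO
HOSTA      ACTIVE   YES     NO
HOSTC      ACTIVE   NO      NO

  4 record(s) selected.
"""

for row in parse_fixed_width(SAMPLE):
    print(row["HOST"], row["STOPPED"])
```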

The STATE, INSTANCE_STOPPED, and ALERT columns can take different values, depending on the conditions at any given time. The possible combinations are summarized in Table 1.
Table 1. Combinations of state, instance_stopped, and alerts possible on a host system in a Db2 pureScale instance
STATE     INSTANCE_STOPPED  ALERT  Description
ACTIVE    NO                NO     The host is active and operating normally.
ACTIVE    NO                YES    The host is active (that is, it responds to system commands); however, a problem might be preventing it from participating in the Db2 pureScale instance. For example, there might be a file system problem or a network communication issue, or the idle processes that the Db2 pureScale Feature requires for performing failovers might not be running.
ACTIVE    YES               NO     The host is active. The administrator stopped the instance explicitly on this host by using the db2stop instance on hostname command.
ACTIVE    YES               YES    The host is active; however, an alert that has not been cleared exists for the host. The administrator stopped the instance explicitly.
INACTIVE  NO                NO     Not applicable. A host cannot be INACTIVE when both INSTANCE_STOPPED and ALERT are set to NO.
INACTIVE  NO                YES    The host is not responding to system commands. The administrator did not stop the instance explicitly, but there is an alert. This combination indicates the abnormal shutdown of a host, which might arise, for example, from a power failure.
INACTIVE  YES               NO     Normal state when the administrator has stopped the instance. This combination might arise, for example, when the host is taken offline to install software updates.
INACTIVE  YES               YES    The host is not responding to system commands. An alert that has not been cleared exists for the host, but the administrator stopped the instance explicitly (that is, the system did not shut down abnormally).
Tip: You can see details about alerts using the DB2_INSTANCE_ALERTS administrative view. See Viewing details for an alert for an example.
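To apply Table 1 in a monitoring script, you can encode it as a lookup keyed by the three column values. The following Python sketch is illustrative only. The dictionary HOST_STATUS, the functions interpret_host and host_needs_attention, and the shortened descriptions are hypothetical paraphrases of Table 1, not part of any Db2 interface. The attention rule is an assumption: it flags any host with an alert, plus any INACTIVE host whose instance was not stopped by the administrator.

```python
from typing import Dict, Tuple

# Hypothetical lookup table paraphrasing Table 1,
# keyed by (STATE, INSTANCE_STOPPED, ALERT).
HOST_STATUS: Dict[Tuple[str, str, str], str] = {
    ("ACTIVE", "NO", "NO"): "Operating normally.",
    ("ACTIVE", "NO", "YES"): "Responds to commands but might not be participating in the instance.",
    ("ACTIVE", "YES", "NO"): "Instance stopped on this host by the administrator.",
    ("ACTIVE", "YES", "YES"): "Instance stopped by the administrator; uncleared alert.",
    ("INACTIVE", "NO", "NO"): "Not applicable; this combination should not occur.",
    ("INACTIVE", "NO", "YES"): "Abnormal shutdown, for example a power failure.",
    ("INACTIVE", "YES", "NO"): "Instance stopped by the administrator, for example for maintenance.",
    ("INACTIVE", "YES", "YES"): "Not responding; instance stopped by the administrator; uncleared alert.",
}


def _normalize(*values: str) -> Tuple[str, ...]:
    # Tolerate padding and lowercase input from different interfaces.
    return tuple(v.strip().upper() for v in values)


def interpret_host(state: str, instance_stopped: str, alert: str) -> str:
    key = _normalize(state, instance_stopped, alert)
    return HOST_STATUS.get(key, "Unrecognized combination.")


def host_needs_attention(state: str, instance_stopped: str, alert: str) -> bool:
    """Flag hosts with an alert, or hosts that are INACTIVE although the
    administrator did not stop the instance."""
    st, stopped, al = _normalize(state, instance_stopped, alert)
    return al == "YES" or (st == "INACTIVE" and stopped == "NO")


print(interpret_host("inactive", "no", "yes"))
```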

Member status

You can view member states and alerts by using several different interfaces. One such interface is the DB2_MEMBER administrative view, which shows status information for the members in a Db2 pureScale instance. The following query uses this administrative view to retrieve member status:
SELECT ID,  
       varchar(STATE,21) AS STATE, 
       varchar(HOME_HOST,10) AS HOME_HOST, 
       varchar(CURRENT_HOST,10) AS CUR_HOST, 
       ALERT 
FROM SYSIBMADM.DB2_MEMBER
The STATE and ALERT columns can take different values, depending on the conditions at any given time. The possible combinations are summarized in Table 2.
Table 2. Combinations of states and alerts possible for members in a Db2 pureScale instance
STATE                 ALERT  Description
STARTED               NO     The member is started in the instance and is operating normally.
STARTED               YES    The member is started in the instance. However, at some point, an attempt to fail over to another host was unsuccessful. Since that attempt, the member has either failed over successfully to another host or failed back to its home host. If the member is running on its home host, it is running normally; if it is running on a guest host, it is running in light mode. In either case, investigate the alert to determine what happened.
STOPPED               NO     The administrator stopped the member by using the db2stop command.
STOPPED               YES    The administrator stopped the member by using the db2stop command; however, the alert has not yet been cleared.
RESTARTING            NO     The member is starting.
RESTARTING            YES    The member is starting. However, at some point, an attempt to start the member on its home host or to fail over to another host was unsuccessful. The alert has not yet been cleared.
WAITING_FOR_FAILBACK  NO     The member is running in light mode on a guest host and is waiting to fail back to its home host. You might want to examine the status of the home host to see whether anything is preventing the member from failing back (for example, a failed network adapter).
WAITING_FOR_FAILBACK  YES    An attempt to restart the member on its home host might have failed, automatic failback might be disabled, or crash recovery might have failed. You must resolve the problem and clear the alert manually before the member can automatically fail back to its home host. If automatic failback is disabled, clear the alert manually and enable automatic failback by using the db2cluster command.
ERROR                 YES    Db2 cluster services could not start the member on any host. You must resolve the problem and clear the alert manually before you attempt to restart the instance.
Tip: You can see details about alerts using the DB2_INSTANCE_ALERTS administrative view. See Viewing details for an alert for an example.
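Table 2 can be encoded the same way, keyed by STATE and ALERT. The sketch also compares the HOME_HOST and CURRENT_HOST columns from the preceding DB2_MEMBER query, because the descriptions depend on whether the member is running on its home host or on a guest host. The dictionary MEMBER_STATUS, the function interpret_member, and the shortened descriptions are hypothetical paraphrases, not a Db2 API.

```python
from typing import Dict, Tuple

# Hypothetical lookup table paraphrasing Table 2, keyed by (STATE, ALERT).
MEMBER_STATUS: Dict[Tuple[str, str], str] = {
    ("STARTED", "NO"): "Started and operating normally.",
    ("STARTED", "YES"): "Started, but an earlier failover attempt failed.",
    ("STOPPED", "NO"): "Stopped by the administrator (db2stop).",
    ("STOPPED", "YES"): "Stopped by the administrator; alert not yet cleared.",
    ("RESTARTING", "NO"): "Starting.",
    ("RESTARTING", "YES"): "Starting; an earlier start or failover attempt failed.",
    ("WAITING_FOR_FAILBACK", "NO"): "Light mode on a guest host, waiting to fail back.",
    ("WAITING_FOR_FAILBACK", "YES"): "Cannot fail back until the problem is fixed and the alert cleared.",
    ("ERROR", "YES"): "Could not start on any host; fix the problem and clear the alert.",
}


def interpret_member(state: str, alert: str, home_host: str, current_host: str) -> str:
    key = (state.strip().upper(), alert.strip().upper())
    text = MEMBER_STATUS.get(key, "Unrecognized combination.")
    # HOME_HOST and CURRENT_HOST differ when the member runs on a guest host.
    if home_host.strip() != current_host.strip():
        text += f" (home host {home_host.strip()}, currently on {current_host.strip()})"
    return text


print(interpret_member("STARTED", "YES", "hostA", "hostB"))
```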

Cluster caching facility status

The DB2_GET_INSTANCE_INFO table function lets you retrieve status information for members and CFs in a Db2 pureScale instance. One benefit of the table function is that you can pass parameters to it to narrow the scope of the results that it returns. For example, to retrieve information about only the CFs in a Db2 pureScale instance, you can construct a query such as the following:
SELECT ID,
       varchar(STATE,17) AS STATE,
       varchar(HOME_HOST,10) AS HOME_HOST,
       varchar(CURRENT_HOST,10) AS CUR_HOST,
       ALERT 
FROM TABLE(DB2_GET_INSTANCE_INFO(NULL,'','','CF',NULL))
The STATE and ALERT columns can take different values, depending on the conditions at any given time. The possible combinations are summarized in Table 3.
Table 3. Combinations of states and alerts possible for cluster caching facilities in a Db2 pureScale instance
STATE             ALERT  Description
STOPPED           NO     The administrator stopped the CF manually by using the db2stop command.
STOPPED           YES    An attempt by the CF to become the primary CF was unsuccessful, and the administrator has stopped the CF manually by using the db2stop command.
RESTARTING        NO     The CF is restarting, either as a result of the db2start command or after a primary CF failure.
RESTARTING        YES    The CF is restarting; however, there is a pending alert from a previous failed attempt by the CF to take on the primary role. The alert must be cleared manually.
BECOMING_PRIMARY  NO     The CF will take on the role of primary CF if no other primary CF is already running in the instance.
BECOMING_PRIMARY  YES    Not applicable. The CF cannot attempt to take on the primary role while an alert condition is set.
PRIMARY           NO     The CF has taken on the role of primary CF and is operating normally.
PRIMARY           YES    Not applicable. The CF cannot act as the primary CF while an alert condition is set.
CATCHUP(n%)       NO     This secondary CF is copying the information from the primary CF that it requires to operate in PEER mode.
                         Note: When you view the status of the secondary CF by using the db2instance -list command, the CF remains in CATCHUP state until a connection is made to the database. After the first connection is made, the CF begins copying data from the primary CF.
CATCHUP(n%)       YES    This secondary CF is copying the information from the primary CF that it requires to operate in PEER mode. There is a pending alert from a previous failed attempt by this CF to take on the primary role. The alert must be cleared manually.
PEER              NO     This secondary CF is ready to assume the role of primary CF if the current primary CF fails.
PEER              YES    This secondary CF is ready to assume the role of primary CF if the current primary CF fails. There is a pending alert from a previous failed attempt by this CF to take on the primary role. The alert must be cleared manually.
ERROR             YES    The CF could not be started on any host in the instance. You must resolve the problem and clear the alert manually before you attempt to restart the instance.
Tip: You can see details about alerts using the DB2_INSTANCE_ALERTS administrative view. See Viewing details for an alert for an example.
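Because the CATCHUP state carries a progress percentage, a script that interprets Table 3 first has to split that value into a base state and a number. The following Python sketch does that and then reduces each STATE/ALERT pair to a short summary. It treats the two combinations that Table 3 marks as "Not applicable" as unexpected. The function names parse_cf_state and cf_summary and the summary strings are hypothetical. Accepting a bare CATCHUP without a percentage, as in the db2instance -list note, is an assumption about input format.

```python
import re
from typing import Optional, Tuple

# Matches "CATCHUP(42%)" and, as an assumption, a bare "CATCHUP".
_CATCHUP = re.compile(r"^CATCHUP(?:\((\d{1,3})%\))?$")

# Combinations that Table 3 marks as "Not applicable".
_NOT_APPLICABLE = {("BECOMING_PRIMARY", "YES"), ("PRIMARY", "YES")}


def parse_cf_state(state: str) -> Tuple[str, Optional[int]]:
    """Split a CF STATE value into (base_state, catch-up percentage or None)."""
    s = state.strip().upper()
    m = _CATCHUP.match(s)
    if m:
        return ("CATCHUP", int(m.group(1)) if m.group(1) else None)
    return (s, None)


def cf_summary(state: str, alert: str) -> str:
    base, pct = parse_cf_state(state)
    al = alert.strip().upper()
    if (base, al) in _NOT_APPLICABLE:
        return "unexpected"
    if base == "ERROR" or al == "YES":
        # Any pending alert must be investigated and cleared manually.
        return "investigate"
    if base == "CATCHUP":
        return "catching up" if pct is None else f"catching up ({pct}%)"
    return "ok"


print(cf_summary("CATCHUP(42%)", "NO"))
```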

Differences in reporting for data-sharing and environments other than Db2 pureScale environments

All of the table functions, administrative views, and commands that report status data for hosts, members, and cluster caching facilities can also be used outside a Db2 pureScale instance. However, the results that these interfaces return might differ from what you see in a Db2 pureScale instance.
In a configuration that uses a clustered file system with a supported cluster manager (CM), sometimes known as "integrated High Availability" or "integrated HA", most of these status-reporting interfaces return results that resemble what you see in a Db2 pureScale instance. One exception is when you retrieve information about the hosts in your instance by using the DB2_GET_CLUSTER_HOST_STATE table function or the DB2_CLUSTER_HOST_STATE administrative view. In an integrated HA instance that is not a Db2 pureScale instance, neither of these interfaces returns a value in the INSTANCE_STOPPED column. For example, the results of a query that uses the DB2_CLUSTER_HOST_STATE administrative view resemble those shown in Figure 1.
Figure 1. Results returned by the DB2_CLUSTER_HOST_STATE administrative view outside of a Db2 pureScale instance with a cluster manager.
HOSTNAME STATE  INSTANCE_STOPPED ALERT
-------- ------ ---------------- -----
HOSTA    ACTIVE -                NO
HOSTB    ACTIVE -                NO
HOSTC    ACTIVE -                NO
HOSTD    ACTIVE -                NO
Another exception is any interface that specifically reports on status for cluster caching facilities. Outside of a Db2 pureScale environment, there are no cluster caching facilities, so there is no status to report. For example, the DB2_CF administrative view returns results similar to the following in an environment other than a Db2 pureScale environment:
Figure 2. Results returned by the DB2_CF administrative view outside of a Db2 pureScale instance.
ID     CURRENT_HOST        STATE      ALERT
------ ------------------- ---------- -----

     0 record(s) selected.
When the status-reporting interfaces are used in an instance without a CM, no status or alert information is returned at all. For example, the results of a query that uses the DB2_CLUSTER_HOST_STATE administrative view resemble those shown in Figure 3.
Figure 3. Results returned by the DB2_CLUSTER_HOST_STATE administrative view outside of a Db2 pureScale instance without a cluster manager.
HOSTNAME STATE  INSTANCE_STOPPED ALERT
-------- ------ ---------------- -----
HOSTA    -      -                -
HOSTB    -      -                -
HOSTC    -      -                -
HOSTD    -      -                -

     4 record(s) selected.
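Because the same views return different shapes of data depending on the environment, a script that runs in several kinds of instances might need to detect which case it is in before it interprets the results. The following Python sketch applies a heuristic based only on the patterns in Figures 1 to 3:

- every STATE and ALERT value is empty: no CM
- INSTANCE_STOPPED is empty and DB2_CF returns no rows: CM without Db2 pureScale
- INSTANCE_STOPPED is populated and CFs exist: Db2 pureScale

The function name classify_reporting and the labels are hypothetical. The sketch assumes that rows arrive as dictionaries keyed by column name, with null values shown either as None or as the "-" marker. It is not an official detection method.

```python
from typing import Iterable, Mapping, Optional


def _value(row: Mapping[str, Optional[str]], column: str) -> Optional[str]:
    """Treat a missing column, None, or the null marker '-' as no value."""
    v = row.get(column)
    return None if v is None or v.strip() == "-" else v.strip()


def classify_reporting(host_rows: Iterable[Mapping[str, Optional[str]]],
                       cf_row_count: int) -> str:
    """Heuristically infer the environment from DB2_CLUSTER_HOST_STATE rows
    and the number of rows returned by the DB2_CF administrative view."""
    rows = list(host_rows)
    if not rows:
        return "unknown"
    if all(_value(r, "STATE") is None and _value(r, "ALERT") is None for r in rows):
        return "no cluster manager"
    if all(_value(r, "INSTANCE_STOPPED") is None for r in rows) and cf_row_count == 0:
        return "cluster manager, not Db2 pureScale"
    if all(_value(r, "INSTANCE_STOPPED") is not None for r in rows) and cf_row_count > 0:
        return "Db2 pureScale"
    return "unknown"


figure1 = [{"HOSTNAME": "HOSTA", "STATE": "ACTIVE", "INSTANCE_STOPPED": "-", "ALERT": "NO"}]
print(classify_reporting(figure1, cf_row_count=0))
```

Anything that matches none of the three documented patterns is reported as "unknown" rather than guessed.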