Scenario: Restart light

This scenario describes the steps that occur during a member restart in light mode. It covers the most common case, in which a single host failure causes that host's resident member to be restarted automatically as a guest member on another host that is still active. The scenario also covers how the guest member is failed back to its home host.

Initial setup

There are six hosts (HostA, HostB, HostC, HostD, HostE, HostF) in the Db2® pureScale® instance:
  • Member 10 is running on HostA (its home host)
  • Member 20 is running on HostB (its home host)
  • Member 30 is running on HostC (its home host)
  • Member 40 is running on HostD (its home host)
  • Cluster caching facility 128 (CF 128) is running on HostE
  • Cluster caching facility 129 (CF 129) is running on HostF
Each host has a set of Db2 idle processes for the instance, with preallocated memory that is reserved for restart light recovery. Db2 cluster services monitors all the resources in the cluster.
The status information for the hosts, members, and CFs can be displayed by using the LIST INSTANCE command (or any of the other interfaces described in Interfaces for retrieving status information for Db2 pureScale instances). At this point, the LIST INSTANCE command returns:

LIST INSTANCE

MEMBER_ID TYPE    STATE   HOME_HOST CURRENT_HOST ALERT 
--------- ------- ------- --------- ------------ ----- 
       10 MEMBER  STARTED hostA     hostA        NO    
       20 MEMBER  STARTED hostB     hostB        NO    
       30 MEMBER  STARTED hostC     hostC        NO    
       40 MEMBER  STARTED hostD     hostD        NO    
      128 CF      PRIMARY hostE     -            NO    
      129 CF      PEER    hostF     -            NO    


HOSTNAME STATE  INSTANCE_STOPPED ALERT
-------- ------ ---------------- -----
hostA    ACTIVE NO               NO
hostB    ACTIVE NO               NO
hostC    ACTIVE NO               NO 
hostD    ACTIVE NO               NO
hostE    ACTIVE NO               NO
hostF    ACTIVE NO               NO
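If you capture the LIST INSTANCE text output in a script, you can turn the two tables into records and check the baseline state described above. The following is a minimal sketch, assuming the default (non SHOW DETAIL) column layout shown in this section; the names parse_list_instance, members_on_home_hosts, Member, and Host are hypothetical and are not part of any Db2 API.

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: int
    kind: str          # TYPE column: MEMBER or CF
    state: str
    home_host: str
    current_host: str  # "-" for CF rows in this output
    alert: bool


@dataclass
class Host:
    hostname: str
    state: str
    instance_stopped: bool
    alert: bool


def parse_list_instance(text: str) -> tuple[list[Member], list[Host]]:
    """Split default LIST INSTANCE output into member and host records."""
    members: list[Member] = []
    hosts: list[Host] = []
    section = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the dashed separator rows under each header.
        if not fields or set(fields[0]) == {"-"}:
            continue
        if fields[0] == "MEMBER_ID":
            section = "members"
        elif fields[0] == "HOSTNAME":
            section = "hosts"
        elif section == "members" and len(fields) == 6:
            mid, kind, state, home, current, alert = fields
            members.append(Member(int(mid), kind, state, home, current, alert == "YES"))
        elif section == "hosts" and len(fields) == 4:
            name, state, stopped, alert = fields
            hosts.append(Host(name, state, stopped == "YES", alert == "YES"))
    return members, hosts


def members_on_home_hosts(members: list[Member]) -> bool:
    """True when every MEMBER row is STARTED on its home host (CF rows ignored)."""
    return all(
        m.state == "STARTED" and m.current_host == m.home_host
        for m in members
        if m.kind == "MEMBER"
    )


# The Initial setup output from this section.
SAMPLE = """\
LIST INSTANCE

MEMBER_ID TYPE    STATE   HOME_HOST CURRENT_HOST ALERT
--------- ------- ------- --------- ------------ -----
       10 MEMBER  STARTED hostA     hostA        NO
       20 MEMBER  STARTED hostB     hostB        NO
       30 MEMBER  STARTED hostC     hostC        NO
       40 MEMBER  STARTED hostD     hostD        NO
      128 CF      PRIMARY hostE     -            NO
      129 CF      PEER    hostF     -            NO

HOSTNAME STATE  INSTANCE_STOPPED ALERT
-------- ------ ---------------- -----
hostA    ACTIVE NO               NO
hostB    ACTIVE NO               NO
hostC    ACTIVE NO               NO
hostD    ACTIVE NO               NO
hostE    ACTIVE NO               NO
hostF    ACTIVE NO               NO
"""

members, hosts = parse_list_instance(SAMPLE)
print(members_on_home_hosts(members))          # True
print([h.hostname for h in hosts if h.alert])  # []
```

Splitting on whitespace rather than fixed column offsets keeps the sketch independent of column widths, which change between snapshots as the STATE values get longer.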

Host failure

A power failure occurs on the HostA server. Because Db2 cluster services cannot restart member 10 on HostA, it restarts the member in light mode on the next available host, HostB.

At this point, the LIST INSTANCE command shows that member 10 is in the RESTARTING state and that its current host is now HostB. HostA is INACTIVE and has an alert; its INSTANCE_STOPPED field is not set because the instance was not stopped manually on HostA:

LIST INSTANCE

MEMBER_ID TYPE    STATE      HOME_HOST CURRENT_HOST ALERT 
--------- ------- ---------- --------- ------------ ----- 
       10 MEMBER  RESTARTING hostA     hostB        NO    
       20 MEMBER  STARTED    hostB     hostB        NO    
       30 MEMBER  STARTED    hostC     hostC        NO    
       40 MEMBER  STARTED    hostD     hostD        NO    
      128 CF      PRIMARY    hostE     -            NO    
      129 CF      PEER       hostF     -            NO    


HOSTNAME STATE    INSTANCE_STOPPED ALERT
-------- -------- ---------------- -----
hostA    INACTIVE NO               YES
hostB    ACTIVE   NO               NO
hostC    ACTIVE   NO               NO 
hostD    ACTIVE   NO               NO
hostE    ACTIVE   NO               NO
hostF    ACTIVE   NO               NO
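The two signals described above, a member whose current host differs from its home host and an INACTIVE host whose INSTANCE_STOPPED field is not set, can be checked mechanically. This is a minimal sketch that assumes the values have already been transcribed from the LIST INSTANCE output into tuples; MemberRow, HostRow, guest_members, and unplanned_host_outages are hypothetical names. CF rows are left out because their CURRENT_HOST column is "-".

```python
from typing import NamedTuple


class MemberRow(NamedTuple):
    member_id: int
    state: str
    home_host: str
    current_host: str


class HostRow(NamedTuple):
    hostname: str
    state: str
    instance_stopped: bool
    alert: bool


def guest_members(members: list[MemberRow]) -> list[int]:
    """Members currently running somewhere other than their home host."""
    return [m.member_id for m in members if m.current_host != m.home_host]


def unplanned_host_outages(hosts: list[HostRow]) -> list[str]:
    """INACTIVE hosts on which the instance was not stopped manually."""
    return [h.hostname for h in hosts if h.state == "INACTIVE" and not h.instance_stopped]


# Values transcribed from the LIST INSTANCE output after HostA loses power.
members = [
    MemberRow(10, "RESTARTING", "hostA", "hostB"),
    MemberRow(20, "STARTED", "hostB", "hostB"),
    MemberRow(30, "STARTED", "hostC", "hostC"),
    MemberRow(40, "STARTED", "hostD", "hostD"),
]
hosts = [
    HostRow("hostA", "INACTIVE", False, True),
    HostRow("hostB", "ACTIVE", False, False),
    HostRow("hostC", "ACTIVE", False, False),
    HostRow("hostD", "ACTIVE", False, False),
    HostRow("hostE", "ACTIVE", False, False),
    HostRow("hostF", "ACTIVE", False, False),
]

print(guest_members(members))         # [10]
print(unplanned_host_outages(hosts))  # ['hostA']
```

Filtering on INSTANCE_STOPPED is what separates this failure case from a host on which an administrator deliberately stopped the instance.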

Waiting for failback

After the process model for member 10 is started on HostB, member crash recovery is performed on each database that requires it. To check the progress of member crash recovery, use the LIST UTILITIES command with the SHOW DETAIL option, as described in Monitoring members in restart light. After member crash recovery completes, member 10 waits to be failed back to HostA and cannot process any new transactions until then. Indoubt transactions might exist that must be resolved while member 10 is waiting to be failed back.

At this point, the LIST INSTANCE command shows that member 10's state is now WAITING_FOR_FAILBACK:

LIST INSTANCE

MEMBER_ID TYPE    STATE                HOME_HOST CURRENT_HOST ALERT 
--------- ------- -------------------- --------- ------------ ----- 
       10 MEMBER  WAITING_FOR_FAILBACK hostA     hostB        NO    
       20 MEMBER  STARTED              hostB     hostB        NO    
       30 MEMBER  STARTED              hostC     hostC        NO    
       40 MEMBER  STARTED              hostD     hostD        NO    
      128 CF      PRIMARY              hostE     -            NO    
      129 CF      PEER                 hostF     -            NO    


HOSTNAME STATE    INSTANCE_STOPPED ALERT
-------- -------- ---------------- -----
hostA    INACTIVE NO               YES
hostB    ACTIVE   NO               NO
hostC    ACTIVE   NO               NO 
hostD    ACTIVE   NO               NO
hostE    ACTIVE   NO               NO
hostF    ACTIVE   NO               NO
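Because a member in the WAITING_FOR_FAILBACK state does not process new transactions, a monitoring script might want to separate such members from those that can take new work. The following minimal sketch models only the member states that appear in this scenario; MemberState, accepts_new_transactions, and the snapshot layout are hypothetical and are not a Db2 interface.

```python
from enum import Enum


class MemberState(Enum):
    # Only the member states that appear in this scenario are modeled.
    STARTED = "STARTED"
    RESTARTING = "RESTARTING"
    WAITING_FOR_FAILBACK = "WAITING_FOR_FAILBACK"


def accepts_new_transactions(state: MemberState) -> bool:
    """In this scenario, only a STARTED member processes new transactions."""
    return state is MemberState.STARTED


# member_id -> (state, home host, current host), transcribed from the
# LIST INSTANCE output while member 10 waits for failback.
snapshot = {
    10: (MemberState("WAITING_FOR_FAILBACK"), "hostA", "hostB"),
    20: (MemberState.STARTED, "hostB", "hostB"),
    30: (MemberState.STARTED, "hostC", "hostC"),
    40: (MemberState.STARTED, "hostD", "hostD"),
}

waiting = [mid for mid, (state, _, _) in snapshot.items()
           if state is MemberState.WAITING_FOR_FAILBACK]
available = [mid for mid, (state, _, _) in snapshot.items()
             if accepts_new_transactions(state)]

print(waiting)    # [10]
print(available)  # [20, 30, 40]
```

MemberState("WAITING_FOR_FAILBACK") shows how a STATE string read from the command output maps onto the enum.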

Issue with host resolved

Power is restored to HostA, so HostA becomes active in the Db2 pureScale instance again.

At this point, the LIST INSTANCE command shows that HostA is now active and the alert has been cleared:

LIST INSTANCE

MEMBER_ID TYPE    STATE                HOME_HOST CURRENT_HOST ALERT 
--------- ------- -------------------- --------- ------------ ----- 
       10 MEMBER  WAITING_FOR_FAILBACK hostA     hostB        NO    
       20 MEMBER  STARTED              hostB     hostB        NO    
       30 MEMBER  STARTED              hostC     hostC        NO    
       40 MEMBER  STARTED              hostD     hostD        NO    
      128 CF      PRIMARY              hostE     -            NO    
      129 CF      PEER                 hostF     -            NO    


HOSTNAME STATE  INSTANCE_STOPPED ALERT
-------- ------ ---------------- -----
hostA    ACTIVE NO               NO
hostB    ACTIVE NO               NO
hostC    ACTIVE NO               NO 
hostD    ACTIVE NO               NO
hostE    ACTIVE NO               NO
hostF    ACTIVE NO               NO
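The change between the last two snapshots is that member 10's home host is ACTIVE again and its alert is cleared, which is the point at which failback becomes possible in this scenario. The following sketch expresses that observed condition as a check; it is a simplified model for illustration, not the actual decision logic of Db2 cluster services, and failback_candidates and the dictionary keys are hypothetical.

```python
def failback_candidates(members: list[dict], hosts: list[dict]) -> list[int]:
    """Members waiting for failback whose home host is ACTIVE with no alert.

    Simplified model of the condition observed in this scenario; it is not
    the real decision logic of Db2 cluster services.
    """
    host_by_name = {h["hostname"]: h for h in hosts}
    candidates = []
    for m in members:
        home = host_by_name.get(m["home_host"])
        if (
            m["state"] == "WAITING_FOR_FAILBACK"
            and home is not None
            and home["state"] == "ACTIVE"
            and not home["alert"]
        ):
            candidates.append(m["member_id"])
    return candidates


member_10 = {"member_id": 10, "state": "WAITING_FOR_FAILBACK",
             "home_host": "hostA", "current_host": "hostB"}

# Only the host rows relevant to member 10, before and after power returns.
hosts_before = [{"hostname": "hostA", "state": "INACTIVE", "alert": True},
                {"hostname": "hostB", "state": "ACTIVE", "alert": False}]
hosts_after = [{"hostname": "hostA", "state": "ACTIVE", "alert": False},
               {"hostname": "hostB", "state": "ACTIVE", "alert": False}]

print(failback_candidates([member_10], hosts_before))  # []
print(failback_candidates([member_10], hosts_after))   # [10]
```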

Failing back to the home host

Db2 cluster services detects that HostA is active and automatically fails back member 10 to that host.

At this point, the LIST INSTANCE command shows that the state of member 10 is now RESTARTING and its current host is again HostA:

LIST INSTANCE

MEMBER_ID TYPE    STATE      HOME_HOST CURRENT_HOST ALERT 
--------- ------- ---------- --------- ------------ ----- 
       10 MEMBER  RESTARTING hostA     hostA        NO    
       20 MEMBER  STARTED    hostB     hostB        NO    
       30 MEMBER  STARTED    hostC     hostC        NO    
       40 MEMBER  STARTED    hostD     hostD        NO    
      128 CF      PRIMARY    hostE     -            NO    
      129 CF      PEER       hostF     -            NO    


HOSTNAME STATE  INSTANCE_STOPPED ALERT
-------- ------ ---------------- -----
hostA    ACTIVE NO               NO
hostB    ACTIVE NO               NO
hostC    ACTIVE NO               NO 
hostD    ACTIVE NO               NO
hostE    ACTIVE NO               NO
hostF    ACTIVE NO               NO
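Taken together, the snapshots in this scenario trace one path through the member states for member 10: STARTED on HostA, RESTARTING and then WAITING_FOR_FAILBACK on HostB, and RESTARTING and then STARTED on HostA again. The following minimal sketch encodes only those observed transitions and checks a recorded history against them; ALLOWED and follows_restart_light_path are hypothetical names, and other Db2 member states are deliberately not modeled.

```python
# (from_state, to_state) pairs observed for member 10 in this scenario.
ALLOWED = {
    ("STARTED", "RESTARTING"),               # home host fails; restart light on a guest
    ("RESTARTING", "WAITING_FOR_FAILBACK"),  # member crash recovery finished on the guest
    ("WAITING_FOR_FAILBACK", "RESTARTING"),  # home host is back; failback begins
    ("RESTARTING", "STARTED"),               # member restart finished on the home host
}


def follows_restart_light_path(history: list[tuple[str, str]], home_host: str) -> bool:
    """Check a sequence of (state, current_host) snapshots for one member."""
    if not history:
        return False
    # Drop consecutive identical snapshots (member 10 stays in
    # WAITING_FOR_FAILBACK on hostB while HostA comes back).
    collapsed = [s for i, s in enumerate(history) if i == 0 or s != history[i - 1]]
    states = [state for state, _ in collapsed]
    transitions_ok = all(pair in ALLOWED for pair in zip(states, states[1:]))
    # The scenario ends with the member STARTED on its home host.
    return transitions_ok and collapsed[-1] == ("STARTED", home_host)


member_10_history = [
    ("STARTED", "hostA"),
    ("RESTARTING", "hostB"),
    ("WAITING_FOR_FAILBACK", "hostB"),
    ("WAITING_FOR_FAILBACK", "hostB"),
    ("RESTARTING", "hostA"),
    ("STARTED", "hostA"),
]

print(follows_restart_light_path(member_10_history, "hostA"))  # True
```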

Restarting on the home host

When member 10 successfully completes member restart on HostA, its state changes to STARTED, and it can again process new transactions and accept user connections. At this point, the LIST INSTANCE command returns the same information about the Db2 pureScale instance as in the Initial setup section.
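To confirm that the instance has returned to its initial state, one option is to save the LIST INSTANCE output at each step and compare captures while ignoring column padding, which varies with the longest STATE value. This is a minimal sketch under that assumption; normalize, differences, and same_instance_state are hypothetical helpers built on the Python standard library difflib module, and the captures are abbreviated to the member table.

```python
import difflib


def normalize(output: str) -> list[str]:
    """Collapse column padding and drop blank and dashed separator lines."""
    lines = []
    for line in output.splitlines():
        squeezed = " ".join(line.split())
        if squeezed and set(squeezed.replace(" ", "")) != {"-"}:
            lines.append(squeezed)
    return lines


def differences(before: str, after: str) -> list[str]:
    """Rows removed ("- ") or added ("+ ") between two captures."""
    return [d for d in difflib.ndiff(normalize(before), normalize(after))
            if d.startswith(("- ", "+ "))]


def same_instance_state(before: str, after: str) -> bool:
    return normalize(before) == normalize(after)


# Abbreviated captures (member table only) from the Initial setup and
# Failing back to the home host steps.
INITIAL = """\
MEMBER_ID TYPE    STATE   HOME_HOST CURRENT_HOST ALERT
--------- ------- ------- --------- ------------ -----
       10 MEMBER  STARTED hostA     hostA        NO
       20 MEMBER  STARTED hostB     hostB        NO
       30 MEMBER  STARTED hostC     hostC        NO
       40 MEMBER  STARTED hostD     hostD        NO
      128 CF      PRIMARY hostE     -            NO
      129 CF      PEER    hostF     -            NO
"""

FAILING_BACK = """\
MEMBER_ID TYPE    STATE      HOME_HOST CURRENT_HOST ALERT
--------- ------- ---------- --------- ------------ -----
       10 MEMBER  RESTARTING hostA     hostA        NO
       20 MEMBER  STARTED    hostB     hostB        NO
       30 MEMBER  STARTED    hostC     hostC        NO
       40 MEMBER  STARTED    hostD     hostD        NO
      128 CF      PRIMARY    hostE     -            NO
      129 CF      PEER       hostF     -            NO
"""

print(same_instance_state(INITIAL, FAILING_BACK))  # False
# Lists the removed STARTED row and the added RESTARTING row for member 10.
print(differences(INITIAL, FAILING_BACK))
```

Dropping the separator rows matters here: their dash counts follow the column widths, so they would otherwise show up as spurious differences.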