Recovery group issues for shared recovery groups

An ESS 3000, ESS 3200, or ESS 3500 recovery group is called a shared recovery group because the enclosure disks are shared by both canister servers in the building block. The building block contains two canister servers and an NVMe enclosure, and is configured as a single recovery group that is simultaneously active on both canister servers.

The single shared recovery group structure is necessary because an ESS system can have as few as 12 disks, which is the smallest number of disks a recovery group can contain. With 12 disks, the 11-wide 8+3P RAID code (8 data strips plus 3 parity strips per track) can be used while one disk's worth of capacity remains available as an equivalent spare.

The following example lists the canister server pair of a representative ESS building block by using its individual building block node class, ESSNC:
 # mmvdisk server list --node-class ESSNC
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESSRG: LG002, LG004
     4  canister2.gpfs.net                yes      serving ESSRG: root, LG001, LG003

For these ESS systems, each server is simultaneously serving the same single recovery group, ESSRG. The server workload within the building block is balanced by subdividing the single shared recovery group into the following log groups: LG001, LG002, LG003, LG004, and the lightweight root or master log group. The non-root log groups are called user log groups. Only the user log groups contain the file system vdisk NSDs.
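
The current distribution of the log groups between the canister servers can be checked at any time. The following is a minimal sketch that simply trims the mmvdisk recoverygroup list --server output (shown in full later in this topic) down to the assignment lines; the exact column layout can vary by release:
 # mmvdisk recoverygroup list --server --recovery-group ESSRG | grep 'serving ESSRG'
     3  canister1.gpfs.net                yes      serving ESSRG: LG002, LG004
     4  canister2.gpfs.net                yes      serving ESSRG: root, LG001, LG003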

All recovery groups in a cluster can be listed by using the mmvdisk recoverygroup list command:
# mmvdisk recoverygroup list
                                                            needs    user
recovery group  active   current or master server          service  vdisks  remarks
--------------  -------  --------------------------------  -------  ------  -------
ESSRG           yes      canister2.gpfs.net                no           16
ESSRG1          yes      server1.gpfs.net                  no            8
ESSRG2          yes      server2.gpfs.net                  no            8

The needs service column in all the IBM Spectrum Scale RAID commands is narrowly defined: it indicates only whether a disk in the recovery group is called out for replacement. The mmvdisk recoverygroup list --not-ok command can be used to show other recovery group issues, including those involving log groups or servers:
# mmvdisk recoverygroup list --not-ok
recovery group  remarks
--------------  -------
ESSRG           server canister2.gpfs.net 'down'
#
If one server of an ESS shared recovery group is down, all the log groups must fail over to the remaining server:
 # mmvdisk recoverygroup list --server --recovery-group ESSRG
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESSRG: root, LG001, LG002, LG003, LG004
     4  canister2.gpfs.net                no       configured

When the down server is brought back up, the Recovery Group Configuration Manager (RGCM) process that runs on the cluster manager node assigns it two of the user log groups, which rebalances the recovery group server workload. For more information, see Server failover for shared recovery groups.
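
As a sketch of that sequence, and assuming the canister2 node name from the earlier examples, the rejoining server can be started and the resulting assignment checked after RGCM has had a few minutes to rebalance; the standard mmgetstate command confirms that GPFS is active on the node again:
# mmstartup -N canister2.gpfs.net
# mmgetstate -N canister2.gpfs.net
# sleep 300
# mmvdisk recoverygroup list --server --recovery-group ESSRG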

Except when a failover occurs or while servers are rejoining a recovery group, RGCM always keeps two user log groups on each server. In the unlikely event that both servers are active but each server does not have exactly two user log groups, you can shut down one of the servers and restart it. Shutting that server down and restarting it causes RGCM to redistribute the user log groups between the two servers.
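
One quick way to spot such an imbalance is to count the user log groups that each server is serving. The following one-line sketch parses the mmvdisk recoverygroup list --server output used throughout this topic; the parsing assumes the output format shown in these examples and might need adjusting on other releases. In a balanced building block, each canister server reports two user log groups:
# mmvdisk recoverygroup list --server --recovery-group ESSRG | grep 'serving ESSRG' | \
    awk '{ n = gsub(/LG[0-9]+/, "&"); print $2 ": " n " user log group(s)" }'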

For example, consider a situation where the following allocation of log groups lasts for five or more minutes:
 # mmvdisk recoverygroup list --server --recovery-group ESSRG
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESSRG: root, LG001, LG002, LG003
     4  canister2.gpfs.net                yes      serving ESSRG: LG004

In such cases, shutting down canister2 and starting it back up restores the log group workload balance in the building block within five minutes:
# mmshutdown -N canister2.gpfs.net
# mmstartup -N canister2.gpfs.net
# sleep 300
# mmvdisk recoverygroup list --server --recovery-group ESSRG
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESSRG: root, LG002, LG003
     4  canister2.gpfs.net                yes      serving ESSRG: LG001, LG004
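
After the restart, it is also worth confirming that no other recovery group problems remain. As a final check that reuses a command shown earlier in this topic, list the recovery groups that still have issues; if the log groups are balanced and both canister servers are active, ESSRG is no longer flagged:
# mmvdisk recoverygroup list --not-ok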