Recovery Group Issues

An ESS 3000 recovery group has a different structure from the recovery groups in ESS version 5.3.5.

The recovery groups in ESS 5.3.5 are called paired recovery groups and always come in pairs, dividing ownership of the enclosure disks in half, with one recovery group primary to each of the two servers in the ESS building block. An ESS 3000 building block contains two canister servers and an NVMe enclosure, and configures as a single recovery group that is simultaneously active on both canister servers. An ESS 3000 recovery group is called a shared recovery group because the enclosure disks are shared by both of the canister servers in the building block. The single shared recovery group structure is necessitated because the ESS 3000 can have as few as 12 disks, which is the smallest number of disks a recovery group can contain. Having 12 disks allows for one equivalent spare and 11-wide 8+3P RAID codes. In contrast, ESS 5.3.5 building blocks always contain a minimum of 24 disks, which can therefore be divided into two paired recovery groups of at least 12 disks each.
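For example, the disk count and spare space of a shared recovery group can be confirmed from its declustered array listing. The recovery group name ESS3000RG shown here is the example name used throughout this topic:
 # mmvdisk recoverygroup list --recovery-group ESS3000RG --declustered-array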

The following example displays the server pair of a representative ESS 5.3.5 building block, which uses the individual building block node class ESS:

 # mmvdisk server list --node-class ESS
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     1  server1.gpfs.net                  yes      serving ESSRG1
     2  server2.gpfs.net                  yes      serving ESSRG2
#
Server workload within the building block is balanced by having each server serve one of the two paired recovery groups. The following example displays the canister server pair of a representative ESS 3000 building block, which uses the individual building block node class ESS3000:
 # mmvdisk server list --node-class ESS3000
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESS3000RG: LG002, LG004
     4  canister2.gpfs.net                yes      serving ESS3000RG: root, LG001, LG003

In the case of ESS 3000, each canister server simultaneously serves the same single recovery group, ESS3000RG. The server workload within the building block is balanced by subdividing the single shared recovery group into the following log groups: LG001, LG002, LG003, LG004, and the lightweight root or master log group. The non-root log groups are called user log groups, and only the user log groups contain the file system vdisk NSDs.
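To see which file system vdisk NSDs reside in which user log group, the vdisk listing of the recovery group can be consulted, again using the example recovery group name:
 # mmvdisk recoverygroup list --recovery-group ESS3000RG --vdisk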

All recovery groups in a cluster can be listed by using the mmvdisk recoverygroup list command:
# mmvdisk recoverygroup list
                                                            needs    user
recovery group  active   current or master server          service  vdisks  remarks
--------------  -------  --------------------------------  -------  ------  -------
ESS3000RG       yes      canister2.gpfs.net                no           16
ESSRG1          yes      server1.gpfs.net                  no            8
ESSRG2          yes      server2.gpfs.net                  no            8
The needs service column in the IBM Spectrum Scale RAID commands is narrowly defined: it indicates only whether a disk in the recovery group is called out for replacement. The mmvdisk recoverygroup list --not-ok command can be used to show other recovery group issues, including those involving log groups or servers:
# mmvdisk recoverygroup list --not-ok
recovery group  remarks
--------------  -------
ESS3000RG       server canister2.gpfs.net 'down'
#
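Because needs service refers only to disk replacement, the specific pdisks that are called out can be listed with the mmvdisk pdisk list command, for example:
 # mmvdisk pdisk list --recovery-group all --replace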
If one server of an ESS 3000 shared recovery group is down, all the log groups must fail over to the remaining server:
 # mmvdisk recoverygroup list --server --recovery-group ESS3000RG
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESS3000RG: root, LG001, LG002, LG003, LG004
     4  canister2.gpfs.net                no       configured

When the down server is brought back up, the Recovery Group Configuration Manager (RGCM) process running on the cluster manager node assigns it two of the user log groups to rebalance the recovery group server workload.
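The rejoin and subsequent rebalance can be verified by checking the node state and then listing the recovery group servers again, for example:
 # mmgetstate -N canister2.gpfs.net
 # mmvdisk recoverygroup list --server --recovery-group ESS3000RG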

Except during failover or while servers are rejoining a recovery group, RGCM always keeps two user log groups on each server. In the unlikely event that both servers are active but the user log groups are not evenly divided between them, you can shut down one of the servers and restart it. Shutting down one server and restarting it causes RGCM to redistribute the user log groups across the two servers.

For example, consider a situation where the following allocation of log groups persists for five or more minutes:
 # mmvdisk recoverygroup list --server --recovery-group ESS3000RG
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESS3000RG: root, LG001, LG002, LG003
     4  canister2.gpfs.net                yes      serving ESS3000RG: LG004
In such a case, shutting down canister2 and starting it back up restores the log group workload balance in the building block within five minutes:
# mmshutdown -N canister2.gpfs.net
# mmstartup -N canister2.gpfs.net
# sleep 300
# mmvdisk recoverygroup list --server --recovery-group ESS3000RG
 node
number  server                            active   remarks
------  --------------------------------  -------  -------
     3  canister1.gpfs.net                yes      serving ESS3000RG: root, LG002, LG003
     4  canister2.gpfs.net                yes      serving ESS3000RG: LG001, LG004
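
Rather than sleeping for a fixed interval, the rebalance can be polled for. The following is a minimal shell sketch, assuming the example names used above; it counts the user log groups on each serving line and succeeds when each canister serves exactly two:

#!/bin/sh
# Poll for up to 10 minutes (20 attempts, 30 seconds apart).
for attempt in $(seq 1 20); do
    # Collect any serving line that does not contain exactly two user log groups.
    unbalanced=$(mmvdisk recoverygroup list --server --recovery-group ESS3000RG |
        grep 'serving ESS3000RG' |
        awk '{ if (gsub(/LG0/, "&") != 2) print }')
    if [ -z "$unbalanced" ]; then
        echo "ESS3000RG log groups are balanced"
        exit 0
    fi
    sleep 30
done
echo "ESS3000RG log groups still unbalanced after 10 minutes" >&2
exit 1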