Limitations of CCR

Be aware of the following limitations of the cluster configuration repository (CCR) and of the workarounds, where they exist.

CCR limitation when the cluster is configured with two quorum nodes and at least one tiebreaker disk

You might need to shut down quorum nodes during a maintenance process. In a cluster with two quorum nodes, the cluster might not be able to reach quorum even if one quorum node is active and has access to the tiebreaker disks.

The reason for this limitation is that the CCR stores the committed files and the file updates only on the quorum nodes, not on the tiebreaker disks. When a quorum node becomes active, the CCR server on that node reads the Paxos state from the tiebreaker disks during startup. The Paxos state might record a past file update that reached only the other quorum node, which is not available while this node starts up. As a result, the CCR cannot reach quorum during startup.

Recommendation to avoid this limitation

Shut down quorum nodes one at a time: shut down the second quorum node only after the first quorum node is started up again and GPFS is in the active state on it. Before you shut down the other quorum node, use the mmchmgr command to assign the cluster manager role to the quorum node that remains active.
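
In outline, the procedure for servicing one quorum node looks as follows. The node names are placeholders, and the commands are a sketch of the order of operations rather than a complete transcript; if the node to be serviced is also a file system manager, move that role as well with mmchmgr Device Node:
 mmchmgr -c <remaining-quorum-node>       # only needed if the node to be serviced is the cluster manager
 mmshutdown -N <node-under-maintenance>   # stop the GPFS daemon on that node only
 mmstartup -N <node-under-maintenance>    # after maintenance, start GPFS on the node again
 mmgetstate -a                            # verify that the node is active before you service the other quorum node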

The following example shows the recommended procedure on a cluster that is configured with two quorum nodes and one tiebreaker disk:
 mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfs-cluster-2.localnet.com
  GPFS cluster id:           13445038716777666550
  GPFS UID domain:           gpfs-cluster-2.localnet.com
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name      IP address   Admin node name       Designation
----------------------------------------------------------------------------
   1   node-21.localnet.com  10.0.100.21  node-21.localnet.com  quorum
   2   node-22.localnet.com  10.0.100.22  node-22.localnet.com  quorum
   3   node-23.localnet.com  10.0.100.23  node-23.localnet.com
   4   node-24.localnet.com  10.0.100.24  node-24.localnet.com
   5   node-25.localnet.com  10.0.100.25  node-25.localnet.com
 mmlsconfig tiebreakerDisks
tiebreakerDisks disk1
All nodes in the cluster are active and the current cluster manager is node-21:
 mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
           1  node-21    active
           2  node-22    active
           3  node-23    active
           4  node-24    active
           5  node-25    active
 mmlsmgr
file system      manager node
---------------- ------------------
gpfs0            10.0.100.21 (node-21)

Cluster manager node: 10.0.100.21 (node-21)
Next, the second quorum node node-22 is shut down for maintenance purposes while the remaining quorum node node-21 stays active.
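The GPFS daemon can be stopped on node-22 alone with the mmshutdown command; for example:
 mmshutdown -N node-22
The mmgetstate command then shows that node-22 is no longer active: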
 mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
           1  node-21    active
           2  node-22    unknown
           3  node-23    active
           4  node-24    active
           5  node-25    active
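After the maintenance for node-22 is completed, GPFS can be started again on that node; for example:
 mmstartup -N node-22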
Quorum node node-22 rejoins the cluster and becomes active again:
 mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
           1  node-21    active
           2  node-22    active
           3  node-23    active
           4  node-24    active
           5  node-25    active
To shut down quorum node node-21 for maintenance purposes with minimal disruption, assign quorum node node-22 as the new cluster manager. If node-21 is also the file system manager for any GPFS file systems, assign node-22 as the new file system manager for them as well:
 mmchmgr -c node-22
Appointing node 10.0.100.22 (node-22) as cluster manager
Node 10.0.100.22 (node-22) has taken over as cluster manager
 mmchmgr gpfs0 node-22
Sending migrate request to current manager node 10.0.100.21 (node-21).
Node 10.0.100.21 (node-21) resigned as manager for gpfs0.
Node 10.0.100.22 (node-22) appointed as manager for gpfs0.
 mmlsmgr
file system      manager node
---------------- ------------------
gpfs0            10.0.100.22 (node-22)

Cluster manager node: 10.0.100.22 (node-22)
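With node-22 now holding the cluster manager and file system manager roles, the GPFS daemon on node-21 can be stopped; for example:
 mmshutdown -N node-21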
Quorum node node-21 is shut down for maintenance purposes without losing the GPFS quorum:
 mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
           1  node-21    unknown
           2  node-22    active
           3  node-23    active
           4  node-24    active
           5  node-25    active
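When the maintenance for node-21 is done, start GPFS on that node again; for example:
 mmstartup -N node-21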
Quorum node node-21 rejoins the cluster and becomes active again, as shown in the following example:
 mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------
           1  node-21    active
           2  node-22    active
           3  node-23    active
           4  node-24    active
           5  node-25    active

CCR limitation on using Persistent Reserve (PR) when a disk is already set as a CCR tiebreaker disk

On AIX® and Linux®, if the CCR is configured with tiebreaker disks, the mmchconfig command fails when it attempts to set the usePersistentReserve flag, as shown in the following example:
 mmchconfig usePersistentReserve=yes
Verifying GPFS is stopped on all nodes ...
mmchconfig: Processing disk gpfs1nsd
mmchconfig: chdev failed to set PR_key_value for hdisk2.
mmchconfig: 6027-1940 Unable to set reserve policy PR_shared on disk gpfs1nsd on node node-22.localnet.com.
mmchconfig: 6027-1214 Unable to enable Persistent Reserve on the following disks:
gpfs1nsd
mmchconfig: The usePersistentReserve parameter will remain unchanged.
mmchconfig: 6027-1639 Command failed. Examine previous error messages to determine cause.

The CCR opens the tiebreaker disks on the quorum nodes and keeps them in use for the Paxos protocol. This open access prevents the mmchconfig command from setting the Persistent Reserve (PR) flag on those disks.

The workaround to set the PR flag successfully is as follows (a combined example follows these steps):

  1. Configure the CCR without tiebreaker disks by using the mmchconfig tiebreakerDisks=no command.
  2. Set the PR flag by using the mmchconfig usePersistentReserve=yes command.
  3. Reconfigure the CCR with the original tiebreaker disks by using the mmchconfig tiebreakerDisks=<ORIGINAL_TIEBREAKER_DISKS> command.
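
Put together, and using the tiebreaker disk disk1 from the example cluster above as a stand-in for your own disks, the sequence might look like the following. As the failed command output above indicates, GPFS must be stopped on all nodes when usePersistentReserve is changed:
 mmlsconfig tiebreakerDisks           # record the current tiebreaker disks
 mmchconfig tiebreakerDisks=no        # temporarily run the CCR without tiebreaker disks
 mmchconfig usePersistentReserve=yes  # succeeds now that the CCR no longer holds the disks open
 mmchconfig tiebreakerDisks=disk1     # restore the original tiebreaker disks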