Limitations of CCR
Be aware of the following limitations of the cluster configuration repository (CCR) and of the workarounds that are available for them.
CCR limitation when the cluster is configured with two quorum nodes and at least one tiebreaker disk
You might need to shut down quorum nodes during a maintenance process. In a cluster with two quorum nodes, the cluster might not be able to reach quorum even if one quorum node is active and has access to the tiebreaker disks.
The reason for this limitation is that the CCR stores the committed files and the file updates only on quorum nodes and not on tiebreaker disks. When one quorum node becomes active, the CCR server on that node reads the Paxos state from the tiebreaker disks during startup. That state might record a past file update that went only to the other quorum node, which is not available when this quorum node starts. In that case, the CCR cannot reach quorum during startup.
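The failure mode described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the actual CCR or Paxos implementation: the class names, the `commit` and `can_reach_ccr_quorum` functions, the file name `ccr-file`, and the integer version numbers are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QuorumNode:
    """Toy quorum node: holds committed files as name -> version."""
    name: str
    files: Dict[str, int] = field(default_factory=dict)


@dataclass
class TiebreakerDisk:
    """Toy tiebreaker disk: holds only Paxos state, never file contents."""
    paxos_state: Dict[str, int] = field(default_factory=dict)


def commit(file_name: str, version: int, disk: TiebreakerDisk,
           reachable_nodes: List[QuorumNode]) -> None:
    """Record the update on the disk and on every reachable quorum node."""
    disk.paxos_state[file_name] = version
    for node in reachable_nodes:
        node.files[file_name] = version


def can_reach_ccr_quorum(starting_node: QuorumNode, disk: TiebreakerDisk) -> bool:
    """A node that starts alone needs every update the disk knows about."""
    return all(starting_node.files.get(name, 0) >= version
               for name, version in disk.paxos_state.items())


node21 = QuorumNode("node-21")
node22 = QuorumNode("node-22")
disk = TiebreakerDisk()

commit("ccr-file", 1, disk, [node21, node22])  # both quorum nodes are up
commit("ccr-file", 2, disk, [node22])          # node-21 is down, update reaches node-22 only

# node-22 is now shut down as well, and node-21 is started on its own.
print(can_reach_ccr_quorum(node21, disk))  # False: version 2 exists only on node-22
print(can_reach_ccr_quorum(node22, disk))  # True: node-22 holds the latest version
```

In this model the tiebreaker disk tells the starting node that a newer update exists, but it cannot supply the update itself, which mirrors why access to the tiebreaker disks alone is not sufficient.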
Recommendation to avoid this limitation
Shut down quorum nodes one at a time. That is, shut down the second quorum node only after the first quorum node is started again and GPFS is in the active state on it. Before you shut down a quorum node, use the mmchmgr command to assign the cluster manager role to the quorum node that remains active.
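The following example shows this procedure on a cluster with two quorum nodes (node-21 and node-22) and one tiebreaker disk (disk1).
mmlscluster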
GPFS cluster information
========================
GPFS cluster name: gpfs-cluster-2.localnet.com
GPFS cluster id: 13445038716777666550
GPFS UID domain: gpfs-cluster-2.localnet.com
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
----------------------------------------------------------------------------
1 node-21.localnet.com 10.0.100.21 node-21.localnet.com quorum
2 node-22.localnet.com 10.0.100.22 node-22.localnet.com quorum
3 node-23.localnet.com 10.0.100.23 node-23.localnet.com
4 node-24.localnet.com 10.0.100.24 node-24.localnet.com
5 node-25.localnet.com 10.0.100.25 node-25.localnet.com
mmlsconfig tiebreakerDisks
tiebreakerDisks disk1
mmgetstate -a
Node number Node name GPFS state
-------------------------------------
1 node-21 active
2 node-22 active
3 node-23 active
4 node-24 active
5 node-25 active
mmlsmgr
file system manager node
---------------- ------------------
gpfs0 10.0.100.21 (node-21)
Cluster manager node: 10.0.100.21 (node-21)
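Shut down node-22, the quorum node that is not the cluster manager. Its state is then reported as unknown:
mmgetstate -a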
Node number Node name GPFS state
-------------------------------------
1 node-21 active
2 node-22 unknown
3 node-23 active
4 node-24 active
5 node-25 active
Start node-22 again and wait until GPFS is active on it:
mmgetstate -a
Node number Node name GPFS state
-------------------------------------
1 node-21 active
2 node-22 active
3 node-23 active
4 node-24 active
5 node-25 active
Move the cluster manager role and the file system manager role to node-22:
mmchmgr -c node-22
Appointing node 10.0.100.22 (node-22) as cluster manager
Node 10.0.100.22 (node-22) has taken over as cluster manager
mmchmgr gpfs0 node-22
Sending migrate request to current manager node 10.0.100.21 (node-21).
Node 10.0.100.21 (node-21) resigned as manager for gpfs0.
Node 10.0.100.22 (node-22) appointed as manager for gpfs0.
Verify the new manager assignments:
mmlsmgr
file system manager node
---------------- ------------------
gpfs0 10.0.100.22 (node-22)
Cluster manager node: 10.0.100.22 (node-22)
Shut down node-21. Its state is then reported as unknown:
mmgetstate -a
Node number Node name GPFS state
-------------------------------------
1 node-21 unknown
2 node-22 active
3 node-23 active
4 node-24 active
5 node-25 active
Start node-21 again and wait until GPFS is active on it:
mmgetstate -a
Node number Node name GPFS state
-------------------------------------
1 node-21 active
2 node-22 active
3 node-23 active
4 node-24 active
5 node-25 active
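The check that the recommendation asks for before each shutdown can be sketched as a small helper that reads `mmgetstate -a` output in the format shown in this example. This is a minimal sketch, not an IBM-provided tool: the function names `parse_mmgetstate` and `safe_to_shut_down` are hypothetical, and the parser assumes that each data row consists of a node number, a node name, and a single-word GPFS state.

```python
from typing import Dict, List


def parse_mmgetstate(output: str) -> Dict[str, str]:
    """Map node name -> GPFS state from `mmgetstate -a` output."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like "<node number> <node name> <GPFS state>".
        # Header and separator lines do not match this shape and are skipped.
        if len(fields) == 3 and fields[0].isdigit():
            states[fields[1]] = fields[2]
    return states


def safe_to_shut_down(node: str, quorum_nodes: List[str],
                      states: Dict[str, str], cluster_manager: str) -> bool:
    """Apply the recommendation: the other quorum nodes must be active,
    and the node to shut down must not hold the cluster manager role."""
    others = [n for n in quorum_nodes if n != node]
    others_active = all(states.get(n) == "active" for n in others)
    return others_active and cluster_manager != node


sample = """Node number Node name GPFS state
-------------------------------------
1 node-21 active
2 node-22 unknown
3 node-23 active
"""
states = parse_mmgetstate(sample)
# node-22 is not active yet, so node-21 must stay up.
print(safe_to_shut_down("node-21", ["node-21", "node-22"], states, "node-22"))  # False
```

The cluster manager name would come from the `mmlsmgr` output. Collecting the command output (for example with `subprocess.run`) is left out so that the sketch can be run without a GPFS installation.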
CCR limitation on using Persistent Reserve (PR) when a disk is already set as a CCR tiebreaker disk
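An attempt to enable Persistent Reserve while a disk is configured as a CCR tiebreaker disk fails with messages similar to the following:
mmchconfig usePersistentReserve=yes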
Verifying GPFS is stopped on all nodes ...
mmchconfig: Processing disk gpfs1nsd
mmchconfig: chdev failed to set PR_key_value for hdisk2.
mmchconfig: 6027-1940 Unable to set reserve policy PR_shared on disk gpfs1nsd on node node-22.localnet.com.
mmchconfig: 6027-1214 Unable to enable Persistent Reserve on the following disks:
gpfs1nsd
mmchconfig: The usePersistentReserve parameter will remain unchanged.
mmchconfig: 6027-1639 Command failed. Examine previous error messages to determine cause.
The CCR opens access to the tiebreaker disks on the quorum nodes to use them for the Paxos protocol. This open access prevents the mmchconfig command from setting the Persistent Reserve (PR) flag successfully.
To set the PR flag successfully, complete the following steps in order:
- Configure the CCR without tiebreaker disks by using the mmchconfig tiebreakerDisks=no command.
- Set the PR flag by using the mmchconfig usePersistentReserve=yes command.
- Reconfigure the CCR with the original tiebreaker disks by using the mmchconfig tiebreakerDisks=<ORIGINAL_TIEBREAKER_DISKS> command.
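The workaround steps above can be sketched as a helper that records the current tiebreaker setting and then produces the three commands in order. This is a hedged sketch only: the function names `parse_tiebreaker_disks` and `pr_workaround_commands` are hypothetical, the parser assumes the two-column `mmlsconfig tiebreakerDisks` output shown earlier in this topic, and the commands are only built and printed, not run.

```python
from typing import List


def parse_tiebreaker_disks(mmlsconfig_output: str) -> str:
    """Return the value from `mmlsconfig tiebreakerDisks` output,
    for example "disk1" from the line "tiebreakerDisks disk1"."""
    for line in mmlsconfig_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "tiebreakerDisks":
            return fields[1].strip()
    raise ValueError("no tiebreakerDisks setting found in the output")


def pr_workaround_commands(original_tiebreaker_disks: str) -> List[List[str]]:
    """Build the three mmchconfig invocations in the documented order."""
    if not original_tiebreaker_disks or original_tiebreaker_disks == "no":
        raise ValueError("the original tiebreaker disks must be known first")
    return [
        ["mmchconfig", "tiebreakerDisks=no"],
        ["mmchconfig", "usePersistentReserve=yes"],
        ["mmchconfig", "tiebreakerDisks=" + original_tiebreaker_disks],
    ]


original = parse_tiebreaker_disks("tiebreakerDisks disk1")
for command in pr_workaround_commands(original):
    print(" ".join(command))
```

Saving the original value before the first step matters because the last step must restore exactly that setting. The sample output in this section also suggests that mmchconfig verifies that GPFS is stopped on all nodes before it changes usePersistentReserve, so plan the steps for a maintenance window.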