Node quorum with tiebreaker disks
When running on small GPFS clusters, you might want to have the cluster remain online with only one surviving node.
To achieve this, you need to add a tiebreaker disk to the quorum configuration. Node quorum with tiebreaker disks allows you to run with as few as one quorum node available, as long as you have access to a majority of the quorum disks (refer to Figure 1). Enabling node quorum with tiebreaker disks starts by designating one or more nodes as quorum nodes. Then one to three disks are defined as tiebreaker disks using the tiebreakerDisks parameter on the mmchconfig command. You can designate any disk to be a tiebreaker.
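The two steps above might look like the following sketch. The node names (node1, node2) and NSD names (nsd1, nsd2, nsd3) are placeholders for illustration; substitute the names in your own cluster.

```shell
# Designate two nodes as quorum nodes (node names are placeholders).
mmchnode --quorum -N node1,node2

# Define three NSDs as tiebreaker disks. The list is semicolon-separated
# and must be quoted so the shell does not interpret the semicolons.
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"
```
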
When utilizing node quorum with tiebreaker disks, there are specific rules for cluster nodes and for tiebreaker disks.
- There is a maximum of eight quorum nodes.
- All quorum nodes need to have access to all of the tiebreaker disks.
- You may have an unlimited number of non-quorum nodes.
- If a network connection fails, causing the loss of quorum, and quorum is maintained by tiebreaker disks, the following rationale is used to re-establish quorum: if a group has the cluster manager, it is the survivor. The cluster manager can give up its role if it communicates with fewer than the minimum number of quorum nodes, as defined by the minQuorumNodes configuration parameter. In that case, other groups with at least the minimum number of quorum nodes (if they exist) can choose a new cluster manager.
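The minQuorumNodes threshold mentioned above is itself set with mmchconfig. A minimal sketch, assuming a value of 2 is appropriate for the cluster (the value here is illustrative only):

```shell
# Require the cluster manager to communicate with at least 2 quorum
# nodes to keep its role (the value 2 is an assumption for illustration).
mmchconfig minQuorumNodes=2

# Display the current setting.
mmlsconfig minQuorumNodes
```
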
Changing quorum semantics:
When using the cluster configuration repository (CCR) to store configuration files, the total number of quorum nodes is limited to eight, regardless of quorum semantics, but the use of tiebreaker disks can be enabled or disabled at any time by issuing an mmchconfig tiebreakerDisks command. The change takes effect immediately; it is not necessary to shut down GPFS when making this change.
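For example, on a CCR-based cluster the feature can be turned off online. A sketch of disabling tiebreaker disks and confirming the result:

```shell
# Disable tiebreaker disks without shutting down GPFS;
# the cluster reverts to plain node quorum semantics.
mmchconfig tiebreakerDisks=no

# Confirm the change took effect.
mmlsconfig tiebreakerDisks
```
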
- You can have one, two, or three tiebreaker disks; however, you should use an odd number (one or three) of tiebreaker disks.
- Among the quorum node groups that appear after an interconnect failure, only those having access to a majority of tiebreaker disks can be candidates to be the survivor group.
- Tiebreaker disks must be connected to all quorum nodes.
- In a CCR-based cluster, when adding tiebreaker disks:
- GPFS must be up and running if the tiebreaker disks are part of a file system.
- GPFS can be either running or shut down if the tiebreaker disks are not part of a file system.
When a quorum node detects loss of network connectivity, but before GPFS runs the algorithm that decides if the node will remain in the cluster, the tiebreakerCheck event is triggered. This event is generated only in configurations that use quorum nodes with tiebreaker disks. It is also triggered on the cluster manager periodically by a challenge-response thread to verify that the node can still continue as cluster manager.
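If you want to observe when this event fires, the mmaddcallback facility can associate a script with a GPFS event. The sketch below assumes that tiebreakerCheck is listed as a supported event on your GPFS release (check the mmaddcallback documentation); the callback name and script path are placeholders:

```shell
# Register a logging callback for the tiebreakerCheck event.
# Callback name and script path are placeholders; verify that your
# GPFS release lists tiebreakerCheck as a supported event.
mmaddcallback logTiebreakerCheck \
    --command /var/mmfs/etc/log-tiebreaker-check.sh \
    --event tiebreakerCheck \
    --parms "%eventName %myNode"
```
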