Node quorum with tiebreaker disks

When running a small GPFS cluster, you might want the cluster to remain online with only one surviving node.

To achieve this, you need to add a tiebreaker disk to the quorum configuration. Node quorum with tiebreaker disks allows you to run with as few as one quorum node available, as long as you have access to a majority of the quorum disks (refer to Figure 1). Enabling node quorum with tiebreaker disks starts by designating one or more nodes as quorum nodes. Then one to three disks are defined as tiebreaker disks using the tiebreakerDisks parameter on the mmchconfig command. You can designate any disk to be a tiebreaker.
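
For example, a minimal sketch of enabling this configuration might look like the following, where nodeA, nodeB, nodeC and nsd1, nsd2, nsd3 are placeholder node and NSD names, not part of the product:

    # Designate three nodes as quorum nodes
    mmchnode --quorum -N nodeA,nodeB,nodeC

    # Define three NSDs as tiebreaker disks (a semicolon-separated, quoted list)
    mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"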

When using node quorum with tiebreaker disks, specific rules apply to cluster nodes and to tiebreaker disks.

Cluster node rules:
  1. There is a maximum of eight quorum nodes.
  2. All quorum nodes must have access to all of the tiebreaker disks.
  3. You may have an unlimited number of non-quorum nodes.
  4. If a network failure causes a loss of quorum while quorum is being maintained by tiebreaker disks, the following rationale is used to re-establish quorum. If a group has the cluster manager, it is the survivor. The cluster manager gives up its role if it communicates with fewer than the minimum number of quorum nodes defined by the minQuorumNodes configuration parameter (see the sketch after this list). In that case, other groups with at least the minimum number of quorum nodes (if they exist) can choose a new cluster manager.
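
As a hypothetical illustration of rule 4, minQuorumNodes can be adjusted with mmchconfig; the value 2 below is only an example:

    # Require the cluster manager to communicate with at least two
    # quorum nodes to retain its role (example value)
    mmchconfig minQuorumNodes=2

    # Verify the current setting
    mmlsconfig minQuorumNodes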

Changing quorum semantics:

When the cluster configuration repository (CCR) is used to store configuration files, the total number of quorum nodes is limited to eight, regardless of quorum semantics. The use of tiebreaker disks can be enabled or disabled at any time by issuing an mmchconfig tiebreakerDisks command; the change takes effect immediately, and it is not necessary to shut down GPFS when making it.
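
For example, a minimal sketch of toggling tiebreaker disks on a running CCR-based cluster (the NSD names nsd1, nsd2, and nsd3 are placeholders):

    # Enable tiebreaker disks; the change takes effect immediately
    mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"

    # Disable tiebreaker disks, reverting to plain node quorum
    mmchconfig tiebreakerDisks=no

    # Confirm the current setting
    mmlsconfig tiebreakerDisks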

Tiebreaker disk rules:
  • You can have one, two, or three tiebreaker disks. However, you should use an odd number of tiebreaker disks.
  • Among the quorum node groups that appear after an interconnect failure, only those with access to a majority of the tiebreaker disks are candidates to become the survivor group.
  • Tiebreaker disks must be connected to all quorum nodes (a verification sketch follows this list).
  • In a CCR-based cluster, when adding tiebreaker disks:
    • GPFS should be up and running if the tiebreaker disks are part of a file system.
    • GPFS can be either running or shut down if the tiebreaker disks are not part of a file system.
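
One way to verify the resulting configuration, sketched under the assumption that the standard GPFS administration commands are in your PATH:

    # List cluster nodes and their quorum designations
    mmlscluster

    # Show node states; -L adds quorum detail such as the quorum count
    mmgetstate -aL

    # List the NSDs to confirm which disks are defined and visible
    mmlsnsd
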
In Figure 1, GPFS remains active with the minimum of a single available quorum node and two available tiebreaker disks.
Figure 1. GPFS configuration using node quorum with tiebreaker disks
This graphic depicts a GPFS configuration that uses node quorum with tiebreaker disks. There are four nodes in the configuration: three quorum nodes and one non-quorum node. There are three directly attached tiebreaker disks.

When a quorum node detects a loss of network connectivity, but before GPFS runs the algorithm that decides whether the node will remain in the cluster, the tiebreakerCheck event is triggered. This event is generated only in configurations that use quorum nodes with tiebreaker disks. It is also triggered periodically on the cluster manager by a challenge-response thread, to verify that the node can continue as cluster manager.
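
If you want to act on this event, callbacks can be registered with the mmaddcallback command. The following is a minimal sketch; the callback identifier tbCheckLogger and the script path /usr/local/bin/log_tiebreaker.sh are assumptions, not part of the product:

    # Register a synchronous callback that runs when tiebreakerCheck fires
    # (identifier and script path are placeholders)
    mmaddcallback tbCheckLogger --command /usr/local/bin/log_tiebreaker.sh --event tiebreakerCheck --sync

    # Remove the callback when it is no longer needed
    mmdelcallback tbCheckLogger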