Node quorum with tiebreaker disks

When running on small GPFS™ clusters, you might want to have the cluster remain online with only one surviving node.

To achieve this, you need to add a tiebreaker disk to the quorum configuration. Node quorum with tiebreaker disks allows you to run with as few as one quorum node available, as long as you have access to a majority of the quorum disks (refer to Figure 1). To enable node quorum with tiebreaker disks, first designate one or more nodes as quorum nodes; then define one to three disks as tiebreaker disks by using the tiebreakerDisks parameter on the mmchconfig command. You can designate any disk to be a tiebreaker.
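
For example, assuming three NSDs named nsd1, nsd2, and nsd3 (hypothetical names) that are accessible from all of the quorum nodes, the tiebreaker disks could be defined with a command similar to the following:

  mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"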

When utilizing node quorum with tiebreaker disks, there are specific rules for cluster nodes and for tiebreaker disks.

Cluster node rules:
  1. There is a maximum of eight quorum nodes.
  2. All quorum nodes need to have access to all of the tiebreaker disks.
  3. When using the traditional server-based (non-CCR) configuration repository, you should include the primary and secondary cluster configuration servers as quorum nodes.
  4. You may have an unlimited number of non-quorum nodes.
  5. If a network failure causes the loss of quorum, and quorum is being maintained by tiebreaker disks, the following rationale is used to re-establish quorum. If a group has the cluster manager, it is the "survivor". The cluster manager can give up its role if it communicates with fewer than the minimum number of quorum nodes defined by the minQuorumNodes configuration parameter. In this case, other groups with the minimum number of quorum nodes (if they exist) can choose a new cluster manager.
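
As an illustration, the minimum number of quorum nodes referred to above is controlled by the minQuorumNodes parameter and could be adjusted with a command similar to the following (the value 2 is only an example):

  mmchconfig minQuorumNodes=2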

Changing quorum semantics:

When using the cluster configuration repository (CCR) to store configuration files, the total number of quorum nodes is limited to eight, regardless of quorum semantics, but the use of tiebreaker disks can be enabled or disabled at any time by issuing an mmchconfig tiebreakerDisks command. The change will take effect immediately, and it is not necessary to shut down GPFS when making this change.
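
For example, with CCR in use, tiebreaker disks could be disabled while GPFS is running with a command similar to the following:

  mmchconfig tiebreakerDisks=no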

When using the traditional server-based (non-CCR) configuration repository, it is possible to define more than eight quorum nodes, but only when no tiebreaker disks are defined. Sample command sequences for both of the following procedures are shown after the list:
  1. To configure more than eight quorum nodes under the server-based (non-CCR) configuration repository, you must disable node quorum with tiebreaker disks and restart the GPFS daemon. To disable node quorum with tiebreaker disks:
    1. Issue the mmshutdown -a command to shut down GPFS on all nodes.
    2. Change quorum semantics by issuing the mmchconfig tiebreakerDisks=no command.
    3. Add additional quorum nodes.
    4. Issue the mmstartup -a command to restart GPFS on all nodes.
  2. If you remove quorum nodes and the new configuration has fewer than eight quorum nodes, you can change the configuration to node quorum with tiebreaker disks. To enable quorum with tiebreaker disks:
    1. Issue the mmshutdown -a command to shut down GPFS on all nodes.
    2. Delete the appropriate quorum nodes, or run mmchnode --nonquorum to change them to non-quorum (client) nodes.
    3. Change quorum semantics by issuing the mmchconfig tiebreakerDisks="diskList" command.
      • The diskList contains the names of the tiebreaker disks.
      • The list contains the NSD names of the disks, preferably one or three disks, separated by a semicolon (;) and enclosed by quotes.
    4. Issue the mmstartup -a command to restart GPFS on all nodes.
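
As an illustration, the two procedures above might map to command sequences similar to the following. The node names and NSD names are placeholders, and mmchnode --quorum is shown here as one way to designate additional quorum nodes:

  Disabling node quorum with tiebreaker disks:
    mmshutdown -a
    mmchconfig tiebreakerDisks=no
    mmchnode --quorum -N node9,node10
    mmstartup -a

  Enabling node quorum with tiebreaker disks:
    mmshutdown -a
    mmchnode --nonquorum -N node7,node8
    mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"
    mmstartup -a
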
Tiebreaker disk rules:
  • You can have one, two, or three tiebreaker disks. However, you should use an odd number of tiebreaker disks.
  • Among the quorum node groups that appear after an interconnect failure, only those having access to a majority of tiebreaker disks can be candidates to be the survivor group.
  • Tiebreaker disks must be connected to all quorum nodes.
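
To verify which disks are currently configured as tiebreakers, the cluster configuration can be inspected, for example:

  mmlsconfig | grep -i tiebreakerDisks
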
In Figure 1, GPFS remains active with the minimum of a single available quorum node and two available tiebreaker disks.
Figure 1. GPFS configuration using node quorum with tiebreaker disks
This graphic depicts a GPFS configuration that uses node quorum with tiebreaker disks. There are four nodes in the configuration. Three of the nodes are quorum nodes, leaving one non-quorum node. There are three directly attached tiebreaker disks.

When a quorum node detects loss of network connectivity, but before GPFS runs the algorithm that decides whether the node will remain in the cluster, the tiebreakerCheck event is triggered. This event is generated only in configurations that use quorum nodes with tiebreaker disks. It is also triggered on the cluster manager periodically by a challenge-response thread to verify that the node can still continue as cluster manager.
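
As a sketch, an administrator could register a user exit for this event with mmaddcallback; the callback identifier and script path below are placeholders, and the exact options that are appropriate for the tiebreakerCheck event should be confirmed against the mmaddcallback documentation for your GPFS level:

  mmaddcallback tiebreakerCheckLogger --command /usr/local/bin/tiebreaker_check.sh --event tiebreakerCheck --sync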