Additional failure group considerations

GPFS replicates the file system descriptor on a subset of the disks as changes to the file system occur, such as adding or deleting disks. To reduce the risk of multiple failures, GPFS picks disks in different failure groups to hold the replicas.

There is a structure in GPFS called the file system descriptor that is initially written to every disk in the file system, but is replicated on a subset of the disks as changes to the file system occur, such as adding or deleting disks. Based on the number of failure groups and disks, GPFS creates between one and five replicas of the descriptor:
  • If there are at least five different failure groups, five replicas are created.
  • If there are at least three different disks, three replicas are created.
  • If there are only one or two disks, a replica is created on each disk.
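
The three rules above can be sketched as a small function. This is only an illustration of the stated rules, not GPFS code; the function name and its parameters are hypothetical.

```python
def descriptor_replica_count(num_failure_groups: int, num_disks: int) -> int:
    """Return the number of descriptor replicas implied by the rules above.

    Hypothetical helper for illustration; GPFS exposes no such function.
    """
    if num_disks < 1:
        raise ValueError("a file system needs at least one disk")
    if not 1 <= num_failure_groups <= num_disks:
        raise ValueError("failure groups must number between 1 and num_disks")
    if num_failure_groups >= 5:
        return 5          # at least five failure groups: five replicas
    if num_disks >= 3:
        return 3          # at least three disks: three replicas
    return num_disks      # one or two disks: one replica per disk


print(descriptor_replica_count(6, 12))  # 5
print(descriptor_replica_count(2, 4))   # 3
print(descriptor_replica_count(2, 2))   # 2
```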

After deciding how many replicas to create, GPFS picks the disks that hold them so that, if possible, every replica is in a different failure group, which reduces the risk of multiple failures. In picking replica locations, GPFS takes the current state of the disks into account and avoids stopped or suspended disks. Similarly, when a failed disk is brought back online, GPFS may modify the subset to rebalance the file system descriptors across the failure groups. To see which disks currently make up the subset, issue the mmlsdisk -L command.
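
The placement policy described above can be approximated with a simple round-robin over failure groups. The sketch below is an assumption made for illustration and is not the algorithm GPFS actually uses: the function name, the (name, failure group, status) tuple layout, and the status strings are all hypothetical.

```python
from collections import defaultdict


def pick_descriptor_disks(disks, replica_count):
    """Choose disks for descriptor replicas, spreading them across failure groups.

    disks: list of (name, failure_group, status) tuples (hypothetical layout).
    Disks whose status is not "ready" (for example "stopped" or "suspended")
    are skipped, mirroring the rule that such disks are avoided.
    """
    by_group = defaultdict(list)
    for name, group, status in disks:
        if status == "ready":
            by_group[group].append(name)

    queues = [by_group[g] for g in sorted(by_group)]
    chosen = []
    # Take at most one disk per failure group on each pass, so a group is
    # reused only when there are fewer usable groups than replicas.
    while len(chosen) < replica_count and any(queues):
        for queue in queues:
            if queue and len(chosen) < replica_count:
                chosen.append(queue.pop(0))
    return chosen


disks = [
    ("d1", 1, "ready"), ("d2", 1, "ready"),
    ("d3", 2, "suspended"), ("d4", 2, "ready"),
    ("d5", 3, "ready"),
]
print(pick_descriptor_disks(disks, 3))  # ['d1', 'd4', 'd5']
```

With only two usable failure groups and three replicas, the same sketch places two replicas in one group, which is the situation discussed later in this topic.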

GPFS requires a majority of the replicas on the subset of disks to remain available to sustain file system operations:
  • If there are at least five different failure groups, then GPFS is able to tolerate a loss of two of the five groups. If disks in three different failure groups are lost, the file system descriptor may become inaccessible due to the loss of the majority of the replicas.
  • If there are at least three different failure groups, then GPFS is able to tolerate a loss of one of the three groups. If disks in two different failure groups are lost, the file system descriptor may become inaccessible due to the loss of the majority of the replicas.
  • If there are fewer than three failure groups, then a loss of one failure group may make the descriptor inaccessible.

    If the subset consists of three disks and there are only two failure groups, one failure group must hold two of the subset disks and the other failure group holds one. In a scenario that causes one entire failure group to disappear all at once, if the unavailable failure group contains only the single disk that is part of the subset, the file system stays up. The file system descriptor is moved to a new subset by updating the remaining two copies and writing the update to a new disk added to the subset. But if the downed failure group contains a majority of the subset, the file system descriptor cannot be updated and the file system has to be forcibly unmounted.

    Introducing a third failure group consisting of a single disk that is used solely for the purpose of maintaining a copy of the file system descriptor can help prevent such a scenario. You can designate this disk by using the descOnly designation for disk usage on the disk descriptor. For more information on disk replication, see Network Shared Disk (NSD) creation considerations and Data Mirroring and Replication.
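
The majority rule and the scenarios above can be checked with a short sketch. It is a simplified model under stated assumptions: the function name is hypothetical, each replica is represented only by the number of its failure group, and the step in which GPFS moves the descriptor to a new subset after a survivable failure is not modeled.

```python
def descriptor_survives(replica_groups, failed_groups):
    """Return True if a strict majority of descriptor replicas remains available.

    replica_groups: failure group of each descriptor replica, e.g. [1, 1, 2].
    failed_groups:  set of failure groups that were lost all at once.
    Hypothetical helper for illustration; not part of GPFS.
    """
    surviving = sum(1 for group in replica_groups if group not in failed_groups)
    return 2 * surviving > len(replica_groups)


# Two failure groups, three replicas: group 1 holds two replicas, group 2 holds one.
print(descriptor_survives([1, 1, 2], {2}))  # True: only the single replica is lost
print(descriptor_survives([1, 1, 2], {1}))  # False: the majority is lost

# A third failure group holding one descOnly disk: any single group can be lost.
print(all(descriptor_survives([1, 2, 3], {g}) for g in (1, 2, 3)))  # True

# Five failure groups: two losses are tolerated, three are not.
print(descriptor_survives([1, 2, 3, 4, 5], {1, 2}))     # True
print(descriptor_survives([1, 2, 3, 4, 5], {1, 2, 3}))  # False
```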