Pdisk-group fault tolerance: an example

Every data stripe (including user data and system configuration data) within the IBM Spectrum Scale RAID system is protected through a distinct form of redundancy. Each data stripe has a set of disks to which the placement of its strips is constrained. An object's data consists of many stripes, and each stripe is made up of individual strips that together provide the redundancy code protection for that data. These strips are distributed across a set of pdisks that reside within a set of drawers, and these drawers in turn reside within a set of enclosures.

Figure 1 shows a sample stripe placement for a vdisk that uses a RAID redundancy code of 4WayReplication (that is, four duplicate copies of each data strip). The pdisk-group fault-tolerant placement puts the four strips of the stripe in four different drawers across the two enclosures available to this recovery group.
Figure 1. Strips across JBOD enclosures
By segregating each individual strip across as wide a set of disk groups as possible, IBM Spectrum Scale RAID ensures that the loss of any set of disk groups up to the fault tolerance of the RAID redundancy code is survivable.
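The idea of spreading strips as widely as possible can be illustrated with a small sketch. This is a toy model only, not the placement algorithm that IBM Spectrum Scale RAID actually uses: the function name place_strips, the greedy "least-used enclosure, then least-used drawer" rule, and the figure of five drawers per enclosure are all assumptions made for illustration.

```python
from collections import Counter


def place_strips(n_strips, enclosures, drawers_per_enclosure):
    """Toy model: spread n_strips over (enclosure, drawer) pairs.

    Hypothetical greedy rule (an assumption, not the product's algorithm):
    each strip goes to the least-used enclosure and, within that enclosure,
    to the least-used drawer. Ties are broken by the lowest number.
    """
    enclosure_load = Counter()  # strips placed per enclosure
    drawer_load = Counter()     # strips placed per (enclosure, drawer)
    placement = []
    for _ in range(n_strips):
        enc = min(range(1, enclosures + 1),
                  key=lambda e: (enclosure_load[e], e))
        drw = min(range(1, drawers_per_enclosure + 1),
                  key=lambda d: (drawer_load[(enc, d)], d))
        enclosure_load[enc] += 1
        drawer_load[(enc, drw)] += 1
        placement.append((enc, drw))
    return placement


# Four copies (as with 4WayReplication), two enclosures,
# and an assumed five drawers per enclosure.
placement = place_strips(4, enclosures=2, drawers_per_enclosure=5)
print(placement)  # [(1, 1), (2, 1), (1, 2), (2, 2)]
```

Under these assumptions, the four copies land in four distinct drawers, two in each enclosure, which matches the kind of placement described for Figure 1.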
Figure 2 shows an example of the same configuration after the loss of a full enclosure and one drawer from the second enclosure.
Figure 2. Strips across JBOD enclosures after failure
In this example, the pdisk-group fault-tolerant placement of individual strips across multiple enclosures and multiple drawers has ensured that at least one of the four duplicate copies has survived the multiple disk failures that occurred when an enclosure and a separate drawer failed.
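The failure scenario can be checked with a short sketch. The placement coordinates, the choice of which enclosure and drawer fail, and the helper name surviving_strips are hypothetical; the only fact taken from the text is that, with four duplicate copies, the data remains available as long as at least one copy survives.

```python
def surviving_strips(placement, failed_enclosures=(), failed_drawers=()):
    """Return the strips whose enclosure and drawer are both still working.

    placement         -- list of (enclosure, drawer) pairs, one per strip
    failed_enclosures -- enclosure numbers that are lost entirely
    failed_drawers    -- (enclosure, drawer) pairs that are lost
    """
    failed_enclosures = set(failed_enclosures)
    failed_drawers = set(failed_drawers)
    return [
        (enc, drw)
        for enc, drw in placement
        if enc not in failed_enclosures and (enc, drw) not in failed_drawers
    ]


# Assumed placement: four copies in four drawers, two per enclosure.
spread = [(1, 1), (1, 2), (2, 1), (2, 2)]
# Lose all of enclosure 1 plus one drawer of enclosure 2.
survivors = surviving_strips(spread, failed_enclosures=[1],
                             failed_drawers=[(2, 1)])
print(survivors)            # [(2, 2)]
print(len(survivors) >= 1)  # True: one replica is enough to read the data

# Contrast: all four copies in a single enclosure would not survive
# the loss of that enclosure.
clustered = [(1, 1), (1, 2), (1, 3), (1, 4)]
print(surviving_strips(clustered, failed_enclosures=[1]))  # []
```

The contrast case shows why the placement is constrained by disk group rather than by individual pdisk: four copies on four different pdisks give no protection if those pdisks share one enclosure.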