Pdisk-group fault tolerance: an example
Every data stripe in the IBM Spectrum Scale RAID system, whether it holds user data or system configuration data, is protected by a form of redundancy. Each stripe is made up of individual strips that together carry the redundancy code protecting the data, and a given object typically spans many stripes. The strips of a stripe are placed only within a defined set of disks. Within that set, the strips are distributed across pdisks that reside in drawers, and those drawers in turn reside in enclosures.
Figure 1 shows a sample stripe placement for a vdisk that uses a RAID redundancy code of 4WayReplication (that is, four duplicate copies of each data strip). The pdisk-group fault-tolerant placement has placed the four strips of the stripe across four drawers in the two enclosures available to this recovery group.
By segregating the individual strips across as wide a set of disk groups as possible, IBM Spectrum Scale RAID ensures that the loss of any set of disk groups, up to the fault tolerance of the RAID redundancy code, is survivable.
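The idea of spreading strips across as many disk groups as possible can be illustrated with a minimal sketch. This is not the placement algorithm that IBM Spectrum Scale RAID uses; it is a simple round-robin over enclosures and then drawers, and the function name place_strips, the enclosure and drawer names, and the count of five drawers per enclosure are all assumptions made for illustration.

```python
from collections import Counter


def place_strips(num_strips, enclosures):
    """Hypothetical placement: rotate over enclosures, then over drawers.

    `enclosures` maps an enclosure name to its list of drawer names.
    Returns one (enclosure, drawer) pair per strip.
    """
    enclosure_names = list(enclosures)
    next_drawer = {name: 0 for name in enclosure_names}
    placement = []
    for strip in range(num_strips):
        # Alternate enclosures so copies are split as evenly as possible.
        enc = enclosure_names[strip % len(enclosure_names)]
        drawers = enclosures[enc]
        # Within an enclosure, take the next unused drawer.
        drawer = drawers[next_drawer[enc] % len(drawers)]
        next_drawer[enc] += 1
        placement.append((enc, drawer))
    return placement


# Assumed layout: two enclosures with five drawers each.
layout = {
    "enc1": ["d1", "d2", "d3", "d4", "d5"],
    "enc2": ["d1", "d2", "d3", "d4", "d5"],
}

# 4WayReplication: four copies of each data strip.
placement = place_strips(4, layout)
per_enclosure = Counter(enc for enc, _ in placement)

print(placement)      # [('enc1', 'd1'), ('enc2', 'd1'), ('enc1', 'd2'), ('enc2', 'd2')]
print(per_enclosure)  # Counter({'enc1': 2, 'enc2': 2})
```

Under these assumptions, the four copies land in four different drawers, two in each enclosure, so no single drawer or enclosure holds more than its share of the stripe.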
Figure 2 shows an example of the same configuration after the loss of a full enclosure and one drawer from the second enclosure. In this example, the pdisk-group fault-tolerant placement of individual strips across multiple enclosures and multiple drawers has ensured that at least one of the four duplicate copies survived the multiple disk failures that occurred when an enclosure and a separate drawer failed.
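The failure scenario can be checked with a short sketch. It assumes the four copies sit in two drawers of each of two enclosures, which matches the description of four drawers across two enclosures; the names enc1, enc2, d1, d2 and the helper surviving_copies are hypothetical and are not part of any IBM Spectrum Scale RAID interface.

```python
def surviving_copies(placement, failed_enclosures=(), failed_drawers=()):
    """Return the (enclosure, drawer) pairs that still hold a readable copy.

    `failed_drawers` holds (enclosure, drawer) pairs, because drawer names
    are only unique within an enclosure.
    """
    failed_enclosures = set(failed_enclosures)
    failed_drawers = set(failed_drawers)
    return [
        (enc, drawer)
        for enc, drawer in placement
        if enc not in failed_enclosures and (enc, drawer) not in failed_drawers
    ]


# Assumed placement of the four replicated copies of one strip.
placement = [("enc1", "d1"), ("enc1", "d2"), ("enc2", "d1"), ("enc2", "d2")]

# Lose all of enc1 plus one drawer of enc2.
survivors = surviving_copies(
    placement,
    failed_enclosures={"enc1"},
    failed_drawers={("enc2", "d1")},
)

print(survivors)            # [('enc2', 'd2')]
print(len(survivors) >= 1)  # True: one copy is enough for replicated data
```

With replication, a single surviving copy is sufficient to read the data, so this combined enclosure and drawer loss is survivable under the assumed placement. Had all four copies been placed in one enclosure, the same enclosure failure would have left no survivors.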