RAID codes

IBM Storage Scale RAID automatically corrects disk failures and other storage faults by reconstructing the unreadable data from the redundancy provided by a Reed-Solomon code or by N-way replication. IBM Storage Scale RAID uses the reconstructed data to fulfill client operations and, if a disk fails, to rebuild the lost data onto spare space. IBM Storage Scale RAID supports 2- and 3-fault-tolerant Reed-Solomon codes and 3-way and 4-way replication. The 2-fault-tolerant Reed-Solomon code and 3-way replication detect and correct up to two concurrent faults, and the 3-fault-tolerant Reed-Solomon code and 4-way replication detect and correct up to three¹. The redundancy code layouts that IBM Storage Scale RAID supports, called tracks, are illustrated in Figure 1.

Figure 1. Redundancy codes supported by IBM Storage Scale RAID. IBM Storage Scale RAID supports 2- and 3-fault-tolerant Reed-Solomon codes, which partition a GPFS block into eight data strips and two or three parity strips. The N-way replication codes duplicate the GPFS block on N − 1 replica strips.

Depending on the configured RAID code, IBM Storage Scale RAID creates redundancy information automatically. With a Reed-Solomon code, IBM Storage Scale RAID divides a GPFS block of user data equally into eight data strips and generates two or three redundant parity strips. This results in a stripe, or track, width of 10 or 11 strips and a storage efficiency of 80% (8/10) or 73% (8/11), respectively, excluding user-configurable spare space for rebuild operations.
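The track-width and efficiency arithmetic for the two Reed-Solomon layouts can be checked with a short sketch. The class name RSCode, the "8+2p"/"8+3p" display labels, and the 8 MiB block size are illustrative assumptions, not IBM Storage Scale RAID interfaces or defaults; spare space is excluded, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSCode:
    """Hypothetical model of a Reed-Solomon track: data strips plus parity strips."""
    data_strips: int
    parity_strips: int

    @property
    def track_width(self) -> int:
        # A track holds every data strip and every parity strip of one GPFS block.
        return self.data_strips + self.parity_strips

    @property
    def efficiency(self) -> float:
        # Fraction of raw capacity that holds user data (spare space excluded).
        return self.data_strips / self.track_width

    def strip_size(self, block_size: int) -> int:
        # The GPFS block is divided equally among the data strips.
        if block_size % self.data_strips:
            raise ValueError("block size must divide evenly into the data strips")
        return block_size // self.data_strips


codes = {"8+2p": RSCode(8, 2), "8+3p": RSCode(8, 3)}
block_size = 8 * 1024 * 1024  # hypothetical 8 MiB GPFS block
for name, code in codes.items():
    print(f"{name}: width={code.track_width}, "
          f"efficiency={code.efficiency:.0%}, "
          f"strip={code.strip_size(block_size) // 1024} KiB")
```

With these assumptions the loop reports widths of 10 and 11 strips and efficiencies of 80% and 73%, matching the figures above; each data strip is one eighth of the block.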

With N-way replication, a GPFS data block is simply replicated N − 1 times, in effect implementing 1 + 2 and 1 + 3 redundancy codes, with the strip size equal to the GPFS block size. Thus, for every block/strip that is written to the disks, N − 1 replicas of that block/strip are also written, for N copies in total. This results in a track width of 3 or 4 strips and a storage efficiency of 33% (1/3) or 25% (1/4), respectively.
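A minimal sketch of the replication layout, assuming each strip is modeled as a byte string the same size as the block. The functions replicate and replication_efficiency are hypothetical helpers for illustration only.

```python
def replicate(block: bytes, n_way: int) -> list[bytes]:
    """Return the strips written for one GPFS block under N-way replication.

    The track holds the original block plus N - 1 replicas: N strips in
    total, each the same size as the block (strip size == block size).
    """
    if n_way < 2:
        raise ValueError("replication needs at least two copies")
    return [bytes(block) for _ in range(n_way)]


def replication_efficiency(n_way: int) -> float:
    # One block of user data occupies N strips of raw capacity.
    return 1 / n_way


block = b"example GPFS block"
for n in (3, 4):
    track = replicate(block, n)
    print(f"{n}-way: width={len(track)}, replicas={len(track) - 1}, "
          f"efficiency={replication_efficiency(n):.0%}")
```

For 3-way and 4-way replication this gives track widths of 3 and 4 strips, two and three replicas, and efficiencies of 33% and 25%.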

¹ An f-fault-tolerant Reed-Solomon code or a (1 + f)-way replication can survive the concurrent failure of f disks or read faults. Also, if there are s equivalent spare disks in the array, an f-fault-tolerant array can survive the sequential failure of f + s disks, where the disk failures occur between successful rebuild operations.
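The f and f + s limits in the footnote can be illustrated with a simplified counting model. This is an assumption-laden sketch, not how IBM Storage Scale RAID schedules rebuilds: spare space is counted in whole-disk equivalents, and each "rebuild" event is assumed to restore one failed disk's redundancy while spare space remains. The functions survives and max_sequential_failures are hypothetical.

```python
def survives(events: list[str], fault_tolerance: int, spare_disks: int) -> bool:
    """Replay 'fail'/'rebuild' events; return False once data would be lost."""
    unrebuilt = 0          # failed disks whose data has not been rebuilt yet
    spares = spare_disks   # remaining spare space, in disk equivalents (assumed)
    for event in events:
        if event == "fail":
            unrebuilt += 1
            if unrebuilt > fault_tolerance:
                return False  # more outstanding faults than the code tolerates
        elif event == "rebuild":
            # Simplification: one rebuild restores one disk's redundancy,
            # but only while spare space remains.
            if unrebuilt and spares:
                unrebuilt -= 1
                spares -= 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    return True


def max_sequential_failures(fault_tolerance: int, spare_disks: int,
                            limit: int = 100) -> int:
    """Largest k such that k fail/rebuild pairs in a row are survivable."""
    k = 0
    while k < limit and survives(["fail", "rebuild"] * (k + 1),
                                 fault_tolerance, spare_disks):
        k += 1
    return k


for f, s in ((2, 0), (2, 2), (3, 2)):
    concurrent_ok = survives(["fail"] * f, f, s)
    print(f"f={f}, s={s}: survives {f} concurrent failures: {concurrent_ok}; "
          f"max sequential failures: {max_sequential_failures(f, s)}")
```

In this model the first s failures are each rebuilt onto spare space, which restores full fault tolerance. Once the spare space is used up, f more failures can still be absorbed, giving f + s in total.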