Declustered RAID

Compared to conventional RAID, IBM Storage Scale RAID implements a sophisticated data and spare space disk layout scheme that allows for arbitrarily sized disk arrays while also reducing the overhead to clients when recovering from disk failures. To accomplish this, IBM Storage Scale RAID uniformly spreads, or declusters, user data, redundancy information, and spare space across all the disks of a declustered array. Figure 1 compares a conventional RAID layout with an equivalent declustered array.

As illustrated in Figure 2, a declustered array can significantly shorten the time that is needed to recover from a disk failure, which lowers the rebuild overhead for client applications. When a disk fails, erased data is rebuilt by using all the operational disks in the declustered array, the combined bandwidth of which is greater than that of the fewer disks in a conventional RAID group. Furthermore, if an extra disk fault occurs during a rebuild, the number of impacted tracks that require repair is markedly smaller than for the previous failure and smaller than the constant rebuild overhead of a conventional array.

Declustering can reduce rebuild impact and client overhead by a factor of three to four compared with conventional RAID. Because IBM Storage Scale stripes client data across all the storage nodes of a cluster, file system performance becomes less dependent upon the speed of any single rebuilding storage array.

Figure 2. Lower rebuild overhead in declustered RAID versus conventional RAID. When a single disk fails in the 1-fault-tolerant 1 + 1 conventional array on the left, the redundant disk is read and copied onto the spare disk, which requires a throughput of seven strip I/O operations. When a disk fails in the declustered array, all replica strips of the six impacted tracks are read from the surviving six disks and then written to six spare strips, for a throughput of two strip I/O operations. The bar chart illustrates disk read and write I/O throughput during the rebuild operations.
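
The arithmetic in the Figure 2 caption can be reproduced with a small model. The following Python sketch is illustrative only and is not how IBM Storage Scale RAID places data. It assumes a toy layout of seven disks in which each 2-replica track occupies a distinct pair of disks, and it assumes that each rebuilt replica is written to a spare strip on the next surviving disk. The names `declustered_layout` and `rebuild_io_per_disk` are hypothetical helpers introduced for this example.

```python
from collections import Counter
from itertools import combinations

DISKS = 7  # assumption: three 1 + 1 mirrored pairs plus one spare, pooled into one array


def declustered_layout(disks):
    """Toy layout (assumed): one 2-replica track for every distinct pair of disks."""
    return list(combinations(range(disks), 2))


def rebuild_io_per_disk(layout, disks, failed):
    """Count strip reads plus writes on each surviving disk after `failed` is lost."""
    survivors = [d for d in range(disks) if d != failed]
    io = Counter()
    for track in layout:
        if failed not in track:
            continue  # this track has no strip on the failed disk
        source = track[0] if track[1] == failed else track[1]
        # Assumed spare placement: rotate to the next survivor so the rebuilt
        # replica never lands on the disk that holds the surviving replica.
        target = survivors[(survivors.index(source) + 1) % len(survivors)]
        io[source] += 1  # read the surviving replica strip
        io[target] += 1  # write the rebuilt strip into spare space
    return io


layout = declustered_layout(DISKS)
io = rebuild_io_per_disk(layout, DISKS, failed=0)

# Peak per-disk load in the declustered array during the rebuild.
declustered_peak = max(io.values())

# Conventional 1 + 1: the same strips sit on six data disks, so the mirror is
# read in full and the spare is written in full.
conventional_peak = 2 * len(layout) // (DISKS - 1)

print(len(layout), declustered_peak, conventional_peak,
      conventional_peak / declustered_peak)
# prints: 21 2 7 3.5
```

Under these assumptions, each surviving disk performs one read and one write, which gives the two strip I/O operations per disk stated in the caption, against seven for the conventional mirror and spare. The resulting ratio of 3.5 applies to this toy layout only.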