Compared to conventional RAID, IBM Storage Scale RAID implements a sophisticated
data and spare space disk layout scheme that allows for arbitrarily sized disk
arrays while also reducing the overhead to clients when recovering from disk
failures. To accomplish this, IBM Storage Scale RAID uniformly spreads, or
declusters, user data, redundancy information, and spare space across all the
disks of a declustered array. Figure 1 compares a conventional RAID layout with
an equivalent declustered array.
Figure 1. Conventional RAID versus declustered RAID layouts. This
figure shows how IBM Storage Scale RAID improves client performance during
rebuild operations by using the throughput of all disks in the declustered array.
It compares a conventional RAID of three arrays with a declustered array, both
using seven disks. The conventional 1-fault-tolerant 1 + 1 replicated RAID in the
lower left is shown with three arrays of two disks each (data and replica strips)
and a spare disk for rebuilding. To decluster these arrays, the disks are divided
into seven tracks, two strips per array, as shown in the upper left. The strips
from each group are then combinatorially spread across all seven disk positions,
for a total of 21 virtual tracks, as shown in the upper right. The strips of each
disk position for every track are then arbitrarily allocated onto the disks of the
declustered array in the lower right (in this case, by vertically sliding down and
compacting the strips from above). The spare strips are uniformly inserted, one
per disk.
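The count of 21 virtual tracks in the Figure 1 example can be checked with a short sketch. It assumes that "combinatorially spread" means one track for every unordered pair of the seven disk positions; the names `declustered_tracks` and `strips_per_disk` are hypothetical and model only the counting, not the actual placement algorithm that IBM Storage Scale RAID uses.

```python
from collections import Counter
from itertools import combinations

NUM_DISKS = 7  # disk count taken from the Figure 1 example


def declustered_tracks(num_disks):
    """Return one (data_disk, replica_disk) pair per virtual track.

    Assumption: each unordered pair of disk positions hosts exactly one
    1 + 1 replicated track.
    """
    return list(combinations(range(num_disks), 2))


def strips_per_disk(tracks):
    """Count how many data or replica strips land on each disk."""
    counts = Counter()
    for data_disk, replica_disk in tracks:
        counts[data_disk] += 1
        counts[replica_disk] += 1
    return counts


tracks = declustered_tracks(NUM_DISKS)
load = strips_per_disk(tracks)

print(len(tracks))         # 21 virtual tracks
print(set(load.values()))  # {6}: six strips on every disk
```

Under this assumption every disk holds six data or replica strips. Adding the one spare strip per disk gives seven strips per disk, the same capacity as the seven tracks of the conventional layout.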
As illustrated in Figure 2, a declustered array
can significantly shorten the time that is needed to recover from a disk failure, which lowers the
rebuild overhead for client applications. When a disk fails, erased data is
rebuilt by using all the operational disks in the declustered array, whose aggregate bandwidth is
greater than that of the fewer disks in a conventional RAID group. Furthermore, if an extra disk fault
occurs during a rebuild, the number of impacted tracks that require repair is markedly smaller than
for the previous failure and smaller than the constant rebuild overhead of a conventional
array.
The rebuild impact and client overhead of a declustered array can be three to
four times lower than those of a conventional RAID. Because IBM Storage Scale
stripes client data across all the storage nodes of a cluster, file system
performance becomes less dependent upon the speed of any single rebuilding
storage array.
Figure 2. Lower rebuild overhead in declustered RAID
versus conventional RAID. When a single disk fails in the 1-fault-tolerant 1 + 1 conventional array on the left, the
redundant disk is read and copied onto the spare disk, which requires a throughput of seven strip
I/O operations. When a disk fails in the declustered array, all replica strips of the six impacted
tracks are read from the surviving six disks and then written to six spare strips, for a throughput
of two strip I/O operations. The bar chart illustrates disk read and write I/O throughput during the
rebuild operations.
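The strip I/O counts in the Figure 2 example can be reproduced with a small calculation. This is an illustrative sketch only: it assumes the same pairwise layout of seven disks (one track per pair of disks), and the rule for choosing which spare strip receives each rebuilt strip is hypothetical, not the product's actual policy.

```python
from itertools import combinations

NUM_DISKS = 7        # disks in both layouts (from the figure)
STRIPS_PER_DISK = 7  # tracks per disk in the conventional layout


def conventional_busiest_disk_io(strips_per_disk):
    """The replica disk reads every strip; the spare disk writes every strip."""
    reads_on_replica = strips_per_disk
    writes_on_spare = strips_per_disk
    return max(reads_on_replica, writes_on_spare)


def declustered_io_per_disk(num_disks, failed_disk):
    """Count strip reads plus writes on each surviving disk."""
    survivors = [d for d in range(num_disks) if d != failed_disk]
    io = {d: 0 for d in survivors}
    for a, b in combinations(range(num_disks), 2):
        if failed_disk not in (a, b):
            continue  # this track is not impacted by the failure
        source = b if a == failed_disk else a
        io[source] += 1  # read the surviving replica strip
        # Hypothetical spare choice: the next surviving disk, so the rebuilt
        # strip never shares a disk with its surviving replica.
        target = survivors[(survivors.index(source) + 1) % len(survivors)]
        io[target] += 1  # write the rebuilt strip into a spare strip
    return io


conventional = conventional_busiest_disk_io(STRIPS_PER_DISK)
declustered = max(declustered_io_per_disk(NUM_DISKS, failed_disk=0).values())

print(conventional, declustered, conventional / declustered)  # 7 2 3.5
```

In this model the busiest disk handles seven strip I/O operations in the conventional rebuild but only two in the declustered rebuild. The resulting ratio of 3.5 is consistent with the stated factor of three to four.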