Recommendations

The following table shows how many node failures can be tolerated for a given recovery group size under each erasure code.

There are limits on what block sizes can be used with each erasure code, depending on device media type. The table below illustrates these limits:

Block size   4+2P         4+3P         8+2P         8+3P
1 MiB        SSD or HDD   SSD or HDD   SSD or HDD   SSD or HDD
2 MiB        SSD or HDD   SSD or HDD   SSD or HDD   SSD or HDD
4 MiB        HDD          HDD          SSD or HDD   SSD or HDD
8 MiB        HDD          HDD          HDD          HDD
16 MiB       N/A          N/A          HDD          HDD
Key
SSD or HDD: This combination of block size and erasure code may be used with SSD (NVMe or SAS) or HDD drives.
HDD: This combination of block size and erasure code may be used with HDD drives only.
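
The limits in the table above can also be written as a small lookup, which may be handy when validating a planned configuration in a script. This is only an illustrative sketch: the names ALLOWED_MEDIA and media_allowed are hypothetical and are not part of any product tooling, and the N/A cells are assumed to mean that the combination is not supported on any media type.

```python
# Transcription of the block size / erasure code table above.
# Keys are block sizes in MiB; an empty set stands for "N/A"
# (assumed here to mean the combination is not supported at all).
BOTH = frozenset({"SSD", "HDD"})
HDD_ONLY = frozenset({"HDD"})
NONE = frozenset()

ALLOWED_MEDIA = {
    1:  {"4+2P": BOTH,     "4+3P": BOTH,     "8+2P": BOTH,     "8+3P": BOTH},
    2:  {"4+2P": BOTH,     "4+3P": BOTH,     "8+2P": BOTH,     "8+3P": BOTH},
    4:  {"4+2P": HDD_ONLY, "4+3P": HDD_ONLY, "8+2P": BOTH,     "8+3P": BOTH},
    8:  {"4+2P": HDD_ONLY, "4+3P": HDD_ONLY, "8+2P": HDD_ONLY, "8+3P": HDD_ONLY},
    16: {"4+2P": NONE,     "4+3P": NONE,     "8+2P": HDD_ONLY, "8+3P": HDD_ONLY},
}


def media_allowed(block_size_mib: int, erasure_code: str, media: str) -> bool:
    """Return True if the table permits this block size, code, and media type."""
    by_code = ALLOWED_MEDIA.get(block_size_mib)
    if by_code is None or erasure_code not in by_code:
        # Block size or erasure code is not listed in the table.
        return False
    return media.upper() in by_code[erasure_code]


if __name__ == "__main__":
    print(media_allowed(4, "4+2P", "SSD"))   # 4 MiB with 4+2P is HDD only
    print(media_allowed(4, "8+3P", "SSD"))   # 4 MiB with 8+3P allows SSD
```

For example, a 4 MiB block size with 4+2P is rejected for SSD but accepted for HDD, matching the third row of the table.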

Although a smaller recovery group tolerates the same number of failures as a larger one, the amount of critical data that must be rebuilt after each failure is smaller in a larger recovery group. For example, with an 8+3P array on an 11-node recovery group, 3 node failures would impact all of the data in the file system. On a 30-node recovery group, 3 node failures would impact only about 10% of the data in the file system (assuming all disks are the same size), and the critical rebuild will complete more quickly because the rebuild work is distributed across a larger number of remaining nodes.

When planning the erasure code type, also consider future expansion of the cluster and storage utilization. The erasure code for a vdisk cannot be changed after the vdisk is created, and larger stripe widths have better storage utilization. A 4+3P code uses 57% of total capacity for usable data, while an 8+3P code uses 73% of total capacity for usable data. So, rather than creating a 9-node cluster with 4+3P and expanding it in the future, an 11-node cluster using 8+3P may be more cost-effective. In some cases, using a non-recommended erasure code may be tolerable if there are plans to increase the cluster size.
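
The utilization figures quoted above follow from the ratio of data strips to total strips in a stripe. The short sketch below reproduces that arithmetic for the four codes discussed in this section. The function name usable_fraction is hypothetical, and the calculation deliberately ignores spare space, metadata, and other overheads, so real usable capacity will be somewhat lower.

```python
def usable_fraction(data_strips: int, parity_strips: int) -> float:
    """Fraction of raw capacity that holds user data for a k+pP erasure code."""
    return data_strips / (data_strips + parity_strips)


# (name, data strips, parity strips) for the codes covered in this section
CODES = [("4+2P", 4, 2), ("4+3P", 4, 3), ("8+2P", 8, 2), ("8+3P", 8, 3)]

if __name__ == "__main__":
    for name, k, p in CODES:
        # Print the result as a whole-number percentage.
        print(f"{name}: {usable_fraction(k, p):.0%}")
```

With these inputs, 4+3P works out to 4/7 (about 57%) and 8+3P to 8/11 (about 73%), which is the difference that makes the wider stripe more cost-effective per usable terabyte.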