Recommendations

This topic provides recommendations on which block sizes to use with each erasure code and on how many node failures can be tolerated based on the recovery group size.

The following table shows how many node and disk failures can be tolerated with different RAID codes and node counts.

Note:

For the failure tolerances that are marked with # in the following table, protection against disk failure depends on reserved spare disk space in addition to the erasure code. Before you create vdisks, set the spare disk space to a number that is equal to or greater than the number of nodes. For example, for 6 nodes with the 4+2P erasure code, set the spare disk space of every declustered array (DA) to 6 before you create vdisks. (This configuration can sustain only one disk failure without degrading the fault tolerance below 1 node + 1 pdisk after the rebuild completes, so if more than one disk fails, replace the failed disks as quickly as possible.) The sketch after this note illustrates the sizing rule.

If you do not have enough spare disk space, use a recommended erasure code and node count that is not marked with #.

If you cannot meet these requirements, replace any failed disk as quickly as possible. For more information, see RAID rebuild and Spare space.
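The following is a minimal Python sketch of that sizing rule. It is illustrative only: the function name is hypothetical and is not part of any ECE interface, and the spare space itself is configured with the mmvdisk command set (see Spare space).

    def recommended_spare_pdisks(node_count: int, current_spare: int) -> int:
        """Spare-pdisk setting for a declustered array (DA) that follows the
        guideline: set spare disk space equal to or greater than the number
        of nodes before vdisks are created. Illustrative helper only."""
        return max(current_spare, node_count)

    # Example from the note: 6 nodes with 4+2P -> set each DA's spare space to 6.
    print(recommended_spare_pdisks(node_count=6, current_spare=2))  # 6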

Note: All failure tolerances that are marked with * are limited by recovery group descriptors rather than by the RAID code.
Table 1. Node and disk failures that can be tolerated based on RAID codes and node numbers
Number of nodes | 3WayReplication | 4WayReplication | 4+2P | 4+3P | 8+2P | 8+3P | 16+2P | 16+3P
3 | 1 Node + 1 Device * | 1 Node + 1 Device * | Not recommended (1 Node) | Not recommended (1 Node) | Not recommended (2 Devices) | Not recommended (3 Devices) | Not recommended (2 Devices) | Not recommended (3 Devices)
4 | 1 Node + 1 Device * | 1 Node + 1 Device * | Not recommended (1 Node) | 1 Node + 1 Device # | Not recommended (2 Devices) | Not recommended (1 Node) | Not recommended (2 Devices) | Not recommended (3 Devices)
5 | 2 Nodes | 2 Nodes * | Not recommended (1 Node) | 1 Node + 1 Device | Not recommended (1 Node) | Not recommended (1 Node) | Not recommended (2 Devices) | Not recommended (3 Devices)
6 | 2 Nodes | 2 Nodes * | 2 Nodes # | 2 Nodes | Not recommended (1 Node) | 1 Node + 1 Device # | Not recommended (2 Devices) | Not recommended (3 Devices)
7 | 2 Nodes | 2 Nodes * | 2 Nodes | 2 Nodes * | Not recommended (1 Node) | 1 Node + 1 Device | Not recommended (2 Devices) | Not recommended (3 Devices)
8 | 2 Nodes | 2 Nodes * | 2 Nodes | 2 Nodes * | Not recommended (1 Node) | 1 Node + 1 Device | Not recommended (2 Devices) | Not recommended (3 Devices)
9 | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | Not recommended (1 Node) | 1 Node + 1 Device | Not recommended (1 Node) | Not recommended (1 Node)
10 | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | 2 Nodes # | 2 Nodes | Not recommended (1 Node) | 1 Node + 1 Device #
11-17 | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | Not recommended (1 Node) | 1 Node + 1 Device
18 | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | 2 Nodes # | 2 Nodes
19+ | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes | 2 Nodes | 3 Nodes

For combinations that are marked Not recommended, the tolerance in parentheses is what that combination would provide if it were used anyway.
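The unmarked entries in Table 1 follow from the code geometry: a stripe of d data strips and p parity strips places at most ceil((d + p) / N) strips on any one node of an N-node recovery group, and the stripe survives as long as no more than p strips are lost. The following minimal Python sketch reproduces those entries under that assumption; it deliberately ignores the recovery group descriptor limits (*) and the spare-space requirements (#), which can cap the real tolerance, and the helper name is illustrative rather than part of any ECE interface.

    from math import ceil

    def nominal_tolerance(data: int, parity: int, nodes: int) -> str:
        """Nominal failure tolerance of a data+parity stripe on an N-node
        recovery group, ignoring descriptor (*) and spare-space (#) limits.
        Replication maps to data=1; for example, 3WayReplication is 1 data
        strip plus 2 copies, so parity=2."""
        strips = data + parity
        per_node = ceil(strips / nodes)        # worst-case strips on one node
        node_failures = parity // per_node     # whole nodes that can be lost
        extra_devices = parity - node_failures * per_node
        parts = []
        if node_failures:
            parts.append(f"{node_failures} Node{'s' if node_failures > 1 else ''}")
        if extra_devices:
            parts.append(f"{extra_devices} Device{'s' if extra_devices > 1 else ''}")
        return " + ".join(parts)

    print(nominal_tolerance(8, 3, 6))    # 1 Node + 1 Device (8+3P on 6 nodes)
    print(nominal_tolerance(8, 2, 10))   # 2 Nodes (8+2P on 10 nodes, # in Table 1)
    print(nominal_tolerance(16, 2, 5))   # 2 Devices (16+2P on 5 nodes)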

There are limits on which block sizes can be used with each RAID code. The following table shows these limits:

Table 2. Limits on block sizes to be used with RAID Code
Block size | 3WayReplication | 4WayReplication | 4+2P | 4+3P | 8+2P | 8+3P | 16+2P | 16+3P
256 KiB | Supported | Supported | Not supported | Not supported | Not supported | Not supported | Not supported | Not supported
512 KiB | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported
1 MiB | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported
2 MiB | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported
4 MiB | Not supported | Not supported | Supported | Supported | Supported | Supported | Supported | Supported
8 MiB | Not supported | Not supported | Supported | Supported | Supported | Supported | Supported | Supported
16 MiB | Not supported | Not supported | Not supported | Not supported | Supported | Supported | Supported | Supported
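For scripting, Table 2 can be transcribed into a simple lookup. The sketch below is only a convenience encoding of the table; the names are hypothetical and not part of any ECE interface.

    # Block sizes (KiB) that Table 2 lists as supported for each RAID code.
    SUPPORTED_BLOCK_SIZES_KIB = {
        "3WayReplication": {256, 512, 1024, 2048},
        "4WayReplication": {256, 512, 1024, 2048},
        "4+2P": {512, 1024, 2048, 4096, 8192},
        "4+3P": {512, 1024, 2048, 4096, 8192},
        "8+2P": {512, 1024, 2048, 4096, 8192, 16384},
        "8+3P": {512, 1024, 2048, 4096, 8192, 16384},
        "16+2P": {512, 1024, 2048, 4096, 8192, 16384},
        "16+3P": {512, 1024, 2048, 4096, 8192, 16384},
    }

    def block_size_supported(raid_code: str, block_size_kib: int) -> bool:
        """True if Table 2 lists the block size as supported for the code."""
        return block_size_kib in SUPPORTED_BLOCK_SIZES_KIB[raid_code]

    print(block_size_supported("4+2P", 256))    # False
    print(block_size_supported("8+3P", 16384))  # True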
Consider the device media type when you choose a block size:
  • SSD (NVMe or SAS) drives are better suited to smaller block sizes for small I/O workloads.
  • HDD drives are better suited to larger block sizes for large I/O workloads.

Even though the number of failures that can be tolerated in a smaller recovery group is the same as in a larger recovery group, the amount of critical data that must be rebuilt for each failure is smaller in a larger recovery group. For example, with an 8+3P array on an 11-node recovery group, every stripe spans all 11 nodes, so a three-node failure affects all of the data in the file system. On a 30-node recovery group, the three failed nodes hold only about 3/30, or 10%, of the data in the file system (assuming all disks are the same size). The critical rebuild also completes more quickly because the rebuild work is distributed across a larger number of remaining nodes.
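This contrast can be checked with some simple arithmetic. The sketch below assumes equal-size disks and uniform placement of each stripe across the recovery group, which is a simplification of the actual strip placement, and the helper name is illustrative.

    from math import comb

    # Share of the data resident on the failed nodes (equal-size disks).
    print(3 / 11)  # ~0.27 on an 11-node recovery group
    print(3 / 30)  # 0.10 -> about 10% on a 30-node recovery group

    def fraction_fully_hit(rg_nodes: int, stripe_nodes: int, failed: int) -> float:
        """Fraction of stripes that lose a strip on every failed node, assuming
        each stripe lands uniformly on stripe_nodes of the rg_nodes nodes."""
        return comb(rg_nodes - failed, stripe_nodes - failed) / comb(rg_nodes, stripe_nodes)

    # Each 8+3P stripe spans 11 nodes.
    print(fraction_fully_hit(11, 11, 3))  # 1.0 -> every stripe loses three strips
    print(fraction_fully_hit(30, 11, 3))  # ~0.04 -> few stripes lose all three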

When you plan the erasure code type, also consider future expansion of the cluster and storage utilization. The erasure code of a vdisk cannot be changed after the vdisk is created, and larger stripe widths have better storage utilization. A 4+3P code uses 57% of the total capacity for usable data, while an 8+3P code uses 73% of the total capacity for usable data. So, rather than creating a 9-node cluster with 4+3P and expanding it later, creating a cluster of 12 or more nodes that uses 8+3P might be more cost-effective. In some cases, using a non-recommended erasure code might be tolerable if there are plans to increase the cluster size.
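Those utilization figures are the ratio of data strips to total strips in a stripe. A minimal check:

    def storage_efficiency(data_strips: int, parity_strips: int) -> float:
        """Fraction of raw capacity that is available for usable data."""
        return data_strips / (data_strips + parity_strips)

    print(f"4+3P: {storage_efficiency(4, 3):.0%}")    # 57%
    print(f"8+3P: {storage_efficiency(8, 3):.0%}")    # 73%
    print(f"16+3P: {storage_efficiency(16, 3):.0%}")  # 84%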