Calculating the failure tolerance for vaults

The failure tolerance value for vaults represents the tolerance of a vault to drive failures across all of the COS Slicestor nodes in its storage pool. The lower the value for drive failure tolerance, the greater the risk that write operations can no longer be completed for the vault.

The calculation of drive failure tolerance is based on the following values:
  • The write threshold value in the IDA that is configured for the vault. The write threshold is the minimum number of COS Slicestor nodes that must be available to complete write operations for the vault.
  • The number of available COS Slicestor nodes in the storage pool that the vault belongs to.
  • The value for drive failure tolerance for each of the available COS Slicestor nodes in the storage pool. This value represents the remaining tolerance of the node to drive failures and is based on the number of failed drives in the node and the drive error threshold. A value of 2 means that the node will be unavailable to store vault data if 2 more drives fail in the node.
For example, a vault in your IBM Cloud Object Storage environment is configured with an IDA of 8-4-6. The IDA determines that the vault data is stored across 8 COS Slicestor nodes and a minimum of 6 nodes must be available to complete write operations for the vault.
The nodes have the following values for drive failure tolerance:
Node      Drive Failure Tolerance
node_1    3
node_2    4
node_3    0*
node_4    2
node_5    4
node_6    1
node_7    3
node_8    3
*A failure tolerance value of 0 means that node_3 is not available to store vault data.

Seven COS Slicestor nodes are available in the storage pool, and a minimum of 6 nodes must be available to complete write operations. Therefore, if 2 more nodes fail, only 5 nodes remain available, which is below the write threshold value, and write operations can no longer be completed for the vault.
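
The node count in this example can be checked with a short Python sketch. It is only an illustration: the dictionary is typed in by hand from the table above, and the variable names are hypothetical rather than part of any IBM Cloud Object Storage interface.

```python
# Drive failure tolerance per node, copied by hand from the example table.
node_tolerance = {
    "node_1": 3, "node_2": 4, "node_3": 0, "node_4": 2,
    "node_5": 4, "node_6": 1, "node_7": 3, "node_8": 3,
}
write_threshold = 6  # third value of the example IDA 8-4-6

# A node counts as available when its drive failure tolerance is greater than 0.
available_nodes = [name for name, tol in node_tolerance.items() if tol > 0]

# Number of additional node failures that takes the pool below the write threshold.
node_failures_to_block_writes = len(available_nodes) - write_threshold + 1

print(len(available_nodes), node_failures_to_block_writes)
```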

Use the following steps to calculate the value for drive failure tolerance for a vault:
  1. Use the IDA column on the Vaults page to identify the write threshold value for the vault. In the example, the write threshold value is 6.
  2. Use the Slicestor Nodes page to determine the number of available COS Slicestor nodes in the storage pool that the vault belongs to. A value > 0 in the Drive Failure Tolerance column means that the node is available. In the example, 7 nodes are available.
  3. Calculate how many nodes must fail for the number of available nodes to fall below the write threshold value. In the example, (7 - 6) + 1 = 2. When 2 nodes fail, the vault is not available for write operations.
  4. Use the Drive Failure Tolerance column on the Slicestor Nodes page to identify the 2 available nodes in the storage pool that have the lowest tolerance to drive failures.
    In the example, the 2 nodes are as follows:
    Node      Drive Failure Tolerance
    node_4    2
    node_6    1
  5. To calculate the overall failure tolerance for the vault, sum the failure tolerance values for the 2 nodes. In the example, 2 + 1 = 3.

    The drive failure tolerance for the vault is 3. In the worst case, if 3 drives fail (2 in node_4 and 1 in node_6), write operations can no longer be completed for the vault.
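
The five steps can also be sketched as a small Python function. This is a minimal sketch under stated assumptions, not an IBM-provided tool: the function name and its inputs are hypothetical, the per-node values must be read manually from the Slicestor Nodes page, and returning 0 when the pool is already below the write threshold is an assumption that the steps above do not cover.

```python
def vault_drive_failure_tolerance(node_tolerances, write_threshold):
    """Return the drive failure tolerance of a vault.

    node_tolerances: drive failure tolerance of every node in the storage pool.
    write_threshold: write threshold value from the vault's IDA.
    """
    # Step 2: only nodes with a tolerance greater than 0 are available.
    available = sorted(t for t in node_tolerances if t > 0)

    # Step 3: node failures needed to fall below the write threshold.
    nodes_to_lose = len(available) - write_threshold + 1
    if nodes_to_lose <= 0:
        # Assumption: writes are already blocked, so report no tolerance.
        return 0

    # Steps 4 and 5: sum the tolerances of the least tolerant available nodes.
    return sum(available[:nodes_to_lose])


# Values from the example: node_1 through node_8, write threshold 6.
example = [3, 4, 0, 2, 4, 1, 3, 3]
print(vault_drive_failure_tolerance(example, 6))  # prints 3
```

Sorting the available nodes in ascending order puts the least tolerant nodes first, so the slice selects the same nodes that step 4 identifies by inspection.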