Node‑rack distribution considerations

Node‑rack distribution considerations describe how IBM Storage Scale Erasure Code Edition nodes are distributed across racks and how this distribution affects data protection and fault tolerance.

Starting with IBM Storage Scale Erasure Code Edition 6.0.1.0, the system supports rack failure domains. Rack‑aware placement improves fault tolerance by spreading erasure‑coded data across racks instead of concentrating it within a single rack.

When you plan erasure code selection, consider how node distribution across racks can affect the effective failure tolerance.

Balanced node-rack distribution

In a configuration with four racks and one ECE node per rack, each node sits in its own rack. With a 4+3P erasure code, this layout tolerates the failure of one node. Because each rack contains only one node, losing a rack is equivalent to losing a node, so the system can also tolerate the loss of one rack.

Unbalanced node-rack distribution without rack awareness

Consider a configuration with three racks and four ECE nodes, where one rack contains two nodes. Although the number of nodes and the erasure code (4+3P) are unchanged, the effective protection is significantly reduced.

With a 4+3P erasure code, the simultaneous loss of both nodes in the shared rack exceeds the single‑node fault tolerance, which prevents the system from tolerating the loss of that rack.

A 4+3P erasure code tolerates the loss of one node per stripe. In a layout where two nodes share the same rack, a rack‑level infrastructure failure can remove both nodes at once, exceeding the erasure code’s fault tolerance and eliminating effective rack‑level protection.
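The arithmetic behind the two layouts can be sketched with a simplified model. The sketch assumes that the 7 strips of a 4+3P stripe (4 data, 3 parity) are spread as evenly as possible across nodes with no knowledge of racks, and that the stripe survives as long as no more than 3 strips are lost. The function names are hypothetical, and the model is an illustration only, not the actual ECE placement logic.

```python
def worst_rack_loss_node_level(nodes_per_rack, data=4, parity=3):
    """Most strips of one stripe that can be lost when the largest rack fails.

    Assumption: strips are balanced across nodes only, so in the worst case
    the fullest nodes all sit in the rack that fails.
    """
    strips = data + parity
    nodes = sum(nodes_per_rack)
    # Every node holds `base` strips; `extra` nodes hold one more.
    base, extra = divmod(strips, nodes)
    largest = max(nodes_per_rack)
    return largest * base + min(largest, extra)


def survives_rack_loss(nodes_per_rack, data=4, parity=3):
    """True if the worst-case rack loss stays within the parity count."""
    return worst_rack_loss_node_level(nodes_per_rack, data, parity) <= parity


# Four racks with one node each versus three racks where one rack holds two nodes.
balanced = survives_rack_loss([1, 1, 1, 1])
unbalanced = survives_rack_loss([2, 1, 1])
print(balanced, unbalanced)
```

Under these assumptions, a single node holds at most 2 of the 7 strips, so losing a one-node rack stays within the 3-strip limit. The two-node rack can hold up to 4 strips, which is more than the code can rebuild.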

Improved protection with rack failure domains

By configuring rack failure domains, ECE becomes aware of how nodes are physically distributed across racks. This awareness allows GNR declustered arrays to place data and parity stripe members more intelligently, reducing the likelihood that multiple stripe members are placed in the same rack.

In the same three‑rack, four‑node layout, rack‑aware placement can distribute data and parity stripe members so that no single rack holds more of them than the erasure code can tolerate losing. This placement improves resilience so that the system can tolerate the failure of one rack, even when a rack hosts multiple nodes. For this reason, rack failure domains are a critical design consideration whenever node distribution across racks is uneven.
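The effect of rack awareness on the three-rack layout can be illustrated with a simplified calculation. The sketch assumes that rack-aware placement balances the 7 strips of a 4+3P stripe across racks first, so that no rack holds more than its even share. The function name is hypothetical, and the model does not represent the real GNR placement algorithm.

```python
from math import ceil


def worst_rack_loss_rack_aware(nodes_per_rack, data=4, parity=3):
    """Most strips of one stripe held by any single rack.

    Assumption: strips are balanced across racks, regardless of how many
    nodes each rack contains.
    """
    strips = data + parity
    return ceil(strips / len(nodes_per_rack))


# Three racks, four nodes, one rack holding two nodes.
loss = worst_rack_loss_rack_aware([2, 1, 1])
tolerated = loss <= 3  # a 4+3P stripe survives the loss of up to 3 strips
print(loss, tolerated)
```

In this model the fullest rack holds 3 of the 7 strips, which matches the number of strips that 4+3P can lose, so the loss of any one rack remains recoverable.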