Distributing data across a cluster
You can distribute data uniformly across a cluster in the following ways:
- To ensure that the data is distributed evenly across all failure groups and across all nodes within a failure group, import the data through a node that has no attached NSDs and that acts as a GPFS client node in the cluster.
- Use a write affinity depth of 0 across the cluster.
- Make every GPFS node an ingest node and deliver data equally across all ingest nodes. However, this strategy is expensive to implement.
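For the second option, write affinity depth is a property of an FPO storage pool and can be set in the stanza file that is passed to mmcrfs. The fragment below is an illustrative sketch only; the pool name, block size, and block group factor are assumptions, not values from this document:

```
# Hypothetical %pool stanza for an FPO file system.
# writeAffinityDepth=0 stripes writes across the cluster
# instead of favoring the local (ingest) node.
%pool:
  pool=datapool
  blockSize=1M
  layoutMap=cluster
  allowWriteAffinity=yes
  writeAffinityDepth=0
  blockGroupFactor=128
```

With writeAffinityDepth=0, the first replica is placed by wide striping rather than on the writing node, which spreads ingest load evenly at the cost of data locality.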
Ideally, all failure groups should have an equal number of disks with roughly equal capacity. If one failure group is much smaller than the rest, it is likely to fill up faster than the others, which complicates later rebalancing actions.
After the initial ingest of data, the cluster might be unbalanced. In such a situation, use the mmrestripefs command with the -b option to rebalance the data.
Note: For FPO users, the mmrestripefs -b command breaks the original data placement that follows the data locality rule.
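A typical rebalancing session might look like the following sketch. The file system name fs1 is illustrative; run the commands on a node with GPFS administrative authority, and on FPO clusters weigh the locality caveat above before rebalancing:

```shell
# Check how data is spread across disks and failure groups
# before deciding to rebalance (illustrative file system name).
mmdf fs1

# Rebalance all files across the disks of the file system.
# This is I/O intensive; consider running it during off-peak hours.
mmrestripefs fs1 -b

# Verify the resulting distribution.
mmdf fs1
```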