Using the Ceph Manager balancer module
The balancer is a module for Ceph Manager (ceph-mgr) that optimizes the placement of placement groups (PGs) across OSDs in order to achieve a balanced distribution, either automatically or in a supervised fashion.
- crush-compat
- The CRUSH compat mode uses the compat weight-set feature to manage an alternative set of weights for devices in the CRUSH hierarchy. The normal weights should remain set to the size of the device to reflect the target amount of data that you want to store on the device. The balancer then optimizes the weight-set values, adjusting them up or down in small increments in order to achieve a distribution that matches the target distribution as closely as possible. Because PG placement is a pseudorandom process, there is a natural amount of variation in the placement; by optimizing the weights, the balancer counteracts that natural variation.
This mode is fully backwards compatible with older clients. When an OSDMap and CRUSH map are shared with older clients, the balancer presents the optimized weights as the real weights.
The primary restriction of this mode is that the balancer cannot handle multiple CRUSH hierarchies with different placement rules if the subtrees of the hierarchy share any OSDs. Because this configuration makes managing space utilization on the shared OSDs difficult, it is generally not recommended. As such, this restriction is normally not an issue.
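The weight-set idea described for crush-compat mode can be pictured with a toy simulation. The following Python sketch is not the balancer's actual algorithm: the placement function (a straw2-like weighted draw), the step size, and the names `place_pgs` and `optimize_weight_set` are all assumptions made for illustration. It only shows how nudging a shadow set of weights in small increments can pull a pseudorandom distribution closer to the target, while the real weights stay untouched.

```python
import hashlib
import math

def draw(pg, osd):
    """Deterministic pseudorandom number in (0, 1] for a (PG, OSD) pair."""
    digest = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64

def place_pgs(weights, num_pgs):
    """Assign each PG to the OSD with the highest weighted draw (toy model)."""
    counts = {osd: 0 for osd in weights}
    for pg in range(num_pgs):
        winner = max(weights, key=lambda osd: math.log(draw(pg, osd)) / weights[osd])
        counts[winner] += 1
    return counts

def worst_deviation(counts, real_weights):
    """Largest gap between an OSD's PG count and its weight-proportional target."""
    total_weight = sum(real_weights.values())
    total_pgs = sum(counts.values())
    return max(abs(counts[osd] - total_pgs * weight / total_weight)
               for osd, weight in real_weights.items())

def optimize_weight_set(real_weights, num_pgs, step=0.02, rounds=40):
    """Return a shadow weight set; the real weights are never modified."""
    total_weight = sum(real_weights.values())
    weight_set = dict(real_weights)
    best = dict(weight_set)
    best_dev = worst_deviation(place_pgs(weight_set, num_pgs), real_weights)
    for _ in range(rounds):
        counts = place_pgs(weight_set, num_pgs)
        for osd, weight in real_weights.items():
            target = num_pgs * weight / total_weight
            if counts[osd] > target:
                weight_set[osd] *= 1 - step   # overfull: nudge down
            elif counts[osd] < target:
                weight_set[osd] *= 1 + step   # underfull: nudge up
        dev = worst_deviation(place_pgs(weight_set, num_pgs), real_weights)
        if dev < best_dev:                    # keep the best set seen so far
            best, best_dev = dict(weight_set), dev
    return best

# Hypothetical cluster: two small and two large devices.
real = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 2.0}
tuned = optimize_weight_set(real, num_pgs=256)
print("real weights:", place_pgs(real, 256))
print("weight set:  ", place_pgs(tuned, 256))
```

Because the sketch keeps the best weight set seen so far, the result is never worse than placement with the real weights alone.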
- upmap
- Use upmap to optimize PG placement and achieve uniform distribution across OSDs. In most cases, this distribution is "perfect", with an equal number of PGs on each OSD (+/-1 PG, because the PGs might not divide evenly).
- Run the following command to check the PG distribution by device class:
    ceph osd df deviceclass
If the VAR values for hdd or ssd are outside the range [0.95 – 1.05], the balancer may be disabled or misconfigured.
If any OSD shows a REWEIGHT value other than 1.000, reset the reweight value of that OSD:
    ceph osd reweight osd.ID 1.000
- Improve balancer effectiveness by ensuring that pools have enough PGs. If the PGS value per OSD is less than 100, add PGs to the pools served by that device class.
When the PG autoscaler is enabled, apply the following settings:
    ceph config set global max_pg_per_osd 500
    ceph config set global mon_target_pg_per_osd 300
    ceph config set mgr/balancer/upmap_max_deviation 1
Note: Using ceph config set mgr/balancer/upmap_max_deviation 1 is useful when OSD sizes vary significantly.
- Overlapping CRUSH roots can prevent proper balancing.
Verify the pool rules by using the following command:
    ceph osd pool ls detail
- After verifying the pool rules, inspect the CRUSH rule.
    ceph osd crush rule dump | jq 'map(select(.rule_id == RULE_ID) | .steps)'
Each pool should use a rule that specifies a device class, for example, default~ssd. If any pool uses a rule without a device class, contact IBM Support for guidance.
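The distribution checks in the upmap steps above (VAR within [0.95, 1.05], REWEIGHT equal to 1.000, at least 100 PGs per OSD) can be applied mechanically. The Python sketch below assumes the `ceph osd df` data has been captured as JSON and that it contains a `nodes` list whose entries carry `name`, `device_class`, `var`, `reweight`, and `pgs`; treat those field names as assumptions to verify against your release. The sample data is made up.

```python
import json

VAR_RANGE = (0.95, 1.05)   # acceptable VAR range from the steps above
MIN_PGS_PER_OSD = 100      # below this, consider adding PGs to the pools

def check_osd_df(osd_df_json):
    """Return (osd name, device class, message) for every threshold violation."""
    findings = []
    for node in json.loads(osd_df_json)["nodes"]:
        name, dev_class = node["name"], node.get("device_class", "")
        if not VAR_RANGE[0] <= node["var"] <= VAR_RANGE[1]:
            findings.append((name, dev_class, f"VAR {node['var']:.2f} is out of range"))
        if abs(node["reweight"] - 1.0) > 1e-9:
            findings.append((name, dev_class, f"REWEIGHT is {node['reweight']:.3f}, not 1.000"))
        if node["pgs"] < MIN_PGS_PER_OSD:
            findings.append((name, dev_class, f"only {node['pgs']} PGs"))
    return findings

# Hypothetical sample, shaped like the assumed JSON layout.
sample_osd_df = json.dumps({"nodes": [
    {"name": "osd.0", "device_class": "ssd", "var": 1.01, "reweight": 1.0, "pgs": 120},
    {"name": "osd.1", "device_class": "ssd", "var": 1.12, "reweight": 1.0, "pgs": 131},
    {"name": "osd.2", "device_class": "hdd", "var": 0.99, "reweight": 0.85, "pgs": 80},
]})

for name, dev_class, message in check_osd_df(sample_osd_df):
    print(f"{name} ({dev_class}): {message}")
```

The function only reports findings; whether to reset a reweight value or add PGs remains an operator decision.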
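The CRUSH rule inspection in the upmap steps can also be scripted. This sketch assumes the JSON printed by `ceph osd crush rule dump` is a list of rules with `rule_id` and `steps` (as the jq filter in the steps implies), and additionally assumes that each `take` step has an `op` and an `item_name` field, where a device-class rule names a shadow root such as `default~ssd`. The `op` and `item_name` fields and the sample rules are assumptions for illustration.

```python
import json

def rules_without_device_class(rule_dump_json):
    """Return the rule_id of every rule whose take steps lack a ~class suffix."""
    missing = []
    for rule in json.loads(rule_dump_json):
        takes = [step for step in rule["steps"] if step.get("op") == "take"]
        # A rule with no take step, or a take on a plain root, is flagged.
        if not takes or any("~" not in step.get("item_name", "") for step in takes):
            missing.append(rule["rule_id"])
    return missing

# Hypothetical rule dump: rule 0 takes the plain root, rule 1 takes default~ssd.
sample_rules = json.dumps([
    {"rule_id": 0, "rule_name": "replicated_rule", "steps": [
        {"op": "take", "item": -1, "item_name": "default"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
        {"op": "emit"}]},
    {"rule_id": 1, "rule_name": "ssd_rule", "steps": [
        {"op": "take", "item": -2, "item_name": "default~ssd"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
        {"op": "emit"}]},
])

print(rules_without_device_class(sample_rules))
```

Any rule ID it returns can then be cross-checked against the pools listed by `ceph osd pool ls detail`.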
- read
- The OSDMap can store explicit mappings for individual primary OSDs as exceptions to the normal CRUSH placement calculation. These pg-upmap-primary entries provide fine-grained control over primary PG mappings. This mode optimizes the placement of individual primary PGs in order to achieve balanced reads (that is, a balanced distribution of primary PGs) in a cluster. In read mode, upmap behavior is not exercised, so this mode is best for use cases in which only read balancing is desired.
Important: To allow use of this feature, you must tell the cluster that it only needs to support Reef or later clients with the following command:
    ceph osd set-require-min-compat-client reef
This command fails if any pre-Reef clients or daemons are connected to the monitors. To work around this issue, use the --yes-i-really-mean-it flag:
    ceph osd set-require-min-compat-client reef --yes-i-really-mean-it
You can check which client versions are in use with the ceph features command. For example:
    [ceph: root@host01 /]# ceph features
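To make the notion of balanced primaries concrete, the following toy sketch counts primaries per OSD and shows the effect of explicit primary overrides, which play a role loosely analogous to pg-upmap-primary entries. It is a simplified model, not the balancer's algorithm: the data layout (first OSD in each list is the primary) and the names `primary_counts` and `spread` are assumptions for illustration.

```python
from collections import Counter

def primary_counts(acting_sets, primary_overrides=None):
    """Count primaries per OSD; an override replaces the default (first) primary."""
    overrides = primary_overrides or {}
    counts = Counter()
    for pg, osds in acting_sets.items():
        primary = overrides.get(pg, osds[0])
        if primary not in osds:
            raise ValueError(f"override for {pg} must be one of its OSDs")
        counts[primary] += 1
    return counts

def spread(counts, all_osds):
    """Difference between the most and least loaded OSD, in primaries."""
    values = [counts.get(osd, 0) for osd in all_osds]
    return max(values) - min(values)

# Hypothetical PG-to-OSD mapping; osd 0 starts out as primary for four PGs.
acting_sets = {
    "1.0": [0, 1, 2], "1.1": [0, 2, 1], "1.2": [0, 1, 2],
    "1.3": [1, 2, 0], "1.4": [0, 2, 1], "1.5": [2, 0, 1],
}
osds = [0, 1, 2]

before = primary_counts(acting_sets)
after = primary_counts(acting_sets, {"1.1": 2, "1.2": 1})
print("spread before:", spread(before, osds))
print("spread after: ", spread(after, osds))
```

Only the choice of primary changes in this model; the set of OSDs that hold each PG stays the same, which mirrors the statement that upmap behavior is not exercised in read mode.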
- upmap-read
- This balancer mode combines the optimization benefits of both upmap and read mode. As in read mode, upmap-read makes use of pg-upmap-primary. Use upmap-read to get both the balanced PG distribution of upmap mode and the balanced reads of read mode.
Important: To allow use of this feature, you must tell the cluster that it only needs to support Reef or later clients with the following command:
    ceph osd set-require-min-compat-client reef
This command fails if any pre-Reef clients or daemons are connected to the monitors. To work around this issue, use the --yes-i-really-mean-it flag:
    ceph osd set-require-min-compat-client reef --yes-i-really-mean-it
You can check which client versions are in use with the ceph features command. For example:
    [ceph: root@host01 /]# ceph features
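Before requiring Reef clients, it can help to scan the `ceph features` data for older releases. The sketch below assumes the command's JSON output maps each group (for example `mon`, `osd`, `client`) to a list of entries with `release` and `num` fields; that layout, the release ordering list, and the sample data are assumptions to check against your cluster.

```python
import json

# Release names in chronological order; anything not listed is treated as pre-Reef.
RELEASES = ["luminous", "mimic", "nautilus", "octopus", "pacific", "quincy", "reef", "squid"]

def pre_reef_entries(features_json):
    """Return (group, release, count) for every entry older than Reef."""
    reef = RELEASES.index("reef")
    found = []
    for group, entries in json.loads(features_json).items():
        for entry in entries:
            release = entry.get("release")
            if release not in RELEASES or RELEASES.index(release) < reef:
                found.append((group, release, entry.get("num", 0)))
    return found

# Hypothetical sample, shaped like the assumed JSON layout.
sample_features = json.dumps({
    "mon": [{"release": "reef", "num": 3}],
    "osd": [{"release": "reef", "num": 12}],
    "client": [{"release": "luminous", "num": 2}, {"release": "reef", "num": 5}],
})

for group, release, num in pre_reef_entries(sample_features):
    print(f"{num} {group} connection(s) report release {release}")
```

An empty result suggests the plain `ceph osd set-require-min-compat-client reef` command is less likely to be rejected; a non-empty result shows which groups to investigate before considering the --yes-i-really-mean-it flag.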