Using the Ceph Manager balancer module

The balancer is a module for Ceph Manager (ceph-mgr) that optimizes the placement of placement groups (PGs) across OSDs in order to achieve a balanced distribution, either automatically or in a supervised fashion.

Currently, the balancer module cannot be disabled; it can only be turned off while you customize the configuration.

Modes

There are currently two supported balancer modes:

  • crush-compat: The crush-compat mode uses the compat weight-set feature, introduced in Ceph Luminous, to manage an alternative set of weights for devices in the CRUSH hierarchy. The normal weights should remain set to the size of the device to reflect the target amount of data that you want to store on the device. The balancer then optimizes the weight-set values, adjusting them up or down in small increments, in order to achieve a distribution that matches the target distribution as closely as possible. Because PG placement is a pseudorandom process, there is a natural amount of variation in the placement; by optimizing the weight-set values, the balancer counteracts that natural variation. An example of inspecting these alternative weights follows this list.

This mode is fully backwards compatible with older clients. When an OSDMap and CRUSH map are shared with older clients, the balancer presents the optimized weights as the real weights.

The primary restriction of this mode is that the balancer cannot handle multiple CRUSH hierarchies with different placement rules if the subtrees of the hierarchy share any OSDs. Because this configuration makes managing space utilization on the shared OSDs difficult, it is generally not recommended. As such, this restriction is normally not an issue.

  • upmap: Starting with Luminous, the OSDMap can store explicit mappings for individual OSDs as exceptions to the normal CRUSH placement calculation. These upmap entries provide fine-grained control over the PG mapping. This balancer mode optimizes the placement of individual PGs in order to achieve a balanced distribution. In most cases, this distribution is "perfect": an equal number of PGs on each OSD (+/-1 PG, because they might not divide evenly). An example of inspecting the upmap entries follows the note below.

    IMPORTANT:

       To allow use of this feature, you must tell the cluster that it only needs to support luminous or later clients with the following command:
    
      [ceph: root@host01 /]# ceph osd set-require-min-compat-client luminous
    
      This command fails if any pre-luminous clients or daemons are connected to the monitors.
    
      Due to a known issue, kernel CephFS clients report themselves as jewel clients. To work around this issue, use the `--yes-i-really-mean-it` flag:
    
      [ceph: root@host01 /]# ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
    
      You can check what client versions are in use with:
    
      [ceph: root@host01 /]# ceph features
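
If the crush-compat mode is in use, the alternative weights that the balancer maintains are stored in the compat weight set and can be inspected with the CRUSH weight-set commands. This is only a quick way to confirm that a compat weight set exists and to view its values; the output format varies between releases:

Example

[ceph: root@host01 /]# ceph osd crush weight-set ls
[ceph: root@host01 /]# ceph osd crush weight-set dump

If the upmap mode is in use, the explicit PG mappings that the balancer installs are stored in the OSDMap as pg_upmap_items entries. As a quick check, you can filter the OSDMap dump for them; no output simply means that no upmap exceptions are currently set:

Example

[ceph: root@host01 /]# ceph osd dump | grep upmap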

Prerequisites

  • A running IBM Storage Ceph cluster.

Procedure

  1. Ensure the balancer module is enabled:

    Example

    [ceph: root@host01 /]# ceph mgr module enable balancer
  2. Turn on the balancer module:

    Example

    [ceph: root@host01 /]# ceph balancer on
  3. The default mode is upmap. The mode can be changed with:

    Example

    [ceph: root@host01 /]# ceph balancer mode crush-compat

    or

    Example

    [ceph: root@host01 /]# ceph balancer mode upmap
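
As an optional check after completing this procedure, you can confirm that the balancer module appears in the list of enabled Ceph Manager modules. This uses the standard module listing command; its output format varies between releases:

Example

[ceph: root@host01 /]# ceph mgr module ls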

Status

The current status of the balancer can be checked at any time with:

Example

[ceph: root@host01 /]# ceph balancer status
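
To judge how evenly data is actually distributed, it can also help to review per-OSD utilization and placement group counts alongside the balancer status, for example with the standard command:

Example

[ceph: root@host01 /]# ceph osd df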

Automatic balancing

When you turn on the balancer module, automatic balancing is used by default:

Example

[ceph: root@host01 /]# ceph balancer on

The balancer can be turned off again with:

Example

[ceph: root@host01 /]# ceph balancer off

Automatic balancing uses the currently configured mode (upmap by default) and makes small changes to the data distribution over time to ensure that the OSDs are equally utilized.
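
Depending on the release, the balancer module also exposes options that control when automatic balancing runs, such as begin_time, end_time, and sleep_interval. The option names below follow the upstream balancer module and are shown only as a sketch; verify that they exist in your version before relying on them:

Example

[ceph: root@host01 /]# ceph config set mgr mgr/balancer/begin_time 0000
[ceph: root@host01 /]# ceph config set mgr mgr/balancer/end_time 0600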

Throttling

No adjustments will be made to the PG distribution if the cluster is degraded, for example, if an OSD has failed and the system has not yet healed itself.

When the cluster is healthy, the balancer throttles its changes such that the percentage of PGs that are misplaced, or need to be moved, is below a threshold of 5% by default. This percentage can be adjusted using the target_max_misplaced_ratio setting. For example, to increase the threshold to 7%:

Example

[ceph: root@host01 /]# ceph config set mgr target_max_misplaced_ratio .07
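
To verify the value that is currently in effect, assuming the setting was stored through the central configuration database as shown above, you can read it back:

Example

[ceph: root@host01 /]# ceph config get mgr target_max_misplaced_ratio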

Supervised optimization

The balancer operation is broken into a few distinct phases:

  1. Building a plan.

  2. Evaluating the quality of the data distribution, either for the current PG distribution, or the PG distribution that would result after executing a plan.

  3. Executing the plan.

    • To evaluate and score the current distribution:

      Example

      [ceph: root@host01 /]# ceph balancer eval
    • To evaluate the distribution for a single pool:

      Syntax

      ceph balancer eval POOL_NAME

      Example

      [ceph: root@host01 /]# ceph balancer eval rbd
    • To see greater detail for the evaluation:

      Example

      [ceph: root@host01 /]# ceph balancer eval-verbose ...
    • To generate a plan using the currently configured mode:

      Syntax

      ceph balancer optimize PLAN_NAME

      Replace PLAN_NAME with a custom plan name.

      Example

      [ceph: root@host01 /]# ceph balancer optimize rbd_123
    • To see the contents of a plan:

      Syntax

      ceph balancer show PLAN_NAME

      Example

      [ceph: root@host01 /]# ceph balancer show rbd_123
    • To discard old plans:

      Syntax

      ceph balancer rm PLAN_NAME

      Example

      [ceph: root@host01 /]# ceph balancer rm rbd_123
    • To see currently recorded plans, use the status command:

      Example

      [ceph: root@host01 /]# ceph balancer status
    • To calculate the quality of the distribution that would result after executing a plan:

      Syntax

      ceph balancer eval PLAN_NAME

      Example

      [ceph: root@host01 /]# ceph balancer eval rbd_123
    • To execute the plan:

      Syntax

      ceph balancer execute PLAN_NAME

      Example

      [ceph: root@host01 /]# ceph balancer execute rbd_123

      NOTE: Only execute the plan if it is expected to improve the distribution. After execution, the plan will be discarded.
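
Depending on the release, the balancer also provides commands to list all recorded plans and to discard every plan at once. If these commands are not available in your version, use the status and rm commands shown above instead:

Example

[ceph: root@host01 /]# ceph balancer ls
[ceph: root@host01 /]# ceph balancer reset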