Backfillfull OSDs

Understand and troubleshoot backfillfull OSDs.

If OSDs are considered backfillfull, the ceph health detail command returns an error message similar to the following example:
3 backfillfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 32 pgs backfill_toofull

What this means

When one or more OSDs have exceeded the backfillfull threshold, Ceph prevents data from rebalancing to those devices. This is an early warning that rebalancing might not complete and that the cluster is approaching full. The default for the backfillfull threshold is 90%.
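The gating behavior described above can be sketched as a simple ratio check. This is a minimal illustration, not Ceph source code; the function name can_backfill and the byte-count parameters are hypothetical.

```python
# Hypothetical sketch of the backfillfull gate: an OSD whose used/total
# ratio meets or exceeds the backfillfull threshold (default 0.90) is
# skipped as a backfill target, and its PGs report backfill_toofull.

BACKFILLFULL_RATIO = 0.90  # Ceph default threshold (90%)

def can_backfill(used_bytes: int, total_bytes: int,
                 ratio: float = BACKFILLFULL_RATIO) -> bool:
    """Return True if the OSD may still accept backfill data."""
    return (used_bytes / total_bytes) < ratio

print(can_backfill(85, 100))  # 85% used: below threshold, backfill allowed
print(can_backfill(92, 100))  # 92% used: over threshold, backfill blocked
```

Note that the comparison is strict: an OSD exactly at the threshold is already considered backfillfull.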

Troubleshooting this problem

Determine what percentage of raw storage (%RAW USED) is in use:
ceph df
If %RAW USED is higher than 70-75%, take any of the following actions:
  • Delete unnecessary data as a short-term solution to avoid production downtime.
  • For a long-term solution, scale the cluster by adding an OSD node.
  • Increase the backfillfull ratio for the OSDs that contain the PGs stuck in backfill_toofull to allow the recovery process to continue. Add new storage to the cluster as soon as possible, or remove data, to prevent filling more OSDs.
ceph osd set-backfillfull-ratio VALUE

The range for VALUE is 0.0 to 1.0.

For example:
[ceph: root@host01 /]# ceph osd set-backfillfull-ratio 0.92
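Because the command rejects values outside the 0.0 to 1.0 range, it can help to validate the new ratio before applying it. The following is a hedged sketch; the helper name set_backfillfull_cmd is hypothetical and simply builds the CLI string shown above.

```python
# Hypothetical helper: validate a new backfillfull ratio and build the
# corresponding ceph CLI command string. Ceph rejects ratios outside
# the range 0.0 to 1.0.

def set_backfillfull_cmd(value: float) -> str:
    """Return the ceph command for a valid ratio, or raise ValueError."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"ratio must be between 0.0 and 1.0, got {value}")
    return f"ceph osd set-backfillfull-ratio {value}"

print(set_backfillfull_cmd(0.92))  # ceph osd set-backfillfull-ratio 0.92
```

Raising the ratio only buys time for the stuck backfills to drain; a value at or below the OSDs' current usage will not unblock them.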

For more information, see Nearfull OSDs and Deleting data from a full storage cluster.