Backfillfull OSDs
Understand and troubleshoot backfillfull OSDs.
If OSDs are considered backfillfull, the ceph health detail command returns a warning similar to the following example:
health: HEALTH_WARN
    3 backfillfull osd(s)
    Low space hindering backfill (add storage if this doesn't resolve itself): 32 pgs backfill_toofull
What this means
When one or more OSDs have exceeded the backfillfull threshold, Ceph prevents data from rebalancing to those OSDs. This is an early warning that rebalancing might not complete and that the cluster is approaching full. The default for the backfillfull threshold is 90%.
Troubleshooting this problem
Run the ceph df command to determine the percentage of raw storage that is used (%RAW USED).
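As a rough sketch of this check, the following shell helper compares a usage percentage against a threshold. The function name raw_used_exceeds is hypothetical and not part of Ceph. The percentage is whatever value ceph df reports in the %RAW USED column, and the value 78.4 in the usage comment is purely illustrative.

```shell
# Hypothetical helper (not a Ceph command): succeeds (exit status 0)
# when PERCENT is numerically greater than THRESHOLD.
# Usage: raw_used_exceeds PERCENT THRESHOLD
raw_used_exceeds() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used > limit) }'
}

# Illustrative usage: read %RAW USED from the ceph df output, then pass it in.
#   ceph df
#   raw_used_exceeds 78.4 70 && echo "Raw usage is above 70%"
```

The comparison is done in awk because plain shell arithmetic does not handle fractional percentages.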
If %RAW USED is higher than 70-75%, troubleshoot by using any of the following options:
- For a long-term solution, scale the cluster by adding an OSD node.
- Delete unnecessary data.
Note: This is a short-term solution to avoid production downtime.
For Ceph Block Device pools, also run the fstrim command on clients with conventional file systems on volumes served by pools that use the backfillfull OSD.
- Increase the backfillfull ratio for the OSDs that contain the PGs stuck in backfill_toofull to allow the recovery process to continue. Add new storage to the cluster as soon as possible or remove data to prevent filling more OSDs.
Important: Increasing the backfillfull ratio is a temporary emergency action. Restore the default setting as soon as possible to reduce the risk of cluster instability.
ceph osd set-backfillfull-ratio VALUE
The range for VALUE is 0.0 to 1.0.
For example:
[ceph: root@host01 /]# ceph osd set-backfillfull-ratio 0.92
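The temporary ratio change and the later restore can be sketched as a small wrapper. This is a minimal sketch, not an official procedure: the function names valid_ratio and set_backfillfull_ratio are hypothetical, and the sketch assumes that it runs in a shell where the ceph CLI can reach the cluster. It uses only the ceph osd set-backfillfull-ratio command shown above and ceph osd dump, whose output includes the ratios currently in effect.

```shell
# Hypothetical helper: succeeds when VALUE is a plain decimal number
# within the documented range of 0.0 to 1.0.
valid_ratio() {
    awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v >= 0 && v <= 1) }'
}

# Hypothetical wrapper: validate VALUE, apply it, then print the
# ratio lines from ceph osd dump so the change can be confirmed.
set_backfillfull_ratio() {
    valid_ratio "$1" || { echo "ratio must be between 0.0 and 1.0" >&2; return 1; }
    ceph osd set-backfillfull-ratio "$1" && ceph osd dump | grep -i ratio
}
```

For example, set_backfillfull_ratio 0.92 raises the threshold temporarily. After storage is added or data is removed, set_backfillfull_ratio 0.90 restores the 90% default described earlier.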
For more information and additional resolution strategies, see the following: