Full OSDs
Understand and troubleshoot full OSDs.
HEALTH_ERR 1 full osds osd.3 is full at 95%
What this means
Ceph prevents clients from running I/O operations on full OSD nodes to avoid losing data. It returns the HEALTH_ERR full osds message when at least one OSD reaches the usage threshold set by the mon_osd_full_ratio parameter. By default, this parameter is set to 0.95, which means 95% of the OSD's capacity.
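To confirm which thresholds are currently in effect, the OSD map can be inspected. The following is a minimal sketch, assuming that ceph osd dump prints lines of the form full_ratio 0.95; the helper name show_ratios is hypothetical and is not part of the Ceph CLI.

```shell
# Hypothetical helper: read `ceph osd dump` output on stdin and print
# the capacity thresholds it contains as name=value pairs.
show_ratios() {
  awk '$1 ~ /^(full|backfillfull|nearfull)_ratio$/ { print $1 "=" $2 }'
}

# Intended use on a live cluster (not executed here):
#   ceph osd dump | show_ratios
```

Keeping the parsing in a function that reads stdin means it can be tried against saved output before it is run against a production cluster.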
Troubleshooting this problem
To diagnose and resolve full OSDs, complete the following steps:
- Run ceph osd df to check how full each OSD is. When invoked without arguments, it reports on the entire cluster. You can also filter by device class or a single OSD.
ceph osd df
ceph osd df ssd
ceph osd df osd.1701
- Review the %USE column to see how full each OSD is. The VAR column shows how each OSD compares to the average. The STDDEV value indicates how evenly OSDs are filled.
If ceph osd df ssd or ceph osd df nvme reports a standard deviation greater than 2.0, the balancer might not be enabled or functioning correctly.
- If ceph df shows that average usage for the device class of the full OSD is much lower than the usage of the full OSD itself, this is likely a balancer issue. For more information, see Using the Ceph Manager balancer module.
- If OSDs within the device class are balanced, use the following approaches to reduce usage:
- Check for benchmark detritus.
If rados bench was run against this pool, orphaned data might remain. Check by running the following command:
rados -p mypoolname ls | grep -c bench
If this command returns a large number, consult IBM Support for safe removal by using the rados rm command.
- Request that users remove temporary, outdated, or non-essential data.
- Run fstrim on Ceph Block Device clients that use conventional file systems on volumes in a pool that uses the full OSD. This discards unused blocks and frees capacity.
- Add nodes or OSDs of the appropriate device class so the balancer can redistribute data and reduce fullness.
- As an emergency measure, you can temporarily raise the full ratio.
ceph osd set-full-ratio 0.98
Important: Raising the full ratio is a temporary emergency measure. Restore the default setting as soon as possible to reduce the risk of cluster instability.
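The %USE review in the diagnostic steps above can be partly automated. The following is a rough sketch, not a supported tool: it assumes the plain-text ceph osd df layout in which the OSD ID is the first field and %USE is followed only by single-token columns (such as VAR, PGS, and STATUS), which might differ between releases. The function name osds_over and the limit of 85 are illustrative assumptions.

```shell
# Hypothetical helper: read plain `ceph osd df` output on stdin and print
# every OSD whose %USE is at or above the limit given as the first argument.
osds_over() {
  awk -v limit="$1" '
    BEGIN { off = -1 }
    # Header row: record how far %USE sits from the right-hand end,
    # because size columns such as "10 GiB" split into two tokens.
    off < 0 {
      for (i = 1; i <= NF; i++) if ($i == "%USE") off = NF - i
      next
    }
    # Data rows start with a numeric OSD ID; summary rows do not.
    $1 ~ /^[0-9]+$/ && NF > off && ($(NF - off) + 0) >= (limit + 0) {
      printf "osd.%s %s\n", $1, $(NF - off)
    }
  '
}

# Intended use on a live cluster (not executed here):
#   ceph osd df ssd | osds_over 85
```

Counting columns from the right avoids the misalignment that the two-word RAW USE header and the unit suffixes cause when columns are counted from the left.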
For more information, see Nearfull OSDs and Deleting data from a full storage cluster.