How To
Summary
This article describes the steps to identify orphan cephfs subvolumes, clones, and snapshots, and to delete them safely.
Objective
- There are situations where the ceph cluster is full, with a status like:
sh-4.4$ ceph -s
cluster:
id: aaaaaaaa-xxxx-xxxx-aaaa-bbbbbbbbbbbb
health: HEALTH_ERR
16 backfillfull osd(s)
1 full osd(s)
4 nearfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 19 pgs backfill_toofull
10 pgs not deep-scrubbed in time
9 pgs not scrubbed in time
12 pool(s) full
- Sometimes this is because stale cephfs subvolumes are present in the ceph cluster, which do not have a corresponding PV in OCP.
- In this document, we provide the principles to identify them and to know whether we can delete them safely.
Environment
Steps
Data collection
What data should we request to analyze a clone/snapshot issue?
From ceph toolbox:
- List of clone status:
for i in `ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi | jq -r '.[].name'` ; do echo subvolume $i: ; echo -----;ceph fs clone status ocs-storagecluster-cephfilesystem $i csi; echo ----- ; done > /tmp/clonestatus-$(date +%F-%H-%M-%S).txt
- List of snapshots:
for i in `ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi | jq -r '.[].name'` ; do echo subvolume $i: ; echo -----; ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem $i csi; echo ----- ; done > /tmp/snaplist-$(date +%F-%H-%M-%S).txt
- Snapshot info of all snapshots:
for subvolume in $(ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi | jq -r '.[].name'); do for snap in $(ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem ${subvolume} --group_name csi|jq -r '.[].name'); do echo ${subvolume} ${snap} ; ceph fs subvolume snapshot info ocs-storagecluster-cephfilesystem ${subvolume} ${snap} --group_name csi ; done; done > /tmp/cephfs_snap_info_all-$(date +%F-%H-%M-%S).txt
- From oc bastion:
oc get volumesnapshot -A > oc-snapshot-$(date +%F-%H-%M-%S).out
oc get volumesnapshot -A -o yaml > oc-snapshot-$(date +%F-%H-%M-%S).yaml
oc get volumesnapshotcontent > oc-snapshotcontent-$(date +%F-%H-%M-%S).out
oc get volumesnapshotcontent -o yaml > oc-snapshotcontent-$(date +%F-%H-%M-%S).yaml
oc get pv > pv$(date +%F-%H-%M-%S).out
oc get pv -o yaml > pv$(date +%F-%H-%M-%S).yaml
Deletion Rules of CephFS Subvolume
Key concepts to know whether a cephfs subvolume can be deleted or not
- Each cephfs subvolume should have a corresponding PV. The number of cephfs PVs [1] must be the same as the number of cephfs subvolumes [2].
[1] # oc get pv | grep cephfs
Note: this assumes all Storage Classes with provisioner openshift-storage.cephfs.csi.ceph.com have a SC name that includes "ceph".
[2] # ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi
- The list [2] provides the existing cephfs subvolumes, which can be of two types: subvolume and clone. In either case, a PV should exist for both types.
- If the PV count and the cephfs subvolume count differ, check for orphan cephfs subvolumes.
- Pick the cephfs subvolumes listed in [2] one by one and search for their corresponding PV:
# oc get pv -o yaml | grep <subvolume-name>
If a subvolume has no entry in the above output, it does not have a corresponding PV. Such a subvolume can be deleted, with one exception:
- Exception: cephfs subvolumes that are used as source volumes for snapshots should not be deleted. When a cephfs subvolume has snapshots, delete its snapshots first; to know whether a snapshot can be deleted, see the Snapshot section below.
- If the subvolume is orphan and does not have any snapshots, delete it:
# ceph fs subvolume rm ocs-storagecluster-cephfilesystem <subvolume-name> csi
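The PV-versus-subvolume comparison above can be sketched as a small script. This is a minimal sketch with hypothetical sample data (the file paths and csi-vol names below are invented for illustration); on a real cluster the two files would be produced from `oc get pv` and `ceph fs subvolume ls` as shown in the comments:

```shell
# Hypothetical input files; on a real cluster generate them with:
#   oc get pv -o yaml | grep -o 'csi-vol-[0-9a-f-]*' | sort -u > /tmp/pv-subvols.txt
#   ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi \
#     | jq -r '.[].name' | sort -u > /tmp/ceph-subvols.txt
printf 'csi-vol-aaa\ncsi-vol-bbb\n' > /tmp/pv-subvols.txt
printf 'csi-vol-aaa\ncsi-vol-bbb\ncsi-vol-ccc\n' > /tmp/ceph-subvols.txt

# Lines present only in the second file: subvolumes in ceph with no matching PV
comm -13 /tmp/pv-subvols.txt /tmp/ceph-subvols.txt
```

Any subvolume printed by `comm` is an orphan candidate, but it must still pass the snapshot check above before deletion. Both input files must be sorted for `comm` to work correctly.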
Deletion Rules of CephFS Snapshot
Key concepts to know whether a cephfs snapshot can be deleted or not
- Any VolumeSnapshotContent based on cephfs [3] should have a corresponding entry in the cephfs snapshot list [4], i.e. the number of cephfs VolumeSnapshotContent objects must be the same as the number of cephfs snapshots.
[3] List of cephfs VolumeSnapshotContent:
$ oc get volumesnapshotcontent | grep openshift-storage.cephfs.csi.ceph.com
[4] List of cephfs snapshots:
$ for i in `ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi --format json | jq '.[] | .name' | cut -f 2 -d '"'`; do echo "Subvolume : $i"; ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem $i csi; done
- Currently, there is no ceph command to list all existing cephfs snapshots at once; however, you can list all the snapshots of one cephfs subvolume. List [4] collects the snapshot list from each cephfs subvolume.
- We can delete a snapshot when two conditions are met:
1. It has no associated VolumeSnapshotContent, i.e. it appears in the snapshot list [4] but has no corresponding VolumeSnapshotContent [3].
AND
2. It has NO pending clones, i.e. its snapshot info shows "has_pending_clones": "no".
- You can get the snapshot info to check the clone status with this command:
$ for subvolume in $(ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi | jq -r '.[].name'); do for snap in $(ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem ${subvolume} --group_name csi|jq -r '.[].name'); do echo ${subvolume} ${snap} ; ceph fs subvolume snapshot info ocs-storagecluster-cephfilesystem ${subvolume} ${snap} --group_name csi ; done; done > /tmp/cephfs_snap_info_all.txt
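The second condition can be checked mechanically per snapshot. The snippet below is a minimal sketch: the JSON in `snap_info` is a hypothetical sample of what `ceph fs subvolume snapshot info` returns; on a live cluster the variable would be filled from that command instead:

```shell
# Hypothetical sample of `ceph fs subvolume snapshot info ...` output;
# on a real cluster:
#   snap_info=$(ceph fs subvolume snapshot info \
#     ocs-storagecluster-cephfilesystem <subvolume> <snapshot> --group_name csi)
snap_info='{"created_at": "2024-01-01 00:00:00", "has_pending_clones": "no"}'

if printf '%s' "$snap_info" | grep -q '"has_pending_clones": "no"'; then
  echo "no pending clones: snapshot may be deletable"
else
  echo "pending clones exist: do NOT delete this snapshot"
fi
```

Remember that this check alone is not sufficient: the snapshot must also lack an associated VolumeSnapshotContent before it is safe to delete.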
Cephfs Clone state
Source: Ceph File System > FS volumes and subvolumes > Cloning Snapshots
- A clone can be in one of the following states:
pending : Clone operation has not started
in-progress : Clone operation is in progress
complete : Clone operation has successfully finished
failed : Clone operation has failed
canceled : Clone operation is cancelled by user
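As a minimal sketch of acting on these states, the snippet below maps a clone state string to a triage message. The `state` value is hypothetical sample data; on a real cluster it would come from `ceph fs clone status` as shown in the comment:

```shell
# Hypothetical clone state; on a real cluster it would come from:
#   ceph fs clone status ocs-storagecluster-cephfilesystem <clone-name> csi \
#     | jq -r '.status.state'
state="in-progress"

case "$state" in
  complete)            echo "clone finished" ;;
  pending|in-progress) echo "clone still running: $state" ;;
  failed|canceled)     echo "clone needs attention: $state" ;;
esac
```

Clones in the failed or canceled state are the usual suspects when stale subvolumes accumulate, so those are the ones to investigate first.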
Additional Information
Root Cause:
A) The main reason for stale clones is that they were created from a Storage Class with RECLAIMPOLICY Retain, and users forget to manually delete all associated resources: PVC, PV, and cephfs subvolume.
With Storage Class with RECLAIMPOLICY Retain
- When you delete the PVC (not in active use by a pod), only the PVC is deleted, but the PV remains as Retain and Bound, for example:
pvc-4d5c3028-1b77-46cc-ae06-064331b5d12d 10Gi RWX Retain Bound my-shared-storage-2/shared-retain retain-cephfs
This is the expected behaviour, as explained on Kubernetes Documentation / Concepts / Storage / Persistent Volumes / Retain
- After that, when you delete the PV, the PV is deleted from OCP, but the cephfs subvolumes remain in the ceph system (when there is no finalizer)
Conclusion: The User needs to manually remove all resources, one by one: PVC, PV, and then cephfs subvolume.
B) The second reason is when the PVC is deleted, AND there is still a cephfs snapshot associated
With Storage Class with RECLAIMPOLICY Delete
- When you delete a PVC (not in active use by a pod), the PV and the associated cephfs subvolume are deleted.
- If the PVC has a snapshot, when you delete the PVC (not in active use by a pod), the PVC and PV are deleted, BUT not the associated cephfs subvolume, as the associated VolumeSnapshot and VolumeSnapshotContent remain.
The cephfs subvolume cannot be deleted, even manually, see this example:
sh-4.4$ ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-a147b01d-e7c4-4a85-9a8e-497f39fc64e6 csi
Error ENOTEMPTY: subvolume 'csi-vol-a147b01d-e7c4-4a85-9a8e-497f39fc64e6' has snapshots
sh-4.4$
- After that, when you delete the VolumeSnapshot, the VolumeSnapshot and VolumeSnapshotContent are deleted (if the SNAPSHOTCLASS has DELETIONPOLICY Delete), and the source volume is automatically deleted from ceph, most probably because it was pending deletion.
Document Location
Worldwide
Document Information
Modified date:
23 June 2025
UID
ibm17231040