How To
Summary
This article describes the steps to identify orphan cephfs subvolumes, clones, and snapshots, and to delete them safely.
Objective
- There are situations where the ceph cluster is full, with a status like:
sh-4.4$ ceph -s
cluster:
id: aaaaaaaa-xxxx-xxxx-aaaa-bbbbbbbbbbbb
health: HEALTH_ERR
16 backfillfull osd(s)
1 full osd(s)
4 nearfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 19 pgs backfill_toofull
10 pgs not deep-scrubbed in time
9 pgs not scrubbed in time
12 pool(s) full
- Sometimes this is because stale cephfs subvolumes are present in the ceph cluster, which do not have a corresponding PV in OCP.
- In this document, we provide the principles to identify them and to know whether we can delete them safely.
Environment
Steps
Data collection
What data should we request to analyze a clone/snapshot issue?
From ceph toolbox:
- List of clone status:
for i in `ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi | jq -r '.[].name'` ; do echo subvolume $i: ; echo -----;ceph fs clone status ocs-storagecluster-cephfilesystem $i csi; echo ----- ; done > /tmp/clonestatus-$(date +%F-%H-%M-%S).txt
- List of snapshots:
for i in `ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi | jq -r '.[].name'` ; do echo subvolume $i: ; echo -----; ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem $i csi; echo ----- ; done > /tmp/snaplist-$(date +%F-%H-%M-%S).txt
- Snapshot info of all snapshots:
for subvolume in $(ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi | jq -r '.[].name'); do for snap in $(ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem ${subvolume} --group_name csi|jq -r '.[].name'); do echo ${subvolume} ${snap} ; ceph fs subvolume snapshot info ocs-storagecluster-cephfilesystem ${subvolume} ${snap} --group_name csi ; done; done > /tmp/cephfs_snap_info_all-$(date +%F-%H-%M-%S).txt
- From oc bastion:
oc get volumesnapshot -A > oc-snapshot-$(date +%F-%H-%M-%S).out
oc get volumesnapshot -A -o yaml > oc-snapshot-$(date +%F-%H-%M-%S).yaml
oc get volumesnapshotcontent > oc-snapshotcontent-$(date +%F-%H-%M-%S).out
oc get volumesnapshotcontent -o yaml > oc-snapshotcontent-$(date +%F-%H-%M-%S).yaml
oc get pv > pv$(date +%F-%H-%M-%S).out
oc get pv -o yaml > pv$(date +%F-%H-%M-%S).yaml
Deletion Rules of CephFS Subvolume
Key concepts to know whether a cephfs subvolume can be deleted or not
- Each cephfs subvolume should have a corresponding PV. The number of cephfs PVs [1] must be the same as the number of cephfs subvolumes [2].
[1] # oc get pv | grep cephfs
Note: this assumes all Storage Classes with provisioner openshift-storage.cephfs.csi.ceph.com have a SC name that includes "ceph".
[2] # ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi
- The list [2] provides the existing cephfs subvolumes, which can be of two types: subvolume and clone. In either case, a PV should exist for both types.
- If the PV count and the cephfs subvolume count differ, check for orphan cephfs subvolumes.
- Pick the cephfs subvolumes listed in [2] one by one and search for their corresponding PV:
# oc get pv -o yaml | grep <subvolume-name>
If a subvolume has no entry in the above output, it does not have a corresponding PV. Such a subvolume can be deleted, with one exception:
- Exception: cephfs subvolumes that are used as source volumes for snapshots should not be deleted. When a cephfs subvolume has snapshots, delete its snapshots first; to know whether a snapshot can be deleted, see the Snapshot section below.
- If the subvolume is orphan and does not have any snapshots, delete it:
# ceph fs subvolume rm ocs-storagecluster-cephfilesystem <subvolume-name> csi
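The PV-versus-subvolume comparison above can be sketched as a small script. This is a minimal sketch with hypothetical sample data (the file paths and csi-vol names below are invented for illustration); on a real cluster the two files would be produced from `oc get pv` and `ceph fs subvolume ls` as shown in the comments:

```shell
# Hypothetical input files; on a real cluster generate them with:
#   oc get pv -o yaml | grep -o 'csi-vol-[0-9a-f-]*' | sort -u > /tmp/pv-subvols.txt
#   ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi \
#     | jq -r '.[].name' | sort -u > /tmp/ceph-subvols.txt
printf 'csi-vol-aaa\ncsi-vol-bbb\n' > /tmp/pv-subvols.txt
printf 'csi-vol-aaa\ncsi-vol-bbb\ncsi-vol-ccc\n' > /tmp/ceph-subvols.txt

# Lines present only in the second file: subvolumes in ceph with no matching PV
comm -13 /tmp/pv-subvols.txt /tmp/ceph-subvols.txt
```

Any subvolume printed by `comm` is an orphan candidate, but it must still pass the snapshot check above before deletion. Both input files must be sorted for `comm` to work correctly.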
Deletion Rules of CephFS Snapshot
Key concepts to know whether a cephfs snapshot can be deleted or not
- Any VolumeSnapshotContent based on cephfs [3] should have a corresponding entry in the cephfs snapshot list [4], i.e. the number of cephfs VolumeSnapshotContent objects must be the same as the number of cephfs snapshots.
[3] List of cephfs VolumeSnapshotContent:
$ oc get volumesnapshotcontent | grep openshift-storage.cephfs.csi.ceph.com
[4] List of cephfs snapshots:
$ for i in `ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi --format json | jq '.[] | .name' | cut -f 2 -d '"'`; do echo "Subvolume : $i"; ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem $i csi; done
- Currently, there is no ceph command to list all existing cephfs snapshots at once; however, you can list all the snapshots of one cephfs subvolume. List [4] collects the snapshot list from each cephfs subvolume.
- We can delete a snapshot when two conditions are met:
1. It has no associated VolumeSnapshotContent, i.e. it appears in the snapshot list [4] but has no corresponding VolumeSnapshotContent [3].
AND
2. It has NO pending clones, i.e. its snapshot info shows "has_pending_clones": "no".
- You can get the snapshot info to check the clone status with this command:
$ for subvolume in $(ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi | jq -r '.[].name'); do for snap in $(ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem ${subvolume} --group_name csi|jq -r '.[].name'); do echo ${subvolume} ${snap} ; ceph fs subvolume snapshot info ocs-storagecluster-cephfilesystem ${subvolume} ${snap} --group_name csi ; done; done > /tmp/cephfs_snap_info_all.txt
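The second condition can be checked mechanically per snapshot. The snippet below is a minimal sketch: the JSON in `snap_info` is a hypothetical sample of what `ceph fs subvolume snapshot info` returns; on a live cluster the variable would be filled from that command instead:

```shell
# Hypothetical sample of `ceph fs subvolume snapshot info ...` output;
# on a real cluster:
#   snap_info=$(ceph fs subvolume snapshot info \
#     ocs-storagecluster-cephfilesystem <subvolume> <snapshot> --group_name csi)
snap_info='{"created_at": "2024-01-01 00:00:00", "has_pending_clones": "no"}'

if printf '%s' "$snap_info" | grep -q '"has_pending_clones": "no"'; then
  echo "no pending clones: snapshot may be deletable"
else
  echo "pending clones exist: do NOT delete this snapshot"
fi
```

Remember that this check alone is not sufficient: the snapshot must also lack an associated VolumeSnapshotContent before it is safe to delete.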
Cephfs Clone state
Source: Ceph File System > FS volumes and subvolumes > Cloning Snapshots
- A clone can be in one of the following states:
pending : Clone operation has not started
in-progress : Clone operation is in progress
complete : Clone operation has successfully finished
failed : Clone operation has failed
canceled : Clone operation is cancelled by user
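As a minimal sketch of acting on these states, the snippet below maps a clone state string to a triage message. The `state` value is hypothetical sample data; on a real cluster it would come from `ceph fs clone status` as shown in the comment:

```shell
# Hypothetical clone state; on a real cluster it would come from:
#   ceph fs clone status ocs-storagecluster-cephfilesystem <clone-name> csi \
#     | jq -r '.status.state'
state="in-progress"

case "$state" in
  complete)            echo "clone finished" ;;
  pending|in-progress) echo "clone still running: $state" ;;
  failed|canceled)     echo "clone needs attention: $state" ;;
esac
```

Clones in the failed or canceled state are the usual suspects when stale subvolumes accumulate, so those are the ones to investigate first.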
Additional Information
Root Cause:
A) The main reason for stale clones is that they were created from a Storage Class with RECLAIMPOLICY Retain, and users forget to manually delete all associated resources: PVC, PV, and cephfs subvolume.
With Storage Class with RECLAIMPOLICY Retain
- When you delete the PVC (not in active use by a pod), only the PVC is deleted, but the PV remains as Retain and Bound, for example:
pvc-4d5c3028-1b77-46cc-ae06-064331b5d12d 10Gi RWX Retain Bound my-shared-storage-2/shared-retain retain-cephfs
This is the expected behaviour, as explained on Kubernetes Documentation / Concepts / Storage / Persistent Volumes / Retain
- After that, when you delete the PV, the PV is deleted from OCP, but the cephfs subvolumes remain in the ceph system (when there is no finalizer)
Conclusion: The User needs to manually remove all resources, one by one: PVC, PV, and then cephfs subvolume.
B) The second reason is when the PVC is deleted, AND there is still a cephfs snapshot associated
With Storage Class with RECLAIMPOLICY Delete
- When you delete a PVC (not in active use by a pod), the PV and the associated cephfs subvolume are deleted.
- If the PVC has a snapshot, when you delete the PVC (not in active use by a pod), the PVC and PV are deleted, BUT not the associated cephfs subvolume, as the associated VolumeSnapshot and VolumeSnapshotContent remain.
The cephfs subvolume cannot be deleted, even manually, see this example:
sh-4.4$ ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-a147b01d-e7c4-4a85-9a8e-497f39fc64e6 csi
Error ENOTEMPTY: subvolume 'csi-vol-a147b01d-e7c4-4a85-9a8e-497f39fc64e6' has snapshots
sh-4.4$
- After that, when you delete the VolumeSnapshot, the VolumeSnapshot and VolumeSnapshotContent are deleted (if the SNAPSHOTCLASS has DELETIONPOLICY Delete), and the source volume is automatically deleted from ceph, most probably because it was pending deletion.
Document Location
Worldwide
Document Information
Modified date:
23 June 2025
UID
ibm17231040