Removing failed or unwanted Ceph OSDs in dynamically provisioned Red Hat OpenShift Data Foundation

Before you begin

About this task

Follow the steps in this procedure to remove failed or unwanted Ceph OSDs in dynamically provisioned IBM Storage Fusion Data Foundation. A worked example with sample values follows the procedure.
Important: Scaling down a cluster is supported only with the help of the IBM Support team.
Warning:
  • Removing an OSD when the Ceph component is not in a healthy state can result in data loss. Check the cluster health before you proceed; see the sketch after this list.

  • Removing two or more OSDs at the same time results in data loss.
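
Before you start the procedure, you can confirm the Ceph cluster health from the rook-ceph-tools (toolbox) pod. The following is a minimal sketch; it assumes that the toolbox pod is deployed in the openshift-storage namespace with the standard app=rook-ceph-tools label.

    # Locate the toolbox pod (assumes the rook-ceph-tools deployment exists in openshift-storage)
    TOOLS_POD=$(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name)

    # Overall cluster health; proceed only when Ceph is healthy
    oc rsh -n openshift-storage $TOOLS_POD ceph status

    # Identify the down or unwanted OSD and note its ID before removing it
    oc rsh -n openshift-storage $TOOLS_POD ceph osd tree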

Procedure

  1. Scale down the OSD deployment.
    # oc scale deployment rook-ceph-osd-<osd-id> --replicas=0
  2. Get the osd-prepare pod for the Ceph OSD to be removed.
    # oc get deployment rook-ceph-osd-<osd-id> -oyaml | grep ceph.rook.io/pvc
  3. Delete the osd-prepare pod.
    # oc delete -n openshift-storage pod rook-ceph-osd-prepare-<pvc-from-above-command>-<pod-suffix>
  4. Remove the failed OSD from the cluster.
    # failed_osd_id=<osd-id>
    
    # oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=$failed_osd_id | oc create -f -
  5. Verify that the OSD is removed successfully by checking the logs.
    # oc logs -n openshift-storage ocs-osd-removal-$failed_osd_id-<pod-suffix>
  6. Optional: If the ocs-osd-removal-job pod in OpenShift Container Platform reports an error such as cephosd:osd.0 is NOT ok to destroy for the FAILED_OSD_ID, see Troubleshooting the error cephosd:osd.0 is NOT ok to destroy while removing failed or unwanted Ceph OSDs.
  7. Delete the OSD deployment.
    # oc delete deployment rook-ceph-osd-<osd-id>
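
The following worked example runs the procedure end to end for a hypothetical failed OSD with ID 0. The OSD ID, PVC name, and pod suffixes are illustrative placeholders, so substitute the values from your own cluster; where a namespace is not given explicitly, the commands assume that openshift-storage is the current project, as in the steps above. Step 6 is omitted because it applies only when the removal job reports an error.

    # Step 1: stop the failed OSD (hypothetical OSD ID 0)
    oc scale deployment rook-ceph-osd-0 --replicas=0

    # Step 2: find the PVC that backs the OSD
    oc get deployment rook-ceph-osd-0 -oyaml | grep ceph.rook.io/pvc

    # Step 3: delete the matching osd-prepare pod (PVC name and pod suffix are illustrative)
    oc delete -n openshift-storage pod rook-ceph-osd-prepare-ocs-deviceset-0-data-0-abcde-xyz12

    # Step 4: run the OSD removal job
    failed_osd_id=0
    oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=$failed_osd_id | oc create -f -

    # Step 5: check the removal job logs (pod suffix is illustrative)
    oc logs -n openshift-storage ocs-osd-removal-$failed_osd_id-abcde

    # Step 7: delete the OSD deployment after the removal job completes
    oc delete deployment rook-ceph-osd-0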

What to do next

Verification step
  • To confirm that the OSD removal job completed successfully, run:
    # oc get pod -n openshift-storage ocs-osd-removal-$failed_osd_id-<pod-suffix>

    This command must return the status as Completed.
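
  • Optional: To confirm the removal on the Ceph side, list the OSD tree from the rook-ceph-tools pod (a minimal sketch, assuming the toolbox pod with the app=rook-ceph-tools label is available in openshift-storage); the ID of the removed OSD must no longer appear.
    # The removed OSD ID must no longer be listed in the output
    TOOLS_POD=$(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name)
    oc rsh -n openshift-storage $TOOLS_POD ceph osd tree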