How To
Summary
This article includes the steps to clean up the disk that was used by Ceph OSD earlier and create a new OSD on it.
Objective
Environment
Steps
- Make sure that the cluster is healthy and all PGs are in an active+clean state. Removing an OSD when the Ceph cluster is not in a healthy state and the PGs are not active+<something> can result in Data Loss.
- If the goal is to replace 2 or more OSDs, remove one OSD at a time. The health of the Ceph cluster must be checked and confirmed to be HEALTH_OK between two OSD removals. Failure to do so can result in Data Loss.
- Removing 2 or more OSDs back to back will result in Data Loss.
- Removing 2 or more OSDs in a single command line will result in Data Loss.
Then, run the ceph -s command to check the cluster health:
# ceph -s
cluster:
id: 0782432c-e1aa-11ec-aed8-fa163ed0b37b
health: HEALTH_OK <--- Overall health must be HEALTH_OK
services:
mon: 5 daemons, quorum mgmt-0.icemanny01.lab.pnq2.cee.redhat.com,osds-2,osds-1,osds-0,mons-1 (age 2h)
mgr: mgmt-0.icemanny01.lab.pnq2.cee.redhat.com.fycvbc(active, since 4w), standbys: osds-2.gmcizc
osd: 3 osds: 3 up (since 2h), 3 in (since 2h)
data:
pools: 1 pools, 1 pgs
objects: 3 objects, 0 B
usage: 58 MiB used, 30 GiB / 30 GiB avail
pgs: 1 active+clean <--- Must be only Active+Clean
- If the overall health of the system is anything other than HEALTH_OK, Do Not Proceed.
- If the state of the Placement Groups (PGs) reports anything other than Active+Clean, Do Not Proceed. Examples of PG states we do NOT want to see: Creating, Peering, Degraded, Recovering, Migrating, Backfilling, Remapped.
- If any OSDs are full or near full, Do Not Proceed.
# ceph health detail
HEALTH_ERR 1 full osd(s); 6 near full osd(s)
osd.60 is full at 95%
osd.0 is near full at 86%
osd.4 is near full at 91%
osd.8 is near full at 92%
- If one is certain the Ceph component is healthy and not too full, proceed with the OSD removal steps.
- Take care to read the entire Resolution section and follow the steps for your organization's type and version of OCS.
- Allow at least one day for the Ceph component to rebalance the data.
- Do not proceed with removing another OSD unless the Ceph components are healthy (a way to check this from the toolbox pod is sketched after this list).
- Again, removing OSDs back to back or removing 2 or more OSDs in one command line will result in Data Loss.
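For reference, a minimal sketch of the health checks above when run from inside the cluster, assuming the rook-ceph-tools toolbox pod has been deployed in openshift-storage (deploy/rook-ceph-tools is the usual default; adjust if yours differs):
# oc get pods -n openshift-storage -l app=rook-ceph-tools
# oc rsh -n openshift-storage deploy/rook-ceph-tools ceph status
# oc rsh -n openshift-storage deploy/rook-ceph-tools ceph health detail
These should report HEALTH_OK and only active+clean PGs before any OSD is removed.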
# oc scale deployment isf-cns-operator-controller-manager --replicas=0 -n ibm-spectrum-fusion-ns
# oc scale deployment ocs-operator rook-ceph-operator --replicas=0
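Before continuing, it is worth confirming that the operators really are scaled down to 0 replicas; a quick check, assuming the ocs-operator and rook-ceph-operator deployments live in the openshift-storage namespace:
# oc get deployment ocs-operator rook-ceph-operator -n openshift-storage
# oc get deployment isf-cns-operator-controller-manager -n ibm-spectrum-fusion-ns
The READY column should show 0/0 for all three deployments before moving on.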
# oc get -n openshift-storage -o yaml deployment rook-ceph-osd-{osd-id} | grep "ceph.rook.io/pvc:"
Example output:
ceph.rook.io/pvc: ocs-deviceset-lso-volumeset-0-data-0w75fl
# oc get -n openshift-storage pvc ocs-deviceset-lso-volumeset-0-data-0w75fl
Example output:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ocs-deviceset-lso-volumeset-0-data-0w75fl Bound local-pv-e37b8c4a 512Gi RWO lso-volumeset 2d
# oc get pv local-pv-e37b8c4a -oyaml | grep path
f: path: {}
path: /mnt/local-storage/lso-volumeset/scsi-36000c2914f732fdde713794b8fb5714a
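The same PVC -> PV -> disk path lookup can also be chained with jsonpath instead of grep; a minimal sketch, assuming the PVC name found above (substitute your own value):
# PVC_NAME=ocs-deviceset-lso-volumeset-0-data-0w75fl
# PV_NAME=$(oc get -n openshift-storage pvc ${PVC_NAME} -o jsonpath='{.spec.volumeName}')
# oc get pv ${PV_NAME} -o jsonpath='{.spec.local.path}'
This prints the /mnt/local-storage/... symlink path directly, which is then resolved to the underlying /dev/sdX device on the node in the next step.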
# oc debug node/<node_with_failed_osd>
# chroot /host
# ls -l /mnt/local-storage/lso-volumeset/
total 0
lrwxrwxrwx. 1 root root 43 Jan 3 10:56 scsi-36000c2914f732fdde713794b8fb5714a -> /dev/disk/by-id/scsi-36000c2914f732fdde713794b8fb5714a
# ls -l /dev/disk/by-id/
total 0
.
lrwxrwxrwx. 1 root root 9 Jan 3 10:48 scsi-36000c2914f732fdde713794b8fb5714a -> ../../sdb <<<<<
# osd_id_to_remove=0
# oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
Example output:
deployment.extensions/rook-ceph-osd-0 scaled
osd_id_to_remove is the integer in the pod name immediately after the rook-ceph-osd prefix. In this example, the deployment name is rook-ceph-osd-0.
4. Verify that the rook-ceph-osd pod is terminated.
# oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
Example output:
No resources found in openshift-storage namespace.
i. Delete any old ocs-osd-removal jobs.
# oc delete -n openshift-storage job ocs-osd-removal-job
Example output:
job.batch "ocs-osd-removal-job" deleted
ii. Change to the openshift-storage project.
# oc project openshift-storage
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
Example output:
job.batch/ocs-osd-removal-job created
NOTE: In OCP 4.10 you can get this error, from the "ocs-osd-removal-job" pod that runs the commands to delete the OSD device on the Ceph cluster:
# oc logs -n openshift-storage ocs-osd-removal-job-cj522
2022-04-26 13:35:54.741879 I | rookcmd: starting Rook v4.10.0-0.ed62be54b2371ca23ae9b81137a2d301d032f164 with arguments '/usr/local/bin/rook ceph osd remove --osd-ids=0 --force-osd-removal false'
2022-04-26 13:35:56.047744 I | cephosd: validating status of osd.0
2022-04-26 13:35:56.047762 I | cephosd: osd.0 is marked 'DOWN'
2022-04-26 13:35:56.047772 D | exec: Running command: ceph osd safe-to-destroy 0 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-04-26 13:35:56.313675 W | cephosd: osd.0 is NOT be ok to destroy, retrying in 1m until success
The solution is to run the OSD removal job with FORCE_OSD_REMOVAL=true, as per Bug 2059027 - Device Replacement with FORCE_OSD_REMOVAL, OSD moved to the "destroyed" state. Only run the forced removal when all PGs are active+<something>; if they are not, wait for the PGs to complete backfilling or investigate why they are not active.
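If that situation applies, a re-run of the removal job with the force flag looks like the following; it reuses the same template and parameters as above, only FORCE_OSD_REMOVAL changes:
# oc delete -n openshift-storage job ocs-osd-removal-job
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -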
7. Verify that the OSD was removed successfully by checking the status of the ocs-osd-removal-job pod.
A status of Completed confirms that the OSD removal job succeeded.
# oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
# oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'
Example output:
2022-05-10 06:50:04.501511 I | cephosd: completed removal of OSD 0
8. If encryption was enabled at the time of installation, remove the dm-crypt managed device-mapper mapping from the OSD devices that were removed from the respective OpenShift Data Foundation nodes. If encryption is not enabled, skip step 8 and jump to step 9.
i. Get the Persistent Volume Claim (PVC) name of the replaced OSD from the logs of ocs-osd-removal-job pod.
# oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 |egrep -i 'pvc|deviceset'
Example output:
2021-05-12 14:31:34.666000 I | cephosd: removing the OSD PVC "ocs-deviceset-xxx
# oc debug node/<node_with_failed_osd>
# chroot /host
# dmsetup ls | grep <pvc name>
Example output:
ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt (253:0)
<ocs-deviceset-name> is the name of the relevant device-mapper mapping, based on the PVC name identified in the previous step.
# cryptsetup luksClose --debug --verbose <ocs-deviceset-name>
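To confirm the mapping is gone, the dmsetup check from the previous step can be repeated; once luksClose succeeds it should return no output:
# dmsetup ls | grep <pvc name>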
# oc debug node/<node_with_failed_osd>
# chroot /host
# lsblk
# sgdisk -Z /dev/sdb
# wipefs -af /dev/sdb    <--- Use the disk name of the failed OSD identified in step 2-iv
- The disk can be cleaned with the steps below:
DISK="/dev/sdX"
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
sgdisk --zap-all $DISK
# Wipe portions of the disk to remove more LVM metadata that may be present
for gb in 0 1 10 100 1000; do dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((gb * 1024**2)); done
# SSDs may be better cleaned with blkdiscard instead of dd
# This might not be supported on all devices
blkdiscard $DISK
# Inform the OS of partition table changes
partprobe $DISK
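As an optional sanity check before re-using the disk, verify that no partition table or filesystem signatures remain (here /dev/sdb stands for the disk identified earlier):
# lsblk /dev/sdb
# wipefs /dev/sdb <--- should list no remaining signatures
# blkid /dev/sdb <--- should print nothing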
# oc delete pv local-pv-e37b8c4a
Example output:
persistentvolume "local-pv-e37b8c4a" deleted
Delete the ocs-osd-removal jobs.
# oc delete -n openshift-storage job ocs-osd-removal-job
# oc get pv | grep Available
# oc scale deployment ocs-operator rook-ceph-operator --replicas=1
# oc scale deployment isf-cns-operator-controller-manager --replicas=1 -n ibm-spectrum-fusion-ns
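Optionally, confirm that the operator pods come back up before checking the new OSD (pod names include generated suffixes):
# oc get pods -n openshift-storage | grep -E 'ocs-operator|rook-ceph-operator'
# oc get pods -n ibm-spectrum-fusion-ns | grep isf-cns-operator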
# oc get -n openshift-storage pvc
# oc get -n openshift-storage pods -l app=rook-ceph-osd
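Finally, once the new OSD pod is Running, the cluster should rebalance and return to HEALTH_OK; this can be re-checked from the toolbox pod as sketched earlier, for example:
# oc rsh -n openshift-storage deploy/rook-ceph-tools ceph osd tree
# oc rsh -n openshift-storage deploy/rook-ceph-tools ceph -s
All OSDs should be up and in, and all PGs should return to active+clean before any further OSD replacement is attempted.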
Additional Information
Document Location
Worldwide
Document Information
Modified date:
19 August 2025
UID
ibm17105098