Disk replacement and OpenShift Data Foundation OSD failure recovery
Learn how to replace a faulty disk that backs an Object Storage Device (OSD) in a Red Hat® OpenShift® Data Foundation cluster.
About this task
This procedure involves identifying the faulty OSD, scaling down the operators and the OSD deployment, removing the failed OSD and its persistent volume (PV), cleaning or replacing the disk, and verifying that the new OSD and its Persistent Volume Claim (PVC) are created.
Procedure
- In the Red Hat OpenShift console, go to Storage > Data Foundation and check the health of the storage.
- Do the following checks before you remove the disk:
- Run the following command to check the status of the cluster nodes:
oc get nodes
Example output:
NAME                                    STATUS   ROLES                  AGE    VERSION
compute-1-ru3.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-1-ru4.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-1-ru5.gen2070810.mydomain.com   Ready    worker                 7d5h   v1.29.10+67d3387
compute-1-ru6.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-1-ru7.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-1-ru8.gen2070810.mydomain.com   Ready    worker                 57d    v1.29.10+67d3387
compute-2-ru3.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-2-ru4.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-2-ru5.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-2-ru6.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-2-ru7.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-2-ru8.gen2070810.mydomain.com   Ready    worker                 7d6h   v1.29.10+67d3387
compute-2-ru9.gen2070810.mydomain.com   Ready    worker                 7d6h   v1.29.10+67d3387
compute-3-ru3.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
compute-3-ru4.gen2070810.mydomain.com   Ready    worker                 69d    v1.29.10+67d3387
control-1-ru2.gen2070810.mydomain.com   Ready    control-plane,master   70d    v1.29.10+67d3387
control-2-ru2.gen2070810.mydomain.com   Ready    control-plane,master   70d    v1.29.10+67d3387
control-3-ru2.gen2070810.mydomain.com   Ready    control-plane,master   70d    v1.29.10+67d3387
- Run the following command to get the status of Machine Config Pools (MCPs):
oc get mcp
Example output:
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-2a12402e6914f4b1e405dfed2bc8c6f8   True      False      False      3              3                   3                     0                      70d
worker   rendered-worker-bb3f210603cc8b5978a6347cc87c8657   True      False      False      15             15                  15                    0                      70d
- Run the following command to check the health and performance of your storage cluster:
ceph -s
Example output:
  cluster:
    id:     4141c39c-14e1-478f-9cb1-3a8d8a57d22c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,c,d (age 39h)
    mgr: b(active, since 69m), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 33 osds: 33 up (since 69m), 33 in (since 70m); 122 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 1289 pgs
    objects: 12.22M objects, 47 TiB
    usage:   139 TiB used, 92 TiB / 231 TiB avail
    pgs:     1496747/36662775 objects misplaced (4.082%)
             1165 active+clean
             118  active+remapped+backfill_wait
             4    active+remapped+backfilling
             2    active+clean+scrubbing

  io:
    client:   1.4 MiB/s rd, 1.2 MiB/s wr, 50 op/s rd, 108 op/s wr
    recovery: 471 MiB/s, 118 objects/s

  progress:
    Global Recovery Event (68m)
      [=========================...] (remaining: 7m)
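If the health is not HEALTH_OK, you can get more detail about which daemons or placement groups are affected. The ceph health detail command is a standard Ceph command that you can run from the same rook-ceph-tools terminal:
ceph health detail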
- Run the following command to view a hierarchical representation of your Ceph storage
clusters, including the status, weight, and location of each OSD:
ceph osd tree
Example output:ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 230.54782 root default -7 34.93149 host compute-1-ru5-isf-racka-rtp-raleigh-ibm-com 2 ssd 6.98630 osd.2 up 1.00000 1.00000 5 ssd 6.98630 osd.5 up 1.00000 1.00000 8 ssd 6.98630 osd.8 up 1.00000 1.00000 11 ssd 6.98630 osd.11 up 1.00000 1.00000 14 ssd 6.98630 osd.14 up 1.00000 1.00000 -3 34.93149 host compute-1-ru6-isf-racka-rtp-raleigh-ibm-com 1 ssd 6.98630 osd.1 up 1.00000 1.00000 4 ssd 6.98630 osd.4 up 1.00000 1.00000 7 ssd 6.98630 osd.7 up 1.00000 1.00000 9 ssd 6.98630 osd.9 up 1.00000 1.00000 13 ssd 6.98630 osd.13 up 1.00000 1.00000 -5 34.93149 host compute-1-ru7-isf-racka-rtp-raleigh-ibm-com 0 ssd 6.98630 osd.0 up 1.00000 1.00000 3 ssd 6.98630 osd.3 up 1.00000 1.00000 6 ssd 6.98630 osd.6 up 1.00000 1.00000 10 ssd 6.98630 osd.10 up 1.00000 1.00000 12 ssd 6.98630 osd.12 up 1.00000 1.00000 -11 41.91779 host control-1-ru2-isf-racka-rtp-raleigh-ibm-com 16 ssd 6.98630 osd.16 up 1.00000 1.00000 17 ssd 6.98630 osd.17 up 1.00000 1.00000 18 ssd 6.98630 osd.18 up 1.00000 1.00000 19 ssd 6.98630 osd.19 up 1.00000 1.00000 30 ssd 6.98630 osd.30 up 1.00000 1.00000 33 ssd 6.98630 osd.33 up 1.00000 1.00000 -13 41.91779 host control-1-ru3-isf-racka-rtp-raleigh-ibm-com 15 ssd 6.98630 osd.15 up 1.00000 1.00000 26 ssd 6.98630 osd.26 up 1.00000 1.00000 27 ssd 6.98630 osd.27 up 1.00000 1.00000 28 ssd 6.98630 osd.28 up 1.00000 1.00000 29 ssd 6.98630 osd.29 up 1.00000 1.00000 32 ssd 6.98630 osd.32 up 1.00000 1.00000 -9 41.91779 host control-1-ru4-isf-racka-rtp-raleigh-ibm-com 20 ssd 6.98630 osd.20 up 1.00000 1.00000 21 ssd 6.98630 osd.21 up 1.00000 1.00000 22 ssd 6.98630 osd.22 up 1.00000 1.00000 23 ssd 6.98630 osd.23 up 1.00000 1.00000 24 ssd 6.98630 osd.24 up 1.00000 1.00000 31 ssd 6.98630 osd.31 up 1.00000 1.00000
- Run the following command to list pods associated with OSDs:
oc get pods | grep osd
Example output:
rook-ceph-osd-0-85fcb5fd9-5cvws               2/2   Running     0   8d
rook-ceph-osd-1-5dd8bc8d9d-cbz7g              2/2   Running     0   37d
rook-ceph-osd-10-7cc49487b5-bdw6w             2/2   Running     0   8d
rook-ceph-osd-11-84cd6fb7d7-xpn22             2/2   Running     0   37d
rook-ceph-osd-12-7bb579498c-25tzh             2/2   Running     0   8d
rook-ceph-osd-13-866dbc7f57-kqqpn             2/2   Running     0   37d
rook-ceph-osd-14-6f7f6dd89b-7skg8             2/2   Running     0   37d
rook-ceph-osd-2-5fd5548c56-gclrv              2/2   Running     0   37d
rook-ceph-osd-4-6f4fb99d7-w7vrb               2/2   Running     0   37d
rook-ceph-osd-5-7b7ff5cfb5-blpj7              2/2   Running     0   37d
rook-ceph-osd-6-5d9856b54f-bjpmm              2/2   Running     0   8d
rook-ceph-osd-7-7f79d4b76-jxz8x               2/2   Running     0   37d
rook-ceph-osd-8-75497d6999-wz296              2/2   Running     0   37d
rook-ceph-osd-9-56d5cc7c59-w29n2              2/2   Running     0   37d
rook-ceph-osd-key-rotation-0-29054880-fvbmb   0/1   Completed   0   4d7h
rook-ceph-osd-key-rotation-1-29034720-jx9wf   0/1   Completed   0   18d
In this example, all OSD pods in the openshift-storage namespace are running.
- Run the following command to list pods that are not in the Running state:
oc get pods | grep -v Running
Example output:NAME READY STATUS RESTARTS AGE rook-ceph-osd-key-rotation-0-29054880-fvbmb 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-1-29034720-jx9wf 0/1 Completed 0 18d rook-ceph-osd-key-rotation-1-29044800-fnww9 0/1 Completed 0 11d rook-ceph-osd-key-rotation-1-29054880-tj6nc 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-10-29054880-w4vrz 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-11-29034720-nb54s 0/1 Completed 0 18d rook-ceph-osd-key-rotation-11-29044800-6hgn5 0/1 Completed 0 11d rook-ceph-osd-key-rotation-11-29054880-kvbcz 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-12-29054880-tcfsk 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-13-29034720-68dw6 0/1 Completed 0 18d rook-ceph-osd-key-rotation-13-29044800-9t7b4 0/1 Completed 0 11d rook-ceph-osd-key-rotation-13-29054880-8jdrm 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-14-29034720-kt6h8 0/1 Completed 0 18d rook-ceph-osd-key-rotation-14-29044800-dqcc5 0/1 Completed 0 11d rook-ceph-osd-key-rotation-14-29054880-5rg67 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-16-29034720-x2zv6 0/1 Completed 0 18d rook-ceph-osd-key-rotation-16-29044800-z2k6x 0/1 Completed 0 11d rook-ceph-osd-key-rotation-16-29054880-krxk8 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-17-29034720-9hqt7 0/1 Completed 0 18d rook-ceph-osd-key-rotation-17-29044800-b7cz4 0/1 Completed 0 11d rook-ceph-osd-key-rotation-17-29054880-ppltq 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-18-29034720-cb5nj 0/1 Completed 0 18d rook-ceph-osd-key-rotation-18-29044800-x56b2 0/1 Completed 0 11d rook-ceph-osd-key-rotation-18-29054880-pslk5 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-19-29034720-h2l8v 0/1 Completed 0 18d rook-ceph-osd-key-rotation-19-29044800-v8lm2 0/1 Completed 0 11d rook-ceph-osd-key-rotation-19-29054880-w28k6 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-2-29034720-spfv7 0/1 Completed 0 18d rook-ceph-osd-key-rotation-2-29044800-6cbr4 0/1 Completed 0 11d rook-ceph-osd-key-rotation-2-29054880-bc29w 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-20-29034720-n2gf4 0/1 Completed 0 18d rook-ceph-osd-key-rotation-20-29044800-dw7dc 0/1 Completed 0 11d rook-ceph-osd-key-rotation-20-29054880-qfmjt 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-21-29034720-x9chb 0/1 Completed 0 18d rook-ceph-osd-key-rotation-21-29044800-mdnxx 0/1 Completed 0 11d rook-ceph-osd-key-rotation-21-29054880-gq4rq 0/1 Completed 0 4d7h rook-ceph-osd-key-rotation-22-29034720-stdk4 0/1 Completed 0 18d rook-ceph-osd-key-rotation-22-29044800-2jbp2 0/1 Completed 0 11d
In this example, all pods in the openshift-storage namespace are either in the Running or Completed state.
- Run the following command to confirm that no pods are in any other state:
oc get pods | grep -v 'Running\|Completed'
- Physically remove a single disk from the storage node within the rack unit as follows:
  - Go to Workloads > Pods.
  - Select the rook-ceph-tools pod, and go to the Terminal tab. From the Terminal tab, you can run Ceph commands to check how many OSDs are down.
- Run the following command to view a hierarchical representation of your Ceph storage
clusters, including the status, weight, and location of each OSD:
ceph osd tree
Example output:
ID   CLASS  WEIGHT    TYPE NAME                                           STATUS  REWEIGHT  PRI-AFF
 -1         52.39632  root default
 -4         17.46544      rack gen2007
-15          3.49309          host compute-1-ru3-gen2070810-mydomain-com
  5    ssd   3.49309              osd.5                                       up   1.00000  1.00000
 -3          6.98618          host compute-1-ru4-gen2070810-mydomain-com
  0    ssd   3.49309              osd.0                                       up   1.00000  1.00000
  3    ssd   3.49309              osd.3                                       up   1.00000  1.00000
-19          6.98618          host control-1-ru2-gen2070810-mydomain-com
  9    ssd   3.49309              osd.9                                       up   1.00000  1.00000
 10    ssd   3.49309              osd.10                                      up   1.00000  1.00000
-12         17.46544      rack gen2008
-21          3.49309          host compute-2-ru3-gen2070810-mydomain-com
  8    ssd   3.49309              osd.8                                       up   1.00000  1.00000
-11          6.98618          host compute-2-ru4-gen2070810-mydomain-com
  2    ssd   3.49309              osd.2                                       up   1.00000  1.00000
  4    ssd   3.49309              osd.4                                       up   1.00000  1.00000
-23          6.98618          host control-2-ru2-gen2070810-mydomain-com
 11    ssd   3.49309              osd.11                                      up   1.00000  1.00000
 14    ssd   3.49309              osd.14                                      up   1.00000  1.00000
 -8         17.46544      rack gen2010
 -7          3.49309          host compute-3-ru3-gen2070810-mydomain-com
  1    ssd   3.49309              osd.1                                       up   1.00000  1.00000
-25          6.98618          host compute-3-ru4-gen2070810-mydomain-com
 12    ssd   3.49309              osd.12                                    down   1.00000  1.00000
 13    ssd   3.49309              osd.13                                      up   1.00000  1.00000
-17          6.98618          host control-3-ru2-gen2070810-mydomain-com
  6    ssd   3.49309              osd.6                                       up   1.00000  1.00000
  7    ssd   3.49309              osd.7                                       up   1.00000  1.00000
In this example, osd.12 is down.
- Run the following command to get the pod status:
oc get pods -n openshift-storage
Example output:NAME READY STATUS RESTARTS AGE compute-3-ru3gen2070810mydomaincom-debug-pln6g 1/1 Running 0 61s csi-addons-controller-manager-585444c85d-67mlg 2/2 Running 0 8d csi-cephfsplugin-28p6r 2/2 Running 0 60d csi-cephfsplugin-5v8lz 2/2 Running 0 60d csi-cephfsplugin-7js8t 2/2 Running 5 (8d ago) 60d csi-cephfsplugin-dpvwm 2/2 Running 0 60d csi-cephfsplugin-g7d9w 2/2 Running 12 (48d ago) 60d csi-cephfsplugin-kc8c4 2/2 Running 0 60d csi-cephfsplugin-kdt94 2/2 Running 1 (7d6h ago) 7d6h csi-cephfsplugin-klxtb 2/2 Running 0 60d csi-cephfsplugin-lczkp 2/2 Running 0 7d7h csi-cephfsplugin-m46mm 2/2 Running 4 (48d ago) 52d csi-cephfsplugin-mntrc 2/2 Running 3 (6d13h ago) 60d csi-cephfsplugin-mqrrz 2/2 Running 0 7d7h csi-cephfsplugin-provisioner-796d9bb95c-c5vhw 5/5 Running 1 (60d ago) 60d csi-cephfsplugin-provisioner-796d9bb95c-c7jzz 5/5 Running 1 (8d ago) 8d csi-cephfsplugin-rlbh2 2/2 Running 0 60d csi-cephfsplugin-t7t7w 2/2 Running 0 60d csi-cephfsplugin-tn7sd 2/2 Running 173 57d csi-cephfsplugin-w59ks 2/2 Running 0 60d csi-cephfsplugin-xxhx7 2/2 Running 24 (6d11h ago) 60d csi-cephfsplugin-z4hq2 2/2 Running 1 (60d ago) 60d csi-rbdplugin-4s6vv 3/3 Running 1 (60d ago) 60d csi-rbdplugin-6mq2k 3/3 Running 30 (6d11h ago) 60d csi-rbdplugin-8kb98 3/3 Running 0 60d csi-rbdplugin-ftdl6 3/3 Running 0 60d csi-rbdplugin-fz2m9 3/3 Running 0 60d csi-rbdplugin-gswzx 3/3 Running 0 7d7h csi-rbdplugin-j4xcv 3/3 Running 7 (8d ago) 60d csi-rbdplugin-lt556 3/3 Running 5 (48d ago) 52d csi-rbdplugin-n27q5 3/3 Running 0 60d csi-rbdplugin-provisioner-7cf4b47dd9-hgl4r 6/6 Running 0 5d7h csi-rbdplugin-provisioner-7cf4b47dd9-jjtj5 6/6 Running 3 (8d ago) 60d csi-rbdplugin-q9jrz 3/3 Running 2 (7d6h ago) 7d6h csi-rbdplugin-qms2x 3/3 Running 242 57d csi-rbdplugin-qwbc9 3/3 Running 0 60d csi-rbdplugin-trwqj 3/3 Running 0 60d csi-rbdplugin-v9jvw 3/3 Running 0 60d csi-rbdplugin-z8b9f 3/3 Running 0 7d7h csi-rbdplugin-zrfs9 3/3 Running 0 60d csi-rbdplugin-zrjz8 3/3 Running 15 (48d ago) 60d csi-rbdplugin-ztvp9 3/3 Running 4 (6d13h ago) 60d noobaa-core-0 2/2 Running 0 60d noobaa-db-pg-0 1/1 Running 0 60d noobaa-endpoint-6747c855c9-5lk88 1/1 Running 0 60d noobaa-operator-6b6f9d8dbf-s4hz5 1/1 Running 1 (8d ago) 60d ocs-client-operator-console-7d85dc6bf9-dmdj7 1/1 Running 0 60d ocs-client-operator-controller-manager-85bbcc7bfd-lkqsk 2/2 Running 2 (8d ago) 60d ocs-metrics-exporter-858847c98f-j492c 1/1 Running 0 60d ocs-operator-68fd7c69df-kzkd6 1/1 Running 0 60d ocs-provider-server-5488864bb7-868p5 1/1 Running 0 60d odf-console-69bdbf8bc4-jq7j9 1/1 Running 0 8d odf-operator-controller-manager-59465654c-69dp5 2/2 Running 0 8d rook-ceph-crashcollector-compute-1-ru3.gen2070810.mydomainxzt2s 1/1 Running 0 48d rook-ceph-crashcollector-compute-1-ru4.gen2070810.mydomainv8l42 1/1 Running 0 60d rook-ceph-crashcollector-compute-2-ru3.gen2070810.mydomain24c4g 1/1 Running 0 60d rook-ceph-crashcollector-compute-2-ru4.gen2070810.mydomainkxt4h 1/1 Running 0 60d rook-ceph-crashcollector-compute-3-ru3.gen2070810.mydomainxw2qh 1/1 Running 0 60d rook-ceph-crashcollector-compute-3-ru4.gen2070810.mydomain7nrfg 1/1 Running 1 7d5h rook-ceph-crashcollector-control-1-ru2.gen2070810.mydomain5ncj4 1/1 Running 0 60d rook-ceph-crashcollector-control-2-ru2.gen2070810.mydomaing7gdv 1/1 Running 0 60d rook-ceph-crashcollector-control-3-ru2.gen2070810.mydomain7h5gx 1/1 Running 0 8d rook-ceph-exporter-compute-1-ru3.gen2070810.mydomain.com-d2hh6t 1/1 Running 0 48d rook-ceph-exporter-compute-1-ru4.gen2070810.mydomain.com-584wsh 1/1 Running 0 60d 
rook-ceph-exporter-compute-2-ru3.gen2070810.mydomain.com-7rd2st 1/1 Running 0 60d rook-ceph-exporter-compute-2-ru4.gen2070810.mydomain.com-6lxxs6 1/1 Running 0 60d rook-ceph-exporter-compute-3-ru3.gen2070810.mydomain.com-6rpqbx 1/1 Running 0 60d rook-ceph-exporter-compute-3-ru4.gen2070810.mydomain.com-8dfghn 1/1 Running 1 7d5h rook-ceph-exporter-control-1-ru2.gen2070810.mydomain.com-6wjtzh 1/1 Running 0 60d rook-ceph-exporter-control-2-ru2.gen2070810.mydomain.com-dpqw2b 1/1 Running 0 60d rook-ceph-exporter-control-3-ru2.gen2070810.mydomain.com-7shzhw 1/1 Running 0 8d rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-68c6f4d7l9rkl 2/2 Running 0 60d rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5c9454ffghdgc 2/2 Running 0 60d rook-ceph-mgr-a-5bbc79cc47-qlgfg 3/3 Running 0 60d rook-ceph-mgr-b-74b7b7dbc8-lcljs 3/3 Running 0 60d rook-ceph-mon-a-689f4f88c9-bhwsx 2/2 Running 0 60d rook-ceph-mon-b-d95657f6c-xr5sj 2/2 Running 0 60d rook-ceph-mon-c-6f44b749bd-69x97 2/2 Running 0 60d rook-ceph-osd-0-6bd74f8f7-28z9b 2/2 Running 0 60d rook-ceph-osd-1-f58fcdc48-q9ddv 2/2 Running 0 60d rook-ceph-osd-10-55ccb7d855-lbpzt 2/2 Running 0 60d rook-ceph-osd-11-5b5ff878f4-p55wr 2/2 Running 0 60d rook-ceph-osd-12-788699f47d-9r726 1/2 CrashLoopBackOff 7 (58s ago) 7d5h rook-ceph-osd-13-94c9c995b-cjndf 2/2 Running 2 7d5h rook-ceph-osd-14-645d67d598-fp8c8 2/2 Running 0 60d rook-ceph-osd-2-856fb4c7d5-5m2rr 2/2 Running 0 60d rook-ceph-osd-3-545f957b8f-cj98t 2/2 Running 0 60d rook-ceph-osd-4-6b56d59b7c-r2qmv 2/2 Running 0 60d rook-ceph-osd-5-ffdbf57b4-vkhbq 2/2 Running 0 48d rook-ceph-osd-6-7668b4b68b-4tfg8 2/2 Running 0 8d rook-ceph-osd-7-64d9795ffc-d6bf7 2/2 Running 0 8d rook-ceph-osd-8-55b798bdd5-xcnqr 2/2 Running 0 60d rook-ceph-osd-9-7745f8bd55-7zhv8 2/2 Running 0 60d rook-ceph-osd-key-rotation-0-29024640-574h2 0/1 Completed 0 16d rook-ceph-osd-key-rotation-0-29034720-7gwfq 0/1 Completed 0 9d rook-ceph-osd-key-rotation-0-29044800-ps6hl 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-1-29024640-gsbpj 0/1 Completed 0 16d rook-ceph-osd-key-rotation-1-29034720-x4fqn 0/1 Completed 0 9d rook-ceph-osd-key-rotation-1-29044800-hvvwv 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-10-29024640-wdkq4 0/1 Completed 0 16d rook-ceph-osd-key-rotation-10-29034720-p56qd 0/1 Completed 0 9d rook-ceph-osd-key-rotation-10-29044800-kq94k 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-11-29024640-8mslv 0/1 Completed 0 16d rook-ceph-osd-key-rotation-11-29034720-gqjd2 0/1 Completed 0 9d rook-ceph-osd-key-rotation-11-29044800-4q4xd 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-12-29044800-6fdqz 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-13-29044800-7bpbn 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-14-29024640-6tfrx 0/1 Completed 0 16d rook-ceph-osd-key-rotation-14-29034720-vtvmd 0/1 Completed 0 9d rook-ceph-osd-key-rotation-14-29044800-26lcc 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-2-29024640-b8n6j 0/1 Completed 0 16d rook-ceph-osd-key-rotation-2-29034720-tj64z 0/1 Completed 0 9d rook-ceph-osd-key-rotation-2-29044800-cjzq4 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-3-29024640-qt5fp 0/1 Completed 0 16d rook-ceph-osd-key-rotation-3-29034720-8l4zw 0/1 Completed 0 9d rook-ceph-osd-key-rotation-3-29044800-hh5tj 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-4-29024640-f5hd7 0/1 Completed 0 16d rook-ceph-osd-key-rotation-4-29034720-p2v89 0/1 Completed 0 9d rook-ceph-osd-key-rotation-4-29044800-dxvkt 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-5-29024640-bvpnr 0/1 Completed 0 16d 
rook-ceph-osd-key-rotation-5-29034720-w8m48 0/1 Completed 0 9d rook-ceph-osd-key-rotation-5-29044800-9djwj 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-6-29044800-j8q24 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-7-29044800-b5pfn 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-8-29024640-njxbk 0/1 Completed 0 16d rook-ceph-osd-key-rotation-8-29034720-cl5pr 0/1 Completed 0 9d rook-ceph-osd-key-rotation-8-29044800-g2t6z 0/1 Completed 0 2d17h rook-ceph-osd-key-rotation-9-29024640-59nl9 0/1 Completed 0 16d rook-ceph-osd-key-rotation-9-29034720-h997f 0/1 Completed 0 9d rook-ceph-osd-key-rotation-9-29044800-zchq4 0/1 Completed 0 2d17h rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-76f8b44x62z2 2/2 Running 0 60d rook-ceph-tools-68d744d548-smkrw 1/1 Running 0 60d storageclient-737342087af10580-status-reporter-29048734-gfbb7 0/1 Completed 0 35s ux-backend-server-7796f96896-zcbjs 2/2 Running 0 8d
In this example, the rook-ceph-osd-12 pod is in the CrashLoopBackOff state.
- Do the following steps to clean up the OSD that is down:
- Run the following command to get the failed OSD:
oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
Example output:NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES rook-ceph-osd-0-6bd74f8f7-28z9b 2/2 Running 0 60d 172.18.78.23 compute-1-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-1-f58fcdc48-q9ddv 2/2 Running 0 60d 172.18.78.62 compute-3-ru3.gen2070810.mydomain.com <none> <none> rook-ceph-osd-10-55ccb7d855-lbpzt 2/2 Running 0 60d 172.18.78.21 control-1-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-11-5b5ff878f4-p55wr 2/2 Running 0 60d 172.18.78.41 control-2-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-12-788699f47d-9r726 1/2 CrashLoopBackOff 8 (2m27s ago) 7d5h 172.18.78.63 compute-3-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-13-94c9c995b-cjndf 2/2 Running 2 7d5h 172.18.78.63 compute-3-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-14-645d67d598-fp8c8 2/2 Running 0 60d 172.18.78.41 control-2-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-2-856fb4c7d5-5m2rr 2/2 Running 0 60d 172.18.78.43 compute-2-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-3-545f957b8f-cj98t 2/2 Running 0 60d 172.18.78.23 compute-1-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-4-6b56d59b7c-r2qmv 2/2 Running 0 60d 172.18.78.43 compute-2-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-5-ffdbf57b4-vkhbq 2/2 Running 0 48d 172.18.78.22 compute-1-ru3.gen2070810.mydomain.com <none> <none> rook-ceph-osd-6-7668b4b68b-4tfg8 2/2 Running 0 8d 172.18.78.61 control-3-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-7-64d9795ffc-d6bf7 2/2 Running 0 8d 172.18.78.61 control-3-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-8-55b798bdd5-xcnqr 2/2 Running 0 60d 172.18.78.42 compute-2-ru3.gen2070810.mydomain.com <none> <none>
In this example, the rook-ceph-osd-12 pod is in the CrashLoopBackOff state, which indicates that osd.12 is down and in a failed state.
Note: You can cross-check this result by running ceph osd tree in the rook-ceph-tools pod, as in the earlier step.
- Scale down the following operators in sequence. A command-line sketch follows the console steps.
  - Scale down the ocs-operator:
    - Go to the Workloads > Pods page. The ocs-operator pod appears in the Pods list.
    - Go to the Workloads > Deployments page. The ocs-operator appears in the Deployments list of the project.
    - Click ocs-operator and check the Deployment details tab. For example, the Deployment details tab displays that the number of Pods is 1.
    - Scale down the Pod count to 0.
    - Go back to the Pods tab and check for the ocs-operator. The ocs-operator pod no longer appears in the Pods list.
  - Scale down the rook-ceph-operator:
    - Go to the Workloads > Pods page. The rook-ceph-operator pod appears in the Pods list.
    - Go to the Workloads > Deployments page. The rook-ceph-operator appears in the Deployments list of the project.
    - Click rook-ceph-operator and check the Deployment details tab. For example, the Deployment details tab displays that the number of Pods is 1.
    - Scale down the Pod count to 0.
    - Go back to the Pods tab and check for the rook-ceph-operator. The rook-ceph-operator pod no longer appears in the Pods list.
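If you prefer the command line over the console, the following is a minimal sketch of the same scale-down. It assumes the default openshift-storage namespace and the deployment names ocs-operator and rook-ceph-operator that appear elsewhere in this procedure:
# Scale the operators down so that they do not re-create the OSD deployment while you work on it
oc scale deployment ocs-operator -n openshift-storage --replicas=0
oc scale deployment rook-ceph-operator -n openshift-storage --replicas=0
# Confirm that the operator pods are gone
oc get pods -n openshift-storage | grep -E 'ocs-operator|rook-ceph-operator'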
- Scale down the OSD deployment as follows:
  - Verify the OSD ID from the previous step and store it in a variable. For example, if the pod name is rook-ceph-osd-0-6c99fc999b-2s9mr, the OSD ID is 0.
osd_id_to_remove=<replace-it-with-osd-id>
  - Scale down the OSD deployment replicas to 0:
oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
Example output:
osd_id_to_remove=12
oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
deployment.apps/rook-ceph-osd-12 scaled
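As an alternative to reading the OSD ID from the pod name, the following sketch derives it from the ceph-osd-id label that later steps in this procedure also use; the pod name shown is only an example:
# Hypothetical example: read the OSD ID from the failed pod's ceph-osd-id label
failed_pod=rook-ceph-osd-12-788699f47d-9r726   # replace with the name of your failed OSD pod
osd_id_to_remove=$(oc get pod "$failed_pod" -n openshift-storage -o jsonpath='{.metadata.labels.ceph-osd-id}')
echo "$osd_id_to_remove"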
  - Wait until the rook-ceph-osd pod is terminated. Run the following oc command to confirm that no pod for the removed OSD ID is still running:
oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
Example output:NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES rook-ceph-osd-0-6bd74f8f7-28z9b 2/2 Running 0 60d 172.18.78.23 compute-1-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-1-f58fcdc48-q9ddv 2/2 Running 0 60d 172.18.78.62 compute-3-ru3.gen2070810.mydomain.com <none> <none> rook-ceph-osd-10-55ccb7d855-lbpzt 2/2 Running 0 60d 172.18.78.21 control-1-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-11-5b5ff878f4-p55wr 2/2 Running 0 60d 172.18.78.41 control-2-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-13-94c9c995b-cjndf 2/2 Running 2 7d5h 172.18.78.63 compute-3-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-14-645d67d598-fp8c8 2/2 Running 0 60d 172.18.78.41 control-2-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-2-856fb4c7d5-5m2rr 2/2 Running 0 60d 172.18.78.43 compute-2-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-3-545f957b8f-cj98t 2/2 Running 0 60d 172.18.78.23 compute-1-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-4-6b56d59b7c-r2qmv 2/2 Running 0 60d 172.18.78.43 compute-2-ru4.gen2070810.mydomain.com <none> <none> rook-ceph-osd-5-ffdbf57b4-vkhbq 2/2 Running 0 48d 172.18.78.22 compute-1-ru3.gen2070810.mydomain.com <none> <none> rook-ceph-osd-6-7668b4b68b-4tfg8 2/2 Running 0 8d 172.18.78.61 control-3-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-7-64d9795ffc-d6bf7 2/2 Running 0 8d 172.18.78.61 control-3-ru2.gen2070810.mydomain.com <none> <none> rook-ceph-osd-8-55b798bdd5-xcnqr 2/2 Running 0 60d 172.18.78.42 compute-2-ru3.gen2070810.mydomain.com <none> <none> rook-ceph-osd-9-7745f8bd55-7zhv8 2/2 Running 0 60d 172.18.78.21 control-1-ru2.gen2070810.mydomain.com <none> <none>
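If the pod remains stuck in a Terminating state, a force delete is one option. The following is a sketch that uses standard oc delete flags; confirm that a forced deletion is acceptable in your environment before you run it:
oc delete pod -n openshift-storage -l ceph-osd-id=${osd_id_to_remove} --grace-period=0 --force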
- Do the following steps to remove the failed OSD and its old removal jobs from the cluster:
  - Delete any old or outdated ocs-osd-removal jobs. Run the following oc command to delete the ocs-osd-removal job:
oc delete -n openshift-storage job ocs-osd-removal-job
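The delete command returns a NotFound error if no old removal job exists. To check beforehand, you can list the job first:
oc get job ocs-osd-removal-job -n openshift-storage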
  - Remove the old or outdated OSD from the cluster.
Note:
    - Ensure that you set the correct osd_id_to_remove.
    - Change the FORCE_OSD_REMOVAL value to true for clusters that have only three OSDs, or in cases where the clusters have insufficient space to restore all three replicas of the data after the OSD is removed.
  - Run the following command for clusters with more than three OSDs:
oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
  - Run the following command for clusters with only three OSDs or insufficient space (force delete):
oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -
Example output:
job.batch/ocs-osd-removal-job created
- Verify the OSD removal: wait for the ocs-osd-removal-job to complete, and then run the following command to verify whether the OSD removal is complete:
oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
Example output:
NAME                        READY   STATUS      RESTARTS   AGE
ocs-osd-removal-job-s4vhc   0/1     Completed   0          24s
Run the following command to check the job logs and confirm that the removal completed:
oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'
Example output:
2025-03-25 17:49:01.967466 I | cephosd: completed removal of OSD 12
Run the following command to confirm the persistent volume (PV) state:
oc get pv -n openshift-storage | grep Released
The PVC is in the Pending state, and the PV is in the Released state.
Example output:
local-pv-3cd386dc   3576Gi   RWO   Delete   Released   openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d   ibm-spectrum-fusion-local   <unset>   70d
To locate the compute node that hosts the failed disk, use the oc command to describe the PV and check the kubernetes.io/hostname label. In the following example, the node is compute-3-ru4.gen2070810.mydomain.com (kubernetes.io/hostname=compute-3-ru4.gen2070810.mydomain.com
).oc describe pv local-pv-3cd386dc Name: local-pv-3cd386dc Labels: kubernetes.io/hostname=compute-3-ru4.gen2070810.mydomain.com storage.openshift.com/owner-kind=LocalVolumeSet storage.openshift.com/owner-name=ibm-spectrum-fusion-local storage.openshift.com/owner-namespace=openshift-local-storage Annotations: pv.kubernetes.io/bound-by-controller: yes pv.kubernetes.io/provisioned-by: local-volume-provisioner-compute-3-ru4.gen2070810.mydomain.com storage.openshift.com/device-id: nvme-SAMSUNG_MZWLO3T8HCLS-00A07_S794NC0X301493 storage.openshift.com/device-name: nvme1n1 Finalizers: [kubernetes.io/pv-protection] StorageClass: ibm-spectrum-fusion-local Status: Released Claim: openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d Reclaim Policy: Delete Access Modes: RWO VolumeMode: Block Capacity: 3576Gi Node Affinity: Required Terms: Term 0: kubernetes.io/hostname in [compute-3-ru4.gen2070810.mydomain.com] Message: Source: Type: LocalVolume (a persistent volume backed by local storage on a node) Path: /mnt/local-storage/ibm-spectrum-fusion-local/nvme-SAMSUNG_MZWLO3T8HCLS-00A07_S794NC0X301493 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning VolumeFailedDelete 30s (x26 over 6m54s) deleter Error cleaning PV "local-pv-3cd386dc": failed to get volume mode of path "/mnt/local-storage/ibm-spectrum-fusion-local/nvme-SAMSUNG_MZWLO3T8HCLS-00A07_S794NC0X301493": Directory check for "/mnt/local-storage/ibm-spectrum-fusion-local/nvme-SAMSUNG_MZWLO3T8HCLS-00A07_S794NC0X301493" failed: open /mnt/local-storage/ibm-spectrum-fusion-local/nvme-SAMSUNG_MZWLO3T8HCLS-00A07_S794NC0X301493: no such file or directory
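If you only need the node name, the following sketch reads the kubernetes.io/hostname label directly from the PV shown above:
oc get pv local-pv-3cd386dc -o jsonpath='{.metadata.labels.kubernetes\.io/hostname}'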
- Do the following steps to remove the encryption-related configuration:
  - For each of the previously identified nodes, run the following commands to find the encrypted device mapping:
oc debug node/<node name>
chroot /host
dmsetup ls | grep <pvc name>
  - Remove the mapped device:
cryptsetup luksClose --debug --verbose ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt
Example output:sh-5.1# cryptsetup luksClose --debug --verbose ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt
# Allocating crypt device context by device ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt. # Initialising device-mapper backend library. # dm version [ opencount flush ] [16384] (*1) # dm versions [ opencount flush ] [16384] (*1) # Detected dm-ioctl version 4.48.0. # Detected dm-crypt version 1.24.0. # Device-mapper backend running with UDEV support enabled. # dm status ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount noflush ] [16384] (*1) # Releasing device-mapper backend. # Allocating context for crypt device (none). # Initialising device-mapper backend library. Underlying device for crypt device ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt disappeared. # dm versions [ opencount flush ] [16384] (*1) # dm table ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount flush securedata ] [16384] (*1) # dm versions [ opencount flush ] [16384] (*1) # dm deps ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount flush ] [16384] (*1) # LUKS device header not available. # Deactivating volume ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt. # dm versions [ opencount flush ] [16384] (*1) # dm status ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount noflush ] [16384] (*1) # dm versions [ opencount flush ] [16384] (*1) # dm table ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount flush securedata ] [16384] (*1) # dm versions [ opencount flush ] [16384] (*1) # dm deps ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount flush ] [16384] (*1) # dm versions [ opencount flush ] [16384] (*1) # dm table ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount flush securedata ] [16384] (*1) # dm versions [ opencount flush ] [16384] (*1) # Udev cookie 0xd4d0a0b (semid 0) created # Udev cookie 0xd4d0a0b (semid 0) incremented to 1 # Udev cookie 0xd4d0a0b (semid 0) incremented to 2 # Udev cookie 0xd4d0a0b (semid 0) assigned to REMOVE task(2) with flags DISABLE_LIBRARY_FALLBACK (0x20) # dm remove ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d-block-dmcrypt [ opencount flush retryremove ] [16384] (*1) # Udev cookie 0xd4d0a0b (semid 0) decremented to 0 # Udev cookie 0xd4d0a0b (semid 0) waiting for zero # Udev cookie 0xd4d0a0b (semid 0) destroyed # Releasing crypt device empty context. # Releasing device-mapper backend.
- Identify the PV to delete. Run the following oc command to find the failed PV:
oc get pv -l kubernetes.io/hostname=<failed-osds-worker-node-name>
Example output:
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                                      STORAGECLASS                VOLUMEATTRIBUTESCLASS   REASON   AGE
local-pv-3cd386dc   3576Gi     RWO            Delete           Released   openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-35d72d   ibm-spectrum-fusion-local   <unset>                          70d
local-pv-7a941968   3576Gi     RWO            Delete           Bound      openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-2-data-1t2lbp   ibm-spectrum-fusion-local   <unset>                          70d
- Delete the released PV. Run the following oc command to delete the released PV:
oc delete pv <pv_name>
Example output:sh-5.1$ ceph -s cluster: id: dff472f9-a62c-4a68-8cdb-3de2cc62de9a health: HEALTH_WARN 1 daemons have recently crashed services: mon: 3 daemons, quorum a,b,c (age 8w) mgr: b(active, since 7w), standbys: a mds: 1/1 daemons up, 1 hot standby osd: 14 osds: 14 up (since 47m), 14 in (since 37m) rgw: 1 daemon active (1 hosts, 1 zones) data: volumes: 1/1 healthy pools: 12 pools, 393 pgs objects: 66.06k objects, 234 GiB usage: 707 GiB used, 48 TiB / 49 TiB avail pgs: 393 active+clean io: client: 56 KiB/s rd, 1.9 MiB/s wr, 9 op/s rd, 240 op/s wr sh-5.1$ ^C sh-5.1$ ^C sh-5.1$ ^C sh-5.1$ ^C sh-5.1$ ^C sh-5.1$ ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 48.90323 root default -4 17.46544 rack gen2007 -15 3.49309 host compute-1-ru3-gen2070810-mydomain-com 5 ssd 3.49309 osd.5 up 1.00000 1.00000 -3 6.98618 host compute-1-ru4-gen2070810-mydomain-com 0 ssd 3.49309 osd.0 up 1.00000 1.00000 3 ssd 3.49309 osd.3 up 1.00000 1.00000 -19 6.98618 host control-1-ru2-gen2070810-mydomain-com 9 ssd 3.49309 osd.9 up 1.00000 1.00000 10 ssd 3.49309 osd.10 up 1.00000 1.00000 -12 17.46544 rack gen2008 -21 3.49309 host compute-2-ru3-gen2070810-mydomain-com 8 ssd 3.49309 osd.8 up 1.00000 1.00000 -11 6.98618 host compute-2-ru4-gen2070810-mydomain-com 2 ssd 3.49309 osd.2 up 1.00000 1.00000 4 ssd 3.49309 osd.4 up 1.00000 1.00000 -23 6.98618 host control-2-ru2-gen2070810-mydomain-com 11 ssd 3.49309 osd.11 up 1.00000 1.00000 14 ssd 3.49309 osd.14 up 1.00000 1.00000 -8 13.97235 rack gen2010 -7 3.49309 host compute-3-ru3-gen2070810-mydomain-com 1 ssd 3.49309 osd.1 up 1.00000 1.00000 -25 3.49309 host compute-3-ru4-gen2070810-mydomain-com 13 ssd 3.49309 osd.13 up 1.00000 1.00000 -17 6.98618 host control-3-ru2-gen2070810-mydomain-com 6 ssd 3.49309 osd.6 up 1.00000 1.00000 7 ssd 3.49309 osd.7 up 1.00000 1.00000
- Add the new OSD into the node.
If you want to reuse the same disk on the node rather than replace it, clean the disk before you add it back to the node; only then does the corresponding PV transition to the Available state. To clean the disk, run the following commands.
Note: Replace the value of DISK with the device name of the disk that you add back. A sketch for listing the device names follows the cleanup commands.
DISK="/dev/sdX"
# Zap the disk to a fresh, usable state (zap-all is important because the MBR has to be clean)
sgdisk --zap-all $DISK
# Wipe portions of the disk to remove more LVM metadata that may be present
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=0                   # Clear at offset 0
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((1 * 1024**2))    # Clear at offset 1GB
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((10 * 1024**2))   # Clear at offset 10GB
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((100 * 1024**2))  # Clear at offset 100GB
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((1000 * 1024**2)) # Clear at offset 1000GB
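To find the device name to use for DISK, you can list the block devices on the node first. The following sketch uses the same oc debug pattern that appears later in this procedure:
oc debug node/<node name> -- chroot /host lsblk -o NAME,SIZE,SERIAL,MOUNTPOINT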
To verify the PV state before you add the new OSD, run the following command:
oc get pv |grep local
The following is an example output that shows the PV state before the OSD is added:local-pv-159e1b8e 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-11ms68d ibm-spectrum-fusion-local <unset> 37d local-pv-2ac8a85b 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-2xdlrk ibm-spectrum-fusion-local <unset> 37d local-pv-406f1af8 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-128sb6h ibm-spectrum-fusion-local <unset> 37d local-pv-451f6341 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-31wkjwp ibm-spectrum-fusion-local <unset> 37d local-pv-4ce100b9 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-26px6qd ibm-spectrum-fusion-local <unset> 37d local-pv-521ee04f 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-30g5vj5 ibm-spectrum-fusion-local <unset> 37d local-pv-55a0cd22 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-4t8t9v ibm-spectrum-fusion-local <unset> 37d local-pv-566149be 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-19kg9bc ibm-spectrum-fusion-local <unset> 37d local-pv-566e5661 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1349xzd ibm-spectrum-fusion-local <unset> 37d local-pv-5b320962 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-29w8c49 ibm-spectrum-fusion-local <unset> 37d local-pv-5ca4de4d 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-276cwj9 ibm-spectrum-fusion-local <unset> 37d local-pv-66b9a934 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1mglzb ibm-spectrum-fusion-local <unset> 37d local-pv-683f9dbf 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-695cp5 ibm-spectrum-fusion-local <unset> 37d local-pv-6ba1664e 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-334h7jp ibm-spectrum-fusion-local <unset> 22d local-pv-753a64df 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-23xlj2c ibm-spectrum-fusion-local <unset> 37d local-pv-89429641 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-9tc4vl ibm-spectrum-fusion-local <unset> 37d local-pv-93967824 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-15vv92v ibm-spectrum-fusion-local <unset> 37d local-pv-95ccaa7e 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-2025lf6 ibm-spectrum-fusion-local <unset> 37d local-pv-9e64c170 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1087jgw ibm-spectrum-fusion-local <unset> 37d local-pv-a3bb3040 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-5dtrs9 ibm-spectrum-fusion-local <unset> 37d local-pv-a6c9cfdb 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-34vx4dh ibm-spectrum-fusion-local <unset> 23h local-pv-b7c4932c 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-2825ldc ibm-spectrum-fusion-local <unset> 37d local-pv-bb8d453c 7153Gi RWO Delete Bound 
openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-07hjl5 ibm-spectrum-fusion-local <unset> 37d local-pv-bbc6335b 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-14tklz4 ibm-spectrum-fusion-local <unset> 37d local-pv-c47077d7 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-8mtkd2 ibm-spectrum-fusion-local <unset> 37d local-pv-c89629e0 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-21hqwg9 ibm-spectrum-fusion-local <unset> 37d
After you add the disk back, you can see that one PV is in the Available state. To verify the PV state, run the following command:
oc get pv | grep local
The following is an example output where the new PV is in the Available
state and no new OSDs formed in pods.local-pv-159e1b8e 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-11ms68d ibm-spectrum-fusion-local <unset> 37d local-pv-2ac8a85b 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-2xdlrk ibm-spectrum-fusion-local <unset> 37d local-pv-406f1af8 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-128sb6h ibm-spectrum-fusion-local <unset> 37d local-pv-451f6341 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-31wkjwp ibm-spectrum-fusion-local <unset> 37d local-pv-4ce100b9 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-26px6qd ibm-spectrum-fusion-local <unset> 37d local-pv-4da6a449 7153Gi RWO Delete Available ibm-spectrum-fusion-local <unset> 2m23s local-pv-521ee04f 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-30g5vj5 ibm-spectrum-fusion-local <unset> 37d local-pv-55a0cd22 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-4t8t9v ibm-spectrum-fusion-local <unset> 37d local-pv-566149be 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-19kg9bc ibm-spectrum-fusion-local <unset> 37d local-pv-566e5661 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1349xzd ibm-spectrum-fusion-local <unset> 37d local-pv-5b320962 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-29w8c49 ibm-spectrum-fusion-local <unset> 37d local-pv-5ca4de4d 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-276cwj9 ibm-spectrum-fusion-local <unset> 37d local-pv-66b9a934 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1mglzb ibm-spectrum-fusion-local <unset> 37d local-pv-683f9dbf 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-695cp5 ibm-spectrum-fusion-local <unset> 37d local-pv-6ba1664e 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-334h7jp ibm-spectrum-fusion-local <unset> 22d local-pv-753a64df 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-23xlj2c ibm-spectrum-fusion-local <unset> 37d local-pv-89429641 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-9tc4vl ibm-spectrum-fusion-local <unset> 37d local-pv-93967824 7153Gi RWO Delete Bound openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-15vv92v ibm-spectrum-fusion-local <unset>
Run the following command to list pods associated with OSDs:
oc get pods | grep osd
Example output:ocs-osd-removal-job-rh72h 0/1 Completed 0 54m rook-ceph-osd-0-85fcb5fd9-5cvws 2/2 Running 0 9d rook-ceph-osd-1-5dd8bc8d9d-cbz7g 2/2 Running 0 37d rook-ceph-osd-10-7cc49487b5-bdw6w 2/2 Running 0 9d rook-ceph-osd-11-84cd6fb7d7-xpn22 2/2 Running 0 37d rook-ceph-osd-12-7bb579498c-25tzh 2/2 Running 0 9d rook-ceph-osd-13-866dbc7f57-kqqpn 2/2 Running 0 37d rook-ceph-osd-14-6f7f6dd89b-7skg8 2/2 Running 0 37d rook-ceph-osd-15-697c5b8577-9plff 2/2 Running 0 8h rook-ceph-osd-16-8d78df98c-khbh7 2/2 Running 0 37d rook-ceph-osd-17-c8ffbb5bf-q4xqb 2/2 Running 0 37d rook-ceph-osd-18-65f847c8d4-zpz2f 2/2 Running 0 37d rook-ceph-osd-2-5fd5548c56-gclrv 2/2 Running 0 37d rook-ceph-osd-20-b88868db5-h55h9 2/2 Running 0 37d rook-ceph-osd-21-556947cb6f-k29mm 2/2 Running 0 37d rook-ceph-osd-22-5d79578d54-b9qlb 2/2 Running 0 37d rook-ceph-osd-23-84bcb674c5-6jzxx 2/2 Running 0 37d rook-ceph-osd-24-6df4949c8c-qvczd 2/2 Running 0 37d rook-ceph-osd-26-5d7b7b6f46-vqgz4 2/2 Running 0 46h rook-ceph-osd-27-59b8c85b78-wz4pb 2/2 Running 0 47h rook-ceph-osd-28-54dc8db898-ml8r5 2/2 Running 0 47h rook-ceph-osd-29-5666b4975c-qrpm5 2/2 Running 0 47h rook-ceph-osd-3-7cfc7899dc-px8xm 2/2 Running 0 9d rook-ceph-osd-30-7cc88bfcfc-xsnvc 2/2 Running 0 37d rook-ceph-osd-31-78cc567d99-8jhkh 2/2 Running 0 37d rook-ceph-osd-32-6c5fcdfb45-4jz2q 2/2 Running 0 47h rook-ceph-osd-33-694f48d4f8-4sflf 2/2 Running 0 22d rook-ceph-osd-4-6f4fb99d7-w7vrb 2/2 Running 0 37d rook-ceph-osd-5-7b7ff5cfb5-blpj7 2/2 Running 0 37d rook-ceph-osd-6-5d9856b54f-bjpmm 2/2 Running 0 9d rook-ceph-osd-7-7f79d4b76-jxz8x 2/2 Running 0 37d rook-ceph-osd-8-75497d6999-wz296 2/2 Running 0 37d rook-ceph-osd-9-56d5cc7c59-w29n2 2/2 Running 0 37d rook-ceph-osd-key-rotation-0-29054880-fvbmb 0/1 Completed 0 4d14h rook-ceph-osd-key-rotation-1-29034720-jx9wf 0/1 Completed 0 18d rook-ceph-osd-key-rotation-1-29044800-fnww9 0/1 Completed 0 11d
- Scale up the following operators in sequence. A command-line sketch follows the console steps.
  - Scale up the ocs-operator:
    - Go to the Workloads > Deployments page. The ocs-operator appears in the Deployments list of the project.
    - Click ocs-operator and check the Deployment details tab. For example, the Deployment details tab displays that the number of Pods is 0.
    - Scale up the Pod count to 1.
    - Go to the Workloads > Pods page. The ocs-operator pod appears in the Pods list with the status Running.
  - Scale up the rook-ceph-operator:
    - Go to the Workloads > Deployments page. The rook-ceph-operator appears in the Deployments list of the project.
    - Click rook-ceph-operator and check the Deployment details tab. For example, the Deployment details tab displays that the number of Pods is 0.
    - Scale up the Pod count to 1.
    - Go to the Workloads > Pods page. The rook-ceph-operator pod appears in the Pods list with the status Running.
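If you prefer the command line, the following sketch mirrors the earlier scale-down commands, with the same assumptions about the namespace and deployment names:
oc scale deployment ocs-operator -n openshift-storage --replicas=1
oc scale deployment rook-ceph-operator -n openshift-storage --replicas=1
# Confirm that the operator pods are running again
oc get pods -n openshift-storage | grep -E 'ocs-operator|rook-ceph-operator'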
- Check the Ceph status:
  - Run the following command to check the health and performance of the storage cluster:
ceph -s
Example output:cluster: id: 4141c39c-14e1-478f-9cb1-3a8d8a57d22c health: HEALTH_WARN 1 daemons have recently crashed services: mon: 3 daemons, quorum a,c,d (age 47h) mgr: b(active, since 49s), standbys: a mds: 1/1 daemons up, 1 hot standby osd: 33 osds: 33 up (since 12s), 33 in (since 56s); 177 remapped pgs rgw: 1 daemon active (1 hosts, 1 zones) data: volumes: 1/1 healthy pools: 12 pools, 1289 pgs objects: 12.21M objects, 46 TiB usage: 139 TiB used, 91 TiB / 231 TiB avail pgs: 0.233% pgs not active 2115238/36626691 objects misplaced (5.775%) 1108 active+clean 160 active+remapped+backfill_wait 12 active+remapped+backfilling 5 active+remapped 3 peering 1 active+clean+remapped io: client: 6.2 KiB/s rd, 980 KiB/s wr, 7 op/s rd, 59 op/s wr recovery: 859 MiB/s, 53 keys/s, 1.70k objects/s
  - Run the following command to view a hierarchical representation of your Ceph storage cluster:
ceph osd tree
Example output:ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 230.54782 root default -7 34.93149 host compute-1-ru5-isf-racka-rtp-raleigh-ibm-com 2 ssd 6.98630 osd.2 up 1.00000 1.00000 5 ssd 6.98630 osd.5 up 1.00000 1.00000 8 ssd 6.98630 osd.8 up 1.00000 1.00000 11 ssd 6.98630 osd.11 up 1.00000 1.00000 14 ssd 6.98630 osd.14 up 1.00000 1.00000 -3 34.93149 host compute-1-ru6-isf-racka-rtp-raleigh-ibm-com 1 ssd 6.98630 osd.1 up 1.00000 1.00000 4 ssd 6.98630 osd.4 up 1.00000 1.00000 7 ssd 6.98630 osd.7 up 1.00000 1.00000 9 ssd 6.98630 osd.9 up 1.00000 1.00000 13 ssd 6.98630 osd.13 up 1.00000 1.00000 -5 34.93149 host compute-1-ru7-isf-racka-rtp-raleigh-ibm-com 0 ssd 6.98630 osd.0 up 1.00000 1.00000 3 ssd 6.98630 osd.3 up 1.00000 1.00000 6 ssd 6.98630 osd.6 up 1.00000 1.00000 10 ssd 6.98630 osd.10 up 1.00000 1.00000 12 ssd 6.98630 osd.12 up 1.00000 1.00000 -11 41.91779 host control-1-ru2-isf-racka-rtp-raleigh-ibm-com 16 ssd 6.98630 osd.16 up 1.00000 1.00000 17 ssd 6.98630 osd.17 up 1.00000 1.00000 18 ssd 6.98630 osd.18 up 1.00000 1.00000 19 ssd 6.98630 osd.19 up 1.00000 1.00000 30 ssd 6.98630 osd.30 up 1.00000 1.00000 33 ssd 6.98630 osd.33 up 1.00000 1.00000 -13 41.91779 host control-1-ru3-isf-racka-rtp-raleigh-ibm-com 15 ssd 6.98630 osd.15 up 1.00000 1.00000 26 ssd 6.98630 osd.26 up 1.00000 1.00000 27 ssd 6.98630 osd.27 up 1.00000 1.00000 28 ssd 6.98630 osd.28 up 1.00000 1.00000 29 ssd 6.98630 osd.29 up 1.00000 1.00000 32 ssd 6.98630 osd.32 up 1.00000 1.00000 -9 41.91779 host control-1-ru4-isf-racka-rtp-raleigh-ibm-com 20 ssd 6.98630 osd.20 up 1.00000 1.00000 21 ssd 6.98630 osd.21 up 1.00000 1.00000 22 ssd 6.98630 osd.22 up 1.00000 1.00000 23 ssd 6.98630 osd.23 up 1.00000 1.00000 24 ssd 6.98630 osd.24 up 1.00000 1.00000 31 ssd 6.98630 osd.31 up 1.00000 1.00000
  - Run the following command to display disk usage and performance information for each OSD in your Ceph storage cluster:
ceph osd df
The following is an example output where osd.19 has come back online and the rook-ceph-osd-19
pod has restarted successfully.ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS 2 ssd 6.98630 1.00000 7.0 TiB 4.4 TiB 4.4 TiB 292 KiB 6.6 GiB 2.6 TiB 63.36 1.05 129 up 5 ssd 6.98630 1.00000 7.0 TiB 4.5 TiB 4.5 TiB 311 KiB 6.8 GiB 2.5 TiB 63.84 1.06 128 up 8 ssd 6.98630 1.00000 7.0 TiB 4.5 TiB 4.5 TiB 278 KiB 7.1 GiB 2.4 TiB 65.10 1.08 129 up 11 ssd 6.98630 1.00000 7.0 TiB 4.3 TiB 4.3 TiB 314 KiB 6.5 GiB 2.7 TiB 61.33 1.02 116 up 14 ssd 6.98630 1.00000 7.0 TiB 4.3 TiB 4.3 TiB 458 KiB 6.7 GiB 2.7 TiB 62.06 1.03 118 up 1 ssd 6.98630 1.00000 7.0 TiB 4.5 TiB 4.5 TiB 331 KiB 6.3 GiB 2.5 TiB 64.03 1.06 121 up 4 ssd 6.98630 1.00000 7.0 TiB 4.5 TiB 4.5 TiB 504 KiB 6.7 GiB 2.5 TiB 63.92 1.06 124 up 7 ssd 6.98630 1.00000 7.0 TiB 4.2 TiB 4.2 TiB 265 KiB 6.5 GiB 2.8 TiB 60.14 1.00 116 up 9 ssd 6.98630 1.00000 7.0 TiB 4.4 TiB 4.4 TiB 303 KiB 6.8 GiB 2.6 TiB 63.32 1.05 116 up 13 ssd 6.98630 1.00000 7.0 TiB 4.5 TiB 4.5 TiB 360 KiB 6.9 GiB 2.5 TiB 64.47 1.07 126 up 0 ssd 6.98630 1.00000 7.0 TiB 4.6 TiB 4.6 TiB 544 KiB 7.0 GiB 2.4 TiB 65.25 1.08 122 up 3 ssd 6.98630 1.00000 7.0 TiB 4.4 TiB 4.4 TiB 500 KiB 6.8 GiB 2.6 TiB 63.21 1.05 124 up 6 ssd 6.98630 1.00000 7.0 TiB 4.5 TiB 4.5 TiB 270 KiB 7.0 GiB 2.5 TiB 63.95 1.06 126 up 10 ssd 6.98630 1.00000 7.0 TiB 4.6 TiB 4.6 TiB 389 KiB 7.1 GiB 2.3 TiB 66.55 1.10 133 up 12 ssd 6.98630 1.00000 7.0 TiB 4.2 TiB 4.2 TiB 325 KiB 6.7 GiB 2.7 TiB 60.70 1.01 116 up 16 ssd 6.98630 1.00000 7.0 TiB 4.4 TiB 4.4 TiB 293 KiB 6.8 GiB 2.6 TiB 63.34 1.05 122 up 17 ssd 6.98630 1.00000 7.0 TiB 4.7 TiB 4.7 TiB 331 KiB 7.1 GiB 2.3 TiB 67.47 1.12 120 up 18 ssd 6.98630 1.00000 7.0 TiB 4.4 TiB 4.4 TiB 315 KiB 6.9 GiB 2.5 TiB 63.61 1.05 121 up 19 ssd 6.98630 1.00000 7.0 TiB 1.7 GiB 1.5 GiB 1 KiB 162 MiB 7.0 TiB 0.02 0 25 up 30 ssd 6.98630 1.00000 7.0 TiB 4.4 TiB 4.4 TiB 440 KiB 6.7 GiB 2.6 TiB 62.67 1.04 123 up 33 ssd 6.98630 1.00000 7.0 TiB 4.2 TiB 4.2 TiB 325 KiB 6.4 GiB 2.8 TiB 60.20 1.00 117 up 15 ssd 6.98630 1.00000 7.0 TiB 2.4 TiB 2.4 TiB 283 KiB 4.2 GiB 4.6 TiB 34.58 0.57 72 up 26 ssd 6.98630 1.00000 7.0 TiB 4.7 TiB 4.7 TiB 409 KiB 7.1 GiB 2.3 TiB 66.84 1.11 127 up 27 ssd 6.98630 1.00000 7.0 TiB 4.6 TiB 4.6 TiB 397 KiB 7.2 GiB 2.4 TiB 66.10 1.10 123 up 28 ssd 6.98630 1.00000 7.0 TiB 4.5 TiB 4.5 TiB 388 KiB 7.0 GiB 2.5 TiB 63.90 1.06 124 up 29 ssd 6.98630 1.00000 7.0 TiB 4.6 TiB 4.6 TiB 411 KiB 7.0 GiB 2.4 TiB 65.25 1.08 134 up 32 ssd 6.98630 1.00000 7.0 TiB 4.3 TiB 4.3 TiB 476 KiB 6.7 GiB 2.7 TiB 61.44 1.02 113 up 20 ssd 6.98630 1.00000 7.0 TiB 4.4 TiB 4.4 TiB 294 KiB 6.9 GiB 2.5 TiB 63.67 1.06 121 up 21 ssd 6.98630 1.00000 7.0 TiB 4.2 TiB 4.1 TiB 307 KiB 6.6 GiB 2.8 TiB 59.48 0.99 115 up 22 ssd 6.98630 1.00000 7.0 TiB 4.3 TiB 4.3 TiB 398 KiB 6.7 GiB 2.6 TiB 62.09 1.03 117 up 23 ssd 6.98630 1.00000 7.0 TiB 4.1 TiB 4.1 TiB 317 KiB 6.6 GiB 2.9 TiB 58.62 0.97 114 up 24 ssd 6.98630 1.00000 7.0 TiB 4.2 TiB 4.2 TiB 298 KiB 6.5 GiB 2.8 TiB 60.17 1.00 116 up 31 ssd 6.98630 1.00000 7.0 TiB 4.2 TiB 4.2 TiB 248 KiB 6.1 GiB 2.8 TiB 60.06 1.00 119 up TOTAL 231 TiB 139 TiB 139 TiB 11 MiB 214 GiB 91 TiB 60.33 MIN/MAX VAR: 0/1.12 STDDEV: 11.93
- Verify whether the new OSD pod is running. Run the following oc command to check whether the new OSD pod is in the Running state:
oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
Example output:NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES rook-ceph-osd-0-85fcb5fd9-5cvws 2/2 Running 0 9d 9.42.107.150 compute-1-ru7.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-1-5dd8bc8d9d-cbz7g 2/2 Running 0 37d 9.42.107.149 compute-1-ru6.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-10-7cc49487b5-bdw6w 2/2 Running 0 9d 9.42.107.150 compute-1-ru7.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-11-84cd6fb7d7-xpn22 2/2 Running 0 37d 9.42.107.148 compute-1-ru5.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-12-7bb579498c-25tzh 2/2 Running 0 9d 9.42.107.150 compute-1-ru7.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-13-866dbc7f57-kqqpn 2/2 Running 0 37d 9.42.107.149 compute-1-ru6.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-14-6f7f6dd89b-7skg8 2/2 Running 0 37d 9.42.107.148 compute-1-ru5.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-15-697c5b8577-9plff 2/2 Running 0 8h 9.42.107.146 control-1-ru3.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-16-8d78df98c-khbh7 2/2 Running 0 37d 9.42.107.145 control-1-ru2.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-17-c8ffbb5bf-q4xqb 2/2 Running 0 37d 9.42.107.145 control-1-ru2.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-18-65f847c8d4-zpz2f 2/2 Running 0 37d 9.42.107.145 control-1-ru2.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-19-74cc9c8bc6-cc8p4 2/2 Running 0 111s 9.42.107.145 control-1-ru2.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-2-5fd5548c56-gclrv 2/2 Running 0 37d 9.42.107.148 compute-1-ru5.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-20-b88868db5-h55h9 2/2 Running 0 37d 9.42.107.147 control-1-ru4.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-21-556947cb6f-k29mm 2/2 Running 0 37d 9.42.107.147 control-1-ru4.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-22-5d79578d54-b9qlb 2/2 Running 0 37d 9.42.107.147 control-1-ru4.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-23-84bcb674c5-6jzxx 2/2 Running 0 37d 9.42.107.147 control-1-ru4.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-24-6df4949c8c-qvczd 2/2 Running 0 37d 9.42.107.147 control-1-ru4.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-26-5d7b7b6f46-vqgz4 2/2 Running 0 47h 9.42.107.146 control-1-ru3.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-27-59b8c85b78-wz4pb 2/2 Running 0 47h 9.42.107.146 control-1-ru3.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-28-54dc8db898-ml8r5 2/2 Running 0 47h 9.42.107.146 control-1-ru3.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-29-5666b4975c-qrpm5 2/2 Running 0 47h 9.42.107.146 control-1-ru3.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-3-7cfc7899dc-px8xm 2/2 Running 0 9d 9.42.107.150 compute-1-ru7.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-30-7cc88bfcfc-xsnvc 2/2 Running 0 37d 9.42.107.145 control-1-ru2.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-31-78cc567d99-8jhkh 2/2 Running 0 37d 9.42.107.147 control-1-ru4.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-32-6c5fcdfb45-4jz2q 2/2 Running 0 47h 9.42.107.146 control-1-ru3.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-33-694f48d4f8-4sflf 2/2 Running 0 22d 9.42.107.145 control-1-ru2.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-4-6f4fb99d7-w7vrb 2/2 Running 0 37d 9.42.107.149 compute-1-ru6.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-5-7b7ff5cfb5-blpj7 2/2 
Running 0 37d 9.42.107.148 compute-1-ru5.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-6-5d9856b54f-bjpmm 2/2 Running 0 9d 9.42.107.150 compute-1-ru7.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-7-7f79d4b76-jxz8x 2/2 Running 0 37d 9.42.107.149 compute-1-ru6.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-8-75497d6999-wz296 2/2 Running 0 37d 9.42.107.148 compute-1-ru5.isf-racka.rtp.raleigh.ibm.com <none> <none> rook-ceph-osd-9-56d5cc7c59-w29n2 2/2 Running 0 37d 9.42.107.149 compute-1-ru6.isf-racka.rtp.raleigh.ibm.com <none> <none>
- Verify whether the new PV and PVC are created. Command example:
oc get pvc -n openshift-storage | grep local-pv-4da6a449
Example output:
ocs-deviceset-ibm-spectrum-fusion-local-0-data-35tw7jf   Bound   local-pv-4da6a449   7153Gi   RWO   ibm-spectrum-fusion-local   <unset>   13h
oc get pv -n openshift-storage | grep local-pv-4da6a449
local-pv-4da6a449   7153Gi   RWO   Delete   Bound   openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-35tw7jf   ibm-spectrum-fusion-local   <unset>   13h
- Verify the OSD encryption settings.
Note: If cluster-wide encryption is enabled, ensure that the crypt keyword appears next to the ocs-deviceset name.
Run the following oc command to verify the status of the new node:
oc debug node/<new-node-name> -- chroot /host lsblk -f
Example output:nvme1n1 crypto_LUKS 2 pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca 48ac3c60-da64-4131-b83f-c5bb9313ef70 `-ocs-deviceset-ibm-spectrum-fusion-local-0-data-35tw7jf-block-dmcrypt ceph_bluestore nvme2n1 crypto_LUKS 2 pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca c087e8c0-cdda-48c8-8eee-6e378bbbb4c5 `-ocs-deviceset-ibm-spectrum-fusion-local-0-data-9tc4vl-block-dmcrypt ceph_bluestore nvme0n1 crypto_LUKS 2 pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca ded9f6a0-b4b3-4e52-b07c-5221310cb63f `-ocs-deviceset-ibm-spectrum-fusion-local-0-data-26px6qd-block-dmcrypt ceph_bluestore nvme4n1 crypto_LUKS 2 pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca 9246abee-ffec-4fe4-9ad3-9da4a66c5a3c `-ocs-deviceset-ibm-spectrum-fusion-local-0-data-29w8c49-block-dmcrypt ceph_bluestore nvme5n1 crypto_LUKS 2 pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca a76bfe54-f50b-431a-85bb-975b814fa8b6 `-ocs-deviceset-ibm-spectrum-fusion-local-0-data-17k74xw-block-dmcrypt ceph_bluestore nvme6n1 crypto_LUKS 2 pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca 266751d6-6fcf-4d7a-a1bb-236118b0c535 `-ocs-deviceset-ibm-spectrum-fusion-local-0-data-334h7jp-block-dmcrypt ceph_bluestore
The nvme1n1 device is up and encrypted.
Run the following oc command to verify the status of the new PVC:
oc debug node/control-1-ru2.isf-racka.rtp.raleigh.ibm.com -- chroot /host dmsetup ls | grep <new_PVC_name>
Example output:dmsetup ls |grep ocs-deviceset-ibm-spectrum-fusion-local-0-data-35tw7jf Starting pod/control-1-ru2isf-rackartpraleighibmcom-debug-k86s5 ... To use host binaries, run `chroot /host` Removing debug pod ... ocs-deviceset-ibm-spectrum-fusion-local-0-data-35tw7jf-block-dmcrypt (253:3)
- Go to the IBM Fusion user interface and check the health of the storage.
- In the Red Hat OpenShift console, go to Storage > Data Foundation and check the health of the storage. A healthy status confirms that the disk replacement was successful.