Red Hat OpenShift Data Foundation storage node failure
You can replace a node proactively (for an operational node) or reactively (for a failed node). For a failed node that is backed by local storage devices, you must replace the Red Hat® OpenShift® Data Foundation storage node.
- Before you begin
- Red Hat recommends that replacement nodes are configured with infrastructure, resources, and disks that are similar to the node that is being replaced.
Note: Contact IBM Support before you proceed with any of these fixes.
- Procedure
- Do the following steps to confirm a Red Hat OpenShift Data Foundation storage node failure and to identify the failed node:
- Set the Red Hat OpenShift Data Foundation cluster to maintenance mode:
oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode=true"
Example output:
[root@fu40 ~]# oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode=true"
odfcluster.odf.isf.ibm.com/odfcluster labeled
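If you want to confirm from the command line that the maintenance mode label is applied, one optional check (not part of the documented procedure) is to display the labels on the odfcluster CR:
oc get odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster --show-labels
# the label list should include odf.isf.ibm.com/maintenanceMode=true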
- Identify the failed node:
- Log in to the IBM Storage Fusion user interface.
- Go to the Data foundation page and check for a warning in the Health section for the storage cluster.
Alternatively, you can use the oc command to identify the node:
oc get node -l cluster.ocs.openshift.io/openshift-storage=
Sample output:
[root@fu71-f09-vm3 ~]# oc get node -l cluster.ocs.openshift.io/openshift-storage=
NAME                               STATUS     ROLES    AGE   VERSION
f09-prc4m-worker-cluster-b-9chb5   NotReady   worker   27d   v1.24.0+4f0dd4d
f09-prc4m-worker-cluster-c-mfb77   Ready      worker   31d   v1.24.0+4f0dd4d
f09-prc4m-worker-cluster-d-r5bxx   Ready      worker   27d   v1.24.0+4f0dd4d
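As a convenience, you can capture the failed node name in a shell variable for the later commands that take <node_name>. This is a minimal sketch that assumes exactly one storage node reports NotReady; verify the value before you use it:
FAILED_NODE=$(oc get node -l cluster.ocs.openshift.io/openshift-storage= --no-headers | awk '$2=="NotReady" {print $1}')
echo "${FAILED_NODE}"    # should print the failed node, for example f09-prc4m-worker-cluster-b-9chb5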
- Identify the failed mon (if any) and OSD pods that are running on the node that is planned for replacement. In an operational storage node environment, run:
oc get pods -n openshift-storage -o wide | grep -i <node_name>
- If the storage node failed, there is no node_name for the failed pods. Filter the Pending pods instead:
oc get pods -n openshift-storage -o wide | grep -i pending
Example output: The mon deployment is rook-ceph-mon-d, and the OSD deployment is rook-ceph-osd-0.
[root@fu71-f09-vm3 ~]# oc get pods -n openshift-storage -o wide | grep -i pending
rook-ceph-mon-d-67686857d7-zv62c   0/2   Pending   0   8m50s   <none>   <none>   <none>   <none>
rook-ceph-osd-0-75b954c9bf-62xm4   0/2   Pending   0   8m50s   <none>   <none>   <none>   <none>
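If you prefer not to filter with grep, an equivalent check (a sketch that uses the standard Kubernetes field selector rather than a command from this procedure) lists only the Pending pods directly:
oc get pods -n openshift-storage --field-selector=status.phase=Pending -o wide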
- Remove the failed objects.
- Remove the failed node from the odfcluster CR
- Edit the odfcluster CR (oc edit odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster) and remove the failed node from the storageNodes list:
spec:
  autoScaleUp: false
  creator: CreatedByFusion
  deviceSets:
  - capacity: "0"
    count: 3
    name: ocs-deviceset-ibm-spectrum-fusion-local
    storageClass: ibm-spectrum-fusion-local
  encryption:
    keyManagementService: {}
  localVolumeSetSpec:
    deviceTypes:
    - disk
    - part
    size: 2Ti
  storageNodes:
  - f09-prc4m-worker-cluster-d-r5bxx
  - f09-prc4m-worker-cluster-c-mfb77
  - f09-prc4m-worker-cluster-b-9chb5   <<-- remove this line
- Remove the mon and OSD pods
- Scale down the deployments of the identified pods. In this example, the mon deployment is rook-ceph-mon-d and the OSD deployment is rook-ceph-osd-0. Ensure that you confirm the values of mon_id and osd_id before you run the commands.
oc scale deployment rook-ceph-mon-d --replicas=0 -n openshift-storage
oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage
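Optionally, confirm that both deployments are scaled down before you continue; this check simply reuses the deployment names from this example:
oc get deployment rook-ceph-mon-d rook-ceph-osd-0 -n openshift-storage
# both deployments should report READY 0/0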
- Remove the crashcollector pods
- Remove the crashcollector pods (if any) by scaling their deployment replicas to 0:
oc scale deployment --selector=app=rook-ceph-crashcollector,node_name=<node_name> --replicas=0 -n openshift-storage
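To verify the result, you can list the deployments that match the same label selector that is used in the scale command; this is an optional check:
oc get deployment -n openshift-storage --selector=app=rook-ceph-crashcollector,node_name=<node_name>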
- Mark the failed node as unschedulable
- Mark the node as SchedulingDisabled:
oc adm cordon <node_name>
Example command and output:
oc adm cordon f09-prc4m-worker-cluster-b-9chb5
node/f09-prc4m-worker-cluster-b-9chb5 cordoned
oc get node -l cluster.ocs.openshift.io/openshift-storage=
NAME                               STATUS                        ROLES    AGE   VERSION
f09-prc4m-worker-cluster-b-9chb5   NotReady,SchedulingDisabled   worker   28d   v1.24.0+4f0dd4d
f09-prc4m-worker-cluster-c-mfb77   Ready                         worker   31d   v1.24.0+4f0dd4d
f09-prc4m-worker-cluster-d-r5bxx   Ready                         worker   27d   v1.24.0+4f0dd4d
- Remove the pods that are in Terminating state
- This step applies only to a failed storage node. You can skip this step if you are removing an operational node.
oc get pods -A -o wide | grep -i <node_name> | awk '{if ($4 == "Terminating") system ("oc -n " $1 " delete pods " $2 " --grace-period=0 " " --force ")}'
Example command and output:
oc get pods -A -o wide | grep -i f09-prc4m-worker-cluster-b-9chb5 | awk '{if ($4 == "Terminating") system ("oc -n " $1 " delete pods " $2 " --grace-period=0 " " --force ")}'
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "isf-data-protection-operator-controller-manager-5c7cf574d5ms4xx" force deleted
- Drain the node
- Drain the failed node:
oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
Example command and output:
oc adm drain f09-prc4m-worker-cluster-b-9chb5 --force --delete-emptydir-data=true --ignore-daemonsets
node/f09-prc4m-worker-cluster-b-9chb5 already cordoned
WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-node-7696f, openshift-cluster-node-tuning-operator/tuned-fk949, openshift-dns/dns-default-gvv4m, openshift-dns/node-resolver-t5dk8, openshift-image-registry/node-ca-wtnp9, openshift-ingress-canary/ingress-canary-kgxts, openshift-local-storage/diskmaker-discovery-qkmh7, openshift-local-storage/diskmaker-manager-m9q42, openshift-machine-config-operator/machine-config-daemon-252j8, openshift-monitoring/node-exporter-cghwc, openshift-multus/multus-additional-cni-plugins-mkz4m, openshift-multus/multus-bz789, openshift-multus/network-metrics-daemon-57v5r, openshift-network-diagnostics/network-check-target-n6bhw, openshift-sdn/sdn-fhp47, openshift-storage/csi-cephfsplugin-5vsp9, openshift-storage/csi-rbdplugin-bfpfs
node/f09-prc4m-worker-cluster-b-9chb5 drained
- Delete the node
- Delete the failed node:
oc delete node <node_name>
If you do not want to destroy the node (for example, for test purposes), you can instead remove the storage label cluster.ocs.openshift.io/openshift-storage= from it:
oc label nodes/f09-prc4m-worker-cluster-b-9chb5 cluster.ocs.openshift.io/openshift-storage-
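In either case, you can confirm that the node is no longer treated as a storage node by repeating the node query that is used earlier in this procedure:
oc get node -l cluster.ocs.openshift.io/openshift-storage=
# the failed node must no longer appear in this list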
- Add new OpenShift compute nodes.
- Create a new compute node and ensure that the new node is in the Ready state, as shown in the example that follows.
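For example, you can query the new node until it reports Ready; <new_node_name> is a placeholder for the name that your platform assigns to the new compute node:
oc get node <new_node_name>
# repeat the command, or add -w to watch, until the STATUS column shows Ready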
- Update the new node information in the odfcluster CR.
- Edit the odfcluster CR and add the new node name to storageNodes:
oc edit odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster
storageNodes:
- f09-prc4m-worker-cluster-b-djplp   <<--- new node
- f09-prc4m-worker-cluster-d-r5bxx
- f09-prc4m-worker-cluster-c-mfb77
Verify that the new node is labeled successfully as a storage node:
oc get node -l cluster.ocs.openshift.io/openshift-storage
NAME                               STATUS   ROLES    AGE   VERSION
f09-prc4m-worker-cluster-b-djplp   Ready    worker   27d   v1.24.0+4f0dd4d   <<-- this node
f09-prc4m-worker-cluster-c-mfb77   Ready    worker   31d   v1.24.0+4f0dd4d
f09-prc4m-worker-cluster-d-r5bxx   Ready    worker   27d   v1.24.0+4f0dd4d
Verify that the new PVs are created automatically. The local PV is created automatically within a short time:
oc get pv | grep Available
local-pv-e97b23d7   2Ti   RWO   Delete   Available   ibm-spectrum-fusion-local   2m49s
- Replace the failed OSD disks.
Remove the failed OSD from the cluster. You can also specify multiple failed OSDs. Use the correct failed_osd_id. The failed_osd_id is the integer in the pod name immediately after the rook-ceph-osd prefix. You can add comma-separated OSD IDs to the command to remove more than one OSD, for example, FAILED_OSD_IDS=0,1,2.
Remove the failed OSD:
oc process -n openshift-storage ocs-osd-removal \
  -p FAILED_OSD_IDS=<failed_osd_id> FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -
Example output:
[root@fu71-f09-vm3 ~]# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "operator" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "operator" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "operator" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "operator" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
job.batch/ocs-osd-removal-job created
Check the status of the ocs-osd-removal-job pod to verify whether the OSD was removed successfully. A status of Completed confirms that the OSD removal job succeeded.
oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
Example output:
[root@fu71-f09-vm3 ~]# oc get pod -n openshift-storage | grep ocs-osd-removal
ocs-osd-removal-job-ls65l   0/1   Completed   0   23s
Ensure that the OSD removal is completed:
oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'
Example output:
[root@fu71-f09-vm3 ~]# oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'
2022-11-24 15:19:28.750910 I | cephosd: completed removal of OSD 0
Delete the ocs-osd-removal-job:
oc delete -n openshift-storage job ocs-osd-removal-job
Delete the Released PV that is attached to the previous node:
oc get pv | grep -i released
local-pv-e4a12175   2Ti   RWO   Delete   Released   openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m6gnz   ibm-spectrum-fusion-local   3h34m
[root@fu71-f09-vm3 ~]# oc delete pv local-pv-e4a12175
persistentvolume "local-pv-e4a12175" deleted
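If more than one PV from the replaced node is stuck in the Released state, a bulk cleanup one-liner such as the following can help. This is a convenience sketch, not part of the documented procedure; it assumes the default oc get pv column layout and deletes only Released PVs of the ibm-spectrum-fusion-local storage class, so review the matching PVs before you run it:
oc get pv --no-headers | awk '$5=="Released" && $7=="ibm-spectrum-fusion-local" {print $1}' | xargs -r oc delete pv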
- Recover the failed objects.
- Restart the mon deployment/pod:
- Update the nodeSelector in the deployment with the new node:
oc edit deployment -n openshift-storage rook-ceph-mon-d
nodeSelector:
  kubernetes.io/hostname: f09-prc4m-worker-cluster-b-djplp   <<-- new node
- Scale the replicas to 1 and wait until the mon pods are in the Running state.
Example:
oc scale deployment rook-ceph-mon-d --replicas=1 -n openshift-storage
deployment.apps/rook-ceph-mon-d scaled
[root@fu71-f09-vm3 ~]# oc get pod -n openshift-storage | grep mon
rook-ceph-mon-a-5bbb9dd98b-z54fx   2/2   Running   0   4m45s
rook-ceph-mon-b-7fdd8f958b-lk9g2   2/2   Running   0   5m18s
rook-ceph-mon-d-6945fbbfc5-nhhw8   2/2   Running   0   5m41s
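You can also list the mon pods by label instead of with grep; this sketch assumes the standard rook-ceph label app=rook-ceph-mon on the mon pods:
oc get pods -n openshift-storage -l app=rook-ceph-mon -o wide
# all mon pods should be 2/2 Running, and rook-ceph-mon-d should now be scheduled on the new node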
- Verify the OSD pods. Wait until all the pods are in the Running state.
oc get pods -o wide -n openshift-storage | grep osd
[root@fu71-f09-vm3 ~]# oc get pods -o wide -n openshift-storage | grep osd
rook-ceph-osd-0-d559cc4fb-xspr8                                2/2   Running     0   4m58s   10.129.6.49    f09-prc4m-worker-cluster-b-djplp   <none>   <none>
rook-ceph-osd-1-6df7f9c669-n94md                               2/2   Running     0   5m20s   10.128.6.95    f09-prc4m-worker-cluster-d-r5bxx   <none>   <none>
rook-ceph-osd-2-5c5d48ff7c-sdd7l                               2/2   Running     0   5m17s   10.129.4.125   f09-prc4m-worker-cluster-c-mfb77   <none>   <none>
rook-ceph-osd-prepare-1bf7dd3d71fe899383e625dd0c27ea37-x9vtk   0/1   Completed   0   4h8m    10.128.6.79    f09-prc4m-worker-cluster-d-r5bxx   <none>   <none>
rook-ceph-osd-prepare-24272b5641dc95baffc7932d78894e3c-zhz8m   0/1   Completed   0   5m24s   10.129.6.48    f09-prc4m-worker-cluster-b-djplp   <none>   <none>
rook-ceph-osd-prepare-6f3c3b4626ec9888b6fbe5597afd55ea-zh7cp   0/1   Completed   0   4h8m    10.129.4.113   f09-prc4m-worker-cluster-c-mfb77   <none>   <none>
- Verify the OSD encryption settings. If cluster-wide encryption is enabled, make sure that the crypt keyword appears beside the ocs-deviceset names:
oc debug node/<new-node-name> -- chroot /host dmsetup ls
Example output:
# oc debug node/fu47 -- chroot /host dmsetup ls
Starting pod/fu47-debug ...
To use host binaries, run `chroot /host`
ocs-deviceset-sc-lvs-0-data-0clwxf-block-dmcrypt (253:0)
If the verification fails, contact IBM Support.
- Exit maintenance mode after all steps are completed.
oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode-"
Example output:
[root@fu40 ~]# oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode-"
odfcluster.odf.isf.ibm.com/odfcluster unlabeled
- Go to the Data foundation page in the IBM Storage Fusion user interface and check the health of the storage cluster in the Health section.
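As an alternative to the user interface, you can check the cluster health from the command line; this sketch assumes the standard Red Hat OpenShift Data Foundation storagecluster resource in the openshift-storage namespace:
oc get storagecluster -n openshift-storage
# the PHASE column should report Ready after the cluster recovers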