Red Hat OpenShift Data Foundation Object Storage Device (OSD) failure

When a storage device fails on a cluster that is backed by local storage devices, you must replace the corresponding Red Hat® OpenShift® Data Foundation Object Storage Device (OSD).

If you encounter this issue, contact IBM support.

Before you begin
Red Hat recommends that replacement OSD devices be configured with infrastructure and resources similar to the device that is being replaced.
You can replace an OSD in Red Hat OpenShift Data Foundation deployed using local storage devices on the following infrastructures:
  • Bare Metal
  • VMware with local deployment
  • SystemZ
  • IBM Power Systems
Procedure

Do the following steps to identify and replace a failed Red Hat OpenShift Data Foundation OSD:

  1. Set the Red Hat OpenShift Data Foundation cluster to maintenance:
    oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode=true"
    
    Example output:
    [root@fu40 ~]# oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode=true"
    odfcluster.odf.isf.ibm.com/odfcluster labeled
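    Optionally, you can confirm that the maintenance label is applied before you continue. This extra check is not part of the documented procedure; it queries the same resource and namespace as the previous command and is expected to print true:
    # Prints the value of the odf.isf.ibm.com/maintenanceMode label
    oc get odfclusters.odf.isf.ibm.com odfcluster -n ibm-spectrum-fusion-ns -o jsonpath='{.metadata.labels.odf\.isf\.ibm\.com/maintenanceMode}{"\n"}'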
    
  2. Identify the failed OSD:
    Check whether the OSD failed by using any of the following methods:
    • Log in to the Red Hat OpenShift Container Platform web console and go to your storage system details page. In the Overview > Block and File tab, check the Status section for any warning in the Storage cluster. If the warnings indicate that an OSD is down or degraded, then contact IBM support to replace the failed Red Hat OpenShift Data Foundation OSD for your storage node in an internal-attached environment.
      Example warning message:
      1 osds down
      1 host (1 osds) down
      Degraded data redundancy: 333/999 objects degraded (33.333%), 81 pgs degraded
    • Log in to the IBM Fusion user interface, go to the Data foundation page, and check for warnings in the Health section of the storage cluster.
    Alternatively, you can use the oc command to identify the OSD:
    oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
    Sample output:
    [root@fu40 ~]# oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
    NAME                               READY   STATUS             RESTARTS      AGE   IP             NODE   NOMINATED NODE   READINESS GATES
    rook-ceph-osd-0-6c99fc999b-2s9mr   1/2     CrashLoopBackOff   5 (17s ago)   17m   10.128.4.216   fu49   <none>           <none>
    rook-ceph-osd-1-764f9cff48-6gkg9   2/2     Running            0             16m   10.131.2.18    fu47   <none>           <none>
    rook-ceph-osd-2-5d9d5984dc-8gkrz   2/2     Running            0             16m   10.129.2.53    fu48   <none>           <none>

    In this example, rook-ceph-osd-0-6c99fc999b-2s9mr needs to be replaced, fu49 is the Red Hat OpenShift Container Platform node on which the OSD is scheduled, and the failed OSD ID is 0.

    You can also view the OSD details by running ceph osd df in the Ceph tools pod. The failed OSD ID is the same as the one identified in the previous step.
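    If the Ceph tools are available in your cluster, you can also confirm the failed OSD from the command line. A minimal sketch, assuming the tools are exposed through the standard rook-ceph-tools deployment in the openshift-storage namespace:
    # The failed OSD is reported as down in the tree view
    oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd tree
    # Per-OSD utilization, as referenced above
    oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd df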

  3. Scale down the OSD deployment.
    Scale down the OSD deployment replicas to 0.
    Use the OSD ID from the previous step. In this example, the failed pod is rook-ceph-osd-0-6c99fc999b-2s9mr and the OSD ID is 0.
    osd_id_to_remove=<replace-it-with-osd-id>
    oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
    Example output:
    [root@fu40 ~]# osd_id_to_remove=0
    [root@fu40 ~]#  oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
    deployment.apps/rook-ceph-osd-0 scaled
    Wait for the rook-ceph-osd pod to terminate.
    Run the oc command to check the status of the rook-ceph-osd pod.
    oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
    Example output:
    [root@fu40 ~]#  oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
    NAME                               READY   STATUS        RESTARTS   AGE
    rook-ceph-osd-0-6c99fc999b-2s9mr   0/2     Terminating   6          20m
    Note: If the rook-ceph-osd pod remains in the Terminating state for a long time, use the force option to delete the pod.
    oc delete -n openshift-storage pod rook-ceph-osd-0-6c99fc999b-2s9mr --grace-period=0 --force
    Example output:
    [root@fu40 ~]# oc delete -n openshift-storage pod rook-ceph-osd-0-6c99fc999b-2s9mr --grace-period=0 --force
    warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
    pod "rook-ceph-osd-0-6c99fc999b-2s9mr" force deleted
    Verify that the rook-ceph-osd pod is terminated.
    [root@fu40 ~]#  oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
    No resources found in openshift-storage namespace.
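    Instead of polling with oc get, you can block until the pod object is gone. A sketch using oc wait, assuming the same osd_id_to_remove variable and a timeout that you can adjust:
    # Returns as soon as no pod with the label remains, or when the timeout expires
    oc wait -n openshift-storage pod -l ceph-osd-id=${osd_id_to_remove} --for=delete --timeout=120s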
  4. Remove the old OSD from the cluster.
    Delete any old ocs-osd-removal jobs
    Run the oc command to delete ocs-osd-removal jobs.
    oc delete -n openshift-storage job ocs-osd-removal-job
    Remove the old OSD from the cluster

    Ensure that you set the correct osd_id_to_remove.

    The FORCE_OSD_REMOVAL value must be changed to true in clusters that only have three OSDs, or clusters with insufficient space to restore all three replicas of the data after the OSD is removed.

    • More than three OSDs
      oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
    • Only three OSDs or insufficient space (force delete)
      oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -
      Before you run the command, verify the value of osd_id_to_remove. For example:
      [root@fu40 ~]# echo $osd_id_to_remove
      0
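    After you create the removal job with either command, you can follow its progress directly. This sketch is not part of the documented steps:
    # Stream the removal job logs until its pod completes
    oc logs -f -n openshift-storage job/ocs-osd-removal-job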
    Verify that the OSD is removed
    Wait for the ocs-osd-removal-job pod to complete.
    [root@fu40 ~]#  oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
    NAME                        READY   STATUS      RESTARTS   AGE
    ocs-osd-removal-job-s4vhc   0/1     Completed   0          24s
    Confirm the removal in the job logs.
    [root@fu40 ~]#  oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'
    2022-11-25 16:08:49.858109 I | cephosd: completed removal of OSD 0
    The PVC changes to the Pending state, and the PV changes to the Released state. For example:
    openshift-storage   ocs-deviceset-ibm-spectrum-fusion-local-0-data-3nsk8j   Pending                                                                        ibm-spectrum-fusion-local     7m16s
    local-pv-a2879220                          600Gi      RWO            Delete           Released   openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b   ibm-spectrum-fusion-local              41m

    To locate the worker node, use the oc command to describe the PV.

    For example, the PV carries the label kubernetes.io/hostname=fu49, so the worker node is fu49.
    [root@fu40 ~]# oc describe pv local-pv-a2879220
    Name:              local-pv-a2879220
    Labels:            kubernetes.io/hostname=fu49
                      storage.openshift.com/owner-kind=LocalVolumeSet
                      storage.openshift.com/owner-name=ibm-spectrum-fusion-local
                      storage.openshift.com/owner-namespace=openshift-local-storage
    Annotations:       pv.kubernetes.io/bound-by-controller: yes
                      pv.kubernetes.io/provisioned-by: local-volume-provisioner-fu49-96f64c0f-e5ed-4bb1-b4ff-cad610562f58
                      storage.openshift.com/device-id: scsi-36000c2913ba6a22c66120c73cb1edae6
                      storage.openshift.com/device-name: sdb
    Finalizers:        [kubernetes.io/pv-protection]
    StorageClass:      ibm-spectrum-fusion-local
    Status:            Released
    Claim:             openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b
    Reclaim Policy:    Delete
    Access Modes:      RWO
    VolumeMode:        Block
    Capacity:          600Gi
    Node Affinity:
      Required Terms:
        Term 0:        kubernetes.io/hostname in [fu49]
    Message:
    Source:
        Type:  LocalVolume (a persistent volume backed by local storage on a node)
        Path:  /mnt/local-storage/ibm-spectrum-fusion-local/scsi-36000c2913ba6a22c66120c73cb1edae6
    Events:
      Type     Reason              Age                  From     Message
      ----     ------              ----                 ----     -------
      Warning  VolumeFailedDelete  6m2s (x26 over 12m)  deleter  Error cleaning PV "local-pv-a2879220": failed to get volume mode of path "/mnt/local-storage/ibm-spectrum-fusion-local/scsi-36000c2913ba6a22c66120c73cb1edae6": Directory check for "/mnt/local-storage/ibm-spectrum-fusion-local/scsi-36000c2913ba6a22c66120c73cb1edae6" failed: open /mnt/local-storage/ibm-spectrum-fusion-local/scsi-36000c2913ba6a22c66120c73cb1edae6: no such file or directory
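    If you prefer not to read the full describe output, you can extract only the host name label and the device name annotation with jsonpath. A sketch, assuming the same PV name as in this example:
    # Worker node that hosted the failed device (kubernetes.io/hostname label)
    oc get pv local-pv-a2879220 -o jsonpath='{.metadata.labels.kubernetes\.io/hostname}{"\n"}'
    # Device name that the local storage operator recorded for this PV
    oc get pv local-pv-a2879220 -o jsonpath='{.metadata.annotations.storage\.openshift\.com/device-name}{"\n"}'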
    Note: If the ocs-osd-removal-job fails and the pod is not in the expected completed state, check the pod logs for further debugging.
    Remove encryption-related configuration
    If encryption was enabled during installation, remove the dm-crypt managed device-mapper mapping from the OSD devices that were removed from the respective Red Hat OpenShift Data Foundation nodes.
    • For each of the previously identified nodes, do the following:
      oc debug node/<node name>
      chroot /host
      dmsetup ls | grep <pvc name>
    • Remove the mapped device.
      cryptsetup luksClose --debug --verbose ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt
      Example output:
      [root@fu40 ~]# oc debug nodes/fu49
      Starting pod/fu49-debug ...
      To use host binaries, run `chroot /host`
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host
      sh-4.4# dmsetup ls
      ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt	(253:0)
      sh-4.4# cryptsetup luksClose --debug --verbose ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt
      # cryptsetup 2.3.3 processing "cryptsetup luksClose --debug --verbose ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt"
      # Running command close.
      # Locking memory.
      # Installing SIGINT/SIGTERM handler.
      # Unblocking interruption on signal.
      # Allocating crypt device context by device ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt.
      # Initialising device-mapper backend library.
      # dm version   [ opencount flush ]   [16384] (*1)
      # dm versions   [ opencount flush ]   [16384] (*1)
      # Detected dm-ioctl version 4.43.0.
      # Detected dm-crypt version 1.21.0.
      # Device-mapper backend running with UDEV support enabled.
      # dm status ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount noflush ]   [16384] (*1)
      # Releasing device-mapper backend.
      # Allocating context for crypt device (none).
      # Initialising device-mapper backend library.
      Underlying device for crypt device ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt disappeared.
      # dm versions   [ opencount flush ]   [16384] (*1)
      # dm table ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount flush securedata ]   [16384] (*1)
      # dm versions   [ opencount flush ]   [16384] (*1)
      # dm deps ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount flush ]   [16384] (*1)
      # LUKS device header not available.
      # Deactivating volume ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt.
      # dm versions   [ opencount flush ]   [16384] (*1)
      # dm status ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount noflush ]   [16384] (*1)
      # dm versions   [ opencount flush ]   [16384] (*1)
      # dm table ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount flush securedata ]   [16384] (*1)
      # dm versions   [ opencount flush ]   [16384] (*1)
      # dm deps ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount flush ]   [16384] (*1)
      # dm versions   [ opencount flush ]   [16384] (*1)
      # dm table ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount flush securedata ]   [16384] (*1)
      # dm versions   [ opencount flush ]   [16384] (*1)
      # Udev cookie 0xd4d9390 (semid 0) created
      # Udev cookie 0xd4d9390 (semid 0) incremented to 1
      # Udev cookie 0xd4d9390 (semid 0) incremented to 2
      # Udev cookie 0xd4d9390 (semid 0) assigned to REMOVE task(2) with flags DISABLE_LIBRARY_FALLBACK         (0x20)
      # dm remove ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt  [ opencount flush retryremove ]   [16384] (*1)
      # Udev cookie 0xd4d9390 (semid 0) decremented to 1
      # Udev cookie 0xd4d9390 (semid 0) waiting for zero
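      After luksClose returns, you can confirm from the same chroot shell that the device-mapper entry is gone. This check is not part of the documented steps and is expected to print nothing:
      # No output means the dm-crypt mapping was removed
      dmsetup ls | grep ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b-block-dmcrypt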
    Find the persistent volume (PV) that needs to be deleted
    Run the oc command to find the failed PV.
    oc get pv -l kubernetes.io/hostname=<failed-osds-worker-node-name>
    Example output:
    [root@fu40 ~]# oc get pv -l kubernetes.io/hostname=fu49
    NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                                     STORAGECLASS                REASON   AGE
    local-pv-a2879220   600Gi      RWO            Delete           Released   openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-1m227b   ibm-spectrum-fusion-local            55m
    Delete the released persistent volume (PV)
    Run the oc command to delete the released PV.
    oc delete pv <pv_name>
    Example output:
    [root@fu40 ~]#  oc delete pv local-pv-a2879220
    persistentvolume "local-pv-a2879220" delete
  5. Add a new OSD to the node.

    Physically add a new device to the node.

    Track the provisioning of persistent volumes (PVs) for the devices that match the deviceInclusionSpec.
    It can take a few minutes to provision the PVs. After the PV is identified, it is added to the cluster automatically. You can watch for the new PV by label, as shown in the sketch after the following example output.
    • LocalVolumeSet spec
      oc -n openshift-local-storage describe localvolumeset ibm-spectrum-fusion-local
      Example output:
      ...
      Spec:
        Device Inclusion Spec:
          Device Types:
            disk
            part
          Max Size:  601Gi
          Min Size:  599Gi
        Node Selector:
          Node Selector Terms:
            Match Expressions:
              Key:       cluster.ocs.openshift.io/openshift-storage
              Operator:  In
              Values:
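    To watch for the PV that is provisioned for the new device, you can filter by the owner label that appears in the earlier describe output. A sketch, assuming the LocalVolumeSet is named ibm-spectrum-fusion-local as in this example:
    # Watches until the new local PV appears; press Ctrl+C to stop
    oc get pv -l storage.openshift.com/owner-name=ibm-spectrum-fusion-local --watch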
    Delete the ocs-osd-removal-job
    Run the oc command to delete the ocs-osd-removal-job.
    oc delete -n openshift-storage job ocs-osd-removal-job
    Example output:
    [root@fu40 ~]# oc delete -n openshift-storage job ocs-osd-removal-job
    job.batch "ocs-osd-removal-job" deleted
  6. Verify that there is a new OSD running.
    Verify that the new OSD pod is running
    Run the oc command to check whether the new OSD pod is running.
    oc get -n openshift-storage pods -l app=rook-ceph-osd
    Example output:
    [root@fu40 ~]# oc get -n openshift-storage pods -l app=rook-ceph-osd
    NAME                               READY   STATUS    RESTARTS   AGE
    rook-ceph-osd-0-7f99b8ccd5-ssj5w   2/2     Running   0          7m31s       <<-- This pod
    rook-ceph-osd-1-764f9cff48-6gkg9   2/2     Running   0          64m
    rook-ceph-osd-2-5d9d5984dc-8gkrz   2/2     Running   0          64m
    Tip: If the new OSD does not show as Running after a few minutes, restart the rook-ceph-operator pod to force a reconciliation.
    oc delete pod -n openshift-storage -l app=rook-ceph-operator
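    While data is rebuilt onto the new OSD, you can watch recovery from the Ceph side. A sketch, assuming the rook-ceph-tools deployment is enabled in the openshift-storage namespace:
    # HEALTH_OK with all OSDs up and in indicates that recovery is complete
    oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph status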
    Verify that the new PVC is created
    Run the oc command to check whether the new PVC is created.
    oc get pvc -n openshift-storage
    Example output:
    [root@fu40 ~]# oc get pvc -n openshift-storage
    NAME                                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
    db-noobaa-db-pg-0                                       Bound    pvc-783036b5-ec40-41a7-91e5-9e179fd24cc3   50Gi       RWO            ocs-storagecluster-ceph-rbd   65m
    ocs-deviceset-ibm-spectrum-fusion-local-0-data-04vwvq   Bound    local-pv-b45b1d67                          600Gi      RWO            ibm-spectrum-fusion-local     66m
    ocs-deviceset-ibm-spectrum-fusion-local-0-data-24nj5t   Bound    local-pv-c3de9110                          600Gi      RWO            ibm-spectrum-fusion-local     66m
    ocs-deviceset-ibm-spectrum-fusion-local-0-data-3nsk8j   Bound    local-pv-1c9f3b11                          600Gi      RWO            ibm-spectrum-fusion-local     34m     <<--This one
    [root@fu40 ~]#
    [root@fu40 ~]# oc get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                     STORAGECLASS                  REASON   AGE
    local-pv-1c9f3b11                          600Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-3nsk8j   ibm-spectrum-fusion-local              10m     <<--This one
    local-pv-b45b1d67                          600Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-04vwvq   ibm-spectrum-fusion-local              68m
    local-pv-c3de9110                          600Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-ibm-spectrum-fusion-local-0-data-24nj5t   ibm-spectrum-fusion-local              68m
    pvc-783036b5-ec40-41a7-91e5-9e179fd24cc3   50Gi       RWO            Delete           Bound    openshift-storage/db-noobaa-db-pg-0                                       ocs-storagecluster-ceph-rbd            65m
    Verify the OSD encryption settings
    If cluster-wide encryption is enabled, ensure that the crypt keyword appears next to the ocs-deviceset name.
    oc debug node/<new-node-name>  -- chroot /host lsblk -f
    oc debug node/<new-node-name> -- chroot /host dmsetup ls
    Example output:
    [root@fu40 ~]# oc debug node/fu49 -- chroot /host lsblk -f
    Starting pod/fu49-debug ...
    To use host binaries, run `chroot /host`
    NAME                                                                  FSTYPE      LABEL                                           UUID                                 MOUNTPOINT
    loop1                                                                 crypto_LUKS pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca 6a8244eb-55d6-48cc-8e68-33436e512bc6
    loop2                                                                 crypto_LUKS pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca fa228ec1-0b1d-43ad-8707-9ecd38bfb1f8
    sda
    |-sda1
    |-sda2                                                                vfat        EFI-SYSTEM                                      A084-4057
    |-sda3                                                                ext4        boot                                            7d757098-d548-4b7b-8c9a-3dd4f34ceca1 /boot
    `-sda4                                                                xfs         root                                            1cd39805-6936-458d-ae8c-39313bb71c95 /sysroot
    sdc                                                                   crypto_LUKS pvc_name=ocs-deviceset-ibm-spectrum-fusion-loca fa228ec1-0b1d-43ad-8707-9ecd38bfb1f8
    `-ocs-deviceset-ibm-spectrum-fusion-local-0-data-3nsk8j-block-dmcrypt
    sr0
    
    Removing debug pod ...
    [root@fu40 ~]# oc debug node/fu49 -- chroot /host dmsetup ls
    Starting pod/fu49-debug ...
    To use host binaries, run `chroot /host`
    ocs-deviceset-ibm-spectrum-fusion-local-0-data-3nsk8j-block-dmcrypt	(253:0)
    
    Removing debug pod ...
    Note: If verification steps fail, then contact Red Hat support.
    Exit maintenance mode
    Run the oc command to exit maintenance mode after all steps are completed.
    oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode-"
    Example output:
    [root@fu40 ~]# oc label odfclusters.odf.isf.ibm.com -n ibm-spectrum-fusion-ns odfcluster "odf.isf.ibm.com/maintenanceMode-"
    odfcluster.odf.isf.ibm.com/odfcluster unlabeled
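    Optionally, confirm that the maintenance label is removed before you check the cluster health. This extra check is not part of the documented procedure:
    # The odf.isf.ibm.com/maintenanceMode label must no longer be listed
    oc get odfclusters.odf.isf.ibm.com odfcluster -n ibm-spectrum-fusion-ns --show-labels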
  7. Go to Data foundation page in IBM Fusion user interface and check the health of the Storage cluster in the Health section.