Replacing a failed node on VMware user-provisioned infrastructure

Use this procedure to replace a failed node on VMware user-provisioned infrastructure.

Before you begin

  • Ensure that the replacement nodes are configured with similar infrastructure, resources, and disks to the node that you replace.
  • You must be logged into the OpenShift Container Platform cluster.

Procedure

  1. Identify the node, and get the labels on the node that you need to replace, where <node_name> specifies the name of the node that needs to be replaced.
    oc get nodes --show-labels | grep <node_name>
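    For example, if the failed node were server3.example.com (the failed-node hostname used in the examples later in this procedure):
    oc get nodes --show-labels | grep server3.example.com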
  2. Identify the monitor pod (mon), if any, and the OSDs that are running on the node that you need to replace.
    oc get pods -n openshift-storage -o wide | grep -i <node_name>
  3. Mark the node as unschedulable.
    oc adm cordon <node_name>
  4. Remove the pods that are in Terminating state.
    oc get pods -A -o wide | grep -i <node_name> |  awk '{if ($4 == "Terminating") system ("oc -n " $1 " delete pods " $2  " --grace-period=0 " " --force ")}'
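    The one-liner above assumes that the STATUS column is the fourth field of the oc get pods -A -o wide output. A more readable sketch of the same cleanup, under that assumption:
    oc get pods -A -o wide | grep -i <node_name> | while read ns pod ready status rest; do
      # Force-delete only the pods that are stuck in Terminating state
      if [ "$status" = "Terminating" ]; then
        oc -n "$ns" delete pod "$pod" --grace-period=0 --force
      fi
    done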
  5. Drain the node.
    oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
  6. Delete the node.
    oc delete node <node_name>
  7. Log in to VMware vSphere and terminate the Virtual Machine (VM) that you have identified.
  8. Create a new VM on VMware vSphere with the required infrastructure.
    For more information, see Infrastructure requirements.
  9. Create a new OpenShift Container Platform worker node using the new VM.
  10. Check for certificate signing requests (CSRs) related to Fusion Data Foundation that are in Pending state.
    oc get csr
  11. Approve all required Fusion Data Foundation CSRs for the new node, where <certificate_name> specifies the name of the CSR.
    oc adm certificate approve <certificate_name>
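    If several CSRs are pending, one possible way to approve all of them at once is sketched below; review each pending request before approving it on a production cluster:
    # Print the names of CSRs that have no status yet (that is, pending requests) and approve them
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve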
  12. From the OpenShift Web Console, go to Compute > Nodes and confirm that the new node is in Ready state.
  13. Apply the Fusion Data Foundation label to the new node using one of the following steps:
    From the user interface
    1. Go to Action Menu > Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    Apply the Fusion Data Foundation label to the new node, where <new_node_name> specifies the name of the new node:
    oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
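    For example, with the newnode.example.com hostname that is used in the later examples:
    oc label node newnode.example.com cluster.ocs.openshift.io/openshift-storage=""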
  14. Identify the namespace where the OpenShift local storage operator is installed, and assign it to the local_storage_project variable.
    local_storage_project=$(oc get csv --all-namespaces | awk '{print $1}' | grep local)
    echo $local_storage_project
    Example output:
    openshift-local-storage
  15. Add the new worker node to the localVolumeDiscovery and localVolumeSet. A non-interactive oc patch alternative is sketched after the following sub-steps.
    1. Update the localVolumeDiscovery definition to include the new node, and remove the failed node.
      oc edit -n $local_storage_project localvolumediscovery auto-discover-devices
      Example output, where server3.example.com is removed, and newnode.example.com is the new node:
      [...]
         nodeSelector:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - server1.example.com
                  - server2.example.com
                  #- server3.example.com
                  - newnode.example.com
      [...]
      Remember: Save before exiting the editor.
    2. Determine the localVolumeSet to edit.
      oc get -n $local_storage_project localvolumeset
      Example output:
      NAME          AGE
      localblock   25h
    3. Update the localVolumeSet definition to include the new node, and remove the failed node.
      oc edit -n $local_storage_project localvolumeset localblock
      Example output, where server3.example.com is removed, and newnode.example.com is the new node:
      [...]
         nodeSelector:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - server1.example.com
                  - server2.example.com
                  #- server3.example.com
                  - newnode.example.com
      [...]
      Remember: Save before exiting the editor.
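    If you prefer a non-interactive edit to oc edit, a JSON patch along the following lines updates the same hostname list. This is only a sketch: it assumes the nodeSelector layout shown above (a single nodeSelectorTerms entry with a single matchExpressions entry), so adjust the index paths and hostnames for your cluster, and apply the equivalent patch to the localvolumeset resource as well.
    oc patch -n $local_storage_project localvolumediscovery auto-discover-devices --type=json \
      -p '[{"op": "replace", "path": "/spec/nodeSelector/nodeSelectorTerms/0/matchExpressions/0/values", "value": ["server1.example.com", "server2.example.com", "newnode.example.com"]}]'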
  16. Verify that the new localblock Persistent Volume (PV) is available.
    oc get pv | grep localblock | grep Available
    Example output:
    local-pv-551d950   512Gi   RWO   Delete   Available   localblock   26s
  17. Navigate to the openshift-storage project.
    oc project openshift-storage
  18. Remove the failed OSD from the cluster.
    Multiple failed OSDs can be specified, if required.
    oc process -n openshift-storage ocs-osd-removal \
    -p FAILED_OSD_IDS=<failed_osd_id> -p FORCE_OSD_REMOVAL=false | oc create -f -
    <failed_osd_id> is the integer in the pod name immediately after the rook-ceph-osd prefix.

    You can add comma separated OSD IDs in the command to remove more than one OSD, for example, FAILED_OSD_IDS=0,1,2.

    The FORCE_OSD_REMOVAL value must be changed to true in clusters that only have three OSDs, or clusters with insufficient space to restore all three replicas of the data after the OSD is removed.
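
    For example, to remove a single failed OSD with ID 0 (a hypothetical ID; substitute the ID that you identified for the failed node):
    oc process -n openshift-storage ocs-osd-removal \
    -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false | oc create -f -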

  19. Verify that the OSD was removed successfully by checking the status of the ocs-osd-removal-job pod.
    A status of Completed confirms that the OSD removal job succeeded.
    oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
  20. Ensure that the OSD removal is completed.
    oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'
    Example output:
    2022-05-10 06:50:04.501511 I | cephosd: completed removal of OSD 0
    Important:

    If the ocs-osd-removal-job fails, and the pod is not in the expected Completed state, check the pod logs for further debugging:

    For example:

    oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1
  21. Identify the Persistent Volume (PV) associated with the Persistent Volume Claim (PVC).
    oc get pv -L kubernetes.io/hostname | grep localblock | grep Released
    Example output:
    local-pv-d6bf175b  1490Gi  RWO  Delete  Released  openshift-storage/ocs-deviceset-0-data-0-6c5pw  localblock  2d22h  compute-1
    If there is a PV in Released state, delete it:
    oc delete pv <persistent_volume>
    For example:
    oc delete pv local-pv-d6bf175b
    Example output:
    persistentvolume "local-pv-d9c5cbd6" deleted
  22. Identify the crashcollector pod deployment.
    oc get deployment --selector=app=rook-ceph-crashcollector,node_name=<failed_node_name> -n openshift-storage
    If there is an existing crashcollector pod deployment, delete it.
    oc delete deployment --selector=app=rook-ceph-crashcollector,node_name=<failed_node_name> -n openshift-storage
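    For example, if the failed node is server3.example.com:
    oc delete deployment --selector=app=rook-ceph-crashcollector,node_name=server3.example.com -n openshift-storage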
  23. Delete the ocs-osd-removal-job.
    oc delete -n openshift-storage job ocs-osd-removal-job
    Example output:
    job.batch "ocs-osd-removal-job" deleted

What to do next

  1. Run the following command and verify that the new node is present in the output:
    oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads > Pods, and confirm that at least the following pods on the new node are in a Running state:
    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required Fusion Data Foundation pods are in Running state.
  4. Ensure that the new incremental mon is created, and is in the Running state:
    oc get pod -n openshift-storage | grep mon

    Example output:

    rook-ceph-mon-a-cd575c89b-b6k66     2/2     Running     0     38m
    rook-ceph-mon-b-6776bc469b-tzzt8    2/2     Running     0     38m
    rook-ceph-mon-d-5ff5d488b5-7v8xh    2/2     Running     0     4m8s

    The OSD and monitor pods might take several minutes to reach the Running state.

  5. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
    oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
  6. If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:
      oc debug node/<node_name>
      chroot /host
    2. Display the list of available block devices using the lsblk command.

      Check for the crypt keyword beside the ocs-deviceset names.
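      For a quicker check, the output can be filtered on the device type column (a sketch; run it inside the chroot environment from the previous sub-step):
      lsblk -o NAME,TYPE,MOUNTPOINT | grep crypt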

  7. If the verification steps fail, contact IBM Support.