Replacing an operational AWS node on installer-provisioned infrastructure

Use this information to replace an operational AWS node on an installer-provisioned infrastructure.

Procedure

  1. Log in to the OpenShift Web Console, and click Compute > Nodes.
  2. Identify the node that you need to replace.
    Take a note of its Machine Name.
  3. Mark the node as unschedulable using the following command, where <node_name> specifies the name of the node that you need to replace:
    oc adm cordon <node_name>
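    Optionally, confirm that the cordon took effect; the STATUS column for the node should now include SchedulingDisabled:
    oc get node <node_name>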
  4. Drain the node.
    oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.
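    To optionally confirm that the drain completed, list any pods still scheduled on the node. The field selector shown is standard Kubernetes syntax; apart from DaemonSet pods, which the drain command ignores, the result should be empty:
    oc get pods --all-namespaces -o wide --field-selector spec.nodeName=<node_name>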
  5. Click Compute > Machines and search for the required machine.
  6. For the required machine, click Action menu > Delete Machine.
  7. Click Delete to confirm that the machine is deleted.
    A new machine is automatically created.
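    Alternatively, you can delete the machine from the command line instead of the console. This sketch assumes the default openshift-machine-api namespace used on installer-provisioned clusters:
    oc delete machine <machine_name> -n openshift-machine-api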
  8. Wait for the new machine to start and transition into the Running state.
    Important: This activity might take 5 - 10 minutes or more.
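    You can also follow the machine phase from the command line; the -w flag streams updates until you interrupt it:
    oc get machines -n openshift-machine-api -w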
  9. Go to Compute > Nodes and confirm that the new node is in a Ready state.
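    Equivalently, from the command line you can block until the node reports Ready; adjust the timeout to suit your environment:
    oc wait --for=condition=Ready node/<new_node_name> --timeout=600s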
  10. Apply the Fusion Data Foundation label to the new node using one of the following methods:
    From the user interface
    1. Go to Action menu > Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    Apply the Fusion Data Foundation label to the new node, where <new_node_name> specifies the name of the new node:
    oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
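    To confirm that the label was applied, you can list the nodes that carry it; the new node should appear in the output:
    oc get nodes -l cluster.ocs.openshift.io/openshift-storage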

What to do next

Verify that the new node and all pods are running.
  1. Verify that the new node is present in the output:
    oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads > Pods and confirm that at least the following pods on the new node are in a Running state (a CLI alternative is shown after the list):
    • csi-cephfsplugin-*
    • csi-rbdplugin-*
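    If you prefer the command line, a filter along these lines (assuming the default openshift-storage namespace) lists the CSI pods that are scheduled on the new node:
    oc get pods -n openshift-storage -o wide --field-selector spec.nodeName=<new_node_name> | egrep 'csi-cephfsplugin|csi-rbdplugin'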
  3. Verify that all the other required Fusion Data Foundation pods are in a Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
    oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
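    The OSD pods are named rook-ceph-osd-*. Output similar to the following indicates success; the pod name, ready count, and age here are illustrative and vary by cluster and version:
    rook-ceph-osd-3-xxxxxxxxxx-xxxxx   2/2   Running   0   4m   10.x.x.x   <new_node_name>   <none>   <none>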
  5. If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host or hosts:
      oc debug node/<node_name>
      chroot /host
    2. Display the list of available block devices using the lsblk command:
      lsblk

      Check for the crypt keyword beside the ocs-deviceset names.
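      For example, an encrypted OSD device appears with the TYPE crypt, in output similar to the following (device names and sizes are illustrative):
      nvme1n1                                         259:0    0  512G  0 disk
      `-ocs-deviceset-0-data-0-xxxxx-block-dmcrypt    253:1    0  512G  0 crypt
      When you are finished, run exit twice to leave the chroot environment and then the debug pod.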

  6. If the verification steps fail, contact IBM Support.