Replacing an operational AWS node on user-provisioned infrastructure

Use this information to replace an operational AWS node on user-provisioned infrastructure.

Before you begin

  • Ensure that the replacement node is configured with infrastructure and resources similar to the node that you replace.
  • You must be logged in to the OpenShift Container Platform cluster.

Procedure

  1. Identify the node that you need to replace.
  2. Mark the node as unschedulable, where <node_name> specifies the name of the node that you need to replace.
    oc adm cordon <node_name>
  3. Drain the node.
    oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.
  4. Delete the node.
    oc delete nodes <node_name>
  5. Create a new Amazon Web Services (AWS) machine instance with the required infrastructure.
    For more information, see ../planning/platform_requirements.html.
  6. Create a new OpenShift Container Platform node using the new AWS machine instance.
  7. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in a Pending state.
    oc get csr
  8. Approve all the required OpenShift Container Platform CSRs for the new node, where <certificate_name> specifies the name of the CSR. A sketch for approving all pending CSRs at once is shown after this procedure.
    oc adm certificate approve <certificate_name>
  9. Go to Compute > Nodes and confirm that the new node is in a Ready state.
  10. Apply the Fusion Data Foundation label to the new node by using one of the following methods:
    From the user interface
    1. For the new node, click Action Menu > Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    Apply the Fusion Data Foundation label to the new node, where <new_node_name> specifies the name of the new node:
    oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
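
A new node typically generates more than one CSR (a kubelet client CSR and, after the node joins, a serving CSR), so you might need to repeat steps 7 and 8. The following is a minimal command-line sketch, assuming the standard oc client, that approves every pending CSR in one command and then waits for the new node to become Ready; replace <new_node_name> with the name of the new node.
    # Approve every CSR that does not yet have a status (that is, pending CSRs).
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
    # Wait for the new node to report the Ready condition.
    oc wait --for=condition=Ready node/<new_node_name> --timeout=10m
If additional CSRs appear after the node joins, run oc get csr again and approve them as well.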

What to do next

Verify that the new node and all pods are running.
  1. Run the following command and verify that the new node is present in the output:
    oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Go to Workloads > Pods and confirm that at least the following pods on the new node are in a Running state (a command-line alternative is sketched after this list):
    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required Fusion Data Foundation pods are in a Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
    oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
  5. If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:
      oc debug node/<node_name>
      chroot /host
    2. Display the list of available block devices by using the lsblk command:
      lsblk

      Check for the crypt keyword beside the ocs-deviceset names. An example of the expected output is shown after this list.

  6. If the verification steps fail, contact IBM Support.
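
As a command-line alternative to the console check in step 2, the following sketch lists the CSI plugin pods that are scheduled on the new node; it assumes the default openshift-storage namespace and substitutes <new_node_name> with the name of the new node.
    # List the csi-cephfsplugin-* and csi-rbdplugin-* pods running on the new node.
    oc get pods -n openshift-storage -o wide | grep <new_node_name> | egrep 'csi-cephfsplugin|csi-rbdplugin'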
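
For the encryption check in step 5, the following is an illustrative example of lsblk output; the device names and sizes are placeholders, not values from a real cluster. The crypt entry in the TYPE column for the device that backs an ocs-deviceset indicates that the OSD device is encrypted.
    # Example output only; actual names and sizes depend on your cluster.
    NAME                                           MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
    nvme1n1                                        259:1    0  512G  0 disk
    `-ocs-deviceset-0-data-0-abcde-block-dmcrypt   253:0    0  512G  0 crypt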