Replacing a failed VMware node on user-provisioned infrastructure
Use this information to replace a failed VMware node on a user-provisioned infrastructure.
Before you begin
- Ensure that the replacement node is configured with infrastructure and resources similar to those of the node that you replace.
- You must be logged in to the OpenShift Container Platform cluster.
- Identify the node and its virtual machine (VM) that you need to replace.
- Delete the node, where <node_name> specifies the name of the node that you need to replace:
oc delete nodes <node_name>
- Log in to VMware vSphere and terminate the VM that you identified.
Important: Delete the VM only from the inventory, not from the disk.
- Create a new VM on VMware vSphere with the required
infrastructure. For more information, see ../planning/platform_requirements.html.
- Create a new OpenShift Container Platform worker node using the new VM.
- Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in the Pending state:
oc get csr
- Approve all the required OpenShift Container Platform CSRs for the new
node, where <certificate_name> specifies the name of the CSR.
oc adm certificate approve <certificate_name>
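When several CSRs are pending, they can be approved in bulk. The following is a minimal sketch, assuming the default tabular output of `oc get csr`, where the CSR name is in the first column and the condition is in the last; the helper name `pending_csrs` is illustrative only.

```shell
# Print the names of CSRs whose condition is Pending.
# Assumes the default `oc get csr` table: name in column 1,
# condition in the last column, with a one-line header.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Against a live cluster (not run here):
# oc get csr | pending_csrs | xargs -r oc adm certificate approve
```

The `-r` flag on `xargs` skips the approve call entirely when no CSRs are pending.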
- In the OpenShift web console, click Compute → Nodes and confirm that the new node is in the Ready state.
- Apply the Fusion Data Foundation label to the new node using one of the following methods:
  - From the user interface
    - For the new node, click Action Menu (⋮) → Edit Labels, add cluster.ocs.openshift.io/openshift-storage, and click Save.
  - From the command-line interface
    - Apply the Fusion Data Foundation label to the new node, where <new_node_name> specifies the name of the new node:
oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
What to do next
- Verify that the new node is present in the output of the following command:
oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Verify that at least the following pods on the new node are in a Running state:
  - csi-cephfsplugin-*
  - csi-rbdplugin-*
- Verify that all the other required Fusion Data Foundation pods are in a Running state.
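To spot pods that are not yet healthy, the status column can be filtered. A minimal sketch, assuming `oc get pods -n openshift-storage --no-headers` output with the status in the third column; the helper name `not_running` is an assumption:

```shell
# Print the names of pods whose status is neither Running nor Completed.
# Assumes `oc get pods --no-headers` output: pod name in column 1,
# status in column 3.
not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Against a live cluster (not run here):
# oc get pods -n openshift-storage --no-headers | not_running
```

An empty result means every pod in the namespace is either Running or Completed.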
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node, where <new_node_name> specifies the name of the new node:
oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
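The same check can be scripted per node. A minimal sketch, assuming `oc get pods -o wide --no-headers` output where the pod name is in column 1 and the node name is in column 7; `osd_pods_on_node` is a hypothetical helper:

```shell
# Print the names of osd pods scheduled on the given node.
# Assumes `oc get pods -o wide --no-headers` output: pod name in
# column 1, node name in column 7.
osd_pods_on_node() {
  awk -v n="$1" '$1 ~ /osd/ && $7 == n { print $1 }'
}

# Against a live cluster (not run here):
# oc get pods -o wide -n openshift-storage --no-headers | osd_pods_on_node <new_node_name>
```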
- If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
  - Create a debug pod and open a chroot environment for the selected host:
oc debug node/<node_name>
chroot /host
  - Display the list of available block devices by using the lsblk command:
lsblk
Check for the crypt keyword beside the one or more ocs-deviceset names.
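The crypt check can also be scripted. A minimal sketch, assuming `lsblk -ln -o NAME,TYPE` list output (device name, then type, no headings); the device name in the test data is hypothetical:

```shell
# Print the names of block devices whose TYPE is crypt, which indicates
# a dm-crypt mapping such as an encrypted OSD device.
crypt_devices() {
  awk '$2 == "crypt" { print $1 }'
}

# On the debugged node (not run here):
# lsblk -ln -o NAME,TYPE | crypt_devices
```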
If the verification steps fail, contact IBM Support.