Replacing a failed node on VMware installer-provisioned infrastructure
Use this procedure to replace a failed node on VMware installer-provisioned infrastructure.
Before you begin
- Ensure that the replacement nodes are configured with infrastructure, resources, and disks that are similar to those of the node that you replace.
- You must be logged in to the OpenShift Container Platform cluster.
Procedure
What to do next
- Verify that the new node is present in the output of the following command:
oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
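If the new node does not appear in this output, it might not yet carry the storage label that the grep filters on. As a sketch, assuming the node is otherwise healthy and <new_node_name> is its name, you can apply the label from the command line:
oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""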
- Confirm that at least the following pods on the new node are in a Running state:
  - csi-cephfsplugin-*
  - csi-rbdplugin-*
- Verify that all the other required Fusion Data Foundation pods are in a Running state.
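To scope these pod checks to the replacement node from the command line, one option is to filter the wide pod listing by pod name and node name; a sketch, assuming <new_node_name> is the replacement node:
oc get pods -n openshift-storage -o wide | egrep 'csi-cephfsplugin|csi-rbdplugin' | grep <new_node_name>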
- Ensure that the new incremental mon is created and is in the Running state:
oc get pod -n openshift-storage | grep mon
Example output:
rook-ceph-mon-a-cd575c89b-b6k66     2/2     Running     0     38m
rook-ceph-mon-b-6776bc469b-tzzt8    2/2     Running     0     38m
rook-ceph-mon-d-5ff5d488b5-7v8xh    2/2     Running     0     4m8s
The OSD and monitor pods might take several minutes to get to the Running state.
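Rather than polling manually, you can watch the monitor pods until all three reach the Running state. A sketch that assumes the standard app=rook-ceph-mon label that Rook sets on monitor pods:
oc get pods -n openshift-storage -l app=rook-ceph-mon -w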
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
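If you prefer to block until the OSD pods report Ready instead of re-running the listing, a sketch using oc wait is shown below; the app=rook-ceph-osd selector is an assumption about the standard Rook labels in your deployment:
oc wait pod -l app=rook-ceph-osd -n openshift-storage --for=condition=Ready --timeout=600s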
- If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the selected hosts:
oc debug node/<node_name>
chroot /host
- Display the list of available block devices using the lsblk command:
lsblk
Check for the crypt keyword beside the ocs-deviceset names.
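To narrow the lsblk output to the relevant lines, you can filter for the crypt keyword and the device-set names; a sketch to run inside the chroot environment, assuming the ocs-deviceset naming described above:
lsblk | egrep 'crypt|ocs-deviceset'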
- If the verification steps fail, contact IBM Support.