Cleaning up localhost node from the IBM Storage Fusion HCI System cluster
Follow the instructions to clean up the localhost node from the IBM Storage Fusion HCI System cluster.
About this task
A node with the hostname localhost must not be added to the OpenShift® cluster, because it causes issues at a later stage. Several OpenShift objects are created when the localhost node is added to the IBM Storage Fusion HCI System cluster, and they must be cleaned up.
Procedure
Follow these steps to clean up the localhost node so that the node addition can be retried after you resolve the DHCP or DNS misconfiguration.
- Run the following command to edit the ComputeProvisionWorker CR of the compute node that reported the error message "This node might contain an invalid hostname (localhost)":
  # oc edit cpw provisionworker-compute-1-ru5
  In the CR, update the location field to an empty string so that the ComputeProvisionWorker controller is not involved while you clean up the other objects of the compute node:
  # oc edit cpw provisionworker-compute-1-ru5
  ...
  spec:
    location: ""
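  If you prefer a non-interactive edit, the same change can be applied with a merge patch. This is a minimal sketch, assuming the ComputeProvisionWorker CR lives in the ibm-spectrum-fusion-ns namespace, as shown by the oc get cpw step later in this procedure:
  # oc -n ibm-spectrum-fusion-ns patch cpw provisionworker-compute-1-ru5 --type merge -p '{"spec":{"location":""}}'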
- Delete the machine object of the compute node.
  First identify the machine object and mark it with an annotation, then scale down the machineset object so that the machine object gets deleted.
  - Run the following command to get the machine object name from the BMH object.
    # oc -n openshift-machine-api get bmh,machine
    Example output:
    NAME                                     STATE                    CONSUMER                           ONLINE   ERROR   AGE
    baremetalhost.metal3.io/compute-1-ru5    provisioned              isf-rackae6-42ps4-worker-0-r6fmw   true             22h
    baremetalhost.metal3.io/compute-1-ru6    provisioned              isf-rackae6-42ps4-worker-0-t922n   true             22h
    baremetalhost.metal3.io/compute-1-ru7    provisioned              isf-rackae6-42ps4-worker-0-g842m   true             22h
    baremetalhost.metal3.io/control-1-ru2    externally provisioned   isf-rackae6-42ps4-master-0         true             44h
    baremetalhost.metal3.io/control-1-ru3    externally provisioned   isf-rackae6-42ps4-master-1         true             44h
    baremetalhost.metal3.io/control-1-ru4    externally provisioned   isf-rackae6-42ps4-master-2         true             44h

    NAME                                                            PHASE     TYPE   REGION   ZONE   AGE
    machine.machine.openshift.io/isf-rackae6-42ps4-master-0         Running                          44h
    machine.machine.openshift.io/isf-rackae6-42ps4-master-1         Running                          44h
    machine.machine.openshift.io/isf-rackae6-42ps4-master-2         Running                          44h
    machine.machine.openshift.io/isf-rackae6-42ps4-worker-0-g842m   Running                          22h
    machine.machine.openshift.io/isf-rackae6-42ps4-worker-0-r6fmw   Running                          22h
    machine.machine.openshift.io/isf-rackae6-42ps4-worker-0-t922n   Running                          22h
    In this example, the BMH object compute-1-ru5 maps to the machine object isf-rackae6-42ps4-worker-0-r6fmw.
  - Mark the machine object for deletion with the special annotation machine.openshift.io/cluster-api-delete-machine: delete-me. This special marking overrides the machine deletion policy rule.
    # oc -n openshift-machine-api edit machine isf-rackae6-42ps4-worker-0-r6fmw
    apiVersion: machine.openshift.io/v1beta1
    kind: Machine
    metadata:
      annotations:
        machine.openshift.io/cluster-api-delete-machine: delete-me
    ...
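    As a one-line alternative to the interactive edit, the annotation can also be set with oc annotate. A minimal sketch, using the machine name from the example above:
    # oc -n openshift-machine-api annotate machine isf-rackae6-42ps4-worker-0-r6fmw machine.openshift.io/cluster-api-delete-machine=delete-me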
  - Scale down the machineset object so that the deletion of the machine object is initiated.
    Note: After the machineset scale-down, the machine object that corresponds to compute-1-ru5 is cleaned up, and the state of the BMH object that corresponds to compute-1-ru5 changes to deprovisioning.
    # oc -n openshift-machine-api get machineset
    NAME                         DESIRED   CURRENT   READY   AVAILABLE   AGE
    isf-rackae6-cltp4-worker-0   3         3         3       3           8h
    # oc -n openshift-machine-api get machineset -oyaml | grep replicas
      replicas: 3
    # oc -n openshift-machine-api scale --replicas=<old replica value - 1> machineset <machine set name>
    machineset.machine.openshift.io/isf-rackae6-42ps4-worker-0 scaled
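    As a worked example of the placeholders (a sketch only; the machineset name and replica count come from the example output and vary per cluster), scaling from 3 replicas down to 2 looks like this:
    # oc -n openshift-machine-api get machineset isf-rackae6-42ps4-worker-0 -o jsonpath='{.spec.replicas}{"\n"}'
    3
    # oc -n openshift-machine-api scale --replicas=2 machineset isf-rackae6-42ps4-worker-0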
  - After the machineset scale-down completes successfully, the state of the BMH object that corresponds to compute-1-ru5 changes to ready.
    Important: This activity might take a few minutes. Wait for the change to be reflected on the BMH object before you proceed with the next steps.
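    One way to watch for the state change without rerunning the full listing is to query the BMH directly. A minimal sketch, assuming the standard metal3 BareMetalHost status layout (.status.provisioning.state); the second command watches the object until you interrupt it:
    # oc -n openshift-machine-api get bmh compute-1-ru5 -o jsonpath='{.status.provisioning.state}{"\n"}'
    # oc -n openshift-machine-api get bmh compute-1-ru5 -w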
- Delete the node object of the compute node from the OpenShift Container Platform cluster, if one is present.
  # oc get nodes
  NAME                                            STATUS   ROLES           AGE   VERSION
  compute-1-ru6.isf-rackae6.rtp.raleigh.ibm.com   Ready    worker          21h   v1.23.17+16bcd69
  compute-1-ru7.isf-rackae6.rtp.raleigh.ibm.com   Ready    worker          21h   v1.23.17+16bcd69
  control-1-ru2.isf-rackae6.rtp.raleigh.ibm.com   Ready    master,worker   43h   v1.23.17+16bcd69
  control-1-ru3.isf-rackae6.rtp.raleigh.ibm.com   Ready    master,worker   43h   v1.23.17+16bcd69
  control-1-ru4.isf-rackae6.rtp.raleigh.ibm.com   Ready    master,worker   43h   v1.23.17+16bcd69
  localhost                                       Ready    worker          20h   v1.23.17+16bcd69
  # oc delete node localhost
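  As an optional check that the node object is gone (--ignore-not-found suppresses the error when the node is already deleted, so no output means the cleanup succeeded):
  # oc get node localhost --ignore-not-found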
- Delete the BMH object of the compute node.
  # oc -n openshift-machine-api get bmh
  NAME            STATE                    CONSUMER                           ONLINE   ERROR   AGE
  compute-1-ru5   available                                                   false            24h
  compute-1-ru6   provisioned              isf-rackae6-42ps4-worker-0-t922n   true             24h
  compute-1-ru7   provisioned              isf-rackae6-42ps4-worker-0-g842m   true             24h
  control-1-ru2   externally provisioned   isf-rackae6-42ps4-master-0         true             46h
  control-1-ru3   externally provisioned   isf-rackae6-42ps4-master-1         true             46h
  control-1-ru4   externally provisioned   isf-rackae6-42ps4-master-2         true             46h
  # oc -n openshift-machine-api delete bmh compute-1-ru5
  baremetalhost.metal3.io "compute-1-ru5" deleted
- Delete all pending CertificateSigningRequests corresponding to the localhost node.
  # for i in `oc get csr --no-headers | grep -i system:node:localhost | grep -i pending | awk '{ print $1 }'`;do oc delete csr $i; done
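  To confirm that no pending CSRs remain for the localhost node, you can rerun the filter on its own (an optional check; no output means the cleanup is complete):
  # oc get csr --no-headers | grep -i system:node:localhost | grep -i pending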
- Fix the DNS or DHCP issue so that the corresponding compute node gets its correct hostname instead of localhost.
- Delete the ComputeProvisionWorker object of the compute node.
  # oc -n ibm-spectrum-fusion-ns get cpw
  NAME                            AGE
  provisionworker-compute-1-ru5   24h
  provisionworker-compute-1-ru6   24h
  provisionworker-compute-1-ru7   24h
  # oc -n ibm-spectrum-fusion-ns delete cpw provisionworker-compute-1-ru5
  computeprovisionworker.install.isf.ibm.com "provisionworker-compute-1-ru5" deleted
- If the issue occurred during installation, at the time of OpenShift configuration, the node conversion resumes automatically. If the issue occurred during a node upsize, retry the node addition.