Draining nodes
When an administrator performs maintenance on a node that involves a drain, the IBM Storage Scale operator intercepts the drain call. The operator intercepts this request from the Kubernetes scheduler in order to evict the application pods that use the CSI PVs before allowing the deletion of the core pod. This allows the application to shut down gracefully before storage access on the node is disrupted. On Red Hat OpenShift, this includes intercepting requests from the Machine Config Operator that involve a drain.
If the operator is not running, the interception and drain will fail.
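Because a drain fails when the operator is not running, it can help to check the operator before starting maintenance. The following is a minimal sketch, not a documented procedure. It assumes that the operator pods run in a namespace named ibm-spectrum-scale-operator (adjust OPERATOR_NS for your installation), and operator_running is a hypothetical helper name. A pod in the Running phase is not necessarily ready, so treat the result as a quick sanity check only.

```shell
#!/bin/sh
# Assumption: the operator is deployed in this namespace; override
# OPERATOR_NS if your installation uses a different one.
OPERATOR_NS="${OPERATOR_NS:-ibm-spectrum-scale-operator}"

# operator_running (hypothetical helper): succeeds if at least one pod
# in the operator namespace reports the Running phase.
operator_running() {
    pods=$(kubectl get pods -n "$OPERATOR_NS" \
        --field-selector=status.phase=Running -o name) || return 1
    # An empty result means no Running pod was found.
    [ -n "$pods" ]
}
```

For example, a drain could be guarded with operator_running && kubectl drain <node name>. On Red Hat OpenShift, oc accepts the same get pods arguments.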
While maintenance is in progress, do not perform the following actions:
- Update the IBM Storage Scale container native configuration.
- Upgrade IBM Storage Scale container native.
- Add a node to the cluster or delete a node from the cluster.
In addition, a deadlock is possible when applications are waiting to be safely evicted from one or more nodes under maintenance. The deadlock can occur if too many nodes are under maintenance concurrently and taking further action would disrupt the application workload. In these cases, the administrator should safely evict and reschedule the impacted applications.
For more information about troubleshooting cluster maintenance issues, see Identifying applications preventing cluster maintenance.
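As a first look at a stalled drain, generic Kubernetes queries can show which nodes are cordoned and which pods are still scheduled on a given node. This sketch is not the procedure from the referenced topic. The function names cordoned_nodes and pods_on_node are hypothetical; the field selectors spec.unschedulable and spec.nodeName are standard Kubernetes ones.

```shell
#!/bin/sh
# cordoned_nodes (hypothetical helper): print the nodes that are
# currently marked unschedulable, that is, cordoned.
cordoned_nodes() {
    kubectl get nodes --field-selector spec.unschedulable=true -o name
}

# pods_on_node NODE (hypothetical helper): list the pods in all
# namespaces that are still scheduled on NODE.
pods_on_node() {
    kubectl get pods --all-namespaces -o wide \
        --field-selector "spec.nodeName=$1"
}
```

Running cordoned_nodes and then pods_on_node for each reported node gives a rough picture of how many nodes are under maintenance at once and which application pods have not yet been evicted.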
Node maintenance
When an administrator needs to perform manual maintenance on a node, the node should be cordoned and safely drained of pods. Draining reschedules the application to other available nodes and notifies the IBM Storage Scale operator that node maintenance is needed.
- Drain the node that needs maintenance. The drain command also cordons the node.
  On Red Hat OpenShift:
  oc adm drain <node name>
  On Kubernetes:
  kubectl drain <node name>
- When the drain completes without error, maintenance can be performed on the node.
- When maintenance is complete, uncordon the node to allow it to resume normal operation and workload scheduling.
  On Red Hat OpenShift:
  oc adm uncordon <node name>
  On Kubernetes:
  kubectl uncordon <node name>
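The steps above can be sketched as a small wrapper that enforces their order: drain, perform maintenance only if the drain completed without error, then uncordon. This is an illustrative sketch, not part of the product. The maintain_node function and the KUBE_CLI variable are hypothetical names, and the maintenance command is whatever the administrator supplies.

```shell
#!/bin/sh
# Assumption: set KUBE_CLI="oc adm" on Red Hat OpenShift; the default
# targets plain Kubernetes.
KUBE_CLI="${KUBE_CLI:-kubectl}"

# maintain_node NODE CMD [ARGS...] (hypothetical helper)
# Drains NODE, runs CMD only if the drain succeeded, and uncordons
# NODE only if CMD succeeded.
maintain_node() {
    node=$1
    shift
    # $KUBE_CLI is intentionally unquoted so "oc adm" splits into two words.
    if ! $KUBE_CLI drain "$node"; then
        echo "drain of $node failed; the node may still be cordoned" >&2
        return 1
    fi
    if ! "$@"; then
        echo "maintenance failed; $node left cordoned" >&2
        return 1
    fi
    $KUBE_CLI uncordon "$node"
}
```

The wrapper leaves the node cordoned when either the drain or the maintenance command fails, so that workloads are not scheduled back onto a node in an unknown state. Whether that is the right policy depends on the environment.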