Cloud Pak for Security: Compute node on cluster has "Disk Pressure", pods display as "Evicted" and CP4S fails to start

Troubleshooting

Problem

A compute node reported a "DiskPressure" condition, and pods on it were being evicted. The disk pressure on the node prevented pods from being scheduled on it.

Symptom

The ingress operator update appears to be stuck, and many pods are created, which causes disk pressure on the node.

Diagnosing The Problem

To see the stalled ingress operator ClusterServiceVersion (CSV), run the following command:
```
oc get csv -n ibm-common-services
```
Look for CSV updates that are stalled. A stalled update can create hundreds of "ibm-management-ingress-operator" pods, which can cause disk pressure on the nodes.
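To gauge how many ingress operator pods have piled up, you can tally them by status. The following is a minimal sketch, not an IBM-provided tool: the function name summarize_ingress_operator_pods is hypothetical, and it assumes the default `oc get pods --no-headers` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: count ibm-management-ingress-operator pods per STATUS.
# Reads `oc get pods -n ibm-common-services --no-headers` output on stdin.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
summarize_ingress_operator_pods() {
  awk '$1 ~ /^ibm-management-ingress-operator/ { count[$3]++ }
       END { for (s in count) print s, count[s] }' | sort
}
```

Usage: `oc get pods -n ibm-common-services --no-headers | summarize_ingress_operator_pods`. A total in the hundreds matches the stalled-update symptom described above.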

Resolving The Problem

Procedure

Log in to your Red Hat OpenShift Container Platform Console.
In Red Hat OpenShift Container Platform Console, click the drop-down arrow next to your username.
Select the option Copy login command, then click Display Token.
Highlight and copy the oc login command. The command looks similar to the following:
```
oc login --token=sha256......
```

Run the following oc command. If any compute node reports the "DiskPressure" condition as "True", proceed to the next step.
```
oc describe node -l node-role.kubernetes.io/worker | grep -i diskpressure
```
The output looks similar to the following:
```
  DiskPressure     False   Wed, 23 Nov 20...   ...kubelet has no disk pressure
  DiskPressure     True    Wed, 23 Nov 20...   ...kubelet has disk pressure
  ...
```
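The command above shows which conditions are "True" but not which node they belong to. As a hedged alternative, the sketch below prints the names of nodes whose DiskPressure condition is "True". It assumes tab-separated name/status pairs produced by the jsonpath query in the comment; the function name nodes_with_disk_pressure is hypothetical.

```shell
# Hypothetical helper: print the names of nodes whose DiskPressure condition is "True".
# Expects tab-separated "<node-name>\t<DiskPressure status>" lines on stdin, e.g. from:
#   oc get nodes -l node-role.kubernetes.io/worker \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
nodes_with_disk_pressure() {
  awk -F '\t' '$2 == "True" { print $1 }'
}
```

Each printed name is a candidate for the `<node>` placeholder in the cordon, reboot, and uncordon steps.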

Run the following command to list the ClusterServiceVersion (CSV) resources in the "ibm-common-services" namespace. If a version of "ibm-management-ingress-operator" is stuck in the Pending phase, as in the following example, the update is stalled.
```
oc get csv -n ibm-common-services
```
The output looks similar to the following:
```
NAME                            ... VERSION   REPLACES                                  PHASE
ibm-management-ingress-operator ... 1.16.5    ibm-management-ingress-operator.v1.16.4   Replacing
ibm-management-ingress-operator ... 1.16.6    ibm-management-ingress-operator.v1.16.5   Pending
```
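The elided columns in the listing above can contain spaces (for example, the display name), so splitting the table by column position is fragile. A hedged sketch that instead reads each CSV's name and `status.phase` and prints the ingress operator CSVs that are still Pending; the function name pending_ingress_csvs is hypothetical.

```shell
# Hypothetical helper: print ibm-management-ingress-operator CSVs whose phase is Pending.
# Expects tab-separated "<csv-name>\t<phase>" lines on stdin, e.g. from:
#   oc get csv -n ibm-common-services \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
pending_ingress_csvs() {
  awk -F '\t' '$1 ~ /^ibm-management-ingress-operator/ && $2 == "Pending" { print $1 }'
}
```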

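Run the following command to delete the stalled CSV. Use the full name of the stalled CSV as shown in the NAME column; CSV names normally include a version suffix, as in the REPLACES column.
```
oc delete csv ibm-management-ingress-operator -n ibm-common-services
```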
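Before deleting anything, you may prefer to review the exact commands first. A minimal sketch (the function name print_csv_delete_commands is hypothetical) that turns a list of CSV names, one per line, for example copied from the NAME column of `oc get csv`, into delete commands that are printed rather than run:

```shell
# Hypothetical helper: turn CSV names (one per line on stdin) into
# `oc delete csv` commands, printed for review instead of executed.
print_csv_delete_commands() {
  while IFS= read -r csv; do
    # Skip blank lines so no empty delete command is emitted.
    if [ -n "$csv" ]; then
      printf 'oc delete csv %s -n ibm-common-services\n' "$csv"
    fi
  done
}
```

After you check the printed commands, you can pipe the same output to `sh` to run them.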

Run the following command to cordon the node that had the "DiskPressure" state of "True" from the earlier step. Replace <node> with the name of the compute node.
```
$ oc adm cordon <node>
```
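To confirm that the cordon took effect before you reboot, you can check the node's `spec.unschedulable` field, which `oc adm cordon` sets to `true`. A minimal sketch with a hypothetical function name:

```shell
# Hypothetical helper: succeed (exit 0) only if the given node is cordoned.
# A cordoned node has spec.unschedulable set to true; otherwise the field is empty.
is_cordoned() {
  [ "$(oc get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}
```

Usage: `is_cordoned <node> && echo "node is cordoned"`.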
Reboot the compute node.
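How you reboot the node depends on your environment. One common approach on OpenShift Container Platform 4, assuming you have cluster-admin access, is to run `systemctl reboot` on the host through a debug pod. The function name reboot_node is hypothetical.

```shell
# Hypothetical helper: reboot a node from a debug pod.
# `oc debug node/<node>` starts a debug pod on that node; `chroot /host`
# switches to the host's root filesystem so systemctl acts on the node itself.
reboot_node() {
  oc debug "node/$1" -- chroot /host systemctl reboot
}
```

The debug session is expected to disconnect once the reboot begins.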
After the node completes the reboot, uncordon the compute node. Replace <node> with the name of the compute node.
```
oc adm uncordon <node>
```
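To avoid uncordoning a node that is still coming back up, you can wait for its Ready condition first. This is a hedged sketch: the function name and the 15-minute default timeout are assumptions. Run it only after the node has actually gone down, because immediately after the reboot command the Ready condition may still read True and the wait returns at once.

```shell
# Hypothetical helper: wait for the node to report Ready, then uncordon it.
# The timeout (optional second argument) defaults to 15m; adjust it for your hardware.
uncordon_when_ready() {
  oc wait --for=condition=Ready "node/$1" --timeout="${2:-15m}" &&
    oc adm uncordon "$1"
}
```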
Delete all pods under the "ibm-common-services" namespace.
```
$ oc delete pod --all -n ibm-common-services 
```
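Pods managed by deployments and other controllers are recreated after deletion, so the namespace can take some time to settle. A minimal sketch (hypothetical function name) that lists pods whose status is neither Running nor Completed, assuming the default `oc get pods --no-headers` columns:

```shell
# Hypothetical helper: print pods whose STATUS is neither Running nor Completed.
# Reads `oc get pods -n ibm-common-services --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
pods_not_settled() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Usage: rerun `oc get pods -n ibm-common-services --no-headers | pods_not_settled` until it prints nothing.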
Review all pods to confirm there are no Evicted pods.
```
$ oc get pod --all-namespaces | grep Evicted
```
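If you want the check to produce a pass/fail result (for example, in a script), the following hedged sketch matches on the STATUS column only and exits non-zero when any Evicted pod remains. The function name check_no_evicted is hypothetical, and it assumes the default `oc get pod --all-namespaces --no-headers` columns.

```shell
# Hypothetical helper: print "<namespace>/<pod>" for every Evicted pod and
# exit non-zero if any are found. Reads `oc get pod --all-namespaces --no-headers`
# output on stdin (columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
check_no_evicted() {
  awk '$4 == "Evicted" { print $1 "/" $2; found = 1 } END { exit found }'
}
```

Usage: `oc get pod --all-namespaces --no-headers | check_no_evicted && echo "no Evicted pods"`.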
Results
You can now log in to your Red Hat OpenShift Container Platform console.

Related Information

Unable to upgrade operator for CSV Pending in Openshift 4

Document Location

Worldwide

