Troubleshooting
Problem
How do you debug cluster nodes that become "NotReady" with "PLEG is not healthy" as the reason?
Resolving The Problem
What's Happening
The cluster nodes become "NotReady" with the following reason:
PLEG is not healthy: pleg was last seen active ***h**m***s ago;
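To find the affected nodes, one option is to filter the node list by status. The helper below is a minimal sketch, not a definitive tool: the function name not_ready_nodes is illustrative, and it assumes the default column layout of kubectl get nodes (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Print the names of nodes whose STATUS starts with "NotReady".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Assumes the default columns: NAME STATUS ROLES AGE VERSION.
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Running kubectl describe node against a reported node name should show the reason message in the node's conditions.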
Why it's Happening
Cluster nodes can become "NotReady" with this reason when they are overloaded. Checking the load on the affected nodes yourself helps to diagnose the issue.
How to Fix
You need to understand the load on the nodes to determine if it might be contributing to the issues. You can use the following commands to get an understanding of the load on the nodes:
- kubectl top nodes: This command shows you if any of the nodes are getting close to CPU/memory capacity.
- kubectl top pods --all-namespaces: This command will show if any pods are consuming large amounts of CPU/memory.
- kubectl get pods --all-namespaces -o=wide: This command lets you match pods that are using a lot of resources to the nodes that they are running on, and shows the total number of pods in the system.
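The first check in the list above can be narrowed down to only the nodes that are close to capacity. The following is a rough sketch under stated assumptions: the function name busy_nodes and the default threshold of 80 percent are illustrative choices, and the parsing assumes the default kubectl top nodes columns (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%).

```shell
# Print nodes whose CPU% or MEMORY% is at or above a threshold.
# Reads `kubectl top nodes --no-headers` output on stdin.
# The threshold argument is optional; 80 is an assumed default.
busy_nodes() {
  awk -v limit="${1:-80}" '
    # Columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
    # "$3 + 0" turns a value such as "91%" into the number 91.
    ($3 + 0 >= limit) || ($5 + 0 >= limit) { print $1, $3, $5 }
  '
}

# Usage (requires cluster access and a working metrics source):
#   kubectl top nodes --no-headers | busy_nodes 80
```

Each output line contains the node name followed by its CPU and memory percentages.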
There are situations where PLEG errors can occur if there is a large number of containers on the nodes (even when there is plenty of spare CPU/memory). It depends on configuration, but problems typically occur when you exceed about 400 containers per node. You can see the number of containers per pod in the output from the kubectl get pods command under the READY column.
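Adding up the READY column by hand becomes tedious on a large cluster, so the count can be approximated per node with a short script. This is a sketch with assumptions: the function name containers_per_node is illustrative, and it counts only the regular containers declared in each pod spec, so init containers and any containers the runtime adds are not included. Pods that are not yet scheduled to a node are skipped.

```shell
# Sum the containers declared in pod specs, grouped by node.
# Reads lines of the form "NODE name1,name2,..." on stdin and prints
# "COUNT NODE" pairs, highest count first.
containers_per_node() {
  awk '
    # Unscheduled pods show "<none>" as the node name; skip them.
    # split() returns the number of comma-separated container names.
    $1 != "<none>" { count[$1] += split($2, names, ",") }
    END { for (node in count) print count[node], node }
  ' | sort -rn
}

# Usage (requires cluster access):
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns='NODE:.spec.nodeName,CONTAINERS:.spec.containers[*].name' \
#     | containers_per_node
```

Nodes at the top of the output with counts near the figure mentioned above are the ones to look at first.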
You can also see PLEG errors if there are thousands of services in the cluster. You can retrieve a list of the services in the cluster using the following command:
kubectl get services --all-namespaces
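When the list is long, a count is usually more useful than the full output. The sketch below groups services by namespace. The function name services_per_namespace is illustrative, and it assumes that the first column of the --all-namespaces output is the namespace.

```shell
# Count services per namespace.
# Reads `kubectl get services --all-namespaces --no-headers` output
# on stdin and prints "COUNT NAMESPACE" pairs, highest count first.
services_per_namespace() {
  awk '{ count[$1]++ } END { for (ns in count) print count[ns], ns }' | sort -rn
}

# Usage (requires cluster access):
#   kubectl get services --all-namespaces --no-headers | services_per_namespace
#
# For the cluster-wide total only:
#   kubectl get services --all-namespaces --no-headers | wc -l
```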
Document Location
Worldwide
Document Information
Modified date:
01 August 2019
UID
ibm1KB0011156