
Cluster imbalance overwhelms worker nodes

An imbalance in resource allocation across the cluster can overwhelm individual worker nodes.

Symptoms

The following symptoms occur and can affect the Watson Studio, Watson Knowledge Catalog, and Analytics Engine Powered by Apache Spark services:

  • Clusters are unstable.
  • The resources of a single worker node are overwhelmed by too many pods.
  • Tools that consume more than their minimum resources, such as Jupyter notebook editors, either can't run or have slow kernel startup.

Causes

Too many pods are scheduled on the same worker node. When one of the services starts using more than its minimum (requested) resources, the pods on that node together try to use more CPU than the node has.

Diagnosing the problem

Check cluster resources

  1. Check resource consumption on all of the nodes by using the following command:
    oc adm top nodes

    Determine whether there are resource consumption constraints on the CPU and memory.

  2. Check resource reservation on all of the nodes by using the following command:
    oc describe node nodeName

    In the Allocated resources section, check the CPU and memory Requests and Limits. Determine whether the CPU and memory resource reservations are constrained.

    Note: New pods that request resources can't be scheduled on a worker node whose Requests have reached 100% of its allocatable capacity. Limits can exceed 100% because over-subscription is allowed for limits.
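
The two checks above can be scripted. The following is a minimal sketch, assuming a POSIX shell with awk and the column layouts that current oc versions print for oc adm top nodes and for the Allocated resources section of oc describe node; older oc releases may lay out these tables differently. The function names flag_busy_nodes and node_allocation and the 80% default threshold are hypothetical.

```shell
# Hypothetical helpers for the "Check cluster resources" steps.
# Both read oc output on stdin and use only POSIX awk.

# Print nodes whose CPU% or MEMORY% from "oc adm top nodes" is at or above
# a threshold (default 80, an arbitrary example value).
flag_busy_nodes() {
  awk -v t="${1:-80}" '
    NR == 1 { next }                      # skip the header row
    {
      cpu = $3; mem = $5
      sub(/%$/, "", cpu); sub(/%$/, "", mem)
      # Skip nodes that report no metrics, for example "<unknown>"
      if (cpu !~ /^[0-9]+$/ || mem !~ /^[0-9]+$/) next
      if (cpu + 0 >= t + 0 || mem + 0 >= t + 0)
        printf "%s cpu=%s%% memory=%s%%\n", $1, cpu, mem
    }'
}

# Print the CPU and memory Requests/Limits percentages from the
# "Allocated resources" section of "oc describe node". Requests at or above
# 100% are marked FULL: pods that request that resource can't fit there.
node_allocation() {
  awk '
    /^Allocated resources:/ { in_alloc = 1; next }
    in_alloc && /^[^ ]/     { in_alloc = 0 }   # next top-level section
    in_alloc && ($1 == "cpu" || $1 == "memory") {
      req = $3; lim = $5                      # for example "(46%)" and "(120%)"
      gsub(/[()%]/, "", req); gsub(/[()%]/, "", lim)
      status = (req + 0 >= 100) ? "FULL" : "ok"
      printf "%s requests=%s%% limits=%s%% %s\n", $1, req, lim, status
    }'
}

# Examples:
#   oc adm top nodes | flag_busy_nodes 80
#   for node in $(oc get nodes -o name); do
#     echo "== $node"; oc describe "$node" | node_allocation
#   done
```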

Check cluster nodes status

Check events on all of the nodes by using the following command:

oc describe node nodeName

At the bottom of the output, under Events, you’ll see activities involving the node.
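
To review events across every node in one pass, a minimal sketch follows. It assumes that Events is the last section of the oc describe node output and that each event row starts with its Type (Normal or Warning); the function name node_warnings is hypothetical. It prints only Warning events, which are typically the most relevant when a node is under pressure.

```shell
# Hypothetical helper: print only the Warning rows from the Events section
# of "oc describe node" output read on stdin.
node_warnings() {
  awk '
    /^Events:/ { in_events = 1; next }
    in_events && $1 == "Warning"'
}

# Example: check every node in the cluster.
#   for node in $(oc get nodes -o name); do
#     echo "== $node"
#     oc describe "$node" | node_warnings
#   done
```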

Check pods with Terminating status

List the pods that are stuck in Terminating status, excluding cleanup pods, by using the following command:

oc get pod --all-namespaces -o wide | grep -i terminating | grep -i -v cleanup

If you see a large number of pods in Terminating or Error status, they can hold reserved CPU and memory resources even though those resources aren't being used.
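
To see where the stuck pods are concentrated, the listing can be summarized per namespace. The following minimal sketch assumes the default oc get pod --all-namespaces -o wide column layout (NAMESPACE, NAME, READY, STATUS, and so on); the function name count_stuck_pods is hypothetical. Unlike the grep pipeline above, it matches the STATUS column exactly and skips pods whose name contains cleanup.

```shell
# Hypothetical helper: count pods in Terminating or Error status per
# namespace, skipping pods whose name contains "cleanup" (case-insensitive),
# from "oc get pod --all-namespaces -o wide" output read on stdin.
count_stuck_pods() {
  awk '
    $1 == "NAMESPACE" { next }                  # skip the header row
    ($4 == "Terminating" || $4 == "Error") && tolower($2) !~ /cleanup/ {
      count[$1 " " $4]++
    }
    END { for (key in count) print key, count[key] }' | sort
}

# Example:
#   oc get pod --all-namespaces -o wide | count_stuck_pods
```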

Resolving the problem

Modify the default scheduler to fix notebook workload imbalance (only required for Red Hat® OpenShift® version 3.11)

To modify the default scheduler:

  1. On each master node, edit the /etc/origin/master/scheduler.json file and set the weight of the following priorities:
    SelectorSpreadPriority: 100 (default: 1)
    LeastRequestedPriority: 100 (default: 1)
    BalancedResourceAllocation: 50 (default: 1)
  2. After you make the changes, run the following commands on each master node:
    master-restart api api
    master-restart controllers controllers

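When there are several master nodes, the restart commands can be run on each of them in turn. The following is a minimal sketch that assumes you can reach every master over SSH as a user that is allowed to run master-restart, and that scheduler.json has already been edited on each host; the function name restart_masters and the host names in the example are hypothetical.

```shell
# Hypothetical helper: restart the API server and controllers on each
# master host given as an argument, stopping at the first failure.
restart_masters() {
  for host in "$@"; do
    echo "Restarting master services on $host"
    ssh "$host" 'master-restart api api && master-restart controllers controllers' || return 1
  done
}

# Example (hypothetical host names):
#   restart_masters root@master1.example.com root@master2.example.com
```

Restarting one master at a time keeps the other masters serving the API while each one restarts.
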
Identify saturated pods and reschedule them to other nodes

Use the following procedures to identify the saturated pods and reschedule them to other nodes. These procedures are a workaround for a single node that is constrained by too many pods, for example, Watson Knowledge Catalog pods.

Identify the saturated pods by doing the following steps:

  1. Run oc describe node nodeName to see the full list of pods on the node, together with their resource requests and limits.
  2. In the pod list of the describe node output, identify the pods that request a large amount of CPU or memory.
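
Sorting the pod table makes the heaviest pods easier to spot. The following minimal sketch assumes the Non-terminated Pods table layout of oc describe node, in which each row is namespace, name, then CPU requests, CPU limits, memory requests, and memory limits, each followed by a percentage in parentheses; the function name top_requesting_pods and the default of 5 rows are hypothetical. It ranks pods by their CPU request as a percentage of the node's allocatable CPU.

```shell
# Hypothetical helper: list the pods with the largest CPU requests from the
# "Non-terminated Pods" table of "oc describe node" output read on stdin.
# Usage: oc describe node nodeName | top_requesting_pods 5
top_requesting_pods() {
  awk '
    /^Non-terminated Pods:/ { in_pods = 1; next }
    in_pods && /^[^ ]/      { in_pods = 0 }   # next section ends the table
    in_pods && NF >= 8 && $1 != "Namespace" && $1 !~ /^-+$/ {
      cpu = $4; mem = $8                     # for example "(25%)" and "(12%)"
      gsub(/[()%]/, "", cpu); gsub(/[()%]/, "", mem)
      # Prefix the CPU percentage so that sort can rank the rows
      printf "%s %s %s cpu=%s%% memory=%s%%\n", cpu, $1, $2, cpu, mem
    }' | sort -k1,1nr | head -n "${1:-5}" | cut -d' ' -f2-
}
```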

Reschedule the pods to the other nodes by doing the following steps:

  1. Mark the node as unschedulable by using the following command:
    oc adm cordon nodename
  2. Check that the node is unschedulable by using the following command:
    oc get nodes
    The node shows SchedulingDisabled in its status.
  3. Delete the saturated pods so that their controllers re-create them on other nodes by using the following command:
    oc delete pod podname -n namespace
  4. Mark the node as schedulable again by using the following command:
    oc adm uncordon nodename
  5. Confirm that the node is back in Ready status, without SchedulingDisabled, by using the following command:
    oc get nodes
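
The cordon, delete, and uncordon steps can be wrapped in one function. The following is a minimal sketch; the function name reschedule_pods and the SETTLE_SECONDS variable (how long to wait before uncordoning, default 30) are hypothetical. The pause is an assumption meant to give the controllers time to re-create the deleted pods on other nodes before the original node accepts pods again; confirm with oc get pods -n namespace -o wide that the replacements landed elsewhere.

```shell
# Hypothetical helper: move the named pods off a node.
# Usage: reschedule_pods NODE NAMESPACE POD [POD...]
reschedule_pods() {
  local node="$1" ns="$2" pod
  shift 2

  # Stop new pods from being scheduled on the node
  oc adm cordon "$node" || return 1

  # Delete the pods; their controllers re-create them on schedulable nodes
  for pod in "$@"; do
    oc delete pod "$pod" -n "$ns"
  done

  # Assumption: SETTLE_SECONDS is long enough for the replacement pods
  # to be scheduled elsewhere before the node is made schedulable again
  sleep "${SETTLE_SECONDS:-30}"

  # Always uncordon, even if a delete failed, so the node isn't left disabled
  oc adm uncordon "$node"
}
```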

Clean up terminating pods

Use the following steps as a workaround to resolve situations where too many terminating pods consume resources.

  1. Use the following command to get the list of terminating pods:
    oc get pod --all-namespaces -o wide | grep -i terminating | grep -i -v cleanup
  2. Use the following command to force delete the pods, each in its own namespace:
    oc get pod --all-namespaces -o wide | grep -i terminating | grep -i -v cleanup | awk '{print $1, $2}' | while read -r ns pod; do oc delete pod "$pod" -n "$ns" --grace-period=0 --force; done