Kubernetes CPU throttling: The silent killer of response time
11 April 2023
4 min read

Today, the majority of enterprise organizations running mission-critical applications on Kubernetes are doing so in multitenant environments. These multitenant environments rely on limits to regulate tenant workloads’ consumption or to enable chargebacks. Some developers also choose to set CPU or GPU limits when benchmark testing their applications.

CPU throttling—whereby the rate of task scheduling on the physical CPU cores is inadvertently decreased, often resulting in an undesired increase in application response time—is the unintended consequence of this design. Take a look at this example:

In the above figure, the CPU usage of a container is only 25%, which makes it a natural candidate to resize down:

But after we resize the container down (its CPU usage is now 50%, still not high), the response time quadruples.

What is CPU throttling?

So, what’s going on here? CPU throttling occurs when a container hits the CPU limit you configured for it, which can inadvertently slow your application’s response time. Even if you have more than enough resources on your underlying node, your container workload will still be throttled if its limit is not configured properly. Furthermore, the performance impact of throttling can vary depending on the underlying physical processor (e.g., Intel vs. AMD). The high response times are directly correlated to periods of high CPU throttling, and this is exactly how Kubernetes was designed to work.

To bring some color to this, imagine you set a CPU limit of 0.2 CPU (200 millicores) and that limit is translated into a cgroup quota in the underlying Linux system. The container is only able to use 20ms of CPU time per scheduling period (called a CPU time slice), because the default CFS enforcement period is 100ms. If your task needs more CPU time than that, say 80ms, it runs for 20ms, then sits throttled for the remaining 80ms of each period, finishing after roughly 320ms: 4x longer than the task would take unthrottled.
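To make the arithmetic concrete, here is a minimal sketch (not kernel or Kubernetes code) that models how a Linux CFS quota stretches a single-threaded task's wall-clock completion time; the function name and the simplifying assumptions (one thread, work starts at the top of a period) are our own:

```python
def wall_clock_ms(task_cpu_ms: float, quota_ms: float, period_ms: float = 100.0) -> float:
    """Estimate wall-clock time for a single-threaded task under a CFS quota.

    Each period, the task runs for quota_ms, then is throttled for the rest
    of the period. The final period ends as soon as the task finishes.
    """
    full_periods = int(task_cpu_ms // quota_ms)
    remainder = task_cpu_ms - full_periods * quota_ms
    if remainder == 0:
        # The task finishes exactly at the end of its last quota slice,
        # so it does not wait out the final throttled window.
        return (full_periods - 1) * period_ms + quota_ms if full_periods else 0.0
    return full_periods * period_ms + remainder

# A 0.2-CPU limit gives a 20ms quota per 100ms period. An 80ms task then
# takes 320ms of wall-clock time: 4x slower than unthrottled.
print(wall_clock_ms(80, 20))  # 320.0
```

Note how the slowdown is driven entirely by the ratio of quota to period, not by how busy the node is.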

Based on this behavior, the application’s performance will suffer due to the throttling-induced increase in response time, and you will begin troubleshooting to find the problem.

Troubleshooting CPU throttling in Kubernetes

If you are running a small deployment, you may be able to manually troubleshoot throttling.

First, you would identify the affected pod using tools like kubectl. Next, review the pod’s resource requests and limits to ensure they are set appropriately. Check for any resource-hungry processes running inside the container that may be causing the throttling, and analyze the CPU utilization against the limits.
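The requests and limits you would review live in the pod spec. A minimal illustrative manifest (the names and image are hypothetical) looks like this; note that the 200m limit is exactly the 20ms-per-100ms quota from the earlier example:

```yaml
# Illustrative pod spec; names and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: checkout-service
spec:
  containers:
    - name: app
      image: registry.example.com/checkout:1.0
      resources:
        requests:
          cpu: 200m        # scheduling guarantee
          memory: 256Mi
        limits:
          cpu: 200m        # hard cap: exceeding this triggers CFS throttling
          memory: 256Mi
```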

If CPU throttling persists, consider horizontal pod autoscaling to distribute the workload across more pods, or adjust the cluster’s node resources to meet the demands. Continuously monitor and fine-tune resource settings to optimize performance and prevent further throttling issues.
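As a sketch of the horizontal scaling option, a HorizontalPodAutoscaler (autoscaling/v2 API) can spread the load across more replicas as average CPU utilization climbs; the target names and thresholds below are illustrative:

```yaml
# Illustrative HPA; target names and thresholds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```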

In a larger deployment, this manual approach is unlikely to scale as you add more pods.

Using IBM Turbonomic to avoid CPU throttling in Kubernetes

CPU throttling is a key application performance metric due to the direct correlation between response time and CPU throttling. This is great news for you, as you can get this metric directly from Kubernetes and OpenShift.
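If you expose cAdvisor metrics via Prometheus (a common setup, though your scrape configuration may differ), one way to surface this metric is the fraction of CFS periods in which each container was throttled:

```promql
# Fraction of CFS periods in which each container was throttled
# over the last 5 minutes, from the standard cAdvisor counters.
sum by (namespace, pod, container) (
  rate(container_cpu_cfs_throttled_periods_total[5m])
)
/
sum by (namespace, pod, container) (
  rate(container_cpu_cfs_periods_total[5m])
)
```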

To keep application response times low, avoid CPU throttling and maintain a high-performance application, you first need to understand that CPU core utilization alone will not reveal when throttling is occurring. You need to account for all of the analytics and resource dependencies that impact application performance. IBM Turbonomic has built these considerations into its analytics platform.

When determining container rightsizing actions, Turbonomic continuously analyzes four dimensions:

  1. CPU limits
  2. CPU requests
  3. Memory limits
  4. Memory requests

Turbonomic can determine the CPU limits that mitigate the risk of throttling and allow your applications to perform unencumbered. This comes from adding CPU throttling as a dimension for the platform to analyze, letting it manage the tradeoffs that appear and keep application response times low.

On top of this, Turbonomic also generates actions to move your pods and scale your clusters, because, as we all know, this is a full-stack challenge. Customers can see these KPIs and ask, “Which of my services is being throttled?” They can also review each service’s history of CPU throttling, which correlates directly with application response time, giving them a valuable window into their system’s performance.

In a Kubernetes context, one of the primary benefits of Turbonomic is its ability to quickly identify and remediate the unintended consequences of a platform strategy, rather than having the customer redesign their multitenant platform strategy. Not only can Turbonomic monitor CPU throttling metrics, but the platform can also automatically right-size your CPU limits and bring throttling down to a manageable level.

Learn more about IBM Turbonomic

IBM Turbonomic can help simultaneously optimize your cloud spend and performance. You can continuously automate optimization actions in real time—without human intervention—that proactively deliver the most efficient use of compute, memory, storage and network resources to your apps at every layer of the stack.

 
Author
Cheuk Lam Software Engineer, IBM Blog