Kubernetes CPU throttling: The silent killer of response time
11 April 2023
4 min read

Today, the majority of enterprise organizations running mission-critical applications on Kubernetes are doing so in multitenant environments. These multitenant environments rely on limits to regulate tenant workloads’ consumption or to enable chargebacks. Some developers also choose to set CPU or GPU limits when benchmark testing their applications.

CPU throttling—whereby the rate of task scheduling on the physical CPU cores is inadvertently decreased, often resulting in an undesired increase in application response time—is the unintended consequence of this design. Take a look at this example:

In the above figure, the CPU usage of a container is only 25%, which makes it a natural candidate to resize down:

But after we resize the container down (its CPU usage is now 50%, still not high), the response time quadruples.

What is CPU throttling?

So, what’s going on here? CPU throttling occurs when a container hits the CPU limit you configured for it, which can inadvertently slow your application’s response time. Even if you have more than enough resources on your underlying node, your container workload will still be throttled if its limit is not configured properly. Furthermore, the performance impact of throttling can vary depending on the underlying physical processor (e.g., Intel vs. AMD). The high response times are directly correlated to periods of high CPU throttling, and this is exactly how Kubernetes was designed to work.

To bring some color to this, imagine you set a CPU limit of 0.2 CPU (200 millicores) and that limit is translated into a cgroup quota in the underlying Linux system. The container is only able to use 20ms of CPU time per scheduling period (called a CPU time slice), because the default CFS enforcement period is 100ms. If your task needs more CPU time than that, say 80ms, it runs for 20ms, then sits throttled for the remaining 80ms of each period, finishing after roughly 320ms: 4x longer than the task would take unthrottled.
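To make the arithmetic concrete, here is a minimal sketch (not kernel or Kubernetes code) that models how a Linux CFS quota stretches a single-threaded task's wall-clock completion time; the function name and the simplifying assumptions (one thread, work starts at the top of a period) are our own:

```python
def wall_clock_ms(task_cpu_ms: float, quota_ms: float, period_ms: float = 100.0) -> float:
    """Estimate wall-clock time for a single-threaded task under a CFS quota.

    Each period, the task runs for quota_ms, then is throttled for the rest
    of the period. The final period ends as soon as the task finishes.
    """
    full_periods = int(task_cpu_ms // quota_ms)
    remainder = task_cpu_ms - full_periods * quota_ms
    if remainder == 0:
        # The task finishes exactly at the end of its last quota slice,
        # so it does not wait out the final throttled window.
        return (full_periods - 1) * period_ms + quota_ms if full_periods else 0.0
    return full_periods * period_ms + remainder

# A 0.2-CPU limit gives a 20ms quota per 100ms period. An 80ms task then
# takes 320ms of wall-clock time: 4x slower than unthrottled.
print(wall_clock_ms(80, 20))  # 320.0
```

Note how the slowdown is driven entirely by the ratio of quota to period, not by how busy the node is.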

Based on this behavior, the application’s performance will suffer due to the throttling-induced increase in response time, and you will begin troubleshooting to find the problem.

Troubleshooting CPU throttling in Kubernetes

If you are running a small deployment, you may be able to manually troubleshoot throttling.

First, you would identify the affected pod using tools like kubectl. Next, review the pod’s resource requests and limits to ensure they are set appropriately. Check for any resource-hungry processes running inside the container that may be causing the throttling, and analyze the CPU utilization against the limits.
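The requests and limits you would review live in the pod spec. A minimal illustrative manifest (the names and image are hypothetical) looks like this; note that the 200m limit is exactly the 20ms-per-100ms quota from the earlier example:

```yaml
# Illustrative pod spec; names and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: checkout-service
spec:
  containers:
    - name: app
      image: registry.example.com/checkout:1.0
      resources:
        requests:
          cpu: 200m        # scheduling guarantee
          memory: 256Mi
        limits:
          cpu: 200m        # hard cap: exceeding this triggers CFS throttling
          memory: 256Mi
```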

If CPU throttling persists, consider horizontal pod autoscaling to distribute the workload across more pods, or adjust the cluster’s node resources to meet the demands. Continuously monitor and fine-tune resource settings to optimize performance and prevent further throttling issues.
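As a sketch of the horizontal scaling option, a HorizontalPodAutoscaler (autoscaling/v2 API) can spread the load across more replicas as average CPU utilization climbs; the target names and thresholds below are illustrative:

```yaml
# Illustrative HPA; target names and thresholds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```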

In a larger deployment, this manual approach is unlikely to scale as you add more pods.

Using IBM Turbonomic to avoid CPU throttling in Kubernetes

CPU throttling is a key application performance metric due to the direct correlation between response time and CPU throttling. This is great news for you, as you can get this metric directly from Kubernetes and OpenShift.
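If you expose cAdvisor metrics via Prometheus (a common setup, though your scrape configuration may differ), one way to surface this metric is the fraction of CFS periods in which each container was throttled:

```promql
# Fraction of CFS periods in which each container was throttled
# over the last 5 minutes, from the standard cAdvisor counters.
sum by (namespace, pod, container) (
  rate(container_cpu_cfs_throttled_periods_total[5m])
)
/
sum by (namespace, pod, container) (
  rate(container_cpu_cfs_periods_total[5m])
)
```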

To keep application response times low, avoid CPU throttling and maintain a high-performance application, you first need to understand that CPU core utilization alone will not reveal when throttling is occurring. You need to account for all of the analytics and resource dependencies that impact application performance. IBM Turbonomic has built these considerations into its analytics platform.

When determining container rightsizing actions, Turbonomic continuously analyzes four dimensions:

  1. CPU limits
  2. CPU requests
  3. Memory limits
  4. Memory requests

Turbonomic can determine the CPU limits that mitigate the risk of throttling and allow your applications to perform unencumbered. This comes from adding CPU throttling as a dimension for the platform to analyze, letting it manage the tradeoffs that appear and keep application response times low.

On top of this, Turbonomic also generates actions to move your pods and scale your clusters, because, as we all know, this is a full-stack challenge. Customers can see these KPIs and ask, “Which of my services is being throttled?” They can also review each service’s history of CPU throttling, which correlates directly with application response time, giving them a valuable window into their system’s performance.

In a Kubernetes context, one of the primary benefits of Turbonomic is its ability to quickly identify and remediate the unintended consequences of a platform strategy, rather than having the customer redesign their multitenant platform strategy. Not only can Turbonomic monitor CPU throttling metrics, but the platform can also automatically right-size your CPU limits and bring throttling down to a manageable level.

Learn more about IBM Turbonomic

IBM Turbonomic can help simultaneously optimize your cloud spend and performance. You can continuously automate optimization actions in real time—without human intervention—that proactively deliver the most efficient use of compute, memory, storage and network resources to your apps at every layer of the stack.

 
Author
Cheuk Lam Software Engineer, IBM Blog