April 11, 2023 By Cheuk Lam 4 min read

Today, the majority of enterprise organizations running mission-critical applications on Kubernetes do so in multitenant environments. These environments rely on resource limits to regulate tenant workloads' consumption or to support chargebacks. Some developers also set CPU or GPU limits when benchmark testing their applications.

CPU throttling is the unintended consequence of this design: the rate of task scheduling on the physical CPU cores is artificially decreased, often resulting in an undesired increase in application response time. Take a look at this example:

Figure 1: CPU with 25% utilization.

In the above figure, the CPU usage of a container is only 25%, which makes it a natural candidate to resize down:

Figure 2: Huge spike in response time after resizing to ~50% CPU utilization.

But after we resize the container down (its CPU usage is now 50%, still not high), the response time quadruples.

What is CPU throttling?

So, what’s going on here? CPU throttling occurs when you configure a CPU limit on a container, which can inadvertently slow your application’s response time. Even if you have more than enough resources on your underlying node, your container workload will still be throttled if its limit is not configured properly. Furthermore, the performance impact of throttling can vary depending on the underlying physical processor (Intel vs. AMD vs. NVIDIA). The high response times correlate directly with periods of high CPU throttling, and this is exactly how Kubernetes was designed to work.

To bring some color to this, imagine you set a CPU limit of 200m (0.2 CPU). Kubernetes translates that limit into a CFS quota in the underlying Linux kernel: because the default enforcement period is 100ms, the container is only able to use 20ms of CPU time (a CPU time slice) in each period. If your task needs more CPU than that, it is throttled for the remainder of each period, and completing it takes far longer. An 80ms task, for example, gets spread across four periods and takes roughly 320ms of wall-clock time—4x longer.
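The arithmetic above can be sketched as a simplified model (it ignores scheduler details and assumes a perfectly CPU-bound task; the numbers are illustrative):

```python
# Simplified model: wall-clock time for a CPU-bound task that is
# allowed `quota_ms` of CPU time per `period_ms` enforcement period.

def wall_clock_ms(task_cpu_ms, quota_ms, period_ms=100):
    """Return approximate wall-clock time (ms) for a task needing
    task_cpu_ms of CPU under a CFS quota of quota_ms per period_ms."""
    full_periods = task_cpu_ms // quota_ms   # periods whose quota is fully used
    remainder = task_cpu_ms % quota_ms       # CPU consumed in the final period
    if remainder == 0:
        # Task finishes exactly when the quota runs out in its last period.
        return (full_periods - 1) * period_ms + quota_ms
    return full_periods * period_ms + remainder

# A 200m CPU limit -> 20ms quota per 100ms period.
# An 80ms task finishes in 320ms instead of 80ms: 4x slower.
print(wall_clock_ms(80, 20))  # -> 320
```

The same model shows why raising the limit helps: with a 400m limit (40ms quota), the same 80ms task finishes in 140ms.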

Based on this behavior, the application’s performance will suffer from the throttling-induced increase in response time, and you will begin troubleshooting to try to find the problem.

Troubleshooting CPU throttling in Kubernetes

If you are running a small deployment, you may be able to manually troubleshoot throttling.

First, you would identify the affected pod using tools like kubectl. Next, review the pod’s resource requests and limits to ensure they are set appropriately. Check for any resource-hungry processes running inside the container that may be causing the throttling and analyze the CPU utilization and limits.
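Once you have a suspect container, the kernel itself reports how often the CFS quota throttled it. As a minimal sketch (the file path and sample numbers are hypothetical, and the field names assume cgroup v2), you could compute the fraction of enforcement periods in which the container was throttled from its `cpu.stat` file:

```python
# Sketch: parse a cgroup v2 cpu.stat file and report what fraction of
# CFS enforcement periods ended with the container being throttled.
# On a node, the file lives under /sys/fs/cgroup/<container-cgroup>/cpu.stat.

def throttle_ratio(cpu_stat_text):
    """Return nr_throttled / nr_periods from cgroup v2 cpu.stat content."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        stats[key] = int(value)
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats["nr_throttled"] / periods

# Hypothetical cpu.stat content: throttled in 400 of 1,000 periods.
sample = (
    "usage_usec 5000000\n"
    "nr_periods 1000\n"
    "nr_throttled 400\n"
    "throttled_usec 900000"
)
print(round(throttle_ratio(sample), 2))  # -> 0.4
```

A sustained ratio well above zero is a strong signal that the container's CPU limit, not node capacity, is what's slowing it down.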

If CPU throttling persists, consider horizontal pod autoscaling to distribute the workload across more pods, or adjust the cluster’s node resources to meet the demands. Continuously monitor and fine-tune resource settings to optimize performance and prevent further throttling issues.

In a larger deployment, this approach is unlikely to scale or persist as you add more pods.

Using IBM Turbonomic to avoid CPU throttling in Kubernetes

CPU throttling is a key application performance metric because of the direct correlation between response time and throttling. The good news is that you can get this metric directly from Kubernetes and OpenShift.

To keep application response times low, avoid throttling, and maintain a high-performing application, you first need to understand that CPU core utilization alone won’t reveal when throttling is occurring. You need to account for all of the analytics and resource dependencies that impact application performance. IBM Turbonomic has built these considerations into its analytics platform.

When determining container rightsizing actions, Turbonomic continuously analyzes four dimensions:

  1. CPU limits
  2. CPU requests
  3. Memory limits
  4. Memory requests

Turbonomic can determine the CPU limits that will mitigate the risk of throttling and allow your applications to perform unencumbered. By adding CPU throttling as a dimension for the platform to analyze, Turbonomic can manage the tradeoffs that appear and keep application response times low.

On top of this, Turbonomic generates actions to move your pods and scale your clusters, because optimization is a full-stack challenge. Customers can see the KPIs and ask, “Which one of my services is being throttled?” They can also view the history of CPU throttling for each service (which, as noted, correlates directly with application response time), giving them valuable windows into their system’s performance.

In a Kubernetes context, one of Turbonomic’s primary benefits is its ability to quickly identify and remediate the unintended consequences of a platform strategy, rather than requiring the customer to redesign their multitenant platform strategy. Not only can Turbonomic monitor CPU throttling metrics, but it can also automatically right-size your CPU limits and bring throttling down to a manageable level.

Learn more about IBM Turbonomic

IBM Turbonomic can help simultaneously optimize your cloud spend and performance. You can continuously automate optimization actions in real time, without human intervention, that proactively deliver the most efficient use of compute, memory, storage and network resources to your apps at every layer of the stack.

Explore the interactive demo
