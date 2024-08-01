Today, the majority of enterprise organizations running mission-critical applications on Kubernetes do so in multitenant environments. These environments rely on resource limits to regulate how much each tenant workload can consume, or use limits as the basis for chargebacks. Some developers also set CPU or GPU limits when benchmarking their applications.

CPU throttling is an unintended consequence of setting CPU limits: the rate at which a container's tasks are scheduled on the physical CPU cores is reduced, often resulting in an undesired increase in application response time. Take a look at this example:

In the figure above, the container's CPU usage is only 25%, which makes it a natural candidate for resizing down:

But after we resized the container down (its CPU usage is now 50%, still not high), the response time quadrupled.
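To make the effect concrete, here is a minimal Python sketch of how a CPU limit can stretch response time even when average usage looks moderate. It assumes the limit is enforced the way Kubernetes normally does it on Linux, through CFS bandwidth control: the limit becomes a quota of CPU time per enforcement period (100 ms by default), and a container that exhausts its quota is paused until the next period begins. The function names, the single-threaded request, and the numbers (250 ms of CPU work per request, one request per second) are hypothetical and chosen for illustration; they are not taken from the figures in this example.

```python
def response_time_ms(cpu_needed_ms: int, limit_millicores: int,
                     period_ms: int = 100) -> int:
    """Estimate wall-clock time for one single-threaded request.

    Simplifying assumptions (hypothetical model, not a measurement):
    the request starts at the beginning of an enforcement period and
    nothing else in the container consumes CPU.
    """
    quota_ms = limit_millicores * period_ms // 1000
    if quota_ms <= 0:
        raise ValueError("limit too small for this period")
    # One thread cannot run for more than the period length per period.
    runnable_ms = min(quota_ms, period_ms)
    # Number of periods in which the quota runs out before the work is done.
    throttled_periods = -(-cpu_needed_ms // runnable_ms) - 1
    # CPU time still needed in the final period.
    last_slice_ms = cpu_needed_ms - throttled_periods * runnable_ms
    return throttled_periods * period_ms + last_slice_ms


def usage_percent(cpu_used_ms: int, limit_millicores: int,
                  window_ms: int = 1000) -> float:
    """Average CPU usage over a window, as a percentage of the limit."""
    return 100 * cpu_used_ms * 1000 / (limit_millicores * window_ms)


if __name__ == "__main__":
    for limit in (1000, 500):  # 1 CPU, then 0.5 CPU
        print(f"limit={limit}m "
              f"usage={usage_percent(250, limit):.0f}% "
              f"response={response_time_ms(250, limit)} ms")
```

Under these assumptions, halving the limit from 1 CPU to 0.5 CPU raises average usage from 25% to 50%, yet the request is paused in four consecutive periods and takes 450 ms instead of 250 ms. Usage averaged over a second hides the fact that the quota is exhausted within each 100 ms period.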