Containers are the modern application standard, enabling rapid, iterative development and portability across infrastructure. Container platforms such as Kubernetes arose from the need to manage ever more distributed and dynamic cloud-native applications. Problem solved, right? Wrong.
As we’ve said before, with containers the artifacts change, but the challenge stays the same. How do you continuously assure performance, while maximizing efficiency and maintaining compliance with business and IT policies?
Kubernetes is just the start—albeit the start of great things. There’s a reason Kubernetes has an ecosystem surrounding it. There are a lot of people and organizations, including IBM, working to make it even better. Today we’ll zoom in on a specific example: Kubernetes pod rescheduling.
A pod’s life—without IBM Turbonomic
A Kubernetes pod is a group of one or more containers that share the same network identity, such as IP address and port space, and can communicate with each other without special configuration. A pod's life consists of being created, assigned a unique ID (UID) and scheduled to a node, where it stays until termination or deletion, generally carried out by a controller or manually by a person. Life is simple and sweet, and the pod is none the wiser about how healthy it really is. The end user, however, is.
Slow deaths of a pod
Application performance is a full-stack challenge and only as good as its weakest link. Resource contention can take many forms, affecting pods, nodes and, of course, end users. In a separate post, we'll discuss in more depth why full-stack control, from applications through the infrastructure, is an absolute necessity. For this post, however, we'll focus on the pod and node layers.
When it comes to applications, slow is the new down. Since we’re having a bit of fun with analogies today—admittedly like most days—let’s agree that slow, poorly performing pods might as well be dead. So, left to their own devices, pods can die a number of slow deaths—or never come to life.
Death by “noisy neighbor”: Pods on the same node peak together and cause resource contention.
Death by CPU starvation: Nodes are unable to provide pods with the CPU they need to perform.
Long-pending pod: Never mind death, think prelife purgatory. This pod never even sees the light of day because it can’t be scheduled due to resource fragmentation.
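To make the long-pending case concrete, here is a minimal sketch of resource fragmentation. The node names, capacities and the pod's CPU request are made-up illustrative values, and the check is a deliberate simplification (CPU requests only), not how the Kubernetes scheduler or Turbonomic is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity_m: int   # allocatable CPU in millicores (hypothetical value)
    cpu_requested_m: int  # CPU already requested by pods on this node

    @property
    def cpu_free_m(self) -> int:
        return self.cpu_capacity_m - self.cpu_requested_m


def nodes_that_fit(nodes, pod_request_m):
    """Return names of nodes with enough free CPU for the whole pod."""
    return [n.name for n in nodes if n.cpu_free_m >= pod_request_m]


# Hypothetical cluster: every node has some room, but none has enough.
nodes = [
    Node("node-a", 4000, 3000),
    Node("node-b", 4000, 3200),
    Node("node-c", 4000, 3100),
]
pod_request_m = 1500

total_free_m = sum(n.cpu_free_m for n in nodes)
fits = nodes_that_fit(nodes, pod_request_m)

print(f"total free CPU: {total_free_m}m, pod requests: {pod_request_m}m")
print(f"nodes that can host the pod: {fits or 'none (pod stays Pending)'}")
```

The cluster as a whole has more free CPU than the pod requests, yet the pod cannot be placed because a pod must fit on a single node. Moving one existing pod off a node could consolidate enough free capacity to schedule it.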
If you think this information is morbid, think about the customers who are about to take their business elsewhere because your application isn’t cutting it.
Pod reincarnation (also known as rescheduling)
On to happier subjects: let's talk about pod reincarnation with IBM® Turbonomic® software. How do you avoid performance degradation in these scenarios? Reschedule the pod. Without Turbonomic, poorly performing pods affect end users because once a pod is scheduled, it can't be moved to another node. Performance degradation occurs, that pod "dies," and then a new pod is spun up to service that demand on whatever node is available to it. It's reactive and impacts the end-user experience. But what if pods weren't bound to nodes for life? What if a pod could start a second life on a node that's better for meeting service levels before performance degrades?
That's exactly what Turbonomic software does. The platform continuously analyzes the changing resource demands of all the pods, along with the available capacity and constraints of the nodes. It determines which node each pod should reside on, helping to ensure that pods always get exactly the resources they need, no more and no less, while maintaining compliance with policies such as labels and selectors, affinity and anti-affinity, and taints and tolerations. When the Turbonomic platform reschedules a pod, it does so to prevent performance degradation, spinning up the new pod before terminating the old pod, so service is never disrupted.
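The idea described above can be sketched in a few lines. This is an illustration only, not Turbonomic's analysis engine: the `Node` and `Pod` classes, the `pick_target` and `reschedule` helpers, the least-utilized selection rule and the string-based taints are all simplifying assumptions, and the "steps" are plain strings rather than real Kubernetes API calls.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_capacity_m: int  # millicores (hypothetical value)
    cpu_used_m: int
    taints: set = field(default_factory=set)  # simplified: taints as plain strings


@dataclass
class Pod:
    name: str
    cpu_demand_m: int
    node: str  # name of the node the pod currently runs on
    tolerations: set = field(default_factory=set)


def pick_target(pod, nodes):
    """Return the node that would be least utilized after the move, or None."""
    candidates = [
        n for n in nodes
        if n.name != pod.node
        and n.taints <= pod.tolerations  # every taint must be tolerated
        and n.cpu_capacity_m - n.cpu_used_m >= pod.cpu_demand_m
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda n: (n.cpu_used_m + pod.cpu_demand_m) / n.cpu_capacity_m,
    )


def reschedule(pod, nodes):
    """Make-before-break: claim capacity on the target, then release the source."""
    target = pick_target(pod, nodes)
    if target is None:
        return []
    source = next(n for n in nodes if n.name == pod.node)
    steps = [f"start replacement for {pod.name} on {target.name}"]
    target.cpu_used_m += pod.cpu_demand_m
    steps.append(f"terminate {pod.name} on {source.name}")
    source.cpu_used_m -= pod.cpu_demand_m
    pod.node = target.name
    return steps


# Hypothetical cluster: node-a is congested, node-b is tainted, node-c has room.
nodes = [
    Node("node-a", 4000, 3900),
    Node("node-b", 4000, 1000, taints={"gpu-only"}),
    Node("node-c", 4000, 2000),
]
pod = Pod("web-1", cpu_demand_m=800, node="node-a")

steps = reschedule(pod, nodes)
for step in steps:
    print(step)
```

In this made-up cluster, node-b has the most free CPU but is skipped because the pod does not tolerate its taint, so the pod lands on node-c. The replacement is started before the old pod is terminated, which mirrors the ordering the text describes.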
Note: Turbonomic software delivers full-stack automation, which includes node placement, sizing and provisioning, but we’ll save that discussion for another post.