Containers are the modern application standard, enabling rapid, iterative development and portability across infrastructures. Container platforms—like Kubernetes—have risen out of the need to manage ever more distributed and dynamic cloud-native applications. Problem solved, right? Wrong.

As we’ve said before, with containers the artifacts change, but the challenge stays the same. How do you continuously assure performance, while maximizing efficiency and maintaining compliance with business and IT policies?

Kubernetes is just the start—albeit the start of great things. There’s a reason Kubernetes has an ecosystem surrounding it. There are a lot of people and organizations, including IBM, working to make it even better. Today we’ll zoom in on a specific example: Kubernetes pod rescheduling.

A pod’s life—without IBM Turbonomic

A Kubernetes pod is a group of one or more containers that share resources such as an IP address and port space, and can communicate with each other without special configuration. The life of a pod consists of being created, assigned a unique ID (UID) and scheduled to a node, where it remains until termination or deletion—generally executed through a controller or manually by a person. Life is simple, sweet, and the pod is none the wiser for how healthy it really is—but the end user is.
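As a minimal sketch of that definition, the manifest below declares a pod with two containers that share the pod's network namespace, so the sidecar can reach the app on `localhost` with no extra configuration. The names and images are illustrative, not from the original post:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-pod            # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: sidecar
      image: busybox:1.36
      # Shares the pod's IP and port space, so localhost:80 reaches "app"
      command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 5; done"]
```

Applying this with `kubectl apply -f pod.yaml` creates the pod, assigns it a UID, and hands it to the scheduler—the lifecycle described above.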

Slow deaths of a pod

Application performance is a full-stack challenge and only as good as its weakest link. With that said, resource contention can take many forms, affecting pods and nodes— and, of course, end users. We’ll go into a deeper discussion in a separate post of why full-stack control—from applications through the infrastructure—is an absolute necessity. For this post, however, we’ll focus on the pod and node layers.

When it comes to applications, slow is the new down. Since we’re having a bit of fun with analogies today—admittedly like most days—let’s agree that slow, poorly performing pods might as well be dead. So, left to their own devices, pods can die a number of slow deaths—or never come to life.

  • Death by “noisy neighbor”: Pods on the same node peak together and cause resource contention.
  • Death by CPU starvation: Nodes are unable to provide pods with the CPU they need to perform.
  • Long-pending pod: Never mind death, think prelife purgatory. This pod never even sees the light of day because it can’t be scheduled due to resource fragmentation.
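These failure modes surface directly in pod specs and scheduler events. As a hedged illustration of the long-pending case, a pod whose CPU request exceeds the free capacity of every node stays in `Pending`; the request values here are made up for the example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: big-batch          # illustrative name
spec:
  containers:
    - name: worker
      image: python:3.12-slim
      resources:
        requests:
          cpu: "6"         # if no node has 6 free cores, the pod never schedules
          memory: "4Gi"
        limits:
          cpu: "6"
          memory: "4Gi"
```

On a cluster where capacity is fragmented across nodes, `kubectl describe pod big-batch` would show a `FailedScheduling` event with a reason like "Insufficient cpu"—the prelife purgatory described above. Conversely, pods that omit requests entirely are the classic noisy neighbors: nothing stops them from peaking together on the same node.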

If you think this information is morbid, think about the customers who are about to take their business elsewhere because your application isn’t cutting it.

Pod reincarnation (also known as rescheduling)

On to happier subjects: let’s talk about pod reincarnation with IBM® Turbonomic® software. How do you avoid performance degradation in these scenarios? Reschedule the pod. Without Turbonomic, poorly performing pods affect end users because once a pod is scheduled, it can’t be moved to another node. Performance degrades, the pod “dies,” and a new pod is spun up to service that demand on whatever node is available. It’s reactive, and it impacts the end-user experience. But what if pods weren’t bound to nodes for life? What if a pod could start a second life on a node that’s better for meeting service levels before performance degrades?

That’s exactly what Turbonomic software does. The platform continuously analyzes the changing resource demands of all the pods, available capacity and constraints of the nodes. It determines which nodes a pod should reside on, helping to ensure they always get exactly the amount of resources they need—no more, no less—while maintaining compliance with policies, such as label and selector, affinity and antiaffinity, taint and toleration, and so on. When the Turbonomic platform reschedules a pod, it does so to prevent performance degradation, spinning up the new pod before terminating the old pod, so service is never disrupted.
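Any rescheduling decision has to honor the same placement constraints the Kubernetes scheduler enforces. The sketch below shows the three kinds of policies named above—label/selector, anti-affinity and taint/toleration—declared on a single pod; all label keys, values and images are illustrative assumptions, not Turbonomic configuration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api                # illustrative name
  labels:
    app: payments
spec:
  nodeSelector:
    disktype: ssd                   # label/selector: only nodes labeled disktype=ssd
  affinity:
    podAntiAffinity:                # anti-affinity: keep replicas on different nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: payments
          topologyKey: kubernetes.io/hostname
  tolerations:                      # toleration: permit nodes tainted dedicated=payments
    - key: "dedicated"
      operator: "Equal"
      value: "payments"
      effect: "NoSchedule"
  containers:
    - name: api
      image: registry.example.com/payments:1.0   # illustrative image
```

A pod carrying constraints like these can only ever land on a subset of nodes—which is exactly why placement decisions must evaluate constraints and resource demand together.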

Note: Turbonomic software delivers full-stack automation, which includes node placement, sizing and provisioning, but we’ll save that discussion for another post.

Request a demo today to see how pod rescheduling can help ensure maximum efficiency and performance while maintaining compliance with business and IT policies.

