Kubernetes monitoring refers to the process of collecting and analyzing data related to the health, performance and cost characteristics of containerized applications running inside a Kubernetes cluster.
Kubernetes, also known as K8s or kube, is a container orchestration platform for scheduling and automating the deployment, management and scaling of containerized applications. Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation (CNCF).
Monitoring Kubernetes clusters allows administrators and users to track things like uptime, usage of cluster resources and the interaction between cluster components. Monitoring helps to quickly identify issues such as insufficient resources, failures, pods unable to start and nodes that can’t join the cluster.
Applications on Kubernetes delivered as cloud-native microservices have an order of magnitude more components communicating with each other. Distributed across multiple instances and even locations, modern architectures add new complexities to the day-to-day tasks of monitoring, alerting and troubleshooting.
Also, the ephemeral nature of containers can hamper troubleshooting efforts. Containers usually live as long as the process running inside them and disappear when that process dies. This is one of the most challenging parts of troubleshooting containers. When containers die or are rescheduled to alternative nodes, the details you need for incident response might no longer exist.
Although Kubernetes has built-in capabilities for monitoring clusters and alerting on the state of running pods, open source tools and third-party monitoring solutions help deliver full visibility into a K8s environment.
Proper Kubernetes monitoring delivers a range of benefits, from maintaining the stability and responsiveness of application performance to enhancing security and compliance.
By tracking and analyzing metrics such as CPU consumption, memory usage, network traffic and response times, it’s possible to identify areas of inefficiency, optimize resource allocation and fine-tune a Kubernetes infrastructure for optimal performance.
This can result in improved application responsiveness and a better user experience.
By monitoring resource usage metrics like CPU usage, memory consumption and network traffic, it’s possible to identify underutilized or overutilized Kubernetes nodes, optimize resource allocation and make informed decisions about infrastructure scaling.
This helps ensure that applications have the necessary resources to perform optimally, with the added benefit of reducing costs.
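The utilization check described above can be sketched in a few lines. This is a minimal illustration, assuming per-node CPU usage and capacity figures (in millicores) have already been collected, for example from the Kubernetes Metrics API; the thresholds are illustrative, not Kubernetes defaults.

```python
def classify_nodes(nodes, low=0.20, high=0.85):
    """Label each node under-, over-, or normally utilized by CPU fraction.

    `nodes` maps node name -> (cpu_used_millicores, cpu_capacity_millicores).
    The low/high thresholds are assumptions for illustration.
    """
    labels = {}
    for name, (used, capacity) in nodes.items():
        frac = used / capacity
        if frac < low:
            labels[name] = "underutilized"
        elif frac > high:
            labels[name] = "overutilized"
        else:
            labels[name] = "ok"
    return labels

usage = {
    "node-a": (150, 2000),   # 7.5% of capacity
    "node-b": (1900, 2000),  # 95% of capacity
    "node-c": (900, 2000),   # 45% of capacity
}
print(classify_nodes(usage))
# -> {'node-a': 'underutilized', 'node-b': 'overutilized', 'node-c': 'ok'}
```

A real implementation would feed this from `kubectl top nodes` or the Metrics API rather than hard-coded numbers.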
Alerts and notifications help proactively identify and address the root cause of Kubernetes issues before they lead to disruptions or downtime.
The results are better system stability and minimal impact of potential issues on applications and users.
Monitoring logs, events and metrics helps quickly identify and diagnose problems, such as pod failures, resource constraints, networking issues or application errors.
A faster debugging process reduces downtime and keeps applications available.
By analyzing historical data and monitoring trends in resource utilization, it’s possible to better forecast future resource needs, identify when more Kubernetes resources are required and plan for scaling clusters accordingly.
Ultimately, increased workload demands won’t lead to resource shortages.
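The trend-based forecasting mentioned above can be illustrated with a simple least-squares fit. This is a sketch only, assuming evenly spaced historical usage samples; production capacity planning would use a monitoring backend's own forecasting functions rather than hand-rolled regression.

```python
def forecast_usage(history, steps_ahead):
    """Fit a least-squares linear trend over evenly spaced samples and
    extrapolate `steps_ahead` intervals past the last sample."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, history)) / denom
    intercept = mean_y - slope * mean_x
    # Predicted value `steps_ahead` intervals beyond the final sample.
    return intercept + slope * (n - 1 + steps_ahead)

# Memory usage (GiB) growing ~2 GiB per interval; forecast 2 intervals out.
print(forecast_usage([10, 12, 14, 16], 2))  # -> 20.0
```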
Monitoring Kubernetes logs, network traffic and access patterns makes it easier to identify anomalous activities, potential breaches and unauthorized access attempts.
In addition, ensuring proper security controls and policies are in place and actively monitored helps maintain compliance with standards and regulations.
Full visibility into a Kubernetes stack requires collecting telemetry data on the containers that are constantly being created, destroyed and making calls to one another, while also collecting telemetry data on the Kubernetes cluster itself.
For cluster monitoring, there are several cluster-level metrics to follow, which help determine the overall health of a Kubernetes cluster.
Node functions: Monitoring whether all cluster nodes are working properly, and at what capacity, helps determine what cloud resources are needed to run the cluster.
Node availability: Monitoring how many cluster nodes are available helps determine what cloud resources are being paid for (if using a cloud provider like AWS or Microsoft Azure) and how the cluster is being used.
Node resource usage: Monitoring how the cluster as a whole is using resources (memory, CPU, bandwidth and disk usage) helps inform decisions about whether to increase or decrease the size or number of nodes in a cluster.
Number of pods running: Monitoring the number of running pods shows whether the available nodes are sufficient and, in the case of a node failure, whether the remaining nodes could handle the entire pod workload.
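The node-failure question in the last item can be framed as an "N-1" capacity check. The following is a simplified sketch, assuming capacity is measured purely in pod counts (a real scheduler also weighs CPU, memory and affinity constraints):

```python
def survives_node_failure(node_pods, node_capacity):
    """Check whether, for each possible single-node failure, the
    remaining nodes have enough spare pod slots to reschedule the
    failed node's pods.

    node_pods: node name -> pods currently running on it
    node_capacity: node name -> maximum pods the node can run
    """
    for failed in node_pods:
        displaced = node_pods[failed]
        spare = sum(node_capacity[n] - node_pods[n]
                    for n in node_pods if n != failed)
        if spare < displaced:
            return False
    return True

# Three nodes with headroom: any single failure can be absorbed.
print(survives_node_failure({"a": 8, "b": 5, "c": 4},
                            {"a": 10, "b": 10, "c": 10}))  # -> True
# Two nearly full nodes: losing either one strands pods.
print(survives_node_failure({"a": 9, "b": 9},
                            {"a": 10, "b": 10}))  # -> False
```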
Pod-level monitoring is necessary for ensuring individual pods within a Kubernetes cluster are functioning properly. This involves looking at three types of metrics: Kubernetes metrics, container metrics and application metrics.
1. Kubernetes metrics
Monitoring Kubernetes metrics helps ensure all pods in a Kubernetes deployment are running and healthy.
Number of pod instances: If the number of running instances of a pod is lower than the number expected, the cluster might be out of resources.
Pod status: Understanding if pods are running and how many are pending, failed or terminated provides visibility into their availability and stability.
Pod restarts: Monitoring the number of times a pod restarts indicates the stability of the application within the pod. Frequent restarts may point to an underlying problem such as crashes or resource constraints.
CPU usage: Monitoring the CPU consumption of a pod helps identify potential performance bottlenecks and ensure that pods have sufficient processing resources.
Memory usage: Monitoring the memory consumption of a pod helps detect memory leaks or excessive memory usage that could impact an application’s stability.
Network usage: Monitoring the bytes sent/received of a pod provides insights into its communication patterns and helps identify any networking issues.
Kubernetes metrics also include health checks, network data and the progress of in-flight deployments (that is, the number of instances migrated from an older version to a new one).
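The pod status and restart checks above can be combined into a simple screening function. This is a sketch over data such as `kubectl get pods` reports; the restart threshold is an illustrative assumption, not a Kubernetes default.

```python
def flag_pods(pods, restart_threshold=5):
    """Return the names of pods that are not Running or that restart
    too often.

    `pods` is a list of (name, phase, restart_count) tuples;
    `restart_threshold` is an assumed cutoff for "frequent" restarts.
    """
    flagged = []
    for name, phase, restarts in pods:
        if phase != "Running" or restarts >= restart_threshold:
            flagged.append(name)
    return flagged

sample = [
    ("web-1", "Running", 0),
    ("web-2", "CrashLoopBackOff", 7),  # failing phase and restarting
    ("db-1", "Pending", 0),            # not yet scheduled
    ("cache-1", "Running", 6),         # running but restarting often
]
print(flag_pods(sample))  # -> ['web-2', 'db-1', 'cache-1']
```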
2. Container metrics
Monitoring pod container metrics helps determine how close you are to the resource limits you’ve configured. These metrics also allow you to detect pods stuck in a CrashLoopBackOff.
CPU usage/throttling: Monitoring how running containers are consuming CPU helps identify those that are resource-intensive or creating bottlenecks, which might impact the overall performance of the cluster. Tracking CPU throttling metrics highlights if containers are being limited in their CPU usage due to resource constraints or misconfigurations.
Memory usage: Monitoring how running containers are consuming memory brings attention to issues such as memory leaks, excessive memory usage or insufficient memory allocation, which might be affecting container stability and overall system performance.
Network traffic/errors: Monitoring the network traffic of containers, as well as errors such as packet loss or connection failures, helps assess their communication patterns and surface excessive network usage or unexpected spikes in traffic.
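The CPU throttling signal mentioned above is commonly derived from cAdvisor's CFS counters (`container_cpu_cfs_throttled_periods_total` over `container_cpu_cfs_periods_total`). A minimal sketch of that ratio, assuming the two counters have already been scraped over some window, with an illustrative alert threshold:

```python
def throttled_containers(samples, threshold=0.25):
    """Return containers whose throttled-period ratio exceeds the
    threshold over the sampling window.

    `samples` maps container name -> (throttled_periods, total_periods),
    i.e. deltas of the cAdvisor CFS counters; `threshold` is an
    assumed cutoff for illustration.
    """
    noisy = {}
    for name, (throttled, total) in samples.items():
        ratio = throttled / total if total else 0.0
        if ratio > threshold:
            noisy[name] = round(ratio, 2)
    return noisy

window = {
    "api": (40, 100),    # throttled 40% of scheduling periods
    "worker": (5, 100),  # occasional throttling, below threshold
    "idle": (0, 0),      # no CFS periods recorded
}
print(throttled_containers(window))  # -> {'api': 0.4}
```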
3. Application metrics
Monitoring application metrics helps measure the performance and availability of applications running inside Kubernetes pods. These metrics are typically exposed by the application itself and relate to the business rules it addresses, such as latency, responsiveness, error rates and response times.
Below are several best practices to consider for successfully monitoring Kubernetes environments.
Use Kubernetes DaemonSets: DaemonSets allow you to deploy a monitoring agent on each node of your Kubernetes environment, covering all the resources on that node across the whole Kubernetes cluster. A DaemonSet helps ensure that every node, including newly added ones, runs the agent and is prepared to provide metrics.
Make smart use of labels: Creating a logical, consistent and coherent labeling schema makes it easier for DevOps teams to identify different components and helps deliver the most value from your Kubernetes monitoring.
Use Service Discovery: Service Discovery for Google Kubernetes Engine (GKE) allows you to continuously monitor your applications even if you don’t know where they are running. It automatically adapts metric collection to moving containers for a more complete understanding of a cluster’s health.
Set up alerts and notifications: Set up alerts for critical metrics, such as CPU or memory utilization, and get notified when those metrics reach certain thresholds. Monitoring tools with intelligent alerting help minimize alert fatigue by only sending you alerts for meaningful events or changes.
Monitor control plane elements: Regularly monitoring Kubernetes control plane elements, such as the API server, kube-dns, kubelet, kube-proxy, etcd and controller manager, helps ensure that cluster services are running smoothly.
Monitor user experience: Although not measured natively in the Kubernetes platform, monitoring the user experience can sometimes alert you to issues before they are discovered inside the cluster.
Use built-in and open source tools: Regardless of your use cases, take advantage of built-in Kubernetes monitoring tools, like Kubernetes Dashboard, cAdvisor (Container Advisor) and Kube-state-metrics, as well as popular open source tools, including Prometheus, Grafana, Jaeger and Elastic Stack (formerly ELK Stack). In addition to deploying, troubleshooting and monitoring, these tools deliver added functions like data visualizations and collecting and storing time-series metrics from various sources.
Use a SaaS-based K8s monitoring solution: To ease Kubernetes management, infrastructure development and costs, and to receive regular updates, use a SaaS-based monitoring system with built-in automation instead of an on-premises one.
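The alerting practice above, in particular reducing alert fatigue, can be sketched as a consecutive-breach rule: fire only when a metric stays above its threshold for several samples in a row, so one-off spikes are ignored. The threshold and streak length here are illustrative assumptions.

```python
def should_alert(samples, threshold=0.85, consecutive=3):
    """Fire only after `consecutive` samples in a row exceed the
    threshold, damping transient spikes that cause alert fatigue.

    `samples` is an ordered sequence of utilization fractions.
    """
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False

# A sustained breach fires; a brief spike does not.
print(should_alert([0.9, 0.7, 0.9, 0.9, 0.9]))  # -> True
print(should_alert([0.9, 0.9, 0.7, 0.9]))       # -> False
```

Production tools typically express this as an alert duration (for example, a "for:" clause in a Prometheus alerting rule) rather than hand-counting samples.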