What is Kubernetes monitoring?

Kubernetes monitoring refers to the process of collecting and analyzing data related to the health, performance and cost characteristics of containerized applications running inside a Kubernetes cluster.

Kubernetes, also known as K8s or kube, is a container orchestration platform for scheduling and automating the deployment, management and scaling of containerized applications. Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation (CNCF).

Monitoring Kubernetes clusters allows administrators and users to track things like uptime, usage of cluster resources and the interaction between cluster components. Monitoring helps to quickly identify issues such as insufficient resources, failures, pods unable to start and nodes that can’t join the cluster.

Applications delivered on Kubernetes as cloud-native microservices have an order of magnitude more components communicating with one another than traditional monolithic applications. Distributed across multiple instances, and often multiple locations, these modern architectures add new complexity to the day-to-day tasks of monitoring, alerting and troubleshooting.

Also, the ephemeral nature of containers can hamper troubleshooting efforts. Containers usually live as long as the process running inside them and disappear when that process dies. This is one of the most challenging parts of troubleshooting containers. When containers die or are rescheduled to alternative nodes, the details you need for incident response might no longer exist.

Although Kubernetes provides some built-in monitoring capabilities, such as basic resource metrics and component health checks, open source tools and third-party monitoring solutions are typically needed to deliver full visibility into a K8s environment.

Kubernetes monitoring benefits

Proper Kubernetes monitoring delivers a range of benefits, from maintaining the stability and responsiveness of application performance to enhancing security and compliance.

Performance optimization

By tracking and analyzing metrics such as CPU consumption, memory usage, network traffic and response times, it’s possible to identify areas of inefficiency, optimize resource allocation and fine-tune a Kubernetes infrastructure for optimal performance.

This can result in improved application responsiveness and a better user experience.

Efficient resource usage

By monitoring resource usage metrics like CPU usage, memory consumption and network traffic, it’s possible to identify underutilized or overutilized Kubernetes nodes, optimize resource allocation and make informed decisions about infrastructure scaling.

This helps ensure that applications have the necessary resources to perform optimally, with the added benefit of reducing costs.
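
As an illustration, the sketch below uses the official Python kubernetes client to compare live node usage (from the metrics.k8s.io API, which assumes the metrics-server add-on is installed) against each node's allocatable CPU. The thresholds are arbitrary examples, not recommended values.

```python
# Sketch: flag under- and overutilized nodes by comparing usage with allocatable capacity.
# Assumes the metrics-server add-on is installed (it serves the metrics.k8s.io API)
# and a kubeconfig is available locally.
from kubernetes import client, config

def parse_cpu(value: str) -> float:
    """Convert Kubernetes CPU quantities ('250m', '2', '1500000000n') to cores."""
    if value.endswith("n"):
        return int(value[:-1]) / 1e9
    if value.endswith("m"):
        return int(value[:-1]) / 1e3
    return float(value)

config.load_kube_config()
core = client.CoreV1Api()
metrics = client.CustomObjectsApi()

# Allocatable CPU per node, from the core API.
allocatable = {
    node.metadata.name: parse_cpu(node.status.allocatable["cpu"])
    for node in core.list_node().items
}

# Live usage per node, from the metrics.k8s.io API served by metrics-server.
usage = metrics.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
for item in usage["items"]:
    name = item["metadata"]["name"]
    pct = 100 * parse_cpu(item["usage"]["cpu"]) / allocatable[name]
    label = "overutilized" if pct > 80 else "underutilized" if pct < 20 else "ok"
    print(f"{name}: {pct:.1f}% of allocatable CPU in use ({label})")
```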

Proactive issue detection

Alerts and notifications help proactively identify and address the root cause of Kubernetes issues before they lead to disruptions or downtime.

The result is better system stability and minimal impact on applications and users.

Rapid troubleshooting and debugging

Monitoring logs, events and metrics helps quickly identify and diagnose problems, such as pod failures, resource constraints, networking issues or application errors.

Speeding up the debugging process reduces downtime and keeps applications available.
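
For example, a first-pass triage of a failing pod might pull its recent events and log tail with the Python kubernetes client. The sketch below shows the idea; the pod and namespace names are placeholders.

```python
# Sketch: gather recent events and the log tail for a suspect pod.
# POD and NAMESPACE are placeholders; assumes a local kubeconfig.
from kubernetes import client, config

POD = "my-app-7d4b9c8f5-abcde"   # hypothetical pod name
NAMESPACE = "default"

config.load_kube_config()
core = client.CoreV1Api()

# Events often explain scheduling failures, image pull errors or OOM kills.
events = core.list_namespaced_event(
    NAMESPACE, field_selector=f"involvedObject.name={POD}"
)
for e in events.items:
    print(f"[{e.type}] {e.reason}: {e.message}")

# The log tail surfaces application-level errors; passing previous=True would
# fetch logs from the last terminated container instance after a crash.
print(core.read_namespaced_pod_log(POD, NAMESPACE, tail_lines=50))
```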

Capacity planning and scalability

By analyzing historical data and monitoring trends in resource utilization, it’s possible to better forecast future resource needs, identify when more Kubernetes resources are required and plan for scaling clusters accordingly.

Ultimately, increased workload demands are far less likely to lead to resource shortages.
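
If historical metrics are stored in Prometheus, for instance, a range query over the last 30 days can reveal the trend in cluster CPU usage. The sketch below uses Prometheus's standard HTTP API; the endpoint URL and the PromQL query are illustrative assumptions.

```python
# Sketch: pull 30 days of cluster CPU usage from Prometheus to spot growth trends.
# PROM_URL and the PromQL query are illustrative; adjust to your environment.
import time
import requests

PROM_URL = "http://prometheus.example.internal:9090"   # hypothetical endpoint
QUERY = 'sum(rate(container_cpu_usage_seconds_total[5m]))'

end = time.time()
start = end - 30 * 24 * 3600
resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={"query": QUERY, "start": start, "end": end, "step": "6h"},
    timeout=30,
)
resp.raise_for_status()

# Each series is a list of [timestamp, "value"] pairs; summarize the trend.
for series in resp.json()["data"]["result"]:
    values = [float(v) for _, v in series["values"]]
    if values:
        print(f"first: {values[0]:.2f} cores, last: {values[-1]:.2f} cores, "
              f"peak: {max(values):.2f} cores")
```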

Enhanced security and compliance

Monitoring Kubernetes logs, network traffic and access patterns makes it easier to identify anomalous activities, potential breaches and unauthorized access attempts.

In addition, ensuring proper security controls and policies are in place and actively monitored helps maintain compliance with standards and regulations.

What K8s metrics should be monitored?

Full visibility into a Kubernetes stack requires collecting telemetry data both on the containers that are constantly being created, destroyed and calling one another, and on the Kubernetes cluster itself.

Cluster-level monitoring

For cluster monitoring, several cluster-level metrics help determine the overall health of a Kubernetes cluster; a short sketch of how some of them can be collected appears after this list.

Node functions: Monitoring whether all cluster nodes are working properly and at what capacity helps determine what cloud resources are needed to run the cluster.

Node availability: Monitoring how many cluster nodes are available helps determine what cloud resources are being paid for (if using a cloud provider such as AWS or Microsoft Azure) and how the cluster is being used.

Node resource usage: Monitoring how the cluster as a whole is using resources (memory, CPU, bandwidth and disk usage) helps inform decisions about whether to increase or decrease the size or number of nodes in a cluster.

Number of pods running: Monitoring running pods shows whether the number of available nodes is sufficient and whether, in the case of a node failure, the remaining nodes could handle the entire pod workload.
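
As a rough sketch of how these cluster-level signals can be pulled from the Kubernetes API with the official Python client (assuming a local kubeconfig):

```python
# Sketch: basic cluster-level health signals via the Kubernetes API.
# Assumes a local kubeconfig; output formatting is illustrative.
from collections import Counter
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Node availability: how many nodes report a Ready condition of "True".
nodes = core.list_node().items
ready = sum(
    1
    for n in nodes
    for c in (n.status.conditions or [])
    if c.type == "Ready" and c.status == "True"
)
print(f"Nodes ready: {ready}/{len(nodes)}")

# Node capacity: allocatable CPU and memory per node.
for n in nodes:
    alloc = n.status.allocatable
    print(f"{n.metadata.name}: cpu={alloc['cpu']} memory={alloc['memory']}")

# Number of pods running, and how they are spread across nodes.
pods = core.list_pod_for_all_namespaces().items
per_node = Counter(p.spec.node_name for p in pods if p.status.phase == "Running")
print(f"Running pods: {sum(per_node.values())}, by node: {dict(per_node)}")
```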

Pod-level monitoring

Pod-level monitoring is necessary for ensuring individual pods within a Kubernetes cluster are functioning properly. This involves looking at three types of metrics: Kubernetes metrics, container metrics and application metrics.

1. Kubernetes metrics

Monitoring Kubernetes metrics helps ensure that all pods in a Kubernetes deployment are running and healthy; a short collection sketch follows this list.

Number of pod instances: If the number of pod instances currently running is lower than expected, the cluster might be out of resources.

Pod status: Understanding if pods are running and how many are pending, failed or terminated provides visibility into their availability and stability.

Pod restarts: Monitoring the number of times a pod restarts indicates the stability of the application within the pod. Frequent restarts can point to an underlying problem such as application crashes or resource constraints.

CPU usage: Monitoring the CPU consumption of a pod helps identify potential performance bottlenecks and ensure that pods have sufficient processing resources.

Memory usage: Monitoring the memory consumption of a pod helps detect memory leaks or excessive memory usage that could impact an application’s stability.

Network usage: Monitoring the bytes sent/received of a pod provides insights into its communication patterns and helps identify any networking issues.

Kubernetes metrics also include health checks, network data and the progress of an in-flight deployment (that is, how many instances have been moved from an older version to a new one).
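
A minimal sketch of collecting several of these pod-level signals with the Python client (pod status counts, restart counts and ready-versus-desired replicas) might look like the following; the restart threshold is an arbitrary example.

```python
# Sketch: pod status, restart counts and expected-vs-actual replicas.
# Assumes a local kubeconfig; the restart threshold (5) is arbitrary.
from collections import Counter
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
apps = client.AppsV1Api()

# Pod status: how many pods are Running, Pending, Failed or Succeeded.
pods = core.list_pod_for_all_namespaces().items
print("Pod phases:", dict(Counter(p.status.phase for p in pods)))

# Pod restarts: flag pods whose containers restart frequently.
for p in pods:
    restarts = sum(cs.restart_count for cs in (p.status.container_statuses or []))
    if restarts > 5:
        print(f"{p.metadata.namespace}/{p.metadata.name}: {restarts} restarts")

# Number of pod instances: ready replicas versus the desired replica count.
for d in apps.list_deployment_for_all_namespaces().items:
    desired = d.spec.replicas or 0
    ready = d.status.ready_replicas or 0
    if ready < desired:
        print(f"{d.metadata.namespace}/{d.metadata.name}: {ready}/{desired} ready")
```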

2. Container metrics

Monitoring pod container metrics helps determine how close containers are to the resource limits you’ve configured and lets you detect pods stuck in a CrashLoopBackOff; a sketch covering both checks follows this list.

CPU usage/throttling: Monitoring how running containers are consuming CPU helps identify those that are resource-intensive or creating bottlenecks, which might impact the overall performance of the cluster. Tracking CPU throttling metrics highlights if containers are being limited in their CPU usage due to resource constraints or misconfigurations.

Memory usage: Monitoring how running containers are consuming memory brings attention to issues such as memory leaks, excessive memory usage or insufficient memory allocation, which might be affecting container stability and overall system performance.

Network traffic/errors: Monitoring the network traffic of containers, as well as errors such as packet loss or connection failures, helps assess their communication patterns and detect excessive network usage or unexpected spikes in traffic.
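
The sketch below illustrates two of these checks with the Python client: detecting containers stuck in CrashLoopBackOff, and comparing live container CPU usage (from the metrics.k8s.io API, which requires metrics-server) against the configured CPU limits. The 90% threshold is an arbitrary example.

```python
# Sketch: detect CrashLoopBackOff containers and containers near their CPU limit.
# Assumes metrics-server and a local kubeconfig; the 90% threshold is arbitrary.
from kubernetes import client, config

def parse_cpu(value: str) -> float:
    """Convert Kubernetes CPU quantities ('250m', '2', '1500000000n') to cores."""
    if value.endswith("n"):
        return int(value[:-1]) / 1e9
    if value.endswith("m"):
        return int(value[:-1]) / 1e3
    return float(value)

config.load_kube_config()
core = client.CoreV1Api()
metrics = client.CustomObjectsApi()

pods = core.list_pod_for_all_namespaces().items

# Containers stuck in CrashLoopBackOff show up in the pod's container statuses.
for pod in pods:
    for cs in pod.status.container_statuses or []:
        waiting = cs.state.waiting
        if waiting and waiting.reason == "CrashLoopBackOff":
            print(f"CrashLoopBackOff: {pod.metadata.namespace}/{pod.metadata.name}/{cs.name}")

# Collect configured CPU limits per container, keyed by namespace/pod/container.
limits = {}
for pod in pods:
    for c in pod.spec.containers:
        if c.resources and c.resources.limits and "cpu" in c.resources.limits:
            key = (pod.metadata.namespace, pod.metadata.name, c.name)
            limits[key] = parse_cpu(c.resources.limits["cpu"])

# Compare live usage (from metrics-server) against those limits.
pod_metrics = metrics.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "pods")
for item in pod_metrics["items"]:
    ns, name = item["metadata"]["namespace"], item["metadata"]["name"]
    for c in item["containers"]:
        limit = limits.get((ns, name, c["name"]))
        if limit and parse_cpu(c["usage"]["cpu"]) / limit > 0.9:
            print(f"Near CPU limit: {ns}/{name}/{c['name']}")
```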

3. Application metrics

Monitoring application metrics helps measure the performance and availability of applications running inside Kubernetes pods. These metrics are typically exposed by the application itself and relate to the business logic it implements, such as latency, responsiveness, error rates and response times.
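
Application metrics are usually instrumented in the application code, for example with a Prometheus client library. The sketch below uses the prometheus_client Python package to expose a request counter and a latency histogram on an endpoint a scraper can collect; the metric names, port and simulated workload are illustrative assumptions.

```python
# Sketch: expose application-level metrics (request count, latency) for scraping.
# Metric names and the port are illustrative; assumes the prometheus_client package.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["status"])
LATENCY = Histogram("app_request_duration_seconds", "Request latency in seconds")

def handle_request():
    """Stand-in for real request handling; records latency and outcome."""
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.2))   # simulated work
    status = "200" if random.random() > 0.05 else "500"
    REQUESTS.labels(status=status).inc()

if __name__ == "__main__":
    start_http_server(8000)   # metrics served at http://localhost:8000/metrics
    while True:
        handle_request()
```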

Kubernetes monitoring best practices

Below are several best practices to consider for successfully monitoring Kubernetes environments.  

Use Kubernetes DaemonSets: DaemonSets let you deploy a monitoring agent on every node in the cluster, covering each node and the resources running on it. They also help ensure that newly added hosts are automatically detected and prepared to provide metrics.

Make smart use of labels: Creating a logical, consistent and coherent labeling schema makes it easier for DevOps teams to identify different components and helps deliver the most value from your Kubernetes monitoring.

Use service discovery: Service discovery allows a monitoring tool to continuously track your applications even if you don’t know where they are running. It automatically adapts metric collection to moving containers for a more complete understanding of a cluster’s health.

Set up alerts and notifications: Set up alerts for critical metrics, such as CPU or memory utilization, and get notified when those metrics reach certain thresholds. Monitoring tools with intelligent alerting help minimize alert fatigue by only sending alerts for meaningful events or changes; a minimal threshold-check sketch appears after this list.

Monitor control plane and core components: Regularly monitoring the Kubernetes control plane (the API server, etcd, scheduler and controller manager), node components such as the kubelet and kube-proxy, and add-ons such as kube-dns helps ensure that cluster services are running smoothly.

Monitor user experience: Although not measured natively in the Kubernetes platform, monitoring the user experience can sometimes alert you to issues before they are discovered inside the cluster.

Use built-in and open source tools: Regardless of your use cases, take advantage of Kubernetes-native monitoring tools, such as the Kubernetes Dashboard, cAdvisor (Container Advisor) and kube-state-metrics, as well as popular open source tools, including Prometheus, Grafana, Jaeger and Elastic Stack (formerly ELK Stack). Beyond supporting deployment, troubleshooting and monitoring, these tools deliver added functions such as data visualization and the collection and storage of time-series metrics from various sources.

Use a SaaS-based K8s monitoring solution: To ease Kubernetes management, reduce infrastructure overhead and costs, and receive regular updates, use a SaaS-based monitoring solution with built-in automation instead of an on-premises one.
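
As referenced in the alerting practice above, a dedicated monitoring tool normally manages alert rules and routing, but the underlying idea is a simple threshold check. The sketch below polls node CPU usage from the metrics.k8s.io API (requires metrics-server) and prints a warning when usage crosses an example 80% threshold; a real setup would route this to an alerting or notification system rather than printing.

```python
# Sketch: naive threshold alert on node CPU usage.
# Assumes metrics-server and a local kubeconfig; the 80% threshold and the
# 60-second poll interval are arbitrary examples. A real alerting pipeline
# (for example, Prometheus with Alertmanager) replaces the print statement.
import time
from kubernetes import client, config

CPU_THRESHOLD = 0.8
POLL_SECONDS = 60

def parse_cpu(value: str) -> float:
    """Convert Kubernetes CPU quantities ('250m', '2', '1500000000n') to cores."""
    if value.endswith("n"):
        return int(value[:-1]) / 1e9
    if value.endswith("m"):
        return int(value[:-1]) / 1e3
    return float(value)

config.load_kube_config()
core = client.CoreV1Api()
metrics = client.CustomObjectsApi()

while True:
    allocatable = {
        n.metadata.name: parse_cpu(n.status.allocatable["cpu"])
        for n in core.list_node().items
    }
    usage = metrics.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
    for item in usage["items"]:
        name = item["metadata"]["name"]
        ratio = parse_cpu(item["usage"]["cpu"]) / allocatable[name]
        if ratio > CPU_THRESHOLD:
            print(f"ALERT: node {name} at {ratio:.0%} of allocatable CPU")
    time.sleep(POLL_SECONDS)
```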

Related solutions
IBM® Instana® Observability

Go beyond traditional APM solutions by democratizing observability so anyone across DevOps, SRE, platform engineering, ITOps and development can get the data they want with the context they need.

Explore Instana

IBM® Turbonomic® Application Resource Management (ARM) platform

When applications consume only what they need to perform, you can improve operational efficiency, increase utilization and reduce energy costs and associated carbon emissions.

Explore Turbonomic

IBM Cloud® Monitoring

Use full-stack telemetry for managing architectures focused on containers and microservices with advanced features to monitor, troubleshoot, define alerts and design custom dashboards.

Explore Cloud Monitoring
Resources

What is Kubernetes?

Gain a better understanding of what Kubernetes is, why it is important, how it works and why its popularity as a container orchestration platform continues to surge.

What are containers?

Learn about the importance of containers in cloud computing, their core benefits and the emerging ecosystem of related technologies—including Docker, Kubernetes, Istio and Knative.

PeerPaper™ Report 2023: Best practices for choosing a cloud optimization solution

Download this report to learn best practices and considerations for selecting a cloud optimization solution from PeerSpot members who use Turbonomic.

Take the next step

IBM Instana provides real-time observability that everyone and anyone can use. It delivers quick time-to-value while verifying that your observability strategy can keep up with the dynamic complexity of current and future environments. From mobile to mainframe, Instana supports over 250 technologies and growing. 

Explore IBM Instana

Book a live demo