Monitoring Your Kubernetes Environment on IBM Cloud with Sysdig

By: Eric Carter and Shadi Albouyeh

Kubernetes has taken the container ecosystem by storm, significantly changing how enterprises bring services to market. IBM Cloud Kubernetes Service, a managed container service for the rapid delivery of applications that can bind to advanced services like Watson™ and blockchain, dramatically simplifies deploying services in containers and microservices designed for the cloud. With the move to Kubernetes, however, comes the need for purpose-built operational tools to help manage the complexities of monitoring application and infrastructure performance in a distributed environment. IBM Cloud and Sysdig have partnered to solve this challenge for IBM Cloud customers with IBM Cloud Monitoring with Sysdig.

Applications on Kubernetes delivered as cloud-native microservices have an order of magnitude more components communicating with each other. Distributed across multiple instances and even locations, modern architectures add new complexities for the day-to-day tasks of monitoring, alerting, and troubleshooting. Additionally, the ephemeral nature of containers can hamper troubleshooting efforts. Usually, containers live as long as the process running inside them, disappearing when that process dies. This is one of the most challenging parts of troubleshooting containers. When containers die or are rescheduled to alternate nodes, the details you need for incident response may no longer exist.

IBM Cloud Monitoring with Sysdig is designed for monitoring Kubernetes environments. It can capture system calls at microsecond-level granularity when an alert fires, which lets you reconstruct exactly what happened even after the containers are gone. Its deep visibility into infrastructure and applications enables DevOps teams to monitor and troubleshoot performance issues in real time and answer questions like:

  • Pods restarting too frequently?

  • Liveness or readiness probes failing?

  • CPU cycles going off the charts?

You can easily analyze and correlate detailed data across commands, networks, containers, file I/O, errors, processes, syslog messages, and more to quickly find the needle in the haystack and reduce time to resolution for your Kubernetes-based services. Two key capabilities of IBM Cloud Monitoring with Sysdig come together to deliver rich system, container, and application data to dramatically simplify monitoring a Kubernetes environment:

  • ContainerVision makes it possible to inspect applications running inside containers without requiring any instrumentation of the container or application.

  • ServiceVision connects to the Kubernetes API Server to extract service labels and add Kubernetes context to all of your metrics and events.

With a single instrumentation point per node in your Kubernetes cluster, you can monitor your hosts, networks, containers, and applications—all delivered from IBM Cloud and available via an intuitive web-based interface.

Monitoring with Kubernetes context

Orchestrators radically change monitoring requirements. Individual containers become less important, while the performance of a service becomes more important. One of the core features of IBM Cloud Monitoring with Sysdig is the ability to visualize and explore your containers and metrics based on physical hierarchy (for example, host > container) as well as the logical microservice hierarchy (for example, namespace > deployment > pod > container). The ability to regroup your infrastructure views on the fly is a powerful way to get insight and understand what’s taking place in your environment regardless of how distributed or dynamic it is. This helps you quickly answer questions about the performance of a service at large or drill down into a pod or even container.
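
To make the idea of regrouping concrete, here is a minimal sketch in Python. It is not how the product is implemented; the sample records, label names, and the `regroup` helper are all hypothetical and exist only to show how the same container metrics can be rolled up by a physical level (host) or by a logical one (namespace > deployment).

```python
from collections import defaultdict

# Hypothetical sample: one record per container, with illustrative labels.
SAMPLES = [
    {"host": "node-1", "namespace": "shop", "deployment": "cart",
     "pod": "cart-a", "container": "app", "cpu_cores": 0.25},
    {"host": "node-2", "namespace": "shop", "deployment": "cart",
     "pod": "cart-b", "container": "app", "cpu_cores": 0.5},
    {"host": "node-1", "namespace": "shop", "deployment": "web",
     "pod": "web-a", "container": "nginx", "cpu_cores": 0.125},
]


def regroup(samples, levels, metric="cpu_cores"):
    """Sum a metric for every distinct combination of the given label levels."""
    totals = defaultdict(float)
    for sample in samples:
        key = tuple(sample[level] for level in levels)
        totals[key] += sample[metric]
    return dict(totals)


# Physical view: CPU per host.
by_host = regroup(SAMPLES, ["host"])
# Logical view: CPU per namespace > deployment, regardless of node placement.
by_service = regroup(SAMPLES, ["namespace", "deployment"])
```

Note that the "cart" deployment spans two nodes, yet the logical view reports it as a single total. That is the service-level perspective the paragraph above describes.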

Metrics are meaningless, however, without context. The key to understanding behavior and performance is the ability to correlate events in your environment with your metrics. Contextual events from Kubernetes and your container engine provide insight into application behavior. IBM Cloud Monitoring with Sysdig automatically collects Kubernetes and container events, which you can overlay with metric views for analysis and troubleshooting.

With more services and applications relying on Kubernetes, ensuring the health of your Kubernetes components is another crucial monitoring capability. IBM Cloud Monitoring takes advantage of kube-state-metrics to capture details about the state of your Kubernetes infrastructure. The agent automatically polls the Kubernetes API for kube-state-metrics and makes them available for analysis, correlation, and alerting in your monitoring interface. This lets you identify whether the condition of your cluster is having an impact on application behavior. Here are examples of questions you’ll be able to answer:

  • Does each deployment have sufficient resources?
  • How many pods are running per deployment vs. desired?
  • Is there enough capacity to serve pod requests?
  • How many jobs are actively running?
  • How many nodes are unavailable?
  • How many nodes are out of disk space?
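
As an illustration of the "running vs. desired" question, the sketch below compares two kube-state-metrics series, `kube_deployment_spec_replicas` and `kube_deployment_status_replicas_available`. The snapshot values, the deployment names, and the `under_replicated` helper are hypothetical. In practice you would read these values from your monitoring interface rather than a hard-coded dictionary.

```python
# Hypothetical snapshot, keyed by kube-state-metrics series name and then
# by deployment name. The numbers are made up for illustration.
SNAPSHOT = {
    "kube_deployment_spec_replicas": {"cart": 3, "web": 2},
    "kube_deployment_status_replicas_available": {"cart": 3, "web": 1},
}


def under_replicated(snapshot):
    """Return {deployment: (available, desired)} where available < desired."""
    desired = snapshot["kube_deployment_spec_replicas"]
    available = snapshot["kube_deployment_status_replicas_available"]
    return {
        name: (available.get(name, 0), want)
        for name, want in desired.items()
        if available.get(name, 0) < want
    }


lagging = under_replicated(SNAPSHOT)
```

Here `lagging` contains only the "web" deployment, which has one available replica out of two desired.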

Visualizing with topology maps and dashboards

What if you want to see a map of all your services and, more importantly, visualize how they are communicating with each other?

Topology maps let you drill down from top-level views like clusters and namespaces all the way to pods, containers, and container processes. Each can be overlaid with your choice of metrics, like response times, link traffic, and error counts. This “APM lite” capability provides you the info you need to visualize and manage the performance of your apps with full Kubernetes context and no additional instrumentation or plugins required.

Since IBM Cloud Monitoring with Sysdig auto-discovers what’s running in your environment, default dashboards and metric views are automatically populated. Although the pods implementing these services are distributed across multiple nodes, service data is aggregated for you to view. You can also customize your own dashboards with the details that matter most to you. This includes the “golden signals”—latency, traffic, errors, and saturation—that are important to monitor to understand how your applications are doing, especially as they scale. You also get specific metrics for your applications, things like:

  • Frequently used and slowest HTTP endpoints
  • Top and slowest queries and tables for databases
  • Heap usage, thread count, and garbage collection stats for Java and Go apps
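
For readers new to the golden signals, the following sketch shows one way each could be computed from raw request records. It is an assumption-laden illustration, not the product's calculation: the record fields, the nearest-rank 95th percentile for latency, and the choice of CPU usage against a limit as the saturation measure are all hypothetical.

```python
import math


def golden_signals(requests, window_seconds, cpu_used_cores, cpu_limit_cores):
    """Summarize latency, traffic, errors, and saturation for one time window."""
    if not requests:
        raise ValueError("at least one request is required")
    latencies = sorted(r["latency_ms"] for r in requests)
    # Nearest-rank percentile: the smallest value covering 95% of requests.
    rank = max(1, math.ceil(0.95 * len(latencies)))
    server_errors = sum(1 for r in requests if r["status"] >= 500)
    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": server_errors / len(requests),
        "saturation": cpu_used_cores / cpu_limit_cores,
    }


# Hypothetical two-second window of traffic for one service.
WINDOW = [
    {"latency_ms": 10, "status": 200},
    {"latency_ms": 20, "status": 200},
    {"latency_ms": 30, "status": 500},
    {"latency_ms": 400, "status": 200},
]
signals = golden_signals(WINDOW, window_seconds=2,
                         cpu_used_cores=0.5, cpu_limit_cores=1.0)
```

With this sample, the single slow request dominates the tail latency even though the average looks healthy, which is why percentile views tend to be more useful than averages as services scale.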

In addition, if you’ve added custom metrics like Prometheus and StatsD into your services, you can collect and display these metrics for analysis in the same way—all from a single pane of glass.
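
As a rough idea of what adding a custom StatsD metric to a service involves, the sketch below formats a datagram in the StatsD line format (`<name>:<value>|<type>`) and sends it over UDP. The metric names, the helper functions, and the target of `127.0.0.1:8125` (the conventional StatsD port) are assumptions for illustration. How your environment receives StatsD traffic depends on your agent configuration.

```python
import socket


def format_statsd(name, value, metric_type="c"):
    """Build one StatsD datagram: c = counter, g = gauge, ms = timer."""
    if metric_type not in {"c", "g", "ms"}:
        raise ValueError(f"unsupported StatsD type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_statsd(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumptions in this sketch."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


# Hypothetical business metric: count one placed order.
line = format_statsd("checkout.orders_placed", 1)
```

Calling `send_statsd(line)` from the request handler would emit the counter. Because UDP is connectionless, the call does not block on the receiver.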

Adaptive alerting with Kubernetes

Alerting is key to a successful Kubernetes monitoring strategy. IBM Cloud Monitoring with Sysdig provides UI-driven, multi-condition alerting and anomaly detection that takes into account the dynamic nature of Kubernetes-based services. To help you get a handle on issues before they impact operations, it lets you scope alerts by tags and labels, by specific container images, or even by processes running on a node. In a few clicks, you can configure alerts by Kubernetes abstractions like namespace that cover hundreds of nodes or processes and apply automatically to new deployments, pods, and containers scheduled within the cluster. Additionally, you can set alerts against cluster resources to ensure your Kubernetes components are up and running in the desired state. Alert types include:

  • Metric alerts can be manually set for thresholds across sums, averages, and rates—even with multiple, required conditions using boolean logic.
  • Event alerts use discrete system events as opposed to calculations based on metrics. Examples include a Kubernetes CrashLoopBackOff, a container becoming unhealthy, or a container being killed.
  • Anomaly detection uses Sysdig algorithms to determine normal behavior and alert when normal bounds have been exceeded. Sysdig can detect both anomalies against historical metric patterns and outliers within a group (e.g., a group of hosts).
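
To illustrate two of these alert types, here is a small sketch. It is not Sysdig's algorithm, which the text above does not describe. The thresholds and helper names are hypothetical, and the outlier check uses a simple median-absolute-deviation score as a stand-in for group-based anomaly detection.

```python
from statistics import median


def metric_alert(avg_cpu_pct, restart_rate_per_min):
    """Multi-condition metric alert: high CPU AND frequent restarts.

    Both thresholds are hypothetical values chosen for the example.
    """
    return avg_cpu_pct > 80 and restart_rate_per_min > 0.5


def group_outliers(values_by_host, threshold=3.5):
    """Flag hosts whose value sits far from the group median.

    Uses the modified z-score (0.6745 * |x - median| / MAD), a simple
    stand-in technique rather than the product's actual method.
    """
    values = list(values_by_host.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # every host behaves alike, so nothing stands out
    return [
        host for host, value in values_by_host.items()
        if 0.6745 * abs(value - med) / mad > threshold
    ]


# Hypothetical CPU percentages for a group of hosts.
CPU_BY_HOST = {"host-a": 40, "host-b": 42, "host-c": 41, "host-d": 95}
outliers = group_outliers(CPU_BY_HOST)
```

In this sample only "host-d" is flagged. A static threshold of, say, 90% would catch it too, but the group comparison also works when "normal" drifts over time.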

Alerts can then be sent to a range of downstream tools such as Slack, ServiceNow, PagerDuty, email, and more. 

Get started today with Kubernetes monitoring on IBM Cloud

We’ve only scratched the surface of what’s possible for Kubernetes monitoring with IBM Cloud Monitoring with Sysdig. To learn more, visit our docs page, which links to tutorials for hands-on training. Then, experience the difference it can make for your Kubernetes deployments by provisioning a free monitoring instance from the IBM Cloud Catalog. The service is now available in Dallas, Frankfurt, and London as a multi-zone region deployment, with other regions coming online in the coming months.

If you’re planning to attend KubeCon EU 2019 in Barcelona, Spain, be sure to visit us to meet members of our joint development and Offering Management teams, discuss the service further, and share your feedback in person.
