April 12, 2019 | Written by: Shadi Albouyeh and Eric Carter
Categorized: Compute Services | DevOps
Kubernetes has taken the container ecosystem by storm, significantly changing how enterprises bring services to market. IBM Cloud Kubernetes Service is a managed container service for the rapid delivery of applications that can bind to advanced services like Watson™ and blockchain. It dramatically simplifies deploying containerized services and microservices designed for the cloud. With the move to Kubernetes, however, comes the need for purpose-built operational tools to help manage the complexities of monitoring application and infrastructure performance in a distributed environment. IBM Cloud and Sysdig have partnered to solve this challenge for IBM Cloud customers with IBM Cloud Monitoring with Sysdig.
Applications on Kubernetes delivered as cloud-native microservices have an order of magnitude more components communicating with each other than traditional applications. Distributed across multiple instances and even locations, modern architectures add new complexities to the day-to-day tasks of monitoring, alerting, and troubleshooting. Additionally, the ephemeral nature of containers can hamper troubleshooting efforts. Usually, a container lives only as long as the process running inside it and disappears when that process dies. This is one of the most challenging parts of troubleshooting containers: when containers die or are rescheduled to other nodes, the details you need for incident response may no longer exist.
IBM Cloud Monitoring with Sysdig is designed for monitoring Kubernetes environments. When an alert fires, it can capture system calls at microsecond-level granularity, which lets you reproduce exactly what happened even after the containers are gone. Its deep visibility into infrastructure and applications enables DevOps teams to monitor and troubleshoot performance issues in real time and answer questions like:
- Are pods restarting too frequently?
- Are liveness or readiness probes failing?
- Are CPU cycles going off the charts?
You can easily analyze and correlate detailed data across commands, networks, containers, file I/O, errors, processes, syslog messages, and more to quickly find the needle in the haystack and reduce time to resolution for your Kubernetes-based services. Two key capabilities of IBM Cloud Monitoring with Sysdig come together to deliver rich system, container, and application data to dramatically simplify monitoring a Kubernetes environment:
- ContainerVision makes it possible to inspect applications running inside containers without requiring any instrumentation of the container or application.
- ServiceVision connects to the Kubernetes API server to extract service labels and add Kubernetes context to all of your metrics and events.
With a single instrumentation point per node in your Kubernetes cluster, you can monitor your hosts, networks, containers, and applications—all delivered from IBM Cloud and available via an intuitive web-based interface.
Monitoring with Kubernetes context
Orchestrators radically change monitoring requirements. Individual containers become less important, while the performance of a service becomes more important. One of the core features of IBM Cloud Monitoring with Sysdig is the ability to visualize and explore your containers and metrics based on the physical hierarchy (for example, host > container) as well as the logical microservice hierarchy (for example, namespace > deployment > pod > container). The ability to regroup your infrastructure views on the fly is a powerful way to gain insight into what's taking place in your environment, regardless of how distributed or dynamic it is. This helps you quickly answer questions about the performance of a service at large or drill down into a pod or even a single container.
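To make the idea of regrouping concrete, here is a minimal sketch of how the same per-container samples can be rolled up along either hierarchy. The `ContainerSample` type, its field names, and the sample values are hypothetical illustrations and do not reflect the product's internal data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSample:
    # Hypothetical per-container sample carrying both physical and logical labels.
    host: str
    namespace: str
    deployment: str
    pod: str
    container: str
    cpu_cores: float


def group_cpu(samples, hierarchy):
    """Sum CPU usage for each distinct path through the given label hierarchy."""
    totals = defaultdict(float)
    for s in samples:
        key = tuple(getattr(s, level) for level in hierarchy)
        totals[key] += s.cpu_cores
    return dict(totals)


samples = [
    ContainerSample("node-1", "shop", "cart", "cart-7d9f-abc", "app", 0.25),
    ContainerSample("node-2", "shop", "cart", "cart-7d9f-xyz", "app", 0.5),
    ContainerSample("node-1", "shop", "checkout", "checkout-5c4-def", "app", 0.75),
]

# Physical view: host > container
by_host = group_cpu(samples, ["host"])
# Logical view: namespace > deployment
by_deployment = group_cpu(samples, ["namespace", "deployment"])
```

Here `by_host` reports 1.0 cores on `node-1` and 0.5 on `node-2`, while `by_deployment` shows that `cart` uses 0.75 cores even though its pods are split across two nodes.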
Visualizing your Kubernetes services
What if you want to see a map of all your services and, more importantly, visualize how they are communicating with each other? IBM Cloud Monitoring with Sysdig dynamically maps the topology of your services and physical infrastructure to help you visualize flows and interdependencies between your hosts, containers, and Kubernetes abstractions. This makes it far easier to discover bottlenecks and recognize where issues exist. Topology maps let you drill down from top-level views like clusters and namespaces all the way to pods, containers, and container processes. Each can be overlaid with your choice of metrics, like response times, link traffic, and error counts. By default, the IBM Cloud Monitoring with Sysdig agent auto-discovers and collects rich application metrics from over 40 applications. This “APM lite” capability provides the information you need to visualize and manage the performance of your apps with full Kubernetes context and no additional instrumentation or plugins required.
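A service topology map is essentially a graph whose nodes are services and whose edges aggregate the traffic between them. The sketch below collapses pod-level flow records into service-to-service edges with request, byte, and error counts. The flow tuple layout and the `pod_to_service` mapping are assumptions for illustration, not the agent's actual data format.

```python
from collections import defaultdict


def build_topology(flows, pod_to_service):
    """Collapse pod-level flow records into service-to-service edges.

    Each flow is a hypothetical (src_pod, dst_pod, bytes_sent, is_error) tuple.
    Returns {(src_service, dst_service): {"requests": n, "bytes": b, "errors": e}}.
    """
    edges = defaultdict(lambda: {"requests": 0, "bytes": 0, "errors": 0})
    for src_pod, dst_pod, nbytes, is_error in flows:
        # Pods that cannot be mapped to a service are grouped under "unknown".
        key = (
            pod_to_service.get(src_pod, "unknown"),
            pod_to_service.get(dst_pod, "unknown"),
        )
        edge = edges[key]
        edge["requests"] += 1
        edge["bytes"] += nbytes
        edge["errors"] += int(is_error)
    return dict(edges)


pod_to_service = {
    "frontend-1": "frontend",
    "frontend-2": "frontend",
    "cart-1": "cart",
    "redis-0": "redis",
}
flows = [
    ("frontend-1", "cart-1", 512, False),
    ("frontend-2", "cart-1", 256, True),
    ("cart-1", "redis-0", 128, False),
]
topology = build_topology(flows, pod_to_service)
```

Because edges are keyed by service rather than by pod, the map stays stable as pods are rescheduled, and metrics such as error counts can be overlaid directly on each edge.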
Analyzing the performance of Kubernetes services
IBM Cloud Monitoring with Sysdig auto-discovers what’s running in your environment and automatically populates default dashboards and metric views. You can also build custom dashboards with the details that matter most to you. This gives you quick access to the data you need to analyze the performance of your application services across all containers, regardless of the host or data center they are running in. This includes the “golden signals” (latency, traffic, errors, and saturation), which are important to monitor to understand how your applications are doing, especially as they scale. You also get application-specific metrics such as:
- Frequently used and slowest HTTP endpoints
- Top and slowest queries and tables for databases
- Heap usage, thread count, and garbage collection stats for Java and Go apps
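As a rough illustration of what the four golden signals measure, the sketch below derives them from a window of request records. Treating 5xx responses as errors, using nearest-rank p95 for latency, and using CPU usage relative to a CPU limit as the saturation proxy are all simplifying assumptions; the `Request` type is hypothetical and is not how the product computes these values.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record observed during the measurement window.
    latency_ms: float
    status: int


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests, window_seconds, cpu_used_cores, cpu_limit_cores):
    """Compute latency, traffic, errors, and saturation for one time window."""
    if not requests:
        raise ValueError("need at least one request in the window")
    latencies = [r.latency_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": nearest_rank_percentile(latencies, 95),  # latency
        "traffic_rps": len(requests) / window_seconds,             # traffic
        "error_rate": errors / len(requests),                      # errors
        "saturation": cpu_used_cores / cpu_limit_cores,            # saturation proxy
    }


# Ten requests over 5 seconds; two of them failed with HTTP 500.
requests = [
    Request(latency_ms=10 * (i + 1), status=500 if i in (3, 7) else 200)
    for i in range(10)
]
signals = golden_signals(requests, window_seconds=5, cpu_used_cores=1.5, cpu_limit_cores=2.0)
```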
Though the pods implementing these services are distributed across multiple nodes, IBM Cloud Monitoring with Sysdig aggregates the service data for you, without any configuration or instrumentation of your containers. In addition, if you’ve instrumented your services with custom metrics in formats such as Prometheus or StatsD, you can collect and display these metrics for analysis in the same way, all from a single pane of glass. With IBM Cloud Monitoring with Sysdig, you can correlate any set of metrics to see and solve problems fast.
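For readers unfamiliar with StatsD, a custom metric is just a short text line (`name:value|type`, optionally followed by `|@sample_rate`) sent over UDP. The sketch below formats and emits such lines using only the standard library. It assumes a StatsD-compatible collector is listening on `127.0.0.1:8125` (the conventional StatsD port); check the service documentation for how the agent in your cluster is actually configured. The metric names are hypothetical.

```python
import socket

# Counter, gauge, and timer are the core StatsD metric types.
VALID_TYPES = {"c", "g", "ms"}


def format_statsd(name, value, metric_type, sample_rate=None):
    """Build a StatsD line such as 'checkout.orders:1|c'."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_statsd(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP (fire-and-forget; no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

A service could then record an order with `send_statsd(format_statsd("checkout.orders", 1, "c"))`. UDP keeps the emitting code non-blocking, which is why StatsD is popular for in-process instrumentation.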
Correlating Kubernetes metrics and events
Metrics are meaningless, however, without context. Key to understanding behavior and performance is the ability to correlate events in your environment with your metrics. Contextual events from Kubernetes and your container engine provide insight into application behavior. IBM Cloud Monitoring with Sysdig automatically collects Kubernetes and container events, which you can overlay with metric views for analysis and troubleshooting.
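The value of overlaying events on metrics is that a spike can be lined up with whatever Kubernetes reported around the same time. The sketch below shows the basic correlation step: for each metric point above a threshold, it collects the events that fall within a time window. The `Event` type, the threshold, and the window size are illustrative assumptions. `BackOff` and `Pulled` are standard Kubernetes event reasons.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical, simplified Kubernetes event record.
    timestamp: float  # seconds
    reason: str       # e.g. "BackOff", "Killing", "Pulled"
    pod: str


def events_near_spikes(metric_points, events, threshold, window_seconds):
    """Pair each metric point above `threshold` with events within `window_seconds`.

    metric_points is a list of (timestamp, value) tuples.
    Returns a list of (spike_timestamp, [matching events]).
    """
    results = []
    for ts, value in metric_points:
        if value > threshold:
            nearby = [e for e in events if abs(e.timestamp - ts) <= window_seconds]
            results.append((ts, nearby))
    return results


cpu_points = [(0, 0.2), (60, 0.9), (120, 0.3)]
events = [Event(50, "BackOff", "cart-1"), Event(300, "Pulled", "cart-2")]
correlated = events_near_spikes(cpu_points, events, threshold=0.8, window_seconds=30)
```

In this example, the CPU spike at t=60 s is paired with the `BackOff` event at t=50 s, while the unrelated image pull at t=300 s is ignored.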
Monitoring Kubernetes infrastructure
With more services and applications relying on Kubernetes, ensuring the health of your Kubernetes components is another crucial monitoring capability. IBM Cloud Monitoring with Sysdig takes advantage of kube-state-metrics to capture details about the state of your Kubernetes infrastructure.
The IBM Cloud Monitoring with Sysdig agent automatically polls the Kubernetes API for kube-state-metrics and makes them available for analysis, correlation, and alerting in your monitoring interface. This lets you identify whether the condition of your cluster is having an impact on application behavior. Here are examples of questions you’ll be able to answer:
- Does each deployment have sufficient resources?
- How many pods are running in each deployment compared with the desired count?
- Is there enough capacity to serve pod requests?
- How many jobs are actively running?
- How many nodes are unavailable?
- How many nodes are out of disk space?
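To show how one of these questions maps onto kube-state-metrics data, the sketch below compares the open source metrics `kube_deployment_spec_replicas` (desired) and `kube_deployment_status_replicas_available` (available) for each deployment. It parses a simplified form of the Prometheus text format (no escaped quotes, no trailing timestamps), and the sample values are invented for illustration. In practice the monitoring service does this collection and comparison for you.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Invented sample data in (simplified) Prometheus text exposition format.
SAMPLE_TEXT = """
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="shop",deployment="cart"} 3
kube_deployment_spec_replicas{namespace="shop",deployment="checkout"} 2
kube_deployment_status_replicas_available{namespace="shop",deployment="cart"} 2
kube_deployment_status_replicas_available{namespace="shop",deployment="checkout"} 2
"""


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each labeled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels")))
            yield match.group("name"), labels, float(match.group("value"))


def replica_shortfall(text):
    """Return {(namespace, deployment): missing_replicas} where available < desired."""
    desired, available = {}, {}
    for name, labels, value in parse_samples(text):
        key = (labels.get("namespace"), labels.get("deployment"))
        if name == "kube_deployment_spec_replicas":
            desired[key] = value
        elif name == "kube_deployment_status_replicas_available":
            available[key] = value
    shortfall = {}
    for key, want in desired.items():
        have = available.get(key, 0.0)
        if want > have:
            shortfall[key] = want - have
    return shortfall
```

Running `replica_shortfall(SAMPLE_TEXT)` flags only the `cart` deployment, which has one fewer available pod than desired.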
Adaptive alerting with Kubernetes
Alerting is key to a successful Kubernetes monitoring strategy, and the shift to containers and orchestration requires a new approach to it. IBM Cloud Monitoring with Sysdig provides UI-driven, multi-condition alerting and anomaly detection that take into account the dynamic nature of Kubernetes-based services. To help you get a handle on issues before they impact operations, you can set alerts against anything from tags and labels to specific container images or even processes running on a node. In a few clicks, you can configure alerts scoped by Kubernetes abstractions, such as a namespace, that cover hundreds of nodes or processes and apply automatically to new deployments, pods, and containers scheduled within the cluster. Additionally, you can set alerts against cluster resources to ensure your Kubernetes components are up and running in the desired state.
- Metric alerts can be manually set for thresholds across sums, averages, and rates, even with multiple required conditions combined using Boolean logic.
- Event alerts use discrete system events instead of calculations based on metrics. Examples include Kubernetes CrashLoopBackOff, container unhealthy, and container killed events.
- Anomaly detection uses Sysdig algorithms to determine normal behavior and alert when normal bounds have been exceeded. Sysdig can detect both anomalies against historical metric patterns and outliers within a group (for example, a group of hosts).
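The sketch below illustrates the ideas behind these alert types. It includes a multi-condition metric alert that combines thresholds with Boolean logic, a historical anomaly check, and a group-outlier check. Sysdig's actual anomaly detection algorithms are not described here, so the sketch substitutes a simple z-score test (population standard deviation, leave-one-out for the group case). All thresholds and metric names are hypothetical.

```python
import statistics


def metric_alert(cpu_avg, memory_avg, restarts_per_min):
    """Multi-condition alert: (CPU > 90% AND memory > 80%) OR restarts > 1/min."""
    return (cpu_avg > 0.9 and memory_avg > 0.8) or restarts_per_min > 1.0


def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` if it lies more than z_threshold std devs from the history mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > z_threshold


def group_outliers(values_by_member, z_threshold=2.0):
    """Flag members whose value is a z-score outlier relative to the rest of the group."""
    outliers = []
    for member, value in values_by_member.items():
        others = [v for m, v in values_by_member.items() if m != member]
        if len(others) < 2:
            continue  # not enough peers to compare against
        if is_anomalous(others, value, z_threshold):
            outliers.append(member)
    return outliers


history = [100, 102, 98, 101, 99]  # e.g. requests per second over recent windows
host_cpu = {"host-a": 0.30, "host-b": 0.32, "host-c": 0.31, "host-d": 0.95}
```

With this data, `is_anomalous(history, 120)` is true while `is_anomalous(history, 101)` is not, and `group_outliers(host_cpu)` returns `["host-d"]`. A real system would also account for seasonality and noisy baselines, which a plain z-score ignores.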
Alerts can then be sent to a range of downstream tools such as Slack, ServiceNow, PagerDuty, email, and more. In addition, action triggers can be tied to any alert. For example, an alert can initiate a capture file that records system call-level detail about everything happening on a host when the alert fires. Or, using a webhook, you can trigger a scheduler to modify a deployment, run a script, or perform almost any other programmable action.
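A webhook target is simply an HTTP endpoint that accepts the alert notification and decides what to do with it. The standard-library sketch below receives a JSON POST and maps it to an action name. The payload fields (`alert.name`, `alert.severity`) and the action names are hypothetical placeholders, not the product's documented webhook schema, so adapt them to the notification format described in the service documentation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def choose_action(payload):
    """Map an alert payload to an action name (field names are hypothetical)."""
    alert = payload.get("alert", {})
    if not isinstance(alert, dict):
        return "log-only"
    name = alert.get("name", "")
    severity = alert.get("severity", "info")
    if "CrashLoopBackOff" in name and severity in {"high", "critical"}:
        return "rollback-deployment"
    if severity in {"high", "critical"}:
        return "page-on-call"
    return "log-only"


class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # reject bodies that are not JSON objects
            self.end_headers()
            return
        action = choose_action(payload)
        # A real receiver would run the action here (script, scheduler API call, etc.).
        body = json.dumps({"action": action}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the receiver; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), AlertWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver. Keeping the decision logic in a pure function like `choose_action` makes it easy to unit-test separately from the HTTP plumbing.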
Get started today with Kubernetes monitoring on IBM Cloud
We’ve only scratched the surface of what’s possible for Kubernetes monitoring with IBM Cloud Monitoring with Sysdig. To learn more, visit our docs page, which links to tutorials for hands-on training. Then experience the difference it can make for your Kubernetes deployments by provisioning a free monitoring instance from the IBM Cloud Catalog. The service is now available in Dallas, Frankfurt, and London as a multi-zone region deployment, with other regions coming online in the coming months.
If you’re planning to attend KubeCon EU 2019 in Barcelona, Spain, be sure to visit us to meet members of our joint development and Offering Management teams, discuss further, and share your feedback in person.