IBM Cloud Monitoring with Sysdig understands Kubernetes.

The advanced integration lets you define how to monitor your systems using Kubernetes labels. Labeling in Kubernetes is incredibly powerful, so there are countless ways to slice and dice your infrastructure. All the standard views are there too, but you’ll soon want to move to a more advanced view of your services and deployments. In addition, the topology view in IBM Cloud Monitoring with Sysdig automatically maps the interactions between components, which makes it easy to understand where things like network traffic are going. You can even define the grouping in the topology. Later, you’ll see how we watch traffic between namespaces, for example.

Let’s begin by taking a look at my team’s API and seeing how it’s performing without having to worry about machines and networks. We’re running a fairly traditional API over a database to give access to important business entities and using Kubernetes to make it robust and easy to scale.

Great namespace support

Starting at the simplest level, you can see the Kubernetes pods grouped by namespace. That’s really convenient because namespaces provide a great way to organize an application or service in Kubernetes.

Figure 1: Group by namespace.

My team and I practice DevSecOps, meaning the developers on our team also act as operators. This makes Sysdig’s strength even more important because it lets you look at the things you are influencing. For example, you’ll want to know which namespaces are using the available resources and how much. Sysdig makes it easy to see this over time, which is much more powerful than a single snapshot.

Here’s a quick overview of which things are using noteworthy amounts of memory.

Figure 2: Total resources by namespace.

Already, you’re seeing how useful Sysdig is, without even talking about machines, containers, or processes. So far, you have a glimpse at things that the operators are naturally interested in.

In Figure 2, you can also see I chose to clear some items manually to remove them from the chart. I’ll share how to apply filters later.
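To make the idea behind Figure 2 concrete, here’s a minimal sketch of the kind of aggregation involved: summing per-pod memory by namespace and leaving out namespaces you have cleared from the chart. This is not how Sysdig implements it. The sample data, the `api` and `kube-system` namespace names, and the `memory_by_namespace` helper are all hypothetical.

```python
from collections import defaultdict

MIB = 2**20


def memory_by_namespace(pod_samples, exclude=()):
    """Sum pod memory per namespace, largest first, skipping excluded namespaces."""
    totals = defaultdict(int)
    for sample in pod_samples:
        if sample["namespace"] in exclude:
            continue
        totals[sample["namespace"]] += sample["memory_bytes"]
    # Sort so the biggest consumers come first, as they would in a ranked chart.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Hypothetical samples; a real system would collect these from the cluster.
pod_samples = [
    {"pod": "api-1", "namespace": "api", "memory_bytes": 300 * MIB},
    {"pod": "api-2", "namespace": "api", "memory_bytes": 280 * MIB},
    {"pod": "client-1", "namespace": "chi-calc", "memory_bytes": 120 * MIB},
    {"pod": "dns-1", "namespace": "kube-system", "memory_bytes": 40 * MIB},
]

# Excluding a namespace mirrors clearing it from the chart by hand.
for namespace, total in memory_by_namespace(pod_samples, exclude={"kube-system"}).items():
    print(f"{namespace}: {total / MIB:.0f} MiB")
```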

Great label support

Now, you may be wondering, “Did my change break production?” There are several ways I could answer that, but often you want to see how a system behaved after you pushed a change to GitHub. Of course, that change flowed straight into production with a zero-touch rolling deployment after it passed its gating tests. Among other things, I’ve labeled each deployment with its Git commit, so Sysdig can show resource utilization by Git commit.

Figure 3: Resources by Git commit.

Clearly, this service has a memory leak, but I can see it wasn’t my recent change that introduced it so I can relax a little (for now).

By now, I’m sure you’re already thinking of other ways to use labels. It’s a fantastic setup and great that Kubernetes supports extensive labeling while Sysdig collects the metadata.
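If you want to try the Git commit labeling described above, here’s a minimal sketch of how a deployment pipeline might stamp a manifest before applying it. The label key `git-commit`, the `my-api` names, the image, and the `label_with_commit` helper are my assumptions for illustration and not what my team’s pipeline actually uses. Kubernetes label values are limited to 63 characters, so a full 40-character commit SHA fits.

```python
import copy
import json


def label_with_commit(manifest, commit_sha, key="git-commit"):
    """Return a copy of a Deployment manifest labeled with a Git commit."""
    labeled = copy.deepcopy(manifest)
    # Label the Deployment object itself.
    labeled.setdefault("metadata", {}).setdefault("labels", {})[key] = commit_sha
    # Label the pod template too, so every pod it creates carries the commit.
    template = labeled.setdefault("spec", {}).setdefault("template", {})
    template.setdefault("metadata", {}).setdefault("labels", {})[key] = commit_sha
    # The selector is deliberately left untouched: it should keep matching
    # pods from every commit during a rolling deployment.
    return labeled


# Hypothetical manifest, written as a dict rather than YAML to stay stdlib-only.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-api", "labels": {"app": "my-api"}},
    "spec": {
        "selector": {"matchLabels": {"app": "my-api"}},
        "template": {
            "metadata": {"labels": {"app": "my-api"}},
            "spec": {"containers": [{"name": "api", "image": "example/my-api:latest"}]},
        },
    },
}

labeled = label_with_commit(deployment, "0123456789abcdef0123456789abcdef01234567")
print(json.dumps(labeled["spec"]["template"]["metadata"]["labels"], indent=2))
```

Putting the label on the pod template matters most here, because the pods are what consume memory and CPU, and that is where per-commit grouping needs the metadata.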

Focus on what matters

You may have noticed in that last screenshot that I filtered what I was showing.

Figure 4: Simple scope.

Of course, you can tighten that scope and use simple drag-and-select to zoom in on a time window you care about. Ready to take a look at a specific Git commit?

Figure 5: Memory by pod.

What’s interesting here is that it wasn’t just one pod that was affected, but all of them, because the Kubernetes ingress balanced the troublesome traffic across them.

You probably noticed the peaks and are wondering why memory stopped climbing there. Again, Sysdig has captured the data needed to answer that question. Notably, you don’t have to dig through your YAML files to work out what the configuration might have been, because you have the actual value.

Figure 6: Memory limit.

Figure 6 illustrates the configuration page where I chose the “#” symbol to show a single number for simplicity (I could have just as easily chosen a chart to see how it changed over time).

Another feature of Sysdig is its ability to do the right thing by default. I could have set up the scale for this number, but I didn’t need to. The auto option worked beautifully!

So, now it’s clear that something wonderful is happening: when a pod’s memory reaches the limit, Kubernetes steps in and restarts it.
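Here’s a rough sketch of the reasoning behind reading those peaks: convert a Kubernetes-style memory limit to bytes and find the samples where usage came within a small tolerance of it. The `512Mi` limit, the usage series, the 2% tolerance, and both helper functions are hypothetical. The parser handles only the common quantity suffixes, not the full Kubernetes quantity syntax.

```python
# Common Kubernetes quantity suffixes: binary (Ki, Mi, ...) and decimal (k, M, ...).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity):
    """Convert a quantity such as '512Mi' or '1G' to a number of bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # a plain number is already in bytes


def samples_at_limit(usage_bytes, limit, tolerance=0.02):
    """Return the indexes of samples within `tolerance` of the memory limit."""
    threshold = parse_memory(limit) * (1 - tolerance)
    return [i for i, used in enumerate(usage_bytes) if used >= threshold]


MIB = 2**20
# Hypothetical series: memory climbs, the pod restarts, and memory climbs again.
usage = [200 * MIB, 350 * MIB, 505 * MIB, 90 * MIB, 300 * MIB, 510 * MIB]
print(samples_at_limit(usage, "512Mi"))
```

A sharp drop right after a sample that sits at the limit is the pattern to look for when you suspect a restart.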

But, could this be causing issues for users?

Figure 7: HTTP error count.

Figure 7 shows that 0.06% of requests are failing. Whilst I’m not happy about that, I know I can help the business far more by adding some new functionality than by worrying about a few failed connections. As a developer, this is exactly the information you need to help guide you and your team.
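For a sense of scale, the percentage is simply failed requests divided by total requests. The counts below are hypothetical numbers I picked to produce the same 0.06% figure; they are not the real traffic behind Figure 7.

```python
def error_rate_percent(error_count, total_count):
    """Return failed requests as a percentage of all requests."""
    if total_count == 0:
        return 0.0  # no traffic means nothing failed
    return 100 * error_count / total_count


# Hypothetical counts: 6 failures in 10,000 requests.
rate = error_rate_percent(6, 10_000)
print(f"{rate:.2f}%")  # prints "0.06%"
```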

Peek inside the matrix

Sysdig’s topology view allows you to view traffic within the cluster as well as from outside of it. In Figure 8, I focus on the traffic from a namespace called chi-calc, which consumes the API. Being able to group network traffic by namespace is super useful.

Figure 8: API clients.

Excitingly, I can dig into each of those boxes to see what’s in them. In fact, this view covered a time when I ran three instances of the client job. When I expand the box, you can see each of the three client instances. You can expand the API box too and see the traffic between specific instances.

When I’m looking to see what changed, I often build a picture like the one above and then alter the time window. Sysdig has a wonderful way of showing the difference using dashed lines and boxes to show items that went away.

In the image below, I set the time window to be around one of the client job instances. Notice that the older two are marked with dashed boxes because they were previously shown but are no longer relevant.

Figure 9: Changes over time.

This shows the power of being able to group by arbitrary things. Initially, I didn’t care about specific instances and just wanted to look at the traffic between namespaces. In a dynamic infrastructure, you don’t want the little details of pods coming and going to distract you from the big picture.
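Grouping traffic this way amounts to rolling pod-to-pod connections up into namespace-to-namespace edges. Here’s a minimal sketch of that rollup, not Sysdig’s implementation. Only the `chi-calc` namespace comes from my cluster; the `api` namespace name, the pod names, the byte counts, and the `traffic_by_namespace` helper are hypothetical.

```python
from collections import Counter


def traffic_by_namespace(connections, pod_namespace):
    """Roll pod-to-pod byte counts up into (source, destination) namespace edges."""
    edges = Counter()
    for conn in connections:
        src = pod_namespace[conn["src_pod"]]
        dst = pod_namespace[conn["dst_pod"]]
        edges[(src, dst)] += conn["bytes"]
    return dict(edges)


# Hypothetical mapping of pods to namespaces: three client jobs and two API pods.
pod_namespace = {
    "client-1": "chi-calc",
    "client-2": "chi-calc",
    "client-3": "chi-calc",
    "api-1": "api",
    "api-2": "api",
}

# Hypothetical connection records observed during the time window.
connections = [
    {"src_pod": "client-1", "dst_pod": "api-1", "bytes": 1200},
    {"src_pod": "client-2", "dst_pod": "api-2", "bytes": 800},
    {"src_pod": "client-3", "dst_pod": "api-1", "bytes": 500},
]

for (src, dst), total in traffic_by_namespace(connections, pod_namespace).items():
    print(f"{src} -> {dst}: {total} bytes")
```

However many client pods come and go, the rollup still yields a single `chi-calc` to `api` edge, which is what keeps the big picture stable.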

Start your monitoring using Sysdig today

Thanks for coming along with me as I explored how my API was behaving. Clearly, the Sysdig team’s experience building useful technology is helping me and my team to do the same.

Start using IBM Cloud Monitoring with Sysdig right now, and start labeling your Kubernetes objects to make it shine.

