Sai Vennam is joined by LogDNA Dev Advocate Laura Santamaria to take a closer look at observability and how you can harness and drive new insights from data.
Simply put, observability is a property of your systems that helps you understand what's going on with them, monitor what they're doing, and be able to get the information you need to troubleshoot.
In this video, we cover the three tiers of observability (logging, metrics, and monitoring), look at the different levels that data can come from, and learn how different personas in the workplace (e.g., dev, ops, security) can use an aggregator to filter the influx of data into convenient and usable dashboards.
We hope you enjoy the video! Make sure to subscribe to the IBM Cloud YouTube channel for more like this.
What is observability?
Sai: As your applications grow in complexity, how do you harness and drive new insights from all the chaos?
Laura: And, is observability just a buzzword, or is it something that you actually need to think about? Spoiler alert, it is.
Sai: My name is Sai Vennam, and I'm with the IBM Cloud team, and today I'm joined by a special guest.
Laura: Hi there, I'm Laura Santamaría, and I am a Developer Advocate with LogDNA.
Sai: If you don't know, LogDNA is a core part of our observability story on IBM Cloud, but today, we're gonna be talking about observability, so let's start with a definition.
Laura: So, observability is a property of your systems that helps you understand what's going on with them, monitor what they're doing, and be able to get the information you need to troubleshoot.
Three tiers of observability: Logging, metrics, and monitoring
Sai: So the way we see it, there's three major tiers of observability, and let's go through those now.
Laura: We're gonna start out with my favorite, which is logging. In addition to logging, we have metrics, so that's all of your analytics around the data that you're gathering, and, finally...
Sai: We've got monitoring. Now, monitoring is essentially putting up a magnifying glass to your systems and getting new insights from what's actually running there.
Today we're gonna be starting with an example. In the bottom left corner, we have sketched out a few of the different infrastructure pieces that we'll start with today.
Can we explain what those are?
Laura: Sure, we have a public cloud—it can be any of them. And then you have on-prem, and then let's say we actually have some user data—maybe this is a tablet or a cell phone.
Different personas consume data in different ways
Sai: So, all of those infrastructure pieces are creating and generating data, and what I'm kind of gonna focus on here is the personas that are going to consume it.
So we've got that dev persona, we've got ops, and finally we have security.
So, all of this data flowing in is kind of a lot—I want to have some way of filtering it down for my specific user personas to be able to understand it.
So let's start with developers—what do developers care about?
Different levels that logging can come from
Laura: I actually want to back up here for a moment though because let's talk about all the different levels that logging can come from.
So we have three different levels that we can think about: you have your operating system, then you have Kubernetes or any other type of platform (I'm picking Kubernetes).
Sai: That's my favorite.
Laura: And then finally, your application.
So your operating system and Kubernetes both send really good logs, and you can use a lot of that data pretty much as is or add in some of your own. But the application is really where you need to spend some time.
So, your devs need to create a proper event stream, and this really goes by the "garbage in, garbage out" principle: you need to put in good work on the application side so that you get good logs out.
Sai: Right, exactly, so the great developers out there on Kubernetes and the operating systems, they've instrumented their platforms. But the application, that's up to you as a developer to make sure the instrumentation is in place.
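As an illustration of what instrumenting an application can look like, here is a minimal sketch using Python's standard logging module to emit one JSON object per event. The service name "checkout" and the fields order_id and customer_id are hypothetical, chosen only for the example; the video does not prescribe a log format.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    # Hypothetical context fields an application might attach to its events.
    CONTEXT_FIELDS = ("order_id", "customer_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer so the example is self-contained.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("order placed", extra={"order_id": "A-1001", "customer_id": "c-42"})

# Each line parses back into a dictionary that an aggregator could index.
event = json.loads(buffer.getvalue())
```

Structured events like this are easier to filter downstream than free-form strings, which is one way to read the "garbage in, garbage out" point.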
Laura: Absolutely, and when you think about it, let's say that we have an operating system here (I'm gonna say that's an operating system), and then we have Kubernetes running on it. And then you actually have your app running on top of Kubernetes. And all of these are each sending data. So we have three different levels of data, all coming out and heading towards the dev who wants some information.
Sai: Right, so it looks like they're all coming into this central area here.
Laura: That's right. We can talk about this as our aggregator. So, our aggregator takes in all of this data and puts it all into one place so we can work with it.
Sai: That's right, but kind of coming back to the problem here—a developer might not care about all of the information flowing in. How do we drive just the pieces that they care about like we mentioned? Maybe they instrumented their specific application—how do we drive that to them?
Filtering data to dashboards
Laura: Absolutely. So an aggregator often has filters. In this case, let's say the dev is just asking for debugging data. Your filter can actually feed a dashboard, or some other way of accessing that data, so that the dev can take a look at just the pieces they need.
Sai: That's a core part of an observability solution—this aggregator, not only does it collect the data, but it needs to externalize it, expose it, so my developers can access it and drive new insights.
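The aggregator-plus-filter idea can be sketched in a few lines. This is a toy in-memory model, not how LogDNA or any particular product works; the names LogEvent, Aggregator, add_view, and dashboard are hypothetical, as are the sample messages.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class LogEvent:
    source: str   # "os", "kubernetes", or "app"
    level: str    # e.g. "DEBUG", "INFO", "WARNING", "ERROR"
    message: str


@dataclass
class Aggregator:
    """Collects events from every level and exposes filtered views."""

    events: list[LogEvent] = field(default_factory=list)
    views: dict[str, Callable[[LogEvent], bool]] = field(default_factory=dict)

    def ingest(self, event: LogEvent) -> None:
        self.events.append(event)

    def add_view(self, name: str, predicate: Callable[[LogEvent], bool]) -> None:
        # A view is just a named filter over the shared event stream.
        self.views[name] = predicate

    def dashboard(self, name: str) -> list[LogEvent]:
        predicate = self.views[name]
        return [event for event in self.events if predicate(event)]


agg = Aggregator()
agg.ingest(LogEvent("os", "INFO", "disk usage at 71%"))
agg.ingest(LogEvent("kubernetes", "WARNING", "pod checkout-7d9 restarted"))
agg.ingest(LogEvent("app", "DEBUG", "cart total recalculated"))
agg.ingest(LogEvent("app", "ERROR", "payment gateway timeout"))

# Developers see only application events; ops sees platform events plus errors.
agg.add_view("dev", lambda e: e.source == "app")
agg.add_view("ops", lambda e: e.source in {"os", "kubernetes"} or e.level == "ERROR")

dev_view = agg.dashboard("dev")
ops_view = agg.dashboard("ops")
```

All personas read from the same stored events; only the predicate differs, which is why adding a dashboard for another team is cheap.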
So, let's say we solved that part of the puzzle. What do operators care about? What are the operations teams—what are they looking for out of these systems?
Laura: So an operations team might need to know more about degradation of its system or if a pod is falling over; maybe your database filled up and you need to know more information about how you can fix it.
The ops team is going to be getting data from all of these different systems and filtering it out to yet another dashboard or another interface of some sort, getting just the data they need.
Sai: Right. So, potentially, they may not care as much about specific application-level logs, but they'll be looking to Kubernetes to say: hey, what was the CPU usage? Do we need to set up some Horizontal Pod Autoscalers to make sure that we don't hit those limits?
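To make the CPU question concrete, here is a rough sketch of the scaling ratio documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function name and sample numbers are assumptions for illustration, and the sketch leaves out the tolerance band, min/max replica bounds, and stabilization behavior that a real autoscaler applies.

```python
import math


def desired_replicas(current_replicas: int, current_cpu: float, target_cpu: float) -> int:
    """Estimate a replica count from average CPU utilization.

    current_cpu and target_cpu are average utilization percentages.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    # Scale the replica count by how far utilization is from the target.
    return max(1, math.ceil(current_replicas * current_cpu / target_cpu))


# Four pods averaging 90% CPU against a 60% target suggests scaling out.
scale_out = desired_replicas(4, 90.0, 60.0)

# The same four pods at 30% CPU suggests scaling in.
scale_in = desired_replicas(4, 30.0, 60.0)
```

The inputs to a calculation like this are exactly the platform-level metrics an ops dashboard would surface.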
Finally—you can kind of probably see where I'm going here—with the last piece of the puzzle with security, they probably have a dashboard that's created for them as well.
So, a security team—let's say they're using a third-party tool, as most security teams generally do—they identify a threat ID or maybe a customer ID, and they want to dive in deeper to a potential threat that's been identified.
So they put that information in the aggregator, and they can cut through all the chaos to find exactly what that specific security analyst is looking for.
But, I want to pose an interesting question here—it's not always about going to the system and identifying what's there. Many times, security advisors need to know what's happening the second it happens and they can't just sit there and stare at logs all day, right?
Laura: Absolutely, this is where monitoring comes in—this is really a two-way street. We have automated alerts that can go out and tell all of these different groups about specific things that they're interested in—specific events that they want to know about.
So, let's say that you have a system that's been accessed, and it's not supposed to be. Frankly, that system is going to figure it out long before a human is, and that's what an alert is for. An ops team doesn't want to find out that there's a degradation of service when their user does—they need to know ahead of time.
Sai: So, a good observability solution should have the ability to externalize the data and then, additionally, set up alerting on top of that.
So our dev team may be most comfortable in Slack, so they set up a chatbot that notifies them when particular exceptions are thrown.
Your ops team, maybe they're using something like a paging system so that, you know, in the middle of the night, if something goes down, they get an alert and they can start looking into it right away.
And then finally, for our security teams (as I mentioned, they're generally using third-party tools or custom dashboards), they can set up custom alerting so they know exactly when something goes down.
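The per-team alerting described here can be sketched as a list of rules, each pairing a match condition with the channel a team prefers. Everything in this example is hypothetical: the Event and AlertRule types, the channel labels, and the keyword matching stand in for whatever a real alerting integration would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    source: str
    level: str
    message: str


@dataclass(frozen=True)
class AlertRule:
    team: str
    channel: str  # hypothetical label, e.g. "chat", "pager", "custom"
    matches: Callable[[Event], bool]


RULES = [
    # Developers hear about application errors in chat.
    AlertRule("dev", "chat", lambda e: e.source == "app" and e.level == "ERROR"),
    # Ops gets paged when a service degrades or goes down.
    AlertRule("ops", "pager", lambda e: "degraded" in e.message or "down" in e.message),
    # Security is alerted on unauthorized access.
    AlertRule("security", "custom", lambda e: "unauthorized" in e.message),
]


def route(event: Event, rules: list[AlertRule] = RULES) -> list[tuple[str, str]]:
    """Return the (team, channel) pairs that should be notified for an event."""
    return [(rule.team, rule.channel) for rule in rules if rule.matches(event)]


dev_alert = route(Event("app", "ERROR", "unhandled exception in checkout"))
ops_alert = route(Event("kubernetes", "WARNING", "service degraded: latency above threshold"))
sec_alert = route(Event("os", "WARNING", "unauthorized login attempt"))
no_alert = route(Event("kubernetes", "INFO", "pod started"))
```

Because the rules run as events arrive, nobody has to watch the logs to learn that something needs attention.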
Laura: And to be honest, this is your new norm. You're going to have multiple clouds; you're going to have on-prem systems; you're going to have data coming directly in from your users. You need to be able to understand what's going on, and really, this is what observability is all about.