When applications produce anomalies such as erroneous calls, problematic traces, a high volume of events or an accumulation of problem messages in logs, observability tools determine how fast you can identify the root cause.

In a best-case scenario, your observability tool uncovers the first signs of something that could slow down or disrupt an application, and you can mitigate it before it impacts customers. Other times, more sudden events can cause an immediate disruption, and you have to triage and resolve a major incident.

In other words, the alert isn’t the end of the observability process; it’s just the beginning. When every second counts, finding the root cause and applying the fix often depends on the UI of the observability tool. Complex search queries, non-intuitive navigation and hidden menus are time killers. Data that is easier to query means faster fixes and happier customers.

The complexity and speed of modern cloud-native architectures make it impossible to manually keep track of dependencies between application components.

As is often the case, the first line of defense is automation.

Automation drives application data collection and analysis

Observability and APM tools record an unwieldy amount of information. The IBM Instana platform captures and stores every trace with no sampling and gathers metrics at one-second granularity. The solution also uses Unbounded analytics capabilities to associate data automatically, or to let you associate it yourself, in the context you need. Either path directs you to the specific graph, bar chart or text description that lets you drill down to the root cause.

All that information is organized and displayed in the Dynamic Graph to reveal the dependencies between your application components automatically. That means that the conditions that lead to and result from an issue are associated with that issue, so you can look forward or backward for cause and effect.
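
To make the idea of looking "forward or backward" concrete, here is a minimal Python sketch of walking a dependency graph in both directions. It is not how the Dynamic Graph is implemented; the `calls` mapping and every service name in it are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical call dependencies: each service maps to the services it calls.
calls = {
    "web": ["cart", "catalogue"],
    "cart": ["discount", "catalogue"],
    "catalogue": ["mongodb"],
    "discount": ["mysql"],
}


def reachable(graph, start):
    """Return every node reachable from `start` (breadth-first), excluding `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Invert every edge so that callers can be looked up from a callee."""
    inverted = {}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, []).append(caller)
    return inverted


def downstream(service):
    """Services that `service` depends on, directly or indirectly (candidate causes)."""
    return reachable(calls, service)


def upstream(service):
    """Services that depend on `service`, directly or indirectly (candidate effects)."""
    return reachable(reverse(calls), service)
```

In this toy graph, `downstream("discount")` lists what the discount service relies on, and `upstream("discount")` lists the services that could be affected when it fails.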

In fact, the amount of information could be overwhelming without an interface that makes it easy to drill down to specific data.

What’s new: Unbounded analytics just made triage even easier

That’s why we’ve put so much emphasis on ease of use since the inception of IBM Instana: at the first sign of trouble, you can easily find and fix the issue. Now we’ve made analytics even faster by making data easier to query. Here’s a look at some enhancements.

Automated query strings and filters

All IBM Instana analytics are built using filters and groups that are combined using a simple query language that identifies the parameters to be viewed. You can then ‘AND’ or ‘OR’ those parameters into a robust logical statement that drills into the specific data combination you want to see. When analytics are presented within a sidebar menu category such as Events, the analytics query string is automatically generated for the issue explored. Alternatively, when you use the Analytics sidebar menu, you can create queries from scratch.
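
As a rough illustration of how individual filters can be combined with AND and OR into one logical statement, consider the Python sketch below. It does not reproduce the IBM Instana query language or any of its APIs; the `Filter` class, the `equals` helper, the tag names and the label format are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Filter:
    """A predicate over a call record (a plain dict), with a readable label."""
    label: str
    test: Callable[[dict], bool]

    def __and__(self, other):
        # Both filters must match.
        return Filter(f"({self.label} AND {other.label})",
                      lambda c: self.test(c) and other.test(c))

    def __or__(self, other):
        # Either filter may match.
        return Filter(f"({self.label} OR {other.label})",
                      lambda c: self.test(c) or other.test(c))


def equals(tag, value):
    """Build a filter that matches records whose `tag` equals `value`."""
    return Filter(f"{tag} = {value!r}", lambda c: c.get(tag) == value)


# Hypothetical call records; the keys are illustrative, not real tag names.
sample_calls = [
    {"service_name": "discount", "erroneous": True},
    {"service_name": "discount", "erroneous": False},
    {"service_name": "cart", "erroneous": True},
]

query = equals("service_name", "discount") & equals("erroneous", True)
matches = [c for c in sample_calls if query.test(c)]
```

Keeping a readable label next to the predicate mirrors the idea that a generated query string stays visible and editable rather than hidden behind the screen that produced it.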

Figure 1

IBM Instana analytics top-level parameters include Application Calls, Traces and Logs, seven Website parameters, four Mobile parameters, and Profiles. Applications, Website and Mobile parameters can then be organized using the filter, group and latency attributes. IBM Instana AutoProfile is an automated production profiler that continually analyzes code-level performance for Java, Go, Python, Ruby and other languages.

The main Unbounded analytics screen is displayed in Figure 1. At the top is the Analytics navigation menu, which by default displays Application Calls as the display parameter.

More intuitive menus

The menu snippets in the next screenshot show the full range of Analytics menu options, including Traces, Website parameters, Mobile application parameters and Application Profile, which invokes IBM Instana AutoProfile code-level profiling capability:

Figure 2

The Applications menu:

Figure 3

The Websites menu:

Figure 4

The Mobile Apps menu:

Figure 5

First use case—automated analytics

Figure 6: Two events are flagged in the Events icon in the side menu.

Let’s examine analytics for event incidents from within traces, calls, events and other parameters for the IBM Instana RobotShop eCommerce demo environment.

In the previous screenshot, you’ll see that two events are flagged in the Events icon in the side menu.

Clicking on the Events icon opens the Events page.

In the Open Events list, we see the issues, including an increase in latency and the number of erroneous calls. The increase in error events appears to be more serious, so we click on the most serious listing and drill down to see more detail.

The event detail page in the next screenshot displays the Incident Timeline, Triggering Event and any contextual Related Events. To explore, we click on the Triggering Event to obtain more information about the Event cause:

Figure 7:  A spike in erroneous calls in the timeline graph.

Scrolling down on the Event screen, we see a significant spike in erroneous calls in the timeline graph. From here, we can either view the Built-in Event or invoke Unbounded analytics to obtain more information about the precipitating event by clicking on the Analyze Calls button.

Figure 8: IBM Instana automatically generates the appropriate Filter, Group and Chart information for the erroneous call event.

IBM Instana displays an Unbounded analytics screen for Calls and automatically generates the appropriate Filter, Group and Chart information for the erroneous call event we’re examining. From the Filter line, we can see that the IBM Instana solution has combined the ‘Service Name: discount’ and ‘Call Erroneous is true’ parameters using the Unbounded analytics query language to point specifically to the location of the erroneous calls. It identifies two endpoints that are experiencing a 100% erroneous call rate.
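
The filter-then-group step described here can be sketched in a few lines of Python. This is only an approximation of the idea, not the product's implementation; the call records, endpoint names and the `erroneous_rate_by_endpoint` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical call records for illustration only.
call_records = [
    {"service": "discount", "endpoint": "/connect", "erroneous": True},
    {"service": "discount", "endpoint": "/connect", "erroneous": True},
    {"service": "discount", "endpoint": "/lookup", "erroneous": True},
    {"service": "discount", "endpoint": "/health", "erroneous": False},
    {"service": "discount", "endpoint": "/health", "erroneous": False},
    {"service": "cart", "endpoint": "/add", "erroneous": True},
]


def erroneous_rate_by_endpoint(records, service):
    """Filter to one service, group by endpoint, and return the erroneous call rate."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        if record["service"] != service:
            continue  # filter step: keep only the service under investigation
        totals[record["endpoint"]] += 1  # group step: count calls per endpoint
        if record["erroneous"]:
            errors[record["endpoint"]] += 1
    return {endpoint: errors[endpoint] / totals[endpoint] for endpoint in totals}


rates = erroneous_rate_by_endpoint(call_records, "discount")
# Endpoints where every call failed, that is, a 100% erroneous call rate.
fully_failing = sorted(ep for ep, rate in rates.items() if rate == 1.0)
```

With this sample data, two endpoints of the hypothetical discount service end up in `fully_failing`, which is the kind of result the Filter and Group lines surface on the analytics screen.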

Figure 9: Scrolling down the list of endpoint errors.

Figure 10: The IBM Instana platform combines the ‘Service Name: discount’ and ‘Call Erroneous is true’ parameters to point to the erroneous calls.

We scroll down the list of endpoint errors and examine one to get a better idea of what issue is causing the problem.

The detail screen shows that the erroneous call is a Connect call that is throwing a 500 Internal Server Error, which in this case suggests that the service is not available for a connection.

Second use case—build your analytics

In this use case, we’ll show how to construct a query from the Analytics menu that arrives at the same diagnosis screen:

Figure 11: Construct the queries using the Query Builder.

The first step is to define the Filter, Group and Chart parameters that you want to examine and then construct the queries using the Query Builder.

In this screen, you can see that we built the same query as when we selected the Analyze Calls button in our Event investigation in the previous screenshot.

Figure 12: The same query as in the automated use case.

By scrolling down, we can select the same Endpoint that we selected in the Event incident investigation:

Figure 13: Select the endpoint again.

After selecting the same Endpoint parameter, we end up at the same Call 500 error analytics screen that we reached in the Event incident investigation.

Figure 14: The same Call 500 error analytics screen.

Whether you start from a flagged anomaly or from the Analytics selection in the sidebar menu, you can arrive at the same analysis. Along the way, once you’re in the Analytics dashboard, you can add contextual data to the query or remove it to get more granular information and to determine whether other factors contribute to the issue.
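
The effect of adding or removing context can be shown with a small Python sketch, in which a query is just a set of conditions that must all hold. The records, the `zone` key and the `run` helper are hypothetical and do not correspond to real IBM Instana tags or functions.

```python
# Hypothetical call records for illustration only.
observed_calls = [
    {"service": "discount", "erroneous": True, "zone": "us-east"},
    {"service": "discount", "erroneous": True, "zone": "eu-west"},
    {"service": "discount", "erroneous": False, "zone": "us-east"},
    {"service": "cart", "erroneous": True, "zone": "us-east"},
]


def run(records, conditions):
    """Return the records that satisfy every key/value condition (logical AND)."""
    return [r for r in records
            if all(r.get(key) == value for key, value in conditions.items())]


base = {"service": "discount", "erroneous": True}

# Adding a condition narrows the result to a more specific context.
narrowed = {**base, "zone": "us-east"}

# Removing a condition widens the result to show the surrounding traffic.
widened = {key: value for key, value in base.items() if key != "erroneous"}

base_hits = run(observed_calls, base)
narrowed_hits = run(observed_calls, narrowed)
widened_hits = run(observed_calls, widened)
```

Comparing the three result sets is a quick way to check whether an extra factor, here a made-up zone, accounts for the failures or whether they are spread across the service.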

Figure 15

Conclusion

Unbounded analytics is the anomaly-hunting tool for the IBM Instana platform. The first objective of enterprise observability is to obtain Metrics, Events, Traces and Log information in granular one-second detail with context and display it so that you instantly see the state of your applications, services and infrastructure in real time.

When issues arise, though, Unbounded analytics shifts into high gear and lets you drill down quickly to find an issue’s root cause. Using a combination of machine learning and issue detection, it constructs the associations needed to obtain answers, even in the highly complex distributed environments illustrated in the IBM Instana Dynamic Graph.

When mean time to repair is critical, users are complaining about performance or, worse, customers are abandoning transactions on your core enterprise systems, time is not your ally. The IBM Instana solution not only has the information you need to identify issues but also correlates that information so that you can determine the full nature of the issue rapidly. Unbounded analytics is one of the key IBM Instana tools that enables that correlation.

To learn more, sign up for a free, two-week trial
