When applications produce anomalies such as erroneous calls, problematic traces, a spike in events or an accumulation of error messages in logs, your observability tools determine how fast you can identify the root cause.
In a best-case scenario, your observability tool uncovers the first signs of something that could slow down or disrupt an application, and you can mitigate it before it impacts customers. At other times, a sudden event can cause an immediate disruption, and you have to triage and resolve a major incident.
In other words, the alert isn’t the end of the observability process; it’s just the beginning. When every second counts, how quickly you find the root cause and apply the fix often depends on the UI of the observability tool. Complex search queries, non-intuitive navigation and hidden menus are time killers. Easier data querying means faster fixes and happier customers.
The complexity and speed of modern cloud-native architectures make it impossible to manually keep track of dependencies between application components.
As is often the case, the first line of defense is automation.
Automation drives application data collection and analysis
Observability and APM tools record an unwieldy amount of information. The IBM Instana™ platform captures and stores every trace with no sampling and collects metrics at one-second granularity. Its Unbounded analytics capabilities either associate data automatically or let you associate it yourself in the context you need. Either path directs you to the specific graph, bar chart or text description that lets you drill down to the root cause.
All that information is organized and displayed in the Dynamic Graph, which automatically reveals the dependencies between your application components. As a result, the conditions that lead to and result from an issue are associated with that issue, so you can look forward or backward to trace cause and effect.
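The idea of looking "forward or backward" through component dependencies can be illustrated with a small, hypothetical model. This is not how the Dynamic Graph is implemented; it is a minimal sketch assuming a directed call graph in which an edge from A to B means "A calls B". Walking the graph forward from a component finds its downstream dependencies (candidate causes), and walking the reversed graph finds its upstream callers (components that may be affected). The component names below are illustrative only.

```python
from collections import deque

# Hypothetical call graph: an edge "a" -> "b" means component a calls component b.
CALL_GRAPH = {
    "web": ["cart", "catalogue"],
    "cart": ["discount"],
    "catalogue": ["mysql"],
    "discount": [],
    "mysql": [],
}


def reverse(graph):
    """Build the reversed graph so we can walk from a callee back to its callers."""
    rev = {node: [] for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev


def reachable(graph, start):
    """Breadth-first search: every node reachable from start, excluding start itself."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def impact(graph, component):
    """Downstream dependencies (possible causes) and upstream callers (possibly affected)."""
    return {
        "possible_causes": reachable(graph, component),
        "possibly_affected": reachable(reverse(graph), component),
    }


print(impact(CALL_GRAPH, "cart"))
# {'possible_causes': {'discount'}, 'possibly_affected': {'web'}}
```

Breadth-first search is used here only because it is a simple way to collect all transitively connected nodes; a real observability platform would also weigh timing and event data, not just graph structure.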
Still, that much information could be overwhelming without an interface that makes it easy to drill down to specific data.
What’s new: Unbounded analytics just made triage even easier
That’s why we’ve put so much emphasis on ease of use since the inception of IBM Instana, so that at the first sign of trouble you can easily find and fix the issue. Now we’ve made analytics even faster with easier data querying. Here’s a look at some of the enhancements.
Automated query strings and filters
All IBM Instana analytics are built from filters and groups, combined in a simple query language that specifies the parameters to be viewed. You can then join those parameters with ‘AND’ or ‘OR’ into a logical statement that drills into the exact data combination you want to see. When you open analytics from a sidebar menu category such as Events, the analytics query string is generated automatically for the issue you are exploring. Alternatively, from the Analytics sidebar menu, you can build queries from scratch.
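To make the AND/OR idea concrete, here is a minimal sketch of how filter expressions can be combined into a logical statement and evaluated against call records. It does not use the IBM Instana query language or API; the `Tag`, `And` and `Or` classes, the `apply_filter` helper, the tag names and the sample calls are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical call records: each call is a dict of tag name -> value.
CALLS = [
    {"service.name": "discount", "endpoint.name": "GET /discount", "call.erroneous": True},
    {"service.name": "discount", "endpoint.name": "GET /health", "call.erroneous": False},
    {"service.name": "cart", "endpoint.name": "GET /cart", "call.erroneous": True},
    {"service.name": "catalogue", "endpoint.name": "GET /products", "call.erroneous": False},
]


@dataclass(frozen=True)
class Tag:
    """Leaf filter: a call matches when its tag equals the expected value."""
    name: str
    value: object

    def matches(self, call: dict) -> bool:
        return call.get(self.name) == self.value


@dataclass(frozen=True)
class And:
    """Matches only when both sub-expressions match."""
    left: "Expr"
    right: "Expr"

    def matches(self, call: dict) -> bool:
        return self.left.matches(call) and self.right.matches(call)


@dataclass(frozen=True)
class Or:
    """Matches when at least one sub-expression matches."""
    left: "Expr"
    right: "Expr"

    def matches(self, call: dict) -> bool:
        return self.left.matches(call) or self.right.matches(call)


Expr = Union[Tag, And, Or]


def apply_filter(expr: Expr, calls: list) -> list:
    """Return only the calls that satisfy the filter expression."""
    return [call for call in calls if expr.matches(call)]


# Erroneous calls in either the discount or the cart service.
query = And(
    Or(Tag("service.name", "discount"), Tag("service.name", "cart")),
    Tag("call.erroneous", True),
)
for call in apply_filter(query, CALLS):
    print(call["service.name"], call["endpoint.name"])
```

Representing the query as a small expression tree makes it easy to add or remove a condition while investigating, which mirrors the way an analytics query can be refined step by step.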
Automated production profiler that continually analyzes
IBM Instana analytics top-level parameters include Application Calls, Traces, Logs, seven Website parameters, four Mobile parameters and Profiles. The Applications, Website and Mobile parameters can then be organized using filter, group and latency attributes.
IBM Instana AutoProfile is an automated production profiler that continually analyzes the code-level performance of Java, Go, Python, Ruby and other applications.
The main Unbounded analytics screen is displayed in Figure 1. At the top is the Analytics navigation menu, which displays Application Calls by default.
More intuitive menus
The menu snippets in the next screenshot show the full range of Analytics menu options, including Traces, Website parameters, Mobile application parameters and Application Profile, which invokes IBM Instana AutoProfile code-level profiling capability:
The Applications menu:
The Websites menu:
The Mobile Apps menu:
First use case—automated analytics
Let’s examine how analytics help investigate event incidents across traces, calls, events and other parameters in the IBM Instana RobotShop eCommerce demo environment.
In the previous screenshot, you’ll see that two events are flagged in the Events icon in the side menu.
Clicking on the Events icon opens the Events page.
In the Open Events list, we see the issues, including an increase in latency and an increase in the number of erroneous calls. The rise in erroneous calls appears to be more serious, so we click that listing to drill down for more detail.
The event detail page in the next screenshot displays the Incident Timeline, the Triggering Event and any contextual Related Events. To learn more about the cause of the event, we click the Triggering Event.
Scrolling down on the Event screen, we see a significant spike in erroneous calls in the timeline graph. From here, we can either view the Built-in Event or click the Analyze Calls button to open Unbounded analytics and get more information about the precipitating event.
IBM Instana displays an Unbounded analytics screen for Calls and automatically generates the appropriate Filter, Group and Chart information for the erroneous call event we’re examining. From the Filter line, we can see that the IBM Instana solution has combined the ‘Service Name: discount’ and ‘Call Erroneous is true’ parameters using the Unbounded Analytics query language to specifically point to the location of the erroneous calls. It identifies two endpoints that are experiencing a 100% erroneous call rate.
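The "100% erroneous call rate" per endpoint comes from filtering calls and grouping them by endpoint. The sketch below reproduces that filter-then-group arithmetic on made-up data; it is not Instana's implementation, and the `error_rate_by_endpoint` helper, field names and endpoint names are hypothetical. It filters to one service, groups the remaining calls by endpoint, and divides each endpoint's erroneous calls by its total calls.

```python
from collections import defaultdict

# Hypothetical call records; services, endpoints and outcomes are illustrative only.
CALLS = [
    {"service": "discount", "endpoint": "GET /discount/{id}", "erroneous": True},
    {"service": "discount", "endpoint": "GET /discount/{id}", "erroneous": True},
    {"service": "discount", "endpoint": "POST /discount", "erroneous": True},
    {"service": "discount", "endpoint": "GET /health", "erroneous": False},
    {"service": "discount", "endpoint": "GET /health", "erroneous": True},
    {"service": "cart", "endpoint": "GET /cart", "erroneous": False},
]


def error_rate_by_endpoint(calls, service):
    """Filter calls to one service, group them by endpoint and return each endpoint's error rate."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for call in calls:
        if call["service"] != service:
            continue  # filter step: keep only the service under investigation
        totals[call["endpoint"]] += 1  # group step: count calls per endpoint
        if call["erroneous"]:
            errors[call["endpoint"]] += 1
    return {endpoint: errors[endpoint] / totals[endpoint] for endpoint in totals}


rates = error_rate_by_endpoint(CALLS, "discount")
failing = sorted(endpoint for endpoint, rate in rates.items() if rate == 1.0)
print(failing)
# ['GET /discount/{id}', 'POST /discount']
```

In this toy data, `GET /health` fails only half the time, so it is excluded; only endpoints where every call failed are reported, which is the kind of signal that points straight at a broken dependency.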
We scroll down the list of endpoint errors and examine one to better understand what is causing the problem.
The detail screen shows that the erroneous call is a Connect call returning a 500 Internal Server Error. A 500 status is a generic server-side error, but on a Connect call it suggests that the service was not available to accept the connection.
Second use case—build your analytics
In this use case, we’ll show how to construct a query from the Analytics menu that arrives at the same diagnosis screen:
The first step is to define the Filter, Group and Chart parameters that you want to examine and then construct the queries using the Query Builder.
This screen shows that we built the same query that the Analyze Calls button generated during our Event investigation in the previous screenshot.
By scrolling down, we can select the same Endpoint that we selected in the Event incident investigation:
After selecting the same Endpoint parameter, we end up at the same Call 500 error analytics screen that we reached in the Event incident investigation.
Whether you start from a flagged anomaly or from the Analytics sidebar menu, you arrive at the same analysis. Along the way, once you’re in the Analytics dashboard, you can add contextual data to the query, or remove it, to get more granular information and determine whether other factors contribute to the issue.
Unbounded analytics is the anomaly-hunting tool for the IBM Instana platform. The first objective of enterprise observability is to collect Metrics, Events, Traces and Logs at one-second granularity, with context, and to display them so that you can see the state of your applications, services and infrastructure in real time.
When issues arise, though, Unbounded analytics shifts into high gear and lets you drill down instantly to an issue’s root cause. Using a combination of machine learning and issue detection, it constructs the associations needed to find answers, even in the highly complex distributed environments illustrated in the IBM Instana Dynamic Graph.
When mean time to repair is critical, users are complaining about performance or, worse, customers are abandoning transactions on your core enterprise systems, time is not your ally. The IBM Instana solution not only has the information you need to identify issues but also correlates that information so that you can rapidly determine the full nature of the issue. Unbounded analytics is one of the key IBM Instana tools that enables that correlation.