Analyzing traces and calls

Use Unbounded Analytics to investigate the traces and calls that Instana collects. Instana monitors every call as it comes into the system, so you can understand how an application behaves on each call.

View traces

On the sidebar, click Applications.
On the Applications dashboard, select an application or service.
On the application or services dashboard, click Analyze Calls.
On the Analytics dashboard you can analyze calls by application, service, and endpoint, breaking down the data that Instana is presenting by service, endpoint and call names, respectively. Under Applications, select Calls or Traces.
Click a group and then select a trace.

View trace analytics

Trace Analytics

Filtering and grouping traces or calls

On the Analytics dashboard, traces or calls can be filtered and grouped using arbitrary tags. In Analyze Calls, filters can be connected using the AND and OR logic operators and grouped together with brackets. In Analyze Traces, only the AND operator is available.

Query Builder

There are two approaches to filter data:

Query Builder
Filter Sidebar

While both of them are usable on their own, best results are achieved when combining them.

Query Builder

Use the Query Builder on the Analytics dashboard to filter the initial result set. By clicking Add filter, you can apply the application.name, service.name, and endpoint.name tags, along with infrastructure entity tags such as agent.tag or host.name, to both the source and the destination of a call. By default, a tag is applied to the destination. To apply it to the source instead, click the selector before the tag name and select source. By combining source and destination, you can create queries such as "Show me all the calls between these two services" or "Show me all the calls that are issued from my agent.zone 'production' towards the agent.zone 'test'". The selection of source or destination is not available for call tags such as call.http.path or call.tag, which are properties of the call itself and are independent of the source or destination.
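
The source and destination distinction can be pictured with a small sketch. This is not Instana's query language or API: the Call record, the matches helper, and the sample data are hypothetical, and only the tag names are taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    # Entity tags are resolved separately for each side of the call.
    source: dict
    destination: dict
    # Call tags such as call.http.path belong to the call itself.
    call_tags: dict = field(default_factory=dict)


def matches(call, tag, value, side="destination"):
    """Check an entity tag on one side of a call (destination by default)."""
    return getattr(call, side).get(tag) == value


calls = [
    Call({"agent.zone": "production"}, {"agent.zone": "test"}, {"call.http.path": "/sync"}),
    Call({"agent.zone": "production"}, {"agent.zone": "production"}),
    Call({"agent.zone": "test"}, {"agent.zone": "test"}),
]

# "All calls issued from agent.zone 'production' towards agent.zone 'test'"
prod_to_test = [
    c for c in calls
    if matches(c, "agent.zone", "production", side="source")
    and matches(c, "agent.zone", "test")
]
```

Only the first sample call satisfies both conditions, because the same tag name is evaluated once against the source and once against the destination.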

To apply grouping, click Add group and select one of the tags. The default grouping uses the endpoint name (endpoint.name) tag. To inspect the individual traces and calls that match the filters, you can either expand the group to peek into the results, or click Focus on this group, which removes the grouping and further filters the results by the value of the selected group. Tags can be applied to the call's source or destination, so you can express queries such as "Show me all the calls towards this one service, broken down by caller". Calls that do not match any group are shown in a special group named Tag not present; for instance, with agent.zone this group is Calls without the 'agent.zone' tag. To remove the calls without agent.zone from the results, apply an additional filter with the is present operator. Grouping by source or destination is not available in Analyze Traces, because the available groups in that view are independent of the source or destination of any one particular call.

The preceding example filters by application Catalogue (user-journey) and lists the calls grouped by the endpoint name.
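
The grouping behavior described in this section, including the special group for calls that lack the tag, can be sketched as follows. The group_calls function and the dictionary-based call records are hypothetical illustrations, not part of Instana.

```python
from collections import defaultdict


def group_calls(calls, tag):
    """Group call records by a tag; calls lacking the tag share one bucket."""
    groups = defaultdict(list)
    for call in calls:
        key = call.get(tag, f"Calls without the '{tag}' tag")
        groups[key].append(call)
    return dict(groups)


calls = [
    {"endpoint.name": "GET /cart", "agent.zone": "production"},
    {"endpoint.name": "GET /cart"},
    {"endpoint.name": "POST /order", "agent.zone": "test"},
]

by_zone = group_calls(calls, "agent.zone")

# Equivalent of adding an "is present" filter before grouping.
present_only = group_calls([c for c in calls if "agent.zone" in c], "agent.zone")
```

With the extra presence filter, the bucket for untagged calls no longer appears in the result.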

Filter Sidebar

Using the results received by applying query builder filters, it's possible to quickly drill down on the data by applying additional filters in the Filter Sidebar on the Analytics dashboard.

Sidebar

Items within the same tag category are combined with a logical OR; different tag categories are combined with a logical AND. All filters selected in the filter sidebar are combined with any applied query builder filter through a logical AND. The header of the filter sidebar shows the total count of selected items across all tags and allows you to quickly remove all applied sidebar filters.

Attention: Multiple selections for a single tag are currently not supported in Analyze Traces.

In the preceding example we're filtering by application Catalogue (user-journey) in the query builder AND services catalogue-demo OR discount-svc selected in the filter sidebar.
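
The way sidebar selections combine with each other and with the query builder filter can be sketched in a few lines. The function names and the dictionary-based call records are assumptions made for illustration; the application and service names are the ones mentioned above, plus two invented entries (cart and Other).

```python
def sidebar_matches(call, selections):
    """selections maps a tag to the set of values selected in the sidebar.

    Values of one tag are OR-ed; different tags are AND-ed.
    """
    return all(
        call.get(tag) in values                 # OR within one tag
        for tag, values in selections.items()   # AND across tags
    )


def apply_filters(calls, query_filter, selections):
    """Sidebar selections are AND-ed with the query builder filter."""
    return [c for c in calls if query_filter(c) and sidebar_matches(c, selections)]


calls = [
    {"application.name": "Catalogue (user-journey)", "service.name": "catalogue-demo"},
    {"application.name": "Catalogue (user-journey)", "service.name": "discount-svc"},
    {"application.name": "Catalogue (user-journey)", "service.name": "cart"},
    {"application.name": "Other", "service.name": "discount-svc"},
]

result = apply_filters(
    calls,
    query_filter=lambda c: c["application.name"] == "Catalogue (user-journey)",
    selections={"service.name": {"catalogue-demo", "discount-svc"}},
)
```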

To quickly group by one of the filter sidebar tags, click the grouping button displayed beside each tag that is suitable for grouping. This is a shortcut for configuring grouping in the query builder as described earlier. In the same way, you can remove the grouping again by clicking the ungroup button on the tag that is currently used for grouping.

Known limitations

Grouping calls by log tags: when you group calls by log.level or log.message, the special group Tag not present is not shown, unlike for other tags.

Latency Distribution

Trace and call latency can be inspected using the Latency Distribution chart. When you select a latency range on the chart, the filters are adjusted accordingly, and the results table below the chart is updated to show only traces or calls within the selected latency range.

Latency Distribution View
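
As a rough sketch of what selecting a range on a latency distribution amounts to, the following code buckets latencies into a histogram and then keeps only the calls inside a chosen range. The bucket width, the field names, and the inclusive range bounds are assumptions, not documented Instana behavior.

```python
from collections import Counter


def latency_histogram(latencies_ms, bucket_ms=50):
    """Count calls per fixed-width latency bucket (keyed by bucket start)."""
    return Counter((int(l) // bucket_ms) * bucket_ms for l in latencies_ms)


def within_range(calls, low_ms, high_ms):
    """Keep only calls whose latency lies in the selected range (inclusive)."""
    return [c for c in calls if low_ms <= c["latency_ms"] <= high_ms]


calls = [
    {"name": "GET /a", "latency_ms": 12},
    {"name": "GET /b", "latency_ms": 48},
    {"name": "GET /c", "latency_ms": 130},
    {"name": "GET /d", "latency_ms": 420},
]

histogram = latency_histogram(c["latency_ms"] for c in calls)
selected = within_range(calls, 100, 500)
```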

Trace view

To display a trace view, on the Analytics dashboard select a group, and then click the trace. Selecting a call displays the call in the context of its trace.

Trace View

Summary details

The summary details of a trace include:

The trace name (usually an HTTP entry).
The name of the service it occurred on.
The type or technology.
The core KPIs:
- Sub calls to other services.
- The number of erroneous calls.
- The number of errors within the trace.
- The number of warnings within the trace.
- Trace duration. An interval between the start of the first and the end of the last call in a trace.
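
The trace duration definition can be written down directly. In this sketch the field names are hypothetical, and "the last call" is taken to mean the call that ends latest, which is an interpretation of the wording above.

```python
def trace_duration_ms(calls):
    """Interval from the start of the first call to the end of the last one."""
    first_start = min(c["start_ms"] for c in calls)
    last_end = max(c["start_ms"] + c["duration_ms"] for c in calls)
    return last_end - first_start


calls = [
    {"name": "GET /checkout", "start_ms": 0, "duration_ms": 120},
    {"name": "SELECT cart", "start_ms": 10, "duration_ms": 30},
    # An asynchronous call may end after the entry call has returned.
    {"name": "send-email", "start_ms": 100, "duration_ms": 80},
]

duration = trace_duration_ms(calls)  # 180 ms, longer than the 120 ms entry call
```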

Timeline

The trace timeline displays the following:

When the trace started.
The chronological order of the services that were called throughout the trace.

The call chains hang from the root element (span). On simple three-tier systems, you have a typical depth of four levels. In contrast, on systems with a distributed service or microservices architecture, you can expect to see much longer icicles. When you have long sub-calls of the trace, or periodic call patterns, such as one HTTP call per database entry, the timeline gives you an excellent overview of the call structure.

To view the details of the span, click the span within the timeline graph. To view the details of where the time was spent within a specific call, hover over the call that is displayed on the timeline graph.

The call details include the following kinds of time:

Self: The amount of time that the call spends outside of the downstream calls (that is, the time that is spent within the call).
Waiting: The amount of time that the call spends waiting on all the downstream calls to complete.
Network: The time difference between the Exit Span Time of the caller and the Entry Span Time of the call.
Total: The total time of a call.
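
The relationship between Total, Waiting, and Self can be sketched as follows. The sketch assumes that overlapping downstream calls count only once towards Waiting and that Self is what remains of Total; Instana's exact computation is not specified here, and Network is left out because it needs span-level timestamps from both sides of the call.

```python
def merged_length(intervals):
    """Total length covered by a set of possibly overlapping intervals."""
    covered, current_end = 0, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            covered += end - start
            current_end = end
        elif end > current_end:
            covered += end - current_end
            current_end = end
    return covered


def time_breakdown(call, downstream):
    """Split a call's total time into waiting and self time."""
    start, end = call
    # Clip downstream calls to the parent's own interval.
    clipped = [(max(s, start), min(e, end)) for s, e in downstream if s < end and e > start]
    total = end - start
    waiting = merged_length(clipped)
    return {"total": total, "waiting": waiting, "self": total - waiting}


# A 100 ms call with two overlapping downstream calls and one separate one.
breakdown = time_breakdown((0, 100), [(10, 40), (30, 50), (70, 80)])
```

In this example the downstream calls cover 50 ms of the parent's 100 ms, so the remaining 50 ms is attributed to the call itself.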

Services

The services list, shown under the timeline graph, summarizes all the calls per service and lists the number of calls, the aggregated time, and the errors that occurred. Each service has its own color (in this example, shop = blue and productsearch = green). Select a service to view its details in the applications and services dashboard.

Calls

Call View

The trace tree displays the structure of the upstream and downstream service calls, along with the type of the call. To explore specific calls, expand and collapse individual parts of the trace tree. Select a call to view its details in the services and endpoints dashboard.

Orphan calls

A call is considered an orphan if its parent call is missing. A parent call can be missing for various reasons, such as not being finished yet or being sent to another APM tool. Because the parent-child relationship determines the position of a call within a call tree, the position of an orphan call is unknown. An orphan call is therefore attached directly to the root call, and an indicator icon is displayed on the edge between the root and the orphan call.

Orphan call
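
A minimal sketch of how orphan calls could be placed when a call tree is built is shown below. The record layout (id and parent_id fields) and the build_call_tree function are hypothetical; the sketch only mirrors the rule that a call with a missing parent is attached to the root and flagged.

```python
def build_call_tree(calls, root_id):
    """Return {parent_id: [child_ids]} plus the set of orphan call ids."""
    known = {c["id"] for c in calls}
    children = {c["id"]: [] for c in calls}
    orphans = set()
    for c in calls:
        if c["id"] == root_id:
            continue
        parent = c["parent_id"]
        if parent not in known:
            # Parent is missing: position unknown, so hang it off the root.
            orphans.add(c["id"])
            parent = root_id
        children[parent].append(c["id"])
    return children, orphans


calls = [
    {"id": "a", "parent_id": None},
    {"id": "b", "parent_id": "a"},
    {"id": "d", "parent_id": "c"},  # call "c" was never received
]

tree, orphans = build_call_tree(calls, root_id="a")
```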

Call details

To display the call detail sidebar, select a call in the timeline graph. The details that are displayed include the source and destination of the call, errors, a status code, and the stack trace.

Store traces

To manually store a displayed trace in long-term storage (for up to 13 months), click the Store Trace button. Alternatively, the trace is stored automatically if you remain on the Trace Detail view for at least 15 seconds. However, long-term storage of large traces is not supported.

Capture logs and errors

Instana automatically captures errors when a service returns a bad response or when a log entry with an ERROR or WARN level (or similar, depending on the framework) is detected.

Automatic aggregation of short exit calls

Instana always endeavors to give you the best understanding of service interactions, while also minimizing impact on the actual application. However, certain scenarios require Instana to drop data in order to achieve that.

A common problem in systems is the so-called 1+N query problem, which describes a situation where code performs 1 database call to get a list of items, followed by N individual calls to retrieve the individual items. The problem can usually be fixed by performing only one call and joining the other calls to it.
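
The 1+N pattern and its usual fix can be illustrated with an in-memory SQLite database. The schema, the data, and the function names are invented for this illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
    INSERT INTO items VALUES (1, 1, 'A'), (2, 2, 'B'), (3, 3, 'C');
""")


def one_plus_n(conn):
    """1 query for the list, then N queries for the details."""
    queries = 1
    order_ids = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
    skus = []
    for order_id in order_ids:
        queries += 1
        skus += [r[0] for r in conn.execute(
            "SELECT sku FROM items WHERE order_id = ?", (order_id,))]
    return skus, queries


def joined(conn):
    """The same data fetched with a single joined query."""
    rows = conn.execute(
        "SELECT i.sku FROM orders o JOIN items i ON i.order_id = o.id ORDER BY o.id")
    return [r[0] for r in rows], 1


slow_result, slow_queries = one_plus_n(db)   # 4 database calls for 3 orders
fast_result, fast_queries = joined(db)       # 1 database call, same rows
```

In a trace, the first variant appears as many short, near-identical exit calls, which is the kind of pattern that the aggregation described in this section targets.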

The icon next to the call name indicates how many requests were batched together. The call details match those of the most significant service invocation, for example the request with the highest duration or the request with errors. The duration and error count for the shown call are aggregated from all batched calls.

Call batching example

The aggregation of service interactions only happens within the following constraints:

High-frequency, repetitive access patterns of a similar type
Individual service invocations take less than 10 ms
Time between invocations is less than 10 ms
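
The constraints above can be turned into a small batching sketch. This is not Instana's implementation: the record layout, the use of a type field to decide what counts as "similar", and the measurement of the gap from the end of the previous call are all assumptions.

```python
MAX_DURATION_MS = 10  # individual invocations must be shorter than this
MAX_GAP_MS = 10       # pause between invocations must be shorter than this


def batch_exit_calls(calls):
    """Fold runs of short, closely spaced, similar calls into batches.

    calls must be sorted by start time.
    """
    batches = []
    for call in calls:
        end = call["start_ms"] + call["duration_ms"]
        last = batches[-1] if batches else None
        if (
            last is not None
            and last["batchable"]
            and call["duration_ms"] < MAX_DURATION_MS
            and call["type"] == last["type"]
            and call["start_ms"] - last["end_ms"] < MAX_GAP_MS
        ):
            # Duration and error count are aggregated over the whole batch.
            last["count"] += 1
            last["duration_ms"] += call["duration_ms"]
            last["errors"] += call["errors"]
            last["end_ms"] = end
        else:
            batches.append({
                "type": call["type"],
                "count": 1,
                "duration_ms": call["duration_ms"],
                "errors": call["errors"],
                "end_ms": end,
                "batchable": call["duration_ms"] < MAX_DURATION_MS,
            })
    return batches


calls = [
    {"type": "jdbc", "start_ms": 0, "duration_ms": 2, "errors": 0},
    {"type": "jdbc", "start_ms": 3, "duration_ms": 2, "errors": 1},
    {"type": "jdbc", "start_ms": 6, "duration_ms": 2, "errors": 0},
    {"type": "http", "start_ms": 9, "duration_ms": 40, "errors": 0},
]

batches = batch_exit_calls(calls)
```

The three short jdbc calls collapse into one batch with a count of 3, while the 40 ms http call stays on its own.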

Capture parameters

Due to impact concerns, the tracing sensors of Instana currently do not automatically capture method parameters or method return values. To capture additional data on demand, use the SDKs.

Long running tasks

Due to timeouts, high load, or any other number of environmental conditions, calls might need significant time until they respond. Traces can contain tens or even hundreds of such calls. Because Instana intends to provide tracing information as fast as possible to the user, long-running spans are at first replaced with a placeholder. When the long-running span finally returns, the placeholder is replaced again with the correct call information.

Batched trace processing

Due to the high performance and near real-time nature of the span processing pipeline, spans that arrive late and asynchronously are treated slightly differently when they are linked to the resulting trace. An information box in the Trace View informs the user about the malformed trace.

For Instana users, the following effects might occur:

In Trace View, not all sub-calls for the trace are presented.
Separate traces with the same trace-id get listed and partially present the overall trace.
A call might not get mapped to the corresponding Application Perspective, and then might be missed in Unbounded Analytics.
A call might not get mapped to the corresponding Service, and then might be missed in Unbounded Analytics.
The flow map might show inconsistent call counts.
The flow map might show inconsistent service mappings.

This trace has been processed in multiple batches

In this situation, some spans arrive after the 2-second interval when the resulting trace has already been processed. This approach presents all captured spans, but some of the correlations might not be correct.

This potentially leads to the following anomalies in the Instana data model:

An exit span and the corresponding entry span might not get merged into a single call.
A call might not get mapped to the correct service.
A call might not get linked to the correct parent call.
A call might miss infrastructure tags.
A call might not be mapped, or might be incorrectly mapped, to an application perspective.

The following image presents a similar trace, with all spans that are processed in the context of the root span and therefore linked to the trace in a single batch:

Trace processed in single batch

Approximate data

Traces and calls are retained for 7 days. Past this period, you see an approximate data indicator of how many calls are retained and the estimation of original call count. Traces and calls that rarely occur might not be represented in such scenarios.

Note: If the time range in the timepicker starts more than 7 days ago and ends within the last 7 days, the whole selected time range is analyzed using approximate data, even though full data is retained for part of that time range. If you want to analyze the full data set, make sure that the selected time range starts within the last 7 days.

Approximate data indicator

Accurate metrics of application perspectives are retained for the last 31 days. When you expand a group or remove the group to see individual calls, you can see only the retained calls and the approximate data indicator.

Beyond 31 days, both metrics and calls will be approximate.

Approximate grouped data indicator
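
The retention rules in this section can be summarized as a small decision function. The function name and return format are hypothetical, and applying the 31-day metric limit to the start of the selected range, in the same way as the 7-day call limit, is an assumption.

```python
from datetime import datetime, timedelta

CALL_RETENTION = timedelta(days=7)     # full traces and calls
METRIC_RETENTION = timedelta(days=31)  # accurate application perspective metrics


def data_accuracy(range_start, now):
    """Classify a selected time range by its start time only."""
    age = now - range_start
    return {
        "calls_approximate": age > CALL_RETENTION,
        "metrics_approximate": age > METRIC_RETENTION,
    }


now = datetime(2024, 6, 30, 12, 0)
recent = data_accuracy(now - timedelta(days=3), now)
straddling = data_accuracy(now - timedelta(days=10), now)
old = data_accuracy(now - timedelta(days=40), now)
```

Because only the start of the range is considered, a range that begins 10 days ago is treated as approximate for calls even if it ends today.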

Note on sampling accuracy for call-level metrics

The system uses random sampling based on the trace ID hash, which ensures consistent sampling at the trace level. However, this has implications when analyzing call-level metrics:

Sampling is done per trace, not per call. If the number of calls per trace varies significantly, the call-level data can become skewed.
For instance, a single exceptional trace with more than a million calls can heavily impact call-level metrics if the system includes it in the sampling.
This is expected behavior, not a defect. It reflects how trace-based sampling works.
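
The skew can be reproduced with a small simulation. The hash function (SHA-256), the sampling rate, and the scaling of retained calls by the inverse rate are assumptions chosen for illustration; the text above states only that sampling is random and based on the trace ID hash.

```python
import hashlib


def keep_trace(trace_id, rate):
    """Deterministic keep/drop decision derived from the trace ID hash."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # in [0, 1)
    return bucket < rate


def estimated_calls(traces, rate):
    """Scale the retained call count back up by the sampling rate."""
    retained = sum(calls for trace_id, calls in traces.items() if keep_trace(trace_id, rate))
    return retained / rate


# 1,000 ordinary traces with 10 calls each, plus one exceptional trace.
traces = {f"trace-{i}": 10 for i in range(1000)}
baseline = estimated_calls(traces, rate=0.1)
traces["huge"] = 1_000_000
with_outlier = estimated_calls(traces, rate=0.1)
```

The decision for a given trace ID is always the same, so all calls of a trace are kept or dropped together. Depending on whether the exceptional trace is kept, the call estimate either does not change at all or jumps by ten million, while the true call count grew by one million.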

Limitations

HTTP parameters that do not specify a name are ignored during call analysis and cannot be used for filtering or grouping. For example, in a call with the query string =val1&key=val2, only the named parameter key with the value val2 is recognized. The unnamed parameter =val1 is ignored. However, the full query string can still be observed in the call details.
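
The effect on the example query string can be reproduced with Python's standard library. This only illustrates the rule; it is not how Instana parses parameters, and the named_parameters helper is hypothetical.

```python
from urllib.parse import parse_qsl


def named_parameters(query_string):
    """Return only the parameters that have a non-empty name."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    return {name: value for name, value in pairs if name}


params = named_parameters("=val1&key=val2")  # {'key': 'val2'}
```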