Infrastructure correlation
This page describes how the worlds of application monitoring and infrastructure monitoring are integrated.
For more information, see application monitoring and infrastructure monitoring.
While some users may rely mostly on application monitoring, it is often useful to understand how the logical layer maps to the physical layer, and this understanding becomes essential when troubleshooting issues that are detected at the application level but whose root cause lies in the infrastructure layer.
Instana allows bidirectional navigation in the UI between application and infrastructure monitoring:
- From applications and services to infrastructure monitoring
- From calls to infrastructure monitoring
- From infrastructure monitoring to application monitoring
Infrastructure correlation also plays a significant role in:
- Application and service mapping
- Incidents
Similarly, application monitoring and website monitoring are integrated, as explained in more detail here.
Navigation from services
In its simplest form, a service is executed by a single process, for example a Spring Boot application. However, it is common for a service to be executed by multiple processes to support various use cases:
- to increase throughput and resilience (the more processes, the better)
- to divide up the work by different tenants or configurations
- to run different versions of that service, usually in different environments
That means that from a service you can usually navigate to one or multiple underlying processes and related entities such as hosts, containers, Kubernetes pods, etc.
One entry point is the Stack, inside the Infrastructure tab, which tells you which pieces of infrastructure take care of running the service, including their health and important metrics such as CPU or memory usage.
Another entry point is the Infrastructure tab of a service dashboard, which gives you a different breakdown than the Stack and in addition provides tracing metrics such as call count, error rate, and mean latency from the host level down to the process level.
Both of the preceding widgets treat Kubernetes and Pivotal Cloud Foundry as first-class citizens, giving you direct access to related entities such as Kubernetes services or PCF applications.
Example of a Stack for a service running in Kubernetes:
Example of an Infrastructure tab for a service running in Kubernetes:
Navigation from calls
Generally, traces are linked to infrastructure entities. More specifically, a call can be linked to up to two processes: the source that initiates the call and the destination that handles it. For example, a Node.js process makes an HTTP call to a PHP process.
In the Trace View, you can select any call to open up the Call Details. From both the source and destination sections you are able to know what pieces of infrastructure were involved.
In this example the source is a PHP process while the destination is a Spring Boot application (Java process):
Sometimes the correlation to infrastructure is not possible, which usually means that the underlying process is not monitored by Instana.
This is always the case for the root call of a trace because an external, unmonitored client makes the initial request. Here the shipping service is called from something that is not monitored by Instana:
Another case is when outgoing calls target an external third-party service. Here the payment service calls the external www.paypal.com service.
Navigation from Infrastructure
When looking at infrastructure or platform (Kubernetes, Pivotal Cloud Foundry, and vSphere) entities, you can use the Stack to get a list of the services the entity executes and the applications it is part of, along with metrics such as call and error counts.
Upstream/Downstream gives you access to services and applications which respectively call and are called by the current infrastructure entity.
How it works
Before anything, it is important to understand that application and infrastructure monitoring are powered by two distinct data pipelines:
- Application monitoring: the data (traces and calls) come from the Instana tracers or third-party tracers.
- Infrastructure monitoring: the data (tags and metrics) come from the Instana sensors.
These two worlds merge seamlessly thanks to a mechanism that we call infrastructure linking, where calls are linked to monitored infrastructure entities. Linking occurs when a common identifier is found on both sides.
Instrumented services
Tracers instrument your processes to capture incoming and outgoing calls. These calls are reported to the Instana backend, where Instana attempts to link the source and destination of those calls to known infrastructure entities. When the source (or destination) process is instrumented, it is necessarily also monitored by an Instana sensor, which knows everything about it. Because the tracer and the sensor are co-located, they both know the host and the process, which makes infrastructure linking possible.
For example, a Python process is instrumented by the Python tracer, which captures all of the incoming and outgoing calls. Meanwhile, several sensors are activated on the host where this process is running: the host, process, and Python sensors. The tracer and the sensors send data separately to the Instana backend, but both contain the same identifier for the process. It is therefore possible to link the destination of the incoming calls, and the source of the outgoing calls, to the Python process.
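As a rough illustration of this join on a shared identifier, the following Python sketch uses made-up records; the field names and structure are assumptions for illustration only, not Instana's actual data model:

```python
# Illustrative sketch: joining tracer and sensor data on a shared process identifier.

# Data reported by sensors: one record per monitored infrastructure entity.
sensor_entities = [
    {"process_id": "proc-42", "type": "python", "host": "host-a"},
    {"process_id": "proc-99", "type": "java", "host": "host-b"},
]

# Data reported by tracers: each call carries the identifier of the local process.
calls = [
    {"call_id": "c1", "direction": "incoming", "process_id": "proc-42"},
    {"call_id": "c2", "direction": "outgoing", "process_id": "proc-77"},
]

# Index the infrastructure side by the shared identifier ...
entities_by_id = {e["process_id"]: e for e in sensor_entities}

# ... and link each call to the entity that reported the same identifier.
for call in calls:
    entity = entities_by_id.get(call["process_id"])
    if entity:
        print(f"{call['call_id']}: linked to {entity['type']} process on {entity['host']}")
    else:
        print(f"{call['call_id']}: unmonitored (no matching entity)")
```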
Services instrumented with OpenTelemetry
In many cases, infrastructure correlation can also be performed for services that are instrumented with OpenTelemetry instead of Instana native tracers. For more information and best practices, see OpenTelemetry service mappings and infrastructure correlation.
Databases, messaging systems, and cloud services
Instana tracers do not instrument databases, messaging systems, or cloud services. However, processes that call these untraced systems are instrumented, so outgoing requests are properly mapped to calls. For example, the Java tracer records outgoing requests from a Java process to a MySQL database. These requests are analyzed into calls with the Java process as the source and the MySQL database as the destination. Such calls are visible in Instana, and their destinations are often linked to the infrastructure entity that receives the call. This mapping is possible only when Instana can uniquely match the address information from the outgoing client request with the monitored infrastructure:
On the one hand, Instana monitors the database or messaging system through one of the Instana sensors and therefore knows about the process, its port, and the host. On the other hand, Instana analyzes an outgoing request, which may contain enough information to determine the destination process, usually the hostname or IP and the port, carried for example in the connection string.
For example, an outgoing request to a MySQL instance could contain the connection string jdbc:mysql://10.128.0.6:3306. Infrastructure monitoring detected a corresponding MySQL process exposing the port 3306 and running on a host which exposes the IP 10.128.0.6. Because both the IP and the port match, the call and the MySQL instance are linked together:
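A minimal sketch of this IP-and-port matching idea, assuming a simple in-memory list of monitored processes (the field names and parsing are illustrative, not Instana's implementation):

```python
# Illustrative sketch: match a client connection string against monitored processes.
from urllib.parse import urlparse

def parse_jdbc_mysql(connection_string):
    """Extract host and port from a string like jdbc:mysql://10.128.0.6:3306."""
    # Drop the "jdbc:" prefix so urlparse sees a regular URL.
    url = connection_string.split("jdbc:", 1)[-1]
    parsed = urlparse(url)
    return parsed.hostname, parsed.port

# What infrastructure monitoring knows: which process listens on which IP and port.
monitored_processes = [
    {"name": "mysql", "host_ips": ["10.128.0.6"], "port": 3306},
    {"name": "postgresql", "host_ips": ["10.128.0.7"], "port": 5432},
]

host, port = parse_jdbc_mysql("jdbc:mysql://10.128.0.6:3306")
matches = [p for p in monitored_processes
           if host in p["host_ips"] and port == p["port"]]

# Only a unique match allows the call destination to be linked.
if len(matches) == 1:
    print(f"destination linked to the {matches[0]['name']} process")
else:
    print("destination stays unmonitored (no unique match)")
```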
Instana also supports connection strings which contain a Kubernetes service name, like jdbc:mysql://mysql-svc. Behind the scenes it will attempt to fully qualify the service name to uniquely identify the service across all namespaces and clusters. The result is a call whose destination is linked to the Kubernetes service, instead of the final process.
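The following sketch shows the qualification idea with made-up service records; the lookup and naming are illustrative assumptions, not Instana's internal logic:

```python
# Illustrative sketch: resolve a short Kubernetes service name to exactly one service.

# Kubernetes services known from infrastructure monitoring.
known_services = [
    {"name": "mysql-svc", "namespace": "shop", "cluster": "prod-eu"},
    {"name": "redis-svc", "namespace": "shop", "cluster": "prod-eu"},
]

def qualify(short_name):
    """Return the single matching service, or None when the name is unknown or ambiguous."""
    candidates = [s for s in known_services if s["name"] == short_name]
    return candidates[0] if len(candidates) == 1 else None

service = qualify("mysql-svc")
if service:
    # The call destination can be linked to the Kubernetes service entity.
    print(f"linked to Kubernetes service {service['name']} "
          f"(namespace {service['namespace']}, cluster {service['cluster']})")
else:
    print("destination stays unmonitored")
```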
For cloud services there are no processes, but the idea is the same: find a common identifier shared by the monitored cloud service and the outgoing request to that service. This could be, for example, a resource identifier such as an AWS ARN.
The matching of request address information to the known infrastructure data is not always possible. In some cases, database, messaging, or cloud services show "Unmonitored" infrastructure even though the destination systems are monitored, as shown in the following examples:
For some monitored databases, messaging systems, or cloud technologies, infrastructure correlation is not supported because tracers do not extract sufficient address information from calls to uniquely identify the destination infrastructure. For example, tracing of messaging calls like Kafka can often determine only the destination queue or topic name, but not the actual messaging cluster or server.
Linking calls to infrastructure is sometimes not possible when the host or IP given in the connection string does not match any of the hosts or IPs known from the infrastructure monitoring side. This is usually the case when there is a level of indirection, where the process calling the remote database (or messaging service or cloud service) uses a hostname which is:
- an entry in the /etc/hosts system file
- a DNS CNAME entry
- a pointer to a proxy or load balancer
- an alias given by a Service Discovery service like Consul or Zookeeper
In addition, linking calls to infrastructure is not possible when the address information from the client matches more than one infrastructure entity. This situation often occurs in high-availability database setups where multiple database instances use the same address for client access. As the database itself is not instrumented, Instana cannot identify which of the potential destinations actually executed the database call.
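As a rough illustration of this ambiguity, the following sketch uses made-up addresses for two database instances that share one client-facing address:

```python
# Illustrative sketch: an HA setup where two instances share the same address,
# so the client-side connection string cannot identify a unique destination.

monitored_processes = [
    {"name": "mysql-primary", "host_ips": ["10.128.0.20"], "port": 3306},
    {"name": "mysql-replica", "host_ips": ["10.128.0.20"], "port": 3306},
]

# Address taken from the client's connection string.
host, port = "10.128.0.20", 3306
matches = [p for p in monitored_processes
           if host in p["host_ips"] and port == p["port"]]

# Two candidates: the call cannot be linked, so its destination shows as
# "Unmonitored" even though both instances are monitored.
print(f"{len(matches)} candidates, link skipped" if len(matches) != 1 else "linked")
```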
External services
External services are by definition not monitored by Instana and therefore not even visible on the infrastructure monitoring side. Because we know nothing about them, calls to these services are simply not linked to any known infrastructure entities.
In the Infrastructure tab, you can identify these calls as "Unmonitored":
Infrastructure correlation in application and service mapping
What is the role of infrastructure correlation in application and service mapping?
When the Instana backend analyzes traces and calls, it first links them to known infrastructure entities and enriches them with infrastructure tags such as host.name, springboot.name, or docker.label. These tags are then used to automatically map calls to services using pre-defined rules or user-defined rules. For example, a call linked to a Spring Boot process is mapped to a service that gets its name from the Spring Boot application name. Or you could define a Docker label service-name and use it in a custom service mapping rule to name most of your services running in Docker.
The same is true for application mapping, where you can use these infrastructure tags to define applications, for example using the kubernetes.namespace tag:
When infrastructure linking is not possible, service mapping cannot rely on infrastructure tags and relies instead on so-called fallback rules, which are defined using call tags like call.http.host or call.database.schema.
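To make the tag-based mapping and fallback idea concrete, here is a minimal Python sketch; the rule format, ordering, and evaluation are illustrative assumptions, not Instana's actual rule engine:

```python
# Illustrative sketch: name a service from infrastructure tags when linking
# succeeded, otherwise fall back to call tags.

# A call enriched with infrastructure tags after successful linking.
linked_call = {
    "infra_tags": {
        "docker.label.service-name": "checkout",
        "springboot.name": "checkout-app",
        "host.name": "host-a",
    },
    "call_tags": {"call.http.host": "checkout.internal"},
}

# A call whose destination could not be linked to infrastructure.
unlinked_call = {"infra_tags": {}, "call_tags": {"call.http.host": "payments.internal"}}

# Infrastructure-tag rules are tried first, then fallback rules on call tags.
infra_rules = ["docker.label.service-name", "springboot.name"]
fallback_rules = ["call.http.host", "call.database.schema"]

def map_to_service(call):
    for tag in infra_rules:
        if tag in call["infra_tags"]:
            return call["infra_tags"][tag]
    for tag in fallback_rules:
        if tag in call["call_tags"]:
            return call["call_tags"][tag]
    return "unknown"

print(map_to_service(linked_call))    # -> "checkout"
print(map_to_service(unlinked_call))  # -> "payments.internal"
```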
Infrastructure correlation and incidents
An incident groups related events by leveraging the Dynamic Graph. The ability to link calls (and therefore applications and services) to infrastructure entities enriches the Dynamic Graph with additional connections that bridge the two worlds, resulting in more complete incidents and faster root cause analysis.