Using the Dynamic Graph

The Dynamic Graph is a model of your application that captures all the physical and logical dependencies between its components. The graph includes physical components such as hosts, operating systems, Java Virtual Machines (JVMs), Cassandra nodes, and MySQL databases. The graph also includes logical components such as traces, applications, services, clusters, and table spaces. Instana agents and sensors automatically discover the components and their dependencies, so the graph updates continuously in real time to reflect the current state.

The graph continuously updates the state information for each node. The state information includes metrics, configuration data, and a calculated health value. The health value is calculated by using a combination of semantic knowledge and machine learning approaches. Instana analyzes the dependencies within the graph to identify logical groupings, such as services and applications. Identifying these groupings provides a deeper understanding of how issues impact the higher-level components and helps to determine the criticality of those issues. The whole graph is persisted, so Instana can move back and forth in time and use the entire knowledge base of the graph for many operational use cases.

Using the Dynamic Graph, Instana calculates the impact of changes and issues on applications or services. If the impact is critical, Instana combines a set of correlated issues and changes into an incident. An incident tracks the evolution of issues and changes over time, enabling Instana to point directly to the root cause of the incident. Changes are automatically discovered, and Instana calculates their impact on the surrounding nodes.

A change can manifest in several ways:

Degradation of health, which is then identified as an issue.
Modification to the system's configuration.
Deployment, appearance, or disappearance of a process, container, or server.
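
The three kinds of change listed above can be illustrated by comparing two snapshots of the same environment. The following is a minimal sketch, not Instana's implementation: the snapshot layout, the `diff_snapshots` function, the component names, and the "green" health colour are all assumptions made for illustration.

```python
# Hypothetical ranking of health colours; "green" is an assumed healthy state.
RANK = {"green": 0, "yellow": 1, "red": 2}


def diff_snapshots(before, after):
    """Compare two {component: {"config": ..., "health": ...}} snapshots."""
    changes = []
    # Components that exist only in the newer snapshot.
    for name in sorted(after.keys() - before.keys()):
        changes.append((name, "appeared"))
    # Components that exist only in the older snapshot.
    for name in sorted(before.keys() - after.keys()):
        changes.append((name, "disappeared"))
    # Components present in both: look for config changes and health degradation.
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        if old["config"] != new["config"]:
            changes.append((name, "configuration changed"))
        if RANK[new["health"]] > RANK[old["health"]]:
            changes.append((name, "health degraded (issue)"))
    return changes


before = {
    "es-node-1": {"config": {"heap": "2g"}, "health": "green"},
    "es-node-2": {"config": {"heap": "2g"}, "health": "green"},
}
after = {
    "es-node-1": {"config": {"heap": "4g"}, "health": "yellow"},
    "es-node-3": {"config": {"heap": "2g"}, "health": "green"},
}

changes = diff_snapshots(before, after)
for component, change in changes:
    print(component, "->", change)
```

In this sketch a single component can produce more than one change between snapshots, for example a configuration change together with a health degradation.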

To illustrate this concept, consider a simple application that uses an Elasticsearch cluster to search for products through a web interface. This example, which is a single microservice, demonstrates how the Dynamic Graph models clusters and dependencies.

Dynamic applications

This section builds a Dynamic Graph model of an Elasticsearch cluster step by step to show how the Dynamic Graph works and the benefits that it offers in distributed and fluid environments.

The model begins with a single Elasticsearch node. Because the Elasticsearch node is a Java application, the graph appears as follows:

ES Node graph

The graph shows the automatically discovered components on the host along with their relationships. For the Elasticsearch node, the Instana agent discovers the JVM, the process, the Docker container (if the node runs inside a container), and the host on which the agent is running. If the application is running in a cloud environment such as Amazon Web Services (AWS), the Instana agent discovers its availability zone and includes that in the graph.

In the Dynamic Graph, each component is represented as a node. Each node has properties (for example, JVM_Version=1.7.21). Instana collects all the relevant metrics in real time. The metrics include network statistics for the host, garbage collection statistics for the JVM, and the number of documents that are indexed by the Elasticsearch node.

The connections between nodes in the graph represent their relationships. In this case, the relationships are "runs on" relationships. For example, the ES node "runs on" the JVM.
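
As a rough illustration of nodes, properties, metrics, and "runs on" relationships, the following sketch models the single-node stack as a small in-memory graph. The `Node` and `Graph` classes, the component names, the metric names, and the exact layering of JVM, process, and container are assumptions made for the example; they are not Instana's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One component in the graph, with its properties and current metrics."""
    name: str
    kind: str
    properties: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}     # name -> Node
        self.runs_on = {}   # child name -> name of the component it runs on

    def add(self, node, runs_on=None):
        self.nodes[node.name] = node
        if runs_on is not None:
            self.runs_on[node.name] = runs_on
        return node

    def stack(self, name):
        """Follow "runs on" edges from a component down to its host."""
        chain = [name]
        while chain[-1] in self.runs_on:
            chain.append(self.runs_on[chain[-1]])
        return chain


graph = Graph()
graph.add(Node("host-1", "host", metrics={"net_rx_bytes": 0}))
graph.add(Node("container-1", "docker"), runs_on="host-1")
graph.add(Node("process-1", "process"), runs_on="container-1")
graph.add(Node("jvm-1", "jvm", properties={"JVM_Version": "1.7.21"}),
          runs_on="process-1")
graph.add(Node("es-node-1", "elasticsearch_node",
               metrics={"indexed_documents": 0}), runs_on="jvm-1")

# Prints: es-node-1 -> jvm-1 -> process-1 -> container-1 -> host-1
print(" -> ".join(graph.stack("es-node-1")))
```

Storing "runs on" as a child-to-parent mapping keeps the walk from any component to its host a simple loop.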

An Elasticsearch cluster consists of multiple nodes that are interconnected to form the cluster.

ES Cluster graph

This graph includes a cluster node that represents the state and health of the whole cluster. It establishes dependencies on all four Elasticsearch nodes that constitute the cluster.

The logical unit of Elasticsearch is the index. Elasticsearch clients use the index to access the documents. Each index is divided into one or more shards that are distributed across the ES nodes in the cluster.

The graph captures the statistics and health of the index that is used by the application.

ES Index graph

The following graph includes a Spring Boot application that accesses the Elasticsearch index.

Spring Boot graph

The Instana Java sensor records the distributed traces for the activity of the Spring Boot application. From these traces, Instana knows that the Spring Boot application accesses an Elasticsearch index. The Dynamic Graph correlates these traces with the corresponding logical components, which enables the tracking of statistics and health metrics of different traces.

This graph helps to understand different Elasticsearch issues and demonstrates how Instana analyzes their impact on the overall service health.

Consider the following problems:

An I/O problem on a single host, which causes slow read/write performance for index or shard data.
An Elasticsearch node experiences thread pool overload, leading to queued requests. These requests cannot be handled until a thread is available to process them.

Graph incident

Incident description

The Host (1) encounters I/O problems. The health intelligence system displays the host's health as yellow and then fires an issue to the issue tracker. Then, the I/O problems on Host (1) impact the ES (Elasticsearch) Node (2). The health intelligence system detects degraded throughput on this node, marks it as yellow, and fires another issue. Instana correlates the two issues and adds them to one incident. This incident is not marked as problematic because the cluster health is good and the service quality is not affected.

On another ES node (3), the query processing thread pool becomes overloaded, causing requests to be queued. Because the thread pool overload badly affects performance, Instana marks the status of the node as red. This overload affects the ES cluster (4), which turns yellow due to decreasing throughput. The two issues that are generated are added to the initial incident.

Because the cluster affects the performance of the index (5), the index is marked as yellow, and the issue is added to the incident. Now, the performance of the product search transactions is affected, and the performance health analytics marks the transaction as yellow (6), which also affects the health of the application (7).

Because both the application and the transaction are affected, the incident fires a yellow status, which indicates that the product search performance is decreasing and users are affected.
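
The incident walkthrough above can be replayed with a small sketch. It is a deliberate simplification under stated assumptions and does not represent Instana's correlation logic: the `Incident` class, the `DEPENDS_ON` map, and the rule "a root cause is an affected component none of whose dependencies are affected" are hypothetical, and the component numbers simply follow the text.

```python
# DEPENDS_ON[x] lists the components that x relies on (assumed from the text).
DEPENDS_ON = {
    "host-1": [],
    "es-node-2": ["host-1"],
    "es-node-3": [],
    "es-cluster-4": ["es-node-2", "es-node-3"],
    "index-5": ["es-cluster-4"],
    "transaction-6": ["index-5"],
    "application-7": ["transaction-6"],
}
# Components whose degradation means that users are affected.
USER_FACING = {"transaction-6", "application-7"}


class Incident:
    def __init__(self):
        self.issues = {}  # component -> health colour, in arrival order

    def add_issue(self, component, health):
        self.issues[component] = health

    def affects_users(self):
        return any(c in USER_FACING for c in self.issues)

    def root_causes(self):
        # An affected component is a root cause if nothing it depends on
        # is also affected.
        return [c for c in self.issues
                if not any(d in self.issues for d in DEPENDS_ON[c])]


incident = Incident()
incident.add_issue("host-1", "yellow")       # I/O problem
incident.add_issue("es-node-2", "yellow")    # degraded throughput
users_affected_early = incident.affects_users()

incident.add_issue("es-node-3", "red")       # thread pool overload
incident.add_issue("es-cluster-4", "yellow")
incident.add_issue("index-5", "yellow")
incident.add_issue("transaction-6", "yellow")
incident.add_issue("application-7", "yellow")

print(users_affected_early)      # False
print(incident.affects_users())  # True
print(incident.root_causes())    # ['host-1', 'es-node-3']
```

After the first two issues the incident exists but does not yet affect users. Once the transaction and the application are degraded, it does, and the two components with no affected dependencies are the I/O problem on the host and the thread pool problem on the ES node.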

The incident highlights the two root causes:

I/O problem
Thread pool problem

Instana shows the evolution of the incident, and the user can drill into the components at the time of the issue, including the exact historic environment and metrics.

Instana offers the following capabilities:

Ability to combine physical, process, and trace information into the graph and understand their dependencies.
Intelligence to understand the health of single components, clusters, applications, and traces.
Intelligent impact analysis to understand if an issue is critical or not.
Ability to show the root cause of a problem and give actionable information and context.
Ability to store the history of the graph, its properties, metrics, changes, and issues. Instana provides a "time shift" feature to analyze any problem with a clear view on the state and dependencies of all components.
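
The idea behind looking up past state can be sketched as a timestamped history that returns the most recent snapshot at or before a requested time. This is only an illustration of the concept. The `History` class, its methods, and the integer timestamps are assumptions, and the sketch says nothing about how Instana actually stores or queries the graph.

```python
import bisect


class History:
    """Keeps timestamped snapshots and answers "what was the state at time t?"."""

    def __init__(self):
        self._times = []
        self._states = []

    def record(self, timestamp, state):
        # Assumes snapshots are recorded in increasing timestamp order.
        self._times.append(timestamp)
        self._states.append(dict(state))  # copy so later edits do not leak in

    def at(self, timestamp):
        # Index of the last snapshot taken at or before the timestamp.
        i = bisect.bisect_right(self._times, timestamp)
        if i == 0:
            return None  # nothing recorded yet at that time
        return self._states[i - 1]


history = History()
history.record(100, {"es-node-3": "green"})
history.record(200, {"es-node-3": "red"})

print(history.at(150))  # {'es-node-3': 'green'}
print(history.at(200))  # {'es-node-3': 'red'}
print(history.at(50))   # None
```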

Finding the root cause of problems in modern environments will only get more challenging in the coming years. As the simple example illustrates, solving these issues requires a deep understanding of context, dependencies, and impact. The complexity increases further in "liquid" systems that are based on microservices, where services are added and removed all the time and new releases are pushed out frequently. Instana tracks state and health in real time and understands the impact of these changes or issues, all without any manual configuration.

Usage

The Dynamic Graph is automatically created and updated. You can define some components such as services as described in service configuration.

You can use Dynamic Focus to traverse and scope the graph.