Using the Dynamic Graph
The Dynamic Graph is a model of your application that captures all the physical and logical dependencies between its components, such as Host, OS, JVM, Cassandra Node, or MySQL. The graph also includes logical components such as traces, applications, services, clusters, and table spaces. The agent and sensors automatically discover the components and their dependencies, which means the graph is kept up to date in real time.
The graph continuously updates the state information for each node. The state information includes metrics, configuration data, and a calculated health value. The health value is calculated by using a combination of semantic knowledge and machine learning approaches. Instana analyses the dependencies within the graph to identify logical groupings, such as services and applications. The identification of logical groupings provides a deeper understanding of how issues impact these higher-level components and helps to determine the criticality of those issues. The whole graph is persistent and Instana can go back and forth in time to use the entire knowledge base of the graph for many operational use cases.
Based on the Dynamic Graph, Instana calculates the impact of changes and issues on the application or service. If the impact is critical, Instana combines a set of correlated issues and changes into an incident. An incident shows how the issues and changes evolve over time, enabling Instana to point directly to the root cause of the incident. Instana discovers every change automatically and calculates its impact on the surrounding nodes. A change can manifest in several ways:
- A degradation of health, which is then identified as an issue.
- A modification to the system's configuration.
- The deployment, appearance, or disappearance of a process, container, or server.
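As a rough illustration of the second and third kinds of change, the following Python sketch compares two snapshots of discovered components. The function name, the snapshot layout, and the component names are all hypothetical and are not part of Instana. Health degradation is not modeled here.

```python
def diff_snapshots(before, after):
    """Compare two {component: config} snapshots and list the changes."""
    changes = []
    # Components present only in the newer snapshot have appeared.
    for name in sorted(after.keys() - before.keys()):
        changes.append(("appeared", name))
    # Components present only in the older snapshot have disappeared.
    for name in sorted(before.keys() - after.keys()):
        changes.append(("disappeared", name))
    # Components present in both may have a modified configuration.
    for name in sorted(before.keys() & after.keys()):
        if before[name] != after[name]:
            changes.append(("config_changed", name))
    return changes


# Hypothetical snapshots taken at two points in time.
before = {
    "es-node-1": {"heap": "2g"},
    "es-node-2": {"heap": "2g"},
}
after = {
    "es-node-1": {"heap": "4g"},
    "es-node-3": {"heap": "2g"},
}

for change in diff_snapshots(before, after):
    print(change)
```

In this sketch, es-node-3 is reported as appeared, es-node-2 as disappeared, and es-node-1 as having a changed configuration.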
To illustrate this concept, consider a simple application that uses an Elasticsearch cluster to search for products through a web interface. This example, which is a single microservice, demonstrates how the Dynamic Graph models clusters and dependencies.
Dynamic applications
This section develops a model of the Dynamic Graph for an Elasticsearch cluster. This example helps you to understand how the Dynamic Graph works and the benefits it offers in distributed and fluid environments.
The model begins with a single Elasticsearch node. Since this node is a Java application, the graph appears as follows:
The graph depicts the automatically discovered components on the host along with the relationships between them. For the Elasticsearch node, the Instana agent discovers the JVM, the process, the Docker container (if the node runs inside a container), and the host on which it is running. If the application is running in a cloud environment like Amazon Web Services (AWS), the Instana agent discovers its availability zone and adds that to the graph.
The Dynamic Graph represents each component as a node. Each node has properties (for example, JVM_Version=1.7.21) and Instana collects all the relevant metrics in real-time. The metrics include network statistics for the Host, garbage collection statistics of the JVM, and the number of documents indexed by the Elasticsearch node.
The connections between the nodes in the graph represent their relationships. In this case, the relationships are "runs on" relationships. For example, the Elasticsearch node "runs on" the JVM.
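The node-and-relationship model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `runs_on_chain` helper, the component names, the metric names, and the exact order of the "runs on" chain are hypothetical and do not reflect Instana's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One component in the graph (hypothetical structure)."""
    name: str
    kind: str
    properties: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    runs_on: "Node | None" = None  # the "runs on" relationship


def runs_on_chain(node):
    """Follow "runs on" edges from a component down to its host."""
    chain = []
    while node is not None:
        chain.append(node.name)
        node = node.runs_on
    return chain


# Assumed layering: ES node -> JVM -> process -> container -> host.
host = Node("host-1", "Host", metrics={"net_rx_bytes": 0})
container = Node("docker-1", "Docker container", runs_on=host)
process = Node("java-process", "Process", runs_on=container)
jvm = Node("jvm-1", "JVM", properties={"JVM_Version": "1.7.21"}, runs_on=process)
es_node = Node(
    "es-node-1", "Elasticsearch node",
    metrics={"docs_indexed": 0}, runs_on=jvm,
)

print(" -> ".join(runs_on_chain(es_node)))
```

Each node carries its own properties and metrics, and the single `runs_on` reference is enough to walk from the Elasticsearch node down to the host.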
An Elasticsearch Cluster consists of multiple nodes that interconnect to form the cluster.
This graph has a cluster node that represents the state and health of the whole cluster. It establishes dependencies on all four Elasticsearch nodes that constitute the cluster.
The logical unit of Elasticsearch is the index. Elasticsearch clients use the index to access the documents. Each index is divided into one or more shards that are distributed across the ES nodes in the cluster.
The graph captures the statistics and health of the index that is used by the application.
The following graph includes a Spring Boot application that accesses the Elasticsearch index.
The Instana Java sensor records the distributed traces for the activity of the Spring Boot application. From these traces, Instana knows that the Spring Boot application accesses an Elasticsearch index. The Dynamic Graph correlates these traces with the corresponding logical components, which enables the tracking of statistics and health metrics of different traces.
This graph helps to understand different Elasticsearch issues and demonstrates how Instana analyzes their impact on the overall service health.
Consider the following problems:
- An I/O problem on a single host, which causes slow read/write performance for index/shard data.
- An Elasticsearch node is experiencing thread pool overload, leading to queued requests. These requests cannot be handled until a thread is available to process them.
The Host (1) encounters I/O problems. The health intelligence system displays the host's health as yellow and fires an issue to the issue tracker. The I/O problems on Host (1) then impact the ES (Elasticsearch) Node (2). The health intelligence system sees that the throughput on this node is degraded, marks the node as yellow, and fires another issue. Instana correlates the two issues and adds them to one incident. This incident is not marked as problematic because the cluster health is still good and the service quality is not affected.
On another ES node (3), the query processing thread pool becomes overloaded, causing requests to be queued. Because the thread pool overload badly affects performance, Instana marks the status of the node as red. This overload affects the ES cluster (4), which turns yellow as its throughput decreases. The two resulting issues are added to the initial incident.
As the cluster affects the performance of the index (5), Instana marks the index as yellow and adds the issue to the incident. Now the performance of the product search transactions is affected, and the performance health analytics marks the transaction as yellow (6), which also affects the health of the application (7).
As both the application and the transaction are affected, the incident fires a yellow status, which indicates that the product search performance is decreasing and users are affected. The path to the two root causes, the I/O problem and the thread pool problem, is highlighted. Instana shows the evolution of the incident, and the user can drill into the components at the time of the issue, including the exact historic environment and metrics.
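The walkthrough can be replayed with a small Python sketch. It is an assumption-laden simplification, not Instana's algorithm: the dependency map, the component names, and the two rules (an incident is critical once a service-level component is affected, and a root cause is an affected component whose own dependencies are unaffected) are hypothetical stand-ins for the analysis described in the text.

```python
# Each component maps to the components it depends on (hypothetical names).
DEPENDS_ON = {
    "application": ["product-search"],
    "product-search": ["index"],
    "index": ["cluster"],
    "cluster": ["es-node-2", "es-node-3"],
    "es-node-2": ["host-1"],
    "es-node-3": [],
    "host-1": [],
}

# Components whose degradation means users are affected (assumption).
SERVICE_LEVEL = {"application", "product-search"}


def root_causes(issues):
    """Components with an issue whose own dependencies have no issue."""
    return sorted(
        c for c in issues
        if not any(dep in issues for dep in DEPENDS_ON[c])
    )


def is_critical(issues):
    """An incident matters once a service-level component is affected."""
    return any(c in SERVICE_LEVEL for c in issues)


# Issues in the order they fire in the walkthrough: component -> health.
incident = {}
incident["host-1"] = "yellow"      # (1) I/O problem
incident["es-node-2"] = "yellow"   # (2) degraded throughput
early = (is_critical(incident), root_causes(incident))

incident["es-node-3"] = "red"      # (3) thread pool overload
incident["cluster"] = "yellow"     # (4)
incident["index"] = "yellow"       # (5)
incident["product-search"] = "yellow"  # (6) transaction
incident["application"] = "yellow"     # (7)
late = (is_critical(incident), root_causes(incident))

print(early)
print(late)
```

After the first two issues, the incident is not critical and the only root cause is host-1. After all seven, it is critical, and the root causes are es-node-3 and host-1, which matches the two highlighted paths.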
This shows the following capabilities of Instana:
- Ability to combine physical, process, and trace information into the graph and understand their dependencies.
- Intelligence to understand the health of single components, clusters, applications, and traces.
- Intelligent impact analysis to understand if an issue is critical or not.
- Ability to show the root cause of a problem and give actionable information and context.
- Ability to store the history of the graph, its properties, metrics, changes, and issues. Instana provides a “time shift” feature to analyze any problem with a clear view on the state and dependencies of all components.
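The idea behind looking up historic state can be sketched as follows. This is a minimal illustration of keeping timestamped state and querying it at a past point in time. The `History` class, the timestamps, and the state fields are hypothetical and say nothing about how Instana actually persists the graph.

```python
import bisect


class History:
    """Append-only record of one component's state over time (illustrative)."""

    def __init__(self):
        self._times = []
        self._states = []

    def record(self, timestamp, state):
        # Assumes timestamps arrive in increasing order.
        self._times.append(timestamp)
        self._states.append(state)

    def at(self, timestamp):
        """Return the most recent state recorded at or before timestamp."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._states[i - 1] if i else None


es_node = History()
es_node.record(100, {"health": "green", "queued_requests": 0})
es_node.record(160, {"health": "red", "queued_requests": 250})
es_node.record(300, {"health": "green", "queued_requests": 0})

# Shift back to time 200, while the thread pool was overloaded.
print(es_node.at(200))
```

Querying at time 200 returns the red state recorded at 160, and querying before the first record returns `None`.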
Finding the root cause of problems in modern environments is becoming more challenging. As the simple example illustrates, it requires a deep understanding of context, dependencies, and impact. The complexity increases further in "liquid" systems based on microservices, which add and remove services all the time and push out new releases frequently. Instana tracks state and health in real time and determines the impact of these changes or issues, all without any manual configuration.
Usage
The Dynamic Graph is created and updated automatically. The definition of some components, like services, can be further specified through service configuration.
You can use the Dynamic Focus feature to traverse the graph and scope your view.