Automatic discovery and monitoring
Automatic discovery and monitoring provide efficient and adaptive system management, helping ensure comprehensive coverage of the physical, logical, and business components of large-scale applications. This approach detects anomalies in real time to minimize the impact of failures, which is crucial for maintaining the reliability and performance of modern, dynamic applications.
Traditional Application Performance Management (APM) tools help you manually examine and correlate data to detect bottlenecks and errors in production environments. However, they struggle with the complexity of larger scale and the dynamic nature of modern systems, where you must correlate many interconnected components and metrics to identify an issue.
A machine learning-driven approach offers a significant improvement for system management. To achieve optimal results, this approach must be built on a robust and comprehensive core model. Microservice applications are made of many building blocks that are constantly evolving, so you must understand all the blocks and their dependencies, which demands an advanced approach to discovery.
Components of automatic discovery and monitoring
Automatic discovery and monitoring cover physical components, logical components, and business components.
Physical components
The following table outlines the physical components and their descriptions:
Components | Description |
---|---|
Datacenter or Availability zones | Zones exist in different continents and regions. They can fail or have different performance characteristics. |
Hosts or Machines | Either physical, virtual, or delivered as a service. Each host has resources, such as CPU, memory, and I/O, which can become a bottleneck. Each host runs in one zone. |
Containers | Run on top of a host and need a scheduler, such as Kubernetes or Apache Mesos, to manage them. |
Processes | Run within a container or on the host. Processes can include runtime environments, such as Java or PHP, and middleware, such as Tomcat, Oracle, or Elasticsearch. |
Clusters | Many services can act as a group or cluster so that they appear as a unified distributed process to the outside world. The number of instances within a cluster can change and can affect the cluster's performance. |
Logical components
The following table outlines the logical components and their descriptions:
Components | Description |
---|---|
Services | Logical units of work that can have many instances and different versions, and that run on top of the previously mentioned physical building blocks. |
Endpoints | Public API of a service to expose specific commands to the rest of the system. |
Application Perspectives or Applications | A perspective on a set of services and endpoints that is defined by a common context, which is declared by using tags. |
Traces | Sequence of synchronous and asynchronous communications between services. Services communicate with each other and deliver a result for a user request. The process of transforming data in a data flow might involve many services. |
Calls | Request between two services. A trace is composed of one or more calls. |
Business components
The following table outlines the business components and their descriptions:
Components | Description |
---|---|
Business services | Compositions of services and applications that deliver unique business value. |
Business processes | Combinations of technical traces that form a process. For example, a business process might combine the "buying" trace in e-commerce, followed by an order trace in the ERP system, followed by a FedEx logistics trace for delivery to the customer. |
It is common for an application to be delivered to its users by thousands of service instances, in various versions, running on hundreds of hosts across multiple zones and continents. This creates a network of dependencies between components that must work together perfectly to ensure the application's service quality and deliver its business value. A traditional monitoring tool might alert when a single component crosses a threshold, but the failure of one or many of these components does not necessarily mean that the quality of the application is affected. Therefore, modern monitoring tools need to understand the entire network of components and their interdependencies to effectively monitor, analyze, and predict the quality of service.
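One way to picture this network is as a graph of typed components connected by dependency edges. The following Python sketch is purely illustrative (the class and function names are hypothetical and not Instana's internal model); it shows how the failure of one physical component can be traced to the logical components that depend on it.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A node in the dependency graph: a zone, host, container, process, service, or endpoint."""
    name: str
    kind: str                                               # e.g. "zone", "host", "process", "service"
    depends_on: list["Component"] = field(default_factory=list)

def impacted_by(failed: Component, components: list[Component]) -> list[Component]:
    """Return every component that directly or transitively depends on the failed component."""
    impacted = []
    for component in components:
        stack = list(component.depends_on)
        while stack:
            dependency = stack.pop()
            if dependency is failed:
                impacted.append(component)
                break
            stack.extend(dependency.depends_on)
    return impacted

# Illustrative wiring: a service depends on a process, which depends on a container and a host.
zone = Component("eu-west-1a", "zone")
host = Component("host-42", "host", depends_on=[zone])
container = Component("cassandra-0", "container", depends_on=[host])
process = Component("cassandra", "process", depends_on=[container])
service = Component("product-catalog", "service", depends_on=[process])

print([c.name for c in impacted_by(host, [container, process, service])])
# ['cassandra-0', 'cassandra', 'product-catalog']
```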
Identifying and cataloging changes
The number of services and their dependencies in modern applications is higher than in applications based on service-oriented architecture (SOA), which presents a challenge for monitoring tools. The high rate of change due to continuous delivery methodologies, automation tools, and container platforms makes the issue worse. In this dynamic environment, manually keeping up with changes and continually configuring monitoring tools for newly deployed blocks is impractical.
To address this challenge, a modern monitoring solution must automatically and instantaneously discover every block before analyzing and understanding it. This capability helps ensure that subsequent changes are linked, allowing the model to be reconstructed at any point in time to investigate incidents.
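As a rough illustration of linking changes so that the model can be rebuilt for any point in time, the following Python sketch keeps timestamped configuration snapshots per component. The structure and names are hypothetical and do not represent Instana's implementation.

```python
import bisect
import time
from typing import Optional

class ConfigurationHistory:
    """Keep timestamped configuration snapshots per component so that the state at any
    point in time can be reconstructed when an incident is investigated."""

    def __init__(self):
        self._snapshots = {}          # component id -> list of (timestamp, configuration)

    def record(self, component_id: str, configuration: dict, timestamp: Optional[float] = None):
        """Store a newly discovered configuration, linked to the time it was observed."""
        entries = self._snapshots.setdefault(component_id, [])
        entries.append((timestamp if timestamp is not None else time.time(), configuration))

    def state_at(self, component_id: str, timestamp: float) -> Optional[dict]:
        """Return the most recent configuration recorded at or before the given time."""
        entries = self._snapshots.get(component_id, [])
        index = bisect.bisect_right([ts for ts, _ in entries], timestamp)
        return entries[index - 1][1] if index else None

history = ConfigurationHistory()
history.record("nginx@host-42", {"worker_processes": 2}, timestamp=1000.0)
history.record("nginx@host-42", {"worker_processes": 8}, timestamp=2000.0)
print(history.state_at("nginx@host-42", 1500.0))   # -> {'worker_processes': 2}
```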
The changes can happen in any of the building blocks at any time as shown in the following image:
Instana's comprehensive discovery process
Instana Dynamic APM relies on the Instana agent architecture, which uses sensors. Sensors are small, automated programs that are designed to monitor specific entities. A single agent (one per host), which is deployed as a stand-alone process on the host or as a container through the container scheduler, manages these sensors.
The agent automatically detects and monitors various physical components, such as AWS availability zones; Docker containers that run on the host or on Kubernetes; processes such as HAProxy, Nginx, JVM, Spring Boot, Postgres, Cassandra, or Elasticsearch; and even clusters of these processes, such as a Cassandra cluster. For each detected component, the agent collects its configuration data and begins monitoring for changes. It also sends essential metrics for each component every second. The agent automatically detects and uses metrics that are provided by the services, such as JMX or Dropwizard metrics.
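As a rough picture of what per-second collection might look like for a host sensor, the following Python sketch gathers a few host metrics every second with the third-party psutil library (assumed to be installed). It is illustrative only and is not the agent's actual code.

```python
import time
import psutil   # assumed to be installed; used only to illustrate per-second metric collection

def collect_host_metrics() -> dict:
    """Gather a small set of host resource metrics, roughly what a host sensor might report."""
    io = psutil.disk_io_counters()
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_read_bytes": io.read_bytes,
        "disk_write_bytes": io.write_bytes,
    }

def run_sensor(publish, interval_seconds: float = 1.0, iterations: int = 5):
    """Publish one metrics snapshot per interval, mimicking the one-second reporting cadence."""
    for _ in range(iterations):
        publish(collect_host_metrics())
        time.sleep(interval_seconds)

run_sensor(publish=print)
```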
Later, the agent injects trace functions into the service code. For example, it intercepts HTTP calls, database calls, and queries to Elasticsearch. The agent captures the context of each call, such as stack traces or payload.
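The idea of injecting trace functions around a call site can be sketched with a simple wrapper that records duration, errors, and a stack trace for each call. The following Python example is illustrative only; `record_span` is a hypothetical stand-in for handing the captured data to the agent, not an Instana API.

```python
import functools
import time
import traceback

def record_span(span: dict):
    """Hypothetical stand-in for shipping a captured span to the agent."""
    print(span)

def traced_call(operation: str):
    """Wrap a function so that duration, errors, and a stack trace are captured for each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            span = {"operation": operation, "start": time.time()}
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                span["error"] = repr(exc)
                span["stack"] = traceback.format_exc()
                raise
            finally:
                span["duration_ms"] = (time.time() - span["start"]) * 1000
                record_span(span)
        return wrapper
    return decorator

@traced_call("http.get /products")
def fetch_products():
    time.sleep(0.05)          # stand-in for an outgoing HTTP call
    return ["book", "lamp"]

fetch_products()
```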
The intelligence that processes the captured data into traces, identifies dependencies and services, and detects anomalies and issues runs on the server. Therefore, the agent is compact and can be deployed on thousands of hosts.
Instana is designed for the automatic, immediate, and continuous discovery that the new generation of monitoring solutions requires.
Collecting data
Instana is a monitoring solution that uses a single agent with multiple sensors and supports hundreds of technologies. Technologies are automatically discovered and monitored by sensors, which pass their data to the agent. The agent manages all communication with the Instana Service Quality Engine. After discovery, sensors collect detailed data about the component's state, tailored to the specific technology. The agent loads, updates, and unloads the sensors. An optional command-line interface provides access to the agent state, individual sensors, and agent logs. For more information, see Configuring and monitoring supported technologies.
The sensor collects the following data, as sketched after this list:
- Configuration: Catalogs current settings and states to track any changes.
- Events: Initial discovery, state changes (online and offline), built-in events that trigger issues or incidents based on failing health rules on entities, and custom events that trigger issues or incidents based on the thresholds of an individual metric of any entity.
- Traces: Captures traces based on the programming language or platform.
- Metrics: Quantitative attributes of the technology that indicate performance.
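To make these four categories concrete, the following Python sketch assembles one illustrative sensor payload. The field names and values are hypothetical and do not reflect Instana's wire format.

```python
import time

def build_sensor_payload(entity_id: str) -> dict:
    """Assemble one illustrative payload covering the four data categories a sensor reports."""
    now = time.time()
    return {
        "entity": entity_id,
        "configuration": {"version": "9.6", "max_connections": 100},   # current settings and state
        "events": [
            {"type": "state_change", "from": "offline", "to": "online", "timestamp": now},
        ],
        "traces": [
            {"call": "SELECT FROM orders", "duration_ms": 12.4, "timestamp": now},
        ],
        "metrics": {"timestamp": now, "queries_per_second": 184.0, "cache_hit_ratio": 0.97},
    }

print(build_sensor_payload("postgres@host-42"))
```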
Instana's sensors perform recursive discovery. For example, the Java Virtual Machine (JVM) sensor continues up the stack to discover frameworks like Tomcat or Spring Boot. Based on this discovery, the sensor assists the agent in loading the appropriate additional sensors. In addition, Instana's intelligence that processes and analyzes the data, identifies dependencies and services, and detects anomalies and issues runs on the server. Therefore, the agent is lightweight and can be deployed on thousands of hosts.
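Recursive discovery can be pictured as a chain in which each loaded sensor may discover further frameworks that need their own sensors. The following Python sketch is illustrative only; the sensor names and the discovery chain are hypothetical.

```python
# Hypothetical discovery chain: each loaded sensor may reveal frameworks that need further sensors.
DISCOVERS_NEXT = {
    "jvm-sensor": ["tomcat-sensor"],          # the JVM sensor finds Tomcat running inside the JVM
    "tomcat-sensor": ["spring-boot-sensor"],  # the Tomcat sensor finds a Spring Boot application
    "spring-boot-sensor": [],
}

def load_recursively(sensor: str, loaded=None) -> set:
    """Load a sensor, then recursively load whatever that sensor discovers up the stack."""
    loaded = loaded if loaded is not None else set()
    if sensor in loaded:
        return loaded
    loaded.add(sensor)
    for discovered in DISCOVERS_NEXT.get(sensor, []):
        load_recursively(discovered, loaded)
    return loaded

# Discovering a Java process starts with the JVM sensor and walks up to Tomcat and Spring Boot.
print(sorted(load_recursively("jvm-sensor")))
# ['jvm-sensor', 'spring-boot-sensor', 'tomcat-sensor']
```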
The Instana backend uses streaming technology that can process millions of events per second streamed from the agents. This streaming engine is effectively real time: it takes only 3 seconds to process a situation and display it to the user.
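As a simplified illustration of the streaming idea (not Instana's engine), the following Python sketch consumes a stream of agent events and emits per-host aggregates every few seconds.

```python
import collections
import random
import time

def agent_event_stream(count: int):
    """Stand-in generator for metric events streamed from agents; a real source would be a network stream."""
    for _ in range(count):
        yield {"host": f"host-{random.randint(1, 3)}", "metric": "cpu_percent",
               "value": random.uniform(0, 100), "timestamp": time.time()}

def process_stream(events, window_seconds: float = 3.0):
    """Aggregate events per host and emit a summary every few seconds, in a streaming fashion."""
    window = collections.defaultdict(list)
    window_start = time.time()
    for event in events:
        window[event["host"]].append(event["value"])
        if time.time() - window_start >= window_seconds:
            yield {host: sum(values) / len(values) for host, values in window.items()}
            window.clear()
            window_start = time.time()
    if window:   # flush whatever is left at the end of the stream
        yield {host: sum(values) / len(values) for host, values in window.items()}

for summary in process_stream(agent_event_stream(10_000)):
    print(summary)
```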