Monitoring Elasticsearch
The Elasticsearch sensor is automatically deployed and installed after you install the Instana agent.
Supported information
Supported operating systems
The supported operating systems of the Elasticsearch sensor are consistent with host agent requirements, which you can check in the Supported operating systems section of each host agent, such as Supported operating systems for Unix.
Supported versions
Instana now supports monitoring the metrics and configuration data for Elasticsearch 0.17.0 to 8.13.2.
Supported client-side tracing
For this technology, Instana supports client-side tracing for the following languages and runtimes:
Configuration
Instana automatically monitors up to 1000 indices and collects the 5 most important metrics per index. To enable in-depth index monitoring, which gathers 20 metrics per index for up to 200 indices, specify indicesRegex in the <agent_install_dir>/etc/instana/configuration.yaml agent configuration file as shown:
com.instana.plugin.elasticsearch:
  enabled: true
  indicesRegex: '<INSERT_INDEX_REGEX_HERE>' # e.g. 'env-prod.*'
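Before you set indicesRegex, it can help to check which index names a candidate pattern selects and whether the result stays within the 200-index limit for in-depth monitoring. The following Python sketch is only an illustration: the index names are hypothetical, the helper matching_indices is not part of Instana, and it assumes that the pattern must match the whole index name. The agent's actual matching rules and regex dialect might differ from Python's re module.

```python
import re

# Hypothetical index names; replace them with the names in your own cluster.
index_names = [
    "env-prod-orders",
    "env-prod-users",
    "env-test-orders",
    "logs-2024",
]

# Limit for in-depth index monitoring, as stated in this documentation.
IN_DEPTH_LIMIT = 200


def matching_indices(pattern, names):
    """Return the names that the pattern matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


selected = matching_indices(r"env-prod.*", index_names)
print(selected)

if len(selected) > IN_DEPTH_LIMIT:
    print(f"Pattern selects {len(selected)} indices, more than {IN_DEPTH_LIMIT}.")
```

If the pattern selects more indices than the limit, consider a narrower expression.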
Metrics collection
To view the metrics, complete the following steps:
- From the navigation menu of the Instana UI, select Infrastructure.
- Click a specific monitored host where Elasticsearch is installed.
You can see the host dashboard with the following performance metrics, configuration data, and health signatures.
Node-Level
Configuration data
- Version
- Cluster
- Health Status
- Node Name
- Node Type
- Node is Master
- Node is Master Eligible
- Transport
- HTTP
- Log Directory
- Shards
- Indices
Performance metrics
Data point | Description | Granularity |
---|---|---|
Query Latency | The query latency is collected from NodeIndicesStats#SearchStats. | 1 second |
Number of Queries | The query count per second is collected from NodeIndicesStats#SearchStats. | 1 second |
Overall Documents | The total number of documents is collected from DocsStats#count. | 1 second |
Added Documents | The total number of indexing operations is collected from IndexingStats#indexCount. | 1 second |
Removed Documents | The number of delete operations that are executed is collected from IndexingStats#deleteCount. | 1 second |
Active Shards | The number of active shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
Active Primary Shards | The number of active primary shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
Refresh Count | The number of refreshes that are executed per second is collected from NodeIndicesStats#RefreshStats. | 1 second |
Refresh Time | The total time that is spent on refreshes is collected from NodeIndicesStats#RefreshStats. | 1 second |
Flush Count | The total number of flushes that are executed per second is collected from NodeIndicesStats#FlushStats. | 1 second |
Flush Time | The total time that is spent on flushes is collected from NodeIndicesStats#FlushStats. | 1 second |
Indices metrics | Documents count, Deleted count, and Size per index are collected from IndexStats#DocsStats. | 1 second |
Lucene Segments | The number of segments is collected from NodeIndicesStats#SegmentsStats#count. | 1 second |
Active Threads | Search, Index, Bulk, Merge, Flush, Get, Management, Refresh are collected from ThreadPoolStats.Stats#active. | 1 second |
Queued Threads | Search, Index, Bulk, Merge, Flush, Get, Management, Refresh are collected from ThreadPoolStats.Stats#queue. | 1 second |
Rejected Threads | Search, Index, Bulk, Get are collected from ThreadPoolStats.Stats#rejected. | 1 second |
Sent Data | The size of TX packets that are sent by the node during internal cluster communication is collected from TransportStats#tx_size. | 1 second |
Received Data | The size of RX packets that are received by the node during internal cluster communication is collected from TransportStats#rx_size. | 1 second |
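Several of the underlying Elasticsearch statistics, such as the search statistics, are cumulative counters, so a per-interval value must be derived from two consecutive samples. This documentation does not state how Instana derives Query Latency. The following Python sketch shows one common approach under that assumption: divide the growth in query time by the growth in query count. The field names query_total and query_time_in_millis follow the search section of the Elasticsearch node stats API; the sample values and the helper average_query_latency_ms are hypothetical.

```python
def average_query_latency_ms(previous, current):
    """Average latency per query between two cumulative samples.

    Returns None when no queries completed between the samples.
    """
    queries = current["query_total"] - previous["query_total"]
    elapsed_ms = current["query_time_in_millis"] - previous["query_time_in_millis"]
    if queries <= 0:
        return None
    return elapsed_ms / queries


# Hypothetical cumulative search statistics, sampled one interval apart.
previous = {"query_total": 1000, "query_time_in_millis": 5000}
current = {"query_total": 1040, "query_time_in_millis": 5300}

print(average_query_latency_ms(previous, current))  # 300 ms over 40 queries: 7.5
```

The same delta technique applies to other cumulative counters in the table, such as the refresh and flush counts.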
Index metrics
Data point | Description | Granularity |
---|---|---|
Total Queries | The total number of query operations is collected from SearchStats.Stats#queryTotal. | 1 second |
Queries Current | The number of query operations that are currently running is collected from SearchStats.Stats#queryCurrent. | 1 second |
Fetches Total | The total number of fetch operations is collected from SearchStats.Stats#fetchCount. | 1 second |
Fetches Current | The number of fetch operations that are currently running is collected from SearchStats.Stats#fetchCurrent. | 1 second |
Query Time | Time in milliseconds that is spent in executing query operations is collected from SearchStats.Stats#queryTimeInMillis. | 1 second |
Fetch Time | Time in milliseconds that is spent in executing fetch operations is collected from SearchStats.Stats#fetchTimeInMillis. | 1 second |
Query Cache Evictions | The number of query cache evictions is collected from QueryCacheStats#evictions. | 1 second |
Request Cache Evictions | The number of request cache evictions is collected from RequestCacheStats#evictions. | 1 second |
Get Requests | The total number of Get requests is collected from GetStats#count. | 1 second |
Get Requests Time | Time in milliseconds that is spent on Get requests is collected from GetStats#timeInMillis. | 1 second |
Failed Get Requests | The number of failed Get requests is collected from GetStats#missingCount. | 1 second |
Failed Get Requests Time | Time in milliseconds that is spent on failed Get requests is collected from GetStats#missingTimeInMillis. | 1 second |
Indexing Operations Failed | The number of failed indexing operations is collected from IndexingStats#indexFailedCount. | 1 second |
Active Merges Count | The current number of merges that are executed is collected from MergeStats#current. | 1 second |
Total Merges Size | The total size of merges that are executed is collected from MergeStats#totalSizeInBytes. | 1 second |
Total Merges Time | The total time for merges that are executed is collected from MergeStats#totalTimeInMillis. | 1 second |
The metrics that are listed in the Index metrics section are collected only for indices that match the indicesRegex regular expression in the agent configuration.
Health Signatures
Each sensor has a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents, depending on user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any specific entity.
For more information about built-in events for the Elasticsearch node, see the Built-in events reference.
Cluster-Level
Configuration data
- Name
- Health Status
- Nodes, Masters
Performance metrics
Data point | Description | Granularity |
---|---|---|
Query Latency | The query latency is calculated as the maximum query latency of all nodes. | 1 second |
Number of Queries | The query count is calculated as the sum of the query count for all nodes. | 1 second |
Overall Documents | The total document count is calculated as the sum of the overall documents for all nodes. | 1 second |
Added Documents | The sum of the documents that are added for all nodes. | 1 second |
Removed Documents | The sum of the documents that are removed for all nodes. | 1 second |
Indices | Number of indices | 1 second |
Shards | Active, Active Primary, Initializing, Relocating, Unassigned are collected from ClusterHealth. | 1 second |
Cluster State size | The size of the ClusterState. | 1 second |
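The cluster-level values in the table are aggregates of the node-level values: Query Latency is the maximum across nodes, and the query and document counts are sums. The following Python sketch illustrates that aggregation. The node names, sample values, dictionary keys, and the helper cluster_metrics are hypothetical and are not part of Instana.

```python
# Hypothetical per-node values for a single collection interval.
nodes = [
    {"name": "node-1", "query_latency_ms": 4.0, "queries": 120, "documents": 5000},
    {"name": "node-2", "query_latency_ms": 9.5, "queries": 80, "documents": 7000},
    {"name": "node-3", "query_latency_ms": 6.2, "queries": 100, "documents": 6000},
]


def cluster_metrics(node_samples):
    """Aggregate node-level samples into cluster-level values."""
    return {
        # Cluster latency is the worst (maximum) latency of all nodes.
        "query_latency_ms": max(
            (n["query_latency_ms"] for n in node_samples), default=0.0
        ),
        # Counts are summed across all nodes.
        "queries": sum(n["queries"] for n in node_samples),
        "documents": sum(n["documents"] for n in node_samples),
    }


print(cluster_metrics(nodes))
```

Because the latency is a maximum rather than an average, a single slow node is enough to raise the cluster-level Query Latency.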
Health Signatures
For more information about built-in events for the Elasticsearch cluster, see the Built-in events reference.