Monitoring Elasticsearch
The Elasticsearch sensor is automatically deployed and installed after you install the Instana agent.
Supported operating systems
The supported operating systems of the Elasticsearch sensor are consistent with host agent requirements, which can be checked in the Supported operating systems section of each host agent, such as Supported operating systems for Unix.
Supported Versions
Instana supports monitoring metrics and configuration data for Elasticsearch 0.17.0 to 7.17.10.
Configuration
Instana automatically monitors up to 1000 indices and collects the 5 most important metrics per index. To enable in-depth index monitoring, which gathers 20 metrics per index for up to 200 indices, specify indicesRegex in the agent configuration file <agent_install_dir>/etc/instana/configuration.yaml:
com.instana.plugin.elasticsearch:
  enabled: true
  indicesRegex: '<INSERT_INDEX_REGEX_HERE>' # e.g. 'env-prod.*'
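Before setting indicesRegex, it can help to preview which index names a pattern would select. The following Python sketch is illustrative only. The index names are made up, and it assumes that the pattern must match the whole index name, which this page does not state. The agent's regular expression engine may also differ from Python's `re` module in edge cases.

```python
import re

def preview_matches(pattern, index_names, limit=200):
    """Return the index names that fully match `pattern`, capped at `limit`.

    Assumption: full-name matching. The default limit of 200 mirrors the
    in-depth monitoring limit described above.
    """
    regex = re.compile(pattern)
    matched = [name for name in index_names if regex.fullmatch(name)]
    return matched[:limit]

# Hypothetical index names, for illustration only.
indices = ["env-prod-orders", "env-prod-users", "env-test-orders", ".kibana"]
selected = preview_matches(r"env-prod.*", indices)
print(selected)
```

If the preview returns more than 200 names, consider a narrower pattern so that the indices you care about fall within the in-depth monitoring limit.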
Metrics collection
To view the metrics, select Infrastructure in the sidebar of the Instana user interface and click a specific monitored host. The host dashboard shows all the collected metrics and monitored processes.
Node-Level
Configuration data
- Version
- Cluster
- Health Status
- Node Name
- Node Type
- Node is Master
- Node is Master Eligible
- Transport
- Log Directory
- Shards
- Indices
Performance metrics
Data point | Description | Granularity |
---|---|---|
Query Latency | Query latency is collected from NodeIndicesStats#SearchStats. | 1 second |
Number of Queries | Query count per second is collected from NodeIndicesStats#SearchStats. | 1 second |
Overall Documents | The total number of documents is collected from DocsStats#count. | 1 second |
Added Documents | The total number of indexing operations is collected from IndexingStats#indexCount. | 1 second |
Removed Documents | The number of delete operations executed is collected from IndexingStats#deleteCount. | 1 second |
Active Shards | The number of active shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
Active Primary Shards | The number of active primary shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
Refresh Count | The number of refreshes executed per second is collected from NodeIndicesStats#RefreshStats. | 1 second |
Refresh Time | The total time spent on refreshes is collected from NodeIndicesStats#RefreshStats. | 1 second |
Flush Count | The number of flushes executed per second is collected from NodeIndicesStats#FlushStats. | 1 second |
Flush Time | The total time spent on flushes is collected from NodeIndicesStats#FlushStats. | 1 second |
Indices metrics | Document count, deleted count, and size per index are collected from IndexStats#DocsStats. | 1 second |
Lucene Segments | The number of segments is collected from NodeIndicesStats#SegmentsStats#count. | 1 second |
Active Threads | Active threads for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#active. | 1 second |
Queued Threads | Queued threads for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#queue. | 1 second |
Rejected Threads | Rejected threads for the Search, Index, Bulk, and Get thread pools are collected from ThreadPoolStats.Stats#rejected. | 1 second |
Sent Data | The size of TX packets sent by the node during internal cluster communication is collected from TransportStats#tx_size. | 1 second |
Received Data | The size of RX packets received by the node during internal cluster communication is collected from TransportStats#rx_size. | 1 second |
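Several of the underlying Elasticsearch statistics, such as the query count and query time, are cumulative counters. A per-second rate and an average latency can be derived from two successive samples. The following Python sketch shows that delta calculation. It is an assumption about how such values can be derived, not a description of how the sensor computes them. The key names `query_total` and `query_time_in_millis` follow the Elasticsearch node stats REST response and are used here for illustration only.

```python
def search_rates(prev, curr, interval_seconds):
    """Derive queries per second and mean query latency (ms) from two
    cumulative samples taken `interval_seconds` apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    queries = curr["query_total"] - prev["query_total"]
    millis = curr["query_time_in_millis"] - prev["query_time_in_millis"]
    if queries < 0 or millis < 0:
        # A counter went backwards (for example after a node restart),
        # so no meaningful rate exists for this interval.
        return None
    latency = millis / queries if queries else 0.0
    return {
        "queries_per_second": queries / interval_seconds,
        "avg_latency_ms": latency,
    }

# Hypothetical samples taken one second apart.
prev = {"query_total": 1000, "query_time_in_millis": 5000}
curr = {"query_total": 1050, "query_time_in_millis": 5400}
result = search_rates(prev, curr, 1)
print(result)
```

Returning `None` on a counter reset avoids reporting a negative rate for the interval in which a node restarted.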
Index metrics
Data point | Description | Granularity |
---|---|---|
Total Queries | The total number of query operations is collected from SearchStats.Stats#queryTotal. | 1 second |
Queries Current | The number of query operations currently running is collected from SearchStats.Stats#queryCurrent. | 1 second |
Fetches Total | The total number of fetch operations is collected from SearchStats.Stats#fetchCount. | 1 second |
Fetches Current | The number of fetch operations currently running is collected from SearchStats.Stats#fetchCurrent. | 1 second |
Query Time | Time in milliseconds spent performing query operations is collected from SearchStats.Stats#queryTimeInMillis. | 1 second |
Fetch Time | Time in milliseconds spent performing fetch operations is collected from SearchStats.Stats#fetchTimeInMillis. | 1 second |
Query Cache Evictions | The number of query cache evictions is collected from QueryCacheStats#evictions. | 1 second |
Request Cache Evictions | The number of request cache evictions is collected from RequestCacheStats#evictions. | 1 second |
Get Requests | The total number of Get requests is collected from GetStats#count. | 1 second |
Get Requests Time | Time in milliseconds spent on Get requests is collected from GetStats#timeInMillis. | 1 second |
Failed Get Requests | The number of failed Get requests is collected from GetStats#missingCount. | 1 second |
Failed Get Requests Time | Time in milliseconds spent on failed Get requests is collected from GetStats#missingTimeInMillis. | 1 second |
Indexing Operations Failed | The number of failed indexing operations is collected from IndexingStats#indexFailedCount. | 1 second |
Active Merges Count | The number of merges currently executing is collected from MergeStats#current. | 1 second |
Total Merges Size | The total size of merges executed is collected from MergeStats#totalSizeInBytes. | 1 second |
Total Merges Time | The total time spent executing merges is collected from MergeStats#totalTimeInMillis. | 1 second |
The index metrics listed above are enabled only for indices that match the indicesRegex regular expression in the agent configuration.
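To cross-check an index-level value against Elasticsearch itself, the per-index counters can be read from an index stats response. The following Python sketch summarizes the Get request counters for one index. The nested key names (`total`, `get`, `missing_total`) follow the Elasticsearch index stats REST response as an assumption, the sample values are made up, and the failed ratio is a derived value added for illustration rather than a metric that the sensor reports.

```python
def index_get_summary(index_stats):
    """Summarize Get request counters for a single index.

    `index_stats` is assumed to be the per-index object from an index
    stats response, containing a "total" section with a "get" subsection.
    """
    get = index_stats["total"]["get"]
    total = get["total"]
    missing = get["missing_total"]
    return {
        "get_requests": total,
        "failed_get_requests": missing,
        # Guard against division by zero for indices with no Get requests.
        "failed_ratio": missing / total if total else 0.0,
    }

# Hypothetical stats for one index, for illustration only.
sample = {"total": {"get": {"total": 200, "missing_total": 10}}}
summary = index_get_summary(sample)
print(summary)
```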
Health Signatures
For each sensor, there is a curated knowledgebase of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.
For information about built-in events for the Elasticsearch Node, see the Built-in events reference.
Cluster-Level
Configuration data
- Name
- Health Status
- Nodes, Masters
Performance metrics
Data point | Description | Granularity |
---|---|---|
Query Latency | Query latency is calculated as max query latency of all nodes. | 1 second |
Number of Queries | Query count is calculated as query count sum for all nodes. | 1 second |
Overall Documents | Total Documents is calculated as sum of overall documents for all nodes. | 1 second |
Added Documents | Added Documents is calculated as sum of added documents for all nodes. | 1 second |
Removed Documents | Removed Documents is calculated as sum of removed documents for all nodes. | 1 second |
Indices | Number of indices | 1 second |
Shards | Active, Active Primary, Initializing, Relocating, and Unassigned shard counts are collected from ClusterHealth. | 1 second |
Cluster State size | Size of the ClusterState. | 1 second |
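The cluster-level values in the table are described as aggregates of the node-level values: the maximum for query latency and the sum for the counts. The following Python sketch illustrates that aggregation. The dictionary keys and the sample values are hypothetical and do not correspond to any identifiers used by Instana.

```python
def aggregate_cluster(node_metrics):
    """Combine per-node values into cluster-level values.

    Query latency is the maximum across nodes; all other values are sums.
    """
    if not node_metrics:
        raise ValueError("at least one node is required")
    summed = (
        "query_count",
        "overall_documents",
        "added_documents",
        "removed_documents",
    )
    cluster = {key: sum(node[key] for node in node_metrics) for key in summed}
    cluster["query_latency"] = max(node["query_latency"] for node in node_metrics)
    return cluster

# Hypothetical per-node values, for illustration only.
nodes = [
    {"query_latency": 12.0, "query_count": 40, "overall_documents": 1000,
     "added_documents": 5, "removed_documents": 1},
    {"query_latency": 30.5, "query_count": 25, "overall_documents": 1500,
     "added_documents": 0, "removed_documents": 2},
]
cluster = aggregate_cluster(nodes)
print(cluster)
```

Because the latency is a maximum rather than an average, a single slow node determines the cluster-level value.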
Health Signatures
For information about built-in events for the Elasticsearch Cluster, see the Built-in events reference.