Monitoring Elasticsearch

The Elasticsearch sensor is automatically deployed and installed after you install the Instana agent.

Supported operating systems

The supported operating systems of the Elasticsearch sensor are consistent with host agent requirements, which you can check in the Supported operating systems section of each host agent, such as Supported operating systems for Unix.

Supported versions

Instana supports monitoring the metrics and configuration data for Elasticsearch versions 0.17.0 to 8.13.2.

Configuration

Instana automatically monitors up to 1000 indices and collects the 5 most important metrics per index. To enable in-depth index monitoring, which gathers 20 metrics per index for up to 200 indices, specify indicesRegex in the <agent_install_dir>/etc/instana/configuration.yaml agent configuration file as shown:

com.instana.plugin.elasticsearch:
  enabled: true
  indicesRegex: '<INSERT_INDEX_REGEX_HERE>' # e.g. 'env-prod.*'
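
Before you set indicesRegex, it can help to preview which index names a candidate pattern would select. The following Python sketch is illustrative only: the index names are made up, the helper preview_matches is hypothetical, and it assumes that the pattern must match the whole index name, which this page does not confirm. The limit of 200 mirrors the in-depth monitoring limit stated above.

```python
import re

# Limit for in-depth index monitoring, as stated in this section.
MAX_IN_DEPTH_INDICES = 200


def preview_matches(pattern, index_names, limit=MAX_IN_DEPTH_INDICES):
    """Return the matching index names and whether they exceed `limit`.

    Assumption: the pattern has to match the entire index name.
    """
    compiled = re.compile(pattern)
    matched = [name for name in index_names if compiled.fullmatch(name)]
    return matched, len(matched) > limit


# Hypothetical index names, for illustration only.
indices = ["env-prod-orders", "env-prod-users", "env-test-orders", ".kibana"]
matched, over_limit = preview_matches("env-prod.*", indices)
print(matched, over_limit)
```

If the second value is True, the pattern selects more indices than the in-depth limit allows, so a narrower pattern is likely the safer choice.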

Metrics collection

To view the metrics, complete the following steps:

  • From the navigation menu of the Instana UI, select Infrastructure.
  • Click a specific monitored host where Elasticsearch is installed.

You can see the host dashboard with the following performance metrics, configuration data, and health signatures.

Node-Level

Configuration data

  • Version
  • Cluster
  • Health Status
  • Node Name
  • Node Type
  • Node is Master
  • Node is Master Eligible
  • Transport
  • HTTP
  • Log Directory
  • Shards
  • Indices

Performance metrics

Data point | Description | Granularity
Query Latency | The query latency is collected from NodeIndicesStats#SearchStats. | 1 second
Number of Queries | The query count per second is collected from NodeIndicesStats#SearchStats. | 1 second
Overall Documents | The total number of documents is collected from DocsStats#count. | 1 second
Added Documents | The total number of indexing operations is collected from IndexingStats#indexCount. | 1 second
Removed Documents | The number of delete operations that are executed is collected from IndexingStats#deleteCount. | 1 second
Active Shards | The number of active shards is collected from IndexRoutingTable#ShardRouting. | 1 second
Active Primary Shards | The number of active primary shards is collected from IndexRoutingTable#ShardRouting. | 1 second
Refresh Count | The number of refreshes that are executed per second is collected from NodeIndicesStats#RefreshStats. | 1 second
Refresh Time | The total time that is spent on refreshes is collected from NodeIndicesStats#RefreshStats. | 1 second
Flush Count | The number of flushes that are executed per second is collected from NodeIndicesStats#FlushStats. | 1 second
Flush Time | The total time that is spent on flushes is collected from NodeIndicesStats#FlushStats. | 1 second
Indices metrics | The document count, deleted count, and size per index are collected from IndexStats#DocsStats. | 1 second
Lucene Segments | The number of segments is collected from NodeIndicesStats#SegmentsStats#count. | 1 second
Active Threads | The active counts for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#active. | 1 second
Queued Threads | The queued counts for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#queue. | 1 second
Rejected Threads | The rejected counts for the Search, Index, Bulk, and Get thread pools are collected from ThreadPoolStats.Stats#rejected. | 1 second
Sent Data | The size of TX packets that are sent by the node during internal cluster communication is collected from TransportStats#tx_size. | 1 second
Received Data | The size of RX packets that are received by the node during internal cluster communication is collected from TransportStats#rx_size. | 1 second
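
Some of the node-level data points, such as Number of Queries, are per-second rates, whereas the Elasticsearch node stats API exposes search statistics as cumulative counters. The following Python sketch shows one common way to derive a rate and an average latency from two consecutive samples. It is not Instana's implementation: the helper search_rates is hypothetical, the sample values are invented, and the field names query_total and query_time_in_millis are taken from the Elasticsearch node stats API.

```python
def search_rates(previous, current, interval_seconds):
    """Derive queries per second and average query latency from two samples.

    Each sample is a dict with the cumulative counters `query_total` and
    `query_time_in_millis`.
    """
    queries = current["query_total"] - previous["query_total"]
    millis = current["query_time_in_millis"] - previous["query_time_in_millis"]
    queries_per_second = queries / interval_seconds
    # Avoid dividing by zero when no query ran during the interval.
    avg_latency_ms = millis / queries if queries > 0 else 0.0
    return queries_per_second, avg_latency_ms


# Invented sample values, taken 1 second apart.
before = {"query_total": 1000, "query_time_in_millis": 5000}
after = {"query_total": 1040, "query_time_in_millis": 5200}
print(search_rates(before, after, interval_seconds=1))
```

Working from the difference between samples keeps the result independent of how long the node has been running, because the counters only grow while the node is up.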

Index metrics

Data point | Description | Granularity
Total Queries | The total number of query operations is collected from SearchStats.Stats#queryTotal. | 1 second
Queries Current | The number of query operations that are currently running is collected from SearchStats.Stats#queryCurrent. | 1 second
Fetches Total | The total number of fetch operations is collected from SearchStats.Stats#fetchCount. | 1 second
Fetches Current | The number of fetch operations that are currently running is collected from SearchStats.Stats#fetchCurrent. | 1 second
Query Time | The time in milliseconds that is spent executing query operations is collected from SearchStats.Stats#queryTimeInMillis. | 1 second
Fetch Time | The time in milliseconds that is spent executing fetch operations is collected from SearchStats.Stats#fetchTimeInMillis. | 1 second
Query Cache Evictions | The number of query cache evictions is collected from QueryCacheStats#evictions. | 1 second
Request Cache Evictions | The number of request cache evictions is collected from RequestCacheStats#evictions. | 1 second
Get Requests | The total number of Get requests is collected from GetStats#count. | 1 second
Get Requests Time | The time in milliseconds that is spent on Get requests is collected from GetStats#timeInMillis. | 1 second
Failed Get Requests | The number of failed Get requests is collected from GetStats#missingCount. | 1 second
Failed Get Requests Time | The time in milliseconds that is spent on failed Get requests is collected from GetStats#missingTimeInMillis. | 1 second
Indexing Operations Failed | The number of failed indexing operations is collected from IndexingStats#indexFailedCount. | 1 second
Active Merges Count | The number of merges that are currently running is collected from MergeStats#current. | 1 second
Total Merges Size | The total size of merges that are executed is collected from MergeStats#totalSizeInBytes. | 1 second
Total Merges Time | The total time for merges that are executed is collected from MergeStats#totalTimeInMillis. | 1 second

The metrics that are listed in the Index metrics section are enabled only for indices that match the indicesRegex regular expression in the agent configuration.

Health Signatures

Each sensor has a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents, depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any specific entity.

For more information about built-in events for the Elasticsearch node, see the Built-in events reference.

Cluster-Level

Configuration data

  • Name
  • Health Status
  • Nodes, Masters

Performance metrics

Data point | Description | Granularity
Query Latency | The query latency is calculated as the maximum query latency of all nodes. | 1 second
Number of Queries | The query count is calculated as the sum of the query count for all nodes. | 1 second
Overall Documents | The overall document count is calculated as the sum of the overall documents for all nodes. | 1 second
Added Documents | The sum of the documents that are added for all nodes. | 1 second
Removed Documents | The sum of the documents that are removed for all nodes. | 1 second
Indices | The number of indices. | 1 second
Shards | The numbers of Active, Active Primary, Initializing, Relocating, and Unassigned shards are collected from ClusterHealth. | 1 second
Cluster State size | The size of the ClusterState. | 1 second
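
The cluster-level values for query latency, query count, and documents are described above as aggregates of the node-level values: a maximum for latency and a sum for the counts. The following Python sketch illustrates that aggregation. The function cluster_metrics, the dictionary keys, the node names, and the sample values are all hypothetical and are not part of the Instana sensor.

```python
def cluster_metrics(node_metrics):
    """Aggregate per-node values into cluster-level values.

    `node_metrics` maps a node name to a dict with `query_latency_ms`,
    `query_count`, and `documents`.
    """
    nodes = list(node_metrics.values())
    return {
        # Cluster latency is the maximum latency of all nodes.
        "query_latency_ms": max((n["query_latency_ms"] for n in nodes), default=0.0),
        # Counts are summed across all nodes.
        "query_count": sum(n["query_count"] for n in nodes),
        "documents": sum(n["documents"] for n in nodes),
    }


# Invented values for two hypothetical nodes.
sample = {
    "node-a": {"query_latency_ms": 4.0, "query_count": 120, "documents": 1000},
    "node-b": {"query_latency_ms": 9.5, "query_count": 80, "documents": 2500},
}
print(cluster_metrics(sample))
```

Taking the maximum for latency means that a single slow node is visible at the cluster level instead of being averaged away.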

Health Signatures

For more information about built-in events for the Elasticsearch cluster, see the Built-in events reference.