Infrastructure Metrics
Infrastructure metrics include system metrics and container metrics. For information about container metrics, see Container Metrics.
System Metrics
Monitor the following system metrics to analyze Elasticsearch health.
- CPU usage
- Disk usage
- Memory usage
Monitor the CPU usage
To ensure that the CPU is not overutilized, monitor CPU health regularly. You can monitor CPU usage at two levels: the process level and the OS level. If process-level CPU usage exceeds the threshold, you can share the load. However, if OS-level CPU usage has reached its limit, contact your IT team.
Command / Metric | Description |
---|---|
curl -X GET http://localhost:9240/_nodes/stats/process?pretty | This command retrieves the CPU utilization of the external Elasticsearch pods. |
$.nodes.nodeid.process.cpu.percent | This JSON path expression retrieves the percentage of CPU used by an Elasticsearch pod. If a pod uses 80% of the CPU for more than 15 minutes, consider the severity as WARNING and perform the following steps to identify the causes of higher CPU usage. If a pod uses 90% of the CPU for more than 15 minutes, look for the following Prometheus metrics: |
elasticsearch_os_cpu_percent | If elasticsearch_os_cpu_percent is more than 90%, consider the severity as CRITICAL and perform the following steps to identify the causes of higher CPU usage. |
elasticsearch_process_cpu_percent | If elasticsearch_process_cpu_percent is more than 90%, consider the severity as CRITICAL and add a new node to the cluster. To learn more about how to add a new external Elasticsearch node, see Adding New Nodes to an Elasticsearch Cluster. |
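The thresholds in this table can also be checked programmatically. The following Python sketch is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the response of _nodes/stats/process has already been parsed into a dictionary. The function names, the sample node IDs, and the use of "at or above" comparisons are assumptions introduced here. classify_cpu rates a single reading per node, and sustained_severity applies the 15-minute condition to a series of readings.

```python
from typing import Dict, List, Tuple

# Thresholds from the table above (percent of CPU).
WARNING_PERCENT = 80
CRITICAL_PERCENT = 90


def severity(percent: float) -> str:
    """Rate one CPU reading against the WARNING and CRITICAL thresholds."""
    if percent >= CRITICAL_PERCENT:
        return "CRITICAL"
    if percent >= WARNING_PERCENT:
        return "WARNING"
    return "OK"


def classify_cpu(node_stats: dict) -> Dict[str, str]:
    """Map each node ID to a severity using $.nodes.<id>.process.cpu.percent."""
    return {
        node_id: severity(node["process"]["cpu"]["percent"])
        for node_id, node in node_stats.get("nodes", {}).items()
    }


def sustained_severity(samples: List[Tuple[float, float]],
                       window_seconds: int = 15 * 60) -> str:
    """Rate a series of (epoch_seconds, cpu_percent) samples, oldest first.

    A severity is reported only if the samples cover the whole window and
    every reading inside the window is at or above the threshold.
    """
    if not samples:
        return "OK"
    newest = samples[-1][0]
    if newest - samples[0][0] < window_seconds:
        return "OK"  # not enough history to cover the window yet
    recent = [percent for ts, percent in samples if newest - ts <= window_seconds]
    return severity(min(recent))


# Hypothetical data shaped like a _nodes/stats/process response.
sample_response = {
    "nodes": {
        "node-a": {"process": {"cpu": {"percent": 42}}},
        "node-b": {"process": {"cpu": {"percent": 83}}},
        "node-c": {"process": {"cpu": {"percent": 95}}},
    }
}
print(classify_cpu(sample_response))
```

In practice, the samples passed to sustained_severity would come from polling the command at a fixed interval; the polling loop is left out of this sketch.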
Monitor the Disk usage
To ensure that all nodes have enough disk space, monitor the disk space regularly.
Command | Description |
---|---|
curl -X GET http://localhost:9240/_nodes/stats/fs | This command retrieves the disk space of the external Elasticsearch nodes. It lists the disk space available on all nodes. For more information about Elasticsearch node statistics, see Elasticsearch documentation. |
$.nodes..fs.total.total_in_bytes | This JSON path expression retrieves the total disk space. |
$.nodes..fs.total.free_in_bytes | This JSON path expression retrieves the free disk space. |
$.nodes..fs.total.available_in_bytes | This JSON path expression retrieves the available disk space. |
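To turn these byte counts into a usage percentage, a sketch like the following may help. It assumes the _nodes/stats/fs response is already parsed into a dictionary and treats used space as total minus available. That formula, the function names, and the sample values are assumptions made for illustration.

```python
from typing import Dict


def disk_usage_percent(node_stats: dict) -> Dict[str, float]:
    """Return used disk space per node as a percentage of total space."""
    usage = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        total = node["fs"]["total"]["total_in_bytes"]
        available = node["fs"]["total"]["available_in_bytes"]
        if total <= 0:
            continue  # skip nodes that report no disk information
        usage[node_id] = 100.0 * (total - available) / total
    return usage


def average_usage(usage: Dict[str, float]) -> float:
    """Average the per-node usage across the cluster (0.0 if empty)."""
    return sum(usage.values()) / len(usage) if usage else 0.0


# Hypothetical data shaped like a _nodes/stats/fs response.
sample_response = {
    "nodes": {
        "node-a": {"fs": {"total": {"total_in_bytes": 1000,
                                    "free_in_bytes": 300,
                                    "available_in_bytes": 250}}},
        "node-b": {"fs": {"total": {"total_in_bytes": 2000,
                                    "free_in_bytes": 400,
                                    "available_in_bytes": 300}}},
    }
}
usage = disk_usage_percent(sample_response)
print(usage, average_usage(usage))
```

For a standalone node, the average is simply that node's own usage.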
Command | Description |
---|---|
curl -X GET http://localhost:9240/_cluster/settings?pretty | This command retrieves the configured disk-based shard allocation settings in Elasticsearch. To learn more about disk-based shard allocation, see Elasticsearch documentation. Shard allocation is based on three thresholds: low, high, and flood. |
Shard allocation: Low | The default threshold for this level is 80%. Once the threshold is reached, Elasticsearch does not allocate new shards to nodes that have used more than 80% of their disk space. To check whether disk usage has reached this level, use the expression (average disk usage of the external Elasticsearch cluster / standalone). If the result of this expression exceeds the defined threshold (80%), the disk has reached the low threshold. |
Shard allocation: High | The default threshold for this level is 85%. Once the threshold is reached, Elasticsearch attempts to relocate shards away from a node whose disk usage is above 85%. To check whether disk usage has reached this level, use the expression (average disk usage of the external Elasticsearch cluster / standalone). If the result of this expression exceeds the defined threshold (85%), the disk has reached the high threshold. |
Shard allocation: Flood | The default threshold for this level is 90%. Once the threshold is reached, Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on a node with at least one disk exceeding the flood stage. This is the last resort to prevent nodes from running out of disk space. To check whether disk usage is in the flood stage, use the expression (average disk usage of the Elasticsearch cluster / standalone). If the result of this expression exceeds the defined threshold (90%), the disk is in the flood stage. |
curl -X GET http://localhost:9240/_nodes/stats/metric | This command retrieves information about a specific metric, such as fs, http, os, or process. For more information about the corresponding metrics, see Elasticsearch documentation. |
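The threshold checks in this table could be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation. It assumes the _cluster/settings response is parsed into a dictionary in its default nested form, that the thresholds appear under the Elasticsearch settings cluster.routing.allocation.disk.watermark.low, high, and flood_stage, and that they are written as percentages. The defaults are the values stated in the table, and the function names and sample data are hypothetical.

```python
from typing import Dict

# Defaults as stated in the table above (percent of disk used).
DEFAULT_WATERMARKS = {"low": 80.0, "high": 85.0, "flood_stage": 90.0}
SETTINGS_PATH = ("cluster", "routing", "allocation", "disk", "watermark")


def configured_watermarks(cluster_settings: dict) -> Dict[str, float]:
    """Read percentage watermarks from a parsed _cluster/settings response.

    Transient settings override persistent ones. Values that are missing or
    not written as a percentage (for example "500mb") keep the default.
    """
    marks = dict(DEFAULT_WATERMARKS)
    for scope in ("persistent", "transient"):
        node = cluster_settings.get(scope, {})
        for key in SETTINGS_PATH:
            node = node.get(key, {}) if isinstance(node, dict) else {}
        if not isinstance(node, dict):
            continue
        for name in marks:
            value = node.get(name)
            if isinstance(value, str) and value.endswith("%"):
                marks[name] = float(value[:-1])
    return marks


def watermark_stage(used_percent: float, marks: Dict[str, float]) -> str:
    """Return the highest threshold that the disk usage exceeds."""
    if used_percent > marks["flood_stage"]:
        return "FLOOD"
    if used_percent > marks["high"]:
        return "HIGH"
    if used_percent > marks["low"]:
        return "LOW"
    return "OK"


# Hypothetical data shaped like a _cluster/settings response.
sample_settings = {
    "persistent": {"cluster": {"routing": {"allocation": {"disk": {
        "watermark": {"low": "75%", "high": "85%"}}}}}},
    "transient": {},
}
marks = configured_watermarks(sample_settings)
print(marks, watermark_stage(82.5, marks))
```

The used_percent argument is the result of the disk usage expression described in the table; how it is averaged across nodes is left to the caller.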
Monitor the Memory usage
Command | Description |
---|---|
http://HOST:9240/_nodes/nodeid/stats/os | This URL retrieves the memory usage of the external Elasticsearch pods. |
http:URL/nodes?v&full_id=true&h=id,name,ip | This URL retrieves the node ID to use in place of nodeid in the preceding URL. It returns the node ID, node name, and node IP address. |
$.nodes.nodeid.os.mem.free_percent | This JSON path expression retrieves the percentage of memory that is free. If a pod is using 85% of the available memory, consider the severity as WARNING, identify the process that consumes the most memory, and generate a heap dump. If a pod is using 90% of the available memory, consider the severity as CRITICAL and perform the following steps to identify the reason. |
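The memory check can be sketched in Python in the same way. This is a minimal illustration, not a definitive implementation. It assumes that the node list is plain text with a header row, as returned by the Elasticsearch _cat/nodes API when called with v, full_id=true, and h=id,name,ip (the URL in the table is abbreviated, so the exact endpoint is an assumption), and that the os statistics are already parsed into a dictionary. Used memory is taken as 100 minus free_percent. The function names and sample values are hypothetical.

```python
from typing import Dict, List

# Thresholds from the table above (percent of memory in use).
WARNING_USED_PERCENT = 85
CRITICAL_USED_PERCENT = 90


def parse_node_list(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated output whose first line is the header.

    Assumes that no column value contains spaces.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def classify_memory(node_stats: dict) -> Dict[str, str]:
    """Rate each node from $.nodes.<id>.os.mem.free_percent."""
    result = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        used = 100 - node["os"]["mem"]["free_percent"]
        if used >= CRITICAL_USED_PERCENT:
            result[node_id] = "CRITICAL"
        elif used >= WARNING_USED_PERCENT:
            result[node_id] = "WARNING"
        else:
            result[node_id] = "OK"
    return result


# Hypothetical node list and os statistics.
sample_nodes = """
id      name  ip
abc123  es-0  10.0.0.1
def456  es-1  10.0.0.2
ghi789  es-2  10.0.0.3
"""
sample_stats = {
    "nodes": {
        "abc123": {"os": {"mem": {"free_percent": 30}}},
        "def456": {"os": {"mem": {"free_percent": 13}}},
        "ghi789": {"os": {"mem": {"free_percent": 8}}},
    }
}

names = {node["id"]: node["name"] for node in parse_node_list(sample_nodes)}
report = {names[node_id]: level
          for node_id, level in classify_memory(sample_stats).items()}
print(report)
```

Mapping IDs to names keeps the report readable, because the statistics response is keyed by node ID.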