Infrastructure Metrics
Infrastructure metrics include system metrics and container metrics. For information about container metrics, see Container Metrics.
System Metrics
Monitor the following system metrics to analyze API Data Store health.
- CPU usage
- Disk usage
- Memory usage
Monitor the CPU usage
To ensure that the CPU is not overutilized, monitor CPU health regularly. You can monitor CPU usage at two levels: process level and OS level. If process-level CPU usage exceeds the threshold limits, you can share the load. However, if OS-level CPU usage has reached its limits, you must contact your IT team.
Command / Metric | Description |
---|---|
curl -X GET http://localhost:9240/_nodes/stats/process?pretty | This command retrieves the CPU utilization of the API Data Store pods. |
$.nodes.nodeid.process.cpu.percent | This JSON path expression retrieves the percentage of CPU used by an API Data Store pod. If a pod uses 80% of the CPU for more than 15 minutes, consider the severity as WARNING and identify the causes of the higher CPU usage. If a pod uses 90% of the CPU for more than 15 minutes, look for the following Prometheus metrics: |
elasticsearch_os_cpu_percent | If elasticsearch_os_cpu_percent is more than 90%, consider the severity as CRITICAL and identify the causes of the higher CPU usage. |
elasticsearch_process_cpu_percent | If elasticsearch_process_cpu_percent is more than 90%, consider the severity as CRITICAL and add a new node to the cluster. To learn more about how to add a new API Data Store node, see Adding New Nodes to an Elasticsearch Cluster. |
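The CPU thresholds in the table above can be checked in a script. The following is a minimal sketch, not a definitive implementation: it assumes the response of `_nodes/stats/process` has already been parsed into a dictionary, that the caller collects one sample per poll over a 15-minute window, and that "sustained" means every sample in that window is at or above the threshold. The function names and the sample payload are hypothetical.

```python
from typing import Dict, List

WARNING_PERCENT = 80   # WARNING threshold from the table above
CRITICAL_PERCENT = 90  # CRITICAL threshold from the table above


def process_cpu_by_node(stats: dict) -> Dict[str, int]:
    """Map each node id to $.nodes.<nodeid>.process.cpu.percent."""
    return {
        node_id: node["process"]["cpu"]["percent"]
        for node_id, node in stats.get("nodes", {}).items()
    }


def cpu_severity(samples: List[float]) -> str:
    """Classify a window of CPU samples (assumed to span 15 minutes).

    The lowest sample decides the result, so a single short spike
    does not raise the severity.
    """
    if not samples:
        return "UNKNOWN"
    lowest = min(samples)
    if lowest > CRITICAL_PERCENT:
        return "CRITICAL"
    if lowest >= WARNING_PERCENT:
        return "WARNING"
    return "OK"


# Hypothetical response fragment, trimmed to the fields used here.
sample_stats = {
    "nodes": {
        "node-a": {"process": {"cpu": {"percent": 84}}},
        "node-b": {"process": {"cpu": {"percent": 12}}},
    }
}

print(process_cpu_by_node(sample_stats))
print(cpu_severity([82, 85, 88, 91]))
```

Using the minimum of the window is one possible reading of "for more than 15 minutes"; an average or a percentile over the window would be an equally reasonable choice, depending on how noisy the samples are.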
Monitor the Disk usage
To ensure that all nodes have enough disk space, IBM recommends that you monitor the disk space regularly.
Command | Description |
---|---|
curl -X GET http://localhost:9240/_nodes/stats/fs | This command retrieves the disk space of the API Data Store nodes. It lists the disk space available in all nodes. For more information about Elasticsearch node statistics, see Elasticsearch documentation. |
$.nodes..fs.total.total_in_bytes | This JSON path expression retrieves the total disk space. |
$.nodes..fs.total.free_in_bytes | This JSON path expression retrieves the free disk space. |
$.nodes..fs.total.available_in_bytes | This JSON path expression retrieves the available disk space. |
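The three JSON path expressions above can be combined into a per-node disk usage percentage. The following is a minimal sketch under stated assumptions: the response of `_nodes/stats/fs` has already been parsed into a dictionary, and "used" is taken to mean total space minus available space (using `free_in_bytes` instead would be an equally valid choice). The function name and the sample payload are hypothetical.

```python
from typing import Dict


def disk_used_percent(stats: dict) -> Dict[str, float]:
    """Return the used disk percentage for each node id.

    Reads $.nodes.<nodeid>.fs.total.total_in_bytes and
    $.nodes.<nodeid>.fs.total.available_in_bytes.
    """
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        total = node["fs"]["total"]
        size = total["total_in_bytes"]
        if size <= 0:
            continue  # skip nodes that report no usable disk size
        used = size - total["available_in_bytes"]
        result[node_id] = round(100.0 * used / size, 1)
    return result


# Hypothetical response fragment, trimmed to the fields used here.
sample_fs_stats = {
    "nodes": {
        "node-a": {"fs": {"total": {
            "total_in_bytes": 1000,
            "free_in_bytes": 200,
            "available_in_bytes": 150,
        }}},
        "node-b": {"fs": {"total": {
            "total_in_bytes": 2000,
            "free_in_bytes": 1600,
            "available_in_bytes": 1500,
        }}},
    }
}

print(disk_used_percent(sample_fs_stats))
```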
Command | Description |
---|---|
curl -X GET http://localhost:9240/_cluster/settings?pretty | This command retrieves the configured disk-based shard allocation settings in API Data Store. To learn more about disk-based shard allocation, see Elasticsearch documentation. The shard allocation is based on the following threshold levels: Low, High, and Flood. |
Shard allocation: Low | The default threshold for this level is 80%. Once the threshold is reached, API Data Store does not allocate new shards to nodes that have used more than 80% disk space. You can determine whether the disk usage has reached this level by using the expression (average disk usage of the API Data Store cluster / standalone). If the result of this expression exceeds the defined threshold (80%), the disk has reached the Low level. |
Shard allocation: High | The default threshold for this level is 85%. Once the threshold is reached, API Data Store attempts to relocate shards away from a node whose disk usage is above 85%. You can determine whether the disk usage has reached this level by using the expression (average disk usage of the API Data Store cluster / standalone). If the result of this expression exceeds the defined threshold (85%), the disk has reached the High level. |
Shard allocation: Flood | The default threshold for this level is 90%. Once the threshold is reached, API Data Store enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on a node that has at least one disk exceeding the flood stage. This is the last resort to prevent nodes from running out of disk space. You can determine whether the disk usage is in the flood stage by using the expression (average disk usage of the API Data Store cluster / standalone). If the result of this expression exceeds the defined threshold (90%), the disk is in the flood stage. |
curl -X GET http://localhost:9240/_nodes/stats/metric | This command retrieves information about specific metrics such as fs, http, os, and process. For more information about the corresponding metrics, see Elasticsearch documentation. |
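The Low, High, and Flood levels described above can be evaluated from per-node disk usage percentages. The following is a minimal sketch, not a definitive implementation: it reads "average disk usage of the API Data Store cluster / standalone" as the mean across all nodes in a cluster (or the single node's usage in a standalone deployment), which is an interpretation of the expression. It hardcodes the default thresholds quoted in the table; the values actually in force should be taken from the `_cluster/settings` response. The function name and constants are hypothetical.

```python
from statistics import mean
from typing import Dict

# Default thresholds quoted in the table above, ordered highest first
# so that the most severe matching level is reported.
STAGES = [("FLOOD", 90.0), ("HIGH", 85.0), ("LOW", 80.0)]


def disk_stage(used_percent_by_node: Dict[str, float]) -> str:
    """Return the shard allocation level reached by the average disk usage."""
    if not used_percent_by_node:
        return "UNKNOWN"
    average = mean(list(used_percent_by_node.values()))
    for stage, threshold in STAGES:
        if average > threshold:  # "exceeds the defined threshold"
            return stage
    return "OK"


print(disk_stage({"node-a": 88.0, "node-b": 84.0}))
```

Because the check uses the average, a single node can be above a threshold while the result still reports a lower level; checking each node individually as well may be worthwhile.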
Monitor the Memory usage
Command | Description |
---|---|
http://HOST:9240/_nodes/nodeid/stats/os | This URL retrieves the memory usage of the API Data Store pods. |
http:URL/nodes?v&full_id=true&h=id,name,ip | This URL retrieves the node ID of the corresponding API. It returns the node ID, node name, and node IP address. |
$.nodes.nodeid.os.mem.free_percent | This JSON path expression retrieves the percentage of memory that is free. If a pod is using 85% of the available memory, consider the severity as WARNING, identify the process that consumes more memory, and generate a heap dump. If a pod is using 90% of the available memory, consider the severity as CRITICAL and identify the reason. |
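The memory thresholds above are stated in terms of memory used, while the JSON path expression returns the percentage that is free. The following is a minimal sketch of the conversion and classification, assuming the response of the `_nodes/nodeid/stats/os` URL has already been parsed into a dictionary and that used memory is `100 - free_percent`. The function name and the sample payload are hypothetical.

```python
from typing import Dict

WARNING_USED_PERCENT = 85   # WARNING threshold from the table above
CRITICAL_USED_PERCENT = 90  # CRITICAL threshold from the table above


def memory_severity(stats: dict) -> Dict[str, str]:
    """Classify each node by used memory, derived from os.mem.free_percent."""
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        used = 100 - node["os"]["mem"]["free_percent"]
        if used >= CRITICAL_USED_PERCENT:
            result[node_id] = "CRITICAL"
        elif used >= WARNING_USED_PERCENT:
            result[node_id] = "WARNING"
        else:
            result[node_id] = "OK"
    return result


# Hypothetical response fragment, trimmed to the fields used here.
sample_os_stats = {
    "nodes": {
        "node-a": {"os": {"mem": {"free_percent": 12}}},  # 88% used
        "node-b": {"os": {"mem": {"free_percent": 5}}},   # 95% used
        "node-c": {"os": {"mem": {"free_percent": 40}}},  # 60% used
    }
}

print(memory_severity(sample_os_stats))
```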