Application Metrics
- Index size
- Cluster health
- Number of shards
- GC monitoring
For details about how to generate a thread dump and a heap dump, see Troubleshooting: Monitoring API Data Store.
If a metric exceeds its threshold value, consider the severity as mentioned, perform the actions that Software AG recommends to identify and debug the problem, and contact IBM for further support.
Index Size
- Faster start-up of Elasticsearch. Multiple smaller indexes instead of one huge index allow Elasticsearch to start up faster.
- Faster response. When you store all data in a single index, Elasticsearch slows down because it spends a lot of time on shard allocation. Splitting the data into smaller units avoids this overhead.
Each index has two divisions: the primary shard and the replica shard. The data is first stored in the primary shard. Elasticsearch then replicates the data from the primary shard to the replica shard. For example, when you allot 25 GB for an index, the space is divided equally between the two divisions, that is, 12.5 GB each. As per the example, the size of all indexes totals up to a maximum of 300 GB. That is, 150 GB for primary data and 150 GB for replica shards. Replicating the primary data enables Elasticsearch to make it highly available.
When the data in a particular index exceeds a certain limit, you must roll over the index and create a new index. The acceptable size limit of an index depends on its type. Software AG recommends that you specify 25 GB (12.5 GB for each shard) for the transactional events indexes and 5 GB (2.5 GB for each shard) for tracer indexes.
It is essential to monitor the transactional events indexes to prevent them from exceeding 25 GB in size. For information on calculating index size, see Calculating index size.
You must roll over an index when the size of the primary shard reaches 12.5 GB. That is, if the size of the primary shard is 12.5 GB, then the size of the replica is also 12.5 GB. Hence, you must monitor the size of the primary shard and perform rollovers as and when required.
When you roll over an index, a new index is created with a primary and a replica for each shard. The naming convention of the new index is Index_name_YYYYMMDDHHMM. For example, gateway_default_analytics_transactionalEvents_YYYYMMDDHHMM. For information on creating a rollover, see Creating Rollover of an Index.
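The naming convention above can be illustrated with a short Python sketch. The helper name `rollover_index_name` is hypothetical and only shows how the YYYYMMDDHHMM suffix is composed. It is not part of API Data Store, and it does not perform a rollover.

```python
from datetime import datetime

def rollover_index_name(base_name, when):
    """Build a name of the form Index_name_YYYYMMDDHHMM (illustrative helper)."""
    return f"{base_name}_{when:%Y%m%d%H%M}"

# Example: a rollover performed on 5 March 2024 at 14:30.
print(rollover_index_name(
    "gateway_default_analytics_transactionalEvents",
    datetime(2024, 3, 5, 14, 30),
))
```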
Calculating index size
The query used to calculate the index size returns the size of the primary shard of an index. Hence, you must calculate the actual index size by multiplying the returned size by two. For example, if you want to purge indexes that are beyond 25 GB, you must purge the indexes whose returned size exceeds 12.5 GB.
- Run the following command:
For example, http://localhost:9240/_cat/indices/gateway_tenant_index_name?v&s=i&format=json&pretty
Sample output for the request http://localhost:9240/_cat/indices/gateway_default_analytics_transactionalevents_1639736462002-000001?v&s=i&format=json&pretty:
[
{
"health": "yellow",
"status": "open",
"index": "gateway_default_analytics_transactionalevents_1639736462002-000001",
"uuid": "2tmWIIAcQ1KeSqIg9iPU0g",
"pri": "5",
"rep": "1",
"docs.count": "663",
"docs.deleted": "0",
"store.size": "909.8kb",
"pri.store.size": "909.8kb"
}
]
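The calculation described in this section can be sketched in Python as follows. This is a minimal illustration rather than a supported tool. The function names are hypothetical, the 12.5 GB default is the primary shard limit recommended above for transactional events indexes, and the size strings are assumed to use Elasticsearch's 1024-based units (kb, mb, gb, and so on).

```python
import json

# Multipliers for the human-readable sizes in the _cat/indices output.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3,
          "tb": 1024**4, "pb": 1024**5}

# 12.5 GB primary limit for transactional events indexes.
PRIMARY_LIMIT_BYTES = 12.5 * 1024**3

def parse_size(text):
    """Convert a size string such as '909.8kb' to a number of bytes."""
    text = text.strip().lower()
    # Try two-letter suffixes first so that 'kb' is not read as 'b'.
    for unit in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * _UNITS[unit]
    raise ValueError(f"unrecognized size: {text!r}")

def summarize(cat_indices_json, primary_limit=PRIMARY_LIMIT_BYTES):
    """Estimate total size (primary x 2) and flag indexes due for rollover."""
    report = []
    for row in json.loads(cat_indices_json):
        primary = parse_size(row["pri.store.size"])
        report.append({
            "index": row["index"],
            "primary_bytes": primary,
            "estimated_total_bytes": 2 * primary,
            "needs_rollover": primary >= primary_limit,
        })
    return report

sample = ('[{"index": "gateway_default_analytics_transactionalevents_'
          '1639736462002-000001", "pri.store.size": "909.8kb"}]')
for entry in summarize(sample):
    print(entry)
```

For tracer indexes, pass the 2.5 GB limit instead, for example `summarize(text, primary_limit=2.5 * 1024**3)`.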
API Data Store Cluster Health
To ensure optimal health and performance of API Data Store, IBM recommends monitoring the API Data Store cluster health regularly.
Command | Description |
---|---|
curl -X GET http://localhost:9240/_cluster/health?pretty | This command retrieves API Data Store cluster health status. |
$.status | This JSON path expression retrieves the cluster health status from the response. |
$.number_of_nodes | This JSON path expression retrieves the number of nodes in the cluster from the response. |
The response JSON of the health check request contains a status field. The status can have the values green, yellow, or red. The cluster health status is displayed based on the following color codes:
Status | Description |
---|---|
green | Indicates that the cluster is in a healthy state. When API Data Store is handling huge data, it takes some time to display the cluster health status. |
yellow | Indicates that the cluster is not in a healthy state. Identify the cause and rectify it. During this time, API Data Store processes the requests for the index that is available. If there are unassigned shards, identify them, check the reason for the unallocation, and resolve the issue. |
red | Indicates that API Data Store nodes are down or not reachable, or that the API Data Store master is not discovered. If the number of nodes does not match the number of API Data Store nodes configured, identify the node that did not join the cluster and the root cause for it not joining. Based on the root cause, identify if your API Data Store is down. If your API Data Store is down and not reachable, check the connectivity. |
{
"cluster_name": "SAG_apidatastore_cluster",
"status": "green",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"active_primary_shards": 101,
"active_shards": 202,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
The overall cluster status is green since all API Data Store nodes work as expected.
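As an illustration of how the status and node count checks might be automated, the following Python sketch evaluates a health response that has already been retrieved. The function name `evaluate_cluster_health` and the `expected_nodes` parameter are assumptions made for this example. The field names come from the sample response.

```python
import json

def evaluate_cluster_health(response_text, expected_nodes):
    """Return a list of findings; an empty list means nothing to report."""
    health = json.loads(response_text)
    status = health["status"]          # equivalent to the JSON path $.status
    nodes = health["number_of_nodes"]  # equivalent to $.number_of_nodes
    findings = []
    if status != "green":
        findings.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        findings.append(f"{unassigned} unassigned shard(s)")
    if nodes != expected_nodes:
        findings.append(f"{nodes} node(s) joined, expected {expected_nodes}")
    return findings

sample = '{"status": "green", "number_of_nodes": 3, "unassigned_shards": 0}'
print(evaluate_cluster_health(sample, expected_nodes=3))
```

Pass the full body returned by the _cluster/health request as `response_text`. The shortened sample is used here only to keep the example compact.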
Number of shards
To ensure proper allocation of shards to nodes, IBM recommends monitoring the number of shards regularly.
Command | Description |
---|---|
curl -X GET "http://localhost:9240/_cluster/health?pretty" | This command retrieves the number of shards on API Data Store. API Data Store considers a maximum of 20 active shards per GB of heap space as healthy. If the total number of active shards from the response exceeds this limit, perform any of the following actions to maintain the total number of active shards: To increase the heap space, modify the parameters Xms2g and Xmx2g in the jvm.options file located at SAG_Install_Directory\InternalDataStore\config. |
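The 20-shards-per-GB guideline can be checked with simple arithmetic, sketched below in Python. The function names are hypothetical. The text does not say whether the heap figure is per node or for the whole cluster, so the sketch takes it as a parameter and leaves that choice to you.

```python
import math

# Healthy maximum of active shards per GB of heap space, as stated above.
SHARDS_PER_GB = 20

def max_healthy_shards(heap_gb):
    """Largest number of active shards considered healthy for this heap."""
    return math.floor(heap_gb * SHARDS_PER_GB)

def min_heap_gb(active_shards):
    """Smallest whole number of GB of heap that keeps the shard count healthy."""
    return math.ceil(active_shards / SHARDS_PER_GB)

def shards_within_limit(active_shards, heap_gb):
    """True if the active shard count does not exceed the healthy maximum."""
    return active_shards <= max_healthy_shards(heap_gb)

# With a 2 GB heap (Xms2g/Xmx2g) and 202 active shards:
print(max_healthy_shards(2))        # 40
print(shards_within_limit(202, 2))  # False
print(min_heap_gb(202))             # 11
```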
Garbage Collection (GC) Monitoring
The GC metric provides the GC run-time in seconds. You must check GC run-time once every five minutes. The average GC run-time should not exceed one second.
Metric | Description |
---|---|
|
The quotient of the two metrics gives the GC run-time. If the quotient is more than 1 second, GC is taking longer to run, which slows down API Data Store request processing. You must collect the logs and get the mapping of the API index and the transaction index. |
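The quotient check can be sketched in Python as follows. The two metrics are not named in the table above, so this example assumes they are a cumulative GC time in milliseconds and a GC run count, similar to the collection_time_in_millis and collection_count fields that the Elasticsearch node stats API reports. The function names are hypothetical.

```python
def average_gc_seconds(collection_time_millis, collection_count):
    """Average duration of one GC run in seconds (assumed metric pair)."""
    if collection_count == 0:
        return 0.0  # no GC runs recorded, so there is nothing to average
    return collection_time_millis / collection_count / 1000.0

def gc_is_slow(collection_time_millis, collection_count, threshold_seconds=1.0):
    """True if the average GC run-time exceeds the one-second threshold."""
    return average_gc_seconds(collection_time_millis, collection_count) > threshold_seconds

# Example: 3 GC runs that took 4500 ms in total average 1.5 s per run.
print(average_gc_seconds(4500, 3))  # 1.5
print(gc_is_slow(4500, 3))          # True
```

Because such counters are typically cumulative, one option is to apply the same function to the difference between two readings taken five minutes apart, which matches the recommended check interval.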