Determining your current shard consumption
Monitor the number of indexes and shards currently in use so that you can plan your AI training accordingly.
About this task
When log data is collected for IBM Cloud Pak® for AIOps, the data is persisted within Elasticsearch for training purposes. During or after the collection of historical or live data for training, you can connect to Elasticsearch to check the progress and look at the indexes.
Connect to Elasticsearch
-
Log in to your Red Hat® OpenShift® cluster, or SSH into a control plane node or the load balancer VM on your Linux® deployment. Then switch to the namespace where IBM Cloud Pak for AIOps is installed.
oc project <project>
Where <project> is the project that your IBM Cloud Pak for AIOps installation is deployed in.
-
Run the following command to list the indices stored in Elasticsearch. Look for index files that have a 1000-1000-<date>-logtrain name format, such as 1000-1000-20210510-logtrain.
export ELASTIC_SET="$(oc get statefulset -l app.kubernetes.io/managed-by=ibm-elasticsearch -o jsonpath='{.items[0].metadata.name}')"
oc exec "statefulset/${ELASTIC_SET}" -- curl -XGET -k --fail "http://127.0.0.1:19200/_cat/indices?v" | sort
Example output, where the index name is 1000-1000-20220302-logtrain:
green open 1000-1000-20220302-logtrain eEvaBufQ06yG074hmeHKG 8 1 244 6 72.7kb 72.7kb
The following table lists the elements in this line of output:
Table. Output elements
Element | Description | Example |
---|---|---|
Index health | Health of the index. | green |
Index status | Status of the index. | open |
Index name | Name of the index. | 1000-1000-20220302-logtrain |
UUID | Universally unique identifier for this index. | eEvaBufQ06yG074hmeHKG |
Number of shards | Number of shards required by default for this index. This value varies based on the type of index. For example, an index that is associated with a day of ingested log data for natural language log anomaly training requires 8 shards for both a large and a small installation. | 8 |
Number of replicas | Number of replicas required by default for this index. This value varies based on the type of index and the size of the installation. For example, an index that is associated with a day of ingested log data for natural language log anomaly training requires 1 replica for a large installation and 0 replicas for a small installation. A replica value of 1 replicates all of the shards for that index, meaning that a shard/replica setting of 8/1 assigns 16 shards to the index. | 1 |
docs.count | Number of documents in this index. | 244 |
docs.deleted | Number of deleted documents in this index. | 6 |
Store size | Storage size of this index. | 72.7kb |
Primary store size | Storage size of the primary shards of this index. | 72.7kb |
-
Check the count of the records for an index by using the following command:
export INDEX_NAME=<index_name>
export ELASTIC_SET="$(oc get statefulset -l app.kubernetes.io/managed-by=ibm-elasticsearch -o jsonpath='{.items[0].metadata.name}')"
oc exec "statefulset/${ELASTIC_SET}" -- curl -XGET -k --fail "http://127.0.0.1:19200/${INDEX_NAME}/_count"
Where <index_name> is the name of an index returned by the previous step.
Example output:
{"count":318713,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
-
You can also look at sample data by using the following command:
export INDEX_NAME=<index_name>
export ELASTIC_SET="$(oc get statefulset -l app.kubernetes.io/managed-by=ibm-elasticsearch -o jsonpath='{.items[0].metadata.name}')"
oc exec "statefulset/${ELASTIC_SET}" -- curl -XGET -k --fail "http://127.0.0.1:19200/${INDEX_NAME}/_search"
Where <index_name> is the name of an index returned by the command that lists the indices.
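The index listing reports primary shards and replicas per index, so the total shard count has to be derived from it. The following is a minimal sketch, not part of the product, that sums primaries × (1 + replicas) over every line of the listing. The function name `total_shards` is hypothetical, and the sketch assumes the default column order shown in the example output (health, status, index, UUID, shards, replicas, and so on).

```shell
# Sum the shards used by every index in `_cat/indices` output read from stdin.
# Assumption: columns 5 and 6 are the primary shard count and the replica
# count, as in the example output line.
total_shards() {
  awk '
    $1 == "health" { next }        # skip the header row that ?v prints
    NF >= 6 {
      total += $5 * (1 + $6)       # primaries plus one full copy per replica
    }
    END { print total + 0 }        # print 0 when no index lines were read
  '
}

# Hypothetical usage against a live cluster (requires ELASTIC_SET to be set):
#   oc exec "statefulset/${ELASTIC_SET}" -- curl -XGET -k --fail \
#     "http://127.0.0.1:19200/_cat/indices?v" | total_shards
```

Lines with fewer than six fields are ignored, so rows that do not report shard and replica counts do not distort the sum.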
Shard consumption in a large installation
Use the following table to help interpret the results returned. All values that are provided are for a production (large) installation.
N. | AI component | Index name | Shard usage | Notes |
---|---|---|---|---|
Natural language log anomaly detection (total) | | | Varies | |
1 | Each day of ingested log data | 1000-1000-<date>-logtrain | 16 | The index for each extra day of data uses 16 shards. |
2 | Each model version total | 1000-1000-v<x>-<contenttype> | 50 | The index for each extra model version uses 50 shards. |
3 | Dedicated index for saving training data | 1000-1000-windowed_logs | 5 | |
4 | Dedicated index for deploying models | 1000-1000-log_models_latest | 5 | |
Statistical baseline log anomaly detection (total) | | | 20 | |
5 | Statistical baseline log anomaly detection | 1000-1000-log_models_latest | 5 | |
6 | Statistical baseline log anomaly detection | 1000-1000-reference_embedding | 5 | |
7 | Statistical baseline log anomaly detection | 1000-1000-reference_oob | 5 | |
8 | Statistical baseline log anomaly detection | 1000-1000-si_models_latest | 5 | |
9 | Change risk | 1000-1000-cr_models_latest | | |
10 | Similar tickets | 1000-1000-si_models_latest | 10 | |
Lifecycle services | | | Varies | |
11 | Each service | aiops-searchservice-v10<datetime> | 5 | The index for each extra lifecycle service uses 5 shards. |
AI platform | | | 12 | |
12 | AI platform | postchecktrainingdetails0 | 2 | |
13 | AI platform | prechecktrainingdetails0 | 2 | |
14 | AI platform | trainingdefinition | 2 | |
15 | AI platform | trainingrun | 2 | |
16 | AI platform | trainingrunning | 2 | |
17 | AI platform | trainingstatus | 2 | |
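Because the natural language log anomaly detection total varies, it can help to work it out from the per-index figures in rows 1 to 4 of the table. The following sketch does only that arithmetic for a large installation. The function name `estimate_nl_shards` and its two arguments (days of log data, number of model versions) are hypothetical.

```shell
# Estimate shard usage for natural language log anomaly detection in a
# large installation, using rows 1-4 of the table:
#   16 shards per day of log data, 50 shards per model version,
#   5 shards for the training data index, 5 shards for the deployed models index.
estimate_nl_shards() {
  days=$1
  versions=$2
  echo $(( days * 16 + versions * 50 + 5 + 5 ))
}

# Example: 14 days of log data and 2 model versions.
estimate_nl_shards 14 2   # prints 334
```

The estimate covers this one component only; the fixed totals for the other components in the table must be added separately.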
Best practices
An analysis of the table in Shard consumption in a large installation shows that the most effective way to keep shard usage down is to follow these practices:
-
Train natural language log anomaly detection on a smaller number of days. As you can see from the table row that is labeled 1, the index for each extra day of data uses 16 shards. The recommended number of days for training is 14.
Note: To avoid the possibility of running out of shards, a cron job runs every hour and deletes all logtrain Elasticsearch indexes whose index creation date is more than 14 days before the current timestamp.
-
Regularly delete new versions of natural language log anomaly detection models. As you can see from the table row that is labeled 2, the index for each extra model version uses 50 shards.
-
When you define a log integration, such as for Elasticsearch, Logstash, and Kibana (ELK), Humio, or Mezmo, and you use the Historical data for initial AI training data flow mode, the Log Ops UI in IBM Cloud Pak for AIOps does not show the time zone of the start and end dates. You cannot use the Log Ops UI to enter start and end times. Instead, you can specify only the start and end dates. The start and end dates are in Greenwich Mean Time (GMT).
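To see which logtrain indexes fall outside the 14-day window mentioned in the note above, you can compare the date embedded in each index name with a cutoff date. The following is a rough sketch only. The function name `logtrain_older_than` is hypothetical, and it assumes that the date in the index name approximates the index creation date that the cleanup job actually uses.

```shell
# Print logtrain index names whose embedded date is earlier than a cutoff.
# Reads index names on stdin, one per line. The cutoff is given as YYYYMMDD.
# Assumption: names follow the 1000-1000-<date>-logtrain format.
logtrain_older_than() {
  awk -F- -v cutoff="$1" '
    NF == 4 && $1 == "1000" && $2 == "1000" && $4 == "logtrain" \
      && $3 ~ /^[0-9]+$/ && length($3) == 8 \
      && ($3 + 0) < (cutoff + 0) { print }
  '
}

# Hypothetical usage. The cutoff calculation below relies on GNU date and
# uses UTC, which matches the GMT dates used for historical training data:
#   cutoff="$(date -u -d '14 days ago' +%Y%m%d)"
#   ... list index names, one per line ... | logtrain_older_than "$cutoff"
```

The sketch only lists candidate indexes; it does not delete anything.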