Determining your current shard consumption

Monitor the number of indexes and shards currently in use so that you can plan your AI training accordingly.

About this task

When log data is collected for IBM Cloud Pak® for AIOps, the data is persisted within Elasticsearch for training purposes. While the collection of historical or live data for training is occurring or after the data collection completes, you can connect to Elasticsearch to check the progress and look at the indexes.

Connect to Elasticsearch

  1. Log in to your Red Hat® OpenShift® cluster, or SSH into a control plane node or the loadbalancer VM on your Linux® deployment. Then switch to the namespace where IBM Cloud Pak for AIOps is installed.

    oc project <project>
    

    Where <project> is the project that your IBM Cloud Pak for AIOps installation is deployed in.

  2. Run the following command to list the indices stored in Elasticsearch. Look for index files that have a 1000-1000-<date>-logtrain name format, such as 1000-1000-20210510-logtrain.

    export ELASTIC_SET="$(oc get statefulset -l app.kubernetes.io/managed-by=ibm-elasticsearch -o jsonpath='{.items[0].metadata.name}')"
    oc exec "statefulset/${ELASTIC_SET}" -- curl -XGET -k --fail "http://127.0.0.1:19200/_cat/indices?v" | sort
    

    Example output, where the index name is 1000-1000-20220302-logtrain:

    green open 1000-1000-20220302-logtrain        eEvaBufQ06yG074hmeHKG  8 1 244 6 72.7kb 72.7kb 
    

    The following table lists the elements in this line of output:

    Table. Elements
    Element | Description | Example
    Index health | Health of the index. | green
    Index status | Status of the index. | open
    Index name | Name of the index. | 1000-1000-20220302-logtrain
    UUID | Universally unique identifier for this index. | eEvaBufQ06yG074hmeHKG
    Number of shards | Number of shards required by default for this index. This value varies based on the type of index. For example, an index that is associated with a day of ingested log data for natural language log anomaly training requires 8 shards for both a large and a small installation. | 8
    Number of replicas | Number of replicas required by default for this index. This value varies based on the type of index and the size of the installation. For example, an index that is associated with a day of ingested log data for natural language log anomaly training requires 1 replica for a large installation and 0 replicas for a small installation. A replica value of 1 replicates all of the shards for that index, so a shard/replica setting of 8/1 assigns 16 shards to the index. | 1
    docs.count | Number of documents in this index. | 244
    docs.deleted | Number of deleted documents in this index. | 6
    Store size | Total size of the index, including replicas. | 72.7kb
    Primary store size | Size of the primary shards of the index. | 72.7kb
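    The columns in the example output line map to these elements in a fixed order. As a rough illustration, the line can be split with a short script. Note that parse_cat_indices_line is a hypothetical helper, not part of the product, and it assumes the default `_cat/indices?v` column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size):

    ```python
    # Hypothetical sketch: split one line of Elasticsearch `_cat/indices`
    # output into the elements described in the table above.
    def parse_cat_indices_line(line):
        cols = line.split()
        return {
            "health": cols[0],
            "status": cols[1],
            "index": cols[2],
            "uuid": cols[3],
            "shards": int(cols[4]),
            "replicas": int(cols[5]),
            "docs_count": int(cols[6]),
            "docs_deleted": int(cols[7]),
            "store_size": cols[8],
            "pri_store_size": cols[9],
        }

    row = parse_cat_indices_line(
        "green open 1000-1000-20220302-logtrain eEvaBufQ06yG074hmeHKG 8 1 244 6 72.7kb 72.7kb"
    )
    print(row["index"], row["shards"], row["replicas"])
    # 1000-1000-20220302-logtrain 8 1
    ```

    Because a replica duplicates every primary shard, the total shard count for an index is primaries × (1 + replicas); the 8/1 index in this example therefore occupies 16 shards.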
  3. Check the count of the records for an index by using the following command:

    export INDEX_NAME=<index_name>
    export ELASTIC_SET="$(oc get statefulset -l app.kubernetes.io/managed-by=ibm-elasticsearch -o jsonpath='{.items[0].metadata.name}')"
    oc exec "statefulset/${ELASTIC_SET}" -- curl -XGET -k --fail "http://127.0.0.1:19200/${INDEX_NAME}/_count"
    

    Where <index_name> is the name of an index returned by the previous step.

    Example output:

    {"count":318713,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
    
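    The `_count` endpoint returns a small JSON document. As a minimal sketch, the record count can be read from the example response above with the standard json module; a failed value greater than 0 in the `_shards` block would indicate that some shards did not answer the count request:

    ```python
    import json

    # Parse the `_count` response shown in the example output above.
    response = '{"count":318713,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}'
    body = json.loads(response)
    print(body["count"])               # 318713
    print(body["_shards"]["failed"])   # 0 means every shard answered
    ```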
  4. You can also look at sample data by using the following command:

    export INDEX_NAME=<index_name>
    export ELASTIC_SET="$(oc get statefulset -l app.kubernetes.io/managed-by=ibm-elasticsearch -o jsonpath='{.items[0].metadata.name}')"
    oc exec "statefulset/${ELASTIC_SET}" -- curl -XGET -k --fail "http://127.0.0.1:19200/${INDEX_NAME}/_search"
    

    Where <index_name> is the name of an index returned by the previous step.

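The documents returned by `_search` sit under hits.hits[]._source in the response body. The following sketch reads them from a hypothetical, heavily truncated response; a real reply carries full log documents and additional metadata:

```python
import json

# Hypothetical, truncated `_search` response body; the real response
# contains complete documents under hits.hits[]._source.
response = (
    '{"hits":{"total":{"value":244},"hits":'
    '[{"_index":"1000-1000-20220302-logtrain",'
    '"_source":{"message":"example log line"}}]}}'
)
body = json.loads(response)
for hit in body["hits"]["hits"]:
    print(hit["_index"], hit["_source"]["message"])
```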
Shard consumption in a large installation

Use the following table to help interpret the results returned. All values that are provided are for a production (large) installation.

Table. Shard consumption
N. | AI component | Index name | Shard usage | Notes
 | Natural language log anomaly detection (total) | | Varies |
1 | Each day of ingested log data | 1000-1000-<date>-logtrain | 16 | The index for each extra day of data uses 16 shards.
2 | Each model version total | 1000-1000-v_x-contenttype | 50 | The index for each extra model version uses 50 shards.
3 | Dedicated index for saving training data | 1000-1000-windowed_logs | 5 |
4 | Dedicated index for deploying models | 1000-1000-log_models_latest | 5 |
 | Statistical baseline log anomaly detection (total) | | 20 |
5 | Statistical baseline log anomaly detection | 1000-1000-log_models_latest | 5 |
6 | Statistical baseline log anomaly detection | 1000-1000-reference_embedding | 5 |
7 | Statistical baseline log anomaly detection | 1000-1000-reference_oob | 5 |
8 | Statistical baseline log anomaly detection | 1000-1000-si_models_latest | 5 |
9 | Change risk | 1000-1000-cr_models_latest | |
10 | Similar tickets | 1000-1000-si_models_latest | 10 |
 | Lifecycle services (total) | | Varies |
11 | Each service | aiops-searchservice-v10 datetime | 5 | The index for each extra lifecycle service uses 5 shards.
 | AI platform (total) | | 12 |
12 | AI platform | postchecktrainingdetails0 | 2 |
13 | AI platform | prechecktrainingdetails0 | 2 |
14 | AI platform | trainingdefinition | 2 |
15 | AI platform | trainingrun | 2 |
16 | AI platform | trainingrunning | 2 |
17 | AI platform | trainingstatus | 2 |
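Since only the days of log data, the number of model versions, and the number of lifecycle services vary, total shard usage in a large installation can be estimated from the table's figures. The helper below is a hypothetical sketch, not a product utility; it omits change risk (row 9), for which the table gives no shard value:

```python
# Hypothetical estimate of total shard usage for a large installation,
# built from the per-index figures in the shard consumption table.
def estimate_shards(log_days, model_versions, lifecycle_services):
    nl_log_anomaly = 16 * log_days + 50 * model_versions + 5 + 5  # rows 1-4
    statistical_baseline = 20                                     # rows 5-8
    similar_tickets = 10                                          # row 10
    lifecycle = 5 * lifecycle_services                            # row 11
    ai_platform = 12                                              # rows 12-17
    return (nl_log_anomaly + statistical_baseline + similar_tickets
            + lifecycle + ai_platform)

# 14 days of training data, one model version, one lifecycle service:
print(estimate_shards(14, 1, 1))
# 331
```

The dominant terms are the 16 shards per day of log data and the 50 shards per model version, which is why the best practices below focus on limiting both.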

Best practices

Analysis of the table in Shard consumption in a large installation shows that the most effective ways to keep shard usage down are to follow these practices:

  • Train natural language log anomaly detection on a smaller number of days. As you can see from the table row that is labeled 1, the index for each extra day of data uses 16 shards. The recommended number of days for training is 14.

    Note: To avoid running out of shards, a periodic cron job utility runs every hour and deletes all logtrain Elasticsearch indexes whose creation date is more than 14 days before the current timestamp.

  • Regularly delete new versions of natural language log anomaly detection models. As you can see from the table row that is labeled 2, the index for each extra model version uses 50 shards.

  • When you define a log integration, such as for Elasticsearch, Logstash, and Kibana (ELK), Humio, or Mezmo, and use the Historical data for initial AI training data flow mode, the Log Ops UI in IBM Cloud Pak for AIOps does not let you enter start and end times; you can specify only the start and end dates. The time zone for these dates is Greenwich mean time (GMT).