Resolving reports on full disk or watermark reached

When the disk becomes full, the Elasticsearch and Logstash services write TOO_MANY_REQUESTS/12/index read-only errors to their runtime log files, and the cluster is forced into read-only mode and cannot accept write operations. Essentially, there is not enough disk space on the existing hosts to write any more data.

For more information about the default Elastic Stack log locations, see Elastic Stack troubleshooting.

Elasticsearch and Logstash services report TOO_MANY_REQUESTS/12/index read-only errors

The Elasticsearch and Logstash services generate TOO_MANY_REQUESTS/12/index read-only errors when the Elasticsearch data instances can no longer write data.

As a safeguard, the read_only_allow_delete parameter is set to true when hard disk space is low. This setting makes the Elasticsearch cluster read-only so that it does not accept write operations. As a result, documents are not indexed into Elasticsearch. You must clean up the storage and verify that you have sufficient space. For more information about increasing disk space, see Configuring Elasticsearch disk usage.

When you have sufficient disk space, you must explicitly run the following command to reset the setting so that data can be written again:
curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem -u $CLUSTERADMIN:$CLUSTERADMINPASS -H'Content-Type: application/json' -XPUT $es_protocol://$es_hostname:$es_port/_settings -d '{"index":{ "blocks":{"read_only_allow_delete":"false"}}}'
The variables are defined as follows:
es_protocol
Specifies the protocol for the URL. Use http if security is not enabled, or use https if security is enabled.
es_hostname
Specifies the hostname of the Elasticsearch client node.
es_port
Specifies the port that is used for communication to the Elasticsearch primary node. By default, the port is 9200. For more information, see Summary of ports used by IBM Spectrum Conductor.
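
Before and after you clear the block, it can help to check how much disk space each data node reports and which value the block setting currently has. The following is a minimal sketch, not a product command: the helper names es_url and es_get are hypothetical, and the sketch assumes that the variables described above and the EGO_TOP, CLUSTERADMIN, and CLUSTERADMINPASS variables are already set in your shell. The two request paths in the comments use the standard Elasticsearch cat allocation and get index settings APIs.

```shell
#!/bin/sh
# Sketch only: es_url and es_get are hypothetical helper names, not product commands.

# Build the base URL from the three variables defined above.
# Falls back to port 9200, the default that is stated above, when es_port is unset.
es_url() {
  printf '%s://%s:%s' "$es_protocol" "$es_hostname" "${es_port:-9200}"
}

# Send a GET request with the same certificate and credentials as the command above.
# $1 is the request path, for example /_cat/allocation?v
es_get() {
  curl --cacert "$EGO_TOP/wlp/usr/shared/resources/security/cacert.pem" \
       -u "$CLUSTERADMIN:$CLUSTERADMINPASS" \
       -XGET "$(es_url)$1"
}

# Example calls (commented out so that sourcing this file sends no requests):
#   es_get '/_cat/allocation?v'    # disk used and available for each data node
#   es_get '/_all/_settings/index.blocks.read_only_allow_delete?flat_settings=true&pretty'
```

Quoting the path keeps the shell from interpreting the ? and & characters. If the second request still reports the block as true after you free disk space, the PUT command above has not been applied to that index.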

Elasticsearch service remains in the TENTATIVE state if the cluster is restarted when host disk reaches the high disk watermark

Elasticsearch has configuration parameters that control disk-based allocation and the reallocation of data from one node to another.

Depending on the values that are configured for these parameters, the Elasticsearch service can hang in the TENTATIVE state when disk usage reaches the configured limits. The Elasticsearch service remains in the TENTATIVE state if the cluster is restarted when host disk usage is equal to or higher than the value of the cluster.routing.allocation.disk.watermark.high parameter.

As a best practice, do not restart the cluster when the disk usage reaches or exceeds this high disk watermark. If you restart the cluster and encounter this error, clean up the disk and verify that you have sufficient disk space. For more information about increasing disk space, see Configuring Elasticsearch disk usage.
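
A pre-restart check along these lines might look like the following sketch. The function names disk_used_pct and at_or_above_watermark are hypothetical, the data directory path in the example is a placeholder, and the value 90 is the upstream Elasticsearch default for the high watermark, which your cluster might override. The sketch also assumes that the watermark is expressed as a percentage; Elasticsearch additionally accepts ratios and absolute byte values, which this comparison does not handle. The figure that df reports can differ slightly from the usage that Elasticsearch calculates.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for checking a host before a cluster restart.

# Print the used-space percentage (integer, without the % sign) of the
# file system that holds the directory given as $1.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when the used percentage $1 is equal to or
# higher than the watermark percentage $2.
at_or_above_watermark() {
  [ "$1" -ge "$2" ]
}

# Example (commented out; replace the placeholder path with your data directory):
#   used=$(disk_used_pct /path/to/elasticsearch/data)
#   if at_or_above_watermark "$used" 90; then
#     echo "Disk usage is ${used}%: free up space before you restart the cluster"
#   fi
```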

Follow this high-level troubleshooting process to isolate hosts with this disk usage error:

  1. Check the Elasticsearch state by using the following command:
    curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem -u $CLUSTERADMIN:$CLUSTERADMINPASS -XGET "$es_protocol://$es_hostname:$es_port/_cluster/health?pretty" --tlsv1.2

    If the cluster is in the red state, the Elasticsearch service remains in the TENTATIVE state until all primary shards are active.

  2. Run the following command to see all shards and resolve any primary shards that are not in the STARTED state:
    curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem -u $CLUSTERADMIN:$CLUSTERADMINPASS -XGET $es_protocol://$es_hostname:$es_port/_cat/shards --tlsv1.2

    Before a shard can be used, it goes through the INITIALIZING state. If a shard cannot be assigned, the shard remains in the UNASSIGNED state with a reason code. For a list of the reasons that a primary shard might not be started, see Reasons for unassigned shard.
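
On a cluster with many shards, the output of step 2 can be long. The following sketch narrows it to the primary shards that are not in the STARTED state. The function name non_started_primaries is hypothetical. The sketch assumes that you request the index, shard, prirep, state, and unassigned.reason columns from the cat shards API with the h parameter, as shown in the comment.

```shell
#!/bin/sh
# Sketch only: non_started_primaries is a hypothetical helper name.

# Read _cat/shards rows on standard input, with the columns
#   index shard prirep state unassigned.reason
# and print only primary shards (prirep = p) that are not STARTED.
# A dash is printed when a row has no unassigned reason.
non_started_primaries() {
  awk '$3 == "p" && $4 != "STARTED" { print $1, $2, $4, ($5 == "" ? "-" : $5) }'
}

# Example (commented out so that sourcing this file sends no requests):
#   curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem \
#        -u $CLUSTERADMIN:$CLUSTERADMINPASS \
#        -XGET "$es_protocol://$es_hostname:$es_port/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
#        --tlsv1.2 | non_started_primaries
```

If the filter prints nothing, every primary shard is started, and a red cluster state is then unlikely to be caused by unassigned primaries.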