Elastic Stack troubleshooting

Follow this high-level troubleshooting process to isolate and resolve problems with the Elastic Stack, for example, when a service in the Elastic Stack is in the Error state or remains in the TENTATIVE state.

  1. For information about important configurations, what to monitor, and how to diagnose and prevent problems, see the following Elastic documentation:
    • Monitoring
    • Cluster Health
      Tip: A red cluster indicates that at least one primary shard and all of its replicas are missing. As a result, the data in that shard is not available, searches return partial results, and indexing into that shard returns errors.
    • Monitoring Individual Nodes to troubleshoot each node. Identify the troublesome indices and determine why the shards are not available. Check the disks or review the logs for errors and warnings. If the issue stems from node failure or hard disk failure, take steps to bring the node online.
    • cat API to view cluster statistics. (A Python sketch that scripts these cluster health checks follows this procedure.)
  2. Based on the type of error that you encounter, refer to the appropriate Elastic Stack log file. (A sketch that scans these logs for errors and warnings follows this procedure.)
    Table 1. Elastic Stack log files
    Elastic Stack manager service log (standard out or error log)
      Default log location: $EGO_TOP/integration/elk/log/manager-[out|err].log.*
    Elasticsearch service log (standard out or error log for the primary, client, or data service)
      Default log location: $EGO_TOP/integration/elk/log/es-[out|err].log.[master|client|data].*
    Elasticsearch runtime log (runtime log for the primary, client, or data service)
      Default log location: $EGO_TOP/integration/elk/log/elasticsearch/*.log.[master|client|data]_*
    Logstash (indexer) service log (standard out or error log)
      Default log location: $EGO_TOP/integration/elk/log/indexer-[out|err].log.*
    Logstash (indexer) runtime log
      Default log location: $EGO_TOP/integration/elk/log/logstash/logstash-plain.log.*
    Filebeat (shipper) service log (standard out or error log)
      Default log location: $EGO_TOP/integration/elk/log/shipper-[out|err].log.*
    Filebeat (shipper) runtime log
      Default log location: $EGO_TOP/integration/elk/log/filebeat/filebeat.log.*
  3. Resolve any of the following problems that might occur:
    Out of memory exception or Java heap size reached
    The default Elasticsearch installation uses a 10 GB heap for the Elasticsearch services and a 4 GB heap for the Logstash service, which fits within the 24 GB of RAM in the IBM® Spectrum Symphony system requirements. If your hosts have more than 24 GB of memory and you need a larger heap, for example, for system performance reasons, you can increase the Elasticsearch and Logstash heap sizes in IBM Spectrum Symphony. For more information, see Tuning the heap sizes for Elasticsearch to accommodate heavy load. (A sketch that checks JVM heap usage on each node follows this procedure.)
    Disk full or watermark is reached
    The Elasticsearch service can remain in the TENTATIVE state when it reaches the limitations that are defined in the Elasticsearch watermark parameters.
    Consider cleaning up disk space and increasing the watermarks. For more information, see Configuring Elasticsearch disk usage.
    If you see the TOO_MANY_REQUESTS/12/index read-only error in the logs, a safeguard is in place that sets the read_only_allow_delete parameter to true. You must clean up the storage and verify that you have sufficient space before you run a command to revert the setting. For more information, see Resolving reports on full disk or watermark reached. (A sketch that checks disk usage and clears the read-only block follows this procedure.)
    Red cluster or UNASSIGNED shards
    The Elasticsearch service can remain in the TENTATIVE state when at least one primary shard and all its replicas are missing.
    First, rule out a full disk or a reached watermark; for more information, see Resolving reports on full disk or watermark reached. Next, see Resolving red cluster or UNASSIGNED shards. (A sketch that uses the cluster allocation explain API to report why a shard is unassigned follows this procedure.)
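
The cluster health checks in step 1 can be scripted. The following minimal sketch, in Python with only the standard library, queries the cluster health API and the cat shards API to report the cluster status and any UNASSIGNED shards. The endpoint URL and the lack of authentication are assumptions; adjust them to match your deployment:

import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your Elasticsearch client endpoint

def get(path):
    # Plain HTTP GET against the Elasticsearch REST API.
    with urllib.request.urlopen(ES_URL + path) as resp:
        return resp.read().decode("utf-8")

# Overall cluster status: green, yellow, or red.
health = json.loads(get("/_cluster/health"))
print("cluster status:", health["status"])

# List every shard and flag the ones that are UNASSIGNED.
for line in get("/_cat/shards?h=index,shard,prirep,state").splitlines():
    if "UNASSIGNED" in line:
        print("unassigned shard:", line)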
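
To scan the log files in Table 1 for errors and warnings, a sketch like the following can help. It assumes that EGO_TOP is set in the environment and that the glob patterns approximate the default locations in Table 1; verify both against your installation:

import glob
import os

# Assumption: EGO_TOP is exported in the environment, as in the paths in Table 1.
log_root = os.path.join(os.environ["EGO_TOP"], "integration", "elk", "log")

# Glob patterns that approximate the default log locations in Table 1.
patterns = [
    "manager-*.log*",
    "es-*.log*",
    "elasticsearch/*.log*",
    "indexer-*.log*",
    "logstash/logstash-plain.log*",
    "shipper-*.log*",
    "filebeat/filebeat.log*",
]

for pattern in patterns:
    for path in sorted(glob.glob(os.path.join(log_root, pattern))):
        with open(path, errors="replace") as log:
            for number, line in enumerate(log, start=1):
                if "ERROR" in line or "WARN" in line:
                    print(f"{path}:{number}: {line.rstrip()}")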
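
Before tuning heap sizes, it can help to confirm that heap pressure is actually the problem. The following sketch queries the nodes stats API and prints the JVM heap usage of each Elasticsearch node; the endpoint URL is an assumption:

import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your Elasticsearch client endpoint

# JVM statistics for every node in the cluster.
with urllib.request.urlopen(ES_URL + "/_nodes/stats/jvm") as resp:
    stats = json.loads(resp.read().decode("utf-8"))

for node in stats["nodes"].values():
    mem = node["jvm"]["mem"]
    print(f"{node['name']}: heap {mem['heap_used_percent']}% used "
          f"({mem['heap_used_in_bytes']} of {mem['heap_max_in_bytes']} bytes)")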
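
For the full disk or watermark case, the following sketch first reports per-node disk usage through the cat allocation API and then, only after you clean up storage, clears the read_only_allow_delete block on all indices. The endpoint URL is an assumption, and applying the settings change to _all indices is a choice that you might want to narrow to specific indices:

import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your Elasticsearch client endpoint

# Per-node disk usage as seen by the shard allocator.
with urllib.request.urlopen(ES_URL + "/_cat/allocation?v&h=node,disk.percent,disk.used,disk.total") as resp:
    print(resp.read().decode("utf-8"))

# Run this part only after you have cleaned up storage: clear the
# read_only_allow_delete block that the flood stage watermark set on the indices.
body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
request = urllib.request.Request(
    ES_URL + "/_all/_settings",
    data=body,
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(request) as resp:
    print(resp.read().decode("utf-8"))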
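
For a red cluster or UNASSIGNED shards, the cluster allocation explain API reports why a shard cannot be assigned. The following sketch calls it with no request body, which makes Elasticsearch pick an arbitrary unassigned shard; the endpoint URL is an assumption, and the call returns an HTTP error when every shard is assigned:

import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your Elasticsearch client endpoint

# With no request body, the API explains an arbitrary unassigned shard.
with urllib.request.urlopen(ES_URL + "/_cluster/allocation/explain") as resp:
    explain = json.loads(resp.read().decode("utf-8"))

print("index:", explain.get("index"))
print("shard:", explain.get("shard"), "primary:", explain.get("primary"))
print("unassigned reason:", explain.get("unassigned_info", {}).get("reason"))
print("explanation:", explain.get("allocate_explanation"))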