Elastic Stack troubleshooting
Follow this high-level troubleshooting process to isolate and resolve problems with the Elastic Stack, for example, when an Elastic Stack service is in the ERROR state or remains in the TENTATIVE state.
- For information about important configurations, what to monitor, and how to diagnose and prevent problems, see the following Elastic documentation:
- Monitoring
- Cluster Health. Tip: A red cluster indicates that at least one primary shard and all of its replicas are missing. As a result, the data in that shard is unavailable, searches return partial results, and indexing into that shard returns errors.
- Monitoring Individual Nodes to troubleshoot each node. Identify the troublesome indices and determine why the shards are not available. Check the disks or review the logs for errors and warnings. If the issue stems from node failure or hard disk failure, take steps to bring the node online.
- cat API to view cluster statistics. A quick health check that uses these APIs follows this list.
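For a quick first check, you can query the cluster health and cat APIs directly. This sketch assumes that Elasticsearch listens on its default port (9200) on the local host; adjust the host, port, and any security credentials to match your deployment:

  curl -s "http://localhost:9200/_cluster/health?pretty"
  curl -s "http://localhost:9200/_cat/nodes?v"
  curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"

A yellow status means that all primary shards are allocated but one or more replicas are not; a red status means that at least one primary shard is unassigned.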
- Based on the type of error you encounter, refer to the appropriate Elastic Stack log file.
Table 1. Elastic Stack log files (log file and its default location)
- Elastic Stack manager service log: standard out or error log.
  $EGO_TOP/integration/elk/log/manager-[out|err].log.*
- Elasticsearch service log: standard out or error log for the primary, client, or data service.
  $EGO_TOP/integration/elk/log/es-[out|err].log.[master|client|data].*
- Elasticsearch runtime log: runtime log for the primary, client, or data service.
  $EGO_TOP/integration/elk/log/elasticsearch/*.log.[master|client|data]_*
- Logstash (indexer) service log: standard out or error log.
  $EGO_TOP/integration/elk/log/indexer-[out|err].log.*
- Logstash (indexer) runtime log: runtime log.
  $EGO_TOP/integration/elk/log/logstash/logstash-plain.log.*
- Filebeat (shipper) service log: standard out or error log.
  $EGO_TOP/integration/elk/log/shipper-[out|err].log.*
- Filebeat (shipper) runtime log: runtime log.
  $EGO_TOP/integration/elk/log/filebeat/filebeat.log.*
- Resolve any of the following problems that might occur (illustrative command examples follow this list):
- Out of memory exception or Java heap size reached
- By default, Elasticsearch installs with a 10 GB heap for the Elasticsearch services and a 4 GB heap for the Logstash service, which fits within the 24 GB RAM system requirement for IBM® Spectrum Conductor. If your hosts have more than 24 GB of memory and you need to increase the heap, for example, for system performance reasons, you can increase the Elasticsearch and Logstash heap sizes in IBM Spectrum Conductor. For more information, see Tuning the heap sizes for Elasticsearch and Logstash to accommodate heavy load.
- Disk full or watermark reached
- The Elasticsearch service can remain in the TENTATIVE state when it reaches the limits that are defined in the Elasticsearch disk watermark parameters.
- Too many buckets exception thrown or charts not displaying properly
- On the Resource Usage page of the cluster management console or the instance group management console, if you have many applications, charts might not display properly and you might encounter exceptions about failing to retrieve data and needing more buckets for aggregation.
- Red cluster or UNASSIGNED shards
- The Elasticsearch service can remain in the TENTATIVE state when at least one primary shard and all of its replicas are missing.
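For the out of memory problem, the following is a minimal sketch of the kind of change that the tuning topic describes. Elasticsearch and Logstash each read their heap settings from a jvm.options file; the exact file locations in an IBM Spectrum Conductor installation are not shown here, so follow Tuning the heap sizes for Elasticsearch and Logstash to accommodate heavy load for the supported procedure. The 16 GB value is illustrative only:

  # jvm.options (illustrative values; keep -Xms equal to -Xmx)
  -Xms16g
  -Xmx16g

As a general Elasticsearch guideline, keep the heap at or below 50% of the physical memory on the host and below approximately 32 GB so that the JVM can use compressed object pointers.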
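For the disk watermark problem, you can inspect per-node disk usage with the cat allocation API and, if freeing disk space is not immediately possible, raise the watermark thresholds. This sketch assumes Elasticsearch on localhost:9200, and the threshold values are examples only:

  curl -s "http://localhost:9200/_cat/allocation?v"
  curl -X PUT "http://localhost:9200/_cluster/settings" \
    -H 'Content-Type: application/json' -d '
  {
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "90%",
      "cluster.routing.allocation.disk.watermark.high": "95%"
    }
  }'

Freeing disk space is the durable fix; raising the watermarks only defers the problem.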
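For the too many buckets exception, Elasticsearch caps the number of aggregation buckets that a single search can create through the dynamic search.max_buckets cluster setting. This sketch assumes Elasticsearch on localhost:9200, and the limit shown is illustrative:

  curl -X PUT "http://localhost:9200/_cluster/settings" \
    -H 'Content-Type: application/json' -d '
  {
    "persistent": {
      "search.max_buckets": 20000
    }
  }'

Because each bucket consumes memory during aggregation, increase this limit conservatively.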
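For a red cluster or UNASSIGNED shards, the cluster allocation explain API reports why a specific shard cannot be allocated. This sketch assumes Elasticsearch on localhost:9200:

  curl -s "http://localhost:9200/_cluster/allocation/explain?pretty"
  curl -s "http://localhost:9200/_cat/shards?v" | grep UNASSIGNED

The explanation typically identifies the root cause, such as a node that left the cluster, a full disk, or an allocation filter, which you can then address directly.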