List of issues with severity tuned in release 140
List of issues with severity changed from CRITICAL to WARNING
- Frequent TCP fails
- Elasticsearch is rejecting bulk and index requests
- Permanent TCP retransmissions
- Frequent TCP errors
- System load too high
- Abnormally high backend commit duration (more than 5 seconds).
- Abnormally high snapshot duration (more than 5 seconds).
- Usage of open file descriptors is critical.
- Proposals being applied started to fall in the last minute.
- Failed proposals started to rise in the last minute.
- Redis is rejecting connections
- Thread creation is failing (Varnish).
- Varnish is out of worker threads and is queueing up requests.
- Flush all command executed.
- Region server block cache hit ratio is at 40%.
- Blocked Threadpools
- Dropped Messages
- Number of queued requests is larger than 500.
- Too much memory used for the Elasticsearch heap, leaving not enough memory for Lucene
- Elasticsearch node status is RED
- Elasticsearch node is at the capacity limit
- Heap usage above 95%
- Elasticsearch node is at the capacity limit while relocating/rebalancing
- DL queue size is growing
- Queue is filling
- RabbitMQ Erlang Processes count is critical on node
- RabbitMQ File Descriptors Usage is critical on node
- HAProxy frontend session usage is higher than 95%
- Resource manager is reporting lost node(s)
- Spark standalone master is reporting dead worker(s)
- Scheduling delay is increasing too fast, so the application is falling behind
- Scheduling delay is too high and the application is lagging
- Kafka request handler thread is under high load
- Kafka network thread is under high load
- Kafka under-replicated partition
- Maximum request latency is high (more than 100 ms).
- Garbage Collection Activity High (when GC time is less than 80%)
List of issues with severity changed from WARNING to CRITICAL
- CPU credit balance reached zero for this RDS instance.