List of issues with severity tuned in release 140

List of issues with severity changed from CRITICAL to WARNING

  • Frequent TCP fails
  • Elasticsearch is rejecting bulk and index requests
  • Permanent TCP retransmissions
  • Frequent TCP errors
  • System load too high
  • Abnormally high backend commit duration (more than 5 seconds).
  • Abnormally high snapshot duration (more than 5 seconds).
  • Usage of open file descriptors is critical.
  • Proposals being applied started to fall in the last minute.
  • Failed proposals started to rise in the last minute.
  • Redis is rejecting connections
  • Thread creation is failing (Varnish).
  • Varnish is out of worker threads, it is queueing up requests.
  • Flush all command executed.
  • Region server block cache hit ratio is as follows 40%.
  • Blocked Threadpools
  • Dropped Messages
  • Number of queued requests is larger than 500.
  • Too much memory used for the Elasticsearch heap leaving not enough memory for Lucene
  • Elasticsearch node status is RED
  • Elasticsearch node is at the capacity limit
  • Heap usage above 95%
  • Elasticsearch node is at the capacity limit while relocating/rebalancing
  • DL queue size is growing
  • Queue is filling
  • RabbitMQ Erlang Processes count is critical on node
  • RabbitMQ File Descriptors Usage is critical on node
  • HAProxy frontend session usage is bigger than 95%
  • Resource manager is reporting lost node(s)
  • Spark standalone master is reporting dead worker(s)
  • Scheduling delay is increasing too fast thus application is falling behind
  • Scheduling delay is too high and application is lagging
  • Kafka request handler thread is under high load
  • Kafka network thread is under high load
  • Kafka under-replicated partition
  • Maximum request latency is high (more than 100 ms).
  • Garbage Collection Activity High (when GC time is less than 80%)

List of issues with severity changed from WARNING to CRITICAL

  • CPU credit balance reached zero for this RDS instance.