Circuit breaker for Kafka robustness and stability

Enable Cloud Pak for AIOps to protect itself and keep running if an event storm occurs. By implementing a circuit breaker pattern, you can prevent Cloud Pak for AIOps from failing and ensure that events, alerts, and incidents continue to be processed.
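
The term comes from a general software pattern. A breaker stays closed while work succeeds, opens after repeated failures so that no further work is attempted, and after a cool-down period lets trial work through (half-open) before closing again. The following Python sketch illustrates only that generic state machine. The class `CircuitBreaker` and its parameters (`failure_threshold`, `reset_timeout`) are hypothetical and are not part of Cloud Pak for AIOps; there, the trip condition comes from Prometheus alerts and the reaction comes from policies and automations.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical, generic circuit breaker (not a Cloud Pak for AIOps API).

    CLOSED -> OPEN after `failure_threshold` consecutive failures.
    OPEN -> HALF_OPEN once `reset_timeout` seconds have elapsed.
    HALF_OPEN -> CLOSED on success, or back to OPEN on failure.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the behavior can be tested
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if work may proceed right now."""
        if self.state == "OPEN":
            assert self.opened_at is not None
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # cool-down over: allow trial work
                return True
            return False  # breaker is open: shed the work
        return True  # CLOSED or HALF_OPEN

    def record_success(self) -> None:
        """A successful call resets the breaker to CLOSED."""
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a failure; trip the breaker if the threshold is reached
        or if a trial call in HALF_OPEN fails."""
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

For example, with `failure_threshold=2`, two calls to `record_failure()` open the breaker, and `allow()` returns `False` until `reset_timeout` seconds have passed.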

The following sections show the end-to-end flow of configuring a circuit breaker for Kafka robustness:

  • Configuring Prometheus.
  • Creating pod monitors.
  • Creating Prometheus rules that generate alerts specific to Cloud Pak for AIOps Kafka storage and health.
  • Pushing alerts generated in OpenShift to Cloud Pak for AIOps by using a webhook integration.
  • Creating Cloud Pak for AIOps policies to generate incidents.
  • Integrating incidents with Cloud Pak for AIOps automations.

Note: The first two steps of these instructions partly overlap with the procedure for Monitoring with Prometheus and Grafana.
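
When OpenShift pushes alerts to a webhook receiver, Alertmanager sends them as JSON with a top-level `alerts` list whose entries carry `status`, `labels`, and `annotations`. In Cloud Pak for AIOps, the decision about which alerts become incidents is made by the policies that you create, not by custom code; the following Python sketch only illustrates the kind of filtering such a policy expresses. The alert names `KafkaPVCStorageHigh` and `KafkaBrokerRestarting`, the severity-to-priority mapping, and the function `incident_candidates` are hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical alert names: use the names defined in your own PrometheusRule.
KAFKA_ALERTS = {"KafkaPVCStorageHigh", "KafkaBrokerRestarting"}

# Hypothetical severity -> incident priority mapping (1 = highest).
PRIORITY = {"critical": 1, "warning": 3}


def incident_candidates(payload: str) -> List[Dict[str, object]]:
    """Pick firing Kafka alerts out of an Alertmanager webhook payload and
    describe the incident that a policy might raise for each one."""
    body = json.loads(payload)
    candidates: List[Dict[str, object]] = []
    for alert in body.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") != "firing":
            continue  # resolved alerts do not open incidents
        if labels.get("alertname") not in KAFKA_ALERTS:
            continue  # not one of the Kafka circuit-breaker alerts
        severity = labels.get("severity", "warning")
        candidates.append({
            "alertname": labels["alertname"],
            "priority": PRIORITY.get(severity, 4),  # unknown severity -> lowest
            "summary": alert.get("annotations", {}).get("summary", ""),
        })
    return candidates
```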

Use cases for the circuit breaker

  • Monitoring the health of Kafka (for example, restarts and failures) and sending a high-priority alert that different components can act on.

  • Monitoring and reacting to the amount of storage that the Kafka PVC uses.

  • Preventing Kafka, and the databases that it feeds, from running out of storage when too much data is sent to Kafka.
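
These use cases come down to threshold checks on metrics that are typically available in OpenShift monitoring: PVC usage from the kubelet metrics `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes`, and container restarts from the kube-state-metrics counter `kube_pod_container_status_restarts_total`. In practice, these checks belong in a Prometheus rule; the following Python sketch only mirrors the logic so that the thresholds are easy to reason about. The 80% and 90% thresholds, the restart limit of 3, and the function names are hypothetical.

```python
from typing import Optional

# Hypothetical thresholds: tune them to your PVC size and data rate.
WARNING_RATIO = 0.80
CRITICAL_RATIO = 0.90


def pvc_usage_severity(used_bytes: float, capacity_bytes: float) -> Optional[str]:
    """Classify Kafka PVC usage the way a Prometheus rule on
    kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes might.
    Returns None when no alert should fire."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    ratio = used_bytes / capacity_bytes
    if ratio >= CRITICAL_RATIO:
        return "critical"
    if ratio >= WARNING_RATIO:
        return "warning"
    return None


def restart_severity(restarts_in_window: int, threshold: int = 3) -> Optional[str]:
    """Flag a Kafka pod that restarted too often within the evaluation window,
    similar to alerting on increase(kube_pod_container_status_restarts_total[15m])."""
    return "critical" if restarts_in_window >= threshold else None
```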

View the following sub-topics for details on implementing a circuit breaker pattern for Kafka:
