Circuit breaker for Kafka robustness and stability
Enable Cloud Pak for AIOps to protect itself and stay up and running if an event storm occurs. You can implement a circuit breaker pattern to prevent Cloud Pak for AIOps from failing and to ensure that events, alerts, and incidents continue to be processed.
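The circuit breaker idea can be illustrated with a minimal, hypothetical Python sketch. It assumes that Kafka PVC usage is available as used and capacity byte counts (for example, from the kubelet metrics `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes`), and the 85% trip and 70% reset thresholds are illustrative values, not product defaults. The `KafkaStorageBreaker` class is not part of Cloud Pak for AIOps; in the product, the equivalent behavior is built from Prometheus alerts, policies, and automations as described in this topic.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    CLOSED = "closed"  # normal operation: events keep flowing into Kafka
    OPEN = "open"      # tripped: ingestion should be throttled or paused


@dataclass
class KafkaStorageBreaker:
    """Hypothetical breaker that trips on high Kafka PVC usage.

    Two thresholds give hysteresis, so the breaker does not flap
    when usage hovers around a single limit.
    """
    trip_ratio: float = 0.85   # assumed value: open when usage >= 85%
    reset_ratio: float = 0.70  # assumed value: close again when usage <= 70%
    state: State = State.CLOSED

    def update(self, used_bytes: int, capacity_bytes: int) -> State:
        """Feed one usage sample and return the resulting breaker state."""
        if capacity_bytes <= 0:
            raise ValueError("capacity_bytes must be positive")
        ratio = used_bytes / capacity_bytes
        if self.state is State.CLOSED and ratio >= self.trip_ratio:
            self.state = State.OPEN
        elif self.state is State.OPEN and ratio <= self.reset_ratio:
            self.state = State.CLOSED
        return self.state


if __name__ == "__main__":
    breaker = KafkaStorageBreaker()
    gib = 1024 ** 3
    for used in (50, 86, 80, 69):  # sample GiB used on a 100 GiB PVC
        print(used, breaker.update(used * gib, 100 * gib).value)
```

Running the script prints `closed`, `open`, `open`, `closed` for the four samples. The 80% sample keeps the breaker open because usage is still above the reset threshold, which is the point of using two thresholds instead of one.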
The following sections show the end-to-end flow of configuring a circuit breaker for Kafka robustness:
- Configuring Prometheus.
- Creating pod monitors.
- Creating Prometheus rules that generate alerts specific to Cloud Pak for AIOps Kafka storage and health.
- Pushing alerts that are generated in OpenShift to Cloud Pak for AIOps by using a webhook integration.
- Creating Cloud Pak for AIOps policies that generate incidents.
- Integrating incidents with Cloud Pak for AIOps automations.
Note: There is some overlap with the initial two steps of these instructions and the procedure for Monitoring with Prometheus and Grafana.
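As an illustration of the Prometheus rules step, the following hypothetical Python sketch builds a `PrometheusRule` custom resource (API version `monitoring.coreos.com/v1`) with two alerts, one for Kafka pod restarts and one for Kafka PVC usage, and prints it as JSON that can be piped to `oc apply -f -`. The namespace `cp4aiops`, the `.*kafka.*` name selectors, the thresholds, and the alert names are all assumptions; adjust them to match your installation and the rules that are described in the sub-topics.

```python
import json


def kafka_prometheus_rule(namespace: str = "cp4aiops",
                          usage_threshold: float = 0.85) -> dict:
    """Build a hypothetical PrometheusRule for Kafka health and storage.

    All names, selectors, and thresholds are illustrative assumptions.
    """
    # Assumed selector: PVCs in the namespace whose name contains "kafka".
    pvc = f'namespace="{namespace}", persistentvolumeclaim=~".*kafka.*"'
    storage_expr = (
        f"kubelet_volume_stats_used_bytes{{{pvc}}}"
        f" / kubelet_volume_stats_capacity_bytes{{{pvc}}}"
        f" > {usage_threshold}"
    )
    # Fires when any container in an assumed Kafka pod restarted recently.
    restart_expr = (
        "increase(kube_pod_container_status_restarts_total"
        f'{{namespace="{namespace}", pod=~".*kafka.*"}}[15m]) > 0'
    )
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": "kafka-circuit-breaker", "namespace": namespace},
        "spec": {
            "groups": [{
                "name": "kafka-circuit-breaker",
                "rules": [
                    {
                        "alert": "KafkaPodRestarting",
                        "expr": restart_expr,
                        "labels": {"severity": "critical"},
                        "annotations": {
                            "summary": "A Kafka pod restarted in the last 15 minutes."
                        },
                    },
                    {
                        "alert": "KafkaPVCAlmostFull",
                        "expr": storage_expr,
                        "for": "10m",
                        "labels": {"severity": "critical"},
                        "annotations": {
                            "summary": f"A Kafka PVC is more than {usage_threshold:.0%} full."
                        },
                    },
                ],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(kafka_prometheus_rule(), indent=2))
```

Generating the manifest from a function keeps the namespace and threshold in one place, so the same rule can be reused across clusters with different storage sizes.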
Use cases for a circuit breaker
- Monitor the health of Kafka for restarts, failures, and so on, and send a high-priority alert that different components can act on.
- Monitor and react to the amount of storage that is used by the Kafka PVC.
- Prevent Kafka, and the databases that it feeds into, from running out of storage when too much data is sent to Kafka.
View the following sub-topics for details on implementing a circuit breaker pattern for Kafka:
- Creating a cluster monitoring configmap object
- Creating a user-defined workload monitoring configmap
- Configuring core OpenShift Container Platform monitoring components
- Configuring components that monitor user-defined projects
- Creating a Prometheus rule to generate OpenShift alerts
- Creating Prometheus rules to generate alerts for storage issues
- Pushing alerts that are generated in OpenShift into Cloud Pak for AIOps by using a webhook integration
- Creating a policy to promote critical Prometheus alerts to an incident
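The last two sub-topics forward alerts through a webhook and promote critical ones to incidents. In Cloud Pak for AIOps this promotion is configured as a policy, not written as code; the following hypothetical Python sketch only illustrates the selection logic. It assumes the webhook receives the standard Alertmanager webhook payload (a top-level `alerts` list whose entries carry `status`, `labels`, `annotations`, and `startsAt`) and that critical alerts carry the label `severity="critical"`. The function name `critical_firing_alerts` is illustrative.

```python
from typing import Any


def critical_firing_alerts(payload: dict[str, Any]) -> list[dict[str, str]]:
    """Return a summary of firing alerts labeled severity="critical".

    `payload` is assumed to follow the Alertmanager webhook format.
    """
    selected = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        # Only alerts that are still firing and marked critical qualify.
        if alert.get("status") == "firing" and labels.get("severity") == "critical":
            selected.append({
                "alertname": labels.get("alertname", ""),
                "summary": alert.get("annotations", {}).get("summary", ""),
                "startsAt": alert.get("startsAt", ""),
            })
    return selected


if __name__ == "__main__":
    # Illustrative sample payload, not captured from a real cluster.
    sample = {
        "version": "4",
        "status": "firing",
        "alerts": [
            {"status": "firing",
             "labels": {"alertname": "KafkaPVCAlmostFull", "severity": "critical"},
             "annotations": {"summary": "A Kafka PVC is more than 85% full."},
             "startsAt": "2024-01-01T00:00:00Z"},
            {"status": "resolved",
             "labels": {"alertname": "KafkaPodRestarting", "severity": "critical"},
             "annotations": {},
             "startsAt": "2024-01-01T00:00:00Z"},
        ],
    }
    for alert in critical_firing_alerts(sample):
        print(alert["alertname"], "-> promote to incident")
```

With the sample payload, only `KafkaPVCAlmostFull` is selected, because the restart alert has already resolved.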