Pre-defined health alerts

Your IBM Cloud Pak for AIOps deployment is automatically configured with self-monitoring that detects and alerts on critical issues, such as low storage capacity. Red Hat OpenShift Container Platform's Alertmanager sends these alerts to Cloud Pak for AIOps automatically.

Monitoring

Cloud Pak for AIOps includes the following automated monitoring capabilities:

Storage monitoring

Default Prometheus rules monitor all Cloud Pak for AIOps persistent volume claims (PVCs) and generate alerts when storage usage reaches the following thresholds:

  • Warning alerts are generated when PVC usage reaches 75% (25% available space remaining)
  • Critical alerts are generated when PVC usage reaches 85% (15% available space remaining)
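The thresholds above can be illustrated with a small classification function. This is a minimal sketch in Python, not the Prometheus rule that is shipped with the product. The function name classify_pvc_usage and its byte-based parameters are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Thresholds taken from the list above (percent of PVC capacity used).
WARNING_THRESHOLD = 75.0
CRITICAL_THRESHOLD = 85.0


def classify_pvc_usage(used_bytes: int, capacity_bytes: int) -> Optional[str]:
    """Return "critical", "warning", or None for a PVC (hypothetical helper)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    used_percent = 100.0 * used_bytes / capacity_bytes
    # Check the higher threshold first so a PVC at 90% is not reported as a warning.
    if used_percent >= CRITICAL_THRESHOLD:
        return "critical"
    if used_percent >= WARNING_THRESHOLD:
        return "warning"
    return None
```

For example, a 100 GiB PVC with 80 GiB used falls in the warning range, and the same PVC with 85 GiB or more used falls in the critical range.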

PVCs for the following components are monitored:

  • Kafka
  • Zookeeper
  • Cassandra
  • CouchDB
  • EDB Postgres
  • Elasticsearch
  • MinIO
  • Redis
  • Zen Metastore EDB

Alert processing

Alerts generated by this automation can be identified by the attribute alert.sender.name = Prometheus. Create policies to handle these alerts as needed. To learn more, see Creating policies.
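If you export or post-process alerts outside of policies, the same sender condition can be applied in code. The following is a minimal sketch that assumes each alert is available as a nested dictionary in which the attribute alert.sender.name maps to alert["sender"]["name"]. That representation and both function names are assumptions for illustration, not a documented Cloud Pak for AIOps API.

```python
from typing import Any, Dict, Iterable, List

SELF_MONITORING_SENDER = "Prometheus"  # value of alert.sender.name from the text above


def is_self_monitoring_alert(alert: Dict[str, Any]) -> bool:
    """True if the alert's sender name is Prometheus (assumed dict layout)."""
    sender = alert.get("sender")
    # Tolerate alerts with a missing or malformed sender block.
    return isinstance(sender, dict) and sender.get("name") == SELF_MONITORING_SENDER


def select_self_monitoring_alerts(alerts: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the alerts that were raised by the self-monitoring automation."""
    return [alert for alert in alerts if is_self_monitoring_alert(alert)]
```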

The following example policy shows a scope-based grouping:

[Figure: Scope-based grouping]

Automated webhook configuration

A webhook instance called selfmonitoring-webhook is automatically created during installation to:

  • Receive alerts from OpenShift Container Platform Alertmanager
  • Forward critical alerts to the Cloud Pak for AIOps Alert Viewer
  • Filter alerts based on severity and namespace
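The severity and namespace filtering can be pictured with the following sketch. It is not the implementation of selfmonitoring-webhook. It assumes the standard Alertmanager webhook payload, in which each entry of "alerts" carries a "labels" mapping, and it assumes that the labels include "severity" and "namespace". The function name filter_alerts and its default of forwarding only critical alerts are hypothetical.

```python
from typing import Any, Collection, Dict, List


def filter_alerts(
    payload: Dict[str, Any],
    namespace: str,
    severities: Collection[str] = ("critical",),
) -> List[Dict[str, Any]]:
    """Select alerts from an Alertmanager webhook payload (illustrative only).

    An alert is kept when its "namespace" label equals the given namespace
    and its "severity" label is one of the accepted severities.
    """
    selected = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if labels.get("namespace") != namespace:
            continue
        if labels.get("severity") not in severities:
            continue
        selected.append(alert)
    return selected
```

With a payload that contains a critical and a warning alert from the Cloud Pak for AIOps namespace plus a critical alert from another namespace, calling filter_alerts with the Cloud Pak for AIOps namespace returns only the first alert.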

This webhook is hidden from the Cloud Pak for AIOps UI to prevent accidental modification or deletion.

Note: Red Hat OpenShift Alertmanager requires a trusted webhook endpoint. If your IBM Cloud Pak for AIOps deployment does not use certificates that Red Hat OpenShift trusts, Alertmanager cannot send alerts to IBM Cloud Pak for AIOps. You must either use trusted certificates or add the IBM Cloud Pak for AIOps certificate authority (CA) signing certificate to the Red Hat OpenShift CA bundle. For more information about certificates, see Updating the CA bundle.