Self-monitoring with alerts
You can use the built-in Alertmanager of Red Hat® OpenShift® Container Platform to monitor the underlying infrastructure and send alerts to Cloud Pak for AIOps when a problem occurs.
Cloud Pak for AIOps can ingest events from multiple sources and analyze them to reduce noise, so that you can focus on the events that matter most in your network. But what if Cloud Pak for AIOps itself, or the infrastructure that it runs on, has a looming problem that might result in the loss of your event management system?
Because Cloud Pak for AIOps runs on OpenShift Container Platform, you can use the platform's built-in Alertmanager feature to monitor the underlying infrastructure, and configure it to send alerts to Cloud Pak for AIOps when it detects a problem. Examples of problems include pod restarts, low storage, or latency issues, any of which can cause the loss of the event management system.
Note: Alertmanager sends initial Warning alerts and then escalates them to Critical as the issue becomes more serious. For example, when storage usage passes a threshold, you initially get a Warning alert; when usage reaches 90%, the alert becomes Critical.
Complete the following steps to configure monitoring by using Alertmanager:
1. Create a Generic Webhook integration.
- Log in to the IBM Cloud Pak for AIOps console.
- Expand the navigation menu (four horizontal bars), then click Define > Integrations.
- On the Integrations page, click Add integration.
- From the list of available integrations, search for Generic Webhook and click the tile to set up a new webhook ingestion endpoint.
- In the Add Integration dialog box, provide a name for the integration, select None as the Authentication type, and click Next.
- Click Load sample mapping and select Prometheus Alert Manager to configure the integration with the correct schema mapping from Prometheus to Cloud Pak for AIOps alerts. Click Done.
2. Copy the webhook URL to the Red Hat® OpenShift® Alertmanager user interface.
- From the new Generic Webhook integration that you created, copy the full webhook URL and switch to the OpenShift Container Platform console. An example URL looks like https://whconn-6492ea38-cf72-4f3d-b7d3-9fbbc2b1065c-aiops.apps.example.com/webhook-connector/ezca4fgo2tj
- In the OpenShift Container Platform console, go to Administration > Cluster Settings > Configuration > Alertmanager.
- In the Receivers section, edit the Critical and Default receivers.
- Select Webhook for the Receiver Type field and paste the full URL into the URL field.
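Before you rely on Alertmanager traffic, it can help to confirm that the webhook endpoint accepts a Prometheus-style payload. The following is a minimal sketch, not part of the product: it builds a payload in the layout that Alertmanager uses for webhook receivers (version 4) and posts it with the Python standard library. The alert name SelfMonitoringTest, the namespace, the annotation text, and the helper names build_test_payload and post_payload are hypothetical, and the sketch assumes that the integration uses the Prometheus Alert Manager sample mapping with no authentication.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_payload(alertname="SelfMonitoringTest", severity="warning"):
    """Build a minimal payload in the Alertmanager webhook (version 4) layout.

    All label and annotation values here are hypothetical test data.
    """
    labels = {
        "alertname": alertname,
        "severity": severity,
        "namespace": "example-namespace",
    }
    annotations = {"summary": "Hypothetical test alert for the webhook integration"}
    alert = {
        "status": "firing",
        "labels": labels,
        "annotations": annotations,
        "startsAt": datetime.now(timezone.utc).isoformat(),
        "endsAt": "0001-01-01T00:00:00Z",  # zero time: the alert is still firing
        "generatorURL": "",
        "fingerprint": "0000000000000000",
    }
    return {
        "version": "4",
        "groupKey": "{}:{}",
        "truncatedAlerts": 0,
        "status": "firing",
        "receiver": "Default",
        "groupLabels": {},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [alert],
    }


def post_payload(webhook_url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (replace the URL with the one copied from your integration):
# post_payload("https://<your-webhook-host>/webhook-connector/<id>", build_test_payload())
```

If the post succeeds, the test alert would be expected to appear in the Alert Viewer, assuming the sample mapping is in place.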
3. Configure Alertmanager to send the alerts you are interested in.
Use the Routing labels section to further filter the alerts that get sent to Cloud Pak for AIOps.
In the preceding sample figure (with caption Webhook receiver), the value .* is used to send all alerts.
In the Routing labels section of the Webhook receiver, you can use alertname with a value of .* to allow all alerts through, or you can filter by using, for instance, alertname with a value of SystemMemoryExceedsReservation, or any other regular expression that gives you the set of alerts that you want to manage in Cloud Pak for AIOps. When an OpenShift Container Platform alert is generated, it has labels attached to it, as shown in the following example output:
When you save the Alertmanager configuration, any existing active alerts appear in your Alert Viewer.
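The effect of a routing label on which alerts are forwarded can be illustrated with a short sketch. It is an approximation only: Alertmanager evaluates regular expressions with Go's RE2 syntax and anchors them to the whole label value, which the sketch imitates with Python's re.fullmatch. The helper name matches_routing_label and the two sample alerts are hypothetical; only SystemMemoryExceedsReservation is taken from the text above.

```python
import re


def matches_routing_label(labels, name, pattern):
    """Return True when the alert's label `name` fully matches `pattern`.

    A missing label is treated here as an empty string.
    """
    value = labels.get(name, "")
    return re.fullmatch(pattern, value) is not None


# Hypothetical label sets for two alerts.
alerts = [
    {"alertname": "SystemMemoryExceedsReservation", "severity": "warning"},
    {"alertname": "KubePodCrashLooping", "severity": "warning"},
]

# alertname with a value of .* lets every alert through.
all_alerts = [a for a in alerts if matches_routing_label(a, "alertname", ".*")]

# A specific value forwards only the alerts with that exact name.
memory_only = [
    a for a in alerts
    if matches_routing_label(a, "alertname", "SystemMemoryExceedsReservation")
]

print(len(all_alerts), len(memory_only))  # prints: 2 1
```

Because the match covers the whole value, a pattern such as SystemMemory does not select SystemMemoryExceedsReservation; SystemMemory.* does.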
Optionally, you can also create a Cloud Pak for AIOps policy to scope-group these Prometheus-based alerts if you want them to be grouped into one Cloud Pak for AIOps incident.
The following simple example policy provides a scope-based grouping:
Note: Alertmanager requires a trusted webhook endpoint. If your Cloud Pak for AIOps deployment does not use certificates that OpenShift itself trusts, Alertmanager cannot send alerts to Cloud Pak for AIOps. You need to use trusted certificates or add the Cloud Pak for AIOps CA signing certificate to the OpenShift CA bundle. For more information about certificates, see Updating the CA bundle.
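One way to get an early indication of a trust problem is to attempt a TLS handshake with the webhook host while verifying against a specific CA bundle. The following sketch uses only the Python standard library. The helper names endpoint_from_url and certificate_is_trusted are hypothetical, and the check runs from wherever you execute it, so it only approximates what Alertmanager sees inside the cluster; it assumes that you have exported the relevant CA bundle to a local PEM file.

```python
import socket
import ssl
from urllib.parse import urlsplit


def endpoint_from_url(url):
    """Extract (host, port) from a webhook URL, defaulting to port 443."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 443


def certificate_is_trusted(url, cafile=None, timeout=10):
    """Return True if the TLS handshake verifies against the given CA bundle.

    With cafile=None, the system default trust store is used instead.
    """
    host, port = endpoint_from_url(url)
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False


# Example call (the bundle path is an assumption):
# certificate_is_trusted("https://<your-webhook-host>/webhook-connector/<id>",
#                        cafile="ca-bundle.pem")
```

A False result suggests that the endpoint's certificate chain is not covered by the bundle you tested with, which is the situation that the note above describes.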