Configuring Prometheus as an event source
Prometheus is an open-source systems monitoring and alerting toolkit. You can set up an integration with Monitoring to receive alert information from Prometheus. The Prometheus integration is only available in Monitoring, Advanced.
About this task
Using an incoming webhook URL, configure your Prometheus instance to route alerts to Monitoring, and define alerting rules in your Prometheus Alertmanager configuration file.
Procedure
- Go to Administer > Monitoring > Integrations on the IBM Cloud Pak console.
- Click Configure an integration.
- Go to the Prometheus tile and click Configure.
- Enter a name for the integration and click Copy to add the generated webhook URL to the clipboard. Ensure that you save the generated webhook URL so that it is available later in the configuration process. For example, you can save it to a file.
- Click Save.
- Set up the integration in Prometheus as follows:
Note: If you want to receive event information from Prometheus included in IBM Cloud Private, the following steps are different. See step 7 for details about how to configure Prometheus included in IBM Cloud Private.
- Ensure you have the Prometheus Alertmanager installed as described in prometheus/alertmanager.
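Prometheus only forwards alerts to an Alertmanager instance that it knows about. If your Prometheus server is not already pointed at your Alertmanager, a minimal sketch of the alerting section in prometheus.yml looks like the following; the alertmanager:9093 target is an assumption and must be replaced with the host and port of your own Alertmanager.
```yaml
# prometheus.yml (sketch): tell Prometheus where to send alerts.
# The target below is an assumed host and port; replace it with your Alertmanager address.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093'
```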
- Configure the Alertmanager to send alert information from Prometheus to Monitoring. Edit the alertmanagerFiles section of your Alertmanager configuration file to add the generated webhook from Monitoring as a receiver. Paste the webhook into the url: field. In addition, set the send_resolved value to true. For example:
```yaml
alertmanagerFiles:
  alertmanager.yml: |-
    global:
      resolve_timeout: 20s
    receivers:
      - name: 'webhook'
        webhook_configs:
          - send_resolved: true
            url: 'https://myeventsource.mybluemix.net/webhook/prometheus/omaasdev/63234831-4389-480f-8035-bc293b4e05fe/1pA0lWhP09t9_FhPLxNyKrGuglYBnHPa1MXbx4otg3Y'
    route:
      group_wait: 10s
      group_interval: 5m
      receiver: webhook
      repeat_interval: 3h
```
For more information about Alertmanager configuration files, see Prometheus configuration.
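If you run a standalone Alertmanager rather than configuring it through the alertmanagerFiles section of a Helm values file, the equivalent receiver can be defined directly in alertmanager.yml. The following is a minimal sketch under that assumption; the url value is a placeholder for the webhook that you copied from Monitoring.
```yaml
# alertmanager.yml (sketch): standalone Alertmanager with the Monitoring webhook as the only receiver.
# The url value is a placeholder; paste the webhook URL that Monitoring generated for your integration.
global:
  resolve_timeout: 20s
route:
  receiver: webhook
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 3h
receivers:
  - name: webhook
    webhook_configs:
      - send_resolved: true
        url: 'https://<generated_webhook_url>'
```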
- Edit the serverFiles section of the same configuration file to define your alerting rules. You must provide at least the following fields for each alert: severity, summary, description, and type. Severity must be one of the following values:
  - indeterminate
  - information
  - warning
  - minor
  - major
  - critical
  - fatal
The alerting rules syntax is different depending on the version of Prometheus you are using.
If you are using Prometheus version 1.8, see the following example for alerting rules:
```yaml
serverFiles:
  rules: ""
  alerts: |-
    # Rules for Node
    ALERT high_node_load
      IF node_load1 > 20
      FOR 10s
      LABELS { severity = "critical" }
      ANNOTATIONS {
        # summary defines the status if the condition is met
        summary = "Node usage exceeded threshold",
        # description reports the situation of event
        description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Node load {{ $value }}",
        # type defines the type of the resource causing the event
        type = "Server",
      }
    ALERT high_memory_usage
      IF (( node_memory_MemTotal - node_memory_MemFree ) / node_memory_MemTotal) * 100 > 100
      FOR 10s
      LABELS { severity = "warning" }
      ANNOTATIONS {
        # summary defines the status if the condition is met
        summary = "Memory usage exceeded threshold",
        # description reports the situation of event
        description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Memory usage {{ humanize $value }}%",
        # type defines the type of the resource causing the event
        type = "Server",
      }
    ALERT high_storage_usage
      IF (node_filesystem_size{fstype="ext4"} - node_filesystem_free{fstype="ext4"}) / node_filesystem_size{fstype="ext4"} * 100 > 90
      FOR 10s
      LABELS { severity = "warning" }
      ANNOTATIONS {
        # summary defines the status if the condition is met
        summary = "Storage usage exceeded threshold",
        # description reports the situation of event
        description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Storage usage {{ humanize $value }}%",
        # type defines the type of the resource causing the event
        type = "Storage",
      }
```
If you are using Prometheus version 2.0 or later, see the following example for alerting rules:
```yaml
- alert: high_cpu_load
  expr: node_load1 > 60
  for: 30s
  labels:
    severity: critical
  annotations:
    description: Docker host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
    summary: Server under high load
    type: Server
- alert: high_memory_load
  expr: (sum(node_memory_MemTotal) - sum(node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / sum(node_memory_MemTotal) * 100 > 85
  for: 30s
  labels:
    severity: warning
  annotations:
    description: Docker host memory usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
    summary: Server memory is almost full
    type: Server
- alert: high_storage_load
  expr: (node_filesystem_size{fstype="aufs"} - node_filesystem_free{fstype="aufs"}) / node_filesystem_size{fstype="aufs"} * 100 > 85
  for: 30s
  labels:
    severity: warning
  annotations:
    description: Docker host storage usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
    summary: Server storage is almost full
    type: Server
```
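If you maintain a standalone Prometheus 2.x server instead of the Helm serverFiles section, rules in this format must be wrapped in a named group inside a rules file. The following is a sketch under that assumption; the file name, group name, and threshold are illustrative only.
```yaml
# monitoring.rules.yml (sketch): in Prometheus 2.x, every alerting rule must belong to a named group.
# The group name mirrors the example above; adjust the threshold for your environment.
groups:
  - name: monitoring.rules
    rules:
      - alert: high_cpu_load
        expr: node_load1 > 60
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: Server under high load
          description: Load average over the last minute is {{ $value }} on instance {{ $labels.instance }}.
          type: Server
```
Reference the rules file from the rule_files list in prometheus.yml so that Prometheus loads it at startup.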
- Save and close the file.
- If you want to receive event information from Prometheus included in IBM Cloud Private, set up the integration using the IBM Cloud Private UI as follows:
- Log in to your IBM Cloud Private host. From the navigation menu, click Configuration > ConfigMaps.
- Search for alert to list the ConfigMaps for the Prometheus Alertmanager and alerting rules.
- Configure the Alertmanager to send alert information from Prometheus in IBM Cloud Private to Monitoring. Edit the monitoring-prometheus-alertmanager ConfigMap: open the ConfigMap's action menu and click Edit. Add the generated webhook from Monitoring as a receiver. Paste the webhook into the url: field. In addition, set the send_resolved value to true.
You can also click Create resource, add the following Alertmanager configuration, paste the webhook from Monitoring into the url: field, and click Create. This will overwrite your settings in monitoring-prometheus-alertmanager (note that this example also includes a Slack channel configuration):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: monitoring-prometheus
    component: alertmanager
  name: monitoring-prometheus-alertmanager
  namespace: kube-system
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 20s
      slack_api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
    route:
      receiver: webhook
      group_by: [alertname, instance, severity]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1m
      routes:
        - receiver: webhook
          continue: true
        - receiver: slack_alerts
          continue: true
    receivers:
      - name: webhook
        webhook_configs:
          - send_resolved: true
            url: 'https://<webhook_url_from_Cloud_Event_Management>.net/webhook/prometheus/xxx/yyy/zzz'
      - name: slack_alerts
        slack_configs:
          - send_resolved: false
            channel: '#ibmcloudprivate'
            text: 'Nodes: {{ range .Alerts }}{{ .Labels.instance }} {{ end }} ---- Summary: {{ .CommonAnnotations.summary }} ---- Description: {{ .CommonAnnotations.description }} ---- https://9.30.189.183:8443/prometheus/alerts '
```
- Edit the monitoring-prometheus-alertrules ConfigMap to define your alerting rules: open the ConfigMap's action menu and click Edit. You must provide at least the following fields for each alert: severity, summary, description, and type. Severity must be one of the following values:
  - indeterminate
  - information
  - warning
  - minor
  - major
  - critical
  - fatal
You can also click Create resource, add the following alerting rules, and click Create. This will overwrite your settings in monitoring-prometheus-alertrules:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: monitoring-prometheus
    component: prometheus
  name: monitoring-prometheus-alertrules
  namespace: kube-system
data:
  sample.rules: |-
    groups:
      - name: alert.rules
        rules:
          - alert: high_cpu_load
            expr: node_load1 > 5
            for: 10s
            labels:
              severity: critical
            annotations:
              description: Docker host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
              summary: Server under high load
          - alert: high_memory_load
            expr: (sum(node_memory_MemTotal) - sum(node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / sum(node_memory_MemTotal) * 100 > 85
            for: 30s
            labels:
              severity: warning
            annotations:
              description: Docker host memory usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
              summary: Server memory is almost full
          - alert: high_storage_load
            expr: (node_filesystem_size{fstype="aufs"} - node_filesystem_free{fstype="aufs"}) / node_filesystem_size{fstype="aufs"} * 100 > 15
            for: 30s
            labels:
              severity: warning
            annotations:
              description: Docker host storage usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
              summary: Server storage is almost full
```
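The rules in this example omit the type annotation, even though severity, summary, description, and type are the minimum required fields for each alert. The following sketch shows the first rule from the example with the annotation added; the value Server is an assumption about the resource type that causes the event.
```yaml
# Sketch: the high_cpu_load rule from the example above, with the required type annotation added.
- alert: high_cpu_load
  expr: node_load1 > 5
  for: 10s
  labels:
    severity: critical
  annotations:
    description: Docker host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
    summary: Server under high load
    type: Server
```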
- Optional: To check that you have set up Prometheus in IBM Cloud Private to send event information, click Platform > Alerting from the navigation menu, then click the Status tab and check that your settings are available in the Config section.
- To start receiving alert information from Prometheus, ensure that Enable event management from this source is set to On in Monitoring.