Configuring Prometheus as an event source

Prometheus is an open-source systems monitoring and alerting toolkit. You can set up an integration with Monitoring to receive alert information from Prometheus. The Prometheus integration is only available in Monitoring, Advanced.

About this task

Using an incoming webhook URL, configure your Prometheus Alertmanager to route alerts to Monitoring, and define alerting rules in your Prometheus configuration.
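
At a high level, the integration involves two pieces of configuration: a webhook receiver in the Alertmanager configuration that points at the generated URL, and Prometheus alerting rules whose annotations supply the fields that Monitoring expects (summary, description, and type, plus a severity label). The following is a minimal sketch only; the URL, threshold, and rule name are illustrative, and complete examples are given in the procedure:

    # Alertmanager configuration: forward alerts to the Monitoring webhook
    receivers:
      - name: 'webhook'
        webhook_configs:
          - send_resolved: true
            url: '<generated_webhook_URL>'

    # Prometheus alerting rule (version 2.x syntax): annotations carry the event fields
    - alert: example_high_node_load
      expr: node_load1 > 20
      for: 10s
      labels:
        severity: warning
      annotations:
        summary: Node load exceeded threshold
        description: Instance {{ $labels.instance }} load is {{ $value }}
        type: Server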

Procedure

  1. Go to Administer > Monitoring > Integrations on the IBM Cloud Pak console.
  2. Click Configure an integration.
  3. Go to the Prometheus tile and click Configure.
  4. Enter a name for the integration and click Copy to add the generated webhook URL to the clipboard. Save the generated webhook URL so that it is available later in the configuration process; for example, save it to a file.
  5. Click Save.
  6. Set up the integration in Prometheus as follows:

    Note: If you want to receive event information from Prometheus included in IBM Cloud Private, the following steps are different. See step 7 for details about how to configure Prometheus included in IBM Cloud Private.

    1. Ensure you have the Prometheus Alertmanager installed as described in prometheus/alertmanager.
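
      For example, if you installed Prometheus with the community Helm chart, the bundled Alertmanager is typically enabled through the chart values. The keys below are a minimal sketch and depend on your chart version:

       alertmanager:
         # Deploy the bundled Alertmanager alongside the Prometheus server
         enabled: true
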
    2. Configure the Alertmanager to send alert information from Prometheus to Monitoring. Edit the alertmanagerFiles section of your Alertmanager configuration file to add the generated webhook from Monitoring as a receiver. Paste the webhook into the url: field. In addition, set the send_resolved value to true.

      For example:

       alertmanagerFiles:
         alertmanager.yml: |-
           global:
             resolve_timeout: 20s
      
           receivers:
             - name: 'webhook'
               webhook_configs:
                 - send_resolved: true
                   url: 'https://myeventsource.mybluemix.net/webhook/prometheus/omaasdev/63234831-4389-480f-8035-bc293b4e05fe/1pA0lWhP09t9_FhPLxNyKrGuglYBnHPa1MXbx4otg3Y'
      
           route:
             group_wait: 10s
             group_interval: 5m
             receiver: webhook
             repeat_interval: 3h
      

      For more information about Alertmanager configuration files, see Prometheus configuration.
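
      Optionally, you can add a group_by clause to the route to control how alerts are bundled into notifications: alerts that share the listed label values are sent to Monitoring together. The labels and intervals below are a sketch to adapt to your environment:

       route:
         receiver: webhook
         group_by: [alertname, instance, severity]
         group_wait: 10s
         group_interval: 5m
         repeat_interval: 3h
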

    3. Edit the serverFiles section of your Prometheus configuration file to define your alerting rules. You must provide at least the following fields for each alert: severity, summary, description, and type. Severity must be one of the following values:

      • indeterminate
      • information
      • warning
      • minor
      • major
      • critical
      • fatal

      The alerting rules syntax is different depending on the version of Prometheus you are using.

      If you are using Prometheus version 1.8, see the following example for alerting rules:

       serverFiles:
         rules: ""
         alerts: |-
           # Rules for Node
           ALERT high_node_load
             IF node_load1 > 20
             FOR 10s
             LABELS { severity = "critical" }
             ANNOTATIONS {
               # summary defines the status if the condition is met
               summary = "Node usage exceeded threshold",
               # description reports the situation of the event
               description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Node load {{ $value }}",
               # type defines the type of the resource causing the event
               type = "Server",
             }

           ALERT high_memory_usage
             IF (( node_memory_MemTotal - node_memory_MemFree ) / node_memory_MemTotal) * 100 > 90
             FOR 10s
             LABELS { severity = "warning" }
             ANNOTATIONS {
               # summary defines the status if the condition is met
               summary = "Memory usage exceeded threshold",
               # description reports the situation of the event
               description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Memory usage {{ humanize $value }}%",
               # type defines the type of the resource causing the event
               type = "Server",
             }

           ALERT high_storage_usage
             IF (node_filesystem_size{fstype="ext4"} - node_filesystem_free{fstype="ext4"}) / node_filesystem_size{fstype="ext4"} * 100 > 90
             FOR 10s
             LABELS { severity = "warning" }
             ANNOTATIONS {
               # summary defines the status if the condition is met
               summary = "Storage usage exceeded threshold",
               # description reports the situation of the event
               description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Storage usage {{ humanize $value }}%",
               # type defines the type of the resource causing the event
               type = "Storage",
             }
      

      If you are using Prometheus version 2.0 or later, see the following example for alerting rules:

       - alert: high_cpu_load
         expr: node_load1 > 60
         for: 30s
         labels:
           severity: critical
         annotations:
           description: Docker host is under high load, the avg load 1m is at {{ $value }}.
             Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
           summary: Server under high load
           type: Server
       - alert: high_memory_load
         expr: (sum(node_memory_MemTotal) - sum(node_memory_MemFree + node_memory_Buffers
           + node_memory_Cached)) / sum(node_memory_MemTotal) * 100 > 85
         for: 30s
         labels:
           severity: warning
         annotations:
           description: Docker host memory usage is {{ humanize $value }}%. Reported by
             instance {{ $labels.instance }} of job {{ $labels.job }}.
           summary: Server memory is almost full
           type: Server
       - alert: high_storage_load
         expr: (node_filesystem_size{fstype="aufs"} - node_filesystem_free{fstype="aufs"})
           / node_filesystem_size{fstype="aufs"} * 100 > 85
         for: 30s
         labels:
           severity: warning
         annotations:
           description: Docker host storage usage is {{ humanize $value }}%. Reported by
             instance {{ $labels.instance }} of job {{ $labels.job }}.
           summary: Server storage is almost full
           type: Storage
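
      In Prometheus version 2.0 and later, alerting rules live in YAML rule files that are organized into groups, so in the serverFiles section the rules above sit under a groups block. The following is a minimal sketch of that wrapper; the file name, group name, and chart keys are illustrative and depend on your Prometheus chart version (older charts use an alerts key instead of alerting_rules.yml):

       serverFiles:
         alerting_rules.yml:
           groups:
             - name: node.rules
               rules:
                 - alert: high_cpu_load
                   expr: node_load1 > 60
                   for: 30s
                   labels:
                     severity: critical
                   annotations:
                     summary: Server under high load
                     description: Node load is {{ $value }} on {{ $labels.instance }}.
                     type: Server
                 # The remaining rules from the example above follow the same pattern.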
      
    4. Save and close the file.

  7. If you want to receive event information from Prometheus included in IBM Cloud Private, set up the integration using the IBM Cloud Private UI as follows:

    1. Log in to the IBM Cloud Private management console. From the navigation menu, click Configuration > ConfigMaps.
    2. Search for alert to list the ConfigMaps for the Prometheus Alertmanager and alerting rules.
    3. Configure the Alertmanager to send alert information from Prometheus in IBM Cloud Private to Monitoring. Edit the monitoring-prometheus-alertmanager ConfigMap by clicking the overflow menu and then Edit. Add the generated webhook from Monitoring as a receiver. Paste the webhook into the url: field. In addition, set the send_resolved value to true.

      You can also click Create resource, add the following Alertmanager configuration, paste the webhook from Monitoring into the url: field, and click Create. This will overwrite your settings in monitoring-prometheus-alertmanager (note that this example also includes a Slack channel configuration):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        labels:
          app: monitoring-prometheus
          component: alertmanager
        name: monitoring-prometheus-alertmanager
        namespace: kube-system
      data:
        alertmanager.yml: |-
          global:
            resolve_timeout: 20s
            slack_api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
          route:
            receiver: webhook
            group_by: [alertname, instance, severity]
            group_wait: 10s
            group_interval: 10s
            repeat_interval: 1m
            routes:
            - receiver: webhook
              continue: true
            - receiver: slack_alerts
              continue: true
          receivers:
          - name: webhook
            webhook_configs:
            - send_resolved: true
              url: 'https://<monitoring_webhook_host>/webhook/prometheus/xxx/yyy/zzz'
          - name: slack_alerts
            slack_configs:
            - send_resolved: false
              channel: '#ibmcloudprivate'
              text: 'Nodes: {{ range .Alerts }}{{ .Labels.instance }} {{ end }}      ---- Summary: {{ .CommonAnnotations.summary }}      ---- Description: {{ .CommonAnnotations.description }}       ---- https://9.30.189.183:8443/prometheus/alerts '
      
    4. Edit the monitoring-prometheus-alertrules ConfigMap to define your alerting rules. Click the overflow menu and then Edit. You must provide at least the following fields for each alert: severity, summary, description, and type. Severity must be one of the following values:

      • indeterminate
      • information
      • warning
      • minor
      • major
      • critical
      • fatal

        You can also click Create resource, add the following alerting rules, and click Create. This will overwrite your settings in monitoring-prometheus-alertrules:

        apiVersion: v1
        kind: ConfigMap
        metadata:
          labels:
            app: monitoring-prometheus
            component: prometheus
          name: monitoring-prometheus-alertrules
          namespace: kube-system
        data:
          sample.rules: |-
            groups:
            - name: alert.rules
              rules:
              - alert: high_cpu_load
                expr: node_load1 > 5
                for: 10s
                labels:
                  severity: critical
                annotations:
                  description: Docker host is under high load, the avg load 1m is at {{ $value }}.
                    Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.
                  summary: Server under high load
                  type: Server
              - alert: high_memory_load
                expr: (sum(node_memory_MemTotal) - sum(node_memory_MemFree + node_memory_Buffers
                  + node_memory_Cached)) / sum(node_memory_MemTotal) * 100 > 85
                for: 30s
                labels:
                  severity: warning
                annotations:
                  description: Docker host memory usage is {{ humanize $value }}%. Reported by
                    instance {{ $labels.instance }} of job {{ $labels.job }}.
                  summary: Server memory is almost full
                  type: Server
              - alert: high_storage_load
                expr: (node_filesystem_size{fstype="aufs"} - node_filesystem_free{fstype="aufs"})
                  / node_filesystem_size{fstype="aufs"} * 100 > 15
                for: 30s
                labels:
                  severity: warning
                annotations:
                  description: Docker host storage usage is {{ humanize $value }}%. Reported by
                    instance {{ $labels.instance }} of job {{ $labels.job }}.
                  summary: Server storage is almost full
                  type: Storage
        
    5. Optional: To check that you have set up Prometheus in IBM Cloud Private to send event information, click Platform > Alerting from the navigation menu, click the Status tab, and check that your settings are available in the Config section.

  8. To start receiving alert information from Prometheus, ensure that Enable event management from this source is set to On in Monitoring.