IBM® Cloud Private cluster monitoring

You can use the IBM® Cloud Private cluster monitoring dashboard to monitor the status of your cluster and applications.

The monitoring dashboard uses Grafana and Prometheus to present detailed data about your cluster nodes and containers. For more information about Grafana, see the Grafana documentation. For more information about Prometheus, see the Prometheus documentation.

Accessing the monitoring dashboard

  1. Log in to the IBM® Cloud Private management console.

    Note: When you log in to the management console, you have administrative access to Grafana. Do not create additional users within the Grafana dashboard or modify the existing users or organization.

  2. Click Menu > Workloads > Monitoring. Alternatively, you can open https://<master_ip>:8443/grafana, where master_ip is the IP address of the master node.
  3. From the Grafana dashboard, visit one of the three default dashboards:
    • Docker Host & Container Overview
    • Kubernetes cluster monitoring (via Prometheus)
    • Prometheus Stats

To view other data, you can create new dashboards or import Grafana dashboards from JSON definition files.
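As a rough illustration of what a JSON definition file for import can contain, the following Python sketch prints a minimal dashboard with one graph panel that queries a Prometheus metric. It assumes the newer Grafana dashboard JSON model (a panels list with gridPos, schemaVersion 16). The Grafana release bundled with IBM® Cloud Private might expect a different layout, such as rows, so compare the output with a dashboard exported from your own Grafana instance. The function name build_dashboard and the panel title are hypothetical.

```python
import json


def build_dashboard(title, expr):
    """Return a minimal Grafana dashboard definition with one graph panel.

    Assumption: this follows the newer dashboard JSON model ("panels" with
    "gridPos"); older Grafana releases group panels into "rows" instead.
    """
    return {
        "id": None,           # null so that Grafana assigns an ID on import
        "title": title,
        "schemaVersion": 16,  # assumed; match the version of your Grafana
        "panels": [
            {
                "id": 1,
                "type": "graph",
                "title": title,
                # Full-width panel, 8 grid units high, at the top of the dashboard.
                "gridPos": {"x": 0, "y": 0, "w": 24, "h": 8},
                # No "datasource" key, so the panel should use the default data source.
                "targets": [{"expr": expr, "refId": "A"}],
            }
        ],
    }


if __name__ == "__main__":
    dashboard = build_dashboard("Free node memory", "node_memory_MemFree")
    # Redirect this output to a .json file, then import that file in Grafana.
    print(json.dumps(dashboard, indent=2))
```

Generating the file from code keeps the PromQL expressions in one place, which makes it easier to maintain several similar dashboards.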

Configuring alerts

For IBM® Cloud Private, you can configure Prometheus alerts and integrate external alert service providers, such as Slack or PagerDuty.

  1. From the IBM® Cloud Private management console, click Menu > Workloads > ConfigMaps.
  2. To configure Prometheus alerts:

    1. In the same row as the alert_rules ConfigMap, click Action > Edit.
    2. In the data section, define the alert rules. For more information about the rule syntax, see Alerting Rules in the Prometheus documentation.

      For example, to configure two sample alerts for testing the Alertmanager dashboard, replace the data section with the following text:

      "data": {
       "sample.rules": "ALERT NodeMemoryUsage\n  IF (((node_memory_MemTotal-node_memory_MemFree-node_memory_Cached)/(node_memory_MemTotal)*100)) > 25\n  FOR 1m\n  LABELS {\n    severity=\"page\"\n  }\n  ANNOTATIONS {\n    summary = \"{{$labels.instance}}: High memory usage detected\",\n    description = \"{{$labels.instance}}: Memory usage is above 25% (current value is: {{ $value }})\"\n  }\nALERT HighCPUUsage\n  IF ((sum(node_cpu{mode=~\"user|nice|system|irq|softirq|steal|idle|iowait\"}) by (instance, job)) - ( sum(node_cpu{mode=~\"idle|iowait\"}) by (instance,job)))/(sum(node_cpu{mode=~\"user|nice|system|irq|softirq|steal|idle|iowait\"}) by (instance, job)) * 100 > 2\n  FOR 1m\n  LABELS {\n    service = \"backend\"\n  }\n  ANNOTATIONS {\n    summary = \"High CPU usage\",\n    description = \"CPU usage on this machine has been above 2% for over 1m\"\n  }"
      }
      
    3. Click Submit.

    4. To reload the Prometheus alert rules, open the https://<master_ip>:8443/prometheus/reload page.
  3. To integrate external alert services:
    1. In the same row as the monitoring-prometheus-alertmanager ConfigMap, click Action > Edit.
    2. In the data section, provide the configuration for your alert service. For more information, see Configuration in the Alertmanager section of the Prometheus documentation.
    3. Click Submit.
    4. To reload the Alertmanager configuration, open the https://<master_ip>:8443/alertmanager/reload page.
  4. Allow several minutes for the updates to take effect, then open the Alertmanager dashboard at https://<master_ip>:8443/alertmanager.

    • If you configured the sample alerts, they are displayed.
    • Any other valid alerts that you configured are also displayed.

      If you configured an external alert service, you can also view the alerts in that service's dashboard after the Alertmanager dashboard displays valid data.

      You can return to this dashboard to view alerts at any time.
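Editing the escaped single-line string in the data section of the alert_rules ConfigMap by hand is error-prone. One option is to write the rules as ordinary multi-line text and let a JSON encoder add the \n and \" escapes. The following Python sketch does this for a rule based on the NodeMemoryUsage sample, using the Prometheus 1.x ALERT syntax from the example. The helper name build_data_section is hypothetical.

```python
import json

# A rule based on the NodeMemoryUsage sample, written as readable text.
NODE_MEMORY_RULE = """\
ALERT NodeMemoryUsage
  IF (((node_memory_MemTotal-node_memory_MemFree-node_memory_Cached)/(node_memory_MemTotal)*100)) > 25
  FOR 1m
  LABELS {
    severity="page"
  }
  ANNOTATIONS {
    summary = "{{$labels.instance}}: High memory usage detected",
    description = "{{$labels.instance}}: Memory usage is above 25% (current value is: {{ $value }})"
  }"""


def build_data_section(rules_by_file):
    # json.dumps turns each newline into \n and each double quote into \",
    # giving the single-line string form that the ConfigMap editor shows.
    return '"data": ' + json.dumps(rules_by_file, indent=1)


if __name__ == "__main__":
    print(build_data_section({"sample.rules": NODE_MEMORY_RULE}))
```

Paste the printed "data" entry over the existing data section in the ConfigMap editor. To add more rule files, add more keys to the dictionary.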
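The NodeMemoryUsage sample rule computes the percentage of memory in use as (MemTotal - MemFree - Cached) / MemTotal * 100 and compares the result with 25. The following Python sketch evaluates the same arithmetic on made-up byte counts to show when the condition holds. It checks a single sample only and does not model the FOR 1m duration, during which Prometheus waits before it fires the alert.

```python
GIB = 1024 ** 3  # bytes per GiB


def memory_used_percent(mem_total, mem_free, cached):
    """Same arithmetic as the NodeMemoryUsage expression:
    (MemTotal - MemFree - Cached) / MemTotal * 100
    Arguments are byte counts, like the node_memory_* metrics.
    """
    return (mem_total - mem_free - cached) / mem_total * 100


def memory_condition_holds(mem_total, mem_free, cached, threshold=25):
    # The sample rule uses "> 25". This single check ignores FOR 1m:
    # Prometheus fires the alert only after the condition holds for 1 minute.
    return memory_used_percent(mem_total, mem_free, cached) > threshold


if __name__ == "__main__":
    print(memory_used_percent(8 * GIB, 4 * GIB, 2 * GIB))     # 25.0
    print(memory_condition_holds(8 * GIB, 4 * GIB, 2 * GIB))  # False
    print(memory_condition_holds(8 * GIB, 3 * GIB, 2 * GIB))  # True (37.5%)
```

With 8 GiB total, 4 GiB free, and 2 GiB cached, usage is exactly 25%, which is not greater than 25, so the condition is false. With only 3 GiB free, usage is 37.5% and the condition holds.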
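To integrate Slack as an external alert service, the Alertmanager configuration needs a route that points to a receiver with a slack_configs entry. The following Python sketch builds such a configuration and prints it as JSON. Most YAML parsers accept JSON, and Alertmanager reads YAML. The field names (route, receiver, receivers, slack_configs, api_url, channel, send_resolved) come from the Alertmanager configuration format. The receiver name, webhook URL, and channel are hypothetical placeholders. Which key of the monitoring-prometheus-alertmanager ConfigMap holds the configuration is not stated here, so check the existing data section before you replace or merge anything.

```python
import json


def slack_alertmanager_config(webhook_url, channel):
    """Return a minimal Alertmanager configuration that routes every
    alert to a single Slack receiver.

    The receiver name "slack-notifications" is a hypothetical placeholder.
    """
    return {
        # Top-level route: all alerts go to the receiver named below.
        "route": {"receiver": "slack-notifications"},
        "receivers": [
            {
                "name": "slack-notifications",
                "slack_configs": [
                    {
                        "api_url": webhook_url,   # Slack incoming webhook URL
                        "channel": channel,
                        "send_resolved": True,    # also notify when alerts clear
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    config = slack_alertmanager_config(
        "https://hooks.slack.com/services/EXAMPLE",  # hypothetical webhook
        "#cluster-alerts",                              # hypothetical channel
    )
    print(json.dumps(config, indent=2))
```

A PagerDuty integration follows the same pattern, with a receiver that uses pagerduty_configs instead of slack_configs.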