IBM® Cloud Private cluster monitoring
You can use the IBM Cloud Private cluster monitoring dashboard to monitor the status of your cluster and applications.
The monitoring dashboard uses Grafana and Prometheus to present detailed data about your cluster nodes and containers. For more information about Grafana, see the Grafana documentation. For more information about Prometheus, see the Prometheus documentation.
- Accessing the monitoring dashboard
- Role-based access
- Configuring alerts
- Accessing monitoring service APIs
Accessing the monitoring dashboard
- Log in to the IBM Cloud Private management console.
  Note: When you log in to the management console, you have administrative access to Grafana. Do not create more users within the Grafana dashboard or modify the existing users or org.
- Click Menu > Platform > Monitoring. Alternatively, you can open https://<master_ip>:8443/grafana, where <master_ip> is the IP address of the master node.
- From the Grafana dashboard, open one of the three default dashboards:
  - Docker Host & Container Overview
  - Kubernetes cluster monitoring (via Prometheus)
  - Prometheus Stats
If you want to view other data, you can create new dashboards or import dashboards from JSON definition files for Grafana.
Role-based access
Role-based access for the monitoring API
A user with the ClusterAdministrator, Administrator, or Operator role can access the monitoring service. A user with the ClusterAdministrator or Administrator role can perform write operations in the monitoring service, including deleting Prometheus metrics data and updating Grafana configurations.
Role-based access for monitoring data
Starting with version 1.2.0, the ibm-icpmonitoring Helm chart includes a module that provides role-based access control (RBAC) for the Prometheus metrics data.
The RBAC module is effectively a proxy that sits in front of the Prometheus client pod. It examines each request for authorization headers and enforces role-based controls at that point. In general, the rules concerning RBAC are as follows:
- A user with the ClusterAdministrator role can access any resource.
- A user with any other role can access data only in the namespaces for which that user is authorized.
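The two RBAC rules can be expressed as a small decision function. The following is a minimal sketch, not the proxy's actual code: the function name `can_read_metrics` and the idea of passing the user's authorized namespaces as a list are assumptions made for illustration only.

```python
from typing import Iterable

CLUSTER_ADMIN = "ClusterAdministrator"


def can_read_metrics(role: str, authorized_namespaces: Iterable[str], namespace: str) -> bool:
    """Return True if a user may read metrics data for `namespace`.

    Hypothetical helper that mirrors the two documented rules; it is not
    taken from the ibm-icpmonitoring chart.
    """
    if role == CLUSTER_ADMIN:
        # Rule 1: a cluster administrator can access any resource.
        return True
    # Rule 2: any other role is limited to the namespaces the user is authorized for.
    return namespace in set(authorized_namespaces)


if __name__ == "__main__":
    print(can_read_metrics("ClusterAdministrator", [], "kube-system"))  # True
    print(can_read_metrics("Operator", ["dev"], "dev"))                 # True
    print(can_read_metrics("Operator", ["dev"], "kube-system"))         # False
```

In the real module the role and namespace list would be derived from the authorization headers of the request; here they are passed in directly to keep the sketch self-contained.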
Configuring alerts
You can configure Prometheus alerts or integrate external alert service providers, such as Slack or PagerDuty, for IBM Cloud Private.
Important: ConfigMap changes are lost when you upgrade, roll back, or update the monitoring release. In addition, the ConfigMap format can change between releases.
- From the IBM Cloud Private management console, click Menu > Configuration > ConfigMaps.
- To configure Prometheus alerts, complete the following steps:
  - In the same row as the monitoring-prometheus-alertrules ConfigMap, select Action > Edit.
  - In the data section, provide the alerts. For more information about the alert configuration, see Alerting Rules in the Prometheus documentation.
    For example, to configure two sample alerts to test the alertmanager dashboard, replace the data section with the following text:

    "data": {
      "sample.rules": "groups:\n  - name: a.rules\n    rules:\n      - alert: NodeMemoryUsage\n        expr: ((node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes))/ node_memory_MemTotal_bytes) * 100 > 75\n        for: 1m\n        labels:\n          severity: page\n        annotations:\n          DESCRIPTION: '{{ $labels.instance }}: Memory usage is above 75% (current value is: {{ $value }})'\n          SUMMARY: '{{ $labels.instance }}: High memory usage detected'\n      - alert: HighCPUUsage\n        expr: ((sum(node_cpu_seconds_total{mode=~\"user|nice|system|irq|softirq|steal|idle|iowait\"}) BY (instance, job)) - (sum(node_cpu_seconds_total{mode=~\"idle|iowait\"}) BY (instance, job))) / (sum(node_cpu_seconds_total{mode=~\"user|nice|system|irq|softirq|steal|idle|iowait\"}) BY (instance, job)) * 100 > 3\n        for: 1m\n        labels:\n          service: backend\n        annotations:\n          description: This machine has really high CPU usage for over 1m\n          summary: High CPU Usage\n"
    }

    After you replace the data section, click Submit.
- To integrate external alert services:
  - In the same row as the monitoring-prometheus-alertmanager ConfigMap, select Action > Edit.
  - For more information about configuring the alerts, see Configuration and Notification Template Examples in the Prometheus documentation.
- Allow several minutes for the updates to take effect, and open the alert manager dashboard at https://<master_ip>:8443/alertmanager.
  - If you configured the sample alerts, they display.
  - Any other valid alerts that you configure display.
  After the alert manager dashboard displays valid data, if you configured an external alert service, you can view those alerts in the dashboard for that service.
  You can return to this dashboard to view alerts at any time.
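Because the data section stores the rule file as a single escaped string, it can be easier to write the rules as ordinary multi-line YAML and let a script do the escaping. The following is a minimal sketch of that approach, using the first sample alert from the steps above. The helper name `to_configmap_data` is hypothetical, and the key name `sample.rules` is taken from the sample; the result is meant to be pasted into the ConfigMap editor, not applied automatically.

```python
import json

# Readable form of the NodeMemoryUsage sample rule shown in the steps above.
RULES_YAML = """\
groups:
  - name: a.rules
    rules:
      - alert: NodeMemoryUsage
        expr: ((node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes) * 100 > 75
        for: 1m
        labels:
          severity: page
        annotations:
          DESCRIPTION: '{{ $labels.instance }}: Memory usage is above 75% (current value is: {{ $value }})'
          SUMMARY: '{{ $labels.instance }}: High memory usage detected'
"""


def to_configmap_data(rules_yaml: str, key: str = "sample.rules") -> str:
    """Serialize the rules as a JSON object with one key.

    json.dumps turns each newline into \\n and escapes double quotes, which
    is the form that the data section expects. (Hypothetical helper.)
    """
    return json.dumps({key: rules_yaml}, indent=2)


if __name__ == "__main__":
    # Prints the text to paste in place of the existing data section.
    print('"data": ' + to_configmap_data(RULES_YAML))
```

Keeping the readable YAML under version control and regenerating the escaped string is also a simple way to restore custom rules after an upgrade or rollback resets the ConfigMap.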
Accessing monitoring service APIs
You can access monitoring service APIs such as Prometheus and Grafana APIs. Before you can access the APIs, you must obtain authentication tokens to specify in your request headers. For information about obtaining authentication tokens, see Preparing to run component or management API commands.
After you obtain the authentication tokens, complete the following steps to access the Prometheus and Grafana APIs.
- Access the Prometheus API at the URL https://<cluster_CA_domain>:<Port>/prometheus/* and get the boot times of all nodes.
  - $ACCESS_TOKEN is the variable that stores the authentication token for your cluster.
  - <cluster_CA_domain> is the certificate authority (CA) domain that was set in your config.yaml file during installation.
  - <Port> is the port that is used to access the management console.

  curl -k -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" "https://<cluster_CA_domain>:<Port>/prometheus/api/v1/query?query=node_boot_time_seconds"

  For detailed information about Prometheus APIs, see Prometheus HTTP API.
- Access the Grafana API at the URL https://<cluster_CA_domain>:8443/grafana/* and obtain the sample dashboard.
  - $ACCESS_TOKEN is the variable that stores the authentication token for your cluster.
  - <cluster_CA_domain> is the certificate authority (CA) domain that was set in your config.yaml file during installation.

  curl -k -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" "https://<cluster_CA_domain>:8443/grafana/api/dashboards/db/sample"

  For detailed information about Grafana APIs, see Grafana HTTP API Reference.
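The same two requests can be made from a script. The following is a minimal sketch that uses only the Python standard library. The helper names (`build_request`, `fetch`, `boot_times`), the host name `mycluster.icp`, and the sample response body are assumptions made for illustration; the response shape follows the instant-query format described in the Prometheus HTTP API documentation.

```python
import json
import ssl
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_request(base_url: str, path: str, token: str,
                  params: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build a GET request that carries the bearer token (hypothetical helper)."""
    url = base_url.rstrip("/") + path
    if params:
        # urlencode escapes characters that are unsafe in a query string.
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def fetch(request: urllib.request.Request, insecure: bool = False) -> str:
    """Send the request and return the response body as text."""
    context = ssl.create_default_context()
    if insecure:
        # Same effect as curl -k: skip certificate verification.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


def boot_times(response_body: str) -> Dict[str, float]:
    """Map each instance label to its boot time from an instant-query response."""
    payload = json.loads(response_body)
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


# Illustrative response body, not captured from a real cluster.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "node_boot_time_seconds", "instance": "10.0.0.1:9100"},
             "value": [1550000000.0, "1549990000"]},
        ],
    },
})

if __name__ == "__main__":
    base = "https://mycluster.icp:8443"  # assumed <cluster_CA_domain>:<Port>
    token = "replace-with-access-token"  # value of $ACCESS_TOKEN
    prom = build_request(base, "/prometheus/api/v1/query", token,
                         {"query": "node_boot_time_seconds"})
    graf = build_request(base, "/grafana/api/dashboards/db/sample", token)
    print(prom.full_url)
    print(graf.full_url)
    # Against a real cluster: body = fetch(prom, insecure=True)
    print(boot_times(SAMPLE_RESPONSE))
```

The sketch builds the requests without sending them, so it runs without a cluster; call `fetch` with `insecure=True` only where `curl -k` would also be acceptable.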