Monitoring with Prometheus and Grafana

You can use Prometheus as a monitoring system and Grafana as a visualization tool to monitor your existing installation of IBM Cloud Pak for AIOps. Prometheus and Grafana can be installed and configured on a Red Hat OpenShift Container Platform cluster.

Prometheus is an open-source toolkit that can be used with Grafana to create real-time dashboards for monitoring Cloud Pak for AIOps stability and performance.

Note: The following instructions are updated to support the Grafana v5 operator, which is compatible with OpenShift version 4.11.x or higher. Previously, these instructions used the Grafana v4 operator, which is not supported by OpenShift version 4.16.x or higher. If you followed the instructions for the Grafana v4 operator, you might encounter compatibility issues if you upgrade your OpenShift version to 4.16.x or higher. To check your version of the Grafana operator, go to Operators > Installed Operators in your OpenShift console. To upgrade your Grafana operator from v4 to v5, run the cleanup script, and then follow the instructions in the Viewing metrics in Grafana section.
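Alternatively, you can check the installed operator version from the command line. The following is a minimal sketch; it assumes the operator was installed with the subscription name monitoring-grafana in the monitoring-grafana project, as in these instructions:

# the installed CSV name indicates the operator version, for example grafana-operator.v5.x.x
oc get subscription.operators.coreos.com monitoring-grafana -n monitoring-grafana -o jsonpath='{.status.installedCSV}'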

Warning: The cleanup script deletes any existing dashboards in Grafana. Copy the JSON of any existing dashboards that you want to keep.
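If your existing dashboards were created as GrafanaDashboard custom resources, you can export them before you run the cleanup. The following is a minimal sketch; adjust the project name if Grafana is installed elsewhere:

# save any existing v4 dashboard definitions before cleanup
oc get grafanadashboards.integreatly.org -n monitoring-grafana -o json > dashboards-backup.json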

Cleanup script (optional):

oc project monitoring-grafana
# delete CRs
oc delete grafana grafana
oc delete grafanadatasource prometheus

# delete subscription & CSV
CSV=$(oc get subscription monitoring-grafana -o json | jq -r '.status.installedCSV')
oc delete subscription monitoring-grafana
oc delete csv $CSV

# delete CRDs
# IMPORTANT: if you created any additional CRD instances, make sure to delete them before deleting the definition
oc delete crd grafanadashboards.integreatly.org
oc delete crd grafanadatasources.integreatly.org
oc delete crd grafanafolders.integreatly.org
oc delete crd grafananotificationchannels.integreatly.org
oc delete crd grafanas.integreatly.org

# delete the rest of v4 associated resources
oc delete clusterrole grafana-proxy
oc delete rolebinding grafana-proxy
oc delete secret grafana-k8s-proxy

Notes:

  • The monitoring stack that is described in the following sections uses third-party components that are not owned by IBM. Use it with caution.
  • The monitoring stack that is described in the following sections currently works with Cloud Pak for AIOps installed in an OpenShift Container Platform cluster. Cloud Pak for AIOps installed in an air-gapped (offline) environment is not supported by the described monitoring stack.

Prerequisites

  • You need network connectivity to the OpenShift Container Platform cluster where Cloud Pak for AIOps is installed.
  • The OpenShift command-line interface (oc) is required. For more information, see Getting started with the OpenShift CLI.
  • You need to install jq for your operating system to run the JSON processing commands. For more information, see Download jq.
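You can confirm the prerequisites with a few quick checks, for example:

# verify the OpenShift CLI, an active cluster login, and jq
oc version --client
oc whoami
jq --version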

Use the following sections to set up monitoring for your Cloud Pak for AIOps environment:

Setting up user workload monitoring

OpenShift Container Platform comes with a Prometheus-based monitoring stack to track metrics about the cluster. You can extend the monitoring to user workloads by enabling user workload monitoring. User workloads include anything that is not a core OpenShift Container Platform service. For more information about user workload monitoring and advanced configuration, see User workload monitoring.

To use user workload monitoring, you must first enable the capability.

Note: Once you run the following commands, the associated ConfigMaps (configuration settings) are applied cluster-wide. Do not run the commands on a shared cluster that includes applications other than Cloud Pak for AIOps.

Log in to your OpenShift Container Platform cluster by using the command line. Run the following commands to enable the user workload monitoring capability:

oc create -n openshift-monitoring -f - <<EOF || true
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/created-by: IBM
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
oc create -n openshift-user-workload-monitoring -f - <<EOF || true
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/created-by: IBM
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
EOF

After you run the preceding commands, you can view metrics in the OpenShift Container Platform web console.
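To confirm that user workload monitoring is enabled, check that the monitoring pods start in the openshift-user-workload-monitoring project, for example:

# the prometheus-user-workload and thanos-ruler-user-workload pods should reach the Running state
oc get pods -n openshift-user-workload-monitoring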

Viewing metrics in OpenShift Container Platform cluster

To view the metrics, go to Observe > Metrics in the OpenShift Container Platform web console. You can enter a valid Prometheus query and graph the result. The following are some example queries that you can use to view metrics for your Cloud Pak for AIOps environment:

  1. The total number of pods that are in a running state across all namespaces:

    sum(kube_pod_status_phase{namespace=~".*", phase="Running"})
    
  2. The total amount of memory that is allocated across all the nodes:

    sum (machine_memory_bytes{node=~"^.*$"})
    
  3. Deployment replicas across all the namespaces:

    kube_deployment_status_replicas{namespace=~".*"}
    

Enabling additional metrics

In some cases, additional configuration is needed to enable more metrics for your Cloud Pak for AIOps environment.

Kafka

Because a large portion of communication within Cloud Pak for AIOps takes place over Kafka, it is highly beneficial to collect metrics from Kafka.

Notes:

  • Kafka metrics, such as those displayed in the usage dashboard, are available only from the moment the metrics are enabled. Time windows that precede the enabling of the metrics show no data.

  • The Kafka custom resource (CR) must be in a ready state for metric scraping to be enabled successfully. To view the status of the Kafka CR, navigate to Home > Search, search for the Kafka resource, and click the Kafka instance. The Conditions section at the end of the page shows the Status.
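You can also check the Kafka CR status from the command line. The following is a minimal sketch; it assumes the Kafka CR is named iaf-system, as in the patch step later in this section, and that <namespace> is the project where Cloud Pak for AIOps is installed:

# prints "True" when the Kafka CR reports a Ready condition
oc get kafka iaf-system -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'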

Use the following steps to enable Kafka metrics:

  1. Switch to the project (namespace) where Cloud Pak for AIOps is installed:

    oc project <namespace>
    

    Where <namespace> is the project (namespace) where Cloud Pak for AIOps is installed.

  2. Define the Kafka ConfigMap and PodMonitor resources to configure the Kafka Prometheus exporter:

    1. Define the Kafka ConfigMap resource:

      KAFKA_METRICS_CONFIGMAP='
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: kafka-metrics
        labels:
          app: strimzi
      data:
        kafka-metrics-config.yml: |
          # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
          lowercaseOutputName: true
          rules:
          # Special cases and very specific rules
          - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
            name: kafka_server_$1_$2
            type: GAUGE
            labels:
             clientId: "$3"
             topic: "$4"
             partition: "$5"
          - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
            name: kafka_server_$1_$2
            type: GAUGE
            labels:
             clientId: "$3"
             broker: "$4:$5"
          - pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
            name: kafka_server_$1_connections_tls_info
            type: GAUGE
            labels:
              cipher: "$2"
              protocol: "$3"
              listener: "$4"
              networkProcessor: "$5"
          - pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
            name: kafka_server_$1_connections_software
            type: GAUGE
            labels:
              clientSoftwareName: "$2"
              clientSoftwareVersion: "$3"
              listener: "$4"
              networkProcessor: "$5"
          - pattern: "kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):"
            name: kafka_server_$1_$4
            type: GAUGE
            labels:
             listener: "$2"
             networkProcessor: "$3"
          - pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
            name: kafka_server_$1_$4
            type: GAUGE
            labels:
             listener: "$2"
             networkProcessor: "$3"
          # Some percent metrics use MeanRate attribute
          # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
          - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
            name: kafka_$1_$2_$3_percent
            type: GAUGE
          # Generic gauges for percents
          - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
            name: kafka_$1_$2_$3_percent
            type: GAUGE
          - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
            name: kafka_$1_$2_$3_percent
            type: GAUGE
            labels:
              "$4": "$5"
          # Generic per-second counters with 0-2 key/value pairs
          - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
            name: kafka_$1_$2_$3_total
            type: COUNTER
            labels:
              "$4": "$5"
              "$6": "$7"
          - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
            name: kafka_$1_$2_$3_total
            type: COUNTER
            labels:
              "$4": "$5"
          - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
            name: kafka_$1_$2_$3_total
            type: COUNTER
          # Generic gauges with 0-2 key/value pairs
          - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
            name: kafka_$1_$2_$3
            type: GAUGE
            labels:
              "$4": "$5"
              "$6": "$7"
          - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
            name: kafka_$1_$2_$3
            type: GAUGE
            labels:
              "$4": "$5"
          - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
            name: kafka_$1_$2_$3
            type: GAUGE
          # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
          # Note that these are missing the '_sum' metric!
          - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
            name: kafka_$1_$2_$3_count
            type: COUNTER
            labels:
              "$4": "$5"
              "$6": "$7"
          - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
            name: kafka_$1_$2_$3
            type: GAUGE
            labels:
              "$4": "$5"
              "$6": "$7"
              quantile: "0.$8"
          - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
            name: kafka_$1_$2_$3_count
            type: COUNTER
            labels:
              "$4": "$5"
          - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
            name: kafka_$1_$2_$3
            type: GAUGE
            labels:
              "$4": "$5"
              quantile: "0.$6"
          - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
            name: kafka_$1_$2_$3_count
            type: COUNTER
          - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
            name: kafka_$1_$2_$3
            type: GAUGE
            labels:
              quantile: "0.$4"
        zookeeper-metrics-config.yml: |
          # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
          lowercaseOutputName: true
          rules:
          # replicated Zookeeper
          - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
            name: "zookeeper_$2"
            type: GAUGE
          - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
            name: "zookeeper_$3"
            type: GAUGE
            labels:
              replicaId: "$2"
          - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
            name: "zookeeper_$4"
            type: COUNTER
            labels:
              replicaId: "$2"
              memberType: "$3"
          - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
            name: "zookeeper_$4"
            type: GAUGE
            labels:
              replicaId: "$2"
              memberType: "$3"
          - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
            name: "zookeeper_$4_$5"
            type: GAUGE
            labels:
              replicaId: "$2"
              memberType: "$3"
      '
      
    2. Define the PodMonitor resource:

      KAFKA_PODMONITOR='
      apiVersion: monitoring.coreos.com/v1
      kind: PodMonitor
      metadata:
        name: kafka-resources-metrics
        labels:
          app: strimzi
      spec:
        selector:
          matchExpressions:
            - key: "ibmevents.ibm.com/kind"
              operator: In
              values: ["Kafka", "KafkaConnect", "KafkaMirrorMaker", "KafkaMirrorMaker2"]
        namespaceSelector:
          matchNames:
            - myproject
        podMetricsEndpoints:
        - path: /metrics
          port: tcp-prometheus
          relabelings:
          - separator: ;
            regex: __meta_kubernetes_pod_label_(ibmevents_ibm_com_.+)
            replacement: $1
            action: labelmap
          - sourceLabels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            targetLabel: namespace
            replacement: $1
            action: replace
          - sourceLabels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            targetLabel: kubernetes_pod_name
            replacement: $1
            action: replace
          - sourceLabels: [__meta_kubernetes_pod_node_name]
            separator: ;
            regex: (.*)
            targetLabel: node_name
            replacement: $1
            action: replace
          - sourceLabels: [__meta_kubernetes_pod_host_ip]
            separator: ;
            regex: (.*)
            targetLabel: node_ip
            replacement: $1
            action: replace
      '
      
  3. Create the ConfigMap and PodMonitor resources by using the apply command:

    oc apply -f - <<EOF
    ${KAFKA_METRICS_CONFIGMAP}
    ---
    ${KAFKA_PODMONITOR}
    EOF
    
  4. Enable the Prometheus JMX exporter by patching the Kafka instance with the ConfigMap that you created in the preceding step.

    Note: When you run the following commands, the Kafka brokers perform a rolling restart to enable the metric exporter.

    oc patch kafka iaf-system --type=merge -p '{
      "spec": {
        "kafka": {
          "metricsConfig": {
            "type": "jmxPrometheusExporter",
            "valueFrom": {
              "configMapKeyRef": {
                "key": "kafka-metrics-config.yml",
                "name": "kafka-metrics"
              }
            }
          }
        }
      }
    }'
    

You can now explore the Kafka broker metrics in your cluster.
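To confirm that the exporter configuration is in place, you can check that the resources from the preceding steps exist in the Cloud Pak for AIOps project, for example:

# both resources were created in step 3
oc get configmap kafka-metrics -n <namespace>
oc get podmonitor kafka-resources-metrics -n <namespace>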

For more information about Kafka Prometheus Exporter, see Kafka Prometheus Exporter.

To view the metrics, go to Observe > Metrics in the OpenShift Container Platform web console. You can enter a valid Prometheus query and graph the result. For example, you can use the following Prometheus query to get the active count of all Kafka topic partitions, including replicas.

sum(kafka_server_replicamanager_partitioncount)
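Other broker metrics follow similar naming, based on the JMX exporter rules in the kafka-metrics ConfigMap. For example, the following query is a sketch that graphs the rate of incoming messages across all brokers, assuming the BrokerTopicMetrics rules produce a counter named kafka_server_brokertopicmetrics_messagesin_total:

sum(rate(kafka_server_brokertopicmetrics_messagesin_total[5m]))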

Viewing metrics in Grafana

You can use Grafana to visualize metrics and logs that come from multiple sources, which include Prometheus and other monitoring tools. Grafana provides a web-based interface for creating and customizing dashboards, which can be used to display various metrics and logs from different sources. You need to install Grafana in your OpenShift Container Platform cluster and configure it to connect to Thanos. For more information about Thanos, see Thanos.

Grafana gives access to all metrics across all of the Prometheus instances (both cluster and user workload metrics). You can use the following commands to install the Grafana v5 operator, set up a Grafana instance, and connect Prometheus as a Grafana data source.

  1. Run the following script.

    oc apply -f - <<EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring-grafana
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: operatorgroup
      namespace: monitoring-grafana
    spec:
      targetNamespaces:
      - monitoring-grafana
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: monitoring-grafana
      namespace: monitoring-grafana
    spec:
      channel: v5
      installPlanApproval: Automatic
      name: grafana-operator
      source: community-operators
      sourceNamespace: openshift-marketplace 
    EOF
    
    oc project monitoring-grafana
    
    # wait for the Grafana operator to finish installing
    installReady=false
    while [ "$installReady" != true ]
    do
        installPlan=`oc get subscription.operators.coreos.com monitoring-grafana -n monitoring-grafana -o json | jq -r .status.installplan.name`
        if [ -z "$installPlan" ]
        then
            installReady=false
        else
            installReady=`oc get installplan -n monitoring-grafana "$installPlan" -o json | jq -r '.status|.phase == "Complete"'`
        fi
    
        if [ "$installReady" != true ]
        then
            sleep 5
        fi
    done
    installReady=false
    while [ "$installReady" != true ]
    do
        csv=`oc get subscription.operators.coreos.com monitoring-grafana -n monitoring-grafana -o json | jq -r .status.currentCSV`
        if [ -z "$csv" ]
        then
            installReady=false
        else
            installReady=`oc get csv -n monitoring-grafana "$csv" -o json | jq -r '.status.phase == "Succeeded"'`
        fi
    
        if [ "$installReady" != true ]
        then
            sleep 5
        fi
    done
    
    # set up PVC for persisting dashboards (optional)
    # if you are not using a PVC, remove this resource
    oc apply -n monitoring-grafana -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: grafana-data
      namespace: monitoring-grafana
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 2Gi
    EOF
    
    # generate random secret for the proxy
    oc create secret generic -n monitoring-grafana grafana-k8s-proxy --from-literal=session_secret=$(openssl rand --hex 32) || true
    
    # create a serviceaccount for the grafana datasource instance
    oc apply -n monitoring-grafana -f - <<EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: grafana-cluster-monitoring
      namespace: monitoring-grafana
    EOF
    oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-cluster-monitoring -n monitoring-grafana
    oc apply -n monitoring-grafana -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: grafana-auth-secret
      namespace: monitoring-grafana
      annotations:
        kubernetes.io/service-account.name: grafana-cluster-monitoring
    type: kubernetes.io/service-account-token
    EOF
    # create the Grafana instance along with a cluster role and role binding
    oc apply -n monitoring-grafana -f - <<EOF
    apiVersion: grafana.integreatly.org/v1beta1
    kind: Grafana
    metadata:
      labels:
        dashboards: "grafana"
      name: grafana
      namespace: monitoring-grafana
    spec:
      client:
        preferIngress: false
      config:
        auth:
          disable_login_form: "false"
          disable_signout_menu: "true"
        auth.anonymous:
          enabled: "true"
          org_role: Admin
        auth.basic:
          enabled: "true"
        log:
          level: warn
          mode: console
      deployment:
        spec:
          template:
            spec:
              containers:
                - args:
                    - -provider=openshift
                    - -pass-basic-auth=false
                    - -https-address=:9091
                    - -http-address=
                    - -email-domain=*
                    - -upstream=http://localhost:3000
                    - "\$(SAR)"
                    - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
                    - -tls-cert=/etc/tls/private/tls.crt
                    - -tls-key=/etc/tls/private/tls.key
                    - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
                    - -cookie-secret-file=/etc/proxy/secrets/session_secret
                    - -openshift-service-account=grafana-sa
                    - -openshift-ca=/etc/pki/tls/cert.pem
                    - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    - -skip-auth-regex=^/metrics
                  env:
                    - name: SAR
                      value: '-openshift-sar={"resource": "namespaces", "verb": "get"}'
                  image: registry.redhat.io/openshift4/ose-oauth-proxy:v4.10
                  imagePullPolicy: Always
                  name: grafana-proxy
                  ports:
                    - containerPort: 9091
                      name: grafana-proxy
                      protocol: TCP
                  resources: {}
                  volumeMounts:
                    - mountPath: /etc/tls/private
                      name: secret-grafana-k8s-tls
                      readOnly: false
                    - mountPath: /etc/proxy/secrets
                      name: secret-grafana-k8s-proxy
                      readOnly: false
              volumes:
                - name: secret-grafana-k8s-tls
                  secret:
                    secretName: grafana-k8s-tls
                - name: secret-grafana-k8s-proxy
                  secret:
                    secretName: grafana-k8s-proxy
                - name: grafana-data
                  persistentVolumeClaim:
                    claimName: grafana-pvc
      persistentVolumeClaim:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
      route:
        spec:
          port:
            targetPort: grafana-proxy
          tls:
            termination: reencrypt
          to:
            kind: Service
            name: grafana-service
            weight: 100
          wildcardPolicy: None
      service:
        metadata:
          annotations:
            service.alpha.openshift.io/serving-cert-secret-name: grafana-k8s-tls
        spec:
          ports:
            - name: grafana-proxy
              port: 9091
              protocol: TCP
              targetPort: grafana-proxy
      serviceAccount:
        metadata:
          annotations:
            serviceaccounts.openshift.io/oauth-redirectreference.primary: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"grafana-route"}}'
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: grafana-proxy
      namespace: monitoring-grafana
    rules:
      - apiGroups:
          - authentication.k8s.io
        resources:
          - tokenreviews
        verbs:
          - create
      - apiGroups:
          - authorization.k8s.io
        resources:
          - subjectaccessreviews
        verbs:
          - create
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: grafana-proxy
      namespace: monitoring-grafana
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: grafana-proxy
    subjects:
      - kind: ServiceAccount
        name: grafana-sa
    EOF
    
    # Create the grafanadatasource instance
    oc apply -f - <<EOF
    apiVersion: grafana.integreatly.org/v1beta1
    kind: GrafanaDatasource
    metadata:
      name: prometheus
      namespace: monitoring-grafana
    spec:
      datasource:
        access: proxy
        isDefault: true
        jsonData:
          httpHeaderName1: 'Authorization'
          timeInterval: 5s
          tlsSkipVerify: true
        name: prometheus
        secureJsonData:
          httpHeaderValue1: 'Bearer \${token}'
        type: prometheus
        url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
      instanceSelector:
        matchLabels:
          dashboards: grafana
      valuesFrom:
        - targetPath: secureJsonData.httpHeaderValue1
          valueFrom:
            secretKeyRef:
              key: token
              name: grafana-auth-secret
    EOF
    
    echo "Monitoring will be available at https://$(oc get route -n monitoring-grafana grafana-route -o jsonpath='{.status.ingress[0].host}')"
    

    You might see the following error while the Grafana operator is in the process of deploying:

    Error from server (NotFound): installplans.operators.coreos.com "null" not found
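    After the script completes, you can verify that the Grafana instance is up before you open the route, for example:

    # the Grafana deployment pod in monitoring-grafana should reach the Running state
    oc get pods -n monitoring-grafana
    oc get route grafana-route -n monitoring-grafana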
    

Importing dashboards

Dashboards can be imported into Grafana either by ID or by pasting the dashboard JSON itself.

Figure. Grafana monitoring home page

Importing Cloud Pak for AIOps dashboards to Grafana

Several dashboards are provided to assist in the operational monitoring of Cloud Pak for AIOps.

For more information about how to import a dashboard in Grafana, see Import dashboards.

Cloud Pak for AIOps offers the following custom dashboards available as JSON, which can be uploaded or copied and pasted into Grafana. For more information, see the following sections:

Importing other dashboards to Grafana

You can import the following dashboards into Grafana. These dashboards are developed by Grafana community members and can be imported by using their ID.

  • K8 Cluster Detail Dashboard ID 10856
  • kubernetes-networking-cluster ID 16371
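As an alternative to importing through the Grafana UI, the Grafana v5 operator can pull a dashboard from grafana.com by its ID through a GrafanaDashboard resource. The following is a minimal sketch for the K8 Cluster Detail Dashboard (ID 10856); the resource name is illustrative:

oc apply -n monitoring-grafana -f - <<EOF
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: k8-cluster-detail
  namespace: monitoring-grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  grafanaCom:
    id: 10856
EOF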

Viewing available metrics

The preceding dashboards show a fraction of the metrics that are made available by Prometheus. To view the list of available metrics in your OpenShift Container Platform cluster, run the following commands, as described in Authentication with Bearer token.

  1. Obtain the API URL for Thanos querier:

    oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}'
    
  2. Retrieve the list of available Metric APIs and store the result in metrics.json:

    Note: The following command might not work on Federal Information Processing Standards (FIPS) enabled clusters for networking reasons. To work around this issue, log in to the OpenShift Container Platform web console, or log in to the cluster by using the command line, then exec into any pod and run the following command from inside the pod.

    curl -k -H "Authorization: Bearer $(oc whoami -t)" https://<thanos_querier_route>/api/v1/metadata > metrics.json
    

    Where <thanos_querier_route> is the route that you obtained in the preceding command.

All of the metric APIs that are listed in the output of the preceding command can be copied and pasted into the cluster console under Observe > Metrics. These queries can also be used in Grafana to create new panels. The new panels can be saved in a new dashboard or in one of the existing custom (community) dashboards.
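For example, you can extract just the metric names from metrics.json with jq and use any of them as a starting point for a query:

# print the first 20 metric names that are returned by the metadata API
jq -r '.data | keys[]' metrics.json | head -n 20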