Monitoring IBM Cloud Pak for AIOps with Prometheus and Grafana

Learn how to deploy Grafana and connect it to Prometheus to monitor your Cloud Pak for AIOps deployment on Linux.

Notes:

  • Grafana is a third-party component that IBM does not own. Use it with caution.
  • These instructions apply only to Linux-based installations of Cloud Pak for AIOps. For OpenShift Container Platform, see Deploying and configuring Grafana.

Deploying Grafana

Run the following script to deploy Grafana and connect it to the Prometheus metrics that are collected within your Cloud Pak for AIOps deployment. The script creates the monitoring-grafana namespace and the resources that are needed to connect to Prometheus and host the Grafana UI.
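
Before running the script, it can be useful to confirm that the objects it reads or depends on are in place. The following is a minimal sketch, not part of the product script: the function name preflight is a hypothetical helper, and the resource names are taken from the deployment script itself (the monitoring-grafana namespace, the prometheus-operated service in the prometheus-operator namespace, and the aiops-common-web-ui ingress in the aiops namespace).

```shell
#!/bin/sh
# Hypothetical pre-flight helper: reports anything that would make the
# deployment script fail or produce an incomplete setup.
preflight() {
  rc=0
  # The script runs "kubectl create namespace", which fails if the
  # namespace already exists.
  if kubectl get namespace monitoring-grafana >/dev/null 2>&1; then
    echo "namespace monitoring-grafana already exists"
    rc=1
  fi
  # Service name and namespace come from the data source URL in the script.
  if ! kubectl get service prometheus-operated -n prometheus-operator >/dev/null 2>&1; then
    echo "service prometheus-operated not found in prometheus-operator"
    rc=1
  fi
  # The script derives the Grafana hostname from this ingress.
  if ! kubectl get ingress aiops-common-web-ui -n aiops >/dev/null 2>&1; then
    echo "ingress aiops-common-web-ui not found in aiops"
    rc=1
  fi
  return "$rc"
}
```

Calling preflight prints one line per problem that it finds and returns a non-zero status if there is at least one, so it can be chained in front of the deployment script with &&.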

Important: Running this script triggers a rolling restart of the Kafka brokers. The restart is necessary to enable the Kafka metrics that some of the dashboards use. To avoid restarting the brokers, remove the following line from the Grafana deployment script before you run it:
kubectl patch subscription ibm-aiops-orchestrator -n aiops --type=merge -p '{"spec":{"config":{"env":[{"name":"ENABLE_KAFKA_METRICS","value":"true"}]}}}'
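
Before deciding whether to keep that line, it can help to look at what is already set on the subscription. Because the patch uses --type=merge (a JSON merge patch), it replaces the whole env list instead of appending to it, so any entries that are already there would be overwritten. The following is a minimal sketch under that assumption; the function names current_subscription_env and kafka_metrics_env_present are hypothetical helpers, not part of the product.

```shell
#!/bin/sh
# Print the env entries currently set on the subscription.
# The output is empty if spec.config.env is not set.
current_subscription_env() {
  kubectl get subscription ibm-aiops-orchestrator -n aiops \
    -o jsonpath='{.spec.config.env}'
}

# Succeed only if an ENABLE_KAFKA_METRICS entry is already in that list.
# This checks for the name only, not for the value "true".
kafka_metrics_env_present() {
  current_subscription_env | grep -q 'ENABLE_KAFKA_METRICS'
}
```

If current_subscription_env prints entries other than ENABLE_KAFKA_METRICS, consider adding them to the env list in the patch so that they are preserved.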
Grafana deployment script:
kubectl create namespace monitoring-grafana && \
GRAFANA_PASSWORD=$(openssl rand -base64 16 | tr -d "=+/" | cut -c1-16) && \
kubectl create secret generic grafana-admin-credentials \
  --from-literal=admin-user=admin \
  --from-literal=admin-password="${GRAFANA_PASSWORD}" \
  -n monitoring-grafana && \
echo "==========================================" && \
echo "Grafana Admin Credentials" && \
echo "==========================================" && \
echo "Username: admin" && \
echo "Password: ${GRAFANA_PASSWORD}" && \
echo "==========================================" && \
echo "Please save these credentials securely!" && \
echo "==========================================" && \
cat <<'EOF' | kubectl apply -f - -n monitoring-grafana
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  labels:
    app.kubernetes.io/name: grafana-datasources
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-operated.prometheus-operator.svc.cluster.local:9090/self-monitoring/explorer
        isDefault: true
        editable: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  labels:
    app.kubernetes.io/name: grafana-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
    app.kubernetes.io/name: grafana
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        fsGroup: 472
        supplementalGroups:
          - 0
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: GF_SECURITY_ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: grafana-admin-credentials
                  key: admin-user
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-admin-credentials
                  key: admin-password
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop: ["ALL"]
            seccompProfile:
              type: RuntimeDefault
          ports:
            - containerPort: 3000
              name: http-grafana
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /robots.txt
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 2
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 3000
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 250m
              memory: 500Mi
            limits:
              cpu: 750m
              memory: 1Gi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-pv
            - mountPath: /etc/grafana/provisioning/datasources
              name: grafana-datasources
      volumes:
        - name: grafana-pv
          persistentVolumeClaim:
            claimName: grafana-pvc
        - name: grafana-datasources
          configMap:
            name: grafana-datasources
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  labels:
    app.kubernetes.io/name: grafana-service
spec:
  ports:
    - port: 3000
      protocol: TCP
      targetPort: http-grafana
  selector:
    app: grafana
  sessionAffinity: None
  type: ClusterIP
EOF
AIOPS_DOMAIN=$(kubectl get ingress -n aiops aiops-common-web-ui -o jsonpath='{.spec.rules[0].host}' | sed 's/^cp-console-//') && \
GRAFANA_URL=grafana.${AIOPS_DOMAIN} && \
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring-grafana
  labels:
    app.kubernetes.io/name: grafana-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: ${GRAFANA_URL}
    http:
      paths:
      - backend:
          service:
            name: grafana
            port:
              number: 3000
        path: /
        pathType: Prefix
EOF
cat <<'EOF' | kubectl apply -n prometheus-operator -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana
  labels:
    app.kubernetes.io/name: grafana-networkpolicy
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring-grafana
      podSelector:
        matchLabels:
          app: grafana
  podSelector:
    matchLabels:
      prometheus: prometheus
  policyTypes:
  - Ingress
EOF
kubectl patch subscription ibm-aiops-orchestrator -n aiops --type=merge -p '{"spec":{"config":{"env":[{"name":"ENABLE_KAFKA_METRICS","value":"true"}]}}}' && \
kubectl label servicemonitor --all -n aiops com.ibm.aiops.monitoring/on="true"
kubectl label podmonitor --all -n aiops com.ibm.aiops.monitoring/on="true"
kubectl label servicemonitor --all -n prometheus-operator com.ibm.aiops.monitoring/on="true"
echo "Grafana URL and Login:"
echo "=========================================="
echo "https://${GRAFANA_URL}"
echo "admin / ${GRAFANA_PASSWORD}"
echo "=========================================="
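
After the script finishes, the result can be checked from the command line. The following is a minimal sketch that uses the resource names created by the script (the grafana deployment, the grafana-admin-credentials secret, and the grafana-ingress ingress in the monitoring-grafana namespace). The function names and the 300-second timeout are hypothetical choices for illustration.

```shell
#!/bin/sh
# Wait for the Grafana deployment to finish rolling out (up to 5 minutes).
wait_for_grafana() {
  kubectl rollout status deployment/grafana -n monitoring-grafana --timeout=300s
}

# Read the admin password back from the secret that the script created.
# Secret values are stored base64-encoded, so decode before printing.
grafana_admin_password() {
  kubectl get secret grafana-admin-credentials -n monitoring-grafana \
    -o jsonpath='{.data.admin-password}' | base64 -d
}

# Print the hostname that the ingress serves Grafana on.
grafana_host() {
  kubectl get ingress grafana-ingress -n monitoring-grafana \
    -o jsonpath='{.spec.rules[0].host}'
}
```

For example, wait_for_grafana && echo "https://$(grafana_host)" prints the Grafana URL once the pod is ready, and grafana_admin_password recovers the password if the output of the deployment script was not saved.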

Importing dashboards

After you deploy Grafana and log in to the Grafana UI, you can import the custom Cloud Pak for AIOps dashboards either by pasting the dashboard JSON or by uploading the JSON file.

Figure: Grafana monitoring home page

For the list of dashboards that are available for import, see Grafana dashboards.