Monitoring IBM Cloud Pak for AIOps with Prometheus and Grafana
Learn how to deploy Grafana and connect it to Prometheus to monitor your Cloud Pak for AIOps deployment on Linux.
Notes:
- Grafana is a third-party component that is not owned by IBM; use it with caution.
- The instructions below apply only to Linux-based installations of Cloud Pak for AIOps. For instructions for OpenShift Container Platform, see Deploying and configuring Grafana.
Deploying Grafana
Run the following script to deploy Grafana and connect it to the Prometheus metrics collected within your Cloud Pak for AIOps deployment. The script creates the monitoring-grafana namespace along with the resources needed to connect to Prometheus and host the Grafana UI.
Important: Running this script triggers a rolling restart of the Kafka brokers. The restart is necessary to enable the Kafka metrics that some of the dashboards use. To avoid restarting the brokers, remove the following line from the Grafana deployment script before running it:
kubectl patch subscription ibm-aiops-orchestrator -n aiops --type=merge -p '{"spec":{"config":{"env":[{"name":"ENABLE_KAFKA_METRICS","value":"true"}]}}}'
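If you keep the line, you can confirm afterwards that the patch took effect and watch the brokers roll. This is a sketch, not part of the official script; the pod label selector is an assumption about how the Kafka broker pods are labeled in your deployment, so adjust it to match your environment:

```shell
# Read back the ENABLE_KAFKA_METRICS value from the operator subscription.
kubectl get subscription ibm-aiops-orchestrator -n aiops \
  -o jsonpath='{.spec.config.env[?(@.name=="ENABLE_KAFKA_METRICS")].value}'

# Watch the Kafka broker pods restart one by one.
# The label below is illustrative and may differ in your cluster.
kubectl get pods -n aiops -l app.kubernetes.io/name=kafka --watch
```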
Grafana deployment script:
kubectl create namespace monitoring-grafana && \
GRAFANA_PASSWORD=$(openssl rand -base64 16 | tr -d "=+/" | cut -c1-16) && \
kubectl create secret generic grafana-admin-credentials \
--from-literal=admin-user=admin \
--from-literal=admin-password="${GRAFANA_PASSWORD}" \
-n monitoring-grafana && \
echo "==========================================" && \
echo "Grafana Admin Credentials" && \
echo "==========================================" && \
echo "Username: admin" && \
echo "Password: ${GRAFANA_PASSWORD}" && \
echo "==========================================" && \
echo "Please save these credentials securely!" && \
echo "==========================================" && \
cat <<'EOF' | kubectl apply -f - -n monitoring-grafana
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  labels:
    app.kubernetes.io/name: grafana-datasources
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-operated.prometheus-operator.svc.cluster.local:9090/self-monitoring/explorer
        isDefault: true
        editable: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  labels:
    app.kubernetes.io/name: grafana-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
    app.kubernetes.io/name: grafana
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        fsGroup: 472
        supplementalGroups:
          - 0
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: GF_SECURITY_ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: grafana-admin-credentials
                  key: admin-user
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-admin-credentials
                  key: admin-password
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop: ["ALL"]
            seccompProfile:
              type: RuntimeDefault
          ports:
            - containerPort: 3000
              name: http-grafana
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /robots.txt
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 2
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 3000
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 250m
              memory: 500Mi
            limits:
              cpu: 750m
              memory: 1Gi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-pv
            - mountPath: /etc/grafana/provisioning/datasources
              name: grafana-datasources
      volumes:
        - name: grafana-pv
          persistentVolumeClaim:
            claimName: grafana-pvc
        - name: grafana-datasources
          configMap:
            name: grafana-datasources
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  labels:
    app.kubernetes.io/name: grafana-service
spec:
  ports:
    - port: 3000
      protocol: TCP
      targetPort: http-grafana
  selector:
    app: grafana
  sessionAffinity: None
  type: ClusterIP
EOF
AIOPS_DOMAIN=$(kubectl get ingress -n aiops aiops-common-web-ui -o jsonpath='{.spec.rules[0].host}' | sed 's/^cp-console-//') && \
GRAFANA_URL=grafana.${AIOPS_DOMAIN} && \
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring-grafana
  labels:
    app.kubernetes.io/name: grafana-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: ${GRAFANA_URL}
      http:
        paths:
          - backend:
              service:
                name: grafana
                port:
                  number: 3000
            path: /
            pathType: Prefix
EOF
cat <<'EOF' | kubectl apply -n prometheus-operator -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana
  labels:
    app.kubernetes.io/name: grafana-networkpolicy
spec:
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring-grafana
          podSelector:
            matchLabels:
              app: grafana
  podSelector:
    matchLabels:
      prometheus: prometheus
  policyTypes:
    - Ingress
EOF
kubectl patch subscription ibm-aiops-orchestrator -n aiops --type=merge -p '{"spec":{"config":{"env":[{"name":"ENABLE_KAFKA_METRICS","value":"true"}]}}}' && \
kubectl label servicemonitor --all -n aiops com.ibm.aiops.monitoring/on="true"
kubectl label podmonitor --all -n aiops com.ibm.aiops.monitoring/on="true"
kubectl label servicemonitor --all -n prometheus-operator com.ibm.aiops.monitoring/on="true"
echo "Grafana URL and Login:"
echo "=========================================="
echo "https://${GRAFANA_URL}"
echo "admin / ${GRAFANA_PASSWORD}"
echo "=========================================="
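The admin password and the Grafana hostname printed at the end come from simple shell transforms in the script: openssl generates random base64 text, tr and cut reduce it to at most 16 safe characters, and sed strips the "cp-console-" prefix from the cp-console ingress host. You can sanity-check both transforms standalone, without a cluster; the hostname below is a made-up example:

```shell
# Password: random base64 with '=', '+', '/' removed, truncated to 16 characters.
GRAFANA_PASSWORD=$(openssl rand -base64 16 | tr -d "=+/" | cut -c1-16)
echo "${#GRAFANA_PASSWORD}"   # prints a length of at most 16

# Hostname: strip the "cp-console-" prefix, then prepend "grafana.".
AIOPS_DOMAIN=$(echo "cp-console-apps.mycluster.example.com" | sed 's/^cp-console-//')
echo "grafana.${AIOPS_DOMAIN}"   # grafana.apps.mycluster.example.com
```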
Importing dashboards
Once you have finished deploying Grafana and logged in to the Grafana UI, you can import the custom Cloud Pak for AIOps dashboards by copying and pasting the JSON, or by uploading the JSON file.
To see which dashboards are available for import, see Grafana dashboards.
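As an alternative to the UI, dashboards can also be pushed with Grafana's HTTP API. This is a sketch under stated assumptions, not part of the official instructions: it assumes you saved a dashboard export as dashboard.json (set its "id" field to null if it was exported from another Grafana instance), and it reuses the GRAFANA_URL and GRAFANA_PASSWORD values printed by the deployment script:

```shell
# Wrap the exported dashboard JSON in the payload that Grafana's
# /api/dashboards/db endpoint expects, then POST it.
# -k skips TLS verification; drop it if your ingress has a trusted certificate.
curl -sk -u "admin:${GRAFANA_PASSWORD}" \
  -H "Content-Type: application/json" \
  -X POST "https://${GRAFANA_URL}/api/dashboards/db" \
  -d "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true}"
```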