Performing routine cluster monitoring
Establish a schedule for monitoring your Cloud Pak for Data deployments on Red Hat® OpenShift® Container Platform.
The health of your cluster can have a huge impact on the health of your Cloud Pak for Data deployments.
- Who should perform this task?
- A cluster administrator must perform this task.
- How frequently should you perform this task?
- Perform this task at least once per day or once per shift. If the number of concurrent users or jobs varies widely, perform it more frequently during peak activity.
Your routine should include the following tasks:
- If your storage is remote, ensure that your network is running at 10 Gbps or greater.
- Run the storage performance validation playbook to confirm that there are no underlying performance issues with your persistent storage.
- Review the monitoring data from the OpenShift Container Platform web console.
  Important: Ensure that you enable monitoring for the user-defined projects where Cloud Pak for Data software is installed.
  Relevant OpenShift documentation, by OpenShift version: Version 4.10, Version 4.12.
  Review the following dashboards:
  - API Performance
  - Kubernetes / Compute Resources / Cluster
  - Kubernetes / Compute Resources / Node (Pods)
  - Kubernetes / Compute Resources / Namespace (Pods)
- Check the status of the Operand Deployment Lifecycle Manager objects on the cluster:
- Confirm that the catalog sources on the cluster are Ready:
oc get catalogsource -A \
  -o jsonpath="{range .items[*]}{.metadata.name}{': '}{.status.connectionState.lastObservedState}{'\n'}{end}"
- Get information about the Cloud Pak for Data operator subscriptions to determine the channel and to confirm that the current CSV is the same as the installed CSV:
oc get subscription -n ${PROJECT_CPD_INST_OPERATORS} \
  -o jsonpath="{range .items[*]}{.metadata.name}{' - channel: '}{.spec.channel}{', installedCSV: '}{.status.installedCSV}{', currentCSV: '}{.status.currentCSV}{'\n'}{end}"
- Confirm that the operator deployments are ready and have available replicas:
oc get deploy -n ${PROJECT_CPD_INST_OPERATORS}
- Check the status of the operator pods and determine whether any of the pods have been restarted:
oc get pods -n ${PROJECT_CPD_INST_OPERATORS}
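
The four status commands above print every object, so a problem can be easy to miss on a busy cluster. The following is a minimal sketch of filters that read the output of those commands on standard input and print only the rows that might need attention. The function names are hypothetical and are not part of the product. The sketch assumes that the commands are run with their default column headers, that a healthy catalog source reports READY as its last observed state, and that the `oc get deploy` and `oc get pods` tables use their usual column order.

```shell
#!/usr/bin/env bash
# Hypothetical helper filters for the routine checks. Each function reads
# the output of one of the commands above on stdin. Nothing here calls oc.

# Input: "<name>: <lastObservedState>" lines from the catalog source command.
# Prints catalog sources whose state is not READY (assumed healthy value).
catalogsources_not_ready() {
  awk -F': ' 'NF && toupper($2) != "READY" {
    state = ($2 == "") ? "unknown" : $2
    print $1 ": " state
  }'
}

# Input: "<name> - channel: <c>, installedCSV: <i>, currentCSV: <c>" lines.
# Prints subscriptions whose installed CSV differs from the current CSV.
subscriptions_csv_mismatch() {
  awk 'NF {
    installed = $0
    sub(/.*installedCSV: /, "", installed)
    sub(/,.*/, "", installed)
    current = $0
    sub(/.*currentCSV: /, "", current)
    if (installed != current) print
  }'
}

# Input: default "oc get deploy" table (NAME READY UP-TO-DATE AVAILABLE AGE).
# Prints deployments where ready != desired or no replica is available.
# A deployment that is intentionally scaled to zero is also listed.
deployments_not_ready() {
  awk 'NR > 1 && NF {
    split($2, r, "/")
    if (r[1] != r[2] || ($4 + 0) == 0)
      print $1, "ready=" $2, "available=" $4
  }'
}

# Input: default "oc get pods" table (NAME READY STATUS RESTARTS AGE).
# Prints pods that are not Running or Completed, or that have restarted.
pods_needing_attention() {
  awk 'NR > 1 && NF {
    restarts = $4 + 0
    if (($3 != "Running" && $3 != "Completed") || restarts > 0)
      print $1, $3, "restarts=" restarts
  }'
}

# Usage (requires a logged-in oc session), for example:
#   oc get deploy -n "${PROJECT_CPD_INST_OPERATORS}" | deployments_not_ready
#   oc get pods -n "${PROJECT_CPD_INST_OPERATORS}" | pods_needing_attention
```

Empty output from a filter means that the corresponding check found nothing to flag. The filters only narrow down what to look at; review the full command output when a row is reported.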