Performing routine cluster monitoring

Establish a schedule for monitoring your Cloud Pak for Data deployments on Red Hat® OpenShift® Container Platform.

The health of your cluster directly affects the health of your Cloud Pak for Data deployments.

Who should perform this task?

A cluster administrator must perform this task.

How frequently should you perform this task?

Perform this task at least once per day or once per shift.

However, if the number of concurrent users or jobs varies widely, perform this task more frequently during peak activity.

Your routine should include the following tasks:

  1. If your storage is remote, ensure that your network is running at 10 Gbps or greater.
  2. Run the storage performance validation playbook to confirm that there are no underlying performance issues with your persistent storage.
  3. Review the monitoring data from the OpenShift Container Platform web console.
    Important: Ensure that you enable monitoring for the user-defined projects where Cloud Pak for Data software is installed.

  4. Relevant OpenShift documentation

    Resources are listed by OpenShift version:
    • Version 4.10
    • Version 4.12
  5. Review the following dashboards:

    • API Performance
    • Kubernetes / Compute Resources / Cluster
    • Kubernetes / Compute Resources / Node (Pods)
    • Kubernetes / Compute Resources / Namespace (Pods)
  6. Check the status of the Operand Deployment Lifecycle Manager objects on the cluster:
    1. Confirm that the catalog sources on the cluster report the READY state:
      oc get catalogsource -A \
      -o jsonpath="{range .items[*]}{.metadata.name}{': '}{.status.connectionState.lastObservedState}{'\n'}{end}"
    2. Get information about the Cloud Pak for Data operator subscriptions to determine the channel and to confirm that the current CSV is the same as the installed CSV:
      oc get subscription -n ${PROJECT_CPD_INST_OPERATORS} \
      -o jsonpath="{range .items[*]}{.metadata.name}{' - channel: '}{.spec.channel}{', installedCSV: '}{.status.installedCSV}{', currentCSV: '}{.status.currentCSV}{'\n'}{end}"
    3. Confirm that the operator deployments are ready and have available replicas:
      oc get deploy -n ${PROJECT_CPD_INST_OPERATORS}
    4. Check the status of the operator pods and determine whether any of the pods have been restarted:
      oc get pods -n ${PROJECT_CPD_INST_OPERATORS}
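
When you run these checks every day, it can help to filter the command output so that only the entries that need attention are shown. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the functions only read text on standard input in the formats produced by the preceding commands. The sketch assumes that `awk` is available and that you add `--no-headers` to the `oc get deploy` and `oc get pods` commands.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper filters for the oc output shown in the steps above.
# Each function reads text on stdin; none of them contacts the cluster.

# Input: "name: STATE" lines from the catalog source jsonpath query.
# Output: names of catalog sources whose last observed state is not READY.
not_ready_catalogs() {
  awk -F': ' 'NF && $2 != "READY" { print $1 }'
}

# Input: lines from the subscription jsonpath query, in the form
#   "name - channel: X, installedCSV: A, currentCSV: B"
# Output: names of subscriptions whose installed CSV differs from the current CSV.
csv_mismatches() {
  awk 'NF {
    inst = $0; sub(/.*installedCSV: /, "", inst); sub(/,.*/, "", inst)
    cur = $0;  sub(/.*currentCSV: /, "", cur)
    if (inst != cur) print $1
  }'
}

# Input: output of "oc get deploy --no-headers"
#   (columns: NAME READY UP-TO-DATE AVAILABLE AGE, where READY is "ready/desired").
# Output: names of deployments whose ready count differs from the desired count.
not_ready_deployments() {
  awk 'NF { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Input: output of "oc get pods --no-headers"
#   (columns: NAME READY STATUS RESTARTS AGE).
# Output: pods with a restart count greater than zero.
restarted_pods() {
  awk 'NF && $4 + 0 > 0 { print $1 ": " $4 " restarts" }'
}

# Example usage (assumes an active oc login and that PROJECT_CPD_INST_OPERATORS is set):
#   oc get deploy -n "${PROJECT_CPD_INST_OPERATORS}" --no-headers | not_ready_deployments
#   oc get pods   -n "${PROJECT_CPD_INST_OPERATORS}" --no-headers | restarted_pods
```

Empty output from a filter means that the corresponding check found nothing to report. Keeping the filters separate from the `oc` calls lets you try them against saved output before you use them on a live cluster.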