Health checks and monitoring (IBM Cloud Pak for AIOps on OpenShift)

To prepare your cluster for major operations, such as installing or upgrading IBM Cloud Pak for AIOps, and to maintain your system as you complete Day 2 operations, run health checks and use the available monitoring options to review the health of your IBM Cloud Pak for AIOps deployment and your Red Hat OpenShift Container Platform cluster.

By completing initial and routine health checks of your IBM Cloud Pak for AIOps deployment and cluster, you can gain the following benefits:

  • Ensure installation or upgrade readiness

    A healthy Red Hat OpenShift Container Platform cluster, healthy storage, and healthy services are prerequisites for a successful installation or upgrade of IBM Cloud Pak for AIOps. The prerequisites check helps to confirm that your cluster is ready to proceed with an installation or upgrade.

  • Ensure your system stability

    Routine health checks monitor your Red Hat OpenShift Container Platform cluster and services to ensure that your system is functioning properly and to identify any potential issues.

  • Maintain your system performance

    Routine health checks can help to ensure that your IBM Cloud Pak for AIOps deployment and cluster run optimally by detecting and addressing performance bottlenecks. Your administrators can also maintain a historical record to support requests for resource increases.

  • Improve your system reliability

    By monitoring the health of IBM Cloud Pak for AIOps, you can proactively identify and resolve issues that might cause downtime or data loss.

For more information about running health checks and logging, see the following topics:

Tools

To help you with running initial and routine health checks, use the following tools and scripts:

  • Red Hat OpenShift Container Platform web console Monitoring dashboards

    Red Hat OpenShift Container Platform provides dashboards and capabilities that are accessible from the web console and that you can use to monitor and manage your system health.

  • IBM Cloud Pak for AIOps prerequisite checker tool

    The prerequisite checker tool validates whether a Red Hat OpenShift Container Platform cluster has the resources and prerequisites that are required for a successful installation or upgrade of IBM Cloud Pak for AIOps. The tool runs a series of checks and generates an installation readiness report.

  • IBM Cloud Pak for AIOps custom sizing tool

    IBM Sales representatives and Business Partners have access to a custom sizing tool that can assess your runtime requirements and provide a custom profile that scales IBM Cloud Pak for AIOps components. The custom profile is applied when you install IBM Cloud Pak for AIOps. This custom profile cannot be applied after installation, and attempting to do so can break your IBM Cloud Pak for AIOps deployment. By asking your sales representative to use this tool to help you develop a custom profile, you can help to ensure that enough resources are allocated to handle your workload volumes.

  • IBM Cloud Pak for AIOps MustGather tool

    The MustGather tool gathers data for IBM Cloud Pak for AIOps deployments. With this tool, you can complete data collection tasks and view analytics on the gathered data. The tool generates a report that shows critical data about the status and health of the cluster and installation. This report gives you an overview of your cluster to help you spot and solve problems. It provides basic cluster analysis to help identify cluster node resource utilization, warnings, and problematic resources, and it can also provide some basic incident analysis.

Running health checks

For guidance on running health checks for different use cases, review the following documentation:

  • Installation and upgrade health checks

    Run prerequisite checks and health checks to ensure that your cluster is ready to begin an installation or upgrade of IBM Cloud Pak for AIOps.

  • Proactive health checks

    Complete routine health checks and use the available monitoring capabilities to track the ongoing status of your deployment and cluster to maintain the health of your system.

  • Reactive health checks

    When you encounter any issues with the health of your IBM Cloud Pak for AIOps deployment or Red Hat OpenShift Container Platform cluster, run the MustGather tool to gather data about the health and details of your deployment and cluster.
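
As an illustration of what a simple routine health check can look like, the following minimal Python sketch scans a pod list for pods that are not healthy. It is not part of any IBM tool. It assumes that you already exported the pod list as JSON, for example with `oc get pods -n <namespace> -o json`, and the function name `find_unhealthy_pods` is hypothetical.

```python
def find_unhealthy_pods(pod_list):
    """Return (pod name, reason) pairs for pods that look unhealthy.

    `pod_list` is the parsed JSON of a Kubernetes PodList, for example the
    output of `oc get pods -n <namespace> -o json` (assumed input).
    """
    unhealthy = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")

        # Pods from completed jobs report the phase "Succeeded"; skip them.
        if phase == "Succeeded":
            continue

        # Any other phase than "Running" (Pending, Failed, Unknown) is flagged.
        if phase != "Running":
            unhealthy.append((name, f"phase={phase}"))
            continue

        # A Running pod can still contain containers that are not ready.
        not_ready = [
            c["name"]
            for c in status.get("containerStatuses", [])
            if not c.get("ready", False)
        ]
        if not_ready:
            unhealthy.append((name, "not ready: " + ", ".join(not_ready)))
    return unhealthy
```

To use the sketch, load the exported file with `json.load` and pass the result to `find_unhealthy_pods`. An empty list means that every pod is either running with all containers ready or completed successfully.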

Self-monitoring

Review the following ways to monitor your IBM Cloud Pak for AIOps installation:

  • Self-monitoring with alerts

    Because Cloud Pak for AIOps runs on OpenShift Container Platform, you can use the built-in Alertmanager feature of OpenShift Container Platform to monitor the underlying infrastructure and send alerts to Cloud Pak for AIOps when a problem occurs.

  • Self-monitoring with topology

    Create a topology visualization of your Cloud Pak for AIOps instance to monitor all the resources and their associated errors and warnings.

  • Monitoring with Prometheus and Grafana

    Prometheus can serve as a monitoring system, while Grafana acts as a visualization tool to oversee your current installation. Prometheus and Grafana can be installed and configured on a Red Hat OpenShift Container Platform cluster.

  • Monitoring data processed

    Monitor the volume of data that is processed by Cloud Pak for AIOps to see whether the OpenShift Container Platform cluster is sized properly.
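
If you monitor with Prometheus, metric values can be read through the Prometheus HTTP API instant query endpoint, `/api/v1/query`. The following minimal Python sketch builds a query URL and parses an instant-vector response. It makes no network call. The base URL and the metric name `events_processed_total` are hypothetical placeholders, not names that Cloud Pak for AIOps is known to expose, and authentication is environment-specific and not shown.

```python
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(response):
    """Return (labels, value) pairs from a parsed instant-query response.

    `response` is the decoded JSON body that the /api/v1/query endpoint
    returns. Sample values arrive as strings and are converted to float.
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample has the form {"metric": {...}, "value": [timestamp, "value"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


# Example response in the documented format; the metric name is hypothetical.
example_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "events_processed_total", "pod": "pod-a"},
             "value": [1700000000.0, "1250"]},
        ],
    },
}

for labels, value in parse_instant_vector(example_response):
    print(labels.get("pod"), value)
```

Tracking such values over time is one way to build the historical record that supports sizing decisions.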

Circuit breaker for Kafka robustness and stability

Equip IBM Cloud Pak for AIOps with the ability to protect itself and stay operational during an event storm by implementing a circuit breaker pattern. This helps prevent system failure while ensuring events, alerts, and incidents are still processed.

For more information, see Circuit breaker for Kafka robustness and stability.
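
For readers who are unfamiliar with the circuit breaker pattern in general, the following minimal Python sketch shows the idea. It is a generic illustration and does not describe how IBM Cloud Pak for AIOps implements its circuit breaker. All class names, parameter names, and default values are assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Generic circuit breaker with closed, open, and half-open states."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable clock, useful for testing
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        # While open, reject work immediately to protect the downstream system.
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call in the half-open state, or too many
            # consecutive failures while closed, opens the circuit again.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller wraps each downstream operation in `breaker.call(...)`. During a storm of failures the breaker rejects calls quickly instead of piling more load on a struggling component, then allows a single trial call after the timeout to test whether the component recovered.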

Cluster logging

As a cluster administrator, you can deploy centralized logging on an OpenShift Container Platform cluster, and use it to collect and aggregate node system audit logs, application container logs, and infrastructure logs. You can forward logs to your chosen log solution, including on-cluster log storage and Red Hat managed log storage. You can also visualize your log data in the OpenShift Container Platform web console, the Kibana web console, or a third-party log aggregator, depending on your deployed log storage solution.

Note: The Kibana web console is deprecated.

OpenShift Container Platform cluster administrators can deploy logging by using Operators.

For more information, see About logging.