Monitors in Cloud Pak for Data

You can monitor the state of IBM Cloud Pak for Data and your services with platform monitors, service monitors, and privileged monitors. Platform monitors are installed automatically when you install Cloud Pak for Data. You must manually install service monitors and privileged monitors.

You can view information from platform, service, and privileged monitors on the Monitoring > Alerts and events page.

Tip: You can run additional health checks on your Cloud Pak for Data deployment with the cpd-cli health commands. For more information, see health.

Platform monitors

The following platform monitors are automatically installed when you install Cloud Pak for Data.

Service status check (check-service-status)
A service is composed of pods and one or more service instances. The service status monitor checks the status of the pods, service instances, and monitor events that are associated with a service. A critical state indicates that either a service instance is in a failed state or a pod is in a failed or unknown state.
Service instance status check (check-instance-status)
A service instance is composed of one or more pods. The service instance status monitor checks the status of service instances to determine whether the pods that are associated with the instance are running as expected. A critical state indicates that one or more pods that are associated with the instance are in a failed or unknown state.
Monitor status check (check-monitor-status)
A monitor is a script that checks the state of an entity and generates events based on the state of the entity. The monitor status monitor checks the status of monitoring jobs to determine whether the jobs completed successfully. A critical state indicates that one or more jobs did not complete successfully.
Deployment status check (check-deployment-status)
Each service is configured to maintain a specific number of Deployment replicas. The deployment status monitor checks the status of Deployment replicas that are associated with Cloud Pak for Data and reports any issues. A critical state indicates that the service does not have enough replicas.
StatefulSet status check (check-statefulset-status)
Each service is configured to maintain a specific number of StatefulSet replicas. The StatefulSet status monitor checks the status of StatefulSet replicas that are associated with Cloud Pak for Data and reports any issues. A critical state indicates that the service does not have enough replicas.
PVC status check (check-pvc-status)
A persistent volume claim (PVC) is a request for storage that meets specific criteria, such as a minimum size or a specific access mode. The PVC status monitor checks the status of the PVCs that are associated with Cloud Pak for Data and reports any issues. A critical state indicates that the PVC is not associated with a storage volume, which means that the service cannot store data.
Quota status check (check-quota-status)

An administrator can set a vCPU quota and a memory quota for services or for the platform. The quota status monitor checks the quotas and requests that are associated with Cloud Pak for Data to determine whether services have sufficient resources to fulfill requests. A critical state indicates that the service has insufficient resources to fulfill requests.

For more information about setting quotas and thresholds, see Monitoring the platform.
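The critical-state rules that are described for the service status check and for the Deployment and StatefulSet status checks can be illustrated with a minimal Python sketch. This sketch is not the implementation that the platform monitors use. The state labels, the "ok" result, the Service class, and the function names are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical state labels. The actual monitors might use different names.
FAILED = "failed"
UNKNOWN = "unknown"
RUNNING = "running"


@dataclass
class Service:
    """Hypothetical view of a service: the states of its pods and instances."""
    name: str
    pod_states: List[str] = field(default_factory=list)
    instance_states: List[str] = field(default_factory=list)


def service_severity(service: Service) -> str:
    """Return "critical" if an instance failed or a pod is failed or unknown."""
    instance_failed = any(s == FAILED for s in service.instance_states)
    pod_unhealthy = any(s in (FAILED, UNKNOWN) for s in service.pod_states)
    return "critical" if instance_failed or pod_unhealthy else "ok"


def replica_severity(desired: int, ready: int) -> str:
    """Return "critical" if fewer replicas are ready than the service requires."""
    return "critical" if ready < desired else "ok"
```

For example, a service with one pod in an unknown state evaluates to critical even when all of its instances are running, and a Deployment with 2 of 3 replicas ready also evaluates to critical.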

Service monitors

The following monitors are installed when you install service monitors. For more information, see Installing service monitors.

Service Monitors
AI Factsheets No service monitors are available.
Analytics Engine powered by Apache Spark The Analytics Engine powered by Apache Spark service provides a health check monitor.
The monitor includes additional checks to ensure:
  • The service can connect to the Spark database
  • Spark instances are created as expected
  • Spark instances are deleted as expected
  • Spark kernels are created as expected
  • Spark kernels are deleted as expected
  • The Spark history server was started as expected
  • The Spark history server was stopped as expected
Cognos Analytics The Cognos Analytics service provides a health check monitor.
Cognos Dashboards The Cognos Dashboards service provides a health check monitor.
Data Gate The Data Gate service provides a health check monitor.
Data Privacy
  • 5.0.0 and 5.0.1 No service monitors are available.
  • 5.0.2 or later The Data Privacy service provides a health check monitor.

    The monitor includes an additional check to ensure that the dataprivacy pods started successfully.

Data Product Hub No service monitors are available.
Data Refinery The Data Refinery service provides a health check monitor.

The monitor includes an additional check to ensure that the datarefinery pods started successfully.

Data Replication The Data Replication service provides a health check monitor.

The monitor includes an additional check to ensure the health and availability of the Replication API.

DataStage The DataStage service provides a health check monitor.
Data Virtualization

You do not need to run the apply-service-monitor command to install the health check monitor.

The health check monitor for Data Virtualization is automatically installed when you install the service.

The monitor includes additional checks to ensure:
  • The Db2 Big SQL components are running.
  • The data virtualization engine is running.
  • The file system in the pod that manages the caching storage is healthy.
  • The persistent volume that is used by the head pod has sufficient capacity.
Db2 No service monitors are available.
Db2 Big SQL

You do not need to run the apply-service-monitor command to install the health check monitor.

The health check monitor for Db2 Big SQL is automatically installed when you install the service.

The monitor includes additional checks to ensure:
  • The Db2 Big SQL components are running.
  • The persistent volume that is used by the head pod has sufficient capacity.
Db2 Data Management Console The Db2 Data Management Console service provides a health check monitor.

The monitor includes an additional check to ensure the health and availability of the embedded Redis data store.

Db2 Warehouse No service monitors are available.
Decision Optimization The Decision Optimization service provides a health check monitor.
The monitor includes additional checks to ensure the health and availability of:
  • The Decision Optimization user interface
  • The Decision Optimization API
  • The Decision Optimization Modeling Assistant API
EDB Postgres The EDB Postgres service provides a health check monitor.
Execution Engine for Apache Hadoop The Execution Engine for Apache Hadoop service provides a health check monitor.
IBM Knowledge Catalog No service monitors are available.
IBM Knowledge Catalog Premium
  • 5.0.0 and 5.0.1 No service monitors are available.
  • 5.0.2 or later The IBM Knowledge Catalog Premium service provides a health check monitor. The monitor includes additional checks to ensure that the inner AI models are running successfully.
IBM Knowledge Catalog Standard
  • 5.0.0 and 5.0.1 No service monitors are available.
  • 5.0.2 or later The IBM Knowledge Catalog Standard service provides a health check monitor. The monitor includes additional checks to ensure that the inner AI models are running successfully.
IBM Match 360 with Watson The IBM Match 360 with Watson service provides a health check monitor.
The monitor includes additional checks to ensure the health and availability of the following IBM Match 360 components:
  • Matching service
  • Model service
Informix

You do not need to run the apply-service-monitor command to install the health check monitor.

The health check monitor for Informix is automatically installed when you install the service.

The monitor includes additional checks for:
  • Database checkpoint statistics
  • Database sessions
  • Database memory use
  • Database disk space use (Dbspace)
MANTA Automated Data Lineage The MANTA Automated Data Lineage service provides a health check monitor.
The monitor includes additional checks to ensure the health and availability of:
  • The admin UI
  • The Artemis service
  • The Configuration service
  • The Dataflow service
  • The Flow Agent service
MongoDB The MongoDB service provides a health check monitor.
OpenPages The OpenPages service provides a health check monitor.
The monitor uses the OpenPages REST API to check:
  • The availability of the OpenPages service
  • The status of the database connection for internal and external databases
Orchestration Pipelines The Orchestration Pipelines service provides a health check monitor.
The monitor includes additional checks to ensure the health and availability of:
  • The orchestration platform API service
  • The Pipelines UI
  • The translation service
Planning Analytics The Planning Analytics service provides a health check monitor.
Product Master The Product Master service provides a health check monitor.
RStudio® Server Runtimes The RStudio Server Runtimes service provides a health check monitor.

The monitor includes an additional check to ensure the health and availability of interactive RStudio runtimes.

SPSS Modeler The SPSS Modeler service provides a health check monitor.
The monitor includes additional checks to ensure that the following pods started successfully:
  • SPSS Modeler runtime pod
  • SPSS Modeler flow API pod
Synthetic Data Generator The Synthetic Data Generator service provides a health check monitor.

The monitor includes an additional check to ensure that the runtime pod started successfully.

Voice Gateway No service monitors are available.
Watson Discovery The Watson Discovery service provides a health check monitor.
Watson Machine Learning The Watson Machine Learning service provides a health check monitor.
The monitor includes additional checks to ensure the health and availability of the following services:
  • Deployment service
  • Repository service
  • Training service
Watson Machine Learning Accelerator The Watson Machine Learning Accelerator service provides a health check monitor.
Watson OpenScale The Watson OpenScale service provides a health check monitor.
Watson Speech services The Watson Speech services provide two service health check monitors:
  • A service health check monitor for Watson Speech to Text
  • A service health check monitor for Watson Text to Speech
Watson Studio The Watson Studio monitor checks the health and availability of the following components:
  • Machine learning user interface tools
  • Machine learning model visualization
  • Notebook user interface
  • Notebook user interface for rendering static HTML
  • Notebook management API
  • Notebook and script job execution API
Watson Studio Runtimes The Watson Studio Runtimes monitors are integrated with the Watson Studio monitor.

The monitor includes an additional check to ensure the health and availability of Jupyter Notebook and JupyterLab runtimes.

watsonx.ai The watsonx.ai service provides two service health check monitors:
  • A service health check for watsonx.ai.
    The monitor includes additional checks to ensure the health and availability of:
    • The UI and API for Prompt Lab
    • The UI and API for Tuning Studio
  • A service health check for inferencing.
    The monitor includes additional checks to ensure the health and availability of:
    • The router that connects Prompt Lab and Tuning Studio to the inference servers
    • The inference server for each installed large language model
watsonx Assistant The watsonx Assistant service provides a health check monitor.

The monitor includes additional checks to ensure the health and availability of:

  • The UI and API for watsonx Assistant
  • The watsonx Assistant installation infrastructure, specifically the availability of the required microservices and data stores
watsonx Code Assistant for Red Hat® Ansible® Lightspeed The watsonx Code Assistant for Red Hat Ansible Lightspeed service provides a health check monitor.
The monitor includes additional checks to ensure the health and availability of the following services:
  • Code generation service
  • Code matching service
watsonx Code Assistant for Z
  • 5.0.0 No service monitors are available.
  • 5.0.1 or later The watsonx Code Assistant for Z service provides a health check monitor.

    The monitor includes an additional check to ensure the health and availability of the code generation service.

watsonx Code Assistant for Z Code Explanation
  • 5.0.3 or later The watsonx Code Assistant for Z Code Explanation service provides a health check monitor.

    The monitor includes an additional check to ensure the health and availability of the code explanation service.

watsonx.data The watsonx.data service provides a health check monitor.

The monitor includes additional checks to ensure the health and availability of:

  • The Presto engine
  • The EDB Postgres storage cluster
  • The Milvus database (5.0.1 or later)
watsonx.governance No service monitors are available.
watsonx Orchestrate No service monitors are available.

Privileged monitors

The following monitors are installed when you install the privileged monitoring service. To install the privileged monitoring service, see Installing privileged monitors.

Cluster operator status check (check-cluster-operator-status)
Checks the status of the cluster operators that comprise the Red Hat OpenShift® Container Platform infrastructure to determine whether:
  • All of the operators are AVAILABLE
  • Any of the operators are DEGRADED
Network status check (check-network-status)
Checks the status of the PodNetworkConnectivityCheck objects for cluster resources to determine whether the objects are Reachable.
Node imbalance status check (check-node-imbalance-status)
Checks whether vCPU requests are balanced across nodes or whether one node is supporting a disproportionately high load.
Node status check (check-node-status)
Checks whether the nodes on the cluster are ready and whether the nodes are using too many resources.
Volume usage status check (check-volume-status)
Checks whether the persistent volume claims associated with the deployment are running out of space.
Restriction: Only persistent volume claims that are mounted by a running pod are monitored.
Operator namespace status check (check-operator-namespace-status)
Checks whether the resources in the operators project for the deployment are healthy.
Important: If you also want to check the status of the operators in the project where the scheduling service is installed, you must run the apply-privileged-monitoring-service command with the --cluster_components_ns=${PROJECT_SCHEDULING_SERVICE} option.
EDB cluster status check (check-edb-cluster-status)
Checks whether the instances of EDB Postgres that are associated with the deployment are healthy. For example, the monitor checks whether the database that Cloud Pak for Data uses to store metadata for the deployment is healthy.
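The conditions that the cluster operator status check and the node imbalance status check evaluate can be sketched in Python as follows. This sketch is an illustration only and does not reflect how the privileged monitors are implemented. The function names, the input shapes, and the 1.5 ratio threshold are assumptions. The documentation does not state what threshold the node imbalance monitor uses.

```python
from statistics import mean
from typing import Dict, List, Tuple


def operators_healthy(operators: Dict[str, Tuple[bool, bool]]) -> bool:
    """Return True if every operator is available and none is degraded.

    `operators` maps an operator name to an (available, degraded) pair.
    This input shape is hypothetical.
    """
    return all(available and not degraded
               for available, degraded in operators.values())


def imbalanced_nodes(vcpu_requests: Dict[str, float],
                     ratio: float = 1.5) -> List[str]:
    """Return the nodes whose vCPU requests exceed `ratio` times the mean.

    The ratio is an assumed threshold, chosen only for illustration.
    """
    if not vcpu_requests:
        return []
    average = mean(vcpu_requests.values())
    if average == 0:
        return []
    return sorted(node for node, requested in vcpu_requests.items()
                  if requested > ratio * average)
```

With requests of 2, 2, and 8 vCPU on three nodes, the mean is 4, so only the node with 8 vCPU exceeds the assumed threshold of 6 and is reported as carrying a disproportionately high load.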