Monitors in Cloud Pak for Data
You can monitor the state of IBM Cloud Pak for Data and your services with platform monitors, service monitors, and privileged monitors. Platform monitors are installed automatically when you install Cloud Pak for Data. You must manually install service monitors and privileged monitors.
Platform, service, and privileged monitor information can be viewed from the
page.
You can run additional health checks on your Cloud Pak for Data deployment with the cpd-cli health commands. For more information, see health.
Platform monitors
The following platform monitors are automatically installed when you install Cloud Pak for Data.
- Service status check (check-service-status) - A service is composed of pods and one or more service instances. The service status monitor checks the status of the pods, service instances, and monitor events that are associated with a service. A critical state indicates that either a service instance is in a failed state or a pod is in a failed or unknown state.
- Service instance status check (check-instance-status) - A service instance is composed of one or more pods. The service instance status monitor checks the status of service instances to determine whether the pods that are associated with the instance are running as expected. A critical state indicates that one or more pods that are associated with the instance are in a failed or unknown state.
- Monitor status check (check-monitor-status) - A monitor is a script that checks the state of an entity and generates events based on the state of the entity. The monitor status monitor checks the status of monitoring jobs to determine whether the jobs completed successfully. A critical state indicates that one or more jobs did not complete successfully.
- Deployment status check (check-deployment-status) - Each service is configured to maintain a specific number of Deployment replicas. The deployment status monitor checks the status of Deployment replicas that are associated with Cloud Pak for Data and reports any issues. A critical state indicates that the service does not have enough replicas.
- StatefulSet status check (check-statefulset-status) - Each service is configured to maintain a specific number of StatefulSet replicas. The StatefulSet status monitor checks the status of StatefulSet replicas that are associated with Cloud Pak for Data and reports any issues. A critical state indicates that the service does not have enough replicas.
- PVC status check (check-pvc-status) - A persistent volume claim (PVC) is a request for storage that meets specific criteria, such as a minimum size or a specific access mode. The PVC status monitor checks the status of the PVCs that are associated with Cloud Pak for Data and reports any issues. A critical state indicates that the PVC is not associated with a storage volume, which means that the service cannot store data.
- Quota status check (check-quota-status) - An administrator can set a vCPU quota and a memory quota for services or for the platform. The quota status monitor checks the quotas and requests that are associated with Cloud Pak for Data to determine whether services have sufficient resources to fulfill requests. A critical state indicates that the service has insufficient resources to fulfill requests.
For more information about setting quotas and thresholds, see Monitoring the platform.
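The critical-state rules described for the platform monitors can be read as simple predicates. The following Python sketch illustrates three of them under stated assumptions: the ServiceSnapshot type, its field names, and the lowercase state strings are hypothetical and are not part of any Cloud Pak for Data API, and the quota rule assumes that "insufficient resources" means that requests exceed the quota.

```python
from dataclasses import dataclass, field
from typing import List

# State names are assumptions for this sketch, not values defined by the product.
BAD_POD_STATES = {"failed", "unknown"}
BAD_INSTANCE_STATES = {"failed"}


@dataclass
class ServiceSnapshot:
    """Hypothetical summary of one service at a point in time."""
    pod_states: List[str] = field(default_factory=list)
    instance_states: List[str] = field(default_factory=list)
    desired_replicas: int = 0
    ready_replicas: int = 0
    vcpu_quota: float = 0.0     # quota set by an administrator
    vcpu_requests: float = 0.0  # vCPU currently requested by the service


def service_status(s: ServiceSnapshot) -> str:
    """Critical if an instance is failed or a pod is failed or unknown."""
    if any(state in BAD_INSTANCE_STATES for state in s.instance_states):
        return "critical"
    if any(state in BAD_POD_STATES for state in s.pod_states):
        return "critical"
    return "ok"


def replica_status(s: ServiceSnapshot) -> str:
    """Critical if fewer replicas are ready than the service is configured to keep."""
    return "critical" if s.ready_replicas < s.desired_replicas else "ok"


def quota_status(s: ServiceSnapshot) -> str:
    """Critical if requests exceed the quota (assumed meaning of 'insufficient')."""
    return "critical" if s.vcpu_requests > s.vcpu_quota else "ok"


if __name__ == "__main__":
    snapshot = ServiceSnapshot(
        pod_states=["running", "unknown"],
        instance_states=["running"],
        desired_replicas=3,
        ready_replicas=3,
        vcpu_quota=4.0,
        vcpu_requests=2.0,
    )
    print(service_status(snapshot), replica_status(snapshot), quota_status(snapshot))
```

The real monitors also consider monitor events and configurable thresholds, which this sketch leaves out.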
Service monitors
The following monitors are installed when you install service monitors. For more information, see Installing service monitors.
Service | Monitors |
---|---|
AI Factsheets | No service monitors are available. |
Analytics Engine powered by Apache Spark | The Analytics Engine powered by Apache Spark service provides a health check monitor. The monitor includes additional checks to ensure: |
Cognos Analytics | The Cognos Analytics service provides a health check monitor. |
Cognos Dashboards | The Cognos Dashboards service provides a health check monitor. |
Data Gate | The Data Gate service provides a health check monitor. |
Data Privacy | |
Data Product Hub | No service monitors are available. |
Data Refinery | The Data Refinery service provides a health check monitor. The monitor includes an additional check to ensure that the |
Data Replication | The Data Replication service provides a health check monitor. The monitor includes an additional check to ensure the health and availability of the Replication API. |
DataStage | The DataStage service provides a health check monitor. |
Data Virtualization | The health check monitor for Data Virtualization is automatically installed when you install the service. The monitor includes additional checks to ensure: |
Db2 | No service monitors are available. |
Db2 Big SQL | The health check monitor for Db2 Big SQL is automatically installed when you install the service. The monitor includes additional checks to ensure: |
Db2 Data Management Console | The Db2 Data Management Console service provides a health check monitor. The monitor includes an additional check to ensure the health and availability of the embedded Redis data store. |
Db2 Warehouse | No service monitors are available. |
Decision Optimization | The Decision Optimization service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of: |
EDB Postgres | The EDB Postgres service provides a health check monitor. |
Execution Engine for Apache Hadoop | The Execution Engine for Apache Hadoop service provides a health check monitor. |
IBM Knowledge Catalog | No service monitors are available. |
IBM Knowledge Catalog Premium | |
IBM Knowledge Catalog Standard | |
IBM Match 360 with Watson | The IBM Match 360 with Watson service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of the following IBM Match 360 components: |
Informix | The health check monitor for Informix is automatically installed when you install the service. The monitor includes additional checks for: |
MANTA Automated Data Lineage | The MANTA Automated Data Lineage service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of: |
MongoDB | The MongoDB service provides a health check monitor. |
OpenPages | The OpenPages service provides a health check monitor. The monitor uses the OpenPages REST API to check: |
Orchestration Pipelines | The Orchestration Pipelines service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of: |
Planning Analytics | The Planning Analytics service provides a health check monitor. |
Product Master | The Product Master service provides a health check monitor. |
RStudio® Server Runtimes | The RStudio Server Runtimes service provides a health check monitor. The monitor includes an additional check to ensure the health and availability of interactive RStudio runtimes. |
SPSS Modeler | The SPSS Modeler service provides a health check monitor. The monitor includes additional checks to ensure that the following pods started successfully: |
Synthetic Data Generator | The Synthetic Data Generator service provides a health check monitor. The monitor includes an additional check to ensure that the runtime pod started successfully. |
Voice Gateway | No service monitors are available. |
Watson Discovery | The Watson Discovery service provides a health check monitor. |
Watson Machine Learning | The Watson Machine Learning service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of the following services: |
Watson Machine Learning Accelerator | The Watson Machine Learning Accelerator service provides a health check monitor. |
Watson OpenScale | The Watson OpenScale service provides a health check monitor. |
Watson Speech services | The Watson Speech services provide two service health check monitors: |
Watson Studio | The Watson Studio monitor checks the health and availability of the following components: |
Watson Studio Runtimes | The Watson Studio Runtimes monitors are integrated with the Watson Studio monitor. The monitor includes an additional check to ensure the health and availability of Jupyter Notebook and JupyterLab runtimes. |
watsonx.ai | The watsonx.ai service provides two service health check monitors: |
watsonx Assistant | The watsonx Assistant service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of: |
watsonx Code Assistant for Red Hat® Ansible® Lightspeed | The watsonx Code Assistant for Red Hat Ansible Lightspeed service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of the following services: |
watsonx Code Assistant for Z | |
watsonx Code Assistant for Z Code Explanation | |
watsonx.data | The watsonx.data service provides a health check monitor. The monitor includes additional checks to ensure the health and availability of: |
watsonx.governance | No service monitors are available. |
watsonx Orchestrate | No service monitors are available. |
Privileged monitors
The following monitors are installed when you install the privileged monitoring service. To install the privileged monitoring service, see Installing privileged monitors.
- Cluster operator status check (check-cluster-operator-status) - Checks the status of the cluster operators that comprise the Red Hat OpenShift® Container Platform infrastructure to determine whether:
  - All of the operators are AVAILABLE
  - Any of the operators are DEGRADED
- Network status check (check-network-status) - Checks the status of the PodNetworkConnectivityCheck objects for cluster resources to determine whether the objects are Reachable.
- Node imbalance status check (check-node-imbalance-status) - Checks whether vCPU requests are balanced across nodes or whether one node is supporting a disproportionately high load.
- Node status check (check-node-status) - Checks whether the nodes on the cluster are ready and whether the nodes are using too many resources.
- Volume usage status check (check-volume-status) - Checks whether the persistent volume claims associated with the deployment are running out of space.
  Restriction: Only persistent volume claims that are mounted by a running pod are monitored.
- Operator namespace status check (check-operator-namespace-status) - Checks whether the resources in the operators project for the deployment are healthy.
  Important: If you also want to check the status of the operators in the project where the scheduling service is installed, you must run the apply-privileged-monitoring-service command with the --cluster_components_ns=${PROJECT_SCHEDULING_SERVICE} option.
- EDB cluster status check (check-edb-cluster-status) - Checks whether any instances of EDB Postgres that are associated with the deployment are healthy. For example, the monitor checks whether the database that Cloud Pak for Data uses to store metadata for the deployment is healthy.