watsonx Assistant Prometheus queries

Run Prometheus queries from the OpenShift® Console to visualize metric data.
Who needs to complete this task?
Cluster administrator: A cluster administrator must perform this task.
How frequently should you perform this task?
Repeat as needed: You should run Prometheus queries as often as necessary to monitor your Cloud Pak for Data deployments. It is recommended that you perform this task at least once per day or once per shift.

Ensure that you enable monitoring for user-defined projects and configure the OpenShift Monitoring stack. For more information about how to complete these tasks, see the documentation in the following table:

OpenShift Version Resources
Version 4.12
Version 4.14
Version 4.15
Version 4.16

To run the following Prometheus queries, go to Observe > Metrics in the OpenShift Console.

Resource usage

CPU remaining for a container
Displays the CPU remaining for a container over a 5-minute interval.
kube_pod_container_resource_limits{pod=~".*wa.*",resource="cpu"} - on (pod,container) rate(container_cpu_usage_seconds_total{pod=~".*wa.*",container!="POD"}[5m])
CPU usage for a container
Displays the total CPU that a container is using over a 5-minute interval.
rate(container_cpu_usage_seconds_total{pod=~".*wa.*",container!="POD"}[5m])
CPU usage for a pod
Displays the total CPU that a pod is using over a 5-minute interval.
pod:container_cpu_usage:sum{pod=~".*wa.*"}
Memory remaining for a container
Displays the memory remaining for a container, in bytes.
container_spec_memory_limit_bytes{pod=~".*wa.*",container!="POD"} - container_memory_working_set_bytes{pod=~".*wa.*",container!="POD"}
Memory usage for a container
Displays the total memory that a container is using, in bytes.
container_memory_working_set_bytes{pod=~".*wa.*",container!="POD"}
Memory usage for a pod
Displays the total memory that a pod is using, in bytes.
pod:container_memory_usage_bytes:sum{pod=~".*wa.*"}
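These memory queries return raw byte counts. To read the values in GiB instead, you can divide by 2^30 directly in PromQL; for example, a minimal sketch based on the pod-level memory query above:
pod:container_memory_usage_bytes:sum{pod=~".*wa.*"} / 2^30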

Store

You can run the following Prometheus queries to monitor your watsonx Assistant store:
Number of HTTP requests
Displays the total number of HTTP requests that occurred since the pod started.
assistant_http_request_duration_seconds_count
HTTP requests for observation buckets
The Value column displays the total number of HTTP requests for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le (less than or equal) column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of HTTP requests that took 10 seconds or less.
assistant_http_request_duration_seconds_bucket
Duration of HTTP requests
Displays the total duration, in seconds, of HTTP requests since the pod started.
assistant_http_request_duration_seconds_sum
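You can also combine the _sum and _count series with standard PromQL to approximate the average HTTP request duration over a 5-minute window; for example:
rate(assistant_http_request_duration_seconds_sum[5m]) / rate(assistant_http_request_duration_seconds_count[5m])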
Number of store sessions
Displays the total number of stateful v2 sessions that were handled by the pod since the pod started.
assistant_store_session_size_kilobytes_count
Store sessions for observation buckets
The Value column displays the total number of stateful v2 sessions for each observation bucket since the pod started. The observation buckets are indicated in kilobytes in the le column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of sessions with size 10 KB or less.
assistant_store_session_size_kilobytes_bucket
Size of store session
Displays the total size, in kilobytes, of all store sessions since the pod started.
assistant_store_session_size_kilobytes_sum
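Similarly, dividing the _sum series by the _count series gives the average store session size, in kilobytes, since the pod started; for example:
assistant_store_session_size_kilobytes_sum / assistant_store_session_size_kilobytes_count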
PostgreSQL pool
Displays a count of the following types of PostgreSQL clients and requests:
  • The total type, which is the number of clients that exist in the pool.
  • The waiting type, which is the number of queued requests that are waiting on a client when all clients are checked out. It can be helpful to monitor this number to see whether you need to adjust the size of the pool.
  • The idle type, which is the number of clients that are not checked out and are idle in the pool.
assistant_store_postgres_pool_counts

etcd

You can run the following Prometheus queries to monitor etcd:
Disk latency for etcd
Displays the current disk latency for etcd with watsonx Assistant. This value should stay under 0.01 seconds; otherwise, errors can occur.
rate(etcd_disk_wal_fsync_duration_seconds_sum{pod=~".*wa-etcd-.*"}[5m])/rate(etcd_disk_wal_fsync_duration_seconds_count{pod=~".*wa-etcd-.*"}[5m])
Failed proposals for etcd
Displays the total number of failed etcd proposals that occurred. Proposals can include leadership election or sync notices. Failures typically indicate that a cluster is not healthy.
etcd_server_proposals_failed_total{pod=~".*wa-etcd-.*"}
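Because this metric is a counter, you can also wrap it in the standard increase() function to see how many proposals failed over a recent window; for example, over the last hour:
increase(etcd_server_proposals_failed_total{pod=~".*wa-etcd-.*"}[1h])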
Peer latency for etcd
Displays the current peer latency for etcd with watsonx Assistant. This value should stay under 0.01 seconds; otherwise, errors can occur.
rate(etcd_network_peer_round_trip_time_seconds_sum{pod=~".*wa-etcd-.*"}[5m])/rate(etcd_network_peer_round_trip_time_seconds_count{pod=~".*wa-etcd-.*"}[5m])

EDB Postgres

You can run the following Prometheus queries to monitor EDB Postgres:
Number of EDB Postgres WAL files
Displays the total number of EDB Postgres WAL (Write-Ahead Log) files that are in use for watsonx Assistant.
cnp_collector_pg_wal{value="count"}
Size of EDB Postgres WAL files
Displays the total size of all EDB Postgres WAL (Write-Ahead Log) files that are in use for watsonx Assistant.
cnp_collector_pg_wal{value="size"}

gRPC

You can run the following Prometheus queries to monitor gRPC requests:
Concurrent gRPC requests
Displays the total number of gRPC requests that are currently being processed. This query returns information for Dragonfly and the CLU Embedding service.
assistant_grpc_server_concurrency_requests
gRPC requests for observation buckets
The assistant_grpc_server_request_duration_seconds metric is a histogram of every gRPC request for the server. It includes information such as response codes, methods that were started, and gRPC method types.

When you run the assistant_grpc_server_request_duration_seconds_bucket query, the Value column displays the total number of gRPC requests for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le column. For example, if the number indicated in the le column is 2.5, then the Value column indicates the total number of requests that took 2.5 seconds or less.

assistant_grpc_server_request_duration_seconds_bucket
Number of gRPC requests
The assistant_grpc_server_request_duration_seconds metric is a histogram of every gRPC request for the server. It includes information such as response codes, methods that were started, and gRPC method types. When you run the assistant_grpc_server_request_duration_seconds_count query, the Value column displays the total number of requests that occurred since the pod started.
assistant_grpc_server_request_duration_seconds_count
Duration of gRPC requests
The assistant_grpc_server_request_duration_seconds metric is a histogram of every gRPC request for the server. It includes information such as response codes, methods that were started, and gRPC method types. When you run the assistant_grpc_server_request_duration_seconds_sum query, the Value column displays the total duration, in seconds, of requests since the pod started.
assistant_grpc_server_request_duration_seconds_sum
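You can also pass the bucket series to the standard histogram_quantile() function to estimate a latency percentile; for example, the following sketch approximates the 95th percentile gRPC request duration over a 5-minute window:
histogram_quantile(0.95, sum by (le) (rate(assistant_grpc_server_request_duration_seconds_bucket[5m])))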

ModelMesh

You can run the following Prometheus queries to monitor ModelMesh:
ModelMesh evictions for observation buckets
The modelmesh_age_at_eviction_milliseconds metric is a histogram of every ModelMesh model that was evicted from the least recently used cache. A model is considered evicted when it is removed from the cache. Because this metric tracks the least recently used cache, you can expect the oldest models to be evicted first. If the model eviction age becomes low, it might mean that too many evictions are occurring. Generally, a model eviction age of less than 4 to 7 days is significant.

When you run the modelmesh_age_at_eviction_milliseconds_bucket query, the Value column displays the total number of evicted models for each observation bucket since the pod started. The observation buckets are indicated in milliseconds in the le column. For example, if the number indicated in the le column is 300000, then the Value column indicates the number of evicted models that were used less than 300,000 milliseconds ago.

modelmesh_age_at_eviction_milliseconds_bucket
Number of ModelMesh evictions
The modelmesh_age_at_eviction_milliseconds metric is a histogram of every ModelMesh model that was evicted from the least recently used cache. When you run the modelmesh_age_at_eviction_milliseconds_count query, the Value column displays the total number of models that were evicted since the pod started.
modelmesh_age_at_eviction_milliseconds_count
Total age of evicted ModelMesh models
The modelmesh_age_at_eviction_milliseconds metric is a histogram of every ModelMesh model that was evicted from the least recently used cache. It includes information such as response codes, methods that were started, and ModelMesh method types. When you run the modelmesh_age_at_eviction_milliseconds_sum query, the Value column displays the total age, in milliseconds, of evicted models since the pod started.
modelmesh_age_at_eviction_milliseconds_sum
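To relate this histogram to the eviction-age guidance above, you can estimate a percentile of the eviction age with the standard histogram_quantile() function and convert milliseconds to days; for example, the following sketch approximates the median eviction age, in days, over the last hour:
histogram_quantile(0.5, sum by (le) (rate(modelmesh_age_at_eviction_milliseconds_bucket[1h]))) / 1000 / 86400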

Algorithm training duration

You can run the following Prometheus queries to monitor your watsonx Assistant algorithm training duration:
Algorithm training duration for observation buckets
The assistant_algorithm_training_time_seconds metric is a histogram of every model training that occurs. It measures the time that the training algorithm took to train the model. It includes information such as status, service, model language, and an estimate of the workspace size.

When you run the assistant_algorithm_training_time_seconds_bucket query, the Value column displays the total number of trainings for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of model trainings that took 10 seconds or less.

assistant_algorithm_training_time_seconds_bucket
Number of algorithm trainings
The assistant_algorithm_training_time_seconds metric is a histogram of every model training that occurs. It measures the time that the training algorithm took to train the model. It includes information such as status, service, model language, and an estimate of the workspace size.

When you run the assistant_algorithm_training_time_seconds_count query, the Value column displays the total number of trainings that occurred since the pod started.

assistant_algorithm_training_time_seconds_count
Duration of algorithm trainings
The assistant_algorithm_training_time_seconds metric is a histogram of every model training that occurs. It measures the time that the training algorithm took to train the model. It includes information such as status, service, model language, and an estimate of the workspace size.

When you run the assistant_algorithm_training_time_seconds_sum query, the Value column displays the total duration of model trainings since the pod started in seconds.

assistant_algorithm_training_time_seconds_sum
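Dividing the _sum series by the _count series gives the average algorithm training time, in seconds, since the pod started; for example:
assistant_algorithm_training_time_seconds_sum / assistant_algorithm_training_time_seconds_count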

End-to-end model training duration

You can run the following Prometheus queries to monitor your watsonx Assistant end-to-end model training duration:
Model training duration for observation buckets
The assistant_total_training_time_seconds metric is a histogram of every model training that occurs. It measures how long the training took in seconds and includes information about the status and service.

When you run the assistant_total_training_time_seconds_bucket query, the Value column displays the total number of trainings for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of model trainings that took 10 seconds or less.

assistant_total_training_time_seconds_bucket
Number of model trainings
The assistant_total_training_time_seconds metric is a histogram of every model training that occurs. It measures how long the training took in seconds and includes information about the status and service.

When you run the assistant_total_training_time_seconds_count query, the Value column displays the total number of trainings that occurred since the pod started.

assistant_total_training_time_seconds_count
Duration of model trainings
The assistant_total_training_time_seconds metric is a histogram of every model training that occurs. It measures how long the training took in seconds and includes information about the status and service.

When you run the assistant_total_training_time_seconds_sum query, the Value column displays the total duration of model trainings since the pod started in seconds.

assistant_total_training_time_seconds_sum
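The same pattern applies here: dividing the _sum series by the _count series gives the average end-to-end training time, in seconds, since the pod started; for example:
assistant_total_training_time_seconds_sum / assistant_total_training_time_seconds_count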

Volume

You can apply the volume queries to any persistent volume claim (PVC). In these queries, replace <VOLUME> with the regular expression for the data store type that you want to monitor. Use the following regular expressions for the data stores:
  • EDB Postgres: .*wa-postgres-.*
  • Elasticsearch: data-.*wa-es-.*-.*
  • etcd: data-.*wa-etcd-.*
  • MinIO: export-.*wa-minio-.*
  • All data stores (EDB Postgres, Elasticsearch, etcd, and MinIO): export-.*wa-minio-.*|data-.*wa-es-.*-.*|.*wa-postgres-.*|data-.*wa-etcd-.*
You can run the following Prometheus queries to monitor your PVCs:
Data remaining for volumes
Displays the amount of data that is remaining for the specified volumes.
kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"}
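For example, with the etcd regular expression from the list above substituted for <VOLUME>, the query reads:
kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"data-.*wa-etcd-.*"}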
Rate of change for volumes
Displays the rate of change for the specified volumes over a period of 5 minutes.
deriv(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"}[5m])
Available space for volumes after the current rate of change
Displays the amount of space that will be available for a volume after 24 hours at the current rate of change. This query is useful to help determine whether a persistent volume will run out of space if the current ingestion or growth continues.
predict_linear(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"}[5m], 24 * 3600)
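To see the remaining space as a percentage of the volume's capacity, you can also combine this metric with the standard kubelet_volume_stats_capacity_bytes metric; for example:
kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"} / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"<VOLUME>"} * 100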