watsonx Assistant Prometheus queries

Run Prometheus queries from the OpenShift® Console to visualize metric data.
Who needs to complete this task?
Cluster administrator: A cluster administrator must perform this task.
How frequently should you perform this task?
Repeat as needed: You should run Prometheus queries as often as necessary to monitor your Cloud Pak for Data deployments. It is recommended that you perform this task at least once per day or once per shift.

Ensure that you enable monitoring for user-defined projects and configure the OpenShift Monitoring stack. For more information about how to complete these tasks, see the documentation in the following table:

OpenShift Version Resources
Version 4.12
Version 4.14
Version 4.15
Version 4.16

To run the following Prometheus queries, go to Observe > Metrics in the OpenShift Console.

Resource usage

CPU remaining for a container
Displays the CPU remaining for a container over a 5-minute interval.
kube_pod_container_resource_limits{pod=~".*wa.*",resource="cpu"} - on (pod,container) rate(container_cpu_usage_seconds_total{pod=~".*wa.*",container!="POD"}[5m])
CPU usage for a container
Displays the total CPU that a container is using over a 5-minute interval.
rate(container_cpu_usage_seconds_total{pod=~".*wa.*",container!="POD"}[5m])
CPU usage for a pod
Displays the total CPU that a pod is using over a 5-minute interval.
pod:container_cpu_usage:sum{pod=~".*wa.*"}
Memory remaining for a container
Displays the memory remaining for a container, in bytes.
container_spec_memory_limit_bytes{pod=~".*wa.*",container!="POD"} - container_memory_working_set_bytes{pod=~".*wa.*",container!="POD"}
Memory usage for a container
Displays the total memory that a container is using, in bytes.
container_memory_working_set_bytes{pod=~".*wa.*",container!="POD"}
Memory usage for a pod
Displays the total memory that a pod is using, in bytes.
pod:container_memory_usage_bytes:sum{pod=~".*wa.*"}
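These memory queries return raw byte counts. To read the values in GiB instead, you can divide by 2^30 directly in PromQL; for example, a minimal sketch based on the pod-level memory query above:
pod:container_memory_usage_bytes:sum{pod=~".*wa.*"} / 2^30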

Store

You can run the following Prometheus queries to monitor your watsonx Assistant store:
Number of HTTP requests
Displays the total number of HTTP requests that occurred since the pod started.
assistant_http_request_duration_seconds_count
HTTP requests for observation buckets
The Value column displays the total number of HTTP requests for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le (less than or equal) column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of HTTP requests that took 10 seconds or less.
assistant_http_request_duration_seconds_bucket
Duration of HTTP requests
Displays the total duration, in seconds, of HTTP requests since the pod started.
assistant_http_request_duration_seconds_sum
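You can also combine the _sum and _count series with standard PromQL to approximate the average HTTP request duration over a 5-minute window; for example:
rate(assistant_http_request_duration_seconds_sum[5m]) / rate(assistant_http_request_duration_seconds_count[5m])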
Number of store sessions
Displays the total number of stateful v2 sessions that were handled by the pod since the pod started.
assistant_store_session_size_kilobytes_count
Store sessions for observation buckets
The Value column displays the total number of stateful v2 sessions for each observation bucket since the pod started. The observation buckets are indicated in kilobytes in the le column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of sessions with size 10 KB or less.
assistant_store_session_size_kilobytes_bucket
Size of store session
Displays the total size, in kilobytes, of all store sessions since the pod started.
assistant_store_session_size_kilobytes_sum
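Similarly, dividing the _sum series by the _count series gives the average store session size, in kilobytes, since the pod started; for example:
assistant_store_session_size_kilobytes_sum / assistant_store_session_size_kilobytes_count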
PostgreSQL pool
Displays a count of the following types of PostgreSQL clients and requests:
  • The total type, which is the number of clients that exist in the pool.
  • The waiting type, which is the number of queued requests that are waiting on a client when all clients are checked out. It can be helpful to monitor this number to see whether you need to adjust the size of the pool.
  • The idle type, which is the number of clients that are not checked out and are idle in the pool.
assistant_store_postgres_pool_counts

etcd

You can run the following Prometheus queries to monitor etcd:
Disk latency for etcd
Displays the current disk latency for etcd with watsonx Assistant. This value should stay under 0.01 seconds; otherwise, errors can occur.
rate(etcd_disk_wal_fsync_duration_seconds_sum{pod=~".*wa-etcd-.*"}[5m])/rate(etcd_disk_wal_fsync_duration_seconds_count{pod=~".*wa-etcd-.*"}[5m])
Failed proposals for etcd
Displays the total number of failed etcd proposals that occurred. Proposals can include leadership election or sync notices. Failures typically indicate that a cluster is not healthy.
etcd_server_proposals_failed_total{pod=~".*wa-etcd-.*"}
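Because this metric is a counter, you can also wrap it in the standard increase() function to see how many proposals failed over a recent window; for example, over the last hour:
increase(etcd_server_proposals_failed_total{pod=~".*wa-etcd-.*"}[1h])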
Peer latency for etcd
Displays the current peer latency for etcd with watsonx Assistant. This value should stay under 0.01 seconds; otherwise, errors can occur.
rate(etcd_network_peer_round_trip_time_seconds_sum{pod=~".*wa-etcd-.*"}[5m])/rate(etcd_network_peer_round_trip_time_seconds_count{pod=~".*wa-etcd-.*"}[5m])

EDB Postgres

You can run the following Prometheus queries to monitor EDB Postgres:
Number of EDB Postgres WAL files
Displays the total number of EDB Postgres WAL (Write-Ahead Log) files that are in use for watsonx Assistant.
cnp_collector_pg_wal{value="count"}
Size of EDB Postgres WAL files
Displays the total size of all EDB Postgres WAL (Write-Ahead Log) files that are in use for watsonx Assistant.
cnp_collector_pg_wal{value="size"}

gRPC

You can run the following Prometheus queries to monitor gRPC requests:
Concurrent gRPC requests
Displays the total number of gRPC requests that are currently being processed. This query returns information for Dragonfly and the CLU Embedding service.
assistant_grpc_server_concurrency_requests
gRPC requests for observation buckets
The assistant_grpc_server_request_duration_seconds metric is a histogram of every gRPC request for the server. It includes information such as response codes, methods that were started, and gRPC method types.

When you run the assistant_grpc_server_request_duration_seconds_bucket query, the Value column displays the total number of gRPC requests for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le column. For example, if the number indicated in the le column is 2.5, then the Value column indicates the total number of requests that took 2.5 seconds or less.

assistant_grpc_server_request_duration_seconds_bucket
Number of gRPC requests
The assistant_grpc_server_request_duration_seconds metric is a histogram of every gRPC request for the server. It includes information such as response codes, methods that were started, and gRPC method types. When you run the assistant_grpc_server_request_duration_seconds_count query, the Value column displays the total number of requests that occurred since the pod started.
assistant_grpc_server_request_duration_seconds_count
Duration of gRPC requests
The assistant_grpc_server_request_duration_seconds metric is a histogram of every gRPC request for the server. It includes information such as response codes, methods that were started, and gRPC method types. When you run the assistant_grpc_server_request_duration_seconds_sum query, the Value column displays the total duration, in seconds, of requests since the pod started.
assistant_grpc_server_request_duration_seconds_sum
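You can also pass the bucket series to the standard histogram_quantile() function to estimate a latency percentile; for example, the following sketch approximates the 95th percentile gRPC request duration over a 5-minute window:
histogram_quantile(0.95, sum by (le) (rate(assistant_grpc_server_request_duration_seconds_bucket[5m])))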

ModelMesh

You can run the following Prometheus queries to monitor ModelMesh:
ModelMesh evictions for observation buckets
The modelmesh_age_at_eviction_milliseconds metric is a histogram of every ModelMesh model that was evicted from the least recently used cache. A model is considered evicted when it is removed from the cache. Because this metric tracks the least recently used cache, you can expect the oldest models to be evicted first. If the model eviction age becomes low, it might mean that too many evictions are occurring. Generally, a model eviction age of less than 4 to 7 days is significant.

When you run the modelmesh_age_at_eviction_milliseconds_bucket query, the Value column displays the total number of evicted models for each observation bucket since the pod started. The observation buckets are indicated in milliseconds in the le column. For example, if the number indicated in the le column is 300000, then the Value column indicates the number of evicted models that were used less than 300,000 milliseconds ago.

modelmesh_age_at_eviction_milliseconds_bucket
Number of ModelMesh evictions
The modelmesh_age_at_eviction_milliseconds metric is a histogram of every ModelMesh model that was evicted from the least recently used cache. When you run the modelmesh_age_at_eviction_milliseconds_count query, the Value column displays the total number of models that were evicted since the pod started.
modelmesh_age_at_eviction_milliseconds_count
Total age of evicted ModelMesh models
The modelmesh_age_at_eviction_milliseconds metric is a histogram of every ModelMesh model that was evicted from the least recently used cache. It includes information such as response codes, methods that were started, and ModelMesh method types. When you run the modelmesh_age_at_eviction_milliseconds_sum query, the Value column displays the total age, in milliseconds, of evicted models since the pod started.
modelmesh_age_at_eviction_milliseconds_sum
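To relate this histogram to the eviction-age guidance above, you can estimate a percentile of the eviction age with the standard histogram_quantile() function and convert milliseconds to days; for example, the following sketch approximates the median eviction age, in days, over the last hour:
histogram_quantile(0.5, sum by (le) (rate(modelmesh_age_at_eviction_milliseconds_bucket[1h]))) / 1000 / 86400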

Algorithm training duration

You can run the following Prometheus queries to monitor your watsonx Assistant algorithm training duration:
Algorithm training duration for observation buckets
The assistant_algorithm_training_time_seconds metric is a histogram of every model training that occurs. It measures the time that the training algorithm took to train the model. It includes information such as status, service, model language, and an estimate of the workspace size.

When you run the assistant_algorithm_training_time_seconds_bucket query, the Value column displays the total number of trainings for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of model trainings that took 10 seconds or less.

assistant_algorithm_training_time_seconds_bucket
Number of algorithm trainings
The assistant_algorithm_training_time_seconds metric is a histogram of every model training that occurs. It measures the time that the training algorithm took to train the model. It includes information such as status, service, model language, and an estimate of the workspace size.

When you run the assistant_algorithm_training_time_seconds_count query, the Value column displays the total number of trainings that occurred since the pod started.

assistant_algorithm_training_time_seconds_count
Duration of algorithm trainings
The assistant_algorithm_training_time_seconds metric is a histogram of every model training that occurs. It measures the time that the training algorithm took to train the model. It includes information such as status, service, model language, and an estimate of the workspace size.

When you run the assistant_algorithm_training_time_seconds_sum query, the Value column displays the total duration of model trainings since the pod started in seconds.

assistant_algorithm_training_time_seconds_sum
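Dividing the _sum series by the _count series gives the average algorithm training time, in seconds, since the pod started; for example:
assistant_algorithm_training_time_seconds_sum / assistant_algorithm_training_time_seconds_count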

End-to-end model training duration

You can run the following Prometheus queries to monitor your watsonx Assistant end-to-end model training duration:
Model training duration for observation buckets
The assistant_total_training_time_seconds metric is a histogram of every model training that occurs. It measures how long the training took in seconds and includes information about the status and service.

When you run the assistant_total_training_time_seconds_bucket query, the Value column displays the total number of trainings for each observation bucket since the pod started. The observation buckets are indicated in seconds in the le column. For example, if the number indicated in the le column is 10.0, then the Value column indicates the total number of model trainings that took 10 seconds or less.

assistant_total_training_time_seconds_bucket
Number of model trainings
The assistant_total_training_time_seconds metric is a histogram of every model training that occurs. It measures how long the training took in seconds and includes information about the status and service.

When you run the assistant_total_training_time_seconds_count query, the Value column displays the total number of trainings that occurred since the pod started.

assistant_total_training_time_seconds_count
Duration of model trainings
The assistant_total_training_time_seconds metric is a histogram of every model training that occurs. It measures how long the training took in seconds and includes information about the status and service.

When you run the assistant_total_training_time_seconds_sum query, the Value column displays the total duration of model trainings since the pod started in seconds.

assistant_total_training_time_seconds_sum
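The same pattern applies here: dividing the _sum series by the _count series gives the average end-to-end training time, in seconds, since the pod started; for example:
assistant_total_training_time_seconds_sum / assistant_total_training_time_seconds_count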

Volume

You can apply the volume queries to any persistent volume claim (PVC). In these queries, replace <VOLUME> with the regular expression for the data store type that you want to monitor. Use the following regular expressions for the data stores:
  • EDB Postgres: .*wa-postgres-.*
  • Elasticsearch: data-.*wa-es-.*-.*
  • etcd: data-.*wa-etcd-.*
  • MinIO: export-.*wa-minio-.*
  • All data stores (EDB Postgres, Elasticsearch, etcd, and MinIO): export-.*wa-minio-.*|data-.*wa-es-.*-.*|.*wa-postgres-.*|data-.*wa-etcd-.*
You can run the following Prometheus queries to monitor your PVCs:
Data remaining for volumes
Displays the amount of data that is remaining for the specified volumes.
kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"}
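For example, with the etcd regular expression from the list above substituted for <VOLUME>, the query reads:
kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"data-.*wa-etcd-.*"}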
Rate of change for volumes
Displays the rate of change for the specified volumes over a period of 5 minutes.
deriv(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"}[5m])
Available space for volumes after the current rate of change
Displays the amount of space that will be available for a volume after 24 hours at the current rate of change. This query is useful to help determine whether a persistent volume will run out of space if the current ingestion or growth continues.
predict_linear(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"}[5m], 24 * 3600)
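To see the remaining space as a percentage of the volume's capacity, you can also combine this metric with the standard kubelet_volume_stats_capacity_bytes metric; for example:
kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"<VOLUME>"} / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"<VOLUME>"} * 100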