Enabling metrics collection for Prometurbo
Create the necessary Custom Resources (CRs) to enable the collection of performance metrics. With CRs, you can quickly update the configurations used for metric collection without needing to restart Prometurbo.
Task overview
To enable metrics collection, perform the following tasks:
- Create the Prometurbo CRDs.
- Create the Prometurbo CRs.
  This topic describes the CRs that expose metrics for the NVIDIA DCGM, TGI, and Istio exporters.
- Verify metrics collection.
Creating the Prometurbo CRDs
Create the required Custom Resource Definitions (CRDs). These CRDs ensure the validity of the Prometurbo custom resources (CRs) that you will create in the next task.
oc create -f https://raw.githubusercontent.com/turbonomic/turbo-metrics/main/config/crd/bases/metrics.turbonomic.io_prometheusquerymappings.yaml
oc create -f https://raw.githubusercontent.com/turbonomic/turbo-metrics/main/config/crd/bases/metrics.turbonomic.io_prometheusserverconfigs.yaml
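Before moving on, it can help to confirm that both CRDs are registered. The following is a minimal sketch, not part of the product documentation. The helper name check_prometurbo_crds is hypothetical, and the CRD names are an assumption derived from the YAML file names above, which follow the usual group_plural naming convention.

```shell
# Sketch: report whether each Prometurbo CRD is registered in the cluster.
# Assumption: the CRD names are <plural>.<group>, inferred from the file names.
check_prometurbo_crds() {
  for crd in \
    prometheusquerymappings.metrics.turbonomic.io \
    prometheusserverconfigs.metrics.turbonomic.io
  do
    # "oc get crd <name>" exits non-zero when the CRD does not exist.
    if oc get crd "$crd" >/dev/null 2>&1; then
      echo "found: $crd"
    else
      echo "missing: $crd"
      return 1
    fi
  done
}
```

Run check_prometurbo_crds after the two oc create commands. A "missing" line suggests that the corresponding command did not succeed.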
Overview of Prometurbo CRs
Prometurbo requires the following custom resources (CRs):
- PrometheusQueryMapping
  This CR specifies how Prometurbo maps Prometheus queries to Turbonomic entities.
- PrometheusServerConfig
  This CR specifies configuration options for the Prometheus server.
Creating CRs to export GPU and TGI metrics
Create the required CRs to export GPU and TGI metrics. These metrics are required to enable the scaling of LLM inference workloads.
- Deploy a Kubernetes secret that specifies the token that is used to access the Prometheus server.

    apiVersion: v1
    data:
      authorizationToken: {authorization_token}
    kind: Secret
    metadata:
      name: ocp-thanos-authorization-token
    type: Opaque

  Replace {authorization_token} with the token in the prometheus-k8s-token-##### (service-account-token) secret in the openshift-monitoring namespace.
- Create the following PrometheusQueryMapping CRs in the namespace of your choice, preferably in the namespace where Prometurbo is deployed.
- Create the PrometheusServerConfig CR in the namespace where PrometheusQueryMapping is created.

    apiVersion: metrics.turbonomic.io/v1alpha1
    kind: PrometheusServerConfig
    metadata:
      name: {Prometheus_server_name}
    spec:
      address: {Prometheus_server_address}
      bearerToken:
        secretKeyRef:
          key: authorizationToken
          name: ocp-thanos-authorization-token
      clusters:
        - identifier:
            clusterLabels: {}
            id: {cluster_ID}
          queryMappingSelector:
            matchExpressions:
              - key: mapping
                operator: In
                values:
                  - nvidia-dcgm-exporter
                  - text-generation-inference

  Update the following information:
  - metadata: name: {Prometheus_server_name}
    Specify the name of your Prometheus server.
  - spec: address: {Prometheus_server_address}
    Specify the address of your Prometheus server, such as https://prometheus.us-east.containers.appdomain.cloud.
  - clusters: - identifier: clusterLabels: {} id: {cluster_ID}
    Specify the cluster ID. To obtain the ID, run the following command:
    `kubectl -n default get svc kubernetes -ojsonpath='{.metadata.uid}'`
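For the secret in the first step, note that values under data in a Kubernetes Secret must be base64-encoded. The following sketch assumes that you hold the raw, unencoded token. The helper name render_token_secret is hypothetical and not part of Prometurbo. If you copied the value directly from the data.token field of the existing secret, it is already encoded and should not be encoded again.

```shell
# Sketch: print the Secret manifest for a raw (unencoded) token.
# Assumption: the first argument is the raw token, not yet base64-encoded.
render_token_secret() {
  raw_token="$1"
  # printf avoids the trailing newline that echo would add;
  # tr removes the line wrapping that some base64 builds insert.
  encoded=$(printf '%s' "$raw_token" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ocp-thanos-authorization-token
type: Opaque
data:
  authorizationToken: ${encoded}
EOF
}
```

For example, render_token_secret "$RAW_TOKEN" | oc apply -f - would create the secret in the current namespace, where RAW_TOKEN is a placeholder variable that holds your token.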
Creating CRs for the Istio exporter
Create the following CRs to collect metrics exposed by the Istio exporter.
- This CR specifies the location of the Prometheus server and a label selector to exclude jmx-tomcat from the Prometheus query mapping resource.
Verifying metrics collection
Check the Prometurbo logs to verify that Prometurbo has started collecting metrics from the Prometheus server.
The following example log output shows that Prometurbo discovered the CRs and created clients for the Prometheus server.
I0328 18:42:04.003329 1 provider.go:60] Discovered 4 PrometheusQueryMapping resources.
I0328 18:42:04.007689 1 provider.go:71] Discovered 2 PrometheusServerConfig resources.
I0328 18:42:04.007903 1 serverconfig.go:19] Loading PrometheusServerConfig turbo-community/prometheusserverconfig-emptycluster.
I0328 18:42:04.007927 1 client.go:68] Creating client for Prometheus server: http://prometheus.istio-system:9090
I0328 18:42:04.007935 1 serverconfig.go:36] There are 1 PrometheusQueryMapping resources in namespace turbo-community
I0328 18:42:04.007943 1 serverconfig.go:19] Loading PrometheusServerConfig turbo/prometheusserverconfig-singlecluster.
I0328 18:42:04.007947 1 client.go:68] Creating client for Prometheus server: http://prometheus.istio-system:9090
I0328 18:42:04.007950 1 serverconfig.go:36] There are 3 PrometheusQueryMapping resources in namespace turbo
I0328 18:42:04.008048 1 clusterconfig.go:39] Excluding turbo/jmx-tomcat.
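To check a long log for these discovery messages, a small filter can help. The following is a minimal sketch that assumes the log lines keep the "Discovered N ... resources" wording shown in the example above. The helper name summarize_discovery is hypothetical.

```shell
# Sketch: read Prometurbo log text on stdin and print one
# "<resource kind>=<count>" line for each discovery message.
summarize_discovery() {
  grep -E 'Discovered [0-9]+ Prometheus(QueryMapping|ServerConfig) resources' \
    | sed -E 's/.*Discovered ([0-9]+) (Prometheus[A-Za-z]+) resources.*/\2=\1/'
}
```

One way to use it is oc logs deployment/{prometurbo_deployment} -n {namespace} | summarize_discovery, where both placeholders stand for the values in your environment. If the filter prints nothing, Prometurbo may not have discovered the CRs yet.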