Setting up Prometheus for instance monitoring

Prometheus is an open-source monitoring tool hosted by the Cloud Native Computing Foundation. It gives you access to a variety of metrics that you can use to monitor your Data Gate for watsonx instances. The Prometheus web UI presents these metrics graphically. This section describes how to set up instance monitoring with Prometheus.

About this task

Prometheus collects the following metrics:

Core synchronization metrics:

Metric	Type	Description
`datagate_table_sync_latency_milliseconds`	Gauge	The current synchronization latency in milliseconds
`datagate_table_sync_deleted_rows_per_second`	Gauge	Number of rows that are deleted per second
`datagate_table_sync_inserted_rows_per_second`	Gauge	Number of rows that are inserted per second
`datagate_table_sync_target_inserted_rows_total`	Function Counter	Total number of rows inserted into the target database since the start of the synchronization process
`datagate_table_sync_target_deleted_rows_total`	Function Counter	Total number of rows that were deleted from the target database since the start of the synchronization process

Synchronization state metrics:

Metric	Type	Description
`datagate_table_sync_state`	Gauge	If synchronization is active, this metric has a value of 1; if not, the value is 0. This value has the label `state`.
`datagate_table_sync_state_value`	Gauge	A numeric synchronization status value that is useful for the debugging of unknown errors or unrecognized error codes.

Row operation metrics:

Metric	Type	Description
`datagate_table_sync_source_insert_log_records_total`	Function Counter	Total number of INSERT log records that were read from the source Db2® for z/OS® log since the start of the synchronization process
`datagate_table_sync_source_update_log_records_total`	Function Counter	Total number of UPDATE log records that were read from the source Db2 for z/OS log since the start of the synchronization process
`datagate_table_sync_source_delete_log_records_total`	Function Counter	Total number of DELETE log records that were read from the source Db2 for z/OS log since the start of the synchronization process
`datagate_table_sync_source_compensated_rows_total`	Function Counter	Total number of compensated rows processed by the source log parser
`datagate_table_sync_source_utility_log_records_total`	Function Counter	Total number of utility log records processed by the source log parser

Net effect operation metrics:

Metric	Type	Description
`datagate_table_sync_target_intra_tx_net_effect_operations_total`	Function Counter	Cumulative total of intra-transaction net-effect operations applied to the target database, that is, operations within a single transaction
`datagate_table_sync_target_inter_tx_net_effect_operations_total`	Function Counter	Cumulative total of inter-transaction net-effect operations applied to the target database, that is, a total calculated on the basis of individual transaction values between the synchronization start and the metric capture point

Capture time:

Metric	Type	Description
`datagate_table_sync_capture_time_timestamp_seconds`	Gauge	A UNIX timestamp in seconds that shows when the metrics were captured from the target database

Health metrics:

Metric	Type	Description
`datagate_table_sync_statistics_available`	Gauge	A value that shows whether synchronization statistics are currently available. If statistics are available, this value is 1; if not, it is 0.
`datagate_table_sync_metrics_refresh_success_total`	Counter	The number of successful metrics refresh operations since the start of the pod. The value is reset when you restart the pod.
`datagate_table_sync_metrics_refresh_failures_total`	Counter	The number of failed metrics refresh operations since the start of the pod. The value is reset when you restart the pod.
`datagate_table_sync_metrics_last_success_timestamp_seconds`	Gauge	A UNIX timestamp in seconds that shows the time of the last successful metrics refresh. This value remains 0 until the first successful refresh after the start of the pod.
`datagate_table_sync_metrics_refresh_duration_seconds`	Timer/histogram	Duration of the metrics refresh operation
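
To make the tables more concrete, the following sketch reads metrics in the Prometheus text format on standard input and prints the value of one metric. The helper name `metric_value` is a hypothetical choice, and the sketch assumes that sample lines carry no trailing timestamp.

```shell
# Print the value(s) of one metric from Prometheus text-format input.
# Usage: metric_value <metric-name> < metrics.txt
# metric_value is a hypothetical helper, not part of any product CLI.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP and TYPE comment lines
    {
      i = index($1, "{")                 # position of the label block, if any
      metric = (i > 0) ? substr($1, 1, i - 1) : $1
      if (metric == name) print $NF      # assumes the value is the last field
    }
  '
}
```

If a metric appears several times with different label values, the function prints one value per line.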

Procedure

In the terminal or shell window, set the following environment variables:

export CPD_URL="https://cpd-cpd-instance.apps.example.com"
export METRICS_URL="https://api-<host-route>/metrics"
export DATAGATE_NAMESPACE="<datagate-namespace>"
export DATAGATE_INSTANCE="<datagate-instance-name>"  # Example: dg-1879976000510032
export CPD_USERNAME="<cpd-user>"
export CPD_PASSWORD="<cpd-password>"
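
Later commands fail in confusing ways if one of these variables is empty. As a minimal sketch, a small helper can report which variables are still unset; the name `missing_vars` is an assumption made for illustration.

```shell
# Print the name of every listed variable that is unset or empty, one per line.
# missing_vars is a hypothetical helper; pass variable names without the "$".
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"             # indirect lookup of the variable
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}
```

For example, `missing_vars CPD_URL METRICS_URL DATAGATE_NAMESPACE DATAGATE_INSTANCE CPD_USERNAME CPD_PASSWORD` prints nothing when all six variables are set.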

Enabling OpenShift user workload monitoring

Enter the following command to create an OpenShift ConfigMap that enables user workload monitoring.

oc apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF

The output might include a warning message. The warning should be followed by this confirmation:

configmap/cluster-monitoring-config configured

Verify that the user workload monitoring pods are running:

oc get pods -n openshift-user-workload-monitoring

You should see a list of pods including:

Pod	Description
`prometheus-operator-*`	The Prometheus operator
`prometheus-user-workload-0`	Prometheus instance
`prometheus-user-workload-1`	Prometheus instance
`thanos-ruler-user-workload-0`	Thanos Ruler instance
`thanos-ruler-user-workload-1`	Thanos Ruler instance
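
Rather than reading the pod list by eye, you can filter it. The sketch below prints the pods whose STATUS column is not Running; it assumes the default `oc get pods` table layout (NAME, READY, STATUS, ...), and the helper name `pods_not_running` is hypothetical.

```shell
# Print the names of pods whose STATUS column is not "Running".
# Reads the default `oc get pods` table on standard input.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'   # NR > 1 skips the header row
}
```

A typical call would be `oc get pods -n openshift-user-workload-monitoring | pods_not_running`; empty output means that all listed pods are running.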

Enabling user workload monitoring for the Data Gate for watsonx namespace

Enter the following command to label the namespace. The label is required to enable user workload monitoring for the namespace.

oc label namespace "${DATAGATE_NAMESPACE}" openshift.io/user-monitoring=true

You see a confirmation on the screen, similar to this one:

namespace/cpd-instance labeled

Verify the label of the namespace by entering this command:

oc get namespace "${DATAGATE_NAMESPACE}" --show-labels

The output should be similar to the following:

NAME           STATUS   AGE    LABELS
cpd-instance   Active   176d   kubernetes.io/metadata.name=cpd-instance,openshift.io/user-monitoring=true,
pod-security.kubernetes.io/audit-version=latest,
pod-security.kubernetes.io/audit=restricted,
pod-security.kubernetes.io/warn-version=latest,
pod-security.kubernetes.io/warn=restricted,
rsi_5356bca0c5=enabled
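
Because the label list is long, a scripted check can be easier than scanning the output. The following sketch tests whether a comma-separated label list contains one exact label; `has_label` is a hypothetical helper name.

```shell
# Succeed if the comma-separated label list in $1 contains the exact label $2.
# Wrapping both sides in commas prevents partial matches.
has_label() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage (requires a cluster login):
#   labels=$(oc get namespace "${DATAGATE_NAMESPACE}" --show-labels --no-headers | awk '{ print $NF }')
#   has_label "$labels" "openshift.io/user-monitoring=true" && echo "label is set"
```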

Creating a bearer token secret

The metrics endpoint requires a bearer token for authentication. Therefore, you must create a Kubernetes secret containing the bearer token.

If a bearer token secret exists, and you cannot or do not want to use it, first delete this secret:
```
oc delete secret prometheus-bearer-token -n "${DATAGATE_NAMESPACE}"
```

Obtain a new token by entering the following command:

export TOKEN=$(
  curl -k -s -X POST "${CPD_URL}/icp4d-api/v1/authorize" \
    -H "Content-Type: application/json" \
    -d "{\"username\":\"${CPD_USERNAME}\",\"password\":\"${CPD_PASSWORD}\"}" \
  | jq -r '.token'
)
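
Before you store the token, it can help to confirm that the request actually returned one. If the response has no `token` field, `jq -r` prints the literal string `null`, and a failed request leaves the variable empty. The following sketch checks for both cases; the helper name `token_is_usable` is an assumption.

```shell
# Succeed only if the argument looks like a usable token:
# not empty, and not the literal "null" that `jq -r` prints for a missing field.
token_is_usable() {
  [ -n "$1" ] && [ "$1" != "null" ]
}
```

For example, `token_is_usable "${TOKEN}" || echo "No token received; check CPD_URL and the credentials"`.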

Create a bearer token secret for Prometheus:
```
oc create secret generic prometheus-bearer-token \
  --from-literal=token="${TOKEN}" \
  -n "${DATAGATE_NAMESPACE}"
```
The reply to this command should be:
```
secret/prometheus-bearer-token created
```
Tip: Bearer tokens expire. In production environments, you should implement a procedure that refreshes tokens and updates secrets automatically. To learn more, see Refreshing bearer tokens for Prometheus authentication.

Enter the following command to verify that Prometheus user workload monitoring is enabled in your OpenShift environment:

oc get route -n openshift-user-workload-monitoring

In the output, you should see a list of routes. At least one of these routes should list prometheus-user-workload in the SERVICES column, as in the following example:

NAME      HOST/PORT   PATH   SERVICES                  PORT      TERMINATION        WILDCARD
federate     <hostname>   /federate   prometheus-user-workload  federate  reencrypt/Redirect None
thanos-ruler <hostname>   /api        thanos-ruler              web       reencrypt/Redirect None

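Give Prometheus access to the bearer token secret that you created earlier. To do that, create a role that can read the secret and bind it to the Prometheus service account. The example uses the namespace cpd-instance; replace it with the namespace of your Data Gate for watsonx instance. Enter the following command: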

oc apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1 
kind: Role 
metadata: 
   name: prometheus-bearer-token-reader 
   namespace: cpd-instance 
rules: 
- apiGroups: [""] 
  resources: ["secrets"] 
  resourceNames: ["prometheus-bearer-token"] 
  verbs: ["get"] 
---
apiVersion: rbac.authorization.k8s.io/v1 
kind: RoleBinding 
metadata: 
   name: prometheus-bearer-token-reader 
   namespace: cpd-instance 
roleRef: 
   apiGroup: rbac.authorization.k8s.io 
   kind: Role 
   name: prometheus-bearer-token-reader 
subjects: 
 - kind: ServiceAccount 
   name: prometheus-user-workload 
   namespace: openshift-user-workload-monitoring 
EOF

Having submitted the command, you should see the following confirmation messages on the screen:

role.rbac.authorization.k8s.io/prometheus-bearer-token-reader created 
rolebinding.rbac.authorization.k8s.io/prometheus-bearer-token-reader created
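
To check that the binding has the intended effect, you can ask the API server whether the Prometheus service account may read the secret. The sketch below wraps `oc auth can-i` with the `--as` impersonation flag. The helper name `can_read_token_secret` and the `OC` override variable are assumptions, and the check only works if your own user is allowed to impersonate service accounts.

```shell
# OC selects the client binary; it defaults to oc (hypothetical override).
OC="${OC:-oc}"

# Ask whether the Prometheus service account may read the bearer token secret.
# $1 = namespace that contains the secret.
can_read_token_secret() {
  "$OC" auth can-i get secret/prometheus-bearer-token \
    -n "$1" \
    --as="system:serviceaccount:openshift-user-workload-monitoring:prometheus-user-workload"
}
```

Calling `can_read_token_secret "${DATAGATE_NAMESPACE}"` should print `yes` if the role and the role binding are in place.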

Creating a ServiceMonitor

The Prometheus operator needs a ServiceMonitor element to discover services and obtain ("scrape") metrics.

Enter the following commands to verify that the Data Gate for watsonx Db2z API service exists and to check its labels:

oc get svc -n "${DATAGATE_NAMESPACE}" | grep data-gate-db2z-api
oc get svc "${DATAGATE_INSTANCE}-data-gate-db2z-api-svc" -n "${DATAGATE_NAMESPACE}" --show-labels

The output should be similar to the following:

dg-1877277353792780-data-gate-db2z-api-svc   ClusterIP   172.30.151.196   <none>   8334/TCP
dg-1877277868942730-data-gate-db2z-api-svc   ClusterIP   172.30.22.46     <none>   8334/TCP

NAME                                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE   LABELS
dg-1877277868942730-data-gate-db2z-api-svc   ClusterIP   172.30.22.46   <none>        8334/TCP   38d   app.kubernetes.io/component=dg,
app.kubernetes.io/instance=dg-0-1877277868942730,
app.kubernetes.io/managed-by=dg-1877277868942730,
app.kubernetes.io/name=dg-1877277868942730,app=dg-1877277868942730,
component=dg,icpdsupport/addOnId=dg,icpdsupport/app=dg-instance-db2z-api,
icpdsupport/assemblyName=datagate,icpdsupport/ignore-on-nd-backup=true,
icpdsupport/module=dginstance,release=dg-0-1877277868942730,
velero.io/exclude-from-backup=true

Create a ServiceMonitor with an HTTPS configuration, as in the following example. Before you apply the manifest, replace <instance-id> with the ID of the Data Gate for watsonx instance that you want to monitor with Prometheus. The example uses the ID 1781677861056642; replace this value as well wherever it appears.
Note: A single ServiceMonitor element can serve multiple Data Gate for watsonx instances within the same OpenShift cluster. The following example shows a configuration for a single instance only.
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: datagate-1781677861056642-db2z-api-metrics
  namespace: cpd-instance
  labels:
    app.kubernetes.io/name: datagate
    app.kubernetes.io/instance: "1781677861056642"
    openshift.io/user-monitoring: "true"
spec:
  selector:
    matchLabels:
      app: dg-<instance-id>
      icpdsupport/app: dg-instance-db2z-api
  endpoints:
  - port: http
    path: /metrics
    interval: 15s
    scrapeTimeout: 10s
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    authorization:
      type: Bearer
      credentials:
        key: token
        name: prometheus-bearer-token
    metricRelabelings:
    - sourceLabels: []
      targetLabel: datagate_instance_id
      replacement: "1781677861056642"
```
Explanations:

scheme: https

The Data Gate for watsonx API uses HTTPS with a self-signed certificate.

tlsConfig.insecureSkipVerify: true

Because the certificate is self-signed, certificate verification must be skipped.

openshift.io/user-monitoring: "true"

OpenShift user workload monitoring requires this setting to discover the ServiceMonitor.

selector.matchLabels

This selector matches services that carry both the label app=dg-<instance-id> and the label icpdsupport/app=dg-instance-db2z-api.

authorization.credentials

This setting references the key token in the prometheus-bearer-token secret that you created earlier.

The ServiceMonitor creation is confirmed by the following message:
```
servicemonitor.monitoring.coreos.com/datagate-1781677861056642-db2z-api-metrics created
```
where 1781677861056642 is replaced with the actual Data Gate for watsonx instance ID.
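
Because the instance ID appears in several places in the manifest, replacing it by hand is error-prone. As a sketch, a shell function can print the manifest for a given instance ID and namespace. The function name `render_servicemonitor` is hypothetical; the manifest fields are copied from the example above.

```shell
# Print a ServiceMonitor manifest for one instance.
# $1 = instance ID, $2 = namespace. render_servicemonitor is a hypothetical helper.
render_servicemonitor() {
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: datagate-$1-db2z-api-metrics
  namespace: $2
  labels:
    app.kubernetes.io/name: datagate
    app.kubernetes.io/instance: "$1"
    openshift.io/user-monitoring: "true"
spec:
  selector:
    matchLabels:
      app: dg-$1
      icpdsupport/app: dg-instance-db2z-api
  endpoints:
  - port: http
    path: /metrics
    interval: 15s
    scrapeTimeout: 10s
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    authorization:
      type: Bearer
      credentials:
        key: token
        name: prometheus-bearer-token
    metricRelabelings:
    - sourceLabels: []
      targetLabel: datagate_instance_id
      replacement: "$1"
EOF
}
```

The output can be piped to the client, for example `render_servicemonitor 1781677861056642 "${DATAGATE_NAMESPACE}" | oc apply -f -`.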

Verify that the ServiceMonitor exists in the namespace of your Data Gate for watsonx instance. In the following commands, replace 1781677861056642 with the instance ID that you used when you created the ServiceMonitor.

oc get servicemonitor -n "${DATAGATE_NAMESPACE}"
oc describe servicemonitor datagate-1781677861056642-db2z-api-metrics -n "${DATAGATE_NAMESPACE}"

You should see a screen output similar to this one:

Name:         datagate-1781677861056642-db2z-api-metrics
Namespace:    cpd-instance
Labels:       app.kubernetes.io/instance=1781677861056642
              app.kubernetes.io/name=datagate
              openshift.io/user-monitoring=true
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Creation Timestamp:  2026-06-26T14:46:29Z
  Generation:          1
  Resource Version:    133491985
  UID:                 0c9bb932-1e4e-4340-a0ec-14555c4e157b
Spec:
  Endpoints:
    Authorization:
      Credentials:
        Key:   token
        Name:  prometheus-bearer-token
      Type:    Bearer
    Interval:  15s
    Metric Relabelings:
      Action:       replace
      Replacement:  1781677861056642
      Source Labels:
      Target Label:  datagate_instance_id
    Path:            /metrics
    Port:            http
    Scheme:          https
    Scrape Timeout:  10s
    Tls Config:
      Insecure Skip Verify:  true
  Selector:
    Match Labels:
      App:              dg-1781677861056642
      icpdsupport/app:  dg-instance-db2z-api
Events:                 <none>

In your output, the example ID 1781677861056642 is replaced with the actual Data Gate for watsonx instance ID.
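
The `oc describe` output confirms that the ServiceMonitor exists, but not that the endpoint delivers metrics. One way to check the endpoint itself is to request it with the bearer token and list the metric names it returns. The sketch below covers only the listing part. The helper name `list_datagate_metrics` is hypothetical, and passing the token in an `Authorization: Bearer` header is an assumption based on the authentication requirement described earlier.

```shell
# Print the distinct datagate_* metric names found in Prometheus text-format input.
list_datagate_metrics() {
  awk '
    /^datagate_/ {
      i = index($1, "{")                 # strip the label block, if present
      name = (i > 0) ? substr($1, 1, i - 1) : $1
      print name
    }
  ' | sort -u
}

# Possible usage (requires network access to the endpoint):
#   curl -k -s -H "Authorization: Bearer ${TOKEN}" "${METRICS_URL}" | list_datagate_metrics
```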