Setting up Prometheus for instance monitoring

Prometheus is an open-source monitoring tool hosted by the Cloud Native Computing Foundation. It collects a variety of metrics that you can use to monitor your Data Gate for watsonx instances. The Prometheus web UI presents these metrics graphically. This section describes how to set up instance monitoring with Prometheus.

About this task

Prometheus collects the following metrics:

Core synchronization metrics:
Metric Type Description
datagate_table_sync_latency_milliseconds Gauge The current synchronization latency in milliseconds
datagate_table_sync_deleted_rows_per_second Gauge Number of rows that are deleted per second
datagate_table_sync_inserted_rows_per_second Gauge Number of rows that are inserted per second
datagate_table_sync_target_inserted_rows_total Function Counter Total number of rows inserted into the target database since the start of the synchronization process
datagate_table_sync_target_deleted_rows_total Function Counter Total number of rows that were deleted from the target database since the start of the synchronization process

Synchronization state metrics:

Metric Type Description
datagate_table_sync_state Gauge The value is 1 if synchronization is active and 0 if it is not. The metric carries the label state.
datagate_table_sync_state_value Gauge A numeric synchronization status value that is useful for the debugging of unknown errors or unrecognized error codes.
Row operation metrics:
Metric Type Description
datagate_table_sync_source_insert_log_records_total Function Counter Total number of INSERT log records that were read from the source Db2® for z/OS® log since the start of the synchronization process
datagate_table_sync_source_update_log_records_total Function Counter Total number of UPDATE log records that were read from the source Db2 for z/OS log since the start of the synchronization process
datagate_table_sync_source_delete_log_records_total Function Counter Total number of DELETE log records that were read from the source Db2 for z/OS log since the start of the synchronization process
datagate_table_sync_source_compensated_rows_total Function Counter Total number of compensated rows processed by the source log parser
datagate_table_sync_source_utility_log_records_total Function Counter Total number of utility log records processed by the source log parser
Net effect operation metrics:
Metric Type Description
datagate_table_sync_target_intra_tx_net_effect_operations_total Function Counter Cumulative total of intra-transaction net-effect operations applied to the target database, that is, operations within a single transaction
datagate_table_sync_target_inter_tx_net_effect_operations_total Function Counter Cumulative total of inter-transaction net-effect operations applied to the target database, that is, a total calculated on the basis of individual transaction values between the synchronization start and the metric capture point
Capture time:
Metric Type Description
datagate_table_sync_capture_time_timestamp_seconds Gauge A UNIX timestamp in seconds that shows when the metrics were captured from the target database
Health metrics:
Metric Type Description
datagate_table_sync_statistics_available Gauge Shows whether synchronization statistics are currently available. The value is 1 if statistics are available and 0 if they are not.
datagate_table_sync_metrics_refresh_success_total Counter The number of successful metrics refresh operations since the start of the pod. The value is reset when you restart the pod.
datagate_table_sync_metrics_refresh_failures_total Counter The number of failed metrics refresh operations since the start of the pod. The value is reset when you restart the pod.
datagate_table_sync_metrics_last_success_timestamp_seconds Gauge A UNIX timestamp in seconds that shows the time of the last successful metrics refresh. This value remains 0 until the first successful refresh after the start of the pod.
datagate_table_sync_metrics_refresh_duration_seconds Timer/histogram Duration of the metrics refresh operation
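
To spot-check a single metric from the tables above without opening the web UI, you can filter the raw output of the metrics endpoint. The following is a minimal sketch, not part of the product. It assumes that the endpoint returns the standard Prometheus text format, with one "name{labels} value" sample per line, and the helper name metric_values is hypothetical.

```shell
#!/bin/sh
# Print the sample value(s) of one metric from Prometheus text-format input on stdin.
# Assumption: samples follow the usual "name{labels} value [timestamp]" layout.
metric_values() {
  awk -v name="$1" '
    /^#/ || NF == 0 { next }            # skip HELP/TYPE comments and blank lines
    {
      metric = $0
      sub(/[{ ].*/, "", metric)         # keep only the metric name
      if (metric != name) next
      rest = $0
      if (index(rest, "}") > 0)
        sub(/^.*[}] */, "", rest)       # drop everything up to the closing brace
      else
        sub(/^[^ ]+ +/, "", rest)       # no labels: drop the metric name
      split(rest, parts, " ")
      print parts[1]                    # the value precedes any optional timestamp
    }'
}
```

After you have set METRICS_URL and obtained TOKEN as described in the procedure, a call might look like this, assuming the endpoint accepts the token in an Authorization: Bearer header:

    curl -k -s -H "Authorization: Bearer ${TOKEN}" "${METRICS_URL}" | metric_values datagate_table_sync_latency_milliseconds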

Procedure

  1. Log in to the OpenShift® server where the Data Gate for watsonx instance is installed.
  2. In the terminal or shell window, set the following environment variables:
    export CPD_URL="https://cpd-cpd-instance.apps.example.com"
    export METRICS_URL="https://api-<host-route>/metrics"
    export DATAGATE_NAMESPACE="<datagate-namespace>"
    export DATAGATE_INSTANCE="<datagate-instance-name>"  # Example: dg-1879976000510032
    export CPD_USERNAME="<cpd-user>"
    export CPD_PASSWORD="<cpd-password>"
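
Because every later command depends on these variables, it can help to confirm that none of them is empty before you continue. The following sketch shows one way to do that. The function name require_vars is hypothetical and not part of the product.

```shell
#!/bin/sh
# Report every variable from the argument list that is unset or empty.
# Returns 0 if all variables are set, 1 otherwise.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"            # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible call for the variables set in this step:

    require_vars CPD_URL METRICS_URL DATAGATE_NAMESPACE DATAGATE_INSTANCE CPD_USERNAME CPD_PASSWORD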

Enabling OpenShift user workload monitoring

  1. Enter the following command to create an OpenShift ConfigMap that enables user workload monitoring.
    oc apply -f - <<'EOF'
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
    EOF
    The output might include a warning message, followed by this confirmation:
    configmap/cluster-monitoring-config configured
  2. Verify that the user workload monitoring pods are running:
    oc get pods -n openshift-user-workload-monitoring
    You should see a list of pods including:
    Pod Description
    prometheus-operator-* The Prometheus operator
    prometheus-user-workload-0 Prometheus instance
    prometheus-user-workload-1 Prometheus instance
    thanos-ruler-user-workload-0 Thanos Ruler instance
    thanos-ruler-user-workload-1 Thanos Ruler instance
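
The pod check above can also be scripted, for example in a setup script that should stop if monitoring is not ready. The following sketch assumes the default column layout of oc get pods (NAME, READY, STATUS, ...). The function name all_pods_running is hypothetical.

```shell
#!/bin/sh
# Read "oc get pods --no-headers" output on stdin.
# Succeeds only if at least one pod is listed and every STATUS column reads Running.
all_pods_running() {
  awk '
    NF >= 3 { seen++; if ($3 != "Running") bad++ }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }'
}
```

A possible call:

    oc get pods -n openshift-user-workload-monitoring --no-headers | all_pods_running && echo "user workload monitoring is up"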

Enabling user workload monitoring for the Data Gate for watsonx namespace

  1. Enter the following command to label the namespace. The label is required to enable user workload monitoring for the namespace.
    oc label namespace "${DATAGATE_NAMESPACE}" openshift.io/user-monitoring=true 
    
    You see a confirmation on the screen, similar to this one:
    namespace/cpd-instance labeled

    Verify the label of the namespace by entering this command:

    oc get namespace "${DATAGATE_NAMESPACE}" --show-labels
    The output should be similar to the following:
    NAME  STATUS   AGE   LABELS
    cpd-instance   Active   176d  kubernetes.io/metadata.name=cpd-instance,openshift.io/user-monitoring=true,
    pod-security.kubernetes.io/audit-version=latest,
    pod-security.kubernetes.io/audit=restricted,
    pod-security.kubernetes.io/warn-version=latest,
    pod-security.kubernetes.io/warn=restricted,
    rsi_5356bca0c5=enabled
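
Instead of reading the label list by eye, you can test for the label in a script. The following sketch checks a comma-separated label list, such as the LABELS column shown above, for one exact key=value pair. The function name has_label is hypothetical.

```shell
#!/bin/sh
# Succeeds if the comma-separated label list in $1 contains the exact label $2.
has_label() {
  case ",$1," in
    *",$2,"*) return 0 ;;   # surrounding commas prevent partial matches
    *) return 1 ;;
  esac
}
```

A possible call, which takes the last column of the oc output as the label list:

    labels=$(oc get namespace "${DATAGATE_NAMESPACE}" --show-labels --no-headers | awk '{print $NF}')
    has_label "$labels" "openshift.io/user-monitoring=true" && echo "namespace is labeled"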

Creating a bearer token secret

The metrics endpoint requires a bearer token for authentication. Therefore, you must create a Kubernetes secret containing the bearer token.

  1. If a bearer token secret exists, and you cannot or do not want to use it, first delete this secret:
    oc delete secret prometheus-bearer-token -n "${DATAGATE_NAMESPACE}"
  2. Obtain a new token by entering the following command:
    export TOKEN=$(
      curl -k -s -X POST "${CPD_URL}/icp4d-api/v1/authorize" \
        -H "Content-Type: application/json" \
        -d "{\"username\":\"${CPD_USERNAME}\",\"password\":\"${CPD_PASSWORD}\"}" \
      | jq -r '.token'
    )
  3. Create a bearer token secret for Prometheus:
    oc create secret generic prometheus-bearer-token \
      --from-literal=token="${TOKEN}" \
      -n "${DATAGATE_NAMESPACE}"
    The reply to this command should be:
    secret/prometheus-bearer-token created
    
    
    Tip: Bearer tokens expire. In production environments, you should implement a procedure that refreshes tokens and updates secrets automatically. To learn more, see Refreshing bearer tokens for Prometheus authentication.
  4. Enter the following command to ensure that Prometheus user workload monitoring has been enabled in your OpenShift environment:
    oc get route -n openshift-user-workload-monitoring
    In the output, you should see a list of routes with their host names. At least one of these routes should list prometheus-user-workload in the SERVICES column, as in the following example:
    NAME      HOST/PORT   PATH   SERVICES                  PORT      TERMINATION        WILDCARD
    federate     <hostname>   /federate   prometheus-user-workload  federate  reencrypt/Redirect None
    thanos-ruler <hostname>   /api        thanos-ruler              web       reencrypt/Redirect None
    
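  5. Give Prometheus access to the bearer token secret you created in step 3. To do that, create a role that can read the secret and bind the role to the Prometheus service account. The example uses the namespace cpd-instance; replace it with your own namespace if it differs. Enter the following command: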
    oc apply -f - <<'EOF'
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: prometheus-bearer-token-reader
      namespace: cpd-instance
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      resourceNames: ["prometheus-bearer-token"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: prometheus-bearer-token-reader
      namespace: cpd-instance
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: prometheus-bearer-token-reader
    subjects:
    - kind: ServiceAccount
      name: prometheus-user-workload
      namespace: openshift-user-workload-monitoring
    EOF
    Having submitted the command, you should see the following confirmation messages on the screen:
    role.rbac.authorization.k8s.io/prometheus-bearer-token-reader created 
    rolebinding.rbac.authorization.k8s.io/prometheus-bearer-token-reader created
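
Because bearer tokens expire, the token request and the secret update from this section eventually have to be repeated. The following is a minimal sketch of how the two steps could be combined so that the secret is replaced in place. It is not the procedure from the referenced topic on refreshing bearer tokens, and the function names are hypothetical. It reuses the environment variables from the procedure and skips the update if the login does not return a token.

```shell
#!/bin/sh
# jq -r prints the literal string "null" when the response has no .token field,
# so treat both an empty string and "null" as a failed login.
token_looks_valid() {
  case "$1" in
    ""|null) return 1 ;;
    *) return 0 ;;
  esac
}

# Request a new token and replace the secret in place.
# Assumes CPD_URL, CPD_USERNAME, CPD_PASSWORD and DATAGATE_NAMESPACE are set.
refresh_bearer_token_secret() {
  new_token=$(
    curl -k -s -X POST "${CPD_URL}/icp4d-api/v1/authorize" \
      -H "Content-Type: application/json" \
      -d "{\"username\":\"${CPD_USERNAME}\",\"password\":\"${CPD_PASSWORD}\"}" \
    | jq -r '.token'
  )
  if ! token_looks_valid "$new_token"; then
    echo "Token request failed; keeping the existing secret" >&2
    return 1
  fi
  # --dry-run=client renders the manifest locally; oc apply then creates or updates it.
  oc create secret generic prometheus-bearer-token \
    --from-literal=token="${new_token}" \
    -n "${DATAGATE_NAMESPACE}" \
    --dry-run=client -o yaml | oc apply -f -
}
```

Piping the rendered manifest to oc apply avoids the delete-and-recreate sequence, so the secret never disappears between the two commands. How often to call refresh_bearer_token_secret depends on the token lifetime in your environment.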

Creating a ServiceMonitor

The Prometheus operator needs a ServiceMonitor element to discover services and obtain ("scrape") metrics.

  1. Enter the following commands to verify that the Data Gate for watsonx Db2z API service exists and to check its labels:
    oc get svc -n "${DATAGATE_NAMESPACE}" | grep data-gate-db2z-api
    oc get svc "${DATAGATE_INSTANCE}-data-gate-db2z-api-svc" -n "${DATAGATE_NAMESPACE}" --show-labels
    The output should be similar to the following:
    dg-1877277353792780-data-gate-db2z-api-svc ClusterIP 172.30.151.196 <none>   8334/TCP
    dg-1877277868942730-data-gate-db2z-api-svc ClusterIP 172.30.22.46   <none>   8334/TCP

    NAME                                       TYPE      CLUSTER-IP   EXTERNAL-IP PORT(S)  AGE LABELS
    dg-1877277868942730-data-gate-db2z-api-svc ClusterIP 172.30.22.46 <none>      8334/TCP 38d app.kubernetes.io/component=dg,
    app.kubernetes.io/instance=dg-0-1877277868942730,
    app.kubernetes.io/managed-by=dg-1877277868942730,
    app.kubernetes.io/name=dg-1877277868942730,app=dg-1877277868942730,
    component=dg,icpdsupport/addOnId=dg,icpdsupport/app=dg-instance-db2z-api,
    icpdsupport/assemblyName=datagate,icpdsupport/ignore-on-nd-backup=true,
    icpdsupport/module=dginstance,release=dg-0-1877277868942730,
    velero.io/exclude-from-backup=true
  2. Create a ServiceMonitor with an HTTPS configuration, as in the following example. Before you apply the definition, replace <instance-id> with the ID of the Data Gate for watsonx instance that you want to monitor with Prometheus. The example uses the ID 1781677861056642; replace every occurrence of this ID as well.
    Note: A single ServiceMonitor element can serve multiple Data Gate for watsonx instances within the same OpenShift cluster. The following example shows a configuration for a single instance only.
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: datagate-1781677861056642-db2z-api-metrics
      namespace: cpd-instance
      labels:
        app.kubernetes.io/name: datagate
        app.kubernetes.io/instance: "1781677861056642"
        openshift.io/user-monitoring: "true"
    spec:
      selector:
        matchLabels:
          app: dg-<instance-id>
          icpdsupport/app: dg-instance-db2z-api
      endpoints:
      - port: http
        path: /metrics
        interval: 15s
        scrapeTimeout: 10s
        scheme: https
        tlsConfig:
          insecureSkipVerify: true
        authorization:
          type: Bearer
          credentials:
            key: token
            name: prometheus-bearer-token
        metricRelabelings:
        - sourceLabels: []
          targetLabel: datagate_instance_id
          replacement: "1781677861056642"
    Explanations:
    scheme: https
    The Data Gate for watsonx API uses HTTPS with a self-signed certificate.
    tlsConfig.insecureSkipVerify: true
    Certificate verification must be skipped because the certificate is self-signed.
    openshift.io/user-monitoring: "true"
    OpenShift user workload monitoring requires this label to discover the ServiceMonitor.
    selector.matchLabels
    The selector matches services that carry the label icpdsupport/app=dg-instance-db2z-api together with the app label of the instance.
    authorization.credentials
    These lines reference the secret you created in step 3 of the section Creating a bearer token secret.
    The ServiceMonitor creation is confirmed by the following message:
    servicemonitor.monitoring.coreos.com/datagate-1781677861056642-db2z-api-metrics created

    In your output, the actual Data Gate for watsonx instance ID appears in place of 1781677861056642.

  3. Verify that the ServiceMonitor exists in the namespace of your Data Gate for watsonx environment. In the second of the following commands, replace 1781677861056642 with the instance ID shown in the confirmation message of the previous step.
    oc get servicemonitor -n "${DATAGATE_NAMESPACE}"
    oc describe servicemonitor datagate-1781677861056642-db2z-api-metrics -n "${DATAGATE_NAMESPACE}"
    You should see a screen output similar to this one:
    Name:         datagate-1781677861056642-db2z-api-metrics
    Namespace:    cpd-instance
    Labels:       app.kubernetes.io/instance=1781677861056642
                  app.kubernetes.io/name=datagate
                  openshift.io/user-monitoring=true
    Annotations:  <none>
    API Version:  monitoring.coreos.com/v1
    Kind:         ServiceMonitor
    Metadata:
      Creation Timestamp:  2026-06-26T14:46:29Z
      Generation:          1
      Resource Version:    133491985
      UID:                 0c9bb932-1e4e-4340-a0ec-14555c4e157b
    Spec:
      Endpoints:
        Authorization:
          Credentials:
            Key:   token
            Name:  prometheus-bearer-token
          Type:    Bearer
        Interval:  15s
        Metric Relabelings:
          Action:       replace
          Replacement:  1781677861056642
          Source Labels:
          Target Label:  datagate_instance_id
        Path:            /metrics
        Port:            http
        Scheme:          https
        Scrape Timeout:  10s
        Tls Config:
          Insecure Skip Verify:  true
      Selector:
        Match Labels:
          App:              dg-1781677861056642
          icpdsupport/app:  dg-instance-db2z-api
    Events:                 <none>

    In your output, the actual Data Gate for watsonx instance ID appears in place of 1781677861056642.
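
Because the ServiceMonitor definition in step 2 contains the instance ID in several places, replacing it by hand is error-prone. The following sketch generates the same definition from two arguments so that all occurrences stay consistent. The function name render_servicemonitor is hypothetical; the field values are copied from the example in step 2.

```shell
#!/bin/sh
# Print a ServiceMonitor manifest for one Data Gate for watsonx instance.
# $1 = instance ID (digits only, without the dg- prefix), $2 = namespace.
render_servicemonitor() {
  instance_id=$1
  namespace=$2
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: datagate-${instance_id}-db2z-api-metrics
  namespace: ${namespace}
  labels:
    app.kubernetes.io/name: datagate
    app.kubernetes.io/instance: "${instance_id}"
    openshift.io/user-monitoring: "true"
spec:
  selector:
    matchLabels:
      app: dg-${instance_id}
      icpdsupport/app: dg-instance-db2z-api
  endpoints:
  - port: http
    path: /metrics
    interval: 15s
    scrapeTimeout: 10s
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    authorization:
      type: Bearer
      credentials:
        key: token
        name: prometheus-bearer-token
    metricRelabelings:
    - sourceLabels: []
      targetLabel: datagate_instance_id
      replacement: "${instance_id}"
EOF
}
```

A possible call, assuming that the instance ID is the value of DATAGATE_INSTANCE without its dg- prefix, as the sample service names suggest:

    render_servicemonitor "${DATAGATE_INSTANCE#dg-}" "${DATAGATE_NAMESPACE}" | oc apply -f -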