Prometheus is an open-source monitoring tool hosted by the Cloud Native Computing
Foundation. It gives you access to a variety of metrics that you can use to monitor your Data Gate for watsonx instances. The Prometheus web UI presents
these metrics graphically. This section describes how to set up instance monitoring with
Prometheus.
About this task
Prometheus collects the following metrics:
Core synchronization metrics:
| Metric | Type | Description |
|---|---|---|
| datagate_table_sync_latency_milliseconds | Gauge | The current synchronization latency in milliseconds |
| datagate_table_sync_deleted_rows_per_second | Gauge | Number of rows that are deleted per second |
| datagate_table_sync_inserted_rows_per_second | Gauge | Number of rows that are inserted per second |
| datagate_table_sync_target_inserted_rows_total | Function Counter | Total number of rows inserted into the target database since the start of the synchronization process |
| datagate_table_sync_target_deleted_rows_total | Function Counter | Total number of rows that were deleted from the target database since the start of the synchronization process |
Synchronization state metrics:
| Metric | Type | Description |
|---|---|---|
| datagate_table_sync_state | Gauge | If synchronization is active, this metric has a value of 1; if not, the value is 0. This value has the label state. |
| datagate_table_sync_state_value | Gauge | A numeric synchronization status value that is useful for the debugging of unknown errors or unrecognized error codes. |
Row operation metrics:
| Metric | Type | Description |
|---|---|---|
| datagate_table_sync_source_insert_log_records_total | Function Counter | Total number of INSERT log records that were read from the source Db2® for z/OS® log since the start of the synchronization process |
| datagate_table_sync_source_update_log_records_total | Function Counter | Total number of UPDATE log records that were read from the source Db2 for z/OS log since the start of the synchronization process |
| datagate_table_sync_source_delete_log_records_total | Function Counter | Total number of DELETE log records that were read from the source Db2 for z/OS log since the start of the synchronization process |
| datagate_table_sync_source_compensated_rows_total | Function Counter | Total number of compensated rows processed by the source log parser |
| datagate_table_sync_source_utility_log_records_total | Function Counter | Total number of utility log records processed by the source log parser |
Net effect operation metrics:
| Metric | Type | Description |
|---|---|---|
| datagate_table_sync_target_intra_tx_net_effect_operations_total | Function Counter | Cumulative total of intra-transaction net-effect operations applied to the target database, that is, operations within a single transaction |
| datagate_table_sync_target_inter_tx_net_effect_operations_total | Function Counter | Cumulative total of inter-transaction net-effect operations applied to the target database, that is, a total calculated on the basis of individual transaction values between the synchronization start and the metric capture point |
Capture time:
| Metric | Type | Description |
|---|---|---|
| datagate_table_sync_capture_time_timestamp_seconds | Gauge | A UNIX timestamp in seconds that shows when the metrics were captured from the target database |
Health metrics:
| Metric | Type | Description |
|---|---|---|
| datagate_table_sync_statistics_available | Gauge | A value that shows whether synchronization statistics are currently available. If statistics are available, this value is 1; if not, it is 0. |
| datagate_table_sync_metrics_refresh_success_total | Counter | The number of successful metrics refresh operations since the start of the pod. The value is reset when you restart the pod. |
| datagate_table_sync_metrics_refresh_failures_total | Counter | The number of failed metrics refresh operations since the start of the pod. The value is reset when you restart the pod. |
| datagate_table_sync_metrics_last_success_timestamp_seconds | Gauge | A UNIX timestamp in seconds that shows the time of the last successful metrics refresh. This value remains 0 until the first successful refresh after the start of the pod. |
| datagate_table_sync_metrics_refresh_duration_seconds | Timer/histogram | Duration of the metrics refresh operation |
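To look at a single metric from the tables above without the Prometheus web UI, you can filter the raw output of the metrics endpoint. The following is a minimal sketch, not part of the product: it assumes that the endpoint returns the standard Prometheus text exposition format, and the helper name metric_samples is hypothetical. The usage comment relies on the TOKEN and METRICS_URL variables that are set later in the procedure.

```shell
# metric_samples NAME: read Prometheus exposition text on stdin and print
# the sample lines whose metric name is exactly NAME (with or without labels).
metric_samples() {
  awk -v name="$1" '
    /^#/ { next }                   # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)      # drop the label set, if there is one
      if (metric == name) print
    }'
}

# Usage (assumes TOKEN and METRICS_URL are set as described in the procedure):
#   curl -k -s -H "Authorization: Bearer ${TOKEN}" "${METRICS_URL}" \
#     | metric_samples datagate_table_sync_latency_milliseconds
```

Because the comparison is on the exact metric name, a request for datagate_table_sync_state does not also return datagate_table_sync_state_value.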
Procedure
- Log in to the OpenShift® server where the Data Gate for watsonx instance is installed.
- In the terminal or shell window, set the following environment variables:
export CPD_URL="https://cpd-cpd-instance.apps.example.com"
export METRICS_URL="https://api-<host-route>/metrics"
export DATAGATE_NAMESPACE="<datagate-namespace>"
export DATAGATE_INSTANCE="<datagate-instance-name>" # Example: dg-1879976000510032
export CPD_USERNAME="<cpd-user>"
export CPD_PASSWORD="<cpd-password>"
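Because every later command depends on these variables, it can help to check them before you continue. The following sketch uses a hypothetical helper named require_vars. It reports variables that are unset or empty, or that still contain an unreplaced placeholder in angle brackets.

```shell
# require_vars NAME...: report each named variable that is unset, empty, or
# still holds an unreplaced <placeholder>. Returns non-zero if any check fails.
# The names are passed to eval, so call this only with literal variable names.
require_vars() {
  _rv_failed=0
  for _rv_name in "$@"; do
    eval "_rv_value=\${$_rv_name:-}"
    case "$_rv_value" in
      "")        echo "missing: $_rv_name" >&2; _rv_failed=1 ;;
      *"<"*">"*) echo "placeholder not replaced: $_rv_name" >&2; _rv_failed=1 ;;
    esac
  done
  return "$_rv_failed"
}

# Usage:
#   require_vars CPD_URL METRICS_URL DATAGATE_NAMESPACE DATAGATE_INSTANCE \
#     CPD_USERNAME CPD_PASSWORD || echo "Fix the variables listed above." >&2
```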
Enabling OpenShift user workload monitoring
- Enter the following command to create an OpenShift ConfigMap that enables user workload
monitoring.
oc apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
In the screen output, you might see a warning message. This should be followed by
this message:
configmap/cluster-monitoring-config configured
- Verify that the user workload monitoring pods are running:
oc get pods -n openshift-user-workload-monitoring
You should see a list of pods including:
| Pod | Description |
|---|---|
| prometheus-operator-* | The Prometheus operator |
| prometheus-user-workload-0 | Prometheus instance |
| prometheus-user-workload-1 | Prometheus instance |
| thanos-ruler-user-workload-0 | Thanos Ruler instance |
| thanos-ruler-user-workload-1 | Thanos Ruler instance |
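Instead of checking the pod list by eye, you can filter it for pods that are not yet running. The sketch below uses a hypothetical helper named not_running. It assumes the default column layout of oc get pods (NAME, READY, STATUS, RESTARTS, AGE) and input that was produced with the --no-headers option.

```shell
# not_running: read `oc get pods --no-headers` output on stdin and print the
# names of all pods whose STATUS column (third field) is not "Running".
not_running() {
  awk '$3 != "Running" { print $1 }'
}

# Usage:
#   oc get pods -n openshift-user-workload-monitoring --no-headers | not_running
# Empty output means that every listed pod reports the Running status.
```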
Enabling user workload monitoring for the Data Gate for watsonx namespace
- Enter the following command to label the namespace. The label is required to enable
user workload monitoring for the instances in this namespace.
oc label namespace "${DATAGATE_NAMESPACE}" openshift.io/user-monitoring=true
You see a confirmation on the screen, similar to this one:
namespace/cpd-instance labeled
Verify the label of the namespace by entering this command:
oc get namespace "${DATAGATE_NAMESPACE}" --show-labels
The output should be similar to the following:
NAME STATUS AGE LABELS
cpd-instance Active 176d kubernetes.io/metadata.name=cpd-instance,openshift.io/user-monitoring=true,
pod-security.kubernetes.io/audit-version=latest,
pod-security.kubernetes.io/audit=restricted,
pod-security.kubernetes.io/warn-version=latest,
pod-security.kubernetes.io/warn=restricted,
rsi_5356bca0c5=enabled
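The label list can be long, so a scripted check is less error-prone than reading it. The following sketch uses a hypothetical helper named has_label, which tests a comma-separated label string for one exact key=value pair. The usage comment assumes that LABELS is the last column of the --show-labels output.

```shell
# has_label LABELS KEY=VALUE: succeed if the comma-separated LABELS string
# contains exactly the given KEY=VALUE pair.
has_label() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage:
#   labels=$(oc get namespace "${DATAGATE_NAMESPACE}" --no-headers --show-labels \
#     | awk '{ print $NF }')
#   has_label "$labels" "openshift.io/user-monitoring=true" \
#     || echo "namespace is not labeled for user workload monitoring" >&2
```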
Creating a bearer token secret
The metrics endpoint requires a bearer token for authentication. Therefore, you must
create a Kubernetes secret containing the bearer token.
- If a bearer token secret exists, and you cannot or do not want to use it, first delete
this secret:
oc delete secret prometheus-bearer-token -n "${DATAGATE_NAMESPACE}"
- Obtain a new token by entering the following command:
export TOKEN=$(
curl -k -s -X POST "${CPD_URL}/icp4d-api/v1/authorize" \
-H "Content-Type: application/json" \
-d "{\"username\":\"${CPD_USERNAME}\",\"password\":\"${CPD_PASSWORD}\"}" \
| jq -r '.token'
)
- Create a bearer token secret for Prometheus:
oc create secret generic prometheus-bearer-token \
--from-literal=token="${TOKEN}" \
-n "${DATAGATE_NAMESPACE}"
The reply to this command should be:
secret/prometheus-bearer-token created
- Enter the following command to verify that Prometheus user workload monitoring has been
enabled in your OpenShift environment:
oc get route -n openshift-user-workload-monitoring
In the output, you should see a list of host names. At least one of these hosts
should list prometheus-user-workload in the SERVICES column, as in the following
example:
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
federate <hostname> /federate prometheus-user-workload federate reencrypt/Redirect None
thanos-ruler <hostname> /api thanos-ruler web reencrypt/Redirect None
- Give Prometheus access to the bearer token secret you created in step 8. To do that, you must create a bearer token role and assign it to
Prometheus. Enter the following command:
oc apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-bearer-token-reader
  namespace: cpd-instance
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["prometheus-bearer-token"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-bearer-token-reader
  namespace: cpd-instance
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-bearer-token-reader
subjects:
  - kind: ServiceAccount
    name: prometheus-user-workload
    namespace: openshift-user-workload-monitoring
EOF
After you submit the command, you should see the following confirmation messages on the
screen:
role.rbac.authorization.k8s.io/prometheus-bearer-token-reader created
rolebinding.rbac.authorization.k8s.io/prometheus-bearer-token-reader created
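Two optional sanity checks can catch common problems in this part of the procedure. If the authorization request fails, jq -r '.token' prints the literal string null, which would then be stored in the secret. The sketch below uses a hypothetical helper named token_looks_valid to detect that case. The second usage comment relies on oc auth can-i with impersonation, which requires that your own user is allowed to impersonate service accounts.

```shell
# token_looks_valid TOKEN: fail for an empty value or for the literal string
# "null", which `jq -r '.token'` prints when the response has no token field.
token_looks_valid() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# Usage:
#   token_looks_valid "${TOKEN}" || echo "No token received; check CPD_URL and credentials." >&2
#
# Check that the Prometheus service account can read the secret
# (the expected answer is "yes"):
#   oc auth can-i get secret/prometheus-bearer-token -n "${DATAGATE_NAMESPACE}" \
#     --as=system:serviceaccount:openshift-user-workload-monitoring:prometheus-user-workload
```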
Creating a ServiceMonitor
The Prometheus operator needs a ServiceMonitor
element to discover services and obtain ("scrape") metrics.
- Enter the following commands to verify that the Data Gate for watsonx Db2z API service exists and to check its
labels:
oc get svc -n "${DATAGATE_NAMESPACE}" | grep data-gate-db2z-api
oc get svc "${DATAGATE_INSTANCE}-data-gate-db2z-api-svc" -n "${DATAGATE_NAMESPACE}" --show-labels
The output should be similar to the following:
dg-1877277353792780-data-gate-db2z-api-svc ClusterIP 172.30.151.196 <none> 8334/TCP
dg-1877277868942730-data-gate-db2z-api-svc ClusterIP 172.30.22.46 <none> 8334/TCP
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE LABELS
dg-1877277868942730-data-gate-db2z-api-svc ClusterIP 172.30.22.46 <none> 8334/TCP 38d app.kubernetes.io/component=dg,
app.kubernetes.io/instance=dg-0-1877277868942730,app.kubernetes.io/managed-by=dg-1877277868942730,
app.kubernetes.io/name=dg-1877277868942730,app=dg-1877277868942730,
component=dg,icpdsupport/addOnId=dg,icpdsupport/app=dg-instance-db2z-api,
icpdsupport/assemblyName=datagate,icpdsupport/ignore-on-nd-backup=true,
icpdsupport/module=dginstance,release=dg-0-1877277868942730,
velero.io/exclude-from-backup=true
- Create a ServiceMonitor with an HTTPS configuration, as shown in the following
example. Before you enter the command, replace <instance-id> with the ID
of the Data Gate for watsonx instance that you want to
monitor with Prometheus. In the example, the ID is 1781677861056642.
Note: A single ServiceMonitor element can serve multiple Data Gate for watsonx instances within the same OpenShift cluster. The following example shows a
configuration for a single instance only.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: datagate-1781677861056642-db2z-api-metrics
  namespace: cpd-instance
  labels:
    app.kubernetes.io/name: datagate
    app.kubernetes.io/instance: "1781677861056642"
    openshift.io/user-monitoring: "true"
spec:
  selector:
    matchLabels:
      app: dg-<instance-id>
      icpdsupport/app: dg-instance-db2z-api
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
      scrapeTimeout: 10s
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
      authorization:
        type: Bearer
        credentials:
          key: token
          name: prometheus-bearer-token
      metricRelabelings:
        - sourceLabels: []
          targetLabel: datagate_instance_id
          replacement: "1781677861056642"
Explanations:
- scheme: https
  The Data Gate for watsonx API uses HTTPS with a self-signed certificate.
- tlsConfig.insecureSkipVerify: true
  Certificate verification must be skipped because the certificate is self-signed.
- openshift.io/user-monitoring: "true"
  OpenShift user workload monitoring requires this setting to discover the ServiceMonitor.
- selector.matchLabels
  These labels select the services to scrape. The icpdsupport/app entry matches services that carry the label
  icpdsupport/app=dg-instance-db2z-api.
- authorization.credentials
  This entry references the secret you created in step 8.
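The ID substitution can be scripted so that the placeholder and the sample ID are replaced consistently. The following is a sketch under stated assumptions: the helper name render_servicemonitor and the file name servicemonitor.yaml are hypothetical, and the usage comment assumes that the instance name in DATAGATE_INSTANCE is the instance ID prefixed with dg-, as in the examples in this topic.

```shell
# render_servicemonitor ID: read a manifest on stdin and replace both the
# <instance-id> placeholder and the sample ID used in this topic with ID.
render_servicemonitor() {
  sed -e "s/<instance-id>/$1/g" -e "s/1781677861056642/$1/g"
}

# Usage (servicemonitor.yaml is a hypothetical file that holds the example
# manifest; ${DATAGATE_INSTANCE#dg-} strips the assumed "dg-" prefix):
#   render_servicemonitor "${DATAGATE_INSTANCE#dg-}" < servicemonitor.yaml \
#     | oc apply -f -
```

The example manifest sets the namespace to cpd-instance. If your namespace differs, change that value as well before you apply the manifest.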
The ServiceMonitor creation is confirmed by the following message:
servicemonitor.monitoring.coreos.com/datagate-1781677861056642-db2z-api-metrics created
where 1781677861056642 is replaced with the actual Data Gate for watsonx instance ID.
- Verify that the ServiceMonitor works in your Data Gate for watsonx environment (namespace). In the following commands, replace
1781677861056642 with the instance ID that was output as
part of the confirmation message in the previous step.
oc get servicemonitor -n "${DATAGATE_NAMESPACE}"
oc describe servicemonitor datagate-1781677861056642-db2z-api-metrics -n "${DATAGATE_NAMESPACE}"
You should see a screen output similar to this one:
Name: datagate-1781677861056642-db2z-api-metrics
Namespace: cpd-instance
Labels: app.kubernetes.io/instance=1781677861056642
        app.kubernetes.io/name=datagate
        openshift.io/user-monitoring=true
Annotations: <none>
API Version: monitoring.coreos.com/v1
Kind: ServiceMonitor
Metadata:
  Creation Timestamp: 2026-06-26T14:46:29Z
  Generation: 1
  Resource Version: 133491985
  UID: 0c9bb932-1e4e-4340-a0ec-14555c4e157b
Spec:
  Endpoints:
    Authorization:
      Credentials:
        Key: token
        Name: prometheus-bearer-token
      Type: Bearer
    Interval: 15s
    Metric Relabelings:
      Action: replace
      Replacement: 1781677861056642
      Source Labels:
      Target Label: datagate_instance_id
    Path: /metrics
    Port: http
    Scheme: https
    Scrape Timeout: 10s
    Tls Config:
      Insecure Skip Verify: true
  Selector:
    Match Labels:
      App: dg-1781677861056642
      icpdsupport/app: dg-instance-db2z-api
Events: <none>
where 1781677861056642 is replaced with the actual Data Gate for watsonx instance ID.
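After Prometheus starts scraping, the datagate_instance_id label that the ServiceMonitor adds through metricRelabelings can be used to select the series of one instance. The following sketch builds such a PromQL expression with a hypothetical helper named latency_expr. The threshold value is an arbitrary example, and the usage comment assumes that your cluster exposes the thanos-querier route in the openshift-monitoring namespace and that your user is allowed to query it.

```shell
# latency_expr ID THRESHOLD_MS: print a PromQL expression that selects the
# sync latency series of instance ID whose value exceeds THRESHOLD_MS.
latency_expr() {
  printf 'datagate_table_sync_latency_milliseconds{datagate_instance_id="%s"} > %s\n' "$1" "$2"
}

# Usage (route name and namespace are assumptions about your cluster):
#   host=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
#   curl -G -k -s -H "Authorization: Bearer $(oc whoami -t)" \
#     --data-urlencode "query=$(latency_expr 1781677861056642 5000)" \
#     "https://${host}/api/v1/query"
```

You can also paste the printed expression into the metrics query field of the OpenShift web console.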