Monitoring Kubernetes deployments
With the Databand Helm chart, you can enable various monitors for observability into your data.
The Databand Helm chart includes ServiceMonitor objects that you can use
with prometheus-operator and Blackbox Exporter. You can also enable specific monitors to integrate tools, such as dbt and
Next-Gen DataStage, and data warehouses, such as BigQuery and Snowflake.
Enabling Databand monitors
By default, integration and monitoring components are disabled. To connect to tracking systems
and enable monitoring, you must authenticate with either a username and password or a Databand
access token. If you haven't set a username and password, both default to
databand.
Use the following sample to enable all available Databand monitors:
## user-values.yaml
dbnd-monitors:
  enabled: true
  databand_access_token: ""
  dbnd_username: "databand"
  dbnd_password: "databand"
For more information about how to set up specific integrations, see Integrations.
For more information about changing your username and password, see the section about setting login credentials in Deploying self-hosted Databand with Kubernetes.
Monitoring with Blackbox Exporter
The Databand Helm chart also has a ServiceMonitor for Blackbox
Exporter, which you can use to monitor Databand availability on an Ingress endpoint URL.
To enable the ServiceMonitor for Blackbox Exporter, set the following values in
user-values.yaml, and set the labels that your prometheus-operator uses for
ServiceMonitor discovery:
blackbox:
  enabled: true
  labels:
    <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>
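Blackbox Exporter performs the actual probing. The following Python sketch only illustrates the kind of pass/fail check a basic HTTP availability probe makes against the Ingress URL: request the URL and treat a 2xx response within a timeout as "up". The function name `probe` and the 5-second timeout are illustrative assumptions, not part of the Databand chart or of Blackbox Exporter's configuration.

```python
from urllib.request import urlopen


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds.

    Hypothetical helper: approximates the pass/fail rule of a basic HTTP probe.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            # urlopen follows redirects; only the final status is checked here.
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError and HTTPError (raised for 4xx/5xx) are OSError subclasses,
        # as are connection failures and timeouts; ValueError covers malformed URLs.
        return False
```

For example, call `probe("https://<INGRESS_URL>/")` after substituting your Ingress host.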
To list Databand web, tracking, and
rule_engine metrics, query the following endpoints:
- <INGRESS_URL>/api/internal/v1/dbnd_tracking_metrics and /api/internal/v1/dbnd_application_metrics for web metrics
- <INGRESS_URL>/api/internal/v1/dbnd_tracking_metrics for tracking metrics
- <RULE_ENGINE_SVC_NAME>:<RULE_ENGINE_SVC_PORT> for rule_engine metrics
Most Databand metrics have a dbnd_ prefix in their metric name. Python runtime
metrics have a flask_ prefix in their metric name.
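Because the metric prefixes are consistent, you can filter a metrics page by prefix. The following Python sketch assumes that the endpoints return the Prometheus text exposition format (the format that ServiceMonitor scraping expects) and that the endpoint is reachable from where you run the script without extra authentication. The helper names `metric_names` and `fetch_metrics` are hypothetical.

```python
import re
from urllib.request import urlopen

# Metric names in the Prometheus text format match this pattern.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition: str, prefix: str = "") -> list:
    """Return the sorted, de-duplicated metric names that start with `prefix`."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = METRIC_NAME.match(line)
        if match and match.group(0).startswith(prefix):
            names.add(match.group(0))
    return sorted(names)


def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Download a metrics page as text (assumes no extra authentication is needed)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `metric_names(fetch_metrics("https://<INGRESS_URL>/api/internal/v1/dbnd_tracking_metrics"), "dbnd_")` lists the Databand metrics on the tracking endpoint, and passing `"flask_"` lists the Python runtime metrics instead.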
Monitoring with prometheus-operator
To enable ServiceMonitors for Databand components, set the following values in
user-values.yaml, and set the labels that your prometheus-operator uses for
ServiceMonitor discovery:
web:
  serviceMonitor:
    enabled: true
    labels:
      <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>
tracking:
  serviceMonitor:
    enabled: true
    labels:
      <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>
You can use ServiceMonitor for the following Databand components:
- web
- tracking
- webapp
- rule_engine
- celery.flower
- dbnd-monitors
Examples of Prometheus alerting rules for monitoring Databand:
Databand endpoint response time:
- alert: ApiResponseTimeTooHighUpdateTaskRunAttempts
  expr: (rate(flask_http_request_duration_seconds_sum{status="200",path="/api/v1/tracking/update_task_run_attempts"}[60s])/rate(flask_http_request_duration_seconds_count{status="200",path="/api/v1/tracking/update_task_run_attempts"}[60s])) > 10 < +Inf
  for: 10s
  labels:
    severity: high
  annotations:
    summary: API /api/v1/tracking/update_task_run_attempts average response time is too high
    description: "Average response time for API /api/v1/tracking/update_task_run_attempts is above 10s for the last 10s\n VALUE = {{ $value }}s\n API = {{ $labels.path }}"
- alert: ApiResponseTimeTooHighInitRun
  expr: (rate(flask_http_request_duration_seconds_sum{status="200",path="/api/v1/tracking/init_run"}[60s])/rate(flask_http_request_duration_seconds_count{status="200",path="/api/v1/tracking/init_run"}[60s])) > 10 < +Inf
  for: 10s
  labels:
    severity: high
  annotations:
    summary: API /api/v1/tracking/init_run average response time is too high
    description: "Average response time for API /api/v1/tracking/init_run is above 10s for the last 10s\n VALUE = {{ $value }}s\n API = {{ $labels.path }}"
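Each response-time expression divides the per-second rate of `flask_http_request_duration_seconds_sum` by the rate of `flask_http_request_duration_seconds_count` over the same 60s window. Because both rates are divided by the same window length, the ratio is the total request time divided by the number of requests, which is the average response time; the trailing `> 10 < +Inf` keeps only finite averages above 10 seconds. The following Python sketch reproduces that arithmetic from two hypothetical counter samples. It ignores counter resets and the extrapolation that PromQL's `rate()` performs, and the function names are illustrative.

```python
import math


def average_duration(sum_start: float, sum_end: float,
                     count_start: float, count_end: float) -> float:
    """Average request duration in seconds between two counter samples.

    Mirrors rate(..._sum[60s]) / rate(..._count[60s]): both rates divide by
    the same window, so only the counter increases matter.
    """
    delta_sum = sum_end - sum_start
    delta_count = count_end - count_start
    if delta_count == 0:
        # Like PromQL division: 0/0 is NaN, a nonzero value over 0 is +/-Inf.
        return math.nan if delta_sum == 0 else math.copysign(math.inf, delta_sum)
    return delta_sum / delta_count


def should_fire(avg: float, threshold: float = 10.0) -> bool:
    """Same filter as `> 10 < +Inf`; comparisons with NaN are False."""
    return threshold < avg < math.inf
```

For example, if the duration sum grows from 100 to 160 seconds while the request count grows from 10 to 14, the average is 60 / 4 = 15 seconds, so the condition holds. The `for: 10s` clause, which requires the condition to persist before the alert fires, is not modeled.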
Databand access token expiration:
- alert: DatabandAccessTokenIsAboutToExpire
  expr: ((dbnd_auth_tokens - time()) / (3600 * 24)) <= 7 > 0 # 7 days
  for: 1m
  labels:
    severity: high
  annotations:
    summary: "Databand Access Token {{ $labels.label }} will expire in {{ humanize $value }} days"
    description: "Databand Access Token is about to expire"
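The token expression reads `dbnd_auth_tokens` as each token's expiration time in Unix seconds: it subtracts the current time and divides by the number of seconds in a day, then keeps values between 0 (exclusive) and 7 days (inclusive). That interpretation of the metric is inferred from the expression. The following Python sketch applies the same arithmetic; the function names are hypothetical, and the `for: 1m` clause is not modeled.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 3600 * 24


def days_until_expiry(expires_at: float, now: Optional[float] = None) -> float:
    """(expires_at - now) / (3600 * 24), as in the alert expression."""
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def about_to_expire(expires_at: float, now: Optional[float] = None,
                    window_days: float = 7) -> bool:
    """Mirror `<= 7 > 0`: expires within the window but has not expired yet."""
    days = days_until_expiry(expires_at, now)
    return 0 < days <= window_days
```

A token that expires in 3 days matches, as does one that expires in exactly 7 days; a token that expires in 10 days, or one that has already expired, does not.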