Monitoring Kubernetes deployments

With the Databand Helm chart, you can enable various monitors for observability into your data.

The Databand Helm chart includes ServiceMonitor objects that you can use with prometheus-operator and Blackbox Exporter. You can also enable specific monitors to integrate tools, such as dbt and Next-Gen DataStage, and data warehouses, such as BigQuery and Snowflake.

Enabling Databand monitors

By default, integration and monitoring components are disabled. To connect to tracking systems and enable monitoring, you must provide either a username and password or a Databand access token. If you have not changed your credentials, the default username and password are both databand.

Use the following sample to enable all available Databand monitors:

## user-values.yaml

dbnd-monitors:
  enabled: true
  databand_access_token: ""
  dbnd_username: "databand"
  dbnd_password: "databand"
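
If you authenticate with an access token instead of a username and password, the same block might look like the following sketch. The placeholder <YOUR_DATABAND_ACCESS_TOKEN> is illustrative, and this sample assumes that the username and password keys can be left out when a token is supplied; confirm that against the chart's default values before you rely on it.

```yaml
## user-values.yaml (sketch: token-based authentication)

dbnd-monitors:
  enabled: true
  # Placeholder, replace with a token generated for your Databand instance.
  databand_access_token: "<YOUR_DATABAND_ACCESS_TOKEN>"
  # Assumption: dbnd_username and dbnd_password are not needed when a token is set.
```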

For more information about how to set up specific integrations, see Integrations.

For more information about changing your username and password, see the section about setting login credentials in Deploying self-hosted Databand with Kubernetes.

Monitoring with Blackbox Exporter

The Databand Helm chart also has a ServiceMonitor for Blackbox Exporter, which you can use to monitor Databand availability on an Ingress endpoint URL.

To enable the ServiceMonitor for Blackbox Exporter, set the following values in user-values.yaml, and use labels that match the ServiceMonitor selector of your prometheus-operator:

blackbox:
  enabled: true
  labels:
    <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>
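
For prometheus-operator to discover the ServiceMonitor, the labels that you set in the chart have to match the serviceMonitorSelector of your Prometheus resource. The following sketch shows what that side of the configuration can look like. The resource name is hypothetical, and your Prometheus resource is likely to contain more fields than are shown here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus                  # hypothetical name, use your own resource
spec:
  # Select ServiceMonitor objects that carry the same label as in user-values.yaml.
  serviceMonitorSelector:
    matchLabels:
      <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>
  # An empty namespace selector matches ServiceMonitor objects in all namespaces.
  serviceMonitorNamespaceSelector: {}
```

The same selector applies to every ServiceMonitor that the chart creates, so one label key and value can be reused across the Databand components.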

To list Databand web, tracking, and rule_engine metrics, access the following endpoints in the Databand UI:

  • <INGRESS_URL>/api/internal/v1/dbnd_tracking_metrics and /api/internal/v1/dbnd_application_metrics for web metrics
  • <INGRESS_URL>/api/internal/v1/dbnd_tracking_metrics for tracking metrics
  • <RULE_ENGINE_SVC_NAME>:<RULE_ENGINE_SVC_PORT> for rule_engine metrics

Most Databand metrics have a dbnd_ prefix in their metric name. Python runtime metrics have a flask_ prefix in their metric name.

Monitoring with prometheus-operator

To enable the ServiceMonitor for Databand components, set the following values in user-values.yaml, and use labels that match the ServiceMonitor selector of your prometheus-operator:

web:
  serviceMonitor:
    enabled: true
    labels:
      <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>

tracking:
  serviceMonitor:
    enabled: true
    labels:
      <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>

You can use ServiceMonitor for the following Databand components:

  • web
  • tracking
  • webapp
  • rule_engine
  • celery.flower
  • dbnd-monitors
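
The other components in this list can presumably be configured in the same way as web and tracking. The following sketch assumes that each component name maps directly to a top-level key in user-values.yaml and that celery.flower is a nested key. These key paths are assumptions based on the pattern of the earlier sample, so check them against the chart's default values.

```yaml
## user-values.yaml (sketch: key paths are assumed, not confirmed)

rule_engine:
  serviceMonitor:
    enabled: true
    labels:
      <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>

celery:
  flower:                           # assumed nesting for celery.flower
    serviceMonitor:
      enabled: true
      labels:
        <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>
```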

Examples of Prometheus alerting rules:

Databand endpoint response time:

- alert: ApiResponseTimeTooHighUpdateTaskRunAttempts
  expr: (rate(flask_http_request_duration_seconds_sum{status="200",path="/api/v1/tracking/update_task_run_attempts"}[60s])/rate(flask_http_request_duration_seconds_count{status="200",path="/api/v1/tracking/update_task_run_attempts"}[60s])) > 10 < +Inf
  for: 10s
  labels:
    severity: high
  annotations:
    summary: API /api/v1/tracking/update_task_run_attempts average response time is too high
    description: "Average response time for api /api/v1/tracking/update_task_run_attempts is above 10s for the last 10s\n VALUE = {{ $value }}s\n API = {{ $labels.path }}"

- alert: ApiResponseTimeTooHighInitRun
  expr: (rate(flask_http_request_duration_seconds_sum{status="200",path="/api/v1/tracking/init_run"}[60s])/rate(flask_http_request_duration_seconds_count{status="200",path="/api/v1/tracking/init_run"}[60s])) > 10 < +Inf
  for: 10s
  labels:
    severity: high
  annotations:
    summary: API /api/v1/tracking/init_run average response time is too high
    description: "Average response time for api /api/v1/tracking/init_run is above 10s for the last 10s\n VALUE = {{ $value }}s\n API = {{ $labels.path }}"

Databand access token expiration:

- alert: DatabandAccessTokenIsAboutToExpire
  expr: ((dbnd_auth_tokens - time()) / (3600 * 24)) <= 7 > 0 # 7 days
  for: 1m
  labels:
    severity: high
  annotations:
    summary: "Databand Access Token {{ $labels.label }} will expire in {{ humanize $value }} days"
    description: "Databand Access Token is about to expire"
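
The rules in this section are list entries, so they need to be placed in a rule group before Prometheus can load them. With prometheus-operator, one way to do that is a PrometheusRule resource. The following sketch wraps the access token rule. The resource name, namespace placeholder, and group name are hypothetical, and the labels have to match the ruleSelector of your Prometheus resource.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: databand-alerts             # hypothetical name
  namespace: <YOUR_NAMESPACE>       # placeholder
  labels:
    <YOUR_LABEL_KEY>: <YOUR_LABEL_VALUE>   # must match the ruleSelector of your Prometheus resource
spec:
  groups:
    - name: databand.rules          # hypothetical group name
      rules:
        # Fires when the token expires within 7 days but has not yet expired.
        - alert: DatabandAccessTokenIsAboutToExpire
          expr: ((dbnd_auth_tokens - time()) / (3600 * 24)) <= 7 > 0
          for: 1m
          labels:
            severity: high
          annotations:
            summary: "Databand Access Token {{ $labels.label }} will expire in {{ humanize $value }} days"
            description: "Databand Access Token is about to expire"
```

The response time rules can be added as further entries under the same rules list.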