Monitoring docker-compose
To monitor metrics in self-hosted Databand, you can use local components or external monitoring systems, such as New Relic and Datadog.
Local components
The Databand stack includes two optional components:
- Use `cadvisor` for Docker runtime and container metrics.
- Use `node_exporter` for metrics from the VM where the containers are running.
Databand supports Prometheus and OpenMetrics monitoring by default. You can use the bundled Prometheus, an external Prometheus, or another monitoring solution to scrape and store these metrics. The bundled Prometheus configuration also includes predefined targets for some Databand components, such as `webserver`, `tracking-server`, and `rule-engine`.
Metrics from these components are available in the bundled Prometheus by default after you deploy Databand with docker-compose.
To enable `cadvisor` and `node_exporter`:
- Open `custom.env` and add `mods/monitoring.yml` to the `COMPOSE_FILE` variable definition:
  ```
  COMPOSE_FILE=docker-compose.yml:mods/monitoring.yml
  ```
- If you have other components enabled through the `COMPOSE_FILE` variable, for example a bundled local PostgreSQL database, use `:` to separate them from the `mods/monitoring.yml` string:
  ```
  COMPOSE_FILE=docker-compose.yml:mods/local_pg.yml:mods/monitoring.yml
  ```
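If you manage `custom.env` with scripts, the colon-separated `COMPOSE_FILE` convention from the steps above can be sketched in Python. The helper `add_compose_mod` is hypothetical and not part of Databand. It assumes `custom.env` contains plain `KEY=VALUE` lines, and it appends a mod only if that mod is not already listed.

```python
"""Sketch: add a compose mod to the COMPOSE_FILE line of custom.env.

Hypothetical helper, not part of Databand. Assumes plain KEY=VALUE lines.
"""
from pathlib import Path


def add_compose_mod(env_text: str, mod: str) -> str:
    """Return env_text with `mod` appended to COMPOSE_FILE (colon-separated)."""
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("COMPOSE_FILE="):
            files = line[len("COMPOSE_FILE="):].split(":")
            if mod not in files:  # keep the operation idempotent
                files.append(mod)
            lines[i] = "COMPOSE_FILE=" + ":".join(files)
            break
    else:
        # No COMPOSE_FILE line yet: start from the base compose file.
        lines.append(f"COMPOSE_FILE=docker-compose.yml:{mod}")
    return "\n".join(lines) + "\n"


def update_env_file(path: Path, mod: str) -> None:
    """Rewrite the env file in place with the mod added."""
    path.write_text(add_compose_mod(path.read_text(), mod))
```

For example, applying it to `COMPOSE_FILE=docker-compose.yml:mods/local_pg.yml` with `mods/monitoring.yml` yields `COMPOSE_FILE=docker-compose.yml:mods/local_pg.yml:mods/monitoring.yml`, which matches the second step above.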
You can list the available `cadvisor` and `node_exporter` metrics by accessing the `/metrics` endpoint of the corresponding container.
To list `webserver`, `tracking-server`, and `rule-engine` metrics, access the following endpoints from the corresponding containers:
- `/api/internal/v1/dbnd_tracking_metrics` and `/api/internal/v1/dbnd_application_metrics` for `webserver`
- `/api/internal/v1/dbnd_tracking_metrics` for `tracking-server`
- `/` for `rule-engine`
Most Databand metric names start with the `dbnd_` prefix. Python runtime metric names start with the `flask_` prefix.
You can browse all of these metrics in the bundled Prometheus UI.
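The following Python sketch fetches one of these endpoints and groups the metric names by the prefixes described above. It assumes the endpoint is reachable from where you run the script and that it serves the Prometheus text exposition format. The host and port depend on your deployment and are not given here. The script name `list_metrics.py` is hypothetical.

```python
"""Sketch: list metric names from a Databand metrics endpoint, grouped by prefix.

Assumes the endpoint serves the Prometheus text exposition format.
"""
import re
import sys
import urllib.request
from collections import defaultdict

# Prometheus metric names match [a-zA-Z_:][a-zA-Z0-9_:]*
_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition: str) -> set[str]:
    """Extract sample names, skipping blank lines and # HELP / # TYPE comments."""
    names = set()
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _NAME_RE.match(line)  # name ends at '{' or whitespace
        if match:
            names.add(match.group(0))
    return names


def group_by_prefix(names, prefixes=("dbnd_", "flask_")) -> dict[str, list[str]]:
    """Bucket names by known prefix; anything else goes under 'other'."""
    groups = defaultdict(list)
    for name in sorted(names):
        prefix = next((p for p in prefixes if name.startswith(p)), "other")
        groups[prefix].append(name)
    return dict(groups)


def fetch(url: str, timeout: float = 5.0) -> str:
    """Download the raw metrics text from an endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def main(argv: list[str]) -> None:
    if len(argv) < 2:
        print("usage: python list_metrics.py <metrics-endpoint-url>")
        return
    for prefix, names in group_by_prefix(metric_names(fetch(argv[1]))).items():
        print(f"{prefix}: {len(names)} series")
        for name in names:
            print("   ", name)


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `python list_metrics.py <endpoint-url>`. The script reports sample names, so a histogram such as `flask_http_request_duration_seconds` shows up as its `_bucket`, `_sum`, and `_count` series. Prometheus queries refer to these series.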
The following example Prometheus alerting rules are based on these metrics:
```yaml
- alert: ContainerMemoryUsage
  expr: (sum(container_memory_working_set_bytes{name!=""}) by (instance, name) / sum(container_spec_memory_limit_bytes > 0) by (instance, name) * 100) > 80
  for: 2m
  labels:
    severity: high
  annotations:
    summary: Container Memory usage (instance {{ $labels.instance }})
    description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: ContainerHighThrottleRate
  expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
  for: 2m
  labels:
    severity: high
  annotations:
    summary: Container high throttle rate (instance {{ $labels.instance }})
    description: "Container is being throttled\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: ApiResponseTimeTooHightUpdateTaskRunAttempts
  expr: (rate(flask_http_request_duration_seconds_sum{status="200",path="/api/v1/tracking/update_task_run_attempts"}[60s])/rate(flask_http_request_duration_seconds_count{status="200",path="/api/v1/tracking/update_task_run_attempts"}[60s])) > 10 < +Inf
  for: 10s
  labels:
    severity: high
  annotations:
    summary: API /api/v1/tracking/update_task_run_attempts average response time is too high
    description: "Average response time for API /api/v1/tracking/update_task_run_attempts is above 10s for the last 10s\n VALUE = {{ $value }}s\n API = {{ $labels.path }}"
- alert: ApiResponseTimeTooHightInitRun
  expr: (rate(flask_http_request_duration_seconds_sum{status="200",path="/api/v1/tracking/init_run"}[60s])/rate(flask_http_request_duration_seconds_count{status="200",path="/api/v1/tracking/init_run"}[60s])) > 10 < +Inf
  for: 10s
  labels:
    severity: high
  annotations:
    summary: API /api/v1/tracking/init_run average response time is too high
    description: "Average response time for API /api/v1/tracking/init_run is above 10s for the last 10s\n VALUE = {{ $value }}s\n API = {{ $labels.path }}"
- alert: DatabandAccessTokenIsAboutToExpire
  expr: ((dbnd_auth_tokens - time()) / (3600 * 24)) <= 7 > 0 # 7 days
  for: 1m
  labels:
    severity: high
  annotations:
    summary: "Databand Access Token {{ $labels.label }} will expire in {{ humanize $value }} days"
    description: "Databand Access Token is about to expire"
```
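The following Python sketch reproduces the arithmetic in the response-time and token-expiry rules on plain numbers. It is a simplified model, not Prometheus. It makes these assumptions:
- `rate()` is approximated by the change between two samples, ignoring counter resets and extrapolation.
- Each rule's `for` duration is ignored.
- `dbnd_auth_tokens` reports a token's expiry time as a Unix timestamp in seconds, as the expression implies.

```python
"""Sketch of the arithmetic behind two of the alert expressions above.

Simplified model, not Prometheus: rate() is approximated by the change
between two samples, and `for` durations are ignored.
"""
import math

SECONDS_PER_DAY = 3600 * 24


def average_response_time(sum_before: float, sum_after: float,
                          count_before: float, count_after: float) -> float:
    """rate(_sum[w]) / rate(_count[w]): the window length cancels, leaving
    (increase of _sum) / (increase of _count), the mean request duration."""
    requests = count_after - count_before
    if requests == 0:
        return math.nan  # PromQL gives 0/0 = NaN, which "> 10" filters out
    return (sum_after - sum_before) / requests


def response_alert_fires(avg_seconds: float, threshold: float = 10.0) -> bool:
    """Mirror of "> 10 < +Inf": NaN and +Inf averages never fire."""
    return threshold < avg_seconds < math.inf


def days_until_expiry(expiry_ts: float, now_ts: float) -> float:
    """(dbnd_auth_tokens - time()) / (3600 * 24), assuming a Unix-seconds expiry."""
    return (expiry_ts - now_ts) / SECONDS_PER_DAY


def token_alert_fires(expiry_ts: float, now_ts: float,
                      threshold_days: float = 7.0) -> bool:
    """Mirror of "<= 7 > 0": expires within the threshold but not yet expired."""
    return 0 < days_until_expiry(expiry_ts, now_ts) <= threshold_days
```

For example, suppose `_sum` grows by 60 seconds while `_count` grows by 4 requests within the window. The average is then 15 seconds, and the response-time condition holds.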
Enabling New Relic monitoring
Follow these steps to enable New Relic monitoring of your Databand docker-compose deployment.
- Before you begin, you must download your JSON configuration file from New Relic. For more information, see the New Relic documentation.
- Copy the configuration file to `databand/config/webserver/newrelic.ini` in the deployment folder. This path is mapped under `/etc/config`, so the New Relic agent uses it as its configuration.
- Enable New Relic in `custom.env` by setting `NEW_RELIC_ENABLED=true`:
  ```
  ## custom.env
  NEW_RELIC_ENABLED=true
  ```
- Run `make up` to start Databand with the New Relic agent enabled.
Enabling Datadog
By default, Datadog logging is disabled when you start Databand with `make up`.
To enable Datadog logging:
- In `custom.env`, add the `datadog.yml` mod to the `COMPOSE_FILE` variable, set `DATADOG_ENABLED=true`, and override the following variables to match your setup:
  ```
  ## custom.env
  ## add datadog.yml mod to your COMPOSE_FILE variable
  COMPOSE_FILE=docker-compose.yml:./mods/datadog.yml
  DATADOG_ENABLED=true
  DATADOG_SITE=VALUE
  DATADOG_API_KEY=VALUE
  DATADOG_ENV=VALUE
  ```
- Run `make up` to launch Databand with the Datadog agent enabled.
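Before running `make up`, you might check `custom.env` for the Datadog settings described in the step above. The following Python sketch is a hypothetical helper, not part of Databand. It assumes plain `KEY=VALUE` lines and treats the literal `VALUE` from the example as an unfilled placeholder.

```python
"""Sketch: pre-flight check of custom.env before `make up` with Datadog.

Hypothetical helper, not part of Databand. Assumes plain KEY=VALUE lines
and treats the literal "VALUE" as an unfilled placeholder.
"""
import sys
from pathlib import Path

REQUIRED = ("DATADOG_SITE", "DATADOG_API_KEY", "DATADOG_ENV")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def datadog_problems(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks ready."""
    problems = []
    mods = env.get("COMPOSE_FILE", "").split(":")
    # Accept both "mods/datadog.yml" and "./mods/datadog.yml".
    if not any(m.endswith("mods/datadog.yml") for m in mods):
        problems.append("COMPOSE_FILE does not include mods/datadog.yml")
    if env.get("DATADOG_ENABLED", "").lower() != "true":
        problems.append("DATADOG_ENABLED is not set to true")
    for key in REQUIRED:
        if env.get(key, "") in ("", "VALUE"):
            problems.append(f"{key} is missing or still set to the placeholder")
    return problems


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "custom.env")
    if path.exists():
        for problem in datadog_problems(parse_env(path.read_text())):
            print("WARNING:", problem)
```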