Monitoring Docker

Introduction

Instana automatically discovers and monitors Docker containers to provide you with real-time insights into metadata (labels), metrics, and any supported technologies running within each discovered container.

Along with monitoring the health of each container and receiving alerts about any issues, you can also enable service discovery to take advantage of all of your container information.

Installation

For detailed installation instructions, see Installing the agent on Docker.

The Instana agent requires Docker version 1.11 or higher.

Metrics collection

To view the metrics, select Infrastructure in the sidebar of the Instana user interface and click a specific monitored host. A host dashboard opens with all the collected metrics and monitored processes.

By default, Docker metrics are collected every 10 seconds. This interval can be configured within the agent configuration file <agent_install_dir>/etc/instana/configuration.yml:

com.instana.plugin.docker:
  stats:
    interval: 10

To see an overview of the CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.

On the Docker container dashboard, the configuration and performance metrics for the container are displayed.

To view detailed information about the running container, click Get Container Info. The information displayed is the same as the output of the docker inspect command.
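
For example, the same inspection data can be pulled programmatically. The following is a minimal sketch that assumes the Docker SDK for Python (the docker package) is installed and that the local Docker socket is reachable; the container name is a placeholder:

import docker

# Connect to the local Docker daemon (honours DOCKER_HOST, otherwise the default socket).
client = docker.from_env()

# "my-container" is a placeholder; substitute your container name or ID.
container = client.containers.get("my-container")

# container.attrs holds the same JSON document that docker inspect returns.
print(container.attrs["Config"]["Image"])            # image name
print(container.attrs["Created"])                    # created timestamp
print(container.attrs["HostConfig"]["NetworkMode"])  # network mode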

Configuration data

Image: The Docker image name.
Command: The command that is run in the container.
Created: The timestamp at which the container was created.
Started: The timestamp at which the container was started.
Id: The container ID.
Names: The container name.
Network Mode: The configured network settings for the container.
Storage Driver: The configured storage driver.
Docker version: The Docker version in use.
Container Labels: The labels applied to the container.
Ports: The container's port mappings.

Performance metrics

The performance metrics are retrieved through the Docker Engine API from the /containers/{id}/stats endpoint.
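
As an illustration, you can fetch the same payload yourself; a minimal sketch, again assuming the Docker SDK for Python and a placeholder container name:

import docker

client = docker.from_env()
container = client.containers.get("my-container")  # placeholder name

# One non-streaming sample of the /containers/{id}/stats payload.
stats = container.stats(stream=False)

# The metrics described below live under these top-level objects.
print(sorted(stats.keys()))  # blkio_stats, cpu_stats, memory_stats, networks, precpu_stats, ...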

CPU Total %

The total CPU usage as a percentage. The current measured KPI value is displayed.

Data point: the value is collected from the total_usage key returned in the cpu_stats object.
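
For reference, a common way to turn the raw counters into a percentage is to compare the change in total_usage with the change in system_cpu_usage between two consecutive samples. This is the approach used by the docker stats CLI and is shown here only as an approximation, since Instana's exact formula is not documented in this section:

def cpu_percent(stats):
    prev = stats.get("precpu_stats", {})
    if "system_cpu_usage" not in prev:
        return 0.0  # the first sample has no previous reading to diff against
    # Delta between the current sample (cpu_stats) and the previous one (precpu_stats).
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - prev["cpu_usage"]["total_usage"])
    system_delta = stats["cpu_stats"]["system_cpu_usage"] - prev["system_cpu_usage"]
    online_cpus = stats["cpu_stats"].get("online_cpus", 1)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * online_cpus * 100.0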

Memory usage %

The total memory usage as a percentage. The current measured KPI value is displayed.

Data point: the value is calculated as the quotient of the usage and limit keys returned in the memory_stats object.
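
A sketch of that calculation, using a stats sample like the one fetched above (shown as the plain quotient to match the description; some tools subtract the cache value from usage first):

def memory_percent(stats):
    mem = stats["memory_stats"]
    usage = mem.get("usage", 0)  # current memory usage in bytes
    limit = mem.get("limit", 0)  # memory limit in bytes
    return usage / limit * 100.0 if limit else 0.0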

CPU

The total, kernel, and user metrics, as well as their normalized values in the range [0, 100]%, are displayed on a graph over a selected time period.

Data points: the values are collected from the total_usage, usage_in_kernelmode, and usage_in_usermode keys returned in the cpu_stats object.

Throttling count and time values are displayed on a graph over a selected time period.

Data points: the values are collected from the periods and throttled_time keys returned in the cpu_stats object.
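
In the raw payload, the usage counters sit under cpu_stats.cpu_usage and the throttling counters under cpu_stats.throttling_data; a sketch of reading them from the stats sample above:

cpu = stats["cpu_stats"]

total_ns  = cpu["cpu_usage"]["total_usage"]          # total CPU time, in nanoseconds
kernel_ns = cpu["cpu_usage"]["usage_in_kernelmode"]  # kernel-mode CPU time
user_ns   = cpu["cpu_usage"]["usage_in_usermode"]    # user-mode CPU time

throttle = cpu["throttling_data"]
periods      = throttle["periods"]         # number of enforcement intervals seen
throttled_ns = throttle["throttled_time"]  # total time throttled, in nanoseconds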

Memory

The usage, RSS, and cache metrics, together with the memory usage, are displayed on a graph over a selected time period.

Data points: the values are collected from the usage, total_rss, and total_cache keys returned in the memory_stats object. The memory usage is a derived metric that is displayed as a percentage.

Active anonymous, active cache, inactive anonymous, and inactive cache metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the active_anon, active_file, inactive_anon, and inactive_file keys returned in the memory_stats object.
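
These counters are nested one level deeper, under memory_stats.stats. Note that the exact key names depend on the cgroup version of the host; the names below are the cgroup v1 ones:

detail = stats["memory_stats"]["stats"]

active_anon   = detail.get("active_anon", 0)    # active anonymous memory
active_file   = detail.get("active_file", 0)    # active page cache
inactive_anon = detail.get("inactive_anon", 0)  # inactive anonymous memory
inactive_file = detail.get("inactive_file", 0)  # inactive page cache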

Block IO

The read and write values are displayed on a graph over a selected time period.

Data points: the values are collected from the blkio.io_service_bytes key returned in the blkio_stats object.
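
In the API payload, the per-device entries appear as a list named io_service_bytes_recursive inside blkio_stats. A sketch of aggregating them into read and write totals from the stats sample above (the list can be empty or null on some hosts):

read_bytes = write_bytes = 0
for entry in stats["blkio_stats"].get("io_service_bytes_recursive") or []:
    op = entry["op"].lower()  # "read" / "write" (capitalised on cgroup v1 hosts)
    if op == "read":
        read_bytes += entry["value"]
    elif op == "write":
        write_bytes += entry["value"]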

Network

The network rx (received) and tx (transmitted) bytes, errors, packets, and dropped metrics are displayed on a graph over a selected time period.

Data points: all values are collected from keys returned in the network object: rx_dropped, rx_bytes, rx_errors, rx_packets, tx_dropped, tx_bytes, tx_errors, and tx_packets.
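
The counters are reported per network interface, so totals are typically summed across interfaces; a sketch using the stats sample above (current API versions expose the per-interface counters under the networks key):

interfaces = stats.get("networks", {}).values()

rx_bytes   = sum(i["rx_bytes"]   for i in interfaces)  # bytes received on all interfaces
tx_bytes   = sum(i["tx_bytes"]   for i in interfaces)  # bytes transmitted on all interfaces
rx_dropped = sum(i["rx_dropped"] for i in interfaces)  # dropped inbound packets
tx_errors  = sum(i["tx_errors"]  for i in interfaces)  # outbound transmission errors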

For detailed information about specific Docker runtime metrics, see the Docker documentation.

Health signature

For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in events for the Docker sensor, see the Built-in events reference.