Monitoring CRI-O

The CRI-O sensor is automatically deployed and installed after you install the Instana agent.

Introduction

Instana automatically discovers and monitors CRI-O containers to provide you with real-time insights into metadata (labels), metrics, and any supported technologies running within each discovered container.

In addition to monitoring the health of each container and receiving alerts about any issues, you can enable service discovery to leverage all of your container information.

Metrics collection

To see an overview of the CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.

By default, CRI-O metrics are collected every 10 seconds. You can configure this interval in the agent configuration file <agent_install_dir>/etc/instana/configuration.yml:

com.instana.plugin.crio:
  stats:
    interval: 10

The CRI-O container dashboard displays the configuration data and performance metrics for the container.

Pause containers

The pause container is a container that holds the network namespace for the pod. Kubernetes creates pause containers to acquire the IP address of the respective pod and to set up the network namespace for all other containers that join that pod.

Infra (pause) containers are excluded from infrastructure monitoring by default for the following reasons:

  • Including them doubles the number of monitored containers in an environment, so excluding them can lower Instana monitoring costs.
  • Monitoring pause containers provides little additional information at the infrastructure level because they act only as sidecar network helper containers.

Configuration data

Configuration   Description
Id              The container ID.
Name            The container name.
Image           The CRI-O image name.
IP              The container IP address.
Created         The timestamp when the container was created.

Performance metrics

The performance metrics are collected using the runc command.
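
The following Python sketch illustrates how such stats can be read with the runc events --stats subcommand, which prints a one-shot JSON document for a container. It is an illustration only, not the sensor's actual implementation, and the container ID is a placeholder.

import json
import subprocess

# Illustration only: query one-shot stats for a running container through runc.
# "my-container" is a placeholder container ID; depending on how CRI-O is set
# up, runc might also need its --root option to point at the runtime state.
raw = subprocess.check_output(["runc", "events", "--stats", "my-container"])
stats = json.loads(raw)["data"]

cpu_usage = stats["cpu"]["usage"]
print(cpu_usage["total"], cpu_usage["kernel"], cpu_usage["user"])  # cumulative CPU time in nanoseconds
print(sorted(stats["memory"]))  # memory sections, for example "usage" and "raw"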

CPU Total %

The total CPU usage as a percentage. The current measured KPI value is displayed.

Data point: the value is collected from the total key returned in the cpu.usage object.
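
Because the total key is a cumulative counter of CPU time in nanoseconds, a percentage is typically derived from the difference between two samples. The following sketch shows one common normalization (against all host CPUs over the collection interval); the exact formula that the sensor uses is not documented here, so treat it as an assumption.

import os

# Hypothetical samples of cpu.usage.total taken 10 seconds apart, in nanoseconds.
total_prev_ns = 1_200_000_000
total_now_ns = 3_700_000_000
elapsed_ns = 10 * 1_000_000_000

# Percentage of the combined capacity of all host CPUs over the interval.
cpu_total_pct = (total_now_ns - total_prev_ns) / (elapsed_ns * os.cpu_count()) * 100
print(f"CPU Total %: {cpu_total_pct:.2f}")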

Memory usage

The total memory usage. The current measured KPI value is displayed.

Data point: the value is collected from the usage key returned in the memory.raw object.

Memory usage %

The total memory usage as a percentage. The current measured KPI value is displayed.

Data point: the value is calculated from the memory.total and memory.usage objects.

CPU

The total, kernel, and user metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the total, kernel, and user keys returned in the cpu.usage object.

Throttling count and time values are displayed on a graph over a selected time period.

Data points: the values are collected from the throttling.throttledPeriods and throttling.throttledTime keys returned in the cpu_stats object.

Memory

The usage, RSS, and cache metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the usage, max, and limit keys returned in the memory.usage object.

Active anonymous, active cache, inactive anonymous, and inactive cache metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the active_anon, active_file, inactive_anon, and inactive_file keys returned in the memory.raw object.

The Memory Total RSS % graph shows the percentage of the total memory resident set size that is used for the set interval. The data for the graph is calculated from the memory limit and the total memory resident set size.
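
As a rough illustration of that calculation, assuming both values are reported in bytes:

# Hypothetical values in bytes.
memory_limit = 512 * 1024 * 1024  # configured memory limit
total_rss = 128 * 1024 * 1024     # total memory resident set size

memory_total_rss_pct = total_rss / memory_limit * 100
print(f"Memory Total RSS %: {memory_total_rss_pct:.1f}")  # 25.0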

Block IO

The read and write values are displayed on a graph over a selected time period.

Data points: the values are collected from the ioServiceBytesRecursive.op.Read and ioServiceBytesRecursive.op.Write fields.
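
In the runc stats output these values appear as a list of per-device entries, so container-wide totals are obtained by summing the entries for each operation. A minimal sketch, assuming the ioServiceBytesRecursive layout that runc reports:

# Hypothetical blkio section of a runc stats document.
blkio = {
    "ioServiceBytesRecursive": [
        {"major": 8, "minor": 0, "op": "Read", "value": 4096},
        {"major": 8, "minor": 0, "op": "Write", "value": 8192},
    ]
}

entries = blkio["ioServiceBytesRecursive"]
read_bytes = sum(e["value"] for e in entries if e["op"] == "Read")
write_bytes = sum(e["value"] for e in entries if e["op"] == "Write")
print(read_bytes, write_bytes)  # 4096 8192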

Health signatures

For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in event for the CRI-O sensor, see the Built-in events reference.