Monitoring CRI-O

The CRI-O sensor is automatically deployed and installed after you install the Instana agent.

Introduction

Instana automatically discovers and monitors CRI-O containers to provide you with real-time insights into metadata (labels), metrics, and any supported technologies running within each discovered container.

In addition to monitoring the health of each container and alerting you about any issues, Instana lets you enable service discovery to make use of all of your container information.

For CRI-O 1.28 and later, the default runtime is crun. Because Instana currently supports only runc, you need to override the default runtime from crun to runc.

Support information

To make sure that the CRI-O sensor is compatible with your current setup, check the following support information:

Supported versions and support policy

The following table shows the latest supported version and support policy:

Table 1. Latest supported version and support policy
Technology | Support policy | Latest version | Latest supported version
CRI-O      | 45 days        | 1.31.3         | 1.31.x

For more information about the support policy, see Support strategy for sensors.

Metrics collection

To view an overview of CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.

By default, CRI-O metrics are collected every 10 seconds. This interval can be configured within the agent configuration file <agent_install_dir>/etc/instana/configuration.yml:

com.instana.plugin.crio:
  stats:
    interval: 10  # collection interval in seconds (default: 10)

On the CRI-O container dashboard, the configuration and performance metrics for the container are displayed.

Pause containers

The pause container holds the network namespace for the pod. Kubernetes creates pause containers to acquire the IP address of the respective pod and to set up the network namespace for all other containers that join that pod.

Infrastructure (pause) containers are excluded from infrastructure monitoring by default for the following reasons:

  • Including them doubles the number of monitored containers in an environment. Excluding them can lower Instana monitoring costs.
  • Monitoring pause containers adds little information at the infrastructure level because they act as sidecar network helper containers.
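The effect of the exclusion on container counts can be illustrated with a minimal Python sketch. It assumes that pause containers can be recognized by "pause" in the image name, which is an assumption made for this example and not necessarily how the sensor identifies them. The Container type, names, and images are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    image: str


def is_pause(container: Container) -> bool:
    # Assumption: the last path segment of the image name contains "pause".
    return "pause" in container.image.rsplit("/", 1)[-1]


def monitored(containers: list[Container]) -> list[Container]:
    """Drop pause containers, mirroring the default exclusion."""
    return [c for c in containers if not is_pause(c)]


# Two hypothetical pods, each with one application container and one pause container.
containers = [
    Container("web", "docker.io/library/nginx:1.27"),
    Container("web-infra", "registry.k8s.io/pause:3.9"),
    Container("db", "docker.io/library/postgres:16"),
    Container("db-infra", "registry.k8s.io/pause:3.9"),
]

kept = monitored(containers)
print(len(containers), "containers discovered,", len(kept), "monitored")
```

With one application container per pod, as in this sample, including the pause containers would double the monitored count.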

Configuration data

Configuration | Description
Id            | The container ID
Name          | The container name
Image         | The CRI-O image name
IP            | The container IP
Created       | The container created timestamp
CRI-O Version | The CRI-O runtime version number

Performance metrics

The performance metrics are collected using the runc command.

CPU Total %

The total CPU usage as a percentage. The current measured KPI value is displayed.

Data point: value is collected from the total key returned in the cpu.usage object.
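This page does not state how the percentage is derived from the total key. As a minimal sketch, assume that total is a cumulative CPU-time counter in nanoseconds, as is common for cgroup CPU accounting. A percentage then follows from the difference between two samples divided by the elapsed time. The function name and parameters are hypothetical, and 100 % here means one fully used CPU core.

```python
def cpu_total_percent(prev_total_ns: int, curr_total_ns: int, elapsed_s: float) -> float:
    """Percentage of one CPU used between two samples of a cumulative counter.

    Assumption: both totals are cumulative CPU time in nanoseconds.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    used_ns = max(curr_total_ns - prev_total_ns, 0)  # guard against counter resets
    return used_ns / (elapsed_s * 1_000_000_000) * 100.0


# Two samples taken 10 seconds apart (the default collection interval).
print(cpu_total_percent(0, 5_000_000_000, 10.0))
```

In this sample, 5 seconds of CPU time were consumed over a 10-second interval, so half of one core was in use.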

Memory usage

The total memory usage. The current measured KPI value is displayed.

Data point: value is collected from the usage key returned in the memory.raw object.

Memory usage %

The total memory usage as a percentage. The current measured KPI value is displayed.

Data point: The value is calculated from the memory.total and memory.usage objects.
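The calculation can be sketched as a ratio of usage to total memory. The following Python example is an illustration under that assumption. The function name is hypothetical, and the handling of a missing or zero total is a choice made for this sketch, not documented sensor behavior.

```python
def memory_usage_percent(usage_bytes: int, total_bytes: int) -> float:
    """Memory usage as a percentage of total memory."""
    if total_bytes <= 0:
        # Assumption: report 0 when no usable total is available.
        return 0.0
    return usage_bytes / total_bytes * 100.0


# 256 MiB used out of 1 GiB.
print(memory_usage_percent(256 * 1024**2, 1024**3))
```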

CPU

The total, kernel, and user metrics are displayed on a graph over a selected time period.

Data points: values are collected from the total, kernel, and user keys returned in the cpu.usage object.

Throttling count and time values are displayed on a graph over a selected time period.

Data points: values are collected from the throttling.throttledPeriods and throttling.throttledTime keys returned in the cpu_stats object.

Memory

The usage, RSS, and cache metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the usage, max, and limit keys returned in the memory.usage object.

Active anonymous, active cache, inactive anonymous, and inactive cache metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the active_anon, active_file, inactive_anon, and inactive_file keys returned in the memory.raw object.
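To illustrate how these four keys map to the graph, the following Python sketch reads them from a JSON stats document. The payload shape (a memory object that contains a raw object) follows the key names in the text, but the sample values are hypothetical. Treating a missing key as 0 is an assumption made for this example.

```python
import json

# Hypothetical stats payload; only the keys named in the text are included.
SAMPLE = """
{"memory": {"raw": {"active_anon": 4096, "active_file": 8192,
                    "inactive_anon": 0, "inactive_file": 12288}}}
"""

BREAKDOWN_KEYS = ("active_anon", "active_file", "inactive_anon", "inactive_file")


def memory_breakdown(stats: dict) -> dict[str, int]:
    """Pick the four breakdown values out of memory.raw, defaulting to 0."""
    raw = stats.get("memory", {}).get("raw", {})
    return {key: int(raw.get(key, 0)) for key in BREAKDOWN_KEYS}


breakdown = memory_breakdown(json.loads(SAMPLE))
for key, value in breakdown.items():
    print(f"{key}: {value} bytes")
```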

The Memory Total RSS % graph shows the percentage of total memory resident set size used for the set interval. The data for the graph is calculated by using memory limit and total memory resident set size.

Block IO

The read and write values are displayed on a graph over a selected time period.

Data points: values are collected from the ioServiceBytesRecursive.op.Read and ioServiceBytesRecursive.op.Write fields.
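As a minimal sketch of how read and write totals could be derived, assume that ioServiceBytesRecursive is a list with one entry per device and operation, where each entry carries an op name and a byte value. That layout, including the major and minor device numbers, is an assumption for illustration. The sample entries are hypothetical.

```python
from collections import defaultdict

# Hypothetical entries, one per block device and operation.
entries = [
    {"major": 8, "minor": 0, "op": "Read", "value": 1_048_576},
    {"major": 8, "minor": 0, "op": "Write", "value": 524_288},
    {"major": 8, "minor": 16, "op": "Read", "value": 2_097_152},
]


def bytes_by_op(io_service_bytes: list[dict]) -> dict[str, int]:
    """Sum the byte counters per operation across all devices."""
    totals: dict[str, int] = defaultdict(int)
    for entry in io_service_bytes:
        totals[entry["op"]] += int(entry["value"])
    return dict(totals)


totals = bytes_by_op(entries)
print("read bytes:", totals.get("Read", 0))
print("write bytes:", totals.get("Write", 0))
```

Summing across devices yields one read value and one write value per container, which matches the two series shown on the graph.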

Health signatures

For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in event for the CRI-O sensor, see the Built-in events reference.