Monitoring CRI-O

The CRI-O sensor is automatically deployed and installed after you install the Instana agent.

Introduction

Instana automatically discovers and monitors CRI-O containers to provide you with real-time insights into metadata (labels), metrics, and any supported technologies running within each discovered container.

Along with monitoring the health of each container and receiving alerts of any issues, you can also enable service discovery to leverage all of your container information.

Support information

To make sure that the CRI-O sensor is compatible with your current setup, check the following support information sections:

Supported versions and support policy

The following table shows the latest supported version and support policy:


Technology	Support policy	Latest technology version	Latest supported version
CRI-O	45 days	1.36.1	1.36.1

For more information about the support policy, see Support strategy for sensors.

Prerequisites

To monitor CRI-O, make sure that the following version requirements are met:

- Instana agent: 1.2.19 or later
- CRI-O sensor: 1.0.15 or later

Configuring CRI-O monitoring

The agent natively monitors CRI-O, and configuration is optional.

Configuring the polling rate

Note: Instana CRI-O sensor 1.0.16 and later support configuring the polling rate to reduce data ingestion. This feature is supported on self-hosted Instana backend 311 and later.

You can configure how often Instana polls CRI-O to collect data and metrics by using the poll_rate parameter in the agent configuration.yaml file as shown in the following example:

com.instana.plugin.crio:
  poll_rate: 1 # values are in seconds. Default value is 1 second.

Metrics collection

To view an overview of CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.

By default, CRI-O metrics are collected every 10 seconds. You can change this interval in the agent configuration file <agent_install_dir>/etc/instana/configuration.yaml:

com.instana.plugin.crio:
  stats:
    interval: 10

On the CRI-O container dashboard, the configuration and performance metrics for the container are displayed.

Pause containers

The pause container is a container that holds the network namespace for the pod. Kubernetes creates pause containers to acquire the IP address of the respective pod and to set up the network namespace for all other containers that join that pod.

The infrastructure (pause) containers are excluded from infrastructure monitoring by default for the following reasons:

- Including them doubles the number of monitored containers in an environment. Excluding them can lower Instana monitoring costs.
- Monitoring pause containers adds little information at the infrastructure level because they act as sidecar network helper containers.

Configuration data


Configuration	Description
Id	The container ID
Name	The container name
Image	The CRI-O image name
IP	The container IP
Created	The container created timestamp
CRI-O Version	The CRI-O runtime version number

Performance metrics

To collect the performance metrics, run the runc command.

The following table summarizes the CPU and memory usage metrics:


Metric	Description	Data point
CPU Total %	Total percentage of CPU usage	Total key returned in the `cpu.usage` object
Memory usage	Total memory usage	Usage key returned in the `memory.raw` object
Memory usage %	Total memory usage as a percentage	Calculated from the `memory.total` and `memory.usage` objects

Collecting CPU metrics

The CRI-O sensor runs the metrics collection and directs the cgroup CPU parsing to the agent. The parser reads the CPU stat files from the respective cgroup directory paths.

After parsing, a CPU object that is populated with the CPU metrics is returned from the CRI-O sensor.

The following steps outline the process for CPU metrics collection and parsing in CRI-O:

Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and respective cgroup directory paths based on the OS process:
- cgroup v2: /proc/1/root/sys/fs/cgroup and cgroupPath
- cgroup v1: /proc/1/root/sys/fs/cgroup/cpu and cgroupPath
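
The path resolution in this step can be sketched as follows. This is a minimal illustration, not the agent's implementation: the function names are hypothetical, and the version check (looking for a `cgroup.controllers` file at the root of the hierarchy, which a cgroup v2 mount exposes) is an assumption, because this document does not describe how the agent detects the version.

```python
from pathlib import Path

# Root taken from the paths listed above.
CGROUP_ROOT = Path("/proc/1/root/sys/fs/cgroup")


def detect_cgroup_version(root: Path = CGROUP_ROOT) -> int:
    """Return 2 when a unified (v2) hierarchy is mounted at root, else 1.

    Assumption: cgroup v2 exposes a cgroup.controllers file at the mount root.
    """
    return 2 if (root / "cgroup.controllers").is_file() else 1


def cpu_cgroup_dir(cgroup_path: str, version: int, root: Path = CGROUP_ROOT) -> Path:
    """Join the root (plus the cpu controller directory on v1) with cgroupPath."""
    base = root if version == 2 else root / "cpu"
    return base / cgroup_path.lstrip("/")
```

The same pattern applies to the other controllers: only the v1 subdirectory (`cpu`, `memory`, or `blkio`) changes.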

Parsing CPU statistics for cgroup (v2 and v1) directories: The parser reads the CPU files from the respective cgroup directory paths.

For cgroup v2 directories:

The following table lists the usage and throttling metrics that are populated from the cpu.stat file in the CPU object:


Metrics	Description	v2	Source
Usage metrics	Total CPU usage (in nanoseconds)	`total` = `usage_usec` * 1000	`cpu.stat` file
	Per CPU usage	`percpu` = 0 (cgroup v2 does not provide per-CPU usage directly)	`cpu.stat` file
	User CPU time	`user` = `user_usec` * 1000 (in nanoseconds). See the following note.	`cpu.stat` file
	System CPU time	`kernel` = `system_usec` * 1000 (in nanoseconds). See the following note.	`cpu.stat` file
Throttling metrics	Number of periods	`periods` = `nr_periods`	`cpu.stat` file
	Number of throttled periods	`throttledPeriods` = `nr_throttled`	`cpu.stat` file
	Throttled time	`throttledTime` = `throttled_usec` * 1000 (in nanoseconds)	`cpu.stat` file

Note: Both user_usec and system_usec must be available for parsing. If either is unavailable, the corresponding CPU time is set to 0.
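
As an illustration of the v2 mapping, the following sketch parses the text of a `cpu.stat` file into a dictionary shaped like the CPU object. The function name and the dictionary layout are hypothetical. The sketch also assumes one reading of the note: a missing `user_usec` or `system_usec` key results in 0 for the corresponding field.

```python
def parse_cpu_stat_v2(text: str) -> dict:
    """Map cgroup v2 cpu.stat content onto the CPU object fields in the table."""
    raw = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line is expected to be "<key> <unsigned integer>".
        if len(parts) == 2 and parts[1].isdigit():
            raw[parts[0]] = int(parts[1])

    def to_ns(key: str) -> int:
        # cpu.stat reports microseconds; the CPU object uses nanoseconds.
        return raw.get(key, 0) * 1000

    return {
        "usage": {
            "total": to_ns("usage_usec"),
            "percpu": 0,  # cgroup v2 does not provide per-CPU usage directly
            "user": to_ns("user_usec"),      # 0 when the key is missing
            "kernel": to_ns("system_usec"),  # 0 when the key is missing
        },
        "throttling": {
            "periods": raw.get("nr_periods", 0),
            "throttledPeriods": raw.get("nr_throttled", 0),
            "throttledTime": to_ns("throttled_usec"),
        },
    }
```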

For cgroup v1 directories:

The following table lists the usage and throttling metrics for cgroup v1 in the CPU object:


Metrics	Description	v1	Source
Usage metrics	Total CPU usage (in nanoseconds)	`total`	`cpuacct.usage` file
	Per CPU usage	`percpu` (list of values)	`cpuacct.usage_percpu` file
	User CPU time	`user` = `user` * 1000000 (in nanoseconds)	`cpuacct.stat` file
	System CPU time	`kernel` = `system` * 1000000 (in nanoseconds)	`cpuacct.stat` file
Throttling metrics	Number of periods	`periods` = `nr_periods`	`cpu.stat` file
	Number of throttled periods	`throttledPeriods` = `nr_throttled`	`cpu.stat` file
	Throttled time	`throttledTime` = `throttled_time`	`cpu.stat` file
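
The v1 mapping can be sketched in the same way. The function names and the dictionary layout are hypothetical, and the inputs are the text contents of the four files named in the table. The multiplier of 1000000 is taken from the table as-is.

```python
def parse_key_values(text: str) -> dict:
    """Parse "<key> <integer>" lines as found in cpuacct.stat and cpu.stat."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            out[parts[0]] = int(parts[1])
    return out


def build_cpu_v1(usage: str, usage_percpu: str, cpuacct_stat: str, cpu_stat: str) -> dict:
    """Build a CPU object from cgroup v1 file contents."""
    acct = parse_key_values(cpuacct_stat)
    throttle = parse_key_values(cpu_stat)
    return {
        "usage": {
            "total": int(usage.strip() or 0),                   # cpuacct.usage
            "percpu": [int(v) for v in usage_percpu.split()],   # cpuacct.usage_percpu
            "user": acct.get("user", 0) * 1_000_000,            # cpuacct.stat
            "kernel": acct.get("system", 0) * 1_000_000,        # cpuacct.stat
        },
        "throttling": {
            "periods": throttle.get("nr_periods", 0),
            "throttledPeriods": throttle.get("nr_throttled", 0),
            "throttledTime": throttle.get("throttled_time", 0),
        },
    }
```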

Updating and returning CPU metrics:

The following table provides a comprehensive mapping of CPU metrics to their corresponding source object:


CPU metrics	Source object
CPU usage metrics	`cpu.usage` object
`cpuTotalUsageNanoseconds`	`cpu.usage.total`
`cpuSystemUsageNanoseconds`	`cpu.usage.kernel`
`cpuUserUsageNanoseconds`	`cpu.usage.user`
`cpuTotalUsage`	`cpuTotalUsageNanoseconds` delta (total CPU usage time of the container over a time window) / `system delta` (total system CPU time available)
`cpuUserUsage`	`cpuUserUsageNanoseconds` delta (CPU usage time of the container in user mode over a time window) / `system delta` (total system CPU time available)
`cpuSystemUsage`	`cpuSystemUsageNanoseconds` delta (CPU usage time of the container in system or kernel mode over a time window) / `system delta` (total system CPU time available)
Throttling metrics	`cpu.throttling` object
`throttlingCount`	`cpu.throttling.throttledPeriods`
`throttlingTime`	`cpu.throttling.throttledTime`
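
The delta-based usage values in the table can be illustrated with two consecutive samples. This is a sketch only: the `CpuSample` type and its field names are hypothetical, and how the sensor obtains the total system CPU time is not described here, so it is passed in as a plain number.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    total_ns: int    # cpu.usage.total
    user_ns: int     # cpu.usage.user
    kernel_ns: int   # cpu.usage.kernel
    system_ns: int   # total system CPU time available (assumed input)


def usage_ratios(prev: CpuSample, curr: CpuSample) -> dict:
    """Divide each container delta by the system delta over the same window."""
    system_delta = curr.system_ns - prev.system_ns
    if system_delta <= 0:
        # No elapsed system time: report zero rather than divide by zero.
        return {"cpuTotalUsage": 0.0, "cpuUserUsage": 0.0, "cpuSystemUsage": 0.0}
    return {
        "cpuTotalUsage": (curr.total_ns - prev.total_ns) / system_delta,
        "cpuUserUsage": (curr.user_ns - prev.user_ns) / system_delta,
        "cpuSystemUsage": (curr.kernel_ns - prev.kernel_ns) / system_delta,
    }
```

For example, a container that accumulates 750 ns of CPU time while the system accumulates 1000 ns has a `cpuTotalUsage` of 0.75 for that window.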

Displaying CPU metrics in the Instana UI from the backend:

The following table summarizes the CPU metrics that are displayed in the Instana UI:


CPU metrics	Description	Source value
`Total` (`cpu.total_usage`)	Total CPU usage	`cpuTotalUsage`
`Kernel` (`cpu.system_usage`)	System CPU usage	`cpuSystemUsage`
`User` (`cpu.user_usage`)	User CPU usage	`cpuUserUsage`
`Throttling Count` (`cpu.throttling_count`)	Number of throttled periods	`throttlingCount`
`Throttling Time` (`cpu.throttling_time`)	Throttled time	`throttlingTime`

Collecting memory metrics

The CRI-O sensor runs the metrics collection and directs the cgroup memory parsing to the agent. The memory parser reads the memory files from the respective cgroup directory paths. After parsing, a memory object that is populated with the memory metrics is returned from the CRI-O sensor.

The following steps outline the process for memory metrics collection and parsing in CRI-O:

Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and respective cgroup directory paths based on the OS process:
- cgroup v2: /proc/1/root/sys/fs/cgroup and cgroupPath
- cgroup v1: /proc/1/root/sys/fs/cgroup/memory and cgroupPath

Parsing memory statistics for cgroup directories: The parser reads the memory files from the respective cgroup directory paths.

For cgroup v2 directories:

The following table provides a comprehensive mapping of memory metrics to their respective source files:


Memory metrics	Source file
Base memory	Base memory files
`usage`	`memory.stat` (`file` + `anon`)
`limit`	`memory.max`
`max`	`memory.peak`
Swap memory	Swap files
`usage`	`memory.swap.current`
`limit`	`memory.swap.max`
`max`	`memory.swap.peak`
Raw memory field (`memory.raw`)	`memory.stat` file
`activeAnon`	`active_anon`
`activeFile`	`active_file`
`inactiveAnon`	`inactive_anon`
`inactiveFile`	`inactive_file`
`totalCache`	`total_cache`
`totalRss`	`total_rss`
Memory fields (`memory`)	`memory.stat` file
`cache`	`file`
`rss`	`anon`

Note: For v1 compatibility, combined swap memory is calculated as base memory plus swap memory:
- swap.usage = base memory usage + swap.usage
- swap.limit = base memory limit + swap.limit
- swap.max = 0
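
The v2 mapping and the swap compatibility rule can be sketched as follows. The function names and the result layout are hypothetical, and `files` is a dictionary from file name to text content. One assumption goes beyond this document: cgroup v2 writes the literal `max` for "no limit", and the sketch represents that as `None` and treats the combined swap limit as unlimited when either part is unlimited.

```python
from typing import Optional


def parse_stat(text: str) -> dict:
    """Parse "<key> <integer>" lines from memory.stat."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            out[parts[0]] = int(parts[1])
    return out


def parse_limit(text: str) -> Optional[int]:
    """Return None for the literal "max" (no limit), else the byte value."""
    value = text.strip()
    return None if value == "max" else int(value)


def build_memory_v2(files: dict) -> dict:
    stat = parse_stat(files["memory.stat"])
    base = {
        "usage": stat.get("file", 0) + stat.get("anon", 0),
        "limit": parse_limit(files["memory.max"]),
        "max": int(files.get("memory.peak", "0").strip() or 0),
    }
    swap_usage = int(files.get("memory.swap.current", "0").strip() or 0)
    swap_limit = parse_limit(files.get("memory.swap.max", "max"))
    unlimited = base["limit"] is None or swap_limit is None
    swap = {
        # v1 compatibility: swap values are combined with base memory.
        "usage": base["usage"] + swap_usage,
        "limit": None if unlimited else base["limit"] + swap_limit,
        "max": 0,
    }
    return {"usage": base, "swap": swap,
            "cache": stat.get("file", 0), "rss": stat.get("anon", 0)}
```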

For cgroup v1 directories:

The following table provides a comprehensive mapping of memory metrics to their respective source files:


Memory metrics	Source file
Base memory	Base memory files
`usage`	`memory.usage_in_bytes`
`max`	`memory.max_usage_in_bytes`
`limit`	`memory.limit_in_bytes`
`failcnt`	`memory.failcnt`
Swap memory	Swap files
`usage`	`memory.memsw.usage_in_bytes`
`max`	`memory.memsw.max_usage_in_bytes`
`limit`	`memory.memsw.limit_in_bytes`
`failcnt`	`memory.memsw.failcnt`
Kernel memory	Kmemory files
`usage`	`memory.kmem.usage_in_bytes`
`max`	`memory.kmem.max_usage_in_bytes`
`limit`	`memory.kmem.limit_in_bytes`
`failcnt`	`memory.kmem.failcnt`
Kernel TCP memory	Kmemory TCP files
`usage`	`memory.kmem.tcp.usage_in_bytes`
`max`	`memory.kmem.tcp.max_usage_in_bytes`
`limit`	`memory.kmem.tcp.limit_in_bytes`
`failcnt`	`memory.kmem.tcp.failcnt`
Raw memory fields	Similar fields as for v2, read from the `memory.stat` file
Memory fields (`memory`)	`memory.stat` file
`cache`	`cache`
`rss`	`total_rss`
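
Because the v1 files follow a regular naming scheme (a prefix per memory group, a suffix per field), the table can be expressed as two small lookup maps. This is a sketch: the group keys (`usage`, `swap`, `kernel`, `kernelTcp`) and the function names are hypothetical, and `files` is a dictionary from file name to text content. Missing files are read as 0 in this sketch.

```python
# Group name (hypothetical key) -> file name prefix from the table.
GROUPS = {
    "usage": "memory",
    "swap": "memory.memsw",
    "kernel": "memory.kmem",
    "kernelTcp": "memory.kmem.tcp",
}
# Field name -> file name suffix from the table.
FIELDS = {
    "usage": "usage_in_bytes",
    "max": "max_usage_in_bytes",
    "limit": "limit_in_bytes",
    "failcnt": "failcnt",
}


def read_int(files: dict, name: str) -> int:
    return int(files.get(name, "0").strip() or 0)


def build_memory_v1(files: dict) -> dict:
    memory = {
        group: {field: read_int(files, f"{prefix}.{suffix}")
                for field, suffix in FIELDS.items()}
        for group, prefix in GROUPS.items()
    }
    stat = {}
    for line in files.get("memory.stat", "").splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stat[parts[0]] = int(parts[1])
    memory["cache"] = stat.get("cache", 0)    # memory.stat: cache
    memory["rss"] = stat.get("total_rss", 0)  # memory.stat: total_rss
    return memory
```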

Updating and returning memory metrics:

The following table maps memory metrics to their respective source objects:


Memory metric	Source object
Memory usage metrics	`memory.usage` object
`usage`	`memory.usage.usage`
`maxUsage`	`memory.usage.max`
`limit`	`memory.usage.limit`
Memory raw metrics	`memory.raw` object
`activeAnon`	`memory.raw.activeAnon`
`activeFile`	`memory.raw.activeFile`
`inactiveAnon`	`memory.raw.inactiveAnon`
`inactiveFile`	`memory.raw.inactiveFile`
Memory metrics	`memory` object
`totalCache`	`memory.cache`
`totalRss`	`memory.rss`

Displaying memory metrics in the Instana UI from the backend:

The following memory metrics are displayed in the Instana UI:


Metric	Source value
`Memory Total RSS%` (`memory.total_rss_percent`)	`total_rss`/`limit`
`Active anonymous` (`memory.active_anon`)	`activeAnon`
`Active cache` (`memory.active_file`)	`activeFile`
`Inactive anonymous` (`memory.inactive_anon`)	`inactiveAnon`
`Inactive cache` (`memory.inactive_file`)	`inactiveFile`
`Usage` (`memory.usage`)	`usage`
`RSS` (`memory.total_rss`)	`totalRss`
`Cache` (`memory.total_cache`)	`totalCache`

Collecting Block IO metrics

The CRI-O sensor runs the metrics collection and directs the cgroup block IO parsing to the agent. The block IO parser reads the block IO stat files from the respective cgroup directory paths. After parsing, a Blkio object that is populated with the IO stats is returned from the CRI-O sensor.

The following steps outline the process for block IO metrics collection and parsing in CRI-O:

Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and respective cgroup directory paths based on the OS process:
- cgroup v2: /proc/1/root/sys/fs/cgroup + cgroupPath
- cgroup v1: /proc/1/root/sys/fs/cgroup/blkio + cgroupPath
Parsing block IO statistics for cgroup directories: The parser reads the block IO files from the respective cgroup directory paths.
- For cgroup v2 directories:
  - Reads the io.stat file and returns two lists of IO stat entries:
    - ioServiceBytesRecursive list: List of I/O service bytes
    - ioServicedRecursive list: List of I/O serviced
  - Maps each raw operation in the io.stat file to a canonical operation:
    - rbytes/rios: Read
    - wbytes/wios: Write
  Note: The rbytes and wbytes operations add an IO stat entry with its value to the ioServiceBytesRecursive list. The rios and wios operations add an IO stat entry with its value to the ioServicedRecursive list.
- For cgroup v1 directories:
  - Parses the blkio.io_service_bytes file and directly sets the service bytes IO statistics (ioServiceBytesRecursive list).
  - Parses the blkio.io_serviced file and directly sets the serviced IO statistics (ioServicedRecursive list).
Mapping major and minor device numbers to respective fields in the IO stat objects.
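
The cgroup v2 parsing and device-number mapping can be sketched as follows, assuming the usual `io.stat` line layout of `<major>:<minor> key=value ...`. The function name and the entry field names (`major`, `minor`, `op`, `value`) are hypothetical stand-ins for the IO stat object.

```python
# Raw io.stat key -> (target list kind, canonical operation).
OPERATIONS = {
    "rbytes": ("bytes", "Read"),
    "wbytes": ("bytes", "Write"),
    "rios": ("ios", "Read"),
    "wios": ("ios", "Write"),
}


def parse_io_stat_v2(text: str) -> dict:
    service_bytes, serviced = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or ":" not in fields[0]:
            continue
        major, _, minor = fields[0].partition(":")
        if not (major.isdigit() and minor.isdigit()):
            continue
        for item in fields[1:]:
            key, _, value = item.partition("=")
            # Keys outside the mapping (for example dbytes, dios) are skipped.
            if key not in OPERATIONS or not value.isdigit():
                continue
            kind, op = OPERATIONS[key]
            entry = {"major": int(major), "minor": int(minor),
                     "op": op, "value": int(value)}
            (service_bytes if kind == "bytes" else serviced).append(entry)
    return {"ioServiceBytesRecursive": service_bytes,
            "ioServicedRecursive": serviced}
```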

Returning a Blkio object with two lists of IO stat entries (each with major, minor, operation, and value) to the agent:


List	Description
`ioServiceBytesRecursive`	List of I/O service bytes
`ioServicedRecursive`	List of I/O serviced

Aggregating the totalRead and totalWrite values from the ioServiceBytesRecursive list:


List	Source
`totalRead`	`ioServiceBytesRecursive.ioStat.value` (Aggregates all read values)
`totalWrite`	`ioServiceBytesRecursive.ioStat.value` (Aggregates all write values)
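
The aggregation step amounts to summing values by operation. In this sketch the function name and the entry layout (`op` and `value` keys) are hypothetical. Operations other than read and write, such as the `Total` rows found in cgroup v1 blkio files, are ignored.

```python
def aggregate_totals(io_service_bytes: list) -> dict:
    """Sum read and write byte counts across all devices."""
    totals = {"totalRead": 0, "totalWrite": 0}
    for entry in io_service_bytes:
        op = entry["op"].lower()
        if op == "read":
            totals["totalRead"] += entry["value"]
        elif op == "write":
            totals["totalWrite"] += entry["value"]
    return totals
```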

Displaying block IO metrics in the Instana UI from the backend: The following table outlines the block IO metrics that are displayed in the Instana UI:


Metric	Source value
Read (`blkio.blk_read`)	`totalRead`
Write (`blkio.blk_write`)	`totalWrite`

Health signatures

For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in event for the CRI-O sensor, see the Built-in events reference.

Logs widget

The CRI-O sensor does not support log collection natively. Instana recommends that you follow the Collecting Kubernetes and Red Hat OpenShift logs instructions to deploy the OpenTelemetry Collector to collect logs from CRI-O containers. To correlate logs with specific CRI-O containers, add the container.runtime attribute as specified in the linked instructions.