Monitoring CRI-O
The CRI-O sensor is automatically deployed and installed after you install the Instana agent.
Introduction
Instana automatically discovers and monitors CRI-O containers to provide you with real-time insights into metadata (labels), metrics, and any supported technologies running within each discovered container.
Along with monitoring the health of each container and receiving alerts of any issues, you can also enable service discovery to leverage all of your container information.
Support information
To make sure that the CRI-O sensor is compatible with your current setup, check the following support information sections:
Supported versions and support policy
The following table shows the latest supported version and support policy:
| Technology | Support policy | Latest technology version | Latest supported version |
|---|---|---|---|
| CRI-O | 45 days | 1.36.1 | 1.36.1 |
For more information about the support policy, see Support strategy for sensors.
Prerequisites
For monitoring CRI-O, ensure that the following version requirements are met:
- Instana agent: 1.2.19 or later
- CRI-O sensor: 1.0.15 or later
Configuring CRI-O monitoring
The agent natively monitors CRI-O, and configuration is optional.
Configuring the polling rate
You can configure how often Instana polls CRI-O to collect data and metrics by using the poll_rate parameter in the agent configuration.yaml file as shown in the following example:
com.instana.plugin.crio:
  poll_rate: 1 # values are in seconds. Default value is 1 second.
Metrics collection
To view an overview of CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.
By default, CRI-O metrics are collected every 10 seconds. You can configure this interval in the agent configuration file <agent_install_dir>/etc/instana/configuration.yaml:
com.instana.plugin.crio:
  stats:
    interval: 10
On the CRI-O container dashboard, the configuration and performance metrics for the container are displayed.
Pause containers
The pause container is a container that holds the network namespace for the pod. Kubernetes creates pause containers to acquire the IP address of the respective pod and to set up the network namespace for all other containers that join that pod.
The Infra (pause) containers are excluded from infra monitoring by default for the following reasons:
- The number of monitored containers in an environment is doubled when they are included. The exclusion can lower Instana monitoring costs.
- Monitoring pause containers adds little information at the infrastructure level because they act as sidecar network helper containers.
Configuration data
| Configuration | Description |
|---|---|
| Id | The container ID |
| Name | The container name |
| Image | The CRI-O image name |
| IP | The container IP |
| Created | The container created timestamp |
| CRI-O Version | The CRI-O runtime version number |
Performance metrics
To collect the performance metrics, run the runc command.
The following table summarizes the CPU and memory usage metrics:
| Metric | Description | Data point |
|---|---|---|
| CPU Total % | Total percentage of CPU usage | Total key returned in the cpu.usage object |
| Memory usage | Total memory usage | Usage key returned in the memory.raw object |
| Memory usage % | Total memory usage as a percentage | Calculated from the memory.total and memory.usage objects |
Collecting CPU metrics
The CRI-O sensor runs the metrics collection and directs the cgroup CPU parsing to the agent. The parser reads CPU stat files from the respective cgroup directory paths.
After parsing, a CPU object that is populated with the CPU metrics is returned from the CRI-O sensor.
The following steps outline the process for CPU metrics collection and parsing in CRI-O:
- Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and the respective cgroup directory paths based on the OS process:
  - cgroup v2: /proc/1/root/sys/fs/cgroup and cgroupPath
  - cgroup v1: /proc/1/root/sys/fs/cgroup/cpu and cgroupPath
- Parsing CPU statistics for cgroup (v2 and v1) directories: The parser reads the CPU files from the respective cgroup directory paths.
  - For cgroup v2 directories: The following table lists the usage and throttling metrics that are populated from the cpu.stat file in the CPU object:

    | Metrics | Description | v2 | Source |
    |---|---|---|---|
    | Usage metrics | Total CPU usage (in nanoseconds) | total = usage_usec * 1000 | cpu.stat file |
    | Usage metrics | Per CPU usage | percpu = 0 (cgroup v2 does not provide per-CPU usage directly) | cpu.stat file |
    | Usage metrics | User CPU time | user = user_usec * 1000 (in nanoseconds). See the following note. | cpu.stat file |
    | Usage metrics | System CPU time | kernel = system_usec * 1000 (in nanoseconds). See the following note. | cpu.stat file |
    | Throttling metrics | Number of periods | periods = nr_periods | cpu.stat file |
    | Throttling metrics | Number of throttled periods | throttledPeriods = nr_throttled | cpu.stat file |
    | Throttling metrics | Throttled time | throttledTime = throttled_usec * 1000 (in nanoseconds) | cpu.stat file |

    Note: Both user_usec and system_usec must be available for parsing. If either is unavailable, the corresponding CPU time is set to 0.
  - For cgroup v1 directories: The following table lists the usage and throttling metrics for cgroup v1 in the CPU object:

    | Metrics | Description | v1 | Source |
    |---|---|---|---|
    | Usage metrics | Total CPU usage (in nanoseconds) | total | cpuacct.usage file |
    | Usage metrics | Per CPU usage | percpu | Sets a list of values from the cpuacct.usage_percpu file |
    | Usage metrics | User CPU time | user = user * 1000000 (ns) | cpuacct.stat file |
    | Usage metrics | System CPU time | kernel = system * 1000000 (ns) | cpuacct.stat file |
    | Throttling metrics | Number of periods | periods = nr_periods | cpu.stat file |
    | Throttling metrics | Number of throttled periods | throttledPeriods = nr_throttled | cpu.stat file |
    | Throttling metrics | Throttled time | throttledTime = throttled_time | cpu.stat file |
- Updating and returning CPU metrics: The following table provides a comprehensive mapping of CPU metrics to their corresponding source object:

  | CPU metrics | Source object |
  |---|---|
  | CPU usage metrics | cpu.usage object |
  | cpuTotalUsageNanoseconds | cpu.usage.total |
  | cpuSystemUsageNanoseconds | cpu.usage.kernel |
  | cpuUserUsageNanoseconds | cpu.usage.user |
  | cpuTotalUsage | cpuTotalUsageNanoseconds delta (total CPU usage time of the container over a time window) / system delta (total system CPU time available) |
  | cpuUserUsage | cpuUserUsageNanoseconds delta (CPU usage time of the container in user mode over a time window) / system delta (total system CPU time available) |
  | cpuSystemUsage | cpuSystemUsageNanoseconds delta (CPU usage time of the container in system or kernel mode over a time window) / system delta (total system CPU time available) |
  | Throttling metrics | cpu.throttling object |
  | throttlingCount | cpu.throttling.throttledPeriods |
  | throttlingTime | cpu.throttling.throttledTime |
- Displaying CPU metrics in the Instana UI from the backend: The following table summarizes the CPU metrics that are displayed in the Instana UI:

  | CPU metrics | Description | Source value |
  |---|---|---|
  | Total (cpu.total_usage) | Total CPU usage | cpuTotalUsage |
  | Kernel (cpu.system_usage) | System CPU usage | cpuSystemUsage |
  | User (cpu.user_usage) | User CPU usage | cpuUserUsage |
  | Throttling Count (cpu.throttling_count) | Number of throttled periods | throttlingCount |
  | Throttling Time (cpu.throttling_time) | Throttled time | throttlingTime |
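The cgroup v2 parsing and the usage calculation described in the preceding steps can be illustrated with a short Python sketch. This is not the sensor's implementation: the function names `parse_cpu_stat_v2` and `usage_ratio`, the dictionary layout, and the handling of a non-positive system delta are assumptions made for illustration. The input is assumed to be the text content of a cgroup v2 cpu.stat file.

```python
USEC_TO_NS = 1000  # cgroup v2 reports CPU times in microseconds


def parse_cpu_stat_v2(text):
    """Build a CPU-like dict from the text of a cgroup v2 cpu.stat file.

    Hypothetical helper: keys follow the field names used in the tables
    above; missing values default to 0.
    """
    raw = {}
    for line in text.splitlines():
        parts = line.split()
        # Each valid line has the form "<key> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            raw[parts[0]] = int(parts[1])

    return {
        "usage": {
            "total": raw.get("usage_usec", 0) * USEC_TO_NS,
            "percpu": 0,  # cgroup v2 does not provide per-CPU usage
            "user": raw.get("user_usec", 0) * USEC_TO_NS,
            "kernel": raw.get("system_usec", 0) * USEC_TO_NS,
        },
        "throttling": {
            "periods": raw.get("nr_periods", 0),
            "throttledPeriods": raw.get("nr_throttled", 0),
            "throttledTime": raw.get("throttled_usec", 0) * USEC_TO_NS,
        },
    }


def usage_ratio(previous_ns, current_ns, system_delta_ns):
    """Container CPU time delta divided by the system CPU time delta.

    Returning 0.0 for a non-positive system delta is an assumption.
    """
    if system_delta_ns <= 0:
        return 0.0
    return (current_ns - previous_ns) / system_delta_ns


SAMPLE_CPU_STAT = """\
usage_usec 2500
user_usec 1500
system_usec 1000
nr_periods 10
nr_throttled 2
throttled_usec 300
"""

cpu = parse_cpu_stat_v2(SAMPLE_CPU_STAT)
```

The same `usage_ratio` helper would apply to the user and kernel counters, because cpuUserUsage and cpuSystemUsage are described as the same kind of delta ratio.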
Collecting memory metrics
The CRI-O sensor runs the metrics collection and directs the cgroup memory parsing to the agent. The memory parser reads memory files from the respective cgroup directory paths. After parsing, a memory object that is populated with the memory metrics is returned from the CRI-O sensor.
The following steps outline the process for memory metrics collection and parsing in CRI-O:
- Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and the respective cgroup directory paths based on the OS process:
  - cgroup v2: /proc/1/root/sys/fs/cgroup and cgroupPath
  - cgroup v1: /proc/1/root/sys/fs/cgroup/memory and cgroupPath
- Parsing memory statistics for cgroup directories: The parser reads the memory files from the respective cgroup directory paths.
  - For cgroup v2 directories: The following table provides a comprehensive mapping of memory metrics to their respective source files:

    | Memory metrics | Source file |
    |---|---|
    | Base memory | Base memory files |
    | usage | memory.stat (file + anon) |
    | limit | memory.max |
    | max | memory.peak |
    | Swap memory | Swap files |
    | usage | memory.swap.current |
    | limit | memory.swap.max |
    | max | memory.swap.peak |
    | Raw memory field (memory.raw) | memory.stat file |
    | activeAnon | active_anon |
    | activeFile | active_file |
    | inactiveAnon | inactive_anon |
    | inactiveFile | inactive_file |
    | totalCache | total_cache |
    | totalRss | total_rss |
    | Memory fields (memory) | memory.stat file |
    | cache | file |
    | rss | anon |

    Note: For v1 compatibility, combined swap memory is calculated as base memory plus swap memory:
    - swap.usage = base memory usage + swap.usage
    - swap.limit = base memory limit + swap.limit
    - swap.max = 0
  - For cgroup v1 directories: The following table provides a comprehensive mapping of memory metrics to their respective source files:

    | Memory metrics | Source file |
    |---|---|
    | Base memory | Base memory files |
    | usage | memory.usage_in_bytes |
    | max | memory.max_usage_in_bytes |
    | limit | memory.limit_in_bytes |
    | failcnt | memory.failcnt |
    | Swap memory | Swap files |
    | usage | memory.memsw.usage_in_bytes |
    | max | memory.memsw.max_usage_in_bytes |
    | limit | memory.memsw.limit_in_bytes |
    | failcnt | memory.memsw.failcnt |
    | Kernel memory | Kmemory files |
    | usage | memory.kmem.usage_in_bytes |
    | max | memory.kmem.max_usage_in_bytes |
    | limit | memory.kmem.limit_in_bytes |
    | failcnt | memory.kmem.failcnt |
    | Kernel TCP memory | Kmemory TCP files |
    | usage | memory.kmem.tcp.usage_in_bytes |
    | max | memory.kmem.tcp.max_usage_in_bytes |
    | limit | memory.kmem.tcp.limit_in_bytes |
    | failcnt | memory.kmem.tcp.failcnt |
    | Raw memory fields | Similar fields as v2 from memory.stat |
    | Memory fields (memory) | memory.stat file |
    | cache | cache |
    | rss | total_rss |
- Updating and returning memory metrics: The following table maps memory metrics to their respective source objects:

  | Memory metric | Source object |
  |---|---|
  | Memory usage metrics | memory.usage object |
  | usage | memory.usage.usage |
  | maxUsage | memory.usage.max |
  | limit | memory.usage.limit |
  | Memory raw metrics | memory.raw object |
  | activeAnon | memory.raw.activeAnon |
  | activeFile | memory.raw.activeFile |
  | inactiveAnon | memory.raw.inactiveAnon |
  | inactiveFile | memory.raw.inactiveFile |
  | Memory metrics | memory object |
  | totalCache | memory.cache |
  | totalRss | memory.rss |
- Displaying memory metrics in the Instana UI from the backend: The following memory metrics are displayed in the Instana UI:

  | Metric | Source value |
  |---|---|
  | Memory Total RSS% (memory.total_rss_percent) | total_rss/limit |
  | Active anonymous (memory.active_anon) | activeAnon |
  | Active cache (memory.active_file) | activeFile |
  | Inactive anonymous (memory.inactive_anon) | inactiveAnon |
  | Inactive cache (memory.inactive_file) | inactiveFile |
  | Usage (memory.usage) | usage |
  | RSS (memory.total_rss) | totalRss |
  | Cache (memory.total_cache) | totalCache |
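As a rough illustration of the cgroup v2 mapping in the preceding steps, the following Python sketch builds a memory object from the text of memory.stat and memory.max and derives the total_rss/limit value. The function names (`parse_key_values`, `parse_limit`, `build_memory_v2`, `total_rss_ratio`) and the dictionary layout are hypothetical. Treating a memory.max value of `max` as "no limit" and returning 0.0 in that case are assumptions; the source does not state how the sensor handles an unlimited container or whether the ratio is scaled to a percentage.

```python
def parse_key_values(text):
    """Parse "<key> <integer>" lines, as found in memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def parse_limit(text):
    """Read memory.max; the literal "max" is treated as no limit (assumption)."""
    value = text.strip()
    return None if value == "max" else int(value)


def build_memory_v2(memory_stat_text, memory_max_text):
    """Map cgroup v2 memory files onto the fields named in the tables above."""
    stat = parse_key_values(memory_stat_text)
    anon = stat.get("anon", 0)
    file_cache = stat.get("file", 0)
    return {
        "usage": {
            "usage": file_cache + anon,  # base usage = file + anon
            "limit": parse_limit(memory_max_text),
        },
        "raw": {
            "activeAnon": stat.get("active_anon", 0),
            "activeFile": stat.get("active_file", 0),
            "inactiveAnon": stat.get("inactive_anon", 0),
            "inactiveFile": stat.get("inactive_file", 0),
        },
        "cache": file_cache,  # cache <- file
        "rss": anon,          # rss <- anon
    }


def total_rss_ratio(memory):
    """Compute total_rss / limit; 0.0 when no limit is set (assumption)."""
    limit = memory["usage"]["limit"]
    if not limit:
        return 0.0
    return memory["rss"] / limit


SAMPLE_MEMORY_STAT = """\
anon 300
file 100
active_anon 200
inactive_anon 100
active_file 60
inactive_file 40
"""

memory = build_memory_v2(SAMPLE_MEMORY_STAT, "1000\n")
unlimited = build_memory_v2(SAMPLE_MEMORY_STAT, "max\n")
```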
Collecting Block IO metrics
The CRI-O sensor runs the metrics collection and directs the cgroup blk-io parsing to the agent. The block IO parser reads blk-io stats files from the respective cgroup directory paths. After parsing, a Blkio object that is populated with the IO stats is returned from the CRI-O sensor.
The following steps outline the process for block IO metrics collection and parsing in CRI-O:
- Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and the respective cgroup directory paths based on the OS process:
  - cgroup v2: /proc/1/root/sys/fs/cgroup + cgroupPath
  - cgroup v1: /proc/1/root/sys/fs/cgroup/blkio + cgroupPath
- Parsing block IO statistics for cgroup directories: The parser reads the block IO files from the respective cgroup directory paths.
  - For cgroup v2 directories:
    - Reads the io.stat file and returns two lists with IO stat entries:
      - ioServiceBytesRecursive list: List of I/O service bytes
      - ioServicedRecursive list: List of I/O serviced
    - Maps each raw operation in the io.stat file to a canonical operation:
      - rbytes/rios: Read
      - wbytes/wios: Write

    Note: The rbytes and wbytes operations add the IO stat entry with its values to the ioServiceBytesRecursive list. The rios and wios operations add the IO stat entry with its values to the ioServicedRecursive list.
  - For cgroup v1 directories:
    - Parses the blkio.io_service_bytes file and directly sets the service bytes IO statistics (ioServiceBytesRecursive list).
    - Parses the blkio.io_serviced file and directly sets the serviced IO statistics (ioServicedRecursive list).
- Mapping major and minor device numbers to the respective fields in the IO stat objects.
- Returning a Blkio object with two lists of IO stat entries with major, minor, operation, and values to the agent:

  | List | Description |
  |---|---|
  | ioServiceBytesRecursive | List of I/O service bytes |
  | ioServicedRecursive | List of I/O serviced |
- Aggregating the totalRead and totalWrite values from the ioServiceBytesRecursive list:

  | List | Source |
  |---|---|
  | totalRead | ioServiceBytesRecursive.ioStat.value (Aggregates all read values) |
  | totalWrite | ioServiceBytesRecursive.ioStat.value (Aggregates all write values) |
- Displaying block IO metrics in the Instana UI from the backend: The following table outlines the block IO metrics that are displayed in the Instana UI:

  | Metric | Source value |
  |---|---|
  | Read (blkio.blk_read) | totalRead |
  | Write (blkio.blk_write) | totalWrite |
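The cgroup v2 flow in the preceding steps (parse io.stat, map operations, fill the two lists, aggregate the totals) might look like the following Python sketch. It is an illustration under stated assumptions, not the sensor's code: the function names `parse_io_stat` and `total_for`, and the entry keys `major`, `minor`, `op`, and `value`, are hypothetical. The input is assumed to follow the cgroup v2 io.stat layout of one `major:minor key=value ...` line per device.

```python
# Raw io.stat keys mapped to the canonical operation names from the list above.
BYTES_OPS = {"rbytes": "Read", "wbytes": "Write"}
COUNT_OPS = {"rios": "Read", "wios": "Write"}


def parse_io_stat(text):
    """Parse cgroup v2 io.stat text into the two IO stat lists."""
    service_bytes = []
    serviced = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or ":" not in fields[0]:
            continue
        major_text, minor_text = fields[0].split(":", 1)
        if not (major_text.isdigit() and minor_text.isdigit()):
            continue
        major, minor = int(major_text), int(minor_text)

        for field in fields[1:]:
            key, sep, value = field.partition("=")
            if not sep or not value.isdigit():
                continue
            if key in BYTES_OPS:
                target, op = service_bytes, BYTES_OPS[key]
            elif key in COUNT_OPS:
                target, op = serviced, COUNT_OPS[key]
            else:
                continue  # other keys (for example, discard counters) are skipped
            target.append(
                {"major": major, "minor": minor, "op": op, "value": int(value)}
            )

    return {
        "ioServiceBytesRecursive": service_bytes,
        "ioServicedRecursive": serviced,
    }


def total_for(entries, op):
    """Sum the values of all entries with the given canonical operation."""
    return sum(entry["value"] for entry in entries if entry["op"] == op)


SAMPLE_IO_STAT = """\
8:0 rbytes=1024 wbytes=2048 rios=3 wios=4 dbytes=0 dios=0
8:16 rbytes=100 wbytes=200 rios=1 wios=2
"""

blkio = parse_io_stat(SAMPLE_IO_STAT)
total_read = total_for(blkio["ioServiceBytesRecursive"], "Read")
total_write = total_for(blkio["ioServiceBytesRecursive"], "Write")
```

In this sketch, `total_read` and `total_write` correspond to the totalRead and totalWrite values that back the Read and Write metrics in the UI.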
Health signatures
For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.
For information about the built-in event for the CRI-O sensor, see the Built-in events reference.
Logs widget
The CRI-O sensor does not support log collection natively. Instana recommends that you follow the Collecting Kubernetes and Red Hat OpenShift logs instructions to deploy the OpenTelemetry Collector to collect logs from CRI-O containers. To correlate logs with specific CRI-O containers, add the container.runtime attribute as specified in the linked instructions.