Monitoring CRI-O

The CRI-O sensor is automatically deployed and installed after you install the Instana agent.

Introduction

Instana automatically discovers and monitors CRI-O containers to provide you with real-time insights into metadata (labels), metrics, and any supported technologies running within each discovered container.

Along with monitoring the health of each container and receiving alerts of any issues, you can also enable service discovery to leverage all of your container information.

Support information

To make sure that the CRI-O sensor is compatible with your current setup, check the following support information sections:

Supported versions and support policy

The following table shows the latest supported version and support policy:

Technology | Support policy | Latest technology version | Latest supported version
CRI-O | 45 days | 1.36.1 | 1.36.1

For more information about the support policy, see Support strategy for sensors.

Prerequisites

For monitoring CRI‑O, ensure that the following version requirements are met:

  • Instana agent: 1.2.19 or later
  • CRI‑O sensor: 1.0.15 or later

Configuring CRI-O monitoring

The agent natively monitors CRI-O, and configuration is optional.

Configuring the polling rate

Note: Instana CRI-O sensor 1.0.16 and later support configuring the polling rate to reduce data ingestion. This feature is supported on self-hosted Instana backend 311 and later.

You can configure how often Instana polls CRI-O to collect data and metrics by using the poll_rate parameter in the agent configuration.yaml file as shown in the following example:

com.instana.plugin.crio:
  poll_rate: 1 # values are in seconds. Default value is 1 second.

Metrics collection

To view an overview of CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.

By default, CRI-O metrics are collected every 10 seconds. You can change this interval in the agent configuration file <agent_install_dir>/etc/instana/configuration.yaml:

com.instana.plugin.crio:
  stats:
    interval: 10
 

On the CRI-O container dashboard, the configuration and performance metrics for the container are displayed.

Pause containers

The pause container holds the network namespace for the pod. Kubernetes creates pause containers to acquire the IP address of the respective pod and to set up the network namespace for all other containers that join that pod.

Infra (pause) containers are excluded from infrastructure monitoring by default for the following reasons:

  • Including them doubles the number of monitored containers in an environment. Excluding them can lower Instana monitoring costs.
  • Monitoring pause containers adds little information at the infrastructure level because they act as sidecar network helper containers.

Configuration data

Configuration | Description
Id | The container ID
Name | The container name
Image | The CRI-O image name
IP | The container IP
Created | The container creation timestamp
CRI-O Version | The CRI-O runtime version number

Performance metrics

To collect the performance metrics, run the runc command.

The following table summarizes the CPU and memory usage metrics:

Metric | Description | Data point
CPU Total % | Total percentage of CPU usage | Total key returned in the cpu.usage object
Memory usage | Total memory usage | Usage key returned in the memory.raw object
Memory usage % | Total memory usage as a percentage | Calculated from the memory.total and memory.usage objects

Collecting CPU metrics

The CRI-O sensor runs the metrics collection and directs the cgroup CPU parsing to the agent. The parser reads CPU stat files from the respective cgroup directory paths.

After parsing, a CPU object that is populated with the CPU metrics is returned from the CRI-O sensor.

The following steps outline the process for CPU metrics collection and parsing in CRI-O:

  1. Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and respective cgroup directory paths based on the OS process:

    • cgroup v2: /proc/1/root/sys/fs/cgroup and cgroupPath
    • cgroup v1: /proc/1/root/sys/fs/cgroup/cpu and cgroupPath
  2. Parsing CPU statistics for cgroup (v2 and v1) directories: The parser reads the CPU files from the respective cgroup directory paths.

    • For cgroup v2 directories:

      The following table lists the usage and throttling metrics that are populated from the cpu.stat file in the CPU object:

      Metrics | Description | v2 | Source
      Usage metrics | Total CPU usage (in nanoseconds) | total = usage_usec * 1000 | cpu.stat file
      Usage metrics | Per CPU usage | percpu = 0 (cgroup v2 does not provide per-CPU usage directly) | cpu.stat file
      Usage metrics | User CPU time | user = user_usec * 1000 (in nanoseconds; see the following note) | cpu.stat file
      Usage metrics | System CPU time | kernel = system_usec * 1000 (in nanoseconds; see the following note) | cpu.stat file
      Throttling metrics | Number of periods | periods = nr_periods | cpu.stat file
      Throttling metrics | Number of throttled periods | throttledPeriods = nr_throttled | cpu.stat file
      Throttling metrics | Throttled time | throttledTime = throttled_usec * 1000 (in nanoseconds) | cpu.stat file
      Note: Both user_usec and system_usec must be available for parsing. If either is unavailable, the corresponding CPU time is set to 0.
    • For cgroup v1 directories:

      The following table lists the usage and throttling metrics for cgroup v1 in the CPU object:

      Metrics | Description | v1 | Source
      Usage metrics | Total CPU usage (in nanoseconds) | total | cpuacct.usage file
      Usage metrics | Per CPU usage | percpu | Sets a list of values from the cpuacct.usage_percpu file
      Usage metrics | User CPU time | user = user * 1000000 (in nanoseconds) | cpuacct.stat file
      Usage metrics | System CPU time | kernel = system * 1000000 (in nanoseconds) | cpuacct.stat file
      Throttling metrics | Number of periods | periods = nr_periods | cpu.stat file
      Throttling metrics | Number of throttled periods | throttledPeriods = nr_throttled | cpu.stat file
      Throttling metrics | Throttled time | throttledTime = throttled_time | cpu.stat file
  3. Updating and returning CPU metrics:

    The following table provides a comprehensive mapping of CPU metrics to their corresponding source object:

    CPU metrics | Source object
    CPU usage metrics | cpu.usage object
    cpuTotalUsageNanoseconds | cpu.usage.total
    cpuSystemUsageNanoseconds | cpu.usage.kernel
    cpuUserUsageNanoseconds | cpu.usage.user
    cpuTotalUsage | cpuTotalUsageNanoseconds delta (total CPU usage time of the container over a time window) / system delta (total system CPU time available)
    cpuUserUsage | cpuUserUsageNanoseconds delta (CPU usage time of the container in user mode over a time window) / system delta (total system CPU time available)
    cpuSystemUsage | cpuSystemUsageNanoseconds delta (CPU usage time of the container in system or kernel mode over a time window) / system delta (total system CPU time available)
    Throttling metrics | cpu.throttling object
    throttlingCount | cpu.throttling.throttledPeriods
    throttlingTime | cpu.throttling.throttledTime
  4. Displaying CPU metrics in the Instana UI from the backend:

    The following table summarizes the CPU metrics that are displayed in the Instana UI:

    CPU metrics | Description | Source value
    Total (cpu.total_usage) | Total CPU usage | cpuTotalUsage
    Kernel (cpu.system_usage) | System CPU usage | cpuSystemUsage
    User (cpu.user_usage) | User CPU usage | cpuUserUsage
    Throttling Count (cpu.throttling_count) | Number of throttled periods | throttlingCount
    Throttling Time (cpu.throttling_time) | Throttled time | throttlingTime
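
The CPU steps above can be illustrated with a minimal Python sketch. It is not the agent's implementation: the names cgroup_dir, CpuStats, parse_cpu_stat_v2, and usage_ratio are hypothetical, the sketch covers only the cgroup v2 cpu.stat case, and the system delta is taken as an input because the text does not describe how it is measured.

```python
import posixpath
from dataclasses import dataclass

CGROUP_ROOT = "/proc/1/root/sys/fs/cgroup"


def cgroup_dir(version: int, controller: str, cgroup_path: str) -> str:
    """Step 1: build the directory to read from (root plus cgroupPath)."""
    # cgroup v1 keeps one hierarchy per controller (for example, "cpu").
    base = CGROUP_ROOT if version == 2 else posixpath.join(CGROUP_ROOT, controller)
    return posixpath.join(base, cgroup_path.lstrip("/"))


@dataclass
class CpuStats:
    total: int = 0              # nanoseconds
    user: int = 0               # nanoseconds
    kernel: int = 0             # nanoseconds
    periods: int = 0
    throttled_periods: int = 0
    throttled_time: int = 0     # nanoseconds


def parse_cpu_stat_v2(text: str) -> CpuStats:
    """Step 2: parse the contents of a cgroup v2 cpu.stat file."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    # Microsecond counters are converted to nanoseconds (* 1000).
    # A missing field leaves the corresponding value at 0.
    return CpuStats(
        total=fields.get("usage_usec", 0) * 1000,
        user=fields.get("user_usec", 0) * 1000,
        kernel=fields.get("system_usec", 0) * 1000,
        periods=fields.get("nr_periods", 0),
        throttled_periods=fields.get("nr_throttled", 0),
        throttled_time=fields.get("throttled_usec", 0) * 1000,
    )


def usage_ratio(prev_ns: int, curr_ns: int, system_delta_ns: int) -> float:
    """Step 3: container CPU delta divided by the system delta."""
    if system_delta_ns <= 0:
        return 0.0
    return max(curr_ns - prev_ns, 0) / system_delta_ns
```

In this sketch, two consecutive CpuStats samples feed usage_ratio: the total, user, and kernel fields of each sample produce values analogous to cpuTotalUsage, cpuUserUsage, and cpuSystemUsage.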

Collecting memory metrics

The CRI-O sensor runs the metrics collection and directs the cgroup memory parsing to the agent. The memory parser reads memory files from the respective cgroup directory paths. After parsing, a memory object that is populated with the memory metrics is returned from the CRI-O sensor.

The following steps outline the process for memory metrics collection and parsing in CRI-O:

  1. Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and respective cgroup directory paths based on the OS process:

    • cgroup v2: /proc/1/root/sys/fs/cgroup and cgroupPath
    • cgroup v1: /proc/1/root/sys/fs/cgroup/memory and cgroupPath
  2. Parsing memory statistics for cgroup directories: The parser reads the memory files from the respective cgroup directory paths.

    • For cgroup v2 directories:

      The following table provides a comprehensive mapping of memory metrics to their respective source files:

      Memory metrics | Source file
      Base memory | Base memory files
      usage | memory.stat (file + anon)
      limit | memory.max
      max | memory.peak
      Swap memory | Swap files
      usage | memory.swap.current
      limit | memory.swap.max
      max | memory.swap.peak
      Raw memory field (memory.raw) | memory.stat file
      activeAnon | active_anon
      activeFile | active_file
      inactiveAnon | inactive_anon
      inactiveFile | inactive_file
      totalCache | total_cache
      totalRss | total_rss
      Memory fields (memory) | memory.stat file
      cache | file
      rss | anon
      Note: For v1 compatibility, combined swap memory is calculated as base memory plus swap memory:

        • swap.usage = base memory usage + swap.usage
        • swap.limit = base memory limit + swap.limit
        • swap.max = 0
    • For cgroup v1 directories:

      The following table provides a comprehensive mapping of memory metrics to their respective source files:

      Memory metrics | Source file
      Base memory | Base memory files
      usage | memory.usage_in_bytes
      max | memory.max_usage_in_bytes
      limit | memory.limit_in_bytes
      failcnt | memory.failcnt
      Swap memory | Swap files
      usage | memory.memsw.usage_in_bytes
      max | memory.memsw.max_usage_in_bytes
      limit | memory.memsw.limit_in_bytes
      failcnt | memory.memsw.failcnt
      Kernel memory | Kmemory files
      usage | memory.kmem.usage_in_bytes
      max | memory.kmem.max_usage_in_bytes
      limit | memory.kmem.limit_in_bytes
      failcnt | memory.kmem.failcnt
      Kernel TCP memory | Kmemory TCP files
      usage | memory.kmem.tcp.usage_in_bytes
      max | memory.kmem.tcp.max_usage_in_bytes
      limit | memory.kmem.tcp.limit_in_bytes
      failcnt | memory.kmem.tcp.failcnt
      Raw memory fields | Similar fields as v2, from memory.stat
      Memory fields (memory) | memory.stat file
      cache | cache
      rss | total_rss
  3. Updating and returning memory metrics:

    The following table maps memory metrics to their respective source objects:

    Memory metric | Source object
    Memory usage metrics | memory.usage object
    usage | memory.usage.usage
    maxUsage | memory.usage.max
    limit | memory.usage.limit
    Memory raw metrics | memory.raw object
    activeAnon | memory.raw.activeAnon
    activeFile | memory.raw.activeFile
    inactiveAnon | memory.raw.inactiveAnon
    inactiveFile | memory.raw.inactiveFile
    Memory metrics | memory object
    totalCache | memory.cache
    totalRss | memory.rss
  4. Displaying memory metrics in the Instana UI from the backend:

    The following memory metrics are displayed in the Instana UI:

    Metric | Source value
    Memory Total RSS% (memory.total_rss_percent) | total_rss / limit
    Active anonymous (memory.active_anon) | activeAnon
    Active cache (memory.active_file) | activeFile
    Inactive anonymous (memory.inactive_anon) | inactiveAnon
    Inactive cache (memory.inactive_file) | inactiveFile
    Usage (memory.usage) | usage
    RSS (memory.total_rss) | totalRss
    Cache (memory.total_cache) | totalCache
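
As a rough illustration of the cgroup v2 memory mapping described above, the following Python sketch derives the base memory values from memory.stat and memory.max and applies the v1-compatible swap combination. The function names and the dictionary layout are hypothetical, and treating an unlimited cgroup (the literal value max in memory.max) as None is an assumption of this sketch, not documented sensor behavior.

```python
def parse_flat_keyed(text):
    """Parse 'key value' lines, as used by the cgroup v2 memory.stat file."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            out[parts[0]] = int(parts[1])
    return out


def parse_limit(text):
    """memory.max holds a byte count or the literal 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def memory_v2(stat_text, max_text):
    """Map memory.stat and memory.max contents to the fields in the tables."""
    stat = parse_flat_keyed(stat_text)
    cache = stat.get("file", 0)   # memory.cache comes from "file"
    rss = stat.get("anon", 0)     # memory.rss comes from "anon"
    return {
        "usage": cache + rss,     # base usage = file + anon
        "limit": parse_limit(max_text),
        "cache": cache,
        "rss": rss,
        "raw": {
            "activeAnon": stat.get("active_anon", 0),
            "activeFile": stat.get("active_file", 0),
            "inactiveAnon": stat.get("inactive_anon", 0),
            "inactiveFile": stat.get("inactive_file", 0),
        },
    }


def combined_swap(base_usage, base_limit, swap_usage, swap_limit):
    """v1-compatible swap values: base memory plus swap, with max fixed at 0."""
    # Assumption: if either limit is unlimited, the combined limit is too.
    if base_limit is None or swap_limit is None:
        limit = None
    else:
        limit = base_limit + swap_limit
    return {"usage": base_usage + swap_usage, "limit": limit, "max": 0}


def rss_ratio(rss, limit):
    """total_rss / limit; returns 0.0 when no limit is set."""
    if not limit:
        return 0.0
    return rss / limit
```

The ratio returned by rss_ratio corresponds to the total_rss / limit source value; any scaling to a percentage for display is left out of the sketch.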

Collecting Block IO metrics

The CRI-O sensor runs the metrics collection and directs the cgroup block IO parsing to the agent. The block IO parser reads the block IO statistics files from the respective cgroup directory paths. After parsing, a Blkio object that is populated with the IO statistics is returned from the CRI-O sensor.

The following steps outline the process for block IO metrics collection and parsing in CRI-O:

  1. Identifying the cgroup directory paths: The agent identifies the cgroup version (v1 or v2) and respective cgroup directory paths based on the OS process:

    • cgroup v2: /proc/1/root/sys/fs/cgroup + cgroupPath
    • cgroup v1: /proc/1/root/sys/fs/cgroup/blkio + cgroupPath
  2. Parsing block IO statistics for cgroup directories: The parser reads the block IO files from the respective cgroup directory paths.

    • For cgroup v2 directories:

      • Reads the io.stat file and returns two lists of IO stat entries:

        • ioServiceBytesRecursive list: List of I/O service bytes
        • ioServicedRecursive list: List of I/O serviced
      • Maps each raw operation in the io.stat file to a canonical operation:

        • rbytes/rios: Read
        • wbytes/wios: Write
      Note: The rbytes and wbytes operations add an IO stat entry with values to the ioServiceBytesRecursive list. The rios and wios operations add an IO stat entry with values to the ioServicedRecursive list.
    • For cgroup v1 directories:

      • Parses the blkio.io_service_bytes file and directly sets the service bytes IO statistics (ioServiceBytesRecursive list).
      • Parses the blkio.io_serviced file and directly sets the serviced IO statistics (ioServicedRecursive list).
  3. Mapping major and minor device numbers to respective fields in the IO stat objects.

  4. Returning a Blkio object with two lists of IO stat entries (each entry has major, minor, operation, and value fields) to the agent:

    List | Description
    ioServiceBytesRecursive | List of I/O service bytes
    ioServicedRecursive | List of I/O serviced
  5. Aggregating the totalRead and totalWrite values from the ioServiceBytesRecursive list:

    Value | Source
    totalRead | ioServiceBytesRecursive.ioStat.value (aggregates all read values)
    totalWrite | ioServiceBytesRecursive.ioStat.value (aggregates all write values)
  6. Displaying block IO metrics in the Instana UI from the backend: The following table outlines the block IO metrics that are displayed in the Instana UI:

    Metric | Source value
    Read (blkio.blk_read) | totalRead
    Write (blkio.blk_write) | totalWrite
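
The cgroup v2 block IO steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the sensor's code: IoStat, parse_io_stat_v2, and totals are hypothetical names, and the sketch assumes the standard io.stat line layout of a major:minor device number followed by key=value pairs.

```python
from dataclasses import dataclass


@dataclass
class IoStat:
    major: int
    minor: int
    op: str      # canonical operation: "Read" or "Write"
    value: int


# Raw io.stat keys mapped to canonical operations.
BYTES_OPS = {"rbytes": "Read", "wbytes": "Write"}
IOS_OPS = {"rios": "Read", "wios": "Write"}


def parse_io_stat_v2(text):
    """Return (ioServiceBytesRecursive, ioServicedRecursive) as two lists."""
    service_bytes, serviced = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or ":" not in fields[0]:
            continue
        major_s, minor_s = fields[0].split(":", 1)
        if not (major_s.isdigit() and minor_s.isdigit()):
            continue
        major, minor = int(major_s), int(minor_s)
        for item in fields[1:]:
            key, sep, raw = item.partition("=")
            if not sep or not raw.isdigit():
                continue
            if key in BYTES_OPS:
                service_bytes.append(IoStat(major, minor, BYTES_OPS[key], int(raw)))
            elif key in IOS_OPS:
                serviced.append(IoStat(major, minor, IOS_OPS[key], int(raw)))
    return service_bytes, serviced


def totals(service_bytes):
    """Aggregate totalRead and totalWrite across all devices."""
    total_read = sum(s.value for s in service_bytes if s.op == "Read")
    total_write = sum(s.value for s in service_bytes if s.op == "Write")
    return total_read, total_write
```

Keys other than rbytes, wbytes, rios, and wios (for example, discard counters) are ignored in this sketch because the text defines no canonical operation for them.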

Health signatures

For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in event for the CRI-O sensor, see the Built-in events reference.

Logs widget

The CRI-O sensor does not support log collection natively. Instana recommends that you follow the Collecting Kubernetes and Red Hat OpenShift logs instructions to deploy the OpenTelemetry Collector to collect logs from CRI-O containers. To correlate logs with specific CRI-O containers, add the container.runtime attribute as specified in the linked instructions.