Monitoring CRI-O
The CRI-O sensor is deployed and activated automatically when you install the Instana agent.
Introduction
Instana automatically discovers and monitors CRI-O containers to provide you with real-time insights into metadata (labels), metrics, and any supported technologies running within each discovered container.
Along with monitoring the health of each container and receiving alerts about any issues, you can also enable service discovery to leverage all of your container information.
Metrics collection
To view an overview of CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.
By default, CRI-O metrics are collected every 10 seconds. You can configure this interval in the agent configuration file `<agent_install_dir>/etc/instana/configuration.yml`:

```yaml
com.instana.plugin.crio:
  stats:
    interval: 10
```
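For example, to reduce collection overhead on busy hosts, you might raise the interval to 30 seconds. The plugin name and `interval` key are the ones shown in the snippet above; the value 30 is only illustrative:

```yaml
# Collect CRI-O metrics every 30 seconds instead of the default 10
com.instana.plugin.crio:
  stats:
    interval: 30
```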
On the CRI-O container dashboard, the configuration and performance metrics for the container are displayed.
Pause containers
The pause container is a container that holds the network namespace for the pod. Kubernetes creates pause containers to acquire the IP address of the respective pod and to set up the network namespace for all other containers that join that pod.
Infra (pause) containers are excluded from infrastructure monitoring by default for the following reasons:
- Including them doubles the number of monitored containers in an environment, so excluding them can lower Instana monitoring costs.
- Monitoring pause containers provides little information at the infrastructure level because they act only as sidecar network helper containers.
Configuration data
| Configuration | Description |
|---|---|
| ID | The container ID. |
| Name | The container name. |
| Image | The CRI-O image name. |
| IP | The container IP address. |
| Created | The timestamp when the container was created. |
Performance metrics
The performance metrics are collected by using the `runc` command.
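On a host, you can inspect the same kind of counters by running `runc events --stats <container-id>`, which prints a one-shot JSON document. The following sketch parses such a payload; the sample values and the exact field layout are illustrative (shaped after the keys cited on this page), not output captured from a real system:

```python
import json

# Sample payload shaped like the keys this page cites (cpu.usage.total,
# memory usage/limit). Real `runc events --stats` output contains more fields.
sample = '''
{
  "type": "stats",
  "data": {
    "cpu": {"usage": {"total": 3140000000, "kernel": 1200000000, "user": 1940000000}},
    "memory": {"usage": {"usage": 52428800, "limit": 268435456}}
  }
}
'''

stats = json.loads(sample)["data"]
cpu_total_ns = stats["cpu"]["usage"]["total"]  # cumulative CPU time, nanoseconds
mem_used = stats["memory"]["usage"]["usage"]   # current memory usage, bytes

print(cpu_total_ns, mem_used)
```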
CPU Total %
The total % of CPU usage. The current measured KPI value is displayed.
Data point: The value is collected from the `total` key returned in the `cpu.usage` object.
Memory usage
The total memory usage. The current measured KPI value is displayed.
Data point: The value is collected from the `usage` key returned in the `memory.raw` object.
Memory usage %
The total memory usage as a percentage. The current measured KPI value is displayed.
Data point: The value is calculated from the `memory.total` and `memory.usage` objects.
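As a sketch of how such a percentage can be derived from usage and limit counters in bytes (the function name and sample numbers are illustrative, not the sensor's actual code):

```python
def memory_usage_percent(usage_bytes: int, limit_bytes: int) -> float:
    """Return memory usage as a percentage of the configured limit."""
    if limit_bytes <= 0:
        raise ValueError("memory limit must be positive")
    return usage_bytes / limit_bytes * 100.0

# 128 MiB used out of a 512 MiB limit
print(memory_usage_percent(128 * 1024**2, 512 * 1024**2))  # -> 25.0
```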
CPU
The total, kernel, and user metrics are displayed on a graph over a selected time period.
Data points: The values are collected from the `total`, `kernel`, and `user` keys returned in the `cpu.usage` object.
Throttling count and time values are displayed on a graph over a selected time period.
Data points: The values are collected from the `throttling.throttledPeriods` and `throttling.throttledTime` keys returned in the `cpu_stats` object.
Memory
The usage, RSS, and cache metrics are displayed on a graph over a selected time period.
Data points: The values are collected from the `usage`, `max`, and `limit` keys returned in the `memory.usage` object.
Active anonymous, active cache, inactive anonymous, and inactive cache metrics are displayed on a graph over a selected time period.
Data points: The values are collected from the `active_anon`, `active_file`, `inactive_anon`, and `inactive_file` keys returned in the `memory.raw` object.
The Memory Total RSS % graph shows the percentage of total memory resident set size used for the set interval. The data for the graph is calculated by using memory limit and total memory resident set size.
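The Memory Total RSS % calculation has the same shape: total resident set size divided by the memory limit, evaluated per sample over the interval. The numbers below are illustrative, not values from a real container:

```python
# Samples of total RSS (bytes) over an interval, against a fixed memory limit.
limit = 256 * 1024**2
rss_samples = [64 * 1024**2, 96 * 1024**2, 128 * 1024**2]

# Percentage of the limit used at each sample point
rss_percent = [round(rss / limit * 100, 1) for rss in rss_samples]
print(rss_percent)  # -> [25.0, 37.5, 50.0]
```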
Block IO
The read and write values are displayed on a graph over a selected time period.
Data points: The values are collected from the `ioServiceBytesRecursive.op.Read` and `ioServiceBytesRecursive.op.Write` fields.
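The recursive counters are reported per device, so read and write totals are obtained by summing entries by operation. The sketch below assumes a list of entries in the common cgroup v1 shape (`major`, `minor`, `op`, `value`); the sample numbers are illustrative:

```python
# Aggregate per-device Block IO byte counters by operation
# (mirrors entries such as ioServiceBytesRecursive cited above).
entries = [
    {"major": 8, "minor": 0, "op": "Read", "value": 4096},
    {"major": 8, "minor": 0, "op": "Write", "value": 8192},
    {"major": 8, "minor": 16, "op": "Read", "value": 1024},
]

totals = {}
for entry in entries:
    totals[entry["op"]] = totals.get(entry["op"], 0) + entry["value"]

print(totals)  # -> {'Read': 5120, 'Write': 8192}
```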
Health signatures
For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.
For information about the built-in event for the CRI-O sensor, see the Built-in events reference.