Monitoring OS Process
The OS Process sensor is automatically deployed and installed after you install the Instana agent.
Configuration
Process Abnormal Termination
The Instana agent can automatically detect the abnormal termination (for example, a crash) of monitored processes, as well as the issuing and outcome of out of memory killer events sent to monitored processes on the host.
Requirements
- The detection of abnormal process termination and out of memory killer is only supported on Linux on the AMD64 architecture. A 4.8 or later Linux kernel is required or, in the case of RHEL, a Linux kernel 3.10.0-957 or later.
- debugfs must be mounted, which is the case for all Linux OSes supported by the Instana host agent, with the exception of Amazon Linux 1.
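The requirements above can be checked ahead of time with a small script. The following is a minimal sketch, not part of the Instana agent: the function names are hypothetical, and the parsing assumes kernel release strings of the usual form, such as 3.10.0-957.el7.x86_64.

```python
import re

def parse_kernel(release):
    """Split a release string such as '3.10.0-957.el7.x86_64' into
    (major, minor, patch, build); missing parts default to 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())

def kernel_supported(release, is_rhel=False):
    """Kernel 4.8 or later, or 3.10.0-957 or later on RHEL."""
    version = parse_kernel(release)
    if version[:2] >= (4, 8):
        return True
    return is_rhel and version >= (3, 10, 0, 957)

def debugfs_mounted(mounts_text):
    """True if any line of /proc/mounts-style text has file system type debugfs."""
    return any(
        len(fields) >= 3 and fields[2] == "debugfs"
        for fields in (line.split() for line in mounts_text.splitlines())
    )

def host_supported(machine, release, mounts_text, is_rhel=False):
    """Combine the architecture, kernel, and debugfs requirements."""
    return (
        machine in ("x86_64", "amd64")
        and kernel_supported(release, is_rhel)
        and debugfs_mounted(mounts_text)
    )
```

On a live host, the inputs could come from platform.machine(), platform.release(), and the contents of /proc/mounts.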
Anomalies detected
Abnormal Termination
- Exit with erroneous status codes, e.g., exit 1
- Kill of the process via kill, a.k.a. SIGKILL
- Segmentation faults
- Unhandled signals
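To make these categories concrete, the following sketch shows how they look from the point of view of a parent process on a POSIX system. It only illustrates exit statuses and signals; it is not how the Instana agent detects terminations. The helper names describe_exit and run_and_describe are hypothetical.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Classify a subprocess return code (POSIX convention:
    a negative value -N means the child was terminated by signal N)."""
    if returncode == 0:
        return "normal exit"
    if returncode > 0:
        return f"exit with erroneous status code {returncode}"
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        # Signal number without a symbolic name on this platform.
        name = f"signal number {-returncode}"
    return f"terminated by signal {name}"

def run_and_describe(code):
    """Run a Python snippet in a child process and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", code])
    return describe_exit(proc.returncode)
```

For example, run_and_describe("import sys; sys.exit(1)") reports an erroneous status code, while a child that sends itself SIGKILL with os.kill(os.getpid(), signal.SIGKILL) is reported as terminated by a signal.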
Out of memory killer
The out of memory killer event is an event sent by the operating system or a container runtime to a process, called the target process, that is consuming (as in: "has allocated") more memory than it is allowed to. The target process can then decide to terminate, or indicate which of its child processes should be terminated instead.
The use-case for selecting a child-process to terminate is, for example, to deal with leader-worker architectures like NGINX or PHP-FPM, where the leader process manages and delegates work to worker processes, which usually consume far more resources than the leader.
In Instana, out of memory killer events are documented by two dedicated events:
- The Out of memory event on a process represents the reception by that process of the out of memory killer event; the event documents which process was terminated as a result, to make it easier to understand what happened when the target process does not terminate itself.
- The Killed by out of memory killer event documents which process was terminated as the result of an Out of memory event. The Killed by out of memory killer event is accompanied by an Abnormal Termination event due to an uncaught SIGKILL signal that documents how the termination of the process occurred.
Depending on whether the target decided to terminate itself or one of its children, there are two possible scenarios in Instana:
- If the target process decides to terminate itself, you will see three events on the target process: the Out of memory event, the Killed by out of memory killer event, and the Abnormal Termination event.
- If the target process selects one of its children to be terminated, you will see the Out of memory event on the target process, and the Killed by out of memory killer and Abnormal Termination events on the child process that was selected.
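The two scenarios can be summarized as a mapping from processes to the events shown on them. This is an illustrative sketch only: the function expected_events and the process names are hypothetical, and the event names are taken from the text above.

```python
def expected_events(target, killed):
    """Return the events expected on each process.

    target: the process that received the out of memory killer event.
    killed: the process that was terminated (the target itself or a child).
    """
    events = {target: ["Out of memory"]}
    # The terminated process always gets both termination events.
    events.setdefault(killed, []).extend(
        ["Killed by out of memory killer", "Abnormal Termination"]
    )
    return events

# Scenario 1: the target terminates itself, so all three events are on it.
print(expected_events("leader", "leader"))

# Scenario 2: the target selects a child, so the events are split.
print(expected_events("leader", "worker-3"))
```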
As a side note: one may think that the Abnormal Termination and Killed by out of memory killer events are redundant. They are not: they explain two different aspects of the termination of a process. The Abnormal Termination event explains the how, and the Killed by out of memory killer event explains the why. Also, we find that it provides a better user experience when all Abnormal Termination events look the same, irrespective of whether they occur due to an out of memory killer event or otherwise.
Deactivation
The detection of abnormal process termination can be disabled with the following setting in the configuration.yaml file:
com.instana.plugin.ebpf:
  enabled: false
Custom Processes
By default, Instana automatically monitors the process metrics of processes covered by higher-level sensors such as Java or MySQL. To monitor an OS process that Instana does not cover automatically, configure it as follows:
com.instana.plugin.process:
  processes:
    - 'sshd'
    - 'slapd'
Voluntary and non-voluntary context switches
You can manually enable monitoring of context switches by editing the host's agent configuration file (/opt/instana/agent/etc/instana/configuration.yaml):
...
com.instana.plugin.process:
  ctx_switches_enabled: true
OS Process Environment Variables
Instana's process sensor automatically captures all the environment variables of any monitored process. Because environments often contain sensitive or secret data, the process sensor takes any configured secrets into account when filtering.
For more information about configuring secrets, see Agent Configuration secrets.
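As an illustration of what secret-aware filtering of environment variables can look like, here is a minimal sketch. It assumes a case-insensitive "contains" match on the variable name and a hypothetical list of secret words; the matching rules the agent actually applies are whatever is configured as described in Agent Configuration secrets.

```python
def redact_env(env, secret_words=("key", "password", "secret"), mask="<redacted>"):
    """Return a copy of env with the values of secret-looking variables masked.

    secret_words and mask are illustrative assumptions, not agent defaults.
    """
    redacted = {}
    for name, value in env.items():
        lowered = name.lower()
        if any(word in lowered for word in secret_words):
            redacted[name] = mask
        else:
            redacted[name] = value
    return redacted

sample = {"DB_PASSWORD": "hunter2", "PATH": "/usr/bin", "Api_Key": "abc123"}
print(redact_env(sample))
```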
You can also manually disable the monitoring of process environment variables by editing the host agent configuration file (/opt/instana/agent/etc/instana/configuration.yaml) as follows:
...
com.instana.plugin.process:
  env_vars_enabled: false
Metrics collection
To view the metrics, select Infrastructure in the sidebar of the Instana user interface and click a specific monitored host. The host dashboard shows all the collected metrics and monitored processes.
Configuration data
- PID
- Executable
- Started At
- User
- Group
- Max Open Files
- Arguments
Performance metrics
CPU usage
CPU usage values as a percentage, split into user and system. The values are displayed on a graph over a selected time period.
Data point: Filesystem
Granularity: 3 seconds
Normalized CPU usage
Normalized CPU usage values show the amount of CPU usage (in percentage) for executing user-mode and system-mode code of a process. The values are displayed on a graph over a selected time period.
Data point: Filesystem
Granularity: 3 seconds
Memory
Memory usage values in bytes: virtual, resident and share. The values are displayed on a graph over a selected time period.
Data point: Filesystem
Granularity: 3 seconds
Open Files
Open files values: used as a total number and current as a percentage. The values are displayed on a graph over a selected time period.
Open files current vs max is visible when these values are available on the operating system.
Data point: Filesystem
Granularity: 3 seconds
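The relationship between the number of open files and the maximum can be sketched as follows. This assumes that the percentage is the number of open file descriptors relative to Max Open Files, which the text above does not state explicitly; the function names are hypothetical, and counting entries in /proc/<pid>/fd is one Linux-specific way to obtain the current value, not necessarily the sensor's.

```python
import os

def count_open_files(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def open_files_percent(current, maximum):
    """Return current as a percentage of maximum, or None when the
    maximum is unknown or not a positive number."""
    if maximum is None or maximum <= 0:
        return None
    return 100.0 * current / maximum
```

For the current process, the maximum could be taken from the soft limit returned by resource.getrlimit(resource.RLIMIT_NOFILE).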
Number of context switches
Number of times the process was context-switched;
voluntary and nonvoluntary. The values
are displayed on a graph over a selected time period.
Data point: Filesystem
Granularity: 3 seconds
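On Linux, per-process context switch counters are exposed in /proc/<pid>/status as the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields. The sketch below reads them directly; it is meant to show where such numbers can come from, not to describe how the sensor collects them, and the function names are hypothetical.

```python
def parse_ctx_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in wanted:
            counts[name] = int(value.strip())
    return counts

def read_ctx_switches(pid="self"):
    """Read the counters for a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_ctx_switches(handle.read())
```

The counters are cumulative since process start, so a per-interval value is the difference between two consecutive readings.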
Health Signatures
For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.
For information about the built-in event for the OS process sensor, see the Built-in events reference.