Monitoring OS Process
The OS Process sensor is automatically deployed and installed after you install the Instana agent.
Configuration
Process Abnormal Termination
The Instana agent can automatically detect the abnormal termination (e.g., crashes) of monitored processes, as well as the issuing and outcome of out of memory killer events sent to monitored processes on the host.
Requirements
- The detection of abnormal process termination and out of memory killer is only supported on Linux on the AMD64 architecture. A 4.8 or later Linux kernel is required or, in the case of RHEL, a Linux kernel 3.10.0-957 or later.
- debugfs must be mounted, which is the case for all Linux OSes supported by the Instana host agent, with the exception of Amazon Linux 1.
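As a rough pre-flight check, the kernel and debugfs requirements above can be approximated in a short Python sketch. The helpers `kernel_supported` and `debugfs_mounted` are hypothetical and not part of the Instana agent, and the `is_rhel` flag is an assumption about how a caller would identify a RHEL host. The architecture requirement is not checked here.

```python
import re


def kernel_supported(release: str, is_rhel: bool = False) -> bool:
    """Check a `uname -r` style string against the minimums stated above."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if match is None:
        return False
    # Missing patch or build components are treated as 0.
    major, minor, patch, build = (int(g) if g else 0 for g in match.groups())
    if (major, minor) >= (4, 8):
        return True
    # RHEL case: kernel 3.10.0-957 or later.
    return is_rhel and (major, minor, patch, build) >= (3, 10, 0, 957)


def debugfs_mounted(mounts_text: str) -> bool:
    """Look for a debugfs entry in text formatted like /proc/mounts."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # The third field of each /proc/mounts line is the file system type.
        if len(fields) >= 3 and fields[2] == "debugfs":
            return True
    return False
```

On a live Linux host, `platform.release()` and the contents of `/proc/mounts` would be the natural inputs to these two functions.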
Anomalies detected
Abnormal Termination
- Exit with erroneous status codes, e.g., exit 1
- Kill of the process via kill, a.k.a. SIGKILL
- Segmentation faults
- Unhandled signals
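To make the difference between an erroneous exit status and a signal-induced termination concrete, here is a minimal Python sketch for POSIX systems. It only shows how the operating system reports the two cases to a parent process; it does not show how the Instana agent detects them, and the `describe` helper is hypothetical.

```python
import signal
import subprocess
import sys

# Child that exits with an erroneous status code, like `exit 1`.
failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
exit_status = failed.returncode

# Child that is terminated by SIGKILL while it is sleeping.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
sleeper.send_signal(signal.SIGKILL)
sleeper.wait()
kill_status = sleeper.returncode


def describe(returncode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if returncode == 0:
        return "normal exit"
    if returncode > 0:
        return f"exit with status {returncode}"
    # On POSIX, subprocess reports death by signal N as return code -N.
    return f"terminated by {signal.Signals(-returncode).name}"


print(describe(exit_status))
print(describe(kill_status))
```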
Out of memory killer
The out of memory killer event is an event sent by the operating system or a container runtime to a process, called the target process, that is consuming (as in: "has allocated") more memory than it is allowed to. The target process can then decide to terminate, or indicate which of its child processes to have terminated instead.
The use-case for selecting a child-process to terminate is, for example, to deal with leader-worker architectures like NGINX or PHP-FPM, where the leader process manages and delegates work to worker processes, which usually consume far more resources than the leader.
In Instana, out of memory killer events are documented based on two dedicated events:
- The Out of memory event on a process represents the reception by that process of the out of memory killer event; the event documents which process was terminated as a result, to ease the understanding of what happened when the target process does not terminate itself.
- The Killed by out of memory killer event documents which process was terminated as the result of an Out of memory event. The Killed by out of memory killer event is accompanied by an Abnormal Termination event due to an uncaught SIGKILL signal that documents how the termination of the process occurred.
Depending on whether the target decided to terminate itself or one of its children, there are two possible scenarios in Instana:
- If the target process decides to terminate itself, you will see three events on the target process: the Out of memory event, the Killed by out of memory killer event, and the Abnormal Termination event.
- If the target process selected one of its children to be terminated, you will see the Out of memory event on the target process, and the Killed by out of memory killer and Abnormal Termination events on the child process that was selected.
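The two scenarios can be summarized as a small mapping from process to expected events. The following Python sketch is only an illustration of the rules described above: `expected_events` and the process names in the usage lines are hypothetical, and the function models the documentation, not the agent's internals.

```python
from collections import defaultdict
from typing import Dict, List


def expected_events(target: str, terminated: str) -> Dict[str, List[str]]:
    """Map each process to the events described for an out of memory kill.

    `target` receives the out of memory killer event; `terminated` is the
    process that ends up being killed (the target itself or one of its children).
    """
    events: Dict[str, List[str]] = defaultdict(list)
    events[target].append("Out of memory")
    events[terminated].append("Killed by out of memory killer")
    events[terminated].append("Abnormal Termination")
    return dict(events)


# Scenario 1: the target terminates itself, so all three events land on it.
print(expected_events("leader", "leader"))
# Scenario 2: the target selects a child, so the events are split.
print(expected_events("leader", "worker-3"))
```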
As a side note, one may think that the Abnormal Termination and Killed by out of memory killer events are redundant. That is not so: they explain two different aspects of the termination of a process. The Abnormal Termination event explains the how, and the Killed by out of memory killer event explains the why. Also, we find that it provides a better user experience to be able to find all the Abnormal Termination events looking the same, irrespective of whether they occur due to an out of memory killer event or otherwise.
Deactivation
The detection of abnormal process termination can be disabled with the following setting in the configuration.yaml file:
com.instana.plugin.ebpf:
  enabled: false
Custom Processes
Instana will automatically monitor process metrics of higher level sensors like Java or MySQL by default. Should you want to monitor an OS process which is not covered by Instana automatically, you can configure it like this:
com.instana.plugin.process:
  processes:
    - 'sshd'
    - 'slapd'
Voluntary and non-voluntary context switches
This functionality is supported only on Linux-based operating systems.
You can manually enable monitoring of context switches by editing the host's agent configuration file (/opt/instana/agent/etc/instana/configuration.yaml):
...
com.instana.plugin.process:
  ctx_switches_enabled: true
OS Process Environment Variables
Instana's process sensor automatically captures all the environment variables of any monitored process. Because environments often contain sensitive or secret data, the process sensor takes any configured secrets into account when filtering.
For more about configuring secrets, see Agent Configuration secrets.
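To illustrate the idea of filtering environment variables against a list of secret words, here is a minimal Python sketch. It assumes the NUL-separated layout of `/proc/<pid>/environ` on Linux; the matching rule (case-insensitive substring match on the variable name), the mask text, and the helper names `parse_environ` and `redact` are all assumptions made for illustration. The agent's actual matching behavior is governed by its secrets configuration.

```python
from typing import Dict, Iterable


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse NUL-separated KEY=VALUE entries, as found in /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty or malformed entries
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def redact(env: Dict[str, str], secret_words: Iterable[str],
           mask: str = "<redacted>") -> Dict[str, str]:
    """Mask values whose variable name contains a secret word (assumed rule)."""
    lowered = [word.lower() for word in secret_words]
    return {
        key: (mask if any(word in key.lower() for word in lowered) else value)
        for key, value in env.items()
    }


raw = b"PATH=/usr/bin\0DB_PASSWORD=hunter2\0"
print(redact(parse_environ(raw), ["password", "secret"]))
```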
You can also manually disable the monitoring of process environment variables by editing the host agent configuration file (/opt/instana/agent/etc/instana/configuration.yaml) as follows:
...
com.instana.plugin.process:
  env_vars_enabled: false
Windows Services
Instana supports monitoring of Windows services and their child processes. You can configure it like this:
...
com.instana.plugin.process:
  services:
    - 'WindowsService1'
    - 'WindowsService2'
Metrics collection
To view the metrics, select Infrastructure in the sidebar of the Instana user interface and click a specific monitored host. The host dashboard shows all the collected metrics and monitored processes.
Configuration data
- PID
- Executable
- Started At
- User
- Group
- Max Open Files
- Arguments
Performance metrics
CPU usage
CPU usage values as a percentage; user
and system
. The values are displayed on a graph over a selected time period.
Data point: Filesystem
Granularity: 3 seconds
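For readers who want to reproduce a comparable user and system CPU percentage by hand on Linux, the following Python sketch derives it from two samples of `/proc/<pid>/stat`. That the sensor uses this file is an assumption; the helper names `cpu_ticks` and `cpu_percent` are hypothetical.

```python
from typing import Tuple


def cpu_ticks(stat_text: str) -> Tuple[int, int]:
    """Return (utime, stime) in clock ticks from /proc/<pid>/stat text."""
    # The command name sits in parentheses and may contain spaces,
    # so split only the part after the last ')'.
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]), int(rest[12])


def cpu_percent(before: Tuple[int, int], after: Tuple[int, int],
                interval_s: float, ticks_per_s: int) -> Tuple[float, float]:
    """Convert two (utime, stime) samples into user and system percentages."""
    denominator = ticks_per_s * interval_s
    user = 100 * (after[0] - before[0]) / denominator
    system = 100 * (after[1] - before[1]) / denominator
    return user, system
```

On a live host, `os.sysconf("SC_CLK_TCK")` gives the ticks-per-second value, and an interval of 3 seconds would match the granularity stated above.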
Memory
Memory usage values in bytes: virtual, resident, and share. The values are displayed on a graph over a selected time period.
Data point: Filesystem
Granularity: 3 seconds
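On Linux, the first three fields of `/proc/<pid>/statm` (total size, resident, shared, all in pages) line up with the virtual, resident, and share values named above. The Python sketch below converts them to bytes; treating this file as the source of the sensor's values is an assumption, and `memory_bytes` is a hypothetical helper.

```python
from typing import Dict


def memory_bytes(statm_text: str, page_size: int) -> Dict[str, int]:
    """Convert the first three /proc/<pid>/statm fields from pages to bytes."""
    size, resident, shared = (int(field) for field in statm_text.split()[:3])
    return {
        "virtual": size * page_size,
        "resident": resident * page_size,
        "share": shared * page_size,
    }
```

The page size of the current host is available through `os.sysconf("SC_PAGE_SIZE")`.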
Open Files
Open files values: used as a total number and current as a percentage. The values are displayed on a graph over a selected time period.
Open files current vs max will be visible when they are available on the operating system.
Data point: Filesystem
Granularity: 3 seconds
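A comparable used-versus-maximum figure can be computed by hand on Linux from the "Max open files" row of `/proc/<pid>/limits`. The sketch below is illustrative only: the helper names are hypothetical, using the soft limit as the maximum is an assumption, and returning `None` stands in for the case where the limit is not available.

```python
from typing import Optional

PREFIX = "Max open files"


def soft_open_file_limit(limits_text: str) -> Optional[int]:
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(PREFIX):
            soft = line[len(PREFIX):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def open_files_percent(used: int, limit: Optional[int]) -> Optional[float]:
    """Return used open files as a percentage of the limit, if one is known."""
    if not limit:
        return None
    return 100 * used / limit
```

The number of used descriptors for a process could be obtained with `len(os.listdir(f"/proc/{pid}/fd"))`, given sufficient permissions.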
Number of context switches
Number of times the process was context-switched: voluntary and nonvoluntary. The values are displayed on a graph over a selected time period.
Data point: Filesystem
Granularity: 3 seconds
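On Linux, the kernel exposes both counters per process in `/proc/<pid>/status` as `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches`. The Python sketch below extracts them from that text; whether the sensor reads this exact file is an assumption, and `context_switches` is a hypothetical helper.

```python
from typing import Dict

COUNTERS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def context_switches(status_text: str) -> Dict[str, int]:
    """Extract the context switch counters from /proc/<pid>/status text."""
    counters: Dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in COUNTERS:
            # int() tolerates the tab and spaces around the number.
            counters[key] = int(value)
    return counters
```

Because the counters are cumulative, a per-interval rate would be the difference between two successive samples.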
Health Signatures
For each sensor, there is a curated knowledgebase of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.
For information about the built-in event for the OS process sensor, see the Built-in events reference.