Monitoring OS Process

The OS Process sensor is automatically deployed and installed after you install the Instana agent.

Configuration

Process Abnormal Termination

The Instana agent can automatically detect the abnormal termination (for example, a crash) of monitored processes, as well as the issuing and outcome of out of memory killer events sent to monitored processes on the host.

Requirements

  • The detection of abnormal process termination and of out of memory killer events is supported only on Linux on the AMD64 architecture. A Linux kernel 4.8 or later is required or, in the case of RHEL, a Linux kernel 3.10.0-957 or later.
  • debugfs must be mounted, which is the case for all Linux operating systems supported by the Instana host agent, with the exception of Amazon Linux 1.
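The requirements above can be checked by hand on a host before relying on this feature. The following is a minimal sketch, not part of the Instana agent; the function names are hypothetical, and whether a host is RHEL is left to the caller.

```python
import re


def parse_kernel_release(release):
    """Parse a release string such as '3.10.0-957.el7.x86_64' into (3, 10, 0, 957)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Missing components (for example the patch level in '4.8') count as 0.
    return tuple(int(part or 0) for part in match.groups())


def kernel_supported(release, is_rhel=False):
    """Apply the documented rule: 4.8 or later, or 3.10.0-957 or later on RHEL."""
    version = parse_kernel_release(release)
    if version[:2] >= (4, 8):
        return True
    return is_rhel and version >= (3, 10, 0, 957)


def debugfs_mounted(mounts_text):
    """Check the content of /proc/mounts; the third field is the filesystem type."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "debugfs":
            return True
    return False
```

On a live host, the inputs would typically come from platform.release() and from reading /proc/mounts.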

Anomalies detected

Abnormal Termination
  • Exit with erroneous status codes, e.g., exit 1
  • Kill of the process via SIGKILL, e.g., kill -9
  • Segmentation faults
  • Unhandled signals

Abnormal process terminations are displayed on the process's dashboard.
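To make the categories above concrete, the following sketch shows how such terminations look to a parent process on Linux. It only illustrates the operating system behavior and is not how the Instana agent detects terminations; the helper names are hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a human-readable outcome."""
    if returncode == 0:
        return "normal exit"
    if returncode > 0:
        return f"exit with status {returncode}"
    # On POSIX, subprocess reports death by signal N as the return code -N.
    return f"terminated by {signal.Signals(-returncode).name}"


def run_and_describe(child_code):
    """Run a short Python snippet in a child process and describe how it ended."""
    result = subprocess.run([sys.executable, "-c", child_code])
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Erroneous status code, equivalent to "exit 1".
    print(run_and_describe("import sys; sys.exit(1)"))
    # The child kills itself with SIGKILL, which cannot be caught or handled.
    print(run_and_describe("import os, signal; os.kill(os.getpid(), signal.SIGKILL)"))
```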

Out of memory killer

The out of memory killer event is an event sent by the operating system or a container runtime to a process, called the target process, that is consuming (as in: "has allocated") more memory than it is allowed to. The target process can then decide to terminate itself, or indicate which of its child processes is to be terminated instead.

Selecting a child process to terminate is useful, for example, in leader-worker architectures like NGINX or PHP-FPM, where the leader process manages and delegates work to worker processes, which usually consume far more resources than the leader.

In Instana, out of memory killer events are documented by two dedicated events:

  • The Out of memory event on a process represents the reception by that process of the out of memory killer event; the event documents which process was terminated as a result, to ease the understanding of what happened when the target process does not terminate itself.
  • The Killed by out of memory killer event documents which process was terminated as the result of an Out of memory event. The Killed by out of memory killer event is accompanied by an Abnormal Termination event due to an uncaught SIGKILL signal that documents how the termination of the process occurred.

Depending on whether the target decided to terminate itself or one of its children, there are two possible scenarios in Instana:

  • If the target process decides to terminate itself, you will see three events on the target process: the Out of memory event, the Killed by out of memory killer event, and the Abnormal Termination event.

    Target process that decided to terminate itself after receiving an out of memory killer event.

  • If the target process selected one of its children to be terminated, you will see the Out of memory event on the target process, and the Killed by out of memory killer and Abnormal Termination events on the child process that was selected.
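The two scenarios above can be summarized as a mapping from each process to the events shown on it. This is only an illustrative model of the text; the function name and the process labels are hypothetical.

```python
OUT_OF_MEMORY = "Out of memory"
KILLED_BY_OOM = "Killed by out of memory killer"
ABNORMAL_TERMINATION = "Abnormal Termination"


def expected_events(target, terminated):
    """Map each process to the events described above.

    target is the process that received the out of memory killer event;
    terminated is the process that was terminated as a result (the target
    itself or one of its children).
    """
    events = {target: [OUT_OF_MEMORY]}
    # If the target terminated itself, all three events land on one process.
    events.setdefault(terminated, []).extend([KILLED_BY_OOM, ABNORMAL_TERMINATION])
    return events
```

For example, expected_events("leader", "leader") yields all three events on the leader, while expected_events("leader", "worker") splits them between the two processes.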

As a side note, one may think that the Abnormal Termination and Killed by out of memory killer events are redundant. They are not: they explain two different aspects of the termination of a process. The Abnormal Termination event explains the how, and the Killed by out of memory killer event explains the why. It also provides a better user experience when all Abnormal Termination events look the same, irrespective of whether they occur due to an out of memory killer event or otherwise.

Deactivation

The detection of abnormal process termination can be disabled with the following setting in the configuration.yaml file:

com.instana.plugin.ebpf:
  enabled: false

Custom Processes

Instana will automatically monitor process metrics of higher level sensors like Java or MySQL by default. Should you want to monitor an OS process which is not covered by Instana automatically, you can configure it like this:

com.instana.plugin.process:
  processes:
    - 'sshd'
    - 'slapd'

Voluntary and non-voluntary context switches

This functionality is supported only on Linux-based operating systems.

You can manually enable monitoring of context switches by editing the host's agent configuration file (/opt/instana/agent/etc/instana/configuration.yaml):

...
com.instana.plugin.process:
  ctx_switches_enabled: true
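For reference, Linux exposes these two counters per process in /proc/<pid>/status. The sketch below reads them directly; it illustrates where such values can be found on Linux and is not a description of how the Instana agent collects them. The function names are hypothetical.

```python
def parse_context_switches(status_text):
    """Extract the context switch counters from the content of /proc/<pid>/status."""
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counters[key] = int(value)
    return counters


def read_context_switches(pid="self"):
    """Read the counters for a given PID (defaults to the current process)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_context_switches(status_file.read())
```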

OS Process Environment Variables

Instana's process sensor automatically captures all the environment variables of any monitored process. Because environments often contain sensitive or secret data, the process sensor takes any configured secrets into account when filtering.

For more information about configuring secrets, see Agent Configuration secrets.

You can also manually disable the monitoring of process environment variables by editing the host agent configuration file (/opt/instana/agent/etc/instana/configuration.yaml) as follows:

...
com.instana.plugin.process:
  env_vars_enabled: false
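The effect of secrets filtering on captured environment variables can be pictured with the following sketch. It assumes a case-insensitive "contains" match on the variable name and an illustrative word list; the actual matcher and list are whatever is set in the agent's secrets configuration, and the function name and mask value are hypothetical.

```python
def redact_environment(env, secret_words, mask="<redacted>"):
    """Return a copy of env with the values of secret-looking variables masked.

    Assumption: a variable is treated as secret when its name contains any of
    the configured words, ignoring case.
    """
    lowered = [word.lower() for word in secret_words]
    return {
        name: mask if any(word in name.lower() for word in lowered) else value
        for name, value in env.items()
    }
```

For example, with the words "key", "password", and "secret", a variable named DB_PASSWORD would be masked while PATH would be left unchanged.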

Windows Services

Instana supports monitoring of Windows Services and their child processes. You can configure it like this:

...
com.instana.plugin.process:
  services:
    - 'WindowsService1'
    - 'WindowsService2'

Metrics collection

To view the metrics, select Infrastructure in the sidebar of the Instana user interface and click a specific monitored host. The host dashboard shows all the collected metrics and monitored processes.

Configuration data

  • PID
  • Executable
  • Started At
  • User
  • Group
  • Max Open Files
  • Arguments

Performance metrics

CPU usage

CPU usage values as a percentage: user and system. The values are displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 3 seconds
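As an illustration of how user and system CPU percentages can be derived on Linux, the sketch below computes them from two samples of /proc/<pid>/stat. It is a sketch under stated assumptions, not the agent's implementation; the function names are hypothetical, and the 3-second default interval simply mirrors the granularity stated above.

```python
import os
import time


def parse_cpu_ticks(stat_text):
    """Return (utime, stime) in clock ticks from the content of /proc/<pid>/stat."""
    # The command name (field 2) is in parentheses and may contain spaces,
    # so split after the last ')'. The remainder starts at field 3 (state).
    fields = stat_text.rsplit(")", 1)[1].split()
    return int(fields[11]), int(fields[12])  # fields 14 (utime) and 15 (stime)


def cpu_percent(before, after, interval_seconds, ticks_per_second):
    """Convert two (utime, stime) samples into (user %, system %) of one CPU."""
    budget = ticks_per_second * interval_seconds
    user = (after[0] - before[0]) * 100 / budget
    system = (after[1] - before[1]) * 100 / budget
    return user, system


def sample_cpu_percent(pid="self", interval_seconds=3):
    """Take two samples interval_seconds apart and return (user %, system %)."""
    def read():
        with open(f"/proc/{pid}/stat") as stat_file:
            return parse_cpu_ticks(stat_file.read())

    before = read()
    time.sleep(interval_seconds)
    return cpu_percent(before, read(), interval_seconds, os.sysconf("SC_CLK_TCK"))
```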

Memory

Memory usage values in bytes: virtual, resident, and shared. The values are displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 3 seconds
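On Linux, the virtual, resident, and shared sizes of a process are available in /proc/<pid>/statm, expressed in pages. The following sketch converts them to bytes; it shows one place these values can be read and does not claim to be the agent's collection method. The function names are hypothetical.

```python
import os


def parse_statm(statm_text, page_size):
    """Convert the first three fields of /proc/<pid>/statm from pages to bytes."""
    size, resident, shared = (int(value) for value in statm_text.split()[:3])
    return {
        "virtual": size * page_size,
        "resident": resident * page_size,
        "shared": shared * page_size,
    }


def read_memory(pid="self"):
    """Read memory usage in bytes for a given PID (defaults to the current process)."""
    with open(f"/proc/{pid}/statm") as statm_file:
        return parse_statm(statm_file.read(), os.sysconf("SC_PAGE_SIZE"))
```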

Open Files

Open files values: used, as a total number, and current, as a percentage. The values are displayed on a graph over a selected time period.

The current versus maximum open files values are shown only when the operating system makes them available.

Data point: Filesystem

Granularity: 3 seconds
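The relationship between the used count, the maximum, and the percentage can be sketched on Linux with /proc/<pid>/fd and /proc/<pid>/limits. This sketch assumes the soft limit is the relevant maximum, which the text above does not state; the function names are hypothetical.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit, or None if absent or unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units.
            soft_limit = line.split()[3]
            return None if soft_limit == "unlimited" else int(soft_limit)
    return None


def open_files_percent(used, maximum):
    """Return used as a percentage of maximum, or None when no maximum is known."""
    if not maximum:
        return None
    return used * 100 / maximum


def read_open_files(pid="self"):
    """Return (used, percent) for a given PID (defaults to the current process)."""
    used = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as limits_file:
        maximum = parse_max_open_files(limits_file.read())
    return used, open_files_percent(used, maximum)
```

Returning None when no maximum is known matches the case where the current versus maximum comparison is not available on the operating system.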

Number of context switches

Number of times the process was context-switched: voluntary and non-voluntary. The values are displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 3 seconds

Health Signatures

For each sensor, there is a curated knowledgebase of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in event for the OS process sensor, see the Built-in events reference.