Monitoring an AIX host

You can monitor your AIX host with Instana. Instana provides insights into the host's performance, health, and resource utilization, which supports troubleshooting, performance optimization, and proactive issue detection.

Important: Instana SaaS 1.0.314 (23 February 2026) and Host sensor 1.1.227 introduce significant enhancements to AIX host monitoring metrics. To ensure correct data visualization and avoid missing or inconsistent metrics, upgrade the Host sensor, Instana backend, and UI. For instructions on updating the host agent, see Updating the host agent. If the metrics still do not appear in the UI after updating, restart the agent and wait 10-15 minutes depending on network bandwidth. For a complete overview of the changes introduced in this release, refer to the Host sensor release notes.

System information

Instana collects comprehensive system information from your AIX host. View these details in the System pane of the Instana dashboard:

Parameter Description
OS Operating system information with kernel version and system architecture.
CPU Logical processing units that are available to the system.
Memory Amount of system memory in GiB (gibibytes).
Hostname Hostname of the AIX machine.
FQDN The fully qualified domain name. It is the complete domain name of the host, including the subdomain and top-level domain.
System ID Unique identifier that is used by Instana to manage the monitored host and correlate with asset management systems.
Host ID The MAC address of the host's network interface, which is a unique identifier for the network adapter.
Hardware brand Name of the hardware manufacturer.
Hardware model Model name of the hardware.
Machine serial number Serial number of the machine.
Virtual processors Number of logical CPU execution units that are assigned to an LPAR by the Power Hypervisor on which the AIX operating system schedules work. This value represents core-equivalent processors and excludes SMT threads.
Started at Time when the system started.

To enable System ID collection for correlation with asset management systems, set collectSystemId to true in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml), as shown in the following example:

com.instana.plugin.host:
  collectSystemId: true

Interfaces

You can find the following details:

  • Interfaces: The list of network interfaces and IP addresses.
  • Instana agent: The Instana agent for the host.
  • Process: The count and details of the processes that are running on the host.

Reporting status

The historical availability of an AIX host is shown in the Reporting Status chart on the AIX host dashboard. You can see three color indicators that identify the status of a host reporting to Instana.

Status Description Color indicator
Reporting The host reported to Instana without any interruptions. Green
Reporting - monitoring issues The host reported to Instana with some interruptions (for example, network interruptions or agent monitoring issues) and was not fully available. Orange
Not Reporting The host was not reporting to Instana at all during this time. Red

The metric that is used to display this data on the host dashboard is based on the aggregation of messages received from the agent monitoring the host. A host is classified as Reporting if Instana has received at least 98% of the expected messages in a given timeframe.

For example, if the metric aggregation time window is 5 minutes and the poll rate of the host is once per second, Instana expects to receive 300 messages from the host during that timeframe.

  • If at least 294 messages are received (98% of 300), the host status is shown as Reporting.
  • If fewer than 294 but more than 0 messages are received, the host status is shown as Reporting - monitoring issues.
  • If no messages are received, the host status is shown as Not Reporting.
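The classification rule above can be restated as a short Python sketch. The function name reporting_status and its parameters are hypothetical and only illustrate the arithmetic of the 98% threshold; this is not Instana's implementation. It uses integer arithmetic for the ceiling so that floating-point rounding cannot move the boundary.

```python
def reporting_status(
    received: int,
    window_seconds: int = 300,
    poll_interval_seconds: int = 1,
    threshold_percent: int = 98,
) -> str:
    """Classify a host from the messages received in one aggregation window (hypothetical helper)."""
    # Expected messages: 300 for a 5-minute window polled once per second.
    expected = window_seconds // poll_interval_seconds
    # Integer ceiling of expected * 98 / 100 (294 when 300 messages are expected).
    required = (expected * threshold_percent + 99) // 100
    if received >= required:
        return "Reporting"
    if received > 0:
        return "Reporting - monitoring issues"
    return "Not Reporting"


print(reporting_status(294))  # Reporting
print(reporting_status(293))  # Reporting - monitoring issues
print(reporting_status(0))    # Not Reporting
```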

Performance metrics

Instana monitors and displays a comprehensive set of performance metrics for AIX hosts. These metrics provide detailed insights into system resource utilization, including CPU usage patterns, memory allocation, disk I/O operations, network interface activity, and process behavior, enabling effective performance analysis and troubleshooting.

CPU usage: Overall

This section shows the total CPU usage percentage across all processors, representing the combined utilization of all CPU resources on the host. This aggregate metric helps you quickly assess overall system load and identify periods of high CPU demand.

To collect more accurate CPU usage in an AIX LPAR environment, you must set `useMpstat` to true in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml) as shown in the following example:

com.instana.plugin.host:
  useMpstat: true
 
Metric Description Granularity
CPU usage Total CPU usage in percentage for the time range that you set. 1 second

Memory usage: Overall

This section displays the overall memory utilization percentage for the AIX host, representing the total amount of physical memory currently in use. This metric provides a quick overview of memory consumption and helps identify potential memory pressure on the system.

Metric Description Granularity
Memory usage Total memory usage in percentage for the time range that you set. 1 second
Note: In AIX LPAR environments, the used memory value is computed as follows: (computational + non-computational) ÷ real total. This calculation includes both computational memory (actively used by applications and the operating system) and non-computational memory (used for caching and other reclaimable purposes). The non-computational component often represents a large portion of used memory, which means a high used percentage doesn't necessarily signal insufficient memory. For effective memory monitoring and capacity planning on AIX, the computational memory percentage is more informative, as it indicates actual memory pressure and helps determine if the system is over-committed.
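To see why a high used percentage can coexist with low memory pressure, the following Python sketch applies the formula from the note above to a hypothetical LPAR whose memory is mostly file cache. The function name memory_percentages and the sample sizes are illustrative assumptions, not values reported by Instana.

```python
GIB = 2**30


def memory_percentages(computational: int, non_computational: int, real_total: int) -> dict:
    """Used % and computational % of real memory (hypothetical helper)."""
    if real_total <= 0:
        raise ValueError("real_total must be positive")
    # Used % counts both computational and reclaimable (non-computational) memory.
    used_pct = (computational + non_computational) / real_total * 100
    # Computational % reflects actual memory pressure.
    computational_pct = computational / real_total * 100
    return {"used_pct": used_pct, "computational_pct": computational_pct}


# Hypothetical LPAR: 64 GiB real memory, 16 GiB computational, 44 GiB file cache.
print(memory_percentages(16 * GIB, 44 * GIB, 64 * GIB))
# {'used_pct': 93.75, 'computational_pct': 25.0}
```

In this example the host looks almost full (93.75% used), but only 25% of real memory is computational, so the system is not over-committed.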

CPU load: Peak

This section monitors the maximum CPU load reached during the selected time period, indicating the highest level of CPU demand experienced by the system. Peak load measurements are essential for capacity planning, as they reveal the upper bounds of system utilization and help identify periods when CPU resources may be insufficient to handle workload spikes.

Metric Description Granularity
Load Maximum CPU load recorded for the selected time range, representing the peak number of processes ready to run or actively running on the system. 1 second

Average run queue (1h)

This section monitors the average run queue depth over the last hour, measuring how many processes are waiting for CPU execution time. The run queue is a key indicator of CPU contention: higher values indicate more processes competing for CPU resources, which can signal the need for performance optimization or additional CPU capacity.

Metric Description Granularity
Average run queue (1h) Average number of processes in the run queue over the last 60 minutes. If the agent has been running for less than 60 minutes, the metric displays Not collected. 60 minutes
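The following Python sketch illustrates a one-hour average that reports Not collected (represented as None) until a full hour of data exists. It assumes timestamped run-queue samples and a known agent start time; the sampling scheme and the function name average_run_queue_1h are hypothetical, because this page does not describe how the agent computes the value.

```python
from typing import Optional, Sequence, Tuple

WINDOW_SECONDS = 60 * 60  # one hour


def average_run_queue_1h(
    samples: Sequence[Tuple[float, float]],
    agent_start: float,
    now: float,
) -> Optional[float]:
    """Mean run-queue length over the last hour; None stands for "Not collected"."""
    # Until the agent has run for a full hour, there is no complete window.
    if now - agent_start < WINDOW_SECONDS:
        return None
    # Keep only samples whose timestamp falls within (now - 1h, now].
    window = [value for ts, value in samples if now - WINDOW_SECONDS < ts <= now]
    if not window:
        return None
    return sum(window) / len(window)


# Hypothetical samples taken once per minute, each with a run queue of 2.
samples = [(float(t), 2.0) for t in range(0, 7201, 60)]
print(average_run_queue_1h(samples, agent_start=0.0, now=1800.0))  # None
print(average_run_queue_1h(samples, agent_start=0.0, now=7200.0))  # 2.0
```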

Physical CPU consumed

Physical CPU consumed represents the actual amount of physical processor capacity used by an LPAR, measured in processor units (cores), regardless of how many virtual processors are configured.

Metric Description Granularity
Physical CPU consumed Number of physical processors consumed. 1 second

User sessions

This section monitors concurrent user login sessions on the AIX host, tracking how many users are actively logged into the system. This metric helps administrators monitor system access, identify unusual login patterns, and ensure compliance with licensing or security policies.

Metric Description Granularity
User sessions Number of concurrent user login sessions on the host. 1 minute

CPU usage: Total

This section breaks down total CPU usage into specific categories, showing how processor time is allocated across user processes, system operations, I/O wait states, and idle time.

Metric Description Granularity
User Percentage of CPU time spent executing user-space processes, including applications and user-initiated services. 1 second
System Percentage of CPU time spent executing kernel operations, including system calls, device drivers, and core operating system functions. 1 second
Wait Percentage of CPU time spent waiting for I/O operations to complete, indicating potential disk or network bottlenecks. 1 second
Idle Percentage of CPU time when the processor was idle and not waiting for I/O operations, indicating available CPU capacity. 1 second

CPU events

This section monitors CPU-related system events, tracking context switches and device interrupts that impact processor performance and system responsiveness.

Metric Description Granularity
Context switches Number of times the CPU switches between processes or threads. High values might indicate excessive multitasking or resource contention. 1 second
Device interrupts Number of hardware interrupt requests from peripheral devices that require immediate CPU attention for I/O operations. 1 second

CPU load: Average

This section tracks the average CPU load over time, measuring the number of processes competing for CPU resources. This metric helps assess system workload and identify periods of high demand or resource contention.

Metric Description Granularity
CPU load Average number of runnable or waiting processes over a 1-minute period, indicating overall system workload and CPU resource pressure. 5 seconds

Physical processor utilization

This section monitors physical processor utilization in AIX LPARs, tracking actual physical CPU consumption.

Metric Description Granularity
Physical CPU consumed Number of physical processors consumed. 1 second
Hypervisor calls Percentage of physical processor time spent making hypervisor calls. 1 second
Stolen busy cycles Percentage of physical processor utilization that occurs while the hypervisor is stealing busy cycles. 1 second
Stolen idle cycles Percentage of physical processor utilization that occurs while the hypervisor is stealing idle cycles. 1 second

By default, physical processor utilization metrics are not collected. To enable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectPhysicalProcessorUtil: true

These metrics are also displayed in the Physical processor utilization time-series chart, where you can analyze historical trends.

Individual CPU usage

This section displays CPU usage breakdown for each individual processor, showing how CPU time is allocated across different execution states:

Metric Description Granularity
User Percentage of CPU time spent running user-space processes (applications and services). 1 second
System Percentage of CPU time spent running kernel-space processes (OS core functions). 1 second
Wait Percentage of CPU time spent waiting for I/O operations to complete. 1 second
Idle Percentage of CPU time when the processor was idle. 1 second

By default, individual CPU usage metrics are not collected. To enable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectIndividualCpuMetrics: true

Individual CPU usage metrics are disabled by default to reduce system overhead: collecting per-processor breakdowns can affect performance on systems with a large number of logical CPUs. Enable this collection only if you need it.

Memory

This section monitors memory resources on the AIX host, providing metrics for physical and virtual memory usage, swap space allocation, and paging activity to help assess memory pressure and optimize system performance.

Metric Unit Description Granularity
Used % Percentage of physical memory currently allocated, including both computational and non-computational memory. 1 second
Computational % Percentage of memory actively used by applications and the operating system for processing tasks. 1 second
Non-computational % Percentage of memory used for caching and other reclaimable purposes that can be freed if needed. 1 second
Computational Byte Absolute amount of memory actively used by applications and the operating system. 1 second
Non-computational Byte Absolute amount of memory used for caching and other reclaimable purposes. 1 second
Real available Byte Amount of physical memory available for allocation without requiring paging or swapping. 1 second
Minimum file page threshold (minperm%) % Minimum threshold for file pages, below which memory might be reclaimed from both file and computational pages. 10 minutes
Swap used % Percentage of swap space currently in use, indicating memory pressure when high. 1 second
Virtual used % Percentage of virtual memory (physical memory plus swap space) currently allocated. 1 second
Swap total Byte Total amount of swap space configured on the system. 1 second
Swap free Byte Amount of swap space available for use. 1 second
Virtual total Byte Total virtual memory capacity (physical memory plus swap space). 1 second
Virtual free Byte Amount of virtual memory available for allocation. 1 second
Virtual active Byte Amount of virtual memory actively being used by running processes. 1 second
Page-in Rate Number of pages read from disk into physical memory, indicating memory demand exceeding available RAM. 1 second
Page-out Rate Number of pages written from physical memory to disk, indicating memory pressure and potential performance impact. 1 second
Page-scan Rate Number of memory pages scanned by the system to identify candidates for reclamation or swapping. 1 second
Page-faults Rate Number of page fault exceptions when processes access memory pages not currently in physical RAM. 1 second
Page-reclaims Rate Number of memory pages reclaimed from the free list without requiring disk I/O. 1 second

All memory metrics are visualized in the Instana dashboard as time-series graphs, allowing you to analyze trends and correlations across different memory components over your selected time range.
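The percentage metrics in the table relate to their byte counterparts in a simple way. The following Python sketch derives Swap used % from Swap total and Swap free, and Virtual used % from Virtual total (physical memory plus swap space) and Virtual free. The helper names and sample sizes are hypothetical, and returning 0.0 for a zero total is an assumption made here to avoid division by zero.

```python
GIB = 2**30


def used_percent(total: int, free: int) -> float:
    """Share of a capacity in use, in percent; 0.0 when nothing is configured (assumption)."""
    if total <= 0:
        return 0.0
    return (total - free) / total * 100


def virtual_total(real_total: int, swap_total: int) -> int:
    """Virtual memory capacity: physical memory plus swap space."""
    return real_total + swap_total


# Hypothetical host: 32 GiB real memory and 8 GiB of swap (paging space).
swap_total, swap_free = 8 * GIB, 6 * GIB
vm_total = virtual_total(32 * GIB, swap_total)  # 40 GiB
vm_free = 10 * GIB
print(used_percent(swap_total, swap_free))  # 25.0
print(used_percent(vm_total, vm_free))      # 75.0
```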

I/O activity

This section monitors system I/O activity, tracking both application-level operations and physical disk I/O.

Metric Description Granularity
Reads Number of read and readv system calls executed by applications, representing high-level file read requests that may be satisfied from cache or require disk access. 1 second
Writes Number of write and writev system calls executed by applications, representing high-level file write requests that are initially written to buffer cache. 1 second
Block reads Number of physical block read operations from disk devices during the sampling period, indicating actual disk read I/O that bypassed or missed the buffer cache. 1 second
Block writes Number of physical block write operations to disk devices, including both synchronous and asynchronous writes that flush data from buffer cache to persistent storage. 1 second
Non block reads Number of raw I/O read operations that bypass the file system buffer cache entirely, typically used for direct I/O or database operations. 1 second
Non block writes Number of raw I/O write operations that bypass the file system buffer cache entirely, typically used for direct I/O or database operations requiring immediate writes. 1 second
Logical block reads Number of logical block reads satisfied directly from system buffer cache without physical disk access, indicating effective cache utilization for read operations. 1 second
Logical block writes Number of logical block writes to system buffer cache that will be asynchronously flushed to disk later, helping assess write buffering effectiveness. 1 second
System calls Total number of system calls made by all processes. 1 second

By default, collection of these metrics is enabled. To disable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectIOActivity: false

Process statistics

This section monitors process and thread activity, tracking process states, execution patterns, and system call activity to help assess workload distribution and identify scheduling issues.

Metric Description Granularity
Total Processes Total number of processes currently on the system across all states, including active, sleeping, stopped, and idle processes. 1 second
Runnable Number of processes waiting to be run, including both processes that are able to run and those currently executing on the CPU. 1 second
Threads waiting Number of processes or threads blocked while waiting for page-in operations to complete, indicating memory paging activity. 1 second
Execs executed Number of exec system calls executed during the sampling period, representing program execution and replacement operations. 1 second
Forks executed Number of fork system calls executed during the sampling interval, representing new process creation activity. 1 second
Stopped Number of processes currently in a stopped state, typically paused by job control signals or during debugging sessions. 1 second
Sleeping Number of processes currently in sleep state, waiting for events, resources, or I/O operations to complete. 1 second
Idle Number of processes currently in idle state with no active work to perform or resources to consume. 1 second
Zombie Number of zombie processes: processes that have completed execution but still have entries in the process table because their parent process has not yet read their exit status. 1 second

By default, collection of these metrics is enabled. To disable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectProcessStatistics: false

Disks

This section monitors physical disk performance metrics, including I/O operations, transfer rates, and utilization patterns.

Metric Description Granularity
Disk name Name of the physical disk device. 10 minutes
Average transfer size Average number of bytes transferred per disk I/O operation. It helps assess typical transfer sizes and identify whether workloads are performing large sequential operations or small random I/O. 5 seconds
Busy Percentage of time the disk is actively transferring data. Values exceeding 30% often indicate excessive paging activity or I/O-bound processes. When combined with high CPU usage (>80%), this value typically signals system overload requiring attention. 5 seconds
Transfer rate Number of data transfer operations completed per second, representing the disk's overall transaction throughput and helping assess I/O workload intensity. 5 seconds
Read operations Number of read transfer operations completed per second, applicable to all storage device types except adapters, indicating read workload intensity on the disk. 5 seconds
Write operations Number of write transfer operations completed per second, applicable to all storage device types except adapters, indicating write workload intensity on the disk. 5 seconds
Queue full count Frequency per second that the disk's service queue reached its maximum capacity and could not accept additional requests, indicating I/O saturation and potential performance degradation. 5 seconds
Data transferred Total kilobytes transferred during the interval, providing a key indicator of disk data movement speed, though actual performance also depends on disk format and space usage efficiency. 5 seconds
Data read Number of bytes per second read from the disk, measured over the monitoring interval to track read throughput and identify read-intensive workloads. 5 seconds
Data written Number of bytes per second written to the disk, measured over the monitoring interval to track write throughput and identify write-intensive workloads. 5 seconds
Type Storage device type classification, identifying the specific kind of disk or storage adapter for proper performance interpretation and troubleshooting. 10 minutes
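The guideline in the Busy description can be expressed as a small triage function. This Python sketch only encodes the two thresholds quoted in the table (Busy above 30%, CPU usage above 80%); the function name assess_disk and the returned labels are hypothetical and are not alert states that Instana produces.

```python
BUSY_THRESHOLD_PCT = 30.0  # from the Busy description
CPU_THRESHOLD_PCT = 80.0   # from the Busy description


def assess_disk(busy_pct: float, cpu_usage_pct: float) -> str:
    """Rough triage of one disk's Busy value, following the Busy guideline (hypothetical)."""
    if busy_pct > BUSY_THRESHOLD_PCT and cpu_usage_pct > CPU_THRESHOLD_PCT:
        return "possible overload"
    if busy_pct > BUSY_THRESHOLD_PCT:
        return "check paging and I/O-bound processes"
    return "within guideline"


print(assess_disk(busy_pct=45.0, cpu_usage_pct=90.0))  # possible overload
print(assess_disk(busy_pct=45.0, cpu_usage_pct=40.0))  # check paging and I/O-bound processes
print(assess_disk(busy_pct=12.0, cpu_usage_pct=95.0))  # within guideline
```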

By default, data is collected for the top 10 disks based on busy percentage (highest to lowest). To collect data for all disks, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectAllDisks: true # Set false to collect only top 10 disks
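The default top-10 selection amounts to ranking disks by busy percentage and keeping the highest ten. The following Python sketch shows that ranking; the collect_all parameter mirrors the intent of collectAllDisks, but the function select_disks and its parameters are hypothetical and do not represent the agent's code.

```python
import heapq
from typing import Dict, List


def select_disks(busy_by_disk: Dict[str, float], collect_all: bool = False, limit: int = 10) -> List[str]:
    """Disk names ordered by Busy %, highest first; only the top `limit` unless collect_all."""
    if collect_all:
        return sorted(busy_by_disk, key=busy_by_disk.__getitem__, reverse=True)
    return heapq.nlargest(limit, busy_by_disk, key=busy_by_disk.__getitem__)


# Hypothetical host with 12 disks; hdiskN is N% busy.
busy = {f"hdisk{i}": float(i) for i in range(12)}
print(select_disks(busy))                         # hdisk11 ... hdisk2 (10 disks)
print(len(select_disks(busy, collect_all=True)))  # 12
```

heapq.nlargest avoids sorting the full list when only a few entries are needed, which matters little for a dozen disks but scales better on hosts with many devices.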

Physical volumes

This section monitors physical volumes. A physical volume is a raw storage device (disk or partition) that must be initialized and added to a volume group before it can be used.

Metric Description Granularity
Physical volume name Device identifier assigned to the physical volume for system reference and management operations. 10 minutes
Total size Total raw storage capacity available on the physical volume for allocation to volume groups. 10 minutes
Used size Amount of storage currently allocated from this physical volume to logical volumes within the volume group. 8 minutes
Free size Amount of unallocated storage remaining on the physical volume available for new or expanding logical volumes. 8 minutes
Used space Percentage of physical volume capacity that is currently allocated to logical volumes within the volume group. 8 minutes
Free space Percentage of physical volume capacity that is available for allocation to logical volumes within the volume group. 8 minutes

By default, data is collected for the top 10 physical volumes based on capacity utilization (highest to lowest). To collect data for all physical volumes, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectAllVolumeGroups: true # Set false to collect only top 10 physical volumes.

Volume groups

This section monitors volume groups. A volume group is a collection of physical volumes that creates a storage pool.

Metric Description Granularity
Volume group name Unique identifier assigned to the volume group for management and reference purposes. 10 minutes
Total size Total storage capacity available in the volume group, representing the sum of all physical volumes. 10 minutes
Used size Amount of storage currently allocated to logical volumes within the volume group. 8 minutes
Free size Amount of unallocated storage remaining in the volume group available for new logical volumes. 8 minutes
Used space Percentage of volume group capacity currently allocated to logical volumes. 8 minutes
Free space Percentage of volume group capacity available for allocation to new or expanding logical volumes. 8 minutes
Active physical volumes Number of physical volumes currently active and accessible within the volume group. 8 minutes
Physical volumes Total number of physical volumes configured in the volume group, including both active and inactive. 8 minutes
Logical volumes Number of logical volumes currently defined within the volume group. 8 minutes
Volume group state Operational status of the volume group, indicating whether it is active, inactive, or in a varied state. 10 minutes
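Several volume group metrics are aggregates of the underlying physical volumes; for example, Total size is the sum of all physical volumes. The following Python sketch derives the size, percentage, and count metrics from a list of physical volumes. The PhysicalVolume class and summarize_volume_group function are hypothetical, and the sketch ignores AIX-specific overhead such as space reserved for volume group metadata.

```python
from dataclasses import dataclass
from typing import Dict, List

GIB = 2**30


@dataclass
class PhysicalVolume:
    name: str
    total_bytes: int
    used_bytes: int  # allocated to logical volumes
    active: bool


def summarize_volume_group(pvs: List[PhysicalVolume]) -> Dict[str, float]:
    """Derive volume group totals from its physical volumes (hypothetical helper)."""
    total = sum(pv.total_bytes for pv in pvs)  # sum of all physical volumes
    used = sum(pv.used_bytes for pv in pvs)
    used_pct = used / total * 100 if total else 0.0
    return {
        "total_size": total,
        "used_size": used,
        "free_size": total - used,
        "used_space_pct": used_pct,
        "free_space_pct": (100.0 - used_pct) if total else 0.0,
        "physical_volumes": len(pvs),
        "active_physical_volumes": sum(1 for pv in pvs if pv.active),
    }


# Hypothetical volume group with two 100 GiB disks, one of them inactive.
vg = summarize_volume_group([
    PhysicalVolume("hdisk1", 100 * GIB, 50 * GIB, active=True),
    PhysicalVolume("hdisk2", 100 * GIB, 30 * GIB, active=False),
])
print(vg["used_space_pct"], vg["active_physical_volumes"])  # 40.0 1
```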

By default, data is collected for the top 10 volume groups based on capacity utilization (highest to lowest). To collect data for all volume groups, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectAllVolumeGroups: true # Set false to collect only top 10 volume groups.

Logical volumes

This section monitors logical volumes. A logical volume is a virtual block device created within a volume group that can be used for file systems or raw storage.

Metric Description Granularity
Volume group name Name of the volume group containing this logical volume, establishing the storage pool relationship. 10 minutes
Logical volume name Unique identifier assigned to the logical volume for system reference and management operations. 10 minutes
Size Total storage capacity allocated to the logical volume, which can be dynamically adjusted as needed. 10 minutes
Type Logical volume type, indicating its purpose such as jfs2, jfs2log, paging, or other specialized functions. 10 minutes
Mount point File system mount point where the logical volume is accessible, if it hosts a mounted file system. 10 minutes
State Operational status of the logical volume, indicating whether it is open, closed, syncd, or in another state. 10 minutes

By default, data is collected for the top 10 logical volumes based on capacity utilization (highest to lowest). To collect data for all logical volumes, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectAllLogicalVolumes: true # Set false to collect only top 10 logical volumes.

Filesystems

These metrics provide insights into file system performance, capacity, and usage, allowing administrators to monitor and optimize their storage systems effectively.

Metric Description Granularity
Free disk space Amount of free space that is available on the file system. 1 second
Leaked Space that is allocated but not used, considered leaked or wasted. 1 second
Capacity Total capacity of the file system. 1 second
Used disk percentage Percentage of space that is used on the file system. 1 second
Inode usage Percentage of inodes (data structures describing files and directories) in use. 1 second
Inode free Number of free inodes that are available on the file system. 1 second
Bytes Read/s Number of bytes read from the file system per second. 1 second
Bytes Written/s Number of bytes written to the file system per second. 1 second
Reads/s Number of read operations per second. 1 second
Writes/s Number of write operations per second. 1 second
Tag Description
Device Name of the device.
Mount Mount point where the device is attached in the file system hierarchy.
Options The options or parameters that are used while mounting the file system.
Type The type of file system.

* The total, read, and write usage datapoint metrics display the disk I/O utilization as a percentage.

* Leaked space refers to deleted files that are still in use (held open by a process). It equals capacity - used - free. You can find these files with lsof | grep deleted.
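The Leaked value follows directly from the other capacity figures. This minimal Python sketch applies capacity - used - free to a hypothetical 10 GiB file system; the function name leaked_bytes and the sample sizes are illustrative only.

```python
MIB = 2**20


def leaked_bytes(capacity: int, used: int, free: int) -> int:
    """Leaked space: capacity - used - free, as defined for the Leaked metric."""
    return capacity - used - free


# Hypothetical 10 GiB file system: 7 GiB used, 2.5 GiB free.
print(leaked_bytes(10240 * MIB, 7168 * MIB, 2560 * MIB) // MIB)  # 512
```

A nonzero result of this size (512 MiB here) suggests that a process still holds deleted files open; the space is released when that process closes the files or exits.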

By default, Instana only monitors local file systems. You can list the file systems that are monitored or excluded in the configuration.yaml file (*instanaAgentDir*/etc/instana/configuration.yaml).

The name for the configuration setting is the device name, which you can obtain from the first column of the df command output.
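Because the configuration uses the device names from the first column of df output, the list can be generated rather than typed. The following Python sketch parses df output and renders a host-section snippet. The sample output is hypothetical and abbreviated (real df output depends on the flags and AIX level), and the helpers device_names and filesystems_config are illustrative, not part of the agent.

```python
from typing import Iterable, List

# Hypothetical, abbreviated `df` output from an AIX host.
DF_OUTPUT = """\
Filesystem    512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2         6291456   1258292   80%    55000    30% /usr
/dev/hd10opt     1048576    524288   50%     9000     8% /opt
"""


def device_names(df_output: str) -> List[str]:
    """Return the first column of each data row (the device name)."""
    names = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def filesystems_config(devices: Iterable[str]) -> str:
    """Render a host-section snippet that lists the given devices."""
    lines = ["com.instana.plugin.host:", "  filesystems:"]
    lines += [f"    - '{device}'" for device in devices]
    return "\n".join(lines)


print(filesystems_config(device_names(DF_OUTPUT)))
```

Review the generated list before you paste it into configuration.yaml, and remove any entries that you do not want to monitor.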

The following example shows the list of file systems that are monitored:

com.instana.plugin.host:
  filesystems:
    - '/dev/hd11admin'
    - '/dev/livedump'
    - '/dev/hd10opt'
    - '/dev/hd2'
 

The following example shows the file systems that are included or excluded:

com.instana.plugin.host:
  filesystems:
    include:
      - '/dev/hd11admin'
      - '/dev/livedump'
    exclude:
      - '/dev/hd10opt'
      - '/dev/hd2'
 

By default, data is collected for the top 10 filesystems based on capacity utilization (highest to lowest). To collect data for all filesystems, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectAllAixFilesystems: true # Set false to collect only top 10 filesystems

NFS statistics

Network File System (NFS) statistics provide detailed visibility into the way the system interacts with remote file systems over the network. These statistics are essential for understanding performance, diagnosing latency issues, and identifying bottlenecks related to storage access or network behavior.

NFS version 2, version 3, and version 4 client and server calls are supported.

The following tables describe the available NFS metrics.

The NFS top-level calls table shows a high-level view of NFS activity on the system. It helps you quickly understand how many NFS requests are being handled and whether any of them are failing.

Metric Description Granularity
Total calls Total number of NFS requests received during the interval. 60 seconds
Rejected calls Number of NFS requests that were rejected and could not be processed. 60 seconds
Rejected calls percentage Percentage of rejected NFS calls compared to the total number of calls. 60 seconds
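The Rejected calls percentage is the ratio of the two preceding metrics. The following Python sketch computes it for one interval. The function name is hypothetical, and returning 0.0 for an interval with no calls is an assumption made here to avoid division by zero.

```python
def rejected_calls_percentage(total_calls: int, rejected_calls: int) -> float:
    """Rejected NFS calls as a percentage of all NFS calls in the interval (hypothetical helper)."""
    if total_calls == 0:
        return 0.0  # assumption: report 0% for an interval with no calls
    return rejected_calls / total_calls * 100


print(rejected_calls_percentage(2000, 50))  # 2.5
print(rejected_calls_percentage(0, 0))      # 0.0
```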

The NFS client table shows the activity of the system when it acts as a client, including the number of requests sent to NFS servers and any errors or rejections encountered.

Metric Description Granularity
Lookups Number of lookup operations performed by the client to find files or directories. 60 seconds
Read calls Number of read requests sent by the client. 60 seconds
Read directory calls Number of requests to read directory contents. 60 seconds
Read link calls Number of requests to read symbolic links. 60 seconds
Writes Number of write operations performed by the client. 60 seconds
Write cache calls Number of cached write operations sent by the client. 60 seconds
File creates Number of file creation requests from the client. 60 seconds
Remove file calls Number of requests to delete files. 60 seconds
Rename file calls Number of file rename requests made by the client. 60 seconds
Make directory calls Number of requests to create new directories. 60 seconds
Remove directory calls Number of requests to remove directories. 60 seconds
Get attribute calls Number of requests to fetch file or directory attributes. 60 seconds
Set attribute calls Number of requests to update file or directory attributes. 60 seconds
File system statistics calls Number of requests to fetch file system statistics. 60 seconds
Link calls Number of requests to create hard links. 60 seconds
Symbolic link calls Number of requests to create symbolic links. 60 seconds
Null calls Number of NULL procedure calls used to check connectivity. 60 seconds
Root calls Number of root operation calls made by the client. 60 seconds

The NFS server table shows the NFS requests received and processed by the server. It helps you monitor server performance, track errors, and identify potential bottlenecks in file-sharing operations.

Metric Description Granularity
Lookups Number of lookup requests handled by the server during the interval. 60 seconds
Read calls Number of read requests received by the server. 60 seconds
Writes Number of write requests received by the server. 60 seconds
Read directory calls Number of requests to read directory contents. 60 seconds
Read link calls Number of requests to read symbolic links. 60 seconds
Write cache calls Number of cached write operations that the server handles. 60 seconds
File creates Number of file creation requests handled by the server. 60 seconds
Remove file calls Number of requests to delete files. 60 seconds
Rename file calls Number of file rename requests handled by the server. 60 seconds
Make directory calls Number of requests to create new directories. 60 seconds
Remove directory calls Number of requests to remove directories. 60 seconds
Get attribute calls Number of requests to fetch file or directory attributes. 60 seconds
Set attribute calls Number of requests to update file or directory attributes. 60 seconds
File system statistics calls Number of requests to fetch file system statistics. 60 seconds
Link calls Number of requests to create hard links. 60 seconds
Symbolic link calls Number of requests to create symbolic links. 60 seconds
Null calls Number of NULL procedure calls used to check connectivity. 60 seconds
Root calls Number of root operation calls received by the server. 60 seconds

By default, NFS statistics are not collected. To enable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectNfsStatistics: true

Enable this setting only if at least one NFS file system is mounted. Otherwise, all collected values are zero.

RPC statistics

Remote Procedure Call (RPC) is a communication mechanism that enables a program on one system to run procedures on another system as if they were local function calls. RPC hides the complexity of network communication, serialization, and transport so that distributed components can interact seamlessly.

Statistics are supported for both connection-oriented and connectionless RPC calls, on both the client side and the server side.

The RPC client table shows the RPC requests that the client sends to servers. Use it to monitor call success, timeouts, retransmissions, and authentication issues to ensure reliable remote procedure communication.

Metric Description Granularity
Calls Number of RPC requests sent by the client. 60 seconds
Calls rejected by server Number of client requests rejected by the RPC server. 60 seconds
Calls timed out Number of RPC calls from the client that timed out before receiving a server response. 60 seconds
Calls retransmitted Number of RPC packets retransmitted to the server due to failed or delayed responses. 60 seconds
Replies not matching calls Number of server replies received that did not correspond to any outstanding client call. 60 seconds
Times call wait on busy Number of times a client call had to wait because a client handle was busy. 60 seconds
Times authentication refreshed Number of times the client had to refresh its authentication information during the interval. 60 seconds

The RPC server table shows the RPC requests that the server receives and processes. Use it to monitor rejected requests, duplicate calls, packet errors, and availability issues to ensure reliable server-side communication.

Metric Description Granularity
Calls Number of RPC requests received by the server. 60 seconds
Calls rejected Number of RPC requests rejected by the server. 60 seconds
Dup reqs Number of duplicate RPC requests received by the server. 60 seconds
Dup checks Number of RPC requests that were looked up in the duplicate request cache. 60 seconds
Packets with malformed header Number of RPC packets received with malformed headers that could not be processed. 60 seconds
Packets too short Number of incomplete RPC packets received that were too short to process. 60 seconds
Times RPC packet unavailable Number of times the server attempted to receive a packet when none was available. 60 seconds

By default, RPC statistics are not collected. To enable them, set collectRpcStatistics to true in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):

com.instana.plugin.host:
  collectRpcStatistics: true

Enable this setting only if an NFS file system is mounted or the system acts as an NFS server. Otherwise, all collected RPC statistics are zero.

Network interfaces

The following table shows network traffic and errors for each network interface.

Metric Description Granularity
Interface Network interface that is used for communication. 60 seconds
Mac Media Access Control (MAC) address of the network interface. 60 seconds
IPs IP addresses assigned to the network interface. 60 seconds
RX Bytes Number of bytes received by the network interface per second. 1 second
RX Errors Number of errors encountered while receiving data on the network interface. 1 second
TX Bytes Number of bytes transmitted by the network interface per second. 1 second
TX errors Number of errors encountered while transmitting packets on the network interface. 1 second
Received/s Number of packets received by the network interface per second. 1 second
Transmitted/s Number of packets transmitted by the network interface per second. 1 second
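The per-second values in this table (RX Bytes, TX Bytes, Received/s, Transmitted/s) are rates. Interface statistics are commonly exposed as cumulative counters, so a rate is the counter difference between two readings divided by the time between them. The following Python sketch illustrates that idea; the InterfaceSample structure and the counter-reset handling are assumptions for illustration, not the Host sensor's implementation.

```python
from dataclasses import dataclass


@dataclass
class InterfaceSample:
    """One reading of cumulative interface counters (hypothetical structure)."""
    timestamp: float  # seconds since an arbitrary epoch
    rx_bytes: int
    tx_bytes: int
    rx_packets: int
    tx_packets: int


def per_second_rates(prev: InterfaceSample, curr: InterfaceSample) -> dict[str, float]:
    """Convert two cumulative counter readings into per-second rates."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")

    def rate(before: int, after: int) -> float:
        # A negative delta is treated as a counter reset between readings
        # (an assumption); report 0 rather than a negative rate.
        return max(after - before, 0) / elapsed

    return {
        "rx_bytes_per_s": rate(prev.rx_bytes, curr.rx_bytes),
        "tx_bytes_per_s": rate(prev.tx_bytes, curr.tx_bytes),
        "received_per_s": rate(prev.rx_packets, curr.rx_packets),
        "transmitted_per_s": rate(prev.tx_packets, curr.tx_packets),
    }
```

For example, two readings taken 2 seconds apart whose rx_bytes values are 1,000 and 5,000 give an RX rate of 2,000 bytes per second.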

TCP activity

These metrics provide insights into TCP connection activity, including established connections, segment transmission rates, and error occurrences.

Metric Description Granularity
Established Number of established TCP connections. 1 second
Open/s Number of new TCP connections opened per second. 1 second
In Segments/s Number of incoming TCP segments per second. 1 second
Out Segments/s Number of outgoing TCP segments per second. 1 second
Established Resets Percentage of established TCP connections that are reset per second. 1 second
Out Resets Percentage of outgoing TCP connections that are reset per second. 1 second
Fail Percentage of failed TCP connection attempts per second. 1 second
Error Percentage of TCP errors per second. 1 second
Retransmission Percentage of TCP retransmissions per second. 1 second
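The percentage metrics in this table relate a count of events to a total over the same interval. The following Python sketch shows one way to derive such values from per-interval counter deltas. The formulas used here (retransmitted segments over outgoing segments, and erroneous segments over incoming segments) are hypothetical and are not Instana's documented definitions.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is 0."""
    return 100.0 * part / whole if whole > 0 else 0.0


def tcp_percentages(out_segments: int, retransmitted_segments: int,
                    in_segments: int, in_errors: int) -> dict[str, float]:
    """Hypothetical interval percentages; all inputs are per-interval deltas.

    These formulas are assumptions for illustration only.
    """
    return {
        # Share of outgoing segments that had to be sent again.
        "retransmission_pct": percentage(retransmitted_segments, out_segments),
        # Share of incoming segments that arrived with errors.
        "error_pct": percentage(in_errors, in_segments),
    }
```

Guarding against a zero denominator keeps an idle interval (no segments at all) from raising an error.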

Top process list

These metrics offer insights into running processes, including their process ID, name, CPU usage, normalized CPU usage, and memory consumption. The top process list is updated every 30 seconds and contains only processes with significant resource usage, that is, processes that used more than 10% CPU over the last 30 seconds or more than 512 MB of memory (RSS).
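The inclusion rule for the top process list can be sketched as a simple filter. This Python example is illustrative only: the ProcessSample structure and function names are hypothetical, and it assumes that 512 MB means 512 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass

CPU_THRESHOLD_PCT = 10.0                 # more than 10% CPU over the last 30 seconds
RSS_THRESHOLD_BYTES = 512 * 1024 * 1024  # more than 512 MB RSS (binary MB assumed)


@dataclass
class ProcessSample:
    """Hypothetical per-process reading for one 30-second interval."""
    pid: int
    name: str
    cpu_pct: float  # top semantics: 100.0 means one fully used core
    rss_bytes: int


def is_listed(p: ProcessSample) -> bool:
    # "More than" is treated as strictly greater than either threshold.
    return p.cpu_pct > CPU_THRESHOLD_PCT or p.rss_bytes > RSS_THRESHOLD_BYTES


def top_process_list(samples: list[ProcessSample]) -> list[ProcessSample]:
    """Keep only processes that exceed the CPU or the memory threshold."""
    return [p for p in samples if is_listed(p)]
```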

To create a combined list of processes from the top 10 CPU usage list and the top 10 memory usage list, set combineTopProcesses to true in the host section of the agent configuration file. Processes in the combined list are included even if their CPU usage is below 10% or their memory usage is below 512 MB. A process that appears in both top 10 lists is listed only once, so the combined list contains at most 20 entries.

com.instana.plugin.host:
  combineTopProcesses: true
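With combineTopProcesses enabled, the combined list is the union of the top 10 processes by CPU and the top 10 processes by memory, with duplicates removed. The following Python sketch illustrates that merge under stated assumptions: processes are identified by PID, memory is ranked by RSS, and the ProcessSample structure is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSample:
    """Hypothetical per-process reading for one 30-second interval."""
    pid: int
    name: str
    cpu_pct: float
    rss_bytes: int


def combined_top_processes(procs: list[ProcessSample], n: int = 10) -> list[ProcessSample]:
    """Union of the top-n by CPU and the top-n by memory, each PID listed once."""
    top_cpu = sorted(procs, key=lambda p: p.cpu_pct, reverse=True)[:n]
    top_mem = sorted(procs, key=lambda p: p.rss_bytes, reverse=True)[:n]
    combined: dict[int, ProcessSample] = {}
    for p in top_cpu + top_mem:
        combined.setdefault(p.pid, p)  # keep the first occurrence of each PID
    return list(combined.values())     # at most 2 * n entries
```

No usage thresholds are applied in this sketch, which matches the behavior described above: a process with low absolute usage can still appear when it ranks among the top 10 in either list.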
 

CPU values follow Linux top semantics: 100% CPU corresponds to full use of one CPU core, so a process that runs on several cores at once can exceed 100%. Normalized CPU is calculated by dividing the CPU value by the number of logical processors. You can search the history of top process snapshots from the previous month.
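As a worked example of the normalization, a process that uses 250% CPU (two and a half cores in top terms) on a host with 8 logical processors has a normalized CPU of 250 / 8 = 31.25%. A minimal Python sketch of the calculation, with a hypothetical function name:

```python
def normalized_cpu(cpu_pct: float, logical_processors: int) -> float:
    """Divide top-style CPU usage by the number of logical processors."""
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    return cpu_pct / logical_processors


print(normalized_cpu(250.0, 8))  # 31.25
```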

Metric Description Granularity
PID The unique identifier that is assigned to each process by the operating system. 30 seconds
Process name The name of the process as defined by the application or service. 30 seconds
PPID The parent process ID that started the current process, showing the process hierarchy. 30 seconds
GID The group ID that indicates the primary group ownership of a process. 30 seconds
UID The user ID that identifies the owner of a process. 30 seconds
Elapsed time The total time that the process has been running since it started. 30 seconds
CPU The amount of CPU resources consumed by the process, where 100% corresponds to one fully used core. 30 seconds
CPU (normalized) The CPU usage of the process divided by the number of logical processors. 30 seconds
Memory The amount of memory that is consumed by the process. 30 seconds

Health signatures

For each sensor, a knowledge base of health signatures is evaluated continuously against the incoming metrics. When a health signature fails, it raises an issue or an incident, depending on the user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of an entity.

For more information about the built-in events for the Host sensor, see Built-in events reference.

Error report events

On AIX, the errpt command generates an error report from the entries in the system error log. The sensor captures the errors in this report as events and sends them to Instana. It captures the permanent and temporary error types and the hardware and software error classes. To enable this feature, set aixEventsPollRate in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml), as shown in the following example:

com.instana.plugin.host:
  aixEventsPollRate: 900 # In seconds