Monitoring an AIX host
You can monitor your host with Instana. Instana provides comprehensive insights into the host's performance, health, and resource utilization, enabling efficient troubleshooting, performance optimization, and proactive issue detection.
- System information
- Interfaces
- Reporting status
- Performance metrics
- CPU usage: Overall
- Memory usage: Overall
- CPU load: Peak
- Average run queue (1h)
- Physical CPU consumed
- User sessions
- CPU usage: Total
- CPU events
- CPU load: Average
- Physical processor utilization
- Individual CPU usage
- Memory
- I/O activity
- Process statistics
- Disks
- Physical volumes
- Volume groups
- Logical volumes
- Filesystems
- NFS statistics
- RPC statistics
- Network interfaces
- TCP activity
- Top process list
- Health signatures
- Error report events
System information
Instana collects comprehensive system information from your AIX host. View these details in the System pane of the Instana dashboard:
| Parameter | Description |
|---|---|
| OS | Operating system information with kernel version and system architecture. |
| CPU | Logical processing units that are available to the system. |
| Memory | Amount of system memory in GiB (gibibytes). |
| Hostname | Hostname of the AIX machine. |
| FQDN | The fully qualified domain name. It is the complete domain name of the host, including the subdomain and top-level domain. |
| System ID | Unique identifier that is used by Instana to manage the monitored host and correlate with asset management systems. |
| Host ID | The MAC address of the host's network interface, which is a unique identifier for the network adapter. |
| Hardware brand | Name of the hardware manufacturer. |
| Hardware model | Model name of the hardware. |
| Machine serial number | Serial number of the machine. |
| Virtual processors | Number of logical CPU execution units that are assigned to an LPAR by the Power Hypervisor on which the AIX operating system schedules work. This value represents core-equivalent processors and excludes SMT threads. |
| Started at | Time when the system started. |
System ID is used for correlation with asset management systems. Enable System ID collection by configuring the agent YAML file as shown in the following example:
"com.instana.plugin.host":
  "collectSystemId": true
Interfaces
You can find the following details:
- Interfaces: The list of network interfaces and IP addresses.
- Instana agent: The Instana agent for the host.
- Process: The count and details of the processes that are running on the host.
Reporting status
The historical availability of an AIX host is shown in the Reporting Status chart on the AIX host dashboard. You can see three color indicators that identify the status of a host reporting to Instana.
| Status | Description | Color indicator |
|---|---|---|
| Reporting | The host reported to Instana without any interruptions. | Green |
| Reporting - monitoring issues | The host reported to Instana with some interruptions (such as network interruptions or agent monitoring issues) and is not fully available. | Orange |
| Not Reporting | The host was not reporting to Instana at all during this time. | Red |
The metric that is used to display this data on the host dashboard is based on the aggregation of messages received from the agent monitoring the host. A host is classified as Reporting if Instana has received at least 98% of the expected messages in a given timeframe.
For example, if the metric aggregation time window is 5 minutes and the poll rate of the host is once per second, Instana expects to receive 300 messages from the host during that timeframe.
- If at least 294 messages are received (98% of 300), the host status is shown as Reporting.
- If fewer than 294 but more than 0 messages are received, the host status is shown as Reporting - monitoring issues.
- If no messages are received, the host status is shown as Not Reporting.
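The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the 98% rule, not Instana's implementation; the function name `reporting_status` and its parameters are hypothetical. Integer arithmetic is used so that the threshold comparison is exact.

```python
def reporting_status(received: int, window_seconds: int = 300,
                     polls_per_second: int = 1,
                     threshold_percent: int = 98) -> str:
    """Classify a host from the number of agent messages received in one window."""
    # Messages expected in the aggregation window (300 in the example above).
    expected = window_seconds * polls_per_second
    # received / expected >= threshold_percent / 100, without floating point.
    if received * 100 >= expected * threshold_percent:
        return "Reporting"
    if received > 0:
        return "Reporting - monitoring issues"
    return "Not Reporting"


for count in (300, 294, 293, 0):
    print(count, reporting_status(count))
```

With the defaults shown, 294 messages is the smallest count that still classifies the host as Reporting.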
Performance metrics
Instana monitors and displays a comprehensive set of performance metrics for AIX hosts. These metrics provide detailed insights into system resource utilization, including CPU usage patterns, memory allocation, disk I/O operations, network interface activity, and process behavior, enabling effective performance analysis and troubleshooting.
CPU usage: Overall
This section shows the total CPU usage percentage across all processors, representing the combined utilization of all CPU resources on the host. This aggregate metric helps you quickly assess overall system load and identify periods of high CPU demand.
To collect more accurate CPU usage in an AIX LPAR environment, you must set `useMpstat` to true in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml) as shown in the following example:
com.instana.plugin.host:
  useMpstat: true
| Metric | Description | Granularity |
|---|---|---|
| CPU usage | Total CPU usage in percentage for the time range that you set. | 1 second |
Memory usage: Overall
This section displays the overall memory utilization percentage for the AIX host, representing the total amount of physical memory currently in use. This metric provides a quick overview of memory consumption and helps identify potential memory pressure on the system.
| Metric | Description | Granularity |
|---|---|---|
| Memory usage | Total memory usage in percentage for the time range that you set. | 1 second |
The used memory value is computed as follows: (computational + non-computational) ÷ real total. This calculation includes both computational memory (actively used by applications and the operating system) and non-computational memory (used for caching and other reclaimable purposes). The non-computational component often represents a large portion of used memory, which means that a high used percentage doesn't necessarily signal insufficient memory. For effective memory monitoring and capacity planning on AIX, the computational memory percentage is more informative, as it indicates actual memory pressure and helps determine whether the system is over-committed.
CPU load: Peak
This section monitors the maximum CPU load reached during the selected time period, indicating the highest level of CPU demand experienced by the system. Peak load measurements are essential for capacity planning, as they reveal the upper bounds of system utilization and help identify periods when CPU resources may be insufficient to handle workload spikes.
| Metric | Description | Granularity |
|---|---|---|
| Load | Maximum CPU load recorded for the selected time range, representing the peak number of processes ready to run or actively running on the system. | 1 second |
Average run queue (1h)
This section monitors the average run queue depth over the last hour, measuring how many processes are waiting for CPU execution time. The run queue is a key indicator of CPU contention: higher values indicate more processes competing for CPU resources, which can signal the need for performance optimization or additional CPU capacity.
| Metric | Description | Granularity |
|---|---|---|
| Average run queue (1h) | Average number of processes in the run queue over the last 60 minutes. If the agent is up for less than 60 minutes, it displays Not collected. | 60 minutes |
Physical CPU consumed
Physical CPU consumed represents the actual amount of physical processor capacity used by an LPAR, measured in processor units (cores), regardless of how many virtual processors are configured.
| Metric | Description | Granularity |
|---|---|---|
| Physical CPU consumed | Number of physical processors consumed. | 1 second |
User sessions
This section monitors concurrent user login sessions on the AIX host, tracking how many users are actively logged into the system. This metric helps administrators monitor system access, identify unusual login patterns, and ensure compliance with licensing or security policies.
| Metric | Description | Granularity |
|---|---|---|
| User sessions | Number of concurrent user login sessions on the host. | 1 minute |
CPU usage: Total
This section breaks down total CPU usage into specific categories, showing how processor time is allocated across user processes, system operations, I/O wait states, and idle time.
| Metric | Description | Granularity |
|---|---|---|
| User | Percentage of CPU time spent executing user-space processes, including applications and user-initiated services. | 1 second |
| System | Percentage of CPU time spent executing kernel operations, including system calls, device drivers, and core operating system functions. | 1 second |
| Wait | Percentage of CPU time spent waiting for I/O operations to complete, indicating potential disk or network bottlenecks. | 1 second |
| Idle | Percentage of CPU time when the processor was idle and not waiting for I/O operations, indicating available CPU capacity. | 1 second |
CPU events
This section monitors CPU-related system events, tracking context switches and device interrupts that impact processor performance and system responsiveness.
| Metric | Description | Granularity |
|---|---|---|
| Context switches | Number of times the CPU switches between processes or threads. High values might indicate excessive multitasking or resource contention. | 1 second |
| Device interrupts | Number of hardware interrupt requests from peripheral devices that require immediate CPU attention for I/O operations. | 1 second |
CPU load: Average
This section tracks the average CPU load over time, measuring the number of processes competing for CPU resources. This metric helps assess system workload and identify periods of high demand or resource contention.
| Metric | Description | Granularity |
|---|---|---|
| CPU load | Average number of runnable or waiting processes over a 1-minute period, indicating overall system workload and CPU resource pressure. | 5 seconds |
Physical processor utilization
This section monitors physical processor utilization in AIX LPARs, tracking actual physical CPU consumption.
| Metric | Description | Granularity |
|---|---|---|
| Physical CPU consumed | Number of physical processors consumed. | 1 second |
| Hypervisor calls | Percentage of physical processor time spent making hypervisor calls. | 1 second |
| Stolen busy cycles | Percentage of physical processor utilization that occurs while the hypervisor is stealing busy cycles. | 1 second |
| Stolen idle cycles | Percentage of physical processor utilization that occurs while the hypervisor is stealing idle cycles. | 1 second |
By default, physical processor utilization metrics are not collected. To enable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectPhysicalProcessorUtil: true
This metric is also displayed in the Physical processor utilization time-series chart, where historical trends can be analyzed.
Individual CPU usage
This section displays CPU usage breakdown for each individual processor, showing how CPU time is allocated across different execution states:
| Metric | Description | Granularity |
|---|---|---|
| User | Percentage of CPU time spent running user-space processes (applications and services). | 1 second |
| System | Percentage of CPU time spent running kernel-space processes (OS core functions). | 1 second |
| Wait | Percentage of CPU time spent waiting for I/O operations to complete. | 1 second |
| Idle | Percentage of CPU time when the processor was idle. | 1 second |
By default, individual CPU usage metrics are not collected. To enable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectIndividualCpuMetrics: true
Individual CPU usage metrics are disabled by default to reduce system overhead. They provide per-processor breakdowns, but collecting them can affect performance on systems with a large number of logical CPUs. Enable this collection only if needed.
Memory
This section monitors memory resources on the AIX host, providing metrics for physical and virtual memory usage, swap space allocation, and paging activity to help assess memory pressure and optimize system performance.
| Metric | Unit | Description | Granularity |
|---|---|---|---|
| Used | % | Percentage of physical memory currently allocated, including both computational and non-computational memory. | 1 second |
| Computational | % | Percentage of memory actively used by applications and the operating system for processing tasks. | 1 second |
| Non-computational | % | Percentage of memory used for caching and other reclaimable purposes that can be freed if needed. | 1 second |
| Computational | Byte | Absolute amount of memory actively used by applications and the operating system. | 1 second |
| Non-computational | Byte | Absolute amount of memory used for caching and other reclaimable purposes. | 1 second |
| Real available | Byte | Amount of physical memory available for allocation without requiring paging or swapping. | 1 second |
| Minimum file page threshold (minperm%) | % | Minimum threshold for file pages, below which memory might be reclaimed from both file and computational pages. | 10 minutes |
| Swap used | % | Percentage of swap space currently in use, indicating memory pressure when high. | 1 second |
| Virtual used | % | Percentage of virtual memory (physical memory plus swap space) currently allocated. | 1 second |
| Swap total | Byte | Total amount of swap space configured on the system. | 1 second |
| Swap free | Byte | Amount of swap space available for use. | 1 second |
| Virtual total | Byte | Total virtual memory capacity (physical memory plus swap space). | 1 second |
| Virtual free | Byte | Amount of virtual memory available for allocation. | 1 second |
| Virtual active | Byte | Amount of virtual memory actively being used by running processes. | 1 second |
| Page-in | Rate | Number of pages read from disk into physical memory, indicating memory demand exceeding available RAM. | 1 second |
| Page-out | Rate | Number of pages written from physical memory to disk, indicating memory pressure and potential performance impact. | 1 second |
| Page-scan | Rate | Number of memory pages scanned by the system to identify candidates for reclamation or swapping. | 1 second |
| Page-faults | Rate | Number of page fault exceptions when processes access memory pages not currently in physical RAM. | 1 second |
| Page-reclaims | Rate | Number of memory pages reclaimed from the free list without requiring disk I/O. | 1 second |
All memory metrics are visualized in the Instana dashboard as time-series graphs, allowing you to analyze trends and correlations across different memory components over your selected time range.
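To illustrate how the Used, Computational, and Non-computational percentages relate, the following sketch computes them from byte counts, assuming used = (computational + non-computational) ÷ real total as stated for the overall memory usage metric. The function name and the sample values are hypothetical and are not taken from a real host.

```python
GIB = 1024 ** 3


def memory_percentages(computational: int, non_computational: int,
                       real_total: int) -> dict:
    """Return used, computational, and non-computational memory as % of real memory."""
    if real_total <= 0:
        raise ValueError("real_total must be positive")
    return {
        "used": 100 * (computational + non_computational) / real_total,
        "computational": 100 * computational / real_total,
        "non_computational": 100 * non_computational / real_total,
    }


# Hypothetical host: 64 GiB real memory, 24 GiB computational, 36 GiB file cache.
sample = memory_percentages(24 * GIB, 36 * GIB, 64 * GIB)
print(sample)
```

In this sample the used percentage is above 90%, yet the computational percentage is under 40%, so most of the used memory is reclaimable cache rather than a sign of memory pressure.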
I/O activity
This section monitors system I/O activity, tracking both application-level operations and physical disk I/O.
| Metric | Description | Granularity |
|---|---|---|
| Reads | Number of read and readv system calls executed by applications, representing high-level file read requests that may be satisfied from cache or require disk access. | 1 second |
| Writes | Number of write and writev system calls executed by applications, representing high-level file write requests that are initially written to buffer cache. | 1 second |
| Block reads | Number of physical block read operations from disk devices during the sampling period, indicating actual disk read I/O that bypassed or missed the buffer cache. | 1 second |
| Block writes | Number of physical block write operations to disk devices, including both synchronous and asynchronous writes that flush data from buffer cache to persistent storage. | 1 second |
| Non block reads | Number of physical block read operations including both synchronous and asynchronous I/O, providing insight into total disk read activity patterns. | 1 second |
| Non block writes | Number of raw I/O write operations that bypass the file system buffer cache entirely, typically used for direct I/O or database operations requiring immediate writes. | 1 second |
| Logical block reads | Number of logical block reads satisfied directly from system buffer cache without physical disk access, indicating effective cache utilization for read operations. | 1 second |
| Logical block writes | Number of logical block writes to system buffer cache that will be asynchronously flushed to disk later, helping assess write buffering effectiveness. | 1 second |
| System calls | Total number of system calls made by all processes. | 1 second |
By default, collection of these metrics is enabled. To disable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectIOActivity: false
Process statistics
This section monitors process and thread activity, tracking process states, execution patterns, and system call activity to help assess workload distribution and identify scheduling issues.
| Metric | Description | Granularity |
|---|---|---|
| Total Processes | Total number of processes currently on the system across all states, including active, sleeping, stopped, and idle processes. | 1 second |
| Runnable | Number of processes waiting to be run, including both processes that are able to run and those currently executing on the CPU. | 1 second |
| Threads waiting | Number of processes or threads blocked while waiting for page-in operations to complete, indicating memory paging activity. | 1 second |
| Execs executed | Number of exec system calls executed during the sampling period, representing program execution and replacement operations. | 1 second |
| Forks executed | Number of fork system calls executed during the sampling interval, representing new process creation activity. | 1 second |
| Stopped | Number of processes currently in a stopped state, typically paused by job control signals or during debugging sessions. | 1 second |
| Sleeping | Number of processes currently in sleep state, waiting for events, resources, or I/O operations to complete. | 1 second |
| Idle | Number of processes currently in idle state with no active work to perform or resources to consume. | 1 second |
| Zombie | Number of zombie processes that completed execution but still have entries in the process table, waiting for their parent process to read their exit status. | 1 second |
By default, collection of these metrics is enabled. To disable collection, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectProcessStatistics: false
Disks
This section monitors physical disk performance metrics, including I/O operations, transfer rates, and utilization patterns.
| Metric | Description | Granularity |
|---|---|---|
| Disk name | Name of the physical disk device. | 10 minutes |
| Average transfer size | Average number of bytes transferred per disk I/O operation. It helps assess typical transfer sizes and identify whether workloads are performing large sequential operations or small random I/O. | 5 seconds |
| Busy | Percentage of time the disk is actively transferring data. Values exceeding 30% often indicate excessive paging activity or I/O-bound processes. When combined with high CPU usage (>80%), this value typically signals system overload requiring attention. | 5 seconds |
| Transfer rate | Number of data transfer operations completed per second, representing the disk's overall transaction throughput and helping assess I/O workload intensity. | 5 seconds |
| Read operations | Number of read transfer operations completed per second, applicable to all storage device types except adapters, indicating read workload intensity on the disk. | 5 seconds |
| Write operations | Number of write transfer operations completed per second, applicable to all storage device types except adapters, indicating write workload intensity on the disk. | 5 seconds |
| Queue full count | Frequency per second that the disk's service queue reached its maximum capacity and could not accept additional requests, indicating I/O saturation and potential performance degradation. | 5 seconds |
| Data transferred | Total kilobytes transferred during the interval, providing a key indicator of disk data movement speed, though actual performance also depends on disk format and space usage efficiency. | 5 seconds |
| Data read | Number of bytes per second read from the disk, measured over the monitoring interval to track read throughput and identify read-intensive workloads. | 5 seconds |
| Data written | Number of bytes per second written to the disk, measured over the monitoring interval to track write throughput and identify write-intensive workloads. | 5 seconds |
| Type | Storage device type classification, identifying the specific kind of disk or storage adapter for proper performance interpretation and troubleshooting. | 10 minutes |
By default, data is collected for the top 10 disks based on busy percentage (highest to lowest). To collect data for all disks, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectAllDisks: true # Set false to collect only top 10 disks
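As an illustration of the default top 10 selection, the following sketch ranks disk samples by busy percentage (highest to lowest) and keeps the first 10. The `DiskSample` type, the function name, and the sample values are hypothetical; the agent's actual selection logic is not described in this document.

```python
from typing import List, NamedTuple


class DiskSample(NamedTuple):
    name: str
    busy_percent: float


def top_busy_disks(samples: List[DiskSample], limit: int = 10) -> List[DiskSample]:
    """Return up to `limit` disks ordered by busy percentage, highest first."""
    return sorted(samples, key=lambda d: d.busy_percent, reverse=True)[:limit]


# Twelve hypothetical disks; only the ten busiest are kept.
busy_values = [5.0, 42.5, 0.0, 88.1, 17.3, 3.2, 61.0, 9.9, 30.0, 12.4, 1.1, 75.6]
samples = [DiskSample(f"hdisk{i}", busy) for i, busy in enumerate(busy_values)]
top = top_busy_disks(samples)
print([d.name for d in top])
```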
Physical volumes
This section monitors physical volumes. A physical volume is a raw storage device (disk or partition) that must be initialized and added to a volume group before it can be used.
| Metric | Description | Granularity |
|---|---|---|
| Physical volume name | Device identifier assigned to the physical volume for system reference and management operations. | 10 minutes |
| Total size | Total raw storage capacity available on the physical volume for allocation to volume groups. | 10 minutes |
| Used size | Amount of storage currently allocated from this physical volume to logical volumes within the volume group. | 8 minutes |
| Free size | Amount of unallocated storage remaining on the physical volume available for new or expanding logical volumes. | 8 minutes |
| Used space | Percentage of physical volume capacity that is currently allocated to the volume group. | 8 minutes |
| Free space | Percentage of physical volume capacity that is available for allocation to the volume group. | 8 minutes |
By default, data is collected for the top 10 physical volumes based on capacity utilization (highest to lowest). To collect data for all physical volumes, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectAllVolumeGroups: true # Set false to collect only top 10 physical volumes.
Volume groups
This section monitors volume groups. A volume group is a collection of physical volumes that creates a storage pool.
| Metric | Description | Granularity |
|---|---|---|
| Volume group name | Unique identifier assigned to the volume group for management and reference purposes. | 10 minutes |
| Total size | Total storage capacity available in the volume group, representing the sum of all physical volumes. | 10 minutes |
| Used size | Amount of storage currently allocated to logical volumes within the volume group. | 8 minutes |
| Free size | Amount of unallocated storage remaining in the volume group available for new logical volumes. | 8 minutes |
| Used space | Percentage of volume group capacity currently allocated to logical volumes. | 8 minutes |
| Free space | Percentage of volume group capacity available for allocation to new or expanding logical volumes. | 8 minutes |
| Active physical volumes | Number of physical volumes currently active and accessible within the volume group. | 8 minutes |
| Physical volumes | Total number of physical volumes configured in the volume group, including both active and inactive. | 8 minutes |
| Logical volumes | Number of logical volumes currently defined within the volume group. | 8 minutes |
| Volume group state | Operational status of the volume group, indicating whether it is active, inactive, or in a varied state. | 10 minutes |
By default, data is collected for the top 10 volume groups based on capacity utilization (highest to lowest). To collect data for all volume groups, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectAllVolumeGroups: true # Set false to collect only top 10 volume groups.
Logical volumes
This section monitors logical volumes. A logical volume is a virtual block device created within a volume group that can be used for file systems or raw storage.
| Metric | Description | Granularity |
|---|---|---|
| Volume group name | Name of the volume group containing this logical volume, establishing the storage pool relationship. | 10 minutes |
| Logical volume name | Unique identifier assigned to the logical volume for system reference and management operations. | 10 minutes |
| Size | Total storage capacity allocated to the logical volume, which can be dynamically adjusted as needed. | 10 minutes |
| Type | Logical volume type, indicating its purpose such as jfs2, jfs2log, paging, or other specialized functions. | 10 minutes |
| Mount point | File system mount point where the logical volume is accessible, if it hosts a mounted file system. | 10 minutes |
| State | Operational status of the logical volume, indicating whether it is open, closed, syncd, or in another state. | 10 minutes |
By default, data is collected for the top 10 logical volumes based on capacity utilization (highest to lowest). To collect data for all logical volumes, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectAllLogicalVolumes: true # Set false to collect only top 10 logical volumes.
Filesystems
These metrics provide insights into file system performance, capacity, and usage, allowing administrators to monitor and optimize their storage systems effectively.
| Metric | Description | Granularity |
|---|---|---|
| Free disk space | Amount of free space that is available on the file system. | 1 second |
| Leaked | Space that is allocated but not used, considered leaked or wasted. | 1 second |
| Capacity | Total capacity of the file system. | 1 second |
| Used disk percentage | Percentage of space that is used on the file system. | 1 second |
| Inode usage | Percentage of inodes (data structures describing files and directories) in use. | 1 second |
| Inode free | Number of free inodes that are available on the file system. | 1 second |
| Bytes Read/s | Number of bytes that are read from the file system. | 1 second |
| Bytes Written/s | Number of bytes that are written to the file system. | 1 second |
| Reads/s | Number of read operations per second. | 1 second |
| Writes/s | Number of write operations per second. | 1 second |
| Tag | Description |
|---|---|
| Device | Name of the device. |
| Mount | Mount point where the device is attached in the file system hierarchy. |
| Options | The options or parameters that are used while mounting the file system. |
| Type | The type of file system. |
* The total, read, and write usage datapoint metrics display the disk I/O utilization as a percentage.
* Leaked refers to deleted files that are still in use and equates to capacity - used - free. You can find these files with `lsof | grep deleted`.
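As a small worked example of the leaked calculation (capacity - used - free), the following sketch uses hypothetical values for a 100 GiB file system. The function name is an assumption made for illustration.

```python
GIB = 1024 ** 3


def leaked_bytes(capacity: int, used: int, free: int) -> int:
    """Space still held by deleted-but-open files: capacity - used - free."""
    return capacity - used - free


# Hypothetical file system: 100 GiB capacity, 60 GiB used, 35 GiB free.
leaked = leaked_bytes(capacity=100 * GIB, used=60 * GIB, free=35 * GIB)
print(leaked / GIB)
```

Here 5 GiB is neither reported as used nor available as free, which is the amount shown as leaked.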
By default, Instana only monitors local file systems. You can list the file systems that are monitored or excluded in the configuration.yaml file (*instanaAgentDir*/etc/instana/configuration.yaml).
The name for the configuration setting is the device name, which you can obtain from the first column of the df command output.
The following example shows the list of file systems that are monitored:
com.instana.plugin.host:
  filesystems:
    - '/dev/hd11admin'
    - '/dev/livedump'
    - '/dev/hd10opt'
    - '/dev/hd2'
The following example shows the file systems that are included or excluded:
com.instana.plugin.host:
  filesystems:
    include:
      - '/dev/hd11admin'
      - '/dev/livedump'
    exclude:
      - '/dev/hd10opt'
      - '/dev/hd2'
By default, data is collected for the top 10 filesystems based on capacity utilization (highest to lowest). To collect data for all filesystems, configure the following setting in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectAllAixFilesystems: true # Set false to collect only top 10 filesystems
NFS statistics
Network File System (NFS) statistics provide detailed visibility into the way the system interacts with remote file systems over the network. These statistics are essential for understanding performance, diagnosing latency issues, and identifying bottlenecks related to storage access or network behavior.
NFS version 2, version 3, and version 4 client and server calls are supported.
The following tables describe the available NFS metrics.
The NFS top-level calls table shows a high-level view of NFS activity on the system. It helps you quickly understand how many NFS requests are being handled and whether any of them are failing.
| Metric | Description | Granularity |
|---|---|---|
| Total calls | Total number of NFS requests received during the interval. | 60 seconds |
| Rejected calls | Number of NFS requests that were rejected and could not be processed. | 60 seconds |
| Rejected calls percentage | Percentage of rejected NFS calls compared to the total number of calls. | 60 seconds |
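The rejected calls percentage can be derived from the other two metrics. The following sketch shows the calculation with hypothetical counts; returning 0 when no calls were received is an assumption of this example, not documented agent behavior.

```python
def rejected_calls_percentage(total_calls: int, rejected_calls: int) -> float:
    """Return rejected NFS calls as a percentage of total calls."""
    if total_calls < 0 or rejected_calls < 0:
        raise ValueError("call counts must be non-negative")
    if total_calls == 0:
        return 0.0  # Assumption: report 0% when no calls were seen in the interval.
    return 100 * rejected_calls / total_calls


# Hypothetical interval: 1200 calls received, 30 of them rejected.
print(rejected_calls_percentage(total_calls=1200, rejected_calls=30))
```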
The NFS client table shows the activity of the system when it acts as a client, including the number of requests sent to NFS servers and any errors or rejections encountered.
| Metric | Description | Granularity |
|---|---|---|
| Lookups | Number of lookup operations performed by the client to find files or directories. | 60 seconds |
| Read calls | Number of read requests sent by the client. | 60 seconds |
| Read directory calls | Number of requests to read directory contents. | 60 seconds |
| Read link calls | Number of requests to read symbolic links. | 60 seconds |
| Writes | Number of write operations performed by the client. | 60 seconds |
| Write cache calls | Number of cached write operations sent by the client. | 60 seconds |
| File creates | Number of file creation requests from the client. | 60 seconds |
| Remove file calls | Number of requests to delete files. | 60 seconds |
| Rename file calls | Number of file rename requests made by the client. | 60 seconds |
| Make directory calls | Number of requests to create new directories. | 60 seconds |
| Remove directory calls | Number of requests to remove directories. | 60 seconds |
| Get attribute calls | Number of requests to fetch file or directory attributes. | 60 seconds |
| Set attribute calls | Number of requests to update file or directory attributes. | 60 seconds |
| File system statistics calls | Number of requests to fetch file system statistics. | 60 seconds |
| Link calls | Number of requests to create hard links. | 60 seconds |
| Symbolic link calls | Number of requests to create symbolic links. | 60 seconds |
| Null calls | Number of NULL procedure calls used to check connectivity. | 60 seconds |
| Root calls | Number of root operation calls made by the client. | 60 seconds |
The NFS server table shows the NFS requests received and processed by the server. It helps you monitor server performance, track errors, and identify potential bottlenecks in file-sharing operations.
| Metric | Description | Granularity |
|---|---|---|
| Lookups | Number of lookup requests handled by the server during the interval. | 60 seconds |
| Read calls | Number of read requests received by the server. | 60 seconds |
| Writes | Number of write requests received by the server. | 60 seconds |
| Read directory calls | Number of requests to read directory contents. | 60 seconds |
| Read link calls | Number of requests to read symbolic links. | 60 seconds |
| Write cache calls | Number of cached write operations that the server handles. | 60 seconds |
| File creates | Number of file creation requests handled by the server. | 60 seconds |
| Remove file calls | Number of requests to delete files. | 60 seconds |
| Rename file calls | Number of file rename requests handled by the server. | 60 seconds |
| Make directory calls | Number of requests to create new directories. | 60 seconds |
| Remove directory calls | Number of requests to remove directories. | 60 seconds |
| Get attribute calls | Number of requests to fetch file or directory attributes. | 60 seconds |
| Set attribute calls | Number of requests to update file or directory attributes. | 60 seconds |
| File system statistics calls | Number of requests to fetch file system statistics. | 60 seconds |
| Link calls | Number of requests to create hard links. | 60 seconds |
| Symbolic link calls | Number of requests to create symbolic links. | 60 seconds |
| Null calls | Number of NULL procedure calls used to check connectivity. | 60 seconds |
| Root calls | Number of root operation calls received by the server. | 60 seconds |
By default, NFS statistics are not collected. To enable collection, set collectNfsStatistics to true in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectNfsStatistics: true
Enable this setting only if at least one NFS file system is mounted. Otherwise, all collected values are zero.
RPC statistics
Remote Procedure Call (RPC) is a core communication mechanism used to enable programs on one system to execute procedures on another system as if they were local function calls. RPC abstracts the complexity of network communication, serialization, and transport, allowing distributed components to interact seamlessly.
Both connection-oriented and connectionless RPC client and server calls are supported.
The RPC client table shows the RPC requests sent from the client to servers. It helps monitor call success, timeouts, retransmissions, and authentication issues to ensure reliable remote procedure communication.
| Metric | Description | Granularity |
|---|---|---|
| Calls | Number of RPC requests sent by the client. | 60 seconds |
| Calls rejected by server | Number of client requests rejected by the RPC server. | 60 seconds |
| Calls timed out | Number of RPC calls from the client that timed out before receiving a server response. | 60 seconds |
| Calls retransmitted | Number of RPC packets retransmitted to the server due to failed or delayed responses. | 60 seconds |
| Replies not matching calls | Number of times server replies did not match the client request. | 60 seconds |
| Times call wait on busy | Number of times the client had to wait because the server was busy. | 60 seconds |
| Times authentication refreshed | Number of times the client had to resend authentication information during the interval. | 60 seconds |
The RPC server table shows the RPC requests received and processed by the server. It helps monitor rejected requests, duplicate calls, packet errors, and availability issues to ensure reliable server-side communication.
| Metric | Description | Granularity |
|---|---|---|
| Calls | Number of RPC requests received by the server. | 60 seconds |
| Calls rejected | Number of RPC requests rejected by the server. | 60 seconds |
| Dup reqs | Number of duplicate RPC requests received by the server. | 60 seconds |
| Dup checks | Number of RPC requests serviced from the duplicate request cache. | 60 seconds |
| Packets with malformed header | Number of RPC packets received with malformed headers, causing processing errors. | 60 seconds |
| Packets too short | Number of incomplete RPC packets received that were too short to process. | 60 seconds |
| Times RPC packet unavailable | Number of times the server attempted to receive a packet when none was available. | 60 seconds |
By default, RPC statistics are not collected. To enable collection, set collectRpcStatistics to true in the host section of the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml):
com.instana.plugin.host:
  collectRpcStatistics: true
Enable this setting only if an NFS file system is mounted or the system acts as an NFS server. Otherwise, the collected RPC statistics are zero.
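The RPC client counters above are raw counts per interval, so they can be easier to interpret relative to the total number of calls. The following is a minimal sketch of that arithmetic; the function name `rpc_client_ratios` and its parameters are hypothetical and are not part of Instana or the agent.

```python
def rpc_client_ratios(calls: int, timed_out: int, retransmitted: int) -> dict:
    """Express timeouts and retransmissions as percentages of total client calls.

    The inputs correspond to the "Calls", "Calls timed out", and
    "Calls retransmitted" counters for one collection interval.
    """
    if calls <= 0:
        # No calls in the interval (for example, no NFS file system mounted).
        return {"timeout_pct": 0.0, "retransmit_pct": 0.0}
    return {
        "timeout_pct": 100.0 * timed_out / calls,
        "retransmit_pct": 100.0 * retransmitted / calls,
    }


if __name__ == "__main__":
    # Hypothetical interval: 2000 calls, 10 timeouts, 40 retransmissions.
    print(rpc_client_ratios(2000, 10, 40))
```

A ratio that keeps rising across intervals can be a more useful signal than the absolute count, because the count also grows with normal traffic.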
Network interfaces
The following table outlines the network traffic and errors per interface.
| Metric | Description | Granularity |
|---|---|---|
| Interface | Network interface that is used for communication. | 60 seconds |
| Mac | Media Access Control (MAC) address of the network interface. | 60 seconds |
| IPs | IP addresses assigned to the network interface. | 60 seconds |
| RX Bytes | Total number of bytes received by the network interface per second. | 1 second |
| RX Errors | Number of errors encountered while receiving data on the network interface. | 1 second |
| TX Bytes | Total number of bytes transmitted by the network interface per second. | 1 second |
| TX errors | Errors encountered while transmitting packets on the network interface. | 1 second |
| Received/s | Number of packets received by the network interface per second. | 1 second |
| Transmitted/s | Number of packets transmitted by the network interface per second. | 1 second |
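Several of the metrics above are per-second rates. This page does not state how the agent derives them; a common approach, shown here only as an assumption, is to sample a cumulative counter twice and divide the difference by the elapsed time. The function name `per_second_rate` is hypothetical.

```python
def per_second_rate(previous: int, current: int, interval_seconds: float) -> float:
    """Derive a per-second rate from two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # The counter was reset or wrapped between samples; this sketch
        # reports 0 for the interval rather than a negative rate.
        return 0.0
    return delta / interval_seconds


if __name__ == "__main__":
    # Hypothetical byte counters sampled 2 seconds apart.
    print(per_second_rate(1000, 4000, 2.0))
```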
TCP activity
These metrics provide insights into TCP connection activity, including established connections, segment transmission rates, and error occurrences.
| Metric | Description | Granularity |
|---|---|---|
| Established | Number of established TCP connections. | 1 second |
| Open/s | Number of new TCP connections opened per second. | 1 second |
| In Segments/s | Number of incoming TCP segments per second. | 1 second |
| Out Segments/s | Number of outgoing TCP segments per second. | 1 second |
| Established Resets | Percentage of established TCP connections that are reset per second. | 1 second |
| Out Resets | Percentage of outgoing TCP connections that are reset per second. | 1 second |
| Fail | Percentage of failed TCP connection attempts per second. | 1 second |
| Error | Percentage of TCP errors per second. | 1 second |
| Retransmission | Percentage of TCP retransmissions per second. | 1 second |
Top process list
These metrics offer insights into running processes, including their process ID, name, CPU usage, normalized CPU usage, and memory consumption. The top process list is updated every 30 seconds and contains only processes with significant resource usage. For example, processes with more than 10% CPU usage over the last 30 seconds or more than 512 MB of memory usage (RSS) are displayed in the top process list.
To create a combined list of processes from the top 10 CPU and memory usage lists, set combineTopProcesses to true. The processes are included in the combined list even if their CPU usage is less than 10% or memory usage is less than 512 MB. If the same process is listed in the top 10 CPU and top 10 memory usage lists, it is listed only once in the combined list, which can include up to 20 entries.
com.instana.plugin.host:
  combineTopProcesses: true
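The difference between the default list and the combined list can be sketched as follows. This is an illustration of the selection rules described above, not the agent's implementation: the `ProcessSample` type and both function names are hypothetical, and 512 MB is read as 512 MiB.

```python
from dataclasses import dataclass

# Thresholds taken from the example in the text; the exact comparison
# that the agent performs is an assumption.
CPU_THRESHOLD_PERCENT = 10.0
MEMORY_THRESHOLD_BYTES = 512 * 1024 * 1024


@dataclass(frozen=True)
class ProcessSample:  # hypothetical record type
    pid: int
    name: str
    cpu_percent: float
    rss_bytes: int


def default_top_list(samples):
    """Keep only processes above the CPU threshold or the memory threshold."""
    return [
        s for s in samples
        if s.cpu_percent > CPU_THRESHOLD_PERCENT
        or s.rss_bytes > MEMORY_THRESHOLD_BYTES
    ]


def combined_top_list(samples, limit=10):
    """Union of the top `limit` by CPU and the top `limit` by memory.

    Thresholds are ignored, and a process that appears in both lists is
    kept once, so the result holds at most 2 * limit entries.
    """
    by_cpu = sorted(samples, key=lambda s: s.cpu_percent, reverse=True)[:limit]
    by_mem = sorted(samples, key=lambda s: s.rss_bytes, reverse=True)[:limit]
    combined = {}
    for sample in by_cpu + by_mem:
        combined.setdefault(sample.pid, sample)  # deduplicate by PID
    return list(combined.values())
```

On a mostly idle host, `default_top_list` can return an empty list, while `combined_top_list` still returns the highest consumers.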
Linux top semantics are used: 100% CPU refers to full use of a single CPU core. You can search a history of snapshots from the previous month. The normalized CPU is calculated by dividing the CPU usage by the number of logical processors.
| Metric | Description | Granularity |
|---|---|---|
| PID | The unique identifier that is assigned to each process by the operating system. | 30 seconds |
| Process name | The name of the process as defined by the application or service. | 30 seconds |
| PPID | The parent process ID that started the current process, showing the process hierarchy. | 30 seconds |
| GID | The group ID that indicates the primary group ownership of a process. | 30 seconds |
| UID | The user ID that identifies the owner of a process. | 30 seconds |
| Elapsed time | The total time that the process has been running since it started. | 30 seconds |
| CPU | The amount of CPU resources that is consumed by the process. | 30 seconds |
| CPU (normalized) | The CPU usage of the process divided by the number of logical processors. | 30 seconds |
| Memory | The amount of memory that is consumed by the process. | 30 seconds |
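The relationship between the CPU and CPU (normalized) columns can be worked through with a small example. The function name `normalized_cpu` is hypothetical; the formula is the one stated in this section.

```python
def normalized_cpu(cpu_percent: float, logical_processors: int) -> float:
    """Divide top-style CPU usage (100% = one full core) by the number of logical processors."""
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    return cpu_percent / logical_processors


if __name__ == "__main__":
    # A process that fully uses two cores (200%) on a host with 8 logical processors.
    print(normalized_cpu(200.0, 8))  # prints 25.0
```

With this convention, the CPU value of a multi-threaded process can exceed 100%, whereas the normalized value stays within 0 to 100% of the total host capacity.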
Health signatures
For each sensor, a knowledge base of health signatures is evaluated continuously against the incoming metrics. The signatures are used to raise issues or incidents depending on the user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of an entity.
For more information about the built-in events for the Host sensor, see Built-in events reference.
Error report events
On the AIX system, the errpt command generates an error report from entries in an error log. The errors in the error report are then captured as events and sent to Instana. The sensor captures permanent and temporary error types, and hardware and software error classes. To enable the feature, set aixEventsPollRate in the agent configuration file (*instanaAgentDir*/etc/instana/configuration.yaml) as shown in the following example:
com.instana.plugin.host:
  aixEventsPollRate: 900 # In seconds
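To see which entries match the types and classes named above, the summary output of errpt can be filtered by its T (type) and C (class) columns. The following is a rough sketch and not the sensor's actual implementation: the `ErrorEntry` type, the function names, and the sample report are made up for illustration, and the parser assumes the default six-column summary layout.

```python
from typing import List, NamedTuple

CAPTURED_TYPES = {"P", "T"}    # permanent, temporary
CAPTURED_CLASSES = {"H", "S"}  # hardware, software


class ErrorEntry(NamedTuple):  # hypothetical record type
    identifier: str
    timestamp: str  # kept as the raw string printed by errpt
    error_type: str
    error_class: str
    resource: str
    description: str


def parse_errpt_summary(text: str) -> List[ErrorEntry]:
    """Parse the summary report into entries, skipping the header line."""
    entries = []
    for line in text.splitlines():
        fields = line.strip().split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue  # blank line, short line, or column header
        entries.append(ErrorEntry(*fields))
    return entries


def captured(entries: List[ErrorEntry]) -> List[ErrorEntry]:
    """Keep permanent or temporary errors of the hardware or software class."""
    return [
        e for e in entries
        if e.error_type in CAPTURED_TYPES and e.error_class in CAPTURED_CLASSES
    ]


# Made-up sample report; identifiers and descriptions are illustrative only.
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AAAA1111   0101120024 P H hdisk0         DISK OPERATION ERROR
BBBB2222   0101120524 I O errdemon       ERROR LOGGING TURNED ON
CCCC3333   0101121024 T S sysproc        SOFTWARE PROGRAM ERROR
"""

if __name__ == "__main__":
    for entry in captured(parse_errpt_summary(SAMPLE)):
        print(entry.identifier, entry.resource, entry.description)
```

In this sample, the informational entry (type I, class O) is filtered out, which mirrors the statement that only permanent and temporary errors of the hardware and software classes are captured.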