Monitoring a macOS host

You can monitor your host with Instana. Instana provides comprehensive insights into the host's performance, health, and resource utilization, enabling efficient troubleshooting, performance optimization, and proactive issue detection.

System information

Instana retrieves various system details from a host. You can view the following details of the host on the Instana GUI in the System pane:

Parameter | Description
OS | The details of the operating system, the kernel version, and the architecture.
CPU | The details of the CPU and the CPU count.
Memory | The amount of system memory in GiB (gibibytes).
Max Open Files | The maximum number of files that can be open at the same time on the system.
Hostname | The hostname of the machine.
FQDN | The fully qualified domain name. It is the complete domain name of the host, including the subdomain and top-level domain.
Machine ID | The unique identifier for the host.
System ID* | The custom identifier that Instana uses to uniquely represent and manage the monitored host. System ID is used for correlation with asset management systems.
Host ID | The MAC address of the host's network interface, which is a unique identifier for the network adapter.
Started At | The time at which the host machine started.

*For macOS, you need to enable System ID by using the agent configuration YAML file as shown in the following example:

"com.instana.plugin.host": 
  "collectSystemId": true
 

Interfaces

You can find the following details:

  • Interfaces: The list of network interfaces and IP addresses.
  • Instana agent: The Instana agent for the host.
  • Process: The count and details of the processes that are running on the host.

Reporting status

The historical availability of a Mac host is shown in the Reporting Status chart in the Mac host dashboard. You can see three color indicators that identify the status of a host reporting to Instana.

Status | Description | Color indicator
Reporting | The host reported to Instana without any interruptions. | Green
Reporting - monitoring issues | The host reported to Instana with some interruptions (such as network interruptions or agent monitoring issues) and was not fully available. | Orange
Not Reporting | The host was not reporting to Instana at all during this time. | Red

The metric that is used to show this data on the host dashboard is based on the aggregation of messages received from the agent monitoring the host. A host is classified as Reporting if Instana receives at least 98% of the expected messages in a given timeframe.

For example, if the metric aggregation time window is 5 minutes and the poll rate of the host is once per second, Instana expects to receive 300 messages from the host during that timeframe.

  • If at least 294 messages are received (98% of 300), the host status is shown as Reporting.
  • If fewer than 294 but more than 0 messages are received, the host status is shown as Reporting - monitoring issues.
  • If no messages are received, the host status is shown as Not Reporting.
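
The classification rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Instana's implementation: the function name, parameter names, and the use of a whole-number percentage threshold are assumptions made for the example.

```python
def classify_reporting_status(received: int,
                              window_seconds: int,
                              polls_per_second: int = 1,
                              threshold_percent: int = 98) -> str:
    """Classify a host from the number of messages received in one window.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Expected messages, e.g. 300 seconds * 1 message per second = 300.
    expected = window_seconds * polls_per_second
    # Smallest message count that reaches the threshold (integer ceiling),
    # e.g. 98% of 300 = 294.
    required = (expected * threshold_percent + 99) // 100

    if received >= required:
        return "Reporting"
    if received > 0:
        return "Reporting - monitoring issues"
    return "Not Reporting"


print(classify_reporting_status(294, 300))  # Reporting
print(classify_reporting_status(293, 300))  # Reporting - monitoring issues
print(classify_reporting_status(0, 300))    # Not Reporting
```

Integer arithmetic is used for the threshold so that the boundary value (294 of 300) is not affected by floating-point rounding.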

Performance metrics

The following performance metrics are displayed for the host.

CPU usage - percentage

The CPU usage values, when combined, provide a detailed view of how the CPU resources are being utilized on the host.

Metric | Description | Granularity
CPU Usage | The total CPU usage in percentage for the time range that you set. | 1 second

Memory usage

Metric | Description | Granularity
Memory Usage | The total memory usage in percentage. | 1 second

CPU load - average

The CPU load metric displays the value on a graph for a selected time period.

Datapoint: Filesystem

Granularity: 1 second

Metric | Description | Granularity
CPU Load | The average number of processes that are run for the time range that you set. | 1 second

CPU usage - total

Metric | Description | Granularity
User | The amount of CPU time spent running user-space processes (applications and services). | 1 second
System | The amount of CPU time spent running kernel-space processes (OS core functions). | 1 second
Wait | The amount of CPU time spent waiting for input/output operations to complete. | 1 second
Nice | The amount of CPU time spent running processes with a lower priority (nice value). | 1 second
Steal | The amount of CPU time lost due to the hypervisor managing other virtual machines or containers on the same physical host. | 1 second

CPU load - peak

Metric | Description | Granularity
Load | The peak CPU load, that is, the highest number of processes that are run for the time range that you set. | 1 second

Individual CPU Usage

For each CPU, the following usage metrics are displayed as percentages on a graph for a selected time period:

Metric | Description | Granularity
User | The amount of CPU time spent running user-space processes (applications and services). | 1 second
System | The amount of CPU time spent running kernel-space processes (OS core functions). | 1 second
Wait | The amount of CPU time spent waiting for input/output operations to complete. | 1 second
Nice | The amount of CPU time spent running processes with a lower priority (nice value). | 1 second
Steal | The amount of CPU time lost due to the hypervisor managing other virtual machines or containers on the same physical host. | 1 second

Datapoint: Filesystem

Memory

The following table outlines the unit for memory:

Metric | Unit | Description | Granularity
Used | Percentage | The amount of memory in use. | 1 second

The values are displayed on a graph for a selected time period.

Datapoint: Filesystem

Open files

The open files usage, when available on the operating system, is shown as the current value compared to the maximum. The values are displayed on a graph for a selected time period.

Metric | Unit | Description | Granularity
Current | Count | The number of files that are currently open on the system. | 1 second
Used | Percentage | The percentage of the maximum number of open files that is in use.

Datapoint: Filesystem

Network interfaces

The following table outlines the network traffic and errors per interface.

Metric | Description | Granularity
Interface | The network interface being used for communication. | 60 seconds
Mac | The Media Access Control (MAC) address of the network interface. | 60 seconds
IPs | The IP addresses assigned to the network interface. | 60 seconds
RX Bytes | The total number of bytes received by the network interface per second. | 1 second
RX Errors | The number of errors encountered while receiving data on the network interface. | 1 second
TX Bytes | The total number of bytes transmitted by the network interface per second. | 1 second
TX Errors | The percentage of transmission attempts that resulted in errors per second. | 1 second
Received/s | The number of packets received by the network interface per second. | 1 second
Transmitted/s | The number of packets transmitted by the network interface per second. | 1 second

Datapoint: Filesystem

TCP activity

These metrics provide insights into TCP connection activity, including established connections, segment transmission rates, and error occurrences.

Metric | Description | Granularity
Established | The number of established TCP connections. | 1 second
Open/s | The number of new TCP connections opened per second. | 1 second
In Segments/s | The number of incoming TCP segments per second. | 1 second
Out Segments/s | The number of outgoing TCP segments per second. | 1 second
Established Resets | The percentage of established TCP connections that were reset per second. | 1 second
Out Resets | The percentage of outgoing TCP connections that were reset per second. | 1 second
Fail | The percentage of failed TCP connection attempts per second. | 1 second
Error | The percentage of TCP errors per second. | 1 second
Retransmission | The percentage of TCP retransmissions per second. | 1 second

Datapoint: Filesystem

Process top list

These metrics offer insights into running processes, including their process ID, name, CPU usage, normalized CPU usage, and memory consumption. The top process list is updated every 30 seconds and contains only the processes with significant system usage. For example, processes with more than 10% CPU usage over the last 30 seconds or with more than 512 MB memory usage (RSS) are displayed in the process top list.

To create a combined list of processes from the top 10 CPU and memory usage lists, set combineTopProcesses to true. The processes are included in the combined list even if their CPU usage is less than 10% or memory usage is less than 512 MB. If the same process is listed in the top 10 CPU and top 10 memory usage lists, it is listed only once in the combined list, which can include up to 20 entries.

com.instana.plugin.host:
  combineTopProcesses: true
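
The merge behavior described for combineTopProcesses can be illustrated with a short Python sketch. It is not the agent's code: the Proc record, its field names, and the combine_top_processes function are hypothetical and exist only to show how two top 10 lists can yield between 10 and 20 unique entries.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    """Hypothetical process record used only for this illustration."""
    pid: int
    name: str
    cpu_percent: float
    rss_mb: float


def combine_top_processes(processes: List[Proc], top_n: int = 10) -> List[Proc]:
    """Merge the top-N CPU list and the top-N memory list without duplicates."""
    top_cpu = sorted(processes, key=lambda p: p.cpu_percent, reverse=True)[:top_n]
    top_mem = sorted(processes, key=lambda p: p.rss_mb, reverse=True)[:top_n]

    combined = {}
    for proc in top_cpu + top_mem:
        # A process that appears in both lists is kept only once, keyed by PID.
        combined.setdefault(proc.pid, proc)
    return list(combined.values())


# 30 sample processes: high PIDs use the most CPU, low PIDs use the most memory,
# so the two top 10 lists do not overlap and the combined list has 20 entries.
sample = [Proc(i, f"proc-{i}", float(i), float(30 - i)) for i in range(30)]
print(len(combine_top_processes(sample)))  # 20
```

Note that no usage threshold is applied in this sketch, which mirrors the statement that processes are included in the combined list even if they fall below 10% CPU or 512 MB memory.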
 

100% CPU refers to the full use of a single CPU core. The normalized CPU is calculated by dividing the CPU usage by the number of logical processors. You can search a history of snapshots from the previous month.
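
As a worked example of the normalization, the following Python sketch divides a per-process CPU value by the number of logical processors. The function name normalized_cpu and the sample numbers are assumptions chosen for illustration.

```python
def normalized_cpu(cpu_percent: float, logical_processors: int) -> float:
    """Divide per-process CPU usage by the number of logical processors.

    Hypothetical helper: 100% CPU means one fully used core, so a process
    can report more than 100% on a multi-core host before normalization.
    """
    if logical_processors <= 0:
        raise ValueError("logical_processors must be a positive integer")
    return cpu_percent / logical_processors


# A process that fully uses two cores (200%) on a host with 8 logical processors.
print(normalized_cpu(200.0, 8))  # 25.0
```

On a real host, the logical processor count could be read with os.cpu_count() from the Python standard library, which can return None when the count cannot be determined.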

Metric | Description | Granularity
PID | The unique identifier that is assigned to each process by the operating system. | 30 seconds
Process Name | The name of the process as defined by the application or service. | 30 seconds
CPU | The amount of CPU resources consumed by the process. | 30 seconds
CPU (normalized) | The CPU usage of the process divided by the number of logical processors. | 30 seconds
Memory | The amount of memory consumed by the process. | 30 seconds

Datapoint: Filesystem

Health signatures

For each sensor, a knowledge base of health signatures is evaluated continuously against the incoming metrics. The health signatures are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of an entity.

For more information about the built-in events for the Host sensor, see Built-in events reference.