An agent is a lightweight piece of software that engineers install on a host (any system or device that needs to be monitored) to collect telemetry data about the state of that system. The process of installing agents on hosts is called instrumentation. With today’s leading infrastructure monitoring solutions, agents can, once configured, use sensors to discover components up and down the infrastructure stack.
Once everything is fully instrumented, each agent begins collecting a wide range of metrics and measurements that reflect the behavior and status of the infrastructure. These metrics can include CPU and memory utilization, network bandwidth, disk space usage, response times, error rates, transaction counts and more. Ideally, the performance monitoring platform captures this data continuously, in real time, at one-second intervals and with no sampling. This granularity is a primary benefit of agent-based collection, and it makes issues easier to identify and troubleshoot as they arise.
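To make the collection loop concrete, here is a minimal sketch in Python of an agent that takes a reading at a fixed interval and forwards every reading without sampling. It is an illustration under stated assumptions, not how any particular monitoring product works: the names `collect_sample` and `run_agent`, the metric name `disk_used_percent`, and the in-memory list standing in for the monitoring platform are all hypothetical. Only the standard library is used, so the one metric shown is disk usage.

```python
import shutil
import time
from typing import Callable, Dict, List


def collect_sample(path: str = ".") -> Dict[str, float]:
    """Take one reading of host state (hypothetical metric names)."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "timestamp": time.time(),
        "disk_used_percent": used_percent,
    }


def run_agent(
    collect: Callable[[], Dict[str, float]],
    emit: Callable[[Dict[str, float]], None],
    interval_s: float = 1.0,
    max_samples: int = 5,
) -> None:
    """Collect and emit every reading (no sampling) at a fixed interval."""
    next_tick = time.monotonic()
    for _ in range(max_samples):
        emit(collect())
        next_tick += interval_s
        # Sleep until the next tick; do not sleep at all if running late.
        time.sleep(max(0.0, next_tick - time.monotonic()))


if __name__ == "__main__":
    # Assumption: a plain list stands in for the monitoring platform.
    buffer: List[Dict[str, float]] = []
    run_agent(collect_sample, buffer.append, interval_s=1.0, max_samples=3)
    for sample in buffer:
        print(sample)
```

Scheduling against a monotonic clock, rather than sleeping a flat second after each reading, keeps the interval from drifting as collection time varies. A real agent would gather many more metrics and send them over the network instead of appending to a list.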
Agent-based collection also allows for proactive monitoring. By setting thresholds that trigger alerts when a metric such as CPU utilization exceeds a certain percentage, administrators can stay one step ahead of potential performance issues. Alerts can be sent through email or SMS, or integrated into notification systems like Slack or PagerDuty.
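The threshold idea can be sketched in a few lines of Python. This is a hedged illustration, not a real product's alerting API: `ThresholdRule`, `ThresholdAlerter` and the metric name `cpu_percent` are hypothetical, and `notify` is a plain callable standing in for an email, SMS, Slack or PagerDuty integration. The requirement of several consecutive breaches before alerting is an added assumption, chosen so that a single momentary spike does not page anyone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ThresholdRule:
    metric: str           # hypothetical metric name, e.g. "cpu_percent"
    limit: float          # alert when the value rises above this
    consecutive: int = 3  # assumption: breaches in a row before alerting


class ThresholdAlerter:
    """Checks each sample against the rules and notifies on sustained breaches."""

    def __init__(self, rules: List[ThresholdRule], notify: Callable[[str], None]):
        self._rules = rules
        self._notify = notify  # stand-in for email/SMS/chat delivery
        self._streaks: Dict[str, int] = {rule.metric: 0 for rule in rules}

    def observe(self, sample: Dict[str, float]) -> None:
        for rule in self._rules:
            value = sample.get(rule.metric)
            if value is None:
                continue  # this sample does not carry the metric
            if value > rule.limit:
                self._streaks[rule.metric] += 1
                # Fire once, at the moment the streak reaches the required length.
                if self._streaks[rule.metric] == rule.consecutive:
                    self._notify(
                        f"{rule.metric} above {rule.limit} for "
                        f"{rule.consecutive} consecutive samples (latest: {value})"
                    )
            else:
                self._streaks[rule.metric] = 0  # back under the limit: reset


if __name__ == "__main__":
    alerter = ThresholdAlerter([ThresholdRule("cpu_percent", 90.0)], print)
    for cpu in [95.0, 96.0, 97.0, 40.0]:
        alerter.observe({"cpu_percent": cpu})
```

Because the alert fires only when the streak first reaches the configured length, a sustained breach produces one notification rather than one per sample; the streak resets as soon as the metric drops back under the limit.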
The primary benefit of agents is richer data collection. In addition, tasks such as diagnostics and issue remediation can happen automatically. On the downside, agents consume system resources such as CPU cycles, memory and network bandwidth to collect and transmit monitoring data. This can have a slight impact on system performance, particularly if the monitoring is resource-intensive or the system has limited resources.