Monitoring agents and integration servers
To maximize the throughput of the application, the agents should be monitored to ensure that they are able to process all of the pending tasks within an acceptable time frame. If an agent is not able to process its tasks fast enough, the pending jobs accumulate and cause a bottleneck in the system.
Health monitor agent
The health monitor agent provides the following abilities:
- Shutting down the entire health monitor agent
- Allowing cache changes
- Viewing server properties
- Changing logging parameters
- Sub-service visibility
A few of the statistics provided by the health monitor agent are:
- Processing rate
- Number of heartbeats purged
- Number of snapshots purged
- Application server down alerts
- Server unavailable alerts
- Threshold reached alerts
- Heartbeats monitored
The YFS_SNAPSHOT table stores the statistical details of pending tasks of transactions collected by the agent servers. The CollectPendingJobs parameter in time-triggered agents controls whether records are inserted in the table. The health monitor deletes the records from this table after the default purge interval of 30 days.
The heartbeat records in the YFS_HEARTBEAT table are also purged by the health monitor agent with a default purge interval of 30 days.
The health monitor runs a purge once every 24 hours to delete the snapshot and heartbeat records that are older than 30 days. To change the purge interval from the 30-day default, set the following property in the <INSTALL_DIR>/properties/customer_overrides.properties file:
yantra.hm.purge.interval=<value>
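The purge rule described above can be illustrated with a minimal sketch. It is not the health monitor's actual implementation. It assumes the interval is expressed in days (the unit of <value> is not stated here), and the record layout and the "created" field name are hypothetical.

```python
from datetime import datetime, timedelta


def purge_eligible(records, now, purge_interval_days=30):
    """Return the records that are older than the purge interval.

    Assumption: each record is a dict with a hypothetical "created"
    timestamp; the real YFS_SNAPSHOT and YFS_HEARTBEAT columns may differ.
    """
    cutoff = now - timedelta(days=purge_interval_days)
    return [record for record in records if record["created"] < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 3, 1)
    records = [
        {"id": 1, "created": datetime(2024, 1, 1)},   # 60 days old
        {"id": 2, "created": datetime(2024, 2, 20)},  # 10 days old
    ]
    for record in purge_eligible(records, now):
        print("eligible for purge:", record["id"])
```

With the default of 30 days, only the first record is selected. A longer interval keeps more history at the cost of larger tables.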
Server heartbeat
System Management tracks the status of the agent and integration servers by recording the server "heartbeat" while the server is running. If the server goes down, the heartbeat is no longer recorded. If a server with the same name is brought back up, the heartbeat resumes. For more information about purging heartbeat records, see Health monitor agent.
Alert when agent or integration server terminates unexpectedly
You can configure a service to run whenever an agent or integration server goes down unexpectedly. This service can perform many tasks, including sending an e-mail message to a system administrator or creating an alert in a system administrator's inbox. For more information about the data available for the service, see Data published for health monitor alerts.
Shutting down an agent or integration server through the System Management console (or pressing Ctrl+C on the command line window) does not generate an alert.
Agent pending tasks
The number of pending tasks for every agent is recorded during every persist interval, unless the CollectPendingJobs criteria parameter for the agent is set to N in the Agent Criteria Details.
Alert when the pending tasks threshold is exceeded
You can configure a service to run if the number of pending tasks for an agent goes above a threshold limit. This service can perform many tasks, including sending an e-mail message or creating an alert for a system administrator. For more information about the data available for the service, see Data published for health monitor alerts.
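The threshold check can be pictured with the following sketch. It only illustrates the comparison and is not how System Management evaluates thresholds internally. The agent names and the dictionary layout are hypothetical, and "goes above" is read as strictly greater than the limit.

```python
def agents_over_threshold(pending_by_agent, thresholds):
    """Return the sorted names of agents whose pending tasks exceed their threshold.

    Agents without a configured threshold are never reported.
    """
    return sorted(
        name
        for name, pending in pending_by_agent.items()
        if name in thresholds and pending > thresholds[name]
    )


if __name__ == "__main__":
    # Hypothetical pending-task counts for one persist interval.
    pending = {"SCHEDULE": 1200, "RELEASE": 50, "NO_LIMIT": 1000000}
    limits = {"SCHEDULE": 1000, "RELEASE": 100}
    for agent in agents_over_threshold(pending, limits):
        print("threshold exceeded:", agent)
```

In this sketch an agent that sits exactly at its limit does not trigger the service; only a count above the limit does.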
Other agent statistics
System Management records the processing rate for each agent during each persist interval.
Additionally, some of the most important agents record statistics that are specific to that agent. For example, the schedule order agent records the number of orders scheduled and number of orders backordered during each persist interval.
Integration server statistics
System Management records the processing rate as well as the minimum, maximum, and average response times for integration servers for each persist interval.
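As a rough illustration of these statistics, the sketch below summarizes one persist interval. It assumes that the processing rate means messages handled per second and that response times are available in milliseconds. Both are assumptions, and the function and key names are hypothetical.

```python
def summarize_interval(response_times_ms, interval_seconds):
    """Summarize one persist interval of integration server activity.

    Returns None when no messages were processed in the interval.
    """
    if not response_times_ms:
        return None
    count = len(response_times_ms)
    return {
        "processing_rate": count / interval_seconds,  # messages per second (assumed unit)
        "min_response_ms": min(response_times_ms),
        "max_response_ms": max(response_times_ms),
        "avg_response_ms": sum(response_times_ms) / count,
    }


if __name__ == "__main__":
    print(summarize_interval([100, 200, 300, 400], interval_seconds=8))
```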
You cannot set a threshold or configure a service to run for any of the statistics collected for integration servers.