Using the health monitor
You set variables in the vault file to configure the DR health monitoring. The health
monitoring is run by a Cron job script, /usr/bin/resDRstatus, which runs every
minute. The Cron job file is resilient_dr-monitor. It is created and set up on the
master appliance when you run the enable.dr action to enable the DR solution. If
you disable DR, it is removed. The default DR tag in Syslog is
resilient-dr.
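As a minimal sketch of how the DR tag can be used, the following Python snippet filters Syslog lines by the default tag `resilient-dr`. The function name `dr_entries` and the sample lines (including their timestamp and host layout) are hypothetical and only illustrate the idea. The actual Syslog line format depends on your Syslog configuration.

```python
DR_TAG = "resilient-dr"  # default DR tag named in the text above


def dr_entries(lines, tag=DR_TAG):
    """Return only the lines that contain the DR tag (simple substring match)."""
    return [line.rstrip("\n") for line in lines if tag in line]


# Hypothetical sample lines; the real layout depends on the Syslog setup.
sample = [
    "Jan  1 00:01:00 host resilient-dr: File Replication Status: Running (Synced)",
    "Jan  1 00:01:01 host CRON[123]: (root) CMD (unrelated job)",
]

for entry in dr_entries(sample):
    print(entry)
```

A substring match is deliberately loose. If you change the DR tag, pass the new value as the `tag` argument.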
The status message is logged with priority info to Syslog. For example:
Postgresql Replication Status: Running, Replication delay=56 bytes
File Replication Status: Running (Synced)
If there are problems, messages are logged as warn or error. For example, if
database replication is not running, this is logged as a warn message.
For example, for file system replication:
resilient-filesync service isn't running
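The postgres status line shown above can be parsed with a short script, for example to feed the delay value into another tool. This is a sketch only: it assumes the message keeps exactly the wording of the sample line, and the names `STATUS_RE` and `parse_postgres_status` are hypothetical.

```python
import re

# Pattern based on the sample message in this section; the real wording may vary.
STATUS_RE = re.compile(
    r"Postgresql Replication Status:\s*(?P<state>[^,]+),\s*"
    r"Replication delay=(?P<delay>\d+) bytes"
)


def parse_postgres_status(message):
    """Return the replication state and delay in bytes, or None if no match."""
    match = STATUS_RE.search(message)
    if match is None:
        return None
    return {
        "state": match.group("state").strip(),
        "delay_bytes": int(match.group("delay")),
    }


line = "Postgresql Replication Status: Running, Replication delay=56 bytes"
print(parse_postgres_status(line))
# prints {'state': 'Running', 'delay_bytes': 56}
```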
An error message is generated in the following circumstances:
- If the number of postgres replication slots is not equal to one. The error message differs slightly depending on whether there are zero replication slots or more than one.
- If the number of postgres replication connections is greater than one. This indicates an unwanted connection.
- If postgres replication is running and the number of replication bytes exceeds the lag threshold in bytes for postgres.
- If the resilient-filesync service is running and the delay is greater than the lag threshold for resilient-filesync (set as a number of seconds).
A warn message is generated in the following circumstances:
- When the number of replication slots is greater than or equal to one and there are no replication connections, that is, no receiver is receiving a stream from the master.
- When the number of replication slots is greater than or equal to one and the retained transactions (in bytes) is greater than the replication retained threshold (in bytes).
- When the resilient-filesync service is not running.
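The error and warn conditions listed above can be sketched as a small decision function. This is not the code of the monitoring script. The classes `DrState` and `Thresholds`, the function `classify`, all field names, and the threshold values in the example are hypothetical. The sketch also assumes that only the highest priority is reported, whereas the real script might log each problem separately.

```python
from dataclasses import dataclass


@dataclass
class DrState:
    slots: int                 # number of postgres replication slots
    connections: int           # number of postgres replication connections
    replication_running: bool  # postgres replication is running
    lag_bytes: int             # current replication delay in bytes
    retained_bytes: int        # retained transactions in bytes
    filesync_running: bool     # resilient-filesync service is running
    filesync_delay_s: int      # file sync delay in seconds


@dataclass
class Thresholds:
    postgres_lag_bytes: int
    postgres_retained_bytes: int
    filesync_lag_s: int


def classify(state, limits):
    """Return (priority, reasons) following the conditions listed above."""
    errors, warnings = [], []
    if state.slots != 1:
        errors.append("replication slot count is not one")
    if state.connections > 1:
        errors.append("more than one replication connection")
    if state.replication_running and state.lag_bytes > limits.postgres_lag_bytes:
        errors.append("postgres lag exceeds threshold")
    if state.filesync_running and state.filesync_delay_s > limits.filesync_lag_s:
        errors.append("filesync delay exceeds threshold")
    if state.slots >= 1 and state.connections == 0:
        warnings.append("no replication connection")
    if state.slots >= 1 and state.retained_bytes > limits.postgres_retained_bytes:
        warnings.append("retained transactions exceed threshold")
    if not state.filesync_running:
        warnings.append("resilient-filesync service is not running")
    if errors:
        return "error", errors + warnings
    if warnings:
        return "warn", warnings
    return "info", []


# Hypothetical threshold values, for illustration only.
limits = Thresholds(postgres_lag_bytes=1_000_000,
                    postgres_retained_bytes=5_000_000,
                    filesync_lag_s=300)
healthy = DrState(1, 1, True, 56, 0, True, 0)
print(classify(healthy, limits))
# prints ('info', [])
```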
An info message is generated every minute, when the resDrStatus
script runs. The resDrStatus script outputs the current postgres and
resilient-filesync service status.
The /var/log/res-dr-status.log file contains all entries generated by the Disaster Recovery monitor.
The health monitor checks that database replication is running and generates a Syslog message if the replication delay exceeds the threshold, which you configure in megabytes in the vault file, as described in Step 5: Creating Ansible vault files.
The health monitor also checks the status of the resilient-filesync service and
generates a Syslog message if the service is not running or if the delay is greater than the
specified threshold, which you configure in the vault file, as a delay in seconds.
If the receiver system is not running, the master system saves the transactions that are running.
When the size of saved transactions reaches the value specified in the
vault_vars_dr_monitor_postgres_retained_threshold variable in the vault file, a
warn message is generated to alert you that the value has been reached.