Periodically monitoring a PowerHA SystemMirror cluster
PowerHA® SystemMirror® provides recovery for various failures that occur within a cluster. For example, PowerHA SystemMirror can compensate for a network interface failure by swapping in a boot interface. As a result, a component in the cluster can fail without your being aware of it.
Although PowerHA SystemMirror can survive one or possibly several failures, each failure that escapes your notice diminishes the redundancy of cluster components and therefore threatens the cluster's ability to provide a highly available environment.
To avoid this situation, you should customize your system by adding event notification to the scripts designated to handle the various cluster events. You can specify a command that sends you mail indicating that an event is about to happen (or that an event has just occurred), along with information about the success or failure of the event. The mail notification system enhances the standard event notification methods.
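As an illustration of such a notification command, the following is a minimal sketch in Python, not a supplied PowerHA SystemMirror script. It assumes that the command is invoked with the event name and the event's exit status as its two arguments, that mail for the operator goes to the local root mailbox, and that a sendmail binary is available at /usr/sbin/sendmail. The names RECIPIENT, build_message, and main are hypothetical; adapt all of these to the way your cluster actually calls its notify method.

```python
#!/usr/bin/env python3
"""Hypothetical notify command: mails a short summary of one cluster event."""
import socket
import subprocess
import sys
from email.message import EmailMessage

RECIPIENT = "root"  # assumption: a local mailbox that an operator reads


def build_message(event, status, node):
    """Return a mail message describing one cluster event on one node."""
    if status == "0":
        outcome = "succeeded"
    else:
        outcome = f"failed (exit status {status})"
    msg = EmailMessage()
    msg["To"] = RECIPIENT
    msg["Subject"] = f"[cluster] {event} {outcome} on {node}"
    msg.set_content(f"Event: {event}\nNode: {node}\nOutcome: {outcome}\n")
    return msg


def main(event, status):
    """Build the message and hand it to the local mail transfer agent."""
    msg = build_message(event, status, socket.gethostname())
    # Assumption: sendmail lives at this path; -t takes recipients from the headers.
    subprocess.run(["/usr/sbin/sendmail", "-t"], input=msg.as_bytes(), check=True)


# Assumed calling convention: <script> <event name> <exit status>
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Keeping the message construction in build_message, separate from delivery, makes it possible to check the wording of the mail on a test system without sending anything.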
In addition, PowerHA SystemMirror offers an application monitoring capability that you can configure and customize to monitor the health of specific applications and processes.
Use the AIX® Error Notification facility to add another layer of high availability to a PowerHA SystemMirror environment. You can add notification for failures of resources for which PowerHA SystemMirror does not provide recovery by default. The combination of PowerHA SystemMirror and the high availability features built into the AIX system keeps single points of failure to a minimum; the Error Notification facility can further enhance the availability of your particular environment.
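The Error Notification facility is driven by objects in the errnotify ODM object class, which are typically loaded from a stanza file with the odmadd command. The following Python sketch only renders such a stanza as text; it does not modify the ODM. The object name disk_perm_mail and the method path /usr/local/bin/mail_operator are hypothetical, and the chosen class, type, and resource class values are examples rather than recommendations.

```python
def errnotify_stanza(name, method, err_class="H", err_type="PERM", rclass="disk"):
    """Render an errnotify ODM stanza as text that could be saved for odmadd."""
    fields = [
        ("en_name", f'"{name}"'),         # unique name for this notification object
        ("en_persistenceflg", "1"),       # 1 = keep the object across system restarts
        ("en_class", f'"{err_class}"'),   # H = hardware errors
        ("en_type", f'"{err_type}"'),     # PERM = permanent errors
        ("en_rclass", f'"{rclass}"'),     # resource class to match
        ("en_method", f'"{method}"'),     # command run when a matching error is logged
    ]
    body = "\n".join(f"\t{key} = {value}" for key, value in fields)
    return f"errnotify:\n{body}\n"


# Hypothetical method; $1 is replaced with the error log sequence number.
print(errnotify_stanza("disk_perm_mail", "/usr/local/bin/mail_operator $1"))
```

Review the generated text before you save it to a file and add it to the ODM, because a notify method runs every time a matching error is logged.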