Monitoring applications

PowerHA® SystemMirror® uses monitors to check if the application is running before starting the application, avoiding startup of an undesired second instance of the application.

PowerHA SystemMirror also monitors specified applications and attempts to restart them upon detecting process death or application failure.

Application monitoring works in one of two ways:

  • Process application monitoring detects the termination of one or more processes of an application, using RSCT Resource Monitoring and Control (RMC).
  • Custom application monitoring checks the health of an application with a custom monitor method at user-specified polling intervals.
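As a rough illustration of the second approach, the following Python sketch shows one shape a custom monitor method could take. It assumes the convention that a monitor signals a healthy application with exit status 0 and a problem with a nonzero status. The TCP probe, host, and port are hypothetical stand-ins for whatever check suits your application.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical health check: can a TCP connection to the application be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as unhealthy.
        return False


def monitor_exit_status(probe: Callable[[], bool]) -> int:
    """Run one health check and map the result to an exit status.

    Assumed convention: 0 means healthy, nonzero means a problem.
    A probe that raises an exception is treated as a failed check.
    """
    try:
        return 0 if probe() else 1
    except Exception:
        return 1
```

A script built on this sketch would perform a single check per invocation, for example by ending with `sys.exit(monitor_exit_status(lambda: tcp_probe("127.0.0.1", 8080)))`, and would leave the polling interval to the monitor configuration.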

You can configure multiple application monitors and associate them with one or more application controllers. You can assign each monitor a unique name in SMIT.

By supporting multiple monitors per application, PowerHA SystemMirror can support more complex configurations. For example, you can configure one monitor for each instance of an Oracle parallel server in use. Or, you can configure a custom monitor to check the health of the database along with a process termination monitor to instantly detect termination of the database process.

Process monitoring is easier to set up, as it uses the built-in monitoring capability provided by RSCT and requires no custom scripts; however, it may not be an appropriate option for all applications. Custom monitoring can monitor more subtle aspects of an application's performance and is more customizable, but it takes more planning, as you must create the custom scripts.

In either case, when a problem is detected by the monitor, PowerHA SystemMirror attempts to restart the application on the current node and continues the attempts until a specified restart count is exhausted. When an application cannot be restarted within this restart count, PowerHA SystemMirror takes one of two actions, which you specify when configuring the application monitor:

  • Choosing fallover causes the resource group containing the application to fall over to the node with the next highest priority according to the resource policy.
  • Choosing notify causes PowerHA SystemMirror to generate a server_down event to inform the cluster of the failure.
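The restart-then-escalate behavior described above can be modeled with a short Python sketch. This is a conceptual model only, not PowerHA SystemMirror code: the function name, the `restart` callable, and the returned strings are all hypothetical.

```python
from typing import Callable


def handle_failure(restart: Callable[[], bool],
                   restart_count: int,
                   failure_action: str) -> str:
    """Model what happens after a monitor reports a problem.

    restart        -- hypothetical callable; returns True if the restart worked
    restart_count  -- how many restart attempts are allowed on the current node
    failure_action -- "fallover" or "notify", taken once the count is exhausted
    """
    if failure_action not in ("fallover", "notify"):
        raise ValueError("failure_action must be 'fallover' or 'notify'")

    # Try to restart on the current node until the restart count is used up.
    for _ in range(restart_count):
        if restart():
            return "restarted"

    # The application could not be restarted within the restart count.
    return failure_action
```

With a restart count of 3 and a restart method that only succeeds on its third call, this model returns "restarted"; with a count of 2 it returns the configured failure action instead.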

When you configure an application monitor, you use the SMIT interface to specify which application is to be monitored and then define various parameters such as time intervals, restart counts, and action to be taken in the event the application cannot be restarted. You control the application restart process through the Notify Method, Cleanup Method, and Restart Method SMIT fields, and by adding pre-event and post-event scripts to any of the failure action or restart events you select.

You can temporarily suspend and then resume an application monitor in order to perform cluster maintenance.

When an application monitor is defined, each node's Configuration Database contains the names of monitored applications and their configuration data. This data is propagated to all nodes during cluster synchronization, and is backed up when a cluster snapshot is created. The cluster verification ensures that any user-specified methods exist and are executable on all nodes.

Note: If you specify the fallover option, a resource group may migrate from its original node. The resource group may then remain offline even when the highest priority node is up. Unless you bring the resource group online manually, it could remain in an inactive state.

A note on application monitors

Application monitors are a critical piece of the PowerHA SystemMirror cluster configuration; they enable PowerHA SystemMirror to keep applications highly available. When PowerHA SystemMirror starts an application controller on a node, it uses a monitor that you configure to check whether the application is already running, to avoid starting two instances of the application. PowerHA SystemMirror also periodically checks the application using the monitor that you configure to make sure that the application is up and running.

An erroneous application monitor can fail in two ways. It may not detect a failed application, in which case PowerHA SystemMirror would not recover it. Or it may erroneously report a running application as failed, which may cause PowerHA SystemMirror to move the application to a takeover node, resulting in unnecessary downtime. For example, a custom monitor that uses an SQL command to query a database to detect whether it is functional may not establish that the database process is running on the local node, so that check alone is not sufficient for use with PowerHA SystemMirror.
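One way to address the database example is to combine a local process check with the functional query, as in the Python sketch below. The function names are hypothetical, the process listing relies on the POSIX `ps -e -o comm=` form, and the exit status convention (0 for healthy, nonzero otherwise) is an assumption of this sketch.

```python
import subprocess
from typing import Set


def process_names(ps_output: str) -> Set[str]:
    """Extract command base names from `ps -e -o comm=` style output."""
    names = set()
    for line in ps_output.splitlines():
        command = line.strip()
        if command:
            # Keep only the final path component, e.g. /usr/bin/foo -> foo.
            names.add(command.rsplit("/", 1)[-1])
    return names


def local_process_running(name: str) -> bool:
    """Check whether a process with the given name runs on this node."""
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return name in process_names(result.stdout)


def monitor_status(process_up: bool, query_ok: bool) -> int:
    """Report healthy (0) only if both the local process and the query are fine."""
    return 0 if (process_up and query_ok) else 1
```

Here a query that succeeds while the local process is absent still yields a nonzero status, so the monitor does not report a healthy application on a node where it is not actually running.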

If you plan on starting the cluster services with an option of Manage Resources > Manually, or stopping the cluster services without stopping the applications, PowerHA SystemMirror relies on configured application monitors to determine whether to start the application on the node or not.
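The start decision described here can be pictured with the following Python sketch. It is a conceptual model rather than PowerHA SystemMirror internals: `monitor` and `start` are hypothetical callables, and the monitor is assumed to return 0 when the application is already running.

```python
from typing import Callable


def bring_application_online(monitor: Callable[[], int],
                             start: Callable[[], None]) -> str:
    """Start the application only if the monitor says it is not running.

    monitor -- hypothetical callable; assumed to return 0 if the app is running
    start   -- hypothetical callable that starts the application
    """
    if monitor() == 0:
        # Already running: skip the start to avoid a second instance.
        return "already running"
    start()
    return "started"
```

A monitor that wrongly reports the application as down would lead this logic to start a second instance, which is why the accuracy of the monitor matters in these scenarios.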

When cluster services are stopped using the unmanage option, the long-running application monitors are not brought down. As long as the clstrmgr daemon is active, it is aware that a monitor is already running, and a second instance of the monitor will not be started when PowerHA SystemMirror restarts. If the monitor indicates a failure during this time, events are not generated in response, so no cleanup or restart methods are run. If your application monitor attempts a recovery or restart on its own, PowerHA SystemMirror will not be able to react. It is therefore important to separate recovery actions from the monitor itself.

To summarize, we highly recommend properly configured and tested application monitors for all applications that you want to keep highly available with the use of PowerHA SystemMirror. During verification, PowerHA SystemMirror issues a warning if an application monitor is not configured.