AIX error notification

Although the PowerHA® SystemMirror® software does not monitor the status of disk resources, it does provide a SMIT interface to the AIX® error notification function. The AIX error notification function lets you detect an event that the PowerHA SystemMirror software does not specifically monitor, such as a disk adapter failure, and program a response to that event.

Permanent hardware errors on disk drives, controllers, or adapters might impact the fault resiliency of data. By monitoring these errors through error notification methods, you can assess the impact of a failure on the cluster's ability to provide high availability. A simple implementation of error notification would be to send a mail message to the system administrator to investigate the problem further. A more complex implementation could include logic to analyze the failure and to decide whether to continue processing, stop processing, or escalate the failure to a node failure and have the takeover node make the volume group resources available to clients.
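As a rough illustration of the simple case, the following Python sketch formats a mail message from the arguments that AIX passes to a notify method. It assumes that the method is registered with $1 through $9 in its en_method string, so that the script receives the sequence number, error ID, class, type, alert flags, resource name, resource type, resource class, and error label, in that order. The recipient and all function names are hypothetical, and the sketch assumes that Python 3 and the mail command are available on the node.

```python
#!/usr/bin/env python3
"""Sketch of an AIX error notify method that mails the administrator."""
import socket
import subprocess
import sys
from typing import NamedTuple, Sequence, Tuple

RECIPIENT = "root"  # hypothetical recipient; adjust for your site


class ErrorEvent(NamedTuple):
    """The nine values passed as $1..$9 (assumed en_method argument order)."""
    sequence: str        # $1 sequence number in the error log
    error_id: str        # $2 error ID
    err_class: str       # $3 error class
    err_type: str        # $4 error type
    alert_flags: str     # $5 alert flags
    resource_name: str   # $6 resource name
    resource_type: str   # $7 resource type
    resource_class: str  # $8 resource class
    label: str           # $9 error label


def parse_event(args: Sequence[str]) -> ErrorEvent:
    """Turn the positional arguments into a named record."""
    if len(args) != 9:
        raise ValueError(f"expected 9 arguments, got {len(args)}")
    return ErrorEvent(*args)


def build_message(event: ErrorEvent, hostname: str) -> Tuple[str, str]:
    """Return (subject, body) for the notification mail."""
    subject = f"[{hostname}] {event.label} on {event.resource_name}"
    body = "\n".join([
        f"Error label:    {event.label}",
        f"Error ID:       {event.error_id}",
        f"Class / type:   {event.err_class} / {event.err_type}",
        f"Resource:       {event.resource_name} "
        f"({event.resource_class}, {event.resource_type})",
        f"Full log entry: errpt -a -l {event.sequence}",
    ])
    return subject, body


def send_mail(subject: str, body: str, recipient: str = RECIPIENT) -> None:
    """Pipe the body to the system mail command."""
    subprocess.run(["mail", "-s", subject, recipient],
                   input=body, text=True, check=True)


def main(argv: Sequence[str]) -> None:
    event = parse_event(argv)
    subject, body = build_message(event, socket.gethostname())
    send_mail(subject, body)


# Do nothing when the file is loaded or run without arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

The message points the administrator at errpt -a -l with the sequence number rather than embedding the full log entry, which keeps the method short and quick to run.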

Implement an error notification method for all errors that affect the disk subsystem. Doing so ensures that degraded fault resiliency does not remain undetected.
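Outside of SMIT, one way to register such a method is to add an object to the errnotify ODM object class with the odmadd command. The following Python sketch only renders the stanza text for permanent hardware errors on a given resource class; it does not modify the ODM. The object names and the method path are hypothetical, and the descriptors and values shown should be checked against the AIX documentation for your release.

```python
def render_errnotify_stanza(name: str, method_path: str,
                            resource_class: str = "disk") -> str:
    """Render an errnotify stanza that matches permanent hardware errors."""
    # Pass all nine notify arguments ($1..$9) through to the method.
    method = f"{method_path} $1 $2 $3 $4 $5 $6 $7 $8 $9"
    fields = [
        ("en_name", f'"{name}"'),
        ("en_persistenceflg", "1"),           # keep the object across restarts
        ("en_class", '"H"'),                  # hardware error class
        ("en_type", '"PERM"'),                # permanent errors only
        ("en_rclass", f'"{resource_class}"'),
        ("en_method", f'"{method}"'),
    ]
    lines = ["errnotify:"] + [f"\t{key} = {value}" for key, value in fields]
    return "\n".join(lines) + "\n"


def render_disk_subsystem_stanzas(method_path: str) -> str:
    """Render one stanza each for the disk and adapter resource classes."""
    classes = ("disk", "adapter")
    return "\n".join(
        render_errnotify_stanza(f"{rclass}_perm_hw", method_path, rclass)
        for rclass in classes
    )
```

For example, the output of render_disk_subsystem_stanzas("/usr/local/bin/notify_admin") could be written to a file and loaded with odmadd. Matching on resource class rather than on individual device names is a design choice here: it covers devices that are added later, at the cost of also matching adapters that are unrelated to storage.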

If you want PowerHA SystemMirror to react to a volume group failure on a node, you can configure a customized AIX error notification method for this specific error that causes a node_down event or moves the affected resource groups to another node.

You can customize resource recovery for a volume group that fails due to an LVM_SA_QUORCLOSE error. This error can occur if you use mirrored volume groups with quorum enabled. For this case, you can do one of the following:

  • Let the PowerHA SystemMirror selective fallover function move the affected resource group
  • Send a notification using the AIX Error Notification function
  • Continue using your pre- and post-event scripts for this type of recovery

If you previously configured pre-event or post-event scripts to handle these cases, assess how they interact with the selective fallover function. For more information about how PowerHA SystemMirror handles this particular error, see Error notification method used for volume group loss.

However, PowerHA SystemMirror does not react automatically to any other type of volume group error. In all other cases, you still need to configure customized error notification methods, or use AIX automatic error notification methods, to react to volume group failures.
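The division of responsibility described above can be sketched as a small decision function inside a custom notify method. This Python sketch is an illustration only: it assumes that LVM_SA_QUORCLOSE is left to the selective fallover function, that the set of labels to escalate is a site policy that you supply, and that every other error results in a notification. The names Action and choose_action are hypothetical, and the sketch does not show how an escalation is carried out.

```python
from enum import Enum
from typing import FrozenSet


class Action(Enum):
    LEAVE_TO_POWERHA = "leave to selective fallover"
    NOTIFY = "notify the administrator"
    ESCALATE = "escalate according to site policy"


# Assumption: only this label is handled by selective fallover.
SELECTIVE_FALLOVER_LABELS: FrozenSet[str] = frozenset({"LVM_SA_QUORCLOSE"})


def choose_action(label: str,
                  escalate_labels: FrozenSet[str] = frozenset()) -> Action:
    """Pick a response for an error label reported to the notify method."""
    if label in SELECTIVE_FALLOVER_LABELS:
        # PowerHA SystemMirror can already move the resource group.
        return Action.LEAVE_TO_POWERHA
    if label in escalate_labels:
        # Site policy marks this error as serious enough to escalate.
        return Action.ESCALATE
    # Default: make sure the failure does not go unnoticed.
    return Action.NOTIFY
```

Defaulting to a notification keeps the behavior conservative: an unrecognized error is reported to an administrator rather than triggering a fallover.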

For information about using the automatic error notification utility to assign error notification methods in one step to a number of selected disk devices, see PowerHA SystemMirror automatic error notification.