PowerHA® SystemMirror® provides protection against
a split cluster by using a disk fencing policy. The disk fencing policy uses a SCSI-3 reservation
function to separate (fence out) the problematic node that is hosting the workload from the rest of
the cluster. In this scenario, the workload that was running on the problematic node is started on
the standby LPAR.
The fencing process ensures that standalone nodes have access to the disks and that the data
remains protected. The disk fencing policy is supported for an Active-Passive deployment model. Disk
fencing ensures that a workload can run with write access on only one node in the cluster. PowerHA SystemMirror registers the disks of all the volume groups
that are part of any resource group.
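The registration step can be visualized with a short, hedged sketch. The following Python example is not PowerHA internals; it only shows how an administrator might expand each resource group's volume groups into the hdisks that would fall under disk fencing, using the AIX lsvg -p command. The resource group and volume group names are hypothetical.

```python
# Sketch only: expand each resource group's volume groups into hdisks with
# "lsvg -p". PowerHA performs the actual registration internally.
import subprocess

resource_group_vgs = {
    "rg_app1": ["appvg1"],          # hypothetical resource group -> volume groups
    "rg_app2": ["appvg2", "logvg"],
}

def disks_in_volume_group(vg):
    """Return the hdisk names reported by 'lsvg -p <vg>'."""
    out = subprocess.run(["lsvg", "-p", vg], capture_output=True, text=True, check=True)
    return [line.split()[0] for line in out.stdout.splitlines() if line.startswith("hdisk")]

for rg, vgs in resource_group_vgs.items():
    fenced = [d for vg in vgs for d in disks_in_volume_group(vg)]
    print(f"{rg}: disks subject to fencing -> {fenced}")
```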
The following figure shows what occurs during a cluster split when a disk fencing policy is in use.
In the figure, the standby LPAR communicates with the shared storage disk and requests that the
active LPAR's access to the disk is revoked. The shared storage disk blocks any write access from
the previously active LPAR, even if the active LPAR is restarted. The standby LPAR brings the
application or resource group online only if PowerHA SystemMirror can fence out the disks for the
resource groups on the active LPAR. If errors occur on the standby LPAR while fencing out the disks
on the active LPAR, the applications are not brought online; you must correct the problems and bring
the resource groups back online manually.

Figure 1. Disk fencing policy
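To see that write access from the previously active LPAR is really blocked, an administrator can inspect the reservation state of a shared disk. The following Python sketch wraps the AIX devrsrv command; the flags and output format can vary by AIX release, so treat it as an illustration rather than a supported interface. The disk name hdisk3 is a hypothetical example.

```python
# Hedged sketch: query the persistent reservation state of a shared disk with
# the AIX devrsrv command and print it for inspection.
import subprocess

def reservation_state(disk: str) -> str:
    """Return the raw 'devrsrv -c query -l <disk>' output."""
    result = subprocess.run(
        ["devrsrv", "-c", "query", "-l", disk],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Substitute a disk that belongs to a fenced volume group.
    print(reservation_state("hdisk3"))
```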
The following key attributes apply to the disk fencing policy in PowerHA SystemMirror:
- Disk fencing applies only to active-passive cluster environments.
- Disk fencing is not supported for resource groups that use a startup policy of Online on All Available Nodes.
- Disk fencing is supported at the cluster level. Therefore, you can enable or disable the disk fencing policy for the entire cluster (see the sketch after this list).
- Disk fencing manages all disks that are part of volume groups that are included in resource groups.
- Disk fencing is supported in mutual takeover configurations. If multiple resource groups exist in the cluster, you must choose one resource group to be the most critical resource group. When a cluster split event occurs, the location of the critical resource group determines which site wins. The site that wins is the site that was not running the critical resource group before the cluster split event occurred.
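Because the policy is a cluster-wide setting, it is typically changed with the clmgr command. The following Python sketch wraps clmgr; the QUARANTINE_POLICY attribute name and its values are assumptions that can differ between PowerHA SystemMirror releases, so confirm the exact names for your release with clmgr query cluster before using them.

```python
# Hedged sketch: toggle the cluster-level disk fencing (quarantine) policy
# through clmgr. Attribute name and values below are assumptions.
import subprocess

def set_disk_fencing(enabled: bool) -> None:
    value = "fencing" if enabled else "none"   # assumed values
    subprocess.run(["clmgr", "modify", "cluster", f"QUARANTINE_POLICY={value}"], check=True)

def show_cluster_settings() -> None:
    # "clmgr query cluster" prints cluster-wide attributes, including the
    # split, merge, and quarantine policies on releases that support them.
    subprocess.run(["clmgr", "query", "cluster"], check=True)

if __name__ == "__main__":
    show_cluster_settings()
```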
PowerHA SystemMirror uses as much information as possible
from the cluster to determine the health of the LPARs. For example, if the active LPAR is about to
crash, PowerHA SystemMirror sends a message to the standby
LPAR before the active LPAR goes offline. These notifications ensure that the standby LPAR is certain
that the active LPAR has gone offline and that the standby LPAR can bring the application online.
Note: In certain cases, the standby LPAR is aware that the active LPAR is not sending heartbeats but
cannot determine the actual status of the active LPAR. In this case, the standby LPAR declares that
the active LPAR has failed after waiting for the time that you specified in the Node Failure
Detection Timeout field. At that point, the standby LPAR fences out all the disks before it brings
the resource groups online. If even a single disk is not correctly fenced out, the resource group is
not brought online.
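The all-or-nothing behavior described in the note can be summarized in a short conceptual sketch. The fence_disk and takeover functions below are hypothetical placeholders written for illustration, not PowerHA APIs, and the timeout value is only an example.

```python
# Conceptual sketch of the standby-side decision: wait out the node failure
# detection timeout, try to fence every disk of the resource group, and bring
# the resource group online only if every disk was fenced.
import time

NODE_FAILURE_DETECTION_TIMEOUT = 30  # seconds; example value only

def fence_disk(disk: str) -> bool:
    # Placeholder: a real environment would request the storage-level
    # reservation change here. This stub always reports success.
    print(f"requesting fence-out of {disk}")
    return True

def takeover(resource_group: str, disks: list[str]) -> bool:
    time.sleep(NODE_FAILURE_DETECTION_TIMEOUT)   # declare the active LPAR failed
    if all(fence_disk(d) for d in disks):
        print(f"All disks fenced; bringing {resource_group} online")
        return True
    print(f"{resource_group} stays offline: at least one disk was not fenced")
    return False
```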