PowerHA® SystemMirror® provides protection against
a split cluster by using a disk fencing policy. The disk fencing policy uses a SCSI-3 reservation
function to separate (fence out) the problematic node that is hosting the workload from the rest of
the cluster. In this scenario, the workload that was running on the problematic node is started on
the standby LPAR.
The fencing process ensures that standalone nodes have access to the disks and that the data
remains protected. The disk fencing policy is supported for an Active-Passive deployment model. Disk
fencing ensures that a workload can run with write access on only one node in the cluster. PowerHA SystemMirror registers the disks of all the volume groups
that are part of any resource group.
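The registration step can be visualized with a short, hedged sketch. The following Python example is not PowerHA internals; it only shows how an administrator might expand each resource group's volume groups into the hdisks that would fall under disk fencing, using the AIX lsvg -p command. The resource group and volume group names are hypothetical.

```python
# Sketch only: expand each resource group's volume groups into hdisks with
# "lsvg -p". PowerHA performs the actual registration internally.
import subprocess

resource_group_vgs = {
    "rg_app1": ["appvg1"],          # hypothetical resource group -> volume groups
    "rg_app2": ["appvg2", "logvg"],
}

def disks_in_volume_group(vg):
    """Return the hdisk names reported by 'lsvg -p <vg>'."""
    out = subprocess.run(["lsvg", "-p", vg], capture_output=True, text=True, check=True)
    return [line.split()[0] for line in out.stdout.splitlines() if line.startswith("hdisk")]

for rg, vgs in resource_group_vgs.items():
    fenced = [d for vg in vgs for d in disks_in_volume_group(vg)]
    print(f"{rg}: disks subject to fencing -> {fenced}")
```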
The following figure shows what occurs during a cluster split when a disk fencing policy is in use.
In the figure, the standby LPAR communicates with the shared storage disk and requests that the
active LPAR's access to the disk is revoked. The shared storage disk blocks any write access from
the previously active LPAR, even if the active LPAR is restarted. The standby LPAR brings the
application or resource group online only if PowerHA SystemMirror can fence out the disks for the
resource groups on the active LPAR. If errors occur on the standby LPAR while fencing out the disks
on the active LPAR, the applications are not brought online; you must correct the problems and bring
the resource groups back online manually.

Figure 1. Disk fencing policy
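To see that write access from the previously active LPAR is really blocked, an administrator can inspect the reservation state of a shared disk. The following Python sketch wraps the AIX devrsrv command; the flags and output format can vary by AIX release, so treat it as an illustration rather than a supported interface. The disk name hdisk3 is a hypothetical example.

```python
# Hedged sketch: query the persistent reservation state of a shared disk with
# the AIX devrsrv command and print it for inspection.
import subprocess

def reservation_state(disk: str) -> str:
    """Return the raw 'devrsrv -c query -l <disk>' output."""
    result = subprocess.run(
        ["devrsrv", "-c", "query", "-l", disk],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Substitute a disk that belongs to a fenced volume group.
    print(reservation_state("hdisk3"))
```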
The following key attributes apply to the disk fencing policy in PowerHA SystemMirror:
- Disk fencing applies only to active-passive cluster environments.
- Disk fencing is not supported for resource groups that use a startup policy of Online on All Available Nodes.
- Disk fencing is supported at the cluster level. Therefore, you can enable or disable the disk fencing policy for the entire cluster (see the sketch after this list).
- Disk fencing manages all disks that are part of volume groups that are included in resource groups.
- Disk fencing is supported in mutual takeover configurations. If multiple resource groups exist in the cluster, you must choose one resource group to be the most critical resource group. When a cluster split event occurs, the location of the critical resource group determines which site wins. The site that wins is the site that was not running the critical resource group before the cluster split event occurred.
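Because the policy is a cluster-wide setting, it is typically changed with the clmgr command. The following Python sketch wraps clmgr; the QUARANTINE_POLICY attribute name and its values are assumptions that can differ between PowerHA SystemMirror releases, so confirm the exact names for your release with clmgr query cluster before using them.

```python
# Hedged sketch: toggle the cluster-level disk fencing (quarantine) policy
# through clmgr. Attribute name and values below are assumptions.
import subprocess

def set_disk_fencing(enabled: bool) -> None:
    value = "fencing" if enabled else "none"   # assumed values
    subprocess.run(["clmgr", "modify", "cluster", f"QUARANTINE_POLICY={value}"], check=True)

def show_cluster_settings() -> None:
    # "clmgr query cluster" prints cluster-wide attributes, including the
    # split, merge, and quarantine policies on releases that support them.
    subprocess.run(["clmgr", "query", "cluster"], check=True)

if __name__ == "__main__":
    show_cluster_settings()
```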
PowerHA SystemMirror uses as much information as possible
from the cluster to determine the health of the LPARs. For example, if the active LPAR is about to
crash, PowerHA SystemMirror sends a message to the standby
LPAR before the active LPAR goes offline. These notifications ensure that the standby LPAR is certain
that the active LPAR has gone offline and that the standby LPAR can bring the application online.
Note: In certain cases, the standby LPAR is aware that the active LPAR is not sending heartbeats but
cannot determine the actual status of the active LPAR. In this case, the standby LPAR declares that
the active LPAR has failed after waiting for the time that you specified in the Node Failure
Detection Timeout field. At that point, the standby LPAR fences out all the disks before it brings
the resource groups online. If even a single disk is not correctly fenced out, the resource group is
not brought online.
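The all-or-nothing behavior described in the note can be summarized in a short conceptual sketch. The fence_disk and takeover functions below are hypothetical placeholders written for illustration, not PowerHA APIs, and the timeout value is only an example.

```python
# Conceptual sketch of the standby-side decision: wait out the node failure
# detection timeout, try to fence every disk of the resource group, and bring
# the resource group online only if every disk was fenced.
import time

NODE_FAILURE_DETECTION_TIMEOUT = 30  # seconds; example value only

def fence_disk(disk: str) -> bool:
    # Placeholder: a real environment would request the storage-level
    # reservation change here. This stub always reports success.
    print(f"requesting fence-out of {disk}")
    return True

def takeover(resource_group: str, disks: list[str]) -> bool:
    time.sleep(NODE_FAILURE_DETECTION_TIMEOUT)   # declare the active LPAR failed
    if all(fence_disk(d) for d in disks):
        print(f"All disks fenced; bringing {resource_group} online")
        return True
    print(f"{resource_group} stays offline: at least one disk was not fenced")
    return False
```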