Troubleshooting disk fencing

Disk fencing is available only for quarantine policies in PowerHA® SystemMirror®.

Problem 1

Disk fencing is no longer required for your environment. You can disable disk fencing and release the reservation for a disk or a volume group.

Solution 1

To disable disk fencing and release the reservation for a disk or a volume group, complete the following steps:
  1. From the command line, run the following commands to release the reservations from a disk or volume group:
    
    clmgr modify physical_volume <disk> scsipr_clear={yes}
    clmgr modify volume_group <vg> scsipr_clear={yes}
    where disk is the name of the disk and vg is the name of the volume group.
  2. From the command line, enter smit sysmirror.
  3. From the SMIT interface, select Custom Cluster Configuration > Cluster Nodes and Networks > Initial Cluster Setup (Custom) > Configure Cluster Split and Merge Policy > Quarantine Policy > Disk Fencing, and press Enter.
  4. Specify No for the Disk Fencing field, and specify the critical resource group in the Critical Resource Group field. Press Enter to save your changes.
  5. From the Quarantine Policy panel, select Active Node Halt Policy > Configure Active Node Halt Policy, and press Enter.
  6. Specify No for the Active Node Halt Policy field, and specify the critical resource group in the Critical Resource Group field. Press Enter to save your changes.
    Note: The critical resource group that you specify must be the same critical resource group that you specified in step 4.
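
When several disks must be cleared, the commands in step 1 can be scripted. The following sketch only prints the commands it would run (a dry run); the volume group name datavg and the disk list are assumed example values, and in practice the disk list would come from lspv output:

```shell
# Dry-run sketch: print the reservation-clearing command for every disk
# in a volume group, then for the volume group itself.
# "datavg" and the disk names are assumed example values.
VG=datavg
DISKS="hdisk1 hdisk2"   # in practice: lspv | awk -v vg="$VG" '$3 == vg {print $1}'

for d in $DISKS; do
    echo "clmgr modify physical_volume $d scsipr_clear={yes}"
done
echo "clmgr modify volume_group $VG scsipr_clear={yes}"
```

Removing the echo wrappers turns the dry run into the actual clearing sequence.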

Problem 2

A resource group in an active cluster goes into an error state because a node fails to register and place a reserve on one of the volume groups in the resource group.

Solution 2

To fix this problem with the resource group, use one of the following options:
  • Run the cl_scsipr_recover_rg script. The cl_scsipr_recover_rg script registers and reserves the volume groups of the resource group that is in an error state.
  • To fix this problem with the SMIT interface, complete the following steps:
    1. From the command line, enter smit sysmirror.
    2. From the SMIT interface, select Problem Determination Tools > Recover Resource Group from SCSI Persistent Reserve Error, and press Enter.
    3. Select the resource group that is in an error state, and press Enter.
    4. From the SMIT interface, select System Management (C-SPOC) > Resource Group and Applications > Bring a Resource Group Online, and press Enter.
    5. Select the resource group that you want to bring back online, and press Enter.
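
The command-line route can be expressed as a short dry-run sequence. In this sketch, the resource group name rg1 is an assumption, and passing the resource group name as an argument to cl_scsipr_recover_rg is also an assumption:

```shell
# Dry-run sketch of the command-line recovery sequence.
# "rg1" is an assumed resource group name; the argument form of
# cl_scsipr_recover_rg is an assumption.
RG=rg1

RECOVER="cl_scsipr_recover_rg $RG"        # re-register and reserve the volume groups
ONLINE="clmgr online resource_group $RG"  # bring the resource group back online

echo "$RECOVER"
echo "$ONLINE"
```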

Problem 3

If the split and merge policy is SCSI, PowerHA SystemMirror sets up the SCSI Persistent Reserve state for all shared disks when it starts. This setup registers the Persistent Reserve keys for all paths to the devices. If new or changed paths are added to a device later, the Persistent Reserve keys are not set up for those paths.

Solution 3

To fix this problem, use one of the following options:
  • From the command line, run the following commands to release the reservations from a disk or volume group:
    
    clmgr modify physical_volume <disk> scsipr_clear={yes}
    clmgr modify volume_group <vg> scsipr_clear={yes}
    cl_scsipr_dare_reg_res <vg>
    where disk is the name of the disk and vg is the name of the volume group.
  • To fix this problem with the SMIT interface, complete the following steps:
    1. From the command line, enter smit sysmirror.
    2. From the SMIT interface, select Custom Cluster Configuration > Cluster Nodes and Networks > Initial Cluster Setup (Custom) > Configure Cluster Split and Merge Policy > Quarantine Policy > Disk Fencing, and press Enter.
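
The command-line option above can likewise be scripted as a dry run that clears stale reservations and then re-registers them so that new paths pick up Persistent Reserve keys. The names datavg and hdisk3 are assumed example values:

```shell
# Dry-run sketch: clear the existing reservations, then re-register and
# reserve so that newly added paths receive Persistent Reserve keys.
# "datavg" and "hdisk3" are assumed example names.
VG=datavg
DISK=hdisk3

for cmd in \
    "clmgr modify physical_volume $DISK scsipr_clear={yes}" \
    "clmgr modify volume_group $VG scsipr_clear={yes}" \
    "cl_scsipr_dare_reg_res $VG"
do
    echo "$cmd"
done
```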
The following table shows disk fencing behavior when a command is run or a specific event occurs. In these scenarios, the site contains NodeA (which hosts the critical resource group) and NodeB (which does not). Also, in this configuration, NodeA and NodeB are registered on all disks that are part of the resource groups.
Table 1. Disk fencing scenarios

| Scenario                             | NodeA observation                 | NodeB observation                     |
|--------------------------------------|-----------------------------------|---------------------------------------|
| hmc shutdown                         | NodeA is registered on the disks. | NodeB is not registered on the disks. |
| hmc reboot                           | NodeA is registered on the disks. | NodeB is not registered on the disks. |
| reboot                               | After the reboot, the reservations on the disks are still intact because the reboot completed before the resource group could be acquired. | NodeB is not registered on the disks. |
| reboot -q                            | After the reboot, the reservations on the disks are still intact because the reboot completed before the resource group could be acquired. | NodeB is not registered on the disks. |
| shutdown -Fr                         | NodeA is not registered on the disks. | NodeB is not registered on the disks. |
| shutdown                             | NodeA is registered on the disks. | NodeB is not registered on the disks. |
| halt -q                              | NodeA is registered on the disks. | NodeB is not registered on the disks. |
| halt                                 | NodeA is registered on the disks. | NodeB is not registered on the disks. |
| Node crashes                         | NodeA is registered on the disks. | NodeB is not registered on the disks. |
| clstop with resource group offline   | NodeA is registered on the disks. | NodeB is registered on the disks.     |
| clstop with move resource group      | NodeA is registered on the disks. | NodeB is registered on the disks.     |
| clstop with unmanage resource group  | NodeA is not registered on the disks. | NodeB is not registered on the disks. |