Understanding storage framework disk fencing in IBM PowerHA SystemMirror 7.1

Recently, I came across an interesting AIX Logical Volume Manager (LVM) problem. A customer reported that he couldn't restore a volume group (VG) backup image to a free storage area network (SAN) disk. The restvg command displayed an I/O failure message, and the LVM log showed error code 47, EWRPROTECT, which indicated that the disk was write-protected. But when he checked with his SAN storage administrator, they confirmed that no write-protect restriction had been set on the disk from the storage server at all.

This problem confused both the customer and me. It took me several days to investigate, but I eventually identified the root cause: the new storage framework disk fencing feature in IBM PowerHA SystemMirror 7.1 was preventing write access to the disk.

This new feature is a significant change in PowerHA SystemMirror 7.1, but it seems to be mentioned only in the PowerHA SystemMirror 7.1 announcement letter and IBM Redbooks®. So, I want to share my study of it here.

Comparing volume group passive mode and storage framework disk fencing

In PowerHA 6.1 and previous releases, when using enhanced concurrent volume groups in fast disk takeover mode, the VGs are in full read/write (active) mode on the node that owns the resource group. Any standby node has the VGs varied on in read only (passive) mode. Passive mode is the LVM equivalent of disk fencing.

Passive mode allows read access only to the volume group descriptor area and the first 4 KB of each logical volume. It does not allow read/write access to file systems or logical volumes, and it does not support LVM operations. However, low-level commands, such as dd, can bypass LVM and write directly to the disk. And once the enhanced concurrent volume groups are varied off, there is no longer any write-protect restriction on the disks.

PowerHA SystemMirror 7.1 uses storage framework disk fencing, which prevents disk write access at the AIX SCSI disk driver layer, so that even a lower-level operation, such as dd, cannot succeed. This new feature prevents data corruption caused by inadvertent access to the shared disks from multiple nodes, and it protects the disks even after the enhanced concurrent volume groups are varied off.
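
The difference is easy to demonstrate with a raw-disk write that bypasses LVM. The following is a minimal, hypothetical sketch; hdisk2 is an assumed disk name, and the dd command is shown for illustration only because such a write would corrupt data on a real shared disk.

# dd if=/dev/zero of=/dev/rhdisk2 bs=512 count=1 seek=1024

On a PowerHA 6.1 standby node with the volume group varied on in passive mode, this write bypasses LVM and succeeds. On a PowerHA SystemMirror 7.1 node where the disk is fenced read only, the AIX SCSI disk driver rejects the same write with a write-protect (EWRPROTECT) error.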

How PowerHA SystemMirror 7.1 uses storage framework disk fencing

The storage framework disk fencing feature is local to an AIX node. So, each node in the PowerHA cluster sets up and maintains a fence group for each enhanced concurrent volume group managed by PowerHA SystemMirror 7.1.

PowerHA SystemMirror 7.1 uses three utilities, cl_vg_fence_init, cl_set_vg_fence_height, and cl_vg_fence_term, to initialize, manage, and terminate the storage framework fence group and disk fencing. When disks are fenced with read only access, the AIX SCSI disk driver rejects disk write access and returns the EWRPROTECT error.
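
Taken together, these utilities give a fence group a simple life cycle. The following sketch summarizes it, using the volume group havg1 and disk hdisk2 from the logs below; the command syntax is the one that appears in the hacmp.out excerpts and usage messages later in this article, and in normal operation these commands are driven by the PowerHA event scripts rather than run by hand.

# cl_vg_fence_init -c havg1 ro hdisk2
# cl_set_vg_fence_height -c havg1 rw
# cl_set_vg_fence_height -c havg1 ro
# cl_vg_fence_term -c havg1

The first command creates a fence group named after the volume group and fences its disks read only; the second raises the fence height to read/write before a command such as varyonvg needs to write to the disks; the third lowers it back to read only; and the last removes the fence group and ends disk fencing for the volume group.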

The following PowerHA logs from the hacmp.out file show how this feature is used by PowerHA SystemMirror 7.1.

  1. When the PowerHA node comes up, the node_up event calls cl_vg_fence_init to add a storage framework fence group for each enhanced concurrent volume group and to set all disks in those volume groups to a fence height of read only. This prevents write access to the disks.
    Dec 27 15:52:55 EVENT START: node_up hacmp93
    ...
    :node_up[256] : Setup VG fencing. This must be done prior to any potential disk access.
    :node_up[258] node_up_vg_fence_init
    ...
    :node_up:havg1[node_up_vg_fence_init:104] : Volume group havg1 is an HACMP resource
    :node_up:havg1[node_up_vg_fence_init:110] fence_height=ro
    ...
    :cl_vg_fence_redo[112] cl_vg_fence_init -c havg1 ro hdisk2
    ...
    cl_vg_fence_init[279]: sfwAddFenceGroup(havg1, 1, hdisk2)
    cl_vg_fence_init[385]: creat(/usr/es/sbin/cluster/etc/vg/havg1.uuid)
    cl_vg_fence_init[393]: write(/usr/es/sbin/cluster/etc/vg/havg1.uuid, 16)
    cl_vg_fence_init[427]: sfwSetFenceGroup(vg=havg1, height=ro(2)  
            uuid=7b139a551bc137c92bd7e1a28905a901):
    ...
    cl_vg_fence_redo[113] RC=0

    The sfwAddFenceGroup() and sfwSetFenceGroup() interfaces set up the fence group and the disk fencing in the AIX storage framework. The AIX kdb command can show the details of the fence groups and disk fencing in the storage framework.

    (0)> sfw_info
    sfw_info_t *sfw_info = 0x60D02C0
    struct anchor {
        int cfg_cnt = 0x2
        lock_t lock = 0xFFFFFFFFFFFFFFFF;
        void *entrypts = 0xF1000A0000A8A800
        sfwobjhdr_t *cluster_head = 0xF1000A1FC0BD6200
        sfwobjhdr_t *cluster_tail = 0xF1000A1FC0BD6200
        sfwobjhdr_t *node_head = 0xF1000A1FC0BD6D80
        sfwobjhdr_t *node_tail = 0xF1000A1FC0BD6D80
        sfwobjhdr_t *fgrp_head = 0xF1000A1FC0BD7A80 <- Fence group
        sfwobjhdr_t *fgrp_tail = 0xF1000A17D0F37200
        sfwobjhdr_t *disk_head = 0xF1000A0031042500 <- Disk Fencing
        sfwobjhdr_t *disk_tail = 0xF1000A17D0F34880
    ...
    
    (0)> sfwobjhd -l 0xF1000A1FC0BD7A80
    struct object_header {
        uint   type = 0x7
        uint   subtype = 0x0
        char   name = "havg1"  <- Set fence group name to VG name
        char   uuid = 0x7B139A551BC137C92BD7E1A28905A901
        void   *dptr = 0xF1000A1FC0B00420
        struct object_header *next = 0xF1000A17D0F37200
        struct object_header *link = 0xF1000A0031042500
    }
    object_header.dptr=0xF1000A1FC0B00420
    struct sfw_fencegroup_obj {
        uint  height = 0x2
        sfwobjhdr_t *head = 0x0
    }
    
    (0)> sfwobjhd -l 0xF1000A0031042500
    struct object_header {
        uint   type = 0x3
        uint   subtype = 0x0
        char   name = "hdisk2"
        char   uuid = 0x80EE86210C6F8A0CAC83D697E5041B6C
        void   *dptr = 0xF1000A0031042580
        struct object_header *next = 0xF1000A0031042600
        struct object_header *link = 0x0
    }
    object_header.dptr=0xF1000A0031042580
    struct sfw_disk_obj {
        dev_t  devno = 0x8000001800000001
        uint   state = 0x8017
        uint   type  = 0x1
        uint   height = 0x2
    ...
    (0)> scsidisk hdisk2
    	...
        ulong64_t state = 0x0000000000080002;
        ONLINE
    	FENCED    <- FENCED state prevents write access from SCSI disk driver
    	...
    	    struct sfw_dattrs_t &sfw = 0xF1000A0150736B50
    	    {
    	        uint64_t state = 0x10017;
    	        uint  local_state = 0x00000000;
    	        uchar dtype    = 0x1;
    	        uchar fheight  = 0x2;
    	        uchar rmode    = 0x201;
    	        uchar orig_rmode = 0x201;
    	        uchar raw_devsn  = 0xA059329A;
    	    };
    	...
  2. Before an enhanced concurrent volume group is varied on, cl_set_vg_fence_height sets its fence height to read write so that the varyonvg command can work. After the passive vary on, the fence height is returned to read only to prevent disk write access again.
    :cl_pvo(0.123):havg1[849] varyonp havg1
    ...
    :cl_pvo(0.124):havg1[varyonp:271] : Make sure the volume group is not fenced. 
            Varyon requires read write access.
    :cl_pvo(0.124):havg1[varyonp:274] cl_set_vg_fence_height -c havg1 rw
    :cl_pvo(0.128):havg1[varyonp:275] RC=0
    ...
    :cl_pvo(0.128):havg1[varyonp:289] : Try to vary on the volume group in passive concurrent mode
    :cl_pvo(0.128):havg1[varyonp:291] varyonvg -c -P -O havg1
    :cl_pvo(0.717):havg1[varyonp:292] rc=0
    ...
    :cl_pvo(0.717):havg1[varyonp:345] : Restore the fence height to read only, for passive varyon
    :cl_pvo(0.718):havg1[varyonp:347] cl_set_vg_fence_height -c havg1 ro
    :cl_pvo(0.718):havg1[varyonp:348] RC=0
  3. When the PowerHA resource groups come online, the enhanced concurrent volume groups are varied on in active mode, their fence height is set to read write, and write access to the disks is granted.
    +rg1:cl_activate_vgs(0.103):havg1[vgs_chk:91] clvaryonvg -n havg1
    ...
    +rg1:clvaryonvg(0.038):havg1[807] : require read/write access. 
            So, set any volume group fencing appropriately.
    +rg1:clvaryonvg(0.038):havg1[809] cl_set_vg_fence_height -c havg1 rw
    +rg1:clvaryonvg(0.041):havg1[810] RC=0
    ...
    +rg1:clvaryonvg(0.255):havg1[1037] varyonvg -n -c -A -O havg1
    +rg1:clvaryonvg(0.773):havg1[1038] varyonvg_rc=0
  4. The fence height of the enhanced concurrent volume groups remains read write until the resource groups fall over to another node. When the fallover happens, the enhanced concurrent volume groups are varied off, and their fence height is changed back to read only.
    Dec 28 03:52:03 EVENT START: rg_move hacmp94 1 RELEASE
    
    +rg1:cl_deactivate_vgs:havg1[712] vgs_varyoff havg1 32
    ...
    +rg1:cl_deactivate_vgs(1.515):havg1[vgs_varyoff:281] (( 0 == 0 ))
    +rg1:cl_deactivate_vgs(1.515):havg1[vgs_varyoff:284] : successful varyoff, 
            set the fence height to read-only
    +rg1:cl_deactivate_vgs(1.516):havg1[vgs_varyoff:287] cl_set_vg_fence_height -c havg1 ro
    +rg1:cl_deactivate_vgs(1.519):havg1[vgs_varyoff:288] RC=0

    On the node that takes over the resource groups, the fence height of the enhanced concurrent volume groups is changed from read only to read write, so write access to the disks is granted only to the active node (see the verification sketch after this list).

    +rg1:process_resources[3091] eval JOB_TYPE=DISKS ACTION=ACQUIRE HDISKS='"hdisk3"' 
            RESOURCE_GROUPS='"rg1' '"' VOLUME_GROUPS='"havg1"'
    ...
    +rg1:clvaryonvg(0.030):havg1[807] : require read/write access. 
            So, set any volume group fencing appropriately.
    +rg1:clvaryonvg(0.030):havg1[809] cl_set_vg_fence_height -c havg1 rw
    +rg1:clvaryonvg(0.033):havg1[810] RC=0
  5. When PowerHA services are stopped on all the nodes, the fence height of all the enhanced concurrent volume groups is set to read write on all nodes.
    :node_down_complete[356] : The last node out turns off fencing on all nodes
    :node_down_complete[358] node_down_vg_fence_term
    ...
    :node_down_complete:havg1[node_down_vg_fence_term:81] 
            cl_on_cluster -P 'cl_set_vg_fence_height -c havg1 rw'
    :node_down_complete:havg1[node_down_vg_fence_term:89] return 0

    If AIX APAR IV65140 is installed, the cl_vg_fence_term command is called to terminate the disk fencing on all nodes.
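
You can verify this walkthrough during a takeover test by comparing the fence state of a shared disk on both nodes. A minimal sketch, assuming hdisk2 is one of the shared disks; the cl_getdisk command and its output fields are described in the next section.

# cl_getdisk hdisk2

Run on the node that owns the resource group, the Fence height field reports read/write; run on the standby node, it reports read only (corresponding to the height value of 0x2 seen in the kdb output above).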

LVM operation problems with storage framework disk fencing

PowerHA SystemMirror 7.1 implements strict control of disk write access. It calls the cl_set_vg_fence_height command with the rw option to allow disk write access for all of its commands that need to write to the disks, and it prohibits write access again when those commands complete. This means you must perform all LVM change operations for the PowerHA enhanced concurrent volume groups through the PowerHA C-SPOC menus; otherwise, you get an LVM operation failure similar to the one that I mentioned at the beginning of the article.

But not all LVM operations can be done within PowerHA C-SPOC (for example, the restvg command I mentioned). So, you sometimes have to remove the enhanced concurrent volume groups from PowerHA, or stop PowerHA services on all the nodes, to perform them. Alternatively, you can break the disk fencing with the cl_set_vg_fence_height command, as shown below:

# cl_set_vg_fence_height
Usage: cl_set_vg_fence_height [-c] <volume group> [rw|ro|na|ff]

# cl_set_vg_fence_height -c havg1 rw

Then, use the cl_getdisk command to check the disk fencing status.

# cl_getdisk hdisk3
Disk name:                      hdisk3
Disk UUID:                      4248df415c0e31c8 1f4d3e193d4548e9
Fence Group UUID:               25251b0a37f7a5cd f9c3e9bd4a7f2d8e (vg1)
Disk device major/minor number: 19, 5
Fence height:                   0 (Read/Write)
Reserve mode:                   513 ( )
Disk Type:                      0x01 (Local access only)
Disk State:                     32785

If you want to terminate the fence group entirely, use the cl_vg_fence_term command.

# cl_vg_fence_term
Usage: cl_vg_fence_term [-c] <volume group>

# cl_vg_fence_term -c havg1

# cl_getdisk hdisk3
Disk name:                      hdisk3
Disk UUID:                      4248df415c0e31c8 1f4d3e193d4548e9
Fence Group UUID:               0000000000000000 0000000000000000 - Not in a Fence Group
Disk device major/minor number: 19, 5
Fence height:                   0 (Read/Write)
Reserve mode:                   513 ( )
Disk Type:                      0x01 (Local access only)
Disk State:                     17
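
After completing the LVM maintenance, you may want to restore the protection on nodes that should not have write access, rather than leave the disks unfenced until PowerHA services are restarted (which re-creates the fence groups in any case). A minimal sketch, assuming the volume group havg1 and the disk hdisk2 from the hacmp.out examples; the cl_vg_fence_init syntax is the one visible in the log earlier in this article.

# cl_set_vg_fence_height -c havg1 ro

If the fence group was terminated with cl_vg_fence_term, re-create it with a read only fence height instead:

# cl_vg_fence_init -c havg1 ro hdisk2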

Limitation

The storage framework disk fencing feature is currently supported only for Multipath I/O (MPIO) disks. For other disk types, the use of this fencing capability by PowerHA SystemMirror 7.1 might result in error messages, which you can ignore.
