
C-SPOC : "Mirror a Volume Group" can hang in PowerHA due to gsclvmd timeouts : Alternate process for prevention

How To


Summary

When mirroring or copying large quantities of data in a PowerHA clustered environment using C-SPOC (for example, mirroring a shared volume group), there is a risk that the group services LVM daemon (gsclvmd) will time out during the data copy. This can force the volume group offline on the standby node, leaving C-SPOC unable to finish the process.

There is no specific data size that triggers this condition. It is typically seen when copying data sets of 1 TB or larger, but it can certainly occur with smaller amounts of data, depending on the capabilities and utilization of the systems involved.

Objective

Use a seven-step procedure with classic AIX LVM commands when mirroring or moving LVM data, to remove the risk of a gsclvmd timeout taking the standby node's volume groups offline in a PowerHA cluster environment.

Environment

Because gsclvmd is part of the AIX operating system, the issue is not associated with any one particular version of PowerHA.
In recent years, however, the focus has been on PowerHA versions 7.1.x and 7.2.x.

Steps

1) Shut down PowerHA on the standby node.
   Do not choose the "Unmanage" option; choose "Bring Resource Groups Offline".

   When it is done - that is, when "lssrc -ls clstrmgrES" shows the node in
   state "ST_INIT" (or possibly "NOT_CONFIGURED") - "lsvg -o" should show
   none of the shared volume groups managed by PowerHA.
                                                                         
2) On the standby node, run "exportvg <vg_name>" for the volume groups   
   that you want to mirror.   When that's done, "lspv" should not        
   show the names of the volume groups you exported next to any hdisk.   
                                                                         
3) On the active node, mirror the volume groups using the ordinary LVM
   commands - no HA commands (that is, not C-SPOC, not the cli_* C-SPOC
   commands, and not clmgr commands).
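
   The smit screens in the example below drive these same classic LVM
   commands. The following rough command-line sketch of step 3 uses the
   VG and disk names from the example, which are assumptions for your
   environment; as written it only prints the commands, and you would set
   DRY_RUN=0 on the active node to actually run them.

```shell
# Sketch of step 3 with classic LVM commands, run on the ACTIVE node.
# VG and hdisk names are illustrative; substitute your own.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
VG=Important_VG
NEW_DISK=hdisk3

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"          # show what would be executed
    else
        "$@"                 # execute for real when DRY_RUN=0
    fi
}

# Add the new disk to the volume group, then mirror every logical
# volume onto it.  -S runs the sync in the background, which you may
# prefer for very large volume groups.
run extendvg "$VG" "$NEW_DISK"
run mirrorvg -S "$VG" "$NEW_DISK"

# While a background sync runs, the STALE PPs count in lsvg output
# drops toward 0 as the copy completes.
run lsvg "$VG"
```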
                                                                         
4) On the standby node, if the lspv output in step 2 did not show all
   the hdisks you expect to be there, run "cfgmgr".  Afterwards, lspv
   should show the same disks that the active node does.
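
   The two nodes' disk lists can also be compared mechanically. A minimal
   sketch, assuming you have saved the hdisk names from each node into a
   file; the helper name and file names are hypothetical:

```shell
# Hypothetical helper for step 4: given the hdisk lists from both nodes
# (one hdisk name per line), print any disks the standby node is missing.
# How you collect the two lists (ssh, copy/paste) is up to you.
missing_on_standby() {
    # Usage: missing_on_standby <active_disks_file> <standby_disks_file>
    sort "$1" > /tmp/gr_active.$$
    sort "$2" > /tmp/gr_standby.$$
    # comm -23 prints lines present only in the first (active) list
    comm -23 /tmp/gr_active.$$ /tmp/gr_standby.$$
    rm -f /tmp/gr_active.$$ /tmp/gr_standby.$$
}

# Example usage (file names assumed):
#   lspv | awk '{print $1}' > active_disks.txt    # on the active node
#   lspv | awk '{print $1}' > standby_disks.txt   # on the standby node
#   missing_on_standby active_disks.txt standby_disks.txt
# Any output means: run "cfgmgr" on the standby node and re-check.
```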
                                                                         
5) On the active node, for each of the volume groups, run
   "ls -l /dev/<vg_name>" and note the major number.
NOTE: For step 6 below, the -V flag (which forces a specific major number
when importing the volume group) is only required if the HA resource group
does any NFS exporting or mounting. If the resource group is managing NFS,
the major number of a given volume group MUST be the same on all cluster
nodes. If it is not managing NFS, the VG major numbers do not have to
match across nodes, and the -V flag can be omitted from the importvg
command.
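
   The major number can be parsed out of the "ls -l" output mechanically
   for use with "importvg -V" in step 6. A minimal sketch; the
   parse_major helper is hypothetical, not an AIX command:

```shell
# Print the major number from "ls -l" output read on stdin.  On a
# character device line such as
#   crw-rw----    1 root     system       38,  0 Apr 06 11:07 /dev/Important_VG
# the major number is field 5, followed by a comma.
parse_major() {
    awk '{ sub(",", "", $5); print $5 }'
}

# Example usage on the active node (VG name from the example):
#   MAJOR=$(ls -l /dev/Important_VG | parse_major)
# Then on the standby node, if NFS requires matching major numbers:
#   importvg -V "$MAJOR" -y Important_VG -c hdisk3
```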
                                                                         
6) On the standby node, for each of the volume groups, run
   "importvg -V <major number> -y <volume group name> -c <hdisk name>".
   The "<hdisk name>" should be the name of an hdisk that you know is part
   of the volume group.  The expected result is these messages:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
<vg_name>                                                                
0516-783 importvg: This imported volume group is concurrent capable.     
        Therefore, the volume group must be varied on manually           
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  >>Despite the message, do NOT manually vary on the volume group.         
                                                                         
7) At this point, when all the volume groups of interest have been
   imported into the standby node, it should be possible to start PowerHA
   again on that node.
==========================================================================
Example conducted on a test cluster:
Example goal: Mirror the VG named 'Important_VG' so that one data copy is
on SAN storage unit A (where hdisk2 is located) and a second, backup copy
is on SAN storage unit B (where hdisk3 is located).
This is a simple example using just two hdisks; the same actions apply to
volume groups with many disks and larger data quantities.
Before beginning the process, both cluster nodes were masked at the storage
level and cfgmgr was run, so both nodes see all the necessary hdisks from
both storage units.

Beginning conditions:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
PrimaryNode:
PrimaryNode /
# lssrc -ls clstrmgrES | grep ST_
Current state: ST_STABLE
PrimaryNode /
# clRGinfo
-----------------------------------------------------------------------------
Group Name                   State            Node                           
-----------------------------------------------------------------------------
rg1                          ONLINE           PrimaryNode                                 
                             OFFLINE          StandbyNode                                   

PrimaryNode /
# lsvg -o
Important_VG
caavg_private
rootvg

PrimaryNode /
# lspv
hdisk0          00f7f4948e8129b6                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    Important_VG    concurrent
hdisk3          00f7f494d70ee0cf                    None
hdisk4          00f7f494d75ae11f                    None
PrimaryNode /
# lsvg -p Important_VG
Important_VG:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            1271        1271        255..254..254..254..254
--------------------------------------------------
StandbyNode:
StandbyNode /
# lssrc -ls clstrmgrES | grep ST_
Current state: ST_STABLE
StandbyNode /
# lsvg -o
caavg_private
rootvg
StandbyNode /
# lspv
hdisk0          00f7f4958f1c583d                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    Important_VG    concurrent
hdisk3          00f7f494d70ee0cf                    None
hdisk4          00f7f494d75ae11f                    None
StandbyNode /
# lsvg -p Important_VG
Important_VG:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            1271        1271        255..254..254..254..254
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

Step 1:
StandbyNode:
StandbyNode /
#smitty clstop
stop cluster services on StandbyNode, setting the resource groups to come OFFLINE

StandbyNode /
# lssrc -ls clstrmgrES | grep ST_
Current state: ST_INIT

-----------------------------------
PrimaryNode:
PrimaryNode /
# lssrc -ls clstrmgrES | grep ST_
Current state: ST_STABLE

Step 2:
StandbyNode:
StandbyNode /
# exportvg Important_VG
StandbyNode /
# lspv
hdisk0          00f7f4958f1c583d                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    None
hdisk3          00f7f494d70ee0cf                    None
hdisk4          00f7f494d75ae11f                    None
Step 3:
PrimaryNode:
PrimaryNode /
# smitty lvm -> Volume Groups -> Set Characteristics of a Volume Group ->
Add a Physical Volume to a Volume Group -> last screen:
                            [Entry Fields]
  Force the creation of a volume group?               no                        +
* VOLUME GROUP name                                  [Important_VG]             +
* PHYSICAL VOLUME names                              [hdisk3]                   
Hit enter and smit will make the change.

PrimaryNode /
# lspv
hdisk0          00f7f4948e8129b6                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    Important_VG    concurrent
hdisk3          00f7f494d70ee0cf                    Important_VG    concurrent
hdisk4          00f7f494d75ae11f                    None

        
PrimaryNode /
# smitty lvm -> Volume Groups -> Mirror a Volume Group -> choose Important_VG -> choose the disks ->
Last screen
Note:  the default sync mode is "foreground", though you may wish to change that to background on very large VGs
          Mirror a Volume Group
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                        [Entry Fields]
* VOLUME GROUP name                                   Important_VG
  Mirror sync mode                                   [Foreground]       +
  PHYSICAL VOLUME names                              [hdisk2 hdisk3]    +
  Number of COPIES of each logical                    2                 +
    partition
  Keep Quorum Checking On?                            no                +
  Create Exact LV Mapping?                            no                +
Hit enter and get an OK from smit.
PrimaryNode /
# lsvg -p Important_VG
Important_VG:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            1271        1257        255..240..254..254..254
hdisk3            active            1271        1257        255..240..254..254..254

PrimaryNode /
# lsvg -l Important_VG
Important_VG:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
loglv00             jfs2log    1       2       2    open/syncd    N/A
fslv02              jfs2       13      26      2    open/syncd    /testmount

Step 4:
PrimaryNode:
PrimaryNode /
# lspv
hdisk0          00f7f4948e8129b6                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    Important_VG    concurrent
hdisk3          00f7f494d70ee0cf                    Important_VG    concurrent
hdisk4          00f7f494d75ae11f                    None
---------------------------------------------------
StandbyNode:
StandbyNode /
# lspv
hdisk0          00f7f4958f1c583d                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    None
hdisk3          00f7f494d70ee0cf                    None
hdisk4          00f7f494d75ae11f                    None
Step 5:
PrimaryNode:
PrimaryNode /
# ls -l /dev/Important_VG
crw-rw----    1 root     system       38,  0 Apr 06 11:07 /dev/Important_VG

Step 6:
StandbyNode:
StandbyNode /
# importvg -V 38 -y Important_VG -c  hdisk3
Important_VG
0516-783 importvg: This imported volume group is concurrent capable.
        Therefore, the volume group must be varied on manually.

StandbyNode /
# lspv
hdisk0          00f7f4958f1c583d                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    Important_VG
hdisk3          00f7f494d70ee0cf                    Important_VG
hdisk4          00f7f494d75ae11f                    None
StandbyNode /
# ls -l /dev/Important_VG
crw-rw----    1 root     system       38,  0 Apr 06 14:14 /dev/Important_VG

Step 7:
StandbyNode:
StandbyNode /
#smitty clstart
                             Start Cluster Services
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                        [Entry Fields]
* Start now, on system restart or both                now                    +
  Start Cluster Services on these nodes              [StandbyNode]               +
* Manage Resource Groups                              Automatically          +
  BROADCAST message at startup?                       false                  +
  Startup Cluster Information Daemon?                 false                  +
  Ignore verification errors?                         false                  +
  Automatically correct errors found during           Interactively          +
  cluster start?

Hit enter, get an OK from smit, then confirm cluster services are STABLE and the VG is in passive concurrent mode.
StandbyNode /
# lssrc -ls clstrmgrES | grep ST_
Current state: ST_STABLE
StandbyNode /
# lspv
hdisk0          00f7f4958f1c583d                    rootvg          active
hdisk1          00f7f4955b193def                    caavg_private   active
hdisk2          00f7f494d751c262                    Important_VG    concurrent
hdisk3          00f7f494d70ee0cf                    Important_VG    concurrent
hdisk4          00f7f494d75ae11f                    None
StandbyNode /
# lsvg -L Important_VG
VOLUME GROUP:       Important_VG             VG IDENTIFIER:  00f7f49400004c0000000171503ee0fe
VG STATE:           active                   PP SIZE:        8 megabyte(s)
VG PERMISSION:      passive-only             TOTAL PPs:      2542 (20336 megabytes)
MAX LVs:            256                      FREE PPs:       2514 (20112 megabytes)
LVs:                2                        USED PPs:       28 (224 megabytes)
OPEN LVs:           0                        QUORUM:         1 (Disabled)
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         2                        AUTO ON:        no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Concurrent
Node ID:            2                        Active Nodes:       1
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 512 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
MIRROR POOL STRICT: off
PV RESTRICTION:     none                     INFINITE RETRY: no
DISK BLOCK SIZE:    512                      CRITICAL VG:    yes
FS SYNC OPTION:     no                       CRITICAL PVs:   no

Additional Information

SUPPORT:

If additional assistance is required after completing all of the instructions provided in this document, please follow the step-by-step instructions below to contact IBM to open a case for software under warranty or with an active and valid support contract.  The technical support specialist assigned to your case will confirm that you have completed these steps.

a.  Document and/or take screen shots of all symptoms, errors, and/or messages that might have occurred

b.  Capture any logs or data relevant to the situation

c.  Contact IBM to open a case:

   -For electronic support, please visit the IBM Support Community:
     https://www.ibm.com/mysupport
   -If you require telephone support, please visit the web page:
      https://www.ibm.com/planetwide/

d.  Provide a good description of your issue and reference this technote

e.  Upload all of the details and data to your case

   -You can attach files to your case in the IBM Support Community
   -Or upload to the IBM Enhanced Customer Data Repository
      https://www.secure.ecurep.ibm.com/app/upload_sf


Document Information

Modified date:
15 September 2020

UID

ibm16191241