IBM Support

Using disk tiebreaker for HADR with TSA configuration

Technical Blog Post


Abstract

Using disk tiebreaker for HADR with TSA configuration

Body

If you prefer to use a disk device as the quorum, you can do that for a Db2 HADR (High Availability Disaster Recovery) and TSA (Tivoli System Automation) configuration.
The option was added to the 'db2haicu' command menu in V11.1 Mod 4 Fix Pack 4.
 
"The Disk Tiebreaker and Majority Node Set quorum mechanisms can now be configured through the db2haicu environment. "
 
Currently, most deployed systems use a network IP address that is pingable from both the HADR primary and standby hosts,
as that was the only option in the 'db2haicu' menu of previous versions.
A network tiebreaker is sufficient for quorum, but some system administrators prefer to use a disk if possible.
 
In this post, I will show what the configuration looks like and some error patterns you may encounter when a prerequisite is missing.
The manual page 'Db2 cluster services tiebreaker support' is written for pureScale, but the same disk quorum requirements apply to an HADR and TSA configuration.
 
 
That page introduces the Linux command 'sg_persist' for checking the SCSI-3 PR status of the target device before using it as a disk quorum.
The Linux package 'sg3_utils', which provides this command, should therefore be installed on all hosts in advance. To install it, run 'yum -y install sg3_utils' as the 'root' user on all hosts, or leave the installation method up to the system administrator.
'db2haicu' runs this command internally, and the following error appears in db2diag.log if it is not installed.
 
 
2632-122 : The following error was returned from the TieBreaker subsystem:
       command '/usr/bin/sg_persist' is not found; FFDC ID:
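 
Before running 'db2haicu', you can quickly confirm on each host that the package and the command are available. The following is a minimal check on an RPM-based distribution; the package query command may differ on other distributions.
 
( By 'root' user, on each host )
 
[root@parang1 ~]# rpm -q sg3_utils
[root@parang1 ~]# which sg_persist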
 
Regarding what to enter in the 'db2haicu' menu step, the documentation describes the valid input types when using a DISK tiebreaker.
 
In short, the valid inputs are:
- AIX : device path or PVID
- Linux : device path, WWN, or WWID
 
For WWN and WWID, the system or storage/SAN administrator should be able to provide the value.
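 
If you want to look up the identifiers yourself on Linux, the persistent links under /dev/disk/by-id are one common place to check. This is only a sketch; the exact identifier strings depend on your storage, and the path of 'scsi_id' can differ by distribution.
 
( By 'root' user )
 
[root@parang1 ~]# ls -l /dev/disk/by-id/ | grep sda
[root@parang1 ~]# /usr/lib/udev/scsi_id -g -u /dev/sda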
 
In this post, I will describe an example that uses a device path on Linux.
 
My test environment:
OS : Linux
Hostnames : parang1, parang2
Device path used for the disk quorum : /dev/sda
 
 
1. Check SCSI-3 PR status.
 
[root@parang2 rpttr]# sg_persist -c /dev/sda
  LIO-ORG   ps111             4.0
  Peripheral device type: disk
Report capabilities response:
  Compatible Reservation Handling(CRH): 1
  Specify Initiator Ports Capable(SIP_C): 1
  All Target Ports Capable(ATP_C): 1
  Persist Through Power Loss Capable(PTPL_C): 1
  Type Mask Valid(TMV): 1
  Allow Commands: 1
  Persist Through Power Loss Active(PTPL_A): 0
    Support indicated in Type mask:
      Write Exclusive, all registrants: 1
      Exclusive Access, registrants only: 1
      Write Exclusive, registrants only: 1      <=== Ensure that Write Exclusive, registrants only has a value of 1. 
      Exclusive Access: 1
      Write Exclusive: 1
      Exclusive Access, all registrants: 1
 
2. Change the permissions of the device. The 'db2haicu' command is run by the Db2 instance user,
   so the following error is reported in db2diag.log if that user does not have permission to access the device path.
 
"The disk provided '/dev/sda' does not exist or not a valid tiebreaker, since the WWN could not be obtained.".
 
Change the permissions as shown in the following steps.
 
(By 'root' user)
 
[root@parang2 ~]# ls -l /dev/sda
brw-rw---- 1 root disk 8, 0 Sep 11 16:37 /dev/sda
[root@parang2 ~]# chmod 666 /dev/sda
[root@parang2 ~]# ls -l /dev/sda
brw-rw-rw- 1 root disk 8, 0 Sep 11 16:37 /dev/sda
 
[root@parang1 ~]# ls -l /dev/sda
brw-rw---- 1 root disk 8, 0 Sep  9 15:41 /dev/sda
[root@parang1 ~]# chmod 666 /dev/sda
[root@parang1 ~]# ls -l /dev/sda
brw-rw-rw- 1 root disk 8, 0 Sep  9 15:41 /dev/sda
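 
Note that permissions changed with 'chmod' on a device node do not persist across a reboot, because udev recreates the node. If you want the change to be permanent, a udev rule is one option. The rule below is only a sketch with a hypothetical file name; adjust the match to your own device.
 
( By 'root' user, on each host )
 
[root@parang1 ~]# cat /etc/udev/rules.d/99-db2-tiebreaker.rules
KERNEL=="sda", MODE="0666"
[root@parang1 ~]# udevadm control --reload-rules
[root@parang1 ~]# udevadm trigger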
 
 
3. Now I am ready to go ahead, so let me complete a few preparation steps before running 'db2haicu'.
 
On each host, run the 'preprpnode' command with both host names.
 
( By 'root' user )
 
[root@parang1 ~] export CT_MANAGEMENT_SCOPE=2
[root@parang1 ~] preprpnode parang1 parang2
 
[root@parang2 ~] export CT_MANAGEMENT_SCOPE=2
[root@parang2 ~] preprpnode parang1 parang2
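 
As a quick sanity check after 'preprpnode', you can verify that the RMC subsystem used by TSA is active on each host. This assumes RSCT, which provides 'preprpnode' itself, is already installed.
 
[root@parang1 ~]# lssrc -s ctrmc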
 
 
4. Make sure HADR is in PEER state by running 'db2pd -hadr -db <dbname>'.
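 
The database in my test environment is SAMPLE, so the check would look like the following; I simply grep the role, state, and connection status fields from the 'db2pd' output. HADR_STATE should show PEER.
 
( By Db2 instance user )
 
[db2inst1@parang2 ~]$ db2pd -hadr -db SAMPLE | grep -E 'HADR_ROLE|HADR_STATE|HADR_CONNECT_STATUS'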
 
5. Run 'db2haicu' on standby host.
 
(By Db2 instance user )
 
[db2inst1@parang2 ~]$ db2haicu
..
 
Create a domain and continue? [1]
1. Yes
2. No
1
Create a unique name for the new domain:
hadr_tsa
Nodes must now be added to the new domain.
How many cluster nodes will the domain 'hadr_tsa' contain?
2
Enter the host name of a machine to add to the domain:
parang1
Enter the host name of a machine to add to the domain:
parang2
db2haicu can now create a new domain containing the 2 machines that you specified. If you choose not to create a domain now, db2haicu will exit.
 
Create the domain now? [1]
1. Yes
2. No
1
Creating domain 'hadr_tsa' in the cluster ...
Creating domain 'hadr_tsa' in the cluster was successful.
You can now configure a quorum device for the domain. For more information, see the topic "Quorum devices" in the DB2 Information Center. If you do not configure a quorum device for the domain, then a human operator will have to manually intervene if subsets of machines in the cluster lose connectivity.
 
Configure a quorum device for the domain called 'hadr_tsa'? [1]
1. Yes
2. No
1
The following is a list of supported quorum device types:
  1. Network Quorum
  2. Disk Quorum
  3. Majority Node Set
Enter the number corresponding to the quorum device type to be used: [1]
2
The following is a list of supported input types for disk quorum device:
1. Device Name
2. WWN
3. WWID
Enter the number corresponding to the quorum device type to be used: [1]
1
Enter the device name for the disk device on cluster node 'parang2':
/dev/sda
Configuring quorum device for domain 'hadr_tsa' ...
Configuring quorum device for domain 'hadr_tsa' was successful.
Run 'db2prereqcheck -tb_dev <devicename> -hl <hostname>' as root to verify that the disk is a valid tiebreaker.
The cluster manager found the following total number of network interface cards on the machines in the cluster domain: '2'.  You can add a network to your cluster domain using the db2haicu utility.
 
Create networks for these network interface cards? [1]
1. Yes
2. No
1
 
...
 
The remaining steps are the same as a typical HADR / TSA configuration.
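 
The 'db2haicu' output above also suggests verifying the disk with 'db2prereqcheck'. In my environment that would look like the following, run once per host name; the option syntax is taken directly from the db2haicu message, and you may need the full path of 'db2prereqcheck' from the Db2 installation directory if it is not in root's PATH.
 
( By 'root' user )
 
[root@parang2 ~]# db2prereqcheck -tb_dev /dev/sda -hl parang1
[root@parang2 ~]# db2prereqcheck -tb_dev /dev/sda -hl parang2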
 
6. After completing the 'db2haicu' steps on both the standby and the primary, the 'lssam' output will look similar to the following.
 
[db2inst1@parang1 ~]$ lssam
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_SAMPLE-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs:parang1
                '- Offline IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs:parang2
        '- Online IBM.ServiceIP:db2ip_172_20_20_200-rs
                |- Online IBM.ServiceIP:db2ip_172_20_20_200-rs:parang1
                '- Offline IBM.ServiceIP:db2ip_172_20_20_200-rs:parang2
Online IBM.ResourceGroup:db2_db2inst1_parang1_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_parang1_0-rs
                '- Online IBM.Application:db2_db2inst1_parang1_0-rs:parang1
Online IBM.ResourceGroup:db2_db2inst1_parang2_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_parang2_0-rs
                '- Online IBM.Application:db2_db2inst1_parang2_0-rs:parang2
Online IBM.Equivalency:db2_db2inst1_db2inst1_SAMPLE-rg_group-equ
        |- Online IBM.PeerNode:parang1:parang1
        '- Online IBM.PeerNode:parang2:parang2
Online IBM.Equivalency:db2_db2inst1_parang1_0-rg_group-equ
        '- Online IBM.PeerNode:parang1:parang1
Online IBM.Equivalency:db2_db2inst1_parang2_0-rg_group-equ
        '- Online IBM.PeerNode:parang2:parang2
Online IBM.Equivalency:db2_public_network_0
        |- Online IBM.NetworkInterface:eth0:parang1
        '- Online IBM.NetworkInterface:eth0:parang2
 
7. To see the created tiebreaker resource, run the following.
 
[db2inst1@parang1 ~]$ lsrsrc -Ab IBM.TieBreaker
Resource Persistent and Dynamic Attributes for IBM.TieBreaker
resource 1:
        Name                = "db2_Quorum_Disk:16_58_6"
        Type                = "SCSIPR"
        DeviceInfo          = "DEVICE=/dev/sda"
        ReprobeData         = ""
        ReleaseRetryPeriod  = 0
        HeartbeatPeriod     = 5
        PreReserveWaitTime  = 0
        PostReserveWaitTime = 0
        NodeInfo            = {}
        ActivePeerDomain    = "hadr_tsa"
        ConfigChanged       = 0
..<snippet>..
 
 
You can see that it is configured as expected.
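 
To also confirm which tiebreaker the peer domain is actually using for operational quorum, you can query the 'OpQuorumTieBreaker' attribute of the IBM.PeerNode class; it should reference the SCSIPR tiebreaker resource created above.
 
[db2inst1@parang1 ~]$ lsrsrc -c IBM.PeerNode OpQuorumTieBreaker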

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm11139854