DB2 10.5 for Linux, UNIX, and Windows

Configuring the cluster for high availability in a GDPC environment

The configuration procedure detailed in this topic is specific to the geographically dispersed DB2® pureScale® cluster (GDPC).

Before you begin

Ensure that you have GPFS™ replication set up (see Setting up GPFS replication in a GDPC environment). If you are running on an AIX® operating system on a RoCE network, ensure that you have set up a RoCE network (see Setting up a RoCE network in a GDPC environment (AIX)).

Procedure

  1. Update storage failure time-outs.
    1. Ensure that, in the case of a storage controller or site failure, an error is returned quickly to GPFS by setting the relevant device driver parameters. Note that the relevant parameters differ between device drivers. Check the storage controller documentation, or consult a storage expert on site, to ensure that errors are returned within 20 seconds.
      For example, on DS8K storage using the default AIX SDDPCM driver, the updates are:
      chdev -l hdiskX -a 'cntl_delay_time=20 cntl_hcheck_int=2' -P
      
      # repeat for every hdiskX
      
      chdev -l fscsiY -a dyntrk=yes -a fc_err_recov=fast_fail -P
      
      # repeat for every fscsiY adapter
      
      # reboot the host
      
      # repeat the chdev commands on every host in the cluster
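      Because each host can have many disks and adapters, the individual chdev calls can be scripted. The following loop is a sketch only, assuming AIX with SDDPCM and that every hdisk and fscsi device reported by lsdev belongs to the SAN-attached shared storage; filter the device lists if your hosts also have local disks:
      root> for d in $(lsdev -Cc disk -F name); do
              # -P defers the change until the next reboot
              chdev -l $d -a 'cntl_delay_time=20 cntl_hcheck_int=2' -P
            done
      root> for a in $(lsdev -C -F name | grep '^fscsi'); do
              chdev -l $a -a dyntrk=yes -a fc_err_recov=fast_fail -P
            done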
    2. Verify that the attributes have been set correctly on every host:
      root> lsattr -El fscsi0  
      attach          switch     How this adapter is CONNECTED           False
      dyntrk          yes        Dynamic Tracking of FC Devices          True
      fc_err_recov    fast_fail  FC Fabric Event Error RECOVERY Policy   True
      
      root> lsattr -El hdiskA1
      PCM             PCM/friend/otherapdisk  Path Control Module              False
      PR_key_value    none                    Persistent Reserve Key Value     True
      algorithm       fail_over               Algorithm                        True
      autorecovery    no                      Path/Ownership Autorecovery      True
      clr_q           no                      Device CLEARS its Queue on error True
      cntl_delay_time 20                      Controller Delay Time            True
      cntl_hcheck_int 2                       Controller Health Check Interval True
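      Rather than reading the lsattr output for each disk by eye, a scripted check such as the following sketch (it assumes the SDDPCM attribute names shown above) prints only those disks whose controller time-out values were not applied:
      root> for d in $(lsdev -Cc disk -F name); do
              delay=$(lsattr -El $d -a cntl_delay_time -F value 2>/dev/null)
              hcheck=$(lsattr -El $d -a cntl_hcheck_int -F value 2>/dev/null)
              # flag any disk that does not carry the expected values
              [ "$delay" = "20" ] && [ "$hcheck" = "2" ] || echo "check $d"
            done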
  2. Update the resource time-outs.
    Due to GPFS replication recovery requirements, recovery times for certain failures can be slightly longer in a geographically dispersed DB2 pureScale cluster (GDPC) environment than in a single-site DB2 pureScale environment. To account for this, the time-out values of some IBM Tivoli® System Automation for Multiplatforms resources must be adjusted. To adjust the time-outs, run the following commands once, as root, on any host in the cluster:
      root> export CT_MANAGEMENT_SCOPE=2
      # Update the two member-specific time-outs.  For these, the resource
      # names to update look like db2_<instance>_<member_id>-rs.
      # In this example we have members 0-3, and our instance name is
      # db2inst1:
      root> chrsrc -s "Name like 'db2_db2inst1_%-rs'" IBM.Application CleanupCommandTimeout=600
      root> chrsrc -s "Name like 'db2_db2inst1_%-rs'" IBM.Application MonitorCommandTimeout=600
    
      # In the next two commands, replace 'db2inst1' with your
      # instance-owning ID
      root> chrsrc -s "Name like 'primary_db2inst1_900-rs'" IBM.Application CleanupCommandTimeout=600
      root> chrsrc -s "Name like 'ca_db2inst1_0-rs'" IBM.Application CleanupCommandTimeout=600
    
      # In the following commands, replace 'db2inst1' with your
      # instance-owning ID, and repeat for each host in your cluster,
      # except the tiebreaker host T
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostA1'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostA2'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostA3'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostB1'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostB2'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostB3'" IBM.Application MonitorCommandTimeout=600
    
      # In the last two commands, replace 'db2inst1' with your
      # instance-owning ID, 'hostA3' with the hostname of the first CF
      # added to the cluster, and 'hostB3' with the hostname of the
      # second CF added to the cluster.
      root> chrsrc -s "Name like 'cacontrol_db2inst1_128_hostA3'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'cacontrol_db2inst1_129_hostB3'" IBM.Application MonitorCommandTimeout=600
    To show the updated time-outs, run the following command as root:
    lsrsrc -t IBM.Application Name MonitorCommandTimeout CleanupCommandTimeout
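    On a cluster with many automated resources, the unfiltered list can be long. Because lsrsrc accepts the same selection strings as chrsrc, a filtered query such as the following sketch (assuming the instance name db2inst1) shows only the resources changed above, all of which should now report a value of 600:
    root> export CT_MANAGEMENT_SCOPE=2
    root> lsrsrc -s "Name like '%db2inst1%'" -t IBM.Application Name MonitorCommandTimeout CleanupCommandTimeout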
  3. Verify the network resiliency scripts.
    List the network resiliency resources:
    root> /home/db2inst1/sqllib/bin/db2cluster -cfs -list -network_resiliency -resources
    For every host, a condition similar to the following is listed:
    condition 6:
            Name                        = "condrespV10_hostA1_condition_en2"
            Node                        = "hostA1.torolab.ibm.com"
            MonitorStatus               = "Monitored"
            ResourceClass               = "IBM.NetworkInterface"
            EventExpression             = "OpState != 1"
            EventDescription            = "Adapter is not online"
            RearmExpression             = "OpState = 1"
            RearmDescription            = "Adapter is online"
            SelectionString             = "IPAddress == '9.26.82.X'"
            Severity                    = "c"
            NodeNames                   = {}
            MgtScope                    = "l"
            Toggle                      = "Yes"
            EventBatchingInterval       = 0
            EventBatchingMaxEvents      = 0
            BatchedEventRetentionPeriod = 0
            BatchedEventMaxTotalSize    = 0
            RecordAuditLog              = "ALL"
    On every host except the tiebreaker host, the SelectionString must match the cluster interconnect IP address for that host: on configurations that use RDMA with AIX IB or Linux RoCE, this is the IB or RoCE IP address; on configurations without RDMA, or with AIX RoCE, it is the IP address on the private Ethernet network. If the SelectionString on any of these hosts does not match the correct IP address, run:
    root> /home/db2inst1/sqllib/bin/db2cluster -cfs -repair -network_resiliency
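    After the repair completes, one way to spot-check the result is to list the conditions again and extract only the name and selection string of each, for example:
    root> /home/db2inst1/sqllib/bin/db2cluster -cfs -list -network_resiliency -resources | grep -E ' Name |SelectionString'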

Results

Your GDPC environment is installed and configured.

What to do next

You can create the database.