DB2 10.5 for Linux, UNIX, and Windows

Configuring the cluster for high availability in a GDPC environment

The configuration procedure detailed in this topic is specific to the geographically dispersed DB2® pureScale® cluster (GDPC).

Before you begin

Ensure that you have GPFS™ replication set up (see Setting up GPFS replication in a GDPC environment). If you are running on an AIX® operating system on a RoCE network, ensure that you have set up a RoCE network (see Setting up a RoCE network in a GDPC environment (AIX)).

Procedure

  1. Update storage failure time-outs.
    1. Ensure that, in the case of a storage controller or site failure, an error is returned quickly to GPFS by setting the relevant device driver parameters. Note that the relevant parameters differ between device drivers. Check the storage controller documentation, or consult a storage expert on site, to ensure that errors are returned within 20 seconds.
      For example, on DS8K storage using the default AIX SDDPCM driver, the updates are:
      chdev -l hdiskX -a 'cntl_delay_time=20 cntl_hcheck_int=2' -P
      
      # repeat for every hdiskX
      
      chdev -l fscsiY -a dyntrk=yes -a fc_err_recov=fast_fail -P
      
      # repeat for every fscsiY adapter
      
      # reboot the host
      
      # repeat the chdev commands on every host in the cluster
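      Because each host can have many disks and adapters, the individual chdev calls can be scripted. The following loop is a sketch only, assuming AIX with SDDPCM and that every hdisk and fscsi device reported by lsdev belongs to the SAN-attached shared storage; filter the device lists if your hosts also have local disks:
      root> for d in $(lsdev -Cc disk -F name); do
              # -P defers the change until the next reboot
              chdev -l $d -a 'cntl_delay_time=20 cntl_hcheck_int=2' -P
            done
      root> for a in $(lsdev -C -F name | grep '^fscsi'); do
              chdev -l $a -a dyntrk=yes -a fc_err_recov=fast_fail -P
            done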
    2. Verify that the attributes have been set correctly on every host:
      root> lsattr -El fscsi0  
      attach          switch     How this adapter is CONNECTED           False
      dyntrk          yes        Dynamic Tracking of FC Devices          True
      fc_err_recov    fast_fail  FC Fabric Event Error RECOVERY Policy   True
      
      root> lsattr -El hdiskA1
      PCM             PCM/friend/otherapdisk  Path Control Module              False
      PR_key_value    none                    Persistent Reserve Key Value     True
      algorithm       fail_over               Algorithm                        True
      autorecovery    no                      Path/Ownership Autorecovery      True
      clr_q           no                      Device CLEARS its Queue on error True
      cntl_delay_time 20                      Controller Delay Time            True
      cntl_hcheck_int 2                       Controller Health Check Interval True
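      Rather than reading the lsattr output for each disk by eye, a scripted check such as the following sketch (it assumes the SDDPCM attribute names shown above) prints only those disks whose controller time-out values were not applied:
      root> for d in $(lsdev -Cc disk -F name); do
              delay=$(lsattr -El $d -a cntl_delay_time -F value 2>/dev/null)
              hcheck=$(lsattr -El $d -a cntl_hcheck_int -F value 2>/dev/null)
              # flag any disk that does not carry the expected values
              [ "$delay" = "20" ] && [ "$hcheck" = "2" ] || echo "check $d"
            done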
  2. Update the resource time-outs.
    Due to GPFS replication recovery requirements, recovery times for certain failures can be slightly longer in a geographically dispersed DB2 pureScale cluster (GDPC) environment than in a single-site DB2 pureScale environment. To account for this, the time-out values of some IBM Tivoli® System Automation for Multiplatforms resources must be adjusted. To adjust the time-outs, run the following commands once, as root, on any host in the cluster:
      root> export CT_MANAGEMENT_SCOPE=2
      # Update the two member-specific time-outs.  For these, the resource
      # names to update look like db2_<instance>_<member_id>-rs.
      # In this example we have members 0-3, and our instance name is
      # db2inst1:
      root> chrsrc -s "Name like 'db2_db2inst1_%-rs'" IBM.Application CleanupCommandTimeout=600
      root> chrsrc -s "Name like 'db2_db2inst1_%-rs'" IBM.Application MonitorCommandTimeout=600
    
      # In the next two commands, replace 'db2inst1' with your
      # instance-owning ID
      root> chrsrc -s "Name like 'primary_db2inst1_900-rs'" IBM.Application CleanupCommandTimeout=600
      root> chrsrc -s "Name like 'ca_db2inst1_0-rs'" IBM.Application CleanupCommandTimeout=600
    
      # In the following commands, replace 'db2inst1' with your
      # instance-owning ID, and repeat for each host in your cluster,
      # except the tiebreaker host T
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostA1'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostA2'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostA3'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostB1'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostB2'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'instancehost_db2inst1_hostB3'" IBM.Application MonitorCommandTimeout=600
    
      # In the last two commands, replace 'db2inst1' with your
      # instance-owning ID, 'hostA3' with the hostname of the first CF
      # added to the cluster, and 'hostB3' with the hostname of the
      # second CF added to the cluster.
      root> chrsrc -s "Name like 'cacontrol_db2inst1_128_hostA3'" IBM.Application MonitorCommandTimeout=600
      root> chrsrc -s "Name like 'cacontrol_db2inst1_129_hostB3'" IBM.Application MonitorCommandTimeout=600
    To show the updated time-outs, run the following command as root:
    lsrsrc -t IBM.Application Name MonitorCommandTimeout CleanupCommandTimeout
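    On a cluster with many automated resources, the unfiltered list can be long. Because lsrsrc accepts the same selection strings as chrsrc, a filtered query such as the following sketch (assuming the instance name db2inst1) shows only the resources changed above, all of which should now report a value of 600:
    root> export CT_MANAGEMENT_SCOPE=2
    root> lsrsrc -s "Name like '%db2inst1%'" -t IBM.Application Name MonitorCommandTimeout CleanupCommandTimeout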
  3. Verify the network resiliency scripts.
    List the network resiliency resources:
    root> /home/db2inst1/sqllib/bin/db2cluster -cfs -list -network_resiliency -resources
    For every host, a condition similar to the following is listed:
    condition 6:
            Name                        = "condrespV10_hostA1_condition_en2"
            Node                        = "hostA1.torolab.ibm.com"
            MonitorStatus               = "Monitored"
            ResourceClass               = "IBM.NetworkInterface"
            EventExpression             = "OpState != 1"
            EventDescription            = "Adapter is not online"
            RearmExpression             = "OpState = 1"
            RearmDescription            = "Adapter is online"
            SelectionString             = "IPAddress == '9.26.82.X'"
            Severity                    = "c"
            NodeNames                   = {}
            MgtScope                    = "l"
            Toggle                      = "Yes"
            EventBatchingInterval       = 0
            EventBatchingMaxEvents      = 0
            BatchedEventRetentionPeriod = 0
            BatchedEventMaxTotalSize    = 0
            RecordAuditLog              = "ALL"
    On every host except the tiebreaker host, the SelectionString must match the cluster interconnect IP address for that host: on configurations that use RDMA with AIX IB or Linux RoCE, this is the IB or RoCE IP address; on configurations without RDMA, or with AIX RoCE, it is the IP address on the private Ethernet network. If the SelectionString on any of these hosts does not match the correct IP address, run:
    root> /home/db2inst1/sqllib/bin/db2cluster -cfs -repair -network_resiliency
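    After the repair completes, one way to spot-check the result is to list the conditions again and extract only the name and selection string of each, for example:
    root> /home/db2inst1/sqllib/bin/db2cluster -cfs -list -network_resiliency -resources | grep -E ' Name |SelectionString'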

Results

Your GDPC environment is installed and configured.

What to do next

You can create the database.