Configuring the cluster for high availability in a GDPC environment
The configuration procedure detailed in this topic is specific to a geographically dispersed Db2® pureScale® cluster (GDPC). Perform this procedure after the initial instance creation, and again after any subsequent repair or deletion operation on the resource model or peer domain.
Before you begin
Ensure that you have Spectrum Scale replication set up (see Setting up IBM Spectrum Scale replication in a GDPC environment). If you are running on an AIX operating system on a RoCE network, ensure that you have set up the RoCE network.
Procedure
- Update storage failure time-outs.
- Ensure that in the case of a storage controller or site failure, an error is returned quickly to Spectrum Scale by setting the relevant device driver parameters. Note that the relevant parameters differ between device drivers. Check the storage controller documentation or consult a storage expert on site to ensure that errors are returned within 20 seconds.
For example, on DS8K using the default AIX® SDDPCM, the updates are:
chdev -l hdiskX -a 'cntl_delay_time=20 cntl_hcheck_int=2' -P    # repeat for every hdiskX
chdev -l fscsiY -a dyntrk=yes -a fc_err_recov=fast_fail -P      # repeat for every fscsiY adapter
Reboot the host, then repeat the chdev commands on every other host in the cluster. A scripted variant that loops over all disks and adapters is sketched after the verification sub-step below.
- Verify that the attributes have been set correctly on every computer:
root> lsattr -El fscsi0
attach        switch     How this adapter is CONNECTED          False
dyntrk        yes        Dynamic Tracking of FC Devices         True
fc_err_recov  fast_fail  FC Fabric Event Error RECOVERY Policy  True

root> lsattr -El hdiskA1
PCM              PCM/friend/otherapdisk  Path Control Module               False
PR_key_value     none                    Persistent Reserve Key Value      True
algorithm        fail_over               Algorithm                         True
autorecovery     no                      Path/Ownership Autorecovery       True
clr_q            no                      Device CLEARS its Queue on error  True
cntl_delay_time  20                      Controller Delay Time             True
cntl_hcheck_int  2                       Controller Health Check Interval  True
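Because the chdev updates must be repeated for every disk, every FC adapter, and then on every host, a small loop reduces the chance of missing a device. The following is a minimal sketch, assuming SDDPCM-managed hdisk* disks and fscsi* adapters; adjust the device-name patterns to your environment:
# Minimal sketch, assuming hdisk* disks and fscsi* adapters; adjust patterns as needed.
for d in $(lsdev -Cc disk -F name | grep '^hdisk'); do
    chdev -l "$d" -a 'cntl_delay_time=20 cntl_hcheck_int=2' -P
done
for a in $(lsdev -Cc adapter -F name | grep '^fscsi'); do
    chdev -l "$a" -a dyntrk=yes -a fc_err_recov=fast_fail -P
done
# -P defers the change to the next boot: reboot the host afterward, then
# repeat on every other host in the cluster.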
- Update the resource time-outs. Due to Spectrum Scale replication recovery requirements, recovery times for certain failures can be slightly longer in a geographically dispersed Db2 pureScale cluster (GDPC) environment than in a single-site Db2 pureScale environment. To account for this, some of the IBM Tivoli® System Automation for Multiplatforms resources need their time-out values adjusted. To adjust the time-outs, run the following commands once as root on any of the hosts in the cluster (a loop-based shorthand for the per-host commands is sketched after this step):
root> export CT_MANAGEMENT_SCOPE=2

# Update 2 member-specific timeouts. For these, the resource
# names to update will look like db2_<instance>_<member_id>-rs.
# In this example we have members 0-4, and our instance name is
# db2inst1:
root> chrsrc -s "Name like 'db2_db2inst1_%-rs'" IBM.Application CleanupCommandTimeout=600
root> chrsrc -s "Name like 'db2_db2inst1_%-rs'" IBM.Application MonitorCommandTimeout=600

# In the next two commands, replace 'db2inst1' with your instance
# owning ID
root> chrsrc -s "Name like 'primary_db2inst1_900-rs'" IBM.Application CleanupCommandTimeout=600
root> chrsrc -s "Name like 'ca_db2inst1_0-rs'" IBM.Application CleanupCommandTimeout=600

# In the following commands, replace 'db2inst1' with your
# instance owning ID, and repeat for each host in your cluster,
# except the tiebreaker host T
root> chrsrc -s "Name like 'instancehost_db2inst1_hostA1'" IBM.Application MonitorCommandTimeout=600
root> chrsrc -s "Name like 'instancehost_db2inst1_hostA2'" IBM.Application MonitorCommandTimeout=600
root> chrsrc -s "Name like 'instancehost_db2inst1_hostA3'" IBM.Application MonitorCommandTimeout=600
root> chrsrc -s "Name like 'instancehost_db2inst1_hostB1'" IBM.Application MonitorCommandTimeout=600
root> chrsrc -s "Name like 'instancehost_db2inst1_hostB2'" IBM.Application MonitorCommandTimeout=600
root> chrsrc -s "Name like 'instancehost_db2inst1_hostB3'" IBM.Application MonitorCommandTimeout=600

# In the last two commands, replace 'db2inst1' with your instance
# owning ID, 'hostA3' with the hostname of the first CF added
# to the cluster, and 'hostB3' with the hostname of the second
# CF added to the cluster.
root> chrsrc -s "Name like 'cacontrol_db2inst1_128_hostA3'" IBM.Application MonitorCommandTimeout=600
root> chrsrc -s "Name like 'cacontrol_db2inst1_129_hostB3'" IBM.Application MonitorCommandTimeout=600
To show the updated time-outs, run the following command as root:
lsrsrc -t IBM.Application Name MonitorCommandTimeout CleanupCommandTimeout
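The six instancehost commands differ only in the host name, so they can be applied in a loop. This is a minimal sketch using the example host names and instance ID from above; substitute your own values and leave out the tiebreaker host:
# Minimal sketch: host names and instance ID are the example values; exclude
# the tiebreaker host from the list.
for h in hostA1 hostA2 hostA3 hostB1 hostB2 hostB3; do
    chrsrc -s "Name like 'instancehost_db2inst1_${h}'" IBM.Application MonitorCommandTimeout=600
done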
- To create network resiliency resources for the private Ethernet network, run:
root> /home/db2inst1/sqllib/bin/db2cluster -cfs -repair -network_resiliency -all
On every host, verify the network resiliency configuration by running:
root> /home/db2inst1/sqllib/bin/db2cluster -cfs -verify -network_resiliency

Sample output of a successful verification:
$ db2cluster -cfs -verify -network_resiliency
Successfully verified configurations on local host.
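Running the check host by host is tedious in larger clusters. As a minimal sketch, the verification can be driven from a single host over ssh; the host names and passwordless root ssh between hosts are assumptions here, not requirements from this topic:
# Minimal sketch: host names and passwordless root ssh are assumptions.
for h in hostA1 hostA2 hostA3 hostB1 hostB2 hostB3; do
    echo "== $h =="
    ssh root@"$h" /home/db2inst1/sqllib/bin/db2cluster -cfs -verify -network_resiliency
done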
What to do next
You can create the database.
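As an illustration of this next step, the database could be created as the instance owner. This is a minimal sketch; the database name MYDB and the storage path are placeholder values, not values prescribed by this topic:
# Minimal sketch: MYDB and the storage path are placeholders.
su - db2inst1 -c "db2 CREATE DATABASE MYDB ON /db2fs/data1"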