Follow these procedures to install the geographically dispersed DB2® pureScale® cluster (GDPC) and get it up and running.
On site A, designate hostA1, hostA2, hostB1, and hostB2 as members, with hostB1 as the shared disk member and hostB2 as the tiebreaker member. During installation, the tiebreaker disk must be set up using one of the LUNs; the choice is temporary and can be changed later. The examples that follow use hdiskA2.
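If the tiebreaker needs to be pointed at a different LUN after installation, the db2cluster command accepts a disk device directly. A sketch, assuming hdiskA2 is the chosen LUN:
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -set -tiebreaker -disk /dev/hdiskA2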
The file system that the db2setup command creates for the shared instance metadata is initially a non-replicated GPFS file system; later in this procedure it is converted to a file system that is replicated across the two sites.
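As an optional check, the standard GPFS mmlsfs command reports the current replication factors of that file system (db2fs1 is the device name that mmlsconfig shows later in this procedure); the -m and -r options display the default number of metadata and data replicas, which are both 1 for a non-replicated file system:
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsfs db2fs1 -m -r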
Change the cluster manager tiebreaker from the temporary disk to a majority node set:
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -list -tiebreaker
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -set -tiebreaker -majority
Configuring quorum device for domain 'db2domain_20110224005525' ...
Configuring quorum device for domain 'db2domain_20110224005525' was successful.
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -list -tiebreaker
The current quorum device is of type Majority Node Set.
Verify that SCSI-3 PR is currently in use (the pr=yes remark in the last column):
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsnsd -X
Disk name    NSD volume ID      Device         Devtype  Node name  Remarks
--------------------------------------------------------------------------
gpfs1nsd     091A33584D65F2F6   /dev/hdiskA1   hdisk    hostA1     pr=yes
root@hostA1:/opt/IBM/db2/V10.1/bin> su - db2inst1
db2inst1@hostA1:/home/db2inst1> db2stop force
02/24/2011 01:24:16 0 0 SQL1064N DB2STOP processing was successful.
02/24/2011 01:24:19 1 0 SQL1064N DB2STOP processing was successful.
02/24/2011 01:24:21 3 0 SQL1064N DB2STOP processing was successful.
02/24/2011 01:24:22 2 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
db2inst1@hostA1:/home/db2inst1> exit
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cfs -stop -all
All specified hosts have been stopped successfully.
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmgetstate -a
Node number Node name GPFS state
------------------------------------------
1 hostA1 down
2 hostA2 down
3 hostA3 down
4 hostB1 down
5 hostB2 down
6 hostB3 down
Disable SCSI-3 PR by issuing this command:
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmchconfig usePersistentReserve=no
Verifying GPFS is stopped on all nodes ...
mmchconfig: Processing the disks on node hostA1.torolab.ibm.com
mmchconfig: Processing the disks on node hostA2.torolab.ibm.com
mmchconfig: Processing the disks on node hostA3.torolab.ibm.com
mmchconfig: Processing the disks on node hostB1.torolab.ibm.com
mmchconfig: Processing the disks on node hostB2.torolab.ibm.com
mmchconfig: Processing the disks on node hostB3.torolab.ibm.com
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes. This
is an asynchronous process.
Verify that SCSI-3 PR has been disabled; the pr=yes remark no longer appears:
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsnsd -X
Disk name    NSD volume ID      Device         Devtype  Node name  Remarks
--------------------------------------------------------------------------
gpfs1nsd     091A33584D65F2F6   /dev/hdiskA1   hdisk    hostA1
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsconfig
Configuration data for cluster db2cluster_20110224005554.torolab.ibm.com:
-----------------------------------------------------------
clusterName db2cluster_20110224005554.torolab.ibm.com
clusterId 655893150084494058
autoload yes
minReleaseLevel 3.4.0.7
dmapiFileHandleSize 32
maxFilesToCache 10000
pagepool 256M
verifyGpfsReady yes
assertOnStructureError yes
worker1Threads 150
sharedMemLimit 2047M
usePersistentReserve no
failureDetectionTime 35
leaseRecoveryWait 35
tiebreakerDisks gpfs1nsd
[hostA1]
psspVsd no
adminMode allToAll
File systems in cluster db2cluster_20110224005554.torolab.ibm.com:
------------------------------------------------------------------
/dev/db2fs1
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -list -hostfailuredetectiontime
The host failure detection time is 4 seconds.
Increase the host failure detection time to better tolerate the longer round-trip times between the two sites:
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -set -option hostfailuredetectiontime -value 16
The host failure detection time has been set to 16 seconds.
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -list -hostfailuredetectiontime
The host failure detection time is 16 seconds.
Install the DB2 software on the tiebreaker host T by running db2_install from the directory that contains it:
root@T:/path_containing_db2_install> ./db2_install
Default directory for installation of products - /opt/IBM/db2/V10.1
***********************************************************
Do you want to choose a different directory to install [yes/no] ?
no
Specify one of the following keywords to install DB2 products.
ESE_DSF
Enter "help" to redisplay product names.
Enter "quit" to exit.
***********************************************************
ESE_DSF
DB2 installation is being initialized.
Total number of tasks to be performed: 46
Total estimated time for all tasks to be performed: 2850 second(s)
Task #1 start
...
Task #46 end
The execution completed successfully.
For more information see the DB2 installation log at /tmp/db2_install.log.nnnnnnnn.
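As an optional check, the db2ls command lists the DB2 products and level now installed on T:
root@T> /usr/local/bin/db2ls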
Next, configure db2locssh, which provides secure remote root command execution between the hosts. Display the existing configuration on one of the members:
root@hostA1>/var/db2/db2ssh/db2locssh display_config
version = 1
time_delta = 20 second(s)
debug_level = 2
db2sshid = db2inst1
gskit_path = /opt/IBM/db2/V10.1/lib64/gskit/
fips_mode = on
On the tiebreaker host T, reset the configuration, then set the same GSKit path and db2sshid:
/var/db2/db2ssh/db2locssh reset_config
/var/db2/db2ssh/db2locssh set_gskit_path /opt/IBM/db2/V10.1/lib64/gskit/
/var/db2/db2ssh/db2locssh set_db2sshid db2inst1
root@T>/var/db2/db2ssh/db2locssh display_config
version = 1
time_delta = 20 second(s)
debug_level = 2
db2sshid = db2inst1
gskit_path = /opt/IBM/db2/V10.1/lib64/gskit/
fips_mode = on
Generate the db2locssh key pairs on each host, then verify that each host holds its own private key and the public keys of all the other hosts:
/var/db2/db2ssh/db2locssh generate_keys
hostA1:
root@hostA1.priv
root@hostA1.pub
root@hostA2.pub
root@hostA3.pub
root@hostB1.pub
root@hostB2.pub
root@hostB3.pub
root@T.pub
hostB1:
root@hostB1.priv
root@hostB1.pub
root@hostB2.pub
root@hostB3.pub
root@hostA1.pub
root@hostA2.pub
root@hostA3.pub
root@T.pub
T:
root@T.priv
root@T.pub
root@hostA1.pub
root@hostA2.pub
root@hostA3.pub
root@hostB1.pub
root@hostB2.pub
root@hostB3.pub
Verify that db2locssh works in both directions between T and the other hosts:
root@T>/var/db2/db2ssh/db2locssh root@hostA1 hostname
hostA1
root@T>/var/db2/db2ssh/db2locssh root@hostB1 hostname
hostB1
root@T>/var/db2/db2ssh/db2locssh root@T hostname
T
root@hostA1>/var/db2/db2ssh/db2locssh root@T hostname
T
root@hostB1>/var/db2/db2ssh/db2locssh root@T hostname
T
Change the file system cluster tiebreaker to a majority node set as well:
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cfs -set -tiebreaker -majority
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cfs -list -tiebreaker
The current quorum device is of type Majority Node Set.
On T, run preprpnode against the existing hosts so that T can be added to the cluster manager (RSCT) domain:
root@T> preprpnode hostA1 hostA2 hostB1 hostB2 hostA3 hostB3
root@hostA1:/opt/IBM/db2/V10.1/bin> lsrpnode
Name OpState RSCTVersion
hostB2 Online 3.1.2.2
hostB3 Online 3.1.2.2
hostA3 Online 3.1.2.2
hostB1 Online 3.1.2.2
hostA2 Online 3.1.2.2
hostA1 Online 3.1.2.2
root@hostA1:/opt/IBM/db2/V10.1/bin> /home/db2inst1/sqllib/bin/db2cluster -cm -add -host T
Adding node 'T' to the cluster ...
Trace spooling could not be enabled on the local host.
Adding node 'T' to the cluster was successful.
root@hostA1:/opt/IBM/db2/V10.1/bin> lsrpnode
Name OpState RSCTVersion
T Online 3.1.2.2
hostB3 Online 3.1.2.2
hostB2 Online 3.1.2.2
hostB1 Online 3.1.2.2
hostA3 Online 3.1.2.2
hostA2 Online 3.1.2.2
hostA1 Online 3.1.2.2
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsnode
GPFS nodeset Node list
------------- ------------------------------------------------
db2cluster_20110224005554 hostA1 hostA2 hostA3 hostB1 hostB2 hostB3
Add T to the GPFS cluster as a quorum node; the client designation indicates that T does not act as a manager node:
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmaddnode T:quorum-client
Thu Feb 24 01:49:38 EST 2011: mmaddnode: Processing node T.torolab.ibm.com
mmaddnode: Command successfully completed
mmaddnode: Warning: Not all nodes have proper GPFS license designations.
mmaddnode: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsnode
============================================================================
| Warning:                                                                 |
| This cluster contains nodes that do not have a proper GPFS license      |
| designation. This violates the terms of the GPFS licensing agreement.   |
| Use the mmchlicense command and assign the appropriate GPFS licenses    |
| to each of the nodes in the cluster. For more information about GPFS    |
| license designation, see the Concepts, Planning, and Installation Guide.|
============================================================================
GPFS nodeset Node list
------------- ----------------------------------------------
db2cluster_20110224005554 hostA1 hostA2 hostA3 hostB1 hostB2 hostB3 T
root@T:/opt/IBM/db2/V10.1/bin> ./db2cluster -cfs -add -license
The license for the shared file system cluster has been successfully added.
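To review the license designations across the cluster, you can optionally run the GPFS mmlslicense command:
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlslicense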
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsnode
GPFS nodeset Node list
------------- --------------------------------------------------
db2cluster_20110224005554 hostA1 hostA2 hostA3 hostB1 hostB2 hostB3 T
echo "example text" > /var/mmfs/etc/ignoreStartupMount
If SCSI-3 PR is required, you can turn the SCSI-3 PR flag back on (a sketch follows). In this example it remains disabled, which the mmlsconfig output confirms (usePersistentReserve no).
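The re-enable step mirrors the earlier disable command; GPFS must again be stopped on all nodes when the flag is changed. A sketch:
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmchconfig usePersistentReserve=yes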
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsconfig
Configuration data for cluster db2cluster_20110224005554.torolab.ibm.com:
----------------------------------------------------------
clusterName db2cluster_20110224005554.torolab.ibm.com
clusterId 655893150084494058
autoload yes
minReleaseLevel 3.4.0.7
dmapiFileHandleSize 32
maxFilesToCache 10000
pagepool 256M
verifyGpfsReady yes
assertOnStructureError yes
worker1Threads 150
sharedMemLimit 2047M
usePersistentReserve no
failureDetectionTime 35
leaseRecoveryWait 35
[T]
unmountOnDiskFail yes
[common]
[hostA1]
psspVsd no
adminMode allToAll
File systems in cluster db2cluster_20110224005554.torolab.ibm.com:
------------------------------------------------------------------
/dev/db2fs1
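Note the [T] stanza in the output above: unmountOnDiskFail is set to yes on the tiebreaker host, so a disk failure observed from T unmounts the file system locally rather than marking the disk down for the whole cluster. If the setting ever needs to be reapplied, standard mmchconfig syntax scoped to node T can be used (a sketch):
root@hostA1:/> /usr/lpp/mmfs/bin/mmchconfig unmountOnDiskFail=yes -N T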
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmgetstate -a
Node number Node name GPFS state
------------------------------------------
1 hostA1 down
2 hostA2 down
3 hostA3 down
4 hostB1 down
5 hostB2 down
6 hostB3 down
7 T down
Tune the GPFS failure detection and lease recovery times (GPFS must be stopped on all nodes while these settings are changed):
root@hostA1:/> /usr/lpp/mmfs/bin/mmchconfig failureDetectionTime=30
Verifying GPFS is stopped on all nodes ...
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
root@hostA1:/> /usr/lpp/mmfs/bin/mmchconfig leaseRecoveryWait=25
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
Verify that the private interconnect host names resolve to the 10.1.1.0 subnet, and then configure GPFS to prefer that subnet for its daemon traffic:
root@hostA1:/opt/IBM/db2/V10.1/bin> ping hostA1-ib0
PING hostA1-ib0.torolab.ibm.com (10.1.1.1): 56 data bytes
64 bytes from 10.1.1.1: icmp_seq=0 ttl=255 time=0 ms
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmchconfig subnets=10.1.1.0
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
root@hostA1:/opt/IBM/db2/V10.1/bin> /usr/lpp/mmfs/bin/mmlsconfig
Configuration data for cluster db2cluster_20110224005554.torolab.ibm.com:
--------------------------------------------------------
clusterName db2cluster_20110224005554.torolab.ibm.com
clusterId 655893150084494058
autoload yes
minReleaseLevel 3.4.0.7
dmapiFileHandleSize 32
maxFilesToCache 10000
pagepool 256M
verifyGpfsReady yes
assertOnStructureError yes
worker1Threads 150
sharedMemLimit 2047M
usePersistentReserve no
failureDetectionTime 30
leaseRecoveryWait 25
[T]
unmountOnDiskFail yes
[common]
subnets 10.1.1.0
[hostA1]
psspVsd no
adminMode allToAll
File systems in cluster db2cluster_20110224005554.torolab.ibm.com:
------------------------------------------------------------------
/dev/db2fs1
Set the file system cluster's primary configuration server to a host on site A and the secondary to a host on site B, so that the loss of a single site leaves one configuration server available:
root@hostA1> /usr/lpp/mmfs/bin/mmchcluster -p hostA1 -s hostB1
root@hostA1:/> /usr/lpp/mmfs/bin/mmlscluster
GPFS cluster information
========================
GPFS cluster name: db2cluster_20110224005554.torolab.ibm.com
GPFS cluster ID: 655893150084494058
GPFS UID domain: db2cluster_20110224005554.torolab.ibm.com
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
GPFS cluster configuration servers:
-----------------------------------
Primary server: hostA1.torolab.ibm.com
Secondary server: hostB1.torolab.ibm.com
Disable loose source routing (LSR) on the RSCT communication groups. First list the groups:
root@hostA1:/> lscomg
Name Sensitivity Period Priority Broadcast SourceRouting NIMPathName NIMParameters Grace MediaType UseForNodeMembership
CG1 4 1.6 1 Yes Yes 60 1 (IP) 1
CG2 4 1.6 1 Yes Yes 60 1 (IP) 1
root@hostA1:/> chcomg -x r CG1
root@hostA1:/> chcomg -x r CG2
List the communication groups again to verify that SourceRouting is now set to No:
root@hostA1:/> lscomg
Name Sensitivity Period Priority Broadcast SourceRouting NIMPathName NIMParameters Grace MediaType UseForNodeMembership
CG1 4 1.6 1 Yes No 60 1 (IP) 1
CG2 4 1.6 1 Yes No 60 1 (IP) 1
Note that if at any time the db2cluster -cm -delete -domain and db2cluster -cm -create -domain commands are run to recreate the TSA domain, then LSR (loose source routing) must be disabled again, as sketched below.
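To disable LSR again, repeat the chcomg sequence shown above:
root@hostA1:/> chcomg -x r CG1
root@hostA1:/> chcomg -x r CG2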
Verify that /etc/hosts on each host contains entries for every host's Ethernet and private interconnect (ib0) addresses, as well as the tiebreaker host T:
root:/> cat /etc/hosts
10.1.1.1 hostA1-ib0.torolab.ibm.com hostA1-ib0
10.1.1.2 hostA2-ib0.torolab.ibm.com hostA2-ib0
10.1.1.3 hostA3-ib0.torolab.ibm.com hostA3-ib0
10.1.1.4 hostB1-ib0.torolab.ibm.com hostB1-ib0
10.1.1.5 hostB2-ib0.torolab.ibm.com hostB2-ib0
10.1.1.6 hostB3-ib0.torolab.ibm.com hostB3-ib0
9.26.82.1 hostA1.torolab.ibm.com hostA1
9.26.82.2 hostA2.torolab.ibm.com hostA2
9.26.82.3 hostA3.torolab.ibm.com hostA3
9.26.82.4 hostB1.torolab.ibm.com hostB1
9.26.82.5 hostB2.torolab.ibm.com hostB2
9.26.82.6 hostB3.torolab.ibm.com hostB3
9.23.1.12 T
After the cluster has been installed and is running, set up GPFS replication across the two sites.
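Replication setup is described in its own procedure. At a high level, it raises the file system's replication factors and restripes the existing data, along these lines (a sketch only; the NSDs must first be assigned to per-site failure groups, the file system must have been created with maximum replication of at least 2, and db2fs1 is the device name from this example):
root@hostA1:/> /usr/lpp/mmfs/bin/mmchfs db2fs1 -m 2 -r 2
root@hostA1:/> /usr/lpp/mmfs/bin/mmrestripefs db2fs1 -R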