Configure multiple HADR databases in a DB2 instance for automated failover using Tivoli System Automation for Multiplatforms

Introduction to HADR and Tivoli System Automation for Multiplatforms

High Availability Disaster Recovery (HADR), an integrated feature of DB2, supports high availability and disaster recovery for DB2 Enterprise Server Edition and DB2 pureScale®. It is a replication technology that ships transaction logs from the primary database to a standby database, and applies the logs on the standby to keep the primary and standby databases in sync. If the primary database becomes unavailable due to a planned maintenance activity or an unplanned outage, the standby database can assume the role of primary.

Tivoli System Automation for Multiplatforms is a high-availability cluster solution. It has mechanisms to detect system failures and initiate automated corrective actions, such as restarting the resource or failing over to a backup resource without user intervention. There are several advantages to configuring HADR databases for automated failover using Tivoli System Automation for Multiplatforms.

Automated failover for HADR databases using Tivoli System Automation for Multiplatforms

If DB2 databases are configured for HADR without some type of cluster manager, a database administrator needs to manually issue the commands for the standby database to assume the role of primary. This requires that the database administrator manually monitor the systems for any failure.

Starting with DB2 9.5, Tivoli System Automation for Multiplatforms is prepackaged with your DB2 product, providing an integrated solution for configuring your HADR databases for automated failover. Basically, you create a cluster using Tivoli System Automation for Multiplatforms with the HADR primary and standby database machines. The Tivoli System Automation for Multiplatforms cluster manager can then monitor the cluster for any failure events, such as a system crash or DB2 instance crash. It then performs the required corrective action, such as restarting the instance or failing over the instance to the standby, thereby ensuring high availability.

Essential concepts

It is important to understand db2haicu (the tool provided by DB2 to set up automated failover using Tivoli System Automation for Multiplatforms), virtual IP, and automatic client reroute.

db2haicu

DB2 9.5 and later provides a text-based utility called db2haicu (DB2 high-availability instance configuration utility) for configuring and administering highly available databases. The utility requires various inputs, such as the primary and standby machine details, network details, and the HADR database names.

The utility can be run in two modes: interactive or using an XML input file. In interactive mode, the user is prompted to enter the details required to create the System Automation for Multiplatforms cluster and configure the databases for high availability. With an XML input file, the input parameters are included in the file and passed to db2haicu. The utility parses the XML file to collect various details related to the cluster and databases and does the necessary configuration.

Running the db2haicu utility in interactive mode and using an XML file are discussed later in Setting up high availability using db2haicu.

Virtual IP

A virtual IP address can be created as a public IP address client applications can use to connect to the primary database. During a system failure and subsequent failover, Tivoli System Automation for Multiplatforms assigns the virtual IP address to the standby database, which takes the role of the primary database. This allows the client applications to seamlessly connect to the new primary.

Automatic client reroute

Automatic client reroute (ACR) is a feature that automatically reroutes clients connecting to the primary database to the standby database when it becomes the primary. Configuring ACR describes how to set up ACR. In high availability setups using Tivoli System Automation for Multiplatforms, the virtual IP is the preferred method for configuring ACR.

Setting up high availability with Tivoli System Automation for Multiplatforms

This section describes the prerequisites for configuring multiple DB2 HADR database pairs running under the same instance for automated failover using Tivoli System Automation for Multiplatforms. Figure 1 shows the topology of the example network used in this tutorial.

Figure 1. Multiple Network HADR topology for automated failover using TSA
Image shows topology of the example network

There are two nodes in the topology:

  • linuxnode01 — the primary node that hosts the HADR primary databases HADRDB1, HADRDB2, and HADRDB3
  • linuxnode02 — the standby node that hosts the respective HADR standby databases

The nodes are connected using two networks: a public network and a private network. The client applications connect to the HADR databases using a virtual IP address hosted on the public network. The private network carries the HADR replication traffic between the primary and the standby databases. If a private network is not available, you can use the same network for both HADR replication and client connections.

Setting up the network

To set up your network:

  1. Assign a static IP address to the adapters in your network. The example uses two network adapters: eth0, which is connected to the public network for clients to connect; and eth1, which is connected to the private network for HADR replication.

    Primary (linuxnode01):

    • eth0: 9.26.97.215 (IP), 255.255.252.0 (Subnet Mask)
    • eth1: 198.72.81.88 (IP), 255.255.255.0 (Subnet Mask)

    Standby (linuxnode02):

    • eth0: 9.26.96.102 (IP), 255.255.252.0 (Subnet Mask)
    • eth1: 198.72.81.89 (IP), 255.255.255.0 (Subnet Mask)

    The public and private networks must be on different subnets.

  2. Add entries to the /etc/hosts file mapping each public IP address to its hostname:
    • 9.26.97.215 linuxnode01 linuxnode01.fullyQualifiedDomainName.com
    • 9.26.96.102 linuxnode02 linuxnode02.fullyQualifiedDomainName.com
  3. Ensure that the hostname command, when executed on each machine, gives the machine name or the fully qualified domain name as output. The same name needs to be provided when the db2haicu utility prompts for node name. In the example, the output of the hostname command on linuxnode01 must be linuxnode01 or linuxnode01.fullyQualifiedDomainName.com.
  4. Ensure that the primary and standby are able to ping each other over the public and private networks. For the example, you need to verify that the following commands complete successfully:
    1. Primary (linuxnode01)
      • ping linuxnode02
      • ping 198.72.81.89
    2. Standby (linuxnode02):
      • ping linuxnode01
      • ping 198.72.81.88
  5. Ensure that the ~/sqllib/db2nodes.cfg file on linuxnode01 contains the entry 0 linuxnode01. Check the same on the linuxnode02 machine.
  6. Ensure that the clock is synchronized on the primary and the standby nodes. (A quick way to verify steps 3 through 6 is shown in the sketch after this list.)
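
The following commands, run on linuxnode01 as the instance owner, are one way to verify steps 3 through 6; repeat the equivalent checks on linuxnode02. The ping count of 3 is just an example value.

hostname                    # must return linuxnode01 or linuxnode01.fullyQualifiedDomainName.com
ping -c 3 linuxnode02       # reachability over the public network
ping -c 3 198.72.81.89      # reachability over the private network
cat ~/sqllib/db2nodes.cfg   # should contain an entry beginning with: 0 linuxnode01
date                        # compare with the standby to confirm that the clocks are synchronized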

Setting up HADR

Before setting up for high availability using Tivoli System Automation for Multiplatforms, you must set up HADR for all three databases: HADRDB1, HADRDB2, and HADRDB3.

As shown in Figure 1, we create a primary DB2 instance called db2inst1 on the primary node linuxnode01 and a standby DB2 instance, also called db2inst1, on the standby node linuxnode02. The primary databases HADRDB1, HADRDB2, and HADRDB3 are created in the DB2 instance db2inst1.

The primary databases have to be backed up on the primary DB2 instance after enabling roll-forward recovery. The backup has to be restored on the standby DB2 instance. After this, the HADR-related database configuration parameters have to be updated on the primary and the standby databases.
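
A minimal command sketch for one database, HADRDB1, is shown below. It assumes a backup directory /db2backup exists on both nodes and that the instance owner can copy files between them; adjust the paths to your environment.

# On the primary (linuxnode01), as the instance owner:
db2 update db cfg for hadrdb1 using LOGARCHMETH1 LOGRETAIN    # enable roll-forward recovery
db2 backup db hadrdb1 to /db2backup                           # offline backup of the primary database
scp /db2backup/HADRDB1* linuxnode02:/db2backup/               # copy the backup image to the standby

# On the standby (linuxnode02), as the instance owner:
db2 restore db hadrdb1 from /db2backup

Repeat the same steps for HADRDB2 and HADRDB3.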

The database configuration parameters to be set on the primary and standby instances for database HADRDB1 are shown in listings 1 and 2.

The output is for DB2 10.5. If you are using an earlier version, you might not see the same entries as in this example.

Listing 1. Primary node: HADR database configuration on primary
db2 get db cfg for hadrdb1 | grep -i hadr
Database Configuration for Database hadrdb1
HADR database role = PRIMARY 
HADR local host name (HADR_LOCAL_HOST) = 198.72.81.88
HADR local service name (HADR_LOCAL_SVC) = 49868
HADR remote host name (HADR_REMOTE_HOST) = 198.72.81.89
HADR remote service name (HADR_REMOTE_SVC) = 49868
HADR instance name of remote server (HADR_REMOTE_INST) = db2inst1
HADR timeout value (HADR_TIMEOUT) = 120 
HADR target list (HADR_TARGET_LIST) =
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC 
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = AUTOMATIC(53248)
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0 
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 300

Listing 2. Standby node: HADR database configuration on standby
db2 get db cfg for hadrdb1 | grep -i hadr
Database Configuration for Database hadrdb1
HADR database role = STANDBY 
HADR local host name (HADR_LOCAL_HOST) = 198.72.81.89
HADR local service name (HADR_LOCAL_SVC) = 49868
HADR remote host name (HADR_REMOTE_HOST) = 198.72.81.88
HADR remote service name (HADR_REMOTE_SVC) = 49868
HADR instance name of remote server (HADR_REMOTE_INST) = db2inst1
HADR timeout value (HADR_TIMEOUT) = 120 
HADR target list (HADR_TARGET_LIST) =
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC 
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = AUTOMATIC(53248)
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0 
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 300
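
The parameters shown in Listings 1 and 2 can be set with the UPDATE DATABASE CONFIGURATION command. A minimal sketch for the primary copy of HADRDB1, using the hosts, ports, and values from Listing 1, follows; on the standby copy, swap the local and remote host values accordingly.

db2 update db cfg for hadrdb1 using HADR_LOCAL_HOST 198.72.81.88 HADR_LOCAL_SVC 49868 \
    HADR_REMOTE_HOST 198.72.81.89 HADR_REMOTE_SVC 49868 HADR_REMOTE_INST db2inst1 \
    HADR_SYNCMODE SYNC HADR_PEER_WINDOW 300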

Notes:

  • The HADR_PEER_WINDOW configuration parameter should be set to a value large enough to ensure that it does not expire before Tivoli System Automation for Multiplatforms fails over the primary role to the standby in the event of a failure. Hence, a minimum value of 180 seconds is recommended and can be increased depending on your environment's needs.
  • The HADR_SYNCMODE configuration parameter can be set to SYNC or NEARSYNC. The SYNC mode ensures zero data loss, while NEARSYNC mode offers a performance improvement over SYNC mode, but carries a risk of data loss.
  • The db2pd command output was changed in DB2 10.1. If you are using an earlier version, the formatting will be different.

After setting the database configuration parameters, HADR must be started first on the standby and then on the primary using the START HADR command.
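
For the example database HADRDB1, the sequence looks like the following; repeat it for HADRDB2 and HADRDB3.

# On the standby node (linuxnode02):
db2 start hadr on db hadrdb1 as standby

# Then on the primary node (linuxnode01):
db2 start hadr on db hadrdb1 as primary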

You can use the db2pd -hadr -db <dbname> command to monitor HADR on the primary or the standby instance.

Listing 3 shows the sample output of the db2pd -hadr -db <dbname> command for the HADRDB1 database.

Listing 3. db2pd -hadr -db hadrdb1 output on the standby
HADR_ROLE = STANDBY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SYNC
STANDBY_ID = 0
LOG_STREAM_ID = 0
HADR_STATE = PEER
PRIMARY_MEMBER_HOST = 198.72.81.88
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = 198.72.81.89
STANDBY_INSTANCE = db2inst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 02/06/2014 07:06:17.405506 (1391688377)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 1
HEARTBEAT_EXPECTED = 14775
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 19
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 65802990
STANDBY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 65802990
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 65802990
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 02/05/2014 13:44:17.000000 (1391625857)
STANDBY_LOG_TIME = 02/05/2014 13:44:17.000000 (1391625857)
STANDBY_REPLAY_LOG_TIME = 02/05/2014 13:44:17.000000 (1391625857)
STANDBY_RECV_BUF_SIZE(pages) = 4298
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 53248
STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 120
PEER_WINDOW_END = 02/11/2014 08:00:25.000000 (1392123625)
READS_ON_STANDBY_ENABLED = N

You need to ensure that all the HADR databases are in PEER state before proceeding to configure high availability with the db2haicu tool. You can verify this using the HADR_STATE field in the db2pd -hadr -db <dbname> output.
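
For example, a quick check for HADRDB1 (run the same command for the other databases) might look like this:

db2pd -hadr -db hadrdb1 | grep -E "HADR_STATE|HADR_CONNECT_STATUS"

Both copies of the database should report HADR_STATE = PEER and HADR_CONNECT_STATUS = CONNECTED.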

Preparing the cluster

Before you use the db2haicu tool, the primary and the standby nodes must be prepared with the proper security environment.

As root user, issue the preprpnode command on the primary and the standby nodes. The node names you provide must match the output of the hostname command on each of the machines.

Listing 4. preprpnode
preprpnode linuxnode01 linuxnode02

Configuring ACR

In the event of a network failure or a machine failure, the DB2 client applications will not be able to connect to the primary database. With the client-reroute feature, the client applications can attempt to connect to an alternate server when they cannot connect to the primary server.

In high-availability configurations, clients connect to the database using a virtual IP address. During a failure of the primary node, Tivoli System Automation for Multiplatforms performs a failover to the standby node, making it the primary. The virtual IP address is assigned to the new primary and client applications can connect to the new primary using the same virtual IP address.

The virtual IP becomes active only after the cluster is configured using Tivoli System Automation for Multiplatforms. Before running the db2haicu tool, ensure that the virtual IP cannot be pinged.

Important: A virtual IP is a per-database resource, not an instance resource. Thus, in the example setup, we need to identify three different virtual IPs — one for each of the databases.
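
Using the three virtual IP addresses chosen for this tutorial, each of the following commands should fail to receive a reply before the cluster is configured:

ping -c 3 9.26.98.232   # virtual IP planned for HADRDB1
ping -c 3 9.26.98.181   # virtual IP planned for HADRDB2
ping -c 3 9.26.97.208   # virtual IP planned for HADRDB3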

To set up ACR, provide the hostname and port number of the alternate server to which the client can connect when the primary server is unavailable. The port number here is the port of the instance TCP/IP listener, as set by the SVCENAME database manager configuration parameter. Ensure that the TCP/IP listener is started for both the primary and the standby instances. The example uses port 49864 for the TCP/IP listener.
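
If the listener is not yet configured, a minimal sketch (run as the instance owner on both the primary and the standby instances) is shown below; the instance must be restarted for the changes to take effect.

db2set DB2COMM=TCPIP                      # enable the TCP/IP communication protocol
db2 update dbm cfg using SVCENAME 49864   # port for the instance TCP/IP listener
db2stop                                   # deactivate or stop HADR first if it is already running
db2start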

Issue the UPDATE ALTERNATE SERVER command on the standby and the primary nodes to configure the virtual IP addresses for client reroute.

Listing 5. UPDATE ALTERNATE SERVER
DB2 UPDATE ALTERNATE SERVER FOR DATABASE HADRDB1 USING HOSTNAME 9.26.98.232 PORT 49864

Repeat the same step for HADRDB2 and HADRDB3 using their respective virtual IPs as the hostname.

If a virtual IP address is not available, you can configure ACR with the actual host addresses that clients use to connect to the primary and standby servers.
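
One way to confirm the ACR settings is to look at the database directory on each node; the alternate server hostname and port appear in the entry for each database:

db2 list db directory | grep -i alternate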

Setting up high availability using db2haicu

This section describes how to use db2haicu to automate HADR failover. As mentioned, db2haicu can be used in interactive mode, or it can take an XML file as input. In both cases, db2haicu needs to be run on the standby first followed by the primary.

Using db2haicu in interactive mode

To use db2haicu in interactive mode to set up automated failover for the three HADR databases, follow the instructions below.

Creating a cluster domain

The first step is to create the cluster domain. The db2haicu tool gathers information such as the current DB2 instance and the databases in the instance, and also activates all the databases under the instance.

  1. Issue the db2haicu command from the standby node. The following information is printed to the screen. Because no domain exists in the system, db2haicu prompts you to create a cluster domain.
    Listing 6. db2haicu output on standby
    db2haicu
    Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).
    
    You can find detailed diagnostic information in the DB2 server diagnostic log file called 
    db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster 
    domains you create.
    
    For more information about configuring your clustered environment using db2haicu, see the 
    topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 
    Information Center.
    
    db2haicu determined the current DB2 database manager instance is db2inst1. The cluster 
    configuration that follows will apply to this instance. 
    
    db2haicu is collecting information on your current setup. This step may take some time as 
    db2haicu will need to activate all databases for the instance to discover all paths ...
    When you use db2haicu to configure your clustered environment, you create cluster domains. 
    For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 
    Information Center. db2haicu is searching the current machine for an existing active cluster 
    domain ...
    db2haicu did not find a cluster domain on this machine. db2haicu will now query the system 
    for information about cluster nodes to create a new cluster domain ...
    
    db2haicu did not find a cluster domain on this machine. To continue configuring your 
    clustered environment for high availability, you must create a cluster domain; otherwise, 
    db2haicu will exit.

    Create a domain and continue? [1]

    1. Yes

    2. No

Type 1 and press Enter at this prompt.

  2. Enter the unique name of the domain and the number of nodes in the domain. The example uses the name HADR_TSA_DOMAIN for the domain we want to create.

    Create a unique name for the new domain:

    HADR_TSA_DOMAIN

    Nodes must now be added to the new domain.

  3. There are two nodes in our domain: linuxnode01 is the primary node and linuxnode02 is the standby node. Enter the node names in the prompt that follows.

    How many cluster nodes will the domain HADR_TSA_DOMAIN contain?

    2

    Enter the host name of a machine to add to the domain: linuxnode01

    Enter the host name of a machine to add to the domain: linuxnode02

    db2haicu can now create a new domain containing the two machines that you specified. If you choose not to create a domain now, db2haicu will exit.

    Create the domain now? [1]

    1. Yes

    2. No

    Select 1.

    Creating domain 'HADR_TSA_DOMAIN' in the cluster
    Creating domain 'HADR_TSA_DOMAIN' in the cluster was successful.

You can see in the output that the utility indicates whether the domain creation succeeded.

Quorum configuration

After creating the domain, you need to configure a network quorum for the cluster domain. A network quorum is an IP address that can be pinged from both the primary and the standby nodes. In the event of a site failure, the quorum decides which node serves as the active node and which node goes offline. When you are choosing the network quorum, ensure that the IP address remains active at all times. The DNS server IP is usually a good choice for the network quorum.

After the domain is created, db2haicu prompts you to enter configuration values for the quorum device.

You can now configure a quorum device for the domain. For more information, see the topic "Quorum devices" in the DB2 Information Center. If you do not configure a quorum device for the domain, then a human operator has to manually intervene if subsets of machines in the cluster lose connectivity.

Configure a quorum device for the domain called HADR_TSA_DOMAIN? [1]

1. Yes

2. No

Select 1.

Select a quorum device. Network quorum is the only choice, so type 1 and press Enter.

The following is a list of supported quorum device types:

1. Network Quorum

Enter the number corresponding to the quorum device type to be used: 1

In the prompt that follows, specify the network address of the quorum device. The example uses 9.26.96.1.

Configuring quorum device for domain 'HADR_TSA_DOMAIN'
Configuring quorum device for domain 'HADR_TSA_DOMAIN' was successful.

The output shows whether the quorum device was successfully configured.

Network setup

After defining the quorum device, you need to define the networks used in the setup. The example uses two networks: a public network, which the client applications use to connect to the database via a virtual IP address; and a private network, on which the HADR replication is carried out. In this part of the setup, db2haicu automatically discovers any network interface cards (NICs) and prompts you to create networks for them.

Create networks for these network interface cards? [1]

1. Yes

2. No

Select 1.

Define each network on each node as either public or private. As shown in the following example output, we want to add the eth0 interface of both nodes to the public network and the eth1 interface to the private network.

Enter the name of the network for the network interface card: 'eth0' on cluster node: linuxnode01

1. Create a new public network for this network interface card.

2. Create a new private network for this network interface card.

Select 1.

Are you sure you want to add the network interface card eth0 on cluster node linuxnode01 to the network db2_public_network_0? [1]

1. Yes

2. No

Select 1.

Adding network interface card 'eth0' on cluster node 'linuxnode01' to the network
'db2_public_network_0' ...
Adding network interface card 'eth0' on cluster node 'linuxnode01' to the network
'db2_public_network_0' was successful.

Enter the name of the network for the network interface card: 'eth0' on cluster node: linuxnode02.

1. db2_public_network_0

2. Create a new public network for this network interface card.

3. Create a new private network for this network interface card.

Select 1.

Are you sure you want to add the network interface card 'eth0' on cluster node 'linuxnode02' to the network db2_public_network_0? [1]

1. Yes

2. No

Select 1.

Adding network interface card 'eth0' on cluster node 'linuxnode02' to the network
'db2_public_network_0' ...
Adding network interface card 'eth0' on cluster node 'linuxnode02' to the network
'db2_public_network_0' was successful.

Enter the name of the network for the network interface card: 'eth1' on cluster node: 'linuxnode02'

1. db2_public_network_0

2. Create a new public network for this network interface card.

3. Create a new private network for this network interface card.

Select 3.

Are you sure you want to add the network interface card 'eth1' on cluster node 'linuxnode02' to the network 'db2_private_network_0'? [1]

1. Yes

2. No

Select 1.

Adding network interface card 'eth1' on cluster node 'linuxnode02' to the network
'db2_private_network_0' ...
Adding network interface card 'eth1' on cluster node 'linuxnode02' to the network
'db2_private_network_0' was successful.

Enter the name of the network for the network interface card: 'eth1' on cluster node: linuxnode01

1. db2_private_network_0

2. db2_public_network_0

3. Create a new public network for this network interface card.

4. Create a new private network for this network interface card.

Select 1.

Are you sure you want to add the network interface card 'eth1' on cluster node 'linuxnode01' to the network 'db2_private_network_0'? [1]

1. Yes

2. No

Select 1.

Adding network interface card 'eth1' on cluster node 'linuxnode01' to the network
'db2_private_network_0'...
Adding network interface card 'eth1' on cluster node 'linuxnode01' to the network
'db2_private_network_0' was successful.

Selecting the cluster manager

At this point, db2haicu prompts you to select the cluster manager for the high-availability configuration; you can select Tivoli System Automation for Multiplatforms or specify a different vendor. The example uses TSA (the value db2haicu uses to refer to Tivoli System Automation for Multiplatforms), and db2haicu then adds the DB2 database partition of the instance running the standby HADR databases to the cluster.

Listing 7. Selecting the cluster manager
The cluster manager name configuration parameter (high-availability configuration
parameter) is not set. For more information, see the topic "cluster_mgr - Cluster
manager name configuration parameter" in the DB2 Information Center. Do you want
to set the high-availability configuration parameter?

The following are valid settings for the high-availability configuration parameter:
1. TSA
2. Vendor
Enter a value for the high-availability configuration parameter: [1]
1
Setting a high-availability configuration parameter for instance 'db2inst1' to 'TSA'.
Adding DB2 database partition '0' to the cluster ...
Adding DB2 database partition '0' to the cluster was successful.

Automating HADR failover

After adding the DB2 standby instance to the cluster domain, db2haicu prompts you to confirm HADR automation for each database in the instance. In the example, db2haicu prompts to enable failover for databases HADRDB1, HADRDB2, and HADRDB3.

Listing 8. Configuring databases for automatic HADR failover
Do you want to validate and automate HADR failover for the HADR database 'HADRDB1'? 
[1]
1. Yes
2.No
1
Adding HADR database HADRDB1 to the domain …
The HADR database 'HADRDB1' has been determined to be valid for high availability. However, 
the database cannot be added to the cluster from this node because db2haicu detected this 
node is the standby for the HADR database 'HADRDB1'. Run db2haicu on the primary for 
the HADR database 'HADRDB1' to configure the database for automated failover.
Do you want to validate and automate HADR failover for the HADR database 'HADRDB2'? 
[1]
1. Yes
2.No
1
Adding HADR database 'HADRDB2' to the domain …
The HADR database 'HADRDB2' has been determined to be valid for high availability. However, 
the database cannot be added to the cluster from this node because db2haicu detected this 
node is the standby for the HADR database 'HADRDB2'. Run db2haicu on the primary for
HADR database 'HADRDB2' to configure the database for automated failover.
Do you want to validate and automate HADR failover for the HADR database 'HADRDB3'? 
[1]
1. Yes
2.No
1
Adding HADR database 'HADRDB3' to the domain …
The HADR database 'HADRDB3' has been determined to be valid for high availability. However, 
the database cannot be added to the cluster from this node because db2haicu detected this 
node is the standby for the HADR database 'HADRDB3'. Run db2haicu on the primary for 
HADR database 'HADRDB3' to configure the database for automated failover.
All cluster configurations have been completed successfully, db2haicu exiting …

Primary instance and virtual IP address setup

After the standby instance has been configured, db2haicu has to be run from the primary DB2 instance.

Listing 9. db2haicu output on primary
$db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file 
called db2diag.log. Also, you can use the utility called db2pd to query the status of the 
cluster domains you create.

For more information about configuring your clustered environment using db2haicu, 
see the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' 
in the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is 'db2inst1'. The 
cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some 
time as db2haicu will need to activate all databases for the instance to discover all 
paths ...
When you use db2haicu to configure your clustered environment, you create cluster 
domains. For more information, see the topic 'Creating a cluster domain with 
db2haicu' in the DB2 Information Center. db2haicu is searching the current machine
for an existing active cluster domain ...
db2haicu found a cluster domain called HADR_TSA_DOMAIN on this machine. 
The cluster configuration that follows will apply to this domain.

As with the standby, the tool prompts you to select the cluster manager; we are using TSA. The tool then detects all the databases created under the instance and asks whether to enable HADR failover for each one. Here, you enable HADR failover for all three databases (HADRDB1, HADRDB2, and HADRDB3). After each HADR database has been added to the cluster, db2haicu prompts you to add a virtual IP address for it:

Retrieving high-availability configuration parameter for instance 'db2inst1'

The cluster manager name configuration parameter (high-availability configuration parameter) is not set. For more information, see the topic "cluster_mgr — Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the high-availability configuration parameter?

The following are valid settings for the high-availability configuration parameter:

1. TSA

2. Vendor

Enter a value for the high-availability configuration parameter: [1]

Select 1.

Setting a high-availability configuration parameter for instance 'db2inst1' to 'TSA'.

Adding DB2 database partition '0' to the cluster
Adding DB2 database partition '0' to the cluster was successful.

Do you want to validate and automate HADR failover for the HADR database 'HADRDB1'? [1]

1. Yes

2. No

Select 1.

Adding HADR database 'HADRDB1' to the domain
Adding HADR database 'HADRDB1' to the domain was successful.

Do you want to configure a virtual IP address for the HADR database 'HADRDB1'? [1]

1. Yes

2. No

Select 1.

Enter the virtual IP address:

9.26.98.232

Enter the subnet mask for the virtual IP address '9.26.98.232': [255.255.255.0]

255.255.252.0

Select the network for the virtual IP '9.26.98.232':

1. db2_private_network_0

2. db2_public_network_0

Select 2.

Adding virtual IP address '9.26.98.232' to the domain
Adding virtual IP address '9.26.98.232' to the domain was successful.

Do you want to validate and automate HADR failover for the HADR database 'HADRDB2'? [1]

1. Yes

2. No

Select 1.

Adding HADR database 'HADRDB2' to the domain
Adding HADR database 'HADRDB2' to the domain was successful.

Do you want to configure a virtual IP address for the HADR database 'HADRDB2'? [1]

1. Yes

2. No

Select 1.

Enter the virtual IP address:

9.26.98.181

Enter the subnet mask for the virtual IP address '9.26.98.181': [255.255.255.0]

255.255.252.0

Select the network for the virtual IP '9.26.98.181':

1. db2_private_network_0

2. db2_public_network_0

Select 2.

Adding virtual IP address '9.26.98.181' to the domain
Adding virtual IP address '9.26.98.181' to the domain was successful.

Do you want to validate and automate HADR failover for the HADR database 'HADRDB3'? [1]

1. Yes

2. No

Select 1.

Adding HADR database 'HADRDB3' to the domain
Adding HADR database 'HADRDB3' to the domain was successful.

Do you want to configure a virtual IP address for the HADR database 'HADRDB3'? [1]

1. Yes

2. No

Select 1.

Enter the virtual IP address:

9.26.97.208

Enter the subnet mask for the virtual IP address '9.26.97.208': [255.255.255.0]

255.255.252.0

Select the network for the virtual IP '9.26.97.208':

1. db2_private_network_0

2. db2_public_network_0

Select 2.

Adding virtual IP address '9.26.97.208' to the domain
Adding virtual IP address '9.26.97.208' to the domain was successful.
All cluster configurations have been completed successfully. db2haicu exiting

Verifying your cluster configuration

After setting up the cluster, you can use either of the following methods to verify the cluster setup:

  • Issue the lssam command as the root user from either the primary or the standby node.
    Listing 10. lssam output as root
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB1-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs
    |- Online IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB2-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs
    |- Online IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB3-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs
    |- Online IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_linuxnode02_0-rg Nominal=Online
    '- Online IBM.Application:db2_db2inst1_linuxnode02_0-rs
    '- Online IBM.Application:db2_db2inst1_linuxnode02_0-rs:linuxnode02
    Online IBM.Equivalency:db2_private_network_0
    |- Online IBM.NetworkInterface:eth1:linuxnode02
    '- Online IBM.NetworkInterface:eth1:linuxnode01
    Online IBM.Equivalency:db2_public_network_0
    |- Online IBM.NetworkInterface:eth0:linuxnode01
    '- Online IBM.NetworkInterface:eth0:linuxnode02
    Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB1-rg_group-equ
    |- Online IBM.PeerNode:linuxnode01:linuxnode01
    '- Online IBM.PeerNode:linuxnode02:linuxnode02
    Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB2-rg_group-equ
    |- Online IBM.PeerNode:linuxnode01:linuxnode01
    '- Online IBM.PeerNode:linuxnode02:linuxnode02
    Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB3-rg_group-equ
    |- Online IBM.PeerNode:linuxnode01:linuxnode01
    '- Online IBM.PeerNode:linuxnode02:linuxnode02
    Online IBM.Equivalency:db2_db2inst1_linuxnode02_0-rg_group-equ
    '- Online IBM.PeerNode:linuxnode02:linuxnode02

    The lssam output shows the states of the instance, database, and network resources in the cluster. You can see a resource group created for each of the HADR databases (HADRDB1, HADRDB2, and HADRDB3) in the output. The virtual IP address created for each database is shown as a Service IP resource in its resource group. For all three databases, the service IP is Online on the primary node, linuxnode01, and Offline on the standby node. During a failover, Tivoli System Automation for Multiplatforms assigns the IP address to the standby node, where it becomes Online, and it goes Offline on the old primary.

  • Issue the db2pd -ha command as the instance owner to get the state of the resources.
    Listing 11. db2pd -ha output
    DB2 HA Status
    Instance Information:
    Instance Name = db2inst1
    Number Of Domains = 1
    Number Of RGs for instance = 3
    
    Domain Information:
    Domain Name = HADR_TSA_DOMAIN
    Cluster Version = 3.1.2.2
    Cluster State = Online
    Number of nodes = 2
    
    Node Information:
    Node Name State
    --------------------- -------------------
    linuxnode02 Online
    linuxnode01 Online
    
    Resource Group Information:
    Resource Group Name = db2_db2inst1_db2inst1_HADRDB3-rg
    Resource Group LockState = Unlocked
    Resource Group OpState = Online
    Resource Group Nominal OpState = Online
    Number of Group Resources = 2
    Number of Allowed Nodes = 2
    Allowed Nodes
    -------------
    linuxnode01
    linuxnode02
    Member Resource Information:
    Resource Name = db2ip_9_26_97_208-rs
    Resource State = Online
    Resource Type = IP
    Resource Name = db2_db2inst1_db2inst1_HADRDB3-rs
    Resource State = Online
    Resource Type = HADR
    HADR Primary Instance = db2inst1
    HADR Secondary Instance = db2inst1
    HADR DB Name = HADRDB3
    HADR Primary Node = linuxnode01
    HADR Secondary Node = linuxnode02
    
    Resource Group Name = db2_db2inst1_db2inst1_HADRDB2-rg
    Resource Group LockState = Unlocked
    Resource Group OpState = Online
    Resource Group Nominal OpState = Online
    Number of Group Resources = 2
    Number of Allowed Nodes = 2
    Allowed Nodes
    -------------
    linuxnode01
    linuxnode02

    With the cluster configuration successfully completed, all the resource states should be Online and the lock states should be Unlocked.

Setting up a cluster using db2haicu XML mode

The db2haicu utility can also be run in XML mode, where all the inputs required by the tool are passed in an XML file. Listing 12 shows an XML file for the topology used in our example.

Listing 12. Sample XML file
<DB2Cluster xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="db2ha.xsd" clusterManagerName="TSA" version="1.0">
  <ClusterDomain domainName="HADR_TSA_DOMAIN">
    <Quorum quorumDeviceProtocol="network" quorumDeviceName="9.26.96.1"/>
    <PhysicalNetwork physicalNetworkName="db2_public_network_0" physicalNetworkProtocol="ip">
      <Interface interfaceName="eth0" clusterNodeName="linuxnode01">
        <IPAddress baseAddress="9.26.97.215" subnetMask="255.255.252.0" networkName="db2_public_network_0"/>
      </Interface>
      <Interface interfaceName="eth0" clusterNodeName="linuxnode02">
        <IPAddress baseAddress="9.26.96.102" subnetMask="255.255.252.0" networkName="db2_public_network_0"/>
      </Interface>
    </PhysicalNetwork>
    <PhysicalNetwork physicalNetworkName="db2_private_network_0" physicalNetworkProtocol="ip">
      <Interface interfaceName="eth1" clusterNodeName="linuxnode01">
        <IPAddress baseAddress="198.72.81.88" subnetMask="255.255.255.0" networkName="db2_private_network_0"/>
      </Interface>
      <Interface interfaceName="eth1" clusterNodeName="linuxnode02">
        <IPAddress baseAddress="198.72.81.89" subnetMask="255.255.255.0" networkName="db2_private_network_0"/>
      </Interface>
    </PhysicalNetwork>
    <ClusterNode clusterNodeName="linuxnode01"/>
    <ClusterNode clusterNodeName="linuxnode02"/>
  </ClusterDomain>
  <FailoverPolicy>
    <HADRFailover></HADRFailover>
  </FailoverPolicy>
  <DB2PartitionSet>
    <DB2Partition dbpartitionnum="0" instanceName="db2inst1"/>
  </DB2PartitionSet>
  <HADRDBSet>
    <HADRDB databaseName="HADRDB1" localInstance="db2inst1" remoteInstance="db2inst1"
            localHost="linuxnode01" remoteHost="linuxnode02"/>
    <VirtualIPAddress baseAddress="9.26.98.232" subnetMask="255.255.252.0" networkName="db2_public_network_0"/>
  </HADRDBSet>
  <HADRDBSet>
    <HADRDB databaseName="HADRDB2" localInstance="db2inst1" remoteInstance="db2inst1"
            localHost="linuxnode01" remoteHost="linuxnode02"/>
    <VirtualIPAddress baseAddress="9.26.98.181" subnetMask="255.255.252.0" networkName="db2_public_network_0"/>
  </HADRDBSet>
  <HADRDBSet>
    <HADRDB databaseName="HADRDB3" localInstance="db2inst1" remoteInstance="db2inst1"
            localHost="linuxnode01" remoteHost="linuxnode02"/>
    <VirtualIPAddress baseAddress="9.26.97.208" subnetMask="255.255.252.0" networkName="db2_public_network_0"/>
  </HADRDBSet>
</DB2Cluster>

In the XML file:

  • The <ClusterDomain> element contains the cluster-wide information, such as quorum, node information, and domain name.
  • The <PhysicalNetwork> sub-element of the <ClusterDomain> element contains all the network-related information for the public network, private network, and the NIC used.
  • The <FailoverPolicy> element specifies the failover policy to be used by the cluster manager.
  • The <DB2PartitionSet> element contains the instance information such as instance name and the partition number.
  • The <HADRDBSet> element covers the HADR database information. It contains the database name, primary and standby instance names, primary and standby node names, and the virtual IP address associated with the database.

To configure automated failover using db2haicu XML mode:

  1. Log on to the standby instance.
  2. Issue the db2haicu -f <XML file name> command on the standby instance.

    The db2haicu command uses the XML file to do all the necessary configurations on the standby side. In case of an error in the XML file or the configuration, db2haicu exits with a non-zero error code. A sample output of using db2haicu on the standby instance is shown below.

    Listing 13. Sample output of db2haicu using XML file on standby
    $ db2haicu -f db2ha_sample_HADR.xml
    Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).
    
    You can find detailed diagnostic information in the DB2 server diagnostic log file called 
    db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster 
    domains you create.
    
    For more information about configuring your clustered environment using db2haicu, see the 
    topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 
    Information Center.
    
    db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster 
    configuration that follows will apply to this instance.
    
    db2haicu is collecting information on your current setup. This step may take some time as 
    db2haicu will need to activate all databases for the instance to discover all paths …
    Creating domain 'HADR_TSA_DOMAIN' in the cluster …
    Creating domain 'HADR_TSA_DOMAIN' in the cluster was successful.
    Configuring quorum device for domain 'HADR_TSA_DOMAIN' ...
    Configuring quorum device for domain 'HADR_TSA_DOMAIN' was successful.
    Adding network interface card 'eth0' on cluster node 'linuxnode01' to the network 
    'db2_public_network_0' ...
    Adding network interface card 'eth0' on cluster node 'linuxnode01' to the network 
    'db2_public_network_0' was successful.
    Adding network interface card 'eth0' on cluster node 'linuxnode02' to the network 
    'db2_public_network_0' ...
    Adding network interface card 'eth0' on cluster node 'linuxnode02' to the network 
    'db2_public_network_0' was successful.
    Adding network interface card 'eth1' on cluster node 'linuxnode01' to the network 
    'db2_private_network_0' ...
    Adding network interface card 'eth1' on cluster node 'linuxnode01' to the network 
    'db2_private_network_0' was successful.
    Adding network interface card 'eth1' on cluster node 'linuxnode02' to the network 
    'db2_private_network_0' ...
    Adding network interface card 'eth1' on cluster node 'linuxnode02' to the network 
    'db2_private_network_0' was successful.
    Adding DB2 database partition '0' to the cluster ...
    Adding DB2 database partition '0' to the cluster was successful.
    HADR database 'HADRDB1' has been determined to be valid for high availability. However, 
    the database cannot be added to the cluster from this node because db2haicu detected this 
    node is the standby for HADR database 'HADRDB1'. Run db2haicu on the primary for 
    HADR database 'HADRDB1' to configure the database for automated failover.
    HADR database 'HADRDB2' has been determined to be valid for high availability. However, 
    the database cannot be added to the cluster from this node because db2haicu detected this 
    node is the standby for HADR database 'HADRDB2'. Run db2haicu on the primary for 
    HADR database 'HADRDB2' to configure the database for automated failover.
    HADR database 'HADRDB3' has been determined to be valid for high availability. However, 
    the database cannot be added to the cluster from this node because db2haicu detected this 
    node is the standby for HADR database 'HADRDB3'. Run db2haicu on the primary for 
    HADR database 'HADRDB3' to configure the database for automated failover.
    All cluster configurations have been completed successfully. db2haicu exiting
  3. Log on to the primary instance.
  4. Issue the db2haicu -f <XML file name> command on the primary instance.

    db2haicu now configures the primary instance and, in case of any error encountered, exits with a non-zero error code. Sample output of using db2haicu on the primary instance is shown below.

    Listing 14. Sample output of db2haicu using XML file on primary
    $db2haicu -f db2ha_sample_HADR.xml
    Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).
    
    You can find detailed diagnostic information in the DB2 server diagnostic log file called 
    db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster 
    domains you create.
    
    For more information about configuring your clustered environment using db2haicu, see the 
    topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 
    Information Center.
    
    db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster 
    configuration that follows will apply to this instance.
    
    db2haicu is collecting information on your current setup. This step may take some time as 
    db2haicu will need to activate all databases for the instance to discover all paths ...
    Configuring quorum device for domain 'HADR_TSA_DOMAIN' ... 
    Configuring quorum device for domain 'HADR_TSA_DOMAIN' was successful.
    Network adapter 'eth0' on node 'linuxnode01' is already defined in network 
    'db2_public_network_0' and cannot be added to another network until it is removed from its
    current network.
    Network adapter 'eth0' on node 'linuxnode02' is already defined in network 
    'db2_public_network_0' and cannot be added to another network until it is removed from its 
    current network.
    Network adapter 'eth1' on node 'linuxnode01' is already defined in network 
    'db2_private_network_0' and cannot be added to another network until it is removed from its 
    current network.
    Network adapter 'eth1' on node 'linuxnode02' is already defined in network 
    'db2_private_network_0' and cannot be added to another network until it is removed from its 
    current network.
    Adding DB2 database partition '0' to the cluster ...
    Adding DB2 database partition '0' to the cluster was successful.
    Adding HADR database 'HADRDB1' to the domain ...
    Adding HADR database 'HADRDB1' to the domain was successful.
    Adding HADR database 'HADRDB2' to the domain ...
    Adding HADR database 'HADRDB2' to the domain was successful.
    Adding HADR database 'HADRDB3' to the domain ...
    Adding HADR database 'HADRDB3' to the domain was successful.
    All cluster configurations have been completed successfully. db2haicu exiting ...

As with interactive mode, you can do post-configuration verification of cluster states using the lssam command or the db2pd command with the -ha option.

Cluster behavior during outages

This section discusses the cluster behavior during outages of the primary or standby instance and behavior while performing a role-switch activity using graceful takeover. The examples still assume the topology shown in Figure 1.

Graceful takeover

When a graceful takeover is performed for any of the databases from the standby node, linuxnode02, the standby assumes the role of the primary. After the takeover command completes successfully, Tivoli System Automation for Multiplatforms assigns the virtual IP address for this database to the new primary.

  1. Issue the TAKEOVER HADR command from the standby node linuxnode02, as in Listing 15.
    Listing 15. TAKEOVER HADR
      DB2 TAKEOVER HADR ON DB HADRDB1
    DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.
  2. After the command completes successfully, issue the lssam command from the primary or the standby node. Listing 16 shows the output after takeover.
    Listing 16. lssam output after takeover
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB1-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs
    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode01
    '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs
    |- Offline IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode01
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB2-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs
    |- Online IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB3-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs
    |- Online IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode02

The lssam output shows that Tivoli System Automation for Multiplatforms has moved the resource groups for the database HADRDB1 to the new primary: linuxnode02.

The virtual IP address for database HADRDB1, 9.26.98.232, is also now online on the node linuxnode02. Any client applications that connected to the database HADRDB1 on the old primary, linuxnode01, now connect to the new primary, linuxnode02, using this virtual IP address.
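
For illustration, a remote client that catalogs the database through the virtual IP needs no catalog change after the takeover. In the sketch below, the node name hadr_vip1 is just an example label, and 49864 is the instance TCP/IP listener port used in this tutorial.

db2 catalog tcpip node hadr_vip1 remote 9.26.98.232 server 49864
db2 catalog database hadrdb1 at node hadr_vip1
db2 connect to hadrdb1 user db2inst1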

The database resources for the other databases, HADRDB2 and HADRDB3, continue to be on the node linuxnode01 because the graceful takeover was issued only for the HADRDB1 database.

Unplanned outage in the primary or the standby instance

If the primary or the standby instance is abnormally terminated, Tivoli System Automation for Multiplatforms attempts to restart the instance and bring the cluster back to its normal state. To simulate this scenario:

  1. Issue the db2_kill command on the primary instance.
  2. Issue the lssam command after the kill. As the resulting output shows, the HADR resources for all three databases and the DB2 instance resources on the primary instance changed their state from Online to Pending Online.
    Listing 17. lssam output after issuing db2_kill on primary
    Online IBM.ResourceGroup:db2_db2inst1_linuxnode01_0-rg Nominal=Online
    '- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs
    '- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs:linuxnode01
    Pending online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB1-rg Nominal=Online
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs
    |- Online IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode02
    Pending online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB2-rg Nominal=Online
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs
    |- Online IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode02
    Pending online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB3-rg Nominal=Online
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs
    |- Online IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode02
  3. Monitor the progress of the restart by periodically issuing the lssam command. After the instance gets restarted by the cluster manager, the Pending online state changes back to Online. The lssam output will look similar to the example below.
    Listing 18. lssam output after the resources come online
    Online IBM.ResourceGroup:db2_db2inst1_linuxnode01_0-rg Nominal=Online
    '- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs
    '- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs:linuxnode01
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB1-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs
    |- Online IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB2-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs
    |- Online IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB3-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode01
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs
    |- Online IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode01
    '- Offline IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode02

In this scenario, the behavior is similar if you issue a db2_kill command on the standby instance. The HADR and DB2 instance resources on the standby instance change their state to Pending online, and after the instance gets restarted, it comes back to Online.

Unplanned outage of the primary or standby machine

In certain cases, such as a system crash, the attempt to restart the failed instance in the same machine fails and Tivoli System Automation for Multiplatforms performs a failover of the HADR resources to the standby machine.

To simulate a primary machine failure:

  1. Unplug the power supply.
  2. Issue the lssam command on the standby node. In the example output below, all the resource groups associated with the primary node are in Failed Offline state.
    Listing 19. lssam output on the standby node after unplugging the primary's power supply
    Failed offline IBM.ResourceGroup:db2_db2inst1_linuxnode01_0-rg Control=MemberInProblemState
    Nominal=Online
    '- Failed offline IBM.Application:db2_db2inst1_linuxnode01_0-rs Control=MemberInProblemState
    '- Failed offline IBM.Application:db2_db2inst1_linuxnode01_0-rs:linuxnode01 Node=Offline
    Pending online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB1-rg Control=MemberInProblemState
    Nominal=Online
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs Control=MemberInProblemState
    |- Failed offline IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode01 Node=Offline
    '- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs Control=MemberInProblemState
    |- Failed offline IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode01 Node=Offline
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode02
    Pending online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB2-rg Control=MemberInProblemState
    Nominal=Online
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs Control=MemberInProblemState
    |- Failed offline IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode01 Node=Offline
    '- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs Control=MemberInProblemState
    |- Failed offline IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode01 Node=Offline
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode02
    Pending online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB3-rg Control=MemberInProblemState
    Nominal=Online
    |- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs Control=MemberInProblemState
    |- Failed offline IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode01 Node=Offline
    '- Pending online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs Control=MemberInProblemState
    |- Failed offline IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode01
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode02

    In this scenario, client connections to the databases fail, and the cluster manager initiates a failover of the HADR resource groups.

    After the standby machine, linuxnode02, pings the quorum device and acquires the network quorum, Tivoli System Automation for Multiplatforms assigns the virtual IP address of each HADR database (HADRDB1, HADRDB2, and HADRDB3) to the eth0 NIC on the standby machine. It then initiates a forced takeover (TAKEOVER HADR ON DB HADRDB1 BY FORCE PEER WINDOW ONLY) for each database, and the standby machine assumes the role of primary. The equivalent manual commands are sketched after Listing 20.

  3. Restore power to the old primary machine, linuxnode01. After the primary machine comes up, Tivoli System Automation for Multiplatforms starts the instance, and all three old primary databases are reintegrated as standbys.
  4. To confirm that the cluster resources have recovered after reintegration, issue the lssam command. HADR replication resumes, and the primary and standby databases eventually return to PEER state.
    Listing 20. lssam output after power restore on primary
    Online IBM.ResourceGroup:db2_db2inst1_linuxnode01_0-rg Nominal=Online
    '- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs
    '- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs:linuxnode01
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB1-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs
    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode01
    '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs
    |- Offline IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode01
    '- Online IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB2-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs
    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode01
    '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs
    |- Offline IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode01
    '- Online IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode02
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB3-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs
    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode01
    '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode02
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs
    |- Offline IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode01
    '- Online IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode02
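
For reference, the forced takeover that the cluster manager issues in step 2 is equivalent to running the following commands manually as the instance owner on the standby node. This is a sketch only; with automation enabled, let Tivoli System Automation for Multiplatforms drive the takeover rather than issuing these yourself.

# One takeover per HADR database configured in the cluster:
db2 "takeover hadr on db HADRDB1 by force peer window only"
db2 "takeover hadr on db HADRDB2 by force peer window only"
db2 "takeover hadr on db HADRDB3 by force peer window only"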

If the standby machine goes down instead, the HADR resources on that node move to Failed offline state, similar to the example above. After the standby machine is restarted, the standby instance is started automatically, and the standby databases eventually return to PEER state with the primary.
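
lssam reports the state of the cluster resources; to confirm the HADR state of each database directly, you can use db2pd on either node. The following is a minimal check; field names vary slightly between DB2 releases, so the grep filter is only one way to trim the output.

# Run as the instance owner; repeat for each HADR database.
db2pd -db HADRDB1 -hadr | grep -iE 'role|state'
db2pd -db HADRDB2 -hadr | grep -iE 'role|state'
db2pd -db HADRDB3 -hadr | grep -iE 'role|state'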

Maintaining the Tivoli System Automation for Multiplatforms cluster

db2haicu provides commands for maintaining your System Automation for Multiplatforms cluster, such as disabling and enabling high availability. It also provides an option for deleting the System Automation for Multiplatforms cluster completely.
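
The maintenance operations described in this section map to the following invocations, all run as the DB2 instance owner. This is only a summary; each operation is shown in detail in the listings that follow.

db2haicu -disable     # temporarily disable automated failover (locks the resource groups)
db2haicu              # run with no options on a disabled instance to re-enable high availability
db2haicu -delete      # remove the databases from the cluster and delete the domain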

Disabling and enabling Tivoli System Automation for Multiplatforms failover

After a cluster has been configured for automated failover, you can disable it temporarily by using the db2haicu command with the -disable option. While high availability is disabled, Tivoli System Automation for Multiplatforms does not perform any automated failover in the event of a system failure on the primary node.

Listing 21 shows the output of the db2haicu command executed with the -disable option.

Listing 21. db2haicu -disable output
db2haicu -disable
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file called 
db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you 
create.

For more information about configuring your clustered environment using db2haicu, see the topic 
called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information 
Center.

db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster 
configuration that follows will apply to this instance.

Are you sure you want to disable high availability (HA) for the database instance 'db2inst1'. This will 
lock all the resource groups for the instance and disable the HA configuration parameter. The 
instance will not failover if a system outage occurs while the instance is disabled. You will need to 
run db2haicu again to enable the instance for HA. Disable HA for the instance 'db2inst1'? [1]
1. Yes
2. No
1
Disabling high availability for instance 'db2inst1' ...
Locking the resource group for HADR database 'HADRDB3' ...
Locking the resource group for HADR database 'HADRDB3' was successful.
Locking the resource group for HADR database 'HADRDB2' ...
Locking the resource group for HADR database 'HADRDB2' was successful.
Locking the resource group for HADR database 'HADRDB1' ...
Locking the resource group for HADR database 'HADRDB1' was successful.
Locking the resource group for DB2 database partition '0' ...
Locking the resource group for DB2 database partition '0' was successful.
Locking the resource group for DB2 database partition '0' ...
Locking the resource group for DB2 database partition '0' was successful.
Disabling high availability for instance 'db2inst1' was successful.
All cluster configurations have been completed successfully. db2haicu exiting …

The following listing shows the lssam output after high availability is disabled for all databases.

Listing 22. lssam output after disabling HA
lssam
Online IBM.ResourceGroup:db2_db2inst1_linuxnode01_0-rg Request=Lock Nominal=Online
'- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs Control=SuspendedPropagated
'- Online IBM.Application:db2_db2inst1_linuxnode01_0-rs:linuxnode01
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB1-rg Request=Lock Nominal=Online
|- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs Control=SuspendedPropagated
|- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode01
'- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB1-rs:linuxnode02
'- Online IBM.ServiceIP:db2ip_9_26_98_232-rs Control=SuspendedPropagated
|- Offline IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode01
'- Online IBM.ServiceIP:db2ip_9_26_98_232-rs:linuxnode02
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB2-rg Request=Lock Nominal=Online
|- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs Control=SuspendedPropagated
|- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode01
'- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB2-rs:linuxnode02
'- Online IBM.ServiceIP:db2ip_9_26_98_181-rs Control=SuspendedPropagated
|- Offline IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode01
'- Online IBM.ServiceIP:db2ip_9_26_98_181-rs:linuxnode02
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB3-rg Request=Lock Nominal=Online
|- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs Control=SuspendedPropagated
|- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode01
'- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB3-rs:linuxnode02
'- Online IBM.ServiceIP:db2ip_9_26_97_208-rs Control=SuspendedPropagated
|- Offline IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode01
'- Online IBM.ServiceIP:db2ip_9_26_97_208-rs:linuxnode02
Online IBM.ResourceGroup:db2_db2inst1_linuxnode02_0-rg Request=Lock Nominal=Online
'- Online IBM.Application:db2_db2inst1_linuxnode02_0-rs Control=SuspendedPropagated
'- Online IBM.Application:db2_db2inst1_linuxnode02_0-rs:linuxnode02
Continues…

After high availability has been disabled, you can re-enable it by running db2haicu without any options. The utility detects the disabled state and prompts you to re-enable high availability for the instance, as shown in the following example.

Listing 23. db2haicu output for enabling high availability
db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file called 
db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you 
create.

For more information about configuring your clustered environment using db2haicu, see the topic 
called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information
 Center.
 
db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster 
configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some time as db2haicu 
will need to activate all databases for the instance to discover all paths ...
When you use db2haicu to configure your clustered environment, you create cluster domains. For 
more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 Information 
Center. db2haicu is searching the current machine for an existing active cluster domain ...
db2haicu found a cluster domain called 'HADR_TSA_DOMAIN' on this machine. The cluster 
configuration that follows will apply to this domain.

db2haicu has detected that high availability has been disabled for the instance 'db2inst1'. Do you 
want to enable high availability for the instance 'db2inst1'? [1]
1. Yes
2. No
1
Retrieving high-availability configuration parameter for instance 'db2inst1' ...
The cluster manager name configuration parameter (high-availability configuration parameter) is not
set. For more information, see the topic "cluster_mgr - Cluster manager name configuration
parameter" in the DB2 Information Center. Do you want to set the high-availability configuration
parameter?
The following are valid settings for the high-availability configuration parameter:
1.TSA
2.Vendor
Enter a value for the high-availability configuration  parameter: [1]
1
Setting a high-availability configuration  parameter for instance 'db2inst1' to 'TSA'.
Enabling high availability for instance 'db2inst1' ...
Enabling high availability for instance 'db2inst1' was successful.
All cluster configurations have been completed successfully. db2haicu exiting ..
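
To verify that the cluster manager configuration parameter is set after re-enabling high availability, you can inspect the database manager configuration. This is a minimal check; the exact label of the cluster_mgr parameter in the output may vary between DB2 releases.

# The Cluster manager (cluster_mgr) entry should show TSA:
db2 get dbm cfg | grep -i cluster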

Deleting the Tivoli System Automation for Multiplatforms domain

To completely delete the Tivoli System Automation for Multiplatforms domain you created and remove automated failover for the HADR databases, run the db2haicu command with the -delete option on each node, starting with the standby. A quick verification is sketched after the steps below.

  1. On the standby node, issue the db2haicu command with the -delete option. In this example, it is issued from node linuxnode02.
    Listing 24. db2haicu -delete output on standby
    db2haicu -delete
    Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).
    
    You can find detailed diagnostic information in the DB2 server diagnostic log file called 
    db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster 
    domains you create.
    
    For more information about configuring your clustered environment using db2haicu, see the 
    topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 
    Information Center.
    
    db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster 
    configuration that follows will apply to this instance.
    
    When you use db2haicu to configure your clustered environment, you create cluster domains. 
    For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 
    Information Center. db2haicu is searching the current machine for an existing active cluster 
    domain ...
    db2haicu found a cluster domain called 'HADR_TSA_DOMAIN' on this machine. The 
    cluster configuration that follows will apply to this domain.
    
    Removing HADR database 'HADRDB3' from the domain ...
    Removing HADR database 'HADRDB3' from the domain was successful.
    Removing HADR database 'HADRDB2' from the domain ...
    Removing HADR database 'HADRDB2' from the domain was successful.
    Removing HADR database 'HADRDB1' from the domain ...
    Removing HADR database 'HADRDB1' from the domain was successful.
    Removing DB2 database partition '0' from the cluster ...
    Removing DB2 database partition '0' from the cluster was successful.
    All cluster configurations have been completed successfully. db2haicu exiting …
  2. On the primary node, issue the db2haicu command with the -delete option. In this example, it is issued from node linuxnode01.
    Listing 25. db2haicu -delete output on primary
    db2haicu -delete
    Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).
    
    You can find detailed diagnostic information in the DB2 server diagnostic log file called 
    db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster 
    domains you create.
    
    For more information about configuring your clustered environment using db2haicu, see the 
    topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 
    Information Center.
    
    db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster 
    configuration that follows will apply to this instance.
    
    When you use db2haicu to configure your clustered environment, you create cluster domains. 
    For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 
    Information Center. db2haicu is searching the current machine for an existing active 
    cluster domain ...
    db2haicu found a cluster domain called 'HADR_TSA_DOMAIN' on this machine. The 
    cluster configuration that follows will apply to this domain.
    
    Removing DB2 database partition '0' from the cluster ...
    Removing DB2 database partition '0' from the cluster was successful.
    Deleting the domain 'HADR_TSA_DOMAIN' from the cluster ...
    Deleting the domain 'HADR_TSA_DOMAIN' from the cluster was successful.
    All cluster configurations have been completed successfully. db2haicu exiting ...
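
Once both runs have completed, you can confirm that the domain and its resources are gone. The following is a minimal check; it assumes the RSCT commands are on the instance owner's path, as they are in a standard Tivoli System Automation for Multiplatforms installation.

lsrpdomain     # HADR_TSA_DOMAIN should no longer be listed
lssam          # should no longer show any db2 resource groups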

Conclusion

In this article, you learned how to set up automated failover for multiple HADR databases in a DB2 instance using Tivoli System Automation for Multiplatforms. We discussed the two methods of setting up high availability with db2haicu, using an XML input file and running it in interactive mode, and we explored cluster behavior during different outage scenarios.

Acknowledgements

Thanks to Phil Stedman, Information Management, IBM Software Group, for his contributions to this article.

We would also like to thank Rob Causley for his editorial contributions.


