Scenario - Deploying a two-site multiple-standby cluster on Amazon Web Services with three hosts
This scenario provides details on the planning, configuration, and deployment of a three-host cluster on Amazon Web Services (AWS) across multiple availability zones. Pacemaker is used exclusively as the cluster manager for this scenario; IBM® Tivoli® System Automation for Multiplatforms (SA MP) is not supported.
Another popular disaster recovery option is a two-site, multiple-standby HADR deployment with same-site failover automation. For more information about that scenario, see Scenario - Deploying a two-sites multiple standby cluster with same-site failover automation.
Objective
- The principal primary host and principal standby host must be in AZ1 with HADR SYNC mode.
- The single auxiliary standby host must be in AZ2 with HADR SUPERASYNC mode.
- Automated failover with Pacemaker must be set up between the two hosts in AZ1.
- No Pacemaker cluster setup can exist in AZ2 or across AZ1 and AZ2.
- Manual takeover from the auxiliary standby is needed for disaster recovery.
- The AWS Overlay IP must be set up in AZ1 to allow virtual IP support between the principal primary and principal standby hosts.
- The AWS Elastic IP must be set up as the alternate server in your client configuration to allow transparent failover when the auxiliary standby takes over as the primary.

Configure your database with archive logging
HADR is supported only on a database that is configured with archive logging. If your database is configured with circular logging, you must first set the logarchmeth1 configuration parameter (and optionally logarchmeth2) to enable archive logging. An offline backup of the database is required after the database is changed to use archive logging.
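For example, a minimal sketch of switching to archive logging and taking the required offline backup (the archive path /db2/archive_logs and the database name hadrdb are assumptions for illustration):
# Switch from circular to archive logging; this puts the database in backup pending state
db2 UPDATE DB CFG FOR hadrdb USING LOGARCHMETH1 DISK:/db2/archive_logs/
# Take the offline backup that clears the backup pending state
db2 BACKUP DB hadrdb TO backup_dir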
Environment
Hostname | Instance name | HADR service port | SVCENAME (instance port) | Intended role
---|---|---|---|---
Host_A | db2inst1 | 10 | 20000 | Principal primary
Host_B | db2inst2 | 20 | 20000 | Principal standby
Host_C | db2inst3 | 30 | 20000 | Auxiliary standby
Configure a multiple standby setup
- Take an offline backup of the intended principal primary HADR database by using the following command:
db2 BACKUP DB hadrdb TO backup_dir
- Copy the backup image to the other hosts. On each of the intended standby hosts, drop any old copy of the database that might exist and restore the backup image:
db2 DROP DB hadrdb
db2 RESTORE DB hadrdb FROM backup_dir
- After the databases are restored on all standby hosts, as in a regular HADR setup, the following database configuration parameters must be explicitly set:
- hadr_local_host
- hadr_local_svc
- hadr_remote_host
- hadr_remote_inst
- hadr_remote_svc
On the principal primary, the settings for the hadr_remote_host, hadr_remote_inst, and hadr_remote_svc configuration parameters correspond to the hostname, instance name, and port number of the principal standby. On the principal standby and the auxiliary standby, the values of these configuration parameters correspond to the hostname, instance name, and port number of the principal primary.
In addition, hostnames and port numbers are used to set the hadr_target_list configuration parameter on all the databases. The following example shows the hadr_target_list configuration parameter set for hosts A, B, and C:

Hostname | Intended role | hadr_target_list
---|---|---
Host_A | Principal primary | Host_B:20\|Host_C:30
Host_B | Principal standby | Host_A:10\|Host_C:30
Host_C | Auxiliary standby | Host_A:10\|Host_B:20
In addition to the hadr_target_list configuration settings, the hadr_syncmode parameter is set to SYNC across all databases, including the auxiliary standby. The synchronization mode that is configured with the hadr_syncmode parameter takes effect only when the database becomes the principal primary or the principal standby; until then, the auxiliary standby always runs with an effective synchronization mode of SUPERASYNC.
- On each of the databases, update the configuration parameters.
On Host_A (principal primary):
db2 "UPDATE DB CFG FOR hadrdb USING HADR_TARGET_LIST Host_B:20|Host_C:30 HADR_REMOTE_HOST Host_B HADR_REMOTE_SVC 20 HADR_LOCAL_HOST Host_A HADR_LOCAL_SVC 10 HADR_SYNCMODE sync HADR_REMOTE_INST db2inst2"
db2 update alternate server for database hadrdb using hostname Host_C port 20000
On Host_B (principal standby):
db2 "UPDATE DB CFG FOR hadrdb USING HADR_TARGET_LIST Host_A:10|Host_C:30 HADR_REMOTE_HOST Host_A HADR_REMOTE_SVC 10 HADR_LOCAL_HOST Host_B HADR_LOCAL_SVC 20 HADR_SYNCMODE sync HADR_REMOTE_INST db2inst1"
db2 update alternate server for database hadrdb using hostname Host_C port 20000
On Host_C (auxiliary standby):
db2 "UPDATE DB CFG FOR hadrdb USING HADR_TARGET_LIST Host_A:10|Host_B:20 HADR_REMOTE_HOST Host_A HADR_REMOTE_SVC 10 HADR_LOCAL_HOST Host_C HADR_LOCAL_SVC 30 HADR_SYNCMODE sync HADR_REMOTE_INST db2inst1"
db2 update alternate server for database hadrdb using hostname 192.168.1.81 port 20000
Note: 192.168.1.81 is the overlay IP address of the primary site, which is configured in a later step.
After completion of the previously outlined steps, the configuration of each database is as follows:

Configuration parameter | Host_A | Host_B | Host_C
---|---|---|---
hadr_target_list | Host_B:20\|Host_C:30 | Host_A:10\|Host_C:30 | Host_A:10\|Host_B:20
hadr_remote_host | Host_B | Host_A | Host_A
hadr_remote_svc | 20 | 10 | 10
hadr_remote_inst | db2inst2 | db2inst1 | db2inst1
hadr_local_host | Host_A | Host_B | Host_C
hadr_local_svc | 10 | 20 | 30
Configured hadr_syncmode | SYNC | SYNC | SYNC
Effective hadr_syncmode | N/A | SYNC | SUPERASYNC

Note: The effective hadr_syncmode can be viewed by running the db2pd -db hadrdb -hadr command on each host.
Note: Verify that the AWS security policy allows TCP connections on the ports that are needed for the Db2 instance ports and the HADR service ports. By default, all communications are restricted within the virtual private cloud (VPC). To allow connections, an inbound rule can be configured for the security group that belongs to the VPC. For more information, see Authorize inbound traffic for your Linux instances.
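For example, the inbound rules could be added with the AWS CLI; the security group ID and the VPC CIDR block below are placeholder assumptions:
# Allow the HADR service ports (10, 20, 30) from within the VPC
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 10-30 --cidr 10.0.0.0/16
# Allow the Db2 instance port (SVCENAME)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 20000 --cidr 10.0.0.0/16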
Starting the HADR databases
- Start HADR on the standby databases first by issuing the following command on both Host_B and Host_C:
db2 START HADR ON DB hadrdb AS STANDBY
- Start HADR on the principal primary database. In this example, the primary host is Host_A:
db2 START HADR ON DB hadrdb AS PRIMARY
- To verify that HADR is up and running, query the status of the databases from the principal primary on Host_A by running the db2pd -db hadrdb -hadr command, which returns information about all the standby databases. For example:

Database Member 0 -- Database HADRDB -- Active -- Up 0 days 13:08:27 -- Date 2021-11-11-05.06.42.980971

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS = TCP_PROTOCOL
PRIMARY_MEMBER_HOST = HOST_A
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = HOST_B
STANDBY_INSTANCE = db2inst2
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SUPERASYNC
STANDBY_ID = 2
LOG_STREAM_ID = 0
HADR_STATE = REMOTE_CATCHUP
HADR_FLAGS = TCP_PROTOCOL
PRIMARY_MEMBER_HOST = HOST_A
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = HOST_C
STANDBY_INSTANCE = db2inst3
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED

Once HADR is running, Pacemaker resources need to be created for cluster management on Host_A and Host_B. Create the Pacemaker resources on the primary site first.
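For a quick check of the key fields, the db2pd output can be filtered; the grep pattern below is only a convenience, not part of the Db2 tooling:
# Show only the role, state, syncmode, and connection status fields
db2pd -db hadrdb -hadr | grep -E 'HADR_ROLE|HADR_STATE|HADR_SYNCMODE|HADR_CONNECT_STATUS'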
- Complete the following steps as root:
- Create the cluster and Ethernet resource:
db2cm -create -cluster -domain db2ha -host Host_A -publicEthernet eth0 -host Host_B -publicEthernet eth0
- Create the following instance resources:
db2cm -create -instance db2inst1 -host Host_A
db2cm -create -instance db2inst2 -host Host_B
- On Host_A, create the database resource:
db2cm -create -db hadrdb -instance db2inst1
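To confirm that the resources were created, the cluster configuration can be listed; this is a convenience check, and the exact output varies by Db2 and Pacemaker levels:
# Display the Pacemaker resources that db2cm created
db2cm -list
# Alternatively, show the live cluster state with the Pacemaker CLI
crm status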
Configuring the Overlay IP address
After configuring the Pacemaker cluster, an Overlay IP needs to be configured on AWS to act as a dynamic virtual IP that applications can connect to. This Overlay IP points to either Host_A or Host_B, depending on which host is the principal primary. For more information on how to configure Overlay IPs, refer to Setting up a Db2 HADR Pacemaker cluster with Overlay IP as Virtual IP on AWS.
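Under the hood, an Overlay IP is a VPC route table entry that maps a /32 address to the instance that currently hosts the principal primary. As an illustration only, where the route table ID and instance ID are placeholder assumptions and 192.168.1.81 is the overlay address used in this scenario:
# Point the overlay IP at the current principal primary's EC2 instance
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 192.168.1.81/32 --instance-id i-0123456789abcdef0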
To ensure application transparency in the event of a disaster recovery takeover, an alternate server list needs to be set up that points to an IP address of the auxiliary standby.
- If the clients are all located within the same VPC as the HADR cluster, no action is required. The auxiliary host's IP address can be used as the alternate server IP.
- If the clients can connect from outside of the VPC, the auxiliary standby must be set up with a public IP address that can be accessed from outside of AWS. AWS provides multiple solutions for this, including, but not limited to, the following (see the sketch after this list):
- Add an additional route to the route table. For more information, refer to the AWS documentation: Add and remove routes from a route table.
- Attach an Elastic IP to the auxiliary standby’s EC2 instance. For more information, refer to the AWS documentation: Elastic IP addresses.
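A minimal sketch of the Elastic IP option with the AWS CLI, where the instance ID and allocation ID are placeholder assumptions:
# Allocate a new Elastic IP in the VPC
aws ec2 allocate-address --domain vpc
# Associate it with the auxiliary standby's EC2 instance
aws ec2 associate-address --instance-id i-0fedcba9876543210 \
    --allocation-id eipalloc-0123456789abcdef0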
Performing a manual takeover for disaster recovery
If both Host_A and Host_B go down in AZ1, a manual takeover is needed.
- On Host_C, have the auxiliary standby take over as the principal primary:
db2 takeover hadr on db hadrdb by force
Note: Because both hosts in AZ1 are down, the takeover must be issued BY FORCE. There is no automation on this site, so this should be a temporary state for disaster recovery.
- Once either host in the original availability zone comes back online, run the following command on Host_A or Host_B:
db2 takeover hadr on db hadrdb
Pacemaker automatically detects that the database is the principal primary again and continues to manage all the resources.
Configuring the client connections
The following sample db2dsdriver.cfg client configuration file uses the overlay IP address (192.168.1.81) as the primary connection address and Host_C as the alternate server for disaster recovery:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<dsncollection>
<dsn alias="HADRDB" name="HADRDB" host="192.168.1.81" port="20000" />
</dsncollection>
<databases>
<database name="HADRDB" host="192.168.1.81" port="20000">
<acr>
<!-- Automatic client reroute (ACR) is already enabled by default -->
<parameter name="enableSeamlessAcr" value="true" />
<!-- Enable the alternate server list for the application's first connection -->
<parameter name="enableAlternateServerListFirstConnect" value="true" />
<alternateserverlist>
<server hostname="Host_C" port="20000" />
</alternateserverlist>
</acr>
</database>
</databases>
</configuration>
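To check that the configuration file is valid and that the alternate server list is picked up, the db2cli utility can be used as a convenience check (the DSN name matches the alias defined above):
db2cli validate -dsn HADRDB -connect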
Takeover behavior
Host_A | Host_B | Host_C
---|---|---
Principal primary | Principal standby | Auxiliary standby
- Principal primary database failure
- When the principal primary database fails on Host_A, the principal standby database on Host_B automatically takes over as the principal primary. When the old principal primary database comes back online, it reintegrates as the principal standby.

Host_A | Host_B | Host_C
---|---|---
Principal standby | Principal primary | Auxiliary standby
- Principal standby failure
- When the principal standby database fails on Host_B, Pacemaker attempts to bring it back as the principal standby, while all other databases remain in their current roles.

Host_A | Host_B | Host_C
---|---|---
Principal primary | Principal standby | Auxiliary standby
- Auxiliary standby failure
- When the auxiliary standby database fails on Host_C, the database must be brought back online manually because there is no automation on this host.

Host_A | Host_B | Host_C
---|---|---
Principal primary | Principal standby | Down
- Both the principal primary database and the principal standby fail
- If a manual takeover by force is issued on Host_C, the databases on Host_A and Host_B both become unmanaged by Pacemaker. The database on Host_C becomes the principal primary.

Host_A | Host_B | Host_C
---|---|---
Down | Down | Principal primary

- Once either Host_A or Host_B comes back online, a manual takeover needs to be performed to bring the database back to the principal primary site and re-enable automation. Host_C returns to being the auxiliary standby, and once the other host comes back online in the principal primary site, it reintegrates as the principal standby.

Host_A | Host_B | Host_C
---|---|---
Principal primary | Principal standby | Auxiliary standby