DB2 Version 10.1 for Linux, UNIX, and Windows

Scenario: Deploying an HADR multiple standby database setup

This scenario describes the planning, configuring, and deploying of an HADR setup for a bank called ExampleBANK. The setup has three standby databases: one principal standby and two auxiliary standbys.

Background

Because banking is a 24x7 business, high availability is crucial to ExampleBANK's technology strategy. In addition, ExampleBANK experienced a close call with a hurricane hitting City A, where its head office is located, so the bank also requires a disaster recovery strategy. High availability disaster recovery (HADR) offers a solution that can help the bank achieve both of these goals with a single technology: HADR multiple standby databases.

ExampleBANK considers the following requirements essential for its HADR solution:
An aggressive recovery time objective
As a bank that offers 24-hour online service, ExampleBANK wants to minimize the time that applications cannot connect to their database.
An aggressive recovery point objective
ExampleBANK cannot tolerate data loss, so the RPO should be as close to 0 as possible.
Near-zero planned downtime
ExampleBANK's database should be available as much as possible, even through planned activities such as upgrades and maintenance.
Data protection through geographic dispersion
As part of its compliance standards, ExampleBANK wants the capability to recover operations at a remote location.
Easy deployment and management
ExampleBANK's overburdened IT department wants a solution that is relatively simple to configure and that has automation capabilities.
As the following scenarios illustrate, using the HADR feature in multiple standby mode helps ExampleBANK meet all these requirements.

Planning for a multiple standby setup

ExampleBANK wants to have both high availability and disaster recovery protection from its HADR setup, so the bank decides to use the maximum number of standbys: three. To achieve the RTO, the bank must have a standby that is in close synchronization with the primary (a standby that uses SYNC or NEARSYNC mode) and is collocated with the primary. It makes the most sense to have this standby be the principal standby because only that standby supports all synchronization modes. Both the primary and the principal standby are located in ExampleBANK's head office in City A and are connected by a LAN.

In addition, to protect the bank's data from being lost because of a disaster, the ExampleBANK DBA chooses to set up two standbys in the bank's regional office in City B. The regional office is connected to the head office in City A by a WAN. The distance between the two cities will not affect the primary because the standbys are auxiliary standbys, which automatically run in SUPERASYNC mode. The DBA can provide additional justification for the costs of these additional databases by setting up one of them to use the reads on standby feature and the other to use the time-delayed replay feature. Also, these standbys can help maintain database availability through a rolling update or maintenance scenario, preventing the loss of HADR protection.

Configuring a multiple standby setup

The ExampleBANK DBA takes a backup of the intended primary database, HADR_DB:
DB2 BACKUP DB hadr_db TO backup_dir
The DBA then restores the backup onto each of the intended standby hosts by issuing the following command:
DB2 RESTORE DB hadr_db FROM backup_dir
Tip: For more information about options for creating a standby, see Initializing a standby database.
For the initial setup, the ExampleBANK DBA decides that most of the default configuration settings are sufficient. However, as in a regular HADR setup, the following database configuration parameters must be explicitly set:
  • hadr_local_host
  • hadr_local_svc
  • hadr_remote_host
  • hadr_remote_inst
  • hadr_remote_svc
To obtain the correct values for those configuration parameters, the DBA determines the host name, port number, and instance name of the four databases that will be in the HADR setup:
Table 1. Host name, port number, and instance name for databases
Intended role Host name Port number Instance name
Primary host1 10 dbinst1
Principal standby host2 40 dbinst2
Auxiliary standby host3 41 dbinst3
Auxiliary standby host4.ibm.com 42 dbinst4
On the primary, the settings for the hadr_remote_host, hadr_remote_inst, and hadr_remote_svc configuration parameters correspond to the host name, instance name, and port number of the principal standby. On the standbys, the values of these configuration parameters correspond to the host name, port number, and instance name of the primary. In addition, the DBA uses the host name and port values to set the hadr_target_list configuration parameter on all the databases. Also, although it is not required, the DBA adds the information about all the standbys in the setup to the target list of each of the other standbys. For more information about this topic, see Database configuration for high availability disaster recovery (HADR).

As mentioned earlier, the bank wants the closest possible synchronization between the primary and principal standby, so the DBA sets the hadr_syncmode parameter on the primary to SYNC. Although the principal standby will automatically have its effective synchronization mode set to SYNC after it connects to the primary, the DBA still sets the hadr_syncmode parameter to SYNC on the principal standby. The reason is that if the principal standby switches role with the primary, the synchronization mode for the new primary and principal standby pair will also be SYNC.

The DBA decides to specify host2, which is in a different city from the auxiliary standbys, as the principal standbys for the auxiliary standbys. If one of the auxiliaries becomes the primary, SUPERASYNC would be a good synchronization mode between the primary and the remotely located host2. Thus DBA sets the hadr_syncmode parameter on the auxiliary standbys to SUPERASYNC, although the auxiliary standbys will automatically have their effective synchronization modes set to SUPERASYNC after they connect to the primary. For more information about this topic, see High Availability Disaster Recovery (HADR) synchronization mode.

Finally, the DBA has read about the new HADR delayed replay feature, which can be used to intentionally keep a standby database at a point in time that is earlier than the primary by delaying replay of logs. The DBA decides that enabling this feature would improve ExampleBANK's data protection against errant transactions on the primary. The DBA chooses host4, an auxiliary standby, for this feature, and makes a note that this feature must be disabled before host4 can take over as the primary database. For more information about this topic, see HADR delayed replay.

The DBA issues the following commands to update the configuration parameters on each of the databases:
  • On host1 (the primary):
    DB2 "UPDATE DB CFG FOR hadr_db USING
         HADR_TARGET_LIST  host2:40|host3:41|host4:42
         HADR_REMOTE_HOST  host2
         HADR_REMOTE_SVC   40
         HADR_LOCAL_HOST   host1
         HADR_LOCAL_SVC    10
         HADR_SYNCMODE     sync
         HADR_REMOTE_INST  db2inst2"
  • On host2 (the principal standby):
    DB2 "UPDATE DB CFG FOR hadr_db USING
         HADR_TARGET_LIST  host1:10|host3:41|host4:42
         HADR_REMOTE_HOST  host1
         HADR_REMOTE_SVC   10
         HADR_LOCAL_HOST   host2
         HADR_LOCAL_SVC    40
         HADR_SYNCMODE     sync
         HADR_REMOTE_INST  db2inst1"
  • On host3 (an auxiliary standby):
    DB2 "UPDATE DB CFG FOR hadr_db USING
         HADR_TARGET_LIST  host2:40|host1:10|host4:42
         HADR_REMOTE_HOST  host1
         HADR_REMOTE_SVC   10
         HADR_LOCAL_HOST   host3
         HADR_LOCAL_SVC    41
         HADR_SYNCMODE     superasync
         HADR_REMOTE_INST  db2inst1"
  • On host4 (an auxiliary standby):
    DB2 "UPDATE DB CFG FOR hadr_db USING
         HADR_TARGET_LIST  host2.:40|host1:10|host3:41
         HADR_REMOTE_HOST  host2
         HADR_REMOTE_SVC   10
         HADR_LOCAL_HOST   host4
         HADR_LOCAL_SVC    42
         HADR_SYNCMODE     superasync
         HADR_REMOTE_INST  db2inst1
         HADR_REPLAY_DELAY 86400"
Finally, the ExampleBANK DBA wants to enable the HADR reads on standby feature for the following reasons:
  • To make online changes to some of the HADR configuration parameters on the standbys
  • To call the MON_GET_HADR table function on the standbys
  • To divert some of the read-only workload from the primary
The DBA updates the registry variables on the standby databases by issuing the following commands on each of host2, host3, and host4:
DB2SET DB2_HADR_ROS=ON
DB2SET DB2_STANDBY_ISO=UR

Starting the HADR databases

The DBA starts the standby databases first, by issuing the following command on each of host2, host3, and host 4:
DB2 START HADR ON DB hadr_db AS STANDBY

Next, the DBA starts HADR on the primary database, on host1:
DB2 START HADR ON DB hadr_db AS PRIMARY
To verify that HADR is up and running, the DBA queries the status of the databases from the primary on host1 by issuing the db2pd command, which returns information about all of the standbys:
db2pd -db hadr_db -hadr

  Database Member 0 -- Database hadr_db -- Active -- Up 0 days 00:23:17 -- 
Date 06/08/2011 13:57:23

                              HADR_ROLE = PRIMARY
                            REPLAY_TYPE = PHYSICAL
                          HADR_SYNCMODE = SYNC
                             STANDBY_ID = 1
                          LOG_STREAM_ID = 0
                             HADR_STATE = PEER
                    PRIMARY_MEMBER_HOST = host1
                       PRIMARY_INSTANCE = db2inst1
                         PRIMARY_MEMBER = 0
                    STANDBY_MEMBER_HOST = host2
                       STANDBY_INSTANCE = db2inst2
                         STANDBY_MEMBER = 0
                    HADR_CONNECT_STATUS = CONNECTED
               HADR_CONNECT_STATUS_TIME = 06/08/2011 13:38:10.199479 (1307565490)
            HEARTBEAT_INTERVAL(seconds) = 30
                  HADR_TIMEOUT(seconds) = 120
          TIME_SINCE_LAST_RECV(seconds) = 3
               PEER_WAIT_LIMIT(seconds) = 0
             LOG_HADR_WAIT_CUR(seconds) = 0.000
      LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.006298
     LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.516
                    LOG_HADR_WAIT_COUNT = 82
  SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 50772
  SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87616
              PRIMARY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
              STANDBY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
                    HADR_LOG_GAP(bytes) = 0
       STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
         STANDBY_RECV_REPLAY_GAP(bytes) = 0
                       PRIMARY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                       STANDBY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                STANDBY_REPLAY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
           STANDBY_RECV_BUF_SIZE(pages) = 16
               STANDBY_RECV_BUF_PERCENT = 0
             STANDBY_SPOOL_LIMIT(pages) = 0
                   PEER_WINDOW(seconds) = 0
               READS_ON_STANDBY_ENABLED = Y
      STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N

                              HADR_ROLE = PRIMARY
                            REPLAY_TYPE = PHYSICAL
                          HADR_SYNCMODE = SUPERASYNC
                             STANDBY_ID = 2
                          LOG_STREAM_ID = 0
                             HADR_STATE = REMOTE_CATCHUP
                    PRIMARY_MEMBER_HOST = host1
                       PRIMARY_INSTANCE = db2inst1
                         PRIMARY_MEMBER = 0
                    STANDBY_MEMBER_HOST = host3
                       STANDBY_INSTANCE = db2inst3
                         STANDBY_MEMBER = 0
                    HADR_CONNECT_STATUS = CONNECTED
               HADR_CONNECT_STATUS_TIME = 06/08/2011 13:35:51.724447 (1307565351)
            HEARTBEAT_INTERVAL(seconds) = 30
                  HADR_TIMEOUT(seconds) = 120
          TIME_SINCE_LAST_RECV(seconds) = 16
               PEER_WAIT_LIMIT(seconds) = 0
             LOG_HADR_WAIT_CUR(seconds) = 0.000
      LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.006298
     LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.516
                    LOG_HADR_WAIT_COUNT = 82
  SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
  SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
              PRIMARY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
              STANDBY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
                    HADR_LOG_GAP(bytes) = 0
       STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
         STANDBY_RECV_REPLAY_GAP(bytes) = 0
                       PRIMARY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                       STANDBY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                STANDBY_REPLAY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
           STANDBY_RECV_BUF_SIZE(pages) = 16
               STANDBY_RECV_BUF_PERCENT = 0
             STANDBY_SPOOL_LIMIT(pages) = 0
                   PEER_WINDOW(seconds) = 0
               READS_ON_STANDBY_ENABLED = Y
      STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N

                              HADR_ROLE = PRIMARY
                            REPLAY_TYPE = PHYSICAL
                          HADR_SYNCMODE = SUPERASYNC
                             STANDBY_ID = 3
                          LOG_STREAM_ID = 0
                             HADR_STATE = REMOTE_CATCHUP
                    PRIMARY_MEMBER_HOST = host1
                       PRIMARY_INSTANCE = db2inst1
                         PRIMARY_MEMBER = 0
                    STANDBY_MEMBER_HOST = host4
                       STANDBY_INSTANCE = db2inst4
                         STANDBY_MEMBER = 0
                    HADR_CONNECT_STATUS = CONNECTED
               HADR_CONNECT_STATUS_TIME = 06/08/2011 13:46:51.561873 (1307566011)
            HEARTBEAT_INTERVAL(seconds) = 30
                  HADR_TIMEOUT(seconds) = 120
          TIME_SINCE_LAST_RECV(seconds) = 6
               PEER_WAIT_LIMIT(seconds) = 0
             LOG_HADR_WAIT_CUR(seconds) = 0.000
      LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.006298
     LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.516
                    LOG_HADR_WAIT_COUNT = 82
  SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
  SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
              PRIMARY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
              STANDBY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
                    HADR_LOG_GAP(bytes) = 0
       STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
         STANDBY_RECV_REPLAY_GAP(bytes) = 0
                       PRIMARY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                       STANDBY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                STANDBY_REPLAY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
           STANDBY_RECV_BUF_SIZE(pages) = 16
               STANDBY_RECV_BUF_PERCENT = 0
             STANDBY_SPOOL_LIMIT(pages) = 0
                   PEER_WINDOW(seconds) = 0
               READS_ON_STANDBY_ENABLED = Y
      STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N