DB2 Version 9.7 for Linux, UNIX, and Windows

Database configuration for high availability disaster recovery (HADR)

You can use database configuration parameters to achieve optimal performance with DB2® high availability disaster recovery (HADR).

To achieve optimal performance with DB2 high availability disaster recovery (HADR), ensure that your database configuration meets the following requirements.

Recommendation: To the extent possible, the database configuration parameters and database manager configuration parameters should be identical on the systems where the primary and standby databases reside. If the configuration parameters are not properly set on the standby database the following problems might occur:

Error messages might be returned on the standby database while replaying the log files that were shipped from the primary database.
After a takeover operation, the new primary database will not be able to handle the workload, resulting in performance problems or in applications receiving error messages they were not receiving when they were connected to the original primary database.

Changes to the configuration parameters on the primary database are not automatically propagated to the standby database and must be made manually on the standby database. For dynamic configuration parameters, changes take effect without shutting down and restarting the database management system (DBMS) or the database. For non-dynamic configuration parameters, changes will take effect after the standby database is restarted.

Size of log files configuration parameter on the standby database

One exception to the configuration parameter behavior described in the previous paragraph is the logfilsiz database configuration parameter. Although this parameter is not replicated to the standby database, to guarantee identical log files on both databases, the standby database ignores the local logfilsizconfiguration and creates local log files that match the size of the log files on the primary database.

After a takeover, the original standby (new primary) takes the logfilsiz value that was set on the original primary until the database is restarted. At that point, the new primary reverts to the value configured locally. In addition, the new primary also truncates the current log file and resizes any pre-created log files.

If the databases keep switching roles as a result of a non-forced takeover and neither database is deactivated, then the log file size used is always the one established by the very first primary. However, if there is a deactivate and then a restart on the original standby (new primary) then it would use the log file size configured locally. This log file size would continue to be used if the original primary takes over again. Only after a deactivation and restart on the original primary would the log file size revert to the settings on the original primary.

Database configuration parameter autorestart

The recommended configuration for the autorestart parameter on HADR systems is ON. If the autorestart parameter is set to OFF, and the server fails, your response depends on whether or not you want to restart or fail over to the standby:

If you want to restart, run the RESTART DATABASE command manually. If the restart fails, perform failover.
If you want to fail over, perform the following steps:
1. Shut down the old primary to prevent a "split brain". Do this by either stopping the DB2 instance or powering off the host machine. If the server is not accessible for administration, fence it off from clients by disabling the client/server network.
  Note: Deactivating the database is not sufficient because client connections can bring it back online. If it failed in a consistent state, then even if the autorestart parameter is set to OFF, this does not prevent client connections from bringing it back online.
2. After shutting down old primary, issue the TAKEOVER HADR command with the BY FORCE option on the standby.

Log receive buffer size on the standby database

By default, the log receive buffer size on the standby database is two times the value specified for the logbufszconfiguration parameter on the primary database. There might be times when this size is not sufficient. For example, when the HADR synchronization mode is asynchronous and the primary and standby databases are in peer state, if the primary database is experiencing a high transaction load, the log receive buffer on the standby database might fill to capacity and the log shipping operation from the primary database might stall. To manage these temporary peaks, you can increase the size of the log receive buffer on the standby database by modifying the DB2_HADR_BUF_SIZE registry variable.

Load operations and HADR

If a load operation is executed on the primary database with the COPY YES option, the command executes on the primary database, and the data is replicated to the standby database as long as the copy can be accessed through the path or device specified by the LOAD command. If the standby database cannot access the data, the table space in which the table is stored is marked invalid on the standby database. The standby database will skip future log records that pertain to this table space. To ensure that the load operation can access the copy on the standby database, it is recommended that you use a shared location for the output file from the COPY YES option. Alternatively, you can deactivate the standby database while the load operation is performed, perform the load on the primary, place a copy of the output file in the standby path, and then activate the standby database.

If a load operation is executed on the primary database with the NONRECOVERABLE option, the command executes on the primary database and the table on the standby database is marked invalid. The standby database will skip future log records that pertain to this table. You can choose to issue the LOAD command with the COPY YES and REPLACE options specified to bring the table back, or you can drop the table to recover the space.

Because executing a load operation with the COPY NO option is not supported with HADR, the command is automatically converted to a load operation with the NONRECOVERABLE option. To enable a load operation with the COPY NO option to be converted to a load operation with the COPY YES option, set the DB2_LOAD_COPY_NO_OVERRIDE registry variable on the primary database. This registry variable is ignored by the standby database. Ensure that the device or directory specified on the primary database can be accessed by the standby database using the same path, device, or load library.

If you are using Tivoli® Storage Manager (TSM) to perform a load operation with the COPY YES option, you might need to set the vendoropt configuration parameter on the primary and standby databases. Depending on how TSM is configured, the values on the primary and standby databases might not be the same. Also, when using TSM to perform a load operation with the COPY YES option, you must issue the db2adutl command with the GRANT option to give the standby database read access for the files that are loaded.

If table data is replicated by a load operation with the COPY YES option specified, the indexes are replicated as follows:

If the indexing mode is set to REBUILD and the table attribute is set to LOG INDEX BUILD, or the table attribute is set to DEFAULT and the logindexbuild database configuration parameter is set to ON, the primary database includes the rebuilt index object (that is, all of the indexes defined on the table) in the copy file to enable the standby database to replicate the index object. If the index object on the standby database is marked invalid before the load operation, it becomes usable again after the load operation as a result of the index rebuild.
If the indexing mode is set to INCREMENTAL and the table attribute is set to LOG INDEX BUILD, or the table attribute is set to NULL and the logindexbuild database configuration parameter on the primary database is set to ON, the index object on the standby database is updated only if it is not marked invalid before the load operation. Otherwise, the index is marked invalid on the standby database.

Registry variable DB2_HADR_PEER_WAIT_LIMIT

When the DB2_HADR_PEER_WAIT_LIMIT registry variable is set, HADR primary database breaks out of peer state if logging on the primary database has been blocked for the specified number of seconds because of log replication to the standby. When this limit is reached, primary database breaks the connection to the standby database. If peer window is disabled, the primary enters disconnected state and logging resumes. If peer window is enabled, the primary database enters disconnected peer state, in which logging continues to be blocked. The primary leaves disconnected peer state upon re-connection or peer window expiration. Logging resumes after the primary leaves disconnected peer state.

Note: If you set DB2_HADR_PEER_WAIT_LIMIT, use a minimum value of 10 to avoid triggering false alarms.

Honoring peer window transition when breaking out of peer state ensures peer window semantics for safe takeover in all cases. If the primary fails during the transition, normal peer window protection still applies (safe takeover from standby as long as it is still in disconnected peer state).

On the standby side, after disconnection, the database continues replaying already received logs. After the received logs have been replayed, the standby reconnects to the primary. Upon re-connection, normal state transition follows (first remote catchup state, then peer state)

Relationship to HADR_TIMEOUT:

The hadr_timeout database configuration parameter does not break the primary out of peer state if the primary keeps receiving heartbeat messages from the standby while blocked. hadr_timeout is a timeout for the HADR network layer. An HADR database breaks the connection to its partner database if it has not received any message from its partner for the hadr_timeout period. It does not control timeout for higher layer operations such as log shipping and ack. If log replay on the standby database is stuck on a large operation such as load or reorganization, the HADR component still sends heartbeat messages to the primary database on normal schedule. In such a scenario, the primary is blocked as long as the standby replay is blocked, unless DB2_HADR_PEER_WAIT_LIMIT is set.

DB2_HADR_PEER_WAIT_LIMIT unblocks primary logging regardless of connection status. Note that even if DB2_HADR_PEER_WAIT_LIMIT is not set, the primary always breaks out of peer state when a network error is detected or the connection is closed (possibly as result of HADR_TIMEOUT).

HADR configuration parameters

Several new database configuration parameters are available to support HADR. Setting these parameters does not change the role of a database. You must issue the START HADR or STOP HADR commands to change the role of a database.

HADR configuration parameters are not dynamic. Any changes made to an HADR configuration parameter are not effective until the database has been shut down and restarted. In a partitioned database environment, the HADR configuration parameters are visible and can be changed, but they are ignored.

The local host name of the primary database must be the same as the remote host name of the standby database, and the local host name of the standby database must be the same as the remote host name of the primary database. Use the hadr_local_host and hadr_remote_host configuration parameters to set the local and remote hosts for each database. Configuration consistency for the local and remote host names is checked when a connection is established to ensure that the remote host specified is the expected database.

An HADR database can be configured to use either IPv4 or IPv6 to locate its partner database. If the host server does not support IPv6, the database uses IPv4. If the server does support IPv6, whether the database uses IPv4 or IPv6 depends upon the format of the address specified for thehadr_local_host and hadr_remote_host configuration parameters. The database attempts to resolve the two parameters to the same IP format. The following table shows how the IP mode is determined for IPv6-enabled servers:

IP mode used for hadr_local_host	IP mode used for hadr_remote_host	IP mode used for HADR communications
IPv4 address	IPv4 address	IPv4
IPv4 address	IPv6 address	Error
IPv4 address	hostname, maps to v4 only	IPv4
IPv4 address	hostname, maps to v6 only	Error
IPv4 address	hostname, maps to v4 and v6	IPv4
IPv6 address	IPv4 address	Error
IPv6 address	IPv6 address	IPv6
IPv6 address	hostname, maps to v4 only	Error
IPv6 address	hostname, maps to v6 only	IPv6
IPv6 address	hostname, maps to v4 and v6	IPv6
hostname, maps to v4 only	IPv4 address	IPv4
hostname, maps to v4 only	IPv6 address	Error
hostname, maps to v4 only	hostname, maps to v4 only	IPv4
hostname, maps to v4 only	hostname, maps to v6 only	Error
hostname, maps to v4 only	hostname, maps to v4 and v6	IPv4
hostname, maps to v6 only	IPv4 address	Error
hostname, maps to v6 only	IPv6 address	IPv6
hostname, maps to v6 only	hostname, maps to v4 only	Error
hostname, maps to v6 only	hostname, maps to v6 only	IPv6
hostname, maps to v6 only	hostname, maps to v4 and v6	IPv6
hostname, maps to v4 and v6	IPv4 address	IPv4
hostname, maps to v4 and v6	IPv6 address	IPv6
hostname, maps to v4 and v6	hostname, maps to v4 only	IPv4
hostname, maps to v4 and v6	hostname, maps to v6 only	IPv6
hostname, maps to v4 and v6	hostname, maps to v4 and v6	IPv6

The primary and standby databases can make HADR connections only if they use the same format. If one server is IPv6 enabled (but also supports IPv4) and the other server supports IPv4 only, at least one of the hadr_local_host and hadr_remote_host parameters must specify an IPv4 address. This tells the database to use IPv4 even if the server supports IPv6.

When you specify values for the high availability disaster recovery (HADR) local service and remote service parameters (hadr_local_svc and hadr_remote_svc) while preparing an update database configuration command, the values you specify must be ports that are not in use for any other service, including other DB2 components or other HADR databases. In particular, you cannot set either parameter value to the TCP/IP port used by the server to await communications from remote clients (the SVCENAME database manager configuration parameter) or the next port (SVCENAME + 1).

If the primary and standby databases are on different machines, they can use the same port number or service name; otherwise, different values should be used. The hadr_local_svc and hadr_remote_svc parameters can be set to either a port number or a service name.

The synchronization mode (specified by the hadr_syncmode configuration parameter) and timeout period (specified by the hadr_timeout configuration parameter) must be identical on both the primary and standby databases. The consistency of these configuration parameters is checked when an HADR pair establishes a connection.

TCP connections are used for communication between the primary and standby databases. A primary database that is not connected to a standby database, either because it is starting up or because the connection is lost, listens on its local port for new connections. A standby database that is not connected to a primary database continues issuing connection requests to its remote host.

Although the local host and local service parameters (hadr_local_host, hadr_local_svc) are only used on the primary database, you should still set them on the standby database to ensure that they are ready if the standby database has to take over as the primary database.

When the primary database starts, it waits for a standby to connect for a minimum of 30 seconds or for the number of seconds specified by the value of the hadr_timeout database configuration parameter, whichever is longer. If the standby does not connect in the specified time, the startup fails. (The one exception to this is when the START HADR command is issued with the BY FORCE option.)

After an HADR pair establishes a connection, they will exchange heart beat messages. The heartbeat interval is one-quarter of the value of the hadr_timeout database configuration parameter, or 30 seconds, whichever is shorter. The hadr_heartbeat monitor element shows the current number of heartbeats that have been consecutively missed on the HADR connection. If one database does not receive any message from the other database within the number of seconds specified by hadr_timeout, it initiates a disconnect. This means that at most it takes the number of seconds specified by hadr_timeout for a primary to detect the failure of either the standby or the intervening network. If you set the hadr_timeout configuration parameter too low, you will receive false alarms and frequent disconnections.

If the hadr_peer_window database configuration parameter is set to zero, then when the primary and standby databases are in peer state, problems with the standby or network block primary transaction processing only for the number of seconds specified by the hadr_timeout configuration parameter, at most. If you set the hadr_peer_window to a non-zero value, then the primary database does not commit transactions until connection with the standby database is restored, or the hadr_peer_window time value elapses, whichever happens first.

Note: For maximal availability, the default value for the hadr_peer_window database configuration parameter is zero. When hadr_peer_window is set to zero, then as soon as the connection between the primary and the standby is closed (either because the standby closed the connection, a network error is detected, or timeout is reached), the primary drops out of peer state to avoid blocking transactions. For increased data consistency, but reduced availability, you can set the hadr_peer_window database configuration parameter to a non-zero value, which causes the primary database to remain in disconnected peer state for the length of time specified by the hadr_peer_window value.

The following sample configuration is for the primary and standby databases.

On the primary:

   HADR_LOCAL_HOST           host1.ibm.com
   HADR_LOCAL_SVC            hadr_service
   HADR_REMOTE_HOST          host2.ibm.com
   HADR_REMOTE_SVC           hadr_service
   HADR_REMOTE_INST          dbinst2
   HADR_TIMEOUT              120
   HADR_SYNCMODE             NEARSYNC
   HADR_PEER_WINDOW          120

On the standby:

   HADR_LOCAL_HOST           host2.ibm.com
   HADR_LOCAL_SVC            hadr_service
   HADR_REMOTE_HOST          host1.ibm.com
   HADR_REMOTE_SVC           hadr_service
   HADR_REMOTE_INST          dbinst1
   HADR_TIMEOUT              120
   HADR_SYNCMODE             NEARSYNC
   HADR_PEER_WINDOW          120