Database configuration for high availability disaster recovery (HADR)
You can use database configuration parameters to help achieve optimal performance with Db2® HADR.
- Error messages might be returned for the standby database while the log files that were shipped from the primary database are being replayed.
- After a takeover operation, the new primary database might be unable to handle the workload, resulting in performance problems or in applications receiving error messages that they did not receive when they were connected to the original primary database.
- Size of log files configuration parameter on the standby database
- Database configuration parameter autorestart
- Log receive buffer size on a standby database
- Load operations and HADR
- DB2_HADR_PEER_WAIT_LIMIT registry variable
- DB2_FAIL_RECOVERY_ON_TABLESPACE_ERROR registry variable
- HADR configuration parameters
Size of log files configuration parameter on the standby database
One exception to the configuration parameter behavior that is described in the previous paragraph is the behavior of the logfilsiz database configuration parameter. Although the value of this parameter is not replicated to the standby database, to help ensure that there are identical log files on both databases, the setting for the logfilsiz configuration parameter on the standby is ignored. Instead, the database creates local log files whose sizes match the size of the log files on the primary database.
After a takeover, the original standby (new primary) uses the logfilsiz parameter value that you set on the original primary until you restart the database. At that point, the new primary reverts to using the value that you set locally. In addition, the current log file is truncated and any pre-created log files are resized on the new primary.
If the databases keep switching roles as a result of a non-forced takeover and neither database is deactivated, the log file size that is used is always the one from the original primary database. However, if there is a deactivation and then a restart on the original standby (new primary), the new primary uses the log file size that you configured locally. This log file size continues to be used if the original primary takes over again. Only after a deactivation and restart on the original primary would the log file size revert to the settings on the original primary.
Database configuration parameter autorestart
- If you want to restart, run the RESTART DATABASE command manually. If the restart fails, perform failover.
- If you want to fail over, perform the following steps:
- Shut down the old primary to prevent a
split brain
. Do this by either stopping the Db2 instance or powering off the host machine. If the server is not accessible for administration, fence it off from clients by disabling the client/server network.Note: Deactivating the database is not sufficient because client connections can bring it back online. If it failed in a consistent state, then even if the autorestart parameter is set to OFF, client connections can bring it back online. - After you shut down old primary, issue the TAKEOVER HADR command with the BY FORCE option on the standby.
- Shut down the old primary to prevent a
Log receive buffer size on a standby database
- Increase the size of the log receive buffer on the standby database by modifying the value of the DB2_HADR_BUF_SIZE registry variable.
- Enable log spooling on a standby database by setting the hadr_spool_limit configuration parameter.
Load operations and HADR
If you issue the LOAD command on the primary database with the COPY YES parameter, the command executes on the primary database, and the data is replicated to the standby database if the load copy can be accessed through the path or device that is specified by the command. If load copy data cannot be accessed from the standby database, the table space in which the table is stored is marked invalid on the standby database. Any future log records that pertain to this table space are skipped. To ensure that the load operation can access the load copy on the standby database, use a shared location for the output file from the COPY YES parameter. Alternatively, you can deactivate the standby database while performing the load on the primary, place a copy of the output file in the standby path, and then activate the standby database.
If you issue the LOAD command with the NONRECOVERABLE parameter on the primary database, the command executes on the primary database, and the table on the standby database is marked invalid. Any future log records that pertain to this table are skipped. You can issue the LOAD command with the COPY YES and REPLACE parameters to bring the table back with new data on both the primary and standby databases, or you can drop the table to recover the space.
- The table was created with the NOT LOGGED INITIALLY attribute.
- The table is a multidimensional clustered (MDC) table.
- The table has compression dictionaries.
- The table has XML columns.
Because a load operation with the COPY NO parameter is not supported with HADR, the operation is automatically converted to a load operation with the NONRECOVERABLE parameter. To enable a load operation with the COPY NO parameter to be converted to a load operation with the COPY YES parameter, set the DB2_LOAD_COPY_NO_OVERRIDE registry variable on the primary database. This registry variable is ignored on the standby database. Ensure that the device or directory that you specify for the primary database can be accessed by the standby database by using the same path, device, or load library.
If you are using the Tivoli® Storage Manager (TSM) software to perform a load operation with the COPY YES parameter, you might have to set the vendoropt configuration parameter on the primary and standby databases. Depending on how you configured TSM, the values on the primary and standby databases might not be the same. Also, when using TSM to perform a load operation with the COPY YES parameter, you must issue the db2adutl command with the GRANT parameter to give the standby database read access to the files that are loaded.
- If you specify the REBUILD indexing mode option with the LOAD command and the LOG INDEX BUILD table attribute is set to ON (using the ALTER TABLE statement), or if it is set to NULL and the logindexbuild database configuration parameter is set to ON, the primary database includes the rebuilt index object (that is, all of the indexes defined on the table) in the copy file to enable the standby database to replicate the index object. If the index object on the standby database is marked invalid before the load operation, it becomes usable again after the load operation as a result of the index rebuild.
- If you specify the INCREMENTAL indexing mode option with the LOAD command and the LOG INDEX BUILD table attribute is set to ON (using the ALTER TABLE statement), or if it is set to NULL and the logindexbuild database configuration parameter on the primary database is set to ON, the index object on the standby database is updated only if it is not marked invalid before the load operation. Otherwise, the index is marked invalid on the standby database.
DB2_HADR_PEER_WAIT_LIMIT registry variable
Honoring peer window transition when a database breaks out of peer state ensures peer window semantics for safe takeover in all cases. If the primary fails during the transition, normal peer window protection still applies: safe takeover from the standby if it is still in disconnected peer state.
On the standby side, after disconnection, the database continues replaying already received logs. After the received logs have been replayed, the standby reconnects to the primary. After replaying the received logs, the standby reconnects to the primary. Upon reconnection, normal state transition follows: first remote catchup state, then peer state.
- Relationship to hadr_timeout database configuration parameter
The hadr_timeout database configuration parameter does not break the primary out of peer state if the primary keeps receiving heartbeat messages from the standby while blocked. The hadr_timeout database configuration parameter specifies a timeout value for the HADR network layer. An HADR database breaks the connection to its partner database if it has not received any message from its partner for the period that is specified by the hadr_timeout configuration parameter. The timeout does not control timeout for higher-layer operations such as log shipping and ack (acknowledgement) signals. If log replay on the standby database is stuck on a large operation such as load or reorganization, the HADR component still sends heartbeat messages to the primary database on the normal schedule. In such a scenario, the primary is blocked as long as the standby replay is blocked unless you set the DB2_HADR_PEER_WAIT_LIMIT registry variable.
The DB2_HADR_PEER_WAIT_LIMIT registry variable unblocks primary logging regardless of connection status. Even if you do not set the DB2_HADR_PEER_WAIT_LIMIT registry variable, the primary always breaks out of peer state when a network error is detected or the connection is closed, possibly as result of the hadr_timeout configuration parameter.
DB2_FAIL_RECOVERY_ON_TABLESPACE_ERROR registry variable
In an HADR environment, by default, when a standby database has a table space in an invalid or error state, the replay of transactions on this table space stops. On other valid table spaces, the replay of transactions will continue. This default behavior is preferable when the affected table space is only a small portion of the database and most applications can function with the valid table spaces.
You can specify a different behavior, by setting the
DB2_FAIL_RECOVERY_ON_TABLESPACE_ERROR
registry variable to
ROLLFORWARD. With this setting, the standby database will shut down when
encountering a table space which is invalid or in error. The standby database can be restarted after
the error is fixed. However, if the error cannot be fixed, you can recover the affected table space
by referring to Recovering table space errors on an HADR standby database for more information.
HADR configuration parameters
Some HADR configuration parameters are static, such as hadr_local_host
and
hadr_remote_host
. Static parameters are loaded on database startup, and changes are
ignored during run time. HADR parameters are also loaded when the START HADR
command completes. On the primary database, HADR can be started and stopped dynamically, with the
database remaining online. Thus, one way to refresh the effective value of an HADR configuration
parameter without shutting down the database is to stop and restart HADR. In contrast, the
STOP HADR brings down the database on the standby, so the standby's parameters
cannot be refreshed with the database online.
- Host name parameters and service and port name parameters
- There are six interrelated configuration parameters that you need to set for HADR:
hadr_target_list
hadr_local_host
hadr_remote_host
hadr_local_svc
- hadr_remote_svc (except in a Db2 pureScale® environment, which does not make use of this parameter)
- hadr_remote_inst
The target list specifies a set of host:port pairs that act as standbys (for the primary) or the standby hosts to be used if the standby takes over as the new HADR primary database. For a detailed description of its usage, see the hadr_target_list topic in the Related links.
TCP connections are used for communication between the primary and standby databases. The
local
parameters specify the local address and theremote
parameters specify the remote address. A primary database or primary database member listens on its local address for new connections. A standby database that is not connected to a primary database retries connection to its remote address.The standby database also listens on its local address. In some scenarios, another HADR database can contact the standby database on this address and send it messages.
Unless theHADR_NO_IP_CHECK
registry variable is set, Db2 runs the following cross-checks at connection time, of the local and remote addresses of the primary and principal standby databases:
andmy local address = your remote address
The check is done using the IP address and port number, rather than the literal string in the configuration parameters. You need to set themy remote address = your local address
hadr_local_host
andhadr_local_svc
configuration parameters respectively, to specify the internal address and external address of the host. You can bypass this check by setting theDB2_HADR_NO_IP_CHECK
registry variable to ON. For more information, see HADR and network address translation (NAT) support.You can configure an HADR database to use either IPv4 or IPv6 to locate its partner database. If the host server does not support IPv6, you must use IPv4. If the server supports IPv6, whether the database uses IPv4 or IPv6 depends upon the format of the address that you specify for the hadr_local_host and hadr_remote_host configuration parameters. The database attempts to resolve the two parameters to the same IP format and use IPv6 when possible. Table 1 shows how the IP mode is determined for IPv6-enabled servers:Table 1. How the address space used for HADR communication is determined IP mode used for hadr_local_host parameter IP mode used for hadr_remote_host parameter IP mode used for HADR communications IPv4 address IPv4 address IPv4 IPv4 address IPv6 address Error IPv4 address host name, maps to IPv4 only IPv4 IPv4 address host name, maps to IPv6 only Error IPv4 address host name, maps to IPv4 and v6 IPv4 IPv6 address IPv4 address Error IPv6 address IPv6 address IPv6 IPv6 address host name, maps to IPv4 only Error IPv6 address host name, maps to IPv6 only IPv6 IPv6 address host name, maps to IPv4 and IPv6 IPv6 hostname, maps to IPv4 only IPv4 address IPv4 hostname, maps to IPv4 only IPv6 address Error hostname, maps to IPv4 only hostname, maps to IPv4 only IPv4 hostname, maps to IPv4 only hostname, maps to IPv6 only Error hostname, maps to IPv4 only hostname, maps to IPv4 and IPv6 IPv4 hostname, maps to IPv6 only IPv4 address Error hostname, maps to IPv6 only IPv6 address IPv6 hostname, maps to IPv6 only hostname, maps to IPv4 only Error hostname, maps to IPv6 only hostname, maps to IPv6 only IPv6 hostname, maps to IPv6 only hostname, maps to IPv4 and IPv6 IPv6 hostname, maps to IPv4 and IPv6 IPv4 address IPv4 hostname, maps to IPv4 and IPv6 IPv6 address IPv6 hostname, maps to IPv4 and IPv6 hostname, maps to IPv4 only IPv4 hostname, maps to IPv4 and IPv6 hostname, maps to IPv6 only IPv6 hostname, maps to IPv4 and IPv6 hostname, maps to IPv4 and IPv6 IPv6 The primary and standby databases can make HADR connections only if they use the same IPv4 or IPv6 format. If one server is IPv6 enabled (but also supports IPv4) and the other server supports IPv4 only, at least one of the hhadr_local_host and hadr_remote_host parameters on the IPv6 server must specify an IPv4 address to force database on this server to use IPv4.
You can set the HADR local service and remote service parameters (hadr_local_svc and hadr_remote_svc) to either a port number or a service name. The values that you specify must map to ports that are not being used by any other service, including other Db2 components or other HADR databases. In particular, you cannot set either parameter value to the TCP/IP port that is used by the server to await communications from remote clients (the value of the svcename database manager configuration parameter) or the next port (the value of the svcename parameter + 1).
If the primary and standby databases are on different servers, they can use the same port number or service name; otherwise, they must have different values.
- Automatic reconfiguration
- The hadr_remote_host, hadr_remote_svc, and hadr_remote_inst configuration parameters are automatically reset when HADR starts if you did not correctly set them. Even though this automatic reconfiguration occurs, always try to set the correct initial values because that reconfiguration might not take effect until a connection is made between a standby and its primary. In some HADR deployments, those initial values might be needed. For example, if you are using the IBM® Tivoli System Automation for Multiplatforms software, the value for the hadr_remote_inst configuration parameter is needed to construct a resource name.Note: If the DB2_HADR_NO_IP_CHECK registry variable is set to ON, the hadr_remote_host and hadr_remote_svc are not automatically updated.
Reconfiguration is predicated on the values of the hadr_target_list configuration parameter being correct; if anything is wrong in a target list entry, you must correct it manually.
On the primary, the reconfiguration occurs in the following manner:- If the values for the hadr_remote_host and hadr_remote_svc configuration parameters do not match the host:port pair that is the first entry of the hadr_target_list configuration parameter (namely, the principal standby), the hadr_remote_host and hadr_remote_svc configuration parameters are updated with the values from the target list.
- If the value for the hadr_remote_inst configuration parameter does not match the instance name of the principal standby, the correct instance name is copied to the hadr_remote_inst configuration parameter for the primary after the principal standby connects to it.
On a standby database, the reconfiguration occurs in the following manner:- When a standby starts, it attempts to connect to the database that you specified for its hadr_remote_host, hadr_remote_inst, and hadr_remote_svc configuration parameters.
- If the standby cannot connect to the primary, it waits for the primary to connect to it.
- The primary attempts to connect to its standbys using addresses listed in its hadr_target_list parameter. After the primary connects to a standby, the hadr_remote_host, hadr_remote_inst, and hadr_remote_svc configuration parameters for the standby are updated with the correct values for the primary.
In a non-forced takeover, the values for the hadr_remote_host, hadr_remote_inst, and hadr_remote_svc configuration parameters on the new primary are automatically updated to its principal standby, and these parameters on the standbys listed in the new primary's hadr_target_list are automatically updated to point to the new primary. Any database that is not listed in the new primary's hadr_target_list is not updated. Those databases continue to attempt to connect to the old primary and get rejected because the old primary is now a standby. The old primary is guaranteed to be in the new primary's target list because of the requirement of mutual inclusion in the target list.
In a forced takeover, automatic update on the new primary and its standbys (excluding the old primary) work the same way as non-forced takeover. However, automatic update on the old primary does not happen until it is shut down and restarted as a standby for reintegration.
Any database that is not online during the takeover will be automatically reconfigured after it starts. Automatic reconfiguration might not take effect immediately on startup, because it relies on the new primary to periodically contact the standby. On startup, a standby might attempt to connect to the old primary and follow the log stream of the old primary, causing it to diverge from the new primary's log stream and, making that standby unable to pair with the new primary. As a result, you must shut down the old primary before takeover to avoid that kind of split brain scenario.
- Synchronization mode
- The setting for the hadr_syncmode configuration parameter does not have to be the same on the primary and standby databases. Whatever setting you specify for the hadr_syncmode configuration parameter on a standby is considered its configured synchronization mode; this setting has relevance only if the standby becomes a primary. The standby is assigned an effective synchronization mode. For any auxiliary standby, the effective synchronization mode is always SUPERASYNC. For the principal standby, the effective synchronization mode is the setting for the hadr_syncmode configuration parameter for the primary. For a standby, the monitoring interfaces display the effective synchronization mode as the synchronization mode.Note: If you have set up HADR without using the hadr_target_list configuration parameter (a method that is deprecated starting in 10.5), the hadr_syncmode configuration parameter must be identical on the primary and standby databases.
For more detailed information, see
Db2 high availability disaster recovery (HADR) synchronization mode
. - HADR timeout and peer window
The timeout period, which you specify with the hadr_timeout configuration parameter, must be identical on the primary and standby databases. The consistency of the values of these configuration parameters is checked when an HADR pair establishes a connection.
With one exception, when the primary database starts, it waits for the longer of the two following periods for a standby to connect:- For a minimum of 30 seconds
- For the number of seconds that is specified by the hadr_timeout database configuration parameter.
After an HADR pair establishes a connection, they exchange heartbeat messages. The heartbeat interval is computed from factors like the hadr_timeout and hadr_peer_window configuration parameters. It is reported by the HEARTBEAT_INTERVAL field in the MON_GET_HADR table function and the db2pd command. If one database does not receive any message from the other database within the number of seconds that is specified by the hadr_timeout configuration parameter, it initiates a disconnect. This behavior means that at most, it takes the number of seconds that is specified by the hadr_timeout configuration parameter for an HADR database to detect the failure of either its partner database or the intervening network. If you set the hadr_timeout configuration parameter too low, you might receive false alarms and frequent disconnections.
The setting for the hadr_peer_window configuration parameter does not have to be the same on the primary and standby databases; the principal standby uses the peer window setting of the primary. The exception to this is if you set up HADR without using the hadr_target_list configuration parameter (a method that is deprecated starting in 10.5), in which case, the hadr_peer_window configuration parameter must be identical on the primary and standby databases.
Peer window cannot be enabled (that is, it must be set to 0) in the following situations:- If you are using the Db2 pureScale feature
- If the hadr_syncmode parameter is set to ASYNC or SUPERASYNC
- If you are configuring an auxiliary standby
If you have the hadr_peer_window configuration parameter set to a nonzero value and the primary loses connection to the standby in peer state, the primary database does not commit transactions until the connection with the standby database is restored or the time value of the hadr_peer_window configuration parameter elapses, whichever happens first.
For maximal availability, the default value for the hadr_peer_window database configuration parameter is 0. When this parameter is set to 0, as soon as the connection between the primary and the standby is closed, the primary drops out of peer state to avoid blocking transactions. The connection can close because the standby closed the connection, a network error is detected, or timeout is reached. For increased data consistency, but reduced availability, you can set the hadr_peer_window database configuration parameter to a nonzero value.
For more information, see
Setting the hadr_timeout and hadr_peer_window database configuration parameters
.
The following sample configuration is for the primary and standby databases:
HADR_TARGET_LIST host2.ibm.com:hadr_service
HADR_LOCAL_HOST host1.ibm.com
HADR_LOCAL_SVC hadr_service
HADR_REMOTE_HOST host2.ibm.com
HADR_REMOTE_SVC hadr_service
HADR_REMOTE_INST dbinst2
HADR_TIMEOUT 120
HADR_SYNCMODE NEARSYNC
HADR_PEER_WINDOW 120
HADR_TARGET_LIST host1.ibm.com:hadr_service
HADR_LOCAL_HOST host2.ibm.com
HADR_LOCAL_SVC hadr_service
HADR_REMOTE_HOST host1.ibm.com
HADR_REMOTE_SVC hadr_service
HADR_REMOTE_INST dbinst1
HADR_TIMEOUT 120
HADR_SYNCMODE NEARSYNC
HADR_PEER_WINDOW 120