HADR delayed replay

HADR delayed replay helps prevent data loss due to errant transactions. To implement HADR delayed replay, set the hadr_replay_delay database configuration parameter on the HADR standby database.

Delayed replay intentionally keeps the standby database at a point in time that is earlier than that of the primary database by delaying replay of logs on that standby. If an errant transaction is executed on the primary, you have until the configured time delay has elapsed to take action to prevent the errant transaction from being replayed on the standby. To recover the lost data, you can either copy this data back to the primary, or you can have the standby take over as the new primary database.

Delayed replay works by comparing timestamps in the log stream, which is generated on the primary, and the current time of the standby. As a result, it is important to synchronize the clocks of the primary and standby databases. Transaction commit is replayed on the standby according to the following equation:
(current time on the standby - value of the hadr_replay_delay configuration parameter) >= 
	timestamp of the committed log record
You should set the hadr_replay_delay database configuration parameter to a large enough value to allow time to detect and react to errant transactions on the primary.

You can use this feature with one standby, multiple standbys, and in a Db2® pureScale® environment. With multiple standbys, typically one or more standbys stays current with the primary for high availability or disaster recovery purposes, and one standby is configured with delayed replay for protection against errant transactions. If you use this feature with one standby, you should not enable IBM® Tivoli® System Automation for Multiplatforms because the takeover will fail.

There are several important restrictions for delayed replay:
  • You can set the hadr_replay_delay configuration parameter only on a standby database.
  • A TAKEOVER command on a standby with replay delay enabled fails. You must first set the hadr_replay_delay configuration parameter to 0 and then deactivate and reactivate the standby to pick up the new value, and then issue the TAKEOVER command.
  • The delayed replay feature is supported only in SUPERASYNC mode. Because log replay is delayed, numerous non-replayed log data might accumulate on the standby, filling up receive buffer and spool (if configured). In other synchronization modes, this would cause the primary to be blocked.

    The objective of this feature is to protect against application error. If you want to use this feature and ensure that there is no data loss in the event of a primary failure, consider a multiple standby setup with a more synchronous setting on the principal standby.

  • When you upgrade HADR databases, an important verification step for upgrade is to ensure that the primary's log shipping position matches the standby's log replay position. Naturally, a standby with replay delay configured can interfere with this verification step and cause it to fail. To avoid any failures, you must first set the hadr_replay_delay configuration parameter to 0, deactivate and reactivate the standby database to pick up the new value, and then start the upgrade procedure.

Recommendations

Delayed replay and disaster recovery
Consider using a small delay if you are using the standby database for disaster recovery purposes and errant transaction protection.
Delayed replay and the HADR reads on standby feature
Consider using a small delay if you are using the standby database for reads on standby purposes, so that reader sessions can see more up-to-date data. Additionally, because reads on standby runs in uncommitted read isolation level, it can see applied, but not yet committed changes that are technically still delayed from replay. These uncommitted transactions can be rolled back in errant transaction recovery procedure when you roll forward the standby to the PIT that you want and then stop.
Delayed replay and log spooling
If you enable delayed replay, it is recommended that you also enable log spooling by setting the hadr_spool_limit database configuration parameter. Because of the intentional delay, the replay position can be far behind the log receive position on the standby. Without spooling, log receive can only go beyond replay by the amount of the receive buffer. With spooling, the standby can receive many more logs beyond the replay position, providing more protection against data loss in case of primary failure. Note that in either case, because of the mandatory SUPERASYNC mode, the primary will not be blocked by the delayed replay.