HADR delayed replay
HADR delayed replay helps prevent data loss due to errant transactions. To implement HADR delayed replay, set the hadr_replay_delay database configuration parameter on the HADR standby database.
Delayed replay intentionally keeps the standby database at a point in time that is earlier than that of the primary database by delaying replay of logs on that standby. If an errant transaction is executed on the primary, you have until the configured time delay has elapsed to take action to prevent the errant transaction from being replayed on the standby. To recover the lost data, you can either copy this data back to the primary, or you can have the standby take over as the new primary database.
(current time on the standby - value of the hadr_replay_delay configuration parameter) >=
timestamp of the committed log record
You should
set the hadr_replay_delay database configuration
parameter to a large enough value to allow time to detect and react
to errant transactions on the primary.You can use this feature with one standby, multiple standbys, and in a Db2® pureScale® environment. With multiple standbys, typically one or more standbys stays current with the primary for high availability or disaster recovery purposes, and one standby is configured with delayed replay for protection against errant transactions. If you use this feature with one standby, you should not enable IBM® Tivoli® System Automation for Multiplatforms because the takeover will fail.
- You can set the hadr_replay_delay configuration parameter only on a standby database.
- A TAKEOVER command on a standby with replay delay enabled fails. You must first set the hadr_replay_delay configuration parameter to 0 and then deactivate and reactivate the standby to pick up the new value, and then issue the TAKEOVER command.
- The delayed replay feature is supported only in SUPERASYNC mode.
Because log replay is delayed, numerous non-replayed log data might
accumulate on the standby, filling up receive buffer and spool (if
configured). In other synchronization modes, this would cause the
primary to be blocked.
The objective of this feature is to protect against application error. If you want to use this feature and ensure that there is no data loss in the event of a primary failure, consider a multiple standby setup with a more synchronous setting on the principal standby.
- When you upgrade HADR databases, an important verification step for upgrade is to ensure that the primary's log shipping position matches the standby's log replay position. Naturally, a standby with replay delay configured can interfere with this verification step and cause it to fail. To avoid any failures, you must first set the hadr_replay_delay configuration parameter to 0, deactivate and reactivate the standby database to pick up the new value, and then start the upgrade procedure.
Recommendations
- Delayed replay and disaster recovery
- Consider using a small delay if you are using the standby database for disaster recovery purposes and errant transaction protection.
- Delayed replay and the HADR reads on standby feature
- Consider using a small delay if you are using the standby database
for reads on standby purposes, so that reader sessions can see more
up-to-date data. Additionally, because reads on standby runs in
uncommitted read
isolation level, it can see applied, but not yet committed changes that are technically still delayed from replay. These uncommitted transactions can be rolled back in errant transaction recovery procedure when you roll forward the standby to the PIT that you want and then stop. - Delayed replay and log spooling
- If you enable delayed replay, it is recommended that you also enable log spooling by setting the hadr_spool_limit database configuration parameter. Because of the intentional delay, the replay position can be far behind the log receive position on the standby. Without spooling, log receive can only go beyond replay by the amount of the receive buffer. With spooling, the standby can receive many more logs beyond the replay position, providing more protection against data loss in case of primary failure. Note that in either case, because of the mandatory SUPERASYNC mode, the primary will not be blocked by the delayed replay.