Replay-only window (and replay-only window avoidance) on the active standby database

When an HADR active standby database is replaying DDL log records or maintenance operations, new or existing application connections to the standby database may be effected.

In versions prior to 11.1.4.4, or when the DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW registry variable is set to OFF, the standby enters the "replay-only window" – all existing connections to the standby are terminated (SQL1224N) and new connections to the standby are rejected (SQL1776N, reason code 4). New connections are allowed on the standby after the replay of all active DDL or maintenance operations has completed. The only user connections that can remain active on a standby in the replay-only window are connections that are executing DEACTIVATE DATABASE or TAKEOVER commands. When applications are forced off at the outset of the replay-only window, an error is returned (SQL1224N). Depending on the number of readers connected to the active standby, there may be a slight delay before the DDL log records or maintenance operations are replayed on the standby.

In Version 11.1.4.4 or later releases, when the DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW registry variable is set to ON, the standby database does not enter the replay-only window, we call this "replay-only window avoidance" – only existing connections on the standby database which hold locks on tables, indexes or other objects that conflict with the DDL or maintenance operation to be replayed will be forced off. Other connections, including new connections, are permitted to continue.

There are a number of DDL statements and maintenance operations that, when run on the HADR primary, will either trigger the replay-only window on the standby (when the replay-only window is enable), or will force off the application connections which hold a lock conflicting with the DDL statement or maintenance operation being replayed (when replay-only window avoidance is enabled). The following lists are not exhaustive.

DDL statements

(* statements denoted with an asterisk will result in a replay-only window even when replay-only window avoidance is enabled).

  • CREATE, ALTER, or DROP TABLE (except DROP TABLE for DGTT)
  • CREATE GLOBAL TEMP TABLE
  • TRUNCATE TABLE
  • RENAME TABLE
  • RENAME TABLESPACE
  • CREATE, DROP, or ALTER INDEX
  • CREATE or DROP VIEW
  • CREATE, ALTER, or DROP TABLESPACE
  • CREATE, ALTER, or DROP BUFFER POOL
  • CREATE, ALTER, or DROP FUNCTION
  • CREATE, ALTER, or DROP PROCEDURE
  • CREATE or DROP TRIGGER
  • CREATE, ALTER, or DROP TYPE
  • CREATE, ALTER, or DROP ALIAS
  • CREATE or DROP SCHEMA
  • CREATE, ALTER, or DROP METHOD
  • CREATE, ALTER, or DROP MODULE
  • CREATE, ALTER, or DROP NICKNAME
  • CREATE, ALTER, or DROP SEQUENCE
  • CREATE, ALTER, or DROP WRAPPER
  • CREATE, ALTER, or DROP FUNCTION MAPPING
  • CREATE or DROP INDEX EXTENSION
  • CREATE or DROP INDEX FOR TEXT
  • CREATE or DROP EVENT MONITOR
  • CREATE, ALTER, or DROP SECURITY LABEL
  • CREATE, ALTER, or DROP SECURITY LABEL COMPONENT
  • CREATE, ALTER, or DROP SECURITY POLICY
  • CREATE or DROP TRANSFORM
  • CREATE, ALTER, or DROP TYPE MAPPING
  • CREATE, ALTER, or DROP USER MAPPING
  • CREATE or DROP VARIABLE
  • CREATE, ALTER, or DROP WORKLOAD *
  • GRANT USAGE ON WORKLOAD *
  • REVOKE USAGE ON WORKLOAD *
  • GRANT *
  • REVOKE *
  • CREATE, ALTER, or DROP SERVICE CLASS *
  • CREATE, ALTER, or DROP WORK CLASS SET *
  • CREATE, ALTER, or DROP WORK ACTION SET *
  • CREATE, ALTER, or DROP THRESHOLD *
  • CREATE, ALTER, or DROP HISTOGRAM TEMPLATE *
  • AUDIT *
  • CREATE, ALTER, or DROP AUDIT POLICY *
  • CREATE or DROP ROLE
  • CREATE, ALTER, or DROP TRUSTED CONTEXT *
  • REFRESH TABLE
  • SET INTEGRITY
Maintenance operations
  • Classic, or offline, reorg
  • Inplace, or online, reorg
  • Index reorg (indexes all, individual index)
  • MDC and ITC reclaim reorg
  • Load
  • Bind or rebind
  • db2rbind
  • Runstats
  • Table move
  • Auto statistics
  • Auto reorg
  • Real Time Statistics
Other operations or actions
  • Automatic Dictionary Creation for tables with COMPRESS YES attribute
  • Asynchronous Index Cleanup on detached table partition
  • Implicit rebind
  • Implicit index rebuild
  • Manual update of statistics.
  • Deferred MDC rollout
  • Asynchronous Index cleanup after MDC rollout
  • Reuse of a deleted MDC or ITC block on insert into MDC or ITC table
  • Asynchronous background processes updating catalog tables SYSJOBS and SYSTASKS for inserting, updating, and deleting tasks

Diagnostic log messages during replay-only window

When a replay-only window is active, the db2diag.log will contain several diagnostic messages, which indicate the start and end of the replay-only window. When the registry variable DB2_HADR_REPLAY_ONLY_WINDOW_DIAGLEVEL is enabled for it, additional details about the DDL operation or utility invocation which caused the replay-only window are also displayed. Note that the LOG_DDL_STMTS database configuration parameter should be set to YES on the Primary database for DDL statement text to be transmitted to the Standby so that it can be displayed.

For example:

2017-11-22-15.48.36.321657-300 I522432E731           LEVEL: Info
PID     : 25476                TID : 140255646705408 KTID : 26687<
PROC    : db2sysc
INSTANCE: db2inst1             NODE : 000 DB   : TESTDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.171122204107
HOSTNAME: myhost1
EDUID   : 87      
EDUNAME: db2redom (TESTDB)
FUNCTION: DB2 UDB, recovery manager, SQLP_REPLAY_ONLY_WINDOW_STAT::sqlpStartHadrReplayOnlyWindow, probe:9140
MESSAGE : Replay only window is triggered by this log record: LogStreamId / TID<
          / LSO / action
DATA #1 : SQLP_TID, PD_TYPE_SQLP_TID, 6 bytes
000000000FFA
DATA #2 : unsigned integer, 8 bytes
47025987
DATA #3 : db2LogStreamIDType, PD_TYPE_DB2_LOG_STREAM_ID, 2 bytes
0
DATA #4 : String, 3 bytes
DDL
2017-11-22-15.48.36.326071-300 E523164E517           LEVEL: Warning
PID     : 25476                TID : 140255646705408 KTID : 26687<
PROC    : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : TESTDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.171122204107
HOSTNAME: myhost1
EDUID   : 87                   EDUNAME: db2redom (TESTDB)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrForceAppsInReplayOnlyWindow, probe:100
DATA #1 : String, 28 bytes
Replay only window is active 
2017-11-22-15.48.36.334268-300 I524198E504           LEVEL: Info
PID     : 25476                TID : 140255646705408 KTID : 26687<
PROC    : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : TESTDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.171122204107
HOSTNAME: myhost1
EDUID   : 87                   EDUNAME: db2redom (TESTDB)
FUNCTION: DB2 UDB, recovery manager, SQLP_REPLAY_ONLY_WINDOW_STAT::sqlpSetDDLStmtForHadrReplayOnlyWindow, probe:9150
MESSAGE : DDL statement text
DATA #1 : String, 23 bytes
drop table test_script1
2017-11-22-15.49.28.915844-300 E546560E554           LEVEL: Warning
PID     : 25476                TID : 140255646705408 KTID : 26687<
PROC    : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : TESTDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.171122204107
HOSTNAME: myhost1
EDUID   : 87                   EDUNAME: db2redom (TESTDB)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrEndRep
layOnlyWindow, probe:210
DATA #1 : String, 73 bytes
Replay only window is inactive, connections to Active Standby are allowed

Monitoring the replay-only window

You can monitor the replay-only window using the db2pd command with the -hadr option (on either the standby or the primary) or the MON_GET_HADR table function (from the primary). The standby's status, including information about the replay-only window, is sent to the primary on every heartbeat.

There are three pertinent elements to monitor:
  • STANDBY_REPLAY_ONLY_WINDOW_ACTIVE, which indicates whether DDL or maintenance-operation replay is in progress on the standby. Normally, the value is N, but when the replay-only window is active, the value is Y.
  • STANDBY_REPLAY_ONLY_WINDOW_START, which indicates the time at which the current replay-only window (if there is one) became active.
  • STANDBY_REPLAY_ONLY_WINDOW_TRAN_COUNT, which indicates the total number of existing uncommitted DDL or maintenance transactions executed so far in the current replay-only window (if there is one).
To use the table function, issue something similar to the following query on the primary:
select STANDBY_ID, STANDBY_REPLAY_ONLY_WINDOW_ACTIVE, STANDBY_REPLAY_ONLY_WINDOW_START,
	STANDBY_REPLAY_ONLY_WINDOW_TRAN_COUNT from table (mon_get_hadr(NULL))
Here is an example using the db2pd command on a standby that is currently in a replay-only window:
Database Member 0 -- Database HADRDB -- Active -- Up 0 days 00:23:17 -- Date 06/08/2011 13:57:23

                              HADR_ROLE = STANDBY
                            REPLAY_TYPE = PHYSICAL
                          HADR_SYNCMODE = NEARSYNC
                             STANDBY_ID = 1
                          LOG_STREAM_ID = 0
                             HADR_STATE = PEER
                             HADR_FLAGS = 
                    PRIMARY_MEMBER_HOST = hostP.ibm.com
                       PRIMARY_INSTANCE = db2inst
                         PRIMARY_MEMBER = 0
                    STANDBY_MEMBER_HOST = hostS1.ibm.com
                       STANDBY_INSTANCE = db2inst
                         STANDBY_MEMBER = 0
                    HADR_CONNECT_STATUS = CONNECTED
               HADR_CONNECT_STATUS_TIME = 06/08/2011 13:38:10.199479 (1307565490)
            HEARTBEAT_INTERVAL(seconds) = 25
                  HADR_TIMEOUT(seconds) = 120
          TIME_SINCE_LAST_RECV(seconds) = 3
               PEER_WAIT_LIMIT(seconds) = 0
             LOG_HADR_WAIT_CUR(seconds) = 0.000
      LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.006298
     LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.516
                    LOG_HADR_WAIT_COUNT = 82
  SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 50772
  SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87616
              PRIMARY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
              STANDBY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
                    HADR_LOG_GAP(bytes) = 0
       STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
         STANDBY_RECV_REPLAY_GAP(bytes) = 0
                       PRIMARY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                       STANDBY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                STANDBY_REPLAY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
           STANDBY_RECV_BUF_SIZE(pages) = 16
               STANDBY_RECV_BUF_PERCENT = 0
             STANDBY_SPOOL_LIMIT(pages) = 0
                  STANDBY_SPOOL_PERCENT = 0
                   PEER_WINDOW(seconds) = 0
               READS_ON_STANDBY_ENABLED = Y
      STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = Y
       STANDBY_REPLAY_ONLY_WINDOW_START = 06/08/2011 13:50:23
  STANDBY_REPLAY_ONLY_WINDOW_TRAN_COUNT = 5

Recommendations for minimizing the impact of the replay-only window

Because replay operations on an HADR standby take priority over readers, frequent read-only windows can be disruptive to readers connected to or attempting to connect to the standby. To avoid or minimize this impact, consider the following recommendations:
  • Run DDL and maintenance operations during a scheduled maintenance window, preferably at off-peak hours.
  • Run DDL operations collectively rather than in multiple groups.
  • Run REORG or RUNSTATS only on the required tables instead of all tables.
  • Terminate applications on the active standby using the FORCE APPLICATION command with the ALL option before running the DDL or maintenance operations on the primary. Monitor the replay-only window to determine when it is inactive, and redeploy the applications on the standby.

Monitoring lock conflicts when replay-only window avoidance is enabled

When replay-only window avoidance is enabled, application connections on the standby database which hold locks on tables, indexes or other objects that conflict with the DDL or maintenance operation to be replayed will be forced off (returning an SQL1224N error) and the standby database will proceed to replay the DDL or maintenance operation. New or existing application connections on the standby database which then attempt to access the table, index, or objects which are locked by the replay agent may experience a lock wait condition for the duration of time that the replay agent is replaying the DDL or maintenance operation and holding the desired lock. Monitoring and resolving lock wait conditions is described well in Diagnosing and resolving locking problems.

When diagnosing a lock wait condition, it is possible to determine if a lock is held by a replay agent, by examining the transaction Tflag value of the lock holder for the existence of the 0x00000010 bit (other bits are inconsequential).

For example, consider this db2pd -locks output, showing TranHdl 34 in Wait (‘W’) state for table lock ‘0007000F000000000000000054’ in intent-share (‘IS’) mode, and TranHdl 9 holds this table lock in Granted (‘G’) state in super-exclusive (‘Z’) mode.

$ db2pd -db MYDB1 -locks

Locks:
Address TranHdl Lockname                   Type         Mode Sts Owner …
0x…     9       0007000F000000000000000054 TableLock    ..Z  G   9    …
0x…     34      0007000F000000000000000054 TableLock    .IN  W   34   …
0x…     34      0000000E0000000100014060D6 VarLock      ..S  G   34   …
0x…     34      0000002C000000000000000054 TableLock    .IN  G   34   …
…etc…

db2pd -transactions shows the lock holder TranHdl 9 in WRITE state, with a 0x00000010 bit set in the Tflags:

$ db2pd -db MYDB1 -transactions

Transactions:
Address AppHandl ... TranHdl ... State 	Tflag 
 0x... 	0 ... 		4 ... 	READ 	0x00000000 ...
 0x... 	0 ... 		5 ... 	READ 	0x00000000 ...
 0x... 	7 ... 		9 ... 	WRITE 	0x00000010 ...
 …etc…
 0x... 	22 ... 		34 ... 	READ 	0x00000000 ...