Database accessibility during backward phase of crash recovery or HADR takeover
If a database fails unexpectedly before all transactions (or units of work) are committed, or before the changes associated with committed transactions are fully written to disk, then the database is left in an inconsistent state. When a database restart is performed, these transactions are rectified during a process called crash recovery. During crash recovery, the changes associated with these transactions are first replayed from the transaction logs (we often refer to this as the Forward or Redo phase of crash recovery). Transactions which were not yet committed at the time of the database failure are then rolled back (we often refer to this as the Backward or Undo phase of crash recovery). A TAKEOVER HADR operation can also perform a Backward phase to rollback changes that were not yet committed at the time of the operation.
For prior releases, the database is only accessible after the full completion of crash recovery or HADR Takeover. This is not ideal, since a very large transaction could require a lengthy Backward phase, leaving the database entirely inaccessible until it is completed.
db2set DB2_ONLINERECOVERY=YES
The
registry variable is read at the beginning of crash recovery, it is not possible to enable or
disable while crash recovery is actively running. - SQL Data Definition Language (DDL) statements.
- Operations which modify the database catalogs.
- Operations against column - organized tables.
- Workloads that generate some other log - records that are not yet supported .
After completion of the Synchronous portion of the Backward phase, database access is allowed. We refer to this as the Asynchronous portion of the Backward phase.
- Any tables, indexes, or objects that are not associated with any uncommitted transactions are fully accessible during the asynchronous Backward phase.
- Any tables, indexes, or objects that are associated with an uncommitted
transaction are protected by an Exclusive (X) lock if
DB2_ONLINERECOVERY_WITH_UR_ACCESS=YES
, or a Super Exclusive (Z) lock ifDB2_ONLINERECOVERY_WITH_UR_ACCESS=NO
. As the Backward phase compensates the uncommitted transactions, this lock will be released when there are no more remaining uncommitted transactions referencing the table/object. Thus, prior to the release of these locks, these tables/objects are only accessible to queries using UR isolation level whenDB2_ONLINERECOVERY_WITH_UR_ACCESS=YES,
and completely inaccessible whenDB2_ONLINERECOVERY_WITH_UR_ACCESS=NO.
If these tables also contain LONG or LOB fields, or are organized by MDC or ITC, they will always be protected by a Super Exclusive (Z) lock. These objects are never accessible during the Asynchronous Backward phase.
Note: The undo of these transactions may inherently require a Super Exclusive (Z) lock. For example, undoing utility operations such as LOAD), and that will always supersede the Exclusive (X) lock acquired during the Asynchronous portion whenDB2_ONLINERECOVERY_WITH_UR_ACCESS=YES.
The RESTART DATABASE or TAKEOVER HADR command will return at the start of the Asynchronous Backward phase, when the database becomes connectable.
While the crash recovery operation is running, a DEACTIVATE DB operation will return an SQL1495W warning/error. A db2stop operation will return an SQL1025N error. Attempting to force the crash recovery application agent using FORCE APPLICATION 0 will return an SQL0104N error. A db2stop force operation is allowed and will interrupt the crash recovery. A TAKEOVER HADR operation is also allowed during any phase of crash recovery.
If the INDEXREC
configuration parameter is configured
for index recreation at database restart time, then indexes will
be recreated after the completion of the Asynchronous Backward phase.
Monitoring Crash Recovery Progress:
The progress of crash recovery can be monitored during any phase
using either of the "db2pd -recovery" option; the LIST
UTILITIES SHOW DETAIL
CLP command; the SNAPUTIL
administrative
view or SNAP_GET_UTIL
table function. During the
asynchronous backward phase, the MON_GET_UTILITY
table
function and the CHANGE HISTORY event monitor can also be used.
$ db2pd -db mydb1 -recovery
Recovery:
Recovery Status 7400010F40000000
Current Log S0000031.LOG
Current LSN 000000000006114D
Current LRI 000000000000000100000000000026E8000000000006114D
Current LSO 80969655
Job Type CRASH RECOVERY
Job ID 1
Job Start Time (1479058883) Sun Nov 13 12:41:23 2016
Job Description Crash Recovery
Invoker Type User
Total Phases 2
Current Phase 2
Progress:
Address PhaseNum Description StartTime CompletedWork TotalWork
0x078... 1 Forward Sun Nov 13 12:41:231290045 4 bytes 12900454 bytes
0x078... 2 Backward Sun Nov 13 12:49:27 0 bytes 12900454 bytes
The Forward phase of crash recovery (when transactions are replayed from the transaction logs) is displayed as the first phase, and its progress can be monitored using the Total Work and Completed Work values over time.
The Backward phase of crash recovery (where uncommitted transactions are rolled back or undone) is displayed after the first phase completes. Its progress can also be monitored by observing the Total Work and Completed Work values over time.
CompletedWork
value is observed 10mins
apart,
and the difference in the CompletedWork
value is 123,456
bytes
during this duration of time, then we can easily
extrapolate the time to complete the remaining work:ObservationTime_inMins = 10mins
WorkDone_inBytes = 123,456bytes
RemainingWork_inBytes = TotalWork – CompletedWork;
TimeToCompletion_inMins = (RemaingWork_inBytes / WorkDone_inBytes) * ObservationTime_inMins.
ADM1505I
message is printed in the administration notification log
("ADM1505I Crash recovery has completed synchronous processing"), and the following message will be
displayed in the db2diag.log:
2016-11-12-50.03.33.674725-300 I269681A541 LEVEL: Info
PID : 63373472 TID : 2058 KTID : 131596723
PROC : db2sysc
INSTANCE: db2inst1 NODE : 000 DB : MYDB1
APPHDL : 0-7 APPID: *LOCAL.dsciaraf.161114020311
AUTHID : DB2INST1 HOSTNAME: hotelaix2
EDUID : 2058 EDUNAME: db2agent (X)
FUNCTION: Db2 UDB, recovery manager, sqlpresr, probe:3170
DATA #1 : <preformatted>
Crash recovery synchronous phase completed. Next LSN is 000000000006113C.
If
the Backward phase of crash recovery performs an Asynchronous portion
, then the MON_GET_UTILITY
table function will display
an entry with a UTILITY_TYPE
value of ‘ONLINERECOVERY
’
after the database opens up for connectivity .
$ db2 “ SELECT COORD_MEMBER, APPLICATION_HANDLE AS APPHDL, SUBSTR(APPLICATION_NAME, 1, 30) AS APPNAME,
SUBSTR(SESSION_AUTH_ID, 1, 10) AS USER, UTILITY_TYPE, UTILITY_INVOKER_TYPE, SUBSTR(UTILITY_DETAIL, 1, 50) AS CMD FROM TABLE(MON_GET_UTILITY(-2)) AS T ”
COORD_MEMBER APPHDL APPNAME USER UTILITY_TYPE UTILITY_INVOKER_TYPE CMD
------------------------------------------------------------------------------------------------------------------------------------
0 21 db2undo_trans - ONLINERECOVERY AUTO RESTART DATABASE MYDB1
1 record(s) selected.
The CHANGE HISTORY
event monitor
is also capable of recording the start and end of the asynchronous
portion of the Backward phase of crash recovery using an event-control
type called ONLINERECOVERY
.
Monitoring applications during the Backward phase of crash recovery
After database access is allowed during the Backward phase of crash recovery, monitoring of applications should be done using traditional methods, with the following considerations:
All uncommitted transactions that are part of the Backward phase of crash recovery will have an Application Handle (AppHandl) of zero (0). These transactions will also have a transaction state of 'ABORT' and Tflag2 bit 0x00000001 set.
$ db2pd -db MYDB1 -transactions
Transactions:
Address AppHandl ... TranHdl ... State Tflag Tflag2 ... LogSpace ...
0x... 0 ... 3 ... ABORT 0x00000000 0x00000001 ... 780 ...
0x... 0 ... 4 ... ABORT 0x00000000 0x00000001 ... 9465875 ...
0x... 7 ... 5 ... READ 0x00000000 0x00000000 ... 0 ...
0x... 8 ... 6 ... READ 0x00000000 0x00000000 ... 0 ...
Any
tables, indexes, or objects that were associated with an uncommitted
transaction are protected by an Exclusive (X) or Super Exclusive
(Z) lock for the duration of the Backward phase of crash recovery.
This means that applications which attempt to access these objects
during the Backward phase of crash recovery are subject to lock -
wait conditions. MON_GET_APPL_LOCKWAIT
table function , or the db2pd
-wlocks option. For example: $ db2 "select LOCK_OBJECT_TYPE as TYPE, LOCK_NAME, LOCK_MODE as MODE,
LOCK_MODE_REQUESTED as MODE_REQ, LOCK_STATUS as STATUS,
HLD_APPLICATION_HANDLE as HL
D_APPHANDL, REQ_APPLICATION_HANDLE as REQ_APPHANDL
from TABLE (MON_GET_APPL_LOCKWAIT(NULL, -2))"
TYPE LOCK_NAME MODE MODE_REQ STATUS HLD_APPHANDL REQ_APPHANDL
----------------------------------------------------------------------------------------------------------
TABLE 02000500000000000000000054 X IS W 0 7
1 record(s) selected.
In this example, Application Handle 7 is requesting an Intent Share (IS) mode lock on a Table, which is already held in Exclusive (X) mode by Application Handle 0 (an uncommitted transaction that is part of crash recovery).
Additional
lock details can be observed using the MON_FORMAT_LOCK_NAME
table
function. For example:
$ db2 "SELECT SUBSTR(TABNAME,1,20) AS TABNAME, SUBSTR(VALUE,1,50) AS
VALUE FROM TABLE(MON_FORMAT_LOCK_NAME('02000500000000000000000054')) as LOCK"
NAME VALUE
------------------------------------------------------
LOCK_OBJECT_TYPE TABLE
TBSP_NAME TSPA
TABSCHEMA MYSCHEM
A1TABNAME MYTABLE3
4 record(s) selected.
In this example, the lock wait
condition is associated with a table lock on table MYSCHEMA1.MYTABLE3
in
tablespace TSPA.
Additional details about the application in
lock - wait state can be obtained using the WLM_GET_WORKLOAD_OCCURRENCE_ACTIVITIES
table
function:
db2 "SELECT APPLICATION_HANDLE as APPHNDL,
LOCK_WAIT_TIME, SUBSTR(STMT_TEXT,1,45) AS STMT_TEXT
FROM TABLE(WLM_GET_WORKLOAD_OCCURRENCE_ACTIVITIES(NULL,-1)) AS T"
APPHNDL LOCAL_START_TIME LOCK_WAIT_TIME STMT_TEXT
-----------------------------------------------------------------------------------------
10 2016-11-13-12.55.35.821373 0 SELECT APPHNDL,LOCK_WAIT_TIME
7 2016-11-13-12.53.33.777571 27 select count(*) from t2
2 record(s) selected.