IBM Support

DB2 : SQL2038N while reading a log file on HADR Primary

Technical Blog Post


Abstract

DB2 : SQL2038N while reading a log file on HADR Primary

Body

 

Recently, we had a case where customer reported of problems on Primary HADR when their Q-replication was trying to read a log but failed with SQL2038N. 

 

 

Problem Description :

 

Customer used TSM for log archiving and the log archival location was shared between Primary(write to TSM) and Standby(read from TSM). When Q-rep tried to read a log on Primary, it reported the following error :

 

ASN0005E CAPTURE "ASN01" : "LogrdThread". The Capture program encountered an error when reading the DB2 log.
The log sequence number is “<log_sequence_number>", the SQLCODE is "-2038", and the reason code is "".


 

Root cause identification :


1> Db2diag.log had the below error logged for the SQL2038N :

 

EDUID   : 112                  EDUNAME: db2lfr.0 (<dbname>) 0

FUNCTION: DB2 UDB, recovery manager, sqlplfrIsLogFromValidChain, probe:4700

DATA #1 : <preformatted>

LFR Scan Num            = 4304204

LFR Scan Caller's EDUID = 1715

Log extent 105382 on log stream 0 starts on a log chain larger than the most recent valid log chain.

Valid log chain:  8075083570/0000004E128922DA/1532490193/1227929500

Extent log chain: 8123381362/0000004E7A5CAA87/1532659966/1256964695

 

This shows that the log file we are trying to read (Log extent 105382) also exists on a higher log chain than the valid one we are currently working on. 

 

2> On checking TSM, we saw 2 copies of same log file in 2 different chains:

$ db2adutl query logs | grep -i S0105468.LOG


Log file: S0105468.LOG, Chain Num: 59, Log stream: 0, Taken at: 2018-07-27-03.19.10

Log file: S0105468.LOG, Chain Num: 58, Log stream: 0, Taken at: 2018-07-28-07.20.15



So, the SQL2038N was received because db2 on primary tried to access the log file - S0105468.LOG from the wrong log chain.  
 
Explanation :

Standby is normally in a rollforward pending state. In this case, the HADR was no more active. On asking we found out that order to issue a reorg on Standby, customer had brought it out of rollforward pending state, which created a new log chain 59.

 

Note : Although it might work in some cases, this operation is striclty unsupported by IBM DB2. No db maintenance activity which needs to break hadr and make a connection to the standby should be performed in a HADR setup. Should such an activity happen, we recommend to setup the HADR again from scratch by taking a new backup and restoring it in Standby. 

 

Because the log directory was shared on TSM, the Q-replication from Primary tried to read the next log in sequence S0105468 from that location.

Q-rep will always read from the highest log chain, and since the log file S0105468.LOG generated by standby was in log chain 59 (in TSM), it asked DB2 to read from there. Obviously, DB2 did not find the next log record in log chain 58 (current valid log chain on Primary) it was expecting and reported it with SQL2038N. 

 

Please refer below link for more information about when log chains are created : 

Title : Log chains are associated with bringing database out of rollforward.

Existing tech note: /support/pages/node/127841

 

Solution : 

Hiding the log chain 59 so that the Q-replication will get the valid log from log chain 58 is the solution.

 

We suggested following steps in this case :

1> Let us query the log files on this chain 59 :

db2adutl query logs chain 59 db <dbname>
2> Extract logs from chain 59 to a different location (just in case we need to check or use them later): 
db2adutl extract logs chain 59 db <dbname>
3> Delete the logs on chain 59 : 
db2adutl delete logs chain 59 db <dbname> 

 
On starting Qrep, it read the required log from log chain 58 and operations were resumed as usual. 

 

As for HADR, we recommended to rebuild it from scratch as the earlier setup had become unsupported for already specified reasons.


 

 

 

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm11140154