Technical Blog Post
Abstract
DB2's DIAGPATH fills up with files named like *.SQLP.HADR.events.bin
Body
In some cases, DB2's DIAGPATH fills up with the following files within a short period of time, even though the database server appears to be working normally.
*.SQLP.HADR.events.bin
*.SQLP.GENERAL.events.bin
*.SQLP.HIGHFREQ.events.bin
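To see how much space these files are taking, a small script can total them by type. The following is a minimal sketch, not an IBM tool: the function name summarize_event_files is hypothetical, and it assumes the files sit directly under the directory passed in (no subdirectories are scanned).

```python
from pathlib import Path

# File-name suffixes taken from the list above.
EVENT_SUFFIXES = (
    ".SQLP.HADR.events.bin",
    ".SQLP.GENERAL.events.bin",
    ".SQLP.HIGHFREQ.events.bin",
)


def summarize_event_files(diagpath):
    """Return {suffix: (file_count, total_bytes)} for event files in diagpath.

    Only regular files directly under diagpath are considered (assumption).
    """
    summary = {suffix: (0, 0) for suffix in EVENT_SUFFIXES}
    for entry in Path(diagpath).iterdir():
        if not entry.is_file():
            continue
        for suffix in EVENT_SUFFIXES:
            if entry.name.endswith(suffix):
                count, size = summary[suffix]
                summary[suffix] = (count + 1, size + entry.stat().st_size)
                break
    return summary
```

Running it periodically against the DIAGPATH directory and comparing the counts shows whether the files are still being produced.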
By examining db2diag.log, we can identify the possible root cause:
2017-08-23-20.38.02.123217+000 I84443E2689 LEVEL: Warning
PID : 12345 TID : 140721814234880 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : PRODB
APPHDL : 0-58267 APPID: 10.182.223.0.55902.170823200102
AUTHID : CDCAUDIT HOSTNAME: dbserver12b
EDUID : 1771 EDUNAME: db2agent (PRODB) 0
FUNCTION: DB2 UDB, recovery manager, sqlpshrValidateLogStreamEndPoint, probe:1122
DATA #1 : <preformatted>
Some log records might not be returned on this call, will be returned
on next call.please collect the diag data and send it to IBM support
CALLSTCK: (Static functions may not be resolved correctly, as they are
resolved to the nearest symbol)
[0] 0x00007FFFF499AC37 pdLogVPrintf + 0x2F7
[1] 0x00007FFFF499A92C pdLogPrintf + 0x8C
[2] 0x00007FFFF3E00D69 _Z32sqlpshrValidateLogStreamEndPointP14sqlpMasterDbcbP17SQLPSHR_MASTER_CBP17SQLPSHR_WORKER_CBmmbb + 0x7C9
[3] 0x00007FFFF3DFBEB5 _Z26sqlpshrFlushLogsAndRequeueP8sqeAgentP17SQLPSHR_MASTER_CBP14sqlpMasterDbcbPbm + 0x2E5
[4] 0x00007FFFF3E03521 _Z15sqlpshrScanNextPvP8sqeAgentmPP16SQLPR_LOGREC_DISPm + 0x1A21
[5] 0x00007FFFF3D721F7 _Z22sqlpgReadLogReadActionP8sqeAgentP14sqlpMasterDbcbjP15SQLP_READLOG_CBP17SQLP_READLOG_INFOmPc + 0x277
[6] 0x00007FFFF3D7425D _Z15sqlpgReadLogAPIP8sqeAgentjP6db2LRIS2_S2_mmPcP17SQLP_READLOG_INFOP5sqlca + 0x51D
[7] 0x00007FFFF3E2D471 _Z22sqlpReadLogInternalAPIP8sqeAgentjP6db2LRIS2_S2_PcjP17SQLP_READLOG_INFOjP5sqlca + 0xC1
[8] 0x00007FFFF3E2DA69 _Z15sqlpReadLogDRDAP5sqldaS0_P8sqeAgentP5sqlca + 0x429
[9] 0x00007FFFF1C52B9E _Z19sqlerKnownProcedureiPcPiP5sqldaS2_P13sqlerFmpTableP8sqeAgentP5sqlca + 0x8EE
[10] 0x00007FFFF1C5478E _Z11sqlerCallDLP14db2UCinterfaceP9UCstpInfo + 0x5CE
[11] 0x00007FFFF4A3E3E4 _Z19sqljs_ddm_excsqlsttP14db2UCinterfaceP13sqljDDMObject + 0x8F4
[12] 0x00007FFFF4A3B5EE _Z21sqljsParseRdbAccessedP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x7E
[13] 0x00007FFFF4A3BCF7 _Z10sqljsParseP13sqljsDrdaAsCbP14db2UCinterfaceP8sqeAgentb + 0x377
[14] 0x00007FFFF1CF4A72 /home/db2inst1/sqllib/lib64/libdb2e.so.1 + 0x14B3A72
[15] 0x00007FFFF1CF387F /home/db2inst1/sqllib/lib64/libdb2e.so.1 + 0x14B287F
[16] 0x00007FFFF1CF1663 /home/db2inst1/sqllib/lib64/libdb2e.so.1 + 0x14B0663
[17] 0x00007FFFF1CF13F3 _Z17sqljsDrdaAsDriverP18SQLCC_INITSTRUCT_T + 0xF3
[18] 0x00007FFFF1AADBC3 _ZN8sqeAgent6RunEDUEv + 0x823
[19] 0x00007FFFF2B48F83 _ZN9sqzEDUObj9EDUDriverEv + 0xF3
[20] 0x00007FFFF2B48E89 _Z10sqlzRunEDUPcj + 0x9
[21] 0x00007FFFF25B00D1 sqloEDUEntry + 0x2A1
[22] 0x0000003ADC807AA1 /lib64/libpthread.so.0 + 0x7AA1
[23] 0x0000003ADC0E893D clone + 0x6D
...
2017-08-23-20.38.02.199299+000 E12656376E489 LEVEL: Error
PID : 12345 TID : 140736968255232 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : PRODB
HOSTNAME: dbserver12b
EDUID : 39 EDUNAME: db2lfr.0 (PRODB) 0
FUNCTION: DB2 UDB, data protection services,
sqlpSearchForLogArchiveOnDisk, probe:4000
MESSAGE : ZRC=0x860F000A=-2045837302=SQLO_FNEX "File not found."
DIA8411C A file "" could not be found.
2017-08-23-20.38.02.199742+000 I12656866E430 LEVEL: Info
PID : 12345 TID : 140736968255232 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : PRODB
HOSTNAME: dbserver12b
EDUID : 39 EDUNAME: db2lfr.0 (PRODB) 0
FUNCTION: DB2 UDB, data protection services, sqlpgPostLogMgrToRetrieve,
probe:1050
DATA #1 : <preformatted>
RTStatus is in state 5 at index 0.
2017-08-23-20.38.02.199845+000 I12657297E428 LEVEL: Warning
PID : 12345 TID : 140736968255232 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : PRODB
HOSTNAME: dbserver12b
EDUID : 39 EDUNAME: db2lfr.0 (PRODB) 0
FUNCTION: DB2 UDB, recovery manager, sqlplfrFMReadLog, probe:5120
MESSAGE : Return code for LFR opening file S00122407.LOG was -2146434659
The error code -2146434659 is returned by the db2lfr EDU, which is responsible for reading individual log files. It can be decoded with db2diag -rc:
$ db2diag -rc -2146434659
Input ZRC string '-2146434659' parsed as 0x8010019D (-2146434659).
ZRC value to map: 0x8010019D (-2146434659)
ZRC class :
SQL Error, User Error,... (Class Index: 0)
Component:
SQLP ; data protection services (Component Index: 16)
Reason Code:
413 (0x019D)
Identifer:
SQLP_LOG_NOT_IN_ARCHIVE
Identifer (without component):
SQLZ_RC_LOG_NOT_IN_ARCHIVE
Description:
Log extent not found in archive.
Associated information:
Sqlcode -1042
SQL1042C An unexpected system error occurred.
Number of sqlca tokens : 0
Diaglog message number: 1
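The signed decimal value in the log message and the hexadecimal ZRC printed by db2diag are the same 32-bit number. The sketch below shows the conversion. The helper names are hypothetical, and reading the reason code from the low 16 bits is an inference from the output above (413 = 0x019D), not a documented layout.

```python
def zrc_to_hex(signed_value):
    """Render a signed 32-bit return code as 0x-prefixed uppercase hex."""
    return "0x{:08X}".format(signed_value & 0xFFFFFFFF)


def low_16_bits(signed_value):
    """Return the low 16 bits, which match the reason code in the output above."""
    return signed_value & 0xFFFF


print(zrc_to_hex(-2146434659))   # 0x8010019D
print(low_16_bits(-2146434659))  # 413
```

The same conversion reproduces the pairing shown in the earlier SQLO_FNEX message, where -2045837302 corresponds to 0x860F000A.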
Checking the archive log path shows that the log file named in the message, S00122407.LOG, is not there. This is the root cause.
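This check can be scripted when many messages are involved. The sketch below pulls the log file names out of sqlplfrFMReadLog messages like the one shown earlier and looks for each file in a list of directories. The function names are hypothetical, the regular expression is based only on the single message format in this post, and the caller is assumed to pass the directories that actually hold the log files (archive paths often contain nested subdirectories, which are not searched here).

```python
import re
from pathlib import Path

# Pattern based on the one message shown in this post (assumption).
LFR_OPEN_FAILURE = re.compile(
    r"Return code for LFR opening file (S\d+\.LOG) was (-?\d+)"
)


def failed_log_opens(diag_text):
    """Return (log_file_name, return_code) pairs in order of appearance."""
    return [
        (match.group(1), int(match.group(2)))
        for match in LFR_OPEN_FAILURE.finditer(diag_text)
    ]


def find_log_file(log_name, search_dirs):
    """Return the first existing path for log_name in search_dirs, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / log_name
        if candidate.is_file():
            return candidate
    return None
```

A log file for which find_log_file returns None in both the active and archive directories is a candidate for the situation described here.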
The error is triggered by a replication program such as CDC. It calls the db2ReadLog() API and requests a specific LSN from the DB2 server. The DB2 server cannot find the log file containing this LSN in either the active log path or the archive log path, so it writes the files listed above to DIAGPATH to help the support team analyze the root cause. If the replication program keeps retrying after the failure, these files are generated repeatedly.
UID
ibm13286173