IBM Support

The table function pd_get_diag_hist can cause HADR standby replay to hang

Troubleshooting


Problem

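If the table function pd_get_diag_hist runs slowly, the application that is running it might not be disconnected when it is forced off. As a result, logs cannot be replayed and HADR standby replay hangs.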

Symptom

The customer ran db2pd -db <db_name> -hadr and found that STANDBY_REPLAY_LOG_FILE,PAGE,POS did not change while STANDBY_RECV_REPLAY_GAP kept increasing, which indicates that standby replay was hung.

HADR_ROLE = STANDBY
..
PRIMARY_LOG_FILE,PAGE,POS = S0002731.LOG, 151, 182424963042
STANDBY_LOG_FILE,PAGE,POS = S0002731.LOG, 151, 182424963042
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0002707.LOG, 223, 180822508874
STANDBY_RECV_REPLAY_GAP(bytes) = 1602454168
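
The check described above can be scripted by comparing two db2pd samples taken some time apart. The following is a minimal sketch, assuming the output uses the "NAME = value" layout shown in the excerpt. The function names (parse_hadr_fields, replay_position, recv_replay_gap, looks_like_replay_hang) are hypothetical helpers, not part of Db2.

```python
# Sample text copied from the db2pd -hadr excerpt above.
SAMPLE = """\
HADR_ROLE = STANDBY
PRIMARY_LOG_FILE,PAGE,POS = S0002731.LOG, 151, 182424963042
STANDBY_LOG_FILE,PAGE,POS = S0002731.LOG, 151, 182424963042
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0002707.LOG, 223, 180822508874
STANDBY_RECV_REPLAY_GAP(bytes) = 1602454168
"""


def parse_hadr_fields(text):
    """Collect 'NAME = value' lines into a dict (assumed layout)."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip elision markers and blank lines
        name, _, value = line.partition("=")
        fields[name.strip()] = value.strip()
    return fields


def replay_position(fields):
    """Return the log position (third element) of the standby replay field."""
    return int(fields["STANDBY_REPLAY_LOG_FILE,PAGE,POS"].split(",")[-1])


def recv_replay_gap(fields):
    """Return STANDBY_RECV_REPLAY_GAP in bytes."""
    return int(fields["STANDBY_RECV_REPLAY_GAP(bytes)"])


def looks_like_replay_hang(earlier, later):
    """True if replay did not advance while the receive/replay gap grew."""
    return (replay_position(later) == replay_position(earlier)
            and recv_replay_gap(later) > recv_replay_gap(earlier))
```

In the sample above, the gap equals the standby receive position minus the replay position (182424963042 - 180822508874 = 1602454168), so a replay position that stays fixed while the standby keeps receiving logs shows up as a steadily growing gap.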

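Under normal conditions, the diagnostic log shows that the HdrForceAppsInReplayOnlyWindow -> HdrEndReplayOnlyWindow sequence, which is handled by db2redom, completes quickly. HdrForceAppsInReplayOnlyWindow disconnects all connections before replay proceeds. In the customer's diagnostic log, however, HdrForceAppsInReplayOnlyWindow never completed, which indicates that an application could not be forced off.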
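
One way to look for this pattern is to check whether the most recent mention of HdrForceAppsInReplayOnlyWindow in the diagnostic log is followed by HdrEndReplayOnlyWindow. The following is a minimal sketch that works on the log text as a plain string. The function name force_apps_pending is hypothetical, and the sketch assumes only that both names appear verbatim in the log.

```python
def force_apps_pending(diag_text):
    """Return True if the last HdrForceAppsInReplayOnlyWindow mention
    has no HdrEndReplayOnlyWindow mention after it."""
    last_force = diag_text.rfind("HdrForceAppsInReplayOnlyWindow")
    if last_force == -1:
        return False  # the force step was never logged
    # Search for the end marker only after the last force marker.
    return diag_text.find("HdrEndReplayOnlyWindow", last_force) == -1
```

A True result is only a hint: the log might simply have been read while the window was still in progress, so it is worth repeating the check after a short interval.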
"db2pd -stack all" was collected on the standby, and one agent showed the following stack:
..
0x00000000004223B7 __intel_new_memset + 0x0a77
0x00007F3402810391 pdDiagGetNextLogRecord + 0x03e1
0x00007F340280FDD8 pdDiagGetNextRecordFromBuffer + 0x0038
0x00007F340280FA69 pdDiagGetNextRecord + 0x0239
0x00007F340276AFFB _ZN17PADiagLogCollEngn11getNextRowsEjPP13PA_DATA_VALUEPjS3_ + 0x053b
0x00007F3403E1144C _Z25sqlrwGetPDDiagHist_v10fp3P8sqlrr_cbP22sqlrwGetPDDiagHistArgsPPvPl + 0x0b2c
0x00007F3401CEF2C6 _Z30sqlrwGetWLMTableFunctionResultP8sqlrr_cbP20sqlrw_rpc_tf_requestPPvPlb + 0x0446
0x00007F3401CEA7B1 _Z36sqlrwGetWLMTableFunctionMergedResultjPPv + 0x01f1
0x00007F3400CF80D1 _Z29sqlerTrustedRtnCallbackRouterjPPv + 0x00b1
0x00007F33D1491A9B pd_get_diag_hist_v10fp3 + 0x1c8b
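
When many stack files are collected, it can help to filter them for the frames seen here. The following is a minimal sketch that scans stack text for the pd_get_diag_hist and pdDiagGet* frames. The function name frames_matching and the choice of marker strings are assumptions taken from this one stack, not an official signature.

```python
# Marker substrings taken from the frames in the stack above (assumption:
# other Db2 levels may use different symbol names).
MARKERS = ("pd_get_diag_hist", "pdDiagGet")

# A few frames copied from the stack above.
SAMPLE_STACK = """\
0x00000000004223B7 __intel_new_memset + 0x0a77
0x00007F3402810391 pdDiagGetNextLogRecord + 0x03e1
0x00007F340280FDD8 pdDiagGetNextRecordFromBuffer + 0x0038
0x00007F340280FA69 pdDiagGetNextRecord + 0x0239
0x00007F33D1491A9B pd_get_diag_hist_v10fp3 + 0x1c8b
"""


def frames_matching(stack_text, markers=MARKERS):
    """Return the stack lines that contain any of the marker substrings."""
    hits = []
    for line in stack_text.splitlines():
        if any(marker in line for marker in markers):
            hits.append(line.strip())
    return hits
```

An agent whose stack returns matches is a candidate for the application that cannot be forced off. The match is case-sensitive, so the mangled sqlrwGetPDDiagHist frame is not counted by these markers.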

Product: Db2 for Linux, UNIX and Windows
Business Unit: IBM Infrastructure w/TPS
Component: High Availability - Cluster Management
Platform: AIX
Version: 10.5
Line of Business: Data and AI

Document Information

Modified date:
30 April 2025

UID

swg21981283