Fixes are available
DB2 Version 9.1 Fix Pack 7 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 7a for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 8 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 9 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 10 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 11 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 12 for Linux, UNIX and Windows
APAR status
Closed as program error.
Error description
Wallace and I looked at today's trap together. The stack at the point of failure:

  0x09000000168AE540 sqloCrashOnCriticalMemoryValidationFailure + 0x20
  0x09000000168B52C4 diagnoseMemoryCorruptionAndCrash__13SQLO_MEM_POOLFUlCPCc + 0x264
  0x09000000168AF8C8 sqloDiagnoseFreeBlockFailure__FP8SMemFBlk + 0x628
  0x09000000168AF1C8 sqlofmblkEx + 0x7A8
  0x09000000169B1B14 __dl__16Sqlqg_Base_ClassFPv + 0x14
  0x0900000018C7A0B8 __dt__15sqlqg_FMP_ReplyFv + 0x98
  0x09000000169BB874 sqlqgClose__FP12sqlri_rquerys + 0x414   <<=========== This is where the double free is attempted from
  0x090000001757AA38 sqlricjpInfrequent__FP8sqlrr_cbPP12sqlri_opparml + 0x358
  0x0900000017574F68 sqlricjp__FP8sqlrr_cbP12sqlri_opparmilT4 + 0x2028
  0x0900000017579060 sqlricls_complex__FP8sqlrr_cbilN23 + 0x3FC0
  0x09000000195B3C4C sqlracal_finalcmt_rb__FP8sqlrr_cb + 0xD0C
  0x09000000195B2158 sqlracal__FP8sqlrr_cbUiT2 + 0x10D8
  0x09000000173220F0 sqlrr_cleanup_tran_before_DPS__FP8sqlrr_cbiN62PiT9 + 0x870
  0x0900000017325F70 sqlrrbck__FP8sqlrr_cbiN32P15SQLXA_CALL_INFO + 0xE30
  0x0900000017555C50 sqlrr_rds_common_post__FP14db2UCinterfaceiT2l + 0x16F0
  0x090000001753C360 sqlrr_open__FP14db2UCinterfaceP15db2UCCursorInfo + 0x3C0
  0x0900000019616390 sqljs_ddm_opnqry__FP14db2UCinterfaceP13sqljDDMObject + 0x1830
  0x09000000176CF374 sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x234

LOC analysis points to this code:

  // Before deleting runtime_obj if there is a cached reply
  // from previous fetch, free it
  DELETE_BLOCK_OR_NOT(runtime_obj->m_stp_block);
  DELETE_REP_OR_NOT(runtime_obj->m_stp_rep); //@d15901rel

The DELETE_REP_OR_NOT macro seems to reset the m_stp_rep pointer to NULL. Even so, the memory diagnostics file reports block header corruption, possibly due to a double memory free.

Wallace reran the reproduction with trace turned on, and it did show an attempt to free this memory twice. This is where we fail:

  ........................
  15573335 | sqlofmblkEx entry [eduid 22477 eduname db2agent]
      bytes 16
      Data1 (PD_TYPE_PTR,8) Pointer: 0x0000000116fccf20   << this is the memory we're trying to free
  15573956 | | sqloDiagnoseFreeBlockFailure data [probe 10]
  .....................

Earlier in the trace, the same EDU has already freed this block:

  15466651 | | | | | | | | sqlofmblkEx entry [eduid 22477 eduname db2agent]
      bytes 16
      Data1 (PD_TYPE_PTR,8) Pointer: 0x0000000116fccf20
  15466652 | | | | | | | | sqlofmblkEx mbt [Marker:PD_OSS_FREED_MEMORY ]
      Marker:PD_OSS_FREED_MEMORY
      Description: Freeing memory
      bytes 16
      Data1 (PD_TYPE_PTR,8) Pointer: 0x0000000116fccf20
  15466653 | | | | | | | | sqlofmblkEx exit

We attempted to reconstruct the stack for the original memory free from the trace flow, and it looks like this:

  FencedServer::~FencedServer
  sqlqg_FMP_DeleteServer
  sqlqgRouter_conn_lost_cleanup
  sqlqgRouter
  sqlqg_fedstp_hook
  sqlqg_Call_FMP_Thread
  sqlqgClose

Here is where the first free happens, in ~FencedServer():

  if (m_reply != NULL) //@bd230249tzh
  {
    delete m_reply; //@d240258tzh
    m_reply = NULL;
  } //@ed230249tzh

It looks like we freed m_reply here while a stale pointer elsewhere still referenced the same block. We then crash when we attempt to free that memory a second time through the other pointer, runtime_obj->m_stp_rep.

Please let us know if you want us to enter a defect.

Thanks,
Albert Grankin
Senior Software Engineer
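The aliasing pattern described above can be illustrated with a minimal, self-contained sketch. This is not DB2 source code. Reply, TrackedHeap, Server, RuntimeObj and both teardown functions are hypothetical names invented for illustration; only the member names m_reply and m_stp_rep are borrowed from the analysis. The tracking heap reports a second free of the same address instead of corrupting memory, roughly the role sqloDiagnoseFreeBlockFailure appears to play in the stack. How the actual fix was implemented is not stated in this APAR; clearing the alias is just one possible remedy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for a cached reply object (not the DB2 class).
struct Reply { int payload = 0; };

// Hypothetical tracking heap. Storage stays alive until the heap is destroyed,
// so a stale pointer can be inspected safely; only the "live" flag changes.
class TrackedHeap {
public:
    Reply* allocate() {
        blocks_.push_back(std::make_unique<Reply>());
        Reply* r = blocks_.back().get();
        live_.insert(r);
        return r;
    }
    // Returns true for a valid free (or NULL), false for a double free.
    bool release(Reply* r) {
        if (r == nullptr) return true;            // freeing NULL is a no-op
        if (live_.erase(r) == 0) { ++doubleFrees_; return false; }
        return true;
    }
    int doubleFrees() const { return doubleFrees_; }
    std::size_t liveBlocks() const { return live_.size(); }
private:
    std::vector<std::unique_ptr<Reply>> blocks_;
    std::set<Reply*> live_;
    int doubleFrees_ = 0;
};

struct Server     { Reply* m_reply   = nullptr; };  // role of FencedServer
struct RuntimeObj { Reply* m_stp_rep = nullptr; };  // role of runtime_obj

// Failure pattern: each owner nulls only its own pointer after freeing.
int teardownWithStaleAlias(TrackedHeap& heap) {
    Server server;
    RuntimeObj rt;
    server.m_reply = heap.allocate();
    rt.m_stp_rep = server.m_reply;                // second pointer, same block
    assert(rt.m_stp_rep == server.m_reply);

    heap.release(server.m_reply);                 // first free (destructor path)
    server.m_reply = nullptr;                     // rt.m_stp_rep is now stale

    heap.release(rt.m_stp_rep);                   // second free (close path)
    rt.m_stp_rep = nullptr;
    return heap.doubleFrees();
}

// One possible remedy: clear every alias at the point of the first free.
int teardownClearingAlias(TrackedHeap& heap) {
    Server server;
    RuntimeObj rt;
    server.m_reply = heap.allocate();
    rt.m_stp_rep = server.m_reply;

    heap.release(server.m_reply);
    server.m_reply = nullptr;
    rt.m_stp_rep = nullptr;                       // drop the alias as well

    heap.release(rt.m_stp_rep);                   // now a harmless no-op
    return heap.doubleFrees();
}
```

In the first function, resetting a pointer to NULL after the free does not help, because the reset applies only to the pointer that was passed in. That matches the observation that DELETE_REP_OR_NOT nulls m_stp_rep yet the block had already been freed through m_reply.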
Local fix
Problem summary
Users affected: Users of the DB2 for LUW Homogeneous Federation Feature or InfoSphere Federation Server
Problem description and summary: See error description
Problem conclusion
Problem was first fixed in Version 9.1, FixPak 7 (s090308). This fix should be applied on the federation server.
Temporary fix
Comments
APAR Information
APAR number
JR32248
Reported component name
FEDERATED RUNTI
Reported component ID
5724N9703
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2009-03-04
Closed date
2009-05-11
Last modified date
2009-05-11
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
FEDERATED RUNTI
Fixed component ID
5724N9703
Applicable component levels
R910 PSN
UP
R911 PSN
UP
Document Information
Modified date:
11 May 2009