IBM Support

JR32248: FED: MEMORY CORRUPTION FROM SQLQGCLOSE, CALLER SQLRICJPINFREQUENT

APAR status

  • Closed as program error.

Error description

  • Today's trap
    Wallace and I have looked at it together today.
    0x09000000168AE540 sqloCrashOnCriticalMemoryValidationFailure + 0x20
    0x09000000168B52C4 diagnoseMemoryCorruptionAndCrash__13SQLO_MEM_POOLFUlCPCc + 0x264
    0x09000000168AF8C8 sqloDiagnoseFreeBlockFailure__FP8SMemFBlk + 0x628
    0x09000000168AF1C8 sqlofmblkEx + 0x7A8
    0x09000000169B1B14 __dl__16Sqlqg_Base_ClassFPv + 0x14
    0x0900000018C7A0B8 __dt__15sqlqg_FMP_ReplyFv + 0x98
    0x09000000169BB874 sqlqgClose__FP12sqlri_rquerys + 0x414
         <<=========== This is where double free is attempted from
    0x090000001757AA38 sqlricjpInfrequent__FP8sqlrr_cbPP12sqlri_opparml + 0x358
    0x0900000017574F68 sqlricjp__FP8sqlrr_cbP12sqlri_opparmilT4 + 0x2028
    0x0900000017579060 sqlricls_complex__FP8sqlrr_cbilN23 + 0x3FC0
    0x09000000195B3C4C sqlracal_finalcmt_rb__FP8sqlrr_cb + 0xD0C
    0x09000000195B2158 sqlracal__FP8sqlrr_cbUiT2 + 0x10D8
    0x09000000173220F0 sqlrr_cleanup_tran_before_DPS__FP8sqlrr_cbiN62PiT9 + 0x870
    0x0900000017325F70 sqlrrbck__FP8sqlrr_cbiN32P15SQLXA_CALL_INFO + 0xE30
    0x0900000017555C50 sqlrr_rds_common_post__FP14db2UCinterfaceiT2l + 0x16F0
    0x090000001753C360 sqlrr_open__FP14db2UCinterfaceP15db2UCCursorInfo + 0x3C0
    0x0900000019616390 sqljs_ddm_opnqry__FP14db2UCinterfaceP13sqljDDMObject + 0x1830
    0x09000000176CF374 sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x234
    LOC analysis points to this code:
        // Before deleting runtime_obj if there is a cached reply
        // from previous fetch, free it
        DELETE_BLOCK_OR_NOT(runtime_obj->m_stp_block);
        DELETE_REP_OR_NOT(runtime_obj->m_stp_rep);
      //@d15901rel
    The DELETE_REP_OR_NOT macro appears to reset the m_stp_rep pointer
    to NULL. Even so, the memory diagnostics file reports block header
    corruption, possibly caused by a double free.
    Wallace reran the reproduction with trace turned on, and the trace
    confirmed an attempt to free this memory twice.
    This is where we fail:
    ........................
    15573335    | sqlofmblkEx entry [eduid 22477 eduname db2agent]
     bytes 16
     Data1 (PD_TYPE_PTR,8) Pointer:
     0x0000000116fccf20        << this is the memory we're trying to free
    15573956    | | sqloDiagnoseFreeBlockFailure data [probe 10]
    .....................
    Earlier, we see that the same EDU has already freed this
    block:
    15466651    | | | | | | | | sqlofmblkEx entry [eduid 22477 eduname db2agent]
     bytes 16
     Data1 (PD_TYPE_PTR,8) Pointer:
     0x0000000116fccf20
    15466652    | | | | | | | | sqlofmblkEx mbt [Marker:PD_OSS_FREED_MEMORY ]
     Marker:PD_OSS_FREED_MEMORY
     Description: Freeing memory
     bytes 16
     Data1 (PD_TYPE_PTR,8) Pointer:
     0x0000000116fccf20
    15466653    | | | | | | | | sqlofmblkEx exit
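The manual comparison of the two trace excerpts above can be approximated mechanically. The following is a minimal sketch, not a DB2 tool: `findRepeatedFrees` and `checkExcerpt` are hypothetical names, and the parser assumes only the record layout visible in the excerpts (an `sqlofmblkEx entry` line followed a few lines later by the pointer value).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scans formatted trace text for "sqlofmblkEx entry"
// records and reads the first 0x... value that follows each one.
// Returns every address that appears in more than one entry record.
std::vector<std::string> findRepeatedFrees(const std::string& traceText) {
    std::map<std::string, int> freeCount;
    std::vector<std::string> repeated;
    std::istringstream in(traceText);
    std::string line;
    bool inFreeEntry = false;
    while (std::getline(in, line)) {
        if (line.find("sqlofmblkEx entry") != std::string::npos) {
            inFreeEntry = true;  // the pointer is printed on a later line
            continue;
        }
        if (!inFreeEntry) continue;
        std::size_t pos = line.find("0x");
        if (pos == std::string::npos) continue;
        std::size_t end =
            line.find_first_not_of("0123456789abcdefABCDEF", pos + 2);
        std::string addr = line.substr(
            pos, end == std::string::npos ? std::string::npos : end - pos);
        if (++freeCount[addr] == 2) repeated.push_back(addr);
        inFreeEntry = false;  // ignore pointers in "mbt" and other records
    }
    return repeated;
}

// Runs the scan on a shortened copy of the two excerpts from this report.
void checkExcerpt() {
    const std::string excerpt =
        "15466651 | sqlofmblkEx entry [eduid 22477 eduname db2agent]\n"
        " bytes 16\n Data1 (PD_TYPE_PTR,8) Pointer:\n 0x0000000116fccf20\n"
        "15573335 | sqlofmblkEx entry [eduid 22477 eduname db2agent]\n"
        " bytes 16\n Data1 (PD_TYPE_PTR,8) Pointer:\n 0x0000000116fccf20\n";
    assert(findRepeatedFrees(excerpt).size() == 1);
}
```

A repeated address is only a hint: the allocator may legitimately hand the same block out again between two frees, so each hit still needs to be checked against the allocation records in the trace.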
    We attempted to reconstruct the stack for the original memory
    free from the trace flow and it looks like this:
    FencedServer::~FencedServer
    sqlqg_FMP_DeleteServer
    sqlqgRouter_conn_lost_cleanup
    sqlqgRouter
    sqlqg_fedstp_hook
    sqlqg_Call_FMP_Thread
    sqlqgClose
    Here is where the first free happens in ~FencedServer():
      if (m_reply != NULL)
      //@bd230249tzh
      {
        delete m_reply;
       //@d240258tzh
        m_reply = NULL;
      }
      //@ed230249tzh
    It looks like m_reply was freed here while a stale pointer,
    runtime_obj->m_stp_rep, still referenced the same block. We crash
    when sqlqgClose attempts to free that memory again through the
    stale pointer.
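The aliasing problem described here (one object frees a block and clears only its own pointer, while a second pointer to the same block survives) can be illustrated in isolation. The sketch below is not the fix shipped for this APAR and does not use DB2 types: `Reply`, `Server`, `RuntimeObj`, `replyStillAlive` and `demo` are hypothetical stand-ins. It shows one common way to make a stale reference detectable, by giving the owner a `std::shared_ptr` and the observer a `std::weak_ptr`.

```cpp
#include <cassert>
#include <memory>

struct Reply { int rows = 0; };  // stand-in for the cached reply object

// Stand-in for the owner of the reply (FencedServer in the report).
struct Server {
    std::shared_ptr<Reply> reply = std::make_shared<Reply>();
};

// Stand-in for runtime_obj: it refers to the reply but does not own it.
struct RuntimeObj {
    std::weak_ptr<Reply> stpRep;
};

// True while the owner still holds the reply.
bool replyStillAlive(const RuntimeObj& rt) {
    return !rt.stpRep.expired();
}

// Models the sequence from the report: the owner is destroyed first,
// then the runtime object tries to release its cached reference.
bool demo() {
    RuntimeObj rt;
    {
        Server srv;
        rt.stpRep = srv.reply;
        if (!replyStillAlive(rt)) return false;
    }  // srv is destroyed here and the Reply is freed exactly once
    // With a raw pointer, rt would now hold a dangling address and a
    // delete through it would be a double free. The weak_ptr reports
    // that the object is gone, so there is nothing left to free.
    return !replyStillAlive(rt);
}
```

Calling `demo()` returns true when the stale reference is detected after the owner is gone. Whether shared ownership, an explicit back-pointer reset, or a different ownership rule suits the federated runtime is a design decision this sketch does not settle.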
    Please, let us know if you want us to enter a defect.
    Thanks,
    Albert Grankin
    Senior Software Engineer
    

Local fix

Problem summary

  • Users affected:
      Users of the DB2 for LUW Homogeneous Federation Feature or
      InfoSphere Federation Server
    Problem description and summary:
      See error description
    

Problem conclusion

  • Problem was first fixed in Version 9.1, FixPak 7 (s090308). This
    fix should be applied on the federation server.
    

Temporary fix

Comments

APAR Information

  • APAR number

    JR32248

  • Reported component name

    FEDERATED RUNTI

  • Reported component ID

    5724N9703

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2009-03-04

  • Closed date

    2009-05-11

  • Last modified date

    2009-05-11

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    JR32249

Fix information

  • Fixed component name

    FEDERATED RUNTI

  • Fixed component ID

    5724N9703

Applicable component levels

  • R910 PSN

       UP

  • R911 PSN

       UP

Document Information

Modified date:
11 May 2009