IBM Support

IT07722: HADR Standby may crash due to memory corruption with top function sqloCrashOnCriticalMemoryValidationFailure

APAR status

  • Closed as fixed if next.

Error description

  • The HADR standby may crash suddenly with the following symptoms:
    
    Error seen in trap: Signal #11 (SIGSEGV): si_addr is 0x0000000000000000,
    si_code is 0x00000033 (SEGV_ACCERR:Invalid permissions for mapped object.)
    
    Stacks:
    -------Frame------ ------Function + Offset------
    0x090000000DF89F0C sqloCrashOnCriticalMemoryValidationFailure + 0x38
    0x090000000DFA7C08 diagnoseMemoryCorruptionAndCrash__13SQLO_MEM_POOLFUlCPCcCb + 0x400
    0x090000000CE95300 MemTreePut__13SQLO_MEM_POOLFP8SMemNodeUlP17SqloChunkSubgroup + 0x90
    0x090000000CE9419C sqlofmblkEx + 0x270
    0x090000000AEA565C sqlplfrDeleteFromFileTbl__FP12SQLPLFR_DBCBPUiUiUlT4CbT6 + 0x510
    0x090000000AEC546C sqlplfrPerformOpenFileTblReclaim__FP12SQLPLFR_DBCBP21SQLPLFR_REQ_SCAN_NEXT + 0x330
    0x090000000A9C2744 sqlplfrFMCloseLog__FP12SQLPLFR_DBCBUiUlbP21SQLPLFR_REQ_SCAN_NEXTCb + 0x3F8
    0x090000000AEC7D88 sqlplfrFMReadLog__FP12SQLPLFR_DBCBP21SQLPLFR_REQ_SCAN_NEXTP17SQLPLFR_SCAN_DATA + 0xB64
    0x090000000BDFAF50 RunEDU__9sqpLfrEduFv + 0x14A4
    
    
    Diaglog:
    
    2015-02-06-17.36.05.382058-300 I2989961A597         LEVEL: Warning
    PID     : 15335598             TID : 24420          PROC : db2sysc 0
    INSTANCE: inst01             NODE : 000           DB   : SAMPLE
    HOSTNAME: hostid
    EDUID   : 24420                EDUNAME: db2hadrs.0.0 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduS, probe:20755
    MESSAGE : ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG
              "HADR standby found bad log"
    DATA #1 : String, 99 bytes
    HADR standby error handling: will close connection to primary, then reconnect, and perform a retry.
    
    2015-02-06-17.36.05.386236-300 E2992580A905         LEVEL: Critical
    PID     : 15335598             TID : 19537          PROC : db2sysc 0
    INSTANCE: inst01             NODE : 000           DB   : SAMPLE
    HOSTNAME: hostid
    EDUID   : 19537                EDUNAME: db2lfr.0 (SAMPLE) 0
    FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
    MESSAGE : ADM14001C  An unexpected and critical error has occurred: "Panic".
              The instance may have been shutdown as a result. "Automatic" FODC
              (First Occurrence Data Capture) has been invoked and diagnostic
              information has been recorded in directory
              "/db2area/inst01/db2dump/FODC_Panic_2015-02-06-17.36.05.385563_0000/".
              Please look in this directory for detailed evidence about what
              happened and contact IBM support if necessary to diagnose the
              problem.
    
    2015-02-06-17.36.05.393213-300 E2993486A1643        LEVEL: Severe
    PID     : 15335598             TID : 19537          PROC : db2sysc 0
    INSTANCE: inst01             NODE : 000           DB   : SAMPLE
    HOSTNAME: hostid
    EDUID   : 19537                EDUNAME: db2lfr.0 (SAMPLE) 0
    FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:999
    MESSAGE : Memory validation failure, diagnostic file dumped.
    DATA #1 : String, 28 bytes
    Corrupt pool free tree node.
    DATA #2 : File name, 34 bytes
    15335598.19537.mem_diagnostics.txt
    CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
      [0] 0x090000000DFA7C00 diagnoseMemoryCorruptionAndCrash__13SQLO_MEM_POOLFUlCPCcCb + 0x3F8
      [1] 0x090000000CE95300 MemTreePut__13SQLO_MEM_POOLFP8SMemNodeUlP17SqloChunkSubgroup + 0x90
      [2] 0x090000000CE9419C sqlofmblkEx + 0x270
      [3] 0x090000000AEA565C sqlplfrDeleteFromFileTbl__FP12SQLPLFR_DBCBPUiUiUlT4CbT6 + 0x510
      [4] 0x090000000AEC546C sqlplfrPerformOpenFileTblReclaim__FP12SQLPLFR_DBCBP21SQLPLFR_REQ_SCAN_NEXT + 0x330
      [5] 0x090000000A9C2744 sqlplfrFMCloseLog__FP12SQLPLFR_DBCBUiUlbP21SQLPLFR_REQ_SCAN_NEXTCb + 0x3F8
      [6] 0x090000000AEC7D88 sqlplfrFMReadLog__FP12SQLPLFR_DBCBP21SQLPLFR_REQ_SCAN_NEXTP17SQLPLFR_SCAN_DATA + 0xB64
      [7] 0x090000000BDFAF50 RunEDU__9sqpLfrEduFv + 0x14A4
      [8] 0x090000000BDC2AAC RunEDU__9sqpLfrEduFv + 0x7E4
      [9] 0x0000000000000000 ?unknown + 0x0
      [10] 0x090000000CF4C520 EDUDriver__9sqzEDUObjFv + 0x3C0
      [11] 0x090000000BEFB440 sqloEDUEntry + 0x38C
      [12] 0x090000000080CD30 _pthread_body + 0xF0
      [13] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
    
    The errpt output shows the following stack information:
    ADDITIONAL INFORMATION
    sqloCrash 38
    sqloCrash 10
    diagnoseM 404
    MemTreePu 94
    sqlofmblk 274
    sqlplfrDe 514
    sqlplfrPe 334
    sqlplfrFM 3FC
    sqlplfrFM B68
    RunEDU__9 14A8
    RunEDU__9 7E8
    ??
    EDUDriver 3C4
    sqloEDUEn 390
    _pthread_ F4
    ??
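
The MESSAGE line of the first db2diag.log entry above reports the same return code three ways: as a hex value, as a signed decimal, and by name. As a minimal sketch (the helper names zrc_to_signed and split_zrc are hypothetical and not part of any Db2 tooling), the following Python shows that the hex and decimal forms agree when the hex value is read as a signed 32-bit integer, which can help when searching logs that quote only one of the two forms.

```python
def zrc_to_signed(hex_text: str) -> int:
    """Interpret a 32-bit hex string as a two's-complement signed integer."""
    value = int(hex_text, 16)  # base 16 accepts an optional "0x" prefix
    if value >= 0x80000000:    # high bit set means the value is negative
        value -= 0x100000000
    return value


def split_zrc(field: str) -> tuple[str, int, str]:
    """Split 'ZRC=<hex>=<decimal>=<name>' into (hex, decimal, name)."""
    label, hex_text, decimal_text, name = field.split("=")
    if label.strip() != "ZRC":
        raise ValueError(f"not a ZRC field: {field!r}")
    return hex_text, int(decimal_text), name


# Field copied from the db2diag.log entry in this APAR.
hex_text, decimal_value, name = split_zrc(
    "ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG"
)
print(name, zrc_to_signed(hex_text) == decimal_value)  # HDR_ZRC_BAD_LOG True
```

The sketch only checks that the two numeric forms are consistent; it does not look up what the code means.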
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * ALL                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * See SYSROUTE APARs to see where this APAR is addressed       *
    ****************************************************************
    

Problem conclusion

Temporary fix

Comments

APAR Information

  • APAR number

    IT07722

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    A50

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-03-16

  • Closed date

    2017-05-09

  • Last modified date

    2017-05-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

Document Information

Modified date:
29 June 2020