IBM Support

PI85182: Big SQL becomes unresponsive and hangs

APAR status

  • Closed as program error.

Error description

  • The hang is caused by a problem in the IPC communication layer
    (semaphores) between an agent that is executing "force
    applications" and a db2fmp process that is executing a fenced
    Java stored procedure.  The sending side (agent) and the
    receiving side (db2fmp) are not in sync, so the internal
    interrupt control message never reaches the db2fmp process,
    and both processes wait indefinitely.
    .
    Because this is a communication issue between the FMP agent
    and the master thread, there can be different manifestations
    of this problem.  The following is one possible codepath and
    scenario.
    .
    The agent acquired to drive a force interrupt is sitting in the
    following stack:
    sqloSSemP + 0x05bc
    sqlccipcinit + 0x0cd1
    _Z9sqlccinitP18SQLCC_INITSTRUCT_TPP17SQLCC_COMHANDLE_TP12SQLCC_COND_TP13SQLO_MEM_POOL + 0x026f
    _Z20sqlerMasterThreadReqP17sqlerFmpParmsBaseP13sqlerFmpTableP14sqlerFmpHandleP18sqlerFmpThreadListP8sqeAgentjccP5sqlcab + 0x0427
    _Z25sqlerInterruptThreadedFmpP14sqlerFmpHandleP17sqlerFmpParmsBase + 0x00c2
    _ZN8sqeAgent14InterruptAgentEhj + 0x0c03
    _Z24sqleInterruptApplicationP19SQLE_COORDINATOR_CBhji + 0x014b
    _ZN14sqeAppServices19InterruptAppByIndexEth + 0x0364
    _Z19sqljs_ddm_intrdbrqsP14db2UCinterfaceP13sqljDDMObject + 0x030d
    _Z17sqljsParseConnectP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x0084
    _Z10sqljsParseP13sqljsDrdaAsCbP14db2UCinterfaceP8sqeAgentb + 0x0524
    address: 0x00007F5B4E577334 ; dladdress: 0x00007F5B4908F000 ; offset in lib: 0x00000000054E8334 ;
    address: 0x00007F5B4E57DE82 ; dladdress: 0x00007F5B4908F000 ; offset in lib: 0x00000000054EEE82 ;
    _Z17sqljsDrdaAsDriverP18SQLCC_INITSTRUCT_T + 0x011f
    _ZN8sqeAgent6RunEDUEv + 0x0dd7
    _ZN9sqzEDUObj9EDUDriverEv + 0x010a
    sqloEDUEntry + 0x05c5
    address: 0x00007F5B55FD2DC5 ; dladdress: 0x00007F5B55FCB000 ; offset in lib: 0x0000000000007DC5 ;
    clone + 0x006d
    while holding latches, similar to the following:
    <LatchInformation>
    Holding Latch type: (SQLO_LT_sqlerFmpRow__ipcLatch) - Address:
    (0x2036af2e4), Line: 8956, File: sqlerFmpEngine.C HoldCount: 1
    Holding Latch type: (SQLO_LT_sqeAppServices__m_appServLatch) -
    Address: (0x200d20528), Line: 2819, File: sqle_app_services.C
    HoldCount: 1
    Holding Latch type: (SQLO_LT_SQLE_COORDINATOR_CB__coordCBLatch)
    - Address: (0x203eb78b8), Line: 2675, File:
    sqle_conn_services.C HoldCount: 1
    Holding Latch type: (SQLO_LT_sqeAgent__intrptLatch) - Address:
    (0x203c07edc), Line: 156, File: sqle_agent_interrupt.C
    HoldCount: 1
    </LatchInformation>
    Other agents are waiting on the various latches held by the
    above.
    A stack dump of the db2fmp process will show the master thread,
    identified by the sqlerMasterThreadListener function, stuck in
    the following stack:
    #0  semop ()
    #1  sqloSSemP ()
    #2  sqlccIPCWaitForReceive(SQLCC_IPC_RESOURCES_T*, unsigned
    int, SQLCC_IPC_CON_HANDLE_T*) ()
    #3  sqlccipcdarihandshake(SQLCC_INITSTRUCT_T*,
    SQLCC_COMHANDLE_T*) ()
    #4  sqlerMasterThreadListener ()
    #5  main ()
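
The two stacks above can serve as a rough signature when triaging a suspected occurrence. The following is a minimal sketch, not an IBM-provided tool: the frame names are copied from the stacks in this APAR, while the helper names (`contains_in_order`, `matches_hang_signature`) and the matching rule itself are assumptions made for illustration. Because the description notes that the problem can manifest through other code paths, a non-match does not rule the problem out.

```python
# Illustrative heuristic only. Frame names come from the stacks shown in
# this APAR; the helper names and the matching rule are assumptions.

# Innermost-first frames expected on the agent driving the force interrupt.
AGENT_FRAMES = ("sqloSSemP", "sqlccipcinit", "sqlerInterruptThreadedFmp")

# Innermost-first frames expected on the db2fmp master thread.
FMP_FRAMES = ("sqloSSemP", "sqlccIPCWaitForReceive", "sqlerMasterThreadListener")


def contains_in_order(stack_text, frames):
    """Return True if every frame name occurs in stack_text in the given order."""
    pos = 0
    for name in frames:
        found = stack_text.find(name, pos)
        if found < 0:
            return False
        pos = found + len(name)
    return True


def matches_hang_signature(agent_stack, fmp_stack):
    """Return True if both sides are blocked in sqloSSemP on the paths above."""
    return (contains_in_order(agent_stack, AGENT_FRAMES)
            and contains_in_order(fmp_stack, FMP_FRAMES))
```

To use it, pass the text of the agent stack and of the db2fmp master thread stack as two strings. Substring matching is used so that the mangled C++ symbol names in the agent stack match without demangling.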
    

Local fix

  • Restarting Big SQL releases the hang.
    

Problem summary

  • See error description
    

Problem conclusion

  • The problem is fixed in Version 5.0.0.0 and later fix packs.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI85182

  • Reported component name

    INFO BIGINSIGHT

  • Reported component ID

    5725C0900

  • Reported release

    410

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-07-28

  • Closed date

    2018-03-01

  • Last modified date

    2018-03-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • Unknown
    

Fix information

  • Fixed component name

    INFO BIGINSIGHT

  • Fixed component ID

    5725C0900

Applicable component levels

  • R425 PSY

       UP

Document Information

Modified date:
24 August 2020