IBM Support

LI72577: DB2 INSTANCE HANG DUE TO A 3-WAY DEADLATCH AMONG A DB2FCMR PROCESS AND TWO DB2AGENT PROCESSES

APAR status

  • Closed as program error.

Error description

  • DB2 instance hang due to a 3-way dead latch among a db2fcmr
    process and two db2agent processes.
    
    The latch information included in the trap files shows the
    following symptoms:
    
    A db2agent, which is performing a snapshot, is holding
    appGroupLatch and dbLatch and waiting for masterAppLatch. Its
    stack is:
    
      0000002A97BC40D8 sqloXlatchNewConflict + 0x0220
      0000002A960DC166 _Z20sqloxltc_new_notrackP14sqlo_xsemlatchPKcm + 0x0076
      0000002A960DC39D _Z18sqloxltc_new_trackP14sqlo_xsemlatchPKcm14SQLO_LT_VALUES + 0x0025
      0000002A965ED54E _Z25sqleGetNextAppForDatabaseP8sqledbcbPP18sqle_master_app_cb + 0x00e6
      0000002A96743E8A _Z12sqlmonssagntj13sqm_entity_idP6sqlmaijPvP14sqlm_collectedttP5sqlca + 0x0cea
      0000002A96733F0A _Z20sqlmPdbRequestRouterP13sqle_agent_cb + 0x068a
      0000002A965D11CE _Z20sqleSubRequestRouterP13sqle_agent_cbPjS1_ + 0x0366
    
    A subagent, which is in the process of being stolen from a
    different application, is holding masterAppLatch and waiting for
    assoc_active_agent_latch. Its stack resembles the following:
    
      0000002A97BC3E3D sqloSpinLockConflict + 0x00ed
      0000002A960D3D6D _Z16sqloencs_notrackP12sqloSpinLockPKcm + 0x006d
      0000002A960D3F5E _Z16sqloxltc_notrackP11sqlo_xlatchPKcm + 0x0006
      0000002A960D3F9D _Z14sqloxltc_trackP11sqlo_xlatchPKcm14SQLO_LT_VALUES + 0x0025
      0000002A965EDC92 _Z17sqleGetAppLatchesP18sqle_master_app_cb + 0x0092
      0000002A96771A7E _Z20sqlmon_agent_cleanupP20sqle_agent_privatecb + 0x016e
      0000002A965D98E7 _Z19sqleAgentDissociateP13sqle_agent_cbi + 0x18b7
      0000002A965CED34 _Z16sqleInitSubAgentP13sqle_agent_cb + 0x00f4
    
    
    
    A db2fcmr process is holding agent_list_latch and
    appGroupListLatch and waiting for appGroupLatch. This db2fcmr
    process is also holding assoc_active_agent_latch, which is not
    shown in the latch info. Its stack resembles the following:
    
      0000002A97BC40D8 sqloXlatchNewConflict + 0x0220
      0000002A960DC166 _Z20sqloxltc_new_notrackP14sqlo_xsemlatchPKcm + 0x0076
      0000002A960DC39D _Z18sqloxltc_new_trackP14sqlo_xsemlatchPKcm14SQLO_LT_VALUES + 0x0025
      0000002A965AA6E1 _Z20sqleGetAgentFromPooliP17sqlcc_init_structiP12sqlz_app_hdlP16sqlkdRqstRplyFmtP17sqle_connect_info + 0x0961
      0000002A965A7B41 _Z12sqleGetAgentiP17sqlcc_init_structiP12sqlz_app_hdlP16sqlkdRqstRplyFmti + 0x0449
      0000002A965CC7AA _Z21sqlePdbProcessRequestP11sqkfChannelPv + 0x081a
      0000002A966A486C _ZN19sqkfFastCommManager18RouteInboundBufferERP10sqkfBufferP17sqkfSessionHandleii + 0x029c
      0000002A961190FB _ZN19sqkfFastCommManager13DeliverBufferERP10sqkfBufferi + 0x016b
      0000002A966AF678 _ZN15sqkfRecvConduit15HandleDataEventEm + 0x0238
      0000002A966AFBC5 _ZN15sqkfRecvConduit6RunEDUEv + 0x010d
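
    The three hold/wait relationships described above form a cycle,
    which a small wait-for graph makes explicit. The following is an
    illustrative sketch only, not DB2 code: the process labels
    ("snapshot db2agent", "stolen subagent", "db2fcmr") and the helper
    functions are hypothetical names chosen for this example, while
    the latch names are taken from the description above.

```python
from typing import Dict, List, Optional

# Latches held by each process, as described in the text.
# The process labels are hypothetical, not DB2 identifiers.
HOLDS: Dict[str, List[str]] = {
    "snapshot db2agent": ["appGroupLatch", "dbLatch"],
    "stolen subagent": ["masterAppLatch"],
    "db2fcmr": ["agent_list_latch", "appGroupListLatch",
                "assoc_active_agent_latch"],
}

# The single latch each process is waiting for.
WAITS_FOR: Dict[str, str] = {
    "snapshot db2agent": "masterAppLatch",
    "stolen subagent": "assoc_active_agent_latch",
    "db2fcmr": "appGroupLatch",
}


def build_wait_graph(holds: Dict[str, List[str]],
                     waits_for: Dict[str, str]) -> Dict[str, str]:
    """Map each waiting process to the process holding the latch it wants."""
    owner = {latch: proc for proc, latches in holds.items() for latch in latches}
    return {proc: owner[latch]
            for proc, latch in waits_for.items() if latch in owner}


def find_cycle(graph: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from start; return the cycle found, or None."""
    path: List[str] = []
    node: Optional[str] = start
    while node is not None and node not in path:
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None  # chain ended at a process that is not blocked
    return path[path.index(node):]


if __name__ == "__main__":
    cycle = find_cycle(build_wait_graph(HOLDS, WAITS_FOR), "snapshot db2agent")
    if cycle:
        print(" -> ".join(cycle + [cycle[0]]))
```

    If assoc_active_agent_latch is left out of the db2fcmr entry, as
    it is in the latch info, the sketch finds no cycle. This is why
    the deadlatch is not apparent from the latch info alone.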
    

Local fix

  • 1) Refrain from running snapshots
    
    AND/OR
    2) Increase NUM_POOLAGENTS to reduce the occurrence
    of agent stealing.
    
    AND/OR
    3) db2set DB2_PREFER_AGENT_CREATE=ON
    

Problem summary

  • DB2 INSTANCE HANG DUE TO A 3-WAY DEADLATCH AMONG A DB2FCMR
    PROCESS AND TWO DB2AGENT PROCESSES
    

Problem conclusion

  • First fixed in DB2 UDB Version 9.1, FixPak 4a
    

Temporary fix

Comments

APAR Information

  • APAR number

    LI72577

  • Reported component name

    DB2 UDE ESE LIN

  • Reported component ID

    5765F4104

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2007-09-20

  • Closed date

    2008-04-22

  • Last modified date

    2008-04-22

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 UDE ESE LIN

  • Fixed component ID

    5765F4104

Applicable component levels

  • R910 PSY

       UP

Document Information

Modified date:
16 October 2021