IBM Support

IT31945: INSTANCE CAN HANG DURING NODE FAILURE RECOVERY

APAR status

  • Closed as program error.

Error description

  • During node failure processing, an instance can hang in a latch
    wait. Two FCM threads acquire the same two latches in opposite
    order, so each thread waits for a latch that the other holds.
    
    FCM thread #1 is the db2fcmr daemon. This thread holds the
    fcmNodeState latch in Exclusive mode in HandleNodeFailures but
    requires the fcmChannel latch in InformNodeFailed.
    
    <StackTrace>
    thread_wait
    getConflictComplex
    sqkfFastCommManager::InformNodeFailed
    sqkfConduit::HandleNodeFailure
    sqkfRecvConduit::HandleDeliverBufferError
    sqkfRecvConduit::RunEDU()
    sqkfRecvConduit::RunEDU()
    sqzEDUObj::EDUDriver
    sqloEDUEntry
    </StackTrace>
    :
    <LatchInformation>
    
    Waiting on latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmChannelLatch) - Address:
    (0x78000002f1dff40), Line: 570, File:
    /view/db2_v105fp5_aix64_s141128/vbs/engn/include/sqlkf_fcm_inlines.h
    
    Holding Latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmNodeStateLatch) - Address:
    (0x78000002f1dff20), Line: 2157, File: sqlkf_fcm.C HoldCount: 1
    Holding Latch type: (SQLO_LT_sqkfMLNMgr__m_fcmMlnLatch) -
    Address: (0x78000002b543de8), Line: 2142, File: sqlkf_fcm.C
    HoldCount: 1
    </LatchInformation>
    
    
    FCM thread #2 is the db2fcms daemon. This thread holds the
    fcmChannel latch and waits for the fcmNodeState latch in Shared
    mode:
    
    <StackTrace>
    thread_wait
    getConflictComplex
    getConflict
    sqkfChannel::SendControlBuffer
    sqlkd_snd_buffer
    sqlkd_snd_complete
    sqlkdDispatchRequest
    sqleSendDbmCfgRecovery
    sqkfSendConduit::HandleRequests
    sqkfSendConduit::HandleRequests
    sqkfSendConduit::RunEDU
    sqzEDUObj::EDUDriver
    sqloEDUEntry
    </StackTrace>
    :
    <LatchInformation>
    
    Waiting on latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmNodeStateLatch) - Address:
    (0x78000002f1dff20), Line: 2185, File: sqlkf_channel.C
    
    Holding Latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmChannelLatch) - Address:
    (0x78000002f1dff40), Line: 813, File:
    /view/db2_v105fp5_aix64_s141128/vbs/engn/include/sqlkf_fcm_inlines.h
    HoldCount: 1
    Holding Latch type: (SQLO_LT_sqkfMLNMgr__m_fcmMlnLatch) -
    Address: (0x78000002b543de8), Line: 807, File:
    /view/db2_v105fp5_aix64_s141128/vbs/engn/include/sqlkf_fcm_inlines.h
    HoldCount: 1
    </LatchInformation>
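
The two latch dumps above can be read as a wait-for graph: each thread points to the thread that holds the latch it is waiting on, and a cycle in that graph means neither thread can proceed. The following is a minimal sketch of that reasoning, not a Db2 tool or API. The dictionary layout and the function names `build_wait_for_graph` and `find_cycle` are hypothetical, and the model makes the simplifying assumption that any holder of a latch blocks any waiter, ignoring Shared versus Exclusive modes. The latch names are shortened from the dump for readability.

```python
# Latch state copied from the two <LatchInformation> blocks above.
# The data layout and function names are illustrative, not Db2 APIs.
THREADS = {
    "db2fcmr": {
        "holds": {"m_fcmNodeStateLatch", "m_fcmMlnLatch"},
        "waits_for": "m_fcmChannelLatch",
    },
    "db2fcms": {
        "holds": {"m_fcmChannelLatch", "m_fcmMlnLatch"},
        "waits_for": "m_fcmNodeStateLatch",
    },
}


def build_wait_for_graph(threads):
    """Map each thread to the other threads that hold the latch it wants.

    Assumption: any holder blocks any waiter (latch modes are ignored).
    """
    graph = {}
    for name, state in threads.items():
        wanted = state["waits_for"]
        graph[name] = sorted(
            other
            for other, other_state in threads.items()
            if other != name and wanted in other_state["holds"]
        )
    return graph


def find_cycle(graph):
    """Return one cycle of thread names as a list, or None if none exists."""
    visiting = []   # current depth-first search path
    done = set()    # nodes fully explored without finding a cycle

    def visit(node):
        if node in visiting:
            # The path has returned to a node already on it: a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


graph = build_wait_for_graph(THREADS)
cycle = find_cycle(graph)
print(" -> ".join(cycle) if cycle else "no cycle")
```

With the latch state from this APAR, the sketch reports the cycle `db2fcmr -> db2fcms -> db2fcmr`, which matches the hang described above. Both threads also hold m_fcmMlnLatch, but neither waits on it, so it does not contribute an edge.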
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Db2 11.1 Mod 4 Fixpack 6 or higher                *
    ****************************************************************
    

Problem conclusion

  • First fixed in Db2 11.1 Mod 4 Fixpack 6
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT31945

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-02-21

  • Closed date

    2021-03-15

  • Last modified date

    2021-03-15

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • RB10 PSN UP

Document Information

Modified date:
04 May 2022