IBM Support

IT31945: INSTANCE CAN HANG DURING NODE FAILURE RECOVERY

APAR status

  • Closed as program error.

Error description

  • During node failure processing, an instance can hang in a latch
    wait because two FCM threads each hold a latch that the other
    requires.
    
    FCM thread #1 belongs to the db2fcmr daemon. It holds the
    fcmNodeState latch in exclusive mode in HandleNodeFailures and
    waits for the fcmChannel latch in InformNodeFailed:
    
    <StackTrace>
    thread_wait
    getConflictComplex
    sqkfFastCommManager::InformNodeFailed
    sqkfConduit::HandleNodeFailure
    sqkfRecvConduit::HandleDeliverBufferError
    sqkfRecvConduit::RunEDU()
    sqkfRecvConduit::RunEDU()
    sqzEDUObj::EDUDriver
    sqloEDUEntry
    </StackTrace>
    :
    <LatchInformation>
    
    Waiting on latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmChannelLatch) - Address:
    (0x78000002f1dff40), Line: 570, File:
    /view/db2_v105fp5_aix64_s141128/vbs/engn/include/sqlkf_fcm_inlines.h
    
    Holding Latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmNodeStateLatch) - Address:
    (0x78000002f1dff20), Line: 2157, File: sqlkf_fcm.C HoldCount: 1
    Holding Latch type: (SQLO_LT_sqkfMLNMgr__m_fcmMlnLatch) -
    Address: (0x78000002b543de8), Line: 2142, File: sqlkf_fcm.C
    HoldCount: 1
    </LatchInformation>
    
    
    FCM thread #2 belongs to the db2fcms daemon. It holds the
    fcmChannel latch and waits for the fcmNodeState latch in shared
    mode:
    
    <StackTrace>
    thread_wait
    getConflictComplex
    getConflict
    sqkfChannel::SendControlBuffer
    sqlkd_snd_buffer
    sqlkd_snd_complete
    sqlkdDispatchRequest
    sqleSendDbmCfgRecovery
    sqkfSendConduit::HandleRequests
    sqkfSendConduit::HandleRequests
    sqkfSendConduit::RunEDU
    sqzEDUObj::EDUDriver
    sqloEDUEntry
    </StackTrace>
    :
    <LatchInformation>
    
    Waiting on latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmNodeStateLatch) - Address:
    (0x78000002f1dff20), Line: 2185, File: sqlkf_channel.C
    
    Holding Latch type:
    (SQLO_LT_sqkfFastCommManager__m_fcmChannelLatch) - Address:
    (0x78000002f1dff40), Line: 813, File:
    /view/db2_v105fp5_aix64_s141128/vbs/engn/include/sqlkf_fcm_inlines.h
    HoldCount: 1
    Holding Latch type: (SQLO_LT_sqkfMLNMgr__m_fcmMlnLatch) -
    Address: (0x78000002b543de8), Line: 807, File:
    /view/db2_v105fp5_aix64_s141128/vbs/engn/include/sqlkf_fcm_inlines.h
    HoldCount: 1
    </LatchInformation>
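
The two traces above describe a lock-order inversion: each thread holds the latch the other is waiting on. The following is a minimal sketch, not Db2 code, that encodes the holding/waiting information from the traces and checks the resulting wait-for graph for a cycle. The dictionaries, function names, and shortened latch names are hypothetical and chosen only for illustration; the sketch also assumes, as a simplification, that any holder of a latch blocks any waiter, regardless of shared or exclusive mode.

```python
from typing import Dict, List, Optional, Set

# Latch and daemon names are shortened from the traces above; the data
# structures themselves are hypothetical, for illustration only.
HOLDS: Dict[str, Set[str]] = {
    "db2fcmr": {"fcmNodeStateLatch", "fcmMlnLatch"},
    "db2fcms": {"fcmChannelLatch", "fcmMlnLatch"},
}
WAITS: Dict[str, str] = {
    "db2fcmr": "fcmChannelLatch",
    "db2fcms": "fcmNodeStateLatch",
}


def wait_for_graph(holds: Dict[str, Set[str]],
                   waits: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build edges waiter -> holder for every latch a thread waits on.

    Simplification (assumption): any holder blocks any waiter.
    """
    graph: Dict[str, Set[str]] = {thread: set() for thread in holds}
    for waiter, latch in waits.items():
        graph.setdefault(waiter, set())
        for holder, held in holds.items():
            if holder != waiter and latch in held:
                graph[waiter].add(holder)
    return graph


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of threads, or None if there is none."""
    visiting: List[str] = []   # current depth-first search path
    done: Set[str] = set()     # threads fully explored without a cycle

    def dfs(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Reached a thread already on the path: that closes a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(wait_for_graph(HOLDS, WAITS)))
```

With the data taken from the traces, the check reports the cycle db2fcmr, db2fcms, db2fcmr. The fcmMlnLatch held by both threads does not contribute an edge because neither thread is waiting on it.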
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Db2 11.1 Mod 4 Fixpack 6 or higher                *
    ****************************************************************
    

Problem conclusion

  • First fixed in Db2 11.1 Mod 4 Fixpack 6
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT31945

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-02-21

  • Closed date

    2021-03-15

  • Last modified date

    2021-03-15

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • Product: Db2 for Linux, UNIX and Windows (SSEPGG)
  • Version: 11.1
  • Platform: Platform Independent
  • Line of Business: Data and AI
  • Business Unit: Cloud & Data Platform

Document Information

Modified date:
16 March 2021