IBM Support

IT30876: DEADLOCK BETWEEN RSS_SEND AND DR_PRSEND THREADS USING DRCB_NODE_COUNT_LOCK AND RELIABLECV_T CONDITION


APAR status

  • Closed as program error.

Error description

  • On an instance in a high-availability (HA) environment, the
    server can freeze completely because of a deadlock between an
    RSS_send thread and a dr_prsend thread.
    
    The deadlock can be identified from the list of locked mutexes:
    
    Locked mutexes:
    mid      addr             name               holder   lkcnt  waiter   waittime
    21294    cc52ce68         drcb_lock          25677    0
    21295    cc52cf10         drcb_node_count_lo 25677    0      69992    24827
    21307    cc534080         SynchSWMR_t::0xcc5 69992    0
    
    The drcb_node_count_lock mutex is held by thread 25677
    (dr_prsend). Thread 69992 (RSS_send) is waiting for this mutex.
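As an illustration, the holder/waiter relationship in the locked-mutex list can be read mechanically. The Python sketch below assumes the column layout shown in this APAR (mid, addr, name, holder, lkcnt, then an optional waiter and waittime); `parse_locked_mutexes` and `LockedMutex` are hypothetical names, not part of any Informix tool. Note that onstat truncates long names, so `drcb_node_count_lo` is kept exactly as printed.

```python
import re
from typing import List, NamedTuple, Optional


class LockedMutex(NamedTuple):
    mid: int
    name: str
    holder: int
    waiter: Optional[int]  # None when no thread is queued on the mutex


# Rows of the "Locked mutexes" list from this APAR (layout assumed, see above).
SAMPLE = """\
21294    cc52ce68         drcb_lock          25677    0
21295    cc52cf10         drcb_node_count_lo 25677    0      69992    24827
21307    cc534080         SynchSWMR_t::0xcc5 69992    0
"""

# mid, addr, name, holder, lkcnt, then an optional "waiter waittime" pair
ROW = re.compile(
    r"^\s*(\d+)\s+([0-9a-f]+)\s+(\S+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?\s*$"
)


def parse_locked_mutexes(text: str) -> List[LockedMutex]:
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip the header, blank lines and anything unrecognised
        waiter = int(m.group(6)) if m.group(6) else None
        rows.append(LockedMutex(int(m.group(1)), m.group(3), int(m.group(4)), waiter))
    return rows


if __name__ == "__main__":
    for row in parse_locked_mutexes(SAMPLE):
        if row.waiter is not None:
            print(f"thread {row.waiter} waits for {row.name} held by thread {row.holder}")
```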
    
    From the onstat -g ath output, the two threads have the
    following status:
    
     25677  dr_prsend         1cpu    11/01 11:40:40        2.5455    46375    cond wait  ReliableCV
     69992  RSS_Send_ie1_ix   8cpu    11/01 11:40:40        0.0029    7        mutex wait drcb_node_
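A small sketch of pulling the blocked threads out of such status lines follows. To avoid guessing at the meaning of every column, it assumes only that each line starts with the thread id and thread name and ends with a `cond wait` or `mutex wait` status followed by the object name (truncated by onstat, as with `drcb_node_`). The helper `blocked_threads` is hypothetical.

```python
import re
from typing import Dict, Tuple

# The two thread lines from this APAR, one per line.
SAMPLE = """\
 25677  dr_prsend         1cpu    11/01 11:40:40        2.5455    46375    cond wait  ReliableCV
 69992  RSS_Send_ie1_ix   8cpu    11/01 11:40:40        0.0029    7        mutex wait drcb_node_
"""

# Only the leading thread id and name and the trailing "<kind> wait <object>"
# status are relied on; the columns in between are skipped.
LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s.*?\b(cond|mutex) wait\s+(\S+)\s*$")


def blocked_threads(text: str) -> Dict[int, Tuple[str, str, str]]:
    """Map thread id -> (thread name, wait kind, wait object) for blocked threads."""
    found = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            found[int(m.group(1))] = (m.group(2), m.group(3), m.group(4))
    return found


if __name__ == "__main__":
    for tid, (name, kind, obj) in sorted(blocked_threads(SAMPLE).items()):
        print(f"{tid} {name}: {kind} wait on {obj}")
```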
    
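    Thread 25677, the holder of drcb_node_count_lock, is itself
    waiting on the ReliableCV condition, which is tied to the
    SynchSWMR_t mutex held by the RSS_send thread (69992). Each
    thread waits for a resource the other holds, so neither can
    proceed.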
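The two waits described above form a cycle in a wait-for graph, which is the standard way to recognise a deadlock. The following Python sketch builds that graph by hand from this APAR's diagnostics and follows the edges until it either reaches a thread that is not blocked or returns to a thread already visited. `WAITS_FOR` and `find_cycle` are illustrative names; it assumes each blocked thread waits on exactly one thing, as in the output shown.

```python
from typing import Dict, List, Optional

# Wait-for graph assembled by hand from the diagnostics in this APAR.
# An edge A -> B means "thread A is blocked on something thread B holds".
WAITS_FOR: Dict[int, int] = {
    69992: 25677,  # RSS_send waits for drcb_node_count_lock, held by dr_prsend
    25677: 69992,  # dr_prsend waits on ReliableCV, tied to SynchSWMR_t held by RSS_send
}


def find_cycle(waits_for: Dict[int, int], start: int) -> Optional[List[int]]:
    """Follow wait-for edges from `start` and return the cycle it runs into, if any."""
    path: List[int] = []
    position: Dict[int, int] = {}  # thread id -> index in path
    node = start
    while node in waits_for:  # stop at a thread that is not blocked
        if node in position:
            return path[position[node]:]  # the looping part of the chain
        position[node] = len(path)
        path.append(node)
        node = waits_for[node]
    return None  # the chain ends at a running thread: no deadlock


if __name__ == "__main__":
    print(find_cycle(WAITS_FOR, 69992))  # [69992, 25677]
```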
    
    The stacks of the two threads are:
    
    
    Stack for thread: 69992 RSS_Send_ie1_ixdpp01a_qa
     base: 0x00000000d8126000
      len:   69632
       pc: 0x000000000143ead7
      tos: 0x00000000d8136a10
    state: mutex wait
       vp: 8
    
    0x000000000143ead7 (oninit) yield_processor_mvp
    0x000000000144afae (oninit) mt_lock_wait
    0x0000000001451072 (oninit) mt_lock_helper
    0x0000000001202138 (oninit) cloneAttachCB
    0x0000000001206bf2 (oninit) cloneSend_Int
    0x00000000011f0b82 (oninit) cloneStdSend
    0x0000000001419870 (oninit) th_init_initgls
    0x000000000145f2b7 (oninit) startup
    
    
    Stack for thread: 25677 dr_prsend
     base: 0x00000000dc972000
      len:   69632
       pc: 0x000000000143ead7
      tos: 0x00000000dc982c40
    state: cond wait
       vp: 1
    
    0x000000000143ead7 (oninit) yield_processor_mvp
    0x0000000001453441 (oninit) mt_wait
    0x000000000107db04 (oninit) reliablecv_wait
    0x000000000107ed7b (oninit) synchswmr_reader_enter
    0x000000000128e418 (oninit) SendGlobalVersionInfo
    0x00000000011d3822 (oninit) dr_state_change
    0x00000000011dbf46 (oninit) dr_session_thread
    0x000000000145f2b7 (oninit) startup
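The stacks show the two threads taking the same pair of resources in opposite order: RSS_send holds SynchSWMR_t and blocks on drcb_node_count_lock in `cloneAttachCB`, while dr_prsend holds drcb_node_count_lock and blocks in `synchswmr_reader_enter` → `reliablecv_wait`. The minimal Python model below reproduces that lock-order inversion. It is not Informix code: it assumes that RSS_send holds SynchSWMR_t in a writer role and that a reader must wait on ReliableCV until the writer leaves, and every name in it is a hypothetical stand-in. The timeouts exist only so the demo terminates; the real threads wait indefinitely.

```python
import threading

# Hypothetical stand-ins for the resources named in this APAR (not Informix code).
node_count_lock = threading.Lock()  # models drcb_node_count_lock
swmr_cond = threading.Condition()   # models ReliableCV guarding SynchSWMR_t
writer_active = False               # True while the RSS_send model holds SynchSWMR_t

both_hold = threading.Barrier(2)    # each thread now holds its first resource
both_tried = threading.Barrier(2)   # each thread has attempted its second resource
results = {}


def dr_prsend_model():
    with node_count_lock:                      # 1. holds drcb_node_count_lock
        both_hold.wait()
        with swmr_cond:                        # 2. "reader enter": wait until no writer
            ok = swmr_cond.wait_for(lambda: not writer_active, timeout=0.2)
        results["dr_prsend"] = ok
        both_tried.wait()                      # keep the lock until both have tried


def rss_send_model():
    global writer_active
    with swmr_cond:
        writer_active = True                   # 1. holds SynchSWMR_t as the writer
    both_hold.wait()
    ok = node_count_lock.acquire(timeout=0.2)  # 2. needs drcb_node_count_lock
    results["RSS_send"] = ok
    both_tried.wait()                          # keep the writer role until both have tried
    if ok:
        node_count_lock.release()
    with swmr_cond:                            # leave the writer role, wake readers
        writer_active = False
        swmr_cond.notify_all()


def run():
    threads = [threading.Thread(target=f) for f in (dr_prsend_model, rss_send_model)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(results)


if __name__ == "__main__":
    # Both entries are False: each thread timed out waiting on the other.
    print(run())
```

The usual remedy for this pattern is to have every code path acquire the two resources in one consistent order; the APAR does not say how the actual fix was implemented.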
    
    Additionally, in the customer environment where the problem was
    diagnosed, many other threads were waiting on the ReliableCV
    condition because they were reading the syscluster and
    sysha_nodes tables. These threads are victims of the deadlock,
    not its root cause.
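Separating the root cause from the victims can also be expressed on a wait-for graph: a thread belongs to the deadlock only if its chain of waits leads back to itself, and every other blocked thread is just queued behind the cycle. In the Python sketch below, `split_deadlock` is a hypothetical helper, and the reader thread ids 70001 and 70002 are made up; it assumes those readers wait on ReliableCV for the SynchSWMR_t writer, RSS_send (69992).

```python
from typing import Dict, Set, Tuple


def split_deadlock(waits_for: Dict[int, int]) -> Tuple[Set[int], Set[int]]:
    """Return (deadlock members, victims) for a wait-for graph.

    waits_for maps each blocked thread to the thread holding what it waits for.
    A thread is a member if its chain of edges leads back to itself; every
    other blocked thread is a victim that is merely queued behind someone.
    """
    members: Set[int] = set()
    for start in waits_for:
        node, seen = start, set()
        while node in waits_for and node not in seen:
            seen.add(node)
            node = waits_for[node]
        if node == start:
            members.add(start)
    return members, set(waits_for) - members


# The two deadlocked threads plus two hypothetical reader threads (made-up ids)
# blocked on ReliableCV behind the SynchSWMR_t writer, RSS_send (69992).
GRAPH = {69992: 25677, 25677: 69992, 70001: 69992, 70002: 69992}

if __name__ == "__main__":
    members, victims = split_deadlock(GRAPH)
    print(sorted(members), sorted(victims))  # [25677, 69992] [70001, 70002]
```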
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of Informix Server prior to 12.10.xC14 and 14.10.xC4.  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Update to Informix Server 12.10.xC14 or 14.10.xC4.           *
    ****************************************************************
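To check whether a given server level falls under this APAR, a hedged Python sketch follows. It assumes version strings shaped like `12.10.FC13` or `14.10.FC4W1`, where the letter before `C` varies by platform (the `x` in `xC14`) and any suffix after the fix-pack number is ignored; the helper name `affected_by_it30876` is hypothetical. Because the APAR lists fix levels only for the 12.10 and 14.10 families, any other family returns `None` rather than a guess.

```python
import re
from typing import Optional

# Fix levels from this APAR, keyed by release family.
FIXED_IN = {"12.10": 14, "14.10": 4}

# Assumed shape: "<major.minor>.<platform letter>C<fix pack>[suffix]"
VERSION = re.compile(r"^(\d+\.\d+)\.[A-Z]C(\d+)")


def affected_by_it30876(version: str) -> Optional[bool]:
    m = VERSION.match(version.strip())
    if m is None:
        return None  # unrecognised version format
    family, fixpack = m.group(1), int(m.group(2))
    if family not in FIXED_IN:
        return None  # this APAR names fix levels only for 12.10 and 14.10
    return fixpack < FIXED_IN[family]


if __name__ == "__main__":
    for v in ("12.10.FC13", "12.10.FC14", "14.10.FC3", "14.10.FC4W1", "11.70.FC9"):
        print(v, affected_by_it30876(v))
```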
    

Problem conclusion

  • Fixed in Informix Server 12.10.xC14 and 14.10.xC4.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT30876

  • Reported component name

    INFORMIX SERVER

  • Reported component ID

    5725A3900

  • Reported release

    C10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-11-07

  • Closed date

    2020-02-27

  • Last modified date

    2020-02-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFORMIX SERVER

  • Fixed component ID

    5725A3900

Applicable component levels

  • Product

    Informix Servers (SSGU8G)

  • Version

    C10

  • Platform

    Platform Independent

  • Business unit

    Cloud & Data Platform

  • Line of business

    Data and AI

Document Information

Modified date:
27 February 2020