IBM Support

IT27523: HDR PRIMARY DR_GETTYPE THREAD AND SECONDARY DR_ACCEPT THREADS CAN HANG ATTEMPTING TO RECONNECT AFTER DR:TURNED OFF ON PRIMARY SE

APAR status

  • Closed as program error.

Error description

  • After network issues caused HDR to be shut down, reconnecting the
    primary to the HDR secondary left threads hung on both the primary
    and the secondary. The threads were never released, which
    prevented HDR from syncing back up.
    
    On the primary, the dr_prsend thread was observed waiting for a
    dr_gettype thread that was stuck:
    
        Thread CPU Info:
         tid     name              vp      Last Run          CPU Time   #scheds    status
    
         147     dr_prsend         8cpu    10/12 10:03:51    65.1068    1977253    join wait  3679913
         3679913 dr_gettype        9cpu    10/12 19:11:09    0.2838     33084      cond wait  smx pipe1
    
    The stack trace of the dr_gettype thread showed it stuck in
    smx_connect:
    
        Stack for thread: 3679913 dr_gettype
         base: 0x000000015098d000
          len:   69632
           pc: 0x00000001012c63c0
          tos: 0x000000015099cdd1
        state: cond wait
           vp: 10
    
        oninit :: yield_processor_mvp
        oninit :: mt_wait
        oninit :: smx_connect
        oninit :: SC_smx_sporadic_connect
        oninit :: SC_maxmsg_ping
        oninit :: GetServerVersionInfo
        oninit :: verify_server_version
        oninit :: dr_whattype
        oninit :: startup
    
    At the same time, on the HDR secondary, a dr_accept thread was
    also hung. Its stack was:
    
       Stack for thread: 47176 dr_accept
    
        oninit :: yield_processor_mvp
        oninit :: mt_wait
        oninit :: net_buf_get
        oninit :: recvtli
        oninit :: slSQIrecv
        oninit :: pfRecv
        oninit :: asfRecv
        oninit :: ASF_Call
        oninit :: rsasf_recv_buf
        oninit :: rsasf_recv_with_timeout
        oninit :: dr_asf_recv_with_timeout
        oninit :: dr_session_recv_with_timeout
        oninit :: dr_acceptInt
        oninit :: dr_accept
        oninit :: listen_verify
        oninit :: spawn_thread
        oninit :: th_init_initgls
    
    In this scenario, both the dr_accept thread on the HDR secondary
    and the dr_gettype thread on the primary had been hung for over
    9 hours. Restarting the HDR secondary did not release the
    dr_gettype thread on the primary, and the primary had to be
    restarted to sync HDR back up.
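
    The thread listing in this description can be screened for the same
    pattern with a small script. The following is a minimal sketch only,
    not an IBM-provided tool. It assumes the listing has been captured as
    text with one thread per line, in the column order shown above
    (tid, name, vp, Last Run, CPU Time, #scheds, status), and that the
    timestamps belong to the year of the reference time passed in. The
    function names, the one-hour threshold, and the reference time are
    hypothetical. A DR thread in a wait state is not necessarily hung, so
    the result is a triage hint rather than a diagnosis.

```python
import re
from datetime import datetime, timedelta

# One unwrapped row of the "Thread CPU Info" listing:
# tid, name, vp, Last Run (MM/DD HH:MM:SS), CPU Time, #scheds, status.
ROW = re.compile(
    r"^\s*(?P<tid>\d+)\s+(?P<name>\S+)\s+(?P<vp>\S+)\s+"
    r"(?P<last_run>\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<cpu>[\d.]+)\s+(?P<scheds>\d+)\s+(?P<status>.+?)\s*$"
)

# The two rows quoted in this APAR, joined onto single lines.
SAMPLE = """\
 147     dr_prsend         8cpu    10/12 10:03:51    65.1068    1977253    join wait  3679913
 3679913 dr_gettype        9cpu    10/12 19:11:09    0.2838      33084    cond wait  smx pipe1
"""


def parse_rows(text):
    """Return one dict per line that matches the row layout; skip the rest."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


def stale_dr_waiters(rows, now, max_idle=timedelta(hours=1)):
    """Return (tid, name, status, idle) for dr_* threads that are in a
    wait state and last ran more than max_idle before `now`.

    Assumption: the listing carries no year, so the year of `now` is used.
    """
    flagged = []
    for row in rows:
        if not row["name"].startswith("dr_") or "wait" not in row["status"]:
            continue
        last_run = datetime.strptime(
            f"{now.year}/{row['last_run']}", "%Y/%m/%d %H:%M:%S"
        )
        idle = now - last_run
        if idle > max_idle:
            flagged.append((int(row["tid"]), row["name"], row["status"], idle))
    return flagged


if __name__ == "__main__":
    # Hypothetical observation time; the APAR does not state one.
    observed_at = datetime(2018, 10, 13, 4, 30, 0)
    for tid, name, status, idle in stale_dr_waiters(parse_rows(SAMPLE), observed_at):
        print(f"{tid} {name}: {status} (idle {idle})")
```

    Threads flagged this way can then be examined individually, for
    example by collecting their stacks as was done for the dr_gettype and
    dr_accept threads in this report.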
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of IDS 12.10.xC10 and earlier versions.                *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * HDR primary dr_gettype thread and secondary dr_accept        *
    * threads can hang attempting to reconnect after DR:Turned off *
    * on primary server.                                           *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    

Problem conclusion

  • Fixed in IDS 12.10.xC11.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT27523

  • Reported component name

    INFORMIX SERVER

  • Reported component ID

    5725A3900

  • Reported release

    C10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-12-24

  • Closed date

    2019-10-07

  • Last modified date

    2019-10-07

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFORMIX SERVER

  • Fixed component ID

    5725A3900

Applicable component levels

  • Business unit

    Cloud & Data Platform

  • Product

    Informix Servers

  • Platform

    Platform Independent

  • Version

    C10

Document Information

Modified date:
07 October 2019