IBM Support

IT16755: ADDING A READAHEAD THREAD DURING TRANSITION FROM SECONDARY TO PRIMARY CAN HANG THE SERVER


APAR status

  • Closed as program error.

Error description

  • When a number of readahead requests are pending, a readahead
    thread may decide to spawn an additional readahead thread.
    
    If this happens during a transition from secondary to primary,
    e.g. for a 'make primary', the allocation of the new thread is
    blocked in txalloc() (no new transactions are allowed as long as
    the log stream is not yet owned by the new primary).  If, on the
    other hand, one of the outstanding readahead requests belongs to
    a thread that is 'in critical section', and that thread
    eventually abandons the request while our readahead thread is
    still the only one of its kind, then that readahead request can
    neither be served nor abandoned, and the next checkpoint -
    required to proceed with the transition - is stuck.
    
    
    Symptoms seen:
    
     - engine still in 'Fast Recovery (CKPT REQ)' -> Blocked:CKPT
     - main_loop in wait4critex()
     - lots of threads (mostly sqlexec) in ra_terminate_req()
     - RSS_onmode thread in this function as well, and 'X'-flagged
       (critical section)
    
    
    Threads:
     tid     tcb              rstcb            prty status                vp-class  name
     ...
     7       700000026fa6248  7000000260b9028  1    sleeping secs: 1      12cpu     main_loop()
     ...
     828     70000002c4e5bf8  7000000260d50a8  1    join wait  829        1cpu      sqlexec
     829     70000002c22d028  7000000260d8468  1    join wait  830        1cpu      Priority Make Primary
     830     70000002c22d2b0  7000000260c67c8  1    sleeping secs: 1      12cpu     RSS_onmod
     ...
    
    
    Userthreads
    address          flags   sessid   user     tty      wait     tout locks nreads   nwrites
    ...
    7000000260c67c8  --R-XR- 31       informix -        0        0    0     8        1
    ...
    
    Stack for thread: 7 main_loop()
     base: 0x0700000026fb0000
      len:   69632
       pc: 0x0000000100062610
      tos: 0x0700000026fbcb60
    state: sleeping
       vp: 8
    
      yield_processor_mvp
      mt_yield
      wait4critex
      checkpoint
      main_loop
      th_init_initgls
      startup
    
    Stack for thread: 830 RSS_onmod
     base: 0x070000002c7c1000
      len:   69632
       pc: 0x0000000100062610
      tos: 0x070000002c7cfff0
    state: sleeping
       vp: 10
    
      yield_processor_mvp
      mt_yield
      ra_terminate_req
      ra_free_req
      rollback
      rsrollback
      txcleanup
      rsclose_lgr
      dr_lgr_end
      dr_finish_recovery
      sdcloneAssumePrimaryActivate
      sdcloneAssumePrimary
      cloneOnmodeSetSDCloneThread
      th_init_initgls
      startup
    
    
    Stack for thread: 33 readahead_0
     base: 0x070000002762e000
      len:   69632
       pc: 0x0000000100062610
      tos: 0x070000002763e310
    state: sleeping
       vp: 10
    
      yield_processor_mvp
      mt_yield
      txalloc
      init_rstcb
      rstcb_alloc
      fork_be_session
      add_readahead_thread
      readahead_daemon
      th_init_initgls
      startup
    
    
    Many other threads show stacks like these:
    
      yield_processor_mvp
      mt_yield
      ra_terminate_req
      ra_init_leafscan
      ...
    
      yield_processor_mvp
      mt_yield
      ra_terminate_req
      ra_free_req
      opfree
      ...
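
The hang described above can be read as a circular wait. The following Python sketch only illustrates that reading; it is not server code. The node labels are hypothetical shorthand for the actors named in this APAR, and the edges are an interpretation of the description (for example, that wait4critex() waits for threads to leave their critical sections), not a statement about Informix internals.

```python
# Illustrative wait-for graph: each key waits for its value.
# Labels are shorthand for actors in the APAR text (assumed, not server internals).
WAITS_FOR = {
    # main_loop is in checkpoint -> wait4critex, waiting on the 'X'-flagged thread
    "checkpoint": "critical_section_thread",
    # that thread sits in ra_terminate_req, waiting for its request to be released
    "critical_section_thread": "readahead_thread",
    # the only readahead thread is busy spawning a helper and is stuck in txalloc
    "readahead_thread": "txalloc",
    # txalloc admits no new transactions until the new primary owns the log stream
    "txalloc": "transition_to_primary",
    # the transition cannot proceed without the next checkpoint
    "transition_to_primary": "checkpoint",
}


def find_cycle(graph, start):
    """Follow single wait-for edges from start; return the cycle if one closes."""
    path = []
    seen = {}  # node -> index in path
    node = start
    while node in graph:
        if node in seen:
            # We came back to a node already on the path: that suffix is the cycle.
            return path[seen[node]:]
        seen[node] = len(path)
        path.append(node)
        node = graph[node]
    return None  # the chain ended at a node that waits for nothing


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "checkpoint")
    if cycle:
        print(" -> ".join(cycle + [cycle[0]]))
```

Under this reading, any one edge being released (for instance by a second readahead thread that could serve the abandoned request) would break the cycle, which is consistent with the hang requiring the blocked readahead thread to be the only one of its kind.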
    

Local fix

  • Bring down and restart either the old or the new primary.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Informix users using MACH11 SDS                              *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * During transition from SDS to primary role, a readahead      *
    * thread might hang in the following stack and block the       *
    * transition:                                                  *
    *                                                              *
    *   yield_processor_mvp                                        *
    *   mt_yield                                                   *
    *   txalloc                                                    *
    *   init_rstcb                                                 *
    *   rstcb_alloc                                                *
    *   fork_be_session                                            *
    *   add_readahead_thread                                       *
    *   readahead_daemon                                           *
    *   th_init_initgls                                            *
    *   startup                                                    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Update to IBM Informix Server 12.10.xC8                      *
    ****************************************************************
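
To check a collected stack dump for the stack shown in the problem description, a small script can look for the same frame sequence. The Python sketch below is a hedged example: it assumes the dump is plain text shaped like the "Stack for thread:" sections in this APAR, and the helper names (parse_stacks, has_signature, find_blocked_readahead) are hypothetical, not part of any Informix tool.

```python
import re

# Frames, innermost first, copied from the stack in the problem description.
HANG_SIGNATURE = [
    "txalloc",
    "init_rstcb",
    "rstcb_alloc",
    "fork_be_session",
    "add_readahead_thread",
]

HEADER = re.compile(r"^\s*Stack for thread:\s*(\d+)\s+(\S+)")
FIELD = re.compile(r"^\s*\w+:\s+\S+")          # e.g. "base: 0x...", "vp: 10"
FRAME = re.compile(r"^\s*([A-Za-z_]\w*)\s*$")  # a bare function name


def parse_stacks(text):
    """Return {(tid, name): [frames]} for text shaped like the dumps in this APAR."""
    stacks = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        header = HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            stacks[current] = []
            continue
        frame = FRAME.match(line)
        if frame:
            if current is not None:
                stacks[current].append(frame.group(1))
            continue
        if not FIELD.match(line):
            # Free text such as "..." ends the current stack section.
            current = None
    return stacks


def has_signature(frames, signature=HANG_SIGNATURE):
    """True if the signature appears as a contiguous run of frames."""
    n = len(signature)
    return any(frames[i:i + n] == signature for i in range(len(frames) - n + 1))


def find_blocked_readahead(text):
    """List (tid, name) of threads whose stack contains the hang signature."""
    return [key for key, frames in parse_stacks(text).items() if has_signature(frames)]
```

A match only shows that a thread is currently inside txalloc while adding a readahead thread; whether the server is actually hung still has to be judged from the other symptoms listed in the error description.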
    

Problem conclusion

  • Problem Fixed In IBM Informix Server 12.10.xC8
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT16755

  • Reported component name

    INFORMIX SERVER

  • Reported component ID

    5725A3900

  • Reported release

    C10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-08-24

  • Closed date

    2016-12-09

  • Last modified date

    2016-12-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFORMIX SERVER

  • Fixed component ID

    5725A3900

Applicable component levels

  • RC10 PSY

       UP


Document Information

Modified date:
09 December 2016