IBM Support

IT31694: ON WINDOWS, LOSING TRACK OF A CPU VP'S NUM_READY_THREADS CAN BURN 100% CPU CYCLES ON OTHERWISE IDLE SYSTEM

APAR status

  • Closed as program error.

Error description

  • On a seemingly idle Windows IDS server, a CPU VP can consume
    100% of a CPU.
    
    For instance, on a 12-CPU Windows IDS 12.10.TC11 server, we
    obtained stacks for the CPU VPs from a memory dump.
    
    The stacks for 1cpu, 8cpu, 9cpu, 10cpu, 11cpu, 12cpu, 13cpu,
    14cpu, 16cpu, 17cpu, and 18cpu were:
    
        oninit.exe!net_aio_poll(void *hPort, int timeout) Line 173
        oninit.exe!NT_P(_VP *v) Line 1640
        oninit.exe!NT_idle_loop(_VP*i_vp, unsigned int bz, int wakeup) Line 5210
        oninit.exe!NT_idle_processor() Line 5124
        oninit.exe!startup() Line 177
    
    The stack for 15cpu is slightly different:
    
        oninit.exe!net_aio_poll(void *hPort, int timeout) Line 173
        oninit.exe!NT_yield_processor_mvp() Line 18070
        oninit.exe!NT_idle_processor() Line 5107
        oninit.exe!startup() Line 177
    
    In Process Explorer, we could see that the oninit.exe thread
    for 15cpu was running at 100% CPU.
    
    The underlying issue is that the vp struct associated with 15cpu
    has a positive num_ready_threads value even though there are no
    threads in its ready queue(s).  This keeps the idle VP from ever
    sleeping, because it constantly believes a thread is ready to
    run when none is.
    
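The mismatch can be illustrated with a toy model. The sketch below is not Informix source code: ToyVP, may_sleep, is_inconsistent, and idle_spins are hypothetical names, and the assumption that the idle loop consults only the counter (never the queue contents) is inferred from the description above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyVP:
    """Hypothetical stand-in for a CPU VP; not the real Informix vp struct."""
    name: str
    num_ready_threads: int = 0                          # counter the idle loop consults
    ready_queue: deque = field(default_factory=deque)   # threads actually runnable

    def may_sleep(self) -> bool:
        # Assumption: the idle loop trusts the counter, not the queue contents.
        return self.num_ready_threads == 0

    def is_inconsistent(self) -> bool:
        # The counter claims work exists, but nothing is queued.
        return self.num_ready_threads > 0 and not self.ready_queue


def idle_spins(vp: ToyVP, max_iterations: int) -> int:
    """Count idle-loop passes that find no work yet are not allowed to sleep."""
    spins = 0
    while spins < max_iterations:
        if vp.ready_queue:
            # Real work: run the thread and keep the counter in step.
            vp.ready_queue.popleft()
            vp.num_ready_threads -= 1
            continue
        if vp.may_sleep():
            break
        spins += 1  # nothing to run, but the stale counter forbids sleeping
    return spins
```

In this model, idle_spins(ToyVP("14cpu"), 1000) returns 0 because the VP goes straight to sleep, while idle_spins(ToyVP("15cpu", num_ready_threads=1), 1000) returns 1000: the loop never reaches the sleep path. A check such as is_inconsistent() is one way a defensive test for this state could be expressed.
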
    To identify this condition on an idle system, first observe the
    100% CPU usage, then check the "onstat -g sch" output.  The CPU
    VP that is consuming the CPU cycles shows a positive number in
    the Q-ln column while "onstat -g rea" shows nothing in the ready
    queue.  For instance, in the "onstat -g sch" output below, the
    Q-ln column for 15cpu has the value 1:
    
    Thread Migration Statistics:
     vp    pid       class      steal-at steal-sc idlvp-at idlvp-sc inl-polls Q-ln
     1     9568      cpu        0        0        0        0        0         0
     2     8184      adm        0        0        0        0        0         0
     3     8156      lio        0        0        0        0        0         0
     4     7212      pio        0        0        0        0        0         0
     5     7156      aio        0        0        0        0        0         0
     6     11088     msc        0        0        0        0        0         0
     7     816       fifo       0        0        0        0        0         0
     8     7476      cpu        0        0        0        0        0         0
     9     10904     cpu        0        0        0        0        0         0
     10    10940     cpu        0        0        0        0        0         0
     11    10936     cpu        0        0        0        0        0         0
     12    11096     cpu        0        0        0        0        0         0
     13    8064      cpu        0        0        0        0        0         0
     14    6256      cpu        0        0        0        0        0         0
     15    5996      cpu        0        0        0        0        0         1
     16    5984      cpu        0        0        0        0        0         0
     17    6928      cpu        0        0        0        0        0         0
     18    8056      cpu        0        0        0        0        0         0
     19    924       soc        0        0        0        0        0         0
     20    920       soc        0        0        0        0        0         0
     21    10960     soc        0        0        0        0        0         0
     22    10932     soc        0        0        0        0        0         0
    
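The Q-ln check described above can be scripted. The following is a minimal sketch that assumes the "Thread Migration Statistics" section has the nine-column layout shown in this APAR; suspect_cpu_vps and SAMPLE are hypothetical names, and the script works on captured "onstat -g sch" text rather than calling onstat itself. It tolerates rows that have been wrapped across lines.

```python
def suspect_cpu_vps(sch_output: str) -> list[int]:
    """Return VP numbers of cpu-class VPs whose Q-ln value is positive."""
    lines = sch_output.splitlines()
    start = next(
        (i for i, line in enumerate(lines)
         if line.strip().startswith("Thread Migration Statistics")),
        None,
    )
    if start is None:
        return []

    # Collect whitespace-separated tokens until the first blank line, so
    # that wrapped rows are handled the same way as single-line rows.
    tokens = []
    for line in lines[start + 1:]:
        if not line.strip():
            break
        tokens.extend(line.split())

    if "Q-ln" not in tokens:
        return []
    ncols = tokens.index("Q-ln") + 1          # Q-ln is the last column
    header, body = tokens[:ncols], tokens[ncols:]
    if not {"vp", "class"} <= set(header):
        return []

    suspects = []
    for i in range(0, len(body) - ncols + 1, ncols):
        row = dict(zip(header, body[i:i + ncols]))
        if row["class"] == "cpu" and int(row["Q-ln"]) > 0:
            suspects.append(int(row["vp"]))
    return suspects


# Three rows copied from the output shown in this APAR.
SAMPLE = """Thread Migration Statistics:
 vp    pid       class      steal-at steal-sc idlvp-at idlvp-sc inl-polls Q-ln
 14    6256      cpu        0        0        0        0        0         0
 15    5996      cpu        0        0        0        0        0         1
 19    924       soc        0        0        0        0        0         0
"""

if __name__ == "__main__":
    print(suspect_cpu_vps(SAMPLE))  # prints [15]
```

A positive Q-ln on its own only means threads are queued, which is normal on a busy server. The VP is a candidate for this defect only if the system is otherwise idle and "onstat -g rea" shows no ready threads at the same time.
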
    This defect is being entered for defensive purposes.  The server
    should be able to identify this case and correct it, returning
    the idle CPU VP to normal behavior.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of Informix Server prior to 12.10.xC14 and 14.10.xC4.  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    

Problem conclusion

  • Problem fixed in Informix Server versions 12.10.xC14 and
    14.10.xC4.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT31694

  • Reported component name

    INFORMIX SERVER

  • Reported component ID

    5725A3900

  • Reported release

    C10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-01-29

  • Closed date

    2020-02-24

  • Last modified date

    2020-02-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFORMIX SERVER

  • Fixed component ID

    5725A3900

Applicable component levels

Document Information

Modified date:
24 February 2020