IBM Support

IT32060: CDRLSNQ_WAIT IS EXPENSIVE WHEN MANY SESSIONS RUNNING TRANSACTIONS ON PRIMARY OF HDR PAIR WITH HDR_TXN_SCOPE NEAR_SYNC

APAR status

  • Closed as program error.

Error description

  • A customer running 12.10.FC7 on Linux x86_64 reported that a
    benchmark achieved far fewer transactions per second when
    running ~6000 clients against one server with 52 CPU VPs than
    when splitting the clients across two servers with 26 CPU VPs
    each.  Each server was the primary of an HDR pair, with
    DRINTERVAL set to 0 and HDR_TXN_SCOPE set to NEAR_SYNC.
    
    The 52 CPU VP server showed hundreds of threads sleeping
    forever with stacks like:
    
    0x00000000013e79ef (oninit) yield_processor_mvp
    0x00000000013eb1d7 (oninit) mt_yield
    0x000000000106550d (oninit) cdrLSNQ_Wait
    0x00000000011f55af (oninit) proxyWaitForAllNodesFromPrimary
    0x0000000000d3401f (oninit) rscommit
    0x000000000078788e (oninit) sqiscommit
    0x000000000074c0fa (oninit) sqcommit
    0x00000000006bec06 (oninit) aud_sqcommit
    0x0000000000a03f44 (oninit) sql_commit
    0x0000000000a040c9 (oninit) sq_commit
    0x0000000000ad3653 (oninit) sqmain
    0x00000000014f7756 (oninit) spawn_thread
    0x00000000013c1790 (oninit) th_init_initgls
    0x0000000001428327 (oninit) startup
    
    This is a common stack for a thread running in a NEAR_SYNC HDR
    environment when the thread is waiting for the secondary to
    acknowledge that the commit log record reached an HDR buffer.
    
    While stacks like these are expected for sqlexec threads in
    this HDR environment, a multitude of them might indicate
    another, unexpected issue.  In stress testing designed to mimic
    comparable work, profiling showed that the cdrLSNQ_Wait
    function was often among the 10, or at least the 20, most
    expensive functions.  This function maintains a list of
    waiters that can grow quite long, and the list is traversed to
    find a particular waiter, often requiring several hundred
    iterations.  The more threads that run transactions and enter
    this function, the longer the list becomes and the more
    expensive each traversal is, which explains why the customer
    saw better throughput after dividing the clients across two
    instances.
    
    So the stack above, although it shows a sleeping thread, can
    also indicate a performance issue when hundreds of threads are
    in this state.
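
The scaling effect described above can be illustrated with a minimal Python sketch. This is not Informix source code: the function name total_scan_steps, the waiter ids, and the random removal order are all assumptions made for illustration. It only models a single list in which every lookup of a particular waiter is a linear scan, and compares the total scan work for one list of 600 waiters against two independent lists of 300 waiters each.

```python
import random


def total_scan_steps(num_waiters, seed=0):
    """Count list nodes visited while locating and removing every waiter.

    Hypothetical model: all waiters sit in one list, and each lookup of a
    particular waiter walks the list from the head until it is found.
    """
    rng = random.Random(seed)
    waiters = list(range(num_waiters))   # hypothetical waiter ids
    removal_order = waiters[:]
    rng.shuffle(removal_order)           # acks arrive in no particular order

    steps = 0
    for target in removal_order:
        for position, waiter in enumerate(waiters):
            steps += 1                   # one node visited
            if waiter == target:
                del waiters[position]    # waiter found, take it off the list
                break
    return steps


# One instance holding all waiters versus two instances with half each.
one_instance = total_scan_steps(600)
two_instances = total_scan_steps(300, seed=1) + total_scan_steps(300, seed=2)

print("one list of 600 waiters :", one_instance, "nodes visited")
print("two lists of 300 waiters:", two_instances, "nodes visited")
print("ratio                   :", round(one_instance / two_instances, 2))
```

Under these assumptions the total scan work grows roughly with the square of the list length, so halving the number of waiters per list roughly halves the combined work, which is consistent with the better throughput the customer saw on two instances.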
    

Local fix

  • Set HDR_TXN_SCOPE to ASYNC.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of Informix Server prior to 12.10.xC15 and 14.10.xC4.  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Informix Server 12.10.xC15 (when available) or    *
    * 14.10.xC4.                                                   *
    ****************************************************************
    

Problem conclusion

  • Fixed in Informix Server 12.10.xC15 and 14.10.xC4.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT32060

  • Reported component name

    INFORMIX SERVER

  • Reported component ID

    5725A3900

  • Reported release

    C10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-03-03

  • Closed date

    2020-12-10

  • Last modified date

    2020-12-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFORMIX SERVER

  • Fixed component ID

    5725A3900

Applicable component levels

  • Line of Business: Data and AI (LOB10)

  • Business Unit: Cloud & Data Platform (BU053)

  • Product: Informix Servers (SSGU8G)

  • Platform: Platform Independent (PF025)

  • Version: C10

Document Information

Modified date:
11 December 2020