IBM Support

IT33316: SLOW LOG TRANSFER FROM PRIMARY TO RSS NODE DUE TO BAD SMX PIPE

APAR status

  • Closed as program error.

Error description

  • A problem can occur when a primary and an RSS node connect that
    leaves one or more bad SMX pipes, which cannot be used to
    transfer data between the servers.  Because pipes are picked
    randomly, each time a bad pipe is picked there is a delay of
    approximately 30 seconds before the server picks a new pipe and
    tries to send the data again.  This greatly impacts performance.
    
    In onstat -g smx output, a bad pipe is listed for the node but
    shows no data transmitted on it.  The output looks something
    like this:
    
      Peer server name: Rssnode1
      SMX connection address: 0x700000116143480
      Encryption status: Disabled
      Total bytes sent: 0
      Total bytes received: 0
      Total buffers sent: 0
      Total buffers received: 0
      Total write calls: 0
      Total read calls: 0
      Total retries for write call: 0
      Data compression level: 0
      Data sent: compressed 0 bytes by  0%
      Data received: compressed 0 bytes by  0%
      SMX connection address: 0x70000090f1f3a30
      Encryption status: Disabled
      Total bytes sent: 7110578
      Total bytes received: 76965
      Total buffers sent: 11990
      Total buffers received: 4853
      Total write calls: 3545
      Total read calls: 4853
      Total retries for write call: 0
      Data compression level: 1
      Data sent: compressed 37158220 bytes by 80%
      Data received: compressed 92260 bytes by 16%
    
    In the onstat -g smx output above, the first SMX connection is
    bad and is not being used, while the second appears to be
    functioning.
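
The check described above can be sketched in Python. This is a minimal sketch, not an IBM tool: the function names `parse_smx_connections` and `idle_pipes` are hypothetical, and the parser assumes the output has the same "label: value" layout as the sample in this APAR. A connection with zero counters is only a candidate; a pipe that was just established could also show zeros.

```python
# Sample text copied (abridged) from the onstat -g smx output in this APAR.
SAMPLE = """\
Peer server name: Rssnode1
SMX connection address: 0x700000116143480
Encryption status: Disabled
Total bytes sent: 0
Total bytes received: 0
SMX connection address: 0x70000090f1f3a30
Encryption status: Disabled
Total bytes sent: 7110578
Total bytes received: 76965
"""

COUNTERS = ("Total bytes sent", "Total bytes received")


def parse_smx_connections(text):
    """Return one dict per 'SMX connection address' block (hypothetical helper)."""
    conns = []
    peer = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "label: value" line
        key, value = key.strip(), value.strip()
        if key == "Peer server name":
            peer = value
        elif key == "SMX connection address":
            conns.append({"peer": peer, "address": value})
        elif conns and key in COUNTERS and value.isdigit():
            conns[-1][key] = int(value)
    return conns


def idle_pipes(conns):
    """Connections whose sent and received byte counters are both zero."""
    return [c for c in conns if all(c.get(k) == 0 for k in COUNTERS)]


if __name__ == "__main__":
    for conn in idle_pipes(parse_smx_connections(SAMPLE)):
        print("candidate bad pipe:", conn["peer"], conn["address"])
```

Running the check more than once, some time apart, helps separate a pipe that is permanently idle from one that simply has not been picked yet.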
    
    Additionally, each SMX pipe connection between servers involves
    spawning multiple threads: smxsnd <servername>, smxrcv
    <servername>, and smxRecvSnd.  For the bad pipe, setup has not
    finished, so the onstat -g cpu and onstat -g ath output shows
    only the smxsnd <servername>, smx, and smxRecvSnd threads.  The
    smx thread has not yet renamed itself to smxrcv <servername>.
    Those threads have the following stacks:
    
    Stack for thread: 370 smxsnd <servername>
    
    0x0000000100062b94 (oninit)yield_processor_mvp
    0x000000010006e25c (oninit)mt_wait
    0x000000010067aba0 (oninit)smx_send_thread
    0x0000000100fea500 (oninit)th_init_initgls
    0x00000001017c2860 (oninit)startup
    
    Stack for thread: 369 smxRecvSnd
    
    0x0000000100062b94 (oninit)yield_processor_mvp
    0x0000000100069b6c (oninit)mt_yield
    0x00000001002b6b8c (oninit)cdrTimerWait
    0x000000010067c358 (oninit)smx_send_from_recv_thread
    0x0000000100fea500 (oninit)th_init_initgls
    0x00000001017c2860 (oninit)startup
    
    Stack for thread: 363 smx
    
    0x0000000100062b94 (oninit)yield_processor_mvp
    0x000000010006e25c (oninit)mt_wait
    0x0000000100680964 (oninit)smx_thread
    0x0000000100c62718 (oninit)listen_verify
    0x0000000100c6133c (oninit)spawn_thread
    0x0000000100fea500 (oninit)th_init_initgls
    0x00000001017c2860 (oninit)startup
    
    Thread states in onstat -g ath:
    
     363      700000116120028  70000011356f728  1    cond wait  smx pipe1   11cpu         smx
     369      7000001161ff850  700000113572b78  1    sleeping secs: 1       31cpu         smxRecvSnd
     370      700000114a56148  700000113573430  3    cond wait  smx pipe1   11cpu         smxsnd hdr_ausp3
    
    Next, examine the condition "smx pipe1".  Several conditions can
    share this name, so use the thread IDs of the waiters to find
    the correct one:
    
    Conditions with waiters:
    cid      addr             name               waiter   waittime
    ...
    1568     7000001161e6648  smx pipe1          363      1048602
                                                 370      1048602
                                                 321      1048602
    
    A different thread, thread 321, is also waiting on this
    "smx pipe1" condition.  That thread and its stack are as
    follows:
    
    Stack for thread: 321 Notification
    
    0x0000000100062b94 (oninit)yield_processor_mvp
    0x000000010006e25c (oninit)mt_wait
    0x0000000100674524 (oninit)smx_connect
    0x00000001006b0de8 (oninit)cloneWakeupRSSRetry
    0x00000001002bbf80 (oninit)cdrExstmt
    0x00000001006b0fe0 (oninit)cloneRSSRetry
    0x00000001006927b0 (oninit)cloneNotificationThread
    0x0000000100fea500 (oninit)th_init_initgls
    0x00000001017c2860 (oninit)startup
    
    The presence of this "Notification" thread waiting in
    smx_connect is likely a key indicator when trying to determine
    whether this problem has been hit.
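
The thread and condition signature described above can be checked with a short Python sketch. It rests on assumptions: the helper names are hypothetical, the "Conditions with waiters" text is assumed to have the layout shown in this APAR (a full row for the first waiter, then two-column continuation rows), and the thread ID to name map is supplied by hand from the onstat -g ath output rather than parsed.

```python
# Text copied from the "Conditions with waiters" output in this APAR.
SAMPLE = """\
Conditions with waiters:
cid      addr             name               waiter   waittime
1568     7000001161e6648  smx pipe1          363      1048602
                                             370      1048602
                                             321      1048602
"""

# Thread names taken from the onstat -g ath and stack output in this APAR.
THREAD_NAMES = {363: "smx", 370: "smxsnd hdr_ausp3", 321: "Notification"}


def parse_condition_waiters(text):
    """Return a list of {'cid', 'name', 'waiters'} dicts (hypothetical helper)."""
    conditions = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) >= 5 and parts[0].isdigit() and parts[-2].isdigit():
            # Full row: cid, addr, name (one or more words), waiter, waittime.
            conditions.append({
                "cid": int(parts[0]),
                "name": " ".join(parts[2:-2]),
                "waiters": [int(parts[-2])],
            })
        elif len(parts) == 2 and parts[0].isdigit() and conditions:
            # Continuation row: another waiter on the previous condition.
            conditions[-1]["waiters"].append(int(parts[0]))
    return conditions


def suspect_conditions(conditions, thread_names):
    """cids of 'smx pipe' conditions waited on by both an un-renamed
    'smx' thread and a 'Notification' thread."""
    suspects = []
    for cond in conditions:
        if not cond["name"].startswith("smx pipe"):
            continue
        names = {thread_names.get(tid, "") for tid in cond["waiters"]}
        if "smx" in names and "Notification" in names:
            suspects.append(cond["cid"])
    return suspects


if __name__ == "__main__":
    print(suspect_conditions(parse_condition_waiters(SAMPLE), THREAD_NAMES))
```

A match is only a hint that this APAR applies; the stacks shown above remain the stronger evidence.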
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of Informix Server prior to 12.10.xC15 and 14.10.xC5.  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Informix Server 12.10.xC15 (when available) or    *
    * 14.10.xC5.                                                   *
    ****************************************************************
    

Problem conclusion

  • Fixed in Informix Server 12.10.xC15 and 14.10.xC5.
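
As a rough aid for checking a server against the fix levels above, the comparison can be sketched in Python. The function name `needs_upgrade` is hypothetical, and the sketch assumes version strings of the form `12.10.FC14` (a platform letter in place of the `x`, optionally followed by a suffix such as `W1`). Release families other than 12.10 and 14.10 are reported as unknown, because the APAR text does not state how they are affected.

```python
import re

# Fix levels stated in this APAR: 12.10.xC15 and 14.10.xC5.
FIXED_IN = {(12, 10): 15, (14, 10): 5}

# Assumed format: <major>.<minor>.<platform letter>C<level>[suffix]
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.[A-Za-z]C(\d+)\w*")


def needs_upgrade(version):
    """True if below the fix level, False if at or above it,
    None if the release family is not named in the APAR."""
    match = VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, level = (int(g) for g in match.groups())
    fixed_level = FIXED_IN.get((major, minor))
    if fixed_level is None:
        return None
    return level < fixed_level


if __name__ == "__main__":
    for v in ("12.10.FC14", "14.10.FC5", "11.70.FC9"):
        print(v, needs_upgrade(v))
```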
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT33316

  • Reported component name

    INFORMIX SERVER

  • Reported component ID

    5725A3900

  • Reported release

    C10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-06-24

  • Closed date

    2021-01-06

  • Last modified date

    2021-01-06

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFORMIX SERVER

  • Fixed component ID

    5725A3900

Applicable component levels

  • Line of Business

    Data and AI (LOB10)

  • Business Unit

    Software (BU029)

  • Product

    Informix Servers (SSGU8G)

  • Platform

    Platform Independent (PF025)

  • Version

    C10

Document Information

Modified date:
11 January 2021