IBM Support

PI85618: Segfault when high traffic coming to the Intelligent Management Enabled plugin and a Liberty member is stopped

APAR status

  • Closed as program error.

Error description

  • A segfault occurs when high traffic is arriving at the
    Intelligent Management Enabled plugin and a Liberty member is
    stopped.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server Intelligent Managed Plugin           *
    ****************************************************************
    * PROBLEM DESCRIPTION: Segfault when high traffic coming to    *
    *                      the Intelligent Management Enabled      *
    *                      plugin and a Liberty member is stopped  *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    Segfault 11 occurs when high traffic is arriving at the
    Intelligent Management Enabled plugin at the same time that a
    Liberty collective member is stopped.  The problem occurs with
    clusters that have only one member.  When a request arrives, the
    plugin associates it with a cluster and saves that cluster in
    the request context.  If the Liberty member is stopped, the
    active request cannot be processed because the member is no
    longer available.  The plugin then retries the request, and to
    do so it must find another cluster that includes the application
    to receive the request.  However, the context is not updated
    with the new cluster on the retry.  This results in an attempt
    to free an already freed request when the context is later
    freed.  Here is the call stack from the segfault to assist in
    diagnosis of the problem:
    
    #0  0x00007fbcaf0591d7 in raise () from /lib64/libc.so.6
    #1  0x00007fbcaf05a8c8 in abort () from /lib64/libc.so.6
    #2  0x00007fbcaf098f07 in __libc_message () from /lib64/libc.so.6
    #3  0x00007fbcaf0a0503 in _int_free () from /lib64/libc.so.6
    #4  0x00007fbcad3d34cd in odrFree (p=0x7fbc9c0f7ac0, file=0x7fbcad3fe360 "/home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/odrlib/src/odrTargetSelector.c", line=999)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/odrlib/src/odrLibUtil.c:105
    #5  0x00007fbcad3dcbda in clusterDelete (cluster=0x7fbc9c10fb80)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/odrlib/src/odrTargetSelector.c:999
    #6  0x00007fbcad3dec5a in tsTargetInfoDecrementRefCnt (ctx=0x70fc20)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/odrlib/src/odrTargetSelector.c:1847
    #7  0x00007fbcad3c927f in clean (ctx=0x70fc20)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/odrlib/src/odrHttpContext.c:238
    #8  0x00007fbcad3ccbbf in odrHttpContextRelease (ctx=0x70fc20)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/odrlib/src/odrHttpContext.c:1449
    #9  0x00007fbcad96af39 in odrHandleRequest (request=0x7fbc94ff0ba0)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/src/common/odr/lib_odr.c:279
    #10 0x00007fbcad97b8f2 in websphereHandleRequest (reqInfo=0x7fbc68006628)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/src/common/ws_common.c:4753
    #11 0x00007fbcad960c7f in as_handler (req=0x7fbc68004978)
        at /home/ibmadmin/odrbuildx/NATV/ws/code/plugins.http/src/apache_22/mod_was_ap22_http.c:1625
    #12 0x000000000042844e in ihs_run_handler ()
    #13 0x000000000043a47c in ap_invoke_handler ()
    #14 0x0000000000444ac8 in ap_process_request ()
    #15 0x0000000000441dfc in ap_process_http_connection ()
    #16 0x000000000043e152 in ap_run_process_connection ()
    #17 0x000000000044993f in worker_thread ()
    #18 0x00007fbcaf83cdc5 in start_thread () from /lib64/libpthread.so.0
    #19 0x00007fbcaf11b73d in clone () from /lib64/libc.so.6
    

Problem conclusion

  • Code was corrected to properly update the cluster associated
    with the request when a retry is necessary because a member
    became unavailable and changed the target cluster.
    
    The fix for this APAR is currently targeted for inclusion in fix
    pack 9.0.0.5.  Please refer to the Recommended Updates page for
    delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
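
The failure mode and the correction described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the types Cluster and RequestContext, the functions cluster_acquire, cluster_release, retry_request, context_release and simulate_retry, and the simple reference-counting scheme are all hypothetical and are not taken from the plugin source. Instead of calling free(), the model counts how many times a cluster would have been freed, so that a count of 2 stands in for the double free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the plugin's internal structures. */
typedef struct Cluster {
    int ref_count;   /* in-flight requests that hold this cluster */
    int free_calls;  /* how many times the cluster would have been freed */
} Cluster;

typedef struct RequestContext {
    Cluster *cluster;  /* cluster saved in the context for this request */
} RequestContext;

static void cluster_acquire(Cluster *c) { c->ref_count++; }

/* Drop one reference and count a "free" whenever no references remain.
 * A real implementation would call free() here, so a second count for
 * the same cluster corresponds to a double free. */
static void cluster_release(Cluster *c) {
    c->ref_count--;
    if (c->ref_count <= 0) {
        c->free_calls++;
    }
}

/* Retry against a different cluster after the original member stopped. */
static void retry_request(RequestContext *ctx, Cluster *next, int update_context) {
    assert(ctx != NULL && next != NULL);
    cluster_release(ctx->cluster);   /* original target is no longer usable */
    cluster_acquire(next);
    if (update_context) {
        ctx->cluster = next;         /* the behavior the fix describes */
    }
    /* Otherwise ctx->cluster still points at the released cluster. */
}

/* Release whatever cluster the context currently refers to. */
static void context_release(RequestContext *ctx) {
    assert(ctx != NULL);
    if (ctx->cluster != NULL) {
        cluster_release(ctx->cluster);
        ctx->cluster = NULL;
    }
}

/* Run one request through a retry and report how many times the
 * original cluster was "freed". */
int simulate_retry(int update_context) {
    Cluster original = {0, 0};
    Cluster fallback = {0, 0};
    RequestContext ctx = { &original };

    cluster_acquire(&original);
    retry_request(&ctx, &fallback, update_context);
    context_release(&ctx);
    return original.free_calls;
}
```

In this model, simulate_retry(0) leaves the context pointing at the original cluster, so the original cluster is released a second time when the context is released. simulate_retry(1) updates the context during the retry, so each cluster is released exactly once.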
    

Temporary fix

  • No workaround is available other than limiting traffic while
    stopping Liberty members in the collective or, alternatively,
    ensuring that only some (not all) members of a cluster are
    stopped at the same time while requests may be in flight, so
    that other members in the same cluster can service the request
    and thereby avoid the issue.
    

Comments

APAR Information

  • APAR number

    PI85618

  • Reported component name

    WEBS APP SERV N

  • Reported component ID

    5724H8800

  • Reported release

    900

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-08-08

  • Closed date

    2017-08-16

  • Last modified date

    2018-08-20

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBS APP SERV N

  • Fixed component ID

    5724H8800

Applicable component levels

  • R900 PSY

       UP

Document Information

Modified date:
04 May 2022