IBM Support

IZ74801: WAS PROCESS WITH NATIVE BINDINGS CONNECTIONS TO WEBSPHERE MQ CAN CRASH ON LINUX WITH A SIGSEGV UNDER THE LIBC MALLOC FUNCTION


APAR status

  • Closed as program error.

Error description

  • You are running WebSphere MQ on Linux and experience
    one or more of the following: an unexplained core dump, an MQ
    FFST that reports a SIGSEGV, or a hang of a queue manager
    process.  These symptoms apply to a wide range of problems, but
    a review of the failure documentation may show some errors that
    are specific to this problem.
    
    Analysis of the thread stack of a core file generated at the
    time that the issue occurred may show the following for the
    failing thread:
    
    #0   __kernel_vsyscall ()
    #1   __lll_mutex_lock_wait () from /lib/libc.so.6
    #2   _L_lock_113 () from /lib/libc.so.6
    #3   ptmalloc_lock_all () from /lib/libc.so.6
    #4   fork () from /lib/libc.so.6
    #5   fork () from /lib/libpthread.so.0
    #6   j9dump_create () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
    #7   doSystemDump () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
    #8   protectedDumpFunction () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
    #9   j9sig_protect () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
    #10  runDumpFunction () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
    #11  triggerDumpAgents () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
    #12  dumpCrashData () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #13  j9sig_protect () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
    #14  structuredSignalHandler () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #15  masterSynchSignalHandler () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
    #16  xehInterpretSavedSigaction () from
    /opt/mqm/lib/libmqmcs_r.so
    #17  xehExceptionHandler () from /opt/mqm/lib/libmqmcs_r.so
    #18 <signal handler called>
    #19  malloc_consolidate () from /lib/libc.so.6
    #20  _int_malloc () from /lib/libc.so.6
    #21  malloc () from /lib/libc.so.6
    #22  j9mem_allocate_memory_basic () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
    #23  j9mem_allocate_memory_callSite () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
    #24  allocateJavaStack () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #25  allocateVMThread () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #26  startJavaThread () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #27  java_lang_Thread_startImpl () from
    /opt/WebSphere61/AppServer/java/jre/bin/libjclscar_23.so
    #28  ?? ()
    #29  ?? ()
    #30  ?? ()
    #31  ?? ()
    #32  ?? ()
    #33  allocateVMThread () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #34  javaProtectedThreadProc () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #35  j9sig_protect () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
    #36  javaThreadProc () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    #37  thread_wrapper () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9thr23.so
    #38  start_thread () from /lib/libpthread.so.0
    #39  clone () from /lib/libc.so.6
    
    This stack came from an environment where MQ was being called
    by a JVM, but the important part of the stack is the
    malloc_consolidate call (frame #19), as this is the stack frame
    which causes the SIGSEGV.
    
    Another example of the failure can be seen here.  Note that in
    this example, both MQ and Java signal handlers were disabled:
    
    #0  malloc_consolidate (av=0x8b300010) at malloc.c:4576
    #1  0xb7dfc769 in _int_malloc (av=0x8b300048, bytes=4372) at
    malloc.c:3975
    #2  0xb7dfdce6 in *__GI___libc_malloc (bytes=4372) at
    malloc.c:3393
    #3  0x8e787cce in xihQueryThreadEntry () from
    /opt/mqm/lib/libmqmcs_r.so
    #4  0x8e7501e4 in xcsInitialize () from
    /opt/mqm/lib/libmqmcs_r.so
    #5  0x8e909e6b in zstMQCONNX () from /opt/mqm/lib/libmqz_r.so
    #6  0x8f32e8f7 in MQCONNX () from /opt/mqm/lib/libmqm_r.so
    #7  0x8e97e7eb in Java_com_ibm_mq_server_MQSESSION__1MQCONNX ()
    from /opt/mqm/java/lib/libmqjbnd05.so
    #8  0xb7cf328e in VMprJavaSendNative () from
    
    The core file may show one or more threads in an xcsWaitThread
    MQ call, for example:
    
    #0  __kernel_vsyscall ()
    #1  __lll_mutex_lock_wait () from /lib/libpthread.so.0
    #2  pthread_cond_timedwait@@GLIBC_2.3.2 () from
    /lib/libpthread.so.0
    #3  pthread_cond_timedwait@GLIBC_2.0 () from
    /lib/libpthread.so.0
    #4  xcsWaitThread () from /opt/mqm/lib/libmqmcs_r.so
    #5  xtmStopTimerThread () from /opt/mqm/lib/libmqmcs_r.so
    #6  xcsTerminate () from /opt/mqm/lib/libmqmcs_r.so
    #7  xcsReleaseThread () from /opt/mqm/lib/libmqmcs_r.so
    #8  zutReleaseSharedPCD () from /opt/mqm/lib/libmqz_r.so
    #9  zstMQDISC () from /opt/mqm/lib/libmqz_r.so
    #10 MQDISC () from /opt/mqm/lib/libmqm_r.so
    #11 Java_com_ibm_mq_server_MQSESSION__1MQDISC () from
    /opt/mqm/java/lib/libmqjbnd05.so
    #12 VMprJavaSendNative () from
    /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
    
    Further examination of the core file will show that there are no
    threads containing a stack frame named xtmTimerThread.
    
    The following symptoms were not directly observed for this
    problem, but either could be indicative of it:
    1. An FDC may be generated with Probe Id XC130003 in
    xcsWaitThread.
    2. An MQ thread may appear to hang in a call to
    pthread_cond_timedwait.
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Whether a system is potentially affected by this problem depends
    on its implementation of the pthread_cond_destroy() API -
    specifically if EBUSY is implemented as documented by the POSIX
    standard.  This will vary both between platform and C library
    release and so it is not feasible to give a definitive list of
    at risk systems.
    
    Platforms affected:
    Linux (Power),Linux (s390x),Linux (x86),Linux (x86-64),
    Linux (zSeries)
    
    ****************************************************************
    PROBLEM SUMMARY:
    The problem is caused by an incorrect expectation of how the
    pthread_cond_destroy API should behave.  When destroying a
    condition variable using the pthread_cond_destroy API, the
    WebSphere MQ code expected the API to return EBUSY if a
    pthread_cond_timedwait was currently using the condition
    variable.  This assumption is based on the specification at:

    http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cond_destroy.html
    
    The [EBUSY] and [EINVAL] error checks, if implemented
    
    [EBUSY]
    The implementation has detected an attempt to destroy the
    object referenced by cond while it is referenced (for example,
    while being used in a pthread_cond_wait() or
    pthread_cond_timedwait()) by another thread.
    
    
    It was assumed that the EBUSY check would always be
    implemented, but it is in fact an optional part of the
    specification, so not all platform implementations
    necessarily behave in this way.

    As a result of this incorrect assumption, it was possible for a
    waiting thread to try to use a condition variable that had
    already been destroyed by the thread being waited on.  This
    would happen if the thread being waited on had managed to call
    pthread_cond_destroy() before the thread calling
    pthread_cond_timedwait() had been dispatched.
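
    The ordering hazard described above can be illustrated with a
    minimal C sketch.  It is not WebSphere MQ code: the names
    struct completion, worker and run_and_wait are hypothetical and
    chosen only for illustration.  The comment in worker() marks
    where a destroy call that relies on EBUSY would be unsafe; the
    sketch itself destroys the variable from the waiting side, after
    the thread being waited on has been joined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>      /* ETIMEDOUT, which run_and_wait() may return */
#include <pthread.h>
#include <time.h>

/* Hypothetical names throughout; this is not WebSphere MQ code. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;   /* predicate, protected by lock */
};

/* The thread being waited on: marks completion and wakes the waiter. */
static void *worker(void *arg)
{
    struct completion *c = arg;

    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);

    /* Hazard: calling pthread_cond_destroy(&c->cond) here and treating
     * any result other than EBUSY as "nobody is waiting" is unsafe.
     * The waiter may not have been dispatched yet, and the EBUSY check
     * is optional in POSIX, so the call may simply succeed. */
    return NULL;
}

/* Starts the worker, waits up to timeout_sec seconds for it, and then
 * destroys the condition variable once the worker has been joined.
 * Returns 0 if completion was observed, otherwise an error number
 * such as ETIMEDOUT. */
int run_and_wait(int timeout_sec)
{
    struct completion c;
    struct timespec deadline;
    pthread_t tid;
    int rc, completed;

    assert(timeout_sec >= 0);
    c.done = 0;
    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.cond, NULL);

    rc = pthread_create(&tid, NULL, worker, &c);
    if (rc != 0) {
        pthread_cond_destroy(&c.cond);
        pthread_mutex_destroy(&c.lock);
        return rc;
    }

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&c.lock);
    rc = 0;
    while (!c.done && rc == 0)      /* predicate loop: the signal may */
        rc = pthread_cond_timedwait(&c.cond, &c.lock, &deadline);
    completed = c.done;             /* already have been sent         */
    pthread_mutex_unlock(&c.lock);

    pthread_join(tid, NULL);        /* worker can no longer touch c */
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return completed ? 0 : rc;
}
```

    A caller would invoke run_and_wait(5) and treat a return value
    of 0 as completion.  Because the predicate is checked under the
    mutex, the wait is correct whether or not the worker signals
    before the waiter is dispatched.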
    

Problem conclusion

  • The problem was resolved by changing the way that WebSphere MQ
    destroys condition variables.  Instead of relying on the
    thread being waited on to destroy the condition variable, the
    code now ensures that the variable is destroyed only when no
    other threads are using it.  By doing this, MQ avoids the race
    condition that caused the failure.
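
    One common way to guarantee that a variable is destroyed only
    when no other thread is using it is a reference count protected
    by the same mutex.  The C sketch below shows that technique under
    stated assumptions: struct shared_event and the event_* functions
    are hypothetical names, and this APAR does not say that the
    WebSphere MQ fix is implemented this way.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical structure; not the WebSphere MQ implementation. */
struct shared_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signalled;  /* predicate, protected by lock */
    int             users;      /* threads still holding a reference */
};

/* Creates an event that will be shared by `users` threads. */
struct shared_event *event_create(int users)
{
    struct shared_event *ev = malloc(sizeof *ev);

    if (ev == NULL)
        return NULL;
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signalled = 0;
    ev->users = users;
    return ev;
}

/* Marks the event as signalled and wakes every waiter. */
void event_signal(struct shared_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signalled = 1;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Blocks until the event has been signalled. */
void event_wait(struct shared_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->signalled)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Drops one reference.  Only the thread that drops the last reference
 * destroys the condition variable, so no other thread can still be
 * using it.  Returns 1 if this call destroyed the event, else 0. */
int event_release(struct shared_event *ev)
{
    int last;

    pthread_mutex_lock(&ev->lock);
    assert(ev->users > 0);
    last = (--ev->users == 0);
    pthread_mutex_unlock(&ev->lock);

    if (last) {
        pthread_cond_destroy(&ev->cond);
        pthread_mutex_destroy(&ev->lock);
        free(ev);
    }
    return last;
}
```

    In this sketch each thread calls event_release() exactly once,
    after its final event_signal() or event_wait() call, and must not
    touch the event afterwards.  The destroy step no longer depends on
    which thread is dispatched first or on pthread_cond_destroy()
    reporting EBUSY.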
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
                       v6.0
    Platform           Fix Pack 6.0.2.11
    --------           --------------------
    Linux (x86)        tbc_p600_0_2_11
    Linux (x86-64)     tbc_p600_0_2_11
    Linux (zSeries)    tbc_p600_0_2_11
    Linux (Power)      tbc_p600_0_2_11
    Linux (s390x)      tbc_p600_0_2_11
    
                       v7.0
    Platform           Fix Pack 7.0.1.4
    --------           --------------------
    Linux (x86)        U836460
    Linux (x86-64)     U836464
    Linux (zSeries)    U836461
    Linux (Power)      U836462
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    

Temporary fix

Comments

APAR Information

  • APAR number

    IZ74801

  • Reported component name

    WMQ LIN X86 V6

  • Reported component ID

    5724H7204

  • Reported release

    602

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-04-19

  • Closed date

    2010-07-30

  • Last modified date

    2010-07-30

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WMQ LIN X86 V6

  • Fixed component ID

    5724H7204

Applicable component levels

  • R602 PSY

       UP


Document Information

Modified date:
30 July 2010