IBM Support

IC74652: WMQ V7 JVM THREADS STUCK, HUNG, WAITING WHEN HEARTBEAT INTERVAL HBINT IS SET TO 0

APAR status

  • Closed as program error.

Error description

  • Threads may intermittently hang in a wait state when the
    channel heartbeat interval (HBINT) is set to 0.
    A channel HBINT of 0 results in the Java Socket objects having
    a timeout value of 0, which is interpreted as an infinite
    timeout. The normal communication error methods are then not
    invoked, leaving the threads in a wait state.
    
    A coredump of the hung JVM shows a thread blocked at
    java.net.SocketInputStream.socketRead0(Native Method)
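
    The timeout behaviour described above can be reproduced with
    plain java.net sockets, independently of WebSphere MQ. The
    sketch below is illustrative only: the class and method names
    are hypothetical and it uses a loopback connection rather than
    a queue manager. It shows that a new Socket reports a timeout
    of 0 (block indefinitely) and that a positive value makes a
    read on a silent peer return control to the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of the WebSphere MQ client.
final class SocketTimeoutDemo {

    /** Returns SO_TIMEOUT of a freshly connected socket (0 means "wait forever"). */
    static int defaultTimeoutMillis() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            return client.getSoTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a read from a peer that never writes gives up after timeoutMillis. */
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            // Must be positive here: with 0 the read below would block indefinitely.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read(); // the peer stays silent
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // control came back to the caller
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default SO_TIMEOUT (ms): " + defaultTimeoutMillis());
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```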
    

Local fix

  • There are two options:
    
    1.  Set the heartbeat interval (HBINT property) on the channel
        to a positive, non-zero value.  This will ensure that a
        timeout is set on all socket read operations performed on
        connections using this channel.
    Option 1 will cause the client to send heartbeat messages to the
    queue manager and retry the request if it does not receive a
    response within the timespan defined by HBINT.  If this behavior
    is not desirable for the customer then option 2 should be
    considered.
    
    2. Set the property
    "com.ibm.mq.tuning.socketGrainTimeout".
    This property allows the customer to specify, in seconds, a
    timeout for socket read operations that overrides the value
    derived from HBINT.  Using this property will not enable client
    heartbeating, and the socket timeout will be applied
    system-wide rather than to a specific channel.
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects users of:
    - The IBM WebSphere MQ v7 classes for Java
    - The IBM WebSphere MQ v7 classes for JMS
    
    who wish to use a TCP connection to a queue manager using a
    server-connection (SVRCONN) channel with a heartbeat interval
    (HBINT) of 0.
    
    Platforms affected:
    All Distributed (iSeries, all Unix and Windows) +Java
    ****************************************************************
    PROBLEM SUMMARY:
    At v7, the JMQI layer (classes used by both the WebSphere MQ
    classes for Java and classes for JMS to communicate with the
    queue manager) uses the channel's heartbeat interval property
    to derive the value to set for the socket timeout when waiting
    for data from the queue manager.
    
    If the heartbeat interval is 0, then the socket timeout will be
    set to 0. A Java socket timeout of 0 results in an indefinite
    socket read.
    
    It was observed that in some cases, when the socket timeout was
    set to 0, the JVM failed to respond to errors on the socket
    connection and did not return control to the WMQ client classes.
    This caused applications to hang indefinitely.
    

Problem conclusion

  • JMQI code was modified to split infinite-wait socket receive
    operations into finite chunks.
    
    After each chunk elapses, the JVM times out the socket and
    returns control to the client classes. The client classes then
    restart the socket receive, detecting any socket errors in the
    process.
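
    The chunked receive described above can be sketched with plain
    java.net sockets. This is not the JMQI code: the class, the
    method readWithGrain and the 100 ms grain are hypothetical, and
    the demo uses a loopback peer that answers after a delay. Each
    slice ends with a SocketTimeoutException, after which the
    caller regains control and restarts the receive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch; not the actual JMQI implementation.
final class ChunkedReceive {

    /**
     * Waits for one byte with no overall deadline, in slices of grainMillis.
     * Returns the byte read, or -1 at end of stream.
     */
    static int readWithGrain(Socket socket, int grainMillis) throws IOException {
        socket.setSoTimeout(grainMillis); // finite timeout for every slice
        InputStream in = socket.getInputStream();
        while (true) {
            try {
                return in.read(); // data, end of stream, or IOException on a socket error
            } catch (SocketTimeoutException sliceElapsed) {
                // The slice elapsed without data; the socket itself is still usable.
                if (socket.isClosed()) {
                    throw new IOException("socket closed while waiting");
                }
                // Otherwise loop and restart the receive.
            }
        }
    }

    /** Loopback demo: the peer writes a single byte after several slices have elapsed. */
    static int demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            Thread writer = new Thread(() -> {
                try {
                    Thread.sleep(350);
                    peer.getOutputStream().write(42);
                    peer.getOutputStream().flush();
                } catch (IOException | InterruptedException ignored) {
                    // demo only
                }
            });
            writer.start();
            int received = readWithGrain(client, 100);
            writer.join();
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("received byte: " + demo());
    }
}
```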
    
    In addition, the code change associated with this APAR also
    corrects the implementation of the
    com.ibm.mq.tuning.socketGrainTimeout property.
    
    This property allows the user to override the HBINT value for
    all queue manager connections within the JVM. Previously an
    integer value set on this property would be added to the
    existing HBINT value, rather than overwriting as was intended.
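
    The difference between the previous and the corrected handling
    of the property can be illustrated with a small calculation.
    The sketch below is hypothetical and is not taken from the
    client code: the class and method names are invented, and
    reading the value as a JVM system property is an assumption.

```java
// Hypothetical illustration; names and lookup mechanism are assumptions.
final class EffectiveTimeout {

    static final String OVERRIDE_PROPERTY = "com.ibm.mq.tuning.socketGrainTimeout";

    /** Corrected behaviour: a configured override replaces the HBINT-derived value. */
    static int corrected(int derivedSeconds, Integer overrideSeconds) {
        return overrideSeconds != null ? overrideSeconds : derivedSeconds;
    }

    /** Behaviour before the fix, as described: the override was added to the derived value. */
    static int preFix(int derivedSeconds, Integer overrideSeconds) {
        return overrideSeconds != null ? derivedSeconds + overrideSeconds : derivedSeconds;
    }

    /** Assumption: the override is supplied as a JVM system property; null if unset. */
    static Integer overrideFromSystemProperty() {
        return Integer.getInteger(OVERRIDE_PROPERTY);
    }

    public static void main(String[] args) {
        Integer override = overrideFromSystemProperty();
        int derived = 300; // example HBINT-derived value in seconds
        System.out.println("pre-fix:   " + preFix(derived, override));
        System.out.println("corrected: " + corrected(derived, override));
    }
}
```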
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
                       v7.0
    Platform           Fix Pack 7.0.1.6
    --------           --------------------
    Windows            U200328
    AIX                U840698
    HP-UX (PA-RISC)    U841555
    HP-UX (Itanium)    U841560
    Solaris (SPARC)    U841556
    Solaris (x86-64)   U841562
    iSeries            tbc_p700_0_1_6
    Linux (x86)        U841557
    Linux (x86-64)     U841561
    Linux (zSeries)    U841558
    Linux (Power)      U841559
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC74652

  • Reported component name

    WMQ WINDOWS V7

  • Reported component ID

    5724H7220

  • Reported release

    700

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2011-02-23

  • Closed date

    2011-03-25

  • Last modified date

    2011-03-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WMQ WINDOWS V7

  • Fixed component ID

    5724H7220

Applicable component levels

  • R700 PSY

       UP

Document Information

Modified date:
25 March 2011