IBM Support

IZ34170: PROBEID ZT162010 IN FUNCTION zusQueryAncillaryProcess AFTER MQ RECEIVED UNEXPECTED ZERO RETURN CODE FROM READ().

APAR status

  • Closed as program error.

Error description

  • MQ receives an unexpected zero return code from read() during
    interprocess communication using UNIX domain sockets.
    A zero return code indicates that the other end of the socket
    has been closed; however, there is no evidence that the other
    end of the socket was closed.
    Initially, probe ID ZT162010 is raised from
    zusQueryAncillaryProcess; this is followed by probe ID KN254007
    from kqiQueryServiceStatus.
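The following is a minimal sketch of the read() semantics the error description relies on, assuming a POSIX system where Python's os.read() is a thin wrapper over the read() system call. On a UNIX domain stream socket, read() returns zero bytes only after the peer has closed its end; while the peer is open and has sent data, read() returns that data. The function name read_after_peer_action is hypothetical and is not part of MQ.

```python
import os
import socket


def read_after_peer_action(close_peer: bool) -> bytes:
    """Hypothetical demo: return what read() yields on a UNIX domain socket.

    If close_peer is True, the other end is closed first, so read() reports
    end-of-file by returning zero bytes. Otherwise the peer sends a short
    message, which read() returns.
    """
    local, peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        if close_peer:
            peer.close()
        else:
            peer.sendall(b"response")
        # os.read() calls the read() system call on the socket descriptor.
        return os.read(local.fileno(), 64)
    finally:
        local.close()
        peer.close()  # closing an already-closed socket object is harmless
```

The APAR symptom corresponds to getting the first result (b"") in a situation that should look like the second, because the amqzmgr0 end of the socket was still open.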
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This problem may be related to a rare timing condition in
    the OS read() service that causes an unexpected return code.
    The error has been reported by a variety of customers on
    Linux 2.6.5-7.244-smp, Linux 2.6.9-55.ELsmp,
    Linux 2.6.9-67.ELsmp, Linux 2.6.9-67.0.4.ELsmp and
    Linux 2.6.9-67.0.7.ELsmp; it appears to be specific to
    Linux x86-64.
    
    Platforms affected:
    All Distributed (iSeries, all Unix and Windows)
    ****************************************************************
    PROBLEM SUMMARY:
    MQ uses UNIX domain sockets to implement interprocess
    communication between internal MQ processes.
    A zero return code (0 bytes read) when reading from a
    UNIX domain socket is intended to signify that the other end of
    the socket has been closed.
    There have been a number of instances where an amqzlaa0
    process trying to read a response from the amqzmgr0 process
    has received a return code of 0 bytes read.
    Because the amqzmgr0 process should outlive the amqzlaa0
    process, it is unexpected that the socket would be closed
    before the read completed. The diagnostic data supplied gives
    no indication that the amqzmgr0 process ended unexpectedly.
    
    A diagnostic patch supplied to one customer showed that, when
    the read was retried, the data was unexpectedly available
    (which would imply a bug in the OS).
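The retry-after-zero behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not MQ's implementation: the function name read_with_zero_recheck is hypothetical, the default delay of 0.01 seconds is an assumed value (the APAR only says "a very short time"), and the Linux-only gating is omitted. Because the suspected OS condition cannot be reproduced on demand, the read function is injectable so the retry path can be exercised with a stand-in.

```python
import os
import time
from typing import Callable, Tuple

ReadFn = Callable[[int, int], bytes]


def read_with_zero_recheck(fd: int, size: int, delay: float = 0.01,
                           read_fn: ReadFn = os.read) -> Tuple[bytes, bool]:
    """Hypothetical helper: read from fd, retrying once after a 0-byte read.

    Returns (data, retried). If the retry also yields no data, the empty
    result is treated as a genuine end-of-file. The delay is an assumption.
    """
    data = read_fn(fd, size)
    if data:
        return data, False
    # Zero bytes usually means the peer closed; pause briefly in case the
    # data is in fact about to become available, then read once more.
    time.sleep(delay)
    return read_fn(fd, size), True
```

If the second read returns data, the first zero return was spurious, which is the pattern the diagnostic patch observed.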
    

Problem conclusion

  • On Linux only, when read() returns zero while the queue
    manager is reading a response from the amqzmgr0 process, the
    queue manager will wait for a very short time and then retry
    the read. This is an attempt to work around a possible bug in
    the OS. The APAR fix also includes improved diagnostics that
    will make it easier to determine whether an OS bug does exist
    (at which point it can be reported to the OS support group).
    
    In addition, the interprocess communication mechanism has been
    reworked to handle partial writes and reads, where repeated
    write() or read() calls are needed to send or receive all of
    the requisite data. This rework makes the mechanism more
    resilient and may in itself resolve the problem for which this
    APAR was raised. It has been implemented on all distributed
    platforms.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
                       v6.0
    Platform           Fix Pack 6.0.2.8
    --------           --------------------
    Windows            U200309
    AIX                U825517
    HP-UX (PA-RISC)    U824678
    HP-UX (Itanium)    U825875
    Solaris (SPARC)    U825511
    Solaris (x86-64)   U825872
    iSeries            tbc_p600_0_2_8
    Linux (x86)        U825181
    Linux (x86-64)     U825874
    Linux (zSeries)    U825516
    Linux (Power)      U825182
    Linux (s390x)      U825873
    
                       v7.0
    Platform           Fix Pack 7.0.0.2
    --------           --------------------
    Windows            U200302
    AIX                U822354
    HP-UX (PA-RISC)    U822349
    HP-UX (Itanium)    U822351
    Solaris (SPARC)    U822353
    Solaris (x86-64)   U822394
    iSeries            tbc_p700_0_0_2
    Linux (x86)        U822348
    Linux (x86-64)     U822393
    Linux (zSeries)    U822352
    Linux (Power)      U822350
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
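The partial write/read rework described in the Problem conclusion amounts to looping until the full amount of data has been transferred, because a single write() or read() on a stream socket may move fewer bytes than requested. The sketch below assumes blocking stream sockets on a POSIX system; write_fully, read_fully and send_over_socketpair are hypothetical names, since the APAR does not show MQ's actual code. It treats a zero-byte read as end-of-file and does not include the Linux-only delayed retry.

```python
import os
import socket


def write_fully(fd: int, payload: bytes) -> None:
    """Hypothetical helper: call write() until every byte has been sent."""
    view = memoryview(payload)
    while view:
        written = os.write(fd, view)  # may send fewer bytes than requested
        view = view[written:]


def read_fully(fd: int, size: int) -> bytes:
    """Hypothetical helper: call read() until exactly size bytes arrive.

    Raises EOFError if the peer closes the socket before all data is read.
    """
    chunks = []
    remaining = size
    while remaining:
        chunk = os.read(fd, remaining)  # may return fewer bytes than asked
        if not chunk:
            raise EOFError(f"peer closed with {remaining} of {size} bytes unread")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_over_socketpair(payload: bytes, expect: int) -> bytes:
    """Hypothetical demo: send payload over a UNIX domain socket pair,
    close the sender, then read `expect` bytes back with read_fully()."""
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        write_fully(writer.fileno(), payload)
        writer.close()
        return read_fully(reader.fileno(), expect)
    finally:
        reader.close()
        writer.close()
```

For example, send_over_socketpair(b"status:ok", 9) returns b"status:ok", while asking for more bytes than were sent raises EOFError once the closed peer makes read() return zero.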
    

Temporary fix

Comments

APAR Information

  • APAR number

    IZ34170

  • Reported component name

    WMQ LIN X86 V6

  • Reported component ID

    5724H7204

  • Reported release

    600

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2008-10-08

  • Closed date

    2009-01-31

  • Last modified date

    2009-04-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WMQ LIN X86 V6

  • Fixed component ID

    5724H7204

Applicable component levels

  • R600 PSY

       UP

Document Information

Modified date:
16 April 2009