IBM Support

IC83299: WMQ V5.3.1 FOR HP NSS: CPU FAILURE DURING MQGET OF PERSISTENT MESSAGES IN GLOBAL UNIT OF WORK

APAR status

  • Closed as program error.

Error description

  • A WebSphere MQ Java application uses TMF transactions to get a
    single message from a WebSphere MQ queue. All messages in this
    queue are persistent and all are larger than 3 KB, so the number
    of records in the queue file and in the overflow file should
    match.
    A CPU failure occurred while a TMF transaction was in flight.
    The backup Queue Server process detected the failure and took
    over from the primary:
    AMQ8846: MQ NonStop Server takeover initiated
    The Queue Server process failover completed
    AMQ8638: A Queue Server completed takeover processing.
    After the takeover, the application performing the MQGET failed
    with the message:
    MQJMS2002: failed to get message from MQ queue,
    completionCode=2, errorCode=2003
    Reason code 2003 is MQRC_BACKED_OUT.
    Customers may find that the record counts of the underlying
    queue file and overflow file no longer match.
    

Local fix

  • Run WebSphere MQ in a single CPU.
    
    To restrict LQMA processes to specific CPUs, add an entry to the
    qmproc.ini file under the LQMA stanza using the "CPUs" value.
    For example, the following causes LQMAs to run only in CPU 3:
    # Agent type stanzas
    # ------------------
    LQMA:
    CPUs=3
    
    As with all changes to the qmproc.ini file, the change can be
    applied either by using runmqsc (refresh qmgr type(nsproc)) or
    by using the ecasvc tool, option 13; the two functions are
    equivalent. Note that changes to process-related rules apply
    only to newly created processes, so in this example all newly
    created LQMA processes would be created in CPU 3.
    

Problem summary

  • In a CPU failure scenario where multiple applications are
    performing FIFO MQGETs from the same queue, there is a failure
    window in which an application not in the failing CPU removes
    an additional message record from the queue file. The window
    exists when all of the following are true:
    - The primary queue server is running in the failing CPU.
    - Some (but not all) of the applications are running in the
      failing CPU.
    - The LQMAs are not running in the failing CPU.
    - The applications are using global units of work.
    - Applications in the failing CPU have completed MQGET
      operations but have not committed the transactions.
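
The conditions above can be read as a simple predicate over a queue manager's process placement. The following Java sketch is illustrative only and is not part of the APAR or of any WebSphere MQ API: the class name, method name, and parameters are all hypothetical. It models only the placement conditions and the use of global units of work. The final condition (an MQGET completed but not yet committed at the moment of failure) is a runtime state and is not modelled.

```java
import java.util.Set;

public class ExposureCheck {

    /**
     * Returns true when a failure of failingCpu would match the placement
     * conditions listed in the problem summary. All names are hypothetical.
     */
    static boolean isExposed(int failingCpu,
                             int primaryQueueServerCpu,
                             Set<Integer> lqmaCpus,
                             Set<Integer> applicationCpus,
                             boolean globalUnitsOfWork) {
        // The primary queue server is running in the failing CPU.
        boolean queueServerFails = primaryQueueServerCpu == failingCpu;
        // Some, but not all, applications are running in the failing CPU.
        boolean someAppsFail = applicationCpus.contains(failingCpu);
        boolean someAppsSurvive =
                applicationCpus.stream().anyMatch(cpu -> cpu != failingCpu);
        // No LQMA is running in the failing CPU.
        boolean lqmasSurvive = !lqmaCpus.contains(failingCpu);

        return queueServerFails && someAppsFail && someAppsSurvive
                && lqmasSurvive && globalUnitsOfWork;
    }

    public static void main(String[] args) {
        // Queue server in CPU 0, LQMAs in CPUs 1 and 2, applications in CPUs 0 and 1.
        System.out.println(isExposed(0, 0, Set.of(1, 2), Set.of(0, 1), true));
        // Same layout, but LQMAs restricted to the queue server's CPU.
        System.out.println(isExposed(0, 0, Set.of(0), Set.of(0, 1), true));
    }
}
```

Under these assumptions the first layout matches the conditions and the second does not, which is consistent with the temporary fix of running LQMAs in the same CPU as the primary queue server.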
    

Problem conclusion

  • Amend the queue server takeover implementation to take account
    of pending transaction locks on queues during recovery, and doom
    transactions whose status is undefined or indeterminate.
    

Temporary fix

  • Configure the queue manager to run LQMAs in the same CPU as the
    primary queue server.
    

Comments

APAR Information

  • APAR number

    IC83299

  • Reported component name

    WEBS MQ NSS ITA

  • Reported component ID

    5724A3902

  • Reported release

    531

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-05-10

  • Closed date

    2012-05-10

  • Last modified date

    2012-09-17

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBS MQ NSS ITA

  • Fixed component ID

    5724A3902

Applicable component levels

  • R531 PSY

       UP

Document Information

Modified date:
19 September 2021