IBM Support

IT31118: MQCMIT of deep queue takes a very long time to complete, leadingto a transmission queue buildup

Subscribe to this APAR

By subscribing, you receive periodic emails alerting you to the status of the APAR, along with a link to the fix after it becomes available. You can track this item individually or track all items by product.

Notify me when this APAR changes.

Notify me when an APAR for this component changes.

 

APAR status

  • Closed as program error.

Error description

  • An IBM MQ transmission queue grows in depth due to an MQCMIT
    taking a
    long time. There are no errors or FDCs.
    Trace shows that a channel process requested a queue lock, and
    the
    queue lock is owned by the amqzmuc0 process performing a
    checkpoint.
    
    Since the queue is already deep, checkpoint will take a long
    time, at which point MQCMIT will not complete until the
    checkpoint is done.
    
    Channel trace:
    
    13:52:23.440821  36856.1    RSESS:000001    MQCMIT >>
    ...
    13:52:23.443405  36856.1    RSESS:000001    Data: 0x00000000
    *13:52:33.444659  36856.1    RSESS:000001    Data: 0x0000006e
    13:52:33.444726  36856.1    RSESS:000001    Data: 0x10806020
    13:52:33.444795  36856.1    RSESS:000001    Data: 0x00000001
    0x00000000
    13:52:33.444847  36856.1    RSESS:000001 ----------}!
    xlsWaitEvent rc=xecL_W_TIMEOUT FunctionTime=10001729
    13:52:33.444913  36856.1    RSESS:000001    Data: 0x00000000
    0x00000000
    13:52:33.444960  36856.1    RSESS:000001 ---------}!
    zcpReceiveOnPipe rc=zrcC_E_TIMEOUT FunctionTime=10001921
    
    Corresponding agent process shows this:
    
    13:52:23.444714   5207.20450 CONN:00007e    Corresponding
    APPLICATION pid.tid (36856.1)
    13:52:23.444886   5207.20450 CONN:00007e --{
    zlaProcessMQIRequest
    13:52:23.445060   5207.20450 CONN:00007e ---{ zlaMQCMIT
    ...
    13:52:23.904293   5207.20450 CONN:00007e ----------{
    aqmSyncPoint
    13:52:23.904429   5207.20450 CONN:00007e -----------{
    atmEnquireTranLastLSN
    13:52:23.904587   5207.20450 CONN:00007e -----------}
    atmEnquireTranLastLSN rc=OK FunctionTime=158
    13:52:23.904738   5207.20450 CONN:00007e
    QHandle:0x7f413b09fcc0 appended at 0
    13:52:23.904924   5207.20450 CONN:00007e
    QHandle:0x7f413b4ebd80 appended at 1
    ...
    13:52:23.905738   5207.20450 CONN:00007e -----------{
    aocLockQHandle
    13:52:23.905855   5207.20450 CONN:00007e ------------{
    xlsRequestMutex
    13:52:23.906037   5207.20450 CONN:00007e    MtxName: ABC Id:
    2306
    13:52:23.906170   5207.20450 CONN:00007e    Data: 0x0000001d
    0x00004a36 0x00000017
    13:52:24.906367   5207.20450 CONN:00007e    Data: 0x0000006e
    0x00000000
    13:52:24.906537   5207.20450 CONN:00007e    OwnerConnId(29)
    OwnerPid(18998) OwnerTid(23)
    
    We request a mutex, but never get it because the mutex is held
    by another process: amqzmuc0, and here is what it is doing:
    
    *13:52:22.972891  18998.23        :     ---{ xlsRequestMutex
     13:52:23.156086  18998.23        :          MtxName: ABC Id:
    2306
     13:52:23.156190  18998.23        :          MtxName: ABC Id:
    2306
     13:52:23.156281  18998.23        :     ---} xlsRequestMutex
    rc=OK FunctionTime=183390
     13:52:23.156394  18998.23        :     --}
    aocEnquireObjectByQid rc=OK FunctionTime=1697
     13:52:23.157374  18998.23        :     --{ aqmCheckPointQueue
     13:52:23.157545  18998.23        :     ---{ aqsStartQOp
     13:52:23.157719  18998.23        :          Thread(0 882049
    FFFFFFFF.FFFFFFFF) QHandle(7A509CC0.149439 5DF2FA85.2)
     13:52:23.157874  18998.23        :     ----{ aqsAttachQueue
     13:52:23.158038  18998.23        :
    QHandle(7A509CC0.149439 5DF2FA85.2)
     13:52:23.158196  18998.23        :     -----{
    xcsAttachSetExtents
     13:52:23.158358  18998.23        :     -----}
    xcsAttachSetExtents rc=OK FunctionTime=162
     13:52:23.158527  18998.23        :     ----} aqsAttachQueue
    rc=OK FunctionTime=653
     13:52:23.158706  18998.23        :     ---} aqsStartQOp rc=OK
    FunctionTime=234
     13:52:23.158878  18998.23        :     ---{ aqqCopyDeferred
     13:52:23.158993  18998.23        :     ----{ xcsGetMemFn
    *13:52:23.159055  18998.23        :          component:4
    function:138 length:12288 options:0 cbmindex:-1
    *pointer:0x7fdd24006150
     13:52:23.159176  18998.23        :     ----} xcsGetMemFn rc=OK
    FunctionTime=178393
     13:52:23.159288  18998.23        :     ----{ aqtIdxToSpcFn
     13:52:23.159408  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=57
     13:52:23.159524  18998.23        :     ----{ aqtIdxToSpcFn
     13:52:23.159642  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=55
     13:52:23.159758  18998.23        :     ----{ aqqGetDeferredLink
     13:52:23.159883  18998.23        :     -----{
    aqqGetPreviousLink
     13:52:23.160007  18998.23        :     ------{ aqtIdxToSpcFn
     13:52:23.160123  18998.23        :     ------} aqtIdxToSpcFn
    rc=OK FunctionTime=116
     13:52:23.160238  18998.23        :
    bFirstGettableMsg(TRUE) FirstGettableMsg(1072605) NextMsg(0)
     13:52:23.160354  18998.23        :     -----}
    aqqGetPreviousLink rc=OK FunctionTime=58
     13:52:23.160462  18998.23        :     -----{ aqqGetForwardLink
     13:52:23.160581  18998.23        :     -----}
    aqqGetForwardLink rc=OK FunctionTime=119
     13:52:23.160696  18998.23        :     ----}
    aqqGetDeferredLink rc=OK FunctionTime=57
     13:52:23.160821  18998.23        :     ----{ aqtIdxToSpcFn
     13:52:23.160936  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=115
     13:52:23.161066  18998.23        :     ----{ aqtIdxToSpcFn
     13:52:23.161184  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=118
    ...
    10 seconds later, we are still stuck in the same function:
    
    13:52:34.878299  18998.23        :     ----{ aqtIdxToSpcFn
    13:52:34.878323  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=24
    13:52:34.878334  18998.23        :     ----{ aqtIdxToSpcFn
    13:52:34.878354  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=20
    13:52:34.878395  18998.23        :     ----{ aqtIdxToSpcFn
    13:52:34.878424  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=29
    13:52:34.878428  18998.23        :     ----{ aqtIdxToSpcFn
    13:52:34.878430  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=2
    13:52:34.878434  18998.23        :     ----{ aqtIdxToSpcFn
    13:52:34.878437  18998.23        :     ----} aqtIdxToSpcFn
    rc=OK FunctionTime=3
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Users with deep queues may be affected by this problem.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    During IBM MQ queue checkpoint processing, a piece of code
    iterates through each message to find those messages that are
    part of a unit of work.  For very deep queues, this can delay
    checkpoint processing and cause MQI operations for that queue to
    be held up.
    

Problem conclusion

  • The IBM MQ code has been changed so that iterates through a
    cache containing the subset of the messages on the queue which
    are part of a unit of work.  This avoids scanning all messages
    on the queue and so improves the performance of checkpoint
    processing when the queue is deep.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v8.0       8.0.0.16
    v9.0 LTS   9.0.0.11
    v9.1 LTS   9.1.0.7
    v9.x CD    9.2.1
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT31118

  • Reported component name

    IBM MQ BASE MP

  • Reported component ID

    5724H7251

  • Reported release

    800

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-11-29

  • Closed date

    2020-07-29

  • Last modified date

    2020-11-13

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ BASE MP

  • Fixed component ID

    5724H7251

Applicable component levels

[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"8.0.0.0"}]

Document Information

Modified date:
16 November 2020