APAR status
Closed as program error.
Error description
An IBM MQ transmission queue grows in depth due to an MQCMIT taking a long time. There are no errors or FDCs. Trace shows that a channel process requested a queue lock, and the queue lock is owned by the amqzmuc0 process performing a checkpoint. Since the queue is already deep, checkpoint will take a long time, at which point MQCMIT will not complete until the checkpoint is done. Channel trace: 13:52:23.440821 36856.1 RSESS:000001 MQCMIT >> ... 13:52:23.443405 36856.1 RSESS:000001 Data: 0x00000000 *13:52:33.444659 36856.1 RSESS:000001 Data: 0x0000006e 13:52:33.444726 36856.1 RSESS:000001 Data: 0x10806020 13:52:33.444795 36856.1 RSESS:000001 Data: 0x00000001 0x00000000 13:52:33.444847 36856.1 RSESS:000001 ----------}! xlsWaitEvent rc=xecL_W_TIMEOUT FunctionTime=10001729 13:52:33.444913 36856.1 RSESS:000001 Data: 0x00000000 0x00000000 13:52:33.444960 36856.1 RSESS:000001 ---------}! zcpReceiveOnPipe rc=zrcC_E_TIMEOUT FunctionTime=10001921 Corresponding agent process shows this: 13:52:23.444714 5207.20450 CONN:00007e Corresponding APPLICATION pid.tid (36856.1) 13:52:23.444886 5207.20450 CONN:00007e --{ zlaProcessMQIRequest 13:52:23.445060 5207.20450 CONN:00007e ---{ zlaMQCMIT ... 13:52:23.904293 5207.20450 CONN:00007e ----------{ aqmSyncPoint 13:52:23.904429 5207.20450 CONN:00007e -----------{ atmEnquireTranLastLSN 13:52:23.904587 5207.20450 CONN:00007e -----------} atmEnquireTranLastLSN rc=OK FunctionTime=158 13:52:23.904738 5207.20450 CONN:00007e QHandle:0x7f413b09fcc0 appended at 0 13:52:23.904924 5207.20450 CONN:00007e QHandle:0x7f413b4ebd80 appended at 1 ... 13:52:23.905738 5207.20450 CONN:00007e -----------{ aocLockQHandle 13:52:23.905855 5207.20450 CONN:00007e ------------{ xlsRequestMutex 13:52:23.906037 5207.20450 CONN:00007e MtxName: ABC Id: 2306 13:52:23.906170 5207.20450 CONN:00007e Data: 0x0000001d 0x00004a36 0x00000017 13:52:24.906367 5207.20450 CONN:00007e Data: 0x0000006e 0x00000000 13:52:24.906537 5207.20450 CONN:00007e OwnerConnId(29) OwnerPid(18998) OwnerTid(23) We request a mutex, but never get it because the mutex is held by another process: amqzmuc0, and here is what it is doing: *13:52:22.972891 18998.23 : ---{ xlsRequestMutex 13:52:23.156086 18998.23 : MtxName: ABC Id: 2306 13:52:23.156190 18998.23 : MtxName: ABC Id: 2306 13:52:23.156281 18998.23 : ---} xlsRequestMutex rc=OK FunctionTime=183390 13:52:23.156394 18998.23 : --} aocEnquireObjectByQid rc=OK FunctionTime=1697 13:52:23.157374 18998.23 : --{ aqmCheckPointQueue 13:52:23.157545 18998.23 : ---{ aqsStartQOp 13:52:23.157719 18998.23 : Thread(0 882049 FFFFFFFF.FFFFFFFF) QHandle(7A509CC0.149439 5DF2FA85.2) 13:52:23.157874 18998.23 : ----{ aqsAttachQueue 13:52:23.158038 18998.23 : QHandle(7A509CC0.149439 5DF2FA85.2) 13:52:23.158196 18998.23 : -----{ xcsAttachSetExtents 13:52:23.158358 18998.23 : -----} xcsAttachSetExtents rc=OK FunctionTime=162 13:52:23.158527 18998.23 : ----} aqsAttachQueue rc=OK FunctionTime=653 13:52:23.158706 18998.23 : ---} aqsStartQOp rc=OK FunctionTime=234 13:52:23.158878 18998.23 : ---{ aqqCopyDeferred 13:52:23.158993 18998.23 : ----{ xcsGetMemFn *13:52:23.159055 18998.23 : component:4 function:138 length:12288 options:0 cbmindex:-1 *pointer:0x7fdd24006150 13:52:23.159176 18998.23 : ----} xcsGetMemFn rc=OK FunctionTime=178393 13:52:23.159288 18998.23 : ----{ aqtIdxToSpcFn 13:52:23.159408 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=57 13:52:23.159524 18998.23 : ----{ aqtIdxToSpcFn 13:52:23.159642 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=55 13:52:23.159758 18998.23 : ----{ aqqGetDeferredLink 13:52:23.159883 18998.23 : -----{ aqqGetPreviousLink 13:52:23.160007 18998.23 : ------{ aqtIdxToSpcFn 13:52:23.160123 18998.23 : ------} aqtIdxToSpcFn rc=OK FunctionTime=116 13:52:23.160238 18998.23 : bFirstGettableMsg(TRUE) FirstGettableMsg(1072605) NextMsg(0) 13:52:23.160354 18998.23 : -----} aqqGetPreviousLink rc=OK FunctionTime=58 13:52:23.160462 18998.23 : -----{ aqqGetForwardLink 13:52:23.160581 18998.23 : -----} aqqGetForwardLink rc=OK FunctionTime=119 13:52:23.160696 18998.23 : ----} aqqGetDeferredLink rc=OK FunctionTime=57 13:52:23.160821 18998.23 : ----{ aqtIdxToSpcFn 13:52:23.160936 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=115 13:52:23.161066 18998.23 : ----{ aqtIdxToSpcFn 13:52:23.161184 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=118 ... 10 seconds later, we are still stuck in the same function: 13:52:34.878299 18998.23 : ----{ aqtIdxToSpcFn 13:52:34.878323 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=24 13:52:34.878334 18998.23 : ----{ aqtIdxToSpcFn 13:52:34.878354 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=20 13:52:34.878395 18998.23 : ----{ aqtIdxToSpcFn 13:52:34.878424 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=29 13:52:34.878428 18998.23 : ----{ aqtIdxToSpcFn 13:52:34.878430 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=2 13:52:34.878434 18998.23 : ----{ aqtIdxToSpcFn 13:52:34.878437 18998.23 : ----} aqtIdxToSpcFn rc=OK FunctionTime=3
Local fix
If queues are not allowed to get deep then this problem will not occur. If this cannot be avoided then the impact of the problem may be reduced by tuning IBM MQ so that checkpoint processing occurs less frequently, see https://www.ibm.com/support/pages/checkpoints-websphere-mq-loggi ng
Problem summary
**************************************************************** USERS AFFECTED: Users with deep queues may be affected by this problem. Platforms affected: MultiPlatform **************************************************************** PROBLEM DESCRIPTION: During IBM MQ queue checkpoint processing, a piece of code iterates through each message to find those messages that are part of a unit of work. For very deep queues, this can delay checkpoint processing and cause MQI operations for that queue to be held up.
Problem conclusion
The IBM MQ code has been changed so that iterates through a cache containing the subset of the messages on the queue which are part of a unit of work. This avoids scanning all messages on the queue and so improves the performance of checkpoint processing when the queue is deep. --------------------------------------------------------------- The fix is targeted for delivery in the following PTFs: Version Maintenance Level v8.0 8.0.0.16 v9.0 LTS 9.0.0.11 v9.1 LTS 9.1.0.7 v9.x CD 9.2.1 The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes' http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037 If the maintenance level is not yet available information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates' http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309 ---------------------------------------------------------------
Temporary fix
Comments
APAR Information
APAR number
IT31118
Reported component name
IBM MQ BASE MP
Reported component ID
5724H7251
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-11-29
Closed date
2020-07-29
Last modified date
2020-11-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ BASE MP
Fixed component ID
5724H7251
Applicable component levels
[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"8.0.0.0"}]
Document Information
Modified date:
16 November 2020