A fix is available
APAR status
Closed as program error.
Error description
The customer experienced a crash of their z/OS system (not related to MQ). During the subsequent startup of MQ, the backward recovery process took a long time to complete.

The root cause of the slow recovery was a combination of two aspects of the work that needed recovering. Together they drove recovery processing through logic whose performance degrades as the volume of data to recover increases.

The first aspect is the volume of data that must be processed to recover MQ. In this particular system, a significant amount of work had been performed since the last checkpoint, so a large log range needed to be read to complete the recovery. The checkpoint frequency is controlled by the size of the MQ log datasets (a checkpoint is taken each time a log is filled) and by the LOGLOAD parameter, which controls how much logging can be done before a checkpoint is taken. In this customer scenario, LOGLOAD was set to 9,000,000. This value is large enough that it is unlikely to be reached before a log dataset fills.

The second aspect of the workload was the use of XA transaction coordination. It is the recovery logic that handles this style of unit of work (UOW) that has the performance problem.

The issue that this combination exposed is that recovery processing builds a list of the XA transactions it has seen during the forward recovery phase, and for some log record types it scans this list to see whether the record can be associated with a corresponding XA transaction. As the list grows, the processing required to traverse it for each log record increases. When a large number of log records must be processed and they contain a large number of XA transactions, this increasing CPU cost makes recovery progressively slower as processing of the log proceeds.

Additional Symptom(s) Search Keyword(s):
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere MQ for z/OS Version 8 *
*                 Release 0 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: After abnormal queue manager            *
*                      termination, the queue manager can take *
*                      a long time to restart when using XA.   *
*                      While restarting, the queue manager     *
*                      demonstrates high CPU usage.            *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
During queue manager restart, MQ processes all MQ log records written since the last checkpoint while recovering the state of transactions performed before the abnormal termination. For each XA transaction found, an XTE control block is created and chained from the XIT. This chain is searched for each control log record processed, in order to determine whether the log record relates to a known XA transaction. As the number of XA transactions performed since the last checkpoint increases, the time to perform this search increases greatly, leading to high CPU usage in CSQMLCUR and extended delays during queue manager restart.
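The growth in search cost described above can be illustrated with a small model. This is a sketch only, not MQ code: the record kinds ("xa_start", "control", "xa_end"), the function names, and the one-control-record-per-transaction workload are all assumptions made for illustration, and a plain Python list stands in for the chain of XTEs anchored in the XIT.

```python
def build_log(n_transactions):
    """Hypothetical log: each XA transaction starts, writes one
    control record, and ends before the next one starts."""
    log = []
    for xid in range(n_transactions):
        log.append(("xa_start", xid))
        log.append(("control", xid))
        log.append(("xa_end", xid))
    return log


def count_chain_comparisons(log):
    """Replay the log, keeping every transaction seen on the chain,
    and count the comparisons made while searching it."""
    chain = []        # stands in for the XTEs chained from the XIT
    comparisons = 0
    for kind, xid in log:
        if kind == "xa_start":
            chain.append(xid)
        elif kind == "control":
            # Linear search of the chain for the matching transaction.
            for known_xid in chain:
                comparisons += 1
                if known_xid == xid:
                    break
        # "xa_end" is ignored here: completed entries stay on the chain.
    return comparisons


for n in (1_000, 2_000, 4_000):
    print(n, count_chain_comparisons(build_log(n)))
# 1000 500500
# 2000 2001000
# 4000 8002000
```

In this model, doubling the number of XA transactions roughly quadruples the search work, which matches the observation that restart slows down as log processing proceeds.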
Problem conclusion
CSQMLCUR is changed to remove XTEs for XA transactions that have completed commit or abort processing earlier in queue manager restart processing, so that the XIT chain only contains XTEs for active XA units of work. This greatly reduces the number of XTEs chained, and consequently reduces the time required to scan the chain.
000Y
CSQMLCUR
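The effect of this change can be sketched with a simplified model. It is not the CSQMLCUR implementation: the record kinds ("xa_start", "control", "xa_end") and the function names are hypothetical, and a Python list stands in for the XTE chain. The sketch removes an entry as soon as its transaction completes, so that only active units of work are searched.

```python
def sequential_log(n_transactions):
    """Hypothetical log in which each XA transaction completes
    before the next one starts."""
    log = []
    for xid in range(n_transactions):
        log.extend([("xa_start", xid), ("control", xid), ("xa_end", xid)])
    return log


def replay_with_pruning(log):
    """Replay the log, dropping completed transactions from the chain.
    Returns (comparisons made, longest chain length observed)."""
    chain = []            # only active XA transactions remain here
    comparisons = 0
    peak_length = 0
    for kind, xid in log:
        if kind == "xa_start":
            chain.append(xid)
            peak_length = max(peak_length, len(chain))
        elif kind == "control":
            for known_xid in chain:
                comparisons += 1
                if known_xid == xid:
                    break
        elif kind == "xa_end" and xid in chain:
            # Commit or abort has completed: remove the entry.
            chain.remove(xid)
    return comparisons, peak_length


print(replay_with_pruning(sequential_log(1_000)))   # (1000, 1)
```

With pruning, the search cost in this model depends on how many XA transactions are active at the same time rather than on how many have been logged since the last checkpoint. A real workload would interleave transactions, so the chain would be longer than one entry, but it would still be bounded by the number of concurrently active units of work.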
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PI59634
Reported component name
WMQ Z/OS 8
Reported component ID
5655W9700
Reported release
000
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-03-23
Closed date
2016-04-26
Last modified date
2016-06-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI37315
Modules/Macros
CSQMLCUR
Fix information
Fixed component name
WMQ Z/OS 8
Fixed component ID
5655W9700
Applicable component levels
R000 PSY UI37315
UP16/05/26 P F605
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 June 2016