A fix is available
APAR status
-
Closed as program error.
Error description
-
The queue manager appears to be hung. STOP QMGR MODE(FORCE) and STOP CHINIT commands are accepted by the queue manager but not executed.

The performance problem is caused by very inefficient I/O when shunting a unit of recovery (UR) from archive logs located on tape. The problem occurs when all of the following are true:
- A long-running unit of work needs to be shunted.
- Part of the log range required for shunting is only available in an archive log.
- The archive log is on tape.

In these circumstances, shunt processing reads data from the archive log in a very inefficient pattern. Each time it encounters a log record that spans a block boundary, it must re-read a block it has already passed. It does this by reopening the archive log and reading from the end back to the required block. For a large log, the queue manager therefore repeatedly reads the end portion of the log, because it has to keep traversing back to the current point.

In testing by the IBM MQ Development: z/OS Service team, a unit of work containing 300 puts of messages with 100,000 bytes of data each was used. Shunt processing for this unit of work took over 2 minutes, with continuous I/O.

The nature of the issue means that as the size of the data being read increases, the workload increases in two ways:
- For each additional block to be processed, reading of the log has to be restarted one more time.
- As processing moves further from the end of the log, each reread has to move through more data to get back to the point being processed.

From this, the processing time would be expected to increase with the square of the size of the data range. In the customer's case a much larger log range had to be processed, which would explain why shunting was still in progress after hours of processing.

Additional keywords: latch
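The quadratic growth described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the queue manager's actual logic: it assumes the required log range occupies the last `range_blocks` blocks of the archive log, that every block boundary in the range is crossed by a log record, and that each reread traverses one more block than the previous one. The function name and the block counts are hypothetical.

```python
def total_blocks_reread(range_blocks: int) -> int:
    """Blocks traversed by rereads in a toy model of the tape read pattern.

    Assumption: one reopen per block boundary, and the k-th reread has to
    read k blocks from the end of the log to get back to the current point.
    """
    total = 0
    for distance_from_end in range(1, range_blocks + 1):
        # Reopen the archive log and read from the end back to this block.
        total += distance_from_end
    return total


if __name__ == "__main__":
    # Hypothetical range sizes, chosen only to show the growth rate.
    for blocks in (1_000, 2_000, 4_000):
        print(blocks, total_blocks_reread(blocks))
```

In this model the total is `n * (n + 1) / 2` for a range of `n` blocks, so doubling the range roughly quadruples the I/O, which is consistent with the expectation that processing time grows with the square of the data range.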
Local fix
-
The problem can be avoided by ensuring that the shunt task never needs to read from an archive log on tape. There are several ways to achieve this:
- Add extra active logs. Shunting is triggered after a UR has been active for three checkpoints. At the latest, this means that it starts at the end of the third active log after the UR started. With 4 active logs, there is a risk that the 4th log fills before shunting has completed and the first log is reused. Adding more logs gives more headroom for shunting to finish before that happens.
- Change the archive log configuration to use DASD rather than tape. This avoids the scenario completely, because different read logic is used for accessing data from DASD data sets.
- Reduce the LOGLOAD setting to make checkpointing occur more frequently. This option may be easier to implement in the short term than the other two. If LOGLOAD is lowered so that MQ takes checkpoints part-way through an active log, rather than only at the end of the log, long-running URs are shunted earlier. While this may result in more instances of shunting, it significantly reduces the risk of a shunt process requiring data from the archive logs, because the shunt starts while the UR covers only 1 or 2 logs.
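The headroom reasoning behind the first and third options can be sketched as simple arithmetic. This is a simplified worst-case model, not a description of MQ internals: it takes the three-checkpoint trigger from the text, assumes checkpoints are evenly spaced with a whole number of checkpoints per active log, and ignores log writes made while shunting runs. The function names are hypothetical.

```python
import math

# From the text: shunting is triggered after a UR has been active
# for three checkpoints.
CHECKPOINTS_BEFORE_SHUNT = 3


def logs_spanned_at_shunt(checkpoints_per_log: int) -> int:
    """Worst-case number of active logs a UR covers when shunting starts.

    Assumption: checkpoints are evenly spaced, with at least one per log.
    """
    return math.ceil(CHECKPOINTS_BEFORE_SHUNT / checkpoints_per_log)


def spare_logs(active_logs: int, checkpoints_per_log: int) -> int:
    """Active logs still free before the UR's first log would be reused."""
    return active_logs - logs_spanned_at_shunt(checkpoints_per_log)


if __name__ == "__main__":
    # Hypothetical configurations: (active logs, checkpoints per log).
    for active, per_log in ((4, 1), (8, 1), (4, 2), (4, 3)):
        print(active, per_log, spare_logs(active, per_log))
```

With one checkpoint per log and 4 active logs, the model leaves a single spare log, which matches the risk described in the first option. Taking two or three checkpoints per log reduces the span at shunt time to 2 logs or 1 log, as described for the LOGLOAD option.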
Problem summary
-
****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 1 Modification 0 and Release 2       *
*                 Modification 0.                              *
****************************************************************
* PROBLEM DESCRIPTION: Very busy QMGRs which offload active    *
*                      logs to tape data sets may experience   *
*                      performance problems when using low     *
*                      numbers of active logs.                 *
*                                                              *
*                      This may manifest as delays to          *
*                      recoverable actions, in addition to     *
*                      large amounts of I/O to archive log     *
*                      data sets.                              *
****************************************************************
Log data from long-running units of work (UOW) that span multiple log data sets is shunted forward into more recent logs. Depending on the amount of shunting required and the number of active log data sets, the shunting task may need to read from archive log data sets. The processing used to read from tape archive log data sets is extremely inefficient when reading large log records, and results in the tape being rewound many times.
Problem conclusion
-
The logic for shunting log data from tape archive log data sets has been improved. This improves the performance of reading large log records from tape archive logs in certain scenarios.
Temporary fix
Comments
APAR Information
-
APAR number
PH40206
-
Reported component name
IBM MQ Z/OS V9
-
Reported component ID
5655MQ900
-
Reported release
100
-
Status
CLOSED PER
-
PE
NoPE
-
HIPER
YesHIPER
-
Special Attention
NoSpecatt / Xsystem
-
Submitted date
2021-08-29
-
Closed date
2022-04-01
-
Last modified date
2022-05-03
-
APAR is sysrouted FROM one or more of the following:
-
APAR is sysrouted TO one or more of the following:
UI79981 UI79982
Modules/Macros
-
CSQJR103 CSQRSHUN
Fix information
-
Fixed component name
IBM MQ Z/OS V9
-
Fixed component ID
5655MQ900
Applicable component levels
Fix is available
-
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
07 June 2022