PH40206: QUEUE MANAGER APPEARS TO BE IN A HUNG STATE / ALLIED ADDRESS SPACES HANG WHEN TERMINATED / MQ LATCHES FOR SHUNTING HELD

A fix is available

APAR status

Closed as program error.

Error description

Queue manager appears to be in a hung state.
STOP QMGR MODE(FORCE) and STOP CHINIT commands were accepted by
the queue manager but not executed.

The performance problem is a result of very inefficient I/O
when shunting a UR from archive logs located on tape.

The problem will occur when:

- A long-running unit of work needs to be shunted.

- Part of the log range required for shunting is only available
in an archive log.

- The archive log is a tape.


In these circumstances, the shunt processing will result in a
very inefficient pattern of reading data from the archive log.
Every time we encounter a log record that spans a block
boundary, we need to re-read a block we have already passed.
This is done by reopening the archive log and reading from the
end back to the block we're interested in. For a large log,
this means that the queue manager will repeatedly read the end
portion of the log as it has to keep traversing back to the
current point.

In testing by the IBM MQ Development z/OS Service team, a unit
of work containing 300 puts of messages with 100,000 bytes of
data each was used.

Shunt processing for this unit of work took over 2 minutes,
with continuous I/O.

The nature of the issue means that as the size of the data
being read increases, the workload increases in two ways:

- For each additional block we need to process, we have to
restart reading the log another time.

- As processing moves further from the end of the log, each
reread has to move through more data to get back to the point
we were processing at.


From this, we would expect the processing time to increase with
the square of the size of the data range. In the customer's
case, there is a much larger log range to be processed, which
would explain why the shunting was still going after hours of
processing.
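
The quadratic growth described above can be illustrated with a
small sketch. It is a simplified model, not the queue manager's
actual logic: it assumes every block in the range forces a
re-read, and that the k-th re-read walks back over k blocks from
the end of the log. The function name reread_cost is
hypothetical.

```python
def reread_cost(blocks: int) -> int:
    """Total blocks traversed under the simplified model.

    Assumption: each of the `blocks` blocks triggers one re-read,
    and the k-th re-read starts at the end of the log and walks
    back over k blocks to reach the current position.
    """
    total = 0
    for distance_from_end in range(1, blocks + 1):
        total += distance_from_end
    return total


if __name__ == "__main__":
    # Doubling the range should roughly quadruple the work.
    for n in (100, 200, 400):
        print(n, reread_cost(n))
```

In this model the total is n(n+1)/2, so doubling the number of
blocks roughly quadruples the blocks traversed.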
.
Additional keywords:
latch

Local fix

The problem can be avoided by ensuring that the shunt task
never needs to read from an archive log on tape. There are
several ways this could be achieved:

- Add extra active logs. Shunting will be triggered after a UR
has been active for three checkpoints. At the latest, this
means that it will start at the end of the third active log
after the UR started. With 4 active logs, that means that there
is a risk that the 4th log fills before shunting has completed
and the first log gets reused. If you add more logs, this gives
more head-room for shunting to finish before that happens.

- Change the archive log configuration to use DASD, rather than
tape. This would avoid the scenario completely, as different
read logic is used for accessing data from DASD datasets.

- Reduce the LOGLOAD setting to make checkpointing occur more
frequently. This option may be easier to implement in the
short-term than the other two. If LOGLOAD is lowered so that MQ
is taking checkpoints part-way through an active log, rather
than just at the end of the log, that would mean that
long-running URs would be shunted earlier. While this may
result in more instances of shunting, it would significantly
reduce the risk of a shunt process requiring data from the
archive logs, as the shunt would start while the UR only covers
1 or 2 logs.
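
The effect of checkpoint frequency on head-room can be sketched
as follows. This is an illustrative model only: it assumes
checkpoints are evenly spaced within each active log, that the
last checkpoint in a log coincides with the end of that log, and
that a UR starts just after a checkpoint. The
three-checkpoint trigger comes from the text above; the function
name and the even-spacing assumption are hypothetical.

```python
SHUNT_AFTER_CHECKPOINTS = 3  # trigger described in the text above


def worst_case_logs_spanned(checkpoints_per_log: int) -> int:
    """Worst-case number of active logs a UR covers when shunting starts.

    Assumption: checkpoints are evenly spaced, with the last one in
    each log falling at the end of the log.
    """
    if checkpoints_per_log < 1:
        raise ValueError("checkpoints_per_log must be at least 1")
    worst = 0
    for start in range(checkpoints_per_log):
        # The UR starts just after checkpoint number `start` in the first log.
        third_checkpoint = start + SHUNT_AFTER_CHECKPOINTS
        # Integer ceiling: the log in which the third checkpoint falls.
        logs = -(-third_checkpoint // checkpoints_per_log)
        worst = max(worst, logs)
    return worst


if __name__ == "__main__":
    for per_log in (1, 2, 4):
        print(per_log, worst_case_logs_spanned(per_log))
```

Under these assumptions, one checkpoint per log gives a worst
case of three logs, while two or more checkpoints per log give a
worst case of two, consistent with the "1 or 2 logs" estimate.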

Problem summary

****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 1 Modification 0 and Release 2       *
*                 Modification 0.                              *
****************************************************************
* PROBLEM DESCRIPTION: Very busy QMGRs which offload active    *
*                      logs to tape data sets may experience   *
*                      performance problems when using low     *
*                      numbers of active logs.                 *
*                                                              *
*                      This may manifest as delays to          *
*                      recoverable actions, in addition to     *
*                      large amounts of I/O to archive log     *
*                      data sets.                              *
****************************************************************
Log data from long running units of work (UOW) which span
multiple log data sets will be shunted forward into more recent
logs. Depending on the amount of shunting required, and the
number of active log data sets, the shunting task may be
required to read from archive log data sets.

The processing used to read from tape archive log data sets is
extremely inefficient when reading large log records, and
results in the tape being rewound many times.

Problem conclusion

The logic for shunting log data from tape archive log data sets
has been improved. This improves the performance of reading
large log records from tape archive logs in certain scenarios.

Temporary fix

Comments

APAR Information

APAR number

PH40206
Reported component name

IBM MQ Z/OS V9
Reported component ID

5655MQ900
Reported release

100
Status

CLOSED PER
PE

NoPE
HIPER

YesHIPER
Special Attention

NoSpecatt / Xsystem
Submitted date

2021-08-29
Closed date

2022-04-01
Last modified date

2022-05-03

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

UI79981 UI79982

Modules/Macros

```
CSQJR103 CSQRSHUN
```

Fix information

Fixed component name

IBM MQ Z/OS V9
Fixed component ID

5655MQ900

Applicable component levels

R100 PSY UI79982

UP22/04/15 P F204
R200 PSY UI79981

UP22/04/13 P F204

Fix is available

Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
07 June 2022
