A fix is available
APAR status
Closed as program error.
Error description
The customer reports that after the queue manager suffered abend S6C6-00D96021, it could not be restarted normally and had to be cold started. The log contains the following messages:
CSQJ113E +CSQ1 RBA 850256B99000 NOT IN ANY ACTIVE OR ARCHIVE LOG DATA SET, CONNECTION-ID=CSQ1 THREAD-XREF=003.RCRSC
IEA794I SVC DUMP HAS CAPTURED: DUMPID=057 REQUESTED BY JOB (CSQ1MSTR) DUMP TITLE=CSQ1,ABN=5C6-00D1032A,U=SYSOPR ,C=R3600.710.RLMC-CSQJLGR ,M=CSQJRE01,LOC=CSQJL002.CSQJR003+00000ACA
...
CSQR001I +CSQ1 RESTART INITIATED
CSQR003I +CSQ1 RESTART - PRIOR CHECKPOINT RBA=850256B99227
ACTIVE OR ARCHIVE LOG DATA SET, CONNECTION-ID=CSQ1 THREAD-XREF=003.RCRSC 02
...
IEA794I SVC DUMP HAS CAPTURED: DUMPID=057 REQUESTED BY JOB (CSQ1MSTR) DUMP TITLE=CSQ1,ABN=5C6-00D1032A,U=SYSOPR ,C=R3600.710.RLMC-CSQJLGR ,M=CSQJRE01,LOC=CSQJL002.CSQJR003+00000ACA
...
*CSQV086E +CSQ1 QUEUE MANAGER ABNORMAL TERMINATION REASON=00D96021
...
IEF450I CSQ1MSTR CSQ1MSTR - ABEND=S6C6 U0000 REASON=00D96021 442
TIME=01.34.17

Long-running deferred write processing (DWP) prevented checkpoint processing from completing before the log containing the last logged checkpoint was reused. This does not necessarily cause a problem while the queue manager continues to run, but it prevents the queue manager from restarting after an abnormal termination, as happened here following storage exhaustion.

This APAR changes deferred write processing so that checkpoint processing can proceed even when DWP is already active, preventing the situation the customer encountered, in which an abnormally terminated queue manager had to be cold started.

Additional Symptom(s) Search Keyword(s): CSQY222E short of storage SOS
Local fix
n/a
Problem summary
USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 Release 1 Modification 0.

PROBLEM DESCRIPTION: Checkpointing is delayed if a buffer pool is short of stealable buffers and Deferred Write Processing (DWP) is busy trying to write out buffers to relieve the shortage.

If checkpointing is delayed until the log containing the last checkpoint record has been reused, and log offloading is not active, the queue manager cannot recover from abnormal termination until checkpoint processing is able to continue.

If abnormal termination occurs during this window, attempts to restart the queue manager fail with error:
CSQJ113E RBA xxxxxxxxxxxx not in any active or archive log data set
Queue manager startup then fails with abend 5C6-00D1032A, and CSQV086E reports abnormal termination S6C6 with REASON=00D96021.

RECOMMENDATION:

During checkpoint processing, CSQPBCKW queues a request to the deferred write processor (DWP) to flush any old buffers requiring I/O to the page set. Later in checkpoint processing, CSQPECKW waits until DWP notifies it that the queued request has completed. If the buffer pools are under stress, DWP could already be active, writing buffers to the page set until more than 25% of the buffers are available for stealing. However, applications can continue to steal these buffers and add them back to the deferred write queue while this processing is in progress.
This can delay DWP until it reaches the 25% threshold and checks for subsequent requests, which in turn delays flushing the old buffers and resuming checkpoint processing. During this delay, log switches continue to occur under persistent workload, and the log records for the previous checkpoint may no longer be available on the active logs. If offloading is not active, the queue manager is temporarily unable to tolerate abnormal termination, because no checkpoint record is available in the logs. Normally, DWP eventually completes the checkpoint request, checkpoint processing resumes, and a new checkpoint record is written, ending this exposure. However, if the queue manager terminates abnormally before this happens, it cannot be restarted.
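The exposure window can be illustrated with a small simulation. This is a hypothetical Python sketch, not IBM code: it assumes the active logs are reused strictly round-robin, that a full active log is archived before reuse whenever offloading is active, and that restart needs only the log data set holding the last checkpoint record (a real restart may also need older records for units of recovery that were in flight). The functions `checkpoint_readable` and `unsafe_ticks` and all of their parameters are invented for illustration.

```python
def checkpoint_readable(active_log_count: int, switches_since_checkpoint: int,
                        offload_active: bool) -> bool:
    """Simplified model: can restart still find the last checkpoint record?"""
    if offload_active:
        # Assumption: a full active log is archived before it is reused,
        # so the checkpoint record survives in an archive log data set.
        return True
    # Without offload, active logs are reused round-robin: after
    # `active_log_count` switches, the data set holding the checkpoint
    # is overwritten.
    return switches_since_checkpoint < active_log_count


def unsafe_ticks(active_log_count: int, switches_per_tick: int,
                 checkpoint_delay_ticks: int, offload_active: bool) -> list[int]:
    """Ticks during which an abnormal termination would leave the queue
    manager unable to restart, assuming the next checkpoint is held up for
    `checkpoint_delay_ticks` ticks of persistent workload."""
    unsafe = []
    switches = 0
    for tick in range(1, checkpoint_delay_ticks + 1):
        switches += switches_per_tick  # persistent workload keeps filling the logs
        if not checkpoint_readable(active_log_count, switches, offload_active):
            unsafe.append(tick)
    return unsafe


if __name__ == "__main__":
    # Three active logs, one log switch per tick, checkpoint delayed for five ticks.
    print(unsafe_ticks(3, 1, 5, offload_active=False))  # [3, 4, 5]
    print(unsafe_ticks(3, 1, 5, offload_active=True))   # []
```

In this model the window opens once the number of log switches since the last checkpoint reaches the number of active logs, and it stays open until a new checkpoint is written; with offloading active the window never opens.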
Problem conclusion
Deferred write processing is changed to check whether a checkpoint request has been queued and, if so, to interrupt normal deferred write processing while that request is processed. This prevents long-running deferred write processing from causing extended delays to checkpoint processing, which can leave the queue manager unable to tolerate abnormal termination.
100Y CSQP1DWP CSQP2DWP
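The effect of the change can be sketched as a toy model of the DWP loop. This is a hypothetical Python illustration, not the logic of CSQP1DWP or CSQP2DWP: the function `dwp_service_pass`, its parameters, and the per-pass buffer accounting are assumptions. It compares the behavior described in the problem summary, where a queued checkpoint request is picked up only after more than 25% of the buffers are stealable, with the fixed behavior, where DWP checks for a checkpoint request on every pass.

```python
from collections import deque
from typing import Optional


def dwp_service_pass(pool_size: int, stealable: int, writes_per_pass: int,
                     redirtied_per_pass: int, checkpoint_queued_at: int,
                     check_each_pass: bool, max_passes: int = 1000) -> Optional[int]:
    """Return the DWP pass on which a queued checkpoint request is serviced,
    or None if it is still waiting after `max_passes` passes.

    check_each_pass=False: queued requests are looked at only once the
    shortage is relieved (behavior before the fix, as modelled here).
    check_each_pass=True: DWP checks for a checkpoint request on every pass
    and interrupts its normal work to service it (fixed behavior, as modelled).
    """
    threshold = pool_size // 4  # keep writing until MORE than 25% are stealable
    requests = deque()
    for pass_no in range(1, max_passes + 1):
        if pass_no == checkpoint_queued_at:
            requests.append("checkpoint")  # stands in for the checkpoint's flush request
        # One pass of deferred writes frees some buffers, while applications
        # concurrently steal buffers and put them back on the deferred write queue.
        stealable = max(0, min(pool_size, stealable + writes_per_pass) - redirtied_per_pass)
        shortage_relieved = stealable > threshold
        if requests and (check_each_pass or shortage_relieved):
            requests.popleft()  # flush old buffers, then notify the waiting checkpoint
            return pass_no
    return None


if __name__ == "__main__":
    args = dict(pool_size=100, stealable=0, writes_per_pass=5,
                redirtied_per_pass=4, checkpoint_queued_at=2)
    print(dwp_service_pass(**args, check_each_pass=False))  # 26
    print(dwp_service_pass(**args, check_each_pass=True))   # 2
```

When applications re-dirty buffers as fast as DWP writes them (for example `writes_per_pass=5, redirtied_per_pass=5`), the unfixed model returns `None` because the 25% threshold is never reached, while the fixed model still services the request on the pass in which it is queued.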
Temporary fix
HIPER
Comments
APAR Information
APAR number
PI52553
Reported component name
WMQ Z/OS V7
Reported component ID
5655R3600
Reported release
100
Status
CLOSED PER
PE
No
HIPER
Yes
Special Attention
No
Submitted date
2015-11-16
Closed date
2015-12-11
Last modified date
2016-02-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI33713
Modules/Macros
CSQP1DWP CSQP2DWP
Fix information
Fixed component name
WMQ Z/OS V7
Fixed component ID
5655R3600
Applicable component levels
R100 PSY UI33713
UP16/01/08 P F601 ¢
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
01 February 2016