IBM Support

PI52553: WMQ Z/OS V710:QMGR ABNORMAL TERMINATION DUE TO DEFERRED WRITE PROCESSING.

A fix is available

APAR status

  • Closed as program error.

Error description

  • After the queue manager abends with S6C6-00D96021, it cannot be
    restarted normally and the customer has to perform a cold
    start.
    
    The error message in the log:
    CSQJ113E +CSQ1 RBA 850256B99000 NOT IN ANY ACTIVE OR
    ARCHIVE LOG DATA SET, CONNECTION-ID=CSQ1
    THREAD-XREF=003.RCRSC
    IEA794I SVC DUMP HAS CAPTURED:
    DUMPID=057 REQUESTED BY JOB (CSQ1MSTR)
    DUMP TITLE=CSQ1,ABN=5C6-00D1032A,U=SYSOPR  ,
    C=R3600.710.RLMC-CSQJLGR
    ,M=CSQJRE01,LOC=CSQJL002.CSQJR003+00000ACA
    ...
    CSQR001I +CSQ1 RESTART INITIATED
    CSQR003I +CSQ1 RESTART - PRIOR CHECKPOINT
     RBA=850256B99227
    ACTIVE OR ARCHIVE LOG DATA SET, CONNECTION-ID=CSQ1
    THREAD-XREF=003.RCRSC 02
    ...
    IEA794I SVC DUMP HAS CAPTURED:
    DUMPID=057 REQUESTED BY JOB (CSQ1MSTR)
    DUMP TITLE=CSQ1,ABN=5C6-00D1032A,U=SYSOPR
     ,C=R3600.710.RLMC-CSQJLGR ,M=CSQJRE01,LOC=
        CSQJL002.CSQJR003+00000ACA
    
    ...
    *CSQV086E +CSQ1    QUEUE MANAGER ABNORMAL
     TERMINATION REASON=00D96021
    ...
     IEF450I CSQ1MSTR CSQ1MSTR - ABEND=S6C6
    U0000 REASON=00D96021  442
          TIME=01.34.17
    
    Long-running deferred write processing (DWP) delayed checkpoint
    processing, so the checkpoint could not complete before the log
    containing the last logged checkpoint was reused.
    This does not necessarily cause a problem while the system
    continues to run, but it prevents the queue manager from
    restarting after it terminates abnormally, as happened here
    following the storage exhaustion.
    
    The fix is to change deferred write processing so that
    checkpoint processing can still proceed even when DWP is already
    active or busy. This prevents the situation the customer
    encountered, where an abnormally terminated queue manager had to
    be cold started.
    
    
    Additional Symptom(s) Search Keyword(s):
    CSQY222E short of storage SOS
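
The symptoms listed above can be checked for in a saved copy of the job log. The following is a minimal sketch, not an IBM-supplied tool: the function name `scan_joblog` and the idea of scanning a text copy of the log are assumptions for illustration, and the message IDs, reason codes, and 12-digit RBA format are taken only from the excerpt in this APAR.

```python
import re

# Message IDs and reason codes quoted in this APAR's error description.
SYMPTOMS = {
    "CSQJ113E": "RBA not in any active or archive log data set",
    "CSQV086E": "queue manager abnormal termination",
    "00D1032A": "abend reason code shown in the SVC dump title",
    "00D96021": "termination reason code shown with CSQV086E",
}

# CSQJ113E reports the RBA that could not be found. The excerpt shows it as
# 12 hex digits, optionally preceded by the command prefix (for example +CSQ1).
RBA_PATTERN = re.compile(r"CSQJ113E\s+(?:\S+\s+)?RBA\s+([0-9A-F]{12})")


def scan_joblog(text):
    """Return (sorted symptom keys found, RBAs reported by CSQJ113E)."""
    found = sorted(key for key in SYMPTOMS if key in text)
    rbas = RBA_PATTERN.findall(text)
    return found, rbas
```

A match on these strings only indicates that the log resembles the one in this APAR; it does not by itself confirm the same root cause.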
    

Local fix

  • n/a
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
    *                 Release 1 Modification 0.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: Checkpointing is delayed if a           *
    *                      bufferpool is short of stealable        *
    *                      buffers, and Deferred Write Processing  *
    *                      (DWP) is busy trying to write out       *
    *                      buffers to relieve the shortage.        *
    *                                                              *
    *                      If the delay results in checkpointing   *
    *                      being delayed until the log containing  *
    *                      the last checkpoint record has been     *
    *                      reused, and log offloading is not       *
    *                      active, the queue manager will not be   *
    *                      able to recover from abnormal           *
    *                      termination until the checkpoint        *
    *                      processing is able to continue.         *
    *                                                              *
    *                      If abnormal termination occurs during   *
    *                      this window, attempts to restart the    *
    *                      queue manager will fail with error:     *
    *                      "                                       *
    *                      CSQJ113E RBA xxxxxxxxxxxx not in any    *
    *                      active or archive log data set          *
    *                      "                                       *
    *                      and queue manager startup will          *
    *                      fail with abend 5C6-00D1032A, and       *
    *                      CSQV086E will report abnormal           *
    *                      termination S6C6 with REASON=00D96021   *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    During checkpoint processing, CSQPBCKW queues a request to the
    deferred write processor (DWP) to flush any old buffers
    requiring I/O to the pageset. Later in checkpoint processing,
    CSQPECKW waits until DWP notifies it that the queued request has
    completed.
    If the bufferpools are under stress, DWP could already be active
    and writing buffers to the pageset until more than 25% of the
    buffers are available for stealing. However, applications can
    continue to steal these buffers and add them back to the
    deferred write queue while this processing is occurring.
    DWP checks for subsequent requests only after it reaches the
    25% threshold, so reaching it slowly delays flushing the old
    buffers and resuming checkpoint processing.
    
    During this delay, log switches will continue to occur for
    persistent workload, and it is possible that the log records
    for the previous checkpoint are no longer available on the
    active logs.
    If offloading is not active, this leaves the queue manager
    temporarily unable to tolerate abnormal termination, because no
    checkpoint is available in the logs.
    Normally, DWP completes the checkpoint request, checkpoint
    processing resumes, and a new checkpoint record is written,
    ending this situation. However, if the queue manager
    terminates abnormally before this occurs, it
    cannot be restarted.
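
The exposure window described above can be illustrated with a simplified model of the active logs as a ring of equally sized data sets. This is a sketch under stated assumptions, not how the queue manager tracks its logs: the function name, the equal-size and RBA-0-aligned data sets, and the treatment of offloading as "older records stay available" are all hypothetical simplifications.

```python
def checkpoint_recoverable(checkpoint_rba, current_rba, active_log_count,
                           log_size, offloading_active):
    """Return True if the last checkpoint record should still be readable.

    Assumptions (hypothetical): active logs form a ring of
    active_log_count data sets, each covering log_size bytes of RBA
    range, aligned from RBA 0.
    """
    if offloading_active:
        # Assumption: offloaded copies keep older log records available.
        return True
    # Index of the data set currently being written.
    current_index = current_rba // log_size
    # The ring still holds the current data set and the ones before it
    # that have not yet been reused.
    oldest_index = max(0, current_index - (active_log_count - 1))
    oldest_retained_rba = oldest_index * log_size
    return checkpoint_rba >= oldest_retained_rba
```

For example, with three active logs and no offloading, a checkpoint record is treated as lost once logging has advanced three full data sets past the data set that holds it.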
    

Problem conclusion

  • Deferred write processing is changed to check whether a
    checkpoint request has been queued, and if so, to interrupt
    normal deferred write processing while this request is
    processed. This prevents long-running deferred write processing
    from causing the extended delays to checkpoint processing that
    can lead to the inability to tolerate abnormal termination.
    100Y
    CSQP1DWP
    CSQP2DWP
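
The effect of the change can be sketched with a toy model that counts how many DWP passes elapse before a queued checkpoint request is processed. This is not the code in CSQP1DWP or CSQP2DWP; the function, its parameters, and the per-pass accounting are hypothetical, and only the 25% stealable-buffer threshold and the "check for a queued checkpoint request" behaviour come from this APAR.

```python
def passes_before_checkpoint(pool_size, stealable, written_per_pass,
                             restolen_per_pass, interrupt_for_checkpoint,
                             max_passes=10_000):
    """Count DWP passes until a queued checkpoint request is processed.

    Returns None if the request is still waiting after max_passes.
    """
    threshold = 0.25 * pool_size
    for n in range(1, max_passes + 1):
        if interrupt_for_checkpoint:
            # Changed behaviour: DWP notices the queued checkpoint
            # request and processes it before continuing normal work.
            return n
        # Original behaviour: keep writing buffers while applications
        # steal them and put them back on the deferred write queue.
        stealable += written_per_pass - restolen_per_pass
        stealable = min(pool_size, max(0, stealable))
        if stealable > threshold:
            # Threshold reached, so DWP checks for other requests.
            return n
    return None
```

With a pool of 1000 buffers, 50 stealable, 20 written and 18 re-stolen per pass, the model needs 101 passes before the request is seen without the interrupt and 1 pass with it. If applications re-steal buffers as fast as DWP writes them, the unmodified loop never reaches the threshold in this model.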
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PI52553

  • Reported component name

    WMQ Z/OS V7

  • Reported component ID

    5655R3600

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-11-16

  • Closed date

    2015-12-11

  • Last modified date

    2016-02-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UI33713

Modules/Macros

  • CSQP1DWP CSQP2DWP
    

Fix information

  • Fixed component name

    WMQ Z/OS V7

  • Fixed component ID

    5655R3600

Applicable component levels

  • R100 PSY UI33713

       UP16/01/08 P F601

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
01 February 2016