IBM Support

PI25131: WMQ Z/OS: AFTER PM93543, USERS OF PERMDYN QUEUES MAY EXPERIENCE A CORRUPTION IN PSID(00), CAUSING A LOOP A 14/09/05 PTF PECHANGE

A fix is available

APAR status

  • Closed as program error.

Error description

  • The initial problem report was high CPU. MSTR trace showed a
    loop between CSQP1GET and CSQP1REL in the SCAVNGOB (object
    scavenger) thread. PSWs from the systrace showed that CSQISCO2
    was driving the loop. The loop was caused by a corruption of the
    queue object page chain in PSID(00).
    
    The problem occurs when there is an MQCLOSE with
    MQCO_DELETE_PURGE of a permanent dynamic queue, which
    still has messages on it, at the same time as the object
    scavenger is running. A change made by PM93543 (PTF UI11858)
    means that a lock on the page in page set 0 which contains
    the queue object is released prematurely, with the result
    that the page can be deallocated twice.
    
    ABEND symptoms may include:
    
    - ABN=5C6-00C91600,U=SYSOPR  ,C=R3600.710.DMC -CSQIERS3,
      M=CSQGFRCV,LOC=CSQILPLM.CSQIERS3+00000F9E
    
      where 00C91600 means CSQI_OBJECT_ALREADY_EXISTS
    
    - ABN=5C6-00C90600,U=C000940 ,C=R3600.710.DMC
      -CSQIMGE6,M=CSQGFRCV,LOC=CSQILPLM.CSQIMGE6+000004C8
    
      ABN=5C6-00C90600,U=C000940 ,C=R3600.710.DMC
      -CSQILCHG,M=CSQGFRCV,LOC=CSQILPLM.CSQILCHG+0000051C
    
      ABN=5C6-00C90600,U=C000940 ,C=R3600.710.DMC
      -CSQILVAL,M=CSQGFRCV,LOC=CSQILPLM.CSQILVAL+0000038A
    
      where 00C90600 means CSQI_NO_RECORD_FOUND
    
    In the reported case, the queue manager was part of a
    queue-sharing group (QSG). The loop prevented MQ from processing
    other work, which led to the following messages:
    
    IXC431I GROUP CSQGSEP0 MEMBER QBPK JOB ssidMSTR ASID nnnn
     STALLED AT mm/dd/yyyy hh:mm:ss.ssssss ID: 0.1
     LAST MSGX: mm/dd/yyyy hh:mm:ss.ssssss  nn STALLED nnnn PENDINGQ
     LAST GRPX: mm/dd/yyyy hh:mm:ss.ssssss   n STALLED    n PENDINGQ
     LAST STAX:                              0 STALLED
    
    IXC430E SYSTEM ssss HAS STALLED XCF GROUP MEMBERS
    
    The inbound paths for the Coupling Facility (CF) were
    affected because messages were accumulating. XCF tried
    to restart the PATHINs to alleviate the problem:
    IXC467I RESTARTING PATHIN STRUCTURE IXC_PATH_S2 LIST 15
            USED TO COMMUNICATE WITH SYSTEM ssss
            RSN: START CONVERTED TO RESTART
    
    Eventually, users could not log on or issue system commands,
    and an IPL was required. On restart, the queue manager abended
    with ABN=5C6-00C91600. A zap from the change team was
    required to repair the chain and get the queue manager
    restarted.
    
    Additional Symptom(s) Search Keyword(s):
    ABEND5C6 ABENDS5C6 5C6 S5C6 S05C6 00C90600 00C91600
    performance loops looping PSID 00
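
The ABEND symptom lines quoted in the error description can be matched mechanically when searching a saved job log. The following is a minimal sketch in Python, not an IBM tool: the function name parse_abends and the field names are hypothetical, the line layout is assumed to match the samples above (including a symptom string wrapped across two lines), and only the two reason codes that this APAR names are mapped.

```python
import re

# Reason codes as named in this APAR; any other code is reported as unknown.
REASONS = {
    "00C91600": "CSQI_OBJECT_ALREADY_EXISTS",
    "00C90600": "CSQI_NO_RECORD_FOUND",
}

# Layout assumed from the samples in this document:
# ABN=<abend>-<reason>,U=<user>,C=<component> -<csect>,M=<module>,LOC=<location>
ABEND_RE = re.compile(
    r"ABN=(?P<abend>[0-9A-F]{3})-(?P<reason>[0-9A-F]{8}),\s*"
    r"U=(?P<user>[^,]*),\s*"
    r"C=(?P<component>[^\s,]+)\s*-(?P<csect>\w+),\s*"
    r"M=(?P<module>\w+),\s*"
    r"LOC=(?P<loc>[\w.+]+)"
)


def parse_abends(text):
    """Return one dict per ABN= symptom string found in text."""
    # Collapse line breaks and runs of blanks so wrapped entries still match.
    joined = " ".join(text.split())
    results = []
    for match in ABEND_RE.finditer(joined):
        fields = {key: value.strip() for key, value in match.groupdict().items()}
        fields["reason_name"] = REASONS.get(fields["reason"], "unknown")
        results.append(fields)
    return results


sample = """ABN=5C6-00C91600,U=SYSOPR  ,C=R3600.710.DMC -CSQIERS3,
  M=CSQGFRCV,LOC=CSQILPLM.CSQIERS3+00000F9E"""

for entry in parse_abends(sample):
    print(entry["abend"], entry["reason"], entry["reason_name"], entry["csect"])
```

Whitespace is collapsed before matching because the same symptom string can wrap at different points depending on where it was captured.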
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
    *                 Release 1 Modification 0.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: After applying UI11858, or its          *
    *                      superseding PTF UI13085, users may      *
    *                      experience a chain corruption in        *
    *                      PSID(0) if using permanent dynamic      *
    *                      queues. This can result in a loop       *
    *                      in an SRB in the queue manager.         *
    *                                                              *
    *                      Symptoms include one or more of         *
    *                      the following:                          *
    *                      - The queue manager issues abend        *
    *                        5C6-00C90600 in CSQIMGE6,             *
    *                        CSQILCHG and CSQILVAL                 *
    *                      - The queue manager issues abend        *
    *                        5C6-00C90B00 in CSQIMGE9              *
    *                      - The queue manager issues abend        *
    *                        5C6-00C91600 in CSQIERS3 during       *
    *                        startup and fails to start            *
    *                      - The queue manager is using high       *
    *                        CPU in load-module CSQILPLM,          *
    *                        CSECT CSQISCO2                        *
    *                      - The queue manager storage usage       *
    *                        increases over time                   *
    *                      - The LPAR responsiveness is reduced,   *
    *                        particularly on LPARs with few CPs    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    UI11858 (PM93543) has introduced changes to the processing of
    MQCLOSE API calls for permanent dynamic queues, where the
    MQCO_DELETE_PURGE option is specified. These changes release the
    lock on the page holding the queue definition prematurely,
    opening a small timing window where the same page could be
    deallocated twice.
    If the object scavenger then runs and tries to release pages on
    page set 0 while the chain is corrupt, it can go into a loop.
    This problem can also occur for a "DELETE QLOCAL() PURGE"
    command issued against a local or permanent dynamic queue.
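
To illustrate why deallocating the same page twice can turn a chain into a loop, here is a toy model in Python. It is a sketch under stated assumptions and does not reflect the actual page set 0 layout or the queue manager's locking code: the class ToyPageSet, its methods, and the page numbers are all hypothetical, and the two competing threads are modeled as a fixed sequence of calls rather than real concurrency.

```python
class ToyPageSet:
    """Toy model of a page set: freed pages are pushed onto a linked chain."""

    def __init__(self, allocated):
        self.allocated = set(allocated)  # pages currently in use
        self.next_free = {}              # page -> next page on the free chain
        self.head = None                 # first page on the free chain

    def deallocate_unchecked(self, page):
        # Models the faulty window: nothing stops a second caller from
        # freeing a page that is already on the chain.
        self.allocated.discard(page)
        self.next_free[page] = self.head
        self.head = page

    def deallocate_checked(self, page):
        # Models holding the lock until the delete completes: the ownership
        # check and the chain update happen as one step, so a second
        # attempt on the same page is rejected.
        if page not in self.allocated:
            return False
        self.allocated.remove(page)
        self.next_free[page] = self.head
        self.head = page
        return True

    def walk(self):
        """Follow the free chain; return (pages visited, cycle found)."""
        visited, seen, page = [], set(), self.head
        while page is not None:
            if page in seen:
                return visited, True
            seen.add(page)
            visited.append(page)
            page = self.next_free[page]
        return visited, False


# Faulty interleaving: two callers both free page 7.
bad = ToyPageSet(allocated={7, 9})
bad.deallocate_unchecked(7)   # first caller, e.g. the delete path
bad.deallocate_unchecked(9)
bad.deallocate_unchecked(7)   # second caller, same page again
print(bad.walk())

# Same sequence with the check in place: the second free is refused.
good = ToyPageSet(allocated={7, 9})
good.deallocate_checked(7)
good.deallocate_checked(9)
print(good.deallocate_checked(7))
print(good.walk())
```

In the faulty sequence page 7 ends up pointing at page 9, which already points back at page 7, so a walker with no cycle detection would never reach the end of the chain. That is the general shape of the loop described above, shown on a deliberately simplified structure.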
    

Problem conclusion

  • The code has been changed to release the page locks at the
    correct point in the delete processing, ensuring that the object
    page chain remains consistent.
    100Y
    CSQIDDEL
    CSQIDEL3
    CSQIMGE3
    CSQLRSAV
    CSQMHDRS
    CSQMRPUT
    CSQMSSUB
    CSQMSUB
    CSQMSUBI
    CSQMSUBV
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PI25131

  • Reported component name

    WMQ Z/OS V7

  • Reported component ID

    5655R3600

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    YesPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2014-09-04

  • Closed date

    2014-09-30

  • Last modified date

    2014-11-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    PI26011 UI21836

Modules/Macros

  • CSQIDDEL CSQIDEL3 CSQIMGE3 CSQLRSAV CSQMHDRS
    CSQMRPUT CSQMSSUB CSQMSUB  CSQMSUBI CSQMSUBV
    

Fix information

  • Fixed component name

    WMQ Z/OS V7

  • Fixed component ID

    5655R3600

Applicable component levels

  • R100 PSY UI21836

       UP14/10/17 P F410

Document Information

Modified date:
04 November 2014