IBM Support

PI14984: CQS STRUCTURE REBUILD FAILS WITH... CQS0034A CANNOT REBUILD (STRUCTURE NAME) FROM LOG

A fix is available

APAR status

  • Closed as program error.

Error description

  • CQS Message Queue Structure rebuild may fail with message:

       CQS0034A CANNOT REBUILD (structure name) FROM LOG

    The CQS structure checkpoint used to rebuild the structure was
    incomplete because of a problem that can allow CQSREAD and CQSDEL
    requests to be processed while a checkpoint is being taken.
    Changes to the contents of the CQS Message Queue Structure must
    not be allowed during checkpoint processing.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All V13 IMS users of shared queues and CQS.  *
    ****************************************************************
    * PROBLEM DESCRIPTION: An IMS message structure rebuild may    *
    *                      stop with a WTOR:                       *
    *                                                              *
    *                      CQS0034A CANNOT REBUILD                 *
    *                      STRUCTURE FFMSGQ_STR FROM LOG,          *
    *                      ENTER  ABEND OR CONTINUE.               *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    At the end of an IMS message structure rebuild, CQS initiates a
    structure checkpoint to take a snapshot of the structure by
    copying data from the structure to a recovery data set. This
    checkpoint requires structure activity to be quiesced; client
    request processing is not allowed during the checkpoint.
    
    Next, upon structure rebuild completion, z/OS sends ENF 35
    'structure available' notifications to connectors to re-
    establish their structure connection. The ENF 35 process runs
    asynchronously and also requires structure activity to be
    quiesced.
    
    A timing window exists during the coordination of structure
    serialization between the checkpoint and the ENF 35 processes.
    In the reported case, the ENF 35 process had exclusive control
    of the structure and had not yet released it and transferred
    control to the checkpoint process. The checkpoint process,
    however, assumed it had control and went ahead copying structure
    data to the recovery data set when, in fact, structure
    serialization had been lost and some client requests were able
    to run during the checkpoint. As a result, unexpected client
    request log records were logged during the checkpoint, where
    there should have been none.
    
    A subsequent IMS message structure rebuild reads the log records
    from the log that was built by the prior checkpoint and cannot
    locate the message represented by an incorrect log record. As a
    result, the rebuild stops with the WTOR message CQS0034A 'unable
    to rebuild from log'.
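    The failure sequence above can be illustrated with a toy replay
    model. This is a sketch under simplifying assumptions, not the
    real CQS log or structure format: the structure is a Python dict
    keyed by a hypothetical message ID, each log record is a ("PUT" or
    "DEL", message ID) tuple, and rebuild starts from the checkpoint
    snapshot and replays every record logged after the checkpoint
    began. The names rebuild_from_log and RebuildError are invented
    for illustration.

```python
class RebuildError(Exception):
    """Raised when a log record refers to a message the rebuild cannot locate."""


def rebuild_from_log(snapshot, log_records):
    """Replay log records on top of a checkpoint snapshot (toy model only)."""
    structure = dict(snapshot)  # start from the checkpointed contents
    for op, msg_id in log_records:
        if op == "PUT":
            structure[msg_id] = True
        elif op == "DEL":
            if msg_id not in structure:
                # Analogous to CQS0034A: this record cannot be applied.
                raise RebuildError(f"cannot locate message {msg_id}")
            del structure[msg_id]
        else:
            raise ValueError(f"unknown log operation {op!r}")
    return structure


# Correct quiesce: no client activity during the copy, so the snapshot
# contains M1 and the DEL record logged afterwards replays cleanly.
print(rebuild_from_log({"M1": True}, [("DEL", "M1")]))

# Lost serialization: a CQSDEL for M1 ran while the checkpoint was copying,
# so the snapshot lacks M1 but the DEL record is still replayed.
try:
    rebuild_from_log({}, [("DEL", "M1")])
except RebuildError as exc:
    print("rebuild stopped:", exc)
```

    The second case shows why a record logged during the checkpoint
    is harmful: the snapshot already reflects the delete, so replaying
    the record finds nothing to delete.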
    

Problem conclusion

  • GEN:
    KEYWORDS:
     SYSPLEXSQ
    
    *** END IMS KEYWORDS ***
    
    The IXLUSYNC service is changed, for checkpoint only, to check
    whether the ENF 35 process is holding the structure quiesce latch
    before allowing the checkpoint process to copy the structure data
    to the recovery data set. If it is, the service waits in
    intervals of a quarter of a second, for a maximum of five
    seconds, to give the ENF 35 process time to finish and transfer
    serialization to the checkpoint process. If the maximum wait
    time elapses and the ENF 35 process still has not released
    control, the checkpoint is aborted because its structure
    serialization cannot be obtained.
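    The bounded wait described above can be sketched as a polling
    loop. The function and parameter names are hypothetical, and the
    callback enf35_holds_latch stands in for whatever latch test CQS
    actually performs; only the quarter-second interval and the
    five-second limit come from this APAR. The sleep function is
    injectable so the logic can be exercised without real delays.

```python
import time


def wait_for_enf35_release(enf35_holds_latch, sleep=time.sleep,
                           interval=0.25, max_wait=5.0):
    """Poll until the ENF 35 processor releases the structure quiesce latch.

    Returns True if the latch was released within max_wait seconds, or
    False if the limit was reached (the checkpoint is then aborted).
    """
    max_polls = int(max_wait / interval)  # 20 waits of 0.25 s = 5 s
    for _ in range(max_polls):
        if not enf35_holds_latch():
            return True
        sleep(interval)
    # One final check after the last interval has elapsed.
    return not enf35_holds_latch()
```

    With the defaults, a latch that is never released produces 20
    quarter-second waits (five seconds in total) before the function
    reports failure and the caller abandons the checkpoint.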
    
    The checkpoint process is changed to verify that it holds the
    structure serialization after returning from the IXLUSYNC
    service module. If the checkpoint does not have control, the
    checkpoint is also aborted.
    
    When a structure checkpoint cannot be started because of the
    serialization issues above, the previous successful checkpoint
    remains valid as a rebuild starting point. A manual checkpoint
    can be issued to retry the checkpoint and establish a more
    current recovery point.
    
    The structure rebuild process is enhanced to issue a new
    informational message CQS0246I to show the structure recovery
    data set name that is in use for rebuild.
    
    
    * CQSSTE30 module - ENF 35 event processor
    In the get latch QUIESCST routine, after successfully obtaining
    the structure quiesce latch exclusively, set the new flag
    STRRQL30 indicating this ENF 35 event processor is the current
    owner of the latch.  The flag is reset when the latch is
    released or transferred to the IXLUSYNC processor.
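    A toy model of the STRRQL30 ownership flag lifecycle follows. It
    assumes, for illustration only, that the latch can be represented
    by an owner name and a boolean flag; the real latch header and ECB
    handling are not modeled, and the class and method names are
    hypothetical.

```python
class StructureQuiesceLatch:
    """Toy model of the structure quiesce latch plus the STRRQL30 flag."""

    def __init__(self):
        self.owner = None
        self.strrql30 = False  # True only while CQSSTE30 (ENF 35) owns the latch

    def obtain_exclusive_by_enf35(self):
        """CQSSTE30 gets the latch exclusively; the flag is set only on success."""
        if self.owner is not None:
            return False
        self.owner = "CQSSTE30"
        self.strrql30 = True
        return True

    def transfer_to_ixlusync(self):
        """Hand the latch to the IXLUSYNC processor (CQSSTE10); flag is reset."""
        if self.owner != "CQSSTE30":
            raise RuntimeError("only the ENF 35 processor can transfer the latch")
        self.owner = "CQSSTE10"
        self.strrql30 = False

    def release(self):
        """Release the latch; the flag is reset whoever the owner was."""
        self.owner = None
        self.strrql30 = False
```

    Keeping the flag tied to the obtain, transfer, and release steps
    means a waiter that tests it sees the flag set exactly while the
    ENF 35 processor is the owner.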
    
    
    * CQSSTE10 module - Structure IXLUSYNC event
    In the GETLATCH get-latch routine, if the caller requests the
    check and the ENF 35 event processor holds the structure quiesce
    latch exclusively, wait up to 5 seconds for that processor to
    release the latch. If the processor releases the latch before
    the maximum time expires, obtain the latch. If the processor
    still holds the latch when the maximum time expires, return to
    the caller with an 'unable to get latch' return code indicating
    that structure activity serialization was not obtained.
    
    
    * CQSCHK30 module - Structure checkpoint processor
    Above the CHK18000 label, on return from the call to CQSSTE10 to
    get the structure quiesce latch for checkpoint, a check is added
    to ensure that CQSSTE10 is the latch owner by testing whether the
    first word of the latch header contains the CQSSTE10 ECB address.
    The low-order bit of the address stored in the latch header is
    ignored in this comparison.
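    The ownership test reduces to a masked comparison of one fullword.
    In this sketch the latch header word and the ECB address are plain
    integers, the function name is hypothetical, and the address values
    in the test are made up. The mask clears only the low-order bit of
    the word taken from the latch header, as the text describes; the
    expected ECB address is assumed to be fullword-aligned, so its own
    low bit is already zero.

```python
LOW_BIT_MASK = 0xFFFFFFFE  # clears the low-order bit of a 32-bit word


def ste10_owns_latch(latch_word0: int, ste10_ecb_addr: int) -> bool:
    """Return True if word 0 of the latch header holds CQSSTE10's ECB address.

    The low-order bit of the word stored in the latch header is ignored.
    """
    return (latch_word0 & LOW_BIT_MASK) == ste10_ecb_addr
```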
    
    If CQSSTE10 does not own the structure quiesce latch, the
    checkpoint is aborted with an error message indicating checkpoint
    failure with reason code RCNOTOWN X'00F4', meaning the checkpoint
    was unable to quiesce the structure:
    
      CQS0222E CQS CQS1CQS  FAILED STRUCTURE CHECKPOINT FOR
      STRUCTURE IMSMSGQ01 RC=300000F4 CQS1CQS
    
    Checkpoint cannot continue if the structure activity
    serialization is not obtained.
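    The abort path and its effect on the recovery point can be modeled
    as follows. This is a hypothetical sketch: copy_structure stands
    in for the real copy to the structure recovery data set, recovery
    points are opaque labels, and only the return code value
    X'300000F4' comes from the CQS0222E example above.

```python
RC_OK = 0
RC_NOTOWN = 0x300000F4  # RC shown in the CQS0222E example (reason X'00F4')


def structure_checkpoint(owns_latch, copy_structure, previous_point):
    """Attempt one structure checkpoint (toy model).

    Returns (recovery_point, rc). Without structure serialization the copy
    is never started, so the previous successful checkpoint stays the
    rebuild starting point.
    """
    if not owns_latch:
        return previous_point, RC_NOTOWN
    return copy_structure(), RC_OK


point, rc = structure_checkpoint(False, lambda: "CHKPT-2", "CHKPT-1")
print(f"recovery point {point}, RC={rc:08X}")
```

    A later manual checkpoint that does obtain the latch would return
    the new recovery point with RC 0, which is why retrying the
    checkpoint manually is sufficient after this failure.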
    
    
    * CQSSTR00 module - Structure recovery main processor
    In DSSELECT subroutine, issue message CQS0246I to show the
    structure recovery data set name in use for rebuild.
    
    
    * CQSM1ENU module - CQS Message Table 1
    Defined new informational message skeleton CQS0246I to display
    the structure recovery data set name in use for rebuild.
      CQS0246I CQS SRDS READ STARTED, DSN=srds_data_set_name
    
    
    * CQSBPAW0 module - CQS AWE server definitions
    Updated the comment for the STE1 AWE server to indicate that this
    server must remain a single-threaded server because of the latch
    ownership detection code in CQSCHK30.
    
    
    * CQSFTRC0 module - CQS Trace Entry Dump Formatter
    Changed to format new trace subcodes added in CQSTRTCH and
    CQSTRSTE macros.
    
    
    * CQSFM020 - CQS Structure Block Dump Formatter
    Reassembled to pick up the new STRRQL30, STRCNT10, and STRCNT20
    definitions in the CQSSTRUC macro.
    
    
    * CQSSTRUC macro - Structure control block
    Defined new bit STRRQL30 in the structure status (second) byte
    STRSTATB of the rebuild word STRRBLWD to indicate that the ENF 35
    event processor owns the latch.

    Defined new field STRCNT10 indicating the number of times
    CQSSTE10 waited for CQSSTE30 to release the structure quiesce
    latch.

    Defined new field STRCNT20 indicating the number of BPETIMER
    cycles taken while waiting for the latch.
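    The placement of the new flag can be illustrated with bit
    arithmetic on a big-endian fullword, where the second byte in
    storage occupies bits 16 to 23 of the integer value. The mask
    value for STRRQL30 is not given in this APAR; 0x40 below is a
    hypothetical placeholder, as are the helper names.

```python
STRRQL30 = 0x40  # HYPOTHETICAL bit value; the real mask is not stated here


def strstatb(strrblwd: int) -> int:
    """Extract STRSTATB, the second byte of the 4-byte rebuild word STRRBLWD."""
    return (strrblwd >> 16) & 0xFF


def set_strrql30(strrblwd: int) -> int:
    """Turn on the (placeholder) STRRQL30 bit inside STRSTATB."""
    return strrblwd | (STRRQL30 << 16)


def reset_strrql30(strrblwd: int) -> int:
    """Turn off the (placeholder) STRRQL30 bit, keeping a 32-bit result."""
    return strrblwd & ~(STRRQL30 << 16) & 0xFFFFFFFF
```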
    
    
    * CQSTRTCH macro - Structure checkpoint trace records
    Added new trace subcode:
      TRTCNOOW indicating CQSSTE10 does not own the latch as expected.
    
    
    * CQSTRSTE macro - CQS structure event trace records
    Added new trace subcodes:
      TRSEBTIM indicating a BPE timer error,
      TRSESTE3 indicating CQSSTE30 holds the latch too long.
    
    
    * DOCUMENTATION CHANGE
    
    IMS Messages and codes, Volume 2 - Non-DFS Messages
    Chapter 3. CQS messages (Common Queue Server)
    
    1-
    Under the existing message CQS0222E, after the existing return
    code X'300000F0', add this new return code description:
    
    X'300000F4'
    The CQS structure checkpoint was unable to quiesce the
    structure.  An internal serialization error occurred, and
    the structure was not correctly quiesced when the CQS
    checkpoint process began to copy the structure checkpoint
    data. Issue a structure checkpoint manually to attempt
    another structure checkpoint.
    
    2-
    After the existing message CQS0245E, add a new informational
    message:
    
    CQS0246I
     CQS SRDS READ STARTED, DSN=structure recovery data set name
    
    Explanation
    This message indicates that CQS is reading the more current of
    the structure recovery data sets for structure rebuild. If the
    structure rebuild fails, the structure might be recoverable by
    renaming the structure recovery data set named in the DSN= field
    of the CQS0246I message and initiating a new structure rebuild.
    This rebuild uses the other, older structure recovery data set
    and in some cases can complete successfully. Ensure that the
    renamed SRDS is renamed back to its original name at the end of
    the structure rebuild.
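    The recovery procedure relies on CQS choosing the more current
    structure recovery data set, so renaming that data set leaves only
    the older one to choose. Below is a toy model of that selection,
    assuming each SRDS can be summarized by the time of the checkpoint
    it holds; the data set names and integer timestamps are
    hypothetical, and the real DSSELECT logic in CQSSTR00 is not
    described here.

```python
def select_srds(srds_checkpoints):
    """Pick the more current SRDS for rebuild (toy model).

    srds_checkpoints maps each SRDS name CQS can find to the time of the
    structure checkpoint it holds; a data set that was renamed away is
    simply absent from the mapping.
    """
    if not srds_checkpoints:
        raise LookupError("no structure recovery data set available")
    return max(srds_checkpoints, key=srds_checkpoints.get)


both = {"CQS.SRDS1": 200, "CQS.SRDS2": 100}
print(select_srds(both))                 # the newer checkpoint is used
print(select_srds({"CQS.SRDS2": 100}))   # after renaming SRDS1 away
```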
    
    structure recovery data set name
     The name of the structure recovery data set that is currently
     used for structure rebuild.
    
    System action
     CQS processing continues.
    
    Module
     CQSSTR00
    
    3-
    In the 'System programmer response' section of the message
    CQS0034A, add this new paragraph:
    
    If the rebuild of a shared queues structure failed for a reason
    other than lost or damaged log data, the structure might be
    recoverable by renaming the structure recovery data set named in
    the DSN= field of the CQS0246I message and initiating a new
    structure rebuild. This rebuild uses the other, older structure
    recovery data set and in some cases can complete successfully.
    Ensure that the renamed SRDS is renamed back to its original name
    at the end of the structure rebuild.
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PI14984

  • Reported component name

    IMS V13

  • Reported component ID

    5635A0400

  • Reported release

    300

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2014-04-01

  • Closed date

    2015-05-29

  • Last modified date

    2015-07-01

  • APAR is sysrouted FROM one or more of the following:

    PI14961

  • APAR is sysrouted TO one or more of the following:

    UI28066

Modules/Macros

  • CQSBPAW0 CQSCHK30 CQSFM020 CQSFTRC0 CQSM1ENU CQSSTE10 CQSSTE30
    CQSSTR00 CQSTRSTE CQSTRTCH
    

Publications Referenced
GC18971310    

Fix information

  • Fixed component name

    IMS V13

  • Fixed component ID

    5635A0400

Applicable component levels

  • R300 PSY UI28066

       UP15/06/05 P F506

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
14 December 2020