A fix is available
APAR status
Closed as program error.
Error description
A CQS message queue structure rebuild may fail with the message: CQS0034A CANNOT REBUILD (structure name) FROM LOG. The CQS structure checkpoint used to rebuild the structure was incomplete because of a problem that can allow CQSREAD and CQSDEL functions to process while a checkpoint is being taken. Changes to the contents of the CQS message queue structure must not be allowed during checkpoint processing.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All V13 IMS users of shared queues and CQS.  *
****************************************************************
* PROBLEM DESCRIPTION: An IMS message structure rebuild may    *
*                      stop with a WTOR:                       *
*                                                              *
*                      CQS0034A CANNOT REBUILD                 *
*                      STRUCTURE FFMSGQ_STR FROM LOG,          *
*                      ENTER ABEND OR CONTINUE.                *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************
At the end of an IMS message structure rebuild, CQS initiates a structure checkpoint to take a snapshot of the structure by copying data from the structure to a recovery data set. This checkpoint requires structure activity to be quiesced. Client request processing is not allowed during the checkpoint.
Next, upon structure rebuild completion, z/OS sends ENF 35 'structure available' notifications to connectors so that they re-establish their structure connection. The ENF 35 process runs asynchronously and also requires structure activity to be quiesced.
A timing window exists in the coordination of structure serialization between the checkpoint and the ENF 35 processes. In the reported case, the ENF 35 process has exclusive control of the structure and has not yet released and transferred control to the checkpoint process. However, the checkpoint process assumes that it has control and goes ahead copying structure data to the recovery data set, when in fact the structure serialization has been lost and some client requests manage to get in during the checkpoint. As a result, unexpected client request log records are logged during the checkpoint, where there should be none.
A subsequent IMS message structure rebuild reads the log records from the log that was built by the prior checkpoint and cannot locate a message that is represented by an incorrect log record. As a result, this rebuild stops with the WTOR message CQS0034A 'unable to rebuild from log'.
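The timing window described above can be pictured with a small deterministic model. This is only an illustrative Python sketch, not CQS code: the `Structure` class, the `client_request` helper, and the exact interleaving (the ENF 35 process releasing the latch while the checkpoint is already copying) are assumptions made for the example, since the APAR text does not give the internal sequence.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Structure:
    """Toy stand-in for a CQS message queue structure (hypothetical)."""
    latch_owner: Optional[str] = None   # current holder of the quiesce latch
    checkpoint_active: bool = False     # True while data is being copied
    log: List[Tuple[str, str]] = field(default_factory=list)


def client_request(s: Structure, name: str) -> bool:
    """Admit a client request only when nobody holds the quiesce latch."""
    if s.latch_owner is not None:
        return False                    # structure is quiesced, request rejected
    phase = "during checkpoint" if s.checkpoint_active else "normal"
    s.log.append((name, phase))         # the request produces a log record
    return True


def faulty_sequence() -> List[Tuple[str, str]]:
    """Checkpoint starts copying without actually owning the latch."""
    s = Structure()
    s.latch_owner = "ENF35"             # ENF 35 process holds exclusive control
    s.checkpoint_active = True          # checkpoint assumes it has control
    s.latch_owner = None                # assumption: ENF 35 releases, nobody owns
    client_request(s, "CQSREAD")        # admitted while the copy is running
    s.checkpoint_active = False
    return s.log


def serialized_sequence() -> List[Tuple[str, str]]:
    """Control is transferred to the checkpoint before the copy starts."""
    s = Structure()
    s.latch_owner = "ENF35"
    s.latch_owner = "CHECKPOINT"        # serialization handed over first
    s.checkpoint_active = True
    client_request(s, "CQSREAD")        # rejected, structure stays quiesced
    s.checkpoint_active = False
    s.latch_owner = None
    return s.log
```

In this model, `faulty_sequence()` leaves a log record tagged as written during the checkpoint, which corresponds to the unexpected log records described above, while `serialized_sequence()` leaves the log empty.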
Problem conclusion
GEN:
KEYWORDS: SYSPLEXSQ
*** END IMS KEYWORDS ***
The IXLUSYNC service is changed, for checkpoint only, to check whether the ENF 35 process is holding the structure quiesce latch before allowing the checkpoint process to copy the structure data to the recovery data set. If it is, the service waits in quarter-second intervals, for a maximum of five seconds, to give the ENF 35 process time to complete and transfer the serialization to the checkpoint process. If the maximum wait time elapses and the ENF 35 process still has not released control, the checkpoint is aborted because its structure serialization cannot be obtained.
The checkpoint process is changed to add a check that ensures it has obtained the structure serialization after returning from the IXLUSYNC service module. If the checkpoint does not have control, the checkpoint is also aborted.
When a structure checkpoint fails to be initiated because of these serialization issues, the previous successful checkpoint remains valid as a rebuild starting point. A manual checkpoint can be issued to retry the checkpoint and establish a more current recovery point.
The structure rebuild process is enhanced to issue a new informational message, CQS0246I, that shows the name of the structure recovery data set in use for rebuild.
* CQSSTE30 module - ENF 35 event processor
  In the get latch QUIESCST routine, after successfully obtaining the structure quiesce latch exclusively, set the new flag STRRQL30 to indicate that this ENF 35 event processor is the current owner of the latch. The flag is reset when the latch is released or transferred to the IXLUSYNC processor.
* CQSSTE10 module - Structure IXLUSYNC event
  In the get latch GETLATCH routine, if the caller requests the check and the ENF 35 event processor holds the structure quiesce latch exclusively, wait up to a maximum of 5 seconds for the processor to release the latch.
  If the processor releases the latch before the maximum time expires, obtain the latch. If the processor still holds the latch when the maximum time expires, return to the caller with an 'unable to get latch' return code, indicating that structure activity serialization was not obtained.
* CQSCHK30 module - Structure checkpoint processor
  Above the CHK18000 label, on return from the call to CQSSTE10 to get the structure quiesce latch for checkpoint, added a check to ensure that CQSSTE10 is the latch owner by checking whether the first word of the latch header contains the CQSSTE10 ECB address. The low bit of the ECB address in the latch header is ignored in this comparison. If CQSSTE10 does not own the structure quiesce latch, abort the checkpoint with an error message indicating checkpoint failure with reason code RCNOTOWN X'00F4', which indicates that the checkpoint was unable to quiesce the structure:
  CQS0222E CQS CQS1CQS FAILED STRUCTURE CHECKPOINT FOR STRUCTURE IMSMSGQ01 RC=300000F4 CQS1CQS
  Checkpoint cannot continue if the structure activity serialization is not obtained.
* CQSSTR00 module - Structure recovery main processor
  In the DSSELECT subroutine, issue message CQS0246I to show the name of the structure recovery data set in use for rebuild.
* CQSM1ENU module - CQS Message Table 1
  Defined new informational message skeleton CQS0246I to display the name of the structure recovery data set in use for rebuild.
  CQS0246I CQS SRDS READ STARTED, DSN=srds_data_set_name
* CQSBPAW0 module - CQS AWE server definitions
  Updated the comment for the STE1 AWE server to indicate that this server must remain a single-threaded server because of the latch ownership detection code in CQSCHK30.
* CQSFTRC0 module - CQS Trace Entry Dump Formatter
  Changed to format the new trace subcodes added in the CQSTRTCH and CQSTRSTE macros.
* CQSFM020 - CQS Structure Block Dump Formatter
  Reassembled for the addition of STRRQL30, STRCNT10, and STRCNT20 in the CQSSTRUC macro.
* CQSSTRUC macro - Structure control block
  Defined new bit STRRQL30 in the structure status (second) byte STRSTATB of the rebuild word STRRBLWD to indicate that the ENF 35 event processor owns the latch. Defined new field STRCNT10 to indicate the number of times CQSSTE10 waits for CQSSTE30 to release the structure quiesce latch. Defined new field STRCNT20 to indicate the number of BPETIMER cycles taken while waiting for the latch.
* CQSTRTCH macro - Structure checkpoint trace records
  Added new trace subcode TRTCNOOW, indicating that CQSSTE10 does not own the latch as expected.
* CQSTRSTE macro - CQS structure event trace records
  Added new trace subcodes TRSEBTIM, indicating a BPE timer error, and TRSESTE3, indicating that CQSSTE30 holds the latch too long.
* DOCUMENTATION CHANGE
  IMS Messages and Codes, Volume 2 - Non-DFS Messages
  Chapter 3. CQS messages (Common Queue Server)
  1- After the existing message CQS0222E title, after the existing return code X'300000F0', add a new return code description:
     X'300000F4'
     The CQS structure checkpoint was unable to quiesce the structure. An internal serialization error occurred, and the structure was not correctly quiesced when the CQS checkpoint process began to copy the structure checkpoint data. Issue a structure checkpoint manually to attempt another structure checkpoint.
  2- After the existing message CQS0245E, add a new informational message:
     CQS0246I CQS SRDS READ STARTED, DSN=structure recovery data set name
     Explanation
     The message indicates that CQS is reading the more current of the structure recovery data sets for structure rebuild. If a structure rebuild fails, it might be possible to recover the structure by renaming the structure recovery data set indicated in the DSN= field of the CQS0246I message and initiating a new structure rebuild. That structure rebuild uses the other, older structure recovery data set and might complete successfully in some cases.
     Ensure that the renamed SRDS is renamed back to its original name at the end of the structure rebuild.
     structure recovery data set name
     The name of the structure recovery data set that is currently used for structure rebuild.
     System action
     CQS processing continues.
     Module
     CQSSTR00
  3- In the 'System programmer response' section of the message CQS0034A, add this new paragraph:
     If the rebuild of a shared queues structure failed for a reason other than lost or damaged log data, it might be possible to recover the structure by renaming the structure recovery data set indicated in the DSN= field of the CQS0246I message and initiating a new structure rebuild. That structure rebuild uses the other, older structure recovery data set and might complete successfully in some cases. Ensure that the renamed SRDS is renamed back to its original name at the end of the structure rebuild.
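The wait-and-verify logic described in this section can be sketched as follows. This is a minimal Python illustration, not the CQS implementation: the `QuiesceLatch` class, its fields, and all function names are hypothetical. Only the quarter-second interval, the five-second maximum, the owner check against the first word of the latch header, and the ignored low bit are taken from the text above. Acquiring the latch itself is not modelled; the latch header is inspected as found.

```python
import time
from dataclasses import dataclass
from typing import Callable

POLL_INTERVAL_S = 0.25   # quarter-second wait interval (from the APAR text)
MAX_WAIT_S = 5.0         # maximum total wait (from the APAR text)


@dataclass
class QuiesceLatch:
    """Hypothetical stand-in for the structure quiesce latch."""
    owner_word: int = 0          # first word of the latch header (owner ECB address)
    held_by_enf35: bool = False  # models the STRRQL30 flag


def wait_for_enf35(latch: QuiesceLatch,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the ENF 35 processor releases the latch.

    Returns True if the latch was released within MAX_WAIT_S, else False.
    """
    polls = int(MAX_WAIT_S / POLL_INTERVAL_S)   # 20 polls of 0.25 s each
    for _ in range(polls):
        if not latch.held_by_enf35:
            return True
        sleep(POLL_INTERVAL_S)
    return not latch.held_by_enf35              # final check after the last wait


def owns_latch(latch: QuiesceLatch, ecb_address: int) -> bool:
    """Compare the latch owner word with an ECB address, ignoring the low bit."""
    return (latch.owner_word & ~1) == (ecb_address & ~1)


def checkpoint_may_proceed(latch: QuiesceLatch, ecb_address: int,
                           sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if serialization is held; otherwise abort the checkpoint."""
    if not wait_for_enf35(latch, sleep):
        return False             # 'unable to get latch': ENF 35 held it too long
    if not owns_latch(latch, ecb_address):
        return False             # caller is not the latch owner
    return True
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller that gets `False` back would abort the checkpoint and leave the previous successful checkpoint as the rebuild starting point.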
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PI14984
Reported component name
IMS V13
Reported component ID
5635A0400
Reported release
300
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2014-04-01
Closed date
2015-05-29
Last modified date
2015-07-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI28066
Modules/Macros
CQSBPAW0 CQSCHK30 CQSFM020 CQSFTRC0 CQSM1ENU CQSSTE10 CQSSTE30 CQSSTR00 CQSTRSTE CQSTRTCH
Publications Referenced
GC18971310
Fix information
Fixed component name
IMS V13
Fixed component ID
5635A0400
Applicable component levels
R300 PSY UI28066
UP15/06/05 P F506
Document Information
Modified date:
14 December 2020