A fix is available
APAR status
Closed as program error.
Error description
If CQS is up and running with ALLOWAUTOALT(YES) specified, and an alter has been processed to adjust the element-to-entry ratio, the adjusted ratio could be lost across a CQS restart. If CQS is down and the structure needs recovery, the new element-to-entry ratio could be lost when CQS is restarted, because it is maintained in a CQS control block. CQS needs to preserve the element-to-entry ratio in the SRDS header record for recovery purposes.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All IMS V10 shared queues users, shared EMH  *
*                 users, and all CQS users.                    *
****************************************************************
* PROBLEM DESCRIPTION: CQS restart abended U0014-00000390      *
*                      because the structure recovery failed   *
*                      with CQS0242E RC=43000002 message.      *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************
After APAR PK49989, the structure attributes, including the structure size and the element-to-entry ratio, are kept in the structure block. Whenever these values change, either through CQS internal processes or through MVS when the structure specifies ALLOWAUTOALT(YES) in the CFRM policy, CQS updates its structure block with the new structure attributes.
If CQS loses access to the structure, either because the link to the structure is lost or because of structure failure, these updated structure attributes are used in the IXLCONN REBUILD to recover the structure. If CQS has terminated, normally or abnormally, and the structure is intact, the subsequent CQS restart process obtains the updated structure attributes after the initial IXLCONN to the existing structure. In both cases CQS has the updated structure attributes, and there is no problem recovering the structure after a structure failure or a CQS restart.
But if the structure becomes inaccessible (structure failure or link failure) after CQS has terminated, the restart process may fail to recover the structure from the SRDS. CQS has no mechanism to retrieve the structure size and element-to-entry ratio that belonged to the previous instance of the structure. Thus, obsolete values of the structure attributes are used to allocate the new structure and the rebuild structure. In some instances, these invalid attributes prevent CQS from recovering the structure. In those instances, the rebuild structure always fills up before all data has been recovered, which leads to the rebuild recovery failure message CQS0242E with RC=43000002.
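To make the failure mode concrete, the following toy model (not CQS code) shows how a rebuild structure allocated with an obsolete ratio can run out of elements before all data is recovered. The function name, the one-entry-per-object accounting, and every number are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def objects_recovered(entries_free: int, elements_free: int,
                      elements_per_object: Sequence[int]) -> int:
    """Count how many data objects fit before entries or elements run out.

    Assumption: each recovered object consumes one list entry plus the
    given number of data elements. Real coupling facility accounting is
    more involved; this only illustrates the fill-up effect.
    """
    recovered = 0
    for elements_needed in elements_per_object:
        if entries_free < 1 or elements_free < elements_needed:
            break  # structure is full; remaining objects cannot be recovered
        entries_free -= 1
        elements_free -= elements_needed
        recovered += 1
    return recovered


# Hypothetical workload: 100 objects that each need 4 elements.
workload = [4] * 100

# Structure split to match the altered ratio of the previous instance.
print(objects_recovered(entries_free=100, elements_free=400,
                        elements_per_object=workload))

# Same total of 500 units, but split by an obsolete 1:1 ratio.
print(objects_recovered(entries_free=250, elements_free=250,
                        elements_per_object=workload))
```

With these made-up numbers the first call recovers all 100 objects, while the second stops at 62 because the elements are exhausted even though entries remain.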
Problem conclusion
AIDS: RIDS/SYS RIDS/QMGR SYS QMGR
GEN: POSTREQ PK76852 POSTREQ PM30411
KEYWORDS: SYSPLEXSQ MULTSYS
*** END IMS KEYWORDS ***
To resolve this problem, the control record of the SRDS is expanded to save the structure attributes. For now, the structure attributes include the structure size, element-to-entry ratio and the EMC count of the structure pair (both primary and overflow structures).
Whenever an IXLALTER occurs with changes, one and only one CQS within the same XCF group will initiate a structure checkpoint at the end of the Structure Alter End event. Before completing the structure checkpoint, the checkpoint master saves the new structure attributes to the control record of the SRDS. If the structure becomes inaccessible then these updated structure attributes will be used to recover the structure during the restart of CQS.
At restart time, a new structure is allocated with attributes from the CFRM policy and from the global structure definition proclib member CQSSGxxx. If any one of the two SRDSes contains client data then CQS starts structure recovery. Before the structure is recovered from the SRDS in rebuild phase 2, the rebuild master verifies the attributes of the rebuild structure with the structure attributes from the control record of the SRDS. If they are mismatched then the current recovery process is aborted. Before aborting the current rebuild, the rebuild master not only saves the valid attributes to its structure block, it also writes them to the CQSSTRATTRIBUTES entry on the original structure. All CQSes will receive the Rebuild Stop Complete Event and the non-master CQSes will read the valid structure attributes from the newly created CQSSTRATTRIBUTES entry. Now, all CQSes have the valid structure attributes (as in the control record of the SRDS). These attributes are used in the IXLCONN REBUILD to allocate the rebuild structure.
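The verify-and-retry flow at restart can be sketched roughly as follows. This is an illustrative model, not CQS code: the class, the function names, and the retry limit are hypothetical, the sketch assumes that only the structure size and the ratio are compared, and it assumes that a rebuild structure is allocated with exactly the attributes requested.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StructureAttributes:
    """Attributes the text says are saved in the SRDS control record."""
    size: int
    entry_part: int    # entry part of the element-to-entry ratio
    element_part: int  # element part of the element-to-entry ratio
    emc_count: int


def attributes_for_retry(rebuild: StructureAttributes,
                         srds: StructureAttributes) -> Optional[StructureAttributes]:
    """Return None if the rebuild structure matches the SRDS control record.

    Otherwise return the SRDS attributes for the next rebuild attempt.
    Assumption: only size and ratio take part in the comparison.
    """
    same_size = rebuild.size == srds.size
    same_ratio = (rebuild.entry_part, rebuild.element_part) == (
        srds.entry_part, srds.element_part)
    return None if same_size and same_ratio else srds


def recover(initial: StructureAttributes, srds: StructureAttributes,
            max_attempts: int = 2) -> List[StructureAttributes]:
    """Return the attributes used for each rebuild attempt, in order."""
    attempts = []
    current = initial
    for _ in range(max_attempts):
        attempts.append(current)
        retry_with = attributes_for_retry(current, srds)
        if retry_with is None:
            return attempts  # verified; recovery from the SRDS can proceed
        current = retry_with  # abort this rebuild, retry with SRDS values
    raise RuntimeError("rebuild structure still mismatched")


# Hypothetical values: policy-derived attributes versus those saved in the SRDS.
policy = StructureAttributes(size=1000, entry_part=1, element_part=1, emc_count=0)
saved = StructureAttributes(size=2000, entry_part=1, element_part=4, emc_count=0)
print(recover(policy, saved))
```

In this model the first attempt uses the policy-derived attributes, detects the mismatch, and the second attempt uses the values saved in the SRDS control record.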
With ALLOWAUTOALT(YES) specified in the CFRM policy, CQS has no control over the frequency of IXLALTER on a structure. For example, after the structure has reached its maximum size and the demand for available storage still exists, MVS will keep changing the element-to-entry ratio, trying to 'squeeze' out any unused elements/entries for the application. To prevent the performance impact of frequent structure checkpoints in such a scenario, another flavor of structure checkpoint, the express checkpoint, is added in the structure checkpoint logic module (CQSCHK30). The only task of the express checkpoint is to save the updated structure attributes to the control record of the most recent SRDS.
Note: This APAR can be applied serially (rolling maintenance) for each CQS in the sysplex. In a mixed environment, this function only works if:
1. The last structure checkpoint master MUST be a CQS with this maintenance, AND
2. The structure recovery master MUST be a CQS with this maintenance.
Documentation Changes:
=====================
This APAR changes the IMS V10 Messages and Codes Reference, Volume 4: IMS Component Codes (GC18-9715-01):
In Chapter 5 - CQS Codes, under section CQS Restart and Rebuild Reason Codes, add the following new code to Table 13:
Code     Meaning                        System Programmer Action
-------  -----------------------------  -------------------------
X'0080'  The structure size and/or the  None. CQS will initiate
         element-to-entry ratio of the  another rebuild with
         rebuild structure and the      valid structure
         structure attributes from the  attributes read from
         SRDS are mismatched.           the SRDS.
Parts List:
==========
1. CQSANCHR - Add a new control list entry key CQSSTRATTRIBUTES, the structure attributes key. This entry is created by the rebuild master on the original structure after detecting that the attributes of the rebuild structure don't match the attributes on the SRDS. This entry is used as temporary storage to hold the correct structure attributes.
   All other CQSes will read this entry for structure attributes in the Rebuild Stop Complete Event.
2. CQSAWE - Add a new function code, AWECEXPR, for the Chkpt AWE. This express checkpoint AWE only updates the control record of the most recent SRDS, which will be used in the next structure recovery process, with new values of structure attributes after an IXLALTER.
3. CQSCHK30 - Add logic to handle a new express checkpoint function code, AWECEXPR. There will be no new message for this express checkpoint process. A "Begin exp chkpt" entry and an "End exp chkpt" entry are written to the structure trace table at the beginning and the end of the process. The "End exp chkpt" trace also contains the return code in trace word1.
4. CQSCLE - Add a new definition for the structure attributes key, CQSSTRATTRIBUTES. The adjunct of this entry contains the structure size, the entry part of the E/E ratio, the element part of the E/E ratio and the EMC count.
5. CQSFM010 - Recompile for changes in CQSANCHR.
6. CQSFM020 - Recompile for changes in CQSSTRUC.
7. CQSFTRC0 - Add two new format strings for the two new structure checkpoint subcodes, TRTCEXPB & TRTCEXPE.
8. CQSICQS0 - Initialize the newly added key, CQSSTRATTRIBUTES, in the anchor block.
9. CQSIST30 - Add code to initialize the newly added structure attributes fields in the control record of the SRDS. Also, if CQS restart fails because of invalid attributes of the rebuild structure then allow it to make another rebuild attempt to allocate the rebuild structure with correct structure attributes.
10. CQSRSTP - Add two new rebuild failure reasons, RFSRMISM and RFSATWER, and one new rebuild start reason RSTSRMM. These reasons are used to make various decisions in different phases during the recovery process.
11. CQSSRDS - Add fields in the control record to save the current structure size, the entry part of the E/E ratio, the element part of the E/E ratio and the EMC count for both the primary structure and the overflow structure, if it exists.
12. CQSSTE00 - Add code to initiate a structure checkpoint if there are any changes to the structure size or the element-to-entry ratio at the end of the Structure Alter End event. Depending on which structure attributes are changed and how, a regular structure checkpoint AWE or an express checkpoint AWE will be built and enqueued to the structure checkpoint queue header. A regular structure checkpoint AWE will be built and enqueued if:
    . The element-to-entry ratio changes but the structure size has not reached its maximum size.
    . The structure size is decreased.
    . The last regular structure checkpoint is older than 10 minutes.
    Otherwise, an express checkpoint AWE is built and enqueued to avoid the performance impact of the regular structure checkpoint. The express checkpoint process only updates the control record of the most recent SRDS, which will be used in the next rebuild recovery, with the new structure size, element-to-entry ratio and EMC count. All CQSes in the same XCF group will receive the same Structure Alter End event. Only the CQS that has connection ID - STRCONID - represented FIRST in the connection bitmap - STRCONB - initiates the checkpoint.
13. CQSSTE10 - Save the structure checkpoint timestamp in the structure block for a non-master CQS. The master CQS saves the timestamp in CQSCHK30. This timestamp is used to decide the type of the next structure checkpoint (regular or express) at the end of the Structure Alter End Event.
14. CQSSTE20 - Add code in the Rebuild Stop Complete Event to read the valid structure attributes from the CQSSTRATTRIBUTES entry on the original structure if the ongoing recovery has failed with the StrAttr mismatch reason. These values are saved in the structure block for use in the next rebuild to allocate the new rebuild structure with correct structure attributes. If this CQS cannot obtain these values then it will not be able to allocate a suitable rebuild structure in the next rebuild process.
    A flag bit in the structure block, STRRATTF, is turned on for this CQS. If this flag bit is ON then this CQS will delay its connection to the rebuild structure to allow other CQSes that have the valid structure attributes to allocate the rebuild structure in the next Rebuild Connect Event. The delayed connection to the rebuild structure also applies to a CQS that has not completed its initialization.
15. CQSSTRUC - The following fields and EQUs are added in the structure block:
    . Define a new bit, STRRATTF, in the structure status byte STRSTATB. This bit is set when this CQS fails to get the valid structure attributes, and it is reset in phase 4 of CQSSTR00 after the rebuild completes.
    . Define new fields STRRENTY and STRRELMT to contain the entry part and element part of the E/E ratio of the rebuild structure. For a successful rebuild, the values in these two fields are first transferred to STRENTRY and STRELMNT, then cleared in phase 4. For a rebuild that failed with the structure attributes mismatch reason, the values in these 2 fields together with the value in STRRSIZE will be used in the next rebuild to allocate the rebuild structure.
    . Define new field STRCKTIM, the last (regular) structure checkpoint timestamp. This timestamp is only substantially accurate. It is used to decide whether a regular structure checkpoint or an express structure checkpoint is needed at the end of the Structure Alter End Event.
    . Define two new reasons for locking the control list header:
      + LCKRSATW: to write the CQSSTRATTRIBUTES entry
      + LCKRCHEX: to read the CQSSTRCHKPTINPRG entry in the express checkpoint logic of CQSCHK30.
    . Define 2 new flag bits, SIE2OKPR and SIE2OKOF, in the initialization extension block, CQSSTRIE. These 2 bits are used to prevent resource (Control List, SRDS) contention between the STE2 thread and the STRD thread during the initialization process.
16. CQSSTR00 - Add code in rebuild phase 2 to verify the rebuild structure's attributes with values from the control record of the SRDS for CQS restart. If they don't match then save the values from the SRDS to the structure block and write those values to the new entry CQSSTRATTRIBUTES on the original structure before aborting the current rebuild. In the rebuild abort phase, preserve the valid structure attributes in STRRSIZE, STRRENTY, STRRELMT and STRREMCC for the next rebuild if the rebuild failure reason is structure attributes mismatch.
17. CQSSTS10 - The logic that derived the element-to-entry (E/E) ratio is moved out of the main execution paths of CONN0000 and CONS0000 into a subroutine, EERATIO. CONN0000 and CONS0000 will now call EERATIO to obtain the E/E ratio. Also, code is added in CONR000 to call EERATIO to get the E/E ratio after an IXLCONN REBUILD call. Since the EntryCount & ElementCount returned on an IXLCONN call are only 'substantially accurate', the IXLCONN REBUILD call always uses the maximum structure size for the rebuild structure. This is to ensure that the newly allocated rebuild structure always has enough entries and elements for the rebuild process.
18. CQSTRACE - Add a new entry type, TRCETSAT, for the CQSSTRATTRIBUTES entry.
19. CQSTRTCH - Add new trace subcodes, TRTCEXPB & TRTCEXPE, and a new block type for an express checkpoint buffer, TRTC7EXP, in the structure checkpoint trace record.
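The choice between a regular and an express checkpoint described for CQSSTE00 (item 12) can be sketched as a small decision function. This is a reading of the three listed conditions, not the module's actual logic: the names are hypothetical, the ratio is treated as changed whenever either stored part differs, and "older than 10 minutes" is taken to mean strictly more than 600 seconds.

```python
from dataclasses import dataclass
from typing import Optional

REGULAR_CHECKPOINT_MAX_AGE_SECONDS = 10 * 60  # "older than 10 minutes"


@dataclass(frozen=True)
class AlterSnapshot:
    """Structure attributes before or after an alter (hypothetical type)."""
    size: int
    entry_part: int
    element_part: int


def choose_checkpoint(old: AlterSnapshot, new: AlterSnapshot, max_size: int,
                      seconds_since_regular: float) -> Optional[str]:
    """Return "regular", "express", or None when nothing changed."""
    ratio_changed = (old.entry_part, old.element_part) != (
        new.entry_part, new.element_part)
    size_changed = old.size != new.size
    if not ratio_changed and not size_changed:
        return None  # no attribute change, so no checkpoint is initiated
    if ratio_changed and new.size < max_size:
        return "regular"  # ratio changed before the maximum size was reached
    if new.size < old.size:
        return "regular"  # structure size decreased
    if seconds_since_regular > REGULAR_CHECKPOINT_MAX_AGE_SECONDS:
        return "regular"  # last regular checkpoint is too old
    return "express"  # only the SRDS control record needs updating


# Hypothetical case: structure already at maximum size, ratio altered again
# one minute after the last regular checkpoint.
before = AlterSnapshot(size=200, entry_part=1, element_part=1)
after = AlterSnapshot(size=200, entry_part=1, element_part=2)
print(choose_checkpoint(before, after, max_size=200, seconds_since_regular=60))
```

The example corresponds to the scenario in which MVS keeps adjusting the ratio of a structure that is already at its maximum size, which is where the express checkpoint avoids repeated regular checkpoints.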
Temporary fix
*********
* HIPER *
*********
Comments
REPINNED RP08/12/03 (ATXT) TO ADD POSTREQ PK76852 INFO.
**** PE08/12/03 PTF IN ERROR. SEE APAR PK76852 FOR DESCRIPTION
**** PE08/12/03 FIX IN ERROR. SEE APAR PK76852 FOR DESCRIPTION
REPINNED RP11/01/17 (ATXT) TO ADD POSTREQ PM30411 INFO.
**** PE11/01/17 PTF IN ERROR. SEE APAR PM30411 FOR DESCRIPTION
**** PE11/01/17 FIX IN ERROR. SEE APAR PM30411 FOR DESCRIPTION
**** PE15/07/23 FIX IN ERROR. SEE APAR PI44960 FOR DESCRIPTION
APAR Information
APAR number
PK64986
Reported component name
IMS V10
Reported component ID
5635A0100
Reported release
010
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2008-04-23
Closed date
2008-06-19
Last modified date
2015-07-29
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK37531
Modules/Macros
CQSANCHR CQSAWE CQSCHK30 CQSCLE CQSFM010 CQSFM020 CQSFTRC0 CQSICQS0 CQSIST30 CQSRSTP CQSSRDS CQSSTE00 CQSSTE10 CQSSTE20 CQSSTRUC CQSSTR00 CQSSTS10 CQSTRACE CQSTRTCH
GC18971501
Fix information
Fixed component name
IMS V10
Fixed component ID
5635A0100
Applicable component levels
R010 PSY UK37531
UP08/06/27 P F806
Document Information
Modified date:
29 July 2015