A fix is available
APAR status
Closed as program error.
Error description
A LOCK or SERIALIZE LIST structure system managed rebuild following a user managed rebuild hangs in PHASE: WAITING FOR QUIESCE.

EXTERNAL SYMPTOMS:
D XCF,STR,STRNAME=str_name shows the rebuild is waiting for quiesce.

11:14:14.91
IXC360I 11.14.14 DISPLAY XCF 163
STRNAME: str_name
STATUS: REASON SPECIFIED WITH REBUILD START:
OPERATOR INITIATED
DUPLEXING REBUILD
METHOD: SYSTEM-MANAGED
AUTO VERSION: xxxxxxxx xxxxxxxx
PHASE: WAITING FOR QUIESCE

Trying to stop the rebuild results in hanging in the STOP phase.

12:03:08.16 SETXCF STOP,RB,DUPLEX,STRNM=str_name,KEEP=OLD
IXC522I SYSTEM-MANAGED DUPLEXING REBUILD FOR STRUCTURE 214
str_name IS BEING STOPPED TO FALL BACK TO THE OLD STRUCTURE DUE TO REQUEST FROM AN OPERATOR
IXC571I SYSTEM-MANAGED DUPLEXING REBUILD FOR STRUCTURE 215
str_name HAS COMPLETED THE QUIESCE PHASE AND IS ENTERING THE STOP PHASE.
TIME: 11/05/2009 12:03:08.197233
AUTO VERSION: C50A69E0 B465FD72
IXC367I THE SETXCF STOP REBUILD REQUEST FOR STRUCTURE 216
str_name WAS ACCEPTED.

12:05:00.47 D XCF,STR,STRNM=str_name
IXC360I 12.05.00 DISPLAY XCF 394
STRNAME: str_name
STATUS: REASON SPECIFIED WITH REBUILD STOP:
OPERATOR INITIATED
DUPLEXING REBUILD STOPPING
METHOD: SYSTEM-MANAGED
AUTO VERSION: xxxxxxxx xxxxxxxx
PHASE: STOP

Rebuild request is not accepted.

12:08:26.30 SETXCF START,RB,STRNM=str_name,LOC=OTHER
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE 808
str_name WAS REJECTED:
REBUILD STOP IS IN PROGRESS FOR THE STRUCTURE

12:10:42.45
IXC360I 12.10.42 DISPLAY XCF 025
STRNAME: str_name
STATUS: REASON SPECIFIED WITH REBUILD STOP:
OPERATOR INITIATED
DUPLEXING REBUILD STOPPING
METHOD: SYSTEM-MANAGED
AUTO VERSION: xxxxxxxx xxxxxxxx
PHASE: STOP

Canceling the connectors resolved the hang.
12:13:39.88 C appropriate_application
12:13:40.54 IXC577I SYSTEM-MANAGED DUPLEXING REBUILD HAS 864
BEEN STOPPED FOR STRUCTURE str_name
STRUCTURE NOW IN COUPLING FACILITY CF22
PHYSICAL STRUCTURE VERSION: C50A69DF 589BA892
LOGICAL STRUCTURE VERSION: C50A69DF 589BA892
AUTO VERSION: C50A69E0 B465FD72

ANALYSIS:
The problem is due to the getNextRecb routine in IXLR2MAN, which sets aRecEventType = RecEventType_kOtherQuiesce for the new unquiesce done by OA27289. It should be setting it to RecEventType_kOtherUnQuiesce. This causes RewaAutoQuiesce to stay on after the unquiesce and disrupts the subsequent quiesce for system-managed rebuild, which hangs in the quiesce phase.

KNOWN IMPACT:
Rebuild hangs; the connected application subsequently hangs and must be canceled to regain use.

VERIFICATION STEPS:
D XCF,STR,STRNAME=str_name shows the LOCK structure or SERIALIZE LIST structure in PHASE: WAITING FOR QUIESCE.

PE INFORMATION:
If the PTFs for OA27289 are not already applied, do not apply them. If the PTFs are applied, avoid rebuilds and consider removing the PTFs.

USERS AFFECTED:
Users running z/OS 1.8 (HBB7730) or above with APAR OA27289 who are running in a Parallel Sysplex environment making use of coupling facility serialized (lock or serialized list) structures for sysplex-scope serialization functions. In particular, those using user-managed rebuild followed by a system-managed rebuild.

APAR OA27289 PTFs:
UA50728 (HBB7730)
UA50729 (HBB7740)
UA50730 (HBB7750)
UA50731 (HBB7760)

USER IMPACT:
APAR OA27289 fixed the problem it reported but introduced a new problem. The problems addressed by APAR OA27289 were as follows.

Problem 1
---------
PURGEDQ processing driven as part of user-managed rebuild cleanup processing may encounter SRBs suspended awaiting completion of requests to the coupling facility (CF) and issue ABENDS047B to force their completion.
When these suspended requests were made on behalf of the rebuild new connector as part of IXLLOCK or IXLSYNCH processing and get targeted by the PURGEDQ, XES module level recovery processing converts the ABEND to indicate the suspend failed and causes connector termination. Terminating the connector in this case is deemed too harsh an action.

Problem 2
---------
Users running z/OS 1.10 (HBB7750) may encounter an ABENDS0C4 in IXLR1SUS due to incorrectly referencing internal XES storage related to the rebuild new connector which has been freed.

Problem 3
---------
Users running z/OS 1.10 (HBB7750) or above may encounter ABENDS047B followed by ABENDS026 RSN0C6C001 and ABENDS026 RSN0C210101 and connector termination due to improper handling of the ABENDS047B by the module level recovery exit for IXLR1GLB.

Problem 4
---------
Improper handling of ABENDS047B by XES signalling routine IXLS1RNP may cause IXLLOCK request completion to not be made. This may lead to hung IXLLOCK requests.

The new problem introduced by the original APAR will cause a system-managed rebuild following a user-managed rebuild to either a lock or serialized list structure to hang in the WAITING FOR QUIESCE phase.
Local fix
BYPASS/CIRCUMVENTION:
If OA27289 is applied, then avoid rebuilds if at all possible. If OA27289 is not applied, there is no exposure to this defect.

RECOVERY ACTION:
If the hang is encountered, the connectors must be recycled.
Problem summary
USERS AFFECTED: Users running z/OS 1.8 (HBB7730) or above who are running in a Parallel Sysplex environment making use of coupling facility serialized (lock or serialized list) structures for sysplex-scope serialization functions. In particular, those wanting to install service provided by APAR OA27289 or those with APAR OA27289 installed, and using user-managed rebuild followed by a system-managed rebuild.

PROBLEM DESCRIPTION: Due to a problem introduced by APAR OA27289, a system-managed rebuild following a user-managed rebuild to either the same lock or serialized list structure may hang in the WAITING FOR QUIESCE phase of the system-managed rebuild. Attempts to subsequently stop the system-managed rebuild result in a hang in the STOP phase of the system-managed rebuild.

SYSPLEXDS

RECOMMENDATION: PTFs should be applied to all systems in the sysplex via rolling IPL. Note that PTFs for APAR OA27289 still need to be applied with PTFs for APAR OA31042.

APAR OA27289 introduced a problem where user-managed rebuild processing incorrectly leaves on an indication (RewaAutoQuiesce) that a connection to a serialized structure has been quiesced for an auto process, such as system-managed rebuild. Leaving this indication on confuses subsequent system-managed rebuild processing, which incorrectly determines that the connection to a serialized structure is already quiesced when it is not. This leads to hangs when subsequently attempting to start the system-managed rebuild, or when attempting to stop the system-managed rebuild after the hang condition is encountered.
When the system-managed rebuild hangs due to this problem while attempting to start the rebuild, MSGIXC360I, displayed in response to the D XCF,STR,STRNAME=str_name command, will include the following text:
PHASE: WAITING FOR QUIESCE

When the system-managed rebuild hangs due to this problem after a subsequent attempt to stop the rebuild, MSGIXC360I, displayed in response to the D XCF,STR,STRNAME=str_name command, will include the following text:
REBUILD STOPPING
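As a rough aid for the verification step, the following Python sketch scans captured MSGIXC360I display text for the two symptom strings quoted above. The function name and return values are hypothetical and chosen for illustration only. The sketch does plain string matching on output that has already been collected; it does not issue operator commands or use any XCF interface.

```python
import re


def classify_rebuild_display(text):
    """Classify captured D XCF,STR output (MSGIXC360I) by hang symptom.

    Returns "waiting-for-quiesce", "stopping", or None.
    Hypothetical helper: it only looks for the strings quoted in this APAR.
    """
    # Collapse whitespace so that wrapped console lines still match.
    flat = re.sub(r"\s+", " ", text.upper())

    # The problem applies to system-managed rebuilds only.
    if "METHOD: SYSTEM-MANAGED" not in flat:
        return None
    if "PHASE: WAITING FOR QUIESCE" in flat:
        return "waiting-for-quiesce"
    if "REBUILD STOPPING" in flat:
        return "stopping"
    return None
```

A match only means the symptom text is present. A healthy rebuild can also pass through these phases, so the result is meaningful only if the same phase persists across repeated displays.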
Problem conclusion
IXLR2MAN was modified so that it no longer manipulates the RewaAutoQuiesce indication during user-managed rebuild processing. This avoids incorrectly turning the indication on and leaving it on, which prevents subsequent system-managed rebuild processing from wrongly concluding that the connector to a serialized structure is quiesced when it is not.
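The effect of the stale indication can be illustrated with a small toy model in Python. This is not XES code and makes no claim about how IXLR2MAN is implemented. Only the names RewaAutoQuiesce, RecEventType_kOtherQuiesce and RecEventType_kOtherUnQuiesce come from this APAR; the classes, functions and the touch_indication parameter are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RecEventType(Enum):
    OTHER_QUIESCE = auto()    # stands in for RecEventType_kOtherQuiesce
    OTHER_UNQUIESCE = auto()  # stands in for RecEventType_kOtherUnQuiesce


@dataclass
class Connector:
    auto_quiesce: bool = False  # stands in for the RewaAutoQuiesce indication
    quiesced: bool = False      # whether the connector is really quiesced


def record_event(conn, event):
    """Update the indication from a recorded event type."""
    if event is RecEventType.OTHER_QUIESCE:
        conn.auto_quiesce = True
    elif event is RecEventType.OTHER_UNQUIESCE:
        conn.auto_quiesce = False


def user_managed_unquiesce(conn, touch_indication):
    """Unquiesce at the end of a user-managed rebuild.

    touch_indication=True models the defect described in the analysis: the
    unquiesce is recorded with the quiesce event type, so the indication is
    turned on and left on. touch_indication=False models the correction,
    where user-managed processing leaves the indication alone.
    """
    conn.quiesced = False
    if touch_indication:
        record_event(conn, RecEventType.OTHER_QUIESCE)  # wrong event type


def system_managed_quiesce(conn):
    """Return True if the quiesce phase completes, False if it would hang."""
    if conn.auto_quiesce:
        # The connector is believed to be quiesced already, so no quiesce
        # is driven; the phase completes only if that belief is true.
        return conn.quiesced
    conn.quiesced = True
    conn.auto_quiesce = True
    return True
```

In this model, calling user_managed_unquiesce with touch_indication=True and then system_managed_quiesce returns False, which corresponds to the WAITING FOR QUIESCE hang; with touch_indication=False the quiesce completes.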
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
OA31042
Reported component name
CROSS SYS.EXT.S
Reported component ID
5752SCIXL
Reported release
760
Status
CLOSED PER
PE
YesPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2009-11-06
Closed date
2009-12-03
Last modified date
2010-11-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UA51564 UA51565 UA51566 UA51567
Modules/Macros
IXLR2MAN
Fix information
Fixed component name
CROSS SYS.EXT.S
Fixed component ID
5752SCIXL
Applicable component levels
R730 PSY UA51564
UP10/10/06 P F010
R740 PSY UA51565
UP09/12/16 P F912
R750 PSY UA51566
UP09/12/16 P F912
R760 PSY UA51567
UP09/12/16 P F912
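The PTF numbers in this APAR can be combined into a simple exposure check, sketched below in Python. The PTF numbers are copied from this document; the function name, the status strings, and the pairing of component levels 730 through 760 with FMIDs HBB7730 through HBB7760 are assumptions made for illustration. The list of applied PTFs must come from your own service records; the sketch does not query SMP/E.

```python
# PTF numbers copied from this APAR, keyed by component level.
# Assumption: level 730 corresponds to FMID HBB7730, 740 to HBB7740, and so on.
OA27289_PTFS = {"730": "UA50728", "740": "UA50729", "750": "UA50730", "760": "UA50731"}
OA31042_PTFS = {"730": "UA51564", "740": "UA51565", "750": "UA51566", "760": "UA51567"}


def exposure(level, applied_ptfs):
    """Return "not exposed", "exposed", or "fixed" for one system.

    level        -- component level as a string, for example "760"
    applied_ptfs -- iterable of PTF numbers applied on that system
    Raises KeyError for a level that is not listed in this APAR.
    """
    applied = {ptf.strip().upper() for ptf in applied_ptfs}
    if OA27289_PTFS[level] not in applied:
        # Without the OA27289 PTF there is no exposure to this defect.
        return "not exposed"
    if OA31042_PTFS[level] in applied:
        return "fixed"
    return "exposed"
```

For example, exposure("760", ["UA50731"]) returns "exposed", while adding "UA51567" to the list returns "fixed".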
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
10 January 2021