IBM Support

OA31042: SYSTEM MANAGED REBUILD FOR A LOCK OR SERIALIZED LIST STRUCTURE HANGS IN QUIESCE 09/11/06 PTF PECHANGE

A fix is available

APAR status

  • Closed as program error.

Error description

  • A system-managed rebuild of a LOCK or SERIALIZED LIST structure
    that follows a user-managed rebuild hangs in
    PHASE:  WAITING FOR QUIESCE.
     EXTERNAL SYMPTOMS:
     D XCF,STR,STRNAME=str_name shows the rebuild is waiting for
     quiesce.
     11:14:14.91
     IXC360I  11.14.14  DISPLAY XCF 163
     STRNAME: str_name
      STATUS: REASON SPECIFIED WITH REBUILD START:
                OPERATOR INITIATED
              DUPLEXING REBUILD
                METHOD: SYSTEM-MANAGED
                  AUTO VERSION: xxxxxxxx xxxxxxxx
                PHASE:  WAITING FOR QUIESCE
     Trying to stop the rebuild results in hanging in the STOP
     phase.
     12:03:08.16
     SETXCF STOP,RB,DUPLEX,STRNM=str_name,KEEP=OLD
     IXC522I SYSTEM-MANAGED DUPLEXING REBUILD FOR STRUCTURE 214
     str_name IS BEING STOPPED
     TO FALL BACK TO THE OLD STRUCTURE DUE TO
     REQUEST FROM AN OPERATOR
     IXC571I SYSTEM-MANAGED DUPLEXING REBUILD FOR STRUCTURE 215
     str_name HAS COMPLETED THE QUIESCE PHASE
     AND IS ENTERING THE STOP PHASE.
      TIME: 11/05/2009 12:03:08.197233
      AUTO VERSION: C50A69E0 B465FD72
     IXC367I THE SETXCF STOP REBUILD REQUEST FOR STRUCTURE 216
     str_name WAS ACCEPTED.
     12:05:00.47
     D XCF,STR,STRNM=str_name
     IXC360I  12.05.00  DISPLAY XCF 394
     STRNAME: str_name
      STATUS: REASON SPECIFIED WITH REBUILD STOP:
                OPERATOR INITIATED
              DUPLEXING REBUILD STOPPING
                METHOD: SYSTEM-MANAGED
                  AUTO VERSION: xxxxxxxx xxxxxxxx
                PHASE:  STOP
     .
     Rebuild request is not accepted
     .
     12:08:26.30
     SETXCF START,RB,STRNM=str_name,LOC=OTHER
     IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE 808
     str_name WAS REJECTED:
     REBUILD STOP IS IN PROGRESS FOR THE STRUCTURE
     12:10:42.45
     IXC360I  12.10.42  DISPLAY XCF 025
     STRNAME: str_name
      STATUS: REASON SPECIFIED WITH REBUILD STOP:
                OPERATOR INITIATED
              DUPLEXING REBUILD STOPPING
                METHOD: SYSTEM-MANAGED
                  AUTO VERSION: xxxxxxxx xxxxxxxx
                PHASE:  STOP
     .
     Canceling the connectors resolved the hang.
     .
     12:13:39.88
     C appropriate_application
     12:13:40.54
     IXC577I SYSTEM-MANAGED DUPLEXING REBUILD HAS 864
     BEEN STOPPED FOR STRUCTURE str_name
     STRUCTURE NOW IN COUPLING FACILITY CF22
      PHYSICAL STRUCTURE VERSION: C50A69DF 589BA892
      LOGICAL STRUCTURE VERSION: C50A69DF 589BA892
      AUTO VERSION: C50A69E0 B465FD72
    
     ANALYSIS:
     The problem is in the getNextRecb routine in IXLR2MAN, which
     sets aRecEventType = RecEventType_kOtherQuiesce for the new
     unquiesce added by OA27289. It should set it to
     RecEventType_kOtherUnQuiesce. As a result, RewaAutoQuiesce
     stays on after the unquiesce and disrupts the subsequent
     quiesce for the system-managed rebuild, which hangs in the
     quiesce phase.
    
     KNOWN IMPACT:
     Rebuild hangs, connected application subsequently hangs and
     must be canceled to regain use.
     VERIFICATION STEPS:
     D XCF,STR,STRNAME=str_name shows the LOCK structure or
     SERIALIZED LIST structure in
     PHASE:  WAITING FOR QUIESCE
    
     PE INFORMATION:
     If the PTFs for OA27289 are not already applied, do not apply
     them.  If the PTFs are applied, avoid rebuilds and consider
     removing the PTFs.
    
     USERS AFFECTED:
     Users running z/OS 1.8 (HBB7730) or above
     with APAR OA27289 who are running in a Parallel Sysplex
     environment making use of coupling facility
     serialized (lock or serialized list) structures for
     sysplex-scope
     serialization functions. In particular, those using user-managed
     rebuild followed by a system-managed rebuild.
    
     APAR OA27289 PTFs:
     UA50728 (HBB7730)
     UA50729 (HBB7740)
     UA50730 (HBB7750)
     UA50731 (HBB7760)
    
     USER IMPACT:
     APAR OA27289 fixed the problem it reported but introduced a new
     problem.
    
     Problem 1
     ---------
     PURGEDQ processing driven as part of
     user-managed rebuild cleanup processing
     may encounter SRBs suspended awaiting
     completion of requests to the coupling
     facility (CF) and issue ABENDS047B to
     force their completion.  When these
     suspended requests were made on behalf
     of the rebuild new connector as part of
     IXLLOCK or IXLSYNCH processing and get
     targeted by the PURGEDQ, XES module
     level recovery processing converts the
     ABEND to indicate the suspend failed
     and causes connector termination.
     Terminating the connector in this case
     is deemed too harsh an action.
    
     Problem 2
     ---------
     Users running z/OS 1.10 (HBB7750) may
     encounter an ABENDS0C4 in IXLR1SUS due
     to incorrectly referencing internal XES
     storage related to the rebuild new
     connector which has been freed.
    
     Problem 3
     ---------
     Users running z/OS 1.10 (HBB7750) or
     above may encounter ABENDS047B followed
     by ABENDS026 RSN0C6C001 and ABENDS026
     RSN0C210101 and connector termination
     due to improper handling of the
     ABENDS047B by the module level recovery
     exit for IXLR1GLB.
    
     Problem 4
     ---------
     Improper handling of ABENDS047B by XES
     signalling routine IXLS1RNP may cause
     IXLLOCK request completion to not be
     made.  This may lead to hung IXLLOCK
     requests.
    
     The new problem introduced by the original
     APAR causes a system-managed rebuild of
     either a lock or a serialized list structure
     that follows a user-managed rebuild of the
     same structure to hang in the WAITING FOR
     QUIESCE phase.
    

Local fix

  •  BYPASS/CIRCUMVENTION:
     If OA27289 is applied then avoid rebuilds if at all possible.
     If OA27289 is not applied, there is no exposure to this defect.
     RECOVERY ACTION:
     If the hang is encountered, the connectors must be recycled.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: Users running z/OS 1.8 (HBB7730) or above    *
    *                 who are running in a Parallel Sysplex        *
    *                 environment making use of coupling facility  *
    *                 serialized (lock or serialized list)         *
    *                 structures for sysplex-scope serialization   *
    *                 functions.  In particular, those wanting to  *
    *                 install service provided by APAR OA27289 or  *
    *                 those with APAR OA27289 installed, and using *
    *                 user-managed rebuild followed by a           *
    *                 system-managed rebuild.                      *
    ****************************************************************
    * PROBLEM DESCRIPTION: Due to a problem introduced by APAR     *
    *                      OA27289, a system-managed rebuild       *
    *                      following a user-managed rebuild to     *
    *                      either the same lock or serialized list *
    *                      structure may hang in the WAITING FOR   *
    *                      QUIESCE phase of the system-managed     *
    *                      rebuild.  Attempts to subsequently stop *
    *                      the system-managed rebuild result in a  *
    *                      hang in the STOP phase of the           *
    *                      system-managed rebuild.                 *
    *                                                              *
    *                      SYSPLEXDS                               *
    ****************************************************************
    * RECOMMENDATION: PTFs should be applied to all systems        *
    *                 in the sysplex via rolling ipl.  Note        *
    *                 that PTFs for APAR OA27289 still need        *
    *                 to be applied with PTFs for APAR             *
    *                 OA31042.                                     *
    ****************************************************************
    APAR OA27289 introduced a problem where user-managed rebuild
    processing incorrectly leaves on an indication (RewaAutoQuiesce)
    that a connection to a serialized structure has been quiesced
    for an auto process, such as system-managed rebuild.  With this
    indication left on, subsequent system-managed rebuild
    processing incorrectly determines that the connection to the
    serialized structure is already quiesced when it is not.  This
    leads to hangs when subsequently attempting to start the
    system-managed rebuild, or when attempting to stop the
    system-managed rebuild after the hang condition is encountered.
    
    When the system-managed rebuild hangs while starting because of
    this problem, message IXC360I, issued in response to the
    D XCF,STR,STRNAME=str_name command, includes the following
    text:
         PHASE:  WAITING FOR QUIESCE
    
    When a subsequent attempt to stop the rebuild also hangs,
    message IXC360I for the same command includes the following
    text:
         REBUILD STOPPING
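
The two IXC360I text fragments quoted above can be checked for in captured console output. The following is a minimal Python sketch, not an IBM-supplied tool: the function name find_hang_markers and the idea of feeding it saved D XCF,STR output are assumptions made for illustration. A rebuild legitimately passes through the WAITING FOR QUIESCE phase, so a match only suggests this problem if the phase persists.

```python
# Marker strings quoted in this APAR's description of the IXC360I output.
HANG_MARKERS = ("PHASE: WAITING FOR QUIESCE", "REBUILD STOPPING")


def find_hang_markers(display_output):
    """Return the APAR's marker strings that occur in captured IXC360I text.

    Hypothetical helper: it only does text matching on output that was
    saved elsewhere; it does not issue any operator command.
    """
    # Collapse runs of blanks so column alignment in the display does not matter.
    lines = [" ".join(line.split()) for line in display_output.splitlines()]
    return [m for m in HANG_MARKERS if any(m in line for line in lines)]


if __name__ == "__main__":
    # Sample text taken from the display shown in the error description.
    sample = """\
IXC360I  11.14.14  DISPLAY XCF 163
STRNAME: str_name
 STATUS: REASON SPECIFIED WITH REBUILD START:
           OPERATOR INITIATED
         DUPLEXING REBUILD
           METHOD: SYSTEM-MANAGED
           PHASE:  WAITING FOR QUIESCE
"""
    print(find_hang_markers(sample))
```

Running the check against displays taken a few minutes apart, as in the timestamps of the error description, helps distinguish a hang from a rebuild that is still progressing.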
    

Problem conclusion

  • IXLR2MAN has been modified so that it does not manipulate the
    RewaAutoQuiesce indication during user-managed rebuild
    processing.  This avoids incorrectly turning the indication on
    and leaving it on, so subsequent system-managed rebuild
    processing no longer concludes that the connector to a
    serialized structure is quiesced when it is not.
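
The effect of the stale indication can be illustrated with a toy model. This Python sketch is not XES code: the Connection class, both functions, and the touches_flag switch are hypothetical, and only the name RewaAutoQuiesce comes from this APAR. It assumes, as the description states, that system-managed rebuild skips quiescing a connector whose indication is already on.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    quiesced: bool = False        # whether the connector is really quiesced
    auto_quiesce: bool = False    # stands in for the RewaAutoQuiesce indication


def user_managed_rebuild(conn, touches_flag):
    """Quiesce and unquiesce the connector, as a user-managed rebuild would."""
    conn.quiesced = True
    # ... rebuild work would happen here ...
    conn.quiesced = False
    if touches_flag:
        # Behavior with OA27289 alone, per the analysis: the unquiesce is
        # recorded as a quiesce event, so the indication ends up on.
        conn.auto_quiesce = True


def system_managed_quiesce_completes(conn):
    """Return True if the quiesce phase can finish, False if it would hang."""
    if conn.auto_quiesce:
        # The connector is believed to be quiesced already, so no quiesce
        # is driven; the phase only completes if that belief is true.
        return conn.quiesced
    conn.quiesced = True
    conn.auto_quiesce = True
    return True


if __name__ == "__main__":
    before_fix = Connection()
    user_managed_rebuild(before_fix, touches_flag=True)
    print("with OA27289 only:", system_managed_quiesce_completes(before_fix))

    after_fix = Connection()
    user_managed_rebuild(after_fix, touches_flag=False)
    print("with OA31042:", system_managed_quiesce_completes(after_fix))
```

In this model the first case never completes the quiesce phase, which corresponds to the WAITING FOR QUIESCE hang, while the second case completes because user-managed rebuild leaves the indication alone.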
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    OA31042

  • Reported component name

    CROSS SYS.EXT.S

  • Reported component ID

    5752SCIXL

  • Reported release

    760

  • Status

    CLOSED PER

  • PE

    YesPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2009-11-06

  • Closed date

    2009-12-03

  • Last modified date

    2010-11-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UA51564 UA51565 UA51566 UA51567

Modules/Macros

  •    IXLR2MAN
    

Fix information

  • Fixed component name

    CROSS SYS.EXT.S

  • Fixed component ID

    5752SCIXL

Applicable component levels

  • R730 PSY UA51564

       UP10/10/06 P F010 «

  • R740 PSY UA51565

       UP09/12/16 P F912 «

  • R750 PSY UA51566

       UP09/12/16 P F912 «

  • R760 PSY UA51567

       UP09/12/16 P F912 «
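
The PTF numbers in this APAR can be combined into a simple exposure check. The following Python sketch is illustrative only: the exposure function and its return strings are hypothetical, and pairing each release level (R730 through R760) with the FMID of the same number (HBB7730 through HBB7760) is an assumption. The PTF numbers themselves are copied from this document.

```python
# PTF numbers copied from this APAR. Pairing release Rnnn with FMID HBB7nnn
# is an assumption made for this sketch.
PTFS_BY_FMID = {
    "HBB7730": {"OA27289": "UA50728", "OA31042": "UA51564"},
    "HBB7740": {"OA27289": "UA50729", "OA31042": "UA51565"},
    "HBB7750": {"OA27289": "UA50730", "OA31042": "UA51566"},
    "HBB7760": {"OA27289": "UA50731", "OA31042": "UA51567"},
}


def exposure(fmid, applied_ptfs):
    """Classify a system from its FMID and the set of PTFs applied to it."""
    ptfs = PTFS_BY_FMID[fmid]
    if ptfs["OA27289"] not in applied_ptfs:
        # Without OA27289 there is no exposure to this defect.
        return "not exposed"
    if ptfs["OA31042"] in applied_ptfs:
        return "fixed"
    return "exposed"


if __name__ == "__main__":
    print(exposure("HBB7750", {"UA50730"}))
    print(exposure("HBB7750", {"UA50730", "UA51566"}))
```

The list of applied PTFs would have to come from the installation's own service records; this sketch does not query SMP/E.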

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
10 January 2021