IBM Support

OA61848: FALSE CONTENTION RATE INCREASE, EXCESSIVE CHASE PROCESSING FOR LTE GLOBALLY MANAGED WITHOUT RESOURCES 21/08/02 PTF PECHANGE

A fix is available

APAR status

  • Closed as program error.

Error description

  • The customer observed that the false contention rate increased
    after the PTF for OA60394 was rolled out to all LPARs in the plex.
    The false contention rate (i.e., false contention count / total
    contention) rose to between 12% and 19%, having been below 2%
    before.
    The LOCK structure itself had also been enlarged to double its
    previous number of locks.
    This condition is one of the problems related to the symptom
    described in OA60854.
    In this incident, the ISGLOCK structure for GRS was the structure
    of interest.
    Based on the SYSXES CTRACE collected on all members, together
    with the dumps captured by the following SLIP:
    SLIP
    SET,IF,RA=(6.10%+8C?+9C?+1C?+24?+FAE),DATA=(3R,GT,000007D0),ASID
    
    SVCD,JL=GRS,SDATA=(XESDATA,COUPLE,GRSQ,RGN,SQA,CSA,TRT,NUC,SUM),
    we could see that ENQ/DEQ requests for a single resource with
    EXCLusive interest were going to the GLM of the LTE on a
    remote LPAR. The remote LPAR had no other resource under
    that LTE. Ideally, the GLM role for this LTE should be
    de-escalated once the last resource under the LTE is DEQ'ed. But
    the pacing indicator was reset after the DEQ, so the actual
    de-escalation was delayed.
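
The ratio quoted above can be checked against raw counters. The following Python sketch is illustrative only: it uses the definition given in this description (false contention count divided by total contention), the function names are hypothetical, and the sample counts are invented rather than taken from the customer's data. The 2% baseline is the pre-PTF level reported above.

```python
def false_contention_rate(false_contention: int, total_contention: int) -> float:
    """Return false contention as a fraction of total contention.

    Uses the definition from the error description:
    false contention count / total contention.
    """
    if false_contention < 0 or total_contention < 0:
        raise ValueError("counts must be non-negative")
    if false_contention > total_contention:
        raise ValueError("false contention cannot exceed total contention")
    if total_contention == 0:
        return 0.0  # no contention at all, so no false contention either
    return false_contention / total_contention


def above_baseline(rate: float, baseline: float = 0.02) -> bool:
    """True when the rate is above the baseline (2% by default)."""
    return rate > baseline


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    rate = false_contention_rate(150, 1000)
    print(f"false contention rate: {rate:.1%}, above baseline: {above_baseline(rate)}")
```

Comparing the rate from before and after a maintenance roll-out in this way makes a regression such as the one reported here easier to spot.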
    
    Higher rates of false contention can also be observed whenever
    an LTE is being globally managed with no lock resources and a
    new request for the same LTE is processed by the global manager
    when no contention exists.
    
    During the implementation of the fix, we also recognized that
    global lock cleanup processing did NOT reset the global byte
    for the LTE in the structure once there was no resource under
    the LTE. In other words, once the LTE is placed into the
    deferred de-escalate state, global lock cleanup processing will
    NOT clear the GLM id in the structure. That could leave many
    more LTEs in the structure with a dirty global byte, which
    eventually triggers much more CHASE processing on all
    connectors when they try to update a resource under such an LTE
    with an outdated GLM connector id.
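
The effect of a dirty global byte can be pictured with a toy model. The Python sketch below is not XES code and does not reflect the real LTE layout: the class, the field names, and the `reset_empty_lte` switch are all assumptions made for illustration. It only shows why an LTE that keeps an outdated GLM connector id forces a chase, while an LTE whose GLM id was cleared does not.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LockTableEntry:
    """Toy stand-in for an LTE; not the real XES layout."""
    glm_id: Optional[int] = None  # "global byte": connector id of the GLM
    resources: Set[str] = field(default_factory=set)
    defer_deescalate: bool = False


def global_lock_cleanup(lte: LockTableEntry, reset_empty_lte: bool) -> None:
    """Model cleanup of an LTE in the deferred de-escalate state.

    reset_empty_lte=False models the behavior described above (GLM id kept).
    reset_empty_lte=True models clearing it once no resources remain.
    """
    if lte.defer_deescalate and not lte.resources and reset_empty_lte:
        lte.glm_id = None
        lte.defer_deescalate = False


def needs_chase(lte: LockTableEntry, active_connectors: Set[int]) -> bool:
    """A request has to chase when the LTE names a GLM that is no longer active."""
    return lte.glm_id is not None and lte.glm_id not in active_connectors


if __name__ == "__main__":
    active = {1, 3}  # hypothetical: connector 2, the former GLM, has gone away
    for reset in (False, True):
        lte = LockTableEntry(glm_id=2, defer_deescalate=True)
        global_lock_cleanup(lte, reset_empty_lte=reset)
        print(f"reset_empty_lte={reset}: chase needed = {needs_chase(lte, active)}")
```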
    
    
    PE INFORMATION:
    USERS AFFECTED:
    
    For false contention rate increase:
    
    All users with either OA60394 HBB77B0 PTF UJ04727 or HBB77C0 PTF
    UJ04728 installed supporting connectors to lock structures.
    
    APAR OA60394 corrected a performance impact due to a higher
    false contention rate for a lock structure that resulted from
    the fix for APAR OA59122 HBB77B0 PTF UJ03932 and HBB77C0 PTF
    UJ03933.
    
    
    For excessive Chase processing when LTE globally managed with no
    lock resources:
    
    All users with OA59122 PTF UJ03932 or UJ03933 installed
    supporting connectors to lock structures.
    
    APAR OA59122 addressed a problem where locking connectors become
    message-isolated when a burst of IXLLOCK requests releases a
    large number of globally managed resources causing the XES
    global management of those resources to end suddenly.
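
The two "users affected" groups above can be expressed as a simple lookup. The Python sketch below is a hedged illustration, not an IBM tool: the PTF numbers are the ones listed in this APAR, while the function name and the idea of passing in a plain list of installed PTFs are assumptions. It treats a system as no longer exposed once any of the OA61848 PTFs (UJ07740, UJ07741, UJ07742) is present, and it does not check that the fix is active on every system in the sysplex.

```python
# PTF numbers as listed in this APAR.
OA60394_PTFS = {"UJ04727", "UJ04728"}             # false contention rate increase
OA59122_PTFS = {"UJ03932", "UJ03933"}             # excessive chase processing
OA61848_PTFS = {"UJ07740", "UJ07741", "UJ07742"}  # fixing PTFs


def exposures(installed_ptfs):
    """Return the OA61848 symptoms a system may be exposed to.

    installed_ptfs is any iterable of PTF numbers (hypothetical input format).
    """
    installed = {ptf.strip().upper() for ptf in installed_ptfs}
    if installed & OA61848_PTFS:
        return []  # fixing PTF present on this system
    found = []
    if installed & OA60394_PTFS:
        found.append("false contention rate increase")
    if installed & OA59122_PTFS:
        found.append("excessive chase processing")
    return found


if __name__ == "__main__":
    print(exposures(["UJ03932", "UJ04727"]))
    print(exposures(["UJ03932", "UJ04727", "UJ07740"]))
```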
    
    
    USER IMPACT:
    
    For false contention rate increase:
    
    APAR OA60394 did not completely fix the problem it reported.
    
    APAR OA60394 reduced much of the performance impact caused by a
    higher rate of false contention encountered when XES lock
    resources are no longer in contention and XES global management
    of the resources does not end in a timely manner. However, some
    performance impact due to a higher rate of false contention for
    a lock structure may still be encountered when a single
    exclusive resource being globally managed by another connector
    is no longer in contention. It may also be encountered when an
    LTE is being globally managed with no lock resources and a new
    request for the same LTE is processed by the global manager when
    no contention exists.
    While spikes of false contention rates from 1-19% have been
    observed due to this problem, no severe impact has been
    encountered from it.
    
    For excessive chase processing when LTE globally managed with no
    lock resources:
    
    APAR OA59122 fixed the problem it reported but introduced a new
    problem.
    
    APAR OA59122 better handles bursts of IXLLOCK requests that
    cause global management of a large number of resources to end.
    It avoids connectors becoming message-isolated (and the
    associated MSGIXC638I and MSGIXC645E being issued) and avoids
    exploiter delays (and ABENDS026 RSN08118001 being issued) due to
    the associated processing. However, it introduces some
    performance impact by triggering chase processing on all
    connectors when they try to update the resource being managed
    for LTEs that were improperly cleaned up during connector
    termination. During chase processing, the lock structure
    exploiter is unable to access the LTE. Excessive chase
    processing may result in other problems such as those described
    for APAR OA61788.
    

Local fix

  • BYPASS/CIRCUMVENTION:
    No local fix
    
    RECOVERY ACTION:
    Running with the fix for OA60394 should not be a severely
    impactful condition.
    Apply the ++APAR for OA61848 when it becomes available
    for the final fix.
    The ++APAR packages mentioned on 08/10/2021 were removed because
    the fix package did NOT include the fix for the global lock
    cleanup change.              08/24/2021 POK NCH
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users who are running in a Parallel Sysplex environment      *
    * making use of coupling facility (CF) lock structures for     *
    * sysplex-scope serialization functions with APAR OA59122 or   *
    * OA60394 installed.                                           *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * High rates of false contention may be encountered due to     *
    * mismanagement of an internal XES indication when a connector *
    * other than the global lock manager (GLM) is the only one     *
    * with an exclusive interest in a resource that is no longer   *
    * in contention. The high false contention rate may also be    *
    * encountered whenever a lock table entry (LTE) is being       *
    * globally managed with no lock resources and a new request    *
    * for the same LTE is processed by the global manager when no  *
    * lock contention exists. Additionally, peer survivor recovery *
    * processing for a terminating connector with LTEs globally    *
    * managed with no lock resources can fail to properly clean up *
    * the LTE. This can result in a performance problem during     *
    * request processing as XES chase processing corrects the LTE  *
    * with a proper GLM.                                           *
    *                                                              *
    * SYSPLEXDS                                                    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * This PTF will not be fully effective on the system it is     *
    * being applied until the PTF(s) for this APAR are applied to  *
    * all systems in the sysplex.                                  *
    * An IPL is required to activate this fix on each system of    *
    * the SYSPLEX, however, a rolling IPL is sufficient to         *
    * accomplish the activation.                                   *
    ****************************************************************
    With APAR OA59122, LTEs being globally managed with no lock
    resources may encounter false contention when a new request for
    the same LTE is processed by the global manager even though no
    lock contention exists. This contributes to a higher rate
    of false contention. Peer survivor recovery processing for a
    terminating connector with LTEs in this state may also fail to
    clean up the LTE properly. This can subsequently cause
    excessive chase processing for the LTE in order to correct it.
    
    Due to an error with the management of an internal XES
    indication with APAR OA60394, a high rate of false contention
    may be encountered when a connector other than the GLM is the
    only one left with an exclusive interest in a resource that is
    no longer in contention.
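
Because the recommendation above states that the PTF is not fully effective until it is applied, and activated by IPL, on every system in the sysplex, a roll-out can be tracked with a small check. The Python sketch below is an assumption-laden illustration: the release-to-PTF mapping comes from the "Applicable component levels" list in this APAR, but the system names, the input format, and the function name are hypothetical, and the caller is assumed to pass only PTFs that are already active after an IPL.

```python
# Fixing PTF per release, from "Applicable component levels" in this APAR.
FIX_PTF_BY_RELEASE = {"7B0": "UJ07740", "7C0": "UJ07741", "7D0": "UJ07742"}


def systems_missing_fix(systems):
    """Return the names of systems that still lack the OA61848 PTF.

    systems maps a system name to (release, iterable of active PTFs).
    This input shape is hypothetical, chosen for illustration.
    """
    missing = []
    for name, (release, ptfs) in sorted(systems.items()):
        fix = FIX_PTF_BY_RELEASE.get(release)
        if fix is None:
            raise ValueError(f"no OA61848 PTF listed for release {release}")
        if fix not in {ptf.strip().upper() for ptf in ptfs}:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical sysplex part-way through a rolling IPL.
    plex = {
        "SYSA": ("7C0", ["UJ03933", "UJ04728", "UJ07741"]),
        "SYSB": ("7C0", ["UJ03933", "UJ04728"]),
        "SYSC": ("7D0", ["UJ07742"]),
    }
    print("still to be IPLed with the fix:", systems_missing_fix(plex))
```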
    

Problem conclusion

  • Logic added to reduce high rates of false contention by:
    1. No longer reporting requests for LTEs being globally managed
       with no lock resources that are not in contention.
    2. Correcting management of the internal XES indication when a
       connector other than the GLM is the only one left with an
       exclusive interest in a resource that is no longer in
       contention.
    
    Logic added to peer survivor recovery processing to properly
    clean up LTEs being globally managed with no lock resources.
    
    +--- PUBLICATIONS AFFECTED -----------------------------------+
    |                                                             |
    | o   z/OS MVS System Messages, Vol 10 (IXC-IZP),             |
    |     SA38-0677-xx                                            |
    +-------------------------------------------------------------+
    
    In z/OS MVS System Messages, Vol 10 (IXC-IZP), MSGIXL016I is
    updated to include new Additional Status Information, and
    include other missing information:
    
      IXL016I CONNECTOR conname TO [NEW] STRUCTURE strname
      [DISCONNECTING | TERMINATING]:
      JOB jobname ASID asid trigger.
    | [statusinfo]
    
      Explanation
    
      Connector disconnect or termination processing is being
      performed by XES.
    
      In the message text:
    
      conname
        The name of the CF structure connection.
    | NEW
    |   Indicates processing for the rebuild connection.
    |   When not included indicates processing for the
    |   original or only connection.
      strname
        The name of the CF structure.
    | DISCONNECTING
    |   Indicates the connector is disconnecting.
    | TERMINATING
    |   Indicates the connector is terminating.
    ...
      trigger
        One of the following:
    ...
          REQUESTED DISCONNECT WITH LOCK RESOURCES
            The connector used IXLDISC REASON=NORMAL or IXLDISC
            REASON=DELETESTR to disconnect from the structure, but
            the lock structure connector held lock resources.
    
    |       For connections to a lock structure with RECORD=YES,
    |       the disconnect is treated as if the connector had used
    |       IXLDISC REASON=FAILURE. Note in particular that if
    |       IXLDISC REASON=DELETESTR was used in this case and the
    |       structure has a disposition of keep, the structure will
    |       persist even if it has no connectors.
    ...
    |     REQUESTED DISCONNECT REASON=DELETESTR
    |       The connector used IXLDISC REASON=DELETESTR to
    |       disconnect from the structure.
    | statusinfo
    |   Status information which may include:
    |
    |     ADDITIONAL STATUS INFORMATION:
    |       This line is issued whenever there is additional status
    |       information to be displayed. It is followed by the
    |       following line:
    |
    |       - MANAGING CONTENTION WITH NO LOCK RESOURCES
    |         The lock structure connector is managing contention
    |         with no lock resources currently held. Lock table
    |         cleanup by peer connectors is required.
    ...
      System action
    
      The system continues processing.
    
    | When z/OS is running on a server that supports recovery
    | process boosts, coupling facility disconnect processing takes
    | advantage of System Recovery Boost's recovery process boost
    | support to expedite lock structure connector cleanup
    | processing performed by the system and active connected
    | users. When a connected user is disconnected from a lock
    | structure while holding lock resources, managing contention
    | with no lock resources currently held or is disconnected
    | implicitly as the result of task termination, address space
    | termination or a system being removed from the sysplex, a
    | System Recovery Boost will be requested on the systems where
    | active connected users receive the Disconnected or Failed
    | Connection event.
    ...
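
For automation that scans the log for the new status line, the message layout above can be matched with a regular expression. The Python sketch below is an assumption, not an IBM-supplied parser: it is built only from the message template shown in this APAR, it treats the DISCONNECTING/TERMINATING keyword as optional, and the function name and returned field names are hypothetical. Real console output may wrap or space the text differently, so the sketch collapses whitespace before matching.

```python
import re
from typing import Optional

# Pattern derived from the IXL016I template shown above (whitespace collapsed).
_IXL016I = re.compile(
    r"IXL016I CONNECTOR (?P<conname>\S+) TO (?P<new>NEW )?STRUCTURE (?P<strname>\S+?)"
    r"(?: (?P<action>DISCONNECTING|TERMINATING))?: "
    r"JOB (?P<jobname>\S+) ASID (?P<asid>\S+) (?P<trigger>.+?)\.(?: |$)"
)
NO_RESOURCES = "MANAGING CONTENTION WITH NO LOCK RESOURCES"


def parse_ixl016i(text: str) -> Optional[dict]:
    """Parse one IXL016I message; return None when the text does not match."""
    flat = " ".join(text.split())  # join wrapped lines, collapse blanks
    match = _IXL016I.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    fields["new"] = fields["new"] is not None
    # The additional status line, when present, follows the trigger text.
    fields["managing_without_resources"] = NO_RESOURCES in flat[match.end():]
    return fields
```

A caller could, for example, count messages whose `managing_without_resources` field is true to see how often peer connectors are asked to perform lock table cleanup.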
    
    +--- PUBLICATIONS AFFECTED -----------------------------------+
    |                                                             |
    | o   z/OS MVS System Codes,                                  |
    |     SA38-0665-xx                                            |
    +-------------------------------------------------------------+
    
    Change reason code 0A0D0101 shown under Operator response for
    026 System completion code to 0A0Dxxxx.
    
    +--- PUBLICATIONS AFFECTED -----------------------------------+
    |                                                             |
    | o   z/OS MVS Programming: Sysplex Services Guide,           |
    |     SA23-1400-xx                                            |
    +-------------------------------------------------------------+
    
    Under Sysplex Services for Data Sharing (XES)
      Connection Services make the following updates/corrections:
    
    Under Disconnecting from a Coupling Facility Structure,
      Persistence Considerations,
        Handling Resources for a Disconnection:
    
    Update description for handling resources for a disconnection
    from a lock structure:
    
    ...
      -Lock structure
    ...
    | When z/OS is running on a server that supports recovery
      process boosts, coupling facility disconnect processing takes
      advantage of System Recovery Boost's recovery process boost
      support to expedite lock structure connector cleanup
      processing performed by the system and active connected
      users. When a connected user is disconnected from a lock
    | structure while holding lock resources, managing contention
    | with no lock resources currently held or is disconnected
    | implicitly as the result of task termination, address space
      termination or a system being removed from the sysplex, a
      System Recovery Boost will be requested on the systems where
      active connected users receive the Disconnected or Failed
      Connection event.
    
    Under Disconnecting from a Coupling Facility Structure,
      Persistence Considerations,
        Disconnection to Delete the Structure:
    
    Update note:
    
    | Note: If a connected user with RECORD=YES disconnects with
      REASON=DELETESTR on IXLDISC while still holding locks, XES
      treats the disconnect as if the user had specified
      REASON=FAILURE on IXLDISC.
    
    Under Structure Concepts,
      Identifying Connection States:
    
    Update note:
      *An IXLDISC REASON=NORMAL or REASON=DELETESTR request by a
    | connection with RECORD=YES which owns resources in a lock
      structure will be converted to an IXLDISC REASON=FAILURE
      request.
    
    Under Structure Concepts,
      Understanding Connection Persistence and Structure
      Persistence,
        Connection Persistence:
    
    Update last paragraph:
    
      If the connection terminates normally (disconnect with
      REASON=NORMAL or REASON=DELETESTR), the persistence attribute
      for the connection does not apply, and so the connection
    | becomes not defined. However, if a connector with RECORD=YES
      to a lock structure disconnects with REASON=NORMAL or
      REASON=DELETESTR while still owning resources associated
      with the lock structure, XES converts the reason to
      REASON=FAILURE.
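
The conversion rule repeated in the notes above can be summarized as a small decision function. The Python sketch below is an illustration of the documented rule only: it is not how XES implements it, and the function and parameter names are hypothetical. It returns the reason that XES is described as acting on for a lock structure connector.

```python
VALID_REASONS = {"NORMAL", "DELETESTR", "FAILURE"}


def effective_disconnect_reason(reason: str, record_yes: bool,
                                holds_lock_resources: bool) -> str:
    """Return the IXLDISC reason acted on for a lock structure disconnect.

    Per the notes above: a RECORD=YES connector that disconnects with
    REASON=NORMAL or REASON=DELETESTR while still holding lock resources
    is treated as if it had specified REASON=FAILURE.
    """
    reason = reason.upper()
    if reason not in VALID_REASONS:
        raise ValueError(f"unknown IXLDISC reason: {reason}")
    if record_yes and holds_lock_resources and reason in {"NORMAL", "DELETESTR"}:
        return "FAILURE"
    return reason


if __name__ == "__main__":
    for reason in ("NORMAL", "DELETESTR", "FAILURE"):
        print(reason, "->", effective_disconnect_reason(reason, True, True))
```

One consequence noted in the IXL016I text is that a converted REASON=DELETESTR no longer deletes a structure with a disposition of keep, so the structure can persist with no connectors.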
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    OA61848

  • Reported component name

    CROSS SYS.EXT.S

  • Reported component ID

    5752SCIXL

  • Reported release

    7B0

  • Status

    CLOSED PER

  • PE

    YesPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-07-28

  • Closed date

    2022-02-09

  • Last modified date

    2022-03-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UJ07740 UJ07741 UJ07742

Modules/Macros

  • IXLR1GLB IXLR1GLU IXLF1TFT IXLC1DCN IXLR2SSF IXCL2GAT IXLR2SSD
    IXLF1TX2 IXLF1TF5 IXLM1MST IXLR1GLC IXLI1SIN
    

Publications Referenced
SA380677XX SA380665XX SA231400XX

Fix information

  • Fixed component name

    CROSS SYS.EXT.S

  • Fixed component ID

    5752SCIXL

Applicable component levels

  • R7D0 PSY UJ07742

       UP22/02/23 P F202

  • R7C0 PSY UJ07741

       UP22/02/23 P F202

  • R7B0 PSY UJ07740

       UP22/02/23 P F202

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
02 March 2022