IBM Support

OA51685: ORPHANED LATCH SYS1.ICSF.LATCH.GLOBAL SYSZCKT ENQ ABEND0C4 CSFNCKDL ABEND18F RSN460 ISGLRELS

A fix is available


APAR status

  • Closed as program error.

Error description

  • After migrating to ICSF release HCR77B0 with the
    ICSF_KEY_EXPIRATION health check active (it is enabled by default),
    the LPAR began experiencing persistent latch contention on latch #1
    within the SYS1.ICSF.LATCH.GLOBAL latch set. HZSPROC was listed
    as the holder and an ICSF task was listed as the waiter:
    
    LATCH SET NAME:  SYS1.ICSF.LATCH.GLOBAL
    CREATOR JOBNAME: CSFPROD   CREATOR ASID: 0061
      LATCH NUMBER:  1
        REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
        HZSPROC    0014  SHARED     OWN       007F83F0   Y   05:27:00.364
        CSFPROD    0061  EXCLUSIVE  WAIT      007F8588   Y   00:09:41.933
    
    The ICSF task waiting for the latch above also holds the
    following SYSZCKT ENQ in shared (SHR) mode:
    
    S=SYSTEMS SYSZCKT  CRPROD.CSF.SCSFCKDS.V01
     SYSNAME   JOBNAME   ASID     TCBADDR   EXC/SHR    STATUS
     PRDD      CSFPROD   0061     007F8588  SHARE      OWN
     PRDL      CSFPROD   0059     007CD528  EXCLUSIVE  WAIT
     PRDT      CSFPROD   0057     007CD528  EXCLUSIVE  WAIT
     PRDC      CSFPROD   005F     007CD528  EXCLUSIVE  WAIT
    
    Other systems in the plex also queue behind this SYSZCKT
    ENQ, causing plexwide ICSF contention. As a result, many
    applications throughout the plex (including CICS and DB2)
    that require certain ICSF functions hang or time out
    (ABENDRLEL / ABENDU4082).
    
    ADDITIONAL SYMPTOMS:
    1. The CICS QR TCB was blocked causing the CICS region to stall.
    Dependent regions see timeout abends such as RLEL, AZI4 and
    AZI2 due to excessive IR WAIT delays for mirror (CSMI) tasks.
    2. DB2 L8 TCBs, up to the TCBLIMIT threshold, hang, causing
    tasks to abend AEXZ after timing out while waiting for an open
    L8 TCB. Tasks on the L8 TCBs appear to be "Running" in CICS
    terms but are actually waiting for ICSF to respond.
    In SMF data, tasks abending AEXZ have MaxOTDly times that match
    the task's Deadlock Timeout Interval.
    3. Many CICS TCBs are in a permanent WAIT in module IEAVEPS1,
    showing a PC call from module CSFINLP2+X'4A4'.
    4. ICSF ABEND18F RSN460 occurs in CSFPLKUP because the SRB
    responsible for updating the current messaging sequence number
    gets stuck behind the SYS1.ICSF.LATCH.GLOBAL contention. The
    "CSFPLKUP PrepMsg" CTRACE entry shows an underlying RC1F
    RSN8CC0, and the SRB appears as an EXCL latch waiter with a
    WEB address (pointing to the SRB).
    
    ANALYSIS:
    The ICSF_KEY_EXPIRATION health check, introduced in ICSF release
    HCR77B0, calls the CSFKDSL service, which invokes CSFNCKDL. The
    List_Labels routine queries information within an LblF_Table
    without checking whether the table contains any entries. Because
    of the missing check, CSFNCKDL picks up an invalid length and uses
    it to copy residual stack data from the LBLF_AREA into CA_LABEL,
    causing a large overlay of CSFNCKDL's stack. ICSF's latch token is
    part of the overlaid data, leading to two possible outcomes:
      * The token is overlaid with garbage, causing an ABEND0C4 in
        GRS ISGLRELS code.
      * The token is overlaid with zeros, so the associated latch
        becomes orphaned with no outward symptoms until an EXCL
        waiter enters the picture.
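    The defect class described above, trusting a length field from a
    table without first checking that the table has any entries, can be
    modeled in a few lines of Python. This is a simulation only:
    CSFNCKDL's source is not published, so the frame layout, the 8-byte
    field sizes, and the function names (new_frame, token_of,
    copy_label_unchecked, copy_label_checked) are hypothetical. The
    simulated frame places an 8-byte latch token directly after the
    CA_LABEL area so that an overlong copy reaches it.

```python
LABEL_AREA = 8  # hypothetical size of the CA_LABEL output field, in bytes
TOKEN_LEN = 8   # hypothetical size of the latch token stored right after it


def new_frame(token: bytes) -> bytearray:
    """Simulated stack frame: CA_LABEL area followed by the latch token."""
    assert len(token) == TOKEN_LEN
    return bytearray(LABEL_AREA) + bytearray(token)


def token_of(frame: bytearray) -> bytes:
    """Return the latch token portion of the simulated frame."""
    return bytes(frame[LABEL_AREA:LABEL_AREA + TOKEN_LEN])


def copy_label_unchecked(frame: bytearray, count: int, length: int,
                         residue: bytes) -> None:
    """Models the defect: `count` is ignored, so a stale `length` is
    trusted even for an empty table and the copy runs past CA_LABEL."""
    n = min(length, len(frame), len(residue))  # keeps the simulation in bounds
    frame[0:n] = residue[0:n]


def copy_label_checked(frame: bytearray, count: int, length: int,
                       residue: bytes) -> None:
    """Models the fix: do nothing for an empty table and never copy
    more than CA_LABEL can hold."""
    if count == 0:
        return
    n = min(length, LABEL_AREA, len(residue))
    frame[0:n] = residue[0:n]
```

    Copying 16 stale zero bytes with
    copy_label_unchecked(frame, 0, 16, bytes(16)) zeros the token (the
    orphaned-latch case), while stale non-zero bytes leave a garbage
    token (corresponding to the ABEND0C4 case). copy_label_checked
    leaves the token intact in both situations.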
    
    
    VERIFICATION STEPS:
    1. On one system in the plex, D GRS,C shows jobname HZSPROC
       holding latch #1 or #2 in latch set SYS1.ICSF.LATCH.GLOBAL
       SHR for a persistently long period of time.
    2. An ICSF task is likely also holding the SYSZCKT ENQ, as
       shown in the display above.
    3. The holder of the SYSZCKT ENQ is waiting for latch #1 or #2
       EXCL on the same system.
    4. If there are plexwide symptoms, D GRS,C on the other systems
       shows all contention tracing back to a local ICSF task
       waiting on the SYSZCKT ENQ held in step 2.
    5. On the system with the HZSPROC latch holder, issue
       IP VERBX CSFDATA 'CELL' and search for HZSPROC's ASCB
       address. It should not appear in the display, indicating
       that the HZSPROC task is no longer active within an ICSF
       syscall.
    6. Check for any occurrences of ABEND0C4 in CSFNCKDL+???
       on any system in the plex. Note that systems with orphaned
       latches will likely not show this ABEND0C4.
    7. Confirm that the ICSF release on the system with the HZSPROC
       latch holder is HCR77B0 or higher.
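    Verification steps 1 and 3 can be partly automated by scanning
    captured D GRS,C latch contention output. The sketch below is a
    hypothetical helper, not an IBM tool: it assumes each requestor row
    sits on one line in the column order of the latch display shown
    earlier (REQUESTOR, ASID, EXC/SHR, OWN/WAIT, WORKUNIT, TCB, ELAPSED
    TIME), and the 300-second default threshold is an arbitrary choice
    for illustration. It only parses text that has already been
    captured; it does not issue any commands.

```python
import re

# Assumed layout: one requestor row per line, in the column order
# REQUESTOR  ASID  EXC/SHR  OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
ROW = re.compile(
    r"^\s*(?P<job>\S+)\s+(?P<asid>[0-9A-F]{4})\s+"
    r"(?P<mode>SHARED|EXCLUSIVE)\s+(?P<state>OWN|WAIT)\s+"
    r"\S+\s+\S+\s+(?P<elapsed>\d+:\d{2}:\d{2}\.\d+)\s*$"
)


def elapsed_seconds(text: str) -> float:
    """Convert an ELAPSED TIME value such as 05:27:00.364 to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def orphan_suspect(display: str, threshold_s: float = 300.0) -> bool:
    """Return True if HZSPROC owns the latch SHARED for longer than
    threshold_s seconds while some requestor waits for it EXCLUSIVE
    (verification steps 1 and 3). Header lines do not match ROW."""
    rows = [m.groupdict() for m in map(ROW.match, display.splitlines()) if m]
    long_hzs_holder = any(
        r["job"] == "HZSPROC"
        and r["mode"] == "SHARED"
        and r["state"] == "OWN"
        and elapsed_seconds(r["elapsed"]) > threshold_s
        for r in rows
    )
    excl_waiter = any(
        r["mode"] == "EXCLUSIVE" and r["state"] == "WAIT" for r in rows
    )
    return long_hzs_holder and excl_waiter


# Rows from the latch display in the error description, one per line.
SAMPLE = """\
    REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
    HZSPROC    0014  SHARED     OWN       007F83F0   Y   05:27:00.364
    CSFPROD    0061  EXCLUSIVE  WAIT      007F8588   Y   00:09:41.933
"""
```

    For SAMPLE, orphan_suspect returns True: HZSPROC has held the latch
    SHARED for 05:27:00.364 (about 19,620 seconds) while CSFPROD waits
    EXCLUSIVE. A True result is only a hint; steps 2, 4, and 5 still
    need to be checked by hand.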
    
    ADDITIONAL KEYWORDS:
    CDB2CONN MXTDelay OPENPOOL IRLINK
    

Local fix

  • CIRCUMVENTION/BYPASS:
    Deactivating the ICSF_KEY_EXPIRATION health check prevents the
    reported issue. To deactivate the check from SDSF, go to the CK
    panel and enter an "H" to the left of the ICSF_KEY_EXPIRATION
    health check. Exit and reenter the CK panel to verify that the
    check now looks as follows:
    
        ICSF_KEY_EXPIRATION       IBMICSF INACTIVE(ENABLED)
    
    Alternatively, you could issue the following console command to
    make the same dynamic change:
    
        MODIFY HZSPROC,DEACTIVATE,CHECK(IBMICSF,ICSF_KEY_EXPIRATION)
    
    Note that this dynamic deactivation will NOT persist across
    restarts of ICSF. To make the change persistent, add the
    following statement to your health check parmlib member
    (HZSPRMxx) on all systems running ICSF release HCR77B0 or
    higher:
    
        UPDATE CHECK(IBMICSF,ICSF_KEY_EXPIRATION) INACTIVE
    
    ++APARs are now available.
    RECOVERY ACTION:
    Recycle ICSF on any system showing HZSPROC as a persistent
    holder of latch #1 or latch #2 in the SYS1.ICSF.LATCH.GLOBAL
    latch set (D GRS,C shows the HZSPROC task holding this latch
    persistently for minutes or hours). After ICSF is recycled, any
    application using SSL must also be recycled.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: Users of ICSF with a CKDS and/or PKDS with   *
    *                 the common record format (KDSR)              *
    ****************************************************************
    * PROBLEM DESCRIPTION: Latch contention was caused by the      *
    *                      ICSF_KEY_EXPIRATION health check not    *
    *                      releasing the CKDS and PKDS latches     *
    *                      properly during processing. Internal    *
    *                      storage was overlaid causing the        *
    *                      release of the latches not to be        *
    *                      attempted.                              *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    ----------------------PROBLEM SUMMARY--------------------------
    Internal storage including the latch token was overlaid during
    processing. The CKDS and PKDS latches were not properly released
    by the ICSF_KEY_EXPIRATION health check.
    

Problem conclusion

  • The cause of the storage overlay was determined and corrected.
    The CKDS and PKDS latches will be properly released by the
    ICSF_KEY_EXPIRATION health check.
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    OA51685

  • Reported component name

    ICSF/MVS

  • Reported component ID

    568505101

  • Reported release

    7B0

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-11-23

  • Closed date

    2016-12-08

  • Last modified date

    2017-01-06

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UA83599 UA83600 UA83601

Modules/Macros

  • CSFNCKDL
    

Fix information

  • Fixed component name

    ICSF/MVS

  • Fixed component ID

    568505101

Applicable component levels

  • R7B0 PSY UA83599

       UP16/12/09 P F612

  • R7B1 PSY UA83600

       UP16/12/09 P F612

  • R7C0 PSY UA83601

       UP16/12/09 P F612

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
06 January 2017