A fix is available
APAR status
Closed as program error.
Error description
After migrating to ICSF release HCR77B0 with the ICSF_KEY_EXPIRATION health check active (it is enabled by default), the LPAR began experiencing persistent latch contention on latch #1 within the SYS1.ICSF.LATCH.GLOBAL latch set. HZSPROC was listed as the holder and an ICSF task was listed as the waiter:

  LATCH SET NAME:  SYS1.ICSF.LATCH.GLOBAL
  CREATOR JOBNAME: CSFPROD  CREATOR ASID: 0061
  LATCH NUMBER: 1
  REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
  HZSPROC    0014  SHARED     OWN       007F83F0  Y    05:27:00.364
  CSFPROD    0061  EXCLUSIVE  WAIT      007F8588  Y    00:09:41.933

The ICSF task waiting for the latch above is also holding the following ENQ SHR:

  S=SYSTEMS SYSZCKT  CRPROD.CSF.SCSFCKDS.V01
  SYSNAME  JOBNAME  ASID  TCBADDR   EXC/SHR    STATUS
  PRDD     CSFPROD  0061  007F8588  SHARE      OWN
  PRDL     CSFPROD  0059  007CD528  EXCLUSIVE  WAIT
  PRDT     CSFPROD  0057  007CD528  EXCLUSIVE  WAIT
  PRDC     CSFPROD  005F  007CD528  EXCLUSIVE  WAIT

Other systems in the plex also get stuck behind this SYSZCKT ENQ, causing plexwide ICSF contention. As a result, many applications throughout the plex (including CICS and DB2) that require certain ICSF functionality will hang and/or time out (AbendRLEL / ABENDU4082).

ADDITIONAL SYMPTOMS:
1. The CICS QR TCB was blocked, causing the CICS region to stall. Dependent regions see timeout abends such as RLEL, AZI4 and AZI2 due to excessive IR WAIT delays for mirror (CSMI) tasks.
2. DB2 L8 TCBs, up to the TCBLIMIT threshold, are hung, causing tasks to abend AEXZ; these tasks timed out waiting for an OPEN L8 TCB. Tasks on the L8 TCBs appear to be **Running** in CICS terms but are waiting for ICSF to respond. In SMF data, tasks abending AEXZ have MaxOTDly times that match the task's Deadlock Timeout Interval.
3. Many CICS TCBs are in a permanent WAIT in module IEAVEPS1, showing a PC call from module CSFINLP2 + X'4A4'.
4. ICSF ABEND18F RSN460 in CSFPLKUP, because the SRB responsible for updating the current messaging sequence number gets hung up behind the SYS1.ICSF.LATCH.GLOBAL contention. The "CSFPLKUP PrepMsg" CTRACE entry will show an underlying RC1F RSN8CC0, and the SRB will show up as an EXCL latch waiter with a WEB address (pointing to the SRB).

ANALYSIS:
The ICSF_KEY_EXPIRATION health check, introduced at ICSF release HCR77B0, calls the CSFKDSL service, which invokes CSFNCKDL. The List_Labels routine queries information within an LblF_Table without checking whether the table contains any entries. Because of this faulty check, the routine picks up an invalid length and uses it to copy residual stack data from the LBLF_AREA into CA_LABEL, causing a large overlay of CSFNCKDL's stack. The latch token is part of the data that gets overlaid, leading to two possible outcomes:
* The token is overlaid with garbage, causing an abend0C4 in GRS ISGLRELS code.
* The token is overlaid with zeros, so the associated latch becomes orphaned with no outward symptoms until an EXCL waiter enters the picture.

Verification Steps:
1. On one system in the plex, D GRS,C will show jobname HZSPROC holding either latch #1 or #2 in latch set SYS1.ICSF.LATCH.GLOBAL for a persistently long period of time. It holds the latch SHR.
2. An ICSF task will likely also be holding the SYSZCKT ENQ, as shown in the display above.
3. The holder of the SYSZCKT ENQ is waiting for latch #1 or #2 EXCL on the same system.
4. If symptoms are plexwide, D GRS,C on other systems will show a picture where all contention traces back to a local ICSF task waiting on the SYSZCKT ENQ held in step 2.
5. On the system with the HZSPROC latch holder, issue IP VERBX CSFDATA 'CELL' and search on HZSPROC's ASCB address. It should not be in the display, indicating that the HZSPROC task is no longer active within an ICSF syscall.
6. Check for any occurrences of ABEND0C4 in CSFNCKDL+??? on any system in the plex. Note that systems with orphaned latch(es) will likely not show this ABEND0C4.
7. Ensure the ICSF release is HCR77B0 or higher on the system with the HZSPROC latch holder.

ADDITIONAL KEYWORDS: CDB2CONN MXTDelay OPENPOOL IRLINK
Local fix
CIRCUMVENTION/BYPASS:
Disabling the ICSF_KEY_EXPIRATION health check will prevent the reported issue. To disable the health check from SDSF:
1. Go to the SDSF panel -> CK.
2. Put an "H" to the left of the ICSF_KEY_EXPIRATION health check to disable the check.
3. Exit and reenter the CK panel to verify that your check now looks as follows:

  ICSF_KEY_EXPIRATION IBMICSF INACTIVE(ENABLED)

Alternatively, you can issue the following console command to make the same dynamic change:

  MODIFY HZSPROC,DEACTIVATE,CHECK(IBMICSF,ICSF_KEY_EXPIRATION)

Note that this dynamic health check deactivation will NOT persist across restarts of ICSF. To make a persistent change, modify your health check parmlib member (HZSPRMxx) with the following line on all systems running ICSF release HCR77B0 or higher:

  UPDATE CHECK(IBMICSF,ICSF_KEY_EXPIRATION) INACTIVE

++APARs are now available.

RECOVERY ACTION:
Recycle ICSF on any systems showing HZSPROC as a persistent holder of latch #1 or latch #2 in the SYS1.ICSF.LATCH.GLOBAL latch set (D GRS,C will show the HZSPROC task holding this latch persistently for minutes or hours). Once ICSF is recycled, any application using SSL will also need to be recycled.
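The check for a persistent HZSPROC latch holder could be scripted against captured display output. The following is a minimal Python sketch, assuming the contention rows use the seven-column layout shown in the error description (REQUESTOR, ASID, EXC/SHR, OWN/WAIT, WORKUNIT, TCB, ELAPSED TIME). The helper names and the 300-second threshold are illustrative assumptions, not part of any IBM tooling; the APAR itself only says "minutes or hours".

```python
from dataclasses import dataclass

# Sample rows copied from the latch display in the error description.
SAMPLE = """\
LATCH SET NAME: SYS1.ICSF.LATCH.GLOBAL
CREATOR JOBNAME: CSFPROD CREATOR ASID: 0061
LATCH NUMBER: 1
REQUESTOR ASID EXC/SHR OWN/WAIT WORKUNIT TCB ELAPSED TIME
HZSPROC 0014 SHARED OWN 007F83F0 Y 05:27:00.364
CSFPROD 0061 EXCLUSIVE WAIT 007F8588 Y 00:09:41.933
"""


@dataclass
class LatchRow:
    requestor: str
    asid: str
    mode: str        # SHARED or EXCLUSIVE
    status: str      # OWN or WAIT
    workunit: str
    tcb: str
    elapsed_seconds: float


def parse_elapsed(text: str) -> float:
    """Convert an HH:MM:SS.mmm elapsed time to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_latch_rows(display_text: str) -> list:
    """Keep only lines that look like requestor rows; skip headers."""
    rows = []
    for line in display_text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue
        if fields[2] not in ("SHARED", "EXCLUSIVE"):
            continue
        if fields[3] not in ("OWN", "WAIT"):
            continue
        rows.append(LatchRow(*fields[:6], parse_elapsed(fields[6])))
    return rows


def suspect_holders(rows, jobname="HZSPROC", threshold_seconds=300.0):
    """Rows where `jobname` has owned the latch for at least the threshold.

    The threshold is an assumed value chosen for illustration.
    """
    return [
        row for row in rows
        if row.requestor == jobname
        and row.status == "OWN"
        and row.elapsed_seconds >= threshold_seconds
    ]
```

Calling `suspect_holders(parse_latch_rows(SAMPLE))` on the sample display returns the single HZSPROC row; an empty list would suggest that the system is not showing the orphaned-latch symptom at that moment.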
Problem summary
USERS AFFECTED: Users of ICSF with a CKDS and/or PKDS with the common record format (KDSR)

PROBLEM DESCRIPTION: Latch contention was caused by the ICSF_KEY_EXPIRATION health check not releasing the CKDS and PKDS latches properly during processing. Internal storage was overlaid, causing the release of the latches not to be attempted.

RECOMMENDATION:

PROBLEM SUMMARY: Internal storage including the latch token was overlaid during processing. The CKDS and PKDS latches were not properly released by the ICSF_KEY_EXPIRATION health check.
Problem conclusion
The cause of the storage overlay was determined and corrected. The CKDS and PKDS latches will be properly released by the ICSF_KEY_EXPIRATION health check.
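The class of defect described in this APAR (an unchecked length taken from an empty table, followed by a copy that overruns an adjacent latch token) can be illustrated with a toy model. The following Python sketch is not the ICSF code or the actual fix: the 16-byte frame layout, the offsets, and the function names are all hypothetical, chosen only to show how a missing emptiness check can zero a neighbouring token and how a guard avoids it.

```python
# Hypothetical frame layout: an 8-byte label buffer followed by an
# 8-byte latch token. Sizes and offsets are assumptions for illustration.
TOKEN_OFFSET = 8
FRAME_SIZE = 16


def new_frame(token: bytes) -> bytearray:
    """Build a frame whose last 8 bytes hold the latch token."""
    if len(token) != FRAME_SIZE - TOKEN_OFFSET:
        raise ValueError("token must be 8 bytes in this toy layout")
    frame = bytearray(FRAME_SIZE)
    frame[TOKEN_OFFSET:] = token
    return frame


def copy_label_unchecked(frame, table_entries, area, length):
    """Defect pattern: trusts `length` even when the table is empty."""
    for i in range(length):
        frame[i] = area[i]


def copy_label_checked(frame, table_entries, area, length):
    """Guarded pattern: skip empty tables and bound the copy length."""
    if not table_entries:
        return False  # nothing to copy, frame left untouched
    if length > TOKEN_OFFSET:
        raise ValueError("length exceeds label buffer")
    frame[:length] = area[:length]
    return True


token = b"\x11" * 8
residual_area = bytes(FRAME_SIZE)  # residual data that happens to be zeros

overlaid = new_frame(token)
copy_label_unchecked(overlaid, [], residual_area, FRAME_SIZE)

guarded = new_frame(token)
copied = copy_label_checked(guarded, [], residual_area, FRAME_SIZE)
```

After the unchecked copy, the token bytes in `overlaid` are all zeros, which mirrors the orphaned-latch outcome described in the analysis; in `guarded` the token is intact because the empty table is rejected before any copy takes place.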
Temporary fix
HIPER
Comments
APAR Information
APAR number
OA51685
Reported component name
ICSF/MVS
Reported component ID
568505101
Reported release
7B0
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-11-23
Closed date
2016-12-08
Last modified date
2017-01-06
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UA83599 UA83600 UA83601
Modules/Macros
CSFNCKDL
Fix information
Fixed component name
ICSF/MVS
Fixed component ID
568505101
Applicable component levels
R7B0 PSY UA83599
UP16/12/09 P F612
R7B1 PSY UA83600
UP16/12/09 P F612
R7C0 PSY UA83601
UP16/12/09 P F612
Document Information
Modified date:
06 January 2017