A fix is available
APAR status
Closed as program error.
Error description
The TCPIP sockets code, which earlier ran under L8007 as part of TASK-00066, has created a USS process for TCB L8007. During shutdown, several tasks are purged, including TASK-00066. This task was running OPENAPI under L8007 when the purge was issued, which resulted in an ABTERM of the L8 TCB. From this point, nothing can execute on L8007 and any CHANGE_MODE to this TCB will fail.

At task termination, new code added to DFHDSKE in CICS TS 5.5 attempts to release any USS process associated with an open TCB. However, this code fails to take account of the fact that the L8 TCB is no longer available. It ends up making a BPX1MPC (process cleanup) call while running on the QR TCB, after the CHANGE_MODE to L8 fails. The code should never do this. The call appears to hang for a period and eventually triggers a series of severe errors elsewhere in CICS.

This is what the task's kernel stacks look like:

KE_NUM @STACK   LEN  TYPE ADDRESS  LINK REG OFFSET ERR NAME
00F5   28D61040 0200 Bot  27102F00 A71033FE 0004FE     DFHKETA
00F5   28D61240 03D0 Dom  2711F298 A711FC3E 0009A6     DFHDSKE
                     Int  +000946  A711F726 00048E     FREE_USS_PROCESS

Trace shows that the CHANGE_MODE to the L8 TCB fails, which is what leaves processing on the QR TCB:

0002 DSAT ENTRY - FUNCTION(CHANGE_MODE) MODENAME(L8) TASK-XM KE_NUM-00F5 TCB-C/QR
0003 DSAT EXIT - FUNCTION(CHANGE_MODE) RESPONSE(EXCEPTION) REASON(TCB_FAILED) OLD_TCB_TOKEN(27315F00 , 00000001) TASK-XM KE_NUM-00F5 TCB-C/QR

Additional Symptoms:
DFHKE0002 severe error code 0506 in DFHKEDS
DFHSO0002 severe error code 0C48 in DFHSOLS
BPXP023I THREAD 2B6AC00000000000, IN PROCESS 207, WAS TERMINATED BY SIGNAL SIGKILL, SENT FROM THREAD 2B6A780000000000, IN PROCESS 198, UID 1251, IN JOB jjjjjjjj.

Additional Symptoms:
CICS region unresponsive hang
In the dump, there are many EC6 abends in the kernel error table.
In the CICS trace, you find the following exception:

DS 0213 DSKE *EXC* - FREE_USS_PROCESS_FAILED - TASK-XM KE_NUM-xxxx TCB-C/QR 1-0000 FFFFFFFF 2-0000 0000009D 3-0000 0B7000B9

Data area 3 is the reason code, 0B7000B9, which means JRActiveProcess.

Backing up in the trace, a task was running on an open TCB when CICS recovery was driven for that open TCB on the main QR TCB:

ERM *EXC* RECOVERY PROGRAM(EZACIC01) TASK-nnnnn KE_NUM-xxxx TCB-C/QR

The trace only had DS level 1 tracing, so the failed CHANGE_MODE to that open TCB is not shown. That failure caused the bpx_process_cleanup call to be issued incorrectly on the QR TCB, where it failed with reason code 0B7000B9.

LMQUEUE waits
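When checking a formatted trace for this exception, a small script can pick out the data areas and compare data area 3 with the reason code quoted above. The following is only a sketch: the function names are hypothetical, and it assumes each data area appears on the same line as the exception in the "n-0000 XXXXXXXX" form shown in this APAR, which may not match every trace formatting option.

```python
import re

# Reason code quoted in this APAR for data area 3 (JRActiveProcess).
JR_ACTIVE_PROCESS = 0x0B7000B9


def parse_data_areas(trace_line):
    """Return {area_number: value} for each 'n-0000 XXXXXXXX' pair in the line.

    Assumption: data areas are 8 hex digits and sit on the exception line itself.
    """
    pairs = re.findall(r"\b(\d)-0000 ([0-9A-F]{8})\b", trace_line)
    return {int(number): int(value, 16) for number, value in pairs}


def is_free_uss_process_on_qr(trace_line):
    """True if the line is the DSKE exception on the QR TCB with JRActiveProcess."""
    areas = parse_data_areas(trace_line)
    return (
        "FREE_USS_PROCESS_FAILED" in trace_line
        and "TCB-C/QR" in trace_line
        and areas.get(3) == JR_ACTIVE_PROCESS
    )


# The exception line as quoted in the error description.
line = (
    "DS 0213 DSKE *EXC* - FREE_USS_PROCESS_FAILED - TASK-XM KE_NUM-xxxx "
    "TCB-C/QR 1-0000 FFFFFFFF 2-0000 0000009D 3-0000 0B7000B9"
)
print(is_free_uss_process_on_qr(line))
```

A match on this check only indicates that the trace contains the exception described here; the kernel stack and the failed CHANGE_MODE remain the stronger evidence.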
Local fix
n/a
Problem summary
****************************************************************
* USERS AFFECTED: All CICS users.                              *
****************************************************************
* PROBLEM DESCRIPTION: CICS hangs, other tasks (including      *
*                      CSOL) abend and JVM servers disable     *
*                      after a task is purged.                 *
****************************************************************
CICS is processing an IP CICS Sockets workload with OTE=YES. This processing causes a USS process to be associated with each L8/L9 TCB. One such task is purged while in a wait. This causes the TCB to be queued for termination when the task ends.

Task termination runs on the QR TCB, and an attempt is made to switch to the L8/L9 TCB in order to make a BPX1MPC call to clean up the USS process. The switch fails because the TCB is queued for termination, so the BPX1MPC call is made on the QR TCB instead.

The BPX1MPC call may hang the QR TCB and will cause a SIGKILL to be sent to all USS processes associated with any other TCBs in the region. This will cause any other IP CICS Sockets tasks to abend and JVM servers to disable.

As a result of this, the following may also be seen:
- msgDFHKE0002 with severe error code X'0506' in module DFHKEDS
- msgDFHSO0002 with severe error code X'0C48' in module DFHSOLS
- msgBPXP023I in the CICS region job for a thread terminated with signal SIGKILL
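To check whether a region's job log shows the three secondary symptoms listed above, a scan along the following lines could be used. It is a sketch under assumptions: the function name is hypothetical, each message is assumed to sit on a single line, and the first two sample lines reuse the abbreviated wording from this APAR rather than the full message text a real log would contain.

```python
import re

# Message IDs, codes and modules listed as symptoms in this APAR.
SYMPTOMS = {
    "DFHKE0002/0506": re.compile(r"DFHKE0002.*0506.*DFHKEDS"),
    "DFHSO0002/0C48": re.compile(r"DFHSO0002.*0C48.*DFHSOLS"),
    "BPXP023I/SIGKILL": re.compile(r"BPXP023I.*SIGKILL"),
}


def find_symptoms(log_lines):
    """Return the set of symptom names that match at least one log line."""
    found = set()
    for log_line in log_lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(log_line):
                found.add(name)
    return found


# Sample lines: abbreviated wording from this APAR, not verbatim log output.
sample_log = [
    "DFHKE0002 severe error code 0506 in DFHKEDS",
    "DFHSO0002 severe error code 0C48 in DFHSOLS",
    "BPXP023I THREAD 2B6AC00000000000, IN PROCESS 207, WAS TERMINATED BY "
    "SIGNAL SIGKILL, SENT FROM THREAD 2B6A780000000000, IN PROCESS 198, "
    "UID 1251, IN JOB jjjjjjjj.",
]
print(sorted(find_symptoms(sample_log)))
```

These messages can have other causes, so a match is a prompt to look for the FREE_USS_PROCESS_FAILED exception in the trace, not a diagnosis on its own.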
Problem conclusion
USS cleanup processing in DFHDSKE for these TCBs has been changed to handle the case where the associated L8/L9 TCB has been abterm'd. It no longer continues the processing, and does not call BPX1MPC, if the CHANGE_MODE response shows that the TCB has been terminated.
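The corrected control flow can be illustrated with a small model. This is not the DFHDSKE code: the classes, method names and return values are hypothetical stand-ins, and process_cleanup merely records which TCB the BPX1MPC-style call would have run on.

```python
from dataclasses import dataclass, field


@dataclass
class OpenTcb:
    name: str
    failed: bool = False          # True once the TCB has been abterm'd
    has_uss_process: bool = True


@dataclass
class Dispatcher:
    current_tcb: str = "QR"
    cleanup_calls: list = field(default_factory=list)

    def change_mode(self, tcb):
        """Hypothetical stand-in for CHANGE_MODE: fails for an abterm'd TCB."""
        if tcb.failed:
            return ("EXCEPTION", "TCB_FAILED")
        self.current_tcb = tcb.name
        return ("OK", None)

    def process_cleanup(self):
        """Stand-in for the BPX1MPC call; records the TCB it was issued on."""
        self.cleanup_calls.append(self.current_tcb)

    def free_uss_process(self, tcb):
        """Release the USS process only when running on the owning open TCB."""
        if not tcb.has_uss_process:
            return False
        response, _reason = self.change_mode(tcb)
        if response != "OK":
            # Behaviour described by the fix: stop here, never clean up on QR.
            return False
        self.process_cleanup()
        tcb.has_uss_process = False
        self.current_tcb = "QR"   # switch back once cleanup is done
        return True


dispatcher = Dispatcher()
print(dispatcher.free_uss_process(OpenTcb("L8007", failed=True)))
print(dispatcher.free_uss_process(OpenTcb("L8008")))
print(dispatcher.cleanup_calls)
```

The point of the model is the early return: before the fix, the equivalent of process_cleanup ran even when the mode switch failed, which is how the call ended up on the QR TCB.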
Temporary fix
Comments
APAR Information
APAR number
PH24659
Reported component name
CICS TS Z/OS V5
Reported component ID
5655Y0400
Reported release
200
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-04-21
Closed date
2020-07-28
Last modified date
2020-12-07
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI70777 UI70778
Modules/Macros
DFHDSKE
Fix information
Fixed component name
CICS TS Z/OS V5
Fixed component ID
5655Y0400
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
08 December 2020