Question & Answer
Why does my CICS Transaction Server for z/OS (CICS TS) region crash after it receives the IRR401I 878 ABEND DURING RACF PROCESSING error message? In this case, the message was followed in the log by:
DFHXS0001 CICSRGN1 An abend (code 878/AKEB) has occurred at offset X'FFFF' in module DFHXSPW
A system dump was taken for the 878 abend in module DFHXSPW.
Enter the IPCS command VERBX DFHPD700 'KE=1' (where "700" is the level for CICS TS 5.3) to display the kernel (KE) domain. The kernel error table shows that a number of different transactions received 878 abends at this time:
Err_Num Err_Time KE_NUM Task  Error Type Err_Code Module
0000059 21:20:21 0155   93684 ABEND      878/AKEB DFHXSPW
000005A 21:20:21 00CC   93581 ABEND      878/AKEB DFHXSPW
000005B 21:20:21 00E7   93647 ABEND      878/AKEB DFHXSPW
000005C 21:20:21 0121   93681 ABEND      878/AKEB DFHXSPW
The kernel stacks for these abending tasks show that they were all trying to do a VERIFY_PASSWORD when the 878 MVS Short On Storage abends occurred.
Enter the IPCS command VERBX DFHPD700 'SM=3' to display the storage manager (SM) domain. In this case, CICS itself does not appear to be short on storage (SOS) in any of its DSAs, so you could reduce the size of the DSA if needed:
Current DSA limit:        5376K
Current DSA total:        4352K
Currently SOS below 16M:  NO
Current EDSA limit:       400M
Current EDSA total:       202M
Currently SOS above 16M:  NO
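As a quick sanity check of the headroom in those figures, the limits and totals can be compared with a few lines of Python (a sketch; it assumes the K and M values in the SM output are binary units, 1K = 1024 bytes):

```python
# Sanity check of the DSA/EDSA headroom reported by 'SM=3'.
# Figures are copied from the dump output above; K/M are assumed
# to be binary units (1K = 1024 bytes, 1M = 1024K).
KB = 1024
MB = 1024 * KB

dsa_limit, dsa_total = 5376 * KB, 4352 * KB
edsa_limit, edsa_total = 400 * MB, 202 * MB

print(f"Free below 16M: {(dsa_limit - dsa_total) // KB}K")    # 1024K
print(f"Free above 16M: {(edsa_limit - edsa_total) // MB}M")  # 198M
```

With roughly 1024K free below the line, CICS has room to give some DSA storage back to MVS, which is why reducing the DSA is a viable option here.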
Looking at the storage map for this region in the dump (IPCS command VERBX VSMDATA 'NOG SUMM'), you can see that the bottom of the LSQA is very close to the top of the user region. This is where the shortage is occurring: in LSQA, which grows downward rather than upward. LSQA storage has grown by about 2 MB:
LOCAL SUBPOOL USAGE SUMMARY
TCB/OWNER SP# KEY BELOW  ABOVE  TOTAL
--------- --- --- -----  -----  -----
LSQA      255 0   107000 1BF000 2C6000
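The subpool summary reports sizes in hexadecimal bytes. A short Python sketch converts the figures above and confirms that the below- and above-the-line amounts add up to the total:

```python
# Hex LSQA figures from the VSMDATA subpool usage summary above.
below = 0x107000   # LSQA below the 16M line
above = 0x1BF000   # LSQA above the 16M line
total = 0x2C6000   # reported total

# The two line-relative amounts should account for the whole total.
assert below + above == total

print(f"LSQA below the line: {below / 1024**2:.2f} MiB")  # ~1.03 MiB
print(f"LSQA total:          {total / 1024**2:.2f} MiB")  # ~2.77 MiB
```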
The AQAT entries describe this area of storage. For the below-the-line allotment, they all share a similar pattern: free elements (DFEs) of the same size, X'5B0':
AQAT: Addr 00746000 Size 4000
 DFE: Addr 00746000 Size 5B0
 DFE: Addr 00747000 Size 5B0
 DFE: Addr 00748000 Size 5B0
 DFE: Addr 00749000 Size 5B0
AQAT: Addr 00750000 Size 4000
 DFE: Addr 00750000 Size 5B0
 DFE: Addr 00751000 Size 5B0
 DFE: Addr 00752000 Size 5B0
 DFE: Addr 00753000 Size 5B0
AQAT: Addr 00755000 Size 6000
 DFE: Addr 00755000 Size 5B0
 DFE: Addr 00756000 Size 5B0
 DFE: Addr 00757000 Size 5B0
 DFE: Addr 00758000 Size 5B0
 DFE: Addr 00759000 Size 5B0
The AE entries, which describe the storage allocated to each TCB from that area, show a pattern as well:
Data for TCB at address 006DDE88
 --Below the line LSQA AEs
 AE: Addr 007665B0 Size A50
 AE: Addr 006DDD00 Size 130
Data for TCB at address 006FC488
 --Below the line LSQA AEs
 AE: Addr 007575B0 Size A50
 AE: Addr 006FC300 Size 130
Data for TCB at address 006C6388
 --Below the line LSQA AEs
 AE: Addr 006D25B0 Size A50
 AE: Addr 006C6200 Size 130
Those AQATs and AEs contain the eyecatcher SSAT. For example, the last AE above shows:
006D25B0 | E2E2C1E3 006D2600 0000000F 0000020F | SSAT._.......... |
006D25C0 - 006D25FF  -- all bytes contain X'00'
Each TCB has at least one SSAT (among other things). These are Subsystem Affinity Table (SSAT) entries, and each SSAT can hold entries for 16 subsystems.
Looking in the dispatcher (DS) domain, you can see that 154 open TCBs are currently allocated to tasks:
DATA FOR TCB POOL CONTROLLED BY MAXOPENTCBS
MODES IN POOL ARE: L8 L9
NUMBER OF TCBS IN POOL  CURRENT  HIGH WATER
  IN EXISTENCE          158      158
  ALLOCATED TO TASKS    154      158
Use the IPCS command SSIDATA to display the subsystem information:
Summary Report for SSIDATA
--------------------------
NUMBER OF DEFINED SUBSYSTEMS = 519
That means every TCB that is created must have SSAT entries for all 519 subsystems, in case it needs to contact any of them.
So 519 / 16 = 32.4..., meaning at least 33 SSATs per TCB are needed to cover all of the subsystems.
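That arithmetic, combined with the TCB count from the DS domain output above, can be expressed as a short Python sketch (a rough scaling illustration, not an exact storage calculation):

```python
import math

SLOTS_PER_SSAT = 16   # each SSAT holds entries for 16 subsystems
subsystems = 519      # from the SSIDATA summary report
tcbs = 158            # open TCBs in existence, from the DS domain

# Round up: a partially filled SSAT still occupies a full table.
ssats_per_tcb = math.ceil(subsystems / SLOTS_PER_SSAT)
print(ssats_per_tcb)          # 33 SSATs per TCB
print(ssats_per_tcb * tcbs)   # 5214 SSATs across the TCB pool
```

This shows why the combination matters: the SSAT storage scales with the product of the subsystem count and the TCB count, not with either one alone.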
Each entry in the SSIDATA looks something like:
SUBSYS = ABC1  DYNAMIC = YES  STATUS = ACTIVE  COMMANDS = NO
SUBSYSTEM DEFINITION DATA
  SSCVT ADDRESS = 00CA48D8
  USER FIELD 1  = 2CF62000
  USER FIELD 2  = 2019AD3B
SUBSYS = ABC2  DYNAMIC = YES  STATUS = INACTIVE  COMMANDS = NO
SUBSYSTEM DEFINITION DATA
  SSCVT ADDRESS = 00CA4938
  USER FIELD 1  = 00000000
  USER FIELD 2  = 00000000
Sorting through all of these entries found:

183 active subsystems
336 inactive subsystems
Every TCB therefore has to hold information even for the 336 inactive subsystems.
This appears to be a tuning problem: a large number of subsystems are defined to the image, a large number of TCBs are capable of running in the CICS region, and MVS must still keep some of this information below the line. That combination significantly increases the storage that grows down from the top of the region. The best course of action for this particular 878 abend (not all 878 abends have the same cause) is either to reduce the number of subsystems defined to the image (for example, review the subsystems defined in the LPAR and delete any that are unnecessary) or to reduce the CICS DSA to give some storage back to MVS.
06 February 2019