A fix is available
APAR status
Closed as program error.
Error description
WAIT084-4 due to an incorrect FRR stack for the program check FLIH. PSAPSTK at offset x'398' in the PSA for CP3 contains: 07323020 07323020 0734C0B8. The first word (PSAPSTK) and the second word (PSAPSAV) should never be equal, which is why the wait state is loaded.

The sequence of events leading up to the wait state is as follows:
1. A CLKC external interrupt occurs on one CP due to a TQE timer pop.
2. On the other CP, the timer routine is given control and frees the TQE.
3. Back on the original CP, the CLKC interrupt is about to be recorded in the system trace, but the TQE has been freed, so a PIC11 is taken (this is the 'chance of a lifetime' timing issue, with a very low probability of happening).
4. The program check first level interrupt handler (FLIH) handles the PIC11 and calls RSM to resolve the page fault. RSM needs a lock that is unavailable (again, a very low probability of that happening), and another external interrupt comes in and uses the same save area as the CLKC external interrupt. The program check FLIH detects the problem and issues the WAIT084-4, preventing any possibility of a PSA overlay.

Verification Steps:
1. Check PSAPSTK (offset x'398' in PSA) and PSAPSAV (offset x'39C'). When this problem has occurred, they will be equal.
2. The trace table will be similar to the following:
   03 0031 009CB300  PGM  011 00000000_0133C976 00040011 04043000 80000000 *Pic11 will be in IEAVETRC
   03 0031 009CB300  EMS      00000000_012F96E2 00001201 05041000 80000000 811860BC *EMS will be in IEAVELKX
   03 0031 009CB300 *RCVY PROG 940C4000
   03 0031 009CB300 *RCVY RTRY 040C0000 8133C97A 940C4000
   03 0031 009CB300  CLKC     00000000_0000B312 00001004 07851000 80000000
   03 0031 009CB300 *PGM  001 00000000_07B6B002 00020001 04042000 80000000
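Verification step 1 above can be sketched as a small check on the words copied from a dump. This is only an illustration, not an IBM-supplied tool: the function names are hypothetical, and the input is assumed to be the hex words displayed at PSA offset x'398' (PSAPSTK first, PSAPSAV second), pasted as text. Equal words are only one part of the signature; the trace table in step 2 should still be compared.

```python
def frr_stack_words(psa_hex: str) -> tuple[int, int]:
    """Return (PSAPSTK, PSAPSAV) from hex words copied from PSA+x'398'.

    Assumption: the first word is PSAPSTK (x'398') and the second is
    PSAPSAV (x'39C'), as described in the error description.
    """
    words = [int(w, 16) for w in psa_hex.split()]
    if len(words) < 2:
        raise ValueError("expected at least two hex words")
    return words[0], words[1]


def stack_pointers_equal(psa_hex: str) -> bool:
    """True when PSAPSTK equals PSAPSAV, the condition reported for this APAR."""
    pstk, psav = frr_stack_words(psa_hex)
    return pstk == psav


# Values quoted in the error description for CP3.
print(stack_pointers_equal("07323020 07323020 0734C0B8"))  # True
```

A result of True only indicates that the two words match; it does not by itself confirm that the system hit this APAR.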
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: Users running z/OS HBB7780 and HBB7790.      *
****************************************************************
* PROBLEM DESCRIPTION: A very rare situation resulted in       *
*                      a PSA overlay and a WAIT084-04:         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The scenario that resulted in the PSA overlay is as follows:
- System trace got a program check while processing a CLKC external interrupt.
- The Program FLIH handling the program check had to call RSM to resolve a page fault, but it had to spin to wait for a lock.
- During the wait, another external interrupt came in and used the same PSA save area as the CLKC external interrupt.

The root cause of this problem was the program check occurring in System Trace processing. On one CPU, system trace was handling a CLKC external interrupt and was referencing a TQE (timer queue element). However, on another CPU, the timer routine was given control and then freed the TQE. Back on the original CPU, system trace took a PIC11 program check as it tried to reference the freed TQE.
Problem conclusion
System trace will no longer reference the TQE while processing a CLKC external interrupt.

DOCUMENTATION HOLD FOR APAR OA44060
+-------------------------------------------------------+
MVS Diagnosis: Tools and Service Aids
+--- LOCATION IN PUBLICATION ---------------------------+
|                                                       |
| System Trace                                          |
|   Reading system trace output                         |
|     CALL, CLKC, EMS, EXT, I/O, MCH, RST, and SS       |
|     trace entries                                     |
+-------------------------------------------------------+
|
| - Make the following changes for the CLKC trace entry:
|   In the table that shows all the trace entries, for the
|   CLKC trace entry, remove the tqe-tcb- and tqe-asid
|   references.
|   In the section marked by: UNIQUE-1/UNIQUE-2/UNIQUE-3
|   UNIQUE-4/UNIQUE-5/UNIQUE-6
|
|   Remove the tqe-tcb- and tqe-asid and their respective
|   definitions
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
OA44060
Reported component name
SUPERVISOR CONT
Reported component ID
5752SC1C5
Reported release
780
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2013-12-12
Closed date
2014-01-31
Last modified date
2014-03-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UA72243 UA72244
Modules/Macros
IEAVETRC
Publications Referenced
GA227589XX  GA320905XX
Fix information
Fixed component name
SYSTEM TRACE
Fixed component ID
5752SC142
Applicable component levels
Document Information
Modified date:
03 March 2014