APAR status
Closed as program error.
Error description
The dump shows that both the master and subordinate address spaces are waiting. The master address space received an 'error stop' notification, requested that the subordinate address space shut down, and is waiting for that shutdown to complete. The subordinate address space is waiting for the master address space to respond to its request for permission to use the tape device (device request); this is where the hang occurred. Because the master was shutting down, the response was never sent, and the subordinate waited indefinitely.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All IMS DRF V3R1 users with TAPECHK(Y) and   *
*                 ERROR(STOP) might be affected.               *
****************************************************************
* PROBLEM DESCRIPTION: DRF hangs with ERROR(STOP) when error   *
*                      is detected.                            *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************
When recovery was run with TAPECHK(Y) and ERROR(STOP) specified, an image copy (IC) allocation failure caused a hang instead of an early end.
Problem conclusion
The problem was caused by a coordination error between the master address space and the subordinate address spaces when recovery is run with the ERROR(STOP) option specified. When a subordinate address space experiences a failure that results in an early end of recovery and image copies, all subordinate address spaces are supposed to be notified and shut down.
In this case, one subordinate address space had sent a request to read an image copy from a tape device when early end processing was initiated. The thread that asks for permission remains in a wait state until permission is granted or denied. The early end notification was then sent to the subordinate address space. The early end process did not detect that the thread was waiting for a tape device, and it enqueued an early end notification to the already waiting thread. The early end notification could not be processed until the tape device wait was posted. Because early end processing had already started, tape device permission would never be granted. The result was a hang.
The problem is fixed by adding code to the early end processing in the subordinate address space that detects and posts a thread that is waiting for a tape device. If early end processing is in progress when the thread wakes up, a nonzero return code is set, which causes the subordinate address space to terminate the current recovery process. This allows early end processing to continue.
Modules FRXIRTH0, FRXICTL0, and FRXIDYN0 have been changed to improve communication between the master and subordinate address spaces. The status that the recovery report shows for the subordinate address spaces in this scenario has also changed: the final status in the SUMMARY Report is now the more specific 'Sub address space failure' instead of the general message 'Recovery processing error'.
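The wait-and-post behavior described in this conclusion can be illustrated with a small Python sketch. It is only a model of the pattern, not the DRF implementation: the class SubordinateSpace, its methods, the helper run_request_then, and the return code value 8 are all hypothetical names and values chosen for illustration (the APAR states only that the return code is nonzero).

```python
import threading
import time

EARLY_END_RC = 8  # assumed value; the APAR only says "nonzero"


class SubordinateSpace:
    """Hypothetical model of one subordinate address space."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tape_wait = threading.Event()  # posted to end the tape wait
        self._granted = False
        self._early_end = False
        self._waiting_for_tape = False

    def is_waiting(self):
        with self._lock:
            return self._waiting_for_tape

    def request_tape(self, timeout=5.0):
        """Wait for tape permission; return 0 if granted, nonzero otherwise."""
        with self._lock:
            if self._early_end:
                return EARLY_END_RC
            self._waiting_for_tape = True
        # The timeout is a safeguard for this sketch only.
        self._tape_wait.wait(timeout)
        with self._lock:
            self._waiting_for_tape = False
            # On wake-up, check whether early end started while we waited.
            if self._early_end or not self._granted:
                return EARLY_END_RC
            return 0

    def grant_tape(self):
        """Master grants permission and posts the waiting thread."""
        with self._lock:
            self._granted = True
        self._tape_wait.set()

    def early_end(self):
        """Early end processing: detect a tape wait and post it."""
        with self._lock:
            self._early_end = True
            must_post = self._waiting_for_tape
        if must_post:
            # Without this post, the waiting thread would never wake up.
            self._tape_wait.set()


def run_request_then(action_name):
    """Start a tape request, then call the named method once it is waiting."""
    sub = SubordinateSpace()
    results = []
    worker = threading.Thread(target=lambda: results.append(sub.request_tape()))
    worker.start()
    while worker.is_alive() and not sub.is_waiting():
        time.sleep(0.001)
    getattr(sub, action_name)()
    worker.join(5.0)
    return results[0] if results else None
```

In this model, run_request_then("early_end") returns the nonzero code promptly because early_end posts the waiting thread, while run_request_then("grant_tape") returns 0. Removing the post from early_end reproduces the original symptom in miniature: the requester stays blocked until the sketch's safeguard timeout expires.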
Temporary fix
Comments
APAR Information
APAR number
PK84761
Reported component name
IMS DB RECOVERY
Reported component ID
5655I4400
Reported release
310
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2009-04-15
Closed date
2009-06-19
Last modified date
2009-07-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK47599
Modules/Macros
FRXCON FRXICTL0 FRXIDYN0 FRXIRTH0 FRXMSTR1 FRXPSDR0 FRXRVGB
Fix information
Fixed component name
IMS DB RECOVERY
Fixed component ID
5655I4400
Applicable component levels
R310 PSY UK47599
UP09/06/23 P F906
Document Information
Modified date:
01 July 2009