APAR status
Closed as program error.
Error description
During a recovery of 20 DEDB areas using DRF point-in-time recovery (PITR), CPU time exceeded 14 hours. The job was judged to be hung or looping and was canceled. This problem can also occur for full-function databases. It occurs when a large amount of log data is processed and data is spilled to data spaces after all buffers have been filled. This is a data integrity issue.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of IMS Database Recovery Facility  *
*                 Version 2 Release 1 running recovery to any  *
*                 prior point in time PITR.                    *
****************************************************************
* PROBLEM DESCRIPTION: While running a PITR recovery the DRF   *
*                      master address space runs into a loop   *
*                      or hang condition.                      *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************
PITR results in a hang or endless loop. This occurs when the amount of log data is a high enough percentage of private storage that it needs to be moved to data spaces.
As part of the problem determination for the originally reported problem, it was discovered that under some high load circumstances, log data is lost as part of PITR processing.
In addition, it was discovered that SDEP log records were not processed correctly and SDEP information was lost, resulting in a data integrity problem. The SDEP problem occurs when updates to the SDEPs cause them to wrap.
While the fix was being tested, an ABENDS40D was encountered because of a shortage of available private storage.
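The spill condition described above can be pictured with a small model. This is only a sketch of the general pattern, not the DRF implementation: the class name `SpillingLogQueue`, the `limit_fraction` parameter, and its default value are all hypothetical, and a Python deque stands in for a data space. Log buffers stay in memory until a byte limit derived from private storage is reached. After that, new buffers are spilled, and spilling continues until the spill area is drained so that arrival order is preserved.

```python
from collections import deque


class SpillingLogQueue:
    """Hypothetical model: hold log buffers in memory up to a limit, then spill."""

    def __init__(self, private_storage_bytes, limit_fraction=0.5):
        # limit_fraction is an illustrative assumption, not a documented DRF setting.
        self.limit = int(private_storage_bytes * limit_fraction)
        self.in_memory = deque()
        self.spilled = deque()  # stands in for a data space
        self.bytes_in_memory = 0

    def put(self, buf):
        # Once anything has been spilled, keep spilling so that buffers
        # are still handed back in the order in which they arrived.
        if self.spilled or self.bytes_in_memory + len(buf) > self.limit:
            self.spilled.append(buf)
        else:
            self.in_memory.append(buf)
            self.bytes_in_memory += len(buf)

    def get(self):
        # Older in-memory buffers are returned first, then spilled buffers.
        if self.in_memory:
            buf = self.in_memory.popleft()
            self.bytes_in_memory -= len(buf)
            return buf
        if self.spilled:
            return self.spilled.popleft()
        return None
```

In this model a buffer is lost only if `put` or `get` mishandles one of the two queues, which is the kind of failure the data-loss part of this APAR describes.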
Problem conclusion
AIDS: RIDS/UTIL RIDS/DBS DBS/UTIL
DEP: NONE
GEN:
*** END IMS KEYWORDS ***

The hang or endless loop that the customer originally reported during large recoveries turned out to be several different problems. While the fix was being tested, further problems were encountered. They are also fixed, as documented below.

The hang was caused by buffer contention and is fixed by separating the buffers into two pools: one for log read and one for buffer send to the subordinate address spaces. The loop is fixed by using AWEs (asynchronous work elements) from the AWE pool instead of local module storage.

The missing data is fixed in three major ways.

First, spill management is fixed to always return the data spilled, not extraneous data caused by residual data in the token on a spill. If the token on a spill is non-zero, the spill manager interprets the request as a retrieve, and the spilled data is lost because the caller does not expect to get data back from a spill request.

Second, buffers were lost when AWEs in local storage were enqueued and the thread did not wait. When control returned from the module that enqueued the AWE, the AWE storage was reused. The AWE field was cleared at times, so the buffer that was being enqueued was lost.

Third, fast path SDEP processing is corrected. PK11200 added support for the LOG5957 record, but the support was not complete: it did not use FRXRRR to copy the LOG5957 from the input buffer to the buffer sent to the subordinate address space. Also, processing of the LOG5957 depended on a UOR token, but the LOG5957 does not contain a UOR token. The LOG5957 was sent to random subordinate address spaces and was not applied to the intended area.

The ABENDS40D is fixed by separating the buffer pool into two pools: one for read and one for buffer send to subordinate address spaces. It is also fixed by sending all log data through FRXQBUF0 and FRXLMRG0 so that FRXUORM0 processes the data in order.
Because FRXUORM0 processes the data in order, the UOR-related storage can be released when the end-of-UOR notification is encountered. The hash table is reduced from a maximum of five levels to a single hash table. This significantly reduces private storage utilization for extremely large recoveries.

The code is changed in the following parts:

Parameter change for multiple buffer types:
FRXCAMG0, FRXCBDM0, FRXICLI0, FRXICTL0, FRXLMRG0, FRXMNP, FRXMSTR1, FRXPDIR0, FRXPDIS0, FRXPDSR0, FRXPDSS0, FRXPSDR0, FRXPSDS0, FRXQBUF0, FRXRBUF0, FRXRCTL0, FRXRDTH0, FRXUORM0, FRXHBUF0

Dump formatter recompile:
FRXADF00, FRXADM10, FRXADM20

FRXBDCB0: Add rvur as a fixed-length control block for performance enhancement on get/release of rvur storage.
FRXCON: Define buffersend and bufferread.
FRXGFST: Add support to release buffers from multiple pool types.
FRXHBUF0: Add support for multiple buffer pool types.
FRXLMRG0: Fix an intermittent hang at end of read, when the end of log read is not propagated to the unit of recovery manager because of a timing window. Add a diagnostic count on buffer release.
FRXMINI0 and FRXMSTR0: Move the upper limit of the buffer percentage of private storage to half instead of three quarters, to avoid a storage shortage and ABENDS40D.
FRXMSTR0: Clear storage before reusing it to avoid an endless loop.
FRXPDSS0: Process the no-op notification from FRXUORM0 at end of data.
FRXQBUF0: Send the end-of-log-data notification to FRXUORM0 if no log data sets are to be read for recovery, to avoid a hang. Separate the OLR buffer logic from the non-OLR logic.
FRXRBUF0: Add diagnostic buffer counts.
FRXRCTL0: Simplify the buffer freed process.
FRXRDTH0: Use an AWE from the AWE storage pool instead of local storage. Add a diagnostic count for buffers.
FRXRLRA0: Add logic to track the UOR token for 5950, 5937, and 5938.
FRXRRR: Add a diagnostic count for each log record.
FRXRVCS: Add an OLR indicator flag.
FRXRVDL: Add data space free space diagnostic information.
FRXRVGB: Add diagnostic count fields for buffer and record counting.
FRXRVQB: Add support for a separate OLR code path in FRXQBUF0.
FRXRVUR: Add support for spill and add diagnostic fields for UOR tracking.
FRXURHS: Reduce the number of hash table levels to 1.
FRXUORM0: Add support to spill log data buffers on input if recovery is running low on private storage. Fix the lost buffer problem when spilling data. Fix support for SDEPs (LOG5957) and copy the LOG5957 to the output buffers via FRXRRR calls instead of MVCL. Fix the hang and endless loop on buffer pool storage contention by supporting multiple buffer pool types.
FRXWSPL0: If the request is to spill data, clear the remote token to avoid ABENDU0385 - RSN 0015 in FRXWSPL0.
FRXWSPM0: Add a free space diagnostic field and tracking.
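The residual-token problem described in this section can be illustrated with a toy model. This is a sketch under stated assumptions, not the FRXWSPL0 logic: `SpillManager`, `spill`, `spill_without_clear`, and the dictionary used as a request block are hypothetical names, and a Python dict stands in for the data space. The only behavior taken from the text above is that a non-zero token makes the spill manager treat the request as a retrieve, and that the fix clears the token before a spill.

```python
class SpillManager:
    """Toy spill manager: token 0 means 'spill', non-zero means 'retrieve'."""

    def __init__(self):
        self._store = {}       # token -> spilled data (stands in for a data space)
        self._next_token = 1

    def request(self, token, data=None):
        if token != 0:
            # Non-zero token: handled as a retrieve. Any data passed in is ignored.
            return token, self._store.pop(token, None)
        # Zero token: store the data and hand back a new token.
        new_token = self._next_token
        self._next_token += 1
        self._store[new_token] = data
        return new_token, None


def spill(manager, request_block, data):
    """Spill data after clearing any residual token (models the fix)."""
    request_block["token"] = 0
    token, _ = manager.request(request_block["token"], data)
    request_block["token"] = token
    return token


def spill_without_clear(manager, request_block, data):
    """Models the failure: a residual token turns the spill into a retrieve."""
    return manager.request(request_block["token"], data)


manager = SpillManager()
block = {"token": 0}
spill(manager, block, b"log-buffer-1")   # leaves a residual token in block

# Reusing the block without clearing the token retrieves log-buffer-1
# instead of storing log-buffer-2, so log-buffer-2 is never spilled.
token, returned = spill_without_clear(manager, block, b"log-buffer-2")
```

In the failing path the caller does not look at `returned`, so the retrieved buffer is dropped and the new buffer is never stored. Clearing the token first, as `spill` does, keeps both.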
Temporary fix
*********
* HIPER *
*********
Comments
***** PE07/09/26 FIX IN ERROR. SEE APAR PK52261 FOR DESCRIPTION
APAR Information
APAR number
PK20271
Reported component name
IMS DB RECOVERY
Reported component ID
5655I4400
Reported release
210
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2006-02-22
Closed date
2006-09-19
Last modified date
2007-10-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK18149
Modules/Macros
FRXADF00 FRXADM10 FRXADM20 FRXAWEX FRXBDCB0 FRXCAMG0 FRXCBDM0 FRXCON FRXDCB FRXDDRF FRXGFST FRXHBUF0 FRXICLI0 FRXICTL0 FRXLMRG0 FRXMINI0 FRXMINI1 FRXMNP FRXMSTR0 FRXMSTR1 FRXPDIR0 FRXPDIS0 FRXPDSR0 FRXPDSS0 FRXPSDR0 FRXPSDS0 FRXQBUF0 FRXRBUF0 FRXRCTL0 FRXRDTH0 FRXRLRA0 FRXRRR FRXRVCS FRXRVDL FRXRVGB FRXRVQB FRXRVUR FRXUORM0 FRXURHS FRXWSPL0 FRXWSPM0
Fix information
Fixed component name
IMS DB RECOVERY
Fixed component ID
5655I4400
Applicable component levels
R210 PSY UK18149
UP06/09/22 P F609
Document Information
Modified date:
09 November 2020