APAR status
Closed as program error.
Error description
When recovery of 20 DEDB areas was performed using DRF PITR, processing CPU time exceeded 14 hours. The problem was judged to be a hang or loop, and the job was canceled. This problem can also occur for Full Function databases. It occurs when a large amount of log data is processed, so that data is spilled to data spaces after all buffers have been filled. This is a data integrity issue. Additional symptoms: this APAR also fixes the problem reported in R210 APAR PK30937.
Local fix
Problem summary
USERS AFFECTED: All users of IMS Database Recovery Facility Version 3 Release 1 running recovery to any prior point in time (PITR).
PROBLEM DESCRIPTION: While running a PITR recovery, the DRF master address space runs into a loop or hang condition.
RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF

PITR results in a hang or endless loop. This occurs when the log data amounts to a large enough percentage of private storage that it must be moved to data spaces. During problem determination for the originally reported problem, it was discovered that under some high-load circumstances, log data is lost during PITR processing. It was also discovered that SDEP log records were not processed correctly and SDEP information was lost, resulting in a data integrity problem. The SDEP problem occurs when updates to the SDEPs cause them to wrap. While the fix was being tested, an ABENDS40D was encountered because of a shortage of available private storage.
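The Problem conclusion attributes part of the lost log data to the spill manager's token convention: one request type serves both spill and retrieve, and a non-zero token is interpreted as a retrieve. The minimal Python sketch below illustrates that failure mode under stated assumptions. SpillManager, request, held, spill_buggy, and spill_fixed are hypothetical names, not DRF interfaces, and a Python dict stands in for the data space.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SpillManager:
    """Hypothetical spill manager; a dict stands in for the data space.

    Assumed convention (from the APAR text): token == 0 means "spill this
    data and hand back a new token"; any non-zero token means "retrieve
    the data stored under this token".
    """
    _store: Dict[int, Optional[bytes]] = field(default_factory=dict)
    _next_token: int = 1

    def request(self, token: int,
                data: Optional[bytes] = None) -> Tuple[int, Optional[bytes]]:
        if token != 0:
            # Treated as a retrieve, even if the caller meant to spill `data`.
            return token, self._store.pop(token, None)
        new_token = self._next_token
        self._next_token += 1
        self._store[new_token] = data
        return new_token, None

    def held(self) -> int:
        """Number of blocks currently held in the simulated data space."""
        return len(self._store)


def spill_buggy(mgr: SpillManager, token_field: int, data: bytes) -> int:
    # Bug: token_field still holds a residual value from an earlier request.
    # The manager returns old data that this caller ignores, and `data`
    # itself is never stored, so both blocks are lost.
    token, _ignored = mgr.request(token_field, data)
    return token


def spill_fixed(mgr: SpillManager, token_field: int, data: bytes) -> int:
    # Fix: ignore any residual value and clear the token before every
    # spill request (compare the FRXWSPL0 change in this APAR).
    token, _ = mgr.request(0, data)
    return token


if __name__ == "__main__":
    mgr = SpillManager()
    t1 = spill_fixed(mgr, 0, b"log block 1")
    spill_buggy(mgr, t1, b"log block 2")
    print("buggy: blocks held =", mgr.held())  # 0: both blocks are gone

    mgr = SpillManager()
    t1 = spill_fixed(mgr, 0, b"log block 1")
    spill_fixed(mgr, t1, b"log block 2")
    print("fixed: blocks held =", mgr.held())  # 2
```

Folding two operations into one entry point keyed on the token value is what makes a stale token dangerous: the request silently changes meaning instead of failing.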
Problem conclusion
AIDS: RIDS/UTIL RIDS/DBS DBS/UTIL DEP: NONE GEN: *** END IMS KEYWORDS ***

The initial problem reported by the customer, a hang or endless loop during large recoveries, turned out to be several different problems. Testing the fix uncovered further problems, which were also fixed. The fixes are described below.

The hang and endless loop are fixed as follows. Buffer contention caused a hang; it is fixed by separating the buffers into two pools, one for log read and one for sending buffers to the subordinate address spaces. A loop is fixed by using AWEs from the AWE pool instead of local module storage.

The missing data is fixed in three major ways. First, spill management is fixed to always return the data that was spilled, rather than extraneous data caused by a residual value in the token on a spill request. If the token on a spill request is non-zero, the spill manager interprets the request as a retrieve, and the spilled data is lost because the caller does not expect data back from a spill request. Second, buffers were lost when AWEs in local storage were enqueued and the thread did not wait. When control returned from the module that enqueued the AWE, the AWE storage was reused. The AWE field was sometimes cleared, so the buffer being enqueued was lost. Third, fast path SDEP processing is fixed. PK11200 added support for the LOG5957 record, but that support was incomplete: it did not use FRXRRR to copy the LOG5957 from the input buffer to the buffer sent to the subordinate address space. Also, processing of the LOG5957 depended on a UOR token, but the LOG5957 does not contain a UOR token. As a result, the LOG5957 was sent to random subordinate address spaces and was not applied to the intended area.

The ABENDS40D is fixed by separating the buffer pool into two pools, one for read and one for sending buffers to subordinate address spaces. It is also fixed by sending all log data through FRXQBUF0 and FRXLMRG0 so that FRXUORM0 processes the data in order.
This way, the UOR-related storage can be released when the end-of-UOR notification is encountered. The hash table is reduced from a maximum of five levels to a single level, which significantly reduces private storage use for extremely large recoveries. Time stamps were never expected to contain hex zeros, so no time stamp validity check was done. Some vendor products may place X'F0' in the time stamp, so the fix adds validity checking.

The code is changed in the following parts:
Parameter change for multiple buffer types: FRXCAMG0, FRXCBDM0, FRXICLI0, FRXICTL0, FRXLMRG0, FRXMNP, FRXMSTR1, FRXPDIR0, FRXPDIS0, FRXPDSR0, FRXPDSS0, FRXPSDR0, FRXPSDS0, FRXQBUF0, FRXRBUF0, FRXRCTL0, FRXRDTH0, FRXUORM0, FRXHBUF0, FRXBDMG0, FRXMTC
Dump formatter recompile: FRXADF00, FRXADM10, FRXADM20
FRXBDCB0: Add RVUR as a fixed-length control block to improve performance when getting and releasing RVUR storage.
FRXBDMG0 and FRXMTC: Add message FRD2892I.
FRXCON: Define buffersend and bufferread.
FRXGFST: Add support for releasing buffers from multiple pool types.
FRXHBUF0: Add support for multiple buffer pool types.
FRXLMRG0: Fix an intermittent hang at end of read, when the end-of-log-read notification is not propagated to the unit of recovery manager because of a timing window. Add a diagnostic count on buffer release.
FRXMINI0 and FRXMSTR0: Lower the upper limit of the buffer percentage of private storage from three quarters to half, to avoid storage shortages and ABENDS40D.
FRXMSTR0: Clear storage before reusing it, to avoid an endless loop.
FRXPDSS0: Process the no-op notification from FRXUORM0 at end of data.
FRXQBUF0: Send an end-of-log-data notification to FRXUORM0 if no log data sets are to be read for recovery, to avoid a hang. Separate the OLR buffer logic from the non-OLR logic.
FRXRBUF0: Add diagnostic buffer counts. FRXRBUF0 is also modified to check the time stamp of a type X'06' log record. If the time stamp is zero, message FRD2892I is issued and recovery terminates abnormally with ABENDU385 RSN00A.
FRXRCTL0: Simplify the buffer freed process.
FRXRDTH0: Use AWEs from the AWE storage pool instead of local storage. Add a diagnostic count for buffers.
FRXRLRA0: Add logic to track the UOR token for 5950, 5937, and 5938.
FRXRRR: Add a diagnostic count for each log record.
FRXRVCS: Add an OLR indicator flag.
FRXRVDL: Add data space free space diagnostic information.
FRXRVGB: Add diagnostic count fields for buffer and record counting.
FRXRVQB: Add support for the separate OLR code path in FRXQBUF0.
FRXRVUR: Add support for spill, and add diagnostic fields for UOR tracking.
FRXURHS: Reduce the number of hash table levels to 1.
FRXUORM0: Add support to spill log data buffers on input if recovery is running low on private storage. Fix the lost buffer problem when spilling data. Fix support for SDEPs (LOG5957), and copy the LOG5957 to the output buffers via FRXRRR calls instead of MVCL. Fix the hang and endless loop caused by buffer pool storage contention by supporting multiple buffer pool types.
FRXWSPL0: If the request is to spill data, clear the remote token to avoid ABENDU0385 RSN 0015 in FRXWSPL0.
FRXWSPM0: Add a free space diagnostic field and tracking.

DOCUMENTATION CHANGE FOR APAR PK24575
THIS MAINTENANCE IS BEING HELD SO YOU WILL BE AWARE OF DOCUMENTATION CHANGE TO MANUAL(S): SC18940700
THE FOLLOWING TEXT DESCRIBES THE DOC CHANGE:
A change has been made in IBM IMS Database Recovery Facility for z/OS, User's Guide and Reference, Version 3 Release 1, at page 103, chapter 7: Messages and Codes of IMS Database Recovery Facility.

FRD2892I reason IN LOG RECORD seqnum DETECTED IN dsname
Explanation: Invalid record contents are detected in the log data set dsname during database data set recovery by the IMS Database Recovery Facility. The message destination is the z/OS system console and the IMS master terminal. If the message is issued in batch mode, the message destination is the z/OS system console.
The message is followed by ABEND 385-00A.
reason: Identifies the problem. It is one of the following:
Invalid time stamp
seqnum: The sequence number that identifies the log record in the log data set. It can be used to determine which record is bad.
dsname: The data set from which the log record was read.
User Action: Examine the log record identified in the message within the log data set listed in the message. Use the IMS DFSLOG06 macro mapping of the log record to determine the offset of the ACPRILOG field. If this time stamp is zero, determine whether anything in your environment interacts with IMS Logger component initialization or termination processing. If not, report this problem to IBM. In any case, use the appropriate tool or procedure to place the PRILOG time for the subsystem or batch job that created the log into the ACPRILOG field of the log record. Refer to the appropriate IMS documentation for the format of the PRILOG time stamp for the 06 log record. Make sure the 06 log records at the beginning and end of the log data set contain the time stamp.
System Action: The IMS Database Recovery Facility address space terminates.
Module: FRXRBUF0
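The FRXRBUF0 change and message FRD2892I describe a simple validity rule: a type X'06' log record whose time stamp is all hex zeros is rejected, and recovery ends abnormally. The minimal Python sketch below illustrates that rule under several assumptions. The log code is read from offset 4, assuming the usual 2-byte LL and 2-byte ZZ prefix. The ACPRILOG offset and length are caller-supplied and must come from the DFSLOG06 mapping; the 8 and 8 in the usage lines are placeholders, not real offsets. RecoveryAbend, validate_type06_timestamp, and the data set name IMS.SLDS.EXAMPLE are hypothetical.

```python
LOG_CODE_OFFSET = 4  # assumption: 2-byte LL + 2-byte ZZ precede the log code


class RecoveryAbend(Exception):
    """Hypothetical stand-in for DRF terminating with ABENDU385 RSN00A."""

    def __init__(self, code: int, reason: int, message: str):
        super().__init__(message)
        self.code = code
        self.reason = reason


def validate_type06_timestamp(record: bytes, seqnum: int, dsname: str,
                              ts_offset: int, ts_length: int) -> None:
    """Reject a type X'06' record whose time stamp field is all hex zeros.

    ts_offset and ts_length locate the ACPRILOG field; take them from the
    IMS DFSLOG06 mapping, not from this sketch.
    """
    if len(record) <= LOG_CODE_OFFSET or record[LOG_CODE_OFFSET] != 0x06:
        return  # not a type X'06' record: nothing to check here
    stamp = record[ts_offset:ts_offset + ts_length]
    # Sketch choice: a field cut short by a truncated record also counts as invalid.
    if len(stamp) < ts_length or not any(stamp):
        # Message text modeled on the documented FRD2892I format.
        msg = (f"FRD2892I Invalid time stamp IN LOG RECORD {seqnum} "
               f"DETECTED IN {dsname}")
        raise RecoveryAbend(385, 0x00A, msg)


if __name__ == "__main__":
    # Placeholder layout: 32-byte record, time stamp assumed at offset 8, length 8.
    good = bytearray(32)
    good[0:2] = (32).to_bytes(2, "big")
    good[LOG_CODE_OFFSET] = 0x06
    good[8:16] = bytes.fromhex("0123456789abcdef")
    validate_type06_timestamp(bytes(good), 1, "IMS.SLDS.EXAMPLE", 8, 8)  # passes

    bad = bytearray(good)
    bad[8:16] = bytes(8)  # hex zeros, as in the failing case
    try:
        validate_type06_timestamp(bytes(bad), 2, "IMS.SLDS.EXAMPLE", 8, 8)
    except RecoveryAbend as e:
        print(e, f"(ABENDU{e.code} RSN{e.reason:03X})")
```

Raising an exception rather than returning a flag mirrors the documented system action: the address space terminates instead of continuing with a record whose PRILOG time cannot be trusted.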
Temporary fix
*********
* HIPER *
*********
Comments
**** PE07/09/26 FIX IN ERROR. SEE APAR PK52492 FOR DESCRIPTION
APAR Information
APAR number
PK24575
Reported component name
IMS DB RECOVERY
Reported component ID
5655I4400
Reported release
310
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2006-05-08
Closed date
2006-10-26
Last modified date
2007-10-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK19165
Modules/Macros
FRXADF00 FRXADM10 FRXADM20 FRXAWEX FRXBDCB0 FRXBDMG0 FRXCAMG0 FRXCBDM0 FRXCON FRXDCB FRXDDRF FRXEDRF0 FRXGFST FRXGRPT0 FRXHBUF0 FRXICLI0 FRXICTL0 FRXLMRG0 FRXMINI0 FRXMINI1 FRXMNP FRXMSTR0 FRXMSTR1 FRXMTC FRXPDIR0 FRXPDIS0 FRXPDSR0 FRXPDSS0 FRXPSDR0 FRXPSDS0 FRXQBUF0 FRXRBUF0 FRXRCTL0 FRXRDTH0 FRXRLRA0 FRXRRR FRXRVCS FRXRVDL FRXRVGB FRXRVQB FRXRVUR FRXUORM0 FRXURHS FRXVSTA0 FRXWSPL0 FRXWSPM0
Publications Referenced
SC18940700
Fix information
Fixed component name
IMS DB RECOVERY
Fixed component ID
5655I4400
Applicable component levels
R310 PSY UK19165
UP06/10/28 P F610
Document Information
Modified date:
24 October 2007