
Recent Online Database Reorganization (ODBR) maintenance

  

For customers planning to use the Online Database Reorganization (ODBR) package in the near future, here is a list of recent ODBR-related APARs. These APARs did not ship on PUT 11 but will ship later this year with PUT 12. All of them are now available for download from the z/TPF website. We highly recommend that customers download and integrate these APARs into their z/TPF environment as soon as practical.

 

APAR PJ42785

 

Please see the following blog post to understand the issue solved by this APAR.

https://www.ibm.com/developerworks/community/blogs/zTPF/entry/apar_pj42785_a_new_type_of_spare_ramfil?lang=en

 

More information on APAR PJ42785 can be seen at the following link:

http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1PJ42785

 

 

APAR PJ42768

 

This APAR solves two problems that can occur when modules are offline on a selectively duplicated device type:

  • The z/TPF system lost a module during the ODBR move session, or a module was offline at the time the session was started.
  • A selectively duplicated record is being flushed from VFA and the owning module or shadow copy module is offline.

In the documentation (see topic entitled Using ODBR), IBM recommends that all modules be online before starting an ODBR move session; this is especially true for selectively duplicated systems:

 

“If the device type in your MDBF subsystem is defined in the system initialization process (SIP) as selectively duplicated, you must ensure real-time DASD modules are online during an ODBR session if you attempt to move simplex fixed or pool records.”

 

To move a record, the system must obtain a Record Reorganization Lock (RRL) before reading the data from disk. If the module is selectively duplicated and offline, and the record is not duplicated, the attempt to obtain the RRL will fail. The ODBR move program recognizes that the module is offline and will not move the data. Prior to PJ42768, at exit time ODBR believed that it had obtained an RRL and would attempt to release it. The release attempt would also fail, leading to CTL-38 and CTL-2D catastrophic errors. With PJ42768, although the ODBR move session may put itself into a paused state when reading from or writing to an offline module, the code now recognizes the failure appropriately and no outage is taken. At this point the user must bring the offline module back online. Once the module is online, the ODBR move session can be restarted from where it left off (ZODBR RESTART).
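The lock handling described above can be illustrated with a minimal sketch. It is not z/TPF code: `LockTable`, `ModuleOffline`, and `move_record` are hypothetical names, and the model assumes only what the text states, namely that a release must be attempted only when the lock was actually obtained.

```python
# Hypothetical model of the RRL acquire/release pattern; not z/TPF APIs.
class ModuleOffline(Exception):
    """Raised when a lock is requested for a record on an offline module."""


class LockTable:
    def __init__(self, offline_modules):
        self.offline = set(offline_modules)
        self.held = set()

    def acquire(self, module, record):
        if module in self.offline:
            raise ModuleOffline(module)
        self.held.add((module, record))

    def release(self, module, record):
        # Releasing a lock that was never obtained raises KeyError,
        # standing in for the failed release described in the text.
        self.held.remove((module, record))


def move_record(locks, module, record):
    """Return 'moved' or 'paused'; release the lock only if it was obtained."""
    acquired = False
    try:
        locks.acquire(module, record)
        acquired = True
        return "moved"          # the actual data move would happen here
    except ModuleOffline:
        return "paused"         # session pauses until the module is online
    finally:
        if acquired:            # the guard that the pre-fix exit path lacked
            locks.release(module, record)
```

In this model, calling `move_record` for a record on an offline module returns `"paused"` and never reaches `release`, so no error is raised on the exit path.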

 

When a delay-file VFA buffer associated with a simplex record being moved by ODBR is flushed, the flush triggers the shadow copy process. If the regular or shadow destination module is offline, the code progresses to the SNAPC-30 routine (CJPXEMTR). Prior to PJ42768, CJPXEMTR did not account for ODBR shadow copy and would release the VFA buffer when the IOB completed. The code did not interrogate the cross-chain pointers in the IOB (the ODBR equivalent of MIOPAIR) to ensure that only the last IOB released the VFA buffer. This could lead to a CTL-542 catastrophic error and IPL when the second IOB hit post interrupt, because the VFA buffer had likely already been released. Now, one SNAP-30 is issued for each of the normal and shadow IOBs (if modules are offline) and the system releases the VFA buffer when the last IOB hits post interrupt, which avoids the logic error and processor IPL.
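The "last IOB releases the buffer" rule can be sketched as follows. This is an illustrative model only: `VfaBuffer`, `Iob`, `pair`, and `post_interrupt` are hypothetical names, and the `partner` attribute merely stands in for the cross-chain pointers mentioned above.

```python
# Hypothetical model of two cross-chained I/O blocks sharing one buffer.
class VfaBuffer:
    def __init__(self):
        self.released = False
        self.release_count = 0

    def release(self):
        if self.released:
            # Stand-in for the double release that led to the system error.
            raise RuntimeError("buffer released twice")
        self.released = True
        self.release_count += 1


class Iob:
    def __init__(self, name, buffer):
        self.name = name
        self.buffer = buffer
        self.partner = None     # cross-chain pointer to the paired IOB
        self.done = False


def pair(a, b):
    """Cross-chain the normal and shadow IOBs."""
    a.partner = b
    b.partner = a


def post_interrupt(iob):
    """Mark the IOB complete; release the buffer only if the partner is done.

    Returns True when this call released the buffer.
    """
    iob.done = True
    if iob.partner is None or iob.partner.done:
        iob.buffer.release()
        return True
    return False
```

With a normal and a shadow IOB paired on the same buffer, the first `post_interrupt` call leaves the buffer in place and the second releases it exactly once, regardless of which IOB completes first.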

 

More information on APAR PJ42768 can be seen at the following link:

http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1PJ42768

 

 

APAR PJ42737

 

During an ODBR move session, both the time-initiated and the operator-entered ZODBR DISPLAY STATUS commands display an estimated completion time for the move session. Prior to PJ42737, this estimate was not accurate. Items like the following were not properly accounted for in the original estimation:

  • Time spent doing no ODBR work (idle time). For example, this could be time spent during an IPL or ZODBR PAUSE.
  • Time spent moving a record, meaning that the number of I/Os required to move a record must be factored.
  • Time required for automatic validation if enabled in the ODBR profile.
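To make the three factors above concrete, here is a minimal sketch of one way such an estimate could be computed. The APAR does not publish the actual formula, so the function, its parameters, and the per-I/O averaging are all assumptions made for illustration.

```python
# Hypothetical estimator; the real ODBR calculation is not documented here.
def estimate_remaining_seconds(elapsed_s, idle_s, ios_done,
                               move_ios_left, validation_ios_left=0):
    """Estimate remaining time from the observed rate of completed I/Os.

    elapsed_s           wall-clock seconds since the session started
    idle_s              seconds spent doing no ODBR work (IPL, pause)
    ios_done            I/Os completed so far
    move_ios_left       I/Os still needed to move the remaining records
    validation_ios_left I/Os still needed for automatic validation, if enabled
    """
    active_s = elapsed_s - idle_s          # exclude idle time from the rate
    if ios_done <= 0 or active_s <= 0:
        return None                        # not enough data for an estimate
    seconds_per_io = active_s / ios_done
    return (move_ios_left + validation_ios_left) * seconds_per_io
```

For example, with 1000 elapsed seconds of which 400 were idle, 1200 I/Os done, and 2400 move I/Os left, this sketch yields 1200 seconds; ignoring the idle time would have inflated the estimate to 2000 seconds.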

Finally, the number of restart areas used for an ODBR move session was decreased from 16 to 8. This spreads the ECB level over fewer restart areas, so in-progress work completes more quickly. Because restart areas complete more quickly, the time window during which Record Reorganization Locks (RRLs) must be obtained during shadow copy is reduced.

 

More information on APAR PJ42737 can be seen at the following link:

http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1PJ42737

 

 

APAR PJ42654

 

ODBR support is designed so that successive move sessions can be loaded, executed, and accepted. ODBR keypoint records are initialized at appropriate times during the process so that intervening steps or IPLs are not required. Prior to PJ42654, if the operator chose to pause the ODBR move session (ZODBR PAUSE) on the second or any later move session, the pause was rejected and message ODBR2504E was issued instead. This forced the operator to decide whether to let the move session run to completion or to cancel (ZODBR CANCEL) what had already been moved and validated, which meant either waiting some time for the session to complete or undoing a lot of progress. The cause was that certain fields in the processor unique keypoint were assumed to be zero when the keypoint was initialized. These fields are now set to zero when the processor unique keypoint is initialized, so ZODBR PAUSE is now possible on successive ODBR move sessions.
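The class of problem described above, a reused control record whose fields are assumed rather than set to zero, can be sketched as follows. The field names and the pause check are invented for illustration; the actual keypoint layout is not described in this post.

```python
# Hypothetical keypoint model; field names are assumptions, not z/TPF fields.
from dataclasses import dataclass, fields


@dataclass
class ProcessorKeypoint:
    session_number: int = 0
    pause_flags: int = 0
    work_counter: int = 0


def init_keypoint_assuming_zero(kp, session_number):
    """Pre-fix style: sets only what it needs and assumes the rest is zero."""
    kp.session_number = session_number


def init_keypoint_explicit(kp, session_number):
    """Post-fix style: zero every field, then set the new session number."""
    for f in fields(kp):
        setattr(kp, f.name, 0)
    kp.session_number = session_number


def can_pause(kp):
    """In this model a pause is accepted only when no stale flags remain."""
    return kp.pause_flags == 0
```

If the first session leaves `pause_flags` nonzero, the first initializer carries that stale value into the second session and `can_pause` returns `False`, while the explicit initializer clears it.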

 

The ODBR activity monitor performs actions like:

  • Time-initiated keypointing of ODBR status
  • Time-initiated displaying of ODBR status
  • Validation that ODBR management ECBs are still active. If a management ECB exited abnormally, the code is to recognize this issue in a reasonable amount of time and cancel the ODBR move session.

Prior to PJ42654, the code that validates ODBR management ECBs was in error. As a result, the activity monitor could not properly identify a management ECB that had exited abnormally. The ODBR move session could therefore hang, and it was up to the operator to recognize this and cancel the session. Time spent waiting on a hung session is time that could have been spent cancelling and restarting the move session.
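The liveness check performed by the activity monitor can be pictured with a small sketch. The names `check_management_ecbs` and `monitor_tick` are hypothetical, and representing ECBs as names in a list is a simplification made only for this example.

```python
# Hypothetical liveness check; ECBs are modelled as plain names.
def check_management_ecbs(expected, active):
    """Return the sorted names of management ECBs that are no longer active."""
    return sorted(set(expected) - set(active))


def monitor_tick(expected, active, cancel_session):
    """Run one monitor cycle; cancel the session if any management ECB is gone.

    cancel_session is a callback that receives the list of missing ECBs.
    """
    missing = check_management_ecbs(expected, active)
    if missing:
        cancel_session(missing)
        return "cancelled"
    return "running"
```

Running the check on every monitor cycle bounds how long a session can sit hung before it is cancelled, which is the behavior the corrected validation code is meant to provide.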

 

Track validation during ZODBR LOAD may take a long time, depending on the number of records moved. Prior to PJ42654, if a track was duplicated, the prime track was validated first and the duplicate track was validated only afterward. Overlapping the track validation I/Os yields a significant performance improvement.
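The difference between validating the prime and duplicate tracks one after the other and overlapping the two I/Os can be sketched with threads. This only illustrates the general technique: `validate_track` is a hypothetical stand-in that sleeps to simulate I/O latency, and z/TPF does not use Python threads.

```python
# Illustrative comparison of sequential versus overlapped validation I/O.
import time
from concurrent.futures import ThreadPoolExecutor


def validate_track(track_id, io_delay_s=0.05):
    """Hypothetical validation; the sleep stands in for a track read."""
    time.sleep(io_delay_s)
    return True


def validate_sequential(prime, duplicate):
    """Validate the prime track, then the duplicate (roughly two I/O times)."""
    return validate_track(prime) and validate_track(duplicate)


def validate_overlapped(prime, duplicate):
    """Issue both validations at once and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        prime_result = pool.submit(validate_track, prime)
        dup_result = pool.submit(validate_track, duplicate)
        return prime_result.result() and dup_result.result()
```

Because both reads are in flight at the same time, the overlapped version should take roughly one I/O time per duplicated track instead of two.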

 

Prior to PJ42654, track validation would use only the number of ECBs set in STARTLVL at the time ZODBR LOAD was entered. Now the operator can change STARTLVL during track validation by using the ZODBR PROFILE command, and the system reacts by throttling the number of ECBs it creates accordingly.
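The change from a limit captured once at start to a limit that is re-read while work is in progress can be sketched as below. `run_validation` and `get_startlvl` are hypothetical names, and processing tracks in fixed batches is a simplification of how ECBs would really be created.

```python
# Hypothetical throttle that re-reads the limit on every cycle.
def run_validation(tracks, get_startlvl):
    """Process tracks in batches sized by the current limit.

    get_startlvl is called before each batch, so a change made while
    validation is running takes effect on the next cycle. Returns the
    size of each batch that was started.
    """
    pending = list(tracks)
    batch_sizes = []
    while pending:
        limit = max(1, get_startlvl())     # never stall on a zero limit
        batch = pending[:limit]
        pending = pending[limit:]
        batch_sizes.append(len(batch))     # real work would be dispatched here
    return batch_sizes
```

With ten tracks and a limit that the operator raises from 2 to 4 partway through, the batch sizes become 2, 2, 4, 2; a limit read only once at the start would have stayed at 2 for every batch.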

 

More information on APAR PJ42654 can be seen at the following link:

http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1PJ42654