Procedure for failed RLS mode forward recovery operation

Some forward recovery failures can be resolved.

For example, when:

FRSETRR fails because the data set is already allocated to another job
The restore fails because the backup has gone to tape and operator intervention is required
The forward recovery does not work because the log data sets have been migrated

In these cases, you can resolve the cause of the failure and try the whole process again.

This topic describes what to do when the failure in forward recovery cannot be resolved. If you cannot apply all the forward recovery log data to a restored backup, you must abandon the forward recovery and revert to your most recent full backup. For this situation, the access method services SHCDS command provides the FRDELETEUNBOUNDLOCKS subcommand, which deletes the retained locks that were associated with the data set instead of rebinding them to the recovered data set, as happens after a successful forward recovery.

The most likely cause of a forward recovery failure is the loss or corruption of one or more forward recovery logs. If this happens, you probably have no alternative but to restore the most recent backup and manually reapply the lost updates to the data set. Before you restore that backup, it is important to force CICS® to discard any pending (shunted) units of work for the data set that failed forward recovery, because during recovery processing CICS assumes that it is operating on a data set that has been correctly forward recovered.

CICS performs most of its recovery processing automatically: when the region is restarted, when files are opened, or when a data set is unquiesced. You cannot be sure of preventing CICS from attempting this recovery processing, so you must force it to take place before you restore the backup. How you do this depends on whether the affected CICS regions are still running:

For a CICS region that is still running, issue the appropriate CICS commands to initiate the retry of pending units of work.
For a CICS region that is shut down, restart it to cause CICS to retry automatically any pending units of work.

Note: Ensure that you issue any CICS commands, or restart a CICS region, before you restore the most recent backup. Otherwise, CICS performs recovery processing against the restored data set, which you do not want.

In the event of a failed forward recovery of a data set, use the following procedure:

1. Tidy up any outstanding CICS recovery work, as follows:
  1. Make sure that any CICS regions that are not running, and which could have updated the data set, are restarted so that emergency restart processing can drive outstanding backouts.
  2. Using the information returned from the INQUIRE UOWDSNFAIL command issued in each CICS region that uses the data set, compile a list of all shunted units of work that hold locks on the data set.
  3. If there are shunted indoubt units of work, try to resolve them before proceeding to the next step. This is because the indoubt units of work might have updated resources other than the failed data set, and you do not want to corrupt these other resources.
    If the resolution of an indoubt unit of work results in backout, the backout fails for the data set that is being restored, because the data set is still in a recovery-required state. The later step to reset locks for backout-failed units of work allows you to tidy up any such backout failures that are generated by the resolution of indoubt units of work.
  4. In all CICS regions that could have updated the failed data set:
    1. Force shunted indoubt units of work using SET DSNAME(…) UOWACTION(COMMIT | BACKOUT | FORCE).
      Before issuing the next command, wait until the SET DSNAME(…) UOWACTION command has completed against all shunted indoubt units of work.

      If the UOWACTION command for an indoubt unit of work results in backout, the backout fails for the data set that is being restored, because the data set is still in a recovery-required state. The next step, which resets locks for backout-failed units of work, allows you to tidy up any such backout failures that are generated by the resolution of indoubt units of work.
    2. Reset locks for backout-failed units of work using SET DSNAME(…) RESETLOCKS.
      Do not issue this command until the previous UOWACTION command has completed. RESETLOCKS operates only on backout-failed units of work, and does not affect units of work that are in the process of being backed out. If you issue RESETLOCKS too soon, and shunted indoubt units of work fail during backout, the data set is left with recovery work pending.
    The DFH0BAT3 sample program provides an example of how you can do this.

    There should not now be any shunted units of work on any CICS region with locks on the data set.
2. When you are sure that all CICS regions have completed any recovery work for the data set, delete the unbound locks using SHCDS FRDELETEUNBOUNDLOCKS.
  Note: It is very important to enter this command (and the following SHCDS FRRESETRR) at this stage in the procedure. If you do not, and the failed data set was in a lost locks condition, the data set remains in a lost locks condition unless you perform a cold start of all CICS regions that have accessed it. If you mistakenly issued this command before step 1 of the procedure, the lost locks problem still arises, even if you issue the command again at this stage.
3. Allow access to the data set again using SHCDS FRRESETRR. Even if you use CICS VSAM Recovery for z/OS®, you have to allow access manually if you have abandoned the forward recovery.
4. Finally, restore the backup copy from which you intend to work.
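
The timing constraint on RESETLOCKS in the procedure above can be illustrated with a small model. The following Python sketch is not CICS code and calls no CICS interface. The class, state names, and unit of work identifiers (`DataSetModel`, `UowState`, `UOW1`, `UOW2`) are hypothetical and exist only for illustration. The sketch assumes that every forced indoubt unit of work is backed out and that each backout fails because the data set is still recovery-required, as described above.

```python
from enum import Enum, auto


class UowState(Enum):
    """Illustrative states only; these are not CICS CVDA values."""
    SHUNTED_INDOUBT = auto()
    BACKING_OUT = auto()
    BACKOUT_FAILED = auto()
    RELEASED = auto()


class DataSetModel:
    """Toy model of the shunted units of work holding locks on one data set."""

    def __init__(self, uow_ids):
        self.uows = {uow_id: UowState.SHUNTED_INDOUBT for uow_id in uow_ids}

    def _move(self, source, target):
        # Move every unit of work that is in state `source` to state `target`.
        for uow_id in list(self.uows):
            if self.uows[uow_id] is source:
                self.uows[uow_id] = target

    def force_backout(self):
        # Models the UOWACTION step: indoubt units of work start backing out.
        self._move(UowState.SHUNTED_INDOUBT, UowState.BACKING_OUT)

    def finish_backouts(self):
        # Assumption: backout fails because the data set is recovery-required.
        self._move(UowState.BACKING_OUT, UowState.BACKOUT_FAILED)

    def reset_locks(self):
        # Models RESETLOCKS: only backout-failed units of work are affected.
        self._move(UowState.BACKOUT_FAILED, UowState.RELEASED)

    def pending(self):
        # Units of work that still hold recovery work for the data set.
        return sorted(u for u, s in self.uows.items() if s is not UowState.RELEASED)


def run(steps):
    """Apply the named steps in order and report what is still pending."""
    model = DataSetModel(["UOW1", "UOW2"])
    for step in steps:
        getattr(model, step)()
    return model.pending()


# RESETLOCKS issued before the backouts have completed: nothing is reset.
too_soon = run(["force_backout", "reset_locks", "finish_backouts"])
# RESETLOCKS issued after the UOWACTION processing has completed.
in_order = run(["force_backout", "finish_backouts", "reset_locks"])
```

In this model, `too_soon` still lists both units of work as pending, whereas `in_order` is empty. This corresponds to the warning that issuing RESETLOCKS too soon leaves the data set with recovery work pending.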

If the restored data set is eligible for backup-while-open (BWO) processing, you might need to reset the BWO attributes of the data set in the ICF catalog. This is because the failed forward recovery might have left the data set in a ‘recovery-in-progress’ state. You can reset the attributes by using the CEMT, or EXEC CICS, SET DSNAME RECOVERED command.

If you do not follow this sequence of operations, the restored backup could be corrupted by CICS backout operations.
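
As an aid to following the sequence, the order of operations can be written down as a checklist and checked mechanically. The following Python sketch is illustrative only. The step names and the helper functions `shcds_statements` and `first_out_of_order` are hypothetical, and the data set name in the usage lines is a placeholder. The sketch treats every step as mandatory, which simplifies the procedure because resolving indoubt units of work applies only if any exist. The parenthesised data set name on the SHCDS subcommands is an assumption about the command syntax that you should confirm in the access method services documentation.

```python
# Order of operations taken from the procedure; the step names are invented
# here for illustration and are not CICS or SHCDS keywords.
REQUIRED_ORDER = [
    "restart_stopped_regions",
    "list_shunted_uows",
    "resolve_indoubts",
    "force_indoubts",
    "reset_locks",
    "delete_unbound_locks",
    "reset_recovery_required",
    "restore_backup",
]


def shcds_statements(dsname):
    """Return the two SHCDS statements in the order the procedure requires.

    Assumption: the data set name is supplied in parentheses after the
    subcommand name.
    """
    return [
        f"SHCDS FRDELETEUNBOUNDLOCKS({dsname})",
        f"SHCDS FRRESETRR({dsname})",
    ]


def first_out_of_order(performed):
    """Return (step, missing_prerequisite) for the first step that was done
    before an earlier step in REQUIRED_ORDER, or None if the order is valid.

    Raises ValueError if `performed` contains an unknown step name.
    """
    done = set()
    for step in performed:
        position = REQUIRED_ORDER.index(step)
        missing = [s for s in REQUIRED_ORDER[:position] if s not in done]
        if missing:
            return step, missing[0]
        done.add(step)
    return None


# Usage with a placeholder data set name.
statements = shcds_statements("HYPOTHETICAL.RLS.KSDS")
problem = first_out_of_order(["restart_stopped_regions", "restore_backup"])
```

Here `problem` identifies `restore_backup` as having been performed before `list_shunted_uows`, which is the kind of ordering error that could allow CICS backout operations to run against the restored backup.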

All the parts of step 1 of the previous procedure (tidying up any outstanding CICS recovery work) can also be appropriate in similar situations where you do not want CICS to perform pending backouts. For example, you might use them before you convert an RLS SMS-managed data set that has retained locks to non-SMS, because the locks will be lost.