Recovering data sets after CICS VR batch job failures

Data Replication for VSAM captures and replicates CICS® VR changes written during backout.

If a batch job that updates a data set being logged for replication fails at the source site, you need to consider the implications for the corresponding target data set. The source and target data sets must be point-in-time consistent.

The actions that you take to restart a failed source batch job might require you to perform actions against your target data sets. For example, if you decide to restore a source data set to a point in time before the batch job ran and then re-run the batch job, you also need to copy the restored version of the data set to the target and set the log position for the replication mapping before you re-run the batch job.

Recovering data sets using DWWBACK

After a batch job that updates VSAM data sets defined with FRLOG(UNDO) or FRLOG(ALL) fails, you can run the CICS VR batch backout program (DWWBACK) to undo the changes to the source data sets. When you use DWWBACK, Data Replication for VSAM captures and replicates those changes to maintain the corresponding target data sets.

Recommendation: Use DWWBACK whenever possible. Other recovery methods are likely to result in a replication outage for the data sets.

For detailed information about the DWWBACK program, see Starting CICS VR batch backout.

Recovering data sets using other recovery methods

If you do not run DWWBACK, problems can arise that affect replication and require additional actions. You must ensure that the source and target data sets are point-in-time consistent before restarting the batch job. The actions that you take depend on the type of CICS VR batch logging that is enabled.

The following examples describe the actions that you will likely need to perform to recover VSAM data sets when you do not run DWWBACK.
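The examples that follow can be summarized as a small decision function. This is only a sketch of the logic in this topic: the names FrLog, dwwback_applicable, and path_without_dwwback are hypothetical and are not part of CICS VR or Data Replication for VSAM, and the returned strings are informal labels rather than commands.

```python
from enum import Enum


class FrLog(Enum):
    """FRLOG settings named in this topic."""
    NONE = "NONE"
    REDO = "REDO"
    UNDO = "UNDO"
    ALL = "ALL"


def dwwback_applicable(frlog):
    # The topic says DWWBACK can undo changes for FRLOG(UNDO) or FRLOG(ALL).
    return frlog in (FrLog.UNDO, FrLog.ALL)


def path_without_dwwback(frlog, restartable):
    """Hypothetical helper: name the recovery path when DWWBACK is not run."""
    if frlog is FrLog.ALL:
        # The examples do not describe FRLOG(ALL) without DWWBACK.
        raise ValueError("FRLOG(ALL) without DWWBACK is not covered by the examples")
    if restartable:
        return "restart the job at the source; no action at the target"
    if frlog is FrLog.REDO:
        return "forward recover the source, re-run the job, then re-copy to the target"
    # FRLOG(NONE) and FRLOG(UNDO) share the same restore procedure.
    return "restore the source, re-run the job, then re-copy to the target"


if __name__ == "__main__":
    for setting in (FrLog.NONE, FrLog.REDO, FrLog.UNDO):
        print(setting.name, "->", path_without_dwwback(setting, restartable=False))
```

The function raises an error for FRLOG(ALL) instead of guessing, because the examples only cover FRLOG(NONE), FRLOG(REDO), and FRLOG(UNDO) when DWWBACK is not run.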

Example: No logging

A batch job that updates VSAM data sets defined with FRLOG(NONE) fails. The action that you take depends on whether you can restart the batch job to reverse the failed changes.
  • If the batch job is restartable, you can restart the job at the source and no action is needed at the target.
  • Otherwise, you need to restore the data sets. You can take the following actions:
    1. Stop replication for the subscription with a controlled stop.
    2. Park the replication mapping.
    3. Restart replication for the other replication mappings in the subscription.
    4. Restore the source data sets.
    5. Re-run the batch job.
    6. Determine a quiesce point.
    7. Stop replication for the subscription with a controlled stop.
    8. Copy the source data sets to the target and use the quiesce point to set the log position for the replication mapping.
    9. Restart replication.
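The order of the steps above matters: the mapping is parked before the source is restored, and the copy to the target happens only after a quiesce point is known. The following sketch models the procedure as an ordered checklist that refuses out-of-order completion. The names recovery_steps and Checklist are hypothetical, and the code does not issue any replication or CICS VR commands; it only tracks progress through the documented steps.

```python
from typing import List


def recovery_steps(source_action: str = "Restore the source data sets.") -> List[str]:
    """Return the nine documented steps; step 4 is the source recovery action."""
    return [
        "Stop replication for the subscription with a controlled stop.",
        "Park the replication mapping.",
        "Restart replication for the other replication mappings in the subscription.",
        source_action,
        "Re-run the batch job.",
        "Determine a quiesce point.",
        "Stop replication for the subscription with a controlled stop.",
        "Copy the source data sets to the target and use the quiesce point "
        "to set the log position for the replication mapping.",
        "Restart replication.",
    ]


class Checklist:
    """Tracks progress and rejects steps that are attempted out of order."""

    def __init__(self, steps: List[str]) -> None:
        self.steps = list(steps)
        self.done = 0  # number of steps completed so far

    def complete(self, number: int) -> str:
        if number != self.done + 1:
            raise RuntimeError(
                f"step {number} attempted before step {self.done + 1}"
            )
        self.done = number
        return self.steps[number - 1]

    def remaining(self) -> List[str]:
        return self.steps[self.done:]


if __name__ == "__main__":
    checklist = Checklist(recovery_steps())
    for number in range(1, 4):
        print(number, checklist.complete(number))
    print(len(checklist.remaining()), "steps remaining")
```

Passing a different source_action, for example "Forward recover the source data sets.", reuses the same sequence for a procedure that differs only in step 4.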

Example: Forward recovery logging

A batch job that updates VSAM data sets defined with FRLOG(REDO) fails. Changes applied by CICS VR during forward recovery are not replicated, so you need to restore the target data sets to match the source data sets after forward recovery processing. The action that you take depends on whether you can restart the batch job.
  • If the batch job is restartable, you can restart the job at the source and no action is needed at the target.
  • Otherwise, to restore the data sets to a point in time by forward recovering changes against the data sets, you can take the following actions:
    1. Stop replication for the subscription with a controlled stop.
    2. Park the replication mapping.
    3. Restart replication for the other replication mappings in the subscription.
    4. Forward recover the source data sets.
    5. Re-run the batch job.
    6. Determine a quiesce point.
    7. Stop replication for the subscription with a controlled stop.
    8. Copy the source data sets to the target and use the quiesce point to set the log position for the replication mapping.
    9. Restart replication.

Example: Undo logging

A batch job that updates VSAM data sets defined with FRLOG(UNDO) fails. If the batch job was running for an hour and the subscription is fairly current, the bookmark will be later than the point at which the fall-back source copy (a copy of each data set made before the start of the batch job) was taken. In this case, you cannot restore the data sets to the fall-back copy and copy them to the target, because you cannot set a log position earlier than the bookmark and restart replication successfully. To recover the data sets, you can take the following actions:
  • If the batch job is restartable, you can restart the job at the source and no action is needed at the target.
  • Otherwise, you can take the same actions as described for FRLOG(NONE).
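The bookmark constraint in this example can be illustrated with a small timeline check. This is a sketch under stated assumptions: log positions are modeled as plain timestamps, the function can_set_log_position is hypothetical, all times are invented for illustration, and treating a position equal to the bookmark as acceptable is an assumption rather than documented behavior.

```python
from datetime import datetime


def can_set_log_position(requested, bookmark):
    """Assumed rule: a log position earlier than the bookmark is rejected."""
    return requested >= bookmark


# Illustrative timeline (all values invented for the example).
fallback_copy_taken = datetime(2024, 1, 1, 1, 0)  # copy made before the job starts
bookmark = datetime(2024, 1, 1, 1, 58)            # subscription is fairly current
quiesce_point = datetime(2024, 1, 1, 3, 0)        # determined after the re-run

print(can_set_log_position(fallback_copy_taken, bookmark))  # False
print(can_set_log_position(quiesce_point, bookmark))        # True
```

The first check fails because the fall-back copy predates the bookmark. The second succeeds because the quiesce point is determined after the batch job is re-run, which is why the restore procedure copies the data sets to the target only at that later point.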