Processing historical changes if replication stops

Planned or unplanned interruptions of replication can cause a subscription to lag behind current processing. Data Replication for VSAM manages this situation for you in most cases.

Your replication environment can go back in time and process units of recovery (UORs) that were not processed while replication was inactive for a subscription.

Figure 1. Historical changes available for replication processing after an outage
The source server stores units of recovery for reprocessing.

The target server maintains information in a bookmark database that the source server and log reader service use to restart change data capture at the correct log position. The exact position depends on the last contiguous committed UOR and the oldest in-flight UOR (U2 and U3 in Figure 1). Change data capture begins reading the logs at the first change of the oldest uncommitted UOR, so that no uncommitted UOR is missed. If the changes at that log position cannot be retrieved immediately from the cache, the log reader retrieves them from the logs.
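The restart rule above can be illustrated with a small model. This is a hypothetical sketch, not the product's implementation: log positions are plain integers, and the `UOR` class, `bookmark()`, and `restart_position()` are invented names. The bookmark is modeled as the last UOR in the contiguous run of applied commits. Capture then resumes at the earliest first change among UORs that are still in flight or that committed after the bookmark, which can fall *before* the bookmark's own commit position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UOR:
    """One unit of recovery as seen in the source log (hypothetical model)."""
    name: str
    first_change: int        # log position of the UOR's first change
    commit: Optional[int]    # log position of its commit record; None while in flight
    applied: bool = False    # True once the target server has applied it


def bookmark(uors: List[UOR]) -> Optional[UOR]:
    """Return the last UOR in the contiguous run of applied commits, in commit order."""
    committed = sorted((u for u in uors if u.commit is not None), key=lambda u: u.commit)
    last = None
    for u in committed:
        if not u.applied:
            break            # the first gap in commit order ends the contiguous run
        last = u
    return last


def restart_position(uors: List[UOR]) -> Optional[int]:
    """Return the log position where capture resumes, or None if nothing is pending."""
    mark = bookmark(uors)
    mark_pos = mark.commit if mark is not None else -1
    # Pending work: UORs still in flight, plus committed UORs beyond the bookmark.
    pending = [u for u in uors if u.commit is None or u.commit > mark_pos]
    if not pending:
        return None
    # Resume at the earliest first change so every pending UOR is re-read in full.
    return min(u.first_change for u in pending)


log = [
    UOR("U1", first_change=1, commit=4, applied=True),
    UOR("U2", first_change=3, commit=8, applied=True),   # last contiguous committed UOR
    UOR("U3", first_change=6, commit=None),              # oldest in-flight UOR
]
print(bookmark(log).name, restart_position(log))  # prints: U2 6
```

In this model, capture restarts at position 6 even though the bookmark is at position 8, because U3 began before U2 committed. This is the "going back in time" behavior that the section describes.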

Replication can stop under these conditions:

A subscription stops automatically
  • Your replication environment encounters a serious inconsistency between the source and target, such as:
    • Apply errors
      For example, when the replication mapping is in standard apply mode:
      • Updating a record that does not exist
      • Inserting a record whose key already exists in the data set
    • Validation processing that detects mismatches between the source and target VSAM data sets
  • An error or system outage occurs:
    • Link loss

      A link loss can be a lost TCP/IP connection between the source and target servers or a lost z/OS IP socket connection to CICS. It can also result from a transient communications problem or from the abrupt loss of the source server or source site. When the connection is lost unexpectedly, the target server tries to end replication in a controlled manner. Primarily, this means that replication tries to apply all UORs that were completely received before the connection was lost. This action is known as apply cache drain. If replication restarts before the apply cache is completely drained, apply cache drain ends as soon as the target server detects the new connection for the subscription, and replication then restarts normally on that connection.

      Note: If secondary errors are encountered during apply cache drain (for example, target data store errors), replication ends immediately.
    • A CICS system outage occurs and the subscription is stopped
    • Other internal errors
You stop a subscription and later restart it
  • The source or target data set is offline for maintenance.
  • You perform administrative operations on a replication mapping or a subscription, such as changing the state of a replication mapping from Parked to Active.
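The standard apply mode errors and the apply cache drain described above can be modeled together in a short sketch. Everything here is hypothetical: a Python dict stands in for a keyed VSAM data set, and `ApplyError`, `standard_apply()`, `drain_apply_cache()`, and the outcome strings are invented for illustration. The sketch also makes two assumptions that go beyond the text. First, each UOR is applied all-or-nothing by staging it on a copy. Second, an `ApplyError` raised during drain is treated as a secondary error that ends replication immediately. The documented example of a secondary error is a target data store error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


class ApplyError(Exception):
    """A change conflicts with the target data set (hypothetical)."""


@dataclass
class Change:
    op: str          # "insert" or "update" in this sketch
    key: str
    value: str


@dataclass
class CachedUOR:
    name: str
    changes: List[Change]
    complete: bool   # True if the UOR's commit arrived before the link was lost


def standard_apply(target: Dict[str, str], change: Change) -> None:
    """Apply one change; conflicts are errors, as in standard apply mode."""
    if change.op == "insert":
        if change.key in target:
            raise ApplyError(f"insert found duplicate key {change.key!r}")
    elif change.op == "update":
        if change.key not in target:
            raise ApplyError(f"update of missing record {change.key!r}")
    else:
        raise ValueError(f"unsupported operation {change.op!r}")
    target[change.key] = change.value


def drain_apply_cache(
    target: Dict[str, str],
    cache: List[CachedUOR],
    new_connection_detected: Callable[[], bool],
) -> Tuple[List[str], str]:
    """Apply completely received UORs after a link loss; return (applied, outcome)."""
    applied: List[str] = []
    for uor in cache:
        if new_connection_detected():
            return applied, "restarted"       # drain ends; replication restarts on the new link
        if not uor.complete:
            continue                          # not applied; recaptured from the log on restart
        staged = dict(target)                 # stage the UOR so it applies all-or-nothing
        try:
            for change in uor.changes:
                standard_apply(staged, change)
        except ApplyError:
            return applied, "ended_on_error"  # secondary error: end replication immediately
        target.clear()
        target.update(staged)
        applied.append(uor.name)
    return applied, "drained"
```

Passing `new_connection_detected` as a callable lets the sketch check for a restarted subscription between UORs, which mirrors the rule that drain stops as soon as a new connection is detected.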

When a subscription stops, processing for other subscriptions continues. If you want to replicate most of the data sets in a subscription, or all of its data sets but one, use the following approach:

  1. Stop the subscription.
  2. Park the replication mappings that you do not want to replicate.
  3. Restart replication for the subscription.