Conflict resolution in high availability environments

If you replicate VSAM data in both directions as part of a HADR solution, adaptive apply processing cannot handle all the data inconsistencies that can arise.

Failover and restart processing

In a HADR solution, you deploy a source server at the site that you are updating actively (in this example, site 1) and a target server at the failover site (site 2). You also set up a second deployment that consists of a source server at site 2 and a target server at site 1. Because you start replication in both directions, bookmarks advance at both sites during normal operations, even if you route all transactions to site 1. If a planned or unplanned outage occurs at site 1, you redirect transactions to site 2 during the outage.

When you restore service to site 1, these replication operations typically occur:

  1. Restart processing replicates any committed units of recovery (UORs) at site 1 that failed to replicate before the outage.
  2. Failover processing replicates changes made at site 2 during the outage back to site 1.

Because these replication operations run in both directions, they typically lead to mismatches between source and target databases that adaptive apply cannot resolve. You can use adaptive apply messages to determine which UOR should have been replicated, and then make the appropriate corrections to the data sets. Ensure that the CONFLICTRPTLVL configuration parameter is set to generate messages for review.

Example

This example demonstrates how data inconsistencies can arise in a HADR environment. Consider the following scenario:

  1. Prior to the failover, a unit of recovery (U1) at site 1 sets the BALANCE field in record A from 50 to 100.

    This is a committed UOR, but the failover occurs before the source server can replicate U1 to the target site.

  2. After the failover, another unit of recovery (U2) at site 2 sets the BALANCE field in record A from 50 to 200.
  3. After site 1 is back online, the source server at site 1 sends U1 to site 2.

    The balance information at site 2 (BALANCE=200) does not match the before image data in U1 (BALANCE=50), so adaptive apply processing discards U1. As a result, BALANCE=100 at site 1 and BALANCE=200 at site 2.

  4. Site 2 replicates U2 to site 1 when failover processing sends back the changes that occurred at the failover site during the outage.

    The balance information at site 1 (BALANCE=100) does not match the before image data in U2 (BALANCE=50), so adaptive apply processing discards U2. As before, BALANCE=100 at site 1 and BALANCE=200 at site 2.
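
The four steps above can be traced with a minimal sketch. It assumes that adaptive apply reduces to a single comparison between the before image in the change and the current value at the target, which is a simplification of the behavior described in this example. The `Update` class, the `apply_update` function, and the `conflicts` list are hypothetical names used for illustration only; they are not part of Data Replication for VSAM.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """Hypothetical model of one replicated change to the BALANCE field."""
    uor: str      # unit of recovery that made the change
    key: str      # record key
    before: int   # before image of BALANCE
    after: int    # after image of BALANCE


def apply_update(target: dict, upd: Update, conflicts: list) -> bool:
    """Apply upd only if the target value matches the before image.

    Assumption: a mismatch discards the change and records a conflict
    entry for later review.
    """
    current = target.get(upd.key)
    if current != upd.before:
        conflicts.append((upd.uor, upd.key, upd.before, current))
        return False
    target[upd.key] = upd.after
    return True


# Both sites start with BALANCE=50 in record A.
site1 = {"A": 50}
site2 = {"A": 50}
conflicts = []

# Step 1: U1 commits at site 1 but is not replicated before the outage.
u1 = Update("U1", "A", before=50, after=100)
site1["A"] = u1.after

# Step 2: U2 commits at site 2 during the outage.
u2 = Update("U2", "A", before=50, after=200)
site2["A"] = u2.after

# Step 3: restart processing sends U1 to site 2.
applied_u1 = apply_update(site2, u1, conflicts)

# Step 4: failover processing sends U2 back to site 1.
applied_u2 = apply_update(site1, u2, conflicts)

print(site1, site2)   # the two sites still disagree
for entry in conflicts:
    print("discarded:", entry)
```

Both changes are discarded, so the sites stay at different balances. The recorded conflict entries play the role that the adaptive apply messages play in the product: they give you the information needed to decide which value is correct and to repair the data sets manually.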

Recursion protection

Subscriptions at two different sites (site A and site B) must be identical. They must have the same number of replication mappings, the replication mappings must be in the same states, and the names and attributes of the VSAM data sets must match.
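
The following sketch shows one way to check the three matching conditions above before you start replication in both directions. It assumes that you have already exported the replication mappings of each subscription into simple records. The `Mapping` class, the state names, and the attribute names are hypothetical placeholders, not product output or a product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical summary of one replication mapping."""
    data_set: str       # VSAM data set name
    state: str          # mapping state (placeholder values)
    attributes: tuple   # data set attributes as (name, value) pairs


def subscription_differences(site_a, site_b):
    """Return a list of human-readable differences between two subscriptions."""
    problems = []
    if len(site_a) != len(site_b):
        problems.append(
            f"mapping count differs: {len(site_a)} vs {len(site_b)}")
    a_by_name = {m.data_set: m for m in site_a}
    b_by_name = {m.data_set: m for m in site_b}
    for name in sorted(a_by_name.keys() | b_by_name.keys()):
        a, b = a_by_name.get(name), b_by_name.get(name)
        if a is None or b is None:
            problems.append(f"{name}: present at only one site")
            continue
        if a.state != b.state:
            problems.append(f"{name}: state {a.state} vs {b.state}")
        if a.attributes != b.attributes:
            problems.append(f"{name}: attributes differ")
    return problems


# Illustrative data only.
site_a = [
    Mapping("PROD.ACCOUNTS", "ACTIVE", (("KEYLEN", 8), ("RECSIZE", 200))),
    Mapping("PROD.ORDERS", "ACTIVE", (("KEYLEN", 12), ("RECSIZE", 400))),
]
site_b = [
    Mapping("PROD.ACCOUNTS", "ACTIVE", (("KEYLEN", 8), ("RECSIZE", 200))),
    Mapping("PROD.ORDERS", "INACTIVE", (("KEYLEN", 12), ("RECSIZE", 400))),
]

differences = subscription_differences(site_a, site_b)
for line in differences:
    print(line)
```

An empty result means that the two subscriptions match on all three conditions under these assumptions.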

If you set up matching subscriptions that capture and replicate changes in both directions during normal operations, you must protect your environment from recursion, which occurs when you recapture changes that are written by a target server. Without recursion protection, your environment replicates the same changes repeatedly in a continuous loop.

Data Replication for VSAM automatically prevents recursion. Changes that are made by the default CICS apply transaction, CFC1, do not generate replication log records. This feature avoids recursion because the replication log for these data sets does not capture changes that are made by Data Replication for VSAM.

If you modify your configuration to specify CICSTRANOPT=2, Data Replication for VSAM uses CICS writer transaction CFC2 and generates replication log records. Data Replication for VSAM still automatically prevents recursion by checking the replication log records for a bit that indicates that they were written by replication, and by not processing those log records when they are read by a source server.
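
The flag-based filtering that applies when CICSTRANOPT=2 can be pictured with a minimal sketch. The record layout, the `WRITTEN_BY_REPLICATION` bit value, and the function name are assumptions made for illustration; the actual replication log record format is not described here.

```python
from dataclasses import dataclass

# Hypothetical flag bit; the real position of the bit is not documented here.
WRITTEN_BY_REPLICATION = 0x01


@dataclass
class LogRecord:
    """Hypothetical, simplified replication log record."""
    data_set: str
    key: str
    flags: int = 0


def records_to_replicate(log):
    """Keep only records that were not written by the apply process.

    Skipping flagged records is what breaks the loop: a change applied
    at this site by replication is never captured and sent back.
    """
    return [r for r in log if not (r.flags & WRITTEN_BY_REPLICATION)]


log = [
    LogRecord("PROD.ACCOUNTS", "A"),                                # local change
    LogRecord("PROD.ACCOUNTS", "B", flags=WRITTEN_BY_REPLICATION),  # applied by replication
    LogRecord("PROD.ACCOUNTS", "C"),                                # local change
]

captured = records_to_replicate(log)
print([r.key for r in captured])
```

Only the records for keys A and C are captured in this sketch. The record for key B was written by replication, so the source server does not process it.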