Resynchronization processing

The entries in the Object Tracking List (OTL) are used for resynchronizing replicated objects on the primary and secondary nodes.

The OTL entries can be viewed through the Db2® Mirror GUI, or queried with SQL through the QSYS2.RESYNC_STATUS view. See RESYNC_STATUS view.

In rare failure cases, an OTL entry that was added for normal replication may exist. Before resynchronization starts, the OTLs on the source and target nodes are compared to make sure that no conflicting operation exists for an object. If a conflict is found, the user must determine which entry to delete or defer, using either the Db2 Mirror GUI or the QSYS2.CHANGE_RESYNC_ENTRIES procedure, before resynchronization can start. For more information on comparing OTL entries, see COMPARE_RESYNC_STATUS table function.
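The cross-node comparison can be pictured as grouping each node's entries by object and flagging objects that have pending entries on both nodes. The Python sketch below is a simplified, hypothetical model, not how QSYS2.COMPARE_RESYNC_STATUS works internally: the OtlEntry shape, the operation codes, and the rule that any object tracked on both nodes counts as a conflict are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OtlEntry:
    # Hypothetical, simplified shape of an OTL entry (illustration only).
    obj: str        # qualified object name, e.g. "APPLIB/ORDERS"
    operation: str  # hypothetical codes, e.g. "CREATE", "DELETE", "SAVRST"


def find_conflicts(source, target):
    """Return {object: (source_ops, target_ops)} for objects tracked on both nodes.

    ASSUMPTION: an object with pending entries on both nodes is treated as a
    conflict that the user must resolve (delete or defer) before resync starts.
    """
    src_ops = defaultdict(list)
    tgt_ops = defaultdict(list)
    for e in source:
        src_ops[e.obj].append(e.operation)
    for e in target:
        tgt_ops[e.obj].append(e.operation)
    shared = sorted(src_ops.keys() & tgt_ops.keys())
    return {obj: (src_ops[obj], tgt_ops[obj]) for obj in shared}


src = [OtlEntry("APPLIB/ORDERS", "SAVRST"), OtlEntry("APPLIB/ITEMS", "CREATE")]
tgt = [OtlEntry("APPLIB/ORDERS", "DELETE")]
print(find_conflicts(src, tgt))
# {'APPLIB/ORDERS': (['SAVRST'], ['DELETE'])}
```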

To use the Db2 Mirror GUI to view conflicts that will prevent resynchronization from starting, select Resolve Tracking List Conflicts from the Object Tracking List set of actions.

Figure 1. Conflict resolution option for the OTL

To help determine what differs between the object on the two nodes, you can compare the object by using the Db2 Mirror GUI.

Figure 2. OTL conflict compare and resolution using the Db2 Mirror GUI

Once QSYS2.COMPARE_RESYNC_STATUS completes successfully, resynchronization begins. Resynchronization has three main steps:

  1. Resync preparation
  2. Resync phase one processing
  3. Resync phase two processing

Resync preparation

The purpose of resync preparation is three-fold:
  1. Determine the correct order in which to process OTL entries
  2. Optimize the processing of OTL entries
  3. Add additional entries necessary for resynchronization

Order of processing OTL entries

Objects may depend on other objects. For example, objects in a library depend on that library, objects owned by a user profile depend on that user profile, and logical files depend on physical files. In some cases, many entries may be added to the OTL for normal replication. For example, if a library is added to the replication criteria list (RCL), an entry is added for each object that will be replicated. These entries must be processed in the correct order so that depended-on objects are replicated before the objects that depend on them. For delete operations, the opposite is true: dependent objects must be processed before the objects they depend on.
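Processing depended-on objects first is a topological sort of the dependency graph, and reversing that order gives a valid order for deletes. The sketch below assumes Python 3.9 or later for the standard graphlib module; the object names and the dependency map are hypothetical examples, not data read from the OTL.

```python
from graphlib import TopologicalSorter


def create_order(depends_on):
    """Order objects so that every depended-on object precedes its dependents.

    depends_on maps each object to the objects it depends on (hypothetical
    input; for example, a logical file depends on its physical file and library).
    """
    return list(TopologicalSorter(depends_on).static_order())


def delete_order(depends_on):
    """For deletes the order is reversed: dependents are processed first."""
    return list(reversed(create_order(depends_on)))


deps = {
    "APPLIB": [],                                     # library
    "APPUSER": [],                                    # owning user profile
    "APPLIB/ORDERS": ["APPLIB", "APPUSER"],           # physical file
    "APPLIB/ORDERSLF": ["APPLIB/ORDERS", "APPLIB"],   # logical file over ORDERS
}
print(create_order(deps))
print(delete_order(deps))
```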

Complex database networks add another level of complexity. For example, a view or logical file may depend on more than one table or physical file, or referential constraints may require additional processing after DDL operations on the entire referential network are completed. To handle these situations, database files are grouped into networks. During resync preparation, a network number is set in the OTL to indicate which objects are related.
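Grouping files into networks amounts to finding the connected components of the graph of relationships between files. The sketch below uses a union-find structure; the file names, the relationship pairs, and the choice to number networks from 1 are illustrative assumptions, not the actual network numbers that Db2 Mirror writes into the OTL.

```python
def assign_networks(files, relationships):
    """Group database files into networks (connected components).

    files: names of the database files being resynchronized.
    relationships: (a, b) pairs meaning a and b are related, for example a view
        over a table or a referential constraint between two tables. Both names
        must appear in files.
    Returns {file: network_number}, numbering networks from 1 (an assumption).
    """
    parent = {f: f for f in files}

    def find(f):
        # Follow parent links to the root, halving the path along the way.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in relationships:
        parent[find(a)] = find(b)  # merge the two networks

    numbers, result = {}, {}
    for f in sorted(parent):
        root = find(f)
        numbers.setdefault(root, len(numbers) + 1)
        result[f] = numbers[root]
    return result


files = ["ORDERS", "CUSTOMERS", "ORDERVIEW", "AUDITLOG"]
rels = [("ORDERVIEW", "ORDERS"), ("ORDERVIEW", "CUSTOMERS")]
print(assign_networks(files, rels))
# {'AUDITLOG': 1, 'CUSTOMERS': 2, 'ORDERS': 2, 'ORDERVIEW': 2}
```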

In some cases, one tracked entry on the OTL may depend on another entry for the same object. For example, a column may be added to a table, and then a constraint that depends on that column may be added to the same table. In those cases, the tracking timestamp in the OTL is used to guarantee that the column is added before the constraint.
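Ordering a single object's entries by tracking timestamp can be sketched as a sort on (object, timestamp). The Entry type, the operation strings, and the timestamps below are hypothetical illustrations of the column-then-constraint example.

```python
from datetime import datetime
from itertools import groupby
from operator import attrgetter
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical, simplified OTL entry.
    obj: str
    operation: str
    tracked_at: datetime  # stands in for the OTL tracking timestamp


def per_object_sequence(entries):
    """Return {object: [operations in tracking timestamp order]}."""
    ordered = sorted(entries, key=attrgetter("obj", "tracked_at"))
    return {
        obj: [e.operation for e in group]
        for obj, group in groupby(ordered, key=attrgetter("obj"))
    }


entries = [
    Entry("APPLIB/ORDERS", "ADD CONSTRAINT", datetime(2024, 5, 1, 10, 0, 5)),
    Entry("APPLIB/ORDERS", "ADD COLUMN", datetime(2024, 5, 1, 10, 0, 1)),
]
print(per_object_sequence(entries))
# {'APPLIB/ORDERS': ['ADD COLUMN', 'ADD CONSTRAINT']}
```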

OTL entries can be resynchronized in parallel. However, when entries depend on each other, one OTL entry may need to complete before another OTL entry can be processed.
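One way to run independent entries in parallel while still honoring dependencies is to dispatch each entry as soon as everything it depends on has finished. The sketch below does this with graphlib.TopologicalSorter (Python 3.9+) and a thread pool; the process callback and the dependency map are hypothetical stand-ins for resynchronizing individual OTL entries.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_parallel(depends_on, process, max_workers=4):
    """Run process(entry) for every entry, in parallel where possible.

    depends_on maps each entry to the entries that must finish first.
    An entry is submitted only when all of its prerequisites are done.
    Returns the entries in the order they completed.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    completed = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for entry in sorter.get_ready():
                running[pool.submit(process, entry)] = entry
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                entry = running.pop(fut)
                fut.result()  # re-raise any error from process()
                sorter.done(entry)
                completed.append(entry)
    return completed


deps = {"LIB": [], "PF1": ["LIB"], "PF2": ["LIB"], "LF1": ["PF1", "PF2"]}
print(run_parallel(deps, lambda entry: None))
```

PF1 and PF2 may finish in either order because they run concurrently, but LIB always finishes first and LF1 always last.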

The user can specify a resynchronization priority for objects. For more information, see SET_RESYNC_PRIORITIES procedure. The priority is considered when determining the order in which entries are processed. However, the priority is not absolute, because resynchronization must account for dependencies and database network processing to function correctly. The highest priority of any database file in a database network is applied to all resync entries in that network.
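The rule that a network takes the highest priority of any of its files can be sketched as follows. The priority scale (here, a larger number means a higher priority), the default of 0 for files without an explicit priority, and the input dictionaries are assumptions for illustration; SET_RESYNC_PRIORITIES defines the real values.

```python
def effective_priorities(file_priority, network_of):
    """Apply the highest priority of any file in a network to every file in it.

    file_priority: {file: priority} as set by the user.
    network_of: {file: network_number} from resync preparation.
    ASSUMPTION: a larger number means a higher priority in this sketch.
    """
    best = {}
    for f, net in network_of.items():
        p = file_priority.get(f, 0)  # assumed default for files with no priority
        best[net] = max(best.get(net, p), p)
    return {f: best[net] for f, net in network_of.items()}


network_of = {"ORDERS": 1, "ORDERVIEW": 1, "AUDITLOG": 2}
print(effective_priorities({"ORDERVIEW": 50, "AUDITLOG": 10}, network_of))
# {'ORDERS': 50, 'ORDERVIEW': 50, 'AUDITLOG': 10}
```

Dependencies still take precedence over these priorities when the actual processing order is chosen.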

Optimizing the processing of OTL entries

Resynchronization can be a long-running operation, depending on the number of entries that need to be processed in the OTL. The preparation phase optimizes processing in two ways:
  1. Save/restore entries are common in the OTL. Although each entry could be processed separately, resynchronization performance improves when multiple objects are bundled into one save/restore operation, so the preparation step groups save/restore operations together.
  2. Some entries on the OTL may be unnecessary. In these cases, the RESYNC_DEFERRED column in the OTL entry is updated with a value of RESYNC DEFERRED. For example:
    • When an object is deleted and a delete entry is added to the OTL, any previous entries for the object may not need to be processed, since the object is going to be deleted. If the OTL contains a prior create entry for the same object, neither the create nor the delete may need to be processed. In some cases, this simplification is not possible because other objects may depend on the object.
    • When an OTL save/restore entry for an object is processed during resync, the processing includes any changes made to the object both before and after that entry, because the save/restore operation uses the current object.
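The two optimizations can be pictured with the following sketch. It is a simplified model under stated assumptions: the operation codes, the seq field standing in for tracking timestamp order, the has_dependents set, and the exact deferral rules are hypothetical, and the real preparation step marks entries as RESYNC DEFERRED in the OTL rather than returning a list.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    seq: int        # stands in for tracking timestamp order
    obj: str
    operation: str  # hypothetical codes: "CREATE", "CHANGE", "DELETE", "SAVRST"


def optimize(entries, has_dependents=frozenset()):
    """Return (deferred_seqs, savrst_bundle) under simplified, assumed rules."""
    by_obj = defaultdict(list)
    for e in sorted(entries):  # NamedTuples sort by seq first
        by_obj[e.obj].append(e)

    deferred = set()
    for obj, items in by_obj.items():
        ops = [e.operation for e in items]
        if ops[-1] == "DELETE":
            if obj in has_dependents:
                continue  # simplification unsafe: process every entry
            # The object ends up deleted, so earlier entries are unnecessary;
            # if it was also created in this window, so is the delete.
            keep = set() if ops[0] == "CREATE" else {items[-1].seq}
            deferred |= {e.seq for e in items if e.seq not in keep}
        elif "SAVRST" in ops:
            # Save/restore copies the current object, so it covers the
            # object's other entries; keep only the last save/restore.
            last = max(e.seq for e in items if e.operation == "SAVRST")
            deferred |= {e.seq for e in items if e.seq != last}

    # Remaining save/restore entries are bundled into one operation.
    bundle = sorted({e.obj for e in entries
                     if e.operation == "SAVRST" and e.seq not in deferred})
    return sorted(deferred), bundle


otl = [
    Entry(1, "APPLIB/TMP", "CREATE"),
    Entry(2, "APPLIB/ORDERS", "CHANGE"),
    Entry(3, "APPLIB/ORDERS", "SAVRST"),
    Entry(4, "APPLIB/TMP", "DELETE"),
    Entry(5, "APPLIB/ITEMS", "SAVRST"),
    Entry(6, "APPLIB/ORDERS", "CHANGE"),
]
print(optimize(otl))
# ([1, 2, 4, 6], ['APPLIB/ITEMS', 'APPLIB/ORDERS'])
```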

Adding additional entries necessary for resynchronization

Database tables and physical files require special processing to resynchronize them.
  • If a save/restore OTL entry for a database physical file will be processed, applications on the primary node must be able to continue making I/O changes. To avoid disrupting these applications, the save/restore is performed using save-while-active. Because I/O changes can continue while the save/restore is running, an OTL IO entry is added to process any rows that change after the save/restore starts.
  • Referential constraints on the target node are disabled before database I/O operations are resynchronized, to prevent possible constraint violations. To re-enable the constraints afterward, a CHGPFCST entry is added to the OTL; it is processed when I/O resynchronization for the entire database network is complete.
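The two kinds of added entries can be modeled as follows. This is a hypothetical sketch: the Entry shape, the operation codes, and the ranking that places IO after other entries and CHGPFCST last within a network are assumptions that mirror the description above. CHGPFCST is the IBM i Change Physical File Constraint command; the sketch assumes the added entry re-enables the constraints that were disabled on the target node.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    obj: str
    operation: str  # hypothetical codes: "SAVRST", "CREATE", "IO", "CHGPFCST"
    network: int    # database network number set during preparation


# Within a network, I/O runs after other entries and CHGPFCST runs last.
_RANK = {"IO": 1, "CHGPFCST": 2}


def add_resync_entries(entries, physical_files, constrained_files):
    """Return entries plus the extra entries resync needs (simplified model)."""
    entries = list(entries)
    extra = []
    for e in entries:
        if e.operation == "SAVRST" and e.obj in physical_files:
            # Save-while-active lets I/O continue, so an IO entry picks up
            # rows that change after the save/restore starts.
            extra.append(Entry(e.obj, "IO", e.network))
    io_files = {e.obj: e.network for e in entries + extra if e.operation == "IO"}
    for obj, net in sorted(io_files.items()):
        if obj in constrained_files:
            # Constraints are disabled on the target during I/O resync; this
            # entry is assumed to re-enable them after the network's I/O is done.
            extra.append(Entry(obj, "CHGPFCST", net))
    # sorted() is stable, so entries of equal rank keep their original order.
    return sorted(entries + extra, key=lambda e: (e.network, _RANK.get(e.operation, 0)))


otl = [Entry("APPLIB/ORDERS", "SAVRST", 1), Entry("APPLIB/ORDERVIEW", "CREATE", 1)]
for e in add_resync_entries(otl, {"APPLIB/ORDERS"}, {"APPLIB/ORDERS"}):
    print(e.network, e.operation, e.obj)
```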

Resync phase one processing

Most types of OTL entries are processed in phase one. However, the most numerous and longest-running OTL entries are typically database I/O entries, which are processed in phase two. During phase one, the secondary node is in BLOCKED state and stays blocked until all phase one OTL entries have been processed.

Two situations can prevent resync from leaving phase one:
  • Many database and SQL DDL operations run under commitment control, so the related OTL entries are also added under commitment control and remain locked until the end of the transaction. The prepare phase of resync waits for a commit or rollback by acquiring a lock on those OTL rows. If the lock cannot be acquired, only phase one entries that precede the locked row in tracking timestamp order are processed. The prepare phase continues to be invoked, and phase one entries resynchronized, until no lock time-out occurs on any OTL row.
  • To process database I/O entries for a table or physical file in phase two, database DDL operations on those files must be prevented on the primary node. A *SHRNUP lock is acquired on the file object (not the file data); this lock prevents DDL operations but does not prevent I/O operations. Only a few relatively short-running database DDL operations do not run under commitment control, so failure to acquire the *SHRNUP lock is unlikely. However, if the lock cannot be acquired, the prepare phase continues to be invoked, and phase one entries resynchronized, until no lock time-out occurs. If the *SHRNUP lock still cannot be acquired after ten attempts, the resync fails.
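The retry behavior in these two situations can be modeled as a loop. In the sketch below, try_lock_otl_row, try_shrnup_locks, and process are hypothetical hooks, and each lock probe is assumed to wait up to a lock time-out before returning False; the real prepare phase also re-examines the OTL on each pass, which this model omits.

```python
class ResyncFailed(Exception):
    pass


def run_phase_one(entries, try_lock_otl_row, try_shrnup_locks, process,
                  max_attempts=10):
    """Model of the phase one retry loop (hypothetical hooks).

    entries: phase one OTL entries in tracking timestamp order.
    try_lock_otl_row(entry) -> bool: False while an uncommitted transaction
        still holds the row.
    try_shrnup_locks() -> bool: True once *SHRNUP locks are held on every
        file whose I/O will be resynchronized in phase two.
    process(entry): resynchronizes one phase one entry.
    Returns the number of entries processed.
    """
    next_idx = 0
    shrnup_attempts = 0
    while True:
        # Process entries only up to the first row that is still locked.
        while next_idx < len(entries) and try_lock_otl_row(entries[next_idx]):
            process(entries[next_idx])
            next_idx += 1
        if next_idx < len(entries):
            continue  # a row is still locked: invoke the prepare phase again
        if try_shrnup_locks():
            return next_idx
        shrnup_attempts += 1
        if shrnup_attempts >= max_attempts:
            raise ResyncFailed(
                f"*SHRNUP lock not acquired after {max_attempts} attempts")
```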

Because applications or users can perform operations that add new phase one OTL entries, a lock is acquired after the first set of phase one entries has been processed. This lock delays applications from adding new OTL entries until the remaining phase one OTL entries have been processed. Most phase one operations run very quickly, but because the lock makes applications wait, the best practice is to minimize long-running phase one operations until phase two begins.
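The two-pass approach can be sketched with two locks: one that protects the list of entries, and one that resync holds during the final pass so that applications trying to add entries must wait. The class and method names are hypothetical; this models the behavior described above, not the Db2 Mirror locking implementation.

```python
import threading


class PhaseOneQueue:
    """Model of draining phase one entries while new ones keep arriving.

    Applications call add(); resync calls drain(). After the first, unlocked
    pass, resync holds the add gate so that add() waits until the final pass
    has processed everything that is left.
    """

    def __init__(self):
        self._entries = []
        self._guard = threading.Lock()     # protects _entries
        self._add_gate = threading.Lock()  # held by resync during the final pass

    def add(self, entry):
        with self._add_gate:  # blocks while resync holds the gate
            with self._guard:
                self._entries.append(entry)

    def _take_all(self):
        with self._guard:
            batch, self._entries = self._entries, []
        return batch

    def drain(self, process):
        for entry in self._take_all():  # first pass: applications not blocked
            process(entry)
        with self._add_gate:            # final pass: new adds must wait
            for entry in self._take_all():
                process(entry)
```

Keeping the final pass short matters because every application that adds an OTL entry waits on the gate for its full duration.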

When resynchronization fails, the Db2 Mirror GUI indicates there is a problem with OTL resynchronization as shown in the following figure.

Figure 3. Resynchronization failure

Right-click to display the pull-down menu.

Figure 4. View tracking list

Select View Tracking List Details.

Figure 5. Tracking list details

The OTL shows two error entries. Both error messages indicate an object lock failure. Because phase one processing could not acquire locks on the object on both nodes, the error was recorded in the OTL and processing ended before phase two.

To resolve the conflict:
  1. Identify the lock holder and release the locks causing the conflict.
  2. Select Resume Db2 Mirror from the pull-down menu to retry the resynchronization.

Resync phase two processing

After phase one of resynchronization has completed, the following three steps are performed during phase two:
  1. *SHRNUP locks were acquired during phase one on all tables and physical files that need to be resynchronized in phase two. These locks prevent DDL operations from modifying those tables or physical files.

    The locks are released and DDL is allowed again after phase two resync ends.

    Record locks are obtained and released as needed during phase two processing.

  2. An internal attribute is changed on all tables and physical files that need to be resynchronized. This attribute prevents users and applications on the secondary node from performing updates, deletes, or inserts on these tables or physical files.

    The attribute is reset for all related tables and physical files in the same network when resynchronization of the entire network is complete.

    During phase two, updates, deletes, and inserts performed on the primary node are replicated to the secondary node. Because new I/O changes on the primary node are replicated to the secondary node as they occur, once phase two starts the number of row changes that need to be resynchronized no longer increases. It may decrease if an application updates or deletes rows that need to be resynchronized before they are resynchronized.

  3. The Db2 Mirror state will become ACTIVE.
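Why the pending row count cannot grow once phase two starts can be illustrated with a small model. The class below is hypothetical: it tracks pending row keys for one table and assumes that replicating an application's update or delete of a pending row makes a separate resync of that row unnecessary.

```python
class PhaseTwoTable:
    """Model of row resynchronization for one table during phase two.

    pending holds keys of rows changed while the nodes were not in sync.
    Once phase two starts, new changes are replicated directly, so pending
    can only shrink.
    """

    def __init__(self, pending_keys):
        self.pending = set(pending_keys)
        self.replicated = []  # operations sent to the secondary node

    def app_write(self, op, key):
        # Live I/O on the primary is replicated immediately; if the row was
        # pending, it no longer needs separate resynchronization (assumption).
        self.replicated.append((op, key))
        self.pending.discard(key)

    def resync_next(self):
        # Background resync copies one pending row to the secondary node.
        # Raises ValueError if nothing is pending.
        key = min(self.pending)
        self.pending.remove(key)
        self.replicated.append(("RESYNC", key))
        return key


t = PhaseTwoTable([1, 2, 3])
t.app_write("INSERT", 10)  # new row: replicated directly, pending unchanged
t.app_write("UPDATE", 2)   # pending row updated by an application
print(sorted(t.pending))
# [1, 3]
```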

Database I/O is always resynchronized in phase two. Spooled file entries are handled in phase two as well, unless a replicated output queue is moved, renamed, or deleted.
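The phase assignment rule above can be written as a small function. Treating every other entry type as phase one, and treating spooled file entries whose replicated output queue was moved, renamed, or deleted as phase one, are simplifying assumptions; the entry type codes are hypothetical.

```python
def resync_phase(entry_type, outq_moved_renamed_or_deleted=False):
    """Return 1 or 2: the resync phase for an OTL entry (simplified model).

    entry_type: hypothetical codes such as "IO", "SPLF", "CREATE", "SAVRST".
    Database I/O is always phase two. Spooled file entries are phase two
    unless their replicated output queue was moved, renamed, or deleted.
    All other entry types are treated as phase one (an assumption).
    """
    if entry_type == "IO":
        return 2
    if entry_type == "SPLF":
        return 1 if outq_moved_renamed_or_deleted else 2
    return 1
```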

If a failure prevents resynchronization phase two from succeeding, the replication state reverts to TRACKING/BLOCKED. For an IASP, both SYSBAS and the IASP revert to TRACKING/BLOCKED. SYSBAS replication can be resumed immediately, and the issues preventing resynchronization in the IASP can then be addressed.