Performing an external refresh

You can perform a refresh of one or more tables in a subscription using a third-party tool, and still integrate this activity into the refresh while active capability of CDC Replication.

About this task

Such third-party tools might be a table unload at source and load at target, an application program that regenerates the content of a table being run against both the source and target tables, recovery of a table's content to a prior point in time at both the source and target, or even a refresh of the table using a subscription other than the one that will be mirroring the data. A refresh or reconstruction of a table by such a means is referred to as an external refresh.

Note: An external refresh can only be performed when the target is a database, not a message queue or an ETL solution. Performing an external refresh using the dmmarkexternalunloadstart and dmmarkexternalunloadend commands is not supported for the CDC Replication Engine for InfoSphere® DataStage®.

In order to integrate the activity of the external refresh into the mirroring activity of the subscription, a new command capability is added to the product. To understand what this command does, how to provide usable data to it, and practices to avoid misuse, it is necessary to understand the mechanism of refresh while active processing.

Understanding refresh while active

When a table within a subscription is set to Method:Mirror and Status:Refresh, it is readied for a refresh while active. When the subscription is started, as viewed externally, the table is refreshed and then mirroring of the tables in the subscription starts. When the subscription starts, it must necessarily start scraping from the log at a place prior to the place when the refresh was performed. For the table that was refreshed at the start of mirroring, changes scraped from the log are discarded until scraping reaches the place where the refresh had started. From that point forward, changes scraped for that table are forwarded to the target for application to the target table. Between the place in the log where the refresh started and the place where the refresh ended, a change scraped from the log could either have been applied to the table or not yet have been applied to the table by the time that portion of the table was refreshed. Thus the change may or may not have been forwarded to the target and been applied to the target table during the refresh. For this reason, changes scraped from the log between the places where the refresh ran are applied at the target with an error mitigation filter that suppresses operational errors (INSERT of a preexisting row, UPDATE or DELETE of a nonexisting row). These changes are said to be sent with the "indefinitely refreshed" indicator set. Once scraping proceeds past the place in the log where the refresh ended, the indefinitely refreshed indicator is no longer set, and operational errors are treated as harshly as any other.

It is possible that changes that have been logged before the place in the log when the refresh started were not yet committed when the refresh started. Such changes would appear in the table after the COMMIT was issued, but would be discarded when they were scraped from the log because they were logged prior to the refresh starting place in the log. The capture of the table's content during the refresh could miss these changes, depending on how much time passes before the COMMIT is issued. This would cause the source and target tables to be unsynchronized. To avoid this, CDC Replication uses methods based on the DBMS in use to establish a point of consistency for the table at the time that the refresh for a refresh while active is starting. For example, on the z/OS platform, before the refresh starts, a shared lock with a table scope is obtained on the table being refreshed. The current log write position is determined, then the lock is released. This forces any units of recovery that contain changes for the table being locked to complete (that is, a COMMIT). When the shared lock is released the refresh starts, and the log write position sampled while the lock was held will be used as the place for the start of the refresh. The shared lock is held for milliseconds at most, and should not be disruptive to normal application processing for the table.

Considerations for an external refresh

When an external refresh occurs, it involves the capturing of the source table's content and the writing of that content to the target table. The period when the source table's content is being captured is equivalent to the period during a refresh while active when the source table is being read. Depending on how the source and target tables are being re-synchronized outside of the subscription, this period could be significantly long (for example, using unload and load) or effectively instantaneous (for example, point-in-time recovery at both source and target).

For the CDC Replication Engine for Db2® for z/OS®, this period needs to be converted to a starting place and an ending place in the log, expressed in terms of log positions in the DBMS's log. The two (possibly identical) log positions are then provided to CDC Replication using the command interface, and CDC Replication updates the metadata with them as though they had originated from the execution of the refresh of a refresh while active. When the subscription starts to mirror, these two log positions describe the scrape point when discarding of the table's data stops and marking changes as 'indefinitely refreshed starts, and when marking changes as indefinitely refreshed stops.

For all other supported engines, the refresh while active period is determined by the current position of the log when the dmmarkexternalunloadstart and dmmarkexternalunloadend commands are issued.

In order for this update to the metadata to be effective and productive, the following have to be true:
  • Unless the table has been flagged for an external load/refresh while mirror, the subscription must not be active.
  • The first (earlier) of the two log positions must not be earlier than the log position where the subscription will start scraping. If this rule is not followed, the results are not predictable.
  • Neither of the two log positions should be for a place that has not yet been written to the log. If this rule is not followed, the results are not predictable.
  • In case of rule based mapping, the procedure only works for a subscription that has been started at least once after creation.

In addition to these considerations, there is also the issue of establishing a point of consistency at the start of capturing the source table for the external refresh. When performing a refresh while active, a point of consistency for the table being refreshed is established by acquiring a shared lock with table scope. This method is unavailable when doing an external refresh, so other means of establishing a point of consistency should be employed. Sometimes, this will not be possible, such as for a point-in-time recovery at both the source and target. If it is possible, such as when performing an unload and reload, then it is strongly recommended that a point of consistency be established. This can be done by quiescing activity against the table, or simply by stopping all updating application programs. Sometimes it may not be necessary, such as when a program is used to regenerate the entire table content at both the source and target (assuming such a program would, by its action, implicitly lock the entire table). Failure to take the steps to establish a point of consistency could produce operational errors later, under the scenario described above.

Special considerations for external table load while mirroring

An external table load (refresh) while mirroring means the table load is allowed to occur while mirroring is running. During a table refresh, any mirrored operations detected are temporarily stored in a "spill queue" on the target system. Once the user signals that the table load has completed, by invoking the dmmarkexternalunloadend function, these queued operations are applied to the target table. After all spill queue operations are processed, the table becomes active. There are two ways to enable table load/refresh during mirroring:
  • Set the CDC system property table_load_while_mirror_type to EXTERNAL. Subsequently, flag the table for refresh using standard methods such as MC, CHCCLP, dmflagforrefresh, or the Java API.
  • Use the -em option with the dmflagforrefresh command.
RESTRICTIONS:
  • To perform an external table load during mirroring, the subscription must be active and mirroring in real time. If the table is flagged for external load/refresh while mirroring, and the subscription is not running or actively mirroring, the functions dmmarkexternalunloadstart and dmmarkexternalunloadend fails and return an exception.
  • Tables cannot undergo both internal and external refresh processes simultaneously. To begin an external table load during mirroring, all internal refreshes must first be completed, and mirroring must be actively running. After initiating a subscription in mirroring mode, users should wait for warning event 9765 W_TABLE_NEEDS_TO_BE_EXTERNALLY_LOADED to be triggered for the table. Only then should the dmmarkexternalunloadstart function be called for that table.
  • The subscription must remain active and continuously mirroring from the moment dmmarkexternalunloadstart is invoked until the table becomes active. A table becomes active only after dmmarkexternalunloadend is called and all mirrored operations captured during the refresh have been successfully applied. If the subscription stops at any point before the table becomes active, the entire external table load process must be repeated.
  • Currently, this feature is not supported for BigQuery, Databricks, or Snowflake target databases. Target tables with referential integrity constraints are not supported.

To perform an external refresh

Procedure

  1. Stop the subscription, if it is currently running.
  2. Run the dmmarkexternalunloadstart command.
  3. Use an external tool to unload the table data.

    You should determine if there are any limitations on transaction isolation levels by your external tool.

  4. Wait for the unload to complete.
  5. Run the dmmarkexternalunloadend command.
  6. Use your external tool to load the table data on the target.
  7. Start the subscription.

    CDC Replication will reconcile the differences corresponding to the changes made to the source table during the synchronization phase. CDC Replication runs in the manner of Adaptive Apply during the range marked by the start and end commands.