Sample: Refreshing subsets of rows

There are several uses for refreshing subsets of rows:

  • Re-synchronizing tables which are out-of-sync
  • Refreshing very large tables in stages
    • Accommodating smaller batch windows
    • Less interruption for other tables being replicated
  • Synchronization check for subset of rows

For the following examples, we have a table mapping containing columns to indicate the status of customers. On the source table, this column is called STATUS. On the target table, the column is called ACTIVITY_STATUS. The table mapping has a Data Translation of the following values:

Table 1. Values for the sample table mapping
Source table (Before image) Target table (After image)
Column name: STATUS Row Count Column name: ACTIVITY_STATUS Row Count
A 255 ACTIVE 255
D 49 DELETED 49
I 16 INACTIVE 16
P 64 PENDING 32

The difference between the two tables is that 32 PENDING rows are missing from the target table

Example 1: Refreshing a subset of rows

To synchronize the tables, you could perform a standard refresh, which would replace all rows in the target table. Instead, refreshing a subset of rows will allow you to refresh just the PENDING rows.

Steps:

  1. Select the table, right-click and choose Flag for refresh ....
  2. Select Standard Refresh.
  3. Select Refresh only a subset of rows.
  4. In the Source WHERE clause list, type the following text: STATUS='P'
  5. In the Target WHERE clause list, type the following text: ACTIVITY_STATUS='PENDING'
  6. Click OK

When the refresh starts, CDC Replication will first delete the rows on the target using the target WHERE clause. All rows with ACTIVITY_STATUS='PENDING' will be removed.

Subsequently, rows on the source will be selected using the source WHERE clause. All rows with STATUS='P' will be transferred to the target table.

Example 2: Partial synchronization checks

A Differential Refresh compares each row in the target table with each row in the source table to identify missing, changed or additional rows. For large tables, a full synchronization check could be time consuming. Using the Refresh only a subset of rows option will allow you to do partial synchronization checks.

In this example, all rows with STATUS='D' will be checked.

Steps:

  1. Select the table, right-click and choose Flag for refresh ....
  2. Select Differential Refresh and choose the Only Log Differences (Refresh Not Performed) mode.
  3. Select Refresh only a subset of rows.
  4. In the Source WHERE clause list, type the following text: STATUS='D'
  5. In the Target WHERE clause list, type the following text: ACTIVITY_STATUS='DELETED'
  6. Click OK

Subsequently, when starting the subscription, all rows with STATUS=‘D' will be sent to the target for synchronization checking.

No rows will be updated on the target, due to Only Log Differences (Refresh Not Performed) being selected.

If the subscription is started in Continuous mirroring mode, the mirroring will resume after the differences are logged.

Considerations

The following issues should be considered when you are using Refresh only a subset of rows:

  • Refresh of a subset of rows can only be configured when flagging individual tables
  • Refreshing a subset of rows will always insert the rows on the target with JDBC apply.
    • Fast Load utility cannot be used when refreshing a subset of rows
    • If the subset of rows to be refreshed is a large portion of the table, you may consider refreshing all rows instead to save time
  • When flagging the table for refresh next time, the Refresh only a subset of rows checkbox is unchecked by default. The WHERE clauses are maintained though, but not used when refreshing the table.
  • If both row filtering and Refresh only a subset of rows have been specified, row filtering expression and the source WHERE clause will be concatenated (AND operator).
  • If both deletion of selected rows and Refresh only a subset of rows have been specified, the expressions will be concatenated (AND operator)
  • As shown in the examples, the source and target WHERE clauses must match, with any differences solely the result of transformations being done in the mapping.

    Only rows that matches the target WHERE clause will be affected by the refresh. If your where clause includes a column that isn't part of the table's primary key then you need to ensure before running the refresh that the values for that column are synchronized between the source and target. In the examples, if for some rows in the target table the ACTIVITY_STATUS value is incorrect then you cannot perform a subset refresh that includes ACTIVITY_STATUS in the WHERE clause.