DB2 10.5 for Linux, UNIX, and Windows

Latency and throughput of shadow tables

Latency is the length of time between the system applying an update to a source table and then applying that same update to the shadow table. It indicates how up to date the shadow table is. Throughput is the quantity of data that is processed within a certain period.

You can tune your environment to optimize latency and throughput of your shadow tables in a number of ways:

Ensure that you have a suitable latency setting

Choose a value for the CURRENT REFRESH AGE special register based on your business requirements. This special register sets the maximum replication latency that a shadow table can have and still be considered for queries. In effect, it sets the shelf life of the data in the shadow table. A low value means that queries routed to the shadow table see data with less replication lag, but a value that is too low can limit the use of the shadow table, because the shadow table is not considered whenever its latency exceeds the setting. A value that is too high, on the other hand, can lead to queries that use outdated shadow table data. If the value is set to ANY, the replication latency is ignored.
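As a rough sketch of how a latency requirement maps to a register value: CURRENT REFRESH AGE takes a timestamp duration, in which the digits to the left of the decimal point are read as yyyymmddhhmmss, so a value of 500 means 5 minutes and 0 seconds. The helper name refresh_age_value is hypothetical, and the db2 invocation appears only as a comment because it assumes an existing database connection.

```shell
#!/bin/sh
# Hypothetical helper: turn hours, minutes, and seconds into the hhmmss
# digits of a timestamp duration, as used by CURRENT REFRESH AGE.
refresh_age_value() {
    hours=$1; minutes=$2; seconds=$3
    # Reject fields that would spill into the next duration position.
    if [ "$minutes" -ge 60 ] || [ "$seconds" -ge 60 ]; then
        echo "minutes and seconds must be below 60" >&2
        return 1
    fi
    echo $(( hours * 10000 + minutes * 100 + seconds ))
}

# Build the statement for a 5-minute shelf life.
stmt="SET CURRENT REFRESH AGE $(refresh_age_value 0 5 0)"
echo "$stmt"
# With a database connection in place, this could be issued as:
#   db2 "$stmt"
```

The sketch only prints the statement. Whether 5 minutes is an appropriate shelf life depends on your business requirements.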

Ensure that you have sufficient I/O subsystem bandwidth

The I/O subsystem bandwidth needs to be able to handle the extra I/O for shadow tables. Consider putting shadow tables in a separate I/O subsystem or increasing the disk capacity of the existing I/O subsystem.

Enable the InfoSphere® CDC fast apply feature

Fast apply is a product feature that provides opportunities to increase throughput and reduce latency when the apply process is a performance bottleneck. To improve the performance of your shadow tables, enable fast apply for your subscription in the InfoSphere CDC Management Console. For information about how to enable fast apply, see To enable fast apply for a subscription in Management Console. For steps 5 and 6, use one of these fast apply modes:

Group by table

In this mode, InfoSphere CDC reorders a set of operations by creating lists of operations for each table, and then attempts to apply them to the target system. Reordering of the operations provides an opportunity for InfoSphere CDC to use the JDBC batch feature.

Important: Enable this mode only when you are shadowing multiple tables.

To enable this fast apply mode, enter the following text in the Class Name box of the subscription-level user exit dialog box:

com.datamirror.ts.target.publication.userexit.fastapply.GroupByTable

InfoSphere CDC performs this ordering on a group of transactions from the source system that can be referred to as a "unit of work". You can specify a threshold (the maximum size of the unit of work) by specifying an integer value in the Parameter box of the subscription-level user exit dialog box. The recommended setting for a database with shadow tables is 10000.

Parallelize by table

When you shadow an entire database with many tables, you might find that the "Group by table" fast apply mode is not sufficient. If you do not see any performance improvement or the improvement is insignificant after you enable the "Group by table" mode, try the "Parallelize by table" mode. This mode of fast apply is similar to the "Group by table" mode, but instead of applying the reordered operations on a single database connection, the operations are applied concurrently across multiple database connections.

To enable this fast apply mode, enter the following text in the Class Name box of the subscription-level user exit dialog box:

com.datamirror.ts.target.publication.userexit.fastapply.ParallelizeByTable

You can specify the number of database connections and the unit of work threshold by specifying two integer values (separated by a colon) in the Parameter box of the subscription-level user exit dialog box. The recommended setting for a database with shadow tables is 8:10000.
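To make the two Parameter box formats concrete, the following sketch assembles the values for each fast apply mode. It assumes, based on the recommended value 8:10000, that the number of connections comes before the colon and the unit of work threshold after it. The function name fast_apply_parameter is hypothetical, and the script does not communicate with InfoSphere CDC; it only prints the text that you would type into the dialog box.

```shell
#!/bin/sh
# Hypothetical helper: print the Parameter box value for a fast apply mode.
#   fast_apply_parameter group    <threshold>
#   fast_apply_parameter parallel <connections> <threshold>
fast_apply_parameter() {
    mode=$1
    case "$mode" in
        group)
            # "Group by table" takes only the unit of work threshold.
            echo "$2"
            ;;
        parallel)
            # "Parallelize by table" takes two integers separated by a colon
            # (assumed order: connections, then threshold).
            echo "$2:$3"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

fast_apply_parameter group 10000        # prints 10000
fast_apply_parameter parallel 8 10000   # prints 8:10000
```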

Restriction:

You cannot use multiple subscriptions in environments with shadow tables.

InfoSphere CDC users who do not see adequate performance improvements from the fast apply modes sometimes split their tables into multiple subscriptions as a further step to improve performance. However, in an environment that is enabled for shadow tables, a single DB2 database requires a single InfoSphere CDC instance and a single subscription that replicates all shadow tables, because there can be only one latency table per database and that latency table cannot be shared among multiple subscriptions.

Maximize batching

To take full advantage of batching opportunities, increase the value of the InfoSphere CDC global_max_batch_size system parameter to 1024, as follows:

$ cd <cdc-installation-path>/bin 
$ ./dmset -I <cdc-instance-name> global_max_batch_size=1024

This parameter specifies the maximum number of rows that InfoSphere CDC can place in an array and apply to the target database during refresh or mirroring. For more information, see global_max_batch_size.

Transaction size during apply to shadow tables

Shadow tables benefit from larger transaction sizes when the tables are maintained by InfoSphere CDC. You can use the InfoSphere CDC acceptable_latency_in_seconds_for_column_organized_tables parameter to increase the size of transactions against shadow tables by grouping operations into larger transactions. Because larger transactions delay the commits, set the parameter to a value that is smaller than the value of the CURRENT REFRESH AGE special register.

When you tune this parameter, gradually increase its value from the default setting of 5 as follows:

$ cd <cdc-installation-path>/bin
$ ./dmset -I <cdc-instance-name> acceptable_latency_in_seconds_for_column_organized_tables=10

For more information, see acceptable_latency_in_seconds_for_column_organized_tables.
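The relationship between this parameter and CURRENT REFRESH AGE can be checked before you run dmset. The sketch below assumes that the CURRENT REFRESH AGE value uses only the hhmmss positions of the timestamp duration (for example, 500 for 5 minutes). The function names and the candidate latency values are hypothetical and serve only as illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert an hhmmss-style refresh age (e.g. 500) to seconds.
refresh_age_seconds() {
    ra=$1
    ra_hours=$(( ra / 10000 ))
    ra_minutes=$(( (ra / 100) % 100 ))
    ra_secs=$(( ra % 100 ))
    echo $(( ra_hours * 3600 + ra_minutes * 60 + ra_secs ))
}

# Succeeds when the acceptable latency (in seconds) is smaller than the
# refresh age, which is the constraint described in the text above.
latency_fits() {
    [ "$1" -lt "$(refresh_age_seconds "$2")" ]
}

# Step up from the default of 5 and report which candidate values stay
# below a refresh age of 30 (that is, 30 seconds).
for candidate in 5 10 20 40; do
    if latency_fits "$candidate" 30; then
        echo "ok: acceptable_latency_in_seconds_for_column_organized_tables=$candidate"
    else
        echo "too high for CURRENT REFRESH AGE 30: $candidate"
    fi
done
```

With a refresh age of 30 seconds, the candidates 5, 10, and 20 pass the check and 40 is rejected.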