Latency is the length of time between the system applying
an update to a source table and then applying that same update to
the shadow table. It indicates how up to date the shadow table is.
Throughput is the quantity of data that is processed within a certain
period.
You can tune your environment to optimize latency and throughput
of your shadow tables in a number of ways:
- Ensure that you have a suitable latency setting
- Choose a setting for the CURRENT REFRESH AGE special register
based on your business requirements. This special register
determines the maximum age of the shadow table data after which the
shadow table is no longer considered for queries. The CURRENT REFRESH
AGE register setting effectively sets the shelf life of the data in
the shadow table. A low value for the CURRENT REFRESH AGE special
register means that less replication lag is tolerated, but a value that
is too small can limit the use of a shadow table. On the other hand,
a value that is too high could lead to queries that use shadow tables
with outdated data. If the value is set to ANY, the replication latency
is ignored.
- Ensure that you have sufficient I/O subsystem bandwidth
- The I/O subsystem bandwidth needs to be able to handle the extra
I/O for shadow tables. Consider putting shadow tables in a separate
I/O subsystem or increasing the disk capacity of the existing I/O
subsystem.
- Enable the InfoSphere® CDC fast apply feature
- Fast apply is a product feature that provides opportunities
to increase throughput and reduce latency when the apply process is
a performance bottleneck. To improve the performance of your shadow
tables, enable fast apply for your subscription in the InfoSphere CDC
Management Console. For information about how to enable fast apply, see
"To enable fast apply for a subscription in Management Console".
For steps 5 and 6 of that procedure, use these fast apply modes:
- Group by table
- In this mode, InfoSphere CDC
reorders a set of operations by creating lists of operations for each
table, and then attempts to apply them to the target system. Reordering
of the operations provides an opportunity for InfoSphere CDC to use the JDBC batch feature.
Important: Enable this mode only when you are
shadowing multiple tables.
To enable this fast apply mode, enter the following text in the
Class Name box of the subscription-level user exit dialog box:
com.datamirror.ts.target.publication.userexit.fastapply.GroupByTable
InfoSphere CDC performs this ordering on a group of transactions from
the source system that can be referred to as a "unit of work". You can
specify a threshold (the maximum size of the unit of work) by specifying
an integer value in the Parameter box of the subscription-level
user exit dialog box. The recommended setting for a database with
shadow tables is 10000.
- Parallelize by table
- When you shadow an entire database with many tables, you might
find that the "Group by table" fast apply mode is not sufficient.
If you do not see any performance improvement or the improvement is
insignificant after you enable the "Group by table" mode, try
the "Parallelize by table" mode. This mode of fast apply is similar
to the "Group by table" mode, but instead of applying the reordered
operations on a single database connection, the operations are applied
concurrently across multiple database connections.
To enable this fast apply mode, enter the following text in the
Class Name box of the subscription-level user exit dialog box:
com.datamirror.ts.target.publication.userexit.fastapply.ParallelizeByTable
You can specify the unit of work threshold and the number of database
connections by specifying two integer values (separated by a colon)
in the Parameter box of the subscription-level user exit dialog box.
The recommended setting for a database with shadow tables is 8:10000.
Restriction: You cannot use multiple subscriptions in environments
with shadow tables.
InfoSphere CDC users who do not see adequate
performance improvements from the fast apply modes sometimes split
their tables into multiple subscriptions as a further step to improve
performance. However, in an environment that is enabled for shadow
tables, a single DB2 database requires a single InfoSphere CDC instance
and a single subscription that replicates all shadow tables because
there can be only one latency table per database. That latency table
cannot be shared among multiple subscriptions.
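The reordering that the two fast apply modes describe can be pictured with a short Python sketch. It illustrates the idea only and is not InfoSphere CDC code: the `(table, row)` operation tuples, the `group_by_table` and `apply_in_parallel` functions, and the `apply_batch` callback are all hypothetical names, and the sketch assumes that the unit of work threshold is counted in operations.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_table(operations, threshold):
    """Split operations into units of work of at most `threshold` operations
    (assumption: the threshold counts operations), then build one ordered
    list of operations per table within each unit."""
    units = []
    for start in range(0, len(operations), threshold):
        per_table = defaultdict(list)
        for table, row in operations[start:start + threshold]:
            per_table[table].append(row)  # source order is kept within a table
        units.append(dict(per_table))
    return units


def apply_in_parallel(unit, apply_batch, connections):
    """Apply each table's list concurrently with at most `connections` workers.
    `apply_batch(table, rows)` is a hypothetical stand-in for one batched
    write to the target over its own database connection."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [pool.submit(apply_batch, table, rows)
                   for table, rows in unit.items()]
        return [future.result() for future in futures]


# Interleaved operations from the source, as (table, row) pairs.
ops = [("ORDERS", 1), ("ITEMS", 10), ("ORDERS", 2), ("ITEMS", 11), ("ORDERS", 3)]
units = group_by_table(ops, threshold=4)
```

In the "Group by table" picture, each per-table list in a unit would be applied one after another on a single connection; `apply_in_parallel` corresponds to the "Parallelize by table" picture, where the lists are handed to several connections at once.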
- Maximize batching
- To take full advantage of batching opportunities, increase the
value of the InfoSphere CDC global_max_batch_size system
parameter to 1024, as follows:
$ cd <cdc-installation-path>/bin
$ ./dmset -I <cdc-instance-name> global_max_batch_size=1024
This parameter specifies the maximum number of rows that InfoSphere CDC
can place in an array and apply to the target database during refresh
or mirroring. For more information, see global_max_batch_size.
- Transaction size during apply to shadow tables
- Shadow tables benefit from larger transaction sizes when the
tables are maintained by InfoSphere CDC. You can use the InfoSphere CDC
acceptable_latency_in_seconds_for_column_organized_tables parameter
to increase the size of transactions against shadow tables by grouping
them. Larger transactions delay the commits, so set the parameter to a
value that is smaller than the value of the CURRENT REFRESH AGE special
register.
When you tune this parameter, gradually increase its
value from the default setting of 5 as follows:
$ cd <cdc-installation-path>/bin
$ ./dmset -I <cdc-instance-name> acceptable_latency_in_seconds_for_column_organized_tables=10
For more information, see
acceptable_latency_in_seconds_for_column_organized_tables.
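The rule that the acceptable latency must stay below the CURRENT REFRESH AGE value can be checked with a little arithmetic. The following Python sketch is a hedged illustration, assuming that the register holds a timestamp duration written in the yyyymmddhhmmss form (so that 500 means 5 minutes and 0 seconds); the function names `refresh_age_to_seconds` and `latency_fits` are hypothetical and are not part of DB2 or InfoSphere CDC.

```python
def refresh_age_to_seconds(refresh_age):
    """Convert a timestamp duration written as yyyymmddhhmmss into seconds.

    Assumption: the value is a non-negative whole number such as 500
    (5 minutes, 0 seconds). Durations that include months or years are
    rejected because their length in seconds is not fixed."""
    if refresh_age < 0:
        raise ValueError("refresh age must not be negative")
    digits = f"{int(refresh_age):014d}"
    if len(digits) > 14:
        raise ValueError("refresh age has more than 14 digits")
    years, months = int(digits[0:4]), int(digits[4:6])
    if years or months:
        raise ValueError("durations with months or years are not supported")
    days, hours, minutes, seconds = (int(digits[i:i + 2]) for i in (6, 8, 10, 12))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def latency_fits(acceptable_latency_seconds, refresh_age):
    """Return True when the acceptable latency (in seconds) is smaller than
    the refresh age (a timestamp duration), as the guidance above asks."""
    return acceptable_latency_seconds < refresh_age_to_seconds(refresh_age)
```

For example, with a refresh age of 500 (300 seconds), `latency_fits(10, 500)` holds, whereas a latency setting of 300 seconds would not satisfy the rule.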