Understanding how CDC Replication interacts with your database

When CDC Replication interacts with your database, by reading its logs or applying data to its tables, it creates a dependency on your database.

This dependency manifests itself in several ways:

  • Log management
  • Resource utilization and availability
  • Change management

Log management

Log management requires that you retain each database log that CDC Replication reads until CDC Replication has replicated the data in it. The dmshowlogdependency command, available for most CDC Replication engines, lists the database logs on which CDC Replication still depends. Do not remove a database log until it no longer appears in the list that the command displays.
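The retention rule above amounts to a set difference: a log may be removed only if it is absent from the list of logs that CDC Replication still depends on. The following is a minimal sketch of that check, not part of the product. The function name and the log names are hypothetical, and it assumes you have already collected the log names reported by dmshowlogdependency into a list, since the output format of the command is not described here and can differ between engines.

```python
from typing import Iterable, List, Set


def logs_safe_to_remove(candidates: Iterable[str],
                        still_required: Iterable[str]) -> List[str]:
    """Return the candidate logs that are NOT in the still-required list.

    `still_required` is assumed to hold the log names reported by
    dmshowlogdependency, gathered by whatever means suits your engine.
    """
    # Normalize whitespace so trailing newlines from captured output do not
    # cause a required log to be treated as removable.
    required: Set[str] = {name.strip() for name in still_required if name.strip()}
    return [name for name in candidates if name.strip() not in required]


if __name__ == "__main__":
    # Hypothetical log names, for illustration only.
    on_disk = ["arch_0001.log", "arch_0002.log", "arch_0003.log"]
    reported = ["arch_0002.log", "arch_0003.log"]
    for name in logs_safe_to_remove(on_disk, reported):
        print("no longer required by CDC Replication:", name)
```

Keeping the comparison separate from the deletion step makes it easy to review the result before any log is actually removed.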

If you do not follow this policy, CDC Replication either ends with an error or appears to hang while it waits for the log files to become available to read, depending on the database. If the log files have been deleted and are permanently unavailable, your only option is to refresh the data. CDC Replication cannot skip logs and still maintain data integrity, because it has no way of knowing what data the skipped log files contained.

Similarly, the file system permissions on the log files must allow CDC Replication to read them. If they do not, CDC Replication fails with a message indicating that the specified log could not be opened for reading.
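A quick way to catch permission problems before replication fails is to test readability ahead of time. The following is a minimal sketch using Python's standard os.access call. The function name and the paths are hypothetical, and the check is only meaningful when the script runs as the same operating system user that runs CDC Replication.

```python
import os
from typing import Iterable, List


def unreadable_logs(paths: Iterable[str]) -> List[str]:
    """Return the paths that the current OS user cannot open for reading.

    A path that does not exist is also reported, because os.access
    returns False for it.
    """
    return [path for path in paths if not os.access(path, os.R_OK)]


if __name__ == "__main__":
    # Hypothetical log locations, for illustration only.
    to_check = ["/db/archive/arch_0001.log", "/db/archive/arch_0002.log"]
    for path in unreadable_logs(to_check):
        print("not readable by this user:", path)
```

Note that os.access reports what the permission bits allow for the calling user; it is an early warning, not a guarantee that a later open will succeed.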

Resource utilization and availability

CDC Replication is frequently installed on the same server as the database that it replicates from or to. For this reason, it is important to ensure that the memory allocated to CDC Replication is physically available on the machine. Some databases are configured, in some cases by default, to use all available memory on the machine. Such a configuration will not work for CDC Replication, because it leaves no memory for CDC Replication to run. At a minimum, the amount of memory allocated to InfoSphere® Data Replication must be set aside from the database so that CDC Replication is able to run.
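The sizing rule above is simple arithmetic: the database may use at most the physical memory minus what is allocated to CDC Replication. The following is a minimal sketch of that calculation. The function name, the optional operating system reserve, and the example figures are assumptions for illustration and are not product defaults.

```python
def max_database_memory_mb(physical_mb: int,
                           cdc_allocated_mb: int,
                           os_reserve_mb: int = 0) -> int:
    """Return the most memory (in MB) the database can be given.

    physical_mb      -- physical memory installed on the machine
    cdc_allocated_mb -- memory allocated to CDC Replication
    os_reserve_mb    -- optional headroom for the operating system
                        (an assumption added here, not from the product docs)
    """
    remaining = physical_mb - cdc_allocated_mb - os_reserve_mb
    if remaining <= 0:
        raise ValueError("no physical memory is left for the database")
    return remaining


if __name__ == "__main__":
    # Hypothetical figures: 64 GB machine, 2 GB allocated to CDC Replication.
    print(max_database_memory_mb(65536, 2048))
```

Raising an error when nothing is left over makes an over-committed configuration visible at planning time rather than as an out-of-memory failure later.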

Symptoms of resource starvation include CDC Replication failing in various ways because of out-of-memory conditions, as well as communication failures, very high latency, and timeout errors, among others.

Change management

Sometimes referred to as schema evolution, change management means planning changes to the structure of the database tables that CDC Replication is replicating, and coordinating those changes with the operation of CDC Replication so that they do not disrupt replication.

The database and CDC Replication must share the same understanding of the structure of the tables being replicated. Without that shared understanding, CDC Replication interprets the table data incorrectly and therefore replicates it incorrectly. CDC Replication endeavours to protect users from data loss or corruption resulting from uncoordinated table structure changes, but it is not always able to do so. To minimize the recovery effort after such changes, follow the change management procedures appropriate to your database. Coordinating change management between the database and CDC Replication ensures that replication continues smoothly with minimal effort. Note that change management practices apply to the tables in both the source and target databases.
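One way to picture a loss of shared understanding is to compare two snapshots of a table's column definitions: the structure as it was when replication was configured, and the structure as the database reports it now. The following is a minimal sketch of such a comparison. It is an illustration only, not how CDC Replication itself detects changes; the function name, the (name, type) representation, and the sample columns are all hypothetical.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, declared type)


def structure_differences(known: List[Column],
                          current: List[Column]) -> Dict[str, List[str]]:
    """Report columns added, dropped, or retyped between two snapshots."""
    known_types = dict(known)
    current_types = dict(current)
    return {
        # Present now, but not in the earlier snapshot.
        "added": [name for name, _ in current if name not in known_types],
        # Present in the earlier snapshot, but gone now.
        "dropped": [name for name, _ in known if name not in current_types],
        # Present in both, but with a different declared type.
        "retyped": [name for name, col_type in current
                    if name in known_types and known_types[name] != col_type],
    }


if __name__ == "__main__":
    # Hypothetical table definitions, for illustration only.
    before = [("id", "INTEGER"), ("name", "VARCHAR(50)")]
    after = [("id", "BIGINT"), ("name", "VARCHAR(50)"), ("email", "VARCHAR(100)")]
    print(structure_differences(before, after))
```

Any non-empty result indicates a structure change that should be coordinated with CDC Replication through the change management procedure for your database.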

Because some table structure changes are made inadvertently, tech notes are also available to help you recover from uncoordinated table structure changes.

InfoSphere Data Replication 11.3.3 continues replication without interruption during an index rebuild, table rebuild, or index reorganization operation that is unrelated to a table structure change.