IBM InfoSphere Change Data Capture overview

IBM® InfoSphere® Change Data Capture (InfoSphere CDC) is a replication solution that captures database changes as they happen and delivers the changes to target databases, message queues, or an ETL solution, such as IBM InfoSphere DataStage®.

The unit of replication within InfoSphere CDC is called a subscription. A subscription contains mapping details that specify how data in a source data store is applied to a target data store. A subscription can be scheduled to run at a specific time and stop when the updates to the target data store are complete, or it can run continuously and apply updates to the target data store as the changes occur. You create, configure, run, and monitor subscriptions in the IBM InfoSphere Change Data Capture Management Console.
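
As a rough illustration of what a subscription holds, the following Python sketch models the mapping details and the two run modes described above. It is not the InfoSphere CDC API: the class names, field names, data store names, and table names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class RunMode(Enum):
    SCHEDULED = "scheduled"    # run at a set time, stop once the target is up to date
    CONTINUOUS = "continuous"  # keep running and apply changes as they occur


@dataclass
class TableMapping:
    """Hypothetical mapping of one source table to one target table."""
    source_table: str
    target_table: str
    column_map: dict  # source column name -> target column name


@dataclass
class Subscription:
    """Hypothetical model of a subscription; not a real InfoSphere CDC type."""
    name: str
    source_datastore: str
    target_datastore: str
    mappings: list
    run_mode: RunMode

    def map_row(self, source_table, row):
        """Translate one source row into (target_table, target_row)."""
        for mapping in self.mappings:
            if mapping.source_table == source_table:
                # Columns without a mapping are not replicated.
                target_row = {
                    mapping.column_map[col]: value
                    for col, value in row.items()
                    if col in mapping.column_map
                }
                return mapping.target_table, target_row
        raise KeyError(f"no mapping for source table {source_table!r}")


sub = Subscription(
    name="ORDERS_SUB",
    source_datastore="SRC_DB",
    target_datastore="TGT_DB",
    mappings=[
        TableMapping(
            source_table="SALES.ORDERS",
            target_table="DW.ORDERS",
            column_map={"ORDER_ID": "ORDER_KEY", "AMT": "AMOUNT"},
        )
    ],
    run_mode=RunMode.CONTINUOUS,
)

target_table, target_row = sub.map_row(
    "SALES.ORDERS", {"ORDER_ID": 42, "AMT": 9.5, "INTERNAL_FLAG": 1}
)
```

In this sketch, the unmapped INTERNAL_FLAG column is dropped, and the remaining columns arrive in the target row under their target names.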

When a subscription runs, InfoSphere CDC captures changes on the source database. InfoSphere CDC delivers the change data to the target and stores sync point information in a bookmark table in the target database. InfoSphere CDC uses the bookmark information to monitor the progress of the InfoSphere DataStage job and to determine the restart point after a failure. Because the bookmark information is updated synchronously, no updates are lost even if InfoSphere DataStage cannot process all available updates. The bookmark information can also be used to determine the log retention policy for the source database.
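
The following Python sketch simulates the general idea of bookmark-based restart described above. It is a simplified model under stated assumptions, not how InfoSphere CDC or InfoSphere DataStage is implemented: the change log is an in-memory list, the bookmark is a single integer position, and the names Target, apply_changes, and oldest_log_position_needed are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical target: applied changes plus a bookmark stored with them."""
    rows: list = field(default_factory=list)
    bookmark: int = 0  # position of the last change that was fully applied


def apply_changes(source_log, target, fail_at=None):
    """Apply every change after the bookmark; optionally simulate a failure.

    source_log is a list of (position, change) pairs in ascending position order.
    """
    for position, change in source_log:
        if position <= target.bookmark:
            continue  # already applied before the restart, so skip it
        if fail_at is not None and position == fail_at:
            raise RuntimeError(f"simulated failure at position {position}")
        # Apply the change and advance the bookmark together, so the
        # bookmark never points past data that was not applied.
        target.rows.append(change)
        target.bookmark = position


def oldest_log_position_needed(target):
    """Source log entries before this position are not needed for a restart."""
    return target.bookmark + 1


log = [(1, "insert A"), (2, "update A"), (3, "insert B"), (4, "delete A")]
target = Target()

try:
    apply_changes(log, target, fail_at=3)
except RuntimeError:
    bookmark_after_failure = target.bookmark  # still records position 2

apply_changes(log, target)  # the restart resumes at position 3
```

After the restart, each change has been applied exactly once. The last function reflects the log retention point made above: in this model, source log entries at or before the bookmark are never read again.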

InfoSphere CDC supports the following connection methods for writing change data to InfoSphere DataStage:
Direct connect
Uses TCP/IP as the transport protocol to stream data from InfoSphere CDC to InfoSphere DataStage. When you use this method, you can configure options that cause the job to start automatically when you run the subscription.
Flat file
Uses a file system to deliver source changes to InfoSphere DataStage.

This documentation describes how to use the direct connect method. For more information about using the flat file method, see the documentation for InfoSphere CDC. The documentation for InfoSphere CDC also contains additional details about using the direct connect method.