This page replaces the IBM Data Replication Community Wiki's IIDR CDC Big Data page.
There are several ways to use CDC to replicate data in near-real time from databases to Big Data targets. The following diagram illustrates some options:
Option 1: a) CDC can replicate directly to HDFS, where the data can be consumed by IBM BigInsights or other Hadoop distributions.
b) Use the WebHDFS support, which formats the data stream in a form that Hive can consume.
Option 2: CDC can replicate (usually via flat files) to DataStage, which can then apply the data to Hadoop.
Option 3: A custom CDC user exit, available on developerWorks, writes data directly to IBM Streams. See the developerWorks article listed in the table below.
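WebHDFS (option 1b) is the standard Apache Hadoop REST interface to HDFS. The sketch below is not IIDR's implementation. It uses the generic WebHDFS URL scheme and Hive text-table conventions to illustrate what "writing to HDFS over WebHDFS in a Hive-consumable form" involves. The host, port, path, `user.name` value, and row layout are hypothetical. The actual file layout that CDC writes (metadata columns, operation codes, delimiters) is configured in CDC and is not described on this page.

```python
from urllib.parse import quote, urlencode

# Hive's default field delimiter for TEXTFILE tables (Ctrl-A) and its default NULL marker.
HIVE_FIELD_SEP = "\x01"
HIVE_NULL = "\\N"


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&...

    In the WebHDFS API, CREATE is sent as an HTTP PUT and APPEND as an HTTP POST.
    The NameNode answers both with a 307 redirect to the DataNode that receives the data.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op.upper(), **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def to_hive_line(values: list) -> str:
    """Render one row as a delimited text line readable by a Hive TEXTFILE table.

    Assumes no value contains the delimiter or a newline (no escaping is done).
    """
    fields = (HIVE_NULL if v is None else str(v) for v in values)
    return HIVE_FIELD_SEP.join(fields) + "\n"


if __name__ == "__main__":
    # Hypothetical host, port, path, user, and row layout, for illustration only.
    print(webhdfs_url("namenode.example.com", 50070, "/cdc/orders/orders.txt",
                      "create", overwrite="true", **{"user.name": "cdcuser"}))
    # Hypothetical change row: operation code, key, a NULL column, and a value.
    print(repr(to_hive_line(["I", 42, None, "widget"])))
```

Writing plain delimited text keeps the files readable by a Hive external table defined over the target HDFS directory. A production writer would also need to escape embedded delimiters and newlines.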
IIDR's CDC also integrates directly with PureData System for Analytics (Netezza) through a native apply. Alternatively, CDC can replicate to DataStage via flat files, and DataStage can apply the data to Netezza using its high-speed Netezza adapter, which allows additional transformations along the way.
InfoSphere Data Replication's Change Data Capture (CDC) Big Data Reference Information
Title | Link |
---|---|
New WebHDFS support available in IIDR 11.3.3.1 | link |
Document and sample user exit to target IBM Streams from IIDR's CDC | link |
Introduction to the native CDC apply for Netezza | |
Link to Additional Table with information on CDC for DataStage Integration | |
Comparing Apache Sqoop and IBM InfoSphere Data Replication (IIDR) : Moving incremental data from relational database management system (RDBMS) into the hadoop distributed file system (HDFS) | |
Presentation describing how to configure IIDR CDC WebHDFS apply to Analytics for Apache Hadoop (BigInsights V4.0) on Bluemix | |
Document Information
Modified date: 13 November 2019
UID: ibm11105143