Data integration tutorial: Replicate data
Take this tutorial to set up data replication between a source and target data store.
Your goal is to use Data Replication to integrate the credit score information from the provider's Db2 on Cloud data source. To do so, you set up a continuous, near-real-time replication feed with efficient data capture from the source database into Golden Bank's Event Streams instance. Event Streams is a high-throughput message bus built with Apache Kafka. It is optimized for event ingestion into IBM Cloud and event stream distribution between your services and applications. For more information about Event Streams, see the Learn more section.
The story for the tutorial is that Golden Bank must comply with a new regulation that prohibits lending to underqualified loan applicants. As a data engineer at Golden Bank, you need to provide access to the most up-to-date credit scores of loan applicants. These credit scores are sourced from a Db2 on Cloud database owned by an external provider and continuously delivered into Golden Bank's Event Streams hub. The application uses the data in the Event Streams hub to look up credit scores for mortgage applicants and determine which applicants qualify for loan approval.
The following animated image provides a quick preview of what you’ll accomplish by the end of the tutorial.
In this tutorial, you will complete these tasks:
- Set up the prerequisites.
- Task 1: Set up Event Streams.
- Task 2: View the credit score data.
- Task 3: Create a connection to your Event Streams instance.
- Task 4: Set up data replication.
- Task 5: Run data replication.
- Cleanup
Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.
Set up your browser windows
For the optimal experience completing this tutorial, open your account in one browser window, and keep this tutorial page open in another browser window to switch easily between the two windows. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Set up the prerequisites
Sign up for IBM watsonx.data integration
You must sign up for IBM watsonx.data integration to set up data replication. If you don't have an account yet, then sign up for watsonx.data integration.
Verify the necessary provisioned services
Follow these steps to verify or provision the necessary services:
-
In watsonx, verify that you are in the Toronto region. If not, click the region drop-down list, and then select Toronto.

-
From the Navigation menu, choose Services > Service instances.
-
Use the Product drop-down list to determine whether an existing watsonx.data integration as a Service service instance exists.
-
If you need to create a watsonx.data integration as a Service service instance, click Add service.
-
Select watsonx.data integration as a Service.
-
Select the Trial plan.
-
Click Create.
-
Wait while the service is provisioned, which might take a few minutes to complete.
-
Repeat these steps to verify or provision the following additional services:
- Event Streams - You might be prompted to log in to your IBM Cloud account.
Create the sample project
If you already have the sample project for this tutorial, then skip to Task 1. Otherwise, follow these steps:
-
Access the Data integration tutorial sample project in the Resource hub.
-
Click Create project.
-
If prompted to associate the project to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.
-
Click Create.
-
Wait for the project import to complete, and then click View new project to verify that the project and assets were created successfully.
-
Click the Assets tab to see the connections, connected data asset
Check your progress
The following image shows the Assets tab in the sample project. You are now ready to start the tutorial.

Task 1: Set up Event Streams
As part of the Prerequisites, you provisioned a new Event Streams instance. Now, you need to set up that service instance. Follow these steps to:
-
Create a topic to store the data replicated from the source data in Db2 on Cloud. The topic is the core of Event Streams flows. Data passes through a topic from producing applications to consuming applications.
-
Copy sample code that contains the bootstrap server information necessary to set up data replication.
-
Create credentials that you will use to create a connection to the service in the project.
-
Return to the IBM Cloud console Resources list.
-
Expand the Integration section.
-
Click the service instance name for your Event Streams instance to view the instance details.
-
First, to create the topic, click the Topics page.
-
Click Create topic.
-
For the Topic name, type golden-bank-mortgage.
-
Click Next.
-
In the Partitions section, accept the default value, and click Next.
-
In the Message retention section, accept the default value, and click Create topic.
-
Open a text editor, and paste the topic name golden-bank-mortgage into the text file to use later.
-
Next, back on the Topics page, click Connect to this service to retrieve the connection information.
-
Copy the value in the Bootstrap server field. The bootstrap server is required when creating a connection to the Event Streams instance in your project.
-
Paste the bootstrap server value into the same text file to use later.
-
Click the Sample code tab.
-
Copy the value in the Sample configuration properties field. You will use some properties from this snippet to connect securely to the service.
-
Paste the sample code into the same text file to use later.
-
Click the X to close the Connect to this service panel.
-
Lastly, to create the credentials, click the Service credentials page.
-
Click New credential.
-
Accept the default name, or change it if you would prefer.
-
For the Role, accept the default value of Manager.
-
In the Select Service ID field, select Auto Generate.
-
Click Create.
-
Click the Copy to clipboard icon.
-
Paste the credentials into the same text file to use later.
-
Your text file should contain all of the following information:
TOPIC NAME: golden-bank-mortgage
BOOTSTRAP SERVER FIELD
broker-5-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-1-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-2-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-0-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-3-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-4-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093
SAMPLE CODE
bootstrap.servers=broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="token" password="<APIKEY>";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.endpoint.identification.algorithm=HTTPS
CREDENTIALS
{
"api_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"apikey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"bootstrap_endpoints": "broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"iam_apikey_description": "Auto-generated for key crn:v1:bluemix:public:messagehub:us-south:a/a53b11fc95fcca4d96484d0de5f3bc3c:6b5a2cb2-74ef-432d-817f-f053873e7ed2:resource-key:96372942-5d26-4c59-8ca4-41ab6766ba91",
"iam_apikey_name": "Service credentials-1",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/a53b11fc95fcca4d96484d0de5f3bc3c::serviceid:ServiceId-4773bed1-f423-43ea-adff-469389dca54c",
"instance_id": "6b5a2cb2-74ef-432d-817f-f053873e7ed2",
"kafka_admin_url": "https://pqny71x0b9vh7nwh.svc11.us-south.eventstreams.cloud.ibm.com",
"kafka_brokers_sasl": [
"broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
"broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093"
],
"kafka_http_url": "https://pqny71x0b9vh7nwh.svc11.us-south.eventstreams.cloud.ibm.com",
"password": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"user": "token"
}
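If you later want to reuse the saved notes from a script, the sample configuration properties and the credentials can be combined programmatically. The following is a minimal sketch, not part of the tutorial itself. It assumes that the `apikey` field of the service credentials is the value meant for the `<APIKEY>` placeholder, and the broker names and key in `SAMPLE` and `CREDENTIALS` are made-up stand-ins for your own values. The function names `parse_properties` and `fill_api_key` are hypothetical helpers.

```python
import json


def parse_properties(text):
    """Parse Java-style key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, because values such as
        # sasl.jaas.config contain "=" characters themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def fill_api_key(props, credentials_json):
    """Return a copy of props with the <APIKEY> placeholder replaced.

    Assumption: the "apikey" field of the credentials is the intended value.
    """
    api_key = json.loads(credentials_json)["apikey"]
    return {key: value.replace("<APIKEY>", api_key) for key, value in props.items()}


# Made-up stand-ins for the SAMPLE CODE and CREDENTIALS sections of the text file.
SAMPLE = """\
bootstrap.servers=broker-0.example.com:9093,broker-1.example.com:9093
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="token" password="<APIKEY>";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
"""
CREDENTIALS = '{"apikey": "example-key", "user": "token"}'

if __name__ == "__main__":
    props = fill_api_key(parse_properties(SAMPLE), CREDENTIALS)
    for key in sorted(props):
        print(key)  # print only the keys so the secret is not echoed
```

Keeping the placeholder substitution in one function means the API key never needs to be stored in the properties text itself.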
Check your progress
The following image shows the Topics page for your Event Streams instance in IBM Cloud. You are now ready to create a connection to the Event Streams instance in your project.

Task 2: View the credit score data
The sample project includes a connection to the Db2 on Cloud instance where the source data is stored. Follow these steps to view the connection asset and the credit score data:
-
Return to your IBM watsonx browser tab. You will see the Data integration project. If you don't see the project, then follow these steps:
-
From the Navigation menu
, choose Projects > View all projects.
-
Click the Data integration project to open it.
-
On the Assets tab, click All assets.
-
Locate the Data Fabric Trial - Db2 on Cloud - Source connection asset.
-
Locate the CREDIT_SCORE connected data asset.
-
Click the CREDIT_SCORE asset to see a preview. This data asset maps to the CREDIT_SCORE table in the BANKING schema in the provider's Db2 on Cloud instance. It includes information about the mortgage applicants such as ID, name, address, and credit score. You want to set up data replication for this data asset.
-
Click the Data integration project name in the navigation trail to return to the project.

Check your progress
The following image shows the credit score data asset in the sample project. You are now ready to create a connection to the Event Streams service in this project.

Task 3: Create a connection to your Event Streams instance
To set up replication, you also need a connection to the Event Streams instance that you provisioned as part of the prerequisites. You create this connection with the information that you gathered in Task 1. Follow these steps to create the connection asset:
-
On the Assets tab, click New asset > Connect to a data source.
-
Select the Apache Kafka connector, and then click Next.
-
For the Name, type Event Streams.
-
In the Connection details section, complete the following fields:
- Kafka server host name: Paste the bootstrap server value from the text file you created in Task 1.
- Secure connection: Select SASL_SSL.
- User principal name: Paste the user value from the service credentials in your text file. This value is usually token.
- Password: Paste the password value from the service credentials in your text file.
-
Click Test connection.
-
When the test is successful, click Create. If the test is not successful, verify the information you copied and pasted from your text file, and try again. If prompted to confirm creating the connection without setting location and sovereignty, click Create again.
-
Click All assets to see the new connection.
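If the connection test fails, a quick check of the pasted values can help. The following sketch maps the service credentials JSON to the four connection fields and verifies that every bootstrap entry has the form host:port. It is an illustration only: the helper name `connection_fields` is hypothetical, and it assumes that the `bootstrap_endpoints` value in the credentials is interchangeable with the Bootstrap server value that you copied in Task 1.

```python
import json


def connection_fields(credentials_json):
    """Map service credentials to the connection form fields used in this task."""
    creds = json.loads(credentials_json)
    servers = [s.strip() for s in creds["bootstrap_endpoints"].split(",") if s.strip()]
    for server in servers:
        # Each entry must look like host:port with a numeric port.
        host, sep, port = server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {server!r}")
    return {
        "Kafka server host name": ",".join(servers),
        "Secure connection": "SASL_SSL",
        "User principal name": creds["user"],
        "Password": creds["password"],
    }


if __name__ == "__main__":
    # Made-up example values; substitute the JSON from your text file.
    example = json.dumps({
        "bootstrap_endpoints": "broker-0.example.com:9093, broker-1.example.com:9093",
        "user": "token",
        "password": "example-secret",
    })
    fields = connection_fields(example)
    print(fields["Kafka server host name"])
```

A stray space or a missing port in the pasted bootstrap server list is an easy mistake to make, so this check normalizes whitespace and fails loudly on malformed entries.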
Check your progress
The following image shows the Assets tab in the sample project with the new Event Streams connection asset. You are now ready to set up data replication.

Task 4: Set up data replication
Now you can create a Data Replication asset to start continuous data replication between the Db2 on Cloud source and the Event Streams target. Follow these steps to set up data replication:
-
Click the Assets tab in the project.
-
Click New asset > Replicate data.
-
For the Name, type CreditScoreReplication.
-
For the Business goal, select Copy.
-
Click Source options.
-
On the Source options page, select Data Fabric Trial - Db2 on Cloud - Source from the list of connections.
-
Click Select data.
-
On the Select data page, select the BANKING schema > CREDIT_SCORE table.
-
Click Target options.
-
On the Target options page, select Event Streams from the list of connections.
-
In the Default topic field, paste the topic name created in Task 1, golden-bank-mortgage.
-
Accept the default value for the rest of the fields, and click Review.
-
Review the summary, and click Create.
Check your progress
The following image shows the CreditScoreReplication screen with replication stopped. You are now ready to run data replication.

Task 5: Run data replication
After creating the Data Replication asset, you can run data replication and view information about the replication status. Follow these steps to run data replication:
-
On the CreditScoreReplication screen, click the Run icon to start the replication process.
If this is your first time running a Data Replication asset, you might be prompted to provide an API key. Data replication assets use your personal IBM Cloud API key to execute replication operations securely without disruption. If you want to use a specific API key, then click the Settings icon.
- If you have an existing API key, click Use existing API key, paste the API key, and click Save.
- If you don't have an existing API key, click Generate new API key, and then click Generate. Save the API key for future use, and then click Close.
-
In the Event logs section, click the Refresh icon to see any new messages.
-
After a few minutes, the message Completed initial synchronization for table "BANKING"."CREDIT_SCORE" displays in the Event logs section.
From this point forward, any changes to the BANKING.CREDIT_SCORE table in the Db2 on Cloud instance will be detected automatically and replicated to the target.
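One way to confirm from outside the product that records are arriving is to read a few messages from the topic with a Kafka client. The following is a rough sketch under several assumptions: it uses the third-party `confluent-kafka` Python package (not mentioned in this tutorial), it reads the service credentials from a hypothetical `credentials.json` file, and it prints raw message bytes because the tutorial does not describe the payload format of replicated records. The helper names and the consumer group ID are made up for illustration.

```python
import json


def consumer_config(credentials, group_id="credit-score-check"):
    """Build a confluent-kafka consumer configuration from service credentials."""
    return {
        "bootstrap.servers": ",".join(credentials["kafka_brokers_sasl"]),
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": credentials["user"],
        "sasl.password": credentials["password"],
        "group.id": group_id,                # hypothetical consumer group name
        "auto.offset.reset": "earliest",     # start from the oldest retained message
    }


def print_messages(credentials, topic="golden-bank-mortgage", limit=5, timeout=10.0):
    """Print up to `limit` raw messages from the topic and return how many were read."""
    # Imported lazily so that consumer_config works without the package installed.
    from confluent_kafka import Consumer

    consumer = Consumer(consumer_config(credentials))
    consumer.subscribe([topic])
    seen = 0
    try:
        while seen < limit:
            msg = consumer.poll(timeout)
            if msg is None:
                break  # nothing arrived within the timeout
            if msg.error():
                print("Kafka error:", msg.error())
                continue
            print(msg.value())  # raw bytes; the payload format is not assumed
            seen += 1
    finally:
        consumer.close()
    return seen


if __name__ == "__main__":
    # Hypothetical file holding the CREDENTIALS JSON saved in Task 1.
    with open("credentials.json") as handle:
        print_messages(json.load(handle))
```

Setting `auto.offset.reset` to `earliest` lets a new consumer group see the records written during the initial synchronization, not only changes made after the script starts.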
Check your progress
The following image shows the CreditScoreReplication screen with replication running and messages in the Event logs section. You are now ready to monitor replication by watching the status of the replication asset, the events, and the metrics, and to verify that the data is being replicated.

As a data engineer at Golden Bank, you set up continuous access to the most up-to-date credit scores of loan applicants by configuring data replication between the CREDIT_SCORE table in the Db2 on Cloud source database and a topic in Event Streams. If an applicant's credit score changes, Golden Bank's mortgage approvers have near-real-time access to the change.
Cleanup (Optional)
If you would like to retake the tutorials in the Data integration use case, delete the following artifacts.
| Artifact | How to delete |
|---|---|
| watsonx.data integration and Event Streams service instances | 1. From the Navigation menu, choose Services > Service instances. 2. Click the Action menu next to the service name, and choose Delete. |
| Data integration sample project | Delete a project |
Next steps
-
Try other tutorials:
-
View another Data fabric use case.