Replicating Apache Kafka data
You can replicate data from other databases to Apache Kafka with the Data Replication service.
Restriction
Apache Kafka can only be used as a target data store for Data Replication.
Before you begin
Ask your Kafka administrator for Kafka credentials that Data Replication can use to write to the cluster. Ask for Kafka producer properties for the cluster.
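The exact producer properties depend on your cluster's security configuration; the following is a hypothetical example of the kind of properties your administrator might supply (all keys are standard Apache Kafka producer configuration names, but the values here are placeholders):

```properties
# Hypothetical example only -- your Kafka administrator supplies the real values.
bootstrap.servers=broker1.example.com:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="replicator" password="<password>";
acks=all
compression.type=lz4
```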
To connect to Apache Kafka in a project in Cloud Pak for Data, see Apache Kafka connection.
Configuring a replication asset to use Apache Kafka
To create a Data Replication asset with Apache Kafka as a target data store:
1. Click the Assets tab in the project.
2. Click New asset > Replicate data.
3. Enter a name.
4. Click Connections.
5. On the Source options page, select a connection from the list of connections or click Add connection to create a new connection.
6. Click Select data, select a schema, and optionally a table from the schema.
7. On the Target options page, select the Kafka connection from the list.
8. Optional: Provide any Kafka producer configuration properties.
9. Select the Kafka message format. Avro can be selected only for a connection with a schema registry.
10. Optional: If you selected Avro as your message format, you can provide Avro serializer properties.
11. Expand Topic mappings. Source tables replicate to the target topic, if one is specified. Complete the default topic, the target topics, or both; all mappings must be defined before you can proceed.
The default topic name is used for any schema mappings that are not explicitly mapped. Providing a default topic name is optional if all the schema mappings are explicitly defined.
Entering a topic name in the topic column replicates the corresponding source data to that topic. Any rows that are left as Default use the default topic property. A schema with specific tables selected can specify a topic for the schema, which is used as the default for any tables selected from that schema that are not explicitly mapped to a topic.
12. Optional: Select Override the default Kafka message parameters to customize the Kafka topic headers and key. Headers and keys both use JMESPath format.
- Click Add header and complete a Kafka topic header field name and corresponding JMESPath expression. The following functions are available to provide dynamic information based on the current context of the replication record:
- meta(@, 'container_name') is "<schema name>"."<table name>".
- meta(@, 'object_id') is the value of the primary key column.
- meta(@, 'database_name') is the name of the database.
- meta(@, 'schema_name') is the schema name.
- meta(@, 'table_name') is the table name.
- meta(@, 'is_consistent') is the flag indicating whether the target and source are in a consistent state.
- meta(@, 'dml_operation_type') is the DML operation type: Insert, Delete, or Update.
Important: Headers are part of every Kafka message and are visible to users. Do not provide sensitive information such as API keys, usernames, or passwords.
- Adjust the Kafka topic key by modifying the Key expression. The functions that are listed above can be used within the key expression.
- Specify the partitioning strategy that is used for this replication by modifying the partition expression. Note that the partitioning strategy specified applies to all topics in the replication.
- null: This is the default value. A hash of the specified key and the number of partitions in a topic determines the partition where the message is written. If the key is not unique and has limited variance, the resulting partitioning can be skewed. For large tables where the key is unique, the default partitioning should result in a relatively even distribution among the partitions.
- Integer (nonnegative): This sends all messages to the specified partition. In Apache Kafka, topic partitions are zero-based, so a topic with 5 partitions allows values of 0, 1, 2, 3, or 4 only. Note that this applies to all topics within the replication: if the replication maps tables to different topics, all topics write to the single specified partition.
13. On the Review page, review the summary, then click Create.
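The meta(@, '<name>') header functions listed above can be sketched as lookups against a replication record's context. The record layout and the meta() implementation below are illustrative assumptions only; Data Replication evaluates the real JMESPath expressions internally.

```python
# Hypothetical sketch of how the meta(@, '<name>') header functions resolve.
# The record structure here is invented for illustration.

def meta(record, name):
    ctx = record["context"]  # hypothetical per-record metadata
    if name == "container_name":
        # container_name combines the schema and table names
        return '"{}"."{}"'.format(ctx["schema_name"], ctx["table_name"])
    return ctx[name]

record = {"context": {
    "database_name": "PRODDB",
    "schema_name": "SALES",
    "table_name": "ORDERS",
    "object_id": "1042",
    "is_consistent": True,
    "dml_operation_type": "Insert",
}}

# Header values a mapping like the one described above might produce:
headers = {
    "source-table": meta(record, "container_name"),
    "op-type": meta(record, "dml_operation_type"),
}
print(headers["source-table"])  # "SALES"."ORDERS"
```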
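The default (null) partitioning strategy described above amounts to hashing the message key and taking it modulo the partition count. Kafka's default partitioner actually uses murmur2; the CRC32 hash below is a stand-in to illustrate the skew behavior, not Kafka's real algorithm.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Illustrative stand-in: hash the key and map the hash onto a partition.
    # (Kafka's default partitioner uses murmur2, not CRC32.)
    return zlib.crc32(key) % num_partitions

# A key with limited variance concentrates messages on a few partitions:
low_variance = [pick_partition(k, 5) for k in [b"US", b"US", b"US", b"EU"]]
# A unique key (for example, a primary key value) spreads messages more evenly:
unique_keys = [pick_partition(str(i).encode(), 5) for i in range(1000)]
```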
When you configure Data Replication to replicate a source schema, tables you create on the source database replicate to the default topic when you run the Data Replication asset. When you replicate a source schema and you change a source table's structure, Data Replication updates the target topic headers accordingly. When you start a replication, the topics that are configured in the replication are truncated. When you pause and resume a replication, the topics are not truncated.
Parent topic: Supported Data Replication connections