Maximizing CDC Replication Engine for Kafka robustness

You can configure the properties file of the CDC Kafka producer to leverage Kafka's native handling for leader rebalancing, broker failure, and other errors from the Kafka server.

Procedure

Edit the properties of the CDC Kafka producer in the CDC_Kafka_instance_directory/conf/kafkaproducer.properties file.
Add or modify the following properties:

retry.backoff.ms=time1

Where time1 is an integer value, in milliseconds, for how long the producer waits before it retries a failed request.

retries=num1

Where num1 is an integer value for the number of times the client must retry when it encounters an error. Setting a value for num1 that is greater than zero causes the client to resend any record whose sending fails with an error.

For Kafka version 2.3.1 or higher:

retry.backoff.ms=time2

Where time2 is an integer value, in milliseconds, for how long the producer waits before it retries a failed request.

delivery.timeout.ms=num2

Where num2 is an integer value, in milliseconds, for the total time allowed to await acknowledgment from the broker plus the time allowed for send failures that can be retried. This value must be greater than the longest anticipated network outage.
If the version of the Kafka cluster supports the idempotence feature, set enable.idempotence=true. For older Kafka cluster versions that do not support idempotence, you can set max.in.flight.requests.per.connection=1 to promote ordered batch delivery.
End replication and then start replication again for the Kafka subscriptions so that the updated properties are used.
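
The property edits in this procedure can also be scripted. The following Python sketch is illustrative only: set_properties is a hypothetical helper, not part of the CDC Replication Engine, and the starting file content is invented for the example. It updates keys that are already present and appends the ones that are missing, using the example values given later in this topic.

```python
def set_properties(text, overrides):
    """Return properties text with each key in `overrides` updated or appended.

    Hypothetical helper for illustration; it handles simple key=value lines only.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining and not is_comment:
            # Replace the existing value for this key.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Keep comments, blank lines, and unrelated properties untouched.
            out.append(line)
    # Append any keys that were not already in the file.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Invented starting content; a real kafkaproducer.properties file will differ.
existing = "# sample settings\nretries=0\n"
updated = set_properties(
    existing,
    {"retry.backoff.ms": 300, "retries": 150, "enable.idempotence": "true"},
)
print(updated)
```

Back up the properties file before you change it, and review the result by hand; the sketch does not handle multi-line values or other separators that the properties format allows.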

What to do next

For a full discussion of these parameters and their implications, see the Apache Kafka documentation.

To tune the values for your environment, adjust the Kafka producer properties retry.backoff.ms and retries according to the following formula:

retry.backoff.ms * retries > the anticipated maximum time for leader change metadata to propagate in the cluster

For example, you might want to configure enable.idempotence=true, retry.backoff.ms=300, and retries=150.
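
As a rough check of the formula, the following Python sketch multiplies the two values from the example and compares the product with a propagation time. The function names are hypothetical, and the 30-second propagation time is an assumed figure for illustration; measure or estimate the value for your own cluster.

```python
def retry_window_ms(retry_backoff_ms, retries):
    """Product of retry.backoff.ms and retries, as used in the formula."""
    return retry_backoff_ms * retries


def covers_leader_change(retry_backoff_ms, retries, max_propagation_ms):
    """True if the retry window exceeds the anticipated propagation time."""
    return retry_window_ms(retry_backoff_ms, retries) > max_propagation_ms


window = retry_window_ms(300, 150)
print(window)  # 45000 milliseconds, that is, 45 seconds

# Assumed propagation time of 30 seconds (30000 ms), for illustration only.
print(covers_leader_change(300, 150, max_propagation_ms=30_000))
```

With these values the producer keeps retrying for about 45 seconds, which satisfies the formula for any leader change whose metadata propagates in less time than that.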

Note: The Kafka documentation for retries includes the following consideration, which applies when you configure retries for robustness:

"Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first."

If max.in.flight.requests.per.connection is set to 1, when a Kafka producer returns an error to the CDC Replication Engine for Kafka, replication ends for that subscription. Duplicate records might be written when the next CDC mirroring session starts. Depending on your business logic, you might want to increase max.in.flight.requests.per.connection. If you do so, note that transparent retries by the producer with a single batch in flight mean that you might see a repeated batch within a single replication session.
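
The ordering consideration quoted in the note can be illustrated with a toy model. The following Python sketch is not Kafka client code: simulate_partition_log is a hypothetical function that treats each group of in-flight batches as one window and retries a failed batch only after the rest of its window has been written. It is meant only to show why more than one batch in flight can change the order of records when retries are enabled.

```python
from collections import deque


def simulate_partition_log(batches, fails_once, max_in_flight):
    """Toy model: return the order in which batches land in one partition.

    `fails_once` names the batches whose first send attempt fails.
    """
    pending = deque(batches)
    already_failed = set()
    log = []
    while pending:
        # Send up to max_in_flight batches at the same time.
        count = min(max_in_flight, len(pending))
        window = [pending.popleft() for _ in range(count)]
        retry = []
        for batch in window:
            if batch in fails_once and batch not in already_failed:
                # First attempt fails; the batch is queued for a retry.
                already_failed.add(batch)
                retry.append(batch)
            else:
                log.append(batch)
        # Retried batches are sent again before any batches not yet sent.
        pending.extendleft(reversed(retry))
    return log


# Two batches in flight: the second batch lands before the retried first batch.
print(simulate_partition_log(["A", "B"], {"A"}, max_in_flight=2))  # ['B', 'A']

# One batch in flight: the first batch is retried before the second is sent.
print(simulate_partition_log(["A", "B"], {"A"}, max_in_flight=1))  # ['A', 'B']
```

The model ignores idempotence, timeouts, and acknowledgments; see the Apache Kafka documentation for the actual producer behavior.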