Kafka
The Kafka destination writes data to a Kafka cluster. The destination supports Apache Kafka 0.10 and later. When using a Cloudera distribution of Apache Kafka, use CDH Kafka 3.0 or later.
The destination writes each record as a Kafka message to the specified topic. The Kafka cluster determines the number of partitions that the destination uses to write the data.
When you configure the Kafka destination, you specify the Kafka brokers that the destination connects to, the Kafka topic to write to, and the data format to use. You can configure the destination to connect securely to Kafka. You can also use a connection to configure the destination.
You can configure the destination to pass Kafka message keys to Kafka along with the data. You can also specify additional Kafka configuration properties.
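Additional Kafka configuration properties use the standard Apache Kafka producer property names. For example, a set of tuning properties might look like the following (illustrative values; which properties you need depends on your cluster and workload):

```properties
# Standard Kafka producer properties (example values only)
compression.type=snappy
batch.size=16384
linger.ms=5
acks=all
```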
Generated Messages and Kafka Message Keys
Each Kafka message contains two parts: an optional message key and a required value. By default, the destination generates a null value for the message key and writes the record data to the message value. However, you can configure the destination to pass data from a record field to Kafka as the message key.
Example: Default messages

Suppose the destination processes the following records:

| order_id | customer_id | amount |
|---|---|---|
| 1075623 | 2 | 34.56 |
| 1076645 | 47 | 234.67 |
| 1050945 | 342 | 126.05 |

If you configure the destination to use the JSON data format, the destination writes the following messages to Kafka, each with a null message key:

| Key | Value |
|---|---|
| null | {"order_id":1075623,"customer_id":2,"amount":34.56} |
| null | {"order_id":1076645,"customer_id":47,"amount":234.67} |
| null | {"order_id":1050945,"customer_id":342,"amount":126.05} |
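The default key/value mapping can be sketched in a few lines of Python. This is not StreamSets code, just an illustration of how each record becomes a null-key JSON message:

```python
import json

# Records as the destination might see them (values from the example above)
records = [
    {"order_id": 1075623, "customer_id": 2, "amount": 34.56},
    {"order_id": 1076645, "customer_id": 47, "amount": 234.67},
    {"order_id": 1050945, "customer_id": 342, "amount": 126.05},
]

# By default, the message key is null (None) and the whole record
# is serialized as the JSON message value.
messages = [(None, json.dumps(rec, separators=(",", ":"))) for rec in records]

for key, value in messages:
    print(key, value)
```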
Example: Messages with message keys

Suppose the destination processes the following records, which include a key field:

| key | order_id | customer_id | amount |
|---|---|---|---|
| 123 | 1075623 | 2 | 34.56 |
| 124 | 1076645 | 47 | 234.67 |
| 125 | 1050945 | 342 | 126.05 |

If you configure the destination to use the data in the key field for Kafka message keys, and to use JSON as the data format, the destination writes the following messages to Kafka:

| Key | Value |
|---|---|
| 123 | {"order_id":1075623,"customer_id":2,"amount":34.56} |
| 124 | {"order_id":1076645,"customer_id":47,"amount":234.67} |
| 125 | {"order_id":1050945,"customer_id":342,"amount":126.05} |

Note that the data in the key field is used as the message key and is not included in the message value.
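The same key-field behavior can be sketched in Python. This is an illustration, not StreamSets code: the configured key field supplies the message key, and the remaining fields become the message value:

```python
import json

# Records that include a key field, as in the example above
records = [
    {"key": 123, "order_id": 1075623, "customer_id": 2, "amount": 34.56},
    {"key": 124, "order_id": 1076645, "customer_id": 47, "amount": 234.67},
    {"key": 125, "order_id": 1050945, "customer_id": 342, "amount": 126.05},
]

def to_message(record, key_field="key"):
    # The key field supplies the message key; the remaining fields
    # are serialized as the JSON message value.
    rec = dict(record)
    key = rec.pop(key_field)
    return key, json.dumps(rec, separators=(",", ":"))

for key, value in (to_message(r) for r in records):
    print(key, value)
```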
Kafka Security
You can configure the destination to connect securely to Kafka through SSL/TLS, SASL, or both. For more information about the methods and details on how to configure each method, see Security in Kafka Stages.
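Under the hood, Kafka clients express these options with standard Apache Kafka client properties. A typical SSL/TLS-plus-SASL setup might look like the following (example paths and values only; adjust for your cluster):

```properties
# Encrypt traffic with TLS and authenticate with SASL
security.protocol=SASL_SSL
ssl.truststore.location=/etc/kafka/truststore.jks
ssl.truststore.password=changeit
sasl.mechanism=PLAIN
```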
Data Formats
The Kafka destination writes records based on the specified data format.
- Avro
- The destination writes records based on the Avro schema. You can use one of the following methods to specify the location of the Avro schema definition:
- In Pipeline Configuration - Use the schema defined in the stage properties. Optionally, you can configure the destination to register the specified schema with Confluent Schema Registry at a URL with a schema subject.
- Confluent Schema Registry - Retrieve the schema from Confluent Schema Registry. Confluent Schema Registry is a distributed storage layer for Avro schemas. You specify the URL to Confluent Schema Registry and whether to look up the schema by the schema ID or subject.
You can also compress data with an Avro-supported compression codec.
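For reference, an Avro schema for the order records in the earlier examples might look like the following. This is a hypothetical schema, shown only to illustrate what the stage properties or Confluent Schema Registry would hold:

```json
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "customer_id", "type": "int"},
    {"name": "amount", "type": "double"}
  ]
}
```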
- Delimited
- The destination writes a delimited message for every record. You can specify a custom delimiter, quote, and escape character to use in the data.
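The effect of custom delimiter, quote, and escape characters can be sketched with Python's standard csv module. This is an illustration of the output shape, not StreamSets code:

```python
import csv
import io

# Field values from the earlier order records
records = [
    [1075623, 2, 34.56],
    [1076645, 47, 234.67],
    [1050945, 342, 126.05],
]

buf = io.StringIO()
# Custom delimiter, quote, and escape characters, analogous to the
# options the destination exposes for delimited data
writer = csv.writer(buf, delimiter="|", quotechar="'",
                    escapechar="\\", quoting=csv.QUOTE_MINIMAL)
for rec in records:
    writer.writerow(rec)  # one delimited message per record

print(buf.getvalue())
```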
- JSON
- The destination writes a JSON line message for every record. For more information, see the JSON Lines website.
- Text
- The destination writes a message with a single String field for every record. When you configure the destination, you select the field to use.
Configuring a Kafka Destination
Configure a Kafka destination to write data to a Kafka cluster.