Kudu
The Kudu destination writes data to a Kudu cluster. For information about supported versions, see Supported Systems and Versions.
When you configure the Kudu destination, you specify the connection information for one or more Kudu masters, define the table to use, and optionally define field mappings. By default, the destination writes field data to columns with matching names. You can also enable Kerberos authentication.
The Kudu destination can use CRUD operations defined in the sdc.operation.type record header attribute to write data. You can define a default operation for records without the header attribute or value. You can also configure how to handle records with unsupported operations.
For information about Data Collector change data processing and a list of CDC-enabled origins, see Processing Changed Data.
If the destination receives a change data capture log from some origin systems, you must select the format of the change log.
You can configure the external consistency mode, operation timeouts, and the maximum number of worker threads to use.
You can also use a connection to configure the destination.
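The default field-to-column behavior described above can be illustrated with a minimal sketch. The function name `map_fields_to_columns`, the dictionary representation of a record, and the choice to skip fields with no matching column are assumptions made for illustration; they are not the destination's actual implementation.

```python
def map_fields_to_columns(record, table_columns, field_mappings=None):
    """Return a {column_name: value} row for one record.

    record: dict of field name -> value (hypothetical record shape).
    table_columns: set of column names in the target Kudu table.
    field_mappings: optional dict of field name -> column name overrides.
    """
    explicit = field_mappings or {}
    row = {}
    for field, value in record.items():
        # Use the explicit mapping if one exists, else the matching name.
        column = explicit.get(field, field)
        if column in table_columns:
            row[column] = value
        # Assumption: fields with no matching column are skipped here.
    return row


row = map_fields_to_columns(
    {"id": 1, "name": "Ada", "extra": True},
    {"id", "full_name"},
    field_mappings={"name": "full_name"},
)
print(row)
```

In this sketch, `id` is written by name, `name` is redirected to `full_name` by the explicit mapping, and `extra` is left out because no column matches it.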
CRUD Operation Processing
The Kudu destination can insert, update, delete, or upsert data. The destination writes the records based on the CRUD operation defined in a CRUD operation header attribute or in operation-related stage properties.
The destination uses the header attribute and stage properties as follows:
- CRUD operation header attribute
  - The destination looks for the CRUD operation in the sdc.operation.type record header attribute.
- Operation stage properties
  - If there is no CRUD operation in the sdc.operation.type record header attribute, the destination uses the operation configured in the Default Operation property.
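The lookup order above can be sketched as follows. This is an illustration only: the numeric codes in `OPERATION_CODES` are assumed values for the sdc.operation.type attribute and are not listed in this section, and `resolve_operation` and the `"UNSUPPORTED"` marker are hypothetical names.

```python
# Assumed numeric codes for sdc.operation.type (not stated in this section).
OPERATION_CODES = {1: "INSERT", 2: "DELETE", 3: "UPDATE", 4: "UPSERT"}


def resolve_operation(headers, default_operation="INSERT"):
    """Return the operation name to apply to one record.

    headers: dict of record header attributes (hypothetical representation).
    default_operation: stands in for the Default Operation stage property.
    """
    raw = headers.get("sdc.operation.type")
    if raw is None or str(raw).strip() == "":
        # No header attribute or no value: fall back to the stage property.
        return default_operation
    try:
        code = int(raw)
    except (TypeError, ValueError):
        # A value that is not a number cannot name a known operation.
        return "UNSUPPORTED"
    # Codes outside the four Kudu operations are reported as unsupported.
    return OPERATION_CODES.get(code, "UNSUPPORTED")


print(resolve_operation({"sdc.operation.type": "2"}))
print(resolve_operation({}, default_operation="UPSERT"))
```

A record flagged as unsupported would then be handled according to the stage's unsupported-operation setting, which this sketch leaves to the caller.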
Kudu Data Types
The Kudu destination converts Data Collector data types to the following compatible Kudu data types:
Data Collector Data Type | Kudu Data Type |
---|---|
Boolean | Bool |
Byte | Int8 |
Byte Array | Binary |
Decimal | Decimal. Available in Kudu version 1.7 and later. If using an earlier version of Kudu, configure your pipeline to convert the Decimal data type to a different Kudu data type. |
Double | Double |
Float | Float |
Integer | Int32 |
Long | Int64 or Unixtime_micros. The destination determines the data type to use based on the mapped Kudu column. The Data Collector Long data type stores millisecond values. The Kudu Unixtime_micros data type stores microsecond values. When converting to the Unixtime_micros data type, the destination multiplies the field value by 1,000 to convert the value to microseconds. |
Short | Int16 |
String | String |
The destination cannot convert the following Data Collector data types:
- Character
- Date
- Datetime
- List
- List-Map
- Map
- Time
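The conversion table, including the Long rule, can be expressed as a small lookup. This is a sketch under stated assumptions: `TYPE_MAP` and `convert_field` are hypothetical names, the type names are plain strings rather than Data Collector or Kudu API types, and rejecting any type outside the table with an error is an assumption of this sketch.

```python
# One-to-one conversions taken from the table above.
TYPE_MAP = {
    "Boolean": "Bool",
    "Byte": "Int8",
    "Byte Array": "Binary",
    "Decimal": "Decimal",  # per the table, requires Kudu 1.7 or later
    "Double": "Double",
    "Float": "Float",
    "Integer": "Int32",
    "Short": "Int16",
    "String": "String",
}


def convert_field(sdc_type, value, kudu_column_type=None):
    """Return (kudu_type, converted_value) for one field."""
    if sdc_type == "Long":
        # The target type depends on the mapped Kudu column.
        if kudu_column_type == "Unixtime_micros":
            # Milliseconds to microseconds: multiply by 1,000.
            return "Unixtime_micros", value * 1000
        return "Int64", value
    if sdc_type in TYPE_MAP:
        return TYPE_MAP[sdc_type], value
    # Assumption: types outside the table are rejected in this sketch.
    raise ValueError(f"No compatible Kudu type for {sdc_type}")


print(convert_field("Long", 1700000000000, "Unixtime_micros"))
print(convert_field("Long", 42, "Int64"))
```

For example, a Long value of 1,700,000,000,000 milliseconds mapped to a Unixtime_micros column becomes 1,700,000,000,000,000 microseconds, while the same value mapped to an Int64 column is written unchanged.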
Kerberos Authentication
You can use Kerberos authentication to connect to a Kudu cluster. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to Kudu. By default, Data Collector connects using the user account that started it.
The Kerberos principal and keytab are defined in the Data Collector configuration properties. To use Kerberos authentication, configure all Kerberos properties there.
For more information about enabling Kerberos authentication for Data Collector, see Kerberos Authentication.
Configuring a Kudu Destination
Configure a Kudu destination to write to a Kudu cluster.