Cassandra
The Cassandra destination writes data to a Cassandra cluster. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
When you configure the Cassandra destination, you define connection information and map incoming fields to columns in the Cassandra table. You can also use a connection to configure the destination. You specify whether the destination writes each batch to Cassandra as a logged batch or an unlogged batch. You can disable batch writes and have the destination write records individually instead.
You configure whether the destination uses no authentication or username and password authentication to access the Cassandra cluster. If you install the DataStax Enterprise (DSE) Java driver, you can configure the destination to use DSE username and password authentication or Kerberos authentication. You can also enable the destination to use SSL/TLS to connect to the cluster.
Batch Type
The Cassandra destination can write batches to a Cassandra cluster using one of the following batch types:
- Logged - Logged batches written to Cassandra use the Cassandra distributed batch log and are atomic. This means that the destination can only write entire batches of records to Cassandra. If an error occurs with one or more records in a batch, the destination fails the entire batch. When a batch fails, all records are sent to the stage for error handling.
- Unlogged - Unlogged batches written to Cassandra do not use the Cassandra distributed batch log and are nonatomic. This means that the destination can write partial batches of records to Cassandra. If an error occurs with one or more records in a batch, the destination sends only those failed records to the stage for error handling. The destination writes the remaining records in the batch to Cassandra.
By default, the destination uses the logged batch type.
For more information about the Cassandra distributed batch log, see the Cassandra Query Language (CQL) documentation.
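The difference between the two batch types is easiest to see in how records are routed when a write fails. The following Python sketch models only the behavior described above. It is not the destination's implementation and does not connect to Cassandra. The names write_batch, BatchResult, and is_valid are hypothetical, and is_valid stands in for whatever causes a record to fail.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    written: list = field(default_factory=list)  # records that reach the table
    errors: list = field(default_factory=list)   # records sent to error handling


def write_batch(records, is_valid, logged=True):
    """Model how a batch is split between the table and error handling.

    `is_valid` is a hypothetical stand-in for whatever makes a write fail.
    """
    failed = [r for r in records if not is_valid(r)]
    if not failed:
        return BatchResult(written=list(records))
    if logged:
        # Logged batches are atomic: one bad record fails the whole batch.
        return BatchResult(errors=list(records))
    # Unlogged batches are nonatomic: only the failed records are rejected.
    ok = [r for r in records if is_valid(r)]
    return BatchResult(written=ok, errors=failed)


records = [{"id": 1}, {"id": None}, {"id": 3}]
has_id = lambda r: r["id"] is not None

logged_result = write_batch(records, has_id, logged=True)
unlogged_result = write_batch(records, has_id, logged=False)
```

In this model, the logged call writes nothing and sends all three records to error handling, while the unlogged call writes the two valid records and sends only the record with the missing id to error handling.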
Authentication
The Cassandra destination can use one of the following authentication providers to access a Cassandra cluster:
- None - Performs no authentication.
- Username/Password - Uses Cassandra username and password authentication.
- Username/Password (DSE) - Uses DataStax Enterprise username and password authentication. Requires that you install the DSE Java driver.
- Kerberos (DSE) - Uses Kerberos authentication. Requires that you install the DSE Java driver.
Before selecting one of the DSE authentication providers, install the DSE Java driver version 1.2.4 or later. For a compatibility matrix, see the Cassandra documentation. You install the driver into the Cassandra stage library, streamsets-datacollector-cassandra_3-lib, which includes the destination. For information about installing additional drivers, see Install External Libraries in the Data Collector documentation.
Kerberos (DSE) Authentication
If you install the DSE Java driver, you can use Kerberos authentication to connect to a Cassandra cluster. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to the cluster. By default, Data Collector uses the user account that started it to connect.
The Kerberos principal and keytab are defined in the Data Collector configuration file, $SDC_CONF/sdc.properties. To use Kerberos authentication, configure all Kerberos properties in the Data Collector configuration file, install the DSE Java driver, and then enable Kerberos (DSE) authentication in the Cassandra destination.
Cassandra Data Types
Due to Cassandra requirements, the data types of the incoming fields must match the data types of the corresponding Cassandra columns. When appropriate, use a Field Type Converter processor earlier in the pipeline to convert data types.
For details about the conversion of Java data types to Cassandra data types, see the Cassandra documentation.
The Cassandra destination supports the following Cassandra data types:
- ASCII
- Bigint
- Boolean
- Counter
- Decimal
- Double
- Float
- Int
- List
- Map
- Text
- Timestamp
- Timeuuid
- Uuid
- Varchar
- Varint
- Blob
- Inet
- Set
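Because field types must match column types, it can help to check a sample record against the table definition before you run the pipeline. The following Python sketch shows one way to reason about such a check. It is an illustration only: the EXPECTED_PYTHON_TYPES mapping and the find_type_mismatches function are assumptions made for this example, not part of Data Collector or of any Cassandra driver, and the mapping uses Python types rather than the Java types that the destination works with.

```python
import datetime
import decimal
import ipaddress
import uuid

# Illustrative mapping from Cassandra type names to Python types.
# This table is an assumption for the sketch, not an official conversion chart.
EXPECTED_PYTHON_TYPES = {
    "ascii": str, "text": str, "varchar": str,
    "bigint": int, "int": int, "varint": int, "counter": int,
    "boolean": bool,
    "decimal": decimal.Decimal,
    "double": float, "float": float,
    "list": list, "map": dict, "set": set,
    "timestamp": datetime.datetime,
    "timeuuid": uuid.UUID, "uuid": uuid.UUID,
    "blob": bytes,
    "inet": (ipaddress.IPv4Address, ipaddress.IPv6Address),
}


def find_type_mismatches(record, column_types):
    """Return the names of fields whose values do not fit the column type.

    A field that is missing from the record is also reported as a mismatch.
    """
    mismatches = []
    for name, cql_type in column_types.items():
        value = record.get(name)
        expected = EXPECTED_PYTHON_TYPES[cql_type]
        # bool is a subclass of int in Python, so reject it for integer columns.
        if isinstance(value, bool) and expected is int:
            mismatches.append(name)
        elif not isinstance(value, expected):
            mismatches.append(name)
    return mismatches


column_types = {"id": "int", "name": "text", "active": "boolean"}
record = {"id": "42", "name": "Ada", "active": True}
mismatches = find_type_mismatches(record, column_types)
```

Here the id field arrives as a string while the column is an Int, so id is the only field reported. In a pipeline, this is the kind of field that you would convert with a Field Type Converter processor before the destination.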