Kafka
Available when using an authoring Data Collector version 3.19.0 or later.
- Apache Kafka: streamsets-datacollector-apache-kafka_<version>-lib
- Cloudera CDH: streamsets-datacollector-cdh_<version>-lib
- Cloudera CDP: streamsets-datacollector-cdp_<version>-lib
For a description of the Kafka connection properties, see Kafka Connection Properties.
Engine | Stages and Locations
---|---
Data Collector 3.19.0 or later | 
Transformer 3.18.0 or later | 
For information about features added to the connection with different engine releases, see the connection requirements for the engine.
Kafka Connection Properties
Kafka Property | Description
---|---
Broker URI | Comma-separated list of connection strings for the Kafka brokers. Use the following format for each broker: `<host>:<port>`. To ensure that a pipeline can connect to Kafka if a specified broker goes down, list as many brokers as possible.
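For example, a three-broker cluster might be specified as follows; the hostnames and port are placeholders:

```
kafka01.example.com:9092,kafka02.example.com:9092,kafka03.example.com:9092
```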
Security Property | Description
---|---
Security Option | Authentication and encryption option used to connect to the Kafka brokers. Enabling security requires completing several prerequisite tasks on all execution engines that access the connection and configuring additional security properties in the connection, as described in Kafka Security. Note: The Custom Authentication option is supported with Data Collector 5.1.0 or later.
Kafka Security
- SSL/TLS Encryption
- SSL/TLS Encryption and Authentication
- SASL Authentication
- SASL Authentication on SSL/TLS
- Custom Authentication
The SASL authentication options support two SASL mechanisms: PLAIN and GSSAPI (Kerberos).
Enabling security requires completing several prerequisite tasks in addition to configuring security properties in the connection.
Prerequisite Tasks
Complete the following prerequisite tasks for the security option that you want to use:
- SSL/TLS
- Complete the following prerequisite tasks before using SSL/TLS to connect to Kafka:
- Make sure Kafka is configured for SSL/TLS as described in the Kafka documentation.
- Store the SSL truststore and keystore files on the execution engine machine.
For Transformer pipelines, store the files in the same location on the Transformer machine and on each node in the Spark cluster.
For Data Collector Kafka YARN cluster pipelines, store the files in the same location on the Data Collector machine and on each node in the YARN cluster.
- SASL with the PLAIN mechanism
- Complete the following prerequisite tasks before using SASL with the PLAIN mechanism to connect to Kafka:
- Make sure Kafka is configured for SASL authentication with the PLAIN mechanism as described in the Kafka documentation.
- Define the username and password credentials in a JAAS configuration file, as described in Providing PLAIN Credentials.
- Store the JAAS configuration file on the execution engine machine.
For Transformer pipelines, store the file in the same location on the Transformer machine and on each node in the Spark cluster. When configuring the pipeline, you must also specify the path to the JAAS configuration file in the Extra Spark Configuration properties on the pipeline Cluster tab. For more information, see Enabling SASL Authentication in the Transformer documentation.
For Data Collector Kafka YARN cluster pipelines, store the JAAS file in the same location on the Data Collector machine and on each node in the YARN cluster.
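For Transformer pipelines, the Extra Spark Configuration entries that point Spark at the JAAS file typically look like the following sketch. The property names are standard Spark Java options; the file path is a placeholder:

```properties
spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/etc/transformer/kafka_client_jaas.conf
spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/etc/transformer/kafka_client_jaas.conf
```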
- SASL with the GSSAPI (Kerberos) mechanism
- Complete the following prerequisite tasks before using SASL with the GSSAPI (Kerberos) mechanism to connect to Kafka:
- Make sure Kafka is configured for SASL authentication with the GSSAPI (Kerberos) mechanism as described in the Kafka documentation.
- Make sure Kerberos authentication is enabled for the execution engines.
For Data Collector pipelines, see Kerberos Authentication in the Data Collector documentation. For Transformer pipelines that run on a Hadoop YARN cluster configured for Kerberos, see Kerberos Authentication in the Transformer documentation.
- Determine how to provide the Kerberos credentials and complete the required tasks as described in Providing Kerberos Credentials.
- Store the JAAS configuration and Kafka keytab files on the execution engine machine.
For Transformer pipelines, store the files in the same location on the Transformer machine and on each node in the Spark cluster. When configuring the pipeline, you also must specify the path to the JAAS configuration file as Extra Spark Configuration properties on the pipeline Cluster tab. For more information, see Enabling SASL Authentication in the Transformer documentation.
For Data Collector Kafka YARN cluster pipelines, store the files in the same location on the Data Collector machine and on each node in the YARN cluster.
SASL Authentication Credentials
When using SASL authentication to connect to Kafka, the method that you use to provide credentials depends on whether you use the PLAIN or GSSAPI (Kerberos) SASL mechanism.
Providing PLAIN Credentials
To connect to Kafka using SASL authentication with the PLAIN mechanism, provide the credentials in a Java Authentication and Authorization Service (JAAS) file.
Create a JAAS configuration file on the Data Collector or Transformer machine. You can define only one JAAS file for an execution engine, so every Kafka connection in every pipeline run on that engine uses the same credentials.
Add the following KafkaClient login section to the file:
KafkaClient {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="<username>"
password="<password>";
};
Then define the path to the JAAS configuration file as a Java option in the engine environment:

-Djava.security.auth.login.config=<JAAS config path>/kafka_client_jaas.conf

Modify Data Collector environment variables or Transformer environment variables using the method required by your installation type.
Providing Kerberos Credentials
To connect to Kafka using SASL authentication with the GSSAPI (Kerberos) mechanism, you must provide the Kerberos credentials to use.
You can provide Kerberos credentials in either of the following ways, or in both, as needed:
- JAAS file
- Define Kerberos credentials in a Java Authentication and Authorization Service (JAAS) file when you want to use the same keytab and principal for every Kafka connection in every pipeline that you create. When configured, credentials defined in connection properties override JAAS file credentials.
- Connection properties
- You can define Kerberos credentials in connection properties when the Kafka connection uses a stage library for Kafka 0.11.0.0 or later. Define Kerberos credentials in connection properties when you want to use different credentials in different Kafka connections. Important: Configuring Kerberos credentials in connection properties is not supported in Transformer pipelines or in Data Collector cluster pipelines at this time.
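For reference, a JAAS KafkaClient login section for the GSSAPI mechanism typically uses the JDK Krb5LoginModule with a keytab, as described in the Kafka documentation. The keytab path and principal below are placeholders:

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka_client.keytab"
    principal="kafka-client@EXAMPLE.COM";
};
```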
Using a Credential Store
You can define Kerberos keytabs in a credential store, then call the appropriate keytab from a Kafka connection.
Defining Kerberos keytabs in a credential store allows you to store multiple keytabs for use by Kafka connections. It also provides flexibility in how you use the keytabs. For example, you might create two separate keytabs, one for connections used in Kafka origins and one for connections used in Kafka destinations. Or, you might provide separate keytabs for every Kafka connection that you define.
Using a credential store makes it easy to update keytabs without having to edit the connections that use them. This can simplify tasks such as recycling keytabs or migrating pipelines to production.
Make sure that Data Collector is configured to use a supported credential store. For a list of supported credential stores and instructions on enabling each credential store, see Credential Stores in the Data Collector documentation.
For an additional layer of security, you can require group access to credential store secrets. For more information, see Group Access to Secrets in the Data Collector documentation.
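As a sketch, a keytab stored as a secret can then be retrieved in a connection property with the credential function described in the Data Collector credential store documentation. The store ID, group, and secret name below are placeholders:

```
${credential:get("jks", "all", "kafka/dev-keytab")}
```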
Enabling SSL/TLS Encryption
When the Kafka cluster uses the Kafka SSL security protocol, enable the Kafka connection to use SSL/TLS encryption.
Before you enable a Kafka connection to use SSL/TLS encryption, make sure that you have performed all necessary prerequisite tasks. Then, perform the following steps to enable the connection to use SSL/TLS encryption to connect to Kafka.
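Conceptually, this option corresponds to the following Kafka client security properties, as defined in the Kafka documentation; the truststore path and password are placeholders:

```properties
security.protocol=SSL
ssl.truststore.location=/etc/sdc/kafka.truststore.jks
ssl.truststore.password=<truststore-password>
```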
Enabling SSL/TLS Encryption and Authentication
When the Kafka cluster uses the Kafka SSL security protocol and requires client authentication, enable the Kafka connection to use SSL/TLS encryption and authentication.
Before you enable a Kafka connection to use SSL/TLS encryption and authentication, make sure that you have performed all necessary prerequisite tasks. Then, perform the following steps to enable the connection to use SSL/TLS encryption and authentication to connect to Kafka.
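Conceptually, this option corresponds to the following Kafka client security properties, as defined in the Kafka documentation. The keystore properties provide the client certificate for authentication; all paths and passwords are placeholders:

```properties
security.protocol=SSL
ssl.truststore.location=/etc/sdc/kafka.truststore.jks
ssl.truststore.password=<truststore-password>
ssl.keystore.location=/etc/sdc/kafka.keystore.jks
ssl.keystore.password=<keystore-password>
ssl.key.password=<key-password>
```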
Enabling SASL Authentication
When the Kafka cluster uses the SASL_PLAINTEXT security protocol, enable the Kafka connection to use SASL authentication.
Before you enable a Kafka connection to use SASL authentication, make sure that you have performed all necessary prerequisite tasks.
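Conceptually, this option corresponds to the following Kafka client security properties, as defined in the Kafka documentation. The sketch below shows the GSSAPI (Kerberos) mechanism; for the PLAIN mechanism, set `sasl.mechanism=PLAIN` instead and omit the Kerberos service name:

```properties
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
```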
Enabling SASL Authentication on SSL/TLS
When the Kafka cluster uses the SASL_SSL security protocol, enable the Kafka connection to use SASL authentication on SSL/TLS.
Before you enable a Kafka connection to use SASL authentication on SSL/TLS, make sure that you have performed all necessary prerequisite tasks.
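Conceptually, this option combines the SASL and SSL/TLS client properties defined in the Kafka documentation, as in the following sketch; the truststore path and password are placeholders:

```properties
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.truststore.location=/etc/sdc/kafka.truststore.jks
ssl.truststore.password=<truststore-password>
```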
Enabling Custom Authentication
To specify security requirements with custom properties, enable the connection to use custom authentication.
With custom authentication, you specify custom properties that contain the information required by the security protocol rather than using the properties in the connection. For example, you can enable custom authentication and then configure custom properties required for the SASL_SSL security protocol rather than enabling SASL Authentication on SSL/TLS.
Before enabling custom authentication, complete any necessary prerequisites for the security methods you are using, as described in the Kafka documentation. For example, if using SSL/TLS to connect to Kafka, you must make sure Kafka is configured for SSL/TLS.
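As a sketch, the custom properties for the SASL_SSL scenario described above might use the standard Kafka client properties, including an inline `sasl.jaas.config` entry in place of a separate JAAS file; the credentials and truststore details are placeholders:

```properties
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<username>" password="<password>";
ssl.truststore.location=/etc/sdc/kafka.truststore.jks
ssl.truststore.password=<truststore-password>
```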