Apache Kafka connection

To access your data with Apache Kafka, create a connection asset for it.

Apache Kafka is a distributed event streaming platform. Connect to an Apache Kafka real-time processing server to write streams of events into topics and to read streams of events from them.

Supported versions

Apache Kafka versions 0.11 - 2.x

Create a connection to Apache Kafka

To create the connection asset, you need these connection details.

For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.

Kafka server hostname: Hostname and port number of the Kafka server, in the format hostname:port-number. To connect to a Kafka cluster, separate the values with commas: hostname1:port-number1,hostname2:port-number2,hostname3:port-number3. These servers are used only for the initial connection to discover the full cluster membership, which can change dynamically, so the list does not need to contain the full set of servers; after bootstrapping, the connection uses all the servers in the cluster regardless of which ones were specified. However, if the Kafka cluster has three hosts or fewer, include all the hosts in this list to prevent data loss.
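
For orientation, the value in this field corresponds to the bootstrap.servers property of a standard Kafka client. The following minimal Java sketch uses hypothetical host names and a hypothetical topic named orders; the connection asset sets these properties for you, so the sketch only illustrates how the hostname list is consumed.

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class BootstrapExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Comma-separated hostname:port-number pairs, as entered in the connection asset.
            props.put("bootstrap.servers",
                "host1.example.com:9092,host2.example.com:9092,host3.example.com:9092");
            props.put("group.id", "example-group");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // The bootstrap servers are contacted only to discover the full cluster;
                // afterward the client talks to whichever brokers host the subscribed partitions.
                consumer.subscribe(List.of("orders"));
            }
        }
    }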

Secure connection

Select the network authentication protocol that is set up on the Kafka server. None is the default. These are the other selections and their properties:

Kerberos

User principal name: The user principal that is authorized to access a Kafka server that is configured for Kerberos. The Kerberos administrator creates user principals in the Kerberos server. A user principal name has three components: Primary, Instance, and Realm; the Instance component is optional. For example, Kafka-user@example.com is a valid user principal name.

Keytab: The fully qualified path to the keytab file for the specified user. You must have permission to read the keytab file.

Important: The keytab file must be placed in storage that is mounted on the PX engine pods.

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require the keystore and truststore files to be present in storage instead of certificates in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter a value for Truststore certificates.
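
For comparison, these fields map roughly to the GSSAPI (Kerberos) settings of a standard Kafka Java client. The following sketch shows only the security-related properties and uses placeholder values for the principal, keytab path, CA certificate path, and Kafka service name; combine them with bootstrap.servers and the serializer or deserializer settings.

    import java.util.Properties;

    public class KerberosConfigSketch {
        public static Properties securityProps() {
            Properties props = new Properties();
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "GSSAPI");
            props.put("sasl.kerberos.service.name", "kafka");
            // User principal and keytab, as created by the Kerberos administrator.
            props.put("sasl.jaas.config",
                "com.sun.security.auth.module.Krb5LoginModule required "
                + "useKeyTab=true storeKey=true "
                + "keyTab=\"/mnt/secrets/kafka-user.keytab\" "
                + "principal=\"Kafka-user@example.com\";");
            // X.509 truststore certificates in PEM format.
            props.put("ssl.truststore.type", "PEM");
            props.put("ssl.truststore.location", "/mnt/secrets/kafka-ca.pem");
            return props;
        }
    }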

SASL_Plain

User principal name: The authenticated user in the Kafka server or cluster.

Password: Password for the user principal name.
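
These two fields map roughly to the SASL/PLAIN settings of a standard Kafka Java client. A minimal sketch of the security-related properties, with placeholder credentials and assuming no TLS (the SASL_SSL option below adds a truststore):

    import java.util.Properties;

    public class SaslPlainConfigSketch {
        public static Properties securityProps() {
            Properties props = new Properties();
            // SASL/PLAIN without TLS sends credentials in clear text,
            // so use it only on trusted networks.
            props.put("security.protocol", "SASL_PLAINTEXT");
            props.put("sasl.mechanism", "PLAIN");
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"kafka-user\" password=\"kafka-password\";");
            return props;
        }
    }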

SASL_SSL

User principal name: The authenticated user in the Kafka server or cluster.

Password: Password for the user principal name.

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require the keystore and truststore files to be present in storage instead of certificates in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter a value for Truststore certificates.
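
These fields map roughly to SASL/PLAIN over TLS in a standard Kafka Java client. A minimal sketch of the security-related properties, with placeholder credentials and certificate path:

    import java.util.Properties;

    public class SaslSslConfigSketch {
        public static Properties securityProps() {
            Properties props = new Properties();
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "PLAIN");
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"kafka-user\" password=\"kafka-password\";");
            // X.509 truststore certificates in PEM format, used to verify the broker certificates.
            props.put("ssl.truststore.type", "PEM");
            props.put("ssl.truststore.location", "/mnt/secrets/kafka-ca.pem");
            return props;
        }
    }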

SCRAM-SHA-256 or SCRAM-SHA-512

User principal name: The authenticated user in the Kafka server or cluster.

Password: Password for the user principal name.

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require the keystore and truststore files to be present in storage instead of certificates in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter a value for Truststore certificates.
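
These fields map roughly to the SCRAM settings of a standard Kafka Java client. A minimal sketch of the security-related properties, with placeholder credentials and certificate path:

    import java.util.Properties;

    public class ScramConfigSketch {
        public static Properties securityProps() {
            Properties props = new Properties();
            props.put("security.protocol", "SASL_SSL");
            // Use "SCRAM-SHA-256" instead if that is the mechanism enabled on the brokers.
            props.put("sasl.mechanism", "SCRAM-SHA-512");
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"kafka-user\" password=\"kafka-password\";");
            // X.509 truststore certificates in PEM format.
            props.put("ssl.truststore.type", "PEM");
            props.put("ssl.truststore.location", "/mnt/secrets/kafka-ca.pem");
            return props;
        }
    }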

SSL

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Key: Private key in PEM format. The key must use PKCS #8 syntax.

Key certificates chain: Certificate chain for the private key in PEM format. Only X.509 certificates are supported.

Key password: The password for the private key. This value is required if the key is encrypted.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require the keystore and truststore files to be present in storage instead of certificates in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter values for Truststore certificates, Key, or Key certificates chain.
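
These fields map roughly to mutual TLS in a Kafka Java client that supports PEM stores (Kafka 2.7 and later). A minimal sketch of the security-related properties, with placeholder file paths and key password:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Properties;

    public class MutualTlsConfigSketch {
        public static Properties securityProps() throws Exception {
            Properties props = new Properties();
            props.put("security.protocol", "SSL");
            // X.509 certificates in PEM format, used to verify the broker certificates.
            props.put("ssl.truststore.type", "PEM");
            props.put("ssl.truststore.certificates",
                Files.readString(Path.of("/mnt/secrets/kafka-ca.pem")));
            // Client key pair: PKCS #8 private key plus its certificate chain, both in PEM format.
            props.put("ssl.keystore.type", "PEM");
            props.put("ssl.keystore.key", Files.readString(Path.of("/mnt/secrets/client-key.pem")));
            props.put("ssl.keystore.certificate.chain",
                Files.readString(Path.of("/mnt/secrets/client-chain.pem")));
            // Needed only when the private key is encrypted.
            props.put("ssl.key.password", "key-password");
            return props;
        }
    }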

Message format

A schema registry is third-party software that manages message schemas and maps them to topics, so that producers know which types (schemas) of messages each topic accepts and consumers know how to read and parse the messages in a topic. If you select Use Schema Registry for the message format, you can specify these additional details to connect securely to the schema registry service.

Prerequisite

Set up the schema registry for your Kafka server with Confluent.
Confluent versions 6.x and 7.x are supported.

Schema Registry URL: URL to the schema registry service.

Authentication

Select the authentication method for the schema registry service. None is the default. These are the other selections and their properties:

Secure connection

Select the secure network authentication protocol that is used to connect to the schema registry service. None is the default. These are the other selections and their properties:

Schema Registry type

Select Confluent.
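
For orientation, a producer that uses a Confluent schema registry typically points its serializer at the registry URL and supplies credentials in the same client properties. A minimal sketch, assuming the Confluent kafka-avro-serializer dependency and placeholder URL, credentials, and host name:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SchemaRegistryProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "host1.example.com:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            // The Confluent Avro serializer registers and looks up schemas in the registry
            // so that consumers know how to parse the messages.
            props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "https://schema-registry.example.com:8081");
            // Basic authentication to the schema registry service, if it is secured.
            props.put("basic.auth.credentials.source", "USER_INFO");
            props.put("basic.auth.user.info", "registry-user:registry-password");
            try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
                // Avro records (for example, GenericRecord instances) would be sent here.
            }
        }
    }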

Choose the method for creating a connection based on where you are in the platform

In a project
Click Assets > New asset > Data access tools > Connection. See Adding a connection to a project.


In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.


In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use the Apache Kafka connection in the following workspaces and tools:

Analytics projects

Catalogs

Apache Kafka setup

Known issue

Learn more

Kafka documentation

Parent topic: Supported connections