Apache Kafka connection

To access your data with Apache Kafka, create a connection asset for it.

Apache Kafka is a distributed event streaming platform. Connect to an Apache Kafka real-time processing server to write streams of events into topics and to read streams of events from topics.

Supported versions

Apache Kafka versions 0.11 - 2.x

Create a connection to Apache Kafka

To create the connection asset, you need these connection details.

For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.

Kafka server hostname: The hostname and port number of the Kafka server, in the format hostname:port-number. To connect to a Kafka cluster, separate the values with commas: hostname1:port-number1,hostname2:port-number2,hostname3:port-number3. If you connect to a cluster, the connection uses all the servers in the cluster, regardless of which servers you specify for bootstrapping. Because these servers are used only for the initial connection to discover the full cluster membership, which can change dynamically, the list does not need to contain the full set of servers. However, if the Kafka cluster has three or fewer hosts, include all of them in this list to prevent data loss.
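
The same comma-separated bootstrap list is what the standard Apache Kafka Java client expects in its bootstrap.servers property. The following minimal sketch shows the format with hypothetical broker hostnames and a hypothetical topic name; it is for reference only and is not part of the connection form.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BootstrapServersSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Comma-separated bootstrap list; the client discovers the rest of the
            // cluster from these brokers. Hostnames here are placeholders.
            props.put("bootstrap.servers",
                    "broker1.example.com:9092,broker2.example.com:9092,broker3.example.com:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Write one event into the hypothetical "events" topic.
                producer.send(new ProducerRecord<>("events", "example-key", "example-value"));
            }
        }
    }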

Secure connection

Select the network authentication protocol that is set up on the Kafka server. None is the default. These are the other selections and their properties:

Kerberos

Prerequisite for Kerberos authentication

To use Kerberos authentication, the data source must be configured for Kerberos and the service that you plan to use the connection in must support Kerberos. For information, see Enabling platform connections to use Kerberos authentication.

User principal name: The user principal that is configured to access a Kafka server that is set up for Kerberos. The Kerberos administrator creates user principals in the Kerberos server. A user principal name has three components: Primary, Instance, and Realm. The Instance component is optional. For example, Kafka-user@example.com is a valid user principal name that omits the optional Instance component.

Keytab: The fully qualified path to the keytab file for the specified user. You must have permission to read the keytab file.

Important: The keytab file must be placed in storage that is mounted on the PX engine pods.

Alternatively, you can enter the keytab content in Base64-encoded format.
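
If you paste the keytab content instead of providing a path, the content must be Base64-encoded first. A minimal sketch in Java, assuming a hypothetical keytab path:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Base64;

    public class KeytabToBase64 {
        public static void main(String[] args) throws Exception {
            // Hypothetical path; replace with the location of your keytab file.
            byte[] keytab = Files.readAllBytes(Paths.get("/path/to/kafka-user.keytab"));
            // Prints the Base64 string that can be pasted into the Keytab field.
            System.out.println(Base64.getEncoder().encodeToString(keytab));
        }
    }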

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require a truststore file in storage instead of certificates in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter a value for Truststore certificates.

  • Truststore location: Location of the truststore file, for example, /opt/kafka/certs/client.truststore.jks.
  • Truststore password: Password for the truststore file.
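
For reference, the Kerberos fields above correspond roughly to the standard Apache Kafka Java client settings for the GSSAPI SASL mechanism. The sketch below assumes a hypothetical broker, principal, keytab path, and truststore; adjust the values and the security protocol (SASL_SSL or SASL_PLAINTEXT) to match your server.

    import java.util.Properties;

    public class KerberosClientConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:9093"); // placeholder
            // Kerberos (GSSAPI) over TLS; use SASL_PLAINTEXT if the listener has no TLS.
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "GSSAPI");
            // Service name that the broker principals were created with (kafka/<host>@REALM).
            props.put("sasl.kerberos.service.name", "kafka");
            // User principal and keytab, matching the connection fields described above.
            props.put("sasl.jaas.config",
                    "com.sun.security.auth.module.Krb5LoginModule required "
                  + "useKeyTab=true storeKey=true "
                  + "keyTab=\"/path/to/kafka-user.keytab\" "
                  + "principal=\"Kafka-user@example.com\";");
            // Truststore that contains the certificate authority for the broker certificates.
            props.put("ssl.truststore.location", "/opt/kafka/certs/client.truststore.jks");
            props.put("ssl.truststore.password", "truststore-password"); // placeholder
        }
    }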

SASL_Plain

User principal name: The authenticated user in the Kafka server or cluster.

Password: Password for the user principal name.
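
These SASL_Plain fields map to the PLAIN SASL mechanism of the standard Apache Kafka Java client. A minimal sketch with placeholder broker, username, and password:

    import java.util.Properties;

    public class SaslPlainConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:9092"); // placeholder
            props.put("security.protocol", "SASL_PLAINTEXT");
            props.put("sasl.mechanism", "PLAIN");
            // User principal name and password from the connection fields above.
            props.put("sasl.jaas.config",
                    "org.apache.kafka.common.security.plain.PlainLoginModule required "
                  + "username=\"kafka-user\" password=\"kafka-password\";");
        }
    }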

SASL_SSL

User principal name: The authenticated user in the Kafka server or cluster.

Password: Password for the user principal name.

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require a truststore file in storage instead of certificates in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter a value for Truststore certificates.

  • Truststore location: Location of the truststore file, for example, /opt/kafka/certs/client.truststore.jks.
  • Truststore password: Password for the truststore file.
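
SASL_SSL combines the PLAIN credentials with a TLS connection to the broker. A minimal sketch for the standard Apache Kafka Java client, assuming a hypothetical broker and a JKS truststore file (the connection form itself accepts the certificates in PEM format):

    import java.util.Properties;

    public class SaslSslConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:9093"); // placeholder
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "PLAIN");
            props.put("sasl.jaas.config",
                    "org.apache.kafka.common.security.plain.PlainLoginModule required "
                  + "username=\"kafka-user\" password=\"kafka-password\";");
            // Truststore that contains the certificate authority for the broker certificates.
            props.put("ssl.truststore.location", "/opt/kafka/certs/client.truststore.jks");
            props.put("ssl.truststore.password", "truststore-password"); // placeholder
        }
    }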

SCRAM-SHA-256 or SCRAM-SHA-512

User principal name: The authenticated user in the Kafka server or cluster.

Password: Password for the user principal name.

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require a truststore file in storage instead of certificates in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter a value for Truststore certificates.

  • Truststore location: Location of the truststore file, for example, /opt/kafka/certs/client.truststore.jks.
  • Truststore password: Password for the truststore file.
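
These options map to the SCRAM-SHA-256 and SCRAM-SHA-512 SASL mechanisms of the standard Apache Kafka Java client. A minimal sketch with placeholder values:

    import java.util.Properties;

    public class ScramConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:9093"); // placeholder
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "SCRAM-SHA-512"); // or SCRAM-SHA-256
            props.put("sasl.jaas.config",
                    "org.apache.kafka.common.security.scram.ScramLoginModule required "
                  + "username=\"kafka-user\" password=\"kafka-password\";");
            // Truststore that contains the certificate authority for the broker certificates.
            props.put("ssl.truststore.location", "/opt/kafka/certs/client.truststore.jks");
            props.put("ssl.truststore.password", "truststore-password"); // placeholder
        }
    }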

SSL

Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.

Key: Private key in PEM format. The key must use PKCS #8 syntax.

Key certificates chain: Certificate chain for the private key in PEM format. Only X.509 certificates are supported.

Key password: This value is required if the key is encrypted.

Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require truststore and keystore files in storage instead of certificates and keys in PEM format.

Important: If you enter values for the legacy keystore configuration, do not enter values for Truststore certificates, Key, or Key certificates chain.

  • Truststore location: Location of the truststore file, for example, /opt/kafka/certs/client.truststore.jks.
  • Truststore password: Password for the truststore file.
  • Keystore location: Location of the keystore file, for example, /opt/kafka/certs/client.keystore.jks.
  • Keystore password: Password for the keystore file.
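
The SSL option corresponds to mutual TLS in the standard Apache Kafka Java client: the truststore validates the broker certificates, and the keystore (or the PEM key and certificate chain) identifies the client. A minimal sketch that uses the legacy JKS file style, with placeholder paths and passwords:

    import java.util.Properties;

    public class MutualTlsConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:9093"); // placeholder
            props.put("security.protocol", "SSL");
            // Truststore that contains the certificate authority for the broker certificates.
            props.put("ssl.truststore.location", "/opt/kafka/certs/client.truststore.jks");
            props.put("ssl.truststore.password", "truststore-password"); // placeholder
            // Keystore that contains the client private key and its certificate chain.
            props.put("ssl.keystore.location", "/opt/kafka/certs/client.keystore.jks");
            props.put("ssl.keystore.password", "keystore-password"); // placeholder
            // Required only if the private key is encrypted.
            props.put("ssl.key.password", "key-password"); // placeholder
        }
    }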

Message format

A schema registry is third-party software that manages message schemas and maps them to topics, so that producers know which schemas each topic accepts and consumers know how to read and parse the messages in a topic. If you select Use Schema Registry for the message format, you can specify these additional details to securely connect to the schema registry service.

Prerequisite

Set up the schema registry for your Kafka server with Confluent.
Confluent versions 6.x and 7.x are supported.

Schema Registry URL: URL to the schema registry service.

Authentication

Select the authentication method to the schema registry service. None is the default. These are the other selections and their properties:

  • Use Kafka server SASL user credentials: You can choose this selection if you entered properties for SASL_Plain or SASL_SSL for the secure connection to the Kafka server. The username and password from the SASL security settings are used for authentication to the schema registry service.

  • User credentials: Username and password to the schema registry service.

Secure connection

Select the secure network authentication protocol for the connection to the schema registry service. None is the default. These are the other selections and their properties:

  • Use Kafka server SSL user credentials: You can choose this selection if you entered properties for SSL for the secure connection to the Kafka server. The certificates configuration from the Kafka server connection is used for the secure connection to the schema registry service.

  • SSL

    • Truststore certificates: Truststore certificates in PEM format. Only X.509 certificates are supported.
    • Key: Private key in PEM format. The key must use PKCS #8 syntax.
    • Key certificates chain: Certificate chain for the private key in PEM format. Only X.509 certificates are supported.
    • Key password: This value is required if the key is encrypted.

      Select Use legacy keystore configuration to run jobs that were set up for the traditional version of DataStage. These fields are for DataStage jobs that require truststore and keystore files in storage instead of certificates and keys in PEM format.

    Important: If you enter values for the legacy keystore configuration, do not enter values for Truststore certificates, Key, Key certificates chain, or Key password.

    • Truststore location: Location of the schema registry truststore file.
    • Truststore password: Password for the schema registry truststore file.
    • Keystore location: Location of the schema registry keystore file, for example, /opt/kafka/certs/client.keystore.jks.
    • Keystore password: Password for the schema registry keystore file.

Schema Registry type

Select Confluent.
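
For reference, the schema registry fields above correspond roughly to the configuration that Confluent's Kafka serializers and deserializers accept. The sketch below uses a hypothetical registry URL, credentials, and truststore; it shows the Confluent client property names, not fields of the connection form.

    import java.util.Properties;

    public class SchemaRegistryConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // URL of the Confluent schema registry service (placeholder).
            props.put("schema.registry.url", "https://schema-registry.example.com:8081");
            // User credentials authentication to the schema registry service.
            props.put("basic.auth.credentials.source", "USER_INFO");
            props.put("basic.auth.user.info", "registry-user:registry-password"); // placeholder
            // Truststore for the HTTPS connection to the schema registry.
            props.put("schema.registry.ssl.truststore.location", "/opt/kafka/certs/client.truststore.jks");
            props.put("schema.registry.ssl.truststore.password", "truststore-password"); // placeholder
            // A producer would combine these settings with the Kafka client settings above
            // and a Confluent serializer such as io.confluent.kafka.serializers.KafkaAvroSerializer.
        }
    }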

Choose the method for creating a connection based on where you are in the platform

In a project
Click Assets > New asset > Data access tools > Connection. See Adding a connection to a project.

In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.

In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use the Apache Kafka connection in the following workspaces and tools:

Projects

Catalogs

  • Platform assets catalog
  • Other catalogs (Watson Knowledge Catalog)
    Note: Preview, profile, and masking do not work for this connection in Watson Knowledge Catalog.

Apache Kafka setup

Known issue

Learn more

Kafka documentation

Parent topic: Supported connections