Kafka Resource Configuration
Before you configure your scanner, make sure you meet the prerequisites. Read our guide on Kafka integration requirements to double-check.
Kafka Source System Properties
This configuration can be set up by creating a new connection on the Admin UI > Connections tab or by editing an existing connection in Admin UI / Connections / Data Integration Tools / Kafka / specific connection. A new connection can also be created via the Manta Orchestration API.
One Automatic Data Lineage connection for Kafka corresponds to one Kafka cluster that will be analyzed.
| Property name | Description | Example |
|---|---|---|
| kafka.dictionary.id | Name of the resource representing this Kafka cluster, known as the dictionary ID. It is used as the name of the cluster and as the input subdirectory name for manually provided files for that cluster. | kafka |
| kafka.extractor.brokerList | Comma-separated list of broker addresses for the given Kafka cluster. Each broker is in the format address:port. | |
| kafka.extractor.schemaRegistryEnabled | Set to true to enable extraction from Schema Registry; otherwise, set to false. | |
| kafka.schemaRegistry.type | Type of Schema Registry. The types currently supported are Confluent and Cloudera. | |
| kafka.schemaRegistry.address | URL of the Schema Registry server. Only considered for extraction from Schema Registry. | |
| kafka.schemaRegistry.port | Port of the Schema Registry server. Only considered for extraction from Schema Registry. | |
| kafka.schemaRegistry.basicAuthEnabled | Set to true if the Schema Registry server requires HTTP basic authentication; otherwise, set to false. Only considered for extraction from Schema Registry. | |
| kafka.schemaRegistry.username | User name used for Schema Registry basic authentication. Only considered when basic authentication is enabled. | admin |
| kafka.schemaRegistry.password | Password used for Schema Registry basic authentication. Only considered when basic authentication is enabled. | admin |
| kafka.schemaRegistry.scheme | Scheme of the Schema Registry server used for extraction. The default value is http. | http, https |
| kafka.schemaRegistry.include.topics | List of topics to be extracted from Schema Registry. Regular expressions can be used to list the topics. | |
| kafka.schemaRegistry.exclude.topics | List of topics excluded from Schema Registry extraction. Regular expressions can be used to list the topics. | |
| kafka.schemaRegistry.namingStrategy | Naming strategy for mapping topics to subjects. The strategies currently supported are Topic Name, Simple Topic Name, and Custom Name (described in the Naming Strategies section). | |
| kafka.input.encoding | Encoding used for manually provided inputs. The default value is UTF-8. See Encodings for applicable values. | UTF-8 |
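As an illustration of the address:port format expected by kafka.extractor.brokerList, the following is a minimal Python sketch that splits such a value into address and port pairs. The function name parse_broker_list and the sample host names are hypothetical and are not part of the product; the scanner's own validation rules may differ.

```python
def parse_broker_list(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated broker list into (address, port) pairs.

    Hypothetical helper for illustration only; it mirrors the documented
    "address:port, separated by commas" format.
    """
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas or stray whitespace
        # Split on the last colon so that the port is always the final part.
        address, sep, port = entry.rpartition(":")
        if not sep or not address or not port.isdigit():
            raise ValueError(f"expected address:port, got {entry!r}")
        brokers.append((address, int(port)))
    return brokers


# Sample value with made-up hosts
brokers = parse_broker_list("broker1.example.com:9092, 192.168.0.16:9093")
```

An entry without a port, such as broker1, raises a ValueError in this sketch, which makes a malformed list easy to spot before it is saved in the connection.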
Common Properties
This configuration is common to all Kafka source systems and all Kafka scenarios, and is configured in Admin UI / Configuration / CLI / Kafka / Kafka Common. It can be overridden at the individual connection level.
| Property name | Description | Example |
|---|---|---|
| kafka.dictionary.dir | Directory with data dictionaries extracted from Kafka | ${manta.dir.temp}/kafka |
| filepath.lowercase | Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise) | true |
| kafka.input.dir | A directory with extracted schemas from Schema Registry | ${manta.dir.temp}/kafka/${kafka.dictionary.id} |
| kafka.manualInput.dir | A directory with manually provided schemas | ${manta.dir.input}/kafka/${kafka.dictionary.id} |
| kafka.dictionary.mappingFile | Path to automatically generated mappings for Kafka databases | ${manta.dir.temp}/kafka/kafkaDictionaryMantaMapping.csv |
| kafka.dictionary.mappingManualFile | Path to manually provided mappings for Kafka databases | ${manta.dir.scenario}/conf/kafkaDictionaryMantaMappingManual.csv |
| kafka.schemaRegistry.customNamingStrategyFile | Path to manually provided mappings for custom naming strategy | ${manta.dir.input}/kafka/${kafka.dictionary.id}/kafkaCustomNamingStrategy.csv |
| kafka.extractor.kerberos.toggle (as of 42.13) | If set to true, requests to Kafka Schema Registry APIs are sent using Kerberos authentication. In that case, Kerberos details such as the keytab and krb5.conf are required. | false |
| kafka.extractor.kerberos.keytab (as of 42.13) | The name of the Kerberos keytab file with the associated user or service principal. Add this file to the connection input folder: ${manta.dir.input}/kafka/${kafka.dictionary.id}/ | principalName.keytab |
| kafka.extractor.kerberos.krb5 (as of 42.13) | The name of the krb5.conf file with the Kerberos configuration details. Add this file to the connection input folder: ${manta.dir.input}/kafka/${kafka.dictionary.id}/ | krb5.conf |
| kafka.extractor.kerberos.principal (as of 42.13) | The name of the user or service principal that is used for the Kerberos authentication. | principal/subdomain@DOMAIN.COM |
Manual Mapping Properties
It is possible to manually configure mappings for Kafka clusters. Each mapping has its own row with the following parts separated by semicolons.
| Property name | Description | Example |
|---|---|---|
| Dictionary ID | Name of a resource representing this Kafka server known as the dictionary ID | kafka |
| Broker URL | Broker URL in the format address:port belonging to the Kafka cluster. Note that for a multi-broker cluster, each broker has to be on a separate line in the dictionary mappings. | 192.168.0.16:9092 |
| Connection ID | External Kafka connection ID in third-party tools; can be left empty | broker |
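To make the row format concrete, here is a minimal Python sketch that reads semicolon-separated mapping rows and groups the brokers by dictionary ID. It assumes the file has no header row and that the columns appear in the order listed in the table above; neither assumption is confirmed here. The function name read_manual_mappings and the second broker address are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_manual_mappings(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (broker URL, connection ID) pairs by dictionary ID.

    Assumes rows of the form: Dictionary ID;Broker URL;Connection ID
    with no header row (an assumption made for this sketch).
    """
    clusters = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected at least dictionary ID and broker URL: {row!r}")
        dictionary_id, broker_url = row[0].strip(), row[1].strip()
        # The connection ID may be left empty.
        connection_id = row[2].strip() if len(row) > 2 else ""
        clusters[dictionary_id].append((broker_url, connection_id))
    return dict(clusters)


# A two-broker cluster: one row per broker, same dictionary ID.
sample = "kafka;192.168.0.16:9092;broker\nkafka;192.168.0.17:9092;\n"
mappings = read_manual_mappings(sample)
```

In this sample, both rows share the dictionary ID kafka, which reflects the rule that each broker of a multi-broker cluster goes on its own line.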
Include/Exclude Topics
The properties kafka.schemaRegistry.include.topics and kafka.schemaRegistry.exclude.topics in the connection configuration select the topics to include in, or exclude from, the Schema Registry extraction. Leaving the include topics property empty extracts all topics in Schema Registry.
Include Topics Example
# extracts only topics with the names example_topic and example_topic_2 if they are present in Schema Registry
kafka.schemaRegistry.include.topics=example_topic,example_topic_2
kafka.schemaRegistry.exclude.topics=
Include Topics with a Regular Expression
It is also possible to list topics with a regular expression.
# extracts only topics with the names a-topic, b-topic, or c-topic if they are present in Schema Registry
kafka.schemaRegistry.include.topics=[a-c]-topic
kafka.schemaRegistry.exclude.topics=
Exclude Topics Example
# extracts all topics except example_topic and example_topic_2
kafka.schemaRegistry.include.topics=
kafka.schemaRegistry.exclude.topics=example_topic,example_topic_2
Exclude Topics with a Regular Expression
# extracts all topics except topics with the names a-topic, b-topic and c-topic
kafka.schemaRegistry.include.topics=
kafka.schemaRegistry.exclude.topics=[a-c]-topic
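The selection behavior in the examples above can be approximated with the following Python sketch. It is not the scanner's implementation. It assumes that each comma-separated entry is a regular expression matched against the whole topic name, that an empty include list selects every topic, and that exclusions are applied after inclusions. The function names are hypothetical.

```python
import re


def split_patterns(value: str) -> list[str]:
    """Split a comma-separated property value into non-empty patterns.

    Note: splitting on commas means a pattern that itself contains a comma
    (for example a{1,2}) is not handled by this sketch.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


def select_topics(topics: list[str], include: str = "", exclude: str = "") -> list[str]:
    """Return the topics kept by the include/exclude settings (assumed semantics)."""
    include_patterns = split_patterns(include)
    exclude_patterns = split_patterns(exclude)
    selected = []
    for topic in topics:
        # An empty include list means "all topics".
        included = not include_patterns or any(
            re.fullmatch(pattern, topic) for pattern in include_patterns
        )
        excluded = any(re.fullmatch(pattern, topic) for pattern in exclude_patterns)
        if included and not excluded:
            selected.append(topic)
    return selected


topics = ["a-topic", "b-topic", "d-topic", "example_topic"]
only_a_to_c = select_topics(topics, include="[a-c]-topic")
all_but_a_to_c = select_topics(topics, exclude="[a-c]-topic")
```

With the sample list, the include pattern keeps a-topic and b-topic, while the same pattern used as an exclusion keeps d-topic and example_topic.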
Naming Strategies
The Kafka API does not have information on how subjects correspond to topics. As such, we need a way to match them ourselves. We do this by naming both according to the following strategies.
Topic Name
The subject has the same name as the topic, but with “-value” at the end.
For example, a topic named “events” will correspond to a subject named “events-value”.
Simple Topic Name
The subject name is exactly the same as the topic.
For a topic named “events”, the subject would be “events” as well.
Custom Name
Subject names are mapped to topic names by a CSV file whose path is set by kafka.schemaRegistry.customNamingStrategyFile in Common Properties, so any custom names may be used as long as they are listed in the file.
The CSV file consists of a header row with “Topic name” and “Subject name”, followed by one entry per row.
The example below will match a topic named topic1 to a subject named subject1 and the topic someName to the subject someOtherName.
"Topic name";"Subject name"
topic1;subject1
someName;someOtherName
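The three strategies can be summarized in a short Python sketch. It is only an illustration of the mapping rules described in this section: the strategy strings used here are the display names from this page, not necessarily the values accepted by kafka.schemaRegistry.namingStrategy, and the function names are hypothetical.

```python
import csv
import io
from typing import Optional


def load_custom_names(csv_text: str) -> dict[str, str]:
    """Read topic-to-subject pairs from the semicolon-separated CSV format."""
    rows = csv.reader(io.StringIO(csv_text), delimiter=";")
    next(rows, None)  # skip the "Topic name";"Subject name" header row
    return {row[0]: row[1] for row in rows if len(row) >= 2}


def subject_for(topic: str, strategy: str,
                custom_names: Optional[dict[str, str]] = None) -> Optional[str]:
    """Return the subject name expected for a topic under a given strategy."""
    if strategy == "Topic Name":
        return f"{topic}-value"  # topic name with "-value" appended
    if strategy == "Simple Topic Name":
        return topic  # subject is identical to the topic
    if strategy == "Custom Name":
        # None means the topic has no entry in the custom mapping file.
        return (custom_names or {}).get(topic)
    raise ValueError(f"unknown naming strategy: {strategy!r}")


custom = load_custom_names(
    '"Topic name";"Subject name"\ntopic1;subject1\nsomeName;someOtherName\n'
)
```

For example, subject_for("events", "Topic Name") yields events-value, and subject_for("someName", "Custom Name", custom) yields someOtherName, which matches the examples given for each strategy.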