Kafka Resource Configuration

The following are the prerequisites necessary for IBM Automatic Data Lineage to connect to this third-party system, which you may choose to do at your sole discretion. Please note that while these are usually sufficient to connect to this third-party system, we cannot guarantee the success of the connection or integration since we have no control, liability, or responsibility for third-party products or services, including for their performance.

Supported Features

This is a list of the Kafka Schema Registry features that Automatic Data Lineage supports. The list is a high-level overview, so a named item may include features that are not explicitly named. Features that are not listed should generally be considered unsupported.

Known Unsupported Features

Automatic Data Lineage does not support the following Kafka Schema Registry features. This list includes all of the features that IBM is aware are unsupported, but it might not be comprehensive.

Kafka Source System Properties

This configuration can be set up by creating a new connection on the Admin UI > Connections tab or by editing an existing connection in Admin UI / Connections / Data Integration Tools / Kafka / specific connection. A new connection can also be created via the Manta Orchestration API.

One Automatic Data Lineage connection for Kafka corresponds to one Kafka cluster that will be analyzed.

Property name | Description | Example
kafka.dictionary.id | Name of the resource representing this Kafka cluster, known as the dictionary ID. It is used as the name of the cluster and as the input subdirectory name for manually provided files for that cluster. | kafka
kafka.extractor.brokerList | Comma-separated list of broker addresses for the given Kafka cluster. Each broker is in the format address:port. | 192.168.0.16:9092,192.168.0.16:9093 or prod.getmanta.com:9092,prod.getmanta.com:9093
kafka.extractor.schemaRegistryEnabled | Set to true to enable extraction from Schema Registry; otherwise, set to false. | true or false
kafka.schemaRegistry.type | Type of Schema Registry. The types currently supported are Confluent and Cloudera. | Confluent or Cloudera
kafka.schemaRegistry.address | Address (host name or IP address) of the Schema Registry server. Used only for extraction from Schema Registry. | 172.30.0.75 or prod.getmanta.com
kafka.schemaRegistry.port | Port of the Schema Registry server. Used only for extraction from Schema Registry. | 8081
kafka.schemaRegistry.basicAuthEnabled | Set to true if the Schema Registry server requires HTTP basic authentication; otherwise, set to false. Used only for extraction from Schema Registry. | true or false
kafka.schemaRegistry.username | User name for Schema Registry basic authentication. Used only when basic authentication is enabled. | admin
kafka.schemaRegistry.password | Password for Schema Registry basic authentication. Used only when basic authentication is enabled. | admin
kafka.schemaRegistry.scheme | Scheme of the Schema Registry server used for extraction. The default value is http. | http or https
kafka.schemaRegistry.include.topics | Comma-separated list of topics to be extracted from Schema Registry. Regular expressions can be used to list the topics. | views,orders,[a-c]-topic
kafka.schemaRegistry.exclude.topics | Comma-separated list of topics excluded from Schema Registry extraction. Regular expressions can be used to list the topics. | views,orders,[a-c]-topic
kafka.schemaRegistry.namingStrategy | Naming strategy for mapping topics to subjects. The strategies currently supported are Topic Name, Simple Topic Name, and Custom Name (described in the Naming Strategies section). | Topic name, Simple topic name, or Custom name
kafka.input.encoding | Encoding used for manually provided inputs. The default value is UTF-8. See Encodings for applicable values. | UTF-8
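
To illustrate how the broker list and the Schema Registry scheme, address, and port values relate to each other, the following Python sketch splits a brokerList value into host and port pairs and assembles a base URL. The function names and the way the URL is composed are assumptions made for this example only; they do not describe how Automatic Data Lineage processes these properties internally.

```python
def parse_broker_list(value):
    """Split a comma-separated list of address:port entries into (host, port) pairs."""
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected address:port, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


def schema_registry_url(address, port, scheme="http"):
    """Compose a base URL from scheme, address, and port (assumed layout)."""
    return f"{scheme}://{address}:{port}"
```

A check like this can catch a missing port in a broker entry before the connection is saved.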

Common Properties

This configuration is common to all Kafka source systems and all Kafka scenarios, and is configured in Admin UI / Configuration / CLI / Kafka / Kafka Common. It can be overridden at the individual connection level.

Property name | Description | Example
kafka.dictionary.dir | Directory with data dictionaries extracted from Kafka | ${manta.dir.temp}/kafka
filepath.lowercase | Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise) | true
kafka.input.dir | A directory with extracted schemas from Schema Registry | ${manta.dir.temp}/kafka/${kafka.dictionary.id}
kafka.manualInput.dir | A directory with manually provided schemas | ${manta.dir.input}/kafka/${kafka.dictionary.id}
kafka.dictionary.mappingFile | Path to automatically generated mappings for Kafka databases | ${manta.dir.temp}/kafka/kafkaDictionaryMantaMapping.csv
kafka.dictionary.mappingManualFile | Path to manually provided mappings for Kafka databases | ${manta.dir.scenario}/conf/kafkaDictionaryMantaMappingManual.csv
kafka.schemaRegistry.customNamingStrategyFile | Path to manually provided mappings for custom naming strategy | ${manta.dir.input}/kafka/${kafka.dictionary.id}/kafkaCustomNamingStrategy.csv
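
The example values above contain ${...} placeholders that refer to other properties. The Python sketch below shows one way such placeholders could be expanded by plain textual substitution. This is an assumption about the general idea, not a description of the product's resolution logic, and the directory /opt/manta/temp in the usage note is a hypothetical value.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value, properties):
    """Replace ${name} placeholders with values from `properties`.

    Substitution repeats until the value stops changing, so a property
    may itself refer to other properties. Unknown names are left as-is.
    """
    for _ in range(10):  # guard against circular references
        expanded = PLACEHOLDER.sub(
            lambda m: properties.get(m.group(1), m.group(0)), value
        )
        if expanded == value:
            break
        value = expanded
    return value
```

For instance, with manta.dir.temp set to the hypothetical /opt/manta/temp and kafka.dictionary.id set to kafka, the kafka.input.dir example would expand to /opt/manta/temp/kafka/kafka.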

Manual Mapping Properties

It is possible to manually configure mappings for Kafka clusters. Each mapping has its own row with the following parts separated by semicolons.

Property name | Description | Example
Dictionary ID | Name of a resource representing this Kafka server, known as the dictionary ID | kafka
Broker URL | Broker URL in the format address:port belonging to the Kafka cluster. Note that for a multi-broker cluster, each broker has to be on a separate line in the dictionary mappings. | 192.168.0.16:9092
Connection ID | External Kafka connection ID in third-party tools; can be left empty | broker
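
As a sketch of the row layout described above, the following Python snippet builds one semicolon-separated row per broker of a multi-broker cluster. It assumes the parts appear in the order they are listed (Dictionary ID, Broker URL, Connection ID); whether the file also expects a header row or quoting is not stated here, so none is produced. The function name is hypothetical.

```python
def mapping_rows(dictionary_id, brokers, connection_id=""):
    """Return one 'Dictionary ID;Broker URL;Connection ID' row per broker."""
    return [f"{dictionary_id};{broker};{connection_id}" for broker in brokers]


# Two brokers of the same cluster produce two rows with the same dictionary ID.
rows = mapping_rows("kafka", ["192.168.0.16:9092", "192.168.0.16:9093"], "broker")
print("\n".join(rows))
```

Leaving connection_id empty yields a row that ends with a semicolon, which corresponds to an empty Connection ID.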

Include/Exclude Topics

The properties kafka.schemaRegistry.include.topics and kafka.schemaRegistry.exclude.topics in the connection configuration select the topics to include in, or exclude from, the Schema Registry extraction. Leaving the include topics property empty extracts all topics in Schema Registry.

Include Topics Example

# extracts only topics with the names example_topic and example_topic_2 if they are present in Schema Registry
kafka.schemaRegistry.include.topics=example_topic,example_topic_2
kafka.schemaRegistry.exclude.topics=

Include Topics with a Regular Expression

It is also possible to list topics with a regular expression.

# extracts only topics with the names a-topic, b-topic, or c-topic if they are present in Schema Registry
kafka.schemaRegistry.include.topics=[a-c]-topic
kafka.schemaRegistry.exclude.topics=

Exclude Topics Example

# extracts all topics except example_topic and example_topic_2
kafka.schemaRegistry.include.topics=
kafka.schemaRegistry.exclude.topics=example_topic,example_topic_2

Exclude Topics with a Regular Expression

# extracts all topics except topics with the names a-topic, b-topic and c-topic
kafka.schemaRegistry.include.topics=
kafka.schemaRegistry.exclude.topics=[a-c]-topic
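
The four examples above can be tried out with a small Python sketch that filters a list of topic names. It rests on several assumptions that this page does not confirm: values are split on commas, each pattern must match the whole topic name, and an exclude match takes precedence over an include match. The function names are hypothetical.

```python
import re


def split_patterns(value):
    """Split a comma-separated property value into non-empty patterns."""
    return [p.strip() for p in value.split(",") if p.strip()]


def select_topics(topics, include="", exclude=""):
    """Keep topics matching any include pattern (all topics if the include
    list is empty) that do not match any exclude pattern."""
    include_patterns = split_patterns(include)
    exclude_patterns = split_patterns(exclude)

    def matches(topic, patterns):
        # fullmatch: the pattern has to cover the entire topic name.
        return any(re.fullmatch(p, topic) for p in patterns)

    return [
        t for t in topics
        if (not include_patterns or matches(t, include_patterns))
        and not matches(t, exclude_patterns)
    ]
```

Because the sketch splits on commas, a regular expression that itself contains a comma (for example a quantifier such as {1,3}) would be broken apart; how the product treats such patterns is not covered here.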

Naming Strategies

The Kafka API does not provide information on how subjects correspond to topics, so the two have to be matched by name. Matching follows one of the naming strategies below.

Topic Name

The subject has the same name as the topic, but with “-value” at the end.

For example, a topic named “events” will correspond to a subject named “events-value”.

Simple Topic Name

The subject name is exactly the same as the topic.

For a topic named “events”, the subject would be “events” as well.
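
The two strategies above can be summarized in a short Python sketch. The strategy strings follow the example values of kafka.schemaRegistry.namingStrategy; the function name is hypothetical, and the snippet only restates the rules described here rather than the product's implementation.

```python
def subject_for_topic(topic, strategy):
    """Return the subject name expected for a topic under the given strategy."""
    if strategy == "Topic name":
        # The subject is the topic name with "-value" appended.
        return f"{topic}-value"
    if strategy == "Simple topic name":
        # The subject is identical to the topic name.
        return topic
    raise ValueError(f"unsupported strategy: {strategy!r}")
```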

Custom Name

Subject names are mapped to topic names by a CSV file, whose path is set by the property kafka.schemaRegistry.customNamingStrategyFile in Common Properties. Any custom names may be used as long as they are specified in this file.

The CSV file consists of a header row with the columns “Topic name” and “Subject name”, followed by one row per mapping. Values are separated by semicolons.

The example below will match a topic named topic1 to a subject named subject1 and the topic someName to the subject someOtherName.

"Topic name";"Subject name"
topic1;subject1
someName;someOtherName
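
To check a custom naming file before using it, a mapping like the one above can be read with Python's standard csv module. This is a minimal sketch that assumes semicolon delimiters and optional double quotes, as in the example; the function name is hypothetical.

```python
import csv
import io


def load_custom_mapping(text):
    """Parse semicolon-separated topic/subject rows into a dict keyed by topic."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    rows = [row for row in reader if row]  # skip blank lines
    header, entries = rows[0], rows[1:]
    if header != ["Topic name", "Subject name"]:
        raise ValueError(f"unexpected header: {header}")
    return {topic: subject for topic, subject in entries}


sample = '"Topic name";"Subject name"\ntopic1;subject1\nsomeName;someOtherName\n'
mapping = load_custom_mapping(sample)
```

Here mapping associates topic1 with subject1 and someName with someOtherName, matching the example file.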