Kafka Resource Configuration
The following are the prerequisites necessary for IBM Automatic Data Lineage to connect to this third-party system, which you may choose to do at your sole discretion. Please note that while these are usually sufficient to connect to this third-party system, we cannot guarantee the success of the connection or integration since we have no control, liability, or responsibility for third-party products or services, including for their performance.
- Confluent Platform Schema Registry accessible via network
- Connection parameters for Schema Registry
  - Schema Registry address
  - Schema Registry port
  - User (if basic HTTP authentication is activated)
  - Password (if basic HTTP authentication is activated)
- One of the supported naming strategies for subjects in Schema Registry
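To verify that the Schema Registry is reachable with these parameters, its REST API can be queried directly. The minimal Python sketch below assumes a Confluent Platform Schema Registry, whose REST API exposes GET /subjects to list registered subjects (the Cloudera registry uses a different API); the scheme, address, port, and credentials shown are placeholders.
# Minimal connectivity check against a Confluent Platform Schema Registry.
# All values below are placeholders; adjust them to your environment.
import requests

SCHEME = "http"                      # or "https"
ADDRESS = "registry.example.com"     # Schema Registry address
PORT = 8081                          # Schema Registry port
USERNAME = "admin"                   # only needed with basic HTTP authentication
PASSWORD = "admin"

url = f"{SCHEME}://{ADDRESS}:{PORT}/subjects"
auth = (USERNAME, PASSWORD) if USERNAME else None

response = requests.get(url, auth=auth, timeout=10)
response.raise_for_status()          # fails fast if the registry is not reachable
print("Registered subjects:", response.json())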
Supported Features
This is a list of the Kafka Schema Registry features that Automatic Data Lineage supports. The list is intended as a high-level overview, so some supported capabilities may be covered by the named items without being listed explicitly. In general, features that are not listed should be considered unsupported.
- HTTP basic authentication for Schema Registry
- JSON Schema schemas
- Avro schemas
- Extraction of schemas from JSON payloads
- Confluent Schema Registry extraction over HTTPS
- Cloudera Schema Registry extraction over HTTP
- Naming strategies for subjects (described in the Naming Strategies section):
  - Topic name
  - Simple topic name
  - Custom name
Known Unsupported Features
Automatic Data Lineage does not support the following Kafka Schema Registry features. This list includes all of the features that IBM is aware are unsupported, but it might not be comprehensive.
- Confluent Cloud Schema Registry and other Schema Registry implementations
- Other subject naming strategies (RecordName, TopicRecordName)
- Protobuf schemas (Protocol Buffers)
Kafka Source System Properties
This configuration can be set up by creating a new connection on the Admin UI > Connections tab or by editing an existing connection in Admin UI / Connections / Data Integration Tools / Kafka / specific connection. A new connection can also be created via the Manta Orchestration API.
One Automatic Data Lineage connection for Kafka corresponds to one Kafka cluster that will be analyzed.
Property name | Description | Example |
---|---|---|
kafka.dictionary.id | Name of a resource representing this Kafka cluster, known as the dictionary ID; also used as the input subdirectory name for manually provided files for the given cluster. The dictionary ID is used as the name of the cluster. | kafka |
kafka.extractor.brokerList | List of broker addresses for the given Kafka cluster. Individual brokers are in the format address:port and are separated by commas. | |
kafka.extractor.schemaRegistryEnabled | true enables extraction from Schema Registry; otherwise, false. | |
kafka.schemaRegistry.type | Type of Schema Registry. The currently supported types are Confluent and Cloudera. | |
kafka.schemaRegistry.address | URL of the Schema Registry server; only considered for extraction from Schema Registry. | |
kafka.schemaRegistry.port | Port of the Schema Registry server; only considered for extraction from Schema Registry. | |
kafka.schemaRegistry.basicAuthEnabled | Set to true if the Schema Registry server requires HTTP basic authentication; otherwise, false. Only considered for extraction from Schema Registry. | |
kafka.schemaRegistry.username | User used for Schema Registry basic authentication; only considered when basic authentication is enabled. | admin |
kafka.schemaRegistry.password | Password used for Schema Registry basic authentication; only considered when basic authentication is enabled. | admin |
kafka.schemaRegistry.scheme | Scheme of the Schema Registry server used for extraction. The default value is http. | http, https |
kafka.schemaRegistry.include.topics | List of topics to be extracted from Schema Registry. Regular expressions can be used to list the topics. | |
kafka.schemaRegistry.exclude.topics | List of topics excluded from Schema Registry extraction. Regular expressions can be used to list the topics. | |
kafka.schemaRegistry.namingStrategy | Naming strategy for mapping topics to subjects. The currently supported strategies are Topic Name, Simple Topic Name, and Custom Name (described in the Naming Strategies section). | |
kafka.input.encoding | Encoding used for manually provided inputs. The default value is UTF-8. See Encodings for applicable values. | UTF-8 |
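As an illustration, a connection for a two-broker cluster using these properties might look like the following, shown in the property=value form used by the examples later in this document. All values (dictionary ID, broker addresses, registry host, and credentials) are placeholders, and the value tokens shown for kafka.schemaRegistry.type and kafka.schemaRegistry.namingStrategy simply follow the names used in the table above.
kafka.dictionary.id=kafka
kafka.extractor.brokerList=192.168.0.16:9092,192.168.0.17:9092
kafka.extractor.schemaRegistryEnabled=true
kafka.schemaRegistry.type=Confluent
kafka.schemaRegistry.scheme=https
kafka.schemaRegistry.address=registry.example.com
kafka.schemaRegistry.port=8081
kafka.schemaRegistry.basicAuthEnabled=true
kafka.schemaRegistry.username=admin
kafka.schemaRegistry.password=admin
kafka.schemaRegistry.namingStrategy=Topic Name
kafka.input.encoding=UTF-8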
Common Properties
This configuration is common to all Kafka source systems and all Kafka scenarios, and it is configured in Admin UI / Configuration / CLI / Kafka / Kafka Common. It can be overridden at the individual connection level.
Property name | Description | Example |
---|---|---|
kafka.dictionary.dir | Directory with data dictionaries extracted from Kafka | ${manta.dir.temp}/kafka |
filepath.lowercase | Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise) | true |
kafka.input.dir | A directory with extracted schemas from Schema Registry | ${manta.dir.temp}/kafka/${kafka.dictionary.id} |
kafka.manualInput.dir | A directory with manually provided schemas | ${manta.dir.input}/kafka/${kafka.dictionary.id} |
kafka.dictionary.mappingFile | Path to automatically generated mappings for Kafka databases | ${manta.dir.temp}/kafka/kafkaDictionaryMantaMapping.csv |
kafka.dictionary.mappingManualFile | Path to manually provided mappings for Kafka databases | ${manta.dir.scenario}/conf/kafkaDictionaryMantaMappingManual.csv |
kafka.schemaRegistry.customNamingStrategyFile | Path to manually provided mappings for custom naming strategy | ${manta.dir.input}/kafka/${kafka.dictionary.id}/kafkaCustomNamingStrategy.csv |
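For instance, if manually provided schemas are kept outside the default input location, the corresponding common property could be overridden at the connection level; the property=value form below follows the examples elsewhere in this document, and the path is purely a placeholder.
kafka.manualInput.dir=/opt/lineage/manual-input/kafka/${kafka.dictionary.id}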
Manual Mapping Properties
It is possible to manually configure mappings for Kafka clusters. Each mapping has its own row with the following parts separated by semicolons.
Field | Description | Example |
---|---|---|
Dictionary ID | Name of a resource representing this Kafka server known as the dictionary ID | kafka |
Broker URL | Broker URL in the format address:port belonging to the Kafka cluster. Note that for a multi-broker cluster, each broker has to be on a separate line in the dictionary mappings. | 192.168.0.16:9092 |
Connection ID | External Kafka connection ID in third-party tools; can be left empty | broker |
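As an illustration, the mapping rows for a two-broker cluster with the dictionary ID kafka could look like this; the broker addresses and the connection ID are placeholders, and each broker of the cluster is on its own line.
kafka;192.168.0.16:9092;broker
kafka;192.168.0.17:9092;broker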
Include/Exclude Topics
The properties kafka.schemaRegistry.include.topics and kafka.schemaRegistry.exclude.topics in the connection configuration are used to select the topics that should be included in or excluded from the Schema Registry extraction. Leaving the include topics property empty leads to the extraction of all topics in Schema Registry.
Include Topics Example
# extracts only topics with the names example_topic and example_topic_2 if they are present in Schema Registry
kafka.schemaRegistry.include.topics=example_topic,example_topic_2
kafka.schemaRegistry.exclude.topics=
Include Topics with a Regular Expression
It is also possible to list topics with a regular expression.
# extracts only topics with the names a-topic, b-topic, or c-topic if they are present in Schema Registry
kafka.schemaRegistry.include.topics=[a-c]-topic
kafka.schemaRegistry.exclude.topics=
Exclude Topics Example
# extracts all topics except example_topic and example_topic_2
kafka.schemaRegistry.include.topics=
kafka.schemaRegistry.exclude.topics=example_topic,example_topic_2
Exclude Topics with a Regular Expression
# extracts all topics except topics with the names a-topic, b-topic and c-topic
kafka.schemaRegistry.include.topics=
kafka.schemaRegistry.exclude.topics=[a-c]-topic
Naming Strategies
The Kafka API does not provide information on how subjects correspond to topics, so the correspondence has to be established by Automatic Data Lineage itself. This is done by matching subject names to topic names according to one of the following strategies.
Topic Name
The subject has the same name as the topic, but with “-value” at the end.
For example, a topic named “events” will correspond to a subject named “events-value”.
Simple Topic Name
The subject name is exactly the same as the topic.
For a topic named “events”, the subject would be “events” as well.
Custom Name
Subject names are mapped to topic names by a CSV file defined in Common Properties (kafka.schemaRegistry.customNamingStrategyFile), so any custom names can be used as long as they are specified in the file.
The format of the CSV file is a header row with “Topic name” and “Subject name”, followed by one entry per row.
The example below matches a topic named topic1 to a subject named subject1 and the topic someName to the subject someOtherName.
"Topic name";"Subject name"
topic1;subject1
someName;someOtherName
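To make the three strategies concrete, here is a minimal Python sketch (illustrative only, not product code) that resolves a subject name from a topic name; the custom strategy reads a semicolon-separated CSV file in the format shown above, and the strategy labels are simply taken from this section.
import csv

def load_custom_mapping(path):
    # Reads the semicolon-separated file ("Topic name";"Subject name")
    # and returns a dict mapping topic names to subject names.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=";")
        next(reader)  # skip the header row
        return {topic: subject for topic, subject in reader}

def subject_for_topic(topic, strategy, custom_mapping=None):
    if strategy == "Topic Name":         # e.g. "events" -> "events-value"
        return topic + "-value"
    if strategy == "Simple Topic Name":  # e.g. "events" -> "events"
        return topic
    if strategy == "Custom Name":        # looked up in the CSV mapping
        return custom_mapping[topic]
    raise ValueError(f"Unknown naming strategy: {strategy}")

# With the CSV above, subject_for_topic("topic1", "Custom Name",
# load_custom_mapping("kafkaCustomNamingStrategy.csv")) returns "subject1".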