OpenLineage lineage configuration
To import lineage metadata from OpenLineage, create a data source definition, a connection, and a metadata import job.
This information applies to the IBM Manta Data Lineage service.
To import lineage metadata for OpenLineage, complete these steps:
- Create a data source definition.
- Create a connection to the data source in a project.
- Create a metadata import.
Creating a data source definition
Create a data source definition. Select OpenLineage as the data source type.
Creating a connection to OpenLineage
Create a connection to the data source in a project. For connection details, see OpenLineage connection.
Creating a metadata import
Create a metadata import. The following options are specific to the OpenLineage data source:
Include and exclude lists
You can include or exclude assets by using job namespaces in OpenLineage events. The whole input is evaluated as a regular expression. Example values:
myPrestoApp1Namespace
: all events with the job namespace myPrestoApp1Namespace.
.mySparkApp[1-5]Namespace
: all events with a job namespace that consists of any single character, then mySparkApp, then a digit between 1 and 5, then Namespace.
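The filtering that include and exclude lists describe can be sketched in Python as follows. This is only an illustration, not the service's implementation: the `select_events` helper, the sample events, and the patterns are hypothetical, and the sketch assumes that a pattern must match the entire job namespace (`re.fullmatch`), which the text above does not state explicitly.

```python
import re

def select_events(events, include=None, exclude=None):
    """Keep events whose job namespace matches `include` and does not match `exclude`.

    Assumption: each pattern must match the whole namespace (re.fullmatch).
    """
    selected = []
    for event in events:
        namespace = event["job"]["namespace"]
        if include and not re.fullmatch(include, namespace):
            continue  # not on the include list
        if exclude and re.fullmatch(exclude, namespace):
            continue  # explicitly excluded
        selected.append(event)
    return selected

# Hypothetical events, reduced to the fields that the filter reads.
events = [
    {"job": {"namespace": "myPrestoApp1Namespace", "name": "load_orders"}},
    {"job": {"namespace": "mySparkApp3Namespace", "name": "aggregate"}},
    {"job": {"namespace": "mySparkApp7Namespace", "name": "export"}},
]

kept = select_events(events, include=r"mySparkApp[1-5]Namespace")
names = [e["job"]["name"] for e in kept]
print(names)
```

With these sample events, only the job in `mySparkApp3Namespace` passes the include pattern, because 7 is outside the `[1-5]` range and the Presto namespace does not match at all.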
External inputs
You can add OpenLineage events as external inputs. The file can have the following structure:
<event_file_name>.json
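As a rough illustration, the following Python sketch writes one minimal OpenLineage run event to a `.json` file. The top-level field names (`eventType`, `eventTime`, `producer`, `schemaURL`, `run`, `job`, `inputs`, `outputs`) follow the OpenLineage specification; the file name, namespaces, dataset names, and producer URL are made-up placeholders. Whether an external input file should hold a single event or several is not stated here, so one event per file is an assumption.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from uuid import uuid4

# A minimal OpenLineage run event. All values are illustrative placeholders.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "producer": "https://example.com/my-producer",
    # Adjust to the spec version that your producer emits.
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    "run": {"runId": str(uuid4())},
    "job": {"namespace": "myPrestoApp1Namespace", "name": "load_orders"},
    "inputs": [{"namespace": "postgres://db.example.com:5432", "name": "public.orders"}],
    "outputs": [{"namespace": "postgres://db.example.com:5432", "name": "public.orders_daily"}],
}

# Write the event to <event_file_name>.json (here: a hypothetical name in a temp directory).
event_file = Path(tempfile.mkdtemp()) / "orders_event.json"
event_file.write_text(json.dumps(event, indent=2), encoding="utf-8")

# Read it back to confirm that the file is valid JSON.
loaded = json.loads(event_file.read_text(encoding="utf-8"))
print(loaded["job"]["namespace"])
```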
Additional information
Column level lineage
In some cases, events do not contain column-level lineage information. Each source column is then connected to all target columns, which produces imprecise lineage. Starting in 5.1.2, a smart mapping method is used. This method first matches source columns to target columns by name. For the remaining columns that have no match, the previous method is used. As a result, the column-level lineage is more accurate.
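The idea behind name-based mapping can be sketched as follows. This is a simplified illustration, not the product's algorithm: the `map_columns` function is hypothetical, names are compared exactly (case-sensitive), and the sketch assumes that the fallback connects each unmatched source column to every unmatched target column.

```python
def map_columns(source_columns, target_columns):
    """Return (source, target) edges: name matches first, then an all-to-all fallback."""
    target_names = set(target_columns)
    edges = []
    unmatched_sources = []
    matched_targets = set()

    # Step 1: connect columns that share the same name.
    for column in source_columns:
        if column in target_names:
            edges.append((column, column))
            matched_targets.add(column)
        else:
            unmatched_sources.append(column)

    # Step 2 (assumed fallback): connect every unmatched source column
    # to every target column that was not matched by name.
    unmatched_targets = [t for t in target_columns if t not in matched_targets]
    for source in unmatched_sources:
        for target in unmatched_targets:
            edges.append((source, target))
    return edges

edges = map_columns(["id", "name", "amt"], ["id", "name", "amount", "total"])
for edge in edges:
    print(edge)
```

In this example, `id` and `name` each get a single edge, and only `amt` falls back to the broader mapping. Connecting every source column to every target column would instead produce 12 edges.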