OpenLineage lineage configuration

To import lineage metadata from OpenLineage, create a data source definition, a connection, and a metadata import.

This information applies to the IBM Manta Data Lineage service.

To import lineage metadata for OpenLineage, complete these steps:

  1. Create a data source definition.
  2. Create a connection to the data source in a project.
  3. Create a metadata import.

Creating a data source definition

Create a data source definition. Select OpenLineage as the data source type.

Creating a connection to OpenLineage

Create a connection to the data source in a project. For connection details, see OpenLineage connection.

Creating a metadata import

Create a metadata import. The following options are specific to the OpenLineage data source:

Include and exclude lists

You can include or exclude assets by using job namespaces in OpenLineage events. The whole input is evaluated as a regular expression. Example values:

  • myPrestoApp1Namespace: all events with the job namespace myPrestoApp1Namespace.
  • mySparkApp[1-5]Namespace: all events with a job namespace that consists of mySparkApp, a single digit from 1 to 5, and Namespace, that is, mySparkApp1Namespace through mySparkApp5Namespace.
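The following minimal sketch illustrates how include and exclude patterns of this kind could select events by job namespace. It is not the product's implementation: the function name select_events and the sample events are hypothetical, and it assumes that the pattern must match the entire namespace (re.fullmatch), which the text above does not state explicitly.

```python
import re


def select_events(events, include=None, exclude=None):
    """Return the events whose job namespace passes the include and exclude patterns.

    Assumption: a pattern must match the whole namespace (re.fullmatch).
    The actual service might apply the expression differently.
    """
    selected = []
    for event in events:
        namespace = event["job"]["namespace"]
        # Skip events that the include pattern does not cover.
        if include is not None and not re.fullmatch(include, namespace):
            continue
        # Skip events that the exclude pattern covers.
        if exclude is not None and re.fullmatch(exclude, namespace):
            continue
        selected.append(event)
    return selected


# Hypothetical events that carry only the fields used for filtering.
sample_events = [
    {"job": {"namespace": "myPrestoApp1Namespace", "name": "job_a"}},
    {"job": {"namespace": "mySparkApp1Namespace", "name": "job_b"}},
    {"job": {"namespace": "mySparkApp5Namespace", "name": "job_c"}},
    {"job": {"namespace": "mySparkApp6Namespace", "name": "job_d"}},
]

if __name__ == "__main__":
    for event in select_events(sample_events, include=r"mySparkApp[1-5]Namespace"):
        print(event["job"]["namespace"], event["job"]["name"])
```

Testing a pattern locally in this way before you enter it in the include or exclude list can help confirm that it covers only the namespaces that you intend.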

External inputs

You can add OpenLineage events as external inputs. The file can have the following structure:

<event_file_name>.json
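As an illustration, the following sketch builds one OpenLineage run event and writes it to a JSON file. The top-level fields (eventType, eventTime, run, job, inputs, outputs, producer, schemaURL) follow the OpenLineage run event specification. The helper names, the namespaces, the dataset names, the producer URL, and the file name are hypothetical, and whether the service expects exactly one event per file is an assumption.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_namespace, job_name, inputs, outputs):
    """Build a minimal OpenLineage run event as a dictionary.

    inputs and outputs are lists of (dataset_namespace, dataset_name) tuples.
    """
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Hypothetical producer URI; replace it with the URI of your own tool.
        "producer": "https://example.com/my-lineage-producer",
        # Illustrative spec version; use the version that your producer emits.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent",
    }


def write_event_file(event, path):
    """Write a single event to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(event, handle, indent=2)


if __name__ == "__main__":
    event = build_run_event(
        "mySparkApp1Namespace",
        "daily_orders_load",
        inputs=[("postgres://db.example.com:5432", "sales.public.orders")],
        outputs=[("postgres://db.example.com:5432", "sales.public.orders_daily")],
    )
    # Hypothetical file name that follows the <event_file_name>.json pattern.
    write_event_file(event, "daily_orders_load.json")
```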

Additional information

Column-level lineage

In some cases, events do not contain column-level lineage information. Without that information, each source column is connected to every target column, which produces imprecise lineage. Starting in version 5.1.2, a smart mapping method is used instead. This method first matches source columns to target columns by name. For the remaining columns that have no matching column, the previous method is used. As a result, the column-level lineage is more precise.
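The following sketch illustrates the idea of name-based matching with a fallback. It is not the product's algorithm: the function name is hypothetical, and the sketch assumes exact, case-sensitive name comparison and assumes that each unmatched source column is connected only to the unmatched target columns. The actual service might handle either point differently.

```python
def smart_column_mapping(source_columns, target_columns):
    """Return (source, target) column pairs for column-level lineage.

    Step 1: connect columns that have the same name.
    Step 2 (fallback): connect every remaining source column to every
    remaining target column. This is an assumption about how the
    previous all-to-all method applies to the leftover columns.
    """
    target_names = set(target_columns)
    edges = []
    matched = set()

    # Step 1: name-based matching.
    for column in source_columns:
        if column in target_names:
            edges.append((column, column))
            matched.add(column)

    # Step 2: all-to-all fallback for the columns without a name match.
    unmatched_sources = [c for c in source_columns if c not in matched]
    unmatched_targets = [c for c in target_columns if c not in matched]
    for source in unmatched_sources:
        for target in unmatched_targets:
            edges.append((source, target))

    return edges


if __name__ == "__main__":
    pairs = smart_column_mapping(["id", "name", "amt"], ["id", "name", "total"])
    for source, target in pairs:
        print(f"{source} -> {target}")
```

In this example, id and name are connected one to one, and only amt falls back to the broader mapping, which yields three connections instead of the nine that an all-to-all mapping would produce.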

Parent topic: Supported connectors for lineage import