OpenLineage connection (IBM Knowledge Catalog)
To access your data in OpenLineage, create a connection asset for it.
OpenLineage is an open framework that can be used to collect and analyze data lineage.
Create a connection to OpenLineage
To create the connection asset, you need the following connection details:
- Hostname or IP address
- Port number
Choose the method for creating a connection based on where you are in the platform:
- In a project
  - Click Assets > New asset > Prepare data > Connect to a data source. See Adding a connection to a project.
- In a catalog
  - Click Add to catalog > Connection. See Adding a connection asset to a catalog.
- In the Platform assets catalog
  - Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Configuring lineage metadata import for OpenLineage
When you create a metadata import for the OpenLineage connection, you can set options specific to this data source, and define the scope of data for which lineage is generated. For details about metadata import, see Designing metadata imports.
To import lineage metadata for OpenLineage, complete these steps:
- Create a data source definition. Select OpenLineage as the data source type.
- Create a connection to the data source in a project.
- Create a metadata import. Learn more about the options that are specific to the OpenLineage data source:
  - When you define a scope, you can analyze the entire data source or use the include and exclude options to define the exact job namespaces that you want to analyze. See Include and exclude lists.
  - Optionally, you can provide external input. Add the file in the Add inputs from file field. The file must have a supported structure. See External inputs.
Include and exclude lists
You can include or exclude assets by using job namespaces in OpenLineage events. The whole input is evaluated as a regular expression. Example values:
myPrestoApp1Namespace
: all events with job namespace myPrestoApp1Namespace
.mySparkApp[1-5]Namespace
: all events with a job namespace that matches mySparkApp, followed by a digit between 1 and 5, followed by Namespace, for example mySparkApp1Namespace.
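The following Python sketch illustrates how an include or exclude value could be evaluated as a regular expression against job namespaces. It is not the product's implementation. The function name `select_namespaces` is hypothetical, the second pattern is an illustrative variant of the example above, and full-match semantics (the pattern must match the entire namespace) is an assumption that this page does not confirm.

```python
import re


def select_namespaces(namespaces, include=None, exclude=None):
    """Return the namespaces kept by optional include/exclude regex patterns.

    Assumption: a pattern must match the whole namespace (re.fullmatch).
    """
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for ns in namespaces:
        # Skip namespaces that the include pattern does not match.
        if include_re is not None and not include_re.fullmatch(ns):
            continue
        # Skip namespaces that the exclude pattern matches.
        if exclude_re is not None and exclude_re.fullmatch(ns):
            continue
        selected.append(ns)
    return selected


# Hypothetical job namespaces found in a set of OpenLineage events.
namespaces = [
    "myPrestoApp1Namespace",
    "mySparkApp1Namespace",
    "mySparkApp3Namespace",
    "mySparkApp7Namespace",
]

presto_only = select_namespaces(namespaces, include="myPrestoApp1Namespace")
spark_1_to_5 = select_namespaces(namespaces, include="mySparkApp[1-5]Namespace")
no_presto = select_namespaces(namespaces, exclude="myPresto.*")
```

Testing a pattern locally in this way before you enter it in the scope definition can help catch character classes or wildcards that match more, or fewer, namespaces than intended.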
External inputs
You can add OpenLineage events as external inputs. The file can have the following structure:
<event_file_name>.json
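As a rough illustration of what such a file might contain, the sketch below builds a minimal OpenLineage run event and writes it to a JSON file. The top-level field names follow the public OpenLineage specification. The producer URL, the dataset and job names, and the helper names `build_run_event` and `write_event_file` are hypothetical. Writing a single event per file is also an assumption, because this page does not say whether one file can hold several events.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_namespace, job_name, input_name, output_name):
    """Build a minimal OpenLineage run event as a plain dictionary."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Hypothetical producer identifier; replace with your own.
        "producer": "https://example.com/hypothetical-producer",
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        # The job namespace is what include and exclude lists filter on.
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": "example-db", "name": input_name}],
        "outputs": [{"namespace": "example-db", "name": output_name}],
    }


def write_event_file(path, event):
    """Serialize one event to a .json file (assumed: one event per file)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(event, handle, indent=2)


event = build_run_event(
    job_namespace="myPrestoApp1Namespace",
    job_name="daily_orders_load",
    input_name="sales.orders_raw",
    output_name="sales.orders_clean",
)
event_json = json.dumps(event, indent=2)
```

Calling `write_event_file("my_event.json", event)` would then produce a file that can be selected in the Add inputs from file field, provided that the structure matches what the import expects.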
Additional information
Column level lineage
In some cases, events do not contain column-level lineage information. Each source column is then connected to all target columns, which generates imprecise lineage. Starting in 5.1.2, a smart mapping method is used. This method first matches source columns to target columns based on their names. For the remaining columns that have no matching column, the previous method is used. As a result, the column-level lineage is more precise.
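The difference between the two methods can be sketched in Python as follows. This is only an illustration of the idea described above, not the product's algorithm. The function names are hypothetical, name matching is assumed to be an exact, case-sensitive comparison, and unmatched source columns are assumed to fall back to a connection with every target column.

```python
def all_to_all(sources, targets):
    """Previous method: connect every source column to every target column."""
    return [(s, t) for s in sources for t in targets]


def smart_mapping(sources, targets):
    """Match columns by name first, then fall back to all-to-all for the rest.

    Assumptions: exact, case-sensitive name matching; unmatched source
    columns are connected to all target columns.
    """
    target_names = set(targets)
    edges = []
    unmatched = []
    for s in sources:
        if s in target_names:
            # A target column with the same name exists: one precise edge.
            edges.append((s, s))
        else:
            unmatched.append(s)
    # Fallback for source columns without a same-named target column.
    edges.extend(all_to_all(unmatched, targets))
    return edges


sources = ["id", "name", "amount_usd"]
targets = ["id", "name", "total"]

old_edges = all_to_all(sources, targets)
new_edges = smart_mapping(sources, targets)
```

With these hypothetical columns, the all-to-all method yields nine edges, while the name-based sketch yields five: one each for `id` and `name`, plus three fallback edges for `amount_usd`.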