OpenLineage

OpenLineage is an open framework for data lineage collection and analysis. OpenLineage is directly supported by Keboola and can be integrated with dbt, Apache Airflow, and Apache Spark.

Manta’s OpenLineage scanner processes OpenLineage events, visualizes the data flow within the analyzed system, and (in cooperation with other scanners) displays it in the context of its entire environment.
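For orientation, an OpenLineage run event is a JSON document that identifies a run, the job it belongs to, and the input and output datasets the job read and wrote. The sketch below builds a minimal RunEvent as a Python dict using the field names of the public OpenLineage specification. The helper `make_run_event`, the namespaces, job and dataset names, producer URI, and the spec version in `schemaURL` are all illustrative assumptions. They are not values Manta requires.

```python
import json
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_namespace, job_name, inputs, outputs, producer):
    """Build a minimal OpenLineage RunEvent as a plain dict (hypothetical helper).

    inputs/outputs are lists of (namespace, name) tuples identifying datasets.
    """
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),  # ISO 8601 timestamp
        "run": {"runId": str(uuid.uuid4())},  # each run is identified by a UUID
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": producer,
        # Assumed spec version; use whatever your producer actually emits.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent",
    }


# Hypothetical example: a job that reads one table and writes another.
event = make_run_event(
    "COMPLETE",
    job_namespace="example_namespace",
    job_name="daily_orders_load",
    inputs=[("postgres://db.example.com:5432", "public.orders")],
    outputs=[("postgres://db.example.com:5432", "analytics.daily_orders")],
    producer="https://example.com/hypothetical-producer",
)
print(json.dumps(event, indent=2))
```

In practice, integrations such as the dbt, Airflow, or Spark ones emit these events for you. Building one by hand is mainly useful for testing the scanner with a known input.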

Check out the guides below for more details on setting up this scanner.

Extraction and Analysis Phase Scenarios

Extraction Phase

For the extraction phase for OpenLineage, there are two scenarios:

  1. OpenLineage extractor scenario - preprocesses the events placed in the manual input location and makes them ready for the analysis phase.

  2. OpenLineage ingestion scenario - pulls inputs from a Git repository (see Manta Flow Agent Configuration for Extraction: Git Source) or from a remote agent filesystem location (see Manta Flow Agent Configuration for Extraction: Agent Source).
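For the extractor scenario, the events have to be placed in the manual input location as files. The sketch below writes each event to its own JSON file. The one-event-per-file layout, the `event_NNNN.json` naming, the hypothetical `write_events` helper, and the directory you pass in are assumptions for illustration; check the scanner configuration for the actual input path and the file format it accepts.

```python
import json
from pathlib import Path


def write_events(events, input_dir):
    """Write each OpenLineage event to its own JSON file in input_dir (hypothetical helper).

    Creates the directory if needed and returns the written paths in order.
    """
    target = Path(input_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, event in enumerate(events):
        # Zero-padded index keeps files in the same order as the events.
        path = target / f"event_{i:04d}.json"
        path.write_text(json.dumps(event), encoding="utf-8")
        paths.append(path)
    return paths


# Example usage with a hypothetical directory:
# write_events([start_event, complete_event], "./openlineage-input")
```

Keeping one event per file makes it easy to add or remove individual runs before re-running the extraction.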

Analysis Phase

At the beginning of the analysis phase, a new revision scenario is called. It creates a new revision in the internal metadata repository so that the repository is ready to accept new metadata.

For the analysis phase of OpenLineage events, there is one scenario:

  1. OpenLineage dataflow scenario - harvests metadata and lineage from the extracted events and saves them in your Automatic Data Lineage metadata repository.
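To illustrate what harvesting lineage from events means, the sketch below derives dataset-to-dataset edges from each event's `inputs` and `outputs`, routed through the job that read and wrote them. This is a simplified model for intuition, not Manta's implementation. The `lineage_edges` and `dataset_id` functions are hypothetical. The sketch links every input of an event to every output regardless of `eventType`, deduplicates repeated runs, and ignores facets such as column-level lineage.

```python
def dataset_id(dataset):
    """Identify a dataset or job by its namespace and name (hypothetical convention)."""
    return f'{dataset["namespace"]}/{dataset["name"]}'


def lineage_edges(events):
    """Collect (input_dataset, job, output_dataset) triples from OpenLineage events.

    Duplicate edges (e.g. from START and COMPLETE events of the same run)
    are merged; the result is sorted for stable output.
    """
    edges = set()
    for event in events:
        job = dataset_id(event["job"])
        for src in event.get("inputs", []):
            for dst in event.get("outputs", []):
                edges.add((dataset_id(src), job, dataset_id(dst)))
    return sorted(edges)
```

Chaining edges where one job's output is another job's input is what lets the lineage graph span several jobs and, together with other scanners, the wider environment.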