Designing lineage export

When you export data lineage, you must decide on the source and target of the export, what data to include, whether to synchronize data on the target system, and whether to schedule the export jobs.

The configuration of the export asset depends on the target format to which you want to export data lineage. When you integrate with third-party tools like Collibra, you can upload the exported lineage to the target system, so you need to configure the target and upload options.
If you export your lineage to the OpenLineage format, you configure only the options that are specific to the source lineage. Therefore, you can skip the sections about the export target and the synchronization mode.

Typically, when you start creating a data lineage export asset, the target system is already defined, configured, and ready to be used in the export, and the source lineage metadata is already imported to a selected project. The data lineage export asset contains information about what lineage data to include, how to process assets that exist on the target system but are no longer part of the export, and when to run the export job.
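
As a planning aid, the dependency between the target format and the sections that you need to work through can be sketched in a few lines of Python. This is only an illustration: the section names follow the headings of this topic, and the function and format identifiers are hypothetical, not product settings or an API.

```python
# Hypothetical section names that follow the headings of this topic.
SOURCE_SECTIONS = [
    "export source",
    "export goals",
    "lineage aggregation level",
    "schedule",
]
TARGET_SECTIONS = ["export target", "synchronization mode"]


def sections_to_configure(target_format):
    """Return the design sections that apply to a target format.

    The format names "collibra" and "openlineage" are illustrative
    identifiers, not values that the product accepts.
    """
    fmt = target_format.lower()
    if fmt == "openlineage":
        # Only options that are specific to the source lineage apply.
        return list(SOURCE_SECTIONS)
    if fmt == "collibra":
        # The target and upload options must be configured as well.
        return TARGET_SECTIONS + SOURCE_SECTIONS
    raise ValueError(f"Unknown target format: {target_format}")


print(sections_to_configure("OpenLineage"))
print(sections_to_configure("Collibra"))
```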

Export target

The export target is the endpoint where you want to add lineage metadata. Before you create the data lineage export asset, you need to create and configure the Collibra instance and connection. Then, you need to decide how to connect to this instance.

Collibra instance

The instance is identified by a data source definition. Each instance is configured with attribute and relation type parameters. For details, see Collibra instances.

Uploading lineage to the target system

You can choose whether the generated lineage is added automatically to the target system. If you set the Upload data to the export target option to on, the lineage is added immediately after it is generated. If you leave this option set to off, you can add the generated file to the target system later. To do that, edit the data lineage export job, set this option to on, specify connection details, and start the job again to upload the lineage to the target system.

Connection

Create a connection to Collibra in the same project where you want to create the export asset. Use the same endpoint details as in the data source definition that you configured as the Collibra instance. For details, see Collibra connection.

Connection method

You can export lineage metadata by connecting directly from the platform, or by using external Manta agents to connect remotely. You install the Manta agent directly on the Collibra instance and register it in the platform. Then, you can select it when you create a data lineage export. For more information, see Configuring agents for lineage metadata import.

Export source

To identify the lineage to export, select a data source definition for the source technology. You can select source lineage from any project. The lineage metadata import asset doesn't need to be in the same project where you create the export asset. For the list of data sources from which you can export lineage, see:

Advanced options

Export goals

Decide what elements of lineage to include in the export.

Important: When you set the export job to run regularly, make sure that you select the same data scope each time. The data is synchronized on the target system, and any assets that were exported earlier but are not present in the current export are deleted from the target system. You can disable the synchronization with the synchronization mode setting.
Data assets
New assets are created if you select this option.
Transformation assets and lineage relationships
Only the transformations and lineage relationships are exported. If this is the only selected option, no new assets are created except mapping specifications. Existing assets in Collibra are used as the sources and targets of the exported relationships. You might choose this option when you want to create lineage for a physical data catalog that is generated by Collibra Catalog. Decide what type of transformation assets to export:
  • Transformations that have at least one source or target
  • Transformations that have both source and target
  • Transformations that don’t have any lineage

If you want to export the complete lineage, with physical data, transformation assets and lineage relationships, select both options.
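
The three transformation filters can be read as predicates over whether a transformation has a source, a target, both, or neither. The following Python sketch illustrates that reading. The Transformation class, its fields, and the filter names are hypothetical and are not part of the product or of Collibra. The sketch also assumes that a transformation without any lineage is one that has neither a source nor a target.

```python
from dataclasses import dataclass, field


@dataclass
class Transformation:
    """Hypothetical stand-in for a transformation asset."""
    name: str
    sources: list = field(default_factory=list)
    targets: list = field(default_factory=list)


# Hypothetical filter names that mirror the three options in the list.
FILTERS = {
    "at_least_one": lambda t: bool(t.sources) or bool(t.targets),
    "both": lambda t: bool(t.sources) and bool(t.targets),
    # Assumption: "no lineage" means neither a source nor a target.
    "no_lineage": lambda t: not t.sources and not t.targets,
}


def select_transformations(transformations, filter_name):
    """Return the names of the transformations that match one filter."""
    keep = FILTERS[filter_name]
    return [t.name for t in transformations if keep(t)]


transformations = [
    Transformation("load_orders", sources=["stg.orders"], targets=["dw.orders"]),
    Transformation("read_only_job", sources=["stg.customers"]),
    Transformation("orphan_script"),
]

for name in FILTERS:
    print(name, select_transformations(transformations, name))
```

Under these assumptions, a transformation with both a source and a target also satisfies the "at least one" filter, so that filter is the broader of the two.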

Deduced assets
A deduced asset is an inferred object that is created by the system when it encounters references to unknown or missing components during data lineage extraction. Deduced assets are created to fill the gaps in an incomplete lineage. The source deduced asset is often referenced as an unknown server. You can export deduced assets when you include transformation assets and lineage relationships in your export.

Synchronization mode

After the initial export of lineage to Collibra is set up, the data can be updated in the target system regularly to ensure that all recent changes are present. In some cases, assets that were exported to the target system earlier are no longer present in the new export. Choose one of the following synchronization modes to decide what to do with assets that are present in the target system but not in the latest export file.

Change missing asset status
The assets that are present on the target system but not in the latest export file get a new status, which is Obsolete. You can then easily find such assets, and decide to govern them later or delete them manually after further verification.
Delete
After the exported data is uploaded to the target system, the assets that are present in the target system but not in the latest export file are deleted automatically.
No synchronization
When you select this mode, assets are not modified or deleted in the target system. They are only imported. This mode is faster than the other modes, but data is not synchronized.
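
The effect of the three modes on assets that are missing from the latest export can be sketched as a small simulation. This is a minimal illustration and not the actual upload logic: the mode identifiers, the function name, and the "Imported" placeholder status for newly added assets are all hypothetical. Only the Obsolete status comes from the description above.

```python
def synchronize(target_assets, exported_names, mode):
    """Simulate the state of the target system after an upload.

    target_assets: dict that maps asset name to status before the upload.
    exported_names: set of asset names in the latest export file.
    mode: "change_status", "delete", or "none" (hypothetical identifiers).
    Returns a new dict that maps asset name to status.
    """
    result = dict(target_assets)

    # Every mode imports the exported assets. Existing assets keep their
    # status; "Imported" is a placeholder status for new assets.
    for name in exported_names:
        result.setdefault(name, "Imported")

    # Assets that exist on the target but not in the latest export file.
    missing = set(target_assets) - set(exported_names)

    if mode == "change_status":
        for name in missing:
            result[name] = "Obsolete"
    elif mode == "delete":
        for name in missing:
            del result[name]
    elif mode != "none":
        raise ValueError(f"Unknown synchronization mode: {mode}")
    return result


target = {"orders": "Accepted", "legacy_table": "Accepted"}
exported = {"orders", "customers"}

for mode in ("change_status", "delete", "none"):
    print(mode, synchronize(target, exported, mode))
```

In this sketch, narrowing the data scope between runs has the same effect as removing assets at the source: anything outside the new scope counts as missing, which is why the scope should stay the same for recurring exports unless synchronization is disabled.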

Lineage aggregation level

The lineage aggregation level specifies the asset types between which the relations are exported.

Column and table level
The exported lineage contains relations between both tables and columns. It is the most detailed lineage.
Table level
The exported lineage contains relations only between tables. Columns are not exported at all.
Column level
The exported lineage contains relations only between columns. Tables are exported as parent assets of columns, but relations between tables are not exported. This option is available only for Collibra.
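
One way to picture the aggregation levels is as a filter on the kind of relation that is exported. The following Python sketch makes that concrete. The Relation class and the level identifiers are hypothetical, and the sketch models relations only. It does not model which assets are exported (for example, tables as parents of columns at the column level) or which levels are available for a given target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical lineage relation between two assets of the same kind."""
    source: str
    target: str
    kind: str  # "table" or "column"


# Hypothetical level identifiers mapped to the relation kinds they keep.
LEVEL_KINDS = {
    "column_and_table": {"table", "column"},
    "table": {"table"},
    "column": {"column"},
}


def relations_for_level(relations, level):
    """Keep only the relations that the aggregation level exports."""
    kinds = LEVEL_KINDS[level]
    return [r for r in relations if r.kind in kinds]


relations = [
    Relation("stg.orders", "dw.orders", "table"),
    Relation("stg.orders.id", "dw.orders.order_id", "column"),
    Relation("stg.orders.total", "dw.orders.amount", "column"),
]

for level in LEVEL_KINDS:
    print(level, len(relations_for_level(relations, level)))
```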

Schedule

If you don't set a schedule, the export runs when you initially save the data lineage export asset. You can rerun the export manually at any time.

If you choose to run the export on a specific schedule, define the date and time when you want the job to run. You can schedule single and recurring runs. If you schedule a single run, the job runs exactly one time at the specified date and time. If you schedule recurring runs, the job runs for the first time at the timestamp that is indicated in the Recurrence section.

The default name of the export job is data_lineage_export_name job. You can change the name to fit your naming convention. You can access the export job that you create from within the data lineage export asset or from the project's Jobs page.

What to do next

When you are ready, create an export asset and start the first job. For details, see Creating an export asset and managing jobs.