Viewing data lineage

You can view lineage for the assets in the lineage repository. You can choose a predefined configuration or define a custom one.

Required permission

You must have the following user permission:

  • Manage data lineage or Access data lineage

Viewing lineage

To view your lineage:

  1. Go to Data > Data lineage and open the View lineage tab.
  2. Add assets from the repository as the starting assets for your lineage. The chosen assets are listed in the Selected assets panel.
  3. Customize the initial view of your lineage. See Lineage filters.
  4. Click View lineage.

Starting assets and starting parents

When you select a starting asset, the assets below it in the hierarchy are also marked as starting assets, and the assets above it are marked as starting parents. For example, when a table is selected as a starting asset, the table and all its columns are marked as starting assets, and its schema and database are marked as starting parents.
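The product applies this marking for you. The Python sketch below only illustrates the rule as described, assuming a simple tree of hypothetical `Asset` objects (database, schema, table, column). The class and the `mark_starting` function are illustrative names, not part of any product API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical hierarchy node: database > schema > table > column."""
    name: str
    # Excluded from repr/eq so that parent <-> child links do not recurse.
    parent: Asset | None = field(default=None, repr=False, compare=False)
    children: list[Asset] = field(default_factory=list)

    def add(self, child: Asset) -> Asset:
        # Link the child under this asset and return it for chaining.
        child.parent = self
        self.children.append(child)
        return child


def mark_starting(selected: Asset) -> tuple[set[str], set[str]]:
    """Return (starting_assets, starting_parents) for one selected asset."""
    # The selected asset and everything below it become starting assets.
    starting_assets: set[str] = set()
    stack = [selected]
    while stack:
        asset = stack.pop()
        starting_assets.add(asset.name)
        stack.extend(asset.children)

    # Every ancestor above the selected asset becomes a starting parent.
    starting_parents: set[str] = set()
    ancestor = selected.parent
    while ancestor is not None:
        starting_parents.add(ancestor.name)
        ancestor = ancestor.parent
    return starting_assets, starting_parents


# Usage: selecting a table marks the table and its columns as starting
# assets, and its schema and database as starting parents.
db = Asset("sales_db")
schema = db.add(Asset("public"))
orders = schema.add(Asset("orders"))
orders.add(Asset("orders.id"))
orders.add(Asset("orders.total"))
starting_assets, starting_parents = mark_starting(orders)
```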

Identical data assets

If any of the assets that you use for lineage is an identical data asset, the asset references a set of shared properties. When you update shared properties for an identical data asset, the properties are updated for all data assets with the same unique identifier in governed catalogs that enforce shared properties across workspaces, and in all projects where the asset isn't in the Draft state. For more information, see Identical data assets.

Repository

All available lineage assets are listed in the Repository. Use the search bar to look for a specific asset.

Note: The search bar does not support any special characters.

To narrow down your search, you can filter assets by technology, type, business term, or tag.

Previous versions of the data lineage

You can access earlier versions of lineage. After a system is added and then rescanned, you can view the lineage from previous scans to track changes and analyze how the lineage evolved over time. Click the Lineage versions icon to open a window where you can select any earlier version of your lineage.

Comparing lineage versions

You can compare two versions of a lineage graph to see how data flows changed between two points in time. This comparison helps you understand the impact of changes, such as updates to pipelines or models, by highlighting what was added, removed, or affected.

To compare lineage versions, on the lineage repository page, click the Lineage versions icon, and set Compare lineage versions to on. After you select two versions to compare, add assets for visualization and proceed to view the lineage.

In the comparison view, each asset and edge is assigned a status that reflects the changes between the selected versions:

Added
The asset exists in the later version but not in the earlier version.
Deleted
The asset exists in the earlier version but not in the later version.
Mixed changes
The asset exists in both versions, but one or more descendant assets were added or removed.

An asset without a status is the same in both versions.

For transformation assets, if source code information is available, you can compare how the code changed between the selected versions.
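As an illustration of how these statuses relate to each other, the following Python sketch classifies assets when each version is modeled as a hypothetical mapping from asset name to parent name. It covers assets only, not edges, and treats any difference in an asset's set of descendants as Mixed changes. The product's actual comparison logic is not described here and may differ.

```python
from __future__ import annotations


def build_children(parents: dict[str, str | None]) -> dict[str, list[str]]:
    """Invert a child -> parent mapping into parent -> children."""
    children: dict[str, list[str]] = {name: [] for name in parents}
    for name, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(name)
    return children


def descendants(children: dict[str, list[str]], node: str) -> set[str]:
    """All assets below `node` in the hierarchy."""
    result: set[str] = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, []))
    return result


def compare_versions(earlier: dict[str, str | None],
                     later: dict[str, str | None]) -> dict[str, str]:
    """Map each changed asset to Added, Deleted, or Mixed changes."""
    earlier_children = build_children(earlier)
    later_children = build_children(later)
    status: dict[str, str] = {}
    for name in earlier.keys() | later.keys():
        if name not in earlier:
            status[name] = "Added"
        elif name not in later:
            status[name] = "Deleted"
        elif descendants(earlier_children, name) != descendants(later_children, name):
            status[name] = "Mixed changes"
        # Otherwise the asset is unchanged and gets no status.
    return status


# Usage: the later scan drops the customers table and adds a column to orders.
earlier = {"db": None, "sales": "db", "orders": "sales", "orders.id": "orders",
           "customers": "sales", "customers.id": "customers"}
later = {"db": None, "sales": "db", "orders": "sales", "orders.id": "orders",
         "orders.total": "orders"}
status = compare_versions(earlier, later)
```

In this example, orders.id keeps no status because neither it nor its descendants changed, while its ancestors receive Mixed changes.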

Lineage filters

When you select starting assets for which you want to display lineage, you can further customize the size and contents of the lineage with the filters.

Note: Filters that you select are applied to the initial state of the lineage. When you expand the lineage, all filters are cleared. For newly expanded nodes, the default lineage filters are applied.

Lineage scope

Define the scope of lineage based on the number of assets in relation to the starting assets. By default, three assets are displayed in each direction, upstream and downstream, from the starting asset.

You can use the following filters to change the size of the lineage:

  • Range:
    • Asset range: a custom range of assets around the starting asset is displayed. You define how many assets are displayed in the Hops from the starting assets option. For example, when you set the number of hops to 5, the lineage shows the starting assets and the five nearest assets in the defined direction. The maximum number of hops is 50.
    • Only source and target assets: the starting asset, its original source asset, and its final target asset are displayed. All assets in between are hidden.
    • Complete lineage: the complete lineage with all assets is displayed. A maximum of 50 assets can be displayed in each direction from the starting asset.
  • Data flow direction: by default, assets that flow in both directions are displayed.
    • Upstream: the assets that flow toward the starting asset from the direction of the source asset are displayed.
    • Downstream: the assets that flow from the starting asset toward the target asset are displayed.
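To make the hop semantics concrete, the following Python sketch treats lineage as a list of hypothetical (source, target) flow edges and assumes that one hop corresponds to one edge. `within_hops` approximates the Asset range option with a breadth-first search, and `source_and_target_only` approximates the Only source and target assets option. Both functions are illustrative, not the product's implementation.

```python
from collections import deque

MAX_HOPS = 50  # documented maximum number of hops


def within_hops(flows, start, hops, direction="both"):
    """Assets reachable from `start` in at most `hops` flow edges.

    flows: iterable of (source, target) pairs, one per data flow edge.
    direction: "upstream", "downstream", or "both" (the default view).
    """
    if not 0 <= hops <= MAX_HOPS:
        raise ValueError(f"hops must be between 0 and {MAX_HOPS}")
    forward, backward = {}, {}
    for src, dst in flows:
        forward.setdefault(src, []).append(dst)   # downstream neighbors
        backward.setdefault(dst, []).append(src)  # upstream neighbors
    adjacency = []
    if direction in ("downstream", "both"):
        adjacency.append(forward)
    if direction in ("upstream", "both"):
        adjacency.append(backward)

    visible = {start}
    for adj in adjacency:
        # Breadth-first search records each asset's hop distance from start.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == hops:
                continue  # do not expand past the hop limit
            for nxt in adj.get(node, []):
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        visible.update(distance)
    return visible


def source_and_target_only(flows, start):
    """Starting asset plus its original sources and final targets.

    Assumption: the search is bounded by MAX_HOPS in each direction.
    """
    flows = list(flows)
    has_incoming = {dst for _, dst in flows}
    has_outgoing = {src for src, _ in flows}
    upstream = within_hops(flows, start, MAX_HOPS, "upstream")
    downstream = within_hops(flows, start, MAX_HOPS, "downstream")
    sources = {n for n in upstream if n != start and n not in has_incoming}
    targets = {n for n in downstream if n != start and n not in has_outgoing}
    return {start} | sources | targets


# Usage: a linear flow a -> b -> c -> s -> d -> e -> f with starting asset s.
flows = [("a", "b"), ("b", "c"), ("c", "s"), ("s", "d"), ("d", "e"), ("e", "f")]
one_hop_both = within_hops(flows, "s", 1)
ends_only = source_and_target_only(flows, "s")
```

A breadth-first search fits the Asset range option because it visits assets in order of hop distance, so stopping at the hop limit yields exactly the nearest assets in the chosen direction.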

Asset attributes

Apart from modifying the size of your lineage, you can choose which assets to display. You can filter by asset type, technology, or asset attributes such as tags or assigned business terms.

Lineage asset groups

Lineage asset groups are marked with tags so that they are easy to identify. By default, all groups are displayed. To hide a group, set it to off. You can filter the following asset groups:

  • Deduced: A deduced asset is an inferred object that is created by the system when it encounters references to unknown or missing components during data lineage extraction. Deduced assets are created to fill the gaps in an incomplete lineage.
  • Operational: An operational asset is a job, process, or step that interacts with data while the data is in motion.
  • Transforming: A transforming asset is a type of operational asset that changes data by altering its values, structure, or lifecycle.
  • Temporary: Temporary assets refer to assets such as temporary data sets or temporary, global temporary, or volatile tables.
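The following Python sketch shows one way to model the group toggles. It assumes that each asset carries a hypothetical set of group tags (a transforming asset, for example, is tagged both Operational and Transforming because it is a type of operational asset) and that an asset is hidden when any of its groups is set to off. This hiding rule is an assumption for illustration only.

```python
GROUPS = ("Deduced", "Operational", "Transforming", "Temporary")


def visible_assets(assets, toggles):
    """Names of assets that remain visible under the group toggles.

    assets: dict of asset name -> set of group tags (hypothetical model).
    toggles: dict of group -> bool; True means shown. Groups that are not
    listed default to shown, matching the documented default.
    """
    hidden = {group for group in GROUPS if not toggles.get(group, True)}
    # Assumption: an asset is hidden if any of its groups is switched off.
    return {name for name, groups in assets.items() if not (groups & hidden)}


# Usage: hide deduced and temporary assets, keep everything else.
assets = {
    "orders": set(),
    "etl_job": {"Operational"},
    "aggregate_step": {"Operational", "Transforming"},
    "tmp_orders": {"Temporary"},
    "unknown_table": {"Deduced"},
}
shown = visible_assets(assets, {"Deduced": False, "Temporary": False})
```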

Monitoring and managing lineage

Go to the Monitor and manage tab on the Data lineage page to view lineage usage, including the current number of assets and tables. You can also see lineage usage by technology and by data source definition.

Deleting data lineage

Required permission

To delete lineage, you must have the following user permission:

  • Manage data lineage

You can delete your lineage on the Monitor and manage page. In the Usage by data source definition table, open the action menu for the data source definition whose lineage you want to delete, and select Delete lineage. Specify the lineage versions to delete. The date that you select refers to the time when the lineage was imported.

When you delete lineage, all assets that are included in this lineage are removed from the lineage repository. Related data source definitions and metadata imports are not affected. If a related metadata import job is running, the job is stopped and the lineage assets are removed. In some cases, not all assets are removed when you delete lineage while the job is running. To avoid this, wait for the job to finish before you delete the lineage. Alternatively, delete the lineage while the job is running, and if any assets remain after the deletion is complete, delete the remaining assets.

You can also delete lineage for multiple data source definitions at the same time.
