Getting data lineage (IBM Knowledge Catalog)

Capture end-to-end data lineage for assets on various data sources. Add business data lineage to assets in your catalogs and access detailed technical data lineage in MANTA Automated Data Lineage.

Lineage depicts the lifecycle of a unit of data, such as a table or a column, and indicates where the data comes from and how the data changes as it moves between data stores of any type.

This import option is not available in projects that are marked as sensitive.

Before you import metadata, design your metadata import so that you understand all your options and make the appropriate choices for your goals. For more information, see Designing metadata imports.

You can also use APIs instead of the user interface to retrieve the list of supported connections or to create a metadata import asset. The links to these APIs are listed in the Learn more section.

To run metadata enrichment on data assets that were added with a lineage import, you must make the data assets available in a project. For more information, see Adding catalog assets to a project.

Asset types
Data assets.
COBOL copybooks.
Transformation scripts.
See Asset types created through metadata import.
Supported connections
See the Metadata import (lineage) column in Supported connectors.
Required permissions
To create, manage, and run a metadata import, you must have these roles and permissions:
  • The Manage asset discovery user permission.
  • The Admin or the Editor role in the project.
  • The Admin or the Editor role in the catalog to which you want to import the assets.
  • Access to the connections to the data sources of the data assets to be imported and the SELECT or a similar permission on the corresponding databases.

Creating a metadata import asset and importing lineage metadata

To create a metadata import asset and a job for importing metadata into a catalog, follow these steps:

  1. Open a project, go to the project's Asset page and click New asset > Metadata Import.

  2. Select the option Get lineage. If you don't see this option, the Advanced metadata import feature is not enabled or no license key is installed. For more information, see Installed features and license requirements.

  3. Specify a name for the metadata import. Optionally, you can provide a description.

  4. Optional: Select tags to be assigned to the metadata import asset to simplify searching. You can create new tags by entering the tag name and pressing Enter.

  5. Select the target catalog for the import. You can pick any catalog that is available to you. However, make sure that duplicate asset handling for the target catalog is set to update the original asset.

  6. Define a scope for the lineage metadata import. For more information, see Scope of import.

    You can select connections that exist in the project as the source of the data, but you can also click Create a new connection and create a connection asset. You can import metadata and lineage from the data sources that are listed in Supported connectors.

    1. Select the connections from which you want to import metadata and lineage.

      You can also provide HiveQL scripts or BTEQ (Basic Teradata Query) scripts as input to accompany a connection for importing lineage from an Apache Hive or Teradata data source.

      Apache Hive
      You can provide HiveQL scripts as input. Create a .zip archive of your HiveQL scripts and add that .zip file to your project. Then, select the .zip file when you define the scope of your metadata import.
      Teradata
      You can provide BTEQ scripts as input. Create a .zip archive of your BTEQ scripts. The folder structure within the .zip file must be bteq/<database_name>/bteq_scripts, where database_name is optional. Add that .zip file to your project. Then, select the .zip file when you define the scope of your metadata import.
    2. Review the selected scope. If you selected an input file, select the corresponding data source from Technology tool to make the input type known.

      You can directly delete connections, schemas, or input files from the data scope, or you can rework the entire scope by clicking Edit data scope.

    3. When you're done refining the data scope, click Next.

  7. Define whether you want to run scheduled import jobs. If you don't set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time. See Scheduling options.

  8. Optional: Customize the import behavior. You can choose to prevent specific properties from being updated and to delete existing assets that are not included in the reimport. For more information, see Advanced import options.

  9. Review the metadata import configuration. To make changes, click the edit icon on the tile and update the settings.

  10. Click Create. The metadata import asset is added to the project, and a metadata import job is created. If you didn't configure a schedule, the import is run immediately. If you configured a schedule, the import runs on the defined schedule.

    Important: Assets from the same connection that were already imported through a different metadata import are not imported again but are updated. Such assets no longer show up in the metadata import that originally imported them. Only the most recently run metadata import contains the assets.
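
The archive layout that step 6 describes for BTEQ scripts can be prepared with a short script before you add the file to your project. The following Python sketch is illustrative only: it assumes that bteq_scripts in the documented structure stands for the script files themselves, and the function names, the .bteq file extension, and the sample paths are hypothetical. If your environment expects a literal bteq_scripts folder, adjust the member path accordingly.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import Iterable, List, Optional


def archive_member_name(script: Path, database_name: Optional[str] = None) -> str:
    """Return the path that a script gets inside the .zip file.

    Assumed layout: bteq/<database_name>/<script file>, where the
    database_name folder is left out when no name is given.
    """
    parts = ["bteq"]
    if database_name:
        parts.append(database_name)
    parts.append(script.name)
    # .zip member names always use forward slashes, on every platform.
    return str(PurePosixPath(*parts))


def build_bteq_archive(
    scripts: Iterable[Path],
    zip_path: Path,
    database_name: Optional[str] = None,
) -> List[str]:
    """Write the scripts into zip_path and return the member names."""
    member_names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for script in scripts:
            member = archive_member_name(script, database_name)
            archive.write(script, arcname=member)
            member_names.append(member)
    return member_names


if __name__ == "__main__":
    # Hypothetical input folder; replace with the location of your scripts.
    source_dir = Path("bteq_sources")
    found = sorted(source_dir.glob("*.bteq"))
    for member in build_bteq_archive(found, Path("bteq_input.zip"), "sales_db"):
        print(member)
```

For HiveQL scripts, no folder structure is stated, so a plain .zip archive of the script files is enough.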

When the import is complete, you can see the list of imported assets with the following information:

  • The asset name, which provides a link to the asset in the catalog.
  • The asset type, such as Data or Report. For data assets, the format, such as Relational table, is also shown. For other asset types, the format column shows a dash (—).
  • The asset context, such as the parent or file path.
  • The date and time that the asset was last imported.
  • The import status, which can be Imported for successfully imported data, In progress, or Removed if the asset couldn't be reimported.
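
If you track import results outside the user interface, the columns above map naturally onto a small record type. The following Python sketch is a hypothetical model, not a product API: the class name, field names, and sample rows are assumptions made for illustration, and only the three status values come from the list above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ImportedAsset:
    """One row of the imported-assets list (hypothetical model)."""
    name: str
    asset_type: str               # for example "Data" or "Report"
    asset_format: Optional[str]   # None where the list shows a dash
    context: str                  # parent or file path
    last_imported: str            # date and time of the last import
    status: str                   # "Imported", "In progress", or "Removed"


def count_by_status(assets: Iterable[ImportedAsset]) -> Counter:
    """Count how many assets are in each import status."""
    return Counter(asset.status for asset in assets)


def removed_assets(assets: Iterable[ImportedAsset]) -> List[str]:
    """Return the names of assets that couldn't be reimported."""
    return [asset.name for asset in assets if asset.status == "Removed"]


# Illustrative sample rows, not real import results.
sample = [
    ImportedAsset("ORDERS", "Data", "Relational table", "/SALES", "2024-01-01 10:00", "Imported"),
    ImportedAsset("CUSTOMERS", "Data", "Relational table", "/SALES", "2024-01-01 10:00", "Removed"),
    ImportedAsset("Revenue", "Report", None, "/reports", "2024-01-01 10:00", "In progress"),
]

if __name__ == "__main__":
    print(count_by_status(sample))
    print(removed_assets(sample))
```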

Lineage imports are long-running processes. Don't expect immediate results. When the import is complete, the imported assets and their business data lineage are available in the catalog you selected as the target. The imported lineage is available on the asset's Lineage tab. Extra lineage information is available in MANTA Automated Data Lineage. You can access that information through the Go to asset's technical data lineage link in the About the asset panel.

Depending on the outcome of the metadata import job run, a completion message or an error notification is displayed.

A completion message is displayed when the job run completed successfully, completed with warnings, or completed with errors. An error notification is displayed if the entire job run failed. Either type of notification contains a link to the job run log that provides details about the specific job run.
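
The rule in the preceding paragraph can be summarized in a few lines of Python. This is a sketch only: the enumeration and function names are hypothetical and do not correspond to product identifiers. It shows that only a failure of the entire job run produces an error notification, while every other outcome produces a completion message.

```python
from enum import Enum


class RunOutcome(Enum):
    """Possible results of a metadata import job run (hypothetical names)."""
    COMPLETED = "completed successfully"
    COMPLETED_WITH_WARNINGS = "completed with warnings"
    COMPLETED_WITH_ERRORS = "completed with errors"
    FAILED = "failed"


def notification_type(outcome: RunOutcome) -> str:
    """Return which kind of notification is displayed for an outcome."""
    if outcome is RunOutcome.FAILED:
        return "error notification"
    # Warnings and errors within a completed run still yield a completion message.
    return "completion message"


if __name__ == "__main__":
    for outcome in RunOutcome:
        print(f"{outcome.value}: {notification_type(outcome)}")
```

Because a run that completed with errors still produces a completion message, check the linked job run log rather than relying on the notification type alone.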

Learn more
