Getting ETL job lineage (Watson Knowledge Catalog)

Capture end-to-end data lineage for ETL jobs. Add business data lineage to data integration assets and, optionally, to imported data assets in your catalogs, and access detailed technical data lineage in MANTA Automated Data Lineage.

Lineage for ETL jobs depicts the order of activities within a job, optionally including the database tables that the job reads from or writes to. Lineage also shows the flow of data to or from a selected data asset, through a job, into databases and business intelligence (BI) reports.

The import option is not available for projects that are marked as sensitive.

Before you import metadata, design your metadata import to ensure that you understand all your options and make appropriate choices for your goals. For more information, see Designing metadata imports.

As an alternative to the user interface, you can use APIs to retrieve the list of supported connections or to create a metadata import asset. The links to these APIs are listed in the Learn more section.

To run metadata enrichment on data assets that were added with an ETL job lineage import, make the data assets available in a project. For more information, see Adding catalog assets to a project.

Asset types

The import creates data integration assets that represent components of ETL jobs and, optionally, data assets that serve as the source or target in an ETL job. For more information, see Asset types created through metadata import.

These asset types are created starting in Cloud Pak for Data 4.7.2.
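
Because the behavior differs by release, a script that automates checks around this import might first compare the deployment version against 4.7.2. The following is a minimal sketch, assuming the version is available as a dotted string; the function names are hypothetical and not part of any Cloud Pak for Data API.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '4.7.2' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def creates_catalog_assets(version: str) -> bool:
    """Return True if the release is 4.7.2 or later.

    Hypothetical helper: it only encodes the version boundary stated above.
    """
    return parse_version(version) >= (4, 7, 2)


if __name__ == "__main__":
    for release in ("4.7.0", "4.7.1", "4.7.2", "4.8.0"):
        print(release, creates_catalog_assets(release))
```

Comparing integer tuples instead of raw strings avoids ordering mistakes such as treating 4.7.10 as earlier than 4.7.2.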

Supported connections

See the Metadata import (lineage) column and the Other data sources section in Supported connectors.

Required permissions

To create, manage, or run a metadata import, you must have the following roles and permissions:

- The Manage asset discovery user permission.
- The Admin or the Editor role in the project.
- The Admin or the Editor role in the catalog to which you want to import the assets.
- Access to the connections to the data sources of the data assets to be imported and the SELECT or a similar permission on the corresponding databases.

Prerequisites

Before you generate and import an ETL job lineage, complete the prerequisite tasks:

- For InfoSphere DataStage, Talend, or Informatica PowerCenter ETL jobs, create an ETL job file and upload it to your project. For more information, see Preparing ETL job files.
- For DataStage flows (DataStage on Cloud Pak for Data), select flows from your project.

Creating a metadata import asset and generating or importing lineage metadata for ETL jobs

To create a metadata import asset and a job for generating and importing technical and lineage metadata for ETL jobs:

Import results

Lineage imports are long-running processes, so results are not available immediately.

Import results in Cloud Pak for Data 4.7.0 and 4.7.1

When the import job is complete, lineage information is available in MANTA Automated Data Lineage. You can access that information in the MANTA Automated Data Lineage UI. Depending on the type of the ETL job, you might need different information:

Get the necessary information:
- For DataStage flows, you need the ID of your metadata import asset to locate the lineage information. To identify this ID, open the metadata import asset. The asset ID is part of the URL: https://<hostname>/gov/metadata-imports/<asset_ID>?project_id=<project_ID>.
- For legacy DataStage or Talend ETL jobs, you need the asset ID of the ETL job file that is used for the import. To identify this ID, open the ETL job file by clicking its name on the project's Assets page. The asset ID is part of the URL: https://<hostname>/projects/<project_ID>/data-assets/<asset_ID>.
To open the lineage viewer in MANTA Automated Data Lineage, open the metadata import asset and click the link in the result area. Alternatively, you can enter the following URL in a new browser window. Replace <hostname> with the hostname of your Cloud Pak for Data deployment.
```
https://<hostname>/manta-dataflow-server/viewer
```
Expand the entry for your data source, for example, DataStage.
Locate the entry for your metadata import asset or your ETL job file, which is the asset ID with the suffix _lineage, and expand it.
Select the elements for which you want to view lineage information and click Visualize.
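
If you copy the asset URL from the browser, the ID lookup described in these steps can be scripted. The following is a minimal sketch, assuming the two URL shapes shown above; the helper names and the sample host and IDs are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def asset_id_from_url(url: str) -> str:
    """Return the path segment that follows 'metadata-imports' or 'data-assets'."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    for marker in ("metadata-imports", "data-assets"):
        if marker in segments:
            index = segments.index(marker)
            if index + 1 < len(segments):
                return segments[index + 1]
    raise ValueError(f"No asset ID found in URL: {url}")


def lineage_entry_name(asset_id: str) -> str:
    """Name of the entry to look for in the viewer: the asset ID plus '_lineage'."""
    return f"{asset_id}_lineage"


def viewer_url(hostname: str) -> str:
    """URL of the MANTA Automated Data Lineage viewer for a given hostname."""
    return f"https://{hostname}/manta-dataflow-server/viewer"


if __name__ == "__main__":
    # Hypothetical example values; replace them with your own URL and hostname.
    url = "https://cpd.example.com/gov/metadata-imports/abc-123?project_id=p-1"
    asset_id = asset_id_from_url(url)
    print(lineage_entry_name(asset_id))
    print(viewer_url("cpd.example.com"))
```

The query string (`?project_id=...`) is ignored because only the URL path is inspected.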

Import results in Cloud Pak for Data 4.7.2 and later

When the import is complete, you can view the list of imported assets with the following information:

- The asset name, which provides a link to the asset in the catalog.
- The asset type, such as Data integration job. For data assets, also the format, such as Relational table, is displayed.
- The date and time that the asset was last imported.
- The import status, which can be Imported for successfully imported data, In progress, or Removed if the asset couldn't be reimported.

When the import is complete, the imported assets and their business data lineage are available in the catalog that you selected as the target. The imported lineage is available on the asset's Lineage tab. Extra lineage information is available in MANTA Automated Data Lineage. You can access that information through the Go to asset's technical data lineage link in the About the asset panel.

Depending on the outcome of the metadata import job run, a completion message or an error notification is displayed.

A completion message is displayed when the job run completed successfully, completed with warnings, or completed with errors. An error notification is displayed if the entire job run failed. Either type of notification contains a link to the job run log that provides details about the specific job run.
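
The rule for which notification appears can be summarized in a few lines of code. This is a minimal sketch of the mapping described above; the outcome labels are assumed strings chosen for illustration, not values returned by the product.

```python
# Assumed outcome labels (hypothetical); the product may report them differently.
COMPLETION_OUTCOMES = {
    "completed",
    "completed with warnings",
    "completed with errors",
}


def notification_type(outcome: str) -> str:
    """Map a job run outcome to the kind of notification that is displayed."""
    normalized = outcome.strip().lower()
    if normalized in COMPLETION_OUTCOMES:
        return "completion message"
    if normalized == "failed":
        return "error notification"
    raise ValueError(f"Unknown job run outcome: {outcome!r}")


if __name__ == "__main__":
    for outcome in ("Completed", "Completed with errors", "Failed"):
        print(outcome, "->", notification_type(outcome))
```

Note that a run that completed with errors still produces a completion message; only a run that failed entirely produces an error notification.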

Learn more

Next steps
