Getting ETL job lineage (IBM Knowledge Catalog)
Capture end-to-end data lineage for ETL jobs. Add business data lineage to data integration assets and, optionally, to imported data assets in your catalogs, and access detailed technical data lineage in MANTA Automated Data Lineage.
Lineage for ETL jobs depicts the order of activities within a job, optionally including the database tables that the job reads from or writes to. In addition, it shows the flow of data to or from a selected data asset through a job into databases and business intelligence (BI) reports.
This import option is not available in projects that are marked as sensitive.
Before you import metadata, design your metadata import so that you understand all your options and make the appropriate choices for your goals. For more information, see Designing metadata imports.
You can also use APIs instead of the user interface to retrieve the list of supported connections or to create a metadata import asset. The links to these APIs are listed in the Learn more section.
If you want to run metadata enrichment on data assets that were added with an ETL job lineage import, make the data assets available in a project. For more information, see Adding catalog assets to a project.
- Asset types
  Data integration assets that represent components of ETL jobs. Optionally, data assets that serve as source or target in an ETL job. For more information, see Asset types created through metadata import.
- Supported connections
  For more information, see the Metadata import (external lineage) column and the Other data sources section in Supported connectors.
- Required permissions
  To create, manage, and run a metadata import, you must have these roles and permissions:
  - The Manage asset discovery user permission.
  - The Admin or the Editor role in the project.
  - The Admin or the Editor role in the catalog to which you want to import the assets.
  - Access to the connections to the data sources of the data assets to be imported and the SELECT or a similar permission on the corresponding databases.
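The permission requirements can be read as a pre-flight checklist. The following is a minimal sketch of such a check, not a product API: the `User` class, its fields, and `missing_requirements` are hypothetical names introduced for illustration. Only the role and permission names come from the list above. The SELECT permission on the databases is not modeled.

```python
from dataclasses import dataclass, field


# Hypothetical model of a user's access; not an IBM Knowledge Catalog API.
@dataclass
class User:
    permissions: set = field(default_factory=set)   # platform-level user permissions
    project_role: str = ""                          # role in the project
    catalog_role: str = ""                          # role in the target catalog
    accessible_connections: set = field(default_factory=set)


def missing_requirements(user, required_connections):
    """Return the requirements from the list above that the user does not meet."""
    problems = []
    if "Manage asset discovery" not in user.permissions:
        problems.append("Manage asset discovery user permission")
    if user.project_role not in {"Admin", "Editor"}:
        problems.append("Admin or Editor role in the project")
    if user.catalog_role not in {"Admin", "Editor"}:
        problems.append("Admin or Editor role in the target catalog")
    # Report connections in a stable (sorted) order.
    for name in sorted(set(required_connections) - user.accessible_connections):
        problems.append(f"access to connection {name}")
    return problems
```

An empty list means that, within this simplified model, the user can create, manage, and run the metadata import.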
Prerequisites
Before you can generate and import ETL job lineage, complete the following prerequisite tasks:
- For InfoSphere DataStage, Talend, Informatica PowerCenter, or OpenLineage ETL jobs, create an ETL job file and upload it to your project. For more information, see Preparing ETL job files.
- For DataStage on Cloud Pak for Data projects, export the project to a .zip file and upload it to your project. For more information, see Preparing ETL job files.
- For DataStage flows (DataStage on Cloud Pak for Data), select flows from your project.
Creating a metadata import asset and generating or importing lineage metadata for ETL jobs
To create a metadata import asset and a job for generating and importing technical and lineage metadata for ETL jobs:
-
Open a project, go to the project's Asset page and click New asset > Import metadata for data assets.
-
Select the option Get ETL lineage. If you don't see this option, the Advanced metadata import feature is not enabled and no license key is installed. For more information, see Installed features and license requirements.
-
Specify a name for the metadata import. Optionally, you can provide a description.
-
Optional: Select tags to be assigned to the metadata import asset to simplify searching. You can create new tags by entering the tag name and pressing Enter.
-
Select a catalog from the list as the import target. For this type of import, the target can be only a catalog. For more information, see Scope of import.
If your project is marked as sensitive, you can't create and run ETL job lineage imports.
-
Define a scope for the metadata import. For more information, see Scope of import.
-
Select the ETL job input for the import.
-
Click Select scope to pick an ETL job file from your project for the import. You can select only one file at a time. For more information, see Preparing ETL job files.
An ETL job file is static. You can't update its content for a later rerun of the metadata import. To work with a new version of the ETL job file, you must create a new metadata import.
-
After you select an ETL job file, select the data integration tool to make the proper file structure known.
If you want to capture the lineage of DataStage assets, you have these options:
- To add a DataStage project, click Select scope and select a project file for the import. After you select the file, select the data integration tool to make the proper file structure known. This method is especially useful when you want to work with a DataStage project that is available on a different Cloud Pak for Data instance.
- To add DataStage flows that exist in your project, you have these options:
- To select individual flows, click Select scope and select the flows that you want to import. You can select more than one flow for a single metadata import. All DataStage flow dependencies are automatically included in the scope.
- To select all DataStage flows, use the Select all DataStage flows and their dependencies in the project option. This option includes every DataStage flow in your project in the scope, so you can skip the individual selection. It is not available when you select a DataStage project exported to a .zip file.
-
Optional: Select source and target assets associated with the ETL job to include technical and lineage metadata for them.
You can select connections that exist in the project, or click Create a new connection to create a connection asset. You can import metadata and lineage from the data sources listed in Supported connectors.
-
Review the selected scope.
-
Define whether you want to run scheduled import jobs. If you don't set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time. For more information, see Scheduling options.
-
Optional: Customize the import behavior. You can choose to prevent specific properties from being updated and to delete existing assets that are not included in the reimport. For more information, see Advanced import options.
-
Review the metadata import configuration. To make changes, click the Edit icon on the tile and update the settings.
-
Click Create. The metadata import asset is added to the project, and a metadata import job is created. If you didn't configure a schedule, the import is run immediately. If you configured a schedule, the import runs on the defined schedule.
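Several of the steps above carry constraints that are easy to overlook: the project must not be sensitive, the target must be a catalog, only one file can be selected at a time, and the select-all option cannot be combined with an exported project file. The following is a minimal sketch that collects these rules in one place. It is not product code: `EtlLineageImport`, its fields, `validate`, and `first_run` are hypothetical names, and treating every file-based scope like an exported .zip project is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical configuration model; field names are illustrative only.
@dataclass
class EtlLineageImport:
    name: str
    target_catalog: str
    project_is_sensitive: bool = False
    job_files: List[str] = field(default_factory=list)       # ETL job file or exported project .zip
    selected_flows: List[str] = field(default_factory=list)  # individually selected DataStage flows
    select_all_flows: bool = False
    schedule: Optional[str] = None                            # None means no schedule was set


def validate(cfg):
    """Return a list of rule violations; an empty list means no rule in this sketch is broken."""
    errors = []
    if cfg.project_is_sensitive:
        errors.append("ETL job lineage imports are not available in sensitive projects")
    if not cfg.name:
        errors.append("a name is required")
    if not cfg.target_catalog:
        errors.append("the import target must be a catalog")
    if len(cfg.job_files) > 1:
        errors.append("only one ETL job file can be selected at a time")
    # Assumption: any file-based scope is treated like an exported .zip project here.
    if cfg.job_files and cfg.select_all_flows:
        errors.append("select-all is not available with a file-based scope")
    return errors


def first_run(cfg):
    """Describe when the import first runs after Create is clicked."""
    return "immediately" if cfg.schedule is None else "on the defined schedule"
```

Keeping the rules in a single function makes it straightforward to check a planned import before walking through the wizard.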
Important: If the ETL job or set of DataStage flows was already imported through a different metadata import, it is not imported again but is updated. The data integration assets no longer show up in the initial metadata import. Only the most recently run metadata import contains the assets. The same is true for data assets if you chose to add them to your ETL job lineage import.
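The behavior described in this note can be pictured as a "last import wins" mapping from each asset to a single metadata import. The following toy model is only an illustration of that rule: the function names and the dictionary are hypothetical and say nothing about how the product stores this information.

```python
# Toy model: each asset is listed only under the metadata import that most recently imported it.
def run_import(owner_by_asset, import_name, assets):
    """Record that import_name imported the assets.

    Returns the set of assets that already existed and were therefore updated, not created.
    """
    updated = {a for a in assets if a in owner_by_asset}
    for a in assets:
        owner_by_asset[a] = import_name
    return updated


def assets_listed_in(owner_by_asset, import_name):
    """Return the assets that currently show up in the given metadata import."""
    return sorted(a for a, owner in owner_by_asset.items() if owner == import_name)


owners = {}
run_import(owners, "import_A", ["job_1", "job_2"])
run_import(owners, "import_B", ["job_2"])  # job_2 is updated and moves to import_B
```

After the second call in this sketch, `job_2` is listed only under `import_B`, and `import_A` still lists `job_1`.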
Lineage imports are long-running processes. Don't expect immediate results.
When the import is complete, you can see the list of imported assets with the following information:
- The asset name, which provides a link to the asset in the catalog.
- The asset type, such as Data integration job. For data assets, the format, such as Relational table, is also shown.
- The date and time that the asset was last imported.
- The import status, which can be Imported for successfully imported data, In progress, or Removed if the asset couldn't be reimported.
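If you copy the list of imported assets into a script for reporting, the three status values can be modeled as an enumeration. The following is a sketch under that assumption: the `ImportStatus` class, the `summarize` function, and the (asset name, status label) row format are hypothetical. Only the three status labels come from the list above.

```python
from collections import Counter
from enum import Enum


class ImportStatus(Enum):
    IMPORTED = "Imported"        # successfully imported
    IN_PROGRESS = "In progress"
    REMOVED = "Removed"          # the asset couldn't be reimported


def summarize(rows):
    """Count assets per status; rows is an iterable of (asset_name, status_label) pairs."""
    return Counter(ImportStatus(label) for _, label in rows)
```

An unknown label raises `ValueError`, which surfaces typos or unexpected status values early.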
When the import is complete, the imported assets and their business data lineage are available in the catalog you selected as target. The imported lineage is available on the asset's Lineage tab. Additional lineage information is available in MANTA Automated Data Lineage. You can access that information through the Go to asset's technical data lineage link in the About the asset panel.
Depending on the outcome of the metadata import job run, a completion message or an error notification is displayed.
A completion message is displayed when the job run completed successfully, completed with warnings, or completed with errors. An error notification is displayed if the entire job run failed. Either type of notification contains a link to the job run log that provides details about the specific job run.
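The mapping from job run outcome to notification type can be summarized in a few lines. This is a sketch only: the outcome strings and the function name are hypothetical labels chosen to mirror the wording above, not values returned by the product.

```python
# Hypothetical outcome labels that mirror the wording of the paragraph above.
COMPLETED_OUTCOMES = {"completed", "completed with warnings", "completed with errors"}


def notification_type(outcome):
    """Return which kind of notification is displayed for a job run outcome."""
    if outcome in COMPLETED_OUTCOMES:
        return "completion message"
    if outcome == "failed":
        return "error notification"
    raise ValueError(f"unknown job run outcome: {outcome}")
```

Note that a run that completed with errors still produces a completion message. Only a run that failed entirely produces an error notification, so check the job run log in either case.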
Learn more
Next steps