Information asset lineage reports (Watson Knowledge Catalog)
You can create reports that analyze the flow of data from data sources, through jobs and stages, and into databases, data files, business intelligence reports, and other assets. In addition, you can create a business lineage report that displays only the flow of data, without the details of a full data lineage report. You can use lineage report templates to create lineage reports. You must have the Access information assets permission to view lineage reports.
- Required permissions
  You must have these user permissions to create asset lineage reports:
  - Access advanced governance. Otherwise, the menu options for information asset lineage administration are hidden.
  - Access information assets
Data lineage reports show the movement of data through a job or multiple jobs. These reports can show the order of activities within a run of a job. Business lineage reports show a simplified view of lineage that highlights the transformation and aggregation of data that is needed by a business user. Business lineage reports do not show jobs and mapping specification asset types.
When you run reports, they display information assets in the context of your enterprise goals. You see them not as isolated database tables, database columns, jobs, or stages, but as integrated parts of the process that extracts, loads, investigates, cleanses, transforms, and reports on your data. Your lineage reports can also include virtual assets, which represent data sources that were not imported or created in the catalog, but are accessed by a job.
On request, lineage reports can show the impact dependencies in addition to data flows. Impacted assets are assets that are influenced by processes other than data flow. Such processes include job scheduling, job optimization management, and rule invocation.
You can run these types of reports:
- Data Lineage
  Data lineage reports can show different types of information:
  - The flow of data to or from a selected information asset, through stages and stage columns, through one or more jobs, into databases and business intelligence (BI) reports.
  - The order of activities within a job run, including the database tables that the jobs write to or read from.
- Business Lineage
  Business lineage reports do not display extension mapping documents or jobs from IBM InfoSphere DataStage and QualityStage. Data still flows through assets that are not displayed in the report. The business lineage report displays the graphical and textual components for only source, target, and intermediate assets that are configured to be included in business lineage.
Assets included in the lineage reports
Jobs and mapping specifications are excluded from lineage reports by default. Assets of all other types, including assets from registered asset bundles, are included in lineage reports by default. When you include or exclude an asset type for a report, the child assets of that asset type are also included in or excluded from the report. Virtual assets are also displayed in a lineage report. A virtual asset is an asset that was not imported or created in the catalog but is accessed by a job.
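The default inclusion rules described above can be pictured with a small model. This is only an illustrative sketch, not the catalog's implementation: the `Asset` class, the `visible_assets` function, and the asset-type strings are hypothetical names invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical type names; the catalog's internal identifiers may differ.
DEFAULT_EXCLUDED_TYPES = frozenset({"job", "mapping_specification"})


@dataclass
class Asset:
    name: str
    asset_type: str
    is_virtual: bool = False
    children: list[Asset] = field(default_factory=list)


def visible_assets(roots, excluded_types=DEFAULT_EXCLUDED_TYPES):
    """Return the names of the assets that this model would show in a report."""
    shown = []
    for asset in roots:
        # Virtual assets are always shown; other assets follow the type filter.
        if not asset.is_virtual and asset.asset_type in excluded_types:
            # Excluding an asset type also excludes its child assets.
            continue
        shown.append(asset.name)
        shown.extend(visible_assets(asset.children, excluded_types))
    return shown
```

In this model, passing an empty set as `excluded_types` corresponds to including every asset type, so jobs and their stages reappear in the result.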
Data lineage and business lineage reports
To see whether an asset type and its child assets can be included in business lineage reports, go to the Details page of the asset.
You can run data lineage and business lineage reports on the following assets and their child assets.
Data lineage reports
You can run data lineage, but not business lineage, reports on the following assets:
At times, you might not want, or might not be able, to import some data sources into the catalog as catalog assets. In this case, virtual assets let the report show uninterrupted lineage, even through data sources that are not imported. As a result, the display of virtual assets provides a complete lineage representation despite incomplete metadata in the catalog.
Virtual assets display information about a data source from the properties of a stage when no corresponding data source exists in the catalog. If a matching data source does exist in the catalog, the actual data source, rather than the virtual asset, is displayed in the lineage report.
For design lineage, all parameters that are used in the data source identity definitions of stages should have realistic default values. This best practice applies to job parameters and to environment variables that are used as project-level parameters. When default values are missing, lineage reports display the job as linked to virtual data sources.
Virtual assets are named according to how the stages address them. When the data source name is given explicitly or when default values exist, the virtual asset name is that data source name. When no default values exist, the virtual asset name is the parameter name enclosed in pound (#) signs, for example #some_param_name#.
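The naming rule for virtual assets can be sketched as a small function. This is a minimal illustration under stated assumptions, not the product's code: the function name `virtual_asset_name` and the `defaults` dictionary are hypothetical, and the sketch assumes that a stage property refers to a parameter as `#name#`.

```python
import re

# Matches a parameter reference of the form #param_name#.
PARAM_REF = re.compile(r"#([^#]+)#")


def virtual_asset_name(identity: str, defaults: dict) -> str:
    """Derive a virtual asset name from a stage's data source identity.

    identity: the data source name as written in the stage properties,
              either a literal name or a #param# reference (assumed form).
    defaults: parameter name -> default value (missing or empty = no default).
    """
    def substitute(match):
        default = defaults.get(match.group(1))
        # With a default value, the name resolves to that value;
        # without one, the #param# reference is kept as the name.
        return default if default else match.group(0)

    return PARAM_REF.sub(substitute, identity)
```

For example, `virtual_asset_name("#db_name#", {"db_name": "SALES_DB"})` resolves to the data source name, while the same call with an empty `defaults` dictionary leaves `#db_name#` unchanged, which is why realistic default values give more readable lineage reports.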
You need to map data connection objects that are found and extracted from stage properties to databases that exist in the catalog. If this mapping is not done, stages and their jobs are displayed as connecting to a virtual asset rather than to an imported database.
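The effect of the mapping can be sketched as a lookup. The example is hypothetical throughout: `resolve_data_source`, the connection names, and the dictionary-based mapping are illustrative stand-ins, not the catalog's API.

```python
def resolve_data_source(connection, connection_to_database, catalog_databases):
    """Decide what this model shows for a stage's data connection.

    connection:             data connection name extracted from stage properties
    connection_to_database: mapping from connection names to database names
    catalog_databases:      set of database names that exist in the catalog
    """
    mapped = connection_to_database.get(connection)
    if mapped in catalog_databases:
        # The connection is mapped to an imported database: show the real asset.
        return ("catalog", mapped)
    # No mapping, or the mapped database is not in the catalog: show a virtual asset.
    return ("virtual", connection)
```

A connection with no entry in the mapping, or one mapped to a database that was never imported, ends up in the `"virtual"` branch, which mirrors the behavior described above.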
When you use virtual assets, consider these limitations:
- You cannot search for virtual assets.
- Virtual assets are only displayed in lineage reports and in the Usage Information section of the Details page of jobs, stages, and stage columns.
- Lineage filtering cannot hide virtual assets.
- You cannot exclude virtual assets from a business lineage report.
The icons for virtual assets are lighter in color than the icons that are used for assets in the catalog.
Running lineage reports
To run a lineage report, complete these steps:
- Go to Catalogs > Information assets.
- Search for the asset for which you want to run a report and open its details page.
- From the menu, select either Data Lineage Viewer or Business Lineage Viewer.