Data lineage and business lineage reports

Data lineage reports show the movement of data through a job or multiple jobs. These reports can show the order of activities within a run of a job. Business lineage reports show a simplified view of lineage that highlights the transformation and aggregation of data that is needed by a business user. Business lineage reports do not show jobs and mapping specification asset types.

When you run reports, they display information assets in the context of your enterprise goals. You see them not as isolated database tables, database columns, jobs, or stages, but as integrated parts of the process that extracts, loads, investigates, cleanses, transforms, and reports on your data. Your lineage reports can also include virtual assets, which represent data sources that were not imported or created in the catalog, but are accessed by a job.

On request, lineage reports can show impact dependencies in addition to data flows. For example, you can use the Balanced Optimization process in IBM® InfoSphere® DataStage® and QualityStage® Designer to analyze a root job and then create an optimized job that does the same thing, but with better performance and resource utilization. The root job is linked to the optimized job so that impact flow, when enabled, shows the dependency.

Report types

You can run these types of reports:
Data Lineage
Data lineage reports can show different types of information:
  • The flow of data to or from a selected information asset, through stages and stage columns, through one or more jobs, into databases and business intelligence (BI) reports.
  • The order of activities within a job run, including the database tables that the jobs write to or read from.
Business Lineage

Business lineage reports do not display extension mapping documents or jobs from IBM InfoSphere DataStage and QualityStage. Data still flows through assets that are not displayed in the report.

The Information Governance Catalog Information Asset Administrator configures which information assets are displayed in business lineage reports. The business lineage report displays the graphical and textual components for only source, target, and intermediate assets that are configured to be included in business lineage.
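The idea that data still flows through assets that the business lineage report does not display can be pictured as collapsing a flow graph. The following sketch is a conceptual illustration only, not how Information Governance Catalog computes business lineage. The function name `business_view`, the asset names, and the edge-list representation are all hypothetical.

```python
def business_view(edges, displayed):
    """Collapse a data-flow graph so that only displayed assets remain.

    edges: iterable of (source, target) pairs describing data flow.
    displayed: set of asset names configured for business lineage.
    Returns a sorted list of (source, target) pairs between displayed
    assets that are connected directly or through hidden assets only.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    collapsed = set()
    for start in displayed:
        # Walk forward from each displayed asset, passing through hidden ones.
        stack = list(successors.get(start, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in displayed:
                # Stop at the first displayed asset on this path.
                if node != start:
                    collapsed.add((start, node))
            else:
                stack.extend(successors.get(node, ()))
    return sorted(collapsed)


# Hypothetical assets: the job and the mapping document are hidden.
flows = [
    ("CUSTOMER_TABLE", "load_job"),
    ("load_job", "WAREHOUSE_TABLE"),
    ("WAREHOUSE_TABLE", "mapping_doc"),
    ("mapping_doc", "SALES_REPORT"),
]
shown = {"CUSTOMER_TABLE", "WAREHOUSE_TABLE", "SALES_REPORT"}
print(business_view(flows, shown))
```

In this example the job and the mapping document disappear from the view, but the tables and the report remain connected because the data passes through the hidden assets.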

Users of IBM Glossary Anywhere and of external programs such as IBM Cognos® can create a business lineage report for an asset. The user must have at least the Information Governance Catalog User role. The report is displayed in a new window in the web browser.

Information latency in lineage reports

IBM InfoSphere Information Governance Catalog analyzes and indexes lineage flows to make them available for lineage reports on subsequent requests. When changes in lineage-impacting assets occur, InfoSphere Information Governance Catalog analyzes them and updates the Flows area of the metadata repository with the current flows. The process of analysis and storage is called Flow Publishing. The Flows area is used for efficient lineage reporting, but it is not directly accessible to InfoSphere Information Governance Catalog. Therefore, any changes in the catalog might take some time before they are reflected in your lineage report.

This delay applies to all assets whose content is relevant for lineage reports, such as:
  • Designs of jobs that are included for lineage
  • Operational metadata from runs of jobs that are included for lineage
  • Extension mapping documents
  • Mapping specifications that are included for lineage
  • Database views
  • BI reports and BI models
  • MDM models
  • Data connection mappings
  • Database schema identity mappings
  • Manual stage bindings
The maximum amount of latency is the sum of the following factors:
Polling interval
The minimum interval is 30 seconds.
You can change the polling interval in the Lineage Administration page (Administration > Lineage Management > Lineage Administration > Lineage Configuration).
Number and complexity of changed assets
Flow publication of a change might take 1 to 30 seconds for each asset, depending on its complexity.
Current publication request
The flow publication that is currently being serviced must finish before the next one starts.
Number of publication requests in the queue
Flow publication occurs sequentially. As a result, a publication might be queued.
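As a rough worked example, the sum of these factors can be estimated as follows. This is a minimal sketch, not a product API: the function name `max_lineage_latency_s` and its parameters are hypothetical. It assumes the worst case of 30 seconds per changed asset, and it assumes you supply your own estimates for the request that is being serviced and for each queued request.

```python
MIN_POLLING_INTERVAL_S = 30  # minimum polling interval stated above
MAX_PER_ASSET_S = 30         # upper end of the 1 to 30 second range per asset


def max_lineage_latency_s(polling_interval_s, changed_assets,
                          current_request_s=0, queued_requests_s=()):
    """Estimate the worst-case delay, in seconds, before a change
    appears in a lineage report.

    polling_interval_s: configured polling interval.
    changed_assets: number of changed lineage-relevant assets.
    current_request_s: assumed remaining time of the publication
        that is being serviced (hypothetical input).
    queued_requests_s: assumed durations of publications waiting
        in the queue (hypothetical input).
    """
    if polling_interval_s < MIN_POLLING_INTERVAL_S:
        raise ValueError("polling interval cannot be less than 30 seconds")
    if changed_assets < 0:
        raise ValueError("changed_assets cannot be negative")
    publication_s = changed_assets * MAX_PER_ASSET_S
    # Publication is sequential, so the queued durations simply add up.
    return (polling_interval_s + publication_s
            + current_request_s + sum(queued_requests_s))


# 60 s polling, 10 changed assets, 20 s left on the current request,
# and two queued requests estimated at 45 s and 90 s.
print(max_lineage_latency_s(60, 10, current_request_s=20,
                            queued_requests_s=[45, 90]))  # prints 515
```

Because each asset can take as little as 1 second, the actual delay is often much shorter than this upper bound.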

You can monitor tasks that are pending or completed in the Lineage Administration page (Administration > Lineage Management > Lineage Administration > Monitor Lineage Tasks). When there are pending tasks, lineage reports might not reflect the latest changes in the catalog.