Data lineage, business lineage, and impact analysis reports

Data lineage and impact analysis reports show the movement of data within a job or through multiple jobs. These reports also show the order of activities within a run of a job. Business lineage reports show a scaled-down view of lineage that omits the detail that a business user does not need. Impact analysis reports show the dependencies between assets.

You can start a report from the task list of an asset information page or from the right-click menu of an asset in a results list. You can track the flow from source to target or from target to source.

When you run reports, the metadata workbench displays information assets in the context of your enterprise goals. You see them not as isolated tables, columns, jobs, or stages, but as integrated parts of the process that extracts, loads, investigates, cleanses, transforms, and reports on your data.

Before you run reports, the Metadata Workbench Administrator must run the Manage Lineage utility and, where necessary, manual linking actions to set relationships between assets. To report on relationships that are created by operational metadata, you must first import operational metadata.

If a report does not return the expected results, take the following actions:
  • Ensure that the Manage Lineage utility was run.
  • Browse to the asset information page of each information asset that you suspect is not properly linked. Expand the Design, Operational, and User-Defined information sections to validate that the correct relationships are set.
  • Perform manual linking actions to set the necessary relationships.

Report types

You can run these types of reports:
Data Lineage and Impact Analysis
Both data lineage and impact analysis reports can show different types of information:
  • The flow of data to or from a selected metadata asset, through stages and stage columns, through one or more jobs, into databases and business intelligence (BI) reports.
    For example, a data lineage report might start with a database column that is read by a stage in a job. The report might show the following flow of data at the column level:
    • A stage in the first job reads the database column
    • Information flows through one or more stages in the first job until one of the stages writes to a table in a second database
    • A stage in a second job reads the database column in the second database
    • Information flows through one or more stages in the second job until one of the stages writes to a table in a third database
    • The data in the database column in the third database is captured in a BI report
  • The flow of data to or from a selected metadata asset through one or more jobs, through database tables, views, or data file structures and into BI reports and information services operations.

    For example, a data lineage report might show that the sources for a BI report come from three separate jobs that write to a single database table. It might then show that the table is bound to a BI report collection that is used by the BI report.

  • The order of activities within a job run, including the tables that the job writes to or reads from and the number of rows that are written and read. You can inspect the results of each part of a job run by drilling into the job run activities to see the links, stages, and database tables or data file structures that the job reads from and writes to.

    For example, for a simple job, a data lineage report might show that the first activity read six rows from a text file. It might then show that the second activity wrote six rows to a database table. For a more complex job, the data lineage report might show the order of activities that are responsible for every read, write, or lookup.
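
The column-level example above can be pictured as a walk over a directed graph in which each edge points in the direction that data flows. The following is a minimal conceptual sketch in Python, not the metadata workbench API: the asset names, the `FLOWS` mapping, and the `trace` and `reverse` helpers are all hypothetical and exist only to illustrate tracking the flow from source to target and from target to source.

```python
from collections import deque

# Hypothetical assets; each key flows into the assets listed as its value.
FLOWS = {
    "DB1.CUSTOMER.ID": ["Job1.ReadStage.ID"],
    "Job1.ReadStage.ID": ["Job1.WriteStage.ID"],
    "Job1.WriteStage.ID": ["DB2.CUSTOMER.ID"],
    "DB2.CUSTOMER.ID": ["Job2.ReadStage.ID"],
    "Job2.ReadStage.ID": ["Job2.WriteStage.ID"],
    "Job2.WriteStage.ID": ["DB3.CUSTOMER.ID"],
    "DB3.CUSTOMER.ID": ["BI.SalesReport"],
}


def reverse(flows):
    """Build the same graph with every edge pointing the other way."""
    rev = {}
    for src, targets in flows.items():
        for tgt in targets:
            rev.setdefault(tgt, []).append(src)
    return rev


def trace(start, flows):
    """Breadth-first walk: return every asset reachable from start, in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


# Source to target: where does the first database column end up?
downstream = trace("DB1.CUSTOMER.ID", FLOWS)

# Target to source: where does the BI report get its data?
upstream = trace("BI.SalesReport", reverse(FLOWS))
```

In this sketch, `downstream` ends at the BI report and `upstream` ends at the column in the first database, which mirrors the two directions in which you can run a lineage report.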

Business Lineage

Business lineage reports show data flows only through those information assets that have been configured to be included in business lineage reports. In addition, business lineage reports do not include extension mapping documents or jobs from IBM® InfoSphere® DataStage® and QualityStage®.

You are not required to specify the flow direction of data, the analysis type, or a target asset. The business lineage report displays the graphical and textual components for only those source, target, and intermediate assets that are configured to be included in business lineage.

The Metadata Workbench Administrator configures which information assets are displayed in business lineage reports. A report is generated from the right-click menu of an asset that is configured for business lineage. The report is read-only and you cannot get further information about the data flow or about the assets themselves.

A user of IBM InfoSphere Business Glossary Anywhere, IBM InfoSphere Business Glossary, IBM InfoSphere Metadata Workbench, or an external program such as IBM Cognos® can create a business lineage report for an asset. The user must have at least the Business Glossary User role. The report is displayed in a new window in the web browser.

For example, a business lineage report for a BI report might show the flow of data from one database table to another database table. From the second database table, the data flows into a BI report collection table and then to a BI report. The context of the database tables and the BI report collection table is displayed.
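
One way to picture the scaled-down view is as the full data flow with every asset removed that is not configured for business lineage. The following minimal Python sketch illustrates that idea under stated assumptions: the asset names, the `FLOWS` mapping, the `INCLUDED` set, and the `business_view` function are hypothetical, and the sketch does not describe how the product itself builds the report.

```python
# Hypothetical full data flow; each key flows into the assets listed as its value.
FLOWS = {
    "DB1.ORDERS": ["Job1"],
    "Job1": ["DB2.ORDERS"],
    "DB2.ORDERS": ["BI.Collection.ORDERS"],
    "BI.Collection.ORDERS": ["BI.OrdersReport"],
}

# Assumption: the administrator configured only these assets for business lineage.
# The job is left out, as jobs do not appear in business lineage reports.
INCLUDED = {
    "DB1.ORDERS",
    "DB2.ORDERS",
    "BI.Collection.ORDERS",
    "BI.OrdersReport",
}


def business_view(start, flows, included):
    """Follow the data flow from start and keep only the included assets."""
    kept = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in included:
            kept.append(node)
        # Keep walking through excluded assets so that later assets are still found.
        stack.extend(reversed(flows.get(node, [])))
    return kept


report_assets = business_view("DB1.ORDERS", FLOWS, INCLUDED)
```

Here `report_assets` holds the two database tables, the BI report collection table, and the BI report, in flow order, while the intermediate job is left out.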

For each type of analysis, you can create a report that shows how information flows from one asset to the next among the assets that participate in the lineage or analysis flow.