
Configuring information asset lineage reports (Watson Knowledge Catalog)

You can configure information asset lineage reports in the ways that are described in the following sections.

Required permissions
You must have both these user permissions:
  • Access advanced governance
  • Manage information assets

Including assets in data lineage and business lineage reports

You can select which assets to include in data lineage and business lineage reports to make them more meaningful to the user. Jobs, transformation projects, and mapping specifications are excluded from lineage by default.

Jobs and transformation projects from IBM® InfoSphere DataStage and QualityStage
In typical development transformation projects, many jobs are experimental. As a result, those jobs need to be excluded from data lineage reports. Set Include For Lineage to True only for jobs that are production candidates.
Mapping specifications from IBM InfoSphere FastTrack
By default, lineage assumes that mapping specifications do not describe actual data flows. As a result, mapping specifications are ignored in lineage reports. In those cases where mapping specifications do describe actual data flows, you can set Include For Lineage to True for those mapping specifications.

Business lineage reports can include certain asset types, such as application, business intelligence (BI) report, and data file. By default, all assets of these asset types are included in business lineage reports. You can exclude individual assets from business lineage reports. Only assets that have Include for Business Lineage set to True are displayed in business lineage relationships. The relationships of excluded assets still enable the lineage to continue the flow, even though the excluded assets themselves are not shown.

To include assets in lineage reports, complete these steps:

  1. Go to Administration > Lineage > Lineage administration > Include for Lineage.
  2. Select the asset type to display in lineage reports, and then select the assets that you want to include in lineage reports.
  3. Change the inclusion state to either True or False.

To include assets in business lineage reports, complete these steps:

  1. Go to Administration > Lineage > Business lineage.
  2. Select the asset type to display in business lineage reports, and then select the assets that you want to include in business lineage reports.
  3. Change the inclusion state to either True or False.

Mapping data connection objects to a database

You can map data connection objects to an imported database. As a result, lineage reports display the actual database name rather than the data connection object wherever the database asset occurs. Jobs from IBM InfoSphere DataStage read from or write to data sources by using stages. The stage defines data connections such as the URL, server name, database name, user credentials, and SQL statements that are needed to access the database. Different stages might use different data source names to refer to the same database. During lineage analysis of job designs and job runs, the names of the host and data sources that are found in the job’s stages are added to the list of data connection objects.

For lineage to continue the flow through databases that are referred to by different data source names, you must map the data connection object to the imported database. Lineage uses this information to create relationships between stages and imported database tables. If this mapping is not done, the lineage for that job displays the database tables as virtual assets that might not link correctly to the rest of the lineage flow.

At times, you might need to map the data connection objects to a database that cannot be imported because of your internal security policies. In this case, you can identify those data connection objects as being the same, and select one of them as the preferred name for when you display lineage reports.
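
The name resolution described above can be pictured with a small sketch. This is only an illustration under assumptions: the dictionaries, the connection names, and the `display_name` helper are hypothetical and are not part of any product API.

```python
# Hypothetical in-memory model of data connection mappings (illustrative only).

# Data connections that are bound to an imported database.
bound_to_database = {
    "dsn_sales": "SALESDB",
}

# Data connections that are marked as the same as another (preferred) connection.
same_as = {
    "sales_odbc": "dsn_sales",
}


def display_name(connection):
    """Return the name that a lineage report might show for a data connection."""
    # Resolve to the preferred connection if one was defined.
    preferred = same_as.get(connection, connection)
    database = bound_to_database.get(preferred)
    if database is not None:
        # Mapped: the report shows the imported database name.
        return database
    # Not mapped to an imported database: shown as a virtual asset.
    return f"{preferred} (virtual)"


print(display_name("sales_odbc"))  # resolves through dsn_sales to SALESDB
```

In this sketch, two stages that use different data source names end up with the same displayed database, which is what allows the flow to continue through it.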

To map data connection objects to a database, complete these steps:

  1. Go to Administration > Lineage > Lineage administration > Data Connection Mappings.
  2. If the database that you want to map to was imported, complete these steps:
    1. In the Data Connections Mappings pane, select the data connection that you want to map to the database, and then click Edit.
    2. In the Bound to Database list, select the database to map a data connection to. Do this step for each data connection that you need to map to the database. If you have many mappings, it might be more convenient to edit the first data connection that was bound to a database. Then, in the Same as Data Connections field, add all other data connections that map to the same database.
  3. If the database that you want to map to cannot be imported due to your security policies, complete these steps:
    1. Edit the first data connection. In the Same as Data Connections field, add all other data connections that would map to the same database that is not imported. The database assets are displayed as virtual assets because the database was not imported.
    2. Select one of the identical data connections to be the preferred data connection. In lineage reports, this preferred data connection is displayed wherever any of these data connection objects participate in the data flow.
  4. Save the changes.

Defining database schemas as identical

You can define a database schema as identical to other database schemas or data file folders. Tables and columns that are contained by each of the identical assets are also defined as identical when their names match.

Database and data file asset types can be imported into the catalog by different means, such as by a connector and by a bridge. As a result, they can exist in the catalog as different assets even though they are identical.

For example, one user might import Schema1 of database DW on host DataServer and Schema1 of database DW on host BIServer into the default catalog. Another user discovers Schema1 of database DW on host ProdDOS. All three database schemas exist as distinct entities in the catalog. When you specify that the database schemas are identical, you enable lineage to continue the data flow.

Similarly, you can indicate which data file folder on HDFS is the storage location for a Hive database schema. Data lineage then shows the Hive database schema, database table, and database columns, assuming that this identity is the preferred representation.

If a database schema is selected as the preferred schema, it is the one that is displayed in the lineage report. Otherwise, which of the identical database schemas is displayed in the lineage report is arbitrary. Therefore, it is best to define a preferred database schema.
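
As a rough sketch of the behavior described above, the following example models a group of identical schemas, the effect of a preferred schema, and name-based matching of tables. The `Schema` class and both helper functions are hypothetical, and exact, case-sensitive name matching is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    host: str
    database: str
    name: str


def displayed_schema(group, preferred=None):
    """Pick the schema that a report would show for a group of identical schemas."""
    if preferred is not None and preferred in group:
        return preferred
    # Without a preferred schema the choice is arbitrary; this sketch sorts
    # only to get a repeatable result.
    return min(group, key=lambda s: (s.host, s.database, s.name))


def matching_tables(tables_a, tables_b):
    """Tables of identical schemas are treated as identical when their names match."""
    return sorted(set(tables_a) & set(tables_b))


group = [
    Schema("DataServer", "DW", "Schema1"),
    Schema("BIServer", "DW", "Schema1"),
    Schema("ProdDOS", "DW", "Schema1"),
]
print(displayed_schema(group, preferred=group[0]).host)
```

Defining a preferred schema removes the arbitrary choice, which is why the text recommends it.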

To define database schemas as identical, complete these steps:

  1. Go to Administration > Lineage > Lineage administration > Same as Database Schemas.
  2. In the Same as Database Schemas pane, select the database schema that you need to mark as identical to another asset, and then click Define Identical Data Sources.
  3. In the Same as Data Sources list, select either database schema or data file folder as the asset type. Then, select the asset that you want to define as identical to the database schema that you selected in the previous step.
  4. Save the changes.

Configuring automatic update of lineage flows and information latency in lineage reports

Lineage flows are analyzed and indexed so that they’re available for lineage reports on subsequent requests. When changes in lineage-impacting assets occur, lineage flows are analyzed and automatically updated.

Configuring automatic flow updates

By default, automatic update of lineage flows is enabled. It might impact system performance, especially when other tasks that you run use a lot of memory. Therefore, if you do not use the lineage feature, you can disable automatic update of lineage flows. Alternatively, you can disable the update only for selected asset types.

  1. Go to Administration > Lineage > Lineage administration > Lineage Configuration.
  2. To disable automatic update for selected asset types, clear the check boxes for asset types that you want to skip.
  3. To disable automatic update for all asset types, in Do you want to enable automatic update of lineage flows?, select No.

When you disable automatic update, the lineage flows are not recalculated and lineage reports are not updated. You can still update the flows manually by detecting lineage relationships. You can also enable automatic update again. In this case, the changes that occurred since you disabled the automatic update are analyzed and the flows are updated.

Information latency in lineage reports

Because flows are automatically analyzed and updated, any changes in the catalog might take some time before they are reflected in your lineage report. This delay applies to all assets whose content is relevant for lineage reports, such as:

  • Designs of jobs that are included for lineage
  • Operational metadata from runs of jobs that are included for lineage
  • Extension mapping documents
  • Mapping specifications that are included for lineage
  • Database views
  • BI reports and BI models
  • MDM models
  • Data connection mappings
  • Database schema identity mappings
  • Manual stage bindings

The maximum amount of latency is the sum of the following factors:

Polling interval
The minimum interval is 30 seconds. The default value is 5 minutes. You can change the polling interval in Administration > Lineage > Lineage administration > Lineage Configuration.
Number and complexity of changed assets
Flow publication of a change might take 1 to 30 seconds for each asset, depending on complexity.
Current publication request
The flow publication that is being serviced.
Number of publication requests in the queue
Flow publication occurs sequentially. As a result, a publication might be queued.
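
To make the sum concrete, here is a minimal sketch that estimates an upper bound on latency from the factors listed above. The function name and parameters are hypothetical. The 30-second per-asset figure is the upper end of the range given above, and the times for the current and queued publications are inputs that you would have to estimate yourself.

```python
def max_latency_seconds(polling_interval_s, changed_assets,
                        per_asset_s=30.0, current_request_s=0.0, queued_s=0.0):
    """Estimate the worst-case delay before a change appears in a lineage report."""
    if polling_interval_s < 30:
        raise ValueError("the minimum polling interval is 30 seconds")
    publication_s = changed_assets * per_asset_s
    return polling_interval_s + publication_s + current_request_s + queued_s


# Default polling interval (5 minutes) and 10 complex changed assets.
print(max_latency_seconds(300, 10))
```

With the default polling interval and ten changed assets at the slowest rate, the estimate is 300 + 10 × 30 = 600 seconds, before any time spent waiting on other publication requests.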

Creating lineage filters

Use lineage filters to refine the list of assets that are displayed in your lineage reports. Apply the filters when you run a lineage report.

To create a filter, complete these steps:

  1. Go to Administration > Lineage > Lineage filters.
  2. From the menu, select New.
  3. In the New Lineage Filter window, type in a filter name. Select the direction of the data flow relative to the asset that you want in your lineage report. For example, you might want to see data that flows from a selected asset, or to see data that flows to a selected asset. A final possibility is to see flows coming to and exiting the selected asset.
  4. Expand Asset Types to Hide. Click the Select to add an asset type condition field and select an asset type that you want to exclude from your lineage report.
  5. In the new window, do either or both of these actions:
    1. Click the Hide Assets tab to exclude assets from the lineage report. Data flows continue through the excluded assets.
    2. Click the Hide Assets and Their Flows tab to exclude assets and their data flows from the lineage report.
  6. In either or both tabs, do these steps to specify each asset to exclude:
    1. In the right pane, click the Down arrow to add a condition or sub-condition.
    2. In the Available Properties list, double-click an attribute or a related asset. The selected attribute or related asset is displayed in the right pane. Specify values for each new condition or sub-condition.
  7. Save the changes.
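
The difference between the two tabs can be illustrated on a small graph of data flows. This is a conceptual sketch only: the `apply_filter` function and the asset names are hypothetical, and bridging over a hidden asset is one interpretation of "data flows continue through the excluded assets".

```python
def apply_filter(edges, hide=(), hide_with_flows=()):
    """Return the visible flows, given (source, target) pairs and two kinds of exclusions."""
    # Hide Assets and Their Flows: drop every flow that touches the asset.
    visible = {(s, t) for s, t in edges
               if s not in hide_with_flows and t not in hide_with_flows}
    # Hide Assets: remove the asset but connect its sources to its targets.
    for asset in hide:
        sources = {s for s, t in visible if t == asset and s != asset}
        targets = {t for s, t in visible if s == asset and t != asset}
        visible = {(s, t) for s, t in visible if asset not in (s, t)}
        visible |= {(s, t) for s in sources for t in targets}
    return sorted(visible)


flows = [("table_a", "job_1"), ("job_1", "table_b"), ("table_b", "report")]
print(apply_filter(flows, hide={"job_1"}))
print(apply_filter(flows, hide_with_flows={"job_1"}))
```

In the first call, the job disappears but `table_a` still leads to `table_b`. In the second call, the flow through the job is cut, so only the flow from `table_b` to `report` remains.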

Creating lineage report templates

Use a lineage report template to customize the data that is displayed in the lineage report. You can use existing filters in lineage report templates and define additional properties of assets to display.

To create a lineage report template, complete these steps:

  1. Go to Administration > Lineage > Lineage report templates.
  2. From the menu, select New.
  3. In the New Lineage Report Template window, type in a report name.
  4. To use existing lineage filters to design the lineage report template, select the filters one by one in the Uses Lineage Filter list.
  5. Expand Asset Type Properties to Display.
  6. Click the Select to add asset type display properties field, and then select an asset type that you want to include in your lineage template. To edit the display properties of an asset type, do these steps:
    1. Select the asset type.
    2. In the Properties window, select the properties to display.
    3. In the Available Properties list, double-click an attribute or a related asset. The selected attribute or related asset is displayed in the Displayed Properties pane.
  7. Save the changes.

Monitoring lineage tasks

You can monitor tasks that are pending or completed in Administration > Lineage > Lineage administration > Monitor Lineage Tasks. Changes in the catalog might take some time before they are reflected in the lineage report. When there are pending tasks, lineage reports might not reflect the latest changes in the catalog.

At times, a lineage task might stay in the Pending state for several days, either because the task is not valid or for some other reason. You can delete pending lineage tasks by clicking Delete in the Pending Tasks column on the Monitor Lineage Tasks page. By default, only the tasks that were submitted at least 72 hours ago are deleted.
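
The default age rule can be sketched as a simple cutoff comparison. The task identifiers, the dictionary layout, and the `deletable_tasks` helper are hypothetical and only illustrate the rule; they do not reflect how the product stores tasks.

```python
from datetime import datetime, timedelta, timezone


def deletable_tasks(pending, now, min_age=timedelta(hours=72)):
    """Return the IDs of pending tasks that were submitted at least min_age ago.

    pending maps a task ID to its submission time.
    """
    cutoff = now - min_age
    return sorted(task_id for task_id, submitted in pending.items()
                  if submitted <= cutoff)


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
pending = {
    "task-old": now - timedelta(hours=80),
    "task-new": now - timedelta(hours=10),
}
print(deletable_tasks(pending, now))
```

Only `task-old` qualifies here, because `task-new` was submitted less than 72 hours before the deletion.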
