IBM Data Virtualization lineage configuration

Connect to IBM Data Virtualization to scan metadata and display it on lineage. To import lineage metadata, create a data source definition, a connection, and a metadata import job.

This information applies to IBM Manta Data Lineage service.

Supported IBM Data Virtualization versions

Data Virtualization 3.3.1 on IBM Cloud Pak for Data 5.3.1 and later

Processed metadata

The following Data Virtualization metadata is processed and displayed on lineage:

Virtualized tables from relational data sources

Limitations

The following limitations apply:

Lineage for virtualized object store objects and virtualized files is not imported.
Virtual-to-source lineage is not connected when the source object is an alias.
Virtual-to-source lineage is not connected when multiple objects with the same name, for example a table and a function, exist within the same schema at the data source.
When you create a metadata import for IBM Data Virtualization, the system doesn't validate the job. This allows you to create imports without the necessary privileges, which can lead to missing metadata and errors in the extraction log.
When a data source is not scanned along with the Data Virtualization objects, deduced data source objects that share the same schema and object name are grouped under a single deduced source object, even if they come from different data sources.
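The last limitation can be pictured with a small sketch. This is only an illustrative model of the grouping behavior described above, not the actual lineage implementation; the function name, the dictionary keys, and the sample data source and table names are all hypothetical.

```python
from collections import defaultdict


def group_deduced_sources(virtual_tables):
    """Group virtual tables by the (schema, object name) of their source object.

    Illustrative model only: when the data source itself is not scanned,
    the originating data source is not part of the grouping key, so
    same-named objects from different sources end up in one group.
    """
    groups = defaultdict(list)
    for vt in virtual_tables:
        key = (vt["source_schema"], vt["source_object"])
        groups[key].append(vt["virtual_table"])
    return dict(groups)


# Hypothetical sample data: two virtual tables backed by different
# data sources that happen to share the schema and object name.
virtual_tables = [
    {"virtual_table": "DV.SALES_EU", "data_source": "postgres_eu",
     "source_schema": "SALES", "source_object": "ORDERS"},
    {"virtual_table": "DV.SALES_US", "data_source": "db2_us",
     "source_schema": "SALES", "source_object": "ORDERS"},
    {"virtual_table": "DV.HR_STAFF", "data_source": "db2_us",
     "source_schema": "HR", "source_object": "STAFF"},
]

deduced = group_deduced_sources(virtual_tables)
```

In this model, both DV.SALES_EU and DV.SALES_US fall under the single key ("SALES", "ORDERS"). Scanning the data sources together with the Data Virtualization objects is what allows the two source objects to be told apart.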

Prerequisite configuration

Before you import lineage metadata, ensure that the following prerequisites are met:

You have a Data Virtualization instance with Federal Information Processing Standards (FIPS) disabled.
A Data Virtualization Admin has granted you the DV_METADATA_READER role by using the GRANT ROLE DV_METADATA_READER TO USER/ROLE/GROUP <auth_id> SQL statement. The following <auth_id> values are valid for ROLE:
- DV_ADMIN
- DV_STEWARD
- DV_ENGINEER
- DV_USER
You have the Admin or Steward role in Data Virtualization or the SELECT privilege on all Data Virtualization objects or schemas and their dependencies participating in the lineage import.

Alternatively, you can disable the Restrict visibility option. For instructions, see Managing the visibility of virtual objects in Data Virtualization in the Cloud Pak for Data documentation.
You have the INSPECT privilege on all data sources in Data Virtualization that your scanned virtual tables originate from. Alternatively, the INSPECT privilege can be granted on each data source in Data Virtualization to the DV_METADATA_READER role.
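As a minimal sketch, the GRANT ROLE statement from the prerequisites can be assembled as follows. The helper name build_grant_statement and the identifier check are assumptions made for illustration; only the statement text and the list of valid ROLE values come from this topic. A Data Virtualization Admin still has to run the resulting statement.

```python
import re

# Grantee types and ROLE values as listed in the prerequisites.
VALID_GRANTEE_TYPES = ("USER", "ROLE", "GROUP")
VALID_ROLE_AUTH_IDS = ("DV_ADMIN", "DV_STEWARD", "DV_ENGINEER", "DV_USER")

# Assumption: restrict auth IDs to simple unquoted identifiers so that
# arbitrary text cannot be pasted into the statement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_grant_statement(grantee_type, auth_id):
    """Return the GRANT ROLE statement for DV_METADATA_READER (hypothetical helper)."""
    grantee_type = grantee_type.upper()
    if grantee_type not in VALID_GRANTEE_TYPES:
        raise ValueError(f"grantee type must be one of {VALID_GRANTEE_TYPES}")
    if not _IDENTIFIER.fullmatch(auth_id):
        raise ValueError(f"unexpected characters in auth ID: {auth_id!r}")
    if grantee_type == "ROLE" and auth_id not in VALID_ROLE_AUTH_IDS:
        raise ValueError(f"for ROLE, auth ID must be one of {VALID_ROLE_AUTH_IDS}")
    return f"GRANT ROLE DV_METADATA_READER TO {grantee_type} {auth_id}"


# Example: grant the role to every member of the DV_ENGINEER role.
statement = build_grant_statement("ROLE", "DV_ENGINEER")
print(statement)
```

The user name in a call such as build_grant_statement("USER", "LINEAGE_SVC") is a placeholder; substitute the authorization ID that is used in the connection credentials.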

Creating a metadata import asset

Data source connection

To connect to the data source from which you want to import lineage metadata, you need to select a data source definition and a connection. You can create them before you start creating the metadata import, or you can create them when you create and configure the metadata import asset.

If a Data Virtualization instance is available on the same platform as watsonx.data intelligence, then the data source definition for Data Virtualization is already created. The platform connection for Data Virtualization is also already present. You only have to create a connection in a project, based on the platform connection.

Data source definition

Select IBM Data Virtualization as the data source type.

Connection

When you create a connection, ensure these requirements are met:

Use a username and password as connection credentials to extract lineage from Data Virtualization. The Use my platform credentials option can't be used to import lineage metadata.
In case of a non-SSL connection, create a connection in your project based on the existing platform connection.
In case of an SSL connection, complete these steps:
1. Download the Data Virtualization SSL certificate from Data > Data virtualization > Menu > Configure connection > Download SSL Certificate.
2. Create a new non-platform IBM Data Virtualization connection with exactly the same details as the IBM Data Virtualization platform connection in Data > Connections.
3. Enter details of the SSL certificate in the SSL certificate field.

Both authentication methods, user credentials and API authentication, are supported in the connection when you import lineage.

For connection details, see IBM Data Virtualization connection.

Include and exclude lists

You can include or exclude assets up to the schema level. Each value is evaluated as a regular expression. Assets that are added to the data source later are also included or excluded if they match the conditions specified in the lists. Example values:

mySchema: mySchema schema
- myTable: myTable table
mySchema[1-5]: any schema with a name that starts with mySchema and ends with a digit between 1 and 5.
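To try out a list value before using it, the matching can be approximated with a short sketch. It assumes that each value must match the whole schema name, which is consistent with the example descriptions above but is not confirmed here, and it uses Python's re module, whose syntax may differ in details from the regular expression engine used by the import. The function name matching_schemas and the schema names are hypothetical.

```python
import re


def matching_schemas(schema_names, patterns):
    """Return the schema names that fully match at least one pattern.

    Assumption: a list value has to match the entire schema name.
    """
    compiled = [re.compile(p) for p in patterns]
    return [name for name in schema_names
            if any(c.fullmatch(name) for c in compiled)]


include_list = ["mySchema[1-5]"]
schemas = ["mySchema", "mySchema1", "mySchema5", "mySchema6", "otherSchema"]

# Schemas selected by the include list today.
selected = matching_schemas(schemas, include_list)

# A schema that is created later is picked up without changing the list.
selected_later = matching_schemas(schemas + ["mySchema3"], include_list)

print(selected)
print(selected_later)
```

Under the full-match assumption, mySchema6 and plain mySchema are not selected by mySchema[1-5], while a newly created mySchema3 is selected the next time the list is evaluated.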

Advanced import options

Extract extended attributes: You can extract extended attributes of columns, such as primary key, unique, and referential integrity constraints. By default, these attributes are extracted.

Learn more

IBM Data Virtualization connection
Data Virtualization on Cloud Pak for Data in the IBM Software Hub documentation