Importing metadata with IBM Manta Data Lineage
You can capture and import asset metadata and lineage metadata for the data in your organization. This data can reside in a wide variety of data sources. When you import asset metadata, assets are created.
- Required services
- IBM Knowledge Catalog (to import asset metadata)
- Manta Data Lineage (to import lineage metadata and some asset metadata)
These services are not available by default. An administrator must install the services. To determine whether a service is installed, open the Services catalog. If the service is installed and ready to use, the tile in the catalog shows Ready to use.
- Required permissions
- To create, manage, and run a metadata import, you must have these roles and permissions:
- The Manage asset discovery user permission.
- The Admin or the Editor role in the project.
- The Admin or the Editor role in the catalog to which you want to import or publish the assets.
- Access to the connections to the data sources of the data assets to be imported and the SELECT or a similar permission on the corresponding databases.
- Supported connections
- You can import lineage metadata from the data sources that are listed in Supported connectors for lineage import.
Overview of the import process
Importing metadata involves configuring the connection to the data source and specifying parameters for the metadata import job.
The following steps provide an overview of the process of importing metadata. Follow the links in each step for more details.
- Create a data source definition.
- Create a connection to the data source in a project.
- Create a metadata import.
Each data source requires various connection and configuration details. You can find this information in each connection topic in the Connectors section and in Supported connectors for lineage import.
When you create a metadata import, the process of importing metadata starts immediately, unless you scheduled it to run at a specific time or configured the job not to run at all.
Types of metadata
You can import these types of metadata:
- Technical metadata
- Technical metadata provides the information that is required to create an asset in a project or catalog. It provides asset details, relationships, and the preview of the contents of the asset. For data assets, the technical metadata also enables data profiling and data quality analysis, and gives people access to work with the data.
- Lineage metadata
- Lineage metadata provides the lineage information for the data lineage graph. Data lineage shows where your data comes from, how it changes, and where it moves over time.
Types of assets
You can create the following types of assets by importing metadata:
- Data assets
- Data tables or files from a connection. If you want to run metadata enrichment or data quality rules on the imported assets, import them to a project.
- COBOL copybooks
- The data structure of a COBOL program. You can import COBOL copybooks into projects and catalogs. Such assets cannot be downloaded, profiled, enriched through metadata enrichment, or used in Data Refinery.
- Transformation script assets
- The data transformations that change the format, structure, or values of data and that usually are part of ETL (extract, transform, and load) processes.
Importing assets with data lineage connections
If you use IBM Cloud Pak for Data version 5.3.0, you can import assets from the data sources that are listed in the Supported connectors for discovery, enrichment, and data quality topic. Starting with IBM Cloud Pak for Data version 5.3.1, you can additionally import assets from the data sources that can also be used for lineage metadata. For the list of such data sources, see Supported connectors for lineage import with Manta Data Lineage. However, importing assets from these two types of data sources requires different configuration and provides different capabilities.
To clarify the difference between the two types of asset import, the following terms are used:
- Asset import with data governance connectors: these are the connectors that are listed in the Supported connectors for discovery, enrichment, and data quality topic. You can use them if you have IBM Cloud Pak for Data version 5.3.0 or 5.3.1.
- Asset import with data lineage connectors: these are the connectors that are listed in Supported connectors for lineage import with Manta Data Lineage. You can use them only if you have IBM Cloud Pak for Data version 5.3.1.
The following table shows which types of import are available with each service configuration.
| Service configuration | Asset import (data governance connectors) | Asset import (data lineage connectors) | Lineage import |
|---|---|---|---|
| IBM Knowledge Catalog only | ✓ | — | — |
| Manta Data Lineage only | — | — | ✓ |
| IBM Knowledge Catalog + Manta Data Lineage 5.3.0 | ✓ | — | ✓ |
| IBM Knowledge Catalog + Manta Data Lineage 5.3.1 | ✓ | ✓ | ✓ |
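The service configuration table can also be read as a simple lookup. The following Python sketch is only an illustration of that reading and is not part of any IBM API: the function name, the constants, and the version strings are hypothetical. It assumes that the two single-service rows apply to both versions 5.3.0 and 5.3.1, which the table does not state explicitly.

```python
from typing import Set

# Labels for the three import types in the table (names chosen for this sketch).
ASSET_IMPORT_GOVERNANCE = "asset import (data governance connectors)"
ASSET_IMPORT_LINEAGE = "asset import (data lineage connectors)"
LINEAGE_IMPORT = "lineage import"

# The table only covers these two versions; anything else is rejected.
COVERED_VERSIONS = {"5.3.0", "5.3.1"}


def available_imports(knowledge_catalog: bool, manta_data_lineage: bool,
                      version: str) -> Set[str]:
    """Return the import types that the table marks as available."""
    if version not in COVERED_VERSIONS:
        raise ValueError(f"version {version!r} is not covered by the table")

    imports: Set[str] = set()
    if knowledge_catalog:
        imports.add(ASSET_IMPORT_GOVERNANCE)
    if manta_data_lineage:
        imports.add(LINEAGE_IMPORT)
    # Asset import with data lineage connectors needs both services on 5.3.1.
    if knowledge_catalog and manta_data_lineage and version == "5.3.1":
        imports.add(ASSET_IMPORT_LINEAGE)
    return imports
```

For example, `available_imports(True, True, "5.3.0")` returns the governance asset import and the lineage import, but not the asset import with data lineage connectors.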
The following table summarizes key feature differences between the two types of asset import.
| Feature | Asset import (data governance connectors) | Asset import (data lineage connectors) |
|---|---|---|
| Import target | Catalog or project | Catalog only |
| Metadata enrichment | ✓ | — |
| Select asset types to import (with the Expand asset import option) | — | ✓ |
| External agent connection | — | ✓ |
| Optional import configuration: define import scope (by selecting assets or adding include or exclude lists); provide external inputs | — | ✓ |
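When you decide which type of asset import fits a use case, it can help to treat the feature table as data. The following Python sketch encodes the rows of the table and returns the connector types that support every feature you need. All names in it (the feature keys, the constants, and the function) are hypothetical and chosen for illustration; they do not correspond to product settings or APIs.

```python
from typing import Dict, Iterable, List

GOVERNANCE = "data governance connectors"
LINEAGE = "data lineage connectors"

# One entry per row of the feature table; the "Import target" row is split
# into two flags (catalog and project) for this sketch.
FEATURES: Dict[str, Dict[str, bool]] = {
    GOVERNANCE: {
        "import_to_catalog": True,
        "import_to_project": True,
        "metadata_enrichment": True,
        "select_asset_types": False,
        "external_agent_connection": False,
        "optional_import_configuration": False,
    },
    LINEAGE: {
        "import_to_catalog": True,
        "import_to_project": False,
        "metadata_enrichment": False,
        "select_asset_types": True,
        "external_agent_connection": True,
        "optional_import_configuration": True,
    },
}


def connector_types_supporting(required: Iterable[str]) -> List[str]:
    """Return the connector types that support all required features."""
    required = list(required)
    unknown = [name for name in required if name not in FEATURES[GOVERNANCE]]
    if unknown:
        raise KeyError(f"unknown feature(s): {unknown}")
    return sorted(
        connector_type
        for connector_type, supported in FEATURES.items()
        if all(supported[name] for name in required)
    )
```

For example, asking for both `metadata_enrichment` and `external_agent_connection` returns an empty list, because no single type of asset import in the table offers both.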