Importing metadata from a project (Watson Knowledge Catalog)
You can capture and import technical metadata and lineage information for the data in your organization. This data can reside in a wide variety of data sources.
Required service
- Watson Knowledge Catalog
- MANTA Automated Data Lineage for IBM Cloud Pak for Data (for adding specific asset types and for lineage import)
Data format
- Tables from relational data sources
- Files from file-based connections to the data sources
Other formats
- Cobol copybooks
- Business intelligence reports
- Transformation scripts
- Data models
These assets are descriptive, cannot be added to projects, and, except for Cobol copybooks, require that the advanced metadata import feature is installed.
Data size
The size of an uncompressed data model file should not exceed 75 MB.
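The size limit can be checked before an import is attempted. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that 1 MB means 1,000,000 bytes, which the text above does not specify.

```python
from pathlib import Path

# Assumption: 1 MB = 1,000,000 bytes. The documentation only states "75 MB".
MAX_MODEL_BYTES = 75 * 1_000_000


def model_size_ok(size_bytes: int, limit: int = MAX_MODEL_BYTES) -> bool:
    """Return True if an uncompressed data model of this size is within the limit."""
    return 0 <= size_bytes <= limit


def model_file_ok(path: str) -> bool:
    """Check the on-disk size of an uncompressed data model file (hypothetical helper)."""
    return model_size_ok(Path(path).stat().st_size)
```

If your environment counts 1 MB as 1,048,576 bytes, pass a different `limit` value.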
Supported connections
You can import assets from the data sources listed in Supported data sources for metadata import and metadata enrichment.
Required permissions
To create and run a metadata import, you must have the Admin or the Editor role in the project. Also, you must be authorized to access the connections to the data sources of the data assets to be imported. To import metadata into a catalog, you must also have the Admin or the Editor role in the catalog to which you want to import.
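The permission rules above combine three conditions. The following sketch expresses them as a single check. The `User` class, its fields, and `can_run_import` are hypothetical names used for illustration only and do not correspond to any product API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set

# Roles that allow creating and running a metadata import, per the text above.
ALLOWED_ROLES = {"Admin", "Editor"}


@dataclass
class User:
    """Hypothetical view of a user's roles and connection access."""
    project_role: str
    accessible_connections: Set[str] = field(default_factory=set)
    catalog_roles: Dict[str, str] = field(default_factory=dict)  # catalog -> role


def can_run_import(user: User, connections: Iterable[str],
                   target_catalog: Optional[str] = None) -> bool:
    # 1. Admin or Editor role in the project.
    if user.project_role not in ALLOWED_ROLES:
        return False
    # 2. Access to every connection that the import reads from.
    if not set(connections) <= user.accessible_connections:
        return False
    # 3. For catalog imports, Admin or Editor role in the target catalog.
    if target_catalog is not None:
        return user.catalog_roles.get(target_catalog) in ALLOWED_ROLES
    return True
```

For example, a project Editor who has only the Viewer role in the target catalog passes the check for a project import but fails it for a catalog import.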
For a data asset such as a table or a view imported from a database, technical metadata includes the following information:
- Table name
- Table view description
- Column information that consists of these items:
  - Column name
  - Column type
  - Column length
  - Column description
- Data source information (connection information) that consists of these items:
  - Server hostname or IP address
  - Parent database that holds the table
  - Parent schema of the table
- Lineage information, such as:
  - Source tables and schemas from which data flows into the table
  - Target tables and schemas to which the data from this table flows
This list is not exhaustive. Also, when you import metadata from an unstructured data source such as a Box folder, a different set of metadata is imported. It includes, for example, file name, file type, size, access permissions, owner, creation date, last access date, parent folder, and other information.
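To make the shape of this metadata concrete, the following sketch models the items listed above for a table as plain Python data classes. All class names, field names, and sample values are hypothetical. The sketch does not reflect the storage format that Watson Knowledge Catalog uses.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    type: str
    length: Optional[int] = None
    description: str = ""


@dataclass
class DataSource:
    host: str       # server hostname or IP address
    database: str   # parent database that holds the table
    schema: str     # parent schema of the table


@dataclass
class Lineage:
    sources: List[str] = field(default_factory=list)  # data flows in from these
    targets: List[str] = field(default_factory=list)  # data flows out to these


@dataclass
class TableMetadata:
    name: str
    description: str
    columns: List[Column]
    source: DataSource
    lineage: Lineage = field(default_factory=Lineage)


# Invented sample values, for illustration only.
orders = TableMetadata(
    name="ORDERS",
    description="Customer orders",
    columns=[
        Column("ORDER_ID", "INTEGER"),
        Column("STATUS", "VARCHAR", length=20, description="Order status"),
    ],
    source=DataSource(host="db.example.com", database="SALESDB", schema="SALES"),
    lineage=Lineage(sources=["STAGING.ORDERS_RAW"], targets=["DW.ORDERS_FACT"]),
)

# asdict() converts the nested data classes to plain dictionaries.
orders_dict = asdict(orders)
```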
Depending on your installation, you have these options:
You can import metadata associated with the data into a project or a catalog to inventory, evaluate, and catalog these assets. Technical metadata in this context describes the characteristics of data objects. This metadata helps users decide whether the data is appropriate for the task at hand, whether they can trust the data, and how to work with the data.
The data asset can later be enriched with other information to help users find data faster and use it with confidence. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews.
For other asset types, such as business intelligence or data model assets, the imported metadata consists mainly of information about hierarchies and relationships.
When you import metadata, you add assets to projects or catalogs:
You can add data assets from connections to a project or a catalog. Data assets that you import to a project are not visible in any catalog until you publish them. After you publish them to a catalog, other catalog users can work with these assets. If you want to run metadata enrichment on the imported assets, import them to a project.
You can add business intelligence assets to a catalog for inspecting the components of business intelligence reports and how they are related. In this case, the advanced metadata import feature must be enabled. To be able to visualize the data flows that transform and populate the source data for the reports, use the Get lineage metadata import option.
In 4.5.2 and later, you can add data model assets to a catalog to have a single collection point for all the business knowledge relating to your data management landscape. Imported data models are read-only copies of the originals created and maintained in database modeling tools such as ER/Studio and erwin Data Modeler. Depending on the size of the imported data model, a large number of assets might be created in the catalog. To find the root of the model as a starting point, filter the catalog assets on the Logical data model or Physical data model asset type.
Importing data models also requires the advanced metadata import feature to be enabled.
If the advanced metadata import feature is installed with Watson Knowledge Catalog, you can also import metadata only (no lineage information) from Power BI and Tableau data sources.
See Importing metadata for discovery.
Get lineage option
Import metadata from data integration and reporting tools into a catalog. Capture lineage information from various data sources to provide business lineage information for imported assets. You add information to a catalog about where your data comes from, how it changes, and where it moves over time. In addition, you can access more advanced lineage information for the imported asset in MANTA Automated Data Lineage. Additional lineage details include technical data lineage, historical data lineage, and indirect data lineage.
Lineage information is important, for example, in these cases:
- For traceability: you might want to know where an asset originated and how it changed over time. This is of interest to financial services companies, for example.
- For impact analysis: before you change an asset, you might want to know whether the change has any downstream impact and what that impact is.
- For root cause analysis: you might want to see where a specific asset came from and how it impacted other assets.
This feature is not available by default. To import and work with lineage information, the optional features advanced metadata import and knowledge graph must be installed with Watson Knowledge Catalog. Additionally, MANTA Automated Data Lineage for IBM Cloud Pak for Data must be purchased separately and the license key that's provided on purchase must be installed.
The license entitles users to a certain number of lineage imports, the so-called script count. Every new lineage import for an object such as a database table counts toward this script count. Reruns of imports don't consume script count. For details, see MANTA Automated Data Lineage for IBM Cloud Pak for Data Script counting details. If the allocated number of scripts is exhausted, extra entitlements can be obtained by buying additional licenses and installing the new license key with more script count. You can check the script count of the active license in the MANTA Automated Data Lineage UI by clicking the information icon.
The available script count is shown underneath the Get lineage tile when you define the goal of your metadata import.
See Capturing data lineage.
If you don't see this option, lineage import with MANTA Automated Data Lineage isn't enabled. If the Get lineage option is disabled, the script count is exhausted.
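The script count rules described in this section can be illustrated with a small model. The following sketch is a simplification: the `ScriptCounter` class and its members are hypothetical, each imported object is assumed to count as exactly one script, and reruns are assumed to remain possible after the count is exhausted. The actual rules are described in the script counting details for MANTA Automated Data Lineage.

```python
class ScriptCounter:
    """Toy model of a license with a fixed script count (hypothetical)."""

    def __init__(self, licensed_scripts: int):
        self.licensed = licensed_scripts
        self._imported = set()  # objects that already consumed a script

    @property
    def remaining(self) -> int:
        return self.licensed - len(self._imported)

    @property
    def get_lineage_enabled(self) -> bool:
        # Models the Get lineage option being disabled on exhaustion.
        return self.remaining > 0

    def record_import(self, object_id: str) -> bool:
        """Return True if the lineage import for this object is allowed."""
        if object_id in self._imported:
            return True  # a rerun does not consume script count
        if self.remaining <= 0:
            return False  # exhausted: a license with more script count is needed
        self._imported.add(object_id)
        return True
```

In this model, importing the same table twice leaves `remaining` unchanged after the first import, which mirrors the statement that reruns do not consume script count.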
- Installing Watson Knowledge Catalog
- Enabling lineage import
- Supported connection types
- MANTA Automated Data Lineage documentation
- MANTA Automated Data Lineage for IBM Cloud Pak for Data Script counting details