Discovering data (IBM Knowledge Catalog)
You can import technical metadata to add data assets to a project or a catalog. In a project, you can prepare and analyze the data before you publish it to a catalog.
Before you import metadata, design your metadata import so that you understand all your options and make the appropriate choices for your goals. See Designing metadata imports.
You can use APIs instead of the user interface to retrieve the list of supported connections or to create a metadata import asset. The links to these APIs are listed in the Learn more section.
- Asset types
-
Data assets that represent tables or files from a connection to an external data source.
Note: For Microsoft Excel workbooks, each sheet is imported as a separate data asset. The data asset name equals the name of the Excel sheet. Transformation script assets require the Advanced metadata import feature.
- Supported connections
-
See the Metadata import (data assets) column in Supported connectors.
- Required permissions
-
To create, manage, and run a metadata import, you must have these roles and permissions:
- The Manage asset discovery user permission.
- The Admin or the Editor role in the project.
- The Admin or the Editor role in the catalog to which you want to import or publish the assets.
- Access to the connections to the data sources of the data assets to be imported and the SELECT or a similar permission on the corresponding databases.
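The requirements above can be read as a checklist. The following Python sketch is illustrative only: the `User` class, its fields, and `missing_requirements` are hypothetical names, not part of any product API, and the database-level SELECT permission is not modeled because it is granted in the data source itself.

```python
from dataclasses import dataclass, field

# Hypothetical model of the requirements listed above; these names are
# illustrative and do not correspond to a product API.
EDITING_ROLES = {"Admin", "Editor"}


@dataclass
class User:
    permissions: set = field(default_factory=set)
    project_role: str = ""
    catalog_role: str = ""  # role in the target catalog, if any
    accessible_connections: set = field(default_factory=set)


def missing_requirements(user, connection, target_is_catalog):
    """Return the unmet requirements as a list (empty when all are met)."""
    missing = []
    if "Manage asset discovery" not in user.permissions:
        missing.append("Manage asset discovery permission")
    if user.project_role not in EDITING_ROLES:
        missing.append("Admin or Editor role in the project")
    # The catalog role matters only when importing or publishing to a catalog.
    if target_is_catalog and user.catalog_role not in EDITING_ROLES:
        missing.append("Admin or Editor role in the catalog")
    if connection not in user.accessible_connections:
        missing.append("access to connection " + connection)
    return missing
```

An empty result means that, under these assumptions, the user can create, manage, and run the metadata import.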
Creating a metadata import asset and importing metadata
To create a metadata import asset and a job for importing metadata into a project or a catalog:
-
Open a project, go to the project's Asset page and click New asset > Import metadata for data assets.
-
Select the option Discover and click Next. If you don't see different metadata import options, the Advanced metadata import feature is not enabled.
-
Specify a name for the metadata import. Optionally, you can provide a description.
-
Optional: Select tags to be assigned to the metadata import asset to simplify searching. You can create new tags by entering the tag name and pressing Enter.
-
Select the import target. You can import metadata into the project that you're working in or to any catalog that you are a member of. See Import target.
-
Define a scope for the metadata import. See Scope of import.
- Select an existing connection asset as the source of the data, or click Create a new connection and create a connection asset. You can import metadata from the data sources that are listed in Supported connectors.
- Select the items that you want to include in the import and click Select.
- Review the selected scope. You can directly delete assets from the data scope or you can rework the entire scope by clicking Edit data scope. When you're done refining the data scope, click Next.
-
Define whether you want to run scheduled import jobs. If you don't set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time. See Scheduling options.
-
Optional: Customize the import behavior. You can choose to prevent specific properties from being updated and to delete existing assets that are not included in the reimport. See Advanced import options.
-
Review the metadata import configuration. To make changes, click the Edit icon on the tile and update the settings.
-
Click Create. The metadata import asset is added to the project, and a metadata import job is created. If you didn't configure a schedule, the import is run immediately. If you configured a schedule, the import runs on the defined schedule.
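The settings collected in these steps can be pictured as a single configuration record. The following Python sketch is a hypothetical model: the class and field names are invented for illustration and do not reflect the product's actual schema, and the two validation rules are assumptions drawn from the steps (a name is specified, and at least one item is selected for the scope).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataImportConfig:
    # All field names are hypothetical; they only mirror the wizard steps.
    name: str
    target: str                      # the current project or a catalog
    connection: str                  # connection asset used as the source
    scope: list                      # items selected for import
    description: str = ""            # optional
    tags: list = field(default_factory=list)  # optional
    schedule: Optional[str] = None   # None means no schedule was set
    delete_missing_assets: bool = False  # one of the advanced import options

    def validate(self):
        """Return a list of problems (assumed rules, see the lead-in)."""
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        if not self.scope:
            problems.append("scope must include at least one item")
        return problems

    def first_run(self):
        """Describe when the import first runs after Create is clicked."""
        if self.schedule is None:
            return "immediately"
        return "on the defined schedule"
```

For example, a configuration without a schedule reports `"immediately"` from `first_run()`, which matches the behavior described for the Create step.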
Important: Assets from the same connection that were already imported through a different metadata import are not imported again but are updated. Such assets no longer show up in the initial metadata import. Only the most recently run metadata import contains the assets.
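To make this behavior concrete, the following Python sketch models which metadata import lists which assets. It is a simplified illustration, not the product's implementation: `run_import`, the `listing` dictionary, and the asset names are hypothetical.

```python
def run_import(listing, import_name, assets):
    """Record a run of `import_name` that covers `assets`.

    `listing` maps each metadata import name to the set of assets it
    currently shows. Returns a pair (created, updated): assets imported
    for the first time, and assets that already existed and were updated.
    """
    assets = set(assets)
    already_known = set().union(*listing.values())
    created = assets - already_known
    updated = assets & already_known
    # Assets covered by this run disappear from every other import's list...
    for other, shown in listing.items():
        if other != import_name:
            shown -= assets
    # ...and appear only in the most recently run import.
    listing.setdefault(import_name, set()).update(assets)
    return created, updated


listing = {}
run_import(listing, "import_a", ["SALES.ORDERS", "SALES.ITEMS"])
created, updated = run_import(listing, "import_b", ["SALES.ORDERS", "SALES.CUSTOMERS"])
```

After the second run in this sketch, `SALES.ORDERS` is reported as updated rather than created, and it is listed under `import_b` only.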
Depending on the outcome of the metadata import job run, a completion message or an error notification is displayed.
A completion message is displayed when the job run completed successfully, completed with warnings, or completed with errors. An error notification is displayed if the entire job run failed. Either type of notification contains a link to the job run log that provides details about the specific job run.
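The mapping from job run outcome to notification type can be summarized in a few lines. This Python sketch is illustrative: the `RunOutcome` values, the `notification_for` function, and the log URL parameter are hypothetical names, not identifiers from the product.

```python
from enum import Enum


class RunOutcome(Enum):
    # Hypothetical labels for the four outcomes described above.
    COMPLETED = "completed"
    COMPLETED_WITH_WARNINGS = "completed with warnings"
    COMPLETED_WITH_ERRORS = "completed with errors"
    FAILED = "failed"


def notification_for(outcome, log_url):
    """Return the notification shown for a job run.

    Only a run that failed entirely produces an error notification;
    every other outcome produces a completion message. Both kinds
    carry a link to the job run log.
    """
    if outcome is RunOutcome.FAILED:
        kind = "error notification"
    else:
        kind = "completion message"
    return {"kind": kind, "log": log_url}
```

Note that a run that completed with errors still yields a completion message, so the job run log is the place to check for partial failures.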
When the import is complete, you can see the list of assets with the following information:
- The asset name, which provides a link to the asset in the project or catalog.
- The asset type, such as Data or Report. For data assets, the format, such as Relational table, is also shown. For other asset types, the format column shows a dash (—).
- The asset context, such as the parent or file path.
- The date and time that the asset was last imported.
- The import status, which can be Imported for successfully imported data, In progress, or Removed if the asset couldn't be reimported.
You can work with most imported data assets in the same way as with connected data assets. Imported assets have a tag automatically assigned that reflects the asset's parent if applicable.
To profile, analyze, and provide business context to imported data assets, create a metadata enrichment asset and include the metadata import asset in the data scope.
Learn more