Importing metadata from a project (Watson Knowledge Catalog)
You can import technical metadata associated with the data assets in your organization into a project or a catalog to inventory, evaluate, and catalog these assets.
Technical metadata describes the structure of data objects. This metadata helps users decide whether the data is appropriate for the task at hand, whether they can trust the data, and how to work with the data.
The metadata that you import can later be enriched with other information to help users find data faster and use it with confidence. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews.
When you import metadata, you add data assets to a project or a catalog. If you import the assets to a project, they are not visible in any catalog until you publish them. After you share them to a catalog, other catalog users can work with these assets.
- Metadata import overview
- Creating a metadata import asset and importing metadata
- Viewing the metadata import
- Rerunning the import
- Deleting a metadata import asset
Required permissions
To create and run a metadata import, you must have the Admin or the Editor role in the project.
To import metadata into a catalog, you must also have the Admin or the Editor role in the catalog to which you want to import.
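As an illustration only, the role requirements above can be written as a small check. The function name, its parameters, and the `Viewer` value in the usage lines are assumptions made for this sketch. It does not call any Watson Knowledge Catalog API.

```python
# Roles that are allowed to create and run a metadata import (from the text above).
EDIT_ROLES = {"Admin", "Editor"}


def can_run_metadata_import(project_role, catalog_role=None, target="project"):
    """Return True if the roles are sufficient for the chosen import target.

    project_role: the user's role in the project (always required).
    catalog_role: the user's role in the target catalog (only checked when
                  target == "catalog").
    """
    if project_role not in EDIT_ROLES:
        return False
    if target == "catalog":
        return catalog_role in EDIT_ROLES
    return True


# Hypothetical usage: an Editor in the project who is only a Viewer in the
# catalog can import into the project but not into the catalog.
allowed_in_project = can_run_metadata_import("Editor")
allowed_in_catalog = can_run_metadata_import("Editor", "Viewer", target="catalog")
```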
Metadata import overview
Importing metadata involves the following process:
- Identify the data source from which you want to import. You might already have a connection to this data source defined. Otherwise, ensure that you have the credentials to connect to it. For a list of supported connections, see step 4 of the instructions for adding a metadata import asset.
- In a project, create a metadata import asset to configure the import details, such as the scope and the target of the import and the schedule for the import job.
- Import assets to the project or the catalog. When you access an imported data asset, the data is dynamically retrieved from the data source.
- Analyze and preview the imported metadata, and share it to the catalog if you imported the metadata to a project. You can create profiles for individual assets one at a time from each asset’s Profile tab.
Creating a metadata import asset and importing metadata
To create a metadata import asset and a job for importing metadata into a project or a catalog:
- Open a project and click Add to project > Metadata Import. After you create the first metadata import in this way, you can add new metadata import assets from the project's Assets page.
- Specify a name for the metadata import. Optionally, you can provide a description.
- Optionally, select tags to be assigned to the metadata import asset to simplify searching. You can create new tags by entering the tag name and pressing Enter.
- Select an existing connection asset as the source of the data, or click Create a new connection and create a connection asset.
You can import metadata from the data sources listed in Table 1.
- Select the import target. You can import metadata into the project that you're working in or into a catalog. When you choose to import into a catalog, you can pick any catalog that is available to you.
Import metadata into a project if you want to analyze it before you decide which assets to share to a catalog for other users to work with. If you know the contents of the data assets well, you can import their metadata directly into the catalog.
- Click Next.
- Define a scope for the metadata import.
Depending on the size and contents of your data source, you might not want to import all assets but only a selected subset. You can include complete schemas or folders, or drill down to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains. Thus, you can decide whether you want to include the whole set or whether a subset serves your purpose better.
- Select the items that you want to include in the import. When you're done, click Select.
- Review the selected scope. You can directly delete assets from the data scope or you can rework the entire scope by clicking Edit data scope.
- When you're done refining the data scope, click Next.
- Define whether you want to run scheduled import jobs. If you don't set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time.
If you choose to run the import on a specific schedule, define the date and time when you want the job to run. You can schedule single and recurring runs. If you schedule a single run, the job runs exactly one time at the specified day and time. If you schedule recurring runs, the job runs for the first time at the timestamp indicated in the Repeat section.
Optionally, change the name of the import job. The default name is metadata_import_name job, where metadata_import_name is the name of the metadata import.
You can later access the import job you create from within the metadata import asset or from the project's Jobs page. See Jobs.
- Review the metadata import configuration. To make changes, click the edit icon on the tile and update the settings.
- Click Create. The metadata import asset is added to the project, and a metadata import job is created. If you didn't configure a schedule, the import is run immediately. If you configured a schedule, the import will run on the defined schedule.
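To summarize what the steps above collect, the following sketch models the settings of a metadata import as a plain Python data class. The class, its field names, and the sample values are hypothetical and exist only for illustration. The product stores this configuration through its UI, and this sketch does not reflect its internal format or any API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class MetadataImportConfig:
    """Hypothetical model of the settings gathered in the steps above."""
    name: str                               # name of the metadata import
    connection: str                         # connection asset used as the source
    target: str                             # "project" or the name of a catalog
    scope: List[str]                        # selected schemas, folders, tables, or files
    description: str = ""                   # optional
    tags: List[str] = field(default_factory=list)  # optional
    first_run: Optional[datetime] = None    # None means that no schedule is set
    job_name: str = ""                      # optional override of the default job name

    def __post_init__(self):
        # Default job name as described in the text: "<metadata import name> job".
        if not self.job_name:
            self.job_name = f"{self.name} job"

    def runs_on_create(self) -> bool:
        # Without a schedule, the import runs as soon as the asset is created.
        return self.first_run is None


# Hypothetical sample values.
config = MetadataImportConfig(
    name="sales_import",
    connection="sales_db_connection",
    target="project",
    scope=["SALES"],
)
```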
Important: Assets from the same connection that were already imported through a different metadata import are not imported anew but are updated. Such assets no longer show up in the initial metadata import. Only the most recently run metadata import contains the assets.
You can work with imported data assets in exactly the same way as with connected data assets. Imported assets have a tag automatically assigned that reflects the asset's parent if applicable.
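The behavior described in the Important note can be pictured with a small model in which each asset is listed under exactly one metadata import, namely the one that imported it most recently. This is a simplified sketch under that assumption. The function names, import names, and asset names are invented for the example.

```python
def run_import(owner_by_asset, import_name, scope):
    """Record import_name as the listing import for every asset in scope.

    An asset that another metadata import already imported is not duplicated;
    it is updated and moves to the import that ran most recently.
    """
    for asset in scope:
        owner_by_asset[asset] = import_name


def assets_listed_in(owner_by_asset, import_name):
    """Return the assets that currently show up in the given metadata import."""
    return sorted(a for a, owner in owner_by_asset.items() if owner == import_name)


# Hypothetical scenario: two metadata imports on the same connection.
owners = {}
run_import(owners, "import_a", ["SALES.ORDERS", "SALES.CUSTOMERS"])
run_import(owners, "import_b", ["SALES.ORDERS"])

listed_in_a = assets_listed_in(owners, "import_a")  # SALES.ORDERS is no longer here
listed_in_b = assets_listed_in(owners, "import_b")
```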
Table 1. Import connections
Notes:
¹ With Data Virtualization Manager for z/OS, you add data and COBOL copybook assets from mainframe systems to catalogs in IBM Cloud Pak for Data. Copybooks are files that describe the data structure of a COBOL program. Data Virtualization Manager for z/OS helps you create virtual tables and views from COBOL copybook maps. You can then use these virtual tables and views to import and catalog mainframe data in IBM Cloud Pak for Data in the form of data assets and COBOL copybook assets.
The following map types are not imported: ACI, Catalog, Natural.
When the import is finished, you can go to the catalog to review the imported assets, including the COBOL copybook maps, virtual tables, and views. You can use these assets in the same ways as other assets in Cloud Pak for Data.
For more information, see Adding COBOL copybook assets.
² This type of connection must be created at project level and then selected from the list of existing connections when you create a metadata import. You cannot create such a connection from within the metadata import.
³ Box-specific metadata, such as tags, descriptions, and classifications, is not imported.
Viewing the metadata import
Metadata import assets are listed in the Metadata imports section of the Assets page. To view an asset, click its name or select View from the asset's action menu.
When you view the metadata import asset, you can see the list of assets imported with a run of the associated import job. You can work with these assets, edit the metadata import, or rerun the import.
For each imported asset, you can see the following information:
- The asset name, which provides a link to the asset in the project or catalog.
- The type, such as Relational table.
- The asset context, such as the parent or file path.
- The date and time that the asset was last imported.
- The import status, which can be Imported for successfully imported data, In progress, or Removed if the asset couldn't be reimported. See Rerunning the import.
You can view additional information for an asset, publish it to a catalog, or delete the asset. When you delete an asset from the list of imported assets, it is deleted from the project or catalog to which it was imported but not from the metadata import scope.
The About this metadata import side panel provides a summary of the import configuration, job details, and a list of related assets. To hide the details, click the information icon.
To edit the metadata import asset, click Edit metadata import. You can change these configuration settings:
- Asset details such as the asset name, the description, or tags. Note that changing the asset name does not change the name of the associated import job. You cannot change the connection or the import target.
- The data scope.
- The schedule.
Rerunning the import
If you did not configure a schedule, you can manually rerun the metadata import at any time in several ways:
- Open the metadata import asset and select Reimport assets.
- Open the metadata import asset and click the job name in the About this metadata import side panel, which takes you to the job page. Click the run icon on this page.
- Go to the project's Jobs page and run the import job from there.
Reimporting refreshes the asset information. Existing assets are updated, which means that any content changes are merged. New assets in the data source might be added, depending on the defined scope. If you removed an asset from the metadata import asset, project, or catalog, the asset in question is imported again unless you removed it from the scope. Assets that were removed from the data scope or deleted from the data source after the last import can't be reimported and have the status Removed.
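The reimport rules above can be sketched as a small function that derives the status of each asset after a rerun. This is a simplified model, not the product's implementation. It assumes that the scope is given as a set of individual asset names (in the product, a scope entry can also be a whole schema or folder), and the function and variable names are invented for the example.

```python
def reimport(previously_listed, scope, source):
    """Return a dict that maps each asset name to its status after a reimport.

    previously_listed: assets shown in the metadata import after the last run
    scope:             assets currently in the data scope
    source:            assets that currently exist in the data source
    """
    statuses = {}
    importable = scope & source
    # Everything that is in scope and still exists is imported or updated,
    # including assets that were deleted from the project or catalog only.
    for name in importable:
        statuses[name] = "Imported"
    # Assets listed before but now out of scope or gone from the source
    # can't be reimported.
    for name in previously_listed - importable:
        statuses[name] = "Removed"
    return statuses


# Hypothetical scenario: B was deleted from the data source, C was removed
# from the scope, and D is a new table that was added to the scope.
result = reimport(
    previously_listed={"A", "B", "C"},
    scope={"A", "B", "D"},
    source={"A", "C", "D"},
)
```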
Deleting a metadata import asset
You can delete a metadata import asset from a project. Select the Delete option from the action menu next to the asset on the project Assets page. The metadata import configuration and its associated metadata import job are deleted. Assets in the project or a catalog that were imported with this metadata import asset are not affected.