Importing metadata from a project (Watson Knowledge Catalog)

You can import technical metadata associated with the data assets in your organization into a project or a catalog to inventory, evaluate, and catalog these assets.

Technical metadata describes the structure of data objects. This metadata helps users decide whether the data is appropriate for the task at hand, whether they can trust the data, and how to work with the data.

The metadata that you import can later be enriched with other information to help users find data faster and use it with confidence. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews.

When you import metadata, you add data assets to a project or a catalog. If you import the assets to a project, they are not visible in any catalog until you publish them. After you publish them to a catalog, other catalog users can work with these assets.

Required permissions
To create and run a metadata import, you must have the Admin or the Editor role in the project. To import metadata into a catalog, you must also have the Admin or the Editor role in the catalog to which you want to import.
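The permission rule above can be expressed as a small check. This is only an illustrative sketch: the function name, its parameters, and the "Viewer" role used in the usage example are assumptions and are not part of any Watson Knowledge Catalog API.

```python
# Roles named in the text above. Everything else here is hypothetical.
ALLOWED_ROLES = {"Admin", "Editor"}


def can_run_metadata_import(project_role, catalog_role=None, target="project"):
    """Return True if the given roles satisfy the rule described above.

    project_role: the user's role in the project.
    catalog_role: the user's role in the target catalog, if any.
    target: "project" or "catalog" (the import target).
    """
    # Creating and running the import always requires Admin or Editor
    # in the project.
    if project_role not in ALLOWED_ROLES:
        return False
    # Importing into a catalog additionally requires Admin or Editor
    # in that catalog.
    if target == "catalog":
        return catalog_role in ALLOWED_ROLES
    return True
```

For example, `can_run_metadata_import("Editor")` is true for a project target, while `can_run_metadata_import("Admin", "Viewer", target="catalog")` is false because the catalog role is insufficient.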

Metadata import overview

Importing metadata involves the following process:

Creating a metadata import asset and importing metadata

To create a metadata import asset and a job for importing metadata into a project or a catalog:

  1. Open a project and click Add to project > Metadata Import. After you create the first metadata import in this way, you can add new metadata import assets from the project's Assets page.
  2. Specify a name for the metadata import. Optionally, you can provide a description.

  3. Optionally, select tags to be assigned to the metadata import asset to simplify searching. You can create new tags by entering the tag name and pressing Enter.

  4. Select an existing connection asset as the source of the data, or click Create a new connection and create a connection asset.

    You can import metadata from the data sources listed in Table 1.

  5. Select the import target. You can import metadata into the project that you're working in or to a catalog. When you choose to import into a catalog, you can pick one from all catalogs that are available to you.

    Import metadata into a project if you want to analyze the assets before you decide which of them to publish to a catalog for other users to work with. If you know the contents of the data assets well, you can import their metadata directly into the catalog.

  6. Click Next.

  7. Define a scope for the metadata import.

    Depending on the size and contents of your data source, you might want to import only a subset of the assets rather than all of them. You can include complete schemas or folders, or drill down to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains, which helps you decide whether to include the whole set or only a subset.

    1. Select the items that you want to include in the import. When you're done, click Select.
    2. Review the selected scope. You can directly delete assets from the data scope or you can rework the entire scope by clicking Edit data scope.
    3. When you're done refining the data scope, click Next.
  8. Define whether you want to run scheduled import jobs. If you don't set a schedule, you run the import when you save the metadata import asset. You can rerun the import manually at any time.

    If you choose to run the import on a specific schedule, define the date and time at which you want the job to run. You can schedule single and recurring runs. If you schedule a single run, the job runs exactly once at the specified date and time. If you schedule recurring runs, the job runs for the first time at the timestamp indicated in the Repeat section.

    Optionally, change the name of the import job. The default name is metadata_import_name job.

    You can later access the import job you create from within the metadata import asset or from the project's Jobs page. See Jobs.

  9. Review the metadata import configuration. To make changes, click the edit icon on the tile and update the settings.

  10. Click Create. The metadata import asset is added to the project, and a metadata import job is created. If you didn't configure a schedule, the import is run immediately. If you configured a schedule, the import will run on the defined schedule.

    Important: Assets from the same connection that were already imported through a different metadata import are not imported again but are updated. Such assets no longer show up in the initial metadata import. Only the most recently run metadata import contains the assets.
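The behavior described in the Important note can be pictured with a small in-memory model. This is a simplified sketch, not how the product is implemented: the function `run_import`, the `imports` dictionary, and the "created" and "updated" labels are all hypothetical names chosen for illustration.

```python
def run_import(imports, import_name, connection, asset_names):
    """Record a run of `import_name` and return what happened to each asset.

    imports: dict mapping a metadata import name to the set of
             (connection, asset_name) pairs that it currently lists.
    Returns a dict mapping each asset name to "created" or "updated".
    """
    result = {}
    # Register the import before iterating so the dict size stays stable.
    listed = imports.setdefault(import_name, set())
    for name in asset_names:
        key = (connection, name)
        already_imported = key in listed
        # An asset from the same connection that another metadata import
        # already brought in is updated and moves to this import's list.
        for other_name, other_assets in imports.items():
            if other_name != import_name and key in other_assets:
                other_assets.discard(key)
                already_imported = True
        listed.add(key)
        result[name] = "updated" if already_imported else "created"
    return result
```

In this model, if a hypothetical `import_a` first imports `SALES` and `ORDERS` from one connection and `import_b` later imports `ORDERS` from the same connection, `ORDERS` is reported as updated and is afterward listed only under `import_b`.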

You can work with imported data assets in exactly the same way as with connected data assets. Imported assets have a tag automatically assigned that reflects the asset's parent if applicable.

Table 1. Import connections

Supported connections for metadata import

IBM
    Analytics Engine HDFS
    Cloud Object Storage
    Compose for MySQL
    Data Virtualization Manager for z/OS¹
    Databases for MongoDB
    Databases for PostgreSQL
    Db2
    Db2 Big SQL
    Db2 for i
    Db2 for z/OS
    Db2 Hosted
    Db2 on Cloud
    Db2 Warehouse
    Informix
    Netezza Performance Server
    SQL Query

Third-party
    Amazon RDS for MySQL
    Amazon RDS for PostgreSQL
    Amazon S3²
    Apache HDFS
    Apache Hive
    Box³
    Google BigQuery²
    Greenplum
    MariaDB
    Microsoft Azure Data Lake Store
    Microsoft Azure SQL Database
    Microsoft SQL Server
    MongoDB
    MySQL
    Oracle
    PostgreSQL
    Salesforce.com
    SAP ASE
    SAP HANA
    SAP IQ
    Snowflake

User-defined
    Generic JDBC

Notes:

¹ With Data Virtualization Manager for z/OS, you add data assets and COBOL copybook assets from mainframe systems to catalogs in IBM Cloud Pak for Data. Copybooks are files that describe the data structure of a COBOL program. Data Virtualization Manager for z/OS helps you create virtual tables and views from COBOL copybook maps. You can then use these virtual tables and views to import and catalog mainframe data in IBM Cloud Pak for Data in the form of data assets and COBOL copybook assets.

The following map types are not imported: ACI, Catalog, Natural

When the import is finished, you can go to the catalog to review the imported assets, including the COBOL copybook maps, virtual tables, and views. You can use these assets in the same ways as other assets in Cloud Pak for Data.

For more information, see Adding COBOL copybook assets.

² This type of connection must be created at project level and then selected from the list of existing connections when you create a metadata import. You cannot create such a connection from within the metadata import.

³ Box-specific metadata, such as tags, descriptions, and classifications, is not imported.

Viewing the metadata import

Metadata import assets are listed in the Metadata imports section of the Assets page. To view an asset, click its name or select View from the asset's action menu.

When you view the metadata import asset, you can see the list of assets imported with a run of the associated import job. You can work with these assets, edit the metadata import, or rerun the import.

For each imported asset, you can see the following information:

You can view additional information for an asset, publish it to a catalog, or delete the asset. When you delete an asset from the list of imported assets, it is deleted from the project or catalog to which it was imported but not from the metadata import scope.

The About this metadata import side panel provides a summary of the import configuration, job details, and a list of related assets. To hide the details, click the information icon.

To edit the metadata import asset, click Edit metadata import. You can change these configuration settings:

Rerunning the import

If you did not configure a schedule, you can manually rerun the metadata import at any time in several ways:

Reimporting refreshes the asset information. Existing assets are updated, which means that any content changes are merged. New assets in the data source might be added, depending on the defined scope. If you removed an asset from the metadata import asset, project, or catalog, that asset is imported again unless you also removed it from the scope. Assets that were removed from the data scope or deleted from the data source after the last import can't be reimported and have the status Removed.
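The reimport rules above can be summarized with a short set-based sketch. It is a simplified model under stated assumptions: the function `reimport` and the labels "Updated" and "Imported" are hypothetical and chosen for illustration; only the status Removed is named in the text.

```python
def reimport(present, scope, source):
    """Model the outcome of rerunning a metadata import.

    present: names of assets from earlier runs that still exist in the target.
    scope:   names currently included in the metadata import's data scope.
    source:  names that currently exist in the data source.
    Returns a dict mapping asset name to a status label.
    """
    statuses = {}
    # Only assets that are both in scope and still in the source can be imported.
    importable = scope & source
    for name in sorted(importable):
        # Existing assets are updated. New assets, and assets that were
        # deleted from the target but kept in scope, are imported (again).
        statuses[name] = "Updated" if name in present else "Imported"
    # Assets dropped from the scope or deleted from the source cannot be
    # reimported and are marked Removed.
    for name in sorted(present - importable):
        statuses[name] = "Removed"
    return statuses
```

For instance, with `present = {"A", "B", "C"}`, `scope = {"A", "B", "D"}`, and `source = {"A", "C", "D"}`, asset A is updated, D is newly imported, B is marked Removed because it no longer exists in the source, and C is marked Removed because it is no longer in scope.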

Deleting a metadata import asset

You can delete a metadata import asset from a project. Select the Delete option from the action menu next to the asset on the project Assets page. The metadata import configuration and its associated metadata import job are deleted. Assets in the project or a catalog that were imported with this metadata import asset are not affected.

Next steps
