Importing metadata from a project (Watson Knowledge Catalog)

You can import technical and process metadata associated with the data assets in your organization into a project or a catalog to inventory, evaluate, and catalog these assets.

Technical metadata describes the structure of data objects. Process metadata includes operative information about the lineage of a data asset. This metadata helps users decide whether the data is appropriate for the task at hand, whether they can trust the data, and how to work with the data.

The metadata that you import can later be enriched with other information to help users find data faster and use it with confidence. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews.

When you import metadata, you add data assets to a project or a catalog. If you import the assets to a project, they are not visible in any catalog until you publish them. After you publish them to a catalog, other catalog users can work with these assets.

Required permissions
To create and run a metadata import, you must have this user permission:
  • Discover assets

To import metadata into a catalog, you must also have the Admin or the Editor role in the catalog to which you want to import.
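
The permission rules above can be summarized as a small check. This is only an illustrative sketch: the function name, the `target` values, and the way permissions and roles are represented are assumptions made for this example and are not part of any product API.

```python
from typing import Optional, Set

# Hypothetical identifiers; the product's internal permission names may differ.
DISCOVER_ASSETS = "Discover assets"
CATALOG_ROLES_ALLOWED = {"Admin", "Editor"}


def can_import(user_permissions: Set[str], target: str,
               catalog_role: Optional[str] = None) -> bool:
    """Return True if the user may create and run a metadata import."""
    # The "Discover assets" user permission is always required.
    if DISCOVER_ASSETS not in user_permissions:
        return False
    # Importing into a catalog also requires the Admin or Editor role there.
    if target == "catalog":
        return catalog_role in CATALOG_ROLES_ALLOWED
    # Importing into a project needs no catalog role.
    return True
```

For example, a user who has the Discover assets permission but only a viewer role in the target catalog could import into a project but not into that catalog.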

Metadata import overview

Importing metadata involves the process that is described in the following sections.

Creating a metadata import asset and importing metadata

To create a metadata import asset for importing metadata into a project or a catalog:

  1. Open a project and click Add to project > Metadata Import. After you create the first metadata import in this way, you can add new metadata import assets from the project’s Assets page.
  2. Specify a name for the metadata import. Optionally, you can provide a description.
  3. Select the import target. You can import metadata into the project that you’re working in or to a catalog. When you choose to import into a catalog, you can pick one from all catalogs that are available to you.

    Import metadata into a project if you want to analyze the assets before you decide which ones to publish to a catalog for other users to work with. If you know the contents of the data assets well, you can import their metadata directly into the catalog.

  4. Select an existing connection asset as the source of the data, or click Add connection and create a connection asset.

    You can import metadata from these data sources:

    IBM:
      • Analytics Engine HDFS
      • Cloud Object Storage
      • Compose for MySQL
      • Data Virtualization Manager for z/OS¹
      • Databases for PostgreSQL
      • Db2
      • Db2 Big SQL
      • Db2 for i
      • Db2 for z/OS
      • Db2 Hosted
      • Db2 on Cloud
      • Db2 Warehouse
      • Informix

    Third-party:
      • Amazon RDS for MySQL
      • Amazon RDS for PostgreSQL
      • Amazon S3²
      • Apache HDFS
      • Apache Hive
      • Google BigQuery²
      • Microsoft Azure Data Lake Store
      • Microsoft Azure SQL Database
      • Microsoft SQL Server
      • MySQL
      • Oracle
      • Pivotal Greenplum
      • PostgreSQL
      • Sybase
      • Sybase IQ

    User-defined:
      • Generic JDBC²

    Notes:

    ¹ With Data Virtualization Manager for z/OS, you can add data assets and COBOL copybook assets from mainframe systems to catalogs in IBM Cloud Pak for Data. Copybooks are files that describe the data structure of a COBOL program. Data Virtualization Manager for z/OS helps you create virtual tables and views from COBOL copybook maps. You can then use these virtual tables and views to import mainframe data into IBM Cloud Pak for Data and catalog it in the form of data assets and COBOL copybook assets.

    When the import is finished, you can go to the catalog to review the imported assets, including the COBOL copybook maps, virtual tables, and views. You can use these assets in the same ways as other assets in Cloud Pak for Data.

    For more information, see Adding COBOL copybook assets.

    ² This type of connection must be created at project level and then selected from the list of existing connections when you create a metadata import. You cannot create such a connection from within the metadata import.

  5. Click Next.
  6. Define a scope for the metadata import. Depending on the size and contents of your data source, you might want to import only a selected subset of assets instead of all of them. You can include complete schemas or folders, or drill down to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains, which helps you decide whether to include the whole set or only part of it.

    Scoping is not supported when importing copybook assets.

    Click Add to scope for each item that you want to include in the import. When you’re done selecting items, click Next.

  7. Define whether you want to run scheduled import jobs. If you don’t set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time. If you choose to run the import on a specific schedule, define the date and time when you want the job to run. You can schedule single and recurring runs.

    Optionally, change the name of the import job. The default name is metadata_import_name job.

    You can later access the import job you create from within the metadata import asset or from the project’s Jobs page.

  8. Review the metadata import configuration. If you want to make changes, go back and change settings as required.

  9. Click Save. If you didn’t configure a schedule, the metadata import asset is saved, and the import is run immediately. If you configured a schedule, the metadata import asset is also saved, but the import will run on the defined schedule.

    Important: Assets that were already imported through a different metadata import are not imported again and do not show up in the current metadata import. Thus, the new metadata import might not contain any assets at all.
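
To make the configuration choices in the steps above easier to follow, here is a minimal sketch that models a metadata import asset as plain data. Everything in it (the class `MetadataImport`, its fields, and its methods) is hypothetical and exists only for illustration; it does not call any product API. It treats scope items as individual assets, which simplifies the schema and folder selection described in step 6. It shows three behaviors from the text: the default job name, an import that runs on save when no schedule is set, and the skipping of assets that a different metadata import already imported.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class MetadataImport:
    """Hypothetical in-memory model of a metadata import asset."""
    name: str
    target: str                       # "project" or "catalog"
    connection: str                   # name of the connection asset
    scope: List[str] = field(default_factory=list)
    schedule: Optional[datetime] = None
    job_name: Optional[str] = None

    def __post_init__(self) -> None:
        # The default job name is the metadata import name followed by "job".
        if self.job_name is None:
            self.job_name = f"{self.name} job"

    def runs_on_save(self) -> bool:
        # Without a schedule, the import runs when the asset is saved.
        return self.schedule is None

    def assets_to_import(self, already_imported: Set[str]) -> List[str]:
        # Assets imported through a different metadata import are skipped,
        # so the result can be empty.
        return [a for a in self.scope if a not in already_imported]


mdi = MetadataImport("sales", "project", "db2-conn",
                     scope=["SALES.ORDERS", "SALES.CUSTOMERS"])
print(mdi.job_name)                            # sales job
print(mdi.runs_on_save())                      # True
print(mdi.assets_to_import({"SALES.ORDERS"}))  # ['SALES.CUSTOMERS']
```

If every item in the scope was already imported elsewhere, `assets_to_import` returns an empty list, which corresponds to a new metadata import that contains no assets at all.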

Viewing the metadata import

Metadata import assets are listed in the Metadata imports section of the Assets page. To view an asset, click its name.

When you view the metadata import asset, you can see the list of assets imported with a run of the associated import job. You can work with these assets, rerun the import to refresh the imported assets, or delete all assets imported with this metadata import from the project.

You can work with imported data assets in exactly the same way as with connected data assets. However, imported assets have some tags automatically assigned: the tag discovered and a tag reflecting the asset’s parent, if applicable.
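
As a rough illustration of the automatic tagging, the following sketch builds the tag list for an imported asset. The function name is hypothetical, and it assumes that the parent tag is simply the name of the parent object (for example, a schema or folder); the text above does not specify the exact form of that tag.

```python
from typing import List, Optional


def automatic_tags(parent: Optional[str] = None) -> List[str]:
    """Return the tags assumed to be assigned to an imported asset."""
    # Every imported asset gets the tag "discovered".
    tags = ["discovered"]
    # A parent tag is added only if the asset has a parent (assumed format).
    if parent:
        tags.append(parent)
    return tags
```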

Available import actions are:

To view metadata import asset details, click the information icon. You can edit the asset name and the description. Note that changing the asset name does not change the name of the associated import job.

Rerunning the import

If you did not configure a schedule, you can manually rerun the metadata import at any time in several ways:

Any reruns of an import refresh asset information as described in the Import again section.