Importing metadata from a project (Watson Knowledge Catalog)
You can import technical and process metadata associated with the data assets in your organization into a project or a catalog to inventory, evaluate, and catalog these assets.
Technical metadata describes the structure of data objects. Process metadata includes operative information about the lineage of a data asset. This metadata helps users decide whether the data is appropriate for the task at hand, whether they can trust the data, and how to work with the data.
The metadata that you import can later be enriched with other information to help users find data faster and use it with confidence. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews.
When you import metadata, you add data assets to a project or a catalog. If you import the assets to a project, they are not visible in any catalog until you publish them. After you share them to a catalog, other catalog users can work with these assets.
- Metadata import overview
- Creating a metadata import asset and importing metadata
- Viewing the metadata import
- Rerunning the import
Required permissions
To create and run a metadata import, you must have this user permission:
- Discover assets
To import metadata into a catalog, you must also have the Admin or the Editor role in the catalog to which you want to import.
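The permission rules above can be summarized as a small check. This is only an illustrative sketch: the function `can_import` and its parameters are hypothetical names and are not part of any Watson Knowledge Catalog API. Only the permission name and the role names come from the text above.

```python
def can_import(user_permissions, target, catalog_role=None):
    """Return True if the documented permission rules allow the import.

    user_permissions: set of permission names held by the user.
    target: "project" or "catalog" (hypothetical labels for the import target).
    catalog_role: the user's role in the target catalog, if any.
    """
    # Creating and running a metadata import always needs this permission.
    if "Discover assets" not in user_permissions:
        return False
    # Importing into a catalog additionally needs the Admin or Editor role.
    if target == "catalog":
        return catalog_role in {"Admin", "Editor"}
    return target == "project"


print(can_import({"Discover assets"}, "project"))            # True
print(can_import({"Discover assets"}, "catalog", "Editor"))  # True
print(can_import({"Discover assets"}, "catalog"))            # False
```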
Metadata import overview
Importing metadata involves the following process:
- Identify the data source from which you want to import. You might already have a connection to this data source defined. Otherwise, ensure that you have the credentials to connect to it. For a list of supported connections, see step 4 of the instructions for adding a metadata import asset.
- In a project, create a metadata import asset to configure the import details, such as the scope and the target of the import and the schedule for the import job.
- Import assets to the project or the catalog. When you access an imported data asset, the data is dynamically retrieved from the data source.
- Analyze and preview the imported metadata, and share it to the catalog if you imported the metadata to a project.
Creating a metadata import asset and importing metadata
To create a metadata import asset for importing metadata into a project or a catalog:
- Open a project and click Add to project > Metadata Import. After you create the first metadata import in this way, you can add new metadata import assets from the project’s Assets page.
- Specify a name for the metadata import. Optionally, you can provide a description.
- Select the import target. You can import metadata into the project that you’re working in or into a catalog. When you choose to import into a catalog, you can pick any catalog that is available to you.
Import metadata into a project if you want to analyze it before you decide which assets to share to a catalog for other users to work with. If you know the contents of the data assets well, you can import their metadata directly into the catalog.
- Select an existing connection asset as the source of the data, or click Add connection and create a connection asset.
You can import metadata from these data sources:
IBM:
Analytics Engine HDFS
Cloud Object Storage
Compose for MySQL
Data Virtualization Manager for z/OS¹
Databases for PostgreSQL
Db2
Db2 Big SQL
Db2 for i
Db2 for z/OS
Db2 Hosted
Db2 on Cloud
Db2 Warehouse
Informix
Third-party:
Amazon RDS for MySQL
Amazon RDS for PostgreSQL
Amazon S3²
Apache HDFS
Apache Hive
Google BigQuery²
Microsoft Azure Data Lake Store
Microsoft Azure SQL Database
Microsoft SQL Server
MySQL
Oracle
Pivotal Greenplum
PostgreSQL
Sybase
Sybase IQ
User-defined:
Generic JDBC²
Notes:
¹ With Data Virtualization Manager for z/OS, you add data assets and COBOL copybook assets from mainframe systems to catalogs in IBM Cloud Pak for Data. Copybooks are files that describe the data structure of a COBOL program. Data Virtualization Manager for z/OS helps you create virtual tables and views from COBOL copybook maps. You can then use these virtual tables and views to import and catalog mainframe data in IBM Cloud Pak for Data in the form of data assets and COBOL copybook assets.
When the import is finished, you can go to the catalog to review the imported assets, including the COBOL copybook maps, virtual tables, and views. You can use these assets in the same ways as other assets in Cloud Pak for Data.
For more information, see Adding COBOL copybook assets.
² This type of connection must be created at project level and then selected from the list of existing connections when you create a metadata import. You cannot create such a connection from within the metadata import.
- Click Next.
- Define a scope for the metadata import. Depending on the size and contents of your data source, you might not want to import all assets but only a selected subset. You can include complete schemas or folders, or drill down to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains, so you can decide whether to include the whole set or whether a subset serves your purpose better.
Scoping is not supported when importing copybook assets.
Click Add to scope for each item that you want to include in the import. When you’re done selecting items, click Next.
- Define whether you want to run scheduled import jobs. If you don’t set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time. If you choose to run the import on a specific schedule, define the date and time at which you want the job to run. You can schedule single and recurring runs.
Optionally, change the name of the import job. The default name is metadata_import_name job.
You can later access the import job you create from within the metadata import asset or from the project’s Jobs page.
- Review the metadata import configuration. If you want to make changes, go back and change settings as required.
- Click Save. The metadata import asset is saved in either case. If you didn’t configure a schedule, the import runs immediately. If you configured a schedule, the import runs on the defined schedule.
Important: Assets that were already imported through a different metadata import are not imported again and do not show up in the current metadata import. Thus, the new metadata import might not contain any assets at all.
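The save behavior and the rule in the Important note can be modeled in a few lines. The following is a minimal sketch under stated assumptions, not the product's implementation: the class `MetadataImport`, the functions `save` and `run_import`, and the `owned_by` registry are hypothetical names invented for illustration. Only the default job name pattern, the immediate-versus-scheduled behavior, and the skip rule are taken from the steps above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MetadataImport:
    name: str
    target: str                           # "project" or a catalog name
    scope: list                           # asset identifiers added to the scope
    schedule: Optional[datetime] = None   # None means no schedule was set
    assets: list = field(default_factory=list)

    @property
    def job_name(self):
        # Default job name as described in the scheduling step.
        return f"{self.name} job"


def run_import(mi, owned_by):
    """Import scoped assets, skipping any that another import already owns."""
    for asset in mi.scope:
        owner = owned_by.setdefault(asset, mi.name)
        if owner == mi.name and asset not in mi.assets:
            mi.assets.append(asset)
    return mi.assets


def save(mi, owned_by):
    """Saving runs the import at once unless a schedule is configured."""
    if mi.schedule is None:
        run_import(mi, owned_by)
        return "ran immediately"
    return f"scheduled for {mi.schedule.isoformat()}"


owned_by = {}  # hypothetical registry: asset -> name of the import that owns it
first = MetadataImport("sales", "project", ["SALES.ORDERS", "SALES.CUSTOMERS"])
second = MetadataImport("orders", "project", ["SALES.ORDERS"])
print(save(first, owned_by))   # ran immediately
print(save(second, owned_by))  # ran immediately
print(first.assets)            # ['SALES.ORDERS', 'SALES.CUSTOMERS']
print(second.assets)           # [] because the asset came in through "sales"
print(second.job_name)         # orders job
```

The second import ends up empty, which mirrors the case described in the note where a new metadata import might not contain any assets at all.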
Viewing the metadata import
Metadata import assets are listed in the Metadata imports section of the Assets page. To view an asset, click its name.
When you view the metadata import asset, you can see the list of assets imported with a run of the associated import job. You can work with these assets, rerun the import to refresh the imported assets, or delete all assets imported with this metadata import from the project.
You can work with imported data assets in exactly the same way as with connected data assets. However, imported assets have some tags automatically assigned: the tag discovered and a tag reflecting the asset’s parent, if applicable.
Available import actions are:
- Import again
This action refreshes the assets. Existing assets are updated, which means that any content changes are merged. New assets in the data source might be added, depending on the defined scope. If you removed an asset from the metadata import asset, project, or catalog, that asset is imported again. Removal of an asset from the data source is not reflected in the metadata import. Such an asset might still show up in the metadata import asset, project, or catalog, but it is stale.
- Remove all assets
This action removes all imported assets from the list within the metadata import and, if imported into the project, from the project. If assets were imported into the catalog, this action has no effect.
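The refresh rules of Import again can be expressed as a small function. This is a sketch of the described behavior only: `import_again` and the dictionary layout are hypothetical, and "merging" is modeled here as a simple key-by-key update, which is an assumption about what the product does internally.

```python
def import_again(imported, source, scope):
    """Return the asset map after a refresh.

    imported: dict of asset name -> metadata currently held by the import.
    source:   dict of asset name -> metadata currently in the data source.
    scope:    set of asset names covered by the import scope.
    """
    # Start from what is already imported: assets that were removed from
    # the data source are kept as they are, so they become stale.
    refreshed = {name: dict(meta) for name, meta in imported.items()}
    for name in scope:
        if name in source:
            # Updates existing assets, adds new in-scope assets, and
            # restores assets that were removed from the import target.
            refreshed[name] = {**refreshed.get(name, {}), **source[name]}
    return refreshed


imported = {"ORDERS": {"columns": 3}, "OLD": {"columns": 2}}
source = {"ORDERS": {"columns": 4}, "CUSTOMERS": {"columns": 5}}
result = import_again(imported, source, {"ORDERS", "CUSTOMERS", "OLD"})
print(result["ORDERS"])     # {'columns': 4}  updated from the source
print(result["CUSTOMERS"])  # {'columns': 5}  newly added
print(result["OLD"])        # {'columns': 2}  stale, no longer in the source
```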
To view metadata import asset details, click the information icon. You can edit the asset name and the description. Note that changing the asset name does not change the name of the associated import job.
Rerunning the import
If you did not configure a schedule, you can manually rerun the metadata import at any time in several ways:
- Open the metadata import asset and select Import actions > Import again.
- Open the metadata import asset and click the job name beneath the asset name, which takes you to the job page. Click the run icon on this page.
- Go to the project’s Jobs page and run the import job from there.
Any rerun of an import refreshes asset information as described in the Import again section.