Designing metadata imports (IBM Knowledge Catalog)

When you import metadata, you must decide what type of metadata to import, the import target and scope, whether to schedule import jobs, and how you want to customize the import behavior.

Import goals and methods

The first step when you import metadata is to select the import method for your goal. You must decide which type of metadata to import for which type of data object. You must also decide whether you want to work with the imported assets in a project or publish them directly into a catalog.

Typically, metadata import is part of a larger data curation plan. For example, after you import metadata for data assets, you can add business metadata to your imported data assets by running metadata enrichment. You can also run data quality rules. Finally, you can publish the completed data assets to a catalog to share with your organization. Before you design your metadata import, make sure that you understand the implications of your choices for your overall curation plan. See Planning for curation.

For example, a typical curation process for data assets includes the following tasks:

  1. Run metadata import with the Discover option to add data assets to a project.
  2. Run metadata enrichment on the data assets to profile your data, to do basic data quality analysis, and to provide business context through term assignment.
  3. Run data quality rules on the assets.
  4. Publish the assets to a catalog.
  5. Run metadata import for the same data assets with the Get lineage option to add lineage information to those assets in the catalog.

You can add other types of assets directly to a catalog because metadata enrichment and data quality assessment do not apply to them. You can choose one of the Get lineage options to import technical and lineage metadata for assets at the same time as you add those assets to a catalog.

You can choose from the following import methods:

Discover
Import technical metadata that describes the characteristics of data objects such as data tables or files, COBOL copybooks, and transformation scripts into a project or a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of assets. For data assets, the technical metadata allows access to the data and the generation of the data profile and data quality analysis.
Import ETL job
Import technical metadata that describes ETL jobs to a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of ETL job assets.
Import BI report
Import business intelligence reports from analysis and reporting tools to a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of business intelligence assets.
Import data model
Import technical metadata that describes hierarchical data models into a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of data model assets.
Get lineage
Import technical metadata and lineage metadata that shows the flow of data between data objects to a catalog. The imported lineage metadata provides the lineage information for the Lineage page and the technical lineage in the MANTA Automated Data Lineage UI.
Get ETL job lineage
Import technical metadata that describes ETL jobs and lineage metadata that shows the flow of data for ETL jobs to a catalog. The imported lineage metadata provides the lineage information for the Lineage page and the technical lineage in the MANTA Automated Data Lineage UI.
Get BI report lineage
Import technical metadata that describes business intelligence reports and lineage metadata that shows the flow of data for business intelligence reports to a catalog. The imported lineage metadata provides the lineage information for the Lineage page and the technical lineage in the MANTA Automated Data Lineage UI.

Special considerations for projects marked as sensitive

If your project is marked as sensitive, the only import option is Discover with a project as the import target. See Marking a project as sensitive.

To import lineage to a catalog, data stewards who are responsible for curating data and have access to the data source work in projects that are not marked as sensitive. The data assets that the import creates in the catalog can then be added to a project for consumption.

Data consumers such as data scientists can search for data assets in catalogs and also view the lineage information in the catalog. They can add the data assets to a project marked as sensitive for analysis or transformation. However, they cannot move any data assets out of such a project.

Choosing an import method

Choose the import method based on your goals and requirements.

Goal | Method
--- | ---
Import data to a project for enrichment and quality analysis | Discover
Import data to a catalog | Discover
Add COBOL copybooks to a project or catalog | Discover
Add business intelligence reports to a catalog | Import BI report
Add transformation scripts to a project or catalog | Discover
Add data integration components from ETL jobs to a catalog | Import ETL job
Add data models to a catalog | Import data model
Add or update data and lineage information in a catalog | Get lineage
Add or update business intelligence reports and their lineage information in a catalog | Get BI report lineage
Add or update data integration components and lineage information for ETL jobs in a catalog | Get ETL job lineage
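As a minimal sketch, the goal-to-method table can be expressed as a lookup. The goal keys and the function name `choose_import_method` are hypothetical labels invented for this illustration; only the method names come from the table.

```python
# Lookup version of the goal-to-method table.
# The keys are shorthand labels made up for this sketch; the values are
# the import method names as they appear in the table.
IMPORT_METHOD_BY_GOAL = {
    "data_for_enrichment_in_project": "Discover",
    "data_to_catalog": "Discover",
    "cobol_copybooks": "Discover",
    "transformation_scripts": "Discover",
    "bi_reports": "Import BI report",
    "etl_components": "Import ETL job",
    "data_models": "Import data model",
    "data_with_lineage": "Get lineage",
    "bi_reports_with_lineage": "Get BI report lineage",
    "etl_components_with_lineage": "Get ETL job lineage",
}


def choose_import_method(goal: str) -> str:
    """Return the import method for a goal label, or raise ValueError."""
    try:
        return IMPORT_METHOD_BY_GOAL[goal]
    except KeyError:
        raise ValueError(f"Unknown goal: {goal!r}") from None


print(choose_import_method("bi_reports_with_lineage"))
```

A lookup like this can be handy in planning scripts or checklists, but the method itself is still selected in the metadata import user interface.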

The following table shows the results for each import method.

Method | Metadata types | Resulting asset types
--- | --- | ---
Discover | Technical | Data; COBOL copybooks; Transformation scripts
Import ETL job | Technical | Data integration job; Data integration component; Data integration column
Import BI report | Technical | Reports; Report queries; Report query items
Import data model | Technical | Logical model; Logical model attribute; Logical model entity; Logical model relationship; Physical model; Physical model schema; Physical model table; Physical model view; Physical model column; Physical model constraint
Get lineage | Technical, Lineage | Data; COBOL copybooks; Reports; Report queries; Report query items; Transformation scripts
Get ETL job lineage | Technical, Lineage | Data; Data integration job; Data integration component; Data integration column
Get BI report lineage | Technical, Lineage | Data; Reports; Report queries; Report query items

Import target

Depending on the type of metadata that you want to import, you can import the metadata into the project that you're working in or into a catalog. When you choose to import into a catalog, you can pick one from all catalogs that are available to you. However, if your project is marked as sensitive, you can import only to the project, not to a catalog. See Marking a project as sensitive.

The following table shows what type of metadata you can import into projects and catalogs.

What to import | Import to project? | Import to catalog?
--- | --- | ---
Data assets | Yes | Yes
COBOL copybook assets | Yes | Yes
Business intelligence report assets | No | Yes
Transformation script assets | Yes | Yes
Data integration assets | No | Yes
Data model assets | No | Yes
Data lineage | No | Yes

For data assets, if you want to run metadata enrichment and data quality rules on them, select the project as the import target. You publish the imported data assets to a catalog after you are satisfied with their business metadata assignments and data quality.

If you know the contents of the data assets well, and you do not want to run metadata enrichment or data quality rules, you can import their metadata directly into the catalog.

If you import to a catalog, make sure that duplicate asset handling for the target catalog is set to update the original assets rather than to allow duplicate assets. See Duplicate asset handling.

If you want data protection rules to be enforced on the imported data assets, you must select a governed catalog as the import target.
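The target rules can be summarized in a small helper. This is a sketch based only on the statements in this topic: Discover can target a project or a catalog, all other methods target a catalog, and a project marked as sensitive allows only Discover with the project as the target. The function name `allowed_targets` and the string labels `"project"` and `"catalog"` are assumptions made for the illustration, not product API names.

```python
# Import method names as listed in this topic.
PROJECT_OR_CATALOG_METHODS = {"Discover"}
CATALOG_ONLY_METHODS = {
    "Import ETL job",
    "Import BI report",
    "Import data model",
    "Get lineage",
    "Get ETL job lineage",
    "Get BI report lineage",
}


def allowed_targets(method: str, project_is_sensitive: bool = False) -> set[str]:
    """Return the import targets that this topic describes for a method.

    An empty set means the method is not offered in that situation.
    """
    if method not in PROJECT_OR_CATALOG_METHODS | CATALOG_ONLY_METHODS:
        raise ValueError(f"Unknown import method: {method!r}")
    if method in PROJECT_OR_CATALOG_METHODS:
        # In a sensitive project, Discover can import only to the project.
        return {"project"} if project_is_sensitive else {"project", "catalog"}
    # Catalog-only methods are not available in a sensitive project.
    return set() if project_is_sensitive else {"catalog"}


print(allowed_targets("Discover"))
print(allowed_targets("Get lineage", project_is_sensitive=True))
```

The sketch does not model the governed-catalog requirement for data protection rules; that remains a separate check on the catalog you select.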

Scope of import

You have different scope options depending on the import method that you select.

Discover

You have the following scope options:

  • You can select one data source.
  • For some data sources, you can select the entire connection or a database. The next time the import job runs, or when you reimport assets manually, new schemas that were added in the data source are also imported to IBM Knowledge Catalog.
  • For some data sources, you can narrow the scope by schemas or folders, or to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains. Thus, you can decide whether you want to include the whole set or whether a subset serves your purpose better. However, you can't import data from schemas where the name contains special characters. After you select a scope, you can delete assets from the data scope until you have exactly what you need.
  • You can create multiple metadata import assets for a single data source. Each metadata import contains tables or files that have a similar frequency of changes to structure, schema, or data rows. You can then run each metadata import on a different schedule. For example, you might create separate metadata imports for tables with frequent updates, infrequent updates, and rare updates.

Import ETL job

You have the following scope options:

  • You can select a single ETL job file from your project to import data integration assets.
  • You can import one or more DataStage flows from your project.

See Preparing ETL job files.

Import BI report

You have the following scope options:

  • You can select a single input file from your project to import reports from Microsoft SQL Server Analysis Services or Statistical Analysis System.
  • You can select a single connection to a reporting tool. For a connection to a reporting tool other than Cognos Analytics, the import always covers all assets that are available from the selected connection. For a connection to Cognos Analytics, you can narrow the data scope to one or more folders.

See Preparing manual input for importing business intelligence reports.

Import data model

You can select a single data model file from your project to import data model assets. See Preparing data model files.

Get lineage

You have the following scope options:

  • You can select multiple data sources.
  • For connections to supported relational data sources, you can narrow the data scope to one or more schemas.

Get ETL job lineage

You have the following scope options:

  • You can select a single ETL job file from your project to import data integration assets and their lineage.
  • You can import one or more DataStage flows and their lineage from your project.
  • In addition to the input files, you can select connections to data sources that are used as source or target in the ETL job to get the lineage between the ETL job and these data sources.

Get BI report lineage

You have the following scope options:

  • You can select a single input file from your project to import reports from Microsoft SQL Server Analysis Services or Statistical Analysis System.
  • You can select one or more connections to a reporting tool. For connections to reporting tools other than Cognos Analytics, the import always covers all assets that are available from the selected connection. For connections to Cognos Analytics, you can narrow the data scope to one or more folders.
  • In addition to an input file or connections to a reporting tool, you can select connections to data sources that hold source assets of the business intelligence report. Thus, you include technical and lineage metadata for such assets and get the lineage between the report and these assets.
  • You can also combine a single input file with one or more connections to a reporting tool and any number of connections to data sources.

See Preparing manual input for importing business intelligence reports.

Scheduling options

If you don't set a schedule, you run the import when you initially save the metadata import asset. You can rerun the import manually at any time.

If you choose to run the import on a specific schedule, define the date and time you want the job to run. You can schedule single and recurring runs. If you schedule a single run, the job runs exactly one time at the specified day and time. If you schedule recurring runs, the job runs for the first time at the timestamp indicated in the Repeat section.

You might want to coordinate scheduled metadata imports with the corresponding metadata enrichment jobs for the same assets.

The default name of the import job is metadata_import_name job. When you set up the metadata import, you can change the name to fit your naming conventions. However, you can't change the name later. You can access the import job that you create from within the metadata import asset or from the project's Jobs page. See Jobs.

You can update the schedule of a metadata import by editing the metadata import asset.

Advanced import options

You can customize the general import behavior and what happens to imported assets when you rerun a metadata import.

Prevent specific properties from being updated

By default, all asset properties are updated when assets are reimported. If you don't want the asset names, asset descriptions, or any column descriptions to be updated on reimport, clear the respective checkboxes on the Update on reimport list.

Delete existing assets that are not included in the reimport

By default, no assets are deleted from the target project or catalog when you rerun the import. To clean up the target project or catalog, select from the Delete on reimport options.

These options are not available for all import goals.

Option | Description | Import goal
--- | --- | ---
Asset not found in the data source or excluded from import | When the import is rerun, delete previously imported assets from the import target in these cases: the asset is no longer available in the data source; or the Exclude from import setting changed for the rerun, so that the asset is now excluded from import (applicable only for import goal Discover). | Discover, Import ETL job, Get lineage, Get ETL job lineage
Asset removed from the import scope | When the import is rerun, delete from the import target any assets that were removed from the scope of this metadata import after the last run. | Discover
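The effect of the two Delete on reimport options can be modeled with plain sets. This is a simplified sketch, not the product's implementation: assets are represented by name only, the class `DeleteOnReimport` and the function `assets_to_delete` are hypothetical, and the sketch does not model which options are available for which import goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteOnReimport:
    # "Asset not found in the data source or excluded from import"
    not_found_or_excluded: bool = False
    # "Asset removed from the import scope"
    removed_from_scope: bool = False


def assets_to_delete(
    previously_imported: set[str],
    in_source: set[str],
    excluded: set[str],
    in_scope: set[str],
    options: DeleteOnReimport,
) -> set[str]:
    """Return the previously imported assets that a rerun would delete."""
    to_delete: set[str] = set()
    if options.not_found_or_excluded:
        to_delete |= {
            a for a in previously_imported
            if a not in in_source or a in excluded
        }
    if options.removed_from_scope:
        to_delete |= {a for a in previously_imported if a not in in_scope}
    return to_delete


imported = {"A", "B", "C", "D"}
source = {"A", "B", "D"}      # C no longer exists in the data source
excluded = {"B"}              # B is now excluded from import
scope = {"A", "B", "C"}       # D was removed from the import scope

print(assets_to_delete(imported, source, excluded, scope, DeleteOnReimport()))
print(assets_to_delete(imported, source, excluded, scope,
                       DeleteOnReimport(not_found_or_excluded=True)))
```

With both options cleared, which is the default, the result is an empty set: nothing is deleted from the target on reimport.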

Do not import specific types of relational assets

For metadata imports with the goal Discover that you run on relational databases, you can import all types of relational assets, exclude tables, or exclude views, aliases, and synonyms. These options are mutually exclusive.

Import additional asset properties

For metadata imports with the goal Discover that you run on relational databases, you can select whether primary and foreign keys that might be defined in the database are imported.

Enable additional import options

Enable incremental imports to import only new or modified data assets when you rerun the import. This option is available only for metadata imports with the goal Discover where the selected data source supports incremental imports:

Updating or removing the description of an asset in the data source does not change the asset's modification date. The modification date also doesn't change for assets that are removed from the list of imported assets. Therefore, such assets are not considered for incremental imports. In addition, assets that are deleted from the data source or from the scope are not detected with incremental imports. Thus, such assets are not marked as Removed or deleted as specified with the Delete on reimport settings. To see such changes reflected, disable incremental imports to reimport all assets in the data scope.
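The limitation described above follows from selecting assets by modification date. The following sketch assumes, for illustration only, that an incremental run picks up assets whose modification timestamp is later than the last run; the class `SourceAsset` and the function `incremental_candidates` are hypothetical and do not represent the product's internal logic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SourceAsset:
    name: str
    modified: datetime  # modification date as reported by the data source


def incremental_candidates(source_assets: list[SourceAsset],
                           last_run: datetime) -> list[str]:
    """Select only assets modified after the last import run."""
    return [a.name for a in source_assets if a.modified > last_run]


last_run = datetime(2024, 1, 10)

# State of the data source at the time of the rerun:
# - "orders" was altered after the last run, so its date changed.
# - "customers" had its description edited, which does not change the date.
# - "legacy" was deleted from the source, so it is simply absent.
current_source = [
    SourceAsset("orders", datetime(2024, 1, 12)),
    SourceAsset("customers", datetime(2024, 1, 5)),
]

print(incremental_candidates(current_source, last_run))
```

Only `orders` is selected. The description change on `customers` and the deletion of `legacy` are invisible to a date-based filter, which is why a full reimport is needed to reflect them.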

Important:

Incremental imports might not work if the data source and the Cloud Pak for Data client workstation are in different time zones. If the client is in a time zone that is ahead of the data source's time zone, the metadata import job might not detect assets that were added or modified after the last import run. In this case, disable incremental import so that all assets are included when you rerun the import.
For incremental imports to work, the data source must be in the GMT time zone regardless of the client's time zone.
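How the import job compares timestamps internally is not described here. As a general illustration of the failure mode, the following sketch shows how comparing wall-clock times from two time zones can miss a change that a time zone-aware comparison detects. The offsets and timestamps are made-up example values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical offsets: the client is ahead of the data source.
client_tz = timezone(timedelta(hours=2))    # UTC+2
source_tz = timezone(timedelta(hours=-5))   # UTC-5

# Last import run, recorded in client local time (10:00 UTC).
last_run = datetime(2024, 1, 1, 12, 0, tzinfo=client_tz)
# Asset modified in data source local time (13:00 UTC, after the run).
asset_modified = datetime(2024, 1, 1, 8, 0, tzinfo=source_tz)

# Naive comparison: drop the time zone and compare wall-clock values.
naive_detects_change = (
    asset_modified.replace(tzinfo=None) > last_run.replace(tzinfo=None)
)

# Aware comparison: Python accounts for the UTC offsets of both values.
aware_detects_change = asset_modified > last_run

print(naive_detects_change)  # False: 08:00 is not later than 12:00
print(aware_detects_change)  # True: 13:00 UTC is later than 10:00 UTC
```

When both systems report times in the same zone, such as GMT, the wall-clock comparison and the aware comparison agree.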

Collect metadata from database catalog

For metadata imports with the goal Discover that you run on relational databases, you can choose to import metadata from the database catalog. Thus, the user who runs the import needs access only to the database catalog but doesn't need to have SELECT permission on the actual data. The imported assets cannot be profiled or used in metadata enrichment.

Include DataStage job runs

For metadata imports with the goal Get ETL job lineage, you can include or exclude DataStage job runs to be used by MANTA Automated Data Lineage. Exclude job runs to limit the script count for MANTA Automated Data Lineage. This option is not available when you edit the import configuration. DataStage job runs are not imported to a catalog and they are only displayed in the MANTA Automated Data Lineage user interface.

Import lineage for dependent assets not in scope

For metadata imports with the goal Get lineage, you can additionally import lineage assets that are not selected in the scope but are related to assets that are selected in the scope. This option applies to imports from Oracle data sources.

Import asset timestamp

For metadata imports with the goal Discover, you can include the information about the time when the asset was last modified. The metadata_modification_token attribute is added to the extended_metadata property of an asset. This option is available for the following data sources:
