Designing metadata imports (IBM Knowledge Catalog)
When you import metadata, you must decide what type of metadata to import, the import target and scope, whether to schedule import jobs, and how you want to customize the import behavior.
Import goals and methods
The first step when you import metadata is to select the import method for your goal. You must decide which type of metadata to import for which type of data object. You must also decide whether you want to work with the imported assets in a project or publish them directly into a catalog.
Typically, metadata import is part of a larger data curation plan. For example, after you import metadata for data assets, you can add business metadata to your imported data assets by running metadata enrichment. You can also run data quality rules. Finally, you can publish the completed data assets to a catalog to share with your organization. Before you design your metadata import, make sure that you understand the implications of your choices for your overall curation plan. See Planning for curation.
For example, a typical curation process for data assets includes the following tasks:
- Run metadata import with the Discover option to add data assets to a project.
- Run metadata enrichment on the data assets to profile your data, to do basic data quality analysis, and to provide business context through term assignment.
- Run data quality rules on the assets.
- Publish the assets to a catalog.
- Run metadata import for the same data assets with the Get lineage option to add lineage information to those assets in the catalog.
You can add other types of assets directly to a catalog because metadata enrichment and data quality assessment are not applicable. You can choose one of the Get lineage options to simultaneously import technical and lineage metadata for assets while you add those assets to a catalog.
You can choose from the following import methods:
- Discover
- Import technical metadata that describes the characteristics of data objects such as data tables or files, COBOL copybooks, and transformation scripts into a project or a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of assets. For data assets, the technical metadata allows access to the data and the generation of the data profile and data quality analysis.
- Import ETL job
- Import technical metadata that describes ETL jobs to a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of ETL job assets.
- Import BI report
- Import business intelligence reports from analysis and reporting tools to a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of business intelligence assets.
- Import data model
- Import technical metadata that describes hierarchical data models into a catalog. The imported technical metadata provides information for asset details, relationships, and the preview of data model assets.
- Get lineage
- Import technical metadata and lineage metadata that shows the flow of data between data objects to a catalog. The imported lineage metadata provides the lineage information for the Lineage page and the technical lineage in the MANTA Automated Data Lineage UI.
- Get ETL job lineage
- Import technical metadata that describes ETL jobs and lineage metadata that shows the flow of data for ETL jobs to a catalog. The imported lineage metadata provides the lineage information for the Lineage page and the technical lineage in the MANTA Automated Data Lineage UI.
- Get BI report lineage
- Import technical metadata that describes business intelligence reports and lineage metadata that shows the flow of data for business intelligence reports to a catalog. The imported lineage metadata provides the lineage information for the Lineage page and the technical lineage in the MANTA Automated Data Lineage UI.
Special considerations for projects marked as sensitive
If your project is marked as sensitive, the only import option is Discover with a project as the import target. See Marking a project as sensitive.
To import lineage to a catalog, data stewards who are responsible for curating data and have access to the data source work in projects that are not marked as sensitive. The data assets that are created in the catalog by the import can be added to a project for consumption.
Data consumers such as data scientists can search for data assets in catalogs and also view the lineage information in the catalog. They can add the data assets to a project marked as sensitive for analysis or transformation. However, they cannot move any data assets out of such a project.
Choosing an import method
Choose the import method based on your goals and requirements.
Goal | Method |
---|---|
Import data to a project for enrichment and quality analysis | Discover |
Import data to a catalog | Discover |
Add COBOL copybooks to a project or catalog | Discover |
Add business intelligence reports to a catalog | Import BI report |
Add transformation scripts to a project or catalog | Discover |
Add data integration components from ETL jobs to a catalog | Import ETL job |
Add data models to a catalog | Import data model |
Add or update data and lineage information in a catalog | Get lineage |
Add or update business intelligence reports and their lineage information in a catalog | Get BI report lineage |
Add or update data integration components and lineage information for ETL jobs in a catalog | Get ETL job lineage |
The following table shows the results for each import method.
Method | Metadata types | Resulting asset types |
---|---|---|
Discover | Technical | • Data • COBOL copybooks • Transformation scripts |
Import ETL job | Technical | • Data integration job • Data integration component • Data integration column |
Import BI report | Technical | • Reports • Report queries • Report query items |
Import data model | Technical | • Logical model • Logical model attribute • Logical model entity • Logical model relationship • Physical model • Physical model schema • Physical model table • Physical model view • Physical model column • Physical model constraint |
Get lineage | Technical, Lineage | • Data • COBOL copybooks • Reports • Report queries • Report query items • Transformation scripts |
Get ETL job lineage | Technical, Lineage | • Data • Data integration job • Data integration component • Data integration column |
Get BI report lineage | Technical, Lineage | • Data • Reports • Report queries • Report query items |
Import target
Depending on the type of metadata that you want to import, you can import the metadata into the project that you're working in or into a catalog. When you choose to import into a catalog, you can pick one from all catalogs that are available to you. However, if your project is marked as sensitive, you can import only to the project, not to a catalog. See Marking a project as sensitive.
The following table shows what type of metadata you can import into projects and catalogs.
What to import | Import to project? | Import to catalog? |
---|---|---|
Data assets | Yes | Yes |
COBOL copybook assets | Yes | Yes |
Business intelligence report assets | No | Yes |
Transformation script assets | Yes | Yes |
Data integration assets | No | Yes |
Data model assets | No | Yes |
Data lineage | No | Yes |
For data assets, if you want to run metadata enrichment and data quality rules on them, select the project as the import target. You publish the imported data assets to a catalog after you are satisfied with their business metadata assignments and data quality.
If you know the contents of the data assets well, and you do not want to run metadata enrichment or data quality rules, you can import their metadata directly into the catalog.
If you import to a catalog, make sure that the target catalog has duplicate asset handling set to update the original assets instead of to allow duplicate assets. See Duplicate asset handling.
If you want data protection rules to be enforced on the imported data assets, you must select a governed catalog as the import target.
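The target rules in this section can be summarized in a small lookup. The following Python sketch is illustrative only: the method names come from the text, but the function allowed_targets and its parameters are hypothetical and are not part of any IBM Knowledge Catalog API.

```python
from typing import List

# Methods that, according to the text, import only into a catalog.
CATALOG_ONLY_METHODS = {
    "Import ETL job",
    "Import BI report",
    "Import data model",
    "Get lineage",
    "Get ETL job lineage",
    "Get BI report lineage",
}


def allowed_targets(method: str, project_is_sensitive: bool = False) -> List[str]:
    """Return the import targets described in the text for an import method.

    Hypothetical helper for illustration; not a product API.
    """
    if method != "Discover" and method not in CATALOG_ONLY_METHODS:
        raise ValueError(f"Unknown import method: {method}")
    if project_is_sensitive:
        # A sensitive project allows only Discover with the project as target.
        return ["project"] if method == "Discover" else []
    if method == "Discover":
        return ["project", "catalog"]
    return ["catalog"]
```

For example, allowed_targets("Get lineage") yields only the catalog, and allowed_targets("Get lineage", project_is_sensitive=True) yields no valid target.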
Scope of import
You have different scope options depending on the import method that you select.
- Discover
- You have the following scope options:
- You can select one data source.
- For some data sources, you can select the entire connection or a database. The next time the import job runs, or when you reimport assets manually, new schemas that were added in the data source are also imported to IBM Knowledge Catalog.
- For some data sources, you can narrow the scope by schemas or folders, or to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains. Thus, you can decide whether you want to include the whole set or whether a subset serves your purpose better. However, you can't import data from schemas where the name contains special characters. After you select a scope, you can delete assets from the data scope until you have exactly what you need.
- You can create multiple metadata import assets for a single data source. Each metadata import contains tables or files that have a similar frequency of changes to structure, schema, or data rows. You can then run each metadata import on a different schedule. For example, you might create separate metadata imports for tables with frequent updates, infrequent updates, and rare updates.
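As a planning aid for the last Discover option, tables can be grouped by how often they change, with one metadata import per group. The sketch below assumes you already know each table's change frequency; the table names, the frequency labels, and the function plan_imports are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def plan_imports(change_frequency: Dict[str, str]) -> Dict[str, List[str]]:
    """Group tables by change frequency, one planned metadata import per group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for table, frequency in change_frequency.items():
        groups[frequency].append(table)
    return {frequency: sorted(tables) for frequency, tables in groups.items()}


# Hypothetical inventory of tables in a single data source.
tables = {
    "SALES.ORDERS": "frequent",
    "SALES.ORDER_ITEMS": "frequent",
    "SALES.CUSTOMERS": "infrequent",
    "REF.COUNTRIES": "rare",
}
plan = plan_imports(tables)
```

Each key in plan then corresponds to one metadata import asset that you would create and schedule separately.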
- Import ETL job
- You have the following scope options:
- You can select a single ETL job file from your project to import data integration assets.
- You can import one or more DataStage flows from your project.
- Import BI report
- You have the following scope options:
- You can select a single input file from your project to import reports from Microsoft SQL Server Analysis Services or Statistical Analysis System.
- You can select a single connection to a reporting tool. For a connection to a reporting tool other than Cognos Analytics, the import always covers all assets that are available from the selected connection. For a connection to Cognos Analytics, you can narrow the data scope to one or more folders.
See Preparing manual input for importing business intelligence reports.
- Import data model
- You can select a single data model file from your project to import data model assets. See Preparing data model files.
- Get lineage
- You have the following scope options:
- You can select multiple data sources.
- For connections to supported relational data sources, you can narrow the data scope to one or more schemas.
- Get ETL job lineage
- You have the following scope options:
- You can select a single ETL job file from your project to import data integration assets and their lineage.
- You can import one or more DataStage flows and their lineage from your project.
- In addition to the input files, you can select connections to data sources that are used as source or target in the ETL job to get the lineage between the ETL job and these data sources.
- Get BI report lineage
- You have the following scope options:
- You can select a single input file from your project to import reports from Microsoft SQL Server Analysis Services or Statistical Analysis System.
- You can select one or more connections to a reporting tool. For connections to reporting tools other than Cognos Analytics, the import always covers all assets that are available from the selected connection. For connections to Cognos Analytics, you can narrow the data scope to one or more folders.
- In addition to an input file or connections to a reporting tool, you can select connections to data sources that hold source assets of the business intelligence report. Thus, you include technical and lineage metadata for such assets and get the lineage between the report and these assets.
- You can also combine a single input file with one or more connections to a reporting tool and any number of connections to data sources.
See Preparing manual input for importing business intelligence reports.
Scheduling options
If you don't set a schedule, you run the import when you initially save the metadata import asset. You can rerun the import manually at any time.
If you choose to run the import on a specific schedule, define the date and time you want the job to run. You can schedule single and recurring runs. If you schedule a single run, the job runs exactly one time at the specified day and time. If you schedule recurring runs, the job runs for the first time at the timestamp indicated in the Repeat section. You might want to coordinate scheduled metadata import and the corresponding metadata enrichment jobs for the same assets.
The default name of the import job is metadata_import_name job. When you set up the metadata import, you can change the name to fit your naming convention. However, you can't change the name later. You can access the import job you create from within the metadata import asset or from the project's Jobs page. See Jobs.
You can update the schedule of a metadata import by editing the metadata import asset.
Advanced import options
You can customize the general import behavior and what happens to imported assets when you rerun a metadata import.
- Prevent specific properties from being updated
- By default, all asset properties are updated when assets are reimported. If you don't want the asset names, asset descriptions, or any column descriptions to be updated on reimport, clear the respective checkboxes on the Update on reimport list.
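The effect of the Update on reimport checkboxes can be pictured as a selective merge. This is a minimal sketch under assumptions: the property keys (name, description, column_descriptions) and the function merge_on_reimport are hypothetical and do not reflect how the product stores assets.

```python
from typing import Any, Dict, Set

# The three properties that the text says can be protected from updates.
PROTECTABLE = {"name", "description", "column_descriptions"}


def merge_on_reimport(
    existing: Dict[str, Any],
    incoming: Dict[str, Any],
    update_on_reimport: Set[str] = frozenset(PROTECTABLE),
) -> Dict[str, Any]:
    """Merge reimported properties, skipping protectable ones that are cleared."""
    merged = dict(existing)
    for key, value in incoming.items():
        if key in PROTECTABLE and key not in update_on_reimport:
            continue  # checkbox cleared: keep the existing value
        merged[key] = value
    return merged


existing = {"name": "Orders (curated)", "description": "Curated text", "columns": 3}
incoming = {"name": "ORDERS", "description": "", "columns": 4}

# Name and description checkboxes cleared; all other properties still update.
kept = merge_on_reimport(existing, incoming, update_on_reimport={"column_descriptions"})
# Default behavior: every property is updated.
default = merge_on_reimport(existing, incoming)
```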
- Delete existing assets that are not included in the reimport
- By default, no assets are deleted from the target project or catalog when you rerun the import. To clean up the target project or catalog, select from the Delete on reimport options.
These options are not available for all import goals.
Option | Description | Import goal |
---|---|---|
Asset not found in the data source or excluded from import | In these cases, delete previously imported assets from the import target when the import is rerun: • The asset is no longer available in the data source. • The Exclude from import setting changed for the rerun, so that the asset is now excluded from import (applicable only for import goal Discover). | Discover, Import ETL job, Get lineage, Get ETL job lineage |
Asset removed from the import scope | When the import is rerun, delete from the import target any assets that were removed from the scope of this metadata import after the last run. | Discover |
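The two Delete on reimport options can be read as set operations on the previously imported assets. The sketch below is one interpretation of the descriptions above, not the product's implementation; the function assets_to_delete and all of its parameters are hypothetical.

```python
from typing import Set


def assets_to_delete(
    previously_imported: Set[str],
    in_data_source: Set[str],
    excluded: Set[str],
    current_scope: Set[str],
    delete_not_found_or_excluded: bool = False,
    delete_removed_from_scope: bool = False,
) -> Set[str]:
    """Return previously imported assets that a rerun would delete."""
    to_delete: Set[str] = set()
    if delete_not_found_or_excluded:
        # Asset is gone from the data source or is now excluded from import.
        to_delete |= {
            a for a in previously_imported
            if a not in in_data_source or a in excluded
        }
    if delete_removed_from_scope:
        # Asset was removed from the import scope after the last run.
        to_delete |= {a for a in previously_imported if a not in current_scope}
    return to_delete


previous = {"A", "B", "C", "D"}
source = {"A", "B", "D"}   # C was dropped from the data source
excluded = {"B"}           # B is now excluded from import
scope = {"A", "B", "C"}    # D was removed from the scope

nothing = assets_to_delete(previous, source, excluded, scope)
both = assets_to_delete(previous, source, excluded, scope, True, True)
```

With neither option selected, nothing is deleted, which matches the default behavior described above.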
- Do not import specific types of relational assets
- For metadata imports with the goal Discover that you run on relational databases, you can import all types of relational assets, exclude tables, or exclude views, aliases, and synonyms. These options are mutually exclusive.
- Import additional asset properties
- For metadata imports with the goal Discover that you run on relational databases, you can select whether primary and foreign keys that might be defined in the database are imported.
- Enable additional import options
- Enable incremental imports to import only new or modified data assets when you rerun the import. This option is available only for metadata imports with the goal Discover where the selected data source supports incremental imports:
- Amazon RDS for Oracle
- IBM Db2
- IBM Db2 Big SQL
- IBM Db2 on Cloud
- IBM Netezza Performance Server
- IBM Data Virtualization
- Microsoft Azure SQL Database
- Microsoft SQL Server
- Oracle
- Teradata
Updating or removing the description of an asset in the data source does not change the asset's modification date. The modification date also doesn't change for assets that are removed from the list of imported assets. Therefore, such assets are not considered for incremental imports. In addition, assets that are deleted from the data source or from the scope are not detected with incremental imports. Thus, such assets are not marked as Removed or deleted as specified with the Delete on reimport settings. To see such changes reflected, disable incremental imports to reimport all assets in the data scope.
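The limitations above follow from selecting assets by modification date. A minimal sketch, assuming that an incremental run simply picks assets whose modification date is later than the last run; the function changed_since and the sample data are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def changed_since(last_run: datetime, modified_at: Dict[str, datetime]) -> List[str]:
    """Return assets whose modification date is later than the last import run."""
    return sorted(name for name, ts in modified_at.items() if ts > last_run)


last_run = datetime(2024, 3, 1, 12, 0)

# Current state of the data source. A table that was deleted since the last
# run simply does not appear here, so a date comparison cannot detect it.
source = {
    "ORDERS": datetime(2024, 3, 2, 9, 0),     # structure changed after the last run
    "CUSTOMERS": datetime(2024, 2, 1, 9, 0),  # description edited, date unchanged
    "NEW_TABLE": datetime(2024, 3, 3, 9, 0),  # added after the last run
}
picked = changed_since(last_run, source)
```

In this sketch, CUSTOMERS is skipped even though its description changed, because its modification date predates the last run.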
Important: Incremental imports might not work if the data source and the Cloud Pak for Data client workstation are in different time zones. If the client is in a time zone that is ahead of the data source's time zone, the metadata import job might not detect assets that were added or modified after the last import run. In this case, disable incremental import so that all assets are included when you rerun the import.
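One way such a miss can arise is when wall-clock timestamps from different time zones are compared without conversion. The following sketch assumes that mechanism purely for illustration; how the product actually compares timestamps is not documented here, and the offsets and times are made up.

```python
from datetime import datetime, timedelta, timezone

client_tz = timezone(timedelta(hours=5))  # client is 5 hours ahead of the source
source_tz = timezone.utc

# Last import run, recorded on the client: 15:00 local time, which is 10:00 UTC.
last_run = datetime(2024, 1, 1, 15, 0, tzinfo=client_tz)
# Asset modified in the data source at 12:00 UTC, two hours after the last run.
modified = datetime(2024, 1, 1, 12, 0, tzinfo=source_tz)

# Comparing wall-clock values without time zones: 12:00 is not later than 15:00,
# so the modified asset would be missed.
naive_detected = modified.replace(tzinfo=None) > last_run.replace(tzinfo=None)

# Comparing time zone-aware values: 12:00 UTC is later than 10:00 UTC.
aware_detected = modified > last_run
```

Here naive_detected is False while aware_detected is True, which mirrors the case where the client's time zone is ahead of the data source's.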
For incremental imports to work, the data source must be in the GMT time zone regardless of the client's time zone.
- Collect metadata from database catalog
- For metadata imports with the goal Discover that you run on relational databases, you can choose to import metadata from the database catalog. Thus, the user who runs the import needs access only to the database catalog but doesn't need to have SELECT permission on the actual data. The imported assets cannot be profiled or used in metadata enrichment.
- Include DataStage job runs
- For metadata imports with the goal Get ETL job lineage, you can include or exclude DataStage job runs to be used by MANTA Automated Data Lineage. Exclude job runs to limit the script count for MANTA Automated Data Lineage. This option is not available when you edit the import configuration. DataStage job runs are not imported to a catalog and they are only displayed in the MANTA Automated Data Lineage user interface.
- Import lineage for dependent assets not in scope
- For metadata imports with the goal Get lineage, you can additionally import lineage assets that are not selected in the scope, but that are related to assets selected in the scope. This option is applicable to imports from Oracle data sources.
- Import asset timestamp
- For metadata imports with the goal Discover, you can include the information about the time when the asset was last modified. The metadata_modification_token attribute is added to the extended_metadata property of an asset. This option is available for the following data sources: