Importing governance artifacts (Watson Knowledge Catalog)

You can create governance artifacts outside of the catalog and import them from a file that is in CSV (comma-separated value) format. This file can be generated from another software application such as a spreadsheet program.

You can also export governance artifacts, modify the exported file, and then import your updates.

To import all governance artifacts from a different service instance of Watson Knowledge Catalog, or to import Industry accelerators, use the Watson Data API to import a ZIP file.

Required permissions You must have this user permission to import top-level categories:

Consider the following when preparing a CSV file for import:

After you create a CSV file, you can import it:

Governance artifacts that you can import

The following governance artifacts can be imported using a CSV file:

You can't import data protection rules.

All governance artifacts are imported as drafts and subject to workflow, except categories. Any categories you import are published immediately without workflow.

The format of an import CSV file

The header row of the CSV file determines what data is imported for an artifact. For information about which data you can import, see Common columns and Artifact-specific columns.

To define relationships for artifacts that you want to import, specify the related artifact with its full category hierarchy path. Hierarchies are delimited by two greater-than (>>) symbols. Use the following format:

top-level category >> primary category >> artifact
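
The path format above can be built and split with a few lines of Python. This is a minimal sketch, not part of the product: the helper names join_path and split_path are hypothetical, and the check for the greater-than symbol follows the naming rule stated for artifact names in this topic.

```python
SEPARATOR = ">>"


def join_path(*parts: str) -> str:
    """Join category names and an artifact name into a hierarchy path."""
    cleaned = [part.strip() for part in parts]
    for part in cleaned:
        # A segment cannot be empty and cannot contain the delimiter character.
        if not part or ">" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return f" {SEPARATOR} ".join(cleaned)


def split_path(path: str) -> list[str]:
    """Split a hierarchy path into segments, ignoring spaces around '>>'."""
    return [segment.strip() for segment in path.split(SEPARATOR)]


print(join_path("Insurance", "Business Area", "All Accounts"))
print(split_path("Insurance >> Business Area >> Account >> Account"))
```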

Common columns

The following table shows which common columns are supported for each type of artifact when you import governance artifacts. The asterisk (*) indicates mandatory columns. Use the exact column names in the header row of your CSV import file (except for custom attribute values).

Table 1. Supported columns for importing governance artifacts
Categories Terms Classifications Data classes Reference data sets Policies Governance rules
Name*
Artifact Type*
Category*
Description
Secondary Categories
Stewards ✓ (¹)
Tags ✓ (¹)
Classifications ✓ (¹)
Business Start
Business End
Related Terms
Custom attribute values

¹ This information will not be displayed in the UI. To check the values, you can use the REST API call GET /v3/categories/{guid}. For details, see the API documentation.

Name* Artifact names can't contain any greater-than (>) symbols. Names can't start or end with a blank space or greater-than (>) symbol. Names can be up to 255 characters long.
The name is associated with the artifact's primary category.
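
The naming rules above can be checked before a file is imported. The following sketch assumes a hypothetical helper, name_problems, that only mirrors the rules stated in this topic. The import itself might apply further checks.

```python
def name_problems(name: str) -> list[str]:
    """Return a list of rule violations for an artifact name (empty if none)."""
    problems = []
    if not name:
        problems.append("name is empty (Name is a mandatory column)")
    if ">" in name:
        problems.append("name contains a greater-than (>) symbol")
    if name != name.strip(" "):
        problems.append("name starts or ends with a blank space")
    if len(name) > 255:
        problems.append("name is longer than 255 characters")
    return problems


for candidate in ("Account", " Account", "Loans > Home", "x" * 256):
    print(repr(candidate[:20]), name_problems(candidate))
```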

Artifact Type* You can import data for only one type of governance artifact at a time. Use the following values to specify the artifact type:

Table 2. Artifact type values
Artifact type Value
Business terms glossary_term
Categories category
Classifications classification
Data classes data_class
Governance rules rule
Policies policy
Reference data sets reference_data

Category (or category)* Refers to the primary category of the artifact. It must be specified with its full hierarchy path, starting with a top-level category, such as: top-level category >> parent category >> subcategory

The primary category is associated with the name of the artifact. If you do not specify a category, the artifacts are assigned to the category named [uncategorized] by default.

To import a new category (named new-category) specify:

 Name,Artifact Type,Category 
 new-category,category,

To add the category (named new-category) as a subcategory to an existing category (named parent-category) specify:

 Name,Artifact Type,Category
 new-category,category,parent-category

Description (or description) Enter a description for the artifact.

Secondary Categories (or secondary_categories) Any additional categories in which a term is referenced. To import multiple secondary categories for a term, add multiple rows for that term, with a different secondary category value in each row.

Stewards (or stewards)

Stewards are represented as numeric user IDs. If you have the Administer platform permission, you can view user IDs on the User management page.

Tags (or tags) If the tag does not yet exist, a new tag is created. To import multiple tags for an artifact, add multiple rows for that artifact, with a different value for a tag in each row.

Classifications (or classifications) A classification must be specified with its full category hierarchy path, such as: top-level category >> primary category >> classification

Example

  Name,Artifact Type,Category,Classifications 
  Account,glossary_term,Insurance >> Business Area >> All Accounts,Insurance >> concept

To import multiple classifications for an artifact, add multiple rows for that artifact, with a different value for a classification in each row.
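
The one-row-per-value pattern for tags and classifications can be generated with Python's csv module. This is a sketch under assumptions: the helper expand_rows is hypothetical, the tag values are invented sample data, and continuation rows leave the other columns empty, as the multi-row examples elsewhere in this topic do.

```python
import csv
import io

HEADER = ["Name", "Artifact Type", "Category", "Tags", "Classifications"]


def expand_rows(artifact: dict, header: list[str]) -> list[list[str]]:
    """Return one row per value of the longest multi-valued (list) column."""
    lists = [value for value in artifact.values() if isinstance(value, list)]
    depth = max([len(value) for value in lists] + [1])
    rows = []
    for index in range(depth):
        row = []
        for column in header:
            value = artifact.get(column, "")
            if isinstance(value, list):
                # Multi-valued column: one value per row until the list runs out.
                row.append(value[index] if index < len(value) else "")
            else:
                # Single-valued column: written in the first row only.
                row.append(value if index == 0 else "")
        rows.append(row)
    return rows


term = {
    "Name": "Account",
    "Artifact Type": "glossary_term",
    "Category": "Insurance >> Business Area >> All Accounts",
    "Tags": ["finance", "customer"],  # hypothetical sample tags
    "Classifications": ["Insurance >> concept"],
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(HEADER)
writer.writerows(expand_rows(term, HEADER))
print(buffer.getvalue())
```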

Business Start (or businessStart) Effective date for when the published artifact becomes active. Specify the start date and time, including the UTC time offset, in the format yyyy-mm-dd hh:mm±hh:mm, for example, 2020-10-07 16:00+00:00. The time offset is optional. If you omit it, UTC is assumed. By default, the import does not check whether the business start date is in the past, so that previously exported artifacts can be imported. However, an administrator can enable validation of this date. See Enable validation of the business start date on CSV import.

The effective start date is set as follows:

Business End (or businessEnd) Effective date for when the published artifact becomes inactive. Specify the end date and time, including the UTC time offset, in the format yyyy-mm-dd hh:mm±hh:mm, for example, 2021-10-07 17:00+00:00. The time offset is optional. If you omit it, UTC is assumed.
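
The date format for Business Start and Business End can be produced from a Python datetime value. The following is a small sketch. The function name business_timestamp is hypothetical, and treating a value without time zone information as UTC is a choice made here to match the stated default.

```python
from datetime import datetime, timedelta, timezone


def business_timestamp(moment: datetime) -> str:
    """Format a datetime as yyyy-mm-dd hh:mm±hh:mm."""
    if moment.tzinfo is None:
        # No offset given: treat the value as UTC, as the import does.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.isoformat(sep=" ", timespec="minutes")


start = datetime(2020, 10, 7, 16, 0, tzinfo=timezone.utc)
end = datetime(2021, 10, 7, 17, 0, tzinfo=timezone(timedelta(hours=-5)))
print(business_timestamp(start))  # 2020-10-07 16:00+00:00
print(business_timestamp(end))    # 2021-10-07 17:00-05:00
```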

Example

  Name,Artifact Type,Category,Classifications,Related Terms 
  Account claim tracking,glossary_term,Insurance >> Business Area >> Claim,Insurance >> property,Insurance >> Business Area >> Account >> Account

To import multiple terms for an artifact, add multiple rows for that artifact, with a different value for a term in each row.

Custom attribute values (or custom_attributedefinitionname) Values for any type of custom attribute except for custom attributes of type Relationship. The custom attribute must exist in the target catalog. It cannot be created via import.
The column name for such a value is the name of the custom attribute prefixed with custom_.

Artifact-specific columns

Table 3. Artifact-specific columns
Terms Classifications Data classes Reference data sets Policies Rules
Part of Terms
Type of Terms
Synonyms
Data Classes
Abbreviations
Parent Classification Parent Data Class
Enabled
Definition
Type*
Parent Reference Data Sets
Custom Columns
Parent Policy
Rules
Parent Policies
Rules
Reference Data Sets

Part of Terms (or part_of_terms) Shows part relationships: Is a part of or Has a part of

Is a part of Specifies a relationship in which a term is a component of, a part of, an attribute of, or a member of another term.

Has a part of Specifies a relationship in which another term is a component of, a part of, an attribute of, or a member of the first term.

A part relationship must be specified with its full category hierarchy path, such as top-level category >> primary category >> term name, and it must always be defined as an Is a part of relationship.

To import multiple relationships for a term, add multiple rows for that term.

Example Fixed Rate, Hybrid Rate, and Variable Rate are attributes of the term Interest Rate, which in turn is an attribute of the term Home Loan. In this case, each of the terms Fixed Rate, Hybrid Rate, and Variable Rate has the Is a part of relationship to the term Interest Rate. The term Interest Rate has the Has a part of relationship to those terms and the Is a part of relationship to the term Home Loan.

This image shows a sample for Is a part of and Has a part of - part relationships.


The rows for these terms look similar to this sample:

   Name,Artifact Type,Category,Description,Tags,Classifications,Stewards,Related Terms,Part of Terms,Abbreviations
   Interest Rate,glossary_term,category,,,,,,category >> Home Loan,
   Fixed Rate,glossary_term,category,,,,,,category >> Interest Rate,
   Hybrid Rate,glossary_term,category,,,,,,category >> Interest Rate,
   Variable Rate,glossary_term,category,,,,,,category >> Interest Rate,

Type of Terms (or type_of_terms) Shows type relationships: Is a type of or Has a type of

Is a type of Specifies a relationship in which a term is an instance of the concept that is expressed by another term typically broader in scope.

Has a type of Specifies a relationship in which the concept expressed by a term has one or more subtypes that are expressed by other terms. For example, the term Loan might be specified as having the Has a type of relationship to the terms Home Loan, Car Loan, and Student Loan.

A type relationship must be specified with its full category hierarchy path, such as top-level category >> primary category >> term name, and it must always be defined as an Is a type of relationship.

To import multiple relationships for a term, add multiple rows for that term.

Example Home Loan, Car Loan, and Student Loan are different types of loan. In this case, each of the terms Home Loan, Car Loan, and Student Loan has the Is a type of relationship to the term Loan. The term Loan has the Has a type of relationship to those terms.

The rows for these terms look similar to this sample:

   Name,Artifact Type,Category,Description,Tags,Classifications,Stewards,Related Terms,Part of Terms,Type of Terms
   Home Loan,glossary_term,category,,,,,,,category >> Loan
   Car Loan,glossary_term,category,,,,,,,category >> Loan
   Student Loan,glossary_term,category,,,,,,,category >> Loan

Synonyms (or synonyms) Any alternatives for a given term. To import multiple synonyms for a term, add multiple rows for that term, with a different synonym in each row.

Data Classes (or data_classes) Any data classes that are assigned to a term. A data class must be specified with its full category hierarchy path. To import multiple data classes for a term, add multiple rows for that term, with a different data class in each row.

Abbreviations (or abbreviations) Any abbreviation defined for a term. To import multiple abbreviations for a term, add multiple rows for that term, with a different abbreviation in each row.

Parent Classification (or parent_classification) The main classification to which this classification belongs. A parent classification must be specified with its full category hierarchy path.

Example

  Name,Artifact Type,Category,Parent Classification,Related Terms 
  financial,classification,Insurance >> Business Area >> Claim,Insurance >> property

Parent Data Class (or parent_data_class) Specifies the relationship within a hierarchy of data classes. Each data class can have only one parent data class. A parent data class must be specified with its full category hierarchy path.

Example

  Name,Artifact Type,Category,Description,Classifications,Related Terms,Parent Data Class    
  Georgia State Driver's License,data_class,Driver's License >> US License,A string representing a driver license of US state Georgia.,,,Driver's License >> Driver's License

Enabled Indicates whether a data class is enabled for data matching. Possible values are TRUE or FALSE. This column is optional but if you add it, it must contain a value for each imported data class.

Definition

Specify the data class definition in XML format. This column is optional but if you add it, it must contain a value for each imported data class.

Example

  Name,Artifact Type,Category,Description,Tags,Classifications,Stewards,Related Terms,Business Start,Business End,Secondary Categories,Parent Data Class,Enabled,Definition
  Idaho State Driver's License,data_class,,A string representing a driver license of US state Idaho.,,,,,,,,Driver's License,TRUE,"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?><DataClasses xmlns=""http://www.ibm.com/infosphere/ia/classification/DataclassesDefinition""><DataClass id=""IDDL"" name=""Idaho State Driver's License"" example=""AA123456X"" description=""A string representing a driver license of US state Idaho."" active=""true"" provider=""IBM""><RegexClassifier><RegularExpression applicableFor=""structured_data_only"">^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-\.]?)(\d{2})\2(\d{4})$|^[a-zA-Z]{2}[0-9]{6}[a-zA-Z]{1}$</RegularExpression></RegexClassifier><DataTypeFilter><LogicalDataType>string</LogicalDataType></DataTypeFilter><DataLengthFilter minLength=""9"" maxLength=""11""/></DataClass></DataClasses>"
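
In the example above, the XML definition is enclosed in double quotation marks and every quotation mark inside it is doubled, which is the usual CSV escaping. Python's csv module applies this escaping automatically, as the following sketch shows. The shortened XML string and the data class name Example Class are hypothetical and are not a complete data class definition.

```python
import csv
import io

# Hypothetical, shortened definition used only to show the quoting.
definition = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<DataClasses xmlns="http://www.ibm.com/infosphere/ia/classification/DataclassesDefinition">'
    '<DataClass id="EX" name="Example Class" active="true"/>'
    "</DataClasses>"
)

header = ["Name", "Artifact Type", "Category", "Enabled", "Definition"]
row = ["Example Class", "data_class", "", "TRUE", definition]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")  # default dialect doubles quotes
writer.writerow(header)
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back restores the original XML string.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[1][4] == definition)
```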

Data Set Type (or type)* The data type of the reference data set. This property is mandatory if you import a reference data set. Supported values are text, date, number.

Parent Reference Data Sets (or parent_rds) Specifies the relationship within a hierarchy of reference data sets. A reference data set can have any number of parent reference data sets. A parent reference data set must be specified with its full category hierarchy path.

To import multiple parent reference data sets, add a separate row for each one.

Example

  Name,Artifact Type,Category,Description,Tags,Classifications,Stewards,Related Terms,Business Start,Business End,Data Set Type,Parent Reference Data Sets,Secondary Categories,Custom Columns
  RDS1,reference_data,cat1,Test RDS,tag1,Confidential,1000330999,cat1 >> term1,6/15/2021 13:08,7/31/2021 21:59,TEXT,cat1 >> RDS2,Information Governance,TEXT||CUSTOM COLUMN1||test
  ,,,,,,,,,,,,Business Information,NUMBER||CUSTOM COLUMN2||test2
  RDS2,reference_data,cat1,,,,,,6/15/2021 13:13,,TEXT,,,


In a spreadsheet program, the definition would look like this:
This image shows a sample for how the definition looks in a spreadsheet program.

Custom Columns (or custom_columns) Map columns in a reference data set to existing custom columns or to new custom columns that you create. Specify the information in the following format:
type||column name||(optional: column description).

Examples

  NUMBER||Digit Count||Number of digits of the prime
  TEXT||Additional information
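
A value in this format can be assembled as follows. This is a sketch with a hypothetical helper, custom_column. Writing the type in uppercase follows the examples above and is an assumption about what the import expects.

```python
def custom_column(column_type: str, name: str, description: str = "") -> str:
    """Build a Custom Columns value: type||column name||optional description."""
    parts = [column_type.upper(), name]
    if description:
        parts.append(description)
    for part in parts:
        # The delimiter cannot appear inside a part.
        if "||" in part:
            raise ValueError(f"part contains the '||' delimiter: {part!r}")
    return "||".join(parts)


print(custom_column("number", "Digit Count", "Number of digits of the prime"))
print(custom_column("text", "Additional information"))
```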

Parent Policy (or parent_policy)

Specifies the relationship within a hierarchy of policies when you import a policy. Each policy can have only one parent policy.

A parent policy must be specified with its full category hierarchy path.

Example

  Name,Artifact Type,Category,Description,Tags,Classifications,Stewards,Secondary Categories,Rules,Parent Policy
  policy7,policy,PolicyDef,test policy7,Tag1,,,,,PolicyDef >> policy1

Rules (or rules)

When you import a policy, specify any information governance rules that are or should be governed by this information governance policy.

When you import a rule, specify any information governance rules that are related in some way to the current rule. The relationship is symmetrical. If you specify that rule A is related to rule B, then rule B is related to rule A. A rule can have multiple related information governance rules.

To import multiple rules, add a separate row for each one.

Example

  Name,Artifact Type,Category,Description,Tags,Classifications,Stewards,Related Terms,Business Start,Business End,Secondary Categories,Rules,Parent Policy,custom_policy_text_0
  pol3,policy,cat1,,,,,,6/15/2021 13:16,,,,,
  pol2,policy,cat1,,,,,,6/15/2021 13:14,,,cat1 >> rule1,cat1 >> pol1,
  pol1,policy,cat1,test description,tag1,Confidential,1000330999,cat1 >> term1,6/15/2021 13:11,7/31/2021 21:59,Information Governance,cat1 >> rule3,cat1 >> pol3,example CA value
  ,,,,,,,,,,,cat1 >> rule1,,
  ,,,,,,,,,,,cat1 >> rule2,,

Parent Policies (or parent_policies)

Specifies the policies that govern the rule to import.

A parent policy must be specified with its full category hierarchy path.

Example

  Name,Artifact Type,Category,Description,Tags,Classifications,Stewards,Related Terms,Business Start,Business End,Type,Secondary Categories,Rules,Reference Data Sets,Parent Policies,custom_Rule_text_0
  rule2,rule,cat1,,,,,,6/15/2021 13:17,,Governance,,cat1 >> rule1,,cat1 >> pol1,
  rule1,rule,cat1,test description,tag1,Confidential,1000330999,cat1 >> term1,6/15/2021 13:16,8/31/2021 21:59,Governance,Business Information,cat1 >> rule3,cat1 >> RDS2,cat1 >> pol2,example CA value
  ,,,,,,,,,,,,cat1 >> rule2,cat1 >> RDS1,cat1 >> pol1,
  rule3,rule,cat1,,,,,,6/15/2021 13:20,,Governance,,cat1 >> rule1,,cat1 >> pol1,

Reference Data Sets (or rds)

Reference data sets provide logical groupings of code values (reference data values), such as product codes and country codes. These codes are typically sets of allowed values that are associated with data fields and can be assigned to business terms.

Tips and hints for importing data

When creating or updating a CSV import file, consider the following:

When you import a category, you become its only collaborator with the owner role. Imported categories do not have the All users group assigned by default. Modify collaborators and their category roles after the import. When you add the All users group to an imported category, it has the Editor category role assigned by default, as opposed to the Viewer role, which is assigned to this group for manually created categories.

Importing a CSV file

To import a CSV file:

  1. Go to Governance and choose Categories, Business terms, Classifications, Data classes, Reference data, Policies, or Rules.

  2. Click Import from file.

  3. Choose a file.

  4. Select how to manage conflicts with existing artifacts:

    • Replace all values of a row. The import file overwrites existing data defined for the artifact and inserts new data for empty columns if available. The imported draft of an artifact does not replace an existing draft of the artifact with the same name, because you can have multiple drafts of the same artifact with the same name. To replace an existing published artifact, you must publish the draft version you want to use.

      For example:
      (1.) The published business term is named release.
      (2.) You import a CSV file to modify this business term as a draft.
      (3.) When you publish this term, all values of the originally published business term are replaced.

      An overview of the import action:

      • business term: (1.) original business term
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term
        tags:
        related terms: marketing>>version
        classifications: >>Confidential

      • business term: (2.) business term in the CSV file
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term edited
        tags: beta
        related terms: marketing>>date
        classifications:

      • business term: (3.) imported business term (draft), with the selected merge option
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term edited
        tags: beta
        related terms: marketing>>date
        classifications:

    • Replace with defined values. Existing values and empty properties of the published artifact are replaced with the values defined for the imported artifact. Properties without replacement values in the import file remain unchanged. To modify an existing published artifact, you must publish the draft version you want to use.

      For example:
      (1.) The published business term is named release.
      (2.) You import a CSV file to modify this business term as a draft.
      (3.) When you publish this term, existing values and empty properties of the published artifact are replaced with the values defined for the imported artifact.

      An overview of the import action:

      • business term: (1.) original business term
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term
        tags:
        related terms: marketing>>version
        classifications: >>Confidential

      • business term: (2.) business term in the CSV file
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term edited
        tags: beta
        related terms: marketing>>date
        classifications:

      • business term: (3.) imported business term (draft), with the selected merge option
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term edited
        tags: beta
        related terms: marketing>>date
        classifications: >>Confidential

    • Replace empty values. The import file inserts data only where the existing value is empty. Existing values are not modified. To modify an existing published artifact, you must publish the draft version you want to use.

      For example:
      (1.) The published business term is named release.
      (2.) You import a CSV file to modify this business term as a draft.
      (3.) When you publish this term, only empty values of the originally published business term are replaced.

      An overview of the import action:

      • business term: (1.) original business term
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term
        tags:
        related terms: marketing>>version
        classifications: >>Confidential

      • business term: (2.) business term in the CSV file
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term edited
        tags: beta
        related terms: marketing>>date
        classifications:

      • business term: (3.) imported business term (draft), with the selected merge option
        name: release
        artifact type: glossary_term
        category: marketing
        description: example term
        tags: beta
        related terms: marketing>>version
        classifications: >>Confidential

  5. Click Import. Note: Any categories you import are published immediately without workflow.

Imports run in asynchronous mode. You can close the import window to continue working. However, you will not be notified when the import is complete.
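
The three conflict options in step 4 can be modeled as a per-property rule. The following sketch is a simplified model of the one-sentence description given for each option, not the service's implementation. The function merge and the option labels replace_all, replace_defined, and replace_empty are hypothetical names, and an empty string stands for an empty value.

```python
def merge(existing: dict, imported: dict, option: str) -> dict:
    """Combine a published artifact with a CSV row under one conflict option."""
    result = {}
    for key in dict.fromkeys([*existing, *imported]):
        old = existing.get(key, "")
        new = imported.get(key, "")
        if option == "replace_all":
            # The CSV row wins for every property, even when it is empty.
            result[key] = new
        elif option == "replace_defined":
            # The CSV row wins only where it defines a value.
            result[key] = new or old
        elif option == "replace_empty":
            # The CSV row fills only properties that are currently empty.
            result[key] = old or new
        else:
            raise ValueError(f"unknown option: {option!r}")
    return result


published = {
    "description": "example term",
    "tags": "",
    "related terms": "marketing>>version",
    "classifications": ">>Confidential",
}
from_csv = {
    "description": "example term edited",
    "tags": "beta",
    "related terms": "marketing>>date",
    "classifications": "",
}

for option in ("replace_all", "replace_defined", "replace_empty"):
    print(option, merge(published, from_csv, option))
```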

Running parallel imports

You can run multiple imports in parallel and query their statuses independently.

To run parallel imports of governance artifacts, run each import in asynchronous mode by using the REST API. Submit a POST /v3/governance_artifact_types/{artifact_type}/import request with the async_mode parameter set to true.

After the request is submitted, a process ID is provided in the response (process_id parameter). You can use the process ID to check the status of the import. Submit a GET /v3/governance_artifact_types/import/status/{process_id} request to query the import status.
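
The submit-then-poll flow can be sketched as follows. Only the two request paths come from this topic. Everything else is an assumption made for illustration: the host name, passing async_mode as a query parameter, the status strings, and the caller-supplied fetch_status function, which would send the authenticated GET request and extract the status from the response. Check the API documentation for the actual request and response formats.

```python
import time
from typing import Callable, Iterable
from urllib.parse import quote, urlencode


def import_url(base_url: str, artifact_type: str) -> str:
    """URL for POST /v3/governance_artifact_types/{artifact_type}/import."""
    path = f"/v3/governance_artifact_types/{quote(artifact_type, safe='')}/import"
    # Assumption: async_mode is passed as a query parameter.
    return f"{base_url.rstrip('/')}{path}?{urlencode({'async_mode': 'true'})}"


def status_url(base_url: str, process_id: str) -> str:
    """URL for GET /v3/governance_artifact_types/import/status/{process_id}."""
    path = f"/v3/governance_artifact_types/import/status/{quote(process_id, safe='')}"
    return f"{base_url.rstrip('/')}{path}"


def wait_for_import(
    fetch_status: Callable[[str], str],
    process_id: str,
    done_states: Iterable[str],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until fetch_status returns one of done_states, then return it."""
    finished = set(done_states)
    for _ in range(max_polls):
        status = fetch_status(process_id)
        if status in finished:
            return status
        sleep(interval)
    raise TimeoutError(f"import {process_id} did not finish after {max_polls} polls")


# Stand-in for real HTTP calls: hypothetical status strings, no network access.
responses = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
final = wait_for_import(
    lambda process_id: next(responses),
    "example-process-id",
    done_states={"SUCCEEDED", "FAILED"},
    sleep=lambda seconds: None,
)
print(import_url("https://cpd.example.com", "glossary_term"))
print(status_url("https://cpd.example.com", "example-process-id"))
print(final)
```

Because each import has its own process ID, several imports can be tracked by calling wait_for_import once per ID.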

Importing all governance artifacts from a ZIP file

You can import all governance artifacts from a single ZIP file by using the REST API. Use this kind of import to move all governance artifacts from one Watson Knowledge Catalog service instance to another, or to import Industry accelerators. The exported ZIP file contains separate folders for each artifact type, and each folder contains a CSV file with artifacts. The Artifact ID column in the CSV file is required. The structure of the ZIP file cannot be modified.

Important: When you import governance artifacts by using a ZIP file, all artifacts are published immediately without workflow.

Required permissions: You must have the Manage glossary permission to import the ZIP file.

Submit the POST /v3/governance_artifact_types/import request. Select the merge option and add the ZIP file to import. The maximum size of the ZIP file is 2 GB. For details, see the API documentation.

After the request is submitted, a process ID is provided in the response (process_id parameter). You can use the process ID to check the status of the import. Submit a GET /v3/governance_artifact_types/import/status/{process_id} request to query the import status. You can see a separate status for the import process and the synchronization process. The status for the synchronization process is displayed when the synchronization phase starts. The imported data might not be displayed on the target environment until the synchronization finishes, even when the status of the import is "Succeeded".

Considerations when you import files that were edited manually
If you exported the ZIP file and modified its contents manually, make sure that the correct ZIP structure and file format is preserved before you import the ZIP file.
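
A manually edited ZIP file can be checked before it is submitted. The following sketch covers only the two requirements stated in this section, the 2 GB size limit and the Artifact ID column in each CSV file. The function name check_export_zip is hypothetical, and reading 2 GB as 2 x 1024^3 bytes and the CSV files as UTF-8 are assumptions.

```python
import csv
import io
import os
import zipfile

MAX_ZIP_BYTES = 2 * 1024**3  # assumption: "2 GB" interpreted as 2 GiB


def check_export_zip(path: str) -> list[str]:
    """Return a list of problems found in an export ZIP file (empty if none)."""
    problems = []
    if os.path.getsize(path) > MAX_ZIP_BYTES:
        problems.append("ZIP file is larger than 2 GB")
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            with archive.open(info) as raw:
                # Assumption: CSV files are UTF-8, possibly with a byte order mark.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                header = next(csv.reader(text), [])
            if "Artifact ID" not in header:
                problems.append(f"{info.filename}: missing 'Artifact ID' column")
    return problems


# Example call with a hypothetical file name:
# print(check_export_zip("governance_artifacts.zip"))
```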

When you run a large import and the synchronization process is interrupted, for example because the pod where it runs is shut down, the process is restarted automatically on another pod after around 15 minutes and resumes from the point where it failed. If the process keeps failing for 24 hours, it is no longer restarted and is marked as failed.

Tech preview This is a technology preview and is not yet supported for use in production environments.
When the import hangs, you can clean up the import process so that you can import the ZIP file again. You cannot start a new import process until the previous one completes. By default, all import processes are cleaned up automatically after 24 hours of inactivity. If you are an administrator, you can clean up a hanging import process manually before that happens. Clean up the process only when you are sure that it is not in progress or pending. Submit the POST /v3/governance_artifact_types/import/cleanup/{process_id} request.
