Automatic term assignment (Watson Knowledge Catalog)

Automatic term assignment is the process of automatically mapping business terms to data assets and asset columns. Terms can automatically be assigned to assets and columns as a part of metadata enrichment, column analysis, and automated discovery.

For information about automatic term assignment in column analysis and automated discovery, see Automatic term assignment in column analysis and automated discovery.

You can assign business terms manually by editing the data asset properties in a project or a catalog, or when you work with enrichment results.

If automatic term assignment is configured as part of metadata enrichment, such assignments are generated by several methods. These methods also generate suggestions for terms to assign.

Terms are assigned based on their confidence level. Initially, these associations are represented as candidates, which domain experts and stewards can review and assign manually. The project's enrichment settings determine the confidence level at which a term is suggested or automatically assigned. By default, the confidence must exceed 75% for a term to be suggested and 90% for a candidate term to be assigned automatically.
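The effect of the two default thresholds can be illustrated with a minimal sketch. It is not the product's implementation: the function name classify_term and the labels "assigned", "suggested", and "ignored" are hypothetical, and the only values taken from this topic are the 75% and 90% defaults, both of which must be exceeded.

```python
# Defaults quoted in this topic; a project's enrichment settings can override them.
SUGGESTION_THRESHOLD = 0.75   # must be exceeded for a term to be suggested
ASSIGNMENT_THRESHOLD = 0.90   # must be exceeded for automatic assignment


def classify_term(confidence: float,
                  suggestion_threshold: float = SUGGESTION_THRESHOLD,
                  assignment_threshold: float = ASSIGNMENT_THRESHOLD) -> str:
    """Hypothetical helper: map a confidence in [0, 1] to an outcome label."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # "Exceeded" is read as a strict comparison (an assumption of this sketch).
    if confidence > assignment_threshold:
        return "assigned"
    if confidence > suggestion_threshold:
        return "suggested"
    return "ignored"


if __name__ == "__main__":
    for value in (0.95, 0.80, 0.60):
        print(value, classify_term(value))
```

Under these assumptions, a term with a confidence of exactly 90% is only suggested, because the assignment threshold has to be exceeded rather than merely reached.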

Only published business terms can be assigned.

Methods used to generate term assignments

The following methods are used to generate term assignments:

A project administrator can customize some settings for the term assignment methods. See Default enrichment settings.

Removing assignments

Removed terms are considered in automatic term assignment in IBM Cloud Pak for Data 4.5.3 or later.

When you review the assignments, you might find terms that are not accurate for a given data asset. You can remove such terms, which provides negative feedback to the automatic term assignment methods. When you rerun automatic term assignment, the ML-based term assignment method also returns a negative confidence value for such terms. The individual confidence value returned by each term assignment method is adjusted by this negative confidence value when the overall confidence score of a term is calculated. See How the overall confidence score is calculated.

How the overall confidence is computed

A method that associates a term with a data asset computes a confidence, which is a numeric value between a configurable minimum and 1. The minimum is set by the suggestion threshold for term assignment, which is configured as a percentage.

The confidence is computed differently depending on which release of IBM Cloud Pak for Data you are on.

How new analysis results update existing term assignments

When you rerun an enrichment, a new analysis result updates term assignments as follows:

Publishing term assignments

When you publish the enrichment results, term assignments, whether manual or automatic, are available in the catalog and in all projects that contain a given data asset. Term suggestions are not published.

When you remove a published term assignment, all projects that contain the data asset are affected. While you work within the enrichment results, the changes are internal to the project. However, when you publish the changes, the term is removed from the asset in all projects that contain it. Before you remove a published assignment, make sure that it was not added intentionally by another user.
