Automatic term assignment

Automatic term assignment is the process of automatically mapping assets to business terms. It runs as an optional subtask of column analysis or automated discovery.

Terms are words or phrases that describe a characteristic of the enterprise. Term assignment creates associations between terms and information assets, which helps you understand how technical assets relate to glossary entries in Information Governance Catalog.

Initially, these associations are represented as candidates that domain experts and stewards can review and assign manually. A term is represented as a candidate if its confidence level matches or exceeds the candidateThreshold parameter, which is 50% by default. A candidate is automatically assigned if its confidence level matches or exceeds the assignmentThreshold parameter, which is 80% by default. Individual thresholds can be set at the InfoSphere® Information Analyzer suite or workspace level. For more information about configuring term assignment settings, see Customizing term assignment parameters.

Note: If workflow is enabled on the Information Governance Catalog glossary, only published terms can be assigned.
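The threshold rules described above can be pictured with a minimal Python sketch. It is an illustration only, not product code: the function name classify and the label "ignored" for confidence values below the candidate threshold are assumptions made for this example. Only the parameter names candidateThreshold and assignmentThreshold and their default values come from this topic.

```python
# Default values stated in this topic (50% and 80%), expressed as fractions.
CANDIDATE_THRESHOLD = 0.50   # candidateThreshold
ASSIGNMENT_THRESHOLD = 0.80  # assignmentThreshold


def classify(confidence,
             candidate_threshold=CANDIDATE_THRESHOLD,
             assignment_threshold=ASSIGNMENT_THRESHOLD):
    """Return the status of a term suggestion for a given confidence.

    Hypothetical helper: "matches or exceeds" is modeled as >=.
    """
    if confidence >= assignment_threshold:
        return "assigned"    # automatically assigned
    if confidence >= candidate_threshold:
        return "candidate"   # shown for review by stewards
    return "ignored"         # assumed label; below both thresholds


if __name__ == "__main__":
    for value in (0.45, 0.50, 0.79, 0.80, 0.95):
        print(value, classify(value))
```

Because both comparisons are inclusive, a confidence of exactly 50% yields a candidate and a confidence of exactly 80% yields an assignment, which is consistent with the "matches or exceeds" wording.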

The following services are used to generate term assignments:
  • The linguistic name matching service bases its result on the similarity between the term and the name of the data set or asset. For example, a column CREDNUM might be associated with a term Credit Card Number because of the similarity between the two names.
  • The class-based assignment service creates candidates based on data classification. If a data class has been selected for an asset, either as the result of column analysis or manually, and this data class is linked to one or more business terms, these terms are considered candidates or assignments if they match or exceed the respective thresholds. The confidence of the candidate is the same as the confidence of the data class that the term is linked with. For example, a column COL1 that is classified as an E-mail address with 90% confidence is likely to be assigned to the term E-mail Address if the data class and term are linked. Because there is no linguistic similarity between the name of the column and the term, the linguistic name matching service cannot make this association. Appropriate linkage between data classes and terms is a prerequisite for high-quality results, so review this linkage before you run term assignment.
  • The machine learning service uses a supervised machine learning model to assign terms. The model is initially trained with business terms that are already stored in Information Governance Catalog. As users assign business terms manually or confirm automatic assignments, each selection is automatically captured and provided to the model as feedback. This feedback triggers retraining of the model, so that later runs of term assignment produce better results.
    The machine learning service is available when you install IBM® Information Server Enterprise Search.
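To make the difference between the first two services concrete, the following Python sketch contrasts name-based and class-based suggestions. It rests on stated assumptions: this topic does not describe the product's matching algorithm, so difflib.SequenceMatcher from the Python standard library is used as a stand-in similarity measure, and the function names and the linkage dictionary are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(column_name, term_name):
    """Stand-in similarity score between 0 and 1.

    Assumption: SequenceMatcher replaces the product's linguistic
    matching, which is not documented in this topic.
    """
    a = column_name.lower().replace("_", "")
    b = term_name.lower().replace(" ", "")
    return SequenceMatcher(None, a, b).ratio()


def class_based_candidates(data_class, class_confidence, linkage):
    """Return (term, confidence) pairs for terms linked to a data class.

    Each term inherits the confidence of the data class, as this topic
    describes. The linkage mapping is a hypothetical data structure.
    """
    return [(term, class_confidence) for term in linkage.get(data_class, [])]


if __name__ == "__main__":
    # Hypothetical linkage between data classes and business terms.
    linkage = {"E-mail address": ["E-mail Address"]}

    # Name matching relates CREDNUM to Credit Card Number more strongly
    # than it relates COL1 to E-mail Address.
    print(name_similarity("CREDNUM", "Credit Card Number"))
    print(name_similarity("COL1", "E-mail Address"))

    # Class-based assignment still suggests a term for COL1.
    print(class_based_candidates("E-mail address", 0.90, linkage))
```

The sketch shows why the two services complement each other: a column such as COL1 carries no usable name information, but its data class can still lead to a term, provided that the data class and the term are linked.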

Because the analysis runs in the context of an InfoSphere Information Analyzer workspace, results of automatic term assignment are not visible in Information Governance Catalog until they are published. An InfoSphere Information Analyzer analysis runs a configurable sequence of steps on a given data set. You can run either a column analysis or a data quality analysis, which runs the column analysis automatically. Term assignment happens during column analysis. After the analysis is complete, candidate terms are suggested and can be edited, approved, and published from the summary page of each data set in your workspace.

Term assignment functionality that was a part of the Information Governance Catalog, Version 11.5 release is still available as the product transitions to the Unified Governance experience. In Information Governance Catalog, you can open an asset in the catalog and click Detect term assignments, but this feature assigns terms based only on the linguistic name matching service. In contrast, the automatic term assignment that happens when you run automated discovery or a column analysis lets you review term assignments before they are published, which gives data curators more flexibility as they work to assign meaningful terms to assets. For more information, see the Assigning terms to information assets topic in the Information Governance Catalog documentation.