Designing data classes (IBM Knowledge Catalog)

When you design a data class, you must decide whether to enable data matching for this data class, which business terms or classifications it should be related to, and whether to define hierarchical relationships between data classes.

Required permissions
To author a data class, you must have this user permission:
  • Access governance artifacts
Additionally, you must have one of these category collaborator roles in the primary category for the data class:
  • Admin
  • Owner
  • Editor
  • A custom role with the permission to create data classes.

For more information, see Required permissions.

Properties of data classes

Data classes have these standard properties that are similar to other governance artifacts.

Property or behavior | Supports? | Explanation
Must have unique names? | Yes | Data class names must be unique within a category.
Description? | Yes | Optional. Include a description to help users find this data class.
Add relationships to other data classes? | Yes | See Relationships between data classes.
Add relationships to other types of governance artifacts? | Yes | See Relationships with other types of governance artifacts.
Add relationship to asset? | Yes | See Asset relationships in catalogs.
Add custom properties? | Yes | See Custom properties and relationships for governance artifacts and catalog assets.
Add custom relationships? | Yes | See Custom properties and relationships for governance artifacts and catalog assets.
Organize in categories? | Yes | The primary category for the artifact determines who can view or modify the artifact. See Categories.
Import from a file? | Yes | See Importing governance artifacts.
Import from a Knowledge Accelerator? | No |
Export to a file? | Yes | See Exporting governance artifacts.
Managed by workflows? | Yes | See Workflows.
Specify effective start and end dates? | Yes | See Effective dates.
Assign a Steward? | Yes | See Stewards.
Add tags as properties? | Yes | See Tags.
Assign to an asset? | No |
Assign to a column in a data asset? | Yes | A data class can be added to a column in a data asset both manually and automatically.
Automated assignment to assets during profiling or enrichment? | Yes | See Managing metadata enrichment.
Predefined artifacts? | Yes | See Predefined data classes.
Add regular expression (regex) patterns? | Limited | With some custom data classes that use regular expression patterns, masking flows might fail to run or a preview of the masked data might not be available. For example, you cannot use capture groups such as ([abc]), but you can use non-capture groups such as (?:[abc]).
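
The difference between capture groups and non-capture groups can be checked before you add a pattern to a custom data class. The following is a minimal sketch that uses the Python re module purely for illustration; the regex engine that the product uses might behave differently, and the patterns and function names are hypothetical. It shows that rewriting (…) as (?:…) removes the capture groups without changing which values match.

```python
import re

# Hypothetical patterns for illustration only; not taken from a predefined data class.
WITH_CAPTURE_GROUPS = r"([A-Z]{2})-(\d{4})"
WITHOUT_CAPTURE_GROUPS = r"(?:[A-Z]{2})-(?:\d{4})"


def count_capture_groups(pattern: str) -> int:
    """Return the number of capture groups that the pattern defines."""
    return re.compile(pattern).groups


def accepts(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for value in ["AB-1234", "ab-1234", "AB-12"]:
        # Both patterns accept and reject exactly the same values.
        print(value, accepts(WITH_CAPTURE_GROUPS, value), accepts(WITHOUT_CAPTURE_GROUPS, value))
    print(count_capture_groups(WITH_CAPTURE_GROUPS), count_capture_groups(WITHOUT_CAPTURE_GROUPS))
```

A pattern for which count_capture_groups returns 0 contains only non-capture groups, which is the form that the table describes as usable.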

Relationships between data classes

You can use hierarchies to create relationships between data classes.

For the currently processed data class, you can define the following relationships to other data classes within the same category:

  • Parent data class
  • Dependent data classes

The parent data class organizes data classes in parent-child relationships. It also acts as a kind of "pre-filter" if an automatic data matching method is used: if a parent data class has a data matching method, the data matching methods of its child data classes are evaluated only if the data matching method of the parent data class returned a positive match. Defining a parent data class therefore affects the criteria that the data classification process uses to decide whether the data class is assigned to an analyzed data field.

Example:

  • US License - parent data class
  • Georgia State Driver's License - dependent data class
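
The pre-filter behavior can be pictured with a small model. This is a sketch under stated assumptions, not the product's classification engine: the DataClass type, the matches function, and both regular expressions are hypothetical, and the patterns do not reflect real license formats. The child pattern is evaluated only when the parent pattern has already matched.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataClass:
    name: str
    pattern: Optional[str] = None          # hypothetical regex matching method; None means no matching method
    parent: Optional["DataClass"] = None   # parent data class, if any


def matches(data_class: DataClass, value: str) -> bool:
    """Evaluate the parent's matching method first, then the data class's own."""
    parent = data_class.parent
    if parent is not None and parent.pattern is not None:
        if not matches(parent, value):
            return False  # parent did not match, so the child is not evaluated
    if data_class.pattern is None:
        return False
    return re.fullmatch(data_class.pattern, value) is not None


# Invented patterns for illustration only.
us_license = DataClass("US License", pattern=r"[A-Z0-9]{5,12}")
georgia = DataClass("Georgia State Driver's License", pattern=r"\d{3,9}", parent=us_license)
no_parent = DataClass("Same pattern, no parent", pattern=r"\d{3,9}")

if __name__ == "__main__":
    print(matches(georgia, "123456789"))  # parent and child both match
    print(matches(georgia, "123"))        # parent rejects the value, child is skipped
    print(matches(no_parent, "123"))      # same child pattern without a parent
```

In this model the value 123 satisfies the child pattern on its own, but it is not assigned the child data class because the parent acts as a gate.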

Relationships with other types of governance artifacts

You can add the following related artifacts:

  • Classifications
  • Business terms

The classifications and business terms that you add are suggestions for columns to which the data class is assigned.

When you add relationships between data classes and business terms, those business terms are automatically assigned to assets when their related data classes are assigned during metadata enrichment. For example, a data class Email address can be related to a business term Contact method. When the metadata enrichment process detects a column that matches the data class Email address, both the data class Email address and the business term Contact method are assigned. See Automatic term assignment.

However, a data class is not automatically assigned when one of its related business terms is assigned to a column.
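
The one-directional nature of this assignment can be sketched as follows. The Column type, the RELATED_TERMS mapping, and the two functions are hypothetical names chosen for illustration; they are not part of any IBM Knowledge Catalog API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical relationship table: data class name -> related business terms.
RELATED_TERMS: Dict[str, List[str]] = {"Email address": ["Contact method"]}


@dataclass
class Column:
    name: str
    data_class: Optional[str] = None
    terms: Set[str] = field(default_factory=set)


def assign_data_class(column: Column, data_class: str) -> None:
    """Assign a data class and, with it, every related business term."""
    column.data_class = data_class
    column.terms.update(RELATED_TERMS.get(data_class, []))


def assign_term(column: Column, term: str) -> None:
    """Assign a business term only; no data class is inferred from it."""
    column.terms.add(term)


if __name__ == "__main__":
    email = Column("EMAIL")
    assign_data_class(email, "Email address")
    print(email.data_class, email.terms)

    contact = Column("CONTACT")
    assign_term(contact, "Contact method")
    print(contact.data_class, contact.terms)
```

The second column ends up with the term Contact method but no data class, which mirrors the behavior described above.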

You can include data classes in data protection rules to identify the type of data to control.

Working with data classes

To create a data class:

  1. Open Governance > Data classes.
  2. Click New data class to create a new data class and provide the required information. Data classes can have the same name if they are in different categories.
  3. Click Save as draft. The data class in a draft state is now ready for refining as listed in the following section.
  4. When ready, click Publish or Send for approval depending on your workflow definition.

To edit an existing data class:

  1. Open a data class and click a + icon or edit next to the field you want to change.
  2. Click Save as draft. The data class in a draft state is now ready for refining.
  3. Click Publish or Send for approval depending on your workflow definition.

You can provide the following information to define your data class:

  • Add an example for the data class in the Example property. If you specify a data class named City-New, the example could be London.

  • Assign this data class to a primary category and optionally to secondary categories.

  • Edit custom properties that provide additional information in the Details section.
    Custom properties can be created as described in Custom properties and relationships for governance artifacts and catalog assets. If any custom relationship types are defined, they are also shown here. Inverse relationships show up in the other artifact after you publish the artifact where you created the relationship.

  • Use data matching to organize database columns and data file fields for review and subsequent column analysis work. For example, database columns with numeric data typically include numbers within a range of valid values.

  • Enable or disable a data class for auto-assignment. To enable a data class, you must enable data matching. A data class with a data matching method enabled is treated as an enabled data class, and a data class with the data matching method disabled is treated as a disabled data class.

  • Choose the matching priority of a data class to determine which data class candidate should become the inferred data class of a field. Only data classes with a confidence above the threshold are considered. See Priority.

  • Specify related artifacts. You can select only the business terms and classifications that have been published. The classifications and business terms that you add here are suggestions for columns to which the data class is assigned. You can assign one or more classifications at a column level.

  • Add other related content.
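
How the enabled state, the confidence threshold, and the matching priority might interact when an inferred data class is selected can be sketched as follows. This is an illustrative model only: the Candidate type, the infer_data_class function, the threshold value, the candidate names, the assumption that a larger number means a higher priority, and the tie-break on confidence are all assumptions and are not taken from the product. See Priority for the actual rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    confidence: float              # confidence that the field matches this data class
    priority: int                  # assumption: a larger value means a higher priority
    matching_enabled: bool = True  # a data class without data matching counts as disabled


def infer_data_class(candidates: List[Candidate], threshold: float) -> Optional[str]:
    """Pick the inferred data class among enabled candidates above the threshold."""
    eligible = [
        c for c in candidates
        if c.matching_enabled and c.confidence > threshold
    ]
    if not eligible:
        return None
    # Assumption: priority decides first, confidence breaks ties.
    best = max(eligible, key=lambda c: (c.priority, c.confidence))
    return best.name


if __name__ == "__main__":
    candidates = [
        Candidate("City", confidence=0.95, priority=1),
        Candidate("Person Name", confidence=0.80, priority=5),
        Candidate("Postal Code", confidence=0.99, priority=9, matching_enabled=False),
        Candidate("Country", confidence=0.60, priority=10),
    ]
    print(infer_data_class(candidates, threshold=0.75))
```

In this model Postal Code is ignored because its data matching is disabled, Country is ignored because its confidence is below the threshold, and Person Name is chosen over City because of its higher priority.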

Depending on the effective dates that are set for the data class, it is active or inactive. Active data classes can be used to specify actions, for example, classifying data automatically. Inactive data classes do not contribute to any action until they become active.
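
The active or inactive state can be thought of as a date-range check. The following sketch assumes that both bounds are inclusive and that a missing start or end date means the range is open on that side; the is_active function and these rules are assumptions for illustration. See Effective dates for the actual behavior.

```python
from datetime import date
from typing import Optional


def is_active(today: date, start: Optional[date], end: Optional[date]) -> bool:
    """Return True if today falls within the effective date range.

    Assumptions: bounds are inclusive, and None means no bound on that side.
    """
    if start is not None and today < start:
        return False
    if end is not None and today > end:
        return False
    return True


if __name__ == "__main__":
    start, end = date(2024, 1, 1), date(2024, 12, 31)
    print(is_active(date(2024, 6, 15), start, end))  # inside the range
    print(is_active(date(2025, 1, 1), start, end))   # after the end date
    print(is_active(date(2030, 1, 1), start, None))  # no end date set
```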

You cannot use draft data classes to specify data matching or for any other action. By default, the data class is published if you send it for approval.

You can also create additional data classes based on one of the reference data sets available in Knowledge Accelerators by using the data matching method. See Reference data sets in Knowledge Accelerators.
