Managing data quality rules

You can create and manage data quality rules for assessing the quality of the data in your project.

If you have the required permissions, you can manage data quality rules in these ways:

You can also complete these tasks with APIs instead of the user interface. The links to these APIs are listed in the Learn more section.

Requirements and restrictions

Required services

The IBM Knowledge Catalog and IBM Knowledge Catalog Premium services are not available by default. An administrator must install one of the services. To determine whether a service is installed, open the Services catalog. If the service is installed and ready to use, the tile in the catalog shows Ready to use.

The DataStage Enterprise service is automatically installed when the data quality feature is enabled in IBM Knowledge Catalog. If you did not purchase a DataStage license, use of DataStage Enterprise is limited to creating, managing, and running data quality rules. For examples of accepted use, see Enabling optional features after installation or upgrade for IBM Knowledge Catalog.

For the following features, generative AI capabilities must be enabled in the deployment and the required models must run on GPU or in a watsonx.ai as a Service instance:

  • Generating SQL-based rules from plain text
  • Generating rule descriptions and explanations for rule expressions

This setup can be done during installation, during an upgrade, or at any later time. For more information about the deployment modes, see Preparing to install IBM Knowledge Catalog in the IBM Software Hub documentation.

Project settings

You can define these settings:

  • On the Data intelligence page, enable or disable natural language queries for the project.
  • On the Data quality page, configure a default table for rule output, and enable or disable AI-generated descriptions for data quality rules.

Some project-level settings determine aspects of how data quality rules run, for example, whether trailing spaces in string values are ignored in equality checks. These settings apply to all data quality rules in a project, but you cannot change them in the UI. To check or update them for a project, use the IBM Knowledge Catalog API methods Get project settings for data quality rules and Replace project settings for data quality rules.
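To make the trailing-space setting concrete, here is a minimal Python sketch of how an equality check behaves with and without it. The function name `values_equal` and its `ignore_trailing_spaces` flag are hypothetical: they model the effect of the project setting and are not part of the IBM Knowledge Catalog API. Use the Get project settings for data quality rules method to see the actual setting names.

```python
def values_equal(left: str, right: str, ignore_trailing_spaces: bool) -> bool:
    """Hypothetical model of a string equality check in a data quality rule."""
    if ignore_trailing_spaces:
        # Only trailing blanks are dropped; leading spaces still count.
        return left.rstrip(" ") == right.rstrip(" ")
    # Without the setting, the values must match character for character.
    return left == right


# Fixed-width CHAR columns often pad values with trailing blanks.
print(values_equal("ACME  ", "ACME", ignore_trailing_spaces=True))   # True
print(values_equal("ACME  ", "ACME", ignore_trailing_spaces=False))  # False
print(values_equal("  ACME", "ACME", ignore_trailing_spaces=True))   # False
```

Because the setting applies project-wide, the same padded value can pass an equality check in one project and fail it in another.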

Required permissions

To view data quality rules, you must have at least the Viewer role in the project.

To create, edit, or delete data quality rules, you must have the Manage data quality assets user permission and the Admin or the Editor role in the project.

To see the output when you test a definition-based rule that has no output table configured, you must have the Drill down to issue details or the Access data preview user permission.
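The permission rules above can be summarized in a small Python sketch. The role ordering, the permission strings, and the function names are illustrative assumptions, not an IBM API. The sketch also assumes that seeing test output requires project membership in addition to one of the two user permissions.

```python
from enum import IntEnum
from typing import Optional, Set


class ProjectRole(IntEnum):
    # Ordered so that a higher role includes the viewing rights of lower ones.
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3


# Hypothetical string labels for the user permissions named above.
MANAGE_DQ_ASSETS = "Manage data quality assets"
DRILL_DOWN = "Drill down to issue details"
ACCESS_PREVIEW = "Access data preview"


def can_view_rules(role: Optional[ProjectRole]) -> bool:
    # At least the Viewer role in the project (None means not a member).
    return role is not None and role >= ProjectRole.VIEWER


def can_manage_rules(role: Optional[ProjectRole], permissions: Set[str]) -> bool:
    # Create, edit, delete: the user permission AND the Admin or Editor role.
    return role in (ProjectRole.EDITOR, ProjectRole.ADMIN) and MANAGE_DQ_ASSETS in permissions


def can_see_test_output(role: Optional[ProjectRole], permissions: Set[str]) -> bool:
    # Test output of a definition-based rule with no output table configured:
    # either of the two user permissions is enough (membership is an assumption).
    return can_view_rules(role) and bool({DRILL_DOWN, ACCESS_PREVIEW} & permissions)


print(can_manage_rules(ProjectRole.EDITOR, {MANAGE_DQ_ASSETS}))  # True
print(can_manage_rules(ProjectRole.VIEWER, {MANAGE_DQ_ASSETS}))  # False
print(can_manage_rules(ProjectRole.ADMIN, set()))                # False
```

Note that the role and the user permission are checked independently: an Admin without Manage data quality assets still cannot create rules.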

Creating data quality rules

You can create different types of data quality rules:

Publishing data quality rules

You can make any data quality rule available for reuse in other projects by publishing it to a catalog, from which it can be added to any number of projects. Before you do so, make sure that the description of the data quality rule provides meaningful information. Such information helps other users pick the right data quality rule for use in their project.

To publish a data quality rule:

  1. Select the data quality rule from the list of assets and click Publish to catalog. Alternatively, you can select Publish to catalog from the asset's overflow menu.
  2. Select the catalog and fill in the asset properties.
  3. If a duplicate of the asset already exists in the catalog, you can specify how to handle it. The available choices are determined by the catalog's default setting. For more information about duplicate asset handling, see Handling duplicate assets in catalogs.
  4. Click Publish. The assets are added to the catalog, and you become their owner. Assigned business terms and classifications are listed in the Governance artifacts section. Assigned governance rules are shown in the Related items section with the Is governed by relationship.

When you publish a data quality rule to a catalog, the related objects, such as data quality definitions, data assets, and connections, are also published.

Editing data quality rules

You can edit a data quality rule to update its description, the selected data quality dimension, any business term assignments, or the rule configuration. You can also manage the list of related items.

To edit a data quality rule, open the asset and perform the appropriate actions:

  • To update the description, any explanations for the rule logic, or the data quality dimension, click the Edit icon next to the property. You can update the description and explanations manually or generate them by using AI.
  • To manage business terms, go to the Governance artifacts section of the asset and add or remove terms as needed.
  • To assign or remove governance rules, go to the Governance artifacts section of the asset, and add or remove governance rules as needed.
  • To add or remove related artifacts, assets, or columns, go to the Related items section of the asset, and add or remove items as needed.
  • To manage custom properties (if available), go to the Details section.
  • To update the rule configuration, click Edit rule. For all types of rules, you can change the output type. Depending on your new selection, any configured output settings are reset or overwritten; rule output that was written before the change remains untouched. For SQL-based rules, you can change the SQL statement. For definition-based rules, you can change which data quality definitions are used and the sampling settings. You cannot change how bindings are managed.

For data quality rules that bind data directly, a Validates data quality of relationship with each bound column is added to the Related items section. You can manually add columns with this type of relationship to SQL-based rules and to rules with externally managed bindings. SQL-based rules then contribute to the data quality scores of the linked columns. Rules with external bindings contribute to the data quality scores of the columns that are linked with the Validates data quality of relationship only if no columns are configured for score reporting in the rule subflow stage.
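The following Python sketch models which columns a rule contributes data quality scores to, depending on how it binds data. The `Rule` class, its field names, the binding-mode strings, and the column names are hypothetical. Two behaviors are assumptions not stated above: that a rule with direct bindings scores its bound columns, and that a rule with external bindings scores the columns configured for score reporting when any are configured.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    # "direct": definition-based rule that binds columns directly
    # "external": definition-based rule with externally managed bindings
    # "sql": SQL-based rule
    binding_mode: str
    bound_columns: FrozenSet[str] = frozenset()
    # Columns manually linked with the "Validates data quality of" relationship.
    linked_columns: FrozenSet[str] = frozenset()
    # Columns configured for score reporting in the rule subflow stage.
    score_reporting_columns: FrozenSet[str] = frozenset()


def scored_columns(rule: Rule) -> FrozenSet[str]:
    if rule.binding_mode == "direct":
        # Assumption: the automatically added relationships drive scoring.
        return rule.bound_columns
    if rule.binding_mode == "sql":
        return rule.linked_columns
    if rule.binding_mode == "external":
        if rule.score_reporting_columns:
            # Assumption: the subflow configuration takes precedence.
            return rule.score_reporting_columns
        # Linked columns count only when no reporting columns are configured.
        return rule.linked_columns
    raise ValueError(f"unknown binding mode: {rule.binding_mode}")


ext = Rule("external", linked_columns=frozenset({"CUSTOMER.EMAIL"}))
print(sorted(scored_columns(ext)))  # ['CUSTOMER.EMAIL']

ext_reporting = Rule(
    "external",
    linked_columns=frozenset({"CUSTOMER.EMAIL"}),
    score_reporting_columns=frozenset({"CUSTOMER.PHONE"}),
)
print(sorted(scored_columns(ext_reporting)))  # ['CUSTOMER.PHONE']
```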

When you view a data quality rule, you can click the Info icon to view more details such as output settings or related assets.

Deleting data quality rules

You can delete a data quality rule in one of these ways:

  • In the project, select the data quality rule and click Delete.
  • Open the data quality rule and select Delete from the overflow menu next to the name of the data quality rule.

When you delete a data quality rule, its run history and any associated DataStage flow and jobs are also deleted from the project. Output tables in the project and in the database are kept. The issues that were reported by this data quality rule are removed, and the data quality and dimension scores are recalculated.

Learn more