Important:

IBM Cloud Pak® for Data Version 4.7 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

Managing data quality (Watson Knowledge Catalog)

Measure, monitor, and maintain the quality of your data to ensure the data meets your expectations and standards for specific use cases.

This feature is not available by default. The data quality feature must be installed with Watson Knowledge Catalog. To see whether the feature is enabled, check whether the asset types Data quality definition and Data quality rule are available when you add a new asset to a project. To see these asset types, you must also have the Access data quality asset types permission.

Good-quality data is usually defined as fit for use, free of defects, or meeting expectations and requirements. Data quality is measured against the quality dimensions Accuracy, Completeness, Consistency, Timeliness, Uniqueness, and Validity.
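As an illustration of how three of these dimensions can be turned into numbers, the following minimal sketch computes completeness, uniqueness, and validity for a small in-memory table. The sample rows, the function names, and the email pattern are assumptions made for this example; they are not how Watson Knowledge Catalog calculates its scores.

```python
import re

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "bob@example"},
    {"id": 4, "email": "cy@example.com"},
]

# Deliberately simple pattern, used only for this illustration.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, column):
    """Share of rows in which the column has a value."""
    return sum(r[column] is not None for r in rows) / len(rows)


def uniqueness(rows, column):
    """Share of distinct values among the non-missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return len(set(values)) / len(values) if values else 1.0


def validity(rows, column, predicate):
    """Share of non-missing values that satisfy the predicate."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(bool(predicate(v)) for v in values) / len(values) if values else 1.0


scores = {
    "completeness(email)": completeness(rows, "email"),
    "uniqueness(id)": uniqueness(rows, "id"),
    "validity(email)": validity(rows, "email", EMAIL.match),
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each score is a fraction between 0 and 1, which makes it possible to compare assets with each other or to compare one asset with itself over time.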

Data quality analysis provides answers to these questions:

  • How good is the overall quality of a data asset?
  • Which of the data assets has the better quality?
  • How did the quality of a data asset change over time?
  • Does the data asset meet my quality expectations?
Required services
  • Watson Knowledge Catalog
  • DataStage
Data format
  • Tables from relational and nonrelational data sources
  • Tabular: Avro, CSV, Parquet, ORC; for data assets uploaded from the local file system, CSV only
Data size
  • Any
Required permissions
  • To view data quality assets, you must have the Access data quality asset types user permission and at least the Viewer role in the project.
  • To create, edit, or delete data quality assets, you must have the Access data quality asset types user permission and the Admin or the Editor role in the project.

Data quality analysis and monitoring

Use data quality analysis and monitoring to evaluate data against specific criteria. Use these evaluation criteria repeatedly over time to see important changes in the quality of the data being validated.

After a data quality check is designed, you have these options:

  • Create a data quality definition that defines the logic of the data check irrespective of the data source. The definition contains logical variables or references that you link, or bind, to actual data (for example, a data source, table, and column, or joined tables) when you create a data quality rule that can be executed.

    After you create a data quality rule with the required bindings based on a selected data quality definition, that rule can be executed. The rule produces relevant statistics and can generate an output table, depending on the rule configuration.

  • Create an SQL-based data quality rule.
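The first option, a source-independent definition whose logical variables are bound to real columns by a rule, can be sketched conceptually as follows. The class names `Definition` and `Rule`, the `run` method, and the sample tables are hypothetical and exist only in this example; they do not reflect the product's API or rule syntax.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Definition:
    """Check logic expressed over logical variable names only (hypothetical)."""
    name: str
    check: Callable[..., bool]


@dataclass(frozen=True)
class Rule:
    """A definition plus bindings from logical variables to actual columns."""
    definition: Definition
    bindings: dict  # logical variable name -> column name

    def run(self, rows):
        failed = []
        for row in rows:
            # Resolve each logical variable to the bound column's value.
            args = {var: row[col] for var, col in self.bindings.items()}
            if not self.definition.check(**args):
                failed.append(row)
        return {
            "tested": len(rows),
            "passed": len(rows) - len(failed),
            "failed": len(failed),
            "output": failed,  # stands in for an optional output table
        }


# One definition; ISO date strings compare correctly as plain strings.
end_after_start = Definition(
    name="end_after_start",
    check=lambda start, end: end >= start,
)

orders = [
    {"order_date": "2024-01-02", "ship_date": "2024-01-05"},
    {"order_date": "2024-01-10", "ship_date": "2024-01-08"},
]
projects = [{"kickoff": "2024-03-01", "deadline": "2024-06-30"}]

# Two rules reuse the same definition with different bindings.
orders_rule = Rule(end_after_start, {"start": "order_date", "end": "ship_date"})
projects_rule = Rule(end_after_start, {"start": "kickoff", "end": "deadline"})

orders_result = orders_rule.run(orders)
projects_result = projects_rule.run(projects)
print(orders_result["passed"], "of", orders_result["tested"], "order rows passed")
```

The point of the separation is reuse: the check logic is written once, and each rule only supplies the bindings for a particular table.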

The functionality of a data quality rule can range from a simple single-column test to an evaluation of multiple columns within and across data sources.
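To illustrate a check that spans more than one column and table, the following sketch expresses a rule as a single SQL statement and runs it against an in-memory SQLite database. The tables, the column names, and the convention that the query returns the failing records are assumptions for this example; consult the product documentation for the conventions that SQL-based data quality rules actually follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, country TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'DE'), (2, 'US');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, -5.0), (12, 3, 40.0);
    """
)

# Assumed convention: the statement selects the records that FAIL the check.
# An order fails if its amount is not positive or its customer does not exist.
FAILING_ORDERS = """
SELECT o.id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE o.amount <= 0 OR c.id IS NULL
ORDER BY o.id
"""

failed_ids = [row[0] for row in conn.execute(FAILING_ORDERS)]
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
pass_rate = (total - len(failed_ids)) / total

print(f"failed order ids: {failed_ids}, pass rate: {pass_rate:.2f}")
```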

Assessing data quality

To determine whether your data is of good quality, check how well the data meets your expectations and identify anomalies in the data. Evaluating your data for quality also helps you understand the structure and content of your data.

Monitoring data quality

To ensure that important data meets your organization's quality expectations, implement data quality SLA rules that monitor your data for compliance with your standards and can provide for remediation of detected data quality issues.
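The idea behind this kind of monitoring can be sketched as a threshold check over a history of quality scores. The `SlaRule` class, its fields, and the `evaluate` function are hypothetical names chosen for this example; the sketch does not show how data quality SLA rules are configured in the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaRule:
    """Hypothetical SLA: expectations for one quality score of one asset."""
    asset: str
    min_score: float  # lowest acceptable score, between 0.0 and 1.0
    max_drop: float   # largest acceptable drop between consecutive runs


def evaluate(rule, history):
    """Return violation messages for a chronological list of scores."""
    violations = []
    if not history:
        return violations
    latest = history[-1]
    if latest < rule.min_score:
        violations.append(
            f"{rule.asset}: score {latest:.2f} is below the minimum {rule.min_score:.2f}"
        )
    if len(history) >= 2 and history[-2] - latest > rule.max_drop:
        violations.append(
            f"{rule.asset}: score dropped by {history[-2] - latest:.2f} since the last run"
        )
    return violations


rule = SlaRule(asset="CUSTOMERS", min_score=0.95, max_drop=0.05)

healthy = evaluate(rule, [0.98, 0.97, 0.96])
degraded = evaluate(rule, [0.98, 0.97, 0.90])

# In a real setup, each violation would trigger a remediation step,
# for example notifying a data steward. Here the messages are only printed.
for message in degraded:
    print(message)
```

Checking both an absolute minimum and a run-to-run drop catches two different situations: data that is steadily below expectations and data that is still acceptable but deteriorating quickly.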
