Sample data quality definitions
A set of predefined data quality definitions is available for download from the IBM Knowledge Catalog samples GitHub repository.
The provided .zip file contains 190 predefined data quality definitions that you can import into a project in Cloud Pak for Data. Use them to quickly develop data quality rules for data domains and conditions that are commonly found in many data sources.
For easier evaluation of which definitions you want to make available to other users, create a separate project for the data quality definitions.
To add the data quality definitions, follow these instructions:
- Download the SampleDataQualityDefinitions_CPD.zip file from the IBM Knowledge Catalog samples GitHub repository.
- In Cloud Pak for Data, go to Projects > All projects and click New Project.
- Select Local file.
- Upload the SampleDataQualityDefinitions_CPD.zip file.
  The new project contains all predefined data quality definitions. You can use these definitions in these ways:
  - As is. If a predefined data quality definition already matches your needs, you can create data quality rules from that definition without further modification.
  - As templates. You can copy and modify the predefined data quality definitions, and customize them for your specific data conditions.
  - As models for development. The definitions can serve as examples of specific functions or conditions in use that can provide guidance for designing and developing rules for your environment.
- To share the data quality definitions, whether as uploaded or already customized, with other users, publish them to a catalog.
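Before you upload the archive, you might want to confirm that the download is intact and see what it contains. The following is a minimal sketch that uses only the Python standard library. The function name `summarize_archive` and the assumed download location are illustrative and not part of the product. The internal layout of the archive is not described here, so the number of file entries that the script reports does not necessarily equal the number of definitions.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path):
    """Return the names of all file entries in a .zip archive.

    Raises ValueError if the path is not a .zip file or an entry is corrupt.
    (Hypothetical helper for illustration; not part of Cloud Pak for Data.)
    """
    path = Path(zip_path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a .zip archive")
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt entry, or None.
        corrupt = archive.testzip()
        if corrupt is not None:
            raise ValueError(f"corrupt entry in archive: {corrupt}")
        # Skip directory entries and keep only files.
        return [info.filename for info in archive.infolist() if not info.is_dir()]


if __name__ == "__main__":
    # Assumed download location; adjust to where you saved the file.
    download = Path("SampleDataQualityDefinitions_CPD.zip")
    if download.exists():
        entries = summarize_archive(download)
        print(f"{len(entries)} file entries in {download.name}")
        for name in entries[:10]:
            print("  ", name)
    else:
        print(f"{download} not found; download it first")
```

If the script raises `ValueError`, download the file again before you create the project.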