Sample data quality definitions

A set of predefined data quality definitions is available for download from the IBM Knowledge Catalog samples GitHub repository.

The provided .zip file contains 190 predefined data quality definitions that you can import into a project in Cloud Pak for Data. Use them to quickly develop data quality rules for data domains and conditions that are common to many data sources.

You can add the definitions to an existing project. However, to make it easier to evaluate which definitions to make available to other users, consider creating a separate project for the data quality definitions.

To add the data quality definitions to an existing project, follow these instructions:

  1. Download the SampleDataQualityDefinitions_CPD.zip file from the IBM Knowledge Catalog samples GitHub repository:

    https://github.com/IBM/knowledge-catalog-samples/blob/main/data-quality/SampleDataQualityDefinitions_CPD.zip

  2. Go to Projects > All projects and open the project to which you want to add the definitions.

  3. From the toolbar, select Import project.

  4. Upload the SampleDataQualityDefinitions_CPD.zip file.

    All predefined data quality definitions are added to the project. You can use these definitions in these ways:

    • As is. If a predefined data quality definition already matches your needs, you can create data quality rules from that definition without further modification.
    • As templates. You can copy and modify the predefined data quality definitions, and customize them for your specific data conditions.
    • As models for development. The definitions show specific functions and conditions in use, and can guide you in designing and developing rules for your environment.
  5. To share the data quality definitions with other users, either as uploaded or after you customize them, publish them to a catalog.
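
If you prefer to script the download in step 1, the following Python sketch shows one possible approach. It is not part of the product and makes some assumptions: that replacing /blob/ with /raw/ in the GitHub page URL yields the URL that serves the file itself (a general GitHub convention), and that you only want to list the archive entries before you upload the file. The function names are hypothetical, and the sketch makes no assumption about the internal layout of the .zip file.

```python
import io
import urllib.request
import zipfile

# Page URL from step 1. It points to the GitHub file view, not the file itself.
BLOB_URL = (
    "https://github.com/IBM/knowledge-catalog-samples/blob/main/"
    "data-quality/SampleDataQualityDefinitions_CPD.zip"
)


def blob_to_raw_url(blob_url: str) -> str:
    """Convert a GitHub 'blob' page URL to a direct-download URL.

    Assumption: GitHub serves the file content when /blob/ is replaced
    with /raw/ in the path.
    """
    return blob_url.replace("/blob/", "/raw/", 1)


def download(url: str, timeout: float = 60.0) -> bytes:
    """Fetch the URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_archive_entries(zip_bytes: bytes) -> list[str]:
    """Return the entry names of a .zip archive after a basic integrity check."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        corrupt_entry = archive.testzip()  # None when all CRC checks pass
        if corrupt_entry is not None:
            raise ValueError(f"Corrupt archive entry: {corrupt_entry}")
        return archive.namelist()


def main() -> None:
    """Download the sample archive, save it locally, and print its entries."""
    data = download(blob_to_raw_url(BLOB_URL))
    with open("SampleDataQualityDefinitions_CPD.zip", "wb") as out_file:
        out_file.write(data)
    for name in list_archive_entries(data):
        print(name)
```

Calling main() saves the file to the current directory. You can then upload that file in step 4 as described. The integrity check only confirms that the archive is readable; it does not validate the definitions themselves.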