Ensuring data quality through data contracts

A data contract is a formal agreement between a data producer and a data consumer that defines, among other things, the expected structure, schema, and quality of the data. It helps ensure that the data meets the agreed requirements and aligns with business definitions.

Data contracts for use in IBM watsonx.data intelligence must be written in YAML or JSON format and must conform to the Open Data Contract Standard.
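For orientation, the following Python sketch builds a minimal contract as a dictionary and serializes it to JSON. The field names are based on ODCS v3 (apiVersion, kind, id, version, status, schema, and a quality rule of type sql); the table, columns, contract ID, and query are hypothetical. A contract that watsonx.data intelligence can enforce probably needs more information, such as where the data is stored, so check the ODCS specification and the product documentation before you rely on any field.

```python
import json

# A minimal, illustrative ODCS v3-style contract. The table "orders", its
# columns, the contract ID, and the SQL query are hypothetical examples.
contract = {
    "apiVersion": "v3.0.0",
    "kind": "DataContract",
    "id": "orders-contract",  # hypothetical contract identifier
    "version": "1.0.0",
    "status": "active",
    "schema": [
        {
            "name": "orders",
            "properties": [
                {"name": "order_id", "logicalType": "integer", "required": True},
                {"name": "amount", "logicalType": "number"},
            ],
            "quality": [
                {
                    # SQL rule: the query counts violating rows, which must be zero.
                    "type": "sql",
                    "description": "Order amounts must not be negative",
                    "query": "SELECT COUNT(*) FROM orders WHERE amount < 0",
                    "mustBe": 0,
                }
            ],
        }
    ],
}

# Serialize to JSON text, for example to commit the contract file to Git.
contract_json = json.dumps(contract, indent=2)
print(contract_json)
```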

Tech preview: This is a technology preview and is not yet supported for use in production environments.

Required permissions
You must have the Admin or Editor role in the project, and you must have the Manage data quality assets and Execute data quality rules user permissions.

Data contracts can be fully managed and enforced through the Data Product Hub UI. Alternatively, you can manage and version contract files in YAML or JSON format in an external source control system such as Git, and then run data quality validations directly by using the Data Contract Enforcement API.

The Data Contract Enforcement API provides methods that you can use for these tasks:

  1. Upload a new data contract or update an existing one.
  2. Run data quality validation tests against data based on that contract.
  3. Retrieve the test results.

The API calls require one or more of these parameters:

project_id
The ID of the project that you want to use as the workspace for your validations.
data_contract_id
The ID of the data contract against which you want to validate your data. You can retrieve the ID from the id field in the response to the call that creates the data contract. Alternatively, you can submit a GET /data_quality/v4/projects/{project_id}/data_contracts call to list all data contracts in the project.
Important: Currently, only rules in SQL format are validated for data contracts.
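Most endpoints in this article share the path /data_quality/v4/projects/{project_id}/data_contracts. As a minimal sketch, the following Python helper fills in the project_id and data_contract_id path parameters and percent-encodes them. The host name is a hypothetical placeholder, and the helper is an illustration, not part of any IBM SDK.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical placeholder host; replace it with the API URL of your deployment.
BASE_URL = "https://example.invalid"


def contract_url(project_id: str,
                 data_contract_id: Optional[str] = None,
                 suffix: str = "") -> str:
    """Build a data contract endpoint URL from the path parameters.

    suffix is an optional trailing segment after the contract ID,
    such as "test" or "test_results".
    """
    if suffix and data_contract_id is None:
        raise ValueError("suffix requires a data_contract_id")
    # Percent-encode the IDs so that unexpected characters cannot alter the path.
    url = (f"{BASE_URL}/data_quality/v4/projects/"
           f"{quote(project_id, safe='')}/data_contracts")
    if data_contract_id is not None:
        url += f"/{quote(data_contract_id, safe='')}"
    if suffix:
        url += f"/{suffix}"
    return url


print(contract_url("my-project"))
print(contract_url("my-project", "my-contract", "test_results"))
```

The optional validation endpoint, data_contracts_validation, sits directly under the project rather than under data_contracts, so this helper does not cover it.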

Enforcing a new data contract

You want to enforce a data contract that does not yet exist in the project:

  1. Create the contract in YAML or JSON format and store it in your source repository, for example, in Git.

  2. Optional: Validate the contract against the syntax that is defined in the Open Data Contract Standard (ODCS) before you create the contract in the project:

    POST /data_quality/v4/projects/{project_id}/data_contracts_validation
    
  3. Create the contract in the project:

    POST /data_quality/v4/projects/{project_id}/data_contracts
    
  4. Run the test:

    POST /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test
    

    The data assets and SQL rules that are defined in the data contract are created in the project, and the rules are run. To remove the data quality rules from the project after the test is complete, set the retain_dq_objects parameter of the call to false.

  5. Retrieve the test results:

    GET /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test_results
    
  6. Review the results to determine whether the data met the defined quality standards.

Enforcing an updated data contract

You updated the data contract and need to retest your data, for example, because the data that is subject to the contract or the data quality requirements changed.

  1. Update the contract in the source repository, for example, in Git.

  2. Update the contract in the project:

    PUT /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}
    
  3. Run the test:

    POST /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test
    

    Depending on the changes in the data contract, data assets and SQL rules are updated or added, and the rules are run.

  4. Retrieve the test results:

    GET /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test_results
    
  5. Review the results to determine whether the data met the defined quality standards.
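In the update scenario, the contract ID is already known, so the three calls can be prepared up front. This Python sketch reads the edited contract as JSON text, for example the content of the file in your Git checkout, and returns the update, test, and results requests in the order in which to send them. The host, the bearer-token header, and the JSON body format are assumptions.

```python
import json
import urllib.request
from typing import List
from urllib.parse import quote

BASE_URL = "https://example.invalid"  # hypothetical placeholder host


def update_and_retest(project_id: str, data_contract_id: str,
                      contract_text: str, token: str) -> List[urllib.request.Request]:
    """Return the requests for steps 2 to 4, to be sent in this order."""
    # Fail early if the edited contract file is not valid JSON.
    contract = json.loads(contract_text)
    base = (f"{BASE_URL}/data_quality/v4/projects/{quote(project_id, safe='')}"
            f"/data_contracts/{quote(data_contract_id, safe='')}")
    # Assumption: bearer-token authentication and JSON request bodies.
    auth = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    return [
        # Step 2: replace the contract in the project.
        urllib.request.Request(base, data=json.dumps(contract).encode("utf-8"),
                               headers={**auth, "Content-Type": "application/json"},
                               method="PUT"),
        # Step 3: rerun the test.
        urllib.request.Request(f"{base}/test", headers=auth, method="POST"),
        # Step 4: retrieve the test results.
        urllib.request.Request(f"{base}/test_results", headers=auth, method="GET"),
    ]
```

Send the requests one at a time with urllib.request.urlopen, which raises urllib.error.HTTPError for error responses, so a failed update stops the sequence before the test runs.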

Finding a specific data contract, and testing or retesting as required

You want to find out whether a specific contract exists in your project and has already been enforced, so that you can decide whether to rerun the data quality tests or create a new contract.

  1. Identify the contract that you want to look for in the source repository. Note its name, ID, or any other metadata that you can match against the contracts in the project.

  2. List the data contracts that exist in the project:

    GET /data_quality/v4/projects/{project_id}/data_contracts
    
  3. If the contract that you are looking for exists in the project, check for existing results:

    GET /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test_results
    
    1. If results are available and are still current, no further action is needed.

    2. If no results are available, or if the existing results are outdated (for example, because they stem from an older version of the contract or from older data), run the test:

      POST /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test
      
    3. Retrieve the results:

      GET /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test_results
    
  4. If the contract that you are looking for does not exist in the project, follow the steps in Enforcing a new data contract.
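Steps 3 and 4 amount to a small decision. The following Python sketch expresses that decision, assuming that you have already matched the contract (by name, ID, or other metadata) and obtained three timestamps: when the contract last changed (for example, from the Git history), when the data last changed, and when the last test ran. How you obtain these values, including which fields of the API responses hold them, is an assumption left open by the sketch.

```python
from datetime import datetime
from enum import Enum
from typing import Iterable, Optional


class Action(Enum):
    CREATE_AND_TEST = "follow the steps for enforcing a new data contract"
    RUN_TEST = "run the test, then retrieve the results"
    NO_ACTION = "no further action is needed"


def next_action(contract_key: str,
                project_keys: Iterable[str],
                last_test_time: Optional[datetime],
                contract_changed: datetime,
                data_changed: datetime) -> Action:
    """Decide what to do for one contract; all timestamps are supplied by the caller."""
    if contract_key not in set(project_keys):
        return Action.CREATE_AND_TEST  # the contract does not exist in the project
    if last_test_time is None:
        return Action.RUN_TEST  # no results are available
    if last_test_time < max(contract_changed, data_changed):
        return Action.RUN_TEST  # results stem from an older contract or older data
    return Action.NO_ACTION  # results are still current


# Example: the last test ran after both the contract and the data changed.
jan, mar = datetime(2024, 1, 1), datetime(2024, 3, 1)
print(next_action("orders", ["orders", "customers"], mar, jan, jan))
```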

Deleting contracts, rules, or results

You can delete data contracts and any test results from a project:

  • Delete one or more contracts:

    DELETE /data_quality/v4/projects/{project_id}/data_contracts
    

    Provide the contract IDs as a comma-separated list.

  • Delete the test results for a specific contract:

    DELETE /data_quality/v4/projects/{project_id}/data_contracts/{data_contract_id}/test_results
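A minimal Python sketch of both delete calls follows. The article states only that the contract IDs are passed as a comma-separated list; the query parameter name ids used here is hypothetical, as are the host and the bearer-token header, so check the API reference before you adapt it.

```python
import urllib.request
from typing import Iterable
from urllib.parse import quote, urlencode

BASE_URL = "https://example.invalid"  # hypothetical placeholder host


def _contracts_url(project_id: str) -> str:
    return f"{BASE_URL}/data_quality/v4/projects/{quote(project_id, safe='')}/data_contracts"


def delete_contracts_request(project_id: str, contract_ids: Iterable[str],
                             token: str) -> urllib.request.Request:
    ids = [i for i in contract_ids if i]  # skip empty entries
    if not ids:
        raise ValueError("at least one contract ID is required")
    # Assumption: the comma-separated IDs go into a query parameter whose
    # name ("ids") is hypothetical. safe="," keeps the commas unencoded.
    query = urlencode({"ids": ",".join(ids)}, safe=",")
    return urllib.request.Request(f"{_contracts_url(project_id)}?{query}",
                                  headers={"Authorization": f"Bearer {token}"},
                                  method="DELETE")


def delete_results_request(project_id: str, data_contract_id: str,
                           token: str) -> urllib.request.Request:
    url = f"{_contracts_url(project_id)}/{quote(data_contract_id, safe='')}/test_results"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"},
                                  method="DELETE")
```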