Data quality dimensions
Each data quality dimension describes a measurable characteristic of data and helps you define data quality requirements. Use data quality dimensions to determine the expected results of a data quality assessment, whether that is an initial assessment or ongoing monitoring.
The state that you want your data to be in can usually be described as fit for use, free of defects, corresponding to specification, or meeting expectations and requirements. When you measure data quality, you compare the actual state of your data to this wanted state. The standards, expectations, and requirements that are important to your business processes are expressed as characteristics, or dimensions, of the data.
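The comparison of actual state against wanted state can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Expectation` class, the `find_gaps` function, and the idea of expressing each dimension as a score between 0.0 and 1.0 are hypothetical and are not part of any IBM API.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    """Hypothetical wanted state for one dimension."""
    dimension: str    # for example "Completeness"
    min_score: float  # lowest acceptable score, as a fraction from 0.0 to 1.0


def find_gaps(measured: dict[str, float], expectations: list[Expectation]) -> list[str]:
    """Return the dimensions whose measured score falls short of the wanted state."""
    gaps = []
    for exp in expectations:
        score = measured.get(exp.dimension)
        # A dimension that was never measured is also reported as a gap.
        if score is None or score < exp.min_score:
            gaps.append(exp.dimension)
    return gaps


# Actual state: scores produced by some earlier assessment (made-up numbers).
measured = {"Completeness": 0.97, "Uniqueness": 1.0}

# Wanted state: thresholds that the business process requires (made-up numbers).
expectations = [
    Expectation("Completeness", 0.99),
    Expectation("Uniqueness", 1.0),
    Expectation("Validity", 0.95),
]

print(find_gaps(measured, expectations))  # ['Completeness', 'Validity']
```

Treating an unmeasured dimension as a gap is a design choice in this sketch; a real assessment might instead report it separately as "not evaluated".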
The Data Management Association (DAMA) International published a paper that describes six core dimensions of data quality: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, and Validity.
In addition to these core dimensions, IBM watsonx.data intelligence provides the dimensions Conformity, Coverage, and Homogeneity.
The following table describes the data quality dimensions and lists automated data quality checks that can identify issues associated with a specific dimension. These checks can be the data quality checks in metadata enrichment or data quality checks that are part of a data contract. Data contracts must conform to the Open Data Contract Standard (ODCS). In addition, these dimensions can be evaluated by setting up and running individual data quality rules.
| Dimension | Description | Types of data quality checks |
|---|---|---|
| Accuracy | Data values are as close as possible to real values. | Data quality checks as part of data contract testing |
| Completeness | All required data values are present. | Completeness check; Data quality checks as part of data contract testing |
| Conformity | Data adheres to the defined standards, formats, and permissible values. | Data quality checks as part of data contract testing |
| Consistency | Data values within a column comply with a rule. | Capitalization style check; Missing values representation check; Referential integrity check; Suspect values check; Data quality checks as part of data contract testing |
| Coverage | Data represents the expected data set, typically measured by record counts or data completeness. | Data quality checks as part of data contract testing |
| Homogeneity | Data within a data asset is uniform and consistent over time. All data points share similar characteristics, formats, or structures. | Historical stability |
| Timeliness | Data represents the reality from a required point in time. | Data quality checks as part of data contract testing |
| Uniqueness | Distinct values appear only once. | Uniqueness check; Data quality checks as part of data contract testing |
| Validity | Data conforms to the format, type, or range of its definition. | Data class check; Data type check; Format check; Length check; Possible values check; Range check; Regex check |
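
To make a few of the dimensions in the table concrete, the following Python sketch computes simple scores for Completeness, Uniqueness, and Validity on a single column. It is a minimal illustration, not the way the product's checks are implemented: the function names, the treatment of `None` and empty strings as missing values, and the scoring formulas are all assumptions made for this example.

```python
import re


def _present(values):
    """Values that are not missing (assumption: None and "" count as missing)."""
    return [v for v in values if v is not None and v != ""]


def completeness(values):
    """Fraction of all values that are present."""
    if not values:
        return 1.0
    return len(_present(values)) / len(values)


def uniqueness(values):
    """Fraction of present values that occur exactly once in the column."""
    present = _present(values)
    if not present:
        return 1.0
    counts = {}
    for v in present:
        counts[v] = counts.get(v, 0) + 1
    return sum(1 for v in present if counts[v] == 1) / len(present)


def validity(values, pattern):
    """Fraction of present values that fully match a regular expression."""
    present = _present(values)
    if not present:
        return 1.0
    regex = re.compile(pattern)
    return sum(1 for v in present if regex.fullmatch(str(v))) / len(present)


# Made-up sample column of five-digit postal codes.
postal_codes = ["10115", "80331", None, "10115", "ABC12", ""]

print(completeness(postal_codes))        # 4 of 6 values are present
print(uniqueness(postal_codes))          # 0.5
print(validity(postal_codes, r"\d{5}"))  # 0.75
```

Here Uniqueness and Validity are scored only over present values, so that a missing value counts against Completeness alone. Other conventions are possible, and an actual data quality check might weigh missing values differently.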
You can create your own data quality dimensions by using the "Create a data quality dimension" method of the IBM Knowledge Catalog API.