Data quality analysis results

When a data asset is successfully analyzed, the results are displayed at both the data asset level and column level. Analysis results include information on the content and structure of your data asset, and metrics about the overall quality of your data.

Data quality analysis results are available on the asset's Data quality page in a project or a catalog. You can also access them from within a metadata enrichment by clicking the quality score of an asset or a column.

Required permissions
  • To view the analysis results, you must be a collaborator in the workspace.
  • To change the way that the scores are calculated, you must have the Admin or Editor role in the project.
  • To create new data quality checks, you must have the Admin or Editor role in the project and the Manage data quality assets permission.
  • To view the data that caused data quality issues (the output table) from the rule run history or the Data quality page, you must have the Drill down to issue details permission. However, the data asset that is created in the project for the output table is accessible by anyone who can access the connection. To limit access to this data asset, set up the connection to the data source where the output table is stored with personal credentials.

Data quality information becomes available in a project or a catalog as follows:

  • In a project, after the first data quality check is run on the data asset in one of these ways or when a connected IBM Match 360 entity data asset is added:

    • Data quality analysis runs on the asset as part of metadata enrichment.
    • A data quality rule runs on the asset.
  • In a catalog:

    • A data asset with data quality information is published to the catalog.

The quality scores are recalculated and the data is refreshed as follows:

  • In a project, every time a data quality check is run on the asset or when an IBM Match 360 entity data asset is updated:

    • A data quality analysis runs in the context of metadata enrichment.
    • A data quality rule is run on the asset.
    • The IBM Match 360 matching algorithm is changed or potential match issues are remediated.
  • In a catalog:

    • An asset is published from a project.

The date and time of the last update of the quality scores is shown, so you can see at a glance how current they are.

Data quality information for an asset

When you access an asset's data quality information, you see the overall data quality scores and the results of the data quality checks that were run on the asset. In addition, you have access to the analysis results for the asset columns.

Overall scores at asset level

A graphical representation of the quality scores gives you an at-a-glance view of the asset's overall quality and of the level of quality regarding the dimensions that are applied to the asset. For these scores, trend information shows how the overall quality or the quality score for a dimension changed over time. You can select whether the trend is shown for a period of 30, 90, or 180 days. A dimension does not show trend information if no check has contributed to that dimension so far.
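The following Python sketch illustrates how the trend for one dimension could be derived from a score history. The history format, the `trend` function, and its parameters are hypothetical. Only the 30-, 90-, and 180-day periods and the rule that a dimension without earlier contributions shows no trend come from the description above.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical history format: one (run date, dimension, score) entry per check run.
History = list[tuple[date, str, float]]

TREND_PERIODS = (30, 90, 180)  # periods that you can select for the trend


def trend(history: History, dimension: str, today: date,
          days: int) -> Optional[list[tuple[date, float]]]:
    """Return the dated scores of one dimension inside the selected period.

    Returns None if no check ever contributed to the dimension, in which
    case no trend information is shown.
    """
    if days not in TREND_PERIODS:
        raise ValueError(f"period must be one of {TREND_PERIODS}")
    # All scores ever recorded for this dimension.
    points = [(d, s) for d, dim, s in history if dim == dimension]
    if not points:
        return None
    # Keep only the points inside the selected period, oldest first.
    start = today - timedelta(days=days)
    return sorted(p for p in points if p[0] >= start)
```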

The overall asset score is the weighted average of the scores that are provided by the asset columns. Each dimension score is the weighted average of the corresponding dimension scores that are provided by the individual checks.
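As an illustration of the weighted average, the following sketch computes an asset score from column scores. The `ColumnScore` type and its `weight` field are hypothetical: this topic does not specify which weights are used, so the sketch simply takes them as input. Columns whose Contributes to overall score setting is turned off are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnScore:
    name: str
    score: float              # the column's overall quality score, 0-100
    weight: float             # hypothetical weight; see Data quality scores for the real rules
    contributes: bool = True  # the column's "Contributes to overall score" setting


def weighted_average(pairs: list[tuple[float, float]]) -> Optional[float]:
    """Weighted average of (score, weight) pairs, or None if nothing contributes."""
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(s * w for s, w in pairs) / total_weight


def asset_score(columns: list[ColumnScore]) -> Optional[float]:
    # Only columns that contribute to the overall score are included.
    return weighted_average([(c.score, c.weight) for c in columns if c.contributes])
```

For example, columns that score 100 (weight 1) and 80 (weight 3) yield an asset score of 85. A third column with the setting turned off does not change that result, whatever its score.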

The overall and dimension scores and the trend information are recalculated for these changes:

  • A data quality check is run on the asset.
  • The Contributes to overall score setting for a check or a column is changed.
  • A data quality rule that was applied to the asset is deleted.
  • The asset profile is deleted on the asset's Profile page.
  • The asset is updated in IBM Match 360.

For more information, see Data quality scores.

Data quality check results at asset level

Here, you can see which checks were run on the asset and what the results were. The list is sorted by date with the most recent checks at the top.

Name & logic

The name of a data quality rule and the name of the data quality definition that contains the rule logic, or the name of a predefined data quality check.

Data quality rules with externally managed bindings or SQL-based data quality rules contribute to the data quality scores of an asset if that asset is added as a related item to the corresponding rule with the Validates the data quality of relationship. The same score and issues are reported for all assets and columns that are linked with this relationship type.

The predefined data quality checks are run on the entire asset. However, not all of them return results for all columns. For example, the Suspect values check identifies outliers in numeric columns or string columns with numeric data but does not return results for string columns with string values. Thus, the list of predefined data quality checks might be shorter for individual columns.
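The distinction between string columns with numeric data and string columns with string values can be sketched as follows. The data type names and the parsing rule are assumptions for illustration. The product's actual detection logic and outlier algorithm are not described here, so the sketch only decides whether the Suspect values check would return results for a column, not which values are outliers.

```python
# Hypothetical type names; the type vocabulary of your data source may differ.
NUMERIC_TYPES = {"integer", "bigint", "decimal", "double"}


def has_numeric_data(values: list[str]) -> bool:
    """True if there is at least one non-empty value and every non-empty value parses as a number."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return False
    for v in non_empty:
        try:
            float(v)
        except ValueError:
            return False
    return True


def suspect_values_returns_results(data_type: str, values: list[str]) -> bool:
    # Numeric columns qualify directly; string columns only if they hold numeric data.
    if data_type in NUMERIC_TYPES:
        return True
    return data_type == "string" and has_numeric_data(values)
```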

In a project, you can click the name of a data quality check for details. For predefined data quality checks, you can view information about the findings: the columns that have issues and the number and percentage of values in those columns that were identified as quality issues. If an output table is set up for these issues, a user with the appropriate permissions can view the actual rows where data causes quality issues. For data quality rules, you can see the general rule configuration and have access to the rule's output table if one is configured. If you want to update the rule configuration and have the required permissions, you can go directly to the rule asset by clicking View data quality rule.

For connected IBM Match 360 entity data assets, a check named Potential matches is listed here with the type Matching. No further details are provided for this type of check.

Type

The type of check, which can be Data quality rule, Matching, or Profiling. Matching is shown for IBM Match 360 results. Profiling is shown for predefined data quality checks that were run in the context of metadata enrichment. See Predefined data quality checks.

Dimension

The data quality dimension to which this check is tied. The predefined data quality checks that are run during profiling or as part of metadata enrichment have default dimensions assigned. For data quality rules, you assign dimensions as required.

For connected IBM Match 360 entity data assets, the dimension Entity confidence is shown.

If no dimension is set, the field shows None. For more information, see Data quality dimensions and Data quality scores.

Focus & percentage of data with issues

Depending on the type of check, the focus can be one or more columns or an entire table. For the predefined data quality checks, the focus is always the entire table. Percentage of data with issues shows how much of the data doesn't meet the quality criteria defined in the check.

Data checked & issues found

The number of records that were checked and the number of quality issues that were found. These issues can be in the same or in different records.
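Because several issues can fall in the same record, the number of issues found can be larger than the number of records that are affected. The following sketch, which uses a hypothetical `Issue` type, counts both values from a list of individual findings.

```python
from typing import NamedTuple


class Issue(NamedTuple):
    record_id: int  # hypothetical identifier of the record that contains the issue
    column: str     # column in which the issue was found


def summarize(records_checked: int, issues: list[Issue]) -> dict[str, int]:
    """Count the issues and the distinct records in which they occur."""
    return {
        "records_checked": records_checked,
        "issues_found": len(issues),
        # Several issues in the same record count only once here.
        "records_with_issues": len({issue.record_id for issue in issues}),
    }
```

For example, two issues in record 1 (in the email and phone columns) and one issue in record 7 give three issues found but only two records with issues.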

Sampling

The kind of sampling that was applied in the last run of the check. For data quality rules, this column shows a dash (—) if no sampling is configured. For matching, the column always shows a dash. For the predefined data quality checks, the column always has a value.

Score

The quality score that the check returned for the asset.

Contributes to overall score

This setting determines whether this specific quality score is considered in the calculation of the overall scores. You can change this setting only in a project. You must be a project administrator or editor to do so. In a catalog, the setting is locked. See Data quality scores.

Last checked

The date and time when the check was last run.

You can switch to the column overview by clicking Columns.

In a project, you can also create new data quality definitions or data quality rules if the data quality component of IBM Knowledge Catalog is enabled. To do so, you must be a project administrator or editor and have the Manage data quality assets permission.

Columns overview

View data quality information for the individual columns:

  • The column name.
  • The column's overall data quality score.
  • The column's quality score for any of the dimensions that are applicable to the asset. A dash (—) is shown if none of the checks that were applied to that column contributed to the dimension.
  • The number of checks that were run on a column.
  • Whether the column's data quality score is considered in the calculation of the overall asset score and the dimension scores. As a project administrator or editor, you can change that setting.
  • When the column was last checked.

You can then drill down into the data quality details for each column. See Data quality information for a column.

You can go back to the list of data quality checks by clicking Checks.

Data quality information for a column

When you access the data quality information for a column, you see a section that shows the overall data quality scores, and you have access to the results of the data quality checks that were run on the column. Matching results are not included at column level.

In addition to the quality information, you can see which data class and business terms are assigned to the column.

Overall scores at column level

A graphical representation of the quality scores gives you an at-a-glance view of a column's overall quality and of the level of quality with regard to the dimensions that are applied to the column. For these scores, trend information shows how the overall quality or the quality score for a dimension changed over time. You can select whether the trend is shown for a period of 30, 90, or 180 days.

The overall score for the column or a dimension is the weighted average of the scores provided by the data quality checks that were applied to the column.
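The following sketch shows one way to compute a column's overall and dimension scores from its check results. The `CheckResult` type and its `weight` field are hypothetical, and so is the treatment of checks without a dimension (they count toward the overall score only). A dimension with no contributing checks yields `None`, which corresponds to the dash that is shown in the columns overview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    name: str
    dimension: Optional[str]  # None if no dimension is assigned
    score: float              # quality score returned by the check, 0-100
    weight: float = 1.0       # hypothetical weight, not defined in this topic
    contributes: bool = True  # the check's "Contributes to overall score" setting


def _weighted_average(pairs: list[tuple[float, float]]) -> Optional[float]:
    total = sum(w for _, w in pairs)
    return None if total == 0 else sum(s * w for s, w in pairs) / total


def column_scores(checks: list[CheckResult], dimensions: list[str]
                  ) -> tuple[Optional[float], dict[str, Optional[float]]]:
    # Only checks with the setting turned on are taken into account.
    used = [c for c in checks if c.contributes]
    # Assumption: checks without a dimension still count toward the overall score.
    overall = _weighted_average([(c.score, c.weight) for c in used])
    per_dimension = {
        dim: _weighted_average([(c.score, c.weight) for c in used if c.dimension == dim])
        for dim in dimensions
    }
    return overall, per_dimension
```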

In a project, the overall and dimension scores and the trend information are recalculated every time a data quality check that affects the column is run on the asset. The score is also recalculated when you change the Contributes to overall score setting for a check that affects the column, or when data quality rules or the asset profile are deleted.

In a catalog, the overall and dimension scores and the trend information are updated when an asset is published from a project.

For more information, see Data quality scores.

Data quality check results at column level

Here, you can see which checks were applied to the column and what the results were. The list is sorted by date with the most recent checks at the top.

Name & logic

The name of a data quality rule and the name of the data quality definition that contains the rule logic, or the name of a predefined data quality check.

Data quality rules with externally managed bindings or SQL-based data quality rules contribute to the data quality scores of a column if that column is added as a related item to the corresponding rule with the Validates the data quality of relationship. The same score and issues are reported for all assets and columns that are linked with this relationship type.

In a project, you can click the name of a data quality rule to see the general rule configuration and the rule's output table if one is configured. If you want to update the rule configuration and have the required permissions, you can directly go to the asset by clicking View data quality rule.

Type

The type of check, which can be Data quality rule or Profiling. Profiling is shown for predefined data quality checks that were run in the context of metadata enrichment. See Predefined data quality checks.

Dimension

The data quality dimension to which this check is tied. The predefined data quality checks that are run during profiling or as part of metadata enrichment have default dimensions assigned. For data quality rules, you can assign dimensions as required. If no dimension is set, the field shows Other. For more information, see Data quality dimensions and Data quality scores.

Percentage of data with issues

This value shows how much of the data doesn't meet the quality criteria defined in the check.

Data checked & issues found

The number of records that were checked and the number of quality issues that were found. These issues can be in the same or in different records.

Sampling

The kind of sampling that was applied in the last run of the check. For data quality rules, this column shows a dash (—) if no sampling is configured. For the predefined data quality checks, the column always has a value.

Score

The quality score that the check returned for the column.

Contributes to overall score

This setting determines whether this specific quality score is considered in the calculation of the overall scores. You can change this setting only in a project. You must be a project administrator or editor to do so. In a catalog, the setting is locked. See Data quality scores.

Last checked

The date and time when the check was last run.

Watson Data API for data quality

You can use a collection of REST APIs to generate and retrieve data quality information.

  • Data Quality Assets
    Data quality assets are data assets that are subject to data quality checks. Sample API: Get data quality assets
  • Data Quality Checks
    Data quality checks can be, for example, data quality rules or checks that are run as part of metadata enrichment. Sample API: Get data quality checks
  • Data Quality Dimensions
    A set of standard data quality dimensions is provided with the product, and you can also create custom dimensions. Sample API: Get a list of data quality dimensions
  • Data Quality Issues
    Data quality issues are the problems that the data quality checks found for a data asset. Sample API: Get a list of data quality issues
  • Data Quality Scores
    For each data asset, different types of quality scores are generated, such as the overall score or dimension scores. Sample API: Get a list of data quality scores for a given asset
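To call these methods from a script, you send an authenticated HTTPS request. The sketch below uses only the Python standard library and assumes bearer-token authentication. The endpoint URL is deliberately left as a parameter: copy the exact path and any required query parameters for the method that you need (for example, Get data quality assets) from the Watson Data API reference rather than guessing them. The function names are hypothetical.

```python
import json
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for one of the data quality methods.

    `url` is a placeholder for an endpoint taken from the API reference;
    `token` is assumed to be a valid bearer token for your environment.
    """
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def fetch_json(url: str, token: str, timeout: float = 30.0) -> dict:
    # Send the request and decode the JSON response body.
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as resp:
        return json.load(resp)
```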

Parent topic: Managing data quality