Data quality
The Data quality page in the Manage tab helps you evaluate the quality of your project data. Data quality is expressed by three parameters: the ratio between events and cases, the ratio between cases and variants, and the average rework ratio per case. Based on the value of each parameter, IBM Process Mining rates the data quality as Good, Warning, or Bad. For the reference ranges of each rating, see Table 1: Data quality reference ranges.
Ratio events/cases
This is the ratio between the number of events and the number of cases in a process. A very low or very high value indicates poor data quality. If the value is very high, standardize the activity names to reduce complexity. If the value is very low, check whether your data source actually represents a process.
Ratio cases/variants
This is the ratio between the number of cases and the number of variants in a process. A low ratio means that the number of variants is high relative to the number of cases. This indicates issues with data mapping or a lack of standardized activity names.
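The effect of standardizing activity names on this ratio can be illustrated with a minimal sketch. It assumes that a variant is a distinct ordered sequence of activity names within a case. The helper names `normalize_activity` and `count_variants` and the sample traces are hypothetical and are not part of IBM Process Mining.

```python
import re


def normalize_activity(name):
    """Trim, collapse repeated whitespace, and lowercase an activity name."""
    return re.sub(r"\s+", " ", name.strip()).lower()


def count_variants(traces):
    """Count distinct activity sequences (assumed definition of a variant)."""
    return len({tuple(trace) for trace in traces})


# Hypothetical traces: one list of activity names per case.
traces = [
    ["Create Order", "Approve Order"],
    ["create order", "Approve  Order"],
    ["Create Order ", "approve order"],
    ["Create Order", "Approve Order"],
]

# Ratio cases/variants before and after standardizing the names.
raw_ratio = len(traces) / count_variants(traces)
clean_traces = [[normalize_activity(a) for a in trace] for trace in traces]
clean_ratio = len(clean_traces) / count_variants(clean_traces)
```

In this sample, inconsistent spelling alone splits four identical cases into three variants. After normalization they collapse into one variant, which raises the ratio.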
Average rework ratio per case
This is the average ratio between the total number of events and the number of distinct events in a process. A higher number indicates an undesirable amount of rework performed in the process.
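The three parameters can be sketched as a short calculation over an event log. This is an illustration under stated assumptions, not the product's implementation. It assumes that the log is a list of `(case_id, activity)` pairs already ordered in time within each case, that a variant is a distinct activity sequence, and that the rework ratio is computed per case and then averaged. The function name `data_quality_ratios` and the sample log are hypothetical.

```python
from collections import defaultdict


def data_quality_ratios(events):
    """Compute the three ratios from (case_id, activity) pairs.

    Assumes events are listed in chronological order within each case.
    """
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)
    if not traces:
        raise ValueError("event log is empty")

    n_cases = len(traces)
    n_events = sum(len(trace) for trace in traces.values())
    # Assumed definition: a variant is a distinct ordered activity sequence.
    n_variants = len({tuple(trace) for trace in traces.values()})
    # Assumed definition: events divided by distinct activities, per case,
    # then averaged over all cases.
    rework = sum(len(t) / len(set(t)) for t in traces.values()) / n_cases

    return {
        "events_per_case": n_events / n_cases,
        "cases_per_variant": n_cases / n_variants,
        "avg_rework_ratio": rework,
    }


# Hypothetical sample log: case c1 repeats the "Check" activity once.
log = [
    ("c1", "Receive"), ("c1", "Check"), ("c1", "Check"), ("c1", "Approve"),
    ("c2", "Receive"), ("c2", "Check"), ("c2", "Approve"),
    ("c3", "Receive"), ("c3", "Check"), ("c3", "Approve"),
]
ratios = data_quality_ratios(log)
```

A case with no repeated activities has a rework ratio of exactly 1, so the average can never fall below 1.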
Data quality reference ranges
Use the following table to understand the reference ranges of data quality ratings for each parameter.
Consider x as the ratio obtained for the parameter.
| Parameter | Good | Warning | Bad |
|---|---|---|---|
| Ratio events/cases | 5 <= x < 30 | 2 <= x < 5 or 30 <= x < 60 | 1 <= x < 2 or x >= 60 |
| Ratio cases/variants | x >= 5 | 2 <= x < 5 | 1 <= x < 2 |
| Average rework ratio per case | 1 <= x < 1.25 | 1.25 <= x < 2 | x >= 2 |
Table 1: Data quality reference ranges
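The reference ranges can be expressed as a small lookup function. This is a hedged sketch of the table's logic, not code from IBM Process Mining. The function name `rate` and the parameter keys are hypothetical. Because the table does not cover values below 1, the sketch treats them as invalid input, which is an assumption.

```python
def rate(parameter, x):
    """Return "Good", "Warning", or "Bad" for ratio x of the given parameter.

    parameter is one of the hypothetical keys "events_per_case",
    "cases_per_variant", or "avg_rework_ratio".
    """
    if x < 1:
        # Assumption: the table starts at 1, so lower values are rejected.
        raise ValueError("ratio must be at least 1")

    if parameter == "events_per_case":
        if 5 <= x < 30:
            return "Good"
        if 2 <= x < 5 or 30 <= x < 60:
            return "Warning"
        return "Bad"  # 1 <= x < 2 or x >= 60

    if parameter == "cases_per_variant":
        if x >= 5:
            return "Good"
        if x >= 2:
            return "Warning"
        return "Bad"  # 1 <= x < 2

    if parameter == "avg_rework_ratio":
        if x >= 2:
            return "Bad"
        if x >= 1.25:
            return "Warning"
        return "Good"  # 1 <= x < 1.25

    raise ValueError(f"unknown parameter: {parameter}")


# Example: rate a project with 12 events per case, 3 cases per variant,
# and an average rework ratio of 1.1.
ratings = {
    "events_per_case": rate("events_per_case", 12),
    "cases_per_variant": rate("cases_per_variant", 3),
    "avg_rework_ratio": rate("avg_rework_ratio", 1.1),
}
```

Note that only the events/cases ratio is penalized at both ends of the range. For the other two parameters the rating changes in one direction only.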