Data quality

In the Data quality tab, you can evaluate the quality of your project's data. Data quality is measured by the following parameters:

  • Ratio between events and cases.
  • Ratio between cases and variants.
  • Average rework ratio per case.

Based on the values of these parameters, IBM Process Mining rates the data quality as Good, Warning, or Critical. The following table lists the reference range of each rating for each parameter:

Parameter                     | Good          | Warning                    | Critical
Ratio events/cases            | 5 <= x < 30   | 2 <= x < 5 or 30 <= x < 60 | 1 <= x < 2 or x >= 60
Ratio cases/variants          | x >= 5        | 2 <= x < 5                 | 1 <= x < 2
Average rework ratio per case | 1 <= x < 1.25 | 1.25 <= x < 2              | x >= 2

In the table, x is the value obtained for the parameter.
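The reference ranges can be read as a set of threshold checks. The following Python sketch is illustrative only and is not part of IBM Process Mining; the function names are hypothetical, and the upper bound of the Warning range for the rework ratio is assumed to be 2, the value at which the Critical range begins.

```python
def rate_events_per_case(x: float) -> str:
    """Rate the events/cases ratio against the reference ranges."""
    if 5 <= x < 30:
        return "Good"
    if 2 <= x < 5 or 30 <= x < 60:
        return "Warning"
    if 1 <= x < 2 or x >= 60:
        return "Critical"
    raise ValueError("ratio must be at least 1")


def rate_cases_per_variant(x: float) -> str:
    """Rate the cases/variants ratio against the reference ranges."""
    if x >= 5:
        return "Good"
    if 2 <= x < 5:
        return "Warning"
    if 1 <= x < 2:
        return "Critical"
    raise ValueError("ratio must be at least 1")


def rate_rework(x: float) -> str:
    """Rate the average rework ratio per case.

    Assumption: the Warning range ends at 2, where Critical begins.
    """
    if 1 <= x < 1.25:
        return "Good"
    if 1.25 <= x < 2:
        return "Warning"
    if x >= 2:
        return "Critical"
    raise ValueError("ratio must be at least 1")


print(rate_events_per_case(45))   # Warning
print(rate_cases_per_variant(7))  # Good
print(rate_rework(2.3))           # Critical
```

Note that the events/cases ratio is the only parameter that is penalized at both ends of its range.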

Ratio events/cases

This is the ratio between the number of events and the number of cases in a process. A very low or very high value indicates poor data quality. If the value is very high, standardize the activity names to reduce the complexity. If the value is very low, check whether your data source correctly represents the process.
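As a rough illustration, the ratio can be computed from an event log by dividing the number of events by the number of distinct cases. The sketch below assumes a hypothetical log format, a list of (case ID, activity) pairs, and a hypothetical function name; it does not reflect how IBM Process Mining stores or processes data.

```python
def events_per_case(event_log: list[tuple[str, str]]) -> float:
    """Return the number of events divided by the number of distinct cases."""
    case_ids = {case_id for case_id, _activity in event_log}
    return len(event_log) / len(case_ids)


# Hypothetical sample log: 5 events spread over 2 cases.
log = [
    ("c1", "Create"),
    ("c1", "Approve"),
    ("c1", "Pay"),
    ("c2", "Create"),
    ("c2", "Pay"),
]

print(events_per_case(log))  # 2.5
```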

Ratio cases/variants

This is the ratio between the number of cases and the number of variants in a process. A low ratio means that the number of variants is high relative to the number of cases. A low value indicates issues with data mapping or a lack of standardization of activity names.
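The following sketch shows one way to compute this ratio. It assumes that a variant is a distinct ordered sequence of activities within a case, which is the usual process-mining meaning but is not defined in this section. It also assumes a hypothetical log format, a list of (case ID, activity) pairs already sorted by time within each case.

```python
from collections import defaultdict


def cases_per_variant(event_log: list[tuple[str, str]]) -> float:
    """Return the number of cases divided by the number of distinct variants."""
    # Group activities by case, preserving the order in which they appear.
    traces: dict[str, list[str]] = defaultdict(list)
    for case_id, activity in event_log:
        traces[case_id].append(activity)
    # Assumption: a variant is a unique ordered sequence of activities.
    variants = {tuple(trace) for trace in traces.values()}
    return len(traces) / len(variants)


# Hypothetical sample log: c1 and c3 follow the same path, c2 differs.
log = [
    ("c1", "Create"), ("c1", "Approve"), ("c1", "Pay"),
    ("c2", "Create"), ("c2", "Pay"),
    ("c3", "Create"), ("c3", "Approve"), ("c3", "Pay"),
]

print(cases_per_variant(log))  # 1.5
```

In this sketch, two spellings of the same activity, such as "Approve" and "approve", produce two separate variants, which is how unstandardized activity names lower the ratio.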

Average rework ratio per case

This is the average ratio between the total number of events and the number of distinct events in a process. A higher number indicates an undesirable amount of rework performed in the process.
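A minimal sketch of the calculation follows. It assumes that the ratio is computed per case as the number of events divided by the number of distinct activities in that case, and then averaged across cases; this reading, the function name, and the (case ID, activity) log format are assumptions for illustration.

```python
from collections import defaultdict


def average_rework_ratio(event_log: list[tuple[str, str]]) -> float:
    """Average, across cases, of events divided by distinct activities."""
    activities_by_case: dict[str, list[str]] = defaultdict(list)
    for case_id, activity in event_log:
        activities_by_case[case_id].append(activity)
    # A case with no repeated activity has a ratio of exactly 1.
    ratios = [
        len(activities) / len(set(activities))
        for activities in activities_by_case.values()
    ]
    return sum(ratios) / len(ratios)


# Hypothetical sample log: c1 repeats "Approve" once, c2 has no rework.
log = [
    ("c1", "Create"), ("c1", "Approve"), ("c1", "Approve"), ("c1", "Pay"),
    ("c2", "Create"), ("c2", "Pay"),
]

print(round(average_rework_ratio(log), 3))  # 1.167
```

Under this reading, the sample log has per-case ratios of 4/3 and 1, so the average falls in the Good range.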