Data quality

In the Data quality tab, you can evaluate the quality of your project's data. Data quality is represented with the following set of parameters:

Ratio between events and cases
A very low or very high value indicates poor data quality. If the value is very high, standardize the activity names to reduce the complexity. If the value is very low, check whether your data source represents a correct process.
Ratio between cases and variants
A low ratio means that the number of variants is high relative to the number of cases. This usually indicates issues with data mapping or a lack of standardization of activity names.
Average rework ratio per case
The average ratio between the total number of events and the number of distinct events in a process. A higher number indicates an undesirable amount of rework performed in the process.
Number of distinct activities
The total count of unique activity names in your process. A very high number of distinct activities (100 or more) can indicate poor data quality, such as lack of activity name standardization, overly granular activity definitions, or inconsistent naming conventions. This complexity makes process analysis more difficult and can impact system performance.
Number of distinct relations
The total count of unique activity-to-activity transitions (directly-follows relationships) in your process. A very high number of distinct relations (2500 or more) indicates high process complexity and variability, which can result from poor data quality, lack of process standardization, or an overly complex process model. High relation counts make process visualization and analysis challenging and can significantly impact system performance.
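
The following is a minimal sketch of how these five parameters could be computed from an event log. It is not how IBM Process Mining computes them internally. The function name `quality_parameters`, the dictionary keys, and the input shape (a list of `(case_id, activity)` pairs already sorted by time within each case) are assumptions made for illustration. The sketch also assumes that a variant is the exact sequence of activities in a case, and that the rework ratio of a case is its number of events divided by its number of distinct activities.

```python
from collections import defaultdict


def quality_parameters(events):
    """Compute the five data quality parameters from an event log.

    events: iterable of (case_id, activity) pairs, assumed to be
    ordered by time within each case (hypothetical input format).
    """
    # Group activities by case, preserving their order.
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)
    if not traces:
        raise ValueError("the event log is empty")

    n_cases = len(traces)
    n_events = sum(len(trace) for trace in traces.values())

    # Assumption: a variant is a distinct sequence of activities.
    variants = {tuple(trace) for trace in traces.values()}

    # Unique activity names across the whole log.
    activities = {a for trace in traces.values() for a in trace}

    # Directly-follows relations: consecutive activity pairs within a case.
    relations = {
        (a, b) for trace in traces.values() for a, b in zip(trace, trace[1:])
    }

    # Assumption: per-case rework ratio = events / distinct activities,
    # averaged over all cases.
    rework = sum(len(t) / len(set(t)) for t in traces.values()) / n_cases

    return {
        "events_per_case": n_events / n_cases,
        "cases_per_variant": n_cases / len(variants),
        "avg_rework_ratio": rework,
        "distinct_activities": len(activities),
        "distinct_relations": len(relations),
    }


# A tiny illustrative log: case c2 repeats the "Approve" activity.
log = [
    ("c1", "Create"), ("c1", "Approve"), ("c1", "Pay"),
    ("c2", "Create"), ("c2", "Approve"), ("c2", "Approve"), ("c2", "Pay"),
    ("c3", "Create"), ("c3", "Approve"), ("c3", "Pay"),
]
params = quality_parameters(log)
print(params)
```

In this sample log, the repeated "Approve" in case c2 creates a second variant, adds the relation Approve to Approve, and raises the average rework ratio above 1.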

Based on the values of these parameters, IBM Process Mining rates the data quality as Good, Warning, or Critical. The following table indicates the reference ranges of the ratings for each parameter:

| Parameter | Good | Warning | Critical |
|---|---|---|---|
| Ratio events/cases | 5 <= x < 30 | 2 <= x < 5 or 30 <= x < 60 | 1 <= x < 2 or x >= 60 |
| Ratio cases/variants | x >= 5 | 2 <= x < 5 | 1 <= x < 2 |
| Average rework ratio per case | 1 <= x < 1.25 | 1.25 <= x < 2 | x >= 2 |
| Number of distinct activities | 1 <= n < 100 | 100 <= n | - |
| Number of distinct relations | 1 <= n < 2500 | 2500 <= n | - |

In the table, x is the value of the given ratio and n is the number of distinct activities or relations.
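
As an illustration, the reference ranges in the table above can be encoded as a small lookup. This is a sketch only: the `RANGES` dictionary, its keys, and the `rate` function are hypothetical names and are not part of any IBM Process Mining API. Values that fall outside every range in the table (for example, a ratio below 1) raise an error, because the table does not define a rating for them.

```python
import math

# Each entry is (rating, inclusive lower bound, exclusive upper bound),
# transcribed from the reference table. Keys are illustrative names.
RANGES = {
    "events_per_case": [
        ("Good", 5, 30),
        ("Warning", 2, 5), ("Warning", 30, 60),
        ("Critical", 1, 2), ("Critical", 60, math.inf),
    ],
    "cases_per_variant": [
        ("Good", 5, math.inf),
        ("Warning", 2, 5),
        ("Critical", 1, 2),
    ],
    "avg_rework_ratio": [
        ("Good", 1, 1.25),
        ("Warning", 1.25, 2),
        ("Critical", 2, math.inf),
    ],
    # The table defines no Critical range for the two count parameters.
    "distinct_activities": [
        ("Good", 1, 100),
        ("Warning", 100, math.inf),
    ],
    "distinct_relations": [
        ("Good", 1, 2500),
        ("Warning", 2500, math.inf),
    ],
}


def rate(parameter, value):
    """Return "Good", "Warning", or "Critical" for a parameter value."""
    for rating, low, high in RANGES[parameter]:
        if low <= value < high:
            return rating
    raise ValueError(f"{parameter}={value} is outside the reference ranges")


print(rate("events_per_case", 12))       # Good
print(rate("events_per_case", 45))       # Warning
print(rate("cases_per_variant", 1.5))    # Critical
print(rate("distinct_activities", 100))  # Warning
```

Keeping the ranges in data rather than in nested conditionals makes it easier to compare the code against the table line by line.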