To analyze the process, IBM Process Mining requires you to upload a log file (CSV or XES) into the Datasource. Compress the file in a ZIP/GZ format for a faster upload.
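As an illustration of the compression step, the following is a minimal sketch that writes a GZ copy of a CSV log before upload. The function name `gzip_log` and the file name in the usage note are hypothetical; the sketch relies only on the Python standard library and is not part of IBM Process Mining.

```python
import gzip
import shutil
from pathlib import Path


def gzip_log(csv_path: str) -> Path:
    """Write a .gz copy of csv_path next to the original and return its path."""
    source = Path(csv_path)
    target = source.with_name(source.name + ".gz")
    # Stream the file in chunks so that large logs are not loaded into memory.
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `gzip_log("event_log.csv")` would produce `event_log.csv.gz` in the same folder, ready to be uploaded in place of the uncompressed file.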


In the Datasource you can perform the following actions.

Upload log file

Manage data set

Manage simulated data set

IBM Process Mining prevents you from including a log file that has different headers from the already included logs.

However, you can include a new log file with more columns than the previous logs, as long as the existing heading structure is preserved. That is, the new columns must be added to the right of the existing ones.

You cannot include a new log file with fewer columns than the previous logs, even if the existing heading structure is preserved.
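The header rules above can be sketched as a simple prefix check. This is only an illustration of the rule as described here, not the validation that IBM Process Mining actually runs; the function name `can_include` and the column names in the usage note are hypothetical.

```python
def can_include(existing_headers: list[str], new_headers: list[str]) -> bool:
    """Return True if new_headers starts with exactly the existing headers."""
    n = len(existing_headers)
    # Fewer columns are never accepted; extra columns are accepted only on the right.
    return len(new_headers) >= n and new_headers[:n] == existing_headers
```

With existing headers `["CaseID", "Activity"]`, a new log with `["CaseID", "Activity", "Cost"]` passes the check, while `["Cost", "CaseID", "Activity"]` or `["CaseID"]` does not.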

Note: both the project owner and the collaborators (snapshots) are able to include and exclude data chunks within the process. However,

Upload the reference model

Only the project owner can upload or remove the reference model.

How to derive a Reference Model from data using IBM Process Mining

  1. Select and filter one or more of the most frequent variants to identify the “expected behavior”.
  2. After setting the model details to 100% activities and 1% relations, open the BPMN tab and click “Edit a copy”.
  3. In the BPA tool you can:

    a. Refine the created BPMN model, adding/removing activities or relations

    b. (Optional) Download it

    c. Use it as a reference for the process mining project.

Map data columns

Map relevant data columns

After you upload the CSV file, you must map the columns of your log file to Basic Headings (Process ID, Activity, Start time, End Time, Resource, and Role) and to Custom Fields (additional relevant data).

To visualize your process, you must map at least one Process ID, the Activity field, and a datetime field (as Start time).
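Before uploading, it can help to confirm that a planned mapping covers the mandatory headings. The following is a minimal sketch under the assumption that the mapping is kept as a plain dictionary from column name to heading; the dictionary layout, the function name `missing_headings`, and the column names in the usage note are hypothetical and do not reflect any IBM Process Mining API.

```python
# Headings that must be mapped before the process can be visualized.
REQUIRED_HEADINGS = {"Process ID", "Activity", "Start time"}


def missing_headings(mapping: dict[str, str]) -> set[str]:
    """Return the mandatory headings that no column is mapped to."""
    return REQUIRED_HEADINGS - set(mapping.values())
```

For instance, `missing_headings({"case_id": "Process ID", "task": "Activity"})` returns `{"Start time"}`, which signals that a datetime column still has to be mapped.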

First, select the column, then select the corresponding heading as shown in the following image:


To visualize and analyze complex processes, such as Order to Cash or Procure to Pay, without biased statistics, you can map up to five columns as Process ID. Mapping more than one Process ID automatically enables the Multi-Level Process Mining capability of IBM Process Mining.

Multi-level process mining example

The Start time and End time can be written in different ways in your data (for example, 18/03/13, 03/18/13, 18March2013, and so on), so you must define the timestamp format in a way that IBM Process Mining can read.


The following chart shows how to describe your data so that it is readable by IBM Process Mining. For example, in the previous image, the timestamp for 6 August 2013 at 3:22 PM must be defined with the pattern dd/MM/yyyy HH:mm.
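To check a pattern against sample values before mapping, a quick local test can be useful. The sketch below assumes that the pattern dd/MM/yyyy HH:mm corresponds to the Python `strptime` directives `%d/%m/%Y %H:%M`; that translation and the helper name `matches_pattern` are assumptions made for illustration and are not part of IBM Process Mining.

```python
from datetime import datetime

# Assumed Python equivalent of the pattern dd/MM/yyyy HH:mm.
PATTERN = "%d/%m/%Y %H:%M"


def matches_pattern(value: str, pattern: str = PATTERN) -> bool:
    """Return True if value can be parsed with the given strptime pattern."""
    try:
        datetime.strptime(value, pattern)
        return True
    except ValueError:
        return False


# 6 August 2013 at 3:22 PM written in the dd/MM/yyyy HH:mm layout.
parsed = datetime.strptime("06/08/2013 15:22", PATTERN)
```

Values such as `18March2013` or `2013-08-06 15:22` fail this check, which indicates that a different pattern would be needed for them.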


When you map a Custom field, you must specify the type of data that is in the respective column and whether the field is mandatory for every event (every line of the log file).

Custom fields

Data type can be either Text, Numeric (decimal), Integer, Date, or Amount.

Use Numeric or Integer types when you expect to run computations on the custom fields (for example, compute statistics).

After you select the Numeric type, specify the decimal separator (for example, the comma used in Italy) in the "pattern" field. The dot is the default.
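The effect of the decimal separator can be illustrated with a small parser. This is a sketch only: the function name `parse_numeric` is hypothetical, it does not reproduce how IBM Process Mining interprets the "pattern" field, and it deliberately does not handle thousands separators.

```python
from decimal import Decimal


def parse_numeric(text: str, decimal_separator: str = ".") -> Decimal:
    """Parse a number written with the given decimal separator."""
    normalized = text.strip()
    if decimal_separator != ".":
        # Convert the locale-specific separator to the dot that Decimal expects.
        normalized = normalized.replace(decimal_separator, ".")
    return Decimal(normalized)
```

With a comma as separator, `parse_numeric("12,5", ",")` yields the same value as `parse_numeric("12.5")`.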

Use the Amount type when the custom fields represent a monetary amount (a pattern is required, as for the Numeric type).

Use the Date type if the column contains dates (a time pattern is required, as for the Start time and End time).

Note: if you set the custom field as mandatory, a Critical issue is generated in the data check for every record in which the field has no value.
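To anticipate such issues, you can look for empty values in a column before marking it as mandatory. The following sketch scans CSV text with the Python standard library; the function name `rows_missing_field` and the sample columns are hypothetical, and the check only approximates what the data check might flag.

```python
import csv
import io


def rows_missing_field(csv_text: str, field: str) -> list[int]:
    """Return 1-based data row numbers where field is empty or blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        number
        for number, row in enumerate(reader, start=1)
        if not (row.get(field) or "").strip()
    ]


# Hypothetical log: the second event has no Amount.
sample = "CaseID,Activity,Amount\n1,Create,100\n1,Approve,\n2,Create,50\n"
missing = rows_missing_field(sample, "Amount")
```

Here `missing` lists the data rows that would lack a value if Amount were mapped as a mandatory custom field.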

Visualize your process

After you map the log file, you can visualize the process by clicking the blue box.


Every time something changes in the Workspace, you must click “Visualize Your Process” again; otherwise, the changes are not applied.

Best practices to generate and map a datasource for Multilevel Process Mining analysis

The generation and mapping of a datasource for Multilevel Process Mining analysis should follow specific best practices in order to have a correct and consistent outcome. First of all, a multilevel datasource contains a different column for each entity (ProcessID) involved in the process. For example, a simple P2P process may contain 4 different columns, related to RequisitionID, OrderID, ReceiptID and InvoiceID.

Sample multilevel data source

In the displayed example, the behavior is the following:

Datasource generation

On the data preparation side, in addition to correctly populating the columns with the respective entity ID (ProcessID), you must create the relationships between those entities. To do so, identify the "bridge activities" within the process. A bridge activity typically represents the creation of an entity and should never have reworks on the same entity ID. For example, in a P2P process, typical bridge activities are:

After you identify the bridge activities, you must correctly populate the corresponding records in the datasource:

It is important to follow the functional/logical flow of the process when populating the bridge activities. For example, in P2P, never populate the InvoiceID in the Order creation bridge activity, because the Invoice is supposed to be generated after the Order. If an unexpected flow occurs (for example, invoice activities before order creation), Process Mining handles it autonomously.

All non-bridge activities should contain only their own ID (no links to other entities). For example, in P2P, the Order release activity refers only to the Order entity.
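The rule for non-bridge activities can be checked on a prepared datasource with a short script. The sketch below is a hedged illustration: the record layout, the helper names, and the sample rows are hypothetical, and the assumption that the Order creation record carries both the RequisitionID and the OrderID is made only for the example.

```python
ID_COLUMNS = ["RequisitionID", "OrderID", "ReceiptID", "InvoiceID"]


def populated_ids(row: dict) -> list:
    """Return the ID columns that carry a non-empty value in this record."""
    return [column for column in ID_COLUMNS if row.get(column)]


def non_bridge_violations(rows: list, bridge_activities: set) -> list:
    """Return non-bridge records that do not reference exactly one entity."""
    return [
        row
        for row in rows
        if row["Activity"] not in bridge_activities
        and len(populated_ids(row)) != 1
    ]


# Hypothetical records; the bridge activity is assumed to link requisition and order.
rows = [
    {"Activity": "Order creation", "RequisitionID": "R1", "OrderID": "O1"},
    {"Activity": "Order release", "OrderID": "O1"},
    {"Activity": "Order release", "OrderID": "O1", "InvoiceID": "I1"},  # breaks the rule
]
violations = non_bridge_violations(rows, {"Order creation"})
```

In this sample, only the last record is reported, because a non-bridge activity links the Order to an Invoice.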

Datasource mapping

When you map the ProcessIDs, it is also important to follow the functional/logical flow of the process. For example, in P2P, you must map the entities in the following order:

  1. RequisitionID as ProcessID;
  2. OrderID as ProcessID2;
  3. ReceiptID as ProcessID3;
  4. InvoiceID as ProcessID4.