Data source

To analyze a process in IBM Process Mining, you must upload a data source, which is a log file in either .CSV or .XES format. You can also compress the file to ZIP or GZ format for a faster upload. The name of the data source file must adhere to the UTF-8 standard. If you upload a .zip file, the names of all the files in the .zip file must also adhere to the UTF-8 standard.
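If you prepare log files with a script, you can compress them and check the file names inside a .zip archive before you upload them. The following Python sketch uses only the standard library. The helper names `gzip_log` and `zip_names_are_utf8` are hypothetical, and the check relies on the ZIP format's UTF-8 flag (bit 11 of the general-purpose flags); it is an assumption-based local pre-check, not part of IBM Process Mining.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def gzip_log(csv_path: Path) -> Path:
    """Compress an event log such as log.csv into log.csv.gz (hypothetical helper)."""
    gz_path = csv_path.with_name(csv_path.name + ".gz")
    with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the file instead of loading it into memory
    return gz_path


def zip_names_are_utf8(zip_path: Path) -> bool:
    """Return True if every member name in the archive is stored as UTF-8.

    ASCII-only names are valid UTF-8 as they are. A non-ASCII name counts as
    UTF-8 only when the archive sets the UTF-8 flag (bit 11, 0x800); otherwise
    Python decodes it as CP437 and the original encoding is unknown.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.filename.isascii():
                continue
            if not info.flag_bits & 0x800:
                return False
    return True
```

For example, an archive written with Python's `zipfile` module stores a non-ASCII member name such as `café.csv` with the UTF-8 flag set, so the check passes for it.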

In the application, you can include a data source in either of the following ways:

Using the Create a process-mining project wizard
For more information, see Create a process-mining project.
Using the Data source section on the Manage page

This document describes how to manage a data source by using the Data source section on the Manage page.


Introduction

You can use the Data source section to perform the following actions:

Upload and manage data sources by using either of the following methods:
- Upload a data source from your local device or network.
- Upload a simulated data source. You can simulate the process data by using the BPMN page. For more information, see Simulation.
Map the data to the appropriate column headings.
Visualize the process with the updated or modified data source.

IBM Process Mining prevents you from uploading a log file or data source whose headers differ from those of the existing data sources. You can include a new log file with more columns than the previous logs, provided that the existing header structure is preserved; the new columns must be added on the right of the file.
However, you cannot include a new log file with fewer columns than the previous logs, even if the existing header structure is preserved.
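The header rule above can be checked locally before you upload a new log. This Python sketch assumes that the first row of each CSV file is the header row; `read_header` and `header_is_compatible` are hypothetical helpers, and the `Cost` column in the example is invented for illustration. IBM Process Mining still performs its own validation at upload time.

```python
import csv
from pathlib import Path


def read_header(csv_path: Path) -> list[str]:
    """Return the first row of a CSV event log (assumed to be the header row)."""
    # utf-8-sig also strips a byte-order mark if the file starts with one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


def header_is_compatible(existing: list[str], new: list[str]) -> bool:
    """Local pre-check of the header rule; not the product's own validation."""
    # A new log must not have fewer columns than the existing logs.
    if len(new) < len(existing):
        return False
    # The existing columns must come first, unchanged and in the same order;
    # any extra columns can only appear to their right.
    return new[: len(existing)] == existing


existing = ["Process ID", "Activity", "Start time"]
print(header_is_compatible(existing, existing + ["Cost"]))  # True: extra column on the right
print(header_is_compatible(existing, ["Cost"] + existing))  # False: existing columns shifted
print(header_is_compatible(existing, existing[:2]))         # False: fewer columns
```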

IBM Process Mining allows both the project owner and the collaborators who access the project through snapshots to include and exclude data chunks within a project.

The following limitations apply in IBM Process Mining:

Only the owner of the project can delete data.
Data sources that are included in the snapshots are not visible to the project owner.

Managing a data source by using the Manage page

Uploading a data source

You can use the following steps to upload a data source:

In the left pane of the Manage page, click Data source.
In the Data source section, you can either upload a data source or get a data source file from simulation.
- To upload a data source, do one of the following steps:
- Click Drag and drop file here or click to upload. In the file browser, select the file and click Open.
- Click Upload file. In the file browser, select the file and click Open.
If the format and structure of the newly added data source comply with the existing data source, then it is uploaded to IBM Process Mining.
- To select a data source from the simulated data, complete the following steps:
1. Expand the Upload file overflow menu and select Get from simulation.
2. In the Import from simulation dialog box, expand the simulation that you want to import, and then select the required simulation.
3. Click Import.
You can repeat the process to import other data sources that are obtained through simulation.

You can use the Included toggle to choose the data sources that you want to include in the project.

You can use the Delete button to delete a data source from the table.

Mapping the data

You can use the following steps to map the data:

In the Data mapping subsection of the Data source section, review the sample mapping.
Click Edit data mapping to go to the Map your data step of the 'Create a process-mining project' wizard.
On the Map your data step of the 'Create a process-mining project' wizard, map the data to the appropriate column headings, and then click Next.

Note that:
- You must map the required column headings: Process ID, Activity, and Start time.
- For more information on data mapping, see Step 6 of Create a process-mining project.
- To visualize and analyze complex processes with the Multi-Level Process Mining capability without biased statistics, you can map up to five “Process IDs”. To do so, select the corresponding headings and map them as Process ID. This action automatically enables the Multi-Level Process Mining capability of IBM Process Mining.
On the Time format configuration page, complete the following steps:
a. Select the Use same time format for all dates checkbox.
b. In the Start Timestamp list, select the required time format.
c. Click Save.
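The mapping rules in the notes above can be expressed as a small local check. In this Python sketch, a mapping is assumed to be a dictionary from your column headings to field names; this representation, the helper names, the example column names, and the `strptime` format codes are illustrative assumptions and do not reflect how IBM Process Mining stores mappings or names its time formats.

```python
from datetime import datetime

REQUIRED_FIELDS = {"Process ID", "Activity", "Start time"}
MAX_PROCESS_IDS = 5  # multi-level mapping allows up to five Process IDs


def check_mapping(mapping: dict[str, str]) -> list[str]:
    """Return the problems found in a {column heading: field} mapping (empty if none)."""
    mapped = list(mapping.values())
    problems = [f"missing required field: {field}"
                for field in sorted(REQUIRED_FIELDS - set(mapped))]
    n_ids = mapped.count("Process ID")
    if n_ids > MAX_PROCESS_IDS:
        problems.append(f"{n_ids} columns mapped as Process ID; at most {MAX_PROCESS_IDS} allowed")
    return problems


def same_time_format(values: list[str], fmt: str) -> bool:
    """Return True if every timestamp parses with one shared strptime format,
    similar to selecting 'Use same time format for all dates'."""
    try:
        for value in values:
            datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


# Hypothetical columns: two entity columns mapped as Process ID (multi-level).
mapping = {"OrderID": "Process ID", "InvoiceID": "Process ID",
           "Activity": "Activity", "Start": "Start time"}
print(check_mapping(mapping))  # []
print(same_time_format(["2024-01-31 08:00:00", "2024-02-01 17:30:00"],
                       "%Y-%m-%d %H:%M:%S"))  # True
```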

Visualizing the data source

After you complete the data mapping, you must visualize the process to apply the updates. Visualization is necessary every time you introduce a change into the process, and it also helps you save all changes.

To visualize the process, in the Data source section, click Visualize.

You can see the updated process model in the Model tab.

Best practices to generate and map a data source for Multi-level Process Mining analysis

You must adhere to the following best practices when generating and mapping a data source for Multi-level Process Mining analysis.

A multi-level data source must contain a separate column for each entity (ProcessID) involved in the process. For example, a simple procure-to-pay (P2P) process can contain four entity columns: RequisitionID, OrderID, ReceiptID, and InvoiceID.

Sample multilevel data source

In the displayed example, you can note the following behavior:

An order is created from a requisition and then released.
The order is received in two different goods receipts.
The two goods receipts are registered and paid in a single invoice.

Data source generation for Multi-level Process Mining

On the data preparation side, in addition to correctly populating the columns with the respective entityID (ProcessID), you must also create the relationships between those entities. To do so, you must identify the bridge activities within the process. A bridge activity represents the creation of an entity and must not include reworks on the same entityID.

For example, the following are typical bridge activities in a P2P process:

Order Creation
It represents the creation of the Order entity, which is linked to one Requisition entity. The same OrderID is never created twice and hence no reworks are expected.
Goods Receipt
It represents the registration of a Receipt entity, which is always linked to at least one Order entity. The same ReceiptID is never created twice and hence no reworks are expected.
Invoice Registration
It represents the registration of an Invoice entity, which is linked either to a Receipt or to an Order. The same InvoiceID is never created twice and hence no reworks are expected.

After identifying the bridge activities, you must correctly populate the corresponding records in the data source:

The bridge activity must contain the respective entityID; for example, an Invoice registration record must include a populated InvoiceID.
Generate one record of the bridge activity for every linked entityID. For example, if an InvoiceID is linked to four different ReceiptIDs, four Invoice registration records have to be created; InvoiceID keeps the same value in all four, whereas ReceiptID is different in each.
- If multiple records are generated for the same bridge activity, they must have the same timestamp.
- IBM Process Mining recognizes the bridge activity and manages it with a frequency of 1, even if the record is repeated.
Never populate more than two entityIDs in the same record; for example, you cannot populate InvoiceID, ReceiptID, and OrderID in the same Invoice registration record.
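The one-record-per-link rule can be sketched in Python as follows. The `Record` class, the `bridge_records` helper, and the IDs and timestamp in the example are hypothetical; the example mirrors the sample behavior described earlier, in which two goods receipts are registered in a single invoice.

```python
from dataclasses import dataclass


@dataclass
class Record:
    activity: str
    timestamp: str
    ids: dict[str, str]  # entity column -> entityID, for example {"InvoiceID": "INV-1"}


def bridge_records(activity: str, timestamp: str, own_column: str, own_id: str,
                   linked_column: str, linked_ids: list[str]) -> list[Record]:
    """Create one bridge-activity record per linked entityID.

    Each record holds exactly two entityIDs (the created entity and one linked
    entity), and all records share the same timestamp.
    """
    return [Record(activity, timestamp, {own_column: own_id, linked_column: linked_id})
            for linked_id in linked_ids]


# Hypothetical data: invoice INV-1 registers two goods receipts.
for rec in bridge_records("Invoice registration", "2024-03-05 10:00:00",
                          "InvoiceID", "INV-1", "ReceiptID", ["GR-1", "GR-2"]):
    print(rec.activity, rec.timestamp, rec.ids)
```

Keeping the created entity's ID fixed and varying only the linked ID is what lets the repeated records be recognized as a single bridge activity.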

It is important to follow the functional or logical flow of the process when populating the bridge activities. For example, in P2P, you must not populate the InvoiceID in the Order creation bridge activity because the invoice is generated after the order. If an unexpected flow occurs (for example, invoice activities before order creation), IBM Process Mining handles it autonomously.

All nonbridge activities should contain only their respective ID (no links with other entities). For example, in P2P, the Order release activity refers only to the Order entity.
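The record-level rules in this section can be checked with a short Python validator before you upload the data source. It assumes that each row is a dictionary with `activity`, `timestamp`, and `ids` keys, that empty values count as unpopulated, and that you list the bridge activities of your process yourself; the `BRIDGE_ACTIVITIES` table, the activity names, and the `validate` helper are hypothetical P2P examples.

```python
from collections import defaultdict

# Hypothetical P2P configuration: bridge activity -> entity column it creates.
BRIDGE_ACTIVITIES = {
    "Order creation": "OrderID",
    "Goods receipt": "ReceiptID",
    "Invoice registration": "InvoiceID",
}


def validate(rows: list[dict]) -> list[str]:
    """Return rule violations for rows shaped like
    {"activity": ..., "timestamp": ..., "ids": {column: entityID}}."""
    problems = []
    bridge_times = defaultdict(set)  # (activity, created entityID) -> timestamps seen
    for i, row in enumerate(rows):
        ids = {col: val for col, val in row["ids"].items() if val}  # populated IDs only
        own_col = BRIDGE_ACTIVITIES.get(row["activity"])
        if own_col is None:
            # Nonbridge activity: only its own entityID, no links.
            if len(ids) != 1:
                problems.append(f"row {i}: nonbridge activity must have exactly one entityID")
            continue
        if own_col not in ids:
            problems.append(f"row {i}: bridge activity is missing its own {own_col}")
            continue
        if len(ids) > 2:
            problems.append(f"row {i}: more than two entityIDs populated")
        bridge_times[(row["activity"], ids[own_col])].add(row["timestamp"])
    # Repeated records of one bridge activity must share the same timestamp.
    for (activity, entity_id), times in bridge_times.items():
        if len(times) > 1:
            problems.append(f"{activity} {entity_id}: records have different timestamps")
    return problems


rows = [
    {"activity": "Order creation", "timestamp": "2024-03-01 09:00:00",
     "ids": {"RequisitionID": "REQ-1", "OrderID": "PO-1"}},
    {"activity": "Order release", "timestamp": "2024-03-01 11:00:00",
     "ids": {"OrderID": "PO-1", "InvoiceID": "INV-1"}},  # violates the nonbridge rule
]
print(validate(rows))  # ['row 1: nonbridge activity must have exactly one entityID']
```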

Data source mapping

When mapping the ProcessIDs, it is important to follow the functional or logical flow of the process. For example, in P2P, you must map the entities in the following order:

RequisitionID as ProcessID
OrderID as ProcessID2
ReceiptID as ProcessID3
InvoiceID as ProcessID4
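The mapping order above can be derived from an ordered list of entity columns and reused to check that bridge records follow the logical flow. This Python sketch is illustrative only: `process_id_fields` and `respects_flow` are hypothetical helpers, and the five-column limit comes from the earlier note about mapping up to five Process IDs.

```python
# Entity columns of the P2P example, in the order of the process flow.
P2P_ORDER = ["RequisitionID", "OrderID", "ReceiptID", "InvoiceID"]
MAX_PROCESS_IDS = 5  # up to five columns can be mapped as Process IDs


def process_id_fields(order: list[str]) -> dict[str, str]:
    """Assign ProcessID, ProcessID2, ProcessID3, ... to entity columns in flow order."""
    if len(order) > MAX_PROCESS_IDS:
        raise ValueError(f"at most {MAX_PROCESS_IDS} Process IDs can be mapped")
    return {col: "ProcessID" if i == 1 else f"ProcessID{i}"
            for i, col in enumerate(order, start=1)}


def respects_flow(own_col: str, linked_col: str, order: list[str]) -> bool:
    """Return True if a bridge record that creates `own_col` links only to an
    entity that comes earlier in the flow (for example, an invoice linked to an order)."""
    return order.index(linked_col) < order.index(own_col)


print(process_id_fields(P2P_ORDER))
print(respects_flow("InvoiceID", "OrderID", P2P_ORDER))  # True
print(respects_flow("OrderID", "InvoiceID", P2P_ORDER))  # False: invoice comes after order
```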