Data source

In IBM Process Mining, you must upload a data source, which is a log file in either .CSV or .XES format, to analyze the process. You can also compress the file to ZIP or GZ format for a faster upload. Additionally, the name of the data source file should adhere to the UTF-8 standard. If you upload a .zip file, the names of all the files in the .zip file should also adhere to the UTF-8 standard.
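The preparation described above can be sketched in Python. This is a minimal illustration that uses only the standard library; the helper names `is_utf8_name` and `gzip_log` are hypothetical and are not part of IBM Process Mining. It checks the file extension and the file name, and then writes a GZ copy of the log next to the original.

```python
import gzip
import shutil
from pathlib import Path


def is_utf8_name(name: str) -> bool:
    """Return True if the file name can be encoded as UTF-8."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def gzip_log(source: Path) -> Path:
    """Compress a .csv or .xes log to <name>.gz next to the original file."""
    if source.suffix.lower() not in {".csv", ".xes"}:
        raise ValueError(f"unsupported log format: {source.suffix}")
    if not is_utf8_name(source.name):
        raise ValueError("file name is not valid UTF-8")
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `gzip_log(Path("events.csv"))` would produce `events.csv.gz` in the same folder, ready for upload.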

In the application, you can include a data source in either of the following ways:

Using the Create Process wizard
For more information, see Create a process.
Using the Data source section on the Manage page

This document describes how to manage a data source by using the Data source section on the Manage page.

Introduction

You can use the Data source section to perform the following actions:

Upload and manage data sources by using any of the following methods:
- Upload a data source from a local device or network.
- Upload a simulated data source.
  Note: You can simulate data in the process by using the BPMN page. For more information, see Simulation.
Map the data to the appropriate column headings.
Visualize the process with the updated or modified data source.

Important:

IBM Process Mining prevents you from uploading a log file or data source that has different headers from the existing data sources.

You can include a new log file with more columns than the previous logs if the existing heading structure is preserved. For example, the new columns must be added on the right of the file. However, you cannot include a new log file with fewer columns than the previous logs, even if the existing heading structure is preserved.
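One way to read this rule is that the existing headers must form a prefix of the new headers. The following Python sketch expresses that interpretation. The function name `headers_compatible` is hypothetical, and the check is an assumption about the rule as stated here, not the validation logic that the product runs.

```python
def headers_compatible(existing: list[str], new: list[str]) -> bool:
    """Return True if `new` keeps every existing header in place.

    Extra columns are accepted only when they are appended on the right.
    A file with fewer columns than the existing data source is rejected.
    """
    if len(new) < len(existing):
        return False
    return new[: len(existing)] == existing
```

With existing headers `["CaseID", "Activity", "Start"]`, a new file with headers `["CaseID", "Activity", "Start", "Resource"]` passes the check, whereas `["CaseID", "Activity"]` or `["Resource", "CaseID", "Activity", "Start"]` does not.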

Note: IBM Process Mining allows both the project owner and the collaborators who access the process through snapshots to include and exclude data chunks within the process.

The following limitations apply in IBM Process Mining:

Only the owner of the process can delete data.
Data sources that are included in the snapshots are not visible to the project owner.

Managing a data source by using the Manage page

Uploading a data source

You can use the following steps to upload a data source:

In the left pane of the Manage page, click Data source.
In the Data source section, do one of the following steps to upload a data source:
- Select a data source from a local device or network
  a. Click Uploaded data source.
  b. Click Drag and drop file here or click to upload or Add file.
  c. In the window that appears, select the required data source, and then click Open.
  d. If the format and structure of the newly added data source comply with the existing data source, then it is uploaded to IBM Process Mining.
- Select a data source from the simulated data
  a. Click Simulated data source to upload a data source that is obtained after running a simulation on the process.
  b. Click Get from simulation.
  c. In the Import from simulation dialog box, expand the simulation that you want to import, click the required simulation, and then click Import.
  d. IBM Process Mining displays only the simulated data source.
  Note: You can repeat the process to import other data sources that are obtained through simulation.
Notes:
- You can use the Included toggle button to choose the data sources that you want to include in the process.
- You can use the Delete button to delete a data source from the table.

Mapping the data

You can use the following steps to map the data:

In the Data mapping section of Data source, note a sample mapping, and then click Edit data mapping.
On the Data mapping page of the Create process wizard, map the data with the appropriate column headings, and then click Next.
Notes:
- It is mandatory to map the required column headings: Process ID, Activity, and Start time.
- For more information on data mapping, see Step 6 of Create a process.
- To visualize and analyze complex processes with the Multi-Level Process Mining capability without biased statistics, you can map up to five “Process IDs”. To do so, select the corresponding headings and map them as Process ID. This action automatically enables the Multi-Level Process Mining capability of IBM Process Mining.
On the Time format configuration page, do the following steps:
a. Select the Use same time format for all dates checkbox.
b. In the Start Timestamp list, select the required time format.
c. Click Save.
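Before you open the wizard, you might want to check a planned mapping and time format offline. The following Python sketch is only an illustration: the names `check_mapping` and `matches_format` are hypothetical, and the `strptime` pattern is an assumption because the time formats that the Start Timestamp list offers are not described here.

```python
from datetime import datetime

# Roles that the Data mapping page requires.
REQUIRED_ROLES = {"Process ID", "Activity", "Start time"}
# Upper limit of columns that can be mapped as Process ID.
MAX_PROCESS_IDS = 5


def check_mapping(mapping: dict[str, str]) -> list[str]:
    """Return the problems found in a column-to-role mapping."""
    roles = list(mapping.values())
    problems = [
        f"missing required role: {role}"
        for role in sorted(REQUIRED_ROLES - set(roles))
    ]
    if roles.count("Process ID") > MAX_PROCESS_IDS:
        problems.append("more than five columns mapped as Process ID")
    return problems


def matches_format(values: list[str], fmt: str) -> bool:
    """Return True if every timestamp parses with the same format."""
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return False
    return True
```

An empty list from `check_mapping` means that the three required roles are present and that no more than five columns are mapped as Process ID.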

Visualizing the data source

After you complete the data mapping, you must visualize the process to apply the new updates. Use the Visualize button whenever the process is updated. Applying the configuration helps you to perform a better analysis of the process; otherwise, all the changes are lost.

You can use the following steps to update the process:

In the Data source section, click Visualize.
On the Model page, view the activities of the updated process.

Best practices to generate and map a data source for Multi-level Process Mining analysis

You must adhere to the following best practices when generating and mapping a data source for Multi-level Process Mining analysis.

A multi-level data source must contain a different column for each entity (ProcessID) involved in the process. For example, a simple P2P process can contain four different columns such as RequisitionID, OrderID, ReceiptID, and InvoiceID.

Sample multilevel data source

In the displayed example, you can note the following behavior:

An order is created from a requisition and then released.
The order is received in two different goods receipts.
The two goods receipts are registered and paid in a single invoice.
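
The behavior in this list can be written out as a small event log. The following Python sketch builds such a log and serializes it as CSV. It is a hypothetical illustration: the IDs, timestamps, and the activity name Invoice Payment are invented for this example, and the `Activity` and `StartTime` column names are assumptions.

```python
import csv
import io

COLUMNS = ["RequisitionID", "OrderID", "ReceiptID", "InvoiceID", "Activity", "StartTime"]

# Hypothetical rows; every ID and timestamp is invented for illustration.
ROWS = [
    ("REQ-1", "ORD-1", "", "", "Order Creation", "2024-01-03 10:00:00"),
    ("", "ORD-1", "", "", "Order Release", "2024-01-03 15:00:00"),
    ("", "ORD-1", "GR-1", "", "Goods Receipt", "2024-01-10 08:30:00"),
    ("", "ORD-1", "GR-2", "", "Goods Receipt", "2024-01-12 11:00:00"),
    # One invoice covers both receipts: two records, same timestamp.
    ("", "", "GR-1", "INV-1", "Invoice Registration", "2024-01-15 09:00:00"),
    ("", "", "GR-2", "INV-1", "Invoice Registration", "2024-01-15 09:00:00"),
    ("", "", "", "INV-1", "Invoice Payment", "2024-01-30 16:00:00"),
]


def to_csv(rows=ROWS) -> str:
    """Serialize the sample log as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()
```

Each row populates at most two of the four entity columns, and the single invoice appears once for each goods receipt that it covers.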

Data source generation for Multi-level Process Mining

On the data preparation side, in addition to correctly populating the columns with the respective entityID (ProcessID), you must create the relationship between those entities. To do so, you must identify the bridge activities within the process. A bridge activity represents the creation of an entity and must not include reworks on the same entityID.

For example, in a P2P process, the following are the typical bridge activities:

Order Creation
It represents the creation of the Order entity, which is linked to one Requisition entity. The same OrderID is never created twice and hence no reworks are expected.
Goods Receipt
It represents the registration of a Receipt entity, which is always linked to at least one Order entity. The same ReceiptID is never created twice and hence no reworks are expected.
Invoice Registration
It represents the registration of an Invoice entity, which is linked either to a Receipt or to an Order. The same InvoiceID is never created twice and hence no reworks are expected.

After identifying the bridge activities, you must correctly populate the corresponding records in the data source:

The bridge activity must contain the respective entityID. For example, Invoice registration must include a populated InvoiceID.
Generate one record of the bridge activity for every linked entityID. For example, if an InvoiceID is linked to four different ReceiptIDs, four Invoice registration activities must be created; InvoiceID keeps the same value, whereas ReceiptID is always different.
- If multiple records are generated for the same bridge activity, they must have the same timestamp.
- IBM Process Mining recognizes the bridge activity and manages it with frequency 1, even if the record is repeated.
Never populate more than two entityIDs in the same record. For example, you cannot populate InvoiceID, ReceiptID, and OrderID in the same Invoice registration activity record.
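
The record generation rules above can be sketched as a small Python helper. The function name `bridge_records` and the `Activity` and `StartTime` keys are hypothetical; the sketch only shows the shape of the output, with one record per linked entityID, a shared timestamp, and exactly two populated entity columns.

```python
def bridge_records(activity, entity_col, entity_id, linked_col, linked_ids, timestamp):
    """Build one bridge-activity record per linked entity ID.

    Every record repeats the created entity ID and the timestamp, and
    carries exactly one linked ID, so no record has more than two IDs.
    """
    if not linked_ids:
        raise ValueError("a bridge activity needs at least one linked entity")
    return [
        {
            "Activity": activity,
            entity_col: entity_id,
            linked_col: linked_id,
            "StartTime": timestamp,
        }
        for linked_id in linked_ids
    ]


# Hypothetical example: one invoice linked to four goods receipts.
records = bridge_records(
    "Invoice Registration",
    "InvoiceID", "INV-1",
    "ReceiptID", ["GR-1", "GR-2", "GR-3", "GR-4"],
    "2024-01-15 09:00:00",
)
```

In this example, `records` holds four Invoice Registration entries that share `INV-1` and the same timestamp, each with a different ReceiptID.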

It is important to follow the functional or logical flow of the process when populating the bridge activities. For example, in P2P, you must not populate the InvoiceID in the Order creation bridge activity because the Invoice is supposed to be generated after the Order. If an unexpected flow occurs (for example, invoice activities before order creation), IBM Process Mining handles it automatically.

All the nonbridge activities should contain only the respective ID (no links with other entities). For example, in P2P, the Order release activity refers only to the Order entity.
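
A data source can be checked against these practices before upload. The following Python sketch is a hypothetical validator: the column names, the activity names in `BRIDGE_ACTIVITIES`, and the function `validate` are assumptions for a P2P log and do not reflect how IBM Process Mining validates data internally.

```python
ENTITY_COLUMNS = ("RequisitionID", "OrderID", "ReceiptID", "InvoiceID")

# Assumed bridge activities and the entity that each one creates.
BRIDGE_ACTIVITIES = {
    "Order Creation": "OrderID",
    "Goods Receipt": "ReceiptID",
    "Invoice Registration": "InvoiceID",
}


def validate(records):
    """Return a list of rule violations found in the event records."""
    problems = []
    bridge_times = {}
    for i, rec in enumerate(records):
        populated = [c for c in ENTITY_COLUMNS if rec.get(c)]
        activity = rec["Activity"]
        created = BRIDGE_ACTIVITIES.get(activity)
        if created is None:
            # Nonbridge activities carry only their own entity ID.
            if len(populated) != 1:
                problems.append(f"record {i}: nonbridge activity must carry exactly one entity ID")
            continue
        if not rec.get(created):
            problems.append(f"record {i}: bridge activity is missing {created}")
        if len(populated) > 2:
            problems.append(f"record {i}: more than two entity IDs populated")
        # Repeated records of one bridge activity must share a timestamp.
        first = bridge_times.setdefault((activity, rec.get(created)), rec["StartTime"])
        if first != rec["StartTime"]:
            problems.append(f"record {i}: repeated bridge activity has a different timestamp")
    return problems
```

An empty result means that no violation of the practices listed in this section was detected; it does not guarantee that the upload succeeds.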

Data source mapping

When mapping the ProcessIDs, it is important to follow the functional or logical flow of the process. For example, in P2P, you must map the entities in the following order:

RequisitionID as ProcessID
OrderID as ProcessID2
ReceiptID as ProcessID3
InvoiceID as ProcessID4
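
The ordering above can be derived mechanically from the entity flow. The following Python sketch assigns the role names in sequence; the function name `process_id_roles` is hypothetical, and the role labels follow the list above.

```python
# Entity columns listed in the functional order of the P2P process.
ENTITY_FLOW = ["RequisitionID", "OrderID", "ReceiptID", "InvoiceID"]


def process_id_roles(columns):
    """Map entity columns to ProcessID, ProcessID2, ... in flow order."""
    if len(columns) > 5:
        raise ValueError("at most five columns can be mapped as Process IDs")
    return {
        column: "ProcessID" if index == 0 else f"ProcessID{index + 1}"
        for index, column in enumerate(columns)
    }


roles = process_id_roles(ENTITY_FLOW)
```

Here `roles["RequisitionID"]` is `"ProcessID"` and `roles["InvoiceID"]` is `"ProcessID4"`, which matches the list above.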