Creating a Python logic file for process apps

To create a custom process app, you must create and upload a Python logic file. The Python logic file extracts and transforms your data, which becomes the source of data for your process mining project.

Complete the following steps to create a Python logic file:

  1. Download and open the Python logic file template.
  2. Specify the entry point.
  3. Specify configurable variables.
  4. Optional: Set a schedule setting for data refresh options.
  5. Specify the output data format.
  6. Optional: Handle exceptions in the Python code.

Specifying the entry point

The entry point, or entry function, is where the data transformation process starts. Without a defined entry point in the logic file, the data transformation fails before it can begin.

This is the entry point in the Python logic file template:

def execute(context):
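
For illustration, a minimal logic file built around this entry point might look like the following sketch. The placeholder event data is an assumption; a real logic file extracts and transforms your source data:

import pandas as pd

def execute(context):
   # Entry point that the process app runtime calls with a context dictionary.
   # Hypothetical placeholder data; a real logic file extracts and transforms
   # event data from your source system.
   events = [{"case_id": "C1", "activity": "Create order", "timestamp": "2024-01-02T09:00:00.000"}]
   return pd.DataFrame(events)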

Specifying configurable variables

Configurable variables parameterize the data transformation process. They include dynamic values such as a token key, a server or system URL, a username, a password, and other dynamic input fields.

If you specify these variables, users can later configure the event log data when they use your process app. For more information, see Defining user inputs.

Starting from Process Mining 1.14.3, retrieving configurable variables as environment variables is no longer supported.

You can specify configurable variables in a custom process app by using the config object from the context object. The configuration variables are available as a Python dictionary that you can retrieve by using the context["config"] syntax. In the following example, the execute function receives the context parameter:

def execute(context):
   config = context["config"]
   # Example: retrieving an input property named 'URL'
   url = config["URL"]
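
Following the same pattern, you can retrieve any input field that you define. The following sketch assumes hypothetical username and password fields and checks that they were provided:

def execute(context):
   config = context["config"]
   # Hypothetical input fields; use the names that you define for your process app
   username = config.get("username")
   password = config.get("password")
   if not username or not password:
      # Handle missing inputs, for example by raising an exception
      ...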

If the raw data source that the user of your process app uploads is in .CSV format, you can configure the Python logic file to transform that data into the source for your process mining project. Before the user uploads the data, they must compress the .CSV file into a .zip file. The Python logic file then accesses the .zip file in the same directory location to perform the necessary data transformation.

The logic file can also retrieve the name of the uploaded .zip file from the context JSON that is passed into the execute function of the logic file. The key for the uploaded file name is fileUploadName, and the value is the name of the uploaded .zip file. For example, if the name of the uploaded .zip file is "Input_sample.zip", the logic file can retrieve the name of the .zip file as follows:

def execute(context):
   # Example: retrieving the name of an uploaded zip file
   myFileUploadName = context["fileUploadName"]
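
For example, a sketch that extracts the uploaded archive and loads its contents with pandas might look as follows. It assumes that the .zip file is available in the current working directory and contains a single .CSV file:

import zipfile
import pandas as pd

def execute(context):
   file_upload_name = context["fileUploadName"]
   with zipfile.ZipFile(file_upload_name) as archive:
      # Assumption: the archive contains exactly one .CSV file
      csv_name = archive.namelist()[0]
      archive.extractall()
   return pd.read_csv(csv_name)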

Optional: Setting a schedule for data refresh options

Users of your custom process app have two scheduling options to manage updates to the event log data of their process:

  • The replacement method, where the existing data source gets replaced. The previous project data source is disregarded. The new data source is processed based on your requirements: you can either choose how frequently your data is replaced or use a calendar to set a preferred timing for data replacement.

    If you only want to use the replacement method, you do not have to make any additional changes in your Python logic file.

  • The incremental update method, where new data is added on top of the existing project data source. Both the previous and the new data are used in your project.

    If you want to use the incremental update method, you must configure the associated Python logic file to fetch new data from an external source on a schedule. To do that, use two key variables, lastScheduleExecutionDate and lastScheduleExecutionTimezone, which are provided to the Python logic file.

    The lastScheduleExecutionDate and lastScheduleExecutionTimezone variables are essential for logical operations within the Python script, allowing for the incremental fetching of new event log data based on the last successful schedule execution date and time zone.

    You can access the lastScheduleExecutionDate and lastScheduleExecutionTimezone variables through the context dictionary, which is passed to the Python logic through the execute function (a sketch of an incremental fetch follows the snippet):

    last_schedule_execution_date = context["lastSuccessfulExecution"]["lastScheduleExecutionDate"]
    last_schedule_execution_timezone = context["lastSuccessfulExecution"]["lastScheduleExecutionTimezone"]
    ...
    # Subsequent code using last_schedule_execution_date and last_schedule_execution_timezone
    if last_schedule_execution_date and last_schedule_execution_timezone:
       # Logic using the last scheduled execution date and time zone
       ...
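
    For illustration, an incremental fetch might use these values as in the following sketch; the fetch_events_since and fetch_all_events helpers are hypothetical placeholders for calls to your external source:

    from datetime import datetime
    import zoneinfo

    def execute(context):
       last_date = context["lastSuccessfulExecution"]["lastScheduleExecutionDate"]
       last_tz = context["lastSuccessfulExecution"]["lastScheduleExecutionTimezone"]
       if last_date and last_tz:
          # Fetch only the events that are newer than the last successful execution
          since = datetime.fromisoformat(last_date).astimezone(zoneinfo.ZoneInfo(last_tz))
          events = fetch_events_since(since)  # hypothetical helper
       else:
          # First ETL run: no previous execution exists, so fetch everything
          events = fetch_all_events()  # hypothetical helper
       ...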

Parsing the last_schedule_execution date from its ISO 8601 format

The last scheduled execution date and time (that is, last_schedule_execution_date) is an ISO 8601 string (yyyy-MM-dd'T'HH:mm:ss.SSS). After retrieving the date and time zone as shown in the preceding example, you can parse the ISO 8601 date string in the following way:

from datetime import datetime
import zoneinfo

...

timezone = zoneinfo.ZoneInfo(last_schedule_execution_timezone)

# Parse the ISO 8601 formatted date string
date = datetime.fromisoformat(last_schedule_execution_date).astimezone(timezone)

For time zone processing, it is recommended to use the Python zoneinfo module.

If the logic file is intended to perform incremental updates, declare these variables globally in the following way:


LAST_SCHEDULE_EXECUTION_DATE = None
LAST_SCHEDULE_EXECUTION_TIMEZONE = None

...

def execute(context):
   # Declare the names as global so that the assignments update the module-level variables
   global LAST_SCHEDULE_EXECUTION_DATE, LAST_SCHEDULE_EXECUTION_TIMEZONE
   LAST_SCHEDULE_EXECUTION_DATE = context["lastSuccessfulExecution"]["lastScheduleExecutionDate"]
   LAST_SCHEDULE_EXECUTION_TIMEZONE = context["lastSuccessfulExecution"]["lastScheduleExecutionTimezone"]

When the Python service calls the execute function, it passes in the context object, and as a result both LAST_SCHEDULE_EXECUTION_DATE and LAST_SCHEDULE_EXECUTION_TIMEZONE are set.

For the first ETL run, the values of LAST_SCHEDULE_EXECUTION_DATE and LAST_SCHEDULE_EXECUTION_TIMEZONE are None. Wherever you use these variables in your logic, account for this case: the first ETL run proceeds without the values, so guard them with a check such as the following:

if LAST_SCHEDULE_EXECUTION_DATE and LAST_SCHEDULE_EXECUTION_TIMEZONE:
   timezone = zoneinfo.ZoneInfo(LAST_SCHEDULE_EXECUTION_TIMEZONE)
   date = datetime.fromisoformat(LAST_SCHEDULE_EXECUTION_DATE).astimezone(timezone)

If you send this date to a remote API, the preceding parsing assumes that the ISO 8601 format is what the remote API expects. If it is not, convert the parsed date to the format that the remote API requires.
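
For example, if a hypothetical remote API expects dates such as 2024/01/31, you can reformat the parsed date with strftime:

# Hypothetical target format; replace it with the format that your remote API expects
api_date = date.strftime("%Y/%m/%d")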

Specifying the output data format

The entry point, which you define in the logic file, returns the output of the data transformation process. The output must be a DataFrame object, a data structure for storing and manipulating data in Python, created with either pandas or Polars, the two supported DataFrame libraries. In the following example, you configure a pandas DataFrame object:

   import pandas as pd

   def output(event_list):
      return pd.DataFrame(event_list)

   def execute(context):
      # extract_and_transform is a placeholder for your own transformation logic
      event_list = extract_and_transform(context)
      return output(event_list)

In the example, adapt the functions to the requirements of your logic file.
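
If you use Polars instead of pandas, an equivalent sketch returns a Polars DataFrame:

   import polars as pl

   def output(event_list):
      return pl.DataFrame(event_list)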

Handling exceptions in the Python code

To inform the process app user about invalid credentials or a lost connection to an external system, you can configure the Python code to raise a dedicated exception, named ProcessAppException, that is shown in the process app user interface. The following code snippet gives an example of raising ProcessAppException:

   from process_app import ProcessAppException 
   def execute(context): 
      # Example raising a ProcessAppException 
      raise ProcessAppException("cannot connect to the system")
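
In practice, you typically raise the exception when an operation fails. The following sketch assumes that the requests library is used to call an external system and that a URL input field is defined:

   import requests
   from process_app import ProcessAppException

   def execute(context):
      try:
         response = requests.get(context["config"]["URL"], timeout=30)
         response.raise_for_status()
      except requests.RequestException:
         # Surface a readable message in the process app user interface
         raise ProcessAppException("cannot connect to the system")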


Next step

Creating custom process apps