Creating a Python logic file for process apps
To create a custom process app, you must create and upload a Python logic file. The Python logic file extracts and transforms your data, which becomes the source of data for your process mining project.
Complete the following steps to create a Python logic file:
Specifying the entry point
The entry point, or entry function, serves as the entry to the data transformation process. Without a defined entry point in the logic file, the data transformation fails before it can begin.
This is the entry point in the Python logic file template:
def execute(context):
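As a minimal sketch of such an entry function, the following fills in a placeholder body (the column names and hard-coded rows are illustrative only; a real app would extract data from an external system or an uploaded file):

```python
import pandas as pd

def execute(context):
    # Placeholder event log; a real process app would extract
    # and transform rows from its actual data source.
    events = [
        {"case_id": "C1", "activity": "Create order", "timestamp": "2024-01-01T10:00:00"},
        {"case_id": "C1", "activity": "Approve order", "timestamp": "2024-01-02T09:30:00"},
    ]
    # The entry function must return a DataFrame; see
    # "Specifying the output data format" later in this topic.
    return pd.DataFrame(events)
```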
Specifying configurable variables
Configurable variables parameterize the data transformation process. They include dynamic inputs such as a token key, a server or system URL, a username, a password, and other dynamic input fields.
If you specify these variables, users can later configure the event log data when they use your process app. For more information, see Defining user inputs.
Starting from Process Mining 1.14.3, retrieving configurable variables as environment variables is no longer supported.
You can specify configurable variables in a custom process app by using the config object from the context object. The configuration variables are available as a Python dictionary that you can retrieve by using the context["config"] syntax.
In the following example, you define the execute function with a context parameter:
def execute(context):
    config = context["config"]
    # example retrieving an input property named 'URL'
    url = config["URL"]
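Building on that pattern, a sketch of an entry function that reads several configuration values ('URL', 'username', and 'password' are hypothetical input properties that this example assumes were defined for the process app):

```python
import pandas as pd

def execute(context):
    config = context["config"]
    # 'URL', 'username', and 'password' are hypothetical input
    # properties; use the names you defined for your app.
    url = config["URL"]
    username = config["username"]
    password = config["password"]
    # A real app would authenticate against 'url' with these
    # credentials; this sketch only echoes the configuration.
    return pd.DataFrame([{"source": url, "user": username}])
```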
If the raw data source that the user of your process app uploads is in .CSV format, you can configure the Python logic file to transform the data into the source for your process mining project. Before the user uploads the data, they must compress the .CSV file into a .zip file. The Python logic file then accesses the .zip file in the same directory location to perform the necessary data transformation.
The logic file can also retrieve the name of the uploaded .zip file from the context JSON that is passed into the execute function of the logic file. The name of the key for the uploaded file name is fileUploadName, and the value is the name of the uploaded .zip file. For example, if the name of the uploaded .zip file is "Input_sample.zip", the logic file can retrieve the name of the .zip file by using the following command:
def execute(context):
    # Example retrieving the name of an uploaded zip file
    myFileUploadName = context["fileUploadName"]
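The retrieved name can then be used to open the archive and load the data. The following is a sketch under the assumption that the archive contains a single .CSV file; the helper logic and file layout are illustrative, not prescribed by the platform:

```python
import zipfile
import pandas as pd

def execute(context):
    # The platform passes the name of the uploaded archive in
    # the context; the archive sits in the working directory.
    zip_name = context["fileUploadName"]
    with zipfile.ZipFile(zip_name) as archive:
        # Assumption: the archive contains exactly one CSV file.
        csv_name = next(n for n in archive.namelist() if n.endswith(".csv"))
        with archive.open(csv_name) as f:
            events = pd.read_csv(f)
    return events
```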
Optional: Setting a schedule for data refresh options
As a custom process app user, you have two scheduling options to manage updates to the event log data of your process:
- The replacement method, where the existing data source is replaced. The previous project data source is disregarded. The new data source is processed based on your requirements: you can either choose how frequently your data is replaced or use a calendar to set a preferred timing for data replacement.
If you only want to use the replacement method, you do not have to make any additional changes in your Python logic file.
- The incremental update method, where new data is added on top of the existing project data source. Both the previous and the new data are used in your project.
If you want to use the incremental update method, you must set the associated Python logic file to generate a schedule to fetch new data from an external source. To do that, specify two key variables, lastScheduleExecutionDate and lastScheduleExecutionTimezone, which are provided in the Python logic file.
The lastScheduleExecutionDate and lastScheduleExecutionTimezone variables are essential for logical operations within the Python script, allowing for the incremental fetching of new event log data based on the date and time zone of the last successful schedule execution.
You can access the lastScheduleExecutionDate and lastScheduleExecutionTimezone variables through the context object. These variables are accessible within the Python logic file by using the context dictionary, which is passed to the Python logic through the execute function:
last_schedule_execution_date = context["lastSuccessfulExecution"]["lastScheduleExecutionDate"]
last_schedule_execution_timezone = context["lastSuccessfulExecution"]["lastScheduleExecutionTimezone"]
...
# Subsequent code using last_schedule_execution_date and last_schedule_execution_timezone
if last_schedule_execution_date and last_schedule_execution_timezone:
    # Logic using the last scheduled execution date and timezone
Parsing the last_schedule_execution data into ISO 8601 format
The last scheduled execution date and time (that is, last_schedule_execution_date) are in ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSS). After retrieving the date and time zone as shown in the preceding example, you can parse the ISO 8601 formatted last_schedule_execution string in the following way:
from datetime import datetime
import zoneinfo
...
timezone = zoneinfo.ZoneInfo(last_schedule_execution_timezone)
# Parse the ISO 8601 formatted date string
date = datetime.fromisoformat(last_schedule_execution_date).astimezone(timezone)
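For illustration, the same parsing applied to sample values (both the date string and the time zone name below are made up, but follow the format the platform provides):

```python
from datetime import datetime
import zoneinfo

# Sample values in the format the platform provides; both are made up.
last_schedule_execution_date = "2024-03-15T08:30:00.000"
last_schedule_execution_timezone = "America/New_York"

timezone = zoneinfo.ZoneInfo(last_schedule_execution_timezone)
# astimezone() interprets a naive datetime as local time before
# converting it to the target time zone.
date = datetime.fromisoformat(last_schedule_execution_date).astimezone(timezone)
```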
The preceding example uses the zoneinfo module. If the logic file is intended to perform incremental updates, declare these variables globally in the following way:
LAST_SCHEDULE_EXECUTION_DATE = None
LAST_SCHEDULE_EXECUTION_TIMEZONE = None
...
def execute(context):
    global LAST_SCHEDULE_EXECUTION_DATE, LAST_SCHEDULE_EXECUTION_TIMEZONE
    LAST_SCHEDULE_EXECUTION_DATE = context["lastSuccessfulExecution"]["lastScheduleExecutionDate"]
    LAST_SCHEDULE_EXECUTION_TIMEZONE = context["lastSuccessfulExecution"]["lastScheduleExecutionTimezone"]
When the Python service calls the execute function, it passes the context, and as a result both LAST_SCHEDULE_EXECUTION_DATE and LAST_SCHEDULE_EXECUTION_TIMEZONE are set.
For the first ETL run, the LAST_SCHEDULE_EXECUTION_DATE and LAST_SCHEDULE_EXECUTION_TIMEZONE values are None. Therefore, wherever you use these variables in your logic, the first ETL run executes without the values, and you can guard the logic with a check as follows:
if LAST_SCHEDULE_EXECUTION_DATE and LAST_SCHEDULE_EXECUTION_TIMEZONE:
    timezone = zoneinfo.ZoneInfo(LAST_SCHEDULE_EXECUTION_TIMEZONE)
    date = datetime.fromisoformat(LAST_SCHEDULE_EXECUTION_DATE).astimezone(timezone)
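Putting these pieces together, the following is a sketch of an incremental-update logic file. The fetch_events helper and its hard-coded rows are assumptions for illustration; a real app would query an external system and filter by the last execution date:

```python
from datetime import datetime
import zoneinfo
import pandas as pd

LAST_SCHEDULE_EXECUTION_DATE = None
LAST_SCHEDULE_EXECUTION_TIMEZONE = None

def fetch_events(since=None):
    # Simulated event source; a real app would query an external
    # system and filter server-side by 'since' when it is set.
    events = [
        {"case_id": "C1", "activity": "Start", "timestamp": datetime(2024, 1, 1, 9, 0)},
        {"case_id": "C2", "activity": "Start", "timestamp": datetime(2024, 6, 1, 9, 0)},
    ]
    if since is not None:
        events = [e for e in events
                  if e["timestamp"].replace(tzinfo=since.tzinfo) > since]
    return events

def execute(context):
    global LAST_SCHEDULE_EXECUTION_DATE, LAST_SCHEDULE_EXECUTION_TIMEZONE
    last = context.get("lastSuccessfulExecution") or {}
    LAST_SCHEDULE_EXECUTION_DATE = last.get("lastScheduleExecutionDate")
    LAST_SCHEDULE_EXECUTION_TIMEZONE = last.get("lastScheduleExecutionTimezone")

    since = None
    # First ETL run: both values are None, so all data is fetched.
    if LAST_SCHEDULE_EXECUTION_DATE and LAST_SCHEDULE_EXECUTION_TIMEZONE:
        tz = zoneinfo.ZoneInfo(LAST_SCHEDULE_EXECUTION_TIMEZONE)
        since = datetime.fromisoformat(LAST_SCHEDULE_EXECUTION_DATE).astimezone(tz)
    return pd.DataFrame(fetch_events(since))
```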
Specifying the output data format
The entry point, which you define in the logic file, returns the output of the data transformation process. The output must be a DataFrame object, a data structure for manipulating and storing tabular data in Python. The output of the data transformation process in the logic file must be either a pandas or a Polars DataFrame, the two supported DataFrame modules. In the following example, you configure a pandas DataFrame object:
import pandas as pd

def output(event_list):
    return pd.DataFrame(event_list)

def execute(context):
    # extract_and_transform is defined elsewhere in your logic file
    event_list = extract_and_transform(context)
    return output(event_list)
Handling exceptions in the Python code
To inform the process app user about invalid credentials or a lost connection to an external system, you can configure the Python code to raise a dedicated exception named ProcessAppException, whose message is displayed in the process app user interface.
The following code snippet gives an example of
ProcessAppException configuration:
from process_app import ProcessAppException

def execute(context):
    # Example raising a ProcessAppException
    raise ProcessAppException("cannot connect to the system")
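A slightly fuller sketch follows. The process_app module exists only inside the process-app runtime, so this sketch falls back to a local stand-in class so that it also runs elsewhere; the check_credentials helper and its rule are assumptions for illustration:

```python
try:
    from process_app import ProcessAppException  # available in the process-app runtime
except ImportError:
    # Local stand-in so the sketch runs outside the platform
    class ProcessAppException(Exception):
        pass

def check_credentials(config):
    # Hypothetical validation: surface a clear message in the
    # process app UI when credentials are missing.
    if not config.get("username") or not config.get("password"):
        raise ProcessAppException("invalid credentials: username and password are required")
```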