BPM - Instance Cleanup Strategy in Constrained Situations
SmithaVenugopal
Running business processes and services in IBM Business Process Manager (BPM) involves persisting data, such as process instances, tasks, and variables, across multiple tables with different cardinalities in the BPM Process Server schema. This blog assumes Oracle as the database.
Once these processes are completed or terminated, the data is of little use to the functional needs of the business. At any given point in time, a process can be in one of the following states:
As the data grows, performance degrades, leading to a range of issues when working with the database. Unresponsive queries issued by BPM can be monitored and tuned at the database level, which helps for a while. For instance, to fix a lock wait timeout, the following query was tuned by creating a function-based index on the table:
However, reducing the data volume remains the only option when you still see issues such as:
Persisting system tasks in the database is discouraged, so configure them to be deleted automatically on completion.
Nevertheless, housekeeping is a must-have for IBM BPM systems: regular purge activities keep the database lean.
We often find clients whose business data (in the form of variables, instance data, and task data) has been stored in the product database for years, until at some point a leaner database becomes the only way to sustain the system.
Much of their operations depend on this historical data to correlate different aspects of their business over time.
So what do you do when you start facing problems with basic database operations in BPM? The typical constraints are:
A way forward could be:
There are multiple ways to plan the archive policy, depending on how urgent the cleanup is.
Since this blog references IBM BPM 8.5.6 cumulative fix 2, the instance cleanup is done using the BPMP
Strategies to Purge
If the data volume is very high (e.g. 14 million tasks and related process data), it may not be feasible to purge everything in a single run.
You need to plan the initial purge based on the following:
For example, an immediate purge of a data volume that runs into millions of instances or more will take a long time to complete. Hence, it is advisable to perform it during a scheduled outage.
Because this command runs in connected mode, an outage has to be induced: block all inbound channels so that no new requests come in, and black out active instances so that they do not run during the purge.
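To size the outage window, one rough approach is to time a small trial purge on the same environment and extrapolate. The Python sketch below assumes such a measured throughput; the function name, the safety factor, and the sample figures are hypothetical, not product values.

```python
def estimate_outage_hours(total_instances: int,
                          instances_per_minute: float,
                          safety_factor: float = 1.5) -> float:
    """Roughly estimate how long a bulk purge might take, in hours.

    instances_per_minute is an ASSUMED throughput measured by timing a small
    trial purge on the same environment; it is not a product figure.
    safety_factor pads the estimate for slower batches and re-runs.
    """
    if instances_per_minute <= 0:
        raise ValueError("instances_per_minute must be positive")
    minutes = total_instances / instances_per_minute
    return round(minutes * safety_factor / 60, 1)


if __name__ == "__main__":
    # Hypothetical: 2 million instances at an observed 2,000 instances/minute.
    print(estimate_outage_hours(2_000_000, 2_000))
```

The padding is deliberately generous: purge speed tends to vary with database load and table size, so a single trial run is only a starting point.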
To manage the purge activity better, the purge can be done in batches of a manageable size (taking into consideration the SOAP timeout settings specified in the soap.client.props file (/pr
The size can be determined by querying the database for comp
For example, the query below gives a view of the instances completed per day for a particular snapshot:
Select s.acronym, to_d
A sample output is as follows (also attached as QueryResponse.
Based on the output, create a script that purges in multiple commands (batches). For example, if the SOAP connection has been observed to time out after purging 35k instances, batches of fewer than 35k instances can be purged in one command run. A Python script can be written with multiple commands, each purging a chunk of instances selected by criteria such as those described above.
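The batching step can be sketched in Python, assuming the query output has been reduced to one (day, completed-instance count) row per day for a snapshot. The function names and the 30,000 limit (a margin below the 35k timeout mentioned above) are illustrative assumptions; each resulting date window would then drive one purge command.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# (day the instances completed, number of completed instances that day)
DailyCount = Tuple[date, int]
# (first day, day after the last day, total instances in the window)
Batch = Tuple[date, date, int]


def plan_batches(rows: List[DailyCount], max_per_batch: int) -> List[Batch]:
    """Group days (in date order) into windows holding at most max_per_batch
    instances. A single day above the limit still becomes its own window."""
    batches: List[Batch] = []
    start: Optional[date] = None
    last: Optional[date] = None
    total = 0
    for day, count in sorted(rows):
        # Close the current window if adding this day would exceed the limit.
        if start is not None and total + count > max_per_batch:
            batches.append((start, last + timedelta(days=1), total))
            start, total = None, 0
        if start is None:
            start = day
        total += count
        last = day
    if start is not None:
        batches.append((start, last + timedelta(days=1), total))
    return batches


def oversized(batches: List[Batch], max_per_batch: int) -> List[Batch]:
    """Windows that still exceed the limit and need a finer (e.g. hourly) split."""
    return [b for b in batches if b[2] > max_per_batch]


if __name__ == "__main__":
    # Hypothetical per-day counts for one snapshot.
    rows = [
        (date(2017, 1, 1), 20_000),
        (date(2017, 1, 2), 10_000),
        (date(2017, 1, 3), 8_000),
        (date(2017, 1, 5), 40_000),
        (date(2017, 1, 6), 5_000),
    ]
    limit = 30_000  # assumed margin below the observed 35k SOAP timeout
    plan = plan_batches(rows, limit)
    for first, end, n in plan:
        print(f"purge window {first} to {end} (exclusive): {n} instances")
    print("needs finer split:", oversized(plan, limit))
```

Days missing from the query output simply fall inside a window with no instances to purge. A day whose own count exceeds the limit comes back as an oversized window and would need a finer split before it is purged.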
One such command could look like the one below:
The above command deletes 33,016 completed instances, split into transactions of 1,000 instances each.
If the transaction timeout value permits, the transaction slice value can be increased further.
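The transaction slice arithmetic is a ceiling division. For the 33,016 instances above with a slice of 1,000, that is 33 full transactions plus one of 16 instances, 34 commits in total. The helpers below (hypothetical names) show how the count changes with the slice size: a larger slice means fewer, longer transactions, each of which must finish within the transaction timeout.

```python
def transaction_count(instances: int, transaction_slice: int) -> int:
    """Number of transactions needed to purge `instances` in slices of
    `transaction_slice` (ceiling division using integers only)."""
    if transaction_slice <= 0:
        raise ValueError("transaction_slice must be positive")
    return -(-instances // transaction_slice)


def last_transaction_size(instances: int, transaction_slice: int) -> int:
    """Instances handled by the final (possibly partial) transaction."""
    remainder = instances % transaction_slice
    return remainder if remainder else min(instances, transaction_slice)


if __name__ == "__main__":
    for slice_size in (100, 1_000, 5_000):
        print(slice_size, transaction_count(33_016, slice_size))
```

With 33,016 instances this prints 331, 34 and 7 transactions for slices of 100, 1,000 and 5,000 respectively.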
Once the initial purge is done, smaller numbers of instances can be purged online when business activity is at its lowest.
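Choosing the quietest window can be done from hourly activity figures collected by your own monitoring (for example, tasks created per hour; the metric is an assumption). This hypothetical helper returns the start hour of the lowest-activity window of a given length, wrapping around midnight.

```python
from typing import Optional, Sequence


def quietest_window(hourly_activity: Sequence[int], hours: int) -> int:
    """Return the starting hour (0-23) of the `hours`-long window with the
    lowest total activity; windows may wrap past midnight."""
    if len(hourly_activity) != 24:
        raise ValueError("expected 24 hourly values")
    if not 1 <= hours <= 24:
        raise ValueError("hours must be between 1 and 24")
    best_start, best_total = 0, None  # type: int, Optional[int]
    for start in range(24):
        # Sum the activity for this window, wrapping with modulo 24.
        total = sum(hourly_activity[(start + i) % 24] for i in range(hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start
```

Ties resolve to the earliest start hour; a longer history (several weeks averaged per hour) gives a more reliable picture than a single day.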
If the system is never completely idle, or if there is database-intensive activity (e.g. searching process and task data), it is wise to reduce the transaction slice to a small number so that database locks are held only briefly, which helps avoid deadlocks and lock wait timeouts.
Note: Each time a purge is run, monitor the database along with the BPM servers to catch issues early.
After a purge, it is critical to re-index the process database, especially when a large number of instances has been deleted. The key tables could include the following:
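As a sketch, assuming Oracle and that the relevant index names have already been listed (e.g. from the ALL_INDEXES data dictionary view), the re-index statements can be generated per table. The schema, table, and index names are placeholders; `ALTER INDEX ... REBUILD ONLINE` and `DBMS_STATS.GATHER_TABLE_STATS` are standard Oracle features, though online rebuild availability depends on the database edition. Refreshing optimizer statistics after a large delete is a common companion step, included here as an assumption rather than a BPM requirement.

```python
import re
from typing import Dict, List

# Conservative check for unquoted Oracle identifiers, to avoid building
# statements from unexpected input.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]{0,127}$")


def _check(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name.upper()


def reindex_statements(schema: str,
                       indexes_by_table: Dict[str, List[str]],
                       online: bool = True) -> List[str]:
    """Build index-rebuild statements per table, followed by a statistics
    refresh for that table. Names are placeholders supplied by the caller."""
    owner = _check(schema)
    suffix = " ONLINE" if online else ""
    stmts: List[str] = []
    for table, indexes in indexes_by_table.items():
        tab = _check(table)
        for idx in indexes:
            stmts.append(f"ALTER INDEX {owner}.{_check(idx)} REBUILD{suffix}")
        stmts.append(
            f"BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => '{owner}', "
            f"tabname => '{tab}', cascade => TRUE); END;"
        )
    return stmts


if __name__ == "__main__":
    # Placeholder names only; substitute the tables and indexes from your schema.
    for stmt in reindex_statements("bpmdb", {"example_table": ["example_idx1"]}):
        print(stmt)
```

The statements are only generated here; running them (for example, through SQL*Plus or a database driver) is left to the DBA, preferably outside peak hours.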
If incremental replication is not possible, an archive strategy has to be worked out together with the purge strategy before subsequent purges are planned. Factors influencing the frequency of archival could include:
For a healthy BPM environment, it is important to perform cleanup in other areas as well (e.g. Performance Data Warehouse (PDW) cleanup, snapshot cleanup, cleanup of durable messages, and Event Manager (EM) on-hold tasks), as described in the article Purging data in IBM Business Process Manager.
A note on housekeeping: the latest versions of IBM BPM include many improvements in this area that can ease operations activities, including snapshot-based archiving.
Besides housekeeping, it is also important to monitor and tune database performance alongside IBM BPM.
For more information, refer to:
Note: Archiving before cleanup should only be a stopgap arrangement. It is not foolproof in terms of being able to read all the data (as reading is done through the BPM runtime), because operations teams depend on REST services or other means provided by BPM to read execution data. Hence, the application itself must take care of such visibility requirements outside the product database.