Resolving The Problem
The primary function of the Event Manager is to guarantee scheduled execution of code. Note that the Event Manager does not execute the code itself; it schedules the code with the corresponding Process Server. Any work scheduled by a specific Event Manager runs on the local Process Server. The Event Manager scheduler is used any time an undercover agent (UCA) is invoked, but it is also used for processing business process definition (BPD) notifications, executing business process definition system lane activities, and executing business process definition timer events. It is not specific to TWEvents or to undercover agents.
All date-time values in the process database are written by the database and use the database server's clock. All timer-based executions (BPD timers, scheduled UCAs, and task due dates) are triggered and alerted by the process server. The clocks of the Teamworks server and the database need to be in sync or events are not processed properly.
To understand the Event Manager, you must first understand queues. The Event Manager has two types of queues: asynchronous (async) and synchronous (sync). Tasks in async queues are executed as soon as possible, with no guaranteed order. Tasks in sync queues are executed serially: if multiple tasks are set to run on one sync queue, they execute one after the other in the order in which they were put into the queue. The Event Manager treats sync and async queues differently.
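The ordering difference can be sketched in plain Python. This is a toy model, not product code; the class name `SyncQueue` is illustrative:

```python
from collections import deque

class SyncQueue:
    """Toy model: tasks on a sync queue run one at a time, in insertion order."""
    def __init__(self):
        self._tasks = deque()

    def schedule(self, task):
        self._tasks.append(task)

    def drain(self):
        results = []
        while self._tasks:
            results.append(self._tasks.popleft()())  # strictly serial
        return results

# Three tasks placed on one sync queue always run in the order scheduled.
q = SyncQueue()
for name in ("first", "second", "third"):
    q.schedule(lambda n=name: n)
print(q.drain())  # ['first', 'second', 'third']
```

An async queue, by contrast, would hand these tasks to a thread pool, so no such ordering could be assumed.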
Each task in a sync queue must be executed serially. To prevent problems in a cluster, an Event Manager claims ownership of one or more sync queues when it starts up. The ownership is stored in the LSW_UCA_SYNC_QUEUE table, where QUEUE_OWNER is linked to OWNER_ID in LSW_EM_INSTANCE. This is not a permanent assignment. The LSW_EM_INSTANCE table tracks the status of all of the Event Managers; the status is refreshed every heartbeat period, 15 seconds by default. If the owner of a sync queue is no longer available, another Event Manager takes ownership of that sync queue.
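The takeover rule can be sketched as follows. The table and column names echo LSW_UCA_SYNC_QUEUE and LSW_EM_INSTANCE, but the function itself is an illustrative model, not the product's implementation:

```python
# Toy model of sync-queue ownership takeover (illustrative, not product code).
def reassign_expired(owners, heartbeats, now, my_id):
    """Claim every sync queue whose owner's heartbeat expiration has passed."""
    for queue, owner in owners.items():
        if heartbeats.get(owner, 0) < now:  # owner missed its heartbeat window
            owners[queue] = my_id           # this Event Manager takes over
    return owners

owners = {"queue-A": "em-1", "queue-B": "em-2"}  # QUEUE_OWNER -> OWNER_ID
heartbeats = {"em-1": 100.0, "em-2": 250.0}      # heartbeat expiration times
reassign_expired(owners, heartbeats, now=200.0, my_id="em-3")
print(owners)  # queue-A moves to em-3; queue-B keeps its live owner em-2
```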
Async tasks are picked up by each Event Manager whenever there is room in its async queue for more tasks.
Each process server has its own running Event Manager. The Event Manager is configured by each process server's copy of the 80EventManager.xml file:
Explanation of Event Manager Settings
All time settings are in milliseconds.
- If this parameter is set to true, the Event Manager is turned on for this process server instance. If you set this parameter to false, this process server has no Event Manager. Setting this parameter to false also disables the business process definition engine for this instance. This approach allows you to allocate process server instances to different duties.
- If this parameter is set to true, the scheduler for the Event Manager is started in a paused state. The Event Manager scheduler resumes if you specifically tell it to resume from the Event Manager Monitor console page, or if you click Resume All on that page.
Note: Pause/Resume always uses the Java Message Service (JMS) to send the request to the scheduler, even if you are pausing or resuming the server to which you are connected. Pause/Resume is the only piece of the scheduler infrastructure that uses JMS; all other communication is done through the database.
- This parameter is commented out, by default, and the host name is used instead. This parameter is used to populate the LSW_EM_INSTANCE table and names the Event Manager as viewed from the Event Manager monitor. If your host name is not descriptive for you, you can uncomment this parameter and use a name of your choosing.
- These parameters are used to determine which Event Manager instances are up and running. These parameters should not need to be changed.
The heartbeat is a separate thread that constantly updates the lsw_em_instance database table to tell other schedulers that it is alive. The heartbeat runs even if the scheduler itself is paused. The lsw_em_instance table drives the content in the top section of the Event Manager Monitor console. A scheduler whose expiration time is in the past is treated as disconnected. When this situation happens, the other schedulers assume that it is dead and pick up any additional work as necessary. The heartbeat of a non-disabled scheduler updates lsw_em_instance every <heartbeat-period> milliseconds (15 seconds by default), and it sets its expiration to <heartbeat-expiration> milliseconds in the future (60 seconds by default). As a result, if a process server machine is completely unplugged, it takes up to 60 seconds until the other schedulers recognize it as disconnected.
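The 60-second worst case follows directly from the defaults. A minimal arithmetic sketch, using the default values named above:

```python
heartbeat_period = 15_000      # ms, default <heartbeat-period>
heartbeat_expiration = 60_000  # ms, default <heartbeat-expiration>

last_beat = 0                                  # machine is unplugged here
expires_at = last_beat + heartbeat_expiration  # expiration set by that beat
worst_case_detection = expires_at - last_beat
print(worst_case_detection)  # 60000 ms until others see it as disconnected
```

Because each beat pushes the expiration a full window ahead, a machine that dies immediately after a beat stays "alive" on the monitor for the entire expiration window, not just one heartbeat period.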
Note: All schedulers whose heartbeat has ever run are listed in lsw_em_instance and are shown on the Event Manager Monitor page. If a scheduler is disabled, it updates (or inserts) its row at start up, so that Event Manager Monitor users recognize that the machine is there but the scheduler is disabled. The difference between "disconnected", "disabled", and "paused" is visible only by moving your mouse over the red light associated with the scheduler instance on the Event Manager Monitor page. If a particular machine no longer exists and you do not want to see it on the Event Manager Monitor page anymore, you can carefully remove its row from the lsw_em_instance table. The row is re-created if a machine with that name starts up again.
- For every loader long period, the Event Manager looks at each queue (sync and async) that it has access to and fills it to capacity. This sweep is sometimes referred to as a major tick. This setting does not apply if "kick-on-schedule" is true.
- For every loader short period, the Event Manager looks through each of its queues and tries to fill them to capacity. Think of the long period as the sweep that fills the queue and the short period as the sweep that tops up any space that might be left over in the queue. This sweep is sometimes referred to as a minor tick. This setting does not apply if "kick-on-schedule" is true.
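The major tick and minor tick can be modeled as the same fill operation run at two cadences. A toy sketch (the function and capacity value are illustrative):

```python
def fill_to_capacity(queue, pending, capacity):
    """Move pending tasks into the queue until it reaches capacity."""
    while pending and len(queue) < capacity:
        queue.append(pending.pop(0))
    return queue

pending = list(range(10))
queue = fill_to_capacity([], pending, capacity=4)     # major tick: [0, 1, 2, 3]
queue.pop(0); queue.pop(0)                            # two tasks complete
queue = fill_to_capacity(queue, pending, capacity=4)  # minor tick tops it up
print(queue)  # [2, 3, 4, 5]
```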
- For scheduled tasks, this parameter specifies how far in advance the Event Manager looks for tasks.
- This parameter specifies the number of tasks to fill for each sync queue that the Event Manager has acquired.
- This parameter specifies the number of tasks to fill for each async queue that the Event Manager has acquired.
- Business process definitions execute in their own async queue. The business process definition queue is used for timers firing, delivering messages to business process definition instances, and executing system lane tasks. This parameter is the queue depth for that queue.
- The Event Manager has its own internal queue. This parameter is rarely used and should not need to be changed.
- This parameter specifies the minimum number of threads that the Event Manager should use.
- This parameter specifies the maximum number of threads that the Event Manager can use.
The thread pool is not per queue; it is the total number of threads for that Event Manager instance in that particular Process Server Java virtual machine (JVM).
Note: Your total available database connections in the application server connection pool should be at least twice this number. The number of connections on the database server itself must be at least the sum of the maximum connection pool sizes for all nodes in the cluster.
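A worked example of this sizing rule, with illustrative numbers (your thread pool and node counts will differ):

```python
max_thread_pool_size = 20  # illustrative Event Manager maximum per instance
nodes = 3                  # cluster members

pool_per_node = 2 * max_thread_pool_size   # minimum app-server pool size
db_connections = nodes * pool_per_node     # minimum on the database server
print(pool_per_node, db_connections)  # 40 120
```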
- This parameter specifies the maximum number of times to retry a failed task.
- When this parameter is set to true, a newly scheduled task forces the Event Manager into an immediate poll of lsw_em_task, which reduces the time between when a new task is scheduled and when it is executed. This parameter helps with latency (a newly scheduled "right now" task is executed almost immediately) but hurts overall throughput, because the TaskLoader ends up being more active than it would be otherwise. If kick-on-schedule is false, newly scheduled tasks are not picked up until the next time the Event Manager polls lsw_em_task (up to the loader long period), which increases latency. However, it also increases overall throughput by reducing chatter and contention on the lsw_em_task table. For a system with a heavily loaded Event Manager, this parameter should be set to false. The default is false for a Workflow Server and true for a Workflow Center. You should not change this setting.
- This parameter specifies the time between retries for failed tasks.
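Together with the retry count above, the retry delay determines when each attempt fires. A sketch assuming a fixed delay between attempts (the function name and values are illustrative, using the five-retry default and a 60-second delay):

```python
def retry_times(scheduled_at, retry_count, retry_delay):
    """First attempt plus each retry, spaced a fixed retry_delay (ms) apart."""
    return [scheduled_at + i * retry_delay for i in range(retry_count + 1)]

print(retry_times(scheduled_at=0, retry_count=5, retry_delay=60_000))
# [0, 60000, 120000, 180000, 240000, 300000]
```

After the final attempt fails, the task is left on hold rather than retried again.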
- This parameter is disabled by default. If the parameter is enabled, task history is maintained in the lsw_em_task_history table. You can then query this table to get the history of your tasks. Note: The product does not provide a way to display or clean up this data.
- For more information on how to enable this flag, please review "Optional Data Collection":
- This parameter specifies the time interval, in milliseconds, at which the Sync Queue Controller wakes up and checks for sync queue jobs that need to be executed.
- The Event Manager is quick and efficient. Usually, it is the tasks it is executing that slow it down, not the Event Manager itself.
- If you want to throttle the Event Manager, do not decrease the thread pool. Instead, decrease the queue capacity.
- A sync queue can get stuck because it does not advance until the current task either succeeds or fails five times (the default). To make this less of a problem, create multiple sync queues. You can manage sync queues in the Process Admin Console.
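One common way to spread work across several sync queues while keeping per-key ordering is to route each key deterministically to a queue. This is an illustrative pattern only; the queue names are hypothetical and this routing is something you implement in your own scheduling logic, not a product feature:

```python
import zlib

def pick_queue(key, queues):
    """Deterministically route a key to one of several sync queues."""
    return queues[zlib.crc32(key.encode()) % len(queues)]

queues = ["sync-1", "sync-2", "sync-3"]  # hypothetical queue names
for order_id in ("A-100", "A-100", "B-200"):
    # The same key always lands on the same queue, so per-key order holds,
    # while unrelated keys can proceed even if one queue is stuck.
    print(order_id, "->", pick_queue(order_id, queues))
```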
- You can delete on-hold Event Manager tasks via the Process Admin Console, REST API, or wsadmin commands: Deleting on-hold Event Manager tasks
All the time stamps used by the Event Manager scheduler - the heartbeat expirations and the task scheduled times - are interpreted relative to the system clock for the database machine. Thus, the scheduler does not require keeping the process server system clocks in sync. Keeping system clocks in sync is a good idea, however, for date-based tracking data, log analysis, and so on.
Business Process Manager, Business Automation Workflow
03 May 2021