Fixes are available
APAR status
Closed as program error.
Error description
The customer experienced the following exceptions, which caused missed task timer executions and stuck instances:

[8/31/13 4:01:08:261 EDT] 000068dc wle_ucaexcept E CWLLG0203E: Undercover Agent job failed. Task 22289533 job details are: class=com.lombardisoftware.bpd.runtime.engine.quartz.DbNotificationBpdTask parameters=[605640;14267244] Error: UOWManager transaction processing failed; nested exception is com.ibm.wsspi.uow.UOWException: javax.transaction.RollbackException

[8/31/13 4:01:08:453 EDT] 000068e0 XATransaction E J2CA0027E: An exception occurred while invoking prepare on an XA Resource Adapter from DataSource jms/com.ibm.lombardi/EventEmissionQueueFactory, within transaction ID {XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54), data(00000140d363c82b00000001354b872ca132aa12e00dbc22cb3da98070acb91ff5ce17fb00000140d363c82b00000001354b872ca132aa12e00dbc22cb3da98070acb91ff5ce17fb000000010000000000000000000000000002)} : javax.transaction.xa.XAException: CWSIC8007E: An exception was caught from the remote server with Probe Id 3-013-0010. Exception: CWSIC2029E: This transaction cannot commit as an operation that was performed within the transaction boundary failed. The first operation that failed generated the following exception: com.ibm.ws.sib.processor.exceptions.SIMPLimitExceededException: CWSIK0025E: The destination LombardiEventEmitterInputQueue on messaging engine prod.Messaging.000-MONITOR.ProcessServerCell01.Bus is not available because the high limit for the number of messages for this destination has already been reached...
Local fix
n/a
Problem summary
The Event Manager schedules and drives the execution of work in the Process Server or Process Center, such as the invocation of Undercover Agents (UCAs), the execution of Business Process Definitions (BPDs), the invocation of BPD system task implementations, and BPD timer events. This work is represented by Event Manager tasks, which are the jobs that the Event Manager is responsible for scheduling.

In exceptional situations, such as a "queue full" condition on the monitor event queue, these jobs are re-executed in an attempt to overcome the situation. The re-execute-limit specified in 80EventManager.xml (or configured in 100Custom.xml) determines the number of retries. Once that limit is reached, the respective Event Manager task is no longer retried. As a result, a BPD instance can stop navigating: it hangs.

PROBLEM DETAILED DESCRIPTION: The problem was observed when monitoring was enabled and the Process Server produced monitor events faster than they could be processed. The monitor event queue therefore filled up, which resulted in a "queue full" condition. Consequently, the BPD transactions that tried to emit those monitor events failed and were rolled back. The respective Event Manager tasks were retried until they reached the re-execute-limit of the Event Manager. Once that limit was reached, the corresponding BPD instances were no longer navigated. As a result, many hanging process instances were observed that could only be recovered via the move token feature.
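The retry behaviour described above can be illustrated with a toy model. This is only a sketch and not Event Manager code: the names run_with_retry_limit and QueueFullJob are hypothetical, and it assumes that the re-execute-limit counts retries in addition to the first execution.

```python
class QueueFullJob:
    """Hypothetical job that fails a fixed number of times (queue full)."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success

    def __call__(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("queue full")


def run_with_retry_limit(job, re_execute_limit):
    """Run job once, then retry up to re_execute_limit times on failure.

    Returns a (status, attempts) tuple. Assumption: the limit counts
    retries only, so the job runs at most re_execute_limit + 1 times.
    """
    attempts = 0
    while attempts <= re_execute_limit:
        attempts += 1
        try:
            job()
            return ("completed", attempts)
        except RuntimeError:
            pass  # the exceptional situation persists; try again
    # Limit reached: without the fix the job is never scheduled again.
    return ("abandoned", attempts)
```

In this model, a job whose failure clears within the limit completes, while a job that keeps failing (for example because the queue stays full) ends up abandoned, which corresponds to the hanging BPD instance described above.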
Problem conclusion
The problem is solved by enhancing the Event Manager so that Event Manager tasks that were retried until they reached the re-execute-limit can be resumed by administrative means once the exceptional situation is resolved.

With this code fix in place, when a work item has reached the maximum number of retries or the queue is full, the item is listed on the Event Manager console page and marked with a scheduled execution date of "2099-02-01". Depending on the locale, the date may also appear as "1/2/99".

* With this APAR, upon reaching the re-execute-limit, the respective Event Manager task is put on hold.
* In addition, a task is created and assigned to the EM administrator (as specified via notify-error in 80EventManager.xml). Note: The system can be configured so that the EM administrator is notified about such tasks by e-mail and thus learns about the exceptional situation.
* To resume Event Manager tasks that are on hold, an administrative command, BPMReplayOnHoldEMTasks, is provided. It replays such Event Manager tasks so that they can be scheduled by the Event Manager again.

Parameters:

getNumberOfTasks - retrieves the number of Event Manager tasks that are on hold. Set this parameter to true if the BPMReplayOnHoldEMTasks command should retrieve the number of Event Manager tasks available for replay.
maxNumberOfTasksToReplay - replays on-hold Event Manager tasks up to the specified maximum number. Use this parameter to set an upper limit for the number of on-hold Event Manager tasks to be replayed.
bpdInstanceId - replays on-hold Event Manager tasks for the specified BPD instance. Specifies the BPD instance for which on-hold Event Manager tasks should be replayed.

Note that the parameters are mutually exclusive.

To invoke BPMReplayOnHoldEMTasks, you must start wsadmin and connect it to the Process Server or Process Center.
For example:

wsadmin -conntype SOAP -port 4080 -host PC1.mycompany.com -user admin -password admin -lang jython

Examples

Query the number of available on-hold Event Manager tasks in the system:
wsadmin>AdminTask.BPMReplayOnHoldEMTasks('[-getNumberOfTasks true]')
'The BPMReplayOnHoldEMTasks command found 20 on hold Event Manager Task(s) ready for replay.'

Replay 13 on-hold Event Manager tasks:
wsadmin>AdminTask.BPMReplayOnHoldEMTasks('[-maxNumberOfTasksToReplay 13]')
'The BPMReplayOnHoldEMTasks command replayed 13 on hold Event Manager Task(s).'

Replay on-hold Event Manager tasks for BPD instance 49:
wsadmin>AdminTask.BPMReplayOnHoldEMTasks('[-bpdInstanceId 49]')
'The BPMReplayOnHoldEMTasks command replayed 1 on hold Event Manager Task(s).'

Replay all on-hold Event Manager tasks:
wsadmin>AdminTask.BPMReplayOnHoldEMTasks()
'The BPMReplayOnHoldEMTasks command replayed 20 on hold Event Manager Task(s).'

Notes:
- Before replaying on-hold Event Manager tasks, analyse the root cause that led to the on-hold Event Manager tasks. Replay on-hold Event Manager tasks only after the root cause is identified and resolved.
- When an Event Manager task is replayed, the associated notification task for the administrator is deleted.
- If there is a large number of on-hold Event Manager tasks in the system, do not replay all Event Manager tasks at once. Start by replaying a chunk of 100 Event Manager tasks. Then replay a larger chunk. As long as the performance is satisfactory, keep increasing the chunk size until all on-hold Event Manager tasks are replayed. Note that replaying too many on-hold Event Manager tasks in one chunk can create a lot of load on the system. To cope with this load, the system has to be tuned carefully. It is recommended to replay on-hold Event Manager tasks during times of low system load.
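The chunked replay recommended in the notes could be scripted along the following lines. This is a minimal sketch, not part of the fix: the helper names, the doubling of the chunk size, the max_chunk cap and the keep_going hook are all assumptions, and the message parsing relies only on the sample outputs shown above. The command is passed in as a callable so the logic can be tried outside wsadmin.

```python
import re


def parse_task_count(message):
    """Extract the task count from a BPMReplayOnHoldEMTasks result message."""
    match = re.search(r"(\d+) on hold Event Manager Task", message)
    if match is None:
        raise ValueError("unexpected message: %r" % message)
    return int(match.group(1))


def replay_in_chunks(run_command, first_chunk=100, growth=2,
                     max_chunk=1000, keep_going=None):
    """Replay on-hold tasks in growing chunks; return the number replayed.

    run_command: callable taking the wsadmin argument string and returning
        the result message (inside wsadmin, AdminTask.BPMReplayOnHoldEMTasks).
    keep_going: optional hypothetical hook called after each chunk with
        (replayed_total, remaining); return a false value to stop early,
        for example when the system load becomes too high.
    """
    remaining = parse_task_count(run_command('[-getNumberOfTasks true]'))
    chunk = first_chunk
    replayed_total = 0
    while remaining > 0:
        size = min(chunk, remaining)
        message = run_command('[-maxNumberOfTasksToReplay %d]' % size)
        replayed = parse_task_count(message)
        if replayed == 0:
            break  # nothing was replayed; avoid looping forever
        replayed_total += replayed
        remaining -= replayed
        if keep_going is not None and not keep_going(replayed_total, remaining):
            break
        # Assumption: double the chunk size each round, up to max_chunk.
        chunk = min(chunk * growth, max_chunk)
    return replayed_total
```

Inside a wsadmin Jython session, the call would presumably look like replay_in_chunks(AdminTask.BPMReplayOnHoldEMTasks). The sketch does not measure performance itself; the keep_going hook is where an administrator's own load check would go.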
FIX AVAILABILITY: An iFix for 8.0.1.1 is available on Fix Central; search for APAR JR47860 at http://www.ibm.com/support/fixcentral/

The fix is also targeted for inclusion in the next fix pack for BPM 8.0.1 and BPM 8.5.0.

When obtaining any of the above fixes, be sure to download the accompanying readme for the fix and for any prerequisite fixes, and review them thoroughly.
Temporary fix
Comments
APAR Information
APAR number
JR47860
Reported component name
BPM ADVANCED
Reported component ID
5725C9400
Reported release
801
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-09-23
Closed date
2013-12-09
Last modified date
2015-02-06
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
BPM STANDARD
Fixed component ID
5725C9500
Applicable component levels
R801 PSY
UP
R850 PSY
UP
Document Information
Modified date:
07 October 2021