When modeling business processes, fault and exception handling require special care. The Web Services Business Process Execution Language (WS-BPEL) standard offers fault handlers and compensation handlers to model failure and exceptional paths in your business process logic. IBM® WebSphere® Integration Developer exploits this concept, and lets you model fault and compensation handling paths in business processes. Even if you take this step with adequate care, failure situations remain at runtime that are simply not predictable at authoring time. Moreover, there are exceptional situations that need to involve manual repair, for example, an administrator who analyzes and resolves the situation. WebSphere Process Server V6.1.2 supports manual repair and change of control flow logic. It comes with new functions to manually modify the state of process instances. These can be used to overcome exceptional situations and thus to repair the process instance.
As powerful as these functions are, they require foresight and careful handling. This article describes these new repair features, provides useful tips, and identifies potential pitfalls when applying them.
You should have a good understanding of how to model business processes with WebSphere Integration Developer and how to run and administer these processes with WebSphere Process Server. You should also have basic knowledge of how BPEL processes are executed in WebSphere Process Server.
This article shows you how to repair processes, and specifically prepares you for how to dynamically react to exceptional situations where modeled fault handling and automated recovery mechanisms do not help to solve the situation. Examples for such situations include:
- A service provided by a separate software component is unavailable for a certain time period, and the service implementation has to be taken over by manual interaction. This situation could have been foreseen to some extent: the process modeler could have implemented a fault handler that treats the fault appropriately, for example, by adding a human task activity to the fault handler. However, this level of exactness and completeness in fault handling cannot be assumed in every case.
- In real-life scenarios, it can happen that a small number of process instances cannot be executed correctly. Typically, a new business process is extensively tested before it is deployed in a production system. Nevertheless, when a business process is large and complex, or when it meets certain constellations of input data, the process may still fail.
The first section of this article describes what can already be considered during modeling time in order to benefit best from the new repair features, in case of new models. The next section discusses how a long-running process can be saved from major damage, when unforeseen faults occur. The final section provides an overview of the new functions in WebSphere Process Server V6.1.2, namely skip and jump, that allow you to modify the default execution behavior of a long-running process.
This article does not cover how to model fault and compensation handling in general. Nor does it cover how to use WebSphere Process Server's automated recovery mechanisms such as Failed Event Manager, hold and retention queue handling, and so forth. To learn more about these topics, see the Resources section.
All these facets are illustrated with a sample process that handles an order request. Its overall structure is shown in figure 1.
Figure 1. Structure of Order Process
The process consists of three major steps, each communicating with another service:
- Align customer data
In this first step, the customer data is checked. The process therefore communicates with an external service, called Customer Registration Service. The process checks whether the requester is a known customer. If yes, the data of the customer is retrieved from the registration service. If it is a new customer, the customer data has to be entered in a Human Task activity, and it is added to the registration service. Finally, the current bonus rate for the customer is calculated.
- Inventory management
The order process now checks whether all shipment items are available by using the Warehouse Service. Items not in stock are ordered from other suppliers, and the process waits until all requested items are complete.
Finally, the order is shipped and the customer is invoiced. The order request process therefore communicates with the Shipping Service.
To highlight the process repair options available, this article concentrates on the Align Customer Data step. It assumes that the Customer Registration Service is currently unavailable and that all calls to the customer service return with a runtime exception. The objective of the process repair actions is to overcome this first step, that is, to avoid holding up all order requests.
This article also assumes the interaction with the Customer Registration Service can be manually reworked at a later point in time when the service is available. All calls to the customer service can be skipped, allowing the actual order to proceed.
What needs to be considered at authoring time
As stated in the introduction, repairing process instances is independent of how fault and compensation handling are modeled: you can use the repair functions for all long-running process instances, regardless of how they are modeled. There are, however, some aspects that you can consider at authoring time to fully exploit the new functions. These aspects are described in the following sections.
Continue On Error
Process repair basically means manually intervening in a failing process instance so that it can successfully complete work that previously failed. The closer this intervention is performed to the actual point of failure (that is, where the fault was raised), the more effective and practicable it is. If the fault is propagated through several levels of fault handlers, causing the termination of all other activities and finally of the process instance itself, there is no way to repair it. This is why business processes offer the Continue On Error option. When you set this attribute to No and a fault is not handled by a fault handler on the directly enclosing scope or activity, the process instance halts the current execution path at the point where the fault occurs. The activity instance at which the fault was raised is put in the special state stopped. Notice that parallel paths in the process instance can proceed and the process state remains running. Note further that the activity state stopped is not restricted to basic activities: structured activities, such as sequences or scopes, can stop as well.
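The effect of the attribute can be pictured with a small, self-contained Python model. This is only an illustrative sketch and not WebSphere Process Server code: the names `Activity` and `run_activity`, the return strings, and the simplified state handling are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Activity:
    """Hypothetical stand-in for an activity instance (not a product API)."""
    name: str
    implementation: Callable[[], None]
    state: str = "inactive"


def run_activity(activity: Activity, continue_on_error: bool,
                 handled_by_enclosing_scope: bool = False) -> str:
    """Run the activity and report what happens if its implementation fails.

    continue_on_error=False models the setting "No": a fault that is not
    handled on the directly enclosing scope stops the activity, so that it
    can be repaired manually.
    """
    activity.state = "running"
    try:
        activity.implementation()
    except Exception:
        if handled_by_enclosing_scope:
            activity.state = "failed"
            return "handled by fault handler"
        if continue_on_error:
            # The fault travels up through the enclosing fault handlers.
            activity.state = "failed"
            return "propagated"
        # Only this execution path halts; parallel paths could proceed.
        activity.state = "stopped"
        return "stopped for repair"
    activity.state = "finished"
    return "completed"


def unavailable_service() -> None:
    """Simulates the outage of the Customer Registration Service."""
    raise RuntimeError("Customer Registration Service is unavailable")


stopping = Activity("RetrieveCustomerData", unavailable_service)
stopped_outcome = run_activity(stopping, continue_on_error=False)

propagating = Activity("RetrieveCustomerData", unavailable_service)
propagated_outcome = run_activity(propagating, continue_on_error=True)

print(stopped_outcome, stopping.state)
print(propagated_outcome, propagating.state)
```

In this sketch, only the first call leaves the activity in a state from which a repair action could still be started.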
You can set the Continue On Error attribute on process level as shown in figure 2. The default value is No.
Figure 2. Setting Continue On Error on process level
The Continue On Error attribute can be overridden at the activity level (see figure 3) for the following activity types: invoke, Java™ snippet, and human task activities. You should leave the default, Same as Process.
Figure 3. Setting Continue-on-Error for activities
Important: Setting the Continue On Error attribute to No has no negative performance impact on normal process execution.
Whenever a process repair operation is undertaken, the person who requested the repair action needs to be authorized to do it, that is, depending on the concrete action, the person has to be a process administrator, a scope administrator, an activity administrator, or a system administrator.
The system administrator is a role defined on the BPEContainer application. All other administrators are specific to the respective process application, and they are defined in the process model.
You can define process administrators at authoring time on the properties pane of the process. Declare them on the Administration section, as shown in figure 4.
Figure 4. Define a process administrator
If no process administrator is defined, the process starter becomes the process administrator.
A process administrator is allowed to:
- Repair activity instances in state "stopped", for example, force-complete and force-retry, anywhere in the process instance.
- Skip activity instances.
- Perform jumps between activities.
- Modify global and local variables.
- Perform administrative actions on the process instance, for example, terminate or delete a process instance.
Scope administrators can be defined at authoring time on the properties pane of the scope activity. As shown in figure 5, it is declared on the Administration section.
Figure 5. Define a scope administrator
Note that scope administrators are passed on by scopes to their enclosed scopes, that is, scope administrators of enclosing scopes become scope administrators of the current scope. In addition, the process administrator becomes scope administrator.
A scope administrator is allowed to:
- Repair stopped activity instances that are located in the scope, for example, using force-retry and force-complete.
- Skip activity instances that are enclosed in the scope.
- Perform jumps between activities contained in the scope.
- Modify variables that are defined on the scope or on enclosed scopes.
In subsequent sections of this article, we show that some repair actions can target activities that are not yet reached by the process navigation (see "Skip an activity instance").
Important: In this case, only scope administrators of scopes that have already been reached by the process navigation are allowed to perform repair actions. If the scope in which your repair action takes place has not yet been reached, you must be either a scope administrator of an enclosing scope or a process administrator in order to execute the repair action.
For some activity types, for example, invoke and Java Snippet activities, you can define an activity administrator in the Administration section of the activity as shown in figure 6.
Figure 6. Administrator for an activity
In addition, you can define a default activity administrator on process level (see Figure 4, second human task specification). It is applied for all activities in the process that do not have an explicit administrator defined.
Process administrators and scope administrators of enclosing scopes automatically become activity administrators of all activities contained in this scope.
An activity administrator is allowed to:
- Repair the activity instance in state stopped, for example, force-retry and force-complete.
- Skip the activity instance.
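The role model described above can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the product's authorization code: the function names, the action labels, the user "Anna", and the scopes "InnerScope" and "InventoryManagement" are hypothetical, and the rule about scopes that have not yet been reached is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Actions per role, taken from the lists above (labels are illustrative).
ROLE_ACTIONS = {
    "process administrator": {"force-retry", "force-complete", "skip", "jump",
                              "modify variables", "terminate", "delete"},
    "scope administrator": {"force-retry", "force-complete", "skip", "jump",
                            "modify variables"},
    "activity administrator": {"force-retry", "force-complete", "skip"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role may perform the given action."""
    return action in ROLE_ACTIONS.get(role, set())


@dataclass
class Scope:
    """Hypothetical scope with explicitly declared administrators."""
    name: str
    administrators: Set[str] = field(default_factory=set)
    parent: Optional["Scope"] = None


def effective_scope_admins(scope: Scope, process_admins: Set[str]) -> Set[str]:
    """Collect the administrators of a scope, including inherited ones.

    Administrators of enclosing scopes and the process administrators
    also act as scope administrators of the current scope.
    """
    admins = set(process_admins)
    current: Optional[Scope] = scope
    while current is not None:
        admins |= current.administrators
        current = current.parent
    return admins


process_admins = {"Anna"}                                  # hypothetical user
align = Scope("AlignCustomerData", {"Jeff"})
inner = Scope("InnerScope", parent=align)                  # hypothetical nested scope
inventory = Scope("InventoryManagement")                   # hypothetical sibling scope

print(sorted(effective_scope_admins(inner, process_admins)))
print(sorted(effective_scope_admins(inventory, process_admins)))
print(is_allowed("activity administrator", "jump"))
```

The sketch shows why Jeff can repair activities nested anywhere inside AlignCustomerData but has no rights in other areas of the process.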
Recommendations for assigning administration rights
We recommend defining at least one process administrator in order to avoid potentially unauthorized repair actions by the process starter. Because scope administrators are inherited by enclosed scopes, which results in additional persistent data, we recommend using scope administrators sparingly and deliberately.
You should also assign process reader rights to scope administrators; otherwise, the scope administrators are not allowed to see the process instance in the BPC Explorer, which is actually the starting point for applying skip and jump functions.
Scope administrator -- example
In our example, all customer related actions are collected in a scope called AlignCustomerData, as already stated in the introduction (see figure 1).
The customer relationship manager "Jeff" is allowed to view and repair this area in the process. However, other areas of the process are not of his interest and should not be changed by him for security reasons. This is why we added Jeff to the scope administrator role of scope AlignCustomerData.
Persist all activities
During the navigation of a long-running process, activity instances are not always persisted. If a short-running activity, for example, an assign activity, is fully contained in one transaction, the activity instance data may not be persisted in the database.
On the one hand, this behavior may be desired, because it saves database accesses and thus improves the overall performance of the long-running process. On the other hand, the more historic data you have, the better you can perform process repair. For instance, you can determine the state of all activity instances and thus the execution progress of the whole process instance.
At modeling time, however, you can decide that the activity data is always persisted, even if it is not required for navigation. You do this by setting the Enable persistence and queries of business relevant data attribute of the respective activity (see figure 7).
Figure 7. Enable persistence and queries of business relevant data
Consider setting this attribute for activities in critical areas of your processes, where you expect process repair to happen. Be aware that this affects performance, even in cases where no process repair is needed.
Unique activity names
A jump between two activities is only supported if the target activity has a unique name. This is something you have to bear in mind when you are naming the activities during process modeling (referring to the name and not the display name of an activity).
Ignore missing data
WS-BPEL actually specifies that a runtime fault is thrown when empty data is assigned, for example, if the source of an assignment is an uninitialized part. However, if the Ignore missing data attribute of a business process is set, these exceptions are suppressed.
Figure 8 shows that this attribute is enabled for our sample process.
As we show in this article, a process repair action may cause certain steps in a long-running process to be skipped; as a result, the output data of these steps may not be available for subsequent steps. To keep runtime faults from being thrown when the data is accessed later on, it is beneficial to enable the attribute.
Figure 8. Ignore missing data
Repairing an activity in state stopped
Exceptions can occur anywhere in a process instance. They can be caused by a modeling or coding error (for example, access to uninitialized data) in the business process logic or by a problem outside the process (for example, a called service is not available). An activity can stop when an unexpected -- and therefore unhandled -- exception occurs.
In the following, we assume that the Continue On Error attribute is set in such a way that all activities stop upon unhandled faults.
An exception that occurs in a process can always be associated with an activity instance, and it may happen before, during, or after the activity's actual implementation; that is, an activity can stop in different execution phases. The phase in the activity's lifetime in which the problem occurs determines which repair actions can be performed.
A stopped activity instance carries an additional attribute -- stop reason, which tells the phase when the activity has been stopped. We distinguish three execution phases and therefore three stop reasons.
Failure during the Activation of the activity - stop reason: activation failed
Before an activity is activated, certain conditions must be met, that is, the join condition of the activity must be true. If the evaluation of the join condition fails, for example, an exception is thrown during the execution of a Java implemented join condition, the stop reason is ACTIVATION_FAILED.
Failure during execution of the activity - stop reason: implementation failed
When the exception occurs during the actual implementation of the activity the stop reason is IMPLEMENTATION_FAILED.
This is the most common and multifaceted case. Examples include:
- The expression in an assign accesses an uninitialized part of a variable.
- A runtime exception occurs in a Java Snippet.
- The invocation of an external service returns an unexpected fault.
- The staff resolution of a human task fails.
- The case condition evaluation of a choice condition fails.
Failure upon leaving the activity - stop reason: follow-on navigation failed
After the implementation of an activity has been completed, the Business Flow Manager examines out-going links of the activity and evaluates the transition conditions on these links. If the evaluation of a link condition fails, the activity stops with stop reason FOLLOW_ON_NAVIGATION_FAILED.
To resume the process navigation at a stopped activity, there are two repair actions available: force-retry and force-complete. Keep in mind, in order to repair a process instance, it might be necessary to update one or more process variables to avoid the same failure again before one of these functions can be applied.
When an activity is force-completed or force-retried, you can overwrite its continue-on-error behavior. If this is done and the activity fails again, the exception is propagated to the fault handling of the process and the activity is not stopped again. This can be useful in scenarios where a certain fault is handled not on the next enclosing but on other enclosing scopes.
When an activity is force-retried, its implementation is run again. If it is an invoke activity or human task activity, input data can optionally be provided. Note that process variables are updated with the provided input data.
The force-retry repair action can be used if the user wants an activity to be re-executed, for example, if an invoke activity was stopped because a service was unreachable but it has been fixed in the meantime.
When an activity is force-completed, the activity is put in state finished and the follow-on navigation is continued. In case of an invoke, receive, or human task activity, the user can provide an output message with the force-complete request. This is treated as the normal output of the activity and the process variables are updated accordingly. If no output is provided for an invoke or human task activity, the process variable is not updated.
A force-complete can be used, for example, by an administrator to manually complete an activity and continue in the predefined control flow of the process, because the administrator expects the exception to occur again if the activity is re-run.
Note that force-complete can be combined with the jump action; see the section "Jumping to another activity instance" for more details.
Force-complete with fault
When force-completing an activity, it is possible to provide a fault message instead of an output message. The activity is then put in state failed and the fault is propagated to the fault handler of the next enclosing scope. This is required if an unexpected error occurs and there is no fault handler specifically modeled to handle it. The user can force-complete an activity with a fault to manually enforce that a certain fault handler, that handles a fault different from the one that made the activity instance stop, is triggered to handle this fault.
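The difference between the repair actions described above can be sketched with a minimal Python model. It is an illustration only, not the Business Flow Manager API: the class `StoppedInvoke`, the functions `force_retry` and `force_complete`, and the fault name `CustomerNotFound` are hypothetical, and the optional input data for force-retry and the continue-on-error override are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Variables = Dict[str, object]


@dataclass
class StoppedInvoke:
    """Hypothetical stopped invoke activity (not a product class)."""
    name: str
    implementation: Callable[[Variables], object]
    output_variable: str
    state: str = "stopped"


def force_retry(activity: StoppedInvoke, variables: Variables) -> None:
    """Run the implementation again; the activity stops again if it still fails."""
    try:
        result = activity.implementation(variables)
    except Exception:
        activity.state = "stopped"
        return
    variables[activity.output_variable] = result
    activity.state = "finished"


def force_complete(activity: StoppedInvoke, variables: Variables,
                   output: Optional[object] = None,
                   fault: Optional[str] = None) -> Optional[str]:
    """Complete the activity without running its implementation.

    With a fault, the activity ends in state failed and the fault name is
    returned so that an enclosing fault handler could pick it up.
    Without output, the process variable is left unchanged.
    """
    if fault is not None:
        activity.state = "failed"
        return fault
    if output is not None:
        variables[activity.output_variable] = output
    activity.state = "finished"
    return None


def unavailable_service(variables: Variables) -> object:
    raise RuntimeError("Customer Registration Service is unavailable")


variables: Variables = {"customerNumber": "4711"}
retrieve = StoppedInvoke("RetrieveCustomerData", unavailable_service, "customerData")

force_retry(retrieve, variables)            # service is still down
state_after_retry = retrieve.state

force_complete(retrieve, variables, output={"name": "Example Customer"})
state_after_complete = retrieve.state

other = StoppedInvoke("RetrieveCustomerData", unavailable_service, "customerData")
raised_fault = force_complete(other, {}, fault="CustomerNotFound")  # hypothetical fault

print(state_after_retry, state_after_complete, other.state, raised_fault)
```

A force-retry only helps once the cause of the failure has been removed, whereas a force-complete replaces the implementation with a manually supplied result or fault.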
Repair actions and stop reason
Not all repair actions are allowed for every stop reason. The three execution phases of an activity, its stop reasons, and the allowed repair actions are summarized in the table below.
Table 1. Stop reason and allowed repair actions
|Phase in activity's lifetime|Stop reason|Allowed repair actions|
|---|---|---|
|Join condition evaluation|activation failed|force-retry|
|Run actual invocation|implementation failed|force-retry, force-complete|
|Evaluate transition conditions of links leaving the activity|follow-on navigation failed|force-complete|
If the activity has not yet been started, that is, the stop reason is activation failed, it is not allowed to force-complete the activity. In this case it is important to overcome the problems with activating the activity, whereas a force-complete request addresses problems with finishing it.
In case the activity has already been completed, but the follow-on navigation fails, a force-retry action is no longer allowed. Here, the problem lies in an area where the navigation is moved to succeeding activities, and rerunning the activity's implementation cannot help to overcome this kind of issue.
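A client or script that offers repair actions could encode these rules as a simple lookup. The following Python sketch assumes the constant names used in this article for the stop reasons; the function `is_repair_allowed` is hypothetical, and the entry for implementation failed reflects that neither of the two restrictions described above applies to it.

```python
from enum import Enum


class StopReason(Enum):
    ACTIVATION_FAILED = "activation failed"
    IMPLEMENTATION_FAILED = "implementation failed"
    FOLLOW_ON_NAVIGATION_FAILED = "follow-on navigation failed"


# Allowed repair actions per stop reason, as described in the text above.
ALLOWED_REPAIR_ACTIONS = {
    StopReason.ACTIVATION_FAILED: {"force-retry"},
    StopReason.IMPLEMENTATION_FAILED: {"force-retry", "force-complete"},
    StopReason.FOLLOW_ON_NAVIGATION_FAILED: {"force-complete"},
}


def is_repair_allowed(stop_reason: StopReason, action: str) -> bool:
    """Return True if the repair action is allowed for the stop reason."""
    return action in ALLOWED_REPAIR_ACTIONS[stop_reason]


for reason in StopReason:
    actions = sorted(ALLOWED_REPAIR_ACTIONS[reason])
    print(f"{reason.value}: {', '.join(actions)}")
```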
Repairing an activity in state stopped -- example
Let us come back to our sample process. As already stated, we concentrate on the first step, the AlignCustomerData scope, which is illustrated in figure 9. The scope is implemented as a cyclic flow. In the first step, the customer number is copied to a local variable (1). If the customer number is not null, that is, the customer is already known, the left-hand path is followed (2), and the customer data is retrieved from the customer registration service (3). In case of a new customer, the customer data, such as address, bank connection, and so forth, is entered in the human task activity ProvideCustomerData (4). Afterwards it is inserted into the customer registration service (5). Finally, the two paths join together and the current bonus for this customer is calculated in Java snippet calculateBonus (6).
As already mentioned in the introduction, we assume the customer registration service is not available and all calls return with an exception. As a result, the invoke activities calling this service stop.
Figure 9. Implementation of scope AlignCustomerData
The customer relationship manager "Jeff" uses the Business Process Choreographer Explorer (BPC Explorer for short) to repair processes that suffer from the outage.
He opens the Critical Processes view of the BPC Explorer, which shows all process instances that contain an activity in state stopped (see figure 10).
Figure 10. Critical Processes
He brings up the activity instance list of the first process instance and finds out that the activity RetrieveCustomerData is in state stopped. The detailed view of the activity is displayed in figure 11.
Jeff looks up the customer data manually and updates the variable customerData with this information using the variable section as shown in figure 11. Just like that, he has manually replaced the implementation of the stopped activity.
He force-completes the activity by clicking the Force Complete button.
Figure 11. Activity details for stopped activity RetrieveCustomerData
As an alternative to the BPC Explorer you can also write a client using the Business Flow Manager APIs to perform the repair actions, including getting and setting variables.
(See resources for a reference.)
Changing the navigation of a process
In this section, we discuss how a user can influence or change the normal execution behavior of a running process instance. There are two major functions in this context: skip and jump.
Skip an activity instance
Consider a situation where a certain step in a process instance is not needed. Take a standard travel booking process as an example: the user does not require a hotel reservation because the traveler stays at a friend's house. In this case, it is beneficial to be able to skip one or more activities in a process instance.
With WebSphere Process Server V6.1.2, all basic activities can be asked to be skipped in every activity execution state via a skip request. This request can either have an immediate effect on the activity, for example when the activity is actually running, or it can target an activity that is not yet reached. In the latter case, the activity is marked for skip until the navigation reaches the activity.
Important: A pending skip request can be cancelled via a cancel-skip request.
When an activity is skipped by request, the behavior in the follow-on navigation differs from that of an activity that is skipped automatically because its join condition evaluates to false. In both cases the activity ends in state skipped. However, in case of a skip request all out-going links are evaluated and followed. In case of an automatic skip the current execution path is navigated as a "dead path" and link conditions are not evaluated; they are all considered to be false.
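The two kinds of skip can be contrasted in a few lines of Python. This is a conceptual sketch rather than engine code: the function `follow_on_navigation` and the link names `toInvoice` and `toAudit` are hypothetical, and a link is treated as followed when its transition condition evaluates to true.

```python
from typing import Callable, Dict

Variables = Dict[str, object]
LinkConditions = Dict[str, Callable[[Variables], bool]]


def follow_on_navigation(links: LinkConditions, variables: Variables,
                         skipped_by_request: bool) -> Dict[str, bool]:
    """Return, for each out-going link, whether it is followed.

    Skip by request: every transition condition is evaluated normally.
    Automatic skip (join condition false): dead path, so every link is
    considered false without evaluating its condition.
    """
    if not skipped_by_request:
        return {name: False for name in links}
    return {name: bool(condition(variables)) for name, condition in links.items()}


# Hypothetical out-going links of a skipped activity.
links: LinkConditions = {
    "toInvoice": lambda v: v["bonus"] > 0,
    "toAudit": lambda v: v["bonus"] > 100,
}
variables: Variables = {"bonus": 5}

manual_skip = follow_on_navigation(links, variables, skipped_by_request=True)
automatic_skip = follow_on_navigation(links, variables, skipped_by_request=False)

print(manual_skip)
print(automatic_skip)
```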
The skip function can be used in combination with stopped activities to repair an unforeseen situation, but also if no exception has occurred so far and a process execution path is interrupted, for example, by a human task activity that waits to be claimed and completed.
Skip an activity in an active state
An activity is in an active state when its implementation has been started but has not yet been completed. The active activity states are: running, waiting, ready, claimed, and stopped.
When a skip is requested on an active activity, the activity's implementation is aborted and its state changes to skipped. For example, if an invoke activity, waiting for an asynchronous response, is skipped, the waiting is stopped and a later arriving response from the service is ignored. Follow-on navigation is continued, that is, all out-going links are followed and the transition conditions on the links are evaluated.
Skip a future activity
Activities that have not been reached yet can be skipped as well. When an activity has not been reached yet its state is "inactive", or the activity instance is not yet created.
Skipping a future activity means the activity is created if it does not exist yet, and it is marked for skip, that is, the "skipRequested" attribute is set. When navigation reaches the activity instance, the state of the activity is immediately set to skipped, and its implementation is not run. After this, the "skipRequested" attribute is unset and navigation continues as in the first case. Unsetting the "skipRequested" attribute ensures that the activity is not skipped again, which could otherwise happen if the activity is inside a loop or a cyclic flow.
Skip an activity in an end state
An activity is in an end state if the implementation of the activity is completed. End states of an activity are: finished, terminated, skipped, and failed. When an activity is skipped in an end state, the request targets the next iteration of this activity, for example, if it is located inside a while loop or inside a cyclic flow. Note there is no validation that verifies that an activity can be reached again. So even if it is obvious that an activity cannot be reached again, a skip request is not rejected, but it actually has no effect. As in the case of inactive activities, the skip request marks the current activity instance to be skipped. Normally, if it is reached again, a new instance object is created which inherits this mark. The implementation is then not run but the navigation continues as in the two other cases.
Cancel the skip request of an activity instance
A cancel-skip request can be used to undo the effects of a skip request in case that the activity has not yet been reached or it is in an end state. A cancel-skip request is used to unset the "skipRequested" attribute.
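The life cycle of the "skipRequested" attribute can be traced with a small Python model. It is a simplified sketch, not engine code: the class and function names are hypothetical, and a single object is reused for every iteration, whereas the text above describes that a new instance object is normally created that inherits the mark.

```python
from dataclasses import dataclass
from typing import Callable, List

ACTIVE_STATES = {"running", "waiting", "ready", "claimed", "stopped"}


@dataclass
class ActivityInstance:
    """Hypothetical activity instance with a skip mark."""
    name: str
    state: str = "inactive"
    skip_requested: bool = False


def request_skip(activity: ActivityInstance) -> None:
    """Skip immediately in an active state, otherwise mark for skip."""
    if activity.state in ACTIVE_STATES:
        activity.state = "skipped"
    else:
        # Inactive or end state: the request targets the next time
        # navigation reaches the activity.
        activity.skip_requested = True


def cancel_skip(activity: ActivityInstance) -> None:
    """Undo a pending skip request."""
    activity.skip_requested = False


def navigate_to(activity: ActivityInstance, implementation: Callable[[], None]) -> None:
    """Model navigation reaching the activity."""
    if activity.skip_requested:
        activity.state = "skipped"
        activity.skip_requested = False   # do not skip the next iteration
        return
    implementation()
    activity.state = "finished"


runs: List[str] = []

bonus = ActivityInstance("calculateBonus")
request_skip(bonus)                                       # not reached yet
navigate_to(bonus, lambda: runs.append("bonus, first pass"))
state_after_first_pass = bonus.state
navigate_to(bonus, lambda: runs.append("bonus, second pass"))

add = ActivityInstance("AddCustomerData")
request_skip(add)
cancel_skip(add)                                          # pending request withdrawn
navigate_to(add, lambda: runs.append("add"))

print(state_after_first_pass, bonus.state, add.state, runs)
```

In the sketch, calculateBonus is skipped only on the first pass, and the cancelled request on AddCustomerData has no effect on its execution.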
Skip an activity instance -- example
Let us look at our sample process again. In the last repair step, Jeff has overcome the problems in the invoke activity RetrieveCustomerData by using force-complete. While looking up the customer data manually, he found out that the best bonus rate is already given to the current customer. Instead of letting the process find that out by itself, he decides to skip the respective step in the process. So just before force-completing RetrieveCustomerData, he updates the bonus variable with this information and marks the calculateBonus activity for skip. To do this, he selects the Process State View as shown in figure 12, clicks on the calculateBonus activity, and chooses Skip Activity from the drop-down list. It is important to understand that the skip must be requested before RetrieveCustomerData is force-completed, because the process navigation continues immediately after the force-complete call.
Figure 12. Skip activity instance "calculateBonus"
Jumping to another activity instance
In WebSphere Process Server V6.1.2, it is now possible to manually override the actual state of a running process instance. Specifically, you are able to perform jumps in the control flow from a single, dedicated activity, the source activity, to another one, the target activity. Jumping away from an activity instance can only be performed in an active state. A jump semantically consists of two steps: first, the source activity of the jump is ended and second, the target activity of the jump is executed.
The source activity of the jump can be ended in three ways: it can be skipped, completed, or force-completed. So there are three possible ways to perform a jump: skip and jump, force-complete and jump, and complete and jump. Note that only human task activities can be completed in combination with a jump; all other activity types need to be force-completed. When completing or force-completing the source activity, you may also provide an output message for it, and after the jump has been completed the activity is in state finished. The jump behavior is the same for all three options.
Important: To jump between two activities, it is not required that the target activity is a successor of the source activity. You can also perform a "backward" jump. It is only possible to jump from one activity to another one; multiple source activities are not supported. The target of the jump has to be a single activity; however, it does not have to be a basic activity. It can be a structured activity as well. It is executed just after the source has been finished.
Forward jumps have the following characteristics:
- Any transition conditions specified on links between the source and the target activity are not evaluated, therefore, it is assured that the target of the jump is executed in any case.
- All activities in between the source and the target are not activated by the workflow engine. This means that the activities are not executed in case of a forward jump. This also means that any data that would have been produced by these activities is not available for the target activity of the jump. So be aware of the data required by the target activity. You can set the data of the variables manually to ensure that the target is correctly executed. This is also described in this article (see "Repairing an activity in state stopped -- example").
Backward jumps have the following properties:
- Activity instances in between the source and the target are deleted from the database, so that they can be executed again after the jump has completed.
- At this point in time, no compensation is involved. However, if an activity in-between is a scope activity, the compensation handler remains registered.
Where are jumps supported?
The source and the target activity have to meet certain requirements so that a jump can be performed. If these requirements are not met, the jump request is rejected. These requirements are based on the state of the source activity and the structural composition of the process model.
- In order to perform a jump, the source of the jump must be in an active state. The possible states of the source activity to perform a jump are: claimed, ready, running, stopped, and waiting. These states are valid for both: skip and jump, and force complete and jump. However, since complete and jump is only allowed for human task activities, complete and jump can only be called in state claimed.
- Only activities with unique names in the business process are potential jump targets. However, the display name might be identical.
- Jumps are only allowed in combination with ending the source activity. A jump is only executed if the skip, force-complete, or complete operation is successful, for example, if no runtime exception occurs.
- It is possible to perform jumps within sequences and cyclic flows, but jumps within parallel flows are not supported. In both cases the source and the target activity both have to be directly nested in the same enclosing construct. It is neither possible to jump into a structured activity nor to jump out of the direct enclosing structured activity. However it is possible to jump over structured activities.
- It is not possible to jump from an invoke activity with an attached handler, that is, a compensation handler, fault handler or undo action.
When you use the BPC Explorer to perform a jump, these requirements are checked automatically. Thus, only valid jump targets are displayed.
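The requirements listed above can be expressed as a small rule check. The following Java sketch is an assumption-laden illustration, not the check that BPC Explorer or the process engine actually implements: the class `JumpRules`, its nested types, and the simplified `Activity` description are hypothetical, and the state list is reduced to what the rules need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Illustrative model of the jump requirements; not the Business Process Choreographer API. */
final class JumpRules {
    enum State { INACTIVE, READY, CLAIMED, RUNNING, WAITING, STOPPED, FINISHED }
    enum Operation { SKIP_AND_JUMP, FORCE_COMPLETE_AND_JUMP, COMPLETE_AND_JUMP }
    enum Container { SEQUENCE, CYCLIC_FLOW, PARALLEL_FLOW }

    /** Hypothetical, simplified view of an activity in a process model. */
    static final class Activity {
        final String name;
        final String enclosingConstruct;  // name of the directly enclosing structured activity
        final Container containerKind;    // kind of that enclosing construct
        final State state;
        final boolean humanTask;
        final boolean invokeWithHandler;  // invoke with compensation handler, fault handler, or undo action

        Activity(String name, String enclosingConstruct, Container containerKind,
                 State state, boolean humanTask, boolean invokeWithHandler) {
            this.name = name;
            this.enclosingConstruct = enclosingConstruct;
            this.containerKind = containerKind;
            this.state = state;
            this.humanTask = humanTask;
            this.invokeWithHandler = invokeWithHandler;
        }
    }

    private static final Set<State> JUMP_SOURCE_STATES =
        EnumSet.of(State.CLAIMED, State.READY, State.RUNNING, State.STOPPED, State.WAITING);

    /** Returns the violated requirements; an empty list means the jump would be accepted. */
    static List<String> violations(Activity source, Activity target, Operation operation,
                                   List<String> allActivityNames) {
        List<String> problems = new ArrayList<>();
        if (!JUMP_SOURCE_STATES.contains(source.state)) {
            problems.add("source is not in an active state");
        }
        if (operation == Operation.COMPLETE_AND_JUMP
                && !(source.humanTask && source.state == State.CLAIMED)) {
            problems.add("complete and jump requires a claimed human task activity");
        }
        if (Collections.frequency(allActivityNames, target.name) != 1) {
            problems.add("target name is not unique in the business process");
        }
        if (!source.enclosingConstruct.equals(target.enclosingConstruct)) {
            problems.add("source and target are not directly nested in the same construct");
        } else if (source.containerKind == Container.PARALLEL_FLOW) {
            problems.add("jumps within parallel flows are not supported");
        }
        if (source.invokeWithHandler) {
            problems.add("source is an invoke activity with an attached handler");
        }
        return problems;
    }
}
```

Returning a list of violations rather than a single boolean makes it easier to tell an administrator why a particular jump would be rejected.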
Jumping to another activity instance -- example
Let's come back to our sample business process. Figure 13 shows a process instance standing at the human task activity ProvideCustomerData. The customer relationship manager, Jeff, has already claimed this activity. Jeff now decides to complete this activity and to continue at calculateBonus, because he knows that the subsequent activity AddCustomerData will fail while the registration service is unavailable. Jeff will add the new customer to the registry manually at a later point in time, when the service is working again.
To perform a jump in BPC Explorer, complete the following steps:
- Go to the Process State View and click the activity ProvideCustomerData. A menu appears offering the different options.
- Select Jump to another Activity.
Figure 13: Jump from activity ProvideCustomerData to activity calculateBonus
Tip: BPC Explorer highlights all activities that are valid jump targets, while the other activities are faded out. Because the source of the jump is contained in a cyclic flow, all activities directly enclosed in this cyclic flow are valid jump targets. Because it is not possible to jump out of a structured activity, the activities not contained in the cyclic flow are faded out. Figure 14 shows all possible jump targets.
Figure 14. Possible targets for jump from activity ProvideCustomerData
- To jump to calculateBonus, left-click that activity.
Figure 15: Select the target calculateBonus of the jump
Tip: Figure 15 shows the menu that offers the options for handling the source activity. The user can complete, force complete, or skip the source activity. The "complete" option is only displayed if the source of the jump is a claimed human task activity. Jeff decides to complete the human task activity.
- On the next screen, you can edit the output message of activity ProvideCustomerData.
Figure 16: Provide output message for complete and jump with ProvideCustomerData as source activity
- Enter the customer data and click on Complete and Jump. The output message is taken as output of the human task activity and the navigation continues at the selected target calculateBonus.
Common pitfalls of manually changing the navigation
When an activity is skipped or left out because of a jump request, its implementation is not run. The effects that the activity would have had on the process instance and on the application system outside the process instance therefore do not occur. In the case of a backward jump, activities may be executed multiple times, which may also affect the process. Examine these effects before you perform a skip or a jump in order to avoid undesirable side effects. In addition, you might have to reproduce the desired effects of an activity execution through substitute manual actions.
Effects on the process instance itself
- Local and global variables
- For example, consider an assign activity that is skipped. None of the updates that the assign would have made to local and global variables take place, although a subsequent step in the process might require these changes. You can update or initialize global and local variables with get-/set-variable API requests or in BPC Explorer.
- Correlation set values
- Message activities, that is, receive, reply, invoke, and pick activities, can initialize or verify correlation set values with their input and output messages. This is an important feature, for example, to correlate external requests with the correct process instances. Skipping or jumping over an activity that initializes correlation sets can be critical, because other activities may use these correlation sets.
- Receive-reply pairs
- Another source of inconsistencies is receive-reply pairs that belong to the same two-way operation, when only one of the activities is run and the other is skipped or not executed because of a jump request. When a receive activity of a two-way request is skipped or jumped over, the corresponding reply activities should be marked to be skipped or jumped over as well. If the reply is executed anyway, it fails or stops with a fault indicating that no corresponding receive activity was found. If it stops, you can force-complete the activity, but this requires further manual intervention later. If a reply activity is skipped, the request remains open until the process completes, and an error is then returned to the caller. Avoid jumping to an initiating receive activity that has no correlation defined.
- Multiple executions of activities
- In the case of a backward jump, navigation continues at the target activity and, depending on the control flow, some activities may be executed again. For example, an invoke activity might be re-executed and start a subprocess a second time. Be aware that the activities between the source and the target of a backward jump are not compensated. Because activities may be executed twice, you need to handle them accordingly. One solution is to make the activities idempotent, or otherwise able to deal with a second invocation. A second solution is to mark the activity for skip so that it is not executed a second time. Note: mark the activity for skip before you request the jump.
- Jump over activities
- A forward jump means that navigation continues directly at the target activity; all activities between the source and target activity are ignored. They are neither skipped nor caught up later, so any information that these activities would have produced is not available during further navigation of the business process. If you want one of these activities to run anyway, you can jump back to it and mark for skip the other activities that do not need to be executed again.
- Logically incorrect jumps
- A possible jump target might not be a valid jump target with regard to the logic of the business process. For example, in a cyclic flow, it is allowed to jump anywhere. Consider a cyclic flow that has mutually exclusive branches; each branch has preconditions and invariants that apply to all activities in the branch. If a jump request crosses these branches, these conditions may be violated, which may result in unexpected behavior and exceptions. Therefore, whenever a user performs a jump, they should check its validity with regard to the business logic of the process.
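One way to examine the data-related pitfalls before a forward jump is to list the variables that the jumped-over activities would have written and that later activities still read. The following Java sketch shows such a check under simplifying assumptions: the process is treated as a plain sequence, the class `ForwardJumpCheck` and its `Step` type are hypothetical, and the read and write sets of each activity are assumed to be known from studying the process model. The engine does not provide this analysis; it is a manual aid.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative pre-jump check; this is not the Business Process Choreographer API. */
final class ForwardJumpCheck {

    /** Hypothetical description of an activity: its name and the variables it reads and writes. */
    static final class Step {
        final String name;
        final List<String> reads;
        final List<String> writes;

        Step(String name, List<String> reads, List<String> writes) {
            this.name = name;
            this.reads = reads;
            this.writes = writes;
        }
    }

    /**
     * Returns the variables that activities from the target onward read, but that are
     * written only by activities the forward jump leaves out. These are candidates for
     * being set manually before the jump is requested.
     */
    static Set<String> variablesToSetManually(List<Step> sequence, int sourceIndex, int targetIndex) {
        if (sourceIndex < 0 || targetIndex <= sourceIndex || targetIndex >= sequence.size()) {
            throw new IllegalArgumentException("not a forward jump within the sequence");
        }
        // Variables whose updates are lost because their writers are jumped over.
        Set<String> lost = new LinkedHashSet<>();
        for (Step skipped : sequence.subList(sourceIndex + 1, targetIndex)) {
            lost.addAll(skipped.writes);
        }
        Set<String> missing = new LinkedHashSet<>();
        Set<String> writtenAfterJump = new LinkedHashSet<>();
        for (Step step : sequence.subList(targetIndex, sequence.size())) {
            for (String variable : step.reads) {
                // A read is only a problem if no activity after the jump has written the variable yet.
                if (lost.contains(variable) && !writtenAfterJump.contains(variable)) {
                    missing.add(variable);
                }
            }
            writtenAfterJump.addAll(step.writes);
        }
        return missing;
    }
}
```

The check is conservative: it also reports a variable that an earlier activity has already set, because the value may be stale once the activity that would have updated it is left out.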
Effects on the surrounding application
When service calls are omitted, the process instance state can get out of sync with the rest of the software application. For example, consider two long-running processes that communicate with each other through multiple operations. One process might expect to receive requests in a certain order and stumble upon a request that it did not expect.
You can consider using SCA (Service Component Architecture) methods to call partners of the process to simulate the skipped activity. If these partners are other long-running process instances, it can also be useful to perform process repair on the partner instances.
Process repair comprises a set of actions that can be applied to a running process instance. These actions are designed to help recover from an exceptional situation that was not foreseen at modeling time. Process repair actions can be applied to any long-running process instance. However, to apply these functions most effectively, it helps to consider certain aspects at modeling time, such as the continue-on-error attribute and a well-considered assignment of administration rights. This article showed you how the repair functions force-complete, force-retry, skip, and jump work, both in theory and in examples. Using these functions might not be enough to solve critical situations; you may also need to update the internal process state, for example, variable contents, or intervene in the whole application system outside the process.
This article also showed that you need to use skip and jump actions with care, because they leave out or re-execute certain areas of the process, which may, for example, result in a lack of information in other areas of the process instance. You also learned that, although there is a set of restrictions for valid jump targets, an allowed jump target might still be invalid from an application point of view. To avoid this kind of jump, you need to detect the invalidity by studying the process model and its intent.
The authors want to thank the following persons for their review and comments: Christopher Coltsman, Susan Hermann, Ruth Schilling, and Gunnar Wilmsmann.