IBM WebSphere Developer Technical Journal : Process anti-patterns: How to avoid the common traps of business process modeling, Part 2

Modeling data flow

This article is the second in a series describing typical modeling errors extracted from hundreds of actual process models created in different tools, including IBM® WebSphere® Business Modeler. Part 1 discussed anti-patterns that occur when you need to describe branching and iterative behavior in a business process model. This article addresses the modeling of data flow, events and triggers, the correct termination of a process, and the reuse of activities in hierarchical process models.

Jana Koehler (koe@zurich.ibm.com), Research Staff Manager, Zurich Research Laboratory, IBM

Jana Koehler is a Research Staff Member and manager at the IBM Zurich Research Laboratory. She works on technologies for business process management and distributed systems, and leads the Business Integration Technologies team at the lab. Jana has contributed to the conceptual design of the patterns, refactoring, and transformation operations, and has authored the technical documentation. You can reach Jana at koe@zurich.ibm.com.



Jussi Vanhatalo (juv@zurich.ibm.com), Research Staff, IBM Zurich Research Laboratory, IBM

Jussi Vanhatalo is a member of the Business Integration Technologies group at the IBM Zurich Research Laboratory. He currently works on business process management including quality assurance of process models. You can reach Jussi at juv@zurich.ibm.com.



04 April 2007

From the IBM WebSphere Developer Technical Journal.

Scenario 3: Modeling data flow

Real-world business processes always work on data in some form. They require data, they modify and update data, and they often also derive new data by bringing various data sources together. Therefore, capturing the data flow of a process is usually an important phase in a business process modeling project. Adding this information to the process model is non-trivial and often leads to errors. In addition to errors, your models can quickly become cluttered. This section focuses on problems around the modeling of data flows.

Dangling inputs and outputs

A phenomenon that we often observed in process models is the occurrence of dangling inputs and outputs, i.e., inputs and outputs of an activity or gateway that remain unconnected in the model. This phenomenon usually occurs when you edit models in the basic editing mode of IBM WebSphere Business Modeler (hereafter called Business Modeler), which does not visualize inputs and outputs, but only shows the connecting edges between activities and gateways. Dangling inputs and outputs often remain as residues of connections that you decide to delete or redirect. When you delete a connection, Business Modeler does not automatically delete the inputs and outputs that were connected, because you might want to reconnect them.

Figure 1 shows an example process with dangling control flow (small white arrows) and data flow (small gray arrows) inputs and outputs in a fork and a join.

Figure 1. Dangling inputs and outputs in a process model are only visible in the advanced editing mode of Business Modeler

Unfortunately, dangling inputs are often the source of simulation errors or prevent the simulation from running at all, because an activity or gateway waits for some input that it can never receive. The fork and join in Figure 1 cannot execute due to their dangling inputs. Currently in Business Modeler, all branches of a gateway must have the same data inputs and outputs. The editor enforces this requirement, i.e., whenever you add an input or output to a gateway in some branch, Business Modeler automatically adds an input or output to all the other branches as well. It is not possible to have different business items associated with different branches. Thus, if you only connect some of the inputs and outputs, you immediately have dangling inputs and outputs that are not directly visible in basic editing mode, as Figure 2 shows. Only experienced users would notice the larger shapes for the input and output branches in the gateways that hint at the problem.

Figure 2. Dangling inputs and outputs are not visible in the basic editing mode of Business Modeler

Dangling outputs are less severe than dangling inputs because they usually do not prevent a process model from correctly executing. However, dangling data outputs show that a task or process produced some data, or data was involved in the branching modeled in some gateway, but this data is not used anywhere in the process.

The anti-pattern in Figure 3 summarizes the dangling inputs (circled in red) that you must avoid and dangling outputs (circled in orange) that you should avoid.

Figure 3. Anti-pattern: Avoid dangling inputs and outputs

A dangling input causes deadlocks if it is a control input of an activity or a gateway, a data input of a gateway, or a required data input of an activity. A data input is required if its minimum multiplicity is greater than zero. (A minimum multiplicity of zero means that the input or output is optional.) The input logic and output logic tabs in Figures 3 and 4 show the defined minimum and maximum multiplicities of the inputs and outputs. The pattern in Figure 4 summarizes how to correctly model the inputs and outputs of gateways and activities with a single input and output criterion.

Figure 4. Pattern: Correctly defining and connecting data inputs and outputs

To avoid a deadlock you must connect all required control inputs and data inputs. You should also connect all control outputs and required data outputs. It is recommended, but not required, to connect all the optional data inputs and outputs, i.e., those that have their minimum multiplicity set to 0. You should delete all non-required control inputs and outputs from an activity. Removing data inputs and outputs changes the data requirements, so it is not always possible to remove them. For example, the business item C is only connected as an input in the lower branch of the merge on the left-hand side of Figure 3. Since it is not connected as an output, i.e., it is not used by any subsequent activity, we removed it from the merge in the pattern in Figure 4.
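The rules above can be written down as a small check. The following Python sketch is only an illustration: the `Port` class, its fields, and the result labels are hypothetical names chosen for this example and are not part of Business Modeler. It classifies an unconnected input or output as one that must be fixed (it causes a deadlock), one that should be fixed, or one that is optional.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Hypothetical model of one input or output of an activity or gateway."""
    owner_kind: str            # "activity" or "gateway"
    direction: str             # "input" or "output"
    flow: str                  # "control" or "data"
    min_multiplicity: int = 1  # 0 means the port is optional
    connected: bool = False


def classify_dangling(port: Port) -> str:
    """Return 'ok', 'must fix', 'should fix', or 'optional' for a port."""
    if port.connected:
        return "ok"
    required = port.min_multiplicity > 0
    if port.direction == "input":
        # Control inputs, any data input of a gateway, and required data
        # inputs of an activity cause a deadlock when left dangling.
        if port.flow == "control" or port.owner_kind == "gateway" or required:
            return "must fix"
        return "optional"
    # Dangling outputs do not block execution, but control outputs and
    # required data outputs should still be connected.
    if port.flow == "control" or required:
        return "should fix"
    return "optional"
```

Under these assumptions, an unconnected data input of a gateway is always reported as "must fix", even with a minimum multiplicity of zero, while the same input on an activity is reported as "optional".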

Recommendations

  • Working in basic editing mode speeds up editing models, especially when creating models from scratch. However, before ending a modeling session you should switch to the advanced editing mode to find and clean up dangling inputs and outputs.

Reducing clutter in data-flow models

Can there be cases where dangling inputs and outputs make sense? Yes, because they can be a valid means to reduce clutter in models by showing only some selected flows. We see two possible modeling approaches where you can safely use dangling inputs and outputs without affecting the ability to execute the process model. Figure 5 shows the first approach, which uses connected control flow and puts all data inputs and outputs into separate input and output criteria.

In version 6 of WebSphere Business Modeler, you can also correctly generate BPEL code from such models.

Figure 5. Dangling inputs in a separate input criterion allow process execution

The intuitive idea behind this approach is that data is not flowing through the process, but tasks and subprocesses access data from data sources shared among the activities. The specified control flow determines the order in which the data is accessed. The separation of the connected control flow from the disconnected data inputs and outputs into separate input and output criteria ensures that the process can correctly execute along the connected control flow. Furthermore, all gateways only involve control flow, but no data. Data inputs and outputs are visible when the model is viewed in advanced editing mode; in basic mode only the control flow is visible.

Figure 6 shows the second approach, which uses only a single input and output criterion, but sets the minimum multiplicity of all disconnected inputs and outputs to zero.

Figure 6. Dangling inputs with minimum multiplicity set to zero represent optional inputs, which can remain unconnected in executable process models

This approach focuses on only showing selected data flows in a process model. No additional control-flow connections should occur when a data-flow connection already exists between activities and gateways. Furthermore, only a single business item traverses a gateway to keep the data flow simple. An advantage of this presentation is that gateways involve data flow so you can capture the data-based branching decisions. You could not model these decisions in the first approach, because it only showed the control flow. Using only a single input and output criterion makes editing the model easier. A slight disadvantage is that it can lead to a mixture of control flow with data flow involving different business items, which can make the models harder to understand than models that only show control flow with disconnected inputs and outputs. To further complicate matters, different stakeholders in the modeling project sometimes disagree on which data is the most relevant and which data can be considered optional.

Recommendations

  • To reduce clutter in complex data flow models, show disconnected data inputs and outputs. The disconnected inputs must be either put into a separate input criterion or marked as optional by setting their minimum multiplicity to zero to allow the process to execute.
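To make the two approaches concrete, here is a minimal Python sketch. It assumes a simplified execution rule that is our own reading of the text, not a documented Business Modeler algorithm: an activity can start when at least one of its input criteria has every required input connected. The `Input` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Input:
    """Hypothetical input of an activity inside one input criterion."""
    name: str
    min_multiplicity: int = 1
    connected: bool = False


def criterion_satisfiable(criterion: List[Input]) -> bool:
    """A criterion can be satisfied if every required input is connected."""
    return all(i.connected for i in criterion if i.min_multiplicity > 0)


def activity_can_execute(criteria: List[List[Input]]) -> bool:
    """Assumption: one satisfiable input criterion is enough to start."""
    return any(criterion_satisfiable(c) for c in criteria)


# Approach 1: connected control flow in one criterion, disconnected
# data inputs in a separate criterion.
approach_1 = [
    [Input("control", connected=True)],
    [Input("A"), Input("B")],
]

# Approach 2: a single criterion, disconnected inputs marked optional.
approach_2 = [
    [Input("A", connected=True), Input("B", min_multiplicity=0)],
]

# Anti-pattern: a required, disconnected input in the only criterion.
anti_pattern = [
    [Input("control", connected=True), Input("B")],
]
```

With these definitions, `activity_can_execute` returns `True` for both approaches and `False` for the anti-pattern, which mirrors why the disconnected inputs have to be separated or made optional.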

Multiple connections between activities

Complex control and data flows easily lead to multiple connections in process models, which are another source of cluttered models. Multiple connections (or multi-connections) all start in the same activity or gateway and all end in one other activity or gateway. These connections lead to unnecessary redundancy if the multi-connections only involve control flow. If the multi-connections are associated with the same business item, they can easily lead to modeling errors. Figure 7 shows an example.

Figure 7. A cluttered model due to multi-connections

The model in Figure 7 is very cluttered and hard to understand because of the control and data flow. Two control flow connections leave Task 1 and end in Task 3. Such a control-flow multi-connection between a source and target modeling element is redundant because it does not add any additional information to the model. Control only needs to flow once from a source element to a target element. Furthermore, if you have already drawn a data connection between the source and the target, you do not need to add any additional control-flow connection because data flow always implies control flow.

Notice also that business item A leaves Task 1 four times, while item B leaves this task two times. Item A flows to Task 2 and Task 3 once, while it flows twice to Task 4. Item B flows to Task 2 and Task 3. Such data multi-connections usually point to a modeling problem where users either tried to pass the same item to several activities or intended to express that two different instances of the item are passed.

Take for example a negotiation process where you exchange an offer and a counter offer. You have two options to capture the flow of these two offers correctly:

  • The first option is to give meaningful different names to the inputs and outputs of the tasks using the business item. In the graphical model, Business Modeler only shows the name of the business item. In the attributes view, we can see the names of the inputs and outputs to distinguish the purpose of the business item.
  • The second option is to define a business item template and associate several different business items with it. For example, you could define an offer template followed by two business items, initial offer and counter offer, that inherit their common attributes from the offer template. The second solution is more appropriate if indeed two different data objects flow through the process model that share a common set of attributes.
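The multi-connections described for Figure 7 can also be found mechanically. The sketch below is a hedged illustration in Python: the `Connection` tuple and the `find_multi_connections` function are hypothetical, and the connection list only re-encodes the flows named in the text, not the full figure.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Connection(NamedTuple):
    source: str
    target: str
    item: Optional[str] = None  # None means a pure control-flow connection


def find_multi_connections(connections: List[Connection]) -> List[str]:
    """Report redundant control flow and repeated business items."""
    findings = []
    counts = Counter(connections)
    data_pairs = {(c.source, c.target) for c in connections if c.item is not None}
    ordered = sorted(
        counts.items(),
        key=lambda kv: (kv[0].source, kv[0].target, kv[0].item or ""),
    )
    for conn, n in ordered:
        pair = f"{conn.source} -> {conn.target}"
        if conn.item is None:
            if n > 1:
                findings.append(f"{pair}: {n} control connections, one is enough")
            if (conn.source, conn.target) in data_pairs:
                findings.append(f"{pair}: control flow is implied by the data flow")
        elif n > 1:
            findings.append(f"{pair}: item {conn.item} is passed {n} times")
    return findings


# The flows named in the text for Figure 7.
figure_7 = [
    Connection("Task 1", "Task 3"), Connection("Task 1", "Task 3"),
    Connection("Task 1", "Task 2", "A"),
    Connection("Task 1", "Task 3", "A"),
    Connection("Task 1", "Task 4", "A"), Connection("Task 1", "Task 4", "A"),
    Connection("Task 1", "Task 2", "B"),
    Connection("Task 1", "Task 3", "B"),
]

findings = find_multi_connections(figure_7)
```

For this input the check reports the doubled control connection from Task 1 to Task 3, notes that control flow between these tasks is already implied by the data connections, and flags item A being passed twice to Task 4.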

Recommendations

  • Try to avoid mixing data and control flow in a model. Whenever possible, choose either a pure control-flow or a pure data-flow model. Do not use control-flow multi-connections.

Gateway form versus activity form with data flow

Let’s start by covering the correct usage of gateway and activity form when you use a data flow. You can use gateway form and activity form interchangeably for process models that contain only control flow, as we discussed in the Background section in Part 1 of this article. However, for process models with data flow the behavior is different depending on whether you use gateway form or activity form. To correctly capture complex data flows, you might need to mix both forms in a single process model.

As we pointed out earlier, Business Modeler currently requires that all incoming and outgoing branches of gateways, i.e., of fork, join, decision, and merge, must always have the same business items attached to them. In Business Modeler you cannot have different business items on different branches of a gateway. Consequently, we use gateway form if we want to model how the same, shared information is flowing along alternative branches (for a decision) or along parallel branches (for a fork). Figure 8 shows an example where items A and B flow into Task 2 and Task 3, while A and C flow out of them. Gateway form describes how the data flow gets routed based on the value of attributes of business items to alternative branches in the process model. You define output conditions for these decisions to capture in detail how these values determine the branch the item flows into.

Figure 8. Data flow shared along several branches is correctly modeled using gateway form.

If you need to model how different, unshared business items flow along the branches in a process model, you must use activity form. It is the only way to correctly capture how the data flow in a process branches based on the type of information, i.e., the business item. Figure 9 shows a model where unshared business items flow along several parallel branches.

Figure 9. Process model in activity form. Different types of unshared information flow along parallel branches

In Figure 9, Task 1 produces outputs A, B, and C that it routes in parallel to Task 2 and Task 3. Task 2 receives item A, while Task 3 receives items B and C. We used single output and input criteria (notice the absence of arrows below the inputs and outputs), because we model parallel branching where Task 1 acts as a fork, while Task 4 acts as a join bringing the different data flows together again.

Figure 10 shows the same data flow, but now flowing along alternative branches instead of parallel ones. This means that Task 1 acts as a decision, while Task 4 acts as a merge and you need to define two output criteria and their respective input criteria for these tasks.

Figure 10. Process model in activity form. Different types of unshared information flow along alternative branches

By defining different output criteria, we model that Task 1 provides alternative outputs, namely either item A or items B and C. Based on the output, either Task 2 or Task 3 executes. Again, you can only view this flow logic correctly when working in the advanced editing mode in Business Modeler. In the basic editing mode, the process models in Figures 9 and 10 look identical.

Now we can show the corrected model for the process in Figure 7. The most likely interpretation of this model is that the user wanted to show how shared information is passed on to several tasks. It is rather unlikely that Task 1 produces several copies of the item A as output. Consequently, the process should have been modeled using gateway form as Figure 11 shows. We removed the redundant control-flow connections and added a new business item, A-prime, to distinguish the two purposes of the item A.

Figure 11. A corrected model for the process from Figure 7 using gateway form

Recommendations

  • Use gateway form with decision, fork, merge, and join to model how shared information flows along several branches in the process and where branching takes place based on the value of attributes of business items.
  • Use activity form using input and output criteria, but no gateways, in models where data flow branches based on the type of the information and unshared business items travel along different branches.
  • Try to separate process fragments where data flow branches based on the business item from process fragments where data flow branches based on an item attribute to avoid the need to mix both forms.

Data-flow errors typically arise when data is flowing along several execution branches that can either capture alternative or parallel behaviors. This produces three modeling situations:

  • The same, shared data is passed along several branches in the process flow.
  • Different, unshared data is passed along the branches.
  • A mixture of shared and unshared data is passed (most complex).

We discuss each of these cases separately in the following sections.
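As a rough decision aid, the three situations can be told apart from the set of business items each branch works on. The following Python sketch is an assumption-laden illustration: `recommend_form` is a hypothetical helper, and it looks only at item names per branch, not at multiplicities or at how items are transformed inside a branch.

```python
from collections import Counter
from typing import Dict, Set


def recommend_form(branch_items: Dict[str, Set[str]]) -> str:
    """Suggest a modeling form from the business items used on each branch."""
    n = len(branch_items)
    if n < 2:
        raise ValueError("branching needs at least two branches")
    # Count on how many branches each business item appears.
    counts = Counter(item for items in branch_items.values() for item in items)
    if not counts:
        return "either form"      # control flow only, no data involved
    if all(c == n for c in counts.values()):
        return "gateway form"     # the same, shared data on every branch
    if all(c == 1 for c in counts.values()):
        return "activity form"    # different, unshared data per branch
    return "mixed form"           # some items shared, some branch-specific


shared = recommend_form({"upper": {"A", "B"}, "lower": {"A", "B"}})
unshared = recommend_form({"upper": {"A"}, "lower": {"B", "C"}})
mixed = recommend_form({"upper": {"A", "B"}, "lower": {"B", "C"}})
```

The three example calls correspond to the three situations in the list above and yield "gateway form", "activity form", and "mixed form" respectively.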

Passing shared data along several branches

Understanding the difference in the behavior of data-flow models using activity form or gateway form provides the foundation to investigate three data-flow modeling situations in detail and to discuss the typical modeling errors. In most of the cases, identical data-flow anti-patterns and patterns apply to parallel and alternative branching. Therefore, we concentrate on parallel branching and discuss alternative branching flows only if there is an interesting difference. We first discuss the situation where shared data must be routed along several branches.

The anti-pattern in Figure 12 shows a frequent error that we observed when using activity form to specify that shared business items flow along several branches.

Figure 12. Anti-pattern: Activities provide the same data outputs multiple times for using the same, shared data on parallel branches

Using activity form leads to a duplication of data in the inputs and outputs of activities. Task 1 would normally produce business items A and B as outputs. To route the items in parallel to two process fragments (visualized by the blue boxed areas), the modeler duplicated the outputs of Task 1. Similarly, the modeler duplicated the inputs of Task 4 as well. This is not only a bad modeling practice, it also changes the semantic meaning of the model because it adds additional business items and duplicates inputs and outputs of the activities. When you design activities for reuse, such a duplication is a strong limitation, because any reusing process must provide two As and two Cs as input to Task 4, or the task cannot execute. The pattern in Figure 13 shows the correct modeling solution for this situation.

Figure 13. Pattern: Use gateways for branching when using the same, shared data on several paths

For alternative branching, an identical pattern results where the fork is replaced by a decision and the join is replaced by a merge. You use gateway form for branching when shared data is used and produced along several branches. The process fragments that comprise these branches can of course modify the data. We can see in the pattern that item B enters both branches, but does not leave the blue process fragment. Instead, item C is provided on both branches as input to the join. Note again that you must provide the same item C on both branches for the join to be able to correctly execute.

The simulation in Business Modeler 6.0.2 currently shows that two As and two Cs leave the join in the pattern in Figure 13, which causes Task 4 to execute twice. So the simulation implements a semantics where items get multiplied by a fork and the join does not behave symmetrically, i.e., it does not undo the multiplication. The join behaves more like a merge on data flow models, so a lack of synchronization can occur, unless the modeler intended multiple execution of the process fragment following the join.

In the process model in Figure 13, you can prevent the lack of synchronization by setting the minimum and maximum multiplicity of the inputs A and C to 2, because Task 4 must synchronize two occurrences of each item. Unfortunately, this solution makes reusing Task 4 in other process models more difficult. Furthermore, computing the right multiplicities can be challenging in models with more complex fork-join structures.
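The observed behavior and the multiplicity workaround can be reproduced with a toy token count. The Python sketch below is not the Business Modeler simulator. It only encodes the behavior described above under simplifying assumptions: the fork copies every item to each branch, the join forwards every arriving item, and a task executes once for each complete set of its required inputs. All function names are hypothetical.

```python
from collections import Counter


def fork(items, branches=2):
    """Fork as observed in simulation: every branch gets its own copy."""
    return [list(items) for _ in range(branches)]


def fragment(items):
    """Hypothetical branch fragment: passes A on and turns B into C."""
    return ["C" if item == "B" else item for item in items]


def join_as_observed(branch_outputs):
    """Join that forwards all arriving items without undoing the copying."""
    arrived = Counter()
    for outputs in branch_outputs:
        arrived.update(outputs)
    return arrived


def executions(arrived, required, multiplicity=1):
    """Times a task fires if each run consumes `multiplicity` of each item."""
    return min(arrived[item] // multiplicity for item in required)


# Task 1 produces A and B; both branches return A and C to the join.
arrived = join_as_observed([fragment(branch) for branch in fork(["A", "B"])])

runs_default = executions(arrived, ["A", "C"])                  # multiplicity 1
runs_with_fix = executions(arrived, ["A", "C"], multiplicity=2)  # multiplicity 2
```

In this model two As and two Cs arrive at Task 4, so it runs twice with the default multiplicity and once when the minimum and maximum multiplicity of A and C are set to 2.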

Passing unshared data along several branches

When unshared data flows along different alternative or parallel branches, many users are tempted to use gateway form. This often leads to dangling inputs in a join or merge, which cause a deadlock. The anti-pattern in Figure 14 illustrates this error.

Figure 14. Anti-pattern: Passing unshared data along parallel branches using gateways causes deadlocks due to dangling inputs

The dangling inputs in the join prevent it from executing and thus block all activities succeeding it. This means Task 4 in the anti-pattern cannot execute, although the task has all its inputs connected. The dangling outputs of the fork are not causing an execution problem, but they lead to a process model where certain outputs are not used. Additional dangling inputs can occur in situations where process fragments within the blue frame produce additional data. For example, a new business item D is provided as output of some task within this process fragment and it replaces input item C.

Very often, you can correct this error by wiring all business items through all tasks, although these tasks do not need to access these business items. For example, items B and C would additionally flow through the upper parallel branch. There are several good reasons to not wire unnecessary data through activities. First, it makes reusing activities more difficult as they require additional input and output that not all reusing processes may be able to provide. Second, it exposes information to activities that do not need it, which can later cause security and performance problems when designing the implementing IT solution by closely following the process model.

The pattern in Figure 15 shows the correct solution using activity form:

Figure 15. Pattern: Use activity form for branching when different, unshared data is used on several paths

This pattern shows parallel branches. When modeling alternative branching flows that work on unshared data you need to correctly define the input and output criteria of the activities. In particular, the input criteria of activities that act as implicit merges must exactly match the alternative branches. If the criteria do not match the branches, deadlocks or a lack of synchronization can occur. If we change the pattern in Figure 15 to show two alternative branches, we need to define one input criterion for Task 4 that includes item A and a second input criterion that includes items B and D.

Passing shared and unshared data along several branches

The third situation covers the case where a subset of the data is shared on all the branches, but an individual branch works on data that is specific to this branch. Using only activity or gateway form cannot lead to a correct model. The anti-pattern in Figure 16 shows a situation where the upper branch works on item A, the lower branch receives item C and produces item D, but both branches work on item B. This model uses activity form. We notice immediately the problem of the duplicated item B in the output of Task 1 and the input of Task 4.

Figure 16. Anti-pattern: Using activity form to pass shared and unshared data along several paths leads to a duplication of inputs and outputs

The anti-pattern in Figure 17 shows that the gateway form leads to dangling inputs in the join that cause a deadlock.

Figure 17. Anti-pattern: Using gateway form to pass shared and unshared data along several paths leads to deadlocks caused by dangling inputs

The only solution is to mix both forms: Use gateways to route business items that the branches share, and use input and output criteria to route business items that are specific to a branch. The pattern in Figure 18 shows a solution that works for parallel branches.

Figure 18. Pattern: Bypassing gateways is possible in parallel flows

The shared items A and B branch through the fork and rejoin in the join. To avoid dangling inputs, the join should expect no other input. Item C is only needed as input for the lower branch. It passes through the fork, where it creates a dangling output in one of the branches. This is not an ideal modeling solution, but at least it allows the process to execute. Alternatively, C could bypass the fork and enter the blue process fragment directly. D must, however, bypass the join and enter Task 4 directly to avoid the deadlock.

Note that A flows through the fork and join, but it bypasses the process fragment in the lower branch of the fork, because it is not required by activities inside this fragment. A could bypass the fork and join, but it still needs to enter and leave the process fragment in the upper branch of the fork. It also needs to enter Task 4. We show in the pattern a selected variant of bypassing that eliminates the critical deadlock. Additional flows that bypass gateways are possible, but such a solution quickly leads to a cluttered diagram.

Bypassing forks and joins in parallel flows does not cause a deadlock, because all branches execute in parallel and thus, all business items always arrive through the connections, i.e., all tasks receive their inputs as specified. For example, Task 1 produces all its outputs and they are therefore available for Task 4. Alternative flows cannot guarantee the availability of inputs for an activity that expects these inputs from several alternative branches.

The anti-pattern in Figure 19 shows a process model with alternative branches where item D, which is only produced by the lower branch, bypasses the merge to enter Task 4 directly. D is a required input of Task 4.

Figure 19. Anti-pattern: Bypassing gateways of alternative branches can cause deadlocks

Unfortunately, D is not available if the upper branch executes. A required input must always be provided by all alternative branches. If an input is not required, but optional, two possible solutions would correct the input behavior of Task 4. The pattern in Figure 20 shows the first solution, where the minimum multiplicity of item D as input of Task 4 is set to zero.

Figure 20. Pattern: Inputs provided along only some of the alternative paths must be optional

The pattern in Figure 21 shows the second solution by defining several input and output criteria for Task 4 that correctly match the alternative branches. Task 4 has an input criterion requiring only business items A and B for the upper branch, while for the lower branch it has a separate criterion that includes business items A, B, and D.

Figure 21. Pattern: Input criteria must precisely match the data that flows along alternative paths

Both solutions enable Task 4 to execute correctly and independently of the branching decision in the decision gateway.
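The difference between the anti-pattern in Figure 19 and the two solutions can be checked with a few lines of Python. This is a minimal sketch under our own simplifying assumption that a task can execute on a branch if at least one input criterion has all of its required items available on that branch. The names `can_execute` and `deadlocking_branches` are hypothetical, and the branch contents are taken from the description of Task 4 above.

```python
from typing import Dict, List, Set

# One input criterion: business item name -> minimum multiplicity.
Criterion = Dict[str, int]


def can_execute(criteria: List[Criterion], available: Set[str]) -> bool:
    """True if some criterion has all of its required items available."""
    return any(
        all(item in available
            for item, minimum in criterion.items() if minimum > 0)
        for criterion in criteria
    )


def deadlocking_branches(criteria: List[Criterion],
                         branches: Dict[str, Set[str]]) -> List[str]:
    """Names of the alternative branches on which the task cannot start."""
    return [name for name, items in branches.items()
            if not can_execute(criteria, items)]


# Items that reach Task 4 when the upper or the lower branch executes.
branches = {"upper": {"A", "B"}, "lower": {"A", "B", "D"}}

required_d = [{"A": 1, "B": 1, "D": 1}]                   # anti-pattern
optional_d = [{"A": 1, "B": 1, "D": 0}]                   # first solution
two_criteria = [{"A": 1, "B": 1}, {"A": 1, "B": 1, "D": 1}]  # second solution

blocked = deadlocking_branches(required_d, branches)
```

With D required in a single criterion, the upper branch is reported as deadlocking. With D optional, or with one criterion per branch, no branch is reported. The sketch checks availability only; it does not verify that exactly one criterion matches each branch.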

Recommendations

  • When modeling complex data flows, take a systematic approach based on situations that distinguish whether shared or unshared data has to be passed along several flows. Then determine whether these flows occur as parallel or alternative branches in the process model. You can use activity form to capture the flows, but when branches share data, watch out for duplicated inputs and outputs of activities. Gateway form is a better solution for this situation.
  • When modeling alternative branches, pay attention to data that is only available on one of the branches. This data must be an optional input for any activity following the merging of the alternative branches.

Scenario 4: Modeling events and triggers

For the purposes of this article we make no semantic difference between events and triggers, and so will use the term events hereafter.

Very often, users want to model events and triggers in a process model. Business Modeler supports events in the context of business measures that you use to define key performance indicators for process monitoring, but not directly as first-class modeling elements in the process model itself. Currently, Business Modeler supports events in the form of a notification, which can be received via a notification receiver and broadcast via a notification broadcaster, i.e., it supports an abstract modeling of publish-subscribe communication. Currently you cannot model a point-to-point, event-based communication and capture how events flow through a process model.

Events as control flow?

In the models we studied, we observed that often control flow is used to capture events. However, the semantics of control flow in Business Modeler only defines an order of execution between activities—it does not carry any information like events usually do. Consequently, a modeling practice that captures events with control flow leads to various semantic problems. Figure 22 shows an example using control flow to capture the logic behind several initial and final events that occur in a process.

Figure 22. Complex event triggering logic incorrectly captured as control flow

Figure 22 uses three start nodes to represent three different events that can initiate execution of the process. The control connections from these start nodes to the subsequent gateways were named with events; however, these names are not visualized and can only be seen by clicking on the connection or opening the attributes view. This leads to a model where essential information about the events is not directly visible in the graphical representation. It also uses a merge and a join to capture the event logic. The user’s intention here was to describe a process that is triggered by a single event or by two events that must jointly occur. To express the event logic (event1 AND event2), the user introduced the upper two start nodes and connected them to a join. The lower start node represents the third event. The user connected this start node and the join to a merge to represent the event logic (event1 AND event2) OR event3.

However, the semantics of start nodes in Business Modeler specifies that all start nodes of a process model execute at once. In our example, all three event-representing control connections are triggered immediately and then the join executes. The merge executes twice, once when it receives control from the lower start node, and again when it receives control from the join. Consequently, Task 1 executes twice in all executions of this process. You can observe this behavior in the Business Modeler simulation. So the three alternative events that the user tried to capture with the three start nodes always occur together and are by no means alternative triggers for the subsequent task.
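The double execution of Task 1 follows directly from counting control tokens. The Python sketch below is a simplified illustration with hypothetical names, assuming that a join emits one token once both of its inputs have one, and that a merge forwards every token it receives.

```python
def task_1_executions(fired_start_nodes):
    """Count Task 1 runs for the model (event1 AND event2) OR event3.

    `fired_start_nodes` is the set of start nodes that emit a control token.
    """
    tokens_at_merge = 0
    # Join: emits one token only when both of its inputs hold a token.
    if {"event1", "event2"} <= fired_start_nodes:
        tokens_at_merge += 1
    # The third start node is connected directly to the merge.
    if "event3" in fired_start_nodes:
        tokens_at_merge += 1
    # Merge: forwards every incoming token, so Task 1 runs once per token.
    return tokens_at_merge


# Business Modeler semantics: all start nodes execute at once.
actual = task_1_executions({"event1", "event2", "event3"})

# What the user had in mind: only the third event occurs.
intended = task_1_executions({"event3"})
```

Because all three start nodes always fire together, `actual` is 2, whereas the alternative-trigger reading the user intended would give 1.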

The user captured the two final events that the task triggers after successful execution in a decision with two outgoing branches that directly end in a stop node. Stop and end nodes cannot pass any event information outside the process; they only stop the control flow. Therefore, using control flow to depict events is not a good idea.

The anti-pattern in Figure 23 generalizes this insight. Multiple start nodes directly linked to gateways are not suitable to capture any triggering logic of a process. Connecting several start nodes directly to a merge causes a lack of synchronization. You should replace a join that only has start nodes as input with a single start node that directly connects to a task or subprocess. Connecting all outgoing branches of a decision or fork directly to end or stop nodes shows that the decision or fork is unnecessary. A gateway should lead to branches that contain tasks or subprocesses. Usually, only one of the branches should capture a “do-nothing” case and directly link to an end or stop node. Scenario 5 goes into more detail on the difference in the semantics of the end and stop node.

Figure 23. Anti-pattern: A merge or join only preceded by start nodes, and a decision or fork only followed by end or stop nodes, is an error

Events as data flow

In our own practice, we use two alternative solutions to capture events. In process models that are not intended to be exported to the IT level, and where we want to capture the start and end events of the process but do not need to show the event flow, we use receivers and notification broadcasters. For more information on how to correctly use these modeling elements, see the Business Modeler documentation. In process models where we want to describe how information received through events flows between the activities in a process, we use business items and data-flow connections.

On the left, Figure 24 shows three different catalogues of business items that distinguish events, notifications, and “normal” business items from each other.

Figure 24. Event flow represented as a specific kind of business item flow

Each kind of information is associated with a different icon, which you can easily customize in Business Modeler. Notifications and business items are available as predefined catalogues, while the Events subcatalogue is user-defined. The process model fragment in this figure shows an example where Task 1 can execute if it either receives business item A together with event 3, or if event 1 and event 2 both occur. Task 1 sends business item B to Task 2 together with a complex event, which now flows through the process in the same way as other business items. Because we model events as specific kinds of business items, we can define attributes for them to capture in more detail what information they carry. You can also access this information when modeling decision conditions.
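The triggering logic of Task 1 can be written down as two alternative input criteria. The sketch below is only an illustration in Python and not a Business Modeler API: the `BusinessItem` class, the `TASK1_INPUT_CRITERIA` list, and the `satisfied_criterion` function are hypothetical names, and the event attributes are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessItem:
    """A piece of data flowing through the process; an event is one kind of it."""
    name: str
    kind: str = "business item"  # hypothetical kinds: "business item", "event", "notification"
    attributes: tuple = ()       # attribute pairs that describe what the item carries


# Assumed input criteria for Task 1 in Figure 24:
# (business item A AND event 3) OR (event 1 AND event 2).
TASK1_INPUT_CRITERIA = [
    {"A", "event3"},
    {"event1", "event2"},
]


def satisfied_criterion(criteria, available_items):
    """Return the first input criterion fully covered by the available items, or None."""
    names = {item.name for item in available_items}
    for criterion in criteria:
        if criterion <= names:
            return criterion
    return None


event1 = BusinessItem("event1", kind="event", attributes=(("source", "order form"),))
event2 = BusinessItem("event2", kind="event")

print(satisfied_criterion(TASK1_INPUT_CRITERIA, [event1]) is None)   # prints True
print(satisfied_criterion(TASK1_INPUT_CRITERIA, [event1, event2]) is not None)  # prints True
```

Treating an event as an ordinary item with attributes is what makes the criteria explicit: the alternatives are visible as data instead of being hidden in connection names.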

The pattern in Figure 25 illustrates the approach of using data flow to represent events. The process receives initial events via its input interface, while final events leave the process via its output interface.

Figure 25. Pattern: Events can be modeled as business items

Recommendations

  • Do not use control flow to model events and triggers in a process model. Either use the modeling elements that are provided to capture notifications, or represent events as a specific kind of business item flow.
  • Do not connect all inputs or outputs of a gateway or activity directly to start, end, and stop nodes.

Scenario 5: Correct termination of a process

Business Modeler offers two types of nodes to terminate a process, which are called the end and the stop node. The end node is visualized with a circle containing a cross, while the stop node is visualized with a circle containing a black dot. A stop node stops all activities and flows in the process model, so it terminates all executing branches within the whole process model, i.e., it leads to a “global shutdown” of the entire process.

If more than one branch executes at once, e.g., if the model contains some parallelism, the stop node always terminates all parallel branches. In contrast, the end node only has a local effect; it only ends the single branch through which it was reached. Given the more global effect of the stop node on the entire process, we have to be careful putting a stop node in process models that can have several branches executing in parallel.

Semantically, we can use both nodes interchangeably in models that definitely only have a single sequential execution, e.g., those models that do not use forks, inclusive decisions, cyclic connections, and branching output criteria. From a tool perspective, at least one stop node is required within every process, subprocess, and loop in Business Modeler 6.0.2. The simulation in Business Modeler 6.0.2 requires that every path in the process model ends in a stop node, i.e., the end node should rarely be used. The stop node is particularly important when simulating data flow models, because it is required to release the data. This means that a parent process can only receive data from a subprocess when a stop node is reached or when the advanced output logic (see the tab of the same name) of the subprocess is set to streaming, i.e., the subprocess releases data while still running.

Let’s explore the semantic difference between the end and the stop node in more detail, particularly the global shutdown effect of the stop node.

The stop node in parallel execution branches

Very often, users are not aware of the global effect of a stop node and use it to end each of the individual branches of a process. Figure 26 shows a typical example.

Figure 26. Stop nodes used in a process model with parallel execution branches

Immediately following the start node, we see an inclusive decision with two branches. The upper branch leads to a fork that causes Task 1 and Task 2 to execute in parallel. Both tasks connect directly to a stop node. The lower branch of the inclusive decision leads to a cyclic process fragment, where Task 3 iterates until a decision condition is satisfied. The "yes" branch of this second decision is directly connected to a stop node.

As soon as one of the branches reaches the stop node, the entire process terminates, even if tasks have not finished their execution. Such a “global shutdown” can sometimes be intended and will be correctly simulated. However, if we think of the IT implementation, this is most likely not the intended process behavior; instead, branches running in parallel should end individually after the tasks on these branches have finished correctly.
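The difference between the two termination nodes can be made concrete with a small timing model. The following Python sketch is an assumption-laden illustration, not the Business Modeler simulator: the function `run_parallel_branches`, the branch tuples, and the durations are hypothetical, and all branches are assumed to start at time 0.

```python
def run_parallel_branches(branches):
    """Simulate parallel branches that all start at time 0.

    Each branch is a tuple (task_name, duration, terminator), where the
    terminator is "end" (ends only this branch) or "stop" (ends the whole
    process). Returns the tasks that finished and the time the process ended.
    """
    stop_times = [duration for _, duration, term in branches if term == "stop"]
    # The earliest stop node to be reached shuts down the whole process.
    shutdown = min(stop_times) if stop_times else None
    finished = [
        name
        for name, duration, _ in branches
        if shutdown is None or duration <= shutdown
    ]
    end_time = shutdown if shutdown is not None else max(d for _, d, _ in branches)
    return finished, end_time


# Task 1 and Task 2 run in parallel, as in the upper branch of Figure 26.
with_stop = run_parallel_branches([("Task 1", 2, "stop"), ("Task 2", 5, "stop")])
with_end = run_parallel_branches([("Task 1", 2, "end"), ("Task 2", 5, "end")])

print(with_stop)  # prints (['Task 1'], 2)
print(with_end)   # prints (['Task 1', 'Task 2'], 5)
```

With stop nodes, the shorter task cuts off the longer one; with end nodes, each branch finishes on its own and the process ends when the last branch ends.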

A problem occurs if the initial decision is indeed inclusive and activates both branches. If it activates only the lower branch, no problem occurs because the cyclic process fragment only contains a sequential loop, i.e., Task 3 is executed repeatedly, but several instances of the task do not run at the same time. If the decision only activates the upper branch, you can still have a problem in the fork if Task 1 and Task 2 represent activities of different duration: when one of the tasks finishes, it also causes the immediate termination of the other task. When the inclusive decision activates both of its outgoing branches and Task 1 or Task 2 has a very short execution duration, this could in principle also cause the cycle to never execute, because as soon as the execution reaches one of the upper stop nodes, it terminates the whole process including the cycle. The anti-pattern in Figure 27 shows the problem of stop nodes used in parallel branches.

Figure 27. Anti-pattern: Stop nodes ending parallel branches always terminate the whole process even if they are only meant to end a single execution branch

If we replace the inclusive decision and fork with two exclusive decisions, the process would only have one sequential execution path. In this case, you could use end and stop nodes interchangeably. Without parallelism, the stop node has exactly the same effect as the end node, i.e., it ends the single branch through which it was reached.

Given the current requirements of the simulation, we recommend you use the stop node in models. You should be aware, however, that some parallel paths may not have finished when the stop node is reached during a simulation run. The BPEL export in Business Modeler maps the end and the stop node to an implicit end of the BPEL process. It does not generate an explicit global termination behavior in the BPEL for a stop node, e.g., via a BPEL terminate activity.

The pattern in Figure 28 summarizes our discussion and recommends that you rejoin parallel branches before adding a stop node to terminate the process.

Figure 28. Pattern: Stop nodes model a global shutdown behavior. You can safely use them when this behavior is intended, when only a single sequential execution occurs, or when parallel branches rejoin before reaching a single stop node.

Recommendations

  • Use the stop node when you want to model a “global shutdown” of a process.
  • Rejoin parallel branches with a join node and then place a single end or stop node, instead of ending parallel branches individually.

Data output upon termination of a process

Finally, let’s look at the process boundaries and the inputs and outputs of a process. Very often, processes receive data as inputs and produce data as outputs similar to activities inside the process model. Consequently, you can define input and output criteria for the process to represent the process interface.

Figure 29 shows an example of a process for which inputs and outputs are defined.

Figure 29. Input and output data of a process, combined with start and stop nodes

The process receives item A as input, which it passes on to Task 1. The task also has a start node; however, this node is optional and does not change the execution semantics of the task, which can only begin execution when the input is available. The additional start node visualizes the process beginning more clearly. The process produces items A and B, or A and C, as outputs. A is an output of Task 4, B is an output of Task 2, and C is an output of Task 3. Item D is an output of Task 3 that is not used by any other task and not provided as an output of the process. It is therefore connected to an end node. Using a stop node instead would not be a correct solution, because it would immediately end the whole process and prevent execution of Task 4. Item E is an output of Task 4 and is connected to a stop node, because it is not used by any other activity in the process. This use of the stop node is correct, because Task 4 is the last executing task of this process and no other activities execute in parallel. The process interface shows two alternative output criteria to match the alternative outputs. The anti-pattern in Figure 30 summarizes typical problems of process interfaces that do not match their incoming flows.

Figure 30. Anti-pattern: Process output interfaces that do not match their incoming flows lead to nondeterminism (on the left) or to a deadlock (on the right)

You would define a process interface the same way you define the inputs and outputs of activities within the process. The process interface definition needs to model the different execution branches that can occur and lead to different combinations of business items that can reach or leave an activity. Scenario 3 discussed this in detail. A possible source of errors is the mismatch between the input and output criteria of the process and its possible execution branches. It is important to link alternative branches to alternative input and output criteria.

Figure 30 shows two parallel branches on the left that provide outputs A, B, and C in parallel. The process output interface only expects a single output in each output criterion, which means that the process internally decides which data to release via its output interface, and this data can differ each time the process runs. A reusing process must therefore be able to handle any of the possible outputs. On the right, we have two alternative branches that provide either A, B, or C as output of the process. The two alternative branches result from the decision. In addition, Task 2 has B or C as alternative outputs. The output interface expects all three business items in a single output criterion. In this situation, a reusing process would never receive all the data specified in the output interface of the reused process: the process fragment shown on the right of the anti-pattern deadlocks because it can never release all the required data. The correct solution is to match the single output criterion on the right with the process fragment on the left and vice versa. Note also that the anti-pattern does not include a stop node, which is required to release the data in a simulation run.
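The two mismatches can be expressed as a simple set comparison between what a run produces and what each output criterion expects. The Python sketch below is a simplified model under stated assumptions, not a Business Modeler validation rule: the function `check_output_interface`, its three result labels, and the criteria lists are hypothetical and only mirror the description of Figure 30.

```python
def check_output_interface(produced, output_criteria):
    """Classify how the items produced by one run relate to the output criteria.

    produced: set of business item names available when the run ends.
    output_criteria: list of sets, each one an alternative output criterion.
    """
    satisfiable = [criterion for criterion in output_criteria if criterion <= produced]
    if not satisfiable:
        return "deadlock"          # no criterion can ever be completely filled
    if len(satisfiable) > 1:
        return "nondeterministic"  # several criteria compete to release data
    return "ok"


# Left side of Figure 30: parallel branches produce A, B, and C together,
# but every output criterion expects a single item.
LEFT_CRITERIA = [{"A"}, {"B"}, {"C"}]
# Right side of Figure 30: alternative branches produce one item per run,
# but the single output criterion expects all three.
RIGHT_CRITERIA = [{"A", "B", "C"}]

print(check_output_interface({"A", "B", "C"}, LEFT_CRITERIA))   # prints nondeterministic
print(check_output_interface({"B"}, RIGHT_CRITERIA))            # prints deadlock
print(check_output_interface({"A", "B", "C"}, RIGHT_CRITERIA))  # prints ok
print(check_output_interface({"B"}, LEFT_CRITERIA))             # prints ok
```

The last two calls correspond to the corrected pairing: the parallel fragment matches the single criterion with all three items, and the alternative fragment matches the three single-item criteria.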

Each branch in a process must begin with a start node or by receiving data as input. It is also a good modeling practice to end each branch of a process either with a termination node or by providing data as output. To increase the well-formedness of process models, we recommend you rejoin branches opened by forks and decisions using joins and merges when they use some shared data. We discussed this in detail in Scenario 3.

Sometimes, activities within a process have output data that is not part of the process output. This output often remains unconnected and becomes a dangling output. An alternative is to connect this output to an end or stop node to emphasize that the process has been completely modeled. For this purpose, Business Modeler provides an asymmetry between the start node and the end and stop nodes with respect to data flow. A start node can only issue control flow, while stop and end nodes can also receive data flow.

The pattern in Figure 31 shows correctly matching process output interfaces and a correctly placed stop node that does not lead to an unintended global shutdown. However, the process will not release its data in the simulation because the parallel branches involving Task 2 and Task 3 do not end with a stop node.

Figure 31. Pattern: Process output interfaces must correctly match their incoming flows

The pattern in Figure 32 shows the further improved model. Note that we added a join to rejoin the two parallel branches so that they can lead to a single stop node. The end node to which Task 3 previously connected is no longer needed. No unintended shutdown can occur and all data output is correctly released.

Figure 32. Pattern

Recommendations

  • Process interfaces must correctly match the data flows of the process.
  • Terminate each process branch with an end or stop node.
  • End branches that release data with a stop node. Rejoin them before adding the stop node to avoid an unintended global shutdown.
  • Connect data output of activities that are not released to the process output interface to an end or stop node to avoid dangling outputs.

Scenario 6: Reuse of activities in hierarchical process models

So far, we have discussed subprocesses and tasks from the perspective of a single process model, i.e., from a local point of view. However, different processes can share the same activities, so we want to increase their reuse in a process model as much as possible. To help with reuse, Business Modeler lets you define tasks and subprocesses as global modeling elements directly in the project tree, so that you can drag and drop them into other process models for reuse.

Figure 33 shows a composed process that reuses other global subprocesses and global tasks.

Figure 33. A detailed view of a hierarchical process model

Figure 34 shows a project tree on the left and a compact view of the hierarchical levels of the composed process on the right.

Figure 34. Global tasks and processes defined in the project tree and reused at several levels of a process

We have implemented Business Modeler plug-ins to visualize the decomposition hierarchy and the reuse of activities. Interested readers should contact the authors to inquire about the availability of these plug-ins.

The project tree contains the definitions of the global processes and global tasks available for reuse. The compact view of the decomposition hierarchy of the Main Process lets us immediately see where processes and tasks are located on the hierarchy levels. Whenever possible, you should define each activity only once and thus implement it only once. The definition of this activity can subsequently be reused by any other process.

Both figures show the four levels of the example process. On the top level, we find the Main Process, which reuses Subprocess 1 and Global Task 1. Subprocess 1 reuses Global Task 2 and Subprocess 2. Subprocess 2 reuses Subprocess 3. The reused subprocess is followed by a decision with two branches. Both branches reuse Global Task 3. As this task occurs twice in Subprocess 2, the tool distinguishes its two occurrences by adding a “:2” to the second occurrence of this task. The upper branch of the decision reuses Global Task 1, which Main Process also reuses. Finally, Subprocess 3 invokes Main Process again.

Since we focus on common modeling errors in this article, we do not want to discuss aspects of a good hierarchical decomposition, but highlight what can go wrong when you compose a process reusing subprocesses and global tasks. The example above illustrates two sources of errors. First, we can see that Main Process occurs twice in the hierarchical composition, which means that it reoccurs in its own decomposition. A process that occurs within its own decomposition hierarchy leads to a recursive refinement. For our example process, the recursion is infinite; Subprocess 3 always invokes Main Process again. In a proper recursive process definition, Subprocess 3 would contain a decision with two alternative branches, where one branch covers the recursive invocation, while the other leads to a stop or end node.

Infinite decompositions can easily occur in larger composite processes, where maintaining a global view of the model becomes difficult. Usually, users do not intentionally create recursive process models. They result from dragging and dropping processes or tasks on different hierarchical levels of the process. It’s very easy to create reuse chains as in Figure 33 where a lower-level process suddenly reuses one of its “parent” processes. To avoid such recursive refinements, you should group reusable processes into several abstraction levels. Processes at a higher abstraction level should only be refined with processes from a lower abstraction level, and the lowest level should only contain global and local tasks.
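A recursive refinement like the one in the example can be found mechanically by following the reuse chains and checking whether a process reappears on its own chain. The following Python sketch is one possible way to do this and is not the plug-in mentioned above: the `REUSES` table is a hand-written, hypothetical encoding of the hierarchy described for Figure 33, and `find_recursion` is an illustrative depth-first search.

```python
# Hypothetical encoding of the example hierarchy: each process maps to the
# activities it reuses. Global tasks have no entry because they reuse nothing.
REUSES = {
    "Main Process": ["Subprocess 1", "Global Task 1"],
    "Subprocess 1": ["Global Task 2", "Subprocess 2"],
    "Subprocess 2": ["Subprocess 3", "Global Task 3", "Global Task 3", "Global Task 1"],
    "Subprocess 3": ["Main Process"],
}


def find_recursion(reuses, root):
    """Return the first reuse chain that leads back to a process on it, or None."""

    def visit(node, path):
        if node in path:
            # The node reoccurs within its own decomposition.
            return path[path.index(node):] + [node]
        for child in reuses.get(node, []):
            cycle = visit(child, path + [node])
            if cycle:
                return cycle
        return None

    return visit(root, [])


print(find_recursion(REUSES, "Main Process"))
# prints ['Main Process', 'Subprocess 1', 'Subprocess 2', 'Subprocess 3', 'Main Process']

# Without the invocation of Main Process in Subprocess 3, no chain is reported.
fixed = dict(REUSES, **{"Subprocess 3": []})
print(find_recursion(fixed, "Main Process"))  # prints None
```

Note that this check reports every recursion, including an intended one with a proper exit branch, so a reported chain is a prompt to inspect the model and not necessarily an error.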

A second modeling error that we observed results from the reuse of the same activity within a process model. For example, Global Task 3 occurs twice in the refinement of Subprocess 2. You should always examine multiple occurrences of activities, because they can indicate that you did not adequately capture the control flow. It can be argued that the decision should be placed after Global Task 3 and that this task should only occur once in the process model. Figure 35 shows the corrected Subprocess 2.

Figure 35. The corrected subprocess with only a single occurrence of the reused global task

Multiple occurrences of an activity do not necessarily lead to a process execution error, but they can affect the readability of the model. A frequent example of redundancy occurs when users approximate iterative behavior. Instead of drawing a cycle, they add a decision and place repeated activities multiple times in the process model. This technique enables a business analyst to separate repeated paths in a process, and it facilitates the analysis of the process model in a business scenario. However, in an implementation scenario, the alternative sequential paths do not correctly capture the behavior to implement, which makes it impossible to generate meaningful BPEL out of these models. A better solution would be to place the repeating activities within a cycle, as we discussed in Scenario 2 in Part 1 of this article. You can create the cycle by either using a merge followed by a decision, or by using the loop modeling elements available in Business Modeler.

Finally, reusing activities in process models with data flow requires a careful examination of the inputs and outputs of the reusable activities. Figure 36 shows a Global Task that provides alternative input and output interfaces for reuse.

Figure 36. Input and output criteria of a global task that specify alternative task interfaces for reuse

The task executes when it receives business items A and B, or C and D, or B and C as inputs. We defined these possible combinations in three different input criteria. As an output, the task provides business item A, or B, or C. Each single business item is thus placed in a separate output criterion. The definitions of the input and output criteria are part of the activity definition; they cannot be changed in any process model that reuses this task. You can only add additional control flow inputs and outputs to an activity after dragging and dropping it into a reusing process model.

To correctly reuse a task or subprocess, each reusing process must provide the inputs for at least one of the input criteria, which means that you must connect these inputs to other activities in the reusing process model. The reusing process should also be able to handle the possible outputs of the reused activity. Ideally, you should connect all outputs of all output criteria. If an output cannot be connected, because it cannot be used by the reusing process, you could connect this output to an end or stop node and add an additional control-flow connection from the reused activity to other activities further downstream in the process model.

In our example, the reusing process must be able to handle the alternative outputs A, B, or C. We need subsequent process fragments in the reusing process to receive these business items as separate inputs. If a specific output occurs depending on a specific input (for example, when business item A is produced as an output only if A and B are provided as an input) then the reusing process only needs to be able to handle those outputs.
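The two reuse conditions, at least one input criterion fully provided and every possible output handled, can be checked with a few set operations. The Python sketch below is an illustration under assumptions and does not reflect any Business Modeler API: `INPUT_CRITERIA` and `OUTPUT_CRITERIA` encode the global task of Figure 36 as described above, and `check_reuse` is a hypothetical helper that does not model outputs that depend on specific inputs.

```python
# Input and output criteria of the global task in Figure 36.
INPUT_CRITERIA = [{"A", "B"}, {"C", "D"}, {"B", "C"}]
OUTPUT_CRITERIA = [{"A"}, {"B"}, {"C"}]


def check_reuse(provided_inputs, handled_outputs, input_criteria, output_criteria):
    """List the problems a reusing process has with a reused activity.

    provided_inputs: items the reusing process connects to the activity.
    handled_outputs: items the reusing process consumes or routes to an
    end or stop node.
    """
    problems = []
    if not any(criterion <= provided_inputs for criterion in input_criteria):
        problems.append("no input criterion is fully provided")
    for criterion in output_criteria:
        missing = criterion - handled_outputs
        if missing:
            problems.append(f"unhandled outputs: {sorted(missing)}")
    return problems


good = check_reuse({"A", "B"}, {"A", "B", "C"}, INPUT_CRITERIA, OUTPUT_CRITERIA)
bad = check_reuse({"A"}, {"A"}, INPUT_CRITERIA, OUTPUT_CRITERIA)

print(good)  # prints []
print(bad)
```

In the second call, item A alone satisfies none of the three input criteria, and outputs B and C have nowhere to go, so all three problems are reported.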

Recommendations

  • To make hierarchical models more readable, group reusable subprocesses based on identical abstraction levels, and only refine processes at one abstraction level with processes from a lower abstraction level.
  • A hierarchical process decomposition should not contain the same process at different levels unless you intend to model a recursive refinement of the process. In that case, add a reachable exit branch to stop the recursive invocation.
  • Examine multiple occurrences of the same reused activity for redundancy and possible improvements of the control flow of the process. When reusing an activity in a process model containing data flow, make sure the inputs and outputs of the activity match the possible data flows that can involve the reused activity within the reusing process model.

Conclusion

In this article, we investigate typical modeling errors that we extracted from hundreds of real-world process models drawn in different business process modeling tools over the last two years. The modeling errors are grouped into six common modeling scenarios. In Part 2 of this article, we address the following scenarios: the modeling of data flow, the modeling of events and triggers, the correct termination of a process, and the reuse of activities in hierarchical process models.

For data-flow modeling, we provide users with a systematic approach where they evaluate whether they’re passing shared or unshared data along several alternative or parallel branches. We also give hints on how to avoid clutter in data flow models. Finally, we explain how to terminate a process correctly and how to pass data from a terminated process.


Acknowledgments

We would like to thank our colleagues Thomas Gschwind, Jochen Küster, Cesare Pautasso, Ksenia Ryndina, Michael Wahler, Olaf Zimmermann, and IBM practitioners for their valuable comments on draft versions of this article. We also want to thank the numerous colleagues who sent their models to us that we analyzed for this work.
