Part 1 of this series described how service-oriented architectures (SOAs) mature through a number of stages as recognized by the Service Integration Maturity Model (SIMM), and how this affects the type of solutions you implement in WebSphere Process Server (hereafter called Process Server) and WebSphere Enterprise Service Bus (ESB).
Part 2 looked specifically at how services are exposed (SIMM level 4), and what purpose the ESB Gateway performs in this exposition.
In this third article, we will make the risky assumption that a well-matured set of services has already been exposed, and that our task now is purely to create processes that allow us to string these services together, sometimes in combination with human tasks. It is a risky assumption because most organizations are still maturing their services, but for now, it will help us to focus on the process aspects of the solution.
In the SIMM model (Figure 1), this is classed as service composition (SIMM level 5). However, at this level, we encounter a range of other terms, such as orchestration and choreography, that are defined later in the article.
Figure 1. The Service Integration Maturity Model (SIMM)
We will look at the most common styles of solution for this maturity level, and categorize them into what we will call process implementation types (Figure 2).
Figure 2. The common process implementation types
Here's a basic description of each type:
- Graphical user interface (GUI) intensive process: The navigational flow and data aggregation happen "at the glass", rather than in a BPEL process in Process Server. This is typically a single-user flow, though collaborative GUIs may transfer the flow between users. Process Server is not involved in the process navigation, although it may be used in the background to provide swiftly responding synchronous services to the user interface (UI). GUI intensive processes are also less linear, allowing the user to hop backwards and forwards.
- Synchronous transactional process: This is a typical use of short running BPEL processes. It is often used for real-time responses to graphical user interfaces, or for transactional sub-processes.
- Asynchronously initiated transactional process: The caller waits only for the one-way request to be accepted. The actual transactional process occurs in the background. It is used to improve apparent response time where the caller only requires an acknowledgment, as opposed to a completion.
- Briefly persisted process: This is a special use of a long running process where the process completes relatively swiftly. The process lifespan is deliberately short (seconds, maybe minutes), such that process versioning issues can be avoided. No human tasks or in-process events may be present. The process is commonly used to manage parallel aggregation. The process must be easily quiesced without significant business impact to allow new versions to be deployed.
- Versioned long running process: A true long running process that will last a relatively long time (days, weeks, and so on – notice that "hours" are the grey area between this type and briefly persisted processes). Process instances will always be present in the system, so complex issues of process versioning must be taken into account. It can contain human tasks and complex error handling, such as compensation. It may receive in-process events.
- Task based process: This is used to balance multiple tasks between a number of different users, possibly in different teams or departments. It requires long lived processes, and hence you must consider process versioning issues also.
These are considered high-level patterns since there are multiple concrete implementations for each one. The remainder of this article describes each in more detail.
Remember the assumption we made earlier – that we have a range of maturely exposed services at our disposal. It is an important assumption, which is why we are drawing your attention to it again. We will return to it in the next article and consider how we assess and improve a service. For now, let's put that aside and look at how we typically string mature services together.
A typical use of a short running BPEL process (Figure 3) is to string together service invocations within one transaction, or where invocations are not transactional, but the current state of the process need not be retained in the case of a failure. Short running processes are also preferred for high performance since they do not need to persist state between steps of the process.
Figure 3. Properties setting to make a BPEL process short-running
The following lists the primary characteristics of this type of process:
- It performs lightweight sequences of service invocations and any associated logic.
- The whole process is fully transactional (Figure 4). For example, the process can receive or initiate a global transaction and can propagate and control further downstream transactional resources. It may call some non-transactional systems during the process, but internally, the process is always a single transaction.
- The process is often used to expose services providing real-time responses to graphical user interfaces, or for transactional sub-processes.
- It is not essential that the process interacts transactionally with backend systems. It can also be used for a process calling multiple non-transactional services where the current state of the process need not be retained in a failure; for example, aggregation of multiple sources of data.
Figure 4. Transaction boundaries in a synchronous transactional process
An example use of this process implementation type is a travel booking system that confirms reservations of hotels, flights, and car rentals across a number of transactional systems. This system responds in real-time to the customer to confirm the booking.
Since we are dealing with a short running BPEL process, the following features are not available to us:
- Human tasks may not be present in the process. Business exceptions must be handled in-process or, for performance, handed back to the user as an exception code.
- The process cannot receive in-process events, such as event handlers or mid-process "Receive" activities.
- Parallel processing is not possible. Be aware that parallel scopes run in serial in a short running process.
- Avoid using compensation. Simplistic compensation can still be defined on individual activities, but try to avoid it to improve response time on error.
A typical makeup of a synchronous transactional process is shown in Figure 5.
Figure 5. Example of a synchronous transactional process
The following relate to the numbered points in Figure 5:
- While you can use constructs such as “For Each” to good effect, remember that this is a short running process running on a single thread, so there is no benefit when using it to process items in parallel.
- How many times will the invoke be called inside the loop? If you only find out at runtime, how can you be sure that the total aggregated time is not longer than the globally set transaction time for Process Server of 120 seconds? You need to consider these important edge cases in the design.
- Since short running processes can be replaced relatively easily, business logic may be coded directly into the process. For the longer running processes described later, it might be wise to separate it out into business rules or sub processes.
- Invocations to transactional systems will, by default, be performed in the same transaction as the process. If you need to alter this behavior, adjust the transaction related SCA qualifiers.
- If you have to make updates to a non-transactional system, it is best to do that last. If you have any problems during the transactional updates, they can all be rolled back without having to worry about how to compensate the non-transactional system.
- This is a synchronous request, so the caller expects a populated response.
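The loop-budget concern in point 2 above can be made concrete with a simple design-time check. The following is an illustrative sketch in plain Java (the class and method names are invented for this example; 120 seconds is the default server transaction timeout mentioned above):

```java
// Hypothetical design-time helper: checks whether a loop of serial service
// invocations can fit within the server's global transaction timeout.
public class TransactionBudget {
    // Default global transaction timeout in Process Server (seconds).
    static final int TRANSACTION_TIMEOUT_SECONDS = 120;

    // Worst-case aggregate time for 'iterations' serial invokes,
    // each taking up to 'worstCasePerInvokeSeconds'.
    static int worstCaseSeconds(int iterations, int worstCasePerInvokeSeconds) {
        return iterations * worstCasePerInvokeSeconds;
    }

    // True if the loop leaves a safety margin (here, 25%) under the timeout.
    static boolean fitsWithinTimeout(int iterations, int worstCasePerInvokeSeconds) {
        return worstCaseSeconds(iterations, worstCasePerInvokeSeconds)
                <= TRANSACTION_TIMEOUT_SECONDS * 3 / 4;
    }

    public static void main(String[] args) {
        // 20 items at up to 3 seconds each: 60s, comfortably inside the margin.
        System.out.println(fitsWithinTimeout(20, 3));
        // 50 items at up to 3 seconds each: 150s, exceeds the timeout outright.
        System.out.println(fitsWithinTimeout(50, 3));
    }
}
```

The key point is that the upper bound on iterations must be known, or enforced, at design time; if it can only be discovered at runtime, the design needs a guard.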
Performance is of primary importance, typically in terms of the response times of the transaction if these are user initiated requests. Many of the following points relate to maximization of response time performance:
- Only actions absolutely necessary for the response are performed within the synchronous process to ensure the fastest possible response time. Any remaining actions are pushed out into a separate process, such as an “Asynchronously initiated transactional process”.
- Error handling is focused around translating system errors to useful defined fault codes that can be interpreted by the calling system. If error handling is automated, it is typically pushed offline into a separate process so that the user is not waiting for compensating actions to be performed.
- Since invocations are performed serially, the aggregated response times of all backend systems must be considered when assessing the feasible service level agreement for the process.
- Since compensating actions are not advised in this usage type and human tasks are not available to us, you must fully understand the level of transactionality of downstream systems. Where they are non-transactional, assess carefully the rare, but important, error conditions where “in-doubt” transactions may occur.
When building an infrastructure on which this type of process can run, consider the following implications:
- From an infrastructural point of view, no state is held. If this were the only process implementation type used on the topology, it would greatly simplify the clustering considerations around JVM and database tuning and disaster recovery. You may not need a separate messaging cluster. However, it is likely that this usage will share a server infrastructure that does have state requirements for other (for example, long running) processes. Of course, you can dedicate a cluster to short running processes only if you really want isolation from any effects of the asynchronous load.
- As this usage type centers around providing services for use by synchronous consumers, you can assume that the performance characteristics focus on response times, rather than throughput. This has implications in terms of the sizing of the infrastructure. Significant headroom in processor power will be required if you want to ensure the same response time even during peak periods.
- With the focus on live users interacting with the system, you need a topology that provides high availability during all hours that the system is online to the users.
- Even though this process implementation type provides "state free" services, do not forget that the processes may be performing two phase transactions with the external systems with which they communicate. If a server crash occurs, the state of the transaction logs must be preserved. This is, of course, true for all Process Server and WebSphere ESB installations, and any two phase transaction coordinator.
Note that a mediation flow component is an alternative implementation for this pattern, although it has some key limitations. This is a deep topic, partly relating to differing features between mediation flow components and BPEL, but also relating to the thorny question of the difference between integration logic and business logic. This topic will be covered in a future article.
This is a process implementation type where the caller need not wait for the request to complete, so the process is triggered with an asynchronous one-way request. The actual transactional process occurs in the background. It is used to improve apparent response time where the caller only requires an acknowledgment, as opposed to a completion.
Technically, this is a short running BPEL process initiated by a one way asynchronous call (Figure 6) using either the internal SCA messaging transport, a JMS provider, WebSphere MQ, or perhaps a protocol such as Web Services Reliable Messaging.
Figure 6. Transaction boundaries of an asynchronously initiated transactional process
The following are the primary characteristics of this type of process:
- This is a process that completes in a single transaction, but one that takes longer than the consumer is prepared to wait for (Figure 7).
- The consumer of the process requires nothing more than confirmation that the action will occur at some point in the (near) future. For example, they do not require any “new” data in the confirmation response.
Figure 7. Example of an asynchronously initiated transactional process
The following relates to the numbered points in Figure 7:
- Asynchronously initiated processes are typically used to perform background work. They are often initiated with a one way interface.
- Since the caller is not waiting synchronously for a response, it is acceptable for the aggregate time for all the invocations in the process to be higher than is acceptable for a synchronous transactional process. However, you are still performing everything within one transaction, so you must ensure the process completes well within the transaction timeout of the server (typically 120 seconds).
- If an error occurs, there is no caller to pass the message back to. In fact, errors end up in the Process Server “failed event manager” and need to be resolved by an operations team. You must ensure that only system errors, such as connectivity problems, end up there. Business errors are passed to an exception handling mechanism that is visible to the business. This can be anything from a separate process containing human tasks, to just an exception report viewed by the business.
- Note that due to the typical use of a one way interface for initiation, this process type does not usually have a "Reply" activity in the BPEL.
An example is a solution where a user requests a new book via an online bookstore. The request is complex, involving many interactions and taking around 30 seconds to complete, but it can be fulfilled in a single transaction. The user only needs to know that the book will eventually arrive in the post. The user, therefore, receives an immediate confirmation and does not need to wait for the whole request to be processed.
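The essence of this pattern can be sketched in plain Java, outside any BPEL tooling. The OrderService class and its method names below are hypothetical; the point is only that the caller receives an immediate acknowledgment while the actual work runs in the background:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of asynchronous initiation: the caller gets an immediate
// acknowledgment; the (possibly slow) transactional work runs in the background.
public class OrderService {
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    // The caller gets back only an acknowledgment, not the result of the work.
    public String submitOrder(String bookTitle, Runnable fulfilment) {
        background.submit(fulfilment);   // one-way hand-off; caller does not wait
        return "ACCEPTED";               // apparent response time stays low
    }

    public void shutdown() throws InterruptedException {
        background.shutdown();
        background.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        OrderService service = new OrderService();
        // The fulfilment work may take far longer than the caller would wait.
        String ack = service.submitOrder("SOA Patterns", () -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            System.out.println("order fulfilled in background");
        });
        System.out.println(ack);  // printed immediately
        service.shutdown();
    }
}
```

In Process Server, the equivalent decoupling is provided by the one-way asynchronous binding (SCA messaging, JMS, or WebSphere MQ) rather than an in-memory thread pool, but the contract with the caller is the same.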
The primary issues to consider when designing this type of process are:
- At its core, this shares many of the design characteristics of a synchronous transactional process, but typically with less response time performance demand since there is no user awaiting a response. The SLA is likely to be driven by throughput rather than the transaction duration. However, be aware that there is a global limit to the maximum transaction time on the server of 120 seconds. Aim for an overall transactional process to complete well within this, such as within a few tens of seconds at most.
- In terms of error handling, the process needs to ensure that, once the request is received and confirmation sent to the caller, the business event is never lost and is always resolved to completion in some form. Any failed request must be stored and visible to operators so that it is guaranteed to be resolved at a future time, either manually or autonomically. By default, in the typical implementation, faults will go to the Failed Event Manager. Enough information must be present in the request for it to be resolved without the original consumer present – or at least so that you can contact the caller. You need to consider who will be doing this resolution. The Failed Event Manager is a tool used by IT operations. If the customer needs to be contacted, that is done by a call center representative. You need to think about how they will pass information about the failed event between them. In short, you need to be clear about the differing roles of process administrator and IT operator, and how they each interact with the error.
Alternate design options to consider:
- Note that a mediation flow component is an alternative implementation for this pattern, although it has some key limitations. As noted earlier, we will cover this in a separate article.
- In its most simplistic form, the user needs only a confirmation that the message is received. However, it is often the case that the user needs some key piece of data, such as a reference number, which then forces you to make at least one invocation before you can respond. In these cases, you typically chain a synchronous transactional process with an asynchronously initiated transactional process. This allows you to do the essential work in the first process, and then the background work in the second process. Note that the first process checks for the errors that the caller needs to fix.
- We have mainly focused on an implementation using a short running process, but of course, anything can be asynchronously initiated. Other implementations can use a mediation flow component, or a long running process to perform the work. The key point of this pattern is that just because something can be done in one unit of work does not mean the user should wait for it.
- Anti-pattern variant: Beware of promising what you cannot guarantee to deliver. Imagine the case of booking an airline seat. The user may get an acknowledgement that they read as “your seat is booked”, but by the time the asynchronous request is fulfilled, the seat might have been taken by a competing request.
Consider the following implications:
- As we have an asynchronous requirement using messages, you can assume that a separate messaging engine cluster is critical for efficient load distribution.
- Where an asynchronous invocation fails, you need reliable access to support services, such as the Failed Event Manager, so a separate support cluster is required.
- By decoupling the main process from the consumer, you have longer transaction times. However, this means threads will be in use for longer, using up threads from the main thread pool needed by other parts of Process Server. Fortunately, it is possible to configure a special thread pool for use on this asynchronous work. You can also configure the number of concurrent threads that are allowed to perform the work. This is another key advantage of this pattern since it allows you to spread the load of the work across a much longer time period, saving CPU and memory for synchronous activity during peak times.
This is a special use of a long running process (Figure 8), where the process is specifically designed to complete swiftly. The process lifespan is deliberately short (roughly minutes), such that process versioning issues associated with longer running processes are avoided (see Versioned long running process and Task based process). No human tasks or in-process events may be present. Common uses are to handle “straight through processing” or to manage parallel aggregation.
Figure 8. Properties settings for a briefly persisted process
Note how much simpler the design, implementation, and most importantly, maintenance of a briefly persisted process is compared to truly long running types, such as the versioned long running process described next. Armed with this knowledge, you can make a conscious decision of which implementation type to use and the advantages and disadvantages of each.
Sometimes we use this type in preference to a synchronous transactional process. For example, when the aggregate transaction time is too long, when you want recovery in between multiple transactions, when you need parallelism, or when you need to communicate reliably with asynchronous systems (Figure 9).
Figure 9. Assembly diagram for a briefly persisted process
Consider the following characteristics:
- You have a multi-transaction process that needs to survive server restarts and carry on from where it left off, but whose process instances can be flushed to completion. This allows an update to the process to be installed easily (see the example in Figure 10).
- Typically, instances complete in seconds, minutes, or at most, hours.
- Any business reporting will be historical rather than on running instances.
Figure 10. Example of a briefly persisted process
The following relate to the numbered points in Figure 10:
- For Each and Parallel scopes (1a and 1b in Figure 10) really do run in parallel in briefly persisted processes, since they are long running. This is one of their key benefits over short running processes.
- Processing on timeout is minimal so as not to risk holding up the process.
- Where possible, invocations have expirations set. Note that only asynchronous invocations can have expirations in the BPEL itself. Expirations on synchronous requests generally have to be done at the transport binding level.
- Compensation actions are ideally handed off via one-way asynchronous calls so that this process can continue immediately to completion.
- Ideally, callers of a briefly persisted process are not dependent on real-time responses, due to the throughput based performance characteristics of long running processes. For synchronous callers, the process may be better initiated via a one-way interface, with no Reply activity present.
Some examples of this process type are:
- Background processing of a credit card payment, requiring various non-transactionally related invocations: validating a card number, checking the current balance on an account, processing the payment, submitting the transaction to the accounts system, and sending a confirmation email. If a server is brought down while the payment is being processed, the payment must continue to be processed by another server from the point it had reached.
- Another example is parallel aggregation of a number of relatively slow responding services. For example, a broker may want to request 12 different quotations from different sources, each request taking 5-10 seconds. It makes sense to perform the 12 requests in parallel and have the complete answer in 10 seconds, rather than doing them in serial and taking 60-120 seconds. Note, however, that this scenario has a response time based requirement. Also note the sidebar on potential anti-patterns in this area.
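The quotation aggregation scenario can be illustrated in plain Java. The sketch below simulates the parallel fan-out, with short delays standing in for the 5-10 second service calls; QuoteAggregator and its method are invented names, not a Process Server API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Parallel aggregation: gather N quotations concurrently so the overall
// response time is roughly that of the slowest source, not the sum of all.
public class QuoteAggregator {
    static List<String> gatherQuotes(List<Callable<String>> sources)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(sources.size());
        try {
            List<Future<String>> futures = pool.invokeAll(sources);
            List<String> quotes = new ArrayList<>();
            for (Future<String> f : futures) {
                quotes.add(f.get());   // blocks until each quote arrives
            }
            return quotes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> sources = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            final int n = i;
            sources.add(() -> {
                Thread.sleep(50);      // stand-in for a 5-10 second service call
                return "quote-" + n;
            });
        }
        // With 12 parallel calls of ~50ms each, the elapsed time is close to
        // the slowest single call, not the serial total of ~600ms.
        List<String> quotes = gatherQuotes(sources);
        System.out.println(quotes.size() + " quotes gathered");
    }
}
```

In a briefly persisted process, the Parallel scope (or a parallel For Each) plays the role of the thread pool here, with each branch persisted between transactions.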
As noted, a briefly persisted process, by design, promises to complete in a reasonable period of time. The maximum process duration is specifically noted in the design, and the process needs to be specifically designed to achieve it. It is important to the operations teams, as it sets the maximum time to flush all running instances prior to installing any updates on the process template. It is also important to the consumer of the service, and may be defined as a formal "service level agreement". The design considerations for briefly persisted processes, therefore, focus on timely completion of the process:
- No human tasks can be used in the process, since it is not possible to guarantee the response time of a task. It is rarely an appropriate expectation that a task will be completed within minutes, or even by the end of a working day.
- "Continue on error" flag must be set to “true”. Otherwise, the process administrator tasks may hold up the process until they are resolved. Ideally, you want to catch all faults on the invocations and control the error behavior rather than have the process fail. Unhandled exceptions result in a “failed process”. You need to consider how these will be cleared to upgrade a process template.
- If the state of the process needs to be held to resolve the issue, it cannot be held by keeping the briefly persisted process running. The process must end, perhaps kicking off a truly long running process to independently handle the exception, or storing the relevant data to a data store. Compensation actions must be taken into consideration in the expected length of the process, and any that take an inappropriately long time need to be offlined to an alternate resolution mechanism, such as a dedicated exception handling process or an exception report.
- Invokes need expiration times set (Figure 11) that match the SLA of the process. Timeouts need to be caught and managed in a timely manner.
Figure 11. Setting the expiration time on an invocation
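The effect of an invoke expiration can be sketched in plain Java: the caller waits only up to a budget derived from the process SLA, and on expiry takes a defined recovery path rather than blocking indefinitely. ExpiringInvoke is a hypothetical name, not a Process Server API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of an invoke with an expiration: wait only up to 'budgetMs',
// then take a managed fault path instead of holding up the process.
public class ExpiringInvoke {
    static String invokeWithExpiration(Callable<String> service, long budgetMs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(service);
        try {
            return future.get(budgetMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException expired) {
            future.cancel(true);   // give up on the slow invocation
            return "EXPIRED";      // mapped to a caught, managed fault path
        } finally {
            pool.shutdown();
        }
    }
}
```

The "EXPIRED" return here stands in for the caught timeout fault that the process would route to its error handling, keeping the overall process duration within its SLA.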
Working with an SLA related to a “straight through process” that completes in a number of minutes or hours is an acceptable use of a briefly persisted process. However, sometimes briefly persisted processes are used as a way to provide parallelism in synchronous requests, as in the example of aggregating multiple quotations (see the sidebar on potential anti-patterns around response time vs. throughput requirements). This is a more difficult requirement to satisfy, and any online consumer of such a service must understand the nature of the service they are calling. For example, they need to provide the user with a clear indication that the background work may take time to complete, invoke the service in the background to allow the user to continue with other work, and understand what to do if the service takes too long to respond - for example, re-submit, poll, and so on. This is a useful place to use Ajax-style processing in a browser, where the user can continue to work while background processing takes place.
Beware of the "uncorrelated response". If a briefly persisted process does not meet its SLA for whatever reason, the caller may "move on" and abandon waiting for the reply. In this case, the briefly persisted process may still try to deliver its response, even though the caller is no longer there. Consider how this situation is handled.
Consider the following design options:
- Can a short lived process or mediation flow component get the job done? If so, there will be significant gains in simplicity of the design, especially around error handling. There will also be improved CPU performance and memory usage due to the reduced serialization.
- If the whole process cannot complete in a reasonable period of time, can it be broken up onto a chain of briefly persisted processes? However, you need to weigh the advantages against the loss of the overall view of the process in the implementation.
Consider the following implications:
- As we have an asynchronous requirement, you can assume that a separate messaging engine layer is critical to ensure even workload distribution.
- You need good monitoring in place to ensure alerts are fired if briefly persisted processes are lasting longer than agreed, or remaining as you move into a maintenance window.
- Where an asynchronous invocation fails, you need reliable access to support services such as the Failed Event Manager. Access to the BPC Explorer to administer failed processes is also required. A separate support cluster is advisable.
- If you are providing real-time responses, a high availability architecture is required. Plenty of headroom in processor power is also required.
- Housekeeping processes may be required to handle uncollected response messages, or more ideally, you need to leverage message expiration capabilities.
Now we move on to a process implementation type using a true long running process that will often last a relatively long time (days, weeks, and so on). Process instances will always be present in the system, so complex issues of process versioning must be taken into account. The process may be longer running because it waits for in-process events, calls asynchronous services that are themselves long running, or contains human tasks to process exception cases (see Figure 12). Note that if the process contains a predominance of human tasks, it falls into the next type – task based process.
Figure 12. Example of a versioned long running process
The following relate to the numbered points in Figure 12:
- The process may live long enough so that event handlers may be useful.
- Complex compensation can be embedded in the process.
- Invoke expirations must have timeout handling.
- You can use in-process receives and wait activities (4a and 4b in Figure 12) in a versioned long running process.
- Human tasks may be present, but typically for exception paths. For example, most flows do not create tasks and the main objective of the process is straight through processing. Contrast this with task based processes, which are described next in the article.
- Activities will, by default, run in separate transactions, but they can be configured to try to group together (“participate”) in the same transaction for performance. However, to ensure they run as a single transaction, they must be separated out into a short running process.
Consider the following characteristics:
- A multi-transaction process whose instances often span days or weeks so that process instances are assumed to be present when attempting to maintain the process template.
- There are obvious breaks of minutes, hours, or days in the process activity, where you must wait for external events to occur, or where exceptions mean human based tasks must be performed before the process can continue.
A custom built computer is ordered online. The process assesses what parts are already in stock, places orders for the remaining parts, and then awaits receipt of each order. On receipt of all orders, a request is logged with the manufacturing department to assemble and ship the computer. If any of the part orders cannot be sourced, or are delayed, a task is raised for a call center representative to contact the customer and discuss alternatives.
Versioning: Handling multiple process templates
You can assume that the process template cannot be re-installed to make changes since existing process instances depend on it. A new template will need to be introduced so that old and new versions of the process can run side by side.
The following discuss key design points about template versioning:
- Flexibility (dynamicity) requirements of the process must be understood from the start, as these points of variability must be designed in. Do these points need to be changeable in running instances, or just for new instances?
- Only limited changes can be made to templates with running process instances. It is beyond the scope of this article to detail what they are. They vary by version of the product, so refer to the Resources section for migration of process instances.
- Known points of variability that must change for existing process instances can often be pushed outside the process template; for example, by invoking business rules that can be re-configured at will, or by encapsulating logic in replaceable sub-processes. See the Associated short running components section.
- Where new template logic needs to be introduced without affecting existing process instances, the BPEL engine in Process Server has a sophisticated feature that allows multiple templates of the same name to exist on the same cluster, and routing between them to be performed automatically. This is called late binding. Late binding ensures that the latest version of a template is chosen dynamically at runtime, and also handles correlation of events with running process instances, regardless of which template version they use.
- Note that SCA bindings connected directly to the BPEL (for example, links in the assembly editor) will not use late binding features. To late bind to a process, you need to invoke via the BPEL API. You can do this directly in the Java code, but a much neater way is to front the long running process with a short lived BPEL acting as a proxy (see the Resources section). This short running process calls the long running process by specifying the BPEL template name on the BPEL partner reference, rather than via an SCA connection.
- Selectors are a more simplistic alternative to late-binding and perform date-based routing. However, be aware that they do not automatically handle correlation of mid-process events across different templates, and they can only call templates within the current module, or those made available via imports. Late binding by contrast will be aware of all versions of a process template, whatever module they are within.
Truly long running processes, by definition, may take a long time to complete, so ensure the needs of the initiator of the process are taken into account when considering how the process is to respond:
- Aim for either one way invocation, or at least a swift response early in the process. This improves decoupling from the caller. If the consumer must wait until the end of the process for the reply, ensure all consumers can cope with the timeframe of the long running process. Also consider how, if the process were to fail, the caller could be released from waiting for a response.
- Rather than always using a reply at the end of a process to respond to a caller, consider whether this response is actually a completely separate business event to be fired in a publish subscribe style at some point in the process. Bear in mind that this means the process will need enough context information to find the original consumer.
- Consider the applicability of the consumer polling for the results of the long running process, rather than receiving a reply or an event.
- A long spanning response may be unavoidable. If so, be aware that this means you cannot replace the calling module while requests are in progress. A further facade may be needed, which will have to be state free (for example, using permanent JMS queues for import and export).
- Since the whole module containing a template with running process instances cannot be uninstalled, any components deployed in the long-running BPEL's module also cannot be changed.
- Wherever possible, place short running components in a separate module from the long running process. Primarily, this ensures that they can be replaced when changes are needed. It has the added advantage that they will have exports in place so that they can be called from the new version of the template as well as the original. Otherwise, new versions of the short running components have to be created each time a new template is introduced.
- If the granularity of the process is right, the statuses of the long running process instances are likely to be of interest to the business users. Assess what information must be made available to the business. You then need to decide whether the inbuilt tools can satisfy the requirement, for example, the BPC Explorer (which since version 6.2 now includes the basic historical monitoring previously known as the BPC Observer), and process-based business space widgets.
- It is likely that status reports across active process instances will be required. Suitable analysis needs to be done to assess what process information is accessible so that query tables (introduced in V6.2) and other related capabilities, such as custom properties and custom tables, can be configured.
- Some level of history may need to be held outside of the process. Further business activity monitoring may, therefore, be required, such as consuming events into business dashboards with WebSphere Business Monitor.
From the above design points, you can see that implementing a versioned long lived process is more complex than a briefly persisted process, even though they are both technically implementations of long running BPEL. This is one of the main reasons for differentiating between the two. If it is possible to implement a briefly persisted process, you need to do so and avoid the complexity associated with the versioned long lived process.
However, if the requirements need the features available from a versioned long lived process, you need to design and implement the long running BPEL process using a specific pattern as shown in Figure 13.
Figure 13. Pattern required to make best use of late binding to introduce new templates
The following relate to the numbered points in Figure 13:
- The dashed lines added to the solution diagram show the path of the hidden late binding as a result of configuring the Process Template name of “VersionedLongLivedProcess1” on the reference partner ProxyProcess1. Notice that no imports or exports are required to make the late binding happen.
- When this template is introduced with the same name of VersionedLongLivedProcess1, requests from ProxyProcess1 will automatically be directed to the new template based on its “valid from” date.
- All short lived components related to the long running process, such as rules, sub-processes, and mediation flows, are housed in a separate module to the long running process so that they can be updated independently. They are available to any new versions of the process template that might be introduced.
The key points to note about this pattern are:
- The facade module containing a short running process is used to provide late binding to the initial long running process template. Note that it must be in a separate module, since at some point you will want to decommission (uninstall) the initial long running template. You could not do this if the short running process, which remains in use routing requests to the newer templates, were in the same module.
- To make the calls between the short and long running process late bindings, the “invoke” activities in the short running BPEL are not wired to anything over SCA. Instead, the name of the long running process template is specified in the properties of the BPEL partner (Figure 14).
- Notice that there may be several possible invocation choices in the short running BPEL. These represent not only the request that initiates a new process instance, but also the in-process events that might correlate with the long running process as a result of “event handlers” or mid process “receive” activities within it.
- The long running process template is delivered in a module of its own. All related short running components are placed in a separate module so that they can be replaced at will. If they were in the module with the long running process, they could not be updated while process instances were present for that template, and they would be uninstalled when the template’s module was decommissioned. Placing them separately also enables newer templates to make use of the original short running components, as they are all exposed via exports. They would not be callable if they were included in the same module as the original template.
- New templates are delivered on their own in new modules. The process template name must always be the same for the late binding versioning to work.
Figure 14. Setting the process template name to enable late binding
Consider the following implications:
- As we have an asynchronous requirement, you can assume that a separate messaging engine layer is critical.
- Where an asynchronous invocation fails, you will need reliable access to support services, such as the Failed Event Manager and CEI event viewer, so a separate support cluster is required.
- The focus of this process implementation type is asynchronous, and therefore the main requirement is throughput - the number of process activities performed per second. You can tune the Business Process Choreography (BPC) engine to control the amount of work that it takes on concurrently from its input queue. Balance this to a level where the BPC engine can make good use of the available CPUs, but without completely stealing all the CPU from more online synchronous activities, such as using the BPC Explorer, business space, and synchronous services.
- Since the business processes last longer, they will overlap more, and consequently there will be more running instances present in the database than for briefly persisted processes. The database will be larger, and some tables will require more active housekeeping to ensure their statistics and indexes are up-to-date.
- You cannot remove applications with running process instances at the time new versions of process templates are introduced. Therefore, a mechanism for decommissioning old templates once all of their instances complete will need to be introduced.
- If processes are not deleted automatically on completion, suitable archiving or housekeeping mechanisms must be put in place to ensure process instances are cleared down to keep the database to a manageable size.
- This implementation type may contain a small number of human tasks, so some of the design considerations for task based processes may be relevant.
- It is likely that the solution will include a process monitoring requirement, and the infrastructure is, therefore, likely to require the setup of the CEI.
- Note: The scope of the BPEL late-binding is limited to the cluster as the BPC engine is cluster-based rather than cell based. Therefore, you need to make sure that modules containing BPELs that are interacting cross-module over late binding are deployed into the same cluster. BPELs early-bound over SCA calls are fine, of course, since SCA is a cell-level concept and the cluster restriction does not apply.
This implementation type is essentially a long running process that predominantly contains human tasks (Figure 15). It enables you to perform efficient workload balancing of multiple tasks between a number of different users, often in different teams or departments. Since the presence of many human tasks in the process ensures that it will last a long time, it implicitly inherits all the versioning issues of the versioned long lived process. It also brings new challenges relating to the workflow patterns and the user interface considerations associated with the human tasks.
Figure 15. Example of a task based process
The following relate to the numbered points in Figure 15:
- You can use loops with human tasks too, allowing you to efficiently spread items of work in parallel.
- Processes can pause, waiting for an event, or for some period of time to pass. However, if the process waits for a significant time, carefully consider whether to split the process here to lessen the versioning complications.
- Enabling well coordinated parallelism between human activities (3a and 3b in Figure 15) is often key to business process improvements.
- Task based processes, of course, still aim to automate as many actions as possible with system invocations. Tasks can be automated further over time with little or no change to the core process.
- Collaboration scopes allow more flexible navigation, splits, joins, forks, merges, back links, and user driven navigation, such as skip and redo, and user added sub tasks and follow on tasks.
This process is predominantly based around task based human interaction. BPEL processes guide the flow of control from one task to the next. Most processes are expected to go through a significant number of user tasks before completion.
One example is a support process where the problem arrives at a generic front line support person and is gradually passed through the various levels of the support organization until it reaches an appropriate point for resolution. Along its path, the support problem may be split into multiple tasks pursued in parallel. Tasks need to be passed to appropriately skilled teams of people. Timely resolution needs to be ensured, including automated escalation of problems that are taking too long to resolve. Users who perform the tasks may have little awareness of how they fit within the overall process. Business managers, however, need current and historical information about how the process is flowing.
Consider the following design considerations:
- Due to the implicit long-lived nature of human tasks, versioning issues within processes are critical here. All task based processes inherit the versioning complexities of versioned long lived process described earlier.
- Human task or workflow is a whole design area in its own right, stretching as much into the dynamics of human interaction and teaming as it does into technology. Common workflow patterns need to be understood to capture the process requirements effectively. It is often beneficial to model the business process first using tools such as WebSphere Business Modeler to visualize and simulate the process and assess potential process optimizations prior to committing to detailed design and implementation.
- Although the process is predominantly human tasks, that does not preclude it completely from performing system interactions (invocations) as well. However, where these are more complex, involving a sequence of several steps, it is wise to separate these out into a sub-process based on any of the previously discussed process implementation types. This allows you to take advantage of the features of those types, such as transactionality or brief (versioning free) persistence.
- The people performing the tasks are likely to require sophisticated graphical user interfaces, so choices between different user interface options become important. While this is beyond the scope of this article, note that within the product, there are a number of key options:
- The BPC Explorer provides a minimalistic interface and allows data forms to be customized using Java Server Pages technology.
- Business Space widgets provide a significant step forward in "out of the box" functionality and flexibility, allowing many common user interface requirements to be “configured” rather than coded.
- There are also various options for generating user interfaces, such as Lotus® Forms from task templates. For a complex power user frontend, however, a completely custom graphical user interface may be required. Carefully consider the benefits of embracing the Business Space framework for this also.
- For people to make decisions associated with their tasks, they typically require a significant amount of data. If they did not, you could probably automate the response. You need to assess whether the process holds the necessary data itself, or whether it simply passes keys to allow the user interface to retrieve the data. If the latter, a process portal or Business Space style interface is likely to be appropriate, with portlets or widgets that can self populate, given appropriate pieces of context information.
- Users will often need to see images or scanned copies of paper documents, the contents of which Process Server does not actually need. Ensure that these large objects are not passed through the process, and only referred to via links (for example, URLs) that can then be used by the user interface to retrieve the content directly. There is little value in Process Server handling large objects whose content it cannot interpret. Managing such objects is generally an unnecessary drain on memory and CPU.
- Users interacting with tasks introduce a whole new set of issues in relation to how they are authenticated, what actions they are authorized to perform, and whether their security context needs to be used to perform the system interactions that result from their task interaction. The end-to-end security architecture needs to be well understood during the design phase, and whether and how Process Server interacts with enterprise user registries (for example, LDAP) must be agreed. Remember that BPEL processes have authorization requirements too (for process owner or process starter, and so on). Ensure that the user registry group caters for this also.
- Requirements such as service level agreements (SLAs) and key performance indicators (KPIs) for how long it takes to complete tasks, or to reach certain parts of the process, are critical for ongoing process optimization and management. While the process itself can help to enforce these KPIs, aggregate monitoring of how the process is performing against these metrics may be a separate design exercise; for example, setting up a monitoring model to be viewed in dashboards from WebSphere Business Monitor. Note that this information can be effectively captured in the business model and used to kick start the creation of the monitoring model.
- The breadth of workflow specific features of the product need to be well understood, such as the business calendar, escalations, substitutions, and staff resolution.
- Consider the effect of ad-hoc tasks, such as subtasks and follow-on tasks, especially when using Business Space. A BPEL may create a single task and wait for its completion. The user may then add sub-tasks and follow-ons into a complex delegation structure containing many tasks of which the invoking process is unaware. Consider how the arbitrary navigation that ad-hoc tasks offer affects monitoring of the solution. Might they, for example, produce an unbalanced set of start and end markers?
- Note the special case of workflow (page flow process) discussed later in the article. Be aware of the difference between this and normal task based processes, and be careful not to mix the two styles in the same process.
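One of the design points above - passing references to large documents rather than the documents themselves - can be sketched in a few lines. The class and URL below are hypothetical, purely to illustrate the shape of the data that flows through the process versus what stays in the content store.

```java
// Illustrative sketch (hypothetical names): the process carries only a small
// reference to a scanned document, never the binary content itself. The user
// interface later resolves the reference directly against the content store,
// so Process Server never handles the large object.
public class DocumentReferenceSketch {

    // What flows through the BPEL process: a few bytes of context, not megabytes.
    public static class ClaimContext {
        public final String claimId;
        public final String scannedFormUrl;   // link only, into a content store
        public ClaimContext(String claimId, String scannedFormUrl) {
            this.claimId = claimId;
            this.scannedFormUrl = scannedFormUrl;
        }
    }

    // Build the lightweight reference that the process will carry; the
    // document itself remains in the content repository.
    public static ClaimContext referenceFor(String claimId) {
        return new ClaimContext(claimId,
                "http://contentstore.example.com/docs/" + claimId);
    }
}
```

The design choice here is that the process database stores only the URL, while the GUI dereferences it on demand, keeping memory and CPU use in Process Server independent of document size.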
Consider the following implications:
- Clearly the human task container must be configured for the runtime. Also, Business Space needs to be configured since the human task related widgets provide an extremely powerful starting point for the graphical user front end, and also for general administration of tasks.
- A separate “support” cluster must be present to house the BPC Explorer, business space, and the CEI so that it is independent of the cluster running the BPEL applications. If the Business Space or the BPC Explorer is used as the user interface for business users, then it may be appropriate to have a separate presentation tier tuned to the availability and performance expectations of online business users.
- Users working the human tasks must be identifiable if they are to see and administer only their own tasks. The security architecture must propagate user identity appropriately. You must consider the options for where user identity is stored - ideally in an LDAP repository. Are all the users internal, or do you also have external customers performing or viewing tasks? Are these two groups of users available in the same directory? You may need a more sophisticated and flexible mechanism, such as the Virtual Member Manager.
- Much of the user activity will be centered around access to tasks, and more importantly, lists of tasks. There will inevitably be a large number of process instances and associated tasks. You need to ensure that the BPC database is well tuned such that user queries against the tasks and task lists are highly responsive.
- The HTM container is cluster scoped, not cell scoped. Each cluster, for example, has a different BPC Explorer, and the HTM APIs will only pull back lists of tasks from the cluster to which you connect. If you have, for example, the double gold topology where there are multiple deployment targets, you will have two unconnected sets of tasks (and processes).
We have deliberately left one special case of a process implementation to be the last. It is a strange one, as it is actually the one type where Process Server is not involved at all in the process navigation. It is important to recognize that there are times when Process Server is not the right place for process logic, namely when you are talking about screen to screen navigation that truly belongs in the presentation layer. A GUI intensive process is where navigational flow has a more logical home in the graphical user interface.
This process type mostly uses enterprise services and is defined by navigation "at the glass", where a user is guided from one screen to another. The navigation is typically embedded in the presentation framework itself.
To be specific, Process Server is not involved in the process navigation at all. Process Server may be used to provide services to the GUI. However, in this capacity, Process Server is only providing exposition of services, not the process implementation itself.
A customer talks to a call center representative to request a loan. They fill out a sequence of pages of an electronic loan application form. The path through the various pages depends on the information entered in the previous steps. During the process, multiple enterprise services are called, such as address matching, customer verification, and account history. The page flow rarely changes hands from the original call center representative during the form filling process.
Consider the following design considerations:
- Most presentation frameworks have their own navigation capabilities for guiding users from one screen to the next, and it makes much more sense to use these rather than run the risk of having the GUI implementation split between Process Server and the presentation layer.
- Since the GUI often requires data from multiple sources to provide the user with all the information they need, it might be tempting to think this is why a Process Server process is used: to aggregate and “serve up” this data to the GUI. However, the rich collection of data that the user requires is often so specific to the GUI that it does not form a re-usable data aggregation service. It is better in these cases to perform the integration "at the glass", using context sensitive portlets, or widgets that draw the information as required.
- Where BPEL processes are used, they will typically provide responses back to a user, requiring fast response times. You might find other process implementation types in use, alongside a GUI intensive process providing the services used by the GUI. They will typically be short running (see Synchronous transactional process), or initiated with fire and forget calls (see Asynchronously initiated transactional process). They will have no knowledge of the GUI that is calling them or its navigational state.
- Note that there is another place that navigational logic often lives in GUI intensive processes, and that is within the user’s head! Many rich graphical user interfaces simply provide a flexible mechanism for users to access data in a myriad of different ways, and leave it largely to the user to decide the logical sequence of actions required to perform the work. Where the screen flow of an individual user is complex, it is doubly clear that you should not try to engrave it into a Process Server BPEL process.
- Despite all the warnings above, there is a special case where a process can be used to provide screen navigation known as a page flow process. This is covered in the next section.
Consider the following implications:
- Since the process is not implemented in Process Server at all, any remaining requirement from Process Server centers around providing services for use by a GUI. You can assume that these performance characteristics focus on response times, rather than throughput. This has implications in terms of the sizing of the infrastructure. Significant headroom in processor power will be required to ensure the same response time even during peak periods.
- With the focus on live users interacting with the system, you will require a topology that provides high availability during all hours for the users.
- If the requirements are purely synchronous, transactional, and state free, you can use a simplified architecture that does not require a separated messaging engine layer. However, it is rare that any usage type is used in isolation. Nearly all other usage types have a messaging requirement.
Page flow is deliberately not shown as one of the core process implementation types in the diagram in Figure 2 because it is a complicated special case. However, it would be remiss not to discuss it at all.
Page flow is a special case of a task based process, where instead of the tasks being spread between a group of users, all of the tasks are for the same user (Figure 16). In other words, this is a way of using a task based process to provide screen navigation for a single user. Notice there is a strange overlap here. We are talking about using a task based process to implement something that is really a GUI intensive process. We have said that these are generally implemented in the presentation tier, so opportunities for page flow processes need to be carefully assessed and considered alongside other technologies, such as the page navigation capabilities of the presentation tier.
Figure 16. Page flow process
The following relate to the numbered points in Figure 16:
- Typically, a single user performs the process end to end.
- Movement from one task to the next must happen in a single transaction.
- Movement between tasks often does not require interaction with backend systems, and just represents screen navigation.
- Interaction with backend systems needs to be transactional wherever possible. Non-transactional error handling is complex in a page flow scenario.
It is important to recognize just how different this is from normal workflow style processes. In workflow, when a task is completed, the transaction also completes and control is passed back to the process. The user does not wait for any further response from it. Most probably they go back to their task list, and pick up a completely unrelated task from a different process. However, on completion of a task in page flow, you expect the process to perform all the process navigation necessary to reach whatever the next task may be within the current process, and return that task to you immediately. Looking back on the earlier discussion of Response time vs. throughput requirements, workflow satisfies a throughput based requirement. Page flow, using the same technology components, attempts to satisfy a response time based requirement. So clearly, you will have to design in a markedly different way if this is to work effectively (see the section on Avoiding pitfalls in page flow later).
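The "complete a task and immediately receive the next one" interaction that page flow demands can be sketched as follows. This is not the real Human Task Manager API; the class and method names are illustrative stand-ins for the kind of synchronous, single-step call that page flow requires.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the page flow interaction style: completing a task
// and claiming its successor for the same user in one synchronous call, so
// the GUI can immediately render the next page. These names are illustrative,
// not the actual Human Task Manager API.
public class PageFlowSketch {

    // The single-user "process": an ordered set of pages (tasks) to work through.
    private final Deque<String> remainingTasks = new ArrayDeque<>();

    public PageFlowSketch(String... tasks) {
        for (String t : tasks) remainingTasks.add(t);
    }

    // One synchronous, single-transaction step: complete the current task and
    // claim the successor. The caller (the GUI) blocks until the next task -
    // that is, the next page - is known.
    public String completeAndClaimNext(String completedTask) {
        if (remainingTasks.isEmpty() || !remainingTasks.peek().equals(completedTask)) {
            throw new IllegalStateException("Not the current task: " + completedTask);
        }
        remainingTasks.poll();          // complete the current task
        return remainingTasks.peek();   // next page, or null when the flow is done
    }
}
```

Contrast this with workflow, where the equivalent call would return immediately after completion and the user would fetch their next (probably unrelated) task from a task list.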
Probably the most common reason for interest in page flow is the hope that it will provide a GUI, without the need for a team specialized in user interface technology. While there is some truth in this, carefully assess the sweet spots and limitations of page flow before embarking on this route:
- In page flow, screen navigations, and perhaps validation too, are centralized, so they can be used by multiple presentation channels. However, this can be achieved in many other ways, and only requires the navigation layer to be separated correctly in the GUI design.
- The page flow process owns the state of the data. There is one place only to go to understand where in the business process you are, and what to do next. The downside to this is that the GUI has no idea where it will go to next, and has no understanding of the overall navigation the user is performing. This can make the design of the GUI much harder, especially in relation to caching content and improving navigation usability.
- Page flow effectively splits the presentation logic across two tiers of the architecture. As a result, in page flow, changes to the GUI often involve two teams: the UI team and the process team.
- Another common advantage cited is that work from one user can easily be transferred to another user. However, a sensible approach to persistence of state in the presentation tier can achieve this equally well.
- Finally, page flow is often considered to make monitoring of an overall business process easier since there are many quick-start monitoring capabilities in BPEL processes, such as using the BPC Explorer, or sending events to WebSphere Business Monitor. Note, however, that WebSphere Business Monitor can take events from anywhere, so the GUI layer itself or the services layer can provide events to allow the process to be monitored.
- Use a GUI navigation framework in preference to page flow: Avoid page flow if possible. It is generally better to keep single user navigation in the presentation tier.
- Limit the use of page flow to sequences that are truly task based: Only use page flow in its sweet spot, where the user is being driven through a set of tasks. Where the user requires free navigation around screens, page flow will be inappropriate. Do not use it as a general GUI framework for all types of user interface. Consider whether the user will want to go backwards as well as forwards through the screens, or what will happen when the user presses the “Back” button on the browser.
- Use single step Human Task Manager (HTM) APIs: Use only synchronous APIs to interact with the HTM. These specifically enable single step transition from one task to the next, and from initiation of a process to the first task in the process.
- Maintain transactionality between pages: Ensure movement in the process from one task to another can be performed in a single transaction. This includes any updating invocations. If you were to have multiple transactions between pages, and only some of them succeeded, where does that leave the GUI? The user may not be able to navigate to any page at all, as the next task may not have been reached.
- A page flow should not span processes: It is not possible to complete a task in one process, and to pick up a task in another process in a single transaction. For this reason, at least, and also for reasons of performance and loss of context, flow between pages needs to be limited to tasks in the same process.
- Reduce or avoid interactions with backend systems between tasks: Avoid interaction with backend systems in between pages (for example, within the process). In preference, have the GUI perform system interactions. This reduces the “chatter” between the process and GUI, avoids unnecessarily large amounts of data from being stored in the process, and improves response times between pages. The process purely serves up the next task in line.
Page flow is technically possible, but has its drawbacks. Assess it alongside the following alternatives:
- Consider your longer term GUI strategy. Use the navigational capabilities of a presentation framework if available. Introduce one if necessary.
- Introduce presentation layer persistence to enable the user to suspend work, or pass work to other users during mid-screen flow.
- Have the GUI or services layer publish events relevant to business activity monitoring.
- Is there any “flow”, or does the user just need a set of tabbed pages to fill out a particularly large form? Look at ways of supporting a large form within the presentation layer.
- Can a richer portal style of user interface provide the user with all the information they need at once, rather than requiring several pages to perform the work?
- If you are really keen on having the GUI decoupled from the navigation, consider providing a service that receives the data submitted by the GUI, and responds with the next “logical action” to take. The GUI then interprets this into what actual pages are displayed. This way, the service provides logical navigation, but it remains fully decoupled from the subtleties of how the GUI is implemented.
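The last alternative above - a service that returns the next "logical action" while the GUI owns the mapping to physical pages - can be sketched like this. All names, actions, and page paths are hypothetical, chosen to echo the loan example earlier in the article.

```java
import java.util.Map;

// Illustrative sketch of decoupled navigation: the GUI submits form data, the
// service replies with the next logical action, and the GUI alone decides
// which physical page that maps to. All names here are hypothetical.
public class LogicalNavigationSketch {

    // The service's view: logical actions, never page names, so it stays
    // fully decoupled from how the GUI is implemented.
    public static String nextAction(Map<String, String> submitted) {
        if (!"true".equals(submitted.get("addressVerified"))) return "VERIFY_ADDRESS";
        if (!"true".equals(submitted.get("creditChecked")))   return "RUN_CREDIT_CHECK";
        return "CONFIRM_LOAN";
    }

    // The GUI's private mapping from a logical action to an actual page.
    // A different channel (for example, a mobile client) could map the same
    // actions to entirely different screens.
    public static String pageFor(String action) {
        switch (action) {
            case "VERIFY_ADDRESS":   return "/loan/address.jsp";
            case "RUN_CREDIT_CHECK": return "/loan/credit.jsp";
            default:                 return "/loan/confirm.jsp";
        }
    }
}
```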
We have looked at all the process implementation types in detail now, but do you really need to wade through all the preceding information every time you need to make a decision as to what implementation type to use?
In Figure 17, we have provided a quick decision tree that highlights the process characteristics that most strongly affect the choice of implementation types. With these key criteria in mind, decisions can typically be made without referring to the deeper detail of exactly how the implementation type works internally.
Figure 17. Decision tree for choosing process implementation types based on the primary criteria
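One possible rendering of such a decision tree in code form is shown below. The exact criteria and ordering in Figure 17 may differ; the conditions here are drawn from the characteristics discussed in this article and are a simplification.

```java
// Hypothetical code rendering of a decision tree for choosing a process
// implementation type. The criteria are a simplification drawn from this
// article, not a literal transcription of Figure 17.
public class ProcessTypeDecisionSketch {

    public static String choose(boolean navigationBelongsInGui,
                                boolean containsHumanTasks,
                                boolean singleUserPageFlow,
                                boolean longRunning,
                                boolean asyncInitiation,
                                boolean midProcessVersioningNeeded) {
        // Screen-to-screen navigation belongs in the presentation tier.
        if (navigationBelongsInGui) return "GUI intensive process";
        // Human tasks imply a long running, task oriented style.
        if (containsHumanTasks) {
            return singleUserPageFlow ? "Page flow process (special case)"
                                      : "Task based process";
        }
        // Purely automated processes: prefer short running where possible.
        if (!longRunning) {
            return asyncInitiation ? "Asynchronously initiated transactional process"
                                   : "Synchronous transactional process";
        }
        // Long running BPEL: avoid versioning complexity if you can.
        return midProcessVersioningNeeded ? "Versioned long lived process"
                                          : "Briefly persisted process";
    }
}
```

Note how the tree embodies the article's advice: it only falls through to the versioned long lived process when nothing simpler will satisfy the requirements.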
We have created a vocabulary of different design patterns for processes - our process implementation types. These patterns provide capabilities that can be understood by a high level designer who may not be a deep Process Server specialist, yet mean something specific to a detailed designer or implementer. Almost all solutions we have come across can be described in terms of these key patterns. In fact, you can argue that if part of a solution does not fall into one of the key patterns, you may be staring an anti-pattern in the face.
Figure 18. Example solution depicted against the SOA reference architecture layers
Consider for a moment how much information is conveyed in Figure 18. As a small example, take the versioned long running process. We know from earlier descriptions of this implementation type that at least three modules will be needed for dynamicity and versioning. However, we do not need to show this in the design since it can be assumed as part of the implementation pattern.
You have seen that there are more discrete types of process than simply short and long running processes. By understanding what the core implementation types are, you can use these as a “pattern vocabulary” to translate requirements into workable solutions. Since the process implementation types have well-defined characteristics, they can be used as effective building blocks in a high level design, giving the designer confidence in what can and cannot be achieved with each type.
The transition from high level design to detailed design is also improved as coding standards can include detailed examples of exactly how each of the types are implemented, providing more predictable and repeatable implementations.
We made an important assumption at the beginning of this article regarding the maturity of the services our processes were calling. This allowed us to focus on true business processes, uncluttered by the details of how they integrate with external systems. In future articles, we will move the spotlight back onto the services and discuss how to characterize a service's maturity, and what integration patterns to use to bring a service up to a quality that a process, or indeed any other consumer, would actually want to use.
The ideas within this article were primarily developed in collaboration with Brian Petrini, who will be authoring other articles within this series. Andy Garratt was also a primary contributor and reviewer of this article.
Furthermore, the conclusions above are gathered from discussions with people on design topics, project experiences, published articles, and also from our conversations with those involved in the creation of the product. Our thanks to the following colleagues (alphabetical by surname): Michele Chilanti, Andy Garratt, Marc Fiammante, Werner Fuehrich, Geoff Hambrick, Susan Herrmann, Fintan McElroy, Gerhard Pfau, Roland Peisl, Ruth Schilling, Ruud Schoonderwoerd, Joseph (Lin) Sharpe, Paul Verschueren, Claudia Zentner, and in reality, many more.
- Increase flexibility with the Service Integration Maturity Model (SIMM) by Ali Arsanjani and Kerrie Holley provides a basic description of the SIMM, which has since been adopted by The Open Group.
- The first article in this series explains how WebSphere Process Server and WebSphere Enterprise Service Bus are used to facilitate differing levels of SOA maturity.
- The Information Center has good detail about process versioning in general, how late binding works, and the proxy pattern that should be used to achieve it. There is also a good article on developerWorks WebSphere that digs into the topic in more detail. Note that migration of process instances to new versions of a template was introduced in Version 7 of WebSphere Process Server, but it is important to understand the boundaries of this capability.
- The IBM Redbook, Human-Centric Business Process Management with WebSphere Process Server V6, provides some good worked examples of task-based processes.
- For a practical approach to defining processes within an SOA, take a look at the Process oriented modeling for SOA article series by Ruud Schoonderwoerd.
- Workflow patterns have been studied in great detail by Professor Wil van der Aalst and are listed on the workflow patterns Web site. These are useful when working on task-based processes.
- For details on how to use Business Space graphical user interfaces in conjunction with BPEL processes, take a look at the IBM Redbook: Building solutions with Business Space powered by WebSphere.
- This slightly dated BPMN tutorial has some slides on the difference between orchestration and choreography. If you want full detail, you can consult the BPMN specification.
Kim Clark is an IT Specialist from the United Kingdom working in IBM Software Services for WebSphere (ISSW). He has been working in the IT industry since 1993. He has been the technical lead on projects since the first implementations of WebSphere Process Server and writes and presents regularly on SOA design. He holds a degree in Physics from the University of London, England.