Recovering from failed asynchronous SCA service invocations on WebSphere Process Server

Imagine this common scenario. You have a mission-critical Service Component Architecture (SCA) application running on IBM WebSphere Process Server. A message is sent to this application to invoke an asynchronous service, but the service invocation fails. What happens to the original message? What new messages are sent? How can the system recover from the failure? This article describes potential message routes and recovery scenarios. It explains how to configure the system to set up recovery, and it covers a wide range of SCA messaging options, including both WebSphere Default Messaging (JMS provider) and WebSphere MQ.

This article is primarily intended for administrators, operations personnel, and advanced support personnel involved in setting up and troubleshooting applications running under WebSphere Process Server. To get the most out of this article, you need intermediate to advanced skills in WebSphere MQ and WebSphere messaging, and intermediate skills in SCA and Enterprise Service Bus (ESB).


Ivan Smirnov, Senior Consultant, Prolifics

Ivan Smirnov is a Senior Consultant with Prolifics. He has been working with WebSphere Application Server and related products since 2000. Ivan has enjoyed a mix of development and administration/troubleshooting work, believing this mix provides unique insights into both areas. Ivan's currently concentrating on SOA projects leveraging WebSphere Integration Developer and WebSphere Process Server.

Neha Dhawale, Consultant, Prolifics

Neha Dhawale is a consultant with Prolifics. She has experience working on various WebSphere products like WebSphere Application Server, WebSphere Process Server, WebSphere Integration Developer and MQ. Originally a J2EE developer, Neha is now helping several clients to enable and implement Service Oriented Architecture while using different Business Integration products. Currently her focus is on WebSphere Process Server.

16 January 2008



WebSphere Process Server (hereafter called Process Server) invokes Service Component Architecture modules and components as component services. This invocation, which is defined by the service binding, can be either synchronous or asynchronous. An unexpected failure in the module reroutes the message. How messages are rerouted and processed depends on the choice of the binding.

SCA and Web service bindings are synchronous. A synchronous invocation blocks the caller until a response (success or failure) is received from the target. So, a system exception is realized immediately after the invocation and the service call is terminated. Then, the service caller decides whether to retry the call.

In asynchronous binding, the service caller sends a message to the service provider using a messaging system, and then the caller moves on to other activities. If the service has a two-way interface, the service caller expects a reply message; however, in a one-way interface, the service caller does not expect any reply. Either way, the service caller does not "stay on the line" and will not know if the service provider "hangs up." Therefore, the infrastructure must be reliable and guarantee that each message is processed, even in the case of system failures. Asynchronous bindings support three message binding types: JMS, MQ, and MQ JMS.

Exceptions fall into one of two major categories: application or system.

Application (or business) exceptions arise when a certain business condition is not met. For example, a client might not have enough funds in an account to cover a requested withdrawal. These types of situations are modeled as service faults, which are included in the service interface. This model enables the precise definition of a system reaction to a business exception. So, this article focuses, instead, on system exceptions.

System exceptions occur when some part of the infrastructure fails (for example, the database is down or a partner Web service is unreachable), when a violated invocation contract occurs (for example, the service caller passes data in a different format than the service provider expects, possibly due to a version mismatch), or in the case of unexpected code errors (such as unchecked exceptions in Java). The goal is to make sure that all service invocations are accounted for and not lost, even in the case of a system exception.

This article builds on the developerWorks publication "Exception Handling in WebSphere Process Server and WebSphere Enterprise Service Bus" by Pamela Fong and Jeff Brent (see Resources for a link).

Applicable software

The failure and recovery scenarios described in this article apply to and have been tested with the following software:

  • IBM WebSphere Process Server V6.0.2.2
  • IBM WebSphere MQ V6.0.2.2

System error processing in WebSphere Process Server

Message recovery scenarios vary with the type of messaging system used, system configuration, and the location of the exception. Process Server supports the binding of asynchronous SCA interfaces to two messaging systems: built-in WebSphere Default Messaging and WebSphere MQ, which is accessible using either JMS (hereafter called MQ JMS binding) or native MQ interface (MQ binding).

For either system, the message delivery in an asynchronous invocation pattern is a two-hop process as shown in Figure 1. The Export module uses the JMS (or MQ) MDB listener service to listen at the destination. When a message arrives on the destination queue, the JMS MDB of the export module is invoked; then processing occurs, including object de-serialization and function selection. The SCA message is then dispatched to the SCA MDB to be delivered to the target component. For performance reasons, the SCA MDB is bypassed during the first message delivery attempt.

We confirmed this MDB interaction by reviewing the Java stack traces produced by system errors. You can see from stack traces that the first time an exception is thrown in a component, the message is picked up by an MDB class specific to the export binding.

When the message is retried, this MDB is no longer in the picture; instead, the message is picked up by the SCA MDB class. The hop from one MDB to another is accomplished by moving the message from the original export destination to the Service Integration Bus (SIB) destination generated for the export by Process Server. The MQ (or MQ JMS or JMS) MDB attempts to move messages when it recognizes that incoming messages have been previously delivered. If the SIB messaging engine is not available, another exception is thrown, and the message is backed out to the original destination. An example of an exception thrown if the messaging engine is not available is provided in the downloadable file SIB_ME_stopped.txt.

Figure 1. Initial routing of message by SCA Export

All SCA components in Process Server are invoked through Java; system exceptions are represented as a subclass of java.lang.RuntimeException. If a system error occurs during component processing, an exception is propagated back, down the Java call stack, to the calling component, and further down the component call chain, until it reaches the SCA MDB.

If there has been an asynchronous invocation in the call chain, the exception reaches the associated MDB which received the most recent asynchronous call and which is listening on the corresponding SIB destination. If all component invocations in the module have been synchronous, this MDB will be the one that is listening on the export module. In this case, the message is first backed out to the messaging destination to which the export module is bound (SIB or MQ). Likewise, if a system error occurs during export processing (object de-serialization or function selection), the message is also backed out to the messaging destination from which it was picked.

What happens next to this message depends on the messaging system. Specific recovery scenarios vary with the messaging system configurations and the location of the exception. Several potential scenarios are discussed below.

If a system exception is propagated back to an asynchronously invoked component, the message is backed out to the SIB queue that corresponds to that component. It is picked up by the MDB listening on the queue. The retry count for the message increments by one and the message delivery is retried. The retry behavior for SIB destinations is the same whether the destination is used for an asynchronous component or module export, and is described below.

Failed Message Routing: WebSphere SIB

In the case of WebSphere Default Messaging, you specify retry behavior by setting two properties of the SIB destination: Maximum failed deliveries and Exception destination. In the Integrated Solutions Console (that is, the Web administrative interface of WebSphere Process Server), select Buses => SIB name => Destinations => SIB Destination name. Then, set these two properties in the SIB Destination properties window, as shown in Figure 2.

Figure 2. Configuring the exception destination for an SIB queue

The redelivery count of the SIB queue destination governs the number of attempts made for the delivery of a message to the target component, in case of failures. If the message has been backed out the queue-specified number of times, it is moved to the exception destination.

For module exports with JMS binding and SCA components, the Process Server run time automatically generates the SIB queue destination when the SCA module is deployed. This auto-generated SIB queue has its exception destination set to the Failed Event queue. Messages sent to the Failed Event queue are picked up by the Failed Event Manager MDB. Auto-generated queue destinations have Maximum failed deliveries set to 5. This means that the message will be moved after five failures. For more information on this retry behavior and the SIB destination settings, see the developerWorks article "Exception Handling in WebSphere Process Server and WebSphere Enterprise Service Bus" by Pamela Fong and Jeff Brent.
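
The retry rule for a SIB queue destination can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class and method names are hypothetical and are not part of any WebSphere API, and the consumer is a plain Java predicate standing in for the target component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: models the retry rule of a SIB queue destination in memory.
// Class and method names are hypothetical, not a WebSphere API.
final class SibRetrySketch {
    final List<String> exceptionDestination = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();
    private final int maxFailedDeliveries;

    SibRetrySketch(int maxFailedDeliveries) {
        this.maxFailedDeliveries = maxFailedDeliveries;
    }

    // Offers the message to the consumer until it succeeds or the failure
    // count reaches the limit. Returns the number of delivery attempts.
    int deliver(String message, Predicate<String> consumer) {
        int failures = 0;
        int attempts = 0;
        while (failures < maxFailedDeliveries) {
            attempts++;
            if (consumer.test(message)) {   // true means processed successfully
                delivered.add(message);
                return attempts;
            }
            failures++;                     // message is backed out to the queue
        }
        exceptionDestination.add(message);  // limit reached: move the message
        return attempts;
    }

    public static void main(String[] args) {
        // 5 is the value the article reports for auto-generated destinations.
        SibRetrySketch queue = new SibRetrySketch(5);
        int attempts = queue.deliver("poison", m -> false);
        System.out.println(attempts + " attempts, exception destination: "
                + queue.exceptionDestination);
    }
}
```

In this model a message that always fails is attempted exactly as many times as the limit allows and then lands on the exception destination, while a message that eventually succeeds never reaches it.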

SIB JMS destinations are connected to MDBs by activation specifications, not listener ports. Activation specifications do not have a poison message-control feature that shuts down the listener. This is not required because poison-control functionality is supplied by the SIB destination (by setting exception destination properties). Thus, in the case of JMS binding, there is one less object to configure.

Poison messages

Poison messages are messages that cannot be processed by the receiving application and are repeatedly backed out to the target destination. Poison message control is infrastructure functionality whose goal is to prevent infinite delivery, failure, and backout of poison messages.

In addition to configuring the messaging infrastructure to meet the business requirements, you need to understand how messages flow in Process Server and the possible failure scenarios so that you can enable appropriate recovery mechanisms to avoid message loss.

Failed message routing: MQ

In the case of MQ (either MQ or MQ JMS binding), retry behavior is controlled by local queue properties named backout threshold (BOTHRESH) and backout requeue queue (BOQNAME). You can find these properties in MQ Explorer on the Storage tab.

Figure 3. Configuring backout attributes of an MQ queue

If backout threshold is specified (not 0), the Process Server run time attempts to move a message to the backout requeue queue after the message has been backed out a specified number of times. However, if backout requeue queue is not specified, does not exist, or the message cannot be put on that queue (for example, the queue has "put enabled" turned off, which causes temporary prohibition of putting messages on the queue), the message is moved to the queue manager's dead-letter queue (also known as the undelivered message queue).
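
The decision described in the preceding paragraph can be written as a pure function. The following is a minimal sketch, not Process Server code: the enum, the method, and the parameter names are hypothetical, and the single boolean collapses the three conditions (queue not specified, queue missing, put disabled) into one flag.

```java
// Sketch only: the backout decision as a pure function.
// All names are hypothetical; nothing here calls WebSphere MQ.
final class MqBackoutSketch {
    enum Disposition { REDELIVER, BACKOUT_REQUEUE_QUEUE, DEAD_LETTER_QUEUE }

    // backoutCount:     how many times the message has been backed out so far
    // backoutThreshold: BOTHRESH of the queue; 0 means "not specified"
    // requeueUsable:    BOQNAME names an existing queue that accepts puts
    static Disposition decide(int backoutCount, int backoutThreshold,
                              boolean requeueUsable) {
        if (backoutThreshold == 0 || backoutCount < backoutThreshold) {
            // No threshold, or threshold not reached: message stays on its queue.
            return Disposition.REDELIVER;
        }
        return requeueUsable
                ? Disposition.BACKOUT_REQUEUE_QUEUE
                : Disposition.DEAD_LETTER_QUEUE;
    }

    public static void main(String[] args) {
        System.out.println(decide(1, 1, true));
        System.out.println(decide(1, 1, false));
        System.out.println(decide(3, 0, true));
    }
}
```

Keeping the rule in one function makes it easy to check a planned queue configuration against the outcome you expect before you test it on a real queue manager.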

Another setting that comes into play is the listener port’s retry count. A listener port is a WebSphere Application Server facility that feeds MQ messages to MDBs. It also keeps track of messages that are repeatedly picked up and backed out, to control poison messages. If a message has been backed out the number of times specified in the listener port retry count, the listener port shuts down.

You can enable listener ports to be created and configured automatically using the "Generate Binding" feature in WebSphere Integration Developer, or you can configure them manually. Auto-generated listener ports have the retry count set to 1; failed messages will not be retried and the listener port will shut down after a first retry attempt.

The types of errors that could result in the message being backed out all the way to the MQ queue are:

  • The message cannot be de-serialized.
  • The message cannot be routed because of an error in the function selection.
  • There is a downstream system failure and the SIB Messaging Engine is not available.

Failures for the first two reasons are not recoverable by a retry because of the unexpected message content. You must change either the message content or the SCA module.

A listener port retry count of 1 works best with an MQ queue backout threshold of 1. This combination of settings results in the removal of defective messages to a backout requeue queue where they can be examined at a later time.

Table 1 summarizes the disposition of an MQ message in case of system failure during its processing, depending on location of failure and MQ queue settings. It assumes an auto-generated listener port with Retry Count=1.

Table 1: Message disposition in case of system failures with MQ and MQ JMS bindings
| Source of failure | BOTHRESH=1, BOQNAME is existing queue | BOTHRESH=1, BOQNAME does not exist or Put Disabled | BOTHRESH <> 1 |
| --- | --- | --- | --- |
| De-serialization, function selection, or SIB Messaging Engine down | Message moved to backout requeue queue | Message moved to dead-letter queue | Message returns to the same queue and becomes poisonous (would be picked up and backed out repeatedly). The listener port shuts down. |
| Any SCA component other than Export | SIB forwards message to Failed Event Manager (all three configurations) | | |

Recovering messages that were moved to the backout requeue queue or the dead-letter queue requires an examination of the failure reasons. Because messages do not carry any exception information, you need to determine the cause of the failure from other sources, such as Process Server logs or CEI events. After you identify and rectify the cause, you can move the messages back to the destination queue. WebSphere MQ Explorer does not enable you to move messages, so you need an alternative solution: the administration tool supplied in MQ Support Pack MO71, another MQ administration tool of your choice, or a custom script.

WebSphere MQ Support Pack MO71 (GUI Administrator) is a simple GUI tool for administering local and remote queue managers. You can leverage the following capabilities for failed message recovery:

  • Browse queues and individual messages, including all MQ headers.
  • Save message to a file and load message from file, which enables sophisticated editing operations using a variety of third-party tools of your choice most appropriate for the message content type.
  • Move messages to an arbitrary queue. The move message feature maps to the "resubmit message" functionality of the Failed Event Manager in the SIB world.

Figure 4. Browsing a failed message with MQ Support Pack MO71

Under most scenarios, messages on the backout requeue queue will not preserve the name of the original destination queue from which they have been requeued, because the destination queue name is only present in JMS headers, not MQ headers. Non-JMS clients do not send JMS headers. JMS clients, such as WebSphere Application Server, may choose to include or omit certain headers, so you cannot rely on the header being there. Also, WebSphere MQ routing is very flexible, and topologies implemented in production environments are often quite elaborate. In this situation, the message sender cannot know the eventual message destination, and the queue name it puts into the JMS header will not be the SCA module destination queue where the message failed (and where it should be returned).

Therefore, if you share the backout requeue queue among several destination queues, you might have great difficulty determining the original queue from which each particular message has been requeued, and your ability to correctly recover messages for eventual processing will suffer. For these reasons, you should assign a dedicated backout requeue queue to each module export destination.
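
A recovery script can only return messages correctly if each backout requeue queue maps to exactly one destination queue. The sketch below illustrates that constraint with a plain lookup table; it does not call WebSphere MQ, and the class, method, and queue names are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a lookup table a recovery script might keep.
// Queue names and all identifiers are hypothetical.
final class BackoutQueueRouting {
    private final Map<String, String> originalByBackoutQueue = new HashMap<>();

    // Registers a destination and its backout requeue queue. Rejects a
    // backout queue that is already assigned to another destination, because
    // a shared queue makes the origin of a requeued message ambiguous.
    void register(String destinationQueue, String backoutQueue) {
        String existing =
                originalByBackoutQueue.putIfAbsent(backoutQueue, destinationQueue);
        if (existing != null && !existing.equals(destinationQueue)) {
            throw new IllegalArgumentException(backoutQueue
                    + " is already the backout queue of " + existing);
        }
    }

    // Returns the queue to which messages found on backoutQueue should be
    // moved back, or null if the backout queue is unknown.
    String resubmitTarget(String backoutQueue) {
        return originalByBackoutQueue.get(backoutQueue);
    }

    public static void main(String[] args) {
        BackoutQueueRouting routing = new BackoutQueueRouting();
        routing.register("CREDIT.REQUEST", "CREDIT.REQUEST.BACKOUT");
        System.out.println(routing.resubmitTarget("CREDIT.REQUEST.BACKOUT"));
    }
}
```

With dedicated backout queues the lookup is unambiguous; with a shared queue there is no table that could answer the question, which is the situation the paragraph above warns against.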

Comparing recovery tools for MQ and SIB

For both MQ and SIB bindings, recovery from system errors consists of planning (configuring messaging resources with a possibility of failures in mind), failure investigation, and resubmission (moving the message back to the processing queue, possibly after an edit). The recovery tooling and procedure you use depends on the messaging system (SIB or MQ) involved in moving the message to requeue or exception destination. As explained above, in the case of MQ or MQ JMS binding, the message can be moved to a SIB exception destination, if the failure occurs after an asynchronous call within the SCA module. So, you always use SIB tools for recovery in JMS-bound services, but you might need to use both MQ and SIB tools for recovery of MQ and MQ JMS-bound services.

The SIB recovery tool is Failed Event Manager. It provides an easy-to-use Web interface and is part of the Integrated Solutions Console, which is the primary Web-based administrative tool for the WebSphere Application Server family, including WebSphere Process Server. (This interface was formerly known as "Admin Console"). Full integration with the WebSphere platform gives this Failed Event Manager several advantages:

  • Access to related common events supplies information for failure investigation.
  • If editing message data is required, business object data type awareness limits the possibility of corrupting message data and submitting an invalid message.
  • The tool remembers the original SIB destination where the message failed, so resubmission will be automatically directed to the correct destination.

Shortcomings of Failed Event Manager (FEM) are a reflection of its strength. In particular, with FEM it is not possible to resubmit a failed message to a different destination or to change the business object type. It is a black box application that currently does not provide extension points for customization.

Any MQ recovery tool suffers from its independence from Process Server. Failure investigation may be hampered by the need to hunt down error information that is not attached to the MQ message. You edit message data in a raw format, which has its pros and cons; you have unlimited power to change anything in the message, but run a risk of creating an invalid message format. Editing messages that use a non-text serialization model (such as Java serialization) in raw format presents nearly insurmountable challenges. Table 2 summarizes the differences between recovery tools for MQ and SIB JMS messages.

Table 2: Comparison of recovery tools for failed messages in MQ and SIB


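| Capability | MQ tools | Failed Event Manager (SIB) |
| --- | --- | --- |
| Access to exception information | Not available | Available through related Common Base Events |
| Editing message data formats | Raw data; editing binary format extremely challenging | All formats, attribute-by-attribute |
| Changing data type | Possible | Not possible |
| Resubmission destination | Select queue manually | To original queue automatically |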

In practice, recovery of MQ messages in production systems will likely require some degree of scripting automation.

Failed Event Manager

Failed Event Manager is the tool for recovery of SIB messages that could not be processed. The FEM MDB listens to the failed event queue and logs the message to the FEM database.

Failed Event Manager is the most powerful recovery tool in the WebSphere Process Server tool set. To access FEM, log on to the WebSphere Process Server Integrated Solutions Console, and go to Integration Applications => Failed Event Manager, as shown in Figure 5.

Figure 5. Accessing Failed Event Manager

Using FEM, you can investigate the failure by viewing event details to see which component failed and detailed information about the failure (Java Exception) and the original message.

Figure 6. Inspecting a failed event in Failed Event Manager

You can also browse Common Base Events related to the message. And, you can edit the message content and resubmit it.

Figure 7.

You can also resubmit the original message without changes, and you can "batch resubmit" multiple messages. Resubmitted messages are placed on the queue where the failure originally occurred.
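
The resubmission behavior can be modeled in a few lines. This is an in-memory sketch under stated assumptions, not the Failed Event Manager API: the class names, the string payload, and the destination name are hypothetical. It shows the one property that matters for recovery, namely that the failed event carries its original destination, so resubmission needs no routing decision.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: an in-memory model of resubmission, not the FEM API.
final class FailedEventResubmitSketch {
    // A failed event keeps the destination where the failure occurred.
    static final class FailedEvent {
        final String originDestination;
        final String payload;

        FailedEvent(String originDestination, String payload) {
            this.originDestination = originDestination;
            this.payload = payload;
        }

        // Editing changes the payload only; the destination stays the same.
        FailedEvent withPayload(String editedPayload) {
            return new FailedEvent(originDestination, editedPayload);
        }
    }

    private final Map<String, Deque<String>> queues = new HashMap<>();

    Deque<String> queue(String name) {
        return queues.computeIfAbsent(name, k -> new ArrayDeque<>());
    }

    // Places the message back on the queue where the failure occurred.
    void resubmit(FailedEvent event) {
        queue(event.originDestination).add(event.payload);
    }

    // "Batch resubmit": the same operation applied to several events.
    void resubmitAll(List<FailedEvent> events) {
        for (FailedEvent event : events) {
            resubmit(event);
        }
    }

    public static void main(String[] args) {
        FailedEventResubmitSketch fem = new FailedEventResubmitSketch();
        List<FailedEvent> failed = new ArrayList<>();
        failed.add(new FailedEvent("CreditService.in", "employee=42"));
        failed.add(new FailedEvent("CreditService.in", "employee=??")
                .withPayload("employee=43"));
        fem.resubmitAll(failed);
        System.out.println(fem.queue("CreditService.in"));
    }
}
```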

Comparing synchronous and asynchronous calls within SCA modules

Regardless of the type of invocation mechanism used, the communication style between each of the SCA components in a module can be synchronous or asynchronous. Synchronous invocation of a component is implemented as a Java call. Asynchronous invocation entails putting a message on a SIB destination associated with the target component; the SCA MDB picks the message up from there and invokes the component implementation class. To set the preferred interaction style for the interface of each SCA component, use the Details pane of the Properties page, as shown in Figure 8. The SCA component is invoked asynchronously if its preferred interaction style is Asynchronous and synchronously if its preferred interaction style is Synchronous or Any.

Figure 8. Specifying the preferred interaction style on an interface of an export

The preferred interaction style can be overridden if one Java component is invoked from another. Listing 1 shows the Java code for selecting synchronous or asynchronous invocation at run time based on a message attribute.

Listing 1: Overriding preferred interaction style at run time
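public void induceErrorOnOneWay(DataObject msg) {
  boolean useAsync = msg.getBoolean("dispatchAsynchronously");
  OneWayErrorInducer service = locateService_OneWayErrorInducerPartner();
  if (useAsync) {
    // Will use ASYNC invocation through the asynchronous partner interface
    OneWayErrorInducerAsync asyncService = (OneWayErrorInducerAsync) service;
    // Assumption: the asynchronous method is named induceErrorOnOneWayAsync
    asyncService.induceErrorOnOneWayAsync(msg);
  } else {
    // Will use SYNC call
    // Assumption: the partner operation has the same name as this method
    service.induceErrorOnOneWay(msg);
  }
}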

Asynchronous invocations can occur between components within the SCA module regardless of the module binding type. Services provided by the module may be invoked synchronously (for example, using a Web service over HTTP call); however, some invocations inside the module might be asynchronous. Such an invocation breaks a transaction into "before" and "after" parts. When an asynchronous call boundary is reached, the SCA runtime attempts to put a message on the SIB queue of the target component. If successful, all previous work is committed. All work done up to this point is not undone or repeated even if a system error occurs later during service request processing.
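
The way an asynchronous hop splits a request into "before" and "after" parts can be illustrated with a toy model. This is a sketch under stated assumptions and does not use real transactions: the names are hypothetical, and the behavior when the put fails (nothing is committed) is an assumption that the article does not spell out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch only: a toy model of the transaction boundary at an asynchronous hop.
final class AsyncBoundarySketch {
    final List<String> committed = new ArrayList<>();
    final Deque<String> targetQueue = new ArrayDeque<>();

    // Runs the "before" part and the asynchronous hop. putSucceeds simulates
    // whether the message can be put on the target component's SIB queue.
    boolean invoke(List<String> workBeforeHop, String message, boolean putSucceeds) {
        if (!putSucceeds) {
            return false;                  // assumption: nothing is committed
        }
        targetQueue.add(message);
        committed.addAll(workBeforeHop);   // "before" part is now permanent
        return true;
    }

    // Runs the "after" part. A failure backs the message out to the queue
    // and leaves the already committed work untouched.
    boolean processTarget(boolean componentFails) {
        if (targetQueue.isEmpty() || componentFails) {
            return false;                  // message (if any) stays queued
        }
        targetQueue.poll();
        return true;
    }

    public static void main(String[] args) {
        AsyncBoundarySketch sketch = new AsyncBoundarySketch();
        sketch.invoke(Arrays.asList("reserve credit"), "score request", true);
        sketch.processTarget(true);        // target component fails
        System.out.println("committed=" + sketch.committed
                + " queued=" + sketch.targetQueue);
    }
}
```

After the simulated failure, the committed work is still in place and the message is still queued, which is why only the "after" part is ever retried.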

The target component then picks up the message from the SIB destination. If an error occurs, the message rolls back to the component's SIB queue, and delivery is retried up to the number of times set in Maximum failed deliveries. After that, the message is moved to the Failed Event Manager queue. It is possible to recover such a message, but it is often impractical because of the short life span of synchronous calls. If the service has a two-way (request-response) interface, the component that issued the asynchronous call would most likely time out by the time that manual intervention can recover the failed message. This would result in a failure returned to the original synchronous caller. If you raise the timeout on an asynchronous call within an SCA module, the original service client will likely time out on its own. Because of this, the best disposition of failed messages in FEM that originate from synchronous interfaces is to discard them. All in all, asynchronous hops in synchronous requests are just not destined to be recoverable.

Figure 9 shows the detailed processing of a message in Process Server followed by some failure scenarios.

Figure 9. Message routing in case of component failure

A JMS/MQ message flows within an SCA module as a deserialized business object. When the message arrives on a JMS (or MQ) Queue, it is delivered to the target component as follows:

  1. The message is picked up by the JMS MDB of the Export component using the listener service.
  2. After deserialization and function selection are complete, the JMS MDB would normally put the message on the SIB destination for the SCA MDB to pick up. However, for performance reasons, the hop from the JMS MDB to the SCA MDB is skipped on the first delivery attempt, and the JMS MDB sends the message directly to the target component.
  3. If an error occurs at the target component in this situation, the message is rolled back to the JMS MDB.
  4. The JMS MDB returns the message to the originating queue (JMS/MQ queue or SIB destination of the originating component).
  5. Depending on the maximum retry count set on the queue, the message delivery is retried. This time, the JMS MDB sends the message to the SCA destination to be picked up by the SCA MDB.
  6. The SCA MDB is responsible for the delivery of the message to the target component.
  7. If a failure occurs, the message is rolled back to the SIB destination. The SCA MDB retries delivery as many times as specified by Maximum failed deliveries, and then the message is removed from the queue.
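
The difference between the first delivery attempt and a redelivery in the steps above can be sketched as a function that lists the hops a message takes. The names are hypothetical, and the model deliberately covers only the routing choice, not retry counts.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: lists the hops of one delivery attempt through an export.
// Hop names are descriptive labels, not class names from Process Server.
final class ExportRoutingSketch {
    // previouslyDelivered: true if the message has been backed out before.
    static List<String> hops(boolean previouslyDelivered) {
        List<String> hops = new ArrayList<>();
        hops.add("export MDB");
        if (previouslyDelivered) {
            // Redelivery: the message is moved to the SIB destination
            // generated for the export, and the SCA MDB takes over.
            hops.add("SIB destination");
            hops.add("SCA MDB");
        }
        // First attempt: the SCA MDB hop is bypassed for performance.
        hops.add("target component");
        return hops;
    }

    public static void main(String[] args) {
        System.out.println("first attempt: " + hops(false));
        System.out.println("redelivery:    " + hops(true));
    }
}
```

This is also what the stack traces show: the export-specific MDB appears in the trace of the first failure, and the SCA MDB appears in the traces of later failures.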

Synchronous services

For synchronous services, it is often impractical to recover these messages because of their short life span. If the service has a two-way (request-response) interface, the component that issued an asynchronous call would most likely time out by the time that manual intervention to recover the failed message occurs. This would result in a failure returned to the original synchronous caller. If you raise the timeout on an asynchronous call within the SCA module, the original service client will likely time out on its own. Therefore, the best disposition of failed messages in FEM that originate from synchronous interfaces is simply to discard them. All in all, asynchronous hops in synchronous requests are just not destined to be recoverable. So, critical business transactions are better served by an asynchronous model, which is capable of providing the assured delivery of messages and which has recovery mechanisms.

Common Events Infrastructure (CEI)

The Common Events Infrastructure (CEI) can help you troubleshoot message delivery problems. You can record details of messages as they traverse SCA components and later view this information in the CEI browser. However, CEI is not a recovery tool; you cannot use CEI to edit or resubmit messages. Failure information is separated from the original message content; that is, the information is part of a different CEI event.

You can define Common Base Events for each operation of the interface for a component except Export. You can also specify how much information to record with each event: None, Digest, or Full.

Figure 10. Configuring CEI to record full event content

The information saved depends on the type of event. With full content, selecting Entry and Exit events records the contents of the business object. For a Failure event, failure (exception) details are recorded. Therefore, to completely investigate a component failure using CEI, you might need to enable and configure two types of events for full event content: Entry (for collecting message data) and Failure (for collecting exception information).

You can review CEI events using the Common Base Events browser application. To access it, log on to the Process Server Integrated Solutions Console, and select Integration Applications => Common Base Events Browser.

Figure 11. Browsing Common Base Events generated by SCA

The following example scenario is a simple Credit Service that uses a one-way interface to accept employee data, and uses maps and a Java component to return a credit score along with the manager's approval. A simple Java test component throws various exceptions. The intention of this sample project is to illustrate different possible failure conditions in an SCA module, and to give you a thorough understanding of the system behavior to enable recovery. Therefore, the module has been kept as simple as possible.

About the sample code

You can use the sample code in the download to test message routing and recovery with different MQ queue settings. The sample code requires WebSphere MQ Version 6.0.1 and WebSphere Process Server Version 6.0.2.

To set up the sample:

  1. Create a default local queue manager named QM_nyconst60 that is listening on port 1414.
  2. Use the MQSC script in the file create_qs.txt to create the local queues.
  3. Import the sample code into WebSphere Integration Developer V6.0.2 or above as a Project Interchange file.
  4. Change MQ_INSTALL_ROOT to point to WMQ installation as described in the Technote titled "java.lang.NoSuchFieldError: msgToken error occurs when trying to send message from WebSphere Application Server V6.0 to a WebSphere MQ V6 queue" at
  5. Use Test Component or Test Module features to test components in MessageSender module. If the boolean-typed message attribute shouldFail is set to true, the Java component in the receiving module will throw a RuntimeException. You can use the boolean-typed attribute dispatchAsynchronously to select synchronous or asynchronous intra-module invocation style.


The authors would like to thank AJ Aronoff, Jonathan Machules, and Devi Gupta for their help in preparing this article.


When a system error occurs in an asynchronous SCA service, the failed message is not lost. Knowing where the message has been routed helps you recover from system errors and complete the asynchronous service request. For messages that arrive via WebSphere SIB JMS, Failed Event Manager provides adequate investigation and recovery capabilities. If messages arrive directly from WebSphere MQ, full recovery requires an MQ support pack, other MQ administration software, or a custom script.


| Description | Name | Size |
| --- | --- | --- |
| Sample code in Project Interchange format | source.zip | 41 KB |
| MQSC script for creating sample queues | create_qs.zip | 1 KB |
| Example stack trace, SIB messaging engine not available | SIB_ME_stopped.zip | 3 KB |


