Unexpected application errors

Unexpected errors may be caused by logic errors in the code of an FTM application. Although every effort is made to eliminate such errors during the development and test cycle, any that remain must still be handled as gracefully as possible.

An unexpected error in a message flow that results in an exception can be caught by an earlier TryCatch node or, ultimately, by the Catch and Failure terminals of an MQInput node.

Each of the top-level message flows in the Flow Coordinator has a generic error handler wired to the input node of the flow.

It is the responsibility of the generic error handler to raise the E_UnexpectedError event to log the details of the exception (it does this out of transaction so the logging will not be rolled back), then re-throw the exception. The exception is then caught by the standard exception handling of the integration bus, which rolls back the transaction.
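The log-then-rethrow pattern can be sketched in Python to show why the logged event survives the rollback. This is a minimal model under stated assumptions, not FTM code: the `Transaction` class, the `event_log` list, and the `run_input_node` and `buggy_flow` functions are hypothetical stand-ins for work that the integration bus and the FTM flows actually perform.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical unit of work: writes become visible only on commit."""
    pending: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def write(self, item):
        self.pending.append(item)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


# Stands in for the event log; appended to directly, outside any Transaction.
event_log = []


def generic_error_handler(flow, message, txn):
    """Run the flow; on failure, log E_UnexpectedError outside txn and re-raise."""
    try:
        flow(message, txn)
    except Exception as exc:
        # Not written through txn, so the caller's rollback cannot undo it.
        event_log.append({"event": "E_UnexpectedError", "detail": repr(exc)})
        raise  # re-throw so the caller rolls back the transaction


def run_input_node(flow, message):
    """Hypothetical stand-in for the input node's standard exception handling."""
    txn = Transaction()
    try:
        generic_error_handler(flow, message, txn)
    except Exception:
        txn.rollback()
        return txn, False
    txn.commit()
    return txn, True


def buggy_flow(message, txn):
    """Example flow with a logic error after it has already made an update."""
    txn.write(("state_change", message))
    raise ValueError("logic error")


txn, ok = run_input_node(buggy_flow, "msg-1")
```

After the call, the flow's state change has been rolled back, while the `E_UnexpectedError` entry remains in `event_log` because it was never part of the transaction.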

The integration bus then retries the message flow. To prevent an infinite loop of retries, one of the following techniques, depending on configuration, may be used:

The MQ definition of the queue may specify a BackoutThreshold, which is the maximum number of times a message is retried before it is routed to the failure terminal.
The message flow can check the BackoutCount field of the MQMD header. If this is non-zero (that is, the message has already been backed out at least once) and an error has already been logged, the message can be ignored.
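The two retry-limiting techniques can be expressed as a single decision function. This is a hypothetical sketch: in practice the BackoutThreshold check is made by the MQInput node itself, and the `Disposition` names and the `error_already_logged` flag are assumptions for illustration. It relies on the MQMD BackoutCount field, which records how many times the message has been backed out. The exact threshold comparison (reaching versus exceeding the threshold) is also an assumption here and should be confirmed in the product documentation.

```python
from enum import Enum
from typing import Optional


class Disposition(Enum):
    PROCESS = "process"    # run the message through the flow as normal
    IGNORE = "ignore"      # redelivery of a message whose error is already logged
    FAILURE = "failure"    # retries exhausted: route to the failure terminal


def decide(backout_count: int,
           backout_threshold: Optional[int],
           error_already_logged: bool) -> Disposition:
    """Hypothetical retry guard combining both techniques."""
    # Technique 1: a BackoutThreshold configured on the queue.
    # Assumption: routing happens once the count reaches the threshold.
    if backout_threshold is not None and backout_count >= backout_threshold:
        return Disposition.FAILURE
    # Technique 2: a non-zero BackoutCount means this is a redelivery.
    if backout_count > 0 and error_already_logged:
        return Disposition.IGNORE
    return Disposition.PROCESS
```

For example, `decide(1, 3, True)` returns `Disposition.IGNORE`, because the message is a redelivery whose error has already been recorded.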

When logging an error, the generic error handler writes the E_UnexpectedError event to the event log as an out-of-transaction update, so the logging is not rolled back. This event contains an XML rendering of the integration bus exception tree, which holds the full details of the exception from the exception list. The event also records the context information from the environment tree, so that as much information as possible is available about the application data that was being processed when the error occurred.
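The combination of exception details and context into one XML event can be illustrated with Python's standard `xml.etree.ElementTree` module. This sketch assumes the exception list and context are plain nested dictionaries; the element names under `ExceptionList` and `Context`, and the sample values, are illustrative only and do not reproduce the real FTM event layout.

```python
import xml.etree.ElementTree as ET


def to_xml(name, value):
    """Render a nested dict (or a scalar leaf) as an XML element."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    else:
        elem.text = str(value)
    return elem


def build_unexpected_error_event(exception_list, context):
    """Combine exception details and context data into one XML event."""
    event = ET.Element("E_UnexpectedError")
    event.append(to_xml("ExceptionList", exception_list))
    event.append(to_xml("Context", context))
    return ET.tostring(event, encoding="unicode")


# Illustrative data only; real exception lists carry many more fields.
sample_exceptions = {
    "RecoverableException": {
        "Text": "Error detected in node",
        "UserException": {"Text": "Simulated logic error"},
    }
}
sample_context = {"EventId": "1001", "Action": "HypotheticalAction"}

xml_text = build_unexpected_error_event(sample_exceptions, sample_context)
```

A dictionary cannot hold repeated child names, which a real exception tree may contain. A fuller rendering would walk the tree as an ordered list of (name, value) pairs instead.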

To facilitate the collection of context data, the FTM message flows manage a subtree named Environment/PMP/Variables. Initial context information is set by the Flow Coordinator; the event processing flow then adds the identity of the event being processed, the transition row that was matched for that event, and the name of the associated action.

Application flows may add to this information. For example, when an action is processed for an event that is associated with several objects passed in the environment result set, the flow should record which object is currently being processed.
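The layered population of Environment/PMP/Variables can be modelled with nested dictionaries. In this hypothetical sketch the field names (`InputQueue`, `EventId`, `TransitionRow`, `Action`, `CurrentObjectId`) and all values are assumptions, and `set_path` is an illustrative helper rather than an FTM or integration bus API. The point it shows is that the current object is recorded before work on it starts, so a failure leaves the right identity in the context.

```python
# The Environment tree is modelled as nested dicts; field names are hypothetical.
VARIABLES = "PMP/Variables"


def set_path(tree: dict, path: str, value) -> None:
    """Set the leaf at a slash-separated path, creating parent nodes as needed."""
    *parents, leaf = path.split("/")
    node = tree
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def process_objects(environment: dict, object_ids, handler) -> None:
    """Apply handler to each object, recording which one is current."""
    for obj_id in object_ids:
        # Record the current object before doing any work on it, so an error
        # raised inside handler leaves the right identity in the context.
        set_path(environment, f"{VARIABLES}/CurrentObjectId", obj_id)
        handler(obj_id)


environment = {}
# Flow Coordinator: initial context.
set_path(environment, f"{VARIABLES}/InputQueue", "HYPOTHETICAL.INPUT.QUEUE")
# Event processing flow: event identity, matched transition row, action name.
set_path(environment, f"{VARIABLES}/EventId", 1001)
set_path(environment, f"{VARIABLES}/TransitionRow", 7)
set_path(environment, f"{VARIABLES}/Action", "HypotheticalAction")


def failing_handler(obj_id):
    """Example handler with a logic error on one object."""
    if obj_id == 502:
        raise RuntimeError("simulated logic error")


try:
    process_objects(environment, [501, 502, 503], failing_handler)
except RuntimeError:
    pass  # the generic error handler would log `environment` at this point
```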

The unexpected error event may be viewed in the Operations and Administration Console.

If the message flow still fails after the message has been retried the designated number of times, the MQInput node either writes the message to a designated backout (or dead-letter) queue or, if the Failure terminal is wired up, passes the message to the failure flow to handle the error. It is good practice to wire up the Failure terminal to handle such errors. The sample application performs final error handling with a Trace node followed by output of the message to a failure queue. For a live application, it is recommended to have a separate failure queue for each input source and for the event queue.
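Where a message finally ends up can be summarized as a small routing function. This is a simplified, hypothetical sketch: the queue names are invented, the `trace` list stands in for the Trace node, and the fallback from backout queue to dead-letter queue is a simplification of what the MQInput node does when its Failure terminal is not wired. It follows the recommendation of one failure queue per input source plus one for the event queue.

```python
from typing import Optional

# Hypothetical queue names: one failure queue per input source plus one for
# the event queue, as recommended for a live application.
FAILURE_QUEUES = {
    "HYPO.PAYMENTS.IN": "HYPO.PAYMENTS.FAILURE",
    "HYPO.STATUS.IN": "HYPO.STATUS.FAILURE",
    "HYPO.EVENT.QUEUE": "HYPO.EVENT.FAILURE",
}


def final_destination(source_queue: str,
                      failure_terminal_wired: bool,
                      backout_queue: Optional[str],
                      dead_letter_queue: str,
                      trace: list) -> str:
    """Return where a message ends up once its retries are exhausted."""
    if failure_terminal_wired:
        # Failure flow, as in the sample: trace first, then the failure queue.
        trace.append(f"final failure for message from {source_queue}")
        return FAILURE_QUEUES[source_queue]
    # No failure flow (simplified): the input node writes the message to the
    # backout queue if one is defined, otherwise to the dead-letter queue.
    return backout_queue if backout_queue else dead_letter_queue
```

Keeping a separate failure queue per source means that, after the fix, failed messages can be moved back to the correct input queue without first being sorted by origin.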

Once processing has stopped because of an unexpected error, detailed problem analysis is needed. After the problem has been resolved, you must investigate how the objects that were halted in a particular state of the business process can be restarted. Several methods are available to continue processing and bring the process to a conclusion. Some examples are:

It may be possible to copy the failed message from the failure queue to the appropriate input queue. Because the problem is now resolved, processing should continue as expected.
In some cases, it might be possible to manually create an event of the required type to restart processing.
It might be possible to move objects into a new state at the database level.

All of these methods are labor-intensive and require in-depth knowledge of the business process and the implemented solution. For that reason, it is important to account for all expected errors in the FSM design, so that processes affected by those errors can easily be restarted.