Large messages often contain many smaller individual messages. WebSphere Business Integration Message Broker is ideally suited to handling messages of this type, because the smaller messages can be parsed from within the larger message and propagated individually. This means that the large outer message, which can be hundreds of megabytes in size, need not be fully parsed in one go; a full parse would create a bottleneck that can consume large amounts of resource.
Instead, a "message pointer" can move through the larger message, building up the smaller messages, which can then be propagated once they have been successfully parsed. After a smaller message has been propagated, the part of the message tree that was parsed to build it can be deleted. This keeps memory usage constant while parsing continues with the next message.
This technique relies on the fact that the Message Repository Manager (MRM), XML and XML namespaces (XMLNS) parsers provided by WebSphere Business Integration Message Broker are on-demand parsers, only parsing enough of the input message to satisfy the current request.
This article uses the message processing technique described below to:
- Exploit the on-demand parsing capability inherent in the broker parsers.
- Exploit the ability to delete portions of the message tree that correspond to previously parsed sections of the input message.
Simple message scenario using an XML-based model
Let's consider a generic case where large amounts of message data need to be broken into smaller, manageable "chunks" so that processing can continue in a timely manner. This allows an interactive style of processing rather than a batch processing model. A common scenario that exhibits the large message problem, that is, outer "envelope" messages that contain many inner messages, is found in the X12 Electronic Data Interchange (EDI) message handling domain, http://www.x12.org/. See the Further examples section at the end of this article for a brief outline of this messaging standard and how these messages can be modeled in the MRM domain so that the technique described here can be used. Another EDI messaging standard to which the technique applies, because of the large message sizes it involves, is Electronic Data Interchange for Administration, Commerce and Transport (EDIFACT). For this article, I will use a much simpler message scenario based on an XML model.
XML instance document
Here is an XML instance document that demonstrates an outer or envelope message that contains multiple inner messages.
```xml
<OuterMessage>
  <InnerMessage>                       <!-- repeatable -->
    <MessageBody>...</MessageBody>
  </InnerMessage>
  ...
</OuterMessage>
```
The OuterMessage element contains multiple InnerMessage child elements.
Message processing technique
The example code snippets in this article process the XML messages defined above. Individual inner messages can themselves be very large, so memory usage during a parse can also be large.
The example code is written in Extended Structured Query Language (ESQL) and is executed by a Compute node. A Compute node takes as input a message tree (called InputRoot) and outputs a different message tree (called OutputRoot). Notice that InputRoot is read-only and therefore cannot be modified in place.
Figure 1. The message flow
Before we can start processing the outer message, we need to take InputRoot and make it mutable. This allows us to delete parts of the tree after we have successfully parsed them, thus freeing memory. We can do this by copying the input tree, which is still backed by the bitstream, to the Environment tree.
```esql
-- Copy the input tree, backed by the bitstream, to the environment
SET Environment.Variables.InputRoot = InputRoot.XML;
-- Set a message pointer to this copied message tree
DECLARE InMessageCopy REFERENCE TO Environment.Variables.InputRoot;
```
Then we set a message pointer to the start of our input message ready for processing.
```esql
-- Shortcut to our input message
DECLARE InputMessage REFERENCE TO InMessageCopy;
```
All subsequent moves will then invoke the XML parser in partial parsing (or on-demand) mode. This means that the parser will only parse enough of the bitstream to fulfil the parsing requirement created by the ESQL statement.
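To illustrate what "enough of the bitstream" means, the following fragment contrasts two statements. It is a sketch that assumes the InputMessage reference declared above, still pointing at the copied tree; the variable name Total is hypothetical. A statement that needs data from the end of the message, such as counting every InnerMessage, forces the parser through the whole bitstream, which is exactly what this technique tries to avoid.

```esql
-- Cheap: should need little more than the OuterMessage start tag
MOVE InputMessage FIRSTCHILD NAME 'OuterMessage';

-- Expensive: to count every InnerMessage, the parser has to read to
-- the end of the outer message, so the whole tree is built in memory.
-- Avoid statements like this on very large messages.
DECLARE Total INTEGER CARDINALITY(InputMessage.InnerMessage[]);
```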
For example, we could move the message pointer to the OuterMessage element.
```esql
-- Move to the OuterMessage element
MOVE InputMessage FIRSTCHILD NAME 'OuterMessage';
```
We can then access this part of the tree in ESQL, because it has been parsed. If we try to access any part of the tree beyond this node, the parser is invoked again, so we want to do that only when required.
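Because a failed MOVE leaves the reference where it was, it is worth checking LASTMOVE before relying on the new position. A minimal sketch, assuming the code runs inside the Compute node's Main function, so that RETURN FALSE ends processing without propagating anything further:

```esql
-- Move to the OuterMessage element
MOVE InputMessage FIRSTCHILD NAME 'OuterMessage';
-- If the move failed, InputMessage still points at the copied root,
-- so there is no outer message to split; stop here
IF NOT LASTMOVE(InputMessage) THEN
  RETURN FALSE;
END IF;
```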
Building the output message
Now we can move through our input message and build up our output message as we go.
```esql
-- Declare a variable that holds the number of the InnerMessage
-- we are currently processing. This is useful for reporting
-- the number of messages we have processed.
DECLARE Inner_No INTEGER 0;

-- Move onto the first InnerMessage, ready to begin processing
MOVE InputMessage FIRSTCHILD NAME 'InnerMessage';

-- Loop until a MOVE fails, that is, until we run off the end
-- of the outer message
WHILE LASTMOVE(InputMessage) DO
  CASE FIELDNAME(InputMessage)
    -- If we're on an InnerMessage element, copy it to the
    -- output and propagate it
    WHEN 'InnerMessage' THEN
      -- Delete the previously processed InnerMessage block,
      -- if there is one
      IF Inner_No >= 1 THEN
        DELETE PREVIOUSSIBLING OF InputMessage;
      END IF;
      -- Increment the InnerMessage counter
      SET Inner_No = Inner_No + 1;
      -- Create the output message headers from
      -- the original message headers
      CALL CopyMessageHeaders();
      -- Copy the InnerMessage to OutputRoot
      SET OutputRoot.XML.InnerMessage = InputMessage;
      -- Propagate the InnerMessage
      PROPAGATE;
  END CASE;
  MOVE InputMessage NEXTSIBLING;
END WHILE;
```
The act of propagating a message clears OutputRoot automatically, so there is no need to reset OutputRoot after each propagation.
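The loop calls CopyMessageHeaders() on every iteration because PROPAGATE clears the whole output tree, headers included. The article does not show the procedure body; a plausible version, assuming the standard procedure that the Message Broker Toolkit typically generates in a new Compute node's module, copies every top-level folder of InputRoot except the last one (the message body). It must be defined inside the same CREATE COMPUTE MODULE block as the loop.

```esql
CREATE PROCEDURE CopyMessageHeaders() BEGIN
  DECLARE I INTEGER 1;
  -- Number of top-level folders: Properties, transport headers, body
  DECLARE J INTEGER CARDINALITY(InputRoot.*[]);
  -- Copy every folder except the last one, which is the message body
  WHILE I < J DO
    SET OutputRoot.*[I] = InputRoot.*[I];
    SET I = I + 1;
  END WHILE;
END;
```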
Memory usage is kept to a minimum here. The DELETE PREVIOUSSIBLING statement frees the memory that holds the parsed information for that node of the tree (in this case, the previous InnerMessage). This is necessary because a parsed message tree takes up much more space than the bitstream that represents it.
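Because every InnerMessage is sent with an explicit PROPAGATE, the Compute node should not propagate OutputRoot once more when its Main function finishes. The sketch below shows how the pieces might sit together, assuming a hypothetical module name, SplitLargeMessage_Compute. Returning FALSE from Main tells the node not to propagate the (by now empty) output tree.

```esql
CREATE COMPUTE MODULE SplitLargeMessage_Compute -- hypothetical name
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Copy the input tree and set up the message pointer
    SET Environment.Variables.InputRoot = InputRoot.XML;
    DECLARE InMessageCopy REFERENCE TO Environment.Variables.InputRoot;
    DECLARE InputMessage REFERENCE TO InMessageCopy;
    MOVE InputMessage FIRSTCHILD NAME 'OuterMessage';

    -- ... the WHILE loop that propagates each InnerMessage goes here ...

    -- Each InnerMessage has already been propagated, so stop the
    -- node from propagating OutputRoot a final time
    RETURN FALSE;
  END;

  -- CopyMessageHeaders() must also be defined in this module (not shown)
END MODULE;
```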
Further examples
This example shows a business model that you can handle using the technique in this article.
The X12 standard uses a single large message that can envelope several hundred smaller messages within Functional Groups (identified by the tags GS and GE). Each Functional Group can in turn contain thousands of Transaction Set messages (identified by the tags ST and SE).
Figure 2. Overview of the X12 model tags
The outer message here would be the ISA/IEA Interchange Control, and the inner messages would be the ST/SE Transaction Sets.
Each ST/SE section represents an individual message that needs to be processed. Normally, a Functional Group contains messages of a similar type that require similar processing.
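For X12, the same pattern might become two nested loops: an outer loop over the Functional Groups and an inner loop over the Transaction Sets in each group. This sketch is hypothetical. It assumes an MRM model in which the GS/GE groups appear as elements named FunctionalGroup and the ST/SE sets as elements named TransactionSet, that these elements are adjacent siblings, and that InMessageCopy references a mutable copy of the MRM tree, as in the XML example. Setting the output message type properties for each Transaction Set is not shown.

```esql
-- Hypothetical element names: FunctionalGroup (GS/GE) and
-- TransactionSet (ST/SE). Adjust them to match your MRM model.
DECLARE GroupRef REFERENCE TO InMessageCopy;
DECLARE TxRef REFERENCE TO InMessageCopy;
DECLARE GroupNo INTEGER 0;
DECLARE TxNo INTEGER 0;

MOVE GroupRef FIRSTCHILD NAME 'FunctionalGroup';
WHILE LASTMOVE(GroupRef) DO
  -- Repoint TxRef first so it never refers to a deleted group
  MOVE TxRef TO GroupRef;
  -- Free the previous Functional Group, if there is one
  IF GroupNo >= 1 THEN
    DELETE PREVIOUSSIBLING OF GroupRef;
  END IF;
  SET GroupNo = GroupNo + 1;

  -- Walk the Transaction Sets of this group
  MOVE TxRef FIRSTCHILD NAME 'TransactionSet';
  SET TxNo = 0;
  WHILE LASTMOVE(TxRef) DO
    -- Free the previously propagated Transaction Set, if any
    IF TxNo >= 1 THEN
      DELETE PREVIOUSSIBLING OF TxRef;
    END IF;
    SET TxNo = TxNo + 1;
    CALL CopyMessageHeaders();
    SET OutputRoot.MRM = TxRef;
    PROPAGATE;
    MOVE TxRef NEXTSIBLING NAME 'TransactionSet';
  END WHILE;

  MOVE GroupRef NEXTSIBLING NAME 'FunctionalGroup';
END WHILE;
```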
In summary, by using this method you can avoid the common parsing bottleneck that occurs when you try to parse an entire outer message at one time.
Instead, you can propagate a steady stream of smaller messages along the message flow, where they can be processed. You can also delete already-processed nodes from the message tree to reduce the memory usage of your message flows.