Reducing memory usage in WebSphere Business Integration Message Broker

Splitting large messages using ESQL in WebSphere Business Integration Message Broker

Messages are routed and transformed by message flows within WebSphere Business Integration Message Broker. As a message passes through the message flow, it is manipulated by message processing nodes typically using a language called Extended Structured Query Language. The message is in the form of a message tree, which is created from the input message bitstream, and serialized into an output bitstream, by a parser. There are several parsers supplied by the broker, the most common being Message Repository Manager, XML and XML namespaces.

When you handle large messages within the WebSphere Business Integration Message Broker, memory used for storing the message tree and parsing time can become expensive if you use the wrong techniques.

This article provides a proven method by which you can quickly and efficiently parse a large message and propagate out smaller messages in a sequence. Memory usage is also addressed so that the minimum of physical resource is used during a parse. The method described is suitable for use with the Message Repository Manager, XML and XML namespaces parsers.

Dominic Storey (DSTOREY@uk.ibm.com), Software Engineer, WebSphere Message Broker Development team, IBM Hursley

Dominic Storey is a software engineer at IBM Hursley Lab in the United Kingdom. He currently works with the Message Broker Tooling development team. Previously he worked within the Message Broker development team and on the MQSeries development team. He has been employed by IBM since 1997.



25 May 2005

Introduction

Large messages normally contain many smaller individual messages. WebSphere Business Integration Message Broker is ideally suited to handle messages of this type, as the smaller messages can be parsed from within the larger message and propagated out individually. This means that the outer large message, which can be of the order of hundreds of megabytes, need not be fully parsed in one step. Parsing it fully would create a bottleneck that can consume large amounts of resource.

Instead, a "message pointer" can move through the larger message, building up the smaller messages, which can then be propagated out when they have been successfully parsed. Once a smaller message has been propagated out, the part of the message tree that was parsed for it in the larger message can be deleted. This keeps memory usage constant, and parsing can continue with the next message.

This technique relies on the fact that the Message Repository Manager (MRM), XML and XML namespaces (XMLNS) parsers provided by WebSphere Business Integration Message Broker are on-demand parsers, only parsing enough of the input message to satisfy the current request.

This article uses the Message processing technique described below to:

  • Exploit the on-demand parsing capability inherent in the broker parsers.
  • Exploit the ability to delete portions of the message tree that correspond to previously parsed sections of the input message.
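The idea of on-demand parsing can be illustrated outside the broker. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree.XMLPullParser rather than the broker's own parsers. The function name first_inner_message, the chunk size, and the sample data are all assumptions made for illustration. It hands the document to the parser a few bytes at a time and stops as soon as the first InnerMessage is complete, so most of the input is never parsed.

```python
import xml.etree.ElementTree as ET


def first_inner_message(data: bytes, chunk_size: int = 16):
    """Feed the document chunk by chunk and stop at the first InnerMessage.

    Returns the parsed element and the number of bytes that had been
    handed to the parser when the element became available.
    """
    parser = ET.XMLPullParser(events=("end",))
    consumed = 0
    while consumed < len(data):
        chunk = data[consumed:consumed + chunk_size]
        parser.feed(chunk)
        consumed += len(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "InnerMessage":
                return elem, consumed
    return None, consumed


# Hypothetical sample: an envelope holding 100 small inner messages.
body = b"".join(
    b"<InnerMessage><MessageBody>%d</MessageBody></InnerMessage>" % i
    for i in range(100)
)
document = b"<OuterMessage>" + body + b"</OuterMessage>"

first, consumed = first_inner_message(document)
print(first.find("MessageBody").text, consumed, len(document))
```

The number of bytes consumed depends on the chunk size and on the XML library's internal buffering, but it should be a small fraction of the whole document. This is only an analogy for the behavior of the MRM, XML and XMLNS parsers, not a description of how they are implemented.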

Simple message scenario using an XML-based model

Let's consider a generic case where large amounts of message data need to be processed in smaller, manageable "chunks" so that processing can continue in a timely manner. This allows an interactive processing model rather than a batch processing model. A common scenario that exhibits the large message problem, that is, outer "envelope" messages that contain many inner messages, is the X12 Electronic Data Interchange (EDI) message handling domain, http://www.x12.org/. See the Further examples section at the end of this article for a brief outline of this messaging standard and how these messages can be modeled in the MRM domain so that the technique described here can be used. Another applicable EDI messaging standard is Electronic Data Interchange for Administration, Commerce and Transport (EDIFACT), because of the large message sizes encountered in this standard. For this article, I will use a much simpler message scenario using an XML-based model.

XML instance document

Here is an XML instance document that demonstrates an outer or envelope message that contains multiple inner messages.

<OuterMessage>
  <InnerMessage>                      <!-- InnerMessage is repeatable -->
    <MessageBody>...</MessageBody>
  </InnerMessage>
  ...
</OuterMessage>

The OuterMessage has multiple children of InnerMessage.

Message processing technique

The example code snippets in this article relate to processing the XML messages defined above. The individual inner messages could be very large and memory usage during a parse can also be large.

The example code is written in Extended Structured Query Language (ESQL) and is executed by a Compute node. A Compute node takes as input a message tree (called InputRoot) and outputs a different message tree (called OutputRoot). Notice that InputRoot is read-only and therefore cannot be modified in place.

Figure 1. The message flow

Before we can start processing the outer message, we need to copy InputRoot into a tree that we can modify. This allows us to delete parts of the tree after we have successfully processed them, thus freeing memory. We can do this by copying the input tree, backed by the bitstream, to the environment.

-- Copy the input tree, backed by the bitstream, to the environment
-- Set a message pointer to this copied message tree
SET Environment.Variables.InputRoot = InputRoot.XML;
DECLARE InMessageCopy REFERENCE TO Environment.Variables.InputRoot;

Then we set a message pointer to the start of our input message ready for processing.

-- Shortcut to our copy of the input message
DECLARE InputMessage REFERENCE TO InMessageCopy;

All subsequent moves will then invoke the XML parser in partial parsing (or on-demand) mode. This means that the parser will only parse enough of the bitstream to fulfil the parsing requirement created by the ESQL statement.

For example, we could move the message pointer to the Outer message.

-- Move to the Outer Message
MOVE InputMessage FIRSTCHILD NAME 'OuterMessage';

We could then access this part of the tree using ESQL as it has been parsed. If we try to access part of the tree after this node, then the parser will be called again, so we want to do this only when required.

Building the output message

Now we can move through our input message and build up our output message as we go.

-- Declare a variable that holds the number of the Inner
-- Message we are currently processing. This is useful for reporting
-- the number of messages we have processed.
DECLARE Inner_No INTEGER 0;

-- Move onto the first Inner Message ready to begin processing them
MOVE InputMessage FIRSTCHILD NAME 'InnerMessage';

-- Loop while the last MOVE succeeded, that is, until we have moved
-- past the last child of the outer message
WHILE LASTMOVE(InputMessage)
DO
	CASE FIELDNAME(InputMessage)
		-- If we're on an Inner Message element, copy it to the
		-- output and propagate it
		WHEN 'InnerMessage' THEN
			-- Delete the previously processed inner message block,
			-- if there is one
			IF Inner_No >= 1 THEN
				DELETE PREVIOUSSIBLING OF InputMessage;
			END IF;

			-- Increment the Inner Message counter
			SET Inner_No = Inner_No + 1;

			-- Create output message headers using
			-- the original message headers
			CALL CopyMessageHeaders();

			-- Copy the Inner Message to the OutputRoot
			SET OutputRoot.XML.InnerMessage = InputMessage;

			-- Propagate the Inner message
			PROPAGATE;
	END CASE;
	MOVE InputMessage NEXTSIBLING;
END WHILE;

The act of propagating a message clears the OutputRoot automatically, so there is no need to reset an OutputRoot after propagation.

Memory usage here is kept to a minimum. The DELETE PREVIOUSSIBLING statement frees the memory that holds the parsed information for that node of the tree (in this case, the previous InnerMessage). This is necessary because a parsed message tree takes up far more space than the bitstream that represents it.
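The same propagate-then-delete pattern can be tried outside the broker. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree.iterparse as a stand-in for the broker parser. The function name split_outer_message and the propagate callback are hypothetical. Unlike the ESQL above, which deletes the previous sibling on the next pass through the loop, this sketch removes each inner message immediately after it has been handed on.

```python
import io
import xml.etree.ElementTree as ET


def split_outer_message(stream, propagate):
    """Pass each InnerMessage to propagate(), then drop it from the tree."""
    events = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(events)  # the "start" event of OuterMessage
    count = 0
    for event, elem in events:
        if event == "end" and elem.tag == "InnerMessage":
            count += 1
            propagate(ET.tostring(elem))
            # Counterpart of DELETE PREVIOUSSIBLING: release the subtree
            # that has already been sent on.
            root.remove(elem)
    # Report how many messages were sent and how many children remain.
    return count, len(root)


# Hypothetical sample envelope with two inner messages.
sample = io.BytesIO(
    b"<OuterMessage>"
    b"<InnerMessage><MessageBody>a</MessageBody></InnerMessage>"
    b"<InnerMessage><MessageBody>b</MessageBody></InnerMessage>"
    b"</OuterMessage>"
)
sent = []
count, remaining = split_outer_message(sample, sent.append)
print(count, remaining)
```

Because every processed InnerMessage is detached from the root, the tree does not keep growing as more of the input is read. Note that iterparse reads its input in blocks, so a few not-yet-processed children can be attached to the root at any moment.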


Further examples

This example shows a business messaging standard that you can handle using the technique in this article.

The X12 standard uses a single large message that envelops possibly several hundred smaller messages within Functional Groups (identified by tags GS and GE). Each Functional Group can also contain thousands of Transaction Set messages (identified by tags ST and SE).

Figure 2. Overview of the X12 model tags

The Outer Message here would be the ISA and IEA Interchange Control, and the Inner Message would be the ST and SE Transaction Sets.

Each ST and SE section represents an individual message that needs to be processed. Normally a Functional Group contains messages of a similar type that require similar processing.
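To make the envelope structure concrete, here is a minimal sketch in Python that groups a stream of X12 segments into ST to SE transaction sets, one at a time. It is not the MRM model referred to above. The function name transaction_sets is hypothetical, the sample interchange is made up and heavily abbreviated, and the "~" segment terminator and "*" element separator are assumed (a real interchange declares its own delimiters).

```python
def transaction_sets(segments, element_separator="*"):
    """Group a stream of X12 segments into ST..SE transaction sets."""
    current = None
    for segment in segments:
        tag = segment.split(element_separator, 1)[0]
        if tag == "ST":
            # Start of a new inner message
            current = [segment]
        elif current is not None:
            current.append(segment)
            if tag == "SE":
                # End of the inner message: hand it on and forget it
                yield current
                current = None


# Made-up, abbreviated interchange; real segments carry more elements.
interchange = (
    "ISA*00~GS*PO~ST*850*0001~BEG*00~SE*3*0001~"
    "ST*850*0002~BEG*00~SE*3*0002~GE*2~IEA*1~"
)
segments = (s for s in interchange.split("~") if s)
for number, message in enumerate(transaction_sets(segments), start=1):
    print(number, message)
```

Because the function is a generator over an iterable of segments, only the transaction set currently being assembled is held in memory, which mirrors the way the inner messages are propagated one by one in the message flow.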


Conclusion

By using this method, you can avoid the common parsing bottleneck that occurs when you try to resolve an entire outer message in one step.

Instead, you can propagate a steady stream of smaller messages along the message flow, where they can be processed. You can also delete nodes that have already been processed from the message tree to reduce the memory usage of your message flows.
