Handling large MRM messages

When an input bit stream is parsed and a logical tree is created, the tree representation of an MRM message is typically larger, and in some cases much larger, than the corresponding bit stream.

About this task

The reasons for this large size include:
  • The addition of the pointers that link the objects together
  • Translation of character data into Unicode, which can double the original size
  • The inclusion of field names, which can be implicit in the bit stream
  • The presence of control data that is associated with the integration node's operation
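The effect is not specific to MRM. As a rough analogue only (Python's ElementTree, not the integration node's parser), the following sketch compares the size of a small XML bit stream with an approximate in-memory size of its parsed tree. It also shows that re-encoding ASCII character data as a 2-byte Unicode encoding doubles its size. The sample document and the `rough_tree_size` helper are hypothetical, and the measured numbers vary between Python versions.

```python
import sys
import xml.etree.ElementTree as ET

def rough_tree_size(root: ET.Element) -> int:
    """Approximate in-memory size of a parsed tree (CPython, shallow sizes only)."""
    total = 0
    for elem in root.iter():
        total += sys.getsizeof(elem)           # node object, which links to its children
        total += sys.getsizeof(elem.tag)       # field name held by every node
        if elem.text is not None:
            total += sys.getsizeof(elem.text)  # character data held as a str object
    return total

# Hypothetical bit stream: element names and values are illustrative only.
bit_stream = b"<Invoice><Item><Quantity>2</Quantity><UnitPrice>1.50</UnitPrice></Item></Invoice>"
tree = ET.fromstring(bit_stream)

print("bit stream bytes:", len(bit_stream))
print("approximate tree bytes:", rough_tree_size(tree))

# Character data translated to a 2-byte Unicode encoding doubles in size.
text = "Invoice"
print(len(text.encode("ascii")), len(text.encode("utf-16-le")))  # prints: 7 14
```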

Manipulation of a large message tree can, therefore, demand a great deal of storage. If you design a message flow that handles large messages that are made up of repeating structures, you can code specific ESQL statements that help to reduce the storage load on the integration node. These statements support both random and sequential access to the message, but assume that you do not need access to the whole message at one time.

These ESQL statements cause the integration node to parse the message only partially, and to keep in storage only the part of the message tree that reflects a single record at a time. If your processing requires you to retain information from record to record (for example, to calculate a total price from a repeating structure of items in an order), you can either declare, initialize, and maintain ESQL variables, or save values in another part of the message tree, such as LocalEnvironment.
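The idea of keeping only the current record plus a small amount of carried-over state can be sketched in plain Python as an analogue (this is not ESQL). A generator stands in for the record-at-a-time parse, a local variable plays the role of a declared ESQL variable, and a dictionary named `local_environment` stands in for saving a value in LocalEnvironment. The function names, the comma-separated record layout, and the sample data are all hypothetical.

```python
from typing import Dict, Iterator, List

def parse_records(raw_records: List[str]) -> Iterator[Dict[str, float]]:
    """Yield one parsed record at a time (hypothetical 'UnitPrice,Quantity' lines)."""
    for line in raw_records:
        price, qty = line.split(",")
        yield {"UnitPrice": float(price), "Quantity": float(qty)}

def total_price(raw_records: List[str], local_environment: Dict[str, int]) -> float:
    """Sum UnitPrice * Quantity while holding only the current parsed record."""
    total = 0.0  # carried from record to record, like a declared ESQL variable
    count = 0
    for record in parse_records(raw_records):
        total += record["UnitPrice"] * record["Quantity"]
        count += 1
        # 'record' is rebound on the next iteration, so the previous parsed
        # record can be freed; only the raw input (the "bit stream") stays.
    local_environment["ItemCount"] = count  # value saved in a separate scratch area
    return total

env: Dict[str, int] = {}
print(total_price(["1.50,2", "4.00,1", "0.25,4"], env), env)  # prints: 8.0 {'ItemCount': 3}
```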

This technique reduces the memory that the integration node uses to the amount needed to hold the full input and output bit streams, plus the amount required for one record's trees. It provides memory savings even when the message contains only a few repeats. The integration node achieves this by using partial parsing, together with its ability to parse specified parts of the message tree to and from the corresponding part of the bit stream.
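To see why holding only one record's tree matters, the following Python sketch (an analogue, not integration node code) uses `tracemalloc` to measure peak allocations while summing a repeating structure in two ways: by parsing the whole document into a tree, and by streaming it with `xml.etree.ElementTree.iterparse` and discarding each record after use. The generated document and its element names are hypothetical, and absolute numbers depend on the Python version; only the relative difference is of interest.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET
from typing import Callable

def make_document(n: int) -> bytes:
    """Build a hypothetical message with n repeating <Item> records."""
    items = "".join(
        f"<Item><UnitPrice>{i % 7}.5</UnitPrice><Quantity>2</Quantity></Item>"
        for i in range(n)
    )
    return f"<AllItems>{items}</AllItems>".encode("ascii")

def total_full_tree(data: bytes) -> float:
    root = ET.fromstring(data)  # the whole tree is built before any processing
    return sum(float(i.findtext("UnitPrice")) * float(i.findtext("Quantity"))
               for i in root.iter("Item"))

def total_streaming(data: bytes) -> float:
    total = 0.0
    context = ET.iterparse(io.BytesIO(data), events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Item":
            total += float(elem.findtext("UnitPrice")) * float(elem.findtext("Quantity"))
            root.clear()  # detach finished records so they can be freed
    return total

def peak_bytes(func: Callable[[bytes], float], data: bytes) -> int:
    tracemalloc.start()
    func(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

doc = make_document(20_000)
print("full tree peak:", peak_bytes(total_full_tree, doc))
print("streaming peak:", peak_bytes(total_streaming, doc))
```

Both functions return the same total; the difference is that the streaming version never holds more than a small batch of records as tree nodes, which mirrors the single-record behavior described above.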

To use these techniques in your Compute node, apply the following general steps:
  • Copy the body of the input message as a bit stream to a special folder in the output message. This creates a modifiable copy of the input message that is not parsed and therefore uses a minimum amount of memory.
  • Avoid any inspection of the input message; this avoids the need to parse the message.
  • Use a loop and a reference variable to step through the message one record at a time. For each record:
    • Use normal transforms to build a corresponding output subtree in a second special folder.
    • Use the ASBITSTREAM function to generate a bit stream for the output subtree. Store the bit stream in a BitStream element that is placed at the position in the tree that corresponds to its required position in the final bit stream.
    • Use the DELETE statement to delete both the current input record tree and the current output record tree when you have finished manipulating them.
  • When you have processed all the records, detach or delete the special folders so that they do not appear in the output bit stream.
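The same sequence of steps can be sketched outside the integration node. The following Python analogue uses `xml.etree.ElementTree.iterparse` for record-at-a-time parsing, `ET.tostring` in the role that ASBITSTREAM plays in ESQL, and a list of byte strings as the output bit stream. It is a sketch under these assumptions, not the integration node's implementation. The `invoices_to_statements` function and the trimmed Invoice layout are hypothetical; the Customer/FirstName path and the Statement/Customer/Name output mirror the field names in the ESQL example in this topic.

```python
import io
import xml.etree.ElementTree as ET
from typing import List

def invoices_to_statements(input_bit_stream: bytes) -> bytes:
    """Transform <AllInvoices> into <Data>, holding one record's trees at a time."""
    output_parts: List[bytes] = [b"<Data>"]  # the output bit stream, built in order
    # The input bytes are never parsed as a whole; iterparse builds nodes lazily.
    context = ET.iterparse(io.BytesIO(input_bit_stream), events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event != "end" or elem.tag != "Invoice":
            continue
        # Build the output subtree for this one record.
        statement = ET.Element("Statement")
        customer = ET.SubElement(statement, "Customer")
        ET.SubElement(customer, "Name").text = elem.findtext("Customer/FirstName")
        # Serialize the subtree and place the bytes at the position
        # they occupy in the final bit stream.
        output_parts.append(ET.tostring(statement))
        # Drop both record trees before moving to the next record.
        del statement, customer
        root.clear()
    output_parts.append(b"</Data>")
    return b"".join(output_parts)

sample = (b"<AllInvoices>"
          b"<Invoice><Customer><FirstName>Ann</FirstName></Customer></Invoice>"
          b"<Invoice><Customer><FirstName>Bob</FirstName></Customer></Invoice>"
          b"</AllInvoices>")
print(invoices_to_statements(sample))
```

Because the output is assembled from already-serialized pieces, no complete output tree ever exists, which corresponds to storing each Statement as a BitStream element.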

You can vary these techniques to suit the processing that is required for your messages. The following ESQL provides an example of one implementation.

The ESQL depends on a message set called LargeMessageExample that defines messages for both the Invoice input format and the Statement output format. A message called AllInvoices contains a global element called Invoice that can repeat one or more times, and a message called Data contains a global element called Statement that can repeat one or more times.

The elements and attributes are defined with the correct data types; therefore, the CAST functions that the ESQL in the XML example uses are not required. An XML physical format named XML1 has been created in the message set; it allows the MRM parser to parse an XML message that corresponds to these messages.

When the Statement tree is serialized by the ASBITSTREAM function, the Message Set, Message Type, and Message Format are specified as parameters. The Message Type parameter contains the path from the message to the element that is being serialized; in this case, the path is Data/Statement because the Statement element is a direct child of the Data message.

The input message to the flow is the same Invoice example message that is used in other parts of the documentation, except that it is enclosed in the following tags:
       <AllInvoices> ....  </AllInvoices>  

Example

CREATE COMPUTE MODULE LargeMessageExampleFlow_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    CALL CopyMessageHeaders();
    -- Create a special folder in the output message to hold the input tree
    -- Note: SourceMessageTree is the root element of an MRM parser
    CREATE LASTCHILD OF OutputRoot.MRM DOMAIN 'MRM' NAME 'SourceMessageTree';

    -- Copy the input message to a special folder in the output message
    -- Note: This is a root-to-root copy, so it does not build trees
    SET OutputRoot.MRM.SourceMessageTree = InputRoot.MRM;

    -- Create a special folder in the output message to hold the output tree
    CREATE FIELD OutputRoot.MRM.TargetMessageTree;

    -- Prepare to loop through the invoices
    DECLARE sourceCursor REFERENCE TO OutputRoot.MRM.SourceMessageTree.Invoice;
    DECLARE targetCursor REFERENCE TO OutputRoot.MRM.TargetMessageTree;
    DECLARE resultCursor REFERENCE TO OutputRoot.MRM;
    DECLARE grandTotal   FLOAT     0.0e0;

    -- Create a block so that it is easy to abandon processing
    ProcessInvoice: BEGIN
      -- If there are no Invoices in the input message, there is nothing to do
      IF NOT LASTMOVE(sourceCursor) THEN
        LEAVE ProcessInvoice;
      END IF;

      -- Loop through the invoices in the source tree
      InvoiceLoop: LOOP
        -- Inspect the current invoice and create a matching Statement
        SET targetCursor.Statement =
          THE (
            SELECT
              'Monthly'                     AS Type,
              'Full'                        AS Style,
              I.Customer.FirstName          AS Customer.Name,
              I.Customer.LastName           AS Customer.Surname,
              I.Customer.Title              AS Customer.Title,
              (SELECT
                 FIELDVALUE(II.Title)       AS Title,
                 II.UnitPrice * 1.6         AS Cost,
                 II.Quantity                AS Qty
               FROM I.Purchases.Item[] AS II
               WHERE II.UnitPrice > 0.0)    AS Purchases.Article[],
              (SELECT
                 SUM(II.UnitPrice *
                     II.Quantity  *
                     1.6)
               FROM I.Purchases.Item[] AS II) AS Amount,
              'Dollars'                     AS Amount.Currency
            FROM sourceCursor AS I
            WHERE I.Customer.LastName <> 'White'
          );

        -- Turn the current Statement into a bit stream
        -- The SET parameter is set to the name of the message set
        -- that contains the MRM definition
        -- The TYPE parameter contains the path from the message
        -- to the element being serialized
        -- The FORMAT parameter contains the name of the physical format
        -- defined in the message set
        DECLARE StatementBitStream BLOB
          ASBITSTREAM(targetCursor.Statement
            OPTIONS FolderBitStream
            SET 'LargeMessageExample'
            TYPE 'Data/Statement'
            FORMAT 'XML1');

        -- If the SELECT produced a result (that is, it was not filtered
        -- out by the WHERE clause), process the Statement
        IF StatementBitStream IS NOT NULL THEN
          -- Create a field to hold the bit stream in the result tree
          -- The type of the element is set to MRM.BitStream to indicate
          -- to the MRM parser that this is a bit stream
          CREATE LASTCHILD OF resultCursor
            TYPE  MRM.BitStream
            NAME  'Statement'
            VALUE StatementBitStream;

          -- Add the current Statement's Amount to the grand total
          SET grandTotal = grandTotal + targetCursor.Statement.Amount;
        END IF;

        -- Delete the real Statement tree, leaving only the bit stream version
        DELETE FIELD targetCursor.Statement;

        -- Step onto the next Invoice, removing the previous invoice and any
        -- text elements that might have been interspersed with the Invoices
        REPEAT
          MOVE sourceCursor NEXTSIBLING;
          DELETE PREVIOUSSIBLING OF sourceCursor;
        UNTIL (FIELDNAME(sourceCursor) = 'Invoice')
           OR (LASTMOVE(sourceCursor) = FALSE)
        END REPEAT;

        -- If there are no more invoices to process, abandon the loop
        IF NOT LASTMOVE(sourceCursor) THEN
          LEAVE InvoiceLoop;
        END IF;

      END LOOP InvoiceLoop;
    END ProcessInvoice;

    -- Remove the temporary source and target folders
    DELETE FIELD OutputRoot.MRM.SourceMessageTree;
    DELETE FIELD OutputRoot.MRM.TargetMessageTree;

    -- Finally, add the grand total
    SET resultCursor.GrandTotal = grandTotal;

    -- Set the output MessageType property to 'Data'
    SET OutputRoot.Properties.MessageType = 'Data';

    RETURN TRUE;
  END;

  CREATE PROCEDURE CopyMessageHeaders() BEGIN
    DECLARE I INTEGER 1;
    DECLARE J INTEGER CARDINALITY(InputRoot.*[]);
    WHILE I < J DO
      SET OutputRoot.*[I] = InputRoot.*[I];
      SET I = I + 1;
    END WHILE;
  END;

END MODULE;
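To make the transformation in the example easier to check, the following Python sketch restates the per-invoice logic of the SELECT as an analogue for illustration only (it is not ESQL). Invoices whose customer's last name is White produce no Statement; only items with a positive UnitPrice become Articles; Amount sums UnitPrice * Quantity * 1.6 over all items; and the grand total adds the Amount of each Statement that is produced. The dataclasses, function names, and sample values are hypothetical, and the sketch does not model ESQL NULL handling or MRM serialization.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class Item:
    title: str
    unit_price: float
    quantity: int

@dataclass
class Invoice:
    first_name: str
    last_name: str
    items: List[Item]

MARKUP = 1.6  # the factor applied to UnitPrice in the ESQL SELECT

def make_statement(invoice: Invoice) -> Optional[dict]:
    """Mirror the SELECT; return None when the WHERE clause filters the invoice out."""
    if invoice.last_name == "White":
        return None
    articles = [
        {"Title": it.title, "Cost": it.unit_price * MARKUP, "Qty": it.quantity}
        for it in invoice.items
        if it.unit_price > 0.0  # only positive-price items become Articles
    ]
    # Amount is computed over all items, not only the ones kept as Articles.
    amount = sum(it.unit_price * it.quantity * MARKUP for it in invoice.items)
    return {"Name": invoice.first_name, "Surname": invoice.last_name,
            "Articles": articles, "Amount": amount, "Currency": "Dollars"}

def grand_total(invoices: Iterable[Invoice]) -> float:
    total = 0.0
    for inv in invoices:  # one invoice at a time; a generator input also works
        statement = make_statement(inv)
        if statement is not None:  # like IF StatementBitStream IS NOT NULL
            total += statement["Amount"]
    return total

invoices = [
    Invoice("Ann", "Smith", [Item("Book", 2.5, 2), Item("Free gift", 0.0, 1)]),
    Invoice("Bob", "White", [Item("Pen", 1.0, 3)]),
]
print(grand_total(invoices))
```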