Handling large files with WebSphere Transformation Extender
A common middleware requirement in financial institutions is the ability to receive a batch of transaction information from a file input. These files, or bundles, vary in size and can become very large, containing several megabytes or even gigabytes of data. The term transaction can refer either to a financial transaction or to a messaging unit of work, so for the remainder of this article we'll refer to each unit of data within the inbound file as a record. Each record can usually be parsed, validated, and transformed independently, but for cross-validation (for example, invoice subtotals, checksum calculations, and so on), it's useful to preserve the order of the records from the original file.
This article describes how you can split an inbound file into constituent records using the WebSphere Transformation Extender (hereafter called Transformation Extender) WebSphere MQ adapter. The Transformation Extender map does the following:
- Receives an XML file
- Decomposes the file into separate records
- Creates separate WebSphere MQ messages, each with an XML format body
- Formats the MQ messages as members of a single MQ message group
- Puts the MQ messages on an output queue
The basic concept covered in this scenario could be extended to handle different kinds of XML formats, which could include repeating sections that contain several subelements and could also start the repeats at different nested levels within the input XML document.
Create the Type tree
A Transformation Extender Type tree provides metadata that describes the structure of data, both inbound and outbound, to a Transformation Extender map. Each map's input card(s) and output card(s) has an associated Type tree and type. The scenario in this article requires two Transformation Extender Type trees. You can generate a Type tree by importing an XML schema into the Type Designer.
One of the Type trees (mq.mtt) is supplied as a sample in the download package provided with this article. We'll generate the other Type tree (Schema1.mtt) by importing an XML schema.
To create the Type tree, complete the following steps:
- From the Windows Start menu, select IBM WebSphere Transformation Extender 8.1 => Design Studio => Type Designer.
- In the Type Designer, select Tree => Import, as shown in Figure 1:
Figure 1. Select Tree => Import
- In the Importer Wizard, select Import from XML Schema, as shown in Figure 2, and click Next. If you have installed other industry packs, such as the SAP Pack or the Web Services Pack, you may see additional import options in this list.
Figure 2. Select Import from XML Schema
- Select the file Schema1.xsd, which is provided with the attached download, and click Next, as shown in Figure 3:
Figure 3. Select Schema1.xsd
- In the next dialog, you'll be prompted to choose the national language for the execution time data. Select Western (the default), and click Next.
- In the next dialog, leave the default setting for File Name. Make sure that the path specified is the location where you unzipped the download package included with this article. Ensure that Xerces is selected for Validation, as shown in Figure 4, then click Next.
Figure 4. Select validation method
- You should see a Type tree representation of the XML schema, with no errors or warnings, as shown in Figure 5. Click Finish to complete the import.
Figure 5. Completed Type tree
- You'll be prompted to open the generated Type tree. Click Yes.
The Type tree you just generated describes the format of the data that the Transformation Extender map is designed to receive from an inbound file. You'll find an example input file, Input.xml, in the download package, as shown below:
<?xml version="1.0"?>
<Chunk>
  <Item Att1="1" Att2="2" Att3="3" Att4="4" Att5="5"/>
  <Item Att1="A" Att2="B" Att3="C" Att4="D" Att5="E"/>
  <Item Att1="one" Att2="two" Att3="three" Att4="four" Att5="five"/>
</Chunk>
The XML element called Item can occur an unlimited number of times. The first file we'll use to test the map function contains only three repeats, but Transformation Extender can support a file size up to 2 GB.
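To make the decomposition concrete outside Transformation Extender, here is a minimal Python sketch that streams a Chunk document and emits each Item as its own small XML string. It is only an illustration of the idea, not how the map itself works; the function name split_items is hypothetical, and the sample data is the Input.xml content shown above.

```python
import io
import xml.etree.ElementTree as ET


def split_items(source, tag="Item"):
    """Yield each repeating element as its own small XML string."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            elem.tail = None  # drop the whitespace that followed the element
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # release the element's attributes and children


# Sample data copied from Input.xml in the article.
sample = b"""<?xml version="1.0"?>
<Chunk>
  <Item Att1="1" Att2="2" Att3="3" Att4="4" Att5="5"/>
  <Item Att1="A" Att2="B" Att3="C" Att4="D" Att5="E"/>
  <Item Att1="one" Att2="two" Att3="three" Att4="four" Att5="five"/>
</Chunk>"""

records = list(split_items(io.BytesIO(sample)))
print(len(records))  # 3
```

Because iterparse reads the input incrementally, the records are produced in their original file order, which is the property the message group relies on later.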
While we still have the Type Designer open, let's examine the second Type tree that is used in the scenario:
- Select File => Open and select the mq.mtt file, provided in the download package, then click Open, as shown in Figure 6:
Figure 6. Select mq.mtt
- Take a look at the Type tree, shown in Figure 7:
Figure 7. mq.mtt Type tree
This Type tree is very similar to the Type tree shipped with the Transformation Extender samples for the WebSphere MQ adapter, but we've extended it for this scenario to include an item named Count. Count is used by two input cards of the functional map in the scenario to carry data about the number of repeats in the inbound file, and the index of the current iteration. These values are required when setting values in the MQMD header of the output message. We'll discuss this in more detail later in the article.
Configure the map
Now that you've prepared the Type tree metadata, you can take a look at the configuration of the Transformation Extender map, which receives the XML file as input, and generates separate MQ messages as output. To do this, complete the following steps:
- From the Windows Start menu, select IBM WebSphere Transformation Extender 8.1 => Design Studio => Map Designer.
- In Map Designer, select File => Open, then select XMLMap.mms, the map file supplied with the download package, and click Open. The map is displayed in the editor, as shown in Figure 8:
Figure 8. Map file in editor
The overall map contains one input card (InCard), and three output cards (OutCard_MQMDNormGroupMember, OutCard_MQMDLastGroupMember, and OutCard_GenerateMessages). The purpose of the first two output cards is to represent two versions of an MQMD header. The third output card generates the output MQ messages by invoking the functional map F_CreateOutputMessage. In programming terms, a functional map is like a subroutine; it maps a portion of data at a time. A functional map can be used like a function, taking one or more input objects and generating one output object. In this example, the functional map is called in order to generate each output MQ message.
- Each map's input card nominates Type tree, Type and Source information. This is the minimal configuration required in order to run a map; however, you can set many additional options on each card to change the map's behavior. Each map's output card nominates Type tree, Type and Target information. Right-click a card and select Edit in order to view and change these properties, as shown in Figure 9. If you unzipped the download package to a location other than C:\, you need to edit the settings for the Type tree and FilePath values to point to the correct location.
Figure 9. Edit Input Card
- The following two tables show the settings for each of the map's cards.
For the main executable map, Map:
| CardName | TypeTree | Type | FilePath |
|---|---|---|---|
| InCard | Schema1.mtt | Chunk Element XSD | C:\SampleFiles\Input.xml |
| OutCard_MQMDNormGroupMember | mq.mtt | WithMD2 Generic Message MQSeries | C:\SampleFiles\MQMD1.txt |
| OutCard_MQMDLastGroupMember | mq.mtt | WithMD2 Generic Message MQSeries | C:\SampleFiles\MQMD2.txt |
| OutCard_GenerateMessages | mq.mtt | WithMD2 Generic Message MQSeries | C:\SampleFiles\Output.xml |
For the functional map, F_CreateOutputMessage:
| CardName | TypeTree | Type | FilePath |
|---|---|---|---|
| In1 | Schema1.mtt | Item Element XSD | <BLANK> |
| In2 | mq.mtt | WithMD2 Generic Message MQSeries | <BLANK> |
| In3 | mq.mtt | WithMD2 Generic Message MQSeries | <BLANK> |
| IndexNumber | mq.mtt | MsgSeqNumber MQLONG Field MQSeries | <BLANK> |
| CountNumber | mq.mtt | Count MQSeries | <BLANK> |
| Out | mq.mtt | TextItem MQSeries | <BLANK> |
Check that the card properties for each input card and output card of the map and the functional map match the entries in these tables. Note that the FilePath properties for the input and output cards of the functional map are intentionally blank, because they're not needed: the data is returned to the calling map and does not need to be written to a file or other output. Note also that each output card of the main map has its target set to File. These output files are generated only for debugging, to demonstrate the functional purpose of each card in the map. In production, where there is no need to write these output files, you can change the File => Transaction On Success setting to !Create. This setting is preferred to assigning a target of Sink, which can, in some circumstances, degrade performance.
Investigate the map in detail
When dealing with grouped messages in MQ, the MQMD header contains three separate fields that require values in order to help associate distinct physical MQ messages as members of the same logical group:
The GroupId field is a byte string that identifies the particular message group or logical message to which the physical message belongs. GroupId is also used if segmentation is allowed for the message. When a message is too big for an MQ queue, an attempt to put the message on the queue will usually fail. Segmentation is a technique in which the queue manager or application splits the message into smaller pieces, called segments, and places each segment on the queue as a separate physical message. Our scenario does not require message segmentation, so we won't discuss this MQ feature further.
The MsgFlags field is divided between two uses: denoting segmentation and status. In other words, the MsgFlags field of a physical message indicates whether it belongs to a message group, is a segment of a logical message, both, or neither. Our scenario does not require message segmentation, but we do use the MsgFlags property to record each outbound message's group status. MQ defines two constant values that are used in the MsgFlags field to determine whether the message is a normal member of a message group (MQMF_MSG_IN_GROUP=8), or the very last member of the message group (MQMF_LAST_MSG_IN_GROUP=16).
The MsgSeqNumber field contains the sequence number of a logical message within an MQ message group. Sequence numbers start at 1, and increase by 1 for each new logical message in the group, up to a maximum of 999 999 999. A physical message that is not in a group has a sequence number of 1.
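The way these three fields work together can be sketched in a few lines of Python. This is a hedged illustration rather than anything taken from the map: the GroupFields class, the group_fields function, and the sample GroupId value are hypothetical, and the 24-byte padding assumes the usual MQBYTE24 length of GroupId. The two flag values are the ones quoted above.

```python
from dataclasses import dataclass

# Flag values as quoted in the text above.
MQMF_MSG_IN_GROUP = 8
MQMF_LAST_MSG_IN_GROUP = 16
MAX_MSG_SEQ_NUMBER = 999_999_999


@dataclass(frozen=True)
class GroupFields:
    group_id: bytes
    msg_seq_number: int
    msg_flags: int


def group_fields(group_id, count):
    """Yield the three group-related MQMD values for each of `count` records."""
    if not 1 <= count <= MAX_MSG_SEQ_NUMBER:
        raise ValueError("count must be between 1 and 999 999 999")
    for seq in range(1, count + 1):
        # Only the final record carries the last-member flag.
        flags = MQMF_LAST_MSG_IN_GROUP if seq == count else MQMF_MSG_IN_GROUP
        yield GroupFields(group_id, seq, flags)


# Assumption: GroupId padded to 24 bytes; the value itself is made up.
fields = list(group_fields(b"BATCH-0001".ljust(24, b"\x00"), 3))
print([(f.msg_seq_number, f.msg_flags) for f in fields])
# [(1, 8), (2, 8), (3, 16)]
```

Every record shares one GroupId, the sequence number follows the record's position in the file, and the flag value changes only for the final record.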
As previously stated, the purpose of the first two output cards is to represent two versions of an MQMD. Selecting output card 1 (OutCard_MQMDNormGroupMember) displays a series of fixed values that are set for fields of the MQMD header in circumstances where the message is not the last member of a group, as shown in Figure 10.
Figure 10. OutCard_MQMDNormGroupMember values when not last
Note in particular the settings for the GroupId, MsgSeqNumber and MsgFlags properties. The MsgSeqNumber value (statically set to 1) is replaced in the final output message by the relevant iteration number, when the functional map is called from output card 3. The MsgFlags value (statically set to 8) specifies that the message is a member of an MQ message group.
Selecting output card 2 (OutCard_MQMDLastGroupMember) displays a series of fixed values that are set for fields of the MQMD header in circumstances where the message is the last member of a group, as shown in Figure 11:
Figure 11. OutCard_MQMDLastGroupMember values when last
Note in particular the settings for the GroupId, MsgSeqNumber and MsgFlags properties. The MsgSeqNumber value (statically set to 1) is replaced in the final output message by the relevant iteration number, when the functional map is called from output card 3. The MsgFlags value (statically set to 16) specifies that the message is the last member of an MQ message group.
Selecting output card 3 (OutCard_GenerateMessages), then clicking on the row called TextItem(s), displays the invocation of the functional map, as shown in Figure 12:
Figure 12. Functional map invocation
The functional map F_CreateOutputMessage takes five arguments. These arguments are mapped in the same order to the functional map's five input cards. The functional map is passed references to the input card InCard, which represents the inbound file, and also the first two output cards (which have already been built by the time the third output card is evaluated) representing the MQMD structures which we've already discussed. The final two parameters that are passed to the functional map are an index number and a count number. The index specifies the number of the current iteration through the repeating elements and the count specifies the total number of repeating elements that were present in the original input file. These final two values are used in an IF clause, which determines which of the two MQMD structures needs to be used by the PUT rule to the MQ adapter when writing output messages to the final output queue.
The functional map (F_CreateOutputMessage) looks like this:
Figure 13. Functional map
Depending on the evaluation of the IF statement discussed above, the functional map executes one of the two PUT rules to the target Transformation Extender MQ server adapter. The first argument of the PUT rules is the value MQS, which specifies that the data should be routed to the MQ server adapter. The next two arguments specify a Queue Manager Name (QM1) and a Queue Name (WTX.OUT) where the messages are to be written. The final argument passed to the PUT rule is made by concatenating the output from several functions. The resulting argument provides the data that the MQ adapter uses to form the output MQ message. The purpose of the LEFT and RIGHT functions is to insert the relevant MsgSeqNumber in the correct position in the MQMD header. The MsgSeqNumber field itself is 4 bytes long, and occurs at an offset of 348 bytes into the MQMD, as shown in Figure 14:
Figure 14. MsgSeqNumber field
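The splice that LEFT and RIGHT perform can be mimicked with byte slicing. The sketch below is an analogy in Python, not the map rule itself. It assumes a binary version-2 MQMD of 364 bytes and little-endian integers (typical of the Windows platform used in this article); both are assumptions, as are the function name set_msg_seq_number and the all-zero placeholder header. The offset and field length come from the text.

```python
MSG_SEQ_OFFSET = 348  # offset of MsgSeqNumber, as stated in the text
MSG_SEQ_LENGTH = 4    # length of the field, as stated in the text
MQMD_LENGTH = 364     # assumption: size of a version-2 MQMD


def set_msg_seq_number(mqmd, seq, byteorder="little"):
    """Return a copy of `mqmd` with the MsgSeqNumber bytes replaced by `seq`."""
    if len(mqmd) != MQMD_LENGTH:
        raise ValueError("expected a 364-byte MQMD")
    left = mqmd[:MSG_SEQ_OFFSET]                    # plays the role of LEFT
    right = mqmd[MSG_SEQ_OFFSET + MSG_SEQ_LENGTH:]  # plays the role of RIGHT
    return left + seq.to_bytes(MSG_SEQ_LENGTH, byteorder) + right


# Placeholder header: all zero bytes, with a static MsgSeqNumber of 1.
template = bytearray(MQMD_LENGTH)
template[348:352] = (1).to_bytes(4, "little")

patched = set_msg_seq_number(bytes(template), 3)
print(int.from_bytes(patched[348:352], "little"))  # 3
```

Everything before offset 348 and the 12 bytes after the field pass through unchanged, so the static header built by the output card only needs this one field overwritten for each message.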
Compile and test the map
Having investigated the map source, you can now use the Map Designer to compile the map:
- To compile the map, click the shortcut button (highlighted with the red circle in Figure 15):
Figure 15. Compile the map
- In order to test the map, you need to have installed WebSphere MQ and created a queue manager and local queue. If you've followed the naming convention in the sample files supplied with this article, create a queue manager named QM1 and a local queue named WTX.OUT as follows. The commands given here are the standard WebSphere MQ control and MQSC commands for those names:
- To create the queue manager, enter the following from a command prompt: crtmqm QM1
- To start the queue manager, enter the following from a command prompt: strmqm QM1
- To start a runmqsc session, enter the following from a command prompt: runmqsc QM1
- To define the local queue, in the runmqsc session, enter: DEFINE QLOCAL(WTX.OUT)
- Once the map is compiled, you can run it for testing purposes from within Map Designer. To run the map, click the shortcut button (highlighted with the red circle in Figure 16):
Figure 16. Run the map
- Once the map has completed successfully, you'll see the following message:
Figure 17. Command server message
- Once the map has completed successfully, examine the output messages on the WebSphere MQ queue named WTX.OUT. There should be three output messages on the queue. The samples directory contains a small graphical application, RFHUtil, that you can use to view the output messages. You can also download this application from Message Broker SupportPac IH03.
Start RFHUtil.exe, and specify the Queue Manager Name and Queue Name, as shown in Figure 18:
Figure 18. Specify queue information
- Click Read Q to take the first message from the queue, then switch to the MQMD tab, to display the MQ message's MQMD header:
Figure 19. MQMD header
Note that the values have been set according to the values coded in the Map Designer. In particular, note that the GroupId has been set correctly, and that the first message read from the queue has a MsgSeqNumber with a value of 1. The MsgFlags property indicates that the first message is a member of a group, so the Yes box in the Group property has been checked.
- Switch back to the Main tab, and click Read Q until you have taken all the messages from the queue. Examine the MQMD header of the final message in the group and you should see that, because of the IF statement in the Transformation Extender map, the MsgFlags property of the final message in the group has been set accordingly. For this reason the Last box has been checked, as shown in Figure 20:
Figure 20. MQMD header - final message
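The manual check in RFHUtil can also be expressed as a small consistency test. The following Python sketch assumes you have already collected the (MsgSeqNumber, MsgFlags) pair of each message in the order it was read from the queue; how you read them is outside its scope, and the function name is_complete_group is hypothetical.

```python
MQMF_MSG_IN_GROUP = 8
MQMF_LAST_MSG_IN_GROUP = 16


def is_complete_group(headers):
    """Return True if (MsgSeqNumber, MsgFlags) pairs form one whole group."""
    if not headers:
        return False
    last_position = len(headers)
    for position, (seq, flags) in enumerate(headers, start=1):
        if seq != position:
            return False  # gap, duplicate, or out-of-order message
        is_last = bool(flags & MQMF_LAST_MSG_IN_GROUP)
        if is_last != (position == last_position):
            return False  # last-member flag missing or set too early
        if not is_last and not flags & MQMF_MSG_IN_GROUP:
            return False  # ordinary member without the group flag
    return True


print(is_complete_group([(1, 8), (2, 8), (3, 16)]))  # True
print(is_complete_group([(1, 8), (2, 8), (3, 8)]))   # False
```

For the three-record sample, the first call reflects what Figures 19 and 20 show: sequence numbers 1 to 3, with only the final message marked as last.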
Feel free to experiment with files of different formats and sizes. The download package includes another sample input file named Input6000Items.xml that you can use to test the map. Before testing, make sure that the maximum queue depth of the WTX.OUT queue is large enough to accept 6000 messages.
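If you want test files of other sizes, a short script can generate them. This Python sketch builds a Chunk document with a chosen number of Item elements; it assumes the schema accepts arbitrary string values for Att1 to Att5 (the sample mixes digits, letters, and words), and the function name build_chunk and the generated attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_chunk(item_count):
    """Return a Chunk document containing `item_count` Item elements."""
    root = ET.Element("Chunk")
    for i in range(1, item_count + 1):
        # Att1..Att5 mirror the sample; the values are arbitrary test data.
        attrs = {f"Att{n}": f"{i}-{n}" for n in range(1, 6)}
        ET.SubElement(root, "Item", attrs)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


document = build_chunk(6000)
print(document.count("<Item "))  # 6000
```

Write the returned string to a file and point the FilePath of InCard at it to run the map against the generated data.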
This article has shown you how to use a Transformation Extender map to split incoming files into separate MQ messages, so that they can more easily be processed by a consuming application, using the MQ API, or even another IBM software product such as WebSphere Message Broker. This technique will be especially valuable for those of you interested in conducting batch file operations. You can easily adapt this scenario to the financial sector, for example, to help you handle batches of invoices or SEPA (Single Euro Payments Area) transactions.
- WebSphere MQ product information: Find out more about WebSphere MQ features and specifications.
- WebSphere MQ product library: Get complete product documentation.
- WebSphere Transformation Extender product information: Find out more about WebSphere Transformation Extender features and specifications.
- WebSphere Transformation Extender product library: Get complete product documentation.
- File processing options with WebSphere Message Broker File Extender: Learn about other file processing options available with WebSphere Message Broker File Extender.
- Message Broker SupportPac IH03: WebSphere Message Broker V6 message display, test and performance utilities, including RFHUtil, which is used to test the scenario in this article.