How many times have you witnessed this scenario?
Developer: "So the quantity is always stored in the OQTY field?"
Administrator: "That's right. Just pick it up from there."
Developer: (Later) "Could you help me out? These lines have no quantity?"
Administrator: "Oh, yeah! That one is an exception. It comes from the factory and they use a different system."
Welcome to the marvelous world of data mapping, where every rule has an exception. This article is part of the lightweight XML client series, in the Working XML column. Previous installments have focused on Java programming, the APIs, and the tools available to build such a client. While the last article introduced a first version of the lightweight client -- in which the client uses stylesheets and XI to prepare and send the XML transactions -- in this article, I'll show you how to write the stylesheets.
In case you missed the previous articles in this series, let me emphasize again the specific bias of this project: It's about a simplistic solution to very simple needs. You can find many excellent high-end XML servers -- both commercial (such as IBM WebSphere) and open source. The products in this class excel at processing a large volume of XML transactions for e-business or enterprise application integration (EAI).
This project aims to complement the servers with a low-key client for B2B e-business. Increasingly, B2B e-business takes the form of XML transactions -- ordering goods usually involves sending an XML message to the XML server of a supplier. The supplier delivers the goods (at least, that is the hope), but she may also reply with a few XML messages, such as delivery information, order status, and ultimately an invoice.
To deploy such a system, it should be obvious that the two parties (buyer and supplier) must be able to prepare and process XML transactions. How do they do that? Well, by installing an XML server at both locations, of course.
Wrong answer!
Not every organization is capable of installing and maintaining an XML server, even if they would benefit from supporting B2B e-business. Simply put, the maintenance of a server is too complex and too costly unless the company expects to process a significant volume of transactions. Take the following scenarios:
- I started my career in the insurance business. Most independent brokers (in Belgium, at least) are small businesses that cannot afford to run complex software. Yet they benefit enormously from being able to process new contracts and claims electronically. It makes them more competitive (faster service, better answers, new products), and it lowers the cost for the insurance company, which translates into higher margins or more leeway when negotiating prices.
- The same pattern holds for many other industries: In the financial sector, real estate, newsstands, independent bookstores, legal services, health care, and on and on. Many industries face the issue of deploying B2B e-business to independent professionals.
- This applies to e-government as well. Governments need to collect lots of information, but the agencies and businesses responsible for providing this information may not be able to process XML transactions.
- This scenario happens in the stereotypical e-business example -- the buyer/supplier relationship. One of the two partners may be a small business that cannot afford to run complex XML software. Often, companies that deploy a B2B e-business strategy find that they can quickly sign around 20 partners and then the efforts stall because none of their other partners can commit to a complex solution.
Essentially, whenever you try to install a B2B e-business solution in an environment where the partners are of different sizes, you face the issue of finding appropriate tools for the smaller partners. (As I have already explained, finding high-end tools for the larger partners is comparatively simple.) This issue is what ultimately justifies my interest in SOAP. Earlier remote procedure call (RPC) standards were based on a programming model where a developer wrote specific procedure calls. SOAP supports that model, but it's also XML-over-the-wire, which means that any tool that generates XML is a candidate to prepare transactions.
More specifically, in the context of the lightweight XML client, by combining XI (another project from this column -- see Resources) with XSLT, any tool that can export data into a text file can prepare SOAP transactions. In my experience, most of the business-management tools used by small businesses can export to text files, if only to integrate with an office suite. The lightweight XML client takes advantage of this option.
Different applications export data in different formats, so some work is needed to re-organize the data for the SOAP transaction. This process is called data mapping because you are mapping information from the export file into the XML document. Unfortunately, the mapping is different for each application.
In a recent project, my company deployed a similar lightweight XML client to hundreds of locations. We found that the mapping was specific to each business-management application. Fortunately, the mappings were simple enough that a team of junior programmers could take care of them after a little bit of XSLT training. In most cases, they could prepare a mapping in a day or less. Therefore, the cost was acceptable to the users.
The typical business-management application does a very poor job of documenting its export files. Hence the warning: "Be paranoid." Assume the documentation is incomplete and outdated (it probably is). Testing, though, is the recipe for success.
To illustrate how to build a map for this lightweight client, I have created a typical case by combining elements from actual projects. The small enterprise manages its orders through two off-the-shelf applications running on an IBM eServer (formerly known as AS/400). Each day, it merges the new orders from both applications into one file that is sent to a Windows workstation, where a secretary imports them into a desktop-publishing package for printing (approximately 20 to 50 orders a week). Not the most intuitive setup, but at least an export file is available for the lightweight client to work with.
First, take a look at the target. Transactions are being prepared for a SOAP server deployed on AXIS 1.1. The server has one method, processOrder(), which takes an array of Order objects.
The Order object is specified in UML in Figure 1:
Figure 1. SOAP data model

Now take a look at what you get from the eServer. Table 1 shows the documentation received from the client:
Table 1. Export file format
| Field | Length | Description |
| HDR_OTYP | 3 | "HDR" for header record |
| HDR_ORFF | 5 | order reference |
| HDR_ODTE | 8 | order date |
| HDR_ODTC | 8 | creation order date |
| HDR_OTOT | 9 | order total |
| LIN_OTYP | 3 | "LIN" for line record |
| LIN_ODEP | 3 | department code |
| LIN_ORFF | 5 | order reference |
| LIN_OCOD | 5 | product reference |
| LIN_OFI1 | 5 | not used |
| LIN_ODES | 35 | description |
| LIN_OQTY | 2 | quantity |
| LIN_OPRI | 9 | product price |
It is essential to ask for (and obtain) a few sample files as well. Again, the description may be incomplete or inaccurate.
From the description, it's clear that the export file contains two record types: HDR and LIN. HDR records contain header information (such as the order reference) while LIN records contain the product descriptions. Obviously, each order cannot have more than one HDR record, but will have as many LIN records as there are products in the order. The structure with two records is typical for order files.
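To make the record structure concrete, here is a minimal sketch of splitting such fixed-width records according to Table 1. It is written in Python purely for illustration; the series itself relies on XI and XSLT for this step. The function name `parse_record` and the two padded sample records are hypothetical, and the padding (text left-aligned, amounts right-aligned) is an assumption, since the documentation does not specify it.

```python
# Field layouts copied from Table 1: (field name, width) in record order.
HDR_LAYOUT = [("OTYP", 3), ("ORFF", 5), ("ODTE", 8), ("ODTC", 8), ("OTOT", 9)]
LIN_LAYOUT = [("OTYP", 3), ("ODEP", 3), ("ORFF", 5), ("OCOD", 5),
              ("OFI1", 5), ("ODES", 35), ("OQTY", 2), ("OPRI", 9)]
LAYOUTS = {"HDR": HDR_LAYOUT, "LIN": LIN_LAYOUT}


def parse_record(record):
    """Split one fixed-width record into a dict of stripped field values."""
    record_type = record[:3]
    layout = LAYOUTS.get(record_type)
    if layout is None:
        raise ValueError(f"unknown record type: {record_type!r}")
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = record[offset:offset + width].strip()
        offset += width
    return fields


# Hypothetical records padded to the documented widths (padding is assumed).
header = "HDR" + "AZ525" + "10292003" + "09252003" + "1899.00".rjust(9)
item = ("LIN" + "WEB" + "AZ525" + "THPRE" + " " * 5
        + "IBM ThinkPad R Series Economy".ljust(35)
        + " " * 2 + "1099.00".rjust(9))

print(parse_record(header))
print(parse_record(item))
```

Keeping the layouts as data rather than hard-coded slices makes it easier to adjust the widths when a sample file contradicts the documentation.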
Now the detective work starts. The next step is to contrast the export file against the SOAP definition. The goal is to document the inconsistencies between the two definitions. In this example, the examination reveals that:
- The export file has two dates versus one in the SOAP server.
- The dates are eight characters long so, presumably, they are not in the appropriate representation for an xsd:date.
- The SOAP server does not need the department code.
- The "not used" field needs to be checked.
- The buyer and address are nowhere to be found.
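The date mismatch is the easiest of these to illustrate. The sketch below, in Python for illustration only, converts an eight-character date into the YYYY-MM-DD form that xsd:date expects. It assumes the export uses MMDDYYYY, which a value such as 10292003 would suggest; that ordering is not stated in the documentation and should be confirmed against sample files. The function name `to_xsd_date` is hypothetical.

```python
from datetime import datetime


def to_xsd_date(raw):
    """Convert an eight-character MMDDYYYY field (assumed order) to xsd:date."""
    if len(raw) != 8 or not raw.isdigit():
        raise ValueError(f"expected eight digits, got {raw!r}")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(raw, "%m%d%Y").date().isoformat()


print(to_xsd_date("10292003"))  # 2003-10-29
```

Rejecting malformed input outright, rather than passing it through, fits the "be paranoid" advice: a bad date is caught at mapping time instead of at the server.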
To continue the analysis, you have many options: study sample files, interview the user, create test cases, or -- in the best cases -- talk to the application developer. Of course, the application developer is rarely available, but the user can supply a lot of information, since he probably recognizes fields in the output document and may explain how the documents are prepared.
Occasionally, you might have to create test cases -- fake documents with well-known values. By exporting them and searching for the values in the export, you can document the role of every field. Listing 1 is a sample export file:
Listing 1. Sample export file
HDRAZ5251029200309252003 1899.00
HDRAZ5281029200310272003 149.95
LINWEBAZ525THPRE IBM ThinkPad R Series Economy 1099.00
LINDITAZ525THPRV IBM ThinkPad R Series Value 2 400.00
LINWEBAZ528BKXBE XML by Example 5 29.99
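The test-case technique, exporting a fake document and searching for its well-known values, can be partly automated. The following Python sketch is illustrative only: the `locate` function and the fake record are hypothetical, and the record's padding is an assumption. It reports which documented field a known value lands in, so that a mismatch with Table 1 stands out.

```python
# LIN layout copied from Table 1: (field name, width) in record order.
LIN_LAYOUT = [("OTYP", 3), ("ODEP", 3), ("ORFF", 5), ("OCOD", 5),
              ("OFI1", 5), ("ODES", 35), ("OQTY", 2), ("OPRI", 9)]


def locate(record, value, layout):
    """Return the name of the documented field where `value` starts, or None."""
    position = record.find(value)
    if position < 0:
        return None
    offset = 0
    for name, width in layout:
        if offset <= position < offset + width:
            return name
        offset += width
    return None  # the value starts beyond the documented record length


# Hypothetical export of a fake order line entered with easily spotted values.
record = ("LIN" + "WEB" + "ZZ999" + "TEST1" + " " * 5
          + "SENTINEL DESCRIPTION".ljust(35) + "77" + "123.45".rjust(9))

for value in ("ZZ999", "TEST1", "SENTINEL DESCRIPTION", "77", "123.45"):
    print(value, "->", locate(record, value, LIN_LAYOUT))
```

Choosing values that cannot occur by accident elsewhere in the record (such as 77 rather than 1) keeps the search unambiguous.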
Reviewing the sample file and interviewing the user makes it clear that:
- The creation order date and department code are for internal use only.
- The "not used" field is always empty, so it's probably left over from a previous application and can be safely ignored.
- The quantity field may not contain any data. Further tests reveal that one of the two applications discussed earlier generates empty fields when only one product is ordered.
- The buyer and address are constants; they're the name and address of the sample company.
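The empty-quantity finding lends itself to a quick sanity check. The Python sketch below is illustrative only, and the function names are hypothetical. It treats an empty quantity as one and recomputes an order total from its lines. Comparing the result with the header total is a suggested cross-check, not something the export documentation promises, although the figures in Listing 1 agree with it (1099.00 + 2 x 400.00 = 1899.00).

```python
from decimal import Decimal


def quantity_of(raw):
    """Treat an empty quantity field as a quantity of one."""
    raw = raw.strip()
    return int(raw) if raw else 1


def order_total(lines):
    """Sum quantity * price over (quantity field, price field) pairs."""
    return sum((quantity_of(qty) * Decimal(price.strip())
                for qty, price in lines), Decimal("0"))


# Quantity and price fields of order AZ525 from Listing 1;
# the first line has an empty quantity field.
az525 = [("  ", "1099.00"), (" 2", "400.00")]

print(order_total(az525))  # 1899.00
```

Decimal is used instead of float so that amounts such as 29.99 are compared exactly.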
It is very uncommon for the map analysis to find missing information -- that is, information required by the server that is neither in the export file nor a constant. This is because the business relationship predates the e-business system. In other words, the two partners have already been in business, sometimes for years, so they are already storing and processing all the information they need. The lightweight client simply exploits this information in novel ways.
Table 2 summarizes the map analysis and specifies how to transform the export file into a SOAP request:
Table 2. Analysis summary
| SOAP element | Export field | Comment |
| Order/Reference | HDR_ORFF | - |
| Order/Date | HDR_ODTE | convert to xsd:date format |
| Order/Buyer | - | "Example Corp." |
| Order/Address | - | "Example Street 5, 45202 Cincinnati, OH" |
| Order/Total | HDR_OTOT | - |
| Item/Code | LIN_OCOD | - |
| Item/Description | LIN_ODES | - |
| Item/Price | LIN_OPRI | - |
| Item/Quantity | LIN_OQTY | 1 if absent |
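As a rough illustration of what Table 2 specifies, the Python sketch below applies the mapping to already-parsed records and builds a plain Order element. It is not the stylesheet approach the series uses, and it produces neither the SOAP envelope nor the namespaces a real request needs. The nesting of Item inside Order, the MMDDYYYY date order, and the name `build_order` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Constants identified during the analysis (Table 2).
BUYER = "Example Corp."
ADDRESS = "Example Street 5, 45202 Cincinnati, OH"


def build_order(header, lines):
    """Map one parsed HDR record and its LIN records to an Order element."""
    order = ET.Element("Order")
    ET.SubElement(order, "Reference").text = header["ORFF"]
    # Assumed MMDDYYYY input, converted to the xsd:date form YYYY-MM-DD.
    date = datetime.strptime(header["ODTE"], "%m%d%Y").date()
    ET.SubElement(order, "Date").text = date.isoformat()
    ET.SubElement(order, "Buyer").text = BUYER
    ET.SubElement(order, "Address").text = ADDRESS
    ET.SubElement(order, "Total").text = header["OTOT"]
    for line in lines:
        if line["ORFF"] != header["ORFF"]:
            continue  # this line belongs to another order
        item = ET.SubElement(order, "Item")
        ET.SubElement(item, "Code").text = line["OCOD"]
        ET.SubElement(item, "Description").text = line["ODES"]
        ET.SubElement(item, "Price").text = line["OPRI"]
        ET.SubElement(item, "Quantity").text = line["OQTY"] or "1"  # 1 if absent
    return order


# Field values taken from Listing 1, as a fixed-width parser might return them.
header = {"ORFF": "AZ525", "ODTE": "10292003", "OTOT": "1899.00"}
lines = [
    {"ORFF": "AZ525", "OCOD": "THPRE", "OQTY": "", "OPRI": "1099.00",
     "ODES": "IBM ThinkPad R Series Economy"},
    {"ORFF": "AZ525", "OCOD": "THPRV", "OQTY": "2", "OPRI": "400.00",
     "ODES": "IBM ThinkPad R Series Value"},
    {"ORFF": "AZ528", "OCOD": "BKXBE", "OQTY": "5", "OPRI": "29.99",
     "ODES": "XML by Example"},
]

order = build_order(header, lines)
print(ET.tostring(order, encoding="unicode"))
```

The department code, the creation date, and the unused field are simply never read, which mirrors their absence from Table 2.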
New applications are designed to support XML from the ground up, but it will be years before all existing applications are converted. In the meantime, what can you do? A profitable option is to use an add-on (such as the lightweight XML client) to build XML support on top of the existing capabilities of the application.
This approach is very attractive for businesses because it offers a migration path to new services, such as e-business. Unfortunately for the developer, it often means working with old, poorly documented file formats. An extensive analysis of these documents is the only guarantee for success.
In Part 2 of this article, I'll show you how to turn this analysis into an actual XSLT stylesheet for preparing SOAP requests.
- Participate in the discussion forum.
- Download the XI lightweight XML client used in this article.
- Continue on to "Map files into SOAP requests, Part 2." Benoit Marchal shows you the coding necessary to map into a state-of-the-art SOAP request (developerWorks, January 2004).
- Read the previous Working XML installments on the lightweight client:
  - "A lightweight XML client" (developerWorks, September 2003) launches the project -- an XML client for e-commerce, born out of the author's experience with B2B e-commerce over the last couple of years.
  - "A first version of the lightweight client" (developerWorks, October 2003) shows you how to create SOAP transactions through XSLT.
- Take a look at XI, an earlier Working XML project that dealt with importing text documents in an XML publishing solution (or any XML solution for that matter) (developerWorks, April 2002).
- Focus more specifically on server-side issues in "Adapting legacy applications as Web services" (developerWorks, January 2002) by Dietmar Kuebler and Wolfgang Eibach, as it discusses the conversion of legacy applications to the new era of e-business.
- If you deal with legacy data, read up on various issues that you might encounter in "Challenges with legacy data" (developerWorks, July 2001) by Scott W. Ambler.
- Check out XML Lightweight Extractor, an IBM alphaWorks technology that simplifies the process of extracting data from relational databases.
- Try the XML Data Mediator, another IBM alphaWorks technology for converting data from and into XML.
- Create XSL stylesheets visually with the XSLerator tool, also part of IBM alphaWorks.
- If you are new to regular expressions, try The Regex Coach tool.
- Try XML Convert from Unidex, another tool for mapping legacy files into XML.
- Read more about IBM WebSphere Application Server right here on developerWorks.
- Find hundreds more XML resources on the developerWorks XML zone. You'll find all previous installments of Benoit's Working XML column at the column summary page.
- Find out how you can become an IBM Certified Developer in XML and related technologies.

Benoit Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. Benoit is available to help you with XML projects. You can contact him at bmarchal@pineapplesoft.com or through his personal site at marchal.com.