Working XML: Safe coding practices, Part 2

Separation of tasks

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft
Benoît Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. You can contact him at bmarchal@pineapplesoft.com or through his personal site at marchal.com.

Summary:  Save yourself hours of debugging and maintenance. Benoît continues to review his notes on horror stories in the use of XML. In the process, he discusses appropriate design techniques for working with XML documents, and how best to integrate XML processing into an application.

Date:  07 Jul 2005
Level:  Intermediate


In this installment, I continue my discussion of common XML pitfalls and, perhaps more importantly, show you how to avoid them. My previous column focused on frequent misunderstandings in the use of the XML syntax itself. Here, I look at how to integrate XML support into an application for efficiency and maintainability.

Programming languages and development tools (including databases, IDEs, and modeling tools) offer growing XML support. At the time of this writing, Java technology has no fewer than five official APIs related to XML:

  • Java Architecture for XML Binding (JAXB)
  • Java API for XML Processing (JAXP), probably the largest API, consists of four parts: SAX, DOM, TrAX, and XPath APIs
  • Java API for XML Registries (JAXR)
  • Java API for XML-based RPC (JAX-RPC)
  • SOAP with Attachments API for Java (SAAJ)

Furthermore, you'll find countless unofficial Java APIs such as JDOM, PDOM, Castor, and StAX (which will eventually be integrated into JAXP). And last but not least, most projects offer extensions to the standard APIs, such as Axis extensions to JAX-RPC.

The choice is overwhelming. Given the short deadlines that are common in this industry, it is not surprising that developers usually reach for the most popular API (typically DOM in my experience) and don't sweat too much over their decision.

Likewise, who has the time to design an XML vocabulary? It seems faster to dump the object hierarchy as XML tags.

Yet the decisions you make in this space will have a significant impact on your ability to deliver on time and on budget, not to mention how easy it will be to maintain your application.

A word of warning: Selecting a design can be a trade-off between different qualities. No single design fits all needs, so take the time to validate the trade-offs against the requirements of your application. However, I have found that some designs just seem to be better starting points than others, and I will cover those.

XML as interface

From a design point of view, it helps to concentrate on the role of XML documents, rather than on their structure. Structurally, XML documents are repositories of data. More interestingly, XML documents are used as interfaces between applications.

In the interface scenario, one application prepares XML documents that another application consumes. You can find many examples of this: An XML editor saves XML files for a content manager; a news reader downloads Atom or RSS feeds from a Web server; a SOAP client sends a SOAP request to a server; the Eclipse platform reads an XML plug-in description; and many more.

Even in simpler cases where one application produces XML files for its own consumption, you can still think of the document as an interface between different runs (and occasionally different versions) of the application.

By looking at this as an interface design issue, you can apply to XML design the same rules that you use with JavaBeans and other APIs.

Design by contract

Among the many methodologies that have been proposed to help design interfaces, one that is particularly relevant is the design-by-contract approach first introduced in the Eiffel language.

In a nutshell, when designing by contract you need to spell out the functional requirements of each component in terms of a contract with the other components. Like a legal contract, the contract includes the duties and the rights of each component -- meaning, what the component will deliver and what it expects from other components.

In practical terms, you express the functional requirements in terms of a data structure plus preconditions and postconditions. XML is ideally suited for this approach, thanks to the availability of schema languages (such as W3C XML Schema and RELAX NG) and rule-based validation languages such as Schematron.

The vocabulary you define becomes the contract between the components.
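
As a rough illustration of enforcing such a contract at the boundary, the following sketch validates an incoming document against a W3C XML Schema with the standard javax.xml.validation API. The order vocabulary, the ContractCheck class, and the honorsContract method are hypothetical names chosen for this example; only the JAXP calls are real.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper: checks a document against its schema "contract". */
final class ContractCheck {
    // Illustrative contract: an order holds one or more positive-integer quantities.
    static final String ORDER_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='quantity' type='xs:positiveInteger' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns true when the document honors the contract, false otherwise. */
    static boolean honorsContract(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(ORDER_XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler installed, validation errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // contract violated, or document not well-formed
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O error on in-memory input", e);
        }
    }
}
```

Here the schema plays the role of the precondition: the consuming component can refuse a document before any application logic sees it.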

Resilience

It is a given that the functional requirements of the application will change over time. So will the XML vocabulary that supports it. Still, in the mad rush to meet deadlines, few vocabularies are designed to withstand the test of time. Two common problems are the lack of a versioning scheme and reliance on the object data model.

A versioning scheme enables both backward and forward compatibility, allowing newer applications to work with files produced by older applications and vice versa. A good versioning scheme enables:

  • A new application to engage a compatibility mode
  • An old application to open new files without crashing

You can address the first point easily enough by including a version tag or attribute. If the version is lower than the current version, the application must turn on backward compatibility.

The second point requires more work. You cannot change the old code (otherwise it becomes a new application), so you must include forward-looking compatibility from the outset. Specifically, the code must:

  • Know how to treat new tags -- for example, ignore them or report an error (but don't crash)
  • Detect when the file breaks backward-compatibility and report an error

The latter is often overlooked. A new version of a vocabulary may introduce changes such that an old reader should not attempt to process the file. A versioning scheme must provide a mechanism for reporting this -- for example, a tag or attribute called compatibleWith that specifies the minimum version required to proceed with the document. Applications would then refuse to open documents whose compatibleWith number is higher than their own version number. See Listing 1 for an example.


Listing 1. Simple versioning scheme
                
<v:root compatibleWith="3" version="5"
    xmlns:v="http://psol.com/2005/sample">
    <!-- content goes here -->
</v:root>

Listing 1, which was written by version 5 of this particular application, can be processed by versions 3, 4, or 5 but not by versions 1 or 2.
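
The decision a reader has to make when it opens such a file can be sketched as follows. This is only one possible reading of the scheme, under the assumption that both attributes are always present and hold integers; the VersionGate class and its Decision names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical gate: decides how a reader of a given version treats a file. */
final class VersionGate {
    enum Decision { REFUSE, COMPATIBILITY_MODE, NORMAL, TOLERATE_NEWER }

    static Decision decide(String xml, int readerVersion) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XML document", e);
        }
        // Assumption: both attributes are present and numeric; a missing one
        // makes parseInt throw NumberFormatException, which is reported as-is.
        int version = Integer.parseInt(root.getAttribute("version"));
        int compatibleWith = Integer.parseInt(root.getAttribute("compatibleWith"));

        if (compatibleWith > readerVersion) {
            return Decision.REFUSE;             // file breaks compatibility with this reader
        }
        if (version < readerVersion) {
            return Decision.COMPATIBILITY_MODE; // older file: turn on backward compatibility
        }
        if (version > readerVersion) {
            return Decision.TOLERATE_NEWER;     // newer file: expect and skip unknown tags
        }
        return Decision.NORMAL;
    }
}
```

Fed the root element of Listing 1, a version 2 reader gets REFUSE, a version 3 reader gets TOLERATE_NEWER, and a version 5 reader gets NORMAL.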

Alternatively, newer applications can use a different namespace to mark incompatible changes.
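
A minimal sketch of both rules, assuming a namespace-aware SAX parser: the reader rejects a root element from a namespace it does not know, and silently skips any tag it does not recognize instead of crashing. The item element, the TolerantReader class, and the readItems method are hypothetical; the namespace is the one used in Listing 1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader: collects the text of known item elements only. */
final class TolerantReader extends DefaultHandler {
    static final String KNOWN_NS = "http://psol.com/2005/sample";

    private final List<String> items = new ArrayList<String>();
    private StringBuilder text;  // non-null only while inside a known item
    private int skipDepth;       // greater than 0 while inside an unknown element
    private boolean rootSeen;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (!rootSeen) {
            rootSeen = true;
            if (!KNOWN_NS.equals(uri) || !"root".equals(localName)) {
                // Incompatible vocabulary: report an error rather than guess.
                throw new SAXException("Unsupported root {" + uri + "}" + localName);
            }
            return;
        }
        if (skipDepth > 0 || !(KNOWN_NS.equals(uri) && "item".equals(localName))) {
            skipDepth++;         // unknown tag, or anything nested inside one
        } else {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null && skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;
        } else if (text != null) {
            items.add(text.toString().trim());
            text = null;
        }
    }

    static List<String> readItems(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            TolerantReader handler = new TolerantReader();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read document", e);
        }
    }
}
```

Whether to ignore unknown tags or report them is a policy decision; the point is that the policy is written down once, in the reader, from the first release.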

Data model and vocabulary

Typically, a lot of effort has already gone into developing an object model for the application, so it seems logical to make use of that effort and derive the XML model from the object model.

In practice, however, this can turn into a maintenance nightmare. The very qualities that make a good object model become liabilities for an XML vocabulary:

  • The object model often contains redundant information -- for example, to speed searches in hash tables; redundant information means more validation of the XML document.
  • Conversely, the object model might compute on the fly information that the document must record for validation purposes. For example, with an online purchase, you want to store not only the item lines but also the total amount, because the buyer has approved the total amount. In an object model, it is perfectly valid to re-compute the total from the item lines.
  • Although this is not considered pure object design, the development tools and libraries being used often influence the object model. For example, if you're using a widget library, you might adapt the model to fit nicely with the widgets.
  • Portions of the object model might be optimized using bit masks or native arrays, sacrificing readability for speed.
  • Finally, the object model does not need to be stable from one version of the application to the next; properties are often added and removed as the software evolves.

If the XML vocabulary is a direct mapping of the object model, it won't be very stable. This may cause some problems during development, since you will need to adapt the XML-related code frequently -- but it will cause even more problems during maintenance. Remember that the document is an interface, so you must isolate it from the implementation details wherever feasible.

In contrast, if you take a moment to identify the stable relationships in your object model, then you can derive a more stable vocabulary at minimal cost. Using UML and stereotypes, it might even be possible to generate both the object model and the XML vocabulary from a single diagram (see the "UML, XMI, and code generation" articles in Resources).

Alternatively, you can concentrate on a functional view of the data model. The object model is a technical view that implements the nuts and bolts of the application, and it changes as the technology evolves. The functional view, however, is more stable -- even if you migrate the application to a new platform, it will still basically provide the same set of services.


From the application side

Now that I've covered the XML vocabulary, it's time to look at the application. Again, the rule is to treat the XML processing as an interface and to encapsulate it.

The worst-case scenario (one that I see frighteningly often) is to design an application around a DOM (or JDOM) tree. With a few exceptions (browsers come to mind), DOM is a horrible object model to work with.

DOM appears to be an attractive option because it is easy to use and fairly generic. Many developers figure that it can accommodate anything that's thrown at it, and in most cases it does. Many developers have learned XML programming with DOM, so they assume it's the logical choice.

DOM was designed by the W3C for a fairly restricted class of applications: Web browsers. It works very well for related applications, including editors and XML utilities, but it is sub-optimal for general-purpose applications.

The main issue with using a DOM model is that it forces you to spread your XML code across the entire application. It's a better idea to concentrate your XML code in one or two packages.

I am reminded of a project where I reviewed a fairly sizable product that was built around a DOM tree. Almost every single class in the application would extract data from the DOM tree or insert data into it. This is illustrated in Figure 1, where you can see that almost every package depends on the DOM tree.


Figure 1. In this model, every package depends on DOM

This proved difficult to debug, difficult to work with, and difficult to maintain for a number of reasons:

  • Conversions from strings to native types were incoherent. Validation was inconsistent -- different routines would interpret the same DOM tree differently!
  • Information in the DOM was not in the ideal format, which made for complex algorithms.
  • It was difficult to track changes. One routine could update the DOM tree in such a way as to break another routine accessing the same data.
  • Any change in the XML vocabulary required that team members check thousands of lines of code.
  • Debugging was a nightmare because no one ever knew which routine had produced certain results in the DOM tree.

With SQL databases, it's a good idea to separate the data loading from the object model. The same applies with XML.

Figure 2 illustrates a more robust alternative that relies on an application-defined object model to isolate the code that deals with XML in a Serialization package.


Figure 2. Isolating the XML code

The model is brought into memory or serialized to XML through this dedicated package. The benefits include:

  • It encapsulates (isolates) the code that deals with XML in a well-defined part of the application. This may still run to thousands of lines of code, but it can be debugged and tested independently.
  • It promotes a more coherent use of XML, because the XML code is the responsibility of one developer (I have yet to see an application with so much XML code that it requires a whole team).
  • While loading, it is possible to reorganize the data to better suit the algorithms -- for example, loading into a hash table or a database may help in working with a large data set.
  • It is easier to adopt a Model/View/Controller (MVC) paradigm.
  • The object model is decoupled from the XML model so that they can evolve independently.

Obviously the serialization routine may still use DOM for the actual parsing (although I tend to find SAX more efficient). You may also want to explore JAXB or Castor, which essentially generate the serialization package automatically.
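
To make the idea concrete, here is a minimal sketch of such a serialization class, written by hand with DOM. The Order model, the order and line elements, and the price and total attributes are all hypothetical; the example borrows the online-purchase case mentioned earlier, where the document records the approved total while the object model re-computes it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Application-defined model: no XML type appears in it. */
final class Order {
    final List<BigDecimal> linePrices = new ArrayList<BigDecimal>();

    BigDecimal total() {  // re-computed from the lines, never stored
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal price : linePrices) {
            sum = sum.add(price);
        }
        return sum;
    }
}

/** Hypothetical serialization class: the only code that touches DOM. */
final class OrderSerialization {
    static Order load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            Order order = new Order();
            NodeList lines = root.getElementsByTagName("line");
            for (int i = 0; i < lines.getLength(); i++) {
                // String-to-number conversion happens in exactly one place.
                Element line = (Element) lines.item(i);
                order.linePrices.add(new BigDecimal(line.getAttribute("price")));
            }
            // The document records the total the buyer approved; check it here.
            BigDecimal approved = new BigDecimal(root.getAttribute("total"));
            if (approved.compareTo(order.total()) != 0) {
                throw new IllegalArgumentException("Approved total does not match lines");
            }
            return order;
        } catch (IllegalArgumentException e) {
            throw e;  // includes NumberFormatException from malformed numbers
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read order", e);
        }
    }

    static String save(Order order) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("order");
            root.setAttribute("total", order.total().toPlainString());
            doc.appendChild(root);
            for (BigDecimal price : order.linePrices) {
                Element line = doc.createElement("line");
                line.setAttribute("price", price.toPlainString());
                root.appendChild(line);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot write order", e);
        }
    }
}
```

The rest of the application sees only Order. If the vocabulary changes, or if DOM is later swapped for SAX, the edits stay inside OrderSerialization.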

What is the price to be paid for all the above benefits? The startup costs are higher, but you will quickly find that you save time when you encapsulate the XML code. In the first stages of development, it is normal and expected that both the data model and the XML vocabulary will evolve somewhat rapidly. Even then, you will see the benefits of localizing these changes begin to pay off.


Coming up next

The moral of this article is that a little forethought can pay off tremendously. Take the time to design your application and isolate the XML components.

In the next installment, I will review the all-important issue of validation.


Resources
