Java document model usage

Sidebar: A few words about document models

Nancy Dunn (nancydunn@aol.com)
Retired XML zone editor

In some circles it takes little to start a flame war over language-specific XML document models. What's the best way to go? Hew closely to the standards with the DOM, or take advantage of the idioms of your programming environment with a language-specific document model such as JDOM or dom4j? Do such models, in fact, offer any significant advantages over an implementation of the DOM?

According to Joe Kesselman, staff scientist/programmer at IBM, the answer to that last question is often "No." He says, "The DOM offers not only the ability to move between languages with minimal relearning, but to move between multiple implementations in a single language -- which a specific set of classes such as JDOM can't support. This permits you to pick among implementations of the DOM to select the one whose performance trade-offs best match your needs, and to wrap the DOM API around existing data structures so they can be accessed directly rather than having to be copied into an XML-specific data model and then copied back if they've been changed."
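
To make the portability point concrete, here is a minimal sketch, assuming a JAXP-capable Java runtime. The class name PortableDomSketch and the method countElements are hypothetical names chosen for illustration. The counting method touches only the org.w3c.dom interfaces, so it should work unchanged on whichever DOM implementation the JAXP factory happens to supply.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration class; not taken from the accompanying article.
class PortableDomSketch {

    // Counts element nodes in a subtree using only the standard DOM interfaces,
    // so it does not depend on any particular implementation's classes.
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // JAXP picks the concrete implementation; the code below never names it.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("catalog");
        doc.appendChild(root);
        root.appendChild(doc.createElement("item"));
        root.appendChild(doc.createElement("item"));

        System.out.println("Implementation: " + doc.getClass().getName());
        System.out.println("Elements: " + countElements(doc));
    }
}
```

With JAXP, the implementation can be swapped by setting the javax.xml.parsers.DocumentBuilderFactory system property to a different factory class, without touching countElements.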

Kesselman, who is a member of the W3C DOM working group and an editor of the Traversal chapter, does concede that there are some details of the DOM which a Java programmer may initially find annoying, though he asserts that one adapts to them very quickly. "The use of the Document object as a node factory is indeed a bit less convenient than simply calling an object constructor, but is inherent in working with abstract interfaces. Note that dom4j also uses a node factory; it just happens to use a separate getDocumentFactory() call to retrieve it. One of several ways in which Dennis (Sosnoski)'s DOM sample (in the accompanying article) could be simplified would be to factor out the retrieval of the Document for this purpose, just as he factored out retrieval of dom4j's factory."
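
The node-factory pattern Kesselman describes can be sketched as follows. This is an illustrative assumption about what "factoring out the retrieval of the Document" might look like, not Sosnoski's actual sample code; the class NodeFactorySketch and the helper addTextChild are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration class; not taken from the accompanying article.
class NodeFactorySketch {

    // Hypothetical helper: looks up the owning Document once and uses it as
    // the factory for both the new element and its text content.
    static Element addTextChild(Element parent, String name, String text) {
        Document factory = parent.getOwnerDocument();
        Element child = factory.createElement(name);
        child.appendChild(factory.createTextNode(text));
        parent.appendChild(child);
        return child;
    }

    // Builds <book><title>Document models</title></book> in memory.
    static Document buildSample() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("book");
        doc.appendChild(root);
        addTextChild(root, "title", "Document models");
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildSample();
        System.out.println(doc.getDocumentElement().getFirstChild().getNodeName());
    }
}
```

Callers of addTextChild never see the factory at all, which is roughly the convenience a constructor call would have offered.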

Another significant difference is the DOM's use of Text nodes rather than direct references to strings, which Kesselman points out is the most important actual code difference between Sosnoski's DOM example and the others. "This is probably the single most contentious feature of the DOM," Kesselman admits. "In trivial examples, it is indeed a nuisance. On the other hand, there are also situations where the ability to treat Text just like any other node -- to use it as a starting point for tree navigation, for example -- can simplify your code. Whether it's a net advantage or disadvantage depends on exactly what your code is doing... but in real-world code it honestly doesn't seem to be a problem, especially since it often winds up buried in a subroutine so most of your code can't tell the difference."
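
Both halves of that argument can be shown in a short sketch, again using hypothetical names (TextNodeSketch, textOf, parse) and assuming a standard JAXP parser. The textOf subroutine hides the Text nodes from callers that only want a string, while the main method uses a Text node as an ordinary starting point for navigation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not taken from the accompanying article.
class TextNodeSketch {

    // Hypothetical helper: concatenates the direct Text and CDATA children
    // of an element, so callers can work with a plain String.
    static String textOf(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Parses an XML string into a DOM Document through JAXP.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document simple = parse("<title>Document models</title>");
        System.out.println(textOf(simple.getDocumentElement()));

        // In mixed content, the leading Text node is a node like any other:
        // it has a parent and siblings that can be reached from it.
        Document mixed = parse("<p>Hello <b>DOM</b> world</p>");
        Node firstText = mixed.getDocumentElement().getFirstChild();
        System.out.println(firstText.getNextSibling().getNodeName());
        System.out.println(firstText.getParentNode().getNodeName());
    }
}
```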

Kesselman also grants that the DOM does not yet have an officially blessed parser and serializer API, "though this is currently under development as part of DOM Level 3, based heavily on ideas from SAX and JAXP and other existing parser APIs. If you're working in Java, I would recommend using the JAXP interfaces until the DOM's own load/save module becomes available."
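
As a rough sketch of the JAXP route Kesselman recommends, the following loads a document with a DocumentBuilder and writes it back out with an identity Transformer. The class and method names (JaxpRoundTripSketch, load, save) are hypothetical, and the exact formatting of the serialized output can vary between JAXP implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not taken from the accompanying article.
class JaxpRoundTripSketch {

    // Load: parse XML text into a standard DOM Document.
    static Document load(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Save: serialize the DOM with an identity transform (no stylesheet).
    static String save(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<a><b>x</b></a>");
        System.out.println(save(doc));
    }
}
```

Because load and save are the only places that mention JAXP, the rest of an application can stay on the org.w3c.dom interfaces, and swapping in a different load/save mechanism later should be a local change.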

Kesselman's recommendation: "Unless you've got a darned good reason to do otherwise, it probably is wisest to stick with the standard DOM. That minimizes the relearning and recoding you have to do as you move from language to language and maximizes the reusability and maintainability of your code should you want to change platforms within a single language. There are indeed times when you want to step outside the DOM and use a custom model, and at that time it's certainly worth considering whether one of these other solutions addresses your specific needs (and handles them better than a function library running on top of the DOM, or a model which supports both the DOM APIs and any additional functionality you need). But there are significant long-term costs in doing so; think carefully about what your future plans for this code might be before you take that leap, and be sure that the benefits outweigh the costs."

Kesselman also asked us to remind readers that the W3C DOM is still a work in progress: "If there's something you need to do that the DOM can't support, make sure the Working Group knows about it. Feedback from real-world users is an essential factor in the DOM's evolution." To see what other users have been suggesting, check out the W3C's DOM "open issues list".
