Nancy Dunn (nancydunn@aol.com)
Retired XML zone editor
In some circles it takes little to start a flame war over language-specific
XML document models. What's the best way to go? Hew closely to the
standards with the DOM, or take advantage of the idioms of your programming
environment with a language-specific DOM model? Do they, in fact, offer any
significant advantages over an implementation of the DOM?

According to Joe Kesselman, staff scientist/programmer at IBM, the answer
to that last question is often "No." He says, "The DOM offers not only the ability
to move between languages with minimal relearning, but to move between
multiple implementations in a single language -- which a specific set of
classes such as JDOM can't support. This permits you to pick among
implementations of the DOM to select the one whose performance trade-offs
best match your needs, and to wrap the DOM API around existing data
structures so they can be accessed directly rather than having to be copied
into an XML-specific data model and then copied back if they've been
changed."

Kesselman, who is a member of the W3C DOM working group and an editor of the Traversal chapter,
does concede that there are some details of the DOM which a Java
programmer may initially find annoying, though he asserts that one adapts
to them very quickly. "The use of the Document object as a node factory is
indeed a bit less convenient than simply calling an object constructor, but
is inherent in working with abstract interfaces. Note that dom4j also uses
a node factory; it just happens to use a separate getDocumentFactory() call
to retrieve it. One of several ways in which Dennis (Sosnoski)'s DOM
sample (in the accompanying article) could be simplified would be to factor out the retrieval of the
Document for this purpose, just as he factored out retrieval of dom4j's
factory."

Another significant difference is the DOM's use of Text nodes rather than
direct references to strings which, Kesselman points out, is the most important
actual code difference between Sosnoski's DOM example and the others. "This
is probably the single most contentious feature of the DOM," Kesselman admits.
"In trivial examples, it is indeed a nuisance. On the other hand, there are
also situations where the ability to treat Text just like any other node --
to use it as a starting point for tree navigation, for example -- can
simplify your code. Whether it's a net advantage or disadvantage depends on
exactly what your code is doing... but in real-world code it honestly
doesn't seem to be a problem, especially since it often winds up buried in
a subroutine so most of your code can't tell the difference."

Kesselman also grants that the DOM does not yet have an officially blessed parser
and serializer API, "though this is currently under development as part of
DOM Level 3, based heavily on ideas from SAX and JAXP and other existing
parser APIs. If you're working in Java, I would recommend using the JAXP
interfaces until the DOM's own load/save module becomes available."

Kesselman's recommendation: "Unless you've got a darned good reason to do
otherwise, it probably is wisest to stick with the standard DOM. That
minimizes the relearning and recoding you have to do as you move from
language to language and maximizes the reusability and maintainability of
your code should you want to change platforms within a single language.
There are indeed times when you want to step outside the DOM and use a
custom model, and at that time it's certainly worth considering whether one
of these other solutions addresses your specific needs
(and handles them better than a function library running on top of
the DOM, or a model which supports both the DOM APIs and any additional
functionality you need). But there are significant long-term costs in doing so;
think carefully about what your future plans for this code might be before
you take that leap, and be sure that the benefits outweigh the costs."

Kesselman also asked us to remind readers that the W3C DOM is still a work in
progress: "If there's something you need to do that the DOM can't
support, make sure the Working Group knows about it. Feedback from
real-world users is an essential factor in the DOM's evolution." To see
what other users have been suggesting, check out the W3C's DOM "open
issues list".
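
To make two of Kesselman's points concrete, here is a minimal Java sketch,
not code from Sosnoski's article. It assumes the JAXP interfaces Kesselman
recommends for obtaining a Document, and it shows (a) the retrieval of the
Document factored out so it can serve as the node factory, and (b) Text-node
handling buried in a subroutine so the calling code deals only in strings.
The class name, the helpers addTextElement and directText, and the element
names are hypothetical, chosen only for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomFactorySketch {

    // Obtain an empty Document through JAXP rather than a parser-specific class.
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: look up the owning Document once, then use it as
    // the factory for both the new element and its Text child.
    static Element addTextElement(Node parent, String name, String value) {
        // getOwnerDocument() returns null when the node is itself a Document.
        Document factory = (parent instanceof Document)
                ? (Document) parent
                : parent.getOwnerDocument();
        Element element = factory.createElement(name);
        element.appendChild(factory.createTextNode(value));
        parent.appendChild(element);
        return element;
    }

    // Hypothetical helper: hide Text nodes from callers by concatenating the
    // character data of an element's direct Text and CDATA children.
    static String directText(Element element) {
        StringBuilder text = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element root = doc.createElement("flights");   // illustrative name
        doc.appendChild(root);
        Element carrier = addTextElement(root, "carrier", "Example Air");
        System.out.println(directText(carrier));       // prints "Example Air"
    }
}
```

Because the helpers rely only on the org.w3c.dom interfaces, they should work
unchanged with any conforming DOM implementation that JAXP hands back, which
is the portability argument Kesselman makes above.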