The Document Object Model (DOM) is a platform- and language-neutral interface for dynamically accessing and updating the content, structure, and style of XML documents. DOM defines a standard set of interfaces for representing documents, a standard model of how these objects can be combined, and a standard set of methods for accessing and manipulating them. DOM is a W3C Recommendation, which makes it a recognized Web standard. Implementations are available for a wide variety of languages, including Perl, C, C++, Java, Tcl, and Python.
As I'll demonstrate in this article, DOM is an excellent choice for XML handling when stream-based models (such as SAX) are not sufficient. Unfortunately, several aspects of the specification, such as its language-neutral interface and its use of the "everything-is-a-node" abstraction, make it difficult to use and prone to generating brittle code. This was particularly evident in my company's recent review of several large DOM projects that were developed by a variety of developers over the past year. The common problems, and their remedies, are discussed below.
The DOM specification is designed to be usable with any programming language. Therefore, it attempts to use a common, core set of features which are available in all languages. The DOM specification also attempts to remain neutral in its interface definitions. Because of this, Perl programmers can apply their DOM knowledge when working with Java, and vice versa.
The specification also treats every part of the document as a node consisting of a type and a value. This provides an elegant conceptual framework for dealing with all aspects of the document. As an example, the following XML fragment
<paragraph align="left">the <it>Italicized</it> portion.</paragraph>
is represented via the following DOM structure:
Figure 1: DOM Representation of an XML Document

Each of the Document, Element, Text, and Attr pieces of the tree is a DOM::Node.
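One way to see this structure for yourself is to parse the fragment above and print every node. The following is a minimal sketch that assumes the XML::DOM module from CPAN; describe is a hypothetical helper name introduced only for this illustration.

```perl
use strict;
use warnings;
use XML::DOM;    # assumption: the CPAN XML::DOM module; it exports ELEMENT_NODE etc.

my $xml = '<paragraph align="left">the <it>Italicized</it> portion.</paragraph>';
my $doc = XML::DOM::Parser->new->parse($xml);

# Hypothetical helper: return one indented line per node, depth first.
sub describe {
    my ($node, $depth) = @_;
    my $line  = ('  ' x $depth) . $node->getNodeName();
    my $value = $node->getNodeValue();          # undef for Document and Element
    $line .= qq{ = "$value"} if defined $value;
    my @lines = ($line);

    # Attributes hang off their element but are not among its child nodes.
    if ($node->getNodeType() == ELEMENT_NODE) {
        my $attrs = $node->getAttributes();
        if ($attrs) {
            for my $i (0 .. $attrs->getLength() - 1) {
                my $attr = $attrs->item($i);
                push @lines, ('  ' x ($depth + 1)) . '@'
                    . $attr->getName() . ' = "' . $attr->getValue() . '"';
            }
        }
    }
    push @lines, describe($_, $depth + 1) for $node->getChildNodes();
    return @lines;
}

my @tree = describe($doc, 0);
print "$_\n" for @tree;
```

The listing shows the Document at the top, the paragraph element beneath it, then the align attribute followed by the three children of paragraph: a Text node, the it element (with its own Text child), and a second Text node.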
The downside of DOM's language neutrality is that the methodologies and patterns that are normally used in each programming language cannot be employed. For example, the attributes of an XML node would naturally be represented in Perl as a hash, since they are a set of unique name-value pairs. With DOM, however, they are represented as a set of nodes, and the value of each is accessed via a separate function call. Instead of using a simple hash, the programmer must learn to use a number of new data structures and access methods. These minor inconveniences add up to unusual coding practices and an increase in lines of code. They also force the programmer to learn the DOM method of doing things in place of the way she would handle it intuitively.
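To make the contrast concrete, the sketch below walks an element's attribute nodes the DOM way and copies them into an ordinary Perl hash. It assumes the XML::DOM module; the sample element and the %attr hash are invented for illustration.

```perl
use strict;
use warnings;
use XML::DOM;    # assumption: the CPAN XML::DOM module

my $doc  = XML::DOM::Parser->new->parse(
    '<paragraph align="left" lang="en">text</paragraph>');
my $elem = $doc->getDocumentElement();

# The DOM way: walk the NamedNodeMap and ask each Attr node for its parts.
my %attr;
my $map = $elem->getAttributes();
for my $i (0 .. $map->getLength() - 1) {
    my $node = $map->item($i);
    $attr{ $node->getName() } = $node->getValue();
}

# From here on, ordinary Perl hash idioms apply.
print "$_ = $attr{$_}\n" for sort keys %attr;
```

When only one attribute is needed, the DOM method getAttribute, called on the element with the attribute name, returns its value as a string directly.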
The everything-is-a-node abstraction, while quite elegant, leads to awkward coding situations, such as the attribute node example above. This also occurs when accessing the value contained within an XML tag. Consider the XML fragment: <tagname>Value</tagname>. You may think the text value would be accessible by calling a getValue or similar method on the tagname node. In fact, the text is treated as one or more child nodes under the tagname node. Thus, in order to get the text value, you need to traverse the children of tagname, collating them into a string. There is good reason for this: tagname may contain other embedded XML tags. If tagname does contain embedded XML tags, getting its text value makes less sense. In the real world, however, we have seen very frequent coding errors caused by this lack of convenient functions.
The everything-is-a-node abstraction also loses some value because of the number of node types that exist and because of the lack of uniformity in their access methods. For example, the insertData method is used to set the value of CharacterData nodes, while the value of Attr (attribute) nodes is set by direct access to a value field. Presenting different interfaces for different nodes diminishes the uniformity and elegance of the model and steepens the learning curve.
An analysis of several large XML projects revealed some common problems in working with the DOM. A few of these are presented below.
In all of the projects we reviewed, one overarching problem presented itself: it took many lines of code to do simple things. In one example, 16 lines of code were used to check the value of an attribute; the same task, with improved robustness and error handling, can be accomplished in three lines. Three factors inflated the line count: the low-level nature of the DOM API, incorrect application of methods and programming patterns, and incomplete knowledge of the full API. The following sections present specific instances of these issues.
In the code we examined, the most common task was to traverse or search the DOM. Here is a condensed version of the code required to find a node called "header" under the config section of the document:
my $document_root = $dom_document->getDocumentElement();
my $config_node = $document_root->getFirstChild();
foreach my $node ( $config_node->getChildNodes() ) {
    if ( $node->getNodeName() eq "header" ) {
        # do something
    }
}
The document is traversed from the root by getting the top element, getting
its first child (config_node), and finally by
individually examining config_node's children.
Unfortunately, this method is not only verbose but also fragile and
bug-prone.
As an example, the second line of the code gets the intermediate node
using the getFirstChild method. Already, a
multitude of potential problems exist. The first child of the root node may
not actually be the config_node the user is
searching for. By blindly following the first child, we have ignored the
actual name of the tag and will potentially be searching the incorrect part
of the document. A frequent error in this scenario occurs when the source
XML document contains whitespace or a carriage return after the root node;
the first child of the root node is actually a DOM::Text
node, not the intended node. To correctly navigate to our intended node, we
need to examine each of document_root's child nodes
until we find one that is not a Text node and that has the name we are looking for.
We are also ignoring the possibility that the document may have a different structure from
what we are expecting. If the document_root
doesn't have any child nodes, for example, config_node
will be set to undef, and the third line of the example will
raise an error. Therefore, to properly navigate the document, not only do we have to examine
each child node individually and check for the appropriate name, but at every step we
also have to check to make sure each method call returned a valid value. Writing
robust, error-free code that can handle arbitrary input requires both a great deal of
attention to detail and many lines of code.
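The checks described above can be written once and reused. The following is a minimal sketch that assumes the XML::DOM module; findChildElement is a hypothetical helper (not part of the DOM API), and the sample document is invented for illustration.

```perl
use strict;
use warnings;
use XML::DOM;    # assumption: the CPAN XML::DOM module; exports ELEMENT_NODE, TEXT_NODE

# Hypothetical helper: return the first child *element* of $parent whose
# tag name is $name, or undef if there is none (or if $parent is undef).
sub findChildElement {
    my ($parent, $name) = @_;
    return undef unless defined $parent && defined $name;
    for my $child ($parent->getChildNodes()) {
        next unless $child->getNodeType() == ELEMENT_NODE;   # skips whitespace Text nodes
        return $child if $child->getNodeName() eq $name;
    }
    return undef;
}

my $xml  = "<root>\n  <other/>\n  <config>\n    <header>h1</header>\n  </config>\n</root>";
my $doc  = XML::DOM::Parser->new->parse($xml);
my $root = $doc->getDocumentElement();

# The first child of the root is the whitespace after <root>, not <config>.
my $first_is_text = $root->getFirstChild()->getNodeType() == TEXT_NODE;

my $config = findChildElement($root, 'config');
my $header = findChildElement($config, 'header');   # safe even if $config is undef

print "first child of root is a Text node\n" if $first_is_text;
print "found header\n" if defined $header;
```

Because the helper returns undef both for a missing parent and for a missing child, calls can be chained without an explicit check after every step; the caller tests only the final result.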
Retrieving the text value within a tag
After DOM traversal, the second most common task was to retrieve the text value contained in a tag.
Consider the XML fragment <sometag>The Value</sometag>.
Having navigated our way to the sometag node, how do we
capture its text value (The Value)? An intuitive implementation may be:
$sometag->getData();
As you may have guessed, the above code will not
perform the desired action. We cannot call a getData
or a similar function on the sometag node because the actual
text is stored as one or more child nodes. A better approach would be:
$sometag->getFirstChild()->getData();
The problem here is that the value may not actually be contained in the first
child; processing instructions or other embedded nodes may be found within
sometag, or the text value may be contained in
several child nodes instead of in just one. Recall that whitespace is frequently
represented as a text node, so the call to
$sometag->getFirstChild() may get you only
the carriage return between the tag and its value. In fact, we need to traverse
all of the children, checking for nodes of type Text,
and collating their values until we have the complete value.
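A short experiment illustrates the difference between reading the first child and collating all of the Text children. This is a sketch that assumes the XML::DOM module; the embedded b element is invented to force the text into more than one child node.

```perl
use strict;
use warnings;
use XML::DOM;    # assumption: the CPAN XML::DOM module; exports TEXT_NODE

my $doc     = XML::DOM::Parser->new->parse('<sometag>The <b>real</b> Value</sometag>');
my $sometag = $doc->getDocumentElement();

# The first child holds only the text up to the embedded element.
my $first = $sometag->getFirstChild()->getData();

# Collate every direct Text child instead (text inside <b> is not included).
my $all = join '', map  { $_->getData() }
                   grep { $_->getNodeType() == TEXT_NODE }
                   $sometag->getChildNodes();

print "first child: '$first'\n";
print "all text:    '$all'\n";
```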
The DOM interface includes a method for finding child nodes with a given name. For example, the call:
my @results = $document_root->getElementsByTagName("name");
will return an array (or a NodeList) of tags called
name from within the document. This is certainly
more convenient than the traversal methods we discussed above. It is also the
cause of a common set of bugs.
The problem is that getElementsByTagName
recursively traverses the document, returning all matching nodes. Suppose you
have a document containing customer information, company information, and
product information. All three of these items can potentially have a
name tag within them. If you call
getElementsByTagName searching for customer names
and end up with product and company names as well, your program will likely
misbehave. Calling the function on a subtree of the document can reduce
the risks. However, XML's flexible nature makes it quite difficult to ensure
the subtree you are operating on has the structure you are expecting, and
doesn't have spurious child nodes with the name you are searching on.
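The customer, company, and product scenario can be reproduced in a few lines. This sketch assumes the XML::DOM module, and the order document is invented for illustration. It reads each name through its first child only because the sample is known to contain plain text.

```perl
use strict;
use warnings;
use XML::DOM;    # assumption: the CPAN XML::DOM module

my $xml = <<'XML';
<order>
  <customer><name>Alice</name></customer>
  <company><name>Acme</name></company>
  <product><name>Widget</name></product>
</order>
XML

my $doc  = XML::DOM::Parser->new->parse($xml);
my $root = $doc->getDocumentElement();

# Searching from the root picks up every <name>, whatever its parent.
my @all = map { $_->getFirstChild()->getData() }
          $root->getElementsByTagName('name');

# Narrowing to the <customer> subtree first reduces, but does not remove, the risk.
my ($customer)     = $root->getElementsByTagName('customer');
my @customer_names = map { $_->getFirstChild()->getData() }
                     $customer->getElementsByTagName('name');

print "all names: @all\n";
print "customer names: @customer_names\n";
```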
Given the limitations imposed by DOM's design constraints, how can you use the specification effectively and efficiently? We present a few basic principles and guidelines for DOM usage, and create a library of functions to make life easier.
Your experience using DOM will be significantly improved if you follow a few basic principles:
- Do not use DOM to traverse the document
- Whenever possible, use XPath to find nodes or traverse the document
- Use a library of higher-level functions to make DOM use easier
These principles are derived directly from examination of common problems. DOM traversal, as discussed above, is a leading cause of errors. It is also, however, one of the most commonly needed functionalities. How do we traverse the document without using the DOM?
XPath is a language for addressing, searching, and matching pieces of the document. It is a W3C Recommendation, which makes it an accepted standard, and it is implemented in most languages and XML packages. Chances are your DOM package supports XPath either directly or via an add-on.
XPath provides an excellent means by which to traverse and search the document.
It uses a path notation, similar to that used in file systems and URLs,
to specify and match pieces of the document. For example, the XPath:
/x/y/z searches the document for a root node
of x, under which resides the node y, under which resides the node z. This statement returns all
nodes that match the specified path structure.
More complex matchings are possible both in terms of the structure of the document,
and the values of the nodes and their attributes. The
statement /x/y/* returns all nodes under
any node y with the parent x. /x/y[@name='a']
matches all nodes y that have a parent x and an attribute called
name with the value a.
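The three expressions above can be tried against a small document. The sketch below assumes the XML::DOM module together with XML::XQL and XML::XQL::DOM, which add an xql method to DOM nodes. XQL is an XPath-like query language rather than a complete XPath implementation, so treat this as an approximation for simple paths like these; the sample document is invented for illustration.

```perl
use strict;
use warnings;
use XML::DOM;        # assumption: the CPAN XML::DOM module
use XML::XQL;        # assumption: the CPAN XML::XQL module
use XML::XQL::DOM;   # adds the xql() method to XML::DOM nodes

my $doc = XML::DOM::Parser->new->parse(
    '<x><y name="a"><z/></y><y name="b"><z/><z/></y></x>');

my @z     = $doc->xql('/x/y/z');              # every z under a y under the root x
my @under = $doc->xql('/x/y/*');              # every element directly under such a y
my @named = $doc->xql(q{/x/y[@name='a']});    # only the y whose name attribute is a

printf "z nodes: %d, nodes under y: %d, y named a: %d\n",
    scalar @z, scalar @under, scalar @named;
```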
A full examination of XPath and its usage is beyond the scope of this article. See Resources for links to some excellent tutorials. Take a little time to learn XPath, and you will be rewarded with much easier handling of XML documents.
One of the surprising aspects of our examination of the DOM projects was the amount of copy-and-paste code that was present. Pieces of code from one file would be copied and pasted into many others to implement similar pieces of functionality. Why would experienced developers who otherwise employ good programming practices engage in copy-and-paste methods instead of creating helper libraries? We believe this is because most programmers are not DOM experts, and they will happily grab the first piece of code that does what they need. They do not feel confident enough in their DOM skills to produce the canonical functions that make up the helper library.
It is quite easy to create and use helper libraries to implement common functionalities; it only requires a small amount of discipline. Below are some basic helper functions that will get you started.
The most commonly performed action when working with XML documents is
looking up the value of a given node. As discussed above, this can
present difficulties both in traversing the document to find the desired node and in retrieving the
value of the node. The traversal can be simplified using XPath, and the retrieval of the value can be
coded once and then reused. We have implemented the
getValue function with the help of two
lower-level functions: findNode, which finds and returns
the first node that matches the given XPath expression, and
getTextContents, which non-recursively returns the
concatenated values of the text nodes under the passed-in node,
as shown in Listing 2.
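use XML::DOM;        # also exports node-type constants such as TEXT_NODE
use XML::XQL;
use XML::XQL::DOM;   # adds the xql() method to XML::DOM nodes

# Concatenate the text found directly under $node (non-recursive).
# If $strip is true, trim leading and trailing whitespace.
sub getTextContents {
    my ($node, $strip) = @_;
    my $contents = '';
    if (! $node )
    {
        return;
    }
    for my $child ($node->getChildNodes()) {
        my $type = $child->getNodeType();
        if ( $type == TEXT_NODE || $type == CDATA_SECTION_NODE ) {
            $contents .= $child->getData();
        }
    }
    if ($strip) {
        $contents =~ s/^\s+//;
        $contents =~ s/\s+$//;
    }
    return $contents;
}

# Return the first node matching $xpath, searching from $node, or undef.
sub findNode {
    my ($node, $xpath) = @_;
    if (! defined($node) || ! defined($xpath) )
    {
        return undef;
    }
    my $match = ($node->xql($xpath))[0];
    if (! $match )
    {
        return undef;
    }
    return $match;
}

# Return the text value of the first node matching $xpath, or undef.
sub getValue {
    my ($node, $xpath) = @_;
    my $match = findNode( $node, $xpath );
    if (! defined($match) )
    {
        return undef;
    }
    return getTextContents( $match );
}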
getValue is called by passing in both a node from which to start the search, and an XPath statement that specifies the node we're searching for. The function finds the first node to match the given XPath and extracts its text value.
Another common action is to set the value of a node to a desired value, as shown in Listing 3.
sub setValue {
    my ($node, $xpath, $value) = @_;
    my $match = findNode( $node, $xpath );
    if (! defined($match) )
    {
        return undef;
    }
    foreach my $child ( $match->getChildNodes() )
    {
        $match->removeChild ($child);
    }
    $match->addText($value);
    return $match;
}
This function takes a starting node and an XPath statement -- just like getValue -- and a string
to set the value of the matching node to. It finds the desired node using findNode, removes all of
its children (thereby removing any text and other elements contained within it), and sets its text contents to the passed-in string.
While some programs look up and modify the values contained in XML documents, others modify the structure of the document itself by adding and removing nodes. This helper function simplifies the addition of a node to the document, as shown in Listing 4.
sub appendNode {
    my ($doc, $nodename, $xpath, $value) = @_;
    if (! defined($nodename) || ($nodename eq "") ) {
        return undef;
    }
    my $match = findNode( $doc, $xpath );
    if (! defined($match) )
    {
        return undef;
    }
    my $newnode;
    eval {
        $newnode = $doc->createElement( $nodename );
    };
    if ($@ || (! defined($newnode) )) {
        return undef;
    }
    $match->appendChild( $newnode );
    if ( defined($value) ) {
        $newnode->addText($value);
    }
    return $newnode;
}
The parameters to this function are the DOM document, the name of the node to add, the XPath statement specifying the node to add it under (that is, what the parent node of the new node is), and, optionally, the text value of the node. The new node is appended to the specified parent node, and its value is set to the passed-in string.
Copying a section of a document into another location or document, while not a very common operation, was the cause of much confusion and gave rise to various inventive copy procedures. As Listing 5 illustrates, it is, in fact, fairly simple to implement.
sub copySubTree
{
    my ($sourcenode, $destnode) = @_;
    my $copy_node = $sourcenode->cloneNode(1);
    if ( $sourcenode->getOwnerDocument() ne $destnode->getOwnerDocument() )
    {
        $copy_node->setOwnerDocument( $destnode->getOwnerDocument() );
    }
    $destnode->appendChild($copy_node);
    return $copy_node;
}
This function takes the source node and copies it over as a child under the destination node. The destination node may be in another document, in which case the subtree is copied between documents.
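As a usage sketch, the following copies an element from one document into another. It assumes the XML::DOM module; the catalog and order documents are invented for illustration, and the function from Listing 5 is repeated only so that the snippet runs on its own.

```perl
use strict;
use warnings;
use XML::DOM;    # assumption: the CPAN XML::DOM module

# Listing 5, repeated so that this snippet is self-contained.
sub copySubTree {
    my ($sourcenode, $destnode) = @_;
    my $copy_node = $sourcenode->cloneNode(1);    # 1 = deep copy
    if ( $sourcenode->getOwnerDocument() ne $destnode->getOwnerDocument() ) {
        $copy_node->setOwnerDocument( $destnode->getOwnerDocument() );
    }
    $destnode->appendChild($copy_node);
    return $copy_node;
}

my $src = XML::DOM::Parser->new->parse(
    '<catalog><item id="1"><name>Widget</name></item></catalog>');
my $dst = XML::DOM::Parser->new->parse('<order/>');

# Copy the <item> subtree from the catalog document into the order document.
my ($item) = $src->getElementsByTagName('item');
my $copy   = copySubTree( $item, $dst->getDocumentElement() );

print $dst->getDocumentElement()->toString(), "\n";
```

The source document is left untouched: the item element is cloned, not moved, so both documents contain it afterward.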
The DOM has been maligned as a difficult and nonintuitive way of manipulating XML documents. In fact, it forms a very effective base on which easy-to-use systems can be built by following a few simple principles. DOM has already been implemented and optimized on most platforms, and is a very good choice for applications that need to search and manipulate XML documents in complex processes.
- For additional details on working with Perl and DOM, see the author's article Dare to script tree-based XML with Perl.
- For background, consult the DOM specification maintained by the W3C.
- Get up to speed on XPath with the Zvon XPath Tutorial.
- Review the XPath specification, maintained by the W3C.
- CPAN is always an excellent source for finding Perl XML modules.
- To learn more about manipulation of XML documents with Perl and other scripting languages, see XML and scripting languages on developerWorks.
- Get the answers to frequently asked questions at the Perl XML FAQ.
- Subscribe to the perl-xml mailing list by sending e-mail to Lyris@ActiveState.com with the message: SUBSCRIBE Perl-XML.
- If you want to know how IBM's WebSphere Application Server (WAS) supports XML development, see this technical background info on XML in the WAS Advanced Edition 3.5 online help.

Parand Tony Darugar is the head of architecture for Yahoo! Search Marketing Services (formerly Overture). His interests include Web services and Service Oriented Architectures (SOA), XML, high-performance business systems, distributed architectures, and artificial intelligence. You can reach him at tdarugar@yahoo.com.