One of the most misunderstood and abused aspects of using the Document Object Model, or DOM, is getting an initial DOM implementation to program against. This is typically referred to as bootstrapping, and I often see it done incorrectly. Since you can't do anything with DOM until you have an implementation handy, getting this step wrong can cause grief in all of your programs. If this hasn't occurred to you, it turns out to be a classic example of which came first, the chicken or the egg. You can't do anything with DOM until you have a starting point; however, that starting point is itself a DOM class. So, you need a DOM class to start working with DOM, but to get a DOM class you need to work with DOM... confusing, isn't it? I'll help straighten all this out in the next few tips.
Note: I realize that if you are using JAXP, Sun's Java API for XML Processing, it's possible to get a DOM implementation without working through the steps outlined in this tip. However, you may not always have a JAXP implementation lying around, and as a good DOM programmer, you should know how to work without JAXP anyway! So even with JAXP available, this tip should be useful to you.
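For readers who do have JAXP on hand, here is a minimal sketch of the shortcut the note alludes to. The class and method names (JaxpNewDocument, newEmptyDocument) are made up for illustration; only the javax.xml.parsers and org.w3c.dom calls are standard API.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of the original tip.
class JaxpNewDocument {

    // Asks JAXP for an empty Document; no vendor class is named here.
    static Document newEmptyDocument() {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable JAXP parser configuration", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newEmptyDocument();
        // A freshly created Document has no root element until you add one.
        Element root = doc.createElement("rootElementName");
        doc.appendChild(root);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The rest of this tip assumes you don't have this shortcut available, or want to understand what it is hiding.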
In both DOM Level 1 and Level 2, the process you use to get a DOM implementation to work with is a bit of a challenge. In the next tip, I'll deal with ways to correct the problems that process causes. First, you should understand why you need a DOM implementation, and how not to do bootstrapping -- both of which I cover in this tip.
If you are reading in an XML document, such as from an existing file or input stream, this entire section is not applicable. In these cases, the reader.getDocument() method returns a DOM Document object, and you can then operate on that DOM tree without any problem. However, DOM is also useful because it allows you to create a new XML structure using a DOM tree, and then serialize that structure out to a file or other output sink. In these cases, vendor-specificity becomes an issue.
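To make the "reading" case concrete, here is a small sketch of getting a Document from existing XML. It uses the JAXP DocumentBuilder.parse() call as a stand-in for the parser-specific reader.getDocument() step, which is an assumption on my part; the class name, method name, and sample XML are also made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original tip.
class ParseExistingDocument {

    // Parsing hands back a ready-made Document, so no bootstrapping is needed.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order><item>widget</item></order>");
        // The tree already exists; you only navigate or modify it.
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The parser builds the tree for you here. The bootstrapping problem only shows up when there is no input to parse and you have to create the first Document yourself.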
The end goal of bootstrapping is to get a vendor's implementation of the org.w3c.dom.Document interface. Most developers are inclined to write this line of code to get that implementation instance:
Document doc = new org.apache.xerces.dom.DocumentImpl();
If you use this code, you may have several problems:
- Your code is now tied to Apache Xerces, and can't easily be made to work with another parser.
- This code often won't even function with a different version of the parser you are using.
- This is not the proper way to get access to a DOM-creating mechanism in the first place!
In addition to this being vendor-specific, you will have to perform additional vendor-specific steps to get an implementation of the DocumentType interface. A better approach is to use the DOMImplementation interface, which acts as a factory for both of these interfaces. Instead of directly getting a DOM Document implementation, write your code like this:
DOMImplementation domImpl = new org.apache.xerces.dom.DOMImplementationImpl();
DocumentType docType = domImpl.createDocumentType("rootElementName", "public ID", "system ID");
Document doc = domImpl.createDocument("", "rootElementName", docType);
Now you've got access to a DOMImplementation object. With that, you can generate both types of DOM structures that are used to build trees, the DocumentType and Document interfaces. This removes all of the extra vendor-specific steps I talked about before; of course, you still have the reference to Xerces' specific DOMImplementation implementation class, so all is not well quite yet. As mentioned earlier, that reference ties you to one parser: you would have to edit and recompile your code just to change out a parser, and that's not a workable solution.
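One way to see how little of this code actually needs to know about the vendor is to isolate the single line that obtains the DOMImplementation. The sketch below is not the fix this series builds in the next tip. It is only an illustration, and it swaps in JAXP's DocumentBuilder.getDOMImplementation() for the Xerces constructor so that it runs on a plain JDK. The class and method names are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical helper class, not part of the original tip.
class DomFactorySketch {

    // Vendor-neutral: depends only on org.w3c.dom interfaces.
    static Document createDocument(DOMImplementation domImpl, String rootName,
                                   String publicId, String systemId) {
        DocumentType docType =
            domImpl.createDocumentType(rootName, publicId, systemId);
        // A null namespace URI puts the root element in no namespace.
        return domImpl.createDocument(null, rootName, docType);
    }

    // The only method that knows where the implementation comes from.
    // Assumption: JAXP is used here in place of the Xerces-specific class.
    static DOMImplementation obtainImplementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable JAXP parser configuration", e);
        }
    }

    public static void main(String[] args) {
        Document doc = createDocument(obtainImplementation(),
                "rootElementName", "public ID", "system ID");
        System.out.println(doc.getDoctype().getName());
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Everything in createDocument() compiles against the standard interfaces alone. Whatever mechanism you use to obtain the DOMImplementation, keeping it in one method means a parser change touches one place rather than every class that builds a tree.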
Fixing this problem requires a lot more work, and I'm saving this work for the next tip in this series. For now, make sure that you understand the basic concepts of bootstrapping, and why this is an issue in the first place. In the next tip, I'll show you how to get around the requirement to specifically reference a DOM implementation in your code, and you'll get into some programming of your own to handle this. Until then, I'll see you online!
- Read the DOM API on W3C.org.
- Learn about the structure of a DOM document, and how to use Java technology to create a document from an XML file, make changes to it, and retrieve output, in Nicholas Chase's tutorial "Understanding DOM" (developerWorks, August 2001).
- Part 2 of this series of tips explains a better way to bootstrap in your DOM applications (developerWorks, December 2002). Part 3 explains the changes to DOM Level 3 that relate to bootstrapping and improve upon DOM Levels 1 and 2 (developerWorks, December 2002).
- Find more XML resources on the developerWorks XML zone.
Brett McLaughlin has been working in computers since the Logo days (remember the little triangle?). He currently specializes in building application infrastructure using the Java language and Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor to the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.