If you are sold on the W3C's DOM (the Document Object Model) and think SAX is silly, you will still have to find a way to move from DOM to the other formats that application developers use. These other formats are, of course, SAX and JDOM. What do you do when you have to accept DOM as input and convert it to something else? This certainly is a valid question. Because DOM provides a complete document representation, converting it into another format makes a lot of sense. In this tip, you'll learn how to perform this conversion from DOM to either SAX or JDOM.
Unfortunately, neither DOM Level 1 nor the newer Level 2 provides a means of outputting a DOM tree to SAX or any other format. As a result, each parser implementation provides its own set of custom APIs for output, and implementation independence is lost. In other words, your code works only with the parser you wrote it for (Crimson, Xerces, Oracle, and so on). DOM Level 3 is supposed to provide this functionality, so we'll all have to wait and see what it offers in the way of output methodology. In the meantime, check your vendor's documentation on writing, or serializing, a DOM tree. As an example, with Apache Xerces you would use the org.apache.xml.serialize.XMLSerializer class, as shown in Listing 1. Whichever parser you use, you will probably have to output the DOM tree to a stream, then push that stream back into SAX for sequential processing. Note that Listing 1 shows only the first half of that round trip, outputting a DOM tree to a stream; the stream then becomes the input to a SAX processor.
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.XMLSerializer;
import org.xml.sax.InputSource;
import org.w3c.dom.Document;

public class PrintDOMTree {

    public static void main(String[] args) {
        try {
            // Parse the document named on the command line into a DOM tree
            InputSource source = new InputSource(args[0]);
            DOMParser parser = new DOMParser();
            parser.parse(source);
            Document doc = parser.getDocument();

            // Serialize the DOM tree with the Xerces-specific serializer
            XMLSerializer serializer = new XMLSerializer();
            // Insert your PipedOutputStream here instead of System.out!
            serializer.setOutputByteStream(System.out);
            serializer.serialize(doc);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
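To complete the round trip that Listing 1 starts, here is a minimal sketch that writes a DOM tree to an in-memory stream and then hands that stream to a SAX parser. It makes a few assumptions that are not part of the original listing: it uses the JAXP identity Transformer rather than the Xerces XMLSerializer to do the writing, so that it runs on a stock JDK, and the class name DOMToSAXViaStream and its replay and elementNames methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DOMToSAXViaStream {

    /** Writes the DOM tree to bytes, then replays those bytes through a SAX parser. */
    public static void replay(Document doc, DefaultHandler handler) throws Exception {
        // Step 1: output the DOM tree to a stream.
        // Assumption: a JAXP identity transform stands in for a vendor serializer.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(buffer));

        // Step 2: push the stream back into SAX for sequential processing.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser()
            .parse(new ByteArrayInputStream(buffer.toByteArray()), handler);
    }

    /** Hypothetical helper: collects element names in the order SAX reports them. */
    public static List<String> elementNames(Document doc) throws Exception {
        final List<String> names = new ArrayList<>();
        replay(doc, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Build a small DOM tree to stand in for the DOM input you were handed.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<order><item/><item/></order>")));
        System.out.println(elementNames(doc));
    }
}
```

A ByteArrayOutputStream keeps the sketch single-threaded, at the cost of holding the serialized document in memory alongside the DOM tree. The PipedOutputStream mentioned in Listing 1 would avoid that second copy, but the writer and the SAX parser would then have to run in separate threads.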
Moving from DOM to JDOM is quite a bit easier than moving from DOM to SAX. This actually makes sense: by the time you have a DOM tree, you've probably already had a chance to get at the data through SAX. In fact, situations rarely arise where a DOM tree is best handled by SAX, because you've already spent the memory needed to hold the whole document as a DOM representation. A far more common task is to convert an XML document that comes in as a DOM tree to a JDOM tree. Because both formats are document representations, but differ substantially in behavior and functionality, you may want to let someone else take your DOM tree and deal with it as JDOM. While you might argue that the conversion should be their job, you do need to know (at least!) how to convert from your structure to theirs.
For converting from DOM to JDOM, the JDOM API provides a
consumer for DOM Nodes, which is called org.jdom.input.DOMBuilder.
This class will take in a DOM Document (as well as some
other DOM structures, such as Element and Attr)
and convert the DOM tree to a JDOM Document. There
really isn't much to this operation, so I'll simply show you the
code in Listing 2 and let you see the process in action.
// Java imports
import java.io.IOException;

// JDOM imports
import org.jdom.JDOMException;
import org.jdom.input.DOMBuilder;
import org.jdom.output.XMLOutputter;

// SAX and DOM
import org.xml.sax.InputSource;

// Xerces
import org.apache.xerces.parsers.DOMParser;

public class DOMtoJDOM {

    // DOM tree of input document
    org.w3c.dom.Document domDoc;

    public DOMtoJDOM(String systemID) throws Exception {
        DOMParser parser = new DOMParser();
        parser.parse(new InputSource(systemID));
        domDoc = parser.getDocument();
    }

    public org.jdom.Document convert()
        throws JDOMException, IOException {

        // Create new DOMBuilder, using default parser
        DOMBuilder builder = new DOMBuilder();
        org.jdom.Document jdomDoc = builder.build(domDoc);
        return jdomDoc;
    }

    public static void main(String[] args) {
        try {
            DOMtoJDOM tester = new DOMtoJDOM(args[0]);
            org.jdom.Document jdomDoc = tester.convert();

            // Output the document to System.out
            XMLOutputter outputter = new XMLOutputter();
            outputter.output(jdomDoc, System.out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
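If you're curious what a DOM consumer such as DOMBuilder has to do with the tree you hand it, the following sketch may help. It walks a DOM element recursively and copies element names, attributes, and text into a separate structure. SimpleElement and DOMWalkSketch are hypothetical names invented for this illustration; they are not JDOM classes, and this is not JDOM's actual implementation, only a rough picture of the kind of traversal involved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DOMWalkSketch {

    /** Hypothetical stand-in for a node in the target tree (not a JDOM class). */
    public static class SimpleElement {
        public final String name;
        public final Map<String, String> attributes = new LinkedHashMap<>();
        public final List<SimpleElement> children = new ArrayList<>();
        public final StringBuilder text = new StringBuilder();

        SimpleElement(String name) {
            this.name = name;
        }
    }

    /** Recursively copies a DOM element and its descendants into SimpleElements. */
    public static SimpleElement copy(Element domElement) {
        SimpleElement target = new SimpleElement(domElement.getTagName());

        // Copy each attribute as a name/value pair.
        NamedNodeMap atts = domElement.getAttributes();
        for (int i = 0; i < atts.getLength(); i++) {
            Node att = atts.item(i);
            target.attributes.put(att.getNodeName(), att.getNodeValue());
        }

        // Copy child elements and direct text; other node types are skipped.
        for (Node child = domElement.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                target.children.add(copy((Element) child));
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                target.text.append(child.getNodeValue());
            }
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(
                "<book id=\"42\"><title>Java and XML</title></book>")));
        SimpleElement root = copy(doc.getDocumentElement());
        SimpleElement title = root.children.get(0);
        System.out.println(root.name + " " + root.attributes
            + " " + title.name + "=" + title.text);
    }
}
```

The sketch ignores comments, processing instructions, namespaces, and the position of text relative to child elements, all of which a real converter has to preserve.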
There's nothing more to say. Once you know how to move from DOM to SAX and JDOM, you're all set for tackling any output format you need and interacting with pretty much any type of XML representation you'll come up against. Watch the DOM Level 3 specification for changes to outputting DOM trees in a standard, vendor-independent way, and until then, enjoy using the DOM!
Brett McLaughlin (brett@newInstance.com) works as Enhydra strategist at Lutris Technologies and specializes in distributed systems architecture. He is author of Java and XML (O'Reilly). He is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. Along with Jason Hunter, he founded the JDOM project, which provides a simple API for manipulating XML from Java applications. He is also an active developer on the Apache Cocoon project and the EJBoss EJB server as well as a co-founder of the Apache Turbine project.
