You have to store your XML somewhere
XML is a great data format, as evidenced by, among other things, an entire zone of IBM developerWorks that is dedicated to the subject. And more often than not in 2007, talking about XML means talking about Web services, or converting between XML and Java™ objects, or perhaps reading an XML configuration file, or even using an XML-formatted database instead of a relational or an object-oriented one.
One thing that you won't hear talked about much these days is how XML gets from whatever in-memory representation you use—DOM, JDOM, or what have you—into static files, filled with angle brackets and quotation marks. Frankly, taking XML and writing it to a file just isn't very exciting—but it is necessary. Imagine a programming world where you could never actually persist XML to a file! You could build up an XML document in memory, and even send it to another component in your application (or to another application altogether); but you could never store that XML. You could use XML to store configuration data, and write all sorts of utilities to read that data, but never actually store the configuration data itself. You could even read the contents of a SOAP envelope—but never store those contents on disk if your application had to go offline.
Obviously, actually writing out XML is pretty important. In fact, if you only kept data floating around in memory and never had to worry about how that data was stored, you could easily conceive of a programming world where XML is never really needed, and certainly wouldn't be the important piece of programming culture that it is today.
So the question is simple: how are you persisting your XML to a file? Now, I assume that I'm dealing with programmers who actually handle this task themselves. In other words, if you never deal with persisting XML in your programming tasks, this discussion might be more informational than anything else (although it's a good place to start as you learn how to perform that task). For those of you who do have to worry about persistence, I've come up with three fairly common, mainstream approaches:
- Using the DOM and JDOM APIs and the like directly to write to a file from your XML data structure
- Using the Transformation API for XML (TrAX) and the identity transformation to persist your XML
- Using a higher-level API like JAXB to handle persistence
Using APIs directly
If you read the XML with an API (or APIs), one quite obvious way to write XML to a file is to use that same API to write the XML. For example, if you work with XML using the JDOM API, and have a JDOM Document object, you can simply do something like this:
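// JDOM 1.x classes: org.jdom.output.XMLOutputter and org.jdom.output.Format
XMLOutputter outputter = new XMLOutputter();
outputter.setFormat(Format.getPrettyFormat());
// Writing to an OutputStream (not a FileWriter) lets JDOM encode the bytes to match
// the encoding in the XML declaration; closing the stream flushes the file to disk
try (OutputStream out = new FileOutputStream("outputFile.xml")) {
    outputter.output(myDocument, out);
}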
You can do something similar in DOM Level 3 using the new Load and Save API:
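// Xerces-specific class: its DOM implementation also implements DOMImplementationLS
DOMImplementationLS impl = (DOMImplementationLS)
    org.apache.xerces.dom.CoreDOMImplementationImpl.getDOMImplementation();
LSSerializer writer = impl.createLSSerializer();
writer.setNewLine("\r\n");
LSOutput output = impl.createLSOutput();
output.setEncoding("UTF-8");
output.setByteStream(new FileOutputStream(new File("outputFile.xml")));
writer.write(myDocument, output);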
There are a variety of ways to use the new DOM API, some less vendor-dependent than others. The example above names a Xerces-specific class in the code, but other approaches aren't tied as tightly to a specific vendor's classes. Those approaches aren't as clear from a teaching perspective, though, so I kept the vendor-specific code.
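For comparison, here is a minimal sketch of one less vendor-dependent route, assuming a JDK whose built-in DOM supports DOM Level 3 Load and Save (current JDKs do). It asks the document's own DOMImplementation for the standard "LS" feature instead of naming a Xerces class; the class name LsPersist and the sample greeting element are illustrative only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class LsPersist {

    // Serializes doc to file using only the standard DOM Level 3 LS interfaces.
    public static void save(Document doc, File file) throws IOException {
        // Ask the document's own implementation for the "LS" feature,
        // so no vendor class name appears in the code.
        DOMImplementationLS ls = (DOMImplementationLS)
                doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.setNewLine("\n");

        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        try (OutputStream out = new FileOutputStream(file)) {
            output.setByteStream(out);
            // write() reports failure through its boolean return value
            if (!serializer.write(doc, output)) {
                throw new IOException("DOM serialization failed");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("hello");
        doc.appendChild(root);
        save(doc, new File("outputFile.xml"));
    }
}
```

The trade-off is the one described above: the cast and the getFeature() lookup are a little more ceremony than instantiating a known class, but the same code should work with any DOM that implements Load and Save.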
The advantage here is that you interact pretty directly with your API, and thus have a lot of control. You can set newlines, you can deal with indentation, you can control almost every aspect of the output file. Additionally, you have as little as possible between you and the file; there's no wrapper API and no layers of indirection, and you keep your (programming) hands close to the XML. Assuming you know JDOM or DOM well, these are extremely useful ways to output XML.
Some of the same things that are so good about this approach double as negatives. Just as you have control over every detail of output, you have the ability to really mess things up by misconfiguring that output. Bad line feeds, incorrect encodings, and I/O errors are all common problems that can result from this approach. Additionally, you're working at a very low level, without lots of helper utilities. (JDOM provides a few, such as the Format.getPrettyFormat() and Format.getCompactFormat() methods; DOM provides almost nothing.) That means you must understand encodings, output formats, indentation, and anything else that would affect your output.
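About the only formatting helper DOM Level 3 Load and Save offers is the standard "format-pretty-print" parameter on the serializer's DOMConfiguration. The sketch below assumes the JDK's built-in DOM and checks canSetParameter() first, because the specification lets implementations leave that parameter unsupported; PrettyDom and the root/child elements are illustrative names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class PrettyDom {

    // Returns doc as a string, indented if the serializer supports pretty-printing.
    public static String toPrettyString(Document doc) {
        DOMImplementationLS ls = (DOMImplementationLS)
                doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        // "format-pretty-print" is a standard LS parameter, but support for it is optional
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        Element child = doc.createElement("child");
        child.setTextContent("text");
        root.appendChild(child);
        doc.appendChild(root);
        System.out.println(toPrettyString(doc));
    }
}
```

This also shows the encoding trap mentioned above: writeToString() produces a Java String, so the declaration it emits says UTF-16. If you then write that string to a file with some other encoding, the declaration no longer matches the bytes on disk.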
Using TrAX and the identity transformation
Another popular option is to use TrAX and the identity transformation. TrAX is the Transformation API for XML, and is now a part of JAXP, which in turn is included with every release of the Java platform since 1.4 (except the Micro Edition). TrAX allows you to use XSL stylesheets to transform XML. And because XML is most commonly manipulated with SAX and DOM, TrAX can take SAX events and DOM Documents as input, and easily work with files as output. Additionally, TrAX can easily convert between these formats. For example, you can take an XML document represented in the DOM, transform it, and send the output to a file. Or you can read in a file, transform it, and put the resulting document into a DOM Document.
As a side effect of all this, you can use a stylesheet that doesn't do anything to a document, and simply take one format as input and output that document to any other format. Using a non-transforming stylesheet (essentially a stylesheet that does nothing but echo out what it receives as input) is called an identity transformation. So you can take a document from a file, apply an identity transformation, and end up with that same XML in a DOM Document. If you go the other way, from DOM to a file, then you actually persist your XML. That looks a bit like this:
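// javax.xml.transform.*, javax.xml.transform.dom.DOMSource, javax.xml.transform.stream.StreamResult
Source domSource = new DOMSource(myDOMDocument);
Result fileResult = new StreamResult(new File("outputFile.xml"));
TransformerFactory factory = TransformerFactory.newInstance();
// newTransformer() with no stylesheet argument returns an identity transformer
Transformer transformer = factory.newTransformer();
transformer.transform(domSource, fileResult);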
The XML in the DOM document here ends up in a file called outputFile.xml.
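The reverse direction described earlier, from a file into a DOM Document, uses the same identity transformer with the source and result types swapped. This is a minimal sketch using only JAXP classes from the JDK; the class name IdentityLoad is illustrative. When a DOMResult is created without a node, the transformer builds a new Document and stores it in the result.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class IdentityLoad {

    // Reads an XML file into a DOM Document with TrAX's identity transformation.
    public static Document load(File file) throws TransformerException {
        // newTransformer() with no stylesheet returns an identity transformer
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // No node supplied, so the transformer creates a new Document to hold the output
        DOMResult result = new DOMResult();
        identity.transform(new StreamSource(file), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = load(new File("outputFile.xml"));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```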
Advantages of TrAX
The biggest benefit of TrAX is that it's so simple to use. It's also available to anyone using the Java platform, and doesn't require much knowledge of SAX or DOM. That makes this option extremely attractive to developers with only rudimentary XML programming skills. Junior programmers who aren't familiar with SAX or DOM can use TrAX, learning all of 10 or 20 lines of code, and quickly persist XML to files, or even to DOM Documents or SAX events.
The downside of TrAX
The biggest negative of using TrAX is that, while it's easy to perform an identity transformation, it's pretty tricky to handle the details of output. Line feeds, encodings, spacing, and indentation are all configurable in TrAX, but they aren't nearly as easy to work with as they are when you use DOM or JDOM directly. As in most cases, the ease of use that TrAX offers for common tasks comes paired with less flexibility, at least right out of the box.
Note that it's possible to get TrAX and the identity transformation to do nearly everything JDOM or DOM can do in terms of output; it's just not as simple or intuitive. You have to learn a bit about XSLT and a bit about the TrAX API, neither of which is closely related to the actual output task you're performing.
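Here is a sketch of what that configuration looks like, assuming the JDK's default TransformerFactory. Output details are set with Transformer.setOutputProperty() and the constants in javax.xml.transform.OutputKeys, each of which mirrors an attribute of XSLT's xsl:output element; that is exactly the bit of XSLT knowledge mentioned above. The class name FormattedIdentity is illustrative.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FormattedIdentity {

    // Serializes doc through an identity transformer with explicit output properties.
    public static String serialize(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Each key corresponds to an attribute of <xsl:output>
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        Element child = doc.createElement("child");
        child.setTextContent("text");
        root.appendChild(child);
        doc.appendChild(root);
        System.out.println(serialize(doc));
    }
}
```

Note what OutputKeys does not cover: INDENT only says whether to indent, not by how much. Processors expose the indentation amount through their own extension properties, which is part of why fine-grained formatting is harder here than with JDOM's Format class.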
Data binding for persistence
Another way to move XML into a static form—particularly if you want that form to be a file on disk somewhere—is to use a data binding API like JAXB. While data binding isn't usually considered a persistence method, it effectively is just that: a way to take an XML document represented in memory and write it to a file.
I don't have room in this tip to go into much detail about what data binding is (and you can find several articles on the subject on developerWorks already); here's an abbreviated version of some code that uses data binding, of the JAXB flavor, to achieve persistence:
Marshaller marshaller = myJaxbContext.createMarshaller();
// Close the stream when marshalling finishes so the file handle isn't leaked
try (FileOutputStream stream = new FileOutputStream("outputFile.xml")) {
    marshaller.marshal(myJavaObject, stream);
}
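You can set several options, such as the encoding of the output file (Marshaller.JAXB_ENCODING) or whether the output is indented (Marshaller.JAXB_FORMATTED_OUTPUT), all through the Marshaller object's setProperty() method. In fact, JAXB is probably just as flexible as either of the two previous methods in terms of setting up output properties.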
The pros of JAXB
One of the biggest advantages of JAXB is that it's quite easy to use, particularly when you use it for rather simple tasks. And, while SAX and DOM are still considered somewhat hardcore, at least in normal Java programming circles, JAXB is standard fare for almost everyone using the Java language. That means that you'll find more current articles and tutorials on JAXB (which a survey of articles published in 2007 on sites like this will bear out). Additionally, support for JAXB is a little bit better than that for DOM or SAX. Even though SAX and DOM are part of the standard edition of the Java platform, JAXB is very much an invention of Sun Microsystems, Inc., so it seems to be a bit better supported.
Additionally, you don't need much XML knowledge at all to use JAXB. You can work with normal Java objects (not XML-specific ones like DOM's Text interface) and take those objects straight to XML. That means less to learn to get started, and almost everyone likes less to learn, especially when the boss is yelling at you from the corner office.
The cons of JAXB
The bad news when it comes to JAXB is that you don't have to know much about XML to use it. While that might have seemed like the pro I just referred to, it doubles as a potential con. The less you know about XML, the harder it is to intelligently use JAXB. You can easily end up with an XML file that isn't formatted in a useful fashion, or that has only some of the objects you meant to persist in it, or that is different from the objects you marshaled in unexpected ways.
All of this often leads developers to either put JAXB down, or learn a lot more about XML, SAX, and DOM. At that point, many developers then move on to using SAX and DOM for persistence, and keep JAXB simply for its simplest function: converting between XML and Java objects.
And one other option...
I intentionally left one final option out of this discussion: writing XML as a series of bits, bytes, and strings directly to a FileWriter. This is certainly a viable way to write XML to a file, and it happens quite a bit; however, in this case, you don't persist existing XML data so much as create XML from data you have that's not already in that format. You'll recognize this kind of code because it usually looks something like this:
// setupXMLBuffer() and bufferedWriter are application-specific and not shown here
String xmlString = setupXMLBuffer(
    new StringBuffer("<firstName>")
        .append(customer.firstName)
        .append("</firstName>")
        .append("<lastName>")
        .append(customer.lastName)
        .append("</lastName>")
        // etc...
        .toString()
);
bufferedWriter.write(xmlString);
// other file I/O code
Nothing is wrong with this code at all; it's just that instead of persisting XML, you really persist data and stuff it in XML, all in one step. So issues about how to persist the data, and which approach works best, are irrelevant. The act of writing the data and putting it into XML can't be separated, so much of this discussion just doesn't apply.
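One detail this style leaves entirely to you is escaping: if customer.firstName happens to contain & or <, the resulting file is no longer well-formed XML. A minimal sketch of guarding against that, with a hypothetical HandBuiltXml class and customerXml() helper standing in for the customer code above; it replaces each character that has a predefined XML entity, but does not handle characters that XML 1.0 forbids outright, such as most control characters.

```java
public class HandBuiltXml {

    // Replaces each character that has a predefined XML entity with that entity.
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: builds customer XML by hand, escaping every data value.
    public static String customerXml(String firstName, String lastName) {
        return new StringBuilder("<customer>")
                .append("<firstName>").append(escape(firstName)).append("</firstName>")
                .append("<lastName>").append(escape(lastName)).append("</lastName>")
                .append("</customer>")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(customerXml("Tom & Jerry", "<Smith>"));
    }
}
```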
How should you handle your XML persistence? No one answer is right. That said, it's worth discussing the options that you and other Java and XML developers use. Are you moving towards a fairly consistent approach to what is a universal problem? Does one form of persistence result in XML documents on disk that are easier to read, use, consume, or send to another application?
As in most of these tips, my point is to get people actually talking about what works for them, and what doesn't. If you figure out what works for other people, you should be able to get better at your own programming—at least, that's the idea! So take a minute to hop on the Java and XML forums at developerWorks (see the links in Resources), and let us know what you use for persistence. If it's based on some particular functionality, let us know that, too. I hope to see you online soon.
Resources
- Sun's online Java and XML Headquarters: When it comes to JAXP, there's no better place to start than here.
- The core API documentation for Java 5.0 technology: See how the JAXP JavaDoc is now integrated into the API.
- SAX Web site: Find out more about the APIs under the covers of JAXP. Start with SAX 2 for the Java environment.
- The W3C Web site: For a view of XML other than the one SAX supports, take a look at DOM.
- Apache Xerces parser: Read about this parser that Sun uses in their JDK 5.0 implementation.
- Introduction to XML tutorial (Doug Tidwell, developerWorks, August 2002): Need a more basic introduction to XML? Try this tutorial and other educational offerings, which cover the most fundamental topics.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- Participate in the discussion forum.
- developerWorks blogs: Check out these blogs and get involved in the developerWorks community.
- developerWorks XML zone: Share your thoughts. After you read this article, post your comments and thoughts in this forum. The XML zone editors moderate the forum and welcome your input.