Data binding APIs are most useful because they allow programmatic manipulation of XML. It's a lot easier to type someElement.addAttribute("name", "value"); than it is to parse a file, buffer the output, add the characters that make up an attribute declaration, flush the output stream, and close it. However, all the manipulation in the world is of little use if you can't properly write your changes back out to a file. This article focuses on that process, known as marshalling in the data binding world, and in particular on JAXB's marshalling capabilities. Specifically, you'll learn how JAXB scores in the round-tripping arena.
The first installment of this column covered some important terms. Marshalling and unmarshalling are intrinsic to the data binding world, but you also learned some newer terms, such as round tripping and semantic equivalence. Round tripping is the process of converting from XML to Java code and then back again. The quality of a data binding API's round-tripping capabilities is measured by how closely the input and output documents match. Semantic equivalence makes that comparison possible: it allows insignificant aspects of XML, like ignorable whitespace, to be discarded so that a meaningful comparison can be made.
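To make the idea of semantic equivalence concrete, here is a minimal sketch of such a check built on the JDK's own DOM parser rather than on any data binding API. It assumes that CDATA sections, comments, and whitespace-only text between elements are all insignificant; attribute order is ignored automatically, because DOM's isEqualNode does not depend on it. The SemanticCompare class is hypothetical, and a real tool would need a more careful (for example, schema-driven) definition of ignorable whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: a rough semantic-equivalence check for two XML strings
class SemanticCompare {

    static boolean equivalent(String xmlA, String xmlB) {
        // Compare only the element trees; the XML declaration is not part of them
        return parseRoot(xmlA).isEqualNode(parseRoot(xmlB));
    }

    private static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true);        // fold CDATA sections into ordinary text
            factory.setIgnoringComments(true);  // assumption: comments are insignificant
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            root.normalize();                   // merge any adjacent text nodes
            stripWhitespaceText(root);
            return root;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Assumption: whitespace-only text between elements is formatting, not content
    private static void stripWhitespaceText(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backward so removals don't shift the nodes still to be visited
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
        }
    }
}
```

Under these assumptions, a document that uses CDATA and a reformatted copy that uses entity references and carries an XML declaration compare as equivalent, while a change to an attribute value or to element content does not.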
In the second installment, I introduced a simple XML document, shown again here in Listing 1.
Listing 1. Basic XML listing of guitars (guitars.xml)
<guitars>
<guitar id="10021">
<builder luthier="true">Ryan</builder>
<model>Mission Grand Concert</model>
<back-sides>Brazilian Rosewood</back-sides>
<top>Adirondack Spruce</top>
<notes>
<![CDATA[
Just unbelievable... this guitar has all the tone &
resonance you could ever want. I mean, <<WOW!!!>> This
is a lifetime guitar.
]]>
</notes>
</guitar>
<guitar id="0923">
<builder smallShop="true">Bourgeois</builder>
<model>OMC</model>
<back-sides>Bubinga</back-sides>
<top>Adirondack Spruce</top>
</guitar>
<guitar id="11091">
<builder>Martin &amp; Company</builder>
<model>OM-28VR</model>
<back-sides>Indian Rosewood</back-sides>
<top bearclaw="true">Sitka Spruce</top>
<notes>It's certainly true that Martin isn't the only game in town anymore.
Still, the OM-28VR is one of their best models... and this one
has some fabulous bearclaw to boot. Nice specimen of a
still-important guitar manufacturer.
</notes>
</guitar>
</guitars>
I also supplied a schema for this document -- which isn't repeated here for brevity's sake -- and I showed you how to generate Java source files from this schema, as shown in Listing 2.
Listing 2. JAXB class generation output
C:\developerworks>xjc -p com.ibm.dw guitars.xsd -d src
parsing a schema...
compiling a schema...
com\ibm\dw\impl\runtime\MSVValidator.java
com\ibm\dw\impl\runtime\SAXUnmarshallerHandlerImpl.java
com\ibm\dw\impl\runtime\ErrorHandlerAdaptor.java
com\ibm\dw\impl\runtime\AbstractUnmarshallingEventHandlerImpl.java
com\ibm\dw\impl\runtime\UnmarshallableObject.java
com\ibm\dw\impl\runtime\SAXMarshaller.java
com\ibm\dw\impl\runtime\XMLSerializer.java
com\ibm\dw\impl\runtime\ContentHandlerAdaptor.java
com\ibm\dw\impl\runtime\UnmarshallingEventHandlerAdaptor.java
com\ibm\dw\impl\runtime\SAXUnmarshallerHandler.java
com\ibm\dw\impl\runtime\ValidatorImpl.java
com\ibm\dw\impl\runtime\ValidatableObject.java
com\ibm\dw\impl\runtime\UnmarshallerImpl.java
com\ibm\dw\impl\runtime\NamespaceContext2.java
com\ibm\dw\impl\runtime\Discarder.java
com\ibm\dw\impl\runtime\NamespaceContextImpl.java
com\ibm\dw\impl\runtime\ValidatingUnmarshaller.java
com\ibm\dw\impl\runtime\UnmarshallingContext.java
com\ibm\dw\impl\runtime\GrammarInfoImpl.java
com\ibm\dw\impl\runtime\ValidationContext.java
Ensure that you have these Java source files generated, compiled, and ready for use. For detailed steps, consult the previous article in the series.
With your classes generated and ready for use, you're all set to unmarshal the XML document from Listing 1 into JAXB's in-memory model. This is the first step in testing out JAXB's round-tripping capabilities. Since this isn't an article on JAXB basics (for such articles, see Resources), I'll just let you see the code, shown in Listing 3.
Listing 3. Unmarshalling XML to Java code
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.bind.*;
// Import generated classes
import com.ibm.dw.*;
public class RoundTripper {
    private static final String PACKAGE_NAME = "com.ibm.dw";
    private String inputFilename;
    private String outputFilename;
    private JAXBContext jc;
    public RoundTripper(String inputFilename, String outputFilename) throws Exception {
        this.inputFilename = inputFilename;
        this.outputFilename = outputFilename;
        // Build a context that knows about the classes generated by xjc
        jc = JAXBContext.newInstance(PACKAGE_NAME);
    }
    public Guitars unmarshal() throws Exception {
        Unmarshaller u = jc.createUnmarshaller();
        InputStream in = new FileInputStream(inputFilename);
        try {
            // Read the XML document into JAXB's in-memory model
            return (Guitars)u.unmarshal(in);
        } finally {
            in.close();
        }
    }
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Incorrect usage: java RoundTripper " +
                "[input XML filename] [output XML filename]");
            return;
        }
        try {
            RoundTripper rt = new RoundTripper(args[0], args[1]);
            Guitars guitars = rt.unmarshal();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Note: If you are having trouble getting these classes set up and running, consult the last section in this article, "Running example programs," for help.
You might think that at this point you should print out the in-memory version of the document. However, the same API that prints the in-memory data also writes it to an output stream, so that extra step is unnecessary.
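In JAXB, Marshaller.marshal accepts any OutputStream, so passing System.out where you would otherwise pass a FileOutputStream prints the in-memory data to the console. The sketch below shows the same principle with the JDK's own Transformer API, used here only as a stand-in because it runs without the generated com.ibm.dw classes; the XmlWriter class and its methods are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: one write method, any destination stream
class XmlWriter {

    // Serializes the whole document to whatever stream the caller supplies
    static void writeTo(Document doc, OutputStream out) {
        try {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    // Builds a tiny in-memory document to write out
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element guitar = doc.createElement("guitar");
            guitar.setAttribute("id", "0923");
            Element model = doc.createElement("model");
            model.setTextContent("OMC");
            guitar.appendChild(model);
            doc.appendChild(guitar);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDocument();
        writeTo(doc, System.out);            // "print" the in-memory version
        OutputStream file = new FileOutputStream("sample.xml");
        try {
            writeTo(doc, file);              // same call, different destination
        } finally {
            file.close();
        }
    }
}
```

The only thing that changes between "printing" and "saving" is the stream handed to writeTo, which is exactly why a separate print step adds nothing to the round-trip test.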
Now you can instruct JAXB to spit the in-memory representation back out to XML. This will allow you to inspect the differences between your input file and your output file. I've added code to the RoundTripper class to take care of this, as seen in Listing 4.
Listing 4. Marshalling Java to XML
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.bind.*;
// Import generated classes
import com.ibm.dw.*;
public class RoundTripper {
    private static final String PACKAGE_NAME = "com.ibm.dw";
    private String inputFilename;
    private String outputFilename;
    private JAXBContext jc;
    public RoundTripper(String inputFilename, String outputFilename) throws Exception {
        this.inputFilename = inputFilename;
        this.outputFilename = outputFilename;
        // Build a context that knows about the classes generated by xjc
        jc = JAXBContext.newInstance(PACKAGE_NAME);
    }
    public Guitars unmarshal() throws Exception {
        Unmarshaller u = jc.createUnmarshaller();
        InputStream in = new FileInputStream(inputFilename);
        try {
            // Read the XML document into JAXB's in-memory model
            return (Guitars)u.unmarshal(in);
        } finally {
            in.close();
        }
    }
    public void marshal(Guitars guitars) throws Exception {
        Marshaller m = jc.createMarshaller();
        OutputStream out = new FileOutputStream(outputFilename);
        try {
            // Write the in-memory model back out as XML
            m.marshal(guitars, out);
        } finally {
            // Closing the stream ensures all output reaches the file
            out.close();
        }
    }
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Incorrect usage: java RoundTripper " +
                "[input XML filename] [output XML filename]");
            return;
        }
        try {
            RoundTripper rt = new RoundTripper(args[0], args[1]);
            Guitars guitars = rt.unmarshal();
            rt.marshal(guitars);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This is, again, fairly simple and self-explanatory. I ran this program with guitars.xml as the input file and output.xml as the output filename. Nothing is written to the terminal, but running the program produces a new file, output.xml. In theory, this file should be a carbon copy of guitars.xml, since no changes were made to the data in memory.
Once you've generated output.xml, open it up. It should look very similar, if not identical, to Listing 5.
Listing 5. output.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<guitars>
<guitar id="10021">
<builder luthier="true">Ryan</builder>
<model>Mission Grand Concert</model>
<back-sides>Brazilian Rosewood</back-sides>
<top>Adirondack Spruce</top>
<notes>
Just unbelievable... this guitar has all the tone &amp;
resonance you could ever want. I mean, &lt;&lt;WOW!!!&gt;&gt; This
is a lifetime guitar.
</notes>
</guitar>
<guitar id="0923">
<builder smallShop="true">Bourgeois</builder>
<model>OMC</model>
<back-sides>Bubinga</back-sides>
<top>Adirondack Spruce</top>
</guitar>
<guitar id="11091">
<builder>Martin &amp; Company</builder>
<model>OM-28VR</model>
<back-sides>Indian Rosewood</back-sides>
<top bearclaw="true">Sitka Spruce</top>
<notes>It's certainly true that Martin isn't the only game in town anymore.
Still, the OM-28VR is one of their best models... and this one
has some fabulous bearclaw to boot. Nice specimen of a
still-important guitar manufacturer.
</notes>
</guitar>
</guitars>
With both an input file and an output file in hand, you can now see how well JAXB handles round-tripping by comparing the two files (remember that the original file was shown back in Listing 1).
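If you'd rather not compare the files by eye, a line-by-line diff that reports the first line where two files part ways is enough for this purpose. Here is a minimal sketch using only java.nio.file; the LineDiff class is hypothetical, and it measures literal textual equality, not semantic equivalence, so it flags the added XML declaration on line 1 even though that change alone doesn't alter the document's content.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: find the first line at which two texts differ
class LineDiff {

    /** Returns the 1-based number of the first differing line, or -1 if identical. */
    static int firstDifference(List<String> a, List<String> b) {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i + 1;
            }
        }
        // All shared lines match; the files differ only if one is longer
        return a.size() == b.size() ? -1 : shared + 1;
    }

    // Reads both files as UTF-8 lines and compares them
    static int firstDifference(Path a, Path b) {
        try {
            return firstDifference(Files.readAllLines(a), Files.readAllLines(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int line = firstDifference(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(line < 0 ? "Files are identical"
                                    : "First difference at line " + line);
    }
}
```

Usage: java LineDiff guitars.xml output.xml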
First, note that the input file had no XML declaration (the line that starts with <?xml version=...). JAXB automatically inserted this line into the output. This may seem like a minor issue, but it's actually pretty important: it's common these days to include one XML document inside another, particularly when working with SOAP or other transport technologies. The problem is that a document can have only one XML declaration, and it must appear at the very beginning. You could insert guitars.xml into another XML document without violating this rule; if you did the same with output.xml, however, the embedded declaration would make the combined document ill-formed. So right away, JAXB has a feature that you need to be careful about.
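The following sketch shows the problem concretely: it splices one document into another as text and checks the result with the JDK's DOM parser. It assumes simple string splicing rather than any particular SOAP toolkit, and the embed and stripXmlDeclaration helpers are hypothetical. (If you are on JAXB 2.0 or later, the Marshaller.JAXB_FRAGMENT property is the documented way to ask the marshaller not to emit document-level output such as the declaration.)

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helpers showing why an embedded XML declaration breaks a document
class EmbedCheck {

    // Naive text splice: put the fragment inside a wrapper element
    static String embed(String wrapperName, String fragment) {
        return "<" + wrapperName + ">" + fragment + "</" + wrapperName + ">";
    }

    // Removes a leading <?xml ...?> declaration, if there is one
    static String stripXmlDeclaration(String xml) {
        return xml.replaceFirst("^\\s*<\\?xml\\s[^>]*\\?>", "");
    }

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

The declaration is fine at the top of output.xml on its own; it's only when the file is spliced into a larger document that the parser rejects it, because a processing instruction named xml is not allowed anywhere but the very start.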
Also note that the CDATA section in the original XML document has been removed. This technically doesn't violate the rules of semantic equivalence: the content of both documents is semantically the same. In the first document, the CDATA section makes entity references unnecessary; in the output document, CDATA is abandoned in favor of entity references such as &amp; and &lt;. This is more an issue of literal equality between the documents than of semantic equivalence. It's something you should be aware of, but not a major concern.
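You can confirm that the two forms carry the same content by asking a parser what character data it actually sees. This minimal sketch uses the JDK's DOM parser (not JAXB); the TextContent class is hypothetical, and the two strings are shortened versions of the Ryan guitar's notes from Listings 1 and 5.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: the character data a parser reports for the root element
class TextContent {

    static String of(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // CDATA form, as in the original input document
        String withCdata = "<notes><![CDATA[tone & resonance. I mean, <<WOW!!!>>]]></notes>";
        // Entity-reference form, as in the marshalled output
        String withEntities = "<notes>tone &amp; resonance. I mean, &lt;&lt;WOW!!!&gt;&gt;</notes>";
        System.out.println(of(withCdata).equals(of(withEntities)));
    }
}
```

Running main prints true: once parsed, both forms yield the identical string tone & resonance. I mean, <<WOW!!!>>.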
Whitespace is also handled properly. Even though the CDATA section is removed, the whitespace it contained is preserved. In addition, the lengthy run of whitespace in the middle of the description of the Martin OM-28VR guitar is kept as is.
One of the best and most telling ways to evaluate round-tripping is to retest the round-tripping process. Be careful, though: I don't mean simply to repeat the test. Instead, take the output file (output.xml) and feed it to the round-tripper as the input file. If anything has been introduced into the XML that shouldn't be there, each successive round trip will produce output that drifts a little further from the original file (guitars.xml). This is a great way to isolate problem areas. A good data binding tool will produce the same file over and over again, especially once the initial round trip has been made.
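The retest idea amounts to checking that the round trip reaches a fixed point: a second pass over the first pass's output should change nothing. The following sketch expresses that as a small harness. Because it has to run without the generated JAXB classes, it uses a hypothetical DOM-based round-tripper (parse, then serialize with the JDK Transformer) as a stand-in for RoundTripper; the class and method names are assumptions, not part of JAXB.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical harness for the retest idea: feed the output back in and look for drift
class RetestHarness {

    // True if a second pass over the first pass's output changes nothing
    static boolean isStable(UnaryOperator<String> roundTrip, String original) {
        String firstPass = roundTrip.apply(original);    // like output.xml
        String secondPass = roundTrip.apply(firstPass);  // like retest.xml
        return firstPass.equals(secondPass);
    }

    // Stand-in round-tripper (not JAXB): parse to a DOM tree and serialize it again
    static String domRoundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Round trip failed", e);
        }
    }
}
```

Note that the harness compares the first and second passes, not the original and the first pass: a one-time change such as an added XML declaration is tolerated, while anything that keeps accumulating is caught.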
For this step, I instructed RoundTripper to produce retest.xml, using output.xml as the source XML. The result is shown in Listing 6.
Listing 6. retest.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<guitars>
<guitar id="10021">
<builder luthier="true">Ryan</builder>
<model>Mission Grand Concert</model>
<back-sides>Brazilian Rosewood</back-sides>
<top>Adirondack Spruce</top>
<notes>
Just unbelievable... this guitar has all the tone &amp;
resonance you could ever want. I mean, &lt;&lt;WOW!!!&gt;&gt; This
is a lifetime guitar.
</notes>
</guitar>
<guitar id="0923">
<builder smallShop="true">Bourgeois</builder>
<model>OMC</model>
<back-sides>Bubinga</back-sides>
<top>Adirondack Spruce</top>
</guitar>
<guitar id="11091">
<builder>Martin &amp; Company</builder>
<model>OM-28VR</model>
<back-sides>Indian Rosewood</back-sides>
<top bearclaw="true">Sitka Spruce</top>
<notes>It's certainly true that Martin isn't the only game in town anymore.
Still, the OM-28VR is one of their best models... and this one
has some fabulous bearclaw to boot. Nice specimen of a
still-important guitar manufacturer.
</notes>
</guitar>
</guitars>
The good news is that Listing 5 and Listing 6 are identical, which shows that JAXB does a pretty good job once the initial round trip has been made.
In general, then, JAXB performs pretty well. While I think the automatic addition of the XML declaration is a real issue, it's a minor problem compared to APIs that alter the content itself. JAXB also handles CDATA sections a little differently than you might expect, but it does preserve semantic equivalence. In the next article, I'll show you the various options you can tweak to further control the output file and to address some of the issues that JAXB introduces. All in all, though, JAXB does a good job of preserving the input document.
In closing, let me share my cheat sheet: the Ant setup I use to simplify classpath handling and running the JAXB samples. Listing 7 shows the Ant build file I've used with this article. To use this file yourself, simply modify the paths to your own XML input files, as well as to your JAXB JAR files.
Listing 7. Ant build file
<?xml version="1.0"?>
<project basedir="." default="roundtrip">
  <property name="jwsdp.home" value="c:\jwsdp-1.3"/>
  <property name="xml.inputFile" value="guitars.xml"/>
  <property name="xml.outputFile" value="output.xml"/>
  <property name="xml.retestFile" value="retest.xml"/>
  <path id="classpath">
    <pathelement path="build"/>
    <fileset dir="${jwsdp.home}" includes="jaxb/lib/*.jar"/>
    <fileset dir="${jwsdp.home}" includes="jwsdp-shared/lib/*.jar"/>
    <fileset dir="${jwsdp.home}" includes="jaxp/lib/**/*.jar"/>
  </path>
  <taskdef name="xjc" classname="com.sun.tools.xjc.XJCTask">
    <classpath refid="classpath"/>
  </taskdef>
  <!-- compile Java source files -->
  <target name="compile">
    <!-- generate the Java content classes from the schema -->
    <echo message="Generating Java classes from the schema..."/>
    <xjc schema="guitars.xsd" package="com.ibm.dw" target="src"/>
    <!-- compile all of the java sources (javac needs the output directory to exist) -->
    <echo message="Compiling the java source files..."/>
    <mkdir dir="build"/>
    <javac srcdir="src" destdir="build" debug="on">
      <classpath refid="classpath"/>
    </javac>
    <!-- Copy over the properties files -->
    <copy todir="build">
      <fileset dir="src">
        <exclude name="**/*.java"/>
      </fileset>
    </copy>
  </target>
  <target name="roundtrip" depends="compile">
    <echo message="Converting XML file to Java and back..."/>
    <java classname="RoundTripper">
      <arg value="${xml.inputFile}" />
      <arg value="${xml.outputFile}" />
      <classpath refid="classpath" />
    </java>
  </target>
  <target name="roundtrip-retest" depends="roundtrip">
    <echo message="Converting XML file to Java and back... (Second iteration)"/>
    <java classname="RoundTripper">
      <arg value="${xml.outputFile}" />
      <arg value="${xml.retestFile}" />
      <classpath refid="classpath" />
    </java>
  </target>
</project>
By default, this file generates source files from the schema, compiles those files along with RoundTripper, copies over the required JAXB property files, and then runs the RoundTripper class. You can also run the roundtrip-retest target manually (ant roundtrip-retest); because it depends on the roundtrip target, it performs the first pass and then a second round-tripping pass that uses output.xml as the input file and writes retest.xml. This file should make life easier. Enjoy!
- Read the first installment of the Practical data binding column in which Brett examines several important concepts in data binding, including round-tripping and semantic equivalence. In the second installment, he looks at how JAXB handles class generation, and how that affects the XML input and output accepted by the API (developerWorks, May 2004).
- Visit the developerWorks "XML and Java technology" forum, hosted by Brett McLaughlin, for additional information on how to work with these two technologies.
- Read Dennis Sosnoski's article "Code generation approaches -- JAXB and more" (developerWorks, January 2003), in which he looks at several XML data binding approaches using code generation from W3C XML Schema or DTD grammars for XML documents.
- Try Quick, an alternative data binding framework.
- Learn how to use Quick with "Converting between Java objects and XML with Quick" by Brett McLaughlin (developerWorks, August 2002).
- Try JaxMe, an open-source implementation of the JAXB API.
- Check out XMLBeans, another open-source data binding tool.
- Obtain text parsing utilities from the Jakarta Commons package.
- Get the scoop on Sun's XML APIs from Sun's Web site.
- Check out Brett's complete work on data binding in Java and XML Data Binding (O'Reilly & Associates).
- Browse for books on these and other technical topics.
- Find more data binding resources on the developerWorks XML and Java technology zones.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
Brett McLaughlin has been working in computers since the Logo days (Remember the little triangle?). He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor to the EJBoss project, an open source EJB application server, and to Cocoon, an open source XML Web-publishing engine.