
Practical data binding: Looking into JAXB, Part 1

Round-tripping and XML-to-Java translation

Brett McLaughlin (brett@oreilly.com), Editor, O'Reilly and Associates
Brett McLaughlin has worked in computers since the Logo days (Remember the little triangle?). He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor to the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.

Summary:  In the last installment, Brett examined several important concepts in data binding, including round-tripping and semantic equivalence. In this article, he looks at Sun's JAXB architecture and implementation in light of these terms. You'll learn how JAXB handles class generation, and how that affects the XML input and output accepted by the API.


Date:  19 May 2004
Level:  Introductory

Before I get into the nuts and bolts of data binding, and particularly how you can apply a data binding package to common programming problems, you need to select a data binding package to work with. My general rule is that you should pick out software on your own, as your programming requirements are inevitably unique. That said, the information that programmers use to make these decisions is universal. In this article, I'll examine JAXB in light of these universal rules, and help you determine if JAXB is the right solution for your data binding needs.

A brief review

Before diving into JAXB, though, I'll quickly run through the terms from the last article in this series. Here are a few important definitions:

  • Unmarshalling: The process of converting XML data to a Java class (or classes).
  • Marshalling: The process of converting Java data to an XML document (just the opposite of unmarshalling).
  • Semantic equivalence: Equality based on the rules of XML. Two documents can be semantically equivalent even if they don't look the same -- see the previous article for examples.
  • Round-tripping: The complete trip from an XML document into Java code and back to XML. Effective round-tripping ensures that the input and output documents are semantically equivalent, even if they are not character-for-character identical.

I throw these terms around pretty loosely throughout the article, so make sure you've got a handle on each of them.
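
To make semantic equivalence a little more concrete, here is a minimal sketch that compares two documents structurally rather than character by character. It uses the standard DOM API (not JAXB), and the class and method names (EquivalenceCheck, equivalent) are made up for illustration. DOM equality is only an approximation of the idea: it ignores attribute order and quoting style, but it still treats whitespace inside text content as significant.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXB: compares two XML strings as DOM trees.
public class EquivalenceCheck {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Ask the parser to convert CDATA sections to ordinary text nodes.
            factory.setCoalescing(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // True when both documents have the same elements, attributes, and text,
    // regardless of attribute order or quote style.
    static boolean equivalent(String first, String second) {
        return parse(first).isEqualNode(parse(second));
    }

    public static void main(String[] args) {
        String a = "<top bearclaw=\"true\" id='1'>Sitka Spruce</top>";
        String b = "<top id=\"1\" bearclaw='true'>Sitka Spruce</top>";
        System.out.println(equivalent(a, b));
    }
}
```

A round-trip test can use the same idea: parse the input document and the marshalled output, then compare the two trees instead of the raw text.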

You should also understand that in this and the next several articles, the focus isn't necessarily on basic functionality -- it's on the implementation of that functionality. Every data binding package is able to marshal or unmarshal data. However, many packages handle these tasks loosely, and semantic equivalence and round-tripping suffer as a result. These flaws in implementation (or the lack thereof) are the focus of the first few articles of this series, so don't be surprised if I take several articles to get to basic usage of the packages. What's the point of using a package if you don't know if it actually works?

Finally, I assume that you have JAXB set up and running. You can find plenty of other articles on developerWorks that detail this process, and with the new Sun Java Web Services Developer Pack (Java WSDP), installation is trivial. Get your packages installed, your classpath set up, and you'll be ready to go.


Generating classes

Before you can do much with JAXB, you need to generate some Java classes to represent your XML data. You'll work with a fairly simple XML document in these samples, shown in Listing 1. This is a simple listing of one of my favorite things, guitars.


Listing 1. Basic XML listing of guitars
<guitars>
  <guitar id="10021">
    <builder luthier="true">Ryan</builder>
    <model>Mission Grand Concert</model>
    <back-sides>Brazilian Rosewood</back-sides>
    <top>Adirondack Spruce</top>
    <notes>
      <![CDATA[
        Just unbelievable...   this guitar has all the tone & 
        resonance you could ever want. I mean, <<WOW!!!>> This 
        is a lifetime guitar.
      ]]>
    </notes>
  </guitar>
  <guitar id="0923">
    <builder smallShop="true">Bourgeois</builder>
    <model>OMC</model>
    <back-sides>Bubinga</back-sides>
    <top>Adirondack Spruce</top>
  </guitar>
  <guitar id="11091">
    <builder>Martin &amp; Company</builder>
    <model>OM-28VR</model>
    <back-sides>Indian Rosewood</back-sides>
    <top bearclaw="true">Sitka Spruce</top>
    <notes>It's certainly true that Martin isn't the only game in town anymore. 
           Still, the OM-28VR is one of their best models...     and this one 
           has some fabulous bearclaw to boot.              Nice specimen of a 
           still-important guitar manufacturer.
    </notes>
  </guitar>
</guitars>

You also need an XML Schema to generate classes and data structures from as you work with JAXB. The XML Schema for Listing 1 is shown in Listing 2.


Listing 2. XML Schema for Listing 1
<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           elementFormDefault="qualified">
  <xs:element name="back-sides" type="xs:string"/>
  <xs:element name="builder">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="luthier" default="false">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:enumeration value="true"/>
                <xs:enumeration value="false"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="smallShop" default="false">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:enumeration value="true"/>
                <xs:enumeration value="false"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="guitar">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="builder"/>
        <xs:element ref="model"/>
        <xs:element ref="back-sides"/>
        <xs:element ref="top"/>
        <xs:element ref="notes" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="guitars">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="guitar" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="model" type="xs:string"/>
  <xs:element name="notes" type="xs:string"/>
  <xs:element name="top">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="bearclaw" default="false">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:enumeration value="true"/>
                <xs:enumeration value="false"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
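
Before handing a schema to a binding tool, it can help to confirm that it accepts and rejects the values you expect. The following is a minimal sketch that uses the standard javax.xml.validation API (part of the JDK since Java 5, and separate from JAXB). To keep it short, it embeds only the top element from Listing 2; the class name SchemaCheck and the method isValid are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates a fragment against the "top" declaration from Listing 2.
public class SchemaCheck {

    // Trimmed copy of the "top" element declaration from Listing 2.
    static final String TOP_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='top'><xs:complexType><xs:simpleContent>"
      + "<xs:extension base='xs:string'>"
      + "<xs:attribute name='bearclaw' default='false'><xs:simpleType>"
      + "<xs:restriction base='xs:NMTOKEN'>"
      + "<xs:enumeration value='true'/><xs:enumeration value='false'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the XML conforms to the schema, false if validation fails.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(TOP_XSD)));
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalid) {
                return false; // the document violates the schema
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<top bearclaw='true'>Sitka Spruce</top>"));
        System.out.println(isValid("<top bearclaw='foobar'>Sitka Spruce</top>"));
    }
}
```

The same check works with the full guitars.xsd and the document from Listing 1 if you read them from files instead of strings.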


Basic instructions

With your XML and XML Schema files in place, it's simple enough to generate JAXB classes. Ensure that you've got your command line and environment set up, and issue the following command:

xjc -p com.ibm.dw guitars.xsd -d src

Run this command from the directory that contains the guitars.xsd file, and create a directory called src under that working directory before you run it. If you skip either step, you'll get some java.io.IOException errors. Otherwise, you should see a long bit of output, some of which is shown in Listing 3.


Listing 3. JAXB class generation output
C:\developerworks>xjc -p com.ibm.dw guitars.xsd -d src
parsing a schema...
compiling a schema...
com\ibm\dw\impl\runtime\MSVValidator.java
com\ibm\dw\impl\runtime\SAXUnmarshallerHandlerImpl.java
com\ibm\dw\impl\runtime\ErrorHandlerAdaptor.java
com\ibm\dw\impl\runtime\AbstractUnmarshallingEventHandlerImpl.java
com\ibm\dw\impl\runtime\UnmarshallableObject.java
com\ibm\dw\impl\runtime\SAXMarshaller.java
com\ibm\dw\impl\runtime\XMLSerializer.java
com\ibm\dw\impl\runtime\ContentHandlerAdaptor.java
com\ibm\dw\impl\runtime\UnmarshallingEventHandlerAdaptor.java
com\ibm\dw\impl\runtime\SAXUnmarshallerHandler.java
com\ibm\dw\impl\runtime\ValidatorImpl.java
com\ibm\dw\impl\runtime\ValidatableObject.java
com\ibm\dw\impl\runtime\UnmarshallerImpl.java
com\ibm\dw\impl\runtime\NamespaceContext2.java
com\ibm\dw\impl\runtime\Discarder.java
com\ibm\dw\impl\runtime\NamespaceContextImpl.java
com\ibm\dw\impl\runtime\ValidatingUnmarshaller.java
com\ibm\dw\impl\runtime\UnmarshallingContext.java
com\ibm\dw\impl\runtime\GrammarInfoImpl.java
com\ibm\dw\impl\runtime\ValidationContext.java

This goes on for quite a bit -- notice that JAXB creates a lot of classes for even a relatively simple XML Schema.


Effects on round-tripping

Now that I've gone through the preliminaries, it's on to actual examination of what's happening here. Without spending time reviewing the basics of JAXB (which has been covered quite nicely in other places), you should have two source files for all of your elements: one named for the element (for example, Guitar.java), and one named for the element with "Type" stuck on the end (for example, GuitarType.java). Both are interfaces, with implementation classes within the sub-directory called impl. This makes for a lot of classes -- something I think is a bit of overkill.

What is of interest, though, are the classes themselves. Remember that one of the primary concerns of a data binding implementation is round-tripping -- the ability to go from XML to Java code and back to XML without mutating the data in some unpredictable way. In other words, what goes in is what comes out. At this point, you're not ready to actually test this out with an unmarshal-marshal cycle (although you will do that later); instead, just examine the source and check for potential problems.

The first problem occurs in one of the typical places for any data binding package: typing. Even with the help of XML Schema, XML doesn't always match up nicely to Java types. This often means that you lose some data typing, which can allow invalid data to sneak in. In some cases, this points to a problem in your XML Schema; in others, it's a limitation of an XML-to-Java mapping that just has to be watched for. One such problem is found in the source code that represents the top element. Take a look at the lines related to the bearclaw attribute (the getBearclaw() and setBearclaw() methods) in Listing 4, which is the source for the TopType class.


Listing 4. Source code for TopType.java
package com.ibm.dw;

public interface TopType {

    java.lang.String getValue();
    void setValue(java.lang.String value);

    java.lang.String getBearclaw();
    void setBearclaw(java.lang.String value);
}

If you look back at the source document and its XML Schema, it's clear that the bearclaw attribute is intended to be either "true" or "false." Unfortunately, JAXB failed to pick up on this and use a boolean data type -- instead, the TopType class accepts any string value for this attribute. The result is, potentially, bad data. It's also possible to end up with "True", "true", "tRUe", and any number of other variations that your XML-consuming applications may choke on. In other words, you've got a problem area that you're going to have to deal with.

Here are several solutions to this problem:

  1. Manually edit the TopType class's source code to accept only boolean values.
  2. Manually add exception handling to the TopType methods to ensure only string values that can be converted to booleans are supplied.
  3. Create a new type in your XML Schema that represents boolean data types.
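
As a rough illustration of the second option, here is a minimal sketch of a hand-written guard. The TopType interface is copied from Listing 4 so that the example compiles on its own; CheckedTop and its isBearclaw() convenience method are hypothetical and are not something xjc generates.

```java
// Copied from Listing 4 so this sketch is self-contained.
interface TopType {
    String getValue();
    void setValue(String value);
    String getBearclaw();
    void setBearclaw(String value);
}

// Hypothetical hand-written implementation that rejects bad values up front.
class CheckedTop implements TopType {
    private String value;
    private String bearclaw = "false"; // mirrors the schema default

    public String getValue() { return value; }

    public void setValue(String value) { this.value = value; }

    public String getBearclaw() { return bearclaw; }

    // Accepts only the two values the schema allows; anything else fails fast.
    public void setBearclaw(String value) {
        if (!"true".equals(value) && !"false".equals(value)) {
            throw new IllegalArgumentException(
                "bearclaw must be \"true\" or \"false\", got: " + value);
        }
        this.bearclaw = value;
    }

    // Typed view of the attribute for callers that want a boolean.
    public boolean isBearclaw() { return "true".equals(bearclaw); }
}
```

Keep in mind that edits made directly to generated sources are lost when the classes are regenerated, so a guard like this is easier to maintain in a separate wrapper or subclass.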

The first two options are pretty self-explanatory. The third option is also simple. XML Schema does provide a built-in xs:boolean type, but it accepts only the lexical values true, false, 1, and 0, so it doesn't cover capitalization variants. Listing 5 gives a simple boolean type definition that does:


Listing 5. A boolean type for schemas
<xsd:simpleType name="boolean">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="true"/>
    <xsd:enumeration value="True"/>
    <xsd:enumeration value="TRUE"/>
    <xsd:enumeration value="false"/>
    <xsd:enumeration value="False"/>
    <xsd:enumeration value="FALSE"/>
  </xsd:restriction>
</xsd:simpleType>

Looks pretty good, right? Trouble is, it still won't solve the problem. JAXB is still going to generate a class that accepts a string argument, based on the usage of the xsd:string construct in your XML Schema.

Now, before all of you start to tell me that this isn't a problem, let me show you what JAXB does do to protect your data. When you marshal your Java classes back out to XML, validation methods are called that have been generated from your XML Schema (and its restrictive types, shown in both Listing 2 and Listing 5). In other words, if you've supplied a value of "foobar" to the bearclaw attribute, it will get caught. However, values like "TRUe", "fAlSe", and "tRue" will also be rejected by this validation, and that is almost certainly not the result you want. To let them through, you've got to use a type like the one in Listing 5 and spell out every possible capitalization of the words "true" and "false." This is a lot of effort for what seems like a trivial task. It's these sorts of issues that make the process of round-tripping very complex, and a lot easier to talk about than to implement. These are also exactly the sort of problems you should watch out for when choosing, and then using, a data binding package.
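
To get a feel for how much effort that is: "true" has 2^4 = 16 capitalizations and "false" has 2^5 = 32, for 48 enumeration values in all. The following sketch generates them rather than typing them by hand. The class name CaseVariants is made up, and the xsd: prefix in the output simply follows Listing 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical generator for the enumeration facets a case-tolerant type would need.
public class CaseVariants {

    // Returns every upper/lower-case spelling of the given word.
    static List<String> variants(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        int length = lower.length();
        List<String> result = new ArrayList<>();
        // Each bit of the mask decides whether one character is upper-cased.
        for (int mask = 0; mask < (1 << length); mask++) {
            StringBuilder spelling = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                char c = lower.charAt(i);
                spelling.append((mask & (1 << i)) != 0 ? Character.toUpperCase(c) : c);
            }
            result.add(spelling.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"true", "false"}) {
            for (String spelling : variants(word)) {
                System.out.println("<xsd:enumeration value=\"" + spelling + "\"/>");
            }
        }
    }
}
```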

What's even more concerning, at least to me, is that this can create a problem that goes beyond simple round-tripping. Remember that error-checking only happens during marshalling -- this means that erroneous data can be floating around in these member variables while they are in memory, awaiting marshalling. Further, this applies to all properties that have a restrictive set of values -- not just those intended to be boolean. The glaring issue here, though, is that sometimes XML documents are read in, manipulated, and used by other applications, rather than being marshalled back out to XML. So all your applications now have the ability to insert erroneous data into these fields, and any other applications using the data get that erroneous data. Unless you want to marshal out your classes every time you need to access information, you've got a real problem on your hands. By the way, these problems are all indicative of the relatively immature state of data binding, not just JAXB in particular.
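
The timing problem can be sketched in a few lines of plain Java. Nothing here is JAXB code: LooseTop is a stand-in for a generated class whose setter takes any string, and checkBeforeMarshal is a stand-in for validation that runs only when the object is written out. Both names are hypothetical.

```java
// Illustration only: shows bad data living in memory until a late check runs.
public class DeferredCheckDemo {

    // Stand-in for a generated class: the setter accepts any string.
    static class LooseTop {
        private String bearclaw = "false";

        String getBearclaw() { return bearclaw; }

        void setBearclaw(String value) { this.bearclaw = value; }
    }

    // Stand-in for marshal-time validation; throws if the value is not allowed.
    static void checkBeforeMarshal(LooseTop top) {
        String value = top.getBearclaw();
        if (!"true".equals(value) && !"false".equals(value)) {
            throw new IllegalStateException("invalid bearclaw value: " + value);
        }
    }

    public static void main(String[] args) {
        LooseTop top = new LooseTop();
        top.setBearclaw("foobar"); // accepted without complaint

        // Any code that reads the object now sees the bad value.
        System.out.println("in memory: " + top.getBearclaw());

        try {
            checkBeforeMarshal(top); // the problem surfaces only here
        } catch (IllegalStateException e) {
            System.out.println("caught late: " + e.getMessage());
        }
    }
}
```

An application that reads the object and never marshals it would never reach the check at all.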


What do I do?

So what can you do? First, keep reading these articles. I'll walk through JAXB, and later Castor, and really try to identify the areas to be wary of. You can't write bulletproof and error-protected code without knowing where the most problematic issues arise; that's what this and the next few articles focus on. More importantly, realize that even the best data binding package needs a good programmer or two to add some additional protection to make it behave properly.

Finally, keep in mind that data binding isn't always a magic bullet. I don't want to cast a pall on all of you who are interested in data binding -- rather, the contrary. I think it's a killer application; however, sometimes a simple SAX program, or even a DOM tree, provides all the functionality you need without adding in nearly the complexity of a data binding project. In future columns, I'll examine the best times to use data binding, as opposed to when SAX and DOM are more effective, and give you plenty of examples to help in your decision-making process.
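
For comparison, here is a minimal sketch of the SAX route, assuming all you need from a document like Listing 1 is the list of guitar models. It uses only the standard JAXP SAX API; the class name ModelLister and the method models are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of every <model> element.
public class ModelLister {

    static List<String> models(String xml) {
        final List<String> found = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <model>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("model".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("model".equals(qName)) {
                    found.add(text.toString());
                    text = null;
                }
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<guitars>"
            + "<guitar id='10021'><model>Mission Grand Concert</model></guitar>"
            + "<guitar id='0923'><model>OMC</model></guitar>"
            + "</guitars>";
        System.out.println(models(xml));
    }
}
```

There are no generated classes and no schema involved here, which is the trade-off: less machinery, but also no typed objects and no validation unless you add it yourself.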


What's Next?

It's obvious that I haven't completely exhausted JAXB, but you can begin to see the kinds of analysis that are valuable when you investigate data binding packages. Choosing a data binding package is more complex than just choosing your favorite Web site and clicking a link -- be sure that issues like round-tripping are handled properly for the application you choose.

JAXB is still immature -- this is still the first major release of a fairly new technology. Also, keep in mind that all the pre-release versions of JAXB were basically scrapped (remember when JAXB only worked with DTDs? Now it only works with XSDs), so this 1.x release is a true first attempt at this sort of thing. That doesn't mean you shouldn't use JAXB -- it just indicates that caution is warranted.

In my next article, I'll move from class generation to unmarshalling and marshalling, and show you how those processes work. I'll also begin to delve into the handling of issues like whitespace, CDATA sections, and a lot more. Stick around... you'll see a lot more code, a lot more detail, and a lot more fun. Until next time, see you online!

