With the fall in the price-performance ratio of storage, there's really no need to throw data away—ever. The result is a growing mountain of data and, because nothing in technology is really free (including open source), the need to manage the data arises. This problem of the data mountain is only beginning to affect mainstream users, but it has been around for years in large data-centric service provider organizations. This article looks at the problem of relational data management—specifically, XML data management.
Managing data requires specific skills in many areas, such as backup and restore, database design, and choice of service provider. Should backups occur every day? What approach should you take in relation to database design? Should you use natural keys or application-generated primary keys? With regard to service providers, should data be stored remotely or on site?
Although these are all important questions, you'll look at a very specific issue—one that has important repercussions on data design, data retrieval, and client code management. This is the problem of representing optional XML elements in relational databases. As you'll see, the solution adopted for modeling optional XML elements has far-reaching consequences.
You'll see a number of technologies in this article. To make the topic progression clear, here's a road map of coverage:
- Brief review of XSD and XML
- Bridging the XML and Java™ worlds with Java Architecture for XML Binding (JAXB)
- A simple engine to publish the Java class data
- A database entity class
- Solving the problem of optional XML data
Start with some data definition.
An optional XML element in XSD
Listing 1 provides an excerpt from an XSD document that
specifies an optional element called comment. This
XSD document defines a purchase order. Often called a business object,
you can think of such a purchase order as an important entity in the
application domain. In other words, users of the system might typically
interact with such business objects—to create purchase orders, modify
existing purchase orders, and so on. The important point is that Listing 1 describes
the starting point for modeling a typical business object.
Listing 1. An XSD type definition with an optional element
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="EuropeanAddress"/>
<xsd:element name="billTo" type="EuropeanAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
|
In the context of XSD, what is an optional XML element? Well, an optional element
is simply one that can occur zero or more times in the associated XML document.
Don't worry about the relationship between XSD and XML documents: XSD is simply
a format that allows you to define the structure and types that appear in an XML
document. Basically, each xsd:element in Listing 1
defines a placeholder for an item in an XML document. Let's make this more
concrete by introducing another technology: JAXB.
JAXB: a bridge between XML and Java technology
It's not controversial to describe XML as the lingua franca of modern computing. But, on its own, XML is a bulky and not very readable format. So, wouldn't it be nice to be able to translate from XML into Java code? Fortunately, you can do this several ways, and the one you'll look at is JAXB.
JAXB uses an XSD file in conjunction with a sample XML file to generate Java classes. The generated Java classes then represent the XSD elements. Sound difficult? Well, it isn't, and the following section looks at an example of a full translation. If you want to follow along with the steps described, you'll need to install the Sun Microsystems Java Web Services Developer Pack (see Resources). I used version 2.0.
Follow the installation instructions, and pay particular attention to the steps about creating a directory called %JAVA_HOME%\jre\lib\endorsed and copying JAR files to that location. This step is required to override the JAXB classes built into the Java software development kit (JDK) version 1.5.x.
XML transformation into Java code
Listing 2 illustrates an XML file that you can think of as an instantiation of the XSD document from Listing 1.
Listing 2. An example XML document
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="IE">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Cambridge</city>
<postcode>12345</postcode>
</shipTo>
<billTo country="IE">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Cambridge</city>
<postcode>12345</postcode>
</billTo>
<items>
<item partNum="242-NO" >
<productName>Nosferatu - Special Edition (1929)</productName>
<quantity>5</quantity>
<USPrice>19.99</USPrice>
</item>
<item partNum="242-MU" >
<productName>The Mummy (1959)</productName>
<quantity>3</quantity>
<USPrice>19.98</USPrice>
</item>
<item partNum="242-GZ" >
<productName>Godzilla and Mothra: Battle for Earth
/Godzilla versus
King Ghidora</productName>
<quantity>3</quantity>
<USPrice>27.95</USPrice>
</item>
</items>
</purchaseOrder>
|
So, the problem definition is simple: you want to transform the contents of Listing 2 into a set of related Java classes. This process is straightforward using JAXB and Apache Ant:
ant compile -Djwsdp.home=C:\Sun\jwsdp-2.0 |
This command creates a set of new folders and Java class files. The Java classes relate to the elements in the XSD file. It's now time to put these classes to work.
A simple engine to publish the Java class data
You now need to put the generated classes together in a simple program. Listing 3 illustrates the key part of the program modified from one of the examples bundled with the Java Web Services Developer Pack.
Listing 3. The JAXB program
JAXBElement<?> poElement =
(JAXBElement<?>)u.unmarshal(new FileInputStream("po.xml"));
PurchaseOrderType po = (PurchaseOrderType)poElement.getValue();
// examine some of the content in the PurchaseOrder
System.out.println("Ship the following items to: ");
// display the shipping address
EuropeanAddress address = po.getShipTo();
displayAddress(address);
// display the items
Items items = po.getItems();
displayItems(items);
|
In Listing 3, you unmarshall an XML file called po.xml into an instance of a class
called PurchaseOrderType. You then dig into this class
and extract the constituent elements. You do all this using the JAXB-generated classes.
Once the automatically generated classes are created, you can use them as follows:
ant run -Djwsdp.home=C:\Sun\jwsdp-2.0 |
Listing 4 shows the program output from this command.
Listing 4. The transformation and program output
ant run -Djwsdp.home=C:\Sun\jwsdp-2.0
compile:
[echo] Compiling the schema...
[xjc] C:\java\ibmcode\unmarshal-read\gen-src\primer.po is not found and thus
excluded from the dependency check
[xjc] Compiling file:/C:/java/ibmcode/unmarshal-read/po.xsd
[xjc] Writing output to C:\java\ibmcode\unmarshal-read\gen-src
[echo] Compiling the java source files...
[javac] Compiling 4 source files to C:\java\ibmcode\unmarshal-read\classes
run:
[echo] Running the sample application...
[java] Ship the following items to:
[java] Alice Smith
[java] 123 Maple Street
[java] Cambridge null
[java] IE
[java]
[java] 5 copies of "Nosferatu - Special Edition (1929)"
[java] 3 copies of "The Mummy (1959)"
[java] 3 copies of "Godzilla and Mothra: Battle for Earth/Godzilla versus
King Ghidora"
BUILD SUCCESSFUL
Total time: 9 seconds
|
Notice how Listing 4 reproduces the data in Listing 2 thanks to JAXB.
You've managed to represent the domain data in a pair of XSD and XML files. You used the XSD file in conjunction with JAXB to create a suite of associated Java classes. You then used a simple Java program to manipulate the generated classes in conjunction with the XML file and to display the contents of the latter. Now what?
Recall from the XSD file that you also generated optional XML data. The next step in the roadmap is to persist the XML data to a relational database. This sounds difficult, but using the Java Persistence application programming interface (JPA) and Hibernate, this task proves relatively easy.
Listing 5 illustrates the most important part of a new
Java class called PurchaseOrder.
Listing 5 The PurchaseOrder Java class
@Id @GeneratedValue
@Column(name = "PO_ID")
private Long id;
@Embedded
private ShippingAddress shippingAddress;
@Embedded
private BillingAddress billingAddress;
@Column(name = "COMMENT")
private String comment;
@Column(name="COMMENT_ENTERED", columnDefinition="boolean default false")
private boolean commentEntered;
|
In Listing 5, you have a standard primary key column called PO_ID
(that is, purchase order ID). This column is followed by two address types: one
for shipping and the other for billing. Next comes the interesting part: the
COMMENT database column, which represents the
optional XML element.
Notice the last column in Listing 5: the COMMENT_ENTERED
column. This is a Boolean-type column and reflects the user actions as follows:
- If the user enters a comment, then
COMMENT_ENTEREDis True. - If the user doesn't enter a comment, then
COMMENT_ENTEREDis False.
This is in fact the solution to the original problem of the optional XML element! What's so special about this? To answer this, you need to know a bit about null values in relational databases.
A general, good practice is not to allow null (that is, no value) columns in a relational database, because when you attempt to read such a column, you get a null value exception. This in turn means that your Java extraction code must catch this exception. The result is complex Java code with no real business benefit.
Take the example above of the COMMENT column. If
no comment is associated with a given purchase order object, you might expect this
column to be null after persisting the entity. This means that your Java extraction
code must expect the possibility of a null value (and an associated exception). In
effect, such a null value is a type of data column overloading, and in this context,
a null value might be understood to represent some aspect of the data. There's
simply no need to use null values in this way! As illustrated above, an additional
Boolean column avoids the problem altogether.
So, before you read the COMMENT column in your
Java extraction code, you can check the value of the
COMMENT_ENTERED column and only then attempt your
read. Where's the benefit? The main benefit is simplicity. The data definition is simpler,
because it reflects the user-data interactions. Likewise, the Java extraction code is
cleaner and faster, with no null-value exceptions being thrown.
XML is the lingua franca of modern computing. It still remains somewhat challenging to get XML data into and out of relational databases. Typically, recourse is made to one of the Java marshaling technologies, such as, JAXB, which allows for elegant definition of business objects in the form of XSD files. Coupled with code generation, these business objects can then be transformed into Java classes. The classes can in turn be transformed into entities suitable for persistence solutions.
So, how should you approach the problem of optional XML elements? One approach that many organizations use is to insert null values in the associated database columns. Is this good or bad? Is it necessary? Inserting null values in relational databases is a bad practice. This is true for legacy (that is, non-XML) data, and it is also true for XML data. Even when XML includes optional elements, it's still possible to avoid the need for null values. Regardless of how complex an application domain is, let longstanding design practices to guide your decision-making.
Instead of null values, it's a simple matter to define additional metadata that reflects the data semantics. So, for any columns that can potentially be null, you can just define an additional Boolean column. The additional column can have a (false) default value that is modified only when the partner column has a value, avoiding the need for nulls.
Learn
- XML and Java technologies: Data binding, Part 1: Code generation approaches -- JAXB and more: Generating data classes from DTDs or schemas (Dennis Sosnoski, developerWorks, January 2003): Read background on why you might want to use data binding for XML, plus an overview of available Java frameworks for data binding including JAXB.
- Practical data binding: Looking into JAXB, Part 1: Round-tripping and XML-to-Java translation (Brett McLaughlin, developerWorks, May 2004): Learn how JAXB handles class generation, and how that affects the XML input and output accepted by the API.
- Practical data binding: Looking into JAXB, Part 2: A closer look at JAXB's round-tripping capabilities (Brett McLaughlin, developerWorks, May 2004): Focus on how to use classes generated with the JAXB API, and examine JAXB's round-tripping capabilities. Also, learn the problem areas in converting from XML to Java code and then back again.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- developerWorks
podcasts: Listen to interesting interviews and discussions for software developers.
Get products and technologies
- Java Web Services Developer Pack 2.0: Download the developer pack to better understand the JAXB translation process shown in this article.
- IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- XML zone discussion forums: Participate in any of several XML-related discussions.
- developerWorks blogs: Check out these blogs and get involved in the developerWorks community.
Stephen Morris is an independent writer/consultant based in Ireland. Widely experienced in enterprise development and networking applications, Stephen has worked for some of the world's biggest networking companies on projects such as J2EE/J2SE-based network management systems, billing applications, financial systems, porting/developing SNMP entities, network device technologies, and several mobile computing applications. He holds a master's degree in computer science and holds three patents in the area of network management. He is the author of Moving Your Career Up the Value Chain: Building Specialized Development Skills in a Global Economy as well as numerous articles on network management and other topics.
Comments (Undergoing maintenance)





