Skip to main content

Optional XML in relational databases, Part 1: Are null values needed?

Plan for optional XML elements and avoid null values in your database

Stephen B Morris, CTO, Omey Communications
Stephen Morris is an independent writer/consultant based in Ireland. Widely experienced in enterprise development and networking applications, Stephen has worked for some of the world's biggest networking companies on projects such as J2EE/J2SE-based network management systems, billing applications, financial systems, porting/developing SNMP entities, network device technologies, and several mobile computing applications. He holds a master's degree in computer science and holds three patents in the area of network management. He is the author of Moving Your Career Up the Value Chain: Building Specialized Development Skills in a Global Economy as well as numerous articles on network management and other topics.

Summary:  Optional XML elements—for example, online forms—are special in that they might or might not appear in XML files. Representing such optional elements is a challenge both in programming languages and in databases. In many cases, null values are inserted in databases to represent empty optional elements, but null database XML values can be difficult to retrieve. Discover alternative, less-invasive approaches to handling optional XML elements without the need for data overloading or inserting null or other sentinel values.

Date:  09 Jun 2009
Level:  Intermediate PDF:  A4 and Letter (37KB | 9 pages)Get Adobe® Reader®
Activity:  3385 views

With the fall in the price-performance ratio of storage, there's really no need to throw data away—ever. The result is a growing mountain of data and, because nothing in technology is really free (including open source), the need to manage the data arises. This problem of the data mountain is only beginning to affect mainstream users, but it has been around for years in large data-centric service provider organizations. This article looks at the problem of relational data management—specifically, XML data management.

Frequently used acronyms

  • JAR: Java Archive
  • XML: Extensible Markup Language
  • XSD: XML Schema Infoset Model

Managing data requires specific skills in many areas, such as backup and restore, database design, and choice of service provider. Should backups occur every day? What approach should you take in relation to database design? Should you use natural keys or application-generated primary keys? With regard to service providers, should data be stored remotely or on site?

Although these are all important questions, you'll look at a very specific issue—one that has important repercussions on data design, data retrieval, and client code management. This is the problem of representing optional XML elements in relational databases. As you'll see, the solution adopted for modeling optional XML elements has far-reaching consequences.

Your road map

You'll see a number of technologies in this article. To make the topic progression clear, here's a road map of coverage:

  • Brief review of XSD and XML
  • Bridging the XML and Java™ worlds with Java Architecture for XML Binding (JAXB)
  • A simple engine to publish the Java class data
  • A database entity class
  • Solving the problem of optional XML data

Start with some data definition.

An optional XML element in XSD

Listing 1 provides an excerpt from an XSD document that specifies an optional element called comment. This XSD document defines a purchase order. Often called a business object, you can think of such a purchase order as an important entity in the application domain. In other words, users of the system might typically interact with such business objects—to create purchase orders, modify existing purchase orders, and so on. The important point is that Listing 1 describes the starting point for modeling a typical business object.


Listing 1. An XSD type definition with an optional element
	
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="EuropeanAddress"/>
    <xsd:element name="billTo" type="EuropeanAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

In the context of XSD, what is an optional XML element? Well, an optional element is simply one that can occur zero or more times in the associated XML document. Don't worry about the relationship between XSD and XML documents: XSD is simply a format that allows you to define the structure and types that appear in an XML document. Basically, each xsd:element in Listing 1 defines a placeholder for an item in an XML document. Let's make this more concrete by introducing another technology: JAXB.

JAXB: a bridge between XML and Java technology

It's not controversial to describe XML as the lingua franca of modern computing. But, on its own, XML is a bulky and not very readable format. So, wouldn't it be nice to be able to translate from XML into Java code? Fortunately, you can do this several ways, and the one you'll look at is JAXB.

JAXB uses an XSD file in conjunction with a sample XML file to generate Java classes. The generated Java classes then represent the XSD elements. Sound difficult? Well, it isn't, and the following section looks at an example of a full translation. If you want to follow along with the steps described, you'll need to install the Sun Microsystems Java Web Services Developer Pack (see Resources). I used version 2.0.

Follow the installation instructions, and pay particular attention to the steps about creating a directory called %JAVA_HOME%\jre\lib\endorsed and copying JAR files to that location. This step is required to override the JAXB classes built into the Java software development kit (JDK) version 1.5.x.

XML transformation into Java code

Listing 2 illustrates an XML file that you can think of as an instantiation of the XSD document from Listing 1.


Listing 2. An example XML document
	
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="IE">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Cambridge</city>
        <postcode>12345</postcode>
    </shipTo>
    <billTo country="IE">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Cambridge</city>
        <postcode>12345</postcode>
    </billTo>
    <items>
        <item partNum="242-NO" >
            <productName>Nosferatu - Special Edition (1929)</productName>
            <quantity>5</quantity>
            <USPrice>19.99</USPrice>
        </item>
        <item partNum="242-MU" >
            <productName>The Mummy (1959)</productName>
            <quantity>3</quantity>
            <USPrice>19.98</USPrice>
        </item>
        <item partNum="242-GZ" >
            <productName>Godzilla and Mothra: Battle for Earth
/Godzilla versus 
                    King Ghidora</productName>
            <quantity>3</quantity>
            <USPrice>27.95</USPrice>
        </item>
    </items>
</purchaseOrder>

So, the problem definition is simple: you want to transform the contents of Listing 2 into a set of related Java classes. This process is straightforward using JAXB and Apache Ant:

ant compile -Djwsdp.home=C:\Sun\jwsdp-2.0

This command creates a set of new folders and Java class files. The Java classes relate to the elements in the XSD file. It's now time to put these classes to work.

A simple engine to publish the Java class data

You now need to put the generated classes together in a simple program. Listing 3 illustrates the key part of the program modified from one of the examples bundled with the Java Web Services Developer Pack.


Listing 3. The JAXB program
	
JAXBElement<?> poElement =
   (JAXBElement<?>)u.unmarshal(new FileInputStream("po.xml"));
   PurchaseOrderType po = (PurchaseOrderType)poElement.getValue();

   // examine some of the content in the PurchaseOrder
   System.out.println("Ship the following items to: ");

  // display the shipping address
  EuropeanAddress address = po.getShipTo();
  displayAddress(address);

  // display the items
  Items items = po.getItems();
  displayItems(items);

In Listing 3, you unmarshall an XML file called po.xml into an instance of a class called PurchaseOrderType. You then dig into this class and extract the constituent elements. You do all this using the JAXB-generated classes. Once the automatically generated classes are created, you can use them as follows:

ant run -Djwsdp.home=C:\Sun\jwsdp-2.0

Listing 4 shows the program output from this command.


Listing 4. The transformation and program output
	
ant run -Djwsdp.home=C:\Sun\jwsdp-2.0

compile:
     [echo] Compiling the schema...
      [xjc] C:\java\ibmcode\unmarshal-read\gen-src\primer.po is not found and thus
excluded from the dependency check
      [xjc] Compiling file:/C:/java/ibmcode/unmarshal-read/po.xsd
      [xjc] Writing output to C:\java\ibmcode\unmarshal-read\gen-src
     [echo] Compiling the java source files...
    [javac] Compiling 4 source files to C:\java\ibmcode\unmarshal-read\classes

run:
     [echo] Running the sample application...
     [java] Ship the following items to:
     [java]     Alice Smith
     [java]     123 Maple Street
     [java]     Cambridge null
     [java]     IE
     [java]
     [java]     5 copies of "Nosferatu - Special Edition (1929)"
     [java]     3 copies of "The Mummy (1959)"
     [java]     3 copies of "Godzilla and Mothra: Battle for Earth/Godzilla versus
King Ghidora"

BUILD SUCCESSFUL
Total time: 9 seconds

Notice how Listing 4 reproduces the data in Listing 2 thanks to JAXB.

You've managed to represent the domain data in a pair of XSD and XML files. You used the XSD file in conjunction with JAXB to create a suite of associated Java classes. You then used a simple Java program to manipulate the generated classes in conjunction with the XML file and to display the contents of the latter. Now what?

Recall from the XSD file that you also generated optional XML data. The next step in the roadmap is to persist the XML data to a relational database. This sounds difficult, but using the Java Persistence application programming interface (JPA) and Hibernate, this task proves relatively easy.

A database entity class

Listing 5 illustrates the most important part of a new Java class called PurchaseOrder.


Listing 5 The PurchaseOrder Java class
	
    @Id @GeneratedValue
    @Column(name = "PO_ID")
    private Long id;

    @Embedded
    private ShippingAddress shippingAddress;
    @Embedded
    private BillingAddress billingAddress;

    @Column(name = "COMMENT")
    private String comment;
    @Column(name="COMMENT_ENTERED", columnDefinition="boolean default false")
    private boolean commentEntered;

In Listing 5, you have a standard primary key column called PO_ID (that is, purchase order ID). This column is followed by two address types: one for shipping and the other for billing. Next comes the interesting part: the COMMENT database column, which represents the optional XML element.

Notice the last column in Listing 5: the COMMENT_ENTERED column. This is a Boolean-type column and reflects the user actions as follows:

  • If the user enters a comment, then COMMENT_ENTERED is True.
  • If the user doesn't enter a comment, then COMMENT_ENTERED is False.

This is in fact the solution to the original problem of the optional XML element! What's so special about this? To answer this, you need to know a bit about null values in relational databases.

No need for null columns

A general, good practice is not to allow null (that is, no value) columns in a relational database, because when you attempt to read such a column, you get a null value exception. This in turn means that your Java extraction code must catch this exception. The result is complex Java code with no real business benefit.

Take the example above of the COMMENT column. If no comment is associated with a given purchase order object, you might expect this column to be null after persisting the entity. This means that your Java extraction code must expect the possibility of a null value (and an associated exception). In effect, such a null value is a type of data column overloading, and in this context, a null value might be understood to represent some aspect of the data. There's simply no need to use null values in this way! As illustrated above, an additional Boolean column avoids the problem altogether.

So, before you read the COMMENT column in your Java extraction code, you can check the value of the COMMENT_ENTERED column and only then attempt your read. Where's the benefit? The main benefit is simplicity. The data definition is simpler, because it reflects the user-data interactions. Likewise, the Java extraction code is cleaner and faster, with no null-value exceptions being thrown.

Conclusion

XML is the lingua franca of modern computing. It still remains somewhat challenging to get XML data into and out of relational databases. Typically, recourse is made to one of the Java marshaling technologies, such as, JAXB, which allows for elegant definition of business objects in the form of XSD files. Coupled with code generation, these business objects can then be transformed into Java classes. The classes can in turn be transformed into entities suitable for persistence solutions.

So, how should you approach the problem of optional XML elements? One approach that many organizations use is to insert null values in the associated database columns. Is this good or bad? Is it necessary? Inserting null values in relational databases is a bad practice. This is true for legacy (that is, non-XML) data, and it is also true for XML data. Even when XML includes optional elements, it's still possible to avoid the need for null values. Regardless of how complex an application domain is, let longstanding design practices to guide your decision-making.

Instead of null values, it's a simple matter to define additional metadata that reflects the data semantics. So, for any columns that can potentially be null, you can just define an additional Boolean column. The additional column can have a (false) default value that is modified only when the partner column has a value, avoiding the need for nulls.


Resources

Learn

Get products and technologies

Discuss

About the author

Stephen Morris is an independent writer/consultant based in Ireland. Widely experienced in enterprise development and networking applications, Stephen has worked for some of the world's biggest networking companies on projects such as J2EE/J2SE-based network management systems, billing applications, financial systems, porting/developing SNMP entities, network device technologies, and several mobile computing applications. He holds a master's degree in computer science and holds three patents in the area of network management. He is the author of Moving Your Career Up the Value Chain: Building Specialized Development Skills in a Global Economy as well as numerous articles on network management and other topics.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML
ArticleID=394080
ArticleTitle=Optional XML in relational databases, Part 1: Are null values needed?
publish-date=06092009
author1-email=stephenbjm@yahoo.com
author1-email-cc=dwxed@us.ibm.com

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Rate a product. Write a review.

Special offers