Level: Introductory
Brett McLaughlin (brett@oreilly.com), Editor, O'Reilly and Associates
04 May 2004
Data binding, although commonplace in today's world of Java technology and XML programming, is still largely misunderstood. This column throws out all the theoretical claptrap and focuses on the concepts you need to get started with data binding. You will understand the differences between general data binding and data binding in the XML world, as well as round-tripping, semantic equivalence, and what to look for in a data binding package.
The fact that you're reading this column tells me that you're at least mildly interested in XML data binding. Just a short year ago, that would have led me to define what data binding is, go into the concepts involved, and generally be boring for several pages' worth of article text. However, it isn't a year ago -- it's 2004, and data binding seems to have sunk into the consciousness of almost every XML and Java™ developer working in the industry today. For reasons that are, to be honest, somewhat mystifying to me, even junior XML developers who don't know what SAX and DOM stand for are spending hours a day working on data binding. So I'm not as sure of the background of you, the person who clicked on this article title, as I once might have been.
However, this isn't necessarily a bad thing. Because there is so much information on data binding out there, and because that information has shaped how programmers work, you can get to some pretty fun stuff rather quickly, without spending weeks and weeks focusing on little conceptual details that don't help pay the bills (and put you to sleep while you're at it). Still, I should cover a few basics -- don't worry, though, it will be as painless as possible.
Another primer? Really?
If everyone is familiar with data binding, then why bother with any definition and concept at all? Why not just skip to the code? Well, that sounds appealing (to me as much as to you, I assure you). However, just because people think they know something doesn't mean that they do. In other words, while lots of people are throwing around the term "data binding", few of them actually understand what that term means, and even fewer get the basic issues associated with a solid data binding package. Just to make sure that everyone starts on the same page, this initial article will detail some important concepts that you've probably either never heard of, or only seen on an obscure Web page somewhere. I'll focus on practical application of these terms, and then dive right into a lot more code in the next article in this series. So hang in there -- this will be quick and easy.
Data binding in general
First, it's important to take a step back from XML data binding (which is where all your minds are running to -- admit it) and look at data binding in a more general sense. While most programmers think of data binding as the process of converting between an XML document and some Java programming structure, that's a specific application of the technology. Before opening up that particular can of worms, it's a good idea to look at the bigger picture.
Classical data binding
At its simplest, data binding is the process of taking some bit of data, such as in an XML document, text file, or database, and representing that data programmatically -- binding the data to some in-memory construct that you can throw around a virtual machine (VM) and operate upon. Similarly, data binding packages should allow you to then update the data in the underlying storage medium (that XML document, file, or database sector) with any changes you made to the value in your VM. If you can't do at least this much, then you don't have a data binding package -- at least not in the classical sense of the term.
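To make those two directions concrete, here is a minimal Java sketch, assuming a trivial storage medium: comma-separated text records held in a list. The names ClassicalBinding, load(), and save() are hypothetical and do not come from any real data binding package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: ClassicalBinding, load() and save() are illustrative names,
// not part of any real data binding package.
public class ClassicalBinding {

    // The in-memory construct that stored data is bound to.
    public static class Employee {
        public String firstName;
        public String lastName;

        public Employee(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Stand-in storage medium: one "firstName,lastName" record per entry.
    private final List<String> lines;

    public ClassicalBinding(List<String> records) {
        this.lines = new ArrayList<>(records);
    }

    // Storage -> memory: bind one stored record to an Employee object.
    public Employee load(int row) {
        String[] fields = lines.get(row).split(",", -1);
        return new Employee(fields[0], fields[1]);
    }

    // Memory -> storage: push changes made to the object back to the medium.
    public void save(int row, Employee employee) {
        lines.set(row, employee.firstName + "," + employee.lastName);
    }

    // Snapshot of the current storage contents.
    public List<String> records() {
        return new ArrayList<>(lines);
    }

    public static void main(String[] args) {
        ClassicalBinding storage = new ClassicalBinding(Arrays.asList("Bob,Smith"));
        Employee bob = storage.load(0);
        bob.lastName = "Jones";                // change the bound object in memory
        storage.save(0, bob);                  // write the change back to storage
        System.out.println(storage.records()); // [Bob,Jones]
    }
}
```

The point of the sketch is that both directions exist: a record is read into an object, changed in memory, and written back to the same place. Without save(), this would be a parser, not a data binding.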
Of course, that's all pretty boring, so it's much more common to see data binding packages that zero in on both a format medium and a programming language.
Extending data binding to Java technology
Because you're all cool Java programmers, I'll just conveniently narrow the focus to the Java programming language. Obviously, procedural languages can perform data binding tasks, but it's much easier to represent data in memory in an object-oriented language, particularly when the data has some structure to it, as in a database or XML document. The Java language turns out to be quite good at these tasks, which is probably why there is so much more interest (from what I see, at least) in Java data binding packages than in packages for other languages.
Once the programming language is narrowed to Java, it's possible to limit the other end of the data binding pipe -- the data format medium. First, take the rather simple case of text files. Text files rarely come up in data binding circles, mostly because Java -- especially with the Jakarta Commons package (see Resources) -- offers a wealth of string parsing utilities, so data binding is a bit of overkill in these cases.
Another common data format medium is a database. For the purpose of this article, I'll take both relational databases (RDBMS) and object-oriented databases (OODBMS) as one unit, rather than separately. When data binding is used in this case, it removes the need for a programmer to work with traditional database APIs, like JDBC or Enterprise JavaBeans (EJB). Instead, the manipulation of the database occurs behind the scenes, and is the work of the data binding API. It's here that you can begin to see why this discussion is important -- it reveals that one of data binding's advantages is its ability to shield the programmer from more complex APIs. It's obviously a lot easier to write code like that shown in Listing 1 than it is to work with JDBC directly.
Listing 1. Accessing a database with data binding APIs
// Get an instance of a data binding factory
Factory factory = DBFactory.newInstance();
factory.connect();
List employees = factory.unmarshal(DatabaseConstants.EMPLOYEE_TABLE);
// Manipulate data in employee objects
for (Iterator i = employees.iterator(); i.hasNext(); ) {
    Employee employee = (Employee)i.next();
    System.out.println("First name: " + employee.getFirstName());
    System.out.println("Last name: " + employee.getLastName());
    // etc...
}
Even more popular than binding to a database is using an API like this to work with XML and pull data out of XML documents. Here's where the problem comes in: Programmers have been so focused on data binding with XML, rather than data binding in general, that they often take a generic API and make it format-specific. For example, Listing 2 shows the equivalent code to Listing 1, as I've seen it used in many recent consulting gigs.
Listing 2. Accessing an XML document with data binding APIs
// Get an instance of a data binding factory
Factory factory = XMLFactory.newInstance();
factory.connect();
List employees = factory.unmarshal(XMLConstants.EMPLOYEE_DOCUMENT);
// Manipulate data in employee objects
for (Iterator i = employees.iterator(); i.hasNext(); ) {
    Employee employee = (Employee)i.next();
    System.out.println("First name: " + employee.getFirstName());
    System.out.println("Last name: " + employee.getLastName());
    System.out.println("Address (line 1): " +
        employee.getAttribute("street1"));
    System.out.println("Address (line 2): " +
        employee.getAttribute("street2"));
}
Note, in particular, the two getAttribute() calls -- here, XML semantics have been introduced into the actual data binding usage. This is a really bad idea, and one that should make you run around the room screaming. When you work with (and choose) a data binding API, look for a generic API, not a specific one. That's why I've just spent some of your valuable time walking through these various permutations of data binding APIs -- to show you that the specifics really are an implementation detail, not an API detail. If XML semantics or database semantics show up in your code, then you've probably got a data binding API that is poorly written, poorly implemented, or both. Changing from one data storage medium to another should be a piece of cake, with minimal effect on your code. Move away from being a Java and XML data binding programmer and begin to be a data binding programmer. You'll find that your code is more robust, just as functional, and more flexible than ever.
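The following Java sketch shows what a storage-neutral API looks like from the caller's side. EmployeeSource, XmlEmployeeSource, and RowEmployeeSource are hypothetical names; the row-based source is only a stand-in for a database, and its column names are made up. XML parsing uses the standard JAXP DOM API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StorageNeutral {

    // Storage-neutral data class: no XML or database concepts in its API.
    public static class Employee {
        private final String firstName;
        private final String street1;

        public Employee(String firstName, String street1) {
            this.firstName = firstName;
            this.street1 = street1;
        }

        public String getFirstName() { return firstName; }
        public String getStreet1() { return street1; }
    }

    // Hypothetical interface standing in for a generic data binding API.
    public interface EmployeeSource {
        List<Employee> loadEmployees();
    }

    // XML implementation: the only place attribute names like "street1" appear.
    public static class XmlEmployeeSource implements EmployeeSource {
        private final String xml;

        public XmlEmployeeSource(String xml) { this.xml = xml; }

        @Override
        public List<Employee> loadEmployees() {
            try {
                NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)))
                        .getElementsByTagName("employee");
                List<Employee> result = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element el = (Element) nodes.item(i);
                    result.add(new Employee(el.getAttribute("firstName"), el.getAttribute("street1")));
                }
                return result;
            } catch (Exception ex) {
                throw new IllegalStateException("Could not read employee XML", ex);
            }
        }
    }

    // Row-based implementation standing in for a database (column names are made up).
    public static class RowEmployeeSource implements EmployeeSource {
        private final List<Map<String, String>> rows;

        public RowEmployeeSource(List<Map<String, String>> rows) { this.rows = rows; }

        @Override
        public List<Employee> loadEmployees() {
            List<Employee> result = new ArrayList<>();
            for (Map<String, String> row : rows) {
                result.add(new Employee(row.get("FIRST_NAME"), row.get("STREET1")));
            }
            return result;
        }
    }

    // Client code: depends only on EmployeeSource, so the medium can be swapped.
    public static List<String> describe(EmployeeSource source) {
        List<String> lines = new ArrayList<>();
        for (Employee e : source.loadEmployees()) {
            lines.add(e.getFirstName() + " @ " + e.getStreet1());
        }
        return lines;
    }
}
```

Because describe() sees only EmployeeSource and plain getters, switching from XML to the row-based source changes one constructor call rather than the client loop; compare this with the getAttribute() calls in Listing 2.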
My API is too specific, but I'm stuck with it
It's possible that some of you are seeing the light here, but are stuck with a particular data binding API or implementation. You may have code like that shown in Listing 2, or something similar, that exposes data format structure in places where it doesn't belong. That's OK -- it's actually possible to do a little work that can result in the same benefits I've just described. In these cases, consider writing a wrapper API. A wrapper API sits between your developers and the data binding API (which presumably isn't doing its job very well). You can encapsulate the functionality that's problematic, such as getAttribute() or getChild() calls (which are XML-specific), by writing classes that call those methods but have more useful and storage-neutral method names. You should also consider putting a factory in place that allows you to easily move from an XML medium to a database medium (or to anything else you might one day need). This isn't a trivial task, but it's still manageable for even somewhat experienced developers. I won't show you how to write any wrapper APIs here, but it's something I plan to come back to later in this column.
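A wrapper along these lines might look like the following Java sketch. It assumes the XML-specific object you are stuck with is a standard DOM Element (your package's own classes would take its place), and all class and method names here are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmployeeWrappers {

    // Storage-neutral view that the rest of the code base programs against.
    public interface EmployeeView {
        String getFirstName();
        String getStreet1();
        String getCity();
    }

    // Wrapper over the XML-specific object: it delegates rather than copying data.
    static final class XmlEmployeeView implements EmployeeView {
        private final Element element;

        XmlEmployeeView(Element element) { this.element = element; }

        @Override public String getFirstName() { return element.getAttribute("firstName"); }
        @Override public String getStreet1() { return element.getAttribute("street1"); }
        @Override public String getCity() { return firstDescendantText("city"); }

        // Stand-in for a getChild()-style call: text of the first matching descendant, or null.
        private String firstDescendantText(String name) {
            NodeList matches = element.getElementsByTagName(name);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        }
    }

    // Factory: the single place that knows the medium; other media could be added here.
    public static EmployeeView fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new XmlEmployeeView(root);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Not a readable employee document", ex);
        }
    }
}
```

Because the wrapper delegates to the underlying element, existing code built on the specific API keeps working while new code moves to EmployeeView. A database-backed EmployeeView could later come from a second factory method without touching its callers.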
Important definitions
With an understanding of the value of a generic data binding API, you're already well along in learning how to choose and use data binding APIs properly. The last piece of the primer puzzle is to learn some definitions that you may not be familiar with. These are critical, especially when working with XML, to ensuring that your API behaves properly; knowing them can also save you frustration when the XML your API outputs doesn't look like the XML that went in.
Marshalling and unmarshalling
First, here are two terms that you should be familiar with if you've worked with data binding even a little bit. Review these and then I'll show you some more interesting terms.
Marshalling is the process of converting data in memory to a storage medium. So in a Java and XML environment, marshalling would involve converting some set of Java objects to an XML document (or documents). In a database environment, it would be pushing the Java-represented data into a database. The magic in marshalling, obviously, is converting the object-oriented structure of Java instances into a flattened structure suitable for XML, or a relational structure in an RDBMS (converting to an OODBMS is actually pretty trivial with Java technology).
Unmarshalling is the process of converting data from its storage medium into memory -- just the opposite of marshalling. So you would unmarshal an XML document into a Java VM. The complexity here isn't in flattening data, as that's not necessary, but in mapping the right data to the right variables in Java code. If the mapping is wrong, then you won't be able to access your data properly. That of course turns into a bigger problem when you try to marshal things back out, and the problem propagates quickly.
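As a hedged illustration of both directions, the Java sketch below hand-codes marshalling and unmarshalling for a single employee element using only JAXP (DOM parsing plus an identity Transformer). EmployeeBinding is a hypothetical class; a real data binding package would generate or infer this mapping rather than have you write it by hand.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EmployeeBinding {

    // In-memory representation; the int field shows type mapping during unmarshalling.
    public static class Employee {
        public final int id;
        public final String firstName;
        public final String lastName;

        public Employee(int id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Unmarshalling: map each attribute of the root element onto the matching Java field.
    public static Employee unmarshal(String xml) {
        try {
            Element el = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new Employee(Integer.parseInt(el.getAttribute("id")),
                    el.getAttribute("firstName"), el.getAttribute("lastName"));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot unmarshal employee", ex);
        }
    }

    // Marshalling: flatten the object's fields into an <employee> element and serialize it.
    public static String marshal(Employee e) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element el = doc.createElement("employee");
            el.setAttribute("id", Integer.toString(e.id));
            el.setAttribute("firstName", e.firstName);
            el.setAttribute("lastName", e.lastName);
            doc.appendChild(el);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot marshal employee", ex);
        }
    }
}
```

Even in this tiny case, the mapping is where things can go wrong: a misspelled attribute name in unmarshal() silently yields an empty string, and that error then propagates into whatever marshal() writes out.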
Round-tripping
Round-tripping is perhaps the most important, and most misunderstood, of data binding terms. Round-tripping is the term used to describe a complete cycle, from storage medium into memory and back. In an XML and Java technology environment, this would mean going from an XML document into Java instance variables, and then back to an XML document again. Proper round-tripping requires that the XML input and the XML output be identical, assuming that no data has been changed in the interim. In other words, the documents input.xml and output.xml in Listing 3 should be essentially the same document.
Listing 3. Round-tripping
// Get an instance of a data binding factory
Factory factory = XMLFactory.newInstance();
factory.connect();
List employees = factory.unmarshal("input.xml");
// Write the unchanged data straight back out
factory.marshal(employees, "output.xml");
If the input and output don't match up properly, then you've got a problem with round-tripping, and trying to actually work with your API could get tricky fast. If you can't rely on your API to properly preserve data while it passes through the memory of your VM, you're essentially dead in the water. In my next column, I'll begin to examine the round-tripping capabilities of Sun's JAXB API, to see if it does what it's supposed to.
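One way to test round-tripping is to push a document through unmarshal and marshal and compare what comes out with what went in. The Java sketch below assumes a hypothetical binding that maps only the id, firstName, and lastName attributes, and it compares only the root element's attributes, a deliberate simplification of a full document comparison. RoundTripCheck and its methods are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class RoundTripCheck {

    // Hypothetical binding that only maps these three attributes to Java fields.
    static final List<String> BOUND_FIELDS = Arrays.asList("id", "firstName", "lastName");

    // Unmarshal: keep only the attributes the binding knows about.
    static Map<String, String> unmarshal(String xml) {
        Element root = parse(xml).getDocumentElement();
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : BOUND_FIELDS) {
            if (root.hasAttribute(name)) {
                fields.put(name, root.getAttribute(name));
            }
        }
        return fields;
    }

    // Marshal: write the in-memory fields back out as an <employee> element.
    static String marshal(Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("employee");
            for (Map.Entry<String, String> f : fields.entrySet()) {
                root.setAttribute(f.getKey(), f.getValue());
            }
            doc.appendChild(root);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Clean round trip: the output root element carries exactly the input's attributes.
    public static boolean roundTripsCleanly(String inputXml) {
        String outputXml = marshal(unmarshal(inputXml));
        return attributesOf(inputXml).equals(attributesOf(outputXml));
    }

    // Attribute name/value pairs of the root element; HashMap equality ignores order.
    static Map<String, String> attributesOf(String xml) {
        NamedNodeMap attrs = parse(xml).getDocumentElement().getAttributes();
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            result.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        return result;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException(ex);
        }
    }
}
```

Give it an employee element that also carries an ssn attribute, as the employee in Listing 4 does, and roundTripsCleanly() returns false: the binding silently dropped data on its way through memory, which is exactly the failure described above.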
Semantic equivalence
Because XML is a textual format for data, it has some interesting quirks. For instance, the way that XML handles (and ignores) whitespace is a treatise unto itself. The existence of a DTD or schema affects how whitespace is treated; the use of CDATA sections affects entity processing (entity references inside a CDATA section are not expanded); and indentation in one editor can result in a document that looks like a complete mess in another editor.
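A quick way to see some of these quirks is to look at what a parser actually hands you. The Java sketch below uses the standard JAXP DOM parser with default settings (no DTD or schema); the class name XmlTextQuirks is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlTextQuirks {

    // Parse with default JAXP settings and return the root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    // Character data of the root element after the parser has done its work.
    public static String text(String xml) {
        return parse(xml).getTextContent();
    }

    // Number of child nodes; whitespace between elements shows up as text nodes.
    public static int childCount(String xml) {
        return parse(xml).getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(text("<a>x &lt; y</a>"));             // x < y (entity expanded)
        System.out.println(text("<a><![CDATA[x < y]]></a>"));    // x < y (no escaping needed)
        System.out.println(text("<a><![CDATA[x &lt; y]]></a>")); // x &lt; y (left as-is)
        System.out.println(childCount("<a><b/></a>"));           // 1
        System.out.println(childCount("<a>\n  <b/>\n</a>"));     // 3: text, element, text
    }
}
```

Without a DTD or schema, the parser has no way to know that the indentation is insignificant, so it keeps it as text nodes; that alone is enough to make two "identical" documents compare as different.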
As if that's not enough, XML has some very specific rules about attributes and their ordering. In XML documents, the ordering of attributes on an element is not significant. Between this non-ordering and whitespace handling, two documents that look completely different could actually contain the same data. In such cases, it's really inaccurate to say that the documents are "the same", or that they are "equal." This brings me to a new term: semantic equivalence. This means that from the standpoint of the data within the documents, and the structure involved, the documents are the same. While the documents may look different from each other, from a data standpoint they are identical. For example, take a look at Listing 4.
Listing 4. Simple, formatted XML document
<employees>
  <employee id="1045" ssn="498123049" firstName="Bob" lastName="Smith">
    <address>
      <street>109 Fairfield</street>
      <city>Mesquite</city>
      <state country="US" code="TX" />
      <zipCode>75150</zipCode>
    </address>
  </employee>
</employees>
This is a simple, nicely formatted XML document that's readable and easy to follow. Take a look at Listing 5, though, which isn't quite so enjoyable to peruse.
Listing 5. Not-so-pretty XML document
<employees><employee firstName="Bob" ssn="498123049" lastName="Smith" id="1045" >
<address> <street>109 Fairfield</street>
<city>Mesquite</city> <state code="TX" country="US"></state>
<zipCode>75150</zipCode></address>
</employee> </employees>
You're probably already seeing that these two listings contain the same data -- in fact, they are semantically equivalent. That's not too hard to determine, because these are simple cases. However, when you have XML documents with hundreds or thousands of records, and each element can have several attributes (all of which can appear in any order), semantic equivalence is harder to determine. It's also a critical part of the data binding framework -- remember that the last section talked about round-tripping. Proper round-tripping doesn't require that the input and output documents look the same; it just requires that they be semantically equivalent. So now figuring out if your API does proper round-tripping requires that you know the basics of XML, and ensure that those rules are being followed by your API.
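Checking semantic equivalence by eye does not scale, so here is a minimal Java sketch of an automated check built on the standard DOM API. It assumes that whitespace-only text between elements is insignificant, which holds for data-oriented documents like Listings 4 and 5 but not for mixed content, and it is not a full canonicalization: comments, processing instructions, and namespace prefixes are compared as-is. The class name SemanticEquivalence is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SemanticEquivalence {

    // True if both documents carry the same elements, attributes and text,
    // ignoring attribute order and whitespace-only text between elements.
    public static boolean equivalent(String xmlA, String xmlB) {
        Element a = canonicalRoot(xmlA);
        Element b = canonicalRoot(xmlB);
        // DOM Level 3 isEqualNode matches attributes by name, not by position.
        return a.isEqualNode(b);
    }

    static Element canonicalRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true); // turn CDATA sections into ordinary text
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            root.normalize(); // merge adjacent text nodes
            stripWhitespaceText(root);
            return root;
        } catch (Exception ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    // Simplification: removes every whitespace-only text node, which is only safe
    // when whitespace carries no meaning in the documents being compared.
    static void stripWhitespaceText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceText(child);
            }
            child = next;
        }
    }
}
```

Given two documents that differ only in attribute order, empty-element syntax, and whitespace between elements, equivalent() returns true; change any attribute or text value and it returns false. Running the same check on an API's input and output documents is a simple way to test its round-tripping.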
While some APIs offer all sorts of bells and whistles for customizing your output, you shouldn't have to use any of these options to ensure semantic equivalence. As with round-tripping, I'll examine JAXB's semantic equivalence processing in the next article, and see how the input and output documents differ in structure. I'll also examine any changes that take place from when a document is originally unmarshalled to when it is marshalled back into an output format.
Wrapping up
Some of you may feel that you've read through this entire article and it didn't tell you how to do anything. However, I think if you take the time to get a grasp of the basics of data binding -- and in particular round-tripping and semantic equivalence -- you'll find that you can program better, and eventually faster, when working with data binding APIs. This is also foundational material for what I'll be diving into in the next article -- the JAXB API. In the meantime, you'd do well to perform some simple tests using your own favorite API, to see how it (ideally) preserves semantic equivalence. So study up, and I'll see you next time.
Resources
- Visit the developerWorks "XML and Java technology" forum, hosted by Brett McLaughlin, for additional information on how to work with these two technologies.
- Browse for books on these and other technical topics.
- Obtain text parsing utilities from the Jakarta Commons package.
- Check out the JAXB API at the Glassfish Web site.
- Take a peek at the Castor data binding project.
- Try Zeus, a simple, albeit useful, case study in data binding.
- Investigate SAX's Web site to learn the basics of XML programming.
- Learn how to use JAXB to develop enterprise applications with WebSphere® Studio Application Developer V5.1 in this article by Tilak Mitra (developerWorks, February 2004).
- Find more data binding resources in the developerWorks XML and Java technology zones.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
About the author
Brett McLaughlin has been working in computers since the Logo days (remember the little triangle?). He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor to the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.