Level: Introductory
Kevin Williams (kevin@blueoxide.com), CEO, Blue Oxide Technologies, LLC
01 Mar 2003
One of the great features of XML is that you can easily reuse your designs all the way down to the component level. In this first installment of a three-part series, columnist Kevin Williams provides an overview of XML reuse in enterprise-level solutions, with examples in both XML and XML Schema.
In enterprise-level solutions, one of the most challenging problems facing XML designers is how to design structures that can be reused. In this column, I take a look at some of the historical approaches to reusing serialized data, and then show how XML allows you to break from tradition and take a more flexible approach to your document designs.
In the beginning: pre-XML reuse strategies
Before XML was created, serialized data typically took the form of flat files (setting aside SGML for now, as it was complex enough to be used only in specialized situations). These files could take any one of a number of forms, the most common being the repeating record approach. In these serialized representations, a sample of which is shown in Table 1, each record was defined to be a particular number of characters, and a separate document was required that described how each record looked.
Table 1. Sample flat-file description
| Starting position | Length | Name | Format | Description |
| --- | --- | --- | --- | --- |
| 1 | 30 | Name | string | The name of the customer. Right-padded with spaces. |
| 31 | 10 | Balance | numeric(10,2) | The customer's balance. Implied decimal. Left-padded with zeroes. |
| 41 | 6 | Due date | date | The customer's bill due date. MMDDYY. |
The output of this typical flat file looks like this:
Kevin Williams                0000010817031103Anne Yastremski               0000007723031303
Several things about this approach made it difficult to use. For instance, without a document describing the content, the file itself was difficult to comprehend. The example here isn't too challenging -- assuming that you know it was created near the beginning of 2003 -- but more complex files would be virtually impossible to parse without the supporting documentation. Also, any change to the source file would break any parser designed to read it. For instance, suppose I added two digits to the example to make the year four digits. Record length would grow from 46 characters to 48, and every parser would have to be modified to take this change into account.
Unless the receiving system happened to be directly compatible with the sending system (such as two COBOL systems using a PICS file to describe the data), there was no easy way to validate the document contents -- the fields essentially had to be validated by hand (that is, with custom code). Finally, the document design was one size fits all. If the receiving system just wanted to know the total outstanding balance across all customers, it would have to receive and discard many bytes of extraneous data.
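To make these problems concrete, here is a minimal Python sketch of a parser for the layout in Table 1. The names `LAYOUT`, `RECORD_LENGTH`, `parse_due_date`, and `parse_records` are illustrative, and the assumption that a two-digit year means 20YY is mine rather than part of the file description. Every offset is hard-coded, so widening the date field by two characters would mean changing both the layout and the record length in every such parser.

```python
from datetime import date
from decimal import Decimal

# Field layout taken from Table 1, converted to zero-based (start, length).
LAYOUT = {"name": (0, 30), "balance": (30, 10), "due_date": (40, 6)}
RECORD_LENGTH = 46  # 30 + 10 + 6


def parse_due_date(text):
    """Parse MMDDYY. Assumes, for illustration only, that YY means 20YY."""
    return date(2000 + int(text[4:6]), int(text[0:2]), int(text[2:4]))


def parse_records(data):
    """Split a flat file into fixed-length records and decode each field."""
    if len(data) % RECORD_LENGTH != 0:
        raise ValueError("data is not a whole number of 46-character records")
    records = []
    for offset in range(0, len(data), RECORD_LENGTH):
        raw = data[offset:offset + RECORD_LENGTH]
        fields = {key: raw[start:start + length]
                  for key, (start, length) in LAYOUT.items()}
        records.append({
            # Names are right-padded with spaces.
            "name": fields["name"].rstrip(),
            # The balance has two implied decimal places.
            "balance": Decimal(fields["balance"]) / 100,
            "due_date": parse_due_date(fields["due_date"]),
        })
    return records


# The two sample records, padded to the widths given in Table 1.
SAMPLE = ("Kevin Williams".ljust(30) + "0000010817" + "031103"
          + "Anne Yastremski".ljust(30) + "0000007723" + "031303")

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```

The only validation this sketch can offer is a length check; anything more (a sensible date, a non-negative balance) would be additional custom code, which is the point made above.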
Typical XML reuse strategies
With XML serialization, you can address many of the problems of flat files. In the XML arena, your description of the file's content becomes an XML Schema, and the file itself becomes an XML document, as shown in Listing 1.
Listing 1. Equivalent XML Schema and XML instance
<!-- XML Schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="customers">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="customer" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="customer">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="30"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="balance">
          <xsd:simpleType>
            <!-- xsd:decimal supports the totalDigits and fractionDigits facets -->
            <xsd:restriction base="xsd:decimal">
              <xsd:totalDigits value="10"/>
              <xsd:fractionDigits value="2"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <!-- xsd:date values use the form YYYY-MM-DD -->
        <xsd:element name="dueDate" type="xsd:date"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- XML instance -->
<customers>
  <customer>
    <name>Kevin Williams</name>
    <balance>108.17</balance>
    <dueDate>2003-03-11</dueDate>
  </customer>
  <customer>
    <name>Anne Yastremski</name>
    <balance>77.23</balance>
    <dueDate>2003-03-13</dueDate>
  </customer>
</customers>
Let's see how this approach addresses the problems with flat files that were identified earlier. First, XML documents are relatively self-describing, so even without the XML Schema document you could interpret this document with a fairly high degree of confidence. Next, the structured nature of the serialization makes it more tolerant of change. If you were to add a field, for example, a receiving system might be able to accept the new file without modification, depending on how its parser was written.
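As a rough illustration of that tolerance, the Python sketch below reads a customer list by element name using the standard library's `xml.etree.ElementTree`. The `email` element is a hypothetical added field, and `read_balances` is an illustrative name. Because the reader looks up only the children it needs, the extended document is accepted without a code change; a parser written to depend on child position or count would not be as forgiving.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A document shaped like the customer list instance.
ORIGINAL = """<customers>
  <customer>
    <name>Kevin Williams</name>
    <balance>108.17</balance>
    <dueDate>2003-03-11</dueDate>
  </customer>
</customers>"""

# The same document with a hypothetical <email> field added.
EXTENDED = """<customers>
  <customer>
    <name>Kevin Williams</name>
    <email>kevin@example.com</email>
    <balance>108.17</balance>
    <dueDate>2003-03-11</dueDate>
  </customer>
</customers>"""


def read_balances(xml_text):
    """Return {customer name: balance}, ignoring any other child elements."""
    root = ET.fromstring(xml_text)
    return {
        customer.findtext("name"): Decimal(customer.findtext("balance"))
        for customer in root.findall("customer")
    }


if __name__ == "__main__":
    print(read_balances(ORIGINAL))
    print(read_balances(EXTENDED))
```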
In addition, document validation is now built in because of the XML Schema. Any system that can understand XML Schema documents (and that's most systems these days) can validate the data against the constraints described there at parse time, with no additional code. Unfortunately, this design approach doesn't address the final concern: the one-size-fits-all problem. Most XML schema design efforts to date have mimicked the approach taken by flat-file designers in the past: build a single structure and attempt to use that structure in all situations. If a serialization contains information that you're not interested in, you have to discard that information -- chewing up valuable bandwidth and processor time. If you want a different serialization, then the designers go back to the drawing board and create a new structure that requires an entirely different parser.
However, the hierarchical nature of XML allows you to adopt a different approach to XML design -- component-level reuse, which I look at next.
Component-level reuse: an overview
Think of an XML document as a set of nested containers. The outermost container is the root element. All of its children appear as containers inside it, and so on, until the innermost containers hold only text values. (Attributes can be thought of as labels on the specific containers describing their contents, but let's not stretch the metaphor too far right now.) For example, you might have the structures shown in Figures 1 and 2, each of which contains a customer element:
Figure 1. Customer list example
Figure 2. Invoice example
At first glance, these structures seem incompatible. They describe completely different data concepts (one describes a customer list, the other a single invoice) and contain different information. However, if you build two completely separate structures for these two documents you miss a reuse opportunity. You may be able to reuse the customer element in both of these documents. To do so, however, the two customer elements have to be syntactically and semantically identical. Not only do they have to have the same contents, but the contents must also have the same meaning each time they appear.
Great, you might say, so I've shared the customer element. What's the advantage to doing so? Component-level reuse has several benefits. Let's briefly touch on three of them.
Benefit 1
First, the nature of XML and XML parsers makes it very easy to treat your containers -- or elements -- as black boxes. In other words, you can take an element that you know is reused somewhere else and copy it without worrying about the contents of that element. Let's say, for instance, I have the customer list document shown earlier in Listing 1 and I want to create an invoice for a particular customer. Once I've identified that customer's element in the customer list (by matching on the customer ID or name, for example), I can easily copy that element and all of its children and add it to my invoice document. I don't need to know what other information is embedded in the customer element. The customer element could include detailed address information, a summary of the customer's orders over the past year, or even the customer's favorite flavor of ice cream. To the code, it doesn't matter. Sharing elements leads to simpler, more manageable code, because the contents of the customer element don't have to be deserialized and reserialized to move from one document to the next.
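The black-box copy described above might look like the following Python sketch, again using `xml.etree.ElementTree`. The invoice layout (an `invoice` root with a `number` attribute) and the function name `start_invoice` are assumptions made for illustration, since the article does not define the invoice structure in markup. The code matches on the customer name and then copies the whole element without inspecting its other children.

```python
import copy
import xml.etree.ElementTree as ET

CUSTOMER_LIST = """<customers>
  <customer>
    <name>Kevin Williams</name>
    <balance>108.17</balance>
    <favoriteIceCream>pistachio</favoriteIceCream>
  </customer>
  <customer>
    <name>Anne Yastremski</name>
    <balance>77.23</balance>
  </customer>
</customers>"""


def start_invoice(customer_list_xml, customer_name, invoice_number):
    """Create a (hypothetical) invoice element containing a copied customer."""
    customers = ET.fromstring(customer_list_xml)
    for customer in customers.findall("customer"):
        if customer.findtext("name") == customer_name:
            invoice = ET.Element("invoice", {"number": invoice_number})
            # Deep-copy the element and all of its children as a unit;
            # the code never looks at what else the customer contains.
            invoice.append(copy.deepcopy(customer))
            return invoice
    raise LookupError("no customer named " + customer_name)


if __name__ == "__main__":
    invoice = start_invoice(CUSTOMER_LIST, "Kevin Williams", "1001")
    print(ET.tostring(invoice, encoding="unicode"))
```

The `favoriteIceCream` child travels along with the rest of the element even though nothing in `start_invoice` mentions it.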
Benefit 2
Second, typical approaches to XML presentation benefit greatly when elements are reused. Because XSLT operates on a per-element basis, guaranteeing that an element is syntactically and semantically equivalent across different source documents enables you to reuse XSLT style sheet fragments. For instance, suppose that whenever I display customer information on my Web site I want the customer name to be bold, the address to be italicized, and so on. If I store this processing code in one place (say, as a template in a customer.xslt file), I can use xsl:include in my style sheets everywhere I need to display the customer information. Then, when the product manager demands that I change the customer name from black text to navy, I can change the customer.xslt file and the change will automatically apply everywhere a customer appears on a Web page.
Benefit 3
Third, reusing components in XML designs allows you to reuse serialization code. Since I now know that the customer element will be the same in each of my target documents, I can create a Customer object with a serialize method that returns an XML document fragment. Then whenever I need a customer to appear in an XML document (for instance, as part of my invoice or as part of my customer list), I can use the same code to build the necessary element. This approach reduces code redundancy and greatly simplifies troubleshooting and upgrades.
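A minimal Python sketch of this idea follows. The `Customer` class, its field names, and the two document-building functions are illustrative assumptions, as is the invoice layout. The point is only that both documents obtain their customer element from the same `serialize` method.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    balance: str
    due_date: str

    def serialize(self):
        """Return this customer as a reusable <customer> element."""
        element = ET.Element("customer")
        ET.SubElement(element, "name").text = self.name
        ET.SubElement(element, "balance").text = self.balance
        ET.SubElement(element, "dueDate").text = self.due_date
        return element


def customer_list_document(customers):
    """Build a <customers> document from any number of customers."""
    root = ET.Element("customers")
    for customer in customers:
        root.append(customer.serialize())
    return root


def invoice_document(customer, number):
    """Build a (hypothetical) <invoice> document for one customer."""
    root = ET.Element("invoice", {"number": number})
    root.append(customer.serialize())
    return root


if __name__ == "__main__":
    kevin = Customer("Kevin Williams", "108.17", "2003-03-11")
    print(ET.tostring(customer_list_document([kevin]), encoding="unicode"))
    print(ET.tostring(invoice_document(kevin, "1001"), encoding="unicode"))
```

If the customer structure later gains a field, only `serialize` changes, and every document that includes a customer picks up the new structure.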
Conclusion
In this column, I looked at how you can reuse XML designs not only at the document level, but also down to the component level. I showed you advantages of this component-level reuse, and how it can simplify code and shorten development cycles. In the next column, I'll identify the different types of reusable components in XML document designs and show you some practical examples of each.
About the author
Kevin Williams is the CEO of Blue Oxide Technologies, LLC, a company that designs XML and Web service creation software. Visit their Web site at http://www.blueoxide.com. Kevin can be reached for comment at kevin@blueoxide.com.