Over the past 10 years, XML has become a common, well-accepted standard for storing and exchanging data within and between organizations. Because XML itself is only an abstraction, success depends entirely on the specific XML format that an organization, or a group of organizations, designs. Just like any software product, these XML formats face maintenance challenges as business requirements change. And it's not just the need for changes in general: XML formats must often be updated for multiple organizations at the same time for competitive and market reasons.
Maintaining just one XML schema is relatively simple. Making a change that affects several hundred organizations, however, can have an immense impact. You can spend plenty of time and money just to address a simple XML schema change that, had the format been designed correctly up front, would cause little more than an extra glance at the information. This article discusses two things:
- How to manage this impact
- How to minimize this impact where possible
We use some very simplistic examples containing cars, tires, and windscreens and their related companies or resellers. Although not completely realistic, this is sufficient to describe suggestions to improve the maintainability of XML formats.
To get started, let's use a Volvo C30 with Michelin tires as an example in which an XML file is created to share information about the tires. Listing 1 provides this example.
Listing 1. A simple XML example file for sharing information about tires
<car>
<brand>Volvo</brand>
<type>C30</type>
<kind>Small family car</kind>
<tires>
<tire>
<brand>Michelin</brand>
<type>Winter</type>
<count>4</count>
</tire>
<tire>
<brand>Michelin</brand>
<type>Spare</type>
<count>1</count>
</tire>
</tires>
<windscreen count="1">
<brand>Car glass</brand>
</windscreen>
</car>
The XML file looks pretty simple, doesn't it? At a glance, you might not see any problems. Look deeper, though. The actual problems are in the XML schema. It turns out to be fairly large and monolithic. And then, consider that this XML format has only a few element types. Imagine how this might grow for a truly realistic example, as in Listing 2.
Listing 2. An XML schema describing the simple XML format
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="car">
<xs:complexType>
<xs:complexContent>
<xs:extension base="brand">
<xs:sequence>
<xs:element ref="type"/>
<xs:element ref="kind"/>
<xs:element ref="tires"/>
<xs:element ref="windscreen"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:element>
<xs:element name="kind" type="xs:string"/>
<xs:element name="tires">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="tire"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="tire">
<xs:complexType>
<xs:complexContent>
<xs:extension base="brand">
<xs:sequence>
<xs:element ref="type"/>
<xs:element ref="count"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:element>
<xs:element name="count" type="xs:integer"/>
<xs:element name="windscreen">
<xs:complexType>
<xs:complexContent>
<xs:extension base="brand">
<xs:attribute name="count" use="required" type="xs:integer"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:element>
<xs:complexType name="brand">
<xs:sequence>
<xs:element ref="brand"/>
</xs:sequence>
</xs:complexType>
<xs:element name="brand" type="xs:string"/>
<xs:element name="type" type="xs:NCName"/>
</xs:schema>
Now, before you say, "What's the big deal? Nobody looks at the XSD file," think about changes that might be required during the course of business. What if the tire XML needed to change to display the size of the tires, like this:
[...]
<tire>
<brand>Michelin</brand>
<type>Winter</type>
<count>4</count>
<size>20"</size>
</tire>
[...]
A simple change like this means that the windscreen company gets a new XSD when anything changes in the tire format. Plus, it needs to modify its software to understand the XSD. That's not good. It adds extra work and expense for the windscreen company. The tire companies, too, receive a new XSD and might also need to update software to handle it, depending upon how the company writes its software. Perhaps nobody reads the XSD, but it sure can give big headaches to a lot of people.
To avoid a monstrous XSD file, the solution is to give the tire and windscreen their own namespaces and separate XSD files. Listing 3 shows how to do that.
Listing 3. The example XML file modified to contain namespaces
[...]
<tr:tires>
<tr:tire count="4">
<tr:brand>Michelin</tr:brand>
<tr:type>Winter</tr:type>
</tr:tire>
<tr:tire count="1">
<tr:brand>Michelin</tr:brand>
<tr:type>Spare</tr:type>
</tr:tire>
</tr:tires>
<wnd:windscreen count="1">
<wnd:brand>Car glass</wnd:brand>
</wnd:windscreen>
[...]
If you do that and leave the rest of the car XML example intact, the XSD file becomes much shorter and easier to deal with. Most of the relevant information moves to other XSDs that are imported at the top of the car XSD. Now, when you look at the XSD file, it is organized in modules, like Listing 4.
Listing 4. The modified car XML schema importing separate XSDs for tires and windscreens
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
targetNamespace="http://car.org/car"
xmlns:tr="http://car.org/tire"
xmlns:wnd="http://car.org/windscreen"
xmlns:car="http://car.org/car">
<xs:import namespace="http://car.org/tire" schemaLocation="tr.xsd"/>
<xs:import namespace="http://car.org/windscreen" schemaLocation="wnd.xsd"/>
<xs:element name="car">
<xs:complexType>
<xs:sequence>
<xs:element ref="car:brand"/>
<xs:element ref="car:type"/>
<xs:element ref="car:kind"/>
<xs:element ref="tr:tires"/>
<xs:element ref="wnd:windscreen"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="brand" type="xs:NCName"/>
<xs:element name="type" type="xs:NCName"/>
<xs:element name="kind" type="xs:string"/>
</xs:schema>
Now, this is just one module of the XSD. The XSD file didn't really shrink as much as it appears here, but the more important point is that the XSDs are now modular and much easier to work with. As a result, you can change something in the tire format and distribute it to all tire companies without bothering windscreen companies with a change that doesn't affect them.
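The imported tr.xsd is not shown in this article. As a minimal sketch, assuming the element names from Listing 3 and the namespace URI from Listing 4, the tire module might look something like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical tr.xsd: a sketch of the tire module imported by the car XSD -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    targetNamespace="http://car.org/tire"
    xmlns:tr="http://car.org/tire">
  <xs:element name="tires">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="tr:tire"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="tire">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tr:brand"/>
        <xs:element ref="tr:type"/>
      </xs:sequence>
      <!-- Unqualified count attribute, matching the instance in Listing 3 -->
      <xs:attribute name="count" use="required" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="brand" type="xs:string"/>
  <xs:element name="type" type="xs:NCName"/>
</xs:schema>
```

Everything a tire company cares about now lives in this one file, so a change to the tire format touches tr.xsd and nothing else.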
Positive side effects of modules
Modularity does much more than just help with the management of distributed maintainability problems. It also improves the reusability of single elements. For example, suppose that the tire company produces tires for bicycles as well as for cars. The tire company might want to use the same tire XSD file to describe its bicycle tires. The bicycle company that buys these tires, however, does not want an XML schema describing a car, because most of the elements are irrelevant to a bicycle company. A common bicycle does not have a windscreen, for example. The bicycle companies want their own bicycle XSD that simply imports the tire XSD.
In that scenario, the XML might look like Listing 5.
Listing 5. An example XML file for a bicycle description reusing the tire XML format
<bicycle>
[...]
<tr:tire count="2">
<tr:brand>Gazelle</tr:brand>
<tr:type>Race</tr:type>
<tr:size>25"</tr:size>
</tr:tire>
[...]
</bicycle>
This is similar to the car XSD, in which the tire was imported, but it's specific to bicycles. Now, go back to the car for the next example.
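A bicycle XSD along these lines could be sketched as follows. The file name bicycle.xsd and the http://bicycle.org/bicycle namespace are assumptions made for illustration; only the tire namespace and tr.xsd come from the earlier listings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical bicycle.xsd; the bicycle namespace URI is an assumption -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    targetNamespace="http://bicycle.org/bicycle"
    xmlns:tr="http://car.org/tire">
  <!-- Only the tire module is imported; no car or windscreen XSD is needed -->
  <xs:import namespace="http://car.org/tire" schemaLocation="tr.xsd"/>
  <xs:element name="bicycle">
    <xs:complexType>
      <xs:sequence>
        <!-- Bicycle-specific elements, elided in Listing 5, would be declared here -->
        <xs:element ref="tr:tire" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```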
Managing the reality of modules
Eventually, the car XSD not only imports a windscreen XSD and a tire XSD but also a motor XSD, a steering wheel XSD, a chair XSD, a paint XSD, and so on. This becomes difficult to manage. The solution to this problem is to include a separate XSD called parts.xsd that imports all the parts of the car. This way, when anything changes in the import list, only the parts.xsd file specifically meant to manage these dependencies changes. Listing 6 shows the entire example.
Listing 6. The car XML schema replacing the list of imports with one include
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
targetNamespace="http://car.org/car"
xmlns:tr="http://car.org/tire"
xmlns:wnd="http://car.org/windscreen"
xmlns:car="http://car.org/car">
<xs:include schemaLocation="parts.xsd"/>
<xs:element name="car">
<xs:complexType>
<xs:sequence>
<xs:element ref="car:brand"/>
<xs:element ref="car:type"/>
<xs:element ref="car:kind"/>
<xs:element ref="tr:tires"/>
<xs:element ref="wnd:windscreen"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="brand" type="xs:NCName"/>
<xs:element name="type" type="xs:NCName"/>
<xs:element name="kind" type="xs:string"/>
</xs:schema>
The parts.xsd file looks like Listing 7.
Listing 7. parts.xsd: the list of imports to be included
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
targetNamespace="http://car.org/car"
xmlns:tr="http://car.org/tire"
xmlns:wnd="http://car.org/windscreen"
xmlns:car="http://car.org/car">
<xs:import namespace="http://car.org/tire" schemaLocation="tr.xsd"/>
<xs:import namespace="http://car.org/windscreen" schemaLocation="wnd.xsd"/>
</xs:schema>
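The practical advantage of using parts.xsd is that the list of XSD imports does not need to be copied into every XSD file that references these parts. Instead, one inclusion is sufficient. Be aware that the XML Schema specification expects each schema document to import the namespaces it references, so a strict validator might reject the references to tr:tires and wnd:windscreen in Listing 6. Test this approach with the processors that you and your partners use.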
As the number of elements and parts grows in a real-world problem, you also have more reason to be careful with your type definitions. For example, it is common, but risky, for XML formats to use unqualified attributes (that is, attributes that do not specifically belong to one namespace), as Listing 8 shows.
Listing 8. An XML example with unqualified count attributes
<tr:tires>
<tr:tire count="4">
<tr:brand>Michelin</tr:brand>
<tr:type>Winter</tr:type>
</tr:tire>
<tr:tire count="1">
<tr:brand>Michelin</tr:brand>
<tr:type>Spare</tr:type>
</tr:tire>
</tr:tires>
<wnd:windscreen count="1">
<wnd:brand>Car glass</wnd:brand>
</wnd:windscreen>
If you select elements based on unqualified attributes, it becomes difficult to predict your results. For example, if you select all elements that have a count attribute whose value is 1 with the following XPath query, you cannot easily predict which namespaces the resulting elements belong to:
//*[@count = 1]
If you caught it, you actually can predict the namespaces, because the example is simple enough. But the point is, something like this can quickly grow out of proportion. The solution is to qualify each attribute with its specific namespace. Look at the subtle difference in the XML example in Listing 9.
Listing 9. An XML example with qualified count attributes
[...]
<tr:tires>
<tr:tire tr:count="4">
<tr:brand>Michelin</tr:brand>
<tr:type>Winter</tr:type>
</tr:tire>
<tr:tire tr:count="1">
<tr:brand>Michelin</tr:brand>
<tr:type>Spare</tr:type>
</tr:tire>
</tr:tires>
<wnd:windscreen wnd:count="1">
<wnd:brand>Car glass</wnd:brand>
</wnd:windscreen>
[...]
Luckily, this also means a very subtle difference in the XML schema, as you can see in Listing 10.
Listing 10. The tire XML schema modified for qualified count attributes
[...]
<xs:element name="tire">
<xs:complexType>
<xs:sequence>
<xs:element ref="tr:brand"/>
<xs:element ref="tr:type"/>
</xs:sequence>
<xs:attribute name="count" use="required" form="qualified" type="xs:integer"/>
</xs:complexType>
</xs:element>
[...]
The effects might not be dramatic, but they are noticeable. It's a good habit to prevent possible conflicts whenever you can.
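With qualified attributes, a query can name the namespace it means. The following is a minimal XSLT 1.0 sketch, not part of the original example set, that assumes the tr prefix is bound to http://car.org/tire as in Listing 4. Run against a document like Listing 9, it should list only the spare tire element and skip the windscreen, because wnd:count lives in a different namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tr="http://car.org/tire">
  <xsl:output method="text"/>
  <!-- Select only elements whose count attribute is in the tire namespace -->
  <xsl:template match="/">
    <xsl:for-each select="//*[@tr:count = 1]">
      <!-- Print the qualified name of each matching element on its own line -->
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```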
Designing a more general format
When you use a relational database as a storage back end, the tendency is to make a one-to-one translation of the database tables or fields into XML constructs. Doing so not only limits modeling freedom but also burdens other connecting parties with the limitations of the database, even when they do not use this database. The XML in Listing 1 is a good example of such a one-to-one translation from a relational database.
Maybe you want your document to be modeled differently, however. You might, for instance, want a front-to-back layout of the car to make an automated drawing of all the parts. This could be described in the XML example shown in Listing 11, which you can use in an XSLT transformation to Scalable Vector Graphics (SVG). This is not extremely difficult, even though it might appear so at first glance. Here's how to create the XML file:
Listing 11. An XML example in an alternative structure
<car>
<brand>Volvo</brand>
<type>C30</type>
<kind>Small family car</kind>
<tr:tire tr:count="2">
<tr:brand>Michelin</tr:brand>
<tr:type>Winter</tr:type>
</tr:tire>
<wnd:windscreen wnd:count="1">
<wnd:brand>Car glass</wnd:brand>
</wnd:windscreen>
<tr:tire tr:count="2">
<tr:brand>Michelin</tr:brand>
<tr:type>Winter</tr:type>
</tr:tire>
<tr:tire tr:count="1">
<tr:brand>Michelin</tr:brand>
<tr:type>Spare</tr:type>
</tr:tire>
</car>
The car XSD now looks like Listing 12 (pay special attention to the complexType in the car element).
Listing 12. The car XML schema describing an alternative structure
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
targetNamespace="http://car.org/car"
xmlns:tr="http://car.org/tire"
xmlns:wnd="http://car.org/windscreen"
xmlns:car="http://car.org/car">
<xs:import namespace="http://car.org/tire" schemaLocation="tr.xsd"/>
<xs:import namespace="http://car.org/windscreen" schemaLocation="wnd.xsd"/>
<xs:element name="car">
<xs:complexType>
<xs:sequence>
<xs:element ref="car:brand"/>
<xs:element ref="car:type"/>
<xs:element ref="car:kind"/>
<xs:choice maxOccurs="unbounded">
<xs:element ref="tr:tire"/>
<xs:element ref="wnd:windscreen"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="brand" type="xs:NCName"/>
<xs:element name="type" type="xs:NCName"/>
<xs:element name="kind" type="xs:string"/>
</xs:schema>
Storing this file in a relational database and later retrieving the exact same XML document is more complicated. For these kinds of purposes, a native XML database (NXD) can be handy. Exist DB is a simple open source example of such an NXD. IBM DB2 Express-C is a free alternative that offers an integrated solution of a relational database and an XML database. It allows you to access the database with SQL or with pure XML technology like XQuery.
Managing versions and documentation of XML schemas
In many cases, it's perfectly fine for a group of companies to use multiple versions of the same XML schema at the same time. As long as you know which version you use and which elements changed in that version, you'll be okay. A good habit to get into is to use XSD's annotation elements to describe the version and element information. Annotations can contain two elements: documentation and appinfo.
The documentation element speaks for itself: always document, and you'll rarely have regrets! More interesting is the appinfo element, because it can contain anything you like. For example, if you define your own custom version element to hold the version of a specific element, the XSD looks like Listing 13.
Listing 13. An example XML schema containing customized annotations
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
targetNamespace="http://car.org/tire"
xmlns:tr="http://car.org/tire"
xmlns:custom="http://car.org/custom">
<xs:element name="tires">
<xs:annotation>
<xs:appinfo>
<custom:version>0.91</custom:version>
</xs:appinfo>
<xs:documentation>
Describes a set of tires.
</xs:documentation>
</xs:annotation>
<xs:complexType>
[...]
Although the XSD parser doesn't know what to do with the custom version, this and similar custom elements can greatly help manage XSDs within a larger group of organizations. After all, XML files and XML schemas are meant to be read by human readers as well as computers, so it is a good habit to treat XSDs that way.
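Because an XSD is itself an XML document, ordinary XML tooling can read these annotations even though the schema parser ignores them. As a rough sketch, assuming the custom namespace http://car.org/custom from Listing 13, the following XSLT 1.0 stylesheet reports the name and version of every annotated element declaration in a schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:custom="http://car.org/custom">
  <xsl:output method="text"/>
  <!-- Report every element declaration that carries a custom:version annotation -->
  <xsl:template match="/">
    <xsl:for-each select="//xs:element[xs:annotation/xs:appinfo/custom:version]">
      <xsl:value-of select="@name"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="xs:annotation/xs:appinfo/custom:version"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

A report like this gives each organization a quick way to check which version of an element definition it has received.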
A final feature in XML schemas worth mentioning is the extension element. In the last section, you saw the extensibility of XSD's appinfo element. By default, this element is open to any content. Adding an extension element is a more restrictive way to extend types. The XSD file in Listing 14 describes how to extend the basicTire type to contain a size element.
Listing 14. The tire XML schema containing an extension for a size element
<xs:element name="tire" type="tr:sizedTire"/>
<xs:complexType name="basicTire">
<xs:sequence>
<xs:element ref="tr:brand"/>
<xs:element ref="tr:type"/>
</xs:sequence>
<xs:attribute name="count" use="required" form="qualified" type="xs:integer"/>
</xs:complexType>
<xs:complexType name="sizedTire">
<xs:complexContent>
<xs:extension base="tr:basicTire">
<xs:sequence>
<xs:element ref="tr:size"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
XML schema even allows you to specifically forbid extensions of the basicTire type if you desire. If you're interested in looking further into the XSD specifications, check Resources. You'll find a lot more interesting features in XML schema than we have room to dissect in this article.
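Forbidding extensions is done with the final attribute on a complex type. As a sketch based on the basicTire type from Listing 14:

```xml
<!-- final="extension" forbids deriving new types from basicTire by extension -->
<xs:complexType name="basicTire" final="extension">
  <xs:sequence>
    <xs:element ref="tr:brand"/>
    <xs:element ref="tr:type"/>
  </xs:sequence>
  <xs:attribute name="count" use="required" form="qualified" type="xs:integer"/>
</xs:complexType>
```

With this declaration in place, a conforming schema processor should reject the sizedTire definition from Listing 14, because it extends basicTire.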
As you can see, you don't want to treat XML schemas as automatically generated technicalities if you want to keep your XML formats maintainable in large enterprises and their surroundings. You can choose from many more ways to improve the maintainability of XML schemas. You can even describe your XML format in other languages, like Schematron and RELAX NG. Whatever you use, design the right XML format up front, and you can meet the needs of everyone involved in the communication.
Learn
- The basics of using XML schema to define elements (Ashvin Radiya, Vibha Dixit, developerWorks, August 2000): Learn more about XML schema.
- Make your life easier with the XML schema Standard Type Library (Nicholas Chase, developerWorks, July 2007): Get tips on using the XML schema Standard Type Library.
- Index of XML standards: Check out developerWorks' list of the most important XML standards.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML library for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- The developerWorks Web Architecture zone: Expand your Web development skills with articles and tutorials that specialize in Web technologies.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- The technology bookstore: Browse for books on these and other technical topics.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
Get products and technologies
- Exist DB: Download and explore an open source database management system. Exist DB is entirely built on XML technology, stores XML data according to the XML data model, and features efficient, index-based XQuery processing.
- DB2 Express-C 9.5: Try out the reliability, flexibility, and power of this full-function relational and XML data server.
- IBM trial software for product evaluation: Build your next project with trial software available for download directly from developerWorks, including application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- XML zone discussion forums: Participate in any of several XML-related discussions.
- developerWorks XML zone: Share your thoughts: After you read this article, post your comments and thoughts in this forum. The XML zone editors moderate the forum and welcome your input.
- developerWorks blogs: Check out these blogs and get involved in the developerWorks community.

Adriaan de Jonge is a software professional currently working for the Dutch government, juggling a few projects in several roles. Adriaan has written XML-related articles for IBM developerWorks and Amazon. You can reach Adriaan at adriaandejonge@gmail.com.

S.E. Slack is a writer and author with more than 10 technology books to her credit. She resides in Colorado with her family. Her latest title on the shelves is Windows Vista: Home Entertainment with Windows Media Center and Xbox 360 (Microsoft Press, 2007). Her next book will be PowerPoint 2007 Graphics and Animations Made Easy, to be published by McGraw-Hill.





