Tip

Namespaces and versioning

Using XML namespaces to mark the version of XML formats

One of the core features of XML is its ability to accommodate changes in the rules for data (hence the "Extensible" in Extensible Markup Language). As XML vocabularies change, multiple versions inevitably arise, so each version must be marked clearly for both human readers and software. Clear version markers can drive validation, or let processing branch according to the requirements of each version.

You can mark the version of an XML vocabulary in many ways. This discussion focuses on the use of XML namespaces for marking versions.

Versioning with special attributes or document types

Let's start with an XML vocabulary for a mailing label format:

Listing 1. Mailing label format
<?xml version="1.0"?>
<labels>
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>

If I think this format may change, it makes sense to mark its version when I first deploy it. One way of doing this is through the document type declaration. The declaration refers to a public identifier, which can be made specific to the version of the format. A good example of this is the W3C's XHTML public identifier, as used in the following declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">

You can see not only the version (1.0), but also the variation within the version (XHTML has three variations: strict, transitional, and frameset).

Naturally, the DTDs for the various versions themselves reflect the changes being made. This approach requires that I define the format in a DTD, which might not always be desired. Also, though DOM and SAX provide access to the public identifier used in the source's declaration, XSLT does not.
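As a rough illustration of the DOM side of this, the following Python sketch reads the public identifier with the standard library's xml.dom.minidom. The function name public_id and the tiny sample document are made up for this example; only the DOCTYPE line comes from the declaration above.

```python
from xml.dom import minidom

# Sample document for illustration; the DOCTYPE matches the XHTML declaration above.
XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html><head><title>Labels</title></head><body/></html>"""


def public_id(xml_text):
    """Return the public identifier from the document type declaration, or None."""
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype
    return doctype.publicId if doctype is not None else None


print(public_id(XHTML_DOC))
```

Code that branches on the format version would then have to pick the version out of that identifier string, which is one reason this approach can feel indirect.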

Another approach often used is the top-level version attribute. For instance:

Listing 2. Mailing label format with version attribute
<?xml version="1.0"?>
<labels version="1.0">
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>

The top-level version attribute works whether I use an XML schema system or not. And the version information is available from DOM, SAX, XSLT, or any other normal XML processing technology.

The version attribute approach is taken by the XSLT language itself. Its main problem is conceptual: the connection between the version identifier and each XML information item is tenuous, because the version is typically found on an attribute of an ancestor, possibly a distant one. This can also lead to some awkwardness in code that dispatches according to version.
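To make that awkwardness concrete, here is a minimal Python sketch using xml.etree.ElementTree. The function names are hypothetical, and the "1.1" branch assumes a later version of the format that adds an optional country element to the address. Note how the version has to be read at the root and then handed down to code that works on a nested element.

```python
import xml.etree.ElementTree as ET

LABELS = """<labels version="1.0">
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>"""


def format_address(address, version):
    # The address element carries no version marker of its own,
    # so the caller has to pass the version down from the root.
    parts = [address.findtext(tag) for tag in ("street", "city", "state")]
    if version == "1.1":
        # Assumption: a hypothetical 1.1 format with an optional country element.
        country = address.findtext("country")
        if country:
            parts.append(country)
    return ", ".join(parts)


def format_labels(xml_text):
    root = ET.fromstring(xml_text)
    version = root.get("version")  # read once, at the top of the tree
    return [
        f"{label.findtext('name')}: {format_address(label.find('address'), version)}"
        for label in root.findall("label")
    ]


print(format_labels(LABELS))
```

Any function that receives only an address element, without the extra argument, has no direct way to tell which version of the format it is looking at.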

Versioning with namespaces

To make the version information a more immediate property of the XML information items, you can place them in XML namespaces that reflect the version. For example:

Listing 3. Mailing label format with namespace version
<?xml version="1.0"?>
<labels xmlns="http://uche.ogbuji.net/eg/labels/1.0">
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>

Thus the version information comes through with the namespace -- on each SAX event, on each DOM node, and in the namespace URI of each XPath node (available through the namespace-uri() function). This common system is used in most W3C vocabularies. In fact, XSLT uses it in addition to the version attribute.
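As a small sketch of what this looks like to processing code, Python's xml.etree.ElementTree exposes the namespace as part of every element's tag, in the form {uri}local. The helper namespace_of is a name invented for this example, and the sample document is a shortened version of Listing 3.

```python
import xml.etree.ElementTree as ET

LABELS_NS = """<labels xmlns="http://uche.ogbuji.net/eg/labels/1.0">
  <label>
    <name>Thomas Eliot</name>
  </label>
</labels>"""


def namespace_of(element):
    """Return the namespace URI of an element, or None if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        # ElementTree stores qualified names as "{uri}local".
        return tag[1:].split("}", 1)[0]
    return None


root = ET.fromstring(LABELS_NS)
# Every element in the tree reports the same versioned namespace.
found = {namespace_of(el) for el in root.iter()}
print(found)
```

Because each element carries the namespace itself, code that handles a single nested element can determine the version without consulting an ancestor.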

To be precise, the W3C usually uses date stamps in the namespace URIs rather than version numbers. I might emulate this with the following:

Listing 4. Mailing label format with date-stamped namespace version
<?xml version="1.0"?>
<labels xmlns="http://uche.ogbuji.net/eg/labels/2002/05">
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>

When you use namespaces for versioning as shown in Listings 3 and 4, the biggest problem is that even a small change in the format becomes a big issue, because the namespace appears on every element. If I tweak the format in Listing 3 to allow an optional country element in the address, users end up supporting the original namespace as well as the updated one (say http://uche.ogbuji.net/eg/labels/1.1) in all their processing code, even though the change might have little actual effect on that code.
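The following Python sketch shows the kind of duplication this causes. It assumes, for illustration, that the updated format uses a namespace ending in 1.1 (matching the version numbers used later in this tip); label_names is a hypothetical helper. The extraction logic is identical for both versions, yet the code still has to know about every namespace URI.

```python
import xml.etree.ElementTree as ET

# Assumption: the updated format lives in a namespace ending in 1.1.
SUPPORTED_NAMESPACES = (
    "http://uche.ogbuji.net/eg/labels/1.0",
    "http://uche.ogbuji.net/eg/labels/1.1",
)


def label_names(xml_text):
    """Collect the name of every label, whichever supported namespace is used."""
    root = ET.fromstring(xml_text)
    for ns in SUPPORTED_NAMESPACES:
        if root.tag == f"{{{ns}}}labels":
            return [el.text for el in root.iter(f"{{{ns}}}name")]
    raise ValueError(f"unsupported label format: {root.tag}")
```

Each further namespace change means another entry in the list, in every program that reads the format.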

If a change to the format is minor enough that it is not expected to affect processing much, one solution is to keep the namespace URI unchanged rather than changing it with every change in format. This solution works in most cases, but it breaks down when the maintainer of a namespace uses a retrievable URI that points to an actual document describing the format. In this instance, the document will likely change with any format change, regardless of how minor; hence it makes sense to change that document's URI, which also happens to be the namespace URI.

Eric van der Vlist proposed a system for minimizing this problem on the XML-DEV mailing list in March 2001 (see Related topics).

In this case, version numbers are divided into major and minor parts based on the magnitude of the format changes represented. Only the major parts of version numbers are used in the namespace. For instance, the original version of my mailing label format is 1.0 (major 1, minor 0). After I add the optional country element, the new version is 1.1 (major 1, minor 1). The namespace I use in both cases is:

http://uche.ogbuji.net/eg/labels/1
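The mapping from version number to namespace can be sketched in a few lines of Python. The function name namespace_for and the BASE_URI constant are illustrative choices, not part of the proposal itself.

```python
BASE_URI = "http://uche.ogbuji.net/eg/labels/"


def namespace_for(version):
    """Build the namespace URI from the major part of a major.minor version."""
    major, _, _minor = version.partition(".")
    return BASE_URI + major


print(namespace_for("1.0"))
print(namespace_for("1.1"))
```

Both calls yield the same URI, so processing code written against version 1.0 keeps matching documents in version 1.1.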

Then I set up the HTTP server (that provides the documentation pointed to by each namespace URI) to redirect the user from a URL with only the major version number to a URL that gives the precise version. So when the server gets a request for http://uche.ogbuji.net/eg/labels/1, it redirects to the document at http://uche.ogbuji.net/eg/labels/1.1, as that is the latest version. A user is still free to retrieve the 1.0 document by making an explicit request for that URI.
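One possible shape for that redirect, sketched with Python's standard http.server module. The LATEST_MINOR table, the handler name, and the choice of a 302 (temporary) status are assumptions made for this example; any HTTP server's redirect configuration would do the same job, and serving the precise version documents is left out.

```python
from http.server import BaseHTTPRequestHandler

# Assumption: a hand-maintained table from major-version path to latest precise version.
LATEST_MINOR = {"/eg/labels/1": "/eg/labels/1.1"}


def redirect_target(path):
    """Return the precise-version path for a major-version path, or None."""
    return LATEST_MINOR.get(path.rstrip("/"))


class NamespaceDocHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            # Serving the precise documents (1.0, 1.1) is omitted from this sketch.
            self.send_error(404, "not handled in this sketch")
            return
        # Temporary redirect, since the latest minor version changes over time.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()
```

To try it, the handler could be passed to http.server.HTTPServer and started with serve_forever(). When version 1.2 is published, only the table entry needs to change.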

Conclusion

This tip glosses over several controversial points by assuming common practice. Marking versions using namespaces is more common than doing so using version attributes, though which approach is better is a matter of debate. It is also controversial whether a namespace URI should point to anything at all, either directly to a document defining the format or to a general information document about the vocabulary, as defined by the Resource Directory Description Language (RDDL). Again, common practice uses HTTP URLs for the namespaces. Considering the subtleties explored in this discussion, placing the version in the namespace is, in practice, well proven, and makes dealing with changes in XML format just a bit less hairy.


Related topics


ArticleTitle=Tip: Namespaces and versioning
publish-date=06012002