Abolish XML namespaces?

Making the case for change with this problematic technology

developerWorks
Level: Intermediate

Parand Darugar (tdarugar@yahoo com), Architect

18 Jul 2005

Experience shows XML namespaces can be a common cause of confusion and a major complicating factor in XML adoption. In this article, the author argues that XML namespaces do not offer a good solution for the problems they aim to solve, and are not needed for the majority of XML use cases in the real world. His recommendation is to deprecate namespaces or significantly curtail their usage. For cases that require namespaces, developers should use best practices and conventions to restrict the syntactical freedoms offered by the specification such that namespaces present a consistent face that's easier to understand.

The problem

Over the years I have worked with a good number of XML developers, ranging in skill from occasional user to expert. In almost every case I have found a lack of understanding of namespaces -- or, in the presence of understanding, hands-on confusion in working with and debugging namespace-related issues. XML namespaces, as defined by the current specification, are a departure from the Perl-hacker-should-be-able-to-create-an-XML-parser-in-two-weeks credo; it takes more than two weeks just to understand the nuances of XML namespaces. The XML namespace FAQ gives the flavor of this confusion (see Resources):

3.3) Does the XML namespaces recommendation define anything except a two-part naming system for element types and attributes?

No.

This is a very important point and a source of much confusion, so we will repeat it:
...

If you have access to XML developers, ask how many believe they really understand XML namespaces. My experience has been that very few feel they do. I am not the first to have noticed this; you will find XML namespaces to be the subject of frequent and vitriolic discussion on the various XML developer mailing lists. It is fair to say that the XML namespace specification is amongst the most contested of the basic XML specifications.





XML namespace benefits?

Given the real or perceived difficulties surrounding XML namespaces, what benefits do they offer? The introduction to the specification offers some insight:

We envision applications of Extensible Markup Language (XML) where a single XML document may contain elements and attributes (here referred to as a "markup vocabulary") that are defined for and used by multiple software modules. One motivation for this is modularity; if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than re-invent it.

Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name.

So the problems to address are recognition and collision, with the use case being combining multiple documents.

Take a look at this portion of the XML Namespace FAQ:

3.1) What is the purpose of XML namespaces?

XML namespaces are designed to provide universally unique names for elements and attributes. This allows people to do a number of things, such as:

- Combine fragments from different documents without any naming conflicts. (See example below.)

- Write reusable code modules that can be invoked for specific elements and attributes. Universally unique names guarantee that such modules are invoked only for the correct elements and attributes.

- Define elements and attributes that can be reused in other schemas or instance documents without fear of name collisions. For example, you might use XHTML elements in a parts catalog to provide part descriptions. Or you might use the nil attribute defined in XML Schemas to indicate a missing value.

The following example is worth examining closely. Essentially, the problem presented is that the element Address appears in two separate documents in two different contexts to mean two different things. So far, so good.

Then the question arises:

This is not a problem as long as these element types exist only in separate documents. But what if they are combined in the same document, such as a list of departments, their addresses, and their Web servers?

Here you have a statement of the problem. This is an example of disambiguating a common element when two documents are combined.

What might a combined document look like? The FAQ provides it for you:


Listing 1. Combined document with namespaces

  <Department>
     <Name>DVS1</Name>
     <addr:Address xmlns:addr="http://www.tu-darmstadt.de/ito/addresses">
        <addr:Street>Wilhelminenstr. 7</addr:Street>
        <addr:City>Darmstadt</addr:City>
        <addr:State>Hessen</addr:State>
        <addr:Country>Germany</addr:Country>
        <addr:PostalCode>D-64285</addr:PostalCode>
     </addr:Address>
     <serv:Server xmlns:serv="http://www.tu-darmstadt.de/ito/servers">
        <serv:Name>OurWebServer</serv:Name>
        <serv:Address>123.45.67.8</serv:Address>
     </serv:Server>
  </Department>

Excellent. Now take a look at the same document without namespaces, where you should clearly see the collision and recognition problems manifest themselves:


Listing 2. Combined document without namespaces

  <Department>
     <Name>DVS1</Name>
     <Address>
        <Street>Wilhelminenstr. 7</Street>
        <City>Darmstadt</City>
        <State>Hessen</State>
        <Country>Germany</Country>
        <PostalCode>D-64285</PostalCode>
     </Address>
     <Server>
        <Name>OurWebServer</Name>
        <Address>123.45.67.8</Address>
     </Server>
  </Department>

Which one looks nicer to you? I think you'll agree that the second document is less ambiguous, and that the different uses of Address aren't likely to create confusion for the software being written.





Elements in context

Here is the surprise -- the XML Namespace specification ignores one of the basic pillars of XML: XML documents are hierarchical; no tag is an island.

If I were to tell you which Address element I'm interested in, I might say "the Server address, the one that falls under the Server tag," or "the Department address, the one that falls under the Department tag." In XML speak, that would be /Department/Server/Address or /Department/Address, respectively. This has no ambiguity; you know exactly which element I am referring to in either case. This is because an XML tag is defined by its context, not just by its tag name.
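As a rough illustration of addressing elements by context, the following minimal sketch loads Listing 2 and selects each Address by its path. Python and its standard xml.etree.ElementTree module are my choice for the example, not something the FAQ or the specification prescribes, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Listing 2 from this article: the combined document without namespaces.
DOC = """\
<Department>
   <Name>DVS1</Name>
   <Address>
      <Street>Wilhelminenstr. 7</Street>
      <City>Darmstadt</City>
      <State>Hessen</State>
      <Country>Germany</Country>
      <PostalCode>D-64285</PostalCode>
   </Address>
   <Server>
      <Name>OurWebServer</Name>
      <Address>123.45.67.8</Address>
   </Server>
</Department>
"""

root = ET.fromstring(DOC)  # root is the Department element

# Paths are relative to the root element, so "Address" corresponds to
# /Department/Address and "Server/Address" to /Department/Server/Address.
department_address = root.find("Address")
server_address = root.find("Server/Address")

department_city = department_address.findtext("City")
server_ip = server_address.text

print(department_city)  # Darmstadt
print(server_ip)        # 123.45.67.8
```

The two lookups never collide, even though both elements are named Address, because each path states the context it expects.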

You run into ambiguity only when you ignore context. But why would you ignore context? So you can write programs that are triggered by the Address element alone, ignoring the rest of the markup? If you are not interested in the structure of the document, then why bother to use XML at all? Why express parent-child relationships between elements? Why not simply use name-value pairs with no hierarchy?

It's fundamentally wrong to ignore structure and hierarchy when dealing with XML. A novice developer who creates her first SAX-based program might fall into this trap and run into recognition and collision problems, but by the second program she will probably implement some sort of state keeping. In any case, this developer is certainly not going to be helped by XML namespaces, which are far more complicated than keeping track of state.
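The state keeping mentioned above can be as small as a stack of open element names. The sketch below is one possible way to do it with Python's standard xml.sax module; the PathTrackingHandler class and its fields are hypothetical names invented for this example, and the input is an abbreviated copy of Listing 2.

```python
import xml.sax

# Abbreviated copy of Listing 2 (bytes, as accepted by xml.sax.parseString).
DOC = b"""\
<Department>
   <Name>DVS1</Name>
   <Address>
      <City>Darmstadt</City>
   </Address>
   <Server>
      <Name>OurWebServer</Name>
      <Address>123.45.67.8</Address>
   </Server>
</Department>
"""


class PathTrackingHandler(xml.sax.ContentHandler):
    """Collects character data keyed by the full path of the enclosing element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of the currently open elements
        self.text = {}   # path -> accumulated character data

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # The stack is the only state needed to know where we are.
        path = "/" + "/".join(self.stack)
        self.text[path] = self.text.get(path, "") + content


handler = PathTrackingHandler()
xml.sax.parseString(DOC, handler)

server_address = handler.text["/Department/Server/Address"].strip()
department_city = handler.text["/Department/Address/City"].strip()

print(server_address)   # 123.45.67.8
print(department_city)  # Darmstadt
```

A handler triggered by the name Address alone would see both elements; keying on the path keeps them apart without any namespace machinery.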

Some background

I can claim to know a little about combining XML documents; in a past life, I wrote a commercial XML transformation and combination engine. The collision problem is not one of the more pressing issues to solve. The majority of challenges arise on the content side: the values in the XML document, as opposed to the document structure. These include semantic disambiguation -- what a particular tag means -- and format conversion -- multiple formats for expressing a particular value (for example, think GMT versus local time).

If the issue of disambiguating tags when combining documents is important for an application, then utilizing the existing context of the XML document is a far simpler solution than using a complex new model that relies on universally unique names. The solution to collision and recognition, as put forth by XML namespaces, boils down to creating universally unique names for each and every individual tag in every document.
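To make the "universally unique names" concrete, here is a small sketch of how one namespace-aware parser reports the elements of Listing 1. It uses Python's standard xml.etree.ElementTree, which writes each expanded name as {namespace-URI}local-name; other parsers present the same two-part name differently, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Listing 1 from this article: the combined document with namespaces.
DOC = """\
<Department>
   <Name>DVS1</Name>
   <addr:Address xmlns:addr="http://www.tu-darmstadt.de/ito/addresses">
      <addr:Street>Wilhelminenstr. 7</addr:Street>
      <addr:City>Darmstadt</addr:City>
      <addr:State>Hessen</addr:State>
      <addr:Country>Germany</addr:Country>
      <addr:PostalCode>D-64285</addr:PostalCode>
   </addr:Address>
   <serv:Server xmlns:serv="http://www.tu-darmstadt.de/ito/servers">
      <serv:Name>OurWebServer</serv:Name>
      <serv:Address>123.45.67.8</serv:Address>
   </serv:Server>
</Department>
"""

root = ET.fromstring(DOC)

# Every element whose local name is Address, in document order.
address_tags = [el.tag for el in root.iter() if el.tag.endswith("}Address")]
for tag in address_tags:
    print(tag)

# Looking an element up means spelling out the namespace for every step.
ns = {"serv": "http://www.tu-darmstadt.de/ito/servers"}
server_ip = root.find("serv:Server/serv:Address", ns).text
print(server_ip)
```

Note that the namespaced lookup still walks the Server/Address path; the hierarchy is used either way, and the namespace URI is carried along on top of it.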

Note that the way the example is posed is artificial; you do not need to combine documents to run into this situation. Just imagine that the combined document is what you started with. Certainly, you can have multiple elements with the same name but different meanings in a single document. In fact, in my experience this is far more common than the combined document case. Is the solution to provide a universally unique name for every element? If you are going to give every element a unique name in every case, then you can simply use very long, unambiguous names. You don't need namespaces for that.

The solution is to treat elements in their hierarchical context. Plain old XML provides that -- XML namespaces are not needed.





Better examples

The common XML use cases, and certainly the use case put forth by the FAQ, do not need XML namespaces. However, there are cases where namespaces -- or something like them -- could offer value. This section examines some compelling cases.

The first is the use of namespaces as a method for identifying or versioning document types. You have likely seen constructs such as:

<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">

This is an example of the topmost tag of a SOAP 1.2 message. The namespace here serves a useful purpose: It informs the consumer that this particular piece of XML is a SOAP envelope, is related to the World Wide Web Consortium (W3C), and conforms to the May 2003 version of the specification. This is indeed more informative than the alternative non-namespace-inflicted tag:

<Envelope>

So this is one use for namespaces! Or is it? The identification is certainly useful, but you do not need namespaces to achieve that. All you need is an attribute and a convention -- something as simple as: Identify documents through the documentIdentifier attribute on the topmost tag:

<Envelope documentIdentifier="http://www.w3.org/2003/05/soap-envelope">
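A consumer of such a convention needs only a plain attribute lookup. The following is a minimal sketch in Python, assuming the documentIdentifier convention proposed above; the identify function, the SOAP_12 constant, and the empty Body child are hypothetical and exist only to make the example complete.

```python
import xml.etree.ElementTree as ET

# Identifier value taken from the example above.
SOAP_12 = "http://www.w3.org/2003/05/soap-envelope"


def identify(document):
    """Return the documentIdentifier of the topmost tag, or None if absent."""
    return ET.fromstring(document).get("documentIdentifier")


# The Body child is an assumption added so the fragment is well-formed.
doc = (
    '<Envelope documentIdentifier="http://www.w3.org/2003/05/soap-envelope">'
    "<Body/>"
    "</Envelope>"
)

identifier = identify(doc)
is_soap_12 = identifier == SOAP_12
unidentified = identify("<Envelope/>")

print(identifier)
print(is_soap_12)
```

The element names stay unqualified throughout; only the one attribute on the topmost tag carries the identification.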

Look at another case. Namespaces have a compelling use in providing unique identifiers for type information. You may have seen XML fragments such as:

<element name="cost" type="xsd:float"/>

or

<element name="greeting" type="SOAP-ENC:string"/>

Here, xsd and SOAP-ENC are namespace prefixes that refer to the XML Schema and SOAP encoding types, respectively. Thus, cost is an element of type float as defined by XSD, and greeting is an element of type string as defined by the SOAP encoding specification. Here is another similar example:

<cost xsi:type="xsd:float">29.95</cost>

This conveys that cost has a type, and by type I mean type as defined by XSI, and that the type is float as defined by XSD. The key point here is that you are indeed looking for unique, non-context-related identifiers for each type. You are not combining your document with the XSD or the SOAP encoding document; you are simply referring to particular elements within each specification from your document. The specification need not even be in XML -- you are referring to a flat structure, simply a list of types. If you believed that the type structure was hierarchical, you would need to fully qualify the path for the type, with something like:

<cost xsi:type="xsd:/types/simple/float">29.95</cost>
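For the flat form, xsi:type="xsd:float", a consumer has to map the prefix inside the attribute value back to a namespace URI before it knows which float is meant. The sketch below shows one way to do that with Python's standard xml.etree.ElementTree. The xmlns:xsd and xmlns:xsi declarations are not part of the fragment shown earlier and are added here as an assumption, using the standard XML Schema namespace URIs. The sketch also ignores prefix scoping, which a real processor would have to track.

```python
import io
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The cost element from above, with namespace declarations added (assumed).
DOC = (
    '<cost xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="xsd:float">29.95</cost>'
)

prefixes = {}    # prefix -> namespace URI, as declared in the document
resolved = None  # (namespace URI, local type name) once found

source = io.BytesIO(DOC.encode("utf-8"))
for event, payload in ET.iterparse(source, events=("start-ns", "start")):
    if event == "start-ns":
        # Declarations are reported before the element that carries them.
        prefix, uri = payload
        prefixes[prefix] = uri
    else:
        type_attr = payload.get("{%s}type" % XSI)
        if type_attr is not None:
            prefix, _, local = type_attr.partition(":")
            resolved = (prefixes[prefix], local)

print(resolved)
```

The resolved pair is a flat, context-free identifier: a namespace URI plus a local name, with no path into the referenced specification.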

At last, you have a good reason to use XML namespaces. But what is this piece of XML actually trying to achieve? It sends type information along with the tag. That is unusual; normally you define type information with a DTD or an XML Schema in a separate file, obviating the need to repeat this information for every tag on every request.

Perhaps this still isn't a reasonable use case for XML namespaces, but you have glimpsed a certain amount of usefulness. The lesson can be generalized as follows: A method for associating the attributes of elements with external reference points might have value. The element itself does not need a namespace, but its attributes might.

Others might argue that you can achieve this with alternative, simpler methods than XML namespaces, but I will not argue that -- at least not in this article. Instead, I propose that this is a very special case indeed, and that XML namespace usage should be restricted to cases such as this where more reasonable paths are not available.





Deprecating XML namespaces

Perhaps these arguments have convinced you of the problems with XML namespaces and their limited applicability. Now what?

I advocate aggressively deprecating the XML Namespace specification and removing it from general use. To be clear, I am not claiming that namespaces are never needed; only that they are very rarely needed, and that the current XML Namespace specification carries with it a particularly great amount of pain. The use of XML namespaces in general XML documents is not a best practice -- in fact, it is a very costly practice.

In any case, given their widespread adoption, XML namespaces are unlikely to simply go away. Nor is it obvious that a new specification would significantly improve the situation; the folks who put together the original are very smart indeed.

However, it is entirely reasonable and quite prudent to change common practice from utilizing XML namespaces in every new XML and Web service specification to using XML namespaces sparingly, only when absolutely needed, and to keeping them out of common specifications.

The very least that could and absolutely must be done is to develop best practices and conventions for XML namespace usage patterns such that they are easier to understand. As it stands, the syntactical freedom granted by the specification allows namespace placement almost anywhere in the document, with an infinite number of ways to express a single concept. Understanding a namespace-inflicted document becomes significantly simpler if the developer community can agree on something closer to a single syntax for expressing a single concept. Imagine if you could glance at a namespace-riddled XML document and easily comprehend it; imagine if it really was human readable. That, however, is a topic for another article...



Resources



About the author


Parand Tony Darugar has been building and architecting high-performance distributed systems for most of his career, often as serial entrepreneur, and recently at a large Internet-based business. His interests include Web services and Service Oriented Architectures (SOA), XML, distributed architectures, and artificial intelligence. You can read more of his thoughts at his blog, Standard Deviations, and reach him at tdarugar at yahoo com.



