Level: Intermediate Parand Darugar (tdarugar@yahoo com), Architect
18 Jul 2005
Experience shows XML namespaces can be a common cause of confusion and a major complicating factor in XML adoption. In this article, the author argues that XML namespaces do not offer a good solution for the problems they aim to solve, and are not needed for the majority of XML use cases in the real world. His recommendation is to deprecate namespaces or significantly curtail their usage. For cases that require namespaces, developers should use best practices and conventions to restrict the syntactical freedoms offered by the specification such that namespaces present a consistent face that's easier to understand.
The problem
Over the years I have worked with a
good number of XML developers, ranging in skill from occasional
user to expert. In almost every case I have found a lack of
understanding of namespaces -- or, in the presence of understanding,
hands-on confusion in working with and debugging namespace-related issues.
XML namespaces, as defined by the current
specification, are a departure from the
perl-hacker-should-be-able-to-create-an-XML-parser-in-two-weeks credo; it takes
more than two weeks just to understand the nuances of XML
namespaces. The XML namespace FAQ gives the flavor of this confusion (see Resources):
3.3) Does the XML namespaces recommendation define anything except a two-part naming system for element types and attributes?
No.
This is a very important point and a source of much confusion, so we will repeat it:
...
If you have access to XML developers, ask how many believe they really understand XML namespaces. My experience has been that very few feel they do. I am not the first to have noticed this; you will find XML namespaces to be the subject of frequent and vitriolic discussion on the various XML developer mailing lists. It is fair to say that the XML namespace specification is amongst the most contested of the basic XML specifications.
XML namespace benefits?
Given the real or perceived difficulties surrounding XML namespaces,
what benefits do they offer? The introduction to the specification offers some insight:
We envision applications of Extensible Markup Language (XML) where a
single XML document may contain elements and attributes (here referred
to as a "markup vocabulary") that are defined for and used by multiple
software modules. One motivation for this is modularity; if such a markup
vocabulary exists which is well-understood and for which there is useful
software available, it is better to re-use this markup rather than re-invent it.
Such documents, containing multiple markup vocabularies, pose problems of
recognition and collision. Software modules need to be able to recognize
the tags and attributes which they are designed to process, even in the
face of "collisions" occurring when markup intended for some other software
package uses the same element type or attribute name.
So the problems to address are recognition
and collision, with the use case being combining multiple
documents.
Take a look at this portion of the XML Namespace FAQ:
3.1) What is the purpose of XML namespaces?
XML namespaces are designed to provide universally unique names for elements
and attributes. This allows people to do a number of things, such as:
- Combine fragments from different documents without any naming conflicts. (See example below.)
- Write reusable code modules that can be invoked for specific elements and attributes.
Universally unique names guarantee that such modules are invoked only for the correct
elements and attributes.
- Define elements and attributes that can be reused in other schemas or
instance documents without fear of name collisions. For example, you might
use XHTML elements in a parts catalog to provide part descriptions. Or you
might use the nil attribute defined in XML Schemas to indicate a missing value.
The following example is worth examining closely. Essentially, the
problem presented is that the element Address appears
in two separate documents in two different contexts to mean two different things.
So far, so good.
Then the question arises:
This is not a problem as long as these element types exist only in separate documents. But what if they are combined in the same document, such as a list of departments, their addresses, and their Web servers?
Here you have a statement of the problem. This is an example of disambiguating a common element when two documents are combined.
What might a combined document look like? The FAQ provides it for you:
Listing 1. Combined document with namespaces
<Department>
<Name>DVS1</Name>
<addr:Address xmlns:addr="http://www.tu-darmstadt.de/ito/addresses">
<addr:Street>Wilhelminenstr. 7</addr:Street>
<addr:City>Darmstadt</addr:City>
<addr:State>Hessen</addr:State>
<addr:Country>Germany</addr:Country>
<addr:PostalCode>D-64285</addr:PostalCode>
</addr:Address>
<serv:Server xmlns:serv="http://www.tu-darmstadt.de/ito/servers">
<serv:Name>OurWebServer</serv:Name>
<serv:Address>123.45.67.8</serv:Address>
</serv:Server>
</Department>
Excellent. Now take a look at the same document without namespaces, where
you should clearly see the collision and recognition problems manifest
themselves:
Listing 2. Combined document without namespaces
<Department>
<Name>DVS1</Name>
<Address>
<Street>Wilhelminenstr. 7</Street>
<City>Darmstadt</City>
<State>Hessen</State>
<Country>Germany</Country>
<PostalCode>D-64285</PostalCode>
</Address>
<Server>
<Name>OurWebServer</Name>
<Address>123.45.67.8</Address>
</Server>
</Department>
Which one looks nicer to you? I think you'll agree that the second document is less ambiguous,
and that the different uses of Address aren't likely to create confusion for
the software being written.
Elements in context
Here is the surprise -- the XML Namespace specification
ignores one of the basic pillars of XML: XML documents are
hierarchical; no tag is an island.
If I were to tell you which Address element I'm interested
in, I might say "the Server address, the one that falls under the
Server tag," or "the Department address, the one that falls under
the Department tag." In XML speak, that would be
/Department/Server/Address or
/Department/Address, respectively.
This has no ambiguity; you know exactly which element I am referring to in either case. This is because
an XML tag is defined by its context, not just by its tag name.
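As a minimal sketch of addressing elements by context, the following Python snippet parses the namespace-free document from Listing 2 with the standard library's xml.etree.ElementTree module and selects each Address by its path. The variable names are illustrative assumptions, not part of the FAQ example.

```python
import xml.etree.ElementTree as ET

# The namespace-free document from Listing 2.
DOC = """<Department>
  <Name>DVS1</Name>
  <Address>
    <Street>Wilhelminenstr. 7</Street>
    <City>Darmstadt</City>
    <State>Hessen</State>
    <Country>Germany</Country>
    <PostalCode>D-64285</PostalCode>
  </Address>
  <Server>
    <Name>OurWebServer</Name>
    <Address>123.45.67.8</Address>
  </Server>
</Department>"""

root = ET.fromstring(DOC)  # root is the <Department> element

# /Department/Address: the postal address, a direct child of the root.
department_address = root.find("Address")

# /Department/Server/Address: the network address, nested under <Server>.
server_address = root.find("Server/Address")

department_city = department_address.findtext("City")
server_ip = server_address.text
```

Both elements are named Address, yet each lookup returns exactly one of them because the path, not the bare tag name, does the selecting.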
You run into ambiguity only when you ignore context. But why would
you ignore context? So you can write programs that are triggered by
the Address element alone, ignoring the rest of the markup? If you
are not interested in the structure of the document, then why bother to
use XML at all? Why express parent-child relationships between
elements? Why not simply use name-value pairs with no hierarchy?
It's fundamentally wrong to ignore structure and hierarchy when dealing with XML.
A novice developer who creates her first SAX-based program might fall into this trap and run into recognition and collision problems, but by the second program she will probably implement some sort of state keeping. In any case, this developer is certainly not going to be helped by XML namespaces, which are far more complicated than keeping track of state.
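The kind of state keeping meant here can be sketched in a few lines. The following example uses Python's standard xml.sax module and keeps a stack of open element names, so every text value is recorded under its full path. The class name PathTrackingHandler and the shortened sample document are assumptions made for illustration.

```python
import xml.sax

class PathTrackingHandler(xml.sax.ContentHandler):
    """Records the text of each element under its full path."""

    def __init__(self):
        super().__init__()
        self.path = []     # stack of currently open element names
        self.buffers = []  # text chunks collected for each open element
        self.values = {}   # full path -> stripped text content

    def startElement(self, name, attrs):
        self.path.append(name)
        self.buffers.append([])

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self.buffers:
            self.buffers[-1].append(content)

    def endElement(self, name):
        text = "".join(self.buffers.pop()).strip()
        if text:
            self.values["/" + "/".join(self.path)] = text
        self.path.pop()

# Shortened version of the combined document, without namespaces.
DOC = b"""<Department>
  <Name>DVS1</Name>
  <Address>
    <City>Darmstadt</City>
  </Address>
  <Server>
    <Name>OurWebServer</Name>
    <Address>123.45.67.8</Address>
  </Server>
</Department>"""

handler = PathTrackingHandler()
xml.sax.parseString(DOC, handler)
```

After parsing, handler.values holds separate entries for /Department/Name and /Department/Server/Name, and for /Department/Address/City and /Department/Server/Address, so the repeated tag names never collide.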
Some background
I can claim to know a little about combining XML
documents; in a past life, I wrote a commercial XML transformation and combination
engine. The collision problem is not one of the more pressing issues to solve. The majority of challenges arise on the content side: the values in the XML document, as
opposed to the document structure. These include semantic
disambiguation -- what a particular tag means -- and format
conversion -- multiple formats for expressing a particular value
(for example, think GMT versus local time).
If the issue of disambiguating tags when combining documents is important
for an application, then utilizing the existing context of the XML document
is a far simpler solution than using a complex new model that relies on
universally unique names. The solution to
collision and recognition, as put forth by XML namespaces, boils down to
creating universally unique names for each and every individual tag in
every document.
Note that the way the example is posed is artificial; you do not need to
combine documents to run into this situation. Just imagine that the combined
document is what you started with. Certainly, you can have multiple elements
with the same name but different meanings in a single document. In fact,
in my experience this is far more common than the combined document case.
Is the solution to provide a universally unique name for every element?
If you are going to give every element a unique name in every case, then
you can simply use very long, unambiguous names. You don't need namespaces
for that.
The solution is to treat elements in their hierarchical context. Plain
old XML provides that -- XML namespaces are not needed.
Better examples
The common XML use cases, and certainly the use case put forth by
the FAQ, do not need XML namespaces. However, there are cases where
namespaces -- or something like them -- could offer value. This section examines
some compelling cases.
The first is the use of namespaces as a method for identifying or
versioning document types. You have likely seen the constructs such as:
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
This is an example of the topmost tag of a SOAP 1.2 message. The
namespace here serves a useful purpose: It informs the
consumer that this particular piece of XML is a SOAP envelope,
is related to the World Wide Web Consortium (W3C), and conforms to the May 2003
version of the specification. This is indeed more informative than
the alternative non-namespace-inflicted tag:
<Envelope>
So this is one use for namespaces! Or is it? The identification
is certainly useful, but you do not need namespaces to achieve that.
All you need is an attribute and a convention -- something as simple as:
Identify documents through the documentIdentifier attribute on the topmost tag:
<Envelope documentIdentifier="http://www.w3.org/2003/05/soap-envelope">
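A consumer following such a convention needs very little code. The sketch below, in Python with the standard xml.etree.ElementTree module, reads the documentIdentifier attribute from the topmost tag and compares it with the SOAP 1.2 identifier. The function name document_identifier and the empty Body child are assumptions for illustration; the attribute convention itself is only the proposal made above, not part of any specification.

```python
import xml.etree.ElementTree as ET

SOAP_12 = "http://www.w3.org/2003/05/soap-envelope"

def document_identifier(xml_text):
    """Return the documentIdentifier attribute of the topmost tag, or None."""
    root = ET.fromstring(xml_text)
    return root.get("documentIdentifier")

# Hypothetical message that follows the attribute convention.
message = (
    '<Envelope documentIdentifier="http://www.w3.org/2003/05/soap-envelope">'
    "<Body/>"
    "</Envelope>"
)

is_soap_12 = document_identifier(message) == SOAP_12
```

The identification travels as an ordinary attribute value, so no prefix declarations or qualified names are involved in reading it.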
Look at another case. Namespaces have a compelling use in
providing unique identifiers for type information. You may have
seen XML fragments such as:
<element name="cost" type="xsd:float"/>
or
<element name="greeting" type="SOAP-ENC:string"/>
Here, xsd and SOAP-ENC
are namespace prefixes that refer to the XML Schema and SOAP encoding types,
respectively. Thus, cost is an element of type float
as defined by XSD, and greeting is an element of type
string as defined by the SOAP encoding specification. Here is another similar
example:
<cost xsi:type="xsd:float">29.95</cost>
This conveys that cost has a type, and by type I mean type as defined by
XSI, and that the type is float as defined by XSD. The key point here is
that you are indeed looking for unique, non-context-related identifiers for
each type. You are not combining your document with the XSD or the SOAP
encoding document; you are simply referring to particular elements within
each specification from your document. The specification need not even be
in XML -- you are referring to a flat structure, simply a list of types.
If you believed that the type structure was hierarchical, you would need
to fully qualify the path for the type, with something like:
<cost xsi:type="xsd:/types/simple/float">29.95</cost>
At last, you have a good reason to use XML namespaces. But what is this
piece of XML actually trying to achieve? It sends type information
along with the tag. That is unusual; normally you define type
information with a DTD or an XML Schema in a separate file, obviating the
need to repeat this information for every tag on every request.
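To make the xsi:type example concrete, here is a rough sketch of what a consumer has to do to interpret a value such as xsd:float. It uses Python's standard xml.etree.ElementTree.iterparse to collect prefix declarations and then resolves the prefix inside the attribute value. The wrapping order element and the function name resolve_types are assumptions for illustration, and the prefix table deliberately ignores scoping and redeclaration of prefixes.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xsi:type attribute under this expanded name.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical wrapper document around the cost element shown above.
DOC = b"""<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <cost xsi:type="xsd:float">29.95</cost>
</order>"""

def resolve_types(xml_bytes):
    """Map each element tag carrying xsi:type to (namespace URI, local name)."""
    prefixes = {}  # simplification: one flat table, no prefix scoping
    resolved = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload  # a namespace declaration was seen
            prefixes[prefix] = uri
        else:
            qname = payload.get(XSI_TYPE)  # payload is the started element
            if qname is not None:
                prefix, _, local = qname.rpartition(":")
                resolved[payload.tag] = (prefixes.get(prefix), local)
    return resolved

types = resolve_types(DOC)
```

The result maps cost to the XML Schema namespace URI and the local name float. Note that the prefix inside the attribute value is plain text as far as the parser is concerned, so the application must track the declarations and resolve it itself.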
Perhaps this still isn't a reasonable use case for XML namespaces,
but you have glimpsed a certain amount of usefulness. The lesson can be
generalized as follows: A method for
associating the attributes of elements with external
reference points might have value. The element itself does not need a namespace,
but its attributes might.
Others might argue that you can achieve this with alternative, simpler
methods than XML namespaces, but I will not argue that -- at least not in
this article. Instead, I propose that this is a very special case indeed,
and that XML namespace usage should be restricted to cases such as
this where more reasonable paths are not available.
Deprecating XML namespaces
Perhaps these arguments have convinced you of the problems with
XML namespaces and their limited applicability. Now what?
I advocate aggressively deprecating the XML Namespace
specification and removing it from general use. To be clear, I am
not claiming that namespaces are never needed; only that they are
very rarely needed, and that the current XML Namespace specification
carries with it a particularly great amount of pain. The use of XML
namespaces in general XML documents is not a best practice -- in fact,
it is a very costly practice.
In any case, given their widespread adoption, XML namespaces are
unlikely to simply go away. Nor is it obvious that a new specification
would significantly improve the situation; the folks who put together
the original are very smart indeed.
However, it is entirely reasonable
and quite prudent to change common practice from utilizing XML
namespaces in every new XML and Web service specification to using
XML namespaces sparingly, only when absolutely needed, and to keeping
them out of common specifications.
The very least that could and absolutely must be done is to develop
best practices and conventions for XML namespace usage patterns such
that they are easier to understand. As it stands, the syntactical
freedom granted by the specification allows namespace placement almost
anywhere in the document, with an infinite number of ways to express a
single concept. Understanding a namespace-inflicted document becomes
significantly simpler if the developer community can agree on something closer to a single
syntax for expressing a single concept. Imagine if you could glance at
a namespace-riddled XML document and easily comprehend it; imagine if
it really was human readable. That, however, is a topic for another article...
Resources
About the author
Parand Tony Darugar has been building and architecting high-performance distributed systems for most of his career, often as serial entrepreneur, and recently at a large Internet-based business. His interests include Web services and Service Oriented Architectures (SOA), XML, distributed architectures, and artificial intelligence. You can read more of his thoughts at his blog,
Standard Deviations, and reach him at tdarugar at yahoo com.