Note: This document mentions changes proposed up through the September 2002 "Last Call Working Draft" of version 1.1 of Namespaces in XML.
Most business and communications problems that XML can solve require a combination of several XML vocabularies. (You may read tag and attribute sets in place of the term XML vocabularies if you wish.) XML has a mechanism for qualifying names to be allocated into different namespaces, such as namespaces that apply to different industries. A company or (better yet) an industry consortium can assign names to elements and use common words like "title" or "state" without worrying that those names will clash with the same names used in another vocabulary.
XML namespaces also allow names to evolve over time. After using the first version of a vocabulary, your real-world experience may lead you to devise an enhanced vocabulary. The new version can be assigned to a different namespace, and you can use XSLT to transform data from one vocabulary to the other.
Speaking of XSLT, that stylesheet standard provides for the importation of subsidiary stylesheets, which can contain generic templates written by others. The name of a template can be qualified to a namespace, again avoiding clashes. In other words, my stylesheet can call a named template that has a distinctive name qualified by a namespace (which has been chosen by the template's author). I could even use more than one library of templates imported into my stylesheet, and different namespaces for each library would avoid duplicate names of the templates. Many recommended standards of the World Wide Web Consortium (W3C) promote namespaces for modularity.
XML namespaces also allow various tools that process XML, such as a stylesheet-driven XSLT processor, to pick out the instructions they should obey and treat instructions for other processors as just more data. The processor is set up to consider elements from a particular namespace (or two) to be the instructions. Elements that have no namespace are data, as are all elements that have a namespace other than those recognized as instructions.
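As a rough illustration of that sorting step, here is a minimal Python sketch that splits the elements of a small stylesheet fragment into instructions and data by namespace. It assumes the standard-library ElementTree parser; the `split_tag` helper and the sample fragment are hypothetical, and a real XSLT processor does far more than this.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical fragment: XSLT instructions mixed with plain result elements.
STYLESHEET = """\
<xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform" match="/">
  <html>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local); uri is None if unqualified."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


instructions = []
data = []
for elem in ET.fromstring(STYLESHEET).iter():
    uri, local = split_tag(elem.tag)
    # Only elements in the recognized namespace count as instructions.
    (instructions if uri == XSLT_NS else data).append(local)
```

Here `template` and `apply-templates` land in `instructions`, while `html` and `body`, which have no namespace, are treated as data.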
The formal designation of a namespace is a URI. Generally, you'll see URLs (one form of URI) as the identifier. Because URIs use a wide range of characters, there would be a severe impact on the XML syntax if we had to attach the full URI directly to every qualified name. Therefore, the XML Namespaces Recommendation also defines prefixes that are directly attached to names. Syntactically, you use quotes (single or double) around the URI string, and a colon to set off the prefix; other characters present no interference. The prefix is a standard XML name. You can avoid using a prefix by assigning one URI to all unprefixed names or by laboriously (and dangerously) reassigning the default namespace wherever needed in the document. For practical purposes, prefixes are required when you intermix vocabularies.
Like other specifications for XML, the XML Namespaces Recommendation is published by the W3C. The W3C is developing version 1.1 of the Namespaces Recommendation, where the formal designation will be an Internationalized Resource Identifier, or IRI. The differences between URIs and IRIs lie in how certain characters are escaped to make them benign.
Let's look at some real namespace syntax:
<mddl:custodian xmlns:mddl="http://www.mddl.org/mddl/2001/1.0-final">Merrill Lynch</mddl:custodian>
This is not a mere custodian element; it is a custodian element in the vocabulary identified by the URI http://www.mddl.org/mddl/2001/1.0-final. The prefix mddl is used to associate the element name with that URI. The URI is in the mddl.org domain; mddl.org is the organization that maintains the Market Data Definition Language, an XML vocabulary in which custodian is one of many elements. (This vocabulary defines elements pertaining to investments and portfolio management.) Notice that mddl.org has made provisions to define other vocabularies and to issue later versions of the MDDL vocabulary by having several fields in their URI.
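To see how a namespace-aware parser views this element, the following sketch feeds it to Python's standard-library ElementTree, which is used here only as one convenient example of such a parser. ElementTree reports each name as the URI in braces followed by the local part, so the prefix does not appear in the result at all.

```python
import xml.etree.ElementTree as ET

doc = (
    '<mddl:custodian xmlns:mddl="http://www.mddl.org/mddl/2001/1.0-final">'
    "Merrill Lynch</mddl:custodian>"
)

elem = ET.fromstring(doc)

# The parser resolves the prefix and keeps only the URI plus the local part.
expanded_name = elem.tag
content = elem.text
```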
The local part of the name is the name within a particular vocabulary. For names that are not qualified by a namespace, the local part is the only part that exists. For a prefixed name, the local part is what comes after the colon. For example, elements named book:title and book:isbn are in the same namespace but have different local parts. Elements named book:title and person:title have the same local part but are entirely unrelated because they belong to different namespaces.
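The relationship between namespace and local part can be checked mechanically. The sketch below is only an illustration: the two example.com URIs, the record wrapper element, and the sample values are all made up, and `split_tag` is a hypothetical helper around ElementTree's `{uri}local` naming.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and sample content, chosen only for this sketch.
DOC = """\
<record xmlns:book="http://example.com/ns/book"
        xmlns:person="http://example.com/ns/person">
  <book:title>Example Title</book:title>
  <book:isbn>0000000000</book:isbn>
  <person:title>Dr.</person:title>
</record>
"""


def split_tag(tag):
    """Return (namespace URI, local part) for an ElementTree tag."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


book_title, book_isbn, person_title = (
    split_tag(child.tag) for child in ET.fromstring(DOC)
)

same_namespace = book_title[0] == book_isbn[0]      # book:title vs. book:isbn
same_local_part = book_title[1] == person_title[1]  # book:title vs. person:title
related = book_title == person_title                # different namespaces, so unrelated
```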
Prefixes simplify discussion of your work. You can discuss xsl:apply-templates and the like while you develop an XML-based system, and only occasionally approach the details of their respective namespaces. In some technical sense, the prefix doesn't matter because it's a transient abbreviation that associates names with a namespace URI.
However, it's a best practice to establish logical and consistent prefix names to boost developer productivity.
The prefix qualifies and associates names of elements and attributes, and also applies to keyword-type text strings in some situations. For example, book:title is equivalent to "title as a characteristic of a book" when read in an XML document, which is convenient when a person has to scan some XML. By referring to the place where the prefix book is tied to a URI, one can find a more formal specification that states, for example, "title as defined in the book vocabulary issued by abaa.org in 1999."
Several W3C recommendations use the term QName to refer to an XML name that may (or may not) be qualified to a namespace, and if you read specifications regularly, you will even occasionally see "QName-but-not-NCName" to indicate an XML name that must be qualified to a namespace. (The term NCName refers to an XML name without a colon. NC means "no colon." ) For example, named templates in XSLT can be named with QNames rather than with simple XML names, facilitating the publication of a library of templates that are all named in a particular namespace. A QName uses the colon (:) as a special character to separate the prefix from the local part. Naturally, the prefix and local part cannot contain a colon, but they otherwise follow the prescribed syntax for XML names.
More than one prefix can be associated with a particular URI. XML standards will generally force resolution of prefixes to their associated URIs, so that names are the same if their local parts and URIs match, even if the prefixes differ. A prefix can only be associated with one URI at a time.
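A quick way to convince yourself of this is to parse the same element under two different prefixes and compare the resolved names. This sketch uses Python's standard-library ElementTree; the second prefix, m, is arbitrary.

```python
import xml.etree.ElementTree as ET

MDDL_URI = "http://www.mddl.org/mddl/2001/1.0-final"

# Same URI and local part, two different prefixes.
first = ET.fromstring(f'<mddl:custodian xmlns:mddl="{MDDL_URI}"/>')
second = ET.fromstring(f'<m:custodian xmlns:m="{MDDL_URI}"/>')

# After prefix resolution the two names are identical.
same_name = first.tag == second.tag
```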
Every article about XML namespaces has to point out that the URI goes nowhere, meaning there is no need to fetch any material that the URI appears to identify. Indeed, there is no requirement to set up a server for the identified location or to have fetchable material at the location. The XML Namespaces Recommendation only requires string-matching to establish that two URIs are the same, though it does briefly mention that the namespace value is a URI reference and implies that this value should follow the syntax of RFC 2396 of the Internet Engineering Task Force.
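The string-matching rule has a practical consequence: URIs that a Web server might treat as the same location still name different namespaces if their characters differ. The sketch below, using hypothetical example.com URIs and Python's ElementTree, shows three such near-duplicates producing three distinct element names. Parsing these documents does not contact example.com.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs that differ only in letter case or a trailing slash.
a = ET.fromstring('<item xmlns="http://example.com/ns/fields"/>')
b = ET.fromstring('<item xmlns="http://EXAMPLE.com/ns/fields"/>')
c = ET.fromstring('<item xmlns="http://example.com/ns/fields/"/>')

# The namespace URIs are compared as plain strings, so all three names differ.
distinct_names = {a.tag, b.tag, c.tag}
```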
The URIs issued by the W3C always use the http: protocol and the w3.org domain name, so use of HTTP URLs can be considered the safe approach, and thus a best practice.
The domain name is the key to avoiding clashing names. By using the worldwide Domain Name System, the namespace URI provides an answer to the "Says who?" question. If you have a domain name, you have a piece of the world where you control the names, and this applies to your XML namespaces as well as your servers. For example, mddl.org is the domain name belonging to an organization that defines XML vocabularies pertaining to investments, and nobody else can assign names and URLs under the mddl.org domain.
In the future, the W3C may establish a guiding principle for the namespace URI to point to a fetchable resource. Various W3C committees are discussing alternatives. Most likely, the material identified by the URI will itself be an indirect pointer to an actual schema or description, allowing the syntax of the real description to evolve over time. For now, the URIs used for W3C namespaces point to simple text pages stating that the URI is a namespace. Try this as an example: http://www.w3.org/1999/XSL/Transform.
The W3C document that defines the namespace syntax and function uses the term namespace name to refer to the URI of the namespace. The XML Information Set Recommendation, which defines the meaningful parts of an XML document, also uses the term in the same way. However, the XPath functions name() and local-name() return the prefix when applied to a namespace node.
Therefore, it is a best practice to either avoid the term namespace name or only use it in a context where it's clear what you mean. XQuery uses the terms namespace prefix and namespace URI when discussing its syntax. The latter can safely be used to refer to the URI in a namespace declaration.
Namespaces are designated for the various XML vocabularies, whether issued by the W3C itself like MathML or by an industry consortium like DSML, which came from an OASIS Technical Committee. (See Resources.) You can do the same within your organization.
Furthermore, other W3C recommendations in the XML family use namespaces to distinguish what they define. The XML Recommendation defines no element names, but describes two attributes that can be used in XML documents, xml:space and xml:lang. The XML Base Recommendation adds xml:base to the list. In each case, use of the xml prefix means that they are in a namespace that is defined by default for every XML document. In a recent Erratum, the W3C declared that you cannot use any prefix besides xml on the names built into XML.
The W3C uses a unique prefix for each vocabulary it defines. Each recommendation takes pains to point out that these prefixes are not functionally special, just used consistently. Returning to the MathML example, the MathML 2.0 Recommendation suggests the outer element <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">, where mml is their favored prefix. Again, the mml string is not special, other than for humans reading the documentation. The URI (http://www.w3.org/1998/Math/MathML) is the string that actually identifies the MathML vocabulary. (In that vocabulary, an element with the local part name math is the outer element.)
The XML Inclusions Recommendation presented a design dilemma: The inclusion construct couldn't be reduced to a single string value, as could xml:lang. An inclusion declaration may need as many as three parts:
- the href of the included resource,
- its presumed encoding,
- and its parsing method.
These could be joined as attributes on an element, but naming that element xml:include would impinge on the set of available element names in XML, causing messy exceptions for humans and machines alike. The solution was to define a namespace just for this one element. A typical include element looks like this:
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="http://example.com/std/defs" parse="xml" />
In this example, notice that the element carries the declaration of the xi prefix inside its own start tag. If one file has several XML inclusions, you may want to declare xi at the top, in which case each element is still named xi:include, but doesn't carry the namespace declaration inside its start tag:
<document xmlns:xi="http://www.w3.org/2001/XInclude" ...>
  <!-- ...other content... -->
  <xi:include href="http://example.com/system/common" parse="xml" />
  <!-- ...other content... -->
  <xi:include href="http://example.com/system/style" parse="xml" />
  <!-- ...other content, and perhaps more include elements... -->
</document>
Having a namespace for the generalized XML include also keeps it separate from the application-specific includes in XSLT and XML Schema.
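Because the include elements are identified by their namespace, a program can locate them wherever they occur in a document. The following sketch, which assumes Python's standard-library ElementTree and a made-up document with a hypothetical section element, collects the href of every include element without fetching anything. Note that the prefix in the search path (inc) is local to the code and need not match the xi prefix in the document; only the URI has to match.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical document; the section element is made up for this sketch.
DOC = """\
<document xmlns:xi="http://www.w3.org/2001/XInclude">
  <section>
    <xi:include href="http://example.com/system/common" parse="xml" />
  </section>
  <xi:include href="http://example.com/system/style" parse="xml" />
</document>
"""

root = ET.fromstring(DOC)

# Map a code-local prefix to the XInclude URI and search all descendants.
includes = root.findall(".//inc:include", {"inc": XI_NS})
hrefs = [inc.get("href") for inc in includes]
```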
XSLT and XML Schema are two cases in which an elaborate recommendation requires a full document to describe the transformation or data design, respectively. These documents are known as XSLT stylesheets and schema definitions, respectively. Following one of the basic design principles of XML, these are XML documents that use specially-namespaced vocabularies. In fact, XML Schema defines one vocabulary for the schema definition document, and another vocabulary for schema items that occur in the instance documents or data defined by the schema. Schema definitions and XSLT stylesheets may intermix the elements and attributes of their language with elements and attributes from other namespaces, so prefixes are needed.
It is a best practice to use the prefixes that the W3C uses. For example, the XSLT Recommendation and most books about XSLT use the prefix xsl to identify the elements of the XSLT vocabulary. If you stick with the xsl prefix for your stylesheets, you can then discuss your deployment plans and consult XSLT books without the mental overhead of translating prefixes.
XML documents have a tree structure, descending from the document element or outermost element. A namespace can be declared on any element, allowing it to be recognized within the sub-tree defined by that element and all its descendants. The declaration resembles an attribute, but most W3C recommendations consider it to be a separate type of node. When you look at the XML, you'll see the namespace declarations inside the start tags of the relevant elements, right alongside the attributes. There are two syntax variations:
- xmlns:prefix="URI"
- xmlns="URI"
The first one is commonly used; it associates a prefix with a URI. The second one declares that there is a default namespace for those elements that lack a prefix. Within the overall design of XML, both of these syntaxes fit under the reservation of names beginning with the characters xml for XML purposes. The default namespace is initialized to be no namespace-URI at all, so there is a syntax for undefining a previously-defined default namespace by assigning it to the null string. (Null strings are technically valid as URIs, but disallowed as namespace URIs.) Prefixes can be set to different URIs, but cannot be undefined, at least for XML 1.0 documents.
Variations of namespace declarations
|Assigned value||xmlns=||xmlns:prefix=|
|"http://URI"||Sets default||Associates prefix|
|"" (null string)||Unsets default||ILLEGAL! (may change)|
In April of 2002, the XML Working Group of the W3C announced it was considering a revision of XML namespaces that would permit the assignment of a namespace prefix to the null string. In the September edition of the proposal, the usage was restricted so that such a declaration could only be used to undefine a prefix for the purposes of avoiding conflicts and eliminating unwanted namespace nodes, and a qualified name could not use the prefix at any place in the document where it was assigned to null. For now, note that the exclude-result-prefixes feature of XSLT can be used to remove unwanted namespace nodes if they aren't in use, should you need to do so.
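The difference between unsetting the default namespace and trying to unset a prefix can be observed with a namespace-aware parser. This sketch uses Python's standard-library ElementTree, which follows the Namespaces 1.0 rules; the orders URI and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the default namespace is set, then unset on note.
DOC = """\
<order xmlns="http://example.com/ns/orders">
  <item>
    <note xmlns="">
      <para/>
    </note>
  </item>
</order>
"""

# Elements inside note fall back to having no namespace at all.
tags = [elem.tag for elem in ET.fromstring(DOC).iter()]

# Under Namespaces 1.0 a prefix cannot be bound to the null string,
# so a document that tries it (and then uses the prefix) is rejected.
try:
    ET.fromstring('<p:a xmlns:p=""/>')
    prefix_unset_rejected = False
except ET.ParseError:
    prefix_unset_rejected = True
```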
A prefix can be associated with one URI at the top of a tree, but associated with a different URI within a sub-tree by having an xmlns:prefix="new-uri" declaration in the start-tag of the element atop the sub-tree, then associated with another URI (or the original URI) in a sub-sub-tree inside the sub-tree, and so on. Doing this can cause confusion for those who have to read the raw XML document.
This example is compact, but imagine how hard it would be to find all the xmlns declarations in a large document:
<data:document xmlns:data="http://example.com/namespace/fields">
  <!-- ...other content... -->
  <data:legacy xmlns:data="http://example.com/namespace/legacy-data">
    <!-- ...data of an older style... -->
    <data:item xmlns:data="http://example.com/namespace/fields">
      <!-- ...this one item within legacy uses the standard namespace... -->
    </data:item>
  </data:legacy>
  <!-- ...other content... -->
</data:document>
You can apply the following preferred practices:
The best practice here is to use a given prefix for only one namespace throughout all XML documents in a system. If this is impractical, at least try to associate the prefix with only one URI within a single document. Another best practice is to make all the necessary associations up in the start tag of the document element, so that they apply throughout the whole document. This makes it easier to find all the declarations. The number of namespace declarations that can appear in a single start tag is unlimited.
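If you want to check a document against the one-prefix-one-URI practice, a parser can report every namespace declaration as it goes. The sketch below is one possible approach, assuming the start-ns events of Python's `xml.etree.ElementTree.iterparse`; the function names `prefix_bindings` and `rebound_prefixes` are hypothetical, and the sample document is a trimmed version of the data:document example.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """\
<data:document xmlns:data="http://example.com/namespace/fields">
  <data:legacy xmlns:data="http://example.com/namespace/legacy-data">
    <data:item xmlns:data="http://example.com/namespace/fields"/>
  </data:legacy>
</data:document>
"""


def prefix_bindings(xml_text):
    """Map each declared prefix to the set of URIs it is bound to in the document."""
    bindings = defaultdict(set)
    source = io.BytesIO(xml_text.encode("utf-8"))
    # Each start-ns event carries one (prefix, uri) pair from an xmlns declaration.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix].add(uri)
    return dict(bindings)


def rebound_prefixes(xml_text):
    """List prefixes that are associated with more than one URI."""
    return sorted(p for p, uris in prefix_bindings(xml_text).items() if len(uris) > 1)


problems = rebound_prefixes(DOC)
```

For this document, `problems` contains the single prefix data, which is bound to both the fields URI and the legacy-data URI.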
When a software tool generates XML, it has to place namespace nodes (xmlns declarations) within the tree so that they are in effect where needed to qualify names. If a namespace has an associated prefix, the namespace can be declared higher up than the element where it's needed. This can have the desirable effect of reducing redundant declarations. The Xalan XSLT processor is one example of a tool that does this.
You must declare all prefixes before using them, except xml and xmlns, which can be assumed to be in effect and unchanging throughout all XML documents. You may be tempted to exploit the attribute-like syntax to have some of your declarations set up as default attributes in an external entity.
The best practice here is to have the declarations contained within the document, thereby reducing assumptions and dependencies.
Use of the default namespace (the one applicable to unprefixed element names) is a judgment call. If you can get accustomed to prefixing all element names everywhere, you avoid some pitfalls. However, some people may experience prefix fatigue or feel that one namespace applies to the real content of the document and that making it the default is a way to make that distinction. If you follow that latter path, you will need to establish some design principles for determining the namespace that can be the default in a given document. Of course, the rules will benefit only those people who actually have to read (and possibly create) XML documents.
The best practice regarding use of prefixes is to either use them everywhere or to use them on all items except those that are the real content being delivered to the end user. Use prefixes for all process control elements that are modified only by system developers, including XSLT stylesheets, schema definitions, and so forth. Use prefixes on all items coming from XML vocabularies that are external to your organization, with the possible exception of real content being delivered to the end user.
An attribute can appear in a different namespace than the element that contains it. For example, <movie:title xml:lang="fr"> has an attribute that is not from the movie namespace. If an attribute name has a prefix, its name is in the namespace indicated by the prefix. However, if an attribute name has no prefix, it has no namespace. This is true even when the default namespace has been assigned. The W3C Namespaces in XML Recommendation makes that point with this example:
<x xmlns="http://www.w3.org" xmlns:n1="http://www.w3.org">
  <good a="1" n1:a="2" />
</x>
The elements are affected by the declaration of a URI for the default namespace. That is, both x and good are associated with the URI "http://www.w3.org" because it's the default namespace. The attribute n1:a is also associated with that namespace, due to its use of the n1 prefix, which is associated with the same URI. There is no conflict from the a attribute appearing twice, because while n1:a is in the http://www.w3.org namespace, the unprefixed a is not; the latter is not in any namespace.
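You can confirm this reading by parsing the example with a namespace-aware parser. The sketch below uses Python's standard-library ElementTree, which records each attribute under its resolved name.

```python
import xml.etree.ElementTree as ET

DOC = """\
<x xmlns="http://www.w3.org" xmlns:n1="http://www.w3.org">
  <good a="1" n1:a="2" />
</x>
"""

root = ET.fromstring(DOC)
good = root[0]

# Both elements pick up the default namespace.
element_names = [root.tag, good.tag]

# The unprefixed attribute stays unqualified; only n1:a is in the namespace.
attribute_names = dict(good.attrib)
```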
Since xml:lang is illustrated above, let's note that it is a best practice to use the xml:lang attribute to declare that the content of the element is in a particular natural language.
When a W3C vocabulary specifies both elements and attributes, it typically will not require that the attributes be qualified to the namespace as long as they occur on elements that are qualified. Returning to the XML Include example, in <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="http://example.com/std/defs" parse="xml" />, the href and parse attributes are specified as meaningful attributes of the xi:include element, so an XML parser that is able to act upon the xi:include element must interpret those attributes as details of the include operation.
In the full universe of possible attribute names, all names beginning with the letters "xml" (in that order, but in any upper-/lower-case combination) are reserved to be defined by the W3C. That way, a namespace declaration such as xmlns="http://foo.com" can use the syntax of an attribute rather than a distinct syntax.
Most W3C specifications call it a namespace declaration rather than an attribute, and
it's a best practice to observe the difference in conversation. (The Namespaces Recommendation document itself refers to these declarations as reserved attributes long enough to introduce them. DOM Level 3 also treats these declarations as attributes from the
A namespace declaration like xmlns:fooname="http://foo.com" has the same syntax as an attribute with a qualified name, but the initial letters "xml" signal its special role, and it too is a namespace declaration in conversation. However, an attribute like xml:space="preserve" is still an attribute in the proper terminology, but it is in the reserved namespace. If your XML documents get processed by an application that recognizes XML but is not namespace-aware, the QNames will probably survive and the namespace declarations will be treated as attributes.
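Some programming interfaces expose namespace declarations through their attribute machinery even when they are namespace-aware. As one illustration, and not a statement about every DOM implementation, the sketch below uses Python's `xml.dom.minidom`, where a declaration shows up in the attribute list next to an ordinary attribute. The element name fooname:thing is hypothetical.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString(
    '<fooname:thing xmlns:fooname="http://foo.com" xml:space="preserve"/>'
)
elem = doc.documentElement

# The declaration and the real attribute are listed side by side.
attr_names = sorted(elem.attributes.keys())

# minidom files the declaration under its reserved xmlns namespace constant.
decl = elem.getAttributeNode("xmlns:fooname")
decl_is_xmlns = decl.namespaceURI == xml.dom.XMLNS_NAMESPACE
```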
The xmlns prefix has been specified by the first Namespaces in XML Recommendation to not have an associated URI. The W3C may opt to change this in the future. This may not make much of a difference in the real world, since most XML tools and processes manage namespace declarations automatically. Where they don't, you usually have a method to create or avoid creating a namespace node in the tree-like representation of the XML. When the XML resides in a file, the namespace declaration has the standard xmlns sequence in the start tag of an element, but the XML parser that reads the file will know to recognize xmlns whether or not it's associated with a namespace. (Since XML launched without namespaces, you could potentially encounter an early XML parser that is not namespace-aware; avoid such parsers if the best practices presented here are at all relevant to you.)
The XML Schema Recommendation has complete provisions for defining a document structure with namespaced elements and attributes. Furthermore, it defines a special QName data type for strings that must be valid as qualified names. A schema definition document can specify the target namespace for the document structure.
The older document type definition (DTD) syntax for specifying document structure is not namespace-aware. However, DTDs tolerate element and attribute names that contain colons. If you want to use DTDs and namespaces together, you can do so by designating specific prefixes and treating them as fixed parts of the element and attribute names. The technique is explained in detail in C. M. Sperberg-McQueen's memo in The Cover Pages (see Resources). Expect substantial discomfort if you must do this. (DTDs allow the assignment of values to attributes not explicitly present in the XML document. Setting an attribute named xmlns through this DTD mechanism is a bad idea.)
To this point, I have covered the foundation established by the W3C. Part 2 provides more depth on the best way to establish your own XML vocabularies. In Part 2, you'll also see renaming techniques that are namespace-aware.
- Delve deeper into XML namespaces and define your own XML vocabularies in David Marston's second article, "Plan to use XML namespaces, Part 2."
- For another look at the subject, read Uche Ogbuji's article "Use XML namespaces with care" (developerWorks, April 2004).
- Discover what XML 1.1 and Namespaces 1.1 are about, what changes they bring, and how they affect other specs and users in "XML 1.1 and Namespaces 1.1 revealed" by Arnaud Le Hors (developerWorks, May 2004).
- The Namespaces in XML 1.0 Recommendation from the W3C sets the standard.
- The XML Schema Recommendation of the W3C has three parts: Primer, Structures, and Datatypes.
- The W3C is developing Architectural Principles about identifiers.
- XInclude is on its way to becoming a W3C recommendation.
- The XML Information Set Recommendation of the W3C is an abstract design that identifies the significance of the parts of an XML tree structure.
- "Real-world XML Schema" offers naming ideas (developerWorks, January 2002).
- Part 6 of Christina Lau's "XML and WebSphere Studio Application Developer" series provides coaching on the use of namespaces in schemas.
- Find out more about MathML, a W3C vocabulary, and Directory Services Markup Language (DSML), which comes from an OASIS Technical Committee.
- Read C. M. Sperberg-McQueen's memo on DTDs and namespaces in The Cover Pages.
- See the latest "XPointer xmlns() Scheme" draft for a slightly different way to declare namespaces.
- URI Generic Syntax, RFC 2396, gives plenty of detail on URIs.
- Find more XML resources on the developerWorks XML technology zone.
- Get Rational Application Developer for WebSphere Software, an easy-to-use, integrated development environment for building, testing, and deploying J2EE applications, including generating XML documents from DTDs and schemas.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
David Marston has worked with XML technologies since late 1998. Over his 25+ years in the computing business, he has been involved with all aspects of software development. He is a graduate of Dartmouth College and a member of the ACM. He is on the Next-Generation Web team at IBM Research. You can contact him at David_Marston@us.ibm.com.