My professional colleague Elliotte Rusty Harold is a noted instructor on XML, developer of XML software, and contributor in other ways to the development of XML. We share a keen interest in issues of good practices and clear thinking for XML users and developers. I have been offering my own advice and experiences on this topic in recent developerWorks articles. Harold has channeled his own thoughts on the matter into a book, Effective XML. As its home page states:
Effective XML is a collection of guidelines and best practices for using XML. It focuses on using and developing XML applications, with a particular emphasis on aspects of XML that are often misunderstood or misapplied.
In this article, I discuss Harold's book and include my own observations about variations in XML best practices and norms.
The book does not even get to Part 1 before it presents material you really should not miss. The introduction includes a section that breaks down an assortment of XML terminology that is often confused. Thinking and speaking precisely about the many subtle concepts in XML is essential to using the technologies properly. Many of the distinctions Harold makes are not just academic hair-splitting, but are key to truly understanding the core concepts behind XML. I took an important lesson here from the section "Namespace versus Namespace Name versus Namespace URI." I often use these three terms carelessly, and I appreciated the reminder in the book.
Part 1 covers XML syntax, and the first item (like many others in the book) is remarkably similar to some treatments I have given similar topics in my own writing -- in this case my tip "Always use an XML declaration." I strongly disagree with the very next section in the book, "Mark Up with ASCII if Possible." I have always seen XML's clean basis in Unicode as a good reason to ditch the ASCII (and English-only) bias that still grips too much of the software industry. I think XML is already compelling developers of chauvinistic ASCII-only tools to provide better internationalization. Rather than retreat to ASCII in order to accommodate the few remaining tools that have not caught up, I prefer that users force their vendors to internationalize (perhaps resorting to market dynamics by simply choosing different vendors). Harold's suggestion centers around the mark-up (such as tag names), rather than the character data, which he admits has to be localized. He also takes a step in the right direction in a much later section, "Write in Unicode," but I think his recommendations in this area show too much English bias to be viable.
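As a small illustration of what non-ASCII mark-up looks like in practice, here is a sketch of a document in a hypothetical Spanish-language vocabulary. The element names are invented for this example and do not come from the book; the accented letters used are legal name characters in XML 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary: element names are invented for illustration -->
<libro>
  <título>Cien años de soledad</título>
  <año>1967</año>
</libro>
```

Note that the XML declaration states the encoding explicitly, which matters more once the mark-up itself steps outside ASCII.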
Harold has been a leading opponent of the developments in XML 1.1, and his comments have sparked very arcane and interesting discussion in the XML community. The section "Stay with XML 1.0" articulates his take on the issue in clear enough terms that I recommend it to anyone who has to consider whether to support XML 1.1 -- just be aware of the counterarguments made by proponents of XML 1.1. For my part, I plan to stick with XML 1.0 as long as I can, but I notice with interest one issue that might compel me to consider XML 1.1: It loosens many of the unnecessary restrictions on the characters that can be used for XML mark-up constructs. Harold's preference for sticking with ASCII in mark-up certainly makes it easier to ignore XML 1.1, but I've already expressed my disagreement with the idea of avoiding non-ASCII mark-up.
I have observed that many of the 1.0 technologies associated with XML are works of technical excellence, often because they are driven by brilliant individuals and small groups. In an unfortunate pattern, success of the 1.0 specification brings numerous, varied interests to the process of developing 1.1 or 2.0 versions, and the result is a mess mandated by politics and designed by committee. A similar recommendation I make to XML developers is to stay with XPath 1.0 and XSLT 1.0 (mixing in EXSLT where necessary). XPath and XSLT 2.0 are examples of the loss in quality typical of second-generation XML technologies.
One of the most passionate arguments I had with my operating systems design lab partner in college was whether "hump case" (for example, "OneTwo") is better than the underscore (as in "one_two") for naming computer symbols. I advocated hump case back then, but I have come to believe that he was correct: underscores make names considerably more readable. XML also allows hyphens in element and attribute names, which opens up a convention that's as readable as underscores but easier to type ("one-two"). (I recommend this in another article, "Keep your XML clean"; see Resources.) Harold recommends my least favorite option in his section "Name Elements with Camel Case."
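For comparison, the three conventions might look like this side by side. The element and attribute names are made up for the example, and the wrapper element exists only to keep the sketch well-formed.

```xml
<naming-examples>
  <!-- Camel case, the style named in the book's section title -->
  <purchaseOrder orderDate="2004-03-01"/>
  <!-- Underscores -->
  <purchase_order order_date="2004-03-01"/>
  <!-- Hyphens, the convention I prefer -->
  <purchase-order order-date="2004-03-01"/>
</naming-examples>
```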
"White Space Matters" is another section that I think is essential reading. The rest of Part 1 includes sage advice about design patterns for DTDs similar to those that should be understood by developers in any technical language.
The next part covers XML structural matters, and in the first two sections, "Make Structure Explicit through Markup" and "Store Metadata in Attributes," the book touches on a lot of the same principles and topics I've covered in my article "When to use elements versus attributes." Much of the discussion in these book sections and in my article is nicely complementary. As expected, I disagreed on some points, such as Harold's preference never to mark up units of measure separately from numerical values. The sections "Remember Mixed Content" and "Allow All XML Syntax" are also important reading for anyone involved with XML. They address aspects of XML that are sometimes neglected in XML processing, which works against the strengths of XML. The section "Use Processing Instructions for Process-Specific Content" is interesting reading that could be somewhat controversial in the XML community; some would prefer to eliminate processing instructions from XML. I agree with Harold that they can be quite useful.
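One way to picture the disagreement over units of measure is the following sketch. The element and attribute names are hypothetical, and neither form is quoted from the book; the first line is simply my reading of what keeping the unit with the value would look like.

```xml
<measurements>
  <!-- Unit carried in the character data alongside the number -->
  <height>1.8 m</height>
  <!-- Unit marked up separately from the numerical value, here as an attribute -->
  <height unit="m">1.8</height>
</measurements>
```

With the second form, a processor can read the number without first parsing the unit out of the text.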
The sections "Use Namespaces for Modularity and Extensibility" and "Rely on Namespace URIs, Not Prefixes" echo lessons I offered in my article "Use XML namespaces with care" and my tip "Namespaces and versioning" (see also the book section "Version Documents, Schemas, and Stylesheets"). The next section, "Don't Use Namespace Prefixes in Element Content and Attribute Values," goes even further, touching on yet another point of controversy in XML: XSLT and other languages that host XPath do use namespace prefixes in attribute values, and this practice undoubtedly has pitfalls; however, I haven't seen any superior alternative to this practice besides a complete re-architecting of XML namespaces (probably an unrealistic option at this point). Also, in contrasting RELAX NG and W3C XML Schema (WXS) on the use of namespace prefixes in attributes, Harold implies that this practice is not used in RELAX NG. Listing 1 is the example used in the book:
Listing 1. Example illustrating that RELAX NG can be written without using namespace prefixes in attributes
<rng:element xmlns:rng="http://relaxng.org/ns/structure/1.0"
name="Year" ns="http://www.example.com">
<rng:data type="gYear"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
</rng:element>
Listing 2 is a modification of Listing 1 illustrating that RELAX NG does not actually proscribe using namespace prefixes in attributes.
Listing 2. Modification of Listing 1
<rng:element
xmlns:rng="http://relaxng.org/ns/structure/1.0"
xmlns:eg="http://www.example.com"
name="eg:Year">
<rng:data type="gYear"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
</rng:element>
The approach in Listing 2 does indeed use a bound prefix within an attribute. Of course, in the equivalent WXS you have no choice but to use prefixes in attributes. RELAX NG is better in that it provides the option of avoiding them. So Harold's point is still valid overall, but I want to clarify that things are a tad more complex than he lets on.
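For comparison, a rough WXS counterpart of the declaration above might look like the following. This is my own sketch rather than an example from the book, and the choice of elementFormDefault is an assumption made for the illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.com"
    elementFormDefault="qualified">
  <!-- The type attribute holds a QName, so a bound prefix appears in an attribute value -->
  <xs:element name="Year" type="xs:gYear"/>
</xs:schema>
```

Here the meaning of the type attribute depends on the namespace declarations in scope, which is exactly the coupling that the book section warns about.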
I find the section "Reuse XHTML for Generic Narrative Content" rather too restrictive. XML is wonderful in allowing a diversity of vocabularies, and I see little reason to restrict choices in prose-like vocabularies. DocBook and TEI, for example, are semantically richer than XHTML, and are just as useful. I too tend to use XHTML for generic narrative content, but I like having the choice of others as well. Another consideration ties yet again into my concern over English bias: XHTML tags generally only make intuitive sense to an English speaker. I would recommend that someone developing XML vocabularies for use by speakers of some other language craft a vocabulary based on that language, to make working in it easier for their users. The power of XSLT and the ready availability of XML transformation tools do wonders for ensuring that no one ever has to be chained to any particular XML language for any particular task. Of course, this item would also proscribe use of one of the most charmingly named XML vocabularies of all time, John Cowan's "Itsy Bitsy Teeny Weeny Simple Hypertext DTD" (IBTWSH DTD). In all seriousness, though, I don't use this DTD because of its insistence on all-uppercase element type names.
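To make the transformation point concrete, here is a minimal XSLT 1.0 sketch that maps a small French-language narrative vocabulary onto XHTML. The source element names (article, paragraphe, emphase) are hypothetical, invented for this example; a real vocabulary would need many more templates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Hypothetical source vocabulary: article, paragraphe, emphase -->
  <xsl:template match="article">
    <html>
      <head><title>Article</title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <!-- Each narrative element maps to its nearest XHTML counterpart -->
  <xsl:template match="paragraphe">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="emphase">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>
```

Authors work in the vocabulary that suits their language, and the XHTML is produced only at the point of publication.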
Harold and I have often been loud voices advocating that people focus on XML as text, and place less emphasis on data typing, strong data bindings, and other such manifestations of XML as some sort of new-wave DBMS replacement. His section "Pretend There's No Such Thing as the PSVI" argues against the device underlying much of the over-complication of XML: the Post-Schema-Validation Infoset (PSVI). This is a system of type annotations for XML documents that some specifications, such as WXS, XPath 2.0, XSLT 2.0, and XQuery, use as the bedrock data model for XML, in place of the actual XML text. I do wish this section had more to say, as well as more examples to hammer this point home. In fact, this point is important enough that I would have liked to see another section along the lines of "Treat the Raw Text of XML as Paramount in Processing."
The rest of the book covers programming- and processing-specific tips and ideas, with an emphasis on Java tools and idioms. I do not cover that material here because my focus is on core XML practices and design. Effective XML is a terrific book, and I highly recommend it. I don't agree with everything in it, but I'd be mad to expect to agree with every point in such a book. Harold does lay out clear principles which, if at least carefully considered, will make any reader a better user of XML.
- Participate in the discussion forum.
- Visit Elliotte Rusty Harold's home page for his book Effective XML (Addison-Wesley, 2003), where you can preview selected sections online. See the developerWorks Developer Bookstore to order the book.
- Find out more about some of the XML vocabularies mentioned here in my article, "A survey of XML standards: Part 3" (developerWorks, February 2004). See also the master article of my XML standards survey (developerWorks, March 2004). This series did not cover John Cowan's Itsy Bitsy Teeny Weeny Simple Hypertext DTD.
- I have written several articles and tips related to the XML design topics Harold covers, including:
  - "When to use elements versus attributes" (developerWorks, March 2004)
  - "Use XML namespaces with care" (developerWorks, April 2004)
  - "Tip: Always use an XML declaration" (developerWorks, April 2004)
  - "Tip: Namespaces and versioning" (developerWorks, June 2002)
  - "Localization within a document format" (developerWorks, September 2002)
  - "Keep your XML clean" (ADTmag.com)
- Find more XML resources on the developerWorks XML zone, including previous installments of the Thinking XML column. If you have comments on this article, please post them on the Thinking XML forum.
- Find out how you can become an IBM Certified Developer in XML and related technologies.

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.