XML Watch: Exploring alternative syntaxes for XML

Weighing the pros and cons

Edd Dumbill (edd@xml.com), Editor and publisher, xmlhack.com
Edd Dumbill is managing editor of XML.com and the editor and publisher of the XML developer news site XMLhack. He is co-author of O'Reilly's Programming Web Services with XML-RPC, and co-founder and adviser to the Pharmalicensing life sciences intellectual property exchange. Edd is also program chair of the XML Europe conference. You can contact Edd at edd@xml.com.

Summary:  XML's syntax has brought many benefits due to its interoperability, yet it can be tiresome to author XML documents. Edd Dumbill examines a range of alternative syntaxes for XML, and discusses their benefits and drawbacks.


Date:  01 Oct 2002
Level:  Intermediate


What's in a name? That which we call a rose
By any other name would smell as sweet.
-William Shakespeare, Romeo and Juliet

One of the paradoxes of XML is that despite having a heritage from the document-creation community, it can often be remarkably frustrating to author by hand. The extra typing required to open and close tags and escape special characters not only wastes time, but introduces more possibility for error. If you don't want to buy an editor to help you get around this -- and many people don't, for various reasons including taste, principle, and the sheer intractability of creating a general-purpose XML editor -- then you're stuck editing in longhand.

SGML, the document-oriented ancestor of XML, had a way round this. SGML included ways of adding shortcuts to reduce the amount of tagging required, and could even completely redefine document syntax. However, when XML was created, this functionality was omitted to simplify the language and increase interoperability.

Over time, though, many of the features in SGML have been reimplemented for XML -- either by standards organizations, or just by community efforts. This is somewhat ironic as, in the early days of XML, its proponents took great delight in proclaiming the simplicity of XML over SGML. Now, with all of XML's bolt-ons, the complexities of the two technologies are at least comparable!

The purpose of this article is to survey some of the most popular alternative syntaxes developed for XML, and highlight their areas of usefulness. I will not attempt to list them all, as many people have already made endeavors in this area. Alternative syntaxes have been created for various reasons: to save effort, to mimic favorite environments, to better illustrate the underlying data model, or to work better with existing tools. (In answer to the obvious question about decreasing interoperability through other syntaxes, note that none of these syntaxes purports to be an exchange syntax -- that is still left to the XML 1.0 syntax.)

Ease of use

The first two syntaxes I want to examine are major contenders in the labor-saving category: PYX and SOX.

PYX is a line-oriented alternative syntax for XML. It was covered in detail on developerWorks earlier this year by David Mertz (see Resources). Unlike the other syntaxes I will cover in this article, PYX is mainly useful as an alternative output syntax. As you will see, authoring XML using PYX does not appear to be very practical. Created by Sean McGrath, PYX is based on an SGML (surprise, surprise!) concept called Element Structure Information Set, or ESIS. PYX uses the first character of every line to represent a markup event such as an open-tag or attribute. Table 1 shows what a basic XHTML page might look like in PYX.


Table 1. Comparison of PYX against XML

PYX version:

(html
Axmlns http://www.w3.org/1999/xhtml
-\n
(head
(title
-Test page
)title
)head
-\n
(body
-\n
)body
-\n
)html

XML version:

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Test page</title></head>
<body>
</body>
</html>

PYX's chief advantage is in leveraging the rich supply of line-oriented tools, especially under the UNIX-like operating systems that have emerged over the last 20 years. Instead of having to rewrite tools to process documents using SAX or DOM, you can use familiar tools like grep and wc. For example, to count the number of paragraphs in a document, you could use the command:

grep '^(p$' document.pyx | wc -l

As with all of the syntaxes discussed in this article, the creator of PYX also released tools to convert PYX to and from XML. For more information on these, see Resources.
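To make the event-per-line idea concrete, here is a minimal sketch of an XML-to-PYX converter built on Python's standard xml.sax module. It is not Sean McGrath's tool: the names PyxWriter, pyx_escape, and xml_to_pyx are hypothetical, and the escaping rules (backslash, newline, and tab written as \\, \n, and \t) are an assumption based on the -\n lines in Table 1. Only elements, attributes, and character data are handled.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def pyx_escape(text):
    # Assumed PYX escaping: backslash, newline, and tab become \\, \n, and \t.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


class PyxWriter(ContentHandler):
    """Collects one PYX line per markup event (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # SAX may deliver text in pieces, so buffer it

    def _flush_text(self):
        if self._text:
            self.lines.append("-" + pyx_escape("".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.lines.append("(" + name)
        for attr_name in attrs.getNames():
            value = pyx_escape(attrs.getValue(attr_name))
            self.lines.append("A%s %s" % (attr_name, value))

    def endElement(self, name):
        self._flush_text()
        self.lines.append(")" + name)

    def characters(self, content):
        self._text.append(content)


def xml_to_pyx(xml_text):
    """Return the list of PYX lines for an XML document given as a string."""
    handler = PyxWriter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines
```

With the lines in hand, the paragraph count from the grep example becomes sum(1 for line in xml_to_pyx(doc) if line == "(p").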

SOX, or Simple Outline XML, draws on another common text formatting pattern, the outline. An outline is a common name given to a tree-shaped hierarchy in a document. Such a hierarchy may be expressed through special character sequences or indentation. SOX uses indentation to indicate the level of nesting of XML elements. By doing this, it can omit the closing tag of an element. Listing 1 shows the example from Table 1 in SOX.


Listing 1. Simple XHTML document expressed in SOX
		
html>
    xmlns http://www.w3.org/1999/xhtml
    head>
        title> Test page
    body>

A new user of SOX will not find it a completely unfamiliar step from XML 1.0, especially when compared to PYX. SOX's primary advantage is that it is much easier to ensure that a document is well-formed when editing with a simple text editor. Also, the restriction of one element per line keeps some line-oriented processing possible and makes it harder to write an obscure document.

However, because SOX is designed to be edited, and because it preserves the use of the greater-than sign (>) as a special character, it has some more subtle rules for dealing with whitespace, escaping characters, and so on. For complete details, see the SOX page mentioned in Resources. Due to these subtleties, it's hard to see that SOX actually presents much benefit beyond using a decent developer's text editor with an XML editing mode, such as Emacs or Vim -- you still need to run your SOX file through the SOX-to-XML converter in order to check its correctness.
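The way indentation stands in for closing tags can be sketched in a few lines of Python. This is a toy that understands only the subset of SOX visible in Listing 1 (an element line ends its name with >, optionally followed by text; any other line is treated as a "name value" attribute of the enclosing element). It is not the real SOX converter and ignores the whitespace and escaping rules mentioned above; the function name sox_subset_to_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def sox_subset_to_xml(text):
    """Convert the Listing 1 subset of SOX to an XML string (toy sketch)."""
    root = None
    stack = []  # (indent, element) pairs for the currently open elements
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        line = raw.strip()
        # A line at the same or a shallower indent implicitly closes elements.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        head, sep, rest = line.partition(">")
        if sep and " " not in head:
            # Element line: "name>" or "name> text"
            if stack:
                elem = ET.SubElement(stack[-1][1], head)
            elif root is None:
                elem = root = ET.Element(head)
            else:
                raise ValueError("more than one root element")
            if rest.strip():
                elem.text = rest.strip()
            stack.append((indent, elem))
        else:
            # Attribute line: "name value", attached to the enclosing element
            if not stack:
                raise ValueError("attribute line outside any element")
            name, _, value = line.partition(" ")
            stack[-1][1].set(name, value.strip())
    if root is None:
        raise ValueError("empty document")
    return ET.tostring(root, encoding="unicode")
```

Feeding it the text of Listing 1 yields a well-formed XHTML skeleton equivalent to the XML version in Table 1, apart from the whitespace between elements.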


Mimicking programming languages

Several attempts have been made to project XML into a favorite syntax from a programming language. Benefits of this approach include:

  • You don't need to switch between alternate syntaxes when editing
  • You can take advantage of existing editing aids
  • You may be able to interpret the XML files directly in the language's compiler or interpreter

Python -- SLiP

The syntax of the Python programming language often polarizes opinion: It relies on the indentation level of lines to indicate blocks, rather than the braces {} or parentheses () used by other languages. It certainly leads to a pleasant style of uncluttered code.

The SLiP syntax (which stands for "Sorta Like Python") for XML, developed by Scott Sweeney (see Resources), uses Python-like indentation rules, and can be formatted by any Python-aware editor. Sweeney describes his motivation: "The idea came to me while attending a conference recently. I wanted a way to take notes quickly on my laptop ... Almost all of the XML editors I have seen to date have been mouse-oriented and require constant back-and-forth between mouse and keyboard, making it impossible to keep up with the lectures. I wanted something quicker."

Listing 2 shows SLiP syntax for our trivial XHTML document.


Listing 2. Simple XHTML document expressed in SLiP
		
html(xmlns="http://www.w3.org/1999/xhtml"):
    head():
        title(): "Test page"
    body():

Scheme -- SXML

While most of the syntaxes examined in this article attempt to introduce some degree of simplification, SXML takes a different approach, prioritizing Scheme compatibility over brevity. It provides a representation for XML documents in the Scheme programming language; once such a representation is available, then operations on the document become native operations on Scheme data structures.

Listing 3 shows the example XHTML document in SXML. There is an interesting historical twist to the use of s-expressions to encode XML: Before XML burst large upon the W3C, s-expressions were being favored as one syntax for W3C recommended languages -- see the W3C's Platform for Internet Content Selection (PICS) content rating recommendation, for example, in Resources.


Listing 3. Simple XHTML document expressed in SXML
		
(html (@ (xmlns "http://www.w3.org/1999/xhtml"))
  (head
    (title "Test page"))
  (body))

SXML's creator, Oleg Kiselyov, has gone beyond the mere syntax, and developed SXML into a useful toolkit that includes XPath and XSLT implementations. Oleg's XML and Scheme page (see Resources) includes many interesting ideas on mingling XML with Scheme, including discussion on the idea of making XML documents executable.
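The mapping from an XML tree to an s-expression is mechanical, as the following Python sketch suggests. It walks a DOM built by the standard xml.dom.minidom module and prints the shape used in Listing 3: element name first, attributes in an (@ ...) list, then children. It is an illustration only, not part of Oleg's toolkit; the names to_sxml and quote are hypothetical, whitespace-only text is dropped, and namespaces are left as plain xmlns attributes rather than being expanded.

```python
from xml.dom import minidom, Node


def quote(text):
    # Scheme-style string literal with backslashes and quotes escaped
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sxml(node):
    """Render a DOM element as an SXML-like s-expression string."""
    parts = [node.tagName]
    attrs = node.attributes.items()
    if attrs:
        pairs = " ".join("(%s %s)" % (name, quote(value)) for name, value in attrs)
        parts.append("(@ " + pairs + ")")
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            parts.append(to_sxml(child))
        elif child.nodeType == Node.TEXT_NODE and child.data.strip():
            parts.append(quote(child.data))
    return "(" + " ".join(parts) + ")"


def xml_to_sxml(xml_text):
    return to_sxml(minidom.parseString(xml_text).documentElement)
```

Applied to the XML version in Table 1, this produces the same expression as Listing 3, written on a single line.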


Domain-specific syntaxes

Perhaps the most useful category of non-XML syntaxes is that of domain-specific syntaxes. Any general-purpose alternative syntax for XML is still going to bump up against the limits of that generality: There are few general semantic concepts, so the opportunity for collapsing them into abbreviated syntax is limited. By contrast, when applications of XML are considered, there is much more scope for collapsing multi-tag structures into a shorter representation. This section considers some of the most useful of these syntaxes.

WikiML for documentation-oriented markup

After HTML, one of the most popular markup languages on the Web is probably that used in WikiWikiWeb, a rapid-entry hypertext documentation system that uses a Web browser as its user interface. Wikis tend to use simple character-based markup to denote structure, trading off flexibility against convenience. For example, compare the following HTML hypertext links with Wiki markup:

<p>Here's a link to <a href="http://www.ibm.com">IBM</a>.</p>

Here's a link to [IBM|http://www.ibm.com].

Unfortunately, Wiki syntax tends to vary among different Wiki systems. The WikiML tools take the syntax used in the popular PHPWiki project, and convert it into an XML language, WikiML. By applying various style sheets, you can then make the transition to, say, XHTML or DocBook. Wikis are highly convenient for the particular task of writing documentation, and you would be hard-pressed to write as efficiently in raw XML.
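As a rough illustration of how little machinery such a conversion needs, the sketch below turns the bracketed link form shown above into the equivalent XHTML paragraph. It handles only that one construct and is not the WikiML tools' code; the pattern LINK and the function wiki_line_to_xhtml are hypothetical names, and real PHPWiki markup has many more rules.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches [label|url], the link form used in the example above
LINK = re.compile(r"\[([^|\]]+)\|([^\]]+)\]")


def wiki_line_to_xhtml(line):
    """Wrap one line of wiki text in <p>, converting [label|url] links."""
    out = []
    pos = 0
    for match in LINK.finditer(line):
        # Escape the plain text before the link, then emit the anchor
        out.append(escape(line[pos:match.start()]))
        label, url = match.group(1), match.group(2)
        out.append("<a href=%s>%s</a>" % (quoteattr(url), escape(label)))
        pos = match.end()
    out.append(escape(line[pos:]))
    return "<p>" + "".join(out) + "</p>"
```

Escaping the surrounding text and quoting the attribute value keeps the output well-formed even when the wiki text contains characters such as & or <.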

Shrinking XSLT -- XSLTXT

XSLT must surely be one of the most contorted programming languages in popular circulation. The XML syntax does little to help the style sheet creator, and simple constructs such as "switch/case" blocks can grow to an amazing length very quickly.

XSLTXT is a project that attempts to limit the tag soup of XSLT. XSLTXT does not attempt to alter XSLT semantics at all, but simply provides a reduced-clutter syntax. Table 2 shows a comparison of XSLT and XSLTXT on a typical block.


Table 2. Comparison of XSLT and XSLTXT code

XSLT code:

<xsl:template name="foo">
  <xsl:param name="a"/>
  <xsl:param name="b"/>
  SELECT <xsl:value-of select="$a"/> FROM <xsl:value-of select="$b"/>
</xsl:template>

XSLTXT equivalent:

tpl .name "foo" ("a", "b")
  "SELECT "
  val "$a"
  " FROM "
  val "$b"

XSLTXT, like SLiP, uses indentation to signify block structure, and uses abbreviations for keywords such as xsl:template. Additionally, parentheses are used for parameters. The XSLTXT project provides converters for XSLT and XML, and also provides a TXTReader Java class that can be used as a plug-in deserializer for XSLT in XML processors.

RELAX NG Compact

RELAX NG (RNG) is a schema language for XML, developed by an OASIS Technical Committee. One of the major forces behind RNG is James Clark, who is also the mastermind of XSLT. (One is tempted to wonder if XSLT experience motivated the RELAX NG Compact syntax.) The RNG creators recognized that when you're thinking about modeling a schema, you don't really want to waste time considering excess angle brackets. So, they created RELAX NG Compact, a non-XML syntax that implements the same concepts as the XML syntax for RELAX NG. Table 3 shows how the compact syntax aids clarity over the XML syntax.


Table 3. A comparison of RELAX NG's Compact and XML syntaxes

RELAX NG version:

<?xml version="1.0" encoding="UTF-8"?>
<element name="date"
xmlns="http://relaxng.org/ns/structure/1.0">
 <optional>
  <attribute name="type"/>
 </optional>
 <element name="year"><text/></element>
 <element name="month"><text/></element>
 <element name="day"><text/></element>
</element>

RELAX NG Compact version:

element date {
  attribute type { text }?,
  element year { text },
  element month { text },
  element day { text }
}

The compact syntax simultaneously reduces the amount of text and makes the relationship between the elements clearer. It goes further than XSLTXT: In addition to reducing the amount of typing required, RELAX NG Compact is actually a different language -- note, for example, the use of the question mark (?) in place of the <optional> element container. Because the compact syntax has actually been developed by the same group responsible for the original language, its adoption and support are good. In contrast, a lot of the syntaxes highlighted in this article tend to originate from a single third party.

Other XML languages with complex constructs that have had non-XML syntaxes proposed include Topic Maps, for which Lars Marius Garshol has proposed the Linear Topic Map Notation, and RDF, for which Dan Connolly and Tim Berners-Lee have proposed N3, which is actually a superset of RDF (see Resources).


Summary

The main motivation for creating non-XML syntaxes is the difficulty inherent in authoring XML. As Scott Sweeney noted, even the best commercial XML editors require a degree of pointing and clicking that gets in the way of rapid, free-form content creation. At the end of the day, the contracted general XML syntaxes such as SOX and SLiP have very little to differentiate between them: Their main benefit seems to be in the ability to omit closing tags.

One downside to such contracted syntaxes is in the loss of interoperability and future-proofing. Most of the efforts come from single third-party sources. It's not entirely clear what support there is for diverse character encodings, as well as the less frequently used parts of XML such as processing instructions. Also, in most cases there is only one tool originator in place, so the ideas may well just die out.

When I first wrote on this topic a year ago, developer and author Michael Champion responded, encouraging me not to lose sight of the fact that the interoperability of the XML 1.0 syntax, and consequent network effect, is the main value of XML, and what got us where we are now.

Moving beyond general-purpose XML syntax replacements, the application-specific non-XML syntaxes seem to offer a lot more value, especially where a lot of content is being created. Wiki markup, for instance, can be a huge time saver in comparison to writing straight into DocBook XML. (Eric van der Vlist has written entire books using this method.) Of course, you don't get all the features of DocBook, but there is a reasonable trade-off with ease-of-authoring. RELAX NG Compact is a good example of how a non-XML syntax can really illuminate the underlying concepts and data structures of the language without being cryptic -- and the fact that it's sanctioned by the RNG committee provides some insurance for the future.

In conclusion, is it possible to say when it is best to use a non-XML syntax? It's certainly easy to say when not to use one: when the incremental benefit over XML 1.0 is small, or the route to preserving your data in XML 1.0 is fragile or lossy. A good practice is to consider when the content is created, and whether it will be exchanged frequently. One suitable scenario for a non-XML syntax occurs when there is a one-off creation of the content up front, and it is either not to be exchanged, or exchanged after being translated into XML 1.0. I certainly don't think "well, I can carry on using Python mode in Emacs" is sufficient justification for moving data out of XML for anything but the most personal of projects.

Easy content creation is still the most compelling argument I've seen for using alternative XML representations. Editor development seems to be one of the trickiest problems in computing, and every little bit helps.


Resources
