Readers of my earlier installments on RELAX NG (Part 1 and Part 2) will have noticed that I chose to provide many of my examples using compact syntax rather than XML syntax. Both formats are semantically equivalent, but the compact syntax is, in my opinion, far easier to read and write. Moreover, readers of this column in general will have a sense of how little enamored I am of the notion that everything vaguely related to XML technologies must itself use an XML format. XSLT is a prominent example of this XML-everywhere tendency and its pitfalls -- but that is a rant for a different column.
Later in this article, I will discuss the format of the RELAX NG compact syntax in more detail than the prior installments allowed.
On the downside, since the RELAX NG compact syntax is newer -- and not 100% settled at its edges -- tool support for this syntax is less complete than for the XML syntax. For example, even though the Java tool trang supports conversion between compact and XML syntax, the associated tool jing will only validate against XML syntax schemas. Obviously, it is not overly difficult to generate the XML syntax RELAX NG schema to use for validation, but direct usage of the compact syntax schema would be more convenient. Likewise, the Python tools xvif and 4xml validate only against XML syntax schemas.
To help remedy the gaps in direct support for compact syntax, I have produced a Python tool for parsing RELAX NG compact schemas, and for outputting them to XML format. While my rnc2rng tool only does what trang does, Eric van der Vlist and Uche Ogbuji have expressed their interest in including rnc2rng in xvif and 4xml, respectively. Ideally, in the near future direct validation against compact syntax schemas will be included in these tools.
Writing rnc2rng proved more difficult than I anticipated, and there is probably a lesson in that. While RELAX NG compact syntax is quite readable -- as you will see below -- there are enough variations in the arrangement of tokens between instances that a parser was non-trivial to write. For better or worse, I used PLY's lex module to tokenize the schema, but gave up on using yacc for the parsing, and opted for application-specific massaging of the token stream instead. Debugging declarative grammars is often more difficult than incrementally adjusting imperative code. Despite my frequent concern about the unfriendliness of XML, the task of parsing an XML syntax schema would have been far simpler, since I could have let a framework like SAX or DOM do most of the work for me.
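To give a feel for the tokenizing step, here is a minimal sketch that splits a small subset of the compact syntax into tokens. It uses Python's standard `re` module rather than PLY's lex, and the token names and the `tokenize` function are hypothetical; this is not the token table rnc2rng actually uses.

```python
import re

# Hypothetical token set covering only a small subset of the compact
# syntax. Order matters: "##" must be tried before "#", and keywords
# before general names.
TOKEN_SPEC = [
    ("DOC",     r"##[^\n]*"),            # '## ...' annotation line
    ("COMMENT", r"#[^\n]*"),             # '# ...' comment line
    ("LITERAL", r'"[^"\n]*"'),           # double-quoted string
    ("KEYWORD", r"(?:element|attribute|namespace|default|datatypes"
                r"|text|empty|start)(?![\w.:-])"),
    ("NAME",    r"[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?"),
    ("PUNCT",   r"[{}()=|&,*+?]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(schema):
    """Return a list of (token_type, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(schema):
        match = MASTER.match(schema, pos)
        if match is None:
            raise ValueError("unexpected character %r at offset %d"
                             % (schema[pos], pos))
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

if __name__ == "__main__":
    for token in tokenize('element foo:bar { text }* # note'):
        print(token)
```

A flat token list like this is what the later "massaging" stage would walk over imperatively, for example by pairing each opening brace with its close before deciding which XML element to emit.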
Since the last installment, tool support for RELAX NG has gotten a little bit better. Version 2.0 of the <oXygen/> XML editor has been released, incorporating trang as a plug-in, and thereby offering some support for RELAX NG. While this is not the place for a full review, I found that <oXygen/> 2.0 -- which I liked in version 1.2 to start with -- has gained a number of nice features and general polish. I would like to see RELAX NG integrated at a deeper level into various editors -- to a degree similar to DTD and W3C XML Schema. With a bit more time, I think greater RELAX NG integration into tools is likely.
A compact syntax RELAX NG schema may begin with any of several optional namespace declarations. Each of these looks a lot like an assignment statement in a programming language. A default namespace for schema tags may be specified with:
default namespace = "http://relaxng.org/ns/structure/version"
When converted to XML syntax, this declaration adds an "ns" attribute to the root element of the schema. If you do not specify this namespace explicitly, the default value of the default namespace is used, and is declared with an attribute on the root element, such as:
<root-tag xmlns="http://relaxng.org/ns/structure/1.0">
You may also declare an external namespace for elements or attributes:
namespace foo = "http://some.path.to/foo"
This allows you to describe elements like:
element foo:bar { ... }
When converted to XML syntax, the namespace URL is added to the root tag as an extra attribute:
<root-tag xmlns="http://relaxng.org/ns/structure/1.0"
          xmlns:foo="http://some.path.to/foo">
The namespace "a" is a bit special here. RELAX NG allows annotations, which are basically just tags with the "a" namespace. In compact syntax, you can avoid thinking about namespaces by adding an annotation with initial double hash marks:
## An annotation
Converted to XML syntax, this annotation appears as:
<a:documentation>An annotation</a:documentation>
By the way, a single leading hash introduces a comment instead of an annotation, so the following compact syntax form:
# This is a comment
corresponds to this XML form:
<!-- This is a comment -->
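The two hash forms above map mechanically onto XML syntax. The following is a minimal sketch of that mapping; the function name `convert_hash_line` is hypothetical, and the code is not taken from rnc2rng or trang.

```python
from xml.sax.saxutils import escape

def convert_hash_line(line):
    """Translate one '#' or '##' line of compact syntax to XML syntax.

    Returns None for lines that are neither comments nor annotations.
    """
    stripped = line.strip()
    if stripped.startswith("##"):
        # Double hash: an annotation in the "a" namespace.
        text = stripped[2:].strip()
        return "<a:documentation>%s</a:documentation>" % escape(text)
    if stripped.startswith("#"):
        # Single hash: an ordinary comment.
        text = stripped[1:].strip()
        # "--" may not appear inside an XML comment, so break it up.
        while "--" in text:
            text = text.replace("--", "- -")
        return "<!-- %s -->" % text
    return None

if __name__ == "__main__":
    print(convert_hash_line("## An annotation"))
    print(convert_hash_line("# This is a comment"))
```

Testing for "##" before "#" matters here, since every annotation line also starts with a single hash.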
You can also use a slightly odd compact syntax form to specify other annotations within the "a" namespace:
[ a:defaultValue = "foo" ]
A root attribute "xmlns:a" will be specified automatically in the XML syntax if annotations are used, but since "a" is just another namespace, you can specify your own URL if you want. The default attribute is equivalent to specifying:
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
One more special namespace is specified differently in both syntax forms. Data types rely on a modular specification, usually using W3C XML Schema data types. You may specify these with compact syntax:
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
or XML syntax:
<root-tag xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
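Taken together, the declarations described in this section each contribute one attribute to the root element of the XML syntax schema. As a rough sketch of that mapping (the regular expression and the `root_attributes` function are hypothetical, and only the three declaration forms shown above are handled):

```python
import re

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Matches only the forms discussed in the text:
#   default namespace = "..."
#   namespace PREFIX = "..."
#   datatypes PREFIX = "..."
DECLARATION = re.compile(
    r'^\s*(?:(default)\s+namespace|(namespace|datatypes)\s+([\w.-]+))'
    r'\s*=\s*"([^"]*)"\s*$'
)

def root_attributes(schema_lines):
    """Collect (attribute, value) pairs for the root element."""
    attrs = [("xmlns", RNG_NS)]
    for line in schema_lines:
        match = DECLARATION.match(line)
        if match is None:
            continue  # not a declaration line
        is_default, kind, prefix, uri = match.groups()
        if is_default:
            attrs.append(("ns", uri))
        elif kind == "namespace":
            attrs.append(("xmlns:" + prefix, uri))
        else:  # kind == "datatypes"
            attrs.append(("datatypeLibrary", uri))
    return attrs

if __name__ == "__main__":
    schema = [
        'default namespace = "http://some.other.url/ns"',
        'namespace foo = "http://home.of.foo/ns"',
        'datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"',
    ]
    for name, value in root_attributes(schema):
        print('%s="%s"' % (name, value))
```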
Syntax features: Nested and context-free
The main body of a RELAX NG grammar may have either of two styles. In some ways, the more direct style is to simply nest elements and attributes where they should occur in a valid instance. Generally, it is good form to use indentation much as you would in a programming language, but as in C-family languages, curly braces are the actual block delimiters. A moderately complete schema would look like this:
Listing 1. A nested compact syntax schema
# A library patron example
default namespace = "http://some.other.url/ns"
namespace foo = "http://home.of.foo/ns"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
## Annotation here
element patron {
  element name { xsd:string { pattern = "\w{,10}" } }
  & element id-num { xsd:string }
  & element book {
      ( attribute isbn { text }
      | attribute title { text }
      | attribute anonymous { empty })
    }*
}
The library patron example uses most of the syntax elements. "&"s are interspersed between elements (or attributes) to indicate that the several elements must occur, but may do so in any order. In XML syntax, this is the same as the <interleave> tag. Likewise, interspersed "|"s indicate a choice between several items -- in XML, <choice>. Notice the "book" element, too: The parentheses indicate a group, but they are redundant in this case. A group (XML: <group>), however, is useful as part of quantification or interspersal. For example:
Listing 2. Using groups for quantification
element foo {
  ( element bar { text },
    element baz { text } )+,
  element bam { text } }
In this case, a valid document's root <foo> element might
contain several <bar></bar><baz></baz> sequences prior to one
final <bam> element. There is no way to express the same
concept by only quantifying the individual "bar" and "baz" elements.
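The difference between quantifying a group and quantifying its members can be seen with an analogy. The sketch below uses ordinary regular expressions over a list of child element names as a stand-in for RELAX NG patterns; it is not a validator, and the `matches` helper is hypothetical.

```python
import re

def matches(pattern, children):
    """Check a sequence of child element names against a content pattern.

    Each name is followed by a single space so that the regular
    expression can treat "name " as one unit.
    """
    return re.fullmatch(pattern, "".join(c + " " for c in children)) is not None

# ( element bar, element baz )+, element bam
GROUPED = r"(?:bar baz )+bam "
# element bar+, element baz+, element bam
UNGROUPED = r"(?:bar )+(?:baz )+bam "

alternating = ["bar", "baz", "bar", "baz", "bam"]
clustered = ["bar", "bar", "baz", "bam"]

if __name__ == "__main__":
    for name, pattern in (("grouped", GROUPED), ("ungrouped", UNGROUPED)):
        print(name, matches(pattern, alternating), matches(pattern, clustered))
```

The grouped pattern accepts the alternating sequence and rejects the clustered one, while the individually quantified pattern does the reverse.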
A nested-style RELAX NG grammar need not describe a single element only. Any well-formed XML document must have a single root element, so clearly an attribute at the top is prohibited. Likewise, a sequence or interleave description at the top level could not describe a well-formed XML document, and therefore it could not describe a valid one. But there is nothing wrong with allowing a choice of root elements, such as:
( element foo {text}
| element bar {text} )
A second style of RELAX NG grammar more closely resembles a DTD.
A special production named "start" is indicated at the
beginning, followed by a variety of other named productions. As
with namespace declarations, a production is named in the manner
of an assignment in a programming language. For example, a
library patron schema could also look something like this:
Listing 3. A context-free compact syntax schema
# A library patron example
default namespace = "http://some.other.url/ns"
namespace foo = "http://home.of.foo/ns"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
## Annotation here
start = patron
patron = name & id-num & book
name = element name { xsd:string { pattern = "\w{,10}" } }
id-num = element id-num { xsd:string }
book = element book {
    ( attribute isbn { text }
    | attribute title { text }
    | attribute anonymous { empty }) }*
Names of productions may occur within other productions, which can prevent repetitions, and generally make complex patterns more readable. Beyond readability, naming patterns allows recursive definition of patterns -- either direct or mutual recursion. For example, describing HTML -- where tables can nest within tables, or lists within lists -- is not possible in a strictly nested style. One upshot of recursive XML instance documents is that DTDs and context-free RELAX NG are much more natural as descriptions than W3C XML Schemas are (but you can get what is needed out of W3C XML Schemas; it just requires more work).
It is probably worth looking at an entire XML syntax RELAX NG schema document. For comparison, Listing 4 is what rnc2rng produces when processing the context-free library patron schema in Listing 3:
Listing 4. A context-free XML syntax schema
<?xml version="1.0" encoding="UTF-8"?>
<!-- A library patron example -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://some.other.url/ns"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:foo="http://home.of.foo/ns">
  <a:documentation>Annotation here</a:documentation>
  <start><ref name="patron"/></start>
  <define name="patron">
    <interleave>
      <ref name="name"/>
      <ref name="id-num"/>
      <ref name="book"/>
    </interleave>
  </define>
  <define name="name">
    <element name="name">
      <data type="string">
        <param name="pattern">\w{,10}</param>
      </data>
    </element>
  </define>
  <define name="id-num">
    <element name="id-num">
      <data type="string"/>
    </element>
  </define>
  <define name="book">
    <zeroOrMore>
      <element name="book">
        <choice>
          <attribute name="isbn"/>
          <attribute name="title"/>
          <attribute name="anonymous">
            <empty/>
          </attribute>
        </choice>
      </element>
    </zeroOrMore>
  </define>
</grammar>
I would say this is easier to read than a W3C XML Schema, but it doesn't even come close to the compact syntax (prior installments pointed out that this schema is actually impossible to express precisely in either a W3C XML Schema or a DTD).
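One practical benefit of the XML syntax is that ordinary XML tools can inspect a grammar. As a small illustration, the sketch below uses Python's standard ElementTree module to confirm that every <ref> in a grammar names a pattern that is actually defined. The `undefined_refs` function and the deliberately incomplete `SAMPLE` grammar are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as "{namespace-uri}localname".
RNG = "{http://relaxng.org/ns/structure/1.0}"

def undefined_refs(grammar_xml):
    """Return the sorted names used in <ref> that have no <define>."""
    root = ET.fromstring(grammar_xml)
    defined = {d.get("name") for d in root.iter(RNG + "define")}
    used = {r.get("name") for r in root.iter(RNG + "ref")}
    return sorted(used - defined)

# A made-up grammar in which "book" is referenced but never defined.
SAMPLE = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="patron"/></start>
  <define name="patron">
    <interleave>
      <ref name="name"/>
      <ref name="book"/>
    </interleave>
  </define>
  <define name="name">
    <element name="name"><text/></element>
  </define>
</grammar>
"""

if __name__ == "__main__":
    print(undefined_refs(SAMPLE))
```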
In these examples you'll notice that elements and attributes in compact syntax always contain something in curly braces after their name. In XML syntax you can self-close an attribute tag, but in compact syntax, to prevent ambiguity, you need to specify at least {text} or {empty} for an attribute body. Of course, you can also use a more complex data type description if you wish. Also, the only quantification that makes sense for attributes is "?" -- attributes might be optional, but they will not be repeated multiple times.
In some corner cases, rnc2rng differs from trang. For example, both tools force an annotation to occur inside a root element in XML syntax, even if the annotation line occurs before the root element in the compact syntax. Since well-formed XML documents are single-rooted, this is a necessity. But trang also moves comments in a similar manner, while rnc2rng does not. At a minimum, the two tools use whitespace in a slightly different manner. Most likely, a few other variations exist, but ideally none that are semantically important.
- Download the xvif library. For a somewhat more polished tool, 4Suite incorporates xvif for RELAX NG validation. The command-line tool 4xml will validate against both RELAX NG and DTDs, with various options. 4Suite includes many other tools and libraries for working with many XML-related technologies.
- trang and jing are complementary tools for transformation between schemata, and validation against RELAX NG schemas. The former depends on the latter, but both can be downloaded in a convenient archive.
- You will need to obtain an implementation of the Java API for XML Processing (JAXP) to use trang. If you run a Java 1.4 JVM, you are fine; otherwise, download crimson.
- DTDinst is a Java tool for converting DTDs into an XML instance document format, including handling of parameter entities. The DTDinst XML format is of limited utility by itself, since nothing else works with it. However, an XSLT stylesheet is available to transform this format into RELAX NG (with a few caveats). You will need an XSLT tool to utilize this.
- Find a collection of the documents and tools presented in this series of articles.
- Read David Mertz's roundup of XML editors: Part 1 examines Java and MacOS applications (including <oXygen/>), while Part 2 looks at Windows-based products.
- You'll find all of the previous installments of the XML Matters column.

David Mertz thinks that the schema that is real is not the real schema. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future, columns are welcomed.