
Introducing Examplotron

The fastest road to schema

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact him at uche@ogbuji.net.

Summary:  A zoo of XML schema languages is out there, and although some of the beasts are bigger than others, none is as friendly as Examplotron. With Examplotron, your example XML document is, for the most part, your schema. It requires you to learn very little new syntax, and most of the core features of XML can be specified by providing representative examples in the source. In this article, Uche Ogbuji introduces Examplotron, providing plenty of, well, examples.

Date:  10 Jun 2003
Level:  Introductory

At first XML had the Document Type Definition (DTD). XML 1.0 came bundled with the schema technology inherited from SGML. However, numerous XML users complained about DTDs, including the fact that they use a syntax different from that of XML itself. The W3C developed a successor technology to DTD, W3C XML Schema, but some complained that it was too complex and showed every sign of design-by-committee. Separate groups developed schema technologies that became RELAX NG and Schematron. These technologies all have their strengths and weaknesses, and their attendant factions. But for the developer with deadlines to meet, crafting schemata is often too much of an additional burden.

Without a doubt, it is always a good idea to develop a schema. If for no other reason, it documents the format. But in the real world, the most common course for harassed developers is to write a sample document in the XML format and let it serve all the purposes of a proper schema. But what if the example itself could provide the benefits of a formal schema? In particular, what if the example could be used to validate documents? Eric van der Vlist set out to develop a system that allows example documents to serve as formal schemata, and his invention is Examplotron.

In this article, I introduce Examplotron. This system is simple to use, so I encourage you to follow along by downloading Examplotron 0.7 (compile.xsl) and using your favorite XSLT and RELAX NG processors (see Resources for relevant links). The Examplotron implementation file you download is called compile.xsl; I find this name too generic, so on my machine, and in this article, I have renamed it to eg-compile.xsl.

Schema by example

To use Examplotron, take almost any XML instance and run it through a compiler that turns it into an Examplotron validation script. The script can then be run against real instance documents to validate them. In earlier releases of Examplotron, the process was as illustrated in Figure 1:


Figure 1. Processing model of early versions of Examplotron

This is similar to the most common mechanism for Schematron validation. The schema, which in the case of Examplotron is a reference instance document, is compiled into an XSLT script, which can then be run against other XML documents to check their validity against the schema. The most recent Examplotron versions (including 0.7, which I cover in this article) use a different process, illustrated in Figure 2: an XSLT stylesheet compiles the reference document into a RELAX NG schema, and any RELAX NG processor then validates documents against that schema.


Figure 2. Processing model of the most recent version of Examplotron

Keeping the simple stuff simple

Suppose I come up with an XML format for mailing address labels. To think through the format, and describe it to others, I come up with a simple example, as in Listing 1 (eg1.xml).


Listing 1. A mailing label instance and valid Examplotron schema (eg1.xml)
<?xml version="1.0" encoding="utf-8"?>
<labels>
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>

The brilliant thing is that my simple example is, without any further fuss or ado, a perfectly useful Examplotron schema. You can use any XSLT processor and the eg-compile.xsl script to compile it into a form ready for validation:

$ 4xslt -o eg1.rng eg1.xml eg-compile.xsl

The format of the 4xslt command line above is 4xslt -o [output file] [source file] [XSLT file]. The output file, eg1.rng, is a RELAX NG schema, and you can use any RELAX NG processor to validate documents against it. In this article, I use 4Suite's RELAX NG facilities (which are based on xvif, also by the productive Eric van der Vlist). Since eg1.xml is both a schema and a valid source document, I can apply the generated schema to it:

$ 4xml --rng=eg1.rng eg1.xml

The format of the 4xml command line above is 4xml --rng=[RELAX NG schema file] [source file]. By default, the source document is echoed back to the screen as long as no RELAX NG validation errors have been found, which should be the case in the above invocation. For a more telling test, I can apply the created RELAX NG schema against a different document that conforms to eg1.xml -- such as the document in Listing 2 (test1.xml):


Listing 2. Sample document for validation against the Examplotron schema (test1.xml)
<?xml version="1.0" encoding="utf-8"?>
<labels>
  <label>
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street>
      <city>Hailey</city>
      <state>ID</state>
    </address>
  </label>
</labels>

I apply the schema as follows:

$ 4xml --rng=eg1.rng test1.xml

And again it is valid. Listing 3 (test2.xml) is an example of an invalid document. When I validate it against eg1.rng as above, I get an error message -- "Qname quote not expected" -- which makes perfect sense because there is nothing in the Examplotron source document suggesting that a quote element is legal.
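To make the "example as schema" idea concrete, here is a minimal, hypothetical Python sketch using only the standard library. It is a toy, not Examplotron's implementation (which compiles to RELAX NG through XSLT): it records, for each element name, the exact sequence of child elements seen in the example, then checks an instance against that record. The function names, the `label_doc` helper, and the error wording are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema by example"; NOT Examplotron's algorithm.
def compile_example(example_xml):
    """Map each element name to the exact tuple of child element names it
    has in the example (the first occurrence of a name wins)."""
    root = ET.fromstring(example_xml)
    schema = {"#root": root.tag}
    for elem in root.iter():
        schema.setdefault(elem.tag, tuple(child.tag for child in elem))
    return schema


def validate(schema, instance_xml):
    """Return a list of error messages; an empty list means valid."""
    root = ET.fromstring(instance_xml)
    if root.tag != schema["#root"]:
        return [f"root element {root.tag} not expected"]
    errors = []
    for elem in root.iter():
        expected = schema.get(elem.tag)
        if expected is None:
            continue  # an unknown element is reported by its parent
        actual = tuple(child.tag for child in elem)
        if actual != expected:
            unexpected = [tag for tag in actual if tag not in expected]
            if unexpected:
                errors.append(f"element {unexpected[0]} not expected in {elem.tag}")
            else:
                errors.append(f"{elem.tag} has children {actual}, expected {expected}")
    return errors


def label_doc(name, street, city, state, extra=""):
    """Build a one-label document shaped like Listings 1 to 3."""
    return (f"<labels><label>{extra}<name>{name}</name><address>"
            f"<street>{street}</street><city>{city}</city><state>{state}</state>"
            f"</address></label></labels>")


EG1 = label_doc("Thomas Eliot", "3 Prufrock Lane", "Stamford", "CT")
TEST1 = label_doc("Ezra Pound", "45 Usura Place", "Hailey", "ID")
TEST2 = label_doc("Ezra Pound", "45 Usura Place", "Hailey", "ID",
                  extra="<quote>What thou lovest well remains, the rest is dross</quote>")

schema = compile_example(EG1)
print(validate(schema, EG1))    # []
print(validate(schema, TEST1))  # []
print(validate(schema, TEST2))  # ['element quote not expected in label']
```

As with the real tool, the example validates against itself, and the quote element is rejected simply because the example never shows one. Text content is ignored here; only element structure is checked.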


Listing 3. Sample document that is invalid against the Examplotron schema (test2.xml)
<?xml version="1.0" encoding="utf-8"?>
<labels>
  <label>
    <quote>What thou lovest well remains, the rest is dross</quote>
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street>
      <city>Hailey</city>
      <state>ID</state>
    </address>
  </label>
</labels>


Refinements on the theme

Of course, the sample XML I've presented as an Examplotron schema probably doesn't convey all the information required for validation. For example, are there any optional elements or attributes that were omitted from the sample document? You can settle this question by always including every possible element and attribute in the Examplotron source schema, even the optional ones; Examplotron provides ways for you to mark some elements as optional. Another question that might arise is: Can there be more than one label element? To indicate that an element can occur more than once, simply list it multiple times in the Examplotron source. Listing 4 (eg2.xml) is an Examplotron source file specifying that there can be one or more label elements.


Listing 4. Mailing label Examplotron source that allows multiple labels (eg2.xml)
<?xml version="1.0" encoding="utf-8"?>
<labels>
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
  <label/>
</labels>

Notice that the second label is empty. Examplotron derives the content model for the element from the first occurrence; the second is purely a marker indicating that the element can occur more than once, so you can leave it empty. To avoid confusing people who read the Examplotron source as a human-readable example, though, you may want to fill out all such elements with the expected content. Listing 5 (test3.xml) is an example of a document that is valid against Listing 4, having more than one label.


Listing 5. Sample document that is valid against Listing 4 (test3.xml)
<?xml version="1.0" encoding="utf-8"?>
<labels>
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
  <label>
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street>
      <city>Hailey</city>
      <state>ID</state>
    </address>
  </label>
</labels>

This document is not valid against Listing 1. Because Listing 1 has only one label element, Examplotron takes it at its word and generates a RELAX NG schema that permits only one. Also, all elements that appear in an Examplotron schema are required by default.
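The two rules just described (a repeated sibling means "one or more", and every element shown is required) can be sketched in a few lines of Python. This is a hypothetical, standard-library-only toy, not Examplotron's code: it collapses runs of same-named children into `(name, occurrence)` pairs, lets the first non-empty occurrence of a name define its model (so an empty repeat marker like the second `<label/>` does not overwrite it), and matches each instance element's child names against a regular expression built from that model. All names and messages are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical toy inference; NOT Examplotron's implementation.
def content_model(elem):
    """Collapse runs of same-named children: a repeated run means '+',
    a single child means exactly one ('1')."""
    return [(tag, "+" if len(list(run)) > 1 else "1")
            for tag, run in groupby(child.tag for child in elem)]


def compile_example(example_xml):
    root = ET.fromstring(example_xml)
    schema = {"#root": root.tag}
    for elem in root.iter():
        model = content_model(elem)
        # The first non-empty occurrence defines the model, so an empty
        # repeat marker such as <label/> never overwrites it.
        if elem.tag not in schema or (model and not schema[elem.tag]):
            schema[elem.tag] = model
    return schema


def to_regex(model):
    """Turn [(tag, occurs), ...] into a regex over 'tag,tag,...' strings."""
    quantifier = {"1": "", "+": "+", "*": "*", "?": "?"}
    return "".join(f"(?:{re.escape(tag)},){quantifier[occ]}" for tag, occ in model)


def validate(schema, instance_xml):
    root = ET.fromstring(instance_xml)
    if root.tag != schema["#root"]:
        return [f"root element {root.tag} not expected"]
    errors = []
    for elem in root.iter():
        if elem.tag not in schema:
            continue  # the parent's content model already rejects it
        children = "".join(child.tag + "," for child in elem)
        if not re.fullmatch(to_regex(schema[elem.tag]), children):
            errors.append(f"invalid content in {elem.tag}: {children.rstrip(',') or '(empty)'}")
    return errors


LABEL = ("<label><name>{}</name><address><street>{}</street>"
         "<city>{}</city><state>{}</state></address></label>")
ELIOT = LABEL.format("Thomas Eliot", "3 Prufrock Lane", "Stamford", "CT")
POUND = LABEL.format("Ezra Pound", "45 Usura Place", "Hailey", "ID")

EG1 = f"<labels>{ELIOT}</labels>"           # like Listing 1: one label
EG2 = f"<labels>{ELIOT}<label/></labels>"   # like Listing 4: repeat marker
TEST3 = f"<labels>{ELIOT}{POUND}</labels>"  # like Listing 5: two labels

print(validate(compile_example(EG1), TEST3))  # ['invalid content in labels: label,label']
print(validate(compile_example(EG2), TEST3))  # []
```

A label missing its address is also rejected against EG2, reflecting the "required by default" rule.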

Explicit indication of occurrences

This is great so far, but not quite flexible enough for the real world. Usually in XML formats, one has to specify that a certain element is optional, or appears a certain number of times. In DTDs, one uses occurrence indicators to express this. Examplotron guesses well from sample documents as they are, but in most cases you'll have to help it out a bit to get more precise results. You can provide hints to Examplotron by adding special attributes to the source document, similar to DTD occurrence indicators. Listing 6 (eg3.xml) is an Examplotron schema that specifies zero or more label elements and allows a single, optional quote element.


Listing 6. Mailing label Examplotron source that uses Examplotron hint attributes (eg3.xml)
<?xml version="1.0" encoding="utf-8"?>
<labels xmlns:eg="http://examplotron.org/0/">
  <label eg:occurs="*">
    <quote eg:occurs="?">Midwinter Spring is its own season...</quote>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>

Notice the declaration of the special Examplotron namespace in this document, which is used for adding the hint attributes. The eg:occurs attribute has values similar to occurrence indicators in DTDs. Hence "*" means "zero or more", "+" means "one or more", and "?" means "zero or one". (For more on these values, see the Examplotron home page in Resources.)
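The following hypothetical Python sketch (standard library only; a toy, not Examplotron's implementation) shows how such a hint might override inference. ElementTree exposes the eg:occurs attribute under its namespace URI in Clark notation, `{http://examplotron.org/0/}occurs`, so the prefix used in the document does not matter. When the hint is present on the first element of a run of same-named siblings it wins; otherwise repetition implies "+" and a single element means exactly one. Only the four DTD-style values are handled, and all names and messages are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from itertools import groupby

# ElementTree spells eg:occurs in Clark notation: {namespace URI}local-name.
EG_OCCURS = "{http://examplotron.org/0/}occurs"


def content_model(elem):
    """Group runs of same-named children. An eg:occurs hint on the first
    child of a run wins; otherwise repetition implies '+', a single child '1'."""
    model = []
    for tag, run in groupby(elem, key=lambda child: child.tag):
        run = list(run)
        occurs = run[0].get(EG_OCCURS) or ("+" if len(run) > 1 else "1")
        model.append((tag, occurs))
    return model


def compile_example(example_xml):
    root = ET.fromstring(example_xml)
    schema = {"#root": root.tag}
    for elem in root.iter():
        model = content_model(elem)
        if elem.tag not in schema or (model and not schema[elem.tag]):
            schema[elem.tag] = model
    return schema


def validate(schema, instance_xml):
    """Match each element's child-name sequence against a regex built from
    its content model ('1', '+', '*', '?' only); return error messages."""
    quantifier = {"1": "", "+": "+", "*": "*", "?": "?"}
    root = ET.fromstring(instance_xml)
    if root.tag != schema["#root"]:
        return [f"root element {root.tag} not expected"]
    errors = []
    for elem in root.iter():
        if elem.tag not in schema:
            continue  # the parent's content model already rejects it
        pattern = "".join(f"(?:{re.escape(t)},){quantifier[o]}" for t, o in schema[elem.tag])
        children = "".join(child.tag + "," for child in elem)
        if not re.fullmatch(pattern, children):
            errors.append(f"invalid content in {elem.tag}: {children.rstrip(',') or '(empty)'}")
    return errors


EG3 = """<labels xmlns:eg="http://examplotron.org/0/">
  <label eg:occurs="*">
    <quote eg:occurs="?">Midwinter Spring is its own season...</quote>
    <name>Thomas Eliot</name>
    <address><street>3 Prufrock Lane</street><city>Stamford</city><state>CT</state></address>
  </label>
</labels>"""

ADDRESS = "<address><street>s</street><city>c</city><state>st</state></address>"
ONE_QUOTE = f"<labels><label><quote>q</quote><name>n</name>{ADDRESS}</label></labels>"
TWO_QUOTES = f"<labels><label><quote>a</quote><quote>b</quote><name>n</name>{ADDRESS}</label></labels>"

schema = compile_example(EG3)
print(validate(schema, "<labels/>"))  # []  (zero labels allowed by "*")
print(validate(schema, ONE_QUOTE))    # []
print(validate(schema, TWO_QUOTES))   # ['invalid content in label: quote,quote,name,address']
```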

Mixing things up

Mixed content, the ability to mix child elements and plain text in XML elements, has spotty support in some schema languages, which is unfortunate since it is one of the defining features of XML. Examplotron, in its clever way, makes this just as easy as any other feature. Listing 7 is an Examplotron schema that allows the optional quote element to have mixed content, specifically text with embedded emph and strong elements.


Listing 7. Mailing label Examplotron source that demonstrates mixed content support
<?xml version="1.0" encoding="utf-8"?>
<labels xmlns:eg="http://examplotron.org/0/">
  <label eg:occurs="*">
    <quote eg:occurs="?">
<emph>Midwinter</emph> Spring is its own <strong>season</strong>...
    </quote>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>
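One plausible way to detect mixed content in an example is sketched below in Python (standard library only). It is a hypothetical toy, not how Examplotron itself works: an element counts as mixed when it has child elements and also non-whitespace text, either before the first child (`elem.text`) or after any child (`child.tail`). The sketch then flags instance elements that contain such text although the example showed them as element-only. Function names and messages are invented for illustration.

```python
import xml.etree.ElementTree as ET


def has_text(s):
    """True if s contains anything other than whitespace."""
    return bool(s and s.strip())


def is_mixed(elem):
    """Mixed content: child elements plus non-whitespace text between them."""
    if len(elem) == 0:
        return False  # text-only (or empty) elements are not mixed
    return has_text(elem.text) or any(has_text(child.tail) for child in elem)


def compile_mixed(example_xml):
    """Record, per element name, whether the example shows mixed content."""
    mixed = {}
    for elem in ET.fromstring(example_xml).iter():
        mixed[elem.tag] = mixed.get(elem.tag, False) or is_mixed(elem)
    return mixed


def stray_text_errors(mixed, instance_xml):
    """Flag text inside elements that the example showed as element-only."""
    errors = []
    for elem in ET.fromstring(instance_xml).iter():
        if is_mixed(elem) and not mixed.get(elem.tag, False):
            errors.append(f"text not allowed in {elem.tag}")
    return errors


EG7 = """<labels xmlns:eg="http://examplotron.org/0/">
  <label eg:occurs="*">
    <quote eg:occurs="?">
<emph>Midwinter</emph> Spring is its own <strong>season</strong>...
    </quote>
    <name>Thomas Eliot</name>
    <address><street>3 Prufrock Lane</street><city>Stamford</city><state>CT</state></address>
  </label>
</labels>"""

GOOD = ("<labels><label><quote>Do <emph>I</emph> dare?</quote><name>n</name>"
        "<address><street>s</street><city>c</city><state>st</state></address></label></labels>")
BAD = ("<labels><label><name>n</name>"
       "<address>PO Box 9<street>s</street><city>c</city><state>st</state></address></label></labels>")

mixed = compile_mixed(EG7)
print(mixed["quote"], mixed["address"])   # True False
print(stray_text_errors(mixed, GOOD))     # []
print(stray_text_errors(mixed, BAD))      # ['text not allowed in address']
```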

Again, the basic principle of Examplotron holds: You show it an example of a construct and it works the example into RELAX NG form for you. Examplotron supports namespaces in a similar way. Just use namespaces in the source document and Examplotron builds those namespaces into the RELAX NG. Other schema features, such as data typing, are supported through hint attributes. For more detail on these more advanced features of the language, see the full Examplotron specification (in Resources), which is very readable.
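Namespace support works by namespace URI, not by prefix, which is how both XML Namespaces and RELAX NG compare names. The short Python sketch below (standard library only; a hypothetical illustration, not Examplotron code) makes that visible: ElementTree reports namespaced element names as `{uri}local`, so an instance using a different prefix bound to the same URI matches the example, while an instance with no namespace does not. The URI `http://example.com/labels` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def element_names(xml_text):
    """Every element name in the document, in Clark notation: '{uri}local'
    for namespaced elements, plain 'local' otherwise."""
    return {elem.tag for elem in ET.fromstring(xml_text).iter()}


def names_not_in_example(example_xml, instance_xml):
    """Element names the instance uses that the example never shows.
    Comparison is by (namespace URI, local name); prefixes are irrelevant."""
    return sorted(element_names(instance_xml) - element_names(example_xml))


# Hypothetical namespace URI, chosen only for this illustration.
EXAMPLE = ('<labels xmlns="http://example.com/labels">'
           '<label><name>Thomas Eliot</name></label></labels>')
SAME_URI = ('<l:labels xmlns:l="http://example.com/labels">'
            '<l:label><l:name>Ezra Pound</l:name></l:label></l:labels>')
NO_NAMESPACE = '<labels><label><name>Ezra Pound</name></label></labels>'

print(names_not_in_example(EXAMPLE, SAME_URI))      # []
print(names_not_in_example(EXAMPLE, NO_NAMESPACE))  # ['label', 'labels', 'name']
```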


Follow my example

Schema systems are perhaps the area of XML enjoying the greatest technical advancement. And yet of all the work in XML schema systems, I think that Examplotron is the most brilliant-yet-simple idea, and I think you'll find that it can do wonders for productivity. On a recent project, a client with many XML formats hired me, through my company Fourthought, to develop schemata for documenting and validating those formats. All they had to start with were sample XML documents for each format. Using Examplotron to generate the production RELAX NG schemata from these sample documents saved me perhaps a hundred hours of effort or more, and thus saved them tens of thousands of dollars. I did have to augment Examplotron with document generation and other refinement code; I hope to cover the non-proprietary aspects of this refinement code in a future article.

Examplotron produces RELAX NG schemata, but if you must produce W3C XML Schema, all is still well: You can use James Clark's excellent Trang tool to convert RELAX NG to W3C XML Schema. I know from my overall consulting experience that sample documents are the most common form of schema in the real world, so I expect that Examplotron will be of great help to a lot of folks right away.


Resources

  • Visit the Examplotron home page, which is also the (very readable) specification. The link to the XSLT script that implements Examplotron is a bit buried. I suggest renaming this script to "eg-compile.xsl" after download.

  • Examplotron schemas are compiled into RELAX NG. If you wish to learn RELAX NG, read this tutorial. See also RELAX NG's Compact Syntax, by Michael Fitzgerald.

  • Find out more about Schematron, a very powerful schema language based on rules and abstract patterns. It is often used in conjunction with other schema languages, including W3C XML Schema and RELAX NG, because it can offer some facilities not available with those languages alone. Uche Ogbuji has an Introduction to Schematron that is mostly targeted at XSLT users, and Chimezie Ogbuji has a more general introduction.

  • For more information on W3C XML Schema, see the home page and the exhaustive Cover pages.

  • Read what David Mertz has to say about RELAX NG in his "XML Matters" column here on developerWorks.
    • Part 1 of this three-part series gives a fairly complete overview of both the syntax and semantics of RELAX NG schemas (February 2003).
    • Part 2 addresses a few additional semantic issues and looks at tools for working with RELAX NG (March 2003).
    • Part 3 looks at tools for working with the RELAX NG compact syntax and transforming between it and the RELAX NG XML syntax form (May 2003).


  • See the W3C XSL Home page for links to XSLT processors you can use. See the RELAX NG home page for links to RELAX NG processors.

  • Try James Clark's Trang tool to translate between a variety of XML schema languages.

  • The author uses 4Suite for XSLT and RELAX NG processing in this article. The RELAX NG support in 4Suite is based on xvif, also by Eric van der Vlist.

  • Find more XML resources on the developerWorks XML zone.


