A hands-on introduction to Schematron

Directly express rules without creating a whole grammatical infrastructure

Uche Ogbuji, Consultant and Co-Founder, Fourthought
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche.ogbuji@fourthought.com.

Summary:  Meet Schematron, a schema language that allows you to directly express rules without creating a whole grammatical infrastructure. Schematron is useful whenever you want to check the contents of XML documents against rules. It is extraordinarily flexible in the variety of rules you can express, and it is even more expressive than other schema languages such as DTD, W3C XML Schema (WXS), and RELAX NG. In this tutorial, author Uche Ogbuji uses detailed examples to illustrate Schematron's use, and offers recipes for common schema needs.

Date:  02 Sep 2004
Level:  Intermediate

Intermediate Schematron features

Querying namespaces

Schematron provides full support for XML namespaces. To declare a namespace for use in rules, add an ns instruction as a child of the schema element. Give it a prefix attribute with the namespace prefix to be used within the schema, and a uri attribute with the namespace name (a URI). As I have mentioned, the prefix you declare in the schema is used only to resolve namespaces in expressions within the schema, not in the candidate document. For example, an XHTML candidate document typically puts its elements in the default namespace and uses no prefix for them, but you must still declare a prefix such as "html" in order to match XHTML elements in your schema. In XPath and XPattern (used in rule contexts), an unprefixed element name matches only elements that are in no namespace, so you must use a prefix for every namespaced element you want to match. The following schema validates that an XHTML document has a title:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
  <title>Special XHTML conventions</title>
  <ns uri="http://www.w3.org/1999/xhtml" prefix="html"/>
  <pattern name="Document head">
    <rule context="html:head">
      <assert test="html:title">
          Page does not have a title.
      </assert>
    </rule>
  </pattern>
</schema>

This code listing is eg5_1.sch in x-schematron-files.zip. Run it against eg5_1_good1.xml, which has a title, and eg5_1_bad1.xml, which does not. Run these using your Schematron implementation of choice, and experiment with the schema and the candidate documents.
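If it helps to see the same prefix mechanics outside Schematron, the following Python sketch mimics the rule above using the standard library's xml.etree.ElementTree rather than a Schematron processor. The prefix mapping plays the role of the ns instruction, while the candidate documents keep XHTML as their unprefixed default namespace. The function name heads_missing_title and the sample documents are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Prefix chosen by the checker, like <ns prefix="html" uri="..."/> in the schema.
NAMESPACES = {"html": XHTML_NS}

def heads_missing_title(xml_text):
    """Count head elements with no title child
    (rule context html:head, assert test html:title)."""
    root = ET.fromstring(xml_text)
    heads = root.iter(f"{{{XHTML_NS}}}head")
    return sum(1 for head in heads if head.find("html:title", NAMESPACES) is None)

# Hypothetical candidates: XHTML is the default namespace, with no prefix.
good = ('<html xmlns="http://www.w3.org/1999/xhtml">'
        '<head><title>Hello</title></head><body/></html>')
bad = '<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>'

print(heads_missing_title(good))  # 0
print(heads_missing_title(bad))   # 1
# An unprefixed name matches only no-namespace elements, so this finds nothing:
print(ET.fromstring(good).find("head"))  # None
```

The last line shows the pitfall described above: the unprefixed query finds nothing, because the head element lives in the XHTML namespace.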

Using namespaces in output

As I mentioned, you can think of Schematron as essentially a reporting framework. To that end, Schematron also supports using elements and attributes in output. Any element that appears in an output message but is not in the Schematron namespace is copied to the output as is. This example is the same as the one in Validating the presence of elements, except that it uses XHTML elements in the output.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron"
        xmlns:html="http://www.w3.org/1999/xhtml">
  <title>Technical document schema</title>
  <pattern name="Major elements">
    <rule context="doc">
      <assert test="section">
        <html:p>
          <name/> must have at least one <html:code>section</html:code>
          child.
        </html:p>
      </assert>
    </rule>
  </pattern>
</schema>

Notice the new namespace declaration on the root element. Such namespace declarations are only used for output elements. As discussed in Querying namespaces, if you want to use namespaces in queries, you must use the ns instruction for declaration. This means that if you are querying XHTML elements as well as using XHTML in output, you have to declare the namespace twice, using both mechanisms.

This listing is eg5_2.sch in x-schematron-files.zip. Run it against eg5_2_good1.xml, which has the required element, and eg5_2_bad1.xml, which doesn't. Run these using your Schematron implementation of choice, and experiment with the schema and the candidate documents.
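To make the copy-through behavior concrete, here is a simplified Python model of how a processor might build the output for a failed assertion. It is an assumption for illustration, not the algorithm of any particular Schematron implementation: it deep-copies the assertion's content, keeps every non-Schematron element (such as html:p and html:code) as is, and replaces each Schematron name element with the context element's name. The helper names expand_names and render_message are hypothetical, and the model handles only the name element.

```python
import copy
import xml.etree.ElementTree as ET

SCH_NAME = "{http://www.ascc.net/xml/schematron}name"

def expand_names(elem, context_name):
    """Replace every Schematron name element below elem with context_name."""
    previous = None  # last child we kept, whose tail receives inline text
    for child in list(elem):  # iterate over a snapshot so removal is safe
        if child.tag == SCH_NAME:
            text = context_name + (child.tail or "")
            if previous is not None:
                previous.tail = (previous.tail or "") + text
            else:
                elem.text = (elem.text or "") + text
            elem.remove(child)
        else:
            expand_names(child, context_name)  # other elements are kept as is
            previous = child

def render_message(assert_elem, context_elem):
    """Return copies of the elements a failed assert would output
    (text placed directly inside assert is ignored in this model)."""
    message = copy.deepcopy(assert_elem)
    expand_names(message, context_elem.tag)
    return list(message)

sch_assert = ET.fromstring(
    '<assert xmlns="http://www.ascc.net/xml/schematron" '
    'xmlns:html="http://www.w3.org/1999/xhtml" test="section">'
    '<html:p><name/> must have at least one '
    '<html:code>section</html:code> child.</html:p></assert>')
doc = ET.fromstring("<doc><prologue/></doc>")

(p,) = render_message(sch_assert, doc)
print(p.tag)                  # {http://www.w3.org/1999/xhtml}p
print("".join(p.itertext()))  # doc must have at least one section child.
```

In a real report the name comes from whichever node the rule fired on; here it is simply the doc element passed in.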


Keys: An introduction

DTDs allow you to associate one element with another by using attributes of type ID and IDREF, but this mechanism is so limited that it hasn't seen much use in industry practice. Other schema languages provide somewhat better ways to tie elements and attributes together, but Schematron provides unique power and flexibility by borrowing XSLT's key facility.

You can create a Schematron key by using a key instruction within the schema. It takes three attributes:

  • A use attribute, an XPath expression that gives the value for the key item
  • A match attribute, an XPattern that determines which nodes are covered
  • A name attribute, a simple string that gives the name of the key

The Schematron processor gathers all the nodes in the candidate document that match the XPattern given in match, and creates a look-up table with the given name. The key of each row in the look-up table (the look-up string) is the result of evaluating the use expression with a matched node as the context, and the value is the list of matched nodes that share that look-up string.

You can access any key you have defined from within XPath expressions by using the key function, which takes two arguments: the name of the key and the look-up string. The result is a node set containing all the nodes in the table that correspond to the look-up string. The example in the next panel, Keys: An example, helps illustrate this.
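The look-up table is essentially a dictionary from look-up strings to lists of nodes. The following Python sketch builds one with xml.etree.ElementTree. To keep it short, it assumes that match is a plain element name and use is a single attribute, whereas Schematron accepts a full XPattern and XPath expression there; the helper names build_key and key_lookup and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_key(root, match_tag, use_attr):
    """Simplified analogue of <key match="match_tag" use="@use_attr"/>:
    map each look-up string to the matching nodes, in document order."""
    table = defaultdict(list)
    for node in root.iter(match_tag):   # nodes covered by "match"
        value = node.get(use_attr)      # result of evaluating "use"
        if value is not None:           # no attribute, no entry
            table[value].append(node)
    return table

def key_lookup(table, lookup_string):
    """Analogue of key(name, string): the (possibly empty) list of nodes."""
    return table.get(lookup_string, [])

doc = ET.fromstring(
    '<doc><prologue>'
    '<author e-mail="ann@example.com"/>'
    '<author e-mail="bob@example.com"/>'
    '<author e-mail="ann@example.com"/>'
    '</prologue></doc>')
table = build_key(doc, "author", "e-mail")
print(len(key_lookup(table, "ann@example.com")))  # 2
print(key_lookup(table, "cy@example.com"))        # []
```

Because two author elements share the same address, that look-up string maps to a list of two nodes, just as the key function would return a two-node node set.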


Keys: An example

You can use keys to check a value referenced in one part of a document against values defined elsewhere. In this example of keys, I refer back to the technical submissions scenario. A main-contact element is allowed in the prologue, with the restriction that its e-mail attribute must match the e-mail attribute of one of the author elements.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
  <title>Technical document schema</title>
  <key name="author-e-mails" match="author" use="@e-mail"/>
  <pattern name="Main contact">
    <rule context="main-contact">
      <assert test="key('author-e-mails', @e-mail)">
        "e-mail" attribute must match the e-mail of one of the authors.
      </assert>
    </rule>
  </pattern>
</schema>

The key definition maps each author's e-mail address to the author node. The assertion invokes the key to look up the e-mail of the main-contact; if the look-up fails, the result is an empty node set, which converts to the boolean false and causes the assertion to fail.

This listing is eg5_3.sch in x-schematron-files.zip. Run it against eg5_3_good1.xml, which has a valid main contact, and eg5_3_bad1.xml, whose main contact does not match any author. Run these using your Schematron implementation of choice, and experiment with the schema and the candidate documents.
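Continuing in Python, this sketch emulates the main-contact rule on two made-up candidate documents (they are not the contents of eg5_3_good1.xml or eg5_3_bad1.xml). It mirrors the XPath behavior described above: a failed look-up yields an empty list, which Python also treats as false. The function name check_main_contact is hypothetical.

```python
import xml.etree.ElementTree as ET

MESSAGE = '"e-mail" attribute must match the e-mail of one of the authors.'

def check_main_contact(doc):
    """Emulate the main-contact rule; return the list of failure messages."""
    # Key table: e-mail -> author nodes (match="author" use="@e-mail").
    authors_by_email = {}
    for author in doc.iter("author"):
        email = author.get("e-mail")
        if email is not None:
            authors_by_email.setdefault(email, []).append(author)

    failures = []
    for contact in doc.iter("main-contact"):  # rule context="main-contact"
        found = authors_by_email.get(contact.get("e-mail"), [])
        if not found:  # empty "node set" -> false -> assertion fails
            failures.append(MESSAGE)
    return failures

good = ET.fromstring(
    '<doc><prologue><author e-mail="ann@example.com"/>'
    '<main-contact e-mail="ann@example.com"/></prologue></doc>')
bad = ET.fromstring(
    '<doc><prologue><author e-mail="ann@example.com"/>'
    '<main-contact e-mail="bob@example.com"/></prologue></doc>')
print(check_main_contact(good))       # []
print(len(check_main_contact(bad)))   # 1
```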


Validating based on conditions in the document

Very often you'll want to validate one part of a document based on what occurs in another part. This is called a co-occurrence constraint. WXS and DTD cannot handle such validation at all, and RELAX NG can handle only limited cases, but Schematron provides extraordinary power for such validation tasks. This schema checks that a document has at least three sections for each author listed in its prologue, with the goal of encouraging longer submissions and discouraging people from padding the author list.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
  <title>Technical document schema</title>
  <pattern name="Section minimum">
    <rule context="doc">
      <assert test="count(section) >= 3*count(prologue/author)">
        There must be at least three sections for each author.
      </assert>
    </rule>
  </pattern>
</schema>

When using Schematron, don't think in terms of other schema languages, or you probably won't take advantage of all its power. Just think about the rules you'd like to express about the candidate document; chances are you'll find a way to express them in XPath, and thus in Schematron.

The code listing above is eg5_4.sch in x-schematron-files.zip. Run it against eg5_4_good1.xml, which meets the section count minimum, and eg5_4_bad1.xml, which does not. Run these using your Schematron implementation of choice, and experiment with the schema and the candidate documents.
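The co-occurrence rule boils down to comparing two counts. Here is a Python sketch of the same test using xml.etree.ElementTree, assuming, as in the schema, that authors appear as author children of prologue and sections as section children of doc; the function name enough_sections and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def enough_sections(doc):
    """Analogue of count(section) >= 3*count(prologue/author) evaluated on doc."""
    sections = len(doc.findall("section"))         # count(section)
    authors = len(doc.findall("prologue/author"))  # count(prologue/author)
    return sections >= 3 * authors

two_authors = "<prologue><author/><author/></prologue>"
six = ET.fromstring("<doc>" + two_authors + "<section/>" * 6 + "</doc>")
five = ET.fromstring("<doc>" + two_authors + "<section/>" * 5 + "</doc>")
print(enough_sections(six), enough_sections(five))  # True False
```

Note that a document with no authors passes even with no sections, since 0 >= 3*0 is true; if that matters, pair this rule with one that requires at least one author.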


Phases

If I were to combine all the rules in this tutorial into one Schematron schema, it would be a large one. Schematron supports modular schemata by letting you organize patterns into phases. A phase is simply a collection of patterns that are executed together, and each active element in a phase names one pattern by its id. Some Schematron implementations allow you to select a particular phase to process. The following large sample schema incorporates several of the example rules from this tutorial, and organizes them into phases.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
  <title>Technical document schema</title>
  <key name="author-e-mails" match="author" use="@e-mail"/>
  <phase id="quick-check"> <!-- "minimal sanity check" -->
    <active pattern="rightdoc" />
  </phase>
  <phase id="full-check">
    <active pattern="rightdoc" />
    <active pattern="extradoc" />
    <active pattern="majelements" />
  </phase>
  <phase id="process-links">
    <active pattern="report-link" />
  </phase>
  <pattern id="rightdoc" name="Document root">
    <rule context="/">
      <assert test="doc">Root element must be "doc".</assert>
    </rule>
  </pattern>
  <pattern id="extradoc" name="Extraneous docs">
    <rule context="doc">
      <assert test="not(ancestor::*)">
        The "doc" element is only allowed at the document root.
      </assert>
    </rule>
  </pattern>
  <pattern id="majelements" name="Major elements">
    <rule context="doc">
      <assert test="prologue">
        <name/> must have a "prologue" child.
      </assert>
      <assert test="section">
        <name/> must have at least one "section" child.
      </assert>
    </rule>
  </pattern>
  <pattern id="report-link" name="Report links">
    <rule context="*">
      <report test="@link">
        <name/> element has a link to <value-of select="@link"/>.
      </report>
    </rule>
  </pattern>
</schema>

The code is eg5_5.sch in x-schematron-files.zip. Run it against the files eg5_5_goodx.xml and eg5_5_badx.xml, where x is a number identifying a particular file; together, these documents trigger the various validity messages and reports. Run them using your Schematron implementation of choice, and experiment with the schema and the candidate documents.
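To show what phase selection amounts to, the following Python sketch models each pattern of the schema above as a function that returns messages, and each phase as a list of pattern ids. It is only a model: the function names are hypothetical, running every pattern when no phase is given is an assumption, and real implementations differ in how, or whether, they let you pick a phase. It also reflects the difference between the two instruction types: an assert produces a message when its test is false, a report when its test is true.

```python
import xml.etree.ElementTree as ET

# Each pattern from the schema above, modeled as a function returning messages.
def rightdoc(root):
    # context "/", assert test="doc": the root element must be doc
    return [] if root.tag == "doc" else ['Root element must be "doc".']

def extradoc(root):
    # context "doc", assert test="not(ancestor::*)": only the root may be a doc
    return ['The "doc" element is only allowed at the document root.'
            for d in root.iter("doc") if d is not root]

def majelements(root):
    # context "doc", two asserts; <name/> becomes the context element's name
    messages = []
    for d in root.iter("doc"):
        if d.find("prologue") is None:
            messages.append(f'{d.tag} must have a "prologue" child.')
        if d.find("section") is None:
            messages.append(f'{d.tag} must have at least one "section" child.')
    return messages

def report_link(root):
    # context "*", report test="@link": a report fires when its test is TRUE
    return [f"{e.tag} element has a link to {e.get('link')}."
            for e in root.iter() if e.get("link") is not None]

PATTERNS = {"rightdoc": rightdoc, "extradoc": extradoc,
            "majelements": majelements, "report-link": report_link}
PHASES = {"quick-check": ["rightdoc"],
          "full-check": ["rightdoc", "extradoc", "majelements"],
          "process-links": ["report-link"]}

def validate(root, phase=None):
    """Run the patterns active in phase; with no phase, run every pattern (assumed)."""
    active = PHASES[phase] if phase else list(PATTERNS)
    return [msg for pid in active for msg in PATTERNS[pid](root)]

doc = ET.fromstring('<doc><prologue/><section link="http://example.com/"/></doc>')
print(validate(doc, "quick-check"))    # []
print(validate(doc, "process-links"))  # ['section element has a link to http://example.com/.']
```

Selecting "quick-check" runs only the root-element test, so it is cheap; "full-check" adds the structural asserts, and "process-links" ignores validity entirely and just reports links.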
