Level: Intermediate
Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
08 Oct 2004
If you have the basics of an XML format in mind but know that you won't be able to get everyone at the table to agree on every detail of the schema, consider Schematron abstract patterns. Schematron is probably the most powerful XML schema language available (and it can be much more than just a schema language). Its advanced features, especially abstract patterns, let you write schemata that you can quickly adapt to multiple variants of an XML format. This opens up extraordinary possibilities for XML schema, including the ability to restrict XML formats while also making them generic and adaptable.
ISO Schematron is a unique XML schema language, offering extraordinary power either on its own or in conjunction with other schema languages. As I pointed out in my tutorial "A hands-on introduction to Schematron," Schematron is not just a schema language but a full-blown reporting facility for XML. This article covers a number of advanced ISO Schematron features, so I encourage you to read that tutorial if you are not yet familiar with the technology. I shall focus on variable assignment and abstract patterns, which open up some impressive possibilities for XML schema design. In the tutorial and in this article, I use the term candidate XML to describe the XML file against which a Schematron schema is invoked.
To be precise, as I mentioned in my tutorial, Schematron is a host language for many potential means of accessing data (which could include XML or something else, such as flat text or database formats). But almost every Schematron implementation that I know of uses XPath (by way of XSLT) as its query language and processes XML. In this article, I assume such an implementation. For testing, I use the Scimitar ISO Schematron toolkit (see Resources).
Variables in Schematron tests
Schematron was inspired by many of the ideas in XSLT, including the use of variables to help simplify code. Variables are especially useful for data-type constraints, where you are trying to validate that some bit of text in XML is a valid representation of a given data type (such as integer, date, time, or URL). The XPaths you use to check the lexical properties of the source data are often long and complex, and judicious use of variables can simplify them. You reference variables in the test attributes of assert and report elements in the usual way, for example $var, and you assign a value to a variable using the let instruction. Listing 1 is Schematron code that validates that a given currency figure is between $10,000 and $1,000,000, allowing for the optional use of commas and a dollar sign.
Listing 1. Validating a currency figure
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
<title>Example of let</title>
<pattern name="Neither too many nor too few dollars">
<rule context="money">
<!-- Remove dollar signs and US convention comma separators -->
<let name="amount" value="number(translate(@amount, '$,', ''))"/>
<assert test="$amount >= 10000 and $amount &lt;= 1000000">
Amount should range from ten thousand to one million dollars
</assert>
</rule>
</pattern>
</schema>
The XPath $amount >= 10000 and $amount <= 1000000 is certainly an improvement over number(translate(@amount, '$,', '')) >= 10000 and number(translate(@amount, '$,', '')) <= 1000000. (Because these expressions live in XML attributes, "<" must be escaped as &lt; in XPaths used in Schematron, just as in XPaths used in XSLT.) The let instruction's value attribute is evaluated at run time, using the context node established by the enclosing rule. Listings 2, 3, and 4 are very simple examples that are all valid against Listing 1. Listings 5 and 6 are invalid.
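The escaping requirement is easy to confirm with any XML parser. The following Python sketch (standard library only, and not part of any Schematron toolkit) parses the assert element from Listing 1 twice, once with a raw "<" and once with &lt;, and shows that the parser hands the unescaped expression on to the XPath engine.

```python
import xml.etree.ElementTree as ET

# Listing 1's assert, with a raw "<" and with the escaped "&lt;".
raw = '<assert test="$amount >= 10000 and $amount <= 1000000">msg</assert>'
escaped = '<assert test="$amount >= 10000 and $amount &lt;= 1000000">msg</assert>'

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(raw))      # False: "<" is not allowed in an attribute value
print(is_well_formed(escaped))  # True

# After parsing, the attribute value is the plain XPath expression again.
print(ET.fromstring(escaped).get("test"))
```

A ">" inside an attribute value is legal, which is why only the "<" needs escaping.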
Listing 2. Valid against Listing 1
<money amount="$300,000.00"/>
Listing 3. Valid against Listing 1
<money amount="500,000"/>
Listing 4. Valid against Listing 1
<money amount="10000.00"/>
Listing 5. Not valid against Listing 1
<money amount="$1,000.00"/>
Listing 6. Not valid against Listing 1
<money amount="$3,000,000.00"/>
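The arithmetic behind Listings 2 through 6 can be checked with a small Python model of Listing 1's let and assert. This is a sketch, not a Schematron processor: xpath_translate and xpath_number are hypothetical helpers that approximate XPath 1.0's translate() and number(). In XPath 1.0, number() yields NaN for anything that is not a plain decimal, and every comparison with NaN is false, so a malformed amount simply fails the assertion.

```python
import math
import re

# XPath 1.0 Number lexical form: optional whitespace, optional minus sign,
# digits with an optional fractional part (no exponent, no leading "+").
_XPATH_NUMBER = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")

def xpath_number(s):
    """Approximate XPath 1.0 number() on a string: NaN if not a valid Number."""
    return float(s) if _XPATH_NUMBER.fullmatch(s) else math.nan

def xpath_translate(s, src, dst):
    """Approximate XPath translate(): map chars of src to dst, drop unmatched ones."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:  # first occurrence wins, as in XPath
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)

def amount_ok(attr):
    # <let name="amount" value="number(translate(@amount, '$,', ''))"/>
    amount = xpath_number(xpath_translate(attr, "$,", ""))
    # <assert test="$amount >= 10000 and $amount &lt;= 1000000">
    return amount >= 10000 and amount <= 1000000  # NaN makes both comparisons False

for value in ["$300,000.00", "500,000", "10000.00", "$1,000.00", "$3,000,000.00"]:
    print(value, amount_ok(value))
```

Note that "10000.00" passes only because the lower bound is inclusive (>=).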
The let instruction can also appear outside rule, specifically as a child of schema, pattern, or phase. If let appears outside rule, its value is computed with the document root as the context node. Phase-specific variables are especially interesting as a way of parameterizing the tests performed during various phases.
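Phase-specific variables can be modeled in a few lines of Python. In this sketch the phase ids draft and final and the variable name minimum are hypothetical: each phase binds minimum once for the whole validation run, while the rule reads each money element's amount per context node, so the same test, @amount >= $minimum, gives different results depending on the active phase.

```python
import xml.etree.ElementTree as ET

# Hypothetical phase-specific variables, modeled on ISO Schematron's
# <phase id="..."><let name="minimum" value="..."/>...</phase>.
PHASE_VARS = {"draft": {"minimum": 1000}, "final": {"minimum": 10000}}

DOC = ET.fromstring('<ledger><money amount="5000"/><money amount="20000"/></ledger>')

def failures(doc, phase):
    """Apply a rule on context="money" whose test is "@amount >= $minimum"."""
    # Phase-level variables: bound once per run, not once per node.
    variables = dict(PHASE_VARS[phase])
    failed = []
    for node in doc.iter("money"):
        # Per-node value, evaluated with the money element as context.
        amount = float(node.get("amount"))
        if not amount >= variables["minimum"]:
            failed.append(node.get("amount"))
    return failed

print(failures(DOC, "draft"))  # []
print(failures(DOC, "final"))  # ['5000']
```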
Schematron abstract patterns
You can use variables for limited parameterization of schemata, by which I mean allowing parts of the expressions within certain rules to be specified outside the rule itself. The limits stem from how you can use variables in XPath. For example, if one of the things you want to parameterize is the name of elements in the candidate XML, your expressions become much more complicated: you have to use predicates such as *[name() = $row] rather than simple element name tests such as tr.
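The following Python sketch, using ElementTree as a stand-in for an XPath engine, shows the shape of the problem: a fixed name is a simple child step, while a name held in a variable has to be matched by testing every child. An XPath 1.0 variable can also hold only a name to compare against, not a path such as tbody/row to be evaluated.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring("<table><tr><td>1</td></tr><tr><td>2</td></tr></table>")

# Fixed vocabulary: a plain name test, like the XPath step "tr".
fixed = table.findall("tr")

# Name held in a variable: XPath 1.0 cannot use $row as a name test, so the
# step turns into a predicate over every child, like "*[name() = $row]".
row = "tr"
via_variable = [child for child in table if child.tag == row]

print(len(fixed), fixed == via_variable)  # 2 True
```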
Schematron's abstract patterns offer a more flexible mechanism for parameterization. An abstract pattern is a special pattern that is treated essentially as a template. Its parameters use the same syntax as XPath variables ("$param"), but their meaning is very different: the Schematron processor performs a preprocessing pass in which it literally substitutes the given value for each parameter reference, as a string.
The most important use for abstract patterns is in writing schemata that are flexible about the precise vocabulary used in the candidate XML and that focus instead on the general ideas the XML expresses. As an example, tables are a very common construct in XML formats for human-readable documents, but the Basic Tables module of XHTML 1.1 uses a different table vocabulary than DocBook does. The former is derived from HTML; the latter is based on the CALS table model for SGML languages. Listing 7 is an example of an abstract pattern with some sample constraints on the general idea of an XML table structure.
Listing 7. Abstract pattern for some example constraints on tables
<pattern abstract="true" name="table">
<rule context="$table">
<assert test="$row">A table has at least one row</assert>
</rule>
<rule context="$row">
<assert test="$cell">A table row has at least one cell</assert>
</rule>
</pattern>
The attribute abstract="true" establishes that this is an abstract pattern. The queries in this pattern (chiefly within the context and test attributes) can now have parameter references such as $table. These are very different from XPath variable references; they are purely string substitutions to be computed before run time, and effectively allow one to rewrite the queries using the given parameters. Other than the use of parameter references, abstract patterns look just like non-abstract ones. Listing 8 is an example of the use of an abstract pattern, creating concrete versions by providing values for the parameters.
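Because parameters are resolved by text substitution before any XPath is evaluated, the preprocessing step can be sketched as a plain string function. The Python below illustrates the idea under my own assumption that a reference is replaced only when the whole parameter name matches; it is not the algorithm of any particular Schematron implementation. The difference from variables matters: if $cell were an XPath variable bound to the string 'entry', test="$cell" would merely test a non-empty string and always succeed, whereas substitution turns it into the path entry.

```python
import re

def substitute(query, params):
    """Textually replace each $formal in query with its actual value.

    A reference is replaced only when the whole name matches, so with a
    parameter named "row" the text "$row-count" is left alone.
    """
    if not params:
        return query
    names = "|".join(re.escape(name) for name in params)
    ref = re.compile(r"\$(" + names + r")(?![\w.-])")
    return ref.sub(lambda m: params[m.group(1)], query)

CALS = {"table": "ctable", "row": "tbody/row", "cell": "entry"}
print(substitute("$row", CALS))              # tbody/row
print(substitute("count($cell) > 0", CALS))  # count(entry) > 0
```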
Listing 8. Using an abstract pattern
<pattern name="xhtml-basic-table" is-a="table">
<param formal="table" actual="table"/>
<param formal="row" actual="tr"/>
<param formal="cell" actual="td"/>
</pattern>
<pattern name="cals-table" is-a="table">
<param formal="table" actual="ctable"/>
<param formal="row" actual="tbody/row"/>
<param formal="cell" actual="entry"/>
</pattern>
Listing 8 creates two patterns, each a concrete instance of the abstract pattern named by its is-a attribute (in this case, is-a="table"). The param elements provide a value for each parameter reference used in the abstract pattern. For example, the first param in the first pattern of Listing 8 has the attribute formal="table", indicating that it provides a value for parameter references of the form $table in the abstract pattern. The value is given by the actual attribute (actual="table" in this case). This is simple replacement text and is not processed in any special way. Listing 9 is a mock-up of the pattern that results from applying the parameters in the second pattern of Listing 8 to the abstract pattern in Listing 7, with each parameter reference replaced by its actual value.
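The whole expansion can be sketched with Python's ElementTree: copy the abstract pattern's rules and substitute the param values into their query attributes. The sketch applies the second pattern of Listing 8 to Listing 7. The instantiate function and the set of attributes it rewrites (context, test, and value) are my assumptions for illustration; a real processor handles more cases, such as namespaces and the full range of pattern children.

```python
import copy
import re
import xml.etree.ElementTree as ET

ABSTRACT = """<pattern abstract="true" name="table">
  <rule context="$table"><assert test="$row">A table has at least one row</assert></rule>
  <rule context="$row"><assert test="$cell">A table row has at least one cell</assert></rule>
</pattern>"""

CONCRETE = """<pattern name="cals-table" is-a="table">
  <param formal="table" actual="ctable"/>
  <param formal="row" actual="tbody/row"/>
  <param formal="cell" actual="entry"/>
</pattern>"""

# Attributes assumed to hold queries that may contain parameter references.
QUERY_ATTRS = ("context", "test", "value")

def instantiate(abstract, concrete):
    """Build a concrete pattern from an abstract one plus an is-a pattern's params."""
    if concrete.get("is-a") != abstract.get("name"):
        raise ValueError("is-a does not name this abstract pattern")
    params = {p.get("formal"): p.get("actual") for p in concrete.findall("param")}
    ref = re.compile(r"\$(" + "|".join(map(re.escape, params)) + r")(?![\w.-])")
    result = ET.Element("pattern", name=concrete.get("name"))
    for rule in abstract.findall("rule"):
        new_rule = copy.deepcopy(rule)  # leave the abstract pattern untouched
        for el in new_rule.iter():
            for attr in QUERY_ATTRS:
                if attr in el.attrib:
                    # Pure text substitution: the actual value is pasted in verbatim.
                    el.set(attr, ref.sub(lambda m: params[m.group(1)], el.get(attr)))
        result.append(new_rule)
    return result

pattern = instantiate(ET.fromstring(ABSTRACT), ET.fromstring(CONCRETE))
print(ET.tostring(pattern, encoding="unicode"))
```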
Listing 9. Mock-up of an abstract pattern with parameters resolved
<pattern name="cals-table">
<rule context="ctable">
<assert test="tbody/row">A table has at least one row</assert>
</rule>
<rule context="tbody/row">
<assert test="entry">A table row has at least one cell</assert>
</rule>
</pattern>
A pattern with an is-a attribute can only supply parameters; it cannot also contain rules or let elements. For example, if you wish to add a rule that enforces the use of tbody in the CALS table alone, you cannot just add it to the second pattern in Listing 8; you must create an entirely new, concrete pattern for the additional rule. Listing 10 is a complete Schematron example that combines Listings 7 and 8 and adds a concrete pattern that applies only to the CALS table form.
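A structural check for this restriction is easy to write. The Python function below is a hypothetical lint, not part of Scimitar or any other toolkit. It reports is-a patterns that contain anything other than param elements (a deliberately strict reading of the rule above) or that name an abstract pattern that doesn't exist, and it ignores namespace prefixes on element names.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rpartition("}")[2]

def is_a_problems(schema):
    """List problems found in patterns that use is-a."""
    patterns = [el for el in schema.iter() if local(el.tag) == "pattern"]
    abstract = {p.get("name") for p in patterns if p.get("abstract") == "true"}
    problems = []
    for p in patterns:
        base = p.get("is-a")
        if base is None:
            continue  # ordinary or abstract pattern: nothing to check here
        if base not in abstract:
            problems.append(f"{p.get('name')}: no abstract pattern named {base!r}")
        for child in p:
            if local(child.tag) != "param":
                problems.append(f"{p.get('name')}: <{local(child.tag)}> not allowed with is-a")
    return problems

SCHEMA = """<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern abstract="true" name="table">
    <rule context="$table"><assert test="$row">A table has at least one row</assert></rule>
  </pattern>
  <pattern name="cals-table" is-a="table">
    <param formal="table" actual="ctable"/>
    <param formal="row" actual="tbody/row"/>
    <rule context="ctable"><assert test="tbody">A table has a tbody</assert></rule>
  </pattern>
</schema>"""

print(is_a_problems(ET.fromstring(SCHEMA)))  # ['cals-table: <rule> not allowed with is-a']
```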
Listing 10. Complete Schematron example using abstract patterns
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
<title>Table abstract patterns</title>
<pattern abstract="true" name="table">
<rule context="$table">
<assert test="$row">A table has at least one row</assert>
</rule>
<rule context="$row">
<assert test="$cell">A table row has at least one cell</assert>
</rule>
</pattern>
<pattern name="xhtml-basic-table" is-a="table">
<param formal="table" actual="table"/>
<param formal="row" actual="tr"/>
<param formal="cell" actual="td"/>
</pattern>
<pattern name="cals-table" is-a="table">
<param formal="table" actual="ctable"/>
<param formal="row" actual="tbody/row"/>
<param formal="cell" actual="entry"/>
</pattern>
<!-- special stand-alone pattern for CALS-specific table rules -->
<pattern name="cals-table-extra">
<rule context="ctable">
<assert test="tbody">A table has a tbody</assert>
</rule>
</pattern>
</schema>
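To see Listing 10 at work, the following Python sketch hard-codes the rules that result from parameter substitution and evaluates them with ElementTree's limited XPath support. It is a toy under stated assumptions: contexts and tests are simple child paths, the candidate XML is not namespaced, and every rule is applied to every matching node. That last simplification agrees with Schematron here only because no node can match two rules of the same pattern.

```python
import xml.etree.ElementTree as ET

# Rules of Listing 10 after parameter substitution: (context, test, message).
RULES = [
    ("table", "tr", "A table has at least one row"),           # xhtml-basic-table
    ("tr", "td", "A table row has at least one cell"),
    ("ctable", "tbody/row", "A table has at least one row"),    # cals-table
    ("tbody/row", "entry", "A table row has at least one cell"),
    ("ctable", "tbody", "A table has a tbody"),                 # cals-table-extra
]

def validate(root):
    """Return the messages of failed asserts, in rule order."""
    wrapper = ET.Element("document")  # lets ".//context" also match the root element
    wrapper.append(root)
    failures = []
    for context, test, message in RULES:
        for node in wrapper.iterfind(".//" + context):
            if node.find(test) is None:  # the test path selects nothing
                failures.append(message)
    return failures

print(validate(ET.fromstring("<table><tr><td>1</td></tr></table>")))  # []
print(validate(ET.fromstring("<ctable><row><entry/></row></ctable>")))
```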
Endless possibilities
As you can see, the basic mechanism of Schematron abstract patterns is very simple. Nevertheless, it opens up a great deal of power in expressing XML schemata. It is common for the basic ideas of a schema to be widely agreed upon while the precise syntax is not. For example, you can represent a purchase order in a seemingly infinite number of XML formats, each of which requires a set of item elements, a delivery address, and so on. Schematron abstract patterns allow you to express such general ideas independently of the precise syntax used for the purchase order format. This flexibility is also one of the advantages of the Architectural Forms specification from SGML. Architectural Forms are a way to create special DTDs that allow for the remapping of XML names. They are very sophisticated but also very complex, so few people have understood them well enough to take advantage of their power. Schematron offers a much simpler yet more flexible approach, building on its use of XPath for querying.
You can gain even more expressive power by augmenting Schematron abstract patterns with semantically rich annotations as described in my article "Use data dictionary links for XML and Web services schemata." The resulting schemata would be readily adaptable to any syntax, while at the same time offering semantic transparency, a term I coined for my Thinking XML column, meaning the ability for XML systems to correctly interpret information regardless of what precise syntax is used for the information.
Even if your immediate concerns don't run as far as semantic transparency, you may find abstract patterns a useful device. If you have users of an XML vocabulary in different international locales and you'd like to provide localized versions for element and attribute names, controlled vocabularies, and the like, consider abstract patterns. Overall, this Schematron device could make you rethink the way you design XML schemata -- all for the better.
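For localization, the same abstract checks can simply be bound to different element names per locale. In this Python sketch the English names and the French names tableau, ligne, and cellule are hypothetical, and the checks are reduced to bare parameter references so that binding is a dictionary lookup; in Schematron itself you would write one concrete is-a pattern per locale.

```python
import xml.etree.ElementTree as ET

# Abstract checks, written once against parameter names: (context, test).
ABSTRACT_RULES = [("$table", "$row"), ("$row", "$cell")]

# Hypothetical locale bindings: the same structure, different element names.
LOCALES = {
    "en": {"table": "table", "row": "row", "cell": "cell"},
    "fr": {"table": "tableau", "row": "ligne", "cell": "cellule"},
}

def bind(query, params):
    """Resolve a bare "$name" query to the locale's element name."""
    return params[query[1:]]

def check(root, locale):
    """Return a message for every context node whose test path selects nothing."""
    params = LOCALES[locale]
    wrapper = ET.Element("document")  # lets ".//context" also match the root
    wrapper.append(root)
    failed = []
    for context, test in ABSTRACT_RULES:
        ctx, tst = bind(context, params), bind(test, params)
        for node in wrapper.iterfind(".//" + ctx):
            if node.find(tst) is None:
                failed.append(f"{ctx} lacks {tst}")
    return failed

print(check(ET.fromstring("<tableau><ligne><cellule/></ligne></tableau>"), "fr"))  # []
print(check(ET.fromstring("<table><row/></table>"), "en"))  # ['row lacks cell']
```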
Resources
- Learn all about Schematron in Ogbuji's tutorial "A hands-on introduction to Schematron" (developerWorks, September 2004).
- Use Scimitar, the implementation of ISO Schematron used for this article. A couple of other implementations of ISO Schematron are mentioned in a Weblog entry by Schematron inventor Rick Jelliffe.
- Read more about Architectural Forms and their relationship to Schematron abstract patterns in Leigh Dodds' article "Schematron and Architectural Forms," which also offers many useful links.
- Investigate XHTML 1.1 and its modules, defined in "Modularization of XHTML"; section 5.6.1 is the "Basic Tables Module." For actual use, the author recommends the "Tables Module," which might be better named, and which defines a table model better suited to accessibility.
- Reference the OASIS Technical Memorandum TM 9502:1995, which defines the CALS Table Model DTD. It is a widely adopted SGML representation for tables, and overall a very influential model for tables in various document languages.
- Read more on semantic transparency, XML design, and other topics in Ogbuji's Thinking XML column, and don't forget that the Thinking XML discussion forum is a good place to post questions and comments on such matters. Rick Jelliffe gave a (currently out-of-date) example of abstract patterns in an XML-DEV thread about syntactic flexibility.
- Browse for books on these and other technical topics.
- Discover more XML resources on the developerWorks XML zone.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
About the author
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.