XML Schema 1.1, Part 2: An introduction to XML Schema 1.1

Co-occurrence constraints using XPath 2.0

In this second article of a six-part series, take an in-depth look at the co-constraint mechanisms introduced by XML Schema 1.1, specifically the new assertions and type alternatives features, with authors Neil Delima, Sandy Gao, Michael Glavassevich, and Khaled Noaman.

Neil Delima (ndelima@ca.ibm.com), Software Developer, IBM

Neil Delima is a Staff Software Developer at the IBM Toronto Lab. As a member of the XML Parser Development team, he has worked on developing and testing XML technology for over seven years. He is a committer on Apache's Xerces-Java parser project and has contributed to the W3C DOM and XML 1.1 test suites.



Sandy Gao (sandygao@ca.ibm.com), Software Developer, IBM

Sandy (Shudi) Gao is a software developer at the IBM Toronto Software Lab. He has been a committer to the Apache Xerces XML Parser (Java) project since 2001 and was one of the key contributors to the XML Schema support therein. Sandy has been representing IBM in W3C XML Schema Working Group since 2003. He contributed significantly to XML Schema version 1.1 development and became an editor of the specification in 2006. Sandy is also representing IBM in W3C SML Working Group.



Michael Glavassevich (mrglavas@ca.ibm.com), Software Developer, IBM

Michael Glavassevich is a member of the XML Parser Development team at the IBM Toronto Lab. He has been one of the main contributors to the Apache Xerces2 project for the last five years, working on, among other things, the implementation of XML Schema, XInclude, JAXP 1.3/1.4 and DOM Level 3. Michael also represented IBM in the JAXP Expert Group that developed JAXP 1.4.



Khaled Noaman (knoaman@ca.ibm.com), Software Developer, IBM

Khaled Noaman is a member of the XML Parser Development team at the IBM Toronto Lab. He has been involved in the development of the Apache Xerces-C++ parser for over five years and implemented many of the parser features including support for XML Schema Structures.



13 January 2009

Introduction

Complex and simple type definitions in XML Schema 1.0 allow schema authors to specify and restrict the content of elements and the values of attributes. According to the XML Schema 1.0 specification, complex type definitions constrain elements in two ways: by providing attribute declarations that govern the appearance and content of attributes, and by restricting elements to be empty or to conform to a specific content model, such as element-only, mixed, or simple content determined by a simple type definition.

Frequently used acronyms

  • DOM: Document Object Model
  • HTML: Hypertext Markup Language
  • W3C: World Wide Web Consortium
  • XDM: XPath 2.0 Data Model
  • XML: Extensible Markup Language
  • XSLT: Extensible Stylesheet Language Transformations

Complex type definitions also define a mechanism that governs a type definition hierarchy, which determines how complex types can be derived from other complex types or simple types by extension or restriction. Substitution groups on complex types control the substitution of elements with elements of a derived type. Simple types, on the other hand, constrain the character values of the contents of elements and attributes.

In this article we discuss co-occurrence constraints, a new feature introduced in XML Schema 1.1 that constrains not only the content of elements and attributes, but also their existence.

A bit of history

As we mentioned in the first article of the series, XML Schema 1.0 has certain limitations. Beyond the constraints mentioned above, XML schema authors often needed to enforce more complex rules that determine and restrict the content of elements and attributes, such as the ability to restrict the appearance of certain child elements based on the value of an attribute, having the total sum of child elements not exceed a certain value, or allowing the value of a child element to be valid based on the scope in which it is found.

Unfortunately, XML Schema 1.0 did not provide a way to enforce these rules. To implement such constraints, you had to do one of the following:

  • Write code at the application level (after XML schema validation)
  • Use stylesheet checking (also a post-validation process)
  • Use a different XML schema language such as RELAX NG or Schematron

With the constant requests for co-occurrence constraint checking support from the XML Schema 1.0 user community, the XML Schema 1.1 working group introduced the concept of assertions and type alternatives in XML Schema 1.1 to allow XML schema authors to express such constraints.

Assertions

Assertions provide XML schema authors with a flexible way to control the occurrence and values of elements and attributes.

Usage scenarios

Before you delve into how assertions are defined in XML Schema 1.1, first take a look at some usage scenarios.

  1. Specify a constraining rule based on the values of two or more attributes. Given the XML fragment in Listing 1, you can specify a rule between the attributes width and height so that the height is never greater than the width.
    Listing 1. XML fragment - element with two attributes
    <dimension width="10" height="5"/>
  2. Specify a constraining rule between attributes and elements. Listing 2 shows an element that has an attribute and two child elements. You can specify a rule between the attribute and the child elements such that the value of the attribute equals the number of child elements.
    Listing 2. XML fragment - element with one attribute and two child elements
    <parent children="2">
      <child name="one"/>
      <child name="two"/>
    </parent>
  3. Specify a constraining rule that determines the order and choice between attributes. For the element defined in Listing 3, you can specify a rule where timer has either a time or iterations attribute but not both.
    Listing 3. XML fragment - timer element
    <timer time="30" iterations="2000"/>
  4. Specify a grouping of elements and attributes into a model group. For example, you can restrict the content of element parent (defined in Listing 4), by specifying a rule that forces the content to be either child or grandchild and both elements having the attributes name and dob.
    Listing 4. XML fragment - A parent element
    <parent>
      <child name="abc" dob="1/1/1997"/>
      <grandchild name="xyz" dob="1/1/2007"/>
    </parent>
  5. Specify a constraining rule on the text in an element with mixed content. Listing 5 shows an element, parent, that has mixed content. You can specify a rule that limits the mixed content text to a maximum of 10 characters.
    Listing 5. XML fragment - A parent element with mixed content
    <parent>2 children
      <child>abc</child>
      <child>xyz</child>
    </parent>

To address these and other usage scenarios, XML Schema 1.1 provides more expressive constraints through assertions. Assertions in XML Schema 1.1 are similar to those available in other schema languages such as Schematron and RELAX NG.

At the time of writing this article, you can specify assertions on simple and complex types. The predicate is specified using an XPath 2.0 expression which is part of the assertion specified on the type.

Assertions on complex types

In XML Schema 1.1, complex type definitions can contain an assertions schema component which is a sequence of <xs:assert> child elements of the complex type definition. The order of this sequence is insignificant. Assertions constrain the existence of elements and attributes and their values. The <xs:assert> schema component contains a test property which is an XPath expression property record and an annotations property.

The value of the test attribute of the xs:assert element information item is an XPath expression that evaluates to either true or false. You can use a special variable, $value, to refer to the simple content value of the element or attribute being checked. Evaluation is done in the context of the parent element. The XPath expression must be a valid XPath 2.0 expression or at least conform to the minimal XPath subset defined in the XML Schema 1.1 specification.

If the XPath expression specified is invalid, an xpath-valid error is reported. If the xs:assert is incorrectly specified, the schema processor reports an as-props-correct error. If the test expression evaluates to true without raising a dynamic or type error, the element is considered locally valid. If it evaluates to false, a generic cvc-assertion error is reported.

Listing 6 shows an example of a complex type with an assertion that constrains the values of two attributes. The assertion expression evaluates to true if the value of height is less than the value of width, otherwise it evaluates to false.

Listing 6. Assertion on complex type - @height < @width
<xs:element name="dimension">
  <xs:complexType>
    <xs:attribute name="height" type="xs:int"/>
    <xs:attribute name="width" type="xs:int"/>
    <xs:assert test="@height &lt; @width"/>
  </xs:complexType>
</xs:element>

In the example above, we defined an xs:assert element information item as a direct child of xs:complexType. We can also specify xs:assert on xs:restriction or xs:extension when defining a complex type with complex content (xs:complexContent) or simple content (xs:simpleContent). For an element to be valid, each assertion in its sequence of assertions needs to evaluate to true. This sequence consists of all the assertions defined on the complex type as well as all assertions of the base types from which it is derived.

In Listing 7, we have two complex types, baseType and derivedType, each with its own assertion. The assertion on baseType checks if the attribute mustUnderstand is present on the element. The assertion on derivedType checks if the mustUnderstand attribute has a value YES and at least one body child is present; otherwise it expects mustUnderstand to have a value of NO. The derivedType has a sequence of two assertions, the one from baseType and its own. For the element message to be valid, its content must be valid as defined by its complexType definition and all assertions must evaluate to true.

Listing 7. Assertion on complex type with complex content
<xs:complexType name="baseType">
  <xs:sequence>
   <xs:element name="body" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="mustUnderstand" type="xs:string"/>
  <xs:assert test="@mustUnderstand"/>
</xs:complexType>

<xs:complexType name="derivedType">
  <xs:complexContent>
    <xs:restriction base="baseType">
      <xs:sequence>
        <xs:element name="body" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="mustUnderstand" type="xs:string"/>
      <xs:assert test="( @mustUnderstand eq 'YES' and fn:count(./body) > 0 )
                       or ( @mustUnderstand eq 'NO' )"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

<xs:element name="message" type="derivedType"/>
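The usage scenarios listed earlier can be expressed with the same mechanism. The following is a minimal sketch rather than a definitive solution: the type names childCountType, timerType, and shortTextType are hypothetical, and the xs and fn prefixes are assumed to be bound as in the other listings.

```xml
<!-- Scenario 2: the children attribute must equal the number of child elements -->
<xs:complexType name="childCountType">
  <xs:sequence>
    <xs:element name="child" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:attribute name="name" type="xs:string"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
  <xs:attribute name="children" type="xs:int" use="required"/>
  <xs:assert test="@children = fn:count(child)"/>
</xs:complexType>

<!-- Scenario 3: exactly one of time or iterations must be present -->
<xs:complexType name="timerType">
  <xs:attribute name="time" type="xs:int"/>
  <xs:attribute name="iterations" type="xs:int"/>
  <xs:assert test="fn:count(@time | @iterations) = 1"/>
</xs:complexType>

<!-- Scenario 5: the text of a mixed-content element is at most 10 characters,
     ignoring leading, trailing, and repeated whitespace -->
<xs:complexType name="shortTextType" mixed="true">
  <xs:sequence>
    <xs:element name="child" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:assert test="fn:string-length(fn:normalize-space(fn:string-join(text(), ''))) le 10"/>
</xs:complexType>
```

With types along these lines, the timer element in Listing 3, which carries both attributes, should be reported as invalid, while the parent elements in Listing 2 and Listing 5 should satisfy their assertions.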

When defining a complex type with simple content, you can specify two types of assertions. The first one acts as a facet and restricts the simple content type (for example, restricting the simple value to be a multiple of 7), while the second one appears as an assertion on the element as a whole, including its attributes. Since the syntax of the content model of xs:simpleContent/xs:restriction does not distinguish between the two types of assertions, a new element information item, xs:assertion, was introduced to indicate an assertion facet. We will cover xs:assertion in the next section when we discuss assertions on simple type definitions.

Assertions on simple types

In XML Schema 1.1, the xs:restriction child of an xs:simpleType can contain xs:assertion elements, just as complex type definitions can contain xs:assert elements. Assertions in simple types are similar to other simple type constraining facets. The assertions simple type component represents a set of constraining facets that restrict the value space of a simple type by requiring values to satisfy the conditions specified by the XPath expression in the test attribute.

As with complex type definition, the assertions are an ordered sequence of xs:assertion elements specified as facets in the simple type definition. The specified order of the sequence of assertions is insignificant as all assertions in this sequence need to evaluate to true for an element or attribute of this type to be valid. The assertions schema component contains a value property which is a sequence of assertions from the base type, if any, and assertions defined in the derived simpleType.

The value of the test attribute of the xs:assertion element facet is an XPath 2.0 expression, or an expression in the XPath 2.0 subset defined by the XML Schema 1.1 specification, that evaluates to either true or false. Evaluation is done in the context of the parent element. An element or attribute with simple content is valid if it is valid with respect to all assertion facets (that is, the test property of each xs:assertion evaluates to true, without any dynamic or type errors).

In Listing 8, we show an example of an element with simple content that has an assertion facet that evaluates to true if the element's value is a multiple of 10.

Listing 8. An element with simple content that allows values that are multiples of 10
<xs:element name="message">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:assertion test="($value mod 10) = 0"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

A value is valid with respect to a derived simple type that restricts another simple type, provided that it satisfies the derived type (and its restricting facets), and all assertions belonging to both the base and the derived type. In Listing 9, a string value is valid only if it is more than 3 and at most 25 characters long and ends with the string "xyz".

Listing 9. Assertions on derived simple type definitions
<xs:simpleType name="base">
  <xs:restriction base="xs:string">
    <xs:maxLength value="25"/>
    <xs:assertion test="fn:ends-with($value, 'xyz')"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="derived">
  <xs:restriction base="base">
    <xs:assertion test="fn:string-length($value) > 3 "/>
  </xs:restriction>
</xs:simpleType>
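The two kinds of assertions allowed on a complex type with simple content can appear side by side. The following is a minimal sketch, assuming hypothetical type names quantityType and boundedQuantityType: the xs:assertion facet restricts the simple content value to multiples of 7, while the xs:assert checks the element as a whole by relating the value to an optional max attribute.

```xml
<!-- Base type: integer content plus an optional max attribute -->
<xs:complexType name="quantityType">
  <xs:simpleContent>
    <xs:extension base="xs:int">
      <xs:attribute name="max" type="xs:int"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

<xs:complexType name="boundedQuantityType">
  <xs:simpleContent>
    <xs:restriction base="quantityType">
      <!-- Facet: restricts the simple content value itself -->
      <xs:assertion test="($value mod 7) = 0"/>
      <!-- Assertion on the whole element: the value may not exceed max, if max is present -->
      <xs:assert test="fn:not(@max) or $value &lt;= @max"/>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>
```

Note that the less-than sign is written as &lt; because the XPath expression sits inside an XML attribute value.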

Error message customization

As demonstrated in the previous sections, XML Schema 1.1 assertions can use any XPath 2.0 expressions, and these expressions can be very complex. When the assertions fail, it becomes very important to provide error messages that are easy to understand.

Schema error codes

When a schema constraint is violated, the schema specification requires that the corresponding error code be reported. For example, when you see the error code cvc-attribute.3, you know clause 3 of the constraint Attribute Locally Valid is violated, indicating that the value of an attribute is not valid with respect to its type.

With a little more information about the context (for example, the element or attribute name, line and column numbers, or values involved), this error code approach is often sufficient for problem diagnosis. Applying this to assertions, the error code cvc-assertion will be reported when an assertion is not satisfied. Even with all the context information, you still do not know what really went wrong and how to fix it, unless you look at the schema and try to understand the (potentially very complex) XPath expressions.

Schematron approach

Users of Schematron (see Resources) often find it useful to be able to customize the messages that are reported when rules are violated (Listing 10).

Listing 10. A Schematron rule
<sch:report test="@min > @max">
  On element "<sch:value-of select="local-name(.)"/>", value of the
  "min" attribute "<sch:value-of select="@min"/>" can not be greater
  than that of the "max" attribute "<sch:value-of select="@max"/>".
</sch:report>

The following XML fragment (Listing 11) violates this rule.

Listing 11. A fragment that violates a Schematron rule
<range min="30" max="10"/>

This fragment will produce a message: On element "range", value of the "min" attribute "30" can not be greater than that of the "max" attribute "10".

This approach has two significant benefits:

  1. Human readable error messages can be associated with validation rules, making it easy to diagnose validation failures.
  2. The error message can also use XPaths to refer to values in the instance being validated to provide more information about what is causing the violation. In the above example, range, 30, and 10 are all information that can vary from instance to instance.

Localization support

Validation rules can be deployed in systems with different locales, and users will expect to see error messages in different human languages. To make it possible to use a localized message, Schematron suggests using the diagnostics attribute in association with the xml:lang attribute as in Listing 12.

Listing 12. Example of localized message in Schematron
<sch:pattern>
  <sch:rule context="person">
    <sch:assert test="name" diagnostics="d1 d2">
      A person must have a name.
    </sch:assert>
  </sch:rule>
</sch:pattern>

<sch:diagnostics>
  <sch:diagnostic id="d1" xml:lang="en">
    A person must have a name.
  </sch:diagnostic>
  <sch:diagnostic id="d2" xml:lang="fr">
    Une personne doit avoir un nom.
  </sch:diagnostic>
</sch:diagnostics>

Schematron implementations can now select the right diagnostic based on the language expected.

SML approach

The Schematron approach is still not perfect for the localization issue. When new languages are supported, the Schematron rule has to be updated, both to add the new diagnostic entry, and to add the new ID to the diagnostics attribute.

The Java™ programming language handles this by using resource bundles, typically backed by property files. When a new language is added, a new bundle is introduced, and as long as it follows a certain naming convention, it can be discovered automatically, without the need to change the places where the messages are used.

The Service Modeling Language (SML) uses Schematron as one of its validation mechanisms. It introduces the "location ID" concept (Listing 13) to allow resource management strategies like the one used by a Java environment.

Listing 13. SML with a location ID concept
<sch:assert test="name" sml:locid="person:nameRequired">
  A person must have a name.
</sch:assert>

The locid attribute is of type QName. Its namespace name can be used to locate the bundle (which might contain, for example, all error messages related to a person), and the local name to identify the error message to show for the corresponding rule. In Listing 14 and Listing 15, we show some examples of message properties in English and French.

Listing 14. A fragment of a message property in English
nameRequired = A person must have a name.

Listing 15. A fragment of a message property in French
nameRequired = Une personne doit avoir un nom.

Error message customization for assertions

XML Schema 1.1 does not prescribe any way to customize error messages for assertions, but it allows application-specific information to be embedded in annotations. For example, Listing 16 shows how to include a customized error message in the "appinfo" element inside an annotation and use "documentation" to provide additional information about the message. Users would benefit if XML Schema 1.1 processors adopted a common practice for using annotations to customize assertion errors. Such a practice might also include mechanisms to enable localization of error messages.

Listing 16. Customize error messages using annotations
<xs:complexType name="rangeType">
  <xs:attribute name="min" type="xs:int"/>
  <xs:attribute name="max" type="xs:int"/>
  <xs:assert test="@min &lt;= @max">
    <xs:annotation>
      <xs:appinfo>
        Value of the "min" attribute can not be greater than that of the "max"
        attribute.
      </xs:appinfo>
      <xs:documentation>
        When this assertion fails, the content of the above "appinfo" is used
        to produce the error message.
      </xs:documentation>
    </xs:annotation>
  </xs:assert>
</xs:complexType>

Type alternatives

XML Schema 1.1 introduces a new mechanism called type alternatives that allows the schema author to specify type substitutions on an element declaration.

A look at conditional type assignment

In XML Schema 1.0, xsi:type was introduced as a mechanism for type substitution. xsi:type is specified on an element in the instance document to replace the declared type with a derived one. This mechanism works well if you design an XML vocabulary specifically for use with XML Schema and you require that instances of your vocabulary use xsi:type for type substitution. If, however, you write an XML schema for a vocabulary which already has its own notion of type substitution, then xsi:type will not work. Instances of this vocabulary select types using some other mechanism. One such example is the Atom Syndication Format, an XML language used for Web feeds.

Atom allows instances to specify a type attribute on elements containing text constructs. If present, the value of this attribute must be one of text, html, or xhtml. The content allowed is determined by the value of this attribute. Because this attribute is not xsi:type, it is impossible to write a schema which models Atom using the XML Schema 1.0 language. If the condition for selecting the type is more complex, for example @height < @width (a comparison between two attribute values), you cannot simply substitute it in the instance with xsi:type.

To address the shortcomings of xsi:type, you can use the type alternative mechanism. This allows the schema author to specify type substitutions on an element declaration which are selected based on the evaluation of XPath expressions. In the next section we will show how this works using Atom as an example.

How type alternatives work

In XML Schema 1.1, element declarations can have a type table which contains a sequence of type alternative components and a default type definition (which is also a type alternative). In an XML schema document these are specified as a sequence of xs:alternative child elements of the element declaration. The type alternative schema component contains a test property which is an XPath expression property record, a type definition, and an annotations property.

The value of the test attribute on xs:alternative corresponds to the test property, an XPath expression which evaluates to true or false. The expressions allowed are limited to a subset of XPath 2.0, specifically those which only select the attribute axis. This means that only attributes on the current element are accessible by XPath evaluation. It is worth noting that the XDM instance which is constructed for the evaluation does not include any type information. This was done to avoid a situation where a schema processor would need to somehow guess the types of the attributes in order to determine the type of the element: one cannot know the actual types of the attributes until the element's type is determined.
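One practical consequence of the untyped data model is that a comparison between two attribute values is not numeric unless the expression casts them. The following is a minimal sketch, assuming hypothetical types dimensionType, wideDimensionType, and tallDimensionType (the latter two derived from the first), and assuming the processor accepts constructor functions such as xs:int in type alternative tests.

```xml
<xs:element name="dimension" type="dimensionType">
  <!-- Attributes are untyped here, so cast before comparing numerically;
       without the casts, "10" and "5" would be compared as strings. -->
  <xs:alternative test="xs:int(@height) lt xs:int(@width)" type="wideDimensionType"/>
  <!-- Default alternative: selected when the test above is not true -->
  <xs:alternative type="tallDimensionType"/>
</xs:element>
```

The choice of xs:int here mirrors the attribute types used in Listing 6; any built-in numeric type would serve the same purpose in the cast.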

The last xs:alternative child of the element declaration is allowed to omit the test attribute. If present, this type alternative is the default type definition. If no such xs:alternative is specified the default type definition is the one which was declared for the element.

The value of the type attribute on xs:alternative corresponds to the type definition property of the type alternative schema component. If the XPath expression on the test attribute evaluates to true, then the specified type is selected as a substitution for the one declared on the element. The type specified must either be derived from the declared type or be xs:error, a special simple type definition that has no valid instances. The xs:error type can be used to cause elements to be invalid if they satisfy the condition for the type alternative.

If an element declaration has type alternatives, they are evaluated in the order that they were specified in the schema. The first type alternative whose XPath expression evaluates to true is the type that is selected. If none of the XPath expressions evaluate to true then the default type definition is selected as the type for the element.

Now that we have described how the type alternatives mechanism works, look at an example (Listing 17) of how you can use it to write a schema for Atom. As mentioned in the previous section, elements containing textual constructs may have a type attribute which specifies the allowed content for the element. The snippet below shows how to write a declaration for a title element in Atom.

Listing 17. Type alternative xsd example
<xs:element name="title" type="xs:anyType">
  <xs:alternative test="@type='text'" type="xs:string"/>
  <xs:alternative test="@type='html'" type="htmlContentType"/>
  <xs:alternative test="@type='xhtml'" type="xhtmlContentType"/>
  <xs:alternative test="@type" type="xs:error"/>
  <xs:alternative type="xs:string"/>
</xs:element>

The element declaration for title has a declared type of xs:anyType and specifies five type alternatives. The type alternatives are evaluated in order until one of the XPath expressions evaluates to true (or, if none evaluate to true, the default type definition is chosen). The first three type alternatives select a type based on the value of the type attribute being text, html, or xhtml. If the type attribute has none of these values, the XPath expressions for the first three type alternatives will evaluate to false. The fourth type alternative checks whether the type attribute exists. If the schema processor has reached this point, the value of type (if this attribute exists) is not one which Atom allows. We assign the type for this alternative to be xs:error to signal that if this condition is satisfied, the element is invalid. If none of the XPath expressions evaluate to true, the default type definition (xs:string) is selected.

The instances of the title element in Listing 18 illustrate how the different type alternatives are selected.

Listing 18. Type alternative xml instance example
<!-- 1st type alternative selected: xs:string -->
<title type="text">My News</title>

<!-- 3rd type alternative selected: xhtmlContentType -->
<title type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">My
 <xhtml:em>News</xhtml:em>!</title>

<!-- default type alternative selected: xs:string -->
<title>My News</title>

<!-- 4th type alternative selected: xs:error. Invalid. -->
<title type="unknown">Oops! Error.</title>
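The xs:error technique is not specific to Atom. As a sketch, the timer scenario from the assertions section (either time or iterations, but not both) could also be approximated with type alternatives, since the condition only looks at attributes of the element itself. The type name plainTimerType is hypothetical, and whether a given processor accepts these test expressions depends on the XPath subset it supports.

```xml
<xs:complexType name="plainTimerType">
  <xs:attribute name="time" type="xs:int"/>
  <xs:attribute name="iterations" type="xs:int"/>
</xs:complexType>

<xs:element name="timer" type="plainTimerType">
  <!-- Both attributes present: select xs:error so the element is invalid -->
  <xs:alternative test="@time and @iterations" type="xs:error"/>
  <!-- Neither attribute present: also invalid -->
  <xs:alternative test="fn:not(@time) and fn:not(@iterations)" type="xs:error"/>
  <!-- No default alternative is given, so the declared type plainTimerType
       applies when exactly one of the attributes is present -->
</xs:element>
```

Compared with an assertion, this approach reports the problem as a type error on the element rather than as a failed assertion, which may or may not suit the application.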

Conclusion

In this article, we gave an overview of co-occurrence constraint support in XML Schema 1.1, highlighting the addition of assertions and type alternatives to further restrict the existence and values of elements and attributes. In Part 3 of the series, we will explore wildcard support and how it allows you to evolve your XML schema.

Resources

Learn

Get products and technologies

Discuss
