Complex and simple type definitions in XML Schema 1.0 allow schema authors to specify and restrict the content of elements and the values of attributes. According to the XML Schema 1.0 specification, complex type definitions constrain elements in two ways: by providing attribute declarations that govern the appearance and content of attributes, and by restricting elements to be empty or to conform to a specific content model, such as element-only, mixed, or simple content determined by a simple type definition.
Complex type definitions also take part in a type definition hierarchy, which determines how complex types can be derived from other complex types or from simple types by extension or restriction. They also control whether, in an instance, elements of a derived type can be substituted for elements declared with a given complex type. Simple types, on the other hand, constrain the character values of the contents of elements and attributes.
In this article we discuss co-occurrence constraints, a new feature introduced in XML Schema 1.1 to not only constrain the content of elements and attributes, but their existence as well.
As we mentioned in the first article of the series, XML Schema 1.0 has certain limitations. Beyond the constraints mentioned above, XML schema authors often needed to enforce more complex rules that determine and restrict the content of elements and attributes, such as the ability to restrict the appearance of certain child elements based on the value of an attribute, having the total sum of child elements not exceed a certain value, or allowing the value of a child element to be valid based on the scope in which it is found.
Unfortunately, XML Schema 1.0 did not provide a way to enforce these rules. To implement such constraints, you had to do one of the following:
- Write code at the application level (after XML schema validation)
- Use stylesheet checking (also a post-validation process)
- Use a different XML schema language such as RELAX NG or Schematron
In response to constant requests from the XML Schema 1.0 user community for co-occurrence constraint checking, the XML Schema 1.1 working group introduced assertions and type alternatives, which allow XML schema authors to express such constraints.
Assertions provide XML schema authors with a flexible way to control the occurrence and values of elements and attributes.
Before you delve into how assertions are defined in XML Schema 1.1, first take a look at some usage scenarios.
- Specify a constraining rule based on the values of two or more attributes. Given the XML fragment in Listing 1, you can specify a rule between the attributes width and height so that the height is never greater than the width.
Listing 1. XML fragment - element with two attributes
<dimension width="10" height="5"/>
- Specify a constraining rule between attributes and elements. Listing 2 shows an element that has an attribute and two child elements. You can specify a rule between the attribute and the child elements such that the value of the attribute equals the number of child elements.
Listing 2. XML fragment - element with one attribute and two child elements
<parent children="2">
  <child name="one"/>
  <child name="two"/>
</parent>
- Specify a constraining rule that determines the order and choice between attributes. For the element defined in Listing 3, you can specify a rule where timer has either a time or an iterations attribute, but not both.
Listing 3. XML fragment - timer element
<timer time="30" iterations="2000"/>
- Specify a grouping of elements and attributes into a model group. For example, you can restrict the content of the element parent (defined in Listing 4) by specifying a rule that forces the content to be either child or grandchild, with both elements having the attributes name and dob.
Listing 4. XML fragment - A parent element
<parent>
  <child name="abc" dob="1/1/1997"/>
  <grandchild name="xyz" dob="1/1/2007"/>
</parent>
- Specify a constraining rule on the text in an element with mixed content. Listing 5 shows an element, parent, that has mixed content. You can specify a rule that limits the mixed content text to a maximum of 10 characters.
Listing 5. XML fragment - A parent element with mixed content
<parent>2 children
  <child>abc</child>
  <child>xyz</child>
</parent>
To address these and other usage scenarios, XML Schema 1.1 provides more expressive constraints through assertions. Assertions in XML Schema 1.1 are similar to those available in other schema languages such as Schematron and RELAX NG.
At the time of writing this article, you can specify assertions on simple and complex types. The predicate is specified using an XPath 2.0 expression which is part of the assertion specified on the type.
In XML Schema 1.1, complex type definitions can contain an assertions schema component
which is a sequence of <xs:assert> child elements of
the complex type definition. The order of this sequence is insignificant. Assertions
constrain the existence of elements and attributes and their values. The
<xs:assert> schema component contains a test
property which is an XPath expression property record and an annotations property.
The value of the test attribute of the xs:assert element information item is an XPath
expression that evaluates to either true or false. You can use a special variable, $value, to refer to the simple content value of the element or attribute being checked.
Evaluation is done in the context of the element being validated, that is, the parent of the attributes and child elements that the expression refers to. The XPath expression must be a
valid XPath 2.0 expression or at least conform to the minimal XPath subset defined in
the XML Schema 1.1 specification.
If the XPath expression specified is invalid, an xpath-valid error is reported.
If the xs:assert is incorrectly specified, the schema
processor reports an
as-props-correct error. If the evaluation of the test expression is true and does not
result in a dynamic or type error, the element is considered locally valid. If it
evaluates to false, a generic cvc-assertion error is reported.
Listing 6 shows an example of a complex type with an
assertion that constrains the values of two attributes. The assertion expression
evaluates to true if the value of height
is less than the value of width, otherwise it evaluates to
false.
Listing 6. Assertion on complex type - @height < @width
<xs:element name="dimension">
  <xs:complexType>
    <xs:attribute name="height" type="xs:int"/>
    <xs:attribute name="width" type="xs:int"/>
    <!-- "<" must be escaped as &lt; inside an attribute value -->
    <xs:assert test="@height &lt; @width"/>
  </xs:complexType>
</xs:element>
In the example above, we defined an xs:assert element information item as a direct
child of xs:complexType. We can also specify xs:assert
on xs:restriction or
xs:extension when defining a complex type with complex content
(xs:complexContent)
or simple content (xs:simpleContent). For an element to be valid, each assertion in
its sequence of assertions needs to evaluate to true. This sequence consists of
all the assertions defined on the complex type as well as all assertions of the complex
type's ancestors.
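As an illustration, the usage scenarios from Listings 2 and 3 could be expressed with assertions along the following lines. This is a sketch rather than a definitive schema: the attribute types, the content model chosen for parent, and the exact XPath expressions are assumptions, since the fragments themselves do not fix them.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Scenario from Listing 2: @children must equal the number of child elements -->
  <xs:element name="parent">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="child" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="name" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="children" type="xs:int"/>
      <!-- False when @children is absent or differs from the child count -->
      <xs:assert test="@children = count(child)"/>
    </xs:complexType>
  </xs:element>

  <!-- Scenario from Listing 3: either @time or @iterations, but not both -->
  <xs:element name="timer">
    <xs:complexType>
      <xs:attribute name="time" type="xs:int"/>
      <xs:attribute name="iterations" type="xs:int"/>
      <xs:assert test="(@time and not(@iterations)) or (@iterations and not(@time))"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under these assumed declarations, the fragment in Listing 2 satisfies its assertion (two child elements and children="2"), while the fragment in Listing 3 does not, because it carries both time and iterations.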
In Listing 7, we have two complex types, baseType
and derivedType, each with its own assertion. The assertion
on baseType checks if the attribute mustUnderstand
is present on the element. The assertion on derivedType
checks if the mustUnderstand attribute has a value
YES and at least one body child
is present; otherwise it expects mustUnderstand to have a
value of NO. The derivedType
has a sequence of two assertions, the one from baseType
and its own. For the element message to be valid, its content
must be valid as defined by its complexType definition and all assertions must evaluate
to true.
Listing 7. Assertion on complex type with complex content
<xs:complexType name="baseType">
  <xs:sequence>
    <xs:element name="body" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="mustUnderstand" type="xs:string"/>
  <xs:assert test="@mustUnderstand"/>
</xs:complexType>
<xs:complexType name="derivedType">
  <xs:complexContent>
    <xs:restriction base="baseType">
      <xs:sequence>
        <xs:element name="body" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="mustUnderstand" type="xs:string"/>
      <xs:assert test="( @mustUnderstand eq 'YES' and fn:count(./body) > 0 )
                       or ( @mustUnderstand eq 'NO' )"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
<xs:element name="message" type="derivedType"/>
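To make the combined effect of the two assertions in Listing 7 concrete, here are a few message instances, shown as separate fragments. The instances are illustrative only; the comments describe the outcome one would expect from the assertions as written in Listing 7.

```xml
<!-- Valid: mustUnderstand is present, its value is YES, and there is a body child -->
<message mustUnderstand="YES"><body/></message>

<!-- Valid: mustUnderstand is present and its value is NO -->
<message mustUnderstand="NO"/>

<!-- Invalid: the derivedType assertion fails (YES, but no body child) -->
<message mustUnderstand="YES"/>

<!-- Invalid: mustUnderstand is missing, so the baseType assertion fails,
     and neither branch of the derivedType assertion can be true -->
<message><body/></message>
```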
When defining a complex type with simple content, you can specify two types of
assertions. The first one acts as a facet and restricts the simple content type
(for example, restricting the simple value to be a multiple of 7), while the second one
appears as an assertion on the element as a whole, including its attributes. Since
the syntax of the content model of xs:simpleContent/xs:restriction does not
distinguish between the two types of assertions, a new element information item,
xs:assertion was introduced to indicate an assertion
facet. We will cover xs:assertion in the next section when we discuss assertions
on simple type definitions.
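The difference between the two kinds of assertion can be sketched as follows. The type names lengthBase and lengthInSevens and the unit attribute are hypothetical, chosen only for illustration: the xs:assertion facet restricts the simple content value to multiples of 7, while the xs:assert applies to the element as a whole and here requires the unit attribute to be present.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical base: an integer value with an optional unit attribute -->
  <xs:complexType name="lengthBase">
    <xs:simpleContent>
      <xs:extension base="xs:int">
        <xs:attribute name="unit" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:complexType name="lengthInSevens">
    <xs:simpleContent>
      <xs:restriction base="lengthBase">
        <!-- Facet: restricts the simple content value itself -->
        <xs:assertion test="$value mod 7 = 0"/>
        <!-- Assertion on the element as a whole, including its attributes -->
        <xs:assert test="@unit"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>
```

Within xs:restriction, facets such as xs:assertion come first, followed by any attribute declarations, with xs:assert elements last.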
In XML Schema 1.1, simple types can carry assertions just as complex types can: an xs:restriction
child of an xs:simpleType can
contain xs:assertion elements. Assertions in simple types
are similar to other simple type constraining facets. The assertions simple type
component represents a set of constraining facets that restrict the value space of
a simple type by requiring values to satisfy conditions specified by the XPath
expression in the value of test attribute.
As with complex type definition, the assertions are an ordered sequence of
xs:assertion elements specified as facets in the simple type
definition. The specified order of the sequence of assertions is insignificant as all
assertions in this sequence need to evaluate to true for an element or attribute of
this type to be valid. The assertions schema component contains a value property
which is a sequence of assertions from the base type, if any, and assertions defined
in the derived simpleType.
The value of the test attribute of the xs:assertion element
facet is an XPath 2.0 expression or XPath 2.0 subset as defined by the XML Schema 1.1
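specification that evaluates to either true or false. The expression refers to the value being checked through the $value variable; as published, the XML Schema 1.1 specification supplies no context node to an assertion facet, so the expression cannot navigate to the containing element or its attributes.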
An element or attribute with simple content is valid if it is valid with respect to all assertion facets (that is, the test property of each
xs:assertion evaluates to true without any dynamic or type errors).
In Listing 8, we show an example of an element with simple content that has an assertion facet that evaluates to true if the element's value is a multiple of 10.
Listing 8. An element with simple content that allows values that are multiples of 10
<xs:element name="message">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:assertion test="($value mod 10) = 0"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
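For instance, given the declaration in Listing 8, instances such as the following would be accepted or rejected as noted. The sample values are illustrative.

```xml
<!-- Valid: 20 mod 10 is 0 -->
<message>20</message>

<!-- Valid: 0 mod 10 is 0 -->
<message>0</message>

<!-- Invalid: 25 mod 10 is 5, so the assertion facet evaluates to false -->
<message>25</message>
```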
A value is valid with respect to a derived simple type that restricts another simple type, provided that it satisfies the derived type (and its restricting facets), and all assertions belonging to both the base and the derived type. In Listing 9, a string value is valid only if it is more than 3 and at most 25 characters long and ends with the string "xyz".
Listing 9. Assertions on derived simple type definitions
<xs:simpleType name="base">
  <xs:restriction base="xs:string">
    <xs:maxLength value="25"/>
    <xs:assertion test="fn:ends-with($value, 'xyz')"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="derived">
  <xs:restriction base="base">
    <xs:assertion test="fn:string-length($value) > 3 "/>
  </xs:restriction>
</xs:simpleType>
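As a quick check of how the facets and assertions of the base and derived types combine, assume a hypothetical element declaration, code, of type derived. The element name and the sample values below are illustrative and not part of the original listing.

```xml
<!-- Assumed declaration: <xs:element name="code" type="derived"/> -->

<!-- Valid: 6 characters, ends with "xyz" -->
<code>abcxyz</code>

<!-- Invalid: only 3 characters, so the assertion on the derived type fails -->
<code>xyz</code>

<!-- Invalid: does not end with "xyz", so the assertion on the base type fails -->
<code>abcdef</code>

<!-- Invalid: 26 characters, which exceeds the maxLength facet of the base type -->
<code>abcdefghijklmnopqrstuvwxyz</code>
```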
As demonstrated in the previous sections, XML Schema 1.1 assertions can use any XPath 2.0 expressions, and these expressions can be very complex. When the assertions fail, it becomes very important to provide error messages that are easy to understand.
Schema error codes
When a schema constraint is violated, the schema specification requires that the
corresponding error code be reported. For example, when you see the error code
cvc-attribute.3, you know clause 3 of the constraint
Attribute Locally Valid is violated, indicating that the value of an attribute is
not valid with respect to its type.
With a little more information about the context (for example, the element or attribute name,
line and column numbers, or values involved), this error code approach is often
sufficient for problem diagnosis. Applying this to assertions, the error code
cvc-assertion will be reported when an assertion is not
satisfied. Even with all the context information, you still do not know what really went
wrong and how to fix it, unless you look at the schema and try to understand the
(potentially very complex) XPath expressions.
Users of Schematron (see Resources) often find it useful to be able to customize the messages that are reported when rules are violated (Listing 10).
Listing 10. A Schematron rule
<sch:report test="@min > @max">
  On element "<sch:value-of select="local-name(.)"/>", value of the "min" attribute "<sch:value-of select="@min"/>" can not be greater than that of the "max" attribute "<sch:value-of select="@max"/>".
</sch:report>
The following XML fragment (Listing 11) violates this rule.
Listing 11. A fragment that violates a Schematron rule
<range min="30" max="10"/>
This fragment will produce a message: On element "range", value of the "min" attribute "30" can
not be greater than that of the "max" attribute "10".
This approach has two significant benefits:
- Human readable error messages can be associated with validation rules, making it easy to diagnose validation failures.
- The error message can also use XPaths to refer to values in the instance
being validated to provide more information about what is causing the violation.
In the above example,
range, 30, and 10 are all information that can vary from instance to instance.
Validation rules can be deployed in systems with different locales, and users will
expect to see error messages in different human languages. To make it possible to use
a localized message, Schematron suggests using the diagnostics
attribute in association with the xml:lang attribute
as in Listing 12.
Listing 12. Example of localized message in Schematron
<sch:pattern>
  <sch:rule context="person">
    <sch:assert test="name" diagnostics="d1 d2">
      A person must have a name.
    </sch:assert>
  </sch:rule>
</sch:pattern>
<sch:diagnostics>
  <sch:diagnostic id="d1" xml:lang="en">
    A person must have a name.
  </sch:diagnostic>
  <sch:diagnostic id="d2" xml:lang="fr">
    Une personne doit avoir un nom.
  </sch:diagnostic>
</sch:diagnostics>
Schematron implementations can now select the right diagnostic
based on the language expected.
The Schematron approach is still not perfect for the localization issue. When new
languages are supported, the Schematron rule has to be updated, both to add the new
diagnostic entry, and to add the new ID to the
diagnostics attribute.
The Java™ programming language handles this by using resource bundles, which are commonly backed by property files. When a new language is added, a new bundle is introduced, and as long as it follows a certain naming convention, it can be discovered automatically, without the need to change the places where the messages are used.
The Service Modeling Language (SML) uses Schematron as one of its validation mechanisms. It introduces the "location ID" concept (Listing 13) to allow resource management strategies like the one used by a Java environment.
Listing 13. SML with a location ID concept
<sch:assert test="name" sml:locid="person:nameRequired">
  A person must have a name.
</sch:assert>
The locid attribute is of type QName. Its
namespace name can be used to locate the bundle (which might contain, for example,
all error messages related to a person), and the local name
to identify the error message to show for the corresponding rule. In
Listing 14 and Listing 15, we show
some examples of message properties in English and French.
Listing 14. A fragment of a message property in English
nameRequired = A person must have a name.
Listing 15. A fragment of a message property in French
nameRequired = Une personne doit avoir un nom.
Error message customization for assertions
XML Schema 1.1 does not prescribe any way to customize error messages for assertions, but it allows application-specific information to be embedded in annotations. For example, Listing 16 shows how to include a customized error message in the "appinfo" element inside an annotation and use "documentation" to provide additional information about the message. Users would benefit if XML Schema 1.1 processors adopted a common practice for using annotations to customize assertion errors. Such a practice might also include mechanisms to enable localization of error messages.
Listing 16. Customize error messages using annotations
<xs:complexType name="rangeType">
  <xs:attribute name="min" type="xs:int"/>
  <xs:attribute name="max" type="xs:int"/>
  <!-- "<" must be escaped as &lt; inside an attribute value -->
  <xs:assert test="@min &lt;= @max">
    <xs:annotation>
      <xs:appinfo>
        Value of the "min" attribute can not be greater than that of the "max"
        attribute.
      </xs:appinfo>
      <xs:documentation>
        When this assertion fails, the content of the above "appinfo" is used
        to produce the error message.
      </xs:documentation>
    </xs:annotation>
  </xs:assert>
</xs:complexType>
XML Schema 1.1 introduces a new mechanism called type alternatives that allows the schema author to specify type substitutions on an element declaration.
A look at conditional type assignment
In XML Schema 1.0, xsi:type was introduced as a mechanism
for type substitution. xsi:type is specified on an element in the instance document to replace the declared type with a derived one. This
mechanism works well if you design an XML vocabulary specifically for use
with XML Schema and you require that instances of your vocabulary use
xsi:type for type substitution. If, however, you write
an XML schema for a vocabulary which already has its own notion of type substitution,
then xsi:type will not work. Instances of this vocabulary
select types using some other mechanism. One such example is the Atom Syndication
Format, an XML language used for Web feeds.
Atom allows instances to specify a type attribute on elements containing text
constructs. If present, the value of this attribute must be one of text,
html or xhtml.
The content allowed is determined by the value of this attribute. Because
this attribute is not xsi:type, it is impossible to write a
schema which models Atom using the XML Schema 1.0 language. If the condition for
selecting the type is more complex, for example @height < @width (a comparison
between two attribute values), you cannot simply substitute it in the instance
with xsi:type.
To address the shortcomings of xsi:type, you can use the type alternative mechanism. This allows the schema author to specify type substitutions on an element declaration which are selected based on the evaluation of XPath expressions. In the next section we will show how this works
using Atom as an example.
In XML Schema 1.1, element declarations can have a type table which contains a
sequence of type alternative components and a default type definition (which is also
a type alternative). In an XML schema document these are specified as a sequence of
xs:alternative child elements of the element declaration.
The type alternative schema component contains a test property which is an XPath
expression property record, a type definition, and an annotations property.
The value of the test attribute on xs:alternative corresponds
to the test property, an XPath expression which evaluates to true or false. The
expressions allowed are limited to a subset of XPath 2.0, specifically those which
only select the attribute axis. This means that only attributes on the current element
are accessible by XPath evaluation. It is worth noting that the XDM data model which is
constructed for the evaluation does not include any type information. This was done to
avoid a situation where a schema processor would need to somehow guess the types of the
attributes in order to determine the type of the element. One cannot know the actual
types of the attributes until the element's type is determined.
The last xs:alternative child of the element declaration is
allowed to omit the test attribute. If present, this type alternative is the default
type definition. If no such xs:alternative is specified the
default type definition is the one which was declared for the element.
The value of the type attribute on xs:alternative corresponds
to the type definition property of the type alternative schema component. If the XPath
expression on the test attribute has evaluated to true, then the specified type is
selected as a substitution for the one declared on the element. The type specified must
either be derived from the declared type or be a special simple type definition called
xs:error (which has no valid instances). The
xs:error type can be used to cause elements to be invalid if
they satisfy the condition for the type alternative.
If an element declaration has type alternatives, they are evaluated in the order that
they were specified in the schema. The first type alternative whose XPath expression
evaluates to true is the type that is selected. If none of the XPath expressions
evaluate to true then the default type definition is selected as the type for the element.
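As a small sketch of these rules, the following declaration uses xs:error to reject a dimension element whose height exceeds its width. The type name dimensionType is hypothetical. Because the data model used for the test carries no type information, the attribute values are cast explicitly with xs:int(); without the casts, the two untyped values would be compared as strings.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical declared type for the element -->
  <xs:complexType name="dimensionType">
    <xs:attribute name="width" type="xs:int"/>
    <xs:attribute name="height" type="xs:int"/>
  </xs:complexType>

  <xs:element name="dimension" type="dimensionType">
    <!-- Selected only when height is greater than width; xs:error has no
         valid instances, so such an element is invalid -->
    <xs:alternative test="xs:int(@height) > xs:int(@width)" type="xs:error"/>
    <!-- No test-less alternative is given, so the default type definition
         is the declared type, dimensionType -->
  </xs:element>

</xs:schema>
```

With this sketch, <dimension width="10" height="5"/> falls through to dimensionType, while <dimension width="5" height="10"/> selects xs:error and is therefore invalid.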
Now that we have described how the type alternatives mechanism works, look
at an example (Listing 17) of how you can use it to write
a schema for Atom. As mentioned in the previous section, elements containing textual
constructs may have a type attribute which specifies the allowed content for the
element. The snippet below shows how to write a declaration for a title
element in Atom.
Listing 17. Type alternative xsd example
<xs:element name="title" type="xs:anyType">
  <xs:alternative test="@type='text'" type="xs:string"/>
  <xs:alternative test="@type='html'" type="htmlContentType"/>
  <xs:alternative test="@type='xhtml'" type="xhtmlContentType"/>
  <xs:alternative test="@type" type="xs:error"/>
  <xs:alternative type="xs:string"/>
</xs:element>
The element declaration for title has a declared type of xs:anyType
and specifies five type alternatives. The type alternatives are evaluated in order
until one of the XPath expressions evaluates to true (or, if none evaluate to true, the
default type definition is chosen). The first three type alternatives select a type
based on the value of the type attribute being text,
html, or xhtml. If the type
attribute has none of these values, the XPath expressions for the first three type
alternatives will evaluate to false. The fourth type alternative checks whether the
type attribute exists. If the schema processor has reached this point, the value of
type (if this attribute exists) is not one which Atom allows. We assign the type
for this alternative to be xs:error to signal that if this
condition is satisfied, the element is invalid. If none of the XPath expressions
evaluate to true, the default type definition (xs:string)
is selected.
The instances of the title element, Listing 18, illustrate
how the different type alternatives are selected.
Listing 18. Type alternative xml instance example
<!-- 1st type alternative selected: xs:string -->
<title type="text">My News</title>
<!-- 3rd type alternative selected: xhtmlContentType -->
<title type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">My <xhtml:em>News</xhtml:em>!</title>
<!-- default type alternative selected: xs:string -->
<title>My News</title>
<!-- 4th type alternative selected: xs:error. Invalid. -->
<title type="unknown">Oops! Error.</title>
In this article, we gave an overview of co-occurrence constraint support in XML Schema 1.1, highlighting the addition of assertions and type alternatives to further restrict the existence and values of elements and attributes. In Part 3 of the series, we will explore wildcard support and how it allows you to evolve your XML schema.
Learn
- XML Schema 1.1, Part 1: An introduction to XML Schema 1.1: An overview of the key improvements over XML Schema 1.0 and an in-depth look at datatypes (Neil Delima, Sandy Gao, Michael Glavassevich, Khaled Noaman; developerWorks; December 2008): Start your exploration with an overview of the key improvements over XML Schema 1.0 and an in-depth look at datatypes.
- XML Schema 1.1, Part 3: An introduction to XML Schema 1.1: Evolve your schema with powerful wildcard support (Neil Delima, Sandy Gao, Michael Glavassevich, Khaled Noaman; developerWorks; November 2009): Take an in-depth look at versioning features introduced by XML Schema 1.1, specifically the new powerful wildcard mechanisms and open content.
- XML 1.0 specification: Read about XML and how it enables generic SGML to be served, received, and processed on the Web.
- XML Schema Part 1: Structures Second Edition: Learn more about the W3C XML Schema language and how it describes the structure and constrains the contents of XML 1.0 documents, including those which exploit the XML Namespace facility. This specification depends on XML Schema Part 2: Datatypes.
- XML Schema Part 2: Datatypes Second Edition: Find information on the datatypes used in the W3C XML Schema language.
- W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures: Check out the latest specification of the W3C XML Schema language.
- W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes: Find more information on the new datatypes added to the W3C XML Schema language.
- XQuery 1.0: Learn more about XML Query language, which uses the structure of XML to express queries across all kinds of data.
- XML Path Language 2.0: Learn more about the XPath language.
- The Service Modeling Language (SML): Learn more about how to model complex systems and services.
- XSL Transformations (XSLT) Version 2.0: Review this specification that defines the syntax and semantics of the XSLT 2.0 language.
- XQuery 1.0 and XPath 2.0 Data Model (XDM): Read about this W3C specification which is the data model of XPath 2.0, XSLT 2.0, and XQuery languages.
- Atom Syndication Format: Find more about an XML-based Web content and metadata syndication format.
- Schematron: Check out this language for making assertions about the presence or absence of patterns in XML documents.
- RELAX NG: Explore a schema language for XML.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- The technology bookstore: Browse for books on these and other technical topics.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
Get products and technologies
- The XML Parser for Java (Xerces2-J): Try this parser distributed by Apache.
- IBM trial software for product evaluation: Build your next project with trial software available for download directly from developerWorks, including application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- XML zone discussion forums: Participate in any of several XML-related discussions.
- developerWorks XML zone: Share your thoughts: After you read this article, post your comments and thoughts in this forum. The XML zone editors moderate the forum and welcome your input.
- developerWorks blogs: Check out these blogs and get involved in the developerWorks community.
Neil Delima is a Staff Software Developer at the IBM Toronto Lab. As a member of the XML Parser Development team, he has worked on developing and testing XML technology for over seven years. He is a committer on Apache's Xerces-Java parser project and has contributed to the W3C DOM and XML 1.1 test suites.
Sandy (Shudi) Gao is a software developer at the IBM Toronto Software Lab. He has been a committer to the Apache Xerces XML Parser (Java) project since 2001 and was one of the key contributors to the XML Schema support therein. Sandy has been representing IBM in W3C XML Schema Working Group since 2003. He contributed significantly to XML Schema version 1.1 development and became an editor of the specification in 2006. Sandy is also representing IBM in W3C SML Working Group.
Michael Glavassevich is a member of the XML Parser Development team at the IBM Toronto Lab. He has been one of the main contributors to the Apache Xerces2 project for the last five years, working on, among other things, the implementation of XML Schema, XInclude, JAXP 1.3/1.4 and DOM Level 3. Michael also represented IBM in the JAXP Expert Group that developed JAXP 1.4.





