Compound XML document profiles for rich content, Part 1: Exploring extensibility alternatives using XML Schema

Discover how to build compound XML Schema profiles from core specification schemas. In this article, you explore several extension capabilities of XML Schema and compare approaches for building Compound Document Format (CDF) profiles. In Part 2, you will define a pattern for developing mixed-namespace profiles using XML Schema based on this analysis.

Steve Speicher (sspeiche@us.ibm.com), Senior Software Engineer, IBM

Steve Speicher is a Senior Software Engineer with IBM working on Software Standards. Steve's current focus is leveraging tools and model-driven development to improve the process of creating standards. Steve has previously worked on software development tools in the Rational division and IBM internal tools. Steve holds a B.S. in Computer Science and Applied Mathematics, both from Kent State University. Contact Steve at sspeiche@us.ibm.com.



Kevin E. Kelly, Senior Software Engineer, IBM

Kevin E. Kelly is a Senior Software Engineer with IBM working on software standards. Kevin is a member of the W3C XForms Working Group and the Chair for the W3C Compound Document Format Working Group. His focus is on the client technology and evolving open standards-based technologies for faster, more efficient standards adoption through XML-based and model-driven approaches. Before joining IBM, Kevin spent eight years at Rational Software working on UML modeling and Java technologies. Kevin holds a B.S. from Mercer University and an M.S. from the University of Montana. Contact Kevin at kekelly@us.ibm.com.



13 September 2005

Overview

User demand for rich Web application content is continually increasing for both desktop and mobile device platforms. Open, standards-based functional XML schemas enabling rich content help ensure that such content -- and the skills required to produce it -- remains ubiquitous, accessible, and cost-effective. Schemas also help ensure that this technology does not become a proprietary format controlled by one vendor or a small number of vendors and constrained to specific programming frameworks or to specific renderer and browser technologies.

XML-based, declarative functional schemas like XHTML, XForms, XML Events, Scalable Vector Graphics (SVG), SMIL, VoiceXML, and XHTML Mobile Profile are examples of schemas that provide specific functionality for creating rich content.

Each functional schema pertains to a specific area of functionality. For example, SVG addresses graphics; XForms addresses form input collection and submission; XML Events addresses the creation of events and listeners; and so on. However, most rich Web applications require a combination of two or more of these functional schemas within a single document. Combining schemas can be problematic because not all schemas can be embedded within other schemas. And not all schemas allow other schemas to be embedded within themselves. In fact, most functional schemas assume that they are the root schema in a single document with only one functional namespace and that if the need arises for rich content from another functional namespace, a separate document can be referenced with its own root schema. For example, an XHTML document can reference an SVG graphic in a separate document at runtime to render the graphic.

Some schemas are written specifically to be embedded, such as XForms. Other schemas have been adapted for embedding through the use of newer schemas that forge a combination of existing functional schemas (such as XHTML and VoiceXML's X+V profile). The XForms specification includes guidance for enclosing schemas, but an actual combining driver schema does not exist. In the case of X+V, a separate driver schema was created by copying the VoiceXML schema and replacing specific elements with XHTML schema elements.

There isn't much clear guidance for user agent developers or content creators about which tags from XForms are allowed under XHTML tags or which XHTML tags are then allowed under XForms tags. While useful, the many XML Schema mechanisms alone are insufficient and too ambiguous to guide a user agent developer.

Figure 1. A sample rendered compound document using XHTML, MathML, and SVG
Sidewinder screenshot

Understand your XML Schema design options

In this section, we discuss XML Schema extensibility options so you can compare and contrast design alternatives. The intent is not to do an exhaustive analysis of XML Schema and its extensibility capabilities, but to focus on a few options that are more relevant to compound documents. The goal of schema design is to leverage existing schemas as much as possible and to provide external schema changes to build a model of the desired mixed-namespace document.

Working with third-party schemas: A simple example

The following example shows how to enable XHTML elements within XForms content sets and, conversely, how to enable XForms elements within XHTML content sets. It provides a scenario for exercising the XML Schema extensibility constructs against third-party schemas.

This article only looks at some of the elements that can be compounded, which makes it easier to understand the approach, proves that the approach works, and limits the overall size and complexity of this article. Specifically, we show you how to combine XHTML and XForms in the following ways:

  • Have xhtml:p contain the xforms:select element
  • Have xforms:select contain the xhtml:p element

The XML instance should look similar to Listing 1.

Listing 1. Sample Compound Document Format content
<xhtml:body>
   <xhtml:p>
      <xforms:select>
         <xhtml:p>Some XHTML content within the xforms:select element</xhtml:p>
         ...

When you investigate the schema approaches, you'll want to replace your existing XHTML Schema definition. You can do this by replacing your existing XHTML Schema or by redirecting a select few XHTML instance documents to the new schema with the XML Schema instance attribute xsi:schemaLocation, which pairs the XHTML namespace with xhtml+xforms.xsd in the root element declaration. (See Listing 2.)

Listing 2. Sample XML Schema instance declaration
<xhtml:html xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/1999/xhtml xhtml+xforms.xsd">
...
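Putting Listings 1 and 2 together, a complete instance document might look like the following sketch. The form control content (the ref value, the labels, and the item) is illustrative rather than taken from the article's samples, and the XForms model that a working form would need is omitted for brevity. The xhtml:p child is placed last inside xforms:select on the assumption that the profile appends it to the existing select content model, as the extension-based approaches in this article do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:xforms="http://www.w3.org/2002/xforms"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.w3.org/1999/xhtml xhtml+xforms.xsd">
   <xhtml:head>
      <xhtml:title>Compound document sketch</xhtml:title>
   </xhtml:head>
   <xhtml:body>
      <xhtml:p>
         <!-- XForms control nested inside an XHTML paragraph -->
         <xforms:select ref="toppings">
            <xforms:label>Toppings</xforms:label>
            <xforms:item>
               <xforms:label>Cheese</xforms:label>
               <xforms:value>cheese</xforms:value>
            </xforms:item>
            <!-- XHTML paragraph nested inside the XForms control -->
            <xhtml:p>Some XHTML content within the xforms:select element</xhtml:p>
         </xforms:select>
      </xhtml:p>
   </xhtml:body>
</xhtml:html>
```

Whether this document validates depends on the profile schema that xhtml+xforms.xsd resolves to, which is what the remaining sections explore.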

Redefining content models

Code listings

To obtain complete versions of the code fragments in this article, see the Download section.

This approach utilizes the XML Schema redefine construct. Redefine works much like include, but, as its name indicates, it allows you to redefine simple types, complex types, element groups, and attribute groups. This is an extremely powerful mechanism in XML Schema because it enables a clean separation between disparate core components and extensions.

To better understand this approach, take a look at the process of enabling an XHTML p element within an XForms select element. First, you need to import the XHTML schema to bring it into the current schema. Next, use a redefine to include the XForms schema. When you include the XForms schema, you extend the definition of the complex type that defines the content model for the select element -- specifically, the type selectType, shown in Listing 3.

Listing 3. Code fragment for XForms with XHTML profile (xforms+xhtml.xsd)
<redefine schemaLocation="../core/XForms-Schema.xsd">
   <complexType name="selectType">
      <complexContent>
         <extension base="xforms:selectType">
            <choice minOccurs="0" maxOccurs="unbounded">
               <element ref="xhtml:p" />
            </choice>
         </extension>
      </complexContent>
   </complexType>
</redefine>
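For context, the fragment in Listing 3 could sit inside a schema document along the following lines. This is a sketch, not the file from the download: the prefix bindings and the use of XML Schema as the default namespace are assumptions inferred from the unprefixed tags in the listing, and the import mirrors the one shown later in Listing 5.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xforms="http://www.w3.org/2002/xforms"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        targetNamespace="http://www.w3.org/2002/xforms"
        elementFormDefault="qualified">

   <!-- Import the XHTML schema so that xhtml:p can be referenced below -->
   <import namespace="http://www.w3.org/1999/xhtml"
           schemaLocation="../core/xhtml1-strict.xsd"/>

   <!-- Redefine works here because the target namespace of this schema
        matches that of the XForms schema being redefined -->
   <redefine schemaLocation="../core/XForms-Schema.xsd">
      <complexType name="selectType">
         <complexContent>
            <extension base="xforms:selectType">
               <choice minOccurs="0" maxOccurs="unbounded">
                  <element ref="xhtml:p" />
               </choice>
            </extension>
         </complexContent>
      </complexType>
   </redefine>

</schema>
```

Note that the redefined type extends itself (base="xforms:selectType"); inside a redefine, that self-reference refers to the original definition.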

Now you need to redefine the XHTML p element content model to enable the XForms select element. To do this, import the preceding schema to get the necessary redefines. This step is required because redefine only lets you redefine components from a schema with the same target namespace as the redefining schema. You can get around this restriction by redefining XForms content in a new schema and then importing that schema into the new driver schema for your target root namespace.

Next, you need to redefine the content model for the XHTML p element -- that is, the complex type pType. Extend this type definition to include a reference to the XForms global element definition for select.

Listing 4. Code fragment for XHTML with XForms profile (xhtml+xforms.xsd)
<import schemaLocation="xforms+xhtml.xsd"
   namespace="http://www.w3.org/2002/xforms"/>

<redefine schemaLocation="../core/xhtml1-strict.xsd">
   <complexType name="pType" mixed="true">
      <complexContent>
         <extension base="xhtml:pType">
            <sequence minOccurs="0" maxOccurs="unbounded">
               <element ref="xforms:select" />
            </sequence>
         </extension>
      </complexContent>
   </complexType>
</redefine>

To accomplish this, you need to modify the XHTML and XForms schemas. For example, the XHTML p element declaration has an anonymous type declaration that can't be redefined. It is extracted as a separate, accessible complex type named pType. This same thing is done for the XForms select element, creating its type definition of selectType.
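The change to the core XHTML schema can be sketched as a before-and-after pair. The "before" form is an assumption about how the original xhtml1-strict.xsd declares p; the "after" form follows the pType definition that appears in Listings 11 and 12. The two fragments are alternatives and would not appear in the same schema.

```xml
<!-- Before: the content model is an anonymous type, so redefine has no name to target -->
<xs:element name="p">
   <xs:complexType mixed="true">
      <xs:complexContent>
         <xs:extension base="Inline">
            <xs:attributeGroup ref="attrs" />
         </xs:extension>
      </xs:complexContent>
   </xs:complexType>
</xs:element>

<!-- After: the same content model as a named, global type that redefine can extend -->
<xs:element name="p" type="pType" />

<xs:complexType name="pType" mixed="true">
   <xs:complexContent>
      <xs:extension base="Inline">
         <xs:attributeGroup ref="attrs" />
      </xs:extension>
   </xs:complexContent>
</xs:complexType>
```

The two forms accept the same instance documents; only the second exposes a name for extension schemas to build on.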

Redefines provide a powerful mechanism for defining extensions to existing schemas, but they also illustrate the importance of developing core schemas with extensibility in mind.
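Redefine is not limited to complex types. As a hedged sketch of redefining a named group, the fragment below widens the XForms UI.Common group (the group name is taken from the reference in Listing 7; its original content is not shown in this article and is left untouched). This is an illustration of the mechanism, not part of the article's profile, and it would allow xhtml:p wherever that group is used rather than only within select.

```xml
<redefine schemaLocation="../core/XForms-Schema.xsd">
   <group name="UI.Common">
      <choice>
         <!-- Exactly one self-reference pulls in the original group content -->
         <group ref="xforms:UI.Common" />
         <!-- Additional alternative contributed by the profile -->
         <element ref="xhtml:p" />
      </choice>
   </group>
</redefine>
```

This only works because the group is a named, global component in the core schema, which is the same extensibility requirement that applies to types.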

Allowing any content

One particular feature of XML Schema, often referred to as wildcarding, makes use of the schema elements xsd:any and xsd:anyAttribute. Wildcarding allows you to open a complex type's content model or attribute set to extensions based on an associated namespace. This feature is quite useful for limiting extensibility to a namespace, but it can't express finer-grained rules such as allowing only xforms:select within xhtml:p. Instead, a wildcard for the XForms namespace allows all elements from that namespace to be valid within the content model for the element p.

Take the example of enabling the XHTML p element within the XForms select element content set. You reuse the redefine mechanism to avoid modifying the existing core schemas. You redefine the complex type selectType to add the any element with the namespace attribute for XHTML, indicating that strict validation should be performed. (See Listing 5.)

Listing 5. Code fragment for XForms with XHTML profile (xforms+xhtml.xsd)
<import schemaLocation="../core/xhtml1-strict.xsd" namespace="http://www.w3.org/1999/xhtml"/>

<redefine schemaLocation="../core/XForms-Schema.xsd">
   <complexType name="selectType">
      <complexContent>
         <extension base="xforms:selectType">
            <sequence minOccurs="0" maxOccurs="1">
               <choice>
                  <any namespace="http://www.w3.org/1999/xhtml" processContents="strict"/>
               </choice>
            </sequence>
         </extension>
      </complexContent>
   </complexType>
</redefine>

Next, enable the XForms select element within the XHTML p element's content set. Similarly, you redefine the existing complex type definition pType to add the any element, which supplies the namespace restriction to the XForms namespace. You also indicate to the validator that the content should follow the strict processing model, as shown in Listing 6.

Listing 6. Code fragment for XHTML with XForms profile (xhtml+xforms.xsd)
<import schemaLocation="xforms+xhtml.xsd"
   namespace="http://www.w3.org/2002/xforms"/>

<redefine schemaLocation="../core/xhtml1-strict.xsd">
   <complexType name="pType" mixed="true">
      <complexContent>
         <extension base="xhtml:pType">
            <sequence minOccurs="0" maxOccurs="unbounded">
               <any namespace="http://www.w3.org/2002/xforms" processContents="strict"/>
            </sequence>
         </extension>
      </complexContent>
   </complexType>
</redefine>

As mentioned earlier, this step enables the sample instance document to successfully validate. However, it also enables many other instance documents to validate when they would not otherwise. The any element is probably best utilized by the original core schemas to indicate where the original authors intended to have their schema extended. The any element also enables certain scenarios where other unknown content models are valid, and their validation can be handled by the schema associated with that element's namespace content.
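To make the trade-off concrete, here is a sketch of an instance fragment that the wildcard in Listing 6 would be expected to accept even though it contains no xforms:select. It assumes that xforms:input is declared globally in the XForms schema (which strict processing requires); the ref value and label text are illustrative.

```xml
<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml"
         xmlns:xforms="http://www.w3.org/2002/xforms">
   Enter your name:
   <!-- Any element in the XForms namespace matches the wildcard, not just select -->
   <xforms:input ref="name">
      <xforms:label>Name</xforms:label>
   </xforms:input>
</xhtml:p>
```

With processContents="strict", the validator still checks xforms:input against its own declaration, so the content must be valid XForms. Setting processContents to lax or skip would loosen even that check.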

Including different modules

This approach is much like XHTML modularization, where a top-level driver schema includes various smaller schema modules to compose the entire schema. The smaller schema modules can either be excluded, or new ones can be included or modified to produce a new schema. You can apply a variety of techniques when doing this. In one instance, you may want to have each of the smaller modules validate standalone, which requires each module to include its dependencies and could limit the overall flexibility of the modularization approach. An alternate approach is to have a top-level driver schema that includes all the prerequisite and core modules. You should also consider whether the modules need target namespaces, which affects whether including is even possible since xsd:include only works to include a schema that has either the same target namespace as the owning schema or no target namespace. When modules have no target namespace declaration, this is often referred to as a chameleon schema design pattern.
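The chameleon pattern can be sketched with two small schema documents. The file names, the example.org namespace, and the selectionType name are hypothetical and used only for illustration.

```xml
<!-- File common-types.xsd: a module with no targetNamespace (chameleon) -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="selectionType">
      <xsd:restriction base="xsd:string">
         <xsd:enumeration value="open" />
         <xsd:enumeration value="closed" />
      </xsd:restriction>
   </xsd:simpleType>
</xsd:schema>

<!-- File driver.xsd: includes the module, which takes on the driver's namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:ex="http://example.org/profile"
            targetNamespace="http://example.org/profile">
   <xsd:include schemaLocation="common-types.xsd" />
   <!-- The included type is now addressed in the driver's target namespace -->
   <xsd:attribute name="selection" type="ex:selectionType" />
</xsd:schema>
```

The same module could be included by a second driver with a different target namespace, which is where the pattern gets its name.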

Take a look at a sample include hierarchy that defines the XForms schemas. A separate module defines the XForms select element and its content set, shown in Figure 2.

Figure 2. XForms separate module

Listing 7 is the schema that simply defines the select element and its associated content set.

Listing 7. XForms_select.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xforms="http://www.w3.org/2002/xforms"
   targetNamespace="http://www.w3.org/2002/xforms"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
   elementFormDefault="qualified">

   <xsd:include schemaLocation="XForms-Schema.xsd" />

   <xsd:element name="select" type="xforms:selectType" />

   <xsd:complexType name="selectType">
      <xsd:sequence>
         <xsd:element ref="xforms:label" />
         <xsd:group ref="xforms:List.UI.Common" maxOccurs="unbounded" />
         <xsd:group ref="xforms:UI.Common" minOccurs="0" maxOccurs="unbounded" />
      </xsd:sequence>
      <xsd:attributeGroup ref="xforms:Common.Attributes" />
      <xsd:attributeGroup ref="xforms:Single.Node.Binding.Attributes" />
      <xsd:attributeGroup ref="xforms:UI.Common.Attrs" />
      <xsd:attribute name="selection" use="optional" default="closed">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:enumeration value="open" />
               <xsd:enumeration value="closed" />
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:attribute>
      <xsd:attribute name="incremental" type="xsd:boolean" use="optional" default="true" />
   </xsd:complexType>

</xsd:schema>

This element declaration and complex type definition have been extracted from the single XForms-Schema.xsd file. Now you can simply include this schema in the top-level XForms schema, as shown in Listing 8.

Listing 8. Snippet from XForms-Schema.xsd
<xsd:include schemaLocation="XForms_select.xsd" >
   <xsd:annotation>
      <xsd:documentation>
         Include the appropriate module to get the content model for the 'select' element.
      </xsd:documentation>
   </xsd:annotation>
</xsd:include>

To enable the XHTML p element within the XForms select content set, you need to create a derivative of XForms_select.xsd and update XForms-Schema.xsd to include it instead. The new module for the XForms select element will be called XForms_select_with_xhtml.xsd and contains the code shown in Listing 9.

Listing 9. Code fragment from XForms_select_with_xhtml.xsd
<xsd:complexType name="selectType">
   <xsd:sequence>
      <xsd:element ref="xforms:label" />
      <xsd:group ref="xforms:List.UI.Common" maxOccurs="unbounded" />
      <xsd:group ref="xforms:UI.Common" minOccurs="0" maxOccurs="unbounded" />
      <xsd:element ref="xhtml:p" minOccurs="0"/>
   </xsd:sequence>
   ...removed...
</xsd:complexType>

As you can see, the XHTML p element reference is added to the complex type definition. All you need to do is modify the XForms-Schema.xsd file to include this new module, as shown in Listing 10.

Listing 10. Partial XForms-Schema.xsd (with new module)
<xsd:include schemaLocation="XForms_select_with_xhtml.xsd" >
   <xsd:annotation>
      <xsd:documentation>
         Include the appropriate module to get the content model for the 'select' element
         and appropriate XHTML extensions.
      </xsd:documentation>
   </xsd:annotation>
</xsd:include>

Now the include tree looks as shown in Figure 3.

Figure 3. XForms include tree with XHTML
XForms include tree

For the XHTML schema, you need to construct a similar module that defines the p element and its corresponding pType complex type definition, as shown in Listing 11.

Listing 11. Partial xhtml1-strict_p.xsd
<xs:include schemaLocation="xhtml1-strict.xsd" />

<xs:element name="p" type="pType"/>

<xs:complexType name="pType" mixed="true">
   ...removed...
</xs:complexType>

All you need to do is create a modified version of this module -- for example, xhtml1-strict_p_with_xforms.xsd. Now, add the XForms select element within the definition of the complex type pType, as shown in Listing 12.

Listing 12. Partial xhtml1-strict_p_with_xforms.xsd
<xs:complexType name="pType" mixed="true">
   <xs:complexContent>
      <xs:extension base="Inline">
         <xs:choice minOccurs="0">
            <xs:element ref="xforms:select" />
         </xs:choice>
         <xs:attributeGroup ref="attrs" />
      </xs:extension>
   </xs:complexContent>
</xs:complexType>
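For the reference to xforms:select in Listing 12 to resolve, the module also has to import the XForms namespace. A possible shape for the enclosing module is sketched below; the location of the XForms schema and the prefix bindings are assumptions, and the default namespace is set to XHTML so that unprefixed names such as pType, Inline, and attrs resolve as they do in the listings.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.w3.org/1999/xhtml"
           xmlns:xforms="http://www.w3.org/2002/xforms"
           targetNamespace="http://www.w3.org/1999/xhtml"
           elementFormDefault="qualified">

   <!-- Assumed location of the XForms schema (with its own XHTML-enabled module) -->
   <xs:import namespace="http://www.w3.org/2002/xforms"
              schemaLocation="XForms-Schema.xsd" />

   <!-- Same-namespace include of the remaining XHTML declarations, as in Listing 11 -->
   <xs:include schemaLocation="xhtml1-strict.xsd" />

   <xs:element name="p" type="pType" />

   <!-- The pType definition from Listing 12 goes here -->

</xs:schema>
```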

Finally, you can either use the xhtml1-strict.xsd file directly or create a simple xhtml+xforms.xsd driver schema, as shown in previous examples.

Listing 13. Partial xhtml+xforms.xsd
<include schemaLocation="xhtml1-strict.xsd"/>

Figure 4 shows the new compound modules you created.

Figure 4. XHTML and XForms with new compound modules

This approach is achievable, but requires careful coordination between all the appropriate schema modules to ensure that the most appropriate combination is used. There are better ways to define extensions within schema -- such as redefines -- which make this approach seem a bit burdensome.

Utilizing substitution groups

Substitution groups are a mechanism for swapping elements of compatible content models with each other. First, introduce a new XForms element select-xhtml, whose content set allows for the XHTML element p. Do this by importing the XHTML schema, including the XForms schema, and making your target namespace that of the included XForms schema. You can then define the XForms select-xhtml element appropriately, as shown in Listing 14.

Listing 14. Partial XForms with XHTML profile (xforms+xhtml.xsd)
<import schemaLocation="../core/xhtml1-strict.xsd"
  namespace="http://www.w3.org/1999/xhtml"/>

<include schemaLocation="../core/XForms-Schema.xsd" />

<element name="select-xhtml" substitutionGroup="xforms:select">
  <complexType>
    <complexContent>
      <extension base="xforms:selectType">
        <sequence minOccurs="0" maxOccurs="unbounded">
          <element ref="xhtml:p" />
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</element>

Next, introduce a new XHTML element, p-xforms, which can contain the XForms select element. You do this in a similar manner as before: Create a new schema to contain the extensions for XHTML, set its target namespace to be the same as the included XHTML schema, and import the XForms extension schema (xforms+xhtml.xsd). You can then define the XHTML element p-xforms with a content set that allows for the XForms select element, as shown in Listing 15.

Listing 15. Partial XHTML with XForms profile (xhtml+xforms.xsd)
<element name="p-xforms" substitutionGroup="xhtml:p">
  <complexType mixed="true">
    <complexContent>
      <extension base="xhtml:pType">
        <sequence minOccurs="0" maxOccurs="1">
          <element ref="xforms:select" />
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</element>
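With Listings 14 and 15 in place, an instance document would use the new element names wherever compound content is needed. The following fragment is a sketch; the control content (ref value, labels, and item) is illustrative.

```xml
<xhtml:body xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:xforms="http://www.w3.org/2002/xforms">
   <!-- p-xforms stands in for xhtml:p wherever p is allowed -->
   <xhtml:p-xforms>
      <!-- select-xhtml stands in for xforms:select -->
      <xforms:select-xhtml ref="toppings">
         <xforms:label>Toppings</xforms:label>
         <xforms:item>
            <xforms:label>Cheese</xforms:label>
            <xforms:value>cheese</xforms:value>
         </xforms:item>
         <xhtml:p>Some XHTML content within the select-xhtml element</xhtml:p>
      </xforms:select-xhtml>
   </xhtml:p-xforms>
</xhtml:body>
```

Compare this with Listing 1: the structure is the same, but the element names are no longer the ones defined by the XHTML and XForms specifications.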

However, substitution has its drawbacks. It creates new elements solely to enable compounding of documents. It can also be blocked or controlled by the core schema that defines the elements using the XML Schema attributes block and final. Because of these potential limitations, other approaches are more desirable for the design goals in this article. Substitution may have its merits in other domains.
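As a sketch of how a core schema could exert that control, consider two alternative declarations for the head element. These are hypothetical variations for illustration; nothing in this article states that the XForms or XHTML schemas actually declare their elements this way.

```xml
<!-- Alternative 1: final="extension" rejects, at schema level, substitution group
     members whose type is derived by extension, such as select-xhtml in Listing 14 -->
<xsd:element name="select" type="xforms:selectType" final="extension" />

<!-- Alternative 2: block="substitution" leaves member declarations legal but
     prevents them from replacing select in instance documents -->
<xsd:element name="select" type="xforms:selectType" block="substitution" />
```

Either setting in a third-party schema would defeat the substitution approach without any change on the profile author's side.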

Using the hybrid approach

In the previous sections, you saw a number of extensibility patterns that use XML Schema. While it might be desirable to use only one of these approaches, you can also opt to use some, if not all, of these in conjunction with one another. For example, you could combine the use of modularized schemas and redefines to produce the desired results.

Using other notable approaches

In this article, you learned about the most relevant approaches for the example situation. Other approaches can be as simple as modifying existing schemas, although this violates the design goal of reusing core schemas that are not modified from their original sources.

While this article has investigated these approaches in the context of compound XML documents for rich content, you may still encounter situations where certain XML document formats and combinations require unique consideration.


Summary

This article explored several extension capabilities of XML Schema and concludes that, for purposes of building Compound Document Format profiles, the pattern of building modularized schemas and using redefines is the preferred approach, although other techniques may be more appropriate depending on the situation. The use of redefines imposes certain guidelines on the core schemas: because redefine can only target named components, core schemas should expose global type definitions, global element and attribute declarations, and named group definitions. Following these guidelines is often useful in schema design in its own right.


Download

Description       Name                         Size
schema samples    x-cxdp1schema_samples.zip    52 KB
