The W3C has started a Compound Document Formats (CDF) Working Group. The CDF Working Group grew out of a Web Applications and Compound Documents Workshop to explore issues around standardization for compound documents and specification of the behavior of some format combinations, addressing the need for an extensible and interoperable Web.
The CDF Working Group focuses on combinations of specific namespace vocabularies that will become CDF profiles, such as a rich media profile for mobile devices that might include XHTML and SVG Tiny. Other examples include combinations like XHTML and XForms, or XHTML and a subset of VoiceXML using the X+V profile.
A namespace uniquely identifies a set of names so there is no ambiguity when objects have different origins but the same names are mixed together. An XML namespace is a collection of element types and attribute names, which are uniquely identified by the name of the unique XML namespace of which they are a part. In an XML document, any element type or attribute name can thus have a two-part name that consists of the namespace name and the element or attribute name.
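The two-part name can be seen directly in a namespace-aware parser. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module; the sample document and the helper name split_qname are illustrative and not part of any CDF specification. The parser reports each element as {namespace-name}local-name, so two elements that share the local name title remain distinct:

```python
import xml.etree.ElementTree as ET

# Illustrative document: two vocabularies both define an element named "title".
DOC = """\
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:svg="http://www.w3.org/2000/svg">
  <xhtml:head><xhtml:title>Page title</xhtml:title></xhtml:head>
  <xhtml:body>
    <svg:svg><svg:title>Graphic title</svg:title></svg:svg>
  </xhtml:body>
</xhtml:html>
"""


def split_qname(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag  # element that is in no namespace


root = ET.fromstring(DOC)

# Collect every element whose local name is "title", in document order.
titles = [split_qname(el.tag) for el in root.iter()
          if split_qname(el.tag)[1] == "title"]

for namespace, local in titles:
    print(f"{local!r} belongs to {namespace}")
```

The prefixes (xhtml, svg) are only shorthand inside the document; the parser compares the namespace names themselves.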
A Compound Document by Inclusion (CDI) combines XML markup from several namespaces into a single physical document. A number of standards exist, and continue to be developed, that are descriptions of XML markup within a single namespace; XHTML, XForms, VoiceXML, and MathML are some prominent examples of such standards, each having its own namespace. Each of these specifications focuses on a single aspect of rich-content development. For example, XForms focuses on data collection and submission, VoiceXML on speech, and MathML on the display of mathematical notations.
To authors of content, each of these many standards is useful and important. However, it is the combination of elements from any number of these standards that lends true flexibility and power to rich document creation. A document may be created to be displayed within a Web browser, and to include an input form, a scalable graphic, and a bit of mathematical notation -- all on the same page. XHTML, XForms, SVG, and MathML, respectively, serve these needs, and therefore you can combine them into a single multi-namespace document.
Consider this simple example: a compound document combining XHTML and MathML. The namespace declarations in Listing 1 are marked with appended comments that match the numbered descriptions that follow:
Listing 1. A simple compound document
<?xml version="1.0" encoding="iso-8859-1"?>
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"><!-- 1 -->
  <xhtml:body>
    <xhtml:h1>A Compound document</xhtml:h1>
    <xhtml:p>A simple formula using MathML in XHTML.</xhtml:p>
    <mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML"><!-- 2 -->
      <mathml:mrow>
        <mathml:msqrt>
          <mathml:mn>49</mathml:mn>
        </mathml:msqrt>
        <mathml:mo>=</mathml:mo>
        <mathml:mn>7</mathml:mn>
      </mathml:mrow>
    </mathml:math>
  </xhtml:body>
</xhtml:html>
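1. XHTML namespace declaration: Each XHTML element in Listing 1 is qualified with the xhtml: prefix, which this declaration binds to the XHTML namespace.
2. MathML namespace declaration: Each MathML element in Listing 1 is qualified with the mathml: prefix, which this declaration binds to the MathML namespace.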
Figure 1 is a rendered version of the simple compound document in Listing 1 which combines XHTML and MathML for rich content.
Figure 1. Rendered simple compound document
Compound documents can be composed of a single document that contains multiple namespaces, as in Listing 1. This is a Compound Document by Inclusion (CDI). However, a compound document can also be composed of several documents, in which one document of a particular namespace references another, separate document of a different namespace. For example, a root or top-most document might contain XHTML content for defining and formatting a page. This parent XHTML document can reference another document, of another namespace, through the use of the XHTML <object> tag. You can repeat this for as many documents as necessary. The root document plus this collection of separate, referenced documents is a Compound Document by Reference (CDR). Figure 2 is a simple CDR document in which an XHTML root document contains a reference to a separate SVG child document that has markup for three colored circles.
Figure 2. Compound Document by Reference
And of course, a compound document can be a hybrid of both a CDI and a CDR.
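The distinction between CDI, CDR, and hybrid documents can be sketched as a simple check. This example assumes Python's standard xml.etree.ElementTree module and a deliberately rough heuristic: a document counts as a CDI if its elements use more than one namespace, and as a CDR if it contains an XHTML object element with a data attribute. The file name circles.svg and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Root document that references a separate SVG document (hypothetical file name).
CDR_DOC = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <object data="circles.svg" type="image/svg+xml"/>
  </body>
</html>
"""

# Single document that mixes XHTML and MathML markup.
CDI_DOC = """\
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <xhtml:body>
    <mathml:math><mathml:mn>7</mathml:mn></mathml:math>
  </xhtml:body>
</xhtml:html>
"""


def namespaces_used(root):
    """Return the set of namespace names used by elements in the tree."""
    return {el.tag[1:].split("}", 1)[0]
            for el in root.iter() if el.tag.startswith("{")}


def referenced_documents(root):
    """Return the data URIs of XHTML <object> elements."""
    return [el.get("data")
            for el in root.iter(f"{{{XHTML_NS}}}object") if el.get("data")]


def classify(xml_text):
    """Label a document 'CDI', 'CDR', 'hybrid', or 'single' (heuristic only)."""
    root = ET.fromstring(xml_text)
    by_inclusion = len(namespaces_used(root)) > 1
    by_reference = bool(referenced_documents(root))
    if by_inclusion and by_reference:
        return "hybrid"
    if by_inclusion:
        return "CDI"
    if by_reference:
        return "CDR"
    return "single"


print(classify(CDR_DOC), classify(CDI_DOC))
```

A real tool would also need to resolve each referenced document and inspect its root namespace; this sketch looks only at the root document.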
Model Driven Development (MDD) is an approach and set of techniques for developing better software faster. The Object Management Group (OMG) has labeled this notion of MDD as Model Driven Architecture (MDA), and has developed a set of standards to assist in MDD. The process begins with the definition of business logic early in the requirements phase of software development. This business logic might be modeled in the Unified Modeling Language (UML), based upon the abstraction of the business logic. One or more resulting models form the basis for generating code to produce an implementation.
Some reasons to use MDD are:
- Speeds up the development process
- Business logic is independent from the platform
- If business logic changes, the model is changed
- Expertise is applied to the business model, not the software
- Decreases the costs of software development
You can represent models in many forms, such as UML, XML Metadata Interchange (XMI), Essential Meta Object Facility (EMOF), and W3C XML Schema.
Eclipse is an open source tool integration platform, most often used as a Java development environment. As a tool integration platform, Eclipse has a varied and ever-growing set of editors and utilities, one of which is the Eclipse Modeling Framework (EMF).
EMF is a tools sub-project of the Eclipse Open Source Project. EMF is a modeling and data integration framework, as well as a code generation framework for building plug-ins for Eclipse. EMF uses ECore, a meta-language that describes models and provides runtime support for those models. ECore is based upon a subset of the OMG Meta Object Facility 2.0 (MOF) called Essential MOF (EMOF). EMF models are persisted as XML Metadata Interchange (XMI) documents. EMF provides viewing and command-based editing of the model as well as a basic editor for manipulating and serializing instance documents based on an EMF model. EMF models can be created from annotated Java code, XML documents, or UML models.
EMF serves as the backbone for MDD in Eclipse.
You can create CDRs and edit them with existing XML editors, since the references to other documents use generic reference mechanisms such as the <xhtml:object> tag. However, to offer a directed editing experience, editors for CDIs need to know more than how to validate instances of separate documents that reference one another. An editor that supports compound documents must have specific information about which tags from one namespace can be inserted as children of tags from another namespace. These cross-namespace relationships can be both bidirectional and recursive. A compound document profile defines which tags can be inserted under which other tags for a set of mixed namespaces. Several explicit compound document profiles exist today, such as XHTML/X+V (a subset of VoiceXML) and XHTML/MathML/SVG.
To provide a concrete example, consider an XHTML+XForms compound document profile that must define which XForms tags can exist as child tags for specific XHTML tags and vice versa. One requirement for this profile is that an xhtml:div element can have as a child an xforms:repeat element, which can have as a child another xhtml:div element, which can in turn have as a child an xforms:input element, as shown in Listing 2.
Listing 2. XHTML and XForms nested tags
<xhtml:div>
  <xforms:repeat model="model_PostalAddress"
                 id="repeat_AddressLine_model_PostallAddress"
                 nodeset="/hrxml:PostalAddress/hrxml:DeliveryAddress/hrxml:AddressLine">
    <xhtml:div>
      <xforms:input ref="." model="model_PostalAddress">
        <xforms:label>Address Line</xforms:label>
      </xforms:input>
    </xhtml:div>
  </xforms:repeat>
</xhtml:div>
This nesting of tags needs to be explicitly defined with mechanisms beyond schema wildcards such as xsd:any and xsd:anyAttribute. Validating and directed editors need the more explicit detail to unambiguously validate and guide document construction, and user agent implementers who write rendering code for browsers need it to build the processing and rendering engines.
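To illustrate what "explicitly defined" can mean, the following sketch expresses a fragment of an XHTML+XForms profile as a table of permitted cross-namespace parent/child pairs and checks a document against it. It assumes Python's standard xml.etree.ElementTree module. The PROFILE table is hypothetical: it covers only the nesting described above, not a published profile, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"

# Hypothetical profile fragment: for each parent element, the elements from
# another namespace that it may contain.
PROFILE = {
    f"{{{XHTML}}}div": {f"{{{XFORMS}}}repeat", f"{{{XFORMS}}}input"},
    f"{{{XFORMS}}}repeat": {f"{{{XHTML}}}div"},
}

VALID = f"""\
<xhtml:div xmlns:xhtml="{XHTML}" xmlns:xforms="{XFORMS}">
  <xforms:repeat nodeset="AddressLine">
    <xhtml:div>
      <xforms:input ref=".">
        <xforms:label>Address Line</xforms:label>
      </xforms:input>
    </xhtml:div>
  </xforms:repeat>
</xhtml:div>
"""

INVALID = f"""\
<xforms:input xmlns:xhtml="{XHTML}" xmlns:xforms="{XFORMS}" ref=".">
  <xhtml:div>misplaced</xhtml:div>
</xforms:input>
"""


def namespace_of(tag):
    """Return the namespace part of ElementTree's '{namespace}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def profile_violations(root):
    """List (parent, child) pairs that cross namespaces without profile support."""
    violations = []
    for parent in root.iter():
        for child in parent:
            crosses = namespace_of(parent.tag) != namespace_of(child.tag)
            if crosses and child.tag not in PROFILE.get(parent.tag, set()):
                violations.append((parent.tag, child.tag))
    return violations


print(profile_violations(ET.fromstring(VALID)))
print(profile_violations(ET.fromstring(INVALID)))
```

Same-namespace nesting (for example, xforms:label under xforms:input) is left to each vocabulary's own schema; the table records only the cross-namespace relationships that a wildcard cannot express precisely.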
When considering compound document creation and editing tooling, keep in mind that you need to accommodate two users: the compound document schema architect and the instance document creator.
The compound document schema architect wants to efficiently express the definition for how to combine specific namespace vocabularies using defined profiles. This is the person who builds the implementation of a compound document profile.
The instance document creator wants to leverage the profile, but has no interest in building or editing profiles. The instance document creator simply wants to create well-formed and valid instances of documents that adhere to a profile, preferably with a directed editor and a correct-by-construction experience. In this experience, the editor offers only the choices that are valid in the current context according to the profile.
EMF as an open modeling technology is a natural fit for defining compound document profiles. You can then use the EMF ECore models to create Eclipse-based editors for document creation and serialization.
The model-driven approach to compound document tooling begins with Platform Independent Models (PIMs) of each functional namespace (XHTML, XForms, SVG, and so on) that will be included in a profile. A PIM is a high-level abstraction that does not consider implementation specifics, but rather expresses only the intent of what is being modeled. PIMs can take many forms, such as W3C XML Schema, RELAX NG, Schematron, MOF, or UML models. Once the PIMs for all the profile schemas are created, they can be transformed to Platform Specific Models (PSMs), all of the same normative type. For example, the PSMs might all be XML Schema, UML models, or EMF ECore models. Next, the profile is realized by creating cross-model references between the models, representing the places where tags from one namespace may be referenced by, or inserted under, another. For example, a profile for XHTML+XForms would need to define that an <xforms:model> tag can be inserted under the <xhtml:head> tag. Figure 3 shows this PSM XHTML+XForms profile annotation as a UML aggregation relationship between the head class from the XHTML PIM model and the model class from the XForms PIM model.
Figure 3. PSM cross-model relationship in UML
You can transform the PSMs into EMF ECore models, which can be created from UML models or XML Schemas using EMF-provided tooling. In the example in Figure 3, the aggregation relationship becomes an EReference in the PSM ECore model. Creating these models and realizing the profiles as references across these models is the role of the compound document schema architect. These PSM models that realize the compound document profile are then used to drive a directed editor, which the instance document creator uses to create and edit instances that adhere to the profile. Figure 4 traces a profile for XHTML+XForms+XML Events from PIM to PSM to serialized instance documents.
Figure 4. Model-driven compound document editor profile creation
A model-driven approach is an efficient way to create functional PIMs of specific namespaces that can be used to create PSMs of combinations of namespaces to represent profiles. You can reuse PIM models many times in different combinations to form as many profiles as required. Using Eclipse EMF ECore models is an ideal way to get directed editing and serialization for the creation of an instance document in a Compound XML Document Editor.
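The idea of reusing one model in several profiles, with each profile adding only cross-model references, can be sketched with plain data classes. This is an illustrative Python sketch, not EMF or ECore code: the class names Model, CrossReference, and Profile are hypothetical, each model is reduced to a set of element names, and the svg:svg-under-xhtml:body reference is an assumed example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """One namespace vocabulary, reduced to the names of its element classes."""
    prefix: str
    classes: frozenset


@dataclass(frozen=True)
class CrossReference:
    """States that 'child' from one model may be inserted under 'parent'."""
    parent: str  # for example "xhtml:head"
    child: str   # for example "xforms:model"


@dataclass(frozen=True)
class Profile:
    """A combination of models plus the cross-model references that join them."""
    name: str
    models: tuple
    references: tuple

    def unresolved(self):
        """Return references whose ends name no class in the profile's models."""
        known = {f"{m.prefix}:{c}" for m in self.models for c in m.classes}
        return [r for r in self.references
                if r.parent not in known or r.child not in known]


# The same XHTML model is reused by two different profiles.
XHTML = Model("xhtml", frozenset({"head", "body", "div"}))
XFORMS = Model("xforms", frozenset({"model", "input", "repeat"}))
SVG = Model("svg", frozenset({"svg", "circle"}))

xhtml_xforms = Profile(
    "XHTML+XForms",
    (XHTML, XFORMS),
    (CrossReference("xhtml:head", "xforms:model"),
     CrossReference("xhtml:div", "xforms:repeat"),
     CrossReference("xforms:repeat", "xhtml:div")),
)
xhtml_svg = Profile(
    "XHTML+SVG",
    (XHTML, SVG),
    (CrossReference("xhtml:body", "svg:svg"),),  # assumed for illustration
)

for profile in (xhtml_xforms, xhtml_svg):
    print(profile.name, "unresolved references:", profile.unresolved())
```

Keeping the references outside the individual models is what allows the models themselves to stay unchanged from one profile to the next.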
The Compound XML Document Editor (available at IBM alphaWorks) is a dynamic editor framework that uses ECore models to drive model-based compound document construction.
You can add any type whose instances are serialized to XML to the Compound XML Document Editor framework without the need to write any Java code. The Compound XML Document Editor uses model repositories, in which ECore models are stored. Once you drop an ECore model into a Compound XML Document Editor model repository and start the Compound XML Document Editor, you can create or dynamically edit instance documents from these ECore models. You can create model repositories to accommodate as many models and compound document profiles as necessary.
You can swap out individual models, or you can switch out entire model repositories at runtime. Furthermore, you can make changes to ECore models on the fly that are immediately reflected in the editor and in serialized instance documents.
The Compound XML Document Editor comes with ECore models for XHTML, XForms, XML Events, SVG, SMIL, VoiceXML, XUL, MathML, and XLink. Figure 5 shows the available profile combinations in the default model repository with XHTML as the root document; it includes a profile that allows inclusion of elements and attributes from several other namespaces.
Figure 5. Default model repository
The Compound XML Document Editor uses the underlying EMF models to provide a directed editing experience by restricting the allowable right-click options for tag insertion. This is illustrated in Figure 6: The profile is honored by an EMF editor that interrogates the PSM model and allows only valid entries in accordance with that compound document profile. Element attributes are represented as properties in a property sheet.
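The directed editing behavior described above can be approximated in a few lines. This is a hypothetical Python sketch, not the editor's actual implementation: the ALLOWED_CHILDREN table stands in for the PSM model, and Node, insert_options, and insert_child are illustrative names. The editor offers only the children listed for the selected element and refuses anything else, so an invalid document cannot be constructed.

```python
from dataclasses import dataclass, field

# Hypothetical slice of an XHTML+XForms profile: parent element -> children
# that an editor may offer. A real editor would read this from its models.
ALLOWED_CHILDREN = {
    "xhtml:head": ["xhtml:title", "xforms:model"],
    "xhtml:div": ["xhtml:div", "xhtml:p", "xforms:input", "xforms:repeat"],
    "xforms:repeat": ["xhtml:div"],
    "xforms:input": ["xforms:label"],
}


@dataclass
class Node:
    """A minimal in-memory element: a name and an ordered list of children."""
    name: str
    children: list = field(default_factory=list)


def insert_options(parent_name):
    """Return the child element names an editor would offer under parent_name."""
    return sorted(ALLOWED_CHILDREN.get(parent_name, []))


def insert_child(parent, child_name):
    """Append a child node, refusing any name the profile does not allow."""
    if child_name not in insert_options(parent.name):
        raise ValueError(f"{child_name} is not allowed under {parent.name}")
    child = Node(child_name)
    parent.children.append(child)
    return child


head = Node("xhtml:head")
print(insert_options(head.name))      # the restricted menu for xhtml:head
insert_child(head, "xforms:model")    # allowed by the table above
```

Calling insert_child(head, "xforms:input") raises ValueError, which corresponds to the option never appearing in the right-click menu.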
Figure 6. Directed editing
Once you have created a document, you can render it directly from configurable right-click menu options for browsers that support the compound document profile used in the document (see Figure 7).
Figure 7. Rendering options
Figure 8 shows an insurance form for Automobile Loss Reporting based on ACORD schemas rendered in the X-Smiles browser.
Figure 8. X-Smiles rendered XForm
The Compound XML Document Editor is a standards-based, model-driven, compound document development framework that supports dynamic compound document creation and serialization. The Compound XML Document Editor utilizes Model Driven Development concepts with Eclipse EMF to help develop flexible compound documents and the profiles that define them.
Acknowledgements: Thanks to Simon Johnston and Steve Speicher.
- Learn more about Eclipse and the Eclipse Modeling Framework at eclipse.org.
- Stay current on the latest developments with the W3C's Compound Document Formats (CDF) Working Group, which grew out of a Web Applications and Compound Documents Workshop.
- The W3C is also home to many of the specifications mentioned in this article.
- Visit the Object Management Group (OMG) site, where you'll find more information on the OMG technologies mentioned in this article.
- Confused by all the XML standards out there? Uche Ogbuji's developerWorks article series on XML standards can help you sort through it all:
- Part 1 -- The core standards (January 2004)
- Part 2 -- XML processing standards (February 2004)
- Part 3 -- The most important vocabularies (February 2004)
- Part 4 -- Detailed cross-reference of the most important XML standards (March 2004)
- Read more about XML User Interface Language (XUL) on Mozilla.org.
- Visit the home page for the RELAX NG schema language.
- Schematron is a language for making assertions about patterns found in XML documents.
- Find hundreds more XML resources on the developerWorks XML zone.
- Learn how you can become an IBM Certified Developer in XML and related technologies.
Get products and technologies
- Download the Eclipse-based Compound XML Document Editor from IBM alphaWorks.
- Check out the Unified Modeling Language (UML) site for more information on this popular modeling tool. You can also find more UML-related resources at the developerWorks Rational area.
Kevin E. Kelly is a Senior Software Engineer with the IBM Corporation working on Software Standards. Kevin is a member of the W3C XForms Working Group as well as the W3C Compound Document Formats Working Group. His focus is on client technology and evolving open standards-based technologies for faster, more efficient standards adoption through XML-based and model-driven approaches. Before joining IBM, Kevin spent eight years at Rational Software working on UML modeling and Java technologies. Kevin holds a B.S. from Mercer University and an M.S. from the University of Montana.
Jan Joseph Kratky is the lead developer for the Compound XML Document Editor and XML Forms Generator. Currently a software engineer with IBM Emerging Software Standards in Research Triangle Park, North Carolina, he holds a B.A. from Cornell University and an M.S. from Rensselaer Polytechnic Institute. A Sun Certified Java Programmer and Sun Certified Web Component Developer, Jan has worked with Java technologies since 1997, and with Eclipse technologies since 2001.