
Model-driven compound document development

Build compound XML documents in Eclipse

Kevin E. Kelly (kekelly@us.ibm.com), Senior Technical Staff Member, IBM Japan, Software Group
Kevin E. Kelly is a Senior Software Engineer with the IBM Corporation working on Software Standards. Kevin is a member of the W3C XForms Working Group as well as the W3C Compound Document Formats Working Group. His focus is on the client technology and evolving open standards-based technologies for faster, more efficient standards adoption through XML-based and model-driven approaches. Before joining IBM, Kevin spent eight years at Rational Software working on UML modeling and Java technologies. Kevin holds a B.S. from Mercer University, and an M.S. from the University of Montana.
Jan J. Kratky, Advisory Software Engineer, IBM Japan, Software Group
Jan Joseph Kratky is the lead developer for the Compound XML Document Editor and XML Forms Generator. Currently a software engineer with IBM Emerging Software Standards in Research Triangle Park, North Carolina, he holds a B.A. from Cornell University and an M.S. from Rensselaer Polytechnic Institute. A Sun Certified Java Programmer and Sun Certified Web Component Developer, Jan has worked with Java technologies since 1997, and with Eclipse technologies since 2001.
Keith Wells, Advisory Software Engineer, IBM Japan, Software Group
Keith Wells is a software developer at the IBM RTP campus. Keith has been involved with Emerging Technologies and the Emerging Technologies Toolkit for several years.

Summary:  Build flexible tools for the creation of mixed-namespace documents with an open standards-based approach that uses the Eclipse Modeling Framework and underlying ECore models to represent functional schemas and the connections between them. Using these models, you can provide a dynamic environment for automated serialization of instance documents that adhere to the combined functional schema definitions, while providing a directed editing experience.

Date:  22 Jul 2005
Level:  Advanced

Compound documents

The W3C has started a Compound Document Formats (CDF) Working Group. The CDF Working Group grew out of a Web Applications and Compound Documents Workshop to explore issues around standardization for compound documents and specification of the behavior of some format combinations, addressing the need for an extensible and interoperable Web.

The CDF Working Group focuses on combinations of specific namespace vocabularies that will become CDF profiles, such as a rich media profile for mobile devices that might include XHTML and SVG Tiny. Other examples include combinations like XHTML and XForms, or XHTML and a subset of VoiceXML using the X+V profile.

Compound documents defined

Find out more

To learn more about any of the technologies listed in this article, see the relevant links in Resources.

A namespace uniquely identifies a set of names so that no ambiguity arises when objects with different origins but identical names are mixed together. An XML namespace is a collection of element types and attribute names, each uniquely identified by the name of the namespace to which it belongs. In an XML document, any element type or attribute name can thus have a two-part name that consists of the namespace name and the element or attribute name.
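The two-part name can be made visible with a few lines of Python. The following is a minimal sketch that uses only the standard library's xml.etree.ElementTree parser, which reports each element as '{namespace}local'. The sample document and the split_name helper are illustrative assumptions, not part of any tooling described in this article.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define an element named "title".
DOC = """\
<doc xmlns:xhtml="http://www.w3.org/1999/xhtml"
     xmlns:svg="http://www.w3.org/2000/svg">
  <xhtml:title>Page title</xhtml:title>
  <svg:title>Drawing title</svg:title>
</doc>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag  # the element is in no namespace


root = ET.fromstring(DOC)
names = [split_name(child.tag) for child in root]
for namespace, local in names:
    print(f"{local!r} in namespace {namespace!r}")
```

Both children share the local name title, yet the namespace part keeps them distinct, which is what allows vocabularies to be mixed in one document.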

A Compound Document by Inclusion (CDI) combines XML markup from several namespaces into a single physical document. A number of standards exist, and continue to be developed, that are descriptions of XML markup within a single namespace; XHTML, XForms, VoiceXML, and MathML are some prominent examples of such standards, each having its own namespace. Each of these specifications focuses on a single aspect of rich-content development. For example, XForms focuses on data collection and submission, VoiceXML on speech, and MathML on the display of mathematical notations.

To authors of content, each of these many standards is useful and important. However, it is the combination of elements from any number of these standards that lends true flexibility and power to rich document creation. A document may be created to be displayed within a Web browser, and to include an input form, a scalable graphic, and a bit of mathematical notation -- all on the same page. XHTML, XForms, SVG, and MathML, respectively, serve these needs, and therefore you can combine them into a single multi-namespace document.

Consider this simple example: a compound document combining XHTML and MathML. The namespace declarations in Listing 1 are marked with appended comments that match the numbered descriptions that follow:


Listing 1. A simple compound document

<?xml version="1.0" encoding="iso-8859-1"?>
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"><!-- 1 -->
  <xhtml:body>
    <xhtml:h1>A Compound document</xhtml:h1>
    <xhtml:p>A simple formula using MathML in XHTML.</xhtml:p>
    <mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML"><!-- 2 -->
      <mathml:mrow>
        <mathml:msqrt>
          <mathml:mn>49</mathml:mn>
        </mathml:msqrt>
        <mathml:mo>=</mathml:mo>
        <mathml:mn>7</mathml:mn>
      </mathml:mrow>
    </mathml:math>
  </xhtml:body>
</xhtml:html>

  1. XHTML Namespace declaration: Each XHTML element in Listing 1 is qualified with the xhtml: namespace prefix.
  2. MathML Namespace declaration: Each MathML element in Listing 1 is qualified with the mathml: prefix.

Figure 1 shows a rendered version of the simple compound document in Listing 1, which combines XHTML and MathML for rich content.


Figure 1. Rendered simple compound document

Compound documents can be composed of a single document that contains multiple namespaces, as in Listing 1. This is a Compound Document by Inclusion (CDI). However, a compound document can also be composed over several documents in which one document of a particular namespace references another separate document of a different namespace. For example, a root or top-most document might contain XHTML content for defining and formatting a page. This parent XHTML document can reference another document, of another namespace, through the use of the XHTML <object> tag. You can repeat this for as many documents as necessary. The root document plus this collection of separate, referenced documents is a Compound Document by Reference (CDR). Figure 2 is a simple CDR document in which an XHTML root document contains a reference to a separate SVG child document that has markup for three colored circles.


Figure 2. Compound Document by Reference

And of course, a compound document can be a hybrid of both a CDI and a CDR.
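The distinction between CDI, CDR, and hybrid documents can be sketched in Python. This is a rough heuristic under stated assumptions: a document counts as a CDI if any element lies outside the root's namespace, and as a CDR if an XHTML object element carries a data attribute. The sample document, the file name circles.svg, and the classify helper are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical hybrid document: inline MathML (inclusion) plus a reference
# to a separate SVG file (reference). The file name is made up.
HYBRID = f"""\
<xhtml:html xmlns:xhtml="{XHTML_NS}"
            xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <xhtml:body>
    <mathml:math><mathml:mn>7</mathml:mn></mathml:math>
    <xhtml:object data="circles.svg" type="image/svg+xml"/>
  </xhtml:body>
</xhtml:html>"""


def namespace_of(tag):
    """Return the namespace part of an ElementTree '{namespace}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def classify(xml_text):
    root = ET.fromstring(xml_text)
    root_ns = namespace_of(root.tag)
    # Inclusion: some element belongs to a namespace other than the root's.
    inclusion = any(namespace_of(el.tag) != root_ns for el in root.iter())
    # Reference: some XHTML object element points at a separate document.
    reference = any(el.get("data") for el in root.iter(f"{{{XHTML_NS}}}object"))
    if inclusion and reference:
        return "hybrid"
    if inclusion:
        return "CDI"
    if reference:
        return "CDR"
    return "single-namespace"


print(classify(HYBRID))
```

The check does not follow the reference to confirm that the child document really uses another namespace; a fuller implementation would have to load and inspect it.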


Model Driven Development

Abbreviations

CDI – Compound Document by Inclusion
CDR – Compound Document by Reference
DTD – Document Type Definition
EMF – Eclipse Modeling Framework
MathML – Mathematical Markup Language
MDA – Model Driven Architecture
MDD – Model Driven Development
MOF – Meta-Object Facility
OMG – Object Management Group
PIM – Platform Independent Model
PSM – Platform Specific Model
SMIL – Synchronized Multimedia Integration Language
SVG – Scalable Vector Graphics
UML – Unified Modeling Language
VoiceXML – Voice eXtensible Markup Language
W3C – World Wide Web Consortium
X+V – XHTML + Voice profile
XHTML – eXtensible HyperText Markup Language
XMI – XML Metadata Interchange
XML – eXtensible Markup Language
XUL – XML User Interface Language

Model Driven Development (MDD) is an approach and set of techniques for developing better software faster. The Object Management Group (OMG) has labeled this notion of MDD as Model Driven Architecture (MDA), and has developed a set of standards to assist in MDD. The process begins with the definition of business logic early in the requirements phase of software development. This business logic might be modeled in the Unified Modeling Language (UML), based upon the abstraction of the business logic. One or more resulting models form the basis for generating code to produce an implementation.

Some reasons to use MDD are:

  • It speeds up the development process
  • Business logic is independent of the platform
  • If business logic changes, the change is made in the model
  • Expertise is applied to the business model, not the software
  • It decreases the costs of software development

You can represent models in many forms, such as UML, XML Metadata Interchange (XMI), Essential Meta Object Facility (EMOF), and W3C XML Schema.

Model-driven development in Eclipse

Eclipse is an open source tool integration platform, most often used as a Java development environment. As a tool integration platform, Eclipse has a varied and ever-growing set of editors and utilities, one of which is the Eclipse Modeling Framework (EMF).

EMF is a tools sub-project of the Eclipse Open Source Project. EMF is a modeling and data integration framework, as well as a code generation framework for building plug-ins for Eclipse. EMF uses ECore, a meta-language that describes models and provides runtime support for those models; ECore is based upon a subset of the OMG Meta Object Facility 2.0 (MOF) called Essential MOF (EMOF). EMF models are persisted as XML Metadata Interchange (XMI) documents. EMF provides viewing and command-based editing of the model as well as a basic editor for manipulating and serializing instance documents based on an EMF model. EMF models can be created from annotated Java code, XML documents, or UML models.

EMF serves as the backbone for MDD in Eclipse.


Compound document tooling

You can create CDRs and edit them with existing XML editors, since the references to other documents use generic reference mechanisms such as the <xhtml:object> tag. Editors for CDIs, however, need to know more than how to validate instances of separate documents that reference one another if they are to offer a directed editing experience. An editor that supports compound documents must have specific information about which tags from one namespace can be inserted as children of tags from another namespace. These cross-namespace relationships can be both bidirectional and recursive. A compound document profile defines which tags can be inserted under which other tags for a set of mixed namespaces. Several explicit compound document profiles exist today, such as X+V (XHTML plus a subset of VoiceXML) and XHTML/MathML/SVG.

To provide a concrete example, consider an XHTML+XForms compound document profile that must define which XForms tags can exist as child tags for specific XHTML tags and vice versa. One requirement for this profile is that an xhtml:div element can have as a child an xforms:repeat element, which can have as a child another xhtml:div element, which can in turn have as a child an xforms:input element, as shown in Listing 2.


Listing 2. XHTML and XForms nested tags

<xhtml:div>
  <xforms:repeat model="model_PostalAddress"
    id="repeat_AddressLine_model_PostalAddress"
    nodeset="/hrxml:PostalAddress/hrxml:DeliveryAddress/hrxml:AddressLine">
    <xhtml:div>
      <xforms:input ref="." model="model_PostalAddress">
        <xforms:label>Address Line</xforms:label>
      </xforms:input>
    </xhtml:div>
  </xforms:repeat>
</xhtml:div>

This nesting of tags must be defined explicitly, with mechanisms beyond xsd:any and xsd:anyAttribute. Validating and directed editors need that detail to validate documents unambiguously and to guide their construction, and user agent implementers who write rendering code for browsers need it to build processing and rendering engines.
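To make the idea of an explicit definition concrete, here is a minimal Python sketch that records a small excerpt of an XHTML+XForms profile as a lookup table and checks a document against it. The PROFILE table, the violations function, and the simplified test fragments are assumptions made for illustration; they cover only the nesting discussed above and are not the representation used by any particular editor. Nesting within a single namespace is assumed to be governed by that vocabulary's own schema, so the check skips it.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"

# Hypothetical profile excerpt: for each parent element, the elements from
# the other namespace that may appear directly under it.
PROFILE = {
    (XHTML, "div"): {(XFORMS, "repeat"), (XFORMS, "input")},
    (XFORMS, "repeat"): {(XHTML, "div")},
}


def qname(element):
    """Return (namespace, local name) for an ElementTree element."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def violations(parent):
    """Collect (parent, child) local names that cross namespaces
    without a matching profile entry."""
    found = []
    for child in parent:
        p, c = qname(parent), qname(child)
        if p[0] != c[0] and c not in PROFILE.get(p, set()):
            found.append((p[1], c[1]))
        found.extend(violations(child))
    return found


def fragment(inner):
    """Build a simplified version of the nested fragment around `inner`."""
    return (f'<xhtml:div xmlns:xhtml="{XHTML}" xmlns:xforms="{XFORMS}">'
            f'<xforms:repeat nodeset="AddressLine"><xhtml:div>'
            f'<xforms:input ref=".">{inner}</xforms:input>'
            f'</xhtml:div></xforms:repeat></xhtml:div>')


VALID = fragment("<xforms:label>Address Line</xforms:label>")
INVALID = fragment("<xhtml:div>not allowed here</xhtml:div>")

print(violations(ET.fromstring(VALID)))
print(violations(ET.fromstring(INVALID)))
```

A wildcard such as xsd:any would accept both fragments. The table rejects the second one because it has no entry that allows an xhtml:div under an xforms:input.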

Compound document tooling users

When considering compound document creation and editing tooling, keep in mind that you need to accommodate two users: the compound document schema architect and the instance document creator.

The compound document schema architect wants to efficiently express the definition for how to combine specific namespace vocabularies using defined profiles. This is the person who builds the implementation of a compound document profile.

The instance document creator wants to leverage the profile, but has no interest in building or editing profiles. The instance document creator simply wants to create well-formed and valid instances of documents that adhere to a profile, preferably with a directed editor and correct-by-construction experience. In this experience, the editor offers only the choices that are valid in the current context according to the profile.
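The correct-by-construction experience amounts to a lookup: given the element currently selected, offer only the children the profile permits. The Python sketch below illustrates this under stated assumptions. The PROFILE entries are an illustrative excerpt written in prefix:local form, and insertion_choices is a hypothetical helper, not an API of any editor.

```python
# Hypothetical profile excerpt: parent element -> children an editor may offer.
# The entries are illustrative, not a complete XHTML+XForms profile.
PROFILE = {
    "xhtml:head": ["xhtml:title", "xforms:model"],
    "xhtml:div": ["xhtml:div", "xhtml:p", "xforms:input", "xforms:repeat"],
    "xforms:repeat": ["xhtml:div", "xforms:input"],
    "xforms:input": ["xforms:label"],
}


def insertion_choices(parent, prefixes=None):
    """List the child tags a directed editor would offer under `parent`.

    If `prefixes` is given, keep only tags whose prefix is in that set.
    """
    choices = PROFILE.get(parent, [])
    if prefixes is not None:
        choices = [c for c in choices if c.split(":", 1)[0] in prefixes]
    return sorted(choices)


print(insertion_choices("xhtml:head"))
print(insertion_choices("xhtml:div", prefixes={"xforms"}))
print(insertion_choices("xforms:label"))
```

An element with no entry yields an empty list, so the editor has nothing to offer there and an invalid insertion cannot be made in the first place.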

EMF as an open modeling technology is a natural fit for defining compound document profiles. You can then use the EMF ECore models to create Eclipse-based editors for document creation and serialization.

The model-driven approach to compound document tooling begins with Platform Independent Models (PIMs) of each functional namespace (XHTML, XForms, SVG, and so on) that will be included in a profile. A PIM is a high-level abstraction that does not consider implementation specifics, but rather expresses only the intent of what is being modeled. PIMs can take many forms, such as W3C XML Schema, RELAX NG, Schematron, MOF, or UML models. Once the PIMs for all the profile schemas are created, they can be transformed to Platform Specific Models (PSMs), all of the same normative type. For example, the PSMs might all be XML Schema, UML models, or EMF ECore models. Next, the profile is realized by creating cross-model references between the models, representing the places where tags from one namespace may be referenced by, or inserted under, another. For example, a profile for XHTML+XForms would need to define that an <xforms:model> tag can be inserted under the <xhtml:head> tag. Figure 3 shows this XHTML+XForms profile annotation as a UML aggregation relationship between the head class from the XHTML model and the model class from the XForms model.


Figure 3. PSM cross-model relationship in UML

You can transform the PSMs into EMF ECore models, which can be created from UML models or XML Schemas using EMF-provided tooling. In the example in Figure 3, the aggregation relationship becomes an EReference in the PSM ECore model. Creating these models and realizing the profiles as references across these models is the role of the compound document schema architect. The PSMs that realize the compound document profile are then used to drive a directed editor, which the instance document creator uses to create and edit instances that adhere to the profile. Figure 4 shows a profile for XHTML+XForms+XML Events as it moves from PIM to PSM to serialized instance documents.


Figure 4. Model-driven compound document editor profile creation

A model-driven approach is an efficient way to create functional PIMs of specific namespaces that can be used to create PSMs of combinations of namespaces to represent profiles. You can reuse PIM models many times in different combinations to form as many profiles as required. Using Eclipse EMF ECore models is an ideal way to get directed editing and serialization for the creation of an instance document in a Compound XML Document Editor.
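The cross-model reference at the heart of a profile can be sketched with plain Python data classes. This is only a conceptual stand-in: ModelClass and Reference are hypothetical types invented for this example and do not reproduce the EMF EClass or EReference APIs. The sketch shows two independent namespace models joined by one reference from the XHTML head class to the XForms model class.

```python
from dataclasses import dataclass, field


@dataclass
class ModelClass:
    """Stand-in for a class in a namespace model (not the EMF EClass API)."""
    namespace: str
    name: str
    references: list = field(default_factory=list)


@dataclass
class Reference:
    """Stand-in for a containment reference from one class to another."""
    name: str
    target: ModelClass


XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"

# Independent namespace models, reusable across profiles.
head = ModelClass(XHTML, "head")
title = ModelClass(XHTML, "title")
head.references.append(Reference("title", title))
model = ModelClass(XFORMS, "model")

# The profile is realized as a cross-model reference: head may contain model.
head.references.append(Reference("model", model))


def cross_model_references(cls):
    """Names of the references whose target lives in another namespace."""
    return [r.name for r in cls.references if r.target.namespace != cls.namespace]


print(cross_model_references(head))
```

Because the namespace models stay separate and only the references differ, the same models can be combined with other references to form a different profile.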

Compound XML Document Editor

The Compound XML Document Editor (available at IBM alphaWorks) is a dynamic editor framework that uses ECore models to drive model-based compound document construction.

You can add any type whose instances are serialized to XML to the Compound XML Document Editor framework without the need to write any Java code. The Compound XML Document Editor uses model repositories, in which ECore models are stored. Once you drop an ECore model into a Compound XML Document Editor model repository and start the Compound XML Document Editor, you can create or dynamically edit instance documents from these ECore models. You can create model repositories to accommodate as many models and compound document profiles as necessary.

You can swap out individual models, or you can switch out entire model repositories at runtime. Furthermore, you can make changes to ECore models on the fly that are immediately reflected in the editor and in serialized instance documents.

The Compound XML Document Editor comes with ECore models for XHTML, XForms, XML Events, SVG, SMIL, VoiceXML, XUL, MathML, and XLink. Figure 5 shows the available profile combinations in the default model repository with XHTML as the root document; it includes a profile that allows inclusion of elements and attributes from several other namespaces.


Figure 5. Default model repository

The Compound XML Document Editor uses the underlying EMF models to provide a directed editing experience by restricting the allowable right-click options for tag insertion. This is illustrated in Figure 6: The profile is honored by an EMF editor that interrogates the PSM model and allows only valid entries in accordance with that compound document profile. Element attributes are represented as properties in a property sheet.


Figure 6. Directed editing

Once you have created a document, you can render it directly from configurable right-click menu options for browsers that support the compound document profile used in the document (see Figure 7).


Figure 7. Rendering options

Figure 8 shows an insurance form for Automobile Loss Reporting based on ACORD schemas rendered in the X-Smiles browser.


Figure 8. X-Smiles rendered XForm

Conclusion

The Compound XML Document Editor is a standards-based, model-driven, compound document development framework that supports dynamic compound document creation and serialization. The Compound XML Document Editor utilizes Model Driven Development concepts with Eclipse EMF to help develop flexible compound documents and the profiles that define them.

Acknowledgements: Thanks to Simon Johnston and Steve Speicher.


Resources

Learn

Get products and technologies

