ODFDOM for Java: Simplifying programmatic control of documents and their data, Part 1

This article is the first in a three-part series and introduces the new Open Document Format (ODF) Document Object Model (DOM) for Java™ along with the ODF Toolkit Union open source community, whose mission is to simplify the programmatic manipulation of documents and their data.


Ming Fei Jia (jiamingf@cn.ibm.com), Staff Software Engineer, IBM

Ming Fei Jia is a Staff Software Engineer at the IBM China Software Development Lab, where he is a member of the IBM ODFDOM project team, the OASIS ODF Technical Committee, and the ODF Interoperability and Conformance (OIC) Technical Committee. He also serves as the representative for IBM China document standard activities. You can reach him at jiamingf@cn.ibm.com.

23 March 2010



Open Document Format overview

ODF is an XML-based, open-standard file format for office documents, such as spreadsheets, text documents, and presentations. ODF is application-, platform-, and vendor-neutral and thereby facilitates broad interoperability of office documents.
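Because an ODF file is a ZIP package of XML parts, a very small text document can be assembled with nothing but the JDK. The following is a minimal sketch, not ODFDOM code: the class and method names (MinimalOdt, build) are hypothetical, and the package deliberately omits optional parts such as styles.xml and meta.xml, so it may not satisfy every conformance rule. It only illustrates the package layout that ODFDOM manages for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: builds a minimal ODF 1.1 text package using only the JDK. */
final class MinimalOdt {
    static final String MIMETYPE = "application/vnd.oasis.opendocument.text";

    // The manifest lists the package itself ("/") and each XML part.
    static final String MANIFEST = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<manifest:manifest xmlns:manifest=\"urn:oasis:names:tc:opendocument:xmlns:manifest:1.0\">"
            + "<manifest:file-entry manifest:full-path=\"/\" manifest:media-type=\"" + MIMETYPE + "\"/>"
            + "<manifest:file-entry manifest:full-path=\"content.xml\" manifest:media-type=\"text/xml\"/>"
            + "</manifest:manifest>";

    /** Escapes XML special characters so arbitrary text cannot break the markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Wraps body XML (for example, text:p elements) in a content.xml document. */
    static String contentXml(String bodyXml) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<office:document-content"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\""
                + " office:version=\"1.1\">"
                + "<office:body><office:text>" + bodyXml + "</office:text></office:body>"
                + "</office:document-content>";
    }

    /** Returns the bytes of an .odt package that contains a single paragraph. */
    static byte[] build(String paragraph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // The "mimetype" entry must be first and stored uncompressed, so a
            // STORED entry needs its size and CRC set before it is written.
            byte[] mime = MIMETYPE.getBytes(StandardCharsets.US_ASCII);
            ZipEntry mimeEntry = new ZipEntry("mimetype");
            mimeEntry.setMethod(ZipEntry.STORED);
            mimeEntry.setSize(mime.length);
            mimeEntry.setCompressedSize(mime.length);
            CRC32 crc = new CRC32();
            crc.update(mime);
            mimeEntry.setCrc(crc.getValue());
            zip.putNextEntry(mimeEntry);
            zip.write(mime);
            zip.closeEntry();

            // The XML parts use the default (deflated) compression.
            writeEntry(zip, "content.xml", contentXml("<text:p>" + escape(paragraph) + "</text:p>"));
            writeEntry(zip, "META-INF/manifest.xml", MANIFEST);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete only after the ZipOutputStream is closed
    }

    private static void writeEntry(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }
}
```

For example, `Files.write(Paths.get("hello.odt"), MinimalOdt.build("Hello, ODF"))` writes a file that ODF editors should be able to open. Everything this sketch does by hand (ordering entries, escaping text, keeping the manifest in sync) is the kind of detail ODFDOM is meant to hide.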

The ODF standard was created and is maintained by the ODF Technical Committee of the Organization for the Advancement of Structured Information Standards (OASIS). OASIS published ODF 1.0 in May 2005; the International Organization for Standardization / International Electrotechnical Commission ratified it in May 2006 as ISO/IEC 26300:2006, making ODF the first international standard for office documents.

Today, ODF is supported by an array of vendor and open source solutions, including Microsoft® Office 2007 SP2. As a result of the widespread availability of these offerings, a growing number of users are saving their documents in ODF formats.

Besides the traditional ODF office productivity editors, a new class of innovative applications is emerging with support for ODF. These applications include ODF viewers for Web browsers, ODF format converters, ODF standard conformance and validation tools, and collaboration tools that manipulate ODF document elements.

ODF Toolkit Union overview

The ODF specification provides a detailed description of the standard. At more than 700 pages, however, it does little to ease the work of software developers who want to build applications that programmatically manipulate documents and their contents. The ODF Toolkit Union open source community was established to address this need.

IBM® and Sun combined resources to launch the ODF Toolkit Union in November 2008. The goal of the ODF Toolkit Union community is to provide an open source and vendor-neutral ODF development platform and to develop various ODF tools and components that support the needs of developers.

Using these tools, developers can write ODF applications more easily, without deep knowledge of the intricacies of the ODF specification. All tools and assets from the ODF Toolkit Union are available under the Apache License 2.0. Any volunteer can join an existing project of the community or start a new one.

Projects within the ODF Toolkit Union use the open source Mercurial tool for source-code management. Also, the ODF Toolkit Union provides wikis, forums, and mailing lists that developers can use to discuss and collaborate on technical issues.

Current projects of the ODF Toolkit community

The current projects of the ODF Toolkit community can be categorized into three classes:

  • ODFDOM
  • ODF conformance and validation tools
  • ODF application tools

ODFDOM is the primary project, and it has two subprojects: ODFDOM for Java, the focus of this series of articles, and An Open Document Library (AODL), the .NET module of the ODF Toolkit, written in C#.

Here, ODF conformance tools refers mainly to the ODF Validator, a tool that checks whether a given ODF document conforms to a specific version of the ODF standard. It focuses on document packaging and syntax checking. The ODF Validator has two user interfaces: a command-line interface and a Web interface.
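The packaging side of such a check can be sketched with the JDK alone. The example below is a hypothetical illustration, not the ODF Validator's implementation: it checks a few packaging rules (the mimetype entry comes first, is stored uncompressed, and names an ODF media type; a manifest is present) and confirms that content.xml and the manifest are well-formed XML. Full syntax checking against the ODF RELAX NG schema requires a RELAX NG validator, which the JDK does not include.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

/** Hypothetical helper: a few ODF packaging checks, far short of a full validator. */
final class PackageChecker {
    /** Returns a list of problems; an empty list means these checks passed. */
    static List<String> check(byte[] odf) {
        List<String> problems = new ArrayList<>();
        Map<String, byte[]> parts = new HashMap<>();
        String firstName = null;
        int firstMethod = -1;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(odf))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (firstName == null) { // remember how the first entry was stored
                    firstName = entry.getName();
                    firstMethod = entry.getMethod();
                }
                parts.put(entry.getName(), zip.readAllBytes()); // reads the current entry only
            }
        } catch (IOException ex) {
            problems.add("not a readable ZIP package: " + ex.getMessage());
            return problems;
        }
        // This sketch expects a mimetype entry; the rules below apply to it.
        if (!"mimetype".equals(firstName)) {
            problems.add("first entry must be 'mimetype'");
        } else {
            if (firstMethod != ZipEntry.STORED) problems.add("'mimetype' must be stored uncompressed");
            String mime = new String(parts.get("mimetype"), StandardCharsets.US_ASCII);
            if (!mime.startsWith("application/vnd.oasis.opendocument.")) {
                problems.add("unexpected media type: " + mime);
            }
        }
        if (!parts.containsKey("META-INF/manifest.xml")) problems.add("missing META-INF/manifest.xml");
        for (String name : new String[] {"content.xml", "META-INF/manifest.xml"}) {
            byte[] xml = parts.get(name);
            if (xml != null && !isWellFormed(xml)) problems.add(name + " is not well-formed XML");
        }
        return problems;
    }

    /** Well-formedness only: no schema validation happens here. */
    static boolean isWellFormed(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }
}
```

A caller would read a document's bytes (for example, with `Files.readAllBytes`) and report every string that `PackageChecker.check` returns.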

ODF application tools can be any kind of ODF document manipulation tools that meet specific requirements. Currently, these include two ODF document processing tools that use Extensible Stylesheet Language Transformation (XSLT).

All the projects of the ODF Toolkit Union open source community are in the initial phase of development. Volunteers who are interested in open source development and ODF are welcome to join and contribute to the community, and to benefit from its work.

Figure 1 shows a summary of the current ODF Toolkit projects.

Figure 1. Schematic of the current ODF Toolkit projects

Joining the ODF Toolkit community
To join the ODF Toolkit community, follow these steps:

  1. On the ODF Toolkit home page, click the Sign Up button or enter the following Web address directly in your browser:
  2. Register your account on the Sign Up page.
  3. Browse the project introductions, and select one or more projects in which you're interested.
  4. Subscribe to the mailing list for the projects that you select.

ODFDOM project overview

As noted in the previous section, ODFDOM (the OpenDocument API) is the primary project of the community; it is also the most active one. ODFDOM provides developers with a set of lightweight Java APIs for programming ODF applications.

The APIs are designed to let developers create, load, modify, and save ODF documents with as few lines of code as possible. With ODFDOM, developers do not need to understand the intricate details of the ODF specification, nor do they need to rely on the run time of office software, as solutions built on other ODF editors do.

Volunteers working on the ODFDOM project release updated code approximately every three months, steadily expanding and improving the functionality and performance of ODFDOM.

ODFDOM use case scenarios

The ODFDOM project has two goals. One goal is to provide a set of APIs that are more convenient and lightweight for manipulating ODF documents than those offered by desktop ODF document editors, such as OpenOffice.org and IBM® Lotus® Symphony™, in the current office software market.

The other goal is to help developers conveniently build the ODF document manipulation features needed in specific industry scenarios, in both personal-user and enterprise-server environments.

Here are several simple but typical user scenarios for which ODFDOM can be used:

  • In an enterprise environment, automatically generate large numbers of ODF documents from specific business document templates and back-end database data.

    For example, suppose you need to automatically generate ODF payroll documents for all employees, based on a payroll document template and the payroll records in an employee database. In this scenario, a relatively easy solution is to integrate ODFDOM into the enterprise application server as a servlet that provides automatic ODF document generation.

  • Validate whether a given ODF document conforms to a specific version of the ODF specification.

    By leveraging ODFDOM, you can precisely check whether the packaging and syntax of an ODF document conform to the ODF standard; syntax checking includes, for example, validation against the standard's RELAX NG schema. In fact, the ODF Validator in the ODF Toolkit community is a typical application of ODFDOM.

  • Within collaboration applications, access compound documents whose parts are written by different authors.

    One solution for this scenario is to deploy ODFDOM on the application server and let different user clients access different parts of a compound document through the ODFDOM API.

  • Search for specific document content that matches a given search condition, without rendering the ODF document in an editor.

    An obvious solution for this scenario is to implement the content-searching function with the navigation API in the ODFDOM convenience layer. This takes only a few lines of code.
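The template-filling step at the heart of the payroll scenario can be sketched without ODFDOM: placeholders in a fragment of ODF XML are replaced with values from one database record, and the values are XML-escaped so that data such as "R&D" cannot break the document. The `${field}` placeholder syntax, the PayrollFiller class, and the field names are all hypothetical; in a real application the filled XML would be written into a copy of the template's content.xml, or the same result could be produced through ODFDOM's own API.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fills ${field} placeholders in an ODF XML fragment. */
final class PayrollFiller {
    // Matches placeholders such as ${name}; this syntax is an assumption, not an ODF feature.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Escapes XML special characters so record values cannot break the markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Replaces every ${field} in the template with the escaped value from one record. */
    static String fill(String templateXml, Map<String, String> record) {
        Matcher m = PLACEHOLDER.matcher(templateXml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = record.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("record has no value for " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in the data from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(escape(value)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For each row returned by the payroll query, `fill(template, row)` yields the paragraph XML for that employee's payslip; failing loudly on a missing field is a design choice that avoids silently producing incomplete payroll documents.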
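For the content-search scenario, the essential work is walking the headings and paragraphs in a document's content.xml. The sketch below uses the JDK's namespace-aware DOM parser as a stand-in for ODFDOM's navigation API, whose classes this article does not show; ContentSearch and search are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: keyword search over the text of an ODF content.xml. */
final class ContentSearch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns, in document order, the text of each text:h or text:p containing the keyword. */
    static List<String> search(String contentXml, String keyword) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(contentXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse content.xml", e);
        }
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        // "*" selects every element in the text namespace, in document order.
        NodeList nodes = doc.getElementsByTagNameNS(TEXT_NS, "*");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element el = (Element) nodes.item(i);
            String name = el.getLocalName();
            if (!name.equals("h") && !name.equals("p")) continue; // skip spans, lists, etc.
            // getTextContent joins text split across text:span children, but it
            // does not expand text:s or text:tab elements into whitespace.
            String text = el.getTextContent();
            if (text.toLowerCase(Locale.ROOT).contains(needle)) hits.add(text);
        }
        return hits;
    }
}
```

The search is case-insensitive and returns whole paragraphs, which is usually what a caller needs to show matches in context.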

ODFDOM flexible build environment
ODFDOM is a vendor-neutral open source project whose build environment is not restricted to any vendor-specific development platform. Developers can compile and build ODFDOM source code in any Java development environment.

For example, ODFDOM provides an Ant build script, so it can be built from the command line, in the NetBeans integrated development environment (IDE), or in the Eclipse IDE.

ODFDOM also uses Maven, an open source build and project management tool, to build and maintain its source code. Maven's flexible, dynamic build mechanism makes work easier for community developers, because each project component is maintained in a Maven repository.

Developers initially need to download only the core component code to start a build. During the build, Maven checks the dependencies among components and dynamically downloads the components that are required.
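The idea of resolving dependencies before building can be illustrated with a toy depth-first ordering over a component graph. This is not Maven's resolution algorithm, which also handles versions, scopes, repository downloads, and conflict mediation; it only shows how a build can start from one component and pull in the rest in dependency order. BuildOrder and the component names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration: orders components so dependencies come before dependents. */
final class BuildOrder {
    /** Returns every component reachable from root, each listed after its dependencies. */
    static List<String> resolve(String root, Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        visit(root, deps, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private static void visit(String node, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(node)) return; // already resolved through another path
        if (!inProgress.add(node)) {
            throw new IllegalStateException("dependency cycle at " + node);
        }
        for (String dep : deps.getOrDefault(node, List.of())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(node);
        done.add(node);
        order.add(node); // all of node's dependencies are already in the list
    }
}
```

For example, `BuildOrder.resolve("app", Map.of("app", List.of("odfdom"), "odfdom", List.of("xerces")))` returns `[xerces, odfdom, app]`: the developer names only the top-level component, and the ordering of everything beneath it is derived from the declared dependencies.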

For developers, the biggest benefit of Maven is that they do not need to track the project's component dependencies or how component versions change as the project evolves. Components can therefore stay loosely coupled, and developers can focus on their business logic code.

Joining the ODFDOM project
Before joining the ODFDOM project, you must register for an odftoolkit.org account by following the steps outlined previously.

After you log in to the community account, go to the ODFDOM wiki page and become familiar with the ODFDOM project overview, code architecture, source code download address, release status, simple application examples, forums, and so on.

You can also subscribe to the ODFDOM mailing list for community developers, access forums, and report bugs.


Subsequent articles in this series delve into more detail on typical user scenarios of ODFDOM, the benefits of ODFDOM when manipulating ODF documents, and the flexible build environment used in the open source development of the ODFDOM.


Download: Code sample, ODFDOM-part1-en.zip (80.5KB)




