Skip to main content

Use XSLT to prepare XML for import into OpenOffice Calc

Use a generic XSLT stylesheet to pull XML data into the popular spreadsheet

Uche Ogbuji (uche@ogbuji.net), Consultant, Fourthought, Inc.
Uche photo
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a consulting firm specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, the open source platform for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can reach him at uche@ogbuji.net.

Summary:  The popular open source office suite OpenOffice.org is XML-savvy at its core. It uses XML in its file formats and offers several XML-processing plug-ins, so you might expect it to have nice tools built in for importing XML data. Unfortunately, things are not so simple, and a bit of work is required to manipulate general XML into delimited text format in order to import the data into its spreadsheet component, Calc. This article offers a quick XSLT tool for this purpose and demonstrates the Calc import of records-oriented XML. In addition to learning a practical trick for working with Calc, you might also learn a few handy XSLT techniques for using dynamic criteria to transform XML.

Date:  25 Mar 2005
Level:  Intermediate
Activity:  5748 views

A few months ago a client asked me to import some XML data into a spreadsheet. The data was for managers who couldn't handle raw XML. I was happy that the company had standardized on the popular office suite OpenOffice.org (popularly abbreviated "OOo"), and I thought this would make the task trivial. After all, OOo is one of the most XML-savvy projects of its size. As I discussed in a recent developerWorks article (see Resources), OOo uses XML for its file format. Once I set to the task, however, I was surprised to find that none of the built-in or readily available third-party importers handled importing generic XML into the spreadsheet component, OOo Calc.

OOo Calc can import from some specialized XML formats, such as XHTML tables, and several existing tools can import general XML into OOo Writer (the word processor component). But when it comes to importing generic XML into OOo Calc, the user is on his own. I can understand this limitation: it's not usually easy to superimpose a regular lattice of information upon an XML document; such is the mismatch between extensible hierarchies and rigid tables. In this article I discuss a technique I used to solve the practical problem of converting XML into a form that's easily digestible by OOo Calc. Please make sure you are familiar with XSLT and XPath, and that you understand HTML and XHTML table structures before reading on.

From XML to XHTML table

I ended up writing a quick XSLT program to convert the XML to an XHTML table structure, which unmistakably signals how the information structure should be worked into the spreadsheet. The technique I offer in this article extends this idea, making it useful for arbitrary XML formats, with just a bit of extra work from the user. Listing 1 (labels.xml) is a sample XML source file that I want to import into a spreadsheet. (To download this code, see Download.) The XML code is quite regular, but I still need to process it before I can easily import it.


Listing 1. Sample XML to be imported into a spreadsheet (labels.xml)
<?xml version="1.0" encoding="iso-8859-1"?>
<labels>
  <label id='ep' added="2003-06-10">
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street>
      <city>Hailey</city>
      <province>ID</province>
    </address>
  </label>
  <label id='tse' added="2003-06-20">
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <province>CT</province>
    </address>
  </label>
  <label id="lh" added="2004-11-01">
    <name>Langston Hughes</name>
    <address>
      <street>10 Bridge Tunnel</street>
      <city>Harlem</city>
      <province>NY</province>
    </address>
  </label>
  <label id="co" added="2004-11-15">
    <name>Christopher Okigbo</name>
    <address>
      <street>7 Heaven's Gate</street>
      <city>Idoto</city>
      <province>Anambra</province>
    </address>
  </label>
</labels>
 

Listing 2 (quick-xml2table.xslt) shows the XSLT tool to create an XHTML table. (To download this code, see Download.)


Listing 2. XSLT module to convert arbitrary XML to a regular XHTML table grid (quick-xml2table.xslt)
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dyn="http://exslt.org/dynamic"
    xmlns:u="http://uche.ogbuji.net"
    exclude-result-prefixes="dyn u"
    version="1.0"
>

  <xsl:param name='table-spec-source' select='""'/>
  <xsl:output method='xml' indent='yes'/>

  <u:table-spec>
    <u:row-spec select='/labels/label'>
      <u:col-spec select='@id'/>
      <u:col-spec select='name'/>
      <u:col-spec select='normalize-space(address)'/>
      <u:col-spec select='@added'/>
    </u:row-spec>
  </u:table-spec>

  <!-- This code recognizes the u:table-spec element whether
       as the root element (for example, in a stand-alone file) or
       within another root element (for example, within the xsl
       root element)
    -->
  <xsl:variable
    name='table-spec'
    select='document($table-spec-source)/*/u:table-spec
            |document($table-spec-source)/u:table-spec'/>

  <xsl:template match='/'>
    <!-- Remember current source location (document root) while
         looping through the column expressions -->
    <xsl:variable name='source' select='.'/>
    <table>
      <xsl:for-each select='$table-spec/u:row-spec'>
        <xsl:variable name='row-query' select='.'/>
        <!-- Force the context back to the current source -->
        <xsl:for-each select='$source'>
          <!-- Compute the source elements that make up the rows -->
          <xsl:variable name='rows'
            select='dyn:evaluate($row-query/@select)'/>
          <!-- Process source row elements -->
          <xsl:apply-templates select='$rows' mode='process-cols'>
            <!-- Remember column specs for row -->
            <xsl:with-param name='col-specs'
              select='$row-query/u:col-spec'/>
          </xsl:apply-templates>
        </xsl:for-each>
      </xsl:for-each>
    </table>
  </xsl:template>

  <xsl:template match='*' mode='process-cols'>
    <xsl:param name='col-specs'/>
    <!-- Remember current source location while looping through the
         column expressions -->
    <xsl:variable name='source' select='.'/>
    <tr>
      <xsl:for-each select='$col-specs'>
        <xsl:variable name='col-query' select='@select'/>
        <td>
          <!-- Force the context back to the current source -->
          <xsl:for-each select='$source'>
            <!-- Compute column as string value of current source node -->
            <xsl:value-of select='dyn:evaluate($col-query)'/>
          </xsl:for-each>
        </td>
      </xsl:for-each>
    </tr>
  </xsl:template>

</xsl:transform>
  

The key to this technique is the table-spec-source parameter, which is passed to the XSLT processor. I set this parameter to a node set that gives an XML structure in the namespace http://uche.ogbuji.net, which outlines how to construct the output table. The top-level table-spec element contains one or more row-spec elements, each with an XPath in the select attribute that determines what main elements in the source document are the basis for table rows in the output. These elements contain col-spec elements whose select attribute specifies, relative to the current row element from the XML source, what expressions are used for each row value. A default value for table-spec-source tells the processor to look for the table information right in the XSLT itself, as a top-level element. This default spec in Listing 1 is a good overall example of what table specs look like, so I highlight it in Listing 3.


Listing 3. The sample table spec for turning Listing 1 into an XHTML table for OOo Calc
  <u:table-spec>
    <u:row-spec select='/labels/label'>
      <u:col-spec select='@id'/>
      <u:col-spec select='name'/>
      <u:col-spec select='normalize-space(address)'/>
      <u:col-spec select='@added'/>
    </u:row-spec>
  </u:table-spec>
  

Within each row are four columns, which come from:

  • The label element's id attribute
  • The label element's name child element
  • The result of applying the normalize-space function on the address element child of the label element
  • The label element's added attribute

The power of dynamic XSLT

The main template uses row-spec to gather all the elements that serve as the basis of the rows. Since the XPath it uses is not actually written into the XSLT instructions, but is given at run-time as a result of evaluating $row-query/@select, I must use the EXSLT extension function call dyn:evaluate($row-query/@select). EXSLT is a set of XSLT extensions that are developed and sanctioned by the XSLT community, and supported in most XSLT processors. The dyn:evaluate function converts its argument to a string and dynamically evaluates that string as an XPath expression, returning the result of this evaluation.

Note: Remember that the row elements are computed as XPaths, not as XSLT patterns, so you have to express the full path from the document root to the elements in question. In other words, it would not work for you to replace /labels/label with label in the table specs.

To handle each row element, apply templates in process-cols mode, which loops over all the col-spec elements and again uses dyn:evaluate to compute the text that is output for each column.


Generating the table

To demonstrate the XSLT utility, I ran the transform using the open source XSLT tool 4XSLT, part of 4Suite (see Resources) as follows:

$ 4xslt -o labels-table-output.html labels.xml quick-xml2table.xslt

The -o option redirects the output to a file with the given name, labels-table-output.html. Listing 4 shows the output.


Listing 4. Output of Listing 2 run against Listing 1 (labels-table-output.html)
<?xml version="1.0" encoding="UTF-8"?>
<table>
  <tr>
    <td>ep</td>
    <td>Ezra Pound</td>
    <td>45 Usura Place Hailey ID</td>
    <td>2003-06-10</td>
  </tr>
  <tr>
    <td>tse</td>
    <td>Thomas Eliot</td>
    <td>3 Prufrock Lane Stamford CT</td>
    <td>2003-06-20</td>
  </tr>
  <tr>
    <td>lh</td>
    <td>Langston Hughes</td>
    <td>10 Bridge Tunnel Harlem NY</td>
    <td>2004-11-01</td>
  </tr>
  <tr>
    <td>co</td>
    <td>Christopher Okigbo</td>
    <td>7 Heaven's Gate Idoto Anambra</td>
    <td>2004-11-15</td>
  </tr>
</table>
  

A key feature of the OOo preparation utility is your ability to substitute different table specs for the output. Suppose you only want to export the ID, name, and province for each label. Use a table specification file like that in Listing 5 (tspec.xml). (To download this code, see Download.)


Listing 5. Alternative output table specification in a separate XML file (tspec.xml)
  <u:table-spec xmlns:u="http://uche.ogbuji.net">
    <u:row-spec select='/labels/label'>
      <u:col-spec select='@id'/>
      <u:col-spec select='name'/>
      <u:col-spec select='address/province'/>
    </u:row-spec>
  </u:table-spec>
  

To instruct the XSLT utility to use this alternate set of output table specifications, override the table-spec-source parameter in your XSLT processor of choice. With 4XSLT, do this using the -D command line parameter:

$ 4xslt -D table-spec-source=tspec.xml labels.xml quick-xml2table.xslt
  

This time the output is different. See Listing 6.


Listing 6. Output from alternative table specs
<?xml version="1.0" encoding="UTF-8"?>
<table>
  <tr>
    <td>ep</td>
    <td>Ezra Pound</td>
    <td>ID</td>
  </tr>
  <tr>
    <td>tse</td>
    <td>Thomas Eliot</td>
    <td>CT</td>
  </tr>
  <tr>
    <td>lh</td>
    <td>Langston Hughes</td>
    <td>NY</td>
  </tr>
  <tr>
    <td>co</td>
    <td>Christopher Okigbo</td>
    <td>Anambra</td>
  </tr>
</table>
  


The actual import

Once you prepare the import file, actually performing the import is a snap. I performed the following tasks with my OOo 1.1.3 installation on Fedora Core 3 Linux. I selected the menu entry Insert > External Data. In the field URL of external data source, I entered the output file from the XSLT transform (Listing 4 in this case). In the next dialog box (see Figure 1), I made the indicated selection.


Figure 1. The OOo Calc "Select Filter" dialog box
OOo Calc Select Filter dialog

I selected HTML_tables from the resulting dialog. OOo Calc then performed the import, and Figure 2 shows the resulting data in the spreadsheet. You will probably want to tweak the formatting of the results of such imports. For example, you might want to change font type or color, enable word wrapping of text fields, or set a specific formatting for numerical fields.


Figure 2. Result of import into OOo Calc
Import result in spreadsheet

Wrap-up

I hope that OOo develops more sophisticated import facilities for XML, or that contributors develop such tools for people unable to go to such sophisticated lengths in XSLT. For example, I can imagine a dialog box where one enters a series of XPaths to provide column information. Until such a time, though, I think that some variation on the XSLT technique I've presented here will be useful whenever you need to pull some XML data into OOo Calc.



Download

DescriptionNameSizeDownload method
Code filesx-oocalc-quick-xml2table.zip3 KB HTTP

Information about download methods


Resources

About the author

Uche photo

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a consulting firm specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, the open source platform for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can reach him at uche@ogbuji.net.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Open source
ArticleID=56882
ArticleTitle=Use XSLT to prepare XML for import into OpenOffice Calc
publish-date=03252005
author1-email=uche@ogbuji.net
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers