Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Use XSLT to prepare XML for import into OpenOffice Calc

Use a generic XSLT stylesheet to pull XML data into the popular spreadsheet

Uche Ogbuji (uche@ogbuji.net), Consultant, Fourthought, Inc.
Uche photo
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a consulting firm specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, the open source platform for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can reach him at uche@ogbuji.net.

Summary:  The popular open source office suite OpenOffice.org is XML-savvy at its core. It uses XML in its file formats and offers several XML-processing plug-ins, so you might expect it to have nice tools built in for importing XML data. Unfortunately, things are not so simple, and a bit of work is required to manipulate general XML into delimited text format in order to import the data into its spreadsheet component, Calc. This article offers a quick XSLT tool for this purpose and demonstrates the Calc import of records-oriented XML. In addition to learning a practical trick for working with Calc, you might also learn a few handy XSLT techniques for using dynamic criteria to transform XML.

Date:  25 Mar 2005
Level:  Intermediate
Also available in:   Japanese

Activity:  37390 views
Comments:  

A few months ago a client asked me to import some XML data into a spreadsheet. The data was for managers who couldn't handle raw XML. I was happy that the company had standardized on the popular office suite OpenOffice.org (popularly abbreviated "OOo"), and I thought this would make the task trivial. After all, OOo is one of the most XML-savvy projects of its size. As I discussed in a recent developerWorks article (see Resources), OOo uses XML for its file format. Once I set to the task, however, I was surprised to find that none of the built-in or readily available third-party importers handled importing generic XML into the spreadsheet component, OOo Calc.

OOo Calc can import from some specialized XML formats, such as XHTML tables, and several existing tools can import general XML into OOo Writer (the word processor component). But when it comes to importing generic XML into OOo Calc, the user is on his own. I can understand this limitation: it's not usually easy to superimpose a regular lattice of information upon an XML document; such is the mismatch between extensible hierarchies and rigid tables. In this article I discuss a technique I used to solve the practical problem of converting XML into a form that's easily digestible by OOo Calc. Please make sure you are familiar with XSLT and XPath, and that you understand HTML and XHTML table structures before reading on.

From XML to XHTML table

I ended up writing a quick XSLT program to convert the XML to an XHTML table structure, which unmistakably signals how the information structure should be worked into the spreadsheet. The technique I offer in this article extends this idea, making it useful for arbitrary XML formats, with just a bit of extra work from the user. Listing 1 (labels.xml) is a sample XML source file that I want to import into a spreadsheet. (To download this code, see Download.) The XML code is quite regular, but I still need to process it before I can easily import it.


Listing 1. Sample XML to be imported into a spreadsheet (labels.xml)
<?xml version="1.0" encoding="iso-8859-1"?>
<labels>
  <label id='ep' added="2003-06-10">
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street>
      <city>Hailey</city>
      <province>ID</province>
    </address>
  </label>
  <label id='tse' added="2003-06-20">
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <province>CT</province>
    </address>
  </label>
  <label id="lh" added="2004-11-01">
    <name>Langston Hughes</name>
    <address>
      <street>10 Bridge Tunnel</street>
      <city>Harlem</city>
      <province>NY</province>
    </address>
  </label>
  <label id="co" added="2004-11-15">
    <name>Christopher Okigbo</name>
    <address>
      <street>7 Heaven's Gate</street>
      <city>Idoto</city>
      <province>Anambra</province>
    </address>
  </label>
</labels>
 

Listing 2 (quick-xml2table.xslt) shows the XSLT tool to create an XHTML table. (To download this code, see Download.)


Listing 2. XSLT module to convert arbitrary XML to a regular XHTML table grid (quick-xml2table.xslt)
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dyn="http://exslt.org/dynamic"
    xmlns:u="http://uche.ogbuji.net"
    exclude-result-prefixes="dyn u"
    version="1.0"
>

  <xsl:param name='table-spec-source' select='""'/>
  <xsl:output method='xml' indent='yes'/>

  <u:table-spec>
    <u:row-spec select='/labels/label'>
      <u:col-spec select='@id'/>
      <u:col-spec select='name'/>
      <u:col-spec select='normalize-space(address)'/>
      <u:col-spec select='@added'/>
    </u:row-spec>
  </u:table-spec>

  <!-- This code recognizes the u:table-spec element whether
       as the root element (for example, in a stand-alone file) or
       within another root element (for example, within the xsl
       root element)
    -->
  <xsl:variable
    name='table-spec'
    select='document($table-spec-source)/*/u:table-spec
            |document($table-spec-source)/u:table-spec'/>

  <xsl:template match='/'>
    <!-- Remember current source location (document root) while
         looping through the column expressions -->
    <xsl:variable name='source' select='.'/>
    <table>
      <xsl:for-each select='$table-spec/u:row-spec'>
        <xsl:variable name='row-query' select='.'/>
        <!-- Force the context back to the current source -->
        <xsl:for-each select='$source'>
          <!-- Compute the source elements that make up the rows -->
          <xsl:variable name='rows'
            select='dyn:evaluate($row-query/@select)'/>
          <!-- Process source row elements -->
          <xsl:apply-templates select='$rows' mode='process-cols'>
            <!-- Remember column specs for row -->
            <xsl:with-param name='col-specs'
              select='$row-query/u:col-spec'/>
          </xsl:apply-templates>
        </xsl:for-each>
      </xsl:for-each>
    </table>
  </xsl:template>

  <xsl:template match='*' mode='process-cols'>
    <xsl:param name='col-specs'/>
    <!-- Remember current source location while looping through the
         column expressions -->
    <xsl:variable name='source' select='.'/>
    <tr>
      <xsl:for-each select='$col-specs'>
        <xsl:variable name='col-query' select='@select'/>
        <td>
          <!-- Force the context back to the current source -->
          <xsl:for-each select='$source'>
            <!-- Compute column as string value of current source node -->
            <xsl:value-of select='dyn:evaluate($col-query)'/>
          </xsl:for-each>
        </td>
      </xsl:for-each>
    </tr>
  </xsl:template>

</xsl:transform>
  

The key to this technique is the table-spec-source parameter, which is passed to the XSLT processor. I set this parameter to a node set that gives an XML structure in the namespace http://uche.ogbuji.net, which outlines how to construct the output table. The top-level table-spec element contains one or more row-spec elements, each with an XPath in the select attribute that determines what main elements in the source document are the basis for table rows in the output. These elements contain col-spec elements whose select attribute specifies, relative to the current row element from the XML source, what expressions are used for each row value. A default value for table-spec-source tells the processor to look for the table information right in the XSLT itself, as a top-level element. This default spec in Listing 1 is a good overall example of what table specs look like, so I highlight it in Listing 3.


Listing 3. The sample table spec for turning Listing 1 into an XHTML table for OOo Calc
  <u:table-spec>
    <u:row-spec select='/labels/label'>
      <u:col-spec select='@id'/>
      <u:col-spec select='name'/>
      <u:col-spec select='normalize-space(address)'/>
      <u:col-spec select='@added'/>
    </u:row-spec>
  </u:table-spec>
  

Within each row are four columns, which come from:

  • The label element's id attribute
  • The label element's name child element
  • The result of applying the normalize-space function on the address element child of the label element
  • The label element's added attribute

The power of dynamic XSLT

The main template uses row-spec to gather all the elements that serve as the basis of the rows. Since the XPath it uses is not actually written into the XSLT instructions, but is given at run-time as a result of evaluating $row-query/@select, I must use the EXSLT extension function call dyn:evaluate($row-query/@select). EXSLT is a set of XSLT extensions that are developed and sanctioned by the XSLT community, and supported in most XSLT processors. The dyn:evaluate function converts its argument to a string and dynamically evaluates that string as an XPath expression, returning the result of this evaluation.

Note: Remember that the row elements are computed as XPaths, not as XSLT patterns, so you have to express the full path from the document root to the elements in question. In other words, it would not work for you to replace /labels/label with label in the table specs.

To handle each row element, apply templates in process-cols mode, which loops over all the col-spec elements and again uses dyn:evaluate to compute the text that is output for each column.


Generating the table

To demonstrate the XSLT utility, I ran the transform using the open source XSLT tool 4XSLT, part of 4Suite (see Resources) as follows:

$ 4xslt -o labels-table-output.html labels.xml quick-xml2table.xslt

The -o option redirects the output to a file with the given name, labels-table-output.html. Listing 4 shows the output.


Listing 4. Output of Listing 2 run against Listing 1 (labels-table-output.html)
<?xml version="1.0" encoding="UTF-8"?>
<table>
  <tr>
    <td>ep</td>
    <td>Ezra Pound</td>
    <td>45 Usura Place Hailey ID</td>
    <td>2003-06-10</td>
  </tr>
  <tr>
    <td>tse</td>
    <td>Thomas Eliot</td>
    <td>3 Prufrock Lane Stamford CT</td>
    <td>2003-06-20</td>
  </tr>
  <tr>
    <td>lh</td>
    <td>Langston Hughes</td>
    <td>10 Bridge Tunnel Harlem NY</td>
    <td>2004-11-01</td>
  </tr>
  <tr>
    <td>co</td>
    <td>Christopher Okigbo</td>
    <td>7 Heaven's Gate Idoto Anambra</td>
    <td>2004-11-15</td>
  </tr>
</table>
  

A key feature of the OOo preparation utility is your ability to substitute different table specs for the output. Suppose you only want to export the ID, name, and province for each label. Use a table specification file like that in Listing 5 (tspec.xml). (To download this code, see Download.)


Listing 5. Alternative output table specification in a separate XML file (tspec.xml)
  <u:table-spec xmlns:u="http://uche.ogbuji.net">
    <u:row-spec select='/labels/label'>
      <u:col-spec select='@id'/>
      <u:col-spec select='name'/>
      <u:col-spec select='address/province'/>
    </u:row-spec>
  </u:table-spec>
  

To instruct the XSLT utility to use this alternate set of output table specifications, override the table-spec-source parameter in your XSLT processor of choice. With 4XSLT, do this using the -D command line parameter:

$ 4xslt -D table-spec-source=tspec.xml labels.xml quick-xml2table.xslt
  

This time the output is different. See Listing 6.


Listing 6. Output from alternative table specs
<?xml version="1.0" encoding="UTF-8"?>
<table>
  <tr>
    <td>ep</td>
    <td>Ezra Pound</td>
    <td>ID</td>
  </tr>
  <tr>
    <td>tse</td>
    <td>Thomas Eliot</td>
    <td>CT</td>
  </tr>
  <tr>
    <td>lh</td>
    <td>Langston Hughes</td>
    <td>NY</td>
  </tr>
  <tr>
    <td>co</td>
    <td>Christopher Okigbo</td>
    <td>Anambra</td>
  </tr>
</table>
  


The actual import

Once you prepare the import file, actually performing the import is a snap. I performed the following tasks with my OOo 1.1.3 installation on Fedora Core 3 Linux. I selected the menu entry Insert > External Data. In the field URL of external data source, I entered the output file from the XSLT transform (Listing 4 in this case). In the next dialog box (see Figure 1), I made the indicated selection.


Figure 1. The OOo Calc "Select Filter" dialog box
OOo Calc Select Filter dialog

I selected HTML_tables from the resulting dialog. OOo Calc then performed the import, and Figure 2 shows the resulting data in the spreadsheet. You will probably want to tweak the formatting of the results of such imports. For example, you might want to change font type or color, enable word wrapping of text fields, or set a specific formatting for numerical fields.


Figure 2. Result of import into OOo Calc
Import result in spreadsheet

Wrap-up

I hope that OOo develops more sophisticated import facilities for XML, or that contributors develop such tools for people unable to go to such sophisticated lengths in XSLT. For example, I can imagine a dialog box where one enters a series of XPaths to provide column information. Until such a time, though, I think that some variation on the XSLT technique I've presented here will be useful whenever you need to pull some XML data into OOo Calc.



Download

DescriptionNameSizeDownload method
Code filesx-oocalc-quick-xml2table.zip3 KB HTTP

Information about download methods


Resources

About the author

Uche photo

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a consulting firm specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, the open source platform for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can reach him at uche@ogbuji.net.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Open source
ArticleID=56882
ArticleTitle=Use XSLT to prepare XML for import into OpenOffice Calc
publish-date=03252005
author1-email=uche@ogbuji.net
author1-email-cc=