A few months ago a client asked me to import some XML data into a spreadsheet. The data was for managers who couldn't handle raw XML. I was happy that the company had standardized on the popular office suite OpenOffice.org (popularly abbreviated "OOo"), and I thought this would make the task trivial. After all, OOo is one of the most XML-savvy projects of its size. As I discussed in a recent developerWorks article (see Resources), OOo uses XML for its file format. Once I set to the task, however, I was surprised to find that none of the built-in or readily available third-party importers handled importing generic XML into the spreadsheet component, OOo Calc.
OOo Calc can import from some specialized XML formats, such as XHTML tables, and several existing tools can import general XML into OOo Writer (the word processor component). But when it comes to importing generic XML into OOo Calc, the user is on his own. I can understand this limitation: it's not usually easy to superimpose a regular lattice of information upon an XML document; such is the mismatch between extensible hierarchies and rigid tables. In this article I discuss a technique I used to solve the practical problem of converting XML into a form that's easily digestible by OOo Calc. Please make sure you are familiar with XSLT and XPath, and that you understand HTML and XHTML table structures before reading on.
I ended up writing a quick XSLT program to convert the XML to an XHTML table structure, which unmistakably signals how the information structure should be worked into the spreadsheet. The technique I offer in this article extends this idea, making it useful for arbitrary XML formats, with just a bit of extra work from the user. Listing 1 (labels.xml) is a sample XML source file that I want to import into a spreadsheet. (To download this code, see Download.) The XML code is quite regular, but I still need to process it before I can easily import it.
Listing 1. Sample XML to be imported into a spreadsheet (labels.xml)
<?xml version="1.0" encoding="iso-8859-1"?> <labels> <label id='ep' added="2003-06-10"> <name>Ezra Pound</name> <address> <street>45 Usura Place</street> <city>Hailey</city> <province>ID</province> </address> </label> <label id='tse' added="2003-06-20"> <name>Thomas Eliot</name> <address> <street>3 Prufrock Lane</street> <city>Stamford</city> <province>CT</province> </address> </label> <label id="lh" added="2004-11-01"> <name>Langston Hughes</name> <address> <street>10 Bridge Tunnel</street> <city>Harlem</city> <province>NY</province> </address> </label> <label id="co" added="2004-11-15"> <name>Christopher Okigbo</name> <address> <street>7 Heaven's Gate</street> <city>Idoto</city> <province>Anambra</province> </address> </label> </labels>
Listing 2 (quick-xml2table.xslt) shows the XSLT tool to create an XHTML table. (To download this code, see Download.)
Listing 2. XSLT module to convert arbitrary XML to a regular XHTML table grid (quick-xml2table.xslt)
<?xml version="1.0" encoding="utf-8"?> <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dyn="http://exslt.org/dynamic" xmlns:u="http://uche.ogbuji.net" exclude-result-prefixes="dyn u" version="1.0" > <xsl:param name='table-spec-source' select='""'/> <xsl:output method='xml' indent='yes'/> <u:table-spec> <u:row-spec select='/labels/label'> <u:col-spec select='@id'/> <u:col-spec select='name'/> <u:col-spec select='normalize-space(address)'/> <u:col-spec select='@added'/> </u:row-spec> </u:table-spec> <!-- This code recognizes the u:table-spec element whether as the root element (for example, in a stand-alone file) or within another root element (for example, within the xsl root element) --> <xsl:variable name='table-spec' select='document($table-spec-source)/*/u:table-spec |document($table-spec-source)/u:table-spec'/> <xsl:template match='/'> <!-- Remember current source location (document root) while looping through the column expressions --> <xsl:variable name='source' select='.'/> <table> <xsl:for-each select='$table-spec/u:row-spec'> <xsl:variable name='row-query' select='.'/> <!-- Force the context back to the current source --> <xsl:for-each select='$source'> <!-- Compute the source elements that make up the rows --> <xsl:variable name='rows' select='dyn:evaluate($row-query/@select)'/> <!-- Process source row elements --> <xsl:apply-templates select='$rows' mode='process-cols'> <!-- Remember column specs for row --> <xsl:with-param name='col-specs' select='$row-query/u:col-spec'/> </xsl:apply-templates> </xsl:for-each> </xsl:for-each> </table> </xsl:template> <xsl:template match='*' mode='process-cols'> <xsl:param name='col-specs'/> <!-- Remember current source location while looping through the column expressions --> <xsl:variable name='source' select='.'/> <tr> <xsl:for-each select='$col-specs'> <xsl:variable name='col-query' select='@select'/> <td> <!-- Force the context back to the current source --> <xsl:for-each select='$source'> <!-- Compute column as string value of current source node --> <xsl:value-of select='dyn:evaluate($col-query)'/> </xsl:for-each> </td> </xsl:for-each> </tr> </xsl:template> </xsl:transform>
The key to this technique is the
table-spec-source parameter, which is passed to the XSLT processor. I set this parameter to a node set that gives an XML structure in the namespace
http://uche.ogbuji.net, which outlines how to construct the output table. The top-level
table-spec element contains one or more
row-spec elements, each with an XPath in the
select attribute that determines what main elements in the source document are the basis for table rows in the output. These elements contain
col-spec elements whose
select attribute specifies, relative to the current row element from the XML source, what expressions are used for each row value. A default value for
table-spec-source tells the processor to look for the table information right in the XSLT itself, as a top-level element. This default spec in Listing 1 is a good overall example of what table specs look like, so I highlight it in Listing 3.
Listing 3. The sample table spec for turning Listing 1 into an XHTML table for OOo Calc
<u:table-spec> <u:row-spec select='/labels/label'> <u:col-spec select='@id'/> <u:col-spec select='name'/> <u:col-spec select='normalize-space(address)'/> <u:col-spec select='@added'/> </u:row-spec> </u:table-spec>
Within each row are four columns, which come from:
- The result of applying the
normalize-spacefunction on the
addresselement child of the
The main template uses
row-spec to gather all the elements that serve as the basis of the rows. Since the XPath it uses is not actually written into the XSLT instructions, but is given at run-time as a result of evaluating
$row-query/@select, I must use the EXSLT extension function call
dyn:evaluate($row-query/@select). EXSLT is a set of XSLT extensions that are developed and sanctioned by the XSLT community, and supported in most XSLT processors. The
dyn:evaluate function converts its argument to a string and dynamically evaluates that string as an XPath expression, returning the result of this evaluation.
Note: Remember that the row elements are computed as XPaths, not as XSLT patterns, so you have to express the full path from the document root to the elements in question. In other words, it would not work for you to replace
label in the table specs.
To handle each row element, apply templates in
process-cols mode, which loops over all the
col-spec elements and again uses
dyn:evaluate to compute the text that is output for each column.
To demonstrate the XSLT utility, I ran the transform using the open source XSLT tool 4XSLT, part of 4Suite (see Resources) as follows:
$ 4xslt -o labels-table-output.html labels.xml quick-xml2table.xslt
-o option redirects the output to a file with the given name, labels-table-output.html. Listing 4 shows the output.
Listing 4. Output of Listing 2 run against Listing 1 (labels-table-output.html)
<?xml version="1.0" encoding="UTF-8"?> <table> <tr> <td>ep</td> <td>Ezra Pound</td> <td>45 Usura Place Hailey ID</td> <td>2003-06-10</td> </tr> <tr> <td>tse</td> <td>Thomas Eliot</td> <td>3 Prufrock Lane Stamford CT</td> <td>2003-06-20</td> </tr> <tr> <td>lh</td> <td>Langston Hughes</td> <td>10 Bridge Tunnel Harlem NY</td> <td>2004-11-01</td> </tr> <tr> <td>co</td> <td>Christopher Okigbo</td> <td>7 Heaven's Gate Idoto Anambra</td> <td>2004-11-15</td> </tr> </table>
A key feature of the OOo preparation utility is your ability to substitute different table specs for the output. Suppose you only want to export the ID, name, and province for each label. Use a table specification file like that in Listing 5 (tspec.xml). (To download this code, see Download.)
Listing 5. Alternative output table specification in a separate XML file (tspec.xml)
<u:table-spec xmlns:u="http://uche.ogbuji.net"> <u:row-spec select='/labels/label'> <u:col-spec select='@id'/> <u:col-spec select='name'/> <u:col-spec select='address/province'/> </u:row-spec> </u:table-spec>
To instruct the XSLT utility to use this alternate set of output table specifications, override the
table-spec-source parameter in your XSLT processor of choice. With 4XSLT, do this using the
-D command line parameter:
$ 4xslt -D table-spec-source=tspec.xml labels.xml quick-xml2table.xslt
This time the output is different. See Listing 6.
Listing 6. Output from alternative table specs
<?xml version="1.0" encoding="UTF-8"?> <table> <tr> <td>ep</td> <td>Ezra Pound</td> <td>ID</td> </tr> <tr> <td>tse</td> <td>Thomas Eliot</td> <td>CT</td> </tr> <tr> <td>lh</td> <td>Langston Hughes</td> <td>NY</td> </tr> <tr> <td>co</td> <td>Christopher Okigbo</td> <td>Anambra</td> </tr> </table>
Once you prepare the import file, actually performing the import is a snap. I performed the following tasks with my OOo 1.1.3 installation on Fedora Core 3 Linux. I selected the menu entry Insert > External Data. In the field URL of external data source, I entered the output file from the XSLT transform (Listing 4 in this case). In the next dialog box (see Figure 1), I made the indicated selection.
Figure 1. The OOo Calc "Select Filter" dialog box
I selected HTML_tables from the resulting dialog. OOo Calc then performed the import, and Figure 2 shows the resulting data in the spreadsheet. You will probably want to tweak the formatting of the results of such imports. For example, you might want to change font type or color, enable word wrapping of text fields, or set a specific formatting for numerical fields.
Figure 2. Result of import into OOo Calc
I hope that OOo develops more sophisticated import facilities for XML, or that contributors develop such tools for people unable to go to such sophisticated lengths in XSLT. For example, I can imagine a dialog box where one enters a series of XPaths to provide column information. Until such a time, though, I think that some variation on the XSLT technique I've presented here will be useful whenever you need to pull some XML data into OOo Calc.
|Code files||x-oocalc-quick-xml2table.zip||3 KB||HTTP|
- Download the code listings presented in this article.
- Learn more about OpenOffice.org at the home page, and especially the page dedicated to the role of XML in the project. Also, look at several third-party resources for OpenOffice.org, including the OpenOffice.org Utility Library and OOoMacros.org.
- Read about the XML file format for OpenOffice.org in Uche Ogbuji's article "The open office file format" (developerWorks, January 2003), an installment in his Thinking XML column.
- For more on EXSLT, try these resources:
- Read Uche Ogbuji's article "EXSLT by example" (developerWorks, February 2003).
- Visit the EXSLT home page for details on all the modules, elements, and functions. In particular, see the EXSLT dynamic module.
- Learn how to contribute your own additions to EXSLT, if you'd like.
- For discussion of EXSLT, join the EXSLT mailing list.
- Check out 4XSLT, the stylesheet processor used to test the examples. 4XSLT supports EXSLT and is part of 4Suite, which is co-developed by Uche Ogbuji.
- Find more XML resources on the developerWorks XML zone.
Browse for books on these and other technical topics.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a consulting firm specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, the open source platform for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can reach him at firstname.lastname@example.org.