Open output: Producing ODF spreadsheets from your Web services

Directly generate files with PHP and Python

Whenever a Web page or service provides data, users particularly appreciate getting it in spreadsheet format or at least in a format they can easily load into a spreadsheet. This article shows how to produce Open Document Format (ODF) spreadsheet files (or, ODS) either by directly creating them byte by byte (which requires a study of the inner structure of ODS files), or through specific libraries that simplify the work. You also get a glance into producing CSV files—not just because they are a sort of "lowest common denominator" interchange format, but because you can convert them automatically into ODS files.

Before you start

Begin by getting some data. I worked with a simple database (see Listing 1) that includes the countries, regions, and cities of the world—about three million records. I started with a free table of cities around the world (see Related topics for a link) and added the ISO 3166 table of country codes plus both the ISO 3166-2 and FIPS 10-4 tables of region codes, because the former codes (instead of the more standard codes in the latter) were used for the United States. I added the completeCities view just to simplify the code examples. Basically, understand that:

  • Countries are identified by a code (such as UY for Uruguay) and have a name.
  • Countries have regions, identified by a code (unique for a country) and with a name.
  • Cities are in a region of a country and have a name (in two versions: a plain, unaccented ASCII name and a foreign-characters name), a population (if known), and geographical coordinates.
Listing 1. Creating the view that you will be querying
  COLLATE utf8_general_ci;

USE world;

CREATE TABLE countries (
  countryCode char(2) NOT NULL,
  countryName varchar(50) NOT NULL,
  PRIMARY KEY (countryCode),
  KEY countryName (countryName)

CREATE TABLE regions (
  countryCode char(2) NOT NULL,
  regionCode char(2) NOT NULL,
  regionName varchar(50) NOT NULL,
  PRIMARY KEY (countryCode,regionCode),
  KEY regionName (regionName)

  countryCode char(2) NOT NULL,
  cityName varchar(50) NOT NULL,
  cityAccentedName varchar(50) NOT NULL,
  regionCode char(2) NOT NULL,
  population bigint(20) NOT NULL,
  latitude float(10,7) NOT NULL,
  longitude float(10,7) NOT NULL,
  KEY `INDEX` (countryCode,regionCode,cityName),
  KEY cityName (cityName),
  KEY cityAccentedName (cityAccentedName)

CREATE VIEW completeCities AS
    co.countryCode AS countryCode,
    co.countryName AS countryName,
    re.regionCode AS regionCode,
    re.regionName AS regionName,
    ci.cityName AS cityName,
    ci.population AS population,
    ci.latitude AS latitude,
    ci.longitude AS longitude
  FROM cities ci
    JOIN regions re ON re.countryCode=ci.countryCode
        AND re.regionCode=ci.regionCode
    JOIN countries co ON co.countryCode=re.countryCode
  ORDER BY 2,4,5;

I also set up a simple page to test the services. The page lets you enter a string, and the services get the data on all the cities whose names begin with that string (by running SELECT * FROM completeCities WHERE cityName LIKE '...%'). The page (see Figure 1) is as simple as it gets: You just need a text box for the string and a button for each service.

Figure 1. A simple page to allow you to call the different services
Page begins with a text field labled 'Start of city name.' Below this field are      buttons to get it via: PHP generated CSV, PHP generated CSV variant, Python generated      CSV, and so on.
Page begins with a text field labled 'Start of city name.' Below this field are buttons to get it via: PHP generated CSV, PHP generated CSV variant, Python generated CSV, and so on.

Clicking any of the buttons invokes the corresponding service, which produces either a CSV or an ODS file (see Figure 2). To be on the safe side—and to make sure there were no incompatibilities—I tried opening all produced files with both KOffice KSpread and Calc.

Figure 2. All buttons produce the same result but in different ways
The previous screen from figure 1 is in the background. A button on the page has      been clicked, and a pop-up window overlays the screen. The window's title is 'Opening      xml_1_php.ods'. User can cancel or click OK.
The previous screen from figure 1 is in the background. A button on the page has been clicked, and a pop-up window overlays the screen. The window's title is 'Opening xml_1_php.ods'. User can cancel or click OK.

Producing CSV files

Start by producing simple CSV files. CSV files are typically accepted by all kinds of software and can be massaged automatically into ODS files (though not without some setup inconveniences).

Producing CSV files with PHP is quite easy (see the code in Listing 2). After getting the desired data, it's simply a matter of going through the results and printing the fields one at a time. I limited the SELECT output to 1,000 records, but I could have gone up to 65,536, which is the maximum number of rows for Calc (and, coincidentally, Microsoft® Office Excel®), or just 32,767 for KOffice KSpread. Note the need for escaping the field values by using addslashes(); otherwise, values with quotation marks would break the code.

Listing 2. Csv_1.php produces a simple CSV file
// Get the data:

$start= addslashes($_REQUEST["start"]);
$conn= mysql_connect("localhost", "testuser", "testpass");
$db= mysql_select_db("world");
$cur= mysql_query("SELECT * FROM completeCities ".
    "WHERE cityName LIKE '{$start}%' LIMIT 1000");

// Send out the data, with headers identifying it as CSV:

header("Content-type: text/csv");
header("Content-Disposition: attachment; filename=csv_1.csv");

while ($row= mysql_fetch_assoc($cur)) {
    $sep= "";
    foreach ($row as $value) {
        $item= is_numeric($value) ? $value : '"'.addslashes($value).'"';
        echo $sep.$item;
        $sep= ',';
    echo "\n";

You could program the main loop more elegantly by using fputcsv(), which handles formatting problems (see Listing 3). Using tmpfile() avoids collisions should the Web script be called by several users at the same time. When the file is ready, the same headers as in Listing 2 are sent, and then you need to read in the temporary file's contents and print them.

Listing 3. A variation (csv_2.php) uses fputcsv, one of PHP's CSV functions
// ...generate results...

$handle= tmpfile();
while ($row= mysql_fetch_assoc($cur)) {
    fputcsv($handle, $row);

// ...put out headers...

while ($contents= fread($handle, 1024)) {
    print $contents;

// ...clean up code...

Python's csv module makes things even easier, as Listing 4 shows. The method for getting the data is similar to PHP's. Creating the CSV files requires defining which delimiter to use (a comma [,]), and which fields to quote; I opted to quote all non-numeric fields. Using a TemporaryFile saves clean-up code; in Python version 2.6, a SpooledTemporaryFile would be even better, because data is kept in memory unless the file becomes too big. The csv.writer method produces a CSV file from an iterable object; cursor.fetchall() is a bit of a beast, and four lines are enough to produce the CSV output. Then, just as in the previous PHP versions, you need only output headers, followed by the data itself from the temporary file.

Listing 4.
def index(req):
    # ...imports...

    # Get the data:

    start= req.form["start"]
    conn= MySQLdb.connect(host= "localhost", user= "testuser",
        passwd= "testpass", db= "world")
    cursor= conn.cursor()
    cursor.execute("""SELECT * FROM completeCities WHERE
        cityName LIKE %s LIMIT 1000""", start+"%")

    #   Create the CSV file:

    csv.register_dialect("simple", delimiter= ',', quoting= csv.QUOTE_NONNUMERIC)
    myFile= tempfile.TemporaryFile()
    obj= csv.writer(myFile, dialect= "simple")

    # ...clean up...

    # Send back the data, with headers identifying the data as CSV:

    req.headers_out.add("Content-type", "text/csv");

What does an ODS file look like?

ODF files are actually ZIP files that include a slew of files and directories. Not all contents are mandatory; for example, I created simple spreadsheets (with only IBM in cell A1) with both KOffice KSpread and Calc, and then I extracted the resulting ODS files to see what they included. Listing 5 shows the result.

Listing 5. Examining the contents of an ODS file
# unzip -l kspread_ibm.ods
Archive:  kspread_ibm.ods
  Length     Date   Time    Name
 --------    ----   ----    ----
       46  08-21-09 14:00   mimetype
     2092  08-21-09 14:00   content.xml
     2631  08-21-09 14:00   styles.xml
     6342  08-21-09 14:00   settings.xml
      634  08-21-09 14:00   meta.xml
     1171  08-21-09 14:00   Thumbnails/thumbnail.png
      786  08-21-09 14:00   META-INF/manifest.xml
 --------                   -------
    13702                   7 files

# unzip -l openoffice_ibm.ods
Archive:  openoffice_ibm.ods
  Length     Date   Time    Name
 --------    ----   ----    ----
       46  08-21-09 17:00   mimetype
        0  08-21-09 17:00   Configurations2/statusbar/
        0  08-21-09 17:00   Configurations2/accelerator/current.xml
        0  08-21-09 17:00   Configurations2/floater/
        0  08-21-09 17:00   Configurations2/popupmenu/
        0  08-21-09 17:00   Configurations2/progressbar/
        0  08-21-09 17:00   Configurations2/menubar/
        0  08-21-09 17:00   Configurations2/toolbar/
        0  08-21-09 17:00   Configurations2/images/Bitmaps/
     3808  08-21-09 17:00   content.xml
     6411  08-21-09 17:00   styles.xml
      876  08-21-09 17:00   meta.xml
     1012  08-21-09 17:00   Thumbnails/thumbnail.png
     7226  08-21-09 17:00   settings.xml
     1896  08-21-09 17:00   META-INF/manifest.xml
 --------                   -------
    21275                   15 files

In both cases, the first included file is mimetype, which contains application/vnd.oasis.opendocument.spreadsheet. This file must be the first stream of the package's ZIP file.

Another common file is thumbnail.png: ODF files include, for presentation purposes, a 128 x 128 thumbnail representation of the saved document. However, the specification doesn't mandate including that image, so for the purposes of this example, you can skip that step.

Similarly, you can do away with most of the other files, but you must include the META-INF directory with the manifest.xml file, which describes all the other files included in the ZIP, and the contents.xml file, which stores the actual spreadsheet contents. A bit of experimentation confirmed that both KOffice KSpread and Calc could deal with such minimal contents, so I just needed to create three files:

  • The mimetype file is constant, so producing it is trivial.
  • For the reduced set of contents, the manifest.xml file is just a few lines long, as shown in Listing 6.
  • The more complicated file is contents.xml.
Listing 6. A minimalistic manifest.xml file
<?xml version='1.0' encoding='UTF-8'?>
   manifest:full-path='/' />
   manifest:full-path='content.xml' />

Basically, the XML contents document contains an office:spreadsheet element, which itself includes a table:table element representing each individual sheet in the spreadsheet. This element itself includes table:table-row elements (one per row), with table:table-cell elements for sequential cells in the row, as shown in Listing 7.

Listing 7. A sample contents file with just one cell
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content ...many snipped attributes...>
<office:automatic-styles />
      <table:table table:name="the sheet name">

Note that this kind of minimalistic contents file won't allow any styling, but you'll get to that later. Let's start producing actual ODS files.

Directly generating ODS files through XML

Because XML files are text files and compressing them is easily done from the command line, it's trivial to produce ODS files with any scripting language. This article provides two methods of doing so: a plain way with PHP, and a more elaborate way, using appropriate modules, with Python. (There are several XML and ZIP packages for PHP, should you want to do a more refined job.) Let's start with the simpler version, shown in Listing 8. After getting the data (in the same way as in previous listings), you must create the contents.xml file; a constant header is first included, followed by the result data, row by row and cell by cell, ending with a final footer. The manifest.xml and mimetype files are easily generated by using file_put_contents(). Then, you compress all files, put out the contents of the resulting ZIP file preceded by appropriate headers, and delete all extra files and directories to clean up.

Listing 8. Xml_1.php
// ...get the data...

    Define the constants that will be needed for the text files
  (The constants were somewhat abridged; see the original source code.)

define(MIMETYPE, "application/vnd.oasis.opendocument.spreadsheet");

    "<?xml version='1.0' encoding='UTF-8'?>\n".
    "<manifest:manifest> ... </manifest:manifest>");

    "<?xml version='1.0' encoding='UTF-8'?> ... ".
    "<office:body><office:spreadsheet><table:table table:name='Results'>");

define(XML_ROW_START, "<table:table-row>");

define(XML_CELL_START, "<table:table-cell><text:p>");

define(XML_CELL_END, "</text:p></table:table-cell>");

define(XML_ROW_END, "</table:table-row>");


// Create the content.xml file:

$contents= XML_START;
while ($row= mysql_fetch_assoc($cur)) {
    $contents.= XML_ROW_START;
    foreach ($row as $value) {
        $contents.= XML_CELL_START;
        $contents.= htmlentities($value);
        $contents.= XML_CELL_END;
    $contents.= XML_ROW_END;
$contents.= XML_END;

// let $tempzip be the name of a temporary file

file_put_contents($tempzip."/META-INF/manifest.xml", XML_MANIFEST);
file_put_contents($tempzip."/content.xml", $contents);
file_put_contents($tempzip."/mimetype", MIMETYPE);
system("cd {$tempzip}; zip -mr {$tempzip} mimetype META-INF/* content.xml >/dev/null");

// Put out the data:

header("Content-Type: application/vnd.oasis.opendocument.spreadsheet");
header("Content-Disposition: attachment; filename=xml_1.ods");
header("Content-Transfer-Encoding: binary");

// ...clean up, using unlink() and rmdir() to delete all created files

Now, let's turn to Python and go for a more "modular" version by creating XML objects in memory, dumping them to files, then using the zip module to produce the desired ODS file, as shown in Listing 9. Getting the data is the same as in Listing 4. The manifestXml object can be created with just a few lines of code because its contents are fixed. Building up the contentXml object is more arduous, because it's a larger, more complex structure; note that you need to do a loop for each cursor row (creating a row in the XML object), then again a loop for each data field (adding cells to each previously created row). After getting everything ready, it's just a matter of writing the actual files, using zip to create the desired ZIP file, and finishing by putting out the output headers followed by the contents of the zipped structure.

Listing 9.
def index(req):
    # ...imports...
    # ...get the data...
    # ...create the manifestXml object...

    # Create the contentXml document:

    contentXml= getDOMImplementation().createDocument("office",
        "office:document-content", None)
    contentXml.documentElement.setAttribute("office:version", "1.1")

    # ...add more attributes to the contentXml object...
    # ...add an empty "office:automatic-styles" element to the document...

    obd= contentXml.createElement("office:body")

    oss= contentXml.createElement("office:spreadsheet")

    table= contentXml.createElement("table:table")
    table.setAttribute("table:name", "Results")

    # Each cursor row becomes a row in the table; each field, a cell:

    for datarow in cursor.fetchall():
        tablerow= contentXml.createElement("table:table-row")
        for datafield in datarow:
            cell= contentXml.createElement("table:table-cell")
            text= contentXml.createElement("text:p")

    # Create all required directories and files:

    tempDir= tempfile.mkdtemp("", "xmlpy")

    contentFile= open(tempDir+"/content.xml", "w")

    # ...create files "mimetype" and "META-INF/manifest.xml" similarly...

    # Zip everything:

    myZip= zipfile.ZipFile(tempDir+".zip", "w")

    # the contents of the created zip file into variable dataToReturn
    # ...clean up, by using os.remove() and os.rmdir()
    # ...send back dataToReturn, with appropriate headers

The code is verbose, and you could have done things the same way as you did with PHP, but I wanted to show different ways of tackling the same problem. The next section introduces some libraries that can help cut down the coding even more.

Producing ODS through specific libraries

Creating the XML files by hand is interesting, but fortunately there are libraries that can produce ODS documents directly. I used ods-php, which even at version 0.1 (and a release candidate at that) gets the job done. (You can also use this class for reading ODF files.) On the minus side, there's no documentation apart from the PHP code itself, so getting this library to work entailed some guesswork.

Getting the data is the same as earlier. Producing an ODS file requires creating a newOds() object and adding cells to it with the addCell method. Cells are identified by row and column, starting at 0; cell A1 would be row 0, column 0. After readying the object, the saveOds method saves it to disk in the correct ODS format, and all that remains is putting out the appropriate headers, followed by the ODS file contents, as shown in Listing 10. Cleaning up requires deleting the ODS file you just created.

Listing 10. Ods_1.php
// ...get the data...

// Create an ODS object and load data into it:

$object= newOds();
for ($curRow=0; $row= mysql_fetch_assoc($cur); $curRow++) {
    $curCol= 0;
    foreach ($row as $value) {
        $type= is_numeric($value) ? "float" : "string";
        $object->addCell(0, $curRow, $curCol, $value, $type);

// Write the object to a temporary file:

$tempname= tempnam("./", "odsphp");
$tempname.= ".ods";
saveOds($object, $tempname);

// ...send out the contents of the $tempname file, with appropriate headers...
// ...clean up...

Python's Odfpy module provides a similar but more filled out library. You can build all kinds of ODF files from scratch or load an existing document into memory, change it, and save it back again. After getting the data (in similar fashion as before), to create an ODS file, you must create a document with OpenDocumentSpreadsheet(); then, create and add a table to it, and finally, insert the data by first adding rows to the table, and then adding cells to the rows, as shown in Listing 11. The final part of the code should be familiar by now: Put out headers, get and put the contents of the produced ODS file, and clean up by deleting the extra files.

Listing 11.
def index(req):
    # ...imports...

    # ...get the data...

    # Build the ODS object, row by row and cell by cell:

    doc= OpenDocumentSpreadsheet()
    table= Table(name="Results")

    for cursorRow in cursor.fetchall():
        tr= TableRow()
        for val in cursorRow:
                tc= TableCell()

    myFile= tempfile.TemporaryFile()

    # ...clean up...

    # ...send back the contents of myFile...
    # ...with headers identifying the data as ODS...

Check the odfpy package for more options. Specifically, you may be interested in the xml2odf script, which can help produce the final ODS file. Now, let's start thinking about dressing up the ODS file a bit for a more appealing look.

Jazzing it up

So far, you've been successful in creating ODS files in several different ways, but the results are—to put it mildly—plain (see Figure 3). So, let's examine two ways of including styled text in your output: the plain way, by directly producing appropriate XML files in PHP, and a more sophisticated way using the Odfpy library in Python.

Figure 3. The results so far
A plain-looking spreadsheet         listing the results.
A plain-looking spreadsheet listing the results.

Using styles isn't terribly complicated, but there are many things to consider. In this case, I wanted a big, bold, blue style for a title, and a bold over grey style for the column headings. I decided to go with automatic styles, which are easier to use; these styles are created automatically (hence the name) whenever you apply format by hand to any cell and are included within the content.xml file instead of separately. The office:automatic-styles element of your document should look something like Listing 12.

Listing 12. Producing some extra XML for a jazzed-up spreadsheet


Working in PHP, the code is practically the same as Listing 10, but you have to change the XML document header to include the required automatic styles element. Note that I also defined two new cell prefixes, each including an appropriate table:style-name attribute. Finally, it's just a matter of adding the new main title, an empty row for spacing, and a row with the column titles, as Listing 13 shows.

Listing 13. Xml_3.php
// ...everything is the same, just up to XML_START:

        "<style:style style:name='bbb' style:display-name='bbb' ".
            "<style:text-properties fo:font-weight='bold' ".
            "fo:color='#0000ff' fo:font-size='15'/>".
        "<style:style style:name='bld'  style:display-name='bld' ".
            "<style:text-properties fo:font-weight='bold'/>".
            "<style:table-cell-properties fo:background-color='#AEAEAE'/>".

    "<?xml version='1.0' encoding='UTF-8'?>\n".
    "<office:document-content ".
        //...many lines...
    "<table:table table:name='Results'>");

// ...more define() lines, as earlier, and two new definitions:

    "<table:table-cell table:style-name='bbb'><text:p>");

    "<table:table-cell table:style-name='bld'><text:p>");

// ...then, everything the same, up to:

$contents= XML_START;

// Add a big, bold, blue, title, and an empty line:

$contents.= XML_ROW_START;
$contents.= XML_BBB_CELL_START;
$contents.= "Cities whose name starts with '".$start."'";
$contents.= XML_CELL_END;
$contents.= XML_ROW_END;

$contents.= XML_ROW_START;
$contents.= XML_ROW_END;

// Add some titles, in bold:

$contents.= XML_ROW_START;
foreach (array("Country","","Region","","City","Pop","Lat","Long") as $title) {
    $contents.= XML_BLD_CELL_START;
    $contents.= $title;
    $contents.= XML_CELL_END;
$contents.= XML_ROW_END;

// ...everything is the same to the end

Turning to Python, creating styles with odfpy isn't difficult, but because the documentation isn't as helpful, I had to run several experiments and compare the results I was getting with the contents of an Calc document. You need to create the new styles and add them to the automaticstyles part of the document. Given that, adding a title or column headers is easy: You just have to create a cell, specifying the desired stylename and nothing else (see Listing 14). Note that the rest of the code is more or less the same as Listing 11.

Listing 14.
def index(req)
    # ...everything the same as in, up to including these lines:

    doc= OpenDocumentSpreadsheet()
    table= Table(name="Results")

    # Define a "bold big blue" style, and a simple bold on grey one:

    bbb= Style(name="bbb", family="table-cell")
    bbb.addElement(TextProperties(fontweight="bold", fontsize="13", color="#0000ff"))

    bld= Style(name="bld", family="table-cell")

    # Add a listing description, in the bold big blue style, and skip a row:

    tr= TableRow()
    tc= TableCell(stylename="bbb")
    tc.addElement(P(text="Cities whose name starts with '"+start+"'"))


    # Add some column titles, in the simple bold style:

    tr= TableRow()
    for myText in ["Country", "", "Region", "", "City", "Pop", "Lat", "Long"]:
        tc= TableCell(stylename="bld")

    # ...add the data, create the ODS, clean up; everything the same from here onwards

The results of the styling are, although not probably deserving of a styling award, at least better looking! See Figure 4.

Figure 4. Adding titles and some styling enhances the results.
More stylish results, with a         title in blue and a title row with a grey background and row headings in bold.
More stylish results, with a title in blue and a title row with a grey background and row headings in bold.

Now, you can start thinking about including multiple pages or sheets in the same spreadsheet, adding formulas, and even including graphs so that you can really crank up the level of your output!


This article examined several ways of producing tabular data in standard formats, from a basic CSV file to a full ODS file, doing the latter either by hand (by manually producing all required files, directories, and zipped results), or by using appropriate libraries. With a bit of extra work, you can also produce nice-looking spreadsheets. Users appreciate having their work made easier, and producing ready-to-use spreadsheets fits the bill. Now, you can start adding this functionality to your own Web pages and services!

Downloadable resources

Related topics

  • "Patterns + GWT + Ajax = Usability!" (Federico Kereki, developerWorks, July 2009): See another way to use the database presented in this article.
  • Cities table: MaxMind offers this free table of the world's cities. ISO provides the ISO 3166 family of codes, both for countries and for regions (3166-2). Get the FIPS region codes for an alternative identification scheme.
  • ODF 1.1 specification (PDF): Get the current standard (until version 1.2 comes out).
  • Odfpy: Odfpy simplifies the process of producing ODF files directly.
  • ods-php: This package is similar to Odfpy (although with fewer options) for PHP.
  • PyODConverter: This tool lets you convert CSV into ODS and do many more similar conversions, but you may have problems setting it up. Another possibility is JodConverter, found at the same Web site.
  • Python-UNO bridge: This tool provides another way to interact with Calc and produce an ODS file out of a CSV file, but you may also find installation and setup isn't straightforward because of version problems.
  • IBM product evaluation versions: Download these versions today and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Zone=Web development
ArticleTitle=Open output: Producing ODF spreadsheets from your Web services