Reading data from Gnumeric spreadsheets directly through XML from external applications

The best of both worlds in account reporting


Content series:

This content is part # of # in the series: Tip

Stay tuned for additional content in this series.

This content is part of the series:Tip

Stay tuned for additional content in this series.

Dynamics and statics in accounting

Financial programmers and users frequently find that a spreadsheet format is good for some aspects of basic accounting data; changes are immediately reflected, and you can explore trial scenarios without making changes to the fundamental accounting records.

However, this can present a small problem. Static reports, such as income and expense, trial balance, and balance sheets, might be produced by a different application that needs access to the dynamic data to complete the report. Copying and pasting from one application to the other takes time and is subject to error, and transfer of information using comma-separated values and other techniques is clunky at best.

When the spreadsheet stores its data as XML, an XML-aware static reporting program can read that data directly. This example uses Gnumeric, which stores its data in XML format, as the spreadsheet and PHP, which is able to read the XML directly, as a reporting application.

Example: Depreciation allowances

A typical example concerns depreciation or capital cost allowances (CCA). A machine with a productive life of many years that is used in the production process wears out over time and at the end of its lifetime needs to be replaced. Accountants expense a certain percentage of the depreciated value each year. The tax authority might permit a certain maximum percentage deduction in any year, but it is up to the business to decide what amount, less than or equal to the permitted maximum, suits the business to claim during the current reporting period.

A dynamic spreadsheet can display multiple years, presenting the history of deductions in previous years and those for future years. Solver and other statistical tools help the business owner decide what deduction to take in the current year. When the decision is made, the value needs to be exposed to the static reporting application. There are many ways of doing this, some more effective than others.

Figure 1 shows a simple example. (View a text-only version of Figure 1.)

Figure 1. A simple depreciation or CCA spreadsheet
Screen capture of spreadsheet showing depreciation or CCA example from 2005-2014
Screen capture of spreadsheet showing depreciation or CCA example from 2005-2014

A machine was purchased in 2005 at the beginning of the year for $30,000. The tax authority allows a maximum of 20% yearly depreciation on this type of machine. In the first two years the business decided to use 10%. Now, in 2010, the owner is thinking of using 15%. Changing that value in the spreadsheet (cell C8) immediately updates the following years' entries in the spreadsheet.

The raw spreadsheet data

Because the uncompressed Gnumeric file is pure XML, you can get an idea of the structure of the data with a basic text editor. Note that Gnumeric can store the spreadsheet data in a compressed format. To view the data directly, make sure the spreadsheet is stored using zero compression from the Gnumeric preferences.

A better way to view the overall XML structure of the Gnumeric file is to copy it to a new file with an .xml extension and then open it with an XML-aware browser. This allows you to collapse or expand elements as needed.

Much of the data, while important in a spreadsheet context, is superfluous to simple data exchange, so you need to focus on the real data.

Listing 1 is an extract from a data sheet as saved by Gnumeric.

Listing 1. An extract from the data file
<?xml version="1.0" encoding="UTF-8"?>
<gnm:Workbook xmlns:gnm="" 
  <gnm:Version Epoch="1" Major="9" Minor="13" Full="1.9.13"/>
  <gnm:Attributes ...>
    <gnm:Sheet ... >
        <gnm:Cell Row="0" Col="0" ValueType="60">A depreciation example</gnm:Cell>

The root element is <gnm:Workbook ... >, where gnm is a prefix and indicates a namespace. The Workbook element has a number of child elements, one of which is Sheets. In turn, this element has other children including Sheet elements, each of which contains the data in the Cell elements, which are children of an umbrella Cells element. So you have to drill down four levels to get the data. In this listing, you see that the cell at row zero and column zero contains the data string A depreciation example. Row zero and column zero correspond to cell A1 in the spreadsheet. Also note the name of the sheet, Sheet1, is stored in its own child element whose parent is Sheet.

Extract the data with PHP

Now that you know where the data is stored in the document, it ought to be a simple matter to pull it out with a PHP routine. However, you will quickly find a couple issues that make the operation non-trivial:

  • Namespaces
  • Calculated cell contents


Namespaces do a lot to ensure that the XML conforms to an expected schema but add complexity for retrieval of information. SimpleXML functions in PHP fail unless the namespace is taken into account. Calls to methods such as children() need to pass arguments that describe the namespace prefix and indicate whether it is used as a prefix. In addition, retrieval of the values of attributes associated with namespace-prefixed elements is slightly different.

Calculated cell contents

Cells in spreadsheets can contain fixed values or calculated expressions. If you ask the static reporting engine to look for the contents of a calculated expression, you incur work to recalculate the value in the same way that the spreadsheet does. So, in fact, it saves effort if you can refer to only cells that contain fixed values.

An example PHP function

Listing 2 is an example PHP function.

Listing 2. Extracting the data
function read_cca_data() {
  $ccafile = "sscca.gnumeric";
  if (!file_exists($ccafile)) {
    die("Problem: CCA file $ccafile not found");
  } else {
    $xml = simplexml_load_file($ccafile);
    foreach ($xml->children('gnm',TRUE) as $child1) {
      if ($child1->getName() == "Sheets") {
        foreach ($child1->children('gnm',TRUE) as $child2) {
          foreach ($child2->children('gnm',TRUE) as $child3) {
            if ($child3->getName() == "Name") {
              if ($child3 != "Sheet1") break 2;
            if ($child3->getName() == "Cells") {
              foreach ($child3->children('gnm',TRUE) as $child4) {
                // child4 is a cell
                $row = (int) $child4->attributes()->Row;
                $col = (int) $child4->attributes()->Col;
                $val = "$child4"; // quotes evaluate to the type, string or numeric
                $val = (is_numeric($val)) ? round($val,3) : $val;
                $ccaarr[$row][$col] = $val;
  return $ccaarr;

The code begins by defining the name of the Gnumeric file, followed by a trap that exits PHP if the file is not found. If the file is present, it is loaded into a SimpleXML object, and it drills down four levels ($child1, $child2, $child3, $child4) to find the data in individual cells. At several levels it checks to see that it has the right child by verifying the name of the element or the name of the sheet. Each call to the children() method specifies the namespace and that it is used as a prefix. At the cell level, it gets the value (the row or column number) stored in the Row and Col attributes (for example, $child4->attributes()->Row), casts the value into an integer, and stores it for later use. If the value found in the cell is numeric, the value is rounded to three decimal places. The value, string or numeric, is then stored in a two-dimensional array, which is returned to the calling routine after the whole sheet has been read. The calling program can then pick and choose values from the array as required using known subscripts.

Gnumeric and PHP working together

Returning to the depreciation example, Figure 2 shows a modification of Figure 1. (View a text-only version of Figure 2.)

Figure 2. Another simple depreciation or CCA spreadsheet
Screen capture of spreadsheet showing depreciation or CCA example extended
Screen capture of spreadsheet showing depreciation or CCA example extended

The new range F3:H12 is a copy and special paste as values of the range B3:D12. Because some of the values in the original range are calculated, the new range is helpful because its values are fixed and, therefore, directly readable. You can easily automate the copy-and-paste operation in the spreadsheet. Alternatively, considering the raw data is available in the array, you can go through the recalculation of the needed value. The numbers 5, 6, and 7 in row 2 are there simply as a reminder of the column number, which starts from zero at column A. Column E is deliberately left blank as a spacer—you must remember that it is still recorded as a cell, but with no value, in the XML file.

Calling the function in Listing 2 from the reporting application creates a multidimensional array—some sample values are $ccaarr[2][0]=2005 and $ccaarr[2][6]=0.1.


As you can see, the combination of Gnumeric and PHP encompasses the best of both worlds. The Gnumeric spreadsheet accommodates changeable information and stores the data in XML. The PHP reporting application can read the XML directly, so it is not deprived of accurate data.

The file (see Download) contains the Gnumeric file used in this article together with a basic implementation of the PHP function described in Listing 2

Downloadable resources

Related topics

  • Use the YouTube API with PHP (Vikram Vaswani, developerWorks, April 2008): In this article, explore an example of using namespaces with PHP in a different context.
  • Gnumeric: Learn more about this spreadsheet application.
  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks. Also, read more XML tips.
  • IBM trial software: Build your next development project with software, available for download directly from developerWorks.


Sign in or register to add and subscribe to comments.

Zone=XML, Open source
ArticleTitle=Tip: Reading data from Gnumeric spreadsheets directly through XML from external applications