Tip: Reading data from Gnumeric spreadsheets directly through XML from external applications

The best of both worlds in account reporting

When keeping accounts, bookkeepers often like to manage dynamic data using spreadsheets and produce static reports with a different application. However, allowing the static reporting program to read directly from the spreadsheet can be problematic. With Gnumeric as the spreadsheet and PHP as the reporting application, this article shows how spreadsheet data stored as XML, with proper management of namespaces, allows reading of data directly from the spreadsheet. You save time, increase accuracy, and avoid copy-and-paste and other errors.

Colin Beckingham, Writer and Researcher, Freelance

Colin Beckingham is a freelance researcher, writer, and programmer who lives in eastern Ontario, Canada. Holding degrees from Queen's University, Kingston, and the University of Windsor, he has worked in a rich variety of fields including banking, horticulture, horse racing, teaching, civil service, retail, and travel and tourism. The author of database applications and numerous newspaper, magazine, and online articles, his research interests include open source programming, VoIP, and voice-control applications on Linux. You can reach Colin at colbec@start.ca.



06 July 2010

Also available in Chinese Russian Spanish

Dynamics and statics in accounting

Financial programmers and users frequently find that a spreadsheet format is good for some aspects of basic accounting data; changes are immediately reflected, and you can explore trial scenarios without making changes to the fundamental accounting records.

Develop skills on this topic

This content is part of a progressive knowledge path for advancing your skills. See XML and data compression

However, this can present a small problem. Static reports, such as income and expense, trial balance, and balance sheets, might be produced by a different application that needs access to the dynamic data to complete the report. Copying and pasting from one application to the other takes time and is subject to error, and transfer of information using comma-separated values and other techniques is clunky at best.

When the spreadsheet stores its data as XML, an XML-aware static reporting program can read that data directly. This example uses Gnumeric, which stores its data in XML format, as the spreadsheet and PHP, which is able to read the XML directly, as a reporting application.

Example: Depreciation allowances

A typical example concerns depreciation or capital cost allowances (CCA). A machine with a productive life of many years that is used in the production process wears out over time and at the end of its lifetime needs to be replaced. Accountants expense a certain percentage of the depreciated value each year. The tax authority might permit a certain maximum percentage deduction in any year, but it is up to the business to decide what amount, less than or equal to the permitted maximum, suits the business to claim during the current reporting period.

A dynamic spreadsheet can display multiple years, presenting the history of deductions in previous years and those for future years. Solver and other statistical tools help the business owner decide what deduction to take in the current year. When the decision is made, the value needs to be exposed to the static reporting application. There are many ways of doing this, some more effective than others.

Figure 1 shows a simple example. (View a text-only version of Figure 1.)

Figure 1. A simple depreciation or CCA spreadsheet
Screen capture of spreadsheet showing depreciation or CCA example from 2005-2014

A machine was purchased in 2005 at the beginning of the year for $30,000. The tax authority allows a maximum of 20% yearly depreciation on this type of machine. In the first two years the business decided to use 10%. Now, in 2010, the owner is thinking of using 15%. Changing that value in the spreadsheet (cell C8) immediately updates the following years' entries in the spreadsheet.

The raw spreadsheet data

Because the uncompressed Gnumeric file is pure XML, you can get an idea of the structure of the data with a basic text editor. Note that Gnumeric can store the spreadsheet data in a compressed format. To view the data directly, make sure the spreadsheet is stored using zero compression from the Gnumeric preferences.

A better way to view the overall XML structure of the Gnumeric file is to copy it to a new file with an .xml extension and then open it with an XML-aware browser. This allows you to collapse or expand elements as needed.

Much of the data, while important in a spreadsheet context, is superfluous to simple data exchange, so you need to focus on the real data.

Listing 1 is an extract from a data sheet as saved by Gnumeric.

Listing 1. An extract from the data file
<?xml version="1.0" encoding="UTF-8"?>
<gnm:Workbook xmlns:gnm="http://www.gnumeric.org/v10.dtd" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.gnumeric.org/v9.xsd">
  <gnm:Version Epoch="1" Major="9" Minor="13" Full="1.9.13"/>
  <gnm:Attributes ...>
  ...
  <gnm:Sheets>
    ...
    <gnm:Sheet ... >
      ...
      <gnm:Name>Sheet1</gnm:Name>
      ...
      <gnm:Cells>
        <gnm:Cell Row="0" Col="0" ValueType="60">A depreciation example</gnm:Cell>
        ...

The root element is <gnm:Workbook ... >, where gnm is a prefix and indicates a namespace. The Workbook element has a number of child elements, one of which is Sheets. In turn, this element has other children including Sheet elements, each of which contains the data in the Cell elements, which are children of an umbrella Cells element. So you have to drill down four levels to get the data. In this listing, you see that the cell at row zero and column zero contains the data string A depreciation example. Row zero and column zero correspond to cell A1 in the spreadsheet. Also note the name of the sheet, Sheet1, is stored in its own child element whose parent is Sheet.

Extract the data with PHP

Now that you know where the data is stored in the document, it ought to be a simple matter to pull it out with a PHP routine. However, you will quickly find a couple issues that make the operation non-trivial:

  • Namespaces
  • Calculated cell contents

Namespaces

Namespaces do a lot to ensure that the XML conforms to an expected schema but add complexity for retrieval of information. SimpleXML functions in PHP fail unless the namespace is taken into account. Calls to methods such as children() need to pass arguments that describe the namespace prefix and indicate whether it is used as a prefix. In addition, retrieval of the values of attributes associated with namespace-prefixed elements is slightly different.

Calculated cell contents

Cells in spreadsheets can contain fixed values or calculated expressions. If you ask the static reporting engine to look for the contents of a calculated expression, you incur work to recalculate the value in the same way that the spreadsheet does. So, in fact, it saves effort if you can refer to only cells that contain fixed values.

An example PHP function

Listing 2 is an example PHP function.

Listing 2. Extracting the data
function read_cca_data() {
  $ccafile = "sscca.gnumeric";
  if (!file_exists($ccafile)) {
    die("Problem: CCA file $ccafile not found");
  } else {
    $xml = simplexml_load_file($ccafile);
    foreach ($xml->children('gnm',TRUE) as $child1) {
      if ($child1->getName() == "Sheets") {
        foreach ($child1->children('gnm',TRUE) as $child2) {
          foreach ($child2->children('gnm',TRUE) as $child3) {
            if ($child3->getName() == "Name") {
              if ($child3 != "Sheet1") break 2;
            }
            if ($child3->getName() == "Cells") {
              foreach ($child3->children('gnm',TRUE) as $child4) {
                // child4 is a cell
                $row = (int) $child4->attributes()->Row;
                $col = (int) $child4->attributes()->Col;
                $val = "$child4"; // quotes evaluate to the type, string or numeric
                $val = (is_numeric($val)) ? round($val,3) : $val;
                $ccaarr[$row][$col] = $val;
              }
            }
          }
        }
      }
    }
  }
  return $ccaarr;
}

The code begins by defining the name of the Gnumeric file, followed by a trap that exits PHP if the file is not found. If the file is present, it is loaded into a SimpleXML object, and it drills down four levels ($child1, $child2, $child3, $child4) to find the data in individual cells. At several levels it checks to see that it has the right child by verifying the name of the element or the name of the sheet. Each call to the children() method specifies the namespace and that it is used as a prefix. At the cell level, it gets the value (the row or column number) stored in the Row and Col attributes (for example, $child4->attributes()->Row), casts the value into an integer, and stores it for later use. If the value found in the cell is numeric, the value is rounded to three decimal places. The value, string or numeric, is then stored in a two-dimensional array, which is returned to the calling routine after the whole sheet has been read. The calling program can then pick and choose values from the array as required using known subscripts.

Gnumeric and PHP working together

Returning to the depreciation example, Figure 2 shows a modification of Figure 1. (View a text-only version of Figure 2.)

Figure 2. Another simple depreciation or CCA spreadsheet
Screen capture of spreadsheet showing depreciation or CCA example extended

The new range F3:H12 is a copy and special paste as values of the range B3:D12. Because some of the values in the original range are calculated, the new range is helpful because its values are fixed and, therefore, directly readable. You can easily automate the copy-and-paste operation in the spreadsheet. Alternatively, considering the raw data is available in the array, you can go through the recalculation of the needed value. The numbers 5, 6, and 7 in row 2 are there simply as a reminder of the column number, which starts from zero at column A. Column E is deliberately left blank as a spacer—you must remember that it is still recorded as a cell, but with no value, in the XML file.

Calling the function in Listing 2 from the reporting application creates a multidimensional array—some sample values are $ccaarr[2][0]=2005 and $ccaarr[2][6]=0.1.

Conclusion

As you can see, the combination of Gnumeric and PHP encompasses the best of both worlds. The Gnumeric spreadsheet accommodates changeable information and stores the data in XML. The PHP reporting application can read the XML directly, so it is not deprived of accurate data.

The testrun.zip file (see Download) contains the Gnumeric file used in this article together with a basic implementation of the PHP function described in Listing 2


Download

DescriptionNameSize
Sample files for this articletestrun.zip4 KB

Resources

Learn

Get products and technologies

  • Gnumeric: Learn more about this spreadsheet application.
  • IBM trial software: Build your next development project with software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into XML on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Open source
ArticleID=498964
ArticleTitle=Tip: Reading data from Gnumeric spreadsheets directly through XML from external applications
publish-date=07062010