Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Tip: Reading data from Gnumeric spreadsheets directly through XML from external applications

The best of both worlds in account reporting

Colin Beckingham, Writer and Researcher, Freelance
Colin Beckingham is a freelance researcher, writer, and programmer who lives in eastern Ontario, Canada. Holding degrees from Queen's University, Kingston, and the University of Windsor, he has worked in a rich variety of fields including banking, horticulture, horse racing, teaching, civil service, retail, and travel and tourism. The author of database applications and numerous newspaper, magazine, and online articles, his research interests include open source programming, VoIP, and voice-control applications on Linux. You can reach Colin at colbec@start.ca.

Summary:  When keeping accounts, bookkeepers often like to manage dynamic data using spreadsheets and produce static reports with a different application. However, allowing the static reporting program to read directly from the spreadsheet can be problematic. With Gnumeric as the spreadsheet and PHP as the reporting application, this article shows how spreadsheet data stored as XML, with proper management of namespaces, allows reading of data directly from the spreadsheet. You save time, increase accuracy, and avoid copy-and-paste and other errors.

View more content in this series

Date:  06 Jul 2010
Level:  Intermediate PDF:  A4 and Letter (253 KB | 8 pages)Get Adobe® Reader®
Also available in:   Chinese

Activity:  6776 views
Comments:  

Dynamics and statics in accounting

Financial programmers and users frequently find that a spreadsheet format is good for some aspects of basic accounting data; changes are immediately reflected, and you can explore trial scenarios without making changes to the fundamental accounting records.

Develop skills on this topic

This content is part of progressive knowledge paths for advancing your skills. See:

However, this can present a small problem. Static reports, such as income and expense, trial balance, and balance sheets, might be produced by a different application that needs access to the dynamic data to complete the report. Copying and pasting from one application to the other takes time and is subject to error, and transfer of information using comma-separated values and other techniques is clunky at best.

When the spreadsheet stores its data as XML, an XML-aware static reporting program can read that data directly. This example uses Gnumeric, which stores its data in XML format, as the spreadsheet and PHP, which is able to read the XML directly, as a reporting application.

Example: Depreciation allowances

A typical example concerns depreciation or capital cost allowances (CCA). A machine with a productive life of many years that is used in the production process wears out over time and at the end of its lifetime needs to be replaced. Accountants expense a certain percentage of the depreciated value each year. The tax authority might permit a certain maximum percentage deduction in any year, but it is up to the business to decide what amount, less than or equal to the permitted maximum, suits the business to claim during the current reporting period.

A dynamic spreadsheet can display multiple years, presenting the history of deductions in previous years and those for future years. Solver and other statistical tools help the business owner decide what deduction to take in the current year. When the decision is made, the value needs to be exposed to the static reporting application. There are many ways of doing this, some more effective than others.

Figure 1 shows a simple example. (View a text-only version of Figure 1.)


Figure 1. A simple depreciation or CCA spreadsheet
Screen capture of spreadsheet showing depreciation or CCA example from 2005-2014

A machine was purchased in 2005 at the beginning of the year for $30,000. The tax authority allows a maximum of 20% yearly depreciation on this type of machine. In the first two years the business decided to use 10%. Now, in 2010, the owner is thinking of using 15%. Changing that value in the spreadsheet (cell C8) immediately updates the following years' entries in the spreadsheet.

The raw spreadsheet data

Because the uncompressed Gnumeric file is pure XML, you can get an idea of the structure of the data with a basic text editor. Note that Gnumeric can store the spreadsheet data in a compressed format. To view the data directly, make sure the spreadsheet is stored using zero compression from the Gnumeric preferences.

A better way to view the overall XML structure of the Gnumeric file is to copy it to a new file with an .xml extension and then open it with an XML-aware browser. This allows you to collapse or expand elements as needed.

Much of the data, while important in a spreadsheet context, is superfluous to simple data exchange, so you need to focus on the real data.

Listing 1 is an extract from a data sheet as saved by Gnumeric.


Listing 1. An extract from the data file

<?xml version="1.0" encoding="UTF-8"?>
<gnm:Workbook xmlns:gnm="http://www.gnumeric.org/v10.dtd" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.gnumeric.org/v9.xsd">
  <gnm:Version Epoch="1" Major="9" Minor="13" Full="1.9.13"/>
  <gnm:Attributes ...>
  ...
  <gnm:Sheets>
    ...
    <gnm:Sheet ... >
      ...
      <gnm:Name>Sheet1</gnm:Name>
      ...
      <gnm:Cells>
        <gnm:Cell Row="0" Col="0" ValueType="60">A depreciation example</gnm:Cell>
        ...

The root element is <gnm:Workbook ... >, where gnm is a prefix and indicates a namespace. The Workbook element has a number of child elements, one of which is Sheets. In turn, this element has other children including Sheet elements, each of which contains the data in the Cell elements, which are children of an umbrella Cells element. So you have to drill down four levels to get the data. In this listing, you see that the cell at row zero and column zero contains the data string A depreciation example. Row zero and column zero correspond to cell A1 in the spreadsheet. Also note the name of the sheet, Sheet1, is stored in its own child element whose parent is Sheet.

Extract the data with PHP

Now that you know where the data is stored in the document, it ought to be a simple matter to pull it out with a PHP routine. However, you will quickly find a couple issues that make the operation non-trivial:

  • Namespaces
  • Calculated cell contents

Namespaces

Namespaces do a lot to ensure that the XML conforms to an expected schema but add complexity for retrieval of information. SimpleXML functions in PHP fail unless the namespace is taken into account. Calls to methods such as children() need to pass arguments that describe the namespace prefix and indicate whether it is used as a prefix. In addition, retrieval of the values of attributes associated with namespace-prefixed elements is slightly different.

Calculated cell contents

Cells in spreadsheets can contain fixed values or calculated expressions. If you ask the static reporting engine to look for the contents of a calculated expression, you incur work to recalculate the value in the same way that the spreadsheet does. So, in fact, it saves effort if you can refer to only cells that contain fixed values.

An example PHP function

Listing 2 is an example PHP function.


Listing 2. Extracting the data

function read_cca_data() {
  $ccafile = "sscca.gnumeric";
  if (!file_exists($ccafile)) {
    die("Problem: CCA file $ccafile not found");
  } else {
    $xml = simplexml_load_file($ccafile);
    foreach ($xml->children('gnm',TRUE) as $child1) {
      if ($child1->getName() == "Sheets") {
        foreach ($child1->children('gnm',TRUE) as $child2) {
          foreach ($child2->children('gnm',TRUE) as $child3) {
            if ($child3->getName() == "Name") {
              if ($child3 != "Sheet1") break 2;
            }
            if ($child3->getName() == "Cells") {
              foreach ($child3->children('gnm',TRUE) as $child4) {
                // child4 is a cell
                $row = (int) $child4->attributes()->Row;
                $col = (int) $child4->attributes()->Col;
                $val = "$child4"; // quotes evaluate to the type, string or numeric
                $val = (is_numeric($val)) ? round($val,3) : $val;
                $ccaarr[$row][$col] = $val;
              }
            }
          }
        }
      }
    }
  }
  return $ccaarr;
}

The code begins by defining the name of the Gnumeric file, followed by a trap that exits PHP if the file is not found. If the file is present, it is loaded into a SimpleXML object, and it drills down four levels ($child1, $child2, $child3, $child4) to find the data in individual cells. At several levels it checks to see that it has the right child by verifying the name of the element or the name of the sheet. Each call to the children() method specifies the namespace and that it is used as a prefix. At the cell level, it gets the value (the row or column number) stored in the Row and Col attributes (for example, $child4->attributes()->Row), casts the value into an integer, and stores it for later use. If the value found in the cell is numeric, the value is rounded to three decimal places. The value, string or numeric, is then stored in a two-dimensional array, which is returned to the calling routine after the whole sheet has been read. The calling program can then pick and choose values from the array as required using known subscripts.

Gnumeric and PHP working together

Returning to the depreciation example, Figure 2 shows a modification of Figure 1. (View a text-only version of Figure 2.)


Figure 2. Another simple depreciation or CCA spreadsheet
Screen capture of spreadsheet showing depreciation or CCA example extended

The new range F3:H12 is a copy and special paste as values of the range B3:D12. Because some of the values in the original range are calculated, the new range is helpful because its values are fixed and, therefore, directly readable. You can easily automate the copy-and-paste operation in the spreadsheet. Alternatively, considering the raw data is available in the array, you can go through the recalculation of the needed value. The numbers 5, 6, and 7 in row 2 are there simply as a reminder of the column number, which starts from zero at column A. Column E is deliberately left blank as a spacer—you must remember that it is still recorded as a cell, but with no value, in the XML file.

Calling the function in Listing 2 from the reporting application creates a multidimensional array—some sample values are $ccaarr[2][0]=2005 and $ccaarr[2][6]=0.1.

Conclusion

As you can see, the combination of Gnumeric and PHP encompasses the best of both worlds. The Gnumeric spreadsheet accommodates changeable information and stores the data in XML. The PHP reporting application can read the XML directly, so it is not deprived of accurate data.

The testrun.zip file (see Download) contains the Gnumeric file used in this article together with a basic implementation of the PHP function described in Listing 2



Download

DescriptionNameSizeDownload method
Sample files for this articletestrun.zip4 KBHTTP

Information about download methods


Resources

Learn

Get products and technologies

Discuss

About the author

Colin Beckingham is a freelance researcher, writer, and programmer who lives in eastern Ontario, Canada. Holding degrees from Queen's University, Kingston, and the University of Windsor, he has worked in a rich variety of fields including banking, horticulture, horse racing, teaching, civil service, retail, and travel and tourism. The author of database applications and numerous newspaper, magazine, and online articles, his research interests include open source programming, VoIP, and voice-control applications on Linux. You can reach Colin at colbec@start.ca.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Open source
ArticleID=498964
ArticleTitle=Tip: Reading data from Gnumeric spreadsheets directly through XML from external applications
publish-date=07062010

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers