Skip to main content

Read and write Excel data with PHP

Using XML support

Jack Herrington (jherr@pobox.com), Editor-in-Chief, Code Generation Network
An engineer with with more than 20 years of experience, Jack Herrington is currently Editor-in-Chief of the Code Generation Network. He is the author of Code Generation in Action . You can contact him at jack_d_herrington@codegeneration.net.

Summary:  Learn how to use the XML support in PHP to read the data from the XML exported from Microsoft® Excel® 2003. Also, learn to export data from your PHP application as Excel XML so your users can see their data in a real spreadsheet.

Date:  04 Oct 2005
Level:  Intermediate
Activity:  29493 views

Microsoft Office 2003 for the Microsoft Windows® operating system opened a whole new set of opportunities that non-Microsoft engineers have yet to realize. Of course, you had the usual set of new features. But the big new advance was the addition of XML file formats. With Office 2003, you can save your Microsoft Excel spreadsheet as XML and use the file just as you would the binary equivalent. The same goes for Microsoft Word.

Why are XML file formats so important? Because for years, the true power of Excel or Word was locked in binary file formats that required elaborate converters to access. Now, you can read or write Excel or Word files using XML tools like Extensible Stylesheet Language Transformation (XSLT) or the XML Document Object Model (DOM) functions built into the PHP programming language.

In this article, I show how to build a PHP Web application that uses these formats to read data into a database from an Excel spreadsheet and to export the contents of a database table to an Excel spreadsheet.

Create the database

For this article, I use a simple Web application so you can clearly see the Excel XML mechanism. This application is a table of names and e-mail addresses.

The schema in MySQL syntax looks like the code in Listing 1.


Listing 1. SQL for the database
                
DROP TABLE IF EXISTS names;
CREATE TABLE names (
	id INT NOT NULL AUTO_INCREMENT,
	first TEXT,
	middle TEXT,
	last TEXT,
	email TEXT,
	PRIMARY KEY( id )
);

This file is a single-table database in which the table -- names -- has five fields: an auto-incrementing ID field, followed by first, middle, and last name fields, and an e-mail field.

To set up the database, create the database using the Mysqladmin command-line tool: mysqladmin --user=root create names. You then load the database from the schema file: mysql --user=root names < schema.sql. The user and password authentication you use varies depending on your installation, but the idea remains the same. First, create the database. Then use the SQL file to create the tables with the required fields.


Create the import data

The next step is to create some data for import. Create a new Excel file. In the first workbook, call the top row of columns First, Middle, Last, and Email. Then, add a few rows of data to the list (see Figure 1).


Figure 1. Data for import
Data for import

You can make the list as long as you like or change the fields however you see fit. The PHP import script in this article ignores the first line of data unconditionally, because it assumes that it's the header line. In a production application, you would probably want to read and parse the header line to determine which fields are in which columns and make the appropriate changes to your import logic.

The last step is to save the file as XML by clicking File > Save As and then, in the Save As window, selecting XML Spreadsheet from the Save as type drop-down list (see Figure 2).


Figure 2. Save the file as an XML spreadsheet
Save file as XML spreadsheet

With the XML file in hand, you can begin to develop your PHP application.


Import the data

The import system starts easily enough with a page in which you specify the input Excel XML file (see Figure 3).


Figure 3. Specify the input Excel XML file
Specify input Excel XML file

The page logic is simple, as shown in Listing 2:


Listing 2. The upload page code
                
<html>
<body>
<form enctype="multipart/form-data" 
  action="import.php" method="post">
  <input type="hidden" name="MAX_FILE_SIZE" value="2000000" />
  <table width="600">
  <tr>
  <td>Names file:</td>
  <td><input type="file" name="file" /></td>
  <td><input type="submit" value="Upload" /></td>
  </tr>
  </table>
  </form>
  </body>
  </html>
  

I've named the file with a .php extension, but it's really not PHP at all. It's just an HTML file that allows the user to specify a file and submits that file to the import.php page, which is where the real fun occurs.


Read the Excel XML data

To make it a little easier to follow, I've written the import.php page in two phases. In the first phase, I simply parse the XML data and output it as a table. In the second phase, I add the logic that inserts the records into the database.

Listing 3 shows an example Excel 2003 XML file.


Listing 3. Sample Excel XML file
                
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>Jack Herrington</Author>
  <LastAuthor>Jack Herrington</LastAuthor>
  <Created>2005-08-02T04:06:26Z</Created>
  <LastSaved>2005-08-02T04:30:11Z</LastSaved>
  <Company>My Software Company, Inc.</Company>
  <Version>11.6360</Version>
  </DocumentProperties>
  <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>8535</WindowHeight>
  <WindowWidth>12345</WindowWidth>
  <WindowTopX>480</WindowTopX>
  <WindowTopY>90</WindowTopY>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
  </ExcelWorkbook>
  <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
  <Alignment ss:Vertical="Bottom"/>
  <Borders/>
  <Font/>
  <Interior/>
  <NumberFormat/>
  <Protection/>
  </Style>
  <Style ss:ID="s21" ss:Name="Hyperlink">
  <Font ss:Color="#0000FF" ss:Underline="Single"/>
  </Style>
  <Style ss:ID="s23">
  <Font x:Family="Swiss" ss:Bold="1"/>
  </Style>
  </Styles>
  <Worksheet ss:Name="Sheet1">
  <Table ss:ExpandedColumnCount="4" 
  ss:ExpandedRowCount="5" x:FullColumns="1"
  x:FullRows="1">
  <Column ss:Index="4" ss:AutoFitWidth="0" ss:Width="154.5"/>
  <Row ss:StyleID="s23">
  <Cell><Data ss:Type="String">First</Data></Cell>
  <Cell><Data ss:Type="String">Middle</Data></Cell>
  <Cell><Data ss:Type="String">Last</Data></Cell>
  <Cell><Data ss:Type="String">Email</Data></Cell>
  </Row>
  <Row>
  <Cell><Data ss:Type="String">Molly</Data></Cell>
  <Cell ss:Index="3"><Data 
  ss:Type="String">Katzen</Data></Cell>
  <Cell ss:StyleID="s21" ss:HRef="mailto:molly@katzen.com">
  <Data ss:Type="String">molly@katzen.com</Data></Cell>
  </Row>
  ...
  </Table>
  <WorksheetOptions 
  xmlns="urn:schemas-microsoft-com:office:excel">
  <Print>
  <ValidPrinterInfo/>
  <HorizontalResolution>300</HorizontalResolution>
  <VerticalResolution>300</VerticalResolution>
  </Print>
  <Selected/>
  <Panes>
  <Pane>
  <Number>3</Number>
  <ActiveRow>5</ActiveRow>
  </Pane>
  </Panes>
  <ProtectObjects>False</ProtectObjects>
  <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
  </Worksheet>
  <Worksheet ss:Name="Sheet2">
  <WorksheetOptions 
  xmlns="urn:schemas-microsoft-com:office:excel">
  <ProtectObjects>False</ProtectObjects>
  <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
  </Worksheet>
  <Worksheet ss:Name="Sheet3">
  <WorksheetOptions 
  xmlns="urn:schemas-microsoft-com:office:excel">
  <ProtectObjects>False</ProtectObjects>
  <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
  </Worksheet>
  </Workbook>

I've chopped out a couple of rows in the middle, but otherwise, the file is verbatim what comes out of Excel. It's relatively clean XML. Note the document header portion at the beginning that describes the document and who is writing it, lays down some visual information, lists styles, an so on. Then, the data comes as a set of worksheets within the main Workbook object.

The first Worksheet object contains the real data. Within that object, the data resides inside the Table tag in a set of Row and Cell tags. Each Cell tag has a Data tag associated with it that holds the data for the cell. In this case, the data is always formatted as String type.

By default, when you create a new document, Excel creates three worksheets named Sheet1, Sheet2, and Sheet3. I didn't delete the second and third worksheets, so you see these empty workbooks at the end of the document.

Listing 4 shows the first version of the import.php script.


Listing 4. The first version of the import script
                
  <?php
  $data = array();
  
  function add_person( $first, $middle, $last, $email )
  {
  global $data;
  
  $data []= array(
  'first' => $first,
  'middle' => $middle,
  'last' => $last,
  'email' => $email 
  );
  }
  
  if ( $_FILES['file']['tmp_name'] )
  {
  $dom = DOMDocument::load( $_FILES['file']['tmp_name'] );
  $rows = $dom->getElementsByTagName( 'Row' );
  $first_row = true;
  foreach ($rows as $row)
  {
  if ( !$first_row )
  {
  $first = "";
  $middle = "";
  $last = "";
  $email = "";
  
  $index = 1;
  $cells = $row->getElementsByTagName( 'Cell' );
  foreach( $cells as $cell )
  { 
  $ind = $cell->getAttribute( 'Index' );
  if ( $ind != null ) $index = $ind;
  
  if ( $index == 1 ) $first = $cell->nodeValue;
  if ( $index == 2 ) $middle = $cell->nodeValue;
  if ( $index == 3 ) $last = $cell->nodeValue;
  if ( $index == 4 ) $email = $cell->nodeValue;
  
  $index += 1;
  }
  add_person( $first, $middle, $last, $email );
  }
  $first_row = false;
  }
  }
  ?>
  <html>
  <body>
  <table>
  <tr>
  <th>First</th>
  <th>Middle</th>
  <th>Last</th>
  <th>Email</th>
  </tr>
  <?php foreach( $data as $row ) { ?>
  <tr>
  <td><?php echo( $row['first'] ); ?></td>
  <td><?php echo( $row['middle'] ); ?></td>
  <td><?php echo( $row['last'] ); ?></td>
  <td><?php echo( $row['email'] ); ?></td>
  </tr>
  <?php } ?>
  </table>
  </body>
  </html>

The script starts by reading in the uploaded temporary file into a DOMDocument object. Then the script finds each Row tag. The first row is ignored using the logic associated with the $first_row variable. After the first row, an inside loop parses each Cell tag within the row.

The next tricky bit is to figure out which column you're in. As you can see in the XML, the Cell tag doesn't specify the row or column number. The script needs to keep track of that itself. Actually, it's a bit more complicated than that, even. In fact, the Cell tag has an ss:Index attribute that tells you what column the cell is on if there are blank columns in this row. That's what the getAttribute('index') code is looking for.

After determining the index, the code is simple. Place the cell value into a local value associated with that field. Then, at the end of the row, call the add_person function to add the person to the data set.

At the end of the page, the PHP outputs the data that was found into an HTML table using familiar PHP mechanisms (see Figure 4).


Figure 4. Data output into an HTML table
Data output into an HTML table

The next step is to load this data into the database.


Add the data to the database

After the script has the row data in a PHP data structure, it needs to add that data to the database. To do that, I've added some code that uses the Pear DB module (see Listing 5).


Listing 5. The second version of the import script
                
<?php
$data = array();

function add_person( $first, $middle, $last, $email )
{
global $data;

$data []= array(
'first' => $first,
'middle' => $middle,
'last' => $last,
'email' => $email 
);
}

if ( $_FILES['file']['tmp_name'] )
{
$dom = DOMDocument::load( $_FILES['file']['tmp_name'] );
$rows = $dom->getElementsByTagName( 'Row' );
$first_row = true;
foreach ($rows as $row)
{
if ( !$first_row )
{
$first = "";
$middle = "";
$last = "";
$email = "";

$index = 1;
$cells = $row->getElementsByTagName( 'Cell' );
foreach( $cells as $cell )
{ 
$ind = $cell->getAttribute( 'Index' );
if ( $ind != null ) $index = $ind;

if ( $index == 1 ) $first = $cell->nodeValue;
if ( $index == 2 ) $middle = $cell->nodeValue;
if ( $index == 3 ) $last = $cell->nodeValue;
if ( $index == 4 ) $email = $cell->nodeValue;

$index += 1;
}
add_person( $first, $middle, $last, $email );
}
$first_row = false;
}
}
?>
<html>
<body>
<table>
<tr>
<th>First</th>
<th>Middle</th>
<th>Last</th>
<th>Email</th>
</tr>
<?php foreach( $data as $row ) { ?>
<tr>
<td><?php echo( $row['first'] ); ?></td>
<td><?php echo( $row['middle'] ); ?></td>
<td><?php echo( $row['last'] ); ?></td>
<td><?php echo( $row['email'] ); ?></td>
</tr>
<?php } ?>
</table>
</body>
</html>

Figure 5 shows the output in Firefox.


Figure 5. The database
The database

It's not much on looks, but that's not the point. The point is that through use of the database object's prepare and execute statements, you can add the data into the database. To prove it, I've created another page called list.php that shows the data in the database (see Listing 6).


Listing 6. List.php
                
  <?php
  // Install the DB module using 'pear install DB'
  require_once( "db.php" );
  
  $data = array();
  
  $db =& DB::connect("mysql://root@localhost/names", array());
  if (PEAR::isError($db)) { die($db->getMessage()); }
  
  $res = $db->query( "SELECT * FROM names ORDER BY last" );
  ?>
  <html>
  <body>
  <table>
  <tr>
  <th>ID</th>
  <th>First</th>
  <th>Middle</th>
  <th>Last</th>
  <th>Email</th>
  </tr>
  <?php while( $res->fetchInto( $row, 
            DB_FETCHMODE_ASSOC ) ) { ?>
  <tr>
  <td><?php echo( $row['id'] ); ?></td>
  <td><?php echo( $row['first'] ); ?></td>
  <td><?php echo( $row['middle'] ); ?></td>
  <td><?php echo( $row['last'] ); ?></td>
  <td><?php echo( $row['email'] ); ?></td>
  </tr>
  <?php } ?>
  </table>
  Download as an 
  <a href="listxl.php">Excel spreadsheet</a>.
 </body>
  </html>

This simple page starts by executing a SQL select operation against the names table. Then it creates a table and adds every row in the table to it using the fetchInto method to get the row data.

Figure 6 shows the output of the page.


Figure 6. Output from list.php
Output from list.php

Again, not a beauty contest winner, but with this page, I have explained the basics of how to get to the data into the database. That, in turn, provides the basis for the script that will generate the Excel XML file for export.


Generate the export Excel XML

The final step is to generate the Excel XML. For me, that started with copying the Excel XML into a PHP script (see Listing 7). I know that's lazy, but it's the easiest way to get to an Excel XML file that parses properly. (Excel is picky about its XML.)


Listing 7. The XML export page
                
  <?php
  header( "content-type: text/xml" );
  // Install the DB module using 'pear install DB'
  require_once( "db.php" );
  
  $data = array();
  
  $db =& DB::connect("mysql://root@localhost/names", array());
  if (PEAR::isError($db)) { die($db->getMessage()); }
  
  $res = $db->query( "SELECT * FROM names ORDER BY last" );
  
  $rows = array();
  while( $res->fetchInto( $row, DB_FETCHMODE_ASSOC ) ) 
  { $rows []= $row; }
  print "<?xml version=\"1.0\"?>\n";
  print "<?mso-application progid=\"Excel.Sheet\"?>\n";
  ?>
  <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:o="urn:schemas-microsoft-com:office:office"
  xmlns:x="urn:schemas-microsoft-com:office:excel"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:html="http://www.w3.org/TR/REC-html40">
  <DocumentProperties 
     xmlns="urn:schemas-microsoft-com:office:office">
 <Author>Jack Herrington</Author>
  <LastAuthor>Jack Herrington</LastAuthor>
  <Created>2005-08-02T04:06:26Z</Created>
  <LastSaved>2005-08-02T04:30:11Z</LastSaved>
  <Company>My Company, Inc.</Company>
  <Version>11.6360</Version>
  </DocumentProperties>
  <ExcelWorkbook 
     xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>8535</WindowHeight>
  <WindowWidth>12345</WindowWidth>
  <WindowTopX>480</WindowTopX>
  <WindowTopY>90</WindowTopY>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
  </ExcelWorkbook>
  <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
  <Alignment ss:Vertical="Bottom"/>
  <Borders/>
  <Font/>
  <Interior/>
  <NumberFormat/>
  <Protection/>
  </Style>
  <Style ss:ID="s21" ss:Name="Hyperlink">
  <Font ss:Color="#0000FF" ss:Underline="Single"/>
  </Style>
  <Style ss:ID="s23">
  <Font x:Family="Swiss" ss:Bold="1"/>
  </Style>
  </Styles>
  <Worksheet ss:Name="Names">
  <Table ss:ExpandedColumnCount="4"
  ss:ExpandedRowCount="<?php echo( count( $rows ) + 1 ); ?>"
  x:FullColumns="1" x:FullRows="1">
  <Column ss:Index="4" ss:AutoFitWidth="0" ss:Width="154.5"/>
  <Row ss:StyleID="s23">
  <Cell><Data 
    ss:Type="String">First</Data></Cell>
  <Cell><Data 
   ss:Type="String">Middle</Data></Cell>
  <Cell><Data 
    ss:Type="String">Last</Data></Cell>
  <Cell><Data 
    ss:Type="String">Email</Data></Cell>
  </Row>
  <?php foreach( $rows as $row ) { ?>
  <Row>
  <Cell><Data 
     ss:Type="String"><?php echo( $row['first'] ); ?>
  </Data></Cell>
  <Cell><Data 
     ss:Type="String"><?php echo( $row['middle'] ); ?>
  </Data></Cell>
  <Cell><Data 
    ss:Type="String"><?php echo( $row['last'] ); ?>
  </Data></Cell>
  <Cell ss:StyleID="s21"><Data ss:Type="String">
  <?php echo( $row['email'] ); ?></Data></Cell>
  </Row>
  <?php } ?>
  </Table>
  <WorksheetOptions 
     xmlns="urn:schemas-microsoft-com:office:excel">
  <Print>
  <ValidPrinterInfo/>
  <HorizontalResolution>300</HorizontalResolution>
  <VerticalResolution>300</VerticalResolution>
  </Print>
  <Selected/>
  <Panes>
  <Pane>
  <Number>3</Number>
  <ActiveRow>1</ActiveRow>
  </Pane>
  </Panes>
  <ProtectObjects>False</ProtectObjects>
  <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
  </Worksheet>
  </Workbook>

The script starts with setting the content type of the output to XML. That's important because browsers will think this code is simply bad HTML otherwise.

I've changed the SQL query portion of the code to save the results of the query into an array. Typically, I wouldn't do that with this type of report page, but in this case, I need to put the number of rows, plus one, into the ss:ExpandedRowCount attribute. The plus one is to account for the header row.

Figure 7 shows the result of clicking the link.


Figure 7. The export XML in Firefox
Export XML in Firefox

Not terribly impressive. But look what happens when I click the same link in Internet Explorer (see Figure 8):


Figure 8. The exported XML in Internet Explorer
Exported XML in Internet Explorer

What a difference. This is a full spreadsheet -- formatting and all right -- inside the browser. (Of course, in Firefox, you can right-click the link, save the XML to a file, and launch it that way.)


A technique with possibilities

As with anything on the bleeding edge, this technique has some pitfalls. For example, it doesn't work on Macintosh yet because the latest Office for Mac version doesn't support XML files.

Another hitch is that debugging these files can be a problem. If the XML is even slightly wrong, the embedded Excel object get into a kind of bad state in which Excel already thinks it's running and refuses to launch. This can only be fixed only by restarting the application.

That said, this technique does offer unparalleled integration possibilities for PHP programmers. How often do you find that the source of the data is in something like Excel or Word and needs to be hand-migrated -- cell by cell or paragraph by paragraph -- into a Web application? With import technology like this, the problem is solved. You can read the data directly from the worksheets or document.

The same can be said of the export side. HTML is great for articles and papers, but was never designed to render spreadsheet information properly. With the techniques shown here, you can generate a spreadsheet -- formulae, formatting, and all -- in a way users expect to see it.


Resources

Learn

  • PHP.net is the place to learn about the latest news about PHP, find downloads, and learn from other users.

  • Microsoft Office Online is the best place to start for Office information, including support.

  • An authoritative source for XML standards is the World Wide Web Consortium (W3C).

  • XML in Office 2003: Information Sharing with Desktop XML (Prentice Hall, 2003), by Charles F. Goldfarb and Priscilla Walmsley, is a complete and wide-ranging guide.

  • This November 2003 developerWorks article "Convert Excel data to XML" shows how to unlock data from Excel files to process them in XML, and examines the pros and cons of different solutions. It is appropriate for Excel 2002 and Excel XP client software.

  • Find more resources for XML developers in the developerWorks XML zone.

  • Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.

Get products and technologies

  • Innovate your next open source development project with IBM trial software, available for download or on DVD.

Discuss

About the author

An engineer with with more than 20 years of experience, Jack Herrington is currently Editor-in-Chief of the Code Generation Network. He is the author of Code Generation in Action . You can contact him at jack_d_herrington@codegeneration.net.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Open source, XML
ArticleID=94956
ArticleTitle=Read and write Excel data with PHP
publish-date=10042005
author1-email=jherr@pobox.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Rate a product. Write a review.

Special offers