Read, recycle, and reuse: Reporting made easy with Excel, XML, and Java technologies, Part 2

Convert between XML and Excel reporting formats

Part 1 of this series walked through reading Microsoft® Excel® files using Java™ technology and Apache POI. But reading Excel files is only a start. This installment mixes up Excel and XML to soothe developers who turn green at the thought of converting between reporting formats.

Shaene M Siders (consult@DragonUnderGlass.com), Writer and Consultant, Dragon Under Glass

Photo of Shaene SidersShaene Siders is a consultant and writer for technology, training, and entertainment. She started a writing and Web development company in 1996 and has been an IBM Rational Certified Instructor for ClearCase and ClearQuest since 2003. She is the author and producer of the training courseware series Java for Mad Scientists, and she is also an IBM Rational Certified Instructor for Build Forge, Rational Functional Tester, Rational Performance Tester, and Rational Manual Tester. You can reach Shaene at consult@DragonUnderGlass.com.



02 March 2010

Also available in Chinese Japanese Vietnamese Portuguese

Did you try out creative excuse #432 on your boss this week? Were you tempted to?

Hopefully, you didn't have to. Part 1 of this series helped you understand how to work with Microsoft® Excel® spreadsheets in Java™ technology (see the link to "Read, recycle, and reuse: Reporting made easy with Excel, XML, and Java technologies, Part 1" in Resources). Simply download Apache POI and set up to use it, and soon enough, walking through an Excel spreadsheet is almost as easy as walking through a park. And almost as green.

But reading Excel files is only a start. This installment shows you how to use Apache POI and the XML Object Model (XOM) to store an Excel file in XML objects. Then, you can recycle those objects to write an entirely new Excel spreadsheet and XML file.

Frequently used acronyms

  • API: Application Programming Interface
  • ASCII: American Standard Code for Information Interchange
  • HSSF: Horrible Spread Sheet Format
  • HTML: Hypertext Markup Language
  • XML: Extensible Markup Language
  • XSSF: XML SpreadSheet Format

The sample application

The sample application contains an Excel spreadsheet called Employee_List.xls from the fictional Planet Power corporation. The Big Boss has convinced Planet Power's top employees to donate 1% of their salaries to his favorite cause: the Genetically Engineered Enormous Wild Hamster Interplanetary Sanctuary (GEE WHIS). The sample application calculates the amount and creates an XML report to rush to the sanctuary. Meanwhile, the application writes an Excel spreadsheet for the Big Boss.

To follow the examples in this article, download the samples and extract the files to C:\Planet Power. Then, start Eclipse.

Employees2 Eclipse project

To import the Employees2 Eclipse project containing the sample application, perform these steps:

  1. In Eclipse, right-click in the Package Explorer, and then click Import.
  2. Expand General, and then select Existing Projects into Workspace. Click Next (Figure 1).
    Figure 1. Bring an existing project into the workspace
    Screen capture of the Eclipse Import dialog with an existing project selected to bring into the workspace
  3. Click Browse beside Select root directory, and then navigate to C:\Planet Power\Employees2.
  4. Select the Employees2 folder, click OK, and then click Finish (Figure 2).
    Figure 2. Finish importing a project into Eclipse
    Screen capture of the Eclipse Import dialog with Finish of project import into Eclipse

The Employees2 folder should appear in the Package Explorer pane.

Note: For this project, use the file ExcelXML.java in the Employees2 project under src\default_package.


Getting started

In Part 1 of this series, your first step is to import Apache POI along with exception and file-handling classes (see Resources for the link to Part 1). In addition, you'll need to add some XML API classes along with classes for working with numbers, as in Listing 1.

Listing 1. Importing classes (ExcelXML.java)
// File and exception handling imports
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.ParseException;

// Apache POI imports
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFHyperlink;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFDataFormatter;
import org.apache.poi.hssf.usermodel.HSSFCellStyle;
import org.apache.poi.hssf.usermodel.HSSFDataFormat;

// XOM imports
import nu.xom.Document;
import nu.xom.Elements;
import nu.xom.Element;
import nu.xom.Attribute;
import nu.xom.Serializer;

// Imports for later calculations
import java.text.NumberFormat;

After importing the classes,you're ready to begin programming inside main().


Handling files

Reading XML files

This article covers writing XML but not reading XML files. For information on reading XML, see the link to the XOM Web site provided in Resources or the code sample XMLReader.java in the downloaded Eclipse project.

The code in Listing 2 represents an exception-handling structure for file (IOException) and number conversion (ParseException) errors.

Listing 2. Set up exception handling (ExcelXML.java)
public class ExcelXML {

   public static void main(String[] args) {

      try {

      // Put file and number handling code here.

   // End try
   }
   catch (IOException e) {
      System.out.println("File Input/Output Exception!");
   }
   catch (ParseException e) {
      System.out.println("Number Parse Exception!");
   }

   // End main method
   }

// End class block
}

Now you can begin to work with XOM.


XOM, Document, and XML objects

XOM simplifies working with XML by parsing it into objects that represent an XML document's pieces. The class that represents an entire XML document is nu.xom.Document. From the Document, you can add or access other pieces. Some classes for working with XML include:

  • nu.xom.Builder
  • nu.xom.Document
  • nu.xom.Elements
  • nu.xom.Element
  • nu.xom.Serializer

The main pieces of XML are called elements. An element consists of a pair of tags and the content in between. Here's one example of an element from a sample file called weather_service.xml.

<dawn>07:00</dawn>

The words dawn and their brackets (<>) and slash (/) are called tags. The 07:00 is the content.

Elements can contain other elements, text content, or both. A containing element is called a parent, and the elements inside are called child elements or children.

In XOM, elements are represented by nu.xom.Element objects. A set of elements is a nu.xom.Elements object.

The root of all good

To find elements, use the one element every well-formed XML document is guaranteed to have: the root element. The root acts as a container for every other element. If a document doesn't have a root, it's not proper XML.

To find the root, use getRootElement() on the Document object. Getting just one element opens strategies for exploring the document. Want to work with the root's children? Get a list of all child elements or child elements with a specific name using variations of Element.getChildElements(). Want a single element? Get only the first child element with a specific name from Element.getFirstChildElement().

Does this seem familiar? Iterating through an XML document to find children of children is similar to iterating through an Excel worksheet.

In the previous article, you saw how Apache POI's getStringCellValue() method retrieved a string value from an Excel HSSFCell. Similarly, XOM uses Element.getValue() to get string content from an XML element. However, unlike working with cells, working with XOM Elements does not require testing the data types. XML is all text.


Excel-XML mashup and XML markup

When traversing spreadsheets and extracting data, you can store the data in any number of objects: Strings. Arrays. Squirrels. This article uses XML elements. (That way, it's easier on the Squirrels.) It also offers some insight into translating between the two formats.

Listing 3 prepares a new HSSFWorkbook and XML Document to store the workbook's information.

Listing 3. Prepare a workbook and an XML Document (ExcelXML.java)
// First, create a new XML Document object to load the Excel sheet into XML.
// To create an XML Document, first create an element to be its root.
Element reportRoot = new Element("sheet");

// Create a new XML Document object using the root element
Document XMLReport = new Document(reportRoot);

// Set up a FileInputStream to represent the Excel spreadsheet
FileInputStream excelFIS = new FileInputStream("C:\\Planet Power\\Employee_List.xls");

// Create an Excel Workbook object using the FileInputStream created above 
HSSFWorkbook excelWB = new HSSFWorkbook(excelFIS);

Does something seem backward about Listing 3? The code creates a new Element before it creates a Document. Why?

A brand new Document requires one thing: an Element to be its root element. Remember, the Document can't represent a well-formed XML document without having a root element. Therefore, you must pass it a root element when you create it.

In contrast, you can create elements that don't belong to a root. You can leave them as free-floating elements or append them to a document root or other elements. However, before you start, you should plan the structure of your XML.

XML structure

XML describes and formats data, mainly through elements and attributes. Attributes are name-value pairs. The name part of an attribute describes the data the attribute holds. The value of the attribute is its data. Writers of HTML code are familiar with attributes such as:

<font size="12">Hello, Planet!</font>

The entire listing is an element. The tag is <font>. The attribute is size="12". The name of the attribute is size. Its value is 12. The equal sign (=) joins the two. Generally, data is stored in elements, while metadata is stored in attributes.

So, how should the Employee_List.xls workbook translate to XML?

Markup method 1: mimic workbook structure

One way to structure the XML is to mimic the physical structure of an Excel workbook. Consider Listing 4.

Listing 4. XML structure based on general workbook structure
<sheet>
   <row>
      <cell>
         Thumb
      </cell>
      <cell>
         Green
      </cell>
      <cell>
         Growing Plants
      </cell>
      <cell>
         5:00 AM
      </cell>
      <cell>
         2:00 PM
      </cell>
      <cell>
         150,000
      </cell>
   </row>
</sheet>

With this format, the rows and cells are clear. But what data does each cell hold? Is 5:00 AM the employee's start or end time? What about using the column name as an attribute of the cell?

If maintaining the Excel spreadsheet structure is important, that might work. However, in XOM, without learning how to write XPath queries, there is no easy method to extract a collection of elements based on their attribute values. With the column names stored as attributes, extra code is required to locate a collection of all of the elements from a specific column. Because XOM contains a method to find elements by their names, consider using the Excel columns as element names rather than as attributes.

Markup method 2: mimic the data structure

Instead of basing XML on the presentation of data, consider a structure that describes the data. That's what XML does best, as Listing 5 shows.

Listing 5. XML structure based on describing data
<EmployeeData>
   <Employee>
      <EmployeeLastName>
         Thumb
      </EmployeeLastName>
      <EmployeeFirstName>
         Green
      </EmployeeFirstName>
      <MainSuperPower>
         Growing Plants
      </MainSuperPower>
      <DailyStartTime>
         5:00 AM
      </DailyStartTime>
      <DailyEndTime>
         2:00 PM
      </DailyEndTime>
      <Salary>
         150,000
      </Salary>
   </Employee>
</EmployeeData>

This approach allows you to use the getChildElements("Salary") method on each Employee element to quickly find the employees' salaries.

However, using Excel column names for the element names is risky. Excel columns can use characters that are not valid in XML element names, such as spaces. So, be sure to scrub those characters out of potential element names.

To structure the data this way, you must be familiar with the data. It would be difficult to calculate programmatically that a row in the Excel spreadsheet should be called an Employee in the XML. It would also be difficult to calculate a name for the root element (EmployeeData in the above example).

Also, what happens if the structure changes or the Big Boss wants to recycle the code for other spreadsheets? Spreadsheet rows could list types of hamsters rather than employees. You would then have to adjust what the program calls the rows.

Markup method 3: Excel XML mashup blend

Consider blending the Excel-structured XML with the data-structured markup, as in Listing 6.

Listing 6. Blending workbook structure with data-based markup
<sheet>
   <row>
      <EmployeeLastName>
         Thumb
      </EmployeeLastName>
      <EmployeeFirstName>
         Green
      </EmployeeFirstName>
      <MainSuperPower>
         Growing Plants
      </MainSuperPower>
      <DailyStartTime>
         5:00 AM
      </DailyStartTime>
      <DailyEndTime>
         2:00 PM
      </DailyEndTime>
      <Salary>
         150,000
      </Salary>
   </row>
</sheet>

If the first row in each Excel spreadsheet contains column names, this blend could be flexible enough to work with several Excel workbooks. Because the data in the document and rows might not be consistent between spreadsheets, you can use the generic sheet and row as element names, and they will still have meaning to programmers reading the code.

Note: As with pure data-centric markup, beware of illegal characters weaseling into XML element names.

This article uses an XML-Excel blended format to store cell values. But what about tracking other cell information, like data type and formatting?

Data type and formatting are metadata. Depending on what metadata must be preserved, you can use attributes like dataFormat and dataType.


Coding to convert

After deciding how your XML should look, begin storing Excel data in XML elements. Use the same loops to traverse the Excel spreadsheet as in Part 1 of this series (see Resources for the link). Then, add XML. Listing 7 recycles Excel-reading code from the previous article.

Listing 7. Start traversing through the Excel spreadsheet (ExcelXML.java)
// Traverse the workbook's rows and cells.			
// Remember, excelWB is the workbook object obtained earlier.

// Just use the first sheet in the book to keep the example simple.
// Pretend this is an outer loop (looping through sheets).

HSSFSheet oneSheet = excelWB.getSheetAt(0);

      // Now get the number of rows in the sheet
      int rows = oneSheet.getPhysicalNumberOfRows();				

      // Middle loop: Loop through rows in the sheet

         for (int rowNumber = 0; rowNumber < rows; rowNumber++) {
            HSSFRow oneRow = oneSheet.getRow(rowNumber);

            // Skip empty (null) rows.
            if (oneRow == null) {
               continue;
            }

As you iterate through the spreadsheet's rows, create XML elements to represent the rows, as in Listing 8.

Listing 8. Create a row element (ExcelXML.java)
            // Create an XML element to represent the row.
            Element rowElement = new Element("row");

Don't attach the rows to the Document yet, in case you have empty rows or null rows. If the row isn't empty after adding cells, you can attach it to the root element at the bottom of the row loop.

Next, start the inner loop to read the cells, as in Listing 9.

Listing 9. Continue to iterate through the Excel cells (ExcelXML.java)
            // Get the number of cells in the row
            int cells = oneRow.getPhysicalNumberOfCells();

            // Inner loop: Loop through each cell in the row

            for (int cellNumber = 0; cellNumber < cells; cellNumber++) {
               HSSFCell oneCell = oneRow.getCell(cellNumber);

               // If the cell is blank, the cell object is null, so don't 
               // try to use it. It will cause errors.
               // Use continue to skip it and just keep going.
               if (oneCell == null) {
                  continue;
               }

Once you're inside the inner loop, create an XML element to represent the cell, using the appropriate column name as the element name. In Listing 10, because element names can't be empty, each name defaults to header. After the first row, calculate a new name based on the data stored in the first row elements, which contain the Excel spreadsheet column names.

Listing 10. Create elements for the cells using the column names as element names (ExcelXML.java)
// Set up a string to use just "header" as the element name
// to store the column header cells themselves.

String elementName="header";

// Figure out the column position of the Excel cell.
int cellColumnNumber = oneCell.getColumnIndex();

// If on the first Excel row, don't change the element name from "header,"  
// because the first row is headers. Before changing the element name,
// test to make sure you're past the first row, which is the zero row.

if (rowNumber > 0)

 // Set the elementName variable equal to the column name
 elementName=reportRoot.getFirstChildElement("row").getChild(cellColumnNumber).getValue();

// Remove weird characters and spaces from elementName,
// as they're not allowed in element names.
elementName = elementName.replaceAll("[\\P{ASCII}]","");
elementName = elementName.replaceAll(" ", "");

// Create an XML element to represent the cell, using 
// the calculated elementName

Element cellElement = new Element(elementName);

A lot happens in Listing 10. In the long line in the middle beneath the comment, // Set the elementName variable equal to the column name, elementName is set equal to something. Reading the end of the line, you can see it's set to the text value inside an element using getValue(). Which element's value does it use?

Inside the reportRoot, inside the first row element (reportRoot.getFirstChildElement("row")), the code locates a child element by its index number using getChild(cellColumnNumber). You have already stored the first row of the spreadsheet in the first row element, so the values of the row's child elements are the column names in the spreadsheet. The index number is the same as the current cell's column number in the spreadsheet. So the value set for the elementName is the corresponding column name from the first row of heading elements.

Next, the code scrubs the elementName string of illegal characters likely to exist in spreadsheets. First, the replaceAll() method of String replaces all non-ASCII characters with empty strings. Then, it replaces all spaces. Both lines use regular expressions. For information on regular expressions, see Resources.

Finally, the last line of Listing 10 creates the element using the proper column name.


Appending attributes and elements

Just like elements, you can create attributes independently, and they can remain free-floating until you append them to an element using its addAttribute() method. Create an attribute, and then populate it with cell metadata using a getter method such as getDataFormatString() from the Apache POI's HSSFCell object, as in Listing 11.

Listing 11. Add Attributes (ExcelXML.java)
// Create an attribute to hold the cell's format.
// May be repeated for any other formatting item of interest.
String attributeValue = oneCell.getCellStyle().getDataFormatString();
Attribute dataFormatAttribute = new Attribute("dataFormat", attributeValue);

// Add the attribute to the cell element
cellElement.addAttribute(dataFormatAttribute);

Now, the element exists and has an attribute.

To get data for the element, remember to test the data type of each HSSFCell, so you know which getter method to use. Because you're testing for the data type anyway, you can also create an attribute that stores the cell's data type information.

Adding data to the element and appending the element as a child of the row is straightforward when working with string values, as in Listing 12.

Listing 12. Add Attributes, append the cell text to the element, and append the element into the row (ExcelXML.java)
switch (oneCell.getCellType()) {

   case HSSFCell.CELL_TYPE_STRING:

      // If the cell value is string, create an attribute
      // for the cellElement to state the data type is a string

      Attribute strTypeAttribute = new Attribute("dataType", "String");
      cellElement.addAttribute(strTypeAttribute);

      // Append the cell text into the element
      cellElement.appendChild(oneCell.getStringCellValue());

      // Append the cell element into the row
      rowElement.appendChild(cellElement);

      break;

Repeat the case section for each data type. But numeric data can be tricky. Part 1 of this article series pointed out that extracted Excel data is not the same as spreadsheet data (see the link to Part 1 in Resources). It is raw data. Certain numbers, like dates, look awfully funny as raw data and would not be the correct values if stored without formatting. You'll want to properly format numeric data before it goes into an element. You can use the Apache POI class HSSFDataFormatter and its method formatCellValue() to accomplish this. See Listing 13.

Listing 13. Add Attributes, format numeric data, and append the cell element (ExcelXML.java)
   case HSSFCell.CELL_TYPE_NUMERIC:
      // If the cell value is a number, create an attribute
      // for the cellElement to state the data type is numeric
      Attribute cellAttribute = new Attribute("dataType", "Numeric");

      // Add the attribute to the cell
      cellElement.addAttribute(cellAttribute);

      // Apply the formatting from the cells to the raw data
      // to get the right format in the XML. First, create an
      // HSSFDataFormatter object.

      HSSFDataFormatter dataFormatter = new HSSFDataFormatter();

      // Then use the HSSFDataFormatter to return a formatted string
      // from the cell rather than a raw numeric value:
      String cellFormatted = dataFormatter.formatCellValue(oneCell);

      //Append the formatted data into the element
      cellElement.appendChild(cellFormatted);

      // Append the cell element into the row
      rowElement.appendChild(cellElement);

      break;

Repeat the case section for each possible cell type. Check out the full example in ExcelXML.java.

After storing the cell data, close the inner loop. Before closing the middle loop, which represents rows, test whether the row element is empty. If it's not, append it to the root element. Then, close the middle loop, as in Listing 14.

Listing 14. Append the row element into the root element (ExcelXML.java)
      // End inner loop
      }

   // Append the row element into the root 
   // if the row isn't empty.  
   if (rowElement.getChildCount() > 0) {
      reportRoot.appendChild(rowElement);
   }

   // End middle loop	
   }

Now your Excel file is a complete XML document.


Inside the XML

When using XML to perform calculations, such as figuring out 1% of salaries, you must convert between strings and numbers. Listing 15 provides an example.

Listing 15. Calculate 1% of the salary and store it in a donation element (ExcelXML.java)
// To get employees' salaries, iterate through row elements and get a collection of rows

Elements rowElements = reportRoot.getChildElements("row");

// For each row element

for (int i = 0; i < rowElements.size(); i++) {

   // Get the salary element, 
   // Calculate 1% of it and store it in a donation element.
      // Unless it's the first row (0), which needs a header element.
   if (i==0) {
      Element donationElement = new Element("header");
      donationElement.appendChild("Donation");

      Attribute dataType = new Attribute("dataType","String");
      donationElement.addAttribute(dataType);

      Attribute dataFormat = new Attribute("dataFormat","General");
      donationElement.addAttribute(dataFormat);

      // Append the donation element to the row element.
      rowElements.get(i).appendChild(donationElement);
   }

   // If the row is not the first row, put the donation in the element.
   else {
      Element donationElement = new Element("Donation");

      Attribute dataType = new Attribute("dataType","Numeric");
      donationElement.addAttribute(dataType);

      // The dataFormat of the donation should be the same 
      // number format as salary, which looking at the XML file tells
      // us is "#,##0".
      Attribute dataFormat = new Attribute("dataFormat","#,##0");
      donationElement.addAttribute(dataFormat);

      // Get the salary element and its value
      Element salaryElement = rowElements.get(i).getFirstChildElement("Salary");
      String salaryString = salaryElement.getValue();

      // Calculate 1% of the salary. Salary is a string
      // with commas, so it 
      // must be converted for the calculation.

      // Get a java.text.NumberFormat object for converting string to a double
      NumberFormat numberFormat = NumberFormat.getInstance(); 

      // Use numberFormat.parse() to convert string to double.
      // Throws ParseException
      Number salaryNumber = numberFormat.parse(salaryString);

      // Use Number.doubleValue() method on salaryNumber to 
      // return a double to use in the calculation.
      // Perform the calculation to figure out 1%.
      double donationAmount = salaryNumber.doubleValue()*.01;

      // Append the value of the donation into the donationElement.
      // donationAmount is a double and must be converted to a string.
      donationElement.appendChild(Double.toString(donationAmount));

      // Append the donation element to the row element
      rowElements.get(i).appendChild(donationElement);

      //End else
      }

// End for loop 
}

Now you have stored an extra Donation element for each row. You have finished building your XML Document object.

XOM makes it easy to write your Document object to an XML file. Use nu.xom.Serailizer.write(), as in Listing 16.

Listing 16. Writing XML to Excel (ExcelXML.java)
   // Print out the XML version of the spreadsheet to see it in the console
   System.out.println(XMLReport.toXML());

   // To save the XML into a file for GEE WHIS, start with a FileOutputStream
   // to represent the file to write, C:\Planet Power\GEE_WHIS.xml.
   FileOutputStream hamsterFile = new FileOutputStream("C:\\Planet Power\\GEE_WHIS.xml");

   // Create a serializer to handle writing the XML
   Serializer saveTheHamsters = new Serializer(hamsterFile);

   // Set child element indent level to 5 spaces to make it pretty
   saveTheHamsters.setIndent(5);

   // Write the XML to the file C:\Planet Power\GEE_WHIS.xml
   saveTheHamsters.write(XMLReport);

The hamsters will love their new donation report. XML is their native language.


Writing back to Excel

To write the XML to an Excel spreadsheet, iterate through the XML and set cell values and formatting. Listing 17 sets up and starts looping through rows.

Listing 17. Set up to write XML to Excel (ExcelXML.java)
// Create a new Excel workbook and iterate through the XML 
// to fill the cells.
// Create an Excel workbook object 
HSSFWorkbook donationWorkbook = new HSSFWorkbook();

// Next, create a sheet for the workbook.	
HSSFSheet donationSheet = donationWorkbook.createSheet(); 

// Iterate through the row elements and then cell elements

// Outer loop: There was already an elements collection of all row elements
// created earlier. It's called rowElements. 
// For each row element in rowElements:

for (int j = 0; j < rowElements.size(); j++) {

   // Create a row in the workbook for each row element (j)
   HSSFRow createdRow = donationSheet.createRow(j);

   // Get the cell elements from that row element and add them to the workbook.
   Elements cellElements = rowElements.get(j).getChildElements();

Like looping through rows, looping through cell creation is straightforward. The harder part is formatting the cells.

An HSSFCellStyle object represents style options for a cell, like font, border, and numeric formatting (including date and time formats). However, styles are per workbook, not per cell. The HSSFCellStyle object represents a named style existing in the workbook that can be applied to a cell. These styles are groupings of style options, like named styles in Microsoft Office Word. Similarly, an HSSFDataFormat is created per workbook but represents only numeric formatting, like date and time formats.

To style a cell, create a new HSSFCellStyle for the workbook, or use an existing HSSFCellStyle. Then, apply it to the cell using HSSFCell.setCellStyle(). To set a numeric format for a cell, set the numeric format of the HSSFCellStyle, not the cell. Then, apply the HSSFCellStyle to the cell.

The HSSFDataFormat objects are indexed by number in the workbook. To tell an HSSFCellStyle which HSSFDataFormat to use, you need the index number of the HSSFDataFormat. This is a numeric short, not an HSSFDataFormat object.

Fortunately, the HSSFDataFormat object has a method called getFormat(). When passed a string representing a desired format, it returns the index number of the HSSFDataFormat that matches the string. The index is returned as a numeric short. If there is no match, it creates a new HSSFDataFormat and returns its index. You can use that index to apply the formatting to the cell style and apply the cell style to the cell, as in Listing 18.

Listing 18. Loop through cell elements and set up proper numeric formatting before inserting data
   // Middle loop: Loop through the cell elements.
   for (int k = 0; k < cellElements.size(); k++) {	

      // Create cells and cell styles. Use the row's
      // createCell (int column) method.
      // The column index is the same as the cell element index, which is k.
      HSSFCell createdCell = createdRow.createCell(k);	

      // To set the cell data format, retrieve it from the attribute 
      // where it was stored: the dataFormat attribute. Store it in a string.
      String dataFormatString = cellElements.get(k).getAttributeValue("dataFormat");

      // Create an HSSFCellStyle using the createCellStyle() method of the workbook.
      HSSFCellStyle currentCellStyle = donationWorkbook.createCellStyle();

      // Create an HSSFDataFormat object from the workbook's method
      HSSFDataFormat currentDataFormat = donationWorkbook.createDataFormat();

      // Get the index of the HSSFDataFormat to use. The index of the numeric format
      // matching the dataFormatString is returned by getFormat(dataFormatString).
      short dataFormatIndex = currentDataFormat.getFormat(dataFormatString);

      // Next, use the retrieved index to set the HSSFCellStyle object's DataFormat.
      currentCellStyle.setDataFormat(dataFormatIndex);

      // Then apply the HSSFCellStyle to the created cell.
      createdCell.setCellStyle(currentCellStyle);

After setting the cell's style, use setCellValue() to put most data types in the cell.

However, numeric data needs special handling. To store numbers as numbers rather than text, convert them to doubles first. Do not convert dates to doubles, or they will be incorrect. Test the data format of numeric data to determine whether it's a date (see Listing 19).

Listing 19. Insert cell data, converting to doubles for certain numeric data
      // Set cell value and types depending on the dataType attribute

      if (cellElements.get(k).getAttributeValue("dataType")=="String") {
         createdCell.setCellType(HSSFCell.CELL_TYPE_STRING);
         createdCell.setCellValue(cellElements.get(k).getValue());
      }

      if (cellElements.get(k).getAttributeValue("dataType")=="Numeric") {
         createdCell.setCellType(HSSFCell.CELL_TYPE_NUMERIC);

         // In this spreadsheet, number styles are times, dates,
         // or salaries. To store as a number and not as text,
         // salaries should be converted to doubles first.
         // Dates and times should not be converted to doubles first,
         // or you'll be inputting the wrong date or time value.
         // Dates and times can be entered as Java Date objects.

         if (cellElements.get(k).getAttributeValue("dataFormat").contains("#")) {

            // If formatting contains a pound sign, it's not a date.
            // Use a Java NumberFormat to format the numeric type cell as a double,
            // because like before, the element has commas in it.
            NumberFormat numberFormat = NumberFormat.getInstance(); 
            Number cellValueNumber = numberFormat.parse(cellElements.get(k).getValue());
            createdCell.setCellValue(cellValueNumber.doubleValue());

            // Add a hyperlink to the fictional GEE WHIS Web site just
            // to demonstrate that you can.
            HSSFHyperlink hyperlink = new HSSFHyperlink(HSSFHyperlink.LINK_URL);
            hyperlink.setAddress("http://www.ibm.com/developerworks/");
            createdCell.setHyperlink(hyperlink);
         }

         else {
            // if it's a date, don't convert to double
            createdCell.setCellValue(cellElements.get(k).getValue());
         }

      }

//  Handle formula and error type cells. See ExcelXML.java for the full example.

      //End middle (cell) for loop
      }
   // End outer (row) for loop	
   }

Formatting is also necessary for Excel functions that create dates, such as TODAY() and NOW(). See Listing 20.

Listing 20. Use Excel functions with proper formatting
// Demonstrate functions:
// Add the TODAY() and NOW() functions at bottom of the Excel report
// to say when the workbook was opened.

// Find the last row and increment by two to skip a row
int lastRowIndex = donationSheet.getLastRowNum()+2;

// Create a row and three cells to hold the information.
HSSFRow lastRow = donationSheet.createRow(lastRowIndex);
HSSFCell notationCell = lastRow.createCell(0);
HSSFCell reportDateCell = lastRow.createCell(1);
HSSFCell reportTimeCell = lastRow.createCell(2);

// Set a regular string value in one cell
notationCell.setCellValue("Time:");

// Setting formula values uses setCellFormula()
reportDateCell.setCellFormula("TODAY()");
reportTimeCell.setCellFormula("NOW()");	

// Create HSSFCellStyle objects for the date and time cells.
// Use the createCellStyle() method of the workbook.

HSSFCellStyle dateCellStyle = donationWorkbook.createCellStyle();
HSSFCellStyle timeCellStyle = donationWorkbook.createCellStyle();

// Get a HSSFDataFormat object to set the time and date formats for the cell styles
HSSFDataFormat dataFormat = donationWorkbook.createDataFormat();

// Set the cell styles to the right format by using the index numbers of
// the desired formats retrieved from the getFormat() function of the HSSFDataFormat.
dateCellStyle.setDataFormat(dataFormat.getFormat("m/dd/yy"));
timeCellStyle.setDataFormat(dataFormat.getFormat("h:mm AM/PM"));

// Set the date and time cells to the appropriate HSSFCellStyles.
reportDateCell.setCellStyle(dateCellStyle);
reportTimeCell.setCellStyle(timeCellStyle);

Finally, after finishing the desired workbook object, write it to a file using the workbook's write() method (Listing 21).

Listing 21. Write the Excel workbook to a file
// Write out the workbook to a file. First,
// you need some sort of OutputStream to represent the file.
	
	String filePathString = "C:\\Planet Power\\Employee_Donations.xls";
	FileOutputStream donationStream = new FileOutputStream(filePathString);
	
	donationWorkbook.write(donationStream);

You have now written an Excel spreadsheet calculating donations to GEE WHIS.


Conclusion

The reports are done. Along with reading Excel and creating XML, Java programmers can now write XML back to Excel files. After understanding the basics of converting between the two formats, you might feel yourself globally warming to the whole idea of reporting.

The Big Boss is happy, and you've done your part toward helping Planet Power's environmental superheroes save Genetically Engineered Enormous Wild Hamsters. Everyone's happy. Reporting really can be good for the environment.


Download

DescriptionNameSize
Sample Excel spreadsheet and Java codeJava-Excel-XML-Planet-Power2.zip17KB

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into XML on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Java technology
ArticleID=470121
ArticleTitle=Read, recycle, and reuse: Reporting made easy with Excel, XML, and Java technologies, Part 2
publish-date=03022010