To make a comma-separated value (CSV) file to work with, I used DB2's EXPORT command for the sample database of employee records that ships with the product. The command I used was:
EXPORT to test.csv of DEL select * from employees |
That generated the file test.csv , which you can download.
Parsing the comma-separated value (CSV) file
After I had the data file, I started looking around for some code to parse comma-separated values. I didn't find exactly what I wanted, but I did stumble across the Java class StreamTokenizer. This class lets you design a rudimentary parser fairly easily. You select the delimiter between tokens, and it parses the file, converts data to strings and integers, and does some other nice things. You can view the code I wrote or download) it.
To use this file, I typed:
java csvParser test.csv output.xml |
That command opens and parses the CSV file and then converts it to XML. The XML for one employee looks like Listing 1. (You can also download output.xml.)
Listing 1. The XML output for one employee record from the CSV data sample
<?xml version="1.0"?>
<document>
<row>
<column1>000010</column1>
<column2>CHRISTINE</column2>
<column3>I</column3>
<column4>HAAS</column4>
<column5>A00</column5>
<column6>3978</column6>
<column7>19650101</column7>
<column8>PRES</column8>
<column9>18</column9>
<column10>F</column10>
<column11>19330824</column11>
<column12>52750</column12>
<column13>1000</column13>
<column14>4220</column14>
</row>
<row>
<column1>000020</column1>
...
</document>
|
When I started working on this, I thought the first line of the CSV file would contain the column names from DB2. I was going to use those names as the XML tag names. I didn't immediately find a way to get DB2 to export data in this format, so I just made up the column names, using the imaginative naming scheme you see in Listing 1.
Now that I had this XML-tagged data, I needed to write an XSLT style sheet that would convert the XML above into a more useful (and understandable) tag set. You can view the style sheet listing in a separate window, and you can also download it and view it locally.
To convert the generated XML document using the rules in the style sheet and then write the output to the file called employees.xml, I typed:
java org.apache.xalan.xslt.Process -in output.xml -xsl csv-stylesheet.xsl -out employees.xml |
If you want to run the transform code, you'll need the Xerces XML parser and version 1.0.0 of the Xalan style sheet processor (see Resources), both of which are free and available at xml.apache.org. The resulting XML looks like Listing 2. (You can also download employees.xml and employees.dtd.)
Listing 2. XML output of the document converted using csv-stylesheet.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE employees SYSTEM "employees.dtd">
<employees>
<employee sex="F">
<serial_number>000010</serial_number>
<name>
<first_name>CHRISTINE</first_name>
<middle_initial>I</middle_initial>
<last_name>HAAS</last_name>
</name>
<department>A00</department>
<phone>3978</phone>
<date_of_hire year="1965" month="01" day="01"/>
<job_title>PRES</job_title>
<years_of_education>18</years_of_education>
<date_of_birth year="1933" month="08" day="24"/>
<salary>52750</salary>
<bonus>1000</bonus>
<commission>4220</commission>
</employee>
<employee sex="M">
...
</employee>
</employees>
|
Figure 1. Excerpt of the HTML table converted from the CSV data via the style sheet employee-table.xsl
| Employee | Sex | Serial Number | Department | Job Title | Date of Hire | Years of Education |
| Phone | Date of Birth | Salary | Bonus | Commission | Total Compensation | |
| ADAMSON, BRUCE | M | 000150 | D11 | DESIGNER | 02/12/1972 | 16 |
| 4510 | 05/17/1947 | $25,280.00 | $500.00 | $2,022.00 | $27,802.00 | |
| BROWN, DAVID | M | 000200 | D11 | DESIGNER | 03/03/1966 | 16 |
| 4501 | 05/29/1941 | $27,740.00 | $600.00 | $2,217.00 | $30,557.00 | |
| GEYER, JOHN B. | M | 000050 | E01 | MANAGER | 08/17/1949 | 16 |
| 6789 | 09/15/1925 | $40,175.00 | $800.00 | $3,214.00 | $44,189.00 | |
| GOUNOT, JASON R. | M | 000340 | E21 | FIELDREP | 05/05/1947 | 16 |
| 5698 | 05/17/1926 | $23,840.00 | $500.00 | $1,907.00 | $26,247.00 | |
| HAAS, CHRISTINE I. | F | 000010 | A00 | PRES | 01/01/1965 | 18 |
| 3978 | 08/24/1933 | $52,750.00 | $1,000.00 | $4,220.00 | $57,970.00 |
You can view the entire table in a separate window, and also view the HTML source.
I hope this helps. I've certainly learned some things along the way, and I hope you have too.
- If you want to transform the XML documents, you'll need the Xerces XML parser and the Xalan
style sheet processor, both of which are free and available at xml.apache.org.

Doug Tidwell is a Senior Programmer at IBM. He has been writing code since the early days of the Reagan Administration. Although he swears he has been writing Java code since the late 1950s, the fact that he was born in the mid-1960s has led many to doubt this claim. His job as a Cyber Evangelist is to help customers evaluate and implement new technology. Thanks to a special arrangement with his employer, his salary is now paid entirely in chocolate truffles. He can be reached at dtidwell@us.ibm.com.




