Introduction
Over the last few years, XML has become a universal data format. In this updated tutorial, I'll show you the most common programming interfaces for working with XML documents in the Java language.
The most common XML processing task is parsing an XML document. Parsing involves reading an XML document to determine its structure and contents. One of the pleasures of XML programming is the availability of open-source, no-cost XML parsers that read XML documents for you. This tutorial focuses on creating parser objects, asking those parsers to process XML files, and handling the results. As you might expect, you can do these common tasks in several different ways; I'll examine the standards involved as well as when you should use one approach or another.
A number of programming interfaces have been created to simplify writing Java programs that process XML. These interfaces have been defined by companies, by standards bodies, and by user groups to meet the needs of XML programmers. In this tutorial, I'll cover the following interfaces:
- The Document Object Model (DOM), Level 2
- The Simple API for XML (SAX), Version 2.0
- JDOM, a simple Java API created by Jason Hunter and Brett McLaughlin
- The Java API for XML Processing (JAXP)
The first three of these four interfaces (DOM, SAX, and JDOM) define how the contents of an XML document are accessed and represented. JAXP contains classes for creating parser objects. To create DOM or SAX parsers, you'll use JAXP. When you use JDOM, the JDOM library uses JAXP under the covers to create a parser. To sum it all up:
- You use DOM, SAX, or JDOM to work with the contents of an XML document.
- If you use DOM or SAX, you use JAXP to create a parser.
- If you use JDOM, the JDOM library creates a parser for you.
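The following is a rough sketch of that division of labor. It is not one of the tutorial's sample programs. It uses only the JAXP factory classes (`DocumentBuilderFactory` and `SAXParserFactory`) to obtain a DOM parser and a SAX parser, then hands each the same small document. The class name `JaxpParsers`, its helper methods, and the inline XML string are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: JAXP creates the parsers, DOM and SAX deliver the content.
public class JaxpParsers {

    // JAXP -> DOM: build an in-memory tree and return the root element's name.
    static String domRootName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    // JAXP -> SAX: stream through the document and count start-element events.
    static int saxElementCount(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sonnet><author><lastName>Shakespeare</lastName></author></sonnet>";
        System.out.println("DOM root: " + domRootName(xml));      // prints "DOM root: sonnet"
        System.out.println("SAX elements: " + saxElementCount(xml)); // prints "SAX elements: 3"
    }
}
```

In both cases, the application code asks a JAXP factory for a parser instead of naming a parser class directly. This is what lets you swap in Xerces or another implementation without changing the program.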
I'll explore the design goals, strengths, and weaknesses of each of these APIs, along with a bit of their histories and the standards bodies that created them.
Throughout this tutorial, I'll show you a number of sample programs that use the DOM, SAX, and JDOM APIs. All of them work with an XML-tagged Shakespearean sonnet. The structure of the sonnet is:
<sonnet>
  <author>
    <lastName>
    <firstName>
    <nationality>
    <yearOfBirth>
    <yearOfDeath>
  </author>
  <lines>
    [14 <line> elements]
  </lines>
</sonnet>
For the complete example, see sonnet.xml and sonnet.dtd (download to view in a text editor).
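The sketch below shows how a DOM program might walk this structure. It pulls the author's last name out of `<author>` and counts the `<line>` children of `<lines>`. It deliberately sticks to DOM Level 2 calls (`getElementsByTagName`, `getFirstChild`, `getNodeValue`). The class name `SonnetSummary` is hypothetical, and the inline document is an abbreviated stand-in with two `<line>` elements, not the contents of sonnet.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper that reads the sonnet structure through DOM.
public class SonnetSummary {

    // Parses an XML string into a DOM Document using a JAXP-created parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text inside the first <lastName>, or null if it is missing or empty.
    static String lastName(Document doc) {
        NodeList names = doc.getElementsByTagName("lastName");
        if (names.getLength() == 0) return null;
        Node text = names.item(0).getFirstChild();
        return text == null ? null : text.getNodeValue();
    }

    // Counts the <line> elements nested inside the first <lines> element.
    static int lineCount(Document doc) {
        NodeList linesElements = doc.getElementsByTagName("lines");
        if (linesElements.getLength() == 0) return 0;
        Element lines = (Element) linesElements.item(0);
        return lines.getElementsByTagName("line").getLength();
    }

    public static void main(String[] args) throws Exception {
        // Abbreviated stand-in for sonnet.xml: two <line> elements instead of 14.
        String xml = "<sonnet>"
            + "<author><lastName>Shakespeare</lastName><firstName>William</firstName>"
            + "<nationality>British</nationality><yearOfBirth>1564</yearOfBirth>"
            + "<yearOfDeath>1616</yearOfDeath></author>"
            + "<lines><line>First line</line><line>Second line</line></lines>"
            + "</sonnet>";
        Document doc = parse(xml);
        System.out.println(lastName(doc) + ": " + lineCount(doc) + " lines"); // prints "Shakespeare: 2 lines"
    }
}
```

To run it against the real file instead, you could pass a `java.io.File` to `DocumentBuilder.parse`. Keep sonnet.dtd in the same directory, because the parser will typically try to load the DTD that sonnet.xml references.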
You'll need to set up a few things on your machine before you can run the examples. (I'm assuming that you know how to compile and run a Java program, and that you know how to set your CLASSPATH variable.)
- First, visit the home page of the Xerces XML parser at the Apache XML Project (http://xml.apache.org/xerces2-j/). You can also go directly to the download page (http://xml.apache.org/xerces2-j/download.cgi).
- Unzip the file that you downloaded from Apache. This creates a directory named xerces-2_5_0 or something similar, depending on the release level of the parser. The JAR files you need (xercesImpl.jar and xml-apis.jar) should be in the Xerces root directory.
- Visit the JDOM project's Web site and download the latest version of JDOM (http://jdom.org/).
- Unzip the file you downloaded from JDOM. This creates a directory named jdom-b9 or something similar. The JAR file you need (jdom.jar) should be in the build directory.
- Next, download the zip file of examples for this tutorial, xmlprogj.zip, and unzip the file.
- Finally, add the current directory (.), xercesImpl.jar, xml-apis.jar, and jdom.jar to your CLASSPATH.
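If you want to confirm the setup before running the examples, a small check like the one below may help. It is a hypothetical helper, not part of xmlprogj.zip. The class names it probes are assumptions: `org.apache.xerces.parsers.DOMParser` for Xerces 2.x and `org.jdom.Document` for JDOM beta and 1.x releases (JDOM 2 moved to `org.jdom2`). It also prints the class that JAXP's `DocumentBuilderFactory.newInstance()` selects. When xercesImpl.jar is on the CLASSPATH, this is typically a Xerces class. Otherwise it is the JDK's built-in implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical setup check: reports whether the expected JARs are visible.
public class CheckClasspath {

    // Returns true if the named class can be loaded (without initializing it).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, CheckClasspath.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names; adjust them if your Xerces or JDOM release differs.
        String[] required = {
            "org.apache.xerces.parsers.DOMParser", // from xercesImpl.jar
            "org.jdom.Document"                    // from jdom.jar
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found:   " : "MISSING: ") + name);
        }
        // Shows which DOM implementation JAXP's factory lookup picked.
        System.out.println("JAXP DOM factory: "
            + DocumentBuilderFactory.newInstance().getClass().getName());
    }
}
```

If either line reports MISSING, recheck the CLASSPATH entries from the last step.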




