Level: Introductory
Brett McLaughlin (brett@oreilly.com), Author, O'Reilly and Associates
22 Jul 2003
This tip details the process of creating a SAX ContentHandler, the construct that handles user-defined logic in SAX parsing. You will understand the SAX package structure, see its relation to the SAX ContentHandler interface, and get a handle on callback methods and their use in SAX parsing.
In my last tip, you learned how to set some basic features and properties of parsing on your SAX parser (in the form of an instance
of a class that implements the XMLReader interface). Those features and properties all related to the basic handling of all XML documents that the
parser interacted with, and included such things as validation, namespace handling, and entity expansion. While these
are certainly important aspects of parsing, they are not tailored to a specific document format (such as XML that handles
orders from an online store, or XML that represents the inventory of a machine shop). When it comes to writing logic
that interacts with the parsing process itself, you want to write a SAX ContentHandler.
A ContentHandler is a specific SAX interface, located at org.xml.sax.ContentHandler. This
interface defines the methods shown in Listing 1; you should familiarize yourself with these, as they are the basis
of all SAX processing.
Listing 1. The org.xml.sax.ContentHandler interface
package org.xml.sax;

public interface ContentHandler
{
    public void setDocumentLocator (Locator locator);
    public void startDocument ()
        throws SAXException;
    public void endDocument()
        throws SAXException;
    public void startPrefixMapping (String prefix, String uri)
        throws SAXException;
    public void endPrefixMapping (String prefix)
        throws SAXException;
    public void startElement (String uri, String localName,
                              String qName, Attributes atts)
        throws SAXException;
    public void endElement (String uri, String localName,
                            String qName)
        throws SAXException;
    public void characters (char ch[], int start, int length)
        throws SAXException;
    public void ignorableWhitespace (char ch[], int start, int length)
        throws SAXException;
    public void processingInstruction (String target, String data)
        throws SAXException;
    public void skippedEntity (String name)
        throws SAXException;
}
Each method in this interface provides a hook for you to insert your own custom code into the XML parsing
process. Before diving into the details of each method, though, I'll show you a dummy implementation and how to
register the implementation with your parser. Listing 2 is a simple handler, DummyHandler, that provides
an empty method body for all the required ContentHandler methods.
Listing 2. Implementing the ContentHandler interface (very simply)
import org.xml.sax.*;

public class DummyHandler implements ContentHandler
{
    public void setDocumentLocator (Locator locator) { }
    public void startDocument ()
        throws SAXException { }
    public void endDocument()
        throws SAXException { }
    public void startPrefixMapping (String prefix, String uri)
        throws SAXException { }
    public void endPrefixMapping (String prefix)
        throws SAXException { }
    public void startElement (String uri, String localName,
                              String qName, Attributes atts)
        throws SAXException { }
    public void endElement (String uri, String localName,
                            String qName)
        throws SAXException { }
    public void characters (char ch[], int start, int length)
        throws SAXException { }
    public void ignorableWhitespace (char ch[], int start, int length)
        throws SAXException { }
    public void processingInstruction (String target, String data)
        throws SAXException { }
    public void skippedEntity (String name)
        throws SAXException { }
}
Now, you can take an instance of DummyHandler and attach it to your parser, as shown in Listing 3.
Listing 3. Registering a ContentHandler
// Obtain an instance of an XMLReader implementation
// from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
// Create a new instance and register it with the parser
ContentHandler contentHandler = new DummyHandler();
parser.setContentHandler(contentHandler);
// Don't worry about this for now -- we'll get to it later
parser.parse(myXMLURI);
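If you want to try the registration step without setting a system property or creating a file, the following is a minimal sketch under a few assumptions. It obtains the XMLReader through the JAXP SAXParserFactory class rather than XMLReaderFactory (which recent JDKs mark as deprecated), it parses from an in-memory string, and it registers org.xml.sax.helpers.DefaultHandler, a helper class that ships with SAX and supplies empty callback bodies much like DummyHandler does. The class name RegisterSketch and the method parses() are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class RegisterSketch {

    // Hypothetical helper: returns true if the parser accepts the text
    // as well-formed XML, false if it reports a SAX error.
    static boolean parses(String xml) {
        try {
            // JAXP route to an XMLReader; no system property is needed
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();

            // DefaultHandler implements ContentHandler with empty methods,
            // so it stands in for DummyHandler here
            ContentHandler contentHandler = new DefaultHandler();
            parser.setContentHandler(contentHandler);

            // Parse from memory instead of from a URI
            parser.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not set up or read input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<root><child/></root>"));
        System.out.println(parses("<root>"));
    }
}
```

The registration call itself, setContentHandler(), is the same one used in Listing 3; only the way the parser and the input are obtained differs.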
To understand exactly how the interaction between the parser and the ContentHandler takes place, you
first need to understand what a callback method is. When the XML parser begins to parse your XML input document,
it encounters certain specific events, such as the beginning of the document, the character data in an element,
and the end of an element. Each of these events has an association with a specific method in the ContentHandler
interface; using the examples here, the relevant methods are startDocument(),
characters(), and endElement(), respectively. At each event, the parser calls back
to the content handler, temporarily passing control to the handler, along with some information about the event,
such as the name of the element or the characters being processed. It is at this point, in a callback method,
that your programming logic gets to take part. When the callback method is finished, program flow returns to the parser,
and the process repeats with the next event.
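To make the callback sequence concrete, here is a small sketch that records each callback, along with the information the parser passes in, into a list. It is an illustration under assumptions, not part of the original tip: the names CallbackTrace and trace() are hypothetical, the handler extends the SAX helper class DefaultHandler so that only five callbacks need to be written out, and the parser is obtained through JAXP's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class CallbackTrace {

    // Hypothetical helper: parses the text and returns one entry per callback.
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();

        // Each overridden method is a callback; the parser decides when to call it
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("startElement " + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                // Only the slice [start, start + length) belongs to this event
                events.add("characters " + new String(ch, start, length));
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };

        try {
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<greeting>hi</greeting>")) {
            System.out.println(event);
        }
    }
}
```

Note that the code never calls the handler methods itself. It hands the handler to the parser, and the parser calls back as it works through the input.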
That's a bit of a mouthful, so I'll give you an example. Listing 4 is called HelloHandler, and has a very simple
print statement embedded in each method of the ContentHandler interface. You can use this to do some simple testing.
Listing 4. The HelloHandler class
import org.xml.sax.*;

public class HelloHandler implements ContentHandler
{
    public void setDocumentLocator (Locator locator) {
        System.out.println("Hello from setDocumentLocator()!");
    }
    public void startDocument ()
        throws SAXException {
        System.out.println("Hello from startDocument()!");
    }
    public void endDocument()
        throws SAXException {
        System.out.println("Hello from endDocument()!");
    }
    public void startPrefixMapping (String prefix, String uri)
        throws SAXException {
        System.out.println("Hello from startPrefixMapping()!");
    }
    public void endPrefixMapping (String prefix)
        throws SAXException {
        System.out.println("Hello from endPrefixMapping()!");
    }
    public void startElement (String uri, String localName,
                              String qName, Attributes atts)
        throws SAXException {
        System.out.println("Hello from startElement()!");
    }
    public void endElement (String uri, String localName,
                            String qName)
        throws SAXException {
        System.out.println("Hello from endElement()!");
    }
    public void characters (char ch[], int start, int length)
        throws SAXException {
        System.out.println("Hello from characters()!");
    }
    public void ignorableWhitespace (char ch[], int start, int length)
        throws SAXException {
        System.out.println("Hello from ignorableWhitespace()!");
    }
    public void processingInstruction (String target, String data)
        throws SAXException {
        System.out.println("Hello from processingInstruction()!");
    }
    public void skippedEntity (String name)
        throws SAXException {
        System.out.println("Hello from skippedEntity()!");
    }
}
This is simple enough, right? It shows you when each method is called. The only thing left is to
define a basic XML document, and put things in motion. Listing 5 shows a very simple XML document.
Listing 5. The foo.xml document
<?xml version="1.0"?>
<root>
  <some-element>Some content in the element</some-element>
  <some-other-element>
    <child>
      More content
    </child>
  </some-other-element>
</root>
Now, you can run your test parsing program -- I've included a complete working version for reference in Listing 6 -- and you should get output similar to Listing 7.
Listing 6. The TestParse sample class
import org.xml.sax.*;

public class TestParse {
    public static void main(String[] args) {
        try {
            XMLReader parser =
                org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
            // Create a new instance and register it with the parser
            ContentHandler contentHandler = new HelloHandler();
            parser.setContentHandler(contentHandler);
            // Don't worry about this for now -- we'll get to it later
            parser.parse("foo.xml");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Listing 7. Output from the TestParse class
[aragorn:~/dev] bmclaugh% java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser TestParse
Hello from setDocumentLocator()!
Hello from startDocument()!
Hello from startElement()!
Hello from characters()!
Hello from startElement()!
Hello from characters()!
Hello from endElement()!
Hello from characters()!
Hello from startElement()!
Hello from characters()!
Hello from startElement()!
Hello from characters()!
Hello from characters()!
Hello from endElement()!
Hello from characters()!
Hello from endElement()!
Hello from characters()!
Hello from endElement()!
Hello from endDocument()!
Looking closely at the output, you can begin to get an idea of how your callback methods are used: When parsing begins, the document locator is set (something I'll look at in great detail in a future tip), the start of the document is handled, and then things take off. The start of the root element is encountered, followed by the whitespace before its first child; then comes the start of some-element, its content, the end of that element, and so on. Of course, this handler isn't that useful, as it's often hard to tell exactly what element is being processed, especially as elements are nested.
However, I'm out of time (and space) in this tip, so I'm going to leave improving this handler to you as an exercise. Try to write a version of HelloHandler (call it InfoHandler) that prints out the method being called, as well as the arguments supplied to that method. This will help you see more clearly what's going on in each callback. In the next tip, I'll show you my code for that handler, and I'll begin to delve into callbacks more deeply, and see what each does. Until then, have fun, and I'll see you online and in the newsgroups!
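If you'd like a nudge before starting the exercise, here is one possible sketch of how a handler can keep track of nesting. It is not the InfoHandler promised for the next tip, and it deliberately covers only startElement() and endElement(). The names PathHandler and outline() are hypothetical, the handler extends the SAX helper class DefaultHandler, and the parser comes from JAXP's SAXParserFactory; all of these are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PathHandler extends DefaultHandler {

    // Names of the elements that are currently open, outermost first
    private final Deque<String> open = new ArrayDeque<>();

    // One line per element start or end, including the full path
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        open.addLast(qName);
        lines.add("start " + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lines.add("end " + String.join("/", open));
        open.removeLast();
    }

    // Hypothetical helper: parses the text and returns the recorded lines.
    static List<String> outline(String xml) {
        PathHandler handler = new PathHandler();
        try {
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        String xml = "<root><some-element>x</some-element></root>";
        for (String line : outline(xml)) {
            System.out.println(line);
        }
    }
}
```

Because the parser reports elements strictly in document order, a simple stack of open element names is enough to know where you are at any callback. Extending this to print the arguments of every method is still left to you.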
About the author
Brett McLaughlin has been working in computers since the Logo days (Remember the little triangle?). He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor to the EJBoss project, an open-source EJB application server, and Cocoon, an open-source XML Web publishing engine.