Tip: Implement XMLReader

An interface for XML converters

In this tip, Benoit Marchal explores APIs for XML pipelines. He concludes that the familiar XMLReader interface is appropriate for many XML components.

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft

Benoit Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. Benoit is available to help you with XML projects. You can contact him at bmarchal@pineapplesoft.com or through his personal site at marchal.com.



13 November 2003

One of the most popular designs for processing XML documents is the pipeline. A pipeline is a group of components where each component is responsible for a single step in the processing of XML documents. A document flows from one component to the next, gradually building the final output. Because it divides complex processes into smaller steps, the pipeline promotes a modular design. It is flexible, too: Adding or removing a component will change the processing.

The pipeline API

The javax.xml.transform API implements a crude pipeline with three components:

  • A reader (Source class)
  • A transformer (Transformer class)
  • A serializer (Result class)

Projects like Cocoon, Jelly, and GNU JAXP implement more sophisticated pipelines. Unfortunately for the component developer, each project has a different API, which makes it difficult to write a component that integrates easily into all three.

To minimize porting work, one option is to design the component around basic, commonly available APIs. What better choice than XMLReader and XMLFilter? XMLReader defines an interface for passing SAX events to an application. XMLFilter chains readers into a mini-pipeline. More importantly, every XML pipeline works with an XMLReader, which makes it a good candidate for generic components.
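As a rough illustration of how XMLFilter chains readers, the sketch below places a small filter in front of the JDK's own SAX parser and sends the result through a stylesheet-less transformer. The names UpperCaseFilter and FilterDemo are hypothetical and are not part of the article's source code; the filter simply upper-cases character data so that its effect is visible in the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: upper-cases character data and forwards
// every other SAX event to the next handler unchanged.
class UpperCaseFilter extends XMLFilterImpl {
    UpperCaseFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] upper = new String(ch, start, length)
                .toUpperCase(Locale.ROOT).toCharArray();
        super.characters(upper, 0, upper.length);
    }
}

class FilterDemo {
    // Pipeline: JDK parser -> UpperCaseFilter -> serializer.
    static String run(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            XMLReader filtered = new UpperCaseFilter(parser);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(filtered, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling FilterDemo.run("<greeting>hello</greeting>") should return the document with its text upper-cased. Because the filter is itself an XMLReader, it can be handed to anything that accepts a reader, including another filter.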

Even though XMLReader is one of the most common parser APIs, a lot of mystique surrounds it. There really shouldn't be. While XML parsers are complex animals, XMLReader is a straightforward API, with its methods falling into one of the following categories:

  • Most methods, such as setContentHandler() or setProperty(), register event handlers
  • Some methods, such as parse() and setFeature(), control the flow of events

Not surprisingly, this is exactly what is needed for a component to output XML documents as SAX events. As you will see in the next section, if your component writes an XML document, it takes an hour or less to package it as an XMLReader.
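The two categories can be sketched as a small abstract base class. This is a minimal outline and not the article's ZIP reader: SkeletonReader, HelloReader, and SkeletonDemo are hypothetical names, properties are rejected outright, and only the two feature names that SAX2 requires every reader to recognize are accepted.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical base class; parse(InputSource) is left to subclasses.
abstract class SkeletonReader implements XMLReader {
    // Category 1: registration of event handlers.
    protected ContentHandler contentHandler;
    private DTDHandler dtdHandler;
    private EntityResolver entityResolver;
    private ErrorHandler errorHandler;
    private final Map<String, Boolean> features = new HashMap<>();

    SkeletonReader() {
        features.put("http://xml.org/sax/features/namespaces", true);
        features.put("http://xml.org/sax/features/namespace-prefixes", false);
    }

    @Override public ContentHandler getContentHandler() { return contentHandler; }
    @Override public void setContentHandler(ContentHandler h) { contentHandler = h; }
    @Override public DTDHandler getDTDHandler() { return dtdHandler; }
    @Override public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
    @Override public EntityResolver getEntityResolver() { return entityResolver; }
    @Override public void setEntityResolver(EntityResolver r) { entityResolver = r; }
    @Override public ErrorHandler getErrorHandler() { return errorHandler; }
    @Override public void setErrorHandler(ErrorHandler h) { errorHandler = h; }

    // This sketch supports no properties (for example, no lexical handler).
    @Override public Object getProperty(String name) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }
    @Override public void setProperty(String name, Object value) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }

    // Category 2: control of the event flow.
    @Override public boolean getFeature(String name) throws SAXNotRecognizedException {
        Boolean value = features.get(name);
        if (value == null) throw new SAXNotRecognizedException(name);
        return value;
    }
    @Override public void setFeature(String name, boolean value) throws SAXNotRecognizedException {
        if (!features.containsKey(name)) throw new SAXNotRecognizedException(name);
        features.put(name, value);
    }
    @Override public void parse(String systemId) throws IOException, SAXException {
        parse(new InputSource(systemId));
    }
}

// Hypothetical concrete reader that emits a single empty element.
class HelloReader extends SkeletonReader {
    @Override public void parse(InputSource input) throws SAXException {
        if (contentHandler == null) return;
        contentHandler.startDocument();
        contentHandler.startElement("", "hello", "hello", new AttributesImpl());
        contentHandler.endElement("", "hello", "hello");
        contentHandler.endDocument();
    }
}

class SkeletonDemo {
    // Registers a handler, runs the reader, and returns a log of the events.
    static String events() {
        StringBuilder log = new StringBuilder();
        XMLReader reader = new HelloReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                log.append('<').append(qName).append('>');
            }
            @Override public void endElement(String uri, String local, String qName) {
                log.append("</").append(qName).append('>');
            }
        });
        try {
            reader.parse(new InputSource());
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.toString();
    }
}
```

Most of the class is bookkeeping. The only method that does real work is parse(InputSource), which is where a component would emit its document.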


Implementing XMLReader

XMLReader defines a very simple model for controlling a SAX event flow. To illustrate how a component can implement it, you can wrap ZipFile in an XMLReader. ZipFile is part of the standard Java API (in the java.util.zip package) and makes it easy to read the directory of a ZIP file.

SAX events

Components that write XML documents have methods for printing the tags and the text content. The text printing method escapes special characters such as <. To use these printing methods, you might write code such as:

writeStartTag("zip:Entry");
writeContent(name);
writeEndTag("zip:Entry");

The first step toward turning the component into an XMLReader is to replace these methods with their SAX equivalents: startElement(), characters(), and endElement(), respectively. The SAX version is more verbose, but not more difficult:

contentHandler.startElement(NS_URI,"Entry","zip:Entry",attributes);
contentHandler.characters(name.toCharArray(),0,name.length());
contentHandler.endElement(NS_URI,"Entry","zip:Entry");
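With SAX, escaping is no longer the component's concern: characters() receives the raw text, and whichever serializer sits at the end of the pipeline escapes it. The sketch below sends the same three calls to a JAXP TransformerHandler to show this. EntryEvents is a hypothetical class, and the NS_URI value is a placeholder, since the article does not state the actual namespace URI.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class EntryEvents {
    // Placeholder namespace URI, assumed for this sketch only.
    static final String NS_URI = "http://example.org/zip";

    // The three SAX calls that replace writeStartTag/writeContent/writeEndTag.
    static void writeEntry(ContentHandler handler, String name) throws SAXException {
        handler.startElement(NS_URI, "Entry", "zip:Entry", new AttributesImpl());
        handler.characters(name.toCharArray(), 0, name.length());
        handler.endElement(NS_URI, "Entry", "zip:Entry");
    }

    // Serializes one entry through a stylesheet-less TransformerHandler.
    static String serialize(String name) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer()
                    .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            handler.startDocument();
            handler.startPrefixMapping("zip", NS_URI);
            writeEntry(handler, name);
            handler.endPrefixMapping("zip");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }
}
```

A call such as EntryEvents.serialize("a<b.txt") passes the name unescaped, and the serializer writes the less-than sign as an entity reference.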

Next, move this code into the parse() method. Listing 1 is the relevant excerpt from the ZIP reader (see Resources for the complete listing). It generates an XML document (as SAX events) from the ZIP directory. Make sure you properly declare the XML namespaces. It is not enough to call startPrefixMapping(), because the application can request the xmlns: attributes by setting the http://xml.org/sax/features/namespace-prefixes feature to "true".

Listing 1. Excerpt from the parse() method
// contentHandler, attributes, namespacePrefixes, and NS_URI are declared
// elsewhere in the reader (see the complete listing in Resources)
if(source.getSystemId() != null && contentHandler != null)
{
   URL url = new URL(source.getSystemId());
   // try-with-resources closes the ZIP file even if a handler throws
   try(ZipFile file = new ZipFile(url.getPath()))
   {
      // "enum" has been a reserved word since Java 5, hence "entries"
      Enumeration<? extends ZipEntry> entries = file.entries();
      contentHandler.startDocument();
      contentHandler.startPrefixMapping("zip",NS_URI);
      // report the declaration as an attribute only when the application asks for it
      if(namespacePrefixes)
         attributes.addAttribute("","zip","xmlns:zip","CDATA",NS_URI);
      contentHandler.startElement(NS_URI,"File","zip:File",attributes);
      attributes.clear();
      // one zip:Entry element per entry in the ZIP directory
      while(entries.hasMoreElements())
      {
         ZipEntry entry = entries.nextElement();
         String name = entry.getName();
         contentHandler.startElement(NS_URI,"Entry","zip:Entry",attributes);
         contentHandler.characters(name.toCharArray(),0,name.length());
         contentHandler.endElement(NS_URI,"Entry","zip:Entry");
      }
      contentHandler.endElement(NS_URI,"File","zip:File");
      contentHandler.endPrefixMapping("zip");
      contentHandler.endDocument();
   }
}
else
   throw new FileNotFoundException("InputSource has no system id");
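The effect of the namespace-prefixes feature is easiest to see from the application's side. The sketch below uses the JDK's own SAX parser as a stand-in for any XMLReader and counts the attributes reported on the root element with the feature off and on. PrefixFeatureDemo is a hypothetical class and the namespace URI in the sample document is a placeholder.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PrefixFeatureDemo {
    static final String FEATURE =
            "http://xml.org/sax/features/namespace-prefixes";

    // Returns how many attributes the parser reports on the root element.
    static int rootAttributeCount(boolean namespacePrefixes) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setFeature(FEATURE, namespacePrefixes);

            int[] count = {-1};
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    // Record the root element only.
                    if (count[0] < 0) count[0] = atts.getLength();
                }
            });
            // Placeholder namespace URI, assumed for this sketch only.
            String xml = "<zip:File xmlns:zip='http://example.org/zip'/>";
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }
}
```

With the feature off, the declaration reaches the application only through startPrefixMapping(), so the root element carries no attributes. With it on, xmlns:zip is reported as an attribute as well. A hand-written reader has to reproduce both behaviors, which is what the namespacePrefixes test in Listing 1 is for.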

Getter and setter

The next step is to implement the getter and setter methods for the various interfaces that SAX uses. This is plain old Java coding. Listing 2 is an example for the content handler:

Listing 2. Getter and setter methods
public ContentHandler getContentHandler()
{
   return contentHandler;
}

public void setContentHandler(ContentHandler value)
{
   // NullPointerException is unchecked, so it needs no throws clause
   if(value == null)
      throw new NullPointerException("ContentHandler");
   contentHandler = value;
}

Using the reader

The XML pipeline is well designed for processing XML documents. Since pipelines have no standard API, component writers have to improvise around other basic APIs such as XMLReader. As this tip illustrates, XMLReader is not for parsers only. Rather, it is for any tool that writes XML documents.

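To illustrate, Listing 3 shows how to interface the ZIP reader with a Java transformer. Note that the code creates an identity (copy) transformer, which takes no stylesheet, as a trick for serializing the XML document. The listing writes to standard output; a StreamResult built on a file would save the document to disk instead.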

Listing 3. Using the reader
XMLReader reader = XMLReaderFactory.createXMLReader("org.ananas.tips.ZipReader");
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
Source source = new SAXSource(reader,new InputSource(args[0]));
Result result = new StreamResult(System.out);
transformer.transform(source,result);

As this tip has shown, XML component developers should consider using XMLReader. It offers a standard API for any component that outputs XML. It is not a very complex API to implement, as it mostly consists of setters and getters for SAX event handlers. Finally, every pipeline can take an XMLReader as a starting point, which makes it a very portable API.

Resources

  • Participate in the discussion forum for Benoit Marchal's Working XML column.
  • Download the source code used in this article.
  • Learn how to "Set up a SAX parser" (developerWorks, July 2003) as Brett McLaughlin discusses the initialization of a SAX parser.
  • See a more involved example of packaging an XML generator in the SAX API in "Wrapping Up XI" (developerWorks, July 2002) by Benoit Marchal.
  • Find out more about "SAX, the power API" (developerWorks, August 2001) in this article by Benoit Marchal.
  • Read about Jelly, an XML language from the Apache Jakarta project that offers XML pipelines.
  • Look into the Cocoon project -- also from Apache -- an XML server that is built around XML pipelines.
  • Learn more about GNU JAXP, an open-source implementation of JAXP that includes extensions and, more specifically, a pipeline API.
  • Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.
  • Find out how you can become an IBM Certified Developer in XML and related technologies.
