
Tip: Implement XMLReader

An interface for XML converters

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft
Benoit Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. Benoit is available to help you with XML projects. You can contact him at bmarchal@pineapplesoft.com or through his personal site at marchal.com.

Summary:  In this tip, Benoit Marchal explores APIs for XML pipelines. He concludes that the familiar XMLReader interface is appropriate for many XML components.


Date:  13 Nov 2003
Level:  Intermediate

One of the most popular designs for processing XML documents is the pipeline. A pipeline is a group of components where each component is responsible for a single step in the processing of XML documents. A document flows from one component to the next, gradually building the final output. Because it divides complex processes into smaller steps, the pipeline promotes a modular design. It is also flexible: adding or removing a component changes the processing.

The pipeline API

The javax.xml.transform API implements a crude pipeline with three components:

  • A reader (Source interface)
  • A transformer (Transformer class)
  • A serializer (Result interface)
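To make the three roles concrete, here is a minimal sketch that uses only the JDK: a StreamSource plays the reader, a Transformer created without a stylesheet (an identity transformer) plays the transformer, and a StreamResult plays the serializer. The class name TransformPipeline and the sample document are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformPipeline {
    // Runs the three-stage javax.xml.transform pipeline: Source -> Transformer -> Result.
    public static String copy(String xml) throws TransformerException {
        Source source = new StreamSource(new StringReader(xml));      // reader stage
        Transformer transformer =
            TransformerFactory.newInstance().newTransformer();         // no stylesheet: identity copy
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        Result result = new StreamResult(out);                         // serializer stage
        transformer.transform(source, result);
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(copy("<doc><item>a &lt; b</item></doc>"));
    }
}
```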

Projects such as Cocoon, Jelly, and GNU JAXP implement more sophisticated pipelines. Unfortunately for component developers, each project has its own API, so it is difficult to write a component that integrates easily with all three.

To minimize porting work, one option is to design the component around basic, widely available APIs. What better choice than XMLReader and XMLFilter? XMLReader defines an interface for passing SAX events to an application. XMLFilter chains readers into a mini-pipeline. More importantly, every XML pipeline accepts an XMLReader, which makes it a perfect candidate for generic components.
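As an illustration of such a mini-pipeline, the sketch below chains the JDK's built-in SAX parser and a filter, then lets an identity transformer drive the chain by calling parse() on the last reader. UpperCaseFilter is a hypothetical filter that upper-cases text content; it extends the SAX helper class XMLFilterImpl, which forwards every event it does not override to the next stage.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChain {
    // Hypothetical filter stage: upper-cases text content, passes every other event through.
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) { super(parent); }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            String upper = new String(ch, start, length).toUpperCase();
            super.characters(upper.toCharArray(), 0, upper.length());
        }
    }

    public static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();  // first stage: a real parser
        XMLReader filter = new UpperCaseFilter(parser);         // second stage: the filter

        Transformer copy = TransformerFactory.newInstance().newTransformer();
        copy.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // The transformer registers itself as the filter's handler, then calls filter.parse().
        copy.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                       new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<list><name>readme.txt</name></list>"));
    }
}
```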

Even though XMLReader is one of the most common parser APIs, a lot of mystique surrounds it. There really shouldn't be. While XML parsers are complex animals, XMLReader is a straightforward API whose methods fall into two categories:

  • Most methods, such as setContentHandler() or setProperty(), register event handlers
  • Some methods, such as parse() and setFeature(), control the flow of events

Not surprisingly, this is exactly what a component needs to output an XML document as SAX events. As you will see in the next section, if your component already writes XML documents, packaging it as an XMLReader takes an hour or less.


Implementing XMLReader

XMLReader defines a very simple model for controlling a SAX event flow. To illustrate how a component can implement it, you can wrap ZipFile in an XMLReader. ZipFile is part of the standard Java API (in the java.util.zip package) and makes it easy to read the directory of a ZIP file.
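For reference, reading the ZIP directory without any XML involved looks roughly like the sketch below. It uses try-with-resources and generics (Java 7 and later), which postdate this tip; the class name ZipDirectory and the throwaway archive built in main() are illustrative only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipDirectory {
    // Returns the names of all entries listed in a ZIP file's directory.
    public static List<String> entryNames(File zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile file = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = file.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a small temporary archive so the example is self-contained.
        File zip = File.createTempFile("demo", ".zip");
        zip.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip))) {
            for (String name : new String[] {"readme.txt", "src/Main.java"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes("UTF-8"));
                out.closeEntry();
            }
        }
        System.out.println(entryNames(zip));
    }
}
```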

SAX events

Components that write XML documents have methods for printing the tags and the text content. The text printing method escapes special characters such as <. To use these printing methods, you might write code such as:

writeStartTag("zip:Entry");
writeContent(name);
writeEndTag("zip:Entry");

The first step toward turning the component into an XMLReader is to replace these methods with their SAX equivalents: startElement(), characters(), and endElement(), respectively. The SAX version is more verbose, but not more difficult:

contentHandler.startElement(NS_URI,"Entry","zip:Entry",attributes);
contentHandler.characters(name.toCharArray(),0,name.length());
contentHandler.endElement(NS_URI,"Entry","zip:Entry");
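One way to watch these three calls at work, before writing a full reader, is to send them straight to a serializer. This sketch assumes a JAXP implementation whose TransformerFactory is also a SAXTransformerFactory (true of the JDK's default) and uses a made-up NS_URI value, because the tip does not give the real one. Escaping the < in the name is now the serializer's job, not the component's.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class SaxEventsDemo {
    // Hypothetical namespace URI; the article does not state the value of NS_URI.
    static final String NS_URI = "http://example.com/ns/zip";

    // Emits one zip:Entry element as SAX events, mirroring the three calls above.
    static void writeEntry(ContentHandler contentHandler, String name) throws SAXException {
        AttributesImpl attributes = new AttributesImpl();
        contentHandler.startElement(NS_URI, "Entry", "zip:Entry", attributes);
        contentHandler.characters(name.toCharArray(), 0, name.length());
        contentHandler.endElement(NS_URI, "Entry", "zip:Entry");
    }

    public static String serialize(String name) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler(); // SAX in, text out
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        serializer.startDocument();
        serializer.startPrefixMapping("zip", NS_URI);
        writeEntry(serializer, name);
        serializer.endPrefixMapping("zip");
        serializer.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("a<b.txt"));
    }
}
```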

Next, move this code into the parse() method. Listing 1 is the relevant excerpt from the ZIP reader (see Resources for the complete listing); it generates an XML document, as SAX events, from the ZIP directory. Make sure you declare the XML namespaces properly: calling startPrefixMapping() is not enough, because the application can also request the xmlns: attributes by setting the http://xml.org/sax/features/namespace-prefixes feature to true.


Listing 1. Excerpt from the parse() method
                
if(source.getSystemId() == null)
   throw new FileNotFoundException("InputSource has no system id");
if(contentHandler == null)
   return;   // no content handler registered: nobody to send events to
URL url = new URL(source.getSystemId());
ZipFile file = new ZipFile(url.getPath());
try
{
   // "enum" is a reserved word since Java 5, so use another variable name
   Enumeration entries = file.entries();
   contentHandler.startDocument();
   contentHandler.startPrefixMapping("zip",NS_URI);
   // with namespace-prefixes set, also report the declaration as an xmlns:zip attribute
   if(namespacePrefixes)
      attributes.addAttribute("","zip","xmlns:zip","CDATA",NS_URI);
   contentHandler.startElement(NS_URI,"File","zip:File",attributes);
   attributes.clear();
   // one zip:Entry element per entry in the ZIP directory
   while(entries.hasMoreElements())
   {
      ZipEntry entry = (ZipEntry)entries.nextElement();
      String name = entry.getName();
      contentHandler.startElement(NS_URI,"Entry","zip:Entry",attributes);
      contentHandler.characters(name.toCharArray(),0,name.length());
      contentHandler.endElement(NS_URI,"Entry","zip:Entry");
   }
   contentHandler.endElement(NS_URI,"File","zip:File");
   contentHandler.endPrefixMapping("zip");
   contentHandler.endDocument();
}
finally
{
   file.close();   // release the ZIP file even if a handler throws
}
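Listing 1 reads a namespacePrefixes flag. A possible way to maintain it is through getFeature() and setFeature(), sketched below. The SAX specification requires every XMLReader to recognize the namespaces and namespace-prefixes feature names; fixing namespaces at true and rejecting unknown features with SAXNotRecognizedException are choices of this sketch, not necessarily what the downloadable ZIP reader does. FeatureSupport is a hypothetical class name.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Sketch of the feature methods a reader such as the ZIP reader might provide.
public class FeatureSupport {
    public static final String NAMESPACES =
        "http://xml.org/sax/features/namespaces";
    public static final String NAMESPACE_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // Read by parse(): when true, xmlns:zip is also reported as an attribute.
    private boolean namespacePrefixes = false;

    public boolean getFeature(String name)
        throws SAXNotRecognizedException, SAXNotSupportedException {
        if (NAMESPACES.equals(name))
            return true;                      // this reader always reports namespaces
        if (NAMESPACE_PREFIXES.equals(name))
            return namespacePrefixes;
        throw new SAXNotRecognizedException(name);
    }

    public void setFeature(String name, boolean value)
        throws SAXNotRecognizedException, SAXNotSupportedException {
        if (NAMESPACES.equals(name)) {
            if (!value)                       // switching namespace processing off is not offered
                throw new SAXNotSupportedException(name);
        } else if (NAMESPACE_PREFIXES.equals(name)) {
            namespacePrefixes = value;
        } else {
            throw new SAXNotRecognizedException(name);
        }
    }
}
```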

Getter and setter

The next step is to implement the getter and setter methods for the handler interfaces that SAX uses: ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. This is plain old Java coding. Listing 2 shows the pair for the content handler:


Listing 2. Getter and setter methods
                
public ContentHandler getContentHandler()
{
   return contentHandler;
}

public void setContentHandler(ContentHandler value)
   throws NullPointerException
{
   if(value == null)
      throw new NullPointerException("ContentHandler");
   else
      contentHandler = value;
}
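The remaining handler accessors follow the same pattern. One way to avoid rewriting them for every component is an abstract base class such as the hypothetical ReaderBase below: it stores the four SAX handlers, implements the parse(String) convenience method, and rejects every property, leaving parse(InputSource), getFeature(), and setFeature() to the concrete reader. Unlike Listing 2, it accepts null handlers; since Listing 1 already checks for a missing content handler, either choice works.

```java
import java.io.IOException;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical base class holding the handler getters/setters every XMLReader needs.
// Subclasses still implement parse(InputSource), getFeature() and setFeature().
public abstract class ReaderBase implements XMLReader {
    protected ContentHandler contentHandler;
    protected DTDHandler dtdHandler;
    protected EntityResolver entityResolver;
    protected ErrorHandler errorHandler;

    public ContentHandler getContentHandler() { return contentHandler; }
    public void setContentHandler(ContentHandler value) { contentHandler = value; }
    public DTDHandler getDTDHandler() { return dtdHandler; }
    public void setDTDHandler(DTDHandler value) { dtdHandler = value; }
    public EntityResolver getEntityResolver() { return entityResolver; }
    public void setEntityResolver(EntityResolver value) { entityResolver = value; }
    public ErrorHandler getErrorHandler() { return errorHandler; }
    public void setErrorHandler(ErrorHandler value) { errorHandler = value; }

    // Convenience overload required by XMLReader: wrap the system ID in an InputSource.
    public void parse(String systemId) throws IOException, SAXException {
        parse(new InputSource(systemId));
    }

    // This sketch recognizes no properties at all.
    public Object getProperty(String name)
        throws SAXNotRecognizedException, SAXNotSupportedException {
        throw new SAXNotRecognizedException(name);
    }

    public void setProperty(String name, Object value)
        throws SAXNotRecognizedException, SAXNotSupportedException {
        throw new SAXNotRecognizedException(name);
    }
}
```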


Using the reader

The XML pipeline is well suited to processing XML documents. Because pipelines have no standard API, component writers have to build on other basic APIs such as XMLReader. As this tip illustrates, XMLReader is not only for parsers; it suits any tool that writes XML documents.

To illustrate, Listing 3 shows how to connect the ZIP reader to a JAXP transformer. Note that the code creates a copy (identity) transformer, one that takes no stylesheet, as a trick for serializing the XML document, here to standard output.


Listing 3. Using the reader
                
// args[0] must be a URL such as file:/tmp/demo.zip: the reader passes it to new URL()
XMLReader reader = XMLReaderFactory.createXMLReader("org.ananas.tips.ZipReader");
TransformerFactory factory = TransformerFactory.newInstance();
// a transformer created without a stylesheet copies its input unchanged
Transformer transformer = factory.newTransformer();
Source source = new SAXSource(reader,new InputSource(args[0]));
Result result = new StreamResult(System.out);
transformer.transform(source,result);
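Listing 3 prints the document on standard output. To save it to a file instead, the same copy-transformer trick can be wrapped in a small helper that accepts any XMLReader. SaveAsXml is a hypothetical name, and turning on indentation is an optional extra. Because Listing 1 passes the system ID to new URL(), the ZIP reader expects a URL such as file:/path/demo.zip rather than a bare path; File.toURI().toString() produces one.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaveAsXml {
    // Copies the SAX events produced by any XMLReader into a file via an identity transformer.
    public static void save(XMLReader reader, String systemId, File output)
        throws IOException, TransformerException {
        Transformer copy = TransformerFactory.newInstance().newTransformer(); // no stylesheet
        copy.setOutputProperty(OutputKeys.INDENT, "yes");                    // optional
        try (OutputStream out = new FileOutputStream(output)) {             // closed on exit
            copy.transform(new SAXSource(reader, new InputSource(systemId)),
                           new StreamResult(out));
        }
    }
}
```

With the ZIP reader on the classpath, a call might look like SaveAsXml.save(XMLReaderFactory.createXMLReader("org.ananas.tips.ZipReader"), new File("demo.zip").toURI().toString(), new File("demo.xml")).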

As this tip has shown, XML component developers should consider using XMLReader. It offers a standard API for any component that outputs XML. It is not a very complex API to implement, as it mostly consists of setters and getters for SAX event handlers. Finally, every pipeline can take an XMLReader as a starting point, which makes it a very portable API.


Resources

  • Participate in the discussion forum for Benoit Marchal's Working XML column.

  • Download the source code used in this article.

  • Learn how to "Set up a SAX parser" (developerWorks, July 2003) as Brett McLaughlin discusses the initialization of a SAX parser.

  • See a more involved example of packaging an XML generator in the SAX API in "Wrapping Up XI" (developerWorks, July 2002) by Benoit Marchal.

  • Find out more about "SAX, the power API" (developerWorks, August 2001) in this article by Benoit Marchal.

  • Read about Jelly, an XML language from the Apache Jakarta project that offers XML pipelines.

  • Look into the Cocoon project -- also from Apache -- an XML server that is built around XML pipelines.

  • Learn more about GNU JAXP, an open-source implementation of JAXP that includes extensions and, more specifically, a pipeline API.

  • Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.

  • Find out how you can become an IBM Certified Developer in XML and related technologies.
