Tip: Set up a SAX parser

Use properties and features in SAX parsers

Brett McLaughlin (brett@oreilly.com), Author, O'Reilly and Associates
Brett McLaughlin has been working in computers since the Logo days. (Remember the little triangle?) He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor of the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.

Summary:  This is the first in a series of tips that will serve as a comprehensive guide to using XML from the Java programming language. I begin with coverage of the SAX API. This tip reviews getting an instance of a SAX parser and setting various features and properties on that parser. Also, be sure to participate in the developerWorks XML and Java technology forum, hosted by Brett McLaughlin.

Date:  02 Jul 2003
Level:  Introductory

Working with XML from Java is a pretty rich topic; multiple APIs are available, and many of these make working with XML as easy as reading lines from a text document. Tree-based APIs like DOM present an in-memory XML structure that is optimal for GUIs and editors, and stream-based APIs like SAX are great for high-performance applications that only need to get at a document's data. In this series of tips, I walk you through the use of XML from Java, starting with the basics. Along the way, you'll learn lots of tricks that many of the pros don't even know about, so stick around even if you already have some XML experience.

I begin with SAX -- the Simple API for XML. While this API is probably the hardest of the Java and XML APIs to master, it's also arguably the most powerful. Additionally, most other API implementations (like DOM parsers, JDOM, dom4j, and so forth) are based in part on a SAX parser. Understanding SAX gives you a head start on everything else you do in XML and the Java language. In this tip specifically, I'll cover getting an instance of a SAX parser and setting some basic features and properties of that parser.

Note: I'm assuming you have downloaded a SAX-compliant parser, such as Apache Xerces-J (see Resources for links). The Apache site has a wealth of information on how to get things set up, but basically you just need to drop the downloaded JAR files into your CLASSPATH. These examples assume that your parser is available for use.

Getting a parser

The first step in working with SAX is actually getting an instance of a parser. In SAX, the parser is represented by an implementation of the org.xml.sax.XMLReader interface. I covered this in detail in a previous tip ("Achieving vendor independence with SAX" -- see Resources), so I won't spend much time on it here. Listing 1 shows the correct way to get a new SAX parser instance without writing vendor-dependent code.


Listing 1. Getting a SAX parser instance
                
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

Using this approach, you need to set the system property org.xml.sax.driver to the class name of the parser you want to load. This is a vendor-specific class; for Xerces it should be org.apache.xerces.parsers.SAXParser. You specify this property with the -D switch to the java launcher (not the compiler) when you run your program:

java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser some.sample.Class

Of course, you want to ensure that the class specified exists and is on your class path.
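
To check which parser class actually gets loaded, a small sketch like the following can help. The class name ParserLookup and its method are made up for illustration. Note also that newer JDKs mark XMLReaderFactory as deprecated and fall back to a built-in parser when the system property is not set, so the result depends on your environment.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class; not part of SAX or of this tip's listings.
class ParserLookup {

  // Returns the class name of the XMLReader the factory picked,
  // or null if no parser could be loaded.
  @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
  static String loadedParserClass() {
    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      return parser.getClass().getName();
    } catch (SAXException e) {
      // Typically means the class named in org.xml.sax.driver is not on the class path
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println("org.xml.sax.driver = "
        + System.getProperty("org.xml.sax.driver"));
    System.out.println("Loaded parser      = " + loadedParserClass());
  }
}
```

Run it once with and once without the -D switch; if the second line changes, the system property is being picked up.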


Features

Once you have an instance of your parser, you need to configure it. Note that this isn't the same as setting up the parser to deal with errors, content, or structures in XML; instead, configuration is the process of actually telling the parser how to behave. You may turn on validation, turn off namespace checking, and expand entities. These behaviors are totally independent of a specific XML document, and therefore involve interaction with your new parser instance.

Note: For those of you who are overly anxious (I know you're out there), I will indeed be dealing with content, error handling, and the like. However, those subjects will be addressed in future tips, so you'll have to check back. For now, just concentrate on configuration, features, and properties.

You can configure parsers in two ways: features and properties. Features involve turning on or off a specific piece of functionality, like validation. Properties involve setting the value of a specific item that the parser uses, like the location of a schema to validate all documents against. I'll deal with features first, and then look at properties in the next section.

Features are set, not surprisingly, through a method on your parser called setFeature(). The syntax looks like that in Listing 2.


Listing 2. Setting features on a SAX parser
                
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

String featureName = "some feature URI";
boolean featureOn = true;

try {
  parser.setFeature(featureName, featureOn);
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting feature: " + e.getMessage());
}

This is pretty self-explanatory; the key is knowing the common features available to SAX parsers. Each feature is identified by a specific URI. A complete list of these URIs is available online at the SAX Web site (see Resources). Some of the most common features are validation and namespace processing. Listing 3 shows an example of setting both of these features.


Listing 3. Some common features
                
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

try {
  // Turn on validation
  parser.setFeature("http://xml.org/sax/features/validation", true);
  // Ensure namespace processing is on (the default)
  parser.setFeature("http://xml.org/sax/features/namespaces", true);
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting feature: " + e.getMessage());
}
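
XMLReader also has a getFeature() method, which is handy for confirming that a setting took effect. The following is a minimal sketch under that assumption; the class FeatureState and its method are names invented for this example, and it uses whatever parser XMLReaderFactory happens to load.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class for illustration only.
class FeatureState {
  static final String VALIDATION = "http://xml.org/sax/features/validation";
  static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

  // Sets a feature on a fresh parser, then asks the parser what it recorded.
  @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
  static boolean setAndReadBack(String featureUri, boolean on) {
    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      parser.setFeature(featureUri, on);
      return parser.getFeature(featureUri);
    } catch (SAXException e) {
      // Covers SAXNotRecognizedException and SAXNotSupportedException too
      throw new IllegalStateException("Could not set " + featureUri, e);
    }
  }

  public static void main(String[] args) {
    System.out.println("validation: " + setAndReadBack(VALIDATION, true));
    System.out.println("namespaces: " + setAndReadBack(NAMESPACES, true));
  }
}
```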

Note that while parsers have several standard SAX features, they are free to add their own vendor-specific features. For example, Apache Xerces-J adds features that allow for dynamic validation and the continuance of processing after encountering a fatal error. Consult your parser vendor's documentation for the relevant feature URIs.
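
Because vendor-specific features may not exist in every parser, it can be safer to treat them as optional. Here is one possible sketch: it tries each feature URI and simply reports which ones the parser accepted. The class and method names are hypothetical. The two Xerces URIs in main() come from the Xerces feature documentation; whether your parser accepts them is exactly what the sketch checks.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class for illustration only.
class VendorFeatures {

  // Turns on each feature the parser accepts; returns the accepted URIs.
  static List<String> enableIfAvailable(XMLReader parser, String... featureUris) {
    List<String> enabled = new ArrayList<>();
    for (String uri : featureUris) {
      try {
        parser.setFeature(uri, true);
        enabled.add(uri);
      } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
        // Optional feature: skip it and carry on with the rest
      }
    }
    return enabled;
  }

  // Convenience wrapper that uses whatever parser the factory loads.
  @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
  static List<String> enableOnDefaultParser(String... featureUris) {
    try {
      return enableIfAvailable(XMLReaderFactory.createXMLReader(), featureUris);
    } catch (SAXException e) {
      throw new IllegalStateException("No SAX parser available", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(enableOnDefaultParser(
        "http://apache.org/xml/features/validation/dynamic",
        "http://apache.org/xml/features/continue-after-fatal-error"));
  }
}
```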


Properties

Once you understand features, making sense of properties is easy. They behave in exactly the same manner, except that properties take an object as an argument where features take in a boolean value. You use the setProperty() method for this purpose, as shown in Listing 4.


Listing 4. Setting properties on a SAX parser
                
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

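String propertyName = "some property URI";
// Placeholder value; the type the parser expects depends on the property
Object propertyValue = "some property value";

try {
  parser.setProperty(propertyName, propertyValue);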
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting property: " + e.getMessage());
}
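
Since setFeature() and setProperty() declare the same two exceptions, the handling can be written once and shared. The sketch below is one way to do that, not the only one; the Setting interface, the attempt() method, and the example.invalid property URI are all invented for this example.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class for illustration only.
class SharedHandling {

  // Wraps either a setFeature() or a setProperty() call.
  interface Setting {
    void apply() throws SAXNotRecognizedException, SAXNotSupportedException;
  }

  // Runs the setting and turns the outcome into a short status string.
  static String attempt(String uri, Setting setting) {
    try {
      setting.apply();
      return "ok";
    } catch (SAXNotRecognizedException e) {
      return "unknown: " + uri;
    } catch (SAXNotSupportedException e) {
      return "unsupported: " + uri;
    }
  }

  @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
  static String[] demo() {
    final XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    } catch (SAXException e) {
      throw new IllegalStateException("No SAX parser available", e);
    }
    String feature = "http://xml.org/sax/features/namespaces";
    // Made-up URI, used here only to trigger the "not recognized" path
    String property = "http://example.invalid/properties/no-such-property";
    return new String[] {
      attempt(feature, () -> parser.setFeature(feature, true)),
      attempt(property, () -> parser.setProperty(property, "value"))
    };
  }

  public static void main(String[] args) {
    for (String result : demo()) {
      System.out.println(result);
    }
  }
}
```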

The same error-handling framework is in play here, so you can easily duplicate code between the two types of configuration options. As with features, SAX provides a standard set of properties, and vendors can add their own extensions. Common SAX-standard properties allow for setting a Lexical Handler and a Declaration Handler (two handlers I'll discuss in later tips). Parsers like Apache Xerces extend these with, for example, the ability to set the input buffer size and the location of an external schema to use in validation. Listing 5 shows a few properties in action.


Listing 5. Some common properties
                
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

try {
  // Set the size of the input buffer the parser reads into (Xerces-specific)
  parser.setProperty("http://apache.org/xml/properties/input-buffer-size", 
      Integer.valueOf(2048));
  // Set a LexicalHandler
  parser.setProperty("http://xml.org/sax/properties/lexical-handler", 
      new MyLexicalHandler());
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting property: " + e.getMessage());
}
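
The MyLexicalHandler class in Listing 5 is not defined in this tip. As a rough idea of what such a class could look like, here is a minimal sketch that extends org.xml.sax.ext.DefaultHandler2 (which implements LexicalHandler) and records XML comments. The class name CommentCollector and its collect() method are assumptions made for this example, and the parse() call jumps slightly ahead of what this tip covers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical stand-in for MyLexicalHandler; records every XML comment.
class CommentCollector extends DefaultHandler2 {
  final List<String> comments = new ArrayList<>();

  // LexicalHandler callback, invoked once per comment in the document
  @Override
  public void comment(char[] ch, int start, int length) {
    comments.add(new String(ch, start, length));
  }

  // Registers the handler through the lexical-handler property and parses xml.
  @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
  static List<String> collect(String xml) {
    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      CommentCollector handler = new CommentCollector();
      parser.setProperty(
          "http://xml.org/sax/properties/lexical-handler", handler);
      parser.parse(new InputSource(new StringReader(xml)));
      return handler.comments;
    } catch (Exception e) {
      throw new IllegalStateException("Parsing failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(collect("<doc><!-- first --><item/><!-- second --></doc>"));
  }
}
```

Without the setProperty() call the parser has nowhere to report comments, so the list would stay empty.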

With an understanding of features and properties, you can make your parser do almost anything. Once you understand setting up your parser in this fashion, you're ready for my next tip, which will discuss building a basic content handler. Until then, I'll see you online in the XML and Java technology forum.


Resources
