Working with XML from Java is a pretty rich topic; multiple APIs are available, and many of these make working with XML as easy as reading lines from a text document. Tree-based APIs like DOM present an in-memory XML structure that is optimal for GUIs and editors, and stream-based APIs like SAX are great for high-performance applications that only need to get at a document's data. In this series of tips, I walk you through the use of XML from Java, starting with the basics. Along the way, you'll learn lots of tricks that many of the pros don't even know about, so stick around even if you already have some XML experience.
I begin with SAX -- the Simple API for XML. While this API is probably the hardest of the Java and XML APIs to master, it's also arguably the most powerful. Additionally, most other API implementations (like DOM parsers, JDOM, dom4j, and so forth) are based in part on a SAX parser. Understanding SAX gives you a head start on everything else you do in XML and the Java language. In this tip specifically, I'll cover getting an instance of a SAX parser and setting some basic features and properties of that parser.
Note: I'm assuming you have downloaded a SAX-compliant parser, such as Apache Xerces-J (see Resources for links). The Apache site has a wealth of information on how to get things set up, but basically you just need to drop the downloaded JAR files into your CLASSPATH. These examples assume that your parser is available for use.
The first step in working with SAX is actually getting an instance of a parser. In SAX, the parser is represented by an implementation of the org.xml.sax.XMLReader interface. I covered this in detail in a previous tip ("Achieving vendor independence with SAX" -- see Resources), so I won't spend much time on it here. Listing 1 shows the correct way to get a new SAX parser instance without writing vendor-dependent code.
Listing 1. Getting a SAX parser instance
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
With this approach, you set the system property org.xml.sax.driver to the class name of the parser you want to load. This is a vendor-specific class; for Xerces it should be org.apache.xerces.parsers.SAXParser. You specify it with the -D switch to the java launcher (not the compiler) when you run your program:
java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser some.sample.Class
Of course, you want to ensure that the class specified exists and is on your class path.
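To check which parser class the factory actually hands back, a small test program along these lines can help. This is only a sketch: the class name ParserLookup and its describeParser() method are made up for illustration, and it assumes that your runtime supplies some default parser when org.xml.sax.driver is not set (current JDKs generally do, but the original SAX distribution may not). Note also that XMLReaderFactory is marked deprecated on Java 9 and later, hence the warning suppression.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class, not part of SAX or Xerces
public class ParserLookup {

    /** Name of the system property that XMLReaderFactory consults. */
    static final String DRIVER_PROPERTY = "org.xml.sax.driver";

    /**
     * Returns the class name of the parser the factory loads,
     * or null if no parser could be created.
     */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on Java 9+
    static String describeParser() {
        try {
            XMLReader parser = XMLReaderFactory.createXMLReader();
            return parser.getClass().getName();
        } catch (SAXException e) {
            // Typically means the configured driver class is not on the class path
            System.err.println("Could not create a parser: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String driver = System.getProperty(DRIVER_PROPERTY);
        System.out.println(DRIVER_PROPERTY + " = "
                + (driver == null ? "(not set)" : driver));
        System.out.println("Loaded parser: " + describeParser());
    }
}
```

Run it once without the -D switch and once with it; if the second run reports a failure, the driver class is probably missing from your class path.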
Once you have an instance of your parser, you need to configure it. Note that this isn't the same as setting up the parser to deal with errors, content, or structures in XML; instead, configuration is the process of actually telling the parser how to behave. You may turn on validation, turn off namespace checking, and expand entities. These behaviors are totally independent of any specific XML document, so you set them directly on your new parser instance before you parse anything.
Note: For those of you who are overly anxious (I know you're out there), I will indeed be dealing with content, error handling, and the like. However, those subjects will be addressed in future tips, so you'll have to check back. For now, just concentrate on configuration, features, and properties.
You can configure parsers in two ways: features and properties. Features involve turning on or off a specific piece of functionality, like validation. Properties involve setting the value of a specific item that the parser uses, like the location of a schema to validate all documents against. I'll deal with features first, and then look at properties in the next section.
Features are set, not surprisingly, through a method on your parser called setFeature(). The syntax looks like that in Listing 2.
Listing 2. Setting features on a SAX parser
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
String featureName = "some feature URI";
boolean featureOn = true;
try {
    parser.setFeature(featureName, featureOn);
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting feature: " + e.getMessage());
}
This is pretty self-explanatory; the key is knowing the common features available to SAX parsers. Each feature is identified by a specific URI. A complete list of these URIs is available online at the SAX Web site (see Resources). Some of the most common features are validation and namespace processing. Listing 3 shows an example of setting both of these features.
Listing 3. Some common features
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
try {
    // Turn on validation
    parser.setFeature("http://xml.org/sax/features/validation", true);
    // Ensure namespace processing is on (the default)
    parser.setFeature("http://xml.org/sax/features/namespaces", true);
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting feature: " + e.getMessage());
}
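If you want to confirm what the parser actually has in effect, XMLReader also offers getFeature(), which takes the same URIs. The sketch below sets the two features from Listing 3 and reads them back. The class name FeatureReport and its method are hypothetical, and wrapping SAXException in an unchecked exception is just a shortcut to keep the example small.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class for illustration only
public class FeatureReport {

    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Sets both features, then asks the parser for their current values. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on Java 9+
    static Map<String, Boolean> configureAndReport() {
        try {
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setFeature(VALIDATION, true);
            parser.setFeature(NAMESPACES, true);

            // Read the values back in the order they were set
            Map<String, Boolean> state = new LinkedHashMap<>();
            state.put(VALIDATION, parser.getFeature(VALIDATION));
            state.put(NAMESPACES, parser.getFeature(NAMESPACES));
            return state;
        } catch (SAXException e) {
            throw new IllegalStateException("Could not configure parser", e);
        }
    }

    public static void main(String[] args) {
        configureAndReport().forEach(
                (uri, on) -> System.out.println(uri + " -> " + on));
    }
}
```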
Note that while parsers have several standard SAX features, they are free to add their own vendor-specific features. For example, Apache Xerces-J adds features that allow for dynamic validation and the continuance of processing after encountering a fatal error. Consult your parser vendor's documentation for the relevant feature URIs.
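Because vendor-specific features are not guaranteed to exist on every parser, one option is to treat them as optional and carry on when the parser rejects them. Here is a minimal sketch of that idea. The VendorFeatures class and trySetFeature() are names I made up for this example; the two URIs are the ones the Xerces documentation gives for the features just mentioned, so double-check them against the version you are running.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class for illustration only
public class VendorFeatures {

    // Xerces-specific feature URIs (verify against your parser's documentation)
    static final String CONTINUE_AFTER_FATAL =
            "http://apache.org/xml/features/continue-after-fatal-error";
    static final String DYNAMIC_VALIDATION =
            "http://apache.org/xml/features/validation/dynamic";

    /** Creates a reader, converting SAXException to an unchecked exception. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on Java 9+
    static XMLReader newReader() {
        try {
            return XMLReaderFactory.createXMLReader();
        } catch (SAXException e) {
            throw new IllegalStateException("Could not create a parser", e);
        }
    }

    /** Returns true if the parser accepted the feature, false if it rejected it. */
    static boolean trySetFeature(XMLReader parser, String uri, boolean value) {
        try {
            parser.setFeature(uri, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser doesn't know the URI or can't honor the requested value
            return false;
        }
    }

    public static void main(String[] args) {
        XMLReader parser = newReader();
        for (String uri : new String[] { CONTINUE_AFTER_FATAL, DYNAMIC_VALIDATION }) {
            System.out.println(uri + " accepted: " + trySetFeature(parser, uri, true));
        }
    }
}
```

This keeps the rest of your code vendor-neutral: swap in a different parser and the optional features are simply skipped.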
Once you understand features, making sense of properties is easy. They behave in exactly the same manner, except that properties take an object as an argument where features take in a boolean value. You use the setProperty() method for this purpose, as shown in Listing 4.
Listing 4. Setting properties on a SAX parser
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
String propertyName = "some property URI";
// Properties take any Object; the type required depends on the property
Object propertyValue = "some property value";
try {
    parser.setProperty(propertyName, propertyValue);
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting property: " + e.getMessage());
}
The same error-handling framework is in play here, so you can easily duplicate code between the two types of configuration options. As with features, SAX provides a standard set of properties, and vendors can add their own extensions. Common SAX-standard properties allow for setting a Lexical Handler and a Declaration Handler (two handlers I'll discuss in later tips). Parsers like Apache Xerces extend these with, for example, the ability to set the input buffer size and the location of an external schema to use in validation. Listing 5 shows a few properties in action.
Listing 5. Some common properties
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
try {
    // Set the size of the parser's input buffer (Xerces-specific property)
    parser.setProperty("http://apache.org/xml/properties/input-buffer-size",
        Integer.valueOf(2048));
    // Set a LexicalHandler
    parser.setProperty("http://xml.org/sax/properties/lexical-handler",
        new MyLexicalHandler());
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting property: " + e.getMessage());
}
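Listing 5 refers to a MyLexicalHandler class without showing it. To give you a feel for what such a property value can look like, here is a minimal sketch of a handler that does nothing but count comments. The class name CommentCounter and the countComments() helper are hypothetical; the sketch extends org.xml.sax.ext.DefaultHandler2, which supplies empty implementations of the LexicalHandler methods, and it assumes your parser supports the lexical-handler property.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical LexicalHandler that counts the comments in a document
public class CommentCounter extends DefaultHandler2 {

    private int comments;

    /** Called by the parser once for each XML comment it encounters. */
    @Override
    public void comment(char[] ch, int start, int length) {
        comments++;
    }

    int getComments() {
        return comments;
    }

    /** Parses the given XML text and returns the number of comments seen. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on Java 9+
    static int countComments(String xml) {
        try {
            XMLReader parser = XMLReaderFactory.createXMLReader();
            CommentCounter handler = new CommentCounter();
            // Register the handler through the SAX-standard property
            parser.setProperty("http://xml.org/sax/properties/lexical-handler",
                    handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.getComments();
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><!-- one --><child/><!-- two --></root>";
        System.out.println("Comments found: " + countComments(xml));
    }
}
```

The handler is registered through setProperty() rather than a dedicated setter because lexical events are an optional SAX extension; that is exactly the kind of thing properties exist for.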
With an understanding of features and properties, you can make your parser do almost anything. Once you understand setting up your parser in this fashion, you're ready for my next tip, which will discuss building a basic content handler. Until then, I'll see you online in the XML and Java technology forum.
- Review the details of SAX and vendor-independence (developerWorks, March 2001).
- Get the nitty-gritty details in the XML specification at the W3C.
- Check out annotated XML on XML.com.
- Check out the SAX Project home page.
- See the SAX-standardized features and properties list.
- Supplement your skills with Java and XML by Brett McLaughlin (O'Reilly and Associates).
- Learn more about Xerces-J -- and download the latest version -- at the Apache Web site.
- Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.
- Subscribe to the developerWorks XML tips newsletter.
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
- Find out how you can become an IBM Certified Developer in XML and related technologies.

Brett McLaughlin has been working in computers since the Logo days. (Remember the little triangle?) He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor to the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.