With the recent surge of interest in, and resources for, JAXP (the Java API for XML Parsing, which you can learn more about through the Resources section), you may have forgotten that SAX, DOM, and JDOM have been around for quite a while. In fact, many developers (maybe even you) use SAX, DOM, or JDOM without even touching JAXP.
While JDOM has a well-known, standard way of handling various parsers, and DOM has no facility for this at all prior to the DOM Level 3 version, SAX remains a bit of a mystery to many developers. Many programmers write SAX code that is neither portable nor vendor independent. The result is that they lock their applications into a specific parser -- sometimes a specific version of a parser. In this tip, I explain how you can make your life easier by using a SAX helper class to free your code from this dependence on a specific vendor class.
The foundation for all SAX programming is an instance of an org.xml.sax.XMLReader implementation. The most common means of parsing XML using SAX is simply to instantiate the vendor-provided implementation class. Listing 1 shows this methodology in action.
Of course, the problem with Listing 1 is that, in order to change parsers, you have to change the implementation class. And that might mean adding or removing import statements, and ... you get the idea. Suddenly, you are spending your coffee breaks modifying code and recompiling instead of drinking coffee! Lots of problems can result, to say the least. Changing from one SAX parser to another, however, shouldn't be such an ordeal. SAX provides developers a better option. Unfortunately, a lot of programmers miss out on it.
To make it easier to swap parsers, consider using the XMLReaderFactory helper class. This class provides a handy factory that takes a class name and generates an instance of the named class. If you have done much work with Java, you may recognize that this is similar to Class.forName(className).newInstance(). So, instead of using code like that shown in Listing 1, you should code as shown in Listing 2.
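A minimal sketch of the factory-by-name pattern described above (not the author's original Listing 2) might look like the following. The class name NamedParserDemo, the countElements helper, and the sample document are made up for illustration. The parser class string is an assumption too: it names the Xerces copy bundled with recent JDKs so that the sketch runs without extra jars, and you would substitute your own vendor's class.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// XMLReaderFactory is marked deprecated from Java 9 on; it still works,
// so the warning is suppressed to keep the sketch quiet.
@SuppressWarnings("deprecation")
public class NamedParserDemo {
    // Assumption: the Xerces copy bundled with recent JDKs. Replace with your
    // vendor's XMLReader class, for example "org.apache.xerces.parsers.SAXParser".
    public static final String PARSER_CLASS =
        "com.sun.org.apache.xerces.internal.parsers.SAXParser";

    // Hypothetical helper: counts start tags using whichever parser class is named.
    public static int countElements(String parserClass, String xml) throws Exception {
        // The only vendor-specific piece is a string; no vendor import is needed.
        // The vendor-locked alternative would be something like:
        //   XMLReader reader = new org.apache.xerces.parsers.SAXParser();
        XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(PARSER_CLASS, "<a><b/><c/></a>"));
    }
}
```

Because the parser is identified only by a string, switching vendors touches one constant and no import statements.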
In Listing 2, changing parser implementations simply involves changing a single class name. And that's a pretty safe change, since it involves
merely modifying a
String value. This is much better than the
messy imports and code changes from Listing 1, wouldn't you say? The one
drawback is that, when changing the parser, you still have to make
some changes to your code. And that's not perfect. So, instead of hard coding
in this string, you might want to use Java system properties to handle
the class to load. This would allow specification of the parser class to
your application either through a command-line argument (using the Java
-D argument) or through a simple Java properties file. And, as if SAX weren't
easy enough, the
XMLReaderFactory class provides this functionality
"out of the box." If no argument is specified to the
createXMLReader() method, the method will look for a class specified as the value of the system property org.xml.sax.driver. You don't even have to do any work of your own! The code in Listing 2 is then modified to that in Listing 3.
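As a rough sketch of the property-driven variant (again, not the author's original Listing 3), the code below calls createXMLReader() with no argument and lets the org.xml.sax.driver system property pick the parser. The class name PropertyParserDemo and the newReader helper are hypothetical. What happens when the property is unset depends on the SAX distribution; the copy bundled with the JDK falls back to a built-in default.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// XMLReaderFactory is marked deprecated from Java 9 on; it still works.
@SuppressWarnings("deprecation")
public class PropertyParserDemo {
    // Hypothetical helper: no class name appears in the code at all.
    // The factory consults the org.xml.sax.driver system property.
    public static XMLReader newReader() throws SAXException {
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader("<doc/>")));
        // Show which implementation was actually selected at run time.
        System.out.println("Parsed with " + reader.getClass().getName());
    }
}
```

Assuming a Xerces jar is on the classpath, the parser could then be chosen at launch with something like java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser PropertyParserDemo, with no recompilation.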
Now, if you follow the method in Listing 3, your SAX code can hum along without even having to be recompiled, using different parser implementations as needed. Of course, if you're using JAXP, this approach isn't needed, and you might want to check out my articles on the subject instead (see Resources). But if you're a hardcore SAX person, this tip will help you write code that's a little more portable, and write it a little more efficiently. So code away, but be smart about it!
- Read up on JAXP in Brett McLaughlin's articles: JAXP 1.0 and the more recent and more functional JAXP 1.1
- Visit the SAX home page
- Read the JAXP specification (PDF)
Catch up with other recent tips in the developerWorks XML zone.
- Using XSLT as a shortcut to Web page tables of contents
- Documenting style sheets with RDF
- Using lookup tables in XSLT
- Moving DOM nodes (without triggering the
- Using JDOM with XSLT