Sun's Java API for XML Parsing, Version 1.1

JAXP revisited

Brett McLaughlin (brett@newInstance.com), Enhydra Strategist, Lutris Technologies
Brett McLaughlin works as Enhydra strategist at Lutris Technologies and specializes in distributed systems architecture. He is author of Java and XML (O'Reilly). He is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. Along with Jason Hunter, he recently founded the JDOM project, which provides a simple API for manipulating XML from Java applications. He is also an active developer on the Apache Cocoon project, EJBoss EJB server, and a co-founder of the Apache Turbine project. Brett is currently on the expert group working on the JAXP 1.1 specification and release. You can contact him at brett@newInstance.com.

Summary:  In this follow-up article on JAXP, Sun's Java API for XML Parsing, the author analyzes the newest version, 1.1, which includes updated support for the SAX and DOM standards. With the addition of TRaX, JAXP 1.1 provides Java and XML developers an indispensable tool in writing vendor-neutral code for parsing and transforming XML documents.

Date:  01 Dec 2000
Level:  Introductory


If you are a frequent reader of developerWorks' XML zone, you may be a little puzzled by the presence of another JAXP article. Just about a month ago, I wrote a piece entitled "All About JAXP." In that article, I gave a complete explanation of JAXP, the Java API for XML Parsing, how it works, and how it could help you out in dealing with XML data in Java programs. That article covered the 1.0 release of JAXP.

Familiar territory

So why the heck am I writing about JAXP again? I'm a member of the expert group for JAXP 1.1, and we're nearing completion on the 1.1 specification. While most "point releases" (in which a version moves from 1.0 to 1.1, or 2.2 to 2.3) result in minor, or at least simple, changes to existing APIs, the 1.1 release of JAXP is significantly different from its predecessor. In fact, I'll spend only about one third of this article covering new methods on existing classes and functionality; the rest of the article will focus on completely new classes and features of the 1.1 version of JAXP. In other words, there's just so much new (and good) in JAXP 1.1 that I couldn't wait to give you a taste of what's coming.

If you are new to JAXP, if you're using it now, or if you've been holding off on using it until it matures a bit more, this article is right for you. I'll cover the modifications to the 1.0 version of the API and spend a good bit of time talking about TRaX (Transformations for XML). TRaX is the API that has been incorporated into JAXP to allow a vendor-neutral means of making XSL transformations; this complements the existing ability of JAXP to allow for vendor-independence in XML parsing. I suggest you read my first JAXP article, take a quick coffee break, and dive into this discussion of JAXP 1.1.


Enhancing the parsing API

Many of the changes to the JAXP API have centered around parsing, which makes sense, given that the "P" in JAXP stands for "parsing." But the most significant changes in JAXP 1.1 center around XML transformations, which I will cover later in this article. In terms of the existing JAXP functionality, the changes are fairly minor. The biggest addition is support for SAX 2.0, which went final in May of 2000, and DOM Level 2, which is still being finalized. The previous version of JAXP only supported SAX 1.0 and DOM Level 1. This lack of updated standards has been one of the biggest criticisms of JAXP 1.0.

In addition to updating JAXP to the newest versions of SAX and DOM, several small changes have been made in the API (as discussed in my last article). Almost all of these changes are important ones that are the result of feedback from the various companies and individuals on the JAXP expert group. All of these changes also deal with configuring the parsers returned from JAXP's two factories, SAXParserFactory and DocumentBuilderFactory. I'll cover these, as well as the update in standards support for SAX and DOM, now.


Updating the standards

The most anticipated change from JAXP 1.0 to 1.1 is the updated support for the popular SAX and DOM standards. SAX, the Simple API for XML, had a version 2.0 release in May of 2000 that provided greatly enhanced support for XML namespaces, among other items. This namespace support enables the use of numerous other XML vocabularies, such as XML Schema, XLink, and XPointer. While it was possible to use these vocabularies in SAX 1.0, the burden was on the developer to split an element's qualified name into its namespace prefix and local name, and to keep track of namespace declarations throughout the document. SAX 2.0 provides this information to the developer, dramatically simplifying these programming tasks. The same goes for DOM Level 2: namespace support, as well as a wealth of other methods on the DOM classes, is available. While DOM Level 2 has not been finalized, JAXP 1.1 supports the specification as it now stands. As minor changes get introduced in the final stages of the DOM standard, JAXP will, of course, include these modifications.

The good news is that these changes are generally transparent to the developer using JAXP. In other words, these standards updates happen somewhat "automatically," without user intervention. Simply specifying a SAX 2.0-compliant parser to the SAXParserFactory and a DOM Level 2-compliant parser to the DocumentBuilderFactory class takes care of the update.


The road to SAX 2.0

There are a few significant changes related to these standards updates. In SAX 1.0, the parser interface that was implemented by vendors and XML parser projects was org.xml.sax.Parser. The JAXP class SAXParser, then, provided a method to get this underlying implementation class through the getParser() method. The signature for that method looks like:

public abstract class SAXParser {
    public abstract org.xml.sax.Parser getParser();
    // Other methods
}

However, in the change from SAX 1.0 to 2.0, the Parser interface was deprecated and replaced with a new interface, org.xml.sax.XMLReader. This made the getParser() method essentially useless for obtaining an instance of the SAX 2.0 XMLReader interface. To support SAX 2.0, a new method has been added to the JAXP SAXParser class. Not surprisingly, this method is named getXMLReader() and looks like:

public abstract class SAXParser {
    public abstract org.xml.sax.XMLReader getXMLReader();
    public abstract org.xml.sax.Parser getParser();
    // Other methods
}
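
As a minimal sketch of how getXMLReader() might be used, the following obtains the SAX 2.0 XMLReader behind a JAXP SAXParser and asks it about the standard SAX 2.0 namespaces feature. The class name XmlReaderSketch and its helper method are hypothetical, and the signatures assume the API as implemented in the parser bundled with current Java runtimes.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class XmlReaderSketch {
    // Hypothetical helper: reports whether the XMLReader behind a JAXP
    // SAXParser has the standard SAX 2.0 namespaces feature switched on.
    static boolean readerHandlesNamespaces(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();
        // getXMLReader() exposes the SAX 2.0 interface; getParser() would
        // return the deprecated SAX 1.0 org.xml.sax.Parser instead.
        XMLReader reader = parser.getXMLReader();
        return reader.getFeature("http://xml.org/sax/features/namespaces");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readerHandlesNamespaces(true));
    }
}
```

Once you hold the XMLReader, you can work with it exactly as you would with any SAX 2.0 parser obtained without JAXP.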

In this same way, the class that was used in SAX 1.0 to implement callbacks was org.xml.sax.HandlerBase, and an instance of that class was supplied to all of the JAXP 1.0 parse() methods. But due to some additional SAX 2.0 deprecations and changes, this class is no longer used in SAX 2.0. Instead, it has been replaced by a new class, org.xml.sax.helpers.DefaultHandler. To accommodate this change, all of the parse() methods on the SAXParser class have been complemented with versions of the same method that take an instance of the DefaultHandler class to support SAX 2.0. To help you see this difference, the methods I'm talking about are shown here:

public abstract class SAXParser {
    // Signatures only; method bodies and throws clauses are omitted
    // The SAX 1.0 parse methods
    public void parse(File file, HandlerBase handlerBase);
    public void parse(InputSource inputSource, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase, 
                      String systemID);
    public void parse(String uri, HandlerBase handlerBase);
    // The SAX 2.0 parse methods
    public void parse(File file, DefaultHandler defaultHandler);
    public void parse(InputSource inputSource, DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, DefaultHandler defaultHandler, 
                      String systemID);
    public void parse(String uri, DefaultHandler defaultHandler);
    // Other methods
}

Having all these methods for parsing may seem a bit confusing, but it's only tricky if you're working with both versions of SAX. If you are using SAX 1.0, you'll be working with the Parser interface and HandlerBase class, and it will be obvious which methods to use. Similarly, when using SAX 2.0, it will be obvious that the methods that accept DefaultHandler instances and return XMLReaders will be used. So take all this as a reference and don't worry too much about it! There are some other changes to the SAX portion of the API, as well.
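
To make the SAX 2.0 path concrete, here is a small sketch that passes a DefaultHandler to one of the new parse() methods and records the namespace URI and local name the parser reports for each element. DefaultHandlerSketch and elementNames() are hypothetical names chosen for illustration, and the code assumes a namespace-aware parser is available through the factory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DefaultHandlerSketch {
    // Hypothetical helper: lists each element as {namespace-uri}local-name.
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        final List<String> names = new ArrayList<>();
        // Override only the callback of interest; DefaultHandler supplies
        // empty implementations of all the others.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add("{" + uri + "}" + localName);
            }
        };
        // The SAX 2.0 flavor of parse(): InputSource plus DefaultHandler
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            elementNames("<b:book xmlns:b='urn:books'><b:title/></b:book>"));
    }
}
```

Note that the handler never has to split a prefix from a name itself; the parser hands over the namespace URI and local name as separate arguments.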


Changes in existing SAX classes

To complete the discussion of the changes to existing JAXP functionality, I need to go over a few new methods that are available to JAXP SAX users. First, the SAXParserFactory class has a new method, setFeature(). As you may recall from JAXP 1.0, the SAXParserFactory class allows configuration of SAXParser instances returned from the factory. In addition to the methods already available (setValidating() and setNamespaceAware()), this new method allows SAX 2.0 features to be requested for new parser instances. SAX 2.0 provides features that allow vendors to create specific functionality for their parsers; users can then interact with these features through SAX. For example, a user may request the http://apache.org/xml/features/validation/schema feature, which allows XML Schema validation to be turned on or off. This can now be performed directly on a SAXParserFactory, as shown here:

    SAXParserFactory myFactory = SAXParserFactory.newInstance();
    // Turn on XML Schema validation
    myFactory.setFeature("http://apache.org/xml/features/validation/schema", true);
    // Now get an instance of the parser with schema validation enabled
    SAXParser parser = myFactory.newSAXParser();

Of course, a getFeature() method is provided to complement the setFeature() method and allow querying of particular features. This method returns a simple boolean value.
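
The set-then-query round trip might look like the sketch below. It uses the standard SAX 2.0 feature http://xml.org/sax/features/external-general-entities rather than a vendor-specific one, on the assumption that the parser behind the factory recognizes it; FactoryFeatureSketch and its method are hypothetical names.

```java
import javax.xml.parsers.SAXParserFactory;

class FactoryFeatureSketch {
    static final String EXTERNAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";

    // Hypothetical helper: switches the feature off on the factory and
    // returns the value the factory then reports for it.
    static boolean disableExternalEntities() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Throws SAXNotRecognizedException if the parser does not know the feature
        factory.setFeature(EXTERNAL_ENTITIES, false);
        return factory.getFeature(EXTERNAL_ENTITIES);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(disableExternalEntities());
    }
}
```

Every SAXParser created by that factory afterward inherits the setting, which is the point of configuring the factory rather than each parser.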

In addition to SAX allowing features to be set (with true or false values), properties also can be set. In SAX, properties are names associated with actual Java objects. For example, using an instance of a SAX parser, you could set the property http://xml.org/sax/properties/lexical-handler, assigning that property an implementation of a SAX LexicalHandler interface. That implementation would then be used by the parser for lexical processing. Because properties like this lexical one are parser-specific instead of factory-specific (as features were), a setProperty() method is provided on the JAXP SAXParser class, rather than on the SAXParserFactory class. And as with features, a getProperty() complement is provided to return the value associated with a specific property, also on the SAXParser class.
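
As an illustration of the lexical-handler property, the sketch below registers a handler on a SAXParser and collects the comments found in a document. It leans on org.xml.sax.ext.DefaultHandler2, a convenience class from a later SAX 2 extensions release that implements LexicalHandler; that class, along with the names LexicalHandlerSketch and comments(), is an assumption of this example rather than something JAXP 1.1 itself defines.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.DefaultHandler;

class LexicalHandlerSketch {
    // Hypothetical helper: returns the text of every comment in the document.
    static List<String> comments(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final List<String> found = new ArrayList<>();
        // Properties are set on the parser, not on the factory
        parser.setProperty("http://xml.org/sax/properties/lexical-handler",
            new DefaultHandler2() {
                @Override
                public void comment(char[] ch, int start, int length) {
                    found.add(new String(ch, start, length));
                }
            });
        // Content callbacks are not needed here, so a plain DefaultHandler will do
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(comments("<a><!--hi--></a>"));
    }
}
```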


Updates in DOM

A number of new methods are available for the DOM portion of JAXP. These methods have been added to existing JAXP classes to support both DOM Level 2 options and common configuration situations that have arisen in the last year. I won't cover all of these options and the corresponding methods here since many are fairly obscure (they are used only in very unusual situations) and won't be needed in many of your applications. You are certainly encouraged to check these out in the latest JAXP specification online (see the Resources section). With the coverage of standards updates, SAX changes, and additional DOM methods, you're ready to read about the most substantial changes in JAXP 1.1 -- the TRaX API.
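
For a flavor of the DOM side, here is a brief sketch that configures a DocumentBuilderFactory and then reads namespace information through DOM Level 2 methods. The setIgnoringComments() call is taken from the API as it was eventually released and should be treated as an assumption relative to the draft; DomLevel2Sketch and describeRoot() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomLevel2Sketch {
    // Hypothetical helper: summarizes the root element as
    // "namespace-uri local-name child-count".
    static String describeRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);     // needed for the DOM Level 2 namespace methods
        factory.setIgnoringComments(true);   // drop comment nodes from the tree
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        // getNamespaceURI() and getLocalName() were introduced in DOM Level 2
        return root.getNamespaceURI() + " " + root.getLocalName()
            + " " + root.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeRoot(
            "<b:book xmlns:b='urn:books'><!-- note --><b:title/></b:book>"));
    }
}
```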


The TRaX API

So far, I've covered the changes to XML parsing in JAXP. Now I can turn to XML transformations in JAXP 1.1. Perhaps the most exciting development in the newest version of Sun's API is that JAXP 1.1 will allow vendor-neutral XML document transformations. If you're unfamiliar with XML transformations and XSLT (XSL Transformations), check out the developerWorks tutorials (see Resources). While this vendor neutrality may expand on the current vision of JAXP as simply a parsing API, it is a much needed facility since XSL processors currently employ different methods and means for enabling user and developer interaction. In fact, XSL processors have even greater variance across providers than their XML parser counterparts.

Originally, the JAXP expert group sought to provide a simple Transform class with a few methods to allow specification of a style sheet and subsequent document transformations. This first effort turned out to be rather shaky, but I'm happy to report that we (the JAXP expert group) are going much further in our continued efforts. Scott Boag and Michael Kay, two of the XSL processor gurus today (working on Apache Xalan and SAXON, respectively), have worked with others to develop TRaX. This supports a much wider array of options and features, and provides complete support for almost all XML transformations -- all under the JAXP umbrella.

Like the parsing portion of JAXP, performing XML transformations requires three basic steps:

  • Obtain a Transformer factory
  • Retrieve a Transformer
  • Perform operations (transformations)

Working with the factory

For the transformation portion of JAXP, the factory you will work with is called javax.xml.transform.TransformerFactory. This class is analogous to the SAXParserFactory and DocumentBuilderFactory classes that I already covered in both my first JAXP article and earlier in this article. Of course, simply obtaining a factory instance to work with is a piece of cake:

    TransformerFactory factory = TransformerFactory.newInstance();

Once the factory is available, various options can be set upon the factory. Those options will affect all instances of Transformer (which I'll cover in a minute) created by that factory. (By the way, you can also obtain instances of javax.xml.transform.Templates through the TransformerFactory. Templates are an advanced JAXP concept, and one I don't have space to cover here.)

The first of the options you can work with are attributes. These are not XML attributes, but are similar to the properties I discussed in reference to XML parsers. Attributes allow options to be passed to the underlying XSL processor, which may be Apache Xalan, SAXON, or Oracle's XSL processor. They are largely vendor-dependent. Like the parsing side of JAXP, a setAttribute() method is provided as well as a counterpart, getAttribute(). Like setProperty(), the former takes an attribute name and Object value. And like getProperty(), the latter takes an attribute name and returns the associated Object value.

Setting an ErrorListener is the second option available. Defined in the javax.xml.transform.ErrorListener interface, an ErrorListener allows problems in transformation to be caught and handled programmatically. If you're familiar with SAX, this interface looks remarkably similar to the org.xml.sax.ErrorHandler interface:

package javax.xml.transform;
public interface ErrorListener {
    public void warning(TransformerException exception)
        throws TransformerException;
    public void error(TransformerException exception)
        throws TransformerException;
    public void fatalError(TransformerException exception)
        throws TransformerException;
}

Creating an implementation of this interface, filling the three callback methods, and using the setErrorListener() method on the TransformerFactory instance you are working with sets you up to deal with any errors.
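
One possible implementation is sketched below: it records every problem it is told about and rethrows only fatal errors, so that the transformation stops in that case. CollectingErrorListener is a hypothetical class, and the policy it applies (continue after warnings and recoverable errors) is just one reasonable choice.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;

class CollectingErrorListener implements ErrorListener {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(TransformerException exception) {
        messages.add("warning: " + exception.getMessage());
    }

    @Override
    public void error(TransformerException exception) {
        // Recoverable: note the problem and let processing continue
        messages.add("error: " + exception.getMessage());
    }

    @Override
    public void fatalError(TransformerException exception)
            throws TransformerException {
        messages.add("fatal: " + exception.getMessage());
        throw exception; // rethrow so the transformation is abandoned
    }

    List<String> messages() {
        return messages;
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(new CollectingErrorListener());
        System.out.println(factory.getErrorListener().getClass().getName());
    }
}
```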

The third option is the URI resolver (a URI is a Uniform Resource Identifier, often a URL). Methods are provided to set and retrieve the resolver for the instances generated by the factory. The interface defined in javax.xml.transform.URIResolver also behaves similarly to a SAX counterpart, org.xml.sax.EntityResolver. The interface has a single method:

package javax.xml.transform;
public interface URIResolver {
    public Source resolve(String href, String base)
        throws TransformerException;
}

This interface, when implemented, allows URIs found in XML constructs like xsl:import and xsl:include to be handled. Returning a Source (which I'll cover in a moment), you can instruct your transformer to search for the specified document in various locations when a particular URI is encountered. For example, when an include of the URI http://www.oreilly.com/oreilly.xsl is encountered, you might instead return the local document oreilly.xsl and prevent the need for network access. Implementations of the URIResolver interface can be set using the TransformerFactory's setURIResolver() method, and retrieved using the getURIResolver() method.
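
The local-copy idea could be sketched along these lines. LocalCopyResolver and addLocalCopy() are hypothetical, and the mapping from remote URI to local system ID is something you would populate yourself.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

class LocalCopyResolver implements URIResolver {
    // Maps a remote URI to the system ID of a local copy
    private final Map<String, String> localCopies = new HashMap<>();

    void addLocalCopy(String remoteUri, String localSystemId) {
        localCopies.put(remoteUri, localSystemId);
    }

    @Override
    public Source resolve(String href, String base) {
        String local = localCopies.get(href);
        // Returning null asks the processor to resolve the URI itself
        return (local == null) ? null : new StreamSource(local);
    }

    public static void main(String[] args) {
        LocalCopyResolver resolver = new LocalCopyResolver();
        resolver.addLocalCopy("http://www.oreilly.com/oreilly.xsl", "oreilly.xsl");
        Source source = resolver.resolve("http://www.oreilly.com/oreilly.xsl", null);
        System.out.println(source.getSystemId());
    }
}
```

An instance of this class would then be handed to the factory through setURIResolver() before any Transformer is created.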

Once you have set the options of your choice, you can obtain an instance, or instances, of a Transformer through the newTransformer() method of the factory:

    // Get the factory
    TransformerFactory factory = TransformerFactory.newInstance();
    // Configure the factory
    factory.setErrorListener(myErrorListener);
    factory.setURIResolver(myURIResolver);
    // Get a Transformer to work with, with the options specified
    Transformer transformer = factory.newTransformer(new StreamSource("sheet.xsl"));

As you see, this method takes the style sheet as input to use in all transformations for that Transformer instance. In other words, if you wanted to transform a document using style sheet A and style sheet B, you would need two Transformer instances, one for each style sheet. If you wanted to transform multiple documents with the same style sheet (call it style sheet C), however, you would only need a single Transformer instance, associated with style sheet C.
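
The one-style-sheet, many-documents case can be sketched as follows. The inline style sheet, the class name ReuseTransformerSketch, and the transformAll() helper are all hypothetical; the style sheet simply writes out the "to" attribute of a greeting element as text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReuseTransformerSketch {
    // Hypothetical style sheet C: emits the greeting's "to" attribute as text
    static final String SHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='/greeting/@to'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> documents)
            throws TransformerException {
        // One Transformer, bound to one style sheet...
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(SHEET)));
        List<String> results = new ArrayList<>();
        // ...applied to each document in turn
        for (String xml : documents) {
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transformAll(
            Arrays.asList("<greeting to='A'/>", "<greeting to='B'/>")));
    }
}
```

Reuse here is sequential; sharing a single Transformer across threads at the same time is a different matter and is not something this sketch attempts.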


Transforming XML

Once you have an instance of a Transformer, you can go about actually performing XML transformations. This consists of two basic steps:

  • Set the XSL style sheet to use
  • Perform the transformation, specifying the XML document and result target

As I discussed above, the first step is really the easiest. A style sheet must be supplied when obtaining a Transformer instance from the factory. The location of this style sheet must be specified by providing a javax.xml.transform.Source for its location. The Source interface, which you've seen in a few code samples so far, is the means of locating an input -- be it a style sheet, document, or other information set. TRaX not only provides the Source interface, but also three concrete implementations:

  • javax.xml.transform.stream.StreamSource

  • javax.xml.transform.dom.DOMSource

  • javax.xml.transform.sax.SAXSource

The first of these, StreamSource, reads input from some type of I/O device. Constructors are provided for accepting an InputStream, a Reader, or a String system ID as input. Once created, the StreamSource can be passed in to the Transformer for use. This will probably be the Source implementation you use most often. It's great for reading a document from a network, input stream, user input, or other somewhat static representation.

The next Source, DOMSource, provides for reading from an existing DOM tree. It supplies a constructor for taking in a DOM org.w3c.dom.Node, and will read from that Node when used. This is ideal for supplying an existing DOM tree to a transformation, perhaps if parsing has already occurred and an XML document is already in memory as a DOM structure.

SAXSource provides for reading input from SAX producers. This Source implementation takes either a SAX org.xml.sax.InputSource, or an org.xml.sax.XMLReader as input, and uses the events from these sources as input. This is ideal for situations in which a SAX parser is already in use, and callbacks are set up and need to be triggered prior to transformations.
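
To show that the Source implementations are interchangeable, the sketch below feeds the same kind of document to a transformation once as a DOMSource and once as a SAXSource. SourceKindsSketch, its helper methods, and the inline style sheet are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SourceKindsSketch {
    // Hypothetical style sheet: emits the greeting's "to" attribute as text
    static final String SHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='/greeting/@to'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Input is a DOM tree that has already been parsed into memory
    static String fromDom(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return run(new DOMSource(doc));
    }

    // Input arrives as SAX events read from an InputSource
    static String fromSax(String xml) throws Exception {
        return run(new SAXSource(new InputSource(new StringReader(xml))));
    }

    private static String run(Source source) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(SHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fromDom("<greeting to='DOM'/>"));
        System.out.println(fromSax("<greeting to='SAX'/>"));
    }
}
```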

Once you've obtained an instance of a Transformer (by providing the style sheet to use through an appropriate Source), you're ready to perform a transformation. To accomplish this, the transform() method is used (no surprise there) as follows:

    // Get the factory
    TransformerFactory factory = TransformerFactory.newInstance();
    // Configure the factory
    factory.setErrorListener(myErrorListener);
    factory.setURIResolver(myURIResolver);
    // Get a Transformer to work with, with the options specified
    Transformer transformer = factory.newTransformer(new StreamSource("sheet.xsl"));
    // Perform transformation on document A, and print out result
    transformer.transform(new StreamSource("documentA.xml"),
                          new StreamResult(System.out));

The transform() method takes two arguments: a Source implementation, and a javax.xml.transform.Result implementation. You should already be seeing the symmetry in how this works and have an idea about the functionality of the Result interface. The Source should provide the XML document to be transformed, and the Result should provide an output target for the transformation. Like Source, there are three concrete implementations provided with TRaX and JAXP of the Result interface:

  • javax.xml.transform.stream.StreamResult

  • javax.xml.transform.dom.DOMResult

  • javax.xml.transform.sax.SAXResult

The StreamResult takes as a construction mechanism either an OutputStream (like System.out in the example above), or a Writer. DOMResult takes a DOM Node to output the transformation to (presumably as a DOM org.w3c.dom.Document), and SAXResult takes a SAX ContentHandler to fire callbacks to, resulting from the transformed XML. All are analogous to their Source counterparts, and you can easily figure out their uses from those counterparts.
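
As a sketch of a non-stream Result, the following sends the output of a transformation into a DOMResult and then inspects the resulting tree. The style sheet, the class name DomResultSketch, and summarize() are hypothetical; the example relies on the processor creating a new Document when the DOMResult is constructed without a node.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomResultSketch {
    // Hypothetical style sheet: produces <summary items="N"/> where N is
    // the number of item elements in the input document
    static final String SHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<summary items='{count(//item)}'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String summarize(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(SHEET)));
        // No node supplied, so the processor builds a fresh Document
        DOMResult result = new DOMResult();
        transformer.transform(new StreamSource(new StringReader(xml)), result);
        Document doc = (Document) result.getNode();
        Element root = doc.getDocumentElement();
        return root.getNodeName() + ":" + root.getAttribute("items");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(summarize("<order><item/><item/></order>"));
    }
}
```

The tree in the DOMResult can be processed further with ordinary DOM calls, or wrapped in a DOMSource and passed to another transformation.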

While the example above shows transforming from a stream to a stream, any combination of sources and results is possible. Here are a few examples:

    // Perform transformation on document A, and print out result
    transformer.transform(new StreamSource("documentA.xml"),
                          new StreamResult(System.out));
    // Transform from SAX and output results to a DOM Node
    // (documentBuilder is a DocumentBuilder obtained from a DocumentBuilderFactory)
    transformer.transform(new SAXSource(
                              new InputSource("http://www.oreilly.com/catalog.xml")),
                          new DOMResult(documentBuilder.newDocument()));
    // Transform from DOM and output to a File
    transformer.transform(new DOMSource(myDomTree),
                          new StreamResult(new FileOutputStream("results.xml")));
    // Use a custom source and result (JDOM)
    transformer.transform(new org.jdom.trax.JDOMSource(myJdomDocument),
                          new org.jdom.trax.JDOMResult(new org.jdom.Document()));

As you can see, TRaX provides tremendous flexibility in moving from various input types to various output types, and in using XSL style sheets in a variety of formats -- files, in-memory DOM trees, SAX readers, and so on.


Scratching the surface

A number of other items in TRaX are important, but they are not as commonly used as those shown here, and there isn't room here to list them all. I do recommend you check out the TRaX API when the JAXP specification has included it (something that should happen any day now); it is a rich and robust API for XML transformations. You can play around with output properties, set error handling (not only for XSL transformations, but also for locating input sources) and find a variety of other goodies in the API. Enjoy, and let us (the expert group) know what you think!
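
For a taste of the output properties mentioned above, the sketch below toggles whether the XML declaration is written at the start of the result. It assumes the setOutputProperty() method and OutputKeys constants as found in the released javax.xml.transform API, which may differ from the draft; OutputPropertySketch, copy(), and the copying style sheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputPropertySketch {
    // Hypothetical style sheet: copies the whole input document to the output
    static final String COPY_SHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<xsl:copy-of select='.'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String copy(String xml, boolean omitDeclaration)
            throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(COPY_SHEET)));
        // Output properties are set per Transformer and override the
        // corresponding xsl:output settings in the style sheet
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                                      omitDeclaration ? "yes" : "no");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(copy("<a><b/></a>", true));
        System.out.println(copy("<a><b/></a>", false));
    }
}
```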


Warnings

Before I wrap up, a warning is in order. In case you read this article three months from now, download JAXP 1.1, and get compiler and runtime errors, keep in mind that this article is being written as JAXP 1.1 is being finalized. As with any early-release piece, things can change as this article ages -- even as it goes from my laptop into the developerWorks production process. In other words, the methods and features covered here are current as I write them, but the JAXP specification is still somewhat in flux. Bearing that in mind, consider the concepts here important, yet be prepared for a method or two to undergo a name change or perhaps even go through a slight alteration in behavior. Still, the core ideas outlined here will appear in JAXP 1.1 in some form. So count on what is detailed here to be correct in concept, if not exactly in detail, by the time JAXP 1.1 goes final in both its specification and reference implementation.


Summary

You now have the lowdown on what's coming in the next version of JAXP. The public draft of the specification, in its final form, should be available close to the end of the year 2000. The actual reference implementation should follow shortly, with all loose ends tied up by the first quarter of 2001. You'll want to be careful when looking up resources on JAXP since the current draft of the specification (as of early November 2000) does not include the TRaX API that I discussed in this article. The specification is being revamped as I write this, so an updated specification will be available shortly.

For those of you who have been waiting to use JAXP (a fairly wise move considering the limitations of the 1.0 version), consider this the time to dive in head first. In my articles and my book, Java and XML, I gave a rather tenuous endorsement of JAXP 1.0, due to its shortcomings with regard to SAX 2.0 and DOM Level 2. I'm happily endorsing JAXP 1.1 now as a major step forward. Java and XML developers will find it an indispensable tool in writing vendor-neutral code for parsing and transforming XML documents. So check it out, and get your applications in gear.


Resources
