
All about JAXP, Part 2

Transform XML using Sun's XML processing toolkit

Brett D. McLaughlin, Sr., Author and Editor, O'Reilly Media, Inc.
Brett McLaughlin has worked in computers since the Logo days. (Remember the little triangle?) In recent years, he's become one of the most well-known authors and programmers in the Java technology and XML communities. He's worked for Nextel Communications, implementing complex enterprise systems; at Lutris Technologies, actually writing application servers; and most recently at O'Reilly Media, Inc., where he continues to write and edit books that matter. His most recent book, Java 5.0 Tiger: A Developer's Notebook, is the first book available on the newest version of Java technology, and his classic Java and XML remains one of the definitive works on using XML technologies in the Java language.

Summary:  Part 1 of this two-part series introduced the Java™ API for XML Processing (JAXP) and its parsing and validation features. JAXP also offers Java programmers the ability to transform XML documents using Extensible Stylesheet Language (XSL). Through both direct programmatic access and XSL templating, JAXP makes conversion from one XML format to another an easy task. This article shows you how to use JAXP to transform XML documents and how to cache XSL stylesheets for the best performance possible.

Date:  31 May 2005
Level:  Intermediate
Also available in:   Russian  Japanese


In JAXP's earlier incarnations, the acronym stood for the Java API for XML Parsing. As you learned in Part 1, JAXP is a layer over SAX and DOM that enables Java programmers to perform vendor-neutral XML parsing. Originally, that's all that JAXP did. But as the saying goes, that was then and this is now.

At one time, the Java-and-XML combination itself was largely about parsing. Java applications just needed to read in an XML document and then programmatically do something with the document's data. As XML-consuming applications became commonplace, though, it became apparent that the do-somethings various applications cared about overlapped a great deal. As with all good software, overlap leads to specification (and every once in a while, to new and useful APIs).

One of the first specs to come out of the widespread use of XML was XSL (see Resources). Applications constantly take XML data, add some formatting, and toss it into user interfaces -- usually as HTML, XHTML, or WML. XSL took this task and built a specification upon it, enabling applications to ditch all of their proprietary transformation code. With an XSL specification in place, the Transformation API for XML (TrAX) came along (see Resources). TrAX provided a simple, consistent approach to XSL in Java applications. And now JAXP -- the final link in this rather long chain (and introduction) -- has incorporated TrAX into the core Java development environment. In light of all this evolution, as well as recent additions such as extended validation and XPath support, JAXP now stands for the Java API for XML Processing. This article focuses on using JAXP for processing rather than parsing.

Point A to point B

Understanding XSL's basic program flow is crucial to grasping how JAXP handles transformations. If you're fairly new to XSL, a quick review of XSL basics is worthwhile. Bear with me even if you're an XSL expert.

Source (XML)

When you work with XSL, you've got to start with XML. I realize that sounds obvious, but it's worth stating. You're probably accustomed to starting with an XML file -- something like phonebook.xml -- and passing it into an XSL processor. JAXP lets you do a lot more than just pass in files, as you'll learn in the next section, Input and output.

Stylesheet (XSL)

What probably interests most designers is the XSL stylesheet. A stylesheet is a set of instructions that specify certain types of data as input and specify some other set of data and formatting as output. Keep in mind, though, that the stylesheet should operate upon the structure of the incoming XML, rather than the specific data in the document. This ensures that your stylesheet works with any XML in a given format, rather than one specific instance document.

Target (*ML)

Finally, keep in mind that XSL produces text, and almost always that text is a markup language. You can't spit out Microsoft Word documents or PDFs. You're stuck with a markup language such as XML, XHTML, WML, or some other wonderful *ML (markup language) variant, or with plain text if the stylesheet selects XSLT's text output method.

Of course, I can hear the objections from here -- you've seen this application that spits out PDF from XML, or that application that converts an XML document into Excel. It's true that engines exist that can take a specific flavor of XML and convert it into binary formats. But that's not XSL's domain; it's a post-transformation process. JAXP will help you transform XML, but it won't get you to a binary format.


Input and output

You've probably gathered from that pithy review that a lot of XML transformation is simply about input and output. You take XML in, do something with it, and push *ML out. Before I deal with all that middle bit -- which I realize is where most of the fun lies -- I'll show you how to get data into JAXP and how to get it back out.

JAXP's flexibility

JAXP is not a general transformation engine. It can't convert your Java property files, or your one-off text formats, into XML or some other form of markup language. In fact, JAXP won't even accept HTML unless it's well-formed (and therefore some XHTML derivative). So before you try to make JAXP more of a general API than it really is, realize that you've got to start with well-formed XML, or everything else breaks down.

The flexibility of JAXP lies in how you can represent XML. JAXP can accept XML as a file or as a stream wrapped around a file, for obvious reasons. But it can also accept as input a DOM Document, which represents an XML document that might not exist on disk anywhere, or a series of SAX events, which also represent an XML document. With this added flexibility, you can insert JAXP into any of your XML event chains. For example, if you have code that reads in an XML document using SAX, and then passes certain pieces of that data on to another application, you can simply insert JAXP between the SAX code and the other application component. It can consume SAX events, transform the data as needed, and pass on the result to the receiving application component.
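Here's a sketch of that idea, assuming your processor supports SAX input (the code checks SAXTransformerFactory.FEATURE before relying on it). An ordinary SAX parse feeds its events to a TransformerHandler, which is a ContentHandler that applies a stylesheet to whatever it receives. The class name, the stylesheet, and the sample phonebook markup are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxPipeline {

    // Hypothetical stylesheet: lists every entry's name, comma-separated
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='phonebook/entry'>"
        + "<xsl:if test='position() &gt; 1'>, </xsl:if>"
        + "<xsl:value-of select='@name'/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Parses xml with SAX and routes the events through the stylesheet. */
    public static String run(String xml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Not every processor accepts SAX events; check before casting
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("Processor lacks SAX support");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // A TransformerHandler transforms the events it receives
        // and writes the output to the Result it was given
        TransformerHandler handler =
            stf.newTransformerHandler(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Ordinary SAX parsing: the handler sits where your own
        // ContentHandler would normally go in the chain
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT expects namespace-aware events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(
            "<phonebook><entry name='Alice'/><entry name='Bob'/></phonebook>"));
    }
}
```

In a real pipeline, the XMLReader would usually be whatever SAX code you already have; the only change is pointing its ContentHandler at the TransformerHandler.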

The moral of the story is that you can give XML to JAXP in just about any standard format and get it out in just as many formats. Even if you don't need this sort of flexibility now -- or can't imagine how you might ever use all these various formats -- JAXP is ready for you when your needs become more advanced.

Source for input

Interface programming 101

If working with interfaces is fairly new to you, notice that Listing 1 always uses the interface on the left side of the equals sign (=) and the specific implementation class on the right. So Source is on the left, and StreamSource, SAXSource, and the like are on the right.

The javax.xml.transform.Source interface is the basis for all input into JAXP and the transformation API. This interface defines only two methods -- getSystemId() and setSystemId(String systemId). In reality, you won't deal directly with this interface as much as with the concrete implementations that JAXP provides:

  • javax.xml.transform.dom.DOMSource passes a DOM Node (and its children) into JAXP.
  • javax.xml.transform.sax.SAXSource passes the results of SAX callbacks (from an XMLReader) into JAXP.
  • javax.xml.transform.stream.StreamSource passes XML wrapped in a File, InputStream, or Reader into JAXP.

Listing 1 shows you several ways to create a Source for use in a transformation:


Listing 1. Using implementations of the Source interface
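// getDocument(), getInputStream(), getXMLReader(), and getInputSource()
// are placeholders for your own code

// Create a Source from a file on disk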
Source fileSource = 
  new StreamSource(new File("phonebook.xml"));
 
// Create a Source from a DOM tree
Document myDomDocument = getDocument();
Source domSource = new DOMSource(myDomDocument);

// Create a Source from an InputStream
BufferedInputStream bis = 
  new BufferedInputStream(getInputStream());
Source streamSource = new StreamSource(bis);

// Create a Source from an XMLReader and a SAX InputSource
XMLReader myXMLReader = getXMLReader();
InputSource myInputSource = getInputSource();
Source saxSource = new SAXSource(myXMLReader, myInputSource);

Listing 1 is pretty self-explanatory. Once you've got a Source, you're ready to feed your XML into the XSL-processing portion of JAXP.
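Listing 1 leans on helper methods such as getDocument() and getXMLReader() that it never defines. If you want something you can compile and run as-is, here is a minimal sketch that builds one Source of each kind from the same in-memory string. The class name, the sample phonebook markup, and the example.com system ID are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class InMemorySources {

    // Hypothetical sample document standing in for phonebook.xml
    static final String XML =
        "<phonebook><entry name='Alice' phone='555-1234'/></phonebook>";

    /** Builds one Source of each kind, all wrapping the same XML text. */
    public static Source[] buildSources() throws Exception {
        // StreamSource: wrap a Reader instead of a File
        Source streamSource = new StreamSource(new StringReader(XML));
        // Hypothetical base URI, used to resolve any relative references
        streamSource.setSystemId("http://example.com/phonebook.xml");

        // DOMSource: parse into a DOM tree first, then wrap the Document
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        Source domSource = new DOMSource(doc);

        // SAXSource: pair an XMLReader with the InputSource it will parse
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        Source saxSource =
            new SAXSource(reader, new InputSource(new StringReader(XML)));

        return new Source[] { streamSource, domSource, saxSource };
    }

    public static void main(String[] args) throws Exception {
        for (Source source : buildSources()) {
            System.out.println(source.getClass().getSimpleName()
                + ", systemId=" + source.getSystemId());
        }
    }
}
```

The system ID matters mostly when the XML refers to other resources by relative URI; without one, the processor has no base to resolve them against.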

Result for output

Before moving on to transformations themselves, I'll give you a look at the output counterpart to Source -- javax.xml.transform.Result. It even has the same two basic methods as Source -- getSystemId() and setSystemId(String systemId).

As with its input counterpart, you'll generally use JAXP's concrete Result implementations:

  • javax.xml.transform.dom.DOMResult passes transformed content into a DOM Node.
  • javax.xml.transform.sax.SAXResult passes the results of a transformation to a SAX ContentHandler.
  • javax.xml.transform.stream.StreamResult passes the transformed *ML into a File, OutputStream, or Writer.

Listing 2 shows some simple examples, much like those you saw for Source in Listing 1:


Listing 2. Using implementations of the Result interface
// getDocument(), getOutputStream(), and MyContentHandler
// are placeholders for your own code

// Write to a file on disk
Result fileResult = 
  new StreamResult(new File("output.xml"));
 
// Write a Result to a DOM tree (inserted into the supplied Document)
Document myDomDocument = getDocument();
Result domResult = new DOMResult(myDomDocument);

// Create a Result from an OutputStream
BufferedOutputStream bos = 
  new BufferedOutputStream(getOutputStream());
Result streamResult = new StreamResult(bos);

// Create a Result to write to a SAX ContentHandler
ContentHandler myContentHandler = new MyContentHandler();
Result saxResult = new SAXResult(myContentHandler);

Once you understand the Source and Result interfaces -- and the implementations that come bundled with JAXP -- you're already halfway to mastering XML transformations.
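Files aren't the only place output can go. The sketch below sends the same in-memory XML to a DOMResult created with no node of its own, and to a SAXResult wrapped around a small element-counting ContentHandler. To keep the focus on Result, it uses newTransformer() with no stylesheet, which copies input to output unchanged. ResultTargets and ElementCounter are hypothetical names; ElementCounter plays the role of MyContentHandler from Listing 2.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ResultTargets {

    // Hypothetical sample input
    static final String XML =
        "<phonebook><entry name='Alice'/><entry name='Bob'/></phonebook>";

    // Counts start-element callbacks; a stand-in for MyContentHandler
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            count++;
        }
    }

    /** Copies XML into a DOMResult that was given no node. */
    public static Document toNewDocument(String xml) throws TransformerException {
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(); // no node supplied
        copier.transform(new StreamSource(new StringReader(xml)), result);
        // With no node supplied, the transformer created a Document
        return (Document) result.getNode();
    }

    /** Streams XML into a SAXResult and reports how many elements arrived. */
    public static int countElements(String xml) throws TransformerException {
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        ElementCounter counter = new ElementCounter();
        copier.transform(new StreamSource(new StringReader(xml)),
                         new SAXResult(counter));
        return counter.count;
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toNewDocument(XML).getDocumentElement().getTagName());
        System.out.println(countElements(XML) + " elements");
    }
}
```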


Performing transformations with JAXP

If it's been a while since you've read Part 1, or if you're still a little rusty when it comes to JAXP and parsing, you should take some time to review the SAXParserFactory and DocumentBuilderFactory classes. You'll find that if you know how to use these classes, you're already well on your way to figuring out how JAXP transformations work.

Getting a factory

Transformations are so simple as to be almost trivial. First, you need to set up your input and output. Wrap a Source around both your input XML document and your XSL stylesheet. Then, create a sink to write the transformed results to -- and wrap that in a Result.

Next, you need to create a TransformerFactory, using the static newInstance() method. Listing 3 shows you all the details:


Listing 3. Creating a new TransformerFactory instance
try {
  // Set up input documents
  Source inputXML = new StreamSource(
    new File("phonebook.xml"));

  Source inputXSL = new StreamSource(
    new File("phonebook.xsl"));

  // Set up output sink
  Result outputXHTML = new StreamResult(
    new File("output.html"));

  // Set up a factory for transforms
  TransformerFactory factory = TransformerFactory.newInstance();

} catch (TransformerFactoryConfigurationError e) {
  // newInstance() throws this unchecked Error if the
  // configured processor can't be found or instantiated
  System.out.println("Error occurred obtaining " +
    "XSL processor.");
}

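There's not much to this step. As with the SAX and DOM factory classes, what can go wrong here is instantiation: if the configured processor class can't be found or created, newInstance() throws a TransformerFactoryConfigurationError. Because that's an unchecked Error rather than a checked exception, the compiler won't force you to catch it. The checked TransformerConfigurationException comes into play in the next step, when the factory builds a Transformer from your stylesheet; its superclass, TransformerException, covers errors during the transformation itself.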
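To see where a stylesheet problem surfaces, the following sketch asks a factory to compile two stylesheets: an empty but valid one, and one whose root element is never closed. Only the second should be rejected with a TransformerConfigurationException. The class name and both stylesheet strings are invented for this example; getMessageAndLocation() is used because it appends line and column details when the processor supplies them.

```java
import java.io.StringReader;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.stream.StreamSource;

public class FactoryErrors {

    // An empty stylesheet is legal; XSLT's built-in templates apply
    static final String GOOD_XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>";

    // Deliberately broken: the root element is never closed
    static final String BAD_XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    /** Returns true if the stylesheet compiles into a Transformer. */
    public static boolean compiles(String xsl) {
        TransformerFactory factory;
        try {
            factory = TransformerFactory.newInstance();
        } catch (TransformerFactoryConfigurationError e) {
            // Unchecked Error: the configured processor couldn't be loaded
            throw new IllegalStateException("No usable XSL processor", e);
        }
        try {
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
            return true;
        } catch (TransformerConfigurationException e) {
            // Checked exception: this stylesheet can't become a Transformer
            System.err.println("Rejected: " + e.getMessageAndLocation());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("good stylesheet compiles: " + compiles(GOOD_XSL));
        System.out.println("bad stylesheet compiles:  " + compiles(BAD_XSL));
    }
}
```

Depending on the processor, you may also see its default ErrorListener print the parse error to standard error before the exception reaches your catch block.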

The identity transformation

One version of TransformerFactory.newTransformer() doesn't accept any arguments (and therefore no XSL stylesheet). This lets you perform an identity transformation, which simply converts your input XML from one form (such as a stream) to another (such as a DOM tree). You supply the XML as a Source in one format and push it back out as a Result in another format. It's a nice trick that's worth remembering.
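A common use of the identity transformation is serializing a DOM tree that was built in memory and never existed as a file. Here's a minimal sketch, assuming an invented one-entry phonebook; OutputKeys.OMIT_XML_DECLARATION is set only to keep the output short, and IdentitySerializer is a hypothetical class name.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IdentitySerializer {

    /** Builds a small DOM tree in memory; it never exists on disk. */
    public static Document buildPhonebook() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("phonebook");
        Element entry = doc.createElement("entry");
        entry.setAttribute("name", "Alice");
        root.appendChild(entry);
        doc.appendChild(root);
        return doc;
    }

    /** Identity transformation: DOM tree in, serialized markup out. */
    public static String serialize(Document doc) throws TransformerException {
        // No stylesheet argument, so the Transformer just copies its input
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildPhonebook()));
    }
}
```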

The factory class itself is used to get an instance of Transformer (discussed in the next subsection) and to perform simple configuration. You can use the setFeature(String name, boolean value) method to turn processor features on or off, and getFeature(String name) to find out whether a feature is supported or enabled. Of course, any features set on the factory apply to all Transformer instances created from it.
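As an illustration, this sketch turns on XMLConstants.FEATURE_SECURE_PROCESSING, a standard feature introduced with JAXP 1.3 that asks the processor to behave securely (for example, by limiting resource use; exactly what it restricts is implementation-dependent). It then uses getFeature() with the FEATURE constants on the Source and Result classes to ask which input and output types the factory supports. FactoryFeatures is a hypothetical class name.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FactoryFeatures {

    /** Creates a factory with secure processing turned on. */
    public static TransformerFactory secureFactory()
            throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Applies to every Transformer and Templates this factory creates;
        // throws if the processor can't honor the feature
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory;
    }

    public static void main(String[] args) throws Exception {
        TransformerFactory factory = secureFactory();
        // getFeature() also reports which Source/Result types are supported
        String[] features = {
            DOMSource.FEATURE, SAXSource.FEATURE, StreamSource.FEATURE,
            DOMResult.FEATURE, StreamResult.FEATURE,
            XMLConstants.FEATURE_SECURE_PROCESSING };
        for (String feature : features) {
            System.out.println(feature + " -> " + factory.getFeature(feature));
        }
    }
}
```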

Creating a Transformer

The next step is to get the object to perform the actual transformation. This is another rather boring line of code: Just call newTransformer() on your factory and supply the method with the XSL stylesheet you want to use. Listing 4 shows you what to do:


Listing 4. Using the TransformerFactory to create a Transformer
try {
  // Set up input documents
  Source inputXML = new StreamSource(
    new File("phonebook.xml"));

  Source inputXSL = new StreamSource(
    new File("phonebook.xsl"));

  // Set up output sink
  Result outputXHTML = new StreamResult(
    new File("output.html"));

  // Set up a factory for transforms
  TransformerFactory factory = TransformerFactory.newInstance();

  // Get a transformer for this XSL
  Transformer transformer = factory.newTransformer(inputXSL);

} catch (TransformerConfigurationException e) {
  // newTransformer() couldn't build a Transformer,
  // usually because the stylesheet has errors
  System.out.println("Could not create a Transformer: " +
    e.getMessageAndLocation());
} catch (TransformerFactoryConfigurationError e) {
  System.out.println("Error occurred obtaining " +
    "XSL processor.");
}

Not much is notable here; the only thing you need to be sure you don't miss is the connection between the Transformer and a specific stylesheet. Because the stylesheet is used in the creation of the Transformer, that's the only XSL you can use with the instance. If you want to perform additional transformations using a different stylesheet, you can reuse the TransformerFactory but must create a different Transformer instance, tied to the new stylesheet.
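Here's a small sketch of that rule in action: one TransformerFactory builds two Transformers from two different hypothetical stylesheets, one listing names and one listing phone numbers, and each Transformer applies only its own stylesheet. The helper listAttribute(), the class name, and the sample data are assumptions made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TwoStylesheets {

    // Hypothetical sample data
    static final String XML =
        "<phonebook>"
        + "<entry name='Alice' phone='555-1234'/>"
        + "<entry name='Bob' phone='555-9876'/>"
        + "</phonebook>";

    /** Hypothetical stylesheet that lists one attribute of every entry. */
    static String listAttribute(String attr) {
        return "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:for-each select='phonebook/entry'>"
            + "<xsl:if test='position() &gt; 1'>, </xsl:if>"
            + "<xsl:value-of select='@" + attr + "'/>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    static String apply(Transformer t, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static String[] run() throws TransformerException {
        // One factory can serve any number of stylesheets...
        TransformerFactory factory = TransformerFactory.newInstance();
        // ...but each Transformer is bound to the stylesheet it was built from
        Transformer names = factory.newTransformer(
            new StreamSource(new StringReader(listAttribute("name"))));
        Transformer phones = factory.newTransformer(
            new StreamSource(new StringReader(listAttribute("phone"))));
        return new String[] { apply(names, XML), apply(phones, XML) };
    }

    public static void main(String[] args) throws TransformerException {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```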

Performing the transformation

With all the pieces in place, you need just one more line of code to perform the transformation. Listing 5 shows you how to use the transform() method. Just supply it with your input XML and output sink; the stylesheet is already tied to the Transformer instance you're using:


Listing 5. Using the transform() method
try {
  // Set up input documents
  Source inputXML = new StreamSource(
    new File("phonebook.xml"));

  Source inputXSL = new StreamSource(
    new File("phonebook.xsl"));

  // Set up output sink
  Result outputXHTML = new StreamResult(
    new File("output.html"));

  // Set up a factory for transforms
  TransformerFactory factory = TransformerFactory.newInstance();

  // Get a transformer for this XSL
  Transformer transformer = factory.newTransformer(inputXSL);

  // Perform the transformation
  transformer.transform(inputXML, outputXHTML);

} catch (TransformerConfigurationException e) {
  // newTransformer() couldn't build a Transformer,
  // usually because the stylesheet has errors
  System.out.println("Could not create a Transformer: " +
    e.getMessageAndLocation());
} catch (TransformerException e) {
  // transform() failed while processing the input
  System.out.println("Error occurred during the " +
    "transformation: " + e.getMessageAndLocation());
}

Once you've invoked this method, the result of the transformation is written out to the supplied Result. In Listing 5 that's a file, but you could also send the output to a SAX ContentHandler or a DOM Node. If you want to try all this out, the bundled files provide a simple XML file, XSL stylesheet, and source code (see Download).
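If you'd like to try the sequence from Listing 5 without creating any files, the following sketch performs the same steps entirely in memory: StringReaders stand in for phonebook.xml and phonebook.xsl, and a StringWriter takes the place of output.html. The class name, the phonebook markup, and the stylesheet are invented for illustration; they are not the files in the download.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PhonebookInMemory {

    // Hypothetical stand-in for phonebook.xml
    static final String XML =
        "<phonebook>"
        + "<entry name='Alice' phone='555-1234'/>"
        + "<entry name='Bob' phone='555-9876'/>"
        + "</phonebook>";

    // Hypothetical stand-in for phonebook.xsl: one <li> per entry
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/phonebook'>"
        + "<ul><xsl:apply-templates select='entry'/></ul>"
        + "</xsl:template>"
        + "<xsl:template match='entry'>"
        + "<li><xsl:value-of select='@name'/>: <xsl:value-of select='@phone'/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String render(String xml, String xsl) throws TransformerException {
        // Same steps as Listing 5, with in-memory inputs and output
        Source inputXML = new StreamSource(new StringReader(xml));
        Source inputXSL = new StreamSource(new StringReader(xsl));
        StringWriter buffer = new StringWriter();
        Result output = new StreamResult(buffer);

        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(inputXSL);
        transformer.transform(inputXML, output);
        return buffer.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(render(XML, XSL));
    }
}
```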


Caching XSL stylesheets

As simple as all of this is, using JAXP this way has two significant limitations:

  • Each call to TransformerFactory.newTransformer() with a stylesheet parses and processes the XSL all over again, so creating a new Transformer for every transformation pays that cost every time.
  • Instances of Transformer are not thread-safe. You can't use the same instance in multiple threads at once.

These limitations compound each other. Because a single Transformer can't be shared across threads, a multithreaded application tends to create a fresh Transformer for each request or thread, and each of those must reprocess the stylesheet. You end up paying the processing cost for the XSL stylesheet over and over again. No doubt you're eager to find out how to solve these problems. Read on.

Loading a template

An interface I haven't discussed yet -- javax.xml.transform.Templates -- sits right next to javax.xml.transform.Transformer. The Templates interface is thread-safe (addressing the second limitation) and represents a compiled stylesheet (addressing the first limitation). Before I get into the concepts involved here, check out Listing 6:


Listing 6. Using the JAXP Templates interface
try {
  // Set up input documents
  Source inputXML = new StreamSource(
    new File("phonebook.xml"));

  Source inputXSL = new StreamSource(
    new File("phonebook.xsl"));

  // Set up output sink
  Result outputXHTML = new StreamResult(
    new File("output-templates.html"));

  // Set up a factory for transforms
  TransformerFactory factory = TransformerFactory.newInstance();

  // Pre-compile instructions
  Templates templates = factory.newTemplates(inputXSL);

  // Get a transformer for this XSL
  Transformer transformer = templates.newTransformer();

  // Perform the transformation
  transformer.transform(inputXML, outputXHTML);

} catch (TransformerConfigurationException e) {
  // newTemplates() or newTransformer() failed,
  // usually because the stylesheet has errors
  System.out.println("Could not compile the stylesheet: " +
    e.getMessageAndLocation());
} catch (TransformerException e) {
  // transform() failed while processing the input
  System.out.println("Error occurred during the " +
    "transformation: " + e.getMessageAndLocation());
}

The lines under the "Pre-compile instructions" and "Get a transformer for this XSL" comments in Listing 6 are the only changes you need from the code in Listing 5. Instead of using the factory to get a Transformer directly, you use the newTemplates() method; this returns a Templates object, which is thread-safe. Because it holds the precompiled instructions from the XSL it was handed, and no state from any particular transformation, you can pass it to other methods, and even to other threads, without worrying about it.

Then, you obtain the Transformer instance from the Templates.newTransformer() method. You don't need to specify the XSL at this stage, because the Templates object has already handled that (and, in fact, it has compiled that XSL, so you couldn't change the stylesheet if you wanted to). Call newTransformer() whenever you need a Transformer: it doesn't recompile the stylesheet, and each thread should get its own. Other than an extra line, and a change to an existing line, there's nothing new here. Pretty cool, considering how much better your code gets as a result of this small change.
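To make the threading rules concrete, here's a sketch in which one Templates object is shared by a pool of worker threads, while each task creates its own Transformer. The class name, the stylesheet, and the person markup are hypothetical, and the code assumes Java 8 or later for the lambda.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SharedTemplates {

    // Hypothetical stylesheet: greets whoever is named in <person name='...'/>
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/person'>Hello, <xsl:value-of select='@name'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms one small document per name, concurrently. */
    public static List<String> greetAll(List<String> names) throws Exception {
        // Compile the stylesheet exactly once
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSL)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : names) {
                futures.add(pool.submit(() -> {
                    // Shared: templates. Per task: a fresh Transformer.
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    // Assumes names contain no quotes or markup characters
                    String xml = "<person name='" + name + "'/>";
                    transformer.transform(
                        new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // keeps submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(greetAll(Arrays.asList("Alice", "Bob", "Carol")));
    }
}
```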

From Transformer to Templates

The last issue worth considering is when to use Transformers directly from a factory, and when to use Templates objects. I almost always prefer to go with the Templates object, because when I use XSL I usually use the same stylesheet repeatedly. Rather than pay for multiple passes over the XSL, I prefer to precompile the instructions into a Templates object and be done with the XSL processing.

That said, in a few cases it's better to pull a Transformer directly from your TransformerFactory. If you know that you're going to perform only a single transformation using a specific stylesheet, then it's quicker not to precompile into a Templates object, which requires a little more overhead. However, you need to be sure about no reuse. In my (totally unscientific, using a smallish sample size) tests, I find that once I've used an XSL stylesheet twice, it's a wash between using a Templates object and using a Transformer directly. Once you hit three times, the Templates approach wins hands down. You also need to be sure you're not going to have any threading issues; that's a simple thing to determine, though, so I'll leave that to you to apply in your programming. As a general rule, it's almost always safer to go with the Templates object.
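If you settle on Templates, a small cache keeps each stylesheet from being compiled more than once. The following is one possible sketch, not a JAXP API: StylesheetCache is a hypothetical class that keys compiled stylesheets by system ID (a URI such as file:/path/to/phonebook.xsl). The map and the factory share one lock because TransformerFactory isn't guaranteed to be thread-safe; the Templates objects it hands out need no locking.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cache that compiles each stylesheet at most once. */
public class StylesheetCache {

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new HashMap<>();

    /** Returns the compiled stylesheet for a system ID, compiling on first use. */
    public synchronized Templates getTemplates(String systemId)
            throws TransformerConfigurationException {
        Templates templates = compiled.get(systemId);
        if (templates == null) {
            // Only the first request for this stylesheet pays to compile it
            templates = factory.newTemplates(new StreamSource(systemId));
            compiled.put(systemId, templates);
        }
        return templates;
    }

    /** Hands out a fresh, unshared Transformer for the given stylesheet. */
    public Transformer newTransformer(String systemId)
            throws TransformerConfigurationException {
        return getTemplates(systemId).newTransformer();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java StylesheetCache file:/path/to/phonebook.xsl
        StylesheetCache cache = new StylesheetCache();
        Templates first = cache.getTemplates(args[0]);
        Templates second = cache.getTemplates(args[0]);
        System.out.println("compiled once: " + (first == second));
    }
}
```

A production cache would probably also need a way to evict or reload entries when a stylesheet changes on disk; that's left out here to keep the sketch short.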


Changing the XSL processor

In Part 1, you saw that you can replace the default JAXP parser implementation with your own implementation by changing a system property. The same principle applies for the XSL processor. JAXP comes prepackaged with Xalan-J (see Resources), which I always use. But flexibility is always good, and JAXP provides it.

If you want to use a processor other than Xalan, supply a value for the system property named javax.xml.transform.TransformerFactory. You need to assign this property the name of a class to instantiate. The class should extend javax.xml.transform.TransformerFactory (yes, that's also the name of the system property to set) and fill in the methods left abstract. Just use something like:

java -Djavax.xml.transform.TransformerFactory=[transformer.impl.class] TestTransformations simple.xml simple.xsl

That's all there is to it!
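To confirm that the switch took effect, it helps to print which factory class JAXP actually loaded. Here's a minimal sketch; WhichProcessor is a hypothetical name. If the property isn't set, JAXP also checks a jaxp.properties file in the JRE and the META-INF/services mechanism before falling back to the platform default, and if the property names a class that can't be loaded, newInstance() throws TransformerFactoryConfigurationError.

```java
import javax.xml.transform.TransformerFactory;

public class WhichProcessor {

    static final String PROPERTY = "javax.xml.transform.TransformerFactory";

    /** Reports the requested factory class (if any) and the one actually loaded. */
    public static String describe() {
        String requested = System.getProperty(PROPERTY);
        TransformerFactory factory = TransformerFactory.newInstance();
        return "requested: "
            + (requested == null ? "(none; using the lookup defaults)" : requested)
            + "\nloaded:    " + factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it with the same -D switch shown above, in place of TestTransformations, to see whether your processor was picked up.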


Summary

In its early incarnations, JAXP was little more than a thin veneer over SAX and DOM -- and over outdated versions of those APIs, at that. Now, with JAXP 1.3, you can parse, validate, and transform XML without ever writing a line of vendor-specific code. Although it often makes sense to drop down to SAX code or use a tool like DTDParser (see Resources), or even handle transformations yourself, you need JAXP in your arsenal of APIs and tools. Perhaps even more important than vendor neutrality, all your customers and clients with a fairly recent Java Virtual Machine (JVM) are going to have JAXP. So use it, use it well, and use it often.



Download

Description: Sample code for All about JAXP, Part 2
Name: x-jaxp2-all-about.zip
Size: 4 KB
Download method: HTTP



Resources

  • Read Part 1 of this two-part series, which introduces JAXP and its parsing and validation features.

  • Visit the "XML and Java technology" forum, hosted by Brett McLaughlin, for additional information on how to work with these technologies.

  • Learn more about JAXP at Sun's Java and XML headquarters.

  • If you're new to Java programming, you can get JAXP along with a complete JDK by downloading Java 5.0.

  • Get the full scoop on XSL standards from the World Wide Web Consortium (W3C).

  • Download Apache Xalan, the XSL processor in Sun's JDK 5.0 implementation.

  • Sun uses the Apache Xerces parser in its JDK 5.0 implementation.

  • For an in-depth look at the new features in JAXP 1.3, read the two-part developerWorks series "What's new in JAXP 1.3?" Part 1 (November 2004) provides a brief overview of the JAXP specification, gives details of the modifications to the javax.xml.parsers package, and describes a powerful schema caching and validation framework. Part 2 (December 2004) touches on utilities that add support for concepts defined in the Namespaces in XML specification, and describes changes to the javax.xml.transform package.

  • Read more about Transformation API for XML (TrAX), which is now part of JAXP.

  • Read "Putting XSL transformations to work" (developerWorks, October 2001) for an introduction to XSL and a discussion of real-world business scenarios that benefit from the use of XSL transformations.

  • Read Brett McLaughlin's book Java & XML (O'Reilly & Associates, 2001), which explains how Java programmers can use XML to build Web-based enterprise applications.

  • Find out more about the APIs under the covers of JAXP. Start with SAX 2 for Java at the SAX Web site, and then take a look at DOM at the W3C Web site.

  • Learn more about JDOM, an open source toolkit that provides a way to represent XML documents in the Java language for easy and efficient reading, writing, and manipulation.

  • Read "Working XML: Processing instructions and parameters" (developerWorks, September 2001) to learn how to use multiple stylesheets in a simple content-management system.

  • Learn the basics of manipulating XML documents using Java technology from Doug Tidwell's developerWorks tutorial "XML programming in Java technology, Part 1" (January 2004). Part 2 (July 2004) looks at more difficult topics, such as working with namespaces, validating XML documents, and building XML structures without a typical XML document. Finally, Part 3 (July 2004) shows you how to do more sophisticated tasks such as generate XML data structures, manipulate those structures, and interface XML parsers with non-XML data sources.

  • If you're interested in a parser that deals with DTDs directly, rather than through a validation API, check out Mark Wutka's handy DTDParser.

  • Browse for books on these and other technical topics.

  • Find out how you can become an IBM Certified Developer in XML and related technologies.

