Evaluating XPaths from the Java platform

Select parts of an XML document from Java code

Brett D. McLaughlin, Sr., Author and Editor, O'Reilly Media, Inc.
Brett McLaughlin is a bestselling and award-winning non-fiction author. His books on computer programming, home theater, and analysis and design have sold in excess of 100,000 copies. He has been writing, editing, and producing technical books for nearly a decade, and is as comfortable in front of a word processor as he is behind a guitar, chasing his two sons around the house, or laughing at reruns of Arrested Development with his wife. His last book, Head First Object Oriented Analysis and Design, won the 2007 Jolt Technical Book award. His classic Java and XML remains one of the definitive works on using XML technologies in the Java language.

Summary:  No data format is harder to search than XML, but with the fairly recent introduction of the XQuery API, XML searches are now flexible and easy to perform. For Java™ programmers who work with XML documents using SAX, DOM, JDOM, JAXP, and more, the XQuery API for Java is a welcome addition to the programmer's toolkit. Now the power of XQuery is available to Java programmers without resorting to system calls or unwieldy APIs, all in a Sun-standardized package.

Date:  08 Jul 2008
Level:  Intermediate

Grouping XML technologies

Some ten years ago, when XML was peaking as one of the hottest technologies around, it was hard to imagine that a decade later XML would still be this important, or that it would be surrounded by so many XML-related technologies that are just as interesting. In fact, you can view the various XML technologies through several lenses.

The traditional XML groups

Frequently used acronyms

  • API: Application Programming Interface
  • DOM: Document Object Model
  • IDE: Integrated Development Environment
  • JAXP: Java API for XML Processing
  • JDOM: Java Document Object Model
  • SAX: Simple API for XML
  • URI: Uniform Resource Identifier
  • W3C: World Wide Web Consortium
  • XML: Extensible Markup Language

You can group XML-related technologies into a few basic groups, or focus areas:

  1. Document authoring: This group is for the folks that spend most of their time actually authoring XML. Whether they create original XML data, or represent existing data in an XML format, the focus here is pure XML, and takes little notice of the programming tasks that might use those documents. Here's where the core of XML sits, along with specific XML vocabularies, like MathML or some of the scientific XML vocabularies.
  2. Processing XML: These are the technologies like XSL that allow XML to be transformed, massaged, or migrated from one format to another. Again, the focus is the XML documents and the data within them, although programming languages are sometimes used to accomplish these transformations.
  3. Reading/writing XML (and persisting data): This is the more programming-centric grouping of technologies, ranging from low-level APIs like SAX and DOM to data binding technologies like JAXB and Castor. This is where XML is seen as a data storage mechanism, and in a lot of ways, a means to an end.

Until recently, these were the big categories...with most new technologies and specifications adding to one of these three groupings.

Moving XML to a first-class data citizen

One of the big problems with XML, though, and a limitation of the three groups above, was the lack of good search support. If you wanted to search through your data, and the data was in XML, you had a problem. In fact, the general solution was to force together a few of the groupings above. Document authors might struggle through using a command-line tool like grep, which is a lousy way to perform searches because it knows nothing about XML structure. Programmers might read in the XML (another grouping), and then use their programming language (like Java or C#) to search through the data in a non-XML format. That's workable, but still reveals a limitation of XML.

Fortunately, the introduction and growing popularity of XPath (and XQuery, mentioned later in this article) have introduced a new group:

  1. Searching XML: This is where XPath and XQuery come in. These specifications/technologies allow you to search XML documents in what amounts to an XML manner. In other words, the searches are capable of working with XML semantics, and can search through not only the data in an XML document, but the structure of those documents, as well.

With XPath and XQuery, you're not stuck pulling data from XML into a programming language, and then using that language's tools to search the data. In addition to the constraint of your programming language with that approach, you typically lose most of the XML semantics and structure, such as what element was a child of what other element, and so on. XPath and XQuery allow you to search XML without needing a programming language.

All that said, though, programmers still need a way to interact with XPath and XQuery from the Java language (and others). XPath and XQuery give you great XML-aware searching capabilities, but those capabilities have to be reachable from your code. Simply launching a command-line process with a system-level call like exec() is a pain, and prone to all sorts of errors you can't handle. Worse yet, it probably makes working with the results of a search nearly impossible (if not actually impossible). That's where XPath meets the Java (or C# or Perl) language. This is a Java-specific article. (If you want to see articles about those other languages, tell us in the feedback section of this article!)

If you're reading this article, you should at least be familiar with XPath. Check out Resources for links to a good two-part introductory tutorial on XPath, if you've never used the technology before, and then come back to this article.


JAXP and Java 5 software

But I don't have Java 5 technology!

Why not? Seriously, Java 5 technology is pretty mature, and the Java platform on the whole isn't introducing radical changes into the API like it once was. The leap (skip?) from Java 1.4 to 5 (1.5) technology is much smaller than from 1.3 to 1.4, from 1.2 to 1.3, etc. And, Java 6 technology is already out, with Java 7 technology coming all-too soon. If you're not in the Java 5 environment, you're missing out on a lot of functionality, most notably XPath and a revamped and improved version of JAXP.

If you are somehow stuck without Java 5 technology, and can't do anything about it, check out an API like JDOM (see Resources for links) that has Java 1.4 support and an integrated XPath engine. That might tide you over while you work to move to the Java 5 level.

It's at this point, the intersection of XPath and Java technology, that Sun has done Java programmers a big favor. They've integrated XPath support into the Java 5 environment. Even better, you don't need to download the enterprise edition, or a supplemental package (like Sun used to do with parts of JDBC). If you've got Java 5 software on your machine, you've got XPath support, in a very Java-centric way. In fact, it's part of JAXP, the Java API for XML Processing, something you're probably already familiar with.

Make sure you've got the Java 5 release

If you're not sure what version of Java technology you have, or what version of Java technology runs on the machine (perhaps on a remote server) that you write code on, you can find out easily. Just run java with the -version flag. Listing 1 shows what that should look like.


Listing 1. Making sure you've got Java 5 or later
                
[bdm0509:~/Documents/developerworks/java_xpath] java -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)

As long as the version is 1.5 or greater, you're ready to work through this article. Note that 1.5 is equivalent to 5.0, and it's not completely clear (to me or most Java developers) why all the public literature says "Java 5", but the java command still returns 1.5. Still, if you've got 1.5.x or even 1.6.x, you're in great shape. If not, check Resources for links to download Java 5 technology.

Make sure you've got XPath support

Next, you need to make sure that you have XPath support. That probably sounds like a redundancy; you just checked to see if you've got at least Java 5 software, right? Still, there are tons of developers who have a different Java version on their system path than their development environment is using. Or Eclipse, your IDE, is running something other than your Web application server. And on and on...the best way to avoid problems like this sneaking up on you is to build a tiny program to test things out. Listing 2 shows a program that does nothing more than create a new instance of the XPath factory, XPathFactory. This also ensures things like parsers and an implementation are set up and running.


Listing 2. A very simple XPath test class
                
import javax.xml.xpath.XPathFactory;

public class XPathTester {

  public static void main(String args[]) {
    try {
      // Fails if no XPath implementation can be located
      XPathFactory.newInstance();
      // Report success only when the factory actually loaded
      System.out.println("Successfully loaded XPath factory. Things look good.");
    } catch (Exception e) {
      System.err.println("Uh oh...looks like you don't have the version " +
        "of JAXP with XPath support. Better upgrade to Java 5 or greater.");
    }
  }
}

Compile this class and run it. You should get the very basic output in Listing 3.


Listing 3. Successful output of the test class from Listing 2
                
[bdm0509:~/Documents/developerworks/java_xpath] java XPathTester
Successfully loaded XPath factory. Things look good.

This is pretty trivial, but you can take this class and try it on your Web server, your application server, your four mirrored production servers...and anywhere else you want your XPath code to run. If the class runs on those machines, then you're safe to develop more complex XPath apps. If the test class doesn't run, spend your time getting XPath support working before spending hours on writing code that might not work when it counts.


An overview of the XPath API

Understanding the XPath part of the JAXP API is really dependent upon understanding how JAXP handles all XML parsing, processing, and transformations.

The basic JAXP workflow

You'll recall that the basic steps to work with XML are:

  1. Get a factory class to provide instances of a vendor-specific JAXP implementation.
  2. Get a parser or transformer instance from the factory.
  3. Set configuration options on or around that parser or transformer (validation, namespace-awareness, stylesheet to use, and so on).
  4. Create an object to hold, store, or reference the XML to be operated on (usually through some type of InputSource).
  5. Parse or transform the XML.

This usually resembles Listing 4 in code, which shows a simple XML parse, using a command-line argument as the filename of the XML document to parse.


Listing 4. Using the SAXParserFactory
                
import java.io.File;
import java.io.OutputStreamWriter;
import java.io.Writer;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;

// SAX
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestSAXParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println ("Usage: java TestSAXParsing [filename]");
                System.exit (1);
            }

            // Get SAX Parser Factory
            SAXParserFactory factory = SAXParserFactory.newInstance();

            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File(args[0]), new MyHandler());
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not support " +
                               " the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining SAX Parser Factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class MyHandler extends DefaultHandler {

    // SAX callback implementations from ContentHandler, ErrorHandler, etc.
}

If you're building a DOM tree, the process still follows the same model. Listing 5 shows code to create a DOM tree of an XML document, and the steps are very similar, even though the class and method names change.


Listing 5. Using the document builder factory
                
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

// DOM
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TestDOMParsing {

    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println ("Usage: java TestDOMParsing " +
                                    "[filename]");
                System.exit (1);
            }

            // Get Document Builder Factory
            DocumentBuilderFactory factory = 
                DocumentBuilderFactory.newInstance();

            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);

            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new File(args[0]));

            // Print the document from the DOM tree and
            //   feed it an initial indentation of nothing
            printNode(doc, "");

        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not " +
                               "support the requested features.");

        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining Document " +
                               "Builder Factory.");

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void printNode(Node node, String indent)  {
        // print the DOM tree
    }
}

In both cases, you get a factory, use that to create a new parser/processor instance, and then operate on that instance.

The XPath workflow

The workflow is very similar when you write XPath code:

  1. Get an XPath factory class to provide instances of a vendor-specific XPath implementation.
  2. Get an XPath evaluator instance from the factory.
  3. Create a new XPath expression. (This step is different from the parsing model, although it still aligns with assigning a stylesheet in the XML transformations model.)
  4. Build a DOM tree of the XML document to evaluate the XPath expression against.
  5. Evaluate the XPath expression.
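
The five steps above can be sketched in one short class before they are built up piece by piece. This is only a minimal sketch, not the program this article develops: the class name XPathWorkflowSketch, the countMatches() helper, and the inlined sample document are all hypothetical, chosen so the example runs without any external file.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathWorkflowSketch {

  // Hypothetical sample document, inlined to keep the sketch self-contained
  static final String SAMPLE_XML =
    "<project><target name='init'/><target name='compile'/></project>";

  static int countMatches(String xml, String xpathString) throws Exception {
    // 1. Get an XPath factory
    XPathFactory factory = XPathFactory.newInstance();
    // 2. Get an XPath evaluator from the factory
    XPath xpath = factory.newXPath();
    // 3. Create the XPath expression
    XPathExpression expression = xpath.compile(xpathString);
    // 4. Build a DOM tree of the XML document
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true);
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xml)));
    // 5. Evaluate the expression against the tree
    NodeList results =
      (NodeList) expression.evaluate(doc, XPathConstants.NODESET);
    return results.getLength();
  }

  public static void main(String[] args) throws Exception {
    // The sample document has two target elements
    System.out.println(countMatches(SAMPLE_XML, "//target"));
  }
}
```

The sketch compiles the expression explicitly to mirror step 3; the listings later in this article pass the XPath string straight to the evaluator instead, which amounts to the same thing for a one-off evaluation.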

Let's walk through this process step-by-step and build up a basic program for evaluating XPath expressions. Then you can evaluate any of your own XPaths, or any of the XPaths you wrote when you worked through the XPath tutorial (those links are in Resources).


Feed a DOM tree to your XPath

You need to keep in mind a few suppositions that apply to the program you'll build in this article:

  • You have an XML document that you can easily convert into a DOM tree. This article's example reads in an XML document from the command line, and converts it to a DOM tree, but you can just as easily build a DOM tree from a network URI, a set of SAX events, or any other source. If you're rusty on how to get a DOM tree from various sources using JAXP, check out Resources for some helpful links.
  • An XPath you want to evaluate. This article assumes you already have an XPath, or at least know how to construct one. There's no substantive discussion about how to build XPaths, but more on how to evaluate them.
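
As a rough illustration of the first supposition, a DOM tree does not have to come from a file. The sketch below builds one from an in-memory string and from an arbitrary InputStream. The class and method names (DomSourcesSketch, fromString, fromStream) are hypothetical, and StandardCharsets assumes a JDK newer than the Java 5 baseline this article targets.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomSourcesSketch {

  private static DocumentBuilder newBuilder()
      throws ParserConfigurationException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    return factory.newDocumentBuilder();
  }

  // Build a DOM tree from XML held in a String
  static Document fromString(String xml) throws Exception {
    return newBuilder().parse(new InputSource(new StringReader(xml)));
  }

  // Build a DOM tree from any byte stream (socket, archive entry, ...)
  static Document fromStream(InputStream in) throws Exception {
    return newBuilder().parse(in);
  }

  public static void main(String[] args) throws Exception {
    Document a = fromString("<project><target/></project>");
    InputStream bytes = new ByteArrayInputStream(
      "<build><step/></build>".getBytes(StandardCharsets.UTF_8));
    Document b = fromStream(bytes);
    System.out.println(a.getDocumentElement().getNodeName());
    System.out.println(b.getDocumentElement().getNodeName());
  }
}
```

Whichever source you use, the result is the same org.w3c.dom.Document type, so the XPath code in the rest of this article is unaffected.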

Once you take care of these things, you're ready to write code.

Get an XML document to evaluate your XPath against

Begin with a simple program that reads in a filename from the command-line. You'll use that name to build a DOM tree from the XML document the filename references. There's nothing XPath- or even JAXP-specific here; just some simple I/O and program plumbing. Listing 6 is the beginning of your program; save this as XPathEvaluator.java.


Listing 6. Initial version of program to evaluate XPaths
                
package ibm.dw.xpath;

public class XPathEvaluator {

  public XPathEvaluator(String xmlFilename) {
    // Convert filename into a DOM tree
  }

  public void evaluateXPath(String xpathString) {
  }

  public static void main(String[] args) {
    try {
      if (args.length != 1) {
        System.err.println("Usage: java ibm.dw.xpath.XPathEvaluator " +
          "[XML filename]");
        System.exit(1);
      }
      XPathEvaluator evaluator = new XPathEvaluator(args[0]);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Convert your XML into a DOM tree

The XPath API—at least in its current form in JAXP—requires a DOM tree to operate upon. All XPaths require some sort of in-memory model to operate upon, because XPaths are fundamentally about the hierarchy of an XML document. DOM provides this, in the form of a navigable tree of elements, attributes, and text nodes.

Since you're already using JAXP for XPath support, you get DOM support as well, for free. Use the DocumentBuilder class (and its associated factory, DocumentBuilderFactory) to convert the string reference to an XML document into an in-memory DOM tree. Listing 7 shows the additions to XPathEvaluator to take care of this.


Listing 7. Creating a DOM tree from the input XML document
                
package ibm.dw.xpath;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XPathEvaluator {

  private Document domTree = null;

  public XPathEvaluator(String xmlFilename) throws IOException {
    try {
      // Convert filename into a DOM tree
      DocumentBuilderFactory domFactory =
        DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = domFactory.newDocumentBuilder();
      this.domTree = builder.parse(xmlFilename);
    } catch (SAXException e) {
      throw new IOException("Error in document parsing: " + e.getMessage());
    } catch (ParserConfigurationException e) {
      throw new IOException("Error in configuring parser: " + e.getMessage());
    }
  }

  public void evaluateXPath(String xpathString) {
  }

  public static void main(String[] args) {
    try {
      if (args.length != 1) {
        System.err.println("Usage: java ibm.dw.xpath.XPathEvaluator " +
          "[XML filename]");
        System.exit(1);
      }
      XPathEvaluator evaluator = new XPathEvaluator(args[0]);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Most of the code is just DOM-based JAXP parsing; if you're unclear on what's going on here, check Resources specifically for the links on general JAXP parsing and transformation articles.

A note on namespace awareness

XPath, as is the case with most XML specifications that are fairly modern and current, is namespace aware. That means that namespace prefixes on elements (like iTunes:artist) can be part of your XPaths. Even if you're not using namespaced documents, though, you should ensure that you have this capability for the future.

To do that, though, you must ensure that your DOM tree is namespace aware. In other words, you ensure that the input to your XPath evaluations is namespace-aware, so your evaluations can be. To ensure that, always turn on namespace awareness when you build your DOM tree. Listing 8 shows a single-line addition to accomplish that.


Listing 8. Adding namespace awareness to building the DOM tree
                
package ibm.dw.xpath;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XPathEvaluator {

  private Document domTree = null;

  public XPathEvaluator(String xmlFilename) throws IOException {
    try {
      // Convert filename into a DOM tree
      DocumentBuilderFactory domFactory =
        DocumentBuilderFactory.newInstance();
      domFactory.setNamespaceAware(true);
      DocumentBuilder builder = domFactory.newDocumentBuilder();
      this.domTree = builder.parse(xmlFilename);
    } catch (SAXException e) {
      throw new IOException("Error in document parsing: " + e.getMessage());
    } catch (ParserConfigurationException e) {
      throw new IOException("Error in configuring parser: " + e.getMessage());
    }
  }

  public void evaluateXPath(String xpathString) {
  }

  public static void main(String[] args) {
    try {
      if (args.length != 1) {
        System.err.println("Usage: java ibm.dw.xpath.XPathEvaluator " +
          "[XML filename]");
        System.exit(1);
      }
      XPathEvaluator evaluator = new XPathEvaluator(args[0]);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
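
A namespace-aware DOM tree is only half of the picture. Prefixes used inside an XPath string are resolved through a javax.xml.namespace.NamespaceContext set on the XPath object, not through the prefixes declared in the document. The sketch below shows one way that might look. The namespace URI, the sample document, and the SinglePrefixContext helper are all hypothetical, and the helper handles just one prefix.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceXPathSketch {

  // Hypothetical namespace URI and document, for illustration only
  static final String ITUNES_NS = "http://example.com/itunes";
  static final String SAMPLE_XML =
    "<library xmlns:iTunes='" + ITUNES_NS + "'>" +
    "<iTunes:artist>Miles Davis</iTunes:artist></library>";

  // Minimal NamespaceContext that knows a single prefix/URI pair
  static class SinglePrefixContext implements NamespaceContext {
    private final String prefix;
    private final String uri;

    SinglePrefixContext(String prefix, String uri) {
      this.prefix = prefix;
      this.uri = uri;
    }

    public String getNamespaceURI(String p) {
      if (p == null) throw new IllegalArgumentException("prefix is null");
      return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String namespaceURI) {
      return uri.equals(namespaceURI) ? prefix : null;
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
      String p = getPrefix(namespaceURI);
      return p == null
        ? Collections.<String>emptyList().iterator()
        : Collections.singletonList(p).iterator();
    }
  }

  static String artistName(String xml) throws Exception {
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true);   // same switch as Listing 8
    Document doc = domFactory.newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)));

    XPath xpath = XPathFactory.newInstance().newXPath();
    xpath.setNamespaceContext(new SinglePrefixContext("iTunes", ITUNES_NS));
    // The two-argument evaluate() returns the result as a String
    return xpath.evaluate("//iTunes:artist", doc);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(artistName(SAMPLE_XML));
  }
}
```

The prefix in the XPath only has to map to the same URI as the document's prefix; the two prefix strings themselves need not match.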


Represent an XPath in the Java environment

Once you've got a DOM tree to evaluate, you need to take your XPath—which is just a textual string—and create a Java representation of that. Of course, that doesn't just mean that you create a String variable and stuff the XPath into it. You need an actual Java object that can either evaluate itself, or be evaluated by some other XPath-aware component, against the DOM tree you've now got. That's where JAXP's new API additions come into play.

Start with an XPath factory

Here's where that sequence of events from earlier comes into play. You begin all your XPath work—outside of getting a DOM tree ready, which technically can be done anytime before actual XPath evaluation—with a new class, javax.xml.xpath.XPathFactory.

Specifically, XPathFactory is an abstract class, and you need a concrete implementation of it. That implementation will be vendor-specific; Sun provides a default implementation, Apache might have an implementation, Oracle might have an implementation...but none of those vendor class names belong in a nice, vendor-neutral piece of code. Instead, you can abstract vendor specifics away with XPathFactory, and its newInstance() method, which handles getting an implementation of XPathFactory for you.

Listing 9 takes care of that. Note that this listing shows only the evaluateXPath() method. You'll need to add a few import statements to your code to make this work, all in the javax.xml.xpath package.


Listing 9. Getting an instance of XPathFactory
                
  public void evaluateXPath(String xpathString) {
    XPathFactory factory = XPathFactory.newInstance();
  }

Get an XPath object

Next up, you need an XPath object. This object is capable of evaluating XPaths, and is the cornerstone of your XPath-aware Java programs. Just as you get a DocumentBuilder from a DocumentBuilderFactory, you get an XPath from an XPathFactory. Listing 10 shows this minimal code.


Listing 10. Getting an XPath object
                
  public void evaluateXPath(String xpathString) {
    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
  }

With this object, you're ready to evaluate your XPath, and work with the results.
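
The XPath object can also turn an XPath string into a reusable javax.xml.xpath.XPathExpression through its compile() method, which is the "create a new XPath expression" step from the workflow. The following is a small sketch, assuming you want to run one XPath against several documents; the class name CompiledXPathSketch and the countInEach() helper are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompiledXPathSketch {

  private static Document parse(String xml) throws Exception {
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true);
    return domFactory.newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)));
  }

  // Compile the XPath once, then evaluate it against each document
  static int[] countInEach(String xpathString, String... xmlDocs)
      throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    XPathExpression expression = xpath.compile(xpathString);

    int[] counts = new int[xmlDocs.length];
    for (int i = 0; i < xmlDocs.length; i++) {
      NodeList nodes = (NodeList) expression.evaluate(
        parse(xmlDocs[i]), XPathConstants.NODESET);
      counts[i] = nodes.getLength();
    }
    return counts;
  }

  public static void main(String[] args) throws Exception {
    int[] counts = countInEach("//property",
      "<target><property/><property/></target>",
      "<target/>");
    System.out.println(counts[0] + " and " + counts[1]);
  }
}
```

For a single evaluation, passing the string directly to the XPath object (as the listings in this article do) is simpler; compiling first mainly pays off when the same expression is evaluated repeatedly.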


Evaluate an XPath expression

Once you have an XPath instance, you can evaluate XPaths, get a resulting node set, and do some work with those results.

Evaluate an XPath

You evaluate an XPath (not the Java object, but a string path referring to an XML document) with the evaluate method on the XPath Java object. That's a bit confusing: you use XPath to evaluate an XPath. So in a truer sense, the XPath object is an XPath evaluator.

The evaluate() method takes three arguments: a string XPath, a DOM tree to evaluate that XPath against, and an XPathConstants value indicating the return type. XPathConstants defines NODESET, NODE, STRING, NUMBER, and BOOLEAN; because the XPaths in this article select sets of nodes, use XPathConstants.NODESET throughout, to have your results returned as a DOM NodeList structure.

See the code to evaluate an XPath in Listing 11, added to the evaluateXPath method.


Listing 11. Evaluating an XPath
                
  public NodeList evaluateXPath(String xpathString) throws IOException {
    try {
      XPathFactory factory = XPathFactory.newInstance();
      XPath xpath = factory.newXPath();
      return (NodeList)xpath.evaluate(
        xpathString, domTree, XPathConstants.NODESET);
    } catch (XPathExpressionException e) {
      throw new IOException("Error evaluating XPath: " + e.getMessage());
    }
  }

Listing 11 includes lots of new additions, all important:

  1. The method now returns an org.w3c.dom.NodeList. Be sure to add an import org.w3c.dom.NodeList; statement to your code to make this work. NodeList is the structure used to return the list of nodes from the evaluation of your XPath.
  2. The entire code block is wrapped in a try/catch block, and the exception that can result from XPath evaluation— javax.xml.xpath.XPathExpressionException —is caught and rethrown as an IOException. You'll come back to the reasoning behind this shortly.
  3. evaluate() is called with the XPath string passed into the method, the DOM tree you built in the class's constructor, and the constant indicating to return results as a list of nodes.
  4. The result of evaluate, which is an Object, is cast to the DOM NodeList type, and returned.

Despite several things happening, they're all pretty straightforward, and nothing that should trip you up.
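
Beyond NODESET, XPathConstants also defines NODE, STRING, NUMBER, and BOOLEAN, and the Object that evaluate() returns is cast accordingly. The sketch below tries each against a small inlined document. The class name, method names, and sample XML are hypothetical; they only loosely imitate the build file used later in this article.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ReturnTypesSketch {

  // Hypothetical sample document, for illustration only
  static final String SAMPLE_XML =
    "<project><target name='init'>" +
    "<property name='parser.jar'/><property name='parser.src'/>" +
    "</target></project>";

  static Document parse(String xml) throws Exception {
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true);
    return domFactory.newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)));
  }

  // NUMBER results come back as java.lang.Double
  static double propertyCount(Document doc) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    return (Double) xpath.evaluate(
      "count(//property)", doc, XPathConstants.NUMBER);
  }

  // STRING results come back as java.lang.String
  static String targetName(Document doc) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    return (String) xpath.evaluate(
      "string(//target/@name)", doc, XPathConstants.STRING);
  }

  // BOOLEAN results come back as java.lang.Boolean
  static boolean hasInitTarget(Document doc) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    return (Boolean) xpath.evaluate(
      "boolean(//target[@name='init'])", doc, XPathConstants.BOOLEAN);
  }

  // NODE returns only the first matching node, in document order
  static Node firstProperty(Document doc) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    return (Node) xpath.evaluate("//property", doc, XPathConstants.NODE);
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse(SAMPLE_XML);
    System.out.println(propertyCount(doc));
    System.out.println(targetName(doc));
    System.out.println(hasInitTarget(doc));
    System.out.println(firstProperty(doc).getNodeName());
  }
}
```

The scalar types are handy for expressions built on functions such as count(); for anything that selects elements or attributes, NODESET remains the natural choice.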

XPath-specific, DOM-specific, JAXP-specific?

One interesting point is the decision to return any exceptions from this method, as well as any that arise in the constructor, as IOExceptions. That's a design decision, and not really XPath-specific, but it's important. With that decision, you can insulate users of this class—through the command line or another program—from having to know, import, or directly use any XPath classes or interfaces.

In fact, you abstracted away all JAXP classes, DOM classes, SAX classes, and XPath classes...except the NodeList class from the DOM. That's pretty powerful, as other programmers don't need to be familiar with the JAXP or XPath API to get XPath evaluation. It takes your program from an interesting programming exercise to a reusable tool, and that's a pretty important distinction.

If you take this principle and want to go even a bit further, you could take the returned NodeList and iterate through it, and dump the results into a Java List. That would abstract away the details about DOM completely, and remove even the current small dependency on org.w3c.dom.NodeList.
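
One possible shape for that extra step is sketched below: the NodeList is copied into a plain List of strings, so callers never touch a DOM type. The names NodeListToListSketch, toTextList(), and evaluateToList() are hypothetical, and using each node's text content as its string form is an assumption; you might prefer node names or attribute values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListToListSketch {

  // Copy each node's text content into an ordinary Java List
  static List<String> toTextList(NodeList nodes) {
    List<String> values = new ArrayList<String>(nodes.getLength());
    for (int i = 0; i < nodes.getLength(); i++) {
      values.add(nodes.item(i).getTextContent());
    }
    return values;
  }

  // Evaluate an XPath and hand back only java.util types
  static List<String> evaluateToList(String xml, String xpathString)
      throws Exception {
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true);
    Document doc = domFactory.newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)));

    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate(
      xpathString, doc, XPathConstants.NODESET);
    return toTextList(nodes);
  }

  public static void main(String[] args) throws Exception {
    List<String> values =
      evaluateToList("<a><b>one</b><b>two</b></a>", "//b");
    System.out.println(values);
  }
}
```

The trade-off is that callers lose the node's position in the tree, which matters if they need to navigate from a result to its parent or children.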


Work with the results of evaluation

Once you get the results of an XPath evaluation, you're ready to work with those results...in whatever format you like. For the sake of example, you'll look at just iterating through the results and printing them out. Of course, you can expand on this as much as you like.

A very simple iteration through result nodes

Each member of a NodeList is in fact a DOM Node (org.w3c.dom.Node), and you can then find out the name of the node, its type, and pretty much anything else about that node you want. Listing 12 shows a very basic addition to the XPathEvaluator class that passes in an XPath to evaluate, gets the results, and prints them out.


Listing 12. Completing the XPathEvaluator program (take one)
                
package ibm.dw.xpath;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XPathEvaluator {

  private Document domTree = null;

  public XPathEvaluator(String xmlFilename) throws IOException {
    try {
      // Convert filename into a DOM tree
      DocumentBuilderFactory domFactory =
        DocumentBuilderFactory.newInstance();
      domFactory.setNamespaceAware(true);
      DocumentBuilder builder = domFactory.newDocumentBuilder();
      this.domTree = builder.parse(xmlFilename);
    } catch (SAXException e) {
      throw new IOException("Error in document parsing: " + e.getMessage());
    } catch (ParserConfigurationException e) {
      throw new IOException("Error in configuring parser: " + e.getMessage());
    }
  }

  public NodeList evaluateXPath(String xpathString) throws IOException {
    try {
      XPathFactory factory = XPathFactory.newInstance();
      XPath xpath = factory.newXPath();
      return (NodeList)xpath.evaluate(
        xpathString, domTree, XPathConstants.NODESET);
    } catch (XPathExpressionException e) {
      throw new IOException("Error evaluating XPath: " + e.getMessage());
    }
  }

  public static void main(String[] args) {
    try {
      if (args.length != 1) {
        System.err.println("Usage: java ibm.dw.xpath.XPathEvaluator " +
          "[XML filename]");
        System.exit(1);
      }
      XPathEvaluator evaluator = new XPathEvaluator(args[0]);
      String xpathString = "//target[@name='init']/property[" +
                           "starts-with(@name, 'parser')]";
      NodeList results = evaluator.evaluateXPath(xpathString);
      for (int i=0; i<results.getLength(); i++) {
        Node node = results.item(i);
        System.out.print("Result: ");
        switch (node.getNodeType()) {
          case Node.ELEMENT_NODE: System.out.println("Element node named " +
                                    node.getNodeName());
                                  break;
          case Node.ATTRIBUTE_NODE: System.out.println(
                                     "Attribute node named " +
                                       node.getNodeName() + " with value '" +
                                       node.getNodeValue() + "'");
                                  break;
          case Node.TEXT_NODE:    System.out.println("Text: '" +
                                    node.getNodeValue() + "'");
                                  break;
          default: System.out.println(node);
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

If you read the two-part tutorial on XPath, you'll recognize this XPath as selecting several properties. You should download the example file xerces-build.xml (available in Resources if you don't already have it) to run this example, as shown in Listing 13.


Listing 13. Running XPathEvaluator program (take one)
                
[bdm0509:~/java_xpath] java ibm.dw.xpath.XPathEvaluator xerces-build.xml 
Result: Element node named property
Result: Element node named property
Result: Element node named property
Result: Element node named property
Result: Element node named property
Result: Element node named property

These results look pretty bland, especially if you compare them to Figure 1, a screen capture from the tutorial where this same expression was evaluated graphically using a tool for evaluating XPaths.


Figure 1. Evaluating an XPath expression in a graphical tool
You can use starts-with() in a predicate, and indicate any relative node to involve in the comparison

However, that printed-out view of the elements is deceptively simple.

A node has more than just a name

Remember that while all the sample program did was print out a node's name, type, and possibly its value (depending on that type), you've still got a complete Node object. Further, that node isn't in isolation; it's a reference to a node in an in-memory DOM tree (even if you never see that DOM tree as a user of the class; it's hidden inside the XPathEvaluator code).

What that means is that for each Node, you've really got a location pointer within the complete XML document you handed off to XPathEvaluator. That means you can navigate to a node's children, see what attributes exist on an element node, find out the name of a text node's parent element, and perform any other DOM operation that's allowed on a Node. You don't just have a node, you have a reference to that node in its full DOM context. It's up to you to determine what you do with that node, and the context within which it's positioned.
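
As a minimal sketch of that idea, the following program walks from each result node to its attributes, its parent, and the document root using only standard DOM calls. The class name NodeContextDemo, its helper methods, and the inline XML are assumptions made for illustration (the XML is a made-up stand-in for a fragment of xerces-build.xml); none of them come from the article's sample code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the article's sample code.
class NodeContextDemo {

  // Made-up stand-in for a small fragment of xerces-build.xml.
  static final String SAMPLE_XML =
      "<project name='demo'>"
    + "<target name='init'>"
    + "<property name='parser.Name' value='Demo Parser'/>"
    + "<property name='parser.Version' value='1.0'/>"
    + "</target>"
    + "</project>";

  // Parses the XML string and returns the nodes matched by the XPath.
  static NodeList select(String xml, String xpath) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(xpath, doc, XPathConstants.NODESET);
    } catch (Exception e) {
      throw new IllegalStateException("Could not evaluate " + xpath, e);
    }
  }

  // Uses plain DOM navigation to report where a result node sits in the tree.
  static String describe(Node node) {
    StringBuilder sb = new StringBuilder(node.getNodeName());
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      Element element = (Element) node;
      sb.append(" name=").append(element.getAttribute("name"));
      sb.append(" value=").append(element.getAttribute("value"));
    }
    Node parent = node.getParentNode();
    if (parent != null) {
      sb.append(" parent=").append(parent.getNodeName());
    }
    Document owner = node.getOwnerDocument();
    if (owner != null) {
      sb.append(" root=").append(owner.getDocumentElement().getNodeName());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    NodeList results = select(SAMPLE_XML,
        "//target[@name='init']/property[starts-with(@name, 'parser')]");
    for (int i = 0; i < results.getLength(); i++) {
      System.out.println(describe(results.item(i)));
    }
  }
}
```

Each line of output names the matched property element together with its attribute values, its parent target, and the root project element, all reached from the Node itself without running a second XPath evaluation.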

About those earlier JAXP, DOM, and XPath abstractions...

You might have noticed that all the earlier effort to avoid DOM-specific references now goes out the window. In fact, that's why XPathEvaluator abstracts XPath details away from users of the class, but still returns a DOM NodeList. You can safely insulate your users from JAXP and XPath, but to do much with the results of an XPath evaluation, they'll need to work with the DOM.

For that reason, it's best to return DOM structures, but avoid requiring XPath-specific input or providing XPath-specific output. Let your users work with the DOM, and nothing else, at least in terms of your requirements for your class functioning.
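
One way to picture that guideline is a class whose public signatures mention only String and DOM types. The sketch below is an assumption-laden illustration, not the article's XPathEvaluator: the class name DomOnlySelector, its fromXml factory, and the choice to rethrow XPath failures as IllegalArgumentException are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration; it is not the article's XPathEvaluator.
class DomOnlySelector {
  private final Document document;
  // The XPath object stays private, so callers never import javax.xml.xpath.
  private final XPath xpath = XPathFactory.newInstance().newXPath();

  DomOnlySelector(Document document) {
    this.document = document;
  }

  // Convenience factory so the sketch runs without a file on disk.
  static DomOnlySelector fromXml(String xml) {
    try {
      return new DomOnlySelector(DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(new InputSource(new StringReader(xml))));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Input is a plain String and output is a DOM NodeList; XPath failures are
  // rethrown as a standard exception rather than XPathExpressionException.
  NodeList select(String expression) {
    try {
      return (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("Invalid expression: " + expression, e);
    }
  }

  public static void main(String[] args) {
    DomOnlySelector selector = fromXml("<a><b id='1'/><b id='2'/></a>");
    System.out.println(selector.select("//b").getLength() + " nodes matched");
  }
}
```

Code that calls select() compiles against org.w3c.dom alone, which is the kind of insulation described above.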


And then on to XQuery...

Developers like you and me are a short-tempered, anxious lot. As you begin to get the feel and command of using XPath from the Java environment, you're probably already thinking about what you can't do with XPath. Particularly complex relationships between data aren't easy to deal with (using SQL-like joins is at the outer extremes of what XPath is built to do), you must do ordering of nodes and further filtering in the Java environment, and readability of XPath is pretty difficult if you're not already familiar with the specification.
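
To make the ordering limitation concrete, here is a rough sketch of the post-processing that XPath leaves to Java code: the result nodes are copied into a List, non-element nodes are filtered out, and the rest are sorted by an attribute value. The class name SortResultsDemo, its method, and the inline XML are hypothetical, and the code assumes Java 8 or later for the lambda and List.sort().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the article's sample code.
class SortResultsDemo {

  // Made-up sample data; the properties are deliberately out of order.
  static final String SAMPLE_XML =
      "<target name='init'>"
    + "<property name='parser.Version' value='1.0'/>"
    + "<property name='parser.Name' value='Demo Parser'/>"
    + "<property name='parser.DistName' value='demo-parser'/>"
    + "</target>";

  // Evaluates the XPath, keeps only element nodes, and sorts them by the
  // given attribute. XPath returns matches in document order; the sorting
  // happens here, in Java code.
  static List<Element> elementsSortedByAttribute(
      String xml, String xpath, String attribute) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList results = (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(xpath, doc, XPathConstants.NODESET);
      List<Element> elements = new ArrayList<>();
      for (int i = 0; i < results.getLength(); i++) {
        Node node = results.item(i);
        if (node instanceof Element) {   // further filtering done in Java
          elements.add((Element) node);
        }
      }
      elements.sort(Comparator.comparing((Element e) -> e.getAttribute(attribute)));
      return elements;
    } catch (Exception e) {
      throw new IllegalStateException("Could not evaluate " + xpath, e);
    }
  }

  public static void main(String[] args) {
    for (Element e : elementsSortedByAttribute(SAMPLE_XML, "//property", "name")) {
      System.out.println(e.getAttribute("name"));
    }
  }
}
```

The boilerplate around a single sort is the point: in XQuery, the ordering would be part of the query itself.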

Thankfully, you can take a very natural step from XPath to the next technology, which addresses all of these limitations and does it in a way that is reminiscent of what you've already done. XQuery adds something like an XML-ized version of SQL, allowing you to build queries, sort and order results, and use actual where clauses in your queries. XQuery also builds on XPath, meaning everything you've learned about nodes, predicate matching, and how elements and attributes relate to each other applies to XQuery.

And, just as XPath does, XQuery has an API for its inclusion in Java programs: the XQuery API for Java (XQJ). For a lot more on XQuery, check out Resources, which has links to articles and tutorials on XQuery and XQJ. And once you feel you've gotten your head firmly around XPath, take a look at XQuery to add even more power to your XML-related application code.


Conclusion

Using XPath from Java technology is largely a matter of learning some new syntax, getting an API and a few tools configured, and then applying what you already know about XPath. That shouldn't make you think that using XPath in the Java environment is trivial, though. Without demanding much complexity, XPath offers a tremendous amount of flexibility when you work with XML from Java programs. It certainly moves you far beyond what most basic SAX, DOM, JAXP, JDOM, or other implementations provide (although some vendors and projects provide XPath-capable extensions to the basics that those specs and APIs offer).

And, XPath offers a wonderful gateway to the more complex XQuery language, and Java and XQuery combinations (using the XQJ API). Rather than immediately move on to XQuery, you'll do well to polish your XPath skills, and learn to select complex node sets from within your Java applications, and manipulate those as needed. You'll find lots of cases where you don't need anything beyond XPath. On top of that, XQuery builds upon XPath—both from a lexical perspective and in terms of the XQJ API, which can actually evaluate XPaths as well as execute XQueries—so you're improving your XQuery skills implicitly. Most of all, have fun with the increased flexibility that XPath offers, especially when evaluated from the Java environment.



Downloads

Description                       Name                  Size   Download method
Sample compiled code for article  compiledCode.zip      3KB    HTTP
Sample source code for article    sourceCode.zip        2KB    HTTP
Sample XML for article            xerces-build-xml.zip  11KB   HTTP



Resources

Learn

  • If you're unfamiliar with XPath, take this two-part tutorial:
  • XPath 1.0: Read the formal definition of XPath in the original specification.

  • XPath 2.0: Read the online specification for the most current version of XPath.

  • Tutorial on XPath: Understand how XPath is fundamental to much advanced XML usage with this useful but brief tutorial from the W3C.

  • Online function reference for XPath, XQuery, and XSLT: Once you understand how predicates and functions work, visit this great resource to look up syntax and find functions that aren't commonly discussed.

  • Sun's XQuery API for Java: Move from XPath to XQuery when you check out this page, which details the complete XQuery API for Java.

  • DataDirect resource page: From DataDirect, the hosts of xquery.com, explore implementations of both an XPath evaluator (Stylus Studio), and an XQuery for Java (XQJ) engine, plus related work on XQuery and XPath.

  • DataDirect online help system: Visit this indexed, searchable resource that's great for finding out about particular DataDirect objects and methods.

  • Understanding DOM (Nicholas Chase, developerWorks, March 2007): Dig deeper into manipulating XML from a node-based API in an excellent tutorial.

  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.

  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.

  • developerWorks technical events and webcasts: Stay current with technology in these sessions.

  • The technology bookstore: Browse for books on these and other technical topics.

  • developerWorks podcasts: Listen to interesting interviews and discussions for software developers.

Get products and technologies

  • Java 5 SE: Download for integrated XPath support on your system.

  • Java 6 software: If you're considering upgrading to Java 5 technology, just skip version 5 and go straight to the very latest version, if at all possible.

  • Stylus Studio 2008 XML: Download to start with XPath and XML documents on the Windows platform.

  • AquaPath: Download to enable easy XPath location evaluation on Mac OS X.

  • DataDirect's XQuery for Java implementation: Download and get started with XQuery and Java searches.

  • Java & XML, Third Edition (Brett McLaughlin and Justin Edelson, O'Reilly Media, 2006): Cover XML from start to finish, including extensive information on XML, XSL, and a number of related XML specifications.

  • IBM trial software for product evaluation: Build your next project with trial software available for download directly from developerWorks, including application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
