To complete namespace support, JAXP 1.3 introduces a new javax.xml.namespace package that enables you to manipulate and query namespace information using the NamespaceContext interface and the QName class. The NamespaceContext interface stores the prefix-to-namespace mapping that is available in the current document context. The interface provides methods for getting a namespace URI for a given prefix, getting a prefix for a given namespace URI, or getting all prefixes bound to a given namespace URI. The NamespaceContext is used in the new XPath API, which is described later in the article.
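The three lookups NamespaceContext requires can be sketched with a small map-backed implementation. This is a minimal illustration, not a JAXP-supplied class: the name `MapNamespaceContext`, its `bind` helper, and the sample URI `http://www.example.com/PO` are all hypothetical. It follows the interface contract as documented: unbound prefixes yield `XMLConstants.NULL_NS_URI`, the `xml` and `xmlns` prefixes are always bound, and `getPrefix` returns `null` for an unbound URI.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical helper: a NamespaceContext backed by a simple prefix-to-URI map
public class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<String, String>();

    // Records a prefix binding and returns this object so calls can be chained
    public MapNamespaceContext bind(String prefix, String namespaceURI) {
        prefixToUri.put(prefix, namespaceURI);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        // The xml and xmlns prefixes are bound by definition
        if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
            return XMLConstants.XML_NS_URI;
        }
        if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) {
            return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        }
        String uri = prefixToUri.get(prefix);
        // An unbound prefix yields the empty string rather than null
        return (uri != null) ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        // Any one of the bound prefixes is acceptable; null means "no binding"
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> result = new ArrayList<String>();
        if (XMLConstants.XML_NS_URI.equals(namespaceURI)) {
            result.add(XMLConstants.XML_NS_PREFIX);
        } else if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(namespaceURI)) {
            result.add(XMLConstants.XMLNS_ATTRIBUTE);
        } else {
            for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
                if (entry.getValue().equals(namespaceURI)) {
                    result.add(entry.getKey());
                }
            }
        }
        // The contract asks for an iterator that does not support remove()
        return Collections.unmodifiableList(result).iterator();
    }

    public static void main(String[] args) {
        MapNamespaceContext ctx = new MapNamespaceContext().bind("po", "http://www.example.com/PO");
        System.out.println(ctx.getNamespaceURI("po"));
        System.out.println(ctx.getPrefix("http://www.example.com/PO"));
    }
}
```

A real application would more likely derive the bindings from the document or from configuration; the map simply keeps the sketch self-contained.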
The QName class represents the qualified name as specified by the Namespaces in XML Recommendation (see Resources) and, as mentioned in Part 1, this class was originally defined in the Java API for XML-Based RPC (JAX-RPC) specification (see Resources). The QName stores values for the local part, the namespace URI, and the corresponding prefix if available. It is important to point out that the prefix value is ignored in the implementation of the equals() and hashCode() methods.
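The effect of ignoring the prefix in `equals()` and `hashCode()` can be seen with two QNames that differ only in prefix. The class name `QNameEquality` and the namespace URI are hypothetical sample values.

```java
import javax.xml.namespace.QName;

public class QNameEquality {
    public static void main(String[] args) {
        // Same namespace URI and local part, different prefixes (sample values)
        QName a = new QName("http://www.example.com/PO", "item", "po");
        QName b = new QName("http://www.example.com/PO", "item", "p");

        System.out.println(a.equals(b));                   // true: the prefix is ignored
        System.out.println(a.hashCode() == b.hashCode());  // true
        System.out.println(a.getPrefix() + " vs " + b.getPrefix()); // po vs p
        System.out.println(a);                             // {http://www.example.com/PO}item
    }
}
```

Because of this, QNames work reliably as keys in hash-based collections even when different documents choose different prefixes for the same namespace.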
The javax.xml package, also mentioned in Part 1, contains one class: XMLConstants. This class defines useful constants such as constant values for JAXP's recognized schema languages and various constant values related to namespaces.
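A typical use of the schema-language constants is identifying a language to the validation API from Part 1. The sketch below, with the hypothetical class name `SchemaLanguageConstants`, prints a few constants and creates a W3C XML Schema factory. Which other languages (for example `XMLConstants.RELAXNG_NS_URI`) are available depends on the implementation installed; requesting an unsupported one may throw an `IllegalArgumentException`.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

public class SchemaLanguageConstants {
    // Creates a validation-API factory for W3C XML Schema, identified by its XMLConstants URI
    public static SchemaFactory newXmlSchemaFactory() {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    }

    public static void main(String[] args) {
        // Schema language identifiers
        System.out.println(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        System.out.println(XMLConstants.RELAXNG_NS_URI);
        // Namespace-related constants
        System.out.println(XMLConstants.XML_NS_URI);
        System.out.println(XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
        System.out.println(newXmlSchemaFactory()
            .isSchemaLanguageSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI));
    }
}
```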
XSL transformation package changes
In this version of JAXP, the changes to the javax.xml.transform package mainly focus on fixing bugs and clarifying some parts of the API. The most significant change is that the JAXP 1.3 reference implementation switches its default transformation engine: in JAXP 1.2 it was the interpreting transformer (Xalan); in JAXP 1.3 it is the compiling transformer (XSLTC). XSLTC compiles a stylesheet into Java bytecode, called a translet, which is later used to perform XSL transformations. This approach can greatly improve XSLT performance, because a stylesheet needs to be parsed and compiled only once; the resulting translet can then be reused for each subsequent transformation.
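The compile-once benefit only materializes if the application keeps the compiled stylesheet around. In the standard TrAX API that is the job of `javax.xml.transform.Templates`. The sketch below assumes a hypothetical stylesheet and class name (`CompileOnce`, `greet`); which engine actually compiles the stylesheet depends on the `TransformerFactory` implementation found at run time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompileOnce {
    // Hypothetical stylesheet that prints a greeting for /person/name
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='/person/name'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Parsed and compiled once; Templates objects are thread-safe and reusable
    private static final Templates TEMPLATES = compile(XSL);

    private static Templates compile(String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            return factory.newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    // Each call gets a fresh, lightweight (not thread-safe) Transformer from the shared Templates
    public static String greet(String xml) {
        try {
            StringWriter out = new StringWriter();
            TEMPLATES.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("<person><name>Ada</name></person>"));
        System.out.println(greet("<person><name>Grace</name></person>"));
    }
}
```

Calling `TransformerFactory.newTransformer(Source)` for every document would instead recompile the stylesheet each time and forfeit most of the translet advantage.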
The XML Schema to Java types mapping
The XML Schema datatypes are widely accepted and used as a type system for many other specifications, such as Web Services Description Language (WSDL -- see Resources). Many XML applications written in the Java language need, or would benefit from, a Java type that represents an XML Schema datatype value. As a result, the last couple of years have seen several attempts to define a mapping between XML Schema datatypes (for example, xs:string) and Java types. Examples of such attempts include Castor, the open source XML data binding framework, and the Java Architecture for XML Binding (JAXB) 1.0 specification (see Resources). As you can see in Table 1, the mapping is straightforward for most types.
Table 1. XML Schema datatypes to Java types mapping (partial)
| XML Schema datatypes | Java types |
|---|---|
| xs:string | java.lang.String |
| xs:decimal | java.math.BigDecimal |
| xs:float | float |
| xs:short, xs:unsignedByte | short |
| xs:int | int |
| xs:boolean | boolean |
| xs:base64Binary, xs:hexBinary | byte[] |
| ... | ... |
However, some data types defined in the XML Schema Datatypes specification do not map one-to-one to any existing Java classes. In particular, the Java type system does not have a type that corresponds to the XML Schema xs:duration datatype, and does not have a class with a one-to-one correspondence to other XML Schema date/time types (for example, xs:gYear).
JAXP 1.3 completes the mapping by defining the missing types as a part of the Java platform. Table 2 shows the newly defined Java types and their mapping to the XML Schema datatypes.
Table 2. New JAXP 1.3 types and their mapping to the XML Schema datatypes
| Java types | XML Schema datatypes |
|---|---|
| javax.xml.datatype.XMLGregorianCalendar | xs:gDay, xs:gMonth, xs:gMonthDay, xs:gYear, xs:gYearMonth, xs:time, xs:dateTime, xs:date |
| javax.xml.datatype.Duration | xs:duration |
| javax.xml.namespace.QName | xs:QName and xs:NOTATION |
Note that the new datatypes defined by the XQuery 1.0 and XPath 2.0 specifications (xdt:dayTimeDuration and xdt:yearMonthDuration) also map to the Duration class.
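DatatypeFactory offers dedicated factory methods for these two XQuery/XPath 2.0 types, `newDurationDayTime` and `newDurationYearMonth`, which accept only the fields their type allows. A minimal sketch, with the hypothetical class name `XdtDurations` and a `factory()` helper that simply converts the checked configuration exception into an unchecked one:

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

public class XdtDurations {
    // DatatypeFactory.newInstance() reports a missing implementation with a checked exception
    public static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DatatypeFactory df = factory();
        // xdt:dayTimeDuration: day, hour, minute, and second fields only
        Duration dayTime = df.newDurationDayTime("P1DT2H");
        // xdt:yearMonthDuration: year and month fields only
        Duration yearMonth = df.newDurationYearMonth("P1Y2M");
        // Both results are ordinary Duration objects
        System.out.println(dayTime.getDays() + " day(s), " + dayTime.getHours() + " hour(s)");
        System.out.println(yearMonth.getYears() + " year(s), " + yearMonth.getMonths() + " month(s)");
    }
}
```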
The javax.xml.datatype overview
Like other JAXP packages, javax.xml.datatype defines a factory class, DatatypeFactory, that enables you to plug in different implementations. Similar to the DOM and SAX parser factories (DocumentBuilderFactory and SAXParserFactory), DatatypeFactory is an abstract class with a static newInstance() method that returns an instance of a concrete DatatypeFactory implementation. Using a DatatypeFactory instance, you can create Duration and XMLGregorianCalendar objects.
Duration objects are immutable and you can create them from a wide range of values, including:
- Lexical representations
- Values expressed in milliseconds (Java long)
- A sequence of values indicating positive or negative direction in time, years, months, days, hours, minutes, and seconds
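The three creation styles in the list above map onto overloads of `DatatypeFactory.newDuration`. The sketch below uses the hypothetical class name `CreateDurations`; the millisecond value 90,061,000 was chosen because it equals 1 day, 1 hour, 1 minute, and 1 second.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

public class CreateDurations {
    // Converts the checked configuration exception into an unchecked one
    public static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DatatypeFactory df = factory();

        // 1. From a lexical representation (xs:duration syntax)
        Duration lexical = df.newDuration("P1Y2M3DT4H5M6S");

        // 2. From milliseconds: 90,061,000 ms = 1 day, 1 hour, 1 minute, 1 second
        Duration fromMillis = df.newDuration(90061000L);

        // 3. From a direction plus field values: here, minus 1 day and 12 hours
        Duration fromFields = df.newDuration(false, 0, 0, 1, 12, 0, 0);

        System.out.println(lexical);
        System.out.println(fromMillis.getDays() + "d " + fromMillis.getHours() + "h "
            + fromMillis.getMinutes() + "m " + fromMillis.getSeconds() + "s");
        // getSign() reports the direction; field getters return magnitudes
        System.out.println(fromFields.getSign() + " " + fromFields.getDays() + "d "
            + fromFields.getHours() + "h");
    }
}
```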
The Duration class provides several methods, including methods that:
- Compare Duration objects (as defined by the XML Schema specification)
- Add or subtract two Durations
- Add a Duration to Java Calendar or Date objects (XMLGregorianCalendar objects provide their own add method for this purpose)
For more details, see the Java documentation for the Duration class.
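A short sketch of these operations, using the hypothetical class name `DurationOps`. Note that comparison follows the XML Schema partial order, so some pairs, such as one month versus 30 days, are neither longer nor shorter than each other and compare as `INDETERMINATE`.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

public class DurationOps {
    // Converts the checked configuration exception into an unchecked one
    public static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DatatypeFactory df = factory();
        Duration oneYear = df.newDuration("P1Y");
        Duration sixMonths = df.newDuration("P6M");

        // Durations are immutable: add() returns a new object
        Duration total = oneYear.add(sixMonths);
        System.out.println(total.getYears() + " year(s), " + total.getMonths() + " month(s)");

        // compare() returns LESSER, EQUAL, GREATER, or INDETERMINATE
        System.out.println(oneYear.compare(sixMonths) == DatatypeConstants.GREATER);
        // Month lengths vary, so P1M versus P30D has no definite order
        System.out.println(df.newDuration("P1M").compare(df.newDuration("P30D"))
            == DatatypeConstants.INDETERMINATE);

        // addTo() modifies the java.util.Calendar in place
        Calendar cal = new GregorianCalendar(2004, Calendar.DECEMBER, 1);
        oneYear.addTo(cal);
        System.out.println(cal.get(Calendar.YEAR));
    }
}
```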
XMLGregorianCalendar objects are mutable, and therefore you can simply set any of the date or time fields directly on the object. The class also has methods that allow for validation of XMLGregorianCalendar objects, conversion to instances of the Java GregorianCalendar class, and other actions.
The javax.xml.datatype package also defines the DatatypeConstants utility class that contains basic data type values as constants.
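The mutability, validation, and conversion features described above, together with a few DatatypeConstants values, can be sketched as follows. The class name `GregorianDemo` is hypothetical. The example also shows how a lexical xs:gYear value, which has no other Java counterpart, is represented.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class GregorianDemo {
    // Converts the checked configuration exception into an unchecked one
    public static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DatatypeFactory df = factory();

        // Parse an xs:date lexical value; the set fields determine the schema type
        XMLGregorianCalendar date = df.newXMLGregorianCalendar("2004-12-01");
        System.out.println(date.getXMLSchemaType().equals(DatatypeConstants.DATE));

        // Convert to java.util.GregorianCalendar for use with older APIs
        GregorianCalendar gc = date.toGregorianCalendar();
        System.out.println(gc.get(Calendar.YEAR) + "-" + (gc.get(Calendar.MONTH) + 1)
            + "-" + gc.get(Calendar.DAY_OF_MONTH));

        // Mutable: set fields directly, then check cross-field validity (Feb 30 is invalid)
        date.setMonth(DatatypeConstants.FEBRUARY);
        date.setDay(30);
        System.out.println(date.isValid());

        // An xs:gYear value is also an XMLGregorianCalendar
        XMLGregorianCalendar year = df.newXMLGregorianCalendar("2005");
        System.out.println(year.getXMLSchemaType().equals(DatatypeConstants.GYEAR));
    }
}
```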
Listing 1 shows how to create and work with Duration and XMLGregorianCalendar types. For simplicity, assume that you want to create an application that, given the purchase date of a product and the Duration of its warranty, computes the date on which the warranty expires.
Listing 1. Using the JAXP types
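import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

public class WarrantyCalculator {
    public static void main(String[] args) throws DatatypeConfigurationException {
        // Create a data type factory
        DatatypeFactory df = DatatypeFactory.newInstance();
        // Create a purchase date of a product (xs:date)
        XMLGregorianCalendar purchaseDate = df.newXMLGregorianCalendar();
        purchaseDate.setYear(2004);
        purchaseDate.setMonth(DatatypeConstants.DECEMBER);
        purchaseDate.setDay(1);
        // Print the purchase date
        System.out.println("Purchase date: " + purchaseDate.toXMLFormat());
        // Create a warranty duration (1 year)
        Duration warrantyDuration = df.newDuration("P1Y");
        // Now compute the warranty expiration date; add() modifies purchaseDate in place
        purchaseDate.add(warrantyDuration);
        // Print out the expiration date
        System.out.println("Expiration date: " + purchaseDate.toXMLFormat());
    }
}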
Listing 2 shows the output of the above application:
Listing 2. Output from Listing 1
Purchase date: 2004-12-01
Expiration date: 2005-12-01
XPath 1.0 is a W3C Recommendation that defines a language that provides the capability to extract portions of an XML document. You can extract portions that might be as large as collections of elements together with all their descendants, or as small as a single attribute value. To select some parts of a document, you specify a path between a starting node (called the context node) and the contents to be selected. You might specify a path that simply selects a particular child element of the context node, or one that selects all elements in the entire subtree rooted at the context node that match a complex expression (such as that they contain attributes with particular values and exactly two child elements).
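The difference between selecting a child of the context node and selecting matching elements anywhere in its subtree can be sketched with the JAXP 1.3 API introduced below. The sample document, the class name `PathSelection`, and its `count` helper are hypothetical; the last expression mirrors the example above by requiring a particular attribute value and exactly two child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PathSelection {
    // Hypothetical sample document
    private static final String XML =
        "<order>"
        + "<item sku='A1' type='gift'><name>Pen</name><qty>2</qty></item>"
        + "<item sku='B2' type='gift'><name>Ink</name></item>"
        + "<box><item sku='C3' type='gift'><name>Pad</name><qty>1</qty></item></box>"
        + "</order>";

    // Counts the nodes that expr selects, using <order> as the context node
    public static int count(String expr) {
        try {
            Element order = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML))).getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, order, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Child elements named item (A1 and B2)
        System.out.println(count("item"));
        // Every item anywhere in the subtree rooted at the context node
        System.out.println(count(".//item"));
        // Only items with a given attribute value and exactly two child elements
        System.out.println(count(".//item[@type='gift' and count(*) = 2]"));
    }
}
```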
Although the XPath 1.0 Recommendation has been around for a long time (in XML terms anyway; it celebrated its fifth birthday on November 16, 2004), JAXP 1.3 is what finally brings this functionality into the Java platform. Unlike previous XPath APIs, JAXP 1.3 is entirely vendor-neutral; it provides the same kind of factory mechanism to let the system find and create a compliant object that has always been used in the parser and transformer arenas. The JAXP 1.3 API is also agnostic about the underlying data model. In principle, you can use JAXP 1.3 with any data model that has a well-defined mapping to the simple data model defined by XPath 1.0 (so that XPath expressions can be applied to it in a well-defined way). The W3C Document Object Model (DOM) is the only data model that JAXP 1.3 implementations are required to support.
You can find all interfaces and abstract classes associated with the new XPath API in the javax.xml.xpath package. Unsurprisingly, the object used to create objects that can evaluate XPath expressions is called the XPathFactory. The objects it creates are simply called XPaths.
An XPathFactory is only expected to know how to create XPath objects for one particular type of data model. Therefore, you must specify the data model when creating an XPathFactory. As with the validation API, you do this by assigning URIs to data models. If no URI is specified, an XPathFactory for the DOM is produced. You can use the same XPath object on multiple DOM trees, though you should note that XPath objects are not thread-safe.
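Requesting a factory by data-model URI, and keeping XPath objects confined to one thread, might look like the sketch below. `XPathConstants.DOM_OBJECT_MODEL` is the standard URI for the DOM; the class name `XPathPerThread` and the use of a `ThreadLocal` are illustrative choices, not something JAXP prescribes.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

public class XPathPerThread {
    // Each thread lazily gets its own XPath, because XPath objects are not thread-safe
    private static final ThreadLocal<XPath> XPATH =
        ThreadLocal.withInitial(XPathPerThread::newDomXPath);

    private static XPath newDomXPath() {
        try {
            // Explicitly request the DOM data model by its URI
            return XPathFactory.newInstance(XPathConstants.DOM_OBJECT_MODEL).newXPath();
        } catch (XPathFactoryConfigurationException e) {
            throw new IllegalStateException("no XPathFactory for the DOM data model", e);
        }
    }

    // Returns the XPath object belonging to the calling thread
    public static XPath xpath() {
        return XPATH.get();
    }

    // The no-argument newInstance() uses the default data model, which is the DOM
    public static boolean defaultModelIsDom() {
        return XPathFactory.DEFAULT_OBJECT_MODEL_URI.equals(XPathConstants.DOM_OBJECT_MODEL);
    }

    public static void main(String[] args) {
        System.out.println(XPathConstants.DOM_OBJECT_MODEL);
        System.out.println(defaultModelIsDom());
    }
}
```

A simpler alternative is to create a new XPath object wherever one is needed; the `ThreadLocal` merely avoids repeating the factory lookup in every call.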
You can use XPath objects for two primary purposes:
- To evaluate XPath expressions, passed as simple String objects, given a particular node in an instance of the supported data model to act as the context node. In this mode they are said to interpret the XPath expression, since the String is being applied directly to the data model.
- To convert an XPath expression from a String to an XPathExpression object, which can then be applied to any node in an instance of the supported data model. XPathExpression objects are created by passing a String representation of the XPath expression into the compile method on XPath. XPathExpression objects are compiled representations of the original XPath String; they represent internal, optimized representations of the XPath expression. Indeed, in many implementations the XPathExpression is made up entirely of Java bytecode -- and it's hard to get more optimized than that!
Whenever you can do something in two ways, consider the conditions under which you prefer one alternative. With XPath expressions, keep in mind that compiling an XPath expression into any optimized representation, including bytecode, involves quite a bit of effort; hence, if an expression is very simple or used infrequently, it's probably not worthwhile to compile it. But, for complex expressions, and especially for expressions that are frequently used in your application, compilation can offer dramatic performance increases.
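The two styles can be contrasted in a short sketch: a one-off query handed to `XPath.evaluate` as a String, and an expression compiled once and reused across several documents. The class name `InterpretVsCompile`, its methods, and the sample order documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InterpretVsCompile {
    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Simple, one-off query: let the XPath object interpret the String directly
    public static String firstItemName(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/order/item[1]/name", parse(xml));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Query reused across many documents: compile once, evaluate repeatedly
    public static double[] totalQuantities(String... xmlDocs) {
        try {
            XPathExpression sumQty =
                XPathFactory.newInstance().newXPath().compile("sum(/order/item/qty)");
            double[] totals = new double[xmlDocs.length];
            for (int i = 0; i < xmlDocs.length; i++) {
                totals[i] = (Double) sumQty.evaluate(parse(xmlDocs[i]), XPathConstants.NUMBER);
            }
            return totals;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstItemName("<order><item><name>Pen</name></item></order>"));
        double[] totals = totalQuantities(
            "<order><item><qty>2</qty></item><item><qty>3</qty></item></order>",
            "<order><item><qty>4</qty></item></order>");
        System.out.println(totals[0] + " " + totals[1]);
    }
}
```

Whether compilation pays off for a given expression depends on the implementation, so measuring with realistic documents is the safest way to decide.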
Both XPath and XPathExpression offer four different methods for evaluating an XPath expression, provided as overloads of a method called evaluate. The signatures of the corresponding methods are identical, except that XPath's evaluate methods must take a String representing the XPath expression as their first parameter, whereas XPathExpression's methods dispense with this parameter entirely, since each XPathExpression already embodies a particular expression. For simplicity, this article only covers the evaluate methods of XPathExpression.
The form of evaluate that you will use most often is likely to be the method that takes an object representing the context node and a QName indicating the expression's expected return type. The return type can be any one of the four basic XPath 1.0 data types: boolean, string, node set, and number. The data model in use determines how these XPath data types are represented in Java code; for DOM, they're defined to be Boolean, String, org.w3c.dom.NodeList and Double respectively. One must therefore take into account both the expected return type and the data model when deciding how to cast the object that this method returns. The object representing the context node also needs to be appropriate for the data model; for DOM, you can use any type of org.w3c.dom.Node. The XPathConstants class contained in this package defines QNames for each of the four XPath data types.
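For the DOM data model, the four return-type QNames and the matching casts can be exercised in one place. The class name `ReturnTypes`, its `summary` method, and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReturnTypes {
    // Hypothetical sample document
    private static final String XML =
        "<order><item><qty>2</qty></item><item><qty>3</qty></item></order>";

    public static String summary() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // DOM mapping: BOOLEAN -> Boolean, STRING -> String, NODESET -> NodeList, NUMBER -> Double
            Boolean hasItems = (Boolean) xpath.compile("count(/order/item) > 0")
                .evaluate(doc, XPathConstants.BOOLEAN);
            String firstQty = (String) xpath.compile("/order/item[1]/qty")
                .evaluate(doc, XPathConstants.STRING);
            NodeList items = (NodeList) xpath.compile("/order/item")
                .evaluate(doc, XPathConstants.NODESET);
            Double total = (Double) xpath.compile("sum(/order/item/qty)")
                .evaluate(doc, XPathConstants.NUMBER);

            return hasItems + "," + firstQty + "," + items.getLength() + "," + total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```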
The other forms of evaluate are mere variations:
- evaluate(Object item): String -- shorthand for evaluate(item, XPathConstants.STRING); note that the return type is already known.
- evaluate(InputSource source, QName returnType): Object -- causes the org.xml.sax.InputSource to be parsed into an instance of the data model, the root node of which serves as the context node; otherwise, it is identical to the evaluate method described in the preceding paragraph.
- evaluate(InputSource source): String -- shorthand for evaluate(source, XPathConstants.STRING); it is symmetric with the version of evaluate described in the first bullet.
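The two InputSource variants let an expression run directly against unparsed XML. In the sketch below (hypothetical class `InputSourceEval` and sample purchase order), each call parses its own InputSource. Because a stream-backed InputSource can generally be read only once, an application that needs several expressions against the same document would normally parse it into a DOM tree once instead.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InputSourceEval {
    // evaluate(InputSource): parse the source, use its root as context, return a String
    public static String shipToName(String xml) {
        try {
            XPathExpression expr = XPathFactory.newInstance().newXPath()
                .compile("/purchaseOrder/shipTo/name");
            return expr.evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // evaluate(InputSource, QName): same parsing step, but the caller picks the return type
    public static int itemCount(String xml) {
        try {
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile("//item");
            NodeList items = (NodeList) expr.evaluate(
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return items.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String po = "<purchaseOrder><shipTo><name>Alice Smith</name></shipTo>"
            + "<items><item/><item/></items></purchaseOrder>";
        System.out.println(shipToName(po));
        System.out.println(itemCount(po));
    }
}
```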
The following example returns to the purchase order document used in Part 1. This time, you assume that you have to do some special processing if the purchase involves more than one item and you are shipping the goods to a different person than you are billing. Assuming that fileSource is an org.xml.sax.InputSource that points to a purchase order document, this code does the trick:
Listing 3. Using the XPath API
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PurchaseOrderCheck {
    public static void main(String[] args) {
        // The purchase order document (a file path or URI given on the command line)
        InputSource fileSource = new InputSource(args[0]);
        // XPath expressions
        String shipToNameExprStr = "string(purchaseOrder/shipTo/name)";
        String billToNameExprStr = "string(purchaseOrder/billTo/name)";
        String itemsExprStr = "purchaseOrder//item";
        try {
            // First, get a DocumentBuilder
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setValidating(false);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Get an XPathFactory that works on DOM trees
            XPathFactory xpf4dom = XPathFactory.newInstance();
            XPath xpath4dom = xpf4dom.newXPath();
            // Build compiled XPath expressions
            XPathExpression billToNameExpr = xpath4dom.compile(billToNameExprStr);
            XPathExpression shipToNameExpr = xpath4dom.compile(shipToNameExprStr);
            XPathExpression itemsExpr = xpath4dom.compile(itemsExprStr);
            // Parse the file and run XPaths against result
            Document doc = db.parse(fileSource);
            NodeList items = (NodeList) itemsExpr.evaluate(doc, XPathConstants.NODESET);
            String billToName = billToNameExpr.evaluate(doc);
            String shipToName = shipToNameExpr.evaluate(doc);
            // Do you have special processing to do?
            if (items.getLength() > 1
                    && !billToName.equalsIgnoreCase(shipToName)) {
                // special processing
            }
        } catch (Exception e) {
            // Uh-oh; something bad has happened...
            e.printStackTrace();
        }
    }
}
Note a few other details about the XPath API. Foremost among them is that XPath expressions can reference namespace-qualified element and attribute names only through namespace prefixes. To compile an XPath expression that uses namespaces, the application needs to register an implementation of the NamespaceContext interface (part of the javax.xml.namespace package) with the XPath object, using its setNamespaceContext method, since no context node is present whose namespace declarations the prefixes could refer to.
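A minimal sketch of registering a NamespaceContext before compiling a prefixed expression follows. The class name `NamespacedXPath`, the namespace URI, and the sample document are hypothetical. The prefix `po` used in the expression need not match anything in the document (which here uses a default namespace), and an unprefixed name test refers to elements in no namespace, so it matches nothing.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespacedXPath {
    // Hypothetical namespace URI for the purchase order vocabulary
    private static final String PO_NS = "http://www.example.com/PO";

    // Binds the single prefix "po" for use in XPath expressions
    private static final NamespaceContext PO_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "po".equals(prefix) ? PO_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String namespaceURI) {
            return PO_NS.equals(namespaceURI) ? "po" : null;
        }
        public Iterator<String> getPrefixes(String namespaceURI) {
            return PO_NS.equals(namespaceURI)
                ? Collections.singletonList("po").iterator()
                : Collections.<String>emptyList().iterator();
        }
    };

    public static String evaluate(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // DOM nodes must carry namespace URIs
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(PO_CONTEXT); // register before compiling prefixed expressions
            return xpath.compile(expr).evaluate(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String po = "<purchaseOrder xmlns='http://www.example.com/PO'>"
            + "<shipTo><name>Alice Smith</name></shipTo></purchaseOrder>";
        // Prefixed names match elements in the PO namespace
        System.out.println(evaluate(po, "/po:purchaseOrder/po:shipTo/po:name"));
        // Unprefixed names mean "no namespace", so this selects nothing
        System.out.println("[" + evaluate(po, "/purchaseOrder/shipTo/name") + "]");
    }
}
```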
The XPath 1.0 Recommendation provides some useful hooks for extensibility. XPath expressions can include:
- Variables: The XPath processor binds values to these identifiers when evaluating the expression.
- Functions: The XPath processor associates these function names with implementations, and invokes them, when evaluating the expression.
JAXP 1.3 supports these hooks by defining the XPathVariableResolver and XPathFunctionResolver interfaces. The application can implement either or both and register them with an XPathFactory or an XPath instance.
The XPathVariableResolver interface contains one method, resolveVariable. This method takes a QName that identifies the variable and returns an object appropriate to the underlying data model. XPathFunctionResolver contains an analogous method called resolveFunction, which takes an int specifying the number of arguments the function expects (its arity) as well as a QName that identifies the function. The resolveFunction method returns an XPathFunction object. An XPathFunction has an evaluate method that takes a List whose length must correspond to the int parameter of the resolveFunction callback that returned the XPathFunction; like resolveVariable, this evaluate method returns an object that is appropriate to the data model.
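Both resolvers are single-method interfaces, so they can be supplied as lambdas. The sketch below is illustrative only: the class name `ResolverDemo`, the namespace URI `urn:example:ext`, the function `ex:twice`, and the variable `$qty` are hypothetical. Extension functions must be namespace-qualified, so a NamespaceContext is registered as well. It relies on the DOM model's usual conversions (an XPath number arrives in the argument List as a Double, and a returned Double is treated as a number); implementations running with secure processing enabled may refuse extension functions altogether.

```java
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.Document;

public class ResolverDemo {
    // Hypothetical namespace URI for application-defined functions
    private static final String EXT_NS = "urn:example:ext";

    // Binds the prefix "ex" so that expressions can name ex:twice
    private static final NamespaceContext EXT_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "ex".equals(prefix) ? EXT_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String namespaceURI) {
            return EXT_NS.equals(namespaceURI) ? "ex" : null;
        }
        public Iterator<String> getPrefixes(String namespaceURI) {
            return EXT_NS.equals(namespaceURI)
                ? Collections.singletonList("ex").iterator()
                : Collections.<String>emptyList().iterator();
        }
    };

    // Evaluates expr as a number, with $qty bound to the given value
    public static double evaluate(String expr, double qty) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(EXT_CONTEXT);

            // Variable resolver: $qty maps to a Double, i.e. an XPath number in the DOM model
            XPathVariableResolver variables =
                name -> "qty".equals(name.getLocalPart()) ? Double.valueOf(qty) : null;
            xpath.setXPathVariableResolver(variables);

            // Function resolver: ex:twice with arity 1 doubles its numeric argument
            XPathFunctionResolver functions = (name, arity) -> {
                if (EXT_NS.equals(name.getNamespaceURI())
                        && "twice".equals(name.getLocalPart()) && arity == 1) {
                    XPathFunction twice = args -> ((Number) args.get(0)).doubleValue() * 2;
                    return twice;
                }
                return null; // unknown function: evaluation will fail
            };
            xpath.setXPathFunctionResolver(functions);

            // An empty DOM document serves as the context node
            Document context = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            return (Double) xpath.evaluate(expr, context, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("ex:twice($qty)", 21));
        System.out.println(evaluate("$qty + 1", 1.5));
    }
}
```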
In this article, we described utilities that support XML Namespaces in JAXP 1.3, as well as the slight changes made to the javax.xml.transform package. We also discussed how this API completes native Java support for all W3C XML Schema datatypes. The article concluded by presenting the XPath capabilities contained in JAXP 1.3. Combined with the information found in the first article in this series, you should now have a clear picture of the many performance and usability enhancements offered by JAXP 1.3.
Resources
- Read Part 1 of this two-part series on JAXP 1.3, which provides a brief overview of the specification, gives details of the modifications to the javax.xml.parsers package, and describes a powerful schema caching and validation framework (developerWorks, November 2004).
- Find out more about the Java API for XML Processing (JAXP).
- Find all of the W3C specifications on the W3C Technical Reports page, including:
  - XML Path Language (XPath) Version 1.0
  - Namespaces in XML 1.1
  - XML Schema 1.1 Part 2: Datatypes
  - Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language
  - XQuery 1.0
- Take a closer look at the Namespaces in XML page on W3C, where you can learn more about qualified names.
- Read about Java API for XML-Based RPC (JAX-RPC).
- Get the latest copy of Castor from the Castor Web site.
- Learn about the DOM data model.
- Find out more about the Java Architecture for XML Binding (JAXB), the evolving standard for Java Platform data binding.
- Confused by all the XML standards out there? Uche Ogbuji's developerWorks article series on XML standards can help you sort through it all:
  - Part 1 -- The core standards
  - Part 2 -- XML processing standards
  - Part 3 -- The most important vocabularies
  - Part 4 -- Detailed cross-reference of the most important XML standards
- Find more related resources on the developerWorks XML and Java technology zones.
- Learn how you can become an IBM Certified Developer in XML and related technologies.
Neil Graham is the Manager of XML Parser Development at IBM. He is a committer on Apache's Xerces-Java and Xerces-C++ XML parsers, where he has worked on, among other things, the implementation of XML Schema, XML 1.1, and grammar caching. He was also one of IBM's representatives on the Expert Group that developed JAXP 1.3.
Elena Litani is a Software Developer working for IBM. She is one of the main contributors to the Eclipse Modeling Framework (EMF) project at Eclipse.org, which provides the reference implementation for Service Data Objects (SDO). Previously, Elena was one of the main contributors to the Apache Xerces2 project, working on Xerces2 XML Schema and DOM Level 3 implementations, as well as analyzing and improving performance of the parser. Elena has also represented IBM in the W3C DOM Working Group, and participated in the development of the DOM Level 3 specifications.