What's new in JAXP 1.3? Part 1

An overview of the technology, and a look at parsing API changes and a new validation API

Neil Graham (neilg@ca.ibm.com), Manager, XML Parser Development, IBM
Neil Graham is the Manager of XML Parser Development at IBM. He is a committer on Apache's Xerces-Java and Xerces-C++ XML parsers, where he has worked on, among other things, the implementation of XML Schema, XML 1.1, and grammar caching. He was also one of IBM's representatives on the Expert Group that developed JAXP 1.3.
Elena Litani (elitani@ca.ibm.com), Software Developer, IBM
Elena Litani is a Software Developer working for IBM. She is one of the main contributors to the Eclipse Modeling Framework (EMF) project at Eclipse.org, which provides the reference implementation for Service Data Objects (SDO). Previously, Elena was one of the main contributors to the Apache Xerces2 project, working on Xerces2 XML Schema and DOM Level 3 implementations, as well as analyzing and improving performance of the parser. Elena has also represented IBM in the W3C DOM Working Group, and participated in the development of the DOM Level 3 specifications.

Summary:  For a mature technology, the XML space is surprisingly active. Java™ API for XML Processing (JAXP) 1.3 was recently finalized, and is the conduit through which many of the newest open standards relating to XML will enter the J2SE platform. In this installment of a two-part article describing the JAXP 1.3 API, authors Neil Graham and Elena Litani provide a brief overview of the JAXP specification, give details of the modifications to the javax.xml.parsers package, and describe a powerful schema caching and validation framework.

Date:  09 Nov 2004
Level:  Intermediate


Originally christened the Java API for XML Parsing, JAXP 1.0 simply provided a vendor-neutral means by which an application could create a DOM Level 1 or a SAX 1.0 parser. With the advent of JAXP 1.1 in 2001, the "P" came to signify Processing rather than Parsing, and the API’s focus broadened to provide a standardized means for applications to interact with XSLT processors. JAXP 1.1 was made part of both the Java 2 Standard Edition (J2SE) 1.4 and the Java 2 Enterprise Edition (J2EE) 1.3. JAXP 1.2 emerged in 2002 as a minor revision of the specification, and added a standardized means of invoking W3C XML Schema validation in JAXP-compliant parsers.

JAXP 1.3, which will be part of J2SE 5 and J2EE 4, is the first major release of this API in over three years. In this pair of articles, we will explore each of the areas of new functionality added to JAXP in this new version.

JAXP 1.3 overview

The JAXP specification endorses and builds upon the following specifications (see Resources):

  • XML 1.0 (3rd Edition) and XML 1.1, W3C Recommendations
  • Namespaces in XML 1.0 (including the Errata) and Namespaces 1.1, W3C Recommendations
  • XML Schema (including the Errata), a W3C Recommendation
  • XSL Transformations (XSLT) Version 1.0, a W3C Recommendation
  • XML Path Language (XPath) Version 1.0 (including the Errata), a W3C Recommendation
  • XML Inclusions (XInclude) Version 1.0, a W3C Proposed Recommendation at the time of this writing
  • Simple API for XML (SAX) 2.0.2 (sax2r3) and SAX Extensions 1.1

All JAXP 1.3-compliant implementations must support the specifications listed above.

The JAXP API includes several Java packages, each providing a portion of JAXP’s functionality:

  • javax.xml: This is the root package. It contains only one class (XMLConstants) that defines useful constants.
  • javax.xml.parsers: This package has existed since JAXP 1.0. It defines a vendor-neutral API for parsing and validating XML documents using SAX or DOM.
  • javax.xml.transform: This package has existed since JAXP 1.1. It defines an API for XSL Transformations.
  • javax.xml.namespace: This is a new package added in JAXP 1.3. It defines the QName class and NamespaceContext interface that allow you to manipulate namespaces. These classes were originally defined in the Java API for XML-Based RPC (JAX-RPC) specification (see Resources).
  • javax.xml.datatype: This is a new package added in JAXP 1.3. It defines new Java types to complete a mapping between W3C XML Schema data types and Java types.
  • javax.xml.validation: This is a new package added in JAXP 1.3. It defines an API that allows applications to cache schemas (such as W3C XML Schemas) and use them for validation of XML documents.
  • javax.xml.xpath: This is a new package added in JAXP 1.3. It defines a data model- and implementation-independent API for applying XPath expressions to documents.

JAXP also includes the org.xml.sax package, which contains the SAX API, and the org.w3c.dom package, which contains the DOM Level 3 API (see Resources).
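As a small taste of the javax.xml and javax.xml.namespace packages, the following sketch builds a QName and prints a couple of the constants defined in XMLConstants. The namespace URI, prefix, and element name are illustrative values chosen for this example, and the class name QNameSketch is hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

class QNameSketch {
    // Builds a qualified name for an element in an illustrative purchase-order namespace.
    static QName purchaseOrderName() {
        return new QName("http://www.example.com/PO1", "purchaseOrder", "po");
    }

    public static void main(String[] args) {
        QName name = purchaseOrderName();

        // toString() uses the "{namespaceURI}localPart" form; the prefix is not included.
        String text = name.toString();
        System.out.println(text);

        // valueOf() parses that form back. equals() compares only the namespace URI and
        // the local part, so the round-tripped name (which has no prefix) is still equal.
        QName parsed = QName.valueOf(text);
        System.out.println(parsed.equals(name));

        // XMLConstants collects well-known namespace URIs used across the JAXP packages.
        System.out.println(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        System.out.println(XMLConstants.XML_NS_URI);
    }
}
```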


JAXP 1.3 and XML parsing

To give applications that depend on a specific version of JAXP the maximum amount of portability, every JAXP specification version since the first has been tied to specific versions of DOM and SAX, as well as of the underlying XML and XML Namespaces specifications. None of these specifications has stood still in the three years since JAXP’s last major revision (JAXP 1.1), so JAXP 1.3 moves up to the most recent version of each, allowing them to make their way into J2SE and J2EE.

Evolution of XML standards

The W3C finalized XML 1.0 3rd Edition, XML 1.1, and XML Namespaces 1.1 early in 2004. JAXP 1.3 requires conforming parsers to implement all three. While XML 1.0 3rd Edition contains mostly clarifications that only the most XML-savvy applications will notice, XML 1.1 should have a very positive impact on the XML world. It dramatically expands the set of characters that may be used in XML names, making XML forward compatible with the Unicode Standard; it aligns the XML and Unicode definitions of what marks the end of a line; and it permits character references to all ASCII characters except 0 (including all control characters). XML Namespaces 1.1 allows namespace prefixes to be undeclared within parts of a document and, of course, references XML 1.1. Find out more about these specifications in the developerWorks article "XML 1.1 and Namespaces 1.1 revealed."
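The practical difference shows up in what a parser accepts. The sketch below, which assumes a JAXP 1.3 (or later) implementation such as the one bundled with current JDKs, parses the same character reference (&#x1;) once in an XML 1.1 document and once in an XML 1.0 document; only the former is well-formed. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class Xml11Sketch {
    // Parses a string into a DOM, throwing SAXParseException on a fatal (well-formedness) error.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to System.err
        db.setErrorHandler(new DefaultHandler());
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Same body each time: a character reference to U+0001, which XML 1.0 forbids.
    static String withVersion(String version) {
        return "<?xml version='" + version + "'?><ctl>&#x1;</ctl>";
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(withVersion("1.1"));
        // DOM Level 3 exposes the version taken from the XML declaration
        System.out.println(doc.getXmlVersion());
        System.out.println((int) doc.getDocumentElement().getTextContent().charAt(0));

        try {
            parse(withVersion("1.0"));
        } catch (SAXParseException e) {
            System.out.println("XML 1.0 rejects the reference: " + e.getMessage());
        }
    }
}
```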

Another product of the W3C is XML Inclusions (XInclude) 1.0, currently a Proposed Recommendation. XInclude provides a means by which XML documents can include all or part of other XML documents and textual resources. Unlike XML entities, inclusion happens entirely outside the framework of Document Type Definitions (DTDs), so it is friendly to XML Schema validation. It is also designed with namespaces in mind. Authors of XML resources whose content is shared among many documents will find XInclude invaluable. JAXP 1.3 requires all conforming implementations to track this specification until it becomes a W3C Recommendation.
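The sketch below shows XInclude processing through JAXP, assuming the implementation recognizes the http://www.w3.org/2001/XInclude namespace (as the JDK's built-in parser does). It writes two hypothetical files, main.xml and part.xml, to a temporary directory and parses main.xml with and without XInclude awareness; all names here are illustrative.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeSketch {
    static final String XINCLUDE_NS = "http://www.w3.org/2001/XInclude";

    // Writes main.xml (which includes part.xml) into a fresh temp directory; returns main.xml.
    // The files are left in place for simplicity.
    static File writeSampleFiles() throws Exception {
        Path dir = Files.createTempDirectory("xinclude-sketch");
        Files.write(dir.resolve("part.xml"),
                "<part>shared text</part>".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("main.xml"),
                ("<doc xmlns:xi='" + XINCLUDE_NS + "'><xi:include href='part.xml'/></doc>")
                        .getBytes(StandardCharsets.UTF_8));
        return dir.resolve("main.xml").toFile();
    }

    static Document parse(File file, boolean xincludeAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);          // xi:include is recognized by its namespace
        dbf.setXIncludeAware(xincludeAware);  // new in JAXP 1.3
        return dbf.newDocumentBuilder().parse(file);
    }

    public static void main(String[] args) throws Exception {
        File main = writeSampleFiles();
        Document plain = parse(main, false);
        Document merged = parse(main, true);
        // Without XInclude the xi:include element stays in the tree;
        // with it, the element is replaced by the content of part.xml.
        System.out.println(plain.getElementsByTagNameNS(XINCLUDE_NS, "include").getLength());
        System.out.println(merged.getElementsByTagName("part").getLength());
    }
}
```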

In terms of the XML parsing APIs themselves, JAXP endorses SAX 2.0.2 and SAX Extensions 1.1, as well as DOM Level 3 Core and DOM Level 3 Load and Save. The DOM Level 3 specifications represent significant bodies of new functionality in their own right and so fall outside the scope of these articles. IBM developerWorks already has some excellent articles on DOM Level 3 Core (see Resources), which interested readers may wish to consult.

As the very minor change in version number implies, SAX 2.0.2 is not radically different from the SAX 2.0 that JAXP 1.1 endorsed. The intermediate release, SAX 2.0.1, changed a number of signatures (which prevented its endorsement by JAXP 1.2), for example by adding default constructors to the SAX-defined exception classes and by adding IOException to the throws clause of the EntityResolver#resolveEntity callback, but it was otherwise virtually identical to SAX 2.0. Among its new additions, SAX 2.0.2 defines:

  • A feature that allows the application to query the SAX parser as to whether it supports XML 1.1.
  • A feature that instructs the parser to intern XML names and namespace URIs (as String.intern() does). Because interned strings with equal contents are the same object, you can compare them with == instead of String.equals().
  • A feature that enables XML 1.1 normalization checking. Note that JAXP 1.3 does not require compliant parsers to support this feature.
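Here is a minimal sketch of querying and relying on these features through an XMLReader obtained from JAXP. The feature URIs are the standard SAX feature identifiers; the class name and the sample document are hypothetical, and the identity comparison is only safe because the code first checks that string interning is enabled.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxFeatureSketch {
    static final String XML_11 = "http://xml.org/sax/features/xml-1.1";
    static final String INTERNING = "http://xml.org/sax/features/string-interning";

    // Counts <item> elements, comparing names by identity rather than with equals().
    static int countItems(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);   // so that localName is reported
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Only use == on names if the parser promises to intern them
        if (!reader.getFeature(INTERNING)) {
            throw new IllegalStateException("parser does not intern names");
        }
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (localName == "item") {   // string literals are interned too
                    count[0]++;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Read-only feature: true if the parser can process XML 1.1 documents
        System.out.println("supports XML 1.1: " + reader.getFeature(XML_11));
        System.out.println(countItems("<order><item/><item/><note/></order>"));
    }
}
```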

SAX Extensions 1.1 is a significant improvement over SAX’s original extensions. Here are some of the additions:

  • The EntityResolver2 interface extends EntityResolver by providing callbacks for a DTD's external subset, and adds baseURI and the entity’s name to the resolveEntity method’s parameter list.
  • Attributes2 extends Attributes by providing information as to whether each attribute is declared in the DTD or whether an attribute value is defaulted by the DTD.
  • Locator2 extends Locator by adding getXMLVersion() and getEncoding(). This provides complete access to the pseudo-attributes on the XML declaration of the entity currently being processed.
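The sketch below assumes the parser passes Attributes2 and Locator2 implementations to the handler (the JDK's built-in parser does; the instanceof checks guard against one that does not). It uses DefaultHandler2 to report whether each attribute was specified in the document or defaulted from the DTD, along with the XML version from the declaration. The sample document and class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.Locator2;

class SaxExtensionsSketch {
    // "status" is declared with a default in the internal subset; "id" is not declared.
    static final String DOC =
            "<?xml version='1.0' encoding='UTF-8'?>"
          + "<!DOCTYPE item [<!ATTLIST item status CDATA 'open'>]>"
          + "<item id='42'/>";

    // Records the XML version and, per attribute, "name=value specified|defaulted".
    static List<String> describe(String xml) throws Exception {
        final List<String> out = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (locator instanceof Locator2) {
                    out.add("version=" + ((Locator2) locator).getXMLVersion());
                }
                if (atts instanceof Attributes2) {
                    Attributes2 a2 = (Attributes2) atts;
                    for (int i = 0; i < a2.getLength(); i++) {
                        out.add(a2.getQName(i) + "=" + a2.getValue(i)
                                + (a2.isSpecified(i) ? " specified" : " defaulted"));
                    }
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describe(DOC)) {
            System.out.println(line);
        }
    }
}
```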

Additions to javax.xml.parsers functionality

The changes that JAXP 1.3 makes to the parsing-related interfaces it defines directly are not earth-shattering. Possibly the most generally useful is the reset() method, which has been added to both DocumentBuilder and SAXParser to return these objects to their default state. Because creating parser objects through the JAXP factory mechanism is very expensive, applications often maintain a pool of SAXParsers and DocumentBuilders: a parser is taken from the pool when a parsing task arrives and returned to it, rather than destroyed, when the task completes. Being able to reset a parser to a known state means that the pool needs no knowledge of how the borrowing code used the parser, and the borrowing code needs no knowledge of how the parser was used before. This should make such pools both more efficient and easier to implement. To find out how you can implement a parser pool, read "Improve performance in your XML applications, Part 2."
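A parser pool along these lines might look like the following minimal sketch. DocumentBuilderPool is a hypothetical class, not part of JAXP; it relies only on DocumentBuilder.reset() to return each builder to the state newDocumentBuilder() would have produced.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical pool: reuses DocumentBuilder instances and resets each one on return,
// so a borrower never sees a previous user's error handler or entity resolver.
class DocumentBuilderPool {
    private final DocumentBuilderFactory factory;
    private final Deque<DocumentBuilder> idle = new ArrayDeque<>();

    DocumentBuilderPool(DocumentBuilderFactory factory) {
        this.factory = factory;
    }

    // synchronized: the factory itself is not thread-safe
    synchronized DocumentBuilder borrow() throws ParserConfigurationException {
        DocumentBuilder db = idle.poll();
        return (db != null) ? db : factory.newDocumentBuilder();
    }

    synchronized void release(DocumentBuilder db) {
        db.reset();   // back to the state newDocumentBuilder() would have produced
        idle.push(db);
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilderPool pool = new DocumentBuilderPool(dbf);

        DocumentBuilder db = pool.borrow();
        try {
            Document doc = db.parse(new InputSource(new StringReader("<po/>")));
            System.out.println(doc.getDocumentElement().getNodeName());
        } finally {
            pool.release(db);
        }
        System.out.println(pool.borrow() == db);   // the idle instance is handed out again
    }
}
```

Resetting in release() rather than in borrow() keeps idle builders from holding on to handlers supplied by their last user.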

You can connect the parsers with schemas (see the discussion of the javax.xml.validation package below) through the setSchema() methods, which have been added to SAXParserFactory and DocumentBuilderFactory. This permits the construction of parsers that are optimized for a particular schema (a javax.xml.validation.Schema object), which can yield considerable performance improvements over standard parser objects that have no built-in knowledge of the grammars against which they validate documents. Applications can also configure their parser factories to produce XInclude-aware parsers through the factories' new setXIncludeAware() method. Both parsers and factories can be queried as to whether they are XInclude-aware through the isXIncludeAware() method, and the Schema currently associated with them (if any) can be obtained with the getSchema() method.
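As a sketch of these factory methods working together, the code below sets a Schema and XInclude awareness on a DocumentBuilderFactory, then reads both settings back from the resulting parser and uses it to check two small documents. The inline schema, the STRICT error handler, and the class name are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaAwareFactorySketch {
    // A hypothetical one-element schema, inlined so the example needs no files.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='qty' type='xs:positiveInteger'/>"
          + "</xs:schema>";

    // Reports validation errors by throwing, so an invalid document stops the parse.
    static final ErrorHandler STRICT = new ErrorHandler() {
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    static DocumentBuilder newBuilder(Schema schema) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema);        // validate against the pre-parsed schema
        dbf.setXIncludeAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setErrorHandler(STRICT);
        return db;
    }

    static boolean isValid(DocumentBuilder db, String xml) throws Exception {
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        DocumentBuilder db = newBuilder(schema);
        // Both settings can be queried back from the parser itself
        System.out.println(db.getSchema() == schema);
        System.out.println(db.isXIncludeAware());
        System.out.println(isValid(db, "<qty>3</qty>"));
        System.out.println(isValid(db, "<qty>-3</qty>"));
    }
}
```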


Validation and Schema caching JAXP API

Many applications seek to validate XML documents against a schema, such as one defined according to the W3C XML Schema Recommendation. To validate a document, a validating processor needs to parse the schema document, build an internal in-memory representation of this schema, and then use this in-memory schema to validate an XML document. Hence, validation can entail a large performance cost if a validating processor needs to parse and build an in-memory representation of a schema before validating each XML document. Normally, an application has a limited set of schemas, and therefore wants the processor to build an in-memory representation of a given schema once and use it to validate documents.

So far, implementations have had to provide their own mechanisms for caching schemas. For example, the Apache Xerces-J parser defines its own grammar caching API (see Resources). Now JAXP 1.3 defines a standard API (the javax.xml.validation package) that lets an application re-use schemas and therefore improve overall performance.
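One way an application might take advantage of this is a small cache that compiles each schema once and hands out the shared Schema afterwards. SchemaCache below is a hypothetical helper, not a JAXP class; it assumes schemas are identified by system ID (URI) and relies on Schema objects being safe to share between threads.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical application-level cache: each schema is compiled once, and the
// immutable Schema object is shared by every later caller.
class SchemaCache {
    private final ConcurrentMap<String, Schema> schemas = new ConcurrentHashMap<>();

    Schema get(String systemId) {
        return schemas.computeIfAbsent(systemId, SchemaCache::load);
    }

    private static Schema load(String systemId) {
        try {
            // SchemaFactory is not thread-safe, so each load uses its own factory.
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(systemId));
        } catch (SAXException e) {
            throw new IllegalArgumentException("cannot compile schema " + systemId, e);
        }
    }

    // Writes an illustrative schema to a temp file and returns its URI.
    static String writeSampleSchema() throws Exception {
        Path xsd = Files.createTempFile("note", ".xsd");
        Files.write(xsd, ("<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='note' type='xs:string'/></xs:schema>")
                .getBytes(StandardCharsets.UTF_8));
        return xsd.toUri().toString();
    }

    public static void main(String[] args) throws Exception {
        String systemId = writeSampleSchema();
        SchemaCache cache = new SchemaCache();
        Schema first = cache.get(systemId);
        Schema second = cache.get(systemId);   // served from the cache, not re-parsed
        System.out.println(first == second);
    }
}
```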

Take a closer look at the validation API. To retrieve an in-memory representation of one or more schemas, you first get an instance of a schema factory (javax.xml.validation.SchemaFactory) for a particular schema language. A compliant JAXP implementation must support W3C XML Schema; supporting other languages, such as RELAX NG, is optional. You can configure the factory with features and properties, much as you would configure an XML parser, and then ask it to build an in-memory representation of a given schema (or schemas). That representation is the Schema class, which is immutable and therefore thread-safe. The API provides no means of querying the schema’s structure or properties.
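The factory-configuration steps just described can be sketched as follows. The choice of FEATURE_SECURE_PROCESSING, the error handler that rethrows, and the sample schemas are illustrative assumptions; isSchemaLanguageSupported() lets an application check for an optional language such as RELAX NG before relying on it.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaFactorySketch {
    static SchemaFactory newFactory() throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Configured much like a parser: features, properties, and an error handler
        sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        sf.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { System.err.println("warning: " + e.getMessage()); }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return sf;
    }

    static Schema compile(String xsd) throws SAXException {
        return newFactory().newSchema(new StreamSource(new StringReader(xsd)));
    }

    public static void main(String[] args) throws Exception {
        SchemaFactory sf = newFactory();
        // W3C XML Schema support is mandatory; RELAX NG support is optional
        System.out.println(sf.isSchemaLanguageSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println(sf.isSchemaLanguageSupported(XMLConstants.RELAXNG_NS_URI));

        // A schema that refers to an undeclared type fails to compile
        try {
            compile("<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                  + "<xs:element name='a' type='undeclaredType'/></xs:schema>");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```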

You can use the Schema class in a couple of ways:

  • You can construct parsers that are optimized to use an in-memory representation of a given Schema for validation (as mentioned earlier).
  • Using the Schema class, you can create validators that can validate different XML input sources (such as DOM or SAX) using the Schema.

First, we'll show you how to improve parsing performance by re-using an in-memory representation of a given schema. For simplicity, in the sample code in Listing 1, we use an XML document (po.xml) that describes a purchase order and the purchase order schema (po.xsd). Both the document and schema are defined by the W3C XML Schema Primer Recommendation (see Resources).

Start by constructing a schema factory and using it to build an in-memory representation of the purchase order schema. Then retrieve a DOM factory, set the purchase order Schema on it, and use it to create a DOM parser. This new parser validates XML documents only against the purchase order schema.


Listing 1. Re-using Schema to parse and validate XML documents
  import java.io.File;
  import javax.xml.XMLConstants;
  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.validation.Schema;
  import javax.xml.validation.SchemaFactory;
  import org.w3c.dom.Document;

  // myErrorHandler is an org.xml.sax.ErrorHandler supplied by your application

  // create a SchemaFactory that conforms to W3C XML Schema
  SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

  // set your error handler to catch errors during schema construction
  sf.setErrorHandler(myErrorHandler);

  // parse the purchase order schema
  // (newSchema accepts a File, URL, or Source, not a plain String)
  Schema schema = sf.newSchema(new File("po.xsd"));

  // get a DOM factory
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

  // configure the factory
  dbf.setNamespaceAware(true);

  // set schema on the factory
  dbf.setSchema(schema);

  // create a new parser that validates documents against
  // the schema specified (po.xsd)
  DocumentBuilder db = dbf.newDocumentBuilder();

  // attach an error handler to detect document validation errors
  db.setErrorHandler(myErrorHandler);

  // parse an XML document and validate it against po.xsd
  Document purchaseOrderDoc = db.parse("po.xml");

Now look at how you can use validators. You can create two types of validators from a given Schema:

  • A Validator can validate either a DOM or a SAX source, optionally producing a DOMResult or a SAXResult, respectively.
  • A ValidatorHandler validates a stream of SAX events. This validator acts as a SAX ContentHandler. If you set your own org.xml.sax.ContentHandler on the validator handler, the validator handler acts as a filter that validates incoming SAX events and forwards events to your ContentHandler. This validator also lets you retrieve type information for elements and attributes using the TypeInfoProvider interface (see the ValidatorHandler.getTypeInfoProvider() method).
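The sketch below places a ValidatorHandler between a namespace-aware SAX parser and an application ContentHandler, and uses the TypeInfoProvider to look up the schema type of each attribute while the startElement callback is running. The inline schema, the qty attribute, and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidatorHandlerSketch {
    // Hypothetical schema: <item> carries an xs:int "qty" attribute.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='item'><xs:complexType>"
          + "<xs:attribute name='qty' type='xs:int'/>"
          + "</xs:complexType></xs:element></xs:schema>";

    // Returns "attributeName:typeName" for every attribute seen downstream of the validator.
    static List<String> attributeTypes(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vh = schema.newValidatorHandler();
        final TypeInfoProvider types = vh.getTypeInfoProvider();
        final List<String> out = new ArrayList<>();

        // The application's own ContentHandler receives events after validation.
        vh.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // TypeInfoProvider may only be queried from inside this callback;
                // index i refers to the same Attributes object passed in here.
                for (int i = 0; i < atts.getLength(); i++) {
                    TypeInfo t = types.getAttributeTypeInfo(i);
                    out.add(atts.getLocalName(i) + ":" + (t == null ? null : t.getTypeName()));
                }
            }
        });

        // ValidatorHandler expects namespace-aware SAX events.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(vh);   // parser -> validator -> application handler
        reader.parse(new InputSource(new StringReader(xml)));
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(attributeTypes("<item qty='3'/>"));
    }
}
```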

Neither of these validators is thread-safe. Validators may modify the resulting data by augmenting the original data with additional information: for example, default attributes can appear in a DOM tree, or new SAX events can occur as a result of validation. You can set various features and properties to configure validators, register a resource resolver (org.w3c.dom.ls.LSResourceResolver) to help the validator resolve external resources, or attach an error handler (org.xml.sax.ErrorHandler). Note that if no error handler is attached, the default implementation throws a SAXParseException on any validation error.
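Because a Schema can be shared but a Validator cannot, a common pattern is to create one Validator per task. The sketch below assumes a small worker pool and an illustrative inline schema; it leaves the error handler unset, so, as noted above, the first validation error surfaces as an exception. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class PerThreadValidationSketch {
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='qty' type='xs:nonNegativeInteger'/></xs:schema>";

    // Validates each document on a worker pool. The Schema is shared between threads;
    // each task creates its own Validator because Validator instances are not thread-safe.
    static List<Boolean> validateAll(List<String> docs) throws Exception {
        final Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (final String doc : docs) {
                futures.add(pool.submit(() -> {
                    Validator v = schema.newValidator();
                    // No error handler is set, so the first validation error is thrown.
                    try {
                        v.validate(new StreamSource(new StringReader(doc)));
                        return true;
                    } catch (SAXException e) {
                        return false;
                    }
                }));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validateAll(Arrays.asList("<qty>7</qty>", "<qty>-1</qty>")));
    }
}
```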

Listing 2 shows how to use the Validator class to validate DOM documents. In this case, assume that your application needs to validate DOM documents against two schemas: po.xsd and ipo.xsd. Your application might have received a DOM document from another application, or modified an existing DOM document, and you want to make sure that the document is still valid according to po.xsd or ipo.xsd.


Listing 2. Using the Validator class to validate DOM documents
  import javax.xml.transform.Source;
  import javax.xml.transform.dom.DOMSource;
  import javax.xml.transform.stream.StreamSource;
  import javax.xml.validation.Validator;

  // continues Listing 1: sf, myErrorHandler, and purchaseOrderDoc are defined there

  // create JAXP transformation sources to specify
  // schema sources you want to use
  StreamSource po = new StreamSource("po.xsd");
  StreamSource ipo = new StreamSource("ipo.xsd");

  // build in-memory representation for po.xsd and ipo.xsd
  Schema schemas = sf.newSchema(new Source[]{po, ipo});

  // create a validator that will be able to validate
  // against po.xsd and ipo.xsd
  Validator validator = schemas.newValidator();

  // configure this validator
  validator.setErrorHandler(myErrorHandler);

  // specify a DOM tree that you want to validate
  DOMSource docSource = new DOMSource(purchaseOrderDoc);

  // validate the source (null: no augmented result is needed)
  validator.validate(docSource, null);


Conclusion

In this article, we provided a general overview of the JAXP API, including a description of revisions to the basic XML standards and modifications made to the parsing API. We have also gone into detail describing the new javax.xml.validation package and how it offers applications the means to improve XML parsing performance. Part 2 will cover the new data type support offered in JAXP 1.3, some of the general utilities it offers in terms of namespace support, changes to the javax.xml.transform package, and the new javax.xml.xpath package with its data-model and vendor-neutral XPath 1.0 API.


Resources
