Tell a parser where to find a schema

More useful document validation with JAXP 1.2

Most discussions on schemas center on the best possible vocabularies or on how to organize a schema efficiently (Russian dolls, Venetian blinds, salami slices, and so on). The dispute about the most appropriate schema language is also ongoing: is it DTDs, W3C XML Schema, or OASIS RELAX NG?

These are important considerations. Yet when you design an XML application, it's even more important to know what to do with the schema. This tip discusses new features in Java API for XML Processing (JAXP) 1.2 that give you more flexibility in validating documents against schemas.

Validating documents

Typically, an application validates XML documents against a list of known schemas as part of its error handling. Schemas describe the vocabulary: the names of elements, the attributes, and their datatypes (such as integer, string, and date). If a document validates against a schema, it conforms to a vocabulary that the application recognizes. Validating is useful -- after all, what's the point in processing documents if the application does not recognize the elements?

Yet validation is useful only if the application validates against a known schema, which, until JAXP 1.2, was easier said than done. For instance, how do you associate a document with its schema in a portable way? In most cases, through the xsi:schemaLocation attribute. The attribute takes pairs, each made of a namespace URI and the location of the associated schema (there's an xsi:noNamespaceSchemaLocation attribute for documents with no namespaces). In Listing 1, the schemaLocation attribute associates the http://ananas.org/2003/tips/validate namespace with the file simple.xsd.

Listing 1. XML document with an xsi:schemaLocation attribute
<?xml version="1.0"?>
<simple:Root
   xmlns:simple="http://ananas.org/2003/tips/validate"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://ananas.org/2003/tips/validate simple.xsd">
      Document content comes here.
</simple:Root>
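
To make the pair structure of the attribute concrete, here is a minimal sketch that splits an xsi:schemaLocation value into namespace/location pairs. The class SchemaLocationPairs and its parse method are hypothetical names chosen for illustration; they are not part of JAXP, and a real parser performs this step internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JAXP: it only illustrates the attribute format.
class SchemaLocationPairs {
    // Splits "ns1 loc1 ns2 loc2 ..." into an ordered namespace -> location map.
    static Map<String, String> parse(String attributeValue) {
        Map<String, String> pairs = new LinkedHashMap<>();
        String trimmed = attributeValue.trim();
        if (trimmed.isEmpty()) {
            return pairs;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException(
                "xsi:schemaLocation needs namespace/location pairs: " + attributeValue);
        }
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Uses the attribute value from Listing 1.
        System.out.println(parse("http://ananas.org/2003/tips/validate simple.xsd"));
    }
}
```

For the value in Listing 1, the map holds a single entry that associates the namespace with simple.xsd.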

The attribute is a simple solution for managing XML schemas, but it has one big flaw: It assumes that your application can control the xsi:schemaLocation attribute. Depending on your application, this may or may not be the case. Consider the following scenarios:

  • An XML editor, such as Corel XMetaL or XMLmind XML Editor, lets you change the xsi:schemaLocation attribute so it points to the appropriate file.
  • A Web publishing framework such as XM or Cocoon may expect that writers set the xsi:schemaLocation correctly, which may not be the case if several authors have contributed to the site.
  • An electronic commerce server that processes incoming XML documents needs to validate the documents, but may not be able to trust that the other party has set the schema correctly.

As these scenarios illustrate, the xsi:schemaLocation attribute works well in small-scale applications, but it becomes increasingly difficult to manage in more distributed environments. For example, the chances are small that the schema will be stored under the same name on different computers.

By definition, an application validates a document if there's a risk that the document is incorrect. The validation cannot be robust if it depends on the content of the document, such as xsi:schemaLocation. Why would the attribute be more correct than the rest of the document? Clearly another solution is needed, one that puts more control in the hands of the application.

Schema support in JAXP 1.2

The schema specification correctly recognizes the lack of robustness due to xsi:schemaLocation. According to the specification, xsi:schemaLocation is only a hint to the parser and the parser may use other means to decide which schema to apply. Unfortunately the specification does not say what those other means should be. JAXP 1.2, a maintenance release for JAXP, fills in the blanks by providing a standard mechanism on the Java platform.

Essentially, JAXP 1.2 defines two new properties (for SAX parsers) and two new attributes (for DOM parsers) that control schema validation. The first property (http://java.sun.com/xml/jaxp/properties/schemaLanguage) specifies the schema language to use. For the time being, the only acceptable value is http://www.w3.org/2001/XMLSchema (the W3C recommendation on XML Schema). Future releases may support other values for RELAX NG or other schema languages.

The second property (http://java.sun.com/xml/jaxp/properties/schemaSource) sets the location of the schema. This is the more interesting of the two. It accepts several kinds of values:

  • A string with the URI of the schema.
  • An InputStream object with the content of the schema.
  • An InputSource object pointing to the schema.
  • A File object pointing to the schema file.
  • An array of objects, each of one of the types above. The array is useful if your application accepts documents that can conform to different schemas.
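
As an illustration of the array form, the following sketch passes two in-memory schemas as InputStream objects in a single Object array. The class name SchemaSourceArray, the urn:example namespaces, and the Order and Invoice elements are all made up for this example, and the sketch assumes a JAXP 1.2-compliant parser such as the one bundled with current JDKs. Each schema in the array has its own target namespace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SchemaSourceArray {
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical one-element schema: <rootName> in the namespace holds an integer.
    static InputStream schema(String namespace, String rootName) {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='" + namespace + "'>"
            + "<xs:element name='" + rootName + "' type='xs:integer'/>"
            + "</xs:schema>";
        return new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8));
    }

    // Validates the document against both schemas and returns the error messages.
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.setProperty(SCHEMA_LANGUAGE, XML_SCHEMA);
            // Each array entry is one of the accepted types; here, two InputStreams.
            parser.setProperty(SCHEMA_SOURCE, new Object[] {
                schema("urn:example:orders", "Order"),
                schema("urn:example:invoices", "Invoice")
            });
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException x) {
                    errors.add(x.getMessage());
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException x) {
            throw new IllegalStateException(x);
        }
        return errors;
    }
}
```

A call such as SchemaSourceArray.validate("<o:Order xmlns:o='urn:example:orders'>42</o:Order>") should return an empty list, while non-numeric content in either element should produce at least one error message.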

A SAX example

Listing 2 demonstrates how to use the new properties in JAXP 1.2 to validate a document through a SAX parser. To use the SAX parser to validate a document:

  1. Create a SAXParserFactory object.
  2. Set the namespace-aware and validating properties to true.
  3. Obtain a SAXParser object.
  4. Set the properties for the schema language and schema source (this step is new in JAXP 1.2).
  5. Parse the document. The parser must have access to an ErrorHandler object.

Listing 2. ValidateSAX.java demonstrates JAXP 1.2
package org.ananas.tips;

import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class ValidateSAX
{
   // Property names and schema language URI defined by JAXP 1.2.
   public static final String SCHEMA_LANGUAGE =
      "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                              XML_SCHEMA =
      "http://www.w3.org/2001/XMLSchema",
                              SCHEMA_SOURCE =
      "http://java.sun.com/xml/jaxp/properties/schemaSource";

   public final static void main(String[] args)
      throws IOException, SAXException, ParserConfigurationException
   {
      if(args.length < 2)
      {
         System.err.println("usage is:");
         System.err.println("   java -jar tips.jar -validatesax "
                            + "input.xml schema.xsd");
         return;
      }

      File input = new File(args[0]),
           schema = new File(args[1]);
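      // Steps 1-3: create a namespace-aware, validating factory and obtain a parser.
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      factory.setValidating(true);
      SAXParser parser = factory.newSAXParser();
      try
      {
         // Step 4: select W3C XML Schema and point the parser at the schema file.
         parser.setProperty(SCHEMA_LANGUAGE,XML_SCHEMA);
         parser.setProperty(SCHEMA_SOURCE,schema);
      }
      catch(SAXNotRecognizedException x)
      {
         System.err.println("Your SAX parser is not JAXP 1.2 compliant.");
         // Stop here: without the properties, the parse would not use the schema.
         return;
      }
      // Step 5: parse; ErrorPrinter reports any validation errors.
      parser.parse(input,new ErrorPrinter());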
   }
}

To test Listing 2, you need a JAXP 1.2-compliant parser. Check the documentation for your favorite parser or download the most recent version of Apache Xerces (I used version 2.4.0 to prepare this tip). If your parser is not JAXP 1.2-compliant, it throws a SAXNotRecognizedException when you try to set the property. That's your cue to upgrade to the latest version of Xerces.
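
If you prefer to check for compliance before validating anything, a small probe along these lines may help. It is only a sketch: the class Jaxp12Probe and the method supportsSchemaProperties are hypothetical names, and the probe tests just the schemaLanguage property on the default SAX parser.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;

// Hypothetical probe: reports whether the default parser knows the JAXP 1.2 property.
class Jaxp12Probe {
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    static boolean supportsSchemaProperties() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            // A non-compliant parser rejects the property name here.
            parser.setProperty(SCHEMA_LANGUAGE, XML_SCHEMA);
            return true;
        } catch (SAXNotRecognizedException x) {
            return false;
        } catch (ParserConfigurationException | SAXException x) {
            // Any other failure is a configuration problem, not a compliance answer.
            throw new IllegalStateException(x);
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsSchemaProperties()
            ? "Parser accepts the JAXP 1.2 schema properties."
            : "Parser is not JAXP 1.2-compliant.");
    }
}
```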

Listing 2 only registers a DefaultHandler object that simply prints validation errors on the console, as shown in Listing 3. Your application could register a more interesting handler, such as one that does something with the document content.

Listing 3. ErrorPrinter.java
package org.ananas.tips;

import java.text.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class ErrorPrinter
   extends DefaultHandler
{
   private MessageFormat message =
      new MessageFormat("({0}: {1}, {2}): {3}");

   private void print(SAXParseException x)
   {
      String msg = message.format(new Object[]
                                  {
                                     x.getSystemId(),
                                     Integer.valueOf(x.getLineNumber()),
                                     Integer.valueOf(x.getColumnNumber()),
                                     x.getMessage()
                                  });
      System.out.println(msg);
   }

   public void warning(SAXParseException x)
   {
      print(x);
   }

   public void error(SAXParseException x)
   {
      print(x);
   }

   public void fatalError(SAXParseException x)
      throws SAXParseException
   {
      print(x);
      throw x;
   }
}
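
As one possible "more interesting" handler, the sketch below records problems in a list instead of printing them, so that the application can decide afterwards whether to accept the document. ErrorCollector, isValid, and getProblems are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical alternative to ErrorPrinter: records problems instead of printing them.
class ErrorCollector extends DefaultHandler {
    private final List<String> problems = new ArrayList<>();
    private int errorCount = 0;

    private void record(String severity, SAXParseException x) {
        problems.add(severity + " (" + x.getSystemId() + ": "
            + x.getLineNumber() + ", " + x.getColumnNumber() + "): "
            + x.getMessage());
    }

    @Override
    public void warning(SAXParseException x) {
        record("warning", x);
    }

    @Override
    public void error(SAXParseException x) {
        errorCount++;
        record("error", x);
    }

    @Override
    public void fatalError(SAXParseException x) throws SAXParseException {
        errorCount++;
        record("fatal", x);
        throw x; // parsing cannot continue after a fatal error
    }

    // True while no error or fatal error has been reported; warnings do not count.
    boolean isValid() {
        return errorCount == 0;
    }

    List<String> getProblems() {
        return Collections.unmodifiableList(problems);
    }
}
```

You would pass an instance to parser.parse() in place of ErrorPrinter and call isValid() once parsing returns.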

What about DOM?

JAXP 1.2 also defines schema support for the DOM parser, as shown in Listing 4. The procedure is very similar to that of a SAX parser, the only difference being that you set attributes on the factory object instead of properties on the parser object. The detailed procedure is:

  1. Create a DocumentBuilderFactory object.
  2. Set the factory's namespace-aware and validating properties to true.
  3. Set the attributes for the schema language and schema source. If your parser is not JAXP 1.2-compliant, it throws an IllegalArgumentException.
  4. Obtain a DocumentBuilder object (the parser).
  5. Register an ErrorHandler object with the parser.
  6. Parse the document.

This example only demonstrates validation. Your application could do more interesting things with the parse tree.

Listing 4. ValidateDOM.java
package org.ananas.tips;

import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class ValidateDOM
{
   // Attribute names and schema language URI defined by JAXP 1.2.
   public static final String SCHEMA_LANGUAGE =
      "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                              XML_SCHEMA =
      "http://www.w3.org/2001/XMLSchema",
                              SCHEMA_SOURCE =
      "http://java.sun.com/xml/jaxp/properties/schemaSource";

   public final static void main(String[] args)
      throws IOException, SAXException, ParserConfigurationException
   {
      if(args.length < 2)
      {
         System.err.println("usage is:");
         System.err.println("   java -jar tips.jar -validatedom "
                            + "input.xml schema.xsd");
         return;
      }

      File input = new File(args[0]),
           schema = new File(args[1]);
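      // Steps 1-2: create a namespace-aware, validating factory.
      DocumentBuilderFactory factory =
         DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      factory.setValidating(true);
      try
      {
         // Step 3: select W3C XML Schema and point the factory at the schema file.
         factory.setAttribute(SCHEMA_LANGUAGE,XML_SCHEMA);
         factory.setAttribute(SCHEMA_SOURCE,schema);
      }
      catch(IllegalArgumentException x)
      {
         System.err.println("Your DOM parser is not JAXP 1.2 compliant.");
         // Stop here: without the attributes, the parse would not use the schema.
         return;
      }
      // Steps 4-6: obtain the parser, register the error handler, and parse.
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.setErrorHandler(new ErrorPrinter());
      parser.parse(input);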
   }
}
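
To hint at what "more interesting things" might look like, the following sketch reads a value from the parse tree only after the document has passed validation. Everything specific here is an assumption made for the example: the class ValidatedDomReader, the urn:example:stock namespace, and the Quantity element. The error handler throws on the first validation error, so the tree is never read for an invalid document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidatedDomReader {
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical schema: a single <Quantity> element holding an xs:int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:stock'>"
        + "<xs:element name='Quantity' type='xs:int'/>"
        + "</xs:schema>";

    // Returns the quantity, or throws IllegalArgumentException if validation fails.
    static int readQuantity(String xml) {
        try {
            // A fresh factory per call, because the schema stream can be read only once.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            factory.setAttribute(SCHEMA_LANGUAGE, XML_SCHEMA);
            factory.setAttribute(SCHEMA_SOURCE,
                new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            DocumentBuilder parser = factory.newDocumentBuilder();
            parser.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException x) { }
                @Override
                public void error(SAXParseException x) throws SAXException { throw x; }
                @Override
                public void fatalError(SAXParseException x) throws SAXException { throw x; }
            });
            Document doc = parser.parse(new InputSource(new StringReader(xml)));
            // Safe to read: the schema guarantees the root element holds an int.
            return Integer.parseInt(doc.getDocumentElement().getTextContent().trim());
        } catch (SAXException x) {
            throw new IllegalArgumentException("Document rejected: " + x.getMessage(), x);
        } catch (ParserConfigurationException | IOException x) {
            throw new IllegalStateException(x);
        }
    }
}
```

Because validation runs first, the code that reads the tree does not need its own checks on the element name or content type.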

Towards more robust XML applications

When implementing robust validations with XML schemas, keep in mind that -- almost by definition -- when your application validates documents, it should not depend on those documents being correct. More specifically, it should not depend on documents having the appropriate xsi:schemaLocation attribute.
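
The sketch below tries to show this in practice. The document carries an xsi:schemaLocation hint that points at untrusted.xsd, a file that does not exist, while the application supplies its own schema through the schemaSource property. It assumes a Xerces-derived parser (such as the one bundled with the JDK), where the application-supplied schema is used for its namespace and the hint is effectively ignored. The class TrustedSchemaDemo, the file name untrusted.xsd, and the integer-only content model are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class TrustedSchemaDemo {
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String NAMESPACE = "http://ananas.org/2003/tips/validate";

    // Schema the application trusts (hypothetical): Root must hold an integer.
    static final String TRUSTED_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NAMESPACE + "'>"
        + "<xs:element name='Root' type='xs:integer'/>"
        + "</xs:schema>";

    // Builds a document whose schemaLocation hint names a nonexistent file.
    static String document(String content) {
        return "<simple:Root xmlns:simple='" + NAMESPACE + "'"
            + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:schemaLocation='" + NAMESPACE + " untrusted.xsd'>"
            + content + "</simple:Root>";
    }

    // Validates against the trusted schema and returns the error messages.
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.setProperty(SCHEMA_LANGUAGE, XML_SCHEMA);
            parser.setProperty(SCHEMA_SOURCE,
                new ByteArrayInputStream(TRUSTED_XSD.getBytes(StandardCharsets.UTF_8)));
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException x) {
                    errors.add(x.getMessage());
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException x) {
            throw new IllegalStateException(x);
        }
        return errors;
    }
}
```

Under that assumption, validate(document("42")) should report no errors even though untrusted.xsd is missing, and validate(document("not a number")) should report errors that come from the trusted schema.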



Related topics

  • Download the source code used in this article. You also need the latest version of Apache Xerces or another JAXP 1.2-compliant parser.
  • XMLmind XML Editor and Corel XMetaL are representative of XML editors with schema support. XM and Cocoon are representative of publishing solutions. When there are not too many writers, it is possible to depend on xsi:schemaLocation with these applications.
  • RELAX NG is an alternative to XML Schema.
  • In his document, XML Schemas: Best Practices, Roger Costello discusses different design techniques with exotic names such as Venetian blinds or Russian doll.

ArticleTitle=Tip: Tell a parser where to find a schema
publish-date=05222003