Tip: Validation and the SAX ErrorHandler interface

What to do to turn on validation and error handling in a SAX-based parser

In this tip, Brett McLaughlin explores SAX's validation capabilities and explains how to turn XML document validation on and off. He also covers the ErrorHandler interface, which enables you to receive notification of errors in your applications and act on that notification. Code samples demonstrate how to request validation and how to create and register an error handler in SAX.

Brett McLaughlin (brett@newinstance.com), Enhydra strategist, Lutris Technologies

Brett McLaughlin (brett@newinstance.com) works as Enhydra strategist at Lutris Technologies and specializes in distributed systems architecture. He is author of Java and XML (O'Reilly). He is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. Along with Jason Hunter, he founded the JDOM project, which provides a simple API for manipulating XML from Java applications. He is also an active developer on the Apache Cocoon project and the EJBoss EJB server as well as a co-founder of the Apache Turbine project.



01 June 2001

XML validation is the cornerstone of good document authoring. The key to giving meaning to an XML document -- and the crux of validation -- lies in the set of constraints that governs that document, and in ensuring that those constraints are followed. As an example, the element page takes on a different meaning when only one page element is allowed (as in representing a single page of content) than it does when many page elements are allowed (as in a lengthy novel with hundreds of pages). A DTD or an XML Schema plus a validating parser make a document usable across applications. Validating a document's constraints, and providing this meaning to one or more XML documents, can be achieved easily by using SAX, the Simple API for XML (see Resources).

In XML parsers, validation is usually turned off by default because many XML authors are not writing constraints; leaving it off helps to avoid lengthy processing in production environments. To turn on validation, you must request it explicitly. In this tip, I show you how to do that using the SAX API. Because SAX is event driven, you'll want to be notified of, and react to, any errors that occur during validation. You can do this by using the SAX ErrorHandler interface, and I'll show you how.

Setting SAX features

Setting a SAX feature is the key to validation in SAX. This is done through the SAX 2.0 method setFeature(). This method takes as arguments a URI that describes the feature to set and the Boolean value (either true or false). In Resources I refer you to an online list of SAX-defined URIs. The feature that you and I are interested in is listed on that page. Its String constant is http://xml.org/sax/features/validation and, as I mentioned earlier, it is usually turned off by default in parsers. To request validation in an XML parser, you simply need to set the value of this feature to true, as shown in Listing 1.

Listing 1. Requesting validation
import java.io.IOException;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
public class ValidateXML {
    public static void main(String[] args) {
        try {
            // Create a new XML parser
            XMLReader reader = XMLReaderFactory.createXMLReader();
            // Request validation
            reader.setFeature("http://xml.org/sax/features/validation", true);
            // Parse the document named by the first command-line argument
            reader.parse(args[0]);
        } catch (IOException e) {
            // parse() also throws IOException if the document cannot be read
            System.out.println("I/O error: " + e.getMessage());
            e.printStackTrace();
        } catch (SAXException e) {
            System.out.println("Error: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
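
Because setFeature() can throw SAXNotRecognizedException or SAXNotSupportedException (both subclasses of SAXException) when a parser does not know or cannot honor a feature, it can be worth checking whether the request actually took effect. The following is a minimal sketch of that check, not part of the original listings; the class name ValidationFeatureCheck and the helper tryEnableValidation() are hypothetical names chosen for illustration.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

class ValidationFeatureCheck {
    // The SAX-defined URI for the validation feature
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Hypothetical helper: asks the parser to validate and reports
    // whether the feature is really on afterward
    static boolean tryEnableValidation(XMLReader reader) {
        try {
            reader.setFeature(VALIDATION, true);
            // Read the feature back rather than assuming the request stuck
            return reader.getFeature(VALIDATION);
        } catch (SAXNotRecognizedException e) {
            // The parser does not know this feature URI at all
            System.out.println("Feature not recognized: " + e.getMessage());
            return false;
        } catch (SAXNotSupportedException e) {
            // The parser knows the feature but cannot set it to this value
            System.out.println("Feature not supported: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws SAXException {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        // Show the parser's default before requesting validation
        System.out.println("Validating by default? " + reader.getFeature(VALIDATION));
        System.out.println("Validating after request? " + tryEnableValidation(reader));
    }
}
```

Returning a boolean lets the calling code decide what to do with a non-validating parser, for example refusing to continue or falling back to a different parser.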

Getting notification through ErrorHandler

After making the changes per Listing 1, the parser will perform validation on documents, but you might not hear about any problems it encounters because this code doesn't provide a means to report errors. When a validation error occurs (for example, a disallowed element is found), a SAX callback occurs. But if you don't write code to do something in that callback, nothing will get reported to your code or to the application client. To take care of that, implement the org.xml.sax.ErrorHandler interface. The interface has three methods, warning(), error(), and fatalError(), each of which receives a SAXParseException describing the problem; validation errors arrive through error(). Listing 2 adds an ErrorHandler implementation to the source shown in Listing 1 and registers that error handler with the parser.

Listing 2. Creating and registering an ErrorHandler
import java.io.IOException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
public class ValidateXML {
    public static void main(String[] args) {
        try {
            // Create a new XML parser
            XMLReader reader = XMLReaderFactory.createXMLReader();
            // Request validation
            reader.setFeature("http://xml.org/sax/features/validation", true);
            // Register the error handler
            reader.setErrorHandler(new MyErrorHandler());
            // Parse the document named by the first command-line argument
            reader.parse(args[0]);
        } catch (IOException e) {
            // parse() also throws IOException if the document cannot be read
            System.out.println("I/O error: " + e.getMessage());
            e.printStackTrace();
        } catch (SAXException e) {
            System.out.println("Error: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
class MyErrorHandler implements ErrorHandler {
    public void warning(SAXParseException exception) throws SAXException {
        // Bring things to a crashing halt
        System.out.println("**Parsing Warning**\n" +
                           "  Line:    " + exception.getLineNumber() + "\n" +
                           "  URI:     " + exception.getSystemId() + "\n" +
                           "  Message: " + exception.getMessage());
        throw new SAXException("Warning encountered");
    }
    public void error(SAXParseException exception) throws SAXException {
        // Bring things to a crashing halt
        System.out.println("**Parsing Error**\n" +
                           "  Line:    " + exception.getLineNumber() + "\n" +
                           "  URI:     " + exception.getSystemId() + "\n" +
                           "  Message: " + exception.getMessage());
        throw new SAXException("Error encountered");
    }
    public void fatalError(SAXParseException exception) throws SAXException {
        // Bring things to a crashing halt
        System.out.println("**Parsing Fatal Error**\n" +
                           "  Line:    " + exception.getLineNumber() + "\n" +
                           "  URI:     " + exception.getSystemId() + "\n" +
                           "  Message: " + exception.getMessage());
        throw new SAXException("Fatal Error encountered");
    }
}

This is a bit of an extremist's implementation of ErrorHandler as it brings things to a crashing halt when any problems arise. Instead of gracefully returning an error code to the parent application, I print the error to the screen and bail out of the code. You would probably want a more graceful solution in your production applications. But these three methods are very helpful in letting you know exactly what the problem is and where that problem occurred. And that's all there is to using ErrorHandler. Turn on the validation feature, register an error handler, and boom! You're validating XML with SAX.
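
As one possible shape for the more graceful solution mentioned above, the sketch below collects warnings and validation errors in a list instead of throwing, and stops only on fatal errors. It is an illustration under stated assumptions rather than a recommended implementation: the names CollectingErrorHandler, GracefulValidation, and validate() are hypothetical, and the inline book/page DTD is invented here to echo the single-page example from the introduction.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical handler: records problems instead of halting on each one
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> problems = new ArrayList<>();

    public void warning(SAXParseException e) {
        record("warning", e);
    }

    public void error(SAXParseException e) {
        // Validation errors land here; record them and let parsing continue
        record("error", e);
    }

    public void fatalError(SAXParseException e) throws SAXException {
        // The document is not well formed, so stop after recording
        record("fatal", e);
        throw e;
    }

    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    List<String> getProblems() {
        return problems;
    }
}

class GracefulValidation {
    // Hypothetical helper: validates an XML string and returns every
    // problem reported, leaving the decision about what to do to the caller
    static List<String> validate(String xml) throws IOException, SAXException {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setFeature("http://xml.org/sax/features/validation", true);
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // A fatal error ended the parse; the handler has already recorded it
        }
        return handler.getProblems();
    }

    public static void main(String[] args) throws IOException, SAXException {
        // Assumed DTD for illustration: a book holds exactly one page
        String dtd = "<!DOCTYPE book [<!ELEMENT book (page)><!ELEMENT page (#PCDATA)>]>";
        String twoPages = dtd + "<book><page>one</page><page>two</page></book>";
        for (String problem : validate(twoPages)) {
            System.out.println(problem);
        }
    }
}
```

The caller receives the full list of problems and can log them, show them to a user, or reject the document, rather than having the handler make that choice by throwing on the first warning.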

Resources
