Tip: Get the most from ContentHandlers

A detailed look at the SAX ContentHandler callback methods

Brett McLaughlin (brett@oreilly.com), Author, O'Reilly and Associates
Brett McLaughlin has been working in computers since the Logo days (Remember the little triangle?). He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor of the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.

Summary:  This tip breaks down each method in the org.xml.sax.ContentHandler interface, explaining the purpose and usage of each callback, and its relationship to an XML parsing event. You will understand the arguments to each method, and the information passed from a SAX parser to its registered ContentHandler.


Date:  31 Jul 2003
Level:  Introductory
Also available in:   Japanese


The last tip ended with a homework assignment, so I want to begin this tip by getting right to the completion of that assignment. You might recall that I supplied you with a simple HelloHandler that printed out the name of a callback method each time the method was called. Listing 1 shows that code as a refresher.

The assignment, which was to modify this class so that it prints out the arguments supplied to each method, provided some useful insight into the SAX parsing and callback process. Listing 2 shows the simplest way to accomplish this task.


Listing 2. The InfoHandler class
                
import org.xml.sax.*;

// Prints the name of each callback along with its String arguments.
public class InfoHandler implements ContentHandler
{
    public void setDocumentLocator (Locator locator) {
        System.out.println("Hello from setDocumentLocator()!");
    }

    public void startDocument ()
        throws SAXException {

        System.out.println("Hello from startDocument()!");
    }

    public void endDocument ()
        throws SAXException {

        System.out.println("Hello from endDocument()!");
    }

    public void startPrefixMapping (String prefix, String uri)
        throws SAXException {

        System.out.println("Hello from startPrefixMapping(" +
            prefix + ", " + uri + ")!");
    }

    public void endPrefixMapping (String prefix)
        throws SAXException {

        System.out.println("Hello from endPrefixMapping(" +
            prefix + ")!");
    }

    // The Attributes argument is not printed here.
    public void startElement (String uri, String localName,
                              String qName, Attributes atts)
        throws SAXException {

        System.out.println("Hello from startElement(" + uri +
            ", " + localName + ", " + qName + ")!");
    }

    public void endElement (String uri, String localName,
                            String qName)
        throws SAXException {

        System.out.println("Hello from endElement(" +
            uri + ", " + localName + ", " + qName + ")!");
    }

    // Only the slice [start, start + length) of ch belongs to this event.
    public void characters (char[] ch, int start, int length)
        throws SAXException {

        System.out.println("Hello from characters(" +
            new String(ch, start, length) + ")!");
    }

    public void ignorableWhitespace (char[] ch, int start, int length)
        throws SAXException {

        System.out.println("Hello from ignorableWhitespace(" +
            new String(ch, start, length) + ")!");
    }

    public void processingInstruction (String target, String data)
        throws SAXException {

        System.out.println("Hello from processingInstruction(" +
            target + ", " + data + ")!");
    }

    public void skippedEntity (String name)
        throws SAXException {

        System.out.println("Hello from skippedEntity(" +
            name + ")!");
    }
}

The actual code that was added here is pretty uninteresting; I left out all non-String arguments in printing, such as the Attributes object passed to startElement(), and did some quick conversion of the characters passed into characters() and ignorableWhitespace() to make them easily printable. I also realize that I performed lots of string concatenation (which can be costly in performance-sensitive code); however, this is a tip on XML, not Java performance -- so overlook it for now!
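If you do want to see the Attributes object that Listing 2 skips, a minimal sketch along the following lines may help. The class name AttributePrinter and its describe() helper are hypothetical (they are not part of this tip's code), and the sketch uses the JAXP SAXParserFactory and DefaultHandler convenience classes rather than the TestParse class from the last tip.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original tip: reports each
// element together with the attributes passed to startElement().
class AttributePrinter extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        out.append("startElement(").append(qName).append(")");
        // Attributes is accessed by index, from 0 to getLength() - 1.
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(" ").append(atts.getQName(i))
               .append("=\"").append(atts.getValue(i)).append("\"");
        }
        out.append("\n");
    }

    // Parses an XML string and returns one report line per element.
    static String describe(String xml) throws Exception {
        AttributePrinter handler = new AttributePrinter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe("<root id=\"1\"><child/></root>"));
    }
}
```

SAX does not promise that attributes arrive in document order, so code that needs a particular attribute is usually better off asking for it by name with atts.getValue(qName) than relying on its position.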

Before I detail exactly what is going on here, it is useful to use the test class from the last tip to examine the output from using this new handler. I modified my version of the TestParse class to use InfoHandler instead of HelloHandler, and here's what I got as output:


Listing 3. Using the InfoHandler class
                
[aragorn:~/dev] bmclaugh% java 
    -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser TestParse
	
Hello from setDocumentLocator()!
Hello from startDocument()!
Hello from startElement(, root, root)!
Hello from characters(
  )!
Hello from startElement(, some-element, some-element)!
Hello from characters(Some content in the element)!
Hello from endElement(, some-element, some-element)!
Hello from characters(
  )!
Hello from startElement(, some-other-element, some-other-element)!
Hello from characters(
    )!
Hello from startElement(, child, child)!
Hello from characters(
      More content)!
Hello from characters(
    )!
Hello from endElement(, child, child)!
Hello from characters(
  )!
Hello from endElement(, some-other-element, some-other-element)!
Hello from characters(
)!
Hello from endElement(, root, root)!
Hello from endDocument()!

You should be starting to get an idea of how things work by now. However, you may have also noticed a lot of seemingly odd things in the output shown in Listing 3. First, you'll see that the characters() callback sometimes reports strings that contain nothing but whitespace: line breaks and indentation. This is an important point to grasp when working with SAX: Everything in your XML is reported. This means that every carriage return, line break, tab, space, and other piece of information in your XML document is captured in some fashion by SAX, and passed on to one of the SAX handlers (usually ContentHandler, although you'll see in future tips that some events are reported through other handlers).

In the case of this simple document you've been using, the whitespace between the start tag of one element (such as root) and the start tag of the next (such as some-element) is captured, seen as character data, and passed on to the characters() callback. The result is a string something like "[LF]  " where [LF] is a line feed and the spaces after it are the indentation of the next line (XML parsers normalize line endings to a single line feed before reporting them). This may seem odd at first, but it turns out to be very powerful -- you can see exactly what the document being parsed looks like, including any indenting!
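To make that whitespace visible, here is a small sketch that escapes line breaks and tabs before reporting them. The WhitespaceReporter class and its method names are hypothetical, invented for this illustration. Because a SAX parser is allowed to deliver one run of text in several characters() calls, the sketch buffers the text and only reports it when the next element boundary arrives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original tip: collects each run
// of character data with its whitespace escaped so it can be seen.
class WhitespaceReporter extends DefaultHandler {
    private final List<String> runs = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Records the buffered text, if any, as one escaped run.
    private void flush() {
        if (buffer.length() > 0) {
            runs.add(escape(buffer.toString()));
            buffer.setLength(0);
        }
    }

    // Replaces CR, LF, and tab with the visible sequences \r, \n, \t.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    // Parses an XML string and returns its escaped character runs.
    static List<String> report(String xml) throws Exception {
        WhitespaceReporter handler = new WhitespaceReporter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.runs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\r\n  <some-element>Some content</some-element>\r\n</root>";
        for (String run : report(xml)) {
            System.out.println("characters(\"" + run + "\")");
        }
    }
}
```

The input in main() uses Windows-style line endings on purpose: the runs that come back should contain only \n, which is the line-ending normalization at work.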

Another oddity is in the arguments to startElement() and endElement(), and in particular the qName, localName, and uri of an element. First, the qName, or qualified name, is the full name of the element, including any namespace prefix. So the qName of root is "root", and the qName of article:root is "article:root". Simple enough, right? The localName is the unprefixed name of the element. In the previous example, both elements have the same localName: "root". However, their namespace URIs are different. The first element has no namespace prefix, so it belongs to the default namespace if one is declared, and to no namespace at all otherwise. The second element is in the namespace attached to the prefix article. So while they share the same localName, they are not identical.
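The difference between the two root elements can be sketched in a few lines. NameReporter is a hypothetical class written for this illustration, and it assumes a JAXP parser with namespace awareness switched on through SAXParserFactory.setNamespaceAware(true). The parser bundled with the JDK supplies the qName even in that mode, although the SAX contract treats it as optional there.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original tip: records the three
// name-related arguments of every startElement() call.
class NameReporter extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        names.add("uri=" + uri + " localName=" + localName
            + " qName=" + qName);
    }

    // Parses an XML string with namespace processing enabled.
    static List<String> report(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // off by default in JAXP
        NameReporter handler = new NameReporter();
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        // A prefixed root containing an unprefixed element of the same
        // local name; no default namespace is declared.
        String xml = "<article:root xmlns:article=\"http://www.ibm.com/developer\">"
            + "<root/></article:root>";
        for (String line : report(xml)) {
            System.out.println(line);
        }
    }
}
```

Both elements report the localName root, but only the prefixed one carries a namespace URI; the unprefixed one reports an empty string because the document declares no default namespace.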

SAX 2.0 and above report all this namespace data, so you can accurately determine an element's localName and namespace. However, if you want to simply ignore namespaces, you can just work with the qName of the element. Of course, when an element has no namespace prefix (and no URI assigned to the default namespace), the arguments to startElement() and endElement() can look sort of funny -- you'll get lots of empty strings for the namespace URI, and the localName and qName will be identical. To get a better idea of how namespace processing works, examine the XML in Listing 4.


Listing 4. The namespace.xml document
                
<?xml version="1.0"?>

<article:root xmlns:article="http://www.ibm.com/developer">
  <article:some-element>Some content in the element</article:some-element>
  <article:some-other-element>
    <nested:child xmlns:nested="http://www.nested.com">
      More content
    </nested:child>
</article:some-other-element>
</article:root>

Run this document through your parser class, and see how it differs from the simpler output of the non-namespaced XML from the last tip. My output from parsing, using InfoHandler, is shown in Listing 5.

You'll notice the difference in data reported to startElement() and endElement(), as well as calls to startPrefixMapping() and endPrefixMapping(). These latter two methods handle the relationship of a prefix to a namespace URI, which is then used by the element methods to look up the URI for a given element.
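The ordering of these callbacks is easier to see in a trace. The sketch below uses a hypothetical PrefixTracker class (not part of this tip's code) and a shortened version of the document in Listing 4, again assuming a namespace-aware JAXP parser. SAX specifies that startPrefixMapping() fires before the startElement() of the element that declares the prefix, and endPrefixMapping() fires after the matching endElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original tip: traces prefix
// mapping events alongside element events.
class PrefixTracker extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        events.add("startPrefixMapping(" + prefix + ", " + uri + ")");
    }

    @Override
    public void endPrefixMapping(String prefix) {
        events.add("endPrefixMapping(" + prefix + ")");
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        events.add("startElement(" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(" + qName + ")");
    }

    // Parses an XML string with namespace processing enabled.
    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefix events need this
        PrefixTracker handler = new PrefixTracker();
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<article:root xmlns:article=\"http://www.ibm.com/developer\">"
            + "<nested:child xmlns:nested=\"http://www.nested.com\"/>"
            + "</article:root>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

Each mapping brackets the element that declares it, so the nested prefix goes out of scope before the article prefix does.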

This tip has added quite a bit of information to the SAX toolbox, and you should really be starting to feel comfortable with the ContentHandler interface. You'll deal with a few simple applications of this interface in the next tip, and then leave the workings of ContentHandler for a while to investigate other SAX handlers. In the short term, you should play around with various XML documents and see what you can discover. Also, try adding comments, processing instructions, and other XML constructs to your documents and see how InfoHandler reports them. I'll be back soon to look at how this affects your output.

