Tip: Use a SAX filter to manipulate data

Change the events output by a SAX stream

The streaming nature of the Simple API for XML (SAX) provides not only an opportunity to process large amounts of data in a short time, but also the ability to insert changes into the stream that implement business rules without affecting the underlying application. This tip explains how to create and use a SAX filter to control how data is processed.

Nicholas Chase (nicholas@nicholaschase.com), President, Chase and Chase, Inc.

Nicholas Chase has been involved in Web site development for companies such as Lucent Technologies, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. Nick has been a high school physics teacher, a low-level radioactive waste facility manager, an online science fiction magazine editor, a multimedia engineer, and an Oracle instructor. More recently, he was the Chief Technology Officer of Site Dynamics Interactive Communications in Clearwater, Florida, USA, and is the author of three books on Web development, including Java and XML From Scratch (Que) and the upcoming Primer Plus XML Programming (Sams). He loves to hear from readers and can be reached at nicholas@nicholaschase.com.



01 October 2002

Note: This tip uses JAXP. The classes are also part of the Java 2 SDK 1.4, so if you have this version installed, you don't need any additional software. The tip briefly covers the basics of SAX, but you should already understand the basics of both Java and XML.

This tip looks at an application that determines which employees to notify of a particular emergency situation, and then acts accordingly. (The actual contact is left as an exercise for the reader.) The source document in Listing 1 lists each employee's ID, department, shift, and contact status:

Listing 1. The source document
<?xml version="1.0"?>
<personnel>
  <employee empid="332" deptid="24" shift="night" 
         status="contact">
    Jenny Berman
  </employee>
  <employee empid="994" deptid="24" shift="day" 
         status="donotcontact">
    Andrew Fule
  </employee>
  <employee empid="948" deptid="3" shift="night" 
         status="contact">
    Anna Bangle
  </employee>
  <employee empid="1032" deptid="3" shift="day" 
         status="contact">
    David Baines
  </employee>
</personnel>

The basic application

A SAX application consists of two parts: a main application and one or more handler objects. The main application creates an XMLReader that parses the document and sends events such as startElement and endDocument to a content handler. Errors can go to a separate error handler object. The handler objects receive these events and act on them.

The main application can also act as either the content or the error handler (or both), but in Listing 2 they are three separate classes:

Listing 2. The main application
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.SAXException;
import org.xml.sax.InputSource;
import java.io.IOException;

public class MainSaxApp {

    public static void main (String[] args){

       try {

         // Crimson is the parser bundled with the Java 2 SDK 1.4. On other
         // JDKs, name an available parser class or call the no-argument
         // XMLReaderFactory.createXMLReader() to get the default parser.
         String parserClass = "org.apache.crimson.parser.XMLReaderImpl";
         XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

         // Route parse events and parse errors to separate handler objects
         reader.setContentHandler(new DataProcessor());
         reader.setErrorHandler(new ErrorProcessor());

         InputSource file = new InputSource("employees.xml");
         reader.parse(file);

       } catch (IOException ioe) {
         System.out.println("IO Exception: "+ioe.getMessage());
       } catch(SAXException se) {
         System.out.println("SAX Exception: "+se.getMessage());
       }

    }

}

By setting the content handler for the reader to be a DataProcessor object, the application tells the reader to send its events to that object. In Listing 3, the DataProcessor is simple, checking only for the name of the element and the status of employees before determining whether to contact them:

Listing 3. The content handler
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.Attributes;
       
public class DataProcessor extends DefaultHandler
{
   public void startElement (String namespaceUri, String localName,
                             String qualifiedName, Attributes attributes) {

       if(localName.equals("employee")){
           // Compare from the literal so that a missing status attribute
           // is skipped instead of causing a NullPointerException
           if("contact".equals(attributes.getValue("status"))){
              System.out.println("Contacting employee "+
                                 attributes.getValue("empid"));
              //Implement actual contact here
           }
       }
   }
}

The ErrorProcessor class is trivial, and is included in the source code for this tip. (See Download for the source code.)
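
The downloadable class is not reproduced in this tip, so the following is only a minimal sketch of what such an error handler might look like. It assumes the class implements org.xml.sax.ErrorHandler; the describe() helper and the message format are hypothetical and are not taken from the tip's source code.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in for the tip's ErrorProcessor; the real class may differ.
class ErrorProcessor implements ErrorHandler {

    // Hypothetical helper: formats the location and message carried by the exception.
    static String describe(SAXParseException e) {
        return e.getSystemId() + ":" + e.getLineNumber() + ":"
                + e.getColumnNumber() + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        System.out.println("Warning at " + describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors (for example, validity errors): report and continue.
        System.out.println("Error at " + describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness errors: report, then rethrow so that parsing stops.
        System.out.println("Fatal error at " + describe(e));
        throw e;
    }
}
```

Because the sketch keeps the name ErrorProcessor, it can be registered exactly as in Listing 2, with reader.setErrorHandler(new ErrorProcessor()).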

When the application runs, the output includes all of the employees with a status attribute of contact, no matter which department they work in:

Contacting employee 332  
Contacting employee 948  
Contacting employee 1032

Filtering the data

So far, the application contacts every employee whose status is contact, regardless of department, and it works well (or at least, we can hope so!). When you receive a new requirement to contact only employees in a particular department, you have two options:

  • Change the content handler and risk introducing new bugs.
  • Change the data that reaches the content handler so that only the appropriate employees appear with a status of contact.

Because other requirements are also likely to be added later, it makes more sense to leave the content handler alone and implement each new rule separately.

A SAX filter sits between a parser and a content handler. It receives events from the parser and, unless instructed otherwise, passes them on to the content handler unchanged. For example, consider this filter in Listing 4:

Listing 4. A simple XML filter
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
       
public class DataFilter extends XMLFilterImpl
{
       
}

The XMLFilterImpl class includes methods that simply pass the data on unchanged. Inserting the filter into the stream within the main application is all that's necessary (see Listing 5):

Listing 5. Inserting the filter into the main application
...
         XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

         DataFilter filter = new DataFilter();
         filter.setParent(reader);

         filter.setContentHandler(new DataProcessor());
         filter.setErrorHandler(new ErrorProcessor());

         filter.parse("employees.xml");

       } catch (IOException ioe) {
...

The application creates the XMLReader as usual, but now it is the filter that initiates the parse. The filter's parse() method hands the file to its parent (the XMLReader assigned with setParent()) and receives the resulting events from it. The filter then passes the events on to its content handler, which is the same DataProcessor object used in the original version.

So far, the filter just passes the events on unchanged, so running the application still produces this:

Contacting employee 332
Contacting employee 948
Contacting employee 1032
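
One way to see the pass-through behavior for yourself is to count events on both sides of a filter. The sketch below is illustrative only: CountingFilter, CountingHandler, and countEvents() are hypothetical names, the parser comes from JAXP's SAXParserFactory rather than Crimson, and the filter receives its parent through the XMLFilterImpl(XMLReader) constructor instead of setParent().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class PassThroughDemo {

    // Hypothetical filter: counts startElement events, then forwards them untouched.
    static class CountingFilter extends XMLFilterImpl {
        int seen = 0;

        CountingFilter(XMLReader parent) {
            super(parent); // this constructor sets the parent, like setParent(parent)
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            seen++;
            super.startElement(uri, localName, qName, attributes);
        }
    }

    // Hypothetical handler: counts the startElement events that reach it.
    static class CountingHandler extends DefaultHandler {
        int received = 0;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            received++;
        }
    }

    // Returns { events seen by the filter, events received by the handler }.
    static int[] countEvents(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        CountingFilter filter = new CountingFilter(reader);
        CountingHandler handler = new CountingHandler();
        filter.setContentHandler(handler);
        filter.parse(new InputSource(new StringReader(xml)));
        return new int[] { filter.seen, handler.received };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countEvents("<personnel><employee/><employee/></personnel>");
        System.out.println("Filter saw " + counts[0]
                + " start tags; handler received " + counts[1]);
    }
}
```

Both counts should match, because every event the filter receives is forwarded to the handler by the inherited XMLFilterImpl method.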

With the filter in place, however, you can easily make changes without touching the main application. For example, in Listing 6, the filter can eliminate all employees that are not in department 24 by simply setting everyone else's status to donotcontact:

Listing 6. Filtering data
...
import org.xml.sax.helpers.AttributesImpl;

public class DataFilter extends XMLFilterImpl
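{

 public void startElement (String namespaceUri, String localName,
                        String qualifiedName, Attributes attributes)
                         throws SAXException
 {

   // Work on a mutable copy; the Attributes interface itself is read-only
   AttributesImpl attributesImpl = new AttributesImpl(attributes);
   if (localName.equals("employee")){
    if (!"24".equals(attributes.getValue("deptid"))){
      // Look up status by name; attribute order is not guaranteed
      int statusIndex = attributesImpl.getIndex("status");
      if (statusIndex >= 0){
        attributesImpl.setValue(statusIndex, "donotcontact");
      }
    }
   }
   super.startElement(namespaceUri, localName, qualifiedName, attributesImpl);
 }

}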

In this case, you're overriding the startElement() method defined in XMLFilterImpl. It still passes on the event, but if the employee is not in department 24, the filter passes it on with an altered Attributes object that lists the employee as do not contact.

The DataProcessor object has no idea that the data has been manipulated. It simply knows that some employees should be contacted and others shouldn't. Processing now produces a different result:

Contacting employee 332
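
To try the whole pipeline without employees.xml or the Crimson parser, the listings can be condensed into a single file. This is a sketch under several assumptions rather than the tip's own code: the parser is obtained through JAXP's SAXParserFactory, the Listing 1 data is inlined as a string (names omitted), a hypothetical Collector class stands in for DataProcessor so that the result can be inspected, and the filter looks up the status attribute by name rather than by position.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterDemo {

    // Inline copy of the Listing 1 data (employee names omitted for brevity).
    static final String SAMPLE =
        "<personnel>"
        + "<employee empid='332' deptid='24' shift='night' status='contact'/>"
        + "<employee empid='994' deptid='24' shift='day' status='donotcontact'/>"
        + "<employee empid='948' deptid='3' shift='night' status='contact'/>"
        + "<employee empid='1032' deptid='3' shift='day' status='contact'/>"
        + "</personnel>";

    // Same rule as Listing 6: anyone outside department 24 becomes "donotcontact".
    static class DataFilter extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            AttributesImpl copy = new AttributesImpl(attributes);
            if ("employee".equals(localName)
                    && !"24".equals(attributes.getValue("deptid"))) {
                int status = copy.getIndex("status");
                if (status >= 0) {
                    copy.setValue(status, "donotcontact");
                }
            }
            super.startElement(uri, localName, qName, copy);
        }
    }

    // Hypothetical stand-in for DataProcessor: records IDs instead of printing them.
    static class Collector extends DefaultHandler {
        final List<String> contacted = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            if ("employee".equals(localName)
                    && "contact".equals(attributes.getValue("status"))) {
                contacted.add(attributes.getValue("empid"));
            }
        }
    }

    // Parses the XML, optionally through the filter, and returns the IDs to contact.
    static List<String> contacted(String xml, boolean useFilter) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for localName to be populated
        XMLReader reader = factory.newSAXParser().getXMLReader();

        Collector collector = new Collector();
        if (useFilter) {
            DataFilter filter = new DataFilter();
            filter.setParent(reader);
            filter.setContentHandler(collector);
            filter.parse(new InputSource(new StringReader(xml)));
        } else {
            reader.setContentHandler(collector);
            reader.parse(new InputSource(new StringReader(xml)));
        }
        return collector.contacted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Without filter: " + contacted(SAMPLE, false));
        System.out.println("With filter: " + contacted(SAMPLE, true));
    }
}
```

Run against the sample data, the unfiltered parse reports employees 332, 948, and 1032, and the filtered parse reports only 332, which matches the output shown in this tip. Note that the handler code is identical in both cases.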

Next steps

This tip has demonstrated a simple way to alter the processing of a SAX application using an XML filter. In this case, the filter's behavior is hard-coded, but you can build an application that accommodates different situations by choosing the filter behavior at run time. You might accomplish this by replacing the DataFilter class, by passing a parameter at run time, or even by using a factory to create the filter object in the first place.
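
As one possible illustration of the parameter and factory ideas, the sketch below takes the department from the command line. Everything in it is an assumption made for the example: DepartmentFilter, createFilter(), and contacted() are hypothetical names, the sample data is inlined, and the parser comes from JAXP's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class RuntimeFilterDemo {

    static final String SAMPLE =
        "<personnel>"
        + "<employee empid='332' deptid='24' shift='night' status='contact'/>"
        + "<employee empid='994' deptid='24' shift='day' status='donotcontact'/>"
        + "<employee empid='948' deptid='3' shift='night' status='contact'/>"
        + "<employee empid='1032' deptid='3' shift='day' status='contact'/>"
        + "</personnel>";

    // Hypothetical variant of DataFilter: the department to keep is a parameter.
    static class DepartmentFilter extends XMLFilterImpl {
        private final String deptId;

        DepartmentFilter(String deptId) {
            this.deptId = deptId;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            AttributesImpl copy = new AttributesImpl(attributes);
            if ("employee".equals(localName)
                    && !deptId.equals(attributes.getValue("deptid"))) {
                int status = copy.getIndex("status");
                if (status >= 0) {
                    copy.setValue(status, "donotcontact");
                }
            }
            super.startElement(uri, localName, qName, copy);
        }
    }

    // Hypothetical factory: no department means a plain pass-through filter.
    static XMLFilterImpl createFilter(String deptId) {
        return (deptId == null) ? new XMLFilterImpl() : new DepartmentFilter(deptId);
    }

    static List<String> contacted(String xml, String deptId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> ids = new ArrayList<>();
        XMLFilterImpl filter = createFilter(deptId);
        filter.setParent(reader);
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if ("employee".equals(localName)
                        && "contact".equals(attributes.getValue("status"))) {
                    ids.add(attributes.getValue("empid"));
                }
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // The first command-line argument, if any, selects the department.
        String deptId = (args.length > 0) ? args[0] : null;
        System.out.println("Contacting: " + contacted(SAMPLE, deptId));
    }
}
```

The main application and the content handler stay the same whichever filter the factory returns; only the object placed between the parser and the handler changes.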

A SAX application can also chain filters together so that the output of one filter is used as the input for another, allowing for complex programming in modular chunks.
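
A chain might be assembled as in the following sketch, where each filter names the previous stage as its parent. The RequireAttribute class and the contacted() helper are hypothetical, the sample data is inlined, and the parser is obtained through JAXP's SAXParserFactory; treat it as one possible arrangement rather than a prescribed one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterChainDemo {

    static final String SAMPLE =
        "<personnel>"
        + "<employee empid='332' deptid='24' shift='night' status='contact'/>"
        + "<employee empid='994' deptid='24' shift='day' status='donotcontact'/>"
        + "<employee empid='948' deptid='3' shift='night' status='contact'/>"
        + "<employee empid='1032' deptid='3' shift='day' status='contact'/>"
        + "</personnel>";

    // Hypothetical reusable filter: employees whose attribute `name` does not
    // equal `required` are passed on with status "donotcontact".
    static class RequireAttribute extends XMLFilterImpl {
        private final String name;
        private final String required;

        RequireAttribute(XMLReader parent, String name, String required) {
            super(parent);
            this.name = name;
            this.required = required;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            AttributesImpl copy = new AttributesImpl(attributes);
            if ("employee".equals(localName)
                    && !required.equals(attributes.getValue(name))) {
                int status = copy.getIndex("status");
                if (status >= 0) {
                    copy.setValue(status, "donotcontact");
                }
            }
            super.startElement(uri, localName, qName, copy);
        }
    }

    static List<String> contacted(String xml, String deptId, String shift)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Event flow: parser -> department filter -> shift filter -> handler
        XMLFilterImpl byDept = new RequireAttribute(reader, "deptid", deptId);
        XMLFilterImpl byShift = new RequireAttribute(byDept, "shift", shift);

        final List<String> ids = new ArrayList<>();
        byShift.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if ("employee".equals(localName)
                        && "contact".equals(attributes.getValue("status"))) {
                    ids.add(attributes.getValue("empid"));
                }
            }
        });
        // Parsing starts at the last filter in the chain.
        byShift.parse(new InputSource(new StringReader(xml)));
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Dept 3, night shift: " + contacted(SAMPLE, "3", "night"));
    }
}
```

Each filter enforces one rule and knows nothing about the other, so rules can be added, removed, or reordered by changing only the lines that build the chain. With the sample data, requiring department 3 and the night shift leaves only employee 948 to contact.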


Download

Description    Name                                         Size
Code sample    x-tipsaxfilter/saxfiltertipsourcecode.zip    ---
