Tip

Use a SAX filter to manipulate data

Change the events output by a SAX stream


Note: This tip uses JAXP. The classes are also part of the Java 2 SDK 1.4, so if you have this version installed, you don't need any additional software. It briefly covers the basics of SAX, but you should already understand the basics of both Java and XML.

This tip looks at an application that determines which employees to notify of a particular emergency situation, and then acts accordingly. (The actual contact is left as an exercise for the reader.) The source document in Listing 1 simply lists employees, their department, and their status:

Listing 1. The source document
<?xml version="1.0"?>
<personnel>
  <employee empid="332" deptid="24" shift="night" 
         status="contact">
    Jenny Berman
  </employee>
  <employee empid="994" deptid="24" shift="day" 
         status="donotcontact">
    Andrew Fule
  </employee>
  <employee empid="948" deptid="3" shift="night" 
         status="contact">
    Anna Bangle
  </employee>
  <employee empid="1032" deptid="3" shift="day" 
         status="contact">
    David Baines
  </employee>
</personnel>

The basic application

A SAX application consists of two parts. The main application creates an XMLReader that actually parses the document, sending events such as startElement and endDocument to a content handler. You can send errors to a separate error handler object. The handler objects receive these events and act on them.
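To make the event stream concrete, here is a small sketch of a content handler that simply records each event it receives. It is an illustration only: EventTracer and trace() are hypothetical names, and the sketch obtains JAXP's default parser through SAXParserFactory rather than naming a specific parser class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the name of every event the reader sends it.
class EventTracer extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + localName);
    }

    // Parses an XML string and returns the events in the order the reader sent them.
    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // ensures localName is filled in
        XMLReader reader = factory.newSAXParser().getXMLReader();
        EventTracer tracer = new EventTracer();
        reader.setContentHandler(tracer);
        reader.parse(new InputSource(new StringReader(xml)));
        return tracer.events;
    }
}
```

For the input `<personnel><employee/></personnel>`, trace() returns startDocument, startElement personnel, startElement employee, endElement employee, endElement personnel, endDocument: the reader drives the handler, and the handler only reacts.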

The main application can also act as either the content or the error handler (or both), but in Listing 2 they are three separate classes:

Listing 2. The main application
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.SAXException;
import org.xml.sax.InputSource;
import java.io.IOException;

public class MainSaxApp {

    public static void main (String[] args) {

       try {
          // Crimson is the SAX parser bundled with the Java 2 SDK 1.4; with
          // another parser, change the class name or call createXMLReader()
          // with no argument to get the default XMLReader
          String parserClass = "org.apache.crimson.parser.XMLReaderImpl";
          XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

          // Content events go to DataProcessor, errors to ErrorProcessor
          reader.setContentHandler(new DataProcessor());
          reader.setErrorHandler(new ErrorProcessor());

          // The string is used as the document's system ID (a URI or file name)
          InputSource file = new InputSource("employees.xml");
          reader.parse(file);

       } catch (IOException ioe) {
          System.out.println("IO Exception: " + ioe.getMessage());
       } catch (SAXException se) {
          System.out.println("SAX Exception: " + se.getMessage());
       }

    }

}

By setting the content handler for the reader to be a DataProcessor object, the application tells the reader to send its events to that object. In Listing 3, the DataProcessor is simple, checking only for the name of the element and the status of employees before determining whether to contact them:

Listing 3. The content handler
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.Attributes;

public class DataProcessor extends DefaultHandler
{
   // Called by the reader once for every start tag in the document
   public void startElement (String namespaceUri, String localName,
                             String qualifiedName, Attributes attributes) {

       if (localName.equals("employee")) {
           // Compare against the constant so a missing status attribute
           // does not cause a NullPointerException
           if ("contact".equals(attributes.getValue("status"))) {
              System.out.println("Contacting employee " +
                                 attributes.getValue("empid"));
              //Implement actual contact here
           }
       }
   }
}

The ErrorProcessor class is trivial, and is included in the source code for this tip. (See Related topics to download the source code.)
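The downloadable class is not reproduced here, but a minimal ErrorHandler along the following lines would fill the same role. This is a guess at what a trivial ErrorProcessor might contain, not the tip's actual class; the describe() helper is hypothetical.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical ErrorProcessor: report recoverable problems, stop on fatal ones.
class ErrorProcessor implements ErrorHandler {

    // Builds a message such as "employees.xml line 3: ..."
    static String describe(SAXParseException e) {
        return e.getSystemId() + " line " + e.getLineNumber() + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        System.out.println("Warning: " + describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors (such as validity errors): report and keep parsing
        System.out.println("Error: " + describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness errors: rethrow so parse() ends with a SAXException
        System.out.println("Fatal error: " + describe(e));
        throw e;
    }
}
```

Rethrowing in fatalError() means the SAXException catch block in the main application reports the problem, while warnings and ordinary errors are only logged.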

When the application runs, the output includes all of the employees with a status attribute of contact, no matter which department they work in:

Contacting employee 332  
Contacting employee 948  
Contacting employee 1032

Filtering the data

So far, the application contacts every employee listed as on duty, regardless of department, and it works well (or at least, we hope so!). When you receive a new requirement to contact only employees in a particular department, you have two options:

  • Change the content handler and risk introducing all sorts of new bugs.
  • Change the data that comes to the content handler so that only the appropriate employees are seen as on duty.

Because other requirements are also likely to be added later, it makes more sense to implement each of them separately, outside the content handler, which can then stay exactly as it is.

A SAX filter sits between a parser and a content handler. It receives events from the parser and, unless instructed otherwise, passes them on to the content handler unchanged. For example, consider this filter in Listing 4:

Listing 4. A simple XML filter
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
       
public class DataFilter extends XMLFilterImpl
{
   // No overrides yet: XMLFilterImpl passes every event through unchanged
}

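The XMLFilterImpl class implements XMLFilter (itself an extension of XMLReader) along with ContentHandler, ErrorHandler, DTDHandler, and EntityResolver, and each of its event methods simply passes the data on unchanged. Inserting the filter into the stream within the main application is all that's necessary (see Listing 5):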

Listing 5. Inserting the filter into the main application
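...
          XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

          // The filter receives its events from its parent, the XMLReader
          DataFilter filter = new DataFilter();
          filter.setParent(reader);

          // Handlers are now registered on the filter, not on the reader
          filter.setContentHandler(new DataProcessor());
          filter.setErrorHandler(new ErrorProcessor());

          // Start the parse through the filter
          filter.parse("employees.xml");

       } catch (IOException ioe) {
...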

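The application creates the XMLReader as usual, but it's actually the filter that initiates the parse of the file. The call to setParent() makes the XMLReader the filter's parent, so when the filter's parse() method runs, it registers itself as the reader's handler and then receives the reader's events. (A filter can also receive its parent through XMLFilterImpl's XMLFilterImpl(XMLReader parent) constructor, by calling super(parent) from its own constructor.) The filter passes the events on to its content handler -- the same DataProcessor object used in the original version.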
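The pass-through behavior is easy to check in isolation. The following sketch is illustrative only: PassThroughDemo and ElementRecorder are hypothetical names, the parent is supplied through the XMLFilterImpl(XMLReader parent) constructor instead of setParent(), and the document is an in-memory string rather than employees.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class PassThroughDemo {

    // Hypothetical handler that records the local name of every start tag.
    static class ElementRecorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(localName);
        }
    }

    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is filled in
        return factory.newSAXParser().getXMLReader();
    }

    // Parses through a plain XMLFilterImpl, which forwards every event unchanged.
    static List<String> parseThroughFilter(String xml) throws Exception {
        XMLFilterImpl filter = new XMLFilterImpl(newReader()); // same effect as setParent()
        ElementRecorder recorder = new ElementRecorder();
        filter.setContentHandler(recorder);
        filter.parse(new InputSource(new StringReader(xml)));
        return recorder.names;
    }
}
```

The handler sees exactly the elements it would see if it were attached to the reader directly, which is why adding the empty DataFilter does not change the output.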

So far, the filter just passes the events on unchanged, so running the application still produces this:

Contacting employee 332
Contacting employee 948
Contacting employee 1032

With the filter in place, however, you can easily make changes without touching the main application or the content handler. For example, in Listing 6, the filter excludes every employee who is not in department 24 by setting that employee's status to donotcontact:

Listing 6. Filtering data
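...
import org.xml.sax.helpers.AttributesImpl;

public class DataFilter extends XMLFilterImpl
{

 public void startElement (String namespaceUri, String localName,
                           String qualifiedName, Attributes attributes)
                           throws SAXException
 {
   // Attributes is read-only, so work on an editable copy
   AttributesImpl attributesImpl = new AttributesImpl(attributes);

   if (localName.equals("employee")) {
     if (!"24".equals(attributes.getValue("deptid"))) {
       // Find status by name rather than assuming it is the fourth attribute
       int statusIndex = attributesImpl.getIndex("status");
       if (statusIndex >= 0) {
         attributesImpl.setValue(statusIndex, "donotcontact");
       }
     }
   }

   // Pass the (possibly altered) event on to the content handler
   super.startElement(namespaceUri, localName, qualifiedName, attributesImpl);
 }

}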

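In this case, you're overriding the startElement() method defined in XMLFilterImpl. The Attributes object supplied by the parser is read-only, so the filter first copies it into an AttributesImpl, which provides setValue(). The method still passes on the event, but if the employee is not in department 24, the filter passes it on with the altered copy, which lists the employee's status as donotcontact.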
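The attribute-editing step can be isolated in a few lines. This is a sketch: AttributeEdits and withStatus() are hypothetical names. The method copies the parser's Attributes into an AttributesImpl and changes status after looking it up by name with getIndex(), so it does not depend on the order in which attributes appear in the source document; an element without a status attribute is left unchanged.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

final class AttributeEdits {

    // Returns an editable copy of atts with "status" set to newStatus.
    static AttributesImpl withStatus(Attributes atts, String newStatus) {
        AttributesImpl copy = new AttributesImpl(atts); // the original is not modified
        int index = copy.getIndex("status");            // lookup by qualified name; -1 if absent
        if (index >= 0) {
            copy.setValue(index, newStatus);
        }
        return copy;
    }
}
```

Because the copy is independent, the original Attributes object still reports contact after the edit; only the handler downstream of the filter sees donotcontact.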

The DataProcessor object has no idea that the data has been manipulated. It simply knows that some employees should be contacted and others shouldn't. Processing now produces a different result:

Contacting employee 332

Next steps

This tip has demonstrated a simple way to alter the processing of a SAX application using an XML filter. In this case, the filter's behavior is fixed in the code, but you can build an application that accommodates different situations by choosing filter behavior at runtime. You might accomplish this by replacing the DataFilter class, by passing a parameter at runtime, or even by using a factory to create the filter object in the first place.
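As one illustration of the parameter approach, the department can be passed to the filter's constructor. This sketch assumes hypothetical classes: DepartmentFilter (a parameterized variant of DataFilter), ContactCollector (a stand-in for DataProcessor that collects IDs instead of printing them), and RuntimeFilterDemo. It parses an in-memory string rather than employees.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical DataFilter variant: the department is chosen when the filter is built.
class DepartmentFilter extends XMLFilterImpl {
    private final String deptId;

    DepartmentFilter(XMLReader parent, String deptId) {
        super(parent);
        this.deptId = deptId;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl copy = new AttributesImpl(atts);
        int status = copy.getIndex("status");
        // Employees outside the chosen department are passed on as donotcontact
        if ("employee".equals(localName) && status >= 0 && !deptId.equals(atts.getValue("deptid"))) {
            copy.setValue(status, "donotcontact");
        }
        super.startElement(uri, localName, qName, copy);
    }
}

// Hypothetical stand-in for DataProcessor that collects empid values.
class ContactCollector extends DefaultHandler {
    final List<String> contacted = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("employee".equals(localName) && "contact".equals(atts.getValue("status"))) {
            contacted.add(atts.getValue("empid"));
        }
    }
}

final class RuntimeFilterDemo {

    // Returns the empid values that would be contacted for the given department.
    static List<String> contactsFor(String xml, String deptId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DepartmentFilter filter = new DepartmentFilter(reader, deptId);
        ContactCollector collector = new ContactCollector();
        filter.setContentHandler(collector);
        filter.parse(new InputSource(new StringReader(xml)));
        return collector.contacted;
    }
}
```

With the personnel data from Listing 1, contactsFor(xml, "24") yields 332 and contactsFor(xml, "3") yields 948 and 1032, while the handler class stays the same in both runs.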

A SAX application can also chain filters together, making one filter the parent of the next, so that the output of one filter is used as the input for another. This lets you build complex processing out of small, modular pieces.
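A chain might look like the following sketch, which uses one hypothetical, reusable AttributeMatchFilter twice: once for the department and once for the shift. The class names and the in-memory input are assumptions for illustration. The key point is that each filter's parent is the previous stage, and parse() is called on the last filter in the chain.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical reusable filter: an employee whose attribute `name` does not
// equal `required` is passed downstream with status="donotcontact".
class AttributeMatchFilter extends XMLFilterImpl {
    private final String name;
    private final String required;

    AttributeMatchFilter(XMLReader parent, String name, String required) {
        super(parent);
        this.name = name;
        this.required = required;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl copy = new AttributesImpl(atts);
        int status = copy.getIndex("status");
        if ("employee".equals(localName) && status >= 0 && !required.equals(atts.getValue(name))) {
            copy.setValue(status, "donotcontact");
        }
        super.startElement(uri, localName, qName, copy);
    }
}

final class FilterChainDemo {

    // Returns the empid of every employee still marked "contact" after both filters.
    static List<String> contacts(String xml, String deptId, String shift) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // parser -> department filter -> shift filter -> handler
        XMLFilterImpl byDept = new AttributeMatchFilter(reader, "deptid", deptId);
        XMLFilterImpl byShift = new AttributeMatchFilter(byDept, "shift", shift);

        List<String> contacted = new ArrayList<>();
        byShift.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("employee".equals(localName) && "contact".equals(atts.getValue("status"))) {
                    contacted.add(atts.getValue("empid"));
                }
            }
        });

        // Parsing starts at the end of the chain; each filter wires itself to its parent
        byShift.parse(new InputSource(new StringReader(xml)));
        return contacted;
    }
}
```

With the Listing 1 data, contacts(xml, "3", "night") leaves only employee 948, because the department filter rules out 332 and 994 and the shift filter then rules out 1032. Neither filter knows about the other, so each can be tested and reused on its own.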


Related topics


ArticleTitle=Tip: Use a SAX filter to manipulate data
publish-date=10012002