Level: Intermediate
Berthold Daum (berthold.daum@bdaum.de), President, BDaum Industrial Communications
02 Dec 2003
The Streaming API for XML (StAX), introduced in the previous tip, provides an XML parser that is fast, easy to use, and has a low memory footprint. It also provides a filter interface that lets programmers hide unnecessary document detail from the application's business logic. This tip shows how to apply event filters and stream filters to StAX parsers. As with the first tip, I will demonstrate and explain this using both the iterator-style API and the cursor-based API.
When parsing an XML document, an XMLEventReader instance delivers event objects to the client application through its nextEvent() method (or the untyped next() method it inherits from Iterator), one for each syntactical unit in the document. However, applications are not always interested in receiving all event classes; an application that only looks at XML elements and their attributes doesn't care about events that represent comments or processing instructions. Fortunately, StAX allows you to skip certain event classes by implementing an event filter.
Listing 1 shows an event filter that skips all XML processing instructions. The filtered reader never returns these events from next() or peek(), and hasNext() does not take them into account. To add a filter to a given event reader, you must construct a new reader. This is done with the factory method createFilteredReader(), which accepts the original reader and an EventFilter as parameters. The new filtered event reader is then used to parse the document.
Listing 1. Filtering XML events
import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.XMLEvent;
public class ParseFilteredByEvent {
    public static void main(String[] args)
            throws FileNotFoundException, XMLStreamException {
        // Use the reference implementation (its JAR must be on the classpath)
        System.setProperty(
            "javax.xml.stream.XMLInputFactory",
            "com.bea.xml.stream.MXParserFactory");
        // Create the XML input factory
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Create event reader
        FileReader reader = new FileReader("somefile.xml");
        XMLEventReader eventReader = factory.createXMLEventReader(reader);
        // Create a filtered reader
        XMLEventReader filteredEventReader =
            factory.createFilteredReader(eventReader, new EventFilter() {
                public boolean accept(XMLEvent event) {
                    // Exclude PIs
                    return (!event.isProcessingInstruction());
                }
            });
        // Main event loop
        while (filteredEventReader.hasNext()) {
            // nextEvent() is the typed variant of next(), which returns Object
            XMLEvent e = filteredEventReader.nextEvent();
            System.out.println(e);
        }
    }
}
You can hide other event classes from the main application logic in the same way. You can even combine several EventFilters in a layered fashion by constructing filtered event readers on top of each other.
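The layering described above can be sketched as follows. This is a minimal illustration and not part of the download: the class name LayeredFilters and the helper eventTypes() are hypothetical, the document is read from an in-memory string, and the runtime's default XMLInputFactory is assumed instead of the reference implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

class LayeredFilters {

    // Hypothetical helper: parses the XML text through two stacked filters
    // and returns the types of the events that reach the application.
    static List<Integer> eventTypes(String xml) {
        try {
            // Assumption: the default factory of the runtime is used
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader base =
                factory.createXMLEventReader(new StringReader(xml));

            // First layer: drop processing instructions
            EventFilter noPIs = event -> !event.isProcessingInstruction();
            XMLEventReader withoutPIs =
                factory.createFilteredReader(base, noPIs);

            // Second layer: drop comments, built on top of the first layer
            EventFilter noComments =
                event -> event.getEventType() != XMLStreamConstants.COMMENT;
            XMLEventReader filtered =
                factory.createFilteredReader(withoutPIs, noComments);

            // Collect the type of every event that passes both filters
            List<Integer> types = new ArrayList<>();
            while (filtered.hasNext()) {
                types.add(filtered.nextEvent().getEventType());
            }
            filtered.close();
            return types;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<root><?target data?><!-- note --><item>text</item></root>";
        // Prints the numeric event types as defined in XMLStreamConstants
        System.out.println(eventTypes(xml));
    }
}
```

Because a filtered reader is itself an XMLEventReader, it can be passed to createFilteredReader() again. An event reaches the application only if every filter in the stack accepts it.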
Hiding document branches
In the next example, I'll show a filter that skips a whole branch of an XML document. This time I'll be using the cursor-based API and a filtered stream reader instead of an event reader, as I have found that complex filters are best implemented as stream filters. Similar to the example above, a new filtered stream reader is constructed on top of a base stream reader:
Listing 2. Creating a filtered stream reader
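// xmlif is an XMLInputFactory, obtained as in Listing 1;
// filter is the StreamFilter shown in Listing 3
// Create stream reader
XMLStreamReader xmlr =
    xmlif.createXMLStreamReader(new FileReader("somefile.xml"));
// Create a filtered stream reader
XMLStreamReader xmlfr = xmlif.createFilteredReader(xmlr, filter);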
The StreamFilter used here in the second parameter is shown in Listing 3. It acts upon the start and end of XML elements and compares the name of the respective elements with a path segment. The path specifies which sections of the document should be skipped, and is implemented as a QName array. In this example, all elements in the path invoice/item will be skipped.
When implementing such a filter, you need to be aware that the filter's accept() method may be called whenever the reader's hasNext() or next() method is invoked (or peek(), in the case of an event reader). Consequently, the accept() method may be called several times for the same event. Here, I made sure that the filter logic is executed only once for each event: it runs only when the character position within the document has changed.
Listing 3. A stream filter
// Exclusion path
private static QName[] exclude = new QName[] {
    new QName("invoice"), new QName("item")};
private static StreamFilter filter = new StreamFilter() {
    // Element level
    int depth = -1;
    // Last matching path segment
    int match = -1;
    // Filter result
    boolean process = true;
    // Character position in document
    int currentPos = -1;
    public boolean accept(XMLStreamReader reader) {
        // Get character position
        Location loc = reader.getLocation();
        int pos = loc.getCharacterOffset();
        // Inhibit double execution
        if (pos != currentPos) {
            currentPos = pos;
            switch (reader.getEventType()) {
                case XMLStreamConstants.START_ELEMENT :
                    // Increment element depth
                    if (++depth < exclude.length && match == depth - 1) {
                        // Compare path segment with current element
                        if (reader.getName().equals(exclude[depth]))
                            // Equal - set segment pointer
                            match = depth;
                    }
                    // Process all elements not in path
                    process = match < exclude.length - 1;
                    break;
                // End of XML element
                case XMLStreamConstants.END_ELEMENT :
                    // Process all elements not in path
                    process = match < exclude.length - 1;
                    // Decrement element depth
                    if (--depth < match)
                        // Update segment pointer
                        match = depth;
                    break;
            }
        }
        return process;
    }
};
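To see the pieces of Listings 2 and 3 working together, the following self-contained sketch runs a path-excluding stream filter over a small in-memory document. It is an illustration under stated assumptions, not the code from the download: the names BranchFilterDemo, PathExcludingFilter, and visibleElements() are hypothetical, the sample invoice is invented, and the runtime's default XMLInputFactory is used. As a variation on Listing 3, the guard against double execution compares the event type in addition to the character offset; whether the guard is needed at all depends on the StAX implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class BranchFilterDemo {

    // Hypothetical filter class: hides every element on the given path
    static final class PathExcludingFilter implements StreamFilter {
        private final QName[] exclude;
        private int depth = -1;          // current element level
        private int match = -1;          // last matching path segment
        private boolean process = true;  // current filter result
        private int lastType = -1;       // event type seen in the last call
        private int lastPos = -1;        // character offset seen in the last call

        PathExcludingFilter(QName... exclude) {
            this.exclude = exclude;
        }

        @Override
        public boolean accept(XMLStreamReader reader) {
            int type = reader.getEventType();
            int pos = reader.getLocation().getCharacterOffset();
            // Same event as in the previous call: reuse the cached result
            if (type == lastType && pos == lastPos) {
                return process;
            }
            lastType = type;
            lastPos = pos;
            if (type == XMLStreamConstants.START_ELEMENT) {
                // One level deeper; extend the match if the name fits the path
                if (++depth < exclude.length && match == depth - 1
                        && reader.getName().equals(exclude[depth])) {
                    match = depth;
                }
                process = match < exclude.length - 1;
            } else if (type == XMLStreamConstants.END_ELEMENT) {
                // Decide first, so the end tag of the branch is hidden too
                process = match < exclude.length - 1;
                if (--depth < match) {
                    match = depth;
                }
            }
            return process;
        }
    }

    // Hypothetical helper: local names of all start tags that pass the filter
    static List<String> visibleElements(String xml) {
        try {
            XMLInputFactory xmlif = XMLInputFactory.newInstance();
            XMLStreamReader xmlr =
                xmlif.createXMLStreamReader(new StringReader(xml));
            StreamFilter filter = new PathExcludingFilter(
                new QName("invoice"), new QName("item"));
            XMLStreamReader xmlfr = xmlif.createFilteredReader(xmlr, filter);
            List<String> names = new ArrayList<>();
            while (xmlfr.hasNext()) {
                if (xmlfr.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(xmlfr.getLocalName());
                }
            }
            xmlfr.close();
            return names;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<invoice><customer>ACME</customer>"
            + "<item><price>10</price></item><total>10</total></invoice>";
        // The item branch, including its price child, should not be listed
        System.out.println(visibleElements(xml));
    }
}
```

The filter holds per-document state, so a fresh instance is created for each parse rather than sharing one static filter across documents.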
Next steps
This tip demonstrated the use of filters in StAX parsers. In the next tip, I will show how these and other techniques can be used to screen XML documents efficiently.
Download
| Name | Size | Download method |
|---|---|---|
| x-tipstx2filters.zip | 2KB | HTTP |
About the author
Berthold Daum is a consultant and writer based in Lützelbach, Germany. For information on his recent books, System Architecture with XML and Modeling Business Objects with XML Schema (both from Morgan Kaufmann), see http://www.bdaum.de. You can contact Berthold at berthold.daum@bdaum.de.