When parsing an XML document, an XMLEventReader instance delivers event objects to the client application through its nextEvent() method (or through next(), inherited from Iterator), one for each syntactical unit in the document. However, applications are not always interested in receiving all event classes; an application that only looks at XML elements and their attributes doesn't care about events that represent comments or processing instructions. Fortunately, StAX allows you to skip certain event classes by implementing an event filter.
Listing 1 shows an event filter that skips all XML processing instructions. The filtered reader never returns these events from next(), nextEvent(), or peek(), and hasNext() does not report them. You cannot attach a filter to an existing event reader; instead, you construct a new reader with the factory method createFilteredReader(), which accepts the original reader and an EventFilter as parameters. The new filtered event reader is then used to parse the document.
Listing 1. Filtering XML events
import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.XMLEvent;

public class ParseFilteredByEvent {
    public static void main(String[] args)
            throws FileNotFoundException, XMLStreamException {
        // Use the reference implementation; this class must be on the
        // classpath. Omit this call to use the platform's default parser.
        System.setProperty(
            "javax.xml.stream.XMLInputFactory",
            "com.bea.xml.stream.MXParserFactory");
        // Create the XML input factory
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Create event reader
        FileReader reader = new FileReader("somefile.xml");
        XMLEventReader eventReader = factory.createXMLEventReader(reader);
        // Create a filtered reader
        XMLEventReader filteredEventReader =
            factory.createFilteredReader(eventReader, new EventFilter() {
                public boolean accept(XMLEvent event) {
                    // Exclude PIs
                    return !event.isProcessingInstruction();
                }
            });
        // Main event loop
        while (filteredEventReader.hasNext()) {
            // nextEvent() returns a typed XMLEvent; next() returns Object
            XMLEvent e = filteredEventReader.nextEvent();
            System.out.println(e);
        }
    }
}
You can hide other event classes from the main application logic in the same way. You can even combine several EventFilters in a layered fashion by constructing filtered event readers on top of each other.
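The layering described above can be sketched as follows. This is a minimal illustration, not part of the original download: the class name LayeredFilters, the method eventTypes(), and the in-memory sample document are assumptions made for the example, and the code relies on whatever StAX implementation the platform provides by default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

// Hypothetical example class; the name is not from the original article.
public class LayeredFilters {

    /** Returns the event type codes that survive both filters. */
    public static List<Integer> eventTypes(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader base = factory.createXMLEventReader(new StringReader(xml));

        // First layer: hide processing instructions
        EventFilter noPIs = event -> !event.isProcessingInstruction();
        XMLEventReader withoutPIs = factory.createFilteredReader(base, noPIs);

        // Second layer, built on top of the first: hide comments as well
        EventFilter noComments =
            event -> event.getEventType() != XMLStreamConstants.COMMENT;
        XMLEventReader withoutPIsOrComments =
            factory.createFilteredReader(withoutPIs, noComments);

        List<Integer> types = new ArrayList<>();
        while (withoutPIsOrComments.hasNext()) {
            types.add(withoutPIsOrComments.nextEvent().getEventType());
        }
        withoutPIsOrComments.close();
        return types;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample document invented for this sketch
        String xml = "<doc><?pi data?><!-- note --><a>text</a></doc>";
        System.out.println(eventTypes(xml));
    }
}
```

Each filter stays small and single-purpose; the application only ever sees the outermost reader.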
In the next example, I'll show a filter that skips a whole branch of an XML document. This time I'll be using the cursor-based API and a filtered stream reader instead of an event reader, as I have found that complex filters are best implemented as stream filters. Similar to the example above, a new filtered stream reader is constructed on top of a base stream reader:
Listing 2. Creating a filtered stream reader
// xmlif is the XMLInputFactory instance
// Create stream reader
XMLStreamReader xmlr =
    xmlif.createXMLStreamReader(new FileReader("somefile.xml"));
// Create a filtered stream reader
XMLStreamReader xmlfr = xmlif.createFilteredReader(xmlr, filter);
The StreamFilter passed as the second parameter is shown in Listing 3. It reacts to the start and end of XML elements and compares each element's name with the corresponding segment of a path. The path, implemented as a QName array, specifies which sections of the document should be skipped. In this example, every item element directly below the root element invoice is skipped, together with its content.
When implementing such a filter, you need to be aware that the filter's accept() method is called whenever hasNext() or next() is invoked on the filtered reader (and, in the case of event readers, peek()). Consequently, the accept() method may be called several times for the same event. Because the filter in Listing 3 keeps state between calls, I made sure that its logic is executed only once for each event: it runs only when the character position within the document has changed.
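To observe how often accept() is actually invoked, a counting filter can be wrapped around a reader. The sketch below uses an event filter rather than a stream filter to keep it short; the class name AcceptCallCounter and the sample document are assumptions for illustration. How many calls occur per event depends on the StAX implementation in use, so the code reports the numbers without assuming a particular ratio.

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical example class; the name is not from the original article.
public class AcceptCallCounter {

    /** Returns { number of accept() calls, number of events delivered }. */
    public static int[] count(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        AtomicInteger calls = new AtomicInteger();

        // Accepts every event, but records each invocation
        EventFilter countingFilter = event -> {
            calls.incrementAndGet();
            return true;
        };

        XMLEventReader reader = factory.createFilteredReader(
            factory.createXMLEventReader(new StringReader(xml)), countingFilter);

        int delivered = 0;
        while (reader.hasNext()) {   // may consult the filter
            reader.nextEvent();      // may consult the filter again
            delivered++;
        }
        reader.close();
        return new int[] { calls.get(), delivered };
    }

    public static void main(String[] args) throws XMLStreamException {
        int[] result = count("<doc><a>text</a></doc>");
        System.out.println("accept() calls: " + result[0]
            + ", events delivered: " + result[1]);
    }
}
```

If the first number is larger than the second, the filter has been consulted more than once for at least one event, which is why a stateful filter needs a guard against double execution.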
Listing 3. A stream filter
// Exclusion path
private static QName[] exclude = new QName[] {
    new QName("invoice"), new QName("item")};

private static StreamFilter filter = new StreamFilter() {
    // Element level
    int depth = -1;
    // Last matching path segment
    int match = -1;
    // Filter result
    boolean process = true;
    // Character position in document
    int currentPos = -1;

    public boolean accept(XMLStreamReader reader) {
        // Get character position
        Location loc = reader.getLocation();
        int pos = loc.getCharacterOffset();
        // Inhibit double execution
        if (pos != currentPos) {
            currentPos = pos;
            switch (reader.getEventType()) {
                case XMLStreamConstants.START_ELEMENT :
                    // Increment element depth
                    if (++depth < exclude.length && match == depth - 1) {
                        // Compare path segment with current element
                        if (reader.getName().equals(exclude[depth])) {
                            // Equal - set segment pointer
                            match = depth;
                        }
                    }
                    // Process all elements not in path
                    process = match < exclude.length - 1;
                    break;
                // End of XML element
                case XMLStreamConstants.END_ELEMENT :
                    // Process all elements not in path
                    process = match < exclude.length - 1;
                    // Decrement element depth
                    if (--depth < match) {
                        // Update segment pointer
                        match = depth;
                    }
                    break;
            }
        }
        return process;
    }
};
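The depth and match bookkeeping in Listing 3 is easier to follow when traced on a small document. The sketch below re-implements only that bookkeeping and drives it by hand from a plain, unfiltered XMLStreamReader loop, so the trace does not depend on how often a particular parser invokes accept(). The class name PathSkipTrace, the method keptElements(), and the sample invoice are assumptions made for this illustration; it is not a replacement for the filter itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; the name is not from the original article.
public class PathSkipTrace {
    // Exclusion path, as in Listing 3
    private static final QName[] EXCLUDE = {
        new QName("invoice"), new QName("item") };

    /** Returns the names of the start elements that the path logic lets through. */
    public static List<String> keptElements(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        int depth = -1;   // element level
        int match = -1;   // last matching path segment
        List<String> kept = new ArrayList<>();

        while (reader.hasNext()) {
            int type = reader.next();
            if (type == XMLStreamConstants.START_ELEMENT) {
                // Advance the segment pointer only if the parent matched too
                if (++depth < EXCLUDE.length && match == depth - 1
                        && reader.getName().equals(EXCLUDE[depth])) {
                    match = depth;
                }
                // Keep the element unless the whole path has matched
                if (match < EXCLUDE.length - 1) {
                    kept.add(reader.getLocalName());
                }
            } else if (type == XMLStreamConstants.END_ELEMENT) {
                // Leaving a matched level resets the segment pointer
                if (--depth < match) {
                    match = depth;
                }
            }
        }
        reader.close();
        return kept;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample document invented for this sketch
        String xml = "<invoice><customer>ACME</customer>"
            + "<item><sku>1</sku></item><total>5</total></invoice>";
        System.out.println(keptElements(xml)); // prints [invoice, customer, total]
    }
}
```

Note that an item element nested more deeply, for example invoice/box/item, is kept: the path only matches when each segment sits at the corresponding depth.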
This tip demonstrated the use of filters in StAX parsers. In the next tip, I will show how these and other techniques can be used to screen XML documents efficiently.
| Name | Size | Download method |
|---|---|---|
| x-tipstx2filters.zip | 2KB | HTTP |
- Download the source files for this tip.
- Learn how to bridge the gap between SAX and pull-based DOMs in the developerWorks tip "Using pull-based DOMs" (May 2002).
- Get more information on the Streaming API for XML (StAX) at the Java Community Process site. This final draft of the reference implementation has been available since August 2003.
- Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.
Berthold Daum is a consultant and writer based in Lützelbach, Germany. For information on his recent books, System Architecture with XML and Modeling Business Objects with XML Schema (both from Morgan Kaufmann), see http://www.bdaum.de. You can contact Berthold at berthold.daum@bdaum.de.