In my previous tip, "Write XML documents with StAX", I showed how to use the low-level, cursor-based StAX API to create XML documents programmatically. In this tip, I turn to the high-level, event-based API and demonstrate it with a program that merges two incoming XML documents into one.
Processing several XML documents simultaneously can be a significant challenge. SAX parsers, for example, deliver the parsing events through callbacks to the client application. Because the SAX parser controls this process, the client application does not really have a chance to synchronize the different input sources. Therefore, programmers usually resort to the DOM parser when it comes to multi-document processing. However, the penalty here is excessive resource usage; the node trees of all input documents must completely reside in memory.
StAX does not suffer from these drawbacks. As its name indicates, it is targeted at streaming applications such as the merging of two documents. The following example shows how this is done. Assume that you want to merge two documents containing lists of products. Each document consists of a <products> element that contains one or more <product> elements, sorted alphabetically by the value of the pid attribute. Listing 1 is an example of such a document:
Listing 1. Product list
<products>
    <product pid="01"/>
    <product pid="05"/>
    <product pid="09"/>
</products>
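Before merging, it can help to see how the merge criterion is read from such a document. The following is a minimal sketch, not part of the original tip: the class name PidLister and the helper listPids() are hypothetical, and the document is passed in as a string only to keep the example self-contained. It assumes a Java runtime that bundles a StAX implementation (Java SE 6 or later).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class PidLister {
    private static final QName PRODUCT = new QName("product");
    private static final QName PID = new QName("pid");

    // Hypothetical helper: returns the pid values in document order.
    static List<String> listPids(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> pids = new ArrayList<>();
        // The client pulls events one at a time; the parser never calls back.
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                Attribute pid = start.getAttributeByName(PID);
                if (start.getName().equals(PRODUCT) && pid != null) {
                    pids.add(pid.getValue());
                }
            }
        }
        reader.close();
        return pids;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<products><product pid=\"01\"/>"
                + "<product pid=\"05\"/><product pid=\"09\"/></products>";
        System.out.println(listPids(xml)); // prints [01, 05, 09]
    }
}
```

Because the application decides when to ask for the next event, nothing prevents it from holding two such readers open and advancing whichever one it chooses.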
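In Listing 2, I use a classical merge algorithm to merge the lists from both documents. Depending on how the merge criteria (the pid values of the current <product> elements) compare, I copy events either from document 1 or from document 2 to the output document. The readToNextElement() method does the copying: it writes events from one reader until it reaches the next <product> element and returns that element's pid. The method also contains some extra logic for detecting the end of the product list. The beginning and the end of the document require special treatment as well: the document start and the opening <products> tag are copied from document 1 only, and the closing events are written only once, by whichever document is finished last.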
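Stripped of all XML handling, the classical merge can be sketched on two sorted lists of strings. This is an illustration only: the class name ListMerge and the method merge() are hypothetical, and null stands for "this input is exhausted", the same convention the pid variables follow in the listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ListMerge {
    // Merges two lists that are already sorted; null marks an exhausted input.
    static List<String> merge(List<String> a, List<String> b) {
        Iterator<String> ia = a.iterator();
        Iterator<String> ib = b.iterator();
        String x = ia.hasNext() ? ia.next() : null;
        String y = ib.hasNext() ? ib.next() : null;
        List<String> out = new ArrayList<>();
        while (x != null || y != null) {
            // Take from the first list if the second is exhausted,
            // or if the first list's current value sorts first (ties go to it).
            if (y == null || (x != null && x.compareTo(y) <= 0)) {
                out.add(x);
                x = ia.hasNext() ? ia.next() : null;
            } else {
                out.add(y);
                y = ib.hasNext() ? ib.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = merge(
                Arrays.asList("01", "05", "09"),
                Arrays.asList("02", "05", "10"));
        System.out.println(merged); // prints [01, 02, 05, 05, 09, 10]
    }
}
```

The XML version has the same shape; the only difference is that "taking" a value means copying all events of one <product> element to the writer.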
Listing 2. Merging documents
import java.io.*;
import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.XMLEvent;

public class Merger {
    private static final QName prodName = new QName("product");
    private static final QName pidName = new QName("pid");

    public static void main(String[] args)
        throws FileNotFoundException, XMLStreamException {
        // Use the reference implementation for the XML input factory
        // (omit this call on Java SE 6 or later, which bundles a StAX implementation)
        System.setProperty(
            "javax.xml.stream.XMLInputFactory",
            "com.bea.xml.stream.MXParserFactory");
        // Create the XML input factory
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Create XML event reader 1
        XMLEventReader r1 =
            factory.createXMLEventReader(new FileReader("prodList1.xml"));
        // Create XML event reader 2
        XMLEventReader r2 =
            factory.createXMLEventReader(new FileReader("prodList2.xml"));
        // Create the output factory
        XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
        // Create XML event writer
        XMLEventWriter xmlw = xmlof.createXMLEventWriter(System.out);
        // Read to first <product> element in document 1
        // and output to result document
        String pid1 = readToNextElement(r1, xmlw, false);
        // Read to first <product> element in document 2
        // without writing to result document
        String pid2 = readToNextElement(r2, null, false);
        // Loop over both XML input streams
        while (pid1 != null || pid2 != null) {
            // Compare merge criteria
            if (pid2 == null || (pid1 != null && pid1.compareTo(pid2) <= 0)) {
                // Continue in document 1
                pid1 = readToNextElement(r1, xmlw, pid2 == null);
            } else {
                // Continue in document 2
                pid2 = readToNextElement(r2, xmlw, pid1 == null);
            }
        }
        xmlw.close();
    }

    /**
     * @param reader - the document reader
     * @param writer - the document writer (null suppresses output)
     * @param processEnd - forces the document end to be written
     * @return - the next merge criterion value, or null at the end of the list
     * @throws XMLStreamException
     */
    private static String readToNextElement(XMLEventReader reader,
        XMLEventWriter writer, boolean processEnd) throws XMLStreamException {
        // Nesting level
        int level = 0;
        while (true) {
            // Read event to be written to result document
            // (nextEvent() returns an XMLEvent; Iterator.next() returns Object)
            XMLEvent event = reader.nextEvent();
            // Avoid double processing of document end
            if (!processEnd) {
                switch (event.getEventType()) {
                    case XMLEvent.START_ELEMENT :
                        ++level;
                        break;
                    case XMLEvent.END_ELEMENT :
                        // Closing tag of the enclosing <products> element
                        if (--level < 0)
                            return null;
                        break;
                }
            }
            // Output event
            if (writer != null)
                writer.add(event);
            // Look at next event
            event = reader.peek();
            switch (event.getEventType()) {
                case XMLEvent.START_ELEMENT :
                    // Start element - stop at <product> element
                    QName name = event.asStartElement().getName();
                    if (name.equals(prodName)) {
                        return event
                            .asStartElement()
                            .getAttributeByName(pidName)
                            .getValue();
                    }
                    break;
                case XMLEvent.END_DOCUMENT :
                    // Stop at end of document
                    return null;
            }
        }
    }
}
As you can see, the event-based API is ideally suited for deriving a document from other documents. With the low-level, cursor-based API, you would need a different method call for each event type. With the event-based API, you simply pass generic events to the event writer's add() method.
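The simplest case of deriving one document from another is a plain copy. The sketch below is an illustration under stated assumptions rather than part of the original tip: the class name EventCopy and the method copy() are hypothetical, strings replace files to keep the example self-contained, and a Java runtime with a bundled StAX implementation is assumed. The exact serialization (XML declaration, form of empty elements) depends on the StAX implementation, so no output is shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class EventCopy {
    // Hypothetical helper: copies every event from the input to a new document.
    static String copy(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance()
                .createXMLEventWriter(out);
        // One generic call handles start tags, end tags, text, and so on.
        while (reader.hasNext()) {
            writer.add(reader.nextEvent());
        }
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(copy("<products><product pid=\"01\"/></products>"));
    }
}
```

A cursor-based copy would instead need a switch over the reader's event type, with a separate XMLStreamWriter call such as writeStartElement(), writeAttribute(), or writeCharacters() in each branch.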
This tip has demonstrated the use of the event-based StAX API for pipelined XML applications, such as the merging of documents. As of November 3, 2003, StAX has passed the Final Approval Ballot for JSR-0173. It will make a valuable addition to every Java programmer's toolbox.
| Name | Size | Download method |
|---|---|---|
| x-tipstx5_merger.zip | 2KB | HTTP |
- Download the source files for this tip.
- Read "Using XML streaming parsers," the first in a series of developerWorks tips covering StAX (November 2003).
- Learn how to apply event filters and stream filters to StAX parsers in the second in this series of StAX tips, "Parsing XML documents partially with StAX" (December 2003).
- Find out how to retrieve specific information from XML documents and how to stop the parsing process once this information is collected in the third tip in this series, "Screen XML documents efficiently with StAX" (December 2003).
- Use the low-level, cursor-based StAX API to create XML documents efficiently in the fourth tip in this series, "Write XML documents with StAX" (December 2003).
- Get more information on the Streaming API for XML (StAX) at the Java Community Process site.
- Find more XML resources on the developerWorks XML content area. For a complete list of XML tips to date, check out the tips summary page.
- Learn how you can become an IBM Certified Developer in XML and related technologies.
Berthold Daum is a consultant and writer based in Lützelbach, Germany. For information on his recent books, System Architecture with XML and Modeling Business Objects with XML Schema (both from Morgan Kaufmann), see http://www.bdaum.de. You can contact Berthold at berthold.daum@bdaum.de.



