Tip: Merge XML documents with StAX

Use the high-level, event-based API for pipelined XML applications

Berthold Daum (berthold.daum@bdaum.de), President, BDaum Industrial Communications
Berthold Daum is a consultant and writer based in Lützelbach, Germany. For information on his recent books, System Architecture with XML and Modeling Business Objects with XML Schema (both from Morgan Kaufmann), see http://www.bdaum.de. You can contact Berthold at berthold.daum@bdaum.de.

Summary:  Deriving new XML documents from input documents is where the Streaming API for XML (StAX) shines. This tip explores how client applications can utilize the event-based API to efficiently merge two incoming XML documents into one.


Date:  07 Jan 2004
Level:  Intermediate


In my previous tip, "Write XML documents with StAX", I showed how to use the low-level, cursor-based StAX API to create XML documents programmatically. In this tip, I use the high-level, event-based API to build a program that merges two incoming XML documents into one.

Processing several XML documents simultaneously can be a significant challenge. SAX parsers, for example, deliver parsing events to the client application through callbacks. Because the SAX parser controls this process, the client application has no practical way to synchronize the different input sources. Programmers therefore usually resort to a DOM parser for multi-document processing. The penalty, however, is excessive resource usage: the node trees of all input documents must reside completely in memory.
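To make the contrast with callbacks concrete, here is a minimal sketch of pull-style reading, assuming the StAX implementation bundled with a current JDK. The class name LockstepDemo and its helper methods are hypothetical and do not come from the downloadable example. The client asks each reader for its next event when it chooses, so it can alternate between two inputs freely:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration class, not part of the original article's code
final class LockstepDemo {

   // Pull events until the next start element; return its local name,
   // or null when the input is exhausted
   static String nextStartName(XMLEventReader reader) throws XMLStreamException {
      while (reader.hasNext()) {
         XMLEvent event = reader.nextEvent();
         if (event.isStartElement())
            return event.asStartElement().getName().getLocalPart();
      }
      return null;
   }

   // Alternate between two documents, one start element at a time
   static List<String> interleave(String xml1, String xml2) {
      try {
         XMLInputFactory factory = XMLInputFactory.newInstance();
         XMLEventReader r1 = factory.createXMLEventReader(new StringReader(xml1));
         XMLEventReader r2 = factory.createXMLEventReader(new StringReader(xml2));
         List<String> names = new ArrayList<>();
         String n1 = nextStartName(r1);
         String n2 = nextStartName(r2);
         while (n1 != null || n2 != null) {
            if (n1 != null) {
               names.add("1:" + n1);
               n1 = nextStartName(r1);
            }
            if (n2 != null) {
               names.add("2:" + n2);
               n2 = nextStartName(r2);
            }
         }
         r1.close();
         r2.close();
         return names;
      } catch (XMLStreamException e) {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args) {
      // Prints [1:a, 2:x, 1:b, 2:y, 2:z]
      System.out.println(interleave("<a><b/></a>", "<x><y/><z/></x>"));
   }
}
```

Neither document is ever held in memory as a tree. Only the current event of each reader is live at any time.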

StAX does not suffer from these drawbacks. As its name indicates, it is targeted at streaming applications such as the merging of two documents. The following example shows how this is done. Assume that you want to merge two documents containing lists of products. Each document consists of a <products> element that contains one or more <product> elements sorted alphabetically by their pid attribute. Listing 1 is an example of such a document:


Listing 1. Product list
<products>
   <product pid="01"/>
   <product pid="05"/>
   <product pid="09"/>
</products>
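The merge relies on each input already being sorted by pid. The following sketch, which is an illustration and not part of the original download, reads the pid values with the event-based API and checks their order. The class name PidOrderCheck and both method names are hypothetical. Note that the comparison is alphabetical (String.compareTo), which is why zero-padded values such as "01" and "09" matter: as strings, "9" sorts after "10".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper for checking the merge precondition
final class PidOrderCheck {

   private static final QName PROD_NAME = new QName("product");
   private static final QName PID_NAME = new QName("pid");

   // Collect the pid attribute of every <product> element, in document order
   static List<String> readPids(String xml) {
      try {
         XMLEventReader reader = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
         List<String> pids = new ArrayList<>();
         while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                  && event.asStartElement().getName().equals(PROD_NAME)) {
               Attribute pid = event.asStartElement().getAttributeByName(PID_NAME);
               // Skip products without a pid instead of failing
               if (pid != null)
                  pids.add(pid.getValue());
            }
         }
         reader.close();
         return pids;
      } catch (XMLStreamException e) {
         throw new IllegalStateException(e);
      }
   }

   // True if the values are in ascending alphabetical order (duplicates allowed)
   static boolean isSorted(List<String> pids) {
      for (int i = 1; i < pids.size(); i++)
         if (pids.get(i - 1).compareTo(pids.get(i)) > 0)
            return false;
      return true;
   }

   public static void main(String[] args) {
      String xml = "<products><product pid=\"01\"/><product pid=\"05\"/>"
         + "<product pid=\"09\"/></products>";
      List<String> pids = readPids(xml);
      System.out.println(pids + " sorted: " + isSorted(pids));
   }
}
```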

In Listing 2, I use a classical merge algorithm to merge the lists from both documents. Depending on the comparison between the merge criteria from the documents, I either copy events from document 1 to the output document or from document 2 to the output document. This is done by the readToNextElement() method. This method contains some extra logic for detecting the end of the product list. Special treatment is also required for the beginning of the document and for the end of the document.
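For readers who have not seen the classical merge before, here is the same comparison logic reduced to plain sorted lists of strings. This is a simplified sketch with the hypothetical class name SortedMerge. It leaves out all XML handling and only mirrors the loop condition used for the merge criteria:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the merge step, without any XML handling
final class SortedMerge {

   // Merge two lists that are each sorted in ascending alphabetical order
   static List<String> merge(List<String> a, List<String> b) {
      List<String> out = new ArrayList<>();
      int i = 0;
      int j = 0;
      // Continue while either input still has elements
      while (i < a.size() || j < b.size()) {
         // Take from a if b is exhausted, or if a's current value
         // is not greater than b's current value
         if (j == b.size() || (i < a.size() && a.get(i).compareTo(b.get(j)) <= 0))
            out.add(a.get(i++));
         else
            out.add(b.get(j++));
      }
      return out;
   }

   public static void main(String[] args) {
      // Prints [01, 02, 05, 05, 09, 10]
      System.out.println(merge(List.of("01", "05", "09"), List.of("02", "05", "10")));
   }
}
```

On equal keys the element from the first input is emitted first, so the merge is stable and keeps duplicates from both inputs.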


Listing 2. Merging documents
import java.io.*;
import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.XMLEvent;

public class Merger {

   private static final QName prodName = new QName("product");
   private static final QName pidName = new QName("pid");

   public static void main(String[] args)
      throws FileNotFoundException, XMLStreamException {
         
      // Use the reference implementation for the XML input factory
      System.setProperty(
         "javax.xml.stream.XMLInputFactory",
         "com.bea.xml.stream.MXParserFactory");
      // Create the XML input factory
      XMLInputFactory factory = XMLInputFactory.newInstance();
      // Create XML event reader 1
      XMLEventReader r1 = 
         factory.createXMLEventReader(new FileReader("prodList1.xml"));
      // Create XML event reader 2
      XMLEventReader r2 = 
         factory.createXMLEventReader(new FileReader("prodList2.xml"));

      // Create the output factory
      XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
      // Create XML event writer
      XMLEventWriter xmlw = xmlof.createXMLEventWriter(System.out);

      // Read to first <product> element in document 1
      // and output to result document
      String pid1 = readToNextElement(r1, xmlw, false);
      // Read to first <product> element in document 2
      // without writing to result document
      String pid2 = readToNextElement(r2, null, false);
      // Loop over both XML input streams
      while (pid1 != null || pid2 != null) {
         // Compare merge criteria
         if (pid2 == null || (pid1 != null && pid1.compareTo(pid2) <= 0))
            // Continue in document 1
            pid1 = readToNextElement(r1, xmlw, pid2 == null);
         else
            // Continue in document 2
            pid2 = readToNextElement(r2, xmlw, pid1 == null);
      }
      xmlw.close();
   }

   /**
    * @param reader - the document reader
    * @param writer - the document writer
    * @param processEnd - forces the document end to be written
    * @return - the next merge criterion value
    * @throws XMLStreamException
    */
   private static String readToNextElement(XMLEventReader reader,
         XMLEventWriter writer, boolean processEnd) throws XMLStreamException {
      // Nesting level
      int level = 0;
      while (true) {
         // Read event to be written to result document
         XMLEvent event = reader.next();
         // Avoid double processing of document end
         if (!processEnd)
            switch (event.getEventType()) {
               case XMLEvent.START_ELEMENT :
                  ++level;
                  break;
               case XMLEvent.END_ELEMENT :
                  if (--level < 0)
                     return null;
                  break;
            }
         // Output event
         if (writer != null)
            writer.add(event);
         // Look at next event
         event = reader.peek();
         switch (event.getEventType()) {
            case XMLEvent.START_ELEMENT :
               // Start element - stop at <product> element
               QName name = event.asStartElement().getName();
               if (name.equals(prodName)) {
                  return event
                     .asStartElement()
                     .getAttributeByName(pidName)
                     .getValue();
               }
               break;
            case XMLEvent.END_DOCUMENT :
               // Stop at end of document
               return null;
         }
      }
   }
}

As you can see, the event-based API is ideally suited for deriving a document from other documents. With the low-level, cursor-based API, you would need a different method call for each event type, but with the event-based API you simply pass generic events to the event writer's add() method.
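Stripped of the merge logic, passing generic events to add() looks like the following sketch. It assumes the StAX implementation bundled with a current JDK; the class name EventCopy is hypothetical. Every event, whatever its type, goes through the same add() call:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical illustration class: copies a document event by event
final class EventCopy {

   static String copy(String xml) {
      try {
         XMLEventReader reader = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
         StringWriter out = new StringWriter();
         XMLEventWriter writer = XMLOutputFactory.newInstance()
            .createXMLEventWriter(out);
         // No switch on the event type is needed: add() accepts any XMLEvent
         while (reader.hasNext())
            writer.add(reader.nextEvent());
         writer.flush();
         writer.close();
         reader.close();
         return out.toString();
      } catch (XMLStreamException e) {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args) {
      // The exact serialization (XML declaration, empty-element form)
      // depends on the StAX implementation in use
      System.out.println(copy("<products><product pid=\"01\"/></products>"));
   }
}
```

For a plain copy like this, XMLEventWriter also offers add(XMLEventReader), which consumes an entire reader in one call. The explicit loop is shown because it is the place where a program such as the merger can decide which events to forward.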


Summary

This tip has demonstrated the use of the event-based API of StAX for pipelined XML applications, such as the merging of documents. As of Nov 3, 2003, StAX has passed the Final JSR-0173 Approval Ballot. It will make a valuable addition to every Java programmer's toolbox.



Download

Name                   Size   Download method
x-tipstx5_merger.zip   2KB    HTTP



