Working XML: Compiling the proxy

Using a Doclet to compile the proxy ContentHandler

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft
Benoît Marchal is a consultant and writer based in Namur, Belgium. He has just released the second edition of XML by Example. He is also the author of Applied XML Solutions and XML and the Enterprise. Details on his latest projects are at marchal.com. You can contact Benoît at bmarchal@pineapplesoft.com.

Summary:  In this column, Benoît provides the front end for the Handler Compiler, HC, and encounters unexpected problems with the DFA. A stable but less than optimal solution makes it possible to release a first version of HC for further testing.

Date:  01 Mar 2002
Level:  Introductory
Also available in:   Japanese


This column, the last in the current round of development around the Handler Compiler (HC), delivers the first working version of HC. As I did with XM, the first project for this column, I now intend to move to another project for the next few months while field testing HC.

I hope to use this time to gain more experience from the project and draw a list of requirements for future developments. Of course, in the meantime, I encourage you to download HC, test it in your environment, and share your thoughts on the ananas-discussion mailing list. I will post bug fixes and updates on the CVS server (see Resources).

Where are we?

HC evolved from my experience writing SAX code. While I appreciate the power and flexibility of SAX parsers, I have also found that they require lots of tedious coding to track where the parser is in the document. HC automatically generates the state tracking code from XPaths.
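To make the "tedious coding" concrete, here is a minimal sketch of the kind of hand-written state tracking that HC is meant to replace. It counts ulink elements whose parent is simpara, which is what the XPath simpara/ulink expresses. The class name ManualTrackingSketch and the count() helper are hypothetical and not part of HC.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks the open elements by hand.
class ManualTrackingSketch
   extends DefaultHandler
{
   private final Deque<String> ancestors = new ArrayDeque<>();
   private int matches = 0;

   @Override
   public void startElement(String uri,String localName,
                            String qName,Attributes atts)
   {
      // hand-written equivalent of the XPath simpara/ulink;
      // peek() returns the parent element, or null at the root
      if("ulink".equals(qName) && "simpara".equals(ancestors.peek()))
         matches++;
      ancestors.push(qName);
   }

   @Override
   public void endElement(String uri,String localName,String qName)
   {
      ancestors.pop();
   }

   // Parses the given XML text and returns the number of matches.
   static int count(String xml)
   {
      ManualTrackingSketch handler = new ManualTrackingSketch();
      try
      {
         SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)),handler);
      }
      catch(Exception e)
      {
         throw new IllegalStateException("Parsing failed",e);
      }
      return handler.matches;
   }
}
```

Every additional XPath needs more tests in startElement() and often more state, which is why generating this code is attractive.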

HC is broken down into two components. The first is a compiler that accepts the application handler and creates a table class. The application handler is a Java class that implements the HCHandler interface. Special Javadoc comments indicate which method to invoke when the parser matches an XPath. The compiler is used for development only.

The second component is the run-time. Its most important class is XPathHandler. XPathHandler acts as a proxy that translates SAX events (start element, end element, and the like) into calls to the application handler. The run-time ships with applications.

Figure 1 is the class model for the HC run-time. Here, the application handler is HCCountHandler.


Figure 1. Class model for the HC run-time

In the last column, we wrote the logic to compile a set of XPaths into a so-called Deterministic Finite Automaton, or DFA. Without repeating the discussion from that column, a DFA is a structure to which XPaths can be compiled efficiently.

What remains for this column is interfacing the DFA construction algorithm with Javadoc to pull XPaths from the application handler source. In this column, we also need to write the table class.

Unfortunately what was supposed to be a smooth ride to the end of the project turned into a more involved coding session when I found a problem with the DFA. More on this later.


Javadoc Doclet

To integrate HC smoothly into Java classes, I turned to an old friend: Javadoc comments. The new @xpath Javadoc tag indicates that the method should be called when the parser matches the given XPath, for example:

/**
 * @xpath para
 */
public void startPara()
{
   writer.print("<p>");
}

A word of caution: We have two different tags here. Javadoc tags appear in Java code and have the form @name value. XML tags appear in XML code and have the form <para/>. Unfortunately, Javadoc and XML have adopted the same vocabulary, so be careful not to confuse them.

I like this solution because it does not force me to switch from the Java editor to another tool or to learn a new language. Furthermore, since JDK 1.2, Javadoc has supported Doclet extensions. Doclets let you plug any code into the Javadoc parser.

Doclets were originally introduced to let you change the format of Javadoc documentation. The Javadoc parser reads the files, compiles information about the classes, methods, and packages, and passes it to the Doclet. The default Doclet writes HTML documentation for the classes. Sun also ships Doclets for MIF (FrameMaker), PDF, and RTF.

Doclets have many other applications. Since they have access to the entire parse tree (minus the actual method body), Doclets provide a handy mechanism to compile utility classes or compile reports on the code. For example, Sun has a Doclet that checks the quality and consistency of comments.

Doclet API

Doclets are fashioned after the main() method. A Doclet has a static start() method that takes the parse tree as an argument.

The Doclet API defines many classes for storing the parse tree (note that this is the Java parse tree, not the XML one). The most important ones for HC are RootDoc, ClassDoc, and MethodDoc; they return information on the parse tree, classes, and methods, respectively.

The HC compiler is CompilerDoclet. It collects the namespace declarations (from the @xmlns tags) and the XPaths attached to methods (from the @xpath tags) and uses the HCTablesGenerator (to be introduced shortly) to write the table class.

Complete listings for CompilerDoclet are available online (see Resources). Listing 1 is the start() method. It extracts command-line arguments and processes the parse tree. Note the use of an inner class, DocletMessenger, to report HC errors properly.


Listing 1: CompilerDoclet.start()

public static boolean start(RootDoc root)
   throws Exception
{
   try
   {
      String[][] options = root.options();
      File destdir = new File(".");
      for(int i = 0;i < options.length;i++)
         if(options[i][0].equals("-d"))
            destdir = new File(options[i][1]);
      CompilerDoclet compiler = new CompilerDoclet();
      HandlerInfo[] handlers = compiler.compile(root);
      Messenger messenger = new DocletMessenger(root);
      HCTablesGenerator generator =
          new HCTablesGenerator(getMessageStore(),messenger,destdir);
      for(int i = 0;i < handlers.length;i++)
         generator.generateHCTables(handlers[i]);
      return true;
   }
   catch(CompilerException e)
   {
      // no need to display again, it has already been shown
      // to the user
      return false;
   }
}

Listing 2 is the compile(ClassDoc) method, which extracts HC information for a class. The Doclet API is readable so you should have no problem following along. For example, ClassDoc.interfaces() returns the interfaces that the class implements. ClassDoc.tags() returns Javadoc tags for the class.

HC defines two classes, HandlerInfo and MethodInfo, to collect this information. I chose not to use the Javadoc-provided classes directly in order to buy myself some independence from Javadoc. Who knows -- I might want to switch to another Java parser in the future.


Listing 2: compile(ClassDoc)

protected HandlerInfo compile(ClassDoc clasz)
{
   ClassDoc[] interfaces = clasz.interfaces();
   if(interfaces == null)
      return null;
   boolean found = false;
   for(int i = 0;i < interfaces.length;i++)
      if(interfaces[i].qualifiedName().equals("org.ananas.hc.HCHandler"))
         found = true;
   if(!found)
      return null;

   Tag[] tags = clasz.tags("xmlns");
   NamespaceSupport namespaceSupport = new NamespaceSupport();
   if(tags != null)
      for(int i = 0;i < tags.length;i++)
      {
         String content = tags[i].text();
         int pos = content.indexOf(' ');
         if(pos == -1)
            namespaceSupport.declarePrefix("",content);
         else
            namespaceSupport.declarePrefix(content.substring(0,pos),
                                           content.substring(pos + 1));
      }

   MethodDoc[] methods = clasz.methods();
   List methodsList = new ArrayList();
   if(methods != null)
      for(int i = 0;i < methods.length;i++)
      {
         MethodInfo method = compile(methods[i]);
         if(method != null)
            methodsList.add(method);
      }
   MethodInfo[] methodsArray = new MethodInfo[methodsList.size()];
   methodsList.toArray(methodsArray);

   return new HandlerInfo(clasz.qualifiedName(),
                          namespaceSupport,
                          methodsArray);
}

There are other compile() methods for RootDoc and MethodDoc. HandlerInfo and MethodInfo are also available online.
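The @xmlns handling in Listing 2 boils down to splitting the tag text at the first space. The sketch below isolates that step, using a plain Map instead of NamespaceSupport so that it runs on its own. XmlnsTagSketch and the sample URIs are hypothetical, chosen for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class XmlnsTagSketch
{
   // Splits each tag text at the first space, as Listing 2 does:
   // "prefix uri" declares a prefix, a bare "uri" declares the
   // default namespace (stored under the empty prefix).
   static Map<String,String> parse(String... tagTexts)
   {
      Map<String,String> prefixes = new LinkedHashMap<>();
      for(String content : tagTexts)
      {
         int pos = content.indexOf(' ');
         if(pos == -1)
            prefixes.put("",content);
         else
            prefixes.put(content.substring(0,pos),
                         content.substring(pos + 1));
      }
      return prefixes;
   }
}
```

With this convention, a class comment containing "@xmlns db http://example.org/docbook" would bind the prefix db, while "@xmlns http://example.org/default" would set the default namespace.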

Table Generator

Writing the table class is the responsibility of HCTablesGenerator. It uses the XPathParser and DFAFactory introduced in the previous column to create DFAs from HandlerInfo. Listing 3 presents the relevant methods.

You might wonder why compileDFA() in Listing 3 creates a new DFA for each XPath. In the last column, they were combined through an OR. Well, that's a consequence of the problem I mentioned before. More details on this in the section "Problems with OR".


Listing 3: Compiling the DFA

public void generateHCTables(HandlerInfo handler)
   throws CompilerException
{
   messenger.info(message.getMessage("Compiling",handler.getName()));
   try
   {
      DFATable[] tables = compileDFA(handler);
      writeHCTables(handler,tables);
   }
   catch(IOException e)
   {
      error("IOException",e.getLocalizedMessage());
   }
}

protected DFATable[] compileDFA(HandlerInfo handler)
   throws CompilerException
{
   XPathParser parser = new XPathParser(handler.getNamespaceSupport(),message);
   MethodInfo[] methods = handler.getMethods();
   ArrayList array = new ArrayList();
   for(int i = 0;i < methods.length;i++)
   {
      String[] xpaths = methods[i].getXPaths();
      for(int j = 0;j < xpaths.length;j++)
      {
         XPathNode node = parser.axpath(xpaths[j],i,methods[i]);
         if(node != null)
            array.add(factory.createDFA(node));
      }
   }
   DFATable[] tables = new DFATable[array.size()];
   return (DFATable[])array.toArray(tables);
}

The writeHCTables() method serializes the DFA tables as a Java class. Currently, it just writes Java code to a text file. In the future, I might want to compile directly to bytecode. However, compiling to Java code is easier to debug.

writeHCTables() is too long (see the excerpt in Listing 4). It's a prime candidate for refactoring and I will certainly break it into more manageable units in a future iteration of HC.

The table class implements the HCTables interface (see Listing 5).


Listing 5: HCTables

package org.ananas.hc;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

public interface HCTables
{
   public static final String CLASS_SUFFIX =
      "__org_ananas_hc_tables_1";
   public void setHCHandler(HCHandler handler);
   public int getCount();
   public int move(int xpath,QName qname,int state);
   public boolean isAcceptingState(int xpath,int state);
   public void acceptStartEvent(int xpath,
                                int state,
                                QName qName,
                                Attributes atts)
      throws SAXException;
   public void acceptEndEvent(int xpath,int state,QName qName)
      throws SAXException;
   public void acceptCharactersEvent(int xpath,
                                     int state,
                                     char[] ch,
                                     int start,
                                     int length)
      throws SAXException;
}

The interface specifies the contract between XPathHandler and the table class. setHCHandler() initializes the table class. getCount(), move(), and isAcceptingState() define the interface to the DFA itself.

Finally, the acceptXXXEvent() methods implement the calls into the application handler. The problem here is that since XPathHandler is a generic class, it does not know about the actual application handler. Therefore it does not know which method to call when the DFA matches. The compiler generates these methods so that they call the appropriate method in the application handler.

I intentionally chose not to use Java reflection because it is less efficient. If you step through the code, you will see that the compiler is very flexible when it creates these methods. For example, it gives the user a lot of control over the parameters. Again, this would be prohibitive to implement with reflection, but it is straightforward with generated code.
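To illustrate the idea of dispatching through generated code rather than reflection, here is a much simplified sketch of what a table class could look like. It is not HC's actual output: the classes ParaHandlerSketch and ParaHandlerSketchTables, the reduced method signatures, and the assumption that XPath 0 accepts in state 1 are all hypothetical.

```java
// Hypothetical application handler with two callbacks.
class ParaHandlerSketch
{
   final StringBuilder out = new StringBuilder();
   void startPara() { out.append("<p>"); }
   void endPara()   { out.append("</p>"); }
}

// Simplified stand-in for a generated table class (not HC's real output).
class ParaHandlerSketchTables
{
   // Assumed for illustration: XPath 0 is "para" and accepts in state 1.
   private static final int[] ACCEPTING = { 1 };
   private final ParaHandlerSketch handler;

   ParaHandlerSketchTables(ParaHandlerSketch handler)
   {
      this.handler = handler;
   }

   boolean isAcceptingState(int xpath,int state)
   {
      return ACCEPTING[xpath] == state;
   }

   void acceptStartEvent(int xpath,int state)
   {
      if(!isAcceptingState(xpath,state))
         return;
      switch(xpath)
      {
         // direct, statically compiled call: no reflection involved
         case 0: handler.startPara(); break;
         default: break;
      }
   }

   void acceptEndEvent(int xpath,int state)
   {
      if(!isAcceptingState(xpath,state))
         return;
      switch(xpath)
      {
         case 0: handler.endPara(); break;
         default: break;
      }
   }
}
```

Because each case is written out by the generator, it can pass whatever parameters a given handler method declares, which is the flexibility described above.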


XPathHandler

The remaining class is XPathHandler, which acts as a proxy between the SAX events and the HC events. Listing 6 is the handler.

The constructor takes an HCHandler as a parameter. It attempts to load the corresponding table class dynamically. Since the name of the table class is derived from the name of the application handler, this is not too difficult. A version number in the name guarantees compatibility in the future.
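The naming convention can be tried in isolation. In the sketch below, the table class is found by appending the suffix from HCTables to the handler's class name and loading the result by name. TableLoaderSketch and its nested classes are hypothetical stand-ins, and the code uses getDeclaredConstructor().newInstance(), the current replacement for the Class.newInstance() call available when this column was written.

```java
class TableLoaderSketch
{
   // Same suffix as HCTables.CLASS_SUFFIX in Listing 5.
   static final String CLASS_SUFFIX = "__org_ananas_hc_tables_1";

   // Stand-in for an application handler.
   static class CountHandler
   {
   }

   // Stand-in for the generated table class: handler name plus suffix.
   static class CountHandler__org_ananas_hc_tables_1
   {
   }

   // Derives the table class name from the handler and instantiates it.
   static Object loadTables(Object handler)
   {
      String name = handler.getClass().getName() + CLASS_SUFFIX;
      try
      {
         return Class.forName(name).getDeclaredConstructor().newInstance();
      }
      catch(ReflectiveOperationException e)
      {
         throw new IllegalStateException("No table class " + name,e);
      }
   }
}
```

If the table format changes, a new suffix (ending in 2, say) would keep an old run-time from loading tables it cannot interpret.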

XPathHandler implements selected SAX events and issues the proper calls on the table class. startDocument() and startElement() cause it to transition (move) to the next state. endElement() restores the state before reading the current element.


Listing 6: XPathHandler

package org.ananas.hc;

import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XPathHandler
   extends DefaultHandler
{
   protected HCTables tables;
   protected int[] states;
   protected Stack stack;

   public XPathHandler(HCHandler handler)
      throws HCException
   {
      try
      {
         Class handlerClass = handler.getClass(),
               tablesClass = Class.forName(handlerClass.getName() +
                                           HCTables.CLASS_SUFFIX);
         tables = (HCTables)tablesClass.newInstance();
         tables.setHCHandler(handler);
      }
      catch(ClassNotFoundException e)
      {
         throw new HCException(e);
      }
      catch(IllegalAccessException e)
      {
         throw new HCException(e);
      }
      catch(InstantiationException e)
      {
         throw new HCException(e);
      }
   }

   public void startDocument()
      throws SAXException
   {
      stack = new Stack();
      QName qname = new QName(QName.ROOT);
      states = new int[tables.getCount()];
      for(int i = 0;i < tables.getCount();i++)
      {
         states[i] = tables.move(i,qname,-1);
         tables.acceptStartEvent(i,states[i],qname,null);
      }
   }

   public void startElement(String namespaceURI,
                            String localName,
                            String qualifiedName,
                            Attributes atts)
      throws SAXException
   {
      stack.push(states);
      QName qname = new QName(QName.ELEMENT,namespaceURI,localName);
      int[] cstates = states;
      states = new int[tables.getCount()];
      for(int i = 0;i < tables.getCount();i++)
      {
         if(tables.isAcceptingState(i,cstates[i]))
            states[i] = tables.move(i,qname,-1);
         else
            states[i] = tables.move(i,qname,cstates[i]);
         tables.acceptStartEvent(i,states[i],qname,atts);
      }
   }

   public void characters(char[] ch,
                          int start,
                          int length)
      throws SAXException
   {
      for(int i = 0;i < tables.getCount();i++)
         tables.acceptCharactersEvent(i,states[i],ch,start,length);
   }

   public void endElement(String namespaceURI,
                          String localName,
                          String qualifiedName)
      throws SAXException
   {
      QName qname = new QName(QName.ELEMENT,namespaceURI,localName);
      for(int i = 0;i < tables.getCount();i++)
         tables.acceptEndEvent(i,states[i],qname);
      states = (int[])stack.pop();
   }

   public void endDocument()
      throws SAXException
   {
      QName qname = new QName(QName.ROOT);
      for(int i = 0;i < tables.getCount();i++)
         tables.acceptEndEvent(i,states[i],qname);
   }
}
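The role of the stack is easier to see with a toy automaton. The following sketch tracks a single path, sect1/simpara, and saves the state on every start event so that siblings are matched from their parent's state. StateStackSketch and its move() function are hypothetical simplifications, not HC's tables, and move() is only valid because the two steps have different names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StateStackSketch
{
   // Toy automaton for sect1/simpara. The state is the number of
   // steps matched so far; STEPS.length is the accepting state.
   private static final String[] STEPS = { "sect1", "simpara" };

   private final Deque<Integer> saved = new ArrayDeque<>();
   private int state = 0;
   private int matches = 0;

   static int move(String name,int state)
   {
      if(state < STEPS.length && STEPS[state].equals(name))
         return state + 1;
      return STEPS[0].equals(name) ? 1 : 0;
   }

   void startElement(String name)
   {
      saved.push(state);               // remember the parent's state
      // like XPathHandler, restart from the initial state after a match
      int from = (state == STEPS.length) ? 0 : state;
      state = move(name,from);
      if(state == STEPS.length)
         matches++;
   }

   void endElement()
   {
      state = saved.pop();             // back to the parent's state
   }

   int getMatches()
   {
      return matches;
   }
}
```

For a sect1 element that contains simpara, title, and simpara, the second simpara is found only because endElement() restores the state reached at sect1. Without the stack, the automaton would still be in the state it fell back to at title.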


Problems with OR

You might wonder why this application creates as many DFAs as XPaths. If anything, this is less efficient than maintaining only one DFA. It is the best trade-off I could find after an unexpected problem occurred.

The promise of this column is to show you work on the project as it unfolds. I promised that I would share with you how I attempted to solve the problems and what I learned in the process. If anything, I hope my false starts and problems will help you avoid the same situation.

This time the problem is that I misunderstood something. In the last column, I was linking XPaths with an OR operator to create a single DFA. In other words, I was treating the following two XPaths:

simpara/ulink
sect1info/title

as if they had been written:

simpara/ulink | sect1info/title

This looks fine, and for the most part it is. However, it breaks when one XPath ends with the step that begins another. For example, the following two XPaths:

sect1/simpara
simpara/ulink

are not equivalent to

sect1/simpara | simpara/ulink

Can you spot the difference? It took me a while. The problem is that if the DFA recognizes sect1/simpara, it will never explore the other branch (simpara/ulink). The above two XPaths really are equivalent to:

sect1/(simpara|simpara/ulink) | simpara/ulink

Although this is not proper XPath syntax, the parentheses imply a different priority.
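The expected behavior can be checked with a small experiment that evaluates each XPath independently. The sketch below does not build a DFA. It simply compares each XPath with the tail of the current element path, which is enough to show that both XPaths must fire on the branch sect1, simpara, ulink. ParallelXPathSketch and its methods are hypothetical and handle only simple child steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParallelXPathSketch
{
   // True when the last elements of path equal the steps of the XPath.
   static boolean endsWith(List<String> path,String[] steps)
   {
      int offset = path.size() - steps.length;
      if(offset < 0)
         return false;
      return path.subList(offset,path.size()).equals(Arrays.asList(steps));
   }

   // Walks down one branch of a document and records every XPath that
   // matches at each element. Each XPath is checked on its own, so a
   // match for one never hides a match for another.
   static List<String> matchesAlong(String[] branch,String... xpaths)
   {
      List<String> path = new ArrayList<>();
      List<String> hits = new ArrayList<>();
      for(String element : branch)
      {
         path.add(element);
         for(String xpath : xpaths)
            if(endsWith(path,xpath.split("/")))
               hits.add(xpath + " @ " + element);
      }
      return hits;
   }
}
```

Calling matchesAlong() with the branch { "sect1", "simpara", "ulink" } and the two XPaths reports sect1/simpara at simpara and simpara/ulink at ulink. A single automaton that stops exploring once sect1/simpara is recognized reports only the first.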

So what went wrong? When I set out looking for an algorithm, I looked at examples in a specific context (regular expressions) and I attempted to apply them to a completely different context. I saw some problems (e.g. the symbol space for XPath is unlimited) but I missed this one.

I had to choose between spending several columns revising the algorithm and releasing HC with a less satisfactory but stable technical solution (running multiple DFAs in parallel). Add to the mix a self-imposed deadline: deliver a working version in this column, then pause the project and go gain some practical experience, as I did with XM.

As a technician, I'm inclined to ignore deadlines in favour of a more elegant technical solution. Yet, as a consultant, I have learned that it's best to release a stable but slower product early. Nothing beats practical experience, and the second release is your best chance to optimize performance.


Using HC

To conclude this series on HC, here's a short how-to guide. Listing 7 is an HC application handler that formats a small subset of Docbook in HTML. Docbook is a popular DTD for technical publishing. The class defines methods for selected Docbook elements. @xpath tags mark the XPath. It also implements the HCHandler interface (which is not a lot of work given that HCHandler defines no methods; it is essentially a flag for the compiler).

The HC compiler differentiates start, end, and character events by their names. It is quite flexible when it comes to parameters. For example, notice that a characters method accepts the SAX character array or a String.

Run the HC compiler to create the corresponding table class:

javadoc -docletpath hc.jar;xerces.jar
-doclet org.ananas.hc.compiler.CompilerDoclet
-classpath hc.jar;xerces.jar -sourcepath src
-d autosrc org.ananas.hc.test.*

If we review the parameters one by one, -docletpath is the classpath to the doclet, -doclet selects the Doclet, -classpath is the classpath for the application handler (do not confuse them), and -d is the output directory.

The last parameter (org.ananas.hc.test.*) selects the package to use.

Compile the application (including the generated table class) with javac or jikes and run. Congratulations, you're ready to roll.


What's next?

As I mentioned in the introduction, I plan to stop developing HC in order to gain practical experience with it. It's far from complete, but as I have often stated in this column, I believe in pragmatically testing software to find out what needs to be improved.

I encourage you to give it a try. Download and install HC, test it, and report your findings on the ananas-discussion mailing list.

The next column will launch a third "Working XML" project. As always, the new project will be released as open source.


Resources

  • You can download the code for this project from ananas.org. Follow the links to the CVS repository on developerWorks as well as the ananas-discussion mailing list. I encourage you to join the list and contribute your thoughts to the project.

  • If you'd rather have a ZIP file, that's available too.

  • Jikes is an excellent Java compiler from IBM that greatly improves your build time.

  • Contrast this project with the SIA Parser from Robert Berlinski.

  • You might also look at the JaxMe framework that compiles an XML schema in Java classes using SAX parsers.

  • IBM WebSphere Studio Application Developer is an easy-to-use, integrated development environment for building, testing, and deploying J2EE (TM) applications, including generating XML documents from DTDs and schemas.

