Well, I finished off the WWW 2007 conference in Banff in fine form. There were quite a number of good papers in all. Given my interests, I of course enjoyed those in the XML and Web Data track the most.
There was one interesting presentation on an IBM Research project led by Noah Mendelsohn called iScreamer, which was noteworthy because it essentially provided a method for doing schema-validated XML parsing much faster than expat (which is considered quite fast and only checks for well-formedness). Of course, the performance comparisons were based on a SAX model. The upside is that this filters out application-specific work, but the downside is that you don't get a clear idea of how much a DOM-based application might benefit from this work. Who knows, maybe there will be future work on using iScreamer parsing to generate a DOM compatible with, say, Xerces. It would be nice, too, to understand why iScreamer parsers are faster than expat. Since the focus of the work seemed to be speedy schema-validated parsing, it is easy to believe that they might have nailed it and hence would be able to beat other schema-validating parsers. But beating expat makes me wonder how much of the performance improvement was due to the novel schema validation method being reported and how much was due to simply having more efficient low-level parse routines (which could also be applied to other parsers like expat and Xerces without adopting the novel schema method). Put another way, is there something at the lowest level of iScreamer that could be done to expat which would then make expat faster than iScreamer? Well, maybe some of the answers are in the paper or its predecessor (I was not able to read the paper as a referee because its authors are from IBM, so only now can I access the paper). Anyway, fast is fast, and that always gets my attention!
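To make the SAX versus DOM distinction concrete, here is a minimal sketch using Python's standard-library expat binding and minidom. It has nothing to do with iScreamer itself; the sample document and the function names are my own assumptions, chosen only to show that an event-driven parse keeps no tree while a DOM parse builds one, and that expat on its own checks well-formedness rather than schema validity.

```python
import xml.parsers.expat
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
DOC = "<order><item>pen</item><item>ink</item></order>"


def count_items_streaming(text):
    """Count <item> elements from expat's SAX-style callbacks, keeping no tree."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        if name == "item":
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(text, True)  # True marks this as the final chunk
    return count


def count_items_dom(text):
    """Count <item> elements after building a full in-memory DOM."""
    return len(minidom.parseString(text).getElementsByTagName("item"))


def is_well_formed(text):
    """expat rejects malformed markup but knows nothing about schemas."""
    try:
        xml.parsers.expat.ParserCreate().Parse(text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False
```

Both counting functions give the same answer, but the DOM version also pays for building the tree, which is exactly the application-side cost that a SAX-only benchmark leaves out.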
Speaking of which, there was a good paper on an efficient XML storage method: "Querying and Maintaining a Compact XML Storage". The paper is focused on being able to quickly perform various kinds of XPath twig queries on huge XML documents based on the creation of some kind of 'index' for the XML. The method used is so compact that there is even room to leave some empty spaces, enough so that efficient time bounds can be achieved on certain operations. I liked the use of amortized analysis to argue that the space added was enough so that only occasionally did more space need to be added, leading to overall highly efficient performance characteristics. To me, the most important contribution was not the ability to do the queries efficiently, but the ability to update the index more efficiently (more efficient in the amortized order of magnitude sense).
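The flavor of that amortized argument can be seen in a much simpler setting. The sketch below is not the paper's storage scheme; it is the textbook doubling array, with a class name and a copy counter that I made up for illustration. It leaves slack space so that expensive reorganizations happen only occasionally, and the total copying work stays proportional to the number of insertions.

```python
class SlackArray:
    """Append-only array that doubles its capacity whenever it fills up.

    Illustrative only: a stand-in for "leave some empty space", not the
    structure described in the paper.
    """

    def __init__(self):
        self.slots = [None]  # capacity 1, nothing stored yet
        self.size = 0
        self.copies = 0      # elements moved during all resizes so far

    def append(self, value):
        if self.size == len(self.slots):
            # Out of slack: allocate twice the space and move everything once.
            bigger = [None] * (2 * len(self.slots))
            for i in range(self.size):
                bigger[i] = self.slots[i]
                self.copies += 1
            self.slots = bigger
        self.slots[self.size] = value
        self.size += 1


def copies_per_insert(n):
    """Average number of element moves per append over n appends."""
    arr = SlackArray()
    for i in range(n):
        arr.append(i)
    return arr.copies / n
```

A single append can trigger a move of every element, yet the resizes are spaced far enough apart that the average cost per append stays below two moves no matter how large n gets.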
Honorable mentions (from me, anyway) would also go to "XML Design for Relational Storage" and "Visibly Pushdown Automata for Streaming XML". The first provided a normalization routine for XML that eliminated redundancy in the XML, as measured in the classic relational database sense. The second provided an interesting language theoretic model for describing various classes of streaming XML algorithms, which allows you to make certain types of statements and guarantees about the streaming algorithms. It's probably the most theoretical work in the bunch, but it gets you thinking in new ways, which is quite valuable.
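For anyone who has not met visibly pushdown automata before, the key idea is that the input symbol alone decides what happens to the stack: an opening tag always pushes, a closing tag always pops, and anything else leaves the stack alone. Here is a minimal sketch of that discipline applied to a stream of XML-like events. The event format and function name are my own assumptions, and this is only the simplest example of the idea, not a construction from the paper.

```python
def well_nested(events):
    """Check tag nesting over a stream of (kind, name) events.

    kind is "open", "close" or "text". The stack action depends only on
    the kind of event, which is the "visibly pushdown" restriction.
    """
    stack = []
    for kind, name in events:
        if kind == "open":        # call symbol: always push
            stack.append(name)
        elif kind == "close":     # return symbol: always pop
            if not stack or stack.pop() != name:
                return False
        # "text" is an internal symbol: the stack is untouched
    return not stack              # accept only if every tag was closed


# Hypothetical event stream for <a><b>hi</b></a>
GOOD = [("open", "a"), ("open", "b"), ("text", "hi"),
        ("close", "b"), ("close", "a")]
```

The function reads each event once and never looks back, so its memory use is bounded by the nesting depth of the document rather than its length, which is the kind of guarantee you want for a streaming algorithm.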
Our own W3C Track presentation on the "Rich Web Application Backplane" went very well. I felt really good about how the whole architecture story played out, and it fit really well as the front end of the talks by Rafah Hosn (IBM) and Mark Birbeck (x-port.net) as well as connecting to material on SCXML that Rafah presented in the morning session. After the talk, I learned that the W3C will be publishing the track presentation slides on its website, so I'll post a link when I get it. For my own part, I took the time to annotate all of my slides, so you will get much more of the talk I actually gave than you would from the slides alone.
Well, that's it for WWW 2007, and now I'm off to India to talk up XForms. More on that next time... stay tuned!