Simple API for XML (SAX)

Process XML by writing handlers for events from basic XML parsing

Simple API for XML (SAX) is an event-driven XML API: the parser reports the structure of a document as a stream of events, which it hands to application-specific handler code. Discover how SAX originated, and learn why it's considered to be one of the most efficient, yet one of the most difficult, ways to process XML.

Contributors:  Community specification

25 April 2007 (First published 06 February 2007)

Simple API for XML (SAX), a community specification, is an event-driven API. You register handler code for specific events that are triggered by different parts of the XML markup (such as start and end tags, text, and entities). As the parser reads the input XML, it sends a stream of these events, and the handler code processes each one in turn.
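
As a rough illustration of this registration model, here is a minimal sketch that uses Python's standard `xml.sax` module, one of the ports of SAX beyond the Java language. The `EventLogger` class and the `log_events` helper are names invented for this example and are not part of SAX itself. The handler simply records each event it receives, in the order the parser sends it.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each SAX event in arrival order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called once for every start tag; attrs behaves like a mapping.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        # Called once for every end tag.
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may deliver one run of text in several chunks;
        # whitespace-only chunks are skipped to keep the log short.
        if content.strip():
            self.events.append(("text", content))


def log_events(xml_text):
    """Parse xml_text and return the list of events the handler saw."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

Calling `log_events('<note id="1"><to>Ada</to></note>')` should yield the start events for `note` and `to`, the text, and then the two end events, all in document order. Nothing is held in memory beyond what the handler chooses to keep.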

SAX was essentially created on a marathon thread starting in late 1997 on the XML-DEV mailing list, which has long been the prime habitat for XML experts. David Megginson led the discussion, and the result was one of the most successful XML initiatives, with no large company or standards-body sponsorship. Before SAX, each parser had its own peculiar API for communicating XML structure to handler code, and SAX provided important unification. In general, parsers provide SAX drivers that translate low-level parser events into SAX standard events, allowing for portable code. SAX was developed with the Java™ language in mind, but it has become popular across numerous languages and environments, although sometimes its Java-centricity complicates porting. SAX is currently in its second generation, which includes XML namespace processing and optional reporting of certain events relating to document structure.

In mainstream languages, event-based interfaces are usually implemented using callback functions, a style familiar in graphical user interface (GUI) programming and the like. In object-oriented languages, callbacks are usually registered methods for an object, using polymorphism to match the method name to the handler code, and using encapsulation to manage state in the handler between callbacks. This overall model of event-based programming is known as a push model and has a reputation for being difficult for many programmers to master. Most models that are considered easier to program, however, require random access to the document, and thus can lead to inefficiencies. As a result, SAX has the reputation for being the most efficient standard way to process XML, if far from the easiest.
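
To make the point about state between callbacks concrete, the following sketch (again Python's `xml.sax`, with the hypothetical names `TitleCollector` and `collect_titles`, and an assumed document vocabulary that uses a `title` element) gathers the text of every `title` element. Because the parser pushes events one at a time, the handler object has to remember for itself whether it is currently inside a `title` and accumulate the text chunks until the matching end tag arrives.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of each <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []        # finished titles, in document order
        self._in_title = False  # state carried between callbacks
        self._buffer = []       # text chunks of the title being read

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text can arrive in several pieces (for example around an
        # entity reference), so chunks are buffered until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_text):
    """Parse xml_text and return the text of all <title> elements."""
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```

Even in this small case the logic is spread across three methods and two pieces of bookkeeping state, which hints at why the push model is considered hard to master. In exchange, the handler never needs more than the current title in memory, regardless of the size of the document.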
