To take better advantage of core language strengths, developers have created XML processing APIs that are native to particular languages. Almost all well-known languages have one or more toolkits offering such an API. For some time, the conventional wisdom has been that it's best to stick to SAX and DOM for maximum portability, but experience has convinced me that this consideration is more often than not overstated. For one thing, because the SAX and DOM bindings for different languages deviate from one another, code is rarely truly portable across languages; adapting it from one language to another still takes considerable work. Using SAX and DOM usually does improve portability between implementations in the same language, but this has to be weighed against the productivity the programmer often loses by forfeiting language strengths.
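To make the productivity point concrete, here is a minimal Python sketch that collects the same titles twice. The sample document and the function names are assumptions for illustration. The first version uses the standard library's SAX interface, where the parser pushes callbacks and the handler has to track state by hand; the second uses ElementTree, where an ordinary list comprehension does the job.

```python
import xml.sax
import xml.etree.ElementTree as ET

# A small sample document, assumed for illustration.
DOC = b"""<catalog>
  <book><title>XML Basics</title></book>
  <book><title>Pull Parsing</title></book>
</catalog>"""


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: the parser pushes callbacks, so state is tracked by hand."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_elementtree(data):
    # Native style: parse to a tree and use an ordinary list comprehension.
    root = ET.fromstring(data)
    return [title.text for title in root.iter("title")]


print(titles_with_sax(DOC))          # ['XML Basics', 'Pull Parsing']
print(titles_with_elementtree(DOC))  # ['XML Basics', 'Pull Parsing']
```

Both functions return the same list; the difference lies in how much bookkeeping the SAX version needs to get there.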
One area where developers in several languages independently made early explorations is the pull API: a system, often built as a wrapper around a SAX-style parser, that lets one pull events from the parser rather than having them pushed. This adjustment generally allows for more straightforward code, and implementations usually make greater use of native language constructs than pure SAX or DOM do. Java Specification Request (JSR) #173, the Streaming API for XML (StAX), defines a Java API for pull-parsing XML. Other pull APIs include libxml2's xmlTextReader, available for C, C++, Python, Perl, and the many other languages that have libxml2 wrappers. Python comes with an xml.dom.pulldom module, which offers a pull API.
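As a rough illustration of the pull style, the following sketch uses Python's standard xml.dom.pulldom module. The sample document and the order_items helper are hypothetical. The loop pulls events one at a time and calls expandNode only for the elements it actually needs, so only those subtrees are assembled into full DOM structures.

```python
from xml.dom import pulldom

# Hypothetical sample document.
DOC = """<orders>
  <order id="1"><item>pen</item></order>
  <order id="2"><item>ink</item></order>
</orders>"""


def order_items(text):
    """Map each order id to its item name by pulling parser events."""
    items = {}
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            # expandNode builds a small DOM subtree for just this element.
            events.expandNode(node)
            item = node.getElementsByTagName("item")[0]
            items[node.getAttribute("id")] = item.firstChild.data
    return items


print(order_items(DOC))  # {'1': 'pen', '2': 'ink'}
```

The control flow stays in the application's own loop, which is what makes pull code read more like ordinary iteration than SAX callbacks do.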
Marshallers and XML data structures
Another early convention apart from SAX and DOM was developing tools that turn XML into generic data structures native to the language (a process called unmarshalling) and vice versa (marshalling). The idea is to make developers in a specific language feel at home, without having to think much about the XML behind the data. Unfortunately, many developers are hostile to XML, and this is often the only way they find it palatable. But even for those who are comfortable with XML, marshalling tools are useful for quick-and-dirty processing:
- JDOM is a DOM-like API that sticks strictly to Java-language idioms.
- Python users have ElementTree, which creates a specialized data structure from XML, focusing on elements.
- Perl users have the now rather dated XML::Grove, which interchanges parsed XML, HTML, or SGML with a tree of Perl hashes.
- Ruby users have XMLification for very simple translation of Ruby objects to XML.
- An option for PHP is class_path_parser.php, which allows you to register XPath-like expressions for an XML source and dispatches PHP handler functions accordingly.
- An option for Haskell is Haskell2Xml, which allows you to read and write ordinary Haskell data as XML documents.
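The following minimal Python sketch shows the unmarshalling and marshalling round trip in its simplest form, using ElementTree for parsing and serialization. The unmarshal and marshal functions are hypothetical helpers, not part of any toolkit named above. They map leaf elements to strings and repeated child elements to lists, and they deliberately ignore attributes and mixed content, which is the kind of information quick-and-dirty marshalling tends to lose.

```python
import xml.etree.ElementTree as ET


def unmarshal(elem):
    """Turn an element into native data: text for leaves, a dict of lists otherwise.

    Attributes and mixed content are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        return elem.text or ""
    result = {}
    for child in children:
        # Group children by tag name, keeping document order within each tag.
        result.setdefault(child.tag, []).append(unmarshal(child))
    return result


def marshal(tag, data):
    """Inverse of unmarshal: build an element tree from native data."""
    elem = ET.Element(tag)
    if isinstance(data, dict):
        for child_tag, values in data.items():
            for value in values:
                elem.append(marshal(child_tag, value))
    else:
        elem.text = data
    return elem


xml_in = ("<person><name>Ada</name>"
          "<email>a@example.org</email><email>ada@example.org</email></person>")
data = unmarshal(ET.fromstring(xml_in))
print(data)
# {'name': ['Ada'], 'email': ['a@example.org', 'ada@example.org']}
print(ET.tostring(marshal("person", data), encoding="unicode") == xml_in)  # True
```

Once unmarshalled, the data is just dicts, lists, and strings, so the rest of the program never has to touch an XML API.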
XML data bindings
A twist on marshalling that is emerging as a popular option is to use XML schemas or other sources, such as the documents themselves, to create data structures in the native language that use the vocabulary expressed in the XML document. Such systems are called XML data bindings, and in many cases they lead to the most natural possible manipulation of XML. Java technology users can look to JSR #31, "XML Data Binding Specification," which defines the Java Architecture for XML Binding (JAXB). The Castor, JBind, and JiBX tools offer some features similar to JAXB's. Python users have Anobind, gnosis.xml.objectify, and xmltramp, which work by directly inspecting the source XML, and generateDS.py, which uses a W3C XML Schema to drive the binding. An option for Perl is XML::Smart.
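To suggest how a binding by direct inspection feels in use, here is a minimal Python sketch in the spirit of tools such as gnosis.xml.objectify and xmltramp. It does not reproduce either tool's API: the bind function is hypothetical, it is built on ElementTree and types.SimpleNamespace, and it assumes documents without namespaces or mixed content. XML attributes and child elements become Python attributes named after the document's own vocabulary, and repeated elements become lists. A schema-driven tool such as generateDS.py would instead generate named classes ahead of time.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from types import SimpleNamespace


def bind(elem):
    """Bind an element to an object whose attribute names mirror the XML vocabulary.

    Assumes no namespaces and no mixed content; a child element that shares a
    name with an XML attribute on the same element is not handled.
    """
    # XML attributes become Python attributes (values stay strings).
    obj = SimpleNamespace(**elem.attrib)
    counts = Counter(child.tag for child in elem)
    for child in elem:
        # Leaf elements without attributes bind to their text; others recurse.
        if len(child) or child.attrib:
            value = bind(child)
        else:
            value = child.text or ""
        if counts[child.tag] > 1:
            # Repeated elements bind to a list, in document order.
            vars(obj).setdefault(child.tag, []).append(value)
        else:
            setattr(obj, child.tag, value)
    return obj


# Hypothetical sample document.
DOC = """<invoice number="42">
  <customer><name>Ada</name></customer>
  <line><sku>A1</sku><qty>2</qty></line>
  <line><sku>B2</sku><qty>1</qty></line>
</invoice>"""

invoice = bind(ET.fromstring(DOC))
print(invoice.number)                       # 42
print(invoice.customer.name)                # Ada
print([line.sku for line in invoice.line])  # ['A1', 'B2']
```

The calling code reads in terms of invoices, customers, and lines rather than elements and nodes, which is the main appeal of a data binding.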
So regardless of which language you prefer, you have many options for processing XML. Don't be afraid to put aside conventional wisdom and look for options beyond the ruling pair of SAX and DOM.
- Check out JSR #173, a pull API for the Java language. You can also learn more about StAX in Berthold Daum's series of developerWorks tips:
  - "Parsing XML documents partially with StAX" (December 2003)
- Learn about Libxml2's XmlTextReader Interface, a pull API for C, C++, Perl, Python, and other languages.
- For a Perl data structure for XML, SGML, and HTML, take a look at XML::Grove.
- Translate Ruby objects into XML using XMLification.
- Work with class_path_parser.php to dispatch PHP functions against XML.
- Visit the HaXml project. It includes Haskell2Xml, which interchanges XML with ordinary Haskell data.
- Try JDOM, a Java-centric DOM variation.
- Look into ElementTree, an easy-to-use Python data structure representing XML documents.
- gnosis.xml.objectify and xmltramp are basic Python data binding tools. For more on gnosis.xml.objectify, check out David Mertz's XML Matters column here on developerWorks.
- Need a data binding for Perl? Try XML::Smart.
- Find a broad array of articles, columns, tutorials, and tips on Web services and XML in the developerWorks Web services and XML content areas. For a complete list of XML tips to date, check out the tips summary page.