Level: Introductory
Parand Darugar (tdarugar@yahoo.com), Head of architecture, Yahoo! Search Marketing Services
01 Jun 2001
In this article, updated June 2001, find out about more than 20 of the essential tools, libraries, and modules needed for XML development with Perl. Use the tables of resources to quickly locate the elements that enable you to assemble a powerful toolkit for XML manipulation.
Perl offers a rich set of modules and libraries for the XML developer, rivaling that of any other language. The Perl community was quick to come up with XML tools in the early days. The Perl/XML community remains quite active, not only supporting new protocols and standards with amazing speed, but also playing an active role in the general advancement of XML. Perl's extensibility allows the easy integration of C and C++ modules within the Perl framework, offering a combination of speed and ease of use. The following tools are favorites selected from my experiences developing Perl/XML tools and applications, as well as gems gleaned from various mailing lists, magazines, and Web pages. These tools will help you develop professional XML-based applications in no time.
Parsers and object models
XML parsers have been available for Perl since the early days of XML. The XML::Parser module, a Perl interface for James Clark's excellent expat parser, serves as the basis for most other parsing
and manipulation modules. XML::Simple provides an intuitive, pure-Perl parser for simple XML files, and the SAX API is supported by most of the parsing modules. Perl also enjoys excellent support for various XML object models, including
DOM, Grove, and Twig. A rich variety of packages offer DOM or DOM-like
processing options, including the pure-Perl XML::DOM module, XML::LibXML,
XML::XPath, Orchard, and the soon-to-be-released Sablotron::DOM package.
Alternative processing models are also available, via the Grove, Twig, and
PYX modules. Twig is particularly useful for large documents, allowing
processing of segments of the document without loading the entire document into memory.
| XML::Parser | Perl interface to James Clark's XML parser, expat |
| XML::Simple | Trivial API for reading and writing XML, optimized for use with config files in XML format |
| XML::XPath | A complete implementation of the XPath specification. |
| XML::DOM | Perl extension to XML::Parser to build an object-oriented data structure with a DOM Level 1-compliant interface. Distributed as part of libxml-enno. |
| XML::LibXML | Perl interface to the gnome libxml2 library for high performance DOM processing. |
| XML::Grove | Simple access to the information set of parsed XML, HTML, or SGML instances using a tree of Perl hashes |
| XML::Twig | Tree interface to XML documents allowing processing chunk by chunk of huge documents |
| libxml-perl | Collection of Perl modules, scripts, and documents for working with XML in Perl. libxml-perl software works in combination with XML::Parser, PerlSAX, XML::DOM, XML::Grove and others. |
| XML::Schematron | XSLT-based XML validation module |
| Xerces Perl | Perl interface to the Xerces XML parser from the Apache XML Project |
| REX | Shallow parsing of XML documents with regular expressions |
| PYX | XML to PYX generator |
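As a rough illustration of the chunk-by-chunk processing that XML::Twig offers, the sketch below handles each `<order>` element as soon as it has been parsed and then discards it. It assumes XML::Twig is installed from CPAN; the element names and sample data are hypothetical and not taken from any real feed.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; a real document would be far larger.
my $xml = <<'XML';
<orders>
  <order id="1"><total>10.50</total></order>
  <order id="2"><total>4.25</total></order>
  <order id="3"><total>7.00</total></order>
</orders>
XML

my @ids;
my $sum = 0;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time an <order> element has been fully parsed.
        order => sub {
            my ($t, $order) = @_;
            push @ids, $order->att('id');
            $sum += $order->first_child_text('total');
            # Free the memory used by everything parsed so far.
            $t->purge;
        },
    },
);
$twig->parse($xml);

printf "%d orders, total %.2f\n", scalar @ids, $sum;
# prints "3 orders, total 21.75"
```

Because the handler purges the twig after each element, memory use should stay roughly constant regardless of how many `<order>` elements the document contains.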
XML convertors, writers, and readers
Perl is known for its wealth of options for connecting to all types
of legacy systems. With these connections, and facilities for converting
Perl data structures to XML, Perl presents an excellent platform for
creating XML interfaces for existing systems. Extensions such as
XML::Edifact, DBIx::XML_RDB, XML::CSV, XML::Generator, XML::Dumper,
and XML::Writer handle various aspects of serialization and
deserialization of data between Perl data structures, XML, and
other formats.
| XML::Generator | Module for the generation of XML from within Perl |
| XML::Writer | Helper module for Perl programs that write XML documents. The module handles all escaping for attribute values and character data and constructs different types of markup. |
| XML::Edifact | Module for translating UN/Edifact documents to XML |
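The following sketch shows one way a Perl data structure might be serialized with XML::Writer, relying on the module to escape special characters in the data. It assumes XML::Writer is installed from CPAN; the `@customers` records and the element names are hypothetical stand-ins for data pulled from a legacy system.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical records, standing in for rows from a legacy system.
my @customers = (
    { id => 1, name => 'Smith & Sons' },
    { id => 2, name => 'A <B> Trading' },
);

# Collect the generated XML in a string instead of writing to STDOUT.
my $xml = '';
my $writer = XML::Writer->new(OUTPUT => \$xml);

$writer->startTag('customers');
for my $c (@customers) {
    $writer->startTag('customer', id => $c->{id});
    $writer->characters($c->{name});   # special characters are escaped here
    $writer->endTag('customer');
}
$writer->endTag('customers');
$writer->end;   # checks that the document is complete

print $xml;
```

Letting the writer do the escaping avoids the malformed output that tends to result from building XML by string concatenation.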
Protocols and libraries
In general, new protocols and standards are quickly supported in Perl.
Extensions exist for SOAP, WDDX, RSS, XML-RPC, and Microsoft's BizTalk.
| SOAP::Lite | An excellent implementation of the SOAP protocol |
| SOAP/Perl | XML-based protocol for accessing services, objects, and servers in a platform-independent manner |
| WDDX.pm | Protocol for exchange of data between different languages such as Perl, Java, and Cold Fusion. This module converts Perl variables to and from WDDX packets. |
| XML::RSS | Basic framework for creating and maintaining Rich Site Summary (RSS) files. RSS is primarily used for distributing news headlines, commonly called channels. |
Commercial products
Most Perl extensions enjoy community support under the
open-source model. There are far fewer commercially supported
systems and packages for Perl than there are for languages
such as Java. VelociGen XML Server, a commercial product (from the company where I work), leverages Perl
as the language for exchange and processing of XML documents,
and creation of Web services-based applications. Commercial
support is also available for the open-source AxKit, which
offers Web publishing and content management using XML.
| AxKit | Open-source XML Web publishing and content management |
| VelociGenX | Web Services platform with database and legacy system connectivity. Exposes Perl interface for parsing, manipulating, and transforming XML documents. |
Style sheets and query languages
XML has spawned a set of related standards for querying and transforming data. Two of the most popular are Extensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath). XPath provides a common syntax and functionality for addressing and searching parts of XML documents. XSLT uses XPath to allow transformations of XML documents to other XML documents.
| XML::XSLT | Perl implementation of XSL template processing. XML::XSLT performs transformations specified in an XSL style sheet to XML files. |
| XML::XPath | Implementation of the W3C's XPath specification |
| XML::QL | Implementation of W3C notes called "XML-QL: A Query Language for XML." Allows the user to query an XML document much like a database and describe a construct for output. |
| XML::XQL | Perl implementation of XQL specification, allowing XQL queries on XML tree structures such as XML::DOM. Distributed as part of libxml-enno. |
| XML::LibXSLT | Perl interface to the gnome libxslt library for high performance XSLT processing. |
| XML::XSLT::Wrapper | Generic wrapper for the various XSLT modules |
| XML::Xalan | Perl interface to the Apache Xalan XSLT library |
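To make the idea of XPath addressing concrete, here is a minimal sketch that uses XML::XPath to select elements by an attribute value. It assumes XML::XPath is installed from CPAN; the `<library>` document and its titles are made up for illustration.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document.
my $xml = <<'XML';
<library>
  <book year="1999"><title>Perl Basics</title></book>
  <book year="2001"><title>XML in Practice</title></book>
  <book year="2001"><title>Web Services</title></book>
</library>
XML

my $xp = XML::XPath->new(xml => $xml);

# Select the <title> of every <book> whose year attribute is 2001.
my @titles = map { $_->string_value }
             $xp->findnodes('/library/book[@year="2001"]/title');

print "$_\n" for @titles;
# prints "XML in Practice" and "Web Services", one per line
```

The same path expression could be used inside an XSLT `select` or `match` attribute, which is how XSLT builds on XPath.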
Database interfaces
Perl has long enjoyed excellent database support via the DBD/DBI modules. DBIx::XML_RDB uses these modules, building an XML wrapper around any popular database. XML::CSV provides similar support for text-delimited files, such as the popular comma-separated values and tab-delimited formats.
| DBIx::XML_RDB | Perl extension for creating XML from existing DBI data sources such as databases |
| XML::CSV | Converts comma separated values to XML |
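The kind of conversion these modules perform can be sketched in plain Perl. The example below is not the API of XML::CSV or DBIx::XML_RDB; it is a simplified stand-in that assumes a header row, no quoted fields, and column names that are valid XML element names. The helper names `xml_escape` and `csv_to_xml` are hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Convert simple CSV text (header row first, no quoted fields) to XML.
sub csv_to_xml {
    my ($csv, $root, $row_name) = @_;
    my @lines = grep { length } split /\r?\n/, $csv;
    my @cols  = split /,/, shift @lines;
    my $xml   = "<$root>\n";
    for my $line (@lines) {
        my @vals = split /,/, $line, -1;   # -1 keeps trailing empty fields
        $xml .= "  <$row_name>\n";
        for my $i (0 .. $#cols) {
            my $val = defined $vals[$i] ? $vals[$i] : '';
            $xml .= "    <$cols[$i]>" . xml_escape($val) . "</$cols[$i]>\n";
        }
        $xml .= "  </$row_name>\n";
    }
    return $xml . "</$root>\n";
}

# Hypothetical input data.
my $csv = "id,name\n1,Smith & Sons\n2,Acme\n";
print csv_to_xml($csv, 'customers', 'customer');
```

A production system would more likely rely on the modules listed above, which deal with quoting, encodings, and database access that this sketch ignores.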
About the author
Parand Tony Darugar is the head of architecture for Yahoo! Search Marketing Services (formerly Overture). His interests include Web services and Service Oriented Architectures (SOA), XML, high-performance business systems, distributed architectures, and artificial intelligence. You can reach him at tdarugar@yahoo.com.