Perl offers a rich set of modules and libraries for the XML developer, rivaling that of any other language. The Perl community was quick to come up with XML tools in the early days. The Perl/XML community remains quite active, not only supporting new protocols and standards with amazing speed, but also playing an active role in the general advancement of XML. Perl's extensibility allows the easy integration of C and C++ modules within the Perl framework, offering a combination of speed and ease of use.
The following tools are favorites selected from my experiences developing Perl/XML tools and applications, as well as gems gleaned from various mailing lists, magazines, and Web pages. These tools will help you develop professional XML-based applications in no time.
XML parsers have been available for Perl since the early days of XML. The XML::Parser module, a Perl interface for James Clark's excellent expat parser, serves as the basis for most other parsing and manipulation modules. XML::Simple provides an intuitive, pure-Perl parser for simple XML files, and the SAX API is supported by most of the parsing modules.
Perl also enjoys excellent support for various XML object models, including DOM, Grove, and Twig. A rich variety of packages offer DOM or DOM-like processing options, including the pure-Perl XML::DOM module, XML::LibXML, XML::XPath, Orchard, and the soon-to-be-released Sablotron::DOM package. Alternative processing models are also available, via the Grove, Twig, and PYX modules. Twig is particularly useful for large documents, because it lets you process a document one segment at a time without holding the entire tree in memory.
| XML::Parser | Perl interface to James Clark's XML parser, expat |
| XML::Simple | Trivial API for reading and writing XML, optimized for use with config files in XML format |
| XML::XPath | A complete implementation of the XPath specification. |
| XML::DOM | Perl extension to XML::Parser to build an object-oriented data structure with a DOM Level 1-compliant interface. Distributed as part of libxml-enno. |
| XML::LibXML | Perl interface to the GNOME libxml2 library for high-performance DOM processing. |
| XML::Grove | Simple access to the information set of parsed XML, HTML, or SGML instances using a tree of Perl hashes |
| XML::Twig | Tree interface to XML documents allowing processing chunk by chunk of huge documents |
| libxml-perl | Collection of Perl modules, scripts, and documents for working with XML in Perl. libxml-perl software works in combination with XML::Parser, PerlSAX, XML::DOM, XML::Grove and others. |
| XML::Schematron | XSLT-based XML validation module |
| Xerces Perl | Perl interface to the Xerces XML parser from the Apache XML Project |
| REX | Shallow parsing of XML documents with regular expressions |
| PYX | XML to PYX generator |
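The chunk-by-chunk processing that XML::Twig offers can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: the `<orders>` document and its element names are hypothetical sample data, and the handler simply counts orders and sums their totals. The `twig_handlers` option, `first_child_text`, and `purge` are part of the XML::Twig API.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; a real feed could hold many thousands of <order> elements.
my $xml = <<'XML';
<orders>
  <order id="1"><customer>Ann</customer><total>19.50</total></order>
  <order id="2"><customer>Bob</customer><total>5.25</total></order>
  <order id="3"><customer>Cy</customer><total>10.00</total></order>
</orders>
XML

my $count = 0;
my $sum   = 0;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <order> element has been parsed.
        order => sub {
            my ($t, $order) = @_;
            $count++;
            $sum += $order->first_child_text('total');
            # Discard what has been parsed so far so memory use stays flat.
            $t->purge;
        },
    },
);

# Parse the in-memory string; parsefile($path) does the same for a file on disk.
$twig->parse($xml);

printf "%d orders, total %.2f\n", $count, $sum;
```

Calling `purge` inside the handler is what keeps the memory footprint small: each order is handled and then released before the parser moves on to the next one.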
XML converters, writers, and readers
Perl is known for its wealth of options for connecting to all types of legacy systems. With these connections, and facilities for converting Perl data structures to XML, Perl presents an excellent platform for creating XML interfaces for existing systems. Extensions such as XML::Edifact, DBIx::XML_RDB, XML::CSV, XML::Generator, XML::Dumper, and XML::Writer handle various aspects of serialization and deserialization of data between Perl data structures, XML, and other formats.
| XML::Generator | Module for the generation of XML from within Perl |
| XML::Writer | Helper module for Perl programs that write XML documents. The module handles all escaping for attribute values and character data and constructs different types of markup. |
| XML::Edifact | Module for translating UN/Edifact documents to XML |
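As one possible illustration of serializing a Perl data structure to XML, the sketch below uses XML::Writer to emit a single record. The `%part` hash and its field names are hypothetical, standing in for data pulled from an existing system. Capturing output through a scalar reference assumes a reasonably recent XML::Writer release.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical record, standing in for a row read from a legacy system.
my %part = (sku => 'A-100', name => 'Nuts & Bolts', qty => 12);

# Collect the generated markup in a string instead of printing to STDOUT.
my $xml    = '';
my $writer = XML::Writer->new(OUTPUT => \$xml);

$writer->startTag('part', sku => $part{sku});   # attributes are passed as name/value pairs
$writer->dataElement('name', $part{name});      # the ampersand is escaped by the module
$writer->dataElement('qty',  $part{qty});
$writer->endTag('part');
$writer->end();                                  # checks that the document is complete

print "$xml\n";
```

Because the module does the escaping, the ampersand in the part name reaches the output as `&amp;` without any manual handling.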
In general, new protocols and standards are quickly supported in Perl. Extensions exist for SOAP, WDDX, RSS, XML-RPC, and Microsoft's BizTalk.
| SOAP::Lite | An excellent implementation of the SOAP protocol |
| SOAP/Perl | XML-based protocol for accessing services, objects, and servers in a platform-independent manner |
| WDDX.pm | Protocol for exchange of data between different languages such as Perl, Java, and Cold Fusion. This module converts Perl variables to and from WDDX packets. |
| XML::RSS | Basic framework for creating and maintaining Rich Site Summary (RSS) files. RSS is primarily used for distributing news headlines, commonly called channels. |
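To give a feel for these protocol modules, here is a small sketch that builds an RSS channel with XML::RSS and then reads it back. The channel title, links, and headline are hypothetical placeholders. The `channel`, `add_item`, `as_string`, and `parse` methods are part of the XML::RSS interface.

```perl
use strict;
use warnings;
use XML::RSS;

# Build a small RSS 1.0 channel; all titles and URLs here are placeholders.
my $rss = XML::RSS->new(version => '1.0');

$rss->channel(
    title       => 'Example News',
    link        => 'http://www.example.com/',
    description => 'Headlines from a hypothetical site',
);

$rss->add_item(
    title => 'Perl and XML',
    link  => 'http://www.example.com/perl-xml.html',
);

my $feed = $rss->as_string;
print $feed;

# Read the feed back in and list its headlines.
my $reader = XML::RSS->new;
$reader->parse($feed);
print "$_->{title}\n" for @{ $reader->{items} };
```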
Most Perl extensions enjoy community support under the open-source model. There are far fewer commercially supported systems and packages for Perl than there are for languages such as Java.
VelociGen XML Server, a commercial product (from the company where I work), leverages Perl as the language for exchange and processing of XML documents, and creation of Web services-based applications. Commercial support is also available for the open-source AxKit, which offers Web publishing and content management using XML.
| AxKit | Open-source XML Web publishing and content management |
| VelociGenX | Web Services platform with database and legacy system connectivity. Exposes Perl interface for parsing, manipulating, and transforming XML documents. |
Style sheets and query languages
XML has spawned a set of related standards for querying and transforming data. Two of the most popular are eXtensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath). XPath provides a common syntax and functionality for addressing and searching parts of XML documents. XSLT uses XPath to allow transformations of XML documents to other XML documents.
| XML::XSLT | Perl implementation of XSL template processing. XML::XSLT applies the transformations specified in an XSL style sheet to XML files. |
| XML::XPath | Implementation of the W3C's XPath specification |
| XML::QL | Implementation of the W3C note "XML-QL: A Query Language for XML." Lets the user query an XML document much like a database and describe a construct for the output. |
| XML::XQL | Perl implementation of XQL specification, allowing XQL queries on XML tree structures such as XML::DOM. Distributed as part of libxml-enno. |
| XML::LibXSLT | Perl interface to the GNOME libxslt library for high-performance XSLT processing. |
| XML::XSLT::Wrapper | Generic wrapper for the various XSLT modules |
| XML::Xalan | Perl interface to the Apache Xalan XSLT library |
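The relationship between XPath and XSLT can be sketched with XML::LibXML and XML::LibXSLT, two of the modules listed in this article. This is an illustrative example only: the `<library>` document and the style sheet are hypothetical. The first half runs an XPath expression directly; the second half applies a style sheet whose `match` and `select` attributes are themselves XPath expressions.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical source document.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<library>
  <book year="1999"><title>Old Book</title></book>
  <book year="2001"><title>New Book</title></book>
</library>
XML

# XPath on its own: titles of books published after 2000.
my @titles = map { $_->textContent }
    $doc->findnodes('/library/book[@year > 2000]/title');
print "XPath match: $_\n" for @titles;

# XSLT: the match and select attributes below are XPath expressions.
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/library">
    <titles>
      <xsl:for-each select="book">
        <t><xsl:value-of select="title"/></t>
      </xsl:for-each>
    </titles>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $result     = $stylesheet->transform($doc);

# Serialize the result according to the style sheet's xsl:output settings.
my $out = $stylesheet->output_as_bytes($result);
print $out;
```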
Perl has long enjoyed excellent database support via the DBD/DBI modules. DBIx::XML_RDB uses these modules, building an XML wrapper around any popular database. XML::CSV provides similar support for text-delimited files, such as the popular comma-separated values and tab-delimited formats.
| DBIx::XML_RDB | Perl extension for creating XML from existing DBI data sources such as databases |
| XML::CSV | Converts comma-separated values to XML |
- CPAN is always an excellent source for finding Perl XML modules.
- To learn more about manipulation of XML documents with Perl and other scripting languages, see XML and scripting languages on developerWorks.

Parand Tony Darugar is the head of architecture for Yahoo! Search Marketing Services (formerly Overture). His interests include Web services and Service Oriented Architectures (SOA), XML, high-performance business systems, distributed architectures, and artificial intelligence. You can reach him at tdarugar@yahoo.com.