
Perl developers: Fill your XML toolbox

Essential tools and libraries for using XML with Perl



Level: Introductory

Parand Darugar (tdarugar@yahoo.com), Head of architecture, Yahoo! Search Marketing Services

01 Jun 2001

In this article updated June 2001, find out about more than 20 of the essential tools, libraries, and modules needed for XML development with Perl. Use the table of resources to quickly locate the elements that enable you to assemble a powerful toolkit for XML manipulation.

Perl offers a rich set of modules and libraries for the XML developer, rivaling that of any other language. The Perl community was quick to come up with XML tools in the early days. The Perl/XML community remains quite active, not only supporting new protocols and standards with amazing speed, but also playing an active role in the general advancement of XML. Perl's extensibility allows the easy integration of C and C++ modules within the Perl framework, offering a combination of speed and ease of use.

The following tools are favorites selected from my experiences developing Perl/XML tools and applications, as well as gems gleaned from various mailing lists, magazines, and Web pages. These tools will help you develop professional XML-based applications in no time.

Parsers and object models

XML parsers have been available for Perl since the early days of XML. The XML::Parser module, a Perl interface to James Clark's excellent expat parser, serves as the basis for most other parsing and manipulation modules. XML::Simple provides an intuitive interface, itself written in pure Perl, for reading and writing simple XML files, and most of the parsing modules support the SAX API.
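As a rough illustration of the XML::Simple style of access, the sketch below reads a small configuration document into nested Perl data structures. The document, its element names, and the variable names are invented for this example; the ForceArray and KeyAttr options are set explicitly so the shape of the result does not depend on XML::Simple's defaults.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical configuration document, invented for this sketch.
my $xml = <<'XML';
<config>
  <logdir>/var/log/app</logdir>
  <server name="alpha" port="8080"/>
  <server name="beta" port="8081"/>
</config>
XML

# ForceArray keeps <server> as a list even when only one is present;
# KeyAttr => [] turns off the default folding of lists into hashes
# keyed on attributes such as "name".
my $config = XMLin($xml, ForceArray => ['server'], KeyAttr => []);

print "Log directory: $config->{logdir}\n";
for my $server (@{ $config->{server} }) {
    print "$server->{name} listens on port $server->{port}\n";
}
```

The root element is dropped by default, so its children appear directly as hash keys. This convenience suits config-style files; documents with mixed content or significant element order are usually better served by one of the tree or event APIs.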

Perl also enjoys excellent support for various XML object models, including DOM, Grove, and Twig. A rich variety of packages offer DOM or DOM-like processing options, including the pure-Perl XML::DOM module, XML::LibXML, XML::XPath, Orchard, and the soon-to-be-released Sablotron::DOM package. Alternative processing models are also available through the Grove, Twig, and PYX modules. Twig is particularly useful for large documents, because it can process a document one segment at a time without holding the entire document in memory.

XML::Parser - Perl interface to James Clark's XML parser, expat
XML::Simple - Trivial API for reading and writing XML, optimized for use with config files in XML format
XML::XPath - A complete implementation of the XPath specification.
XML::DOM - Perl extension to XML::Parser to build an object-oriented data structure with a DOM Level 1-compliant interface. Distributed as part of libxml-enno.
XML::LibXML - Perl interface to the gnome libxml2 library for high performance DOM processing.
XML::Grove - Simple access to the information set of parsed XML, HTML, or SGML instances using a tree of Perl hashes
XML::Twig - Tree interface to XML documents allowing processing chunk by chunk of huge documents
libxml-perl - Collection of Perl modules, scripts, and documents for working with XML in Perl. libxml-perl software works in combination with XML::Parser, PerlSAX, XML::DOM, XML::Grove and others.
XML::Schematron - XSLT-based XML validation module
Xerces Perl - Perl interface to the Xerces XML parser from the Apache XML Project
REX - Shallow parsing of XML documents with regular expressions
PYX - XML to PYX generator
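The chunk-by-chunk processing that XML::Twig offers can be sketched as follows. This is a minimal example under stated assumptions: the order feed and its element names are invented, and the document is parsed from a string for brevity (a large document would more likely be read with parsefile). A handler runs each time an order element has been fully parsed, and purge then releases what has been seen so far.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical order feed, invented for this sketch; a real feed might be
# far too large to hold in memory as a single tree.
my $xml = <<'XML';
<orders>
  <order id="1"><customer>Ann</customer><total>19.50</total></order>
  <order id="2"><customer>Bob</customer><total>5.25</total></order>
  <order id="3"><customer>Cy</customer><total>100.00</total></order>
</orders>
XML

my $sum = 0;
my @seen;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <order> element has been completely parsed.
        order => sub {
            my ($t, $order) = @_;
            push @seen, $order->att('id');
            $sum += $order->first_child_text('total');
            # Discard everything parsed so far to keep memory use flat.
            $t->purge;
        },
    },
);
$twig->parse($xml);

printf "Processed %d orders, total %.2f\n", scalar @seen, $sum;
```

Because each order is discarded after its handler returns, memory use depends on the size of one order rather than the size of the whole feed.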




XML convertors, writers, and readers

Perl is known for its wealth of options for connecting to all types of legacy systems. With these connections, and facilities for converting Perl data structures to XML, Perl presents an excellent platform for creating XML interfaces for existing systems. Extensions such as XML::Edifact, DBIx::XML_RDB, XML::CSV, XML::Generator, XML::Dumper, and XML::Writer handle various aspects of serialization and deserialization of data between Perl data structures, XML, and other formats.

XML::Generator - Module for the generation of XML from within Perl
XML::Writer - Helper module for Perl programs that write XML documents. The module handles all escaping for attribute values and character data and constructs different types of markup.
XML::Edifact - Module for translating UN/Edifact documents to XML
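To show the kind of escaping XML::Writer takes care of, here is a small sketch that writes one element with an attribute and character data containing markup-significant characters. The element names and values are invented for the example, and output is collected in a string through a scalar reference rather than written to a file.

```perl
use strict;
use warnings;
use XML::Writer;

# Collect the generated XML in a string instead of printing to STDOUT.
my $output = '';
my $writer = XML::Writer->new(OUTPUT => \$output);

# Attribute values and character data are escaped by the module.
$writer->startTag('book', id => 'b1', note => 'say "hi"');
$writer->dataElement('title', 'Tom & Jerry <uncut>');
$writer->endTag('book');

# end() checks that every element opened has been closed.
$writer->end;

print "$output\n";
```

The ampersand, angle brackets, and double quotes in the input come out as entity references, so the result stays well-formed without any manual escaping in the calling code.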




Protocols and libraries

In general, new protocols and standards are quickly supported in Perl. Extensions exist for SOAP, WDDX, RSS, XML-RPC, and Microsoft's BizTalk.

SOAP::Lite - An excellent implementation of the SOAP protocol
SOAP/Perl - XML-based protocol for accessing services, objects, and servers in a platform-independent manner
WDDX.pm - Protocol for exchange of data between different languages such as Perl, Java, and Cold Fusion. This module converts Perl variables to and from WDDX packets.
XML::RSS - Basic framework for creating and maintaining Rich Site Summary (RSS) files. RSS is primarily used for distributing news headlines, commonly called channels.




Commercial products

Most Perl extensions enjoy community support under the open-source model. There are far fewer commercially supported systems and packages for Perl than there are for languages such as Java.

VelociGen XML Server, a commercial product (from the company where I work), uses Perl as the language for exchanging and processing XML documents and for creating Web services-based applications. Commercial support is also available for the open-source AxKit, which offers Web publishing and content management using XML.

AxKit - Open-source XML Web publishing and content management
VelociGenX - Web Services platform with database and legacy system connectivity. Exposes Perl interface for parsing, manipulating, and transforming XML documents.




Style sheets and query languages

XML has spawned a set of related standards for querying and transforming data. Two of the most popular are eXtensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath). XPath provides a common syntax and functionality for addressing and searching parts of XML documents. XSLT uses XPath expressions to select the parts of a source document that its templates transform into another XML document.

XML::XSLT - Perl implementation of XSL template processing. XML::XSLT applies the transformations specified in an XSL style sheet to XML files.
XML::XPath - Implementation of the W3C's XPath specification
XML::QL - Implementation of the W3C Note "XML-QL: A Query Language for XML." Allows the user to query an XML document much like a database and describe a construct for output.
XML::XQL - Perl implementation of XQL specification, allowing XQL queries on XML tree structures such as XML::DOM. Distributed as part of libxml-enno.
XML::LibXSLT - Perl interface to the gnome libxslt library for high performance XSLT processing.
XML::XSLT::Wrapper - Generic wrapper for the various XSLT modules
XML::Xalan - Perl interface to the Apache Xalan XSLT library
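XPath addressing can be tried from Perl with any of the XPath-capable modules. The sketch below uses XML::LibXML, one of several options, against a small invented document; the element and attribute names are assumptions made for the example. One expression selects nodes with a predicate, and another evaluates to a single value.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical catalog, invented for this sketch.
my $xml = <<'XML';
<library>
  <book year="1999"><title>Old Perl</title></book>
  <book year="2001"><title>Perl and XML</title></book>
  <book year="2003"><title>More XML</title></book>
</library>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# XPath: titles of books whose year attribute is greater than 2000.
my @titles = map { $_->textContent }
             $doc->findnodes('/library/book[@year > 2000]/title');

print "$_\n" for @titles;

# findvalue evaluates an expression and returns its string value.
my $count = $doc->findvalue('count(/library/book)');
print "Books in total: $count\n";
```

The same expressions would serve as match or select patterns inside an XSLT style sheet, which is how XSLT builds on XPath.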





Database interfaces

Perl has long enjoyed excellent database support via the DBD/DBI modules. DBIx::XML_RDB uses these modules, building an XML wrapper around any popular database. XML::CSV provides similar support for text-delimited files, such as the popular comma-separated values and tab-delimited formats.

DBIx::XML_RDB - Perl extension for creating XML from existing DBI data sources such as databases
XML::CSV - Converts comma separated values to XML






About the author


Parand Tony Darugar is the head of architecture for Yahoo! Search Marketing Services (formerly Overture). His interests include Web services and Service Oriented Architectures (SOA), XML, high-performance business systems, distributed architectures, and artificial intelligence. You can reach him at tdarugar@yahoo.com.



