Perl developers: Fill your XML toolbox

Essential tools and libraries for using XML with Perl

In this article, updated June 2001, find out about more than 20 essential tools, libraries, and modules for XML development with Perl. Use the tables of resources to quickly locate the pieces you need to assemble a powerful toolkit for XML manipulation.

Parand Darugar (tdarugar@yahoo.com), Head of architecture, Yahoo! Search Marketing Services

Parand Tony Darugar is the head of architecture for Yahoo! Search Marketing Services (formerly Overture). His interests include Web services and Service Oriented Architectures (SOA), XML, high-performance business systems, distributed architectures, and artificial intelligence. You can reach him at tdarugar@yahoo.com.



01 June 2001


Perl offers a rich set of modules and libraries for the XML developer, rivaling that of any other language. The Perl community was quick to come up with XML tools in the early days. The Perl/XML community remains quite active, not only supporting new protocols and standards with amazing speed, but also playing an active role in the general advancement of XML. Perl's extensibility allows the easy integration of C and C++ modules within the Perl framework, offering a combination of speed and ease of use.

The following tools are favorites selected from my experiences developing Perl/XML tools and applications, as well as gems gleaned from various mailing lists, magazines, and Web pages. These tools will help you develop professional XML-based applications in no time.

Parsers and object models

XML parsers have been available for Perl since the early days of XML. The XML::Parser module, a Perl interface to James Clark's excellent expat parser, serves as the basis for most other parsing and manipulation modules. XML::Simple layers an intuitive API for reading and writing simple XML files on top of an underlying parser, and the SAX API is supported by most of the parsing modules.
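To make the two parsing styles above concrete, here is a minimal sketch. It assumes XML::Parser and XML::Simple are installed from CPAN; the configuration document and its element names are invented for illustration. The first half uses an XML::Parser event handler, the second half reads the same document into a Perl data structure with XML::Simple.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Simple qw(XMLin);

# Hypothetical configuration document, invented for this example.
my $xml = <<'XML';
<config>
  <server name="alpha" port="8080"/>
  <server name="beta" port="8081"/>
  <timeout>30</timeout>
</config>
XML

# 1. Event-driven parsing: expat calls the Start handler for every
#    opening tag, passing the element name and its attributes.
my @seen;
my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attrs) = @_;
            push @seen, $element;
        },
    },
);
$parser->parse($xml);
print "elements: @seen\n";

# 2. Data-structure parsing: XMLin returns nested hashes and arrays.
#    ForceArray keeps <server> as a list even if only one is present;
#    KeyAttr => [] stops the list being folded into a hash keyed on "name".
my $config = XMLin($xml, ForceArray => ['server'], KeyAttr => []);

for my $server (@{ $config->{server} }) {
    print "$server->{name} listens on port $server->{port}\n";
}
print "timeout: $config->{timeout}\n";
```

The event style suits large or irregular input, while the XML::Simple style is usually enough for small, regular files such as configuration data.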

Perl also enjoys excellent support for various XML object models, including DOM, Grove, and Twig. A rich variety of packages offer DOM or DOM-like processing options, including the XML::DOM module, XML::LibXML, XML::XPath, Orchard, and the soon-to-be-released Sablotron::DOM package. Alternative processing models are also available via the Grove, Twig, and PYX modules. Twig is particularly useful for large documents, because it lets you process a document one segment at a time without holding the entire document in memory.
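The segment-by-segment processing that Twig offers can be sketched as follows. This is an illustration only, assuming XML::Twig is installed; the order data and element names are made up. A handler runs each time a complete element has been parsed, and the parsed portion is then released.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical data; a real use case would be a file far too large
# to hold in memory as a single tree.
my $xml = <<'XML';
<orders>
  <order id="1"><customer>Ann</customer><total>19.50</total></order>
  <order id="2"><customer>Bob</customer><total>5.25</total></order>
  <order id="3"><customer>Cy</customer><total>12.00</total></order>
</orders>
XML

my ($count, $sum) = (0, 0);

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <order> element has been parsed.
        order => sub {
            my ($t, $order) = @_;
            $count++;
            $sum += $order->first_child_text('total');
            # Discard everything parsed so far to keep memory use flat.
            $t->purge;
        },
    },
);
$twig->parse($xml);    # parsefile($path) would be used for a file on disk

printf "%d orders, total %.2f\n", $count, $sum;
```

The call to purge is what keeps the footprint small: without it, the twig would accumulate the whole tree just as a DOM parser does.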

Name              Parser description
XML::Parser       Perl interface to James Clark's XML parser, expat
XML::Simple       Trivial API for reading and writing XML, optimized for use with config files in XML format
XML::XPath        A complete implementation of the XPath specification.
XML::DOM          Perl extension to XML::Parser to build an object-oriented data structure with a DOM Level 1-compliant interface. Distributed as part of libxml-enno.
XML::LibXML       Perl interface to the gnome libxml2 library for high performance DOM processing.
XML::Grove        Simple access to the information set of parsed XML, HTML, or SGML instances using a tree of Perl hashes
XML::Twig         Tree interface to XML documents allowing processing chunk by chunk of huge documents
libxml-perl       Collection of Perl modules, scripts, and documents for working with XML in Perl. libxml-perl software works in combination with XML::Parser, PerlSAX, XML::DOM, XML::Grove and others.
XML::Schematron   XSLT-based XML validation module
Xerces Perl       Perl interface to the Xerces XML parser from the Apache XML Project
REX               Shallow parsing of XML documents with regular expressions
PYX               XML to PYX generator

XML converters, writers, and readers

Perl is known for its wealth of options for connecting to all types of legacy systems. With these connections, and facilities for converting Perl data structures to XML, Perl presents an excellent platform for creating XML interfaces for existing systems. Extensions such as XML::Edifact, DBIx::XML_RDB, XML::CSV, XML::Generator, XML::Dumper, and XML::Writer handle various aspects of serialization and deserialization of data between Perl data structures, XML, and other formats.

Name             Description of XML tools
XML::Generator   Module for the generation of XML from within Perl
XML::Writer      Helper module for Perl programs that write XML documents. The module handles all escaping for attribute values and character data and constructs different types of markup.
XML::Edifact     Module for translating UN/Edifact documents to XML
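As a sketch of serializing a Perl data structure to XML, the following uses XML::Writer and assumes a version that accepts a scalar reference for OUTPUT. The record and its field names are hypothetical. Note that the awkward characters in the data are escaped by the module rather than by hand.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical record containing characters that must be escaped in XML.
my %customer = (
    name => 'Smith & Sons',
    city => 'Austin',
    tags => ['a<b', 'retail'],
);

my $output = '';
my $writer = XML::Writer->new(OUTPUT => \$output);

$writer->startTag('customer', city => $customer{city});
$writer->dataElement('name', $customer{name});
$writer->dataElement('tag', $_) for @{ $customer{tags} };
$writer->endTag('customer');
$writer->end;    # checks that every open element has been closed

print $output, "\n";
```

Because XML::Writer tracks open elements, a missing endTag is reported as an error instead of silently producing malformed output.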

Protocols and libraries

In general, new protocols and standards are quickly supported in Perl. Extensions exist for SOAP, WDDX, RSS, XML-RPC, and Microsoft's BizTalk.

Name         Description of protocols and libraries
SOAP::Lite   An excellent implementation of the SOAP protocol
SOAP/Perl    XML-based protocol for accessing services, objects, and servers in a platform-independent manner
WDDX.pm      Protocol for exchange of data between different languages such as Perl, Java, and Cold Fusion. This module converts Perl variables to and from WDDX packets.
XML::RSS     Basic framework for creating and maintaining Rich Site Summary (RSS) files. RSS is primarily used for distributing news headlines, commonly called channels.
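As one example of protocol support, the sketch below builds a small RSS channel with XML::RSS and then reads it back. It assumes XML::RSS is installed; the channel, headlines, and example.com URLs are placeholders invented for illustration.

```perl
use strict;
use warnings;
use XML::RSS;

# Build a channel with two headlines (all values are placeholders).
my $rss = XML::RSS->new(version => '1.0');

$rss->channel(
    title       => 'Example News',
    link        => 'http://www.example.com/',
    description => 'Headlines from a hypothetical site',
);

$rss->add_item(
    title => 'First headline',
    link  => 'http://www.example.com/news/1',
);
$rss->add_item(
    title => 'Second headline',
    link  => 'http://www.example.com/news/2',
);

my $feed = $rss->as_string;
print $feed;

# Read the feed back in, as a consumer of the channel would.
my $reader = XML::RSS->new;
$reader->parse($feed);
print "$_->{title}\n" for @{ $reader->{items} };
```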

Commercial products

Most Perl extensions enjoy community support under the open-source model. There are far fewer commercially supported systems and packages for Perl than there are for languages such as Java.

VelociGen XML Server, a commercial product from the company where I work, uses Perl as the language for exchanging and processing XML documents and for creating Web services-based applications. Commercial support is also available for the open-source AxKit, which offers Web publishing and content management using XML.

Name         Description of commercial products
AxKit        Open-source XML Web publishing and content management
VelociGenX   Web Services platform with database and legacy system connectivity. Exposes Perl interface for parsing, manipulating, and transforming XML documents.

Style sheets and query languages

XML has spawned a set of related standards for querying and transforming data. Two of the most popular are eXtensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath). XPath provides a common syntax and functionality for addressing and searching parts of XML documents. XSLT uses XPath to select the parts of an XML document that it transforms into another XML document.
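A short sketch of XPath addressing follows, using XML::LibXML (one of several modules here that evaluate XPath) and assuming a version that provides load_xml. The catalog document, titles, and prices are invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical catalog, invented for this example.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<catalog>
  <book id="b1"><title>Title A</title><price>15</price></book>
  <book id="b2"><title>Title B</title><price>25</price></book>
  <book id="b3"><title>Title C</title><price>40</price></book>
</catalog>
XML

# A location path with a predicate: titles of books costing more than 20.
my @expensive = map { $_->textContent }
    $doc->findnodes('/catalog/book[price > 20]/title');
print "over 20: @expensive\n";

# XPath functions return values instead of nodes.
my $book_count = $doc->findvalue('count(/catalog/book)');
print "books: $book_count\n";

# Selecting by attribute value.
my $title = $doc->findvalue('/catalog/book[@id="b2"]/title');
print "b2 is: $title\n";
```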

Name                 Description of stylesheets and query languages
XML::XSLT            Perl implementation of XSL template processing. XML::XSLT performs transformations specified in an XSL style sheet to XML files.
XML::XPath           Implementation of the W3C's XPath specification
XML::QL              Implementation of W3C notes called "XML-QL: A Query Language for XML." Allows the user to query an XML document much like a database and describe a construct for output.
XML::XQL             Perl implementation of XQL specification, allowing XQL queries on XML tree structures such as XML::DOM. Distributed as part of libxml-enno.
XML::LibXSLT         Perl interface to the gnome libxslt library for high performance XSLT processing.
XML::XSLT::Wrapper   Generic wrapper for the various XSLT modules
XML::Xalan           Perl interface to the Apache Xalan XSLT library
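To show an XSLT transformation driven from Perl, here is a minimal sketch using XML::LibXSLT together with XML::LibXML, assuming both are installed. The source document and style sheet are invented for illustration; the style sheet uses XPath expressions to pick out the titles.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical source document.
my $source = XML::LibXML->load_xml(string => <<'XML');
<catalog>
  <book><title>Title A</title></book>
  <book><title>Title B</title></book>
</catalog>
XML

# Style sheet that rewrites the catalog as a flat list of items.
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <titles>
      <xsl:for-each select="book/title">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </titles>
  </xsl:template>
</xsl:stylesheet>
XSL

my $xslt       = XML::LibXSLT->new;
my $stylesheet = $xslt->parse_stylesheet($style_doc);   # compile once
my $result     = $stylesheet->transform($source);       # apply to a document
my $out        = $stylesheet->output_as_chars($result);

print $out;
```

A compiled style sheet can be applied to many source documents, which is where the speed of the underlying C library pays off.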

Database interfaces

Perl has long enjoyed excellent database support via the DBI and DBD modules. DBIx::XML_RDB uses these modules, building an XML wrapper around any popular database. XML::CSV provides similar support for delimited text files, such as the popular comma-separated values and tab-delimited formats.

Name            Description of database interfaces
DBIx::XML_RDB   Perl extension for creating XML from existing DBI data sources such as databases
XML::CSV        Converts comma separated values to XML
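The idea of wrapping query results in XML can be sketched with plain DBI and XML::Writer. This is not the DBIx::XML_RDB API, only an illustration of the same technique; it assumes DBD::SQLite is installed so that an in-memory database can stand in for a real one, and the table, columns, and element names are hypothetical.

```perl
use strict;
use warnings;
use DBI;
use XML::Writer;

# In-memory SQLite database standing in for a real data source (assumption).
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE modules (name TEXT, purpose TEXT)');
my $insert = $dbh->prepare('INSERT INTO modules (name, purpose) VALUES (?, ?)');
$insert->execute('XML::Parser', 'parsing');
$insert->execute('XML::Writer', 'writing');

my $sth = $dbh->prepare('SELECT name, purpose FROM modules ORDER BY name');
$sth->execute;
my @columns = @{ $sth->{NAME_lc} };    # column names become element names

my $xml = '';
my $writer = XML::Writer->new(OUTPUT => \$xml);
$writer->startTag('resultset');
while (my $row = $sth->fetchrow_arrayref) {
    $writer->startTag('row');
    for my $i (0 .. $#columns) {
        # NULL columns are written as empty elements in this sketch.
        my $value = defined $row->[$i] ? $row->[$i] : '';
        $writer->dataElement($columns[$i], $value);
    }
    $writer->endTag('row');
}
$writer->endTag('resultset');
$writer->end;
$dbh->disconnect;

print $xml, "\n";
```

Using column names as element names only works when they are valid XML names; a production tool has to handle names that are not.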

Resources

  • CPAN is always an excellent source for finding Perl XML modules.
  • To learn more about manipulation of XML documents with Perl and other scripting languages, see XML and scripting languages on developerWorks.
