Perl developers: Fill your XML toolbox

Essential tools and libraries for using XML with Perl

Parand Darugar (tdarugar@yahoo.com), Head of architecture, Yahoo! Search Marketing Services
Parand Tony Darugar is the head of architecture for Yahoo! Search Marketing Services (formerly Overture). His interests include Web services and Service Oriented Architectures (SOA), XML, high-performance business systems, distributed architectures, and artificial intelligence. You can reach him at tdarugar@yahoo.com.

Summary:  In this article, updated June 2001, find out about more than 20 of the essential tools, libraries, and modules needed for XML development with Perl. Use the table of resources to quickly locate the elements that enable you to assemble a powerful toolkit for XML manipulation.

Date:  01 Jun 2001
Level:  Introductory

Perl offers a rich set of modules and libraries for the XML developer, rivaling that of any other language. The Perl community was quick to come up with XML tools in the early days. The Perl/XML community remains quite active, not only supporting new protocols and standards with amazing speed, but also playing an active role in the general advancement of XML. Perl's extensibility allows the easy integration of C and C++ modules within the Perl framework, offering a combination of speed and ease of use.

The following tools are favorites selected from my experiences developing Perl/XML tools and applications, as well as gems gleaned from various mailing lists, magazines, and Web pages. These tools will help you develop professional XML-based applications in no time.

Parsers and object models

XML parsers have been available for Perl since the early days of XML. The XML::Parser module, a Perl interface for James Clark's excellent expat parser, serves as the basis for most other parsing and manipulation modules. XML::Simple provides an intuitive API for reading and writing simple XML files, such as configuration files, as ordinary Perl data structures. The SAX API is supported by most of the parsing modules.
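
As a rough illustration of the two styles mentioned above, the following sketch reads the same small document first with XML::Parser's event handlers and then with XML::Simple. The sample document and its element and attribute names are made up for this example, and the ForceArray and KeyAttr options are set only to make the resulting data structure predictable.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Simple qw(XMLin);

# Hypothetical sample document; element and attribute names are made up.
my $xml = '<toolbox><tool name="expat"/><tool name="libxml2"/></toolbox>';

# Event style: XML::Parser calls the Start handler once per opening tag.
my @names;
my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attrs) = @_;
            push @names, $attrs{name} if $element eq 'tool';
        },
    },
);
$parser->parse($xml);
print "XML::Parser saw: @names\n";

# Data-structure style: XML::Simple returns nested Perl hashes and arrays.
# ForceArray and KeyAttr are set explicitly so the shape is predictable.
my $data = XMLin($xml, ForceArray => ['tool'], KeyAttr => []);
print "XML::Simple saw: $_->{name}\n" for @{ $data->{tool} };
```

The event style keeps nothing in memory beyond what the handlers store, while the XML::Simple call trades that control for a structure you can walk with plain hash and array syntax.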

Perl also enjoys excellent support for various XML object models, including DOM, Grove, and Twig. A rich variety of packages offer DOM or DOM-like processing options, including the XML::DOM module, XML::LibXML, XML::XPath, Orchard, and the soon-to-be-released Sablotron::DOM package. Alternative processing models are also available, via the Grove, Twig, and PYX modules. Twig is particularly useful for large documents, because it lets you process a document one segment at a time without holding the entire document in memory.

XML::Parser - Perl interface to James Clark's XML parser, expat
XML::Simple - Trivial API for reading and writing XML, optimized for use with config files in XML format
XML::XPath - A complete implementation of the XPath specification
XML::DOM - Perl extension to XML::Parser to build an object-oriented data structure with a DOM Level 1-compliant interface. Distributed as part of libxml-enno.
XML::LibXML - Perl interface to the GNOME libxml2 library for high-performance DOM processing
XML::Grove - Simple access to the information set of parsed XML, HTML, or SGML instances using a tree of Perl hashes
XML::Twig - Tree interface to XML documents, allowing chunk-by-chunk processing of huge documents
libxml-perl - Collection of Perl modules, scripts, and documents for working with XML in Perl. libxml-perl software works in combination with XML::Parser, PerlSAX, XML::DOM, XML::Grove, and others.
XML::Schematron - XSLT-based XML validation module
Xerces Perl - Perl interface to the Xerces XML parser from the Apache XML Project
REX - Shallow parsing of XML documents with regular expressions
PYX - XML to PYX generator
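
To make the chunk-by-chunk model of XML::Twig more concrete, here is a minimal sketch. The orders document and its element names are hypothetical. The handler runs each time an order element has been completely parsed, and the call to purge releases what has been read so far, which is what keeps memory use flat on large inputs.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; a real use case would be a file far too large to load whole.
my $xml = <<'XML';
<orders>
  <order id="1"><total>10.50</total></order>
  <order id="2"><total>4.25</total></order>
</orders>
XML

my $sum = 0;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each <order>, as soon as its closing tag is parsed.
        order => sub {
            my ($t, $order) = @_;
            $sum += $order->first_child_text('total');
            $t->purge;    # discard the parts of the tree already processed
        },
    },
);
$twig->parse($xml);
printf "Total: %.2f\n", $sum;
```

For a file on disk, parsefile would take the place of parse; the handler logic stays the same.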

XML convertors, writers, and readers

Perl is known for its wealth of options for connecting to all types of legacy systems. With these connections, and facilities for converting Perl data structures to XML, Perl presents an excellent platform for creating XML interfaces for existing systems. Extensions such as XML::Edifact, DBIx::XML_RDB, XML::CSV, XML::Generator, XML::Dumper, and XML::Writer handle various aspects of serialization and deserialization of data between Perl data structures, XML, and other formats.

XML::Generator - Module for the generation of XML from within Perl
XML::Writer - Helper module for Perl programs that write XML documents. The module handles all escaping for attribute values and character data and constructs different types of markup.
XML::Edifact - Module for translating UN/Edifact documents to XML
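
As one possible illustration of serializing Perl data to XML, the sketch below feeds a small list of hashes through XML::Writer. The inventory data and element names are invented for the example. Note that the special characters in the names are escaped by the module rather than by hand.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical records, for example rows pulled from a legacy system.
my @items = (
    { sku => 'A-100', name => 'Nuts & Bolts' },
    { sku => 'B-200', name => 'Widget <large>' },
);

# Collect the output in a string instead of writing to standard output.
my $output = '';
my $writer = XML::Writer->new(
    OUTPUT      => \$output,
    DATA_MODE   => 1,    # put each element on its own line
    DATA_INDENT => 2,
);

$writer->xmlDecl('UTF-8');
$writer->startTag('inventory');
for my $item (@items) {
    $writer->startTag('item', sku => $item->{sku});
    $writer->dataElement('name', $item->{name});    # escaping happens here
    $writer->endTag('item');
}
$writer->endTag('inventory');
$writer->end;    # checks that every opened tag has been closed

print $output;
```

Because the writer tracks open elements, a forgotten endTag is reported when end is called instead of silently producing a malformed document.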

Protocols and libraries

In general, new protocols and standards are quickly supported in Perl. Extensions exist for SOAP, WDDX, RSS, XML-RPC, and Microsoft's BizTalk.

SOAP::Lite - An excellent implementation of the SOAP protocol
SOAP/Perl - XML-based protocol for accessing services, objects, and servers in a platform-independent manner
WDDX.pm - Protocol for exchange of data between different languages such as Perl, Java, and Cold Fusion. This module converts Perl variables to and from WDDX packets.
XML::RSS - Basic framework for creating and maintaining Rich Site Summary (RSS) files. RSS is primarily used for distributing news headlines, commonly called channels.
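
As a small example of working with one of these formats, the following sketch uses XML::RSS to build a channel with a single headline and then reads it back. The titles and the example.com URLs are placeholders, not real feeds.

```perl
use strict;
use warnings;
use XML::RSS;

# Build a small RSS 1.0 channel. Titles and example.com URLs are placeholders.
my $rss = XML::RSS->new(version => '1.0');
$rss->channel(
    title       => 'Perl/XML news',
    link        => 'http://www.example.com/',
    description => 'Hypothetical headlines about Perl XML modules',
);
$rss->add_item(
    title => 'New parser release',
    link  => 'http://www.example.com/news/1',
);
my $feed = $rss->as_string;

# Read the feed back, as a consumer of the channel would.
my $reader = XML::RSS->new;
$reader->parse($feed);
my @titles = map { $_->{title} } @{ $reader->{items} };
print "$reader->{channel}{title}: @titles\n";
```

The same parse call works on a feed fetched from elsewhere, so the reading half of the sketch can be used on its own.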

Commercial products

Most Perl extensions enjoy community support under the open-source model. There are far fewer commercially supported systems and packages for Perl than there are for languages such as Java.

VelociGen XML Server, a commercial product (from the company where I work), uses Perl as the language for exchanging and processing XML documents and for creating Web services-based applications. Commercial support is also available for the open-source AxKit, which offers Web publishing and content management using XML.

AxKit - Open-source XML Web publishing and content management
VelociGenX - Web Services platform with database and legacy system connectivity. Exposes Perl interface for parsing, manipulating, and transforming XML documents.

Style sheets and query languages

XML has spawned a set of related standards for querying and transforming data. Two of the most popular are eXtensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath). XPath provides a common syntax and functionality for addressing and searching parts of XML documents. XSLT uses XPath to select parts of a source document when transforming XML documents into other XML documents.

XML::XSLT - Perl implementation of XSL template processing. XML::XSLT applies the transformations specified in an XSL style sheet to XML files.
XML::XPath - Implementation of the W3C's XPath specification
XML::QL - Implementation of the W3C note "XML-QL: A Query Language for XML." Allows the user to query an XML document much like a database and describe a construct for output.
XML::XQL - Perl implementation of the XQL specification, allowing XQL queries on XML tree structures such as XML::DOM. Distributed as part of libxml-enno.
XML::LibXSLT - Perl interface to the GNOME libxslt library for high-performance XSLT processing
XML::XSLT::Wrapper - Generic wrapper for the various XSLT modules
XML::Xalan - Perl interface to the Apache Xalan XSLT library
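
To show how XPath and XSLT fit together, here is a minimal sketch that uses XML::LibXML for an XPath query and XML::LibXSLT for a transformation. The sample document, its kind attribute, and the style sheet are all invented for this example; the other XPath and XSLT modules listed above offer comparable features through their own interfaces.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical source document.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<modules>
  <module kind="xslt"><name>XML::XSLT</name></module>
  <module kind="xpath"><name>XML::XPath</name></module>
  <module kind="xslt"><name>XML::LibXSLT</name></module>
</modules>
XML

# XPath on its own: select the names of modules with a given attribute value.
my @xpath_names = map { $_->textContent }
    $doc->findnodes('/modules/module[@kind="xpath"]/name');
print "XPath match: @xpath_names\n";

# XSLT: the style sheet uses XPath expressions in its select attributes.
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="modules/module[@kind='xslt']">
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $result     = $stylesheet->transform($doc);
my $text       = $stylesheet->output_string($result);
print $text;
```

The parsed style sheet can be reused for many source documents, which is usually worth doing when the same transformation runs repeatedly.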

Database interfaces

Perl has long enjoyed excellent database support via the DBD/DBI modules. DBIx::XML_RDB uses these modules, building an XML wrapper around any popular database. XML::CSV provides similar support for text-delimited files, such as the popular comma-separated values and tab-delimited formats.

DBIx::XML_RDB - Perl extension for creating XML from existing DBI data sources such as databases
XML::CSV - Converts comma separated values to XML
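
The general idea of wrapping query results in XML can be sketched by hand with DBI and XML::Writer. This is not the DBIx::XML_RDB interface, only an illustration of the kind of work such a module does for you. The sketch assumes DBD::SQLite is installed and uses an in-memory database as a stand-in for a real data source; the table, its columns, and the resultset and row element names are hypothetical.

```perl
use strict;
use warnings;
use DBI;
use XML::Writer;

# In-memory SQLite database as a stand-in for a real data source (assumption).
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE tools (name TEXT, category TEXT)');
my $insert = $dbh->prepare('INSERT INTO tools (name, category) VALUES (?, ?)');
$insert->execute(@$_) for ['XML::Twig', 'parser'], ['XML::Writer', 'writer'];

my $xml = '';
my $writer = XML::Writer->new(OUTPUT => \$xml, DATA_MODE => 1, DATA_INDENT => 2);

my $sth = $dbh->prepare('SELECT name, category FROM tools ORDER BY name');
$sth->execute;

# One <row> per record, one child element per column.
$writer->startTag('resultset');
while (my $row = $sth->fetchrow_hashref) {
    $writer->startTag('row');
    $writer->dataElement($_, $row->{$_}) for sort keys %$row;
    $writer->endTag('row');
}
$writer->endTag('resultset');
$writer->end;
$dbh->disconnect;

print $xml;
```

Using column names as element names works only when they are valid XML names, which is one of the details a dedicated module can take care of.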

Resources

  • CPAN is always an excellent source for finding Perl XML modules.

  • To learn more about manipulation of XML documents with Perl and other scripting languages, see XML and scripting languages on developerWorks.


