Analyze with XSLT, Part 1: Analyze non-XML data with XSLT

Create string parsing routines to convert documents into XML elements

Chuck White (chuck@tumeric.net), XSLT consultant and Web engineer, Freelance Developer
Chuck White, a Studio B author, has been working with XML since before its official inception in February, 1998. He was co-author of Mastering XML Premium Edition (with Linda Burman and the W3C's XML Activity Lead, Liam Quin) and author of Mastering XSLT, both from Sybex Books. His latest books are Developing Killer Web Apps with Dreamweaver MX & C# (also for Sybex Books) and HTML, XHTML, and CSS Bible, 3rd Edition for Wiley, for which he is co-author with Steve Schafer. Chuck is currently working with the XSL Team at eBay as a project consultant and Web engineer.

Summary:  This tutorial explores how to create string parsing routines in XSLT so that you can tokenize straight, non-XML text, thus turning that text into a series of XML elements. Specifically, this tutorial examines how to convert such documents as weblogs and Web configuration files into XML for improved readability and programmatic access.

Date:  16 Dec 2003
Level:  Introductory

Finding and using generic templates

Introducing EXSLT

You can do a Web search to find generic templates. MindMap team members generally type in the functionality they're looking for, then the term "generic template" (in quotes), then "XSLT". For example, one of the MindMap team members who wants a generic string replacement template might type the following into Google:

string replace "generic template" XSLT

This has resulted in a growing library of reusable templates for the MindMap team. (See Resources for more EXSLT links.)
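To give a sense of what such a search typically turns up, here is a minimal sketch of a generic string replacement template in plain XSLT 1.0. The template name replace-string and its parameter names (text, from, to) are hypothetical, chosen for illustration; they do not come from the MindMap library or from EXSLT. The match="/" template at the end is only a usage example that replaces each pipe character in the input with a comma and a space.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical generic template: replaces every occurrence
       of $from in $text with $to, using recursion. -->
  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <!-- Guard against an empty $from, which contains() always matches -->
      <xsl:when test="$from != '' and contains($text, $from)">
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <!-- Recurse on whatever follows the first match -->
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text"
                          select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Usage example: swap each '|' in the document text for ', ' -->
  <xsl:template match="/">
    <xsl:call-template name="replace-string">
      <xsl:with-param name="text" select="string(.)"/>
      <xsl:with-param name="from" select="'|'"/>
      <xsl:with-param name="to" select="', '"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Because it relies only on the core functions contains(), substring-before(), and substring-after(), a template of this kind should run on any XSLT 1.0 processor, which is what makes it generic.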

EXSLT is a set of extensions to the XSLT language, developed by a group of dedicated XSLT enthusiasts. The underlying architecture consists of generic templates, with one caveat: EXSLT templates are often less portable than other XSLT templates because they sometimes (but not always) rely on proprietary extensions, and sometimes on JavaScript.

An EXSLT function's purpose can range from handling regular expressions to manipulating node sets on the fly (both of which are common wish list items for XSLT developers, and both will be standard features in XSLT 2.0). Which extensions you use generally depends on which XSLT processor you are using. An extension works only if the processor has built-in support for it, which is where the dependency lies. You can't use an extension built for Saxon on MSXML processors, or vice versa, and EXSLT is an attempt to standardize extensions across processors. Even so, you need to check the EXSLT site (in Resources) to determine whether the processor you're using supports a specific extension. Luckily, the site is replete with sample generic templates and examples of how to use them.
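Besides checking the EXSLT site, a stylesheet can ask the processor at run time whether an extension function is present. The sketch below uses the standard XSLT 1.0 function-available() test and falls back to a plain XPath expression when math:max is missing. The MaxMachine wrapper and the MachineName/value path are borrowed from the tutorial's own example; the fallback expression is one common idiom for finding a maximum, not something the EXSLT library supplies.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://exslt.org/math"
    exclude-result-prefixes="math">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <MaxMachine>
      <xsl:choose>
        <!-- Use the built-in EXSLT function when the processor has it -->
        <xsl:when test="function-available('math:max')">
          <xsl:value-of select="math:max(MachineName/value)"/>
        </xsl:when>
        <!-- Otherwise pick a value that no sibling value exceeds -->
        <xsl:otherwise>
          <xsl:value-of
              select="MachineName/value[not(../value &gt; .)]"/>
        </xsl:otherwise>
      </xsl:choose>
    </MaxMachine>
  </xsl:template>
</xsl:stylesheet>
```

XSLT 1.0 processors are not supposed to report an error merely because an expression mentions an unavailable extension function, only when that function is actually called, so the first branch should be safe to leave in place on processors that lack math:max.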


Using an EXSLT function

The first step in using an EXSLT function is to go to www.exslt.org and look for the extension you want. For example, that Web site has some very handy string manipulation functions, and even a regular expression function. Some are less stable than others. In preparing this tutorial, I found a number of them that simply didn't work, but the site is pretty reliable about indicating whether or not a specific function can be relied on.

The main trick in using an EXSLT function is to find out what that function actually is. I use the term EXSLT function for either a process defined in a named template, which you then call with a parameter value, or an actual function. Take a look, for example, at this call to the EXSLT math max function, invoked here through its named-template form, which works with the Saxon and Xalan processors (but not MSXML).

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:func="http://exslt.org/functions"
xmlns:math="http://exslt.org/math"
extension-element-prefixes="math"
math:doc="http://www.exslt.org/math">
   <!-- Imports must come before any other top-level element -->
   <xsl:import href="math.max.function.xsl"/>
   <xsl:import href="math.max.template.xsl"/>
   <xsl:output method="html" />
   <xsl:template match="/">
   <!-- Literal result element; it is in no namespace because the
        stylesheet declares no default namespace -->
   <MaxMachine>
      <!-- Call the EXSLT named template, passing the nodes to compare -->
      <xsl:call-template name="math:max">
         <xsl:with-param name="nodes"
                         select="MachineName/value" />
      </xsl:call-template>
   </MaxMachine>
   </xsl:template>
</xsl:stylesheet>

The namespace declarations are an absolute requirement because they tell the processor to expect an extension function defined within the EXSLT library. The nodes parameter assumes elements named value that contain numbers. If the value elements also contain non-numeric characters, you'll need to parse those out first. The result of applying the template is the maximum of all the value elements.
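One way to parse stray characters out of the value elements is the nested translate() idiom, sketched below. Because the math:max named template takes nodes rather than cleaned strings, this sketch swaps in a sort-based maximum instead of the EXSLT template. It assumes each value holds a single non-negative number mixed with other characters; the sample input shown in the comment is hypothetical.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input:
       <MachineName>
         <value>web07</value>
         <value>db12</value>
         <value>cache3</value>
       </MachineName> -->

  <xsl:template match="/">
    <MaxMachine>
      <xsl:for-each select="MachineName/value">
        <!-- The inner translate() removes digits and dots, leaving the
             unwanted characters; the outer translate() then deletes
             exactly those characters from the original string. -->
        <xsl:sort
            select="translate(., translate(., '0123456789.', ''), '')"
            data-type="number" order="descending"/>
        <!-- After a descending numeric sort, the first item is the maximum -->
        <xsl:if test="position() = 1">
          <xsl:value-of
              select="translate(., translate(., '0123456789.', ''), '')"/>
        </xsl:if>
      </xsl:for-each>
    </MaxMachine>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input in the comment, the cleaned values are 07, 12, and 3, so the MaxMachine element should contain 12. A value with more than one dot (a version string, for instance) would not parse as a number and would need different handling.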

One advantage of generic templates like this is that you just need to know how to call the parameters and where to put your template within the scope of the rest of your document.

Note: Generally, if the site says a specific function is not considered stable, they really mean it.

Many of the templates in the EXSLT library have found their way, in simplified form, into XSLT 2.0. And as XSLT 2.0 approaches, the MindMap team is nearly breathless with anticipation, because they'll be able to use one of the most powerful string parsing mechanisms available: regular expressions.
