Tip: Batch processing XML with XSLT 2.0

Use directory listings in XML to drive XSLT 2.0 processing

Jack Herrington (jherr@pobox.com), Editor-in-Chief, Code Generation Network
An engineer with more than 20 years of experience, Jack Herrington is currently Editor-in-Chief of the Code Generation Network. He is the author of Code Generation in Action. You can contact him at jack_d_herrington@codegeneration.net.

Summary:  A common problem with XSLT is that it takes only a single XML file as input. You can use a cross-platform Java™ tool to create an XML directory listing, then use XSLT to process every file in the directory from that listing. This tip covers installation and use of such a tool, as well as the corresponding XSL that processes multiple files from the directory listing.

Date:  07 Mar 2005
Level:  Introductory

Don't you wish that XSLT processors like Saxon could use more than one file as input? Often, you're faced with a directory of XML files that require conversion into HTML. You could run Saxon on each of them, but what if you want another file at the end that has an index to all the HTML files you've created?

What you need is an XML version of the directory listing. Then, you could use that XML file as the single input file to XSLT and process each file using XSLT. It would be wonderful if you could do the directory processing in XSLT directly. Unfortunately, with all the power of XSLT -- and particularly XSLT 2.0 -- the language still doesn't have directory operations.

HXDLG to the rescue!

While surfing the Web, I found an obscure little Java program called the HTML/XML Directory List Generator (HXDLG) on SourceForge (see Resources). One of the functions of HXDLG is to create either HTML or XML representations of directory listings. I downloaded the tool and ran the command in Listing 1 from the command line.


Listing 1. Code to create an XML directory using HXDLG
                
java -jar hdlg.jar XML \
   /Users/jherr/Projects/ibm_xml_tips/filelist/testfiles/ \
   /Users/jherr/Projects/ibm_xml_tips/filelist/files.xml

The program takes three arguments. The first argument is the output type -- either XML or HTML. The second argument is the directory path. The third argument is the path of the output XML file. The result looks something like the code in Listing 2.


Listing 2. The directory in XML
                
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hdlg:filesystem SYSTEM
  "http://www.hdlg.info/XML/filesystem.dtd">
<hdlg:filesystem
 xmlns:hdlg="http://www.hdlg.info/XML/filesystem">
   <hdlg:folder name="testfiles"
     url="file:/Users/jherr/Projects/ibm_xml_tips/filelist/testfiles/">
      <hdlg:file name="test1.xml" size="179"
         type="unknown"
         url="file:/ibm_xml_tips/filelist/testfiles/test1.xml">
      </hdlg:file>
      <hdlg:file name="test2.xml" size="181"
         type="unknown"
         url="file:/ibm_xml_tips/filelist/testfiles/test2.xml">
      </hdlg:file>
      <hdlg:file name="test3.xml" size="181"
         type="unknown"
         url="file:/ibm_xml_tips/filelist/testfiles/test3.xml">
      </hdlg:file>
   </hdlg:folder>
</hdlg:filesystem>

That's some high-end stuff. It has a Document Type Definition (DTD) and uses namespaces, and also has the file names and URLs that you're looking for. With absolute paths, to boot!
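If HXDLG is not at hand, a listing with the same shape is not hard to produce with a few lines of script. The following is a minimal Python sketch, not HXDLG's actual implementation: the function name build_listing is hypothetical, the element and attribute names simply mirror Listing 2, and the DOCTYPE declaration is left out. Note that Python writes file URLs as file:///... rather than file:/..., which XSLT processors should treat the same way.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace URI copied from Listing 2; the rest is an illustrative sketch.
HDLG_NS = "http://www.hdlg.info/XML/filesystem"


def build_listing(directory):
    """Return an XML string listing the files directly inside `directory`."""
    ET.register_namespace("hdlg", HDLG_NS)
    folder_path = Path(directory).resolve()
    root = ET.Element(f"{{{HDLG_NS}}}filesystem")
    folder = ET.SubElement(
        root,
        f"{{{HDLG_NS}}}folder",
        {"name": folder_path.name, "url": folder_path.as_uri() + "/"},
    )
    # Sort the entries so the listing is stable from run to run.
    for entry in sorted(folder_path.iterdir()):
        if entry.is_file():
            ET.SubElement(
                folder,
                f"{{{HDLG_NS}}}file",
                {
                    "name": entry.name,
                    "size": str(entry.stat().st_size),
                    "type": "unknown",  # Listing 2 shows "unknown" for XML files
                    "url": entry.as_uri(),
                },
            )
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file such as files.xml gives a document that can stand in for the HXDLG output as the stylesheet input.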


Test it out

To test this system, I'm using a sample set of test results in three different XML files: test1.xml, test2.xml, and test3.xml. I want to read them all and create a corresponding HTML file for each one. Listing 3 shows one such sample test file.


Listing 3. A test file in XML
                
<?xml version="1.0" encoding="UTF-8"?>
<testrun run="test1">
    <test name="foo" pass="true" />
    <test name="bar" pass="true" />
    <test name="baz" pass="true" />
</testrun>

The first step is to run HXDLG to get the directory listing in XML. This directory listing contains the URLs of the test files and will be the input to the XSL stylesheet.

Reading from multiple files in XSL

For the first pass, I'm just going to read the files and print the name of each test run (see Listing 4). Doing so ensures that I can parse the directory structure and read the target files.


Listing 4. Printout of test names
                
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="text" indent="no"/>

<xsl:template match="/">
  <!-- *:file matches file elements in any namespace (XPath 2.0 syntax) -->
  <xsl:for-each select="//*:file">
    <!-- Load the document that the url attribute points to -->
    <xsl:variable name="contents" select="document(@url)"/>
    <xsl:value-of select="$contents/testrun/@run"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

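The first thing the XSLT engine does with the directory listing, which is the input, is match its root against the template. The template then iterates over each file element using xsl:for-each. The select expression //*:file matches file elements in any namespace, which saves declaring the hdlg namespace in the stylesheet; this wildcard is XPath 2.0 syntax, so even this first stylesheet needs an XSLT 2.0 processor. The fun stuff happens when I use xsl:variable to call the document function, which reads the XML file named by the url attribute into the variable. XSL makes reading XML documents a snap.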
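The same traversal can be sketched outside XSLT as a quick cross-check of what the stylesheet should print. This Python sketch is only an analogy for the logic of Listing 4, not part of the toolchain described here; the function name run_names is hypothetical, and it assumes that every url attribute is a local file: URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import url2pathname


def run_names(listing_path):
    """Return the run attribute of each test file named in the listing."""
    names = []
    for element in ET.parse(listing_path).iter():
        # Compare local names only, like the *:file name test in the stylesheet.
        if element.tag.rsplit("}", 1)[-1] != "file":
            continue
        # Rough equivalent of document(@url): load the file the URL points to.
        path = url2pathname(urlparse(element.get("url")).path)
        root = ET.parse(path).getroot()
        # Mirrors $contents/testrun/@run: only a testrun root contributes.
        if root.tag == "testrun":
            names.append(root.get("run"))
    return names
```

If the names returned here differ from what the stylesheet prints, the listing or the URLs in it are the first things to check.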

Now, with the contents of the XML test file in hand, I use xsl:value-of to print the name of the test run, followed by a newline that I write with xsl:text (see Listing 5).


Listing 5. The output of the first XSL template
                
test1
test2
test3

The output shows three files and three test runs. So, the tool's working so far. Now all I have to do is build the HTML for each test result. To do that, I'm going to use the xsl:result-document instruction, a new feature of XSLT 2.0. (That's why in Listing 6, the version attribute on the stylesheet tag has been bumped to 2.0.)


Listing 6. The stylesheet that creates the HTML files
                
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="2.0">

<!-- Default output: plain-text progress messages -->
<xsl:output method="text" indent="no"/>
<!-- Named output definition used for each generated HTML file -->
<xsl:output method="html" indent="yes" name="html"/>

<xsl:template match="/">
  <xsl:for-each select="//*:file">
    <xsl:variable name="contents" select="document(@url)"/>
    <!-- The trailing $ limits the replacement to the file extension -->
    <xsl:variable name="newfile"
      select="replace(@url,'[.]xml$','.html')"/>
    <xsl:text>Creating </xsl:text>
    <xsl:value-of select="$newfile"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:result-document href="{$newfile}" format="html">
      <html><body>
      Test run: <xsl:value-of select="$contents/testrun/@run"/>
      </body></html>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Where I used to print the test run name, I now use a second xsl:variable to build a new file name for the HTML. Using the XPath 2.0 function replace, which takes a regular expression, I take the original URL and replace the .xml extension with .html to create the new file name.
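When writing a pattern like this, it is worth checking whether it is anchored: replace rewrites every match, so a pattern without a trailing $ would also change an .xml that appears earlier in the URL. The Python sketch below illustrates the difference using re.sub, which behaves the same way for this simple pattern. The function name html_name and the sample URLs are hypothetical.

```python
import re


def html_name(url, anchored=True):
    """Swap an .xml extension for .html, in the manner of XPath's replace()."""
    pattern = r"[.]xml$" if anchored else r"[.]xml"
    # re.sub, like XPath's replace(), rewrites every non-overlapping match.
    return re.sub(pattern, ".html", url)


print(html_name("file:/data/test1.xml"))
# file:/data/test1.html
print(html_name("file:/data/old.xml.bak/test1.xml", anchored=False))
# file:/data/old.html.bak/test1.html
print(html_name("file:/data/old.xml.bak/test1.xml"))
# file:/data/old.xml.bak/test1.html
```

For a flat directory of files that all end in .xml, both forms give the same result; the anchored form only matters when .xml can occur elsewhere in the path.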

Next, I print the name of the file to let the user know what I'm creating. This is always a good idea, because the stylesheet's main output would otherwise be empty and you would have no idea whether it did anything.

After I print the message, I use xsl:result-document to create the new file, with some HTML that gives the name of the test run. One thing to notice here is that I had to use the format attribute, which refers to the named xsl:output declaration, to specify that the output file should be HTML. If I hadn't done this, the file would have been written with the default text output method, which drops all the HTML tags and keeps only the text.


Summary

Batch processing in XSLT 2.0 is simple if you have a directory listing utility that exports XML and know how to use the xsl:result-document instruction to redirect the output of the engine. With these tools in hand, you no longer need to fear the directory of XML files that you once might have merged into one mega-file to ease processing.


Resources

