Tip: How to combine documents with XSLT

Exploring XPath functions

Benoit Marchal (bmarchal@pineapplesoft.com), consultant, Pineapplesoft
Benoit Marchal is a Belgian consultant and the author of XML by Example and other XML books. He is available to help with XML projects; contact him at bmarchal@pineapplesoft.com.

Summary:  This tip explains how to write XSLT style sheets that process several documents. This is useful, among other things, for combining book chapters, merging a letter template and a list of addresses, creating tables of contents that span several files, or -- following this tip's example -- reusing photo descriptions in different galleries.

Date:  29 May 2003
Level:  Introductory

XSLT offers an attractive mix of flexibility and power. XSLT is useful not only for publishing Web sites, but also for transforming or manipulating XML documents. Now that an XSLT processor ships with the Java platform (through the javax.xml.transform package), you can't ignore it.

A recurring question from XSLT programmers is how to deal with multiple documents. The Java API expects only two parameters: the source (the input XML document) and the result (where to save the output). While this API is suitable for many applications, in some cases you need to combine several sources. Some examples include:

  • A mail merge, such as a direct mail marketing campaign, where the stylesheet merges names and addresses from a customer file with a letter template.
  • Converting code lists, such as product references or country codes, which often requires matching a catalogue file against a list of codes stored in a separate file.
  • Combining individual documents for publication. For instance, when publishing a book you might want to merge the chapter files or -- as you'll see in the following section -- photo files.
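
The two-parameter shape of the API described above can be sketched in Java as follows. This is a minimal illustration rather than code from the tip: the class name SingleSourceDemo, the greeting document, and the inline stylesheet are all invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: one source goes in, one result comes out.
class SingleSourceDemo {
    // Inline stylesheet (invented for this sketch) that prints the text of <greeting>.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='/greeting'>Message: <xsl:value-of select='.'/></xsl:template>",
        "</xsl:stylesheet>");

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            // transform() has exactly two slots: the source and the result.
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<greeting>hello</greeting>"));
    }
}
```

Because transform() has room for a single input, any additional document has to reach the stylesheet by some other route.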

An example

The following example illustrates how you might use XML and XSLT to merge multiple documents for publication. Every photo has two files: the photo itself (in JPEG format) and an XML description with the title, date, and location. The description files might look like Listing 1.


Listing 1. geneva.xml -- an XML description of a photo
                
<?xml version="1.0"?>
<ph:photo xmlns:ph="http://ananas.org/2003/tips/photo">
   <ph:title>The Jet d'Eau fountain</ph:title>
   <ph:location>Geneva</ph:location>
   <ph:date>April 2003</ph:date>
   <ph:description>
      The Jet d'Eau fountain is the most recognizable symbol of Geneva.
      The fountain reaches 140 meters (460 feet) high, roughly the same height
      as the Embassy Suites hotel in Times Square.
   </ph:description>
</ph:photo>

The markup in Listing 1 is simple. It would not take long to write a stylesheet that publishes it in HTML. A more interesting question is how to create a gallery that combines Listing 1 with other photo descriptions such as those in Listing 2. (The downloadable code, listed under Download, includes more descriptions.)


Listing 2. london.xml -- another XML description of a photo
                
<?xml version="1.0"?>
<ph:photo xmlns:ph="http://ananas.org/2003/tips/photo">
   <ph:title>Double-decker bus</ph:title>
   <ph:location>London</ph:location>
   <ph:date>October 2002</ph:date>
   <ph:description>
      An inescapable symbol of London, the double-decker bus is much taller
      than typical buses to carry many passengers through the city's overcrowded
      streets.
   </ph:description>
</ph:photo>

Before going any further, two definitions are in order. The XML document you pass to the stylesheet through the Java API is called the main source. Other documents that the stylesheet loads are called secondary sources.


The document() function

Since the Java API accepts only one source, you have to load secondary sources from the stylesheet itself. You do this with the document() function, which takes a URI as a parameter. The function parses the document and returns a node-set.

For completeness, document() also accepts a node-set as its argument, in which case the function treats the string value of each node as a URI and returns the union of the documents it loads. Finally, you can pass a second argument, which must be a node-set; the function then resolves relative URIs against the base URI of that node-set's first node. Still, in most cases it is simpler to pass a single URI.
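
The node-set form can be sketched with a small Java driver. This is only an illustration under stated assumptions: the class NodeSetArgumentDemo and the file titles.xsl are invented, and each ph:entry here holds a complete file name, unlike the entries in Listing 3. All files are written to one temporary directory so that relative URIs resolve there.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: document() called with a node-set instead of a string.
class NodeSetArgumentDemo {
    static final String NS = "http://ananas.org/2003/tips/photo";

    // Assumption: entries carry full file names so they can be used as URIs directly.
    static final String INDEX = "<ph:index xmlns:ph='" + NS + "'>"
        + "<ph:entry>geneva.xml</ph:entry><ph:entry>london.xml</ph:entry></ph:index>";

    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:ph='" + NS + "'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='/ph:index'>",
        "    <!-- ph:entry selects two nodes, so document() loads two files -->",
        "    <xsl:for-each select='document(ph:entry)/ph:photo/ph:title'>",
        "      <xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static String photo(String title) {
        return "<ph:photo xmlns:ph='" + NS + "'><ph:title>" + title + "</ph:title></ph:photo>";
    }

    static void write(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    static String listTitles() {
        try {
            Path dir = Files.createTempDirectory("xslt-nodeset");
            write(dir.resolve("geneva.xml"), photo("The Jet d'Eau fountain"));
            write(dir.resolve("london.xml"), photo("Double-decker bus"));
            write(dir.resolve("index.xml"), INDEX);
            write(dir.resolve("titles.xsl"), STYLESHEET);
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(dir.resolve("titles.xsl").toFile()));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(dir.resolve("index.xml").toFile()), new StreamResult(out));
            return out.toString();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(listTitles());
    }
}
```

The output should list one title per loaded document. The order of nodes that come from different documents is left to the processor, so the sketch does not rely on it.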

For example, the following XPath loads the geneva.xml document shown in Listing 1:

document('geneva.xml')

Because document() returns a node-set, you can use it almost anywhere an XPath is valid. More specifically, you can append location steps to it. For example, the following XPath loads the geneva.xml document and returns its title:

document('geneva.xml')/ph:photo/ph:title

XPaths with the document() function can appear anywhere an XPath is legal, such as the select attribute for the xsl:value-of and xsl:apply-templates instructions. Therefore, to display the document title, you could write the following instruction:

<xsl:value-of select="document('geneva.xml')/ph:photo/ph:title"/>

Be careful when writing XPaths for secondary sources. The document() function returns the root node of the loaded document, so the path must start with the document element. It's easy to forget that element, but if you do, the XPath returns an empty node-set. For example, if I forget ph:photo (the document element of the secondary document) in the XPath, I end up with the following XPath, which returns the empty node-set:

document('geneva.xml')/ph:title
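
To compare the two paths side by side, the following Java sketch evaluates each of them inside a throwaway stylesheet. The class DocumentPathDemo, the helper evaluate(), and the file probe.xsl are hypothetical names invented for this illustration, and geneva.xml is reduced to its title.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: prints the string value of an XPath that uses document().
class DocumentPathDemo {
    static final String NS = "http://ananas.org/2003/tips/photo";

    // Reduced copy of Listing 1 (title only).
    static final String GENEVA = "<ph:photo xmlns:ph='" + NS + "'>"
        + "<ph:title>The Jet d'Eau fountain</ph:title></ph:photo>";

    // Builds a stylesheet whose only job is to output one XPath result as text.
    static String stylesheetFor(String xpath) {
        return String.join("\n",
            "<xsl:stylesheet version='1.0'",
            "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:ph='" + NS + "'>",
            "  <xsl:output method='text'/>",
            "  <xsl:template match='/'>",
            "    <xsl:value-of select=\"" + xpath + "\"/>",
            "  </xsl:template>",
            "</xsl:stylesheet>");
    }

    static String evaluate(String xpath) {
        try {
            // The stylesheet and geneva.xml share a directory, so the relative URI resolves.
            Path dir = Files.createTempDirectory("xslt-path");
            Files.write(dir.resolve("geneva.xml"), GENEVA.getBytes(StandardCharsets.UTF_8));
            Path xsl = dir.resolve("probe.xsl");
            Files.write(xsl, stylesheetFor(xpath).getBytes(StandardCharsets.UTF_8));
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(xsl.toFile()));
            StringWriter out = new StringWriter();
            // The main source plays no role here, so a placeholder element is enough.
            transformer.transform(
                new StreamSource(new StringReader("<placeholder/>")), new StreamResult(out));
            return out.toString();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + evaluate("document('geneva.xml')/ph:photo/ph:title") + "]");
        System.out.println("[" + evaluate("document('geneva.xml')/ph:title") + "]");
    }
}
```

The first call should return the title. The second returns an empty string, because the path skips the document element and therefore selects nothing.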


Combining documents

To summarize, when combining documents with XSLT, you want to start from the main source and load the secondary sources. One of the easiest solutions is to list the secondary sources in the main one. The main source looks like Listing 3 -- it simply lists the documents to combine (as well as providing a title for the collection).


Listing 3. index.xml lists the secondary sources
                
<?xml version="1.0"?>
<ph:index xmlns:ph="http://ananas.org/2003/tips/photo">
   <ph:title>City sights</ph:title>
   <ph:entry>geneva</ph:entry>
   <ph:entry>london</ph:entry>
   <ph:entry>paris</ph:entry>
   <ph:entry>roma</ph:entry>
</ph:index>

Writing the main source

You can generate Listing 3 automatically: a small Java component can go through a directory and write the list for you. XM, the publishing framework that I developed for my Working XML column (also on developerWorks), includes such a component.
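
A component of that kind might look like the sketch below. It is not the XM component; the class IndexGenerator and its methods are hypothetical, and the sketch assumes that every .xml file in the directory other than index.xml is a photo description.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical generator for a main source shaped like Listing 3.
class IndexGenerator {
    static String generate(Path dir, String title) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                // Skip the index itself; keep the base name without the extension.
                if (!name.equals("index.xml")) {
                    names.add(name.substring(0, name.length() - ".xml".length()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names); // directory listings have no guaranteed order

        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n");
        xml.append("<ph:index xmlns:ph=\"http://ananas.org/2003/tips/photo\">\n");
        xml.append("   <ph:title>").append(escape(title)).append("</ph:title>\n");
        for (String name : names) {
            xml.append("   <ph:entry>").append(escape(name)).append("</ph:entry>\n");
        }
        xml.append("</ph:index>\n");
        return xml.toString();
    }

    // Escapes the characters that are not allowed in element content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a sample directory and returns the index generated for it.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("gallery");
            Files.createFile(dir.resolve("london.xml"));
            Files.createFile(dir.resolve("geneva.xml"));
            Files.createFile(dir.resolve("geneva.jpg")); // ignored: not an XML description
            return generate(dir, "City sights");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Sorting the names keeps the generated index stable from one run to the next, which makes the resulting gallery reproducible.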

Listing 4 is the stylesheet itself. It defines templates for the elements in the main source and the secondary sources. The link between the two is in the ph:entry template, which loads the secondary sources with document(). Its xsl:apply-templates instruction means that the processor walks through each secondary source and applies templates as it goes along. In practice, it behaves as if the secondary sources were included in the main one.


Listing 4. merge.xsl -- the stylesheet processes the documents listed in the main source
                
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ph="http://ananas.org/2003/tips/photo">

<xsl:output method="html"/>

<xsl:template match="ph:index">
   <html>
      <head><title><xsl:value-of select="ph:title"/></title></head>
      <xsl:apply-templates/>
   </html>
</xsl:template>

<xsl:template match="ph:index/ph:title">
   <h1><xsl:apply-templates/></h1>
</xsl:template>

<xsl:template match="ph:entry">
   <img src="{concat(.,'.jpg')}" align="right"/>
   <xsl:apply-templates select="document(concat(.,'.xml'))"/>
   <br clear="right"/>
</xsl:template>

<xsl:template match="ph:photo/ph:title">
   <h2><xsl:apply-templates/></h2>
</xsl:template>

<xsl:template match="ph:location">
   <h3>in <xsl:apply-templates/></h3>
</xsl:template>

<xsl:template match="ph:date">
   <p>Date: <xsl:apply-templates/></p>
</xsl:template>

<xsl:template match="ph:description">
   <p><xsl:apply-templates/></p>
</xsl:template>

</xsl:stylesheet>

The result of applying Listing 4 to Listing 3 is an HTML document with all the photos. One benefit of this approach is that you can add photos to the gallery, or remove them, simply by editing the index.xml document. Furthermore, it's easy to share photos across different galleries. Say you want a second gallery with more photos of Geneva. You only need to create another list, similar to Listing 3, that points to these photos. You can include the description from Listing 1 in both galleries, but you don't have to duplicate it.
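
The reuse described above can be tried with a short Java driver. The sketch below is an assumption-laden illustration: the class GalleryDemo and the method renderGallery() are invented names, the stylesheet is a condensed version of Listing 4 (titles only, no images), and the photo descriptions are reduced to their titles.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical driver: renders a gallery from an index plus shared photo descriptions.
class GalleryDemo {
    static final String NS = "http://ananas.org/2003/tips/photo";

    // Condensed version of Listing 4: only the templates needed to show the merge.
    static final String MERGE_XSL = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:ph='" + NS + "'>",
        "  <xsl:output method='html'/>",
        "  <xsl:template match='ph:index'>",
        "    <html><body><xsl:apply-templates/></body></html>",
        "  </xsl:template>",
        "  <xsl:template match='ph:index/ph:title'>",
        "    <h1><xsl:apply-templates/></h1>",
        "  </xsl:template>",
        "  <xsl:template match='ph:entry'>",
        "    <xsl:apply-templates select=\"document(concat(.,'.xml'))\"/>",
        "  </xsl:template>",
        "  <xsl:template match='ph:photo/ph:title'>",
        "    <h2><xsl:apply-templates/></h2>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static String photo(String title) {
        return "<ph:photo xmlns:ph='" + NS + "'><ph:title>" + title + "</ph:title></ph:photo>";
    }

    static void write(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    static String renderGallery(String title, String... entries) {
        try {
            Path dir = Files.createTempDirectory("gallery");
            // The same description files serve every gallery; only the index changes.
            write(dir.resolve("geneva.xml"), photo("The Jet d'Eau fountain"));
            write(dir.resolve("london.xml"), photo("Double-decker bus"));
            write(dir.resolve("merge.xsl"), MERGE_XSL);
            StringBuilder index = new StringBuilder("<ph:index xmlns:ph='" + NS + "'>");
            index.append("<ph:title>").append(title).append("</ph:title>");
            for (String entry : entries) {
                index.append("<ph:entry>").append(entry).append("</ph:entry>");
            }
            index.append("</ph:index>");
            write(dir.resolve("index.xml"), index.toString());

            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(dir.resolve("merge.xsl").toFile()));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(dir.resolve("index.xml").toFile()), new StreamResult(out));
            return out.toString();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(renderGallery("City sights", "geneva", "london"));
        System.out.println(renderGallery("Swiss sights", "geneva"));
    }
}
```

Both calls draw on the same geneva.xml content. The second gallery differs only in its index, which is the point of keeping descriptions in separate files.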


Other applications

The technique described in this tip has numerous applications. You can use the document() function to create indices or tables of contents that span multiple files. I have also used it to retrieve link titles: just pass the link's URI to the document() function and read the title from the document it returns. I have also used it to go through daily logs and generate a monthly report.
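
As one possible illustration of the link-title idea, the Java sketch below turns a link into an HTML anchor whose text comes from the linked document. The links and link elements, the class LinkTitleDemo, and the file links.xsl are all invented for this example; the tip does not define a link vocabulary.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: the link text is read from the document the link points to.
class LinkTitleDemo {
    static final String NS = "http://ananas.org/2003/tips/photo";

    // Invented main source: a list of links identified by their href attribute.
    static final String LINKS = "<links><link href='geneva.xml'/></links>";

    static final String GENEVA = "<ph:photo xmlns:ph='" + NS + "'>"
        + "<ph:title>The Jet d'Eau fountain</ph:title></ph:photo>";

    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' exclude-result-prefixes='ph'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:ph='" + NS + "'>",
        "  <xsl:output method='html'/>",
        "  <xsl:template match='link'>",
        "    <!-- follow the href and recover the title of the target document -->",
        "    <a href='{@href}'><xsl:value-of select='document(@href)/ph:photo/ph:title'/></a>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static void write(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    static String render() {
        try {
            // All files share one directory, so the relative href resolves there.
            Path dir = Files.createTempDirectory("xslt-links");
            write(dir.resolve("geneva.xml"), GENEVA);
            write(dir.resolve("links.xml"), LINKS);
            write(dir.resolve("links.xsl"), STYLESHEET);
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(dir.resolve("links.xsl").toFile()));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(dir.resolve("links.xml").toFile()), new StreamResult(out));
            return out.toString();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

The exclude-result-prefixes attribute keeps the ph namespace declaration out of the generated anchor.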



Download

Name: x-tipcombxslt/x-combxsltcode.zip
Download method: HTTP


Resources
