Tip: Divide and conquer large XML documents

How to break up documents with XSLT

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft
Benoit Marchal is a Belgian consultant. He is the author of XML by Example and other XML books. Benoit is available to help you with XML projects. You can contact him at bmarchal@pineapplesoft.com.

Summary:  Occasionally, you get an XML file that is too large to publish as is. The solution is to use your XSLT processor to break the file into smaller documents. This tip demonstrates how to break up documents with popular XSLT processors.


Date:  05 Jun 2003
Level:  Introductory

In a previous tip (see Resources), I explained how to combine different XML documents in a stylesheet. To illustrate the example, I used a photo gallery composed of four individual XML documents that were eventually combined into one Web page. This technique is also handy for log files (combining daily logs into a monthly report) and tables of contents (combining several chapters into a single table of contents).

This tip takes a different approach: What if you have one XML document, but you need several pages in the output? You can break long documents into smaller pages so that they download faster.

A photo gallery

The document in Listing 1 is a small photo gallery with descriptions for four photos. Your task is to style the document into a small Web site. To speed up the download, each photo should be on a page of its own. The difficult part is splitting the original document into as many pages as there are photos.
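
As a rough sketch of the input shape that the stylesheets in this tip expect: the element names and the namespace URI below are taken from the XPath expressions in the stylesheets, while the text values and image file names are placeholders, not the actual gallery content. Only two of the four photos are shown.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sample: structure inferred from the stylesheets,
     values are placeholders -->
<gl:gallery xmlns:gl="http://ananas.org/2003/tips/gallery">
   <gl:title>Sample gallery</gl:title>
   <gl:photo>
      <gl:title>First photo</gl:title>
      <gl:image>first.jpg</gl:image>
      <gl:date>2003-01-01</gl:date>
      <gl:description>Placeholder description of the first photo.</gl:description>
   </gl:photo>
   <gl:photo>
      <gl:title>Second photo</gl:title>
      <gl:image>second.jpg</gl:image>
      <gl:date>2003-01-02</gl:date>
      <gl:description>Placeholder description of the second photo.</gl:description>
   </gl:photo>
</gl:gallery>
```

Each gl:photo element is a direct child of gl:gallery, which is why the stylesheets can rely on position() and on the preceding-sibling and following-sibling axes to number and link the pages.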

When splitting pages, there's good news and bad news. The good news is that you can split a document into as many pages as you like with just a regular XSLT processor. The bad news is that this is not (yet) a standard feature; as you will see, every XSLT processor has a different implementation. Fortunately, the differences are only cosmetic.


Instant publishing

Listing 2 is a stylesheet for publishing the photo gallery. Pay particular attention to the template for gl:photo. This template creates a separate HTML page for each photo and stores it in a file of its own with the xalan:write instruction from Xalan's Redirect extension. This stylesheet has been tested with JDK 1.4.1; it works only with the XSLT processor bundled with the JDK or with Apache Xalan itself (Apache Xalan is the reference implementation for JAXP).


Listing 2. jdk.xsl -- a stylesheet for the JDK 1.4 (and Xalan)
                
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:xalan="org.apache.xalan.xslt.extensions.Redirect"
                extension-element-prefixes="xalan">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p>The photos are <a href="photo-1.html">here</a>.</p>
      </body>
   </html>
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <xalan:write select="concat('photo-',position(),'.html')">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <img src="{gl:image}" align="left"/>
            <h1><xsl:value-of select="gl:title"/></h1>
            <p><xsl:value-of select="gl:date"/></p>
            <p><xsl:value-of select="gl:description"/></p>
            <p>
               <xsl:if test="preceding-sibling::gl:photo">
                  <a href="photo-{position() - 1}.html">Previous</a>
                  <xsl:text> </xsl:text>
               </xsl:if>
               <xsl:if test="following-sibling::gl:photo">
                  <a href="photo-{position() + 1}.html">Next</a>
               </xsl:if>
            </p>
         </body>
      </html>
   </xalan:write>
</xsl:template>

</xsl:stylesheet>

The source code (see Resources) includes the document, the stylesheet, and a small Java application to test this code. Make sure you're running the example on JDK 1.4.

xalan:write tells the processor to save the content of the element in a separate file. The name of the file is given through the select attribute. In this example, the stylesheet computes each file name by inserting the photo's position, as returned by position(), between photo- and .html. The files are called photo-1.html, photo-2.html, photo-3.html, and photo-4.html.

Unfortunately, xalan:write is not part of the XSLT 1.0 standard, so other processors do not recognize it. Xalan implements it as an extension element. To use the extension, you must first declare a namespace prefix (xalan in Listing 2) bound to org.apache.xalan.xslt.extensions.Redirect. Admittedly, org.apache.xalan.xslt.extensions.Redirect is not a valid URI, but Xalan still recognizes it. Next, you have to declare the prefix as an extension prefix through the extension-element-prefixes attribute. The simplest place for both the namespace declaration and the extension-element-prefixes attribute is the xsl:stylesheet element, as in Listing 2.
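
Later Xalan-J 2 documentation describes a URI-style namespace for the same Redirect extension. The following is a minimal sketch of the declaration and of the write element under that assumption; the redirect prefix and the http://xml.apache.org/xalan/redirect namespace are not part of this tip's listings, so check them against the documentation of the Xalan release you actually use. The page body is reduced to a title to keep the sketch short.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the URI-style Redirect namespace documented
     for Xalan-J 2; verify against your Xalan version -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:redirect="http://xml.apache.org/xalan/redirect"
                extension-element-prefixes="redirect">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <!-- select holds an XPath expression that yields the file name -->
   <redirect:write select="concat('photo-',position(),'.html')">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <h1><xsl:value-of select="gl:title"/></h1>
         </body>
      </html>
   </redirect:write>
</xsl:template>

</xsl:stylesheet>
```

The two steps are the same as in Listing 2: bind a prefix to the extension's namespace, then list that prefix in extension-element-prefixes so that the processor treats the element as an instruction rather than as literal output.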


What about other processors?

Xalan is a good processor, but what if you want to use another one? For the time being, you have to dig into the documentation of your favorite XSLT processor and find the equivalent extension. To the best of my knowledge, every XSLT processor offers at least one extension to support multiple output documents.

For example, if you'd rather use Michael Kay's excellent Saxon processor, then you would rewrite the gl:photo template to use saxon:output instead of xalan:write. The changes are minimal because saxon:output is very similar to xalan:write; the main difference is that the file name goes in an href attribute, written as an attribute value template, rather than in a select expression. Listing 3 is a Saxon version of Listing 2 (note the use of a Saxon-defined namespace for the extension).


Listing 3. saxon.xsl -- a stylesheet for Saxon
                
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:saxon="http://icl.com/saxon" 
                extension-element-prefixes="saxon">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p>The photos are <a href="photo-1.html">here</a>.</p>
      </body>
   </html>
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <saxon:output href="photo-{position()}.html">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <img src="{gl:image}" align="left"/>
            <h1><xsl:value-of select="gl:title"/></h1>
            <p><xsl:value-of select="gl:date"/></p>
            <p><xsl:value-of select="gl:description"/></p>
            <p>
               <xsl:if test="preceding-sibling::gl:photo">
                  <a href="photo-{position() - 1}.html">Previous</a>
                  <xsl:text> </xsl:text>
               </xsl:if>
               <xsl:if test="following-sibling::gl:photo">
                  <a href="photo-{position() + 1}.html">Next</a>
               </xsl:if>
            </p>
         </body>
      </html>
   </saxon:output>
</xsl:template>

</xsl:stylesheet>
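
Because the extensions differ only in name, one stylesheet can test for them at run time with the standard XSLT 1.0 function element-available(). The following is a sketch under assumptions: it pairs saxon:output from Listing 3 with exsl:document from EXSLT (supported, for example, by libxslt's xsltproc), which is not otherwise covered in this tip, and the named template page is a hypothetical helper holding a shortened page body.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:saxon="http://icl.com/saxon"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="saxon exsl">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <xsl:variable name="file" select="concat('photo-',position(),'.html')"/>
   <xsl:choose>
      <!-- Use whichever extension element the processor implements -->
      <xsl:when test="element-available('saxon:output')">
         <saxon:output href="{$file}">
            <xsl:call-template name="page"/>
         </saxon:output>
      </xsl:when>
      <xsl:when test="element-available('exsl:document')">
         <exsl:document href="{$file}" method="html">
            <xsl:call-template name="page"/>
         </exsl:document>
      </xsl:when>
      <xsl:otherwise>
         <xsl:message terminate="yes">No supported multiple-output extension found.</xsl:message>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>

<!-- Hypothetical helper: call-template keeps the current gl:photo
     as the context node, so the relative paths still resolve -->
<xsl:template name="page">
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p><xsl:value-of select="gl:description"/></p>
      </body>
   </html>
</xsl:template>

</xsl:stylesheet>
```

XSLT 1.0 does not treat an unimplemented extension element as an error unless the element is actually instantiated, so the branch for the missing extension is simply skipped.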

XSLT 2.0, which is currently under development at the W3C, defines a standard instruction to generate multiple outputs. In practice, it is very similar to xalan:write or saxon:output, but it has been given a standard name. In the May 2, 2003 draft of XSLT 2.0 (the last draft available at the time of this writing), the instruction is xsl:result-document. Listing 4 demonstrates its use. Note that this is an XSLT 2.0 stylesheet, as denoted by the version attribute.


Listing 4. xsl2.xsl -- an XSLT 2.0 stylesheet that takes advantage of the new standard xsl:result-document instruction

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <xsl:apply-templates select="gl:photo"/>
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p>The photos are <a href="photo-1.html">here</a>.</p>
      </body>
   </html>
</xsl:template>

<xsl:template match="gl:photo">
   <xsl:result-document href="photo-{position()}.html">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <img src="{gl:image}" align="left"/>
            <h1><xsl:value-of select="gl:title"/></h1>
            <p><xsl:value-of select="gl:date"/></p>
            <p><xsl:value-of select="gl:description"/></p>
            <p>
               <xsl:if test="preceding-sibling::gl:photo">
                  <a href="photo-{position() - 1}.html">Previous</a>
                  <xsl:text> </xsl:text>
               </xsl:if>
               <xsl:if test="following-sibling::gl:photo">
                  <a href="photo-{position() + 1}.html">Next</a>
               </xsl:if>
            </p>
         </body>
      </html>
   </xsl:result-document>
</xsl:template>

</xsl:stylesheet>


Conclusion

When publishing a Web site, it is often helpful to break one XML document into several smaller files, each of which downloads faster than the whole. This is also helpful when working with frames. However, multiple outputs are not limited to publishing. I have also used this technique in electronic commerce projects to break large database exports into smaller, more manageable files.
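
Splitting an export into fixed-size files could be sketched as follows, assuming a processor that implements XSLT 2.0 as it was eventually standardized. The export and record element names, the size parameter, and the export-N.xml file names are all hypothetical; the tip does not describe the actual export format.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

<!-- Hypothetical chunk size: number of records per output file -->
<xsl:param name="size" as="xs:integer" select="100"/>

<xsl:output method="xml" indent="yes"/>

<!-- export and record are hypothetical element names -->
<xsl:template match="/export">
   <!-- Records 1..size get key 0, the next batch gets key 1, and so on -->
   <xsl:for-each-group select="record"
                       group-adjacent="(position() - 1) idiv $size">
      <xsl:result-document href="export-{current-grouping-key() + 1}.xml">
         <export>
            <xsl:copy-of select="current-group()"/>
         </export>
      </xsl:result-document>
   </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>
```

Grouping on an integer division of the position keeps consecutive records together, so each output file holds one contiguous slice of the original export.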



Download

Name: x-tipdivbigcode.zip
Download method: HTTP


Resources
