Tip: Divide and conquer large XML documents

How to break up documents with XSLT

Level: Introductory

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft

05 Jun 2003

Occasionally, you get an XML file that is too large to publish as is. The solution is to use your XSLT processor to break the file into smaller documents. This tip demonstrates how to break up documents with popular XSLT processors.

In a previous tip (see Resources), I explained how to combine different XML documents in a stylesheet. To illustrate the technique, I used a photo gallery composed of four individual XML documents that were eventually combined into one Web page. The same approach is also handy for log files (combining daily logs into a monthly report) and tables of contents (combining several chapters into a single table of contents).

This tip takes a different approach: What if you have one XML document, but you need several pages in the output? You can break long documents into smaller pages so that they download faster.

A photo gallery

The document in Listing 1 is a small photo gallery with descriptions for four photos. Your task is to style the document into a small Web site. To speed up the download, each photo should be on a page of its own. The difficult part is splitting the original document into as many pages as there are photos.
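
For orientation only, here is a rough sketch of a document with the shape that the stylesheets in this tip expect. The element names and the namespace are taken from the stylesheets; the titles, dates, file names, and descriptions are placeholders invented for illustration (and only two of the four photos are shown), so this should not be read as the original gallery document.

```xml
<?xml version="1.0"?>
<!-- Hypothetical gallery: element names come from the stylesheets,
     every value below is an illustrative placeholder -->
<gl:gallery xmlns:gl="http://ananas.org/2003/tips/gallery">
   <gl:title>Sample gallery</gl:title>
   <gl:photo>
      <gl:title>First photo</gl:title>
      <gl:image>first.jpg</gl:image>
      <gl:date>2003-01-01</gl:date>
      <gl:description>Placeholder description of the first photo.</gl:description>
   </gl:photo>
   <gl:photo>
      <gl:title>Second photo</gl:title>
      <gl:image>second.jpg</gl:image>
      <gl:date>2003-01-02</gl:date>
      <gl:description>Placeholder description of the second photo.</gl:description>
   </gl:photo>
</gl:gallery>
```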

When splitting pages, there's good news and bad news. The good news is that you can split a document into as many pages as you like with just a regular XSLT processor. The bad news is that this is not (yet) a standard feature; as you will see, every XSLT processor has a different implementation. Fortunately, the differences are only cosmetic.



Instant publishing

Listing 2 is a stylesheet for publishing the photo gallery. Pay particular attention to the template for gl:photo. This template creates a separate HTML page for each photo and stores it in a file of its own with the xalan:write instruction, which belongs to Xalan's Redirect extension. This stylesheet has been tested with JDK 1.4.1, and it works only with the XSLT processor bundled with the JDK or with Xalan itself (Apache Xalan is the reference implementation for JAXP).


Listing 2. jdk.xsl -- a stylesheet for the JDK 1.4 (and Xalan)
                
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:xalan="org.apache.xalan.xslt.extensions.Redirect"
                extension-element-prefixes="xalan">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p>The photos are <a href="photo-1.html">here</a>.</p>
      </body>
   </html>
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <xalan:write select="concat('photo-',position(),'.html')">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <img src="{gl:image}" align="left"/>
            <h1><xsl:value-of select="gl:title"/></h1>
            <p><xsl:value-of select="gl:date"/></p>
            <p><xsl:value-of select="gl:description"/></p>
            <p>
               <xsl:if test="preceding-sibling::gl:photo">
                  <a href="photo-{position() - 1}.html">Previous</a>
                  <xsl:text> </xsl:text>
               </xsl:if>
               <xsl:if test="following-sibling::gl:photo">
                  <a href="photo-{position() + 1}.html">Next</a>
               </xsl:if>
            </p>
         </body>
      </html>
   </xalan:write>
</xsl:template>

</xsl:stylesheet>

The source code (see Resources) includes the document, the stylesheet, and a small Java application to test this code. Make sure you're running the example on JDK 1.4.

The xalan:write element, supplied by Xalan's Redirect extension, tells the processor to save the content of the element in a separate file. The name of the file is given through the select attribute. In this example, the stylesheet computes each file name by inserting the photo number (its position in the list of photos, to be more precise) between photo- and .html. The files are called photo-1.html, photo-2.html, photo-3.html, and photo-4.html.
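
The file numbering relies on position(), which reports the photo's place in the list of nodes selected by xsl:apply-templates. It yields 1 through 4 here because the gl:gallery template selects only gl:photo elements. As a hedged sketch, a variant that computes the number from the document structure instead, and so does not depend on how the template is invoked, might look like this (the variable name n is illustrative, and the page body is trimmed to the title):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:xalan="org.apache.xalan.xslt.extensions.Redirect"
                extension-element-prefixes="xalan">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <!-- count the photos that precede this one, then add 1
        to obtain a 1-based photo number -->
   <xsl:variable name="n" select="count(preceding-sibling::gl:photo) + 1"/>
   <xalan:write select="concat('photo-',$n,'.html')">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <h1><xsl:value-of select="gl:title"/></h1>
         </body>
      </html>
   </xalan:write>
</xsl:template>

</xsl:stylesheet>
```

With this variant, the Previous and Next links would use $n - 1 and $n + 1 in place of the position() expressions.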

Unfortunately, xalan:write is not part of the XSLT standard, so other processors do not recognize it. Xalan implements it as an extension element, part of its Redirect extension. To declare the extension, you must first declare a namespace for the org.apache.xalan.xslt.extensions.Redirect URI (Listing 2 binds it to the xalan prefix). Admittedly, org.apache.xalan.xslt.extensions.Redirect looks like a Java class name rather than a URI, but Xalan still recognizes it. Next, you have to declare the namespace as an extension through the extension-element-prefixes attribute. Both the namespace declaration and the extension-element-prefixes attribute must appear on the xsl:stylesheet element.
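
The prefix itself is arbitrary; only the namespace it is bound to matters. As a sketch, assuming a Xalan-Java 2 release whose extensions library documents the http://xml.apache.org/xalan/redirect namespace for the Redirect extension (check the documentation for your version), the same two declarations are often written with a redirect prefix. The page body is trimmed to the title to keep the example short:

```xml
<?xml version="1.0"?>
<!-- Assumption: this Xalan release recognizes the
     http://xml.apache.org/xalan/redirect namespace -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:redirect="http://xml.apache.org/xalan/redirect"
                extension-element-prefixes="redirect">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <!-- same instruction as in Listing 2, under a different prefix -->
   <redirect:write select="concat('photo-',position(),'.html')">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <h1><xsl:value-of select="gl:title"/></h1>
         </body>
      </html>
   </redirect:write>
</xsl:template>

</xsl:stylesheet>
```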



What about other processors?

Xalan is a good processor, but what if you want to use another one? For the time being, you have to dig into the documentation of your favorite XSLT processor and find the equivalent extension. To the best of my knowledge, every XSLT processor offers at least one extension to support multiple output documents.

For example, if you'd rather use Michael Kay's excellent Saxon processor, then you would rewrite the gl:photo template to use saxon:output instead of xalan:write. The changes are minimal because saxon:output is very similar to Xalan's instruction; the main difference is that the file name goes in an href attribute, written as an attribute value template, rather than in a select expression. Listing 3 is a Saxon version of Listing 2 (note the use of a Saxon-defined namespace for the extension).


Listing 3. saxon.xsl -- a stylesheet for Saxon
                
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:saxon="http://icl.com/saxon" 
                extension-element-prefixes="saxon">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p>The photos are <a href="photo-1.html">here</a>.</p>
      </body>
   </html>
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <saxon:output href="photo-{position()}.html">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <img src="{gl:image}" align="left"/>
            <h1><xsl:value-of select="gl:title"/></h1>
            <p><xsl:value-of select="gl:date"/></p>
            <p><xsl:value-of select="gl:description"/></p>
            <p>
               <xsl:if test="preceding-sibling::gl:photo">
                  <a href="photo-{position() - 1}.html">Previous</a>
                  <xsl:text> </xsl:text>
               </xsl:if>
               <xsl:if test="following-sibling::gl:photo">
                  <a href="photo-{position() + 1}.html">Next</a>
               </xsl:if>
            </p>
         </body>
      </html>
   </saxon:output>
</xsl:template>

</xsl:stylesheet>
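
Because Listings 2 and 3 differ only in the extension element, one stylesheet can in principle serve both processors. The following is an untested sketch, not part of the sample code: it uses the standard XSLT 1.0 element-available() function to pick whichever extension the processor implements. The named template photo-page is a hypothetical helper introduced here to avoid repeating the page body.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery"
                xmlns:xalan="org.apache.xalan.xslt.extensions.Redirect"
                xmlns:saxon="http://icl.com/saxon"
                extension-element-prefixes="xalan saxon">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p>The photos are <a href="photo-1.html">here</a>.</p>
      </body>
   </html>
   <xsl:apply-templates select="gl:photo"/>
</xsl:template>

<xsl:template match="gl:photo">
   <xsl:choose>
      <!-- Xalan (or the JDK 1.4 processor): use the Redirect extension -->
      <xsl:when test="element-available('xalan:write')">
         <xalan:write select="concat('photo-',position(),'.html')">
            <xsl:call-template name="photo-page"/>
         </xalan:write>
      </xsl:when>
      <!-- Saxon: same idea, file name given through href -->
      <xsl:when test="element-available('saxon:output')">
         <saxon:output href="photo-{position()}.html">
            <xsl:call-template name="photo-page"/>
         </saxon:output>
      </xsl:when>
      <!-- neither extension is available: stop with a clear message -->
      <xsl:otherwise>
         <xsl:message terminate="yes">No supported multiple-output extension found.</xsl:message>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>

<!-- Hypothetical helper: the page body shared by both branches.
     xsl:call-template keeps the current photo and its position(). -->
<xsl:template name="photo-page">
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <img src="{gl:image}" align="left"/>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p><xsl:value-of select="gl:date"/></p>
         <p><xsl:value-of select="gl:description"/></p>
         <p>
            <xsl:if test="preceding-sibling::gl:photo">
               <a href="photo-{position() - 1}.html">Previous</a>
               <xsl:text> </xsl:text>
            </xsl:if>
            <xsl:if test="following-sibling::gl:photo">
               <a href="photo-{position() + 1}.html">Next</a>
            </xsl:if>
         </p>
      </body>
   </html>
</xsl:template>

</xsl:stylesheet>
```

The XSLT 1.0 specification requires a processor to tolerate an extension element it does not implement as long as that element is never instantiated, which is what makes the xsl:choose test safe. Behavior may still vary between processor versions, so treat this as a starting point.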

XSLT 2.0, which is currently under development at the W3C, defines a standard instruction to generate multiple outputs. In practice, it is very similar to xalan:write or saxon:output, but it has been given a standard name. In the May 2, 2003 draft of XSLT 2.0 (the last draft available at the time of this writing), the instruction is xsl:result-document. Listing 4 demonstrates its use. Note that this is an XSLT 2.0 stylesheet, as denoted by the version attribute.


Listing 4. xsl2.xsl -- an XSLT 2.0 stylesheet that takes advantage of the new standard xsl:result-document instruction

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:gl="http://ananas.org/2003/tips/gallery">

<xsl:output method="html"/>

<xsl:template match="gl:gallery">
   <xsl:apply-templates select="gl:photo"/>
   <html>
      <head><title><xsl:value-of select="gl:title"/></title></head>
      <body>
         <h1><xsl:value-of select="gl:title"/></h1>
         <p>The photos are <a href="photo-1.html">here</a>.</p>
      </body>
   </html>
</xsl:template>

<xsl:template match="gl:photo">
   <xsl:result-document href="photo-{position()}.html">
      <html>
         <head><title><xsl:value-of select="gl:title"/></title></head>
         <body>
            <img src="{gl:image}" align="left"/>
            <h1><xsl:value-of select="gl:title"/></h1>
            <p><xsl:value-of select="gl:date"/></p>
            <p><xsl:value-of select="gl:description"/></p>
            <p>
               <xsl:if test="preceding-sibling::gl:photo">
                  <a href="photo-{position() - 1}.html">Previous</a>
                  <xsl:text> </xsl:text>
               </xsl:if>
               <xsl:if test="following-sibling::gl:photo">
                  <a href="photo-{position() + 1}.html">Next</a>
               </xsl:if>
            </p>
         </body>
      </html>
   </xsl:result-document>
</xsl:template>

</xsl:stylesheet>



Conclusion

When publishing a Web site, it is often helpful to break one XML document into several smaller files, each of which downloads faster than the complete document. This is also helpful when working with frames. However, multiple outputs are not limited to publishing. I have also used this technique in electronic commerce projects to break large database exports into smaller, more manageable files.




Download

Name                   Size   Download method
x-tipdivbigcode.zip           HTTP


Resources



About the author

Benoit Marchal is a Belgian consultant. He is the author of XML by Example and other XML books. Benoit is available to help you with XML projects. You can contact him at bmarchal@pineapplesoft.com.



