
Tip: Use XSL-FO for page breaks and tables

Produce great-looking documents with keep properties and blind tables

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft
Benoit Marchal is a Belgian consultant. He is the author of XML by Example and other XML books. Benoit is available to help you with XML projects. You can contact him at bmarchal@pineapplesoft.com.

Summary:  The XSL Formatting Objects (XSL-FO) standard offers powerful properties for controlling the layout of printed documents. This tip shows you how to control the insertion of page breaks for better-looking documents. I'll present a standard method that works with commercial XSL-FO renderers, and a workaround so you can apply the same technique with the open source FOP.


Date:  11 Jun 2003
Level:  Introductory

The XSL Formatting Objects (XSL-FO) standard is one of the least known parts of the XSL standard (the more prominent part being XSLT). This is unfortunate because XSL-FO is very useful for tasks such as producing PDF or PostScript files from XML documents. Generally speaking, XSL-FO controls the layout and presentation of XML documents. In practice, most users find HTML pages suitable for on-screen reading but prefer PDFs (and therefore XSL-FO) for printed copies.

Control page breaks

One of the benefits of XSL-FO is that the renderer automatically flows a document's text onto as many pages as necessary: when a page is full, it starts the next one. Unfortunately, the algorithm sometimes inserts page breaks at the wrong positions, such as between a section title and the section content. Likewise, it sometimes separates a figure label from the actual image, or a table heading from the table rows.

Figure 1 and Figure 2 illustrate the problem. In Figure 1, the section title is at the bottom of a page while the remainder of the section is on the next page.


Figure 1. An inappropriate page break after the section title
Page break after section title

In Figure 2, the image label has been separated from the image.


Figure 2. An inappropriate page break after an image label
Page break after image label

Problems arise because the XSL-FO renderer has a bunch of blocks to work with and it does not know how they are related to each other. It does not know that the title is . . . well, a title. The solution is to tell the XSL-FO renderer which blocks are related to each other or, more specifically, which blocks should remain together. To this end, the standard defines several properties in the keep and break category. The keep-with-previous and keep-with-next properties specify whether a block should remain with the previous or next block.

The properties apply to the within-line, within-column, and within-page components. As their names imply, these components control the level at which the grouping of blocks should occur. Typically, I use the within-page component.

Acceptable values are auto (no special processing), always (always keeps the blocks on the same page), or an integer. Integers specify a priority, so if several keep properties are in conflict, the one with the highest numeric priority takes precedence. Overall, the always value has the highest priority.
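
As a rough illustration of the component and value syntax described above, here is a minimal, self-contained XSL-FO document. It is a sketch only and is not part of the companion code: the page dimensions, the text, and the priority value 5 are all arbitrary assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
   <fo:layout-master-set>
      <!-- Arbitrary A4 page, assumed for the sake of the example -->
      <fo:simple-page-master master-name="page"
                             page-height="29.7cm" page-width="21cm"
                             margin-top="2cm" margin-bottom="2cm"
                             margin-left="2cm" margin-right="2cm">
         <fo:region-body/>
      </fo:simple-page-master>
   </fo:layout-master-set>
   <fo:page-sequence master-reference="page">
      <fo:flow flow-name="xsl-region-body">
         <!-- always: request that no page break separates this block from the next -->
         <fo:block keep-with-next.within-page="always">Section title</fo:block>
         <!-- integer: a weaker keep that yields to a conflicting keep with a higher value -->
         <fo:block keep-with-next.within-page="5">Introductory sentence.</fo:block>
         <!-- auto: no constraint, which is also the default -->
         <fo:block keep-with-next.within-page="auto">Body text.</fo:block>
      </fo:flow>
   </fo:page-sequence>
</fo:root>
```

Replacing within-page with within-column or within-line applies the same constraint at the column or line level.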

Listing 1 is an excerpt from a stylesheet that uses the keep properties to prevent breaks between the title and the section, or between the label and the image. In XSL-FO, properties are written as attributes on the formatting objects. (See Resources to download the companion code for the complete stylesheet.)


Listing 1. The keep properties
<xsl:template match="doc:title" mode="keep">
   <fo:block font-family="Times Roman"
             font-size="12pt"
             space-after="0.6em"
             keep-with-next.within-page="always">
      <xsl:apply-templates mode="keep"/>
   </fo:block>
</xsl:template>

<xsl:template match="doc:figure" mode="keep">
   <fo:block font-family="Times Roman"
             font-size="8pt"
             font-style="italic"
             space-after="0.5em">
      <xsl:value-of select="@label"/>
   </fo:block>
   <fo:block keep-with-previous.within-page="always">
      <fo:external-graphic src="url({@src})"
                           width="{@width}"
                           height="{@height}"/>
   </fo:block>
</xsl:template>

Be warned, though, that FOP (Formatting Objects Processor, the open source XSL-FO renderer from Apache) does not fully implement the keep properties. If you use Listing 1, FOP ignores the keep properties, so inappropriate page breaks may still occur. To the best of my knowledge, and at least at the time of this writing, only commercial renderers such as XEP from RenderX or XSL Formatter from Antenna House (see Resources) implement the keep properties.


FOP workaround

Given that commercial implementations offer more complete support for the standard, buying a copy is probably the best solution for the time being. Still, with licensing costs around US$5,000 per server (though limited workstation licenses can be as low as US$79), not every project can afford its own XSL-FO renderer. This is particularly true in the early phases of projects when budgets are limited.

A workaround for using the keep properties with FOP is available, but it is only a partial one. FOP recognizes the keep properties on table rows only. This limited support is useful in its own right for keeping table headings with the remainder of the table, and it also provides a workaround until full support is available. The workaround relies on so-called blind tables (tables that are introduced exclusively for layout and are not visible as such). If you've done any HTML coding, you should already be familiar with using tables for layout only (such as to emulate columns).
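
The table-row support can be used directly to hold a heading row together with the first data row. The stylesheet below is a minimal sketch of that idea, not an excerpt from the companion code. The doc:table, doc:row, and doc:cell elements and the doc namespace URI are hypothetical; adapt them to your own vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                xmlns:doc="http://example.com/ns/doc">

   <!-- Hypothetical source markup: doc:table holds doc:row children,
        each with doc:cell children; the first doc:row is the heading. -->
   <xsl:template match="doc:table">
      <fo:table table-layout="fixed" width="100%">
         <!-- One equal-width column per cell of the heading row -->
         <xsl:for-each select="doc:row[1]/doc:cell">
            <fo:table-column column-width="proportional-column-width(1)"/>
         </xsl:for-each>
         <fo:table-body>
            <xsl:apply-templates select="doc:row"/>
         </fo:table-body>
      </fo:table>
   </xsl:template>

   <!-- Heading row: ask the renderer to keep it with the row that follows.
        This pattern has a higher default priority than the plain doc:row one. -->
   <xsl:template match="doc:row[1]">
      <fo:table-row keep-with-next="always">
         <xsl:apply-templates select="doc:cell"/>
      </fo:table-row>
   </xsl:template>

   <!-- All other rows carry no keep constraint -->
   <xsl:template match="doc:row">
      <fo:table-row>
         <xsl:apply-templates select="doc:cell"/>
      </fo:table-row>
   </xsl:template>

   <xsl:template match="doc:cell">
      <fo:table-cell>
         <fo:block><xsl:value-of select="."/></fo:block>
      </fo:table-cell>
   </xsl:template>
</xsl:stylesheet>
```

Styling of the heading row is left out to keep the sketch short.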

Listing 2 is another stylesheet excerpt that demonstrates the keep properties in a blind table with FOP. This version of the stylesheet prevents inappropriate page breaks. The template for doc:figure creates a blind table with the label in the first row and the image in the second row. The keep-with-previous property is applied to the second row.


Listing 2. A blind table
<xsl:template match="doc:figure" mode="blind">
   <fo:table table-layout="fixed" width="100%">
      <fo:table-column column-width="proportional-column-width(1)"/>
      <fo:table-body>
         <fo:table-row padding-bottom="0.5em">
            <fo:table-cell>
               <fo:block font-family="Times Roman"
                         font-style="italic"
                         font-size="8pt">
                  <xsl:value-of select="@label"/>
               </fo:block>
            </fo:table-cell>
         </fo:table-row>
         <fo:table-row keep-with-previous="always">
            <fo:table-cell>
               <fo:block>
                  <fo:external-graphic src="url({@src})"
                                       width="{@width}"
                                       height="{@height}"/>
               </fo:block>
            </fo:table-cell>
         </fo:table-row>
      </fo:table-body>
   </fo:table>
</xsl:template>

<xsl:template match="doc:section" mode="blind">
   <fo:table table-layout="fixed" width="100%">
      <fo:table-column column-number="1"/>
      <fo:table-body>
         <fo:table-row keep-with-next="always">
            <fo:table-cell>
               <xsl:apply-templates select="doc:title" mode="blind"/>
            </fo:table-cell>
         </fo:table-row>
         <fo:table-row>
            <fo:table-cell>
               <xsl:apply-templates select="*[2]" mode="blind"/>
            </fo:table-cell>
         </fo:table-row>
      </fo:table-body>
   </fo:table>
   <xsl:apply-templates select="*[position() > 2]" mode="blind"/>
</xsl:template>

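The doc:section template is similar. Again, it creates a blind table with two rows. The title goes in the first row, the first paragraph in the second row. Care must be taken to select the first paragraph. I choose to select it by position rather than by name for increased flexibility: *[2] is the second child element of the section, whatever its name, which assumes that the title is the first child. The remaining children (*[position() > 2]) are processed after the table.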
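
To make the positional selection concrete, here is a hypothetical input fragment annotated with the expression that picks up each child. The doc:para element name, the namespace URI, and the attribute values are assumptions for illustration; only doc:section, doc:title, doc:figure, and the figure attributes appear in the listings.

```xml
<doc:section xmlns:doc="http://example.com/ns/doc">
   <!-- Selected by name with doc:title; placed in the first row -->
   <doc:title>Control page breaks</doc:title>
   <!-- Selected with *[2]; placed in the second row -->
   <doc:para>First paragraph of the section.</doc:para>
   <!-- Everything below is selected with *[position() > 2]; placed after the table -->
   <doc:figure label="Sample figure" src="sample.png" width="8cm" height="6cm"/>
   <doc:para>Second paragraph of the section.</doc:para>
</doc:section>
```

If a section did not start with its title, *[2] would no longer be the first paragraph, so the template depends on that ordering.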

In practice, blind tables are best suited for figure labels and table headings rather than section titles. FOP is reluctant to break in the middle of a cell, which means that it will resist breaking long paragraphs if they are included in a blind table. This, in turn, may cause more page breaks than are strictly necessary. You need to evaluate where to apply this technique in your documents.

One improvement might be to generate a blind table for short paragraphs only (you can evaluate the length of a paragraph with the string-length() function).
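
The following is a minimal sketch of that improvement, under stated assumptions: the max-blind-length parameter and its value of 400 characters are hypothetical, the doc namespace URI is a placeholder, and the templates for doc:title and the other children in blind mode are expected to exist elsewhere in the stylesheet and to produce block-level formatting objects. It also assumes every section has at least a title and one following element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                xmlns:doc="http://example.com/ns/doc">

   <!-- Hypothetical threshold, in characters; tune it for your documents. -->
   <xsl:param name="max-blind-length" select="400"/>

   <xsl:template match="doc:section" mode="blind">
      <xsl:choose>
         <!-- Short first paragraph: keep it with the title in a blind table. -->
         <xsl:when test="string-length(normalize-space(*[2])) &lt;= $max-blind-length">
            <fo:table table-layout="fixed" width="100%">
               <fo:table-column column-width="proportional-column-width(1)"/>
               <fo:table-body>
                  <fo:table-row keep-with-next="always">
                     <fo:table-cell>
                        <xsl:apply-templates select="doc:title" mode="blind"/>
                     </fo:table-cell>
                  </fo:table-row>
                  <fo:table-row>
                     <fo:table-cell>
                        <xsl:apply-templates select="*[2]" mode="blind"/>
                     </fo:table-cell>
                  </fo:table-row>
               </fo:table-body>
            </fo:table>
         </xsl:when>
         <!-- Long first paragraph: emit plain blocks so it can break across pages. -->
         <xsl:otherwise>
            <xsl:apply-templates select="doc:title" mode="blind"/>
            <xsl:apply-templates select="*[2]" mode="blind"/>
         </xsl:otherwise>
      </xsl:choose>
      <xsl:apply-templates select="*[position() > 2]" mode="blind"/>
   </xsl:template>
</xsl:stylesheet>
```

The trade-off is that, with FOP, a title followed by a long paragraph is no longer protected against a break right after the title.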


Summary

Thanks to XSL-FO, it is not difficult to turn XML documents into PDFs, and XSL-FO gives you lots of control over the final layout of the document, as this tip demonstrates.



Download

Name                  Size    Download method
x-tippgbkcode4.zip            HTTP



Resources

  • Download the source code used in this article. This download includes a test document and a stylesheet that produces a PDF demonstrating the three techniques: no concern for page breaks, using keep properties, and using a blind table. The second technique won't work with FOP.

  • To test the code, you need an XSLT processor such as Xalan-Java and an XSL-FO renderer. FOP is a popular open source XSL-FO renderer. Be warned, though, that it does not fully implement the XSL-FO standard. For more complete standard coverage, you should turn to commercial renderers such as XEP from RenderX or the XSL Formatter from Antenna House.

  • Learn XSL-FO with these developerWorks tutorials:

  • If you already know HTML, the HTML to Formatting Objects (FO) conversion guide will help you be productive in no time (developerWorks, February 2003).

  • Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.

  • IBM trial software: Build your next development project with trial software available for download directly from developerWorks.

  • Find out how you can become an IBM Certified Developer in XML and related technologies.

