HTML to Formatting Objects (FO) conversion guide

Use these XSLT templates to speed your conversions of HTML elements to FO and thence to PDF

Need help converting HTML documents to PDF? This reference guide shows by example how to use XSLT templates to convert 45 commonly used HTML elements to formatting objects (from the XSL-FO vocabulary) for easy transformation to PDF using XSLT. The examples assume that you're using the Apache XML Project's FOP tool, but most of the methods work just as well with other XSL-FO tools.

Doug Tidwell, Cyber Evangelist for developerWorks, IBM

Doug Tidwell is developerWorks' Cyber Evangelist, helping people use new technologies to solve problems. He has spoken about Web Services and XML to tens of thousands of developers around the world, a number of whom actually stayed awake. He is also the author of O'Reilly's XSLT and a co-author of O'Reilly's Programming Web Services with SOAP, both of which make excellent gifts for your friends and loved ones. You can contact Doug at dtidwell@us.ibm.com.



11 December 2012 (First published 01 February 2003)

11 Dec 2012 - The author updated the article and the accompanying stylesheet to work with FOP Version 1.1. He changed the <fo:page-sequence> element and the implementation of the HTML <nobr> element (see the <nobr> Text with no line breaks section). The stylesheet also uses the standard <fo:bookmark-tree>, <fo:bookmark> and <fo:bookmark-title> elements to generate bookmarks in the PDF file (see Generating bookmarks under <h1> through <h6> Headings). See the Downloads section to get the latest version of the stylesheet.

20 Jun 2011 - The author requested two updates to the second code listing of the <ul> An unordered list section. He added an fo:block element around the content of two fo:list-item-body elements which changed A Love Supreme to <fo:block>A Love Supreme</fo:block> and changed The Joshua Tree to <fo:block>The Joshua Tree</fo:block>.

We all design our HTML pages to look good on the screen, but printing those Web pages is usually an afterthought. To create printable versions of Web pages, the best approach is to use XSLT and XSL-FO to generate a PDF file. You can do the job with an open-source XSLT processor, the XSL Formatting Objects (XSL-FO) vocabulary, and a formatting-object engine. If you already know how to work with XSL-FO and XSLT, this guide provides a valuable resource: It goes through the most common HTML tags and defines how to convert each of them into formatting objects. (If you need background on using XSL-FO, try the developerWorks tutorials on the subject, easily found through Resources.)

This guide includes dozens of examples that illustrate how to write XSLT style sheets to do the conversion from HTML element to the corresponding formatting object, the basic building block of documents rendered with XSL-FO.

A quick note about the XSLT templates in this guide: almost all of them contain this element:

<xsl:apply-templates select="*|text()"/>

This element tells the XSLT processor to get all of the text and child elements of the current element and transform them as well. This recursive technique ensures that all of the HTML elements are processed, regardless of how they are nested within each other. For more information about XSLT and XSL-FO techniques, see Resources at the end of this guide.
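To see where that recursive call sits in a whole style sheet, here is a minimal sketch. It is not the guide's xhtml-to-xslfo.xsl; the page master name, page dimensions, and spacing values are placeholders chosen for illustration, and it assumes the HTML elements are in no namespace, as the templates in this guide do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- The root template sets up one page layout and one page sequence.
       The master name "page" and the letter-size dimensions are
       illustrative assumptions. -->
  <xsl:template match="html">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-width="8.5in" page-height="11in" margin="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <xsl:apply-templates select="body"/>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- The body becomes the flow; only child elements are selected here
       so that stray whitespace does not land directly in the flow. -->
  <xsl:template match="body">
    <fo:flow flow-name="xsl-region-body">
      <xsl:apply-templates select="*"/>
    </fo:flow>
  </xsl:template>

  <!-- A typical element template: emit a formatting object, then
       recurse into the child elements and text. -->
  <xsl:template match="p">
    <fo:block space-after="12pt">
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Elements without a template of their own fall through to XSLT's built-in rules, which keep recursing and copy text nodes to the output, so nested content is not lost while you add templates one element at a time.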

Context by example

For a little context for this reference material, you can view an HTML document (download x-xslfo2app-samples.zip to view everything.html) which contains all of the elements discussed in the guide. You can also see the XSLT style sheet, xhtml-to-xslfo.xsl (found in x-xslfo2app-samples.zip), that contains all of the templates referred to in this guide, along with most of the advanced techniques covered in the companion tutorial. To use the style sheet with the HTML file, use this command:

> fop -xml everything.html -xsl xhtml-to-xslfo.xsl -pdf everything.pdf

The command tells the FOP rendering engine to transform the XHTML file everything.html into formatting objects using the rules in xhtml-to-xslfo.xsl, then generate a PDF file named everything.pdf from those formatting objects.

Here's a screen capture of the PDF file:

Screen capture of everything.pdf sample file

If you like, you can view the file everything.pdf (found in x-xslfo2app-samples.zip). The command above generates a PDF file on letter-size paper. To generate an A4-sized PDF file, add -param page-size a4 to the end of the command.


Missing HTML elements

For a variety of reasons (most of which are reasonable), this guide doesn't cover some HTML elements. The primary reason they're left out is that they don't make sense inside a PDF file, which is the most common result of formatting-object conversion. Some of the omitted HTML elements have been deprecated by the W3C, which is also a good reason to leave them out. Some of the elements may make sense to you in a PDF context. Tell me what you think; if you make a good case for adding other HTML elements to this guide, I'll consider it (you can use the feedback form at the end of this guide to make suggestions).


Guide to converting HTML elements

This guide shows you how to convert most of the HTML elements into XSL formatting objects. If you're viewing this online, you can click any of the links below to go directly to the discussion of a particular element. For each HTML element, you'll find a brief description of the element, the corresponding formatting object, and an XSLT template for converting HTML into XSL-FO. Like the HTML elements they process, some of the formatting objects and templates are very straightforward, and some are quite involved.

As always happens with example code, I've had to make some choices that are specific to the case at hand. All of the examples here assume that ultimately you'll be using the formatting objects as an intermediary to achieve a conversion to PDF. A few of the values I've chosen are arbitrary, but most of the choices I've made are guided by the layout defined for the PDF file that is the ultimate result of all this conversion. That layout is the same one I used in the two tutorials I've completed for developerWorks. Of course, in adapting the examples to your own needs, you would substitute the values that will yield the look you're aiming for; you needn't follow the look of our PDF file.

Keep in mind that alphabetical order, while great for reference, is not ideal for reading straight through. For example, although most of the HTML tags for building tables do fall together under T, they're interrupted in alpha order by the title element.


<a name="..."> Named anchor points

This guide looks at transforming three kinds of anchor elements: named anchor points, discussed in this entry, and named anchor references and anchor references, discussed in the next two entries of the alphabetical guide. The third entry includes an XSLT template that converts all three types of anchor elements.

A named anchor looks like <a name="xyz" />. This is usually transformed to an <fo:block> element with an id. Here's the typical result:

<fo:block id="xyz"/>

This seems simple enough, but there can be problems, depending upon the organization of your document. For example, in the tutorial's example, the style calls for rendering the HTML <h1> element by inserting a horizontal rule and a page break before the heading text. A page break in that position causes problems for named anchors that look like this:

<a name="xslt"/>
<!-- A page break will be inserted here -->
<h1>Using XSLT style sheets</h1>

If the <h1> starts on a new page, creating a link to the named anchor takes the user to the end of the previous page, which is not what was intended. To handle this situation, have the processor look at the element after the named anchor in the HTML document. If the following element is an <h1>, ignore the named anchor; the XSLT template for the <h1> element handles the named anchor in this instance. Here's the XSLT logic that handles a named anchor even in the case of a heading preceded by a page break:

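<xsl:template match="a">
  <xsl:choose>
    <xsl:when test="@name">
      <!-- Skip the anchor when an h1 follows; the h1 template handles it -->
      <xsl:if test="not(name(following-sibling::*[1]) = 'h1')">
        <fo:block line-height="0pt" space-after="0pt"
          font-size="0pt" id="{@name}"/>
      </xsl:if>
    </xsl:when>
    <!-- Anchors with an href attribute are covered in the next two entries -->
  </xsl:choose>
</xsl:template>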

Specifying the following-sibling axis ensures that the style sheet processor checks the element that follows the named anchor. If the name of the first element after this one is not h1, the processor creates an <fo:block> with an id. Notice also that the <fo:block> element sets the line-height, font-size, and space-after properties to zero; you don't want to waste any vertical space rendering the invisible anchor point.
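The other half of this arrangement is the <h1> template, which has to pick up the anchor that was skipped. The following is only a sketch of how that might look, not the template from the guide's style sheet: the font size, weight, and spacing are placeholder values, and the horizontal rule is left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical heading template; the formatting values are
       placeholders, not the ones used in the guide's style sheet. -->
  <xsl:template match="h1">
    <fo:block break-before="page" font-size="24pt"
        font-weight="bold" space-after="12pt">
      <!-- If the element just before this heading is a named anchor,
           reuse its name as the id of the heading block, so links
           land on the new page rather than the previous one. -->
      <xsl:if test="preceding-sibling::*[1][self::a][@name]">
        <xsl:attribute name="id">
          <xsl:value-of select="preceding-sibling::*[1]/@name"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The test mirrors the one in the anchor template: the anchor looks forward one sibling for an <h1>, and the heading looks back one sibling for a named anchor, so each id is generated exactly once.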


<a href="#..."> Named anchor references

To transform an anchor tag that references another destination in the same document, convert it into an <fo:basic-link> element. For references to the same document, use the internal-destination attribute. For example, say you have an anchor element that looks like this:

For more information, see <a href="#chapter1">Chapter 1</a>.

You need to transform the anchor element into this XSL-FO markup:

For more information, see 
<fo:basic-link color="blue" internal-destination="chapter1">
  Chapter 1
</fo:basic-link>.

If an HTML anchor element has an href attribute, check to see if it begins with a hash mark (#). If it does, use the href attribute as the internal-destination of an <fo:basic-link>. To use the value, you have to remove the hash mark, which you can do with the XSLT substring() function. The final step in handling an internal link is to add an <fo:page-number-citation> that gives the page number of the referenced section.

An XSLT template to do this conversion might look like this:

<xsl:template match="a">
  <xsl:choose>
    <xsl:when test="@name">
      <!-- ... The previous entry covered named anchors ... -->
    </xsl:when>
    <xsl:when test="@href">
      <fo:basic-link color="blue">
        <xsl:choose>
          <xsl:when test="starts-with(@href, '#')">
            <xsl:attribute name="internal-destination">
              <xsl:value-of select="substring(@href, 2)"/>
            </xsl:attribute>
          </xsl:when> 
          <xsl:otherwise>
            <!-- ... Handle external links here ... -->
          </xsl:otherwise>
        </xsl:choose>
        <xsl:apply-templates select="*|text()"/>
      </fo:basic-link>
      <xsl:if test="starts-with(@href, '#')">
        <xsl:text> on page </xsl:text>
        <fo:page-number-citation ref-id="{substring(@href, 2)}"/>
      </xsl:if>
    </xsl:when>
  </xsl:choose>
</xsl:template>

The <fo:page-number-citation> element means the rendered link looks something like this:

For more information, see Chapter 1 on page 73.

<a href="..."> Anchor references

The final type of link discussed in this guide is a reference to a URI. To render these in the tutorial's example PDF file, use the external-destination attribute of <fo:basic-link>. For example, say you have an anchor element that looks like this:

<a href="http://www.ibm.com/developerWorks/">
  IBM's developerWorks Web site
</a>

You would convert that element into the following markup:

<fo:basic-link color="blue" 
    external-destination="http://www.ibm.com/developerWorks/">
  IBM's developerWorks Web site
</fo:basic-link>

Here's the complete XSLT template for all three types of anchor elements:

<xsl:template match="a">
  <xsl:choose>
    <xsl:when test="@name">
      <xsl:if test="not(name(following-sibling::*[1]) = 'h1')">
        <fo:block line-height="0pt" space-after="0pt"
          font-size="0pt" id="{@name}"/>
      </xsl:if>
    </xsl:when>
    <xsl:when test="@href">
      <fo:basic-link color="blue">
        <xsl:choose>
          <xsl:when test="starts-with(@href, '#')">
            <xsl:attribute name="internal-destination">
              <xsl:value-of select="substring(@href, 2)"/>
            </xsl:attribute>
          </xsl:when>
          <xsl:otherwise>
            <xsl:attribute name="external-destination">
              <xsl:value-of select="@href"/>
            </xsl:attribute>
          </xsl:otherwise>
        </xsl:choose>
        <xsl:apply-templates select="*|text()"/>
      </fo:basic-link>
      <xsl:if test="starts-with(@href, '#')">
        <xsl:text> on page </xsl:text>
        <fo:page-number-citation ref-id="{substring(@href, 2)}"/>
      </xsl:if>
    </xsl:when>
  </xsl:choose>
</xsl:template>

<address> An address

This seldom-used HTML element defines an address, although the components of a typical address (phone number, e-mail address, street address, city, and so on) aren't identified inside the <address> element. The <address> element is typically used like this:

<address>
  Mrs. Mary Backstayge
  <br />
  283 First Avenue
  <br />
  Skunk Haven, MA  02718
</address>

Notice that the example uses <br> elements inside the <address> to indicate line breaks. Here's the equivalent XSL-FO markup:

<fo:block>Mrs. Mary Backstayge<fo:block> </fo:block>
283 First Avenue<fo:block> </fo:block>
Skunk Haven, MA  02718</fo:block>

The XSLT template for <address> is very simple; you just convert the <address> element to an <fo:block> element and then process the text and any other elements inside it. Here's the template:

<xsl:template match="address">
  <fo:block>
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

<b> Boldfaced text

Transforming a bold element is very easy; simply transform it into an <fo:inline> element with an attribute of font-weight="bold". Here's an example:

<p>Jackdaws <b>love</b> my big sphinx of quartz.</p>

Use the basic XSL-FO elements of <fo:block> and <fo:inline> to render this content:

<fo:block>
  Jackdaws <fo:inline font-weight="bold">love</fo:inline>
  my big sphinx of quartz.
</fo:block>

Remember from XSL-FO basics that the <fo:block> element always causes a line break, while the <fo:inline> element does not. For this reason, use <fo:inline> to render the contents of the <b> element. This simple XSLT template does the job:

<xsl:template match="b">
  <fo:inline font-weight="bold">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

Notice that the select attribute of the <xsl:apply-templates> element selects both the text of the <b> element as well as any child elements it might contain. For example, if the markup above were <p>Jackdaws <b><i>love</i></b> ..., selecting any child elements of <b> ensures that both the bold and the italic elements are processed.


<big> Bigger text

The seldom-used <big> HTML element makes the enclosed text slightly bigger than the surrounding text. Here's an example:

<p>Jackdaws <big>love</big> my big sphinx of quartz. </p>

This transformation is simple because the FOP tool used as the FO processor in the XSL-FO tutorial examples now supports percentages as values of the font-size property. (That wasn't always the case.) A reasonable way to render the <big> element is with a font-size of 120%:

<xsl:template match="big">
  <fo:inline font-size="120%">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

Using a relative font size means that multiple <big> elements nested inside each other make the text rendered from the transformation progressively larger, just as nested <big> elements do in HTML. The formatting objects generated for the example paragraph would then look like this:

<fo:block>Jackdaws 
<fo:inline font-size="120%">love</fo:inline> my big
sphinx of quartz. </fo:block>

Of course, you can modify the example template to process the <big> element however you want; you could make the font even bigger, change the color, and so on.


<blockquote> Block quotations

To process a block quotation, the example renders it as a single-spaced paragraph with a 1.5-centimeter indentation on the left and right sides. To do this, use the start-indent and end-indent attributes of the <fo:block> element. Here's the <blockquote> element:

<blockquote>
  When in the Course of human events, it becomes necessary for one people 
  to dissolve the political bands which have connected them with another, 
  and to assume among the powers of the earth, the separate and equal 
  station to which the Laws of Nature and of Nature's God entitle them, 
  a decent respect to the opinions of mankind requires that they should 
  declare the causes which impel them to the separation.
</blockquote>

To format this excerpt as a paragraph indented on both sides, use this XSL-FO markup:

<fo:block start-indent="1.5cm" end-indent="1.5cm">
  When in the Course of human events, it becomes necessary for one people 
  to dissolve the political bands which have connected them with another, 
  ...
</fo:block>

Use this template to transform <blockquote> elements:

<xsl:template match="blockquote">
  <fo:block start-indent="1.5cm" end-indent="1.5cm">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

Of course, you can modify the layout of the quote by changing the indentation attribute values.
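One way to make that change without editing the template each time is a top-level parameter. This is a sketch only; the parameter name quote-indent is invented for illustration and does not appear in the guide's style sheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical parameter; defaults to the 1.5cm used in this entry. -->
  <xsl:param name="quote-indent" select="'1.5cm'"/>

  <xsl:template match="blockquote">
    <!-- Both sides use the same value so the quote stays symmetric. -->
    <fo:block start-indent="{$quote-indent}" end-indent="{$quote-indent}">
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

With FOP, a different value could then be supplied through the same -param option this guide uses for page-size, for example -param quote-indent 2cm.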


<body> The document body

The XSL-FO equivalent of the <body> element is an <fo:flow flow-name="xsl-region-body"> element. To keep the symmetry between the HTML document and the XSLT style sheet, the example here uses the processing for the <body> element to generate the corresponding XSL-FO element. For the example document, the <fo:flow flow-name="xsl-region-body"> element contains the following six things:

  • The title of the document (the HTML title element inside the <head>)
  • The uplifting message developerWorks loves you!
  • The developerWorks URL
  • The table of contents
  • All of the content in the document
  • An id to use to identify the last page in the document

Obviously, most of these items are included so that the document has the intended layout. You can change the XSLT template to create a different layout (maybe you'd like a title page, for example), or you can have multiple style sheets to render the same information in multiple formats. Here is the complete template:

<xsl:template match="body">
  <fo:flow flow-name="xsl-region-body">
    <!-- Item 1 -->
    <xsl:apply-templates select="/html/head/title"/>

    <!-- Item 2 -->
    <fo:block space-after="12pt" line-height="17pt" 
      font-size="14pt" text-align="center">
      developerWorks loves you!
    </fo:block>

    <!-- Item 3 -->
    <fo:block space-after="24pt" line-height="17pt" 
      font-size="14pt" text-align="center" font-weight="bold" 
      font-family="monospace">
      ibm.com/developerWorks
    </fo:block>

    <!-- Item 4 -->
    <xsl:call-template name="toc"/>

    <!-- Item 5 -->
    <xsl:apply-templates select="*|text()"/>

    <!-- Item 6 -->
    <fo:block id="TheVeryLastPage" font-size="0pt"
      line-height="0pt" space-after="0pt"/>
  </fo:flow>
</xsl:template>

<br> Line breaks

You already know that the <fo:block> element causes a line break; to process the <br> element, you simply embed one <fo:block> element inside another. For example, consider the following markup:

<p>My mistress' eyes are nothing like the sun,
  <br/>Coral is far more red than her lips red.
  <br/>If snow be white, why then her breasts be dun,
  <br/>If hairs be wires, black wires grow on her head. 
  ...
</p>

When you transform this markup into XSL-FO elements, the result looks like this:

<fo:block>
  My mistress' eyes are nothing like the sun, 
  <fo:block> </fo:block>
  Coral is far more red than her lips red. 
  <fo:block> </fo:block>
  If snow be white, why then her breasts be dun,
  <fo:block> </fo:block>
  If hairs be wires, black wires grow on her head.
</fo:block>

Here's the XSLT template that transforms the <br> element:

<xsl:template match="br">
  <fo:block> </fo:block>
</xsl:template>

<caption> Caption text for a table

The <caption> element is used to create a caption for a table. In XSL-FO, this is represented with the <fo:table-caption> element. The guide discusses the <fo:table-caption> and <fo:table-and-caption> elements here even though FOP does not currently support them. Ideally this limitation will be remedied soon.

There is one slight complication to the transformation: In HTML, the <caption> element can appear just about anywhere; in XSL-FO, it must be inside the <fo:table-and-caption> element. The XSL-FO structure looks like this:

<fo:table-and-caption>
  <fo:table-caption> ... </fo:table-caption>
  <fo:table> ... </fo:table>
</fo:table-and-caption>

Here's a simple XSLT template for transforming the <caption> element:

<xsl:template match="caption">
  <fo:table-caption>
    <xsl:apply-templates select="*|text()"/>
  </fo:table-caption>
</xsl:template>

The only issue you might have with the HTML <caption> element is that it might not appear inside the HTML <table> you're processing, and it might not appear as the first element inside the <table>. Here's a template for the <table> element that tries to deal with these problems:

<xsl:template match="table">
  <fo:table-and-caption>
    <xsl:choose>
      <xsl:when test=".//caption">
        <xsl:apply-templates select="(.//caption)[1]"/>
      </xsl:when>
      <xsl:when test="preceding-sibling::*[1][self::caption]">
        <xsl:apply-templates select="preceding-sibling::*[1][self::caption]"/>
      </xsl:when>
    </xsl:choose>
    <fo:table>
      <xsl:apply-templates select="*[not(name()='caption')]"/>
    </fo:table>
  </fo:table-and-caption>
</xsl:template>

This template first checks for the existence of any <caption> elements inside the <table> element. If there is one, it selects the first one and processes it. (This lets the processor ignore any extra <caption>s.) If there isn't a <caption> inside the table, the processor looks at the first element before the table. If it's a <caption>, you can assume it's the table caption. The last hassle here is that when you process the actual <table> itself, you have to use <xsl:apply-templates> against everything except any <caption> elements.


<center> Centered text

To handle centered text, you can use the text-align="center" attribute of the <fo:block> element. Given this markup:

<center>
  Table of Contents
</center>

You can convert it into the following XSL-FO element:

<fo:block text-align="center">
  Table of Contents
</fo:block>

Here's an XSLT template that does what you want:

<xsl:template match="center">
  <fo:block text-align="center">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

<cite> A citation

The <cite> element is typically rendered in italic text. To complicate things a little, you can write the XSLT template so that a <cite> element contained inside an <i> element is rendered as normal text to set it off from the surrounding italicized text. Here's some sample markup:

<p>
  When she was little, my daughter loved it when I read 
  <cite>Goodnight Moon</cite> to her.
  <i>But <cite>Harold and the Purple Crayon</cite> 
  was her favorite.</i>
</p>

To render this markup, use the XSL-FO <fo:inline> element:

<fo:block>
  When she was little, my daughter loved it when I read
  <fo:inline font-style="italic">Goodnight Moon</fo:inline> to her.
  <fo:inline font-style="italic">But
    <fo:inline font-style="normal">Harold and the Purple Crayon</fo:inline>
    was her favorite.
  </fo:inline>
</fo:block>

To handle any <cite> elements that are inside an italicized phrase, check its parent before transforming it:

<xsl:template match="cite">
  <xsl:choose>
    <xsl:when test="parent::i">
      <fo:inline font-style="normal">
        <xsl:apply-templates select="*|text()"/>
      </fo:inline>
    </xsl:when>
    <xsl:otherwise>
      <fo:inline font-style="italic">
        <xsl:apply-templates select="*|text()"/>
      </fo:inline>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

If the parent of the <cite> element is an <i> element, change the font-style to normal; otherwise, change it to italic. This technique correctly handles the combination of <cite> and <i> elements when they are nested inside each other.
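The parent check covers a <cite> that sits directly inside an <i>. If a citation can be nested more deeply, for example inside a <b> that is itself inside an <i>, a variation that looks at all ancestors may be worth considering. The sketch below is one such variation under the assumption that italics in your documents come only from <i> and <em> elements; it is not part of the guide's style sheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="cite">
    <fo:inline>
      <!-- Choose the font style once, then process the content once. -->
      <xsl:attribute name="font-style">
        <xsl:choose>
          <!-- Any italic ancestor, however distant, flips the
               citation back to normal text. -->
          <xsl:when test="ancestor::i or ancestor::em">normal</xsl:when>
          <xsl:otherwise>italic</xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <xsl:apply-templates select="*|text()"/>
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>
```

Building the attribute with <xsl:attribute> also avoids repeating the <fo:inline> element and the <xsl:apply-templates> call in each branch.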


<code> A code sample

Rendering a <code> element requires using a monospaced font. As you'd expect, you use an <fo:inline> element. Here's some sample markup:

<p>If you're a Java programmer, an easy way to break a string apart 
is with the <code>java.util.StringTokenizer</code> class.</p>

To render this fragment of text properly, you would use an <fo:inline> element with the font-family="monospace" attribute:

<fo:block>If you're a Java programmer, an easy way to 
  break a string apart is with the 
  <fo:inline font-family="monospace">java.util.StringTokenizer</fo:inline>
  class.
</fo:block>

Here's an XSLT template that transforms <code> elements:

<xsl:template match="code">
  <fo:inline font-family="monospace">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<dl>, <dt>, and <dd> Definition lists

Although definition lists are less common than ordered lists (<ol>) and unordered lists (<ul>), they are very useful for defining lists of terms or options. Here is a definition list that defines some options for a parameter:

<p>There are four valid values for the 
  <code>text-align</code> parameter:</p>
<dl>
  <dt>start</dt>
  <dd>
    The text is aligned at the start of the paragraph, normally 
    the left side.  
  </dd>
  <dt>center</dt>
  <dd>The text is aligned at the center of the paragraph.</dd>
  <dt>end</dt>
  <dd>
    The text is aligned at the end of the paragraph, normally 
    the right side. 
  </dd>
  <dt>justify</dt>
  <dd>The text is aligned at both the start and end of 
    the paragraph.</dd>
</dl>

Typical formatting for a definition list puts the term (the <dt> element) on one line in bold with the definitions indented beneath it, starting on the next line. Here are the templates for <dl>, <dt>, and <dd>:

<xsl:template match="dl">
  <xsl:apply-templates select="*"/>
</xsl:template>

<xsl:template match="dt">
  <fo:block font-weight="bold" space-after="2pt"
      keep-with-next="always">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

<xsl:template match="dd">
  <fo:block start-indent="1cm">
    <xsl:attribute name="space-after">
      <xsl:choose>
        <xsl:when test="name(following::*[1]) = 'dd'">
          <xsl:text>3pt</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>12pt</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

In the template for the <dd> element, the processor looks to see if the next element is also a <dd> element. If it is, the current <dt> element must have multiple definitions for the same term, and so the processor puts 3 points of vertical space after the current definition. Otherwise, the processor uses 12 points of vertical space. Also notice that the template uses the keep-with-next property of the <fo:block> element, even though FOP doesn't always handle this correctly.


<em> Emphasized text

Most browsers render emphasized text in italics, and so you can simply transform the <em> element into an <fo:inline> element with a font-style="italic" attribute. Start with this markup:

<p>You <em>must</em> disconnect the power supply before 
you open the product housing.  </p>

You would render it in FO as follows:

<fo:block>
  You <fo:inline font-style="italic">must</fo:inline> disconnect
  the power supply before you open the product housing.
</fo:block>

Here's how the XSLT template looks:

<xsl:template match="em">
  <fo:inline font-style="italic">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<font color="..."> Change the text color

There are three attributes of the <font> element that I'll use to demonstrate the transformation to XSL-FO: color, face, and size. For the color attribute, you can use a hex RGB value (such as #33cc99) or one of the 16 color names defined by the XSL-FO spec:

16 color names defined by the XSL-FO spec

aqua    black   blue    fuchsia
gray    green   lime    maroon
navy    olive   purple  red
silver  teal    white   yellow

The XSLT template assumes that FOP can handle the color value if there is one. The default color is black:

<xsl:template match="font">
  <xsl:variable name="color">
    <xsl:choose>
      <xsl:when test="@color">
        <xsl:value-of select="@color"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>black</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  ...
  <fo:inline font-size="{$size}" font-family="{$face}"
    color="{$color}">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

Notice that the XSLT template creates a variable to hold the value of the color attribute. The template creates variables for color, face, and size, and then it builds the <fo:inline> element with those values.


<font face="..."> Change the text font

The <font> element's face attribute maps to the XSL-FO font-family attribute, but there's a catch: The FOP tool supports a limited number of fonts. The valid values for font-family are:

  • serif
  • sans-serif
  • monospace
  • Courier, Courier-Bold, Courier-BoldOblique, or Courier-Oblique
  • Helvetica, Helvetica-Bold, Helvetica-BoldOblique, or Helvetica-Oblique
  • Symbol
  • Times-Roman, Times-Bold, Times-BoldItalic, or Times-Italic

The XSLT template creates a variable for the typeface just as it did for the color. It sets the default typeface to sans-serif:

<xsl:template match="font">
  <xsl:variable name="face">
    <xsl:choose>
      <xsl:when test="@face">
        <xsl:value-of select="@face"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>sans-serif</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  ...
  <fo:inline font-size="{$size}" font-family="{$face}"
    color="{$color}">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

In addition, the FOP tool provides ways to convert Adobe Type 1 fonts and TrueType fonts into XML font-metric files. FOP uses those files to render text in fonts other than the ones listed above. See the FOP documentation for details.


<font size="..."> Change the text size

The <font> element's size attribute maps to the XSL-FO font-size attribute. Here's the logic used to process the HTML size attribute:

  • If the size attribute contains the string pt (size="24pt", for example), use the value as-is.
  • If the size attribute begins with a plus or minus sign (such as size="+2" or size="-1"), use a relative size for the font; map +1 to a relative font size of 110%, for instance. (I picked the values arbitrarily, so feel free to change them.)
  • If the size attribute is a number between 1 and 7, set the font to an arbitrary size. (Again, change the attribute values if you like.)
  • As a last resort, set the font size to 12pt.

Here's the entire XSLT template for the <font> element. At the end, it uses the variables previously initialized to create the <fo:inline> element:

<xsl:template match="font">
  <xsl:variable name="color">
    <xsl:choose>
      <xsl:when test="@color">
        <xsl:value-of select="@color"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>black</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="face">
    <xsl:choose>
      <xsl:when test="@face">
        <xsl:value-of select="@face"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>sans-serif</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="size">
    <xsl:choose>
      <xsl:when test="@size">
        <xsl:choose>
          <xsl:when test="contains(@size, 'pt')">
            <xsl:value-of select="@size"/>
          </xsl:when>
          <xsl:when test="@size = '+1'">
            <xsl:text>110%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '+2'">
            <xsl:text>120%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '+3'">
            <xsl:text>130%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '+4'">
            <xsl:text>140%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '+5'">
            <xsl:text>150%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '+6'">
            <xsl:text>175%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '+7'">
            <xsl:text>200%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '-1'">
            <xsl:text>90%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '-2'">
            <xsl:text>80%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '-3'">
            <xsl:text>70%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '-4'">
            <xsl:text>60%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '-5'">
            <xsl:text>50%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '-6'">
            <xsl:text>40%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '-7'">
            <xsl:text>30%</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '1'">
            <xsl:text>8pt</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '2'">
            <xsl:text>10pt</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '3'">
            <xsl:text>12pt</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '4'">
            <xsl:text>14pt</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '5'">
            <xsl:text>18pt</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '6'">
            <xsl:text>24pt</xsl:text>
          </xsl:when>
          <xsl:when test="@size = '7'">
            <xsl:text>36pt</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text>12pt</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise> 
        <xsl:text>12pt</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <fo:inline font-size="{$size}" font-family="{$face}"
    color="{$color}">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>
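
To see the rules in action, consider a hypothetical snippet (the text and attribute values here are made up for illustration):

```xml
<font size="+2" color="red">Warning</font>
```

Tracing it through the template above, the size matches the +2 branch, the color is copied from the attribute, and the face falls back to the default. The result should be equivalent to this markup (the fo prefix is assumed to be bound to the XSL-FO namespace by the enclosing document):

```xml
<fo:inline font-size="120%" font-family="sans-serif" color="red">Warning</fo:inline>
```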

<h1> through <h6> Headings

Transforming heading tags is relatively straightforward; you put each one into an <fo:block> element, changing the font, font size, and other attributes based on the heading level. To make the top-level heading really stand out, the example layout puts a page break and a horizontal line before the <h1> text. Here are the formatting choices used:

HTML tag  Font size  Line height  Space after  Other
<h1>      28pt       32pt         22pt         Add a page break and a horizontal line before the text
<h2>      24pt       28pt         18pt         None
<h3>      21pt       24pt         14pt         None
<h4>      18pt       21pt         12pt         None
<h5>      16pt       19pt         12pt         Text is underlined
<h6>      14pt       17pt         12pt         Text is underlined and italicized

Here are some heading elements:

<h1>Sample text from Henry Fielding's <cite>Tom Jones</cite></h1>

<h2><b>Book I.</b>  Containing as Much of the Birth of the Foundling 
  as Is Necessary or Proper to Acquaint the Reader with in the 
  Beginning of This History</h2>

<h3><b>Chapter VII.</b>  Containing Such Grave Matter, That the Reader
  Cannot Laugh Once Through the Whole Chapter, Unless Peradventure He 
  Should Laugh at the Author</h3>

These elements are converted into the following formatting objects:

<fo:block break-before="page">
  <fo:leader leader-pattern="rule"/>
</fo:block>
<fo:block font-family="serif" space-after="22pt" keep-with-next="always" 
    line-height="32pt" font-size="28pt" id="tomjones">
  Sample text from Henry Fielding's 
    <fo:inline font-style="italic">Tom Jones</fo:inline>
</fo:block>

<fo:block font-family="serif" space-after="18pt" keep-with-next="always" 
    line-height="28pt" font-size="24pt" id="N10017">
  <fo:inline font-weight="bold">Book I.</fo:inline>  
  Containing as Much of the Birth of the Foundling 
  as Is Necessary or Proper to Acquaint the Reader with in the 
  Beginning of This History
</fo:block>

<fo:block font-family="serif" space-after="14pt" keep-with-next="always" 
    line-height="24pt" font-size="21pt" id="N1001C">
  <fo:inline font-weight="bold">Chapter VII.</fo:inline>  
  Containing Such Grave Matter, That the Reader
  Cannot Laugh Once Through the Whole Chapter, Unless Peradventure He 
  Should Laugh at the Author
</fo:block>

The page break inserted before the text of an <h1> element complicates using named anchors and <h1> elements together. For that reason, the XSLT template for the <h1> element checks the previous element to see if it is a named anchor. Here are the templates for the <h1> and <h6> elements:

<xsl:template match="h1">
  <fo:block break-before="page">
    <fo:leader leader-pattern="rule"/>
  </fo:block> 
  <fo:block font-size="28pt" line-height="32pt"
      keep-with-next="always"
      space-after="22pt" font-family="serif">
    <xsl:attribute name="id">
      <xsl:choose>
        <xsl:when test="@id">
          <xsl:value-of select="@id"/>
        </xsl:when>
        <xsl:when test="name(preceding-sibling::*[1]) = 'a' and
                         preceding-sibling::*[1][@name]">
          <xsl:value-of select="preceding-sibling::*[1]/@name"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="generate-id()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

<xsl:template match="h6">
  <fo:block font-size="14pt" line-height="17pt"
      keep-with-next="always" space-after="12pt"
      font-family="serif" font-style="italic"
      text-decoration="underline">
    <xsl:attribute name="id">
      <xsl:choose>
        <xsl:when test="@id">
          <xsl:value-of select="@id"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="generate-id()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>
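
The templates for the levels in between follow the same pattern. As a minimal sketch (not copied from the article's stylesheet), an <h2> template built from the formatting choices listed earlier might look like this; only the font size, line height, and space after differ from the <h6> version, and the italic and underline properties are dropped:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Sketch of an <h2> template: 24pt text, 28pt line height, 18pt after -->
  <xsl:template match="h2">
    <fo:block font-size="24pt" line-height="28pt"
        keep-with-next="always" space-after="18pt"
        font-family="serif">
      <!-- Reuse the heading's own id, or generate one if it has none -->
      <xsl:attribute name="id">
        <xsl:choose>
          <xsl:when test="@id">
            <xsl:value-of select="@id"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="generate-id()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```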

A final note: Because headings serve as bookmarks, table-of-contents entries, and useful link points, it's best to make sure that each heading has an id. If a given heading element already has an id attribute, use it; otherwise, create one using the XSLT generate-id() function:

<xsl:attribute name="id">
  <xsl:choose>
    <xsl:when test="@id">
      <xsl:value-of select="@id"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="generate-id()"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:attribute>
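
Because this id logic repeats in every heading template, one option is to move it into a named template and call it from each heading. The sketch below shows the idea for <h3>; the template name heading-id is hypothetical and does not appear in the article's stylesheet. It works because <xsl:call-template> leaves the current node unchanged, so @id and generate-id() still refer to the heading being processed:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical helper: adds an id attribute for the current heading -->
  <xsl:template name="heading-id">
    <xsl:attribute name="id">
      <xsl:choose>
        <xsl:when test="@id">
          <xsl:value-of select="@id"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="generate-id()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="h3">
    <fo:block font-size="21pt" line-height="24pt"
        keep-with-next="always" space-after="14pt"
        font-family="serif">
      <!-- Must come before any child content is written to the block -->
      <xsl:call-template name="heading-id"/>
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```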

Generating bookmarks

PDF files can contain a hierarchical tree of bookmarks to different parts of your document. You can generate these automatically from heading elements using the XSL-FO <fo:bookmark-tree>, <fo:bookmark>, and <fo:bookmark-title> elements. Given this HTML markup:

<h1>Sample text from Henry Fielding's Tom Jones</h1>
<h2>Book I. Containing as Much of the Birth...</h2>
<h3>Chapter VII. Containing Such Grave Matter...</h3>

The stylesheet generates this XSL-FO markup:

<fo:bookmark-tree>
  <fo:bookmark starting-state="hide" internal-destination="tomjones">
    <fo:bookmark-title>
      Sample text from Henry Fielding's Tom Jones
    </fo:bookmark-title>
    <fo:bookmark starting-state="hide" internal-destination="N10017">
      <fo:bookmark-title>
        Book I.  Containing as Much of the Birth...
      </fo:bookmark-title>
      <fo:bookmark starting-state="hide" internal-destination="N1001C">
        <fo:bookmark-title>
          Chapter VII.  Containing Such Grave Matter...
        </fo:bookmark-title>
      </fo:bookmark>
    </fo:bookmark>
  </fo:bookmark>
</fo:bookmark-tree>

The result is these three nested bookmarks:

Figure 1. Screen capture of bookmarks in a PDF file

See the generate-bookmarks template in the stylesheet for the details.
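
For readers who want a feel for the approach before opening the stylesheet, here is a simplified two-level sketch. It is not the article's generate-bookmarks template: the name sketch-bookmarks is hypothetical, it handles only <h1> and <h2>, it assumes the headings are siblings of one another, and it ignores the named-anchor case that the <h1> template handles. The internal-destination values match the ids produced by the heading templates because generate-id() returns the same value for the same node throughout a transformation:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical name; call this where the bookmark tree belongs in fo:root -->
  <xsl:template name="sketch-bookmarks">
    <!-- fo:bookmark-tree needs at least one fo:bookmark child -->
    <xsl:if test="//h1">
      <fo:bookmark-tree>
        <xsl:for-each select="//h1">
          <xsl:variable name="this-h1" select="generate-id()"/>
          <fo:bookmark starting-state="hide">
            <xsl:attribute name="internal-destination">
              <xsl:choose>
                <xsl:when test="@id">
                  <xsl:value-of select="@id"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="generate-id()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:attribute>
            <fo:bookmark-title>
              <xsl:value-of select="normalize-space(.)"/>
            </fo:bookmark-title>
            <!-- Nest every h2 whose nearest preceding h1 sibling is this h1 -->
            <xsl:for-each select="following-sibling::h2
                [generate-id(preceding-sibling::h1[1]) = $this-h1]">
              <fo:bookmark starting-state="hide">
                <xsl:attribute name="internal-destination">
                  <xsl:choose>
                    <xsl:when test="@id">
                      <xsl:value-of select="@id"/>
                    </xsl:when>
                    <xsl:otherwise>
                      <xsl:value-of select="generate-id()"/>
                    </xsl:otherwise>
                  </xsl:choose>
                </xsl:attribute>
                <fo:bookmark-title>
                  <xsl:value-of select="normalize-space(.)"/>
                </fo:bookmark-title>
              </fo:bookmark>
            </xsl:for-each>
          </fo:bookmark>
        </xsl:for-each>
      </fo:bookmark-tree>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Extending the sketch to <h3> and deeper levels means repeating the inner loop with the next heading level, which is one reason the real template is worth reading.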


<hr> Horizontal rules

There is a special XSL-FO element, <fo:leader>, that's designed to handle horizontal rules. Here's some HTML markup:

<p>Here's a short paragraph.</p>
<hr/>
<p>Here's another paragraph, following a horizontal rule.</p>

The XSL-FO markup that renders this content should look like this:

<fo:block>
  Here's a short paragraph.
</fo:block>
<fo:block>
  <fo:leader leader-pattern="rule"/>
</fo:block>
<fo:block>
  Here's another paragraph, following a horizontal rule.
</fo:block>

The XSLT template for processing an <hr> element is very simple:

<xsl:template match="hr">
  <fo:block>
    <fo:leader leader-pattern="rule"/>
  </fo:block>
</xsl:template>

A final note: The leader-pattern attribute also supports the values dots, typically used in tables of contents, and space, which creates an area of blank space.
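
As an illustration of the dots value, a table-of-contents line is commonly written as a block whose last line is justified, so that the leader stretches to fill the gap between the title and the page number. This is a sketch rather than markup from the article's stylesheet; the ref-id value tomjones is borrowed from the heading example earlier, and support for these properties may vary between XSL-FO engines:

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    text-align-last="justify">
  Sample text from Henry Fielding's Tom Jones
  <!-- The leader expands to fill the space left on the line -->
  <fo:leader leader-pattern="dots"/>
  <!-- Resolves to the number of the page that holds the element with this id -->
  <fo:page-number-citation ref-id="tomjones"/>
</fo:block>
```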


<i> Italicized text

The HTML <i> element is easily represented in XSL-FO: Simply put the text of the element inside an <fo:inline> with a font-style="italic" attribute. Here's an example that's remarkably similar to the one for the <b> element, shown earlier:

<p>Jackdaws <i>love</i> my big sphinx of quartz.</p>

Use the basic XSL-FO elements of <fo:block> and <fo:inline> to render this content:

<fo:block>
  Jackdaws <fo:inline font-style="italic">love</fo:inline>
  my big sphinx of quartz.
</fo:block>

The <fo:block> element always causes a line break, while the <fo:inline> element does not. For this reason, use <fo:inline> to render the contents of the <i> element. This simple XSLT template does the job:

<xsl:template match="i">
  <fo:inline font-style="italic">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<img> An embedded image

The <img> element maps directly to the XSL-FO <fo:external-graphic> element. A complication to transforming <img> into <fo:external-graphic> is that the FOP tool does not support HTML attributes like width="200"; the width and height attributes must include a unit of measurement (width="200px", for example). For that reason, the XSLT template is set up to have the processor look at the HTML values as it generates the <fo:external-graphic> element. Here's the template:

<xsl:template match="img">
  <fo:block space-after="12pt">
    <fo:external-graphic src="{@src}">
      <xsl:if test="@width">
        <xsl:attribute name="width">
          <xsl:choose>
            <xsl:when test="contains(@width, 'px')">
              <xsl:value-of select="@width"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="concat(@width, 'px')"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="@height">
        <xsl:attribute name="height">
          <xsl:choose>
            <xsl:when test="contains(@height, 'px')">
              <xsl:value-of select="@height"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="concat(@height, 'px')"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:attribute>
      </xsl:if>
    </fo:external-graphic>
  </fo:block>
</xsl:template>

Notice that the template uses the src attribute as-is, and the processor adds the unit of measurement to the width and height attributes as needed.
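
For example, take this hypothetical image element (the file name and dimensions are made up), in which the width has no unit but the height does:

```xml
<img src="logo.png" width="200" height="50px"/>
```

Following the template above, the width gets px appended and the height is copied unchanged, so the output should be equivalent to this markup (the fo prefix is assumed to be bound to the XSL-FO namespace by the enclosing document):

```xml
<fo:block space-after="12pt">
  <fo:external-graphic src="logo.png" width="200px" height="50px"/>
</fo:block>
```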


<kbd> Keyboard input

Keyboard input, indicated with the HTML <kbd> element, is typically represented in a slightly larger monospaced font. Here's an HTML sample:

<p>
  An easy way to delete a directory and all its contents (including
  all subdirectories) is with the <kbd>rd /s</kbd> command.
</p>

You'd represent this sample with XSL-FO elements like this:

<fo:block>
  An easy way to delete a directory and all its contents (including
  all subdirectories) is with the 
  <fo:inline font-family="monospace" font-size="110%">rd /s</fo:inline>
  command.
</fo:block>

The XSLT template to do this conversion is simple:

<xsl:template match="kbd">
  <fo:inline font-family="monospace" font-size="110%">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<li> A list item

A list item is handled almost like a paragraph, with a couple of minor differences. Because those differences depend on the parent of the list item (is this an <li> element inside an <ol> element or a <ul> element?), look for the details on list items under the <ol> element and the <ul> element entries.


<nobr> Text with no line breaks

The HTML <nobr> element specifies that the browser should not wrap lines, regardless of how much content is contained inside the <nobr> element. For example, take the following HTML markup:

<p>
  On the Windows 2000 command line, you can use the ampersand (&amp;) to 
  combine several commands into one statement. 
  <nobr>
    pushd d:\projects\xslfo &amp; del *.fo &amp; rebuild.bat &amp; popd
  </nobr>
  is an example of this technique.
</p>

To render this content in formatting objects, use the <fo:inline> element with the wrap-option="no-wrap" attribute. Unfortunately, FOP 1.1 doesn't handle this correctly, so you can use <fo:block> instead.

<fo:block>
  On the Windows 2000 command line, you can use the ampersand (&amp;) 
  to combine several commands into one statement.  
  <fo:block wrap-option="no-wrap">pushd d:\projects\xslfo &amp;
  del *.fo &amp; rebuild.bat &amp; popd </fo:block> is an 
  example of this technique.
</fo:block>

The XSLT template to do this transformation is straightforward:

<xsl:template match="nobr">
  <fo:block wrap-option="no-wrap">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

The good news is that most <nobr> elements are set off by themselves, so this generates the results you want on the page. If you use an XSL-FO rendering engine other than FOP, try changing the stylesheet to use <fo:inline>. It's likely to work.
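
If you want to try the inline variant with another engine, the change is a one-element swap. This is a sketch of that alternative, not the template shipped in the stylesheet, and whether it renders correctly depends on the engine:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Alternative <nobr> template: keeps the text in the flow of the paragraph -->
  <xsl:template match="nobr">
    <fo:inline wrap-option="no-wrap">
      <xsl:apply-templates select="*|text()"/>
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>
```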


<ol> An ordered list

The XSL-FO vocabulary features specialized elements for lists, including ordered lists. Here's a sample HTML list:

<p>A few of my favorite albums</p>
<ol>
  <li>A Love Supreme</li>
  <li>Remain in Light</li>
  <li>London Calling</li>
  <li>The Indestructible Beat of Soweto</li>
  <li>The Joshua Tree</li>
</ol>

To represent list items in XSL-FO, you must use several elements: <fo:list-block>, <fo:list-item>, <fo:list-item-label>, and <fo:list-item-body>. Here's how they're used:

  • The <fo:list-block> element contains the entire list.
  • Each <fo:list-item> element contains a single list item; it contains <fo:list-item-label> and <fo:list-item-body> elements.
  • The <fo:list-item-label> is the label next to the content. For an ordered list, the <fo:list-item-label> element might contain the text 10., for example.
  • <fo:list-item-body> contains the actual content of the list item.

Here's how the HTML markup appears as formatting objects:

<fo:block>A few of my favorite albums</fo:block>
<fo:list-block provisional-distance-between-starts="1cm"
  provisional-label-separation="0.5cm" space-after="12pt"
  start-indent="1cm">
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>1. </fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>
        A Love Supreme
      </fo:block>
    </fo:list-item-body>
  </fo:list-item>
   ...
  <!-- other list items appear here -->
   ...
</fo:list-block>

The XSLT templates that do the transformations appear below. The templates must handle several things:

  • If this list appears inside another list, don't insert any space after it. Otherwise, insert 12 points of vertical space after the list.
  • If this list is not inside any other lists, indent it by 1 cm. Otherwise, indent it by 1 cm plus 1.25 cm for each additional list. For example, if this list is three levels deep, indent the list by 3.5 cm.
  • If the <ol> element has a start attribute, start numbering list items with that number. If not, simply use the XSLT position() function to number the list items.
  • If the <ol> element has a type attribute, use its value to figure out how the number of each list item should be formatted (Roman numerals, Arabic numerals, or alphabetic characters). The template does this with the format attribute of the <xsl:number> element.

Here are the XSLT templates that transform any <ol> and <li> items using the rules in the bulleted list:

<xsl:template match="ol">
  <fo:list-block provisional-distance-between-starts="1cm"
    provisional-label-separation="0.5cm">
    <xsl:attribute name="space-after">
      <xsl:choose>
        <xsl:when test="ancestor::ul or ancestor::ol">
          <xsl:text>0pt</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>12pt</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:attribute name="start-indent">
      <xsl:variable name="ancestors">
        <xsl:choose>
          <xsl:when test="count(ancestor::ol) or count(ancestor::ul)">
            <xsl:value-of select="1 + 
                                  (count(ancestor::ol) + 
                                   count(ancestor::ul)) * 
                                  1.25"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text>1</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:value-of select="concat($ancestors, 'cm')"/>
    </xsl:attribute>
    <xsl:apply-templates select="*"/>
  </fo:list-block>
</xsl:template>

<xsl:template match="ol/li">
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>
        <xsl:variable name="value-attr">
          <xsl:choose>
            <xsl:when test="../@start">
              <xsl:number value="position() + ../@start - 1"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:number value="position()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>
        <xsl:choose>
          <xsl:when test="../@type='i'">
            <xsl:number value="$value-attr" format="i. "/>
          </xsl:when>
          <xsl:when test="../@type='I'">
            <xsl:number value="$value-attr" format="I. "/>
          </xsl:when>
          <xsl:when test="../@type='a'">
            <xsl:number value="$value-attr" format="a. "/>
          </xsl:when>
          <xsl:when test="../@type='A'">
            <xsl:number value="$value-attr" format="A. "/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:number value="$value-attr" format="1. "/>
          </xsl:otherwise>
        </xsl:choose>
      </fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>
        <xsl:apply-templates select="*|text()"/>
      </fo:block>
    </fo:list-item-body>
  </fo:list-item>
</xsl:template>

These two templates convert the <ol> and <li> elements into the appropriate XSL-FO elements.
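
To illustrate how the start and type attributes combine, take this hypothetical top-level list (the item text is made up):

```xml
<ol type="i" start="3">
  <li>Third step</li>
  <li>Fourth step</li>
</ol>
```

For the first item, position() is 1, so the value is 1 + 3 - 1 = 3, which the format i. renders as iii. The second item becomes iv. Tracing the two templates by hand, the output should be equivalent to this markup (the fo prefix is assumed to be bound to the XSL-FO namespace by the enclosing document):

```xml
<fo:list-block provisional-distance-between-starts="1cm"
  provisional-label-separation="0.5cm" space-after="12pt"
  start-indent="1cm">
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>iii. </fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>Third step</fo:block>
    </fo:list-item-body>
  </fo:list-item>
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>iv. </fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>Fourth step</fo:block>
    </fo:list-item-body>
  </fo:list-item>
</fo:list-block>
```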


<p> A paragraph

An HTML paragraph element is easily transformed into an <fo:block> element. For instance, take this paragraph:

<p>When in the Course of human events, it becomes necessary 
  for one people to dissolve the political bonds which have connected 
  them with another, and to assume among the powers of the earth, 
  the separate and equal station to which the Laws of Nature and 
  of Nature's God entitle them, a decent respect to the opinions 
  of mankind requires that they should declare the causes which 
  impel them to the separation.</p>

The paragraph would translate into this XSL-FO markup:

<fo:block font-size="12pt" line-height="15pt"
  space-after="12pt">When in the Course of human events, it becomes 
  necessary for one people to dissolve the political bonds which 
  ...
</fo:block>

Here's the simple XSLT template that does the transform:

<xsl:template match="p">
  <fo:block font-size="12pt" line-height="15pt"
    space-after="12pt">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

This template is implemented with default values for the font-size, line-height, and space-after properties. Obviously you can change those values if you so desire.


<pre> Preformatted text

The <pre> element has a couple of complications. You need to preserve all of the white space it contains, and you need to turn off the automatic word wrapping that the XSL-FO engine does. By convention, you also display the contents of a <pre> element in a monospaced font. Here's a sample <pre> element:

<pre>
public static void main(String args[])
{
  System.out.println("Hello, world!");
}
</pre>

To handle this <pre> element example correctly, you must convert it into the following XSL-FO markup:

<fo:block font-family="monospace" 
  white-space-collapse="false"
  wrap-option="no-wrap">
public static void main(String args[])
{
  System.out.println("Hello, world!");
}
</fo:block>

The XSLT template is straightforward:

<xsl:template match="pre">
  <fo:block font-family="monospace"
    white-space-collapse="false"
    wrap-option="no-wrap">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

<samp> Sample text

The <samp> element is typically rendered in a slightly larger monospaced font. Although <samp> is rarely used, it's easy to convert to formatting objects. Here's a sample <samp> element:

<p>The <samp>DOCTYPE</samp> keyword lets you
  refer to a DTD from your XML source document.</p>

The XSLT template to convert this into formatting objects is short and sweet:

<xsl:template match="samp">
  <fo:inline font-family="monospace"
      font-size="110%">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<small> Text in a smaller typeface

The <small> element is also easy to convert. As you'd expect, it's rendered in a slightly smaller font. Because the template for the <big> element rendered it with a font-size 20 percent larger, it would make sense to define <small> as 20 percent smaller. Here's the straightforward XSLT template:

<xsl:template match="small">
  <fo:inline font-size="80%">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

Given the paragraph <p>The Lakers' chances for a fourth straight title are <small>slim</small>.</p>, the template generates this markup:

<fo:block font-size="12pt" line-height="15pt" space-after="12pt">
  The Lakers' chances for a fourth straight title are
  <fo:inline font-size="80%">slim</fo:inline>.
</fo:block>

As with the <big> element, nesting multiple <small> elements inside each other creates progressively smaller text.
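
The shrinking happens because a percentage font-size is relative to the font size of the parent element. As a worked example, take this hypothetical paragraph:

```xml
<p>Normal <small>smaller <small>smallest</small></small></p>
```

With the <p> template's default of 12pt, the outer inline works out to 80% of 12pt, or 9.6pt, and the inner one to 80% of 9.6pt, or 7.68pt. The generated markup should be equivalent to this (the fo prefix is assumed to be bound to the XSL-FO namespace by the enclosing document):

```xml
<fo:block font-size="12pt" line-height="15pt" space-after="12pt">
  Normal
  <fo:inline font-size="80%">smaller
    <fo:inline font-size="80%">smallest</fo:inline>
  </fo:inline>
</fo:block>
```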


<strike> Strikethrough text

Implementing the HTML <strike> element is simple; you create an <fo:inline> element with the property text-decoration="line-through". Strikethrough text is useful for highlighting deleted sections of a document. Here's an example:

<p>The underline property
  <strike>is not currently supported by FOP.</strike>
  is now supported by FOP.</p>

When rendered, this paragraph makes it clear how the text has changed. The XSLT template for <strike> is a short one:

<xsl:template match="strike">
  <fo:inline text-decoration="line-through">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

Note that the keywords for the text-decoration property are both negative and positive. If for some reason you wanted a short section of text that had the line-through turned off in a long section with line-through turned on, you could create a <fo:inline> element with a text-decoration="no-line-through" property. You can also specify multiple values; the property text-decoration="line-through underline" turns on both strikethrough and underlining.
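
Here is a small hand-written fragment, not output from the stylesheet, that illustrates the negative keyword. The sentence text is made up, and how faithfully a given engine honors no-line-through is something to verify with your own renderer:

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:inline text-decoration="line-through">
    This sentence is struck out,
    <!-- Turns the strikethrough off for the nested text only -->
    <fo:inline text-decoration="no-line-through">except for these
    words,</fo:inline>
    and the line resumes here.
  </fo:inline>
</fo:block>
```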


<strong> Strongly emphasized text

The <strong> element is typically rendered in bold. Here's the simple template that does the trick:

<xsl:template match="strong">
  <fo:inline font-weight="bold">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<sub> Subscript text

To handle subscript text, you use the XSL-FO vertical-align property to change the baseline of the text. You also generally want to reduce the font size. Here's an HTML sample:

<p>When I'm thirsty, nothing beats a cold 
  glass of H<sub>2</sub>O.</p>

Here's our XSLT template:

<xsl:template match="sub">
  <fo:inline vertical-align="sub"
        font-size="75%">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>
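
Combined with the <p> template shown earlier, the sample paragraph should come out equivalent to the following markup (the fo prefix is assumed to be bound to the XSL-FO namespace by the enclosing document). The 75% is relative to the paragraph's 12pt, so the subscript is set at 9pt:

```xml
<fo:block font-size="12pt" line-height="15pt" space-after="12pt">
  When I'm thirsty, nothing beats a cold
  glass of H<fo:inline vertical-align="sub" font-size="75%">2</fo:inline>O.
</fo:block>
```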

<sup> Superscript text

Just as with subscript text, you use the XSL-FO vertical-align property to handle superscript text. Here's an HTML sample:

<p>Einstein's famous e=mc<sup>2</sup>
  is an equation that changed the world. </p>

Here's our XSLT template:

<xsl:template match="sup">
  <fo:inline vertical-align="super"
        font-size="75%">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<table> Table tags

To process the HTML <table> element, the biggest challenge is to determine how many columns the XSL-FO table should have, as well as the width of those columns. FOP requires you to provide an <fo:table-column> element for each column in the table. Once you've handled the columns, you simply invoke the XSLT templates for all the elements inside the table. This XSLT template handles a cols attribute that looks like this:

<table cols="200 100pt" border="1">

You need to transform the cols attribute into this markup:

<fo:table-column column-width="200pt"/>
<fo:table-column column-width="100pt"/>

To effect that transformation, you need to use a common XSLT technique known as tail recursion. You create a named template that takes the first word from the attribute's value, converts it into a <fo:table-column> element, and then invokes itself to process the rest of the attribute's value. Eventually the entire attribute will be processed. Here's how the XSLT template for the <table> element looks:

<xsl:template match="table">
  <fo:table table-layout="fixed">
    <xsl:choose>
      <xsl:when test="@cols">
        <xsl:call-template name="build-columns">
          <xsl:with-param name="cols" 
            select="concat(@cols, ' ')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <fo:table-column column-width="200pt"/>
      </xsl:otherwise>
    </xsl:choose>
    <fo:table-body>
      <xsl:apply-templates select="*"/>
    </fo:table-body>
  </fo:table>
</xsl:template>

If there is a cols attribute, the processor invokes the build-columns template; otherwise it creates a single <fo:table-column> element. Also notice that the <fo:table> element has the property table-layout="fixed"; FOP currently issues a warning message without this attribute.

Let's look at the build-columns template:

<xsl:template name="build-columns">
  <xsl:param name="cols"/>
  
  <xsl:if test="string-length(normalize-space($cols))">
    <xsl:variable name="next-col">
      <xsl:value-of select="substring-before($cols, ' ')"/>
    </xsl:variable>
    <xsl:variable name="remaining-cols">
      <xsl:value-of select="substring-after($cols, ' ')"/>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="contains($next-col, 'pt')">
        <fo:table-column column-width="{$next-col}"/>
      </xsl:when>
      <xsl:when test="number($next-col) > 0">
        <fo:table-column column-width="{concat($next-col, 'pt')}"/>
      </xsl:when>
      <xsl:otherwise>
        <fo:table-column column-width="50pt"/>
      </xsl:otherwise>
    </xsl:choose>
    
    <xsl:call-template name="build-columns">
      <xsl:with-param name="cols" select="concat($remaining-cols, ' ')"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>

To start with, the template looks at the length of the $cols parameter after removing any leading or trailing blanks (that's what the XSLT normalize-space() function does). If nothing is left, the template does nothing and the recursion ends. Otherwise, it breaks the $cols value into two parts: the substring before the first space, and everything after the first space. (Notice that when the processor calls this template from the template for the <table> element, it adds a space to the end of the value so there will always be at least one space.)

Now that the $cols parameter is split into two parts, the first part is processed:

  • If the value contains the string pt, assume it's a value like 200pt, and generate the <fo:table-column> element with it.
  • If the value is a number, add the string pt to the end of it and then generate the <fo:table-column> element. The template uses the XSLT number() function to convert the value to a number. If the value isn't numeric (number('xyz'), for example), the function returns the special value NaN (not a number); any comparison with NaN is false, so the test fails and the next rule applies.
  • If neither of the previous rules applies, just create a new <fo:table-column> element with a column-width of 50pt.

After the processor handles the first part of the $cols parameter, the named template invokes itself with the last part of the $cols parameter, adding a space to the end of the value to make sure there is one.
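
To make the recursion concrete, here is a hand trace for cols="200 100pt". The comments describe each call; the elements are what that call writes to the output:

```xml
<!-- Call 1: $cols = "200 100pt " (the table template appended the space)
     next-col = "200", remaining-cols = "100pt "
     "200" has no "pt", but number("200") > 0, so "pt" is appended -->
<fo:table-column column-width="200pt"/>

<!-- Call 2: $cols = "100pt  " (remaining-cols plus one more space)
     next-col = "100pt", remaining-cols = " "
     "100pt" contains "pt", so it is used as-is -->
<fo:table-column column-width="100pt"/>

<!-- Call 3: $cols = "  "
     normalize-space() leaves an empty string, so the xsl:if test
     fails, nothing is written, and the recursion stops -->
```

One consequence worth knowing: because the template splits on single spaces, two consecutive spaces inside the cols value yield an empty word, which falls through to the 50pt default and adds an extra column.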


<td> Table cell

The HTML <td> element maps well to the XSL-FO <fo:table-cell> element. As a default, the template calls for 3 points of space at the top, bottom, left, and right of the table cell. There are three HTML attributes you need to handle: colspan, rowspan, and align. You also handle the border attribute in a limited way.

The colspan and rowspan attributes map directly to the XSL-FO number-columns-spanned and number-rows-spanned attributes, so they aren't difficult to handle.

The HTML align attribute maps to the XSL-FO text-align attribute, but there are two different sets of values. The HTML align values of left, center, right, and justify map to the XSL-FO text-align values of start, center, end, and justify. The final complication here is that the alignment value might appear on the HTML <td> or <tr> or <thead> or <table> elements, so you have to check all of the ancestors until you find it. (Note that the text-align property is set on the <fo:block> element, not the <fo:table-cell>.)

Finally, if the <td>, <tr>, <thead>, or <table> element has a border="1" attribute, you draw a border around that cell of the table with the three XSL-FO properties border-style="solid" border-color="black" border-width="1pt".

Here's the complete XSLT template, most of which is made up of <xsl:choose> elements that determine which XSL-FO properties to use:

<xsl:template match="td">
  <fo:table-cell 
    padding-start="3pt" padding-end="3pt"
    padding-before="3pt" padding-after="3pt">
    <xsl:if test="@colspan">
      <xsl:attribute name="number-columns-spanned">
        <xsl:value-of select="@colspan"/>
      </xsl:attribute>
    </xsl:if>
    <xsl:if test="@rowspan">
      <xsl:attribute name="number-rows-spanned">
        <xsl:value-of select="@rowspan"/>
      </xsl:attribute>
    </xsl:if>
    <xsl:if test="@border='1' or 
                  ancestor::tr[@border='1'] or
                  ancestor::thead[@border='1'] or
                  ancestor::table[@border='1']">
      <xsl:attribute name="border-style">
        <xsl:text>solid</xsl:text>
      </xsl:attribute>
      <xsl:attribute name="border-color">
        <xsl:text>black</xsl:text>
      </xsl:attribute>
      <xsl:attribute name="border-width">
        <xsl:text>1pt</xsl:text>
      </xsl:attribute>
    </xsl:if>
    <xsl:variable name="align">
      <xsl:choose>
        <xsl:when test="@align">
          <xsl:choose>
            <xsl:when test="@align='center'">
              <xsl:text>center</xsl:text>
            </xsl:when>
            <xsl:when test="@align='right'">
              <xsl:text>end</xsl:text>
            </xsl:when>
            <xsl:when test="@align='justify'">
              <xsl:text>justify</xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text>start</xsl:text>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:when>
        <xsl:when test="ancestor::tr[@align]">
          <xsl:choose>
            <xsl:when test="ancestor::tr/@align='center'">
              <xsl:text>center</xsl:text>
            </xsl:when>
            <xsl:when test="ancestor::tr/@align='right'">
              <xsl:text>end</xsl:text>
            </xsl:when>
            <xsl:when test="ancestor::tr/@align='justify'">
              <xsl:text>justify</xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text>start</xsl:text>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:when>
        <xsl:when test="ancestor::thead">
          <xsl:text>center</xsl:text>
        </xsl:when>
        <xsl:when test="ancestor::table[@align]">
          <xsl:choose>
            <xsl:when test="ancestor::table/@align='center'">
              <xsl:text>center</xsl:text>
            </xsl:when>
            <xsl:when test="ancestor::table/@align='right'">
              <xsl:text>end</xsl:text>
            </xsl:when>
            <xsl:when test="ancestor::table/@align='justify'">
              <xsl:text>justify</xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text>start</xsl:text>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>start</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <fo:block text-align="{$align}">
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </fo:table-cell>
</xsl:template>
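
The three inner <xsl:choose> blocks in the template above repeat the same mapping from HTML align values to XSL-FO text-align values. As a minimal sketch, assuming you are free to restructure the stylesheet, that mapping could live in one named template. The template name map-align and its value parameter are hypothetical; they are not part of the stylesheet described in this guide:

<xsl:template name="map-align">
  <!-- value: an HTML align value such as 'center' or 'right' -->
  <xsl:param name="value"/>
  <xsl:choose>
    <xsl:when test="$value='center'">center</xsl:when>
    <xsl:when test="$value='right'">end</xsl:when>
    <xsl:when test="$value='justify'">justify</xsl:when>
    <!-- 'left' and any unrecognized value fall back to start -->
    <xsl:otherwise>start</xsl:otherwise>
  </xsl:choose>
</xsl:template>

Each branch of the outer <xsl:choose> would then shrink to a single call. For example, the branch for the cell's own align attribute might look like this:

<xsl:when test="@align">
  <xsl:call-template name="map-align">
    <xsl:with-param name="value" select="@align"/>
  </xsl:call-template>
</xsl:when>

The row and table branches would pass ancestor::tr/@align and ancestor::table/@align in the same way, which keeps the order of precedence unchanged.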

<tfoot> A table footer

The rarely used <tfoot> element creates a table footer. It contains a number of table row (<tr>) elements, each of which contains some number of table cells. To process it, you simply invoke the XSLT template for the <tr> elements that the <tfoot> element contains:

<xsl:template match="tfoot">
  <xsl:apply-templates select="tr"/>
</xsl:template>

<th> A cell in a table header

The HTML <th> element contains a cell of a table header. To process it, you create a <fo:table-cell> with 3-point padding all around (as specified earlier in the defaults for the <td> element). That <fo:table-cell> element contains an <fo:block> element with bold, centered text. The template checks this element and its ancestors for the border="1" attribute; if any of them has that attribute, it sets the XSL-FO border properties accordingly. Here's the complete XSLT template:

<xsl:template match="th">
  <fo:table-cell
    padding-start="3pt" padding-end="3pt"
    padding-before="3pt" padding-after="3pt">
    <xsl:if test="@border='1' or
                  ancestor::tr[@border='1'] or
                  ancestor::thead[@border='1'] or
                  ancestor::table[@border='1']">
      <xsl:attribute name="border-style">
        <xsl:text>solid</xsl:text>
      </xsl:attribute>
      <xsl:attribute name="border-color">
        <xsl:text>black</xsl:text>
      </xsl:attribute>
      <xsl:attribute name="border-width">
        <xsl:text>1pt</xsl:text>
      </xsl:attribute>
    </xsl:if>
    <fo:block font-weight="bold" text-align="center">
      <xsl:apply-templates select="*|text()"/>
    </fo:block>
  </fo:table-cell>
</xsl:template>
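
As an illustration, assume a header cell inside a table that has border="1". The cell text is invented for this example:

<table border="1">
  <tr>
    <th>Album</th>
  </tr>
</table>

For that <th> element, the template above should produce a cell along these lines (whitespace and attribute order may differ between XSLT processors):

<fo:table-cell padding-start="3pt" padding-end="3pt"
  padding-before="3pt" padding-after="3pt"
  border-style="solid" border-color="black" border-width="1pt">
  <fo:block font-weight="bold" text-align="center">Album</fo:block>
</fo:table-cell>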

<thead> A table header

You handle the rarely used <thead> element just like the <tfoot> element. Here's the simple XSLT template:

<xsl:template match="thead">
  <xsl:apply-templates select="tr"/>
</xsl:template>

<title> Document title

All of these instructions assume that you ultimately intend to convert the FO document to a PDF file with a particular layout. To achieve the PDF layout called for in the tutorial example, you put the title of the document (the <title> element inside the <head> element inside the <html> element) in large, bold type, centered at the top of the page. Here's the XSLT template:

<xsl:template match="title">
  <fo:block space-after="18pt" line-height="27pt" 
    font-size="24pt" font-weight="bold" text-align="center">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>
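
As a quick check of what this template emits, assume a document whose <head> contains the following title (the title text is only a sample):

<title>HTML to Formatting Objects (FO) conversion guide</title>

The template above should turn it into a single block, roughly like this:

<fo:block space-after="18pt" line-height="27pt"
  font-size="24pt" font-weight="bold" text-align="center">HTML to Formatting Objects (FO) conversion guide</fo:block>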

<tr> A table row

An HTML <tr> element maps directly to the XSL-FO <fo:table-row> element. Because most of the work of handling a table is in the template for the <td> element, all you have to do now is create the <fo:table-row> and invoke the XSLT templates for everything contained in the HTML <tr> element. Here's the simple template:

<xsl:template match="tr">
  <fo:table-row>
    <xsl:apply-templates select="*|text()"/>
  </fo:table-row>
</xsl:template>

<tt> Teletyped text

Teletyped text is rendered in a monospaced font. Here's the XSLT template:

<xsl:template match="tt">
  <fo:inline font-family="monospace">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

<u> Underlined text

To render underlined text, use the XSL-FO text-decoration property. Here's a short HTML sample:

<p>When typewriters ruled the earth, 
  <u>underlining</u> was the most 
  common way to highlight text.</p>

When you convert this to XSL-FO, use an <fo:inline> element with the text-decoration="underline" property:

<xsl:template match="u">
  <fo:inline text-decoration="underline">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

Note that the text-decoration property has both positive and negative keywords. If for some reason you want a short run of text with underlining turned off inside a longer underlined passage, you can create an <fo:inline> element with a text-decoration="no-underline" property. You can also specify multiple values; the property text-decoration="underline line-through" turns on both underlining and strikethrough.
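
To make the negative keyword concrete, here is a small hand-written XSL-FO fragment. It is not output from the stylesheet, the sentences are invented for illustration, and support for these keywords may vary between formatters. The first block switches underlining off for one word inside an underlined passage; the second combines two decorations:

<fo:block>
  <fo:inline text-decoration="underline">
    This sentence is underlined, except for
    <fo:inline text-decoration="no-underline">this</fo:inline>
    word.
  </fo:inline>
</fo:block>

<fo:block>
  <fo:inline text-decoration="underline line-through">
    Underlined and struck through.
  </fo:inline>
</fo:block>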


<ul> An unordered list

Unordered lists are simpler than ordered lists and definition lists because the <fo:list-item-label> is a bullet for all of the items. Here's an HTML list to use for a sample:

<p>A few of my favorite albums</p>
<ul>
  <li>A Love Supreme</li>
  <li>Remain in Light</li>
  <li>London Calling</li>
  <li>The Indestructible Beat of Soweto</li>
  <li>The Joshua Tree</li>
</ul>

You use the XSL-FO list elements <fo:list-block>, <fo:list-item>, <fo:list-item-label>, and <fo:list-item-body>. Here's how the HTML list looks when it's converted to XSL-FO:

<fo:block>A few of my favorite albums</fo:block>
<fo:list-block provisional-distance-between-starts="1cm"
  provisional-label-separation="0.5cm"
  space-after="12pt" start-indent="1cm">
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>•</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>A Love Supreme</fo:block>
    </fo:list-item-body>
  </fo:list-item>
  ...
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>•</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>The Joshua Tree</fo:block>
    </fo:list-item-body>
  </fo:list-item>
</fo:list-block>

The bullet in each label is the Unicode bullet character (U+2022, •); in a stylesheet you can also write it as the numeric character reference &#x2022;.

The XSLT templates that transform the <ul> and <li> items into the formatting objects follow these rules when processing the <ul> element:

  • If this list appears inside another list, don't insert any space after it. Otherwise, insert 12 points of space after it.
  • If this list is not inside any other lists, indent it by 1 cm. Otherwise, indent it by 1 cm plus 1.25 cm for each enclosing list. For example, if this list is three levels deep (that is, it has two enclosing lists), indent the list by 3.5 cm.

Here are the XSLT templates:

<xsl:template match="ul">
  <fo:list-block provisional-distance-between-starts="1cm"
    provisional-label-separation="0.5cm">
    <xsl:attribute name="space-after">
      <xsl:choose>
        <xsl:when test="ancestor::ul or ancestor::ol">
          <xsl:text>0pt</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>12pt</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:attribute name="start-indent">
      <xsl:variable name="ancestors">
        <xsl:choose>
          <xsl:when test="count(ancestor::ol) or count(ancestor::ul)">
            <xsl:value-of select="1 + 
                                  (count(ancestor::ol) + 
                                   count(ancestor::ul)) * 
                                  1.25"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text>1</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:value-of select="concat($ancestors, 'cm')"/>
    </xsl:attribute>
    <xsl:apply-templates select="*"/>
  </fo:list-block>
</xsl:template>

<xsl:template match="ul/li">
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>•</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>
        <xsl:apply-templates select="*|text()"/>
      </fo:block>
    </fo:list-item-body>
  </fo:list-item>
</xsl:template>
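
To see the two rules at work, consider a hypothetical nested list (the list content is invented for this example):

<ul>
  <li>Jazz
    <ul>
      <li>A Love Supreme</li>
    </ul>
  </li>
</ul>

The outer <ul> has no list ancestors, so the template above gives it space-after="12pt" and start-indent="1cm". The inner <ul> has one list ancestor, so it gets space-after="0pt" and start-indent="2.25cm" (1 + 1 * 1.25). Showing only the start tags, the two <fo:list-block> elements should look roughly like this:

<!-- outer list: no enclosing lists -->
<fo:list-block provisional-distance-between-starts="1cm"
  provisional-label-separation="0.5cm"
  space-after="12pt" start-indent="1cm">

<!-- inner list: one enclosing <ul> -->
<fo:list-block provisional-distance-between-starts="1cm"
  provisional-label-separation="0.5cm"
  space-after="0pt" start-indent="2.25cm">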

<var> A variable name

A variable name is typically rendered in an italicized monospaced font. Use the XSL-FO font-family and font-style properties. Here's a short sample:

<p>To run the FOP program, you must make sure
  your <var>classpath</var> variable
  is set correctly.</p>

Here's the XSLT template that does the trick:

<xsl:template match="var">
  <fo:inline font-style="italic"
      font-family="monospace">
    <xsl:apply-templates select="*|text()"/>
  </fo:inline>
</xsl:template>

Download

Description                          Name                      Size
Example files (html, xsl, fo, pdf)   x-xslfo2app-samples.zip   45KB
