Publishing XML data in HTML and PDF using a single XSLT stylesheet

Easily and rapidly convert XML data to HTML and PDF

Learn how to publish XML documents in HTML and PDF using the css2xslfo utility.

Claudius Teodorescu, XML Consultant, Independent author

Claudius Teodorescu is an XML consultant based in Bucharest, Romania. He has more than 7 years of experience with XML, XForms, XPath, XSLT, and XSD. You can contact him at claudius.teodorescu@gmail.com.



03 January 2012


Prerequisites

Familiarity with XML and other W3C standards is useful. To run the examples in this article, install the eXist XML database along with its XQuery extension function for digital publishing. See Resources.

Storing data as XML, for example as part of an XForms/REST/XQuery (XRX) architecture (see Resources), is commonplace today. You can query and retrieve data stored in this manner and serialize it to the desired format. For web applications, developers might want to let users retrieve data in HTML format (to view in a web browser) or as a PDF file (to download for later use).

This article shows how you can convert XML data to HTML, convert that HTML to XSL-FO, and convert the XSL-FO to PDF, with the help of an XQuery extension function based on the CSSToXSLFO tool (see Resources).

Converting XML data to other formats

Frequently used acronyms

  • CSS: Cascading Style Sheets
  • HTTP: Hypertext Transfer Protocol
  • JAR: Java archive
  • LDAP: Lightweight Directory Access Protocol
  • SQL: Structured Query Language
  • W3C: World Wide Web Consortium
  • XSL-FO: Extensible Stylesheet Language Formatting Objects
  • XSLT: Extensible Stylesheet Language Transformations

Typically, you need two XSLT stylesheets to publish XML data as HTML and PDF: one that transforms the XML to HTML, and one that produces an XSL-FO document, which you then convert to PDF using an XSL-FO processor. In an environment such as a web application that generates reports in HTML and PDF, this means that for every report you must write an XSLT stylesheet that transforms the data to HTML, and then write, debug, and maintain the corresponding XSL-FO stylesheet. Keeping the two stylesheets in step can be difficult, and XSL-FO isn't easy to work with. Fortunately, when the reports don't have to be sophisticated, you can express the layout in CSS. The CSSToXSLFO utility can then process that CSS and generate the XSL-FO document with just a bit of coding.

Report generation of this kind, and other similar situations, are good use cases for the approach that this article presents: write an XSLT stylesheet that transforms XML to HTML, then add a few extra CSS instructions to it so that the resulting HTML can be transformed to XSL-FO and then to PDF.


The CSSToXSLFO utility

The CSSToXSLFO utility converts an XML document, together with a CSS version 2 (CSS2) stylesheet, into an XSL-FO document. To make this utility available from XQuery, I developed an extension function for the eXist XML database as part of the XQuery extension module for digital publishing.

This utility supports most of the CSS2 specification. For dealing with specific XSL-FO features, it provides several CSS extension properties that browsers typically ignore. Use these properties in the @page rule of the @media print section in the CSS stylesheet. The properties are related to page regions, numbering, references, leaders, named strings, hyphenation, footnotes, external graphics, and foreign elements. Be sure to look at the manual for the CSSToXSLFO utility (see Resources) for more information on the tool and tips for refining the XSLT stylesheets that you design.


The eXist XML database

eXist-db is an open source database management system built completely with XML technology. It supports, among other standards, XQuery, XPath, and XSLT. eXist stores data according to the XML data model and is highly compliant with the XQuery standard. The stored data is processed with XQuery in an index-based manner. The database also has a full-text index based on Apache Lucene.

The XQuery engine of eXist is extensible, so eXist has various XQuery extension modules. These modules provide XQuery extension functions, such as those for:

  • Global key-value cache
  • Various compression operations
  • Additional operations on date and time types
  • Various operations on files and directories
  • HTTP requests (an EXPath module)
  • Operations on images stored in the database, including retrieving image dimensions, creating thumbnails, and resizing images
  • Accessing and manipulating Java™ Naming and Directory Interface-based directories, such as LDAP
  • Sending text or HTML emails
  • Scheduling jobs and manipulating existing jobs
  • Performing SQL operations against relational database management systems
  • Determining the differences between XML nodes
  • XSL-FO rendering
  • XProc functionality
  • Cryptographic operations

Both eXist and CSSToXSLFO are written in the Java language. At the time of writing, eXist allows you to use the Apache Formatting Objects Processor (FOP) or RenderX XEP as the XSL-FO processor. Check the eXist website for instructions on installing the database so you can run the examples for this article.

The eXist module for digital publishing

The eXist module for digital publishing is currently under development and will comprise more functions in the future. For now, it has the html-to-xslfo() function, which helps with the approach provided in this article.

The intention behind this module is to provide a single source for all the XQuery extension functions needed for digital publishing, including conversion between various formats, such as DocBook, Open XML, DOC, DOCX, HTML, PDF, TXT, RTF, PPT, PPTX, and CSV. To install this module within eXist:

  1. Download the eXist digital publishing module JAR, and copy it into $EXIST_HOME/lib/extensions.
  2. Download css2xslfo1_6_2.jar, and copy it into $EXIST_HOME/lib/user.
  3. Add <module class="ro.kuberam.kPub.kPubModule" uri="http://kuberam.ro/k-Pub"/> to the built-in modules section in the $EXIST_HOME/conf.xml file.
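As a rough sketch of step 3, the new entry sits alongside the other module declarations in conf.xml. The fragment below assumes that the built-in modules section is the builtin-modules element inside the xquery element, as in a stock eXist configuration. The surrounding elements are abbreviated, and only the module line itself is taken from the steps above.

```xml
<!-- Fragment of $EXIST_HOME/conf.xml; all other content is omitted.
     Assumption: the stock layout, with builtin-modules inside xquery. -->
<xquery>
    <builtin-modules>
        <!-- existing module entries stay unchanged -->

        <!-- registers the digital publishing module under the k-Pub namespace -->
        <module class="ro.kuberam.kPub.kPubModule" uri="http://kuberam.ro/k-Pub"/>
    </builtin-modules>
</xquery>
```

Changes to conf.xml typically take effect only after eXist is restarted.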

Examples

This section analyzes the use of CSSToXSLFO as it is implemented in the XQuery extension module of eXist for digital publishing. For this task, use the XML data presented in Listing 1, along with the XSLT stylesheet (xml-to-html.xsl) included in the code example available for download.

Note: To make this article easier to read, I didn't include the complete contents of the XSLT stylesheet used. Instead, I present only those elements that make the stylesheet suitable for converting XML to HTML and PDF.

The examples will be uploaded to an eXist XML database in a collection called html-and-pdf-single-stylesheet located in the root collection of eXist, so you can view each example in your browser. For example, to see example 1, type this URL in your browser's address bar (assuming that eXist is installed locally):

http://127.0.0.1:8080/rest/db/html-and-pdf-single-stylesheet/example%201/example1.xql

To use more of the functionality in the CSSToXSLFO tool, you don't need any XQuery code beyond that provided in Listing 2 and Listing 3, which render XML data in HTML and PDF format, respectively. To obtain increasingly refined PDF documents, you only add CSS instructions to the CSS section of the XSLT stylesheet.

Listing 1. An XML document representing a summary of issued invoices (file xml-data.xml in the example code)
<invoices-summary>
  <invoice id="">
    <issue-date>2011-10-17</issue-date>
    <amount>108</amount>
    <vat>19.47</vat>
    <vat-base>22</vat-base>
    <currency>EURO</currency>
    <customer-id>0001008</customer-id>
  </invoice>
  <invoice id="">
    <issue-date>2011-10-17</issue-date>
    <amount>40</amount>
    <vat>7.21</vat>
    <vat-base>22</vat-base>
    <currency>EURO</currency>
    <customer-id>0000017</customer-id>
  </invoice>
  <invoice id="">
    <issue-date>2011-10-17</issue-date>
    <amount>1700</amount>
    <vat>306.56</vat>
    <vat-base>22</vat-base>
    <currency>EURO</currency>
    <customer-id>0000040</customer-id>
  </invoice>
</invoices-summary>

The first example, in Listing 2, is an XQuery script that transforms the XML data presented in Listing 1 to HTML. The script uses the transform:transform() function of eXist, which transforms XML data using an XSLT stylesheet and (optionally) transformation parameters. You can write the stylesheet in XSLT 1.0 (processed by Apache Xalan) or XSLT 2.0 (optionally, with Saxon).

Listing 2. XQuery script that transforms XML data to HTML (file example-01.xql in the example code)
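xquery version "1.0";
(: Load the XML data and the XSLT stylesheet from the database :)
let $xml-data := doc('/db/html-and-pdf-single-stylesheet/xml-data.xml')
let $xslt-stylesheet := doc('/db/html-and-pdf-single-stylesheet/xml-to-html.xsl')
(: Apply the stylesheet; the empty sequence means that no parameters are passed :)
let $html := transform:transform($xml-data, $xslt-stylesheet, ())
return $html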
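
Listing 2 passes an empty sequence as the third argument, so no parameters reach the stylesheet. The following sketch shows how a parameter might be passed instead, using the parameters element that transform:transform() accepts. The parameter name report-title and its value are hypothetical; the stylesheet in the download does not necessarily declare such a parameter.

```xquery
xquery version "1.0";
let $xml-data := doc('/db/html-and-pdf-single-stylesheet/xml-data.xml')
let $xslt-stylesheet := doc('/db/html-and-pdf-single-stylesheet/xml-to-html.xsl')
(: Hypothetical parameter: the stylesheet would need a matching
   <xsl:param name="report-title"/> declaration to make use of it :)
let $parameters :=
    <parameters>
        <param name="report-title" value="Invoices summary"/>
    </parameters>
return transform:transform($xml-data, $xslt-stylesheet, $parameters)
```

Each param element carries one name and value pair, which keeps the XQuery script unchanged when the stylesheet gains further parameters.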

Figure 1 shows the resulting HTML document rendered in my browser. The HTML document displays a summary of details for three invoices in a sans-serif font.

Figure 1. The result of the transformation to HTML
Screen capture showing the result of the transformation to HTML

The second example converts the same XML data to PDF. First, the XML data is converted to HTML, as in the previous example. The resulting HTML document contains all the CSS instructions needed for rendering the HTML document as intended, along with any CSS extension instructions specific to CSSToXSLFO that help in using more sophisticated features of XSL-FO.

For a simple use case such as the one presented in this article, you do not need such extension instructions. The CSSToXSLFO utility transforms the HTML document into an XSL-FO document that in turn generates a PDF document that will closely resemble the HTML document.

Next, you convert the resulting HTML document to an XSL-FO document using the html-to-xslfo() function, then generate a PDF document as in Listing 3. To create the PDF, use the render() function of the xslfo eXist module.

Listing 3. XQuery script that transforms XML data to PDF format (file example-02.xql in the example code)
xquery version "1.0";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";
declare namespace k-Pub="http://kuberam.ro/k-Pub";
let $xml-data := doc('/db/html-and-pdf-single-stylesheet/xml-data.xml')
let $xslt-stylesheet := doc('/db/html-and-pdf-single-stylesheet/xml-to-html.xsl')
(: Step 1: XML to HTML, as in Listing 2 :)
let $html := transform:transform($xml-data, $xslt-stylesheet, ())
(: Step 2: HTML (with its embedded CSS) to XSL-FO :)
let $fo := k-Pub:html-to-xslfo($html)
(: Step 3: XSL-FO to PDF, using the XSL-FO processor configured in eXist :)
let $pdf := xslfo:render($fo, "application/pdf", ())
(: Send the PDF to the client under the file name output.pdf :)
return response:stream-binary($pdf, "application/pdf", "output.pdf")
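
Because Listing 2 and Listing 3 share their first steps, a web application could serve both formats from one script. The sketch below is one possible way to do that, not part of the example code: it assumes a hypothetical request parameter named format, read with the request:get-parameter() function of eXist, and falls back to HTML when the parameter is absent.

```xquery
xquery version "1.0";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";
declare namespace k-Pub="http://kuberam.ro/k-Pub";

(: Hypothetical request parameter: ?format=pdf selects PDF; anything else yields HTML :)
let $format := request:get-parameter('format', 'html')
let $xml-data := doc('/db/html-and-pdf-single-stylesheet/xml-data.xml')
let $xslt-stylesheet := doc('/db/html-and-pdf-single-stylesheet/xml-to-html.xsl')
(: Both output formats start from the same HTML document :)
let $html := transform:transform($xml-data, $xslt-stylesheet, ())
return
    if ($format eq 'pdf') then (
        let $fo := k-Pub:html-to-xslfo($html)
        let $pdf := xslfo:render($fo, 'application/pdf', ())
        return response:stream-binary($pdf, 'application/pdf', 'output.pdf')
    ) else
        $html
```

Keeping the branch at the very end means that any change to the stylesheet or its embedded CSS affects both outputs at once.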

Figure 2 shows the resulting PDF document rendered in my browser. The PDF document displays a summary of details for three invoices in a serif font.

Figure 2. The result of the transformation to PDF format
Screen capture showing the resulting PDF document

Listing 4 contains the CSS instructions needed to render XML data in both HTML and PDF. To obtain a similar PDF file, I added only one CSS instruction, which makes the table headers bold.

Listing 4. CSS instructions to render XML to both HTML and PDF with a similar appearance
body { 
    font-family: arial; 
    font-size: 12px; 
    text-align: center; 
} 
table { 
    border-collapse: collapse; 
    width: 100%; 
    border: solid black 1px; 
} 
table th, td { 
    border: solid black 1px; 
} 
@media screen { 
    body { 
        width: 570px; 
    } 
} 
@media print { 
    table th { 
        font-weight: bold; 
    } 
}
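
To show where these CSS instructions might live, here is a minimal sketch of a stylesheet in the spirit of xml-to-html.xsl. It is not the stylesheet from the download: the column selection, headings, and output settings are assumptions made for illustration, and the real file may differ, for instance in its namespaces. The element names come from Listing 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: emit well-formed markup so the result can be processed further as XML -->
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/invoices-summary">
    <html>
      <head>
        <title>Invoices summary</title>
        <style type="text/css">
          /* The rules from Listing 4 would go here; two are shown for brevity */
          table { border-collapse: collapse; width: 100%; border: solid black 1px; }
          @media print { table th { font-weight: bold; } }
        </style>
      </head>
      <body>
        <table>
          <tr>
            <th>Issue date</th>
            <th>Amount</th>
            <th>VAT</th>
            <th>Currency</th>
            <th>Customer ID</th>
          </tr>
          <!-- One table row per invoice element -->
          <xsl:apply-templates select="invoice"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="invoice">
    <tr>
      <td><xsl:value-of select="issue-date"/></td>
      <td><xsl:value-of select="amount"/></td>
      <td><xsl:value-of select="vat"/></td>
      <td><xsl:value-of select="currency"/></td>
      <td><xsl:value-of select="customer-id"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

Because the CSS travels inside the generated HTML, the same document can be shown in a browser and handed to html-to-xslfo() without a second stylesheet.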

Conclusion

In this article, you used a simple function to convert XML data to both HTML and PDF formats using just the power and simplicity of CSS syntax and some extension instructions to deal with more complex XSL-FO features. This approach is particularly useful in situations where you have reports or documents with simple styling.


Download

Description             Name                                 Size
Complete code examples  html-and-pdf-single-stylesheet.zip   4KB

Resources

Learn

Get products and technologies

Discuss
