Dynamically generate internationalized PDFs in Web applications

Use open source technologies to generate double-byte PDF documents

Ning Yan (nyan@us.ibm.com), Software Engineer, IBM
Ning Yan is a Software Engineer at IBM. His expertise is in Web application development and business solutions that include DB2, WebSphere, and open-source technologies. He is an IBM Certified DB2 Specialist and Brainbench Certified Web Service Engineering and Web Developer, and he received an M.S. in computer science in 1996 from the State University of New York at Albany. You can contact Ning at nyan@us.ibm.com.
Ajay Raina (ajraina@us.ibm.com), Software Architect, IBM
Ajay Raina is a Team Lead in the IBM Corporate Webmaster team and helps in IBM’s business transformation using WebSphere technologies. He is the Lead Architect of Order Status OnLine application deployed on ibm.com. Earlier, he led the Authorization module of WebSphere Commerce Server. His expertise is in J2EE, WebSphere, and DB2. Ajay received his B.E. (Hons) in Electrical and Electronics Engineering from BITS, Pilani, India and an M.S. in Computer Science from New York University. You can contact Ajay at ajraina@us.ibm.com.

Summary:  Find out how to internationalize your PDF documents. This article describes a way to dynamically generate PDF documents in Java Web applications using open source technologies, with an emphasis on generating double-byte PDF documents. The approach described fits the popular Model-View-Controller architecture for Web applications. A sample Web application is provided for reference.

Date:  21 Jan 2005
Level:  Advanced


The Internet has enabled companies to do business in an international marketplace. This makes it imperative to have Web-enabled ways of delivering international content to customers. Portable Document Format (PDF) is a popular format for delivering content on the Web; you can easily download a PDF document using any popular browser and then view it using Adobe Acrobat Reader, or you can use Adobe plug-ins for viewing within a browser. Generating PDF content for an international audience poses challenges, especially since the double-byte nature of languages like Japanese, Chinese, and Korean requires special consideration. A Unicode font is usually a good solution, but this may be platform specific. Another important consideration is to avoid changing the application business logic just because you want the content in PDF format in addition to the usual HTML.

We start by discussing fonts and languages in general, and then we describe an approach for generating PDF documents in Java Web applications using open source technologies.

Fonts in general

A font is a collection of character images, called glyphs, together with the mappings from character codes to those glyphs. A character is a symbol that represents an item such as a letter or number in a particular writing system. When a particular character is rendered, the shape representing it is called a glyph. A popular font type on computer systems is the Unicode TrueType font. A TrueType font is the best choice for rendering a wide range of characters, but having one is not by itself enough to render data in double-byte languages, because font support is system specific:

  • In the Windows operating system, Microsoft provides the Arial Unicode MS font, which enables the display of characters in most languages including double-byte languages. This font comes in a single file; by default it is not installed on the user's system. Windows' international support feature may be required in order to use this font.
  • Similarly, in IBM's AIX operating system TrueType fonts are not automatically installed if your system is installed and configured using English by default. You must install multiple font files such as AIXwindows Unicode TrueType fonts-CJK for languages like Japanese, Korean, and Chinese (simplified and traditional). If your operating system is Linux, we suggest you do additional research to see how to install TrueType fonts.

Fonts in PDF documents

In a standard Web application that supports internationalization, it's important to be careful about locale and encoding issues. However, for internationalizing PDF documents, you need to consider an additional font-related issue.

In PDF, embedded fonts make documents portable so that they can be viewed on any operating system with Adobe Acrobat Reader. If fonts are not embedded, a localized version of Reader on the user's system picks up the native available fonts for displaying that language. For example, if a system is English-language enabled, Base 14 PostScript fonts are used for substituting fonts on screen and for printing; this covers most single-byte languages, but does not cover any double-byte languages. The other option is to prompt the user to download Font Pack from Adobe for viewing international documents; however it's not a good idea to ask the customer to download additional font utilities. By using embedded fonts in PDF, you need not worry about whether a remote user or machine has the fonts required to display your document. So using embedded Unicode TrueType fonts for rendering internationalized PDF data is a good solution.


FOP and XSL-FO

Our approach to generating PDF documents in Web applications is based on the Formatting Objects Processor (FOP). FOP is the world's first print formatter driven by XSL Formatting Objects (XSL-FO). It is an open source Java project under the Apache XML project.

Figure 1 illustrates the general flow for generating PDF documents using open source FOP. The input data must be XML in UTF-8 encoding. The transformer takes the XML data, applies the stylesheet to it, and generates the PDF. To lay out the PDF, the FOP formatter needs to know the details of the fonts used in the document, particularly the widths of all the glyphs used. It needs these details to calculate line lengths, hyphenation, justification, and so on. This information is known as the metrics of the font, and is stored with each font. When the metrics are available to the formatter, the FOP formatter can successfully lay out the PDF. Later in this article, we discuss how the metrics file is generated.


Figure 1. PDF transformation using Apache open-source FOP
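The first half of this flow, applying a stylesheet to XML data to produce XSL-FO, can be sketched with the XSLT support built into the JDK. This is a minimal sketch under stated assumptions: the class name, the `products`/`name` elements, and the inline stylesheet are hypothetical and are not taken from the sample application. Rendering the resulting XSL-FO to PDF is FOP's job and is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToFoSketch {

    // Hypothetical input data: "Product Service" in Japanese, written with
    // Unicode escapes so the source file itself stays ASCII.
    static final String XML =
        "<products><name>\u88FD\u54C1\u30B5\u30FC\u30D3\u30B9</name></products>";

    // Hypothetical stylesheet that wraps the name in a single fo:block.
    // The font-family value assumes a font registered under the name "arialuni".
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
        + "<xsl:template match=\"/products\">"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name=\"page\">"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference=\"page\">"
        + "<fo:flow flow-name=\"xsl-region-body\">"
        + "<fo:block font-family=\"arialuni\">"
        + "<xsl:value-of select=\"name\"/>"
        + "</fo:block>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML data and returns the XSL-FO document as text. */
    static String toFo(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The printed XSL-FO is what a formatter such as FOP would take as input.
        System.out.println(toFo(XML, XSL));
    }
}
```

Keeping this step separate from rendering makes it easy to inspect the intermediate XSL-FO when the final PDF does not look as expected.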

The font-family attribute in the stylesheet also plays a part in generating PDF from XSL-FO. Based on the W3C definition, either a font family name or a generic family name can be used. The font-family values are Helvetica, Times, Courier, and Symbol. The generic families are serif, sans-serif, cursive, fantasy, and monospace. In XSL-FO, the font-family property is a prioritized list of font family names, which are tried in sequence to find an available font that matches the selection criteria (shown in Listing 1). However, the current FOP does not support the font-family list; it only uses the first font in the list, if that exists. For example, if you specify font-family="A,B,C" (as in Listing 1) and A doesn't exist, then B and C are ignored and not used. Please refer to the W3C site for more information.


Listing 1. Typical XSL-FO syntax with a font list in font-family (not supported in FOP)

	<fo:block text-align="center" 
			  font-family="A,B,C" 
			  font-size="16pt">
		<xsl:text>Welcome! </xsl:text>
	</fo:block>
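Because FOP uses only the first name in the list, one possible workaround is to resolve the prioritized list yourself and write a single family name into the stylesheet. The following is a minimal sketch of that idea, not part of FOP or of the sample application; the class name, the `resolve` helper, and the set of registered family names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FontFamilyFallback {

    /**
     * Returns the first family in the comma-separated list that is registered,
     * or the given default family if none of them is.
     */
    static String resolve(String familyList, Set<String> registered, String defaultFamily) {
        for (String candidate : familyList.split(",")) {
            String name = candidate.trim();
            if (registered.contains(name)) {
                return name;
            }
        }
        return defaultFamily;
    }

    public static void main(String[] args) {
        // Assumed set of families known to the formatter, for illustration only.
        Set<String> registered = new HashSet<>(Arrays.asList("arialuni", "Helvetica"));
        // "A" is not registered, so the next registered name is chosen.
        System.out.println(resolve("A, arialuni, Helvetica", registered, "Helvetica"));
    }
}
```

The sketch does not handle quoted family names; it only shows the try-in-sequence behavior that the W3C definition describes.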


Comparison with iText

As discussed, FOP is an open-source Java program for PDF transformation that uses XSL-FO and XML. Another open source PDF transformer called iText (see Resources) uses an object-oriented approach and provides Java objects to render the PDF documents. Both approaches have their pros and cons, but we believe that the XSL-FO model fits better in the Model-View-Controller (MVC) architecture at the view tier. It is well supported by the Struts for Transforming XML and XSL (stxx) extensions, which are based on the Struts application framework. Additionally, this approach is better suited to generating PDF reports where the templates for the reports are specified using XSL, which is endorsed by W3C.


Sample Web application

We will use a sample Java Web application to demonstrate the steps involved in setting up Unicode fonts, generating the font metrics files, and using stxx to generate PDF documents.

The sample application takes some application data and transforms it into PDF. We assume that you have some general knowledge of XML, XSL-FO, Struts, and J2EE. The sample application has two versions:

  • The first version shows how the generated PDF cannot display double-byte languages properly without using the Unicode-embedded fonts
  • The second version demonstrates the additional steps needed to generate the PDF for double-byte languages

Essentially, it is the same application running in two modes, which are controlled by a flag in an XML configuration file. This approach requires the data to be available in XML format with Unicode encoding. Many Web applications generate XML at some point, in which case that XML can be fed into the stxx FOP along with the appropriate XSL to do the transformation. However, if the XML is not already generated in the application, an additional step is required to transform the application data into XML and make it ready for transformation. The XML data that's fed into stxx is enhanced by adding the locale information. You can use the locale to pick the appropriate static messages displayed in the PDF document, along with the dynamic data. For example, in an order status application that generates PDF output, the static labels can be picked from a resource bundle using the appropriate locale.
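Picking static labels by locale can be sketched as follows. This is only an illustration under stated assumptions: the `order.status` key, the label text, and the inline property strings are hypothetical, and a real application would more likely keep the labels in .properties files loaded with ResourceBundle.getBundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class LocalizedLabels {

    // Hypothetical label sets keyed by language code. The Japanese text is
    // written with Unicode escapes so the source file stays ASCII.
    private static final Map<String, String> BUNDLES = new HashMap<>();
    static {
        BUNDLES.put("en", "order.status=Order status\n");
        BUNDLES.put("ja", "order.status=\u6CE8\u6587\u72B6\u6CC1\n");
    }

    /** Returns the label for the key in the locale's language, falling back to English. */
    static String label(Locale locale, String key) {
        String properties = BUNDLES.getOrDefault(locale.getLanguage(), BUNDLES.get("en"));
        try {
            ResourceBundle bundle = new PropertyResourceBundle(new StringReader(properties));
            return bundle.getString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(label(Locale.ENGLISH, "order.status"));
        System.out.println(label(Locale.JAPANESE, "order.status"));
    }
}
```

The label chosen this way can then be placed in the XML alongside the dynamic data, so the stylesheet itself stays free of language-specific text.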

To run the sample application, we have used Tomcat 4.1.30 and stxx 1.3. Tomcat is an open source servlet container. You are free to use another servlet container -- either an open-source one or a commercial product like IBM WebSphere Application Server. stxx is an extension of the Struts framework that supports XML and XSL without changing the runtime behavior of Struts. The current release of stxx is version 1.3, which uses FOP 0.20.4.

The sample application is a simple Struts-based application. We used IBM's WebSphere Studio as our development environment. (WebSphere Studio uses the Eclipse tooling framework, integrates well with Tomcat server, and has Java and XML development features). The data to be rendered as PDF comes from an XML file that contains multilingual content (see Figure 2). It includes the text "Product Service" translated into different double-byte languages. When the application is initialized, the embedded fonts are loaded into the Struts system. Using a stxx action, the XML document is converted to stxx XML format, which includes the Web application locale information. We used the Apache Jakarta Digester pattern to load the XML configuration file into a Struts plug-in. The stxx FOP transformer does the PDF transform by using the XSL-FO stylesheet. Arial Unicode MS, which belongs to the Helvetica font family, is specifically used in the stylesheet, and ultimately the font information is embedded in the PDF.

If you extract the provided sample.war file, you should be able to see all the configuration files and the source code.


Figure 2. Sample XML data

Steps for deploying the sample application

The sample shown in this article uses an English language machine that doesn't have the Unicode font installed. If you are using a different machine (for example, a Japanese machine), the steps remain the same.

You can deploy the sample application on any J2EE-compliant servlet container. Instructions on how to install the application on Tomcat are provided below. (See Resources for additional detail.)

  1. Once Tomcat is installed, start the server.
  2. Using Tomcat Web Application Manager at a URL such as http://localhost:8080/manager/html/list, install the sample Web application by uploading the sample.war file provided in Download.
  3. Copy all the necessary jar files, such as stxx-1.3.jar and struts.jar, from stxx to the lib directory of the sample Web application in Tomcat. The list of jar files should look like the one in Figure 3.
    Figure 3. jar files in sample Web application lib directory
  4. Bring up homepage.html using a URL such as http://localhost:8080/sample/homepage.html. You will see the home page of the sample application as in Figure 4. You have the option of viewing the PDF in a browser or downloading it to your system.
    Figure 4. Sample Web Application Homepage
  5. Click View PDF to see the PDF rendered as in Figure 5.
    Figure 5. PDF rendered with the wrong font

As you can see, junk symbols like "#####" show up for the double-byte sample data. This means the default font setting in the system cannot display the characters for double-byte languages. At this point, you are finished running the first version of the application.
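One way to picture this result is a font that covers only part of the character range, with every uncovered character replaced by a placeholder. The sketch below is a simplified model written for this explanation, not FOP's own code; the class name, the `hasGlyph` predicate, and the assumed Latin-1 coverage are hypothetical.

```java
import java.util.function.IntPredicate;

final class MissingGlyphSketch {

    /** Replaces every code point the font cannot show with '#'. */
    static String render(String text, IntPredicate hasGlyph) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(codePoint -> {
            if (hasGlyph.test(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append('#');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption for illustration: the font covers only U+0000..U+00FF.
        IntPredicate latin1Only = codePoint -> codePoint <= 0xFF;
        // The two CJK characters have no glyph in this font.
        System.out.println(render("Product \u88FD\u54C1", latin1Only)); // prints "Product ##"
    }
}
```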

To ensure that the double-byte data is rendered correctly, take the following steps to embed the Unicode font for use with FOP:

  1. Install the Unicode TrueType font if your system doesn't support it.
  2. Generate a Unicode TrueType font metrics file.
  3. Register the embedded font with FOP.
  4. Transform using FOP in a servlet engine.

Step 1: Install the Unicode TrueType font

This sample is developed on a Windows system. Refer to Microsoft's international support site to install the Arial Unicode MS font (see Resources). If you use another operating system, please follow the standard instructions to install the font.

Step 2: Generate a Unicode TrueType font metrics file

TrueType font files come in two types: a TrueType Font file (.ttf extension) and a TrueType Collection file (.ttc extension). FOP allows both of them to be embedded.

After the Unicode font is installed, check that the font file is in your font directory (for example, C:\windows\fonts\ARIALUNI.TTF). Then use the FOP command in Listing 2 to generate the font metrics file. Make sure that your classpath is set correctly. Save the generated XML file for the Web application to use at run time.


Listing 2. Using FOP to generate a font metrics file

$ java org.apache.fop.fonts.apps.TTFReader C:\windows\fonts\Arialuni.ttf arialuni.xml

Step 3: Register the embedded font with FOP

It is a good idea to register the embedded font with FOP when you initialize the application. In Struts, you can use a plug-in to load the metrics file. You can create an XML file that specifies the location of the font metrics file. The format we chose is shown in Listing 3.

Modify userconfig.xml to pick up the font metrics file created above.


Listing 3. Use userconfig.xml to register the embedded Unicode TrueType font

   <font metrics-file="C:/temp/font/arialuni/arialuni.xml" 
   			embed-file="C:/windows/Fonts/arialuni.ttf" kerning="yes">
    	<font-triplet name="arialuni" style="normal" weight="normal"/>
    	<font-triplet name="arialuni" style="normal" weight="bold"/>
    	<font-triplet name="arialuni" style="italic" weight="normal"/>
    	<font-triplet name="arialuni" style="italic" weight="bold"/>
  </font>
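Listing 3 registers four font triplets that all point to the same font file, so a request for bold or italic text in the stylesheet still resolves to the Unicode font. The following sketch models that lookup in a simplified way; it is an illustration written for this article's example, not FOP's actual implementation, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class FontTripletSketch {

    // Maps "name,style,weight" to the font file registered for that triplet.
    private final Map<String, String> fontsByTriplet = new HashMap<>();

    private static String key(String name, String style, String weight) {
        return name + "," + style + "," + weight;
    }

    void register(String name, String style, String weight, String embedFile) {
        fontsByTriplet.put(key(name, style, weight), embedFile);
    }

    /** Returns the font file for the triplet, or null if none was registered. */
    String lookup(String name, String style, String weight) {
        return fontsByTriplet.get(key(name, style, weight));
    }

    public static void main(String[] args) {
        FontTripletSketch fonts = new FontTripletSketch();
        String file = "C:/windows/Fonts/arialuni.ttf";
        // Register the same four combinations as Listing 3.
        for (String style : new String[] {"normal", "italic"}) {
            for (String weight : new String[] {"normal", "bold"}) {
                fonts.register("arialuni", style, weight, file);
            }
        }
        System.out.println(fonts.lookup("arialuni", "italic", "bold"));
    }
}
```

In this model, leaving out a triplet means a request for that style and weight finds no font, which is why all four combinations are listed.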

The application configuration is specified in a separate configuration file (pdf-userconfig.xml), which is selected by the Struts plug-in. It is in the "WEB-INF" folder in the sample application. The content is shown in Listing 4. By default, the enabled attribute is set to false.


Listing 4. pdf-userconfig.xml Web application configuration

 <configuration>
	<pdf-fonts>
		<userconfig name="pdf-unicode" 
					path="font\userconfig.xml" 
					enabled="false" 
					comment="for Unicode Font"/>
	</pdf-fonts>
 </configuration>
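Reading the enabled flag from this file can be sketched with the DOM parser that ships with the JDK. The sample application uses Digester for this; plain DOM is substituted here only to keep the example self-contained, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class PdfUserConfigReader {

    /** Returns the path of the first enabled userconfig entry, or null if none is enabled. */
    static String enabledConfigPath(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("userconfig");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                if (Boolean.parseBoolean(entry.getAttribute("enabled"))) {
                    return entry.getAttribute("path");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><pdf-fonts>"
            + "<userconfig name=\"pdf-unicode\" path=\"font/userconfig.xml\""
            + " enabled=\"true\" comment=\"for Unicode Font\"/>"
            + "</pdf-fonts></configuration>";
        System.out.println(enabledConfigPath(xml));
    }
}
```

A plug-in could use the returned path to decide whether to register the embedded font at all, which matches the two modes of the sample application.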

To load the font metrics file, load userconfig.xml using the Struts plug-in and Java code as shown in Listing 5.


Listing 5. Java code used in Struts plug-in to enable embedded font

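	...
	try {
		// Constructing Options loads the user configuration, which
		// registers the embedded fonts listed in userconfig.xml with FOP.
		File userConfigFile = new File("userconfig.xml");
		org.apache.fop.apps.Options options 
				= new org.apache.fop.apps.Options(userConfigFile);
	} catch (FOPException fe) {
		// Loading failed, so the embedded fonts will not be available.
		fe.printStackTrace();
	}
	...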

Step 4: Transform using FOP in a servlet engine

Following the Struts architecture, you can create an action that takes three steps to generate the PDF document:

  1. Construct an XML document.
  2. Process the document view option (view in the browser or download).
  3. Render the PDF.

Listing 6 shows the code.


Listing 6. Struts action for rendering PDF

 public class SampleXslFoAction extends Action {
 
 	private String xmlUsed = "/xml/sample_transform.xml"; //default
	private String successFwd = "success"; //default
	
	public org.apache.struts.action.ActionForward execute(
					ActionMapping mapping,
					ActionForm form,
					HttpServletRequest request,
			    	HttpServletResponse response)
			throws IOException, ServletException {
				
		//**************************
		// make user selections
		//**************************
		decideSuccessFwd(request);
		decideXmlUsed(request);
		
		//*******************
		// Construct XML 
		//*******************
		Document doc = null;
		try { 
			String fileName = request.getRealPath(xmlUsed);
			FileInputStream fis = new FileInputStream(fileName);
			doc = new SAXBuilder().build(fis);
		} 
		catch (Exception ex) {
			ex.printStackTrace();
		}

		saveDocument(request, doc);
		//****************************
		// process PDF doc view option
		//****************************
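		// Constant-first comparison avoids a NullPointerException when
		// the viewformat parameter is missing from the request.
		if("download".equalsIgnoreCase(request.getParameter("viewformat"))){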
			response.setContentType("application/pdf");
			response.setHeader("Content-Disposition",
			"attachment;filename=pdfdoc.pdf");
		}
		//**************************
		//Go forward rendering it
		//**************************
		return mapping.findForward(successFwd);
	}
	...
}
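The view-option step can be isolated into a small helper that is easy to test outside a servlet container. This is a sketch under stated assumptions: the class and method names are hypothetical, and unlike Listing 6 it also returns an explicit inline value when the document is viewed in the browser.

```java
final class ViewFormatSketch {

    /** Returns the Content-Disposition header value for the requested view format. */
    static String contentDisposition(String viewFormat, String fileName) {
        // Constant-first comparison is null-safe when the parameter is missing.
        boolean download = "download".equalsIgnoreCase(viewFormat);
        return (download ? "attachment" : "inline") + ";filename=" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("download", "pdfdoc.pdf")); // attachment;filename=pdfdoc.pdf
        System.out.println(contentDisposition(null, "pdfdoc.pdf"));       // inline;filename=pdfdoc.pdf
    }
}
```

An action could pass the result to response.setHeader("Content-Disposition", ...) after setting the content type to application/pdf.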

All of the configuration and code is already implemented in the sample. Once you have generated the font metrics file and changed the userconfig.xml file to point to it, change the enabled property to true in the pdf-userconfig.xml file. Then, restart the Tomcat server and access homepage.html again. This time, if you click View PDF, you should see all the content displayed correctly as in Figure 6.


Figure 6. PDF view of FOP-transformed multilingual content

As you can see, the sample text "Product Service" is rendered correctly in various languages.


Conclusion

You can use XSL-FO and FOP to dynamically generate international PDF documents in Web applications. In the absence of a universal solution that can satisfy the font requirements for different operating systems, XSL-FO provides a system-dependent solution for rendering international PDF documents. The stxx feature used in this article fits in well with the MVC architecture for Web applications. In this approach, the PDF transformation becomes transparent to the application developer. Once an XML DTD is finalized, the application developer can focus on the business logic and the XSL developer can define the XSL for transformation.



Download

Description:  Sample Web app to generate internationalized PDFs
Name:  x-ospdf_OpenSourcePDF.zip
Size:  43 KB
Download method:  HTTP



Resources

