
Tip: Control white space in an XSLT style sheet

Create the document you want by understanding white space stripping

Nicholas Chase (nicholas@nicholaschase.com), President, Chase and Chase Inc.
Nicholas Chase has been involved in Web site development for companies such as Lucent Technologies, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. Nick has been a high school physics teacher, a low-level radioactive waste facility manager, an online science fiction magazine editor, a multimedia engineer, and an Oracle instructor. More recently, he was the Chief Technology Officer of Site Dynamics Interactive Communications in Clearwater, Florida, USA, and is the author of three books on Web development, including Java and XML From Scratch (Que) and the upcoming Primer Plus XML Programming (Sams). He loves to hear from readers and can be reached at nicholas@nicholaschase.com.

Summary:  Because the style sheet and the source document in an XSLT transformation have different rules regarding white space stripping, it often seems as though the production of spaces and line breaks has no rhyme or reason in the process. This tip shows you how to control the production of white space in a transformation's result, which can lead to documents that more closely align with your requirements.


Date:  01 Nov 2002
Level:  Introductory


Note: For this tip, you can use any XSLT processor, such as Xalan or Saxon, or a browser-based solution, such as Microsoft Internet Explorer or Mozilla.


White space stripping rules

Before processing a transformation, an XSLT processor analyzes the style sheet and the source document and removes any applicable white space nodes, that is, text nodes that contain nothing but spaces, tabs, and line breaks. It then processes the document, building the result tree from the remaining nodes.

Let's look at a basic transformation. The source document contains raw FAQ information that will be transformed into a different XML structure for processing by a second application:


Listing 1. The source document
                
		
<?xml version="1.0"?>
<?xml-stylesheet href="style.xsl" type="text/xsl"?>
<faqs>
   <question>
      <questiontitle>Output DOM object?</questiontitle>
      <questiontext>
        Is there an easy way to send a DOM Document to the screen?
        <address>confused@wisconsin.com</address>
      </questiontext>
      <answer>
          <answertext>
              <line>Yes, as a matter of fact, there is. </line>
              <line>All you have to do is transform the </line>
              <line>Document, but don't add a style sheet:</line>
          </answertext>
          <codesection><codeline>...</codeline>
          <codeline>  DOMSource source = new DOMSource(myDocObject);</codeline>
          <codeline>  StreamResult result = new StreamResult(System.out);</codeline>
          <codeline>  TransformerFactory transFactory = </codeline>
          <codeline>                         TransformerFactory.newInstance();</codeline>
          <codeline>  Transformer transformer = transFactory.newTransformer();</codeline>
          <codeline>  transformer.transform(source, result);</codeline>
          <codeline>...</codeline></codesection>       
      </answer>
   </question>
</faqs>

The style sheet for this transformation is intended to combine individual lines, but will ultimately have to preserve the white space in the code section:


Listing 2. The basic style sheet
                
		
<?xml version="1.0"?>
<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<faqoutput>
    <info>
       <title>
         <xsl:value-of select="faqs/question/questiontitle"/>
       </title>
    </info>
    <xsl:apply-templates/>
</faqoutput>
</xsl:template>

<xsl:template match="question">
<issue><xsl:value-of select="questiontext"/></issue>

<solution><xsl:apply-templates select="answer"/></solution>
</xsl:template>

<xsl:template match="answertext">
<answered><xsl:apply-templates/></answered>
</xsl:template>

<xsl:template match="codesection">
<code>
<xsl:apply-templates/>
</code>
</xsl:template>
</xsl:stylesheet>

Transforming the document shows the results of the default rules for white space stripping:


Listing 3. The result set
                
		
<?xml version="1.0" encoding="UTF-8"?>
<faqoutput><info><title>Output DOM object?</title></info>
   <issue>
        Is there an easy way to send a DOM Document to the screen?
        confused@wisconsin.com
      </issue><solution>
          <answered>
              Yes, as a matter of fact, there is.
              All you have to do is transform the
              Document, but don't add a style sheet:
          </answered>
          <code>...
            DOMSource source = new DOMSource(myDocObject);
            StreamResult result = new StreamResult(System.out);
            TransformerFactory transFactory = 
                                   TransformerFactory.newInstance();
            Transformer transformer = transFactory.newTransformer();
            transformer.transform(source, result);
          ...</code>
      </solution>
</faqoutput>

Notice that the white space nodes within individual templates, such as those in the main template, have been removed, but that the white space nodes within the source document, such as those between the line elements in answertext, have been preserved. There are several options for controlling both behaviors.
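
To see how many white space nodes the processor has kept in the source tree, a small diagnostic style sheet can count them. The following is a minimal sketch, not part of the original example: the text output method and the label are assumptions chosen for readability, and the count depends on how your parser and processor build the tree.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption: plain text output, so only the count is printed. -->
<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:text>White space-only text nodes: </xsl:text>
  <!-- A text node whose normalized value is empty holds only white space. -->
  <xsl:value-of select="count(//text()[normalize-space(.) = ''])"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

Running it before and after adding the stripping declarations described in the next section should show the count dropping as nodes are removed from the source tree.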


Controlling white space in the source

When the processor strips white space nodes from the source tree, it checks whether the parent element of each node is on a list of white space preserving elements. By default, every element in the source document is on this list, but you can remove elements from it by naming them in an xsl:strip-space element. The check looks only at a node's parent, so stripping does not cascade to descendants: each element whose white space nodes should go has to be named (or matched with the wildcard *). For example, to compress the responses within the question and answer texts, strip the white space nodes from question and from each element beneath it that contains them:


Listing 4. Stripping the question element and its descendants

<?xml version="1.0"?>
<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:strip-space elements="question questiontext answer answertext codesection"/>

<xsl:template match="/">
<faqoutput>
    <info>
       <title>
...

The elements attribute takes a space-delimited list of element names. The result of this rather brute-force method is that all of the white space nodes within the question are stripped out:


Listing 5. The stripped results
                
		
<?xml version="1.0" encoding="UTF-8"?>
<faqoutput><info><title>Output DOM object?</title></info>
   <issue>
        Is there an easy way to send a DOM Document to the screen?
        confused@wisconsin.com</issue><solution><answered>Yes, as a matter of fa
ct, there is. All you have to do is transform the Document, but don't add a styl
e sheet:</answered><code>...  DOMSource source = new DOMSource(myDocObject);  St
reamResult result = new StreamResult(System.out);  TransformerFactory transFacto
ry =                          TransformerFactory.newInstance();  Transformer tra
nsformer = transFactory.newTransformer();  transformer.transform(source, result)
;...</code></solution>
</faqoutput>

You may have noticed that there is still white space between the question and the text of the address element, and within the code section. This may seem confusing, but remember we're stripping out white space nodes, not the white space itself. If a text node has anything other than white space in it, the processor will never strip it out, no matter what other settings are in place.
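
When the white space inside such a text node does need tidying, the stripping rules cannot help. One option, which the original style sheet does not use, is the XPath normalize-space() function, which trims leading and trailing white space and collapses each internal run to a single space. The following is a minimal sketch that assumes collapsing is acceptable for questiontext; the for-each loop is an assumption added to keep the example self-contained.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Emit each question's text with white space trimmed and collapsed. -->
<xsl:template match="/">
<faqoutput>
<xsl:for-each select="faqs/question">
<issue><xsl:value-of select="normalize-space(questiontext)"/></issue>
</xsl:for-each>
</faqoutput>
</xsl:template>

</xsl:stylesheet>
```

For the source in Listing 1, the issue element then contains "Is there an easy way to send a DOM Document to the screen? confused@wisconsin.com" on a single line. This approach is not suitable for the code section, where the line breaks and indentation are significant.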

On the other hand, all of the line breaks in the code are now gone, which was not desired. Because stripping is decided by the parent element of each white space node, you can keep them by making sure codesection is on the list of white space preserving elements: leave it out of xsl:strip-space and name it in an xsl:preserve-space element. The explicit declaration also keeps an element on the list when a wildcard such as elements="*" does the stripping, because a name takes priority over the wildcard.


Listing 6. Preserving white space

<?xml version="1.0"?>
<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:strip-space elements="question questiontext answer answertext"/>
<xsl:preserve-space elements="codesection"/>

<xsl:template match="/">
<faqoutput>
    <info>
...

With this change, the codesection element is back on the list of white space preserving elements, so the line breaks are retained:


Listing 7. The preserved results
                
		
<?xml version="1.0" encoding="UTF-8"?>
<faqoutput><info><title>Output DOM object?</title></info>
   <issue>
        Is there an easy way to send a DOM Document to the screen?
        confused@wisconsin.com</issue><solution><answered>Yes, as a matter of fa
ct, there is. All you have to do is transform the Document, but don't add a styl
e sheet:</answered><code>...
            DOMSource source = new DOMSource(myDocObject);
            StreamResult result = new StreamResult(System.out);
            TransformerFactory transFactory = 
                                   TransformerFactory.newInstance();
            Transformer transformer = transFactory.newTransformer();
            transformer.transform(source, result);
          ...</code></solution>
</faqoutput>
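
A common variation, sketched here rather than taken from the original style sheet, strips white space nodes everywhere with the wildcard * and then exempts codesection by name. Under the XSLT 1.0 conflict rules, an explicit name has a higher priority than the wildcard, so the two declarations do not clash. The identity template is an assumption added only to make the sketch runnable on its own.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Strip white space-only text nodes under every element... -->
<xsl:strip-space elements="*"/>
<!-- ...except those whose parent is codesection. -->
<xsl:preserve-space elements="codesection"/>

<!-- Identity template: copy whatever is left in the source tree. -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Run against Listing 1, this should copy the source with the indentation between elements removed everywhere except between the codeline elements. Note that the wildcard also removes the white space nodes directly under faqs, which the listings in this tip leave in place.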

The result could still be a little more readable, however, and managing the white space within the style sheet can help.


Adding white space to the style sheet

When the processor strips the white space nodes from the style sheet, only one element is, by default, on the list of white space preserving elements: xsl:text. The content of an xsl:text element is always preserved, even when it consists of nothing but white space, so it can be useful for adding line breaks or spaces within a document:


Listing 8. Adding line breaks with xsl:text
                
		
...
<xsl:template match="question">
<issue><xsl:value-of select="questiontext"/></issue><xsl:text>

   </xsl:text><solution><xsl:apply-templates select="answer"/></solution>
</xsl:template>
...
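
The literal line break in Listing 8 also carries the blank line and indentation that follow it into the output. When only a single newline is wanted, a character reference inside xsl:text is a common alternative. The sketch below is an illustration, not part of the original style sheet; it assumes text output and simply lists the question titles from the source document, one per line.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption: plain text output, to make the line breaks easy to see. -->
<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:for-each select="faqs/question">
    <xsl:value-of select="questiontitle"/>
    <!-- &#10; is a line feed; inside xsl:text it is never stripped. -->
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

For Listing 1 this prints "Output DOM object?" followed by one newline. The indentation around the instructions inside the template is made up of white space nodes in the style sheet, so it is stripped and never reaches the output.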

You can also use the xml:space attribute to control stripping element by element. Setting xml:space="preserve" on an element keeps the white space nodes inside it and its descendants, and setting xml:space="default" on a descendant turns stripping back on from that point down:


Listing 9. Controlling white space with xml:space
                
		
...
<xsl:template match="/">
<faqoutput><xsl:text>
    </xsl:text><info xml:space="preserve">
       <title xml:space="default">
         <xsl:value-of select="faqs/question/questiontitle"/>
       </title>
    </info>
    <xsl:apply-templates/>
</faqoutput>
</xsl:template>
...

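The result in this case is that the white space nodes around the title element, which are children of info, are preserved, while those within the title element are discarded. Because xml:space is an ordinary attribute on a literal result element, it is also copied into the output.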


Listing 10. Results of controlling white space in a style sheet
                
		
<?xml version="1.0" encoding="UTF-8"?>
<faqoutput>
    <info xml:space="preserve">
       <title xml:space="default">Output DOM object?</title>
    </info>
   <issue>
        Is there an easy way to send a DOM Document to the screen?
        confused@wisconsin.com</issue>

   <solution><answered>Yes, as a matter of fact, there is. All you have to do is
 transform the Document, but don't add a style sheet:</answered><code>...
            DOMSource source = new DOMSource(myDocObject);
            StreamResult result = new StreamResult(System.out);
            TransformerFactory transFactory = 
                                   TransformerFactory.newInstance();
            Transformer transformer = transFactory.newTransformer();
            transformer.transform(source, result);
          ...</code></solution>
</faqoutput>
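
In Listing 10 the xml:space attributes appear in the result, because attributes on literal result elements are copied to the output. If the receiving application should not see them, one alternative is to produce the same line breaks with xsl:text instead. The following is a sketch under that assumption, not the original style sheet; to stay self-contained, it builds only the info block.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Each xsl:text holds one line break plus the indentation wanted
     in the output; all other white space in the template is stripped. -->
<xsl:template match="/">
<faqoutput><xsl:text>
    </xsl:text><info><xsl:text>
       </xsl:text><title>
         <xsl:value-of select="faqs/question/questiontitle"/>
       </title><xsl:text>
    </xsl:text></info><xsl:text>
</xsl:text></faqoutput>
</xsl:template>

</xsl:stylesheet>
```

The result should place info and title on their own lines with the same indentation as in Listing 10, but without any xml:space attributes. The trade-off is a noisier template, so xml:space may still be the better choice when the extra attributes are harmless.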


Summary

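By understanding the rules for white space preservation as they apply to a style sheet and to a source document, you can closely control the appearance of the final document. Keep in mind, though, that these rules remove only text nodes made up entirely of white space; white space is never stripped from within a text node that contains other characters.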

