Tip: Counting with node sets

Using special properties of XSLT node sets to make life easier

Many common XSLT tasks, including simple loops, can be made easier by using special properties of node set operations. This tip discusses using node sets for simple and efficient loop control.


Uche Ogbuji, Principal Consultant, Fourthought, Inc.

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colo. USA. You can contact Mr. Ogbuji at uche@ogbuji.net.



01 May 2002

As with all programming languages, getting to know XSLT's built-in data types and structures is essential to mastering the language. Node sets are the most interesting creatures among XPath's data types (which form the basis of XSLT's data types). In this discussion, I'll show a couple of nonobvious ways that you can use node sets to simplify XSLT processing.

Counted loops in traditional XSLT

XSLT provides a primitive operation for iterating over all the items in a node set: xsl:for-each. If you have used XSLT seriously, you are also familiar with the standard approach for iterating a given number of times, rather than over a given node set. As an illustration, the following XSLT template takes a number and prints that many asterisks:

Listing 1. Template for printing a specified number of asterisks
    <xsl:template name="print-asterisks">
      <xsl:param name="count"/>
      <!-- The termination condition (infinite recursion is no fun) -->
      <xsl:if test="$count">
        <!-- print the asterisk for this iteration -->
        <xsl:text>*</xsl:text>
        <!-- recursive call to print remaining asterisks;
             the spaces around the minus sign keep XPath from
             reading "count-1" as a variable name -->
        <xsl:call-template name="print-asterisks">
          <xsl:with-param name="count" select="$count - 1"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:template>

If you are unfamiliar with this technique, find a good XSLT tutorial or book right away and learn how such recursive templates work. This is one of the most fundamental techniques in XSLT. Even though this tip offers an occasional way around it, you won't get very far in XSLT without being able to rattle off such code in your sleep.

The template takes a single parameter, count, which is the number of asterisks to print. When the template is initially called, you pass in the total number of asterisks to print. I go easy on the error-checking in this script. For example, if you were to pass in a negative or fractional value for count, the count would never reach exactly zero and the result would be infinite recursion. The if test avoids infinite recursion in normal cases by doing nothing when the count falls to zero. Otherwise, a single asterisk is printed and the template is called recursively (with one subtracted from the count) to print the remaining asterisks.
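As a usage sketch, the stylesheet below wraps the recursive template in a complete transform and calls it once. The match="/" entry point, the count of 5, and the stricter $count > 0 guard are assumptions added for illustration rather than part of the listing; the guard is one way to make zero, negative, and fractional counts stop the recursion instead of running away.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Illustrative entry point: works with any source document -->
  <xsl:template match="/">
    <xsl:call-template name="print-asterisks">
      <!-- 5 is an arbitrary example value -->
      <xsl:with-param name="count" select="5"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Recursive counted loop with a defensive guard (an assumption,
       not the original test): stop as soon as count is not positive -->
  <xsl:template name="print-asterisks">
    <xsl:param name="count" select="0"/>
    <xsl:if test="$count > 0">
      <xsl:text>*</xsl:text>
      <xsl:call-template name="print-asterisks">
        <xsl:with-param name="count" select="$count - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Run against any well-formed source document, this should emit five asterisks and nothing else.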

Performance is the biggest problem with this approach. Recursion in its raw form can take up a lot of resources. Most XSLT processors recognize this as an example of tail recursion, which can be optimized into a regular iteration. This helps, but even iteration can be slow if it goes through the machinery of template dispatch each time. Perhaps by now some XSLT processors have even more sophisticated optimizers that eliminate even this overhead, but I wouldn't count on such advancement yet. In general, when each step in the recursion is a trivial operation (such as printing a single asterisk) the overhead can be a problem.


The node set trick

You could use xsl:for-each for such loops if you were able to contrive a node set of exactly the length you want. One way to do this is to take a node set that is at least as long as the length you want, and select the subset with the right length. The following XPath expression does this, if count is the desired number and nodeset is a node set you know to contain at least count nodes:

 $nodeset[position() <= $count]

From the source node set, the predicate creates another node set with exactly count nodes. The main question is where to get nodeset from. Any means of obtaining a node set is fine for the job, as long as the resulting node set is large enough. You could then just grab a bunch of nodes from the source document -- or better yet, all of them -- using the XPath //node(). The problem is that you can't always rely on the length of the source document. The stylesheet itself is probably a better source, since you can vouch for its size when you write the transform, and you can even pad it with dummy nodes if necessary. The expression document("") gets the entire stylesheet as a secondary source document.
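To see the predicate at work, a small diagnostic stylesheet can report how many nodes the stylesheet offers and how many survive the position test. This is only a sketch: the subset size of 3 and the output labels are arbitrary choices, and some processors may need the stylesheet to be loaded from a file or URL for document('') to resolve.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Every node in this stylesheet, gathered once at top level -->
  <xsl:variable name="nodeset" select="document('')//node()"/>

  <xsl:template match="/">
    <!-- Total nodes available for looping (varies by processor
         and by how the stylesheet is formatted) -->
    <xsl:text>available: </xsl:text>
    <xsl:value-of select="count($nodeset)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Size of the subset kept by the position predicate;
         note that the less-than sign is escaped inside the attribute -->
    <xsl:text>selected: </xsl:text>
    <xsl:value-of select="count($nodeset[position() &lt;= 3])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first figure depends on the processor and on comments and whitespace in the stylesheet, which is exactly why a size check is worth having; the second should be 3 as long as at least three nodes are available.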

Using these tricks, you can rewrite the template for printing asterisks to the following:

Listing 2. Using a tailored node set for looping
    <!-- use all nodes in the current stylesheet as a source -->
    <xsl:variable name="nodeset" select="document('')//node()"/>

    <xsl:template name="print-asterisks">
      <xsl:param name="count"/>
      <xsl:if test="$count > count($nodeset)">
        <!-- Basic safety measure: better to crash and burn
             than to fail in a non-obvious way -->
        <xsl:message terminate="yes">Not enough nodes for iteration</xsl:message>
      </xsl:if>
      <!-- Execute the loop, using the node set we want;
           the less-than sign must be escaped inside an attribute -->
      <xsl:for-each select="$nodeset[position() &lt;= $count]">
        <xsl:text>*</xsl:text>
      </xsl:for-each>
    </xsl:template>

The node set of all nodes in the stylesheet is constructed once, at top level, and can be reused for any such loop in the transform. The template first checks that there are sufficient nodes for the iteration, and aborts all processing if there aren't. While you may choose more elegant error handling than this, do not omit the check: without it, you may request a certain number of iterations and end up with fewer, without any warning. Such errors can be very hard to spot.

Another potential disadvantage is that for some XSLT implementations, the document("")//node() operation can be expensive in terms of time and space. The stylesheet could be reparsed, and then plumbed for every node. This is a one-time penalty for the stylesheet execution, so if you use the trick several times, you'll probably still get an appreciable speed boost. If you only need iterations of smaller length, you could use the variation document("")/*/node(), which restricts the mining of nodes to the top level of the stylesheet (the children of its document element). There are a handful of other tricks along these lines that you can use to suit your purposes. For instance, you can decrease the chances of running out of nodes by creating a node set from both the stylesheet and the source document: //node()|document("")//node().
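One more variation follows from the earlier suggestion of padding the stylesheet with dummy nodes: keep a block of filler elements at the top level and loop over just those, so the available length is known exactly. In this sketch the pad prefix, its urn:example:padding namespace, the pad:nodes and pad:n element names, and the count of 7 are all hypothetical names and values chosen for illustration.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:pad="urn:example:padding"
                exclude-result-prefixes="pad">
  <xsl:output method="text"/>

  <!-- Hypothetical padding block: ten dummy elements to iterate over.
       Top-level elements in a non-XSLT namespace are ignored by the
       processor but remain visible through document('') -->
  <pad:nodes>
    <pad:n/><pad:n/><pad:n/><pad:n/><pad:n/>
    <pad:n/><pad:n/><pad:n/><pad:n/><pad:n/>
  </pad:nodes>

  <!-- Only the padding elements, so the set has a known size of 10 -->
  <xsl:variable name="nodeset" select="document('')/*/pad:nodes/pad:n"/>

  <xsl:template match="/">
    <xsl:call-template name="print-asterisks">
      <!-- 7 is an arbitrary example value -->
      <xsl:with-param name="count" select="7"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="print-asterisks">
    <xsl:param name="count"/>
    <!-- Abort loudly if more iterations are requested than nodes exist -->
    <xsl:if test="$count > count($nodeset)">
      <xsl:message terminate="yes">Not enough nodes for iteration</xsl:message>
    </xsl:if>
    <xsl:for-each select="$nodeset[position() &lt;= $count]">
      <xsl:text>*</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With ten padding elements and a count of 7, the transform should print seven asterisks; a count above 10 trips the safety check instead. Selecting only the padding elements also avoids walking every node in the stylesheet.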


Conclusion

Someone lacking charity might call this technique a hack, but as long as you understand the standard iteration tricks for XSLT, you can use this short-cut when you really need it. It looks as if this trick will become redundant with XPath and XSLT 2.0, which have far more sophisticated looping primitives built in, but it could be years before these are finalized and compliant implementations emerge.
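For comparison, here is a sketch of how the same loop might look with the XPath 2.0 range expression, assuming an XSLT 2.0 processor is available. The entry-point template and the count of 5 are illustrative, and count is assumed to be passed as an integer.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Illustrative entry point -->
  <xsl:template match="/">
    <xsl:call-template name="print-asterisks">
      <xsl:with-param name="count" select="5"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="print-asterisks">
    <xsl:param name="count" select="0"/>
    <!-- "1 to $count" yields the integers 1..count; it is an empty
         sequence when count is less than 1, so the loop body is skipped -->
    <xsl:for-each select="1 to $count">
      <xsl:text>*</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

No borrowed node set and no size check are needed in this form, which is why the node set trick matters mainly for XSLT 1.0 processors.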
