In this article you'll use some of the new features of XSLT 2.0 and XPath 2.0. XSLT 2.0 has a number of functions and elements that allow you to specify a collation. A collation is the heart of any sorting algorithm. A collation function compares two items and returns one of three values. If the first item appears before the second, the function returns a value less than zero. If the two items are equal, the function returns zero. Finally, as you might expect, if the first item appears after the second, the return value is greater than zero.
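As a quick illustration of that three-way contract, here is a minimal Java sketch. It uses the JDK's ordinary string ordering rather than a custom collation, and the names `ThreeWayCompareDemo` and `sign` are made up for this example.

```java
import java.util.Comparator;

// Hypothetical demo class; not part of the article's sample code.
class ThreeWayCompareDemo {
    // The JDK's character-by-character string ordering, standing in for a collation.
    static final Comparator<String> NATURAL = Comparator.naturalOrder();

    // Reduce the comparator's result to -1, 0 or 1 so that only the sign matters.
    static int sign(String first, String second) {
        return Integer.signum(NATURAL.compare(first, second));
    }

    public static void main(String[] args) {
        System.out.println(sign("apple", "banana")); // -1: first item sorts before the second
        System.out.println(sign("apple", "apple"));  //  0: the items are equal
        System.out.println(sign("banana", "apple")); //  1: first item sorts after the second
    }
}
```

A custom collation keeps this contract and changes only the rules that decide which sign comes back.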
The examples in this article use the Java-based Saxon XSLT 2.0 processor. Saxon implements the XSLT 2.0 specification (its author, Michael Kay, was the editor of the XSLT 2.0 spec), including custom collations. To use a custom collation with Saxon, you specify the name of the Java class that implements the collation function.
We'll cover three examples that:
- Sort a list of Spanish words
- Compare German words
- Sort a list of bands and musicians, and ignore the text "The" at the start of the band name
More on Spanish and German collations
Your author is in no way a speaker of Spanish or German, so please pardon any incorrect statements about the languages themselves. The point here is to illustrate how to create extensions that implement custom collations and then use those extensions to sort and compare text in your stylesheets.
The traditional Spanish collation, the one you'll implement here, treats ch, ll and ñ as separate letters that sort after c, l and n respectively. However, much of the Spanish-speaking world now uses the modern Spanish collation, defined by the Association of Spanish Language Academies (Asociación de Academias de la Lengua Española). The modern Spanish collation doesn't treat ch or ll as special characters; they sort as they would in English. The letter ñ still sorts after the letter n.
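For comparison, the modern ordering can be sketched as a rule string for Java's RuleBasedCollator (the class the article uses for its own collators). This is only one reading of the description above, not a rule set taken from the article: the class name `ModernSpanishSketch` and the rule string are assumptions, and the only special entry is ñ after n.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical sketch of the modern Spanish ordering described above.
class ModernSpanishSketch {
    // Assumption: only n-tilde gets its own slot (after n). ch and ll are
    // left out of the rules, so they compare as ordinary two-letter sequences.
    static final String MODERN_RULES =
        "< a,A < b,B < c,C < d,D < e,E < f,F < g,G < h,H < i,I "
      + "< j,J < k,K < l,L < m,M < n,N < \u00F1,\u00D1 "
      + "< o,O < p,P < q,Q < r,R < s,S < t,T "
      + "< u,U < v,V < w,W < x,X < y,Y < z,Z";

    // Returns -1, 0 or 1 for the two words under the modern rules.
    static int compareModern(String first, String second) {
        try {
            return Integer.signum(new RuleBasedCollator(MODERN_RULES).compare(first, second));
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compareModern("llaves", "luna"));      // ll is no longer a separate letter
        System.out.println(compareModern("chihuahua", "ciudad")); // ch is no longer a separate letter
        System.out.println(compareModern("nube", "\u00F1u"));     // n-tilde still sorts after n
    }
}
```

Under these rules each call should report that the first word sorts before the second.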
To sort German, you can choose from three collations: DIN-1, DIN-2 and Austrian. (The DIN standards are defined by the standards body Deutsches Institut für Normung.) The collation used varies from one country to the next. Typically DIN-1 is used to sort words, although in Switzerland it's also used to sort names. DIN-2, the collation algorithm you'll implement here, is used to sort names in Germany. Austria uses the Austrian collation, although it seems to be disappearing in favor of the DIN-2 rules. The main complication for the DIN-2 algorithm is that ä is equal to ae, ö is equal to oe, ß is equal to ss and ü is equal to ue. The code has to realize that two characters in one word are equivalent to one character in another word.
Sorting a list of Spanish words
The first custom collator is one to sort Spanish words. The Spanish alphabet contains 30 letters; in addition to the 26 basic letters of Western European languages, ch (che), ll (elle), ñ (eñe) and rr (erre) are considered separate letters as well. The traditional Spanish collation sorts words beginning with ch after anything starting with cz, words beginning with ll after anything starting with lz and words starting with ñ after any word starting with n. The letter rr doesn't sort in any special way. Our Spanish custom collation implements these rules.
Here is the list of Spanish words:
Listing 1. A list of Spanish words
<?xml version="1.0"?>
<!-- spanish-words.xml -->
<wordlist>
<word>campo</word>
<word>luna</word>
<word>ciudad</word>
<word>llaves</word>
<word>chihuahua</word>
<word>arroz</word>
<word>limonada</word>
</wordlist>
Defining a custom collation algorithm seems like a daunting task. Fortunately, Java defines a class named RuleBasedCollator. You'll define a rule string that indicates the order in which letters should be sorted and let Java do the rest of the work. To create a Java class that extends RuleBasedCollator, the rule string and the constructor function are all you have to write. Here's what the code looks like, including the rule string:
Listing 2. Source code for the Spanish collator
package com.oreilly.xslt;
import java.text.ParseException;
import java.text.RuleBasedCollator;
public class SpanishCollation extends RuleBasedCollator
{
public SpanishCollation() throws ParseException
{
super(traditionalSpanishRules);
}
private static String smallnTilde = new String("\u00F1");
private static String capitalNTilde = new String("\u00D1");
private static String traditionalSpanishRules =
("< a,A < b,B < c,C < ch, cH, Ch, CH " +
"< d,D < e,E < f,F < g,G < h,H < i,I " +
"< j,J < k,K < l,L < ll, lL, Ll, LL " +
"< m,M < n,N " +
"< " + smallnTilde + "," + capitalNTilde + " " +
"< o,O < p,P < q,Q < r,R < s,S < t,T " +
"< u,U < v,V < w,W < x,X < y,Y < z,Z");
}
You use the rule string to define the order in which characters are sorted. In
Listing 2, notice the many sets of lowercase and
uppercase characters. The less-than signs between them indicate that a and A appear before b and B. The che and elle are defined along with the character groups, even though they are two characters instead of one. When the Java runtime sorts information, the rule here tells it to process ll as a separate letter between l and m.
The two other special rules define that all uppercase and lowercase combinations of ch appear between c and d and that ñ and Ñ appear between n and o.
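To see the rules at work outside a stylesheet, the following sketch sorts the words from Listing 1 directly in Java. It builds a RuleBasedCollator from the same rule string as Listing 2 rather than loading com.oreilly.xslt.SpanishCollation; the class name `SpanishSortDemo` and its helper method are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it reuses the rule string from Listing 2.
class SpanishSortDemo {
    static final String TRADITIONAL_RULES =
        "< a,A < b,B < c,C < ch, cH, Ch, CH "
      + "< d,D < e,E < f,F < g,G < h,H < i,I "
      + "< j,J < k,K < l,L < ll, lL, Ll, LL "
      + "< m,M < n,N < \u00F1,\u00D1 "
      + "< o,O < p,P < q,Q < r,R < s,S < t,T "
      + "< u,U < v,V < w,W < x,X < y,Y < z,Z";

    // Returns a sorted copy; the caller's list is left untouched.
    static List<String> sortTraditional(List<String> words) {
        try {
            List<String> copy = new ArrayList<>(words);
            copy.sort(new RuleBasedCollator(TRADITIONAL_RULES));
            return copy;
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "campo", "luna", "ciudad", "llaves", "chihuahua", "arroz", "limonada");
        System.out.println(sortTraditional(words));
    }
}
```

With these rules, ciudad should come before chihuahua and luna before llaves, because ch and ll are treated as letters in their own right.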
Now that you've defined the rules to sort Spanish words, it's time to use that
code in your XSLT 2.0 stylesheet. XSLT 2.0 defines a number of places you can ask
for a custom collation in your stylesheet. For example, the <xsl:sort> element now has a collation attribute. What the XSLT 2.0 spec does not define
is how to use the contents of that attribute. The Saxon processor requires the attribute to have this format:
collation="http://saxon.sf.net/collation?class=com.oreilly.xslt.SpanishCollation;"
Saxon requires that the name of a custom collation class have the format http://saxon.sf.net/collation? followed by the keyword class= and the fully-qualified name of the class. The class is loaded at runtime, so it must be accessible from the Java classpath. If another XSLT 2.0 processor supports custom collations (I'm not aware of any, as of October 2007), the format of the collation attribute will be different.
Here's the stylesheet that invokes the com.oreilly.xslt.SpanishCollation class:
Listing 3. Invoking a collation class with an XSLT 2.0 stylesheet
<?xml version="1.0"?>
<!-- custom-collation-spanish.xsl -->
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="html"/>
<xsl:variable name="words" as="xs:string*"
select="wordlist/word"/>
<xsl:variable name="normally_sorted_words" as="xs:string*">
<xsl:perform-sort select="$words">
<xsl:sort select="."/>
</xsl:perform-sort>
</xsl:variable>
<xsl:variable name="usefully_sorted_words" as="xs:string*">
<xsl:perform-sort select="$words">
<xsl:sort select="."
collation="http://saxon.sf.net/collation?class=com.oreilly.xslt.SpanishCollation;"/>
</xsl:perform-sort>
</xsl:variable>
<xsl:template match="/">
<html>
<head>
<title>Sorting with a custom collation</title>
</head>
<body style="font-family: sans-serif; font-size: 12pt;">
<h1>Sorting with a custom collation</h1>
<p>Here is a table that uses a <i>custom collation</i>
to sort words according to the traditional rules of
Spanish.</p>
<table cellpadding="5" width="50%"
style="font-weight: bold;">
<tr style="font-size: 120%; font-style: italic;
text-align: center;">
<td>Original words</td>
<td>Normally sorted words</td>
<td>Spanish sorted words</td>
</tr>
<xsl:for-each select="1 to count($words)">
<tr style="background: {if (. mod 2 = 1)
then 'gray' else 'white'};
color: {if (. mod 2 = 1)
then 'white' else 'black'};">
<td style="border: solid white 6px;">
<xsl:value-of
select="., '. ', subsequence($words, ., 1)"
separator=""/>
</td>
<td style="border: solid white 6px;">
<xsl:value-of
select="index-of($words,
subsequence($normally_sorted_words, ., 1))"/>
<xsl:text>. </xsl:text>
<xsl:value-of
select="subsequence($normally_sorted_words, ., 1)"/>
</td>
<td style="border: solid white 6px;">
<xsl:value-of
select="index-of($words,
subsequence($usefully_sorted_words, ., 1))"/>
<xsl:text>. </xsl:text>
<xsl:value-of
select="subsequence($usefully_sorted_words, ., 1)"/>
</td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
You invoke the stylesheet engine with this command:
java net.sf.saxon.Transform spanish-words.xml custom-collation-spanish.xsl
This command writes the results to standard output. If you'd like to capture those results in a file, use the -o option. The command java net.sf.saxon.Transform -o results.html spanish-words.xml custom-collation-spanish.xsl writes the results to the file results.html.
The stylesheet creates three sequences of strings. The first is the list
of words from the XML input file. The second sequence is the list of words
sorted with the default sorting algorithm. The third uses the custom
Spanish collation function. The <xsl:for-each> element writes the word lists to an HTML table.
The HTML document looks like Figure 1:
Figure 1. The list of words, sorted in two different ways

Notice several XSLT 2.0 features here. First of all, you use the <xsl:perform-sort> element to sort the items in the sequence variables. That tells the XSLT 2.0 processor to process the <xsl:sort> elements against the values in the sequence. You use <xsl:sort> to invoke the custom collation class.
The <xsl:for-each> element iterates from 1 to
the number of words in the word list. For the rows of the table, every other row
has a gray background with white text. To generate this code, you use attribute
value templates (the code inside the curly braces { }) with the current value of the context item. Each table row is defined like this:
<tr style="background: {if (. mod 2 = 1)
then 'gray' else 'white'};
color: {if (. mod 2 = 1)
then 'white' else 'black'};">
If dividing the context item by 2 leaves a remainder of 1, the stylesheet generates the CSS code background: gray; color: white;. Otherwise, the cells in the current row have a white background and black text. The mod operator is extremely useful for cycling through a set of values.
Finally, notice that the first column numbers the words in the order they appear in the XML source document. For the other two columns, each word has the same number it had in the first column, regardless of where it appears. In the example, chihuahua is the fifth word in the original word list, so it always has the number 5 beside it. The word chihuahua appears in the third row of column 2 and the fourth row of column 3, but it is displayed as word number 5. Here's how to do that:
<td style="border: solid white 6px;">
<xsl:value-of
select="index-of($words,
subsequence($normally_sorted_words, ., 1))"/>
<xsl:text>. </xsl:text>
<xsl:value-of
select="subsequence($normally_sorted_words, ., 1)"/>
</td>
Each cell contains three things: a number, a period and the value of the current word. Generating the number is the difficult part. For columns 2 and 3 (the code here handles column 2), you have to find the index of the word in the original word list (stored in the variable $words). To do that, use the index-of() and subsequence() functions. You use subsequence() to retrieve the current word from the sorted sequence. The second argument to subsequence() is the starting position; the dot represents the context item. The third argument is the number of items to return. Given that word, index-of() returns its position in the original sequence.
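The same lookup can be sketched in Java, which may help if XPath sequences are new to you. The class and method names below are hypothetical. One difference to keep in mind: `List.indexOf()` returns only the first match, while the XPath index-of() function returns every matching position when a word occurs more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class mirroring the index-of()/subsequence() lookup.
class OriginalPositionDemo {
    // For each word in sorted order, report its 1-based position in the
    // original list, much as index-of($words, subsequence($sorted, ., 1)) does.
    static List<Integer> originalPositions(List<String> words, List<String> sorted) {
        List<Integer> positions = new ArrayList<>();
        for (String word : sorted) {
            positions.add(words.indexOf(word) + 1); // XPath positions start at 1
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "campo", "luna", "ciudad", "llaves", "chihuahua", "arroz", "limonada");
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted); // default ordering, like the second table column
        System.out.println(originalPositions(words, sorted)); // [6, 1, 5, 3, 7, 4, 2]
    }
}
```

The third entry is 5 because chihuahua, the third word in default order, was the fifth word in the source document.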
Comparing strings with a German collation
The next example uses a custom collator to compare German words. The complication here is that four German characters can have more than one representation (note the upper- and lowercase letters):
| Single character | Two-character equivalent |
|---|---|
| ä (a with an umlaut) or Ä (A with an umlaut) | ae or AE |
| ö (o with an umlaut) or Ö (O with an umlaut) | oe or OE |
| ß (sharp s) | ss |
| ü (u with an umlaut) or Ü (U with an umlaut) | ue or UE |
Sometimes, you want to consider these characters as equal. In other words, the
word Strasse (street) is identical to the word Straße. Obviously a character-by-character comparison using
the standard collation says these are not equal, so you'll need to use a custom collation.
As with the Spanish collation, you simply define rules that say how to compare characters:
Listing 4. Source code for the German custom collation
package com.oreilly.xslt;
import java.text.ParseException;
import java.text.RuleBasedCollator;
public class GermanCollation extends RuleBasedCollator
{
public GermanCollation() throws ParseException
{
super(traditionalGermanRules);
}
private static String sharpS = new String("\u00DF");
private static String uppercaseUmlautA = new String("\u00C4");
private static String lowercaseUmlautA = new String("\u00E4");
private static String uppercaseUmlautO = new String("\u00D6");
private static String lowercaseUmlautO = new String("\u00F6");
private static String uppercaseUmlautU = new String("\u00DC");
private static String lowercaseUmlautU = new String("\u00FC");
private static String traditionalGermanRules =
("< a,A " +
"<" + lowercaseUmlautA + "=ae " +
"<" + uppercaseUmlautA + "=AE " +
"< b,B < c,C < d,D < e,E < f,F " +
"< g,G < h,H < i,I < j,J < k,K " +
"< l,L < m,M < n,N < o,O " +
"<" + lowercaseUmlautO + "=oe " +
"<" + uppercaseUmlautO + "=OE " +
"< p,P < q,Q < r,R < s,S " +
"< ss=" + sharpS +
"< t,T < u,U " +
"<" + lowercaseUmlautU + "=ue " +
"<" + uppercaseUmlautU + "=UE " +
"< v,V < w,W < x,X < y,Y < z,Z");
}
The code in Listing 4 uses the equals sign to indicate that certain characters and character groups are equivalent. (The Spanish collator used the less-than sign.) The string of rules includes items such as ... s,S < ss=ß < t,T .... This tells the collator that the two characters ss are equal to the single character ß. (The strings that represent each of the special characters make the code easier to read.)
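You can try the equivalence rules without a stylesheet. The sketch below builds a RuleBasedCollator from the rule string in Listing 4 and compares word pairs directly; the class name `GermanCompareDemo` and the helper `equalUnderRules` are hypothetical, and the special characters are written as Unicode escapes so that the source file's encoding doesn't matter.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; it reuses the rule string from Listing 4.
class GermanCompareDemo {
    static final String GERMAN_RULES =
        "< a,A < \u00E4=ae < \u00C4=AE "
      + "< b,B < c,C < d,D < e,E < f,F "
      + "< g,G < h,H < i,I < j,J < k,K "
      + "< l,L < m,M < n,N < o,O "
      + "< \u00F6=oe < \u00D6=OE "
      + "< p,P < q,Q < r,R < s,S "
      + "< ss=\u00DF "
      + "< t,T < u,U "
      + "< \u00FC=ue < \u00DC=UE "
      + "< v,V < w,W < x,X < y,Y < z,Z";

    // True when the collator built from the rules reports the words as equal.
    static boolean equalUnderRules(String first, String second) {
        try {
            return new RuleBasedCollator(GERMAN_RULES).compare(first, second) == 0;
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(equalUnderRules("Berlin", "Stuttgart"));
        System.out.println(equalUnderRules("Stra\u00DFe", "Strasse"));
        System.out.println(equalUnderRules("B\u00F6blingen", "Boeblingen"));
        System.out.println(equalUnderRules("M\u00FCnchen", "Muenchen"));
    }
}
```

If the rules behave as described, only the first pair should be reported as different.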
The stylesheet looks like this:
Listing 5. A stylesheet that uses a German collation function
<?xml version="1.0"?>
<!-- custom-collation-german.xsl -->
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="html"/>
<xsl:variable name="wordgroup1" as="xs:string*"
select="wordlist/wordgroup/word[1]"/>
<xsl:variable name="wordgroup2" as="xs:string*"
select="wordlist/wordgroup/word[2]"/>
<xsl:template match="/">
<html>
<head>
<title>Comparing words with a custom collation</title>
</head>
<body style="font-family: sans-serif; font-size: 12pt;">
<h1>Comparing words with a custom collation</h1>
<p>This table illustrates what happens when you use
a <i>custom collation</i> to compare German words:</p>
<table cellpadding="5" width="50%"
style="font-weight: bold; text-align: center;">
<tr style="font-size: 120%; font-style: italic;
vertical-align: bottom;">
<td>First word</td>
<td>Second word</td>
<td>Compared normally</td>
<td>Compared with <br/>German (DIN-2) rules</td>
</tr>
<xsl:for-each select="1 to count($wordgroup1)">
<xsl:variable name="word1"
select="subsequence($wordgroup1, ., 1)"/>
<xsl:variable name="word2"
select="subsequence($wordgroup2, ., 1)"/>
<tr style="background: {if (. mod 2 = 1)
then 'gray' else 'white'};
color: {if (. mod 2 = 1)
then 'white' else 'black'};">
<td style="border: solid white 6px;">
<xsl:value-of select="$word1"/>
</td>
<td style="border: solid white 6px;">
<xsl:value-of select="$word2"/>
</td>
<td style="border: solid white 6px;">
<xsl:value-of
select="if (compare($word1, $word2) = 0)
then 'Equal'
else 'Not equal'"/>
</td>
<td style="border: solid white 6px;">
<xsl:value-of
select="if (compare($word1, $word2,
'http://saxon.sf.net/collation?class=com.oreilly.xslt.GermanCollation;')
= 0)
then 'Equal'
else 'Not equal'"/>
</td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
This stylesheet is very similar to the Spanish collation example. The main difference here is that you're using the compare() function directly. Here is the list of German words:
Listing 6. A list of German words to compare
<?xml version="1.0"?>
<!-- german-words.xml -->
<wordlist>
<wordgroup>
<word>Berlin</word>
<word>Stuttgart</word>
</wordgroup>
<wordgroup>
<word>Straße</word>
<word>Strasse</word>
</wordgroup>
<wordgroup>
<word>Böblingen</word>
<word>Boeblingen</word>
</wordgroup>
<wordgroup>
<word>München</word>
<word>Muenchen</word>
</wordgroup>
</wordlist>
Each <wordgroup> contains two words. The two words in the first group are different, while the pairs in the other groups are equal under the German collation. The stylesheet creates two sequences. The first sequence ($wordgroup1) contains all of the first <word> elements, while the second sequence ($wordgroup2) contains all of the second words. Within the <xsl:for-each> element, the stylesheet iterates through the sequences of words. For each row in the table, the two words are stored in the variables $word1 and $word2.
Here's the command to invoke the stylesheet:
java net.sf.saxon.Transform german-words.xml custom-collation-german.xsl
Figure 2 shows the results:
Figure 2. Comparing words with German rules

The last example will sort a list of bands and musicians. You need a custom collator here to sort items while ignoring the text "The ". In other words, you want The Beatles to sort as if it were the string Beatles. You can't do this with a set of rules because you can't include a space as part of the characters you want to sort in a special way. You can't just remove the characters The for a comparison because they might be the start of a band name such as Them or They Might Be Giants. Another complication is the band The The. You need to sort this item as if the first "The " weren't there.
To further complicate the requirements, you will ignore case when you make the comparisons between artist names. If a band were named the e.e. cummingses, you want to sort them with the artists who use a capital E in their name.
You must also make sure to ignore the characters only if they appear at the start of the string. In other words, you should not process Toad the Wet Sprocket by removing the characters "the ". Although this error case only happens if you mistakenly assign any artistic merit to Toad the Wet Sprocket, your code should be robust enough to ignore the characters "The " only if they occur at the start of the string.
The string you want to ignore is the four characters "The ". You can't use a RuleBasedCollator here because you want to ignore those characters, not define a place for them in a sorting order. The good news is that you can use the Java Comparator interface. Even better, you only have to implement one method, compare(). The custom collation extension looks like this:
Listing 7. Source code for the custom collator
package com.ibm.dw;
import java.util.Comparator;
public class TheTheCollation implements Comparator<String>
{
public int compare(String stringOne, String stringTwo)
{
return stringOne
.replaceFirst("^(T|t)(H|h)(E|e) ", "")
.compareToIgnoreCase(stringTwo.replaceFirst("^(T|t)(H|h)(E|e) ", ""));
}
}
The code is very simple; you simply implement the compare() method. Given two strings, you remove "The " if needed, then call the existing Java comparison function. Because you want to ignore case, use compareToIgnoreCase() instead of the more basic compareTo().
The String.replaceFirst() function removes the characters "The " at the start of the string. (As you can see, the class is named in honor of The The.) The important thing is that the first argument to the replaceFirst() function is a regular expression. The regular expression "^(T|t)(H|h)(E|e) " only matches "The " if it occurs at the start of the string; the caret anchors the regular expression to the start of the string. The parenthetical groups match any combination of uppercase and lowercase letters. This is an elegant way to get around using a combination of functions such as String.startsWith() and String.substring().
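Here is a small sketch of the same idea used outside Saxon. It is a variation on Listing 7, not a copy: it assumes that a precompiled, case-insensitive pattern behaves the same as the (T|t)(H|h)(E|e) groups, and the class name `LeadingTheDemo` and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; a variation on the comparator in Listing 7.
class LeadingTheDemo {
    // Assumption: CASE_INSENSITIVE stands in for the (T|t)(H|h)(E|e) groups.
    static final Pattern LEADING_THE = Pattern.compile("^the ", Pattern.CASE_INSENSITIVE);

    // Strip one leading "The " (any case); everything else is left alone.
    static String sortKey(String name) {
        return LEADING_THE.matcher(name).replaceFirst("");
    }

    static final Comparator<String> IGNORE_LEADING_THE =
        (first, second) -> sortKey(first).compareToIgnoreCase(sortKey(second));

    // Returns a sorted copy of the artist list.
    static List<String> sortArtists(List<String> artists) {
        List<String> copy = new ArrayList<>(artists);
        copy.sort(IGNORE_LEADING_THE);
        return copy;
    }

    public static void main(String[] args) {
        List<String> artists = Arrays.asList(
            "Them", "The Beatles", "Toad the Wet Sprocket", "Eminem", "The The");
        // Prints [The Beatles, Eminem, The The, Them, Toad the Wet Sprocket]
        System.out.println(sortArtists(artists));
    }
}
```

Only the sort key changes; the names themselves are never modified, so The Beatles still appears with its article in the output.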
Here is the list of bands and musicians:
Listing 8. A list of bands and musicians
<?xml version="1.0"?>
<!-- artists.xml -->
<artistlist>
<artist>The Clash</artist>
<artist>They Might Be Giants</artist>
<artist>Eminem</artist>
<artist>The Whigs</artist>
<artist>X</artist>
<artist>Talking Heads</artist>
<artist>The Rutles</artist>
<artist>Them</artist>
<artist>The Yardbirds</artist>
<artist>the e.e. cummingses</artist>
<artist>Romeo Void</artist>
<artist>The B-52's</artist>
<artist>B. B. King</artist>
<artist>The The</artist>
<artist>Beastie Boys</artist>
<artist>The Beatles</artist>
</artistlist>
As you can see, eight of the artists are bands whose names begin with "The."
When you look for something by The Beatles in a music store, you don't look under
"T" to find their music. You need a custom collation function that says The Beatles should appear before Eminem.
This stylesheet is virtually identical to the first stylesheet; the main difference here is that you use a different custom collator. Here's the fragment of the stylesheet that invokes the Java class:
<xsl:variable name="usefully_sorted_artists" as="xs:string*">
<xsl:perform-sort select="$artists">
<xsl:sort select="."
collation="http://saxon.sf.net/collation?class=com.ibm.dw.TheTheCollation;"/>
</xsl:perform-sort>
</xsl:variable>
You invoke the stylesheet engine with this command:
java net.sf.saxon.Transform artists.xml custom-collation-thethe.xsl
The results look like Figure 3:
Figure 3. The list of artists, sorted in two different ways

A final enhancement: Adding mouse effects
Writing the original position of each term in each column is a useful way to illustrate the differences between the collations. As a final exercise, look at how to enhance the stylesheet so that moving the mouse over an artist's name in one column highlights that artist's name in the other two columns. The three steps to add this to the generated HTML page are:
- Give every table cell an ID based on a naming convention.
- Define the id, onmouseover and onmouseout attributes for each table cell.
- Define JavaScript functions that highlight the appropriate cells when the mouse moves into a table cell and unhighlight the same cells when the mouse moves out.
You need the IDs of the table cells to find the appropriate cells. Those IDs
will be in the format col1-1, col2-1 and col3-1. The cell with the ID
col1-1 is the first term in the first column. The cells with the IDs col2-1 and col3-1 are the cells in columns 2 and 3 that have the same text. That means the IDs for every cell in columns 2 and 3 end with the same number you generate for the cell itself. In this example, The Clash is the first artist in the XML file, so every occurrence of The Clash has the number 1 beside it. The IDs of the cells containing The Clash are col1-1, col2-1 and col3-1.
Now that you know how the IDs work, you'll code the onmouseover and onmouseout attributes to use the last part of the ID. You'll create two JavaScript functions, highlightCells() and unhighlightCells(). Given the number 3, highlightCells() will highlight the three elements with IDs of col1-3, col2-3 and col3-3. A table cell in the generated HTML document looks like this:
<td id="col1-1"
onmouseover="highlightCells('1');"
onmouseout="unhighlightCells('1');"
style="border: solid white 6px;">1. The Clash</td>
Finally, you need the JavaScript functions. They look like this:
Listing 9. JavaScript functions for highlighting table cells
<title>Sorting with a custom collation</title><script language="JavaScript">
<!--
function highlightCells(rowNum)
{
el = document.getElementById('col1-' + rowNum);
el.style.border='solid #E15119 6px';
el = document.getElementById('col2-' + rowNum);
el.style.border='solid #E15119 6px';
el = document.getElementById('col3-' + rowNum);
el.style.border='solid #E15119 6px';
}
function unhighlightCells(rowNum)
{
el = document.getElementById('col1-' + rowNum);
el.style.border='solid white 6px';
el = document.getElementById('col2-' + rowNum);
el.style.border='solid white 6px';
el = document.getElementById('col3-' + rowNum);
el.style.border='solid white 6px';
}
--></script></head>
The JavaScript code uses the getElementById()
function to find the elements with the given ID. To highlight the cells, it
sets the border color of the cells to the official developerWorks shade of
orange. To unhighlight the cells, it resets the border color to white.
Figure 4 shows how the HTML looks when you place the mouse over any cell that contains The Clash:
Figure 4. JavaScript effects to highlight artists across columns

Here's the complete stylesheet:
Listing 10. An XSLT stylesheet that generates mouse effects
<?xml version="1.0"?>
<!-- custom-collation-advanced.xsl -->
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="html"/>
<xsl:variable name="artists" as="xs:string*"
select="artistlist/artist"/>
<xsl:variable name="normally_sorted_artists" as="xs:string*">
<xsl:perform-sort select="$artists">
<xsl:sort select="."/>
</xsl:perform-sort>
</xsl:variable>
<xsl:variable name="usefully_sorted_artists" as="xs:string*">
<xsl:perform-sort select="$artists">
<xsl:sort select="."
collation="http://saxon.sf.net/collation?class=com.ibm.dw.TheTheCollation;"/>
</xsl:perform-sort>
</xsl:variable>
<xsl:template match="/">
<html>
<head>
<title>Sorting with a custom collation</title>
<script language="JavaScript"><xsl:comment>
function highlightCells(rowNum)
{
el = document.getElementById('col1-' + rowNum);
el.style.border='solid #E15119 6px';
el = document.getElementById('col2-' + rowNum);
el.style.border='solid #E15119 6px';
el = document.getElementById('col3-' + rowNum);
el.style.border='solid #E15119 6px';
}
function unhighlightCells(rowNum)
{
el = document.getElementById('col1-' + rowNum);
el.style.border='solid white 6px';
el = document.getElementById('col2-' + rowNum);
el.style.border='solid white 6px';
el = document.getElementById('col3-' + rowNum);
el.style.border='solid white 6px';
}
</xsl:comment></script>
</head>
<body style="font-family: sans-serif; font-size: 12pt;">
<h1>Sorting with a custom collation</h1>
<p>Here is a table that uses a <i>custom collation</i>
to sort data ignoring the characters
<span style="font-family: monospace;">The </span>
at the start of the data:</p>
<table cellpadding="5" width="50%"
style="font-weight: bold;">
<tr style="font-size: 120%; font-style: italic;
text-align: center;">
<td>Original data</td>
<td>Normally-sorted data</td>
<td>Usefully-sorted data</td>
</tr>
<xsl:for-each select="1 to count($artists)">
<tr style="background: {if (. mod 2 = 1)
then 'gray' else 'white'};
color: {if (. mod 2 = 1)
then 'white' else 'black'};">
<xsl:variable name="col2Index"
select="index-of($artists,
subsequence($normally_sorted_artists, ., 1))"/>
<xsl:variable name="col3Index"
select="index-of($artists,
subsequence($usefully_sorted_artists, ., 1))"/>
<td id="col1-{.}"
onmouseover="highlightCells('{.}');" onmouseout="unhighlightCells('{.}');"
style="border: solid white 6px;">
<xsl:value-of
select="., '. ', subsequence($artists, ., 1)"
separator=""/>
</td>
<td id="col2-{$col2Index}" onmouseover="highlightCells('{$col2Index}');" onmouseout="unhighlightCells('{$col2Index}');"
style="border: solid white 6px;">
<xsl:value-of select="$col2Index"/>
<xsl:text>. </xsl:text>
<xsl:value-of
select="subsequence($normally_sorted_artists, ., 1)"/>
</td>
<td id="col3-{$col3Index}" onmouseover="highlightCells('{$col3Index}');" onmouseout="unhighlightCells('{$col3Index}');"
style="border: solid white 6px;">
<xsl:value-of select="$col3Index"/>
<xsl:text>. </xsl:text>
<xsl:value-of
select="subsequence($usefully_sorted_artists, ., 1)"/>
</td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
You invoke the stylesheet engine with this command:
java net.sf.saxon.Transform artists.xml custom-collation-advanced.xsl
Notice that the JavaScript code is generated with an <xsl:comment> element. The stylesheet outputs the <script> element, which is followed immediately by the start of the comment. The end of the comment is followed immediately by the end of the <script> element. In the past, some browsers had intermittent errors when the end comment marker (-->) was interpreted as the decrement operator. Structuring the stylesheet this way avoids this error.
Java has very powerful classes that make it easy to change the way sorting works. In this article you looked at three Java classes that support Spanish, German and a domain-specific type of sorting. The three classes are arguably one line of code each. The ability to invoke these classes from an XSLT 2.0 stylesheet makes it easy to add custom sorting functions for different languages or other requirements. This simple technique can be a great addition to your XML processing tool box.
The author would like to thank Simon St. Laurent of O'Reilly and Associates for allowing you to use the first two code examples and the explanatory text for the different sorting algorithms. They are taken from the second edition of XSLT (ISBN 0596527217). The second edition of the book includes several extensions to XSLT, including code written in Java, C#, Python, Ruby and JavaScript. You can preorder a copy of the book today at amazon.com.
| Description | Name | Size | Download method |
|---|---|---|---|
| XML, XSLT, HTML and Java samples from this article | x-xsltsort-samples.zip | 16KB | HTTP |
Learn
- Extensible Stylesheet Language Family page: Visit the W3C to find all the new recommendations for XSLT 2.0, XPath 2.0, XQuery 1.0 and other specifications that became official recommendations of the W3C on 23 January 2007. The entire language definition is spread across several specs.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- The technology bookstore: Browse for books on these and other technical topics.
Get products and technologies
- Saxon XSLT 2.0 processor: Get Michael Kay's excellent tool, available on SourceForge.
- The AltovaXML page on Altova's Web site: Find more information on the XSLT processor that Altova (the maker of XML Spy and other popular products) makes available free of charge. The processor supports XSLT 1.0, XSLT 2.0 and XQuery 1.0. Although it's not open source, the license does allow you to embed the XSLT engine in your products.
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.

Doug Tidwell is a strategist for IBM Software Group. As a technology evangelist, his job focuses on emerging technologies such as SCA, SDO and XForms, helping people use tomorrow's technologies to solve today's problems. He is the author of O'Reilly's XSLT, a second edition of which should be available in bookstores in time for Valentine's Day 2008 (ISBN 0596527217). A speaker at the first XML conference in 1997, he has worked with XSLT for about a decade, including some of the earliest approaches to XML transformations. He is currently writing a book about the inventor of the fruit smoothie, Dutch citrus merchant Julius of Orange. Doug lives in Chapel Hill, North Carolina, with his wife, food writer Sheri Castle, their daughter Lily, and their dog Domino, The Supine Canine.





