In terms of both power and simplicity, the combination of XML and XSL has revolutionized data storage and manipulation in a way not seen since the early days of the SQL database language. XML provides a clear and independent way of recording data that is easily shared and understood. Similarly, many people feel that XSL is also easy to read, write, and understand. Clearly, this powerful duo is essential knowledge for everyone involved in the technology industry.
The broad scope and small learning curve associated with the basic elements of XSL transformation sometimes act as a double-edged sword: they yield broad usage of the core technology but dissuade the majority of developers learning XSL from investigating and using its more advanced and powerful features.
This article is written for developers who already have a basic understanding of XML and XSL, and are ready to build on this knowledge. If you are unfamiliar with these technologies, you can find several good introductory articles and tutorials on developerWorks and other Web sites. The article shows you how to use extensions -- a technique present in most XSL processors -- which allows virtually unlimited expansion of the existing capabilities of XSL's core features. This article includes a general description of how to write extensions with code, followed by three specific and widely applicable examples.
It must first be understood that XSL, like all other programming languages, is merely a grammar specification in need of an implementation. Fortunately, XSL has become very popular and there are several implementations to choose from. Extensions are not a required feature of the grammar and, thus, their syntax is not as well defined as the other constructs of the language. They are, however, now included in the W3C's XSLT Recommendation. The examples in this article will follow the format of that recommendation.
What makes these extensions so significant when XSL can already do so much? What XSL gains in simplicity and broad ability for transformation is often lost in efficiency and ability to do anything unrelated to transformation. For instance, suppose you have an XML document that lists 5,000 users of your system. The user name, real name, and e-mail address of each of these users are given under a Users node within the XML. You later append to the XML document an Interests node in a separate subtree of the XML with user names grouped by particular interests such as acrobatics, bicycling, and computers. You hope eventually to transform the data into an HTML page that groups users by interests and presents e-mail contacts for people of similar interests. XSL can do this handily with the following code:
Listing 1. User interest XSL transformation without extensions
<xsl:for-each select="Interests/Interest">
  <b><xsl:value-of select="@InterestName"/></b>
  <ul>
    <xsl:for-each select="User">
      <xsl:variable name="userName" select="@userName"/>
      <!-- This predicate scans the full Users list for every user in every interest -->
      <xsl:variable name="userNode" select="/Root/Users/User[@userName = $userName]"/>
      <li>
        <xsl:value-of select="$userNode/@realName"/>
        <xsl:value-of select="concat(' ', $userNode/@email)"/>
      </li>
    </xsl:for-each>
  </ul>
</xsl:for-each>
Unfortunately, the way the transform executes, the entire list of 5,000 users will be examined for each user in each interest category. This is far more work than you want your server to do for each request to this Web page.
Extensions provide a convenient way around this and several other possible hang-ups that you may encounter when using XSL on nontrivial data sets. In the above example, a simple hashmap or binary search tree could have easily solved the problem, but implementing one of these data structures in XSL would be inconvenient and unnecessary. Extending XSL with a language that has more appropriate data types fixes the problem more easily. (Incidentally, the code for this fix is given in the first example below.)
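To make the cost difference concrete, here is a minimal Java sketch that counts the work done by a full scan per lookup (the behavior the article attributes to the predicate in Listing 1) against a hash-based index. The class name LookupCost and its methods are hypothetical and are not part of the article's zip file.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration: compares scan-per-lookup with a hash index. */
public final class LookupCost {
    private LookupCost() {}

    /** How many names were found and how many comparisons or probes that took. */
    public static final class Cost {
        public final int found;
        public final long work;

        Cost(int found, long work) {
            this.found = found;
            this.work = work;
        }
    }

    /** Walks the whole user list for every lookup, as the XPath predicate is described as doing. */
    public static Cost byScan(List<String> userNames, List<String> wanted) {
        int found = 0;
        long work = 0;
        for (String name : wanted) {
            boolean matched = false;
            for (String candidate : userNames) {
                work++; // one "@userName = $userName" comparison
                if (candidate.equals(name)) {
                    matched = true; // keep scanning: the predicate visits every User
                }
            }
            if (matched) {
                found++;
            }
        }
        return new Cost(found, work);
    }

    /** Builds a hash index once, then performs one probe per lookup. */
    public static Cost byHash(List<String> userNames, List<String> wanted) {
        Set<String> index = new HashSet<>(userNames);
        int found = 0;
        long work = 0;
        for (String name : wanted) {
            work++; // one hash probe
            if (index.contains(name)) {
                found++;
            }
        }
        return new Cost(found, work);
    }
}
```

With 5,000 user names and three lookups, byScan performs 15,000 comparisons while byHash performs three probes after building the index once.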
It would be a daunting task to list all of the XSL processors and their methods for implementing extensions. This article uses the Java version of Xalan -- a popular and freely available XSL processor from the Apache Project -- to describe the specifics of writing extensions. All of the examples are targeted to that platform. (Xerces, another Apache product, is used as the XML parser. You can download Xalan and Xerces from links in Resources.) Most other popular XSL implementations also provide a mechanism for extensions, but you'll need to consult their documentation to find any differences in approach.
To simplify working with XML and XSL, I have also provided Java code for some of the more common XML manipulations. This code, along with the code and data necessary to run all of the examples, is provided in a zip file in Resources. This file does not, however, include external libraries such as Xalan and Xerces. After you obtain those libraries by following links in Resources (versions: Xalan - Java 2.3.1; Xerces 1.4.4), place their jar files in the lib directory extracted from the zip file. For those readers who wish to jump directly to the examples, all Java code is in the src directory, XML data in the XML directory, XSL transforms in the XSL directory, batch files in the bin directory, and compiled code in the lib directory.
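For readers who want a feel for such a helper before opening the zip file, the following is a minimal sketch of applying a stylesheet through the standard JAXP API. It is not the code shipped with the article; the class name TransformRunner is hypothetical, and which XSLT processor actually runs depends on what JAXP finds on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper (not the class from the article's zip file). */
public final class TransformRunner {
    private TransformRunner() {}

    /** Applies the stylesheet text to the XML text and returns the result as a string. */
    public static String transform(String xml, String xsl) {
        try {
            // JAXP selects the configured XSLT processor, for example Xalan
            // when its jar files are on the classpath.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers do not have to declare a checked exception.
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Any extension classes a stylesheet refers to must be on the classpath of the program that calls a method like this one.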
In order to call a method from XSL, that method must first be written and its compiled form placed in the classpath of the application that is performing the XSL transformation. Methods may be of your own design, supplied by the standard libraries of Java, or taken from other Java libraries. In some XSL processors, like Xalan, there are even extension methods written directly into the processor.
The first thing to be aware of when you write or use these methods is the mapping of data types from XSL to Java and back again. The following tables provide a reference to these mappings in Xalan.
Tables 1 and 2. Data type mappings
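As a rough illustration of how these mappings look from the Java side, the sketch below declares one method per common XSL type. It assumes Xalan-Java's documented conversions (XSL string to java.lang.String, number to double, boolean to boolean, node-set to org.w3c.dom.NodeList); the class and method names are hypothetical.

```java
import org.w3c.dom.NodeList;

/** Hypothetical extension class showing one method per common XSL data type. */
public final class TypeDemo {
    private TypeDemo() {}

    /** XSL string in, XSL string out. */
    public static String shout(String text) {
        return text.toUpperCase();
    }

    /** XSL number in, XSL number out; XSL numbers are doubles. */
    public static double half(double value) {
        return value / 2.0;
    }

    /** XSL string in, XSL boolean out. */
    public static boolean isBlank(String text) {
        return text.trim().isEmpty();
    }

    /** XSL node-set in (received as a DOM NodeList), XSL number out. */
    public static double countNodes(NodeList nodes) {
        return nodes.getLength();
    }
}
```

Consult the Xalan documentation for the full list of conversions and the order in which the processor tries them.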
Once your methods are written, incorporating them into XSL is fairly simple. The first step is to declare a namespace for your methods in the <xsl:stylesheet> element. For example, if you want to run methods from a class called foo in package com.myCompany.XSLExtensions, the root of your XSL file would contain the following line:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:extension="xalan://com.myCompany.XSLExtensions.foo">
If you later want to call a method from the class you have declared, use the namespace declared in the <xsl:stylesheet> element. Continuing the example, in order to run a method called bar() that takes a String as a parameter and returns a String, you might use code like the following:
<xsl:variable name="myParam" select="'theParameter'"/>
<xsl:variable name="myResult" select="extension:bar($myParam)"/>
It's that simple. The myResult variable now contains the result of calling bar from your Java class. To obtain a better grasp on the technique, work through the following three examples.
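On the Java side, the class behind that namespace could be as small as the following sketch. The article names only the class (foo), the method (bar), and its String parameter and return type; the method body here is invented for illustration, and the method is assumed to be static so that no instance is needed.

```java
// In the article's example this class lives in package com.myCompany.XSLExtensions,
// matching the namespace xalan://com.myCompany.XSLExtensions.foo. The package line
// is left out so that this sketch compiles as a single file; without it the
// namespace would be xalan://foo.
public class foo {

    /** Called from XSL as extension:bar($myParam). The body is purely illustrative. */
    public static String bar(String param) {
        return "bar received: " + param;
    }
}
```

Both the class and the method are public so that the XSL processor can reach them.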
The beginning of this article presented a scenario in which the use of standard XSL techniques for looking up data in distinct subtrees of an XML document used excessive amounts of compute time. A simple way around this is to create a general purpose hashtable that provides a mechanism for storing and retrieving strings. Since hashtables are built directly into the standard Java libraries, writing an extension that uses them should be painless.
The hashtable Java code is found in the src/StringHash.java file contained in the zip file in Resources. It has two methods of note:
addString(String tableName, String key, String value)
getString(String tableName, String key)
The first method allows the creation of hashtables associated with a table name plus the insertion of string values mapped to a key. The second method provides a means for retrieving the stored values.
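The actual implementation is in the zip file; what follows is only a sketch of how a class with these two methods could be written. It assumes static methods, a map of named tables, and an empty-string return value from addString so that calling it from XSL adds nothing to the output. None of these details are confirmed by the article.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a string lookup extension; not the StringHash.java shipped with the article. */
public class StringHash {
    // One inner map per table name, created on first use.
    private static final Map<String, Map<String, String>> TABLES = new HashMap<>();

    /** Stores value under key in the named table; returns "" so XSL output is unaffected. */
    public static synchronized String addString(String tableName, String key, String value) {
        TABLES.computeIfAbsent(tableName, name -> new HashMap<>()).put(key, value);
        return "";
    }

    /** Returns the stored value, or "" when the table or key is unknown. */
    public static synchronized String getString(String tableName, String key) {
        Map<String, String> table = TABLES.get(tableName);
        String value = (table == null) ? null : table.get(key);
        return (value == null) ? "" : value;
    }
}
```

Because the tables are static, they live for as long as the class stays loaded, so a long-running server would need some way to clear or rebuild them between transformations.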
An XML data source is found in the XML/user_interests.xml file (see the zip file in Resources). It follows the form:
Listing 2. User interest XML fragment
<Users>
  <User userName="aragon" realName="Aragon" email="aragon@middleEarth.fict"/>
  <User userName="boromir" realName="Boromir" email="boromir@middleEarth.fict"/>
  ...
</Users>
<Interests>
  <Interest name="archery">
    <User userName="legolas"/>
    ...
  </Interest>
  ...
</Interests>
Two XSL files are given in the zip file in Resources for producing the Web page result. The first is found in the XSL/user_interests_xsl_only.xsl file and follows the code shown in Listing 1. The second is found in the XSL/user_interests_extensions.xsl file which modifies the former XSL file to the code shown in Listing 3. To easily run the XSL conversion on Windows, use the bin/Example_1*.bat batch files. Unix and Mac developers should have little trouble running the examples after examining these batch files.
Listing 3. User interest XSL transformation with extensions
<xsl:stylesheet xmlns:lookup="xalan://StringHash">
...
<xsl:for-each select="Users/User">
  <xsl:value-of select="lookup:addString('realName', string(@userName), string(@realName))"/>
  <xsl:value-of select="lookup:addString('email', string(@userName), string(@email))"/>
</xsl:for-each>
...
<li>
  <xsl:value-of select="lookup:getString('realName',$userName)"/>
  <xsl:value-of select="concat(' - ',lookup:getString('email', $userName))"/>
</li>
The current XSL standard uses the XPath technology to perform all of its pattern matching. While XPath provides a compact and elegant way of traversing an XML tree, its pattern matching functions have a rather limited capability. (The only string functions in XPath 1.0 that perform boolean matching are contains() and starts-with(). You can also automatically parse strings into numbers.) Regular expressions provide much richer pattern matching across strings of text, but are not as easy to use as XPath when traversing a data structure such as an XML tree. For more detailed information on regular expressions, see Resources.
The optimum solution is to combine the two technologies. The next version of the XSL transformation language, which is still under development and review, includes a proposal to add regular expressions to the language. For developers who want to use the technology now, extensions provide the mechanism for doing so.
The source code for the Java methods accessed as extensions can be found in the src/PatternMatcher.java file contained in the zip file accompanying this article. These methods make use of external code that is not contained within the standard Java libraries, so this example also shows what steps are necessary to link external jar files for use in extensions. To get the examples to work, you will need to obtain the regular expression jar file provided by GNU (see Resources) and place it in the extracted lib directory. Feel free to find another regular expression package and modify the code to fit it.
For the second example, suppose you wish to generate a list of those users in the original source for whom both a first and a last name are known. While this is a fairly trivial example, it is not difficult to imagine more complicated examples working on groups of users, product catalogs, or reference databases. A simple way to do this is to look through the real names of the users and match those names which consist of one name followed by a space followed by another name. The regular expression for this is \w* \w*. The XSL now contains the lines in Listing 4.
Listing 4. Regular expressions in XSL
<xsl:stylesheet xmlns:regexp="xalan://PatternMatcher">
...
<ul>
  <xsl:for-each select="Users/User[regexp:containsMatch('\w* \w*', string(@realName))]">
    <li>
      <xsl:value-of select="@realName"/>
    </li>
  </xsl:for-each>
</ul>
As with Example 1, you can run this example through the bin/Example_2.bat file. You can find the XSL file used at XSL/user_last_names.xsl. The possibilities for extending this technique are nearly endless.
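The article's PatternMatcher relies on the GNU regular expression package. As an alternative sketch, the same containsMatch call can be backed by java.util.regex from the standard library (available since Java 1.4). The argument order, pattern first and then input, follows Listing 4; everything else here is an assumption rather than the code in the zip file.

```java
import java.util.regex.Pattern;

/** Sketch using java.util.regex in place of the GNU package used by the article. */
public class PatternMatcher {

    /** Returns true when the pattern matches anywhere inside the input string. */
    public static boolean containsMatch(String regex, String input) {
        // find() looks for a match in any substring, unlike matches(),
        // which requires the whole input to match.
        return Pattern.compile(regex).matcher(input).find();
    }
}
```

Note that \w* can match an empty string, so the pattern from Listing 4 accepts any name that contains a space. A stricter pattern such as \w+ \w+ would require at least one word character on each side.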
Internationalization, sometimes referred to as localization or natural language support, is the method by which developers make their products readable across languages and cultures. It is particularly important in the context of XML transformation if the product of the transformation is a set of Web pages that targets a broad audience. While the topic of internationalization is too broad to introduce in a comprehensive way in the context of this example, you can find good treatment of it in other developerWorks articles referenced below.
This example makes use of Java's built-in technique of handling internationalization through the use of resource bundles. If you are unfamiliar with the topic, I encourage you to read the referenced articles. Suffice it to say for now that resource bundles consist of a collection of files that contain translations for different regions or, more precisely, locales. Web servers can read the preferred locale of a user when that user requests a Web page and, using these resource bundles, can respond appropriately. XML-based applications can also target results to a specific locale.
The potential uses of the code in this example are just as wide and varied as the previous one. In order to demonstrate the technology, the code executed by the bin/Example_3.bat file creates three Web pages from the sample XML users data. The three resulting pages represent the same view of the data, but are presented in three different languages. The translations used can be found in properties files in the lib directory extracted from the zip file.
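The article does not list the Java code for this example, so the following is only a sketch of what a resource-bundle extension might look like. The class name Translator, both method names, and the choice to fall back to the key itself when no translation is found are all assumptions.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

/** Hypothetical translation extension backed by Java resource bundles. */
public class Translator {

    /**
     * Looks up key in the bundle identified by baseName for the given language tag
     * (for example "fr" or "de-DE"). Returns the key itself if nothing is found.
     */
    public static String getText(String baseName, String languageTag, String key) {
        try {
            ResourceBundle bundle =
                    ResourceBundle.getBundle(baseName, Locale.forLanguageTag(languageTag));
            return lookup(bundle, key);
        } catch (MissingResourceException e) {
            return key; // no bundle with this base name on the classpath
        }
    }

    /** Reads one entry from an already loaded bundle, falling back to the key. */
    public static String lookup(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return key; // bundle exists but has no entry for this key
        }
    }
}
```

A stylesheet could then pass the target locale in as a parameter and call getText wherever it would otherwise emit a hard-coded label, producing one page per locale from the same XML data.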
Even the most basic components of XSL transformations have remarkable capabilities. When extensions bring the power of modern programming languages to this core, the possibilities become virtually limitless. The ideas and examples presented above are but the tip of the iceberg, and I leave it to you, after gaining an understanding of what is presented here, to explore the many remaining possibilities.
Source code for this article: x-callbk/XSL_Callbacks_Code.zip (1588 KB, HTTP download)
- Download the zip file containing all of the code related to this article.
- Download Xerces (XML parser and DOM implementation) and Xalan (XSL transformer) from the Apache XML Project.
- Read articles and explore tutorials on XML/XSL:
  - Mark Colan's article on XSL: "Putting XSL transformations to work" (developerWorks, October 2001)
  - Dan Day's article on XSL: "Hands-on XSL" (developerWorks, March 2000)
- Get answers to your questions about regular expressions with this guide.
- For more on internationalization and resource bundles, read my developerWorks article "Harnessing internationalization."
- Finally, take a look at IBM WebSphere Studio Application Developer, an easy-to-use, integrated development environment for building, testing, and deploying J2EE applications, including generating XML documents from DTDs and schemas.