XSLT 2.0, the latest specification released by the W3C, is a language for transforming XML documents. It includes numerous new features, some of them specifically designed to address shortcomings in XSLT 1.0. In this collection of articles, you'll get a high-level overview and an in-depth look at XSLT 2.0 from the point of view of an XSLT 1.0 user who wants to fix old problems, learn new techniques, and discover what to look out for. Examples derived from common applications, practical suggestions, and migration techniques are provided to help you upgrade and begin to use XSLT 2.0.
Conformance of XSLT processors
Software vendors that implement XSLT 2.0 must conform to the specifications issued by the W3C (see Resources for links), but there are allowable differences. As with XSLT 1.0, numerous details are implementation-defined, which means that each implementer gets to choose what to do. XSLT 2.0 also has three major modules, independent of each other, which the vendor can choose to implement as extra features: Serialization, Schema Awareness, and Backwards Compatibility. Each feature module that the vendor implements must conform to the specification, though there are occasions of additional vendor choices within the module. In the case of the Serialization feature, the 2.0 conformance picture is much clearer than it was for 1.0, where not only was serialization itself optional, but many of its characteristics were optional as well (should statements rather than must statements).
The number of detail choices the vendor can make has been reduced for 2.0, mainly by requiring certain errors in constructing the result to be flagged as errors (as opposed to allowing structural changes as a surprise in the result). There are still quite a few implementation-defined items, and you'll find a convenient list of the XSLT choices as an appendix to the XSLT 2.0 spec. Several of these implementation-defined choices allow support for languages and locales of the vendor's choice. You can read more about that later, after the sections describing each feature module.
Do you need Backwards Compatibility?
The main goal for the Backwards Compatibility (BC) feature is to allow a stylesheet that worked with a 1.0 processor to also work (more or less) with a 2.0 processor. A 2.0-conforming processor is not 100% compatible with the 1.0 spec; thus it is not the same as if it called old 1.0 code. It's still a 2.0 processor! That should not be thought of as a problem, and it might be an opportunity in some cases (by giving you a choice of two behaviors). This series of articles intends to explain all aspects of compatibility across versions.
XSLT 1.0, which you should be familiar with, was designed with the knowledge that future versions would be specified. The most essential evidence of this fact is the requirement that all stylesheets have a version attribute on the outer xsl:stylesheet element. All XSLT processors, including 1.0 processors, are required to support Forwards Compatibility (FC), which is where the version number on the stylesheet is higher than the version of the processor and the processor can handle the parts it recognizes. When FC is in effect on a stylesheet, the processor must be more lax about unknown attributes, and unknown values for known attributes, on XSLT declarations and instructions. New XSLT elements must also be ignored without raising an error. Note, however, that in a few cases, an instruction that existed in 1.0 expanded for 2.0 in a way other than through its attributes. You might use FC in your existing stylesheets to prepare for 2.0, but please keep reading this series for guidance.
A 2.0 processor might support XSLT 2.0 (and FC) only, declining to implement the BC feature. This means that any occurrence of version="1.0" in the stylesheet will cause an error, except on xsl:output, where it sets the version of XML.
Part 2 of this series presented some decision factors regarding a wholesale switch to 2.0, and this part refines that information. With the BC feature in place, the processor will not only accept version="1.0" on the top-level xsl:stylesheet element, but also on subsidiary elements, where its effect will apply to just that element and its descendants. Various rows in the legacy review table present ways that locally-scoped BC can address some behavioral differences between the two versions.
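As a minimal sketch of locally scoped BC, assume a processor that implements the BC feature and a hypothetical input in which each order element holds several item/price elements (both element names are invented for illustration). Placing version="1.0" on a single instruction should then give 1.0 behavior to that instruction only:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: an order with several item/price descendants -->
  <xsl:template match="order">
    <summary>
      <!-- 2.0 behavior: every selected price, separated by single spaces -->
      <all-prices><xsl:value-of select="item/price"/></all-prices>

      <!-- version="1.0" puts only this instruction in BC mode:
           as in 1.0, just the first selected price is output -->
      <first-price><xsl:value-of select="item/price" version="1.0"/></first-price>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

On a processor without the BC feature, the second instruction is expected to fail rather than fall back silently, so this technique is only useful once you have confirmed BC support.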
Schema awareness is another optional feature that some XSLT 2.0 conforming processors support. If this feature is supported, then the additional syntactic items and their effects on XSLT transformations are well-defined and interoperable. This feature is mainly used for error checking. It is a tool built into the language that would enable a stylesheet writer to validate the schema type of atomic values and nodes in both temporary (in a variable) and final states. It is also used to select nodes of a specific schema type from input sources and temporary trees, to create atomic values beyond the subset of built-in atomic types defined in the XML Schema specification (see Resources), to confirm the type of nodes and atomic values using the operator instance of, and to create schema validated nodes in both temporary and final states. In Part 1 of this series, Table 1 introduced the 10 syntactic items that can make use of schema awareness.
Do not assume that this feature is only required if you work with XML schemas. During the transformation, schemas (either from an external source or embedded in the stylesheet) are only necessary if you're interested in validating and selecting typed nodes or working with user-defined schema types. Without a schema, this feature is still needed if you're working with built-in atomic types such as xs:nonNegativeInteger and xs:token. A non-schema-aware processor throws a type error if it encounters these types in the stylesheet. If you're only interested in working with typed atomic values and those values only need to be validated with the more common built-in schema types, such as xs:float and xs:boolean, then the schema awareness feature is not necessary. For a complete list of types supported by a non-schema-aware processor, read section 3.13 in the XSLT 2.0 specification (see Resources for a link to XSLT 2.0 at the W3C site).
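The distinction can be sketched as follows, assuming a hypothetical item element that always carries a qty attribute. The first declaration uses a type from the basic set; the second, left commented out, uses a type that the paragraph above says a non-schema-aware processor rejects:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="item">
    <!-- xs:integer is in the basic set, so no schema awareness is needed -->
    <xsl:variable name="qty" as="xs:integer" select="xs:integer(@qty)"/>

    <!-- xs:nonNegativeInteger is outside the basic set; enable this
         only on a schema-aware processor -->
    <!-- <xsl:variable name="count" as="xs:nonNegativeInteger"
                       select="xs:nonNegativeInteger(@qty)"/> -->

    <quantity><xsl:value-of select="$qty + 1"/></quantity>
  </xsl:template>

</xsl:stylesheet>
```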
Serialization is an optional feature, which might surprise some people: a conforming XSLT processor does not have to support it. Most processors support at least a subset of the Serialization feature. If an XSLT 2.0 processor claims full conformance to this feature, then it must implement all attributes defined on the xsl:output and xsl:character-map declarations, and it must be able to serialize the result as a stream of bytes using the XML, HTML, XHTML, and text output methods.
Processors are permitted to extend serialization functionalities. Thus, a processor can support custom output methods, extension attributes to control some aspect of serialization, or support additional values in existing attributes (if permitted by the specification). To see if the baseline serialization requirement is sufficient for your needs, review your existing stylesheets and any post-transformation processing mechanisms that you might have. Furthermore, review the documentation provided by the XSLT 2.0 processor vendor to see if a serialization extension could replace an existing 1.0 extension mechanism, thus easing your transition to XSLT 2.0. Also look for the support of XML 1.1, normalization forms (such as NFD), and @disable-output-escaping in the documentation because these are not part of the baseline Serialization feature requirements. Do not disqualify a processor simply because @disable-output-escaping is not supported. This feature is deprecated in XSLT 2.0 and can easily be replaced by the standard support of xsl:character-map. (See Part 1 of this series for more information about character maps.)
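As a rough illustration of that replacement, the sketch below assumes a processor with the Serialization feature and a hypothetical line element in the input. The map name raw-markup and the private-use character U+E001 are arbitrary choices made for this example:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" use-character-maps="raw-markup"/>

  <!-- During serialization, each U+E001 character is replaced by the
       string below, which is written without escaping -->
  <xsl:character-map name="raw-markup">
    <xsl:output-character character="&#xE001;" string="&lt;br/&gt;"/>
  </xsl:character-map>

  <xsl:template match="line">
    <!-- 1.0 style was:
         <xsl:text disable-output-escaping="yes">&lt;br/&gt;</xsl:text> -->
    <xsl:value-of select="."/>
    <xsl:text>&#xE001;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The character map only takes effect when the result is serialized, so it has no influence on a result tree that is handed to another application as nodes.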
Buyer take note: Processors can have other variances
XSLT processors can vary not only in whether they offer any of the modular features just described, but also in smaller extras and in certain decisions left to the processor vendor. The allowable variance might mean that one implementation is stricter about raising errors while another recovers (where permissible) rather than throwing errors. Following are the details about some implementation-defined aspects that might have greater impact on the transition of your legacy stylesheets.
Do you still need extension mechanisms?
Many XSLT processors expose interfaces or have methods designed to be executed as a component within an application written in a specific application framework. Some processors might also support extension facilities, in the programming language of the application framework, which allows the transformation engine to recognize and process extension functions and instructions during its execution. For example, Xalan-J, a popular open-source XSLT 1.0 processor, supports an extension mechanism that would allow instantiating Java objects and calling Java methods in a stylesheet.
To map values of an XPath expression to the arguments or the return value of a Java method, Xalan-J provides a mapping of XSLT types to Java types. For example, a result tree fragment of a variable reference in the stylesheet would be recognized as an org.w3c.dom.DocumentFragment object in the Java method. If your 1.0 stylesheet makes use of these types of extension facilities, then you should first investigate to see if the function performed by your extension code can be replaced with new features provided by XSLT 2.0 and XPath 2.0. If, for example, you implemented a random() function, which is not supported in F&O (the Functions and Operators spec; see Resources for a link), then you need to find an XSLT 2.0 processor that provides extension facilities that minimize your transition effort to that processor.
EXSLT is a pseudo-standard that offers extensions with functionalities lacking in XSLT 1.0. Many existing 1.0 processors provide selective support of EXSLT extensions. Though many extensions overlap with the F&O library or can easily be rewritten in XSLT 2.0 syntax, certain extensions, such as evaluate() and script, require the processor to provide compatible support beyond the standard XSLT 2.0. Another kind of extension provided by some XSLT processors allows connecting to SQL databases and retrieving data.
If your stylesheet cannot function without these extensions, then check the vendor's documentation.
Do you need the namespace axis?
In XSLT 2.0, support of the namespace axis (that is, namespace::nodetest) is optional. Thus, if your 1.0 stylesheets make extensive use of the namespace axis and you don't intend to modify them to call in-scope-prefixes() and namespace-uri-for-prefix() instead, then verify in the XSLT 2.0 processor's documentation that the namespace axis is supported by default (when running in non-BC mode). If you plan to take advantage of BC, then the namespace axis must be supported.
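A minimal sketch of the function-based alternative follows. The mode name list-namespaces and the namespaces and ns result elements are invented for this example; only in-scope-prefixes() and namespace-uri-for-prefix() come from the F&O library:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- 1.0 style was: <xsl:for-each select="namespace::*"> ... -->
  <xsl:template match="*" mode="list-namespaces">
    <xsl:variable name="elem" select="."/>
    <namespaces>
      <xsl:for-each select="in-scope-prefixes($elem)">
        <!-- The context item is a prefix string; the default namespace,
             if any, appears as the zero-length string -->
        <ns prefix="{.}" uri="{namespace-uri-for-prefix(., $elem)}"/>
      </xsl:for-each>
    </namespaces>
  </xsl:template>

</xsl:stylesheet>
```

Because the loop iterates over strings rather than nodes, the element is held in a variable so that it remains reachable inside xsl:for-each.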
Accommodating different environments
Other points of variation include:
- Support of various URI schemes and fragment identifiers in URIs. This affects xsl:import, xsl:include, doc(), and unparsed-text(), among others.
- Support for XML 1.1. This does not affect input source documents, but requires support if you wish to serialize to XML 1.1 documents, or your stylesheet is an XML 1.1 document.
- Support of various character sets, normalization forms for strings, and collations.
- Language-awareness for numbers and dates, collations, and support of various calendars.
- Support for more decimal precision than the required minimum (including time and duration values).
- The type of information presented by the trace() function.
- Ways in which errors and warnings are handled, including what the error() function does. For example, a processor can throw a type error as a static error even before the template is executed, provided that the execution of the particular construct would never succeed.
- The collection() function must return a sequence of nodes, and calling the same function multiple times must return the same results, but beyond that the actual functionality is completely implementation-defined.
If any of the preceding points are important to you, keep them in mind when you select a 2.0 processor.
The mechanism for launching a transformation
To launch XSLT within an application, most application development environments expose custom interfaces and custom methods, or support standard transformation APIs such as JAXP (Java API for XML Processing) to execute the transformation. As this article is being written, no known standard interfaces support XSLT 2.0. However, this should not be a major concern unless you use the standard API and are interested in initiating a transformation by specifying new launch options or redirecting additional results (when using xsl:result-document with @href). Otherwise, the processor vendor is likely to provide some support to initiate the processing in the application programming language of its choice. New XSLT 2.0 launch options are the ability to set:
- An initial mode
- An initial named template
- An initial context node
- The base output URI
Existing standard APIs already provide a method to set stylesheet parameter values. You can also set the initial context node if the input source is a node (using a DOM Source, for example). Notice that you cannot set BC mode as a launch option; only stylesheet content can trigger this mode.
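On the stylesheet side, preparing for an initial named template might look like the following sketch. The template name main, the parameter input-uri, and the file name orders.xml are all hypothetical, and how you designate the initial template at launch depends on the processor:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Stylesheet parameter, set through the launching API or command line -->
  <xsl:param name="input-uri" as="xs:string" select="'orders.xml'"/>

  <!-- Intended as the initial named template; it receives no template
       parameters, so it reads the stylesheet parameter instead -->
  <xsl:template name="main">
    <report>
      <xsl:apply-templates select="doc($input-uri)/*"/>
    </report>
  </xsl:template>

  <xsl:template match="*">
    <root-element name="{name()}"/>
  </xsl:template>

</xsl:stylesheet>
```

Loading the input with doc() means the template does not depend on an initial context node, which might not be supplied when a transformation starts at a named template.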
Looking at your XSLT legacy code and deciding what you'll need for your 2.0 processor
An XSLT stylesheet still resembles a specialized form of a program, and most of the look and feel has not changed for 2.0. However, many aspects are better because the code is more straightforward. (See Part 1 of this series for illustrations of this point.) The following table helps you recognize aspects of your code that should change, identifies the replacement, and notes any permitted differences among 2.0 processors that might be important as you make your choice. The table also helps you plan the transition of your 1.0 code to 2.0.
Table 1. XSLT features that may affect your choice of a 2.0 processor
| Feature you used in 1.0 or objective that you wanted to accomplish in XSLT | Simplest 2.0 solution | Implications for choice of 2.0 processor |
|---|---|---|
Using named templates to perform generic operations, and in particular using recursion of a named template for such operations. | New functions and XPath capabilities handle most of these. | BC mode allows certain automatic type conversions to occur, as well as automatic selection of the first member of a node-set. In particular, this applies to function arguments, and you will likely use more functions. |
Having a named template that you wish to call right away. Trying to start the transformation with a stylesheet parameter to determine which template gets called first. | You can select an initial named template as a new launch option for 2.0, and passing stylesheet parameters is still allowed. (Note: the stylesheet parameters are not passed as parameters to the initial named template!) | The exact method of launching the transformation with a designated initial template will be specific to the implementation. |
Heavily using global variables or template parameters as communication vehicles within the stylesheet. Especially notable: a template that receives some parameters and does nothing with them other than pass them to another template. | If you pass values repeatedly down the template stack, that suggests that you want tunnel parameters. Implied Document Nodes (IDNs) can carry a whole structure of values to where you need it, reducing the number of parameters passed. | Passing extra parameters (not declared as incoming parameters on the called template) is an error in 2.0, but BC mode will suppress the error. |
Relying on unwritten built-in match-pattern templates or use of | You can now have | None. |
Using a local variable declared inside a sequence of instructions that is referenced only once. You needed to set the variable because you used a sequence of "programming" instructions, possibly in a named template, to set its value. Alternately, you set the local variable and used | New functions, IfExpr, RangeExpr, ForExpr, and so forth, make it less likely that you will need programmatic instructions to calculate a value. ForExpr (XPath part 3.7) addresses the point of view change issue. | BC mode allows certain automatic type conversions to occur, as well as automatic selection of the first member of a node-set. In particular, this applies to function arguments, and you will likely use more functions. Data types are more formal in 2.0, which might require analysis, and schema usage would increase the need to scrutinize. |
Using the node-set() extension function to convert a Result Tree Fragment (RTF) to be navigable and filterable. | Just use the RTF, now known as an Implied Document Node (IDN), directly. See Part 1 for more about IDNs. | None. |
Using extension functions (other than node-set). Attempting to emulate data types (beyond the XPath 1.0 types, for example, dateTime) through extension functions or named templates. Processing date and time values, usually as strings, but as numbers when a duration must be added. | New functions and XPath capabilities fulfill many of the needs. For more information about the new date/time/duration capabilities, see Part 1 of this series. | BC mode allows certain automatic type conversions to occur. In particular, this applies to function arguments. Schema awareness could help you manage a custom data type. |
Using very large or very small numbers or emulating private data-types derived from numeric types. | Use explicit data-type control and new functions to avoid awkward situations. Watch for situations where these numbers display in E-notation. | Check the vendor documentation about implementation-defined precision beyond minimum requirements. BC mode impacts numeric comparisons in two ways: allowing selection of the first member of a set and obtaining Data types are more formal in 2.0, which might require analysis, and schema usage would increase the need to scrutinize. |
Manipulating a string in careful detail to support the needs of a certain language/country, especially for sorting or displaying purposes. | Use enhanced 2.0 capabilities, and eliminate the | Check the vendor documentation about implementation-defined locale support and collation support. Some vendors allow you to write your own collators. Serialization should be reviewed, especially supported encodings. |
Relying on the 1.0 behavior of many functions and operators that expect a single number or atomic string, when given a node-set for that argument, would automatically take the first member of the node-set and extract its atomized value. Exceptions are =, !=, <, <=, >, >=, Using | If BC mode is unavailable, append predicate As discussed in Part 1, | If your processor has BC mode, use local versioning to put |
Attempting to check the data type when matching a pattern or in a value comparison. For example, trying to guarantee that two values will be compared as numbers rather than strings. | Use the Read part 3.5 of the XPath 2.0 spec to learn about rigorous use of the New match patterns might cause templates to have default priorities in a tie. If so, set explicit priorities. Use the two-argument forms of | BC mode impacts numeric comparisons in two ways: allowing selection of the first member of a set and obtaining Schema awareness allows You can put available collations in effect to impact relational operators |
Comparing a variable having an atomic value (number, boolean, or string) to another such variable or a literal atomic value, and expecting the type of one to be cast automatically to the other type. Also, passing a non-string argument to a string-oriented function such as Example: | Values fetched from the input document will be cast as Wrap the function argument in | If the BC feature is available, you can set version 1.0 to be in effect for an element with expression evaluation, eliminating worry about type mismatches. Schema awareness allows more exact casting. If you are using Schema Awareness and your input document is schema-validated, values that get annotated as new-for-2.0 types ( |
Attempting to accomplish grouping of elements by detecting first occurrence of a grouping key and using that as a trigger to form a group. Symptoms are use of: | Use | None. |
Directly manipulating namespace nodes from the source document and attempting to control the namespace declarations in the output. Accessing such nodes in the source by use of the namespace axis or copying an element to get its namespace. On the output side, you can create an attribute just to add a namespace declaration for its name. Note: A 2.0 processor will catch syntactically invalid Namespace URIs. | Use Look at the new functions that handle namespaces, QNames, and URIs, particularly the two suggested in the namespace axis section of this article. You should be able to retrieve the necessary information using new functions instead of using the namespace axis. If serializing XML, be aware of namespace fixup. You'll need new URIs to replace invalid ones. | Check the vendor documentation to see if the 2.0 processor supports the namespace axis because support has become optional for 2.0. Support for BC requires support for the namespace axis. If the processor can create XML 1.1, you can undeclare prefixed namespaces. See Resources for a link to an explanation of namespaces. |
Using imported or included stylesheets, especially if you really wanted conditional inclusion. | Ask yourself: why are they separate modules? Are some of them utility templates? For conditional use of templates, whether imported/included or not, look at | If you care about URI schemes other than BC allows mixing 1.0 and 2.0 modules. For example, a 2.0 main stylesheet can include a 1.0 stylesheet. |
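To illustrate the tunnel-parameter row of the table, here is a small sketch built on a hypothetical catalog/section/item input, with a parameter named currency invented for the example. The intermediate template neither declares nor forwards the parameter:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="catalog">
    <xsl:apply-templates select="section">
      <xsl:with-param name="currency" select="'EUR'" tunnel="yes"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- No mention of currency here; the value tunnels through -->
  <xsl:template match="section">
    <section><xsl:apply-templates select="item"/></section>
  </xsl:template>

  <!-- The receiving template must also mark the parameter as a tunnel parameter -->
  <xsl:template match="item">
    <xsl:param name="currency" tunnel="yes"/>
    <price><xsl:value-of select="@price, $currency"/></price>
  </xsl:template>

</xsl:stylesheet>
```

In 1.0, the section template would have needed its own xsl:param and xsl:with-param just to relay the value.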
XSLT 2.0 processors are all new
Even if you don't knowingly use 2.0 behaviors, the 2.0 processor will operate according to the dictates of the 2.0 specs. The 2.0 documents do not incorporate 1.0 normatively; everything is specified again, from scratch, including the Backwards Compatibility behavior. A 2.0 processor with the BC feature will run a 1.0 stylesheet in BC mode, which does not obliterate the new instructions such as xsl:for-each-group. Read the rest of this series, the 2.0 specs, and other material with this in mind.
When you upgrade to 2.0, expect to see much less recursion, fewer RTFs, less mindless passing of parameters, and similar avoidance of 1.0 unpleasantness.
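Grouping is one place where the difference shows. The sketch below assumes a hypothetical staff element containing employee elements, each with a dept attribute and a name child, and uses xsl:for-each-group where a 1.0 stylesheet would typically detect the first occurrence of each key:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="staff">
    <departments>
      <!-- One iteration per distinct dept value, in order of first appearance -->
      <xsl:for-each-group select="employee" group-by="@dept">
        <department name="{current-grouping-key()}">
          <!-- current-group() holds every employee sharing this key -->
          <xsl:value-of select="current-group()/name" separator=", "/>
        </department>
      </xsl:for-each-group>
    </departments>
  </xsl:template>

</xsl:stylesheet>
```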
Learn
- The W3C site: Visit this great source of information on standards such as:
- XSLT 1.0
- XSLT 2.0
- XSLT 2.0 Requirements
- XPath
- XPath 2.0
- Functions and Operators (F&O)
- Data Model (XDM)
- Formal Semantics
- Serialization
- XQuery
- XML Schema
- XML 1.1
- XHTML
- Namespaces in XML
- The EXSLT site: Review a set of extension functions for XSLT 1.0.
- "Plan to Use XML Namespaces" (David Marston, developerWorks, Nov 2002, with April 2005 updates): Read about the purpose of namespaces and how to use namespace-qualified names ("QNames") in XPath expressions.
- "Improvements in XSLT" (David Marston and Joanne Tong, developerWorks, October 2006): In the first part of this series, discover the XSLT 2.0 features that are likely to motivate an upgrade. You'll also find some material about XPath 2.0 features.
- "Five strategies for changing from XSLT 1.0 to 2.0" (David Marston and Joanne Tong, developerWorks, November 2006): Read the second part of this series, which describes the higher-level decision factors for planning an upgrade to XSLT 2.0, setting the stage for using Backwards and Forwards Compatibility as transition tools.
- "The Toolkit for XSLT portability" (David Marston and Joanne Tong, developerWorks, February 2007): In Part 4 of this series, look at the complete toolkit for mixing code of different versions.
- "Make your stylesheets work with any processor version" (David Marston and Joanne Tong, developerWorks, February 2007): Read the fifth part of this series which shows how to write a stylesheet that is portable between XSLT 1.0 and 2.0.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
Get products and technologies
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
- Xalan-Java: See the features and extensions of this well-known open-source XSLT processor from Apache.
- "All About JAXP" (Brett McLaughlin, developerWorks, May 2005): In Part 2, find several examples of API calls for launching an XSLT transformation. You can download sample code.
Discuss
- XML zone discussion forums: Participate in any of several XML-centered forums.
- developerWorks blogs: Get involved in the developerWorks community.
David Marston has worked with XML technologies since late 1998, particularly on standards conformance. Over his 25+ years in the computing business, he has been involved with all aspects of software development. He is a graduate of Dartmouth College and a member of the ACM. He is on the Next-Generation Web team at IBM Research. You can contact him at David_Marston@us.ibm.com.