Planning to upgrade XSLT 1.0 to 2.0, Part 3: Why the transition requires planning

What a stylesheet writer will change in legacy stylesheets

Part 1 of this series described some long-sought XSLT features that will be added to the new 2.0 version. Part 2 discussed different strategies for upgrading to 2.0, with the amount of advance planning being one of the main differentiators. This part is a deeper exploration of the changes you will need or want to perform as you upgrade. To read the other articles in this series, go to the Planning to upgrade overview page.


David Marston, Software Engineer, IBM, Software Group

David Marston has worked with XML technologies since late 1998, particularly on standards conformance. Over his 25+ years in the computing business, he has been involved with all aspects of software development. He is a graduate of Dartmouth College and a member of the ACM. He is on the Next-Generation Web team at IBM Research. You can contact him at David_Marston@us.ibm.com.



Joanne Tong, Software Developer, IBM, Software Group

Joanne Tong is a developer working on IBM's XSLT processors in the IBM Toronto lab. She is currently an editor of the W3C XSLT 2.0 and XQuery 1.0 Serialization specification and is an active member of the XSL working group. You can contact her at joannet@ca.ibm.com.



29 November 2006


About this series

XSLT 2.0, the latest specification released by the W3C, is a language for transforming XML documents. It includes numerous new features, with some specifically designed to address shortcomings in XSLT 1.0. In this collection of articles, you'll get a high-level overview and an in-depth look at XSLT 2.0 from the point of view of an XSLT 1.0 user who wants to fix old problems, learn new techniques, and discover what to look out for. Examples derived from common applications and practical suggestions are provided if you wish to upgrade. To help you begin to use XSLT 2.0, migration techniques will be provided.


Conformance of XSLT processors

Software vendors that implement XSLT 2.0 must conform to the specifications issued by the W3C (see Resources for links), but there are allowable differences. As with XSLT 1.0, numerous details are implementation-defined, which means that each implementer gets to choose what to do. XSLT 2.0 also has three major feature modules, independent of each other, which the vendor can choose to implement as extras: Serialization, Schema Awareness, and Backwards Compatibility. Each feature module that the vendor implements must conform to the specification, though the specification occasionally leaves further choices to the vendor within a module. In the case of the Serialization feature, the 2.0 conformance picture is much clearer than it was for 1.0, where not only was serialization itself optional, but many of its characteristics were optional as well (should statements rather than must statements).

The number of detail choices the vendor can make has been reduced for 2.0, mainly by requiring certain errors in constructing the result to be flagged as errors (as opposed to allowing structural changes as a surprise in the result). There are still quite a few implementation-defined items, and you'll find a convenient list of the XSLT choices as an appendix to the XSLT 2.0 spec. Several of these implementation-defined choices allow support for languages and locales of the vendor's choice. You can read more about that later, after the sections describing each feature module.


Do you need Backwards Compatibility?

The main goal for the Backwards Compatibility (BC) feature is to allow a stylesheet that worked with a 1.0 processor to also work (more or less) with a 2.0 processor. A 2.0-conforming processor is not 100% compatible with the 1.0 spec; thus it is not the same as if it called old 1.0 code. It's still a 2.0 processor! That should not be thought of as a problem, and it might be an opportunity in some cases (by giving you a choice of two behaviors). This series of articles intends to explain all aspects of compatibility across versions.

XSLT 1.0, which you should be familiar with, was designed with the knowledge that future versions would be specified. The most essential evidence of this fact is the requirement that all stylesheets have a version attribute on the outer xsl:stylesheet element. All XSLT processors, including 1.0 processors, are required to support Forwards Compatibility (FC), which is where the version number on the stylesheet is higher than the version of the processor and the processor can handle the parts it recognizes. When FC is in effect on a stylesheet, the processor must be more lax about unknown attributes, and unknown values for known attributes, on XSLT declarations and instructions. New XSLT elements must also be ignored without raising an error. Note, however, that in a few cases, an instruction that existed in 1.0 expanded for 2.0 in a way other than through its attributes. You might use FC in your existing stylesheets to prepare for 2.0, but please keep reading this series for guidance.
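As a rough sketch of how forwards compatibility plays out, the stylesheet below declares version="2.0" but could also be handed to a 1.0 processor. The element and attribute names (catalog, item, category) are illustrative assumptions, not taken from this series. A 1.0 processor in FC mode should instantiate the xsl:fallback child rather than fail on the unrecognized xsl:for-each-group instruction, while a 2.0 processor ignores the fallback and performs the grouping.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/catalog">
    <report>
      <!-- A 2.0 processor runs this instruction natively. -->
      <xsl:for-each-group select="item" group-by="@category">
        <group name="{current-grouping-key()}"
               count="{count(current-group())}"/>
        <!-- A 1.0 processor in FC mode does not recognize
             xsl:for-each-group, so it instantiates this fallback
             instead of raising an error. -->
        <xsl:fallback>
          <xsl:for-each select="item">
            <ungrouped category="{@category}"/>
          </xsl:for-each>
        </xsl:fallback>
      </xsl:for-each-group>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

The two processors produce different output here, which is the point: FC keeps the stylesheet running, but it does not make a 1.0 processor behave like a 2.0 processor.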

A 2.0 processor might support XSLT 2.0 (and FC) only, declining to implement the BC feature. This means that any occurrence of version="1.0" in the stylesheet will cause an error, except on xsl:output, where it sets the version of XML. Part 2 of this series presented some decision factors regarding a wholesale switch to 2.0, and this part refines that information. With the BC feature in place, the processor will not only accept version="1.0" on the top-level xsl:stylesheet element, but also on subsidiary elements, where its effect will apply to just that element and its descendants. Various rows in the legacy review table present ways that locally-scoped BC can address some behavioral differences between the two versions.
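To illustrate locally scoped BC, here is a minimal sketch that assumes a processor offering the BC feature and an input with hypothetical order, item, and price elements. The same select expression is evaluated twice: once under 2.0 rules and once with version="1.0" in effect on a single instruction.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="order">
    <summary>
      <!-- 2.0 behavior: every selected price is output,
           separated by single spaces. -->
      <all-prices>
        <xsl:value-of select="item/price"/>
      </all-prices>
      <!-- BC behavior, scoped to this one instruction:
           only the first selected node contributes, as in 1.0. -->
      <first-price>
        <xsl:value-of select="item/price" version="1.0"/>
      </first-price>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

On a processor without the BC feature, the second xsl:value-of is expected to cause an error, so a stylesheet like this ties you to processors that implement BC.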


Do you need Schema Awareness?

Schema awareness is another optional feature that some XSLT 2.0 conforming processors support. If this feature is supported, then the additional syntactic items and their effects on XSLT transformations are well-defined and interoperable. This feature is mainly used for error checking. It is a tool built into the language that would enable a stylesheet writer to validate the schema type of atomic values and nodes in both temporary (in a variable) and final states. It is also used to select nodes of a specific schema type from input sources and temporary trees, to create atomic values beyond the subset of built-in atomic types defined in the XML Schema specification (see Resources), to confirm the type of nodes and atomic values using the operator instance of, and to create schema validated nodes in both temporary and final states. In Part 1 of this series, Table 1 introduced the 10 syntactic items that can make use of schema awareness.

Do not assume that this feature is only required if you work with XML schemas. During the transformation, schemas (either from an external source or embedded in the stylesheet) are only necessary if you're interested in validating and selecting typed nodes or working with user-defined schema types. Without a schema, this feature is still needed if you're working with built-in atomic types such as xs:nonNegativeInteger and xs:token. A non-schema-aware processor raises a static error if it encounters these types in the stylesheet, because it does not recognize their names. If you're only interested in working with typed atomic values and those values only need to be validated with the more common built-in schema types, such as xs:float and xs:boolean, then the schema awareness feature is not necessary. For a complete list of types supported by a non-schema-aware processor, read section 3.13 in the XSLT 2.0 specification (see Resources for a link to XSLT 2.0 at the W3C site).
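The distinction can be sketched with a stylesheet that uses only the common built-in types and so should not need the schema awareness feature. The parameter name threshold is a hypothetical choice for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- xs:float is one of the built-in types that every
       XSLT 2.0 processor is expected to recognize. -->
  <xsl:param name="threshold" as="xs:float" select="xs:float(2.5)"/>

  <xsl:template match="/">
    <check>
      <!-- instance of confirms the type of the atomic value. -->
      <xsl:value-of select="$threshold instance of xs:float"/>
    </check>
    <!-- Changing the declaration above to as="xs:nonNegativeInteger"
         would make the stylesheet depend on a schema-aware processor. -->
  </xsl:template>

</xsl:stylesheet>
```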


Do you need Serialization?

Serialization is an optional feature, which might surprise some people: a conforming XSLT processor does not have to support it. Most processors support at least a subset of the Serialization feature. If an XSLT 2.0 processor does claim full conformance to this feature, then it must implement all attributes defined on the xsl:output and xsl:character-map declarations, and it must be able to serialize the result as a byte stream using the XML, HTML, XHTML, and text output methods.

Processors are permitted to extend serialization functionalities. Thus, a processor can support custom output methods, extension attributes to control some aspect of serialization, or support additional values in existing attributes (if permitted by the specification). To see if the baseline serialization requirement is sufficient for your needs, review your existing stylesheets and any post-transformation processing mechanisms that you might have. Furthermore, review the documentation provided by the XSLT 2.0 processor vendor to see if a serialization extension could replace an existing 1.0 extension mechanism, thus easing your transition to XSLT 2.0. Also look for the support of XML 1.1, normalization forms (such as NFD), and @disable-output-escaping in the documentation because these are not part of the baseline Serialization feature requirements. Do not disqualify a processor simply because @disable-output-escaping is not supported. This feature is deprecated in XSLT 2.0 and can easily be replaced by the standard support of xsl:character-map. (See Part 1 of this series for more information about character maps.)
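As a sketch of how a character map can stand in for @disable-output-escaping, the stylesheet below maps two private-use characters to the literal strings < and >. The map name raw-markup and the choice of U+E000 and U+E001 are assumptions made for this example; any characters that cannot occur in your real data would do.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Attach the character map to the output definition. -->
  <xsl:output method="xml" use-character-maps="raw-markup"/>

  <!-- Each mapped character is replaced by its string during
       serialization, and the string is written without escaping. -->
  <xsl:character-map name="raw-markup">
    <xsl:output-character character="&#xE000;" string="&lt;"/>
    <xsl:output-character character="&#xE001;" string="&gt;"/>
  </xsl:character-map>

  <xsl:template match="/">
    <page>
      <!-- Serialized as literal markup: <br/> -->
      <xsl:text>&#xE000;br/&#xE001;</xsl:text>
    </page>
  </xsl:template>

</xsl:stylesheet>
```

Because the substitution happens only at serialization time, it has no effect if the result tree is passed on to another process without being serialized.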


Buyer take note: Processors can have other variances

XSLT processors can vary not only in whether they offer the modular features just described, but also in smaller extras and in certain decisions left to the processor vendor. The allowable variance might mean that one implementation is stricter about raising errors while another recovers (where permissible) rather than throwing errors. Following are details about the implementation-defined aspects that are most likely to affect the transition of your legacy stylesheets.

Do you still need extension mechanisms?

Many XSLT processors expose interfaces or have methods designed to be executed as a component within an application written in a specific application framework. Some processors might also support extension facilities, in the programming language of the application framework, which allows the transformation engine to recognize and process extension functions and instructions during its execution. For example, Xalan-J, a popular open-source XSLT 1.0 processor, supports an extension mechanism that would allow instantiating Java objects and calling Java methods in a stylesheet. To map values of an XPath expression to the arguments or the return value of a Java method, Xalan-J provides a mapping of XSLT types to Java types. For example, a result tree fragment of a variable reference in the stylesheet would be recognized as an org.w3c.dom.DocumentFragment object in the Java method. If your 1.0 stylesheet makes use of these types of extension facilities, then you should first investigate to see if the function performed by your extension code can be replaced with new features provided by XSLT 2.0 and XPath 2.0. If, for example, you implemented a random() function, which is not supported in F&O (the Functions and Operators spec; see Resources for a link), then you need to find an XSLT 2.0 processor that provides extension facilities that minimize your transition effort to that processor.

EXSLT is a pseudo-standard that offers extensions with functionalities lacking in XSLT 1.0. Many existing 1.0 processors provide selective support of EXSLT extensions. Though many extensions overlap with the F&O library or can easily be rewritten in XSLT 2.0 syntax, certain extensions, such as evaluate() and script, require the processor to provide compatible support beyond the standard XSLT 2.0. Another kind of extension provided by some XSLT processors allows connecting to SQL databases and retrieving data. If your stylesheet cannot function without these extensions, then check the vendor's documentation.
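If a stylesheet must run on processors that may or may not supply a given extension, one possible approach is to guard the call with the use-when attribute and function-available(). The following is only a sketch: the ext namespace URI and the ext:random function are hypothetical and do not correspond to any real processor's extension library.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://example.com/hypothetical-extensions"
    exclude-result-prefixes="ext">

  <xsl:template match="/">
    <pick>
      <!-- Kept only when the processor reports that the
           (hypothetical) extension function exists. -->
      <xsl:value-of select="ext:random()"
                    use-when="function-available('ext:random')"/>
      <!-- Kept otherwise, so the stylesheet still compiles
           on processors without the extension. -->
      <xsl:value-of select="'no-random-available'"
                    use-when="not(function-available('ext:random'))"/>
    </pick>
  </xsl:template>

</xsl:stylesheet>
```

Because use-when is evaluated when the stylesheet is compiled, the excluded instruction is discarded before its select expression is checked, which avoids an error for the unknown function.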

Do you need the namespace axis?

In XSLT 2.0, support for the namespace axis (that is, namespace::nodetest) is optional. Thus, if your 1.0 stylesheets make extensive use of the namespace axis and you don't intend to modify them to call in-scope-prefixes() and namespace-uri-for-prefix() instead, then check the XSLT 2.0 processor's documentation to verify that the namespace axis is supported by default (when running in non-BC mode). A processor that offers the BC feature must support the namespace axis, so this is not a concern if you plan to take advantage of BC.
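The two functions just mentioned can be combined to list the namespaces in scope for an element without touching the namespace axis. This is a minimal sketch; the result element names (namespaces-of, ns) are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*">
    <!-- Remember the element, because inside xsl:for-each
         the context item becomes a prefix string. -->
    <xsl:variable name="here" select="."/>
    <namespaces-of name="{name()}">
      <!-- 1.0 style would have been: select="namespace::*" -->
      <xsl:for-each select="in-scope-prefixes($here)">
        <ns prefix="{.}"
            uri="{namespace-uri-for-prefix(., $here)}"/>
      </xsl:for-each>
    </namespaces-of>
  </xsl:template>

</xsl:stylesheet>
```

Note that in-scope-prefixes() reports the default namespace, when one is declared, as a zero-length prefix, and it always includes the xml prefix.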

Accommodating different environments

Other points of variation include:

  • Support of various URI schemes and fragment identifiers in URIs. This affects xsl:import, xsl:include, doc(), and unparsed-text(), among others.
  • Support for XML 1.1. This does not affect input source documents, but requires support if you wish to serialize to XML 1.1 documents, or your stylesheet is an XML 1.1 document.
  • Support of various character sets, normalization forms for strings, and collations.
  • Language-awareness for numbers and dates, collations, and support of various calendars.
  • Support for more decimal precision than the required minimum (including time and duration values).
  • The type of information presented by the trace() function.
  • Ways in which errors and warnings are handled, including what the error() function does. For example, a processor can throw a type error as a static error even before the template is executed, provided that the execution of the particular construct would never succeed.
  • The collection() function must return a sequence of nodes, and calling the same function multiple times must return the same results, but beyond that the actual functionality is completely implementation-defined.

If any of the preceding points are important to you, keep them in mind when you select a 2.0 processor.

The mechanism for launching a transformation

To launch XSLT within an application, most application development environments expose custom interfaces and custom methods, or support standard transformation APIs such as JAXP (Java API for XML Processing) to execute the transformation. As this article is being written, no known standard interfaces support XSLT 2.0. However, this should not be a major concern unless you use the standard API and are interested in initiating a transformation by specifying new launch options or redirecting additional results (when using xsl:result-document with @href). Otherwise, the processor vendor is likely to provide some support to initiate the processing in the application programming language of its choice. New XSLT 2.0 launch options are the ability to set:

  • An initial mode
  • An initial named template
  • An initial context node
  • The base output URI

Existing standard APIs already provide a method to set stylesheet parameter values. You can also set the initial context node if the input source is a node (using a DOMSource, for example). Notice that you cannot set BC mode as a launch option; only stylesheet content can trigger this mode.


Looking at your XSLT legacy code and deciding what you'll need for your 2.0 processor

An XSLT stylesheet still resembles a specialized form of a program, and most of the look and feel has not changed for 2.0. However, many aspects are better because the code is more straightforward. (See Part 1 of this series for illustrations of this point.) The following table is intended to help you recognize aspects of your stylesheets that should change, identify the replacement for each, and identify any permitted differences among 2.0 processors that might be important as you make your choice. The table also helps you plan the transition of your 1.0 code to 2.0.

Table 1. XSLT features that may affect your choice of a 2.0 processor
Feature you used in 1.0 or objective that you wanted to accomplish in XSLT | Simplest 2.0 solution | Implications for choice of 2.0 processor

Using named templates to perform generic operations, such as:

  1. Making all letters uppercase
  2. Checking whether a longer string ends with a particular shorter string
  3. Finding the distinct values in a set
  4. Rounding a number to a particular (non-zero) number of decimal places

and various other generic operations.

Of particular note is the use of recursion of a named template to:

  5. Calculate a set of sequential integers
  6. Calculate a numerical minimum or maximum
  7. Pull apart a string that has delimiter characters

and various other generic operations.

  8. Complex stepping through a node-set to get a boolean value as the result.

New functions and XPath capabilities handle most of these. For the examples given, use:

  1. upper-case()
  2. ends-with()
  3. distinct-values()
  4. round-half-to-even()
  5. Range expressions with the to operator
  6. min() and max()
  7. tokenize()
  8. Quantified expressions (XPath part 3.9)

You might use xsl:function to define your own function, if there is not a built-in function for your task.

BC mode allows certain automatic type conversions to occur, as well as automatic selection of the first member of a node-set. In particular, this applies to function arguments, and you will likely use more functions.

Having a named template that you wish to call right away. Trying to start the transformation with a stylesheet parameter to determine which template gets called first.

You can select an initial named template as a new launch option for 2.0, and passing stylesheet parameters is still allowed. (Note: the stylesheet parameters are not passed as parameters to the initial named template!)

The exact method of launching the transformation with a designated initial template will be specific to the implementation.

Heavily using global variables or template parameters as communication vehicles within the stylesheet. Especially notable: a template that receives some parameters and does nothing with them other than pass them to another template.

If you pass values repeatedly down the template stack, that suggests that you want tunnel parameters. Implied Document Nodes (IDNs) can carry a whole structure of values to where you need it, reducing the number of parameters passed.

Passing extra parameters (not declared as incoming parameters on the called template) is an error in 2.0, but BC mode will suppress the error.

Relying on unwritten built-in match-pattern templates or use of xsl:apply-imports to run the same node through more template processing, when template parameters are desired.

You can now have xsl:with-param on xsl:apply-imports. These template parameters will not be discarded as before. It is a good idea to use explicit templates instead of relying on built-ins. You can pass tunnel parameters through several templates, including built-in ones, without loss of them.

None.

Using a local variable declared inside a sequence of instructions that is referenced only once. You needed to set the variable because you used a sequence of "programming" instructions, possibly in a named template, to set its value. Alternately, you set the local variable and used xsl:for-each to change the point of view for an inner calculation.

New functions, IfExpr, RangeExpr, ForExpr, and so forth, make it less likely that you will need programmatic instructions to calculate a value. ForExpr (XPath part 3.7) addresses the point of view change issue.

BC mode allows certain automatic type conversions to occur, as well as automatic selection of the first member of a node-set. In particular, this applies to function arguments, and you will likely use more functions.

Data types are more formal in 2.0, which might require analysis, and schema usage would increase the need to scrutinize.

Using the node-set() extension function to convert a Result Tree Fragment (RTF) to be navigable and filterable.

Just use the RTF, now known as an Implied Document Node (IDN), directly. See Part 1 for more about IDNs.

None.

Using extension functions (other than node-set). Attempting to emulate data types (beyond the XPath 1.0 types, for example, dateTime) through extension functions or named templates. Processing date and time values, usually as strings, but as numbers when a duration must be added.

New functions and XPath capabilities fulfill many of the needs.

For more information about the new date/time/duration capabilities, see Part 1 of this series.

BC mode allows certain automatic type conversions to occur. In particular, this applies to function arguments.

Schema awareness could help you manage a custom data type.

Using very large or very small numbers or emulating private data-types derived from numeric types.

Use explicit data-type control and new functions to avoid awkward situations. Watch for situations where these numbers display in E-notation.

Check the vendor documentation about implementation-defined precision beyond minimum requirements.

BC mode impacts numeric comparisons in two ways: allowing selection of the first member of a set and obtaining NaN for a non-numeric value, rather than throwing a type error. See XPath part 3.4 for details.

Data types are more formal in 2.0, which might require analysis, and schema usage would increase the need to scrutinize.

Manipulating a string in careful detail to support the needs of a certain language/country, especially for sorting or displaying purposes.

Use enhanced 2.0 capabilities, and eliminate the disable-output-escaping attribute.

  • format-date() function
  • format-time() function
  • format-dateTime() function
  • formats in xsl:number
  • better control in xsl:sort
  • default-collation attribute

Check the vendor documentation about implementation-defined locale support and collation support. Some vendors allow you to write your own collators.

Serialization should be reviewed, especially supported encodings.

Relying on the 1.0 behavior in which many functions and operators that expect a single number or atomic string, when given a node-set for that argument, automatically take the first member of the node-set and extract its atomized value.

Exceptions are =, !=, <, <=, >, >=, id(), boolean(), and not().

Using xsl:value-of @select="some node" where multiple nodes could be selected.

If BC mode is unavailable, append predicate [1] to the node-set expression that returned more than one node.

As discussed in Part 1, xsl:value-of will (in 2.0) automatically iterate over a sequence of values, including a node-set.

If your processor has BC mode, use local versioning to put version="1.0" in effect. (It's easier than hunting down all the places where [1] might be needed.) BC mode affects the iteration capability of xsl:value-of.

Attempting to check the data type when matching a pattern or in a value comparison. For example, trying to guarantee that two values will be compared as numbers rather than strings.

Use the instance of operator.

Read part 3.5 of the XPath 2.0 spec to learn about rigorous use of the eq relation.

New match patterns might cause templates to have default priorities in a tie. If so, set explicit priorities.

Use the two-argument forms of element() and attribute() patterns.

BC mode impacts numeric comparisons in two ways: allowing selection of the first member of a set and obtaining NaN for a non-numeric value, rather than throwing a type error. See XPath part 3.4 for details.

Schema awareness allows instance of to confirm the type of a node against all types defined in the schema. It also allows schema-element() and schema-attribute() node tests. See Table 1 in Part 1 for a list of the syntactic items affected by schema awareness.

You can put available collations in effect to impact relational operators eq, le, and so forth.

Comparing a variable having an atomic value (number, boolean, or string) to another such variable or a literal atomic value, and expecting the type of one to be cast automatically to the other type. Also, passing a non-string argument to a string-oriented function such as substring(), contains(), and so forth.

Example:

contains($MyNumber, '0')

Values fetched from the input document will be cast as xs:untypedAtomic, permitting the comparison or function to succeed, but comparisons where both comparands are annotated with an explicit type will cause type errors if the types differ. Wrap one of the comparands in number(), boolean(), string(), or one of the new type-constructor functions.

Wrap the function argument in string() if it needs to be a string.

If the BC feature is available, you can set version 1.0 to be in effect for an element with expression evaluation, eliminating worry about type mismatches.

Schema awareness allows more exact casting. If you are using Schema Awareness and your input document is schema-validated, values that get annotated as new-for-2.0 types (xs:date, xs:duration, xs:hexBinary, and so forth) or private types derived from those primitive types will not be cast to strings automatically. Manually cast them or (better yet) write new expressions that use the typed value for real.

Attempting to accomplish grouping of elements by detecting first occurrence of a grouping key and using that as a trigger to form a group. Symptoms are use of:

  • generate-id(key(...))
  • preceding-sibling axis
  • xsl:for-each select="key(...)"

Use xsl:for-each-group (probably replacing xsl:for-each), which was described in Part 1 of this series.

None.

Directly manipulating namespace nodes from the source document and attempting to control the namespace declarations in the output. Accessing such nodes in the source by use of the namespace axis or copying an element to get its namespace. On the output side, you can create an attribute just to add a namespace declaration for its name.

Note: A 2.0 processor will catch syntactically invalid Namespace URIs.

Use xsl:namespace to create namespace nodes anew.

Look at the new functions that handle namespaces, QNames, and URIs, particularly the two suggested in the namespace axis section of this article. You should be able to retrieve the necessary information using new functions instead of using the namespace axis.

If serializing XML, be aware of namespace fixup. You'll need new URIs to replace invalid ones.

Check the vendor documentation to see if the 2.0 processor supports the namespace axis because support has become optional for 2.0. Support for BC requires support for the namespace axis.

If the processor can create XML 1.1, you can undeclare prefixed namespaces.

See Resources for a link to an explanation of namespaces.

Using imported or included stylesheets, especially if you really wanted conditional inclusion.

Ask yourself: why are they separate modules? Are some of them utility templates? For conditional use of templates, whether imported/included or not, look at use-when.

If you care about URI schemes other than http, check the vendor documentation about methods for fetching the subsidiary stylesheets.

BC allows mixing 1.0 and 2.0 modules. For example, a 2.0 main stylesheet can include a 1.0 stylesheet.
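Several of the replacements listed in the table can be seen together in one small stylesheet. This is a sketch under stated assumptions: the input vocabulary (staff, person, and the site, dept, age, and name attributes) and the my:initials function in its made-up namespace are hypothetical and exist only for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/hypothetical-functions"
    exclude-result-prefixes="xs my">

  <!-- xsl:function covers a task that has no built-in function;
       it replaces a recursive named template in 1.0. -->
  <xsl:function name="my:initials" as="xs:string">
    <xsl:param name="full-name" as="xs:string"/>
    <xsl:sequence select="string-join(
        for $w in tokenize($full-name, '\s+')[.]
        return upper-case(substring($w, 1, 1)), '')"/>
  </xsl:function>

  <xsl:template match="/staff">
    <report>
      <!-- Built-in functions replace generic named templates. -->
      <site><xsl:value-of select="upper-case(@site)"/></site>
      <is-lab><xsl:value-of select="ends-with(@site, 'Lab')"/></is-lab>
      <rounded><xsl:value-of select="round-half-to-even(3.14159, 2)"/></rounded>
      <!-- A range expression replaces recursion that counts. -->
      <steps><xsl:value-of select="1 to 5"/></steps>
      <oldest><xsl:value-of select="max(person/@age)"/></oldest>
      <!-- A quantified expression replaces stepping through a
           node-set to arrive at a boolean. -->
      <any-over-60>
        <xsl:value-of select="some $p in person satisfies $p/@age > 60"/>
      </any-over-60>
      <!-- xsl:for-each-group replaces key-based grouping idioms. -->
      <xsl:for-each-group select="person" group-by="@dept">
        <dept name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <member initials="{my:initials(@name)}"/>
          </xsl:for-each>
        </dept>
      </xsl:for-each-group>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

None of these constructs requires the BC, Schema Awareness, or Serialization features, so they should not narrow your choice of processor.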

XSLT 2.0 processors are all new

Even if you don't knowingly use 2.0 behaviors, the 2.0 processor will operate according to the dictates of the 2.0 specs. The 2.0 documents do not incorporate 1.0 normatively; everything is specified again, from scratch, including the Backwards Compatibility behavior. A 2.0 processor with the BC feature will run a 1.0 stylesheet in BC mode, which does not obliterate the new instructions such as xsl:for-each-group. Read the rest of this series, the 2.0 specs, and other material with this in mind.

When you upgrade to 2.0, expect to see much less recursion, fewer RTFs, less mindless passing of parameters, and similar avoidance of 1.0 unpleasantness.

Resources

Learn

Get products and technologies

  • IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
  • Xalan-Java: See the features and extensions of this well-known open-source XSLT processor from Apache.
  • "All About JAXP" (Brett McLaughlin, developerWorks, May 2005): In Part 2, find several examples of API calls for launching an XSLT transformation. You can download sample code.
