Traditionally, narrative content is marked up to convey how it should look (for example, font size or text alignment), rather than its real structure or meaning. This is true of content destined for print in books and office documents as well as of many web pages. Some narrative content is not marked up at all and is just represented as paragraphs in plain text.
With the advent of content management, electronic publishing, and advanced searching and querying technologies, content owners now realize the power of structured information and mark up their new content accordingly. But many have a backlog of content that is still unstructured or semi-structured or marked up only for presentation.
To make existing content much more usable, add structure to it, which enables you to:
- Style content separately for multiple output devices such as smart phones, e-book readers, and alternate web browsers
- Implement interactive behavior associated with certain kinds of text—for example, hyperlinks for intra- or inter-document references or pop-up directions for addresses
- Generate alternate representations of the content, such as tables of contents, indexes, and summary views
- Provide more focused searching based on fields in the content
- Ensure more consistent formatting across an entire body of content
- Improve verification of the content—for example, determine whether intra-document references in a legal document are valid
XSLT 2.0 is well suited to adding structure to narrative content, which is increasingly represented in XML or in something that can easily be converted to XML, such as HTML. Unlike some popular scripting languages for manipulating text, XSLT is completely XML-aware. It understands the many variations in XML syntax, encoding, and namespaces.
XSLT is also suited for narrative content because of its flexibility. The use of template rules to handle various events in the input document means that the conversion can be content driven rather than strictly controlled by a sequential process.
Unlike XSLT 1.0, which was geared more toward styling XML, XSLT 2.0 has advanced capabilities for converting content, including the ability to:
- Identify patterns in text through regular expressions
- Group elements by value or position
- Make multiple passes at a document in one stylesheet
- Split a document into many documents or join many documents into one
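As a minimal sketch of the last item in this list, the following stylesheet writes each chapter of a document to its own file. It assumes a hypothetical input whose root document element contains chapter children; the output file names (chapter1.xml, chapter2.xml, and so on) are likewise assumptions made for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed input: <document> with <chapter> children -->
  <xsl:template match="/document">
    <xsl:for-each select="chapter">
      <!-- href is an attribute value template: one output file per chapter -->
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each xsl:result-document instruction writes one secondary result document, so a single transformation can produce as many files as there are chapters.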
To use XSLT, the first step is to get your content into XML. Most document editors and other content tools have an XML export capability. Depending on the tool, the exported XML might be prohibitively complex; in that case, first convert it to a simplified form so that it's easier to work with.
If you are converting from HTML, HTML Tidy (see Resources for a link) turns your HTML into XHTML, which can be input for XSLT because it is well-formed XML. It also simplifies your documents.
The rest of this article describes techniques that you can use to add structure and semantics to content using XSLT 2.0. Particular attention is paid to XHTML and Microsoft® Office Word XML as input documents, but these concepts apply to any narrative input document.
You can download all of the examples in this article as a ZIP file (see Download). For XSLT 2.0 processing, I recommend Saxon (see Resources for a link).
It is often useful to recognize a particular pattern in text and mark it up. URLs, email addresses, phone numbers, intra-document references, and wiki-type formatting all follow common patterns that you can identify using regular expressions.
Listing 1 shows an input document that contains email
addresses in plain text. Suppose that you want to mark up each email address
using an a (anchor) element to enable a link
to the email address, which in HTML would allow a user to click the address
to send an email to it.
Listing 1. Sample input to the text pattern recognition XSLT
<document>
  <p>Priscilla Walmsley can be reached at pwalmsley@datypic.com.</p>
  <p>Questions about XSLT in general are best asked on the XSL list at
    xsl-list@lists.mulberrytech.com (subscription required.) </p>
</document>
The XSLT in Listing 2 accomplishes this goal. It
uses the XSLT 2.0 analyze-string instruction,
which tests a string to see whether it matches a regular expression. The
analyze-string instruction has two child elements:
matching-substring, which says what to do with
substrings that match the pattern, and non-matching-substring,
which says what to do with the rest of the text.
In this example, the first template rule matches every text node in the input
document. Whenever it finds text that matches the regular expression
specified in the regex attribute, it inserts an
a element. It gives it an href
attribute that concatenates the string mailto:
with the email address. It also puts the email address in the contents of the
a element. Inside a matching-substring
instruction, the period (.) represents the string
that matched the pattern.
If all or part of a text node does not match the pattern, it is simply copied, as
specified in non-matching-substring. The second
template rule in the XSLT instructs the processor to copy all elements and
continue to their children, so that nothing changes other than the email
addresses.
Listing 2. Text pattern recognition XSLT
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Mark up email addresses found in any text node -->
  <xsl:template match="text()">
    <xsl:analyze-string select="."
        regex="[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{{2,4}}">
      <xsl:matching-substring>
        <a href="mailto:{.}">
          <xsl:value-of select="."/>
        </a>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:copy/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
  <!-- Copy every element with its attributes, then process its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Listing 3 shows the output of the XSLT, with email addresses marked up as a
elements.
Listing 3. Sample output of the text pattern recognition XSLT
<document>
  <p>Priscilla Walmsley can be reached at
    <a href="mailto:pwalmsley@datypic.com">pwalmsley@datypic.com</a>.</p>
  <p>Questions about XSLT in general are best asked on the XSL list at
    <a href="mailto:xsl-list@lists.mulberrytech.com">xsl-list@lists.mulberrytech.com</a>
    (subscription required.) </p>
</document>
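The same technique extends to patterns whose parts you want to capture separately. The following sketch, which is not part of the article's download, marks up phone numbers written as three groups of digits (such as 555-123-4567) and uses the regex-group function to copy the first group into an attribute. The phone element, its area attribute, and the digit pattern are assumptions made for illustration; real phone numbers vary far more than this pattern allows.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="text()">
    <!-- Curly braces are doubled because regex is an attribute value template -->
    <xsl:analyze-string select="." regex="(\d{{3}})-(\d{{3}})-(\d{{4}})">
      <xsl:matching-substring>
        <!-- regex-group(1) is the text matched by the first set of parentheses -->
        <phone area="{regex-group(1)}">
          <xsl:value-of select="."/>
        </phone>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
  <!-- Copy everything else unchanged -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because the regex attribute is an attribute value template, literal curly braces in the regular expression must be doubled. That is why the quantifier {3} is written as {{3}} here and {2,4} is written as {{2,4}} in Listing 2.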
At a higher level, it is often useful to add structure representing divisions or sections of a document. You might add structure to conform to a particular XML vocabulary, such as DocBook. Or you might want to clearly delineate sections of content so that you can easily repurpose them using a technology like DITA.
For example, Listing 4 shows an XHTML input document
that has a flat structure. The section headings (h1
and h2) appear at the same level as the paragraphs,
with no structural tags to group the sections.
Listing 4. Sample input to the section grouping XSLT
<html>
  <h1>Chapter 1</h1>
  <h2>Section 1.1</h2>
  <p>In this section...</p>
  <p>More text</p>
  <h2>Section 1.2</h2>
  <p>In this second section...</p>
</html>
The XSLT in Listing 5 uses the grouping
capabilities of XSLT 2.0 to add section elements based on
the position of the section headings. It creates two levels of groups. First,
it uses a for-each-group instruction that groups
all of the children of html using the attribute
group-starting-with="h1". This means that an
h1 element signals the beginning of a group at this
level. For each of these groups, the XSLT inserts a
section level="1" element.
Second, it has an inner for-each-group that calls the
current-group function to get all of the items in
the h1 group and create subgroups out of those. The
beginning of a subgroup is identified by an h2 element.
Inside the inner for-each-group, it tests to see whether
the group contains an h2 element using the XPath
current-group()[self::h2]. It does this because a
group will be created containing everything that appears before the first
h2 element, which in this case is just the
h1 element. You do not want a separate
section level="2" around that element.
Listing 5. Grouping sections in XSLT
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="html">
    <document>
      <!-- Outer groups: each h1 starts a first-level section -->
      <xsl:for-each-group select="*" group-starting-with="h1">
        <section level="1">
          <!-- Inner groups: each h2 starts a second-level section -->
          <xsl:for-each-group select="current-group()" group-starting-with="h2">
            <xsl:choose>
              <xsl:when test="current-group()[self::h2]">
                <section level="2">
                  <xsl:apply-templates select="current-group()"/>
                </section>
              </xsl:when>
              <!-- Content before the first h2 gets no second-level wrapper -->
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </section>
      </xsl:for-each-group>
    </document>
  </xsl:template>
  <xsl:template match="h1|h2">
    <heading>
      <xsl:apply-templates/>
    </heading>
  </xsl:template>
  <!-- Copy everything else unchanged -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Listing 6 shows the output of the XSLT, which contains two levels of section
elements.
Listing 6. Sample output of the section grouping XSLT
<document>
  <section level="1">
    <heading>Chapter 1</heading>
    <section level="2">
      <heading>Section 1.1</heading>
      <p>In this section...</p>
      <p>More text</p>
    </section>
    <section level="2">
      <heading>Section 1.2</heading>
      <p>In this second section...</p>
    </section>
  </section>
</document>
Instead of grouping based on a beginning element, it is sometimes necessary to group
items that are adjacent to each other. One case is list items, which might appear
as siblings of normal paragraphs rather than grouped together in a list. To do that,
a for-each-group instruction can use the attribute
group-adjacent instead of
group-starting-with.
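As a minimal sketch of that approach, the following stylesheet wraps each run of adjacent li elements in a list element. It assumes a flat input in which li elements are siblings of p elements under a document element, and it uses ul as the wrapper name purely for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed input: <document> whose children include sibling <li> and <p> elements -->
  <xsl:template match="document">
    <document>
      <!-- The grouping key is true for li elements and false for everything else -->
      <xsl:for-each-group select="*" group-adjacent="boolean(self::li)">
        <xsl:choose>
          <!-- A run of adjacent li elements becomes one list -->
          <xsl:when test="current-grouping-key()">
            <ul>
              <xsl:copy-of select="current-group()"/>
            </ul>
          </xsl:when>
          <!-- Any other run of elements is copied as is -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </document>
  </xsl:template>
</xsl:stylesheet>
```

Because the key changes only where a run of li elements begins or ends, two lists separated by a paragraph end up in two separate ul elements.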
Inferring structure from section numbers
Sometimes, you want to group and infer structure at the same time. All the text in the input document in Listing 7, unlike previous examples, is marked up as generic paragraphs. The only way to determine the structure is by testing the content for patterns.
Listing 7. Sample input for the section inference XSLT
<document>
  <p>Chapter 1: In the beginning</p>
  <p>In this chapter...</p>
  <p>1.1 Introduction</p>
  <p>In this section...</p>
  <p>More text</p>
  <p>1.2 Next Steps</p>
  <p>In this second section...</p>
</document>
The easiest way to accomplish this is to take two passes at the document. The XSLT in Listing 8 does so and uses modes to separate the functionality of the two passes.
Listing 8. Section inference XSLT
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="chapRegex" select="'^\s*Chapter\s+(\d+)\s*:\s*(.*)$'"/>
  <xsl:variable name="secRegex" select="'^\s*(\d+\.\d+)\s*(.*)$'"/>
  <xsl:template match="document">
    <!-- First pass: rename title paragraphs -->
    <xsl:variable name="renamed" as="element()*">
      <xsl:apply-templates select="*" mode="rename"/>
    </xsl:variable>
    <!-- Second pass: group the renamed elements into chapters and sections -->
    <document>
      <xsl:for-each-group select="$renamed" group-starting-with="chapTitle">
        <chapter num="{replace(current-group()[self::chapTitle],$chapRegex,'$1')}">
          <xsl:for-each-group select="current-group()" group-starting-with="secTitle">
            <xsl:choose>
              <xsl:when test="current-group()[self::secTitle]">
                <section num="{replace(current-group()[self::secTitle],$secRegex,'$1')}">
                  <xsl:apply-templates select="current-group()"/>
                </section>
              </xsl:when>
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </chapter>
      </xsl:for-each-group>
    </document>
  </xsl:template>
  <xsl:template match="p[matches(.,$chapRegex)]" mode="rename">
    <chapTitle>
      <xsl:copy-of select="node()"/>
    </chapTitle>
  </xsl:template>
  <xsl:template match="p[matches(.,$secRegex)]" mode="rename">
    <secTitle>
      <xsl:copy-of select="node()"/>
    </secTitle>
  </xsl:template>
  <!-- Strip the chapter or section number from the first node of a title -->
  <xsl:template match="text()" priority="1">
    <xsl:choose>
      <xsl:when test="matches(.,$chapRegex) and (. is parent::chapTitle/node()[1])">
        <xsl:value-of select="replace(.,$chapRegex,'$2')"/>
      </xsl:when>
      <xsl:when test="matches(.,$secRegex) and (. is parent::secTitle/node()[1])">
        <xsl:value-of select="replace(.,$secRegex,'$2')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <!-- Copy everything else unchanged, staying in the current mode -->
  <xsl:template match="node()" mode="#all">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="#current"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
The first pass determines which p elements actually
represent chapter titles and section titles and renames them to simplify the
later grouping pass. A variable $renamed is created
that contains the results of the first pass. Within its definition, templates are
applied with the mode rename. Two template rules
in the rename mode look for paragraphs that are
chapter or section titles and rename them. To determine this, they use the
matches function (new in XSLT 2.0), which determines
whether a string matches a regular expression. In this case, the regular
expressions are represented by the variables $chapRegex
and $secRegex.
Listing 9 shows the value of the $renamed
variable after the first pass although it doesn't appear directly in the results.
Listing 9. Value of the $renamed variable in section inference XSLT
<chapTitle>Chapter 1: In the beginning</chapTitle>
<p>In this chapter...</p>
<secTitle>1.1 Introduction</secTitle>
<p>In this section...</p>
<p>More text</p>
<secTitle>1.2 Next Steps</secTitle>
<p>In this second section...</p>
The second pass starts with the for-each-group
instruction, whose select attribute indicates that
it operates on the $renamed variable instead of
the input document. The second pass uses grouping logic much like the previous
example to create two levels of structure: chapter
and section.
It also creates num attributes that contain the chapter
and section numbers. To do this, it uses replace,
another new function in XSLT 2.0. The replace
function takes three arguments: the first is the text to modify, the second is
the regular expression that matches the part to be replaced, and the third is the
replacement string. In this case, the third argument makes use of subexpressions,
where $1 means that it should include the text that
matched the first set of parentheses in the regular expression.
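To see the subexpressions in isolation, the following sketch applies the section regular expression from Listing 8 to a literal string. The demo, num, and title element names are made up for this illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Same pattern as $secRegex in Listing 8: (number) then (title) -->
  <xsl:variable name="secRegex" select="'^\s*(\d+\.\d+)\s*(.*)$'"/>
  <xsl:template match="/">
    <demo>
      <!-- $1 keeps only the first parenthesized subexpression: the number -->
      <num>
        <xsl:value-of select="replace('1.1 Introduction', $secRegex, '$1')"/>
      </num>
      <!-- $2 keeps only the second parenthesized subexpression: the title -->
      <title>
        <xsl:value-of select="replace('1.1 Introduction', $secRegex, '$2')"/>
      </title>
    </demo>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this produces a num element containing 1.1 and a title element containing Introduction. Because the pattern is anchored with ^ and $, the whole string is matched and replaced by the chosen subexpression.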
The text() template rule removes the chapter and section
numbers from the content because they are being moved to the
num attribute. This template rule, too, uses the
replace function, this time with $2 to keep only the title text. You have to be careful not to
replace every text node that matches the pattern, so test whether it's the first
node in that parent.
The node() template rule copies all elements that don't
have more specific templates. The use of mode="#all"
means that this template is used whether you're in rename
mode or have no mode in use. When you apply templates from within this template
rule, you use the mode="#current" to indicate that you should stay
in whatever mode matched the template in the first place.
Although modes are not new in XSLT 2.0, the use of #all
and #current keywords is new, as is the ability to
make several passes at a document. Because of restrictions on result tree
fragments, multiple passes were not possible in XSLT 1.0 without a processor-specific extension function.
Listing 10 shows the final output.
Listing 10. Sample output of the section inference XSLT
<document>
  <chapter num="1">
    <chapTitle>In the beginning</chapTitle>
    <p>In this chapter...</p>
    <section num="1.1">
      <secTitle>Introduction</secTitle>
      <p>In this section...</p>
      <p>More text</p>
    </section>
    <section num="1.2">
      <secTitle>Next Steps</secTitle>
      <p>In this second section...</p>
    </section>
  </chapter>
</document>
Named styles are an important resource in determining the true structure of a
document. Although generally meant to specify and standardize formatting,
a named style can also convey the structure and meaning of the content. In many
technologies, the structure and the style are two different things. For example,
in Microsoft Office Word, text is structured into paragraphs, and styles are applied to
those paragraphs or to text runs within paragraphs. In HTML, structural elements
might come in different varieties like h1 and
p, but they might also have "styles" applied to them
using class attributes that refer to CSS stylesheets.
A Word document that uses styles is depicted in Figure 1, where the Heading 1 and Heading 2 paragraph styles are used for the section headings and character styles Emphasis and Strong are used for inline text formatting.
Figure 1. Word document that uses styles
I saved this document as Word XML before running it through an XSLT. The Word XML file is not shown here because it is large and complex, but it is available in the examples for download (see Download).
I recommend creating a style mapping document, an example of which is in Listing 11. It maps the styles in the input document to the desired structural elements in the results. This mapping makes the conversion more general purpose and flexible as well as easier to maintain over time.
Listing 11. Style map used in the Word conversion XSLT
<styles>
  <style>
    <name>BodyText</name>
    <name>Normal</name>
    <transformTo>p</transformTo>
  </style>
  <style>
    <name>ListParagraph</name>
    <transformTo>li</transformTo>
  </style>
  <style>
    <name>Heading1</name>
    <transformTo>h1</transformTo>
  </style>
  <style>
    <name>Heading2</name>
    <transformTo>h2</transformTo>
  </style>
  <style>
    <name>Emphasis</name>
    <transformTo>em</transformTo>
  </style>
  <style>
    <name>Strong</name>
    <transformTo>strong</transformTo>
  </style>
</styles>
The XSLT in Listing 12 converts the Word
XML document to XML, creating the elements specified in the style map. It
has to define template rules for only two elements: w:p
(paragraph) and w:r (text run). For each of these elements,
it determines the associated style and looks up that style in the style map. It
then creates an XML element using xsl:element
with the name found in the transformTo in the
style map.
Listing 12. Converting Word styles to XML tags
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">
  <!-- Load the style map shown in Listing 11 -->
  <xsl:variable name="styles" select="doc('stylemap.xml')/styles/style"/>
  <xsl:template match="/">
    <document>
      <xsl:apply-templates select=".//w:p"/>
    </document>
  </xsl:template>
  <!-- Paragraphs: look up the paragraph style in the style map -->
  <xsl:template match="w:p">
    <xsl:variable name="elName"
        select="$styles[name=current()/w:pPr/w:pStyle/@w:val]/transformTo"/>
    <xsl:choose>
      <xsl:when test="$elName != ''">
        <xsl:element name="{$elName}">
          <xsl:apply-templates select="*"/>
        </xsl:element>
      </xsl:when>
      <xsl:otherwise>
        <!-- paragraphs without a listed style are just plain p's -->
        <p>
          <xsl:apply-templates select="*"/>
        </p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <!-- Text runs: look up the character style in the style map -->
  <xsl:template match="w:r">
    <xsl:variable name="elName"
        select="$styles[name=current()/w:rPr/w:rStyle/@w:val]/transformTo"/>
    <xsl:choose>
      <xsl:when test="$elName != ''">
        <xsl:element name="{$elName}">
          <xsl:apply-templates select="*"/>
        </xsl:element>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="*"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Listing 13 shows sample output. In this case, I have not actually added or inferred new information. I have combined structure and style into one set of tags. After you reach this step, you can apply the other techniques described in this article to add structure and recognize textual patterns.
Listing 13. Sample output from the Word style conversion XSLT
<document>
  <h1>Chapter 1: In the beginning</h1>
  <p>In this chapter...</p>
  <h2>1.1 Introduction</h2>
  <p>In this section there is a list.</p>
  <li>item 1</li>
  <li>item 2</li>
  <li>item 3</li>
  <h2>1.2 Next Steps</h2>
  <p>This section uses character styles
    <strong>Strong</strong> and <em>Emphasis</em>.</p>
</document>
Although this example uses Word, you can perform a similar conversion on XHTML using class attributes to determine the styles. A bonus example is provided in the ZIP download file that does just that (see Download).
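The following is a rough sketch of that idea, not the bonus example from the download. It assumes XHTML elements that are in no namespace (as in Listing 4), class attributes that each hold a single class name, and a stylemap.xml in the format of Listing 11 whose name elements list class names rather than Word style names.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: stylemap.xml maps class names to element names -->
  <xsl:variable name="styles" select="doc('stylemap.xml')/styles/style"/>
  <!-- Elements with a class attribute: look the class up in the style map -->
  <xsl:template match="*[@class]">
    <xsl:variable name="elName"
        select="$styles[name = current()/@class]/transformTo"/>
    <xsl:choose>
      <xsl:when test="exists($elName)">
        <xsl:element name="{$elName[1]}">
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:when>
      <!-- Classes that are not in the map leave the element as it is -->
      <xsl:otherwise>
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <!-- Elements without a class attribute are copied unchanged -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

A class attribute that lists several class names would need to be split first, for example with the tokenize function, before the lookup.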
In some situations, named style information is not available. In the case of Word, a variety of users might have created the input documents using different techniques, including directly applying font changes rather than using styles. Similarly, in HTML, an author or tool might have applied individual style attributes to paragraphs and other elements rather than defining reusable CSS classes. In these cases, to infer the structure, you have to rely on text formatting such as font size and font effects like bold and italics. Doing so can be less reliable and consistent than styles but can still yield important clues.
Figure 2 shows a Word document that uses direct formatting rather than styles. The author of this document kept everything in the Normal style, directly changed the font size, and applied bold and italics to make some paragraphs appear to be headings.
Figure 2. Word document that uses formatting instead of styles
The XSLT in Listing 14 converts the Word document to XML based on its formatting information rather than style names. It tests the size of the font and whether it is in bold or italics to determine whether it is a first-level or second-level heading. In this type of conversion, a style map is less useful because the mapping rules are more complex and best expressed as XSLT and XPath code.
Listing 14. Getting structural clues from text formatting in XSLT
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <document>
      <xsl:apply-templates select=".//w:p"/>
    </document>
  </xsl:template>
  <!-- Paragraphs: infer heading levels from font size, bold, and italics -->
  <xsl:template match="w:p">
    <xsl:variable name="size" select="w:pPr/w:rPr/w:sz/@w:val"/>
    <xsl:choose>
      <xsl:when test="($size > 32) and exists(w:pPr/w:rPr/w:b)">
        <h1>
          <xsl:apply-templates select="*"/>
        </h1>
      </xsl:when>
      <xsl:when test="($size > 25) and exists(w:pPr/w:rPr/w:i)">
        <h2>
          <xsl:apply-templates select="*"/>
        </h2>
      </xsl:when>
      <xsl:otherwise>
        <p>
          <xsl:apply-templates select="*" mode="char-formatting"/>
        </p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <!-- Text runs in ordinary paragraphs: keep bold and italics as b and i -->
  <xsl:template match="w:r" mode="char-formatting">
    <xsl:choose>
      <xsl:when test="w:rPr/w:b">
        <b>
          <xsl:apply-templates select="*"/>
        </b>
      </xsl:when>
      <xsl:when test="w:rPr/w:i">
        <i>
          <xsl:apply-templates select="*"/>
        </i>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="*"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Listing 15 shows the sample output.
Listing 15. Sample output from the Word formatting XSLT
<document>
  <h1>Chapter 1: In the beginning</h1>
  <p>In this chapter...</p>
  <h2>1.1 Introduction</h2>
  <p>In this section...</p>
  <h2>1.2 Next Steps</h2>
  <p>In this section there is text that uses <b>bold</b>
    and <i>italics</i>.</p>
</document>
An analogy in XHTML is the use of style attributes
or formatting elements like font and
center.
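A minimal sketch of what such rules might look like for XHTML follows. The rule that treats any paragraph with a bold inline style as a first-level heading is a deliberately crude assumption, as is the decision to unwrap font and center elements; the input is assumed to be in no namespace.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical rule: a paragraph styled bold is treated as a heading -->
  <xsl:template match="p[matches(@style, 'font-weight:\s*bold')]">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
  <!-- Drop purely presentational wrappers but keep their content -->
  <xsl:template match="font | center">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Copy everything else unchanged -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

In practice you would probably combine several clues, such as font size and weight, before deciding that a paragraph is a heading, much as Listing 14 does for Word.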
All of the examples used in this article are fairly simple for educational purposes. The content you encounter in the real world is likely to be more complex. In some scenarios, you will encounter fairly uniform input documents—for example, if you are converting Word documents that have been carefully corrected by a production editor or XHTML documents that were consistently generated by an application.
Often, though, content is messy. If content is hand-created by humans, it is likely to have many variations. When you create your XSLT, you should never assume consistency. Test the assumptions in your code frequently, and write templates that catch unhandled situations and emit warning messages.
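One way to catch unhandled situations is a low-priority catch-all template rule that reports any element that no other rule matched. The sketch below assumes a stylesheet with explicit template rules for every element you expect (here, only document and p); the wording of the message is arbitrary.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Explicit rules for the elements you expect; document and p are examples -->
  <xsl:template match="document | p">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Catch-all: warn about any other element, then keep processing its content -->
  <xsl:template match="*" priority="-1">
    <xsl:message>
      <xsl:text>Warning: unhandled element </xsl:text>
      <xsl:value-of select="name()"/>
    </xsl:message>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```

The warning does not become part of the result document; how it is reported depends on the processor, and it is typically written to the console.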
Mixed content deserves special attention. If you convert documents that contain
mixed content, be sure to use apply-templates liberally
rather than value-of, which can inadvertently flatten
mixed content.
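The difference is easiest to see on a small case. The sketch below assumes an input paragraph such as <p>This is <em>important</em> text.</p>; the para and emphasis output names are made up for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="p">
    <para>
      <!-- apply-templates lets the em rule below fire, so inline markup survives.
           value-of select="." here would output only the text and lose the em. -->
      <xsl:apply-templates/>
    </para>
  </xsl:template>
  <xsl:template match="em">
    <emphasis>
      <xsl:apply-templates/>
    </emphasis>
  </xsl:template>
</xsl:stylesheet>
```

With apply-templates, the paragraph becomes <para>This is <emphasis>important</emphasis> text.</para>. With value-of, it would become <para>This is important text.</para>, and the emphasis would be lost.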
White space is also crucial when processing narrative content. Avoid indenting output in your XSLT because it might introduce unintended white space. Conversely, avoid stripping white space from the input document in case it was there for a reason.
Human review of your output is always recommended. Although this review can be time-consuming for a large body of content, it is worth it to at least spot check the output to identify repetitive errors.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample XSLT stylesheets for this article | examples.zip | 21KB | HTTP |
Learn
- Introduction to XSLT (Nicholas Chase, developerWorks, January 2007): Explore how to transform XML data from one format to another with XSLT stylesheets. Also, examine the basics of XPath and some advanced XSLT capabilities.
- What kind of language is XSLT? (Michael Kay, developerWorks, April 2005): Put XSLT in context with this analysis and overview. Learn where the language comes from, what it's good at, and why you should use it.
- Learn more about highly desirable features in XSLT 2.0, some new and others designed to address XSLT 1.0 shortcomings, in this seven-part "Planning to upgrade XSLT 1.0 to 2.0" series (David Marston, Joanne Tong, Henry Zongaro; developerWorks, October 2006 - July 2007):
- Part 1: Improvements in XSLT: Explore which XSLT 2.0 features are likely to motivate an upgrade as you compare common applications of pure 1.0 syntax to the simpler and more versatile 2.0 syntax.
- Part 2: Five strategies for changing from XSLT 1.0 to 2.0: Look at higher-level decision factors for planning an upgrade to XSLT 2.0, setting the stage for using Backwards and Forwards Compatibility as transition tools.
- Part 3: Why the transition requires planning: Check out a buyer's guide to XSLT 2.0 processors.
- Part 4: The toolkit for XSLT portability: Learn about the complete toolkit for mixing code of different versions.
- Part 5: Make your stylesheets work with any processor version: Find out which stylesheets are portable between versions 1.0 and 2.0 and how to run both 1.0 and 2.0 processors for a long transition period.
- Part 6: How to mix XSLT versions for a 2.0 processor: Learn where and how to adapt your 1.0 legacy code to get equivalent results from a 2.0 processor.
- Part 7: Selection of XSLT 2.0 features and the 1.0 shortcomings they address: Discover how to apply XSLT 2.0 enhancements for data organization, expansion in XPath expression syntax, parameter passing across templates, string processing, and more.
- More articles by this author (Priscilla Walmsley, developerWorks, January 2010-current): Read articles about NIEM IEPD, XML, and other technologies.
- New to XML? Get the resources you need to learn XML.
- XML area on developerWorks: Find the resources you need to advance your skills in the XML arena, including DTDs, schemas, and XSLT. See the XML technical library for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- developerWorks on Twitter: Join today to follow developerWorks tweets.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
- developerWorks on-demand demos: Watch demos ranging from product installation and setup for beginners to advanced functionality for experienced developers.
Get products and technologies
- Saxon: Download the XSLT 2.0 processor used to test the examples in this article.
- HTML Tidy: Download an open source parser that turns HTML into well-formed XML.
- IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- XML zone discussion forums: Participate in any of several XML-related discussions.
- The developerWorks community: Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Priscilla Walmsley serves as Managing Director and Senior Consultant at Datypic. She specializes in XML technologies, architecture, and implementation. She is the author of Definitive XML Schema (Prentice Hall, 2001) and XQuery (O'Reilly Media, 2007). In addition, she is the co-author of Web Service Contract Design and Versioning for SOA (Prentice Hall, 2008). You can reach Priscilla at pwalmsley@datypic.com.




