XHTML is HTML written as well-formed XML, which means that the markup must adhere to XML rules. These rules are stricter than those for HTML. For example:
- Tag names are case sensitive and must be lowercase: <p>, not <P>.
- Quote attribute values: <input type="checkbox">, not <input type=checkbox>.
- If an attribute is applied, provide a value. For HTML attributes with no defined value, use the name of the attribute as its value: <option selected="selected">, not <option selected>. Valueless attributes include checked, disabled, selected, and nowrap.
- Properly nest tags: <b><i>…</i></b>, not <b><i>…</b></i>.
- Do not omit optional closing tags: <p>…</p><p>…</p>, not <p>…<p>….
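The lowercase, quoting, and attribute-value rules above can be applied mechanically. The following Python sketch is only an illustration: the names StartTagNormalizer and normalize_start_tags are hypothetical, and the sketch rewrites start tags only, without checking nesting or closing tags.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class StartTagNormalizer(HTMLParser):
    """Collects every start tag, rewritten to follow the XHTML attribute rules."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            # A valueless attribute arrives as None: use its name as its value.
            parts.append(f"{name}={quoteattr(name if value is None else value)}")
        self.tags.append("<" + " ".join(parts) + ">")


def normalize_start_tags(html):
    """Return the start tags found in html, rewritten in XHTML style."""
    parser = StartTagNormalizer()
    parser.feed(html)
    parser.close()
    return parser.tags


print(normalize_start_tags("<INPUT type=checkbox checked>"))
```

The sketch leans on the standard library's lenient HTML parser to read the sloppy input, so the only logic it adds is the rule for valueless attributes.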
In one area, however, XML is less strict than HTML—namely, in how tags are closed. With XML, you can close empty elements (that is, any element without text or other tags within it) using either a short form (self-closing) or long form with a separate closing tag:
- Short form, with or without a space before the forward slash (/): <tag/> or <tag />
- Long form: <tag></tag>
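To see that an XML parser draws no distinction between these forms, the following Python sketch parses all three and compares what the parser reports. The helper name shape is hypothetical, and the standard library's ElementTree stands in for any XML parser.

```python
import xml.etree.ElementTree as ET


def shape(markup):
    """Return what an XML parser sees: tag name, attributes, text, child count."""
    elem = ET.fromstring(markup)
    return (elem.tag, dict(elem.attrib), elem.text, len(elem))


forms = ["<tag/>", "<tag />", "<tag></tag>"]
shapes = [shape(form) for form in forms]

# All three spellings produce the same empty element.
print(shapes[0] == shapes[1] == shapes[2])
```

Because the parsed result is identical, the choice of form is made only when the document is written back out, which is why the rest of this article concentrates on serialization.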
With HTML, however, some tags require closing tags, while others prohibit them.
Tags that require closing tags include <a>, <abbr>, <acronym>, <address>, <b>, <big>, <blockquote>, <button>, <code>, <dir>, <div>, <em>, <font>, <form>, <h1>, <i>, <label>, <li>, <map>, <ol>, <pre>, <script>, <span>, <strong>, <style>, <sub>, <table>, <tt>, <ul>, and <xml>.
Tags that prohibit closing tags include <area>, <base>, <br>, <col>, <frame>, <img>, <isindex>, <link>, <meta>, and <param>.
In addition, the W3C recommends placing a space before the trailing slash (/) of a self-closing tag to improve compatibility with browsers:
- Recommended: <input type="checkbox" />
- Not recommended: <input type="checkbox"/>
See Resources for links to the HTML Compatibility Guidelines.
Because XHTML is XML, XSLT can transform XHTML. The original intent of XSLT was as a flexible and powerful means of converting XML data to HTML. The wide adoption of XML technologies—especially XHTML—has broadened the number of applications that XSLT solves. XHTML can be an input to a transformation, generated by it, or both. Using XSLT to produce XHTML presents the problem of how to close empty tags in a way that conforms to HTML.
What happens if empty tags are improperly closed?
- A script tag that downloads a JavaScript file fails to get the file if the tag is closed in short form.
  Fails: <script type="text/javascript" src="myfile.js" />
  Succeeds: <script type="text/javascript" src="myfile.js"></script>
- A self-closing empty <div> tag is treated as an opening tag. Because an HTML parser ignores the trailing slash, the <div> element stays open and captures the elements and text that follow it as its own contents until an enclosing element is closed. For example:
  <div id="mydiv1" />
  <p>This paragraph will be contained within mydiv1</p>
  <div id="mydiv2"></div>
  <p>This paragraph is meant to follow mydiv1 but is captured by it as well</p>
  The browser interprets the markup as follows, with the implied closing </div> tag added and noted as a comment:
  <div id="mydiv1">
  <p>This paragraph is contained within mydiv1</p>
  <div id="mydiv2"></div>
  <p>This paragraph is meant to follow mydiv1 but is captured by it as well</p>
  </div> <!-- implied closing tag -->
- A single <br> element expressed in long form, <br></br>, is interpreted as two elements, <br><br>, which doubles the number of line breaks.
Three solutions for properly closing XHTML tags exist, depending on the development environment. The first, serialization, involves writing code (for example, C# or Java™ code) to convert an XML document object to a string. Serialization is the most complex solution, but it's also the most flexible. The other two solutions depend on the version of XSLT; XSLT 2.0 is the easiest of the three.
Serialization is the process of converting an object in memory to a string suitable for storage in a file system or transmission over a network. Whether your own code serializes an object model to XHTML or the XSLT transform already returns a string, you solve the problem of properly closing empty XHTML tags by controlling serialization.
If the result of a transform is an object, serialize tags that prohibit closing tags in short, self-closing form:
"<" tag-name [ attributes ] " />"
Close all other empty tags with a separate closing tag:
"<" tag-name [ attributes ] "></" tag-name ">"
Here are two examples in C#: one for an
XmlTextWriter and the other for
a StringWriter. In Listing 1,
XhtmlTextWriter is derived from XmlTextWriter
and overrides the WriteEndElement method to close the
element in either short form or long form.
Listing 1. XhtmlTextWriter
public class XhtmlTextWriter : System.Xml.XmlTextWriter
{
    // Name and namespace of the most recently started element. When an
    // element is empty, these describe the element that is being closed.
    private string tagName = string.Empty;
    private string elementNamespace = string.Empty;

    public XhtmlTextWriter(System.IO.TextWriter w)
        : base(w)
    {
    }

    public override void WriteEndElement()
    {
        bool isShortNotation = true;

        // Apply the HTML rules only to elements in no namespace
        // or in the XHTML namespace.
        if (string.IsNullOrEmpty(this.elementNamespace) ||
            (this.elementNamespace.Contains("www.w3.org") &&
             this.elementNamespace.Contains("xhtml")))
        {
            switch (this.tagName)
            {
                // Tags that prohibit a closing tag keep the short form.
                case "area":
                case "base":
                case "basefont":
                case "bgsound":
                case "br":
                case "col":
                case "frame":
                case "hr":
                case "img":
                case "input":
                case "isindex":
                case "keygen":
                case "link":
                case "meta":
                case "param":
                    isShortNotation = true;
                    break;
                // Every other tag gets a separate closing tag.
                default:
                    isShortNotation = false;
                    break;
            }
        }

        if (isShortNotation)
        {
            base.WriteEndElement();
        }
        else
        {
            base.WriteFullEndElement();
        }
    }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        this.tagName = localName.ToLower();
        this.elementNamespace = ns;
        base.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteStartDocument()
    {
        // Don't emit XML declaration
    }

    public override void WriteStartDocument(bool standalone)
    {
        // Don't emit XML declaration
    }
}
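The decision that Listing 1 makes when it closes an element does not depend on C#. As a rough illustration, the following Python sketch serializes a parsed tree with the same rule: short form for empty tags that prohibit a closing tag, long form for everything else. The names VOID_TAGS and serialize_xhtml are hypothetical, and the sketch assumes elements in no namespace and ignores comments and processing instructions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Tags that prohibit a closing tag (the names checked in Listing 1).
VOID_TAGS = frozenset({
    "area", "base", "basefont", "bgsound", "br", "col", "frame", "hr",
    "img", "input", "isindex", "keygen", "link", "meta", "param",
})


def serialize_xhtml(elem):
    """Serialize elem and its descendants, choosing the closing form per tag."""
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    text = escape(elem.text or "")
    # Each child is followed by its tail, the text up to the next sibling.
    children = "".join(
        serialize_xhtml(child) + escape(child.tail or "") for child in elem)
    if not text and not children and elem.tag in VOID_TAGS:
        return f"<{elem.tag}{attrs} />"                        # short form
    return f"<{elem.tag}{attrs}>{text}{children}</{elem.tag}>"  # long form


doc = ET.fromstring('<div><br></br><script src="a.js"/><p>hi</p></div>')
print(serialize_xhtml(doc))
```

The input deliberately uses the wrong form for both the br and script elements; the serializer ignores how the input was written and decides the form from the tag name alone.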
Listing 2 shows the XhtmlStringWriter
class, which is derived from StringWriter and
overrides the Write method to convert long form to
short form for those tags that require it. You can write similar methods for other
programming languages, such as the Java language.
Listing 2. XhtmlStringWriter
public class XhtmlStringWriter : System.IO.StringWriter
{
    public override void Write(string value)
    {
        bool isShortNotation = false;

        switch (value)
        {
            // Long-form closings of tags that prohibit a closing tag.
            case "></area>":
            case "></base>":
            case "></basefont>":
            case "></bgsound>":
            case "></br>":
            case "></col>":
            case "></frame>":
            case "></hr>":
            case "></img>":
            case "></input>":
            case "></isindex>":
            case "></keygen>":
            case "></link>":
            case "></meta>":
            case "></param>":
                isShortNotation = true;
                break;
        }

        if (isShortNotation)
        {
            // Replace the long-form closing with the short form.
            base.Write(" />");
        }
        else
        {
            base.Write(value);
        }
    }
}
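A variation on Listing 2 is to correct the finished string rather than intercept each write. The following Python sketch takes that route with a regular expression; it is an assumption-laden illustration, not a port of the listing. The names VOID_TAGS and shorten_void_tags are hypothetical, and a tag whose attribute values contain < or > is left unchanged.

```python
import re

# Tags that prohibit a closing tag (the names used in Listing 2).
VOID_TAGS = (
    "area", "base", "basefont", "bgsound", "br", "col", "frame", "hr",
    "img", "input", "isindex", "keygen", "link", "meta", "param",
)

# Matches <name attrs></name> for a void tag. The attribute text may not
# contain < or >, so unusual tags fail to match instead of being mangled.
_LONG_FORM = re.compile(r"<(%s)((?:\s[^<>]*)?)></\1\s*>" % "|".join(VOID_TAGS))


def shorten_void_tags(markup):
    """Rewrite the long form of void tags, such as <br></br>, as <br />."""
    return _LONG_FORM.sub(
        lambda m: "<%s%s />" % (m.group(1), m.group(2).rstrip()), markup)


print(shorten_void_tags('<p>a<br></br>b<img src="x.png" alt=""></img></p>'))
```

Working on the whole string avoids depending on how a particular writer splits its output into separate write calls, at the cost of a pattern that is deliberately conservative.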
First, ensure that the XSLT output method is xml, not html. The html method produces HTML, not XHTML, and HTML is not XML. Neither an XSLT processor nor an XML parser can process HTML.
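The difference between the two output methods is easy to observe with any serializer that offers both. The following Python sketch uses the standard library's ElementTree serializer as a stand-in for an XSLT processor (an assumption made for illustration only), and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring("<div><br/><p>text</p></div>")

# The xml method keeps every element closed; the html method drops the
# closing of <br>, which an XML parser then rejects.
as_xml = ET.tostring(root, encoding="unicode", method="xml")
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml, is_well_formed(as_xml))
print(as_html, is_well_formed(as_html))
```

Output written with the html method cannot be fed back into an XML toolchain, which is the practical reason to stay with the xml method when the goal is XHTML.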
If the result of a transform is a string or file, control serialization indirectly by coding the XSLT templates to force the correct closing of empty tags. The form in which empty tags are closed depends on the implementation of the XSLT processor.
If the input is also XHTML, use identity templates to copy unchanged tags to the output. Identity templates process the input elements and attributes and copy them to the output. Without identity templates, only the text between tags is copied to the output.
The XSLT in Listing 3, which lacks identity templates, outputs only plain text.
Listing 3. Results are plain text only
<?xml version='1.0' ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
The XSLT in Listing 4 has identity templates to copy elements that are not processed by other templates. An identity template matches a node and copies it. Two options to copy elements exist: this example uses xsl:copy; the other option uses xsl:element and is discussed later.
Listing 4. Results include tags but might not be properly closed
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- put your templates here -->
  <!-- identity templates -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
Controlling how tags are closed
To properly close tags, the identity templates must select the tags that require short form. Selecting tags in a template's match expression requires knowing the tag's namespace or knowing that no namespace is used. The trick to controlling whether an empty tag is rendered in short form or long form lies in whether the template processes child nodes: do not process child nodes to get the short form, and process child nodes to get the long form, even if there are no child nodes to process. In this regard, the XSLT processor makes a difference. The Microsoft processors (Microsoft® .NET and MSXML) honor this trick and output tags in short form when child nodes are not processed. Other processors, such as Saxon, always use short form for empty tags, so for HTML elements that require a closing tag, some text must be inserted. For most elements, a space is appropriate. For the <script> tag, a JavaScript comment token (that is, //) separates the opening and closing tags. Fortunately, this approach also works with Microsoft processors.
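The filler-text idea is independent of XSLT and can be tried against any serializer that always uses short form for empty elements. The following Python sketch applies it to a parsed tree before handing the tree to the standard library's ElementTree serializer, which behaves that way by default. The names VOID_TAGS and add_fillers are hypothetical, and elements are assumed to be in no namespace.

```python
import xml.etree.ElementTree as ET

# Tags that prohibit a closing tag and must stay empty.
VOID_TAGS = frozenset({
    "area", "base", "basefont", "bgsound", "br", "col", "frame", "hr",
    "img", "input", "isindex", "keygen", "link", "meta", "param",
})


def add_fillers(root):
    """Give every other empty element filler text so it keeps a closing tag."""
    for elem in root.iter():
        is_empty = not elem.text and len(elem) == 0
        if is_empty and elem.tag not in VOID_TAGS:
            # A JavaScript comment token for script, a space for the rest.
            elem.text = "//" if elem.tag == "script" else " "
    return root


root = add_fillers(ET.fromstring('<div><br/><script src="a.js"/><span/></div>'))
result = ET.tostring(root, encoding="unicode")
print(result)
```

The serializer is unchanged; only the tree differs, which mirrors how the XSLT templates steer a processor without touching its serialization code.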
The Microsoft .NET or MSXML processor
If the input document has no namespace, as in Listing 5, the XSLT does not require a namespace, either.
Listing 5. XHTML input document without a namespace
<html> ... </html>
Listing 6 shows XSLT that matches the tags that must be self-closing. The self-closing tags are processed such that the Microsoft XSLT processors use the short form. Because there is no namespace, the tag names do not have a namespace prefix.
Listing 6. XSLT without a namespace
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- identity templates -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="area[not(node())]|base[not(node())]|
      basefont[not(node())]|bgsound[not(node())]|br[not(node())]|
      col[not(node())]|frame[not(node())]|hr[not(node())]|
      img[not(node())]|input[not(node())]|isindex[not(node())]|
      keygen[not(node())]|link[not(node())]|meta[not(node())]|
      param[not(node())]">
    <!-- identity without closing tags -->
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
If the input document has a namespace, as in Listing 7, the XSLT requires a namespace, and the tag names require a prefix.
Listing 7. XHTML input document with namespace
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"> ... </html>
Listing 8 shows XSLT that matches the tags that must
be self-closing. Because there is a namespace, the tag names require a
namespace prefix. Without a prefix, the tags do not match. Note the XHTML namespace declaration begins with xmlns:htm. The prefix, htm, is arbitrary.
Listing 8. XSLT with a namespace
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:htm="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- identity templates -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="htm:area|htm:base|htm:basefont|
      htm:bgsound|htm:br|htm:col|htm:frame|htm:hr|htm:img|
      htm:input|htm:isindex|htm:keygen|htm:link|htm:meta|
      htm:param">
    <!-- identity without closing tags -->
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
Other processors, such as Saxon
If the input document has no namespace, as in Listing 9, the XSLT does not require a namespace, either.
Listing 9. XHTML input document without a namespace
<html> ... </html>
Listing 10 shows XSLT that matches the tags that
must be self-closing. Tags that require a separate closing tag but are empty
are output with a space to prevent them being serialized using the short form.
The exception is empty script elements, which
are given a JavaScript comment symbol (//). Because
there is no namespace, the tag names do not have a namespace prefix.
Listing 10. XSLT with matching self-closing tags
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- identity templates -->
  <xsl:template match="*[not(node())]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:text> </xsl:text>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="script[not(node())]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:text>//</xsl:text>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="area[not(node())]|base[not(node())]|
      basefont[not(node())]|bgsound[not(node())]|br[not(node())]|
      col[not(node())]|frame[not(node())]|hr[not(node())]|
      img[not(node())]|input[not(node())]|isindex[not(node())]|
      keygen[not(node())]|link[not(node())]|meta[not(node())]|
      param[not(node())]">
    <!-- identity without closing tags -->
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
If the input document has a namespace, as in the XHTML document in Listing 11, the XSLT requires a namespace, and the tag names require a prefix.
Listing 11. XHTML input document with namespace
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
  ...
</html>
Listing 12 shows XSLT that matches the tags that must
be self-closing. Tags that require a separate closing tag but are empty are
output with a space to prevent them being serialized using the short form. The
exception is empty script elements, which are
given a JavaScript comment symbol (//). Because
there is a namespace, the tag names require a namespace prefix. Without a
prefix, the tags would not match. Note the XHTML namespace declaration begins with xmlns:htm. The prefix, htm, is arbitrary.
The template with a negative priority allows the match expression for self-closing tags to have a higher priority. Without it, the template for self-closing tags is ignored.
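The need for an explicit priority follows from the default priorities that XSLT 1.0 assigns to match patterns. The following Python sketch approximates those defaults for simple patterns like the ones in these listings. The function name default_priority is hypothetical, the sketch expects one alternative at a time (no | in the pattern), and it is a simplified reading of the rules rather than a full pattern parser.

```python
import re

_NCNAME = r"[A-Za-z_][\w.-]*"


def default_priority(pattern):
    """Approximate the XSLT 1.0 default priority of one pattern alternative."""
    step = pattern.strip()
    # A leading child or attribute axis does not change the priority.
    for axis in ("child::", "attribute::", "@"):
        if step.startswith(axis):
            step = step[len(axis):]
            break
    if re.fullmatch(rf"(?:{_NCNAME}:)?{_NCNAME}", step):
        return 0.0    # one specific name, such as htm:br
    if re.fullmatch(r"processing-instruction\((['\"]).*\1\)", step):
        return 0.0    # one specific processing-instruction target
    if re.fullmatch(rf"{_NCNAME}:\*", step):
        return -0.25  # any name in one namespace, such as htm:*
    if step in ("*", "node()", "text()", "comment()", "processing-instruction()"):
        return -0.5   # a bare node test
    return 0.5        # predicates, several steps, and everything else


for pattern in ("*", "htm:br", "*[not(node())]", "htm:script[not(node())]"):
    print(pattern, default_priority(pattern))
```

Read against these values, the pattern *[not(node())] would default to 0.5 and outrank htm:br at 0, so empty br elements would receive a space and a closing tag. Lowering it to -0.5 puts it below the named tags. It then shares its priority with the plain * template, and XSLT 1.0 processors that recover from such a tie choose the template that occurs last in the stylesheet.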
Listing 12. XSLT with matching self-closing tags and a namespace
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:htm="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- identity templates -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="htm:area|htm:base|htm:basefont|
      htm:bgsound|htm:br|htm:col|htm:frame|htm:hr|htm:img|
      htm:input|htm:isindex|htm:keygen|htm:link|htm:meta|
      htm:param">
    <!-- identity without closing tags -->
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[not(node())]" priority="-0.5">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:text> </xsl:text>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="htm:script[not(node())]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:text>//</xsl:text>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
Controlling the output namespace
To exclude the XHTML namespace from the output, such as when converting to
another XML format, use the <xsl:element>
tag rather than <xsl:copy>, as in
Listing 13.
Listing 13. XSLT template that excludes an output namespace
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:htm="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity templates -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="htm:area|htm:base|htm:basefont|
      htm:bgsound|htm:br|htm:col|htm:frame|htm:hr|
      htm:img|htm:input|htm:isindex|htm:keygen|
      htm:link|htm:meta|htm:param">
    <!-- identity without closing tags -->
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="*[not(node())]" priority="-0.5">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*"/>
      <xsl:text> </xsl:text>
    </xsl:element>
  </xsl:template>
  <xsl:template match="htm:script[not(node())]">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*"/>
      <xsl:text>//</xsl:text>
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*|text()">
    <xsl:copy/>
  </xsl:template>
  <xsl:template match="comment()">
    <xsl:comment xml:space="preserve">
      <xsl:value-of select="."/>
    </xsl:comment>
  </xsl:template>
  <xsl:template match="processing-instruction()">
    <xsl:processing-instruction name="{name()}">
      <xsl:value-of select="."/>
    </xsl:processing-instruction>
  </xsl:template>
</xsl:stylesheet>
With XSLT 2.0, another output method is available: xhtml. As the name implies, it solves the problem of producing correctly closed empty XHTML tags. If the input document uses the XHTML namespace, specify that namespace in the xpath-default-namespace attribute so that unprefixed element names in match expressions refer to XHTML elements. Listing 14 shows the method attribute on the xsl:output tag and the xpath-default-namespace attribute on the xsl:stylesheet tag, where it applies to every template in the stylesheet.
To use XSLT 2.0, use an XSLT processor that supports it, such as Saxon. At this time, Microsoft processors do not support XSLT 2.0.
Listing 14. XSLT 2.0
<?xml version="1.0" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">
  <xsl:output method="xhtml"/>
  <!-- put your templates here -->
  <!-- identity templates -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
Controlling the output namespace in XSLT 2.0
To exclude the XHTML namespace from the output, such as when you convert to
another XML format, use the <xsl:element> tag
rather than <xsl:copy>, as in Listing 15.
Listing 15. XSLT 2.0 template that excludes an output namespace
<?xml version="1.0" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">
  <xsl:output method="xhtml"/>
  <!-- put your templates here -->
  <!-- identity templates -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
You must close XHTML tags properly, either with a separate tag or self-closing, depending on the tag name. When you produce XHTML by an XSLT transformation, the method for controlling how tags are closed depends on the XSLT processor. The universal but complex solution is to write a serialization method. Other solutions for XSLT 1.0 involve coding the XSL templates in a certain way. The easiest solution by far is XSLT 2.0, which has native support for XHTML.
Resources
Learn
- XHTML: The power of two languages (Sathyan Munirathinam, developerWorks, July 2002): Learn more about XHTML.
- XHTML 1.0: Marking up a new dawn (Molly Holzschlag, developerWorks, February 2005): Read about the introduction of XHTML and its standards.
- XHTML 1.0 The Extensible HyperText Markup Language (Second Edition): Read the W3C's XHTML recommendation.
- The empty elements as well as element minimization and empty element content: Learn more in the W3C's HTML Compatibility Guidelines.
Doug Domeny has developed a browser-based, multilingual, business user-friendly XML editor written using XSLT, W3C XML Schema, DHTML, JavaScript, jQuery, regular expressions, and CSS. Holding a bachelor's degree in computer science and mathematics from Gordon College in Wenham, MA, Doug has served for many years on OASIS technical committees such as XML Localization Interchange File Format (XLIFF) and Open Architecture for XML Authoring and Localization (OAXAL). In his roles as a software engineer, he has developed significant skills in software engineering and architecture, UI design, and technical writing.




