
XHTML: The power of two languages

Extensible Hypertext Markup Language is a reformulation of HTML 4 in XML

Sathyan Munirathinam (sat_hyan@yahoo.com), Software Engineer, Aztec Software
Sathyan Munirathinam holds a Bachelor of Science in Computer Science and Master of Computer Applications from Madurai Kamaraj University. He has more than two years of experience in information technology working as a software engineer at Aztec Software. His professional interests are in database systems and networking, and his personal interests are reading technical journals, hacking network systems, and playing cricket. You can reach him at sat_hyan@yahoo.com.

Summary:  This article takes a pragmatic look at XHTML, a markup language that effectively bridges the gap between the simplicity of HTML and the extensibility of XML. It also covers the essential features of the various flavors of XHTML and includes discussions of the language and a number of real-world applications.

Date:  01 Jul 2002
Level:  Introductory

Being a Web developer is a tough job. Not only do you have to steer clear of the traps and pitfalls that the popular browsers throw at you on a daily basis, but you also have to keep at least half an eye on the myriad developments that may (or may not) have an impact on your job. You may have just barely mastered style sheets and DHTML, yet new techniques clamor for your attention. Which ones do you need to learn right away? Which ones can you dismiss for now? Traditional HTML may ultimately be put out to pasture with the emergence of Extensible Hypertext Markup Language, or XHTML.

XHTML overview

XHTML is a hybrid of HTML and XML designed to work across a range of Net devices, including Web browsers, PDAs, and cell phones. January 26, 2002 marked the second birthday of XHTML 1.0 as the official W3C recommendation for Web markup. But XHTML has yet to toddle, yet to smile, and yet to cry loud enough to get the attention of most Web designers.

W3C director Tim Berners-Lee put it this way: "XHTML 1.0 connects the present Web to the future Web...It provides the bridge to page and site authors for entering the structured data, XML world, while still being able to maintain operability with user agents that support HTML 4."

XHTML is a fairly rigid markup language. Its rules are very straightforward, and XHTML 1.0 itself offers you very little extensibility: you can't write your own definitions to dictate how the language behaves; you've got to follow its rules. XHTML 1.0 adopts the elements and attributes of HTML 4.0 but applies XML's stricter syntax, so a document must be structured and written methodically before it is valid.

XHTML can be used with cascading style sheets (CSS) to achieve presentation goals. Because an XHTML document is also an XML document, you can process it with Extensible Stylesheet Language (XSL) as well. Using this XML-based style technology, you can transform a document from one type to another, say, from an XHTML document to a PDF document.


Why would you want to use XHTML?

Normally, you might upgrade to a new version of a technology for new functions, or because problems with the previous version have been fixed. However, XHTML is a fairly faithful copy of HTML 4 as far as tag functionality goes, so don't expect any fancy new tags.

The W3C states that the primary advantages of XHTML are extensibility and portability:

Extensibility

XML documents are required to be well-formed (with elements nested properly). With HTML, the addition of a new group of elements requires alteration of the entire DTD. In an XML-based DTD, a new set of elements simply needs to be internally consistent and well-formed to be added to an existing DTD. This greatly eases the development and integration of new collections of elements.

Portability

Non-desktop devices are being used more and more frequently to access Internet documents. In most cases, these devices do not have the computing power of a desktop computer and aren't designed to accommodate ill-formed HTML, as standard desktop browsers tend to do. In fact, if these non-desktop browsers do not receive well-formed markup (HTML or XHTML), they may simply fail to display the document.
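Here is a rough illustration of that strictness. The sketch below uses the XML parser from Python's standard library as a stand-in for a non-forgiving client. Python is an assumption made for illustration, and a real device's parser will differ. The helper name is_well_formed is hypothetical. The parser accepts the properly nested fragment and rejects the overlapping one outright, rather than guessing what the author meant.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # A strict parser stops here instead of repairing the markup.
        return False
    return True

# Overlapping tags: desktop HTML browsers usually recover, an XML parser refuses.
print(is_well_formed("<p>Be <b>bold!</p></b>"))  # False
print(is_well_formed("<p>Be <b>bold!</b></p>"))  # True
```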


XHTML document structure

An XHTML document consists of three main parts:

  • DOCTYPE
  • Head
  • Body

The basic document structure is:

<!DOCTYPE ...>
<html  ... >
<head> ... </head>
<body> ... </body>
</html>

The <head> area contains information about the document, such as ownership, copyright, and keywords; and the <body> area contains the content of the document to be displayed.

Listing 1 shows you how this structure might be used in practice:


Listing 1. An XHTML example

1.  <?xml version="1.0"?>
2.  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3.  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4.  <head>
    <title>My XHTML Sample Page</title>
    </head>
5.  <body bgcolor="white">
    <center><h1>Welcome to XHTML !</h1></center>
    </body>
6.  </html>

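Line 1: Because XHTML is HTML expressed as XML, the document can begin with the XML declaration <?xml version="1.0"?>. The XHTML 1.0 specification strongly encourages this declaration. It is required only when the document's character encoding is something other than UTF-8 or UTF-16 and has not been set by a higher-level protocol such as HTTP.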

Line 2: An XHTML document must declare which of three standard sets of rules it follows. Each set of rules is stored in a separate document called a Document Type Definition (DTD), which a validator uses to check the accuracy of the document's structure. The <!DOCTYPE> line itself is the document type declaration that points to the DTD. The purpose of a DTD is to describe, in precise terms, the elements, attributes, and nesting allowed in XHTML.

Line 3: The root element of an XHTML document must be <html>, and its start tag must declare the XML namespace with the attribute xmlns="http://www.w3.org/1999/xhtml". The namespace identifies the vocabulary of tags used by the XHTML document. It ensures that names defined for XHTML don't conflict with user-defined tags or with tags defined in other DTDs. The xml:lang and lang attributes state the document's language.
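To see what the namespace does, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The second namespace, urn:example:catalog, and its title element are made up for illustration. The document is well-formed, though it is not valid against the XHTML 1.0 DTD. The parser qualifies every element name with its namespace URI, so the XHTML <title> and the catalog's <title> stay distinct even though their local names match.

```python
import xml.etree.ElementTree as ET

# urn:example:catalog is a made-up namespace for this illustration.
NS = {"x": "http://www.w3.org/1999/xhtml", "c": "urn:example:catalog"}

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cat="urn:example:catalog" xml:lang="en" lang="en">
  <head><title>Spring catalog</title></head>
  <body>
    <p>Featured: <cat:title>Garden tools</cat:title></p>
  </body>
</html>"""

root = ET.fromstring(DOC)
# ElementTree spells every name as "{namespace-URI}local-name".
print(root.tag)  # {http://www.w3.org/1999/xhtml}html

html_title = root.find("x:head/x:title", NS)
cat_title = root.find(".//c:title", NS)
print(html_title.tag, html_title.text)  # {http://www.w3.org/1999/xhtml}title Spring catalog
print(cat_title.tag, cat_title.text)    # {urn:example:catalog}title Garden tools

# Without the xmlns declaration, <html> is just an unqualified name, not XHTML.
print(ET.fromstring("<html/>").tag)     # html
```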

Line 4: XHTML documents must include a full header area. This area contains the opening <head> tag and the title tags (<title></title>), and is then completed with the closing </head> tag.

Line 5: XHTML documents must include opening and closing <body></body> tags. Within these tags you can place your traditional HTML coding tags. To be XHTML conformant, the coding of these tags must be well-formed.

Line 6: Finally, the XHTML document is completed with the closing </html> tag.
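The structural requirements walked through above can be checked mechanically. What follows is a minimal sketch in Python that assumes only the standard library. The function name structure_problems is hypothetical. The check covers only the skeleton: the XHTML namespace on <html>, then <head> containing <title>, then <body>. It does not perform full DTD validation. It runs against a copy of Listing 1 with the line numbers removed.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"x": XHTML_NS}

LISTING_1 = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>My XHTML Sample Page</title>
</head>
<body bgcolor="white">
<center><h1>Welcome to XHTML !</h1></center>
</body>
</html>"""

def structure_problems(markup: str) -> list:
    """Return a list of skeleton problems; an empty list means none found.

    Checks structure only; this is not a DTD validator.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    # Line 3: the root must be <html> in the XHTML namespace.
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element is not <html> in the XHTML namespace")
    # Lines 4 and 5: <head> followed by <body>, and nothing else.
    if [child.tag for child in root] != [f"{{{XHTML_NS}}}head", f"{{{XHTML_NS}}}body"]:
        problems.append("<html> must contain <head> followed by <body>")
    # Line 4: the head area needs a title.
    if root.find("x:head/x:title", NS) is None:
        problems.append("<head> has no <title>")
    return problems

print(structure_problems(LISTING_1))  # []
```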


XHTML DTD

When an XHTML document is created, the DTD to which it conforms is declared at the top of the document. Each DTD may be recognized by a unique label called a Formal Public Identifier, or FPI. The literal, or quoted, text following the word PUBLIC is an FPI referring to the W3C's XHTML 1.0 DTD.

XHTML 1.0 defines three document types:

  • Strict
  • Transitional
  • Frameset

Strict DTD

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
		"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Use this with CSS when you want really clean markup, free of presentational clutter. Several elements, such as <center>, have been removed from the language. Some attributes of other elements have been removed too, such as the align attribute of <h1>.

Transitional DTD

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
		"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Use this when you need to take advantage of HTML's presentation features, for example because many of your readers don't have the latest browsers that understand CSS. The transitional DTD supports most of the standard HTML 4 tags and attributes.

Frameset DTD

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
		"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

Use this when you want HTML frames to partition the browser window into two or more frames. Of the three DTDs, only this one holds the frameset definitions.
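Each DTD has its own FPI, so a program can tell which flavor a document claims by reading its DOCTYPE. Here is a minimal sketch using the expat parser bundled with Python. Expat is an assumption here; any XML parser that reports the document type declaration would do. The function name xhtml_flavor is hypothetical. Expat reports the public identifier without fetching the DTD itself.

```python
from xml.parsers import expat

# Formal Public Identifiers of the three XHTML 1.0 DTDs.
FLAVORS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "Frameset",
}

def xhtml_flavor(markup: str):
    """Return "Strict", "Transitional" or "Frameset", or None if the
    document has no recognized XHTML 1.0 DOCTYPE.

    Raises expat.ExpatError if the markup is not well-formed.
    """
    seen = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once with the parts of the <!DOCTYPE ...> declaration.
        seen["public_id"] = public_id

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(markup, True)
    return FLAVORS.get(seen.get("public_id"))

page = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Frames</title></head>
<frameset cols="50%,50%"><frame src="a.html" /><frame src="b.html" /></frameset>
</html>"""
print(xhtml_flavor(page))  # Frameset
```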


XHTML validation rules

An XHTML document must be well-formed XML, and to be valid it must also follow the rules of its DTD. In practice, that means:

Tag and attribute names must be written in lowercase. XML is case-sensitive, and the XHTML DTDs define every element and attribute name in lowercase.

HTML                          XHTML
<TD BGCOLOR="#ffcc33">        <td bgcolor="#ffcc33">

Elements must nest properly, with no overlapping. In XML and XHTML, you need to close tags in the reverse order from which you opened them: last opened, first closed.

HTML                          XHTML
<p>Be <b>bold!</p></b>        <p>Be <b>bold!</b></p>

All non-empty elements must be closed. In HTML, for example, many people use the <p> tag merely to separate paragraphs. But <p> is designed to mark the beginning of a paragraph, and the closing </p> tag marks its end. Because it contains the paragraph text, <p> is a non-empty element.

HTML                                       XHTML
First paragraph<p> Second paragraph<p>     <p>First paragraph</p> <p>Second paragraph</p>

Affected elements (those whose end tag is optional in HTML): <body>, <colgroup>, <dd>, <dt>, <head>, <html>, <li>, <option>, <p>, <tbody>, <thead>, <tfoot>, <th>, <td>, <tr>.
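The "last opened, first closed" rule is exactly what a stack enforces. This minimal Python sketch pushes each start tag onto a stack, pops it when the matching end tag arrives, and reports the first violation. It catches both overlapping tags and unclosed elements. The function name first_nesting_error is hypothetical. The regular expression is a deliberate simplification: it does not understand comments, CDATA sections, or attribute values that contain ">". It illustrates the idea and does not replace an XML parser.

```python
import re

# Start tags, end tags and self-closing tags; <!...> and <?...?> are skipped
# because they do not begin with a letter or "/".
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")

def first_nesting_error(markup: str):
    """Return a message for the first nesting or closing error, or None."""
    stack = []
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if self_closing:              # <br />, <img ... />: nothing to close later
            continue
        if not closing:
            stack.append(name)        # last opened ...
        elif not stack:
            return f"</{name}> has no matching start tag"
        elif stack[-1] != name:       # ... must be first closed
            return f"</{name}> found while <{stack[-1]}> is still open"
        else:
            stack.pop()
    if stack:
        return f"<{stack[-1]}> is never closed"
    return None

print(first_nesting_error("<p>Be <b>bold!</p></b>"))
print(first_nesting_error("First paragraph<p> Second paragraph<p>"))
print(first_nesting_error("<p>First paragraph</p> <p>Second paragraph</p>"))
```

Because the comparison is case-sensitive, <TD>...</td> is also reported as an error, just as an XML parser would report it.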

Empty elements must be terminated. Every empty element must use the XML empty-tag syntax, with a forward slash before the closing bracket (for example, <br> becomes <br />). Note the space between the element name (or its last attribute) and the closing delimiter />. It is there for compatibility with current browsers, which may misread the tag without it.

HTML                        XHTML
<hr>                        <hr />
<br>                        <br />
<input ... >                <input ... />
<param ... >                <param ... />
<img src="valid.gif">       <img src="valid.gif" />

Affected elements: <area>, <base>, <basefont>, <br>, <col>, <frame>, <hr>, <img>, <input>, <isindex>, <link>, <meta>, <param>.

Attribute values must be quoted. No more <img ... border=0>. You now need to put quotes around every attribute value, even a numeric one.

HTML                        XHTML
<img ... border=0>          <img ... border="0" />

Attributes cannot be minimized. Stand-alone attributes (also known as minimized attributes) are not allowed. For example, <option selected> is no longer valid. Instead, you must use <option selected="selected">.
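Several of these rules are mechanical, so a small rewriter can apply them. The sketch below assumes Python's standard html.parser module, which already lowercases tag and attribute names, accepts unquoted values, and reports a minimized attribute with the value None. The class and function names are hypothetical. The rewriter quotes every attribute value, expands minimized attributes, and terminates HTML 4's empty elements. It does not add missing end tags such as </p>, and it does not check which elements may contain which.

```python
from html import escape
from html.parser import HTMLParser

# HTML 4 elements whose end tag is forbidden (the empty elements).
EMPTY_ELEMENTS = {"area", "base", "basefont", "br", "col", "frame", "hr",
                  "img", "input", "isindex", "link", "meta", "param"}

class XhtmlTagRewriter(HTMLParser):
    """Re-serializes tags in XHTML syntax; text passes through unchanged."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _write_tag(self, tag, attrs, empty):
        parts = [tag]                          # names arrive lowercased
        for name, value in attrs:
            if value is None:                  # selected -> selected="selected"
                value = name
            parts.append(f'{name}="{escape(value, quote=True)}"')  # always quoted
        self.out.append("<" + " ".join(parts) + (" />" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        self._write_tag(tag, attrs, tag in EMPTY_ELEMENTS)

    def handle_startendtag(self, tag, attrs):
        self._write_tag(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:          # drop stray end tags like </br>
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

def to_xhtml_syntax(html: str) -> str:
    rewriter = XhtmlTagRewriter()
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out)

print(to_xhtml_syntax('<TD BGCOLOR="#ffcc33">'))         # <td bgcolor="#ffcc33">
print(to_xhtml_syntax('<img src="valid.gif" border=0>'))  # <img src="valid.gif" border="0" />
print(to_xhtml_syntax('<option selected>Red</option>'))   # <option selected="selected">Red</option>
```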

Inline tags cannot contain block-level tags. For example, an anchor tag cannot enclose a table.

Scripting elements pose a problem for XHTML compatibility. An XML parser reads the contents of a <script> element as ordinary parsed text. Characters such as < and & in your script are therefore treated as markup unless you enclose the script in a CDATA section. A JavaScript element would therefore look like this:

<script type="text/javascript">
<![CDATA[ alert("hello"); ]]>
</script>
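The alert call above happens to contain no markup characters, so it would parse even without the CDATA section. The section matters as soon as a script uses < or &. The following minimal Python sketch shows the difference. It uses only the standard library, and the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# A comparison in the script: to XML, "<" starts a tag.
RAW = '<script type="text/javascript">if (1 < 2) alert("hello");</script>'
WRAPPED = ('<script type="text/javascript"><![CDATA[ '
           'if (1 < 2) alert("hello"); ]]></script>')

def parses(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(parses(RAW))      # False
print(parses(WRAPPED))  # True
# The CDATA markers are gone; the parser hands back the script text unchanged.
print(repr(ET.fromstring(WRAPPED).text))
```

ElementTree merges CDATA content into the element's text, which is why the markers do not appear in the output.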

Many current browsers, however, do not handle the CDATA block correctly. For now, the most reliable solution is to load the JavaScript from an external file. For example:

<script type="text/javascript" src="main.js"></script>

For server-side programmers, an external file is a problem when the JavaScript is generated dynamically. The browser fetches the external file separately from the page. As a result, the ASP, JSP, or PHP code that builds your page cannot write values into that script while it renders the markup. When you modify JavaScript with ASP, JSP, or PHP scripting, use the standard HTML method of script declaration instead. This is the one place where making JSP or ASP pages 100% compatible with XHTML will be most problematic. Remember, however, that the goal is not to be 100% compatible with XHTML right away. The goal is to begin incorporating XHTML where feasible, which allows a quick and easy transition when one becomes necessary. When that time arrives, compatible browsers should be available, and you'll be set to make the jump to 100% compatibility.


XHTML Basic to replace CHTML and WML

A fundamental problem for developers who want to create mobile versions of their Web sites is that they currently have to format their pages three ways: in HTML for desktop browsing, in Wireless Markup Language (WML) for WAP devices, and in Compact HTML (CHTML) for i-mode devices. This has led to a new industry devoted to converting existing Web sites into WML or CHTML. WML is based on XML and replaces the near-obsolete Handheld Device Markup Language (HDML), while CHTML is based on HTML. Although these markup languages are similar, the differences between them prevent a Web page from being viewable by both WAP and i-mode devices. XHTML Basic is designed to be understood by all of these devices, making it a candidate for a universal markup language.

The complete XHTML Basic specification (see Resources) is available in English in several formats, including HTML, plain text, PostScript, and PDF. You can expect an inevitable push to replace languages like HDML and WML with XHTML Basic. However, it's important to remember that WML and HDML define actions as well as content, and these actions currently have no equivalent in XHTML. So, in the short term at least, WML and HDML aren't going to disappear. It will be interesting to see who wins out in the end. Plan on supporting all three markup languages at some point.


Future work in XHTML

One related area that's still under construction is device profiling, through Composite Capability/Preference Profiles (CC/PP). CC/PP allows a device such as a cell phone to identify itself to a Web server, describe its limitations, and download only the information that it's capable of displaying. CC/PP fits well with XHTML because XHTML is split into modules, so a server can deliver content built only from the modules a device supports.

The W3C is working on CC/PP in collaboration with the WAP Forum, among others. In the summer of 2001, work began on XHTML 2.0, the final step on the bridge between HTML and XML. XHTML 2.0 is forward-looking in its incorporation of XML technologies such as XLink, XPointer, XPath, and XInclude, all of which are either in development or recently released by the W3C (see the roadmap in Resources).


Conclusion

XHTML breaks new ground on the Web, giving authors a way to mix and match various XML-based languages and documents on their Web pages. It also provides a framework for nontraditional Web access devices -- from toasters to television sets -- to identify themselves and their capabilities to Web servers, pulling down only information that those devices can display. Thanks to XHTML, you can continue writing in the HTML you've come to know and love. You may just need to clean it up a bit. My guess is that XHTML 2.0 (see Resources) will specifically clean up HTML tags and their usage.

In conclusion, XHTML makes it easy to create documents that can be seen by all kinds of new devices. Additionally, with a little studying, you can create much more powerful pages than ever before. Lastly, XHTML is the bridge to XML -- the future language of the Internet.


Resources
