Level: Introductory
Sathyan Munirathinam (sat_hyan@yahoo.com), Software Engineer, Aztec Software
01 Jul 2002
This article takes a pragmatic look at XHTML, a markup language that effectively bridges the gap between the simplicity of HTML and the extensibility of XML. It also covers the essential features of the various flavors of XHTML and includes discussions of the language and a number of real-world applications.
Being a Web developer is a tough job. Not only do you have to steer clear of the traps and pitfalls that the popular browsers throw at you on a daily basis, but you also have to keep at least half an eye on the myriad developments that may (or may not) have an impact on your job. You may have just barely mastered style sheets and DHTML, yet new techniques clamor for your attention. Which ones do you need to learn right away? Which ones can you dismiss for now? Traditional HTML may ultimately be put out to pasture with the emergence of Extensible Hypertext Markup Language, or XHTML.
XHTML overview
XHTML is a hybrid of HTML and XML that's specifically designed for Net device displays (which include Web browsers, PDA devices, and cell phones). January 26, 2002 marked the second birthday of XHTML 1.0 as the official W3C recommendation for Web markup. But XHTML has yet to toddle, yet to smile, and yet to cry loud enough to get the attention of most Web designers. W3C director Tim Berners-Lee put it this way: "XHTML 1.0 connects the present Web to the future Web...It provides the bridge to page and site authors for entering the structured data, XML world, while still being able to maintain operability with user agents that support HTML 4."
XHTML is a fairly rigid markup language. Its rules are straightforward, and it has very little extensibility of its own -- that is, you can't write your own definitions to dictate how the language behaves; you have to follow its rules. XHTML 1.0 adopts the concepts introduced in HTML 4.0, but a document must be structured methodically before it is valid.
XHTML can be used with cascading style sheets (CSS) to achieve presentation goals. It also allows you to use Extensible Stylesheet Language (XSL) with transformations. By using this XML-based style technology, you can transform a document from one type to another -- say, from an XHTML document to a PDF document.
Why would you want to use XHTML?
Normally, you might upgrade to a new version of a technology for new functions, or because problems with the previous version have been fixed. However, XHTML is a fairly faithful copy of HTML 4, as far as tag functionalities go, so don't expect any fancy new tags. The W3C states that the primary advantages of XHTML are extensibility and portability:
Extensibility
XML documents are required to be well-formed (with elements nested properly). With HTML, the addition of a new group of elements requires alteration of the entire DTD. In an XML-based DTD, a new set of elements simply needs to be internally consistent and well-formed to be added to an existing DTD. This greatly eases the development and integration of new collections of elements.
Portability
Non-desktop devices are being used more and more frequently to access Internet documents. In most cases, these devices do not have the computing power of a desktop computer and aren't designed to accommodate ill-formed HTML, as standard desktop browsers tend to do. In fact, if these non-desktop browsers do not receive well-formed markup (HTML or XHTML), they may simply fail to display the document.
XHTML document structure
An XHTML document consists of three main parts: the DOCTYPE declaration, the head, and the body. The basic document structure is:
<!DOCTYPE ...>
<html ... >
<head> ... </head>
<body> ... </body>
</html>
The <head> area contains information about the document, such as ownership, copyright, and keywords; and the <body> area contains the content of the document to be displayed. Listing 1 shows you how this structure might be used in practice:
Listing 1. An XHTML example
1. <?xml version="1.0"?>
2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN" "DTD/xhtml1-transitional.dtd">
3. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
4. <head>
<title>My XHTML Sample Page</title>
</head>
5. <body bgcolor="white">
<center><h1>Welcome to XHTML !</h1></center>
</body>
6. </html>
Line 1: Since XHTML is HTML expressed in an XML document, it should begin with the XML declaration <?xml version="1.0"?>. The declaration is strongly encouraged rather than strictly required; it becomes necessary when the document uses a character encoding other than UTF-8 or UTF-16 and no higher-level protocol specifies the encoding.
Line 2: XHTML documents must be identified by one of three standard sets of rules. Each set of rules is stored in a separate document called a Document Type Definition (DTD), which the DOCTYPE declaration names and which is used to validate the structure of the XHTML document. The purpose of a DTD is to describe, in precise terms, the language and syntax allowed in XHTML.
Line 3: The root element of an XHTML document must be <html>, and its opening tag must identify the XML namespace with the xmlns="http://www.w3.org/1999/xhtml" attribute. The XML namespace identifies the set of tags used by the XHTML document. It ensures that names used by one DTD don't conflict with user-defined tags or tags defined in other DTDs.
Line 4: XHTML documents must include a full header area. This area contains the opening <head> tag and the title tags (<title></title>), and is then completed with the closing </head> tag.
Line 5: XHTML documents must include opening and closing <body></body> tags. Within these tags you can place your traditional HTML coding tags. To be XHTML conformant, the coding of these tags must be well-formed.
Line 6: Finally, the XHTML document is completed with the closing </html> tag.
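The walk-through of Listing 1 can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as a stand-in for an XML-aware user agent; the describe helper is a hypothetical name introduced for illustration. It parses Listing 1 (with the reference line numbers removed) and reports the namespace-qualified root element, the two main parts beneath it, and the title.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Listing 1 from this article, without the reference line numbers.
LISTING_1 = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>My XHTML Sample Page</title>
</head>
<body bgcolor="white">
<center><h1>Welcome to XHTML !</h1></center>
</body>
</html>"""


def describe(document):
    """Hypothetical helper: return the root tag, child names, and title text."""
    root = ET.fromstring(document)  # raises ET.ParseError if not well-formed
    namespaces = {"x": XHTML_NS}
    # Tags are reported as "{namespace}name"; keep only the local name.
    children = [child.tag.split("}")[1] for child in root]
    title = root.find("x:head/x:title", namespaces).text
    return root.tag, children, title


root_tag, children, title = describe(LISTING_1)
print(root_tag)   # {http://www.w3.org/1999/xhtml}html
print(children)   # ['head', 'body']
print(title)      # My XHTML Sample Page
```

The parser only checks well-formedness here; it does not fetch the DTD or validate the document against it.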
XHTML DTD
When an XHTML document is created, the DTD to which it conforms is declared at the top of the document. Each DTD may be recognized by a unique label called a Formal Public Identifier, or FPI. The literal, or quoted, text following the word PUBLIC is an FPI referring to the W3C's XHTML 1.0 DTD. Currently, there are three XHTML document types:
- Strict
- Transitional
- Frameset
Strict DTD
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
Use this with CSS when you want really clean markup, free of presentational clutter. Several tags have been removed from the language (like <center>), and some attributes of other tags have been removed as well (like the align attribute of the <h1> tag).
Transitional DTD
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"DTD/xhtml1-transitional.dtd">
Use this when you need to take advantage of HTML's presentation features because many of your readers don't have the latest browsers that understand CSS. The transitional DTD supports most of the standard HTML 4 tags and attributes.
Frameset DTD
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"DTD/xhtml1-frameset.dtd">
This enables you to use HTML frames to partition the browser window into two or more frames. This DTD holds the frameset definitions.
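Because each DTD is recognized by its Formal Public Identifier, a program can tell the three flavors apart by reading the DOCTYPE. Here is a minimal sketch, assuming Python's standard xml.dom.minidom module; the xhtml_flavor helper, the lookup table, and the sample page are hypothetical names introduced for illustration, and the identifiers are the three given in this section.

```python
from xml.dom import minidom

# Public identifiers from the three DOCTYPE declarations in this section.
XHTML_1_0_FLAVORS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "Frameset",
}


def xhtml_flavor(document):
    """Hypothetical helper: name the XHTML 1.0 flavor the DOCTYPE declares."""
    doctype = minidom.parseString(document).doctype
    if doctype is None:  # the document has no DOCTYPE declaration at all
        return None
    return XHTML_1_0_FLAVORS.get(doctype.publicId)


STRICT_PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Strict page</title></head>
<body><p>Hello</p></body>
</html>"""

print(xhtml_flavor(STRICT_PAGE))  # Strict
print(xhtml_flavor("<html/>"))    # None
```

This only reads the declaration; whether the page actually conforms to the declared DTD is a separate question for a validating parser or the W3C validation service.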
XHTML validation rules
An XHTML document must be well-formed XML. It must conform to basic XML syntax:
Tag and attribute names must be written in lower-case.
| HTML | XHTML |
| <TD BGCOLOR="#ffcc33"> | <td bgcolor="#ffcc33"> |
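The lower-case rule is easy to check by machine. An XML parser is case-sensitive and accepts upper-case names as long as they match, so this rule comes from the XHTML DTDs rather than from XML syntax itself. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module and fragments without namespaces; non_lowercase_names is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(fragment):
    """Hypothetical helper: list element and attribute names that are not lower-case."""
    offenders = []
    for element in ET.fromstring(fragment).iter():
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for name in element.attrib:
            if name != name.lower():
                offenders.append(name)
    return offenders


print(non_lowercase_names('<tr><TD BGCOLOR="#ffcc33">cell</TD></tr>'))
# ['TD', 'BGCOLOR']
print(non_lowercase_names('<tr><td bgcolor="#ffcc33">cell</td></tr>'))
# []
```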
Elements must nest; no overlapping. With XML and XHTML, you need to close the tags in reverse order -- in other words: last opened, first closed.
| HTML | XHTML |
| <p>Be <b>bold!</p></b> | <p>Be <b>bold!</b></p> |
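To see the nesting rule enforced, you can hand both table entries to an XML parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Hypothetical helper: True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Be <b>bold!</p></b>"))  # False: the tags overlap
print(is_well_formed("<p>Be <b>bold!</b></p>"))  # True: last opened, first closed
```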
All non-empty elements must be closed. For example, with HTML, many people use the <p> tag to separate paragraphs. This tag is designed to mark the beginning and (with the closing </p> tag) end of a paragraph. That makes it a non-empty tag since it contains the paragraph text.
| HTML | XHTML |
| First paragraph<p> | <p>First paragraph</p> |
| Second paragraph<p> | <p>Second paragraph</p> |
Affected elements: <body>, <colgroup>, <dd>, <dt>, <head>, <html>, <li>, <option>, <p>, <tbody>, <thead>, <tfoot>, <th>, <td>, <tr>.
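When an element is left open, an XML parser typically reports the problem only when it reaches a closing tag that does not match. The sketch below, assuming Python's standard xml.etree.ElementTree module, wraps the two paragraph examples in a <body> element and asks where the first error occurs; first_error_line is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET

UNCLOSED = """<body>
First paragraph<p>
Second paragraph<p>
</body>"""

CLOSED = """<body>
<p>First paragraph</p>
<p>Second paragraph</p>
</body>"""


def first_error_line(fragment):
    """Hypothetical helper: line number of the first XML syntax error, or None."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as error:
        line, _column = error.position
        return line
    return None


# The unclosed <p> tags are only noticed when </body> arrives on line 4.
print(first_error_line(UNCLOSED))  # 4
print(first_error_line(CLOSED))    # None
```

Note that the reported line is where the parser gave up, not where the missing </p> belongs, which is one reason unclosed elements can be tedious to track down in long pages.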
Empty elements must be terminated. All empty elements must use the XML empty tag syntax with a trailing forward slash before the end bracket (for example, <br> becomes <br />). Note the space between the element name and the closing delimiter, />. This is for compatibility with current browsers.
| HTML | XHTML |
| <hr> | <hr /> |
| <br> | <br /> |
| <input ... > | <input ... /> |
| <param ... > | <param ... /> |
| <img src="valid.gif"> | <img src="valid.gif" /> |
Affected elements: <area>, <base>, <basefont>, <br>, <col>, <frame>, <hr>, <img>, <input>, <isindex>, <link>, <meta>, <param>.
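If you generate markup from a program, an XML library will usually produce the empty-tag syntax for you. As an illustration, and assuming Python's standard xml.etree.ElementTree module, the sketch below serializes two empty elements and then checks how a parser reacts to the HTML and XHTML forms of a line break; the parses helper is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

# An element with no content is written in the XML empty-tag form,
# including the space before the closing delimiter.
br_markup = ET.tostring(ET.Element("br"), encoding="unicode")
img_markup = ET.tostring(ET.Element("img", {"src": "valid.gif"}), encoding="unicode")
print(br_markup)   # <br />
print(img_markup)  # <img src="valid.gif" />


def parses(fragment):
    """Hypothetical helper: True if the fragment is well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(parses("<p>one<br>two</p>"))    # False: <br> is never terminated
print(parses("<p>one<br />two</p>"))  # True
```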
Attribute values must be quoted. No more <img ... border=0>. You now need to put quotes around every attribute value, even if it's numeric.
| HTML | XHTML |
| <img ... border=0> | <img ... border="0" /> |
Attribute value pairs cannot be minimized. No stand alone attributes (also known as minimized attributes) are allowed. For example, <option selected> is no longer valid. Instead, you must use <option selected="selected">.
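Both attribute rules (quoted values and no minimized attributes) are part of basic XML syntax, so any XML parser enforces them. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sample fragments and the results table are made up for illustration.

```python
import xml.etree.ElementTree as ET

FRAGMENTS = [
    '<img src="valid.gif" border=0 />',            # unquoted value
    '<img src="valid.gif" border="0" />',          # quoted value
    "<option selected>One</option>",               # minimized attribute
    '<option selected="selected">One</option>',    # full name="value" pair
]


def is_well_formed(fragment):
    """Hypothetical helper: True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


results = {fragment: is_well_formed(fragment) for fragment in FRAGMENTS}
for fragment, ok in results.items():
    print("accepted" if ok else "rejected", fragment)
```

Only the second and fourth fragments are accepted.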
Inline tags cannot contain block-level tags. For example, an anchor tag cannot enclose a table.
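Unlike the rules above, this one is a content-model restriction from the DTD, so a plain well-formedness check will not catch it. The following rough sketch, assuming Python's standard xml.etree.ElementTree module, walks a parsed fragment and reports block-level elements found directly inside inline ones. The INLINE and BLOCK sets are small illustrative subsets chosen for this example, not the full lists from the XHTML DTDs, and misplaced_blocks is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative subsets only; the XHTML DTDs define the complete lists.
INLINE = {"a", "b", "em", "i", "span", "strong"}
BLOCK = {"div", "h1", "p", "table", "ul"}


def misplaced_blocks(fragment):
    """Hypothetical helper: (inline, block) pairs where a block is a direct child."""
    problems = []
    for parent in ET.fromstring(fragment).iter():
        if parent.tag in INLINE:
            for child in parent:
                if child.tag in BLOCK:
                    problems.append((parent.tag, child.tag))
    return problems


anchor_around_table = '<div><a href="#top"><table><tr><td>x</td></tr></table></a></div>'
anchor_inside_table = '<table><tr><td><a href="#top">x</a></td></tr></table>'

print(misplaced_blocks(anchor_around_table))  # [('a', 'table')]
print(misplaced_blocks(anchor_inside_table))  # []
```

For real pages, validating against the declared DTD is the more complete check; this sketch only looks at direct parent and child pairs.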
Scripting elements pose a problem for XHTML compatibility. An XML parser treats the content of a script element as XML markup, so characters such as < and & are misread unless you enclose your script in a CDATA block. Therefore, a JavaScript element would now look like this:
<script type="text/javascript">
<![CDATA[ alert("hello"); ]]>
</script>
This can be a problem for most current browsers, as they do not like the CDATA block. For now, the most dependable solution is to call the JavaScript from an external file. For example:
<script language="JavaScript" type="text/javascript" src="main.js"></script>
For server-side programmers, this can be a problem when the JavaScript is modified dynamically. Using a separate file source for your JavaScript prevents you from being able to dynamically change your JavaScript. Because the JavaScript is included on the client side, the server side isn't able to touch it. When modifying JavaScript using ASP, JSP, or PHP scripting, use the standard HTML method of script declaration. This is the one place where making JSP or ASP 100% compatible with XHTML will be most problematic. Remember, however, the goal is not to be 100% compatible with XHTML, but to begin incorporating XHTML where feasible, allowing a quick and easy transition when that is necessary. When that time arrives, new compatible browsers should be available and you'll be set to make the jump to 100% compatibility.
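To see why the CDATA block discussed earlier in this section matters to an XML parser, consider a script that uses the < and && operators. The sketch below assumes Python's standard xml.etree.ElementTree module; the script body and the script_text helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

BARE = """<script type="text/javascript">
if (1 < 2 && 2 < 3) { alert("hello"); }
</script>"""

WRAPPED = """<script type="text/javascript">
<![CDATA[
if (1 < 2 && 2 < 3) { alert("hello"); }
]]>
</script>"""


def script_text(fragment):
    """Hypothetical helper: the script's text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None


# Without CDATA, "<" and "&" are read as the start of markup.
print(script_text(BARE))     # None
# Inside CDATA the same characters are passed through as plain text.
print(script_text(WRAPPED))
```

An external script file sidesteps the issue entirely, because the script text never passes through the XML parser.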
XHTML Basic to replace CHTML and WML
A fundamental problem for developers who want to create mobile versions of their Web sites is that they currently have to format their pages in HTML for desktop browsing, in Wireless Markup Language (WML) for WAP devices, and in Compact HTML (CHTML) for iMode devices. This has led to a new industry devoted to converting existing Web sites into WML or CHTML. WML is based on XML and replaces the near-obsolete Handheld Device Markup Language (HDML), while CHTML is based on HTML. Although these markup languages are similar, the differences between them prevent a Web page from being viewable by both WAP and iMode devices.
XHTML Basic is intended to be a universal markup language that all of these devices can understand. The complete XHTML Basic specification (see Resources) is available in English in several formats, including HTML, plain text, PostScript, and PDF. You can expect an inevitable push to replace languages like HDML and WML with XHTML Basic. However, it's important to remember that WML and HDML define actions as well as content, and these actions currently have no equivalent in XHTML. So, in the short term at least, WML and HDML aren't going to disappear. It will be interesting to see which wins out in the end. Until then, plan on supporting all three markup languages at some point.
Future work in XHTML
One aspect of XHTML that's still under construction is device profiling, also known as Composite Capabilities/Preference Profiles (CC/PP). CC/PP allows a device such as a cell phone to identify itself to a Web server, describe its limitations, and download only the information that it's capable of displaying. CC/PP works because XHTML documents can be split into modules that can be downloaded separately. The W3C is working on CC/PP in collaboration with the WAP Forum, among others.
In the summer of 2001, work began on XHTML 2.0, the final step on the bridge between HTML and XML. XHTML 2.0 is forward-looking with its incorporation of XML technologies such as XLink, XPointer, XPath, and XInclude -- all of which are currently in development or recently released by the W3C (see the roadmap in Resources).
Conclusion
XHTML breaks new ground on the Web, giving authors a way to mix and match various XML-based languages and documents on their Web pages. It also provides a framework for nontraditional Web access devices -- from toasters to television sets -- to identify themselves and their capabilities to Web servers, pulling down only information that those devices can display.
Thanks to XHTML, you can continue writing in the HTML you've come to know and love. You may just need to clean it up a bit. My guess is that XHTML 2.0 (see Resources) will specifically clean up HTML tags and their usage.
XHTML makes it easy to create documents that can be seen by all kinds of new devices, and with a little studying, you can create much more powerful pages than ever before. Above all, XHTML is the bridge to XML -- the future language of the Internet.
Resources
- Review the W3C XHTML 1.0 specification, which defines a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4.
- Check out XHTML.org for news and information about XHTML.
- Read an introduction and overview of XHTML that includes an explanation of the differences between XHTML and HTML 4.
- Find out more about XHTML Basic.
- Look at HTML Working Group Roadmap, which lays out a clear picture of future development in XHTML, including information on XHTML 2.0.
- View Encyclozine.com, an example of a site built in XHTML.
- To validate an XHTML page, try the W3C HTML Validation Service.
- Find more XML resources on the developerWorks XML technology zone.
- Find more Web resources on the developerWorks Web Architecture topic.
- Get IBM WebSphere Studio Application Developer, an easy-to-use, integrated development environment for building, testing, and deploying J2EE applications, including generating XML documents from DTDs and schemas.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
About the author
Sathyan Munirathinam holds a Bachelor of Science in Computer Science and a Master of Computer Applications from Madurai Kamaraj University. He has more than two years of experience in information technology, working as a software engineer at Aztec Software. His professional interests are in database systems and networking, and his personal interests are reading technical journals, hacking network systems, and playing cricket. You can reach him at sat_hyan@yahoo.com.