When a Web server sends a document to a browser, it prefixes the document with a response header like the one shown in Listing 1. This header contains metadata telling the browser how to interpret the document. One of the most important pieces of metadata is the Content-Type in the last line, which tells the browser how to render the content. For instance, a browser uses different code to display a JPEG than it does to display a GIF. Most importantly, many browsers use different code to display XHTML than they do to display Hypertext Markup Language (HTML).
Listing 1. A typical HTTP response header
HTTP/1.1 200 OK
Date: Thu, 04 Jan 2007 19:39:13 GMT
Server: Apache/2
Last-Modified: Wed, 06 Sep 2006 11:19:37 GMT
ETag: "4dfce0-c4aa-26828440"
Accept-Ranges: bytes
Content-Length: 50346
Content-Style-Type: text/css
Content-Type: application/xhtml+xml
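As a rough illustration of how a client might read this metadata, the following Python sketch parses the header fields from Listing 1 and extracts the media type. It uses only the standard library. The names RAW_RESPONSE and media_type_of are illustrative assumptions, not part of any browser or server.

```python
from email.parser import Parser

# Header text copied from Listing 1.
RAW_RESPONSE = """\
HTTP/1.1 200 OK
Date: Thu, 04 Jan 2007 19:39:13 GMT
Server: Apache/2
Last-Modified: Wed, 06 Sep 2006 11:19:37 GMT
ETag: "4dfce0-c4aa-26828440"
Accept-Ranges: bytes
Content-Length: 50346
Content-Style-Type: text/css
Content-Type: application/xhtml+xml
"""


def media_type_of(raw_response: str) -> str:
    """Return the media type named in the Content-Type header field."""
    # The first line is the status line; the header fields follow it.
    _status_line, _, header_block = raw_response.partition("\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    # get_content_type() drops parameters such as charset and
    # falls back to text/plain when the field is missing.
    return headers.get_content_type()


print(media_type_of(RAW_RESPONSE))
```

A real browser does far more than this, but the lookup shows where the decision between rendering paths begins: a single header field.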
Web servers are supposed to tag XHTML documents with the media type application/xhtml+xml. Web browsers that recognize this media type take this as a signal to work in strict mode rather than in tag soup mode. This enables more reliable display and is especially important for Cascading Style Sheets (CSS) layouts and JavaScript™ programs based on the document's object model. Indeed, in some cases the same document can display in two different ways depending on whether it's processed in tag soup mode or strict mode. If you go to the trouble of generating well-formed or even valid XHTML, strict mode is what you plan for and desire.
A browser that does not support XHTML can still handle a well-formed document in tag soup mode. The results won't be perfect, but they'll be passable for the small fraction of users running very old browsers. They'll also be acceptable to the much larger fraction of users running the standards-nonconformant Internet Explorer. However, current versions of Internet Explorer (including 6 and 7) do not recognize the application/xhtml+xml media type. If you send Internet Explorer an application/xhtml+xml document, it will instead offer to save the file, as shown in Figure 1.
Figure 1. Internet Explorer doesn't know what to do with application/xhtml+xml
Therefore, when serving XHTML, maximum compatibility requires sending application/xhtml+xml to Firefox, Safari, Opera, and other standards-conformant browsers and text/html to Internet Explorer. You send the same file in both cases. You just change the media type that tags it in the Hypertext Transfer Protocol (HTTP) header. When using the Apache Web server, you can do this in the server config file or in the .htaccess file in an individual directory.
Apache configuration directives
By default, Apache decides what media type to send with each file by inspecting the file's extension. The extension-type mappings are stored in the mime.types file in the httpd/conf directory (usually somewhere like the /usr/httpd/conf or /etc/httpd/conf directory). For example, Listing 2 shows part of the mime.types file from Apache 2.0.
Listing 2. Apache's mime.types
# This file controls what Internet media types are sent to the client for
# given file extension(s). Sending the correct media type to the client
# is important so they know how to handle the content of the file.
# Extra types can either be added here or by using an AddType directive
# in your config files. For more information about Internet media types,
# please read RFC 2045, 2046, 2047, 2048, and 2077. The Internet media type
# registry is at <http://www.iana.org/assignments/media-types/>.

# MIME type                              Extensions
application/atom+xml                     atom
application/mathematica
application/mathml+xml                   mathml
application/msword                       doc
application/octet-stream                 bin dms lha lzh exe class so dll dmg
application/postscript                   ai eps ps
application/rdf+xml                      rdf
application/reginfo+xml
application/xhtml+xml                    xhtml xht
application/xslt+xml                     xslt
application/xml                          xml xsl
application/xml-dtd                      dtd
application/xml-external-parsed-entity
application/zip                          zip
audio/mpeg                               mpga mp2 mp3
image/jpeg                               jpeg jpg jpe
image/naplps
image/png                                png
image/svg+xml                            svg
image/tiff                               tiff tif
text/html                                html htm
text/plain                               asc txt
text/sgml                                sgml sgm
text/xml
text/xml-external-parsed-entity
video/mpeg                               mpeg mpg mpe
Some older versions don't install all of these mappings by default and might, in fact, use some actively harmful mappings. In particular, using text/xml rather than application/xml for raw XML files is a common problem.
However, with these default mappings all you need to do is suffix your XHTML files with .xhtml or .xht instead of .html and all such files will be served as application/xhtml+xml. This is great for Firefox, Opera, and Safari but not so nice for Internet Explorer. What you need is a way to send one media type to Internet Explorer and a different one to everyone else.
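To make the extension-to-type lookup concrete, here is a minimal Python sketch that reads a few lines in the mime.types format and maps a file name to a media type. It is a simplification under stated assumptions: the helper names parse_mime_types and lookup_media_type are hypothetical, the sample is a small excerpt of Listing 2, and only the final extension of the file name is considered, which is less than what Apache itself does.

```python
from typing import Dict, Optional

# A short excerpt in the same format as Listing 2.
SAMPLE = """\
# MIME type              Extensions
application/xhtml+xml    xhtml xht
application/xml          xml xsl
text/html                html htm
text/xml
"""


def parse_mime_types(text: str) -> Dict[str, str]:
    """Map each listed file extension to its media type."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        media_type, *extensions = line.split()
        # A type with no extensions (such as text/xml here) adds nothing.
        for ext in extensions:
            mapping[ext.lower()] = media_type
    return mapping


def lookup_media_type(filename: str, mapping: Dict[str, str]) -> Optional[str]:
    """Look up a file's media type by its final extension, if it has one."""
    _, dot, ext = filename.rpartition(".")
    return mapping.get(ext.lower()) if dot else None


TYPES = parse_mime_types(SAMPLE)
print(lookup_media_type("index.xhtml", TYPES))
print(lookup_media_type("index.html", TYPES))
```

Note that in this excerpt text/xml lists no extensions, so no file is ever mapped to it by name alone.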
In 2007, it's safe to assume that all browsers that aren't Internet Explorer recognize application/xhtml+xml. (If you really want to support very old browsers, it's easy enough to hack the rules I present below.) Thus you need to identify Internet Explorer in all its versions and change the media type to text/html. Fortunately, Internet Explorer helpfully tells the server who it is when it sends an HTTP request, as shown in Listing 3.
Listing 3. Internet Explorer's HTTP request header
GET /test/a.xhtml HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel,
application/msword, application/vnd.ms-powerpoint, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
Host: www.xom.nu
Connection: Keep-Alive
The key is the User-Agent field. While for legacy reasons Internet Explorer is pretending to be Netscape at the start, the string MSIE identifies this as Internet Explorer. All versions of Internet Explorer contain this string in the User-Agent field, and no other modern browsers do.
You need to configure the server to look at the User-Agent field in the header and send text/html to Internet Explorer and application/xhtml+xml to everyone else. The mod_rewrite module is not limited to rewriting URLs. It can change the media type sent in the HTTP response header too, and it can do that based on the User-Agent. Listing 4 shows the necessary code to place in your config file.
Listing 4. Sending text/html to Internet Explorer
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} .*MSIE.*
RewriteCond %{REQUEST_URI} \.xhtml$
RewriteRule .* - [T=text/html]
The first line turns the rewrite engine on.
The second line, the first rewrite condition, indicates that the following rule applies only when the user-agent string in the HTTP request header contains the substring MSIE. The regular expression .*MSIE.* accomplishes this.
The third line, the second rewrite condition, indicates that the following rule applies only when the browser requests a file that has the extension .xhtml. Regular .html files should be served as normal text/html to all browsers.
The last line is the actual rewrite rule. This rule is a little unusual because the rewrite doesn't actually change anything in the URL. You match the entire URL (.*), but the dash (-) in the substitution position leaves it unchanged. However, the final [T=text/html] flag changes the Content-Type header to text/html. If both conditions are matched, this rule is used. Otherwise it isn't.
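The logic of Listing 4 can be modeled outside Apache as a quick sanity check. The Python sketch below is only an approximation: it reuses the two regular expressions from the rewrite conditions, but the function rewritten_type and the Firefox user-agent string are illustrative assumptions, and Python's re module stands in for Apache's regular expression engine, which should behave the same for patterns this simple.

```python
import re

# Patterns copied from the two RewriteCond lines in Listing 4.
USER_AGENT_PATTERN = re.compile(r".*MSIE.*")
REQUEST_URI_PATTERN = re.compile(r"\.xhtml$")


def rewritten_type(user_agent: str, request_uri: str, default_type: str) -> str:
    """Return text/html when both conditions match, else the default type."""
    both_match = (
        USER_AGENT_PATTERN.search(user_agent) is not None
        and REQUEST_URI_PATTERN.search(request_uri) is not None
    )
    return "text/html" if both_match else default_type


# User-agent string from Listing 3.
IE_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)"
# Illustrative sample only; not taken from the article.
FIREFOX_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) "
    "Gecko/20061204 Firefox/2.0.0.1"
)

print(rewritten_type(IE_AGENT, "/test/a.xhtml", "application/xhtml+xml"))
print(rewritten_type(FIREFOX_AGENT, "/test/a.xhtml", "application/xhtml+xml"))
```

The default_type argument plays the role of the media type Apache would have chosen from the file extension before the rule ran.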
Depending on your server setup, this code can go in one of several places:
- The main httpd.conf file
- The VirtualHost section within the httpd.conf file or a separate virtual host config file
- An .htaccess file in the directory where the XHTML files reside.
These instructions should work in both Apache 1.3 and 2.0. If you're using these rules in an .htaccess file and it doesn't seem to work, make sure that directory is set to allow overrides in the main httpd.conf file, like so:
<Directory /var/www/foo>
  AllowOverride FileInfo
</Directory>
If you can, you should support one more browser that does not recognize application/xhtml+xml: Lynx. Lynx is a text mode browser mostly used by automated scripts and UNIX® geeks working in a shell. It has a minuscule market share, but its unique abilities make it important enough to be worth catering to if you can do so without inconveniencing other users.
Fortunately, all Lynx user-agent strings begin with the word "Lynx," which no other user-agent strings contain. Thus, all you need to do to support Lynx is add that string to the rewrite condition regular expression, as shown in Listing 5:
Listing 5. Sending text/html to Lynx
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ((.*MSIE.*)|(Lynx.*))
RewriteCond %{REQUEST_URI} \.xhtml$
RewriteRule .* - [T=text/html]
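The new regular expression in the second line matches any user-agent string that contains MSIE or Lynx. Because the expression is not anchored, Lynx is matched wherever it appears, which in practice means at the start of the string. If you discover another browser that doesn't play well with application/xhtml+xml, you can add its user-agent string in a similar fashion.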
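One way to try out such an alternation before deploying it is to run it against a handful of user-agent strings. The Python sketch below builds the alternation from a list of sub-patterns, so adding another browser is a one-line change. The helper names and the Lynx and Firefox sample strings are assumptions for illustration, and Python's re module is used in place of Apache's regular expression engine.

```python
import re
from typing import Iterable, Pattern


def build_condition(patterns: Iterable[str]) -> Pattern[str]:
    """Join sub-patterns into a single alternation."""
    return re.compile("|".join(f"({p})" for p in patterns))


# Sub-patterns from Listing 5; append more for other browsers.
TEXT_HTML_BROWSERS = [r".*MSIE.*", r"Lynx.*"]
CONDITION = build_condition(TEXT_HTML_BROWSERS)


def needs_text_html(user_agent: str) -> bool:
    """True when the user agent should be sent text/html."""
    return CONDITION.search(user_agent) is not None


SAMPLE_AGENTS = {
    # From Listing 3.
    "ie": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)",
    # Illustrative samples only; not taken from the article.
    "lynx": "Lynx/2.8.5rel.1 libwww-FM/2.14",
    "firefox": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) "
               "Gecko/20061204 Firefox/2.0.0.1",
}

for name, agent in SAMPLE_AGENTS.items():
    print(name, needs_text_html(agent))
```

Keeping the sub-patterns in a list makes it easy to see exactly which browsers are being special-cased and to test each addition against real request logs.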
Pretending to be Internet Explorer
Some older versions of Opera and Safari did pretend to be Internet Explorer by including MSIE in their user-agent strings. However, you don't have to worry about this for two reasons:
- Very few people use those versions any more.
- Unlike more recent versions of Safari and Opera, those older versions work better with text/html than application/xhtml+xml anyway.
For more precise and accurate targeting, see Resources for links to more complete lists of user-agent strings and browser support for application/xhtml+xml that you can consult.
XHTML is the future of the Web. However, like many other important technologies, its adoption has been hampered by poor support in Microsoft browsers. As this article shows, there's no reason to wait for Microsoft. You can easily serve correct XHTML to non-Microsoft browsers while still telling Internet Explorer to treat it as tag soup. Visitors and page authors using modern browsers will get the full benefit of XHTML, while visitors hobbled by Internet Explorer will still get most of the content. Setting the Multipurpose Internet Mail Extensions (MIME) media type properly isn't the only thing you need to do to serve XHTML to legacy browsers, but it is a big step in the right direction.
Learn
- XHTML 1.0: Marking up a new dawn (Molly Holzschlag, developerWorks, February 2005): Learn what XHTML 1.0 is and what it means to you, the Web developer, in this introduction to XHTML and its capabilities.
- mod_rewrite documentation: Explore the capabilities of this very powerful (if often opaque) module and rewrite requested URLs on the fly.
- Sending XHTML as text/html Considered Harmful (Ian Hickson): Learn why just sending all XHTML as text/html is a bad idea and why you should deliver your XHTML markup as application/xhtml+xml.
- XHTML Media Types: Visit the World Wide Web Consortium (W3C) definitions of official media types for XHTML content and learn more about the best current practice for the Internet media types that serve XHTML Family documents.
- Content negotiation support in Apache 2.0: Learn how to send different files to different clients. If you want to maintain separate XHTML and HTML copies of each document, you might find this useful.
- Compatibility guidelines in the XHTML 1.0 specification: Make it easier to process your documents as both XHTML and tag soup.
- User-agent strings topic (Wikipedia): Explore the most complete collection of user-agent strings that the author has seen.
- XHTML media type test - results: Study the details of this W3C-maintained list on how well different versions of various browsers support application/xhtml+xml.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML and XML Schema: See developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
Get products and technologies
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.

Elliotte Rusty Harold is originally from New Orleans, to which he returns periodically in search of a decent bowl of gumbo. However, he resides in the Prospect Heights neighborhood of Brooklyn with his wife Beth and cats Charm (named after the quark) and Marjorie (named after his mother-in-law). He's an adjunct professor of computer science at Polytechnic University, where he teaches Java and object-oriented programming. His Cafe au Lait Web site has become one of the most popular independent Java sites on the Internet, and his spin-off site, Cafe con Leche, has become one of the most popular XML sites. His most recent book is Java I/O, 2nd edition. He's currently working on the XOM API for processing XML, the Jaxen XPath engine, and the Jester test coverage tool.