Working XML: Serve friendlier RSS and Atom feeds

Sensible alternatives for novices to RSS and Atom

RSS and Atom feeds are popping up like mushrooms on Web sites. They are popular because they offer a simple mechanism for loyal visitors to register with a site and be notified of updates. Still, they are not always easy on users, particularly those with older browsers. In this article, Benoît offers a technique to help visitors to your site read and understand your RSS and Atom feeds.


Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft

Benoît Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. You can contact him at bmarchal@pineapplesoft.com or through his personal site at www.marchal.com.



24 October 2006

Also available in Chinese and Japanese

Catering to a larger audience

RSS and Atom feeds offer a very effective solution for visitors to subscribe to your site and be notified when new items are made available. They are growing in popularity because visitors are increasingly concerned about their privacy and have become wary of spam. Atom and RSS allow visitors to keep in touch with your site without requiring them to provide any personal data.

RSS became very popular with blogs, but it is not limited to them: every site benefits from building a loyal readership.

One of the challenges for webmasters is that RSS and Atom are still very new: few people have heard of them, and fewer still understand how to use them or have the right software installed.

Specifically, you must place a link to the RSS or Atom file on your Web site so that visitors can subscribe, but when they click on it, many visitors are presented with XML code... which is not a very friendly sight.

Web syndication is in a transition period. The benefits are already here and it is worthwhile to build RSS into your Web site, but many users are not yet properly equipped to subscribe to RSS feeds. Apple Safari was the first major browser with built-in RSS support (in 2005). It was soon followed by Firefox (version 1.5), Internet Explorer (version 7) and Opera (version 9).

By this time next year, every major browser will boast excellent support for RSS. It might take another year or two before the majority of users upgrade, though, so it will be a few years before you can safely assume that all your visitors have RSS-capable browsers.

In the meantime, you can provide an alternative that works with almost every browser.

Since RSS is XML, treat it as such

What happens? The RSS or Atom feed is an XML document. When the user clicks it, the browser downloads the document and attempts to display it as XML, that is, as raw code. To make things worse, some versions of Internet Explorer display a security warning.

The situation is not easy. Technically, RSS feeds should have the application/rss+xml MIME type, while Atom feeds should be identified as application/atom+xml. If the MIME type is correct and the visitor has a news aggregator properly set up on his machine, then the browser will launch it automatically.

In practice, few visitors have the proper configuration, so they are more likely to see a cryptic error message. Consequently, most Web sites use the text/xml or application/xml MIME type, which is technically incorrect but at least causes the browser to display the raw XML code. It's only a slight improvement over an error message but, hey, take what you can.

To make matters worse, some sites serve XML documents as application/octet-stream due to misconfiguration. The webmaster must update the server configuration to use the most appropriate MIME type. For example, with the popular Apache Web server, this can be done with an AddType directive in an .htaccess file.
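The .htaccess route only applies to Apache. As a minimal sketch of the same configuration on another server, assuming you test your feeds locally with Python's standard library, the handler below serves the current directory and labels .rss and .atom files with the feed MIME types. The file extensions, the port, and the FeedHandler and serve names are assumptions made for this illustration.

```python
import mimetypes
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Feed file extensions and the MIME types to send for them. These are the
# registered types; replace them with "application/xml" if you prefer the
# fallback that lets older browsers display the feed (and its stylesheet).
FEED_TYPES = {
    ".rss": "application/rss+xml",
    ".atom": "application/atom+xml",
}

# Register the types with the mimetypes module so guess_type() knows them.
for ext, mime in FEED_TYPES.items():
    mimetypes.add_type(mime, ext)


class FeedHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels feed files with FEED_TYPES."""

    # The handler consults extensions_map before falling back to other lookups.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, **FEED_TYPES}


def serve(port: int = 8000) -> None:
    """Serve the current directory on the given port until interrupted."""
    with ThreadingHTTPServer(("", port), FeedHandler) as httpd:
        httpd.serve_forever()
```

Call serve() from the directory that holds your feed and stylesheet, then open the feed through http://localhost:8000/ to see how your browser reacts to each MIME type.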

To alleviate the problem, the most recent browsers sniff incoming XML files to categorize them properly. Sniffing simply means that they read the first few bytes looking for RSS or Atom tags. But, again, that requires the visitor to use an RSS-aware browser.
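To make the idea concrete, here is a simplified sniffer in Python. The 512-byte window, the function name and the two patterns (an rss root element, or a feed element in the Atom namespace) are assumptions for illustration; they are not the rules any particular browser applies.

```python
import re
from typing import Optional

# Heuristic patterns: an RSS 2.0 document has an <rss> root element, and an
# Atom document has a <feed> root element in the Atom namespace.
_RSS_ROOT = re.compile(rb"<rss\b")
_ATOM_ROOT = re.compile(rb"<feed\b[^>]*http://www\.w3\.org/2005/Atom")


def sniff_feed(data: bytes, limit: int = 512) -> Optional[str]:
    """Guess whether data is an RSS or Atom feed from its first bytes.

    Returns "rss", "atom" or None. A simplified illustration only.
    """
    head = data[:limit]  # only the beginning of the download is inspected
    if _RSS_ROOT.search(head):
        return "rss"
    if _ATOM_ROOT.search(head):
        return "atom"
    return None
```

Because only the first bytes are examined, a feed whose root element appears after a long comment or DOCTYPE would not be recognized, which is one reason why sniffing remains a best-effort measure.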

Stylesheet to the rescue

Fortunately, there's a better solution: an XSLT stylesheet. If the browser treats the feed as an XML document, it will use the stylesheet to render a sensible page. If, on the other hand, the browser recognizes an RSS or Atom feed, it will ignore the stylesheet. Voilà, the best of both worlds!

Listing 1 is an RSS document associated with a stylesheet (an excerpt from my podcast's feed). Note that the second line is an xml-stylesheet processing instruction: this is the crucial link to the stylesheet. Its href attribute is the path to the stylesheet.

Listing 1. RSS excerpt
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="rssfeed.xsl"?>
<rss version="2.0">
 <channel>
  <title>Declencheur</title>
  <link>http://www.declencheur.com/</link>
  <description>Le podcast qui parle photos.</description>
  <language>fr</language>
  <pubDate>Tue, 11 Jul 2006 15:31:46 +0200</pubDate>
  <item>
   <title>Cadrage</title>
   <link>http://www.declencheur.com/clic/archives/2006/07/esposito</link>
   <description><![CDATA[<p>En 20 minutes, Nicolas nous resume son cours
    d'esthetique. Il nous livre de nombreuses techniques, astuces et regles
    d'esthetique pour ameliorer nos photos.</p>]]></description>
   <pubDate>Mon, 03 Jul 2006 21:30:21 +0200</pubDate>
   <enclosure length="34812500" type="audio/mpeg"
     url="http://www.declencheur.com/clic/medias/2006/decl-2006-07-03.mp3" />
  </item>
  <item>
   <title>Complement visuel : cadrage</title>
   <link>http://www.declencheur.com/clic/archives/2006/07/esposito-visuel</link>
   <description><![CDATA[<p>Le complement visuel de l'episode 6 est maintenant
     disponible !</p>]]></description>
   <pubDate>Sat, 01 Jul 2006 12:07:30 +0200</pubDate>
   <enclosure length="2738765" type="application/pdf"
     url="http://www.declencheur.com/clic/medias/2006/decl-2006-07-01.pdf"/>
  </item>
  </channel>
</rss>

Listing 2 is the stylesheet. If you are familiar with XSLT, you can probably write a similar stylesheet in minutes, except for one quirk covered in the next section ("The Firefox pitfall"), so feel free to skip ahead to it. If you are not familiar with XSLT, read on: I cover the bare-bones minimum needed to process RSS in the remainder of this section.

Listing 2. XSLT stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="html"/>

<xsl:template match="/">
 <html>
  <head>
   <title><xsl:value-of select="rss/channel/title"/></title>
   <script type="text/javascript" src="xsl_mop-up.js"/>
  </head>
  <body onload="go_decoding();">
   <h1><a href="{rss/channel/link}"><xsl:value-of select="rss/channel/title"/></a></h1>
   <!-- Test element for the xsl_mop-up.js workaround (see "The Firefox pitfall") -->
   <div id="cometestme" style="display:none;">
    <xsl:text disable-output-escaping="yes">&amp;amp;</xsl:text>
   </div>
   <p>This is an RSS feed. To subscribe to it, you must install a news aggregator.
    This feed contains the following items:</p>
   <p><xsl:value-of select="rss/channel/description"/></p>
   <xsl:for-each select="rss/channel/item">
    <h2><a href="{link}"><xsl:value-of select="title"/></a></h2>
    <div name="decodeable">
     <xsl:value-of select="description" disable-output-escaping="yes"/>
    </div>
    <xsl:if test="enclosure">
     <p><a href="{enclosure/@url}">Extra...</a></p>
    </xsl:if>
   </xsl:for-each>
  </body>
 </html>
</xsl:template>

</xsl:stylesheet>

Note that the stylesheet is an XML document (just like the RSS or Atom feed) and it uses a namespace (like Atom elements or RSS extensions). As with any XML document, you must be careful with the syntax. Specifically, make sure that every opening tag (<p>) has a matching closing tag (</p>) and that empty tags follow the special syntax (<br />). Special characters such as & must be written as entities (&amp;).
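A quick way to catch these mistakes before a browser does is to run the stylesheet through any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the is_well_formed name is illustrative. It checks well-formedness only, not whether the XSLT itself is correct.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first error.
        print(f"Not well-formed: {err}")
        return False
    return True
```

For example, is_well_formed(open("rssfeed.xsl", encoding="utf-8").read()) reports an unclosed <p>, a <br> written without the slash, or a bare & before you upload the file.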

The stylesheet contains two kinds of markup: XSLT statements, which control the rendering and belong to the http://www.w3.org/1999/XSL/Transform namespace (in Listing 2 they begin with the xsl prefix), and HTML tags, which control the layout of the page.

If you want to modify Listing 2 to adapt to your site layout, you can edit the contents of the xsl:template element. Make sure to preserve the XSLT statements.

You will need four XSLT constructs: xsl:value-of, xsl:for-each, xsl:if and curly brackets (known as attribute value templates).

The xsl:value-of instruction extracts information from the RSS or Atom document and inserts it in HTML. The instruction takes one attribute called select with a path to the RSS or Atom element that you're interested in.

For example, to copy the feed title, the path is rss/channel/title, since the title element appears underneath channel, which itself is included in rss. As you can see, the path simply lists the elements in the order in which they are nested in the RSS document.

To copy data from an attribute, prefix the attribute name with @ as in rss/channel/item/enclosure/@url.

xsl:for-each is the looping instruction. It loops over a set of elements (also selected through the select attribute), in this case the various items. For each item, the stylesheet prints some basic information: title, description and a link to the enclosure. Inside the loop, paths are relative to the current item, which is why Listing 2 selects title rather than rss/channel/item/title.

The curly brackets in attributes (and only in attributes) extract information from the RSS or Atom feed, like xsl:value-of does for regular text. In the stylesheet, curly brackets populate several href attributes.

Last but not least, the xsl:if instruction executes its content only if its test succeeds. In Listing 2, xsl:if tests whether the item has an enclosure element, so the "Extra..." link is printed only when there is an enclosure to link to.
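If you want to try a path before putting it in the stylesheet, Python's xml.etree.ElementTree understands a small subset of XPath that covers the paths used here. The sketch below walks a trimmed copy of Listing 1 (the second enclosure has been dropped so that the xsl:if equivalent has something to skip) and notes which XSLT construct each step mirrors. It is a testing aid under those assumptions, not a replacement for the stylesheet. ElementTree starts from the rss element, so rss/channel/title becomes channel/title.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Listing 1: the second item's enclosure and link are removed.
FEED = b"""<rss version="2.0">
 <channel>
  <title>Declencheur</title>
  <link>http://www.declencheur.com/</link>
  <item>
   <title>Cadrage</title>
   <link>http://www.declencheur.com/clic/archives/2006/07/esposito</link>
   <enclosure length="34812500" type="audio/mpeg"
     url="http://www.declencheur.com/clic/medias/2006/decl-2006-07-03.mp3" />
  </item>
  <item>
   <title>Complement visuel : cadrage</title>
  </item>
 </channel>
</rss>"""

rss = ET.fromstring(FEED)  # the <rss> element, i.e. the "rss" step of each path

# xsl:value-of select="rss/channel/title"
channel_title = rss.findtext("channel/title")

items = []
# xsl:for-each select="rss/channel/item"
for item in rss.findall("channel/item"):
    # Inside the loop, paths are relative to the current item.
    entry = {"title": item.findtext("title"), "link": item.findtext("link")}
    enclosure = item.find("enclosure")
    # xsl:if test="enclosure"
    if enclosure is not None:
        # {enclosure/@url}: attribute values are read with get()
        entry["extra"] = enclosure.get("url")
    items.append(entry)
```

If a path returns None here, it will most likely select nothing in the stylesheet either.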

I have only scratched the surface of XSLT, but if you make good use of copy-and-paste and Listing 2, you can adapt it to fit your site layout. Check Resources for a more complete tutorial on XSLT.

If your stylesheet does not work as expected, review the following:

  • Make sure you declare the XSLT namespace exactly as shown (the xmlns:xsl attribute); do not change the URI
  • If your document uses other namespaces (such as the iTunes extension), make sure you declare those as well
  • If the stylesheet seems to work but you cannot extract some data, it is most likely a path problem (when I teach XSLT, incorrect paths cause 80% of my students' problems)
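Namespaces are the usual culprit with Atom, whose elements all live in the http://www.w3.org/2005/Atom namespace: an unprefixed path such as title silently selects nothing. XSLT 1.0 behaves the same way, which is why the stylesheet must declare the namespace (for example xmlns:atom="http://www.w3.org/2005/Atom") and use the prefix in its paths (atom:feed/atom:title). The Python sketch below shows the effect with ElementTree; the sample documents and the itunes:duration value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are fixed by the specifications; the prefixes are our own
# choice, just as the prefix bound in a stylesheet (xmlns:atom=...) is arbitrary.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd",
}

# Hypothetical sample documents, made up for this illustration.
ATOM_FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First entry</title></entry>
</feed>"""

PODCAST_ITEM = b"""<item xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <title>Cadrage</title>
  <itunes:duration>20:00</itunes:duration>
</item>"""

feed = ET.fromstring(ATOM_FEED)

# Unprefixed path: looks for a title element in no namespace, finds nothing.
plain = feed.findtext("title")

# Prefixed paths, with the prefix mapped to the Atom namespace URI.
prefixed = feed.findtext("atom:title", namespaces=NS)
entry_titles = [e.findtext("atom:title", namespaces=NS)
                for e in feed.findall("atom:entry", namespaces=NS)]

# Extension elements in RSS work the same way; core RSS elements have no namespace.
item = ET.fromstring(PODCAST_ITEM)
item_title = item.findtext("title")
duration = item.findtext("itunes:duration", namespaces=NS)
```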

Most feed editors allow you to insert the required xml-stylesheet instruction. If yours does not support it, you can turn to FeedBurner to update the feed. FeedBurner even offers a default XSLT stylesheet (see Resources).
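If your feed is generated by a script you control, a post-processing step can add the instruction itself. The following Python function is a hypothetical helper (its name and behavior are assumptions, not part of any feed editor or of FeedBurner): it inserts the xml-stylesheet processing instruction right after the XML declaration, as in Listing 1. It works on the raw text and assumes that the declaration, if present, is the first thing in the file.

```python
import re
from xml.sax.saxutils import quoteattr

# The XML declaration, if present, must remain the very first thing in the file.
_XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")


def add_stylesheet_pi(feed_xml: str, href: str) -> str:
    """Return feed_xml with an xml-stylesheet instruction pointing at href.

    The instruction goes right after the XML declaration (or first, when
    there is none). Feeds that already reference a stylesheet are returned
    unchanged.
    """
    if "<?xml-stylesheet" in feed_xml:
        return feed_xml
    # quoteattr() adds the quotes and escapes characters such as & or ".
    pi = f'<?xml-stylesheet type="text/xsl" href={quoteattr(href)}?>'
    decl = _XML_DECL.match(feed_xml)
    if decl:
        return feed_xml[: decl.end()] + "\n" + pi + feed_xml[decl.end():]
    return pi + "\n" + feed_xml
```

Running the function twice is harmless: the second call finds the instruction and leaves the feed alone.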

The Firefox pitfall

All would be well in the land of RSS and Atom if Firefox supported the disable-output-escaping feature of XSLT, but it does not.

disable-output-escaping is an obscure XSLT feature that serves one purpose: it writes text to the output without escaping special characters such as < and &. Markup stored as text, for example inside a CDATA section, therefore reaches the HTML page as real tags. And RSS and Atom make heavy use of CDATA sections to embed HTML code.

With disable-output-escaping, you should be able to lift the HTML tags from the feed and insert them right into the HTML page, except in Firefox. Firefox essentially ignores the instruction, so it ends up displaying the raw HTML code.
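The difference is easier to see on a string. The Python sketch below models, under simplifying assumptions, how an XSLT processor writes the text of a description element with and without disable-output-escaping; write_text_node is a hypothetical name and the sample description is made up. Real browsers differ in the details (Firefox builds its result tree directly rather than serializing text), but the visible outcome is the same.

```python
from html import unescape
from xml.sax.saxutils import escape

# After parsing, the CDATA section in <description> is plain character data:
# the processor sees the characters "<p>...</p>", not a <p> element.
description = "<p>In 20 minutes, Nicolas sums up his course.</p>"


def write_text_node(text: str, disable_output_escaping: bool) -> str:
    """Model how an XSLT processor serializes a text node into HTML output."""
    # Normal output escapes markup characters so they display as text.
    return text if disable_output_escaping else escape(text)


honored = write_text_node(description, disable_output_escaping=True)
ignored = write_text_node(description, disable_output_escaping=False)
```

When the instruction is ignored, the browser receives the escaped form and renders the literal characters <p>...</p>. That rendered text is identical to the original markup (unescape(ignored) == description), which is the kind of round trip a script-based workaround can exploit: read the text back and insert it again as HTML.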

There has been some debate in the Firefox community as to whether this behavior is standards-compliant. Either way, it is a problem, and one for which you need a solution.

Fortunately Sean M. Burke came up with a clever piece of JavaScript that circumvents the limitation. Mr Burke was kind enough to place his code in the public domain, enabling anyone to use it in any project. For your convenience, I include a link to a copy of his script in Resources.

For the script to work, your stylesheet must insert a div element with the id "cometestme" (the script uses it to check whether the browser honored disable-output-escaping). Your stylesheet must also place every item that needs decoding in an element named "decodeable" (a div in Listing 2).

Finally, you must call the script's go_decoding() function when the HTML document loads, as the onload attribute of the body element does in Listing 2.

What to do in the stylesheet?

Listing the items in the RSS or Atom feed is only the beginning. After all, that content is already available elsewhere on the Web site, and the feed was designed to drive subscriptions, not to replicate content.

The popular solutions

Most webmasters who attach an XSLT stylesheet to their RSS or Atom feed include instructions on how to install a news aggregator and subsequently subscribe to their feed.

While this sounds like the right thing to do, in my experience visitors who are presented with such a page are unlikely to install an aggregator. Wary of viruses and trojans, surfers are suspicious of demands to install software.

Many sites therefore include instructions that direct visitors to an online aggregator, such as Google Reader or Yahoo!, instead. While it seems like a good idea, I remain unconvinced of its effectiveness. Unless they already subscribe to many feeds, visitors are hardly more likely to sign up for a new service than to install new software. And even if they do, what are the chances that they will remember to visit the online aggregator? My thinking is that if they have to bookmark a site, I'd rather they bookmark mine.

Thinking outside of the box

Personally, I offer the option to subscribe by e-mail, through one of the RSS-to-e-mail services. You can safely assume that every visitor has an e-mail address. I have drafted detailed instructions that outline the options and include a very prominent e-mail subscription form. I have found that one fifth of the visitors to my podcast would rather subscribe by e-mail than through RSS.

RSS and Atom might be better technical solutions but nothing beats a familiar service... and e-mail is the most familiar service for many visitors.

To avoid writing the subscription instructions twice (with the risk that the two versions diverge in the future), I use the stylesheet in Listing 3. It is simpler than Listing 2: it implements an HTML redirect (a meta refresh) that sends visitors to a regular page on my site.

Listing 3. The simplest solution? Redirect them!
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="html"/>
   
<xsl:template match="rss/channel">
<html>
<head>
 <title><xsl:value-of select="title"/></title>
 <!-- Send the visitor to the subscription page after 0 seconds -->
 <meta http-equiv="refresh"
 content="0; url=http://www.declencheur.com/clic/archives/2006/03/abonnement" />
</head>
<body>
<h1><xsl:value-of select="title"/></h1>
<p>Welcome! You are on the RSS feed of <xsl:value-of select="title"/>.
 The feed offers free subscription services for
 <xsl:value-of select="title"/>.</p>
<p>In a moment, you will be redirected to a page with more instructions.
 <a href="http://www.declencheur.com/clic/archives/2006/03/abonnement">
 Click here</a> if the new page fails to open.</p>
</body>
</html>
</xsl:template>
   
</xsl:stylesheet>

When a visitor clicks the link to the RSS feed and her browser does not recognize RSS, the feed behaves like a redirect to the subscription page!

This article has shown how to put a friendly face on an RSS or Atom feed. Until these formats are more widely known, it is a good idea to implement this as a safeguard.

Resources

Learn

  • Introduction to Syndication (Vincent Lauria, developerWorks, June 2006): Get started with RSS: find out why it is so popular, what its benefits are, and which feed readers are available and might fit your needs. Plus, learn about the RSS and Atom subscriptions available to you from IBM.
  • The RSS specification: Dig into this surprisingly readable spec for all the details on RSS.
  • An overview of the Atom 1.0 Syndication Format (James Snell, developerWorks, June 2005): Consider Atom, an alternative to RSS.
  • Hands-on training (Don Day, developerWorks, Mar 2000): Learn Extensible Stylesheet Language Transformations (XSLT) with this simple, hands-on exercise that demonstrates the principles of XSLT.
  • Process Atom 1.0 with XSLT tutorial (Uche Ogbuji, developerWorks, December 2005): Take a more in-depth look at XSLT and Atom.
  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
  • developerWorks technical events and webcasts: Stay current with technology in these sessions.

Get products and technologies

  • The JavaScript hack: Download the original instructions and code from Sean M. Burke's Web site.
  • FeedBurner: If your RSS editor does not support stylesheets, you might want to sign up with FeedBurner.
  • developerWorks RSS feeds: Learn more about content feeds and add pre-defined or custom RSS and Atom feeds for developerWorks content to your site.
  • IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
