RSS is simple, but not limited
RSS stands for many things: Really Simple Syndication, Rich Site Summary, RDF Site Summary, and there are probably a few more expansions out there. In practice, RSS is the orange button that you find on a growing number of sites, including here at developerWorks.
Through the RSS feed, you can subscribe to a site and be notified of updates. RSS differs from e-mail subscriptions in two important respects:
- RSS requires a special client, the RSS reader, but it is increasingly included in the Web browser.
- RSS preserves the user's privacy because, unlike e-mail, no personal information is exchanged with the site.
It's easy to code your first RSS file. The root RSS element is called rss (appropriately enough). It is immediately followed by a channel element. The channel element starts with a description of the newsfeed. It must include at least the feed title (title), a link to the related Web site (link), and the description of the feed (description).
Other elements are optional. The most commonly used ones are the language of the feed (language), its publication date (pubDate), classification in categories (category), and time to live (ttl), which is the number of minutes that the channel can be cached. The docs element is often included as well, although it is... special. The docs element points to the RSS documentation, so it must have the same value in every RSS document.
The channel description is followed by the list of item elements, where each item represents a newsworthy event.
What is newsworthy? It depends on the site and the application. For a regular Web site, an item may represent significant updates to a page; for a podcast, it's a new episode; for a network monitoring application, it's a network alert; for a forum, an item is a new post.
The content of item starts like the channel itself: title, link, and description. It can be more specific with date, link to multimedia files (enclosure), source, and comment.
To include HTML tags in the description, you must escape them (typically through a CDATA section).
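The two ways of escaping markup can be compared with a short Python sketch. It uses only the standard library; the sample markup and the variable names are illustrative assumptions, not part of any feed shown in this article.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative HTML fragment to carry inside a description element.
html = "<p>New <b>photos</b> online.</p>"

# Option 1: wrap the markup in a CDATA section.
# (The markup itself must not contain the sequence "]]>".)
cdata_item = f"<item><description><![CDATA[{html}]]></description></item>"

# Option 2: escape the markup characters (&, <, >) as entities.
escaped_item = f"<item><description>{escape(html)}</description></item>"

# Both forms parse back to the same description text.
cdata_text = ET.fromstring(cdata_item).findtext("description")
escaped_text = ET.fromstring(escaped_item).findtext("description")
print(cdata_text == escaped_text == html)
```

Either way, the RSS reader receives the HTML as plain text content of description and decides for itself how to render it.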
Worth noting is the guid element, a unique identifier. When present, RSS readers rely on it to identify new items. If GUIDs are not available, the readers must compare the item title and content to find new ones.
Listing 1 is a sample RSS feed:
Listing 1. RSS feed
<rss version="2.0">
  <channel>
    <title>Marchal.com</title>
    <link>http://www.marchal.com/</link>
    <description>Marchal's site.</description>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <language>en</language>
    <copyright>Copyright 2006, Benoit Marchal.</copyright>
    <pubDate>Fri, 30 Jun 2006 00:35:40 +0200</pubDate>
    <item>
      <title>Online photos</title>
      <link>http://www.marchal.com/en/photos</link>
      <description><![CDATA[<p>In 2002 I added a digital camera to my writer toolbox. I have used it to illustrate my articles and web sites.</p>
<p>While I had some experience with film-based photos, I was new to digital images. Most of the things I knew were still valid, others I had to re-learn. There was a lot of new material as well. These pages summarize my findings.</p>]]></description>
      <pubDate>Fri, 30 Jun 2006 00:35:40 +0200</pubDate>
      <category>photo</category>
      <category>2002</category>
      <guid isPermaLink="false">photos</guid>
    </item>
  </channel>
</rss>
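To make the structure concrete, here is a minimal Python sketch that reads a trimmed-down copy of the feed in Listing 1 and checks the three required channel elements. The function name summarize and the shape of the returned dictionary are my own assumptions for illustration; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# The three channel elements that every feed must include.
REQUIRED = ("title", "link", "description")

# A shortened copy of Listing 1.
FEED = """<rss version="2.0">
  <channel>
    <title>Marchal.com</title>
    <link>http://www.marchal.com/</link>
    <description>Marchal's site.</description>
    <language>en</language>
    <item>
      <title>Online photos</title>
      <link>http://www.marchal.com/en/photos</link>
      <description>Photo pages.</description>
      <guid isPermaLink="false">photos</guid>
    </item>
  </channel>
</rss>"""


def summarize(feed_xml):
    """Return the channel's required fields plus its item titles."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        raise ValueError("no channel element")
    missing = [name for name in REQUIRED if channel.find(name) is None]
    if missing:
        raise ValueError("channel is missing: " + ", ".join(missing))
    summary = {name: channel.findtext(name) for name in REQUIRED}
    summary["language"] = channel.findtext("language")  # optional, None when absent
    summary["items"] = [item.findtext("title") for item in channel.findall("item")]
    return summary


print(summarize(FEED))
```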
To subscribe to a feed, visitors need an RSS reader. Essentially the reader downloads the RSS file at regular intervals and notifies the user if new items are found.
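The core of that polling loop can be sketched in a few lines of Python. This is a simplified illustration rather than the logic of any particular reader: the helper names item_key and new_items are hypothetical, the download step is left out, and the fallback for items without a guid (hashing title and description) is just one possible way to compare content.

```python
import hashlib
import xml.etree.ElementTree as ET


def item_key(item):
    """Identify an item by its guid, or by a hash of title and description."""
    guid = item.findtext("guid")
    if guid:
        return "guid:" + guid.strip()
    content = (item.findtext("title") or "") + "\n" + (item.findtext("description") or "")
    return "hash:" + hashlib.sha256(content.encode("utf-8")).hexdigest()


def new_items(feed_xml, seen):
    """Return the titles of items not seen before and remember their keys."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item.findtext("title"))
    return fresh


# Two successive "downloads" of a made-up feed; the second adds one item.
FIRST_POLL = """<rss version="2.0"><channel>
  <title>Demo</title><link>http://example.com/</link><description>Demo feed.</description>
  <item><title>With GUID</title><description>One</description><guid isPermaLink="false">a1</guid></item>
  <item><title>No GUID</title><description>Two</description></item>
</channel></rss>"""
SECOND_POLL = FIRST_POLL.replace(
    "</channel>",
    "<item><title>Fresh</title><description>Three</description></item></channel>",
)

seen = set()
print(new_items(FIRST_POLL, seen))   # both items are new on the first poll
print(new_items(SECOND_POLL, seen))  # only the added item is reported
```

Note the weakness of the fallback: when an item without a guid is edited, its hash changes and the reader reports it as new. That is one practical reason to include guid in your feeds.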
RSS readers are being built into the main browsers: Firefox 1.5, Internet Explorer 7, and Safari already include an RSS reader. Standalone readers are also available, for example, from NewsGator. Last but not least, those who don't like to install new software can read feeds through aggregator sites such as Google Reader or NetVibes. There are even RSS to e-mail bridges like Zookoda.
More elements are defined in RSS than can be covered in this section. See Resources for the complete specification.
To summarize, hosted applications or Web sites that need to share information or notify users can benefit from an RSS feed. Indeed, RSS use is spreading like wildfire.
With great use come a lot of headaches. Some developers complain that they need to shoehorn their concepts into items and that items are too restrictive. To address the limitation, RSS 2.0 is designed as an extensible language. The principle is simple: the RSS elements themselves are not included in a namespace (for backward compatibility with RSS 0.92), but developers are free to extend RSS by adding elements in a namespace of their own.
When a tag has a namespace, RSS readers should try to recognize the extension from its namespace. If they succeed, then they can process the elements. If they don't, they should simply ignore the elements.
RSS, therefore, provides a backbone on which you can build more powerful applications. The beauty of the mechanism is that extensions are optional for the reader, so a reader who fails to process an extension can still do sensible things with the core RSS specification.
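A reader that follows this rule might sort the children of an item as in the Python sketch below. The feed, the x:mood element, and the http://example.com/ns/unknown namespace are made up for the demonstration, and the function names are hypothetical; the point is only that the decision is taken on the namespace URI.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# A made-up feed with one extension the reader knows (Dublin Core)
# and one it does not (the hypothetical x:mood element).
FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:x="http://example.com/ns/unknown">
  <channel>
    <title>Demo</title>
    <link>http://example.com/</link>
    <description>Demo feed.</description>
    <item>
      <title>First</title>
      <dc:creator>Marchal</dc:creator>
      <x:mood>cheerful</x:mood>
    </item>
  </channel>
</rss>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' tag into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def read_item(item, known=(DC,)):
    """Sort an item's children into core RSS, known extensions, and ignored."""
    core, extensions, ignored = {}, {}, []
    for child in item:
        uri, local = split_name(child.tag)
        if uri == "":
            core[local] = child.text            # plain RSS element
        elif uri in known:
            extensions[(uri, local)] = child.text
        else:
            ignored.append(local)               # unknown extension: skip quietly
    return core, extensions, ignored


item = ET.fromstring(FEED).find("channel/item")
print(read_item(item))
```

The item remains usable even though x:mood is dropped, which is exactly why extensions are safe to add.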
Before going any further, I want to debunk a few myths about namespaces. Surprisingly, seven years after the namespace recommendation was published by the W3C, it still generates a great deal of misunderstanding... and, unfortunately, it shows in some RSS readers that implement the standard incorrectly.
Namespaces are designed for vocabularies which, like RSS, need extensions on a core set of elements. Specifically, namespaces prevent name clashes when two different extensions use the same XML elements with different meanings.
Because the extensions are developed independently, it is only a matter of time before common names are reused. Consider the word "key," for example. It might be used as a database key or as a cryptographic key.
To remove the ambiguity, namespaces split element names into a local name and a namespace Uniform Resource Identifier (URI). The namespace URI is the unique identifier of an extension. On their own, local names are not guaranteed to be unique, but the combination of local name and namespace URI is.
I assume most readers are familiar with the namespace syntax. The xmlns attribute declares a prefix and associates it with a namespace URI. The prefix, in turn, associates the namespace URI with local names. The colon is the separator between the prefix and the local name.
Listing 2 is an example:
Listing 2. namespace declaration
<dc:contributor xmlns:dc="http://purl.org/dc/elements/1.1/">Marchal</dc:contributor>
In practice, two aspects of namespaces are confusing:
- The identifier is a URI and, in most cases, a URL.
- The combination of prefix and local name is not guaranteed to be unique. One needs to use the namespace URI for uniqueness.
I'll address these two sources of confusion.
First, the namespace URI. In the context of namespaces, the URI is strictly an identifier, not a definition, of the vocabulary. Therefore, the resource that the URI points to is irrelevant. In many cases, the resource does not even exist.
While developers and users like definitions, they consider it unacceptable to force applications to download files before they process elements in a given namespace.
Many applications still run with no permanent Internet connection. Even for those with an Internet connection, having to download files would slow them down in ways that are not always acceptable. And what happens if the Web site is (temporarily) unavailable?
Furthermore, in the context of namespaces, an identifier is all that is needed.
What about prefixes? Appending the URI to every tag increases the length of documents. Therefore, prefixes were introduced as a means to shorten the URI. However, the prefix is not guaranteed to be unique (if the local name cannot be made unique, there's no reason to believe that the prefix will be... especially given that prefixes are usually very short), and you must refer to the URI.
Which brings me to the special characteristics of URIs; they are unique identifiers because most URIs are URLs, and URLs include a domain name. As long as you define your namespaces using domain names that you own, you guarantee uniqueness because no other organization has registered the same domain.
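Both points are easy to verify with a namespace-aware parser. The short Python sketch below uses the standard library's ElementTree, which reports each element name as {namespace URI}local name; the meta prefix and the http://example.com/ns/other namespace are made up for the comparison. Parsing works offline because the parser never tries to download the namespace URI.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Same local name, same URI, the usual prefix.
a = ET.fromstring(f'<dc:creator xmlns:dc="{DC}">Marchal</dc:creator>')
# Same local name, same URI, a different prefix.
b = ET.fromstring(f'<meta:creator xmlns:meta="{DC}">Marchal</meta:creator>')
# Same local name, same prefix, a different (made-up) URI.
c = ET.fromstring('<dc:creator xmlns:dc="http://example.com/ns/other">Marchal</dc:creator>')

print(a.tag)           # {http://purl.org/dc/elements/1.1/}creator
print(a.tag == b.tag)  # True: the prefix plays no part in the name
print(a.tag == c.tag)  # False: the URI does
```

An RSS reader that tests for the literal string dc:creator gets both comparisons wrong, which is the kind of incorrect implementation mentioned earlier.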
Three popular extensions
To better understand how the extension works in practice, let's review three popular extensions of RSS:
- Dublin Core for metadata
- iTunes for podcasting
- Syndicated Photography as an extension that extends an extension
The Dublin Core is a set of metadata elements originally defined in RFC 2413. It is a minimalist set of metadata for searching resources. It has been used in HTML (in the meta element) and in various XML vocabularies.
Some elements in the Dublin Core duplicate RSS elements (for example, language), which is inevitable because the Dublin Core was developed before RSS. Yet even the duplicates are useful because you can attach Dublin Core extensions to channels or items. For example, the RSS copyright element appears at the channel level only, whereas you can attach the Dublin Core rights to individual items.
The Dublin Core namespace is http://purl.org/dc/elements/1.1/.
Listing 3 is an example of Dublin Core in RSS:
Listing 3. Dublin Core in RSS
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Marchal</title>
    <link>http://www.marchal.com/</link>
    <description>Personal site.</description>
    <language>en</language>
    <item>
      <title>Introduction</title>
      <link>http://www.marchal.com/en/</link>
      <description>Introduction to the site.</description>
      <dc:creator>Marchal</dc:creator>
      <dc:rights>Copyright 2001-2004 Marchal</dc:rights>
    </item>
  </channel>
</rss>
The Dublin Core extensions show that even elements similar to RSS can add value by offering more options.
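As an illustration of how a reader could combine the two levels, the Python sketch below looks up dc:rights on each item and falls back to the channel's copyright when an item has none. The fallback policy, the function name rights_per_item, and the sample feed (a variation on Listing 3 with a channel copyright and a second item added) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Prefix map for ElementTree lookups; only the URI has to match the feed.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Marchal</title>
    <link>http://www.marchal.com/</link>
    <description>Personal site.</description>
    <copyright>Copyright 2006, Benoit Marchal.</copyright>
    <item>
      <title>Introduction</title>
      <dc:creator>Marchal</dc:creator>
      <dc:rights>Copyright 2001-2004 Marchal</dc:rights>
    </item>
    <item>
      <title>News</title>
    </item>
  </channel>
</rss>"""


def rights_per_item(feed_xml):
    """Map each item title to its dc:rights, else to the channel copyright."""
    channel = ET.fromstring(feed_xml).find("channel")
    channel_rights = channel.findtext("copyright")
    return {
        item.findtext("title"): item.findtext(
            "dc:rights", default=channel_rights, namespaces=NS
        )
        for item in channel.findall("item")
    }


print(rights_per_item(FEED))
```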
iTunes Music Store
iTunes offers direct access to podcasts through its iTunes Music Store. To integrate podcasts in the store, iTunes has defined an extension in the http://www.itunes.com/dtds/podcast-1.0.dtd namespace.
The iTunes extension is remarkable because:
- It does not shy away from redefining elements that are similar to RSS elements when the definition is different (itunes:image: iTunes needs images of 300 by 300 pixels whereas RSS specifies a maximal width of 144).
- It defines new elements to enhance user accessibility (itunes:duration provides the duration of a podcast, whereas RSS only offers file length).
Listing 4 is an example of RSS with iTunes extensions:
Listing 4. RSS feed with iTunes extensions
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <title>Declencheur</title>
    <link>http://www.declencheur.com/</link>
    <description>Le podcast qui parle photos</description>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <language>fr</language>
    <copyright>© 2006, Benoit Marchal. Tous droits reserves.</copyright>
    <pubDate>Wed, 24 May 2006 16:19:39 +0200</pubDate>
    <itunes:author>Benoit Marchal</itunes:author>
    <itunes:image href="http://www.declencheur.com/clic/medias/2006/dsc_7478.jpg"/>
    <itunes:category text="Arts &amp; Entertainment">
      <itunes:category text="Photography"/>
    </itunes:category>
    <itunes:category text="International">
      <itunes:category text="French"/>
      <itunes:category text="Belgian"/>
    </itunes:category>
    <item>
      <title>Histogramme</title>
      <link>http://www.declencheur.com/clic/archives/2006/05/histogramme-visuel</link>
      <description><![CDATA[<p><img alt="L'histogramme" height="225" width="150" align="right" class="photo illustrationright" src="//www.declencheur.com/clic/medias/2006/_dsc1808.jpg" />A mes yeux, l'histogramme est un des progres les plus remarquables de la photographie sur les dix dernieres annees. Un progres dans la precision de l'exposition au moins aussi important que la mesure matricielle en son temps.</p>
<p>Les trois segments de l'episode sont (entre parentheses, le debut du segment concerne) :</p>
<ol><li>(01:34) Elinchrom D-Lite, je suis particulierement enthousiaste par l'arrivee de ces flashes electroniques de studio. Mes premieres impressions.</li>
<li>(07:40) Histogramme, le theme principal de l'episode. Un <a href="http://www.declencheur.com/clic/medias/2006/decl-2006-05-13.pdf" target="_blank">complement visuel</a> est disponible.</li>
<li>(27:03) Vos commentaires, mes reactions : conseils pour la sauvegarde et precisions sur l'impression jet d'encre.
Merci de votre soutien !</li></ol>
<p>Les liens presentes dans l'episode :</p>
<ul><li><a href="http://www.elinchrom.com" target="_blank">Elinchrom</a></li>
<li><a href="http://www.foto-mueller.at" target="_blank">Foto Mueller</a><br clear="right" /></li></ul>]]></description>
      <pubDate>Mon, 15 May 2006 00:21:13 +0200</pubDate>
      <category>numerique</category>
      <category>technique</category>
      <enclosure length="34722926" type="audio/mpeg" url="http://www.declencheur.com/clic/medias/2006/decl-2006-05-14.mp3" />
      <guid isPermaLink="false">histogramme</guid>
      <itunes:duration>36:08</itunes:duration>
    </item>
  </channel>
</rss>
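To show the difference between file length and duration, this Python sketch reads both values from the item in Listing 4 (reduced to the relevant elements). The helper duration_seconds is hypothetical and assumes a colon-separated value such as the 36:08 used in the listing; check the iTunes specification for the full list of accepted formats.

```python
import xml.etree.ElementTree as ET

ITUNES = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# The item from Listing 4, reduced to the elements of interest.
ITEM = """<item xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <title>Histogramme</title>
  <enclosure length="34722926" type="audio/mpeg"
             url="http://www.declencheur.com/clic/medias/2006/decl-2006-05-14.mp3" />
  <itunes:duration>36:08</itunes:duration>
</item>"""


def duration_seconds(text):
    """Convert a colon-separated duration (SS, MM:SS or HH:MM:SS) to seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


item = ET.fromstring(ITEM)
size_bytes = int(item.find("enclosure").get("length"))  # what core RSS offers
playtime = duration_seconds(item.findtext("itunes:duration", namespaces=ITUNES))
print(size_bytes, "bytes,", playtime, "seconds")
```

A reader that does not know the iTunes namespace still gets the enclosure and can download the episode; it simply cannot tell the listener how long it lasts.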
Photocasting is the distribution of photos through RSS feeds. It is similar in principle to podcasting, but with images instead of sound. At first sight, it appears that the enclosure element is enough to distribute photos in an RSS feed. However, efficient photo viewers download a thumbnail first and only download the full photo at the user's request. This is not the behavior of enclosure, so Pheed had to define an extension with only two tags: photo:thumbnail (a smaller image that is fast to download) and photo:imgsrc (the full resolution image).
Pheed is an RSS aggregator with special support for multimedia documents such as photos.
Interestingly enough, photos also need metadata, and Pheed chose... the Dublin Core. The photo extension, therefore, builds on another extension. A smart move as it reduces duplicated efforts.
Listing 5 is an example of a photo feed. Note that two namespaces are declared: photo for the Pheed extension and dc for the Dublin Core.
Listing 5. RSS feed with photo feed extensions
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"
     xmlns:photo="http://www.pheed.com/pheed/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Fun with photos</title>
    <link>http://www.marchal.com/en/photos/</link>
    <description>Photo humor.</description>
    <language>en</language>
    <item>
      <title>Journalist</title>
      <link>http://www.marchal.com/en/photos/humour</link>
      <description>He needs to fly</description>
      <dc:creator>Marchal</dc:creator>
      <dc:rights>Copyright 2004 Marchal</dc:rights>
      <dc:format>digital</dc:format>
      <dc:subject>Lego humor</dc:subject>
      <photo:imgsrc>
        http://www.marchal.com/en/photos/humour/phbd0001.jpg
      </photo:imgsrc>
      <photo:thumbnail>
        http://www.marchal.com/images/shared/thbd0001.jpg
      </photo:thumbnail>
    </item>
  </channel>
</rss>
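A photo viewer could read such an item as in the following Python sketch, which collects the thumbnail to display immediately and keeps the full-resolution URL for later. The function name photo_entry and the dictionary layout are assumptions for illustration. The URLs are stripped because the listing surrounds them with white space.

```python
import xml.etree.ElementTree as ET

NS = {
    "photo": "http://www.pheed.com/pheed/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The item from Listing 5, with the namespaces declared on the item itself.
ITEM = """<item xmlns:photo="http://www.pheed.com/pheed/"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Journalist</title>
  <dc:creator>Marchal</dc:creator>
  <photo:imgsrc>
    http://www.marchal.com/en/photos/humour/phbd0001.jpg
  </photo:imgsrc>
  <photo:thumbnail>
    http://www.marchal.com/images/shared/thbd0001.jpg
  </photo:thumbnail>
</item>"""


def photo_entry(item):
    """Collect what a viewer needs: show the thumbnail now, fetch 'full' on request."""
    def text(path):
        value = item.findtext(path, namespaces=NS)
        return value.strip() if value else None

    return {
        "title": text("title"),
        "creator": text("dc:creator"),
        "thumbnail": text("photo:thumbnail"),
        "full": text("photo:imgsrc"),
    }


entry = photo_entry(ET.fromstring(ITEM))
print(entry["thumbnail"])
```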
Defining your extensions
What are the practical steps if you find yourself wishing that RSS had one more feature?
- Make sure that the extension you need does not exist elsewhere already. No need to reinvent the wheel; it makes life miserable for RSS readers.
- If you find you still need to develop a new extension, make sure to apply the correct rules for namespace creation.
- Don't hesitate to redefine existing elements in your own namespace if they are not totally appropriate (for example, itunes:image is similar to, but different from, RSS's own image). Overloading the RSS elements with a different meaning would only introduce confusion.
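The steps above can be sketched in Python for the network monitoring case mentioned earlier. Everything specific here is hypothetical: the alert prefix, the http://example.com/ns/alert/1.0/ namespace (in a real feed, use a domain name that you own), and the severity element. The core RSS elements stay outside any namespace, and only the new element is qualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the extension.
ALERT_NS = "http://example.com/ns/alert/1.0/"
# Ask ElementTree to use a readable prefix when it writes the document.
ET.register_namespace("alert", ALERT_NS)


def build_feed():
    """Build a small feed whose item carries one extension element."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Network alerts"
    ET.SubElement(channel, "link").text = "http://example.com/alerts"
    ET.SubElement(channel, "description").text = "Alerts from the monitoring application."

    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = "Router down"
    ET.SubElement(item, "description").text = "The main router stopped responding."
    ET.SubElement(item, "guid", {"isPermaLink": "false"}).text = "alert-0001"
    # The extension element lives in its own namespace, not in RSS's.
    ET.SubElement(item, f"{{{ALERT_NS}}}severity").text = "critical"

    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed()
print(feed_xml)
```

Readers that know nothing about the extension still see a valid item with a title, a description, and a guid.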
RSS is a flexible format but, more importantly, it's a great starting point for many applications that need to broadcast or narrowcast. Thanks to the extension mechanism, its usefulness is limitless.
- Introduction to Syndication (Vincent Lauria, developerWorks, June 2006): Get started with RSS or another feed reader. Also learn why RSS is so popular, what its benefits are, and which feed readers might fit your needs.
- RSS specification: Check out this surprisingly readable specification.
- An overview of the Atom 1.0 Syndication Format (James Snell, developerWorks, June 2005): Consider Atom as an alternative feed format to RSS.
- iTunes extensions: Read this very easy spec with info on the submission process and technical aspects of preparing your RSS feed.
- The Dublin Core: Explore this basis for several metadata standards.
- Pheed extension that builds on Dublin Core: Learn about Pheeds -- RSS documents that contain a few other photography-related elements.
- IBM XML 1.1 certification: Find out how you can become an IBM Certified Developer in XML 1.1 and related technologies.
- XML: See developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.