Working XML: Expand RSS capabilities with RSS extensions

Boost the core standard through namespaces

For all its popularity, the RSS standard is surprisingly simple and, one might even say, limited. RSS does not try to do many things, but it is designed to be extended through RSS modules. This article introduces three popular RSS extensions and explains how to design your own.


Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft

Benoît Marchal is a consultant and writer based in Namur, Belgium. He is the author of XML by Example, Applied XML Solutions, and XML and the Enterprise. He produces the Declencheur podcast on photography.



15 August 2006


RSS is simple, but not limited

The RSS acronym stands for many things: Really Simple Syndication, Rich Site Summary, RDF Site Summary, and there are probably a few more expansions out there. In practice, RSS is the orange button that you find on a growing number of sites, including here at developerWorks.

Through the RSS feed, you can subscribe to a site and be notified of updates. RSS differs from e-mail subscriptions in two important respects:

  • RSS requires a special client, the RSS reader, but it is increasingly included in the Web browser.
  • RSS preserves the user's privacy because, unlike e-mail, no personal information is exchanged with the site.

RSS 101

It's easy to code your first RSS file. The root RSS element is called rss (appropriately enough). It is immediately followed by a channel element.

The channel element starts with a description of the newsfeed. It must include at least the feed title (title), a link to the related Web site (link), and the description of the feed (description).

Other elements are optional. The most commonly used ones are the language of the feed (language), its publication date (pubDate), classification in categories (category), and time to live (ttl), which is the number of minutes that the channel can be cached.

The docs element is often included as well, although it is... special. The docs element points to the RSS documentation so it must have the same value in every RSS document.

After the channel description is the list of item elements, where each item represents a newsworthy event.

What is newsworthy? It depends on the site and the application. For a regular Web site, an item may represent significant updates to a page; for a podcast, it's a new episode; for a network monitoring application, it's a network alert; for a forum, an item is a new post.

The content of item starts like the channel itself: title, link, and description. It can be more specific with a publication date (pubDate), a link to multimedia content (enclosure), the feed the item came from (source), and a link to a comments page (comments).

To include HTML tags in the description, you must escape them (typically through a CDATA section).
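As a minimal sketch of the channel structure and the escaping rule, the following Python builds a one-item feed with the standard library's xml.etree.ElementTree. The helper name build_feed and the sample values are illustrative assumptions, not part of RSS. Note that ElementTree escapes markup as entities (&lt;p&gt;) rather than writing a CDATA section; an XML parser treats the two forms the same way.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document; items is a list of dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three elements that describe the channel itself.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        for name in ("title", "link", "description"):
            # Text is escaped on serialization, so HTML tags in the
            # description are written as entities, not child elements.
            ET.SubElement(item, name).text = entry[name]
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    "Marchal.com",
    "http://www.marchal.com/",
    "Marchal's site.",
    [{
        "title": "Online photos",
        "link": "http://www.marchal.com/en/photos",
        "description": "<p>These pages summarize my findings.</p>",
    }],
)
```

Parsing feed_xml again gives back the original HTML string as the text of description, which is what an RSS reader would then render.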

Worth noting is the guid element, a unique identifier. When present, RSS readers rely on it to identify new items. If GUIDs are not available, the readers must compare the item title and content to find new items.

Listing 1 is a sample RSS feed:

Listing 1. RSS feed
<rss version="2.0">
  <channel>
    <title>Marchal.com</title>
    <link>http://www.marchal.com/</link>
    <description>Marchal's site.</description>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <language>en</language>
    <copyright>Copyright 2006, Benoit Marchal.</copyright>
    <pubDate>Fri, 30 Jun 2006 00:35:40 +0200</pubDate>
    <item>
      <title>Online photos</title>
      <link>http://www.marchal.com/en/photos</link>
      <description><![CDATA[<p>In 2002 I added a digital camera to my
         writer toolbox. I have used it to illustrate my articles and web
         sites.</p>
         <p>While I had some experience with film-based photos, I was
         new to digital images. Most of the things I knew were
         still valid, others I had to re-learn. There was
         a lot of new material as well. These pages summarize
         my findings.</p>]]></description>
      <pubDate>Fri, 30 Jun 2006 00:35:40 +0200</pubDate>
      <category>photo</category>
      <category>2002</category>
      <guid isPermaLink="false">photos</guid>
    </item>
  </channel>
</rss>

To subscribe to a feed, visitors need an RSS reader. Essentially the reader downloads the RSS file at regular intervals and notifies the user if new items are found.
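The polling logic can be sketched in a few lines of Python. This is an illustration only, not how any particular reader is implemented: the function names item_key and new_items and the inline sample feed are assumptions made for the example. It prefers guid when present and otherwise falls back to comparing title and content, as described earlier.

```python
import xml.etree.ElementTree as ET


def item_key(item):
    """Return a stable key for an item: its guid, else title + description."""
    guid = item.findtext("guid")
    if guid:
        return ("guid", guid.strip())
    # No guid: fall back to comparing title and content.
    return ("content", item.findtext("title", ""), item.findtext("description", ""))


def new_items(feed_xml, seen):
    """Return titles of items not seen before and record their keys in seen."""
    channel = ET.fromstring(feed_xml).find("channel")
    fresh = []
    for item in channel.findall("item"):
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item.findtext("title", ""))
    return fresh


SAMPLE = """<rss version="2.0"><channel>
  <title>Marchal.com</title>
  <link>http://www.marchal.com/</link>
  <description>Marchal's site.</description>
  <item><title>Online photos</title><guid isPermaLink="false">photos</guid></item>
  <item><title>No identifier</title><description>Compared by content.</description></item>
</channel></rss>"""

seen_keys = set()
first_poll = new_items(SAMPLE, seen_keys)   # both items are new
second_poll = new_items(SAMPLE, seen_keys)  # nothing has changed
```

A real reader would download the feed over HTTP at each interval and persist the set of seen keys between runs; both are left out here to keep the sketch self-contained.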

RSS readers are being built into the main browsers: Firefox 1.5, Internet Explorer 7, and Safari already include an RSS reader. Standalone readers are also available, for example, from NewsGator. Last but not least, for those who don't like to install new software, you can read feeds through aggregator sites such as Google Reader or NetVibes. There are even RSS to e-mail bridges like Zookoda.

More elements are defined in RSS than can be covered in this section. See Resources for the complete specification.

RSS power

To summarize, hosted applications or Web sites that need to share information or notify users can benefit from an RSS feed. Indeed, RSS use is spreading like wildfire.

With great use come a lot of headaches. Some developers complain that they need to shoehorn their concepts into items and that items are too restrictive. To address this limitation, RSS 2.0 is designed as an extensible language. The principle is simple: the RSS elements themselves are not included in a namespace (for backward compatibility with RSS 0.92), but developers are free to extend RSS by adding elements in a namespace of their own.

When a tag has a namespace, RSS readers should try to recognize the extension from its namespace. If they succeed, then they can process the elements. If they don't, they should simply ignore the elements.

RSS, therefore, provides a backbone on which you can build more powerful applications. The beauty of the mechanism is that extensions are optional for the reader, so a reader who fails to process an extension can still do sensible things with the core RSS specification.
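The following Python sketch shows one way a reader might apply that rule. The KNOWN_EXTENSIONS table, the function names, and the http://example.com/unknown-extension namespace are assumptions invented for the example. ElementTree reports a namespaced element as {uri}local, which makes the namespace URI easy to inspect.

```python
import xml.etree.ElementTree as ET

# Namespaces this hypothetical reader understands.
KNOWN_EXTENSIONS = {"http://purl.org/dc/elements/1.1/": "Dublin Core"}


def split_qname(tag):
    """Split ElementTree's '{uri}local' tag into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def read_item(item):
    """Sort an item's children into core, recognized-extension, and ignored."""
    core, extensions, ignored = {}, {}, []
    for child in item:
        uri, local = split_qname(child.tag)
        if uri is None:
            core[local] = child.text           # plain RSS element
        elif uri in KNOWN_EXTENSIONS:
            extensions[(uri, local)] = child.text
        else:
            ignored.append(child.tag)          # unknown namespace: skip it
    return core, extensions, ignored


ITEM = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:x="http://example.com/unknown-extension">'
    "<title>Introduction</title>"
    "<dc:creator>Marchal</dc:creator>"
    "<x:mood>curious</x:mood>"
    "</item>"
)
core, extensions, ignored = read_item(ITEM)
```

The unknown x:mood element ends up in ignored, while the title and the Dublin Core creator are still processed, so the item remains usable.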

Namespace reminder

Before going any further, I want to debunk a few myths about namespaces. Surprisingly, seven years after the namespace recommendation was published by the W3C, it still generates a great deal of misunderstanding... and, unfortunately, it shows in some RSS readers that implement the standard incorrectly.

Namespaces are designed for vocabularies which, like RSS, need extensions on a core set of elements. Specifically, namespaces prevent name clashes when two different extensions use the same XML elements with different meanings.

Because the extensions are developed independently, it is only a matter of time before common names are reused. Consider the word "key," for example. It might be used as a database key or as a cryptographic key.

To remove the ambiguity, namespaces split element names into a local name and a namespace Uniform Resource Identifier (URI). Namespace URIs are unique identifiers of an extension. On their own, local names are not guaranteed to be unique, but the combination of local name and namespace URI is.

I assume most readers are familiar with the namespace syntax. Essentially, an xmlns attribute declares a prefix and associates it with a namespace URI. The prefix, in turn, associates the namespace URI with local names. The colon is the separator between the prefix and the local name. Listing 2 is an example:

Listing 2. namespace declaration
<dc:contributor xmlns:dc="http://purl.org/dc/elements/1.1/">
Marchal</dc:contributor>

In practice, two aspects of namespaces are confusing:

  • The identifier is a URI and, in most cases, a URL.
  • The combination of prefix and local name is not guaranteed to be unique. One needs to use the namespace URI for uniqueness.

I'll address these two points of confusion in turn.

First, the namespace URI. In the context of namespaces, the URI is strictly an identifier, not a definition, of the vocabulary. Therefore, the resource that the URI points to is irrelevant. In many cases, the resource does not even exist.

While developers and users like definitions, they consider it unacceptable to force applications to download files before they process elements in a given namespace.

Many applications still run with no permanent Internet connection. Even for those with an Internet connection, having to download files would slow them down in ways that are not always acceptable. And what happens if the Web site is (temporarily) unavailable?

Furthermore, in the context of namespaces, an identifier is all that is needed.

What about prefixes? Appending the URI to every tag would increase the length of documents, so prefixes were introduced as shorthand for the URI. However, the prefix is not guaranteed to be unique (if the local name cannot be made unique, there's no reason to believe that the prefix will be... especially given that prefixes are usually very short), so to identify an element you must always refer to the URI, never to the prefix.
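A short Python sketch makes the point concrete. The meta prefix and the http://example.com/not-dublin-core URI are made up for the illustration. ElementTree expands each name to {uri}local, so the comparison is on the namespace URI rather than on the prefix.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Same namespace URI, two different prefixes.
first = ET.fromstring(f'<dc:contributor xmlns:dc="{DC}">Marchal</dc:contributor>')
second = ET.fromstring(f'<meta:contributor xmlns:meta="{DC}">Marchal</meta:contributor>')

# Same prefix, but bound to a different (made-up) namespace URI.
third = ET.fromstring(
    '<dc:contributor xmlns:dc="http://example.com/not-dublin-core">Marchal</dc:contributor>'
)

same_name = first.tag == second.tag      # prefixes differ, names match
different_name = first.tag != third.tag  # prefixes match, names differ
```

A reader that matched on the literal string dc:contributor would get both comparisons wrong, which is the kind of incorrect implementation mentioned above.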

Which brings me to the special characteristics of URIs; they are unique identifiers because most URIs are URLs, and URLs include a domain name. As long as you define your namespaces using domain names that you own, you guarantee uniqueness because no other organization has registered the same domain.

Three popular extensions

To better understand how the extension works in practice, let's review three popular extensions of RSS:

  • Dublin Core for metadata
  • iTunes for podcasting
  • Syndicated Photography as an extension that extends an extension

Dublin Core

The Dublin Core is a set of metadata elements originally defined in RFC 2413. It is a minimalist set of metadata for searching resources, and it has been used in HTML (in the META element) and in various XML vocabularies.

Some elements in the Dublin Core duplicate RSS elements (for example, language), which is inevitable because the Dublin Core was developed before RSS. Yet even the duplicates are useful because you can attach Dublin Core elements to channels or to items. For example, the RSS copyright element appears at the channel level only, whereas you can attach the Dublin Core rights element to individual items.

The Dublin Core namespace is http://purl.org/dc/elements/1.1/.

Listing 3 is an example of Dublin Core in RSS:

Listing 3. Dublin Core in RSS
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"> 
<channel>
   <title>Marchal</title>
   <link>http://www.marchal.com/</link>
   <description>Personal site.</description>
   <language>en</language>
   <item>
      <title>Introduction</title>
      <link>http://www.marchal.com/en/</link>
      <description>Introduction to the site.</description>
      <dc:creator>Marchal</dc:creator>
      <dc:rights>Copyright 2001-2004 Marchal</dc:rights>
   </item>
</channel>
</rss>

The Dublin Core extensions show that even elements similar to RSS can add value by offering more options.
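As an illustration of the copyright example, this Python sketch reads dc:rights from each item and falls back to the channel-level copyright when an item has none. The fallback policy, the function name rights_for_items, and the second "News" item are assumptions for the example; neither RSS nor the Dublin Core prescribes this behavior.

```python
import xml.etree.ElementTree as ET

NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def rights_for_items(feed_xml):
    """Map each item title to its dc:rights, else the channel copyright."""
    channel = ET.fromstring(feed_xml).find("channel")
    default = channel.findtext("copyright")
    result = {}
    for item in channel.findall("item"):
        # The prefix in NS is local to this code; only the URI has to match.
        result[item.findtext("title")] = item.findtext("dc:rights", default, NS)
    return result


FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <title>Marchal</title>
  <link>http://www.marchal.com/</link>
  <description>Personal site.</description>
  <copyright>Copyright 2006, Benoit Marchal.</copyright>
  <item>
    <title>Introduction</title>
    <dc:rights>Copyright 2001-2004 Marchal</dc:rights>
  </item>
  <item>
    <title>News</title>
  </item>
</channel>
</rss>"""

item_rights = rights_for_items(FEED)
```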

iTunes Music Store

iTunes offers direct access to podcasts through its iTunes Music Store. To integrate the podcast in the store, iTunes has defined an extension in the http://www.itunes.com/dtds/podcast-1.0.dtd namespace.

The iTunes extension is remarkable because:

  • It does not shy away from redefining elements that are similar to RSS elements when the definition is different (for example, itunes:image: iTunes needs images of 300 by 300 pixels, whereas RSS specifies a maximal width of 144).
  • It defines new elements to enhance user accessibility (for example, itunes:duration provides the duration of a podcast, whereas RSS only offers file length).

Listing 4 is an example of RSS with iTunes extensions:

Listing 4. RSS feed with iTunes extensions
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
    version="2.0">
  <channel>
    <title>Declencheur</title>
    <link>http://www.declencheur.com/</link>
    <description>Le podcast qui parle photos</description>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <language>fr</language>
    <copyright>© 2006, Benoit Marchal. Tous droits reserves.</copyright>
    <pubDate>Wed, 24 May 2006 16:19:39 +0200</pubDate>
    <itunes:author>Benoit Marchal</itunes:author>
    <itunes:image href="http://www.declencheur.com/clic/medias/2006/dsc_7478.jpg"/>
    <itunes:category text="Arts &amp; Entertainment">
      <itunes:category text="Photography"/>
    </itunes:category>
    <itunes:category text="International">
      <itunes:category text="French"/>
      <itunes:category text="Belgian"/>
    </itunes:category>
    <item>
      <title>Histogramme</title>
      <link>http://www.declencheur.com/clic/archives/2006/05/histogramme-visuel</link>
      <description><![CDATA[<p><img alt="L'histogramme" height="225"
      width="150" align="right" class="photo illustrationright" 
       src="http://www.declencheur.com/clic/medias/2006/_dsc1808.jpg" />A mes yeux,
       l'histogramme est un des progres les plus remarquables de la photographie sur
       les dix dernieres annees. Un progres dans la precision de l'exposition au moins
       aussi important que la mesure matricielle en son temps.</p>
       <p>Les trois segments de l'episode sont (entre parentheses, le debut du
       segment concerne) :</p>
       <ol><li>(01:34) Elinchrom D-Lite, je suis particulierement enthousiaste
       par l'arrivee de ces flashes electroniques de studio. Mes premieres
       impressions.</li>
       <li>(07:40) Histogramme, le theme principal de l'episode. Un
       <a href="http://www.declencheur.com/clic/medias/2006/decl-2006-05-13.pdf"
       target="_blank">complement visuel</a> est disponible.</li>
       <li>(27:03) Vos commentaires, mes reactions : conseils pour la sauvegarde
       et precisions sur l'impression jet d'encre. Merci de votre
       soutien !</li></ol>
       <p>Les liens presentes dans l'episode :</p>
       <ul><li><a href="http://www.elinchrom.com"
       target="_blank">Elinchrom</a></li>
       <li><a href="http://www.foto-mueller.at" target="_blank">Foto
       Mueller</a><br clear="right" /></li></ul>]]></description>
      <pubDate>Mon, 15 May 2006 00:21:13 +0200</pubDate>
      <category>numerique</category>
      <category>technique</category>
      <enclosure length="34722926" type="audio/mpeg"
      url="http://www.declencheur.com/clic/medias/2006/decl-2006-05-14.mp3" />
      <guid isPermaLink="false">histogramme</guid>
      <itunes:duration>36:08</itunes:duration>
    </item>
  </channel>
</rss>
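To show how a client might combine the core enclosure element with the iTunes extension, the following Python sketch reads the file size from enclosure and the playing time from itunes:duration. It assumes the duration is written as colon-separated numbers, as in Listing 4; the function name duration_seconds is invented for the example.

```python
import xml.etree.ElementTree as ET

ITUNES = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}


def duration_seconds(text):
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' to a number of seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


ITEM = ET.fromstring(
    '<item xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">'
    "<title>Histogramme</title>"
    '<enclosure length="34722926" type="audio/mpeg"'
    ' url="http://www.declencheur.com/clic/medias/2006/decl-2006-05-14.mp3"/>'
    "<itunes:duration>36:08</itunes:duration>"
    "</item>"
)

# Core RSS only gives the size of the file in bytes...
size_bytes = int(ITEM.find("enclosure").get("length"))
# ...while the extension gives the playing time.
play_time = duration_seconds(ITEM.findtext("itunes:duration", "0", ITUNES))
```

A reader that does not know the iTunes namespace can still download the enclosure; it simply cannot show the playing time.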

Syndicated Photography

Photocasting is the distribution of photos through RSS feeds. It is similar in principle to podcasting, but with images instead of sound. At first sight, it appears that to distribute photos in an RSS feed, the enclosure element is enough.

However, efficient photo viewers download a thumbnail first and download the full photo only at the user's request. This is not how enclosure behaves, so Pheed, an RSS aggregator with special support for multimedia documents such as photos, defined an extension with only two tags: photo:thumbnail (a smaller image that is fast to download) and photo:imgsrc (the full-resolution image).

Interestingly enough, photos also need metadata, and Pheed chose... the Dublin Core. The photo extension, therefore, builds on another extension. A smart move as it reduces duplicated efforts.

Listing 5 is an example of a photo feed. Note that two namespaces are declared:

Listing 5. RSS feed with photo feed extensions
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:photo="http://www.pheed.com/pheed/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"> 
<channel>
   <title>Fun with photos</title>
   <link>http://www.marchal.com/en/photos/</link>
   <description>Photo humor.</description>
   <language>en</language> 
   <item>
      <title>Journalist</title>
      <link>http://www.marchal.com/en/photos/humour</link>
      <description>He needs to fly</description>
      <dc:creator>Marchal</dc:creator>
      <dc:rights>Copyright 2004 Marchal</dc:rights>
      <dc:format>digital</dc:format>
      <dc:subject>Lego humor</dc:subject>
      <photo:imgsrc>
         http://www.marchal.com/en/photos/humour/phbd0001.jpg
      </photo:imgsrc>
      <photo:thumbnail>
         http://www.marchal.com/images/shared/thbd0001.jpg
      </photo:thumbnail>
   </item>
</channel>
</rss>
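A viewer for such a feed could be sketched as follows in Python. The function name photo_entry and the shape of the returned dictionary are assumptions for the example. Because Listing 5 puts each URL on its own line, the sketch strips the surrounding white space before using the value.

```python
import xml.etree.ElementTree as ET

NS = {
    "photo": "http://www.pheed.com/pheed/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def photo_entry(item):
    """Collect the fields a photo viewer needs from one item."""
    def text(path):
        # Missing elements yield None; present ones are trimmed.
        value = item.findtext(path, "", NS)
        return value.strip() or None

    return {
        "title": text("title"),
        "creator": text("dc:creator"),
        "thumbnail": text("photo:thumbnail"),  # fetched first
        "full": text("photo:imgsrc"),          # fetched on request
    }


ITEM = ET.fromstring("""<item xmlns:photo="http://www.pheed.com/pheed/"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Journalist</title>
  <dc:creator>Marchal</dc:creator>
  <photo:imgsrc>
     http://www.marchal.com/en/photos/humour/phbd0001.jpg
  </photo:imgsrc>
  <photo:thumbnail>
     http://www.marchal.com/images/shared/thbd0001.jpg
  </photo:thumbnail>
</item>""")

entry = photo_entry(ITEM)
```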

Defining your extensions

What are the practical steps if you find yourself wishing that RSS had one more feature?

  1. Make sure that the extension you need does not exist elsewhere already. No need to reinvent the wheel; it makes life miserable for RSS readers.
  2. If you find you still need to develop a new extension, make sure to apply the correct rules for namespace creation.
  3. Don't hesitate to define your own version of an existing element, in your own namespace, if the original is not entirely appropriate (for example, itunes:image is similar to, but different from, RSS's own image). Overloading the original element with a different meaning would only introduce confusion.
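As a sketch of the second and third steps, the following Python produces an item that carries an element from a new extension. Everything about the extension is hypothetical: the namespace URI http://example.com/ns/network-alert/1.0/, the alert prefix, and the severity element are invented for the illustration, and in practice the URI should be built on a domain name that you own.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace: build yours on a domain you control.
ALERT_NS = "http://example.com/ns/network-alert/1.0/"
# The prefix is only a serialization convenience; readers match on the URI.
ET.register_namespace("alert", ALERT_NS)


def alert_item(title, link, description, severity):
    """Build an RSS item carrying one element from the alert extension."""
    item = ET.Element("item")
    # Core RSS elements stay outside any namespace.
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description
    # The extension element lives in its own namespace, so it cannot
    # clash with RSS or with other extensions.
    ET.SubElement(item, f"{{{ALERT_NS}}}severity").text = severity
    return item


item = alert_item(
    "Router down",
    "http://example.com/alerts/42",
    "Router 3 stopped responding.",
    "critical",
)
serialized = ET.tostring(item, encoding="unicode")
```

Readers that do not know the extension still see a complete item with title, link, and description, and can safely ignore alert:severity.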

RSS is a flexible format but, more importantly, it's a great starting point for many applications that need to broadcast or narrowcast. Thanks to the extension mechanism, its usefulness is limitless.

Resources
