Content feeds with RSS 2.0

Syndication goes mainstream

James Lewin (jim@lewingroup.com), President, The Lewin Group
James Lewin has been working with the Internet since 1995. He is the president and owner of The Lewin Group, and has written extensively on e-business and the Web. His column, Ecommerce in Action, is published by ITWorld. His interests include electronic music, analog circuits, and science fiction. He can be reached at jim@lewingroup.com.

Summary:  A lot has happened in the RSS world since developerWorks last looked at RSS: Two new specifications have come out, RSS has become one of the most popular XML standards, and tools and feeds are popping up everywhere. RSS has contributed to the explosion of weblogs, and it is becoming a standard part of other Web sites, too. This article reviews RSS 2.0, looks at new RSS developments, and jump-starts your understanding of this important format.

Date:  23 Dec 2003
Level:  Introductory

It's been three years since I wrote my last article on RSS for developerWorks, "An introduction to RSS news feeds." At that time, RSS was one of the more popular uses for XML. Since then, Netscape abandoned the format, five (count 'em, five) new versions of the RSS spec have come out, and there was an acrimonious fork in the format.

In spite of these setbacks, RSS is now more popular than ever.

RSS is everywhere!

Today you can find tens of thousands of RSS feeds. Weblog users, news publishers, government agencies, and many personal and commercial Web sites support the format. Developer tools deal with RSS in Java technology, Perl, PHP, Python, and other major programming languages. Many viewers and aggregators work on the Web, on the desktop, even within e-mail clients. RSS has become the de facto standard for syndicating content and metadata over the Internet.

This article will review the current specification, RSS 2.0. I'll skip the discussion of the colorful characters and controversies surrounding the format -- that wouldn't leave room for much else.

Instead, this article will give you a little background, review how the format is being used, and drop the names of some of the more popular tools for working with it. It will review the nuts and bolts of the format, give you examples, and tell you what you need to know to get started. Finally, it will cover some of the new features of RSS 2.0, such as extending RSS using namespaces. At the end of the article, you'll find the mother lode -- a gargantuan annotated list of RSS resources.


What is RSS?

What does "RSS" stand for, anyway?

It depends on whom you ask!

As with many standards, it's not always easy to get people to agree about even the basics. Some say it stands for "RDF Site Summary", others say it's "Really Simple Syndication", while still others believe it's "Rich Site Summary".

According to Dave Winer, author of the current spec, "There is no consensus on what RSS stands for, so it's not an acronym, it's a name. Later versions of this spec may say it's an acronym, and hopefully this won't break too many applications."

Regardless, RSS does stand for one thing -- a format for syndicating content over the Internet.

In practice, RSS carries both content and metadata, and it is most commonly used to share headlines and links to news articles. With news articles, the article itself isn't usually shared, but metadata about it is: a headline, a URL, or a summary. RSS is an important tool for publishers because feeds can be used both to syndicate your own content and to integrate third-party content into your site.

RSS is a dialect of XML. All RSS files must conform to the XML 1.0 specification, as published on the World Wide Web Consortium (W3C) Web site.

Here's a typical example of how RSS is used:

  • A publisher has some content that they want to publicize.
  • They create an RSS channel for their content.
  • In this channel, they include items for Web pages they want to promote.
  • This channel can be read by remote applications, and converted to headlines and links. These links can be incorporated into new Web pages, or read in dedicated readers.
  • People see the links on various sites, click on them, and go to the original publisher's site.
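The "converted to headlines and links" step above can be sketched in a few lines of Python. This is only an illustration, using the standard library's ElementTree parser; the sample feed, the example.com URLs, and the function name headlines_as_html are made up for this sketch and don't come from any particular tool.

```python
import xml.etree.ElementTree as ET
from html import escape

# A made-up feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example channel</title>
    <link>http://www.example.com/</link>
    <description>A made-up channel used only for illustration.</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/first</link>
    </item>
    <item>
      <title>Second &amp; last headline</title>
      <link>http://www.example.com/second</link>
    </item>
  </channel>
</rss>"""


def headlines_as_html(feed_xml):
    """Return an HTML list with one link per item that has a title and a link."""
    root = ET.fromstring(feed_xml)
    lines = ["<ul>"]
    for item in root.findall("./channel/item"):
        title = item.findtext("title")
        link = item.findtext("link")
        if title and link:
            # Escape both values so feed text can't break the surrounding HTML.
            lines.append('  <li><a href="%s">%s</a></li>'
                         % (escape(link, quote=True), escape(title)))
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(headlines_as_html(SAMPLE_FEED))
```

A real reader would also fetch the feed over HTTP and cope with malformed XML; both are left out here to keep the sketch short.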

While headline syndication is the most common use for RSS, it is also used for many other purposes. RSS is a very popular format in the weblog community. It's also used for photo diaries, classified ad listings, recipes, reviews, and for tracking the status of software packages.

RSS feeds are used in the world of e-commerce as a way of delivering information. For example, Amazon provides custom news feeds based on its Web services platform. This lets you track top books in your news reader, or include information on your Web site about related books for sale at Amazon.

RSS has grown tremendously in popularity in the last few years. Syndic8.com maintains an index of RSS channels, and its list of feeds has grown by about 1400% in two years. Yahoo! News, the BBC, Slashdot, LockerGnome, Amazon, CNN, Wired, Rolling Stone, and Apple Computer are among the many popular sources of RSS feeds.


News readers

developerWorks RSS feed

developerWorks offers its own RSS feeds -- find out more at http://www.ibm.com/developerworks/rss/.

As the number of news feeds has grown, a new category of software has emerged: the news reader. News readers are personal aggregators -- they help you find and organize a list of channels that you are interested in. Once you've selected your channels, you can view them all through the reader's consistent interface. The news reader checks for updates to the channels that you're interested in, and converts them to HTML for browsing.

One popular news reader is BlogExpress:


Figure 1. Viewing alphaWorks in BlogExpress

BlogExpress is what's known as "pizza ware" -- meaning, if you like the software, you can send the author some pizza money.

Customizing channels

Adding a channel in BlogExpress is easy. Say you're interested in tracking what's going on at IBM's alphaWorks site. The first thing you would do is pick a feed to add:


Figure 2. Finding a news feed at alphaWorks

Look for the orange "XML" logo, which is often used to link to public RSS feeds. In most Web browsers, you can right-click on the "XML" logo, copy the link, and paste it into your news reader.

Other popular news readers include BlogStreet, FeedReader, AmphetaDesk, and NewsGator (see Resources).


Finding RSS feeds

Search engines can be used to find content in RSS format. With Google, for example, you can add "filetype:rss" to a search to find your search terms in .rss files.

Dedicated search engines make searching for content easier. Feedster monitors weblogs, and lets you search through an index of log entries and view them by relevance, date, or ranking (blogrank). When you do a search, Feedster creates an RSS feed based on your request. This can be added to your news reader, so that you can see all the recent activity on your search request, without even leaving your news reader.

DayPop searches news, blogs, and RSS feeds. It lets you track popular news within the weblog world. It provides a list of the current top 40 most popular weblog links. These are the most commonly linked-to articles throughout the world. It creates a list of top words that are being used in weblogs. It also ranks weblogs by citations, providing an index of weblogs that are popular with other webloggers. You can do custom searches, too. The ranking lists and custom searches are available as RSS feeds that you can import into your news reader.


What's new in RSS 2.0?

RSS 2.0 builds upon the RSS 0.91 spec. It's backwards compatible, so tools that work with RSS 2.0 should work with 0.91 feeds. The updated spec adds a few elements, such as <cloud> and <guid>.

It also reduces some restrictions. Previously, the <link> and <url> elements could only be http or ftp; now, any valid URI can be used. In RSS 0.91, each channel could only contain 15 items, and elements were limited in length; these limits have been removed. Larger values should still be used with caution, because they may cause problems with older applications.

The greatest change, though, is the ability to extend the format using namespaces. RSS 2.0 supports namespaces, a standardized approach to adding elements not found in the spec. Feeds can contain new elements if they are defined in a namespace.


Overview of RSS 2.0

RSS is a dialect of XML, and is used for syndicating Web content and metadata. RSS 0.91 is the most commonly used of several versions available. For new RSS feeds, it's better to use version 2.0 because it is the current spec and, as mentioned above, it's backward compatible with 0.91.

Dave Winer authored version 2.0 of the spec. He intentionally avoided changes to the specification that would make it more difficult to use, or that would break existing applications. Winer sums up his philosophy like this: "Keep it simple. That's the value of RSS. Anyone who can understand a little HTML can understand RSS. That's important!"

The spec is published under a Creative Commons license (see Resources). This means that you're free to copy and distribute the spec, make derivative works, and to use it freely in commercial work. An advisory board is responsible for updating the spec, advocating for it, and documenting it.

Form of an RSS file

An RSS file is made up of a <channel> element and its sub-elements. <channel> contains elements that represent metadata about the channel -- such as a <title>, <link>, and <description> -- in addition to the channel content itself, in the form of items. Items typically make up the bulk of the channel, and contain content that changes frequently.

The channel

A channel typically has three elements that tell you about the channel itself:

  • <title>: The name of the channel or feed.
  • <link>: The URL of the Web site or site area associated with this channel.
  • <description>: A brief explanation of what the channel is.

Many channel sub-elements are optional. The commonly-used <image> element contains three required sub-elements:

  • <url>: The URL of a GIF, JPEG, or PNG image that represents the channel.
  • <title>: Describes the image. It is used in the ALT attribute of the HTML <img> tag when the channel is rendered in HTML.
  • <link>: The URL of the site. When the channel is rendered as HTML, the image can act as a link to the site.

<image> also has three optional sub-elements:

  • <width>: Number indicating the width of the image in pixels. The maximum value is 144 and the default value is 88.
  • <height>: Number indicating the height of the image in pixels. The maximum value is 400 and the default value is 31.
  • <description>: Contains text that is included in the title attribute of the link that's formed around the image when rendered.
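As a small illustration of the width and height rules above, here is one way a reader might work out the size to render a channel image at. The defaults and maxima come from the list above; clamping oversized or unreadable values is my own assumption about sensible behavior rather than something the spec dictates, and image_size is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Limits and defaults as given in the list above.
MAX_WIDTH, DEFAULT_WIDTH = 144, 88
MAX_HEIGHT, DEFAULT_HEIGHT = 400, 31


def image_size(image_element):
    """Return (width, height) in pixels for an <image> element."""

    def read(tag, default, maximum):
        text = image_element.findtext(tag)
        try:
            value = int(text.strip()) if text else default
        except ValueError:
            # Assumption: fall back to the default when the value isn't a number.
            value = default
        # Assumption: clamp out-of-range values instead of rejecting the feed.
        return max(1, min(value, maximum))

    return (read("width", DEFAULT_WIDTH, MAX_WIDTH),
            read("height", DEFAULT_HEIGHT, MAX_HEIGHT))


if __name__ == "__main__":
    image = ET.fromstring("<image><width>500</width><height>40</height></image>")
    print(image_size(image))
```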

In addition, many other optional channel elements can be used. Most of these are self-explanatory:

  • <language>: en-us
  • <copyright>: Copyright 2003, James Lewin
  • <managingEditor>: dan@spam_me.com (Dan Deletekey)
  • <webMaster>: dan@spam_me.com (Dan Deletekey)
  • <pubDate>: Sat, 15 Nov 2003 00:00:01 GMT
  • <lastBuildDate>: Sat, 15 Nov 2003 00:00:01 GMT
  • <category>: ebusiness
  • <generator>: Your CMS 2.0
  • <docs>: http://blogs.law.harvard.edu/tech/rss
  • <cloud>: Allows processes to register with a "cloud" to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds.
  • <ttl>: Time to live, a number representing the number of minutes a feed can be cached before it should be refreshed.
  • <rating>: The PICS rating for the channel.
  • <textInput>: Defines an input box that can be displayed with the channel.
  • <skipHours>: A hint for aggregators that tells them which hours can be skipped for updates.
  • <skipDays>: A hint for aggregators that tells them which days can be skipped for updates.
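To show how <ttl>, <skipHours>, and <skipDays> might fit together, here is a rough sketch of the decision an aggregator could make before re-fetching a channel. It assumes the layout the spec describes (<hour> sub-elements holding 0 through 23 in GMT, and <day> sub-elements holding English day names); the function name should_refresh and the policy of refreshing whenever no <ttl> is present are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Index matches datetime.weekday(): Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def should_refresh(channel, last_fetch, now):
    """Decide whether to fetch the channel again.

    channel is a parsed <channel> element; last_fetch and now are
    timezone-aware datetimes in UTC.
    """
    skip_hours = {int(h.text) for h in channel.findall("./skipHours/hour")}
    skip_days = {d.text.strip() for d in channel.findall("./skipDays/day")}
    if now.hour in skip_hours or DAY_NAMES[now.weekday()] in skip_days:
        return False
    ttl_text = channel.findtext("ttl")
    if ttl_text is None:
        # Assumption: with no <ttl>, the reader is free to refresh.
        return True
    return now - last_fetch >= timedelta(minutes=int(ttl_text))


if __name__ == "__main__":
    channel = ET.fromstring(
        "<channel><ttl>60</ttl>"
        "<skipHours><hour>3</hour></skipHours>"
        "<skipDays><day>Sunday</day></skipDays></channel>")
    fetched = datetime(2003, 11, 17, 10, 0, tzinfo=timezone.utc)
    print(should_refresh(channel, fetched, fetched + timedelta(minutes=90)))
```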

Items

Items are usually the most important part of a feed. Each item can be an entry on a weblog, a complete article, a movie review, a classified ad, or whatever you want to syndicate with your channel. While other elements within a channel may stay constant, items are likely to change frequently.

You can have as many items as you want. The previous spec had a limit of 15 items, and this is still a good upper limit if you want to ensure backwards compatibility.

Elements of news items

An item typically contains three elements:

  • <title>: This is the name of the item. In standard use, this is translated into a headline within HTML.
  • <link>: This is the URL of the item. The title is commonly used as a link, pointing to the URL contained within the <link> element.
  • <description>: This is usually a summary of or commentary on the URL that is pointed to in the link.

All elements are optional, but an item must contain either a <title> or a <description>.
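The one firm rule for items is easy to check in code. The sketch below is a minimal validity test, not a full validator; treating an empty or whitespace-only element as missing is an assumption I've made for the example, and item_is_valid is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def item_is_valid(item):
    """Return True if the item has a non-empty <title> or <description>."""
    for tag in ("title", "description"):
        text = item.findtext(tag)
        # Assumption: an element containing only whitespace counts as absent.
        if text is not None and text.strip():
            return True
    return False


if __name__ == "__main__":
    item = ET.fromstring("<item><link>http://www.example.com/</link></item>")
    print(item_is_valid(item))
```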

Several other optional elements of items can be used:

  • <author>: E-mail address of the author.
  • <category>: Supports organizing entries.
  • <comments>: URL of a page for comments on the item.
  • <enclosure>: Supports media objects associated with the item.
  • <guid>: A string that uniquely identifies the item. Unless its isPermaLink attribute is set to false, readers may treat it as a permanent link to the item.
  • <pubDate>: When the item was published.
  • <source>: The RSS channel that an item comes from. This can be useful when items are aggregated together.
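One practical use of <guid> is letting an aggregator work out which items it has already shown. Here is a rough sketch of that idea. Falling back to <link> and then <title> when an item has no <guid> is my own assumption, not a rule from the spec, and the helper names item_key and new_items are made up.

```python
import xml.etree.ElementTree as ET


def item_key(item):
    """Pick a string that identifies an item: guid first, then link, then title."""
    for tag in ("guid", "link", "title"):
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None


def new_items(items, seen_keys):
    """Return the items not seen before, recording their keys in seen_keys."""
    fresh = []
    for item in items:
        key = item_key(item)
        if key is None or key not in seen_keys:
            fresh.append(item)
            if key is not None:
                seen_keys.add(key)
    return fresh


if __name__ == "__main__":
    channel = ET.fromstring(
        "<channel><item><guid>a</guid></item>"
        "<item><guid>a</guid></item></channel>")
    print(len(new_items(channel.findall("item"), set())))
```

In a real reader, seen_keys would be saved between runs so that items aren't shown again after a restart.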

Listing 1 is an example of an RSS 2.0 file. Notice that the channel is contained within <rss version="2.0">. This very basic example shows how items and images are contained within the channel. The elements shown are the most commonly used channel sub-elements.


Listing 1. A simple RSS 2.0 file
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>The channel's name goes here</title>
    <link>http://www.urlofthechannel.com/</link>
    <description>This channel is an example channel for an article.
    </description>
    <language>en-us</language>
    <image>
      <title>The image title goes here</title>
      <url>http://www.urlofthechannel.com/images/logo.gif</url>
      <link>http://www.urlofthechannel.com/</link>
    </image>
    <item>
      <title>The Future of content</title>
      <link>http://www.itworld.com/nl/ecom_in_act/11122003/</link>
      <description> The issue of people distributing and reusing
      digital media is a problem for many businesses. It may also be
      a hidden opportunity. Just as open source licensing has opened
      up new possibilities in the world of technology, it promises to do
      the same in the area of creative content.</description>
    </item>
    <item>
      <title>Online Music Services - Better than free?</title>
      <link>http://www.itworld.com/nl/ecom_in_act/08202003/</link>
      <description>More people than ever are downloading music from 
      the Internet. Many use person-to-person file sharing programs like 
      Kazaa to share and download music in MP3 format, paying nothing. 
      This has made it difficult for companies to setup online music 
      businesses. How can companies compete against free?</description>
    </item>
  </channel>
</rss>
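You don't have to write a file like Listing 1 by hand. As a rough sketch, the Python below builds a similar channel with the standard library's ElementTree module, which takes care of escaping characters such as "&" for you. The function name build_feed, the dictionary layout for items, and the example.com URLs are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document and return it as a string.

    items is a list of dicts that may contain "title", "link",
    and "description" keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            if tag in item:
                ET.SubElement(node, tag).text = item[tag]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(
        "The channel's name goes here",
        "http://www.example.com/",
        "This channel is an example channel for an article.",
        [{"title": "News & notes", "link": "http://www.example.com/news"}]))
```

The result is a single unindented line of XML, which is fine for machines. The sketch also leaves out the XML declaration and optional elements such as <image>.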


Speaking in tongues

Because of the popularity of RSS, many tools have emerged that allow you to work with the files in almost any environment:

  • Java technology: An RSS Utilities Package, available at Sun's site, supports the use of a Tag Library within JavaServer Pages. It also includes an RSS parser.
  • Perl: Several established Perl tools work with RSS. XML::RSS provides a framework for creating and maintaining RSS files. It supports converting between the more commonly used versions.
  • Python: RSS.py is a set of classes for working with RSS channels with Python.

In addition, many content management and weblog tools support RSS directly. Most weblog tools, including Movable Type, Blogger, and Radio Userland, support RSS. Several content management systems, including Zope and CityDesk, now support it.


Extending RSS

RSS 2.0 has many optional elements, including those that are needed for most channels. However, it supports extensibility so you can use elements that aren't in the spec. The RSS 2.0 spec does not spend much time defining how this will work, though. Extensibility is summed up like this: "A RSS feed may contain elements not described on this page, only if those elements are defined in a namespace."

This leaves a great deal to the imagination! Fortunately, the spec includes an example, and you can refer to a number of examples currently in use.

The basic idea is that you can add any tag you want -- however, it's very easy to add elements that have multiple meanings. People using your channel might not have any idea what a particular tag means. For example, if I wanted to use the <analog> tag in a channel, it would be unclear what it meant. Web gurus might think the tag referred to Analog, the popular Web log file analyzer. Science fiction fans might think the tag had something to do with Analog, the classic sci-fi magazine. Musicians might think that it referred to a popular type of synthesizer, biologists a type of organ, and electrical engineers a type of circuit. Vagueness makes it difficult to understand the meaning of a tag.

Because of this, RSS lets you add any tag you like, but requires that it be used with a namespace. This helps to clarify what that tag means.

Getting back to the <analog> example, I might want to create a set of tags relating to e-business, and have my <analog> tag be an "e-business" element. I could create an e-business namespace and make <analog> a tag in it by adding the following namespace declaration to the <rss> element:

xmlns:ebusiness="http://www.lewingroup.com/ebusinessChannel"

This declares a namespace, identified by the URI on my site, and binds it to the prefix "ebusiness". By convention, the URI is also where you would look for the documentation for this namespace. To use the <analog> tag, I could use the format <ebusiness:analog>. This would distinguish it from other possible meanings of analog, like <sciencefiction:analog> or <synthesizers:analog>.

A more practical example of extensibility is found within the sample file that accompanies the RSS 2.0 spec:


Listing 2. Namespaces in the RSS 2.0 spec sample file
<?xml version="1.0"?>
<!-- RSS generated by Radio UserLand v8.0.5 on 9/30/2002; 4:00:00 AM Pacific -->
<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
  <channel>
    <title>Scripting News</title>
    <link>http://www.scripting.com/</link>
    <description>A weblog about scripting and stuff like that.</description>
    <language>en-us</language>
    <blogChannel:blogRoll>
      http://radio.weblogs.com/0001015/userland/scriptingNewsLeftLinks.opml
    </blogChannel:blogRoll>
    <item>
      <description>Joshua Allen:
      &lt;a href="http://www.netcrucible.com/blog/2002/09/29.html#a243"&gt;
      Who loves namespaces?&lt;/a&gt;</description>
      <pubDate>Sun, 29 Sep 2002 19:59:01 GMT</pubDate>
      <guid>
      http://scriptingnews.userland.com/backissues/2002/09/29#When:12:59:01PM
      </guid>
    </item>
  </channel>
</rss>

In this example, a namespace called blogChannel is defined. It points to documentation that explains the use of several new elements that are common to weblogs. One of these is <blogRoll>. The documentation explains that a blogroll is a collection of links within a weblog that point to sites related to your weblog's content.

The <blogChannel:blogRoll> tag provides the information needed for users or software to know that blogRoll is an element defined in the blogChannel namespace, and where the documentation for this can be found.
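To see how software might pick out a namespaced element, here is a short Python sketch that reads the blogRoll URL from a cut-down version of the sample feed. It uses the standard library's ElementTree, which matches elements by namespace URI rather than by prefix, so the "bc" prefix below is just a local name chosen for this example. The function name blogroll_url is also made up.

```python
import xml.etree.ElementTree as ET

# The namespace URI declared in the sample file.
BLOGCHANNEL_NS = "http://backend.userland.com/blogChannelModule"

# A cut-down version of the spec's sample feed.
FEED = """<rss version="2.0"
     xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
  <channel>
    <title>Scripting News</title>
    <blogChannel:blogRoll>
      http://radio.weblogs.com/0001015/userland/scriptingNewsLeftLinks.opml
    </blogChannel:blogRoll>
  </channel>
</rss>"""


def blogroll_url(feed_xml):
    """Return the channel's blogRoll URL, or None if the feed doesn't have one."""
    root = ET.fromstring(feed_xml)
    # "bc" is an arbitrary local prefix; only the namespace URI has to match.
    text = root.findtext("./channel/bc:blogRoll",
                         namespaces={"bc": BLOGCHANNEL_NS})
    return text.strip() if text else None


if __name__ == "__main__":
    print(blogroll_url(FEED))
```

A reader that knows nothing about the blogChannel namespace can simply ignore the element, which is what keeps extended feeds usable in older tools.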

Again, RSS 2.0 requires namespaces only for elements not included in the spec. The basic tags defined by the spec are not themselves members of any namespace, which keeps RSS 2.0 compatible with earlier versions. This makes the format easy to use, since you don't need to know anything about namespaces unless you want to extend RSS.


Summary

This article has looked at the importance of RSS in the areas of content syndication and aggregation. The article focused on RSS 2.0, because it is the most recent version of the specification, and it is rapidly growing in popularity. The article also reviewed the tools available for working with RSS, including aggregators, validators, and parsers. For additional information, see Resources.


Resources
