It's been three years since I wrote my last article on RSS for developerWorks, "An introduction to RSS news feeds." At that time, RSS was one of the more popular uses for XML. Since then, Netscape abandoned the format, five (count 'em, five) new versions of the RSS spec have come out, and there was an acrimonious fork in the format.
In spite of these setbacks, RSS is now more popular than ever.
Today you can find tens of thousands of RSS feeds. Weblog users, news publishers, government agencies, and many personal and commercial Web sites support the format. Developer tools for RSS are available for Java technology, Perl, PHP, Python, and other major programming languages. Many viewers and aggregators work on the Web, on the desktop, even within e-mail clients. RSS has become the de facto standard for syndicating content and metadata over the Internet.
This article will review the current specification, RSS 2.0. I'll skip the discussion of the colorful characters and controversies surrounding the format -- that wouldn't leave room for much else.
Instead, this article will give you a little background, review how the format is being used, and drop the names of some of the more popular tools for working with it. It will review the nuts and bolts of the format, give you examples, and tell you what you need to know to get started. Finally, it will cover some of the new features of RSS 2.0, such as extending RSS using namespaces. At the end of the article, you'll find the mother lode -- a gargantuan annotated list of RSS resources.
RSS is a format for syndicating content and metadata over the Internet. It is commonly used to share headlines and links to news articles. With news articles, the actual article isn't usually shared, but metadata about the article is; this metadata can include a headline, a URL, or a summary. RSS is an important tool for publishers because feeds can be used to syndicate content, and to integrate third-party content into your site.
RSS is a dialect of XML. All RSS files must conform to the XML 1.0 specification, as published on the World Wide Web Consortium (W3C) Web site.
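Since every RSS file has to be well-formed XML 1.0, an ordinary XML parser makes a useful first check. The following is a minimal sketch that assumes Python's standard-library `xml.etree.ElementTree`. The `is_well_formed` helper is hypothetical. Passing this check says nothing about whether the file actually follows the RSS spec.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (well-formedness only, not RSS validity)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = '<rss version="2.0"><channel><title>Example</title></channel></rss>'
bad = '<rss version="2.0"><channel><title>Example</channel></rss>'  # <title> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Checking a feed against the RSS spec itself is a separate job, which is what dedicated RSS validators do.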
Here's a typical example of how RSS is used:
- A publisher has some content that they want to publicize.
- They create an RSS channel for their content.
- In this channel, they include items for Web pages they want to promote.
- This channel can be read by remote applications, and converted to headlines and links. These links can be incorporated into new Web pages, or read in dedicated readers.
- People see the links on various sites, click on them, and go to the original publisher's site.
While headline syndication is the most common use for RSS, it is also used for many other purposes. RSS is a very popular format in the weblog community. It's also used for photo diaries, classified ad listings, recipes, reviews, and for tracking the status of software packages.
RSS feeds are used in the world of e-commerce as a way of delivering information. For example, Amazon provides custom news feeds based on its Web services platform. This lets you track top books in your news reader, or include information on your Web site about related books for sale at Amazon.
RSS has grown tremendously in popularity in the last few years. Syndic8.com maintains an index of RSS channels, and its list of feeds has grown by about 1400% in two years. Yahoo news, the BBC, Slashdot, LockerGnome, Amazon, CNN, Wired, Rolling Stone, and Apple Computer are among the many popular sources of RSS feeds.
As the number of news feeds has grown, a new category of software has emerged: the news reader. News readers are personal aggregators -- they help you find and organize a list of channels that you are interested in. Once you've selected your channels, you can view them all through the reader's consistent interface. The news reader checks for updates to the channels you're interested in, and converts them to HTML for browsing.
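The conversion step a news reader performs can be sketched in a few lines. This is a minimal sketch under several assumptions. It uses Python's standard library. The feed contents and the `items_to_html` helper are hypothetical. A real reader would also fetch the feed over HTTP and cache it.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed used only for illustration.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example channel</title>
    <link>http://www.example.com/</link>
    <description>A sample feed.</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Fish &amp; chips</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>"""


def items_to_html(feed_xml: str) -> str:
    """Render each <item> as an HTML list entry whose headline links to the item's URL."""
    channel = ET.fromstring(feed_xml).find("channel")
    rows = []
    for item in channel.findall("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        # The parser has already decoded XML entities, so re-escape for HTML output.
        rows.append(f'<li><a href="{html.escape(link)}">{html.escape(title)}</a></li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


print(items_to_html(FEED))
```

The `&amp;` in the second headline shows why the text must be re-escaped. The XML parser decodes it to a bare `&`, and `html.escape` turns it back into `&amp;` before it is written into the page.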
One popular news reader is BlogExpress:
Figure 1. Viewing alphaWorks in BlogExpress
BlogExpress is what's known as "pizza ware" -- meaning, if you like the software, you can send the author some pizza money.
Adding a channel in BlogExpress is easy. Say you're interested in tracking what's going on at IBM's alphaWorks site. The first thing you would do is pick a feed to add:
Figure 2. Finding a news feed at alphaWorks
Once you've found a feed, adding it takes only a moment. The orange "XML" logo is often used to link to public RSS feeds. In most Web browsers, you can right-click on the "XML" logo, copy the link, and paste it into your news reader.
Other popular news readers include BlogStreet, FeedReader, AmphetaDesk, and NewsGator (see Resources).
Search engines can be used to find content in RSS format. With Google, for example, you can add "filetype:rss" to a search to find your search terms in .rss files.
Dedicated search engines make searching for content easier. Feedster monitors weblogs, and lets you search through an index of log entries and view them by relevance, date, or ranking (blogrank). When you do a search, Feedster creates an RSS feed based on your request. This can be added to your news reader, so that you can see all the recent activity on your search request, without even leaving your news reader.
DayPop searches news, blogs, and RSS feeds. It lets you track popular news within the weblog world. It provides a list of the current top 40 most popular weblog links. These are the most commonly linked-to articles throughout the world. It creates a list of top words that are being used in weblogs. It also ranks weblogs by citations, providing an index of weblogs that are popular with other webloggers. You can do custom searches, too. The ranking lists and custom searches are available as RSS feeds that you can import into your news reader.
RSS 2.0 builds upon the RSS 0.91 spec. It's backwards compatible, so tools that work with RSS 2.0 should also work with 0.91 feeds. The updated spec adds a few elements.
It also relaxes some restrictions. Previously, <url> elements could only contain http or ftp addresses; now, any valid URI can be used. In RSS 0.91, each channel could contain no more than 15 items, and elements were limited in length; these limits have been removed. Larger values should still be used with caution, because they may cause problems with older applications.
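A publisher who wants to stay friendly to older software could flag the relaxed features before publishing. The following is a minimal sketch that assumes Python's standard library. The `rss091_warnings` helper is hypothetical. It checks only the item count and the URI scheme of `<link>` and `<url>` values. It does not check element lengths.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

MAX_091_ITEMS = 15                    # per-channel item limit in RSS 0.91
ALLOWED_091_SCHEMES = {"http", "ftp"}  # schemes older readers expect


def rss091_warnings(feed_xml: str) -> list[str]:
    """List features of an RSS 2.0 feed that 0.91-era readers may not handle."""
    channel = ET.fromstring(feed_xml).find("channel")
    warnings = []
    items = channel.findall("item")
    if len(items) > MAX_091_ITEMS:
        warnings.append(f"{len(items)} items (0.91 allowed at most {MAX_091_ITEMS})")
    # Inspect every un-namespaced <link> and <url> anywhere inside the channel.
    for elem in channel.iter():
        if elem.tag in ("link", "url") and elem.text:
            value = elem.text.strip()
            scheme = urlparse(value).scheme
            if scheme not in ALLOWED_091_SCHEMES:
                warnings.append(f"<{elem.tag}> uses scheme {scheme!r}: {value}")
    return warnings


feed = (
    '<rss version="2.0"><channel><title>T</title>'
    "<link>https://www.example.com/</link><description>D</description>"
    + "".join(f"<item><title>Item {i}</title></item>" for i in range(20))
    + "</channel></rss>"
)
for warning in rss091_warnings(feed):
    print(warning)
```

The example feed triggers both checks. It has 20 items, and its channel link uses `https`.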
The greatest change, though, is the ability to extend the format using namespaces. RSS 2.0 supports namespaces, a standardized approach to adding elements not found in the spec. Feeds can contain new elements if they are defined in a namespace.
RSS is a dialect of XML, and is used for syndicating Web content and metadata. RSS 0.91 is the most commonly used of several versions available. For new RSS feeds, it's better to use version 2.0 because it is the current spec and, as mentioned above, it's backward compatible with 0.91.
Dave Winer authored version 2.0 of the spec. He intentionally avoided changes to the specification that would make it more difficult to use, or that would break existing applications. Winer sums up his philosophy like this: "Keep it simple. That's the value of RSS. Anyone who can understand a little HTML can understand RSS. That's important!"
The spec is published under a Creative Commons license (see Resources). This means that you're free to copy and distribute the spec, make derivative works, and use it in commercial work. An advisory board is responsible for updating the spec, advocating for it, and documenting it.
An RSS file is made up of a <channel> element and its sub-elements. <channel> contains elements that represent metadata about the channel -- such as a <description> -- in addition to the channel content itself, in the form of items. Items typically make up the bulk of the channel, and contain content that changes frequently.
A channel typically has three elements that tell you about the channel itself:
<title>: The name of the channel or feed.
<link>: The URL of the Web site or site area associated with this channel.
<description>: A brief explanation of what the channel is.
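A channel with only these three elements is enough to produce a working feed. The following is a minimal sketch that assumes Python's standard-library `xml.etree.ElementTree`. The `minimal_feed` helper and the sample values are hypothetical. Note that `ET.tostring(..., encoding="unicode")` leaves out the optional `<?xml ...?>` declaration.

```python
import xml.etree.ElementTree as ET


def minimal_feed(title: str, link: str, description: str) -> str:
    """Build an RSS 2.0 document whose channel holds just <title>, <link>, and <description>."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text  # ElementTree escapes &, <, > for us
    return ET.tostring(rss, encoding="unicode")


print(minimal_feed("Example channel", "http://www.example.com/", "News & notes"))
```

Letting the XML library build the document, rather than pasting strings together, ensures that characters such as `&` in a description are escaped correctly.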
Many channel sub-elements are optional. The commonly used <image> element contains three required sub-elements:
<url>: The URL of a GIF, JPEG, or PNG image that represents the channel.
<title>: Describes the image. It is used in the ALT attribute of the HTML <img> tag when the channel is rendered in HTML.
<link>: The URL of the site. When the channel is rendered as HTML, the image can act as a link to the site.
<image> also has three optional sub-elements:
<width>: Number indicating the width of the image in pixels. The maximum value is 144 and the default value is 88.
<height>: Number indicating the height of the image in pixels. The maximum value is 400 and the default value is 31.
<description>: Contains text that is included in the TITLE attribute of the link that's formed around the image when rendered.
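Here is one way to turn the <image> sub-elements into HTML. It is a minimal sketch that assumes Python's standard library. The `image_to_html` helper is hypothetical. Capping oversized widths and heights at the spec maximums is this sketch's own choice. The spec itself does not prescribe it.

```python
import html
import xml.etree.ElementTree as ET

DEFAULT_WIDTH, MAX_WIDTH = 88, 144
DEFAULT_HEIGHT, MAX_HEIGHT = 31, 400


def image_to_html(image: ET.Element) -> str:
    """Render an RSS <image> element as an HTML <img> wrapped in a link."""
    url = image.findtext("url")
    title = image.findtext("title")
    link = image.findtext("link")
    # Optional sizes fall back to the defaults and are capped at the maximums.
    width = min(int(image.findtext("width", DEFAULT_WIDTH)), MAX_WIDTH)
    height = min(int(image.findtext("height", DEFAULT_HEIGHT)), MAX_HEIGHT)
    # The optional <description> becomes the link's title attribute.
    description = image.findtext("description")
    title_attr = f' title="{html.escape(description)}"' if description else ""
    return (
        f'<a href="{html.escape(link)}"{title_attr}>'
        f'<img src="{html.escape(url)}" alt="{html.escape(title)}" '
        f'width="{width}" height="{height}"></a>'
    )


image = ET.fromstring(
    "<image><title>Example logo</title>"
    "<url>http://www.example.com/logo.gif</url>"
    "<link>http://www.example.com/</link>"
    "<description>Visit Example</description></image>"
)
print(image_to_html(image))
```

In the result, the image's <title> becomes the `alt` text and its <link> becomes the surrounding anchor. The optional sizes fall back to the default of 88 by 31 pixels.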
In addition, many other optional channel elements can be used. Most of these are self-explanatory; the first few are shown here with sample values:
<copyright>: Copyright 2003, James Lewin
<managingEditor>: dan@spam_me.com (Dan Deletekey)
<webMaster>: dan@spam_me.com (Dan Deletekey)
<pubDate>: Sat, 15 Nov 2003 00:00:01 GMT
<lastBuildDate>: Sat, 15 Nov 2003 00:00:01 GMT
<generator>: Your CMS 2.0
<cloud>: Allows processes to register with a "cloud" to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds.
<ttl>: Time to live, a number representing the number of minutes a feed can be cached before it should be refreshed.
<rating>: The PICS rating for the channel.
<textInput>: Defines an input box that can be displayed with the channel.
<skipHours>: A hint for aggregators that tells them which hours can be skipped for updates.
<skipDays>: A hint for aggregators that tells them which days can be skipped for updates.
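Dates in RSS use the RFC 822 style shown in the sample values, and <ttl> and <skipHours> are plain numbers that an aggregator can act on. The following is a minimal sketch that assumes Python's standard library (`email.utils` handles the date format). The `should_refresh` policy is hypothetical. It treats <skipHours> values as GMT hours, as the RSS 2.0 spec describes, and it ignores <skipDays> for brevity.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
import xml.etree.ElementTree as ET

# RFC 822 dates, as used by <pubDate> and <lastBuildDate>.
stamp = datetime(2003, 11, 15, 0, 0, 1, tzinfo=timezone.utc)
print(format_datetime(stamp, usegmt=True))                              # Sat, 15 Nov 2003 00:00:01 GMT
print(parsedate_to_datetime("Sat, 15 Nov 2003 00:00:01 GMT") == stamp)  # True


def should_refresh(channel: ET.Element, last_fetched: datetime, now: datetime) -> bool:
    """Hypothetical aggregator policy that honors <ttl> and <skipHours>."""
    ttl = channel.findtext("ttl")
    if ttl is not None and now - last_fetched < timedelta(minutes=int(ttl)):
        return False  # the cached copy is still within its time to live
    # <skipHours> holds <hour> values (0-23, GMT) during which polling can be skipped.
    skip = {int(h.text) for h in channel.findall("skipHours/hour")}
    return now.astimezone(timezone.utc).hour not in skip


channel = ET.fromstring("<channel><ttl>60</ttl><skipHours><hour>3</hour></skipHours></channel>")
fetched = datetime(2003, 11, 15, 1, 0, tzinfo=timezone.utc)
print(should_refresh(channel, fetched, fetched + timedelta(minutes=30)))          # False: within ttl
print(should_refresh(channel, fetched, fetched + timedelta(minutes=90)))          # True: 02:30 GMT
print(should_refresh(channel, fetched, fetched + timedelta(hours=2, minutes=15))) # False: 03:15 GMT is skipped
```

Both hints are advisory. An aggregator that ignores them still works, but it polls the publisher's server more often than needed.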
Items are usually the most important part of a feed. Each item can be an entry on a weblog, a complete article, a movie review, a classified ad, or whatever you want to syndicate with your channel. While other elements within a channel may stay constant, items are likely to change frequently.
You can have as many items as you want. The previous spec had a limit of 15 items, and this is still a good upper limit if you want to ensure backwards compatibility.
An item typically contains three elements:
<title>: This is the name of the item. In standard use, this is translated into a headline within HTML.
<link>: This is the URL of the item. The title is commonly used as a link, pointing to the URL contained within the <link> element.
<description>: This is usually a summary of or commentary on the URL that is pointed to in the link.
All elements of an item are optional, but an item must contain at least a <title> or a <description>.
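The one hard rule for items is easy to check mechanically. The following is a minimal sketch that assumes Python's standard library. The `invalid_items` helper and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def invalid_items(feed_xml: str) -> list[int]:
    """Return the 0-based positions of items that have neither <title> nor <description>."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        index
        for index, item in enumerate(channel.findall("item"))
        if item.find("title") is None and item.find("description") is None
    ]


feed = """<rss version="2.0"><channel>
  <title>T</title><link>http://www.example.com/</link><description>D</description>
  <item><title>Headline only</title></item>
  <item><description>Summary only</description></item>
  <item><link>http://www.example.com/orphan</link></item>
</channel></rss>"""
print(invalid_items(feed))  # [2]
```

The comparison uses `is None` on purpose. An ElementTree element with no children counts as false in a boolean test, so a bare truthiness check would give the wrong answer.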
Several other optional elements of items can be used:
<author>: E-mail address of the author.
<category>: Supports organizing entries.
<comments>: URL of a page for comments on the item.
<enclosure>: Supports media objects associated with the item.
<guid>: A string that uniquely identifies the item. By default, it is also treated as a permanent link (permalink) to the item.
<pubDate>: When the item was published.
<source>: The RSS channel that an item comes from. This can be useful when items are aggregated together.
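Aggregators typically use <guid> to decide whether an item is new. The following is a minimal sketch that assumes Python's standard library. The `new_items` helper is hypothetical. Its fallback order (<guid>, then <link>, then <title>, then <description>) is this sketch's own choice for items without a <guid>, not something the spec mandates.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def item_key(item: ET.Element) -> Optional[str]:
    """Identify an item by <guid>, falling back to <link>, <title>, then <description>."""
    for tag in ("guid", "link", "title", "description"):
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None


def new_items(feed_xml: str, seen: set[str]) -> list[ET.Element]:
    """Return items whose key is not in `seen`, and record those keys."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        key = item_key(item)
        if key is not None and key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh


first = """<rss version="2.0"><channel>
  <item><guid>http://www.example.com/posts/1</guid><title>One</title></item>
  <item><link>http://www.example.com/posts/2</link><title>Two</title></item>
</channel></rss>"""
second = """<rss version="2.0"><channel>
  <item><guid>http://www.example.com/posts/1</guid><title>One (edited)</title></item>
  <item><guid>http://www.example.com/posts/3</guid><title>Three</title></item>
</channel></rss>"""

seen: set[str] = set()
print([i.findtext("title") for i in new_items(first, seen)])   # ['One', 'Two']
print([i.findtext("title") for i in new_items(second, seen)])  # ['Three']
```

The edited first post keeps its <guid>, so on the second fetch it is recognized as an item the reader has already seen, even though its title changed.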
Listing 1 is an example of an RSS 2.0 file. Notice that the channel is contained within <rss version="2.0">. This very basic example shows how items and images are contained within the channel. The elements shown are the most commonly used channel sub-elements.
Listing 1. A simple RSS 2.0 file
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>The channel's name goes here</title>
    <link>http://www.urlofthechannel.com/</link>
    <description>This channel is an example channel for an article.</description>
    <language>en-us</language>
    <image>
      <title>The image title goes here</title>
      <url>http://www.urlofthechannel.com/images/logo.gif</url>
      <link>http://www.urlofthechannel.com/</link>
    </image>
    <item>
      <title>The Future of content</title>
      <link>http://www.itworld.com/nl/ecom_in_act/11122003/</link>
      <description>The issue of people distributing and reusing digital media is a problem for many businesses. It may also be a hidden opportunity. Just as open source licensing has opened up new possibilities in the world of technology, it promises to do the same in the area of creative content.</description>
    </item>
    <item>
      <title>Online Music Services - Better than free?</title>
      <link>http://www.itworld.com/nl/ecom_in_act/08202003/</link>
      <description>More people than ever are downloading music from the Internet. Many use person-to-person file sharing programs like Kazaa to share and download music in MP3 format, paying nothing. This has made it difficult for companies to set up online music businesses. How can companies compete against free?</description>
    </item>
  </channel>
</rss>
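Reading a file like Listing 1 takes only a few calls. The following is a minimal sketch that assumes Python's standard-library `xml.etree.ElementTree`. It uses an abridged copy of the listing, with the item descriptions left out. The `summarize` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of Listing 1 (item descriptions omitted).
LISTING_1 = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>The channel's name goes here</title>
    <link>http://www.urlofthechannel.com/</link>
    <description>This channel is an example channel for an article.</description>
    <language>en-us</language>
    <image>
      <title>The image title goes here</title>
      <url>http://www.urlofthechannel.com/images/logo.gif</url>
      <link>http://www.urlofthechannel.com/</link>
    </image>
    <item>
      <title>The Future of content</title>
      <link>http://www.itworld.com/nl/ecom_in_act/11122003/</link>
    </item>
    <item>
      <title>Online Music Services - Better than free?</title>
      <link>http://www.itworld.com/nl/ecom_in_act/08202003/</link>
    </item>
  </channel>
</rss>"""


def summarize(feed_xml: str) -> dict:
    """Collect channel metadata and item headlines from an RSS document."""
    root = ET.fromstring(feed_xml)
    if root.tag != "rss":
        raise ValueError("root element is not <rss>")
    channel = root.find("channel")
    # find()/findtext() with a bare name match direct children only, so the
    # channel's <title> is not confused with the <title> inside <image>.
    return {
        "title": channel.findtext("title"),
        "language": channel.findtext("language"),
        "logo": channel.findtext("image/url"),
        "headlines": [item.findtext("title") for item in channel.findall("item")],
    }


print(summarize(LISTING_1))
```

The sketch does not reject other `version` values. That way, a 0.91 feed with the same basic structure is read the same way.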
Because of the popularity of RSS, many tools have emerged that allow you to work with the files in almost any environment:
- Java technology: An RSS Utilities Package, available at Sun's site, supports the use of a Tag Library within JavaServer Pages. It also includes an RSS parser.
- Perl: Several established Perl tools work with RSS. XML::RSS provides a framework for creating and maintaining RSS files. It supports converting between the more commonly used versions.
- Python: RSS.py is a set of classes for working with RSS channels in Python.
In addition, many content management and weblog tools support RSS directly. Most weblog tools, including Movable Type, Blogger, and Radio Userland, support RSS. Several content management systems, including Zope and CityDesk, now support it.
RSS 2.0 defines many optional elements, covering what most channels need. It also supports extensibility, so you can use elements that aren't in the spec. The RSS 2.0 spec does not spend much time defining how this will work, though. Extensibility is summed up like this: "A RSS feed may contain elements not described on this page, only if those elements are defined in a namespace."
This leaves a great deal to the imagination! Fortunately, the spec includes an example, and you can refer to a number of examples currently in use.
The basic idea is that you can add any tag you want -- however, it's very easy to add elements that have multiple meanings. People using your channel might not have any idea what a particular tag means. For example, if I wanted to use the <analog> tag in a channel, it would be unclear what it meant. Web gurus might think the tag referred to Analog, the popular Web log file analyzer. Science fiction fans might think the tag had something to do with Analog, the classic sci-fi magazine. Musicians might think that it referred to a popular type of synthesizer, biologists a type of organ, and electrical engineers a type of circuit. Vagueness makes it difficult to understand the meaning of a tag.
Because of this, RSS lets you add any tag you like, but requires that it be used with a namespace. This helps to clarify what that tag means.
Getting back to the <analog> example, I might want to create a set of tags relating to e-business, and have my <analog> tag be an "e-business" element. To do this, I could create an e-business namespace and make <analog> a tag in that namespace, which means adding a namespace entry to the feed.
This creates a namespace named "ebusiness" and indicates that the documentation for this namespace is located on my site. To use the <analog> tag, I could use the format <ebusiness:analog>. This would distinguish it from other possible meanings of analog.
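Here is a sketch of how such a declaration might look on the root <rss> element, and how a namespace-aware parser resolves <ebusiness:analog>. It assumes Python's standard library. The namespace URI `http://www.example.com/ebusiness/` and the element's text are hypothetical placeholders, not values taken from this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; a real one would point at the namespace's documentation.
EBUSINESS_NS = "http://www.example.com/ebusiness/"

FEED = f"""<rss version="2.0" xmlns:ebusiness="{EBUSINESS_NS}">
  <channel>
    <title>E-business notes</title>
    <link>http://www.example.com/</link>
    <description>Example channel using an extension element.</description>
    <ebusiness:analog>example value</ebusiness:analog>
  </channel>
</rss>"""

channel = ET.fromstring(FEED).find("channel")

# The parser expands the prefix into the full URI: {uri}localname.
analog = channel.find(f"{{{EBUSINESS_NS}}}analog")
print(analog.tag)   # {http://www.example.com/ebusiness/}analog
print(analog.text)  # example value

# Prefixed lookups need a prefix-to-URI map; the prefix chosen here is arbitrary.
print(channel.findtext("eb:analog", namespaces={"eb": EBUSINESS_NS}))  # example value

# An un-namespaced <analog> would be a different element, so this finds nothing.
print(channel.find("analog"))  # None
```

The prefix itself carries no meaning. What identifies the element is the URI the prefix is bound to, which is why `eb:analog` in the lookup still matches `ebusiness:analog` in the feed.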
A more practical example of extensibility is found within the sample file that accompanies the RSS 2.0 spec:
Listing 2. Namespaces in the RSS 2.0 spec sample file
<?xml version="1.0"?>
<!-- RSS generated by Radio UserLand v8.0.5 on 9/30/2002; 4:00:00 AM Pacific -->
<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
  <channel>
    <title>Scripting News</title>
    <link>http://www.scripting.com/</link>
    <description>A weblog about scripting and stuff like that.</description>
    <language>en-us</language>
    <blogChannel:blogRoll>http://radio.weblogs.com/0001015/userland/scriptingNewsLeftLinks.opml</blogChannel:blogRoll>
    <item>
      <description>Joshua Allen: &lt;a href="http://www.netcrucible.com/blog/2002/09/29.html#a243"&gt;Who loves namespaces?&lt;/a&gt;</description>
      <pubDate>Sun, 29 Sep 2002 19:59:01 GMT</pubDate>
      <guid>http://scriptingnews.userland.com/backissues/2002/09/29#When:12:59:01PM</guid>
    </item>
  </channel>
</rss>
In this example, a namespace called blogChannel is defined. It points to documentation that explains the use of several new elements that are common to weblogs. One of these is <blogRoll>. The documentation explains that a blogroll is a collection of links within a weblog that point to sites related to your weblog's content. The <blogChannel:blogRoll> tag provides the information needed for users or software to know that blogRoll is an element defined in the blogChannel namespace, and where the documentation for this can be found.
Again, RSS 2.0 requires namespaces only for elements not included in the spec. The basic elements defined in the spec are not placed in any namespace, which keeps RSS 2.0 compatible with earlier versions. This makes the format easy to use, since you don't need to know anything about namespaces unless you want to extend RSS.
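The extensibility rule suggests a simple lint check. Any child of <channel> or <item> that is neither a core RSS 2.0 element nor namespaced breaks the rule. The following is a rough sketch that assumes Python's standard library. The `undeclared_elements` helper and the `<mood>` element are hypothetical. The sketch looks only one level below <channel> and <item>, and it ignores attributes.

```python
import xml.etree.ElementTree as ET

# Core RSS 2.0 children of <channel> and <item>; parsers see these with no namespace.
CHANNEL_ELEMENTS = {
    "title", "link", "description", "language", "copyright", "managingEditor",
    "webMaster", "pubDate", "lastBuildDate", "category", "generator", "docs",
    "cloud", "ttl", "image", "rating", "textInput", "skipHours", "skipDays", "item",
}
ITEM_ELEMENTS = {
    "title", "link", "description", "author", "category", "comments",
    "enclosure", "guid", "pubDate", "source",
}


def undeclared_elements(feed_xml: str) -> list[str]:
    """Name channel/item children that are neither core RSS 2.0 nor namespaced."""
    channel = ET.fromstring(feed_xml).find("channel")
    parents = [(channel, CHANNEL_ELEMENTS)]
    parents += [(item, ITEM_ELEMENTS) for item in channel.findall("item")]
    problems = []
    for parent, allowed in parents:
        for child in parent:
            namespaced = child.tag.startswith("{")  # ElementTree reports {uri}name
            if not namespaced and child.tag not in allowed:
                problems.append(f"<{child.tag}> inside <{parent.tag}>")
    return problems


feed = """<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
  <channel>
    <title>Scripting News</title>
    <link>http://www.scripting.com/</link>
    <description>A weblog about scripting and stuff like that.</description>
    <blogChannel:blogRoll>http://radio.weblogs.com/0001015/userland/scriptingNewsLeftLinks.opml</blogChannel:blogRoll>
    <analog>ambiguous</analog>
    <item><title>Hello</title><mood>cheerful</mood></item>
  </channel>
</rss>"""
print(undeclared_elements(feed))  # ['<analog> inside <channel>', '<mood> inside <item>']
```

The namespaced <blogChannel:blogRoll> passes. The bare <analog> and <mood> elements are flagged because nothing tells a reader where they are defined.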
This article has looked at the importance of RSS in the areas of content syndication and aggregation. The article focused on RSS 2.0, because it is the most recent version of the specification, and it is rapidly growing in popularity. The article also reviewed the tools available for working with RSS, including aggregators, validators, and parsers. For additional information, see Resources.
- Read the author's previous developerWorks article on RSS, "An introduction to RSS news feeds" (developerWorks, November 2000), which covers version 0.91.
- Check out these popular news readers: BlogStreet, FeedReader, AmphetaDesk, and NewsGator.
- Try these popular tools for building weblogs and news feeds:
- Make use of two RSS validators, Userland and FeedValidator.
- Read the RSS 2.0 spec at Harvard Law School's technology site.
- Ready for a good explanation of the different versions of RSS? Read Mark Pilgrim's XML.com article "What is RSS?"
- Find even more articles and resources related to the topic at the OASIS RSS page.
- Learn more about how RSS can be used with Java technology at Sun's developer site.
- Check out the WirelessDevNet article on "Parsing XML With PHP" by Marc Robards.
- Learn how to develop an RSS Viewer Applet for navigating and viewing RSS channels.
- Read Uche Ogbuji's "Thinking XML" column for a discussion of the Creative Commons project (developerWorks, May 2003). The RSS spec is published under a Creative Commons license.
developerWorks offers its own RSS feeds -- find out more at http://www.ibm.com/developerworks/rss/.
- Connect to
for feeds in several tech categories.
- Try the O'Reilly Network for more tech content feeds.
- Visit the Sun Developer Network Content Syndication Program. Don't let the long name keep you away from its RSS feeds.
- Get IT news in RSS form from InfoWorld.
- Check out LockerGnome -- it has a serious attitude about RSS.
- Get Wired's syndicated news.
- Hunting the most popular weblogs? Find the help you want at DayPop and Feedster.
- Try Syndic8 for many ways to search for news feeds.
- Canada loves RSS! Visit the Government of Canada Web site for a great example of public use of RSS.
- Find a variety of news feeds at Yahoo.
James Lewin has been working with the Internet since 1995. He is the president and owner of The Lewin Group, and has written extensively on e-business and the Web. His column, Ecommerce in Action, is published by ITWorld. His interests include electronic music, analog circuits, and science fiction. He can be reached at firstname.lastname@example.org.