Web content syndication is an area of growing importance on the Internet and behind the firewall. What was once the sole domain of bloggers and online news sites is evolving into a platform for next-generation Web-based services and content distribution. While adoption of syndication technologies is growing at a fever pitch, these technologies have a long history of technical issues, ambiguities, and interoperability challenges that have made life difficult for the software developers and consumers who rely on them. To address these issues, members of the syndication technology community came together to pool their experience and define the Atom Syndication Format and the Atom Publishing Protocol standards (see Resources). On July 15th, 2005, the first of these specifications, the Atom Syndication Format, was released to the world for implementation.
This article assumes that you have at least a basic understanding of content syndication and the existing family of specifications. As you read through this overview, I recommend that you keep a copy of the Atom 1.0 format specification handy as a cross-reference for the various elements discussed.
It is important to point out that it is not the intent of this discussion to disparage RSS in any way. Rather, the goal is to illustrate the types of improvements that the Atom format delivers relative to the existing family of syndication formats, and to highlight the strengths inherent in the Atom format.
A simple example
For anyone who has used the RSS family of specifications to do content syndication, Atom 1.0 will be readily familiar. Atom does, however, differ from RSS in many important respects. Listing 1 illustrates a simple Atom 1.0 feed.
Listing 1. A simple Atom example
<feed xmlns="http://www.w3.org/2005/Atom"
      xml:lang="en"
      xml:base="http://www.example.org">
  <id>http://www.example.org/myfeed</id>
  <title>My Simple Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <link href="/blog" />
  <link rel="self" href="/myfeed" />
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>A simple blog entry</title>
    <link href="/blog/2005/07/1" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>This is a simple blog entry</summary>
  </entry>
  <entry>
    <id>http://www.example.org/entries/2</id>
    <title />
    <link href="/blog/2005/07/2" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>This is simple blog entry without a title</summary>
  </entry>
</feed>
Atom requires that every feed and entry contain three elements:
- A unique identifier, which can be as simple as the URI of a blog entry or other Web resource represented by an entry, or as complex as a truly unique 128-bit Globally Unique Identifier (GUID)
- A title, which expresses a short, human-readable subject line for the feed or entry, and can be a blank string (represented by an empty title element, such as <title />)
- A timestamp which indicates when the last update occurred
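The three requirements above are easy to check mechanically. The following is a minimal sketch in Python using only the standard library; the function names (missing_required, check_feed) and the sample feed are illustrative assumptions, not part of the Atom specification. Note that an empty title element still counts as present.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
REQUIRED = ("id", "title", "updated")

def missing_required(element):
    """Return the names of required Atom children absent from a feed or entry."""
    return [name for name in REQUIRED if element.find(ATOM + name) is None]

def check_feed(xml_text):
    """Map 'feed' and 'entry N' labels to their lists of missing elements."""
    feed = ET.fromstring(xml_text)
    report = {"feed": missing_required(feed)}
    for index, entry in enumerate(feed.findall(ATOM + "entry"), start=1):
        report[f"entry {index}"] = missing_required(entry)
    return report

# Hypothetical sample: the second entry deliberately lacks a timestamp.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://www.example.org/myfeed</id>
  <title>My Simple Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <entry>
    <id>http://www.example.org/entries/2</id>
    <title />
    <updated>2005-07-15T12:00:00Z</updated>
  </entry>
  <entry>
    <id>http://www.example.org/entries/3</id>
    <title>No timestamp</title>
  </entry>
</feed>"""

print(check_feed(SAMPLE))
```

This only tests for the presence of the three elements; it does not verify that the identifier is unique or that the timestamp is well formed.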
Further, Atom takes the time to carefully describe a robust, flexible, and consistent content model that's capable of supporting: plain text, escaped HTML, well-formed XHTML, arbitrary XML, base-64 encoded binary content, and URI pointers to content not included directly within the feed. In contrast, without resorting to the use of non-standardized and inconsistently implemented namespace extensions, RSS is capable only of handling plain text and escaped HTML content.
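To make the content model concrete, here is a rough sketch of how a consumer might decide what an atom:content element holds, based on its type and src attributes. The function name describe_content and the labels it returns are assumptions made for illustration, and the sketch assumes that inline XHTML and XML content has a single child element.

```python
import base64
import xml.etree.ElementTree as ET

def describe_content(content):
    """Classify an atom:content element by its type and src attributes."""
    kind = content.get("type", "text")  # "text" is the default when type is absent
    if content.get("src") is not None:
        return ("by-reference", content.get("src"))
    if kind == "text":
        return ("plain text", content.text or "")
    if kind == "html":
        # The parser has already unescaped the markup into a string.
        return ("escaped HTML", content.text or "")
    if kind == "xhtml":
        # Assumes the single wrapping div child is present.
        return ("inline XHTML", ET.tostring(content[0], encoding="unicode"))
    if kind.endswith(("+xml", "/xml")):
        return ("inline XML", ET.tostring(content[0], encoding="unicode"))
    if kind.startswith("text/"):
        return ("text", content.text or "")
    # Any other media type is carried inline as base-64 encoded data.
    return ("base64 binary", base64.b64decode(content.text or ""))

NS = 'xmlns="http://www.w3.org/2005/Atom"'
samples = [
    f'<content {NS}>Plain words</content>',
    f'<content {NS} type="html">&lt;b&gt;bold&lt;/b&gt;</content>',
    f'<content {NS} type="image/png" src="/mypng1.png" />',
    f'<content {NS} type="application/octet-stream">aGVsbG8=</content>',
]
for sample in samples:
    print(describe_content(ET.fromstring(sample)))
```

Checking src first reflects the idea that content passed by reference carries no inline payload to interpret.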
Atom also provides a well-defined extensibility model that provides the same kind of decentralized, dynamic mechanisms for adding new metadata and content supported by RSS, but does so in a way that helps protect core interoperability between implementations. For example, Atom clearly articulates where extension elements can and cannot appear within a document, which extensions are language sensitive (and thereby affected by xml:lang attributes), and how an Atom implementation must react when it encounters an unfamiliar extension element.
Finally, Atom provides rigorous definitions for the various required and optional metadata elements within its core namespace. For instance, Atom defines an author element that is a complex structure including a name, an e-mail address (as defined by RFC 2822), and a resource identifier that's associated with the author in some way (such as the URI of the author's home page).
A feed or entry can have multiple author elements along with zero or more contributor elements. Contributor elements identify individuals who might have contributed to the production of the feed or entry, but whose level of input does not warrant recognition as an author (for example, audio engineers, editors, software developers, and others). Both the author and contributor elements are fully extensible, allowing content producers to provide as much detail about the author or contributor as they deem appropriate. In comparison, RSS specifies a much more limited author element that only appears once within an item and is only capable of expressing an e-mail address.
Listing 2. Person and contributor examples with FOAF extensions
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xml:base="http://www.example.org">
  ...
  <author>
    <name>James M Snell</name>
    <foaf:homepage rdf:resource="/blog" />
    <foaf:img rdf:resource="/mypic.png" />
  </author>
  <contributor>
    <name>Jane Doe</name>
    <foaf:homepage rdf:resource="/janesblog" />
    <foaf:img rdf:resource="/janespic.png" />
  </contributor>
  ...
</feed>
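On the consuming side, the core fields of author and contributor elements can be read in a few lines. This is a sketch only: the people function, the second contributor, and the e-mail address in the sample are hypothetical, and extension elements such as the FOAF ones are simply ignored here.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def people(element, role):
    """Return every atom:author or atom:contributor under element as a dict."""
    result = []
    for person in element.findall(ATOM + role):
        result.append({
            "name": person.findtext(ATOM + "name"),
            "email": person.findtext(ATOM + "email"),  # None when absent
            "uri": person.findtext(ATOM + "uri"),      # None when absent
        })
    return result

# Hypothetical sample data for illustration.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <author>
    <name>James M Snell</name>
    <uri>http://www.example.org/blog</uri>
  </author>
  <contributor>
    <name>Jane Doe</name>
    <email>jane@example.org</email>
  </contributor>
  <contributor>
    <name>John Doe</name>
  </contributor>
</feed>"""

feed = ET.fromstring(SAMPLE)
print(people(feed, "author"))
print(people(feed, "contributor"))
```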
Additional out-of-the-box features of Atom 1.0 include:
- The ability for individual entries to exist independently of a feed, providing significant new options for aggregation and distribution of syndicated content
- ISO-8601 and XML Schema compatible timestamps
- Relative URI support using xml:base
- Enhanced internationalization through the use of Internationalized Resource Identifiers (IRIs) and xml:lang
- Accessibility features that make it easier for users with disabilities to consume feeds
- An HTML-like dynamically extensible link mechanism that links a feed or entry to external resources
- Self-referential feeds that help to ease the subscription process
- A MIME media type that can identify Atom 1.0 documents
- Built-in support for XML Digital Signatures and XML Encryption
- A non-normative RELAX NG schema that validates Atom 1.0 document instances
- A core subset that is compatible with RDF
Overall, the various features built into Atom are geared towards allowing the format to support a much broader range of syndication use cases while it addresses many of the technical weaknesses that permeate the existing family of syndication standards.
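Of the features listed above, relative URI support is simple to demonstrate. The sketch below resolves the relative link values from a feed like the one in Listing 1 against the xml:base attribute of the feed element. It is deliberately simplified: it assumes xml:base appears only on the root feed element and ignores any xml:base set on nested elements. The absolute_links name is an illustrative choice.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def absolute_links(xml_text):
    """Resolve every atom:link href against the feed-level xml:base."""
    feed = ET.fromstring(xml_text)
    base = feed.get(XML_BASE, "")  # simplification: only the root xml:base is used
    return [urljoin(base, link.get("href")) for link in feed.iter(ATOM + "link")]

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://www.example.org">
  <link href="/blog" />
  <link rel="self" href="/myfeed" />
  <entry>
    <link href="/blog/2005/07/1" />
  </entry>
</feed>"""

for url in absolute_links(SAMPLE):
    print(url)
```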
Support for enclosures
Outside of weblog and news content syndication, one of the most popular evolving applications of syndication technology has been in the area of podcasting. A podcast is a data feed that distributes recorded digital audio files that are automatically downloaded and copied to a user's portable media device. Currently, podcasting is enabled through the use of RSS 2.0's enclosure tag as illustrated in Listing 3.
Listing 3. RSS 2.0 podcasting example
<rss version="2.0">
  <channel>
    <title>My Podcast Feed</title>
    <link>http://example.org</link>
    <author>email@example.com</author>
    <item>
      <title>Podcasting with RSS</title>
      <link>http://www.example.org/entries/1</link>
      <description>An overview of RSS podcasting</description>
      <pubDate>Fri, 15 Jul 2005 00:00:00 -0500</pubDate>
      <guid isPermaLink="true">http://www.example.org/entries/1</guid>
      <enclosure url="http://www.example.org/myaudiofile.mp3"
                 length="12345"
                 type="audio/mpeg" />
    </item>
  </channel>
</rss>
While podcasting is rapidly growing in popularity, the RSS 2.0 enclosure tag has at least one very significant limitation that has proven to be an annoyance to podcasters: RSS allows only one enclosure tag per item. This means that podcast producers who wish to make their audio downloads available in multiple formats (such as MP3, BitTorrent, or WMA) must offer separate feeds for each format they wish to offer. Atom, on the other hand, allows any single entry to contain multiple enclosures, each with an associated media type attribute that makes it possible for podcasters to produce a single feed containing all of the formats they distribute.
To illustrate this by example, consider the list of podcast feeds available from IT Conversations (see Resources). Because IT Conversations podcasts are offered in multiple formats, potential subscribers must select from at least 73 individual RSS feeds with enclosures (excluding the 37 text-only feeds that are also listed). Using Atom enclosures, IT Conversations would be able to cut the total number of feeds in half simply by including two enclosure links in the Atom entry. Such a reduction in feeds results in a net reduction in complexity for both the content publisher and content subscribers.
Listing 4. Atom 1.0 podcasting example
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://www.example.org/myfeed</id>
  <title>My Podcast Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <author>
    <name>James M Snell</name>
  </author>
  <link href="http://example.org" />
  <link rel="self" href="http://example.org/myfeed" />
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>Atom 1.0</title>
    <updated>2005-07-15T12:00:00Z</updated>
    <link href="http://www.example.org/entries/1" />
    <summary>An overview of Atom 1.0</summary>
    <link rel="enclosure"
          type="audio/mpeg"
          title="MP3"
          href="http://www.example.org/myaudiofile.mp3"
          length="1234" />
    <link rel="enclosure"
          type="application/x-bittorrent"
          title="BitTorrent"
          href="http://www.example.org/myaudiofile.torrent"
          length="1234" />
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <h1>Show Notes</h1>
        <ul>
          <li>00:01:00 -- Introduction</li>
          <li>00:15:00 -- Talking about Atom 1.0</li>
          <li>00:30:00 -- Wrapping up</li>
        </ul>
      </div>
    </content>
  </entry>
</feed>
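With several enclosures in one entry, the choice of format moves from the publisher's list of feeds to the subscriber's client. The following sketch shows one way a client might pick an enclosure according to an ordered list of preferred media types. The pick_enclosure function and the preference lists are assumptions made for illustration; real podcast clients may apply different rules.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def pick_enclosure(entry, preferred_types):
    """Return the href of the first enclosure matching the preference order."""
    enclosures = [link for link in entry.findall(ATOM + "link")
                  if link.get("rel") == "enclosure"]
    for media_type in preferred_types:
        for link in enclosures:
            if link.get("type") == media_type:
                return link.get("href")
    return None  # no enclosure in an acceptable format

SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://www.example.org/entries/1</id>
  <title>Atom 1.0</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <link href="http://www.example.org/entries/1" />
  <link rel="enclosure" type="audio/mpeg" title="MP3"
        href="http://www.example.org/myaudiofile.mp3" length="1234" />
  <link rel="enclosure" type="application/x-bittorrent" title="BitTorrent"
        href="http://www.example.org/myaudiofile.torrent" length="1234" />
</entry>"""

entry = ET.fromstring(SAMPLE)
print(pick_enclosure(entry, ["application/x-bittorrent", "audio/mpeg"]))
print(pick_enclosure(entry, ["audio/mpeg"]))
```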
Atom enclosures allow you to do more than just distribute audio content. Enclosure links can reference any type of resource. Listing 5, for instance, uses multiple enclosures within a single entry to reference translated versions of a single PDF document that's accessible through FTP. The hreflang attribute identifies the language that each PDF document has been translated into.
Listing 5. Atom 1.0 feed with enclosures for multiple languages
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>http://www.example.org/myfeed</id>
  <title>My Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <author>
    <name>James M Snell</name>
  </author>
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>Blogging Guidelines</title>
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>New Corporate Blogging Guidelines</summary>
    <link rel="enclosure"
          xml:lang="en-us"
          title="Blogging Guidelines -- English"
          type="application/pdf"
          hreflang="en-us"
          href="ftp://www.example.org/en/bloggingguidelines.pdf" />
    <link rel="enclosure"
          xml:lang="de"
          title="Richtlinien Blogging - Deutscher"
          type="application/pdf"
          hreflang="de"
          href="ftp://www.example.org/de/bloggingguidelines.pdf" />
    <link rel="enclosure"
          xml:lang="fr"
          title="Directives De Blogging - Francais"
          type="application/pdf"
          hreflang="fr"
          href="ftp://www.example.org/fr/bloggingguidelines.pdf" />
  </entry>
</feed>
The example in Listing 5 is impossible to support in RSS 2.0 unless you introduce non-standardized namespace extensions into the feed. There are a number of important reasons for this:
- RSS does not allow multiple enclosures within an item
- RSS does not provide a means of associating a language with the enclosed resource
- RSS enclosures are required to use HTTP URLs
- RSS does not provide a means of optionally associating a human readable title for a referenced resource
Another important point is that the Atom link elements that enable enclosures can do far more than just associate downloadable files with an entry. Link elements can also express other meaningful relationships to resources:
- <link rel="alternate" /> -- Identifies an alternate version of the feed or entry (for example, a weblog home page)
- <link rel="related" /> -- Identifies a resource that is described in some way by the content of the entry
- <link rel="self" /> -- Identifies a resource that is equivalent to the feed or entry; generally this permits a feed or entry to become self-referential to allow flexible auto-discovery mechanisms
- <link rel="via" /> -- Identifies a resource that provided the information contained in the feed or entry; for example, if the entry was distributed through an online aggregation service, the via link identifies the aggregator as an alternative to the currently common practice of having the aggregator override the RSS link element
These built-in link relations are designed to cover the most common and generic types of links expected to be used with feeds. New types of relationships can be dynamically defined using fully-qualified URIs. I'll talk more about the extensibility of link elements, as well as illustrate a simple example, a bit later in this article.
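A consumer typically wants to look links up by relation. The sketch below groups the links of a feed or entry by their rel value, treating a link without a rel attribute as an alternate link, which is how Atom defines the default. The links_by_rel function and the aggregator URL in the sample are hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def links_by_rel(element):
    """Group the href values of an element's direct atom:link children by rel."""
    grouped = defaultdict(list)
    for link in element.findall(ATOM + "link"):
        # A missing rel attribute means rel="alternate".
        grouped[link.get("rel", "alternate")].append(link.get("href"))
    return dict(grouped)

SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <link href="http://www.example.org/entries/1" />
  <link rel="enclosure" href="http://www.example.org/myaudiofile.mp3" />
  <link rel="via" href="http://aggregator.example.net/feed" />
</entry>"""

print(links_by_rel(ET.fromstring(SAMPLE)))
```

Because extension relations are plain URI strings, the same lookup works unchanged for relation types defined outside the core specification.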
In addition to support for links and enclosures, Atom introduces the ability to reference entry content by URI. Listing 6, for instance, illustrates how an Atom feed for a photo weblog might appear. The content element references each individual photograph in the blog. The summary element provides a caption for the image.
Listing 6. A simple list of images using Atom 1.0
<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://www.example.org/">
  <id>http://www.example.org/pictures</id>
  <title>My Picture Gallery</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <author>
    <name>James M Snell</name>
  </author>
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>Trip to San Francisco</title>
    <link href="/entries/1" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>A picture of my hotel room in San Francisco</summary>
    <content type="image/png" src="/mypng1.png" />
  </entry>
  <entry>
    <id>http://www.example.org/entries/2</id>
    <title>My new car</title>
    <link href="/entries/2" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>A picture of my new car</summary>
    <content type="image/png" src="/mypng2.png" />
  </entry>
</feed>
This content-by-reference mechanism provides a very flexible means of expanding the types of content that one can syndicate through Atom.
For example, the idea of using the syndication model to distribute software updates is often discussed. In such a feed, it is helpful to link to both the downloadable file that contains the software update and a Web page that describes the update. Because Atom clearly separates the roles of the link and content elements, creating such a feed is a straightforward exercise that requires no extensions to the core Atom namespace.
Listing 7. A software update feed using Atom 1.0
<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://www.example.com">
  ...
  <entry>
    <id>tag:update:20050718</id>
    <title>Update: 20050718</title>
    <updated>2005-07-18T12:00:00Z</updated>
    <link rel="alternate"
          type="text/html"
          href="/updates/2005/07/18/readme_20050718.html" />
    <content type="application/zip"
             src="/updates/2005/07/18/update_20050718.zip" />
  </entry>
  <entry>
    <id>tag:update:20050717</id>
    <title>Update: 20050717</title>
    <updated>2005-07-17T12:00:00Z</updated>
    <link rel="alternate"
          type="text/html"
          href="/updates/2005/07/17/readme_20050717.html" />
    <content type="application/zip"
             src="/updates/2005/07/17/update_20050717.zip" />
  </entry>
</feed>
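A client of such an update feed needs little more than the newest entry, its download location, and its description page. The sketch below is one possible approach under several assumptions: the latest_update function name is invented, every updated value uses the same UTC "Z" format (so plain string comparison orders them correctly), the feed contains at least one entry, and xml:base appears only on the feed element.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def latest_update(xml_text):
    """Return download and readme URLs for the most recently updated entry."""
    feed = ET.fromstring(xml_text)
    base = feed.get(XML_BASE, "")
    # Same-format UTC timestamps sort correctly as strings.
    newest = max(feed.findall(ATOM + "entry"),
                 key=lambda entry: entry.findtext(ATOM + "updated"))
    download = newest.find(ATOM + "content").get("src")
    readme = next((link.get("href") for link in newest.findall(ATOM + "link")
                   if link.get("rel", "alternate") == "alternate"), None)
    return {
        "download": urljoin(base, download),
        "readme": urljoin(base, readme) if readme else None,
    }

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://www.example.com">
  <entry>
    <id>tag:update:20050717</id>
    <updated>2005-07-17T12:00:00Z</updated>
    <link rel="alternate" href="/updates/2005/07/17/readme_20050717.html" />
    <content type="application/zip" src="/updates/2005/07/17/update_20050717.zip" />
  </entry>
  <entry>
    <id>tag:update:20050718</id>
    <updated>2005-07-18T12:00:00Z</updated>
    <link rel="alternate" href="/updates/2005/07/18/readme_20050718.html" />
    <content type="application/zip" src="/updates/2005/07/18/update_20050718.zip" />
  </entry>
</feed>"""

print(latest_update(SAMPLE))
```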
Other applications of content-by-reference include the syndication of data not typically suitable for static embedding within a feed. Examples of such content include live audio or video broadcast streams, links to secure account information or transactions, and large data streams.
Listing 8. Atom 1.0 feed advertising live streaming audio feeds
<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://www.example.com">
  ...
  <entry>
    ...
    <link rel="alternate" type="text/html" href="/shows/aboutshow1.html" />
    <content type="audio/x-mpegurl" src="/streams/show1.mpu" />
  </entry>
  <entry>
    ...
    <link rel="alternate" type="text/html" href="/shows/aboutshow2.html" />
    <content type="audio/x-mpegurl" src="/streams/show2.mpu" />
  </entry>
</feed>
An important strength of current syndication technology is the ability for application developers to dynamically extend a feed with new types of metadata. One key goal of the Atom working group was a well-defined extensibility model that preserved the decentralized, dynamic extensibility mechanisms that content publishers and syndication application developers have come to expect, while protecting core interoperability between Atom implementations.
Extensions to Atom come in two flavors, both of which I illustrate here:
- New namespace-qualified extension elements and attributes
- New link element relation types
Namespace extensions involve mixing new XML elements and attributes with the core Atom elements. For example, Atom defines elements that describe the moment when an entry was last updated and when the entry was published. However, imagine an application that produces entries whose content must expire at a given point in time (for example, a feed representing special sale offers or a weekly top-ten list). Atom does not provide any core elements that can be used to specify an expiration date. It is possible, however, to declare such an element in a separate namespace and include it in the Atom feed as in Listing 9. Consumers of the feed who are not aware of the expiration extension element can simply choose to ignore it.
Listing 9. A simple namespaced expires extension
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:s="http://www.snellspace.com/atom/extensions/proposed/">
  ...
  <entry>
    <id>http://www.example.com/offers/1</id>
    <title>A limited time offer!</title>
    <updated>2005-07-20T12:00:00Z</updated>
    <s:expires>2005-08-01T12:00:00Z</s:expires>
    <summary>Take advantage of this limited time offer!</summary>
    <link href="http://www.example.com/offers/1" />
  </entry>
</feed>
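A consumer that does understand the proposed expires extension could use it to hide stale entries, while one that does not would simply never look for the element. The sketch below shows the first case. The function names and the second sample entry are assumptions, and the timestamp parser handles only the UTC "Z" form used in this article's listings rather than every timestamp Atom permits.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
EXT = "{http://www.snellspace.com/atom/extensions/proposed/}"

def parse_timestamp(text):
    """Parse a UTC timestamp such as 2005-08-01T12:00:00Z (this form only)."""
    parsed = datetime.strptime(text.strip(), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

def live_entry_ids(xml_text, now):
    """Return ids of entries that have no expiry or expire after 'now'."""
    feed = ET.fromstring(xml_text)
    live = []
    for entry in feed.findall(ATOM + "entry"):
        expires = entry.findtext(EXT + "expires")
        if expires is None or parse_timestamp(expires) > now:
            live.append(entry.findtext(ATOM + "id"))
    return live

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:s="http://www.snellspace.com/atom/extensions/proposed/">
  <entry>
    <id>http://www.example.com/offers/1</id>
    <title>A limited time offer!</title>
    <updated>2005-07-20T12:00:00Z</updated>
    <s:expires>2005-08-01T12:00:00Z</s:expires>
  </entry>
  <entry>
    <id>http://www.example.com/offers/2</id>
    <title>A standing offer</title>
    <updated>2005-07-20T12:00:00Z</updated>
  </entry>
</feed>"""

print(live_entry_ids(SAMPLE, datetime(2005, 7, 25, tzinfo=timezone.utc)))
print(live_entry_ids(SAMPLE, datetime(2005, 8, 15, tzinfo=timezone.utc)))
```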
You can include extension elements and attributes throughout an Atom document, with a few basic exceptions. For instance, Atom date constructs such as the atom:updated element can contain extension attributes but cannot contain extension elements.
As a quick aside, because Atom was defined through a formalized IETF standardization process, a common misconception is that extensions like the s:expires element in Listing 9 also have to be defined and ratified through an equally formal and centralized process. That is absolutely not the case. Atom extensions can be defined in a completely decentralized, open, and informal manner without any involvement on the part of the IETF, while still preserving interoperability.
Link relation extensions involve the creation of a new rel attribute value that identifies a new type of link relation. Atom link elements associate external resources with the feed or entry; the rel attribute identifies the purpose of the link. By creating new link relations, you can extend the types of relationships that the link element is capable of expressing.
For instance, most weblog software packages support the ability for readers to post comments to a blog entry. These comments can themselves appear as entries within a feed. Listing 10 illustrates a link extension that I have proposed to allow a bi-directional link between an entry and an associated comment.
Listing 10. Proposed comments feed extension
<feed xmlns="http://www.w3.org/2005/Atom">
  ...
  <link title="Link to the comments feed"
        rel="http://purl.org/syndication/thread/1.0/comments"
        href="http://www.example.com/feed/comments"
        type="application/atom+xml" />
  <entry>
    <id>urn:entry:1</id>
    <title>The original entry</title>
    <updated>2005-12-20T12:00:00Z</updated>
    <link href="http://www.example.com/entries/1" />
  </entry>
</feed>

<!-- http://www.example.com/feed/comments -->
<feed xmlns="http://www.w3.org/2005/Atom">
  <link title="Link to the root feed"
        rel="http://purl.org/syndication/thread/1.0/root"
        href="http://www.example.com/feed"
        type="application/atom+xml" />
  <entry>
    <id>urn:entry:1:comments:1</id>
    <title>This is a comment</title>
    <updated>2005-12-20T12:00:10Z</updated>
    <link href="http://www.example.com/entries/1/comments/1" />
    <link rel="http://purl.org/syndication/thread/1.0/in-reply-to"
          href="urn:entry:1" />
  </entry>
</feed>
In Listing 10, three new link relations are created:
- http://purl.org/syndication/thread/1.0/comments -- Links a feed or entry with an Atom feed that contains comments
- http://purl.org/syndication/thread/1.0/root -- Links the comments feed with the feed containing the original entries
- http://purl.org/syndication/thread/1.0/in-reply-to -- Links a comment entry with the original entry
This proposed extension is still being actively discussed and developed and is expected to evolve over time.
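To show how a consumer might use the proposed in-reply-to relation, the following sketch reads a comments feed and groups comment identifiers under the identifier of the entry they reply to. Since the extension is still a proposal, treat this as illustrative only; the replies_by_entry function and the second comment in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
IN_REPLY_TO = "http://purl.org/syndication/thread/1.0/in-reply-to"

def replies_by_entry(comments_xml):
    """Map each original entry id to the ids of the comments replying to it."""
    feed = ET.fromstring(comments_xml)
    replies = {}
    for entry in feed.findall(ATOM + "entry"):
        comment_id = entry.findtext(ATOM + "id")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == IN_REPLY_TO:
                replies.setdefault(link.get("href"), []).append(comment_id)
    return replies

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:entry:1:comments:1</id>
    <title>This is a comment</title>
    <updated>2005-12-20T12:00:10Z</updated>
    <link rel="http://purl.org/syndication/thread/1.0/in-reply-to"
          href="urn:entry:1" />
  </entry>
  <entry>
    <id>urn:entry:1:comments:2</id>
    <title>This is another comment</title>
    <updated>2005-12-20T12:05:00Z</updated>
    <link rel="http://purl.org/syndication/thread/1.0/in-reply-to"
          href="urn:entry:1" />
  </entry>
</feed>"""

print(replies_by_entry(SAMPLE))
```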
Other extensions to express feed history, associate licenses, and provide a list ordering mechanism have been proposed and more are in the works. Some of these extensions might ultimately become IETF Internet-Drafts or even RFCs; others will not. It is expected that many useful extensions will emerge over time as developers begin to roll out new and interesting applications. It is also entirely possible to use a great number of existing common RSS extensions with Atom with very little effort.
In May of 2004, Uche Ogbuji published an article here on developerWorks that provided an early, introductory exploration of the effort to define Atom. In his introduction, Uche wrote that one of the goals of Atom "was to create a more technologically sound design than many of the flavors of RSS, using the practical experience of the many RSS users to make the practical design compromises that would enable the new format to work in harmony with the architecture as well as the culture of the Web." While it has taken some time, a lot of careful discussion, and a significant amount of effort on the part of IETF working group participants, Atom 1.0 now achieves the goals of providing a simple, well-defined, and unambiguous format for content syndication on the Web.
Resources
- Participate in the discussion forum.
- Read the Atom 1.0 specification.
- Check out the RSS 2.0 specification.
- Visit the Atom Working Group's Wiki for a list of known Atom 1.0 compatible software and a detailed comparison of Atom 1.0 versus RSS 2.0.
- Want to know more about podcasting? Visit ipodder.org.
- Read Uche Ogbuji's developerWorks article "Use the Atom format for syndicating news and more" (May 2004).
- IT Conversations offers a wide range of podcasts through RSS.
- Find hundreds more XML resources on the developerWorks XML zone.
- Find out how you can become an IBM Certified Developer in XML and related technologies.