Atom 1.0 extensions, Part 1: Feed history, ordering entries, and expiration timestamps

An overview of proposed extensions to the Atom 1.0 Syndication Format

Get a technical overview of several proposed extensions to the Atom 1.0 Syndication Format. This first of two articles discusses three proposed extensions that enable the reconstruction of feed history, the ability to order entries within a feed according to numeric rankings, and the expression of expiration timestamps for syndicated content.

James Snell (jasnell@us.ibm.com), Software Engineer, IBM

James Snell is a member of IBM's Emerging Technologies Toolkit team. He has spent the past few years focusing on emerging web services technologies and standards, and has been a contributor to the Atom 1.0 specification. He maintains a weblog focused on emerging technologies at http://www.ibm.com/developerworks/blogs/page/jasnell.



25 October 2005

In a previous article, I walked readers through an overview of the newly minted Atom 1.0 Syndication Format and provided a high-level review of the technical strengths of the standard. At the conclusion of that discussion, I briefly touched on Atom's extensibility mechanism, which is designed to provide a flexible, robust method of extending the core capabilities of the format. Since that time, a number of important and useful extensions that leverage that framework have emerged. In this article, I explore several of these extensions, discuss their purpose and their use, and offer examples of how you can use them.

This article assumes that you have at least a basic understanding of the Atom 1.0 syndication format and of content syndication in general. As you read through this discussion, I recommend that you keep a copy of the Atom 1.0 specification handy as a cross-reference for the various elements discussed (see Resources).

Getting historical

A blogging environment deployed on the IBM intranet serves over 13,000 registered users and hosts in excess of 1,700 active weblogs globally. Users post to the site 24 hours a day. While every individual blog has its own RSS and Atom feed, the volume of posts and the number of blogs make it difficult to track what discussions are going on in the intranet on a macro-scale. To address this challenge, a Dashboard has been incorporated into the environment so users can view (through browser and feed reader) all blog posts and comments ordered by date and time. While useful, this solution suffers from a distinct problem: Because the aggregated feed represents a sliding window through which readers can view only the most recent posts and comments, users who check the feed infrequently -- or who even just log off of their computers for the evening and check the feed again when they return the next day -- frequently miss entries that were posted during the time they were out.

Processing Atom extensions

The Atom 1.0 Syndication Format specification defines a number of clear rules for how software implementations must handle extensions to the specification. The most important of these rules states that if a particular application encounters an extension that it does not understand, it is required to simply ignore the extension without altering any of its existing behavior. In other words, feed publishers who use extensions should not depend on the clients' ability to process those extensions. See section 6 of the Atom specification (in Resources) for more details.

One solution to this problem is to increase the total number of entries displayed in the feed. For example, if the post rate is 100 posts per hour and your average user checks the feed once an hour, generate your feed so that it contains the 100 most recent posts. While some users might miss some posts here and there, the odds are that most of your users will see most of the entries. The challenge with this solution is that it plays the odds in a way that simply does not scale. First, what if the posting rate doubles to 200 posts per hour, or grows to 400 or even 1,000 posts per hour? Should you keep increasing the size of the sliding window to make sure that users are not missing posts? Second, you have no reliable way of predicting how often a feed is going to be pulled in the future. Some users will pull the feed every five minutes; others every 30 minutes; others every hour; and some may only pull the feed once a week. You could size your sliding window based on a statistical analysis of what the majority of your users have done in the past, but then you are guaranteed that some of your users will miss posts. Or you could produce multiple feeds: one that shows all posts within the past five minutes, one that shows all posts within the past 30 minutes, one that shows all posts within the past hour, and so on. In so doing, however, you add complexity for your users, who must figure out which feed to subscribe to.

Alternatively, fellow Atom Publishing Format and Protocol Working Group member Mark Nottingham has proposed a significantly better solution, described in his IETF Internet-Draft entitled "Feed History: Enabling Incremental Syndication" (see Resources). Mark's solution proposes a model that allows feed readers to reconstruct the history of a feed by paging through multiple incremental feed documents.

When a feed reader accesses a feed document that has been flagged as being incremental in nature, the reader can see that the content of the feed contains only a partial representation of the feed's contents. To reconstruct the full contents of the feed, the reader must walk through a series of previous feed documents that contain the remaining entries (see Figure 1).

Figure 1. Reconstructing feed history using multiple feed documents

The fh:incremental element shown in Listing 1 has a value of true, which indicates that the feed document contains only part of the feed and may include an fh:prev element that points to another document containing additional entries. The fh:prev elements chain these documents together into what is essentially a linked list of feed documents. The end of the list is indicated by a feed document that does not contain an fh:prev element.

Listing 1. Using the Feed History extension
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <title>My Incremental Feed</title>
  <link href="http://www.example.com" />
  <link rel="self" href="http://www.example.com/feed.xml" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:/feed</id>
  <fh:incremental>true</fh:incremental>
  <fh:prev>http://www.example.com/prevfeed.xml</fh:prev>
  ...
</feed>
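
The walk along fh:prev links described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the Feed History draft: the function name reconstruct_history, the caller-supplied fetch function, and the max_documents safety limit are all hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
FH = "http://purl.org/syndication/history/1.0"


def reconstruct_history(start_url, fetch, max_documents=100):
    """Collect entry ids by following fh:prev links, newest document first.

    `fetch` is a caller-supplied (hypothetical) function that maps a URL
    to the XML text of the feed document found at that URL.
    """
    entry_ids = []
    visited = set()
    url = start_url
    # Stop at the end of the chain, on a loop, or at the safety limit.
    while url and url not in visited and len(visited) < max_documents:
        visited.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(f"{{{ATOM}}}entry"):
            entry_ids.append(entry.findtext(f"{{{ATOM}}}id"))
        # A document without fh:prev marks the end of the linked list.
        prev = feed.findtext(f"{{{FH}}}prev")
        url = prev.strip() if prev else None
    return entry_ids
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP library. The visited set guards against a chain of fh:prev links that loops back on itself.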

An fh:incremental element with a value of false (like that in Listing 2) indicates that the feed document in question is a complete representation of the feed's contents. Examples of such feeds include things like top-10 lists or feeds that represent a queue of items similar to those published by Odeo.com (for podcasts) and Netflix.com (for movies).

Listing 2. A non-incremental feed
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <title>My Movie Queue</title>
  <link href="http://www.example.com/movies"/>
  <link rel="self" href="http://www.example.com/movies/feed" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:movies</id>
  <fh:incremental>false</fh:incremental>
  <entry>
    <title>Hitchhiker's Guide to the Galaxy</title>
    <link href="..." />
    ...
  </entry>
  <entry>
    <title>Charlie Chaplin - City Lights</title>
    <link href="..." />
    ...
  </entry>
  <entry>
    <title>Buster Keaton - College</title>
    <link href="..." />
    ...
  </entry>
</feed>

The fh:incremental element solves a fundamental problem in syndication by providing a means for feed readers to know explicitly when an individual feed document represents a sliding window versus a complete collection of entries. The combination of the fh:incremental and fh:prev elements solves the problem of reconstructing the full history of a feed.
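
To make the distinction concrete, here is a minimal sketch of how a feed reader might update its local store of entries depending on the fh:incremental flag. The update_store function and the dictionary-based store are hypothetical, and treating a feed with no fh:incremental element as incremental is an assumption of this sketch rather than a rule taken from the draft.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
FH = "http://purl.org/syndication/history/1.0"


def update_store(store, feed_xml):
    """Return a new {entry id: title} mapping after reading one feed document."""
    feed = ET.fromstring(feed_xml)
    current = {
        entry.findtext(f"{{{ATOM}}}id"): entry.findtext(f"{{{ATOM}}}title")
        for entry in feed.findall(f"{{{ATOM}}}entry")
    }
    flag = (feed.findtext(f"{{{FH}}}incremental") or "").strip().lower()
    if flag == "false":
        # Complete feed: entries missing from the document are dropped.
        return current
    # Sliding window (or unflagged, by assumption): keep older entries too.
    merged = dict(store)
    merged.update(current)
    return merged
```

With a non-incremental feed such as a movie queue, an entry that disappears from the document also disappears from the store. With an incremental feed, previously seen entries are retained.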

Finally, I would be remiss if I didn't point out that Mark's Feed History extension has been designed to work independently of the specific syndication format, and is compatible with all versions of RSS as well as Atom 1.0. To illustrate this point, Mark has implemented Feed History support in his personal weblog's RSS 1.0 feed (see Resources).


Bringing order to the feed

In the world of RSS-based syndication, a generally accepted practice is to implicitly order the items in the feed in a reverse chronological order from the creation date of those items. Of course, many exceptions to this rule have popped up based on specific application needs. For example, if you maintain an account at Netflix.com, the items in the RSS feed that represents the queue of movies you wish to watch are ordered by priority rank, rather than the order in which those movies were added to the queue. Moving a movie up in the queue changes its position in the RSS feed.

Further complicating the situation is that many such feeds are intended to be non-incremental -- that is, the feed document contains the full set of entries available. Many feed readers, on the other hand, will treat such feeds just like any other kind of incremental syndicated content, and will display the complete history of the feed in a presentation order that's determined by the order in which new items were received by the reader. If I reorder the items in the feed, or even wish to remove items from the feed with the intent of having feed readers no longer display those items, RSS has no mechanism that allows me to indicate those intentions.

The Atom 1.0 Syndication Format explicitly dictates that the order of entries displayed within the feed has no significance, meaning that feed readers are free to present and process the entries in any order they deem appropriate. In order to give feed publishers the ability to indicate that the order of a feed's entries is significant, a new Feed Rank extension has been proposed as an IETF Internet-Draft.

The Feed Rank mechanism introduces numeric ranking values into the entries of a feed.

Each numeric ranking value is associated with a logical ranking domain that's bound to a ranking scheme. The ranking domain provides a means of composing the set of entries that are to be ranked relative to one another. The ranking scheme defines how the numeric ranking values within a domain are to be interpreted for the sake of sorting and organizing the set of entries. The example in Listing 3 illustrates the association of the ranking domain to a ranking scheme and a set of numeric rank values.

Listing 3. A feed with ranked entries
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:r="http://purl.org/syndication/index/1.0">
  <title>My Ordered Feed</title>
  <link href="http://www.example.com" />
  <link rel="self" href="http://www.example.com/feed.xml" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:/feed</id>
  <r:ranking-scheme 
    domain="urn:example.com"
    label="Example Five Star Ranking Scheme"
    significance="ascending"
    precision="1"
    max-value="5"
    min-value="0" />
  <entry>
    <title>Entry</title>
    <link href="http://www.example.com/1" />
    <updated>2005-12-12T12:00:00Z</updated>
    <content>My entry</content>
    <id>tag:example.com,2005:/feed/1</id>
    <r:rank domain="urn:example.com">5.0</r:rank>
  </entry>
  <entry>
    <title>Entry</title>
    <link href="http://www.example.com/1" />
    <updated>2005-12-12T12:00:00Z</updated>
    <content>My entry</content>
    <id>tag:example.com,2005:/feed/2</id>
    <r:rank domain="urn:example.com">3.5</r:rank>
  </entry>
</feed>

The r:ranking-scheme element is the means by which ranking schemes are defined and described. The element's attributes specify the minimum and maximum values for numeric ranks (max-value, min-value), the precision to apply to those rankings, the order of significance, and a human-readable label that can be used to refer to the scheme. Most importantly, the r:ranking-scheme element's domain attribute binds the scheme to a specific ranking domain.

Ranking domains are identified by unique Internationalized Resource Identifiers (IRIs). The value of the domain attribute on the r:ranking-scheme and r:rank elements is the IRI that identifies the domain. If the domain attribute is missing, its value defaults to the IRI value specified by the containing Atom feed's id element. If the domain attribute's value is a "same document reference", the value defaults to the base URI of the containing document.

The ranking domain identifiers for each of the ranking schemes in Listing 4 are:

  • Ranking Scheme 1: Defaults to /atom:feed/atom:id (tag:example.com,2005:/feed)
  • Ranking Scheme 2: Defaults to the document base URI (http://www.example.com/myrankedfeed)
  • Ranking Scheme 3: urn:example.com

Listing 4. http://www.example.com/myrankedfeed with multiple ranking domains
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:r="http://purl.org/syndication/index/1.0">
  <title>My Ordered Feed</title>
  <link href="http://www.example.com" />
  <link rel="self" href="http://www.example.com/feed.xml" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:/feed</id>
  <r:ranking-scheme 
    label="Ranking Scheme 1"
    significance="ascending"
    precision="1"
    max-value="5"
    min-value="0" />
  <r:ranking-scheme 
    domain=""
    label="Ranking Scheme 2"
    significance="ascending"
    precision="1"
    max-value="5"
    min-value="0" />
  <r:ranking-scheme 
    domain="urn:example.com"
    label="Ranking Scheme 3"
    significance="ascending"
    precision="1"
    max-value="5"
    min-value="0" />
  <entry>
    <title>Entry</title>
    <link href="http://www.example.com/1" />
    <updated>2005-12-12T12:00:00Z</updated>
    <content>My entry</content>
    <id>tag:example.com,2005:/feed</id>
    <r:rank>5.0</r:rank>
    <r:rank domain="">5.0</r:rank>
    <r:rank domain="urn:example.com">5.0</r:rank>
  </entry>
</feed>
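
The defaulting rules for the domain attribute can be expressed as a small helper. This is a sketch only: the resolve_ranking_domain name is hypothetical, and recognizing a same-document reference as either an empty string or a fragment-only reference is a simplification of the general URI rules.

```python
def resolve_ranking_domain(domain_attr, feed_id, base_uri):
    """Return the ranking domain IRI for an r:ranking-scheme or r:rank element.

    `domain_attr` is the raw attribute value, or None when the attribute
    is absent (which is what ElementTree's element.get("domain") returns).
    """
    if domain_attr is None:
        # Missing attribute: default to the feed's atom:id value.
        return feed_id
    if domain_attr == "" or domain_attr.startswith("#"):
        # Same-document reference (simplified check): use the base URI.
        return base_uri
    # Anything else is taken as the domain IRI itself.
    return domain_attr
```

Applied to the three ranking schemes in Listing 4, with tag:example.com,2005:/feed as the feed id and http://www.example.com/myrankedfeed as the base URI, the helper yields the three domain identifiers listed above.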

For applications such as the Netflix.com movie queue, Feed Rank becomes a very useful tool when you combine it with the fh:incremental element introduced by the Feed History extension, as shown in Listing 5.

Listing 5. A movie queue feed with ranked entries
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0"
      xmlns:r="http://purl.org/syndication/index/1.0">
  <title>My Movie Queue</title>
  <link href="http://www.example.com/movies"/>
  <link rel="self" href="http://www.example.com/movies/feed" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:movies</id>
  <fh:incremental>false</fh:incremental>
  <r:ranking-scheme
    domain="http://www.example.com/movies/queue"
    label="Queue"
    significance="descending"
    precision="0"
    min-value="1" />
  <r:ranking-scheme
    domain="http://www.example.com/movies/ratings"
    label="Ratings"
    significance="ascending"
    precision="1"
    min-value="0"
    max-value="5" />
  <entry>
    <title>Hitchhiker's Guide to the Galaxy</title>
    <link href="..." />
    <r:rank
      domain="http://www.example.com/movies/queue">1</r:rank>
    <r:rank
      domain="http://www.example.com/movies/ratings">5.0</r:rank>
    ...
  </entry>
  <entry>
    <title>Charlie Chaplin - City Lights</title>
    <link href="..." />
    <r:rank
      domain="http://www.example.com/movies/queue">3</r:rank>
    <r:rank
      domain="http://www.example.com/movies/ratings">4.5</r:rank>
    ...
  </entry>
  <entry>
    <title>Buster Keaton - College</title>
    <link href="..." />
    <r:rank
      domain="http://www.example.com/movies/queue">2</r:rank>
    <r:rank
      domain="http://www.example.com/movies/ratings">3.5</r:rank>
    ...
  </entry>
</feed>

Consumers can sort these entries using either of the two defined ranking domains:

  • Domain: http://www.example.com/movies/queue
    • Rank 1: Hitchhiker's Guide to the Galaxy
    • Rank 2: Buster Keaton - College
    • Rank 3: Charlie Chaplin - City Lights
  • Domain: http://www.example.com/movies/ratings
    • Rank 5.0: Hitchhiker's Guide to the Galaxy
    • Rank 4.5: Charlie Chaplin - City Lights
    • Rank 3.5: Buster Keaton - College

In addition, because the feed is marked as being non-incremental, feed consumers can see that the feed contains the full set of entries.
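
The two orderings above can be reproduced with a short sketch. The sort_by_rank function is hypothetical, and its reading of the significance attribute is inferred from the orderings shown here rather than quoted from the Feed Rank draft: "ascending" is taken to mean that larger values come first, and "descending" that smaller values come first. Falling back to "ascending" when no scheme is declared is likewise an assumption.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
RANK = "http://purl.org/syndication/index/1.0"


def sort_by_rank(feed_xml, domain):
    """Return entry titles ordered most significant first within one domain."""
    feed = ET.fromstring(feed_xml)
    significance = "ascending"  # assumed fallback when no scheme matches
    for scheme in feed.findall(f"{{{RANK}}}ranking-scheme"):
        if scheme.get("domain") == domain:
            significance = scheme.get("significance", "ascending")
    ranked = []
    for entry in feed.findall(f"{{{ATOM}}}entry"):
        for rank in entry.findall(f"{{{RANK}}}rank"):
            if rank.get("domain") == domain:
                title = entry.findtext(f"{{{ATOM}}}title")
                ranked.append((float(rank.text), title))
    # Interpretation: "ascending" puts larger values first, "descending" smaller.
    ranked.sort(key=lambda pair: pair[0], reverse=(significance == "ascending"))
    return [title for _, title in ranked]
```

The sketch compares the domain attribute literally, so it only handles feeds like Listing 5 in which every r:ranking-scheme and r:rank element names its domain explicitly.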


Freshness dating

Quite often, published content is only intended to be valid within a specific period of time. For instance, a top-10 list of entries might only be valid for 10 days after it is published. Or a particular document might automatically expire after a given period of time, as is the case with IETF Internet-Drafts, which expire after six months. To address this case, a new IETF Internet-Draft entitled "Atom Metadata Expiration: Specifying Expiration Timestamps for Atom Feed and Entry metadata" (see Resources) proposes two new extension elements that you can use to indicate either an exact time or a maximum age after which the metadata contained by the feed or entry expires.

Suppose, for instance, that I want the non-incremental, ordered movie queue illustrated above to expire 10 days (864,000,000 milliseconds) after it was published. To accomplish this, I use the max-age element introduced by the expiration extension. Its value is the number of milliseconds, counted from the moment specified by the feed's atom:updated element, during which the feed's metadata should be considered valid (see Listing 6).

Listing 6. A feed whose content expires after 10 days
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0"
      xmlns:fa="http://purl.org/atompub/age/1.0">
  <title>My Movie Queue</title>
  <link href="http://www.example.com/movies"/>
  <link rel="self" href="http://www.example.com/movies/feed" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:movies</id>
  <fh:incremental>false</fh:incremental>
  <fa:max-age>864000000</fa:max-age>
  <entry>
    <title>Hitchhiker's Guide to the Galaxy</title>
    <link href="..." />
    ...
  </entry>
  <entry>
    <title>Charlie Chaplin - City Lights</title>
    <link href="..." />
    ...
  </entry>
  <entry>
    <title>Buster Keaton - College</title>
    <link href="..." />
    ...
  </entry>
</feed>
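
The arithmetic behind the max-age value can be checked with a short sketch. The expiry_from_max_age function is hypothetical, and for brevity it only parses timestamps written in the UTC "Z" form used in these listings.

```python
from datetime import datetime, timedelta, timezone

# 10 days expressed in milliseconds: 10 * 24 * 60 * 60 * 1000 = 864,000,000
TEN_DAYS_MS = 10 * 24 * 60 * 60 * 1000


def expiry_from_max_age(updated, max_age_ms):
    """Return the moment at which metadata with the given max-age expires.

    `updated` is the text of atom:updated in UTC "Z" form (a simplification),
    and `max_age_ms` is the text or integer value of fa:max-age.
    """
    updated_at = datetime.strptime(updated, "%Y-%m-%dT%H:%M:%SZ")
    updated_at = updated_at.replace(tzinfo=timezone.utc)
    return updated_at + timedelta(milliseconds=int(max_age_ms))
```

For the feed in Listing 6, adding 864,000,000 milliseconds to the atom:updated value of 2005-12-12T12:00:00Z gives 2005-12-22T12:00:00Z.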

As an alternative to specifying a maximum age, you could use the expires element to specify an exact moment when the feed or entry is considered expired (see Listing 7).

Listing 7. A feed whose content expires at noon on December 22, 2005
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0"
      xmlns:fa="http://purl.org/atompub/age/1.0">
  <title>My Movie Queue</title>
  <link href="http://www.example.com/movies"/>
  <link rel="self" href="http://www.example.com/movies/feed" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:movies</id>
  <fh:incremental>false</fh:incremental>
  <fa:expires>2005-12-22T12:00:00Z</fa:expires>
  <entry>
    <title>Hitchhiker's Guide to the Galaxy</title>
    <link href="..." />
    ...
  </entry>
  <entry>
    <title>Charlie Chaplin - City Lights</title>
    <link href="..." />
    ...
  </entry>
  <entry>
    <title>Buster Keaton - College</title>
    <link href="..." />
    ...
  </entry>
</feed>

When a feed reader accesses an Atom feed or entry that has expired, it should alert the user in some way.
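
A feed reader could make that check along the following lines. The is_expired function is a hypothetical sketch: it looks only at a feed-level fa:expires element, it parses only timestamps in the UTC "Z" form, and treating the exact expiration instant as already expired is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

AGE = "http://purl.org/atompub/age/1.0"


def is_expired(feed_xml, now):
    """Return True when the feed carries an fa:expires time at or before `now`.

    `now` must be a timezone-aware datetime supplied by the caller.
    """
    feed = ET.fromstring(feed_xml)
    expires = feed.findtext(f"{{{AGE}}}expires")
    if expires is None:
        # No explicit expiration time on the feed element.
        return False
    expires_at = datetime.strptime(expires.strip(), "%Y-%m-%dT%H:%M:%SZ")
    expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now >= expires_at
```

A reader might call this with datetime.now(timezone.utc) each time it displays the feed and show a notice when the result is True.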

The exact intent of what it means for a feed or entry to be expired is intentionally vague -- that is, the only assertion that's implied when a feed or entry expires is that the informational metadata contained within either might no longer be valid or might have changed. It is up to each individual application to define any further semantic requirements necessary for proper handling of valid and expired elements.

It's also worth noting that you should never use the expires and max-age elements for cache control purposes, or to determine a schedule for when to download a new version of an Atom document. Instead, use standard HTTP cache control mechanisms for such purposes.


Putting it together

Listing 8 illustrates a single feed that uses each of the extensions discussed thus far.

Listing 8. A combined example
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0"
      xmlns:fr="http://purl.org/syndication/index/1.0"
      xmlns:fa="http://purl.org/atompub/age/1.0">
  <title>My Movie Queue</title>
  <link href="http://www.example.com/movies"/>
  <link rel="self" href="http://www.example.com/movies/feed" />
  <updated>2005-12-12T12:00:00Z</updated>
  <author><name>James Snell</name></author>
  <id>tag:example.com,2005:movies</id>
  <fh:incremental>false</fh:incremental>
  <fr:ranking-scheme
    domain="http://www.example.com/movies/queue"
    label="Queue"
    significance="descending"
    precision="0"
    min-value="1" />
  <fr:ranking-scheme
    domain="http://www.example.com/movies/ratings"
    label="Ratings"
    significance="ascending"
    precision="1"
    min-value="0"
    max-value="5" />
  <fa:expires>2005-12-22T12:00:00Z</fa:expires>
  <entry>
    <title>Hitchhiker's Guide to the Galaxy</title>
    <link href="..." />
    <fr:rank
      domain="http://www.example.com/movies/queue">1</fr:rank>
    <fr:rank
      domain="http://www.example.com/movies/ratings">5.0</fr:rank>
    ...
  </entry>
  <entry>
    <title>Charlie Chaplin - City Lights</title>
    <link href="..." />
    <fr:rank
      domain="http://www.example.com/movies/queue">3</fr:rank>
    <fr:rank
      domain="http://www.example.com/movies/ratings">4.5</fr:rank>
    ...
  </entry>
  <entry>
    <title>Buster Keaton - College</title>
    <link href="..." />
    <fr:rank
      domain="http://www.example.com/movies/queue">2</fr:rank>
    <fr:rank
      domain="http://www.example.com/movies/ratings">3.5</fr:rank>
    ...
  </entry>
</feed>

In the second part of this series, I will examine three more extensions that you can use to:

  • Associate copyright licenses with feeds and entries
  • Control automated processing of links
  • Syndicate threaded discussions

As a final note, you should consider all of the extensions discussed here to be works-in-progress that will continue to evolve as they navigate through the IETF Internet Standards process. Most are fairly stable, but if you choose to implement any of them today, you should expect some changes in the future as the final details are discussed and finalized.

Resources

Learn

Get products and technologies

Discuss
