Feeds are used to syndicate data from news archives and blog-like Web applications. The feed in Listing 1 has a potential problem: the id of an entry could be duplicated elsewhere. If such a duplicate doesn't overwrite the current entry, it creates duplicate content in the syndicated feed.
Listing 1. The original feed
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>
<id>http://www.example.com/myblog/index.html</id>
<updated>2007-01-10T17:23:36.222-08:00</updated>
<title type='text'>MyBlog updates</title>
<link rel='alternate' type='text/html'
href='http://www.example.com/myblog/index.html'></link>
<link rel='next' type='application/atom+xml'
href='http://www.example.com/myblog/feeds/posts/default?start-index=26&amp;max-results=25'></link>
<link rel='self' type='application/atom+xml'
href='http://www.example.com/myblog/feeds/posts/default'></link>
<author><name>tsa</name></author>
<generator version='1.01' uri='http://www.example.com'>Example
Blogerator</generator>
<openSearch:totalResults>2</openSearch:totalResults>
<openSearch:startIndex>1</openSearch:startIndex>
<entry>
<id>http://www.example.com/myblog/posts/55</id>
<published>2007-01-10T17:21:00.000-08:00</published>
<updated>2007-01-10T17:23:35.767-08:00</updated>
<title type='text'>My first post ever</title>
<content type='html'>Content goes here!</content>
<link rel='alternate' type='text/html'
href='http://www.example.com/myblog/posts/55'></link>
<link rel='self' type='application/atom+xml'
href='http://www.example.com/myblog/feeds/posts/default/55'></link>
<link rel='edit' type='application/atom+xml'
href='http://www.example.com/myblog/feeds/posts/edit/55'></link>
<author><name>tsa</name></author></entry>
<entry>
<id>http://www.example.com/myblog/posts/56</id>
<published>2007-01-11T17:21:00.000-08:00</published>
<updated>2007-01-10T17:23:35.767-08:00</updated>
<title type='text'>My second post ever</title>
<content type='html'>Content for second post goes here!</content>
<link rel='alternate' type='text/html'
href='http://www.example.com/myblog/posts/56'></link>
<link rel='self' type='application/atom+xml'
href='http://www.example.com/myblog/feeds/posts/default/56'></link>
<link rel='edit' type='application/atom+xml'
href='http://www.example.com/myblog/feeds/posts/edit/56'></link>
<author><name>tsa</name></author></entry>
</feed>
Listing 1 contains a feed with just two entries. This tip focuses only on the id and link tags, including the source URL of the Atom feed (contained in the link tag). This is a valid Atom 1.0 feed, but the way its id tags are constructed is a potential hazard.
The problem is that the URL, or the contents of the link tag, is used as the id. The Atom syndication format specification requires that the content of the atom:id element be a permanent, universally unique identifier, so you MUST create it in a way that assures uniqueness (see Resources for a link).
Even if the structure of your blog, news Web site, or other syndicated content happens to prevent duplicate ids, simply using the URL as the content of the atom:id element is bad practice because it invites duplication.
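For example, suppose you edit a blog post or article. When you save the new content, the URL stays the same, but your application might create a new database record for it. If that record is published as a separate Atom entry, it needs an id that differs from the entry it was derived from, and a URL-based id cannot provide one. (If the edit is simply a revision of the same entry, the Atom specification says the revision keeps the original atom:id.)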
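To check whether an existing feed has this problem, you can audit its ids. The following Python sketch uses the standard-library xml.etree.ElementTree module; the function name audit_atom_ids and the trimmed sample feed are hypothetical, and the check only covers the feed-level id and its direct entry children. It reports ids that appear more than once and ids that merely copy their element's rel='alternate' link, as every id in Listing 1 does.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample, trimmed from Listing 1 with one id deliberately repeated.
SAMPLE_FEED = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <id>http://www.example.com/myblog/index.html</id>
  <link rel='alternate' href='http://www.example.com/myblog/index.html'/>
  <entry>
    <id>http://www.example.com/myblog/posts/55</id>
    <link rel='alternate' href='http://www.example.com/myblog/posts/55'/>
  </entry>
  <entry>
    <id>http://www.example.com/myblog/posts/55</id>
  </entry>
</feed>"""


def audit_atom_ids(feed_xml: str) -> tuple[list[str], list[str]]:
    """Return (ids used more than once, ids identical to their alternate link)."""
    root = ET.fromstring(feed_xml)
    seen = []
    copied_from_url = []
    # The feed element and each direct <entry> child carry their own atom:id.
    for element in [root, *root.findall(f"{ATOM_NS}entry")]:
        id_element = element.find(f"{ATOM_NS}id")
        if id_element is None or not id_element.text:
            continue
        atom_id = id_element.text.strip()
        seen.append(atom_id)
        # A link without rel is treated as rel='alternate', as Atom 1.0 specifies.
        for link in element.findall(f"{ATOM_NS}link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href") == atom_id:
                copied_from_url.append(atom_id)
    duplicates = [atom_id for atom_id, count in Counter(seen).items() if count > 1]
    return duplicates, copied_from_url


if __name__ == "__main__":
    duplicates, copied = audit_atom_ids(SAMPLE_FEED)
    print("duplicate ids:", duplicates)
    print("ids copied from URLs:", copied)
```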
The fix is to format the atom:id element properly. A common way to do this is to use the 'tag' URI scheme (see Resources for the full specification).
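Essentially, the scheme builds each id from a domain name (or e-mail address) that you control, a date on which you controlled it, and a string that you choose. Ids minted by different publishers therefore cannot collide, and as long as you never reuse a string under the same domain and date, your own ids stay unique as well. If you choose the strings consistently, each id can also be translated back to a URL.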
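As a minimal sketch of the syntax, assuming the RFC 4151 form tag:authority,date:specific, the hypothetical helper make_tag_uri below assembles a tag URI and rejects parts that do not have the expected shape. It checks only the shape of the date (YYYY, YYYY-MM, or YYYY-MM-DD), not whether it is a real calendar date, and it accepts DNS names only, not e-mail addresses.

```python
import re

# Shape checks loosely following RFC 4151 (tag URI scheme). These are
# simplifications for illustration, not a complete validator.
_DNS_NAME = re.compile(
    r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
    r"(?:\.[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)*$"
)
_DATE = re.compile(r"^\d{4}(?:-\d{2}(?:-\d{2})?)?$")  # YYYY, YYYY-MM or YYYY-MM-DD
_SPECIFIC = re.compile(r"^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*$")


def make_tag_uri(authority: str, minted: str, specific: str) -> str:
    """Build tag:<authority>,<minted>:<specific> (hypothetical helper)."""
    if not _DNS_NAME.match(authority):
        raise ValueError(f"not a DNS name: {authority!r}")
    if not _DATE.match(minted):
        raise ValueError(f"date must be YYYY, YYYY-MM or YYYY-MM-DD: {minted!r}")
    if not _SPECIFIC.match(specific):
        raise ValueError(f"illegal character in specific part: {specific!r}")
    return f"tag:{authority},{minted}:{specific}"


if __name__ == "__main__":
    print(make_tag_uri("example.com", "2007", "myblog.post-55"))
```

The date is what makes the authority safe to reuse: if example.com changes hands, the new owner mints ids under a later date, so its ids cannot collide with yours.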
The three id elements in Listing 1 translate into the tag scheme as shown in Listing 2.
Listing 2. Translating the id elements to the tag scheme
<id>tag:example.com,2007:myblog</id>
<id>tag:example.com,2007:myblog.post-55</id>
<id>tag:example.com,2007:myblog.post-56</id>
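The mapping in Listing 2 follows a simple convention: drop the www. prefix from the host, keep the blog name, and turn /posts/55 into .post-55. The sketch below encodes that convention in a hypothetical url_to_tag_id function; the fixed year 2007 and the two URL shapes it recognizes are assumptions taken from Listings 1 and 2, so a real site would need its own rules.

```python
from urllib.parse import urlparse


def url_to_tag_id(url: str, minted: str = "2007") -> str:
    """Map a Listing 1 URL to a Listing 2 style tag id (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Assumption: the authority is the host name without a leading "www.".
    authority = host[len("www."):] if host.startswith("www.") else host
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) == 2 and parts[1] == "index.html":
        # http://www.example.com/myblog/index.html -> the feed itself
        specific = parts[0]
    elif len(parts) == 3 and parts[1] == "posts":
        # http://www.example.com/myblog/posts/55 -> one post
        specific = f"{parts[0]}.post-{parts[2]}"
    else:
        raise ValueError(f"no tag id convention for {url!r}")
    return f"tag:{authority},{minted}:{specific}"


if __name__ == "__main__":
    for url in (
        "http://www.example.com/myblog/index.html",
        "http://www.example.com/myblog/posts/55",
        "http://www.example.com/myblog/posts/56",
    ):
        print(f"<id>{url_to_tag_id(url)}</id>")
```

Raising an error for unrecognized URLs, rather than guessing, keeps the convention from silently minting two different pages under the same id.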
Now suppose an edited post is published as a separate entry. Its URL, or link element, does not change, but its id might become <id>tag:example.com,2007:myblog.post-55.edit-0</id>. The post's URL is still recoverable from this id by ignoring everything from .edit onward, and translating the full id back to a URL gives http://www.example.com/myblog/posts/55/edit/0.
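Going the other way, the hypothetical function tag_id_to_urls below parses an id in the Listing 2 style, with an optional .edit-N suffix, and rebuilds both the post URL and the full URL. It assumes the www. host prefix and the /posts/ path used in Listing 1; other id layouts are rejected.

```python
import re

# Assumed id layout: tag:<authority>,<date>:<blog>.post-<n>[.edit-<m>]
_POST_ID = re.compile(
    r"^tag:(?P<authority>[^,]+),(?P<date>\d{4}(?:-\d{2}){0,2}):"
    r"(?P<blog>[^.]+)\.post-(?P<post>\d+)(?:\.edit-(?P<edit>\d+))?$"
)


def tag_id_to_urls(tag_id: str) -> tuple[str, str]:
    """Return (post URL, full URL including any edit suffix)."""
    match = _POST_ID.match(tag_id)
    if match is None:
        raise ValueError(f"not a post id in the expected layout: {tag_id!r}")
    # Assumption: the site is served from www.<authority>, as in Listing 1.
    post_url = f"http://www.{match['authority']}/{match['blog']}/posts/{match['post']}"
    if match["edit"] is None:
        return post_url, post_url
    return post_url, f"{post_url}/edit/{match['edit']}"


if __name__ == "__main__":
    print(tag_id_to_urls("tag:example.com,2007:myblog.post-55.edit-0"))
```

The edit number is what keeps the two entries' ids distinct, while dropping it recovers the link to the original post.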
That's a wrap! I hope you never have trouble with duplicated or overwritten feed entries caused by an incorrectly formed atom:id element.
You should now know more about Atom feeds and the importance of keeping the atom:id element unique. More importantly, as you develop code to create feeds, keep the concepts discussed in this tip in mind to ensure that no two feed entries ever share the same atom:id.
| Description | Name | Size | Download method |
|---|---|---|---|
| Atom duplicates samples | x-tipatom1-dup-source.zip | 2KB | HTTP |
Learn
- atom:id element: Get information on this part of the Atom syndication format.
- 'tag' URI scheme: Get the full specification on the "tag" Uniform Resource Identifier (URI) scheme.
- On RSS and Atom: Read DeWitt Clinton's blog.
- An overview of the Atom 1.0 Syndication Format (James Snell, developerWorks, August 2005): See how this popular Web content syndication format stacks up.
- Getting to know the Atom Publishing Protocol (James Snell, developerWorks, October - December 2006): Find out about the Atom API in this three-part series:
- Part 1: Create and edit Web resources with the Atom Publishing Protocol: Explore a high-level overview of the protocol and its basic operation and capabilities.
- Part 2: Put the Atom Publishing Protocol (APP) to work: Learn to use APP to interact with a number of real-world deployed applications.
- Part 3: Introducing the Apache Abdera project: Start to implement Atom-enabled apps using a new open-source project, called Abdera, currently under incubation at the Apache Software Foundation.
- RSS and Atom in Action (Dave Johnson, Manning Publications, July 2006): If you prefer something more physical, take a look at this book about the blog technologies of news feed formats and publishing protocols and how to put these building blocks together. (Blogapps was put together for this book.)
- Atom 1.0 specification: Read about this XML-based Web content and metadata syndication format.
- Atom 1.0 compatible software: Visit the Atom Working Group's Wiki for a list of known Atom 1.0 feed consumers.
- Use the Atom format for syndicating news and more (Uche Ogbuji, developerWorks, May 2004): Read more on this XML-based standard, format, and API for the interchange and cross-reference of Web metadata.
- developerWorks XML zone: Learn all about XML at the developerWorks XML zone.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
Get products and technologies
- feedvalidator.org: Validate your Atom feed generator.
Discuss
- Participate in the discussion forum.
- XML zone discussion forums: Participate in any of several XML-centered forums.
Tyler Anderson received both his B.S. in Computer Science in 2004 and his M.S. in Electrical and Computer Engineering in 2005 from Brigham Young University. Tyler worked with Stexar Corporation as a Design Engineer, R&D, from May 2005 until August 2006, when Stexar closed. Since Tyler was discovered by Backstop Media LLC in early 2005, he has written and coded numerous articles and tutorials for IBM developerWorks and DevX.




