Tip: Use Atom's structure to avoid duplicates in aggregate feeds

Correctly formulate atom:id elements and keep them unique

Tyler Anderson received his B.S. in Computer Science in 2004 and his M.S. in Electrical and Computer Engineering in 2005, both from Brigham Young University. Tyler worked for Stexar Corporation as a Design Engineer, R&D, from May 2005 to August 2006, when Stexar closed. Since Backstop Media LLC discovered him in early 2005, he has written and coded numerous articles and tutorials for IBM developerWorks and DevX.

Summary:  Atom is a growing syndication format that many favor over the dominant syndication format, RSS. With any indexing system like Atom, you never want to create duplicate entries: an entry that reuses an id can overwrite the older, original entry, and duplicated entries that aren't overwritten waste disk space. More importantly, duplicate content confuses search engines, which can cause your search rankings to suffer. This tip shows you how to take full advantage of the id tag as the main identifier, together with other information about the feed entry (like the link and the source URL of the feed), to avoid duplicates in your Atom feeds.


Date:  02 Apr 2007 (Published 20 Mar 2007)
Level:  Introductory

The feed

Feeds are used to syndicate data in news archives or blog-like Web applications. The feed in Listing 1 has a potential problem: the id of an entry could be duplicated elsewhere. If the duplicate doesn't overwrite the current entry, it creates duplicated content in the syndicated feed.


Listing 1. The original feed

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>
<id>http://www.example.com/myblog/index.html</id>
<updated>2007-01-10T17:23:36.222-08:00</updated>
<title type='text'>MyBlog updates</title>
<link rel='alternate' type='text/html'
    href='http://www.example.com/myblog/index.html'></link>
<link rel='next' type='application/atom+xml'
    href='http://www.example.com/myblog/feeds/posts/default?start-index=26&amp;max-results=25'></link>
<link rel='self' type='application/atom+xml' 
    href='http://www.example.com/myblog/feeds/posts/default'></link>
<author><name>tsa</name></author>
<generator version='1.01' uri='http://www.example.com'>Example Blogerator</generator>
<openSearch:totalResults>2</openSearch:totalResults>
<openSearch:startIndex>1</openSearch:startIndex>
<entry>
<id>http://www.example.com/myblog/posts/55</id>
<published>2007-01-10T17:21:00.000-08:00</published>
<updated>2007-01-10T17:23:35.767-08:00</updated>
<title type='text'>My first post ever</title>
<content type='html'>Content goes here!</content>
<link rel='alternate' type='text/html'
    href='http://www.example.com/myblog/posts/55'></link>
<link rel='self' type='application/atom+xml'
    href='http://www.example.com/myblog/feeds/posts/default/55'></link>
<link rel='edit' type='application/atom+xml'
    href='http://www.example.com/myblog/feeds/posts/edit/55'></link>
<author><name>tsa</name></author></entry>
<entry>
<id>http://www.example.com/myblog/posts/56</id>
<published>2007-01-11T17:21:00.000-08:00</published>
<updated>2007-01-10T17:23:35.767-08:00</updated>
<title type='text'>My second post ever</title>
<content type='html'>Content for second post goes here!</content>
<link rel='alternate' type='text/html'
    href='http://www.example.com/myblog/posts/56'></link>
<link rel='self' type='application/atom+xml'
    href='http://www.example.com/myblog/feeds/posts/default/56'></link>
<link rel='edit' type='application/atom+xml'
    href='http://www.example.com/myblog/feeds/posts/edit/56'></link>
<author><name>tsa</name></author></entry>
</feed>

Listing 1 contains a feed with just two entries. This tip focuses only on the id and link tags, as well as the source URL of the Atom feed (contained in the link with rel='self'). This is a valid Atom 1.0 feed, but the way its id tags are constructed is a potential hazard.


The problem

The problem is that the URL, the content of the link tag's href attribute, is reused as the id. The Atom Syndication Format specification (RFC 4287; see Resources for a link) states that the content of an atom:id element MUST be created in a way that assures uniqueness.

The structure of your blog, news Web site, or other syndicated content might help you prevent duplicate id elements, but it's still bad practice to use the bare URL as the content of the atom:id element because of the potential for duplication.
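To make the hazard easier to spot, here is a minimal sketch in Python that flags every id copied verbatim from one of its sibling link elements. The function name ids_copied_from_links and the trimmed-down SAMPLE feed are illustrative assumptions, not part of the original download.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace in ElementTree notation


def ids_copied_from_links(atom_xml: str) -> list[str]:
    """Return every atom:id whose text equals one of its sibling link hrefs."""
    root = ET.fromstring(atom_xml)
    flagged = []
    # The feed element and each entry carry their own id and link children.
    for node in [root, *root.iter(ATOM + "entry")]:
        node_id = (node.findtext(ATOM + "id") or "").strip()
        hrefs = {link.get("href") for link in node.findall(ATOM + "link")}
        if node_id and node_id in hrefs:
            flagged.append(node_id)
    return flagged


# Hypothetical, shortened feed: the last entry uses an id that is not a copy of its link.
SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom'>
<id>http://www.example.com/myblog/index.html</id>
<link rel='alternate' href='http://www.example.com/myblog/index.html'/>
<entry>
<id>http://www.example.com/myblog/posts/55</id>
<link rel='alternate' href='http://www.example.com/myblog/posts/55'/>
</entry>
<entry>
<id>tag:example.com,2007:myblog.post-56</id>
<link rel='alternate' href='http://www.example.com/myblog/posts/56'/>
</entry>
</feed>"""

for copied in ids_copied_from_links(SAMPLE):
    print("id duplicates a link:", copied)
```

A check like this only reports the pattern; whether a flagged id is actually a problem depends on how your publishing system reuses URLs.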

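For example, you might edit a blog post or article. When you save the new content, the URL stays the same. If your system stores the edit as a new database entry and syndicates it as a separate Atom entry, that entry needs an id distinct from the id of the entry it updates, and a URL-based id cannot provide one. (The Atom specification treats revisions of a single entry as sharing one atom:id, so this applies only when the edit is published as an entry in its own right.)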
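The edit scenario can be reproduced with a short Python sketch that collects entry ids across one or more feeds and reports any id used more than once. The helper duplicate_entry_ids and the two-entry EDITED_FEED are hypothetical and exist only to illustrate the collision.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"


def duplicate_entry_ids(feeds: dict[str, str]) -> dict[str, list[str]]:
    """Map each entry id seen more than once to the places that used it.

    `feeds` maps a label (for example, the feed URL) to the feed's XML text.
    """
    seen = defaultdict(list)
    for label, xml_text in feeds.items():
        for entry in ET.fromstring(xml_text).iter(ATOM + "entry"):
            entry_id = (entry.findtext(ATOM + "id") or "").strip()
            title = (entry.findtext(ATOM + "title") or "").strip()
            seen[entry_id].append(f"{label}: {title}")
    return {i: where for i, where in seen.items() if len(where) > 1}


# Hypothetical feed: the edited post was stored as a new entry under the same URL-based id.
EDITED_FEED = """<feed xmlns='http://www.w3.org/2005/Atom'>
<entry>
<id>http://www.example.com/myblog/posts/55</id>
<title>My first post ever</title>
</entry>
<entry>
<id>http://www.example.com/myblog/posts/55</id>
<title>My first post ever (edited)</title>
</entry>
</feed>"""

print(duplicate_entry_ids({"default": EDITED_FEED}))
```

An aggregator that keys its store on the id would either keep both copies or let the second overwrite the first, which are the two outcomes this tip warns about.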


The fix

The fix is to format the atom:id element properly. The most common method is to use the 'tag' URI scheme (RFC 4151; see Resources for the full specification).

Essentially, this scheme is a format for creating the atom:id element so that duplicates are avoided, as long as the authority named in the tag assigns each identifier only once, while the element can still be translated back to a URL.

The three id elements in Listing 1 translate into the tag scheme as shown in Listing 2.


Listing 2. Translating the id elements to the tag scheme

<id>tag:example.com,2007:myblog</id>
<id>tag:example.com,2007:myblog.post-55</id>
<id>tag:example.com,2007:myblog.post-56</id>
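As a rough illustration of how ids like those in Listing 2 could be generated in code, the Python sketch below assembles a tag URI from its three parts (authority, date, and a specific string). The function name make_tag_id and its simplified validation are assumptions; RFC 4151 defines the complete grammar, which this sketch does not check in full.

```python
import re

# RFC 4151 allows a year, a year-month, or a full year-month-day date.
_TAG_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def make_tag_id(authority: str, date: str, specific: str) -> str:
    """Build a tag URI of the form tag:authority,date:specific."""
    if not authority or re.search(r"[\s,:]", authority):
        raise ValueError(f"unusable authority: {authority!r}")
    if not _TAG_DATE.fullmatch(date):
        raise ValueError(f"date must look like YYYY, YYYY-MM, or YYYY-MM-DD: {date!r}")
    if re.search(r"\s", specific):
        raise ValueError(f"specific part must not contain whitespace: {specific!r}")
    return f"tag:{authority},{date}:{specific}"


# Reproduce the three ids from Listing 2.
print(make_tag_id("example.com", "2007", "myblog"))
for post_number in (55, 56):
    print(make_tag_id("example.com", "2007", f"myblog.post-{post_number}"))
```

Uniqueness still depends on you: the code only guarantees a consistent format, so the specific part (here, the post number) must come from something your system never reuses, such as a database key.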

Now if you edit a post, the URL in the link element does not change; however, the id might change to the following: <id>tag:example.com,2007:myblog.post-55.edit-0</id>. The URL is still recoverable from the id element. Translating the full atom:id back to a URL gives http://www.example.com/myblog/posts/55/edit/0, and ignoring the edit suffix gives the post's own URL, http://www.example.com/myblog/posts/55.
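The reverse translation can be sketched in Python as well. The mapping rules below (a www. prefix on the authority, post-N becoming posts/N, and edit-N becoming edit/N) are inferred from this tip's examples and are assumptions about one blog's URL layout, not part of the tag URI scheme itself; the function name tag_id_to_url is hypothetical.

```python
def tag_id_to_url(tag_id: str, keep_edit: bool = False) -> str:
    """Translate an id such as tag:example.com,2007:myblog.post-55.edit-0 to a URL."""
    scheme, _, rest = tag_id.partition(":")
    if scheme != "tag" or ":" not in rest:
        raise ValueError(f"not a tag URI: {tag_id!r}")
    tagging_entity, _, specific = rest.partition(":")
    authority = tagging_entity.split(",", 1)[0]  # drop the date

    blog, *parts = specific.split(".")
    path = [blog]
    for part in parts:
        kind, _, number = part.partition("-")
        if kind == "post":
            path += ["posts", number]
        elif kind == "edit":
            if keep_edit:  # by default, ignore everything from the edit marker on
                path += ["edit", number]
        else:
            raise ValueError(f"unknown id component: {part!r}")
    # Assumption: the site is served from the www. host, as in Listing 1.
    return f"http://www.{authority}/" + "/".join(path)


edited = "tag:example.com,2007:myblog.post-55.edit-0"
print(tag_id_to_url(edited))                  # http://www.example.com/myblog/posts/55
print(tag_id_to_url(edited, keep_edit=True))  # http://www.example.com/myblog/posts/55/edit/0
```

Because the mapping is a private convention, consumers of the feed should keep using the link elements to find the post and treat the id as an opaque identifier.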

That's a wrap! I hope that you'll never have any trouble with duplicated or overwritten feed entries caused by an incorrectly formulated atom:id element.


Summary

You should now know more about Atom feeds and the importance of keeping the atom:id element unique. More importantly, as you develop code to create feeds, keep the concepts discussed in this tip in mind to ensure that a duplicate atom:id element never occurs in your feeds.



Download

Description: Atom duplicates samples
Name: x-tipatom1-dup-source.zip
Size: 2KB
Download method: HTTP



Resources

Learn

Get products and technologies

Discuss

