Annotating the Web with Atom

Open up the whole idea of user participation

Uche Ogbuji (uche@ogbuji.net), Partner, Zepheira, LLC
Uche Ogbuji is Partner at Zepheira, LLC, a solutions firm specializing in the next generation of Web technologies. Mr. Ogbuji is lead developer of 4Suite, an open source platform for XML, RDF and knowledge-management applications, the Jacqard agile methodology for team Web development, and the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his blog Copia.
Eric Larson (eric@ionrock.org), Developer, YouGov
Hailing from the vast expanses of Texas, Eric Larson spends his days coding Python and working with large data sets. Eric is a lead developer for Bright Content, a RESTful content management system built around AtomPub. Eric is also a contributor to the XML/RDF libraries Amara and Akara. Outside of technology, Eric plays the bass in the rock band Ume, where he becomes something of a luddite in favor of the sounds of analog compression and feedback.

Summary:  You've seen reader comments on weblogs and other Web 2.0 sites, but the Atom protocol makes it possible to create and manage such comments in a very flexible way. Flexible Web annotation is an idea that opens up an entirely new class of Web applications with very little new invention. Learn how to create a system to manage annotations for anything on the Web, from nearly anywhere.

Date:  08 Jul 2008
Level:  Intermediate

What we call Web 2.0 came from several related impulses to make the Web more valuable. One of those impulses is for a "read-write Web". Most of the time we use the Web as mere spectators. There are a few publishers (writers) and innumerable readers. More and more people do have the tools to write the Web, but again they are usually just writing to a few small areas where they have control. Many would like to see a more balanced Web where more people can contribute and can do so more widely. The most important Web 2.0 developments (weblogs, weblog comments, wikis, and forums, for example) and resource or media sharing sites (del.icio.us, Flickr, YouTube, and Digg, for example) make it more of a read-write Web. Still there is room to go further. In this article we introduce a subtle but effective approach for widening the space in which average users can write the Web. To get the most out of this article, you should be familiar with the basics of the Atom syntax and Atom Publishing Protocol (see Resources).

Easing beyond Web page comments

Comments on weblogs and such sites are a suitable and fairly well-understood starting point for annotating the Web. They associate snippets of text with a URL, which might be a weblog entry, a news article, a media resource, or something similar. There is a limited concept of typing for such comments (grouping them by nature or source). Some are merely pointers to other primary URLs, such as weblog track-backs and ping-backs. Figure 1 is an example of a weblog page with an entry, one posted comment, and the form for posting further comments, taken from Sam Ruby's "Intertwingly" weblog.


Figure 1. A weblog with commenting system
Weblog entry, comment, and comments form

It's all just URLs and relationships

Most such commenting systems are implemented as records buried in a database and pulled into the page rendering process with very specialized code. More and more systems offer unique URLs (known as "perma-links") for each comment, which is an important step. If you take a step back and look at the abstract concept of such comments, each one is basically a specific relationship between one Web resource, the target page of the comment, and another, the comment itself. As required for Web resources, each has a URL. This is a very simple, but powerful arrangement that can be made even more powerful through a simple adjustment. Rather than burying the relationship between these Web resources within specialized databases and code, we propose using a standard, open way of expressing the relationship. That way comments can be queried and aggregated across systems; and we already know from Web 2.0 mash-ups how wide the doors of creativity get kicked open when Web publishers use sharable, hack-able, open formats.

When you think about it in such general terms, you can really use your imagination. The Web resource you associate with a target doesn't need to be just a short snippet of text, as in simple comments. It can be an image, a sound, some other media, another weblog entry (this idea is just a generalization of track-backs and ping-backs), or, even more interestingly, a service. For a more conventional take on the latter idea, consider this: When you submit a target URL to a service such as Digg or del.icio.us, you are effectively making the same type of URL association as in a comment. Sometimes people submit a weblog article to Digg and then leave a text comment on the weblog saying, "Cool. I submitted this to Digg," which is redundant once you start thinking more generally. The association of the weblog entry with the Digg service should be as transparent as the association of a comment through that weblog's local forms.

"Comment" is too limited a name for the generalized idea of associating an arbitrary Web resource with a target. What we're really talking about here are Web annotations, designed to be readily discovered and easily processed. The key to ensuring this is to build on the scaffolding of well-established standards. The Atom group of standards, in particular, has a lot to offer.

Web annotations through the Atom lens

The Atom Threading Extension extends the core Atom syntax with a means of providing context for an Atom entry. Simply, the Atom Threading Extension allows you to declare an entry or feed as being a response or reply to some resource. This extension makes creating feeds for comments and threaded conversations rather obvious. Listing 1 is a simple example of a comment to a blog entry, using Atom and the Threading Extension.


Listing 1. Weblog entry comment in Atom and the Threading Extension
                
<entry xmlns="http://www.w3.org/2005/Atom"
  xmlns:thr="http://purl.org/syndication/thread/1.0">
  <id>tag:eric@ionrock.org,2008-02-29:1-Annotations_and_Communcations</id>
  <title>Communication Reactions</title>
  <content type="text">
    One good use of annotations is tracking edits of some document!
  </content>
  <thr:in-reply-to
    ref="http://ionrock.org/blog/2008/01/10/Annotations_and_Communcations.atom"
    href="http://ionrock.org/blog/2008/01/10/Annotations_and_Communcations/"
    type="text/html" />
</entry>
    
    

An annotation is just a generalization of a comment. It can be anything with a Web URL and representation, not just a snippet of text. (In fact, using a neat trick you can use things as annotations without any Web representation, but that's a topic for another article.) Comments are generally limited to a specific usage context, but luckily the Atom Threading Extension already supports more general usage by providing the means to set a pointer, while leaving the context completely open.
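As an illustration of how little is needed to produce such an entry, here is a minimal Python sketch that builds one with the standard library. The function name build_annotation and its parameters are our own inventions for this example, not part of any Atom library. Note that the Threading Extension requires the ref attribute on thr:in-reply-to, while href and type are optional.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
THR_NS = "http://purl.org/syndication/thread/1.0"

# Register prefixes so the serialized XML uses readable names.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("thr", THR_NS)


def build_annotation(entry_id, title, text, target_ref,
                     target_href=None, target_type=None):
    """Return an Atom entry element that annotates the target resource.

    Hypothetical helper: target_ref becomes the required ref attribute,
    target_href and target_type the optional href and type attributes.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"})
    content.text = text
    reply = ET.SubElement(entry, f"{{{THR_NS}}}in-reply-to",
                          {"ref": target_ref})
    if target_href:
        reply.set("href", target_href)
    if target_type:
        reply.set("type", target_type)
    return entry


if __name__ == "__main__":
    entry = build_annotation(
        "tag:eric@ionrock.org,2008-02-29:1-Annotations_and_Communcations",
        "Communication Reactions",
        "One good use of annotations is tracking edits of some document!",
        "http://ionrock.org/blog/2008/01/10/Annotations_and_Communcations.atom",
        "http://ionrock.org/blog/2008/01/10/Annotations_and_Communcations/",
        "text/html",
    )
    print(ET.tostring(entry, encoding="unicode"))
```

Because nothing in the function assumes the target is a weblog entry, the same call works for annotating any resource that has a URL.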


The annotation service

The Atom syntax provides a format for expressing Web annotations. The Atom Publishing Protocol ("AtomPub") provides a framework for building annotation services. In this article we'll present an annotation server that's no more than a slightly specialized AtomPub server. The only extra requirement is that Atom entries contain a threading element that will be indexed for later queries. This server's purpose is to provide a single service for storing annotations that can be queried based on these indexed references. In other words, you can ask the server, "give me all annotations of resource X," and it will use the index to provide a quick response. If you build on this idea you may want to tailor the service a bit to provide convenience operations facilitating more complex annotations, but for this article we keep it simple.
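To make the indexing idea concrete, here is a rough Python sketch of an in-memory index keyed on the threading element. It is only an illustration under our own assumptions: the class name AnnotationIndex and its methods are hypothetical, and a real service would persist the index rather than hold it in a dictionary.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"
THR = "{http://purl.org/syndication/thread/1.0}"


class AnnotationIndex:
    """Hypothetical in-memory index from a target URL to its annotations."""

    def __init__(self):
        self._by_target = defaultdict(list)

    def add(self, entry_xml):
        """Index one Atom entry; reject entries with no threading element."""
        entry = ET.fromstring(entry_xml)
        replies = entry.findall(f"{THR}in-reply-to")
        if not replies:
            raise ValueError("entry has no thr:in-reply-to element")
        for reply in replies:
            # Index under both ref and href so either form of the
            # target's address can be used in a query.
            for key in {reply.get("ref"), reply.get("href")} - {None}:
                self._by_target[key].append(entry)
        return entry

    def annotations_of(self, target):
        """Answer the question: give me all annotations of resource X."""
        return list(self._by_target.get(target, []))
```

A query such as index.annotations_of("http://ionrock.org/about/") then costs a single dictionary lookup, no matter how many entries the collection holds.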

Since the annotation service provides an AtomPub interface, you can define its usage using an AtomPub service document, given in Listing 2.


Listing 2. AtomPub service document for the annotation service
                
<service xmlns="http://www.w3.org/2007/app"
         xmlns:a="http://www.w3.org/2005/Atom"
         xml:base="http://ionrock.org/annotations/">
  <workspace>
    <a:title>Annotations Collections</a:title>
    <collection href="comments/">
      <a:title>Comments for Ionrock.org</a:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>

  <workspace>
    <a:title>Annotations Management</a:title>
    <collection href="manage/">
      <a:title>Manage Annotation Collections</a:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
</service>
    
    

In the service defined above, there are two primary workspaces. The first workspace is for collections of annotations. The listing only contains the "comments" collection, but you can add others as needed. In order to facilitate doing so, the second workspace, "Annotations Management," has a master collection that defines what other collections are available. If you were to request the feed at the "manage" URL, you would only find one entry that describes the "comments" collection.

With the service document in place, the annotation service is essentially usable through the AtomPub interface. You could also build a management application for the annotation service on this AtomPub basis.
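A client can discover the available collections by reading the service document. The following Python sketch shows one way to do that with the standard library. The function name list_collections is hypothetical, and for brevity the sketch only honors an xml:base attribute on the root element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

APP = "{http://www.w3.org/2007/app}"
ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def list_collections(service_xml, document_url=""):
    """Return one dict per collection found in an AtomPub service document.

    Simplification: only an xml:base on the root element is applied.
    """
    service = ET.fromstring(service_xml)
    base = urljoin(document_url, service.get(XML_BASE, ""))
    found = []
    for workspace in service.findall(f"{APP}workspace"):
        workspace_title = workspace.findtext(f"{ATOM}title")
        for collection in workspace.findall(f"{APP}collection"):
            found.append({
                "workspace": workspace_title,
                "title": collection.findtext(f"{ATOM}title"),
                # Resolve the relative href against the base URL.
                "href": urljoin(base, collection.get("href")),
                "accept": [a.text for a in collection.findall(f"{APP}accept")],
            })
    return found
```

Applied to the service document above, this resolves the "comments/" collection to http://ionrock.org/annotations/comments/ and the "manage/" collection to http://ionrock.org/annotations/manage/.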


Creating a commenting interface

With the basic annotation service in place, we can consider how to use it for the more specific commenting use case. The idea is to allow a comment to be posted to our service from any Web page. For purposes of this article we assume our clients will be Web browsers, which presents a slightly awkward issue: it is not entirely trivial to POST an Atom entry using traditional Web forms. One solution is to use XForms, but XForms is not yet universally supported and you may not want to rely on its availability to the user. Another option is to use JavaScript to compose a valid Atom entry from the form data (basically the Ajax approach). While the JavaScript may be a trivial task for some, it could pose a problem on more limited systems. For example, a social networking site that allows pasting HTML might not allow JavaScript. This leaves us with simple HTML forms and a server endpoint that converts the form data to an Atom entry and POSTs it to the annotation service endpoint.

To help make this more discoverable, we add a link to our comment collection feed that indicates where this form's processing endpoint is available. Listing 3 is an Atom feed snippet with an example.


Listing 3. Atom feed snippet with an example of a comment form endpoint
                
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:ionrock.org:site-comments</id>
  <title>Ionrock dot org Web Log Comments</title>
  <link rel="comment-forms" href="http://ionrock.org/comments/forms/" />
  <entry>
   ...
  </entry>
</feed>
    
    

The next step is to add some basic forms to the page we want to comment on. Listing 4 is an HTML form snippet.


Listing 4. HTML form snippet for posting a comment
                
<form action="http://ionrock.org/comments/forms/" method="POST"
     name="annotation-service-comments">

  <fieldset>
    <legend>Leave a Comment</legend>
    <input type="hidden" name="thr-in-reply-to"
           value="http://ionrock.org/tutorials/pyxml/amara/Atom_Fun/"
    />
    <input type="hidden" name="atom-author-name" value="Uche Ogbuji" />
    <input type="hidden" name="atom-author-url"
           value="http://copia.ogbuji.net" />
    <label for="atom-title">Title</label>
    <input type="text" id="atom-title" name="atom-title" />
    <label for="atom-content">Content</label>
    <textarea id="atom-content" name="atom-content" rows="10" cols="30"></textarea>
    <input type="submit" name="submit" value="submit" />
  </fieldset>
</form>
    
    

We have named the form elements using a very simple convention. The atom prefix implies that the value will eventually be transformed into an Atom element. The first value after the - delimiter names the Atom element. In the case of the author, Atom defines an atom:author element with a required atom:name and optional atom:uri and atom:email child elements. To support this, we simply add the child element's name after author, using another - delimiter (the atom-author-url field supplies the value for atom:uri). The one case where we did not use this convention is thr-in-reply-to, which defines the Atom Threading Extension element. We preferred a more natural name, and besides, we wouldn't want users to assume the convention is meant to be rigid.

You may also notice that the author information is contained in hidden inputs instead of text inputs. In this example, there is an assumption that the user was validated somehow before being presented with the forms. For example, we could use the Simple Registration Extension with OpenID to gather basic information. This OpenID extension lets the user approve sending some extra profile information to the requesting site as part of authentication. Alternatively, you could simply replace the hidden inputs for the author information with text inputs.

With the forms in place, we need to handle submissions. One strategy would be to go through all the form values and use our naming convention to work out which items we need. In this case, it seems simpler to grab just the specific items we want. Since this could be implemented in any language, we assume you can work out how to take the form input and transform it into a valid Atom entry with the correct elements. Lastly, if any element that you require is missing, respond with an appropriate HTTP status code (such as 400 Bad Request) to convey the problem, along with a link back to the original referrer indicating the error so that the user can fix things.
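As one possible shape for that conversion, here is a minimal Python sketch. It assumes the submitted fields arrive as a plain dictionary, and the function name form_to_entry is hypothetical. Two further assumptions are ours rather than anything the service requires: the handler mints the atom:id and atom:updated values itself, and it uses the submitted target URL for both the ref and href attributes of thr:in-reply-to. The caller would map the ValueError to a 400 response.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
THR_NS = "http://purl.org/syndication/thread/1.0"
REQUIRED_FIELDS = ("thr-in-reply-to", "atom-title", "atom-content")
# Form fields named atom-author-<something> and the Atom child they supply.
AUTHOR_FIELDS = {"name": "atom-author-name",
                 "uri": "atom-author-url",
                 "email": "atom-author-email"}


def atom(tag):
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def form_to_entry(form):
    """Convert a dict of submitted form fields into an Atom entry element.

    Raises ValueError naming the missing fields, so that the caller can
    answer with an HTTP 400 response.
    """
    missing = [name for name in REQUIRED_FIELDS
               if not (form.get(name) or "").strip()]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    entry = ET.Element(atom("entry"))
    # Assumption: this handler assigns the id and timestamp itself.
    ET.SubElement(entry, atom("id")).text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, atom("updated")).text = (
        datetime.now(timezone.utc).isoformat())
    ET.SubElement(entry, atom("title")).text = form["atom-title"].strip()
    content = ET.SubElement(entry, atom("content"), {"type": "text"})
    content.text = form["atom-content"]

    # atom:author requires a name, so skip the author when none was sent.
    if form.get("atom-author-name"):
        author = ET.SubElement(entry, atom("author"))
        for child, field in AUTHOR_FIELDS.items():
            if form.get(field):
                ET.SubElement(author, atom(child)).text = form[field]

    # Assumption: the target URL serves as both ref and href.
    target = form["thr-in-reply-to"].strip()
    ET.SubElement(entry, f"{{{THR_NS}}}in-reply-to",
                  {"ref": target, "href": target})
    return entry
```

The resulting element can be serialized with ET.tostring and POSTed to the comments collection with a Content-Type of application/atom+xml;type=entry.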


Creating a comment query interface

With commenting in place you now want a way of using the actual comments. For this example, it would be nice to show the title, author, and content of each comment. One method would be to get the entire comment feed from the AtomPub interface and simply pick out the entries that reply to the page in question. This model is not ideal, though, because as more comments are added to the feed, the bandwidth and processing required per request keep growing. The solution is to create a simple query interface that keeps the bandwidth under control and makes processing the results simpler.

There are two basic scenarios to consider. In both, the query argument will be the URL of the target resource for the annotations. In the first scenario, all annotated URLs are paths under the same base URL. This would allow you to create a URL interface using the path directly in the URL. For example:

# the annotated URL
http://ionrock.org/about/

# the annotation feed for the above URL
http://ionrock.org/comments/feed/about/
    
      

In the second scenario you expect URLs that may be very different from some common URL. In this case the hostname, protocol, port number, and so on might all be radically different for each annotation. In this scenario it might be more advantageous to use a query string argument:

http://ionrock.org/comments/find_feed/?href=http://xml3k.org/Bright_Content
    

In this case, using a query string argument might provide a more flexible means of finding annotations without sacrificing readability.
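The two scenarios can be combined in a small helper, sketched here in Python. The function names and the three URL constants are our own choices, modeled on the example URLs above. Unlike the hand-written example, the sketch percent-encodes the target URL, which matters once the target carries its own query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed layout, modeled on the example URLs in this article.
SITE_BASE = "http://ionrock.org/"
PATH_FEED_BASE = "http://ionrock.org/comments/feed/"
FIND_FEED_URL = "http://ionrock.org/comments/find_feed/"


def annotation_feed_url(target):
    """Return the URL of the annotation feed for the target resource."""
    if target.startswith(SITE_BASE):
        # Scenario 1: a local target, so reuse its path in the feed URL.
        return PATH_FEED_BASE + target[len(SITE_BASE):]
    # Scenario 2: an arbitrary target, passed as an encoded href argument.
    return FIND_FEED_URL + "?" + urlencode({"href": target})


def target_from_query(feed_url):
    """Server side: recover the target URL from the href argument, if any."""
    values = parse_qs(urlsplit(feed_url).query).get("href", [])
    return values[0] if values else None
```

The server decodes the href argument back to the original target before consulting its index, so both sides agree on the key even when the target contains characters such as & or ?.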

With our query interface in place, we can start making use of our comment feeds. Listing 5 is an example in XSLT:


Listing 5. Simple XSLT snippet for accessing the comment annotations for a resource
                
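<!-- The a: and xh: prefixes must be bound to the Atom and XHTML
     namespaces on the enclosing xsl:stylesheet element. -->
<xsl:template name="get-comment-feed">
  <!-- URL of the comment feed, as produced by the query interface -->
  <xsl:param name="comment-feed-url" />
  <xsl:variable name="atom-feed" select="document($comment-feed-url)" />

  <!-- Render each entry, if any, with the "comments" mode template -->
  <xsl:if test="$atom-feed/a:feed/a:entry">
    <xsl:apply-templates select="$atom-feed/a:feed/a:entry"
                         mode="comments" />
  </xsl:if>
</xsl:template>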

<xsl:template match="a:entry" mode="comments">
  <div class="comment-entry">
    <h4><xsl:value-of select="a:title" /></h4>
    <xsl:copy-of select="a:content/xh:div" />
  </div>
</xsl:template>
    
    

This kind of template could very easily be added to a client-side XSLT stylesheet in order to push the processing to the browser. Of course you could also use JavaScript to process the feed.


Extending the annotations service

You can easily add other capabilities by using features found in Atom and AtomPub. You can use the AtomPub app:control and app:draft elements to create a moderation queue. You can represent ping-backs or track-backs in the same collection as well. You could also use atom:category elements with your own defined schemes to assign special meaning to certain comments. For example, the scheme could identify an anchor in the page:

<a id="section-3.4" name="section-3.4"/>
<h2>Section 3.4 - Handling Annotations with Atom</h2>
    
    

The above might be referenced in the entry with the following atom:category element:

<category scheme="http://ionrock.org/ns/dyntag/anchor/"
          term="section-3.4" />
    
    

Using this simple pattern, you could apply annotations to portions of Web resources as well. This is the sort of flexibility you gain when you simply build on strong foundations such as Atom and AtomPub.
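To show how a client might consume these extensions, here is a short Python sketch that selects the published annotations attached to one anchor. The function names are hypothetical. The sketch relies on two details of the specifications: atom:category carries its value in the term attribute, and the draft flag is the app:draft element inside app:control, both in the AtomPub namespace.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
APP = "{http://www.w3.org/2007/app}"
ANCHOR_SCHEME = "http://ionrock.org/ns/dyntag/anchor/"


def is_draft(entry):
    """True when the entry is held back with app:draft set to yes."""
    draft = entry.findtext(f"{APP}control/{APP}draft", default="no")
    return draft.strip() == "yes"


def annotations_for_anchor(feed_xml, anchor_id):
    """Return the published entries categorized under the given anchor."""
    feed = ET.fromstring(feed_xml)
    selected = []
    for entry in feed.findall(f"{ATOM}entry"):
        if is_draft(entry):
            continue  # still waiting in the moderation queue
        for category in entry.findall(f"{ATOM}category"):
            if (category.get("scheme") == ANCHOR_SCHEME
                    and category.get("term") == anchor_id):
                selected.append(entry)
                break
    return selected
```

Calling annotations_for_anchor(feed_xml, "section-3.4") would return only the approved annotations that point at that section of the page.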


Wrap up

This article does not contain much new science. The general ideas were explored in the Annotea project of W3C staffers in the early 2000s. The framework of our approach is the established Atom syntax and the Atom protocol. The code we've presented is just one possible flavor, and you'll find there are many languages and libraries you can use to craft your own implementation. The big lesson is that RESTful architecture makes it pretty easy to build on existing work, and that once you get used to thinking that way, you start focusing less on narrow solutions such as weblog comments, and more on expansive ideas such as annotating the Web.


Resources

Learn

Get products and technologies

  • Check out Bright Content, a content management system designed for lightweight applications such as weblogs, built on RESTful principles.
