
XML Matters: Up and Atom

Is Atom a format or a protocol? It's both! Use it for syndication and publishing

Dethe Elza (delza@livingcode.org), Technical Architect, Justsystems
Dethe Elza's favorite job title has been Chief Mad Scientist. Dethe can be reached at delza@livingcode.org. He keeps a blog mainly about Python and Mac OS X at http://livingcode.org/ and writes programs for his kids. Suggestions and recommendations on this column are welcome.
David Mertz, Ph.D (mertz@gnosis.cx), Author, Gnosis Software, Inc.
David Mertz is a great believer in open standards, and is only modestly intimidated by verbosity. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.

Summary:  Atom is really two different things, both related to syndication (blogs, newsfeeds, and other information that is updated periodically). The Atom Syndication Format is an IETF standard for publishing entries (single topics or items) and feeds (collections of topics or items). The Atom Publishing Protocol (sometimes called the Atom API or abbreviated APP) is a means for finding, listing, adding, editing, and removing content from an Atom repository. While the Syndication Format has gone through the IETF process to become a standard, the working group is still at work on the Publishing Protocol, although much of it seems to have stabilized at this point.


Date:  23 May 2006
Level:  Intermediate

Start learning what the excitement about Atom is all about.

The Atom syndicate

As a syndication format, Atom grew out of experience with the many flavors of RSS. The goal was not to create something entirely new, but to tighten up places where the RSS specifications were too vague to be useful (for instance, RSS does not specify whether title elements can contain markup) and to fix a few problem areas (such as identifying duplicate entries in different feeds, as often happens with aggregated content). Most clients and tools for the earlier RSS feeds now support or plan to support Atom content, which, taken with the RSS problems it fixes, makes it the format I recommend. But Atom is basically an incremental improvement over RSS, not a revolutionary one. One evolutionary step Atom takes is to support two basic types of syndication document: Feeds and Entries. Feeds are comparable to RSS feeds: they are collections of Entries. But an Entry can also be a standalone document, containing either a post itself (a news item or a blog post, perhaps) or a reference to an external resource, such as a picture. By providing both, the syndication format gives the publishing protocol a flexible foundation to build on.

Listing 1 shows a small example of an Atom feed:


Listing 1. Atom feed
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
    <id>urn:uuid:E14A6C6B-A832-4AE9-9D67-263181407D5E</id>
    <link rel="self" href="/temp/index.atom"/>
    <updated>2006-04-19T18:43:55Z</updated>
    <title type='text'>Cool gizmos</title>
    <subtitle type='text'>The latest in high-tech gizmos.</subtitle>
    <author>
        <name>Dethe Elza</name>
        <email>dethe.elza@livingcode.org</email>
    </author>
    <entry>
        <id>urn:uuid:2C31F522-20A6-44DF-AA63-6524441FF6A3</id>
        <published>2006-05-02T19:00:10Z</published>
        <updated>2006-04-30T20:57:20Z</updated>
        <title type='text'>This new gizmo is hot, hot hot!</title>
        <content type='text'>I just got my hands on the latest gizmo
            that's sweeping the nation: it's totally rad.</content>
    </entry>
</feed>
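
To make the structure concrete, here is a minimal sketch of reading a feed like Listing 1 with Python's standard xml.etree.ElementTree module. The function name summarize_feed and the shape of the returned dictionary are illustrative assumptions, not part of Atom, and the sample is a trimmed copy of Listing 1. Entries are keyed by their id element, which is what lets an aggregator recognize the same entry when it turns up in more than one feed. Comparing the updated values as plain strings is a shortcut that assumes every timestamp uses the same UTC "Z" form.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'

# A trimmed copy of Listing 1, used only for illustration.
SAMPLE_FEED = b"""<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
    <id>urn:uuid:E14A6C6B-A832-4AE9-9D67-263181407D5E</id>
    <updated>2006-04-19T18:43:55Z</updated>
    <title type='text'>Cool gizmos</title>
    <entry>
        <id>urn:uuid:2C31F522-20A6-44DF-AA63-6524441FF6A3</id>
        <updated>2006-04-30T20:57:20Z</updated>
        <title type='text'>This new gizmo is hot, hot hot!</title>
    </entry>
</feed>
"""


def summarize_feed(xml_bytes):
    """Return the feed title and a dict of entries keyed by atom:id."""
    root = ET.fromstring(xml_bytes)
    entries = {}
    for entry in root.findall('{%s}entry' % ATOM_NS):
        entry_id = entry.findtext('{%s}id' % ATOM_NS)
        updated = entry.findtext('{%s}updated' % ATOM_NS)
        title = entry.findtext('{%s}title' % ATOM_NS)
        # The id is the entry's stable identity. If the same id shows up
        # twice, keep the copy with the later timestamp (string comparison
        # is an assumption that only holds for uniform UTC timestamps).
        previous = entries.get(entry_id)
        if previous is None or updated > previous['updated']:
            entries[entry_id] = {'title': title, 'updated': updated}
    return root.findtext('{%s}title' % ATOM_NS), entries


if __name__ == '__main__':
    feed_title, found = summarize_feed(SAMPLE_FEED)
    print(feed_title)
    for entry_id, info in found.items():
        print(entry_id, info['updated'], info['title'])
```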


Protocols, more protocols, and APIs

The publishing protocol aspect of Atom is more ambitious. Several earlier attempts to create common protocols (often miscalled APIs) for managing Web-based content (blogs, for instance) lacked needed features and required workarounds or proprietary interfaces. For the most part, these attempts did not take advantage of good Web application practices, such as REST. LiveJournal was the first, but it put everything through HTTP POST, ignoring GET, PUT, and DELETE, and did not use HTTP authentication. XML-RPC followed the same pattern and, like the LiveJournal protocol, sends everything through one URI. Until fairly recently, XML-RPC limited strings to ASCII, thereby losing one of the great benefits of XML in the first place. While XML-RPC was not itself a publishing protocol, several were built on top of it, including Manila RPC, Blogger API, MetaWeblog API, and the LiveJournal XML-RPC Client/Server Protocol. None of these protocols allowed for internationalization, all sent passwords in plain text, and none was easily extensible. The Atom Publishing Protocol makes full use of XML (internationalization, extensibility through XML Namespaces) and of HTTP (all methods, authentication, URIs to identify resources).

Above and beyond weblog protocols, the Atom Publishing Protocol is a generally useful tool for managing content on the Web. It is finding uses in many places, including Bugzilla, the Google Data APIs Protocol, many of the calendaring sites mentioned in the previous XML Matters article (see Resources), and one other important site, not discussed there, which I'll talk about below.

I think that the Atom Publishing Protocol is an important milestone on the way to the writable Web. The Web was always intended to be bidirectional, and the PUT and DELETE methods have always been part of HTTP, but something happened on the way to the read/write Web. We've slowly made progress back to the original vision of the Web, through wikis, XML-RPC, and the big one, WebDAV. WebDAV, or simply DAV, is the Distributed Authoring and Versioning protocol, and its goal is "completing the original vision of the Web as a writeable, collaborative medium" (from the WebDAV FAQ). So it's fair to compare APP with DAV. DAV provides features above and beyond APP, including locking, storage of arbitrary metadata, and moving or renaming of stored resources. To be fair, Atom can also support arbitrary metadata, since it is easily extensible through XML Namespaces. Atom doesn't really support collaboration as such, just publishing (including later editing), so the need for locking isn't as great, and you can rename a resource using Atom by deleting it (DELETE) and then creating it again under the new name (POST). Atom does all that without having to extend HTTP or add new methods. WebDAV is unfortunately complex, and has historically suffered from poor and buggy implementations from major vendors. Even so, the WebDAV community wasn't satisfied with DAV's extensions to HTTP, so they have layered on (or are working on) further extensions such as Advanced Collections, Versioning and Configuration Management (because Distributed Authoring and Versioning didn't support, y'know, versioning), and Access Control. And that wasn't enough for calendaring, so CalDAV was built on top with even more extensions to HTTP.

Perhaps it isn't surprising that WebDAV has so far failed to take over the Web. It is hard to implement and complex to set up and manage. DAV has its share of success stories, particularly in the form of Subversion, the version control system anointed as the successor to CVS. Subversion builds on top of DAV and the Versioning extensions (also called DeltaV), although it cherry-picks the parts of DAV that are relevant and leaves the rest alone, adding its own protocols to the mix. So while Subversion is great, it doesn't really justify the baroque complexity that is DAV. For 90% of what DAV is intended for, I think APP is a better fit, and for the other 10% I think other systems are a better fit. APP itself is not a be-all and end-all solution. It does not, by itself, solve problems of authorization, it does not provide a query mechanism, and it certainly is not intended for anything like real-time collaboration. I think the read/write Web can still grow, and I look forward to the day when the Jabber XMPP protocol is baked into Web browsers for real-time, bidirectional, peer-to-peer communication, but that is a topic for a future article.

James Tauber is working on a Python project relevant to both Atom and Subversion, called Demokritos, which provides an Atom Store. The interesting part (at least in this context) is that underneath it uses Subversion to provide persistence. The project is still in its early stages (authentication is expected to be added in an upcoming version), but it's fun to watch it develop. Google Base can be considered a commercial version of an Atom Store (although its bulk upload uses now-obsolete version 0.3 of the Atom Syndication Format), and Amazon's S3 data store is conceptually similar to the Atom Publishing Protocol in its use of HTTP GET/PUT/DELETE. Choice is good, and when the choices are converging on simple, robust standards, that's even better.

The Atom Publishing Protocol works like this: the client requests an introspection document, which outlines the collections provided, their capabilities (such as whether they are writable or editable), and their addresses as URIs. From this, the client can query the collections themselves to find similar information about the resources they contain, which can be Atom Entries or media such as pictures, audio, or video. The collections are Atom feeds. To add new material, you POST an Atom Entry document to the collection and receive back a URI for the new resource, which you can use to manipulate it further (PUT to edit, GET to read, DELETE to remove). For example, Listing 2 is a simple introspection document:


Listing 2. Sample introspection document
<?xml version="1.0" encoding="utf-8"?>
<service xmlns="http://purl.org/atom/app#">
    <workspace title="Gizmo Page" >
        <collection
            title="Cool gizmos"
            href="http://example.org/gizmo/index.atom" >
            <member-type>entry</member-type>
        </collection>
        <collection
            title="Photos of Gizmos"
            href="http://example.org/gizmo/image" >
            <member-type>media</member-type>
        </collection>
    </workspace>
</service>
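
As a rough illustration of how a client might use such a document, the sketch below reads Listing 2 with the standard library, lists its collections, and then builds (without sending) the request that would add an entry to one of them. In the protocol, new members are created by POSTing to the collection URI. The function names list_collections and build_create_request are hypothetical, the namespace is the draft-era one copied from Listing 2, and the entry is deliberately minimal; a real server may insist on more elements (an id, an updated timestamp, an author).

```python
import urllib.request
import xml.etree.ElementTree as ET

# Draft-era namespace copied from Listing 2; later drafts may differ.
APP_NS = 'http://purl.org/atom/app#'
ATOM_NS = 'http://www.w3.org/2005/Atom'

# Listing 2, lightly reflowed.
SAMPLE_SERVICE = b"""<?xml version="1.0" encoding="utf-8"?>
<service xmlns="http://purl.org/atom/app#">
    <workspace title="Gizmo Page">
        <collection title="Cool gizmos"
            href="http://example.org/gizmo/index.atom">
            <member-type>entry</member-type>
        </collection>
        <collection title="Photos of Gizmos"
            href="http://example.org/gizmo/image">
            <member-type>media</member-type>
        </collection>
    </workspace>
</service>
"""


def list_collections(xml_bytes):
    """Return (title, href, member_type) for each collection found."""
    root = ET.fromstring(xml_bytes)
    found = []
    for coll in root.iter('{%s}collection' % APP_NS):
        member_type = coll.findtext('{%s}member-type' % APP_NS)
        found.append((coll.get('title'), coll.get('href'), member_type))
    return found


def build_create_request(collection_href, title, text):
    """Build, but do not send, a POST that would add an entry."""
    entry = ET.Element('{%s}entry' % ATOM_NS)
    ET.SubElement(entry, '{%s}title' % ATOM_NS).text = title
    ET.SubElement(entry, '{%s}content' % ATOM_NS, type='text').text = text
    body = ET.tostring(entry, encoding='utf-8')
    return urllib.request.Request(
        collection_href, data=body, method='POST',
        headers={'Content-Type': 'application/atom+xml'})


if __name__ == '__main__':
    for title, href, member_type in list_collections(SAMPLE_SERVICE):
        print(member_type, href, title)
    request = build_create_request(
        'http://example.org/gizmo/index.atom', 'New gizmo', 'It is rad.')
    print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually send it. The server's reply, not shown here, is what carries the URI of the newly created resource.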

One question remains in this bootstrapping process: How do you find the introspection document in the first place? An expired IETF draft called Atom Feed Autodiscovery, which has nonetheless been widely implemented, describes how to embed references to one or more Atom feeds in the metadata of an HTML page. This method works equally well for introspection documents. The technique is quite simple: in the <head> of an HTML document, insert a <link> element whose rel attribute includes the keyword "alternate", whose type attribute has the value "application/atom+xml", and whose href attribute points to an Atom feed. While the Atom working group hasn't finalized exactly how this will work with introspection documents, it will probably be something like the above, with the following changes as spelled out on the Atom wiki: the rel attribute is "introspection", the type attribute must be "application/atomserv+xml", and the href points to an introspection document rather than a feed. In both cases, the <link> element should also contain a human-readable value in the title attribute. For example:

<link rel="introspection" type="application/atomserv+xml"
    href="/introspection.atomsrv" title="All about my feeds"/>
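
On the client side, discovery can be sketched with Python's standard html.parser module. This is only an illustration under a few assumptions: the class name FeedLinkFinder and the helper discover are made up for this example, the "introspection" rel value and "application/atomserv+xml" type follow the unfinalized proposal described above, and the parser does not check that the <link> elements actually sit inside <head>.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect <link> elements that advertise Atom feeds or
    introspection documents."""

    # (rel keyword, type attribute) -> what the link points to
    WANTED = {
        ('alternate', 'application/atom+xml'): 'feed',
        ('introspection', 'application/atomserv+xml'): 'introspection',
    }

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attr = dict(attrs)
        # rel may hold several space-separated keywords
        rels = (attr.get('rel') or '').lower().split()
        mime = (attr.get('type') or '').lower()
        for rel in rels:
            kind = self.WANTED.get((rel, mime))
            if kind:
                self.links.append(
                    (kind, attr.get('href'), attr.get('title')))


def discover(html_text):
    """Return (kind, href, title) tuples for the links found."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.links


# A made-up page used only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<link rel="alternate" type="application/atom+xml"
    href="/temp/index.atom" title="Cool gizmos"/>
<link rel="introspection" type="application/atomserv+xml"
    href="/introspection.atomsrv" title="All about my feeds"/>
<link rel="stylesheet" type="text/css" href="/style.css"/>
</head><body></body></html>"""

if __name__ == '__main__':
    for kind, href, title in discover(SAMPLE_PAGE):
        print(kind, href, title)
```

The href values are returned as written, so a real client would still need to resolve relative ones against the page's own URI.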
    


What does Atom have to do with Microformats?

In the last XML Matters article, I mentioned that microformats could really come into their own when coupled with Atom (see Resources). The jury is still out on that one. Uche Ogbuji, while a fan of Atom, offers a dissenting opinion on the utility of microformats. There are two microformats in progress that specifically target Atom (that I know of): hAtom, which is a subset of Atom, and an IETF draft for "XHTML Microformats for the Atom Publishing Protocol," which proposes two microformats for use within APP to describe categories and errors. I haven't found a use for either of these.

More interestingly, microformats are designed to be embedded into other documents, and Atom documents are designed to contain HTML or XHTML fragments (in a carefully controlled manner). So there is no reason why you can't have an Atom feed which embeds, say, your schedule, using hCalendar. And in fact, just as the previous article was going through final edits, Google released their Calendar product, which allows you to subscribe to a calendar's feed in Atom format. Unfortunately, while you can get the feed in a way which includes iCalendar information, Google doesn't provide hCalendar in the feeds. Several people have jumped on this to write things such as GreaseMonkey scripts which find hCalendar in a page and add it to their Google Calendar, but I wanted to go the other way, using my Google Calendar Atom feed to add events to my weblog in hCalendar format. Since a quick search didn't turn up anyone doing this already, I whipped up a quick and dirty script to do it myself. I wrote it in Python, and it requires two libraries, httplib2 and ElementTree (links to both are in Resources). This is intended as an example, and as is traditional in examples, it includes no error checking and isn't particularly flexible. Listing 3 shows the script, including a test function that demonstrates its use:


Listing 3. Example script for Atom-to-weblog

'''
    Utility to grab a Google Calendar feed and return hCalendar code

    This module has one public function:

    events_for_feed(feed_uri, start, end) -> [hCalendarEvents]
'''
import httplib2
# ElementTree has shipped with Python since 2.5 as xml.etree.ElementTree
from xml.etree import ElementTree

_ATOM_NS = 'http://www.w3.org/2005/Atom'
_GDATA_NS = 'http://schemas.google.com/g/2005'

_MONTHS = [None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul',
    'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

_EVENT_TEMPLATE = '''<div class="vevent"
        xmlns="http://www.w3.org/1999/xhtml">
    <abbr class="dtstart" title="%(start)s">%(start_hr)s</abbr> -
    <abbr class="dtend" title="%(end)s">%(end_hr)s</abbr> -
    <span class="summary">%(summary)s</span> - at
    <span class="location">%(where)s</span>
    <div class="description">%(description)s</div>
</div>
'''

def _end_human_readable(ts):
    # 'YYYY-MM-DDTHH:MM:SS...' -> 'HH:MM'
    return ts[11:16]

def _start_human_readable(ts):
    # 'YYYY-MM-DDTHH:MM:SS...' -> 'Mon DD, YYYY - HH:MM'
    month = _MONTHS[int(ts[5:7], 10)]
    return '%s %s, %s - %s' % (month, ts[8:10], ts[:4], ts[11:16])

def _entries_for_feed(feed_uri, start, end):
    # Fetch the feed, caching responses in the local '.cache' directory
    h = httplib2.Http('.cache')
    resp, content = h.request('%s?start-min=%s&start-max=%s' %
        (feed_uri, start, end), 'GET')
    # Parse the response body directly and collect every Atom entry
    root = ElementTree.fromstring(content)
    return root.findall('.//{%s}entry' % _ATOM_NS)

def _event_for_entry(entry):
    when = entry.find('{%s}when' % _GDATA_NS)
    start = when.get('startTime')
    start_hr = _start_human_readable(start)
    end = when.get('endTime')
    end_hr = _end_human_readable(end)
    where = entry.find('{%s}where' % _GDATA_NS).get('valueString')
    summary = entry.findtext('{%s}title' % _ATOM_NS)
    description = entry.findtext('{%s}content' % _ATOM_NS)
    return locals().copy()

def events_for_feed(feed_uri, start, end):
    return [_EVENT_TEMPLATE % _event_for_entry(entry) for entry in
        _entries_for_feed(feed_uri, start, end)]


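def test():
    # Adjacent string literals are joined, so no stray whitespace
    # from the line wrap ends up inside the URI
    feed_uri = ('http://www.google.com/calendar/feeds/'
        'dethe.elza@gmail.com/public/full')
    start = '2006-04-30T00:00:00'
    end = '2006-05-30T00:00:00'
    for event in events_for_feed(feed_uri, start, end):
        print(event)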

if __name__ == '__main__':
    test()
    

Give that script your Calendar feed and a start date and end date, and it will return a list of hCalendar-formatted strings, ready to embed in your weblog. That's the beauty of the read/write Web. If one party doesn't support your format, but the format they use is open and documented, as Google's is, you can always create what you need yourself. While my thesis that microformats and Atom were made for each other is far from proven, I think this shows that they can work together well, in a variety of ways (and that much work remains to be done, as always).
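
Because Listing 3 interpolates feed text straight into markup and skips error checking, a title containing "&" or "<" would produce broken XHTML. The sketch below shows one way to guard against that with the standard xml.sax.saxutils helpers, and it runs offline against a hand-written entry. The sample entry is an assumption: its element and attribute names mirror what Listing 3 reads, but the values are made up, and safe_event is a hypothetical name rather than part of the original script.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

ATOM_NS = 'http://www.w3.org/2005/Atom'
GDATA_NS = 'http://schemas.google.com/g/2005'

# Hand-written stand-in for one calendar entry. The element and
# attribute names mirror what Listing 3 reads; the values are made up.
SAMPLE_ENTRY = b"""<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:gd="http://schemas.google.com/g/2005">
    <title>Gizmo demo &amp; Q&amp;A</title>
    <content>Bring your own gizmo.</content>
    <gd:when startTime="2006-05-12T18:00:00.000-07:00"
             endTime="2006-05-12T19:30:00.000-07:00"/>
    <gd:where valueString="Room &lt;B&gt;"/>
</entry>
"""


def safe_event(entry):
    """Render one entry as an hCalendar vevent, escaping all text."""
    when = entry.find('{%s}when' % GDATA_NS)
    where = entry.find('{%s}where' % GDATA_NS)
    start, end = when.get('startTime'), when.get('endTime')
    return (
        '<div class="vevent">'
        '<abbr class="dtstart" title=%s>%s</abbr> - '
        '<abbr class="dtend" title=%s>%s</abbr> - '
        '<span class="summary">%s</span> - at '
        '<span class="location">%s</span>'
        '</div>'
    ) % (
        # quoteattr() adds the surrounding quotes itself
        quoteattr(start), escape(start[:16].replace('T', ' ')),
        quoteattr(end), escape(end[11:16]),
        escape(entry.findtext('{%s}title' % ATOM_NS) or ''),
        escape(where.get('valueString') or ''),
    )


if __name__ == '__main__':
    print(safe_event(ET.fromstring(SAMPLE_ENTRY)))
```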


The friendly Atom

Work is still progressing on the Atom Publishing Protocol and on halo-effect specifications such as Google's Calendar extensions. Sites are adopting Atom rapidly, and both applications and programming tools are adapting to Atom as well. With its open format, extensibility, and clear definition, Atom could be as potent a force for the Web as the relational database was for the enterprise. HTTP GET and View Source are still as potent a combination now as they were in the heady early days of the Web.

In this whirlwind introduction, I hope you learned the basics of why the Atom Syndication Format is important, and how the Atom Publishing Protocol can make it easy to use. Please explore the links in Resources to find out more about how to use these new technologies.


Resources

Learn

Get products and technologies

  • Universal Feed Parser: With this Python program, parse all known forms of RSS and Atom, with a focus on best-effort data extraction over validity enforcement, that is, it will parse many feeds which are not strictly correct.

  • Joe Gregorio's Python library, httplib2, for client-side HTTP: Find support for HTTP 1.1, HTTPS, three forms of HTTP authentication, caching, compression, all HTTP methods, and more.

  • Subversion: Try a version control utility that provides an incremental improvement over CVS using WebDAV.

  • Demokritos: Try James Tauber's Atom Store; it is written in Python and uses Subversion for persistence.

  • Feed Validator: Check an Atom feed (yours or someone else's) for errors.

  • Atom Publishing Protocol Test Suite: Test a site supporting APP for conformance to the Protocol. (Includes HTTP authentication.)

  • ElementTree: Explore Fredrik Lundh's Pythonic XML library.

Discuss

  • Atom and RSS forum: Find tips, tricks, and answers about Atom, RSS, or other syndication topics in this forum.
