Learn what all the excitement about Atom is about.
As a syndication format, Atom grew out of experience with the many flavors of RSS. The goal was not to create something entirely new, but to tighten up places where the RSS specification was too vague to be useful (for instance, RSS does not specify whether title elements can contain markup), and to fix a few areas that were problematic (such as identifying duplicate entries in different feeds, as often happens with aggregated content). Most clients and tools for RSS feeds now support, or plan to support, Atom content; taken together with the RSS problems it fixes, that makes Atom the format I recommend. But Atom is basically an incremental improvement over RSS, not a revolutionary one. One evolutionary step Atom takes is to support two basic types of syndication document: Feeds and Entries. Feeds are comparable to RSS: they are collections of Entries. But an Entry can also be a standalone document, containing either a post itself (a news item or a blog post, perhaps) or a reference to an external resource, such as a picture. By providing both, the syndication format gives the publishing protocol a flexible foundation to build on.
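The duplicate-entry problem is where Atom's mandatory atom:id element earns its keep: the id must not change when an entry is syndicated or republished, so an aggregator can match entries across feeds by id alone. Here is a minimal Python 3 sketch of such merging using the standard library's ElementTree. The function name `merge_entries` is hypothetical, and the sketch assumes every atom:updated value is written in the same RFC 3339 form (UTC with a trailing Z), so that comparing the strings compares the instants.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for elements in the Atom namespace
ATOM = '{http://www.w3.org/2005/Atom}'


def merge_entries(*feed_documents):
    """Merge the entries of several Atom feeds, dropping duplicates.

    Entries are matched on atom:id. When an id occurs more than once, the
    copy with the latest atom:updated wins. Assumes all timestamps share one
    RFC 3339 form (UTC, trailing 'Z') so string comparison orders them.
    """
    newest = {}  # atom:id -> (updated, entry element)
    for document in feed_documents:
        for entry in ET.fromstring(document).findall(ATOM + 'entry'):
            entry_id = entry.findtext(ATOM + 'id')
            updated = entry.findtext(ATOM + 'updated', '')
            if entry_id not in newest or updated > newest[entry_id][0]:
                newest[entry_id] = (updated, entry)
    # Dicts keep insertion order, so entries come back in first-seen order
    return [entry for _, entry in newest.values()]
```

Keying on atom:id rather than on title or link is the point: titles get edited and links get rewritten by aggregators, but the id is supposed to stay fixed.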
Listing 1 shows a small example of an Atom feed:
Listing 1. Atom feed
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
  <id>urn:uuid:E14A6C6B-A832-4AE9-9D67-263181407D5E</id>
  <link rel="self" href="/temp/index.atom"/>
  <updated>2006-04-19T18:43:55Z</updated>
  <title type='text'>Cool gizmos</title>
  <subtitle type='text'>The latest in high-tech gizmos.</subtitle>
  <author>
    <name>Dethe Elza</name>
    <email>dethe.elza@livingcode.org</email>
  </author>
  <entry>
    <id>urn:uuid:2C31F522-20A6-44DF-AA63-6524441FF6A3</id>
    <published>2006-05-02T19:00:10Z</published>
    <updated>2006-04-30T20:57:20Z</updated>
    <title type='text'>This new gizmo is hot, hot, hot!</title>
    <content type='text'>I just got my hands on the latest gizmo
      that's sweeping the nation: it's totally rad.</content>
  </entry>
</feed>
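To see how a consumer reads a document like Listing 1, here is a small Python 3 sketch using ElementTree with a namespace prefix map. `summarize_feed` is a hypothetical helper, not part of any Atom library. Every element in Listing 1 lives in the Atom namespace, so an unqualified lookup such as `find('title')` would find nothing; the `atom:` prefix in the paths is mapped to the namespace URI by the `NS` dictionary.

```python
import xml.etree.ElementTree as ET

# Every element in an Atom document lives in this namespace
NS = {'atom': 'http://www.w3.org/2005/Atom'}


def summarize_feed(xml_text):
    """Return the feed title and an (id, title, updated) tuple per entry."""
    root = ET.fromstring(xml_text)
    entries = [(entry.findtext('atom:id', namespaces=NS),
                entry.findtext('atom:title', namespaces=NS),
                entry.findtext('atom:updated', namespaces=NS))
               for entry in root.findall('atom:entry', NS)]
    return root.findtext('atom:title', namespaces=NS), entries
```

Fed the text of Listing 1, it should return the title "Cool gizmos" and a single entry tuple carrying that entry's id, title, and updated timestamp.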
Protocols, more protocols, and APIs
The publishing protocol side of Atom is more ambitious. Several earlier attempts to create common protocols (often miscalled APIs) for managing Web-based content such as blogs lacked needed features and required workarounds or proprietary interfaces. For the most part, these attempts also did not take advantage of good Web application practices, such as REST. LiveJournal was the first, but it put everything through HTTP POST, ignoring GET, PUT, and DELETE, and did not use HTTP authentication. XML-RPC followed the same pattern and, like the LiveJournal protocol, sends everything through a single URI. Until fairly recently, XML-RPC limited strings to ASCII, thereby losing one of the great benefits of XML in the first place. While XML-RPC was not itself a publishing protocol, several were built on top of it, including Manila RPC, the Blogger API, the MetaWeblog API, and the LiveJournal XML-RPC Client/Server Protocol. None of these protocols allowed for internationalization, all sent passwords in plain text, and none were easily extensible. The Atom Publishing Protocol makes full use of XML (internationalization, extensibility through XML Namespaces) and of HTTP (all methods, authentication, URIs to identify resources).
Beyond weblog protocols, the Atom Publishing Protocol is a generally useful tool for managing content on the Web, and it is finding uses in many places, including Bugzilla, the Google Data APIs Protocol, many of the calendaring sites mentioned in the previous XML Matters article (see Resources), and one other important site, not covered there, that I'll talk about below.
I think that the Atom Publishing Protocol is an important milestone on the way to the writeable Web. The Web was always intended to be bidirectional (the PUT and DELETE methods have always been part of HTTP), but something happened on the way to the read/write Web. We've slowly made progress back toward the original vision through wikis, XML-RPC, and the big one, WebDAV. WebDAV, or simply DAV, is the Distributed Authoring and Versioning protocol, and its goal is "completing the original vision of the Web as a writeable, collaborative medium" (from the WebDAV FAQ). So it's fair to compare APP with DAV. DAV provides features beyond APP, including locking, storage of arbitrary metadata, and moving or renaming of stored resources. To be fair, Atom can also carry arbitrary metadata, since it is easily extensible through XML Namespaces. Atom doesn't really support collaboration as such, just publishing (including later editing), so the need for locking isn't as great, and you can approximate a rename with Atom by POSTing a copy of the resource as a new member and then DELETEing the original. Atom does all that without having to extend HTTP or add new methods. WebDAV, unfortunately, is complex, and has historically suffered from poor and buggy implementations from major vendors. Even so, the WebDAV community wasn't satisfied with DAV's extensions to HTTP, so they have layered on (or are working to layer on) further extensions such as Advanced Collections, Versioning and Configuration Management (because Distributed Authoring and Versioning didn't support, y'know, versioning), and Access Control. And that wasn't enough for calendaring, so CalDAV was built on top with even more extensions to HTTP.
Perhaps it isn't surprising that WebDAV has so far failed to take over the Web: it is hard to implement, and complex to set up and manage. DAV does have its share of success stories, particularly in the form of Subversion, the version control system anointed as the successor to CVS. Subversion builds on DAV and its Versioning extensions (also called DeltaV), although it cherry-picks the parts of DAV that are relevant, leaves the rest alone, and adds its own protocols to the mix. So while Subversion is great, it doesn't really justify the baroque complexity that is DAV. For 90% of what DAV is intended for, I think APP is a better fit, and for the other 10% I think other systems are a better fit. APP itself is not a be-all and end-all solution. By itself it does not solve problems of authorization, it does not provide a query mechanism, and it certainly is not intended for anything like real-time collaboration. I think the read/write Web can still grow, and I look forward to the day when the Jabber XMPP protocol is baked into Web browsers for real-time, bidirectional, peer-to-peer communication, but that is a topic for a future article.
James Tauber is working on Demokritos, a Python project relevant to both Atom and Subversion, which provides an Atom Store. The interesting part (at least in this context) is that underneath, it uses Subversion for persistence. The project is still in its early stages (authentication is expected in an upcoming version), but it's fun to watch it develop. Google Base can be considered a commercial version of an Atom Store (although its bulk upload uses the now-obsolete version 0.3 of the Atom Syndication Format), and Amazon's S3 data store is conceptually similar to the Atom Publishing Protocol in its use of HTTP GET/PUT/DELETE. Choice is good, and when the choices are converging on simple, robust standards, that's even better.
The Atom Publishing Protocol works like this: the client first requests an introspection document, which lists the collections provided, their capabilities (such as whether they are writable or editable), and their addresses as URIs. From there, the client can query the collections themselves to find out similar information about the resources they contain, which can be Atom Entries or media such as pictures, audio, or video. The collections are Atom feeds. To add new material, you POST an Atom Entry document to a collection's URI and receive back the URI of the newly created resource, which you can then use to manipulate it further (GET to read, PUT to edit, DELETE to remove). For example, Listing 2 is a simple introspection document:
Listing 2. Sample introspection document
<?xml version="1.0" encoding="utf-8"?>
<service xmlns="http://purl.org/atom/app#">
  <workspace title="Gizmo Page">
    <collection
        title="Cool gizmos"
        href="http://example.org/gizmo/index.atom">
      <member-type>entry</member-type>
    </collection>
    <collection
        title="Photos of Gizmos"
        href="http://example.org/gizmo/image">
      <member-type>media</member-type>
    </collection>
  </workspace>
</service>
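The protocol description and Listing 2 fit together in a short client sketch. Using only the Python 3 standard library, the hypothetical helpers below read the collections out of an introspection document like Listing 2, serialize a minimal standalone Entry document, and build the HTTP request for each operation. It assumes the draft namespace shown in Listing 2 (later drafts may change it) and a server that answers a successful POST with 201 Created plus the new member's URI in the Location header, as the APP drafts describe; authentication is left out entirely.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the draft introspection format used in Listing 2
APP_DRAFT_NS = 'http://purl.org/atom/app#'
ATOM_NS = 'http://www.w3.org/2005/Atom'
ET.register_namespace('', ATOM_NS)  # serialize Atom elements without a prefix


def list_collections(service_xml):
    """Return (title, href, member-type) for every collection in every workspace."""
    ns = {'app': APP_DRAFT_NS}
    root = ET.fromstring(service_xml)
    return [(c.get('title'), c.get('href'),
             c.findtext('app:member-type', namespaces=ns))
            for c in root.findall('app:workspace/app:collection', ns)]


def make_entry(entry_id, title, updated, author, text):
    """Serialize a minimal standalone Atom Entry document as UTF-8 bytes."""
    def q(tag):
        return '{%s}%s' % (ATOM_NS, tag)
    entry = ET.Element(q('entry'))
    ET.SubElement(entry, q('id')).text = entry_id
    ET.SubElement(entry, q('title')).text = title
    ET.SubElement(entry, q('updated')).text = updated
    ET.SubElement(ET.SubElement(entry, q('author')), q('name')).text = author
    ET.SubElement(entry, q('content'), type='text').text = text
    return ET.tostring(entry, encoding='utf-8')


def app_request(method, uri, entry_xml=None):
    """POST goes to a collection URI; GET, PUT, and DELETE go to a member URI."""
    headers = {'Content-Type': 'application/atom+xml'} if entry_xml is not None else {}
    return urllib.request.Request(uri, data=entry_xml, headers=headers, method=method)


def send(request):
    """Send the request; return (status, Location header or None)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.headers.get('Location')
```

For instance, `send(app_request('POST', 'http://example.org/gizmo/index.atom', make_entry(...)))` would ask the "Cool gizmos" collection to create a new member, and the returned Location is the URI you would then use with GET, PUT (sending a revised entry), or DELETE. A media collection such as "Photos of Gizmos" would instead receive the raw image bytes with an image Content-Type.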
One question remains in this bootstrapping process: how do you find the introspection document in the first place? An expired IETF draft called Atom Feed Autodiscovery, which has nonetheless been widely implemented, describes how to embed references to one or more Atom feeds in the metadata of an HTML page. This method works equally well for introspection documents. The technique is quite simple: in the <head> of an HTML document, insert a <link> element whose rel attribute includes the keyword "alternate", whose type attribute has the value "application/atom+xml", and whose href attribute points to an Atom feed. While the Atom working group hasn't finalized exactly how this will work with introspection documents, it will probably look much like the above, with the following changes as spelled out on the Atom wiki: the rel attribute is "introspection", the type attribute must be "application/atomserv+xml", and the href points to an introspection document rather than a feed. In both cases, the <link> element should also carry a human-readable value in its title attribute. For example:
<link rel="introspection" type="application/atomserv+xml"
      href="/introspection.atomsrv" title="All about my feeds"/>
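Finding these <link> elements is easy to automate. The Python 3 sketch below uses the standard html.parser module; the class and function names are hypothetical, and the introspection pair (rel "introspection", type "application/atomserv+xml") is the tentative one from the Atom wiki described above, so treat it as an assumption that may change before the protocol is finalized.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# (rel keyword, media type) -> what the link points at. The introspection
# pair is the tentative Atom wiki proposal and may change.
_WANTED = {
    ('alternate', 'application/atom+xml'): 'feed',
    ('introspection', 'application/atomserv+xml'): 'introspection',
}


class AtomLinkFinder(HTMLParser):
    """Collect (kind, absolute href, title) for matching <link> elements."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and its default
        # handle_startendtag() routes self-closing <link .../> here too.
        if tag != 'link':
            return
        attrs = dict(attrs)
        href = attrs.get('href')
        mime = (attrs.get('type') or '').strip().lower()
        # rel may hold several space-separated keywords
        for rel in (attrs.get('rel') or '').lower().split():
            kind = _WANTED.get((rel, mime))
            if kind and href:
                self.found.append((kind, urljoin(self.base_uri, href),
                                   attrs.get('title')))
                break


def discover(html_text, base_uri):
    """Return the Atom feeds and introspection documents a page links to."""
    finder = AtomLinkFinder(base_uri)
    finder.feed(html_text)
    finder.close()
    return finder.found
```

Resolving each href against the page's own URI matters because, as in the example above, autodiscovery links are often relative.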
What does Atom have to do with Microformats?
In the last XML Matters article, I mentioned that microformats could really come into their own when coupled with Atom (see Resources). The jury is still out on that one. Uche Ogbuji, while a fan of Atom, offers a dissenting opinion on the utility of microformats. There are two microformats in progress (that I know of) that specifically target Atom: hAtom, which is a subset of Atom, and an IETF draft, "XHTML Microformats for the Atom Publishing Protocol," which proposes two microformats for use within APP to describe categories and errors. I haven't yet found a use for either of these.
More interestingly, microformats are designed to be embedded in other documents, and Atom documents are designed to contain HTML or XHTML fragments (in a carefully controlled manner). So there is no reason why you can't have an Atom feed that embeds, say, your schedule, using hCalendar. In fact, just as the previous article was going through final edits, Google released its Calendar product, which lets you subscribe to a calendar's feed in Atom format. Unfortunately, while you can get the feed in a form that includes iCalendar information, Google doesn't provide hCalendar in the feeds. Several people have jumped on this to write things such as GreaseMonkey scripts that find hCalendar in a page and add it to their Google Calendar, but I wanted to go the other way, using my Google Calendar Atom feed to add events to my weblog in hCalendar format. Since a quick search didn't turn up anyone doing this already, I whipped up a quick and dirty script to do it myself. I wrote it in Python, and it requires two third-party libraries, httplib2 and ElementTree; links to both are in Resources. It is intended as an example, and, as is traditional with examples, it includes almost no error checking and isn't particularly flexible. Listing 3 shows the script, including a test function that demonstrates its use:
Listing 3. Example script for Atom-to-weblog
'''
Utility to grab a Google Calendar feed and return hCalendar code
This module has one public function:
    events_for_feed(feed_uri, start, end) -> [hCalendarEvents]
'''
from xml.sax.saxutils import escape

import httplib2

try:
    # ElementTree is part of the standard library from Python 2.5 on
    import xml.etree.ElementTree as ElementTree
except ImportError:
    # Older Pythons: the standalone cElementTree package
    import cElementTree as ElementTree

_ATOM_NS = 'http://www.w3.org/2005/Atom'
_GDATA_NS = 'http://schemas.google.com/g/2005'
_MONTHS = [None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul',
           'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
_EVENT_TEMPLATE = '''<div class="vevent"
 xmlns="http://www.w3.org/1999/xhtml">
<abbr class="dtstart" title="%(start)s">%(start_hr)s</abbr> -
<abbr class="dtend" title="%(end)s">%(end_hr)s</abbr> -
<span class="summary">%(summary)s</span> - at
<span class="location">%(where)s</span>
<div class="description">%(description)s</div>
</div>
'''

def _end_human_readable(ts):
    # '2006-05-02T19:00:00' -> '19:00'
    return ts[11:16]

def _start_human_readable(ts):
    # '2006-05-02T19:00:00' -> 'May 02, 2006 - 19:00'
    month = _MONTHS[int(ts[5:7], 10)]
    return '%s %s, %s - %s' % (month, ts[8:10], ts[:4], ts[11:16])

def _entries_for_feed(feed_uri, start, end):
    # Fetch only the events in the date range; httplib2 caches in .cache
    h = httplib2.Http('.cache')
    resp, content = h.request('%s?start-min=%s&start-max=%s' %
                              (feed_uri, start, end), 'GET')
    feed = ElementTree.fromstring(content)
    # atom:entry elements are direct children of atom:feed
    return feed.findall('{%s}entry' % _ATOM_NS)

def _event_for_entry(entry):
    # Google's gd:when and gd:where elements carry the event details
    when = entry.find('{%s}when' % _GDATA_NS)
    start = when.get('startTime')
    end = when.get('endTime')
    where = entry.find('{%s}where' % _GDATA_NS).get('valueString', '')
    summary = entry.findtext('{%s}title' % _ATOM_NS, '')
    description = entry.findtext('{%s}content' % _ATOM_NS, '')
    # Escape free text so stray & or < characters can't break the XHTML
    return {'start': start, 'start_hr': _start_human_readable(start),
            'end': end, 'end_hr': _end_human_readable(end),
            'where': escape(where), 'summary': escape(summary),
            'description': escape(description)}

def events_for_feed(feed_uri, start, end):
    return [_EVENT_TEMPLATE % _event_for_entry(entry)
            for entry in _entries_for_feed(feed_uri, start, end)]

def test():
    feed_uri = ('http://www.google.com/calendar/feeds/'
                'dethe.elza@gmail.com/public/full')
    start = '2006-04-30T00:00:00'
    end = '2006-05-30T00:00:00'
    for event in events_for_feed(feed_uri, start, end):
        print(event)

if __name__ == '__main__':
    test()
Give that script your Calendar feed and a start date and end date, and it will return a list of hCalendar-formatted strings, ready to embed in your weblog. That's the beauty of the read/write Web. If one party doesn't support your format, but the format they use is open and documented, as Google's is, you can always create what you need yourself. While my thesis that microformats and Atom were made for each other is far from proven, I think this shows that they can work together well, in a variety of ways (and that much work remains to be done, as always).
Work is still progressing on the Atom Publishing Protocol and on other halo-effect specifications such as Google's Calendar extensions. Sites are adopting Atom rapidly, and both applications and programming tools are adapting to it as well. With its open format, extensibility, and clear definition, Atom could be as potent a force for the Web as the relational database was for the enterprise. HTTP GET and View Source remain as potent a combination now as they were in the heady early days of the Web.
In this whirlwind introduction, I hope you learned the basics of why the Atom Syndication Format is important, and how the Atom Publishing Protocol can make it easy to use. Please explore the links in Resources to find out more about how to use these new technologies.
Learn
- "XML Matters: Pipestreaming microformats," (developerWorks, April 2006): Combine pipes and streams to move XML from one state to another.
- "An overview of the Atom 1.0 Syndication Format" (August 2005): For more details about Atom, see this IBM developerWorks article by James Snell.
- Atom Syndication Format: Review the official IETF specification for Atom documents, RFC 4287.
- Atom Publishing Protocol specification: Read the details of this new standard for content publishing and management.
- Atom Feed Autodiscovery: In this expired IETF draft, learn to embed references to Atom feeds and introspection documents.
- Extensible Messaging and Presence Protocol (XMPP): Explore the IETF's formalization of the core XML routing protocols at the heart of the Jabber instant messaging system.
- RSS and Atom compared: Get a summary of the benefits of the Atom Syndication Format over RSS 2.0.
- Atom Publishing Protocol Slides: Revisit Joe Gregorio's presentation at XML 2005 on the problems solved by Atom (both syndication format and publishing protocol).
- "Bugzilla Query RSS should HTML-escape summary in <title>:" Read the Bugzilla bug discussion that revealed limitations in RSS that led the Bugzilla team to move to Atom.
- "Dreaming of an Atom Store:" Peruse Joe Gregorio's vision of using the Atom Publishing Protocol together with Amazon's Open Search.
- Open Search: Check out this part of Amazon's A9 search tools, which allows the aggregation of search results. You can get search results in HTML, RSS, or Atom.
- Google Data APIs: See how Google's Atom Store extends the Atom Publishing Protocol with authentication, optimistic concurrency, and querying capabilities.
- Google Calendar Data API: Explore Google's URL formatting for queries and the documentation of its Java library for accessing calendar data.
- Atom Enabled: Visit a site that introduces Atom and provides links to libraries, clients, and services that use Atom.
- WebDAV Resources: Begin with this good starting point for all things WebDAV.
- "The Atom Project Wiki:" In this wiki, hash out issues for the syndication format and publishing protocol formerly known as Pie.
- "Microformats in Context:" Explore the good and the bad of microformats with Uche Ogbuji.
- "XHTML Microformats for the Atom Publishing Protocol:" Review an IETF draft that proposes hCat for Atom categories and hError for Atom errors.
- hAtom: Get more on a microformat that is a subset of Atom itself.
- Google Base: Compare it with the Atom Store that Joe Gregorio describes in his article above, and note how close the two are.
- "Amazon S3: Simple Storage Service:" While S3 is not an Atom Store, explore its conceptual similarities, with the added twist of supporting BitTorrent as a download option. Combined with Amazon's A9 search, which can return searches as Atom feeds, this can get pretty interesting.
Get products and technologies
- Universal Feed Parser: With this Python module, parse all known forms of RSS and Atom, with a focus on best-effort data extraction over validity enforcement; that is, it will parse many feeds that are not strictly correct.
- Joe Gregorio's Python library, Httplib2 for client-side HTTP: Find support for HTTP 1.1, HTTPS, three forms of HTTP Authentication, caching, compression, all HTTP methods, and more.
- Subversion: Try a version control system that provides an incremental improvement over CVS using WebDAV.
- Demokritos: Try James Tauber's Atom Store; it is written in Python and uses Subversion for persistence.
- Feed Validator: Check an Atom feed (yours or someone else's) for errors.
- Atom Publishing Protocol Test Suite: Test a site supporting APP for conformance to the protocol. (Includes HTTP authentication.)
- ElementTree: Explore Fredrik Lundh's pythonic XML library.
Discuss
- Atom and RSS forum: Find tips, tricks, and answers about Atom, RSS, or other syndication topics in this forum.

Dethe Elza's favorite job title has been Chief Mad Scientist. Dethe can be reached at delza@livingcode.org. He keeps a blog mainly about Python and Mac OS X at http://livingcode.org/ and writes programs for his kids. Suggestions and recommendations on this column are welcome.

David Mertz is a great believer in open standards, and is only modestly intimidated by verbosity. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.





