A simple fact about HTTP is both its greatest strength and its central weakness: HTTP is a stateless protocol. Each GET request to an HTTP server resource is meant to be idempotent, which is to say the same request should return the same result at each invocation. Idempotency is central to REST: the same request, perhaps encoding client information, should return the same data whenever it is made.
In contrast to the REST philosophy, Ajax applications are often very stateful. Some field or region in a Web application reflects the current state of some server data, with client JavaScript polling used to query that current state periodically (there are ways to make this more push-oriented, but that is not essential to this tip). The Web application, however, more or less expects the server to keep track of what it needs to know on the next polling event: what data the client has seen or not seen, what interactions have already occurred, and so forth.
One common means of making Ajax applications technically RESTful is to give every query for the latest data a globally unique URI. For example, a query might incorporate a UUID, either in a URL-encoded parameter or in a hidden form variable; an XMLHttpRequest object might then GET the following resource:
http://myserver.example.com?uuid=4b879324-8ec0-4120-bba6-890eb0aa3fc0
On the very next polling event, even if it is merely one second later, a different URI would be opened.
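On the server side or in a test client, building such a per-poll URI takes only Python's standard `uuid` module. The following is a minimal sketch, not code from the original application; the helper name `make_poll_url()` is hypothetical, and the `uuid` parameter name simply mirrors the example URI above.

```python
from urllib.parse import urlencode
from uuid import uuid4

BASE_URL = "http://myserver.example.com"  # Example server from the text

def make_poll_url(base=BASE_URL):
    """Return a fresh, globally unique polling URI (hypothetical helper)."""
    # uuid4() is built from 122 random bits, so two polls practically never collide
    return base + "?" + urlencode({"uuid": str(uuid4())})
```

Calling `make_poll_url()` twice in a row yields two different URIs, which is exactly the property the text relies on.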
Understanding the meaning of "same data" is more subtle than it might appear. Only in caricature must the same URI always return identical data. After all, even a static Web page might change when the content is corrected (say, the typos are fixed in a published article). The idea behind idempotency is merely that the change involved should not be a direct effect of the GET request itself. So having a constantly changing resource like this is a perfectly reasonable approach:
http://myserver.example.com/latest_data/
The point is merely that what makes up "latest_data" depends on something other than whether, when, and by whom this data has been retrieved. A server can be perfectly RESTful and still reflect "the state of the world."
My colleague Miki Tebeka and I faced exactly this situation while developing a Web application that frequently polled a server for the latest data using a JavaScript XMLHttpRequest object. The Python server example I present here is inspired by a nice in-house module Miki created, though simplified and improved.
There were two problems we wanted to solve here. One was avoiding sending any substantial message at all when nothing had changed since the prior request. The second problem was avoiding excessive use of database or computational resources in generating duplicate data.
The "Not Modified" problem is, in fact, addressed right in the HTTP protocol, though this correct solution is underused: the server may, and should, simply return an HTTP 304 status code. It is then the responsibility of our Ajax code to check for a 304 status and, if it finds one, simply not to change client application state based on the (absent) data sent from the poll.
The server resource issue can be addressed by caching prior data and then aggregating only the very newest additions. This solution is mostly relevant when the "latest data" consists of relatively discrete items, and much less so when the entire data set is interdependent. We can track the cached state of each client session by using a client cookie. Listing 1 puts it together:
Listing 1. Session-enabled server code: server.cgi
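#!/usr/bin/env python3
from datetime import datetime
from session import ClientSession     # Listing 2, assumed saved as session.py
# prune_data(), get_new_stuff(), and encode_data() are application-specific
# helpers that this example leaves unspecified.
session = ClientSession()
old_stuff = session.get("data", [])   # Retrieve cached data
last_query = session.get("last", None)
prune_data(old_stuff, last_query)     # Age out really-old data
new_stuff = get_new_stuff()           # Look for brand-new data
if not new_stuff:
    print("Status: 304 Not Modified")  # "Not Modified" status
    print()                            # Blank line ends the CGI headers
else:
    print(session.cookie)              # New or existing cookie (Set-Cookie header)
    print("Content-Type: text/plain")
    print()                            # Blank line ends the CGI headers
    all_stuff = old_stuff + new_stuff
    session["data"] = all_stuff
    session["last"] = datetime.now().isoformat()
    print(encode_data(all_stuff))      # XML, or JSON, or...
session.save()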
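Listing 1 leaves prune_data() and encode_data() to the application. One possible shape for them is sketched below, under assumptions that are not in the original: each cached item is a dict whose "time" key holds an ISO 8601 string (the same format Listing 1 stores under session["last"]), "really old" means older than a hypothetical MAX_AGE before the last query, and JSON is the wire format. Note that prune_data() must modify the list in place, because Listing 1 ignores its return value.

```python
import json
from datetime import datetime, timedelta

# Assumption: how long cached items stay relevant; tune per application
MAX_AGE = timedelta(hours=1)

def prune_data(items, last_query, max_age=MAX_AGE):
    """Age out really-old items in place (hypothetical helper)."""
    if last_query is None:        # First poll: nothing cached to prune
        return
    cutoff = datetime.fromisoformat(last_query) - max_age
    # Slice assignment mutates the caller's list, which is what Listing 1 needs
    items[:] = [item for item in items
                if datetime.fromisoformat(item["time"]) >= cutoff]

def encode_data(items):
    """Serialize the aggregated items as JSON (one of the formats the text mentions)."""
    return json.dumps(items)
```

Because the list object returned by session.get("data", []) is the one stored in the session, pruning it in place also updates what session.save() writes out.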
The ClientSession class takes a little cleverness, but not all that much. Basically, it just needs to keep track of each client who might have a cookie corresponding to cached old_stuff:
Listing 2. Maintaining the session
from os import environ
from http.cookies import SimpleCookie
from pickle import load, dump, UnpicklingError
from uuid import uuid4
COOKIE_NAME = "my.server.process"
class ClientSession(dict):
    def __init__(self):
        super().__init__()
        self.cookie = SimpleCookie()
        self.cookie.load(environ.get("HTTP_COOKIE", ""))
        morsel = self.cookie.get(COOKIE_NAME)
        # Issue a new random id if none was sent; also reject ids that are
        # not purely alphanumeric, since the id becomes part of a filename
        if morsel is None or not morsel.value.isalnum():
            self.cookie[COOKIE_NAME] = uuid4().hex
        self.id = self.cookie[COOKIE_NAME].value
        try:
            with open("session." + self.id, "rb") as fh:
                self.update(load(fh))
        except (OSError, EOFError, UnpicklingError):
            pass  # If nothing cached, just do not update
    def save(self):
        with open("session." + self.id, "wb") as fh:
            dump(self.copy(), fh, protocol=-1)  # Save the dictionary (highest protocol)
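One weakness of writing the session file directly, as Listing 2 does, is that an overlapping request from the same client could read the file while it is still being written, fail to unpickle it, and fall back to an empty cache. A common remedy, sketched here as an assumption rather than as part of the original design, is to write to a temporary file in the same directory and then rename it into place with os.replace(), which the Python documentation describes as atomic on POSIX systems. The helper names save_session() and load_session() are hypothetical.

```python
import os
import pickle
import tempfile

def save_session(path, data):
    """Write a session dict so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory keeps the rename on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".session-tmp-")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(dict(data), fh, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)   # Atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)          # Do not leave stray temp files behind
        raise

def load_session(path):
    """Return the cached session dict, or an empty one if nothing usable is stored."""
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except (OSError, EOFError, pickle.UnpicklingError):
        return {}
```

ClientSession.save() could call save_session("session." + self.id, self), and __init__ could update itself from load_session() with the same path.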
With the caching server in place, the JavaScript to poll its data is quite simple. All we need is something along the lines of Listing 3:
Listing 3. Polling the server for latest data
var r = new XMLHttpRequest();
r.onreadystatechange = function() {
    if (r.readyState == 4) {
        if (r.status == 200) {          // "OK" status
            displayData(r.responseText);
        }
        else if (r.status == 304) {
            // "Not Modified": no change to display
        }
        else {
            alertProblem(r);
        }
    }
};
r.open("GET", "http://myserver.example.com/latest_data/", true);
r.send(null);
The implementations of displayData() and alertProblem() are not specified in our simple example. Presumably, the former needs to parse or otherwise process the received response; the details depend on whether JSON, XML, or some other format is used to send the data, as well as on the actual application requirements.
Moreover, this quick example polls only once. In a long-running application, you might repeatedly make this request in a setTimeout() or setInterval() callback. Or, depending on your application, polling might occur after some particular client action or event.
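The same client logic can also be expressed outside the browser, which is handy for exercising the server from a script. The sketch below is a Python analogue of Listing 3 plus a fixed-interval loop, not part of the original design. It relies on the fact that urllib.request.urlopen() raises HTTPError for a 304 response; the display callback and the injectable fetch parameter are hypothetical hooks.

```python
import time
import urllib.error
import urllib.request

URL = "http://myserver.example.com/latest_data/"  # Example resource from the text

def fetch_latest(url=URL, timeout=10):
    """Return (status, body); body is None for 304 "Not Modified"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 304:      # urllib reports 304 as an HTTPError
            return 304, None
        raise                    # Any other error status is a real problem

def poll(display, polls, interval=1.0, fetch=fetch_latest):
    """Poll repeatedly; only a 200 response changes what the client displays."""
    for i in range(polls):
        status, body = fetch()
        if status == 200:
            display(body)
        # On 304, leave client state untouched
        if i < polls - 1:
            time.sleep(interval)
```

For example, poll(print, polls=10) checks the server ten times, one second apart, and prints only responses that carry new data.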
This tip presented some server code written in Python, but essentially the same design applies to nearly any language used to program a CGI or other server process. The general idea is simple: Use a client cookie (if available) to identify the cached data, and send a 304 status if no new data has arisen since the last polling event.
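To illustrate how little of this depends on CGI specifics, here is the same idea recast as a WSGI application using only Python's standard library. It is a sketch under stated assumptions, not the code from Listing 1: sessions live in an in-memory dict (so it only suits a single long-running process), the hypothetical LOG list stands in for the real data source, and a client's position in LOG is simply the number of items it has already been sent, which works only because nothing is pruned here.

```python
import json
from http.cookies import SimpleCookie
from uuid import uuid4

COOKIE_NAME = "my.server.process"
LOG = []        # Hypothetical append-only feed of discrete data items
SESSIONS = {}   # Session id -> items this client has already been sent

def app(environ, start_response):
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get(COOKIE_NAME)
    session_id = morsel.value if morsel else uuid4().hex
    old_stuff = SESSIONS.get(session_id, [])   # Cached data (empty for new clients)
    new_stuff = LOG[len(old_stuff):]           # Only items this client has not seen
    if not new_stuff:
        start_response("304 Not Modified", [])  # Nothing changed: send no body
        return []
    all_stuff = old_stuff + new_stuff
    SESSIONS[session_id] = all_stuff
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Set-Cookie", f"{COOKIE_NAME}={session_id}"),
    ])
    return [json.dumps(all_stuff).encode("utf-8")]
```

For local experiments it can be served with wsgiref.simple_server.make_server("", 8000, app).serve_forever().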
While I have not shown much error catching, the design falls back to correct behavior when cookies are not available. If a client does not have a relevant session cookie, either because it does not accept cookies or because this is the first poll in a new session, old_stuff is simply an empty list, and any data returned will be part of new_stuff. Another capability often worth adding is a special client message that asks the server to send its entire current session state: this is useful both for debugging the application and as a way of clearing out inconsistent state should the client detect that something has gone wrong. All you lose in flushing the cache is a little server load and some bandwidth; it does not violate the underlying idempotency.
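Such a special client message could be as simple as a query parameter. In the sketch below, the parameter names dump and reset, the helper name handle_control(), and the representation of the session as a plain dict are all assumptions: a request with dump=1 returns the whole cached state, while reset=1 flushes it so that the next ordinary poll rebuilds everything from scratch.

```python
import json
from urllib.parse import parse_qs

def handle_control(query_string, session):
    """Handle hypothetical debug/reset requests; return a response body or None.

    None means the query is an ordinary poll, so the caller should fall
    through to the normal 304/200 logic.
    """
    params = parse_qs(query_string)
    if "reset" in params:
        session.clear()                    # Flush cached state; costs only a resend
        return json.dumps({"reset": True})
    if "dump" in params:
        return json.dumps(dict(session))   # Full current session state
    return None
```

In a CGI script, query_string would come from os.environ.get("QUERY_STRING", "").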
David Mertz feels that bandwidth saved is bandwidth earned. Pore over David's life or buy his book Text Processing in Python.