In this series, we look at how to get started with the Google App Engine (GAE). In Part 1, we looked at how to set up a development environment for creating applications that run on the GAE, and saw how Eclipse makes developing and debugging those applications easier. Here in Part 2, we build an Ajax mashup using Eclipse and deploy it to the GAE. Finally, in Part 3, we give back to the ecosystem by adding RESTful Web services to our application, so other folks can use it to create their own mashups.
The GAE is a platform for creating Web applications. The biggest prerequisite for using it is knowledge of Python, the programming language it runs (currently, Python V2.5.2). For this series, typical Web development skills (e.g., knowledge of HTML, JavaScript, and CSS) will also be helpful. To develop for the GAE, you need to download three software packages.
- Eclipse Classic: I used Eclipse Classic V3.3.2. Later versions will work, too.
- Google App Engine SDK: Read the official documentation on the GAE site, where you will also find links to download the SDK.
- PyDev: PyDev, which turns Eclipse into a Python IDE, can be installed from within Eclipse using the update site at http://pydev.sourceforge.net/updates/.
Installing the latter two software packages is discussed in Part 1. If you are new to Eclipse, see Resources to get started.
In Part 1, we built a small application for aggregating content feeds and serving them through the GAE. We could go ahead and deploy that application to the GAE, but first, let's make a few enhancements to it. The first set of enhancements is about performance. The version from Part 1 pulls data from the subscribed services each time the page is requested. This can take a long time, especially if any one of the services is slow to respond or if a user has subscribed to many services. That is a problem in general, but it is especially a problem for anything running on the GAE. To keep the GAE scalable, requests cannot be allowed to tie it up for long: if our processing takes too long, the request is aborted and an error message is sent to the user. We do not want that, so we will make greater use of the GAE's data-modeling and Bigtable features to improve performance. Bigtable is a distributed storage system for managing structured data (see Resources for more information). We will also use the GAE's Memcache API to improve performance even further.
The other set of enhancements we will make in this article deal with the user experience. We will improve our user interface by adding Ajax elements to the application. This will not just be Ajax for Ajax's sake. It will also tie into some of the data-modeling and cache enhancements to further improve the performance of the application. Once these enhancements are in place, we will be ready to deploy the application to the GAE. Let's start by looking at the data-modeling enhancements.
In Part 1, we used a single data model: Account. It used the Expando property feature of the GAE to store the URLs of the services. To improve performance, we want to store the actual data from the feeds. Accessing Bigtable is never as fast as accessing a traditional relational database (or at least a relational database under light load), but it should be faster than pulling the feed down from the source. However, if we relied only on Bigtable, we would never get anything new. Accordingly, we want to record when we pull live data down and insert it into Bigtable, so that if the stored copy is too stale, we can go back to the source.
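The staleness test described above comes down to comparing the time the content was pulled against a maximum age. The following is a minimal, framework-free sketch of that comparison. The `is_stale` helper and the 10-minute `MAX_FEED_AGE` threshold are hypothetical. The article does not state aggroGator's actual threshold. The sketch also assumes naive datetimes.

```python
from datetime import datetime, timedelta

# Hypothetical freshness window; the article does not give aggroGator's value.
MAX_FEED_AGE = timedelta(minutes=10)

def is_stale(timestamp, now=None, max_age=MAX_FEED_AGE):
    """Return True if content pulled at `timestamp` is too old to trust.

    A timestamp of None (content never pulled) counts as stale, so the
    caller goes back to the source.
    """
    if timestamp is None:
        return True
    if now is None:
        now = datetime.utcnow()  # naive UTC datetime
    return now - timestamp > max_age

fetched_at = datetime(2008, 6, 1, 12, 0, 0)
print(is_stale(fetched_at, now=datetime(2008, 6, 1, 12, 5, 0)))   # False
print(is_stale(fetched_at, now=datetime(2008, 6, 1, 12, 30, 0)))  # True
```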
There is one more thing we need to consider before creating our new data models. Different users may subscribe to the same feeds, so there is really a many-to-many relationship between feeds and accounts. With that in mind, let's look at the new models. The revised Account model is shown in Listing 1.
Listing 1. The Account model
```python
from google.appengine.ext import db

class Account(db.Model):
    user = db.UserProperty(required=True)
```
The big change here is that we moved the service information out of the model. How will we determine the URL of a service? That information has been moved to a separate module-level data structure (dictionary), as shown below.
Listing 2. The service data
```python
service_templates = {
    'twitter': "http://twitter.com/statuses/user_timeline/%s.rss",
    'del.icio.us': "http://del.icio.us/rss/%s",
    'last.fm': "http://ws.audioscrobbler.com/1.0/user/%s/recenttracks.rss",
    'YouTube': "http://www.youtube.com/rss/user/%s/videos.rss",
}
```
This allows us to use simple string substitution to create a service URL based on a
username. In other words, the combination of a service name (used as a key into the
service_templates dictionary) and a username (used for
string substitution on the value retrieved from the dictionary) will allow us to
calculate the URL. This leads us to the Feed data model.
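The following standalone sketch illustrates that lookup-and-substitute step, reusing the dictionary from Listing 2. The `service_url` helper and the sample username are hypothetical. They are not names from the aggroGator code.

```python
service_templates = {
    'twitter': "http://twitter.com/statuses/user_timeline/%s.rss",
    'del.icio.us': "http://del.icio.us/rss/%s",
    'last.fm': "http://ws.audioscrobbler.com/1.0/user/%s/recenttracks.rss",
    'YouTube': "http://www.youtube.com/rss/user/%s/videos.rss",
}

def service_url(service, username):
    """Hypothetical helper: build a feed URL from a service name and username."""
    # The service name selects the template; a KeyError means an unknown service.
    return service_templates[service] % username

print(service_url('del.icio.us', 'jdoe'))  # http://del.icio.us/rss/jdoe
```

Plain `%` substitution does not URL-encode the username. If usernames could contain characters such as spaces or slashes, you would want to quote them first, for example with Python 2's `urllib.quote`.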
Listing 3. The Feed model
```python
class Feed(db.Model):
    service = db.StringProperty(required=True)
    username = db.StringProperty(required=True)
    content = db.TextProperty()
    timestamp = db.DateTimeProperty(auto_now=True)
```
The service and username properties are just as we described above. The service property serves as a key into the service_templates dictionary, and the username is substituted into the resulting template to calculate a URL. The content property holds the actual content we pull in from the Web service. The timestamp property records the date and time when we pulled in that content. The auto_now=True argument tells the datastore API to set the property to the current time every time the entity is saved. Finally, we need a join model to define the many-to-many relationship between an Account and a Feed, as shown below.
Listing 4. The AccountFeed model
```python
class AccountFeed(db.Model):
    account = db.ReferenceProperty(Account, required=True, collection_name='feeds')
    feed = db.ReferenceProperty(Feed, required=True, collection_name='accounts')
```
A ReferenceProperty is how you relate one model to another in Bigtable. It is similar to a foreign key in a relational database. Notice the collection_name argument: it names the back-reference that the referenced model gets. With collection_name='feeds', every Account has a feeds attribute that returns a query for the AccountFeed entities that point at it (Listing 5 iterates over account.feeds). Likewise, every Feed has an accounts attribute. If you don't set collection_name, the back-reference is named after the referencing model in lowercase with _set appended (accountfeed_set here).
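The following sketch shows how the join model ties things together, using plain named tuples in place of the GAE `db` models. Everything in it is a hypothetical stand-in. The two helpers approximate following the `feeds` and `accounts` back-references, and then the reference on each join row. The point is that a feed two users share (here, `twitter`/`jdoe`) exists only once, so its content needs to be fetched and stored only once.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Account, Feed, and AccountFeed models.
Account = namedtuple('Account', 'email')
Feed = namedtuple('Feed', 'service username')
AccountFeed = namedtuple('AccountFeed', 'account feed')

alice = Account('alice@example.com')
bob = Account('bob@example.com')
shared = Feed('twitter', 'jdoe')   # one Feed, subscribed to by both users
other = Feed('last.fm', 'jdoe')

# The join table: one row per (account, feed) subscription.
account_feeds = [
    AccountFeed(alice, shared),
    AccountFeed(alice, other),
    AccountFeed(bob, shared),
]

def feeds_of(account):
    """Like iterating account.feeds and following each row's .feed reference."""
    return [row.feed for row in account_feeds if row.account == account]

def accounts_of(feed):
    """Like iterating feed.accounts and following each row's .account reference."""
    return [row.account for row in account_feeds if row.feed == feed]

print([a.email for a in accounts_of(shared)])
# ['alice@example.com', 'bob@example.com']
```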
Our data modeling is complete. We created models for our feeds and associated them in a many-to-many relationship to an account. Bigtable and the GAE's APIs make it easy to model our entities, but what about versioning? We just went from one version of data models to another. Let's see how to deal with this in the GAE.
Changing schemas during development
Evolving schemas is often tricky. Luckily, we are still in development mode here, and making changes is much easier than it would be in a production application. Schema changes are common during development, and the GAE makes them easy: all you need to do is pass an extra parameter to the GAE's local Web server, as shown in Figure 1.
Figure 1. Adding a parameter to clean local data store
We simply added the --clear-datastore parameter as a command-line argument that gets passed to our start-up script. Eclipse and PyDev make it easy to add such arguments as needed. Be warned, though: Eclipse remembers these arguments. If you leave this one in place, your local data store will be deleted every time you start the development server. That may be fine, but you should be aware of it.
Now we have a new schema that will allow us to store our feed using Bigtable. Looking up data from Bigtable is not cheap. It is not as fast as many developers have become accustomed to from using relational databases. Luckily, the GAE provides an additional API for faster access to data: Memcache.
The GAE includes an in-memory cache called Memcache. It was inspired by memcached, the popular open source distributed cache, but it is a specialized implementation for the GAE. It has similar semantics: you simply put name-value pairs into Memcache and get them back out by name. Using Memcache can dramatically improve the performance of an application.
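The following small, in-process stand-in for a memcache-style cache makes those semantics concrete. `TinyCache` is hypothetical and is not the GAE `memcache` module. It only mirrors the `set(key, value, time)` and `get(key)` calling shape used in Listings 5 and 6, where `time=0` means no expiration and `get` returns `None` on a miss. The real service also accepts absolute expiration times and may evict entries early. This sketch handles only relative expirations in seconds, and it never evicts.

```python
import time

class TinyCache(object):
    """Hypothetical in-process cache with memcache-like set/get semantics.

    Not the GAE memcache module: relative expirations only, no eviction.
    """

    def __init__(self, clock=time.time):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable, so expiry can be shown without sleeping
        self.hits = 0
        self.misses = 0

    def set(self, key, value, time=0):
        # time is seconds from now; 0 means "never expire"
        expires_at = self._clock() + time if time else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                self.hits += 1
                return value
            del self._data[key]  # expired: forget it and count a miss
        self.misses += 1
        return None  # None signals a miss, as with memcache.get()

now = [1000.0]  # a fake clock we can advance by hand
cache = TinyCache(clock=lambda: now[0])
cache.set('twitter_jdoe', ['entry 1'], time=300)
print(cache.get('twitter_jdoe'))   # ['entry 1']
now[0] += 301
print(cache.get('twitter_jdoe'))   # None
print((cache.hits, cache.misses))  # (1, 1)
```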
For the aggroGator application, we will cache two things. The first and most obvious thing to cache is the user's services. These can be changed only by the AddService action, so it is easy to make sure that our cache is accurate. The code for this is shown in the Cache class below.
Listing 5. User-service cache
```python
import logging
import pickle

from google.appengine.api import memcache

class Cache:
    @staticmethod
    def setUserServices(account):
        # one dict per subscribed feed, via the AccountFeed back-references
        userServices = [{'service': accountFeed.feed.service,
                         'username': accountFeed.feed.username}
                        for accountFeed in account.feeds]
        # key the cache entry on the user's e-mail address
        if not memcache.set(account.user.email(), pickle.dumps(userServices)):
            logging.error('Cache set failed: userServices')
        return userServices

    @classmethod
    def getUserServices(cls, user):
        userServices_pickled = memcache.get(user.email())
        if userServices_pickled:
            userServices = pickle.loads(userServices_pickled)
        else:
            # cache miss: read from Bigtable (DB is shown in Listing 7)
            # and repopulate the cache
            account = DB.getAccount(user)
            userServices = cls.setUserServices(account)
        return userServices
```
Here is a quick explanation of this code. We start with a static method (one that needs neither the class nor an instance) for setting the user's services in the cache. It uses a list comprehension to build a list of dictionaries, each holding a service name and the user's username for that service. We serialize that list with the pickle module and store it in Memcache, using the user's e-mail address as the key.
The getUserServices method is similar. It is a class method rather than a static method because it needs to call the setUserServices method (through cls) in case of a cache miss. It tries to retrieve the serialized list described above. If nothing is found in the cache, it looks up the data in Bigtable and puts it into the cache.
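The getUserServices/setUserServices pair is an instance of the cache-aside pattern: read from the cache, fall back to the slower store on a miss, and then repopulate the cache. The following is a minimal sketch of that flow. A plain dict stands in for Memcache, and a hypothetical `load_user_services_from_datastore` function stands in for `DB.getAccount` plus the list comprehension. As in Listing 5, the value is stored pickled.

```python
import pickle

cache = {}    # stand-in for Memcache; stores pickled bytes keyed by e-mail
lookups = []  # records each (slow) datastore lookup

def load_user_services_from_datastore(email):
    """Hypothetical stand-in for DB.getAccount() plus Listing 5's list comprehension."""
    lookups.append(email)
    return [{'service': 'twitter', 'username': 'jdoe'}]

def get_user_services(email):
    pickled = cache.get(email)
    if pickled is not None:
        return pickle.loads(pickled)  # cache hit: deserialize and return
    # cache miss: go to the datastore, then repopulate the cache
    services = load_user_services_from_datastore(email)
    cache[email] = pickle.dumps(services)
    return services

get_user_services('alice@example.com')  # miss: reads the datastore
get_user_services('alice@example.com')  # hit: served from the cache
print(len(lookups))  # 1
```

The sketch tests `is not None` rather than truthiness. Because `pickle.dumps` never returns an empty string, the two checks behave the same here. `is not None` simply states the intent (a miss returns `None`) more directly.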
A similar strategy is used for caching the entries in a feed, with one big difference: we have to be careful about staleness. After all, users could be posting new entries all the time, so we eventually have to go back to the source. We need an expiration policy, shown below.
Listing 6. Entry cache
```python
class Cache:
    # user service methods omitted

    @staticmethod
    def setEntries(feed):
        entries = GenericFeed.entries(feed)
        # one cache entry per service/username pair, shared by all subscribers;
        # CACHE_FEED_TIME is the expiration in seconds (defined elsewhere, not shown)
        if not memcache.set("%s_%s" % (feed.service, feed.username),
                            pickle.dumps(entries), CACHE_FEED_TIME):
            logging.error('Cache set failed: entries')
        return entries

    @classmethod
    def getEntries(cls, service, username):
        entries_pickled = memcache.get("%s_%s" % (service, username))
        if entries_pickled:
            entries = pickle.loads(entries_pickled)
        else:
            # cache miss: load the Feed entity and refill the cache
            feed = DB.getFeed(service, username)
            entries = cls.setEntries(feed)
        return entries
```
Again, a similar pattern is used. We serialize the data and store it in Memcache, this time keyed on the combination of service and username. Because the key does not include the user, cached entries are shared by every user who subscribes to the same feed, which makes the cache more efficient. When we try to load from the cache and miss, we go to Bigtable. Also notice the CACHE_FEED_TIME expiration value passed to memcache.set. Without it, an entry stays in Memcache until it is evicted (for example, when the cache runs low on memory), so stale entries could be served indefinitely. For both user services and entries, we use a DB class for querying Bigtable. This class is shown below.
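The benefit of keying on service and username rather than on the user can be sketched in a few lines. In this hypothetical example, a dict stands in for Memcache, and `fetch_entries` stands in for pulling a feed from the remote service. Two users who both follow the same Twitter account trigger only one fetch for it. Expiration is left out to keep the focus on the key.

```python
cache = {}     # stand-in for Memcache (no expiry, no eviction)
fetched = []   # records every trip to the remote service

def entries_key(service, username):
    """Build the cache key the same way Listing 6 does: '<service>_<username>'."""
    return "%s_%s" % (service, username)

def fetch_entries(service, username):
    """Hypothetical stand-in for pulling a feed from the remote service."""
    fetched.append((service, username))
    return ['%s entry from %s' % (service, username)]

def get_entries(service, username):
    key = entries_key(service, username)
    if key not in cache:
        cache[key] = fetch_entries(service, username)
    return cache[key]

# Hypothetical subscriptions: both users follow jdoe on twitter.
subscriptions = {
    'alice@example.com': [('twitter', 'jdoe')],
    'bob@example.com': [('twitter', 'jdoe'), ('last.fm', 'jdoe')],
}
for feeds in subscriptions.values():
    for service, username in feeds:
        get_entries(service, username)

print(len(fetched))  # 2: twitter/jdoe is fetched once and shared
```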
Listing 7. DB access class
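```python
class DB:
    @staticmethod
    def getAccount(user):
        # numbered GQL parameter: :1 is bound to the first extra argument;
        # get() returns the first match, or None if there is none
        return Account.gql("WHERE user = :1", user).get()

    @staticmethod
    def getFeed(service, username):
        return Feed.gql("WHERE service = :1 AND username = :2",
                        service, username).get()
```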
This class runs very simple queries written in GQL, the GAE's query syntax, which is a small but powerful subset of SQL. In the above example, we used numbered parameters (:1, :2), but you can also use named parameters for more complicated queries. By querying a feed from Bigtable, we are really just using Bigtable as another caching layer. Let's take a look at how all of this high-performance caching gets coordinated from the client through Ajax.
When most people think of Ajax, they think of how it can enhance the user experience. This is true, of course, but Ajax has many other advantages, in particular architectural ones. It allows you to move a lot of your application logic to the client and retrieve smaller amounts of data from your server. Running on the GAE does not change this in any way, but the Ajax client can take advantage of our application's scalability features, such as the caching we just built. Let's take a look at how we have woven Ajax into the aggroGator application.
In the previous version of aggroGator, we would show the list of entries of the various services when the page loaded. We used a Django-style template to accomplish this. To make the page more dynamic, we separate the data and presentation logic. To see how this works, let's look at how our template has changed.
Listing 8. Main page template
```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Aggrogator</title>
    <link rel="stylesheet" href="/css/aggrogator.css" type="text/css" />
    <script type="text/javascript" src="/js/prototype.js"></script>
    <script type="text/javascript" src="/js/effects.js"></script>
    <script type="text/javascript" src="/js/builder.js"></script>
    <script type="text/javascript" src="/js/aggrogator.js"></script>
  </head>
  <body onload="initialize();">
    <ul id="cache"></ul>
    <img id="spinner" alt="spinner" src="/gfx/spinner.gif"
         style="display: none; float: left;" />
    <p id="logout">
      {{ user.nickname }}
      <a href="{{ logout_url }}">Logout</a>
    </p>
    <div class="clearboth"></div>
    <form id="form_addService" onsubmit="addService(); return false;">
      <fieldset>
        <legend>Add New Service</legend>
        <label for="service">Service: </label>
        <select name="service" id="service">
          <option>twitter</option>
          <option>del.icio.us</option>
          <option>last.fm</option>
          <option>YouTube</option>
        </select><br/>
        <label for="username">Username: </label>
        <input type="text" name="username" id="username" />
        <input type="submit" value="Add" />
      </fieldset>
    </form>
    <table>
      <tbody style="vertical-align: top;">
        <tr>
          <td>
            <div id="userServices"><span /></div>
            <div id="entries"><span /></div>
          </td>
          <td>
            <table><tbody id="allEntries"></tbody></table>
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>
```
There are two important things to notice about the template. First, there is almost nothing dynamic about it anymore: just the user's nickname and the logout link. Second, we include a lot of JavaScript. We use the Prototype and script.aculo.us JavaScript libraries (see Resources for more information).
We also include a custom JavaScript library, aggrogator.js. It defines an initialize() function that is called when the page loads, as shown below.
Listing 9. Page initialization
```javascript
function initialize() {
    getUserServices();
    new PeriodicalExecuter(getUserServices, 300);
}

function getUserServices() {
    var handler = function(xhr) {
        var json = xhr.responseJSON;
        if (json.error) {
            // display the error
        }
        else {
            cacheStats(json.stats);
            userServicesTable(json.userServices);
            updateEntries(json.userServices);
        }
    };
    // create options for request
    var options = {
        method: "get",
        onSuccess: handler
    };
    // send the request
    new Ajax.Request("/getUserServices", options);
}
```
As you can see, the initialization code simply calls another function, getUserServices. It also starts a polling process that periodically calls getUserServices using Prototype's PeriodicalExecuter class. In this case, getUserServices is called every 300 seconds, or every 5 minutes. This polling gives the illusion that the server is pushing data to the browser (server push is also known as Comet or reverse Ajax). Thus, when a new post is made to Twitter, for example, it shows up for aggroGator users shortly afterward, without a page reload.
The getUserServices function does the more interesting work. It makes an Ajax request that loads the services the current user is subscribed to. On success, it passes them to userServicesTable, which builds a table of those services, as shown below.
Listing 10. Building the user services table
```javascript
function userServicesTable(json) {
    var table = Builder.node('table',
        Builder.node('tbody',
            function() {
                var l = [];
                json.each(function(s) {
                    l.push(Builder.node('tr', [
                        Builder.node('td',
                            Builder.node('a', {href: "",
                                onclick: "getEntries('" + s.service + "', '" +
                                    s.username + "'); return false;"},
                                s.service + ':' + s.username)
                        )
                    ]));
                });
                return l;
            }()
        )
    );
    $('userServices').replaceChild(table, $('userServices').firstChild);
}
```
This function makes heavy use of script.aculo.us's Builder library to create an HTML table listing all of the user's services. Before we go any further, let's talk about where this function's data comes from. As we saw in Listing 9, getUserServices makes a request to the /getUserServices URL. That URL has to be mapped to a handler in the main function of our application.
Listing 11. Setting up routing rules
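```python
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util

def main():
    # map each URL to the RequestHandler class that serves it
    app = webapp.WSGIApplication([
        ('/', MainPage),
        ('/addService', AddService),
        ('/getEntries', GetEntries),
        ('/getUserServices', GetUserServices),
    ], debug=True)
    util.run_wsgi_app(app)

if __name__ == '__main__':
    main()
```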
As you can see, the /getUserServices URL is being mapped to a new class called GetUserServices. This class is shown below.
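Listing 12. GetUserServices
```python
from django.utils import simplejson
from google.appengine.api import memcache, users
from google.appengine.ext import webapp

class GetUserServices(webapp.RequestHandler):
    def get(self):
        user = users.get_current_user()
        # get the user's services from the cache (falls back to Bigtable)
        userServices = Cache.getUserServices(user)
        # Memcache hit/miss statistics, passed along for debugging
        stats = memcache.get_stats()
        self.response.headers['content-type'] = 'application/json'
        self.response.out.write(simplejson.dumps({'stats': stats,
                                                  'userServices': userServices}))
```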
This class is simple but powerful. It retrieves data from our Cache class, which is really an abstraction on top of Bigtable and Memcache, and passes the data back as JSON. There are numerous third-party libraries for converting Python objects to JSON and vice versa, but we did not need them. The GAE SDK includes Django, so we use the dumps function from Django's django.utils.simplejson module to serialize our Python objects to JSON. You might also notice that we pass back some cache stats. These are simple statistics on how often we found data in Memcache vs. how often we did not. This is not strictly needed, but it is interesting, at least to developers. Because the stats arrive through the Ajax call rather than in the served HTML, a plain view source will not show them. Instead, inspect the page's DOM in your browser once the data has loaded. Finally, notice that we set the content-type header to application/json. This tells Prototype that the payload is JSON, so Prototype parses the response for us and exposes the result as xhr.responseJSON, which the handler in Listing 9 uses.
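For reference, here is the shape of the JSON body that Listing 9's handler expects, with top-level `stats` and `userServices` keys. The sketch uses Python's standard `json` module. That module was added in Python 2.6, later than the GAE's Python 2.5, and provides the same `dumps` and `loads` functions as `simplejson`. The helper name and the stats values are made up for illustration. They are not real Memcache output.

```python
import json  # stdlib since Python 2.6; same dumps()/loads() API as simplejson

def user_services_body(stats, user_services):
    """Hypothetical helper: serialize the response as Listing 12 does, one object with two keys."""
    return json.dumps({'stats': stats, 'userServices': user_services})

body = user_services_body(
    {'hits': 12, 'misses': 3},                    # made-up stats values
    [{'service': 'twitter', 'username': 'jdoe'}],
)

# What the browser sees as xhr.responseJSON, decoded here with json.loads:
decoded = json.loads(body)
print(decoded['userServices'][0]['service'])  # twitter
```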
Now we have seen how the data gets served from our application running on GAE. If you
go back to Listing 9, we don't just build the table of services. We also retrieve all
of the entries for each service by calling the updateEntries
function. You can find that function and the Python class that handles it in the full
code included with this article. It follows a similar pattern:
- Call the server
- Look for data in Memcache
- If not in Memcache, go to Bigtable
- If not there, or too old, go to source
- Serialize data as JSON
- On client, programmatically build UI
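The server-side part of that list (the Memcache, Bigtable, and source steps) can be sketched as a single function. This minimal sketch rests on several assumptions. Dicts stand in for Memcache and for the `Feed` entities, and `fetch_from_source` stands in for `GenericFeed.fetch` plus parsing. The 600-second `MAX_AGE` threshold is hypothetical. JSON serialization and the client-side UI steps are omitted. Passing `now` explicitly lets the example simulate the passage of time.

```python
import time

MAX_AGE = 600  # hypothetical freshness threshold, in seconds

memcache_layer = {}   # stand-in for Memcache: key -> entries
datastore_layer = {}  # stand-in for Feed entities: key -> (entries, fetched_at)
source_calls = []     # records each trip to the remote service

def fetch_from_source(service, username):
    """Hypothetical stand-in for GenericFeed.fetch() plus feed parsing."""
    source_calls.append((service, username))
    return ['latest %s entry for %s' % (service, username)]

def get_entries(service, username, now=None):
    now = time.time() if now is None else now
    key = "%s_%s" % (service, username)
    # 1. Look for the data in Memcache.
    if key in memcache_layer:
        return memcache_layer[key]
    # 2. Not in Memcache: use the datastore copy if it is fresh enough.
    stored = datastore_layer.get(key)
    if stored is not None and now - stored[1] <= MAX_AGE:
        entries = stored[0]
    else:
        # 3. Missing or too old: go to the source and store what we got.
        entries = fetch_from_source(service, username)
        datastore_layer[key] = (entries, now)
    memcache_layer[key] = entries  # refill Memcache for the next request
    return entries

get_entries('twitter', 'jdoe', now=0)     # cold start: goes to the source
get_entries('twitter', 'jdoe', now=10)    # served from Memcache
memcache_layer.clear()                    # simulate a Memcache expiry
get_entries('twitter', 'jdoe', now=20)    # datastore copy is still fresh
memcache_layer.clear()
get_entries('twitter', 'jdoe', now=1000)  # datastore copy is stale
print(len(source_calls))  # 2
```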
There are more great features we could build for our application, but at some point, we need to deploy it to the GAE. Let's take a look at doing that next and see how we can monitor and debug production applications.
Deployment to a production environment is often a painful process. It might involve FTPing code, running builds, and so on. Deployment, however, is one thing the GAE has made very simple. The GAE installer should have put a deployment script called appcfg.py on your path. If you did not use an installer and simply unzipped the package, you will find the script in the GAE home directory. You invoke this script with its update command and the directory of your application (the directory containing the app.yaml file, which the script needs to read), and you should see output similar to Listing 13.
Listing 13. Deployment using appcfg.py script
```
$ appcfg.py update aggrogator/src/
Loaded authentication cookies from /Users/michael/.appcfg_cookies
Scanning files on local disk.
Initiating update.
Email: your_email@here
Password for your_email@here:
Saving authentication cookies to /Users/michael/.appcfg_cookies
Cloning 7 static files.
Cloning 3 application files.
Uploading 1 files.
Closing update.
Uploading index definitions.
```
That's it. Your application is now deployed to the GAE, and you can go to it in a browser and give it a try. Note that you need to register your application with the GAE before you deploy, because you need its name in app.yaml (as shown in Part 1). Don't pick aggroGator for that name, because this application already uses it. You can check out the application running live at http://aggrogator.appspot.com.
For any production Web site, you need to be able to monitor the site to make sure that it is healthy and running properly. This is easy to do with the GAE. If you log in to the Google App Engine, you will see a list of the applications you own, as shown below.
Figure 2. My apps on GAE
Click on the link, and it will bring up the dashboard for your application.
Figure 3. App dashboard
The dashboard offers a lot of useful information. One of its most useful features is the Logs console. When the aggroGator application was first deployed, one of the services it featured, del.icio.us, was not working. It worked fine in development, but not in production. Luckily, the GAE supports logging through Python's standard logging module. The problem was in the code that pulled down the RSS feed from del.icio.us, so logging was added there, as shown below.
Listing 14. Adding logging code
```python
import logging

from google.appengine.api import urlfetch

class GenericFeed:
    @staticmethod
    def fetch(service, username):
        content = None
        # construct the service URL from the templates in Listing 2
        service_url = service_templates[service] % username
        # fetch the feed from the service
        result = urlfetch.fetch(service_url)
        if result.status_code == 200:
            content = unicode(result.content, 'utf-8')
        else:
            # this message shows up in the GAE Logs console
            logging.error("Error fetching content, HTTP status code = %d"
                          % result.status_code)
        return content
```
Now, the logging could be viewed using the Logs console in GAE, as shown below.
Figure 4. Logs console
As you can see, del.icio.us was returning an HTTP 503 (Service Unavailable) status code. Nothing was wrong with the code; the problem was in the communication between the GAE and the del.icio.us Web site.
We have seen how to use the Google App Engine's features to give your application greater scalability and performance. This includes using Bigtable and Memcache together to cache "expensive" data: data that takes a long time to retrieve from a remote resource. Combined with Ajax, this makes efficient use of the GAE and enables compelling features for our end users, such as data that appears to be pushed from the server. In Part 3, we continue to grow our feature set, dive further into the data-modeling capabilities of the GAE, and see how we can turn aggroGator into a data provider for other mashups.
A special thanks to Python master Chris Gilmore for greatly improving the quality and performance of the code in this article.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code | os-eclipse-mashup-google-pt2-aggrogator2.zip | 178KB | HTTP |
Learn
- Read "Charming Python: Python elegance and warts" to learn about the latest goodies in Python.
- Read more of the "Charming Python" series on developerWorks.
- The SDK uses the webapp framework, which is similar to Django. You can also use Django itself, so you might want to learn about Django in the developerWorks article "Python Web frameworks, Part 1."
- Check out "Get started with open source CMS, Part 6: Build a Python WebDAV client for Jakarta Slide" to see PyDev in action.
- Read all about Google's Bigtable: A Distributed Storage System for Structured Data.
- With a dynamic language like Python, it is always good to have the official Python documentation handy.
- Doing Web development with Eclipse? You might want to read "Discover the Ajax Toolkit Framework for Eclipse."
- Learn more about script.aculo.us.
- Learn more about the Prototype Framework.
- Check out EclipseLive for webinars featuring various Eclipse technologies.
- Check out the "Recommended Eclipse reading list."
- Browse all the Eclipse content on developerWorks.
- New to Eclipse? Read the developerWorks article "Get started with Eclipse Platform" to learn its origin and architecture, and how to extend Eclipse with plug-ins.
- Expand your Eclipse skills by checking out IBM developerWorks' Eclipse project resources.
- To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.
- Stay current with developerWorks' Technical events and webcasts.
- Watch and learn about IBM and open source technologies and product functions with the no-cost developerWorks On demand demos.
- Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.
- Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.
Get products and technologies
- Download the Google App Engine SDK and read official documentation from the Google App Engine site.
- The application created in this article used Mark Pilgrim's Universal Feed Parser. This awesome library can parse RSS, Atom, you name it.
- DjangoProject.com is the home page for the Django framework.
- This article uses Eclipse Classic V3.3.2.
- The PyDev plug-in is available from http://pydev.sourceforge.net/updates/. It can be installed from within Eclipse using this update site.
- Check out the latest Eclipse technology downloads at IBM alphaWorks.
- Download Eclipse Platform and other projects from the Eclipse Foundation.
- Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Innovate your next open source development project with IBM trial software, available for download or on DVD.
Discuss
- The Eclipse Platform newsgroups should be your first stop to discuss questions regarding Eclipse.
- The Eclipse newsgroups have many resources for people interested in using and extending Eclipse.
- Participate in developerWorks blogs and get involved in the developerWorks community.