GAE storage with Bigtable, Blobstore, and Google Storage

Learn your way around three data storage options for GAE

Google App Engine eschews the relational database in favor of several non-relational datastores: Bigtable, Blobstore, and the newest kid on the block, Google Storage for Developers. Author John Wheeler explores the pros and cons of GAE's three big-data storage options, while walking you through an application scenario that will familiarize you with setting up and using each one.

John Wheeler, Applications manager, Xerox

John Wheeler has been programming professionally for over a decade. He is the co-author of Spring in Practice and works for Xerox as an applications manager. Visit John's website for more of his writing about software development.



07 December 2010


Because they've always been around to dump bytes on, it's easy to take disk drives and the file systems on them for granted. When you write a file, you rarely need to consider much more than its location, permissions, and space requirements. You just construct a java.io.File and get to work; java.io.File behaves the same whether you're on a desktop computer, a web server, or a mobile device. When you begin working with Google App Engine (GAE), though, that everyday transparency quickly disappears. In GAE, you can't write files to disk because there's no usable file system. In fact, merely declaring a java.io.FileInputStream produces a compilation error, because that class is blacklisted from the GAE SDK.

Luckily, life has options, and GAE offers some particularly powerful ones for storage. Because it was designed from the ground up with scalability in mind, GAE supplies two key-value stores: Datastore (aka Bigtable) holds regular data you'd normally throw in a database, while Blobstore holds huge binary blobs. Both have constant-time access, and both are completely unlike file systems you probably have worked with in the past.

In addition to these two, there's a newcomer to the mix: Google Storage for Developers. It works like Amazon S3, which is also markedly different from a traditional file system. In this article, we'll build an example application that implements each GAE storage option in turn. You'll get hands-on experience with using Bigtable, Blobstore, and Google Storage for Developers, and you'll understand the pros and cons of each implementation.

What you'll need

You'll need a GAE account and several free, open source tools to work through the examples in this article. For your development environment, you'll need JDK 5 or JDK 6, the Eclipse IDE for Java™ developers, and the Google Plug-in for Eclipse. You'll also need the Objectify-Appengine and Apache Commons FileUpload libraries, both of which are introduced as we go.

Google Storage for Developers is currently available only to a limited number of developers in the United States. If you can't immediately obtain access to Google Storage, you can still follow the examples for Bigtable and Blobstore, and you'll get a good sense of how Google Storage works.

Preliminary setup: The example application

Before we can start exploring the GAE storage systems, we need to create the three classes required for our example application:

  • A bean that represents a photograph. Photo contains fields like title and caption, as well as a few others for storing binary-image data.
  • A DAO that persists Photos to the GAE datastore, aka Bigtable. The DAO contains one method for inserting Photos and another for pulling them back out by ID. It uses an open source library called Objectify-Appengine for persistence.
  • A servlet that uses the Template Method pattern to encapsulate a three-step workflow. We'll use the workflow to explore each GAE storage option.

Application workflow

We'll follow the same procedure to learn about each of the GAE data storage options; this will give you the opportunity to focus on the technology, and also compare the pros and cons of each storage method. The application workflow will be the same every time:

  1. Display an upload form.
  2. Upload an image to storage and save a record to the datastore.
  3. Serve up the image.

Figure 1 is a diagram of the application workflow:

Figure 1. The three-step workflow used to demonstrate each storage option
A diagram of the GAE data storage example application.

As an added benefit, the example application also lets you practice tasks that are key to any GAE project that writes out and serves up binaries. Now, let's start creating those classes!


A simple application for GAE

Download Eclipse if you don't have it, then install the Google Plug-in for Eclipse and create a new Google Web Application project that doesn't use GWT. Refer to the sample code included with this article for guidance on structuring project files. Once you've got your Google Web app set up, add the application's first class, Photo, as shown in Listing 1. (Note that I've omitted getters and setters.)

Listing 1. Photo
import javax.persistence.Id;

public class Photo {

    @Id
    private Long id;
    private String title;
    private String caption;
    private String contentType;
    private byte[] photoData;
    private String photoPath;

    public Photo() {
    }

    public Photo(String title, String caption) {
        this.title = title;
        this.caption = caption;
    }

    // getters and setters omitted
}

The @Id annotation designates the primary-key field, which will be important when we start working with Objectify. Each record saved to the datastore, also called an entity, requires a primary key. When an image is uploaded, one option is to store it directly in photoData, a byte array. It's written to the datastore as a Blob property along with the rest of Photo's fields; in other words, the image is saved and fetched right alongside the bean. If an image is instead uploaded to Blobstore or Google Storage, its bytes are stored externally on that system and photoPath points to their location. In any given case, only one of photoData or photoPath is used. Figure 2 clarifies each one's function:

Figure 2. How photoData and photoPath work
A diagram showing the difference between photoData and photoPath.
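As a side note, the mutual exclusivity between the two fields can be captured in code. The sketch below is a hypothetical variation on Photo; the enforcing setters and the isStoredInline helper are my own additions for illustration, not part of the article's listing:

```java
// Hypothetical sketch illustrating the either/or rule between
// photoData and photoPath; not the article's actual Photo class.
public class PhotoSketch {
    private byte[] photoData;  // set when the image lives inline in Bigtable
    private String photoPath;  // set when the image lives in Blobstore/Google Storage

    public void setPhotoData(byte[] data) {
        this.photoData = data;
        this.photoPath = null;  // enforce: only one storage location at a time
    }

    public void setPhotoPath(String path) {
        this.photoPath = path;
        this.photoData = null;
    }

    // true if the bytes are stored inline in the datastore entity
    public boolean isStoredInline() {
        return photoData != null;
    }

    public String getPhotoPath() {
        return photoPath;
    }
}
```

Setting one field clears the other, so code that serves the image can branch on a single check instead of inspecting both fields.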

Next we'll handle persistence for the bean.

Object-based persistence

As previously mentioned, we'll use Objectify to create a DAO for the Photo bean. While JDO and JPA may be more popular and ubiquitous persistence APIs, they have steeper learning curves. Another option would be to use the low-level GAE datastore API, but that involves the tedious business of marshaling beans to and from datastore entities. Objectify takes care of that for us, by way of Java reflection. (See Resources to learn more about GAE persistence alternatives, including Objectify-Appengine.)

Start by creating a class called PhotoDao and coding it as shown in Listing 2:

Listing 2. PhotoDao
import com.googlecode.objectify.*;
import com.googlecode.objectify.helper.DAOBase;

public class PhotoDao extends DAOBase {

    static {
        ObjectifyService.register(Photo.class);
    }

    public Photo save(Photo photo) {
        ofy().put(photo);
        return photo;
    }
    
    public Photo findById(Long id) {
        Key<Photo> key = new Key<Photo>(Photo.class, id);
        return ofy().get(key);
    }
}

PhotoDao extends DAOBase, a convenience class that lazily loads an Objectify instance. Objectify is our primary interface into the API and is exposed through the ofy method. Before we can use ofy, however, we need to register each persistent class, which the static initializer in Listing 2 does for Photo.

The DAO contains two methods for inserting and finding Photos. In each, working with Objectify is as simple as working with a hashtable. You might notice that Photos are fetched with a Key in findById, but don't worry about it: For the purpose of this article, just think of Key as a wrapper around the id field.
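If you're curious what such a wrapper might look like, here's a purely conceptual sketch of a kind-plus-id key. This is not Objectify's actual Key implementation; it only illustrates the "wrapper around the id field" mental model:

```java
// Conceptual sketch of a datastore key: an entity kind (the class)
// paired with a numeric id. Not Objectify's real Key class.
public class KeySketch<T> {
    private final Class<T> kind;
    private final long id;

    public KeySketch(Class<T> kind, long id) {
        this.kind = kind;
        this.id = id;
    }

    // Keys with the same kind and id address the same entity
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeySketch)) return false;
        KeySketch<?> other = (KeySketch<?>) o;
        return kind.equals(other.kind) && id == other.id;
    }

    @Override
    public int hashCode() {
        return kind.hashCode() * 31 + (int) (id ^ (id >>> 32));
    }

    @Override
    public String toString() {
        return kind.getSimpleName() + "(" + id + ")";
    }
}
```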

We now have a Photo bean and a PhotoDao to manage persistence. Next, we'll flesh out the application workflow.

Application workflow, by way of the Template Method pattern

If you've ever played Mad Libs, then the Template Method pattern will make sense to you. Each Mad Lib presents a story with a bunch of blank spots for readers to fill out. The reader's input — how the blank spots are completed — drastically alters the story. Similarly, classes using the Template Method pattern contain a series of steps, and some are left blank.
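Stripped of servlets entirely, the pattern fits in a few lines. The class names below are made up for illustration: the final tell method is the fixed "story," and the abstract methods are the blanks subclasses fill in:

```java
// Minimal Template Method demo. The template fixes the order of steps;
// subclasses supply the blanks. Names here are illustrative only.
abstract class Story {
    // the fixed "story" -- subclasses cannot change the order of steps
    public final String tell() {
        return "The " + noun() + " decided to " + verb() + ".";
    }

    protected abstract String noun();  // blank #1
    protected abstract String verb();  // blank #2
}

class SillyStory extends Story {
    protected String noun() { return "servlet"; }
    protected String verb() { return "redirect"; }
}
```

However the blanks are filled, the surrounding story keeps its shape, which is exactly how our servlet will fix the workflow while leaving the storage details open.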

We'll build a servlet that uses the Template Method pattern to carry out our example application's workflow. Start by stubbing out an abstract servlet and naming it AbstractUploadServlet. You can use the code in Listing 3 as a reference:

Listing 3. AbstractUploadServlet
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.*;

@SuppressWarnings("serial")
public abstract class AbstractUploadServlet extends HttpServlet {

}

Next, add the three abstract methods in Listing 4. Each one represents a step in the workflow.

Listing 4. Three abstract methods
protected abstract void showForm(HttpServletRequest req, 
    HttpServletResponse resp) throws ServletException, IOException;

protected abstract void handleSubmit(HttpServletRequest req, 
    HttpServletResponse resp) throws ServletException, IOException;

protected abstract void showRecord(long id, HttpServletRequest req, 
    HttpServletResponse resp) throws ServletException, IOException;

Now, given that we're using the Template Method pattern, think of the methods in Listing 4 as the blanks and the code in Listing 5 as the story that assembles them:

Listing 5. A workflow emerges
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp)
    throws ServletException, IOException {
    String action = req.getParameter("action");
    if ("display".equals(action)) {
        // don't know why GAE appends underscores to the query string
        long id = Long.parseLong(req.getParameter("id").replace("_", ""));
        showRecord(id, req, resp);
    } else {
        showForm(req, resp);
    }
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) 
    throws ServletException, IOException {
    handleSubmit(req, resp);
}

A reminder about servlets

Just in case it's been a while since you've worked with plain old servlets, doGet and doPost are standard methods for handling HTTP GETs and POSTs. It's common practice to use GET to fetch web resources and POST to send data. In that spirit, our implementation of doGet either displays an upload form or a photo from storage, and doPost handles upload-form submissions. It's up to classes that extend AbstractUploadServlet to define each piece of behavior. The diagram in Figure 3 shows the sequence of events that occur. It might take a few minutes to get a clear picture of exactly what's going on.

Figure 3. The workflow in a sequence diagram
A sequence diagram of the example application workflow.

With the three classes built, our example application is ready to roll. We can now focus on seeing how each of the GAE storage options interacts with the application workflow, starting with Bigtable.


GAE storage option #1: Bigtable

Google's GAE documentation describes Bigtable as a sharded, sorted array, but I find it easier to think of it as a gigantic hashtable chunked out across a bazillion servers. Like a relational database, Bigtable has datatypes. In fact, both Bigtable and relational databases use the blob type to store binaries.
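The hashtable analogy can be made concrete with a toy model in plain Java. This is an analogy aid with invented names, not a description of Bigtable's actual architecture:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the datastore-as-hashtable analogy. Each "entity" is a
// bag of properties, filed under a key combining kind and id, and
// looked up in constant time. Purely illustrative.
public class ToyDatastore {
    private final Map<String, Map<String, Object>> table =
        new HashMap<String, Map<String, Object>>();

    public void put(String kind, long id, Map<String, Object> entity) {
        table.put(kind + ":" + id, entity);
    }

    public Map<String, Object> get(String kind, long id) {
        return table.get(kind + ":" + id);
    }
}
```

The real system shards that one big map across many machines, but the programming model we'll see through Objectify stays close to this put/get picture.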

Don't confuse the blob type with Blobstore — that's GAE's other key-value store, which we'll explore next.

Working with blobs in Bigtable is most convenient because they're loaded alongside other fields, making them immediately available. The one big caveat is that blobs can't be larger than 1MB, although that restriction might be relaxed in the future. You'd be hard-pressed to find a digital camera nowadays that takes pictures smaller than that, so using Bigtable could present a drawback for any use case that involves images (as our example application does). If you're okay with the 1MB rule for now, or if you're storing something smaller than images, then Bigtable could be a good choice: of the three GAE storage alternatives, it's the easiest to work with.
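Because the limit is easy to hit, a cheap defensive check before persisting is worthwhile. The guard class below is my own invention, and it assumes the cap is exactly 1,024 * 1,024 bytes:

```java
// Hypothetical guard against oversized blob properties. The exact
// limit is assumed here to be 1MB, per the cap described above.
public class BlobSizeGuard {
    private static final int MAX_BLOB_BYTES = 1024 * 1024;  // assumed 1MB cap

    public static boolean fitsInDatastore(byte[] data) {
        return data != null && data.length <= MAX_BLOB_BYTES;
    }
}
```

A servlet could call fitsInDatastore before dao.save and show a friendly "please resize your image" message instead of letting the datastore throw.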

Before we can upload data to Bigtable, we'll need to create an upload form. Then we'll work through the servlet implementation, which consists of three abstract methods customized for Bigtable. Finally, we'll implement error handling, because the 1MB limit is easy for people to break.

Create the upload form

Figure 4 shows the upload form for Bigtable:

Figure 4. An upload form for Bigtable
Screenshot of an upload form for digital images.

To create this form, start with a file called datastore.jsp, then plug in the block of code in Listing 6:

Listing 6. The custom upload form
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body>		
        <form method="POST" enctype="multipart/form-data">
            <table>
                <tr>	
                    <td>Title</td>
                    <td><input type="text" name="title" /></td>
                </tr>
                <tr>	
                    <td>Caption</td>
                    <td><input type="text" name="caption" /></td>
                </tr>
                <tr>	
                    <td>Upload</td>
                    <td><input type="file" name="file" /></td>
                </tr>
                <tr>
                    <td colspan="2"><input type="submit" /></td>
                </tr>				
            </table>
        </form>
    </body>	
</html>

The form must have its method attribute set to POST and its enctype set to multipart/form-data. Because no action attribute is specified, the form submits to itself. By POSTing, we end up in AbstractUploadServlet's doPost, which in turn calls handleSubmit.
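If you've never looked at what the browser actually sends for such a form, the sketch below assembles a simplified multipart/form-data body by hand. The helper and boundary are hypothetical, and a real request also carries a Content-Type header naming the boundary plus the binary file part:

```java
// Sketch of the text portion of a multipart/form-data request body,
// like the one the form in Listing 6 produces. Simplified: the file
// part and its Content-Type sub-header are omitted.
public class MultipartSketch {
    public static String body(String boundary, String title, String caption) {
        String crlf = "\r\n";
        StringBuilder sb = new StringBuilder();
        sb.append("--").append(boundary).append(crlf)
          .append("Content-Disposition: form-data; name=\"title\"").append(crlf)
          .append(crlf).append(title).append(crlf)
          .append("--").append(boundary).append(crlf)
          .append("Content-Disposition: form-data; name=\"caption\"").append(crlf)
          .append(crlf).append(caption).append(crlf)
          .append("--").append(boundary).append("--").append(crlf);  // closing boundary
        return sb.toString();
    }
}
```

Each field arrives as its own boundary-delimited part, which is why the servlet we build next walks the parts one at a time instead of calling getParameter.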

We've got the form, so let's move on to the servlet behind it.

Uploading to and from Bigtable

Here we implement the three methods in turn. One displays the form we just created and another processes its uploads. The last method serves the uploads back to us, just so you can see how that's done.

The servlet uses the Apache Commons FileUpload library. Download it and its dependencies and include them in your project. When that's done, hammer out the stub in Listing 7:

Listing 7. DatastoreUploadServlet
import info.johnwheeler.gaestorage.core.*;
import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import org.apache.commons.fileupload.*;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.fileupload.util.Streams;

@SuppressWarnings("serial")
public class DatastoreUploadServlet extends AbstractUploadServlet {
    private PhotoDao dao = new PhotoDao();
}

There's nothing too exciting going on here yet. We import classes we need and construct a PhotoDao to use later. DatastoreUploadServlet won't compile until we implement the abstract methods. Let's step through each one starting with showForm in Listing 8:

Listing 8. showForm
@Override
protected void showForm(HttpServletRequest req, HttpServletResponse resp) 
    throws ServletException, IOException {
    req.getRequestDispatcher("datastore.jsp").forward(req, resp);        
}

As you can see, showForm simply forwards to our upload form. handleSubmit, shown in Listing 9, is more involved:

Listing 9. handleSubmit
@Override
protected void handleSubmit(HttpServletRequest req,
    HttpServletResponse resp) throws ServletException, IOException {
    ServletFileUpload upload = new ServletFileUpload();

    try {
        FileItemIterator it = upload.getItemIterator(req);

        Photo photo = new Photo();

        while (it.hasNext()) {
            FileItemStream item = it.next();
            String fieldName = item.getFieldName();
            InputStream fieldValue = item.openStream();

            if ("title".equals(fieldName)) {
                photo.setTitle(Streams.asString(fieldValue));
                continue;
            }

            if ("caption".equals(fieldName)) {
                photo.setCaption(Streams.asString(fieldValue));
                continue;
            }

            if ("file".equals(fieldName)) {
                photo.setContentType(item.getContentType());
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                Streams.copy(fieldValue, out, true);
                photo.setPhotoData(out.toByteArray());
                continue;
            }
        }

        dao.save(photo);
        resp.sendRedirect("datastore?action=display&id=" + photo.getId());            
    } catch (FileUploadException e) {
        throw new ServletException(e);
    }        
}

That's a long stretch of code, but what it does is simple. The handleSubmit method streams the upload form's request body, extracting each form value into a FileItemStream. Meanwhile, a Photo is built up one piece at a time. It's a bit clumsy to roll through each field and check what's what, but that's how it's done with the streaming API.

Getting back to the code, when we land on the file field, a ByteArrayOutputStream collects the uploaded bytes into photoData. Lastly, we save the Photo with PhotoDao and send a redirect, which lands us in our final abstract method, showRecord, in Listing 10:

Listing 10. showRecord
@Override
protected void showRecord(long id, HttpServletRequest req, 
    HttpServletResponse resp) throws ServletException, IOException {
    Photo photo = dao.findById(id);
        
    resp.setContentType(photo.getContentType());        
    resp.getOutputStream().write(photo.getPhotoData());
    resp.flushBuffer();                    
}

showRecord looks up a Photo and sets a content-type header before writing the photoData byte array directly to the HTTP response. flushBuffer forces any remaining content out to the browser.

The last thing we need to do is add some error-handling code for uploads larger than 1MB.

Displaying an error message

As previously mentioned, Bigtable imposes a 1MB limit that most image-centric use cases will quickly run into. At best, we can tell users to resize their images and try again. For demonstration purposes, the code in Listing 11 simply displays an exception message when a GAE exception is thrown. (Note that this is standard servlet-spec error handling and not specific to GAE.)

Listing 11. An error has occurred
import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;

@SuppressWarnings("serial")
public class ErrorServlet extends HttpServlet {
    @Override
    protected void service(HttpServletRequest req, HttpServletResponse res) 
        throws ServletException, IOException {
        String message = (String)   
            req.getAttribute("javax.servlet.error.message");
        
        PrintWriter out = res.getWriter();
        out.write("<html>");
        out.write("<body>");
        out.write("<h1>An error has occurred</h1>");                
        out.write("<br />" + message);        
        out.write("</body>");
        out.write("</html>");
    }
}

Don't forget to register ErrorServlet in web.xml, along with the other servlets we'll create throughout this article. The code in Listing 12 registers an error page that points back to ErrorServlet:

Listing 12. Registering the error
<servlet>
    <servlet-name>errorServlet</servlet-name>	  
    <servlet-class>
        info.johnwheeler.gaestorage.servlet.ErrorServlet
    </servlet-class>
</servlet>

<servlet-mapping>
    <servlet-name>errorServlet</servlet-name>
    <url-pattern>/error</url-pattern>
</servlet-mapping>

<error-page>
    <error-code>500</error-code>
    <location>/error</location>
</error-page>

That wraps up this quick introduction to Bigtable, also known as the GAE datastore. Bigtable is probably the most intuitive of the GAE storage options, but its downside is file size: At 1MB per file, you probably don't want to use it for anything bigger than a thumbnail — if that. Next up is Blobstore, another key-value storage option that can save and serve files up to 2GB in size.


GAE storage option #2: Blobstore

Blobstore has a size advantage over Bigtable but it's not without problems of its own: namely, the fact that it forces the use of a one-time upload URL that's hard to build web services around. Here's an example of what one looks like:

/_ah/upload/aglub19hcHBfaWRyGwsSFV9fQmxvYlVwbG9hZFNlc3Npb25fXxh9DA

Web service clients must ask for the URL before POSTing to it, which results in an extra call across the wire. That might not be a big deal in many applications, but it's less than elegant. It also could be prohibitive when the client itself runs on GAE, where CPU hours are billable. If you're thinking you can get around these issues by building a servlet that forwards uploads to the one-time URL via URLFetch, think again: URLFetch has a 1MB transfer restriction, so you might as well use Bigtable if you're going in that direction. As a frame of reference, Figure 5 shows the difference between a one-pronged and a two-pronged web service call:

Figure 5. The difference between a one-pronged and two-pronged web service call
A graphic showing image files moving between a web service client and Blobstore (which is two-pronged) versus Bigtable (which is one-pronged).

Blobstore has its pros and cons, and you'll see more of them for yourself in the next sections. We'll once again build an upload form and implement the three abstract methods supplied by AbstractUploadServlet, but this time we'll tailor our code to Blobstore.

An upload form for Blobstore

There's not much to repurposing our upload form for Blobstore: just copy datastore.jsp to a file named blobstore.jsp, then make the changes shown in Listing 13 (the scriptlet at the top and the form's action attribute are new):

Listing 13. blobstore.jsp
<% String uploadUrl = (String) request.getAttribute("uploadUrl"); %><html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body>		
        <form method="POST" action="<%= uploadUrl %>"
            enctype="multipart/form-data">
		<!-- labels and fields omitted -->
        </form>
    </body>	
</html>

The one-time upload URL is generated in a servlet, which we'll code up next. Here, that URL is parsed off the request and placed into the form's action attribute. We have no control over the Blobstore servlet we're uploading to, so how are we going to get the other form values? The answer is that the Blobstore API has a callback mechanism. We pass a callback URL to the API when the one-time URL is generated. After the upload, Blobstore invokes the callback, passing over the original request along with any uploaded blobs. You'll see all this in action as we implement AbstractUploadServlet next.

Uploading to Blobstore

Start by using Listing 14 as a reference to stub out a class called BlobstoreUploadServlet, which extends AbstractUploadServlet:

Listing 14. BlobstoreUploadServlet
import info.johnwheeler.gaestorage.core.*;
import java.io.IOException;
import java.util.Map;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import com.google.appengine.api.blobstore.*;

@SuppressWarnings("serial")
public class BlobstoreUploadServlet extends AbstractUploadServlet {
    private BlobstoreService blobService = 
        BlobstoreServiceFactory.getBlobstoreService();
    private PhotoDao dao = new PhotoDao();
}

The initial class definition is similar to what we did for DatastoreUploadServlet, but with the addition of a BlobstoreService variable. That's what generates the one-time URL in showForm in Listing 15:

Listing 15. showForm for blobstore
@Override
protected void showForm(HttpServletRequest req, HttpServletResponse resp) 
    throws ServletException, IOException {
    String uploadUrl = blobService.createUploadUrl("/blobstore");
    req.setAttribute("uploadUrl", uploadUrl);
    req.getRequestDispatcher("blobstore.jsp").forward(req, resp);
}

The code in Listing 15 creates an upload URL and sets it on the request, then forwards to the form created in Listing 13, where the upload URL is expected. The callback path passed to createUploadUrl, /blobstore, is this servlet's own mapping as defined in web.xml. That way, when Blobstore POSTs back, we land in handleSubmit, shown in Listing 16:

Listing 16. handleSubmit for Blobstore
@Override
protected void handleSubmit(HttpServletRequest req, 
    HttpServletResponse resp) throws ServletException, IOException {
    Map<String, BlobKey> blobs = blobService.getUploadedBlobs(req);
    BlobKey blobKey = blobs.get(blobs.keySet().iterator().next());

    String photoPath = blobKey.getKeyString();
    String title = req.getParameter("title");
    String caption = req.getParameter("caption");
    
    Photo photo = new Photo(title, caption);
    photo.setPhotoPath(photoPath);
    dao.save(photo);

    resp.sendRedirect("blobstore?action=display&id=" + photo.getId());
}

getUploadedBlobs returns a Map of BlobKeys. Because our upload form supports a single upload, we get the one-and-only BlobKey we expect, and stuff a string representation of it into the photoPath variable. Afterward, the rest of the fields are parsed into variables and set on a new Photo instance. The instance is then saved to the datastore before redirecting to showRecord in Listing 17:

Listing 17. showRecord for blobstore
@Override
protected void showRecord(long id, HttpServletRequest req, 
    HttpServletResponse resp) throws ServletException, IOException {
    Photo photo = dao.findById(id);
    String photoPath = photo.getPhotoPath();

    blobService.serve(new BlobKey(photoPath), resp);
}

In showRecord, the Photo we just saved in handleSubmit is reloaded from the datastore. The actual bytes of whatever was uploaded aren't stored in the bean as they were with Bigtable. Instead, a BlobKey is rebuilt from photoPath and used to serve the image from Blobstore to the browser.

Blobstore makes working with form-based uploads a snap, but web service-based uploads are a different story. Next, we'll check out Google Storage for Developers, which meets us with the exact opposite conundrum: Form-based uploads require a bit of hacking while service-based uploads are easy.


GAE storage option #3: Google Storage

Google Storage for Developers is the most powerful of the three GAE storage options, and it's easy to use once you've cleared a few things out of the way. Google Storage has a lot in common with Amazon S3; in fact, both use the same protocol and have the same RESTful interface, so libraries made to work with S3, such as JetS3t, also work with Google Storage. Unfortunately, as of this writing, these libraries don't work reliably on Google App Engine because they perform unpermitted operations such as spawning threads. So, for the moment, we're left to work with the RESTful interface and do some of the heavy lifting these APIs would otherwise do.

Google Storage is worth the trouble, mainly because it supports powerful access controls through access control lists (ACLs). With ACLs, it's possible to grant read-only and read-write access to objects, so you can easily make photos public or private, like they are on Facebook and Flickr. ACLs are outside the scope of this article, so everything we'll upload will be granted public, read-only access. See the Google Storage online documentation (in Resources) to learn more about ACLs.

About Google Storage

Released as a preview edition in May 2010, Google Storage for Developers is currently available only to a limited number of developers in the United States, and there's a waiting list for the preview. Being still in its infancy, Google Storage also poses some implementation challenges, which I work around in this section. Having no clear-cut integration path between Google Storage and GAE means extra coding, but for some use cases — such as those requiring access control — that proves worthwhile. I hope we'll see integration libraries in the near future.

Unlike Blobstore, Google Storage works out of the box with both web service and browser clients. Data is sent through either a RESTful PUT or a POST. The first option is for web service clients that can control how requests are structured and how headers are written. The second option, which we'll explore here, is for browser-based uploads. We'll need a JavaScript hack to process the upload form, which presents some complications, as you'll see.

Hacking the Google Storage upload form

Unlike Blobstore, Google Storage doesn't forward to a callback URL after it's been POSTed to. Instead, it issues a redirect to a URL we specify. This presents a problem because form values aren't carried over the redirect. The way to get around this is by creating two forms in the same web page — one containing the title and caption text fields, the other with the file upload field and required Google Storage parameters. We'll then use Ajax to submit the first form. When the Ajax callback is invoked, we'll submit the second upload form.

Because this form is more complicated than the last two, we'll construct it step-by-step. First, we extract a few values that are set by a forwarding servlet that hasn't been built yet, shown in Listing 18:

Listing 18. Extracting form values
<% 
String uploadUrl = (String) request.getAttribute("uploadUrl");
String key = (String) request.getAttribute("key");
String successActionRedirect = (String) 
    request.getAttribute("successActionRedirect");
String accessId = (String) request.getAttribute("GoogleAccessId");
String policy = (String) request.getAttribute("policy");
String signature = (String) request.getAttribute("signature");
String acl = (String) request.getAttribute("acl");
%>

The uploadUrl holds Google Storage's REST endpoint. The API provides the two forms shown below. Either is acceptable, but we're responsible for replacing the bucket and object components with our own values:

  • bucket.commondatastorage.googleapis.com/object
  • commondatastorage.googleapis.com/bucket/object

The remaining variables are required Google Storage parameters:

  • key: The name of the uploaded data on Google Storage.
  • success_action_redirect: Where to redirect once the upload is complete.
  • GoogleAccessId: A Google-assigned API key.
  • policy: A base 64-encoded JSON string constraining how data is uploaded.
  • signature: The policy signed with a hashing algorithm and base 64 encoded. Used for authentication.
  • acl: An access control list specification.
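To make the policy and signature parameters concrete, here's a self-contained sketch of the encode-and-sign step using only JDK classes. The policy contents, the secret, and the choice of HMAC-SHA1 are illustrative assumptions; consult the Google Storage documentation for the exact policy grammar and signing rules your account requires. (Also note java.util.Base64 requires Java 8 or later; on the JDK 5/6 setups this article targets you'd need a third-party encoder.)

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of producing the policy and signature form fields: base64-encode
// the policy JSON, then sign the encoded policy with HMAC-SHA1 and
// base64-encode the result. The details are illustrative assumptions.
public class PolicySigner {
    public static String encodePolicy(String policyJson) {
        return Base64.getEncoder()
            .encodeToString(policyJson.getBytes(StandardCharsets.UTF_8));
    }

    public static String sign(String encodedPolicy, String secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(encodedPolicy.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }
}
```

A forwarding servlet would compute both strings and set them as the policy and signature request attributes the JSP in Listing 18 reads.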

Two forms and a submit button

The first form in Listing 19 contains just title and caption fields. The surrounding <html> and <body> tags have been omitted.

Listing 19. The first upload form
<form id="fieldsForm" method="POST">
    <table>
        <tr>	
            <td>Title</td>
            <td><input type="text" name="title" /></td>
        </tr>
        <tr>	
            <td>Caption</td>
            <td>
                <input type="hidden" name="key" value="<%= key %>" />	
                <input type="text" name="caption" />
            </td>
        </tr>			
    </table>		
</form>

There's not much to say about this form except that it POSTs to itself. Let's move on to the form in Listing 20, which is larger because it contains a half-dozen hidden input fields:

Listing 20. The second form with hidden fields
<form id="uploadForm" method="POST" action="<%= uploadUrl %>" 
    enctype="multipart/form-data">
    <table>
        <tr>
            <td>Upload</td>
            <td>
                <input type="hidden" name="key" value="<%= key %>" />
                <input type="hidden" name="GoogleAccessId" 
                    value="<%= accessId %>" />
                <input type="hidden" name="policy" 
                    value="<%= policy %>" />
                <input type="hidden" name="acl" value="<%= acl %>" />
                <input type="hidden" id="success_action_redirect" 
                    name="success_action_redirect" 
                    value="<%= successActionRedirect %>" />
                <input type="hidden" name="signature"
                    value="<%= signature %>" />
                <input type="file" name="file" />
            </td>
        </tr>
        <tr>
            <td colspan="2">
                <input type="button" value="Submit" id="button"/>
            </td>
        </tr>
    </table>
</form>

The values extracted in the JSP scriptlet (in Listing 18) are placed in the hidden fields. The file input is at the bottom. The submit button is a plain old button that won't do anything until we rig it with JavaScript, as shown in Listing 21:

Listing 21. Submitting the upload form
<script type="text/javascript" 
    src="https://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js">
</script>
<script type="text/javascript">
    $(document).ready(function() {
        $('#button').click(function() {
            var formData = $('#fieldsForm').serialize();
            var callback = function(photoId) {
                var redir = $('#success_action_redirect').val() +
                    photoId;
                $('#success_action_redirect').val(redir);
                $('#uploadForm').submit();
            };

            $.post("gstorage", formData, callback);
        });
    });
</script>

The JavaScript in Listing 21 is written with jQuery. Even if you haven't used the library, the code shouldn't be hard to follow. The first thing the code does is import jQuery. Then, a click listener is installed on the button so that, when the button is clicked, the first form is submitted via Ajax. From there, we land in the handleSubmit method of the servlet (which we'll build shortly), where a Photo is constructed and saved to the datastore. Finally, the new Photo ID is returned to the callback and appended to the URL in success_action_redirect before the upload form is submitted. That way, when we come back from the redirect, we can look up the Photo and display its image. Figure 6 shows the entire sequence of events:

Figure 6. A sequence diagram showing the JavaScript call path
A sequence diagram showing the JavaScript call path.

With the form taken care of, we need a utility class to create and sign policy documents. Then we can get to subclassing AbstractUploadServlet.

Creating and signing a policy document

Policy documents constrain uploads. For example, we might specify how large uploads can be or what types of files are acceptable, or we could even impose restrictions on file names. Publicly writable buckets don't require policy documents, but uploads to private buckets do, and Google Storage buckets are private by default. To set things in motion, stub out a utility class called GSUtils based on the code in Listing 22:

Listing 22. GSUtils
import java.io.UnsupportedEncodingException;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

import com.google.appengine.repackaged.com.google.common.util.Base64;

public final class GSUtils {

    private GSUtils() {
    }
}

Because utility classes are usually composed only of static methods, it's a good idea to make their default constructors private to prevent instantiation. With the class stubbed out, we can turn our attention to creating the policy document.

The policy document is JSON-formatted, but the JSON is simple enough that we don't have to resort to any fancy libraries. Instead, we can craft things by hand with a simple StringBuilder. First, we have to construct an ISO8601 date and set the policy document to expire by it. Uploads won't succeed once the policy document expires. Then, we have to put in the constraints we talked about earlier, which are called conditions in the policy document. Finally, the document is base-64 encoded and returned to the caller.
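Before the base-64 encoding, the finished document should look something like this (the expiration timestamp and acl value are illustrative):

```json
{"expiration": "2010-12-07T12:00:00Z",
 "conditions": [
   {"acl": "public-read"},
   ["starts-with", "$key", ""],
   ["starts-with", "$success_action_redirect", ""]
 ]}
```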

Add the method in Listing 23 to GSUtils:

Listing 23. Creating a policy document
public static String createPolicyDocument(String acl) {
    GregorianCalendar gc = new GregorianCalendar();
    gc.add(Calendar.MINUTE, 20);

    DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    df.setTimeZone(TimeZone.getTimeZone("GMT"));
    String expiration = df.format(gc.getTime());

    StringBuilder buf = new StringBuilder();
    buf.append("{\"expiration\": \"");
    buf.append(expiration);
    buf.append("\"");
    buf.append(",\"conditions\": [");
    buf.append("{\"acl\": \"");
    buf.append(acl);
    buf.append("\"}");
    buf.append(",[\"starts-with\", \"$key\", \"\"]");
    buf.append(",[\"starts-with\", \"$success_action_redirect\", \"\"]");
    buf.append("]}");

    return Base64.encode(buf.toString().replaceAll("\n", "").getBytes());
}

We use a GregorianCalendar set 20 minutes into the future to construct the expiration date; uploads won't succeed once the policy document expires. The string-building code is kludgy, so it helps to print the document to the console and run it through a tool like JSONLint (see Resources) to verify the JSON. Next, note that we pass the acl into the policy document rather than hardcoding it; anything variable should come in as a method argument, as acl does. Finally, the document is base-64 encoded before it's returned to the caller. See the Google Storage documentation for more information about what's allowed in a policy document.

Google's Secure Data Connector

We won't work with Google's Secure Data Connector in this article but it's worth checking out if you're planning to use Google Storage. SDC makes it easier to access data on your own systems even if those systems are behind a firewall.

Authentication in Google Storage

Policy documents serve two functions. Aside from enforcing constraints, they're the basis of the signatures we generate to authenticate uploads. When we sign up for Google Storage, we're given a secret key that only we and Google know. We sign the policy document on our side with the secret key, and Google computes the same signature with its copy of the key. If the two signatures match, the upload is permitted. Figure 7 offers a better picture of this cycle:

Figure 7. How uploads are authenticated to Google Storage
A diagram of the GAE authentication cycle.

In order to generate a signature, we use the javax.crypto and java.security packages we imported while stubbing out GSUtils. Listing 24 shows the methods:

Listing 24. Signing a policy document
public static String signPolicyDocument(String policyDocument,
    String secret) {
    try {
        Mac mac = Mac.getInstance("HmacSHA1");
        byte[] secretBytes = secret.getBytes("UTF8");
        SecretKeySpec signingKey = 
            new SecretKeySpec(secretBytes, "HmacSHA1");
        mac.init(signingKey);
        byte[] signedSecretBytes = 
            mac.doFinal(policyDocument.getBytes("UTF8"));
        String signature = Base64.encode(signedSecretBytes);
        return signature;
    } catch (InvalidKeyException e) {
        throw new RuntimeException(e);
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException(e);
    } catch (UnsupportedEncodingException e) {
        throw new RuntimeException(e);
    }
}

Secure hashing in Java code involves some rigmarole that I'd rather skip over in this article. What matters is that Listing 24 shows how it's done properly, and that the hash must be base-64 encoded before it's returned.
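To see the signing step in isolation, here's a minimal, self-contained sketch. It uses java.util.Base64 (available since Java 8) in place of GAE's repackaged Base64 class, and the policy string and secret are made-up placeholder values:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.Base64;

public class SignDemo {

    // Sign an already base-64-encoded policy document with HMAC-SHA1.
    static String sign(String policyDocument, String secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret.getBytes("UTF-8"), "HmacSHA1"));
        byte[] raw = mac.doFinal(policyDocument.getBytes("UTF-8"));
        // HMAC-SHA1 always yields 20 bytes, so the base-64 result is 28 characters
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) throws Exception {
        String fakePolicy = "eyJleHBpcmF0aW9uIjogLi4ufQ=="; // placeholder policy
        String signature = sign(fakePolicy, "my-secret-key");
        System.out.println(signature.length()); // prints 28
    }
}
```

The same input and key always produce the same signature, which is exactly what lets Google verify the upload on its side.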

With those prerequisites taken care of, we're back in familiar territory: implementing the three abstract methods to upload and retrieve files from Google Storage.

Uploading to Google Storage

Start by stubbing out a class called GStorageUploadServlet based on the code in Listing 25:

Listing 25. GStorageUploadServlet
import info.johnwheeler.gaestorage.core.GSUtils;
import info.johnwheeler.gaestorage.core.Photo;
import info.johnwheeler.gaestorage.core.PhotoDao;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.UUID;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@SuppressWarnings("serial")
public class GStorageUploadServlet extends AbstractUploadServlet {
    private PhotoDao dao = new PhotoDao();
}

The showForm method, shown in Listing 26, sets up the parameters we need to pass to Google Storage through the upload form:

Listing 26. showForm for Google Storage
@Override
protected void showForm(HttpServletRequest req, HttpServletResponse resp) 
    throws ServletException, IOException {
    String acl = "public-read";
    String secret = getServletConfig().getInitParameter("secret");
    String accessKey = getServletConfig().getInitParameter("accessKey");
    String endpoint = getServletConfig().getInitParameter("endpoint");
    String successActionRedirect = getBaseUrl(req) + 
        "gstorage?action=display&id=";
    String key = UUID.randomUUID().toString();
    String policy = GSUtils.createPolicyDocument(acl);
    String signature = GSUtils.signPolicyDocument(policy, secret);

    req.setAttribute("uploadUrl", endpoint);
    req.setAttribute("acl", acl);
    req.setAttribute("GoogleAccessId", accessKey);
    req.setAttribute("key", key);
    req.setAttribute("policy", policy);
    req.setAttribute("signature", signature);
    req.setAttribute("successActionRedirect", successActionRedirect);

    req.getRequestDispatcher("gstorage.jsp").forward(req, resp);
}

Note that the acl is set to public-read, so anything uploaded will be viewable by everyone. The next three variables, secret, accessKey, and endpoint, are used to reach and authenticate with Google Storage. They're pulled out of init-params declared in web.xml; see the sample code for details. Recall that, unlike Blobstore, which forwards to a URL that places us in showRecord, Google Storage issues a redirect. The redirect URL is stored in successActionRedirect, which relies on the helper method in Listing 27 to construct its base:

Listing 27. getBaseUrl()
private static String getBaseUrl(HttpServletRequest req) {
    String base = req.getScheme() + "://" + req.getServerName() + ":" + 
        req.getServerPort() + "/";
    return base;
}

The helper method inspects the incoming request to construct the base URL before relinquishing control back to showForm. Upon return, a key is created from a universally unique identifier (UUID), a String that's effectively guaranteed to be unique. Next, policy and signature are generated with the utility class we built. Finally, we set request attributes for the JSP before forwarding to it.
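The init-params read in showForm are declared in web.xml. A sketch might look like the following; the param names match Listing 26, but the servlet class name and all values are hypothetical placeholders:

```xml
<servlet>
    <servlet-name>gstorage</servlet-name>
    <!-- hypothetical fully qualified class name -->
    <servlet-class>info.johnwheeler.gaestorage.web.GStorageUploadServlet</servlet-class>
    <init-param>
        <param-name>secret</param-name>
        <param-value>YOUR_GOOGLE_STORAGE_SECRET</param-value>
    </init-param>
    <init-param>
        <param-name>accessKey</param-name>
        <param-value>YOUR_GOOGLE_ACCESS_ID</param-value>
    </init-param>
    <init-param>
        <param-name>endpoint</param-name>
        <param-value>http://commondatastorage.googleapis.com/your-bucket/</param-value>
    </init-param>
</servlet>
<servlet-mapping>
    <servlet-name>gstorage</servlet-name>
    <url-pattern>/gstorage</url-pattern>
</servlet-mapping>
```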

Listing 28 shows handleSubmit:

Listing 28. handleSubmit for Google Storage
@Override
protected void handleSubmit(HttpServletRequest req, HttpServletResponse 
    resp) throws ServletException, IOException {
    String endpoint = getServletConfig().getInitParameter("endpoint");
    String title = req.getParameter("title");
    String caption = req.getParameter("caption");
    String key = req.getParameter("key");

    Photo photo = new Photo(title, caption);
    photo.setPhotoPath(endpoint + key);
    dao.save(photo);

    PrintWriter out = resp.getWriter();
    out.println(Long.toString(photo.getId()));
    out.close();
}

Remember, when the first form is submitted, we're put in handleSubmit by an Ajax POST. The upload itself isn't handled there but separately in the Ajax callback. handleSubmit just parses the first form, constructs a Photo, and saves it to the datastore. Then, the Photo's ID is returned to the Ajax callback by writing it out to the response body.

In the callback, the upload form is submitted to the Google Storage endpoint. Once Google Storage processes the upload, it's set up to issue a redirect back to showRecord, in Listing 29:

Listing 29. showRecord for Google Storage
@Override
protected void showRecord(long id, HttpServletRequest req, 
    HttpServletResponse resp) throws ServletException, IOException {
    Photo photo = dao.findById(id);
    String photoPath = photo.getPhotoPath();
    resp.sendRedirect(photoPath);
}

showRecord looks up the Photo and redirects to its photoPath. photoPath points to our image hosted on Google's servers.


In conclusion

We've examined three Google-centric storage options and evaluated their pros and cons. Bigtable is easy to work with but imposes a 1MB size limit. Blobs in Blobstore can be up to 2GB apiece, but the one-time URL is a pain to work around in web services. Finally, Google Storage for Developers is the most robust option. We only pay for the storage we use, and the sky is the limit on how much data can be stored in a single file. Google Storage is also the most complex solution to work with, however, because its libraries currently don't support GAE. Supporting browser-based uploads also isn't the most straightforward thing in the world.

As Google App Engine becomes a more popular development platform for Java developers, understanding its various storage options is key. In this article, you've walked through simple implementation examples for Bigtable, Blobstore, and Google Storage for Developers. Whether you settle on one storage option and stick with it, or use each one for different use cases, you should now have the tools you need to store mountains of data on GAE.


Download

Sample code for this article: j-gaestorage.zip (12KB)

Resources

