Amazon S3 vs RESTful Collections
By pmuellr. Current version: Oct 16, 2006 12:53 (compared with version 5, Oct 16, 2006 08:37).


 Let's start with some references:
  
 h2. Amazon S3
  
 This is a service that Amazon provides to store data. It has both RESTy and SOAPy interfaces. The entry point in Amazon for info on S3 is here: [http://aws.amazon.com/s3]. The current reference docs are here: [http://developer.amazonwebservices.com/connect/entry.jspa?externalID=123&categoryID=48].
  
 h2. RESTful Collections
  
 [Joe Gregorio|http://bitworking.org/] has been blogging on this, and here's a good primer on what this is all about: [http://bitworking.org/news/wsgicollection].
  
 h1. What are we doing here?
  
 I'm a big believer in RESTy web services, and the collections pattern as described by Joe seems like a good stab at applying some order in the otherwise unordered world of REST.
  
 Amazon S3 provides a lot of the same functionality as RESTful collections, and it's a real, live, commercial ($$$) implementation, backed by a pretty stable company.
  
 I thought it would be interesting to compare and contrast these.
  
 h1. Is Amazon S3 already a RESTful Collection?
  
 Here are some brief details on the S3 service, if you don't already know about it.
  
 * each account holder can create buckets, which are used to hold objects
  
 * a bucket may contain an unlimited number of objects; each object is stored as a binary blob and may be from 1 byte to 5 gigabytes in size
  
 * buckets and objects are named, and are combined to form a URI at which to access a resource (see table below).
  
 Let's look at the operations you can perform against your S3 account. I will express these as the HTTP verb and the URI pattern it is invoked against.
  
 | GET / | get a list of all your buckets |
 | PUT /\{bucket\} | create a new bucket |
 | GET /\{bucket\} | list the contents of the bucket |
 | DELETE /\{bucket\} | delete the bucket |
 | PUT /\{bucket\}/\{object\} | create/update an object |
 | GET /\{bucket\}/\{object\} | get the contents of the object |
 | DELETE /\{bucket\}/\{object\} | delete the object |
  
 Compare this to the same table for RESTful collections.
  
 | GET /\{collection\} | get a list of the objects in the collection |
 | POST /\{collection\} | create a new object in a collection |
 | GET /\{collection\}/\{object\} | get the contents of an object |
 | PUT /\{collection\}/\{object\} | update the contents of an object |
 | DELETE /\{collection\}/\{object\} | delete the object |
  
 S3 adds:
  
 * getting a list of your collections (buckets)
 * creating a new collection (bucket)
 * deleting a collection (bucket)
  
 And the big difference is the different pattern for object creation.
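To make the comparison concrete, here is a minimal sketch that treats the two tables above as sets of (verb, URI pattern) pairs and diffs them. The placeholder {{container}} is my own stand-in for both "bucket" and "collection"; nothing here talks to S3.

```python
# Operations from the two tables above, as (verb, URI pattern) pairs.
# "container" is a made-up placeholder covering both "bucket" and
# "collection", so the two tables can be compared directly.
S3_OPS = {
    ("GET", "/"),
    ("PUT", "/{container}"),
    ("GET", "/{container}"),
    ("DELETE", "/{container}"),
    ("PUT", "/{container}/{object}"),
    ("GET", "/{container}/{object}"),
    ("DELETE", "/{container}/{object}"),
}

COLLECTION_OPS = {
    ("GET", "/{container}"),
    ("POST", "/{container}"),
    ("GET", "/{container}/{object}"),
    ("PUT", "/{container}/{object}"),
    ("DELETE", "/{container}/{object}"),
}

# Set difference shows what each side has that the other lacks.
only_in_s3 = S3_OPS - COLLECTION_OPS
only_in_collections = COLLECTION_OPS - S3_OPS

for verb, pattern in sorted(only_in_s3):
    print("S3 only:         ", verb, pattern)
for verb, pattern in sorted(only_in_collections):
    print("Collections only:", verb, pattern)
```

The diff only compares verbs and patterns, so it hides one thing: {{PUT}} on an object means create or update in S3, but only update in the collections pattern.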
  
 h1. Different object creation pattern
  
 Note that the S3 object creation style differs from the RESTful collection style, which does a {{POST}} against a collection instead of a {{PUT}} against the object. There's something nice about both styles. The {{POST}} to the collection style is nice because it separates create from update, which I think is important. On the other hand, the {{PUT}} of the object with its name is nice because you don't need to depend on the server to tell you what the resulting URI is, via the {{Location:}} header in the response. How does the server determine that resulting URI, especially if you want it to be human readable, in the {{POST}} case?
  
 In the end, the {{PUT}} style seems a bit more symmetric to me. The update vs. create issue is real, though. The scenario is that two people try to create the same named object at the same time. With the {{PUT}} style, the first request operates as a create, the second as an update *against the same object*, and the system doesn't know any better. The object sent in the first {{PUT}} received is basically thrown away. In the {{POST}} style, two different objects are created.
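The scenario can be sketched with two toy in-memory stores, one per style. The class names, the {{item-N}} naming scheme, and the status codes are all illustrative assumptions, not anything taken from the S3 or RESTful collections documentation.

```python
import itertools


class PutStore:
    """PUT style: the client picks the name; a second PUT silently overwrites."""

    def __init__(self):
        self.objects = {}

    def put(self, name, body):
        created = name not in self.objects
        self.objects[name] = body
        # Status codes are illustrative only.
        return 201 if created else 200


class PostStore:
    """POST style: the server picks the name and reports it back."""

    def __init__(self):
        self.objects = {}
        self._ids = itertools.count(1)

    def post(self, body):
        name = f"item-{next(self._ids)}"  # hypothetical server-side naming
        self.objects[name] = body
        return 201, name  # name plays the role of the Location header


# Two people create "the same" object at about the same time.
put_store = PutStore()
first = put_store.put("shopping", "alice's list")
second = put_store.put("shopping", "bob's list")  # treated as an update

post_store = PostStore()
_, loc_a = post_store.post("alice's list")
_, loc_b = post_store.post("bob's list")

print(put_store.objects)   # only one object survives
print(post_store.objects)  # two distinct objects
```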
  
 In theory, this is solvable by using {{If-Match}} and {{If-None-Match}} headers on the {{PUT}}. Specifically, when you are attempting to create a new item, use the header
  
 {{If-None-Match: *}}
  
 This means that the {{PUT}} request should fail with a {{412 Precondition Failed}} response if the resource already exists.
  
 When you are attempting to update an existing item, you should first {{GET}} the item, and use the {{ETag}} returned in the response in the header
  
 {{If-Match: \[ETag returned from GET\]}}
  
 This means the {{PUT}} request should fail if the resource has been updated since you last performed the {{GET}}.
  
 A note in the S3 doc indicates the {{ETag}} is an MD5 of the content of the resource, so 'updated' in the sentence above implies the bits of the entity have changed. If for some reason the entity was updated with the exact same bits that it previously had, the {{ETag}} will not have changed, but then, that's probably ok anyway.
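Here is a rough sketch of how a server that honored these headers on {{PUT}} might behave, using an in-memory dictionary rather than S3. The {{ConditionalStore}} class and its method signatures are hypothetical; the only details taken from the text above are the {{412}} response and the MD5-based {{ETag}}.

```python
import hashlib


class ConditionalStore:
    """In-memory stand-in for a server honoring If-Match / If-None-Match on PUT."""

    def __init__(self):
        self._objects = {}

    @staticmethod
    def etag(body: bytes) -> str:
        # The S3 doc describes the ETag as an MD5 of the content.
        return '"' + hashlib.md5(body).hexdigest() + '"'

    def get(self, name):
        body = self._objects[name]
        return body, self.etag(body)

    def put(self, name, body, if_match=None, if_none_match=None):
        exists = name in self._objects
        # If-None-Match: * means "only create, never overwrite".
        if if_none_match == "*" and exists:
            return 412
        # If-Match: <etag> means "only update the version I last saw".
        if if_match is not None:
            if not exists or self.etag(self._objects[name]) != if_match:
                return 412
        self._objects[name] = body
        return 200


store = ConditionalStore()
create_1 = store.put("recipes/pie", b"v1", if_none_match="*")     # succeeds
create_2 = store.put("recipes/pie", b"other", if_none_match="*")  # already exists
_, tag = store.get("recipes/pie")
update_1 = store.put("recipes/pie", b"v2", if_match=tag)          # succeeds
update_2 = store.put("recipes/pie", b"v3", if_match=tag)          # tag is now stale
print(create_1, create_2, update_1, update_2)
```

Because the tag is derived from the content, re-sending identical bytes with a stale tag would still match, which is the "exact same bits" case described above.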
  
 Curiously, the S3 doc indicates that {{GET}} requests can use the various forms of the {{If-}} headers, but those headers aren't documented for {{PUT}}. Some experimentation here is required. If in fact S3 *does not* support the {{If-}} headers on {{PUT}}, then other measures will need to be taken to ensure objects are not inadvertently overwritten. Those other measures may include using the [Amazon Simple Queue Service (SQS)|http://aws.amazon.com/sqs].
  
 h1. More S3 issues
  
 The issues don't stop here. Here are some S3 constraints:
  
 * S3 account holders can only create up to 100 buckets
  
 * bucket names must be globally unique
  
 The first constraint means that we can't just use a bucket for every collection we might want to support, because we'll run out of buckets. Each bucket is more like a relational database than a table in a database, basically. The second constraint means that we can't use a 'nice' name for our buckets, because only one person will be able to create a bucket named 'recipes'.
  
 h1. Additional S3 capabilities
  
 Given what I've said, you might think that this is a bit hopeless as a general purpose RESTful collection. You're going to have odd constraints on the names of your collections, and you can't really have very many collections to start with. However, there are some additional capabilities in the 'list the contents of the bucket' functionality ({{GET /\{bucket\}}}) which point to a different approach.
  
 The additional capabilities are:
  
 * the ability to only list objects whose names begin with a specific prefix
 * the ability to 'roll up' a list of prefixes that are shared by a set of objects
  
 To describe these, let's say I create a bucket which contains a set of ToDo lists (like [Ta-Da Lists|http://www.tadalist.com/]). Each ToDo list is named with a 'simple' name (constrained to a path element; i.e., a directory name), and each ToDo item also has a 'simple' name, which is prefixed by the name of the ToDo list it's part of. We'll use directory-style naming, where the ToDo list and ToDo item are separated by a {{/}}. Here's an example of what 'list all the objects in my bucket' might return.
  
 {{home/pick-up-milk}}
 {{home/trim-nose-hair}}
 {{work/submit-tps-report}}
 {{work/beg-for-a-raise}}
  
 In this example, I have two ToDo lists, 'home' and 'work', each having two ToDo items.
  
 The first additional capability, only listing objects with a specific prefix, lets me issue a request like 'list all the objects starting with "home/"'. And I'd get
  
 {{home/pick-up-milk}}
 {{home/trim-nose-hair}}
  
 Nice. Except, before I even do that, how do I know that 'home' exists? This is what the 'rollup' capability does, which is called 'delimiter' access in the S3 docs. I can issue a request like 'list all the objects with delimiter "/"'. And I'd get
  
 {{home}}
 {{work}}
  
 Prefix and delimiter can work together for arbitrary depth 'trees'. Perhaps I'd like to be more elaborate and have ToDo lists that are 'current' and some for the 'future'. For example, a complete list of my bucket might be
  
 {{current/home/pick-up-milk}}
 {{current/home/trim-nose-hair}}
 {{current/work/submit-tps-report}}
 {{current/work/beg-for-a-raise}}
 {{future/retirement/buy-a-jet-car}}
 {{future/retirement/take-a-nap}}
  
 I can issue a request like 'list all objects prefixed with "current/" using delimiter "/"' and get back
  
 {{current/home}}
 {{current/work}}
  
 (aside: The names returned may not be what you are actually getting from S3, but you get the gist; I need to actually get off my duff and write some code to get the exact details.)
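Until that code exists, here is a minimal sketch of the prefix and delimiter semantics run against an in-memory list of names instead of S3. The function name {{list_keys}} is made up, and the sketch keeps the trailing delimiter on rolled-up names, which is how I understand S3 to report them; treat that as an assumption.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Return (objects, common_prefixes) for the keys starting with prefix.

    Keys that contain the delimiter after the prefix are rolled up into
    a single common prefix instead of being listed individually.
    """
    objects, common = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common:
                common.append(rolled)
        else:
            objects.append(key)
    return objects, common


BUCKET = [
    "current/home/pick-up-milk",
    "current/home/trim-nose-hair",
    "current/work/submit-tps-report",
    "current/work/beg-for-a-raise",
    "future/retirement/buy-a-jet-car",
    "future/retirement/take-a-nap",
]

top = list_keys(BUCKET, delimiter="/")
current = list_keys(BUCKET, prefix="current/", delimiter="/")
home = list_keys(BUCKET, prefix="current/home/")

print(top)      # rolled-up top-level names
print(current)  # rolled-up names under current/
print(home)     # plain prefix listing, no rollup
```

Walking the tree is then a matter of repeating the call with each returned common prefix as the next prefix.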
  
 In general, if you name your objects appropriately, you can use prefix and delimiter to do tree-based resource listing. Can you say [GET NEXT IN PARENT|http://publib.boulder.ibm.com/infocenter/dzichelp/topic/com.ibm.ims9.doc.apdb/p3hgnp.htm]? I thought you could.
  
 (aside: I've been having chats with some folks recently, pleasantly reminiscing about the old IMS DL/I hierarchical database days. It's coming back, I tell ya, it's coming back!)
  
 Additionally, the 'list the contents of the bucket' functionality includes some pretty nice pagination control, which is important when you are dealing with a lot of objects.
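As a rough illustration of what that pagination looks like from the client side, here is a sketch of marker-style paging over an in-memory list. The parameter names {{marker}} and {{max_keys}} are modeled on my understanding of the S3 listing options, and both helper functions are hypothetical; check the S3 reference for the exact request and response fields.

```python
def list_page(keys, marker="", max_keys=1000):
    """Return (page, is_truncated): up to max_keys keys sorting after marker."""
    remaining = [k for k in sorted(keys) if k > marker]
    page = remaining[:max_keys]
    return page, len(remaining) > max_keys


def list_all(keys, max_keys):
    """Collect every key by following pages; assumes max_keys >= 1."""
    marker, seen = "", []
    while True:
        page, truncated = list_page(keys, marker, max_keys)
        seen.extend(page)
        if not truncated:
            return seen
        # The last key of this page becomes the marker for the next one.
        marker = page[-1]


KEYS = [
    "home/pick-up-milk",
    "home/trim-nose-hair",
    "work/beg-for-a-raise",
    "work/submit-tps-report",
    "work/update-resume",
]

first_page, more = list_page(KEYS, max_keys=2)
everything = list_all(KEYS, max_keys=2)
print(first_page, more)
print(everything)
```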
  
 h1. Conclusion
  
 Amazon S3 provides a lot of the same functionality as described by RESTful collections, with some slightly different usage patterns (object creation), some practical limitations (the number of buckets, bucket names), and some additional capabilities (name-based searching in the collection). It's something I want to get started playing with, especially since it's pretty inexpensive to play, as long as you're not storing a lot of data or accessing it a lot.
  
 Two things that I didn't talk about, but are quite interesting, are the authentication / authorization story with S3, and the exception story. Go read up. Though the auth story seems a bit overkill, the exception story is something we should start talking about w/r/t RESTful collections.
  
 The {{If-}} header support on the {{PUT}} method used for object creation is something that I need to look into via some experiments. If it's not supported, I'll propose (forum?) that they add it. And figure out the work-around.
  
 One interesting exercise would be to implement the RESTful collections pattern on top of S3, which would presumably be a fairly thin veneer. One thing in particular that would need to happen is converting the XML returned by something like the 'list objects' functionality into other content-negotiated formats; JSON and HTML being the obvious examples, because I don't like XML, and I love JSON and HTML. :)
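A small sketch of that XML-to-JSON step, using only the Python standard library. The sample listing below is a simplified, made-up document, not a captured S3 response; real responses carry an XML namespace and more fields, so the tag lookups would need adjusting.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing; element names are assumptions.
SAMPLE_LISTING = """<ListBucketResult>
  <Name>example-bucket</Name>
  <Contents><Key>home/pick-up-milk</Key><Size>19</Size></Contents>
  <Contents><Key>work/submit-tps-report</Key><Size>42</Size></Contents>
</ListBucketResult>"""


def listing_to_json(xml_text):
    """Convert a bucket listing in XML into a JSON document."""
    root = ET.fromstring(xml_text)
    items = [
        {"key": c.findtext("Key"), "size": int(c.findtext("Size"))}
        for c in root.findall("Contents")
    ]
    return json.dumps({"bucket": root.findtext("Name"), "objects": items}, indent=2)


print(listing_to_json(SAMPLE_LISTING))
```

A veneer doing content negotiation would pick this conversion when the client's {{Accept}} header asks for JSON, and pass the XML through otherwise.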
  
 h1. Updates
  
  
  
 h2. 2006-10-16
  
 I've uploaded my little command-line S3 utility, here: [http://www.muellerware.org/projects/s3u/index.html].
  
 As an update to my claim that {{If-}} doesn't work with S3, [this thread|http://developer.amazonwebservices.com/connect/thread.jspa?threadID=10109&tstart=0] claims that it does work for {{HEAD}}, but not yet for {{PUT}}.
  
  
  
 h2. 2006-10-15
  
 So, my fears were right; S3 does not currently support the use of {{If-Match}} and {{If-None-Match}} for the 'put object' functionality. Here's a message flow using {{If-None-Match}} to try to enforce the {{PUT}} of an object which doesn't already exist.
  
 {noformat}
 -----------------------------------------------------
 Request
 -----------------------------------------------------
 PUT https://s3.amazonaws.com/org.muellerware.testing/a/a.html
 Server: s3.amazonaws.com
 Date: Mon, 16 Oct 2006 05:09:17 GMT
 Authorization: AWS xxxx:yyyy
 x-amz-acl: private
 If-None-Match: *
 Content-Length: 19
  
 -----------------------------------------------------
 Response
 -----------------------------------------------------
 HTTP/1.1 501 Not Implemented
 x-amz-request-id: xyz
 x-amz-id-2: xyz
 Content-Type: application/xml
 Transfer-Encoding: chunked
 Date: Mon, 16 Oct 2006 05:09:25 GMT
 Connection: close
 Server: AmazonS3
  
 <?xml version="1.0" encoding="UTF-8"?>
 <Error><Code>NotImplemented</Code><Message>A header you provided implies
 functionality that is not implemented</Message><RequestId>xyz
 </RequestId><Header>If-None-Match</Header><HostId>xyz</HostId></Error>
 {noformat}
  