Perl developers have a wonderful resource, the Comprehensive Perl Archive Network (CPAN). Amazon has a wonderful resource too: the Simple Storage Service (S3). Despite there being an official Amazon S3 Perl library (called "S3") on CPAN, there are at least five or six modules (though some are not entirely standalone) for talking to S3. It's as if there were a chili cook-off and instead of 200 people, 2,000 showed up.
Well, here I am the day after the cook-off, with a mop and a bucket, sharing the best recipes with you. These modules and tools are on CPAN, freely available, and you can start using them right away. The three recipes I'm recommending belong to these three modules: Net::Amazon::S3, Amazon::S3, and SOAP::Amazon::S3.
See the discussion of the benefits and drawbacks of S3 in the Cultured Perl: Perl and the Amazon cloud series of articles on developerWorks. Rehashing it here is not necessary, but I'll summarize briefly.
S3 is a storage service provided and managed by Amazon. Amazon charges users for access, so users don't have to worry about servers, backups, geographical bandwidth availability and distribution, etc. So S3 is, like most things in life, a convenience at a cost, and only you can decide if it's right for your business or personal use.
S3 data is organized by buckets (roughly linked to a domain name) and data items in those buckets. As a side note, it's very important to include the MIME type on upload or you'll get raw binary data back (this is especially unpleasant for images). The MIME type and other metadata can't be changed once written—you have to delete and recreate the item.
Before diving in, it's worth mentioning that this article is intended for intermediate-level Perl programmers. It does not explain basic Perl techniques in detail. You should also know what Amazon's S3 is and how to install a CPAN module. If not, consult the Resources section for background reading.
Using the three modules (Net::Amazon::S3, Amazon::S3, and SOAP::Amazon::S3), I explain how to perform a series of basic S3 operations and show the source code necessary, as follows:
- List, create, and delete buckets
- List, create, retrieve, and delete items in a bucket
- Get an item's metadata
To keep the tasks comparable among modules, I've reused the same command-line options and general structure.
The Net::Amazon::S3 module is a well-known module, well-supported and comprehensive. It has a few prerequisites but nothing as ridiculous as Moose, if you happen to be familiar with that.
We'll store the S3 keys in the environment as S3KEY and S3SECRET. That way we can access them from Perl as $ENV{S3KEY} and $ENV{S3SECRET}. Keeping the keys in the environment rather than hard-coding them in a script is a fairly safe way of protecting secret values (remember, if your S3 keys get stolen, you'll be paying for someone else's bandwidth).
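As a minimal sketch of that idea, the helper below reads both values from an environment hash and stops early with a clear message if either one is missing. The name load_credentials is hypothetical and is not part of any of the modules discussed here.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the S3 credentials out of an environment hash
# (normally %ENV) and fail early if either value is missing or empty.
sub load_credentials {
    my ($env) = @_;
    my @missing = grep { !defined $env->{$_} || $env->{$_} eq '' }
        qw(S3KEY S3SECRET);
    die "Missing environment variable(s): @missing\n" if @missing;
    return ( key => $env->{S3KEY}, secret => $env->{S3SECRET} );
}

# Typical use:
#   my %cred = load_credentials(\%ENV);
```

Passing the hash in as a parameter, instead of reading %ENV directly, makes the check easy to exercise without touching the real environment.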
All right, so Net::Amazon::S3 is installed. (You actually want the suggested Net::Amazon::S3::Client interface, as Net::Amazon::S3 is described as the "legacy interface," but Net::Amazon::S3::Client version 0.50 had a bug due to the Moose module that prevented me from using it at the time I wrote this article. They fixed this in version 0.51, and the current version is 0.52.) Anyway, on to the example. As usual, all the scripts are in the downloads section.
Listing 1. net-amazon-s3.pl preliminaries
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Getopt::Long;
use Net::Amazon::S3;
use MIME::Types;
my $mime = MIME::Types->new();
This is standard stuff, including Data::Dumper
for debugging purposes. A generic MIME object is created, although we will
only use it for uploads.
Listing 2. net-amazon-s3.pl options, helpers, and initial S3 connection
my %opts = (
key => $ENV{S3KEY},
secret => $ENV{S3SECRET},
separator => '/',
);
GetOptions(
\%opts,
"create|c=s",
"delete|d=s",
"list|l:s", # the parameter is optional
"keys|k",
"metadata|m",
"get=s",
"put=s",
"separator=s",
"help|h",
);
unless ($opts{key} && $opts{secret} )
{
die "$0 requires the S3KEY and S3SECRET environment variables to be set.";
}
# handle -h
usage() if exists $opts{help};
my $s3 = Net::Amazon::S3->new(
aws_access_key_id => $opts{key},
aws_secret_access_key => $opts{secret},
retry => 1,
);
die "Could not connect to S3" unless defined $s3;
sub read_filename
{
print "\nEnter filename: ";
my $name = <>;
chomp $name;
return $name;
}
sub usage
{
print <<EOHIPPUS;
$0 [OPTIONS]
Pass your S3 key and secret in the S3KEY and S3SECRET environment entries.
Options:
--help or -h : this help
--separator $opts{separator} : BUCKET and KEY separator character
(for --get and --put)
--create BUCKET (or -c BUCKET) : create BUCKET
--delete BUCKET (or -d BUCKET) : delete BUCKET
--delete BUCKET$opts{separator}KEY : delete KEY in BUCKET
--list [BUCKET] (or -l) : list a specific bucket or all buckets
--keys (or -k) : list the keys in each bucket (requires --list)
--metadata (or -m) : show the keys' metadata
(requires --keys and --list)
--get BUCKET$opts{separator}KEY : download KEY from BUCKET
--put BUCKET$opts{separator}KEY : upload a file to KEY in BUCKET
EOHIPPUS
exit 0;
}
More standard stuff. The options are set up here. The
usage() function is pretty comprehensive to
make the program useful. Every option is listed and explained briefly.
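The "list|l:s" specification deserves a closer look, because it explains why the later listings test the option with exists before looking at its value. The sketch below uses Getopt::Long's GetOptionsFromArray so the behavior can be tried without a real command line; parse_args is a hypothetical wrapper written only for this illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: parse a list of arguments the same way the script
# parses @ARGV, but only for the two options of interest here.
sub parse_args {
    my @args = @_;
    my %opts;
    GetOptionsFromArray( \@args, \%opts, 'list|l:s', 'keys|k' )
        or die "Bad options\n";
    return %opts;
}

my %all = parse_args('--list');              # no bucket name given
my %one = parse_args('--list', 'photos');    # bucket name given

# The option exists in both cases, but its value is an empty string
# (false) when no bucket was named.
for my $case ( [ all => \%all ], [ one => \%one ] ) {
    my ($label, $opts) = @$case;
    printf "%s: exists=%d value='%s'\n",
        $label, ( exists $opts->{list} ? 1 : 0 ), $opts->{list};
}
```

With an optional string value, an omitted argument is stored as an empty string, so exists tells you the switch was used and the truth of the value tells you whether a bucket was named.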
Listing 3. net-amazon-s3.pl create and delete operations
if (exists $opts{create})
{
my $bucket = $s3->add_bucket( { bucket => $opts{create}} )
or die sprintf ("%s: %s", $s3->err, $s3->errstr);
print "Created bucket '$opts{create}' successfully.\n";
}
elsif (exists $opts{delete})
{
my ($b, $key) = split $opts{separator}, $opts{delete};
my $bucket = $s3->bucket($b);
die "Could not retrieve bucket $b" unless $bucket;
if (defined $key)
{
$bucket->delete_key($key)
or die sprintf ("%s: %s", $s3->err, $s3->errstr);
print "Deleted key '$key' in bucket '$b' successfully.\n";
}
else
{
$bucket->delete_bucket()
or die sprintf ("%s: %s", $s3->err, $s3->errstr);
print "Deleted bucket '$b' successfully.\n";
}
}
The --create option is very simple. You just
create the bucket and return.
For --delete, handle either bucket deletion
(non-recursive so the user has to remove all the keys in the bucket first)
or key deletion within a bucket. The separator character comes into play
here as the splitting point between bucket name and key name.
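One thing to watch with the split in Listing 3: it has no limit, so a key that itself contains the separator (photos/2009/cat.jpg, say) loses everything after the second separator, and a separator such as "." is treated as a regular expression. The sketch below shows one possible way around both; split_target is a hypothetical helper, not part of the article's scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: split "BUCKET<sep>KEY" into its two parts.
# The limit of 2 keeps any further separators inside the key, and \Q...\E
# makes the separator a literal string rather than a regular expression.
sub split_target {
    my ($target, $separator) = @_;
    my ($bucket, $key) = split /\Q$separator\E/, $target, 2;
    return ($bucket, $key);
}

my ($bucket, $key) = split_target('photos/2009/cat.jpg', '/');
print "bucket=$bucket key=$key\n";
```

When the target holds only a bucket name, the key comes back undefined, which matches the defined $key test the delete handler already uses.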
Listing 4. net-amazon-s3.pl get and put operations
...
elsif (exists $opts{get})
{
my ($b, $key) = split $opts{separator}, $opts{get};
my $bucket = $s3->bucket($b);
die "Could not get the bucket $b" unless $bucket;
my $where = read_filename();
my $response = $bucket->get_key_filename( $key, 'GET', $where )
or die sprintf ("%s: %s", $s3->err, $s3->errstr);
die "Could not create file $where" unless -f $where;
print "Successfully downloaded $key from bucket $b into $where\n";
}
elsif (exists $opts{put})
{
my ($b, $key) = split $opts{separator}, $opts{put};
my $bucket = $s3->bucket($b);
die "Could not get the bucket $b" unless $bucket;
my $where = read_filename();
die "File $where does not exist or is not readable" unless -f $where && -r $where;
my $response = $bucket->add_key_filename(
$key,
$where,
{ content_type => $mime->mimeTypeOf($where), },
)
or die sprintf ("%s: %s", $s3->err, $s3->errstr);
print "Successfully uploaded $where into $key in bucket $b\n";
}
Get and put are very
similar, both operating on a file name and a key name. The only
interesting piece here is the MIME::Types $mime
object, which we use to get the content type. Remember, it can't be
changed once you upload.
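Because the content type is fixed at upload time, it may be worth deciding explicitly what to send when the file extension is unknown. In the sketch below, content_type_for is a hypothetical helper, and falling back to application/octet-stream is an assumption about what you want, not something the article's scripts do.

```perl
use strict;
use warnings;

# Hypothetical helper. $mime is expected to be a MIME::Types object (or
# anything else with a mimeTypeOf method); $file is the local file name.
sub content_type_for {
    my ($mime, $file) = @_;
    my $type = $mime->mimeTypeOf($file);    # undef for unknown extensions

    # Stringify so a plain string is handed to the upload call
    return defined $type ? "$type" : 'application/octet-stream';
}

# Typical use, assuming MIME::Types is installed:
#   my $mime = MIME::Types->new;
#   my $type = content_type_for($mime, 'cat.jpg');
```

The result can then be passed as the content_type value in the add_key_filename() call from Listing 4.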
Listing 5. net-amazon-s3.pl list operations
...
elsif (exists $opts{list})
{
print "Available buckets:\n";
my @todo;
if ($opts{list})
{
push @todo, map { $s3->bucket($_) } $opts{list};
}
else
{
print "(Getting all buckets)\n";
my $response = $s3->buckets;
die "Could not get the bucket list" unless $response;
@todo = @{$response->{buckets}};
}
foreach my $bucket ( @todo )
{
printf "\t%s\n", $bucket->bucket;
if (exists $opts{keys})
{
my $response = $bucket->list_all
or die sprintf ("%s: %s", $s3->err, $s3->errstr);
foreach my $key (@{$response->{keys}})
{
printf "\t\t%10s\t%s\n", $key->{size}, $key->{key};
if (exists $opts{metadata})
{
my $detail = $bucket->get_key($key->{key});
foreach my $entry (qw/content_length content_type etag/)
{
printf "\t\t\t%20s=%s\n", $entry, ($detail->{$entry}||'UNDEFINED');
}
}
}
}
}
}
Last and longest, the --list handler will
handle a single bucket or multiples. It will show the list of keys and
their metadata with the appropriate switches.
The \t characters will translate to tabs; it's
a rough way of indenting the output so it looks okay on a text terminal.
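A small caveat about the metadata loop: the || test prints UNDEFINED for any false value, so an empty object with a content_length of 0 would be reported as UNDEFINED. Here is a sketch of the same formatting with the defined-or operator (Perl 5.10 or later); format_metadata is a hypothetical helper and the sample hash is made up for the illustration.

```perl
use strict;
use warnings;
use 5.010;    # for the defined-or operator //

# Hypothetical helper: format selected metadata fields from a hash
# reference such as the one get_key() returns in Listing 5.
sub format_metadata {
    my ($detail, @fields) = @_;
    my @lines;
    for my $field (@fields) {
        # // falls back only when the value is undefined, so 0 is kept
        my $value = $detail->{$field} // 'UNDEFINED';
        push @lines, sprintf "\t\t\t%20s=%s\n", $field, $value;
    }
    return @lines;
}

# Made-up sample data: an empty text object with no etag entry
print format_metadata(
    { content_length => 0, content_type => 'text/plain' },
    qw/content_length content_type etag/,
);
```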
Net::Amazon::S3 is, as you can see, a fairly clean module. Some of the
methods are a little strange, but they work fine. There is a break between
getting an item and using it with the metadata that requires an extra
get_key() call. Everywhere, the code is simple
and short.
The Amazon::S3 module is a drop-in replacement for Net::Amazon::S3. I had
to change the new() call to use a hash
reference, but that was the only change (besides changing every
Net::Amazon::S3 reference to Amazon::S3, obviously).
Listing 6. amazon-s3.pl change from net-amazon-s3.pl
my $s3 = Amazon::S3->new({
aws_access_key_id => $opts{key},
aws_secret_access_key => $opts{secret},
retry => 1,
});
Amazon::S3 supports many options; you should consult the documentation if you think you can use them. In general, Amazon::S3 seems to have slightly different aims from Net::Amazon::S3, but from a developer standpoint, the API is the same. That is truly a nice take on library interoperability.
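If you want a single script that can use either module, the only difference to hide is the constructor call. The following is a rough sketch under that assumption; constructor_args is a hypothetical helper, and the argument names are the ones shown in Listings 2 and 6.

```perl
use strict;
use warnings;

# Hypothetical helper: build the constructor arguments for either module.
# Per the listings, Amazon::S3->new takes a hash reference while
# Net::Amazon::S3->new takes a plain list of key/value pairs.
sub constructor_args {
    my ($class, %cred) = @_;
    my %args = (
        aws_access_key_id     => $cred{key},
        aws_secret_access_key => $cred{secret},
        retry                 => 1,
    );
    return $class eq 'Amazon::S3' ? ( \%args ) : %args;
}

# Typical use, assuming the chosen module is installed:
#   my $class = 'Amazon::S3';    # or 'Net::Amazon::S3'
#   eval "require $class; 1" or die $@;
#   my $s3 = $class->new(
#       constructor_args( $class, key => $ENV{S3KEY}, secret => $ENV{S3SECRET} )
#   );
```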
A feature of Amazon::S3 you might appreciate is that it doesn't depend on nearly as many modules as Net::Amazon::S3. Its author emphasizes that it's portable, which is a plus for those who need it.
The SOAP::Amazon::S3 module requires a little background, which I will simplify a bit for this article. SOAP is a way of accessing information over HTTP (the Web) different from the usual HTTP requests that use the path to indicate a resource. Those requests are commonly called REST-style and tend to bloom into a lot of URL variations; SOAP, on the other hand, hits just a few URLs and hands over a whole lot of XML to the server.
I wish I could go on about SOAP, but it's a broad topic so I'll let you hit the books and search engines if you're curious. For what it's worth, SOAP and REST are not that different. You can write great code and terrible code with each one just as easily.
Unfortunately, SOAP::Amazon::S3 is marked as experimental in the documentation. You should test it on your own to decide if it's suitable for you.
On to the changes. First of all, we'll require the File::Slurp module because SOAP::Amazon::S3 doesn't have the convenience methods for putting and getting a file that Net::Amazon::S3 and Amazon::S3 do.
Listing 7. soap-amazon-s3.pl preliminaries
# set Debug to 0 if you don't want to see all the XML
my $s3 = SOAP::Amazon::S3->new( $opts{key}, $opts{secret},
{ Debug => 1, RaiseError => 1 } );
I left Debug=1 because you'll probably be
curious to see all the SOAP traffic SOAP::Amazon::S3 generates. All the
error checks I did with Net::Amazon::S3 were not necessary because of
RaiseError=1, but this may not be a good idea
for a production environment.
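If you keep RaiseError on but don't want a failed request to end the program, one option is to trap the exception yourself. This sketch assumes that RaiseError makes the module die on errors, which is what the missing error checks above rely on; try_s3 is a hypothetical wrapper.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a block of S3 calls and turn any exception into
# a return value instead of letting it end the program.
sub try_s3 {
    my ($action) = @_;
    my $result;
    my $ok = eval { $result = $action->(); 1 };
    return $ok ? ( 1, $result ) : ( 0, $@ || 'unknown error' );
}

# Typical use, with $s3 created as in Listing 7:
#   my ($ok, $bucket_or_error) = try_s3( sub { $s3->createbucket('photos') } );
#   warn "Create failed: $bucket_or_error" unless $ok;
```

Returning a success flag together with either the result or the error text keeps the calling code close to the explicit "or die" checks used with Net::Amazon::S3.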
Creating a bucket
(my $bucket = $s3->createbucket($opts{create});)
is different from creating one with Net::Amazon::S3, as is deleting a
bucket ($bucket->delete()) or an object
($bucket->object($key)->delete();).
Writing to a file was slightly more complicated because, as I mentioned, SOAP::Amazon::S3 doesn't have the convenience functions we saw in Net::Amazon::S3.
Listing 8. soap-amazon-s3.pl write file
open W, '>', $where or die "Could not write to $where: $!";
print W $bucket->object($key)->getdata();
close W;
Ditto for uploading from a file.
Listing 9. soap-amazon-s3.pl upload file
my $type = ''.$mime->mimeTypeOf($where); # force $type to be a string
# use File::Slurp::read_file in scalar context to grab the whole file's contents
my $data = read_file($where);
$bucket->putobject($key, $data, { 'Content-Type' => $type });
The MIME type has to be forced into a string, or the XML helper modules will complain. Also the content type is capitalized differently from Net::Amazon::S3.
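Since the data moving in and out of S3 is often binary (images, for example), it can help to read and write files in binary mode so that no line-ending translation happens on platforms that do it. Below is a minimal core-Perl sketch of both directions; read_binary and write_binary are hypothetical helpers, not File::Slurp functions.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole file as a string of bytes.
sub read_binary {
    my ($path) = @_;
    open my $in, '<', $path or die "Could not read $path: $!";
    binmode $in;
    local $/;    # slurp mode: read everything in one go
    my $data = <$in>;
    close $in;
    return $data;
}

# Hypothetical helper: write a string of bytes to a file, replacing it.
sub write_binary {
    my ($path, $data) = @_;
    open my $out, '>', $path or die "Could not write to $path: $!";
    binmode $out;
    print {$out} $data;
    close $out or die "Could not finish writing $path: $!";
    return;
}

# Typical use with the SOAP::Amazon::S3 calls shown above:
#   write_binary( $where, $bucket->object($key)->getdata() );
#   $bucket->putobject( $key, read_binary($where), { 'Content-Type' => $type } );
```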
Finally, we get to listing the items. Here, SOAP::Amazon::S3 was not as
powerful as Net::Amazon::S3 because object metadata was not available.
SOAP::Amazon::S3 has a nice $object->url()
method, but it doesn't convert spaces to %20,
for example, so you should try it to be sure it will work for you. I
submitted a bug (it's an experimental module, remember), but for now I
would avoid the $object->url() method in a
production environment.
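Until the url() method escapes its output, you could build the URL yourself. The sketch below percent-encodes a key with core Perl only; escape_key and object_url are hypothetical helpers, and the bucket-as-host-name URL layout is an assumption you should check against your own setup.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode an S3 key for use in a URL path,
# leaving "/" alone so nested keys keep their structure.
sub escape_key {
    my ($key) = @_;
    my $bytes = $key;
    utf8::encode($bytes);    # work on UTF-8 bytes, not characters
    $bytes =~ s/([^A-Za-z0-9\-._~\/])/sprintf '%%%02X', ord $1/ge;
    return $bytes;
}

# Assumed URL layout (bucket name as host name prefix).
sub object_url {
    my ($bucket, $key) = @_;
    return sprintf 'http://%s.s3.amazonaws.com/%s', $bucket, escape_key($key);
}

print object_url( 'photos', 'summer 2009/my cat.jpg' ), "\n";
```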
Listing 10. soap-amazon-s3.pl listing buckets and objects in them
...
elsif (exists $opts{list})
{
print "Available buckets:\n";
my @todo;
if ($opts{list})
{
push @todo, map { $s3->bucket($_) } $opts{list};
}
else
{
print "(Getting all buckets)\n";
@todo = $s3->listbuckets;
}
foreach my $bucket ( @todo )
{
printf "\t%s\n", $bucket->name;
if (exists $opts{keys})
{
foreach my $key ($bucket->list())
{
printf "\t\t%10s\t%-30s\t%s\n", $key->{Size}, $key->name, $key->url;
if (exists $opts{metadata})
{
foreach my $entry (qw/Size ETag LastModified/)
{
printf "\t\t\t%20s=%s\n", $entry, ($key->{$entry}||'UNDEFINED');
}
}
}
}
}
}
As you can see, SOAP::Amazon::S3 doesn't get the metadata, so you can't see the object's content type, for instance. This can be an unpleasant restriction if you use the metadata to store important information about an object. Otherwise, SOAP::Amazon::S3 compares well to Net::Amazon::S3 for listing buckets and their elements.
I did not evaluate two other S3 CPAN modules in detail, but here's a quick summary:
- Net::Amazon::S3::Tools is nice if you want command-line access to S3. It's better than writing your own, so for a ready-made toolkit, you should check it out.
- Tie::Amazon::S3 is a nice module for working with a single bucket. You can delete or modify hash entries, add a new entry, and so on. Unfortunately, it does not allow bucket-level operations (create bucket, delete bucket, etc.) or metadata operations (especially setting the content type on a new key). Thus, Tie::Amazon::S3 is useful if you want to store pure data in S3, but for images it won't be very helpful.
Overall, Net::Amazon::S3 and Amazon::S3 are currently the best choice for S3 work. Their interchangeable API makes it easy to switch when necessary. They support all the operations, from bucket operations down to item metadata retrieval. Their API is slightly strange, but easy to use once you set it up.
Keep an eye on SOAP::Amazon::S3; it has good potential. If you're working with pure data in a single bucket, definitely go for Tie::Amazon::S3 because it will make your life easy.
Good luck using S3!
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample scripts | amazon-s3-scripts.zip | 5KB | HTTP |
Learn
- Learn more about Amazon S3.
- The Cultured Perl: Perl and the Amazon cloud series (developerWorks, March - June 2009) walks you through building a simple photo-sharing Web site using Perl and Apache to access Amazon's Simple Storage Service (S3) and SimpleDB:
- Part 1 introduces the benefits and drawbacks of S3 and SimpleDB by taking a tour of their architectures.
- Part 2 shows you how to upload a file into S3 from a Web page through an HTML form to minimize the load on the server and maintain a tight security policy.
- Part 3 details how the URL creates a SimpleDB record for the uploaded file and how to create, edit, and delete comments as SimpleDB records on a photo for a particular user.
- Part 4 examines the full mod_perl site's code base, including how to configure the top level, what to do with the handlers, and how to set up external dependencies.
- Part 5 examines the full mod_perl site's templates, including one for indexing, three for uploading (general, S3 forms, and URL additions), one for image and comment browsing, and one for browsing comments recursively for an image (or threading down).
- Did Moose intrigue you? Check it out. Ditto with File::Slurp.
- For more on SOAP and REST, try the developerWorks SOA and Web services zone.
- In the developerWorks Linux zone, find more resources for Linux developers, and scan our most popular articles and tutorials.
- See all Linux articles and Linux tutorials on developerWorks.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- At the CPAN (Comprehensive Perl Archive Network) site you can find modules, scads and scads of modules, and module documentation. Here are the current releases of Net::Amazon::S3, Amazon::S3, and SOAP::Amazon::S3, plus the two I didn't really discuss: Net::Amazon::S3::Tools and Tie::Amazon::S3.
- With
IBM trial software,
available for download directly from developerWorks, build your next
development project on Linux.





