Cultured Perl: Storage management on Amazon S3

Three CPAN modules for working with S3 buckets and their contents

Teodor Zlatanov, Programmer, Gold Software Systems
Teodor Zlatanov emerged with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, database architectures, user interfaces, and UNIX system administration.

Summary:  Learn how Perl programmers can use three of the CPAN S3 modules (Net::Amazon::S3, Amazon::S3, and SOAP::Amazon::S3) to list, create, and delete buckets (S3 data storage); to list, create, retrieve, and delete items in a bucket; and to get an item's metadata.

Date:  20 Jan 2010
Level:  Intermediate

IBM and Amazon Web Services

Cloud computing provides a way to develop applications in a virtual environment, where computing capacity, bandwidth, storage, security and reliability aren't issues—you don't need to install the software on your own system. In a virtual computing environment, you can develop, deploy, and manage applications, paying only for the time and capacity you use, while scaling up or down to accommodate changing needs or business requirements.

IBM has partnered with Amazon Web Services to give you access to IBM software products in the Amazon Elastic Compute Cloud (EC2) virtual environment. Our software offerings on EC2 include:

  • DB2 Express-C 9.5
  • Informix Dynamic Server Developer Edition 11.5
  • WebSphere Portal Server and Lotus Web Content Management Standard Edition
  • WebSphere sMash

This is product-level code, with all features and options enabled. Get more information and download the Amazon Machine Images for these products on the IBM developerWorks Cloud Computing Resource Center.

For more cloud computing resources, see the Cloud Computing space on developerWorks.

Perl developers have a wonderful resource, the Comprehensive Perl Archive Network (CPAN). Amazon has a wonderful resource too—the Simple Storage Service (S3). Despite there being an official Amazon S3 Perl library (called "S3") on CPAN, there are at least five or six modules (though some are not entirely standalone) for talking to S3. It's as if there was a chili cook-off and instead of 200 people, 2,000 showed up.

Well, here I am the day after the cook-off, with a mop and a bucket, sharing the best recipes with you. These modules and tools are on CPAN, freely available, and you can start using them right away. The three recipes I'm recommending belong to these three modules: Net::Amazon::S3, Amazon::S3, and SOAP::Amazon::S3.

See the discussion of the benefits and drawbacks of S3 in the Cultured Perl: Perl and the Amazon cloud series of articles on developerWorks. Rehashing it here is not necessary, but I'll summarize briefly.

S3 is a storage service provided and managed by Amazon. Amazon charges users for access; in return, users don't have to worry about servers, backups, geographical bandwidth availability and distribution, and so on. S3 is, like most things in life, a convenience at a cost, and only you can decide if it's right for your business or personal use.

S3 data is organized by buckets (roughly linked to a domain name) and data items in those buckets. As a side note, it's very important to include the MIME type on upload or you'll get raw binary data back (this is especially unpleasant for images). The MIME type and other metadata can't be changed once written—you have to delete and recreate the item.

Before diving in, it's worth mentioning that this article is intended for intermediate-level Perl programmers. It does not explain basic Perl techniques in detail. You should also know what Amazon's S3 is and how to install a CPAN module. If not, consult the Resources section for background reading.

What's the goal?

Using the three modules (Net::Amazon::S3, Amazon::S3, and SOAP::Amazon::S3), I explain how to perform a series of basic S3 operations and show the source code necessary, as follows:

  • List, create, and delete buckets
  • List, create, retrieve, and delete items in a bucket
  • Get an item's metadata

To keep the tasks comparable among modules, I've reused the same command-line options and general structure.


Net::Amazon::S3

Loving the Moose

The main goal of Moose is to make Perl 5 OOP easier, more consistent, and less likely to drive programmers insane via tedium. Moose supposedly helps you focus on what you want to accomplish by helping you "let go" of the mechanics of object-oriented programming. Built atop Class::MOP, Moose makes metaclass programming easier too. If you haven't tried it yet, you really should; it's a wonderful way to do OOP in Perl. See more on Moose in Resources.

The Net::Amazon::S3 module is a well-known module, well-supported and comprehensive. It has quite a few prerequisites, most notably Moose, if you happen to be familiar with that.

We'll store the S3 keys in the environment as S3KEY and S3SECRET. That way we can access them from Perl as $ENV{S3KEY} and $ENV{S3SECRET}. Keeping the keys in the environment rather than in the script means they won't leak when the script is shared, but anyone who can read your environment can still read them, so treat them with care (remember, if your S3 keys get stolen, you'll be paying for someone else's bandwidth).
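
As a minimal sketch of that idea, the following checks both environment entries up front and reports which one is missing. The helper name s3_credentials is hypothetical and not part of the article's scripts; the demonstration passes placeholder values instead of the real %ENV.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the S3 key and secret out of an environment
# hash (pass \%ENV in a real script) and fail early if either is missing.
sub s3_credentials
{
 my ($env) = @_;
 my @missing = grep { !defined $env->{$_} || $env->{$_} eq '' } qw/S3KEY S3SECRET/;
 die "Missing environment variables: @missing\n" if @missing;
 return ($env->{S3KEY}, $env->{S3SECRET});
}

# Demonstration with placeholder values instead of the real %ENV
my ($key, $secret) = s3_credentials({ S3KEY => 'example-key', S3SECRET => 'example-secret' });
print "Key is set, secret has ", length($secret), " characters\n";
```

Printing only the length of the secret, never the secret itself, keeps it out of logs and terminal scrollback.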

All right, so Net::Amazon::S3 is installed. (You actually want the suggested Net::Amazon::S3::Client interface, as Net::Amazon::S3 is described as the "legacy interface," but Net::Amazon::S3::Client version 0.50 had a bug due to the Moose module that prevented me from using it at the time I wrote this article. They fixed this in version 0.51, and the current version is 0.52.) Anyway, on to the example. As usual, all the scripts are in the downloads section.


Listing 1. net-amazon-s3.pl preliminaries

#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;
use Getopt::Long;
use Net::Amazon::S3;
use MIME::Types;

my $mime = MIME::Types->new();

This is standard stuff, including Data::Dumper for debugging purposes. A generic MIME object is created, although we will only use it for uploads.


Listing 2. net-amazon-s3.pl options, helpers, and initial S3 connection

my %opts = (
    key       => $ENV{S3KEY},
    secret    => $ENV{S3SECRET},
    separator => '/',
   );

GetOptions(
   \%opts,
   "create|c=s",
   "delete|d=s",
   "list|l:s",         # the parameter is optional
   "keys|k",
   "metadata|m",
   "get=s",
   "put=s",
   "separator=s",
   "help|h",
  );

# handle -h first, so the help text works even without the keys
usage() if exists $opts{help};

unless ($opts{key} && $opts{secret} )
{
 die "$0 requires the S3KEY and S3SECRET environment variables to be set.";
}

my $s3 = Net::Amazon::S3->new(
                  aws_access_key_id     => $opts{key},
                  aws_secret_access_key => $opts{secret},
                  retry                 => 1,
                 );

die "Could not connect to S3" unless defined $s3;


sub read_filename
{
 print "\nEnter filename: ";
 my $name = <>;
 chomp $name;
 return $name;
}

sub usage
{
 print <<EOHIPPUS;

 $0 [OPTIONS]

Pass your S3 key and secret in the S3KEY and S3SECRET environment entries.

Options:
 --help or -h                           : this help
 --separator $opts{separator}           : BUCKET and KEY separator character
                                          (for --get and --put)
 --create BUCKET (or -c BUCKET)         : create BUCKET
 --delete BUCKET (or -d BUCKET)         : delete BUCKET
 --delete BUCKET$opts{separator}KEY     : delete KEY in BUCKET
 --list [BUCKET] (or -l)                : list a specific bucket or all buckets
 --keys (or -k)                         : list the keys in each bucket (requires --list)
 --metadata (or -m)                     : show the keys' metadata 
                                          (requires --keys and --list)
 --get BUCKET$opts{separator}KEY        : download KEY from BUCKET
 --put BUCKET$opts{separator}KEY        : upload a file to KEY in BUCKET

EOHIPPUS

  exit 0;
}

More standard stuff. The options are set up here. The usage() function is pretty comprehensive to make the program useful. Every option is listed and explained briefly.
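
The "list|l:s" specification deserves a closer look, because the optional value is why the script can tell "--list" from "--list BUCKET". Here is a small sketch, assuming only the core Getopt::Long module; parse_args is a hypothetical helper that parses a given argument list instead of @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an argument list with two of the option
# specifications from Listing 2 and return the resulting hash.
sub parse_args
{
 my @args = @_;
 my %opts;
 GetOptionsFromArray(\@args, \%opts, "list|l:s", "keys|k")
  or die "Bad options\n";
 return \%opts;
}

my $all  = parse_args('--list');              # no bucket named
my $one  = parse_args('--list', 'mybucket');  # one bucket named
my $none = parse_args('--keys');              # --list not given at all

for my $case ([ '--list' => $all ], [ '--list mybucket' => $one ], [ '--keys' => $none ])
{
 my ($label, $opts) = @$case;
 printf "%-16s list %s\n", $label,
  !exists $opts->{list} ? 'was not given'
  : $opts->{list} eq '' ? 'was given without a bucket'
  :                       "names bucket '$opts->{list}'";
}
```

An option given without its optional string is stored as an empty string, so the key exists in the hash but is false. Testing with exists first and then for truth separates the three cases.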


Listing 3. net-amazon-s3.pl create and delete operations

if (exists $opts{create}) 
{
 my $bucket = $s3->add_bucket( { bucket => $opts{create}} )
  or die sprintf ("%s: %s", $s3->err, $s3->errstr);
 print "Created bucket '$opts{create}' successfully.\n";
}
elsif (exists $opts{delete}) 
{
 # \Q..\E treats the separator literally; the limit of 2 keeps separators inside the key
 my ($b, $key) = split /\Q$opts{separator}\E/, $opts{delete}, 2;
 my $bucket = $s3->bucket($b);
 die "Could not retrieve bucket $b" unless $bucket;
 if (defined $key)
 {
  $bucket->delete_key($key)
   or die sprintf ("%s: %s", $s3->err, $s3->errstr);
  print "Deleted key '$key' in bucket '$b' successfully.\n";
 }
 else
 {
  $bucket->delete_bucket()
   or die sprintf ("%s: %s", $s3->err, $s3->errstr);
  print "Deleted bucket '$b' successfully.\n";
 }
}

The --create option is very simple. You just create the bucket and return.

For --delete, the script handles either bucket deletion (non-recursive, so the user has to remove all the keys in the bucket first) or key deletion within a bucket. The separator character comes into play here as the splitting point between bucket name and key name.
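
One detail worth a sketch: a plain two-argument split treats the separator as a pattern and returns every piece, so a key that itself contains the separator (2010/cat.jpg, say) would be cut short when only the first two pieces are kept. The hypothetical helper below, which is not part of the article's scripts, quotes the separator and limits the split to two fields.

```perl
use strict;
use warnings;

# Hypothetical helper: split "BUCKET<sep>KEY" into bucket and key.
# \Q...\E makes the separator literal, and the limit of 2 keeps any
# further separators inside the key.
sub split_bucket_key
{
 my ($spec, $separator) = @_;
 my ($bucket, $key) = split /\Q$separator\E/, $spec, 2;
 $key = undef if defined $key && $key eq '';   # "bucket/" means no key
 return ($bucket, $key);
}

for my $spec ('photos', 'photos/2010/cat.jpg')
{
 my ($bucket, $key) = split_bucket_key($spec, '/');
 printf "%-22s bucket=%s key=%s\n", $spec, $bucket, defined $key ? $key : '(none)';
}
```

Quoting matters for separators such as "." or "|", which would otherwise be interpreted as regular expression syntax.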


Listing 4. net-amazon-s3.pl get and put operations

...
elsif (exists $opts{get})
{
 my ($b, $key) = split /\Q$opts{separator}\E/, $opts{get}, 2;
 my $bucket = $s3->bucket($b);
 die "Could not get the bucket $b" unless $bucket;
 my $where = read_filename();
 my $response = $bucket->get_key_filename( $key, 'GET', $where )
  or die sprintf ("%s: %s", $s3->err, $s3->errstr);
 die "Could not create file $where" unless -f $where;
 print "Successfully downloaded $key from bucket $b into $where\n";
}
elsif (exists $opts{put})
{
 my ($b, $key) = split /\Q$opts{separator}\E/, $opts{put}, 2;
 my $bucket = $s3->bucket($b);
 die "Could not get the bucket $b" unless $bucket;
 my $where = read_filename();
 die "File $where does not exist or is not readable" unless -f $where && -r $where;
  
 my $response = $bucket->add_key_filename(
                      $key,
                      $where,
                      { content_type => $mime->mimeTypeOf($where), },
                     )
  or die sprintf ("%s: %s", $s3->err, $s3->errstr);
 print "Successfully uploaded $where into $key in bucket $b\n";
}

Get and put are very similar, both operating on a file name and a key name. The only interesting piece here is the MIME::Types $mime object, which we use to get the content type. Remember, it can't be changed once you upload.
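
Because the content type is fixed at upload time, it can help to decide on a fallback for files that MIME::Types does not recognize. The following is a sketch only: content_type_for is a hypothetical helper, and the fallback value application/octet-stream is my assumption, not something the article's scripts do. The eval lets the sketch run even where MIME::Types is not installed.

```perl
use strict;
use warnings;

# Load MIME::Types if it is installed; otherwise every lookup falls back.
my $mime = eval { require MIME::Types; MIME::Types->new() };

# Hypothetical helper: always return a plain string content type,
# using a generic binary type when the file extension is unknown.
sub content_type_for
{
 my ($filename) = @_;
 my $type = $mime ? $mime->mimeTypeOf($filename) : undef;
 return defined $type ? "$type" : 'application/octet-stream';
}

print content_type_for('notes.no-such-extension'), "\n";
```

Stringifying the result with "$type" also sidesteps any code that expects a plain string rather than a MIME::Type object.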


Listing 5. net-amazon-s3.pl list operations

...
elsif (exists $opts{list})
{
 print "Available buckets:\n";

 my @todo;

 if ($opts{list})
 {
  push @todo, map { $s3->bucket($_) } $opts{list};
 }
 else
 {
  print "(Getting all buckets)\n";
  my $response = $s3->buckets;
  die "Could not get the bucket list" unless $response;
  @todo = @{$response->{buckets}};
 }

 foreach my $bucket ( @todo )
 {
  printf "\t%s\n", $bucket->bucket;

  if (exists $opts{keys})
  {
   my $response = $bucket->list_all
    or die sprintf ("%s: %s", $s3->err, $s3->errstr);
   
   foreach my $key (@{$response->{keys}})
   {
    printf "\t\t%10s\t%s\n", $key->{size}, $key->{key};
    if (exists $opts{metadata})
    {
     my $detail = $bucket->get_key($key->{key});
     foreach my $entry (qw/content_length content_type etag/)
     {
      printf "\t\t\t%20s=%s\n", $entry, ($detail->{$entry}||'UNDEFINED');
     }
    }
   }
  }
 }
}

Last and longest, the --list handler will handle a single named bucket or all of your buckets. It will show the list of keys and their metadata with the appropriate switches.

The \t characters will translate to tabs; it's a rough way of indenting the output so it looks okay on a text terminal.

Net::Amazon::S3 is, as you can see, a fairly clean module. Some of the methods are a little strange, but they work fine. The key list returned by list_all does not carry the content type, so showing the metadata requires an extra get_key() call for each key. Everywhere, the code is simple and short.
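
Since get_key() fetches the object's content along with its metadata, the per-key lookup can get expensive for large objects. A possible alternative, sketched here under the assumption that your installed version of the module provides the bucket's head_key() method, asks for the metadata only. The metadata_lines helper and the FakeBucket stand-in are hypothetical and exist only so the sketch runs without S3 access.

```perl
use strict;
use warnings;

# Hypothetical helper: format a key's metadata using head_key(), which is
# assumed to return a hash reference like get_key() but without the content.
sub metadata_lines
{
 my ($bucket, $key) = @_;
 my $detail = $bucket->head_key($key) or return;
 return map { sprintf "%20s=%s", $_, (defined $detail->{$_} ? $detail->{$_} : 'UNDEFINED') }
  qw/content_length content_type etag/;
}

# Stand-in bucket so the sketch runs offline; in the script you would
# pass the object returned by $s3->bucket(...) instead.
package FakeBucket;
sub new { return bless {}, shift }
sub head_key { return { content_length => 1024, content_type => 'image/png' } }

package main;
print "$_\n" for metadata_lines(FakeBucket->new, 'cat.png');
```

Testing with defined rather than || also means a zero-length object is reported with content_length=0 instead of UNDEFINED.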


Amazon::S3

The Amazon::S3 module is a drop-in replacement for Net::Amazon::S3. I had to change the new() call to use a hash reference, but that was the only change (besides changing every Net::Amazon::S3 reference to Amazon::S3, obviously).


Listing 6. amazon-s3.pl change from net-amazon-s3.pl

 my $s3 = Amazon::S3->new({
                           aws_access_key_id     => $opts{key},
                           aws_secret_access_key => $opts{secret},
                           retry                 => 1,
                          });

Amazon::S3 supports many options; you should consult the documentation if you think you can use them. In general, Amazon::S3 seems to have slightly different aims from Net::Amazon::S3, but from a developer standpoint, the API is the same. That is truly a nice take on library interoperability.

A feature of Amazon::S3 you might appreciate is that it doesn't depend on nearly as many modules as Net::Amazon::S3. Its author emphasizes that it's portable, which is a plus for those who need it.
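
With the two APIs this close, a script could pick whichever module is installed at run time. This is a rough sketch of the idea, not something the article's scripts do; pick_s3_class and constructor_args are hypothetical helpers, and the only difference they encode is the one noted above (a hash reference for Amazon::S3, a plain list for Net::Amazon::S3).

```perl
use strict;
use warnings;

# Hypothetical helper: return the first module name that loads, or undef.
sub pick_s3_class
{
 for my $class (@_)
 {
  (my $file = "$class.pm") =~ s{::}{/}g;
  return $class if eval { require $file; 1 };
 }
 return;
}

# Hypothetical helper: shape the constructor arguments the way each
# listing does (hash reference for Amazon::S3, plain list otherwise).
sub constructor_args
{
 my ($class, %args) = @_;
 return $class eq 'Amazon::S3' ? (\%args) : (%args);
}

my $class = pick_s3_class('Net::Amazon::S3', 'Amazon::S3');
print defined $class ? "Would use $class\n" : "Neither module is installed\n";
```

In a real script, the result of constructor_args($class, ...) would be passed straight to $class->new(), with the same aws_access_key_id, aws_secret_access_key, and retry arguments shown in the listings.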


SOAP::Amazon::S3

Drinking in File::Slurp

The File::Slurp module provides subs that allow you to read or write entire files with one simple call. They are designed to be simple and efficient to use, with flexible ways to pass in or get the file contents. There is also a sub to read in all the files in a directory other than . and ... The subs work for files, pipes and sockets, and stdio, pseudo-files, and DATA. For more on File::Slurp, see Resources.

The SOAP::Amazon::S3 module requires a little background, which I will simplify a bit for this article. SOAP is a way of accessing information over HTTP (the Web) different from the usual HTTP requests that use the path to indicate a resource. Those requests are commonly called REST-style and tend to bloom into a lot of URL variations; SOAP, on the other hand, hits just a few URLs and hands over a whole lot of XML to the server.

I wish I could go on about SOAP, but it's a broad topic so I'll let you hit the books and search engines if you're curious. For what it's worth, SOAP and REST are not that different. You can write great code and terrible code with each one just as easily.

Unfortunately, SOAP::Amazon::S3 is marked as experimental in the documentation. You should test it on your own to decide if it's suitable for you.

On to the changes. First of all, we'll require the File::Slurp module because SOAP::Amazon::S3 doesn't have the convenience methods for putting and getting a file that Net::Amazon::S3 and Amazon::S3 do.


Listing 7. soap-amazon-s3.pl preliminaries

# set Debug to 0 if you don't want to see all the XML
my $s3 = SOAP::Amazon::S3->new( $opts{key}, $opts{secret},
 { Debug => 1, RaiseError => 1 } );

I left Debug=1 because you'll probably be curious to see all the SOAP traffic SOAP::Amazon::S3 generates. All the error checks I did with Net::Amazon::S3 were not necessary because RaiseError=1 makes the module die on any error; letting a script die that way may not be a good idea for a production environment.
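
If you keep RaiseError on but don't want one failure to end the whole program, the usual Perl answer is to wrap the call in eval. A minimal sketch follows; try_s3 is a hypothetical helper, and the failing call is simulated with die so the example runs without S3 access.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code reference that may die (as calls do under
# RaiseError => 1) and return (result, undef) or (undef, error message).
sub try_s3
{
 my ($action) = @_;
 my $result;
 my $ok = eval { $result = $action->(); 1 };
 return $ok ? ($result, undef) : (undef, $@ || 'unknown error');
}

# Stand-in for a failing S3 call; a real script would call $s3 methods here
my ($result, $error) = try_s3(sub { die "NoSuchBucket\n" });
print "S3 call failed: $error" if defined $error;
```

Returning 1 as the last statement of the eval block makes the success check independent of whatever the wrapped call returns.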

Creating a bucket (my $bucket = $s3->createbucket($opts{create});) is different from creating one with Net::Amazon::S3, as is deleting a bucket ($bucket->delete()) or an object ($bucket->object($key)->delete();).

Writing to a file was slightly more complicated because, as I mentioned, SOAP::Amazon::S3 doesn't have the convenience functions we saw in Net::Amazon::S3.


Listing 8. soap-amazon-s3.pl write file

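open my $out, '>', $where or die "Could not write to $where: $!";
binmode $out;    # S3 objects may be binary (images, for example)
print {$out} $bucket->object($key)->getdata();
close $out or die "Could not finish writing $where: $!";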

Ditto for uploading from a file.


Listing 9. soap-amazon-s3.pl upload file

 my $type = ''.$mime->mimeTypeOf($where); # force $type to be a string

 # use File::Slurp::read_file in scalar context to grab the whole file's contents;
 # binmode => ':raw' keeps binary files such as images intact
 my $data = read_file($where, binmode => ':raw');
 $bucket->putobject($key, $data, { 'Content-Type' => $type });

The MIME type has to be forced into a string, or the XML helper modules will complain. Also the content type is capitalized differently from Net::Amazon::S3.
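
If you would rather not add File::Slurp as a dependency, the same whole-file reads and writes can be done with core Perl. This is a sketch under that assumption; read_raw and write_raw are hypothetical helpers, and both use the :raw layer so binary data such as images passes through unchanged.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as raw bytes
sub read_raw
{
 my ($path) = @_;
 open my $in, '<:raw', $path or die "Could not read $path: $!";
 local $/;                      # slurp mode: read to end of file
 my $data = <$in>;
 close $in;
 return $data;
}

# Hypothetical helper: write raw bytes to a file, replacing its contents
sub write_raw
{
 my ($path, $data) = @_;
 open my $out, '>:raw', $path or die "Could not write to $path: $!";
 print {$out} $data;
 close $out or die "Could not finish writing $path: $!";
}

# Round trip through a temporary file with bytes that text mode could alter
my (undef, $path) = tempfile(UNLINK => 1);
write_raw($path, "\x89PNG\r\n\x1a\n");
printf "Read back %d bytes\n", length read_raw($path);
```

The data returned by read_raw could be handed to putobject, and the result of getdata could be handed to write_raw.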

Finally, we get to listing the items. Here, SOAP::Amazon::S3 was not as powerful as Net::Amazon::S3 because object metadata was not available. SOAP::Amazon::S3 has a nice $object->url() method, but it doesn't convert spaces to %20, for example, so you should try it to be sure it will work for you. I submitted a bug (it's an experimental module, remember), but for now I would avoid the $object->url() method in a production environment.
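
Until the url() method escapes keys itself, one workaround is to build the URL by hand. The sketch below makes several assumptions that you should check against your own setup: the key is a character string, the bucket is reachable at the BUCKET.s3.amazonaws.com host name, and "/" inside keys should stay unescaped. Both escape_key and object_url are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode an S3 key for use in a URL path,
# leaving "/" alone so keys that look like paths stay readable.
sub escape_key
{
 my ($key) = @_;
 utf8::encode($key);    # work on UTF-8 bytes, not characters
 $key =~ s{([^A-Za-z0-9\-._~/])}{sprintf '%%%02X', ord $1}ge;
 return $key;
}

# Hypothetical helper: build an object URL, assuming the bucket name is
# usable as a host name under s3.amazonaws.com.
sub object_url
{
 my ($bucket, $key) = @_;
 return "https://$bucket.s3.amazonaws.com/" . escape_key($key);
}

print object_url('example-bucket', 'my photos/cat 1.jpg'), "\n";
```

If the URI::Escape module from CPAN is available, its uri_escape function is a better-tested choice for the encoding step.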


Listing 10. soap-amazon-s3.pl listing buckets and objects in them

...
elsif (exists $opts{list})
{
 print "Available buckets:\n";

 my @todo;

 if ($opts{list})
 {
  push @todo, map { $s3->bucket($_) } $opts{list};
 }
 else
 {
  print "(Getting all buckets)\n";
  @todo = $s3->listbuckets;
 }

 foreach my $bucket ( @todo )
 {
  printf "\t%s\n", $bucket->name;

  if (exists $opts{keys})
  {
   foreach my $key ($bucket->list())
   {
    printf "\t\t%10s\t%-30s\t%s\n", $key->{Size}, $key->name, $key->url;
    if (exists $opts{metadata})
    {
     foreach my $entry (qw/Size ETag LastModified/)
     {
       printf "\t\t\t%20s=%s\n", $entry, ($key->{$entry}||'UNDEFINED');
     }
    }
   }
  }
 }
}

As you can see, SOAP::Amazon::S3 doesn't get the metadata, so you can't see the object's content type, for instance. This can be an unpleasant restriction if you use the metadata to store important information about an object. Otherwise, SOAP::Amazon::S3 compares well to Net::Amazon::S3 for listing buckets and their elements.


Conclusion

I did not evaluate two other S3 CPAN modules in detail, but here's a quick summary:

  • Net::Amazon::S3::Tools is nice if you want command-line access to S3. It's better than writing your own, so for a ready-made toolkit, you should check it out.
  • Tie::Amazon::S3 is a nice module for working with a single bucket. You can delete or modify hash entries, add a new entry, and so on. Unfortunately, it does not allow bucket-level operations (create bucket, delete bucket, etc.) or metadata operations (especially setting the content type on a new key). Thus, Tie::Amazon::S3 is useful if you want to store pure data in S3, but for images it won't be very helpful.

Overall, Net::Amazon::S3 and Amazon::S3 are currently the best choice for S3 work. Their interchangeable API makes it easy to switch when necessary. They support all the operations, from bucket operations down to item metadata retrieval. Their API is slightly strange, but easy to use once you set it up.

Keep an eye on SOAP::Amazon::S3; it has good potential. If you're working with pure data in a single bucket, definitely go for Tie::Amazon::S3 because it will make your life easy.

Good luck using S3!



Download

Description       Name                    Size   Download method
Sample scripts    amazon-s3-scripts.zip   5KB    HTTP



Resources

Learn

  • Learn more about Amazon S3.

  • The Cultured Perl: Perl and the Amazon cloud series (developerWorks, March - June 2009) walks you through building a simple photo-sharing Web site using Perl and Apache to access Amazon's Simple Storage Service (S3) and SimpleDB:
    • Part 1 introduces the benefits and drawbacks of S3 and SimpleDB by taking a tour of their architectures.
    • Part 2 shows you how to upload a file into S3 from a Web page through an HTML form to minimize the load on the server and maintain a tight security policy.
    • Part 3 details how the URL creates a SimpleDB record for the uploaded file and how to create, edit, and delete comments as SimpleDB records on a photo for a particular user.
    • Part 4 examines the full mod_perl site's code base, including how to configure the top level, what to do with the handlers, and how to set up external dependencies.
    • Part 5 examines the full mod_perl site's templates, including one for indexing, three for uploading (general, S3 forms, and URL additions), one for image and comment browsing, and one for browsing comments recursively for an image (or threading down).

  • Did Moose intrigue you? Check it out. Ditto with File::Slurp.

  • For more on SOAP and REST, try the developerWorks SOA and Web services zone.

  • In the developerWorks Linux zone, find more resources for Linux developers, and scan our most popular articles and tutorials.

  • See all Linux articles and Linux tutorials on developerWorks.

  • Stay current with developerWorks technical events and webcasts.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
