
Offload your multimedia content and bandwidth to Amazon using PHP

Increase site reliability by using Amazon’s S3 remote storage service to host media files

Jack D Herrington (jherr@pobox.com), Senior Software Engineer, Leverage Software Inc.
A senior software engineer with more than 20 years of experience, Jack Herrington is the author of three books: Code Generation in Action, Podcasting Hacks, and PHP Hacks. He is also the author of more than 30 articles.

Summary:  Save disk space and bandwidth by using the Amazon Simple Storage Service (S3) remote storage service to host your media files. You'll also improve the reliability of your site as it serves the increasingly large multimedia files that are so popular in the Web 2.0 world.

Date:  23 Jan 2007
Level:  Intermediate

With Web 2.0 came the popularization of multimedia on the Web. Flickr served millions of images just weeks after going live. The postage stamp-size videos of Web 1.0 were replaced with the full browser-size movies on Google Video or YouTube. But where does that leave the small PHP application developer? What happens when you want to host tons of images and huge video files? Can your US$9 hosting account afford all that space and bandwidth? And will the hosting company's Internet connection keep up with the traffic? Probably not.

Fortunately, Amazon comes to the rescue with a completely new type of consumer-level Web service: remote storage. For just pennies, you can store and have Amazon host gigabytes of any type of data you want. Your site can use the space as an image store or you can use Amazon to host your backups.

In this article, I use Amazon S3 to host multimedia files using a set of PHP pages. Although there are some stumbling blocks, Amazon makes the process easy by providing several methods of uploading and retrieving content.

More about Amazon S3

Getting started with the Amazon S3 storage service is easy. Start at the Amazon Web Services (AWS) store, select the storage service you want, and click Subscribe. From there, set up your billing mechanism -- typically a credit card -- even the same card you use to buy your books and DVDs. Amazon will bill your card for each gigabyte of data you upload or download. After you've set up your storage, you will receive an e-mail containing a link with which you can access your account ID and secret key.

From there, you must understand two key concepts: buckets and objects. A bucket is like a directory on your hard disk. An object is a named block of data inside a bucket. An object can hold whatever data you like, which is why Amazon uses the term object instead of file. For this example, I'll be uploading image files to Amazon S3, so every object will correspond to one file.

Naming your buckets

You can name a bucket whatever you like, but be warned: The name space of buckets is shared with everyone else on the service, so name your bucket something unique. Objects within your bucket can have any name you choose.

Amazon S3 supports multiple methods of creating, editing, and deleting buckets and objects within buckets. You can use SOAP if you prefer. Or, as is the case here, you can use a Representational State Transfer (REST) protocol with the curl command-line utility to issue GET, PUT, and DELETE commands over HTTP to the Amazon S3 servers. The PUT command creates a bucket or object, DELETE deletes the bucket or object, and GET retrieves information about the bucket or data from the object.
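As a rough illustration of how those three verbs line up with buckets and objects, here is a minimal sketch. The s3RestRequest() function and its operation names are hypothetical and are not part of Amazon's API or of any S3 library; the sketch only pairs each operation with the HTTP verb and the path-style resource it would target.

```php
<?php
// Hypothetical helper for illustration only: pairs an operation with the
// HTTP verb and path-style resource that the REST interface would use.
function s3RestRequest($operation, $bucket, $object = null)
{
    $verbs = array('create' => 'PUT', 'read' => 'GET', 'delete' => 'DELETE');
    if (!isset($verbs[$operation])) {
        throw new InvalidArgumentException("Unknown operation: $operation");
    }

    // A bucket is addressed as /bucket, an object as /bucket/object
    $path = '/' . $bucket;
    if ($object !== null) {
        $path .= '/' . rawurlencode($object);
    }

    return array($verbs[$operation], $path);
}

list($verb, $path) = s3RestRequest('create', 'jherr_photos', 'IMG_2912.jpg');
echo "$verb $path\n"; // PUT /jherr_photos/IMG_2912.jpg
```

A real request also needs Date and Authorization headers before Amazon S3 will accept it.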

Objects can have several levels of access control. For our purposes, the two that matter are private, which means that only the owner of the bucket can read its contents, and public-read, which means that anyone can read but not modify the contents. I'm going to use the public-read option so I can use Amazon S3 to serve up my images. The URL of the images is in either this format: http://[bucketname].s3.amazonaws.com/[object] or http://s3.amazonaws.com/[bucketname]/[object]. In the case of my image upload application, the URL for an uploaded image would look like this: http://jherr_photos.s3.amazonaws.com/IMG_2912.jpg. I think that's really clean.
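The two URL formats can be produced with a few lines of PHP. This is a minimal sketch: s3PublicUrl() is a hypothetical helper name, and it assumes the object was stored with the public-read setting described above.

```php
<?php
// Hypothetical helper: builds the public URL of an object in either of the
// two formats described above. Assumes the object is public-read.
function s3PublicUrl($bucket, $key, $bucketInHostName = true)
{
    // Encode each segment of the key, but keep any "/" separators readable
    $encodedKey = implode('/', array_map('rawurlencode', explode('/', $key)));

    if ($bucketInHostName) {
        return "http://{$bucket}.s3.amazonaws.com/{$encodedKey}";
    }
    return "http://s3.amazonaws.com/{$bucket}/{$encodedKey}";
}

echo s3PublicUrl('jherr_photos', 'IMG_2912.jpg'), "\n";
// http://jherr_photos.s3.amazonaws.com/IMG_2912.jpg
echo s3PublicUrl('jherr_photos', 'IMG_2912.jpg', false), "\n";
// http://s3.amazonaws.com/jherr_photos/IMG_2912.jpg
```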


The example application

The application I'm going to create is straightforward. I'm going to have one page that has a form that accepts a file. That page will then post to an upload page that will add a new object to my Amazon S3 bucket with the contents of the image. This concept is shown in Figure 1.


Figure 1. Uploading images to Amazon S3

Actually, the script really doesn't care what I upload. It just sends the contents of the file -- and the associated MIME type from the uploaded file -- to Amazon S3. So I could put up movies or anything I like.

When I have some images on Amazon S3, another page will show me the contents of the bucket. This concept is shown in Figure 2.


Figure 2. The flow of data for the image page

What's important to note is that the HTML page data is coming from my server, but the image data is coming from Amazon S3. Yes, that means that I pay bandwidth costs to Amazon. But it also means that I pick up Amazon's massive data pipes, data center redundancy, and all its infrastructure, which is something that PHP hosting accounts won't have. For large data files, such as movies, the reliability of the server hosting the data and the size of its Internet pipe are very important.


The upload page

The first step in creating the application I've proposed is to set up an Amazon S3 account. After that, the trick is to get some PHP code that will connect to the Amazon S3 service. Unfortunately, neither Amazon nor PEAR has Amazon S3 classes for PHP at the moment. So, I went searching around and found an Amazon S3 class from Geoff Gaudreault that handles all the basics. The source code is too long to include as a listing here, but it is in the Download section.

Gaudreault's Amazon S3 class requires that you install the Crypt_HMAC module from PEAR, so the next step is to install that module using the following command:

% pear install Crypt_HMAC

When that's done, edit Gaudreault's s3.class.php file to include your AWS key and your secret key. The secret key is used to sign portions of each request so that Amazon can verify that the requests really come from code that you wrote. Because it's a pay service, you don't want to give anyone that secret key.

To start the implementation, you need a page with a form on it that will allow for file uploads.


Listing 1. Index.php

<html>
<body>
<form enctype="multipart/form-data" action="upload.php" method="post">
<input type="hidden" name="MAX_FILE_SIZE" value="2000000" />
<input type="file" name="file" />
<input type="submit" value="Upload" />
</form>
</body>
</html>

The upload page is shown in the browser.


Figure 3. The upload page

The index.php script submits the file to an upload page.


Listing 2. Upload.php

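<?php
require( "s3.class.php" );

// Stop early if the browser did not deliver a usable file
if ( !isset( $_FILES['file'] ) ||
     $_FILES['file']['error'] !== UPLOAD_ERR_OK ) {
  die( 'Upload failed.' );
}

$srvc = new S3();
// Amazon S3 ignores this request if the bucket already exists
$srvc->putBucket( 'jherr_photos' );

// Move the upload to a uniquely named temporary file so that
// simultaneous uploads cannot overwrite each other
$tmpfile = tempnam( sys_get_temp_dir(), 's3up' );
if ( !move_uploaded_file( $_FILES['file']['tmp_name'], $tmpfile ) ) {
  die( 'Could not read the uploaded file.' );
}
// Only the script's own user needs access to the file
chmod( $tmpfile, 0600 );

// Read the entire file into memory
$contents = file_get_contents( $tmpfile );

// basename() strips any path from the name the browser sent
$srvc->putObject( basename( $_FILES['file']['name'] ), $contents,
  'jherr_photos', 'public-read',
  $_FILES['file']['type'] );

// Delete the temporary copy
unlink( $tmpfile );
?>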

The first thing you do is include the Amazon S3 PHP library. Then create an Amazon S3 object and build the bucket that will contain the objects. Amazon S3 is kind enough simply to ignore a request to build another bucket if one is already there. So after the first time you run this code, the bucket create request is ignored.

After creating the bucket, move the uploaded file to a place where you can read it and change its permissions so you can get to the content. You then read the entire file into the $contents variable. From there, use the S3 object to add the file to the bucket. If a file already exists with that name, the contents are simply replaced. At the end of the script, you delete the temporary uploaded file.
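Because the object name comes straight from the browser, it can be worth cleaning it up before using it as the key. The following is a minimal sketch under that assumption: safeObjectKey() is a hypothetical helper, and the set of allowed characters is a conservative choice of mine, not a rule imposed by Amazon S3.

```php
<?php
// Hypothetical helper: turns a client-supplied file name into a
// conservative object key. The allowed character set is an assumption.
function safeObjectKey($clientName)
{
    // Treat Windows separators like "/" and keep only the last path segment
    $name = basename(str_replace('\\', '/', $clientName));

    // Replace anything outside letters, digits, dot, underscore, and hyphen
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);

    // Fall back to a fixed name if nothing usable is left
    if ($name === '' || trim($name, '.') === '') {
        $name = 'upload';
    }
    return $name;
}

echo safeObjectKey('C:\\photos\\My Trip.jpg'), "\n"; // My_Trip.jpg
echo safeObjectKey('../../etc/passwd'), "\n";        // passwd
```

Two different uploads can still map to the same key, in which case the later one replaces the earlier one, as described above.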


Retrieving the images

The next step is actually to see the uploaded images on a Web page. To do so, you must build another PHP script called list.php.


Listing 3. List.php

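<?php
require( "s3.class.php" );

$srvc = new S3();
// getBucket() returns the bucket listing as a string of XML
$resp = $srvc->getBucket( 'jherr_photos' );
// Collect the text of every <Key> element into $found[1]
preg_match_all( "/\<Key\>(.*?)\<\/Key\>/", $resp, $found );
?>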
<html><body><table>
<?php foreach( $found[1] as $key ) { ?>
<tr><td>
<img
 src="http://jherr_photos.s3.amazonaws.com/<?php echo($key) ?>" />
</td></tr><?php } ?>
</table></body></html>

The script starts by including the library and creating the S3 object. You then use the getBucket() method to get the current contents of the bucket. That information is returned as a string that contains XML code. The XML code has a lot of material in it, but most important are the names of the files, which are stored in <Key> tags. You could use an XML parser to read out the <Key> tags, but it's easier in this case to use a regular expression.
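For comparison, here is a minimal sketch of the XML parser route using PHP's DOM extension. The listBucketKeys() function is a hypothetical helper, and the sample document is a hand-written approximation of a bucket listing rather than a captured response. One practical difference from the regular expression is that the parser decodes entities such as &amp;amp; in key names.

```php
<?php
// Hypothetical helper: extracts the <Key> values from a bucket listing
// with the DOM extension instead of a regular expression.
function listBucketKeys($xmlString)
{
    if ($xmlString === '') {
        return array();
    }
    $doc = new DOMDocument();
    // Suppress parser warnings; malformed XML just yields an empty list
    if (!@$doc->loadXML($xmlString)) {
        return array();
    }

    $keys = array();
    foreach ($doc->getElementsByTagName('Key') as $node) {
        $keys[] = $node->textContent;
    }
    return $keys;
}

// Hand-written sample that approximates the shape of a bucket listing
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>jherr_photos</Name>
  <Contents><Key>IMG_2912.jpg</Key></Contents>
  <Contents><Key>family &amp; friends.jpg</Key></Contents>
</ListBucketResult>
XML;

print_r(listBucketKeys($sample));
```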

After you have the array of the found <Key> tags, you create a table in which each row has a single image tag that uses the file name in the source. To test this process, I uploaded a couple of Christmas shots of myself and my family, then went to the list.php page. The result is shown in Figure 4.


Figure 4. The list page with the pictures from Amazon S3

I would consider this the HelloWorld version of an Amazon S3 example. It's about as simple as it gets. I could use command-line scripts, but it's more interesting to see it in the browser. You can see from this example just how easy it is to use Amazon S3.


Pitfalls

This example is deceptively simple because most of the complexity is contained within the S3 class. While most of the Web requests are relatively straightforward, the problems come up with the signature portion of the request. Amazon requires each request to be signed using a secret key that only you know. And that signing process can be tricky and difficult to debug. Thankfully, the S3 class hides this complexity.

With the S3 class in hand, you shouldn't have issues with the signature process. But if you do, I have found that an hour or two spent methodically going through the documentation and using Amazon's signature tools helps to get around the problem. After that, using the Amazon S3 service and its sister -- Amazon Simple Queue Service (SQS) -- is easy.
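To give a feel for what the signing step involves, here is a minimal sketch of the HMAC-SHA1 request signing used by the Amazon S3 REST interface. It is not taken from Gaudreault's class: s3Signature() is a hypothetical helper, the keys are placeholders, and PHP's built-in hash_hmac() stands in for the Crypt_HMAC module. The sketch also assumes a request without any x-amz-* headers.

```php
<?php
// Sketch of S3 REST request signing (HMAC-SHA1 scheme).
// s3Signature() is a hypothetical helper, not a method of the S3 class.
function s3Signature($secretKey, $verb, $resource, $date,
                     $contentType = '', $contentMd5 = '')
{
    // The string to sign lists the verb, the optional Content-MD5 and
    // Content-Type values, and the Date header, one per line, followed
    // by the resource path. Any x-amz-* headers would go just before
    // the resource; this sketch assumes there are none.
    $stringToSign = $verb . "\n"
                  . $contentMd5 . "\n"
                  . $contentType . "\n"
                  . $date . "\n"
                  . $resource;

    // Raw binary HMAC-SHA1 digest, then Base64-encoded
    return base64_encode(hash_hmac('sha1', $stringToSign, $secretKey, true));
}

// Placeholder credentials; substitute your own AWS key and secret key
$accessKey = 'EXAMPLE-ACCESS-KEY';
$secretKey = 'example-secret-key';

$date      = gmdate('D, d M Y H:i:s') . ' GMT';
$signature = s3Signature($secretKey, 'GET', '/jherr_photos/IMG_2912.jpg', $date);

echo "Date: $date\n";
echo "Authorization: AWS $accessKey:$signature\n";
```

Because every line of the string to sign must match what Amazon reconstructs from the request, a single stray space or a Date value that differs from the header actually sent is enough to have the request rejected, which is why this step is the hard one to debug.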


Amazon S3 and its world

Amazon S3 is just one of a set of services that exist in a larger context. The two other services that relate specifically to Amazon S3 are Amazon SQS and the amazingly named Amazon Elastic Compute Cloud (EC2).

Amazon SQS offers the ability to have applications communicate with each other through named queues in which one application inserts messages based on various events (for example, "added a user," "requesting report") and other applications read and process those messages, then delete them from the queue. This functionality is similar to the service TIBCO provides and can be useful in allowing for enterprise applications to be coupled in a loose fashion.

The Amazon EC2 service, which was in beta at the time of this writing, allows you to use the compute power of Amazon's server pool in an on-demand fashion. You create an Amazon Machine Image of your application, then upload that image to Amazon S3. You then make requests to the Amazon EC2 service to start and stop your processes, as well as monitor them. This functionality can provide massive processing power on demand -- if you've written your application to take advantage of it.

When you look at all three services as a package, in which you use Amazon S3 as the disk, Amazon SQS as the messaging backplane, and Amazon EC2 as the process-management system, it becomes clear what Amazon is trying to provide. Amazon wants to become a vendor of on-demand computing power to small and medium-size businesses. To that end, any use of Amazon S3 from the Amazon EC2 system is free of charge.


Conclusion

Amazon S3 can be useful for many things beyond offloading some of the bandwidth from your site. Here are some other ideas:

Backups
Use Amazon S3 to store nightly backups of your database. Amazon's S3Curl command-line script makes doing so easy.
Shared storage
Instead of using USB drives, create an Amazon S3 bucket, then use a tool like Jungle Disk, which mounts Amazon S3 like a hard disk, to create a shared repository for files. You can use this repository among your own computers or between you and your team members. You can even use it between you and your remote Web server.
Small Web sites
By uploading HTML and images to Amazon S3, you can host small, static Web sites on Amazon S3 for just pennies.
Podcasting, videoblogging, or photoblogging
Using Amazon S3, you can upload your podcast media files, as well as the RSS V2.0 XML, and run a podcast straight off Amazon S3.

The more I use Amazon S3, the more I like it. The interface is simple, and the system is reliable. Even better, the prices are affordable. Give Amazon S3 a shot and see whether you can use it in your own Web applications.



Download

Description  |  Name                      |  Size  |  Download method
Source code  |  os-php-amzmm.s3class.zip  |  4KB   |  HTTP


Resources

Learn

  • Check out Amazon S3.

  • PHP.net is the resource for PHP developers.

  • Check out the "Recommended PHP reading list."

  • Browse all the PHP content on developerWorks.

  • Expand your PHP skills by checking out IBM developerWorks' PHP project resources.

  • To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.

  • Stay current with developerWorks' Technical events and webcasts.

  • Using a database with PHP? Check out the Zend Core for IBM, a seamless, out-of-the-box, easy-to-install PHP development and production environment that supports IBM DB2 9.

  • Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.

  • Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.

Get products and technologies

  • The Amazon SQS is also valuable to PHP application developers.

  • Jungle Disk turns your Amazon S3 account into a hard drive on your computer desktop.

  • Interarchy is an FTP client for Macintosh that can attach to Amazon S3 accounts.

  • Amazon S3 Authentication Tool for Curl is a Perl-based command-line tool for accessing Amazon S3 accounts.

  • Innovate your next open source development project with IBM trial software, available for download or on DVD.
