Taking The Apps To The Data
A couple of weekends ago, I had the chance to attend CloudCamp RTP. I'm used to having to travel thousands of miles to attend interesting conferences, or at least having to drive 30 minutes to Raleigh for cool BarCamps. But CloudCamp RTP was held in Carrboro! Now, technically I live in the Chapel Hill town limits, but all my hanging out time is spent in Carrboro, especially the world's best Third Place, the Open Eye Cafe. So of course I had to attend!
I'm proud to say IBM was one of the main sponsors of the event, and Troy Volin from IBM gave an interesting talk about the lessons that can be learned from recent high-profile cloud service failures.
The presentation that blew me away was by Stuart Jeffreys, PhD, from the University of North Carolina at Chapel Hill Lineberger Cancer Center. His talk was "The Genomics Data Crisis." Sadly, his presentation has not been posted online, so I'll have to summarize. He walked through the amount of storage needed for genomic sequencing samples and the IT challenges it presents. We're talking crazy big amounts of data. He then walked through some very rough back-of-the-envelope math about the IT costs associated with that much data, and showed the rates of growth in the amount of genomic data being collected and stored. It's very easy to see a crisis coming.
In his analysis, he showed that the costs of storing this huge amount of data and the costs of analyzing it are dwarfed, by at least an order of magnitude, by the costs of downloading it.
This reminds me of a brief analysis I did a long time ago, when I was looking at whether it was cost effective to store the MP3 files for my podcasts in Amazon S3. My conclusion then was essentially the same: the transfer costs to download files out of S3 dwarfed all other considerations by almost an order of magnitude, and Amazon S3 was not a viable option for storing media files for my podcast.
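For the curious, here's a rough sketch of that kind of back-of-the-envelope math in Python. To be clear, every number in it is a made-up placeholder: these are not Amazon's actual rates, not my real podcast numbers, and not Stuart's genomics figures. The function name and parameters are just my own illustration.

```python
def monthly_costs(file_mb, files_stored, downloads_per_file,
                  storage_price_per_gb, transfer_price_per_gb):
    """Return (storage_cost, transfer_cost) for one month.

    Assumes every stored file is downloaded `downloads_per_file`
    times during the month. All prices are hypothetical inputs.
    """
    stored_gb = file_mb * files_stored / 1000        # data sitting at rest
    transferred_gb = stored_gb * downloads_per_file  # data leaving the cloud
    storage_cost = stored_gb * storage_price_per_gb
    transfer_cost = transferred_gb * transfer_price_per_gb
    return storage_cost, transfer_cost


# Placeholder inputs, chosen only to illustrate the shape of the problem.
storage, transfer = monthly_costs(
    file_mb=50,                  # hypothetical size of one MP3
    files_stored=100,            # hypothetical back catalog
    downloads_per_file=10,       # hypothetical monthly downloads per file
    storage_price_per_gb=0.10,   # hypothetical price, not a real rate
    transfer_price_per_gb=0.10,  # hypothetical price, not a real rate
)
ratio = transfer / storage
print(f"storage: ${storage:.2f}  transfer: ${transfer:.2f}  ratio: {ratio:.0f}x")
```

Notice that the file size drops out of the ratio entirely. When the two per-GB prices are in the same ballpark, transfer beats storage by roughly the number of times each file gets downloaded, and that number only goes up as more people want the data.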
So what's the solution? Bigger pipes to the data repositories? Sure. Well, maybe. Never say no to more bandwidth, that's my motto. But I chatted with Stuart during a break, and he said that they are looking at ways to avoid the huge transfer costs. The bottom line is, they are looking to take the application to the data, not the data to the application.
And, oh by the way, this type of approach is more secure as well. By keeping the data centralized and protected by a well-known, secured application environment, you can protect the sensitive genomic data and avoid costs like the $20 million associated with the theft of a single laptop.
Sure, cloud computing has many security challenges, from protecting the platform against outside attacks to making sure co-tenants are protected from each other. This is often seen as a drawback of cloud computing. But when you step back and look at some of the big-picture issues, cloud computing also becomes a key strategy for saving money and a primary security control for protecting Big Data.