In classical IT, every data set expires at some point in time. “But not anymore!” I hear those Big Data analysts shout. “We want to store as much data for as long as possible, just in case we find an attractive use case for it in the future!” Or, stealing a line from my favorite comic strip, “Calvin and Hobbes”: In Big Data, there’s treasure everywhere! (By the way: You might want to visit http://www.ibm.com/analytics/us/en/index.html to find out how IBM helps you to uncover that treasure…)
And this is where the conflict starts between them and the storage admins…
Storing data is fairly easy, but what about migrating petabytes of data from one storage technology to another? At that point you’ll have to touch all of your data, regardless of whether it’s hot or cold.
If you have that data in your own data center (or your private cloud), then you’ll need to migrate all of it to the next generation of storage technology at some point in time. Of course this is a time-consuming procedure, but it can be mastered with appropriate planning.
But what will you do if your data resides in a public cloud? You don’t need to invoke the scenario of “another Nirvanix” to see the point. Even if you simply don’t want to be tied to a single cloud service provider (and its business, security, and billing model), it makes sense to be prepared to move from one cloud to another.
The solution for your problem might already be available: Don’t migrate to another cloud, replicate between them!
IBM recently published a press release about a “Multi-cloud Storage Toolkit”. This toolkit contains InterCloud Storage (ICStore), a software approach to storing data on multiple clouds (public and/or private!) in order to guarantee service continuity.
The InterCloud could be described as a “cloud-of-clouds” that offers stronger resilience and protection against service outages and data loss than any single cloud could deliver.
The main innovation of ICStore is that you can stripe your data across cloud providers using erasure coding, which means that only fragments of your data are uploaded to each provider, and a qualified majority of providers suffices to read it back. Compared to replication (storing multiple complete copies of a file), this has the advantage that each cloud provider only ever holds fragments of your data — a clear win for data privacy.
Additionally, data ingestion and retrieval require less bandwidth and storage than with complete replicas, yet if one cloud provider fails, you still have full access to your data.
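To make the idea tangible, here is a toy sketch of erasure coding in Python: two data fragments plus one XOR parity fragment, so that any two of the three fragments can rebuild the original. (The provider names and the simple XOR scheme are illustrative assumptions on my part — ICStore’s actual coding scheme is more general and not described in the press release.)

```python
def encode(data: bytes) -> dict:
    """Split data into two halves and add an XOR parity fragment."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # pad second half to equal length
    parity = bytes(x ^ y for x, y in zip(a, b))
    # Hypothetical provider names; each one sees only a fragment.
    return {"cloud-A": a, "cloud-B": b, "cloud-C": parity, "len": len(data)}

def decode(fragments: dict) -> bytes:
    """Reconstruct the original from any two surviving fragments."""
    a = fragments.get("cloud-A")
    b = fragments.get("cloud-B")
    p = fragments.get("cloud-C")
    if a is None:                       # recover first half from b XOR parity
        a = bytes(x ^ y for x, y in zip(b, p))
    elif b is None:                     # recover second half from a XOR parity
        b = bytes(x ^ y for x, y in zip(a, p))
    return (a + b)[:fragments["len"]]   # drop the padding

# One provider goes down -- the data survives:
stored = encode(b"treasure everywhere")
del stored["cloud-B"]                   # simulate an outage at one provider
assert decode(stored) == b"treasure everywhere"
```

Note the storage and bandwidth angle: each fragment is only half the size of the original, so the three fragments together cost 1.5x the data size — versus 2x or more for keeping full replicas at multiple providers.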
The InterCloud Storage technology was demonstrated in June at the IBM Edge 2013 conference in Las Vegas in conjunction with the IBM Storwize platform and is available for early trial testing.
The following picture visualizes this concept:
If you want to learn more, you can read the related press release at http://www-03.ibm.com/press/us/en/pressrelease/42684.wss, or visit IBM at CeBIT in Hannover, where this technology will be showcased at the IBM booth!