Data scientists need a cloud sandbox

Big data is not just about scaling your data analytics processing platforms to keep up with the onslaught of new information. Just as important, big data is about bringing together your best and brightest minds, your data scientists, and giving them the tools they need to interactively and collaboratively explore rich information sets.

Data scientist productivity is a critical concern, especially when you’re talking about high-priced talent in short supply. If you don’t provide your data scientists with scalable modeling platforms, you won’t realize the full value of your investment in big data.

Today’s statistical modelers and business analysts need high-performance, cloud-centric development platforms, often known as “sandboxes,” where they can aggregate and prepare data sets, tweak segmentations and decision trees, and iterate through statistical models as they look for deep patterns in the data.

Big data sandboxes are where you develop the all-important intellectual property: the advanced analytic models that extract intelligence from otherwise inchoate gobs of content. To be as productive as possible, teams of data scientists must have massively parallel cloud-computing resources (CPU, memory, storage, and I/O capacity) at their fingertips, available both within their sandboxing platforms and in the operational cloud environments to which they will deploy their models.

If you fail to provide them with the cloud-based scalability they need to run a growing range of jobs, you’ll be wasting their time as they queue up for access to limited processing and storage resources.
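To put rough numbers on that queuing cost, here is a minimal sketch in Python. It assumes a simple first-come, first-served model in which every job is submitted at the same moment and each job occupies one processing slot. The function name, the job durations, and the slot counts are all hypothetical, chosen only for illustration; they do not describe any particular sandbox product.

```python
import heapq

def total_wait_hours(job_hours, slots):
    """Total hours jobs spend waiting under first-come, first-served scheduling.

    Assumes (hypothetically) that all jobs are submitted at time 0 and that
    each job needs exactly one processing slot for its whole duration.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # Each heap entry is the time at which one slot next becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    waited = 0.0
    for hours in job_hours:
        start = heapq.heappop(free_at)  # earliest slot to free up
        waited += start                 # submitted at time 0, so wait == start time
        heapq.heappush(free_at, start + hours)
    return waited

# Hypothetical workload: eight two-hour modeling jobs submitted together.
jobs = [2.0] * 8
for slots in (1, 2, 8):
    print(slots, "slot(s):", total_wait_hours(jobs, slots), "hours of waiting")
```

Under these made-up numbers, total waiting falls from 56 hours with one slot to 24 hours with two, and to zero when every job gets its own slot. Real workloads arrive at different times and compete for memory, storage, and I/O as well, so treat this as an illustration of the trend rather than a capacity-planning tool.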

Sandbox scalability is critical, but a productive sandbox needs more than raw horsepower. Your sandboxing platform must also embed comprehensive, extensible libraries of reusable algorithms and models for advanced analytics. Today your data science requirements may revolve around traditional statistical analysis, data mining, and predictive modeling, and libraries for all of these should be included in every one of your sandboxing environments. But your data scientists will increasingly need to incorporate libraries for MapReduce, R, geospatial analysis, matrix manipulation, natural language processing, sentiment analysis, and other advanced analytic algorithms as well.

And don’t skimp on training and other skills-enhancement initiatives, because you need sufficient numbers of the right kinds of data scientists for your big-data projects. Data science’s learning curve is formidable. Your organization may need to establish a data-science center of excellence and a structured training curriculum to develop professionals who have mastered this demanding discipline.

Here, for your inspiration, are several IBM resources on the topic of data scientists in the business:

And here are several blogs that I authored examining various aspects of data scientists in modern business:

Last but not least, we will be holding a Twitter chat on “The Rise of the Data Scientist,” on May 9 from 4-5 p.m. ET. We invite you to join us on this chat, using hashtag #cloudchat. I will be one of the panelists, along with’s chief scientist Hilary Mason and STORM Insights founder & CEO Adrian Bowles. More info on the chat can be found here.

We look forward to engaging you further on this exciting topic.
