July 17, 2013 | Written by: Edgar Garcia
Big data doesn’t just mean a lot of data.
It does, of course, involve a large volume of data generated at an ever-increasing velocity, but it also refers to the variety of structured and unstructured sources involved, and thus to the varying veracity of the data.
Because of this qualitative difference from traditional data, we would also expect a qualitative difference in the results we obtain when analyzing big data. That's why it's so attractive to try to get insights from it.
Big data processed in the cloud
When analyzing big data, we have to explore the data sets first, because we may not know everything about their content beforehand. That way, we aren't restricted to queries formulated a priori. We also have to accept that our findings will shape the next steps of the analysis.
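As a concrete illustration of that first exploratory pass, here is a minimal sketch in Python: before writing any fixed queries, we profile each field of an unfamiliar data set to see what is actually there. The sample records and field names are illustrative assumptions, not real data.

```python
# Sketch: profile an unknown data set before querying it.
# For each field, count which value types actually occur,
# so surprises (missing values, mixed types) surface early.
from collections import Counter

# Toy records standing in for an unexplored data source (assumption)
records = [
    {"user": "a1", "latency_ms": 120, "region": "us"},
    {"user": "b2", "latency_ms": None, "region": "eu"},
    {"user": "c3", "latency_ms": 80, "region": "us"},
]

profile = {}
for rec in records:
    for field, value in rec.items():
        stats = profile.setdefault(field, Counter())
        stats[type(value).__name__] += 1

print(profile)
```

Here the profile immediately reveals that `latency_ms` has a missing value, which would shape the next step of the analysis, exactly the kind of finding a pre-formulated query might have hidden.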
In order to run this kind of analysis, we need a similarly large amount of computing power delivered dynamically as we’re deciding the operations we want to perform.
If you think cloud computing is perfect for this job, you're not alone. There is a huge movement toward taking advantage of the IT efficiency, speed and flexibility that the cloud can provide to process big data.
Data scientists are creatively using whatever sources are available for these studies. Everything from medical records or sensors around cities to TV ratings and social media gossip is being crunched by cloud services to try to discover useful information. But what about the cloud itself?
Big data acquired from the cloud
Consider a cloud provider running thousands of servers and other pieces of equipment, plus an even larger number of virtual machines, in order to deliver different services to a massive number of users. Can you imagine the amount and diversity of log files they would produce?
The provider would be unable to store those logs for long, yet they would still be a gold mine of IT service information.
Some data centers, such as the cloud data center for the IBM instructor-led online training site, are analyzing all this data before they have to drop it. They correlate business data, provisioning process statistics, application and system logs, and even electricity consumption readings to uncover patterns that help anticipate problems and enhance the overall business efficiency of the data center.
There is great potential in the insights related to performance, configuration, management processes, security and compliance that can be obtained by analyzing cloud infrastructure operational data, not to mention the service enhancements that can be achieved by examining application usage patterns.
I think this would provide great feedback for cloud builders and application developers. If you work in this domain, it would be great to know what kind of information from big data would be valuable to you. Share your thoughts on cloud and big data below, or connect with me on Twitter.