If you are a technologist like me, you know big data is a big deal—but what is it, and why should we care about it? And how, exactly, do we use it?
Big data transforms the way our society processes information and will change the way we think about our world. It allows us to analyze vast amounts of less-than-perfect data and find correlations in it, even when those correlations are totally non‑intuitive and lead to surprising insights.
For example, my favorite music app can use big data to work out that if I like to listen to Lady Gaga and Carlos Santana, I might also like to listen to Adele (well, it wasn’t obvious to me). And when I say “vast,” I mean really, really gigantic amounts of data, sometimes zettabytes of data (one zettabyte is 10^21 bytes, a one followed by 21 zeros). The University of California San Diego reported that in 2008 Americans consumed 3.6 zettabytes of information, mainly through computer games and TV. Working with data at this scale has become practical only relatively recently, thanks to the availability of huge amounts of inexpensive computing resources (processors, main memory, and especially disk and other persistent storage).
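To make the music example a little more concrete, here is a minimal Python sketch of one simple way such a correlation could be found: counting how often artists show up together in listening histories. This is only an illustration under my own assumptions, not how any particular music app works. The listening histories are invented toy data, and the function names `co_occurrence` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy listening histories, invented purely for illustration.
histories = [
    {"Lady Gaga", "Carlos Santana", "Adele"},
    {"Lady Gaga", "Adele"},
    {"Carlos Santana", "Adele"},
    {"Lady Gaga", "Carlos Santana"},
    {"Carlos Santana", "Miles Davis"},
]

def co_occurrence(histories):
    """Count how often each ordered pair of artists shares a history."""
    counts = Counter()
    for artists in histories:
        for a, b in combinations(sorted(artists), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(liked, histories, top_n=1):
    """Rank artists the listener has not liked yet by how often they
    co-occur with artists the listener already likes."""
    counts = co_occurrence(histories)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in liked and b not in liked:
            scores[b] += n
    return [artist for artist, _ in scores.most_common(top_n)]

suggestions = recommend({"Lady Gaga", "Carlos Santana"}, histories)
print(suggestions)
```

With this toy data, Adele shares more histories with the two liked artists than anyone else, so she comes out on top. A real system would run the same kind of counting over billions of imperfect records, which is where the inexpensive computing resources come in.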
One of the most important open technologies used for big data today, I find, is Apache Hadoop, together with its related open source projects. Many different distributions are built on these projects, however, and the differences between them often cause problems when moving big data applications from one distribution to another. Also, Hadoop alone is insufficient for most big data users: it requires additional capabilities such as provisioning, management, and monitoring functions.
Enter the Open Data Platform Initiative (ODP) with its Open Data Platform Core (ODP Core). ODP intends to provide a single integrated, tested big data platform with stable, well-defined interfaces. This would allow big data consumers of Hadoop and related open technologies to focus on innovations built with that core platform. They would no longer need to spend their energies repeatedly integrating and testing multiple distributions of multiple open source projects themselves. This should accelerate and broaden adoption of open big data solutions, expanding the market for both consumers and vendors of big data.
IBM will be a Platinum Member of ODP with the goal of helping to expand adoption. We plan to commit significant resources to help free up platform consumers’ resources and increase the core platform’s portability and compatibility. ODP intends to be an inclusive community that all are welcome to join. The community will have membership tiers to fit every organization’s needs—no matter the size. We hope this will encourage users to participate in the process directly. The bylaws and structure are still under development but those joining are committed to these principles. I look forward to the growth in membership that is sure to follow.
And while IBM will be doing some of the heavy lifting at ODP, I expect to see many of our clients contributing to ODP as well. I hope these clients will help direct our efforts toward integrating the open technologies they really need to accelerate their big data use. They will also share the insights that ODP brings to their business.
Indeed, big data is a big deal!