By Beth L. Hoffman, IBM and Chuck Calio, IBM
In February, IBM, along with other big data industry leaders, announced a new initiative called the Open Data Platform (ODP) to advance collaboration and innovation in big data technologies. The focus is on innovating around the Apache Hadoop open source core, growing the ecosystem, and enabling solutions on a standardized Open Data Platform across the ecosystem partners.
IBM was one of the founding partners of ODP and has helped pave the way for any company to join the Open Data Platform. Members can be ISVs, system integrators, distribution vendors, or end users, and the existing membership is already varied. Members have the privilege of participating directly in the definition of the ODP core platform and can help shape its future evolution. The current platinum members include GE, Hortonworks, IBM, Infosys, Pivotal, Telstra, and SAS.
One desired outcome of the ODP is to bring new big data solutions to market more quickly. This will be achieved by making it easier for ecosystem vendors to enable and test on a well-defined common Hadoop core platform. The common platform defines a specific set of core Apache Hadoop components and versions. With all the partners testing and certifying to this same common core, customers can test, deploy, and gain value from their big data environments more quickly, knowing that a set of big data solutions all align to the same Open Data Platform. Ecosystem partners don’t have to guess which environments to target, and customers don’t have to ask every participating partner which environments they support. Time to value is accelerated and risks are reduced.
With these industry leaders aligning to the same common platform, we are confident that the functionality and quality of service of the environment will grow over time. The Apache Software Foundation will also benefit from the ODP, because all the innovations made to the core will be contributed back to the larger Apache Hadoop community, which in turn benefits anyone using Hadoop.
Initially, the ODP core is focused on two key Apache components: Apache Hadoop (which includes HDFS, YARN, and MapReduce) and Apache Ambari (which is used for installing and managing Hadoop across a cluster). The intention is for the scope to broaden to include other open source projects and components, and we anticipate new announcements when component decisions are made. We expect that key technologies like Spark will become part of the ODP common platform.
IBM recently announced IBM Open Platform with Apache Hadoop (IOP), which is available on IBM Power Systems. IOP is IBM’s first ODP-compliant product. It is 100% open source and is tested and certified to the ODP Core. Software optimizations that take advantage of IBM POWER’s unique capabilities are part of the ODP open source code base.
IOP not only supports ODP but also includes Spark support. Spark already runs two times faster on POWER8 as described in this blog post. Beyond that, work is under way to extend the capabilities and benefits of Spark even further by enabling POWER8 accelerators. For example, CAPI Flash will be exploited to expand the in-memory advantages of Spark for RDD caching and intermediate shuffle data. The result will be the ability to run larger Spark workloads with less memory without sacrificing performance. Additional integration of other OpenPOWER technologies, such as CAPI FPGA accelerators, GPUs, and CAPI Networking, will lead to further advances in Spark performance across query processing, machine learning, graph, and streaming workloads. These are just a few examples of the work going on to make POWER8-based systems the premier platform for ODP deployment. All of these POWER8 benefits will be contributed back to the Apache community and will be available to all solutions that leverage the Open Data Platform with POWER8 and OpenPOWER.
IOP is available as a free download for all clients, for unlimited trial use or for production use with an Elite Support option. IOP provides a consistent environment in which clients and ISVs can develop their Big Data and Analytics projects without concern about vendor lock-in, because all ODP members align on the ODP common components. For more information and access to the free download of IBM Open Platform with Apache Hadoop, see: http://www.ibm.com/software/products/en/ibm-open-platform-with-apache-hadoop
And, join Power Systems on October 5th for a webcast highlighting new capabilities and product announcements that will help you go faster than ever before! http://bit.ly/1OcrNru