As the planet becomes more integrated, the rate of data growth is increasing exponentially. This data explosion is rendering commonly accepted data management practices inadequate, and it has given rise to a new wave of business challenges around data management and analytics. To meet this need, the IBM Power System S812LC was used to design a big data solution built on a heritage of strong resiliency, availability, and security: the IBM Data Engine for Hadoop and Apache Spark. Here are five things you should know:
The IBM Power System S812LC
- The solution is an IBM offering that combines the line of OpenPOWER Linux servers designed for big data and analytics with an open source Apache Hadoop and Spark distribution and optional advanced analytics capabilities.
The IBM Data Engine for Hadoop and Spark is a fully integrated infrastructure solution with integrated cluster management and analytics software that is optimized for Hadoop-based and Spark-based workloads. The solution is designed to deliver superior price and performance for these workloads while improving ease of deployment and cluster operational simplicity for clients deploying big data and analytics applications to support their businesses.
- The IBM Data Engine for Hadoop and Spark offers a range of configurations based on the new storage-dense, analytics-optimized S812LC line of IBM POWER8 servers. Choosing an infrastructure that can scale to handle big data demands is vital to meeting service level agreements and maintaining access to insights. Businesses choose Power Systems because they know Power Systems is built for big data workloads that demand high performance and high reliability.
- The server, networking, storage, and software components are pre-integrated and tested prior to delivery. Services are available to quickly bring the cluster into initial operation. The hardware and software components in this infrastructure are customizable to allow the best performance or the best price/performance ratio.
- Spark workloads benefit from large, fast memory and many processor threads. The IBM Power System servers use the POWER8 chip, which has 8 to 10 cores per socket. With SMT8 technology, the POWER8 chip runs eight threads per core for parallel Java workloads, which takes maximum advantage of the processing capability. The POWER8 chip also has high memory and I/O bandwidth, which is critical for a big data system to achieve performance.
- Hadoop workloads require large storage capacity, high-speed networks, and a resilient cluster file system.
The cluster is integrated with IBM Platform™ Cluster Manager and IBM Open Platform with Apache Hadoop and Spark, and optionally with IBM Spectrum Scale and IBM Spectrum Symphony, which include advanced capabilities for storage and resource optimization. This optimized configuration enables users to show results more quickly.
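To make the thread arithmetic from the POWER8 point above concrete, here is a minimal sketch that multiplies cores per socket by the eight SMT8 threads per core. The core counts (8 and 10) and the SMT8 figure come from the text; the function name, the `sockets` and `nodes` parameters, and the four-node example are illustrative assumptions, not a description of any specific configuration.

```python
# Rough sketch: hardware threads available with SMT8.
# The helper name and the sockets/nodes parameters are hypothetical,
# introduced only for illustration.

SMT_THREADS_PER_CORE = 8  # SMT8: eight threads per core, as stated above


def hardware_threads(cores_per_socket: int, sockets: int = 1, nodes: int = 1) -> int:
    """Return the total number of hardware threads for the given layout."""
    return cores_per_socket * SMT_THREADS_PER_CORE * sockets * nodes


if __name__ == "__main__":
    # Core counts taken from the text: 8 to 10 cores per socket.
    for cores in (8, 10):
        print(f"{cores} cores per socket: {hardware_threads(cores)} threads")

    # Assumed example: a four-node cluster of single-socket, 10-core servers.
    print(f"4 nodes: {hardware_threads(10, sockets=1, nodes=4)} threads")
```

A figure like this is only an upper bound on parallelism; how many threads a Spark or Hadoop job actually uses depends on how the cluster software is configured.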
For more information on the IBM Data Engine for Hadoop and Spark, refer to the following:
IBM Data Engine for Hadoop and Spark IBM Redbooks Publication
IBM Power Systems Redbooks
IBM Power Systems Home Page
IBM Data Engine for Hadoop and Spark – Power Systems Edition
IBM Power System S812LC
Dino E. Quintero
Project Leader for IBM Redbooks on Cloud, Analytics, DR, and HPC Solutions