The Apache Spark community and enterprise Spark adoption have both been growing rapidly. Many enterprises are now experimenting with Spark as an in-memory engine to accelerate a range of common analytics workloads. However, early adoption often results in individual, isolated Spark clusters, as different lines of business or functional groups set up their own infrastructure to learn how to take advantage of Spark. As initial adoption moves into production, IT organizations need to respond with systems that can efficiently host these different groups on shared resources. They must also provide an optimized infrastructure that delivers even faster time to insight on a manageable platform.
How IBM Spectrum Conductor with Spark provides outstanding management of Spark applications
IBM Spectrum Conductor with Spark integrates the Apache Spark framework so that multiple Spark instances, including different Spark versions, can run simultaneously in a shared multi-tenant environment. This capability reduces complexity and helps users keep up with frequent updates to open-source Spark distributions. IBM Spectrum Conductor with Spark uses high-efficiency resource scheduling technology to put idle resources to work running Spark jobs, speeding time to results and optimizing resource utilization. In addition, it provides the monitoring, alerting, reporting and diagnostic capabilities required to run Spark in the enterprise.
How Power Systems provides optimal time-to-insights for key Spark workloads
IBM Power Systems with the POWER8 processor provides an ideal environment for deploying Spark applications and accelerating big data workloads, thanks to industry-leading memory bandwidth, cache size and processor performance. Some of the most popular Spark workloads use Spark SQL, Spark Streaming, Spark MLlib machine learning and Spark GraphX analytics. SQL and streaming workloads benefit from POWER8's simultaneous multithreading (SMT) density of up to 8 threads per core, 4X as many as Intel offers[ref]POWER8 supports 8 threads per core; x86 supports 2 threads per core.[/ref]. Machine learning and graph workloads perform complex computations that often iterate repeatedly over the same data set, so they benefit from POWER8's high memory bandwidth[ref]Up to 4X, depending on the specific x86 and POWER8 servers being compared.[/ref] and large caches[ref]Up to 4.5X more cache, comparing Intel E7-8890 servers to 12-core POWER8 servers.[/ref], each up to roughly 4X what Intel offers. The balanced design of POWER8 servers is intended to maximize utilization across the compute, memory, cache and I/O resources of each server. The net result is a 2X[ref]All results are based on IBM internal testing of three SparkBench benchmarks: SQL RDD Relation, Logistic Regression and SVM.
POWER8 cluster: 6 data nodes and 1 management node. Each node is an IBM Power System S812LC with 10 cores / 80 threads (POWER8, 2.92 GHz), 256 GB memory, Red Hat 7.2, Spark 1.5.1 and OpenJDK 1.8.
x86 cluster: 6 data nodes and 1 management node. Each node is an Intel Xeon E5-2620 v3 system with 12 cores / 24 threads (2.4 GHz), 256 GB memory, Red Hat 7.1, Spark 1.5.1 and OpenJDK 1.8.
[/ref] average per-core performance advantage across key Spark workloads, which translates into faster insights and more efficient clusters capable of hosting multi-tenant environments.
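Because the two test clusters have different core counts (six 10-core POWER8 data nodes versus six 12-core x86 data nodes), a "per-core" comparison has to normalize for cores. The post does not state exactly how IBM computed or averaged its figure, so the Python sketch below rests on assumptions: per-core throughput is taken as 1 / (runtime × cores) for a fixed amount of work, only the data nodes are counted, and the "average" is an arithmetic mean. The function names and the workload dictionary are hypothetical and for illustration only.

```python
from statistics import mean

# Core counts of the six data nodes in each footnoted test cluster.
# Assumption: the management node is excluded from the per-core calculation.
POWER8_CORES = 6 * 10  # IBM Power System S812LC, 10 cores per node
X86_CORES = 6 * 12     # Intel Xeon E5-2620 v3 nodes, 12 cores per node


def per_core_speedup(runtime_x86_s: float, runtime_power_s: float,
                     cores_x86: int = X86_CORES,
                     cores_power: int = POWER8_CORES) -> float:
    """Hypothetical helper: per-core speedup of POWER8 over x86 for one workload.

    For a fixed amount of work, throughput is proportional to 1 / runtime,
    so throughput per core is 1 / (runtime * cores). The ratio of the two
    per-core throughputs simplifies to
    (runtime_x86 * cores_x86) / (runtime_power * cores_power).
    """
    if runtime_x86_s <= 0 or runtime_power_s <= 0:
        raise ValueError("runtimes must be positive")
    return (runtime_x86_s * cores_x86) / (runtime_power_s * cores_power)


def average_per_core_advantage(runtimes: dict[str, tuple[float, float]]) -> float:
    """Hypothetical helper: arithmetic mean of per-core speedups.

    `runtimes` maps a workload name to (x86 runtime, POWER8 runtime) in seconds.
    An arithmetic mean is an assumption; a geometric mean is also common
    when averaging ratios.
    """
    return mean(per_core_speedup(x86, power) for x86, power in runtimes.values())
```

For example, if both clusters finished a workload in the same wall-clock time, `per_core_speedup(100.0, 100.0)` returns 1.2 (that is, 72 / 60), because the POWER8 cluster did the same work with 12 fewer cores. Plug in measured runtimes for each benchmark to reproduce a per-core average under these assumptions.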
The combined solution value
Together, IBM Spectrum Conductor with Spark and Power Systems offer an integrated solution ideal for running multi-tenant enterprise Spark deployments with blazing speed and efficiency. The IBM Data Engine for Hadoop and Spark, built with storage-dense S812LC POWER8 servers, offers an integrated cluster configuration that delivers a ready-to-use environment for deploying IBM Spectrum Conductor with Spark.