New results for Hadoop performance testing

Technical Blog Post


By: Anirban Chatterjee.

Last year, our research team published a paper showing how a 10-node Hadoop cluster of IBM PowerLinux 7R2 servers could sort a terabyte of data in less than 9 minutes.  At the time, this beat the best known result achieved with a comparable cluster of x86 nodes by more than a factor of two.
 

The team has not been standing still, however.  With the February launch of our new 7R2s, which include enhanced POWER7+ processors, the team has pushed the envelope even further and, with a similarly sized cluster, can now sort a terabyte of data in less than 6.7 minutes.
 

The IBM China Research Lab reached this milestone using a 10-node cluster running RHEL 6.2 and Hadoop 1.1.3, managed with IBM Platform Symphony.  The cluster comprised one master control node and nine compute nodes.  At 16 cores per compute node, or 144 compute cores in total, this amounts to a sorting rate of 1.04 GB/min/core.  (By comparison, a recent benchmark using an 18-node Cloudera Hadoop cluster of HP ProLiant Gen8 DL380 systems achieved a sorting rate of 0.57 GB/min/core.*)
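
For readers who want to check the arithmetic, here is a minimal sketch of how the per-core rate can be reproduced. It assumes that a terabyte is counted as 1000 GB and that only the cores of the nine compute nodes are included; the function and variable names are illustrative and do not come from the benchmark itself.

```python
def sort_rate_gb_per_min_per_core(data_gb, minutes, compute_nodes, cores_per_node):
    """Return the sorting rate in GB per minute per core."""
    total_cores = compute_nodes * cores_per_node
    return data_gb / minutes / total_cores


# Figures quoted in the post; 1 TB is assumed to mean 1000 GB.
power_rate = sort_rate_gb_per_min_per_core(
    data_gb=1000, minutes=6.7, compute_nodes=9, cores_per_node=16
)

# Rate quoted for the comparison cluster, taken as given.
comparison_rate = 0.57

# Per-core advantage of the PowerLinux cluster over the comparison cluster.
per_core_ratio = power_rate / comparison_rate

print(f"PowerLinux cluster: {power_rate:.2f} GB/min/core")
print(f"Per-core ratio vs. comparison cluster: {per_core_ratio:.2f}x")
```

Because the sort finished in less than 6.7 minutes, the computed figure is a lower bound on the actual rate.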
 

We’ll have more details on the testing environment soon, but proof points like this show that Power Systems and Platform Symphony can provide high-performance data analytics platforms at a reasonable cost.  IBM solutions can deliver rapid results for big data challenges, often in half the time of other solutions.

*http://www.hp.com/hpinfo/newsroom/press_kits/2012/HPDiscover2012/Hadoop_Appliance_Fact_Sheet.pdf

 
