By: Anirban Chatterjee.
Last year, our research team published a research paper showing how a 10-node Hadoop cluster of IBM PowerLinux 7R2 servers could sort through a terabyte of data in less than 9 minutes. At the time, this beat the best known result achieved with a comparable cluster composed of x86 nodes by over a factor of two.
The IBM China Research Lab reached this milestone using a 10-node cluster running RHEL 6.2 and Hadoop 1.1.3, managed with IBM Platform Symphony. The cluster comprised of one master control node and nine compute nodes. At 16 cores per compute node, this amounts to a sorting rate of 1.04 GB/min/core. (By comparison, a recent benchmark using an 18-node Cloudera Hadoop cluster of HP ProLiant Gen8 DL380 systems achieved a sorting rate of 0.57 GB/min/core.*)