One system. Half a Teraflop. Linux.
Recently the IBM Linux Performance team submitted a LINPACK HPC result of 500 GFlops (half a teraflops!) on a single IBM Power 575 compute node. These water-cooled systems pack significant compute power and memory into a dense form factor.
This result was produced using 32 POWER6 cores running at 4.7GHz, 128 GB of DDR2 533 MHz memory, and RHEL 5.2, the latest release at the time. Other software included the IBM XL Fortran V11.1 and IBM XL C/C++ V9.0 for Linux compilers, IBM Engineering and Scientific Subroutine Library (ESSL) V18.104.22.168 for Linux on Power, libhugetlbfs v1.0.1 (transparent 16MB large pages), and Open MPI 1.2.5. Excluding supercomputer-class systems, this was the best result published on a 32-core system as of May 30, 2008, which is impressive for a dense 2U form factor.
With the introduction of 64KB memory page support for POWER processors in RHEL 5, applications that benefit from larger page sizes now see significant performance gains right out of the box. See 64KB pages on Linux for an introduction to how Linux uses the alternative page sizes provided by the POWER hardware. As always, actual performance gains depend on the application, the software used, and the tuning applied.
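To check which page sizes a running system is actually using, a small diagnostic like the one below can help. It is a minimal sketch, not part of RHEL or libhugetlbfs, and the helper names are our own. It asks the kernel for its base page size through sysconf (on a RHEL 5 ppc64 kernel configured for 64KB pages this should report 65536 bytes) and reads the Hugepagesize field from /proc/meminfo, which reports the default huge page size (the 16MB large pages used by libhugetlbfs in this setup).

```python
import os
import re
from typing import Optional


def base_page_size() -> int:
    """Base page size in bytes used by the running kernel (65536 on a 64KB-page kernel)."""
    return os.sysconf("SC_PAGE_SIZE")


def huge_page_size(meminfo_path: str = "/proc/meminfo") -> Optional[int]:
    """Default huge page size in bytes, parsed from /proc/meminfo; None if not reported."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                m = re.match(r"Hugepagesize:\s+(\d+)\s+kB", line)
                if m:
                    return int(m.group(1)) * 1024  # the field is reported in kB
    except OSError:
        pass  # no /proc/meminfo (not Linux, or a restricted environment)
    return None


if __name__ == "__main__":
    print(f"base page size: {base_page_size()} bytes")
    hp = huge_page_size()
    print(f"huge page size: {hp} bytes" if hp else "huge page size: not reported")
```

Reading /proc/meminfo rather than calling into libhugetlbfs keeps the check dependency-free, so it works the same on any Linux system.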
TOP500.org and LINPACK
LINPACK has been a widely respected and heavily used benchmark for more than 20 years. It was developed with highly vectorized supercomputers in mind. The following description of the LINPACK package is taken from http://www.netlib.org/linpack.
- “LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems. The package solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular, and tridiagonal square. In addition, the package computes the QR and singular value decompositions of rectangular matrices and applies them to least-squares problems. LINPACK uses column-oriented algorithms to increase efficiency by preserving locality of reference.”
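The work the benchmark times is the solution of a dense system Ax = b by Gaussian elimination with partial pivoting (in LINPACK itself, the DGEFA/DGESL routines; the HPL code used for TOP500 runs performs a blocked, distributed version of the same LU factorization). The sketch below is a minimal, unoptimized Python illustration of that algorithm only. It is row-oriented for readability, unlike LINPACK's column-oriented Fortran, it is not ESSL or HPL code, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor(a: Matrix) -> Tuple[Matrix, List[int]]:
    """LU factorization with partial pivoting (Gaussian elimination).

    Returns (lu, piv): lu packs the unit lower-triangular L (below the
    diagonal) and the upper-triangular U (on and above it); piv[k] is the
    row swapped into position k at step k.
    """
    n = len(a)
    lu = [row[:] for row in a]  # factor a copy, leave the input untouched
    piv: List[int] = []
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        piv.append(p)
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # update the trailing submatrix
    return lu, piv


def lu_solve(lu: Matrix, piv: List[int], b: List[float]) -> List[float]:
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    x = b[:]
    # Apply the same row interchanges to b, in factorization order.
    for k, p in enumerate(piv):
        x[k], x[p] = x[p], x[k]
    # Forward substitution: L y = P b (L has an implicit unit diagonal).
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Back substitution: U x = y.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
    b = [5.0, -2.0, 9.0]
    print(lu_solve(*lu_factor(A), b))  # prints [1.0, 1.0, 2.0]
```

Factoring once and solving separately mirrors the DGEFA/DGESL split: the O(n^3) factorization dominates the run time, which is why benchmark results are driven by how fast the hardware and math library can perform it.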
The Rpeak of the IBM Power 575 at 4.7GHz is 602 GFlops, the theoretical limit the system can achieve. Rpeak is calculated as the number of cores times the CPU frequency times the number of floating-point operations each core can complete per clock cycle. For POWER6 that number is 4, because each core has two floating-point units that can each complete a fused multiply-add (two operations) every cycle. In this case, 32 * 4.7 GHz * 4 = 601.6 GFlops. At 500 GFlops, this combination of RHEL 5.2 and the supporting software on the IBM Power 575 achieved an impressive 83% of the system's theoretical peak.
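The Rpeak arithmetic, and the efficiency figure that follows from it (Rmax / Rpeak), can be reproduced in a few lines. The inputs are the numbers quoted in this article; the function names are illustrative only.

```python
def rpeak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFlops: cores x clock (GHz) x flops per core per cycle."""
    return cores * ghz * flops_per_cycle


def efficiency(rmax_gflops: float, rpeak: float) -> float:
    """Fraction of the theoretical peak actually achieved (Rmax / Rpeak)."""
    return rmax_gflops / rpeak


if __name__ == "__main__":
    peak = rpeak_gflops(cores=32, ghz=4.7, flops_per_cycle=4)
    print(f"Rpeak = {peak:.1f} GFlops")                   # Rpeak = 601.6 GFlops
    print(f"Efficiency = {efficiency(500.0, peak):.1%}")  # Efficiency = 83.1%
```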
A complete listing of published LINPACK results can be found at http://www.netlib.org/benchmark/performance.pdf.
IBM Power 575 with Linux
The IBM Power 575 system is designed for extreme high-performance and parallel computing for customers who require a highly scalable solution. Hundreds of IBM Power 575 frames can be clustered; each frame holds up to 14 2U nodes, for a total of up to 448 POWER6 cores and 3.5 TB of memory per frame. Combined with the benefit of chilled-water cooling, this makes the system an ideal solution for customers who need scalable, compute-intensive systems with an economical TCO. For more information on the IBM Power 575, see this report.
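The per-frame figures follow directly from the per-node numbers. The sketch below derives them, assuming every node is the 32-core 4.7GHz configuration described above and that 3.5 TB means 3.5 * 1024 GB. The frame-level Rpeak it computes is simple arithmetic from the per-node peak, not a published measurement, and the constant and function names are hypothetical.

```python
NODES_PER_FRAME = 14                 # 14 2U nodes per frame, as stated above
CORES_PER_NODE = 32                  # POWER6 cores per Power 575 node
NODE_RPEAK_GFLOPS = 32 * 4.7 * 4     # assumed: every node runs at 4.7GHz


def frame_totals(frame_memory_tb: float = 3.5):
    """Return (cores per frame, memory per node in GB, frame Rpeak in TFlops)."""
    cores = NODES_PER_FRAME * CORES_PER_NODE
    mem_per_node_gb = frame_memory_tb * 1024 / NODES_PER_FRAME  # assumes 1 TB = 1024 GB
    rpeak_tflops = NODES_PER_FRAME * NODE_RPEAK_GFLOPS / 1000
    return cores, mem_per_node_gb, rpeak_tflops


if __name__ == "__main__":
    cores, mem_gb, tflops = frame_totals()
    print(f"{cores} cores, {mem_gb:.0f} GB per node, {tflops:.2f} TFlops Rpeak per frame")
```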
For more performance-related information on LINPACK, RHEL 5, POWER6, and other benchmark and software stack components, see the paper: An Assessment of Leadership Performance with POWER6 Processors and Red Hat Enterprise Linux 5.1
IBM and the Linux community continue to focus on improving the HPC software stack for customers. This single-system result demonstrates the building block that customers use to scale up to tens or hundreds of nodes in HPC clusters.