Using HPC Challenge with RHEL 5.2 on Power


Using HPC Challenge with RHEL 5.2 on Power6 systems

For discussions or questions...

To start a discussion or get a question answered, consider posting on the Linux for Power Architecture forum.

Additional Linux on Power performance information is available on the Performance page.

While this article focuses on RHEL 5.2 with Power6, a new article is being drafted for tuning HPCC on Power7 systems.


In this paper we introduce how to take advantage of Red Hat's latest operating system level (RHEL 5.2) with a standard clustered HPC workload called HPC Challenge. The paper highlights HPC software components such as OpenMPI, which ships with RHEL 5.2, and shows how to leverage either the standard gcc compiler included with Linux or the IBM commercial compilers available for Power systems.

The paper covers installing, building, running, and basic tuning, and explains how to analyze and compare the results. We highlight the easy and significant performance gains that are possible when leveraging IBM's compilers and ESSL.

We used an IBM Power 575 system with 32 POWER6 cores running at 4.7 GHz and 128 GB of memory for these engineering examples.


Introduction

The HPC Challenge benchmark consists of seven separate workloads, building on the success of the top500.org Linpack (HPL) based workload. The benchmark was designed to better measure the overall performance of high-end HPC (High Performance Computing) systems running kernels with complex memory access patterns. For example, HPC Challenge measures the performance of the system's processors, memory, network bandwidth, and network latency.

HPC Challenge was written by Jack Dongarra, who is part of the Innovative Computing Laboratory (ICL) at the University of Tennessee, along with the following contributors: David Bailey, Jeremy Kepner, David Koester, Bob Lucas, John McCalpin, Antoine Petitet, Rolf Rabenseifner, Daisuke Takahashi, and R. Clint Whaley. To get more information, including biographies and webpage links, see HPCC Collaborators.

The seven workloads include:

  1. HPL, or High-Performance Linpack Benchmark - measures a system's floating point rate of execution for solving a linear system of equations
  2. DGEMM - measures a system's floating point rate of execution for double precision real matrix-matrix multiplication
  3. STREAM - measures a system's sustainable memory bandwidth and computation rate for simple vector kernels
  4. PTRANS, or Parallel Matrix Transpose - measures the communications capacity of the network
  5. RandomAccess - measures a system's rate of random integer updates of memory
  6. FFT, or Fast Fourier Transform - measures a system's floating point rate of execution for a double precision complex one-dimensional Discrete Fourier Transform
  7. Communication bandwidth and latency tests - measure a system's network latency and bandwidth

To find out more information about the HPC Challenge benchmark, see http://icl.cs.utk.edu/hpcc/

We found the benchmark quick to build and easy to get initial results from. Tuning and interpreting the basic results were straightforward, but we observed that detailed tuning and optimization take more time to learn.

 


Installing

To download the latest version of HPC Challenge, click the download link here: http://icl.cs.utk.edu/hpcc/software/index.html

Un-tar the source code tar ball; it will create an hpcc-<ver> directory. For example, in our case, we downloaded version 1.2.0 from the HPCC web site into /usr/local.

<download the tar ball to /usr/local/>

# cd /usr/local
# tar -zxf hpcc-1.2.0.tar.gz
# ls hpcc*
hpcc-1.2.0.tar.gz
hpcc-1.2.0:
DGEMM  FFT  _hpccinf.txt  hpl  include  Makefile  PTRANS  
RandomAccess  README.html  README.txt  src  STREAM


# cd hpcc-1.2.0
# ls -Fc
DGEMM/   FFT/           _hpccinf.txt  hpl/        include/  Makefile  
PTRANS/  RandomAccess/  README.html   README.txt  src/      STREAM/

Inside this directory there are directories for the DGEMM, FFT, PTRANS, RandomAccess, and STREAM workloads, which contain source code and header files. Also under this top-level directory are a README file and a file called _hpccinf.txt. The latter is a sample input file for the workload, very similar to the HPL.dat file used for tuning Linpack. Tuning through this file is discussed in depth after we build and run the basic benchmark workloads.

In the hpl directory, there are README, INSTALL, and TUNING files. There is also a www directory under hpl, which contains many helpful files with information about links, references, results, scalability, software, tuning, and so on. The setup directory holds the example makefiles provided with the workload; using these makefiles is discussed in more detail below.

 


Dependencies

In order to run the HPC Challenge benchmark, there are a few things that must be installed on your system. First, you must have some implementation of either BLAS (Basic Linear Algebra Subprograms) or VSIPL (Vector Signal Image Processing Library). You are allowed to use optimized versions of BLAS that are architecture dependent, such as IBM's ESSL (Engineering Scientific Subroutine Library). While ESSL must be obtained from IBM, an open-source version of BLAS ships with RHEL 5.2 (blas-3.0-37.el5). Both 32-bit and 64-bit versions are provided. Be sure to install both blas-devel RPMs as well.
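
If the BLAS packages are not already present, a yum install along these lines should pull them in (package names as shipped with RHEL 5.2; the explicit .ppc/.ppc64 suffixes are one way to request both architectures):

# Install both the 32-bit and 64-bit BLAS runtime and development packages
yum install blas.ppc blas.ppc64 blas-devel.ppc blas-devel.ppc64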

If you were able to acquire ESSL, we assume here that you've installed the latest version on your system.

Also, you will need an implementation of MPI (Message Passing Interface) installed. We used OpenMPI, an open source high performance message passing library; OpenMPI 1.2.5 is provided with RHEL 5.2. When installing OpenMPI, make sure to install both the openmpi and openmpi-devel packages so that you get the needed header files. Both 32-bit and 64-bit builds are available. Be aware that numerous dependencies get pulled in when installing OpenMPI, so using "yum install" will make things easier.
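
For reference, a minimal install command looks like this (add the .ppc64 variants if you also want the 64-bit builds; yum resolves the remaining dependencies):

# Install OpenMPI and its development headers; dependencies are pulled in automatically
yum install openmpi openmpi-devel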

For example, on our system, these packages were installed:

  compat-dapl-1.2.5-2.0.7-2.el5@ppc      mpi-selector-1.0.0-2.el5@noarch        
  libibverbs-1.1.1-9.el5@ppc             openmpi-1.2.5-5.el5@ppc                
  librdmacm-1.0.7-1.el5@ppc              openmpi-libs-1.2.5-5.el5@ppc 
  compat-dapl-1.2.5-2.0.7-2.el5@ppc64    openmpi-1.2.5-5.el5@ppc64              
  libibverbs-1.1.1-9.el5@ppc64           openmpi-libs-1.2.5-5.el5@ppc64         
  openmpi-devel-1.2.5-5.el5@ppc          openmpi-devel-1.2.5-5.el5@ppc64  
  librdmacm-1.0.7-1.el5@ppc64

In addition to the software mentioned above, you will need a compiler installed. For optimal performance on Power systems, we recommend you try IBM XL Fortran Advanced Edition for Linux and IBM XL C/C++ Advanced Edition for Linux.

The other obvious compiler option is GCC, the GNU Compiler Collection, which comes standard with the operating system. Another option, based on gcc and provided specifically for Power users, is the Advance Toolchain, which provides a newer version of the gcc compiler and the toolchain libraries.
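
A quick way to confirm the compilers are in place is to ask each one for its version. The XL compiler paths below are the defaults used in the rest of this article; adjust them if you installed elsewhere:

# Check the GNU and IBM XL compilers
gcc --version
/opt/ibmcmp/vac/9.0/bin/xlc -qversion
/opt/ibmcmp/xlf/11.1/bin/xlf -qversion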

Our initial examples are on a single node, but we show you one way to extend this testing onto multiple machines (small clusters) later in this article.

In the following text, we assume that the compilers, BLAS, ESSL, and OpenMPI have all been installed.

 


Setup

Before you can run HPC Challenge, you must first provide the input file that passes the needed parameters to the workload. The input file for HPC Challenge is called hpccinf.txt and is very similar to the input file used for the Linpack benchmark. The HPC Challenge tar ball provides a sample input file named _hpccinf.txt in the hpcc-<ver> directory. For testing purposes, it is sufficient to copy this sample input file to a file named "hpccinf.txt" without any changes. Tuning the workload through this file is discussed in the Tuning section.

[~]# cd /usr/local/hpcc-1.2.0
[hpcc-1.2.0]# cp _hpccinf.txt hpccinf.txt

An additional step is to create a hostfile for MPI (Message Passing Interface), even if you are only using one system. This file can reside anywhere on the system, since you specify its location in the run command. Below is an example hostfile (which we arbitrarily put in the /etc directory). In this hostfile you can use IP addresses or fully qualified hostnames, but do not use 'localhost'.

[hpcc-1.2.0]# vi /etc/hostfile 
mytestsystem.austin.ibm.com

 


Building

First, to build the HPC Challenge workload, you need to create a makefile specific to your architecture and environment. There are some sample makefiles under hpl/setup. If your setup is similar to one of the provided makefiles, use that makefile as a starting point. The most commonly changed sections of the makefile set up MPI, BLAS, and the compilers. Below is an example of each of these sections of the makefile:

In our case, we'll be setting up MPI with OpenMPI.

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = xxx
MPinc        = xxx
MPlib        = xxx

For BLAS, we'll be using the BLAS as provided by RHEL 5.2 and the ESSL version.

#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = xxx
LAinc        = xxx
LAlib        = xxx

And for the compilers, we'll provide examples for gcc and the IBM Compilers.

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = xxx
CCNOOPT      = xxx
CCFLAGS      = xxx
#
LINKER       = xxx
LINKFLAGS    = xxx

For example, for an IBM Power 575 we used Make.PWRPC_FBLAS as our starting point, changing these sections as necessary. Below are some example configurations for 32-bit builds, with comments about what to change for 64-bit builds, provided you have the corresponding 64-bit installations of the dependencies.

We will provide three common, easy examples: first, BLAS with gcc; then BLAS with xlc; and finally, ESSL with xlc. (ESSL cannot be used with gcc.)

# cd /usr/local/hpcc-1.2.0
# cp hpl/setup/Make.PWRPC_FBLAS   hpl/Make.ppc64.blas.gcc
# cp hpl/setup/Make.PWRPC_FBLAS   hpl/Make.ppc64.blas.xlc
# cp hpl/setup/Make.PWRPC_FBLAS   hpl/Make.ppc64.essl.xlc

In the sections below, we show what needs to be changed in each makefile. The makefiles reside in the hpl subdirectory, and that is where you edit them.

The "make" command itself is executed from the hpcc-1.2.0 directory, but first you'll need to edit these three makefiles.

Once the build has finished, it creates an executable called hpcc. The executable is not run directly by itself; the environment variables and run commands needed to launch it are covered in each of the example configuration sections below.

 

Example Configurations

 

BLAS, OpenMPI, gcc

Below is an example of the changes to the makefile (Make.ppc64.blas.gcc) for the Power 575 machine using a generic BLAS library, OpenMPI, and the gcc compilers. This build changes the "F2CDEFS" variable in the sample makefile as well as the normal sections.

cd /usr/local/hpcc-1.2.0
vi hpl/Make.ppc64.blas.gcc

Search for the following sections and edit the file for the variables being set.

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = 
MPinc        = 
MPlib        = -lmpi
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = 
LAinc        =
LAlib        = -lblas 
#

...

F2CDEFS      = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
##### These values are the default for this variable, so either explicitly
##### change this variable to these values, or simply comment the variable
##### out to take the default values.

...

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = /usr/lib/openmpi/1.2.5-gcc/bin/mpicc
####### For 64 bit, replace 'lib' with 'lib64' #######
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) 
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
LINKER       = $(CC)
LINKFLAGS    =

Using this makefile, issue the following commands to clean up, build, and run. The sample input file requires a minimum of 4 MPI processes, specified with the -np option.

# clean up first and setup directories, then build
make arch=ppc64.blas.gcc clean
make arch=ppc64.blas.gcc


# RUN:
/usr/lib/openmpi/1.2.5-gcc/bin/mpirun --hostfile /etc/hostfile -np 4 ./hpcc

# Replace this example path with the path to your mpirun command provided by openmpi ('lib64' for 64 bit)

 

BLAS, OpenMPI, IBM Compilers

Below is an example of the changes to the makefile (Make.ppc64.blas.xlc) for the Power 575 machine using a generic BLAS library, OpenMPI, and the IBM XL compilers. This build changes the "F2CDEFS" variable in the sample makefile as well as the normal sections.

cd /usr/local/hpcc-1.2.0
vi hpl/Make.ppc64.blas.xlc

Search for the following sections and edit the file for the variables being set.

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        =
MPinc        =
MPlib        =-lmpi
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        =
LAinc        =
LAlib        =-lblas

...

F2CDEFS      = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
##### These values are the default for this variable, so either explicitly
##### change this variable to these values, or simply comment the variable
##### out to take the default values.

...

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           =/usr/lib/openmpi/1.2.5-gcc/bin/mpicc
####### For 64 bit, replace 'lib' with 'lib64' #######
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS)
#
LINKER       = $(CC)
LINKFLAGS    =-lblas -L/opt/ibmcmp/xlsmp/1.7/lib/ -L/opt/ibmcmp/xlf/11.1/lib/ -lxlf90_r -lxlfmath -lxlomp_ser -lxl
####### For 64 bit, replace both 'lib' with 'lib64' #######

Using this makefile, issue the following commands to clean up, build, and run. The sample input file requires a minimum of 4 MPI processes, specified with the -np option. For this configuration, you also need to define a few environment variables for the OpenMPI wrapper compiler mpicc as well as for the run command. These variables override the compiler wrapper defaults, which in this case would be gcc. See the OpenMPI FAQs on this topic for more information:

# set the openmpi flags, then clean up and setup directories, then build

# C, Fortran77, and Fortran90 flags
export OMPI_CC=/opt/ibmcmp/vac/9.0/bin/xlc                   
export OMPI_F77=/opt/ibmcmp/xlf/11.1/bin/xlf
export OMPI_FC=/opt/ibmcmp/xlf/11.1/bin/xlf90

# C Compiler flag
export OMPI_CFLAGS='-O3 -q32'
####### For 64 bit, replace '-q32' with '-q64' #######

# Linker flag
export OMPI_LDFLAGS='-q32 -L/usr/lib/openmpi/1.2.5-gcc/lib'
####### For 64 bit, replace '-q32' with '-q64' and replace the first 'lib' with 'lib64' #######



make arch=ppc64.blas.xlc clean
make arch=ppc64.blas.xlc


#RUN:

/usr/lib/openmpi/1.2.5-gcc/bin/mpirun --hostfile /etc/hostfile -np 4 ./hpcc
# Replace this example path with the path to your mpirun command provided by openmpi ('lib64' for 64 bit)
# Replace the path to your hostfile after the --hostfile option
# Replace the desired number of processes after the -np option

 

IBM's ESSL, OpenMPI, IBM Compilers

Below is an example of the changes to the makefile (Make.ppc64.essl.xlc) for the IBM Power 575 machine using IBM's ESSL, OpenMPI, and the IBM XL compilers.

cd /usr/local/hpcc-1.2.0
vi hpl/Make.ppc64.essl.xlc

Search for the following sections and edit the file for the variables being set.

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        =
MPinc        =
MPlib        =-lmpi
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        =
LAinc        =
LAlib        =-lessl

...

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           =/usr/lib/openmpi/1.2.5-gcc/bin/mpicc        
####### For 64 bit, replace 'lib' with 'lib64' #######
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS)


LINKER       = $(CC)
LINKFLAGS    = -lessl -L/opt/ibmcmp/xlsmp/1.7/lib/ -L/opt/ibmcmp/xlf/11.1/lib/ -lxlf90_r -lxlfmath -lxlomp_ser -lxl
####### For 64 bit, replace both 'lib' with 'lib64' #######

Using this makefile, issue the following commands to clean up, build, and run. The sample input file requires a minimum of 4 MPI processes, specified with the -np option. For this configuration, you also need to define a few environment variables for the OpenMPI wrapper compiler mpicc as well as for the run command. These variables override the compiler wrapper defaults, which in this case would be gcc. See the OpenMPI FAQs on this topic for more information:

# set the openmpi flags, then clean up and setup directories, then build

# C, Fortran77, and Fortran90 flags
export OMPI_CC=/opt/ibmcmp/vac/9.0/bin/xlc                   
export OMPI_F77=/opt/ibmcmp/xlf/11.1/bin/xlf
export OMPI_FC=/opt/ibmcmp/xlf/11.1/bin/xlf90

# C Compiler flag
export OMPI_CFLAGS='-O3 -q32'
####### For 64 bit, replace '-q32' with '-q64' #######

# Linker flag
export OMPI_LDFLAGS='-q32 -L/usr/lib/openmpi/1.2.5-gcc/lib'
####### For 64 bit, replace '-q32' with '-q64' and replace the first 'lib' with 'lib64' #######



make arch=ppc64.essl.xlc clean
make arch=ppc64.essl.xlc

#RUN:

/usr/lib/openmpi/1.2.5-gcc/bin/mpirun --hostfile /etc/hostfile -np 4 ./hpcc
# Replace this example path with the path to your mpirun command provided by openmpi ('lib64' for 64 bit)
# Replace the path to your hostfile after the --hostfile option
# Replace the desired number of processes after the -np option

After the workload finishes, there will be an output file named hpccoutf.txt under the hpcc-<ver> directory on the control node (normally the machine listed first in the hostfile). This is the results file generated by the benchmark and is discussed in more detail later.

 

Warnings

You may see these warnings:

libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host mytestsystem.austin.ibm.com was unable to find any HCAs.
Another transport will be used instead, although this may result in 
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host mytestsystem.austin.ibm.com was unable to find any NICs.
Another transport will be used instead, although this may result in 
lower performance.
--------------------------------------------------------------------------

To fix the libibverbs warning, load the following InfiniBand modules: ib_mthca and rdma_ucm. Alternatively, make sure you have the following packages installed, all of which ship with RHEL 5.2: openib, libmthca, libibverbs, libibverbs-devel, librdmacm, librdmacm-devel.
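
For example, the following is one way to address the warning; the modprobe route assumes a Mellanox HCA handled by the ib_mthca driver:

# Load the InfiniBand modules manually
modprobe ib_mthca
modprobe rdma_ucm

# Or install the OpenIB packages shipped with RHEL 5.2
yum install openib libmthca libibverbs libibverbs-devel librdmacm librdmacm-devel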

The next two warnings, about OpenIB not finding any HCAs and uDAPL not finding any NICs, are simply OpenMPI letting you know that it did not find the high-speed interconnect hardware it looks for by default. To suppress these warnings, insert the MCA parameter "--mca btl ^openib,udapl" in your mpirun command. For example:

/usr/lib/openmpi/1.2.5-gcc/bin/mpirun --mca btl ^openib,udapl --hostfile /etc/hostfile -np 4 ./hpcc

To learn more about these warnings, see the related thread in the OpenMPI User's Mailing List Archives, and to learn more about MCA parameters, see the OpenMPI FAQs.

 


Tuning

There are two types of runs for HPC Challenge. First there are baseline runs, where the user can tune the input parameters and build options, and then there are optimized runs, where the user can additionally make certain code modifications to the benchmark. This paper discusses baseline tuning briefly. To learn more, or to learn about tuning optimized runs, see the HPCC rules.

There are several things you can change to tune the benchmark toward a better score. First, there are the software pieces: using optimized versions of your chosen compiler, MPI, linker, and BLAS or VSIPL libraries will improve performance.

Optimized Math Libraries

We experienced a very large gain between building with a generic BLAS library and building with IBM's ESSL. On an IBM Power 575, the Linpack (HPL) score for an out-of-the-box run with no other tuning increased significantly when we moved from the generic BLAS to IBM's carefully tuned ESSL product.

Transparent Large Pages

Also, you are allowed to use compiler and link options to improve your result. Another software piece you may consider is libhugetlbfs, which makes 16 MB large pages available to the workload and can boost the score in some cases.
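
As a rough sketch of one way to use it, assuming libhugetlbfs is installed and huge pages have been reserved (the page count and mpirun path below are just examples):

# Reserve some 16MB huge pages (size the count to your problem)
echo 64 > /proc/sys/vm/nr_hugepages

# Run hpcc with libhugetlbfs backing malloc with huge pages
/usr/lib/openmpi/1.2.5-gcc/bin/mpirun --hostfile /etc/hostfile -np 4 \
    -x LD_PRELOAD=libhugetlbfs.so -x HUGETLB_MORECORE=yes ./hpcc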

Tuning HPC Challenge

Another big piece of improving your score is tuning the input file (hpccinf.txt). The input file is described in depth in the TUNING file under the hpl directory and also in hpl/www/tuning.html. These files explain, line by line, what each of the input file parameters means and also provide some guidelines at the bottom.

Another good source of guidelines for these parameters is the HPCC FAQs or the file hpl/www/faqs.html.

There are a few main parameters that you can start with.

  • First there is N (line 6) which is your problem size, or in other words your matrix dimension for HPL (Linpack).
  • Next, there is your NB (line 8) which is your block size, or in other words your sub matrix size.
  • Then there is your P (line 11) and Q (line 12), which are the number of process rows and columns in the grid (PxQ). P times Q must equal the number of processes you run with.

The input file makes it easy to try multiple configurations in one run by allowing you to specify how many values of N to try (line 5), how many values of NB to try (line 7), how many process grids to try (line 10), how many additional values of N to try for PTRANS (line 33), how many additional values of NB to try for PTRANS (line 35), and so on. Refer to the files and links mentioned above to learn about the other parameters and to read some guidelines for setting these values.
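
As a worked illustration of the rule of thumb from the HPL guidelines (size N so the matrix fills roughly 80% of memory), the numbers below are examples for a single 128 GB node, not recommendations:

# N ~ sqrt(0.80 * memory_in_bytes / 8), since each double-precision element is 8 bytes
# For 128 GB: sqrt(0.80 * 137438953472 / 8) ~ 117000; round down to a multiple of NB
#
# Example hpccinf.txt settings:
#   line 5:  1            (number of problem sizes)
#   line 6:  116992       (N, a multiple of NB=128)
#   line 7:  1            (number of block sizes)
#   line 8:  128          (NB)
#   line 10: 1            (number of process grids)
#   line 11: 4            (P)
#   line 12: 8            (Q)   P*Q must equal the value passed to -np, here 32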

InfiniBand Performance

One other thing to consider is your network and I/O performance. You can try using InfiniBand, a very high speed interconnect technology that can improve the performance of high performance computing systems. To configure InfiniBand, first install openib, libibverbs-utils, and libehca, which all come with the RHEL 5.2 installation. Then issue the following command:

service openibd start

This starts the openibd service and loads the InfiniBand kernel modules and device driver, establishing the IB connection. The following commands make use of the IPoIB protocol and are useful for verifying the IB connection, although they are not necessary to establish and use the IB connection with OpenMPI.

# On node A issue:
ifconfig ib0 192.168.2.1

# On node B issue:
ifconfig ib0 192.168.2.2
# Repeat for each node in the cluster with unique local IP addresses

# On all nodes, we disable the firewall (we assume a protected lab)
iptables -F

# On each node, check to see if the device is visible by issuing:
ibv_devinfo

# Check to see if the state of the port is active

# Now check the connection:
# On node A issue:
ibv_rc_pingpong

# On node B issue (192.168.2.1 is node A's IB IP address assigned earlier):
ibv_rc_pingpong 192.168.2.1

# If the connection is established, Mbit/sec and usec/iter metrics will be displayed

 


Analyzing and Comparing Results

The results file (hpccoutf.txt) contains a lot of information. First, the input file is printed for reference along with some information about the workload, the hostname, etc. Then there are the full results for all the workloads.

HPC Challenge consists of three different types of tests.

Local Runs

First there are Single runs, which are also referred to as "Local". These tests are run on a single processor.

Star Runs

Next, there are Star runs, also referred to as "EP" or "Embarrassingly Parallel". For these runs, each processor is doing computation in parallel, but the processors are not explicitly communicating with each other.

MPI Runs

Lastly, there are MPI runs, also referred to as "Global". For these runs, each processor is doing computation in parallel and the processors are explicitly communicating with each other.

The workloads are run in the following order: PTRANS, HPL, StarDGEMM, SingleDGEMM, StarSTREAM, SingleSTREAM, MPIRandomAccess, StarRandomAccess, SingleRandomAccess, MPIFFT, StarFFT, SingleFFT, and finally LatencyBandwidth. Some of the workloads will provide a short summary after the results stating how many tests were measured, how many failed, and how many were skipped.

After the full results, there is a summary section consisting of a long list of variables that contain system information, workload scores, workload times, and so on. If you upload and submit your results to the HPCC website, the site takes your hpccoutf.txt file and some system information and creates a much easier-to-read report.
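
If you just want a quick look at a few of the headline numbers without uploading, a grep over the summary section works; the variable names below are the ones we looked at, so check your own hpccoutf.txt for the exact set it contains:

# Pull selected scores out of the end-of-run summary section
grep -E "HPL_Tflops|PTRANS_GBs|StarSTREAM_Triad|MPIRandomAccess_GUPs|MPIFFT_Gflops" hpccoutf.txt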

See the HPCC upload link for more information.

  • To see an example of the "results" file that is created after submission, click any of the links in the System Information field. There is also a very nice feature for comparing results on the HPCC website under the Kiviat Diagram link: check the runs you want to compare, scroll down, and hit the Graph button, and it creates a radar graph comparing each component of the selected runs.

For more information on how to compare the published results to your own, or on why your results may differ, see the HPCC FAQs.



Multi-machine setup

  • For a multi-machine configuration, simply build the workload on each machine as described above and then include each machine's IP address or hostname in the hostfile on the control machine. You only need to issue the run command on the control machine, and it is good practice to list the control machine first in the hostfile so that the hpcc output file is created there.
  • Also, for a multi-machine setup, you need to set up your ssh keys so that OpenMPI can remotely start processes without any password prompts. To do this, enter the following commands on one of the nodes (as root):

Note: This master node is often called the control node in HPC applications. Additional nodes are generally referred to as the compute nodes. In our example here, we keep things simple by using no passphrase, which is not recommended for production environments.

ssh-keygen -t dsa
# Hit enter to accept the default file location
# Hit enter twice for no passphrase

cd /root/.ssh
cat id_dsa.pub >> authorized_keys

Then copy the .ssh directory to /root on all other nodes. Make sure to do a chmod 700 /root/.ssh/ on all nodes if those permissions are not already set. Verify that you can now ssh between any of the nodes without a password.
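
For example, assuming a compute node named computenode1 (a hypothetical hostname), the copy and check could look like this:

# Copy the keys from the control node to a compute node
scp -r /root/.ssh computenode1:/root/

# Fix permissions on the compute node if needed, then verify passwordless login
ssh computenode1 'chmod 700 /root/.ssh; hostname'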

 


References

HPCC website, http://icl.cs.utk.edu/hpcc/

IBM's ESSL (Engineering Scientific Subroutine Library), http://www-304.ibm.com/jct03004c/systems/p/software/essl/index.html

IBM XL Fortran Advanced Edition for Linux, http://www-306.ibm.com/software/awdtools/fortran/xlfortran/features/linux/xlf-linux.html

IBM XL C/C++ Advanced Edition for Linux, http://www-306.ibm.com/software/awdtools/xlcpp/features/linux/xlcpp-linux.html

InfiniBand, http://www.infinibandta.org/home

Libhugetlbfs, http://sourceforge.net/projects/libhugetlbfs

BLAS (Basic Linear Algebra Subprograms), http://www.netlib.org/blas/index.html

VSIPL (Vector Signal Image Processing Library), http://www.vsipl.org/

MPI (Message Passing Interface), http://www-unix.mcs.anl.gov/mpi/

OpenMPI, http://www.open-mpi.org/

GCC, the GNU Compiler Collection, http://gcc.gnu.org/