Question & Answer
Question
You find that an MPI communication routine (MPI_Bcast) implemented in Spectrum_MPI/10.1.0 performs poorly on Paragon (IBM Power8, ppc64le) compared with the same routine in Intel_MPI on ScaffelPike (an x86 system).
Steps to reproduce:
Load Spectrum_MPI and profile the following code:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    int v1[1];
    int i;
    for (i = 0; i < 1000000; i++)
    {
        v1[0] = i;
        MPI_Bcast(v1, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    // Print off a hello world message
    if (v1[0] == 99) printf("Hello world from processor %s, rank %d out of %d processors\n", processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
For the benchmark provided, the time spent in MPI varies widely from rank to rank. Such variability is not expected in general.
Furthermore, the results on your Intel platform, and on the Power8 platform when using the MXM communication library, are in line with expectations.
Only the results on Power8 with the default communication layer (PAMI) are off, by a significant factor.
Data should be obtained over multiple nodes and with a high number of ranks to see the problem. You are testing with 96 ranks over 6 nodes (16 ranks per node).
Attached is the output of mpitrace, an MPI tracing library that was developed internally at IBM and is available in the latest version of Spectrum MPI, although any tracing library will suffice for this experiment.
The per-rank variability is summarized below:
- Spectrum MPI with IBM's PAMI communication layer:
Histogram of times spent in MPI
time-bin #ranks
7.940 1
8.746 0
9.551 5
10.356 0
11.162 0
11.967 0
12.772 9
13.577 9
14.383 12
15.188 18
15.993 18
16.798 0
17.604 6
18.409 0
19.214 18
- Spectrum MPI with OpenMPI's MXM communication layer:
Histogram of times spent in MPI
time-bin #ranks
3.402 8
3.404 15
3.407 9
3.409 0
3.412 1
3.414 33
3.416 14
3.419 0
3.421 0
3.423 0
3.426 0
3.428 0
3.431 0
3.433 12
3.435 4
Document Information
More support for: IBM Spectrum MPI
Software version: 10.1
Operating system(s): Linux
Document number: 792467
Modified date: 30 December 2018
UID: ibm10792467