IBM Support

Performance issue with Spectrum_MPI

Question & Answer


Question

You find that an MPI communication routine (MPI_Bcast) implemented in Spectrum_MPI/10.1.0 gives poor performance on Paragon (an IBM POWER8 ppc64le system) compared to the same routine in Intel_MPI on ScaffelPike (an x86 system).

Steps to reproduce:
Load Spectrum_MPI and profile the following code:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Broadcast a single integer from rank 0 one million times
    int v1[1];
    int i;
    for (i = 0; i < 1000000; i++) {
        v1[0] = i;
        MPI_Bcast(v1, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    // Print off a hello world message (note: this condition is never
    // true after the loop, since the final broadcast leaves v1[0] == 999999)
    if (v1[0] == 99)
        printf("Hello world from processor %s, rank %d out of %d processors\n",
               processor_name, world_rank, world_size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

For the benchmark provided, there is high variability in the execution time per rank; such variability is not expected for a simple broadcast loop like this. Furthermore, results on your Intel platform, and on the POWER8 platform when using the MXM communication library, are in line with expectations.

Only the results on POWER8 with the default communication layer (PAMI) are off, and by a significant factor.

Data should be obtained over multiple nodes and with a high number of ranks to see the problem. You are testing with 96 ranks over 6 nodes (16 ranks per node).
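
If a tracing library is not at hand, the per-rank spread can be approximated directly with MPI_Wtime. Below is a minimal sketch (not part of the original benchmark) that times the broadcast loop on each rank and reduces the results to the minimum and maximum across ranks:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Time the same broadcast loop as in the benchmark above
    int v1[1];
    double t0 = MPI_Wtime();
    for (int i = 0; i < 1000000; i++) {
        v1[0] = i;
        MPI_Bcast(v1, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }
    double elapsed = MPI_Wtime() - t0;

    // Reduce the per-rank times to min/max to expose the spread
    double tmin, tmax;
    MPI_Reduce(&elapsed, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&elapsed, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (world_rank == 0)
        printf("broadcast loop: min %.3f s, max %.3f s across %d ranks\n",
               tmin, tmax, world_size);

    MPI_Finalize();
    return 0;
}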

Attached is the output of mpitrace, an MPI tracing library that was developed internally at IBM and is available in the latest version of Spectrum MPI; it is typically enabled by preloading the library (for example, via LD_PRELOAD) when the job is launched. Any tracing library will suffice for this experiment.
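
Many tracing tools honor the standard MPI_Pcontrol hook to limit data collection to a region of interest; whether your mpitrace build does is an assumption worth verifying against its documentation. A minimal sketch that brackets only the broadcast loop, so startup costs stay out of the histogram:

#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    // Hypothetical trace window: per the MPI standard, level 1
    // enables profiling and level 0 disables it; whether a given
    // tracing library acts on these calls is tool-specific
    MPI_Pcontrol(1);   // start collecting at the region of interest

    int v1[1];
    for (int i = 0; i < 1000000; i++) {
        v1[0] = i;
        MPI_Bcast(v1, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    MPI_Pcontrol(0);   // stop collecting after the broadcast loop

    MPI_Finalize();
    return 0;
}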

A summary of the variance (time in seconds that each rank spent in MPI) is highlighted below:

- Spectrum MPI with IBM's PAMI communication layer:
Histogram of times spent in MPI
   time-bin   #ranks
      7.940        1
      8.746        0
      9.551        5
     10.356        0
     11.162        0
     11.967        0
     12.772        9
     13.577        9
     14.383       12
     15.188       18
     15.993       18
     16.798        0
     17.604        6
     18.409        0
     19.214       18

- Spectrum MPI with OpenMPI's MXM communication layer:
Histogram of times spent in MPI
   time-bin   #ranks
      3.402        8
      3.404       15
      3.407        9
      3.409        0
      3.412        1
      3.414       33
      3.416       14
      3.419        0
      3.421        0
      3.423        0
      3.426        0
      3.428        0
      3.431        0
      3.433       12
      3.435        4



Document Information

More support for:
IBM Spectrum MPI

Software version:
10.1

Operating system(s):
Linux

Document number:
792467

Modified date:
30 December 2018

UID

ibm10792467
