APAR status
Closed as program error.
Error description
LOMP PRODUCES INCORRECT BINDING WHEN OMP_PLACES=THREADS

The test case below prints the thread-to-CPU binding. Running 4 ranks
with 4 threads each on 1 node, the binary built with xlc produces:

Rank 0 rzansel47 Thread map: 0 1 2 3
Rank 1 rzansel47 Thread map: 0 1 2 3
Rank 2 rzansel47 Thread map: 0 1 2 3
Rank 3 rzansel47 Thread map: 0 1 2 3

All four ranks bind their OpenMP threads to the same hardware threads
(CPUs 0-3) instead of each rank receiving a distinct set.

================================================================
====TESTCASE:

#include <stdio.h>
#include <sched.h>
#include <omp.h>
#include <unistd.h>
#include <string.h>
#include <mpi.h>   /* needed for MPI_Init, MPI_Comm_rank, etc. */

/* Doesn't seem to be prototyped for XL */
extern int sched_getcpu(void);

#define MAX_SIZE 1024

int map_array[MAX_SIZE];

void map_thread()
{
    int tid = omp_get_thread_num();
    int cpu = sched_getcpu();

    if ((tid >= 0) && (tid < MAX_SIZE)) {
        map_array[tid] = cpu;
    } else
        printf("Unexpected tid %i cpu %i\n", tid, cpu);
}

int main(int argc, char *argv[])
{
    int numtasks, rank, rc;
    char map_buf[10000];
    char host[50];
    int i;
    char *tag = "";

    rc = MPI_Init(&argc, &argv);
    if (rc != MPI_SUCCESS) {
        printf("Error starting MPI program. Terminating.\n");
        MPI_Abort(MPI_COMM_WORLD, rc);
    }

    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Use first arg as tag if set */
    if (argv[1] != NULL)
        tag = argv[1];

    for (i = 0; i < MAX_SIZE; i++)
        map_array[i] = -1;

    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        map_thread();
    }

    sprintf(map_buf, "Thread map:");
    for (i = 0; i < MAX_SIZE; i++) {
        if (map_array[i] != -1) {
            sprintf(map_buf + strlen(map_buf), " %i", map_array[i]);
        }
    }

    gethostname(host, sizeof(host));
    printf("%s Rank %4i %12s %s\n", tag, rank, host, map_buf);

    MPI_Finalize();
    return (0);
}
================================================================
Local fix
N/A
Problem summary
USERS AFFECTED: Users who run mixed OpenMP and MPI applications with multiple ranks and OMP_PLACES=threads or OMP_PLACES=cores are affected by this issue.

PROBLEM DESCRIPTION: When handling OMP_PLACES=threads, the OpenMP runtime treats each MPI rank of an application as if all hardware threads of the system were available to it. As a result, different MPI ranks running on the same system have their OpenMP threads bound to the same set of hardware threads, so the application may not fully utilize all available hardware threads or achieve optimal performance.
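Until a fixed runtime is installed, one possible mitigation is to hand each rank a disjoint OMP_PLACES list before the application starts. The wrapper below is a minimal sketch, not part of the APAR fix: it assumes the Open MPI local-rank variable OMPI_COMM_WORLD_LOCAL_RANK (other launchers export a different name, e.g. SLURM_LOCALID) and a fixed thread count per rank.

```shell
#!/bin/sh
# Hypothetical per-rank OMP_PLACES wrapper (sketch, names assumed).
# Gives each local MPI rank a disjoint place list so ranks on the same
# node do not bind their OpenMP threads to the same CPUs.
RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-0}   # Open MPI local rank; 0 if unset
NTHREADS=${OMP_NUM_THREADS:-4}          # threads per rank
START=$((RANK * NTHREADS))              # first CPU for this rank

# Build a list with one hardware thread per place, e.g. {0},{1},{2},{3}
PLACES=""
i=$START
while [ "$i" -lt $((START + NTHREADS)) ]; do
    PLACES="$PLACES,{$i}"
    i=$((i + 1))
done
export OMP_PLACES="${PLACES#,}"         # strip leading comma

echo "$OMP_PLACES"
# In real use the wrapper would launch the application here:
# exec "$@"
```

Launched as `mpirun -np 4 ./wrapper.sh ./a.out`, rank 1 would compute `{4},{5},{6},{7}` with 4 threads per rank; whether the place numbering matches the node's CPU topology is an assumption that should be checked per system.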
Problem conclusion
This problem degrades the performance of mixed OpenMP and MPI applications, since co-located ranks contend for the same hardware threads.
Temporary fix
Comments
APAR Information
APAR number
LI81541
Reported component name
XL C/C++ LINUX
Reported component ID
5725C7310
Reported release
G11
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-06-03
Closed date
2020-06-23
Last modified date
2020-06-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
XL C/C++ LINUX
Fixed component ID
5725C7310
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSXVZZ","label":"XL C\/C++ for Linux"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"G11","Line of Business":{"code":"LOB57","label":"Power"}}]
Document Information
Modified date:
24 June 2020