OMP_PROC_BIND

The OMP_PROC_BIND environment variable controls the thread affinity policy and whether OpenMP threads can be moved between places. The thread affinity feature gives you fine-grained control over how threads are bound and distributed to places. The thread affinity policies are MASTER, CLOSE, and SPREAD.

OMP_PROC_BIND syntax

>>-OMP_PROC_BIND=--+-TRUE-----------+--------------------------><
                   +-FALSE----------+   
                   | .-,----------. |   
                   | V            | |   
                   '---+-MASTER-+-+-'   
                       +-CLOSE--+       
                       '-SPREAD-'       

TRUE
Binds the threads to places.
FALSE
Allows threads to be moved between places and disables thread affinity.
MASTER
Instructs the execution environment to assign the threads in the team to the same place as the master thread.
CLOSE
Instructs the execution environment to assign the threads in the team to places that are close to the place of the parent thread. This policy does not change the place partition: each implicit task inherits the place-partition-var ICV of the parent implicit task. When the T threads in the team are assigned to the P places in the parent's place partition, the threads are assigned as follows:
  • If T is less than or equal to P, the master thread executes on the place of the parent thread. The thread with the next smallest thread number executes on the next place in the place partition, and so on, wrapping around within the place partition of the master thread.
  • If T is greater than P, each place contains at least S = floor(T/P) consecutive threads. The first S threads with the smallest thread numbers (including the master thread) are assigned to the place of the parent thread. The next S threads with the next smallest thread numbers are assigned to the next place in the place partition, and so on, wrapping around within the place partition of the master thread. When P does not divide T evenly, each remaining thread is assigned to a place in the order of the place list.
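The CLOSE rules above can be expressed as a small function. This is a minimal sketch, not the runtime's implementation: the name close_place is hypothetical, places are numbered 0 to P-1 within the parent's place partition, and the uneven case (T greater than P but not a multiple of P) is deliberately left out, because the exact placement of the remaining threads is not spelled out precisely here.

```c
/* Hypothetical helper: returns the place (0..P-1 within the parent's place
 * partition) that thread `tid` of a T-thread team gets under CLOSE, with the
 * parent thread on place `parent_place`. Returns -1 for invalid arguments and
 * for the uneven case (T > P, P not dividing T), which this sketch omits. */
int close_place(int tid, int T, int P, int parent_place)
{
    if (P <= 0 || tid < 0 || tid >= T || parent_place < 0 || parent_place >= P)
        return -1;
    if (T <= P)
        /* One thread per place, starting at the parent's place and wrapping around. */
        return (parent_place + tid) % P;
    if (T % P == 0) {
        int S = T / P; /* consecutive threads per place */
        return (parent_place + tid / S) % P;
    }
    return -1; /* remaining-thread placement not modeled */
}
```

With T = 4, P = 8, and the parent on place 0, threads 0 to 3 map to places 0, 1, 2, and 3. With the parent on place 6, thread 2 wraps around to place 0.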
SPREAD
Instructs the execution environment to spread a set of T threads as evenly as possible among P places of the parent's place partition at run time. The thread distribution mechanism is as follows:
  • If T is less than or equal to P, the parent partition is divided into T subpartitions, where each subpartition contains at least S = floor(P/T) consecutive places. A single thread is assigned to each subpartition. The master thread executes on the place of the parent thread and is assigned to the subpartition that includes that place. The thread with the next smallest thread number is assigned to the first place in the next subpartition, and so on, wrapping around within the original place partition of the master thread.
  • If T is greater than P, the parent's partition is divided into P subpartitions, each containing a single place. Each place contains at least S = floor(T/P) consecutive threads. The first S threads with the smallest thread numbers (including the master thread) are assigned to the subpartition that contains the place of the parent thread. The next S threads with the next smallest thread numbers are assigned to the next place in the place partition, and so on, wrapping around within the original place partition of the master thread. When P does not divide T evenly, each remaining thread is assigned to a subpartition in the order of the place list.
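A similar sketch for SPREAD follows, again with a hypothetical function name and with the parent thread assumed to run on place 0 of its partition. When P is not a multiple of T, the rules above only say that each subpartition has at least floor(P/T) places. The sketch assumes that the extra places go to the first subpartitions, which is consistent with Results 4 in the Examples. The uneven T > P case is not modeled.

```c
/* Hypothetical helper: returns the place (0..P-1) that thread `tid` of a
 * T-thread team gets under SPREAD, assuming the parent thread runs on place 0
 * of its partition. Returns -1 for invalid arguments and for the uneven
 * T > P case, which this sketch omits. */
int spread_place(int tid, int T, int P)
{
    if (P <= 0 || tid < 0 || tid >= T)
        return -1;
    if (T <= P) {
        int S = P / T;     /* minimum places per subpartition */
        int extra = P % T; /* assumption: the first `extra` subpartitions get S + 1 places */
        /* Each thread goes to the first place of its own subpartition, which
         * starts right after the tid subpartitions before it. */
        return tid * S + (tid < extra ? tid : extra);
    }
    if (T % P == 0)
        return tid / (T / P); /* S = T/P consecutive threads per single-place subpartition */
    return -1;
}
```

With T = 5 and P = 8, the function returns places 0, 2, 4, 6, and 7, which is the assignment listed in Results 4. With T = 8 and P = 4, it places two consecutive threads on each place, as in Results 5.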

where

Place
is a hardware unit that holds an unordered set of processors on which one or more threads can execute.
Place list
is an ordered list that describes all places that are available to the application.
Place partition
is an ordered list that corresponds to a contiguous interval in the place list. The places in the partition are available for a given parallel region.
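In the OMP_PLACES values used in the Examples, each place is written as an interval {start:length}. For instance, {4:4} is the place that holds processors 4, 5, 6, and 7, and the comma-separated intervals form the ordered place list. The following parser is a hypothetical sketch that accepts only this restricted form; other OMP_PLACES forms are not handled.

```c
#include <stdio.h>

/* Hypothetical parser for the restricted OMP_PLACES form used in the Examples:
 * comma-separated intervals "{start:length}". Stores each place's first
 * processor in start[] and its processor count in length[]. Returns the number
 * of places in the place list, or -1 on a syntax error or too many places. */
int parse_places(const char *s, int start[], int length[], int max_places)
{
    int n = 0;
    while (*s != '\0') {
        int consumed = 0;
        if (n >= max_places)
            return -1;
        /* %n is stored only if the closing '}' matched, so consumed == 0 means an error. */
        if (sscanf(s, "{%d:%d}%n", &start[n], &length[n], &consumed) != 2
            || consumed == 0 || length[n] <= 0)
            return -1;
        s += consumed;
        n++;
        if (*s == ',')
            s++;       /* separator between places */
        else if (*s != '\0')
            return -1; /* unexpected character after a place */
    }
    return n;
}
```

Parsing '{0:4},{4:4},{8:4},{12:4}' yields four places in list order, and place 2 starts at processor 8 and holds 4 processors.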

When OMP_PROC_BIND is set to TRUE, MASTER, CLOSE, or SPREAD, a place can be assigned up to THREADS_PER_PLACE threads. Each remaining thread is assigned to a place in the order of the place list.

For each place in OMP_PLACES, THREADS_PER_PLACE is a positive integer and is calculated in the following way:

THREADS_PER_PLACE = floor((the number of resources in that place/the total number of resources (including duplicated resources))*OMP_THREAD_LIMIT)

After THREADS_PER_PLACE is calculated for each place in this manner, if the sum of all the THREADS_PER_PLACE values is less than OMP_THREAD_LIMIT, the values are increased by one, one place at a time from the largest place to the smallest place, until the sum reaches OMP_THREAD_LIMIT. Places of equal size are ordered according to their order in OMP_PLACES.
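The THREADS_PER_PLACE calculation can be sketched as follows. The function name and its inputs are hypothetical: resources[i] is the number of resources in place i, in OMP_PLACES order, so summing them counts a duplicated resource once for every place that lists it, as the formula requires. Integer division gives the floor for non-negative values. Because each floor loses less than one thread, a single pass from the largest place to the smallest always brings the sum up to OMP_THREAD_LIMIT.

```c
/* Hypothetical sketch of the THREADS_PER_PLACE rule. resources[i] is the number
 * of resources in place i, in OMP_PLACES order; out[i] receives the value for
 * place i. */
void threads_per_place(const int resources[], int nplaces, int thread_limit, int out[])
{
    int total = 0, sum = 0;
    for (int i = 0; i < nplaces; i++)
        total += resources[i]; /* a resource listed in several places counts in each */
    for (int i = 0; i < nplaces; i++) {
        /* floor(resources[i] / total * thread_limit) using integer arithmetic */
        out[i] = total > 0 ? resources[i] * thread_limit / total : 0;
        sum += out[i];
    }
    if (total <= 0)
        return;
    /* Each floor loses less than one thread, so deficit < nplaces: one pass suffices. */
    int deficit = thread_limit - sum;
    for (int i = 0; i < nplaces; i++) {
        /* rank = position of place i when ordered from largest to smallest,
         * with equal-sized places kept in OMP_PLACES order */
        int rank = 0;
        for (int j = 0; j < nplaces; j++)
            if (resources[j] > resources[i] || (resources[j] == resources[i] && j < i))
                rank++;
        if (rank < deficit)
            out[i]++;
    }
}
```

For instance, eight places of four resources each with OMP_THREAD_LIMIT=10 (a value chosen only for illustration) give floor(4/32 * 10) = 1 thread per place, and the two leftover threads raise the first two places to 2.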

Usage

By default, the OMP_PROC_BIND environment variable is not set.

If the initial thread cannot be bound to the first place in the OpenMP place list, the runtime execution environment issues a message and assigns threads according to the default place list.

The OMP_PROC_BIND and XLSMPOPTS environment variables interact with each other according to the following rules:

Table 1. Thread binding rule summary
OMP_PROC_BIND setting     XLSMPOPTS setting                    Thread binding result
OMP_PROC_BIND is not set  XLSMPOPTS is not set                 Threads are not bound.
OMP_PROC_BIND is not set  startproc/stride or procs (note 2)   Threads are bound according to the settings in XLSMPOPTS.
OMP_PROC_BIND is not set  XLSMPOPTS setting is invalid         Threads are not bound.
OMP_PROC_BIND=TRUE        XLSMPOPTS is not set                 Threads are bound.
OMP_PROC_BIND=TRUE        startproc/stride or procs (note 2)   Threads are bound according to the settings in XLSMPOPTS (note 1).
OMP_PROC_BIND=TRUE        XLSMPOPTS setting is invalid         Threads are bound.
OMP_PROC_BIND=FALSE       XLSMPOPTS is not set                 Threads are not bound.
OMP_PROC_BIND=FALSE       startproc/stride or procs (note 2)   Threads are not bound.
OMP_PROC_BIND=FALSE       XLSMPOPTS setting is invalid         Threads are not bound.
Note:
  1. If procs is set and it specifies fewer CPU IDs than the number of threads that the program uses, the remaining threads are also bound to the listed CPU IDs, but in no particular order. If the startproc/stride suboptions are used, the startproc value is smaller than the number of CPUs, and the stride value causes a thread to be bound to a CPU outside the range of available places, some of the threads are bound and some are not.
  2. The startproc/stride and procs suboptions of XLSMPOPTS are deprecated.

The OMP_PROC_BIND environment variable provides a portable way to control whether OpenMP threads can be migrated. The startproc/stride and procs suboptions of the XLSMPOPTS environment variable, which are IBM extensions, provide finer control over binding OpenMP threads to places. If the portability of your application is important, use only the OMP_PROC_BIND environment variable to control thread binding.

When OMP_PROC_BIND is set to MASTER, CLOSE, or SPREAD, the startproc/stride and procs suboptions of XLSMPOPTS are ignored.
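Table 1 and the rule that MASTER, CLOSE, and SPREAD ignore the startproc/stride and procs suboptions can be condensed into a lookup function. The enumerations and the function name below are hypothetical labels for the table's rows; they are not part of any XL or OpenMP API.

```c
/* Hypothetical labels for the rows of Table 1. */
enum proc_bind { PB_UNSET, PB_TRUE, PB_FALSE, PB_POLICY /* MASTER, CLOSE, or SPREAD */ };
enum xlsmpopts { XL_UNSET, XL_BINDING /* startproc/stride or procs */, XL_INVALID };
enum binding { NOT_BOUND, BOUND_PER_XLSMPOPTS, BOUND };

/* Returns the thread binding result for a combination of settings. */
enum binding binding_result(enum proc_bind pb, enum xlsmpopts xl)
{
    switch (pb) {
    case PB_FALSE:
        return NOT_BOUND; /* binding is disabled whatever XLSMPOPTS says */
    case PB_POLICY:
        return BOUND;     /* startproc/stride and procs are ignored */
    case PB_TRUE:
        return xl == XL_BINDING ? BOUND_PER_XLSMPOPTS : BOUND;
    case PB_UNSET:
    default:
        return xl == XL_BINDING ? BOUND_PER_XLSMPOPTS : NOT_BOUND;
    }
}
```

Keeping OMP_PROC_BIND as the outer switch mirrors the table's grouping: the OMP_PROC_BIND setting decides first, and XLSMPOPTS matters only when OMP_PROC_BIND is unset or TRUE.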

For a program that contains both OpenMP and Open MPI code, the OpenMP runtime detects the Open MPI code by the presence of the OMPI_COMM_WORLD_RANK environment variable. In that case, if you do not set OMP_PLACES explicitly, OMP_PROC_BIND is set to TRUE.

Examples

The following examples show the thread binding and thread affinity results when the program built from myprogram.c is run with different environment variable settings.

myprogram.c
int main(){
    // ...
}  
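To check assignments like the ones below on your own system, the body of myprogram.c could report where each thread runs. The following is a minimal sketch with a hypothetical helper name. It relies on omp_get_place_num(), which OpenMP 4.5 defines to return the thread's place number, or -1 if the thread is not bound. It falls back to a message when the compiler does not provide OpenMP 4.5.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper for the body of myprogram.c: prints the place of each
 * thread in a parallel region and returns the team size. */
int report_placement(void)
{
    int team_size = 1;
#if defined(_OPENMP) && _OPENMP >= 201511 /* omp_get_place_num() is OpenMP 4.5 */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int place = omp_get_place_num(); /* -1 if the thread is not bound to a place */
        #pragma omp critical
        printf("OMP thread %d is assigned to place %d\n", tid, place);
        #pragma omp single
        team_size = omp_get_num_threads();
    }
#else
    printf("Built without OpenMP 4.5 support: no place information\n");
#endif
    return team_size;
}
```

Compile with OpenMP enabled (for example, -qsmp=omp with the XL compilers or -fopenmp with GCC), call report_placement() from main, and run the program with one of the following settings exported in the environment. The output order of the lines varies from run to run.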
Environment variable settings 1
OMP_NUM_THREADS=4; 
OMP_PROC_BIND=MASTER; 
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'

Results 1: Every thread in the team is assigned to the place on which the master thread executes. All four threads are assigned to place 0.

Environment variable settings 2
OMP_NUM_THREADS=4; 
OMP_PROC_BIND=close; 
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'
Results 2: The number of threads (4) is smaller than the number of places (8), so each thread is assigned to its own place, starting at the place of the parent thread. The thread assignment is as follows:
  • OMP thread 0 is assigned to place 0
  • OMP thread 1 is assigned to place 1
  • OMP thread 2 is assigned to place 2
  • OMP thread 3 is assigned to place 3
Environment variable settings 3
OMP_NUM_THREADS=4; 
OMP_PROC_BIND=spread; 
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'
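Results 3: The number of threads (4) is smaller than the number of places (8), so four subpartitions are formed. 8 is evenly divided by 4, so each subpartition contains S = floor(8/4) = 2 places, and each thread is assigned to the first place of its subpartition. The thread assignment is as follows: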
  • OMP thread 0 is assigned to place 0
  • OMP thread 1 is assigned to place 2
  • OMP thread 2 is assigned to place 4
  • OMP thread 3 is assigned to place 6
Environment variable settings 4
OMP_NUM_THREADS=5; 
OMP_PROC_BIND=spread; 
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'
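Results 4: The number of threads (5) is smaller than the number of places (8), so five subpartitions are formed. 8 is not evenly divided by 5, so the subpartitions contain one or two consecutive places ({0,1}, {2,3}, {4,5}, {6}, and {7}), and each thread is assigned to the first place of its subpartition. The thread assignment is as follows: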
  • OMP thread 0 is assigned to place 0
  • OMP thread 1 is assigned to place 2
  • OMP thread 2 is assigned to place 4
  • OMP thread 3 is assigned to place 6
  • OMP thread 4 is assigned to place 7
Environment variable settings 5
OMP_NUM_THREADS=8; 
OMP_PROC_BIND=spread;
OMP_PLACES='{0:4},{4:4},{8:4},{12:4}'
Results 5: The number of threads (8) is greater than the number of places (4), so four subpartitions are formed. 8 is evenly divided by 4, so two threads are assigned to each subpartition. The thread assignment is as follows:
  • OMP thread 0 and thread 1 are assigned to place 0
  • OMP thread 2 and thread 3 are assigned to place 1
  • OMP thread 4 and thread 5 are assigned to place 2
  • OMP thread 6 and thread 7 are assigned to place 3
Environment variable settings 6
OMP_NUM_THREADS=7; 
OMP_PROC_BIND=spread; 
OMP_PLACES='{0:4},{4:4},{8:4},{12:4}'
Results 6: The number of threads (7) is greater than the number of places (4), so four subpartitions are formed. 7 is not evenly divided by 4, so each subpartition receives at least floor(7/4) = 1 thread, and the three remaining threads go to places 1, 2, and 3. The thread assignment is as follows:
  • OMP thread 0 is assigned to place 0
  • OMP thread 1 and thread 2 are assigned to place 1
  • OMP thread 3 and thread 4 are assigned to place 2
  • OMP thread 5 and thread 6 are assigned to place 3

