OMP_PROC_BIND
The OMP_PROC_BIND environment variable controls the thread affinity policy and whether OpenMP threads can be moved between places. The thread affinity feature gives you fine-grained control over how threads are bound and distributed to places. The thread affinity policies are MASTER, CLOSE, and SPREAD.
OMP_PROC_BIND syntax

>>-OMP_PROC_BIND=--+-TRUE-----------+--------------------------><
                   +-FALSE----------+
                   | .-,----------. |
                   | V            | |
                   '---+-MASTER-+-+-'
                       +-CLOSE--+
                       '-SPREAD-'
- TRUE
- Binds the threads to places.
- FALSE
- Allows threads to be moved between places and disables thread affinity.
- MASTER
- Instructs the execution environment to assign the threads in the team to the same place as the master thread.
- CLOSE
- Instructs the execution environment to assign the threads in the team to the places that are close to the place of the parent thread. The place partition is not changed by this policy. Each implicit task inherits the place-partition-var ICV of the parent implicit task. If T threads in the team are to be assigned to P places in the parent's place partition, the threads are assigned as follows:
- If T is less than or equal to P, the master thread executes on the place of the parent thread. The thread with the next smallest thread number executes on the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread.
- If T is greater than P, each place contains at least S = floor(T/P) consecutive threads. The first S threads with the smallest thread number (including the master thread) are assigned to the place of the parent thread. The next S threads with the next smallest thread numbers are assigned to the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread. When P does not divide T evenly, each remaining thread is assigned to a subpartition in the order of the place list.
- SPREAD
- Instructs the execution environment to spread a set of T threads
as evenly as possible among P places of the parent's
place partition at run time. The thread distribution mechanism is
as follows:
- If T is less than or equal to P, the parent partition is divided into T subpartitions, where each subpartition contains at least S=floor(P/T) consecutive places. A single thread is assigned to each subpartition. The master thread executes on the place of the parent thread and is assigned to the subpartition that includes that place. The thread with the next smallest thread number is assigned to the first place in the next subpartition, and so on, with wrap around with respect to the original place partition of the master thread.
- If T is greater than P, the parent's partition is divided into P subpartitions, where each subpartition contains a single place. Each place contains at least S = floor(T/P) consecutive threads. The first S threads with the smallest thread number (including the master thread) are assigned to the subpartition that contains the place of the parent thread. The next S threads with the next smallest thread numbers are assigned to the next place in the place partition, and so on, with wrap around with respect to the original place partition of the master thread. When P does not divide T evenly, each remaining thread is assigned to a subpartition in the order of the place list.
where
- Place
- is a hardware unit that holds an unordered set of processors on which one or more threads can execute.
- Place list
- is an ordered list that describes all places that are available to the applications.
- Place partition
- is an ordered list that corresponds to a contiguous interval in the place list. The places in the partition are available for a given parallel region.
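The CLOSE rules above can be sketched as a small C function. This is a minimal illustration, not the runtime's implementation: the name `close_place` and its parameters are hypothetical, places are numbered from 0 within the parent's place partition, and the case where P does not divide T evenly is deliberately left out (the function returns -1) because the placement of the remaining threads is up to the runtime.

```c
#include <assert.h>

/* Place index (within the parent's partition) for a thread under CLOSE.
 * thread_num:   OpenMP thread number, 0 is the master thread
 * num_threads:  T, the team size
 * num_places:   P, the number of places in the parent's partition
 * parent_place: index of the place on which the parent thread runs
 * Returns -1 when T > P and P does not divide T evenly (not modeled here). */
int close_place(int thread_num, int num_threads, int num_places, int parent_place)
{
    assert(num_threads > 0 && num_places > 0);
    assert(thread_num >= 0 && thread_num < num_threads);
    assert(parent_place >= 0 && parent_place < num_places);

    if (num_threads <= num_places) {
        /* One thread per place, starting at the parent's place, with wrap around. */
        return (parent_place + thread_num) % num_places;
    }
    if (num_threads % num_places != 0) {
        return -1; /* uneven remainder: left to the runtime */
    }
    /* S = T / P consecutive threads share each place. */
    int per_place = num_threads / num_places;
    return (parent_place + thread_num / per_place) % num_places;
}
```

For example, with four threads, eight places, and the parent on place 6, `close_place(3, 4, 8, 6)` wraps around to place 1.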
When OMP_PROC_BIND is set to TRUE, MASTER, CLOSE, or SPREAD, up to THREADS_PER_PLACE threads can be assigned to a place. Each remaining thread is assigned to a place in the order of the place list.
For each place in OMP_PLACES, THREADS_PER_PLACE is a positive integer and is calculated in the following way:
THREADS_PER_PLACE = floor((the number of resources in that place/the total number of resources (including duplicated resources))*OMP_THREAD_LIMIT)
After THREADS_PER_PLACE is calculated for each place in this manner, if the sum of all the THREADS_PER_PLACE values is less than OMP_THREAD_LIMIT, each THREADS_PER_PLACE is increased by one, starting from the largest place to the smallest place, until OMP_THREAD_LIMIT is reached. Places that are equivalent in size are ordered according to their order in OMP_PLACES.
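The two-step calculation above can be sketched in C as follows. The function name `threads_per_place` and its parameters are hypothetical: `resources[i]` stands for the number of resources in the i-th place of OMP_PLACES and `thread_limit` for the value of OMP_THREAD_LIMIT. Because each rounded-down share loses less than one thread, the shortfall is always smaller than the number of places, so a single pass over the places is enough in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Fills out[i] with the THREADS_PER_PLACE value of place i.
 * resources[i] is the (non-negative) number of resources in place i. */
void threads_per_place(const int *resources, size_t n, int thread_limit, int *out)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += resources[i];
    assert(total > 0 && thread_limit > 0);

    /* Step 1: proportional share of the thread limit, rounded down. */
    long long assigned = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = (int)((long long)resources[i] * thread_limit / total);
        assigned += out[i];
    }

    /* Step 2: hand out the shortfall, one thread per place, from the
     * largest place to the smallest; equal-sized places keep list order. */
    long long shortfall = thread_limit - assigned;
    for (size_t i = 0; i < n; i++) {
        long long rank = 0; /* number of places visited before place i */
        for (size_t j = 0; j < n; j++)
            if (resources[j] > resources[i] ||
                (resources[j] == resources[i] && j < i))
                rank++;
        if (rank < shortfall)
            out[i]++;
    }
}
```

With three places of sizes 4, 2, and 2 and a thread limit of 10, step 1 yields 5, 2, and 2, and step 2 gives the one missing thread to the largest place, for a result of 6, 2, and 2.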
Usage
By default, the OMP_PROC_BIND environment variable is not set.
If the initial thread cannot be bound to the first place in the OpenMP place list, the runtime execution environment issues a message and assigns threads according to the default place list.
The OMP_PROC_BIND and XLSMPOPTS environment variables interact with each other according to the following rules:
OMP_PROC_BIND settings | XLSMPOPTS settings | Thread binding results |
---|---|---|
OMP_PROC_BIND is not set | XLSMPOPTS is not set. | Threads are not bound. |
OMP_PROC_BIND is not set | XLSMPOPTS is set to startproc/stride or procs [2]. | Threads are bound according to the settings in XLSMPOPTS. |
OMP_PROC_BIND is not set | XLSMPOPTS setting is invalid. | Threads are not bound. |
OMP_PROC_BIND=TRUE | XLSMPOPTS is not set. | Threads are bound. |
OMP_PROC_BIND=TRUE | XLSMPOPTS is set to startproc/stride or procs [2]. | Threads are bound according to the settings in XLSMPOPTS [1]. |
OMP_PROC_BIND=TRUE | XLSMPOPTS setting is invalid. | Threads are bound. |
OMP_PROC_BIND=FALSE | XLSMPOPTS is not set. | Threads are not bound. |
OMP_PROC_BIND=FALSE | XLSMPOPTS is set to startproc/stride or procs [2]. | Threads are not bound. |
OMP_PROC_BIND=FALSE | XLSMPOPTS setting is invalid. | Threads are not bound. |
Note:
1. The OMP_PROC_BIND environment variable provides a portable way to control whether OpenMP threads can be migrated. The startproc/stride and procs suboptions of the XLSMPOPTS environment variable, which are an IBM extension, provide finer control over binding OpenMP threads to places. If portability of your application is important, use only the OMP_PROC_BIND environment variable to control thread binding.
2. When OMP_PROC_BIND is set to MASTER, CLOSE, or SPREAD, the suboption settings startproc/stride or procs of XLSMPOPTS are ignored.
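The interaction rules in the table and notes above can be read as a small decision function. The sketch below is only an illustration: every enum and function name in it is hypothetical, and it assumes that all three OMP_PROC_BIND=FALSE rows share the result "Threads are not bound" and that MASTER, CLOSE, and SPREAD always bind threads according to the OpenMP policy.

```c
#include <assert.h>

/* Hypothetical encodings of the settings described in the table. */
enum proc_bind { PB_UNSET, PB_TRUE, PB_FALSE, PB_MASTER, PB_CLOSE, PB_SPREAD };
enum xlsmpopts {
    XO_UNSET,    /* XLSMPOPTS is not set */
    XO_BINDING,  /* startproc/stride or procs is set */
    XO_INVALID   /* XLSMPOPTS setting is invalid */
};
enum binding { NOT_BOUND, BOUND_BY_OMP, BOUND_BY_XLSMPOPTS };

/* Returns the thread binding result for a pair of settings. */
enum binding resolve_binding(enum proc_bind pb, enum xlsmpopts xo)
{
    assert(pb >= PB_UNSET && pb <= PB_SPREAD);
    switch (pb) {
    case PB_UNSET:
        return xo == XO_BINDING ? BOUND_BY_XLSMPOPTS : NOT_BOUND;
    case PB_TRUE:
        return xo == XO_BINDING ? BOUND_BY_XLSMPOPTS : BOUND_BY_OMP;
    case PB_FALSE:
        return NOT_BOUND;
    default:
        /* MASTER, CLOSE, SPREAD: the XLSMPOPTS suboptions are ignored. */
        return BOUND_BY_OMP;
    }
}
```

Writing the rules this way makes the one asymmetry easy to see: XLSMPOPTS takes effect only when OMP_PROC_BIND is unset or TRUE.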
For a program that contains both OpenMP and OpenMPI code, the OpenMP runtime detects the OpenMPI code by checking for the OMPI_COMM_WORLD_RANK environment variable. If you do not set OMP_PLACES explicitly, the compiler sets OMP_PROC_BIND to TRUE.
Examples
The following examples demonstrate the thread binding and thread affinity results when you compile myprogram.c and run the resulting program with different environment variable settings.
myprogram.c
int main(){
// ...
}
Environment variable settings 1
OMP_NUM_THREADS=4;
OMP_PROC_BIND=MASTER;
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'
Results 1: Every thread in the team is assigned to the place on which the master executes. Four threads are assigned to place 0.
Environment variable settings 2
OMP_NUM_THREADS=4;
OMP_PROC_BIND=close;
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'
Results 2:
- OMP thread 0 is assigned to place 0
- OMP thread 1 is assigned to place 1
- OMP thread 2 is assigned to place 2
- OMP thread 3 is assigned to place 3
Environment variable settings 3
OMP_NUM_THREADS=4;
OMP_PROC_BIND=spread;
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'
Results 3:
- OMP thread 0 is assigned to place 0
- OMP thread 1 is assigned to place 2
- OMP thread 2 is assigned to place 4
- OMP thread 3 is assigned to place 6
Environment variable settings 4
OMP_NUM_THREADS=5;
OMP_PROC_BIND=spread;
OMP_PLACES='{0:4},{4:4},{8:4},{12:4},{16:4},{20:4},{24:4},{28:4}'
Results 4:
- OMP thread 0 is assigned to place 0
- OMP thread 1 is assigned to place 2
- OMP thread 2 is assigned to place 4
- OMP thread 3 is assigned to place 6
- OMP thread 4 is assigned to place 7
Environment variable settings 5
OMP_NUM_THREADS=8;
OMP_PROC_BIND=spread;
OMP_PLACES='{0:4},{4:4},{8:4},{12:4}'
Results 5:
- OMP thread 0 and thread 1 are assigned to place 0
- OMP thread 2 and thread 3 are assigned to place 1
- OMP thread 4 and thread 5 are assigned to place 2
- OMP thread 6 and thread 7 are assigned to place 3
Environment variable settings 6
OMP_NUM_THREADS=7;
OMP_PROC_BIND=spread;
OMP_PLACES='{0:4},{4:4},{8:4},{12:4}'
Results 6:
- OMP thread 0 is assigned to place 0
- OMP thread 1 and thread 2 are assigned to place 1
- OMP thread 3 and thread 4 are assigned to place 2
- OMP thread 5 and thread 6 are assigned to place 3
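The SPREAD results listed above can be reproduced with a short C sketch. The function name `spread_place` is hypothetical, and the master thread is assumed to run on place 0. The way leftover places or threads are distributed when the division is uneven is an assumption chosen only to match the listed results (for example, five threads on eight places, or seven threads on four places); the policy description itself only guarantees at least S places per subpartition or at least S threads per place.

```c
#include <assert.h>

/* Place index for thread t under SPREAD, for T threads and P places,
 * assuming the master thread (t == 0) runs on place 0. */
int spread_place(int t, int T, int P)
{
    assert(T > 0 && P > 0 && t >= 0 && t < T);

    if (T <= P) {
        int min_places = P / T;  /* S: minimum places per subpartition */
        int wider = P % T;       /* leading subpartitions with one extra place (assumption) */
        int extra = t < wider ? t : wider;
        /* Thread t runs on the first place of subpartition t. */
        return t * min_places + extra;
    }

    int min_threads = T / P;     /* S: minimum threads per place */
    int fuller = T % P;          /* places that hold one extra thread */
    int lean = P - fuller;       /* leading places with exactly S threads (assumption) */
    if (t < lean * min_threads)
        return t / min_threads;
    return lean + (t - lean * min_threads) / (min_threads + 1);
}
```

For instance, `spread_place(4, 5, 8)` returns 7 and `spread_place(2, 7, 4)` returns 1, matching the assignments shown for those settings.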