.-nosmp-------------------------------------------------------.
>>- -q--+-smp--+----------------------------------------------------+-+-><
| .-:-------------------------------------------. |
| | .-nostackcheck----------------------------. | |
| | +-opt-------------------------------------+ | |
| | +-norec_locks-----------------------------+ | |
| | +-noomp-----------------------------------+ | |
| | +-nonested_par----------------------------+ | |
| V +-auto------------------------------------+ | |
'-=----+-omp-------------------------------------+-+-'
+-nested_par------------------------------+
+-noauto----------------------------------+
+-noopt-----------------------------------+
+-rec_locks-------------------------------+
| .-runtime----------------. |
+-schedule--=--+-+-affinity-+--+------+-+-+
| +-dynamic--+ '-=--n-' |
| +-guided---+ |
| '-static---' |
+-stackcheck------------------------------+
'-threshold--+------+---------------------'
'-=--n-'
- auto | noauto
- Enables or disables automatic parallelization and optimization of program
code. By default, the compiler will attempt to parallelize
explicitly coded DO loops as well as those that are generated by the compiler
for array language. When noauto is in effect, only
program code explicitly parallelized with OpenMP
directives is parallelized. noauto is implied if you specify -qsmp=omp or -qsmp=noopt.
- nested_par | nonested_par
- By default, the compiler serializes a nested parallel construct. When nested_par is in effect, the compiler parallelizes prescriptive
nested parallel constructs (PARALLEL DO, PARALLEL SECTIONS). This includes not only the loop constructs that are nested within a
scoping unit but also parallel constructs in subprograms that are referenced
(directly or indirectly) from within other parallel constructs. Note that
this suboption has no effect on loops that are automatically parallelized.
In this case, at most one loop in a loop nest (in a scoping unit) will be
parallelized. nested_par does not provide true nested
parallelism because it does not cause a new team of threads to be created for
nested parallel regions. Instead, threads that are currently available are
reused.
This suboption should be used with caution. Depending on the number
of threads available and the amount of work in an outer loop, inner loops
could be executed sequentially even if this option is in effect. Parallelization
overhead may not necessarily be offset by program performance gains.
Note that the implementation of the nested_par suboption
does not comply with the OpenMP API: true OpenMP nested
parallelism is not supported, and the omp_get_nested routine always returns false. If you
specify this suboption, the runtime library uses the same threads for
nested PARALLEL DO and PARALLEL SECTIONS constructs
that it uses for the enclosing PARALLEL constructs.
- omp | noomp
- Enforces or relaxes strict compliance with the OpenMP standard. When noomp is in effect, auto is implied.
When omp is in effect, noauto is
implied and only OpenMP parallelization directives are recognized. The compiler
issues warning messages if your code contains any language constructs that
do not conform to the OpenMP API.
Specifying omp also has the following effects:
- Automatic parallelization is disabled.
- All previously recognized directive triggers are ignored. The only recognized
directive trigger is $OMP. However, you can specify additional triggers
on subsequent -qdirective options.
- The -qcclines compiler option is enabled.
- When the C preprocessor is invoked, the _OPENMP C preprocessor macro
is defined automatically, with the value 200505, which is useful in supporting
conditional compilation. See "Conditional Compilation" for more
information.
- opt | noopt
- Enables or disables optimization of parallelized program code. When noopt is in effect, the compiler will do the smallest amount
of optimization that is required to parallelize the code. This is useful for
debugging because -qsmp enables the -O2 and -qhot options by default, which may result
in the movement of some variables into registers that are inaccessible to
the debugger. However, if the -qsmp=noopt and -g options are specified, these variables will remain visible to the
debugger.
- rec_locks | norec_locks
- Determines whether recursive locks are used to avoid
problems associated with CRITICAL constructs. When rec_locks is in effect, nested critical sections will not cause a deadlock; a thread can enter a CRITICAL construct from within the
dynamic extent of another CRITICAL construct that has the same name.
Note that the rec_locks suboption specifies
behavior for critical constructs that is inconsistent with the OpenMP API.
- schedule
- Specifies the type of scheduling algorithms and chunk size (n) that are used for loops to which no other scheduling algorithm has
been explicitly assigned in the source code. Suboptions of the schedule suboption are as follows:
- affinity[=n]
- The iterations of a loop are initially divided into number_of_threads partitions, each containing ceiling(number_of_iterations/number_of_threads) iterations.
Each partition is initially assigned to a thread and is then further subdivided
into chunks that each contain n iterations. If n is not specified, then the chunks consist of ceiling(number_of_iterations_left_in_partition /
2) loop iterations.
When a thread becomes free, it takes the next chunk
from its initially assigned partition. If there are no more chunks in that
partition, then the thread takes the next available chunk from a partition
initially assigned to another thread.
The work in a partition initially
assigned to a sleeping thread will be completed by threads that are active.
The affinity scheduling type does not appear
in the OpenMP API standard.
- dynamic[=n]
- The iterations of a loop are divided into chunks containing n iterations each. If n is not specified, then
the chunks consist of ceiling(number_of_iterations/number_of_threads) iterations.
Active threads
are assigned these chunks on a "first-come, first-do" basis. Chunks of the
remaining work are assigned to available threads until all work has been assigned.
If a thread is asleep, its assigned work will be taken over by an active
thread once that thread becomes available.
- guided[=n]
- The iterations of a loop are divided into progressively smaller chunks
until a minimum chunk size of n loop iterations is
reached. If n is not specified, the default value
for n is 1 iteration.
Active threads are assigned chunks on a
"first-come, first-do" basis. The first chunk contains ceiling(number_of_iterations/number_of_threads) iterations. Subsequent chunks consist of ceiling(number_of_iterations_left / number_of_threads) iterations.
- runtime
- Specifies that the chunking algorithm will be determined at run time.
- static[=n]
- The iterations of a loop are divided into chunks containing n iterations each. Each thread is assigned chunks in a "round-robin"
fashion. This is known as block cyclic scheduling. If the value of n is 1, then the scheduling
type is specifically referred to as cyclic scheduling.
If n is not specified, the chunks will contain ceiling(number_of_iterations/number_of_threads) iterations. Each thread is assigned
one of these chunks. This is known as block scheduling.
If
a thread is asleep and it has been assigned work, it will be awakened so that
it may complete its work.
- n
- Must be an integral assignment expression of value 1 or greater.
Specifying schedule with no suboption is
equivalent to schedule=runtime.
For more information on chunking algorithms and SCHEDULE, refer to "Directives".
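The chunk-size formulas above can be checked with a small simulation. The following Python sketch (illustrative only; the function names are not part of the compiler or its runtime) computes the chunk sequences that the static, guided, and affinity descriptions imply:

```python
import math

def static_chunks(n_iters, n_threads, chunk=None):
    """Chunk sizes for static scheduling: a fixed chunk of n iterations,
    or one chunk of ceiling(number_of_iterations/number_of_threads)
    per thread (block scheduling) when n is unspecified."""
    if chunk is None:
        chunk = math.ceil(n_iters / n_threads)
    sizes, left = [], n_iters
    while left > 0:
        sizes.append(min(chunk, left))
        left -= sizes[-1]
    return sizes

def guided_chunks(n_iters, n_threads, min_chunk=1):
    """Chunk sizes for guided scheduling: each chunk is
    ceiling(number_of_iterations_left/number_of_threads),
    shrinking progressively but never below min_chunk."""
    sizes, left = [], n_iters
    while left > 0:
        size = min(left, max(min_chunk, math.ceil(left / n_threads)))
        sizes.append(size)
        left -= size
    return sizes

def affinity_partition_chunks(partition_size, chunk=None):
    """Chunks taken from one affinity partition: a fixed size n, or
    ceiling(number_of_iterations_left_in_partition/2) when n is
    unspecified."""
    sizes, left = [], partition_size
    while left > 0:
        size = chunk if chunk is not None else math.ceil(left / 2)
        size = min(size, left)
        sizes.append(size)
        left -= size
    return sizes

print(static_chunks(100, 4))            # [25, 25, 25, 25]
print(guided_chunks(100, 4))            # starts at 25, then 19, 14, ...
print(affinity_partition_chunks(25))    # [13, 6, 3, 2, 1]
```

For 100 iterations on 4 threads, static block scheduling yields four chunks of 25 iterations, while guided scheduling starts at 25 and halves toward the minimum chunk size.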
- stackcheck | nostackcheck
- Causes the compiler to check for stack overflow by slave threads at
run time, and issue a warning if the remaining stack size is less than the
number of bytes specified by the stackcheck option of
the XLSMPOPTS environment variable. This suboption is intended for debugging
purposes, and only takes effect when XLSMPOPTS=stackcheck is also set; see XLSMPOPTS for more information.
- threshold[=n]
- When -qsmp=auto is in effect, controls
the amount of automatic loop parallelization that occurs. The value of n represents the minimum amount of work required in a
loop in order for it to be parallelized. Currently, the calculation of "work"
is weighted heavily by the number of iterations in the loop. In general,
the higher the value specified for n, the fewer loops
are parallelized. Specifying a value of 0 instructs the compiler to parallelize
all auto-parallelizable loops, whether or not it is profitable to do
so. Specifying a value of 100 instructs the compiler to parallelize only those
auto-parallelizable loops that it deems profitable. Specifying a value
greater than 100 results in more loops being serialized.
- n
- Must be an integer of value 0 or greater.
If you specify threshold with no suboption,
the program uses a default value of 100.
Specifying -qsmp without suboptions is equivalent
to:
-qsmp=auto:opt:noomp:norec_locks:nonested_par:schedule=runtime:nostackcheck:threshold=100
In the following example, you should specify -qsmp=rec_locks to avoid a deadlock caused by critical constructs.
program t
  integer i, a, b
  a = 0
  b = 0
!smp$ parallel do
  do i=1, 10
!smp$ critical
    a = a + 1
!smp$ critical
    b = b + 1
!smp$ end critical
!smp$ end critical
  enddo
end
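The difference between ordinary and recursive locks can also be illustrated outside Fortran with Python's threading locks, where RLock plays the role of a recursive lock. This is an analogy only; -qsmp=rec_locks controls the XL SMP runtime, not Python:

```python
import threading

# A plain Lock behaves like the default (norec_locks): a thread that
# already holds the lock cannot acquire it again, so nested entry
# into the same critical section would deadlock.
plain = threading.Lock()
plain.acquire()
print(plain.acquire(blocking=False))   # False: re-entry is refused
plain.release()

# An RLock behaves like rec_locks: the owning thread may re-enter,
# so nested critical sections with the same lock do not deadlock.
recursive = threading.RLock()
recursive.acquire()
print(recursive.acquire(blocking=False))  # True: nested entry succeeds
recursive.release()
recursive.release()
```

In the Fortran example above, both unnamed CRITICAL constructs use the same underlying lock, so the inner construct corresponds to the second acquire: with an ordinary lock the thread blocks on itself, while a recursive lock lets it proceed.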