-qsmp
Category
@PROCESS
None.
Purpose
Enables parallelization of program code.
Syntax
        .-nosmp-------------------------------------------------------.
>>- -q--+-smp--+----------------------------------------------------+-+-><
               |    .-:-------------------------------------------. |
               |    | .-nostackcheck----------------------------. | |
               |    | +-opt-------------------------------------+ | |
               |    | +-norec_locks-----------------------------+ | |
               |    | +-noomp-----------------------------------+ | |
               |    V +-auto------------------------------------+ | |
               '-=----+-omp-------------------------------------+-+-'
                      +-noauto----------------------------------+
                      +-noopt-----------------------------------+
                      +-rec_locks-------------------------------+
                      |              .-auto-------------------. |
                      +-schedule--=--+-runtime----------------+-+
                      |              '-+-affinity-+--+------+-' |
                      |                +-dynamic--+  '-=--n-'   |
                      |                +-guided---+             |
                      |                '-static---'             |
                      +-stackcheck------------------------------+
                      '-threshold--+------+---------------------'
                                   '-=--n-'
Defaults
-qnosmp. Code is produced for a uniprocessor machine.
Parameters
- auto | noauto
- auto enables automatic parallelization and optimization of program code; that is, the compiler attempts to automatically parallelize both user and compiler-generated loops. noauto parallelizes only program code that is explicitly annotated with OpenMP directives. noauto is implied if you specify -qsmp=omp or -qsmp=noopt.
- omp | noomp
- omp implies noauto, that is, only program code that is explicitly annotated with OpenMP directives is parallelized. When noomp is in effect, auto is implied.
- Specifying omp also has the following effects:
- The -qcclines compiler option is enabled.
- When the C preprocessor is invoked, the _OPENMP C preprocessor macro is defined based on the latest OpenMP API specification that XL Fortran supports. This macro is useful in supporting conditional compilation. See Conditional Compilation for more information.
- opt | noopt
- opt enables optimization of parallelized program code. noopt performs the smallest amount of optimization that is required to parallelize the code. This is useful for debugging because -qsmp enables the -O2 and -qhot options by default, which may result in the movement of some variables into registers that are inaccessible to the debugger. However, if the -qsmp=noopt and -g options are specified, these variables will remain visible to the debugger.
- rec_locks | norec_locks
- Determines whether recursive locks are used to avoid problems associated with CRITICAL constructs. When rec_locks is in effect, nested critical sections will not cause a deadlock; a thread can enter a CRITICAL construct from within the dynamic extent of another CRITICAL construct that has the same name. Note that the rec_locks suboption specifies behavior for critical constructs that is inconsistent with the OpenMP API.
- schedule
- Specifies the type of scheduling algorithms and, except in the
case of auto, chunk size (n) that are used
for loops to which no other scheduling algorithm has been explicitly
assigned in the source code. Suboptions of the schedule suboption
are as follows:
- affinity[=n]
- The iterations of a loop are initially divided into one partition per thread,
each containing ceiling(number_of_iterations/number_of_threads)
iterations. Each partition is initially assigned to a thread and is
then further subdivided into chunks that each contain n iterations.
If n is not specified, then the chunks consist of ceiling(number_of_iterations_left_in_partition / 2) loop iterations.
When a thread becomes free, it takes the next chunk from its initially assigned partition. If there are no more chunks in that partition, then the thread takes the next available chunk from a partition initially assigned to another thread.
The work in a partition initially assigned to a sleeping thread will be completed by threads that are active.
The affinity scheduling type is not part of the OpenMP API specification.
Note: This suboption has been deprecated. You can use the OMP_SCHEDULE environment variable with the dynamic clause for a similar functionality.
- auto
- Scheduling of the loop iterations is delegated to the compiler and runtime systems. The compiler and runtime system can choose any possible mapping of iterations to threads (including all possible valid schedule types) and these might be different in different loops. Do not specify chunk size (n).
- dynamic[=n]
- The iterations of a loop are divided into chunks that contain n iterations
each. If n is not specified, each chunk contains one iteration.
Active threads are assigned these chunks on a "first-come, first-do" basis. Chunks of the remaining work are assigned to available threads until all work has been assigned.
- guided[=n]
- The iterations of a loop are divided into progressively smaller
chunks until a minimum chunk size of n loop iterations is reached.
If n is not specified, the default value for n is
1 iteration.
Active threads are assigned chunks on a "first-come, first-do" basis. The first chunk contains ceiling(number_of_iterations/number_of_threads) iterations. Subsequent chunks consist of ceiling(number_of_iterations_left / number_of_threads) iterations.
- runtime
- Specifies that the chunking algorithm will be determined at run time.
- static[=n]
- The iterations of a loop are divided into chunks containing n iterations
each. Each thread is assigned chunks in a "round-robin" fashion.
This is known as block cyclic scheduling. If the value of n is
1, then the scheduling type is specifically referred to as cyclic
scheduling.
If n is not specified, the chunks will contain floor(number_of_iterations/number_of_threads) iterations. The first remainder (number_of_iterations/number_of_threads) chunks have one more iteration. Each thread is assigned a separate chunk. This is known as block scheduling.
If a thread is asleep and it has been assigned work, it will be awakened so that it may complete its work.
- n
- Must be an integer of value 1 or greater.
Specifying schedule with no suboption is equivalent to schedule=auto.
For more information on chunking algorithms and SCHEDULE, refer to Directives.
- stackcheck | nostackcheck
- Causes the compiler to check for stack overflow by slave threads at run time, and issue a warning if the remaining stack size is less than the number of bytes specified by the stackcheck option of the XLSMPOPTS environment variable. This suboption is intended for debugging purposes, and only takes effect when XLSMPOPTS=stackcheck is also set; see XLSMPOPTS for more information.
- threshold[=n]
- When -qsmp=auto is in effect, controls the
amount of automatic loop parallelization that occurs. The value of n represents
the minimum amount of work required in a loop in order for it to be
parallelized. Currently, the calculation of "work" is weighted heavily
by the number of iterations in the loop. In general, the higher the
value specified for n, the fewer loops are parallelized. Specifying
a value of 0 instructs the compiler to parallelize all auto-parallelizable
loops, whether or not it is profitable to do so. Specifying a value
of 100 instructs the compiler to parallelize only those auto-parallelizable
loops that it deems profitable. Specifying a value greater than
100 will result in more loops being serialized.
- n
- Must be an integer of value 0 or greater.
Specifying -qsmp without suboptions is equivalent to:
-qsmp=auto:opt:noomp:norec_locks:schedule=auto:nostackcheck:threshold=100
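The chunk-size formulas given for schedule=static (with no n) and schedule=guided can be checked with a short calculation. The following is a minimal sketch that only reproduces the arithmetic quoted in the text; it does not call the SMP runtime, and the names chunk_sketch, ceil_div, static_block, and guided_chunk are hypothetical. Capping the last guided chunk at the number of remaining iterations is an assumption made so that the loop terminates cleanly.

```fortran
! Sketch only: reproduces the chunk-size formulas quoted in the text.
! All names here are hypothetical and not part of XL Fortran.
module chunk_sketch
  implicit none
contains
  ! Integer ceiling of a/b for positive a and b.
  pure integer function ceil_div(a, b)
    integer, intent(in) :: a, b
    ceil_div = (a + b - 1) / b
  end function ceil_div

  ! schedule=static with no n: size of the block given to thread t (1-based).
  ! Every block has floor(n_iters/n_threads) iterations; the first
  ! mod(n_iters, n_threads) blocks have one more.
  pure integer function static_block(n_iters, n_threads, t)
    integer, intent(in) :: n_iters, n_threads, t
    static_block = n_iters / n_threads
    if (t <= mod(n_iters, n_threads)) static_block = static_block + 1
  end function static_block

  ! schedule=guided[=n]: size of the next chunk when `left` iterations remain.
  ! The result is never below min_chunk, except that it is capped at `left`
  ! (an assumption of this sketch).
  pure integer function guided_chunk(left, n_threads, min_chunk)
    integer, intent(in) :: left, n_threads, min_chunk
    guided_chunk = min(left, max(min_chunk, ceil_div(left, n_threads)))
  end function guided_chunk
end module chunk_sketch

program show_chunks
  use chunk_sketch
  implicit none
  integer, parameter :: n_iters = 10, n_threads = 4
  integer :: t, left, c

  print '(a)', 'static blocks:'
  do t = 1, n_threads
    print '(i4)', static_block(n_iters, n_threads, t)
  end do

  print '(a)', 'guided chunks (minimum chunk size 1):'
  left = n_iters
  do while (left > 0)
    c = guided_chunk(left, n_threads, 1)
    print '(i4)', c
    left = left - c
  end do
end program show_chunks
```

For 10 iterations and 4 threads, the formulas give static blocks of 3, 3, 2, and 2 iterations, and guided chunks of 3, 2, 2, 1, 1, and 1 iterations.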
Usage
- Specifying the omp suboption always implies noauto. Specify -qsmp=omp:auto to also apply automatic parallelization to OpenMP-compliant applications.
- The -qsmp option implies -qdirective=SMP\$:\$OMP:IBMP, which turns on the trigger constants SMP$, $OMP, and IBMP, in addition to the default trigger constant IBM*.
- If you use the f77 or fort77 command with the -qsmp option to compile programs, specify -qnosave to make the default storage class automatic, and specify -qthreaded to tell the compiler to generate threadsafe code.
- Object files generated with the -qsmp=opt option can be linked with object files generated with -qsmp=noopt. The visibility within the debugger of the variables in each object file will not be affected by linking.
- Specifying -qsmp implicitly sets -O2. The -qsmp option overrides -qnooptimize, but does not override -O3, -O4, or -O5. When debugging parallelized program code, you can disable optimization in parallelized program code by specifying -qsmp=noopt.
- The -qsmp=noopt suboption overrides performance optimization options anywhere on the command line unless -qsmp appears after -qsmp=noopt. For example, -qsmp=noopt -O3 is equivalent to -qsmp=noopt, while -qsmp=noopt -O3 -qsmp is equivalent to -qsmp -O3.
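To illustrate how the $OMP trigger constant and conditional compilation lines fit together, here is a minimal sketch in free source form. It assumes standard OpenMP Fortran conventions: lines that begin with the !$ sentinel are compiled only when OpenMP support is enabled (for example with -qsmp=omp, which the text says enables -qcclines) and are otherwise treated as comments. The names omp_sketch, thread_count, and parallel_sum are hypothetical.

```fortran
! Sketch: standard OpenMP Fortran; module and procedure names are hypothetical.
module omp_sketch
  !$ use omp_lib            ! seen only when OpenMP is enabled
  implicit none
contains
  ! Number of threads a parallel region may use; 1 in a serial build.
  integer function thread_count()
    thread_count = 1
    !$ thread_count = omp_get_max_threads()
  end function thread_count

  ! Sum 1..n in a loop marked with the $OMP trigger constant.
  integer function parallel_sum(n)
    integer, intent(in) :: n
    integer :: i, total
    total = 0
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + i
    end do
    !$omp end parallel do
    parallel_sum = total
  end function parallel_sum
end module omp_sketch

program omp_demo
  use omp_sketch
  implicit none
  print '(a,i0)', 'threads available: ', thread_count()
  print '(a,i0)', 'sum of 1..100:     ', parallel_sum(100)
end program omp_demo
```

Because the directive and the conditional lines are ordinary comments to a compiler that is not processing OpenMP, the same source should build and produce the same sum with or without -qsmp; only the reported thread count is expected to differ.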
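Examples
The following program nests one unnamed CRITICAL construct inside another. Compile it with -qsmp=rec_locks so that entering the inner construct does not cause a deadlock.
      program t
      integer i, a, b
      a = 0
      b = 0
!smp$ parallel do
      do i=1, 10
!smp$ critical
        a = a + 1
! Inner unnamed CRITICAL, entered while the outer one is still held
!smp$ critical
        b = b + 1
!smp$ end critical
!smp$ end critical
      enddo
      end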
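Because rec_locks specifies behavior that is inconsistent with the OpenMP API, a variant of the example above that does not depend on it may be useful. The sketch below is one possible approach, assuming standard OpenMP semantics in which CRITICAL constructs with different names use different locks. The names outer_lock, inner_lock, named_critical_sketch, and count_nested are hypothetical, and the code is written in free source form with the $OMP trigger constant.

```fortran
! Sketch: nested CRITICAL constructs with distinct names, so the inner one
! never waits on the lock held by the outer one. Names are hypothetical.
module named_critical_sketch
  implicit none
contains
  subroutine count_nested(n, a, b)
    integer, intent(in)  :: n
    integer, intent(out) :: a, b
    integer :: i
    a = 0
    b = 0
    !$omp parallel do
    do i = 1, n
      !$omp critical (outer_lock)
      a = a + 1
      !$omp critical (inner_lock)
      b = b + 1
      !$omp end critical (inner_lock)
      !$omp end critical (outer_lock)
    end do
    !$omp end parallel do
  end subroutine count_nested
end module named_critical_sketch

program t_named
  use named_critical_sketch
  implicit none
  integer :: a, b
  call count_nested(10, a, b)
  print '(a,i0,a,i0)', 'a = ', a, ', b = ', b
  if (a /= 10 .or. b /= 10) error stop 'unexpected counter value'
end program t_named
```

Every thread acquires the two locks in the same order, so the nesting cannot deadlock, and both counters should end at 10 whether the loop runs serially or in parallel.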



