-qsmp

Pragma equivalent

None.

Purpose

Enables parallelization of program code.

Syntax

        .-nosmp-------------------------------------------------------.   
>>- -q--+-smp--+----------------------------------------------------+-+-><
               |    .-:-------------------------------------------. |     
               |    | .-nostackcheck----------------------------. | |     
               |    | +-ostls-----------------------------------+ | |     
               |    | +-opt-------------------------------------+ | |     
               |    | +-norec_locks-----------------------------+ | |     
               |    | +-noomp-----------------------------------+ | |     
               |    | +-nonested_par----------------------------+ | |     
               |    | +-explicit--------------------------------+ | |     
               |    V +-auto------------------------------------+ | |     
               '-=----+-omp-------------------------------------+-+-'     
                      +-noostls---------------------------------+         
                      +-nested_par------------------------------+         
                      +-noauto----------------------------------+         
                      +-noexplicit------------------------------+         
                      +-noopt-----------------------------------+         
                      +-rec_locks-------------------------------+         
                      |              .-auto-------------------. |         
                      +-schedule--=--+-runtime----------------+-+         
                      |              '-+-affinity-+--+------+-' |         
                      |                +-dynamic--+  '-=--n-'   |         
                      |                +-guided---+             |         
                      |                '-static---'             |         
                      +-stackcheck------------------------------+         
                      '-threshold--+------+---------------------'         
                                   '-=--n-'                               

Defaults

-qnosmp. Code is produced for a uniprocessor machine.

Parameters

auto | noauto
Enables or disables automatic parallelization and optimization of program code. When noauto is in effect, only program code explicitly parallelized with SMP or OpenMP directives is optimized. noauto is implied if you specify -qsmp=omp or -qsmp=noopt.
explicit | noexplicit
Enables or disables directives controlling explicit parallelization of loops.
nested_par | nonested_par
By default, the compiler serializes a nested parallel construct. When nested_par is in effect, the compiler parallelizes prescriptive nested parallel constructs. This includes not only the loop constructs that are nested within a scoping unit but also parallel constructs in subprograms that are referenced (directly or indirectly) from within other parallel constructs. Note that this suboption has no effect on loops that are automatically parallelized. In this case, at most one loop in a loop nest (in a scoping unit) will be parallelized.

The setting of the omp_set_nested function or of the OMP_NESTED environment variable overrides the setting of the -qsmp=nested_par | nonested_par suboption.

This suboption should be used with caution. Depending on the number of threads available and the amount of work in an outer loop, inner loops could be executed sequentially even if this option is in effect. Parallelization overhead may not necessarily be offset by program performance gains.

Note: The -qsmp=nested_par | nonested_par option has been deprecated and might be removed in a future release. Use the OMP_NESTED environment variable or the omp_set_nested function instead.
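As a minimal sketch of a prescriptive nested parallel construct, the function below places one OpenMP work-sharing loop inside another and requests nested parallelism through omp_set_nested. The function name count_nested_iterations is illustrative and does not come from this reference. When the compiler is not in an OpenMP mode, the pragmas are ignored and both loops run serially with the same result.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Count the iterations executed by a parallel loop nested in another
 * parallel loop. Hypothetical helper, for illustration only. */
int count_nested_iterations(int outer, int inner)
{
    int total = 0;
    int i;

#ifdef _OPENMP
    omp_set_nested(1);  /* request nested parallelism at run time */
#endif

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < outer; i++) {
        int row = 0;
        int j;

        /* Inner construct: serialized unless nesting is enabled. */
        #pragma omp parallel for reduction(+:row)
        for (j = 0; j < inner; j++) {
            row += 1;
        }
        total += row;
    }
    return total;
}
```

The result is outer * inner whether or not the inner loop is actually run in parallel, which matches the caution above: nesting changes how the work is distributed, not what is computed.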
omp | noomp
Enforces or relaxes strict compliance with the OpenMP standard. When noomp is in effect, auto is implied. When omp is in effect, noauto is implied and only OpenMP parallelization directives are recognized. The compiler issues warning messages if your code contains any language constructs that do not conform to the OpenMP API.
Note: The -qsmp=omp option must be used to enable OpenMP parallelization.
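For illustration, the following sketch shows the kind of source that -qsmp=omp is meant to compile: a loop parallelized with a standard OpenMP parallel for directive and a reduction clause. The function name sum_array is an assumption made for this example.

```c
#include <stddef.h>

/* Sum the elements of an array with an OpenMP work-sharing loop.
 * If OpenMP is not enabled, the pragma is ignored and the loop
 * runs serially with the same result. */
long sum_array(const int *values, size_t count)
{
    long total = 0;
    long i;

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)count; i++) {
        total += values[i];
    }
    return total;
}
```

Because the reduction clause gives each thread a private partial sum, the function returns the same value for any number of threads.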
opt | noopt
Enables or disables optimization of parallelized program code. When noopt is in effect, the compiler will do the smallest amount of optimization that is required to parallelize the code. This is useful for debugging because -qsmp enables the -O2 and -qhot options by default, which may result in the movement of some variables into registers that are inaccessible to the debugger. However, if the -qsmp=noopt and -g options are specified, these variables will remain visible to the debugger.
ostls | noostls
Enables thread-local storage (TLS) provided by the operating system to be used for threadprivate data. Use the noostls suboption to implement threadprivate data without operating-system TLS. The noostls suboption is provided for compatibility with earlier versions of the compiler.
Note: If you use this suboption, your operating system must support TLS to implement OpenMP threadprivate data. Use noostls to disable OS level TLS if your operating system does not support it.
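The threadprivate data that this suboption affects is declared with the OpenMP threadprivate directive. The sketch below is one small example; the names call_count and record_call are hypothetical. How the per-thread copies are stored (operating-system TLS or not) is what ostls | noostls selects, and the source code is the same in both cases.

```c
/* Per-thread call counter. With OpenMP enabled, each thread has its own
 * copy of call_count. Without OpenMP, the pragma is ignored and there is
 * a single copy. */
static int call_count = 0;
#pragma omp threadprivate(call_count)

/* Increment the calling thread's counter and return its new value. */
int record_call(void)
{
    call_count += 1;
    return call_count;
}
```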
rec_locks | norec_locks
Determines whether recursive locks are used. When rec_locks is in effect, nested critical sections will not cause a deadlock. Note that the rec_locks suboption specifies behavior for critical constructs that is inconsistent with the OpenMP API.
schedule
Specifies the type of scheduling algorithms and, except in the case of auto, chunk size (n) that are used for loops to which no other scheduling algorithm has been explicitly assigned in the source code. Suboptions of the schedule suboption are as follows:
affinity[=n]
The iterations of a loop are initially divided into one partition per thread, each containing ceiling(number_of_iterations/number_of_threads) iterations. Each partition is initially assigned to a thread and is then further subdivided into chunks that each contain n iterations. If n is not specified, then the chunks consist of ceiling(number_of_iterations_left_in_partition / 2) loop iterations.

When a thread becomes free, it takes the next chunk from its initially assigned partition. If there are no more chunks in that partition, then the thread takes the next available chunk from a partition initially assigned to another thread.

The work in a partition initially assigned to a sleeping thread will be completed by threads that are active.

The affinity scheduling type is not part of the OpenMP API specification.

Note: This suboption has been deprecated. You can use the OMP_SCHEDULE environment variable with the dynamic clause for a similar functionality.
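The arithmetic described for affinity scheduling can be sketched as follows. This is only a model of the partition and chunk sizes stated above, not the runtime's implementation. The function names are hypothetical, and passing n as 0 is this example's own convention for "n is not specified".

```c
/* Size of each initial partition: ceiling(iterations / threads). */
int affinity_partition_size(int iterations, int threads)
{
    return (iterations + threads - 1) / threads;
}

/* Split one partition into chunks and store the chunk sizes.
 * n > 0  : every chunk holds n iterations (the last may be smaller).
 * n == 0 : each chunk holds ceiling(iterations_left / 2) iterations.
 * Returns the number of chunks written (at most max_chunks). */
int affinity_chunks(int partition_size, int n, int *chunk_sizes, int max_chunks)
{
    int left = partition_size;
    int count = 0;

    while (left > 0 && count < max_chunks) {
        int chunk = (n > 0) ? n : (left + 1) / 2;
        if (chunk > left) {
            chunk = left;
        }
        chunk_sizes[count++] = chunk;
        left -= chunk;
    }
    return count;
}
```

For example, a partition of 10 iterations with no n is split into chunks of 5, 3, 1, and 1 iterations.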
auto
Scheduling of the loop iterations is delegated to the compiler and runtime systems. The compiler and runtime system can choose any possible mapping of iterations to threads (including all possible valid schedule types) and these might be different in different loops. Do not specify chunk size (n).
dynamic[=n]
The iterations of a loop are divided into chunks that contain n iterations each. If n is not specified, each chunk contains one iteration.

Active threads are assigned these chunks on a "first-come, first-do" basis. Chunks of the remaining work are assigned to available threads until all work has been assigned.
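The "first-come, first-do" assignment can be modeled with a shared position that each free thread advances by one chunk. The sketch below is single-threaded and illustrative only; the name next_dynamic_chunk is hypothetical, and a real runtime would update the shared position atomically.

```c
/* Hand out the next chunk of at most n iterations.
 * *next is the first iteration not yet assigned. On success, returns 1
 * and sets the half-open range [*start, *end). Returns 0 when no work
 * is left or n is less than 1. */
int next_dynamic_chunk(int *next, int iterations, int n, int *start, int *end)
{
    if (n < 1 || *next >= iterations) {
        return 0;
    }
    *start = *next;
    *end = *start + n;
    if (*end > iterations) {
        *end = iterations;  /* the final chunk may be shorter than n */
    }
    *next = *end;
    return 1;
}
```

With 10 iterations and n set to 4, successive calls hand out the ranges [0, 4), [4, 8), and [8, 10), after which no work remains.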

guided[=n]
The iterations of a loop are divided into progressively smaller chunks until a minimum chunk size of n loop iterations is reached. If n is not specified, the default value for n is 1 iteration.

Active threads are assigned chunks on a "first-come, first-do" basis. The first chunk contains ceiling(number_of_iterations/number_of_threads) iterations. Subsequent chunks consist of ceiling(number_of_iterations_left / number_of_threads) iterations.
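The chunk sizes produced by the guided formulas above can be computed as in this sketch. It models only the stated arithmetic, with the minimum chunk size n applied as a lower bound. The function name guided_chunks is hypothetical.

```c
/* Store the guided chunk sizes for a loop and return how many chunks
 * were written (at most max_chunks). Each chunk holds
 * ceiling(iterations_left / threads) iterations, but never fewer than
 * min_chunk unless fewer iterations remain. */
int guided_chunks(int iterations, int threads, int min_chunk,
                  int *chunk_sizes, int max_chunks)
{
    int left = iterations;
    int count = 0;

    if (threads < 1 || min_chunk < 1) {
        return 0;
    }
    while (left > 0 && count < max_chunks) {
        int chunk = (left + threads - 1) / threads;
        if (chunk < min_chunk) {
            chunk = min_chunk;
        }
        if (chunk > left) {
            chunk = left;
        }
        chunk_sizes[count++] = chunk;
        left -= chunk;
    }
    return count;
}
```

For 100 iterations on 4 threads with n left at 1, the first chunks contain 25, 19, and 14 iterations, and the sizes keep shrinking until single-iteration chunks finish the loop.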

runtime
Specifies that the chunking algorithm will be determined at run time.
static[=n]
The iterations of a loop are divided into chunks containing n iterations each. Each thread is assigned chunks in a "round-robin" fashion. This is known as block cyclic scheduling. If the value of n is 1, then the scheduling type is specifically referred to as cyclic scheduling.

If n is not specified, each chunk contains floor(number_of_iterations/number_of_threads) iterations, and the first remainder(number_of_iterations/number_of_threads) chunks contain one more iteration, where remainder is the remainder of the integer division. Each thread is assigned a separate chunk. This is known as block scheduling.

If a thread is asleep and it has been assigned work, it will be awakened so that it may complete its work.
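Both static variants reduce to simple integer arithmetic, sketched below. The function names are hypothetical, and thread and iteration indices are assumed to start at 0.

```c
/* Block scheduling (no n): number of iterations in the single chunk
 * given to thread_index. The first (iterations % threads) chunks get
 * one extra iteration. */
int static_block_size(int iterations, int threads, int thread_index)
{
    int base = iterations / threads;   /* floor(iterations / threads) */
    int extra = iterations % threads;  /* chunks that get one more */

    return base + (thread_index < extra ? 1 : 0);
}

/* Block cyclic scheduling (static=n): index of the thread that runs a
 * given iteration when chunks of n iterations are dealt round-robin. */
int static_cyclic_owner(int iteration, int n, int threads)
{
    return (iteration / n) % threads;
}
```

For 10 iterations on 4 threads with no n, the block sizes are 3, 3, 2, and 2. With n set to 1, static_cyclic_owner gives the cyclic pattern 0, 1, 2, 3, 0, 1, and so on.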

n
Must be an integer of value 1 or greater.

Specifying schedule with no suboption is equivalent to schedule=auto.

stackcheck | nostackcheck
Causes the compiler to check for stack overflow by slave threads at run time, and issue a warning if the remaining stack size is less than the number of bytes specified by the stackcheck option of the XLSMPOPTS environment variable. This suboption is intended for debugging purposes, and only takes effect when XLSMPOPTS=stackcheck is also set; see XLSMPOPTS.
threshold[=n]
When -qsmp=auto is in effect, controls the amount of automatic loop parallelization that occurs. The value of n represents the minimum amount of work required in a loop in order for it to be parallelized. Currently, the calculation of "work" is weighted heavily by the number of iterations in the loop. In general, the higher the value specified for n, the fewer loops are parallelized. Specifying a value of 0 instructs the compiler to parallelize all auto-parallelizable loops, whether or not it is profitable to do so. Specifying a value of 100 instructs the compiler to parallelize only those auto-parallelizable loops that it deems profitable. Specifying a value greater than 100 results in more loops being serialized.
n
Must be an integer of value 0 or greater.
If you specify threshold without a value for n, the compiler uses a default value of 100.
Specifying -qsmp without suboptions is equivalent to:
-qsmp=auto:explicit:opt:noomp:norec_locks:nonested_par:schedule=auto:nostackcheck:threshold=100:ostls

Usage

  • Specifying the omp suboption always implies noauto. Specify -qsmp=omp:auto to apply automatic parallelization on OpenMP-compliant applications, as well.
  • You should only use -qsmp with the _r-suffixed invocation commands, to automatically link in all of the threadsafe components. You can use the -qsmp option with the non-_r-suffixed invocation commands, but you are responsible for linking in the appropriate components. If you use the -qsmp option to compile any source file in a program, then you must specify the -qsmp option at link time as well, unless you link by using the ld command.
  • Object files generated with the -qsmp=opt option can be linked with object files generated with -qsmp=noopt. The visibility within the debugger of the variables in each object file will not be affected by linking.
  • The -qnosmp default option setting specifies that no code should be generated for parallelization directives, though syntax checking will still be performed. Use -qignprag=omp:ibm to completely ignore parallelization directives.
  • Specifying -qsmp implicitly sets -O2. The -qsmp option overrides -qnooptimize, but does not override -O3, -O4, or -O5. When debugging parallelized program code, you can disable optimization in parallelized program code by specifying -qsmp=noopt.
  • The -qsmp=noopt suboption overrides performance optimization options anywhere on the command line unless -qsmp appears after -qsmp=noopt. For example, -qsmp=noopt -O3 is equivalent to -qsmp=noopt, while -qsmp=noopt -O3 -qsmp is equivalent to -qsmp -O3.

Predefined macros

C only: When -qsmp is in effect, _IBMSMP is predefined to a value of 1, which indicates that IBM SMP directives are recognized; otherwise, it is not defined.
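Source code can test this macro with the preprocessor. The sketch below is one way to do so; the function name smp_enabled is hypothetical.

```c
/* Report whether this translation unit was compiled with the _IBMSMP
 * macro defined, that is, with -qsmp in effect.
 * Returns 1 if the macro is defined, 0 otherwise. */
int smp_enabled(void)
{
#ifdef _IBMSMP
    return 1;
#else
    return 0;
#endif
}
```

A program might call smp_enabled() at startup to log whether the SMP build was used.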


