-O, -qoptimize

Pragma equivalent

#pragma options [no]optimize

Purpose

Specifies whether to optimize code during compilation and, if so, at which level.

Syntax

          .-noopt----------------------.     
          +-nooptimize-----------------+     
>>-+- -q--+-+-optimize-+--+----------+-+-+---------------------><
   |        '-opt------'  '-=--+-0-+-'   |   
   |                           +-2-+     |   
   |                           +-3-+     |   
   |                           +-4-+     |   
   |                           '-5-'     |   
   +- -O0--------------------------------+   
   +- -O---------------------------------+   
   +- -O2--------------------------------+   
   +- -O3--------------------------------+   
   +- -O4--------------------------------+   
   '- -O5--------------------------------'   

Defaults

-qnooptimize or -O0 or -qoptimize=0

Parameters

-O0 | nooptimize | noopt | optimize|opt=0
Performs only quick local optimizations such as constant folding and elimination of local common subexpressions.

This setting implies -qstrict_induction unless -qnostrict_induction is explicitly specified.
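
As a rough illustration of the two local optimizations named above, the following sketch shows a function as a programmer might write it and a hand-transformed equivalent. The function names are hypothetical, and the second function is written by hand to show the idea; it is not actual compiler output.

```c
#include <stdio.h>

/* As written: the constant product 60 * 60 and the subexpression
   (a + b) each appear more than once. */
long scaled_sum_original(long a, long b)
{
    long seconds = (a + b) * (60 * 60);
    long minutes = (a + b) * 60;
    return seconds + minutes;
}

/* Hand-applied equivalent of constant folding and local common
   subexpression elimination (illustrative only). */
long scaled_sum_transformed(long a, long b)
{
    long sum = a + b;            /* common subexpression computed once */
    long seconds = sum * 3600;   /* 60 * 60 folded to a single constant */
    long minutes = sum * 60;
    return seconds + minutes;
}

/* Prints both results so they can be compared side by side. */
void print_scaled_sums(long a, long b)
{
    printf("original:    %ld\n", scaled_sum_original(a, b));
    printf("transformed: %ld\n", scaled_sum_transformed(a, b));
}
```

Both functions return the same value for the same inputs; only the amount of work per call differs.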

-O | -O2 | optimize | opt | optimize|opt=2
Performs optimizations that the compiler developers considered the best combination for compilation speed and runtime performance. The optimizations may change from product release to release. If you need a specific level of optimization, specify the appropriate numeric value.

This setting implies -qstrict and -qnostrict_induction, unless explicitly negated by -qstrict_induction or -qnostrict.

-O3 | optimize|opt=3
Performs additional optimizations that are memory intensive, compile-time intensive, or both. They are recommended when the desire for runtime improvement outweighs the concern for minimizing compilation resources.
-O3 applies the -O2 level of optimization, but with unbounded time and memory limits. -O3 also performs higher and more aggressive optimizations that have the potential to slightly alter the semantics of your program. The compiler guards against these optimizations at -O2. The aggressive optimizations performed when you specify -O3 are:
  1. Aggressive code motion and scheduling of computations that have the potential to raise an exception are allowed.

    Loads and floating-point computations fall into this category. This optimization is aggressive because it may place such instructions onto execution paths where they are executed even though, according to the actual semantics of the program, they might never have been executed.

    For example, a loop-invariant floating-point computation that is found on some, but not all, paths through a loop will not be moved at -O2 because the computation may cause an exception. At -O3, the compiler will move it because it is not certain to cause an exception. The same is true for motion of loads. Although a load through a pointer is never moved, loads off the static or stack base register are considered movable at -O3. Loads in general are not considered to be absolutely safe at -O2 because a program can contain a declaration of a static array a of 10 elements and load a[60000000003], which could cause a segmentation violation.

    The same concepts apply to scheduling.

    Example:

    In the following example, at -O2, the computation of b+c is not moved out of the loop for two reasons:

    • It is considered dangerous because it is a floating-point operation
    • It does not occur on every path through the loop

    At -O3, the code is moved.

          ...
        int i ;
        float a[100], b, c ;
        for (i = 0 ; i < 100 ; i++)
         {
         if (a[i] < a[i+1])
          a[i] = b + c ;
         }
          ...
  2. Conformance to IEEE rules is relaxed at -O3.

    With -O2, certain optimizations are not performed because they may produce an incorrect sign in cases with a zero result, or because they remove an arithmetic operation that may cause some type of floating-point exception.

    For example, X + 0.0 is not folded to X because, under IEEE rules, -0.0 + 0.0 = 0.0; when X is -0.0, the sum is therefore -X rather than X. In other cases, an optimization may yield a zero result with the wrong sign. For example, X - Y * Z may result in -0.0 where the original computation would produce 0.0.

    In most cases the difference in the results is not important to an application, and -O3 allows these optimizations.

  3. Floating-point expressions may be rewritten.

    Computations such as a*b*c may be rewritten as a*c*b if, for example, an opportunity exists to get a common subexpression by such rearrangement. Replacing a divide with a multiply by the reciprocal is another example of reassociating floating-point computations.

  4. Specifying -O3 implies -qhot=level=0, unless you explicitly specify the -qhot or -qhot=level=1 option.
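
The signed-zero case in item 2 can be observed with a small sketch like the following. The function names are hypothetical. The volatile variables are an assumption of this sketch: they are there only to keep the compiler from folding the addition at compile time, so that the IEEE result is computed at run time.

```c
#include <math.h>   /* signbit */
#include <stdio.h>

/* Computes x + 0.0 at run time. The volatile copies prevent the
   compiler from simplifying the addition away. */
double add_positive_zero(double x)
{
    volatile double v = x;
    volatile double zero = 0.0;
    return v + zero;
}

/* Returns 1 when replacing x + 0.0 with x would change the sign of
   the result, and 0 otherwise. */
int folding_changes_sign(double x)
{
    double sum = add_positive_zero(x);
    return (signbit(sum) != 0) != (signbit(x) != 0);
}

/* Prints x and x + 0.0 for x = -0.0 so the signs can be compared. */
void print_signed_zero_demo(void)
{
    double x = -0.0;
    printf("x       = %g\n", x);
    printf("x + 0.0 = %g\n", add_positive_zero(x));
}
```

Under the default IEEE rounding mode, folding_changes_sign reports a difference only for an input of -0.0, which is the case the text describes.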

-qfloat=fltint:rsqrt is set by default with -O3.

-qmaxmem=-1 is set by default with -O3, allowing the compiler to use as much memory as necessary when performing optimizations.

Built-in functions do not change errno at -O3.

Aggressive optimizations do not include the following floating-point suboptions: -qfloat=hsflt | hssngl, or anything else that affects the precision mode of a program.

Integer divide instructions are considered too dangerous to optimize even at -O3.

Refer to -qflttrap to see the behavior of the compiler when you specify optimize options with the -qflttrap option.

You can use the -qstrict and -qstrict_induction compiler options to turn off effects of -O3 that might change the semantics of a program. Specifying -qstrict together with -O3 invokes all the optimizations performed at -O2 as well as further loop optimizations. Reference to the -qstrict compiler option can appear before or after the -O3 option.
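
One semantic change of this kind is the reordering of floating-point multiplications described earlier in this section. The following sketch, with hypothetical function names and input values chosen for illustration, shows that a*b*c and a*c*b can give different results when an intermediate product overflows. The volatile temporaries are an assumption of the sketch; they force each intermediate product to be rounded to double and keep the compiler from reordering the code itself.

```c
#include <float.h>  /* DBL_MAX */
#include <stdio.h>

/* Evaluates (a * b) * c in source order. */
double multiply_in_source_order(double a, double b, double c)
{
    volatile double ab = a * b;
    volatile double result = ab * c;
    return result;
}

/* Evaluates (a * c) * b, the reordered form. */
double multiply_reordered(double a, double b, double c)
{
    volatile double ac = a * c;
    volatile double result = ac * b;
    return result;
}

/* Returns 1 if x is larger than any finite double (positive infinity). */
int is_overflowed(double x)
{
    return x > DBL_MAX;
}

/* Prints both evaluation orders for inputs where a * b overflows
   but a * c does not. */
void print_reordering_demo(void)
{
    double a = 1e200, b = 1e200, c = 1e-200;
    printf("(a*b)*c = %g\n", multiply_in_source_order(a, b, c));
    printf("(a*c)*b = %g\n", multiply_reordered(a, b, c));
}
```

With these inputs the source-order product overflows to infinity, while the reordered product stays finite. Most programs never operate this close to the limits of the type, which is why such rewrites are usually harmless.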

The -O3 compiler option followed by the -O option leaves -qignerrno on.

When -O3 and -qhot=level=1 are in effect, the compiler replaces any calls in the source code to standard math library functions with calls to the equivalent MASS library functions, and if possible, the vector versions.

-O4 | optimize|opt=4
This option is the same as -O3, except that it also:
  • Sets the -qarch and -qtune options to the architecture of the compiling machine
  • Sets the -qcache option most appropriate to the characteristics of the compiling machine
  • Sets the -qhot option
  • Sets the -qipa option
Note: Later settings of -O, -qcache, -qhot, -qipa, -qarch, and -qtune options will override the settings implied by the -O4 option.

This option follows the "last option wins" conflict resolution rule, so any of the options that are modified by -O4 can be subsequently changed. For example, specifying -O4 -qarch=ppc allows aggressive intraprocedural optimization while maintaining code portability.

-O5 | optimize|opt=5
This option is the same as -O4, except that it:
  • Sets the -qipa=level=2 option to perform full interprocedural data flow and alias analysis.
Note: Later settings of -O, -qcache, -qipa, -qarch, and -qtune options will override the settings implied by the -O5 option.

Usage

Increasing the level of optimization may or may not result in additional performance improvements, depending on whether additional analysis detects further opportunities for optimization.

Compilations with optimizations may require more time and machine resources than other compilations.

Optimization can cause statements to be moved or deleted, and generally should not be specified along with the -g flag for debugging programs. The debugging information produced may not be accurate.

When using -O or higher optimization, -qtbtable=small is implied. The traceback table generated has no function name or parameter information.

If optimization level -O3 or higher is specified on the command line, the -qhot and -qipa options that are set by the optimization level cannot be overridden by #pragma option_override(identifier, "opt(level, 0)") or #pragma option_override(identifier, "opt(level, 2)").
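
For orientation, the directive named above might appear in source as in the following sketch. The function name sum_to is hypothetical, and placing the directive ahead of the function definition is an assumption of this sketch. A compiler that does not recognize the pragma will typically ignore it, possibly with a warning.

```c
#include <stdio.h>

/* Requests the -O0 level of optimization for the function sum_to only
   (hypothetical function name). The rest of the file keeps the level
   given on the command line. */
#pragma option_override(sum_to, "opt(level, 0)")

/* Returns 1 + 2 + ... + n, or 0 when n is less than 1. */
int sum_to(int n)
{
    int total = 0;
    int i;
    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Prints one result so the function has a visible effect. */
void print_sum_demo(void)
{
    printf("sum_to(4) = %d\n", sum_to(4));
}
```

As the paragraph above states, when -O3 or higher is given on the command line, a directive like this one lowers the optimization level for the function but does not undo the -qhot and -qipa settings implied by that level.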

Predefined macros

  • __OPTIMIZE__ is predefined to 2 when -O | -O2 is in effect; it is predefined to 3 when -O3 | -O4 | -O5 is in effect. Otherwise, it is undefined.
  • __OPTIMIZE_SIZE__ is predefined to 1 when -O | -O2 | -O3 | -O4 | -O5 and -qcompact are in effect. Otherwise, it is undefined.
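
A program can test these macros at compile time, for example to record in a log which settings a binary was built with. The following is a minimal sketch; the function names are hypothetical, and the values seen depend on the compiler and options used for the build.

```c
#include <stdio.h>

/* Returns the value of __OPTIMIZE__ seen when this file was compiled,
   or 0 when the macro is undefined. The "+ 0" keeps the expression
   valid even if a compiler defines the macro with an empty value. */
int optimize_macro_value(void)
{
#ifdef __OPTIMIZE__
    return __OPTIMIZE__ + 0;
#else
    return 0;
#endif
}

/* Returns 1 when __OPTIMIZE_SIZE__ was defined at compile time,
   and 0 otherwise. */
int optimize_size_macro_defined(void)
{
#ifdef __OPTIMIZE_SIZE__
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of the build settings. */
void print_build_settings(void)
{
    printf("__OPTIMIZE__ = %d, __OPTIMIZE_SIZE__ defined = %d\n",
           optimize_macro_value(), optimize_size_macro_defined());
}
```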

Examples

To compile and optimize myprogram.c, enter:
xlc myprogram.c -O3

