Compiling with optimization

To produce a program that achieves good performance, the first step is to take advantage of the basic optimization features built into the compiler.

Compiling with optimization can increase the speedup that comes from tuning your program and can remove the need to perform some kinds of tuning.

Recommendations

Follow these guidelines for optimization:

  • Use -O2 or -O3 -qstrict for any production-level FORTRAN, C, or C++ program you compile. For High Performance FORTRAN (HPF) programs, do not use the -qstrict option.
  • Use the -qhot option for programs where the hot spots are loops or array language. Always use the -qhot option for HPF programs.
  • Use the -qipa option near the end of the development cycle if compilation time is not a major consideration.

The -qipa option activates or customizes a class of optimizations known as interprocedural analysis. The -qipa option has several suboptions that are detailed in the compiler manual. It can be used in two ways:

  • The first method is to compile with the -qipa option during both the compile and link steps. During compilation, the compiler stores interprocedural analysis information in the .o file. During linking, the -qipa option causes a complete recompilation of the entire application.
  • The second method is to compile the program for profiling with the -p or -pg option (with or without -qipa), and run it on a typical set of data. The resulting data can then be fed into subsequent compilations with -qipa so that the compiler concentrates optimization in the sections of the program that are most frequently used.

Using -O4 is equivalent to using -O3 -qipa with automatic generation of the architecture and tuning options ideal for that platform. Using the -O5 flag is similar to -O4, except that -qipa=level=2 is in effect.

You gain the following benefits when you use compiler optimization:

Branch optimization
Rearranges the program code to minimize branching logic and to combine physically separate blocks of code.
Code motion
If variables used in a computation within a loop are not altered within the loop, the calculation can be performed outside of the loop and the results used within the loop.
Common subexpression elimination
When the same value is recalculated in a subsequent expression, the duplicate calculation is eliminated and the previously computed value is reused.
Constant propagation
Constants used in an expression are combined, and new ones are generated. Some implicit conversions between integers and floating-point types are done.
Dead code elimination
Eliminates code that cannot be reached or where the results are not subsequently used.
Dead store elimination
Eliminates stores when the value stored is never referenced again. For example, if two stores to the same location have no intervening load, the first store is unnecessary and is removed.
Global register allocation
Allocates variables and expressions to available hardware registers using a "graph coloring" algorithm.
Inlining
Replaces function calls with actual program code.
Instruction scheduling
Reorders instructions to minimize execution time.
Interprocedural analysis
Uncovers relationships across function calls, and eliminates loads, stores, and computations that cannot be eliminated with more straightforward optimizations.
Invariant IF code floating (Unswitching)
Removes invariant branching code from loops to create more opportunities for other optimizations.
Profile driven feedback
Results from sample program execution are used to improve optimization near conditional branches and in frequently executed code sections.
Reassociation
Rearranges the sequence of calculations in an array subscript expression, producing more candidates for common subexpression elimination.
Store motion
Moves store instructions out of loops.
Strength reduction
Replaces less efficient instructions with more efficient ones. For example, in array subscripting, an add instruction replaces a multiply instruction.
Value numbering
Involves constant propagation, expression elimination, and folding of several instructions into a single instruction.
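
To make two of the transformations above concrete, the following C sketch shows a loop as a programmer might write it and a hand-transformed equivalent with code motion and strength reduction applied. It is only an illustration of the kind of rewrite an optimizer can perform, not the compiler's actual output. The function names and the sample matrix are hypothetical.

```c
#include <stddef.h>

/* Hypothetical 2x3 row-major sample matrix, used only for illustration. */
static const long sample[6] = { 1, 2, 3,
                                4, 5, 6 };

/* As written: the invariant value is recomputed on every pass, and each
 * subscript calculation needs a multiply. */
long sum_column_naive(const long *m, size_t rows, size_t cols,
                      size_t col, long x, long y)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        long scale = x * y + 1;              /* x and y never change in the loop */
        total += m[r * cols + col] * scale;  /* multiply inside the subscript */
    }
    return total;
}

/* Hand-transformed: the same result with less work per iteration. */
long sum_column_transformed(const long *m, size_t rows, size_t cols,
                            size_t col, long x, long y)
{
    long scale = x * y + 1;   /* code motion: computed once, outside the loop */
    size_t idx = col;         /* strength reduction: index advanced by an add */
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        total += m[idx] * scale;
        idx += cols;          /* replaces the r * cols multiply */
    }
    return total;
}
```

Both functions return the same value for the same arguments. For example, summing column 1 of the sample matrix with x = 2 and y = 3 gives (2 + 5) * 7 = 49 in either version. When optimization is enabled, you can usually write the first form for clarity and leave the rewrite to the compiler.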

When to compile without optimization

Do not use the -O option for programs that you intend to debug with a symbolic debugger, regardless of whether you use the -g option. However, because optimization is so important to HPF programs, use -O3 -qhot for them even during debugging.

The optimizer rearranges assembler-language instructions, making it difficult to map individual instructions to a line of source code. If you compile with the -g option, this rearrangement may give the appearance that the source-level statements are executed in the wrong order when you use a symbolic debugger.

If your program produces incorrect results when it is compiled with any of the -O options, check your program for unintentionally aliased variables in procedure references.
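
As an illustration of unintentional aliasing, the C sketch below passes a pointer into the same array that the procedure modifies. All names are hypothetical. In C as written here the behavior is well defined, but the result is probably not what the caller intended. In languages or declarations that let the optimizer assume the arguments do not overlap (for example, Fortran dummy arguments or C pointers qualified with restrict), the same call pattern can produce results that change with the optimization level.

```c
#include <stddef.h>

/* Intended meaning: add the value *src to each of the n elements of dst.
 * If src points into dst, the addend changes partway through the loop. */
void add_scalar(long *dst, size_t n, const long *src)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += *src;   /* *src is re-read after every store to dst */
    }
}

/* Alias-safe variant: read the addend once, before any store. */
void add_scalar_safe(long *dst, size_t n, const long *src)
{
    const long addend = *src;
    for (size_t i = 0; i < n; i++) {
        dst[i] += addend;
    }
}

/* Demonstration helper: calls one of the two versions with an aliased
 * argument (&v[0]) and returns the last element of the result. */
long demo_last_element(int use_safe)
{
    long v[3] = { 1, 2, 3 };
    if (use_safe)
        add_scalar_safe(v, 3, &v[0]);
    else
        add_scalar(v, 3, &v[0]);
    return v[2];
}
```

With the aliased call, add_scalar turns {1, 2, 3} into {2, 4, 5}, because v[0] has already been doubled by the time the later elements are updated, while add_scalar_safe produces the intended {2, 3, 4}. Copying the value before the loop, or passing it by value, removes the dependence on how the compiler orders loads and stores.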