Implementation details of XL Fortran floating-point processing

This topic answers some common questions about floating-point processing.

How can I get predictable, consistent results?
How can I get the fastest or the most accurate results?
How can I detect, and possibly recover from, exception conditions?
Which compiler options can I use for floating-point calculations?

The topics describing floating-point precision refer frequently to the compiler options grouped under Floating-point and integer control in the XL Fortran Compiler Reference, especially the -qfloat option. The XL Fortran compiler also provides three intrinsic modules for exception handling and IEEE arithmetic support. Code written against these modules can be more portable. See IEEE Modules and Support in the XL Fortran Language Reference for details.
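As a rough illustration of what code written against the IEEE modules can look like, the following minimal sketch uses the standard intrinsic module IEEE_EXCEPTIONS to clear, raise, and then query the divide-by-zero flag. The program and variable names are hypothetical. The sketch assumes that trapping on divide-by-zero is not enabled and that the compiler does not fold the division at compile time (the VOLATILE attribute is there to discourage that). Whether the flag is reported can also depend on the floating-point options in effect.

```fortran
program ieee_flag_sketch
  use, intrinsic :: ieee_exceptions
  implicit none

  ! VOLATILE discourages the compiler from evaluating 1.0/zero at compile time.
  real, volatile :: zero
  real :: quotient
  logical :: raised

  ! Clear the flag so that any signal seen later comes from the division below.
  call ieee_set_flag(ieee_divide_by_zero, .false.)

  zero = 0.0
  quotient = 1.0 / zero

  ! Ask whether the division signaled the divide-by-zero exception.
  call ieee_get_flag(ieee_divide_by_zero, raised)

  if (raised) then
    print *, 'divide-by-zero was signaled; quotient =', quotient
    ! Reset the flag so that later checks start from a clean state.
    call ieee_set_flag(ieee_divide_by_zero, .false.)
  else
    print *, 'no divide-by-zero flag was set; quotient =', quotient
  end if
end program ieee_flag_sketch
```

Testing the flag after the operation, rather than relying on a trap, lets the program decide locally how to recover.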

The compiler options for floating-point calculations affect the accuracy, the performance, and possibly the correctness of the results. Although the default values for these options were chosen to provide efficient and correct execution of most programs, you may need to specify nondefault options for your applications to work the way you want. We strongly advise you to read this section before using these options.

Note: The discussions of single-, double-, and extended-precision calculations in this section all refer to the default situation, with -qrealsize=4 and no -qautodbl specified. If you change these settings, keep in mind that the size of a Fortran REAL, DOUBLE PRECISION, and so on may change, but single precision, double precision, and extended precision (in lowercase) still refer to 4-, 8-, and 16-byte entities respectively.
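One way to see which sizes are in effect for a given compilation is to print the kind type parameters of the default REAL and DOUBLE PRECISION types. The following is a minimal sketch with hypothetical names. It assumes the XL Fortran convention that the kind value of a real type equals its size in bytes, so under the default settings described in the note the two values would be expected to correspond to single and double precision.

```fortran
program precision_kinds_sketch
  implicit none

  real :: r              ! default REAL; its size depends on -qrealsize
  double precision :: d  ! its size can also change with -qrealsize or -qautodbl

  ! KIND returns the kind type parameter of its argument. Under the
  ! assumption stated above, that value is the size of the type in bytes.
  print *, 'KIND of default REAL:     ', kind(r)
  print *, 'KIND of DOUBLE PRECISION: ', kind(d)
end program precision_kinds_sketch
```

Compiling the same source with different -qrealsize or -qautodbl settings and comparing the output shows how the Fortran type names can map to different sizes, while the lowercase terms in this section keep their fixed meanings.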
