This page walks software developers through the steps needed to take advantage of the decimal floating-point unit (DFU) available on IBM POWER6 processor-based systems running Linux. It first gives a brief introduction to decimal floating-point (DFP) technology, and then explains how to use the compilers to exploit DFP.
Today, two packages provide compilers that can exploit the decimal floating-point functionality on POWER6-based Linux systems:
- Advance Toolchain v1.1, which provides a newer, alternative gcc compiler for your system, and
- IBM XL C/C++ Advanced Edition for Linux, V9.0.
The information on this page applies to both Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES) 10. The examples below were run on a system running RHEL 5.2. Java applications can also take advantage of the decimal floating-point enhancements, but that is beyond the scope of this article. See the references at the end of this page for more information on exploiting DFP from Java.
Additional Linux on Power performance information is available at http://www.ibm.com/developerworks/wikis/display/LinuxP/Home
Decimal Floating-Point
Decimal (the classic day-to-day base 10) data is widely used in commercial and financial applications. However, most computer systems have only binary (base two) arithmetic, using 0 and 1 to represent numbers. There are two binary number systems in computers: integer (fixed-point), and floating-point. Unfortunately, decimal calculations cannot be directly implemented with binary floating-point. For example, the value 0.1 would need an infinitely recurring binary fraction while a decimal number system can represent it exactly, as one tenth. So, using binary floating-point cannot guarantee that results will be the same as those using decimal arithmetic.
There's a good web page available which describes General Decimal Arithmetic - see http://speleotrove.com/decimal/. Mike Cowlishaw, an IBM Fellow, has consolidated a lot of good information on this page.
Traditionally, decimal floating-point operations have been emulated in software with binary fixed-point integers, and decimal numbers have been held in a binary-coded decimal (BCD) format. While BCD provides sufficient accuracy for decimal calculations, it carries a heavy performance cost because the arithmetic is usually implemented in software.
IBM POWER6 processor-based systems provide hardware support for decimal floating-point arithmetic. The POWER6 microprocessor core includes a decimal floating-point unit that accelerates decimal floating-point arithmetic. The IBM POWER instruction set was expanded with 54 new instructions to support the decimal floating-point unit architecture.
Next, we show how developers can exploit decimal floating-point math on Linux.
The Advance Toolchain version 1.1-0
The Advance Toolchain is a set of free-software development tools that lets users take advantage of the latest IBM hardware features: (1) POWER6 enablement and exploitation, (2) system libraries optimized for ppc970, POWER4, POWER5, POWER5+, POWER6, and POWER6x, and (3) decimal floating-point capability.
The Advance Toolchain is a self-contained toolchain that does not rely on the base system toolchain to operate; in fact, it is designed to coexist with the toolchain shipped with the operating system. That is, you do not have to uninstall the regular GCC compilers that come with your Linux distribution in order to use the Advance Toolchain.
The Advance Toolchain package includes the following components:
- GNU Compiler Collection (gcc, g++, gfortran),
- C Libraries (libc, libmpfr, and others),
- binary utilities (ld, ldd, objcopy, objdump, nm, and others),
- debuggers (gdb32, gdb64), and
- performance analysis tools (OProfile, Valgrind, gprof, mtrace, xtrace).
Customers can download the Advance Toolchain from
Download and install the following three RPMs (rpm -ivh *.rpm):
- advance-toolchain-devel-1.1-0.ppc64.rpm
- advance-toolchain-perf-1.1-0.ppc64.rpm
- advance-toolchain-runtime-1.1-0.ppc64.rpm
The simple release notes are available at:
The recommended installation method is to use YaST or YUM, so that the authenticity of the packages is verified. Please consult the Advance Toolchain Release Notes for detailed instructions. In our experience, installing directly with rpm also works fine.
The following gcc compiler options for the Advance Toolchain relate to decimal floating point:
- -D__STDC_WANT_DEC_FP__ : makes the DFP symbols defined in the standard headers (for example, DEC32_MAX in float.h) available to your code.
- -ldfp : links against libdfp, which provides the decimal floating-point functionality of the Advance Toolchain.
- -mno-dfp : instructs the compiler to use calls to library functions for decimal floating-point computation, regardless of the architecture level. You may experience performance degradation when using software emulation.
IBM XL C/C++ Compilers
IBM XL C/C++ Advanced Edition for Linux is a standards-based compiler with advanced optimization features for select Linux distributions running on POWER-based systems. It is not free, but a 60-day trial program is available if you would like to evaluate it.
http://www-01.ibm.com/software/awdtools/fortran/xlfortran/features/linux/xlf-linux.html
http://www-01.ibm.com/software/awdtools/xlcpp/features/linux/xlcpp-linux.html
To try out DFP functionality with the IBM XL compiler, you first need to install the Advance Toolchain and then configure the IBM compiler to use it. The Advance Toolchain provides the runtime support for DFP. DFP support in the IBM XL compiler is, however, provided "as is", which means there is no official support for it. The purpose of this preview is to showcase early results of development work.
Assuming that you installed the Advance Toolchain and the IBM compiler in their default locations, you configure the IBM compiler by executing the following command:
Detailed instructions for configuring the IBM compiler for DFP can be found at
http://www-1.ibm.com/support/docview.wss?rs=2239&context=SSJT9L&uid=swg27010218
We recommend upgrading your IBM XL compiler with the latest PTF, which can be downloaded at
http://www-1.ibm.com/support/docview.wss?uid=swg24018145
The following IBM XL compiler options relate to decimal floating point:
- -qdfp : enables decimal floating-point support. Specifically, this option makes the compiler recognize decimal floating-point literal suffixes and the _Decimal32, _Decimal64, and _Decimal128 keywords.
- -qfloat=dfpemulate : instructs the compiler to use calls to library functions for decimal floating-point computation, regardless of the architecture level. You may experience performance degradation when using software emulation.
- -qfloat=nodfpemulate : turns software emulation off. This is the default when -qarch=pwr6 or -qarch=pwr6e is specified.
- -D__STDC_WANT_DEC_FP__ : makes the DFP symbols defined in the standard headers available to your code.
- -F/path/to/my/configfile : specifies the full path name of the compiler configuration file to use.
- -ldfp : links against libdfp, which provides the decimal floating-point functionality of the Advance Toolchain.
Below, we provide two sample programs that demonstrate the use of DFP. For each program, we show how to build it using the Advance Toolchain and the IBM XL compilers.
The first sample is a simple program. We recommend using it to check that your compiler setup is correct.
Sample Code 1
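/* Build with -D__STDC_WANT_DEC_FP__ so that float.h defines the DEC*_MAX macros. */
#include <stdio.h>
#include <float.h>

int main(void)
{
    _Decimal128 d128;
    double fl;

    printf("Hello DFP world\n");

    /* The H, D and DD length modifiers select _Decimal32, _Decimal64 and _Decimal128. */
    printf("DEC32_MAX = %Hf\n", DEC32_MAX);
    printf("DEC64_MAX = %Df\n", DEC64_MAX);
    printf("DEC128_MAX = %DDf\n", DEC128_MAX);

    d128 = 1.000001DL;   /* DL suffix: _Decimal128 literal */
    printf("1.000001 as _Decimal128: \n = '%40.30DDf'\n", d128);

    fl = 1.000001;       /* binary double, for comparison */
    printf("1.000001 as a float: \n = '%40.30f'\n", fl);

    return 0;
}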
Here is how to build it with the IBM compiler. The program's output is also shown below.
Now build the same program with the Advance Toolchain.
As you can see, the DFP value is represented exactly, whereas the binary floating-point value is only a close approximation of 1.000001.
The second program is originally from Nigel Griffiths's AIX Decimal Floating Point wiki page (see the References section). We only needed to add a few header files so that the code compiles on Linux.
Sample Code 2
/* Code from Nigel Griffiths */
#include <ctype.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

/* Takes a string with a decimal number and returns a _Decimal128.
 * Format: [+ -]digits.digits */
_Decimal128 atodecimal(char *s)
{
    _Decimal128 top = 0, bot = 0, result;
    int negative = 0, i;

    if (s[0] == '-') {
        negative = 1;
        s++;
    }
    if (s[0] == '+')
        s++;
    /* Integer part: accumulate digits from left to right. */
    for (; isdigit((unsigned char)*s); s++) {
        top = top * 10;
        top = top + *s - '0';
    }
    if (*s == '.') {
        s++;
        /* Fractional part: accumulate digits from right to left. */
        for (i = (int)strlen(s) - 1; i >= 0 && isdigit((unsigned char)s[i]); i--) {
            bot = bot / 10;
            bot = bot + (_Decimal128)(s[i] - '0') / (_Decimal128)10;
        }
    }
    result = top + bot;
    if (negative)
        result = -result;
    return result;
}

int main(int argc, char **argv)
{
    long i, count;
    double dfund, dinterest;
    _Decimal128 Dfund, Dinterest; /* Declaring the new data type */

    if (argc < 4) {
        fprintf(stderr, "usage: %s fund interest count\n", argv[0]);
        return 1;
    }
    dfund = atof(argv[1]);
    dinterest = atof(argv[2]);
    Dfund = atodecimal(argv[1]); /* Assigning values just like other data types */
    Dinterest = atodecimal(argv[2]);
    count = atoi(argv[3]);

    printf("double fund=%20.10f interest=%40.30f\n", dfund, dinterest);
    printf("Decimal fund=%20.10DDf interest=%40.30DDf\n", Dfund, Dinterest);
    for (i = 0; i < count; i++) {
        dfund = dfund * dinterest;
        Dfund = Dfund * Dinterest; /* performing maths */
    }
    printf("Print final funds\n");
    printf("double fund=%30.10f\n", dfund);
    printf("Decimal fund=%30.10DDf\n", Dfund);
    return 0;
}
Here is how to build this program using the IBM XL compilers on POWER6 processor-based systems.
Here is the output and timing when we run the program on a POWER6 processor-based p 550 running at 4.2 GHz.
Now, to demonstrate the benefit of DFU over the software-based decimal floating-point computation, we will force the compiler to use the software emulation mode.
As you can see, using the DFU is about 141 times faster (1.3 seconds versus 183 seconds) than software emulation. Keep in mind that you may or may not see an improvement this large in your own application.
If you want to use the Advance Toolchain to build sample2.c, here is how.
To build this program with the Advance Toolchain but without DFU hardware support, do the following:
Basically, you only need to replace the flag -ldfp with -mno-dfp.
Next, we show how to use OProfile to determine whether your code is really using the decimal floating-point unit. OProfile uses hardware performance counters to profile all running programs with little overhead. In addition to event-based profiling, OProfile can also provide basic time-based profiling. At the time of this writing, the latest version of OProfile (v0.9.3) supports DFU-related events on POWER6. Those events can be found in the following groups: Group 89 (pm_dfu) and Group 90 (pm_dfu2).
In the following example, we configure OProfile to monitor the event called DFU instruction finish while we run the program dfphw.
Here is what you need to do before running the program.
Then, run the dfphw program. Now, issue the following commands to get the profile output and the annotated source.
To get annotated source, you need to compile your code with the -g flag.
Below is the profiling output (oprofile.out). You will see that 100% of the DFU instruction finish events occur in the main routine. This confirms that the program is really using the DFU.
Here is the part of our annotated source showing where in the main routine the DFU operations take place.
Summary
Decimal numbers are widely used in commercial and financial applications. Software support for DFP is generally available today but performs poorly. The decimal floating-point unit provides hardware support for decimal floating-point arithmetic on POWER6 processor-based systems. Two compilers are available on Linux to exploit this feature: the IBM XL C/C++ compilers and the Advance Toolchain. This hardware support will generally give you a performance boost; the level of improvement, however, depends on the nature of your application.
Summarized References