Programmers put effort into optimizing
their applications for performance. Still, a performance gain can also be
achieved simply by switching to a newer version of the IBM compiler. Although the improvement
may vary depending on the design and intent of each application, the performance
difference becomes more visible for programs that handle large amounts of data.
The following simple program
was designed to require a lot of data: multiplication of two matrices with
large dimensions. (It was not written to multiply the matrices faster.) Due... [More]
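The program from that post is not reproduced in this excerpt. As a rough illustration of the kind of data-heavy kernel it describes, here is a minimal sketch of a straightforward matrix multiplication in C. The function name `matmul`, the row-major layout, and the square dimension `n` are assumptions made for this sketch, not details taken from the original post.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the original post): computes c = a * b
 * for square n x n matrices stored in row-major order.
 * This is the plain triple loop, deliberately not tuned for speed, so
 * that any speedup comes from the compiler rather than from the source.
 */
void matmul(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k) {
                /* row i of a times column j of b */
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

With large `n`, the three arrays no longer fit in cache, which is what makes a kernel like this sensitive to the code the compiler generates.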
In a couple of previous posts ("TOC Overflow: what is it, and why should you care?" and "Dealing with TOC overflow: the traditional approach") I presented the issue of TOC overflow. Now I will discuss some features of the XL compilers that can help bypass TOC overflow while minimizing any negative effects on runtime performance.
1. Minimal TOC: The option -qminimaltoc makes the compiler generate code that uses a single entry in the TOC for each compilation unit (in C/C++ a compilation unit is a source file). In order to do this, a... [More]
This demo explains how to use the MASS high-performance mathematical libraries via the auto-vectorization features supported by XL compilers. The demo is built upon the examples given in a developerWorks article titled “How to improve the performance of programs calling mathematical functions -- Taking advantage of IBM XL C/C++ or XL Fortran compiler auto-vectorization”. Please refer to the article for detailed explanations of the MASS libraries, auto-vectorization, and the Fortran and C source code.
The AIX tprof utility is a valuable optimization tool that helps identify performance bottlenecks within programs and aids in the analysis of performance-critical code. The IBM XL C/C++ Compiler provides features to aid in the use of tprof for performance analysis of programs. This document provides a brief introduction to select features of the tprof utility, with a focus on features that interact with XL compiler listing files to allow performance analysis at the source-line or instruction level. In its simplest usage,... [More]
Hello -- I'm the technical leader for the IBM MASS math libraries. MASS stands for Mathematical Acceleration Subsystem, and consists of libraries of mathematical functions specifically tuned for optimum performance on various computing platforms. MASS was originally launched by IBM in 1995, and has been continuously improved and expanded since then. I've been involved with MASS since 2002. There are currently versions of MASS for all the POWER processors, running AIX or Linux operating systems. There are also versions for BlueGene/L and... [More]
Welcome to the blog on Scientific Computing with C/C++. IBM has a long history of Scientific Computing, from the creation of FORTRAN in 1957 to the control systems for the Apollo missions in the 1960s. Today, various user groups and wikis exist to support IBM’s scientific computing users: SCICOMP (see http://www.spscicomp.org/), SPXXL (see http://www.spxxl.org/new_website/html/index.html), and HPC Central (see http://www.ibm.com/developerworks/wikis/display/hpccentral/HPC+Central). My name is Roch Archambault and I have over 20 years... [More]
In a previous blog entry I discussed how one might incorporate the prefetch built-in function (bif) into a z/OS binary that is compiled with ARCH(5), i.e. a binary that will run on all supported hardware yet, when it detects it is running on a z10 or z196 system, is able to use the prefetch instruction. There was a request for examples of using prefetch, which I thought was a good idea, so starting with this entry, I'll present some examples of how one might use this instruction. I'll start with a simple, basic program and in... [More]
If your applications call mathematical functions such as sin, cos, exp, log, etc. and you are interested in maximizing performance with minimum effort, here is something that will interest you! My colleague Daniel Zabawa and I have written a paper, "How to improve the performance of programs calling mathematical functions -- taking advantage of IBM XL C/C++ or XL Fortran compiler auto-vectorization". Our paper introduces the IBM MASS high-performance mathematical libraries, and demonstrates how to benefit from them — without the need for... [More]
Mathematical Acceleration Subsystem (MASS) Version 6.0 for AIX has been released. This high-performance suite of elementary and special mathematical functions contains two new libraries that are tuned for the IBM POWER7 processor and exploit the POWER7 VSX SIMD instruction set. MASS v6.0 offers performance on POWER7 of up to 260 times that of the system math library libm, and up to 4 times that of previous versions of MASS. MASS v6.0 ships with the IBM XL C/C++ for AIX v11.1 and XL Fortran for AIX v13.1 compilers, which provide for both explicit... [More]
Which option combination do you think yields faster execution? A. -O2 -qinline B. -O2 -qnoinline. By specifying -qinline, the user asks the compiler to inline functions that meet the inlining threshold and limit criteria: for the z platform, the default relative size of a function to be inlined is 100 ACUs, and the maximum relative size a function can grow to before the inliner stops inlining into it is 1000 ACUs. While inlining eliminates the linkage overhead and gives the compiler a larger body of code to optimize, its... [More]
Here we are eleven days into a new year, and I would like to wish all of you a happy belated 2009! May this be a year of making technology less complex, more intuitive, friendlier and greener. Is it clear why we have the RENT|NORENT compiler option? Do you know that C++ always uses constructed reentrancy? Can you imagine how applications can benefit most from this? Recently, I developed a greater appreciation for the RENT option, that is, after I banged my head against the wall to REALLY understand what RENT is all about in order to fix a related... [More]
Computer architectures designed with high-performance microprocessors are reshaping the IT landscape. Today's software developers must deliver products and services faster with higher quality and performance to stay competitive, but in an environment with unprecedented potential, it's increasingly difficult for application developers to simultaneously handle business logic and performance issues in their code. That's where IBM Power Systems technology steps in, says Susan Yoskin, IBM Rational marketing solution manager. Power Systems are based... [More]
Dual cores have become household products, yet we see little change in the performance of the applications that run on these machines. Just this morning, I started Firefox and Lotus Notes at the same time on my dual-core T60p laptop, and had to wait a long time before I could use either application. To take full advantage of hardware horsepower, software has to keep up, the compiler has to generate better code, or both. Recently, I have been experimenting with compiler options to see how much I can tune the size and performance of the... [More]
z/OS C/C++ Performance Features: Compilers are an important tool in your development environment. A good optimizing compiler generates high-performance code without you worrying about the low-level details of the OS, the internals of the runtime environment, and the hardware architecture. You can concentrate on the business logic in your application. But optimization can take up a lot of resources, both in terms of compilation time and memory space. XL C/C++ provides an optimization option with 3 levels (called suboption 1, 2 and 3). Level 1 and 2 represent... [More]
Next week the First Annual OpenPOWER Summit takes place in San Jose, California, from March 17-19. The summit will have a large number of technical professionals and industry experts discussing and demonstrating the latest advancements in OpenPOWER-based applications, platforms, and research. The XL C/C++ compiler team will be presenting two sessions on the latest compilation technology advancements: