In a couple of previous posts (TOC Overflow: what is it, and why should you care?; Dealing with TOC overflow: the traditional approach) I presented the issue of TOC overflow. Now I will discuss some features of the XL compilers that can help bypass TOC overflow while minimizing any negative effects on runtime performance.
1. Minimal TOC: The option -qminimaltoc makes the compiler generate code that uses a single entry in the TOC for each compilation unit (in C/C++ a compilation unit is a source file). In order to do this, a... [More]
Programmers put effort into optimizing their applications for performance. Still, a performance gain can also be achieved simply by switching to a newer version of the IBM compiler. Although the improvement may vary depending on the design and intent of each application, the performance difference becomes more visible for programs that handle large amounts of data. The following simple program was designed to require a lot of data: multiplication of two matrices with large dimensions. (It was not written to multiply the matrices faster.) Due... [More]
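The program itself is not reproduced in this excerpt. As a rough illustration only, a data-heavy matrix multiplication of the kind described might look like the following C sketch. The function name `matmul` and the row-major layout are assumptions made for this example, not the author's code.

```c
#include <stddef.h>

/* Hypothetical helper: multiply two n x n matrices stored in row-major
   order, writing the product into c (c = a * b).
   This is a plain triple loop with no blocking or tuning, in keeping with
   a program meant to move a lot of data rather than to multiply quickly. */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

With large values of `n`, the three arrays no longer fit in cache, which is the situation in which differences between compiler versions are most likely to show up.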
This demo explains how to use the MASS high-performance mathematical libraries via the auto-vectorization features supported by XL compilers. The demo is built upon the examples given in a developerWorks article titled “ How to improve the performance of programs calling mathematical functions -- Taking advantage of IBM XL C/C++ or XL Fortran compiler auto-vectorization ”. Please refer to the article for detailed explanations on MASS libraries, auto-vectorization, and the Fortran and C source code.
Next week the First Annual OpenPOWER Summit takes place in San Jose, California from March 17-19. The summit will have a large number of technical professionals and industry experts discussing and demonstrating the latest advancements in OpenPOWER based applications, platforms, and research. The XL C/C++ compiler team will be presenting two sessions on the latest compilation technology advancements:
This week, various members of the IBM compiler team will be attending InterConnect 2015. Join us and other members of the largest community embodying the full life-cycle of IT, assets and infrastructure — from Development to Architecture to Operations. The compiler team will be presenting the following topics this year:
DEM-3886: Using New IBM Compilers to Reduce Operating Costs for Business-Critical Applications on z13
On December 12, 2014, IBM will be releasing a new XL C/C++ compiler to support application development targeting the little endian Linux distributions running on IBM Power Systems with POWER8 processor and architecture. A key strength of the IBM XL compilers on Linux is their ability to generate highly optimized code for execution on IBM Power Systems. With these new compilers, you can create and port applications for execution on the next generation of IBM systems built on POWER8 technology, designed to handle big data and to drive... [More]
Computer architectures designed with high-performance microprocessors are reshaping the IT landscape. Today's software developers must deliver products and services faster with higher quality and performance to stay competitive, but in an environment with unprecedented potential, it's increasingly difficult for application developers to simultaneously handle business logic and performance issues in their code. That's where IBM Power Systems technology steps in, says Susan Yoskin, IBM Rational marketing solution manager. Power Systems are based... [More]
In a previous blog entry I discussed how one might incorporate the prefetch built-in function (bif) into a z/OS binary that is compiled with ARCH(5), i.e., a binary that will run on all supported hardware yet, when it detects it is running on a z10 or z196 system, is able to use the prefetch instruction. There was a request for examples of using prefetch, which I thought was a good idea, so starting with this entry, I'll present some examples of how one might use this instruction. I'll start with a simple, basic program and in... [More]
The AIX tprof utility is a valuable optimization tool that helps identify performance bottlenecks within programs and aids in the analysis of performance-critical code. The IBM XL C/C++ Compiler provides features to aid in the use of tprof for performance analysis of programs. This document provides a brief introduction to select features of the tprof utility, with a focus on features that interact with XL compiler listing files to allow performance analysis at the source-line or instruction level. In its simplest usage,... [More]
Mathematical Acceleration Subsystem (MASS) Version 6.0 for AIX has been released. This high performance suite of elementary and special mathematical functions contains two new libraries tuned for the IBM POWER7 processor, and exploiting the POWER7 VSX SIMD instruction set. MASS v6.0 offers performance on POWER7 of up to 260 times that of the system math library libm, and up to 4 times that of previous versions of MASS. MASS v6.0 ships with the IBM XL C/C++ for AIX v11.1 and XL Fortran for AIX v13.1 compilers, which provide for both explicit... [More]
If your applications call mathematical functions such as sin, cos, exp, log, etc. and you are interested in maximizing performance with minimum effort, here is something that will interest you! My colleague Daniel Zabawa and I have written a paper, "How to improve the performance of programs calling mathematical functions -- taking advantage of IBM XL C/C++ or XL Fortran compiler auto-vectorization". Our paper introduces the IBM MASS high-performance mathematical libraries, and demonstrates how to benefit from them — without the need for... [More]
Which option combination do you think yields faster execution? A. -O2 -qinline B. -O2 -qnoinline. By specifying -qinline, the user asks the compiler to inline functions that meet the inlining threshold and size limit criteria: for the z platform, the default relative size of a function to be inlined is 100 ACUs, and the maximum relative size a function can grow to before the inliner stops inlining is 1000 ACUs. While inlining eliminates the linkage overhead and provides a larger body of code for the compiler to optimize, its... [More]
Hello -- I'm the technical leader for the IBM MASS math libraries. MASS stands for Mathematical Acceleration Subsystem, and consists of libraries of mathematical functions specifically tuned for optimum performance on various computing platforms. MASS was originally launched by IBM in 1995, and has been continuously improved and expanded since then. I've been involved with MASS since 2002. There are currently versions of MASS for all the POWER processors, running AIX or Linux operating systems. There are also versions for BlueGene/L and... [More]
Here we are eleven days into a new year, and I would like to wish all of you a happy belated 2009! May this be a year of making technology less complex, more intuitive, friendlier and greener. Is it clear why we have the RENT|NORENT compiler option? Do you know that C++ always uses constructed reentrancy? Can you imagine how applications can benefit most from this? Recently, I developed a greater appreciation for the RENT option, that is, after I banged my head against the wall to REALLY understand what RENT is all about in order to fix a related... [More]
Dual cores have become household products, yet we see little change in the performance of the applications that run on these machines. Just this morning, I started Firefox and Lotus Notes at the same time on my dual core T60p laptop, and had to wait a long time before I could use either one of the applications. To take full advantage of hardware horsepower, software has to keep up, the compiler has to generate better code, or both. Recently, I have been experimenting with compiler options to see how much I can tune the size and performance of the... [More]