High Performance a Balancing Act ...
Visda
Dual-core machines have become household products, yet we see little change in the performance of the applications that run on them. Just this morning, I started Firefox and Lotus Notes at the same time on my dual-core T60p laptop, and had to wait a long time before I could use either application.
To take full advantage of the hardware's horsepower, the software has to keep up, the compiler has to generate better code, or both.
Recently, I have been experimenting with compiler options to see how much I can tune the size and performance of an executable. I started off by identifying the bottlenecks, the hottest loops: the places in the application where the CPU spends most of its processing time. I used a hardware profiling tool to gather the data and Visual Performance Analyzer (VPA) to identify the bottlenecks. With VPA you can narrow a bottleneck down to the hardware instruction level. I mention this only to give you a sense of what the tool can do; for the purpose of my experiment, the function name was sufficient.
Given the function name, the rest was trial and error. I used a variety of performance-tuning compiler options (for example HOT, INLINE, and HGPR, all at O3) and pragma directives (#pragma option_override and #pragma unroll).
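To make the pragma part concrete, here is a minimal sketch of how the two directives might be applied to a hot function. The function name sum_squares, the unroll factor of 4, and the "OPT(LEVEL,3)" sub-option are my assumptions for illustration, not the values from the experiment; check the compiler's language reference for the exact forms your release accepts. The pragmas are guarded so that other compilers simply skip them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hot function; the name and the option string are
 * assumptions for illustration only. */
#if defined(__IBMC__) || defined(__IBMCPP__)
#pragma option_override(sum_squares, "OPT(LEVEL,3)")
#endif

long sum_squares(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    assert(a != NULL || n == 0);

    /* Ask the compiler to unroll the loop that follows by a factor of 4.
     * The factor is a guess to be tuned by measurement. */
#if defined(__IBMC__) || defined(__IBMCPP__)
#pragma unroll(4)
#endif
    for (i = 0; i < n; i++) {
        total += (long)a[i] * a[i];
    }
    return total;
}
```

The idea is to keep the file-level options fixed and vary only the treatment of the one function the profiler pointed at, re-measuring after each change.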
To conclude, I have to say a lot depended on the high-level code. I couldn't find one hat that fits all. Playing with #pragma unroll proved to be interesting: given the limited number of registers on z/OS, loop unrolling opportunities were very limited.
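For readers who have not looked at what unrolling does, here is a hand-written sketch of a loop unrolled four times. It is not code from my experiment; the function name sum_unrolled4 and the factor of 4 are assumptions. Each unrolled copy gets its own accumulator, and each accumulator wants a register for the life of the loop, which is why a machine with only 16 general-purpose registers runs out of room quickly as the factor grows.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-unrolled sum: four independent accumulators, one per copy
 * of the loop body. Hypothetical example, not the measured code. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop: handle the last 0 to 3 elements. */
    for (; i < n; i++) {
        s0 += a[i];
    }

    return s0 + s1 + s2 + s3;
}
```

Once the live values no longer fit in registers, the compiler has to spill them to memory, and the unrolled loop can end up slower than the original.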