Which option combination do you think yields faster execution?
A. -O2 -qinline
B. -O2 -qnoinline
By specifying -qinline, the user asks the compiler to inline functions that meet the inlining criteria set by two values: the threshold and the limit. On the z platform, the default threshold (the maximum relative size of a function that can be inlined) is 100 ACUs (Abstract Code Units), and the default limit (the maximum relative size a function can grow to before the inliner stops inlining into it) is 1000 ACUs.
Inlining eliminates linkage overhead and gives the optimizer a larger body of code to work with, but its effect on execution time depends on how much inlining is done and which functions are inlined.
Recently, I looked at a user's application that ran about 4.4 times slower with -qinline than with -qnoinline (111.98 versus 25.71 seconds). With the report suboption turned on, I could see which functions were inlined. I also noticed that the compiler issued the informational message CCN1051: Function &1 exceeds size limit for a couple of functions that were not inlined.
I increased the inline threshold in steps of 50 ACUs, with an extra measurement at 190, and measured the execution time at each setting. Table 1 below lists each threshold with the execution time it produced. The turning point came between 150 and 190: the run time dropped from about 112 seconds to 10.72 seconds. Thresholds above 200 gave no further improvement; in fact, a threshold of 300 was slightly slower (10.78 seconds). With a threshold of 190, the program ran about 2.4 times faster than with -qnoinline (10.72 versus 25.71 seconds). Because 190 was the smallest threshold tested that reached the best time, it gave the best balance between the amount of inlining and execution time.
Options                              Execution time (s)
-O2 -qnoinline                       25.71
-qinline=auto:report:100:1000        111.98
-qinline=auto:report:150:1000        111.97
-qinline=auto:report:190:1000        10.72
-qinline=auto:report:200:1000        10.72
-qinline=auto:report:250:1000        10.73
-qinline=auto:report:300:1000        10.78
Table 1. Execution time for different inlining thresholds
By changing the values of the inlining suboptions, I got much better performance than without inlining. I also noticed that beyond a certain threshold, more inlining does not improve performance and can even have adverse effects, such as the slight slowdown at a threshold of 300.
Give it a try and let me know if changing the threshold made a difference in the run time of your application.