Fixes are available
XL C/C++ for Blue Gene/Q Fix Pack 12 (May 2015 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 14 (May 2016 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 15 (October 2017 Update) for 12.1
February 2013 Update for XL C/C++ for Blue Gene/Q, V12.1
May 2013 Update for XL C/C++ for Blue Gene/Q, V12.1
XL C/C++ for Blue Gene/Q Fix Pack 5 (August 2013 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 6 (November 2013 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 7 (February 2014 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 8 (May 2014 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 9 (August 2014 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 10 (November 2014 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 11 (February 2015 Update) for 12.1
XL C/C++ for Blue Gene/Q Fix Pack 13 (August 2015 Update) for 12.1
APAR status
Closed as suggestion for future release.
Error description
In general, -qhot=level=2 lets the user request more aggressive loop transformations through the polyhedral framework. However, the resulting performance is not guaranteed to be as good as or better than with -qhot=level=1. The following example shows a case where code compiled with -qhot=level=2 may run slower than code compiled with -qhot=level=1:

    ...
    ...
    do j=1,n
      do i=1,n
        dr(i,j)=ar(i,j)+br(i,j)*cr(i,j)-bi(i,j)*ci(i,j)
        di(i,j)=ai(i,j)+br(i,j)*ci(i,j)+bi(i,j)*cr(i,j)
      end do
    end do

Because all of these array accesses are stride-1 and there are no loop-carried dependencies, the XL compiler can easily vectorize the inner loop.

Under -qhot=level=1, the compiler iterates through a series of loop transformations, applies those it identifies as profitable, and takes care not to block other transformations that are deemed more profitable. In the case above, it identifies the SIMD opportunity and focuses on vectorizing the inner loop.

Under -qhot=level=2, the XL compiler's polyhedral framework is much more aggressive and focuses solely on maximizing the performance of the entire loop nest, without considering further opportunities that other transformations could exploit. In the case above, -qhot=level=2 performs several transformations that are profitable in themselves but that also prevent the XL compiler's auto-SIMD transformation from occurring. If the source is built with the -qsimd=noauto and -qhot=level=2 option combination, the performance of the resulting binary may be better than that of the one compiled with -qhot=level=1. Currently, code compiled with -qhot=level=2 fails to recognize the SIMD opportunity that the compiler exploits at -qhot=level=1, which leads to the performance regression at -qhot=level=2.
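For readers who want to reproduce the comparison with the C compiler, the loop nest in the error description can be sketched in C as follows. This is only an illustrative port, not the client's code: the function names cmuladd and cmuladd_selfcheck, the flat row-major array layout, and the use of restrict are assumptions made for this sketch. Because C is row-major, the inner loop runs over the last index so that every access remains stride-1, as in the Fortran original.

```c
#include <stddef.h>

/* Hypothetical C port of the Fortran loop nest from the error description.
 * Computes d = a + b*c element-wise for complex values stored as separate
 * real (r) and imaginary (i) planes, each holding n*n doubles.
 * restrict tells the compiler the arrays do not overlap, which matches the
 * "no loop-carried dependencies" property described in the text. */
void cmuladd(size_t n,
             const double *restrict ar, const double *restrict ai,
             const double *restrict br, const double *restrict bi,
             const double *restrict cr, const double *restrict ci,
             double *restrict dr, double *restrict di)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            size_t k = j * n + i;   /* consecutive i gives stride-1 access */
            dr[k] = ar[k] + br[k] * cr[k] - bi[k] * ci[k];
            di[k] = ai[k] + br[k] * ci[k] + bi[k] * cr[k];
        }
    }
}

/* Small self-check on a 4x4 case using exactly representable values.
 * Returns the number of elements that differ from the expected result. */
int cmuladd_selfcheck(void)
{
    enum { N = 4, NN = N * N };
    double ar[NN], ai[NN], br[NN], bi[NN], cr[NN], ci[NN], dr[NN], di[NN];

    for (int k = 0; k < NN; ++k) {
        ar[k] = (double)k;
        ai[k] = -(double)k;
        br[k] = 2.0;
        bi[k] = 1.0;
        cr[k] = (double)(k + 1);
        ci[k] = 3.0;
    }

    cmuladd(N, ar, ai, br, bi, cr, ci, dr, di);

    int bad = 0;
    for (int k = 0; k < NN; ++k) {
        double expect_r = (double)k + 2.0 * (double)(k + 1) - 1.0 * 3.0;
        double expect_i = -(double)k + 2.0 * 3.0 + 1.0 * (double)(k + 1);
        if (dr[k] != expect_r || di[k] != expect_i) {
            ++bad;
        }
    }
    return bad;
}
```

To compare the behaviors described above, one could build a timing driver around cmuladd once with -O3 -qhot=level=1, once with -O3 -qhot=level=2, and once with -O3 -qhot=level=2 -qsimd=noauto. Whether the C port shows the same regression as the Fortran original is not confirmed by this APAR.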
Local fix
For loop-intensive high-performance computing workloads, it is recommended to use -O3 or -O3 -qhot at both compile and link time. By default, -qhot implies -qhot=level=1.
Problem summary
Problem conclusion
Temporary fix
Comments
According to the development team, in a future version of the XL compilers the runtime performance of the client-provided code compiled with -qhot=level=2 will be on par with or better than that of the same code compiled with -qhot=level=1.
APAR Information
APAR number
LI77084
Reported component name
XL FORTRAN FOR
Reported component ID
5799AH100
Reported release
E10
Status
CLOSED SUG
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-11-06
Closed date
2012-11-06
Last modified date
2012-11-06
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
Document Information
Modified date:
06 November 2012