IBM Support

LI77084: -QHOT AT LEVEL=2 MAY BE SLOWER THAN AT LEVEL=1

APAR status

  • Closed as suggestion for future release.

Error description

  • In general, -qhot=level=2 lets the user request more aggressive
    loop transformations through the polyhedral framework. However,
    the resulting performance is not guaranteed to match or exceed
    that of -qhot=level=1.
    
    The following example shows a situation in which code compiled
    with -qhot=level=2 may run slower than the same code compiled
    with -qhot=level=1:
    
          ...
          ...
          do j=1,n
          do i=1,n
          dr(i,j)=ar(i,j)+br(i,j)*cr(i,j)-bi(i,j)*ci(i,j)
          di(i,j)=ai(i,j)+br(i,j)*ci(i,j)+bi(i,j)*cr(i,j)
          end do
          end do
    
    Because the accesses to all of these arrays are stride-1 and
    there are no loop-carried dependencies, the XL compiler can
    easily vectorize the inner loop.
    
    Under -qhot=level=1, the compiler iterates through a series of
    loop transformations and applies those that are profitable,
    while taking care not to block other transformations that are
    deemed more profitable. In the case above, it identifies the
    SIMD opportunity and focuses on vectorizing the inner loop.
    
    Under -qhot=level=2, the XL compiler's polyhedral framework is
    much more aggressive: it focuses solely on maximizing the
    performance of the entire loop nest, without considering
    further opportunities that could be exploited.
    
    In the case above, -qhot=level=2 performs several
    transformations that are profitable in themselves but that also
    prevent the XL compiler's auto-SIMD transformation from
    occurring.
    
    If the source is built with the combination of -qsimd=noauto
    and -qhot=level=2, the resulting binary may perform better than
    one compiled with -qhot=level=1.
    Currently, however, code compiled with -qhot=level=2 fails to
    recognize the SIMD opportunity that the compiler exploits under
    -qhot=level=1, which leads to the performance regression at
    -qhot=level=2.
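    
    To compare the two levels on this kernel, the loop nest from the
    example above can be wrapped in a small standalone program. The
    following is only a sketch under assumptions that are not part
    of the original report: the program name, the problem size n,
    the constant input values, and the use of system_clock for
    timing are all illustrative choices.

    ```fortran
    program hot_level_kernel
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      integer, parameter :: i8 = selected_int_kind(18)
      integer, parameter :: n = 512            ! assumed problem size
      real(dp), allocatable, dimension(:,:) :: ar, ai, br, bi, cr, ci, dr, di
      integer :: i, j
      integer(i8) :: t0, t1, rate

      allocate(ar(n,n), ai(n,n), br(n,n), bi(n,n))
      allocate(cr(n,n), ci(n,n), dr(n,n), di(n,n))

      ! Assumed constant inputs, chosen so the result is easy to check.
      ar = 1.0_dp; ai = 2.0_dp
      br = 3.0_dp; bi = 4.0_dp
      cr = 5.0_dp; ci = 6.0_dp

      call system_clock(t0, rate)
      ! Loop nest from the report: stride-1 accesses, no loop-carried
      ! dependencies, so the inner loop is a SIMD candidate.
      do j = 1, n
        do i = 1, n
          dr(i,j) = ar(i,j) + br(i,j)*cr(i,j) - bi(i,j)*ci(i,j)
          di(i,j) = ai(i,j) + br(i,j)*ci(i,j) + bi(i,j)*cr(i,j)
        end do
      end do
      call system_clock(t1)

      ! With the inputs above: dr = 1 + 15 - 24 = -8, di = 2 + 18 + 20 = 40.
      if (all(dr == -8.0_dp) .and. all(di == 40.0_dp)) then
        print *, 'result ok'
      else
        print *, 'result mismatch'
      end if
      print *, 'elapsed seconds:', real(t1 - t0, dp) / real(rate, dp)
    end program hot_level_kernel
    ```

    Building this once with -O3 -qhot=level=1 and once with
    -O3 -qhot=level=2 (optionally adding -qsimd=noauto) and
    comparing the reported times gives a rough indication of the
    behavior described here. A single pass over arrays of this size
    is short, so repeated runs or a larger n may be needed for a
    stable measurement.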
    

Local fix

  • For loop-intensive High-Performance Computing workloads, the
    recommendation is to use -O3 or -O3 -qhot at both compile and
    link time. By default, -qhot implies -qhot=level=1.
    

Problem summary

Problem conclusion

Temporary fix

Comments

  • According to the development team, in a future version of the
    XL compilers the runtime performance of the client's code
    compiled with -qhot=level=2 will be on par with or better than
    that of the same code compiled with -qhot=level=1.
    

APAR Information

  • APAR number

    LI77084

  • Reported component name

    XL FORTRAN FOR

  • Reported component ID

    5799AH100

  • Reported release

    E10

  • Status

    CLOSED SUG

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-11-06

  • Closed date

    2012-11-06

  • Last modified date

    2012-11-06

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

  • XL Fortran for Blue Gene/Q, all versions (platform independent)

Document Information

Modified date:
06 November 2012