This tutorial walks through a practical, hands-on example of using the performance tools, explaining how to collect the right information and how to access the relevant visualization features. In this tutorial, you analyze the FFT16M example application.
The application analyzed here is FFT16M, which can be found in the Cell/B.E. SDK 3.0 demos bundle at /opt/cell/sdk/src/demos/FFT16M. This hand-tuned application performs a four-way SIMD single-precision complex FFT on an array of 16,777,216 elements. The command can be invoked in two forms:
fft <ncycles> <printflag>
fft <ncycles> <printflag> [<log2_spus> <numa_flag> <largepage_flag>]
The old usage (the first form, without the optional arguments) assumes that
log2_spus is 3,
numa_flag is 0, and
largepage_flag is 1. Also:
If numa_flag equals 1, then NUMA is used.
If largepage_flag equals 1, then large pages are used.
(When you get to the section "Create and work with trace data", you will see that the newer format was used to collect some of the traces in this section.)
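The relationship between the two usage forms can be sketched in a few lines of Python. This is only an illustration of the defaults listed above, not the demo's actual argument handling: the names FftOptions, OLD_USAGE_DEFAULTS, and resolve_fft_args are hypothetical, and the sketch assumes that the three optional arguments are supplied either all together or not at all, as the bracketed usage line suggests.

```python
from typing import NamedTuple, Sequence


class FftOptions(NamedTuple):
    """Effective settings for one run (hypothetical container, for illustration)."""
    ncycles: int
    printflag: int
    log2_spus: int
    numa_flag: int
    largepage_flag: int


# Defaults the text states for the old, two-argument usage.
OLD_USAGE_DEFAULTS = {"log2_spus": 3, "numa_flag": 0, "largepage_flag": 1}


def resolve_fft_args(args: Sequence[str]) -> FftOptions:
    """Map the command-line arguments of either usage form to effective settings.

    Assumption: exactly 2 arguments (old form) or exactly 5 (newer form);
    the real program may validate its arguments differently.
    """
    values = [int(a) for a in args]
    if len(values) == 2:
        # Old form: fft <ncycles> <printflag>, remaining options take defaults.
        return FftOptions(values[0], values[1], **OLD_USAGE_DEFAULTS)
    if len(values) == 5:
        # Newer form: all five values are given explicitly.
        return FftOptions(*values)
    raise ValueError("expected 2 or 5 arguments, got %d" % len(values))
```

With this sketch, resolve_fft_args(["1", "0"]) and resolve_fft_args(["1", "0", "3", "0", "1"]) yield the same settings, which is the sense in which the old usage "assumes" the three values. The argument values 1 and 0 for ncycles and printflag are placeholders chosen for the example.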