Creating and working with trace data
The PDT tool produces trace data, which you can view and analyze in the Trace Analyzer tool. To collect trace data properly, you need to recompile the fft application according to the PDT requirements.
Step 1: Collect trace data with PDT
- Prepare the spu makefile according to PDT requirements, depending on the compiler of your choice:
    ########### Modifying ~/FFT16M/spu/Makefile for gcc compiler ##########
    #######################################################################
    ## Target
    #######################################################################
    #
    PROGRAMS_spu:= fft_spu
    LIBRARY_embed:= fft_spu.a
    #######################################################################
    ## Local Defines
    #######################################################################
    #
    CFLAGS_gcc:= -g --param max-unroll-times=1 -Wall -Dmain=_pdt_main -Dexit=_pdt_exit -DMFCIO_TRACE -DLIBSYNC_TRACE
    LDFLAGS_gcc = -Wl,-q -g -L/usr/spu/lib/trace
    INCLUDE = -I/usr/spu/include/trace
    IMPORTS = -ltrace
    #######################################################################
    ## buildutils/make.footer
    #######################################################################
    #
    ifdef CELL_TOP
    include $(CELL_TOP)/buildutils/make.footer
    else
    include ../../../../buildutils/make.footer
    endif
    ########### Modifying ~/FFT16M/spu/Makefile for xlc compiler ##########
    #######################################################################
    ## Target
    #######################################################################
    #
    SPU_COMPILER = xlc
    PROGRAMS_spu:= fft_spu
    LIBRARY_embed:= fft_spu.a
    #######################################################################
    ## Local Defines
    #######################################################################
    #
    CFLAGS_xlc:= -g -qnounroll -O5
    CPP_FLAGS_xlc := -I/usr/spu/include/trace -Dmain=_pdt_main -Dexit=_pdt_exit -DMFCIO_TRACE -DLIBSYNC_TRACE
    LDFLAGS_xlc:= -O5 -qflag=e:e -Wl,-q -g -L/usr/spu/lib/trace -ltrace
    #######################################################################
    ## buildutils/make.footer
    #######################################################################
    #
    ifdef CELL_TOP
    include $(CELL_TOP)/buildutils/make.footer
    else
    include ../../../../buildutils/make.footer
    endif
- Rebuild the fft application:
  cd ~/FFT16M ; CELL_TOP=/opt/cell/sdk make
- Set up a configuration file with only the relevant stalls (mailboxes and read tag status for SPE), because it is strongly recommended to focus on stalls here:
  - Copy the default xml to the place the FFT runs so you can modify it:
    cp /usr/share/pdt/config/pdt_cbe_configuration.xml ~/FFT16M
  - Open the copied file for editing.
  - On the first line, change the application name value to fft.
  - Search for <configuration name="SPE">. Below that line you find the MFCIO group tag. Set the group tag to active="false".
  - Delete the SPE_MFC group. This should be sufficient to trace only the stalls in the SPE.
- Prepare the environment by setting the following variables:
export LD_LIBRARY_PATH=/usr/lib/trace
export PDT_KERNEL_MODULE=/usr/lib/modules/pdt.ko
export PDT_CONFIG_FILE=~/FFT16M/pdt_cbe_configuration.xml
- Run the fft application at least three times to have adequate sampling:
  cd ~/FFT16M/ppu ; ./fft 1 1 4 1 0
  You should now have three trace files (.pex, .map, and .trace).
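The two configuration edits described in the list above (deactivating the MFCIO group in the SPE configuration and deleting the SPE_MFC group) could be scripted rather than done by hand. The following is only a sketch under stated assumptions: the function name pdt_stalls_only is hypothetical, and the layout it expects (each group opening tag on a single line with name before active, and each group closed by its own </group> line) is assumed, not taken from the shipped file. Inspect your copy of pdt_cbe_configuration.xml before relying on it. The application name change is left as a manual edit.

```shell
# Sketch only: the XML layout handled here is an assumption, not the
# documented format of pdt_cbe_configuration.xml.
# Usage: pdt_stalls_only INPUT.xml OUTPUT.xml
pdt_stalls_only() {
    # 1st expression: inside the SPE configuration only, flip the MFCIO
    #                 group from active="true" to active="false".
    # 2nd expression: delete the SPE_MFC group, from its opening tag
    #                 through the next </group> line.
    sed -e '/<configuration name="SPE">/,/<\/configuration>/ s/\(<group name="MFCIO"[^>]*\)active="true"/\1active="false"/' \
        -e '/<group name="SPE_MFC"/,/<\/group>/d' \
        "$1" > "$2"
}
```

The function writes to a separate output file instead of editing in place, so the original copy stays available for comparison (for example with diff) before you point PDT_CONFIG_FILE at the result.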
Note that the default PDT_CONFIG_FILE for the SDK sets the trace file prefix to test. If you have not changed it, look for trace files with this prefix. Also, remember to unset the LD_LIBRARY_PATH environment variable before you run the original (non-PDT) binary later.
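One way to avoid forgetting to unset LD_LIBRARY_PATH is to set the PDT variables only for the traced runs. The sketch below wraps the runs in a subshell, so the exports disappear when it exits. The helper name run_traced is hypothetical. The variable values are the ones listed above, with ~ written as $HOME so that it expands inside quotes.

```shell
# Run a command N times with the PDT environment set only for those runs.
# Usage: run_traced N command [args...]
run_traced() {
    runs=$1
    shift
    (
        # Exports made inside this subshell do not leak into the caller.
        export LD_LIBRARY_PATH=/usr/lib/trace
        export PDT_KERNEL_MODULE=/usr/lib/modules/pdt.ko
        export PDT_CONFIG_FILE="$HOME/FFT16M/pdt_cbe_configuration.xml"
        i=0
        while [ "$i" -lt "$runs" ]; do
            "$@" || exit 1      # stop at the first failing run
            i=$((i + 1))
        done
    )
}
```

For the three runs suggested above, this could be invoked as: cd ~/FFT16M/ppu ; run_traced 3 ./fft 1 1 4 1 0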
Step 2: Generate a trace report using PDTR
To generate a complete textual summary report from the previous trace files, you can use the PDTR tool to produce .pep files in ASCII format by typing the following:
$ /opt/cell/sdk/prototype/usr/bin/pdtr -trc <trace file name (without the suffix)>
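Because several runs produce several sets of trace files, it can be convenient to call PDTR once per set. The following sketch derives each base name by stripping the .pex suffix and passes it to pdtr with the -trc option shown above. The helper name pdtr_all and the PDTR variable are hypothetical, and the sketch assumes that each .pex file sits next to its matching .map and .trace files in one directory.

```shell
# Path to pdtr as given above; override PDTR if it is installed elsewhere.
PDTR=${PDTR:-/opt/cell/sdk/prototype/usr/bin/pdtr}

# Run pdtr for every trace set found in a directory (default: current one).
# Usage: pdtr_all [DIRECTORY]
pdtr_all() {
    (
        cd "${1:-.}" || exit 1
        for pex in *.pex; do
            [ -e "$pex" ] || continue            # no .pex files: glob stays literal
            "$PDTR" -trc "${pex%.pex}" || exit 1 # base name without the suffix
        done
    )
}
```

For example, pdtr_all ~/FFT16M/ppu would process every trace set in that directory, assuming that is where the runs wrote their files.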
Step 3: Import the PDT data into Trace Analyzer
The Trace Analyzer enables the visualization of the application's stages of execution. It works with data generated from the PDT tool. More specifically, it reads information available in the generated .pex file. Complete the following procedure to visualize the data on the Trace Analyzer:
- With VPA open, select Tools > Trace Analyzer.
- Go to File > Open File and locate the .pex file that was generated in the previous steps. The screen in Figure 13 should appear:
Figure 13. Trace Analyzer screen
The screen corresponds to the FFT16M application run with 16 SPEs and no large pages. As you can see, a less intense blue has been selected for the MFC_IO group, so you can now see the difference between the borders and the interior of each interval. Additionally, the color map was used in the example to change the color of read_in_mbox to red rather than its group's default color, blue. You can see a large stall in the middle. This is where the benchmark driver verifies the result of the test run to make sure the benchmark computes correctly. The timed run is the thin blue strip after the stall.
Next, zoom into this area, which is of most interest in this benchmark, as shown in Figure 14.
Figure 14. Zoomed trace view
As you can see, the mailboxes (red bars) break the execution into six stages, and the stages behave differently. For example, the third and sixth stages are much longer than the others and contain many large stalls. To obtain more details, Trace Analyzer lets you select a stall by clicking it (as shown in Figure 14 by the yellow highlight). The selection marker rulers on the left and top show the location of the selected item, and you can use them to get back to it if you scroll away. The data collected by the PDT for the selected stall is shown in the record details window. You can see that the stall is large: almost 12,000 ticks. Now you can check the Cell/B.E. performance tips for a possible cause of the stall. You might consider TLB misses as a possible culprit and huge pages as a possible fix.
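To judge how large a stall of almost 12,000 ticks is in wall-clock terms, you need the tick frequency, which this section does not give. The sketch below assumes that the ticks are counted at the system timebase frequency and that /proc/cpuinfo reports a timebase line, as it commonly does on PowerPC Linux. Both points are assumptions to verify on your system. The function names are hypothetical.

```shell
# Convert a tick count to microseconds for a given tick frequency in Hz.
# Usage: ticks_to_us TICKS FREQUENCY_HZ
ticks_to_us() {
    awk -v t="$1" -v hz="$2" 'BEGIN { printf "%.1f\n", t * 1000000 / hz }'
}

# Print the value of the first "timebase" line in a cpuinfo-style file.
# Assumption: lines look like "timebase : <Hz>".
# Usage: read_timebase_hz [FILE]    (default: /proc/cpuinfo)
read_timebase_hz() {
    awk -F: '/^timebase/ { gsub(/[ \t]/, "", $2); print $2; exit }' \
        "${1:-/proc/cpuinfo}"
}
```

With those assumptions, ticks_to_us 12000 "$(read_timebase_hz)" prints the approximate stall duration in microseconds.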



