Cell/B.E. SDK 3.0 tools, Part 1: Using performance tools

Explore a practical example of using the performance tools in SDK 3.0

Gad Haber (gadi@us.ibm.com), Senior Architect, IBM Japan
Dr. Gad Haber joined the IBM Haifa Labs in 1993 as a research staff member in the area of performance analysis and post-link optimization. He managed the Performance Analysis and Optimization Technologies (PAOT) group until 2006, and he is currently involved with promoting the Cell/B.E. performance tools in the IBM Austin Labs. Dr. Haber works in the IBM Systems and Technology Group in Enterprise Systems Development.

Summary:  This introductory tutorial, designed as a companion for the IBM SDK for Multicore Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine® SDK), teaches you how to use five performance tools that reside in the SDK 3.0: OProfile, Cell Performance Counter, Performance Debugging Tool, the PDT Trace Reader, and FDPR-Pro. The Visual Performance Analyzer, available separately, is also highlighted.

Date:  08 Apr 2008
Level:  Introductory


Creating and working with trace data

The PDT tool produces tracing data, which you can view and analyze in the Trace Analyzer tool. To collect trace data properly, you need to recompile the fft application according to the PDT build requirements described below.

Step 1: Collect trace data with PDT

  1. Modify the SPU makefile (~/FFT16M/spu/Makefile) to meet the PDT requirements, using the version that matches your compiler:
########### Modifying ~/FFT16M/spu/Makefile for gcc compiler ##########

#######################################################################
##     Target
#######################################################################
#

PROGRAMS_spu:= fft_spu
LIBRARY_embed:= fft_spu.a

#######################################################################
##     Local Defines
#######################################################################
#

CFLAGS_gcc:= -g --param max-unroll-times=1 -Wall -Dmain=_pdt_main \
             -Dexit=_pdt_exit -DMFCIO_TRACE -DLIBSYNC_TRACE
LDFLAGS_gcc   = -Wl,-q -g -L/usr/spu/lib/trace
INCLUDE         = -I/usr/spu/include/trace
IMPORTS         = -ltrace

#######################################################################
##     buildutils/make.footer
#######################################################################
#

ifdef CELL_TOP
   include $(CELL_TOP)/buildutils/make.footer
else
   include ../../../../buildutils/make.footer
endif

########### Modifying ~/FFT16M/spu/Makefile for xlc compiler ##########

#######################################################################
##     Target
#######################################################################
#

SPU_COMPILER = xlc
PROGRAMS_spu:= fft_spu
LIBRARY_embed:= fft_spu.a

#######################################################################
##     Local Defines
#######################################################################
#

CFLAGS_xlc:= -g -qnounroll -O5
CPP_FLAGS_xlc := -I/usr/spu/include/trace -Dmain=_pdt_main \
                 -Dexit=_pdt_exit -DMFCIO_TRACE -DLIBSYNC_TRACE
LDFLAGS_xlc:= -O5 -qflag=e:e -Wl,-q -g -L/usr/spu/lib/trace -ltrace

#######################################################################
##     buildutils/make.footer
#######################################################################
#

ifdef CELL_TOP
   include $(CELL_TOP)/buildutils/make.footer
else
   include ../../../../buildutils/make.footer
endif

  2. Rebuild the fft application: cd ~/FFT16M ; CELL_TOP=/opt/cell/sdk make.
  3. Set up a configuration file that traces only the relevant stalls (mailboxes and read tag status on the SPE); focusing on stalls is strongly recommended here:
    1. Copy the default XML configuration file to ~/FFT16M so that you can modify it: cp /usr/share/pdt/config/pdt_cbe_configuration.xml ~/FFT16M.
    2. Open the copied file for editing.
    3. On the first line, change the application name value to fft.
    4. Search for <configuration name="SPE">. Below that line, you will find the MFCIO group tag. Set that group tag to active="false".
    5. Delete the SPE_MFC group. This should be sufficient to trace only the stalls in the SPE.
  4. Prepare the environment by setting the following variables:

    export LD_LIBRARY_PATH=/usr/lib/trace
    export PDT_KERNEL_MODULE=/usr/lib/modules/pdt.ko
    export PDT_CONFIG_FILE=~/FFT16M/pdt_cbe_configuration.xml
  5. Run the fft application at least three times to collect adequate sampling: cd ~/FFT16M/ppu ; ./fft 1 1 4 1 0. You should now have trace files of three types: .pex, .map, and .trace.

Note that the SDK's default PDT configuration file sets the trace file prefix to test; if you haven't changed that setting, look for trace files with this prefix. Also, remember to unset the LD_LIBRARY_PATH environment variable before you later run the original (non-PDT) binary.
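One way to avoid forgetting that reminder is not to export the PDT variables at all, and instead scope them to each traced run. The following is a minimal POSIX shell sketch under that assumption: the paths are the ones from the environment setup above, while the function name run_with_pdt and the PDT_LIB_DIR, PDT_MODULE, and PDT_CONFIG variables are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: run a PDT-instrumented binary several times with the PDT
# variables set only for the child process, so nothing leaks into the
# calling shell. Paths come from the tutorial; the variable and
# function names are hypothetical.

PDT_LIB_DIR=/usr/lib/trace
PDT_MODULE=/usr/lib/modules/pdt.ko
PDT_CONFIG="$HOME/FFT16M/pdt_cbe_configuration.xml"

# run_with_pdt RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times; stops at the first failing run and returns
# that run's exit status.
run_with_pdt() {
    local runs="$1"
    shift
    local i=0
    while [ "$i" -lt "$runs" ]; do
        # env sets the variables for this one command only
        env LD_LIBRARY_PATH="$PDT_LIB_DIR" \
            PDT_KERNEL_MODULE="$PDT_MODULE" \
            PDT_CONFIG_FILE="$PDT_CONFIG" \
            "$@" || return
        i=$((i + 1))
    done
}
```

For example, from ~/FFT16M/ppu, run_with_pdt 3 ./fft 1 1 4 1 0 performs three traced runs, and the shell you later use for the original (non-PDT) binary never sees LD_LIBRARY_PATH=/usr/lib/trace.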

Step 2: Generate a trace report using PDTR

To generate a complete textual summary report from the trace files, use the PDT Trace Reader (PDTR) to produce .pep files in ASCII format by typing the following:

$ /opt/cell/sdk/prototype/usr/bin/pdtr -trc <trace file name (without the suffix)>
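Because several runs leave several trace files behind, it can be convenient to process them all in one pass. The following is a minimal POSIX shell sketch, assuming that pdtr accepts a bare trace name (no suffix) when run from the directory that holds the trace files; the pdtr path and the -trc option are the ones shown above, while the function name pdtr_all and the PDTR override variable are hypothetical. The sketch makes no assumption about where pdtr writes its .pep output.

```shell
#!/bin/sh
# Sketch: run the PDT Trace Reader once for every *.trace file in a
# directory. The pdtr path and -trc option come from the tutorial;
# PDTR (an override) and pdtr_all are hypothetical names.

PDTR=${PDTR:-/opt/cell/sdk/prototype/usr/bin/pdtr}

# pdtr_all [DIR]: calls "$PDTR -trc NAME" for each DIR/NAME.trace,
# from inside DIR, and stops at the first failure.
pdtr_all() {
    local dir="${1:-.}"
    (
        cd "$dir" || exit
        for trace in *.trace; do
            [ -e "$trace" ] || continue   # no match: the glob stays literal
            "$PDTR" -trc "${trace%.trace}" || exit
        done
    )
}
```

For example, pdtr_all ~/FFT16M/ppu runs pdtr once per .trace file in that directory, if that is where your runs left their trace files.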

Step 3: Import the PDT data into Trace Analyzer

The Trace Analyzer visualizes the application's stages of execution. It works with data generated by the PDT tool; more specifically, it reads the information in the generated .pex file. Complete the following procedure to visualize the data in Trace Analyzer:

  1. With the Visual Performance Analyzer (VPA) open, select Tools > Trace Analyzer.
  2. Go to File > Open File and locate the .pex file that was generated in the previous steps. The screen in Figure 13 should appear:

Figure 13. Trace Analyzer screen

The screen corresponds to the FFT16M application run with 16 SPEs and no large pages. In this view, a lighter shade of blue has been selected for the MFC_IO group, so you can distinguish the borders of each interval from its interior. Additionally, the color map was used to change the color of read_in_mbox to red rather than its group's default color, blue. You can see a large stall in the middle: this is where the benchmark driver verifies the result of the test run to make sure the benchmark computes correctly. The timed run is the thin blue strip after the stall.

Next, zoom into this area, which is of most interest in this benchmark, as shown in Figure 14.


Figure 14. Zoomed trace view

As you can see, the mailboxes (red bars) break the execution into six stages, and different stages behave differently. For example, the third and sixth stages are much longer than the others and contain many large stalls. To obtain more details, you can select a stall in Trace Analyzer by clicking it (shown by the yellow highlight in Figure 14). The selection marker rulers on the left and top show the location of the selected item, and you can use them to return to it if you scroll away. The record details window shows the data that PDT collected for the selected stall. You can see that the stall is large: almost 12,000 ticks. Now you can check the Cell/B.E. performance tips for a possible cause of the stall. TLB misses are one possible culprit, and huge pages are a possible fix.
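Before trying huge pages as a fix, it can help to check whether the kernel has any huge pages reserved at all. The following POSIX shell sketch simply reads the standard Linux /proc/meminfo fields HugePages_Total, HugePages_Free, and Hugepagesize; the function name huge_page_summary is hypothetical, the optional file argument exists only so the sketch can be tried against a sample file, and reserving or mounting huge pages is not covered.

```shell
#!/bin/sh
# Sketch: summarize huge page availability from a meminfo-format file
# (default /proc/meminfo). The function name is hypothetical; missing
# fields are reported as 0.

huge_page_summary() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^Hugepagesize:/    { size  = $2 }   # reported in kB
        END { printf "total=%d free=%d size_kb=%d\n", total, free, size }
    ' "$meminfo"
}
```

A result such as total=0 indicates that no huge pages are currently reserved on the system.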
