Cell/B.E. SDK 3.0 tools, Part 1: Using performance tools

Explore a practical example on the use of performance tools with the SDK 3.0

Level: Introductory

Gad Haber (gadi@us.ibm.com), Senior Architect, IBM

15 Apr 2008

This introductory tutorial, designed as a companion for the IBM SDK for Multicore Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine® SDK), teaches you how to use five performance tools that reside in the SDK 3.0: OProfile, Cell Performance Counter, Performance Debugging Tool, the PDT Trace Reader, and FDPR-Pro. The Visual Performance Analyzer, available separately, is also highlighted.

In this tutorial

This tutorial introduces performance tools for the IBM SDK for Multicore Acceleration 3.0 (the Cell/B.E. SDK 3.0) and explains how to use them:

  • OProfile for Cell is a system-level profiler for Cell/B.E. Linux® systems that can profile all running code at low overhead. It consists of a kernel driver, a daemon for collecting sample data, and several post-profiling tools for turning that data into information. OProfile uses the hardware performance counters of the CPU to profile a wide variety of statistics, and it can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications. OProfile is distributed as part of the SDK.

  • The Cell Performance Counter tool (also called cell-perf-counter tool and CPC) is used for setting up and using the hardware performance counters in the Cell/B.E. processor. These counters allow you to see how many times certain hardware events are occurring, which is useful if you are analyzing the performance of software running on a Cell/B.E. system. Hardware events are available from all of the logical units within the Cell/B.E. processor, including the PPE, SPEs, interface bus, and memory and I/O controllers. Four 32-bit counters, which can also be configured as pairs of 16-bit counters, are provided in the Cell/B.E. performance monitoring unit (PMU) for counting these events. CPC also makes use of the hardware sampling capabilities of the Cell/B.E. PMU. This feature enables the hardware to collect very precise counter data at programmable time intervals. The accumulated data can be used to monitor the changes in performance of the Cell/B.E. system over longer periods of time. CPC provides a variety of output formats for the counter data: simple text output shown in the terminal session; HTML output available for viewing in a Web browser; and XML output generated for use by higher-level analysis tools such as the Visual Performance Analyzer. CPC is distributed as part of the SDK.

  • The Performance Debugging Tool (PDT) provides tracing for recording significant events during program execution and for maintaining the sequential order of events. The main objective of the PDT is to provide the capability to trace events of interest, in real time, and to record relevant data from the SPEs and PPE. This objective is achieved by instrumenting the code that implements key functions of the events on the SPEs and PPE and by collecting the trace records. This instrumentation requires additional communication between the SPEs and the PPE because trace records are collected in the PPE memory. Tracing 16 SPEs using one central PPE might put a heavy load on the PPE and therefore might affect application performance. The PDT is designed to reduce the tracing execution load and to provide a means for throttling the tracing activity on the PPE and each SPE. In addition, the SPE tracing code size is minimized so that it fits into the small SPE local store. Tracing is enabled at the application level (user space), and after tracing is enabled, data can be gathered for any running application. After an application has been enabled for tracing, trace data is gathered each time the application runs. The PDT is distributed as part of the SDK.

  • The PDT Trace Reader (PDTR) is a command-line tool that reads and displays PDT traces and generates various trace event-based summary reports. The PDTR is distributed as part of the SDK.

  • FDPR-Pro for Cell/B.E. is a performance-tuning utility to reduce the execution time and real memory utilization of user-level application programs. It optimizes the executable image of a program by collecting information about the program's behavior under a typical workload and creating a new version of the program that is optimized for that workload. FDPR-Pro for Cell/B.E. is distributed as part of the SDK.

  • The Visual Performance Analyzer (VPA) is an Eclipse-based performance visualization toolset that consists of six major components:
    • Profile Analyzer provides a powerful set of graphical and text-based views to allow users to narrow down performance problems to a particular process, thread, module, symbol, offset, instruction, or source line.
    • Code Analyzer displays detailed information on basic blocks, functions, and assembly instructions of executable files and DLLs (Dynamic Link Libraries).
    • Pipeline Analyzer displays pipeline execution (for POWER® systems).
    • Counter Analyzer is a common tool to analyze hardware performance counter data among many IBM Systems (formerly IBM eServer™) platforms.
    • Trace Analyzer visualizes Cell/B.E. traces containing information such as DMA communication, locking and unlocking activities, mailbox messages, and so on.
    • Control Flow Analyzer reads call trace data files.


Prerequisites

Although this is a fairly entry-level tutorial, you will benefit from prior experimentation with the Cell/B.E. SDK 3.0.


System requirements

Hardware

The following table shows the recommended minimum configuration for each hardware platform.


Table 1. Hardware requirements

System              Recommended minimum configuration
x86 or x86-64       2 GHz Pentium® 4 processor
PowerPC®            64-bit PPC with a clock speed of 1.42 GHz (32-bit PPC platforms are not supported)
BladeCenter® QS20   Revision 31 or greater, and minimum firmware level of QA-06.14.0-0F (7.21)
BladeCenter QS21    Minimum hardware firmware level of QB-01.08.0-00

All systems are required to have:

  • Hard disk space: 5 GB (minimum) to install the source package and the accompanying development tools
  • 1 GB RAM (minimum) on the host system

Note: If you use the Full System Simulator, the minimum amount of RAM installed must be twice the amount of simulated memory. For example, to simulate a system with 512 MB of RAM, the host system must have at least 1 GB of RAM installed.
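The sizing rule in the note can be sketched as a small shell function. This is an illustration only: min_host_ram_mb is a hypothetical helper, not part of the SDK, and it assumes that the general 1 GB host minimum listed above still applies when the simulated memory is small.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): print the minimum host RAM,
# in MB, for simulating a system with the given amount of memory in MB.
min_host_ram_mb() {
    # Rule from the note: host RAM must be twice the simulated memory.
    required=$(( $1 * 2 ))
    # Assumption: the general 1 GB (1024 MB) host minimum still applies.
    if [ "$required" -lt 1024 ]; then
        required=1024
    fi
    echo "$required"
}

min_host_ram_mb 512    # prints 1024, matching the 512 MB example above
```

For instance, simulating 2 GB (2048 MB) of memory would call for at least 4 GB (4096 MB) on the host under this rule.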

Software

The SDK 3.0 requires Fedora 7, which must be installed before you install the SDK.

For SELinux: The SELinux policy files that are included in the Fedora 7 base distribution prevent spufs from loading correctly on boot. To install the SDK, you must either turn off SELinux or update the selinux-policy and selinux-policy-targeted RPMs to the latest version. The preferred method is to update the RPMs. To update, type the following command as the root user: yum update selinux-policy selinux-policy-targeted

For expat: The DaCS for Hybrid-x86 daemon for both X86_64 and the BladeCenter QS20 and QS21 platforms requires the expat XML parsing library. Install expat by typing the following command as the root user: yum install expat.

SDK utility software dependencies: The SDK requires the packages rsync, sed, tcl, and wget. To install these dependencies, type the following command as the root user: yum install rsync sed tcl wget
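Before running the yum commands above, you might want to see which of the named packages are already present. The following is a minimal sketch, assuming an RPM-based host such as Fedora 7 where rpm -q reports whether a package is installed. The list_missing function and the QUERY variable are hypothetical names introduced here for illustration; they are not part of the SDK.

```shell
#!/bin/sh
# Packages named in the Software section: expat plus the SDK utility dependencies.
REQUIRED_PKGS="expat rsync sed tcl wget"

# Hypothetical helper: print each named package that the query command
# reports as absent. QUERY defaults to "rpm -q" (assumes an RPM-based host).
list_missing() {
    for pkg in "$@"; do
        ${QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Collapse the one-per-line output into a single space-separated list.
missing=$(list_missing $REQUIRED_PKGS | tr '\n' ' ')

if [ -n "$missing" ]; then
    echo "Missing packages. As the root user, run: yum install $missing"
else
    echo "All required packages are installed."
fi
```

The script only reports what is missing and leaves the installation itself to you, because yum install must be run as the root user.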



Duration

1 hour

