Level: Introductory
Gad Haber (gadi@us.ibm.com), Senior Architect, IBM
15 Apr 2008
This introductory tutorial, designed as a companion for the IBM SDK for
Multicore Acceleration, Version 3.0 (otherwise known as the Cell Broadband
Engine® SDK), teaches you how to use five performance tools that are included in SDK
3.0: OProfile, Cell Performance Counter, Performance Debugging Tool, the PDT Trace
Reader, and FDPR-Pro. The Visual Performance Analyzer, available separately, is also highlighted.
In this tutorial
This tutorial introduces performance tools for the IBM SDK for Multicore
Acceleration 3.0 (the Cell/B.E. SDK 3.0) and explains how to use them:
- OProfile for Cell is a system-level profiler for Cell/B.E. Linux® systems
that can profile all running code at low overhead. OProfile consists of a kernel
driver, a daemon for collecting sample data, and several post-profiling tools
for turning data into information. OProfile uses the hardware performance
counters of the CPU to profile a wide variety of
statistics, which can also be used for basic time-spent profiling. All code is
profiled: hardware and software interrupt handlers, kernel modules, the
kernel, shared libraries, and applications. OProfile is distributed as part of the SDK.
- The Cell Performance Counter tool (also called the cell-perf-counter tool, or CPC) is used for
setting up and using the hardware performance counters in the Cell/B.E.
processor. These counters allow you to see how many times certain hardware
events are occurring, which is useful if you are analyzing the performance of
software running on a Cell/B.E. system. Hardware events are available from all
of the logical units within the Cell/B.E. processor, including the PPE, SPEs,
interface bus, and memory and I/O controllers. Four 32-bit counters, which can
also be configured as pairs of 16-bit counters, are provided in the Cell/B.E.
performance monitoring unit (PMU) for counting these events. CPC also makes
use of the hardware sampling capabilities of the Cell/B.E. PMU. This feature
enables the hardware to collect very precise counter data at programmable time
intervals. The accumulated data can be used to monitor the changes in
performance of the Cell/B.E. system over longer periods of time. CPC provides
a variety of output formats for the counter data: simple text output
shown in the terminal session; HTML output available for viewing in a Web
browser; and XML output generated for use by higher-level analysis
tools such as the Visual Performance Analyzer. CPC is distributed as part of the SDK.
- The Performance Debugging Tool (PDT) provides tracing for recording
significant events during program execution and for maintaining the sequential
order of events. The main objective of the PDT is to provide the capability to
trace events of interest, in real time, and to record relevant data from the SPEs
and PPE. This objective is achieved by instrumenting the code that implements
key functions of the events on the SPEs and PPE and by collecting the trace
records. This instrumentation requires additional communication between the
SPEs and the PPE because trace records are collected in PPE memory. Tracing 16 SPEs
using one central PPE might put a heavy load on the PPE and might therefore
affect application performance. The PDT is designed to reduce the
tracing execution load and to provide a means for throttling the tracing activity
on the PPE and each SPE. In addition, the SPE tracing code size is minimized
so that it fits into the small SPE local store. After tracing is enabled, data
may be gathered for any running application. Tracing is enabled at the
application level (user space). After an application has been enabled, the
tracing facility gathers trace data each time the application runs.
The PDT is distributed as part of the SDK.
- The PDT Trace Reader (PDTR) is a command-line tool that reads and displays
PDT traces and generates various trace event-based summary reports.
The PDTR is distributed as part of the SDK.
- FDPR-Pro for Cell/B.E. is a performance-tuning utility that reduces the
execution time and real memory utilization of user-level application programs.
It optimizes the executable image of a program by collecting information about
the program's behavior under a typical workload and creating a new version of
the program that is optimized for that workload. FDPR-Pro for Cell/B.E. is distributed as part of
the SDK.
- The Visual Performance Analyzer (VPA) is an Eclipse-based performance
visualization toolset that consists of six major components:
  - Profile Analyzer provides a powerful set of graphical and text-based
    views to allow users to narrow down performance problems to a particular
    process, thread, module, symbol, offset, instruction, or source line.
  - Code Analyzer displays detailed information on basic blocks, functions,
    and assembly instructions of executable files and DLLs (Dynamic Link
    Libraries).
  - Pipeline Analyzer displays pipeline execution (for POWER® systems).
  - Counter Analyzer is a common tool to analyze hardware performance
    counter data among many IBM Systems (formerly IBM eServer™) platforms.
  - Trace Analyzer visualizes Cell/B.E. traces containing information such
    as DMA communication, locking and unlocking activities, mailbox messages,
    and so on.
  - Control Flow Analyzer reads call trace data files.
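
The counter widths described for CPC in the list above limit how long an event can be counted before a counter wraps. As a rough illustration (plain shell arithmetic, not a CPC invocation), the following sketch computes the largest value a 32-bit or 16-bit counter can hold and how long it would take to wrap if it incremented once per clock cycle. The 3200 MHz clock rate and the function names `counter_max` and `wrap_time_ns` are assumptions chosen for this example.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; this script does not call the CPC tool.

# Largest value an unsigned counter of the given bit width can hold.
counter_max() {
    local bits=$1
    echo $(( (1 << bits) - 1 ))
}

# Nanoseconds until a counter of the given width wraps if it increments
# once per cycle at the given clock rate in MHz (an assumed value).
wrap_time_ns() {
    local bits=$1 clock_mhz=$2
    echo $(( (1 << bits) * 1000 / clock_mhz ))
}

echo "32-bit max: $(counter_max 32)"
echo "16-bit max: $(counter_max 16)"
echo "32-bit wrap at 3200 MHz: $(wrap_time_ns 32 3200) ns"
echo "16-bit wrap at 3200 MHz: $(wrap_time_ns 16 3200) ns"
```

Under these assumptions a 32-bit cycle counter would wrap in roughly 1.3 seconds, which suggests why collecting counter data at programmable intervals is useful for longer runs.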
Prerequisites
Although this is a fairly entry-level tutorial, you will benefit from
prior experimentation with the Cell/B.E. SDK 3.0.
System requirements
Hardware
The following table shows the recommended minimum configuration for each
hardware platform.
Table 1. Hardware requirements
| System | Recommended minimum configuration |
|---|---|
| x86 or x86-64 | 2 GHz Pentium® 4 processor |
| PowerPC® | 64-bit PPC with a clock speed of 1.42 GHz; 32-bit PPC platforms are not supported |
| BladeCenter® QS20 | Revision 31 or greater, and minimum firmware level of QA-06.14.0-0F (7.21) |
| BladeCenter QS21 | Minimum hardware firmware level of QB-01.08.0-00 |
All systems are required to have:
- Hard disk space: 5 GB (minimum) to install the source package and the
accompanying development tools
- 1 GB RAM (minimum) on the host system
Note: If you use the Full System Simulator, the minimum amount of RAM
installed must be twice the amount of simulated memory. For example, to simulate
a system with 512 MB of RAM, the host system must have at least 1 GB of RAM
installed.
Software
The SDK 3.0 requires Fedora 7, which must be installed before you install the
SDK.
For SELinux: The SELinux policy files that are included in the Fedora 7
base distribution prevent spufs from loading
correctly on boot. To install the SDK, you must either turn off SELinux or
update the selinux-policy and
selinux-policy-targeted RPMs to the latest version.
The preferred method is to update the RPMs. To update, type the following
command as the root user:
yum update selinux-policy selinux-policy-targeted
For expat: The DaCS for Hybrid-x86 daemon for both X86_64 and the
BladeCenter QS20 and QS21 platforms requires the expat XML parsing library.
Install expat by typing the following command as the root user:
yum install expat
SDK utility software dependencies: The SDK requires the packages rsync,
sed, tcl, and wget. To install these dependencies, type the following command as
the root user:
yum install rsync sed tcl wget
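
The memory rule and package list above can be checked before installation with a few lines of shell. This is a minimal sketch, not part of the SDK installer: the function names `required_host_ram_mb` and `missing_packages` are hypothetical, and it assumes an RPM-based system such as Fedora where `rpm -q` reports whether a package is installed.

```shell
#!/usr/bin/env bash
# Hypothetical preflight sketch; not part of the SDK installer.

# Minimum host RAM in MB for a given amount of simulated memory in MB:
# at least 1 GB, and at least twice the simulated memory.
required_host_ram_mb() {
    local simulated_mb=$1
    local needed=$(( simulated_mb * 2 ))
    if [ "$needed" -lt 1024 ]; then
        needed=1024
    fi
    echo "$needed"
}

# Print each package from the argument list that rpm does not report
# as installed (assumes the rpm command is available).
missing_packages() {
    local pkg
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

echo "Host RAM needed to simulate 512 MB: $(required_host_ram_mb 512) MB"
if command -v rpm >/dev/null 2>&1; then
    echo "Missing packages: $(missing_packages rsync sed tcl wget expat | tr '\n' ' ')"
fi
```

Any package names printed on the last line could then be passed to `yum install` as the root user.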
Duration
1 hour
Formats
HTML, PDF