Level: Intermediate

Peter Altevogt (ALTEVOGT@de.ibm.com), Performance Architect, IBM
Hans Boettiger (h.boettiger@de.ibm.com), Performance Architect, IBM
Tibor Kiss (tibor.kiss@de.ibm.com), Performance Engineer, Contractor, IBM
Zvonko Krnjajic (KRNJAJIC@de.ibm.com), Software Engineer, Contractor, IBM

06 May 2008

Although there is extensive published data about the hardware performance
features of a single Cell Broadband Engine™ (Cell/B.E.) processor (and about the performance of a
multitude of applications ported to it), there is little on the specific hardware
performance features of the IBM BladeCenter® QS21, which uses a coherent SMP node of two
Cell/B.E. processors as well as an elaborate I/O subsystem. This glossary goes with
the article "Evaluating IBM
BladeCenter QS21 hardware performance." In that article, the
authors close that gap by providing information about basic latencies, throughputs,
and relative execution times for some key computational benchmark kernels, such as
Linpack and SPEC2000. The article also delivers a basic architectural overview of
the system and offers tips on how to optimize application
performance.
Glossary of terms
| BE0, BE1 | Aliases for the two Cell/B.E. processors of the QS21. |
|---|
| BIF | The Cell/B.E. interface: A fully coherent protocol
connecting the two Cell/B.E. processors of the QS21. |
|---|
| DCBB2 | Dual Cell-based blade configuration number 2: A special deliverable for IBM
Global Engineering Services (GES). |
|---|
| DDR2 | Double data rate 2 is a technology for high speed memory. |
|---|
| DMA | Direct memory access is a technology to move data within a computer system
without requiring services from the main processor. |
|---|
| EIB | The Element Interconnect Bus is the communication path for command and data
between all processor elements and the on-chip memory and I/O controller of the
Cell/B.E. processor. |
|---|
| HS Connector | A high-speed 2x PCI-E 16x connector that supports only the DCBB2 blade
deliverable. |
|---|
| HSDC | High speed daughter cards, such as the InfiniBand Daughter Card. |
|---|
| IBDC | InfiniBand Daughter Card. |
|---|
| IOIF | The non-coherent I/O interface protocol of the Cell/B.E. system that is
suitable for I/O devices. |
|---|
| mc0, mc1 | The memory controllers interfacing the Southbridges and the attached DDR2
memory. |
|---|
| MFC | Memory flow controller: Component of the SPE that transfers data between the
local store of the SPU and the XDR DRAM and provides synchronization
services. |
|---|
| MIC | Memory interface controller: Provides the interface between the EIB bus and
the XDR DRAM. |
|---|
| MPI | Message Passing Interface is a specification of a message-passing
library. |
|---|
| MTU | Maximum transmission unit: Specifies the maximum packet size in bytes that
can be transmitted over a network without being fragmented. |
|---|
| n1/2 | The message size at which the throughput reaches half of
its maximum value. |
|---|
| PCI-E | PCI Express: A computer expansion card interface standard introduced to
replace PCI-X. |
|---|
| PCI-X | Peripheral Component Interconnect Extended: A computer expansion card
interface standard introduced to replace PCI. |
|---|
| PPE | The PowerPC® Processor Element of the Cell/B.E. processor: A general purpose,
dual-threaded 64-bit RISC® processor core. |
|---|
| rDMA | Remote direct memory access allows data to move directly from the memory of
one computer into that of another without involving either one's
processor. |
|---|
| SIMD | Single Instruction Multiple Data is a classic technique to implement data
parallelism; that is, to execute the same operation concurrently on a set of
data. |
|---|
| SPE1, ..., SPE8 | The eight Synergistic Processor Elements constitute the computational core of
the Cell/B.E. processors. Each SPE is a 128-bit RISC processor executing SIMD
instructions. Its main units are the Synergistic Processing Unit (SPU), which
contains the computational pipelines, and the memory flow controller (MFC),
which implements the DMA operations. |
|---|
| SPU | Synergistic Processing Unit: Processor component of an SPE with two
pipelines executing up to two instructions per cycle and an attached local
store memory. |
|---|
| TLB | Translation lookaside buffer: Caches at the SPEs and PPEs that are used by
the memory management hardware to reduce the latency of virtual address
translation. |
|---|
| XDR™ | eXtreme Data Rate Dynamic Random Access Memory: High-performance memory from
Rambus, Inc. |
|---|
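The n1/2 definition above can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: it assumes you already have measured (message size, throughput) pairs, takes the largest measured throughput as the maximum, and linearly interpolates between the two measurements that bracket half of that maximum. The function name `n_half` and the sample numbers are hypothetical and chosen only for illustration.

```python
def n_half(sizes, throughputs):
    """Estimate n1/2: the message size at which throughput reaches
    half of its maximum measured value (linear interpolation)."""
    if len(sizes) != len(throughputs) or len(sizes) < 2:
        raise ValueError("need at least two (size, throughput) pairs")

    # Assumption: the largest measured throughput stands in for the maximum.
    target = max(throughputs) / 2.0

    # Walk the measurements in order of increasing message size.
    points = sorted(zip(sizes, throughputs))
    prev_size, prev_tp = points[0]
    if prev_tp >= target:
        # Already at or above half the maximum at the smallest size measured.
        return float(prev_size)

    for size, tp in points[1:]:
        if tp >= target:
            # Interpolate between the two bracketing measurements.
            frac = (target - prev_tp) / (tp - prev_tp)
            return prev_size + frac * (size - prev_size)
        prev_size, prev_tp = size, tp

    raise RuntimeError("throughput never reached half of its maximum")


# Hypothetical sample data: message sizes in bytes, throughput in MB/s.
sample_sizes = [64, 256, 1024, 4096, 16384]
sample_throughputs = [100.0, 300.0, 700.0, 900.0, 1000.0]
print(n_half(sample_sizes, sample_throughputs))
```

A smaller n1/2 means the link reaches a good fraction of its peak throughput with shorter messages, so the value is a convenient single number for comparing interconnects or protocol settings.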
Resources
Learn
- Refer to
Analyzing Computer Systems Performance with Perl: PDQ
by Neil Gunther (Springer-Verlag, 2005, pp. 92-93) for an explanation of the
typical knee in
the graphs presented in the section "Several SPEs concurrently accessing XDR DRAM".
- Read
"The LINPACK Benchmark: Past, Present, and Future"
to clear up any confusion and mystery surrounding the LINPACK benchmark and some
of its variations.
- Look at the scheduling brochure
Cell/B.E.-Opteron hybrid supercomputer at LANL, Roadrunner.
- Explore the
Linux openfabrics.org Wiki
for information about the Open MPI implementation of the Open Fabrics Enterprise
Distribution (OFED).
- Check out
NetPipe (a protocol independent
performance tool that visually represents the network performance under a variety
of conditions) and Netperf (a
benchmark that can be used to measure the performance of many different types of
networking, including testing for both unidirectional throughput and end-to-end
latency).
- Find out more about best practices for Cell/B.E.
development in the IBM Redbook™ draft
"Programming the Cell Broadband Engine Examples and Best Practices"
(IBM Redbooks, February 2008).
- Get answers to Cell/B.E. SDK 3.0 installation
questions in the original installation document,
"Installation Guide for the SDK for Multicore Acceleration v3.0."
- Read
"Introduction to the Cell Multiprocessor"
(IBM Journal of Research and Development, 2005) for an introductory
overview of the Cell/B.E. multiprocessor's history, the program objectives and
challenges, the design concept, the architecture and programming models, and the
implementation. Also of interest from early Cell/B.E. Architecture efforts is
"Cell Broadband Engine Architecture and its first implementation"
(developerWorks, November 2005).
- Find the
"Cell/B.E.
SDK 3.0 tools: Using performance tools" tutorial series
(developerWorks, April 2008) for a tour of six performance tools for use
with the Cell/B.E. SDK 3.0 and for Cell/B.E. system performance best
practices.
- To learn more on Cell/B.E. programming, try the
developerWorks series:
- Refer to the Cell
Broadband Engine documentation section of the IBM Semiconductor Solutions Technical Library for a wealth of downloadable manuals,
specifications, and more.
- Sign up for the developerWorks newsletter
and get the latest developer news and Cell/B.E. happenings delivered to your inbox each week.
Check Power Architecture® when you sign up to receive Cell/B.E. news in your newsletter.
- The
Cell Broadband Engine/Power Architecture notebook
is a blog-based resource that hosts
news,
as well as two instructional features -- the
"Forum watch"
of interesting questions and hot topics from the forum and the
"Infobomb"
series (short, precise, task-specific, quick-read knowledge bombs gleaned from
Cell/B.E. documentation).
About the authors

Dr. Peter Altevogt is a performance architect in the IBM Systems and Technology Group at the IBM Laboratory Boeblingen (Germany). He built the performance team for the IBM Blade computer using the Cell/B.E. processor. His other responsibilities include performance analysis and modeling of future IBM processors and systems. Dr. Altevogt holds degrees in Mathematics and Physics from the University of Heidelberg, and he holds a doctorate in theoretical physics from the University of Karlsruhe. He joined the IBM Scientific Center in Heidelberg in 1991, and he moved to the IBM Laboratory Boeblingen in 1998.

Hans Boettiger works in IBM Systems and Technology Group at the IBM Germany Development Lab. He joined IBM in 1973. He has held various technical leadership positions in software, operating systems, and hardware development for mainframes, as well as in performance analysis for BI systems, compilers, and blade computers. He currently works as a performance architect on next generation systems.

Tibor Kiss is a performance engineer at the IBM Laboratory Boeblingen (Germany). Since 2005, he has been a member of the IBM Systems and Technology Group performance team, responsible for the performance of the IBM Blades using the Cell/B.E. processor. He holds a Bachelor of Science degree in Computer Engineering. His interests include performance analysis and modeling.

Zvonko Krnjajic is a software engineer at the IBM Laboratory Boeblingen (Germany) working on performance analysis of Cell/B.E.-based blades. His other interests include graphics on the Cell/B.E. processor (he did his diploma thesis on implementing graphics algorithms on the Cell/B.E. processor at the IBM Laboratory Boeblingen). He holds a Bachelor's degree from the University of Esslingen, and he is currently working on his master's thesis in the area of Distributed Systems Engineering with a focus on general purpose computing on GPUs. He is also interested in High Performance Computing and Cryptography.
IBM, developerWorks, PowerPC, Redbook, and RISC are trademarks of IBM Corporation in the United States, other countries, or both. Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc. Other company, product, or service names may be trademarks or service marks of others.