The IBM Cell Broadband Engine™ (Cell/B.E.) Software Development Kit (SDK) has been updated once again. This article addresses what's new and interesting in the 3.0 release from a non-IBM point of view. Version 3.0 came out on October 19, 2007, and it completely replaces the prior 2.1 release. (In fact, if you don't have version 2.1, don't bother looking for it.)
The 3.0 update shows off some major changes. Probably the most significant to a lot of developers is that it can now be installed natively on a PS3 running Linux® without having to modify or cheat the installer. The 2.1 and earlier releases refused to install on a Cell/B.E.-based system without installing a kernel that ran only on native hardware, rather than in the hypervisor environment used by the PS3.
Of course, if you don't have a Cell/B.E.-based system at all, you can still run under the Full System Simulator included on the extras CD. I wouldn't recommend trying to run the system simulator on a PS3, though; the performance would likely be a little slow.
The SDK is supported on Fedora 7 and RHEL 5.1. Although it might be possible to use it on other distributions, IBM provides no support for that. The CDs for the product come with a small starter RPM that installs the SDK installer, which you then use to install the remainder of the SDK. There are two CDs in the SDK: the developer package, providing the main functionality of the SDK, and the extras package, providing experimental code, add-ons, and additional features such as the Full System Simulator.
The SDK installer uses files from the SDK CDs and files downloaded from the Barcelona Supercomputing Center. If you have the ISO images downloaded already, the SDK installer can mount them and copy most of the files it needs from them, but you will still need to download some files (about 130MB) from the Barcelona site.
With each revision, the SDK installer gets a little easier to use. Compared with the hand-tuning of some earlier revisions, it's pretty polished now. Installation will take a while, partly because the yum installer chews up a fairly large amount of memory while it's running. The SDK installer uses yum after creating special custom repositories holding the SDK components; it's unclear how this is an improvement over installing them directly, but apparently it works.
The installer doesn't install the extras automatically. The extras are considered somewhat experimental, so if you want to install them, you'll have to do it yourself. You might need to install other packages first. For example, the Fedora 7 installation doesn't seem to have all the tcl and tk versions needed, but running yum install tcl and yum install tk solves that. Some of the extra RPM files also depend on RPMs that might not be installed by default: you might have to install the trace, pdt, alf-trace, and dacs-trace packages, plus their corresponding development packages, before some of the extras will install.
The SDK has undergone the usual variety of changes, updates, and enhancements. The XL C compiler has been updated from 0.8.2 to 0.9.0, although it's still considered an alpha product. The GNU compiler has also been updated with substantial improvements to autovectorization. There is a workaround for one of the few known errata for the SPE (see Resources). The Accelerated Library Framework (ALF) has been updated as well. There are some brand-new features such as Fortran (both PPE and SPE) and Ada (PPE only) support, and a linear algebra library.
However, perhaps the biggest change is that the documentation is updated and is now installed as part of the SDK. (Trust a writer to think the documentation is the biggest change.) The documentation has always been fairly good, but in previous releases, you had to go looking for it. Now it's all installed in /opt/cell: specifically, in /opt/cell/sdk/docs. The documentation provided with the SDK goes way beyond anything an article can cover, so go read the documentation.
A recurring theme in the updates of the SDK is a move toward standardization and specification. The early SDK offered experimental protocols. The original libspe is now deprecated because it had flaws that were addressed by developing a new and improved API called (with typical creativity) libspe2. The SIMD math library has been polished, improved, and debugged. For example, there's a SIMD math specification now, and the library implements it completely. That's significant progress, especially if you look at the notes in a previous article in this series about working around one of the limitations of a previous version.
The Fedora version of the SDK includes an updated 2.6.23 kernel specifically built to run on Cell/B.E. hardware. Although the stock Fedora 7 kernel runs natively on the hardware, the kernel built for the SDK is noticeably smaller. For example, its initrd file is about 40 percent of the size of the standard Fedora one, and the config file used for the kernel is even smaller. Those savings matter on the fairly common PS3 development platform, where even a couple of megabytes of free space can be at a premium.
A tour of the SDK's example code
If you plan to do any actual development, one of the most important parts of the SDK is the example code. The documentation can only get you so far. At the end of the day, reading working code is one of the best ways to learn how to use a system. The addition of man pages helps immensely too. For example, you can now expect man spe_context_create_affinity to show you what you need to know about creating contexts with affinity. (Note: One of the things it tells you is that you can't do it on the PS3 target hardware.)
The example code comes in four tar files. Just extract them in the /opt/cell/sdk/src directory where they are located, and run make to build all the sample code.
There are five subdivisions of the example programs:
- Benchmarks: Well, it's a single benchmark, but it's very nice.
- Demos: The demo programs show off particular accomplishments.
- Examples: The example programs are tiny little programs, each of which highlights a particular coding technique or recipe for accomplishing a single task. For example, the examples/cache directory has a pair of programs showing how to take advantage of a software-managed cache implemented for the SPEs.
- Tutorial: The tutorial code is either the most useful or the least useful, depending on your experience level. If you aren't sure where to get started, working through the tutorial is a great idea, and the euler tutorial shows a series of steps in the development of a fairly simple program that does a little bit of everything. Of particular interest is that when a file is unchanged between iterations, it's a symbolic link back to the previous iteration; noticing this could save you a bit of time trying to figure out what changed.
- Library code: The library code offers even more little cookbook recipes and reusable code snippets that show elegant (or at the least efficient) ways to perform common tasks. If you've been struggling with issues such as how to efficiently perform convolutions on arrays of data or fast Fourier transforms, you'll be happy to see that the code's already been written. Note that most of this code can run on both the PPE and the SPEs. For example, Listing 1 shows a nicely trivial function that produces a component-by-component maximum of two vectors:
Listing 1. Finding a maximum
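/* Component-wise maximum of two vectors of four signed 32-bit integers. */
static __inline vector signed int
_max_int_v(vector signed int in1, vector signed int in2)
{
#ifdef __SPU__
    /* SPU: set each lane of cmp to all ones where in1 > in2, then take
       in1 where cmp is set and in2 everywhere else. */
    vector unsigned int cmp;

    cmp = spu_cmpgt(in1, in2);
    return (spu_sel(in2, in1, cmp));
#else
    /* PPE: VMX provides the maximum directly. */
    return (vec_max(in1, in2));
#endif
}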
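If the SPU branch of Listing 1 looks opaque, it may help to see the compare-and-select idiom spelled out one lane at a time. The following is a minimal sketch in portable C, not SDK code: the names cmpgt_lanes, select_lanes, and max_int_lanes are made up for illustration, and plain four-element arrays stand in for the 128-bit vector types.

```c
#include <stdint.h>

#define LANES 4  /* a 128-bit vector holds four 32-bit integers */

/* Hypothetical scalar stand-in for a vector compare-greater-than:
 * each mask lane becomes all ones where a > b, all zeros elsewhere. */
static void cmpgt_lanes(const int32_t a[LANES], const int32_t b[LANES],
                        uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] > b[i]) ? UINT32_MAX : 0u;
}

/* Hypothetical scalar stand-in for a bitwise select: each result bit
 * comes from if_set where the mask bit is 1, from if_clear where it is 0. */
static void select_lanes(const int32_t if_clear[LANES],
                         const int32_t if_set[LANES],
                         const uint32_t mask[LANES],
                         int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t bits = ((uint32_t)if_set[i] & mask[i]) |
                        ((uint32_t)if_clear[i] & ~mask[i]);
        out[i] = (int32_t)bits;
    }
}

/* Same argument order as the SPU branch of Listing 1:
 * compare in1 against in2, then select in1 wherever it was greater. */
void max_int_lanes(const int32_t in1[LANES], const int32_t in2[LANES],
                   int32_t out[LANES])
{
    uint32_t mask[LANES];

    cmpgt_lanes(in1, in2, mask);
    select_lanes(in2, in1, mask, out);
}
```

Calling max_int_lanes on {1, -5, 7, 0} and {2, -9, 7, -1} produces {2, -5, 7, 0}. The appeal of the idiom is that the selection itself is pure bitwise arithmetic with no branches, which is presumably why the SPU version is written that way.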
An interesting change is the generally improved support for using the PPE's VMX vector unit to help keep things vectorized on both the PPE and the SPEs. It's too easy to get into the habit of thinking of the PPE as a scalar processor, but it has substantial vector hardware, too.
The SDK has developed a broader repertoire of tools and support features. There's a lot more support for performance tuning in the new SDK. The Performance Debugging Tool (PDT) provides tracing support and a visualizer for trace output. The Data Communication and Synchronization (DaCS) library provides support functions for process management, data movement, and synchronization between PPE and SPE.
The SDK is starting to include things everyone was previously developing independently, plus things that people kept asking for. While performance-tuning your code for a novel architecture is always a bit of work, there are a number of tasks that are pretty consistent. The provided linear algebra library is useful because there is a lot of commonality between different programs that need linear algebra. A tuned and debugged library for this can cut development costs substantially. Similarly, one of the extras is a simulation-grade random number generator. I don't know that I've ever met a programmer who hasn't written at least one very bad random number generator. The SDK providing a good one is a very good idea.
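To see how easy it is to write a bad one, here is a sketch of a classic linear congruential generator. This is emphatically not the SDK's generator. The function names are hypothetical, and the constants are borrowed from the sample rand() in the C standard (which wisely throws away the low bits, something this version deliberately fails to do).

```c
#include <stdint.h>

/* Deliberately weak generator: state <- a * state + c (mod 2^32).
 * Unsigned 32-bit arithmetic wraps, which supplies the modulus. */
static uint32_t weak_lcg_next(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return *state;
}

/* Counts how often the lowest bit changes between consecutive outputs
 * over n steps. A good generator would flip roughly half the time.
 * Because both constants are odd, this one flips on every single step. */
unsigned low_bit_flips(uint32_t seed, unsigned n)
{
    uint32_t state = seed;
    uint32_t prev = weak_lcg_next(&state) & 1u;
    unsigned flips = 0;

    for (unsigned i = 0; i < n; i++) {
        uint32_t bit = weak_lcg_next(&state) & 1u;
        if (bit != prev)
            flips++;
        prev = bit;
    }
    return flips;
}
```

Whatever the seed, low_bit_flips(seed, n) returns n: the lowest bit simply alternates 0, 1, 0, 1. Anything that uses that bit for a coin toss gets a perfectly predictable sequence, which is exactly the kind of flaw a tested, simulation-grade library is meant to spare you.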
The 2.x Cell/B.E. SDK releases took a significant step away from purely experimental material toward viable and polished material, but they were not quite polished or complete. The fact that the 3.x SDK is being distributed from developerWorks marks a transition from a somewhat alpha-level product to a development system at or above the quality one would usually expect from a commercial product. There's still experimental material, especially on the Extras CD, but the core SDK has been tested, verified, and even debugged a little. If you've been putting off getting into Cell/B.E. development because the API looked to be in flux, it seems to be more stable now.
With the removal of IDL from the base SDK, you might have to revise your old IDL-based fractal generator to use a more hands-on protocol. That sounds like a fun project, and it'll be interesting to see what impact it has on performance.
The next article in the series introduces the DaCS library in more detail.
Resources
Learn
- Use an RSS feed to request notification for the upcoming articles in this series. (Find out more about RSS feeds of developerWorks content.)
- Check out the other articles in the "Little broadband engine that could" series.
- Check out the PDF application note "Preventing Synergistic Processor Element Indefinite Stalls Resulting from Instruction Depletion in the Cell Broadband Engine Processor for CMOS SOI 90 nm" from the IBM Semiconductor solutions library. It states "under some circumstances, the use of a branch hint instruction can cause a Synergistic Processor Element (SPE) to stall indefinitely. SPE instruction prefetch can be interfered with in such a way that an SPE can run out of instructions to execute and make no further progress, while remaining in the RUN state." (Ever notice that the shorter the resource, the longer the title?)
- Read "Maximizing the power of the Cell Broadband Engine processor: 25 tips to optimal application performance" (developerWorks, June 2006) to learn some SPE programming tips.
- Get information from Jonathan Bartlett's series "Programming high-performance applications on the Cell/B.E. processor" (developerWorks, January 2007 to present), which provides an intro to Linux on the PS3, programming the PS3's SPE, an intro to the SPU, SPU performance programming, C/C++ SPU programming, and managing smart buffer DMA transfers.
- To learn more on Cell/B.E. programming, try the developerWorks series:
  - "Programming high-performance applications on the Cell/B.E. processor"
  - "PS3 fab-to-lab"
  - "The little broadband engine that could"
- Speaking of Cell/B.E. SDK documentation, there's a new blog series that abstracts important topic sections of some of the major SDK documentation to give you a quick read on the topic (in case you don't need a fuller explanation). They're called Infobombs, and some topics already covered include:
  - Getting a successful FC7 install on a PS3.
  - The basic structure of an ALF application.
  - Double buffering on ALF as an optimization.
  - ALF and DaCS for x86 hybrids.
  - Configuring and using the Basic Linear Algebra Subprograms.
  - Glossaries on ALF and DaCS error codes, trace events, and ALF attributes.
- Refer to the Cell Broadband Engine documentation section of the IBM Semiconductor Solutions Technical Library for a wealth of downloadable manuals, specifications, and more.
- Find all Cell/B.E.-related articles, discussion forums, downloads, and more at the IBM developerWorks Cell Broadband Engine resource center: your definitive resource for all things Cell/B.E.
- Keep abreast of all the latest in Cell/B.E. news and information: subscribe to the IBM microNews newsletter.
Get products and technologies
- Contact IBM about custom Cell/B.E.-based or custom-processor based solutions.
Discuss
- Participate in the discussion forum.
- Get fast answers from IBM experts and real-world practitioners in the developerWorks Cell Broadband Engine discussion forum.
- The Cell Broadband Engine/Power Architecture notebook is a blog-based resource that hosts news, as well as two instructional features: the "Forum watch" of interesting questions and hot topics from the forum, and the "Infobomb" series (short, precise, task-specific, quick-read knowledge "bombs" gleaned from Cell/B.E. documentation).




