Level: Intermediate
Peter Seebach (developerworks@seebs.plethora.net), Freelance author, Plethora.net
03 Jul 2007
Meet two more means of communication between the SPE and the PPE --
mailboxes and signal notification. Mailboxes are special-purpose registers, similar to the I/O registers used to communicate with peripheral devices on some systems, available on the SPEs and the PPE. Signal notification registers are registers which can be read or written to by the PPE, but which the SPE can only read.
In the previous column, I looked at the simplest case of running code on one
of the Cell Broadband Engine™ (Cell/B.E.) processor's SPEs -- you start a thread, which is given all the information it needs and
returns a chunk of data when it's done. This is all well and good if you want
to process a single block of data, but each block of data so processed
requires the whole program to be loaded to the SPE first. This, of course,
is inefficient.
In this article, I'll introduce two more means of communication between
the SPE and the PPE: mailboxes and signal notification. Mailboxes
are special-purpose registers available on the
SPEs and the PPE, similar to the I/O registers used to
communicate with peripheral devices on some systems. Each SPE has a total of three mailboxes -- two outbound
(one interrupting, one not) which hold only a single entry, and one
inbound which can hold up to four entries. (The capacities of the mailboxes
could in theory change in future implementations.)
Signal notification registers are registers which can be read or written to
by the PPE, but which the SPE can only read; reading by the SPE clears them.
Each SPE has a pair of these registers.
Mailboxes are a special case of the more general channel interface
used for a broad variety of communications to and from the SPEs. They are
accessed using the same instructions (rdch/wrch/rchcnt) used for the other
channels. By default, only the PPE can communicate with SPEs through
mailboxes, but it is possible to give SPEs privileges so they can talk to each other
directly. By contrast, everyone can write to signal notification registers;
on the SPEs, this uses special instructions. All of these features are
accessed through memory-mapped I/O (MMIO) registers on the PPE.
Communication options
A more detailed discussion of how to configure all the various settings is beyond
the immediate scope of this article, but it's worth pointing out that these
communications mechanisms allow a broad variety of configuration changes.
Signal notification registers may, for instance, be configured to deliver
interrupts or not. SPEs may be given access to each other's mailboxes.
The SPE's inbound mailbox may be configured to generate an event (which can
generate an interrupt), or it may be configured not to. The SPE has access to both
interrupt-creating and interrupt-free mailboxes to the PPE.
In short, if you come up with a reasonably sane notion of how you want to
communicate between the SPEs and the PPE, the chances are that it's possible
to manage. In general, the default arrangement is to use
- DMA for large chunks of data,
- mailboxes to send small data, and
- signal notification just to send signals.
Note that some channels (including the mailboxes) are blocking
on the SPE's side; reads from empty channels or writes to full ones stall
the SPE until something changes. This provides an easy way to reduce power
consumption compared to an active spin loop. Although there's not an option
to directly change this, the rchcnt instruction lets the SPE check whether a
given blocking channel has data available to read or space available to
write. (Each channel is exclusively used for read or write operations.)
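To make the channel-count idea concrete, here is a small host-side model in plain C. It is only a sketch of the semantics described above, not Cell SDK code: every name in it (mbox_model, mbox_read_count, and so on) is made up for illustration, and where a real channel would stall the SPE, the model just returns false. On the SPE itself, spu_mfcio.h provides spu_stat_in_mbox() and spu_stat_out_mbox() to query the real counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side model only; none of these names come from the Cell SDK. */
#define MBOX_MODEL_MAX 4u   /* depth of the inbound mailbox described above */

typedef struct {
    uint32_t slots[MBOX_MODEL_MAX];
    unsigned head;       /* index of the oldest queued entry */
    unsigned count;      /* entries currently queued */
    unsigned capacity;   /* 4 for the inbound mailbox, 1 for an outbound one */
} mbox_model;

void mbox_init(mbox_model *m, unsigned capacity) {
    assert(capacity >= 1 && capacity <= MBOX_MODEL_MAX);
    m->head = 0;
    m->count = 0;
    m->capacity = capacity;
}

/* Count seen by the reader: entries that can be read without stalling. */
unsigned mbox_read_count(const mbox_model *m) { return m->count; }

/* Count seen by the writer: entries that can be written without stalling. */
unsigned mbox_write_count(const mbox_model *m) { return m->capacity - m->count; }

/* Returns false in the case where a real channel write would stall. */
bool mbox_try_write(mbox_model *m, uint32_t value) {
    if (mbox_write_count(m) == 0)
        return false;
    m->slots[(m->head + m->count) % m->capacity] = value;
    m->count++;
    return true;
}

/* Returns false in the case where a real channel read would stall. */
bool mbox_try_read(mbox_model *m, uint32_t *value) {
    if (mbox_read_count(m) == 0)
        return false;
    *value = m->slots[m->head];
    m->head = (m->head + 1) % m->capacity;
    m->count--;
    return true;
}
```

Checking the count before touching the channel is what lets code avoid the stall: with a capacity of 4, a fifth mbox_try_write() fails until something has been read, which is the point at which a real SPE write would block.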
A simple mailbox program
To begin, I'll use an example of a particularly simple program which
performs a very simple operation on each incoming mailbox value, writing the
result to an outgoing mailbox rather than using any DMA operations at all. To keep the
focus on the API rather than on the algorithm, the operation will be increment.
The initial setup on the PPE is similar to that for the previous example of
an SPE program which performs operations on data passed to it as arguments;
however, this time the data is passed in individually using mailboxes.
This can handle a stream of data of arbitrary length and each item is
processed in real time.
There are a couple of different ways to do this. Writing to the SPE's mailbox
is easy; getting returned data is potentially complicated. The SPU can write data back in
two possible ways. One is to use the
interrupting mailbox which triggers interrupts on the PPE; the other is to
use the non-interrupting mailbox which can simply be read later.
The SPE management library (libspe2) provides a way to read the current value
of the non-interrupting mailbox; the spe_out_mbox_read function reads one or
more values (up to whatever limit you specify) into an array, and then returns
a value indicating how many it read. This is noticeably improved over the
libspe1 interface, which read a single value, returning either the value or
-1. You can also check to see how many values are available, using spe_out_mbox_status
to query for availability. The other option is to use the interrupting mailbox,
which can block until an incoming mailbox event wakes the receiver. The
SPE library hides this complexity from you. The previous version made you
register to obtain interrupts and process them. For this first sample
program, I went with the interrupting mailbox, and just used blocking reads.
The SPE code
The SPE code is painfully simple:
Listing 1. SPE code to read and write mailboxes
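    #include <spu_mfcio.h>

    int
    main(unsigned long long id, unsigned long long argp) {
        int i;

        /* Serve requests forever: increment each word and send it back. */
        while (1) {
            i = spu_read_in_mbox();        /* stalls until the PPE writes a word */
            ++i;
            spu_write_out_intr_mbox(i);    /* stalls while the outbound slot is full */
        }
        return 0;
    }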
It really is that simple. Both the read and the write are blocking
operations; the spu_read_in_mbox() function stalls until a datum is available
in the incoming mailbox, and the spu_write_out_intr_mbox() function stalls until the
outgoing mailbox is empty. If you don't like syntactic sugar, you can do this
directly using the spu_readch() and
spu_writech() primitives; for instance, the
write to the interrupt mailbox could be written spu_writech(SPU_WrOutIntrMbox, i);.
The PPE code
The PPE code is a bit more interesting. Unlike the simpler sample programs
which run a process on the SPE, and then accept results, this program needs
to run while the SPE process is running. Since spe_context_run is a blocking operation, that means
using threads on the PPE.
To run an SPE program "in the background," you must create a new thread
for it to run in. The thread's main loop is an utterly trivial function
which simply runs a provided SPE context:
Listing 2. Running the context
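    #include <libspe2.h>

    /* Thread entry point: runs the given SPE context, blocking until it stops. */
    void *
    inc_on_spe(void *context) {
        spe_context_ptr_t c = context;
        unsigned int entry = SPE_DEFAULT_ENTRY;

        spe_context_run(c, &entry, 0, NULL, NULL, NULL);
        return NULL;
    }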
This is the familiar spe_context_run call, which
can now be set up as a thread using pthread_create.
The code to perform the actual communications doesn't look too bad. A caveat
here: I've removed the error-checking code for display purposes. However,
the error-checking code is the only reason I have a working code sample to
present. Check your errors!
Listing 3. PPE communications
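    context = spe_context_create(0, NULL);
    spe_program_load(context, &spu_prog);
    /* spe_context_run blocks, so the SPE program gets a thread of its own */
    pthread_create(&inc, NULL, inc_on_spe, context);
    i = 0;
    while (i < 10) {
        int s;
        /* hand the current value to the SPE; check s in real code */
        s = spe_in_mbox_write(context, &i, 1, SPE_MBOX_ALL_BLOCKING);
        /* block until the incremented value comes back */
        spe_out_intr_mbox_read(context, &i, 1, SPE_MBOX_ALL_BLOCKING);
        printf("%d\n", i);
    }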
Once the thread has been started, the loop is simple: write to the SPE's
"in" mailbox (the one where the SPE receives data), then read from the
interrupt mailbox. Only the interrupt mailbox supports blocking read operations.
The other way to do it is with the non-interrupt mailbox. Once again, without
error checking, that code looks like this:
Listing 4. PPE communications, revised
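    while (i < 10) {
        spe_in_mbox_write(context, &i, 1, SPE_MBOX_ALL_BLOCKING);
        /* spe_out_mbox_read never blocks; it returns the number of words read */
        while (spe_out_mbox_read(context, &i, 1) < 1)
            usleep(100000);    /* nothing yet: sleep 100ms and try again */
        printf("%d\n", i);
    }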
The setup code is the same. This code uses usleep() to
sleep for 100,000 microseconds between queries, to avoid busy-looping, although this
is rather inefficient -- it guarantees a wait of 100ms if the mailbox hasn't got
data immediately. The inefficiency there is why I showed the interrupt version
first; in most contexts, it's better. The only change on the
SPE side is replacing spu_write_out_intr_mbox with spu_write_out_mbox.
Real applications
As may seem fairly obvious, no one is going to get much mileage from sending
single 32-bit words to the SPE for processing under normal circumstances.
Mailboxes are more useful for transmitting instructions to the SPE about
data to fetch using DMA. In general, the Cell/B.E. architecture favors letting the
SPE, not the PPE, do the DMA fetching. So, for instance, if you're having
SPEs process buffers of data, mailboxes would be a good way to send
information about how much data has just been plopped in the buffer that was
configured in the initial call to spe_context_run(), or
even possibly an address at which a new block of data can be found.
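As a rough illustration of that last idea, here is one way a "go fetch this" message could be laid out. Mailbox entries are 32 bits wide, so a 64-bit address has to be split across two of them. The three-word layout and the work_msg_* names below are purely hypothetical; this is a sketch of one possible convention in plain C, not something the SDK defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout: address high word, address low word, byte count.
 * Three words fit within the four-entry inbound mailbox. */
#define WORK_MSG_WORDS 3

/* Split a 64-bit address and a size into 32-bit mailbox-sized words. */
void work_msg_pack(uint64_t ea, uint32_t nbytes, uint32_t words[WORK_MSG_WORDS]) {
    assert(words != NULL);
    words[0] = (uint32_t)(ea >> 32);
    words[1] = (uint32_t)(ea & UINT32_C(0xFFFFFFFF));
    words[2] = nbytes;
}

/* Reassemble the address and size on the receiving side. */
void work_msg_unpack(const uint32_t words[WORK_MSG_WORDS],
                     uint64_t *ea, uint32_t *nbytes) {
    assert(words != NULL && ea != NULL && nbytes != NULL);
    *ea = ((uint64_t)words[0] << 32) | words[1];
    *nbytes = words[2];
}
```

The sender would write the three words to the inbound mailbox in order, and the receiver would read three words and unpack them before starting its DMA fetch. Both sides have to agree on the word order, which is the only real design decision here.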
Signal notification
The signal notification registers (there are two for each SPE) are 32-bit
registers, but unlike the mailboxes, they are typically treated as bits rather
than values. The PPE has read/write access to the signal notification
registers. The SPE has read-only access, but a read by the SPE clears the
value. Each of the registers may be set either in overwrite mode (the
default in which new values written replace any existing value) or in "OR" mode in which
new values are merged with any existing values using a bitwise OR. This
might be useful in cases where multiple sources might wish to raise "signals"
for the SPE to process.
If there are no signals, reading a signal register stalls the SPE. (The
channel count can be queried, the same as with any other channel.) Reading
a signal register atomically clears it: the SPE gets whatever flags were set,
and any flags that arrive during the read are set after the clear
operation occurs, so signals cannot be "lost" in this way.
While the incoming mailbox is a queue holding up to four messages, each
signal register is a single value. In overwrite mode, it simply holds the
most recent value written; in OR mode, it holds all of the bits which have
been set in any values written since the last read.
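The difference between the two modes, and the read-and-clear behavior, can be modeled in a few lines of portable C11. This is a host-side sketch using stdatomic.h, not SPE code; the signal_model type and its functions are invented for the example. One simplification to note: a real read of an empty signal register stalls the SPE, while this model just returns 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of one signal notification register (hypothetical names). */
typedef struct {
    _Atomic uint32_t bits;
    bool or_mode;        /* false: overwrite mode, true: OR mode */
} signal_model;

void signal_init(signal_model *s, bool or_mode) {
    assert(s != NULL);
    atomic_init(&s->bits, 0);
    s->or_mode = or_mode;
}

/* A writer delivers a value: merged in OR mode, replaced in overwrite mode. */
void signal_write(signal_model *s, uint32_t value) {
    if (s->or_mode)
        atomic_fetch_or(&s->bits, value);
    else
        atomic_store(&s->bits, value);
}

/* The reader gets the current value and clears the register in one atomic
 * step, so a bit written concurrently lands either in this result or in the
 * next one. (The real register would stall instead of returning 0.) */
uint32_t signal_read_and_clear(signal_model *s) {
    return atomic_exchange(&s->bits, 0);
}
```

Writing 0x1 and then 0x4 before a read yields 0x4 in overwrite mode and 0x5 in OR mode; in both cases a second read finds the register cleared.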
It is possible for the PPE to configure memory access permissions allowing
SPEs to send each other signals. This, coupled with OR mode, allows
many-to-one usage where multiple sources can deliver notifications to the SPE
of available workloads.
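For the many-to-one case, one plausible convention is to give each source its own bit, so that a single read tells the receiver exactly who has work waiting. The sketch below shows that bookkeeping in plain C; the one-bit-per-source scheme and the function names are assumptions made for illustration, not anything the hardware or SDK requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical convention: source n (0..31) raises bit n of the register. */
uint32_t signal_bit_for_source(unsigned source) {
    assert(source < 32);
    return UINT32_C(1) << source;
}

/* Given a value read from the register, list every source whose bit is set,
 * lowest index first. Returns how many entries of sources[] were filled. */
unsigned signal_decode(uint32_t bits, unsigned sources[32]) {
    unsigned n = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (bits & (UINT32_C(1) << i))
            sources[n++] = i;
    }
    return n;
}
```

With a 32-bit register this caps out at 32 distinct sources per register, and it only says that a source signaled, not how many times; anything richer than that belongs in a mailbox message or in memory.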
Although the phrase signal notification makes POSIX-oriented programmers
think of interrupts, the signal notification registers do not necessarily
trigger interrupts. They (as well as the inbound and outbound
mailboxes) can be configured to generate events to which SPE software can react
instead of constantly polling, but they can also be used without any kind
of interrupt being generated.
Next up: Why is scalar slow?
In the next installment, I'll show you why your scalar code is so slow by introducing you
to the SIMD-only architecture of the SPE (no scalar operations; all operations are
performed on 16-byte vectors). I'll discuss potential challenges developers face in overcoming this and talk about designing code so that your compiler can make efficient use of the SPE.
About the author
Peter Seebach has a better motto than the US Postal Service: "He always delivers!" That often means "better ways for more effective communication."