The little broadband engine that could: Mailboxes and interrupts

Uncover two means of communication between the SPE and the PPE -- mailboxes and signal notification.

Peter Seebach, Freelance author, Plethora.net

Peter Seebach has a better motto than the US Postal Service: "He always delivers!" That often means "better ways for more effective communication."



03 July 2007

In the previous column, I looked at the simplest case of running code on one of the Cell Broadband Engine™ (Cell/B.E.) processor's SPEs -- you start a thread, which is given all the information it needs and returns a chunk of data when it's done. This is all well and good if you want to process a single block of data, but each block of data so processed requires the whole program to be loaded to the SPE first. This, of course, is inefficient.

In this article, I'll introduce two more means of communication between the SPE and the PPE: mailboxes and signal notification. Mailboxes are special-purpose registers available on the SPEs and the PPE, similar to the I/O registers used to communicate with peripheral devices on some systems. Each SPE has a total of three mailboxes: two outbound (one interrupting, one not), each of which holds only a single entry, and one inbound, which can hold up to four entries. (The capacities of the mailboxes could in theory change in future implementations.)

Signal notification registers are registers which can be read or written to by the PPE, but which the SPE can only read; reading by the SPE clears them. Each SPE has a pair of these registers.

Mailboxes are a special case of the more general channel interface used for a broad variety of communications to and from the SPEs. They are accessed using the same instructions (rdch/wrch/rchcnt) used for the other channels. By default, only the PPE can communicate with SPEs through mailboxes, but it is possible to give SPEs privileges so they can talk to each other directly. By contrast, everyone can write to signal notification registers; on the SPEs, this uses special instructions. All of these features are accessed through memory-mapped I/O (MMIO) registers on the PPE.

Communication options

A more detailed discussion of how to configure all the various settings is beyond the immediate scope of this article, but it's worth pointing out that these communications mechanisms allow a broad variety of configuration changes. Signal notification registers may, for instance, be configured to deliver interrupts or not. SPEs may be given access to each other's mailboxes. The SPE's inbound mailbox may be configured to generate an event (which can generate an interrupt), or it may be configured not to. The SPE has access to both interrupt-creating and interrupt-free mailboxes to the PPE.

In short, if you come up with a reasonably sane notion of how you want to communicate between the SPEs and the PPE, the chances are that it's possible to manage. In general, the default arrangement is to use

  • DMA for large chunks of data,
  • mailboxes to send small data, and
  • signal notification just to send signals.

Note that some channels (including the mailboxes) are blocking on the SPE's side; reads from empty channels or writes to full ones stall the SPE until something changes. This provides an easy way to reduce power consumption compared to an active spin loop. Although there's not an option to directly change this, the rchcnt instruction lets the SPE check whether a given blocking channel has data available to read or space available to write. (Each channel is exclusively used for read or write operations.)
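The difference between stalling and checking first can be modeled on an ordinary host machine. What follows is a minimal sketch and not SPE code: the model_mbox type and its functions are hypothetical names invented for illustration, and the capacities are the ones quoted earlier (four entries inbound, one entry outbound). Where real hardware would stall the SPE, the model simply reports failure, so the two count queries play the role of rchcnt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of a mailbox: a small bounded FIFO.
 * This is not the hardware interface, only an illustration of its semantics. */
#define MODEL_MBOX_MAX 4

typedef struct {
    uint32_t slot[MODEL_MBOX_MAX];
    size_t capacity;   /* 4 for the inbound mailbox, 1 for an outbound one */
    size_t head;       /* index of the oldest entry */
    size_t count;      /* entries waiting to be read */
} model_mbox;

static void model_mbox_init(model_mbox *m, size_t capacity) {
    m->capacity = capacity > MODEL_MBOX_MAX ? MODEL_MBOX_MAX : capacity;
    m->head = 0;
    m->count = 0;
}

/* What a channel-count query would report on a read channel. */
static size_t model_mbox_readable(const model_mbox *m) {
    return m->count;
}

/* What a channel-count query would report on a write channel. */
static size_t model_mbox_writable(const model_mbox *m) {
    return m->capacity - m->count;
}

/* Returns 1 on success, 0 where real hardware would stall (mailbox full). */
static int model_mbox_try_write(model_mbox *m, uint32_t value) {
    if (model_mbox_writable(m) == 0)
        return 0;
    m->slot[(m->head + m->count) % m->capacity] = value;
    m->count++;
    return 1;
}

/* Returns 1 on success, 0 where real hardware would stall (mailbox empty). */
static int model_mbox_try_read(model_mbox *m, uint32_t *value) {
    if (model_mbox_readable(m) == 0)
        return 0;
    *value = m->slot[m->head];
    m->head = (m->head + 1) % m->capacity;
    m->count--;
    return 1;
}
```

In this model, a fifth write to a four-entry mailbox fails until something has been read, which is the situation in which the real SPE would sit stalled unless it had checked the count beforehand.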

A simple mailbox program

To begin, I'll use an example of a particularly simple program which performs very simple operations on incoming mailbox data, writing them to an outgoing mailbox rather than using any DMA operations at all. To keep the focus on the API rather than on the algorithm, the operation will be increment.

The initial setup on the PPE is similar to that for the previous example of an SPE program which performs operations on data passed to it as arguments; however, this time the data is passed in individually using mailboxes. This can handle a stream of data of arbitrary length and each item is processed in real time.

There are a couple of different ways to do this. Writing to the SPE's mailbox is easy; getting returned data is potentially complicated. The SPU can write data back in two possible ways. One is to use the interrupting mailbox which triggers interrupts on the PPE; the other is to use the non-interrupting mailbox which can simply be read later.

The SPE management library (libspe2) provides a way to read the current value of the non-interrupting mailbox; the spe_out_mbox_read function reads one or more values (up to whatever limit you specify) into an array, and then returns a value indicating how many it read. This is noticeably improved over the libspe1 interface, which read a single value, returning either the value or -1. You can also check to see how many values are available, using spe_out_mbox_status to query for availability. The other option is to use the interrupting mailbox, which can block until an incoming mailbox event wakes the receiver. The SPE library hides this complexity from you. The previous version made you register to obtain interrupts and process them. For this first sample program, I went with the interrupting mailbox, and just used blocking reads.

The SPE code

The SPE code is painfully simple:

Listing 1. SPE code to read and write mailboxes
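#include <spu_mfcio.h>

int
main(unsigned long long id, unsigned long long argp) {
        unsigned int i;         /* mailbox entries are 32-bit words */

        while (1) {
                /* Stalls until the PPE puts a word in the inbound mailbox. */
                i = spu_read_in_mbox();
                ++i;
                /* Stalls until the interrupting outbound mailbox is empty. */
                spu_write_out_intr_mbox(i);
        }
        return 0;               /* not reached */
}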

It really is that simple. Both the read and the write are blocking operations: spu_read_in_mbox() stalls until a datum is available in the incoming mailbox, and spu_write_out_intr_mbox() stalls until the outgoing mailbox is empty. If you don't like syntactic sugar, you can do this directly using the spu_readch() and spu_writech() primitives; for instance, the write to the interrupt mailbox could be written spu_writech(SPU_WrOutIntrMbox, i);.

The PPE code

The PPE code is a bit more interesting. Unlike the simpler sample programs which run a process on the SPE, and then accept results, this program needs to run while the SPE process is running. Since spe_context_run is a blocking operation, that means using threads on the PPE.

To run an SPE program "in the background," you must create a new thread for it to run in. The thread's main loop is an utterly trivial function which simply runs a provided SPE context:

Listing 2. Running the context
void *
inc_on_spe(void *context) {
        spe_context_ptr_t c = context;
        unsigned int entry = SPE_DEFAULT_ENTRY;

        spe_context_run(c, &entry, 0, NULL, 0, 0);
        return NULL;
}

This is the familiar spe_context_run call, which can now be set up as a thread using pthread_create. The code to perform the actual communications doesn't look too bad. A caveat here: I've removed the error-checking code for display purposes. However, the error-checking code is the only reason I have a working code sample to present. Check your errors!

Listing 3. PPE communications
context = spe_context_create(0, 0);
spe_program_load(context, &spu_prog);
pthread_create(&inc, NULL, inc_on_spe, context);

i = 0;
while (i < 10) {
        /* Hand the current value to the SPE... */
        spe_in_mbox_write(context, &i, 1, SPE_MBOX_ALL_BLOCKING);
        /* ...and block until the incremented value comes back. */
        spe_out_intr_mbox_read(context, &i, 1, SPE_MBOX_ALL_BLOCKING);
        printf("%d\n", i);
}

Once the thread has been started, the loop is simple: write to the SPE's "in" mailbox (the one where the SPE receives data), then read from the interrupt mailbox. Only the interrupt mailbox supports blocking read operations.
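Since the listings here leave the error checking out, this is a minimal sketch of what the thread-side checks might look like, assuming nothing beyond POSIX threads. The names start_worker, finish_worker, and stand_in_worker are hypothetical; in the real program the thread function would be inc_on_spe from Listing 2. One detail worth remembering is that the pthread calls return an error number directly rather than setting errno.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for inc_on_spe(): it only records that it ran. */
static void *stand_in_worker(void *arg) {
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/* Starts a thread; returns 0 on success or the pthread error number. */
static int start_worker(pthread_t *thread, void *(*fn)(void *), void *arg) {
    int rc = pthread_create(thread, NULL, fn, arg);
    if (rc != 0)
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
    return rc;
}

/* Waits for the thread; returns 0 on success or the pthread error number. */
static int finish_worker(pthread_t thread) {
    int rc = pthread_join(thread, NULL);
    if (rc != 0)
        fprintf(stderr, "pthread_join: %s\n", strerror(rc));
    return rc;
}
```

Because the SPE program in Listing 1 loops forever, a join on its thread would only return once the context stops running for some other reason.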

The other way to do it is with the non-interrupt mailbox. Once again, without error checking, that code looks like this:

Listing 4. PPE communications, revised
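while (i < 10) {
        spe_in_mbox_write(context, &i, 1, SPE_MBOX_ALL_BLOCKING);
        /* spe_out_mbox_read never blocks; poll until one value arrives. */
        while (spe_out_mbox_read(context, &i, 1) < 1)
                usleep(100000);
        printf("%d\n", i);
}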

The setup code is the same. This code uses usleep() to sleep for 100,000 microseconds between queries to avoid busy-looping, but this is rather inefficient: whenever the mailbox has no data on the first check, the caller waits at least 100ms before looking again. That inefficiency is why I showed the interrupt version first; in most contexts, it's better. The only change on the SPE side is replacing spu_write_out_intr_mbox with spu_write_out_mbox.

Real applications

As may seem fairly obvious, no one is going to get much mileage from sending single 32-bit words to the SPE for processing under normal circumstances. Mailboxes are more useful for transmitting instructions to the SPE about data to fetch using DMA. In general, the Cell/B.E. architecture favors letting the SPE, not the PPE, do the DMA fetching. So, for instance, if you're having SPEs process buffers of data, mailboxes would be a good way to send information about how much data has just been plopped in the buffer that was configured in the initial call to spe_context_run(), or even possibly an address at which a new block of data can be found.
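Because each mailbox entry is a single 32-bit word, a 64-bit address has to travel as more than one message. The following is a minimal sketch of one way to describe a buffer that way. The convention (high word, then low word, then a byte count) is invented here for illustration, and buffer_msg, pack_buffer_msg, and unpack_buffer_msg are hypothetical names. On the PPE, the three words would be handed to spe_in_mbox_write; on the SPE, they would be collected with three calls to spu_read_in_mbox.

```c
#include <stdint.h>

/* Hypothetical message layout: three 32-bit mailbox words. */
enum { BUFFER_MSG_WORDS = 3 };

typedef struct {
    uint64_t address;  /* where the data block lives in main memory */
    uint32_t length;   /* number of bytes available there */
} buffer_msg;

/* Sender side: split the description into mailbox-sized words. */
static void pack_buffer_msg(const buffer_msg *msg,
                            uint32_t words[BUFFER_MSG_WORDS]) {
    words[0] = (uint32_t)(msg->address >> 32);          /* high half */
    words[1] = (uint32_t)(msg->address & 0xFFFFFFFFu);  /* low half */
    words[2] = msg->length;
}

/* Receiver side: rebuild the description from the words, in arrival order. */
static buffer_msg unpack_buffer_msg(const uint32_t words[BUFFER_MSG_WORDS]) {
    buffer_msg msg;
    msg.address = ((uint64_t)words[0] << 32) | words[1];
    msg.length = words[2];
    return msg;
}
```

Three words fit within the four-entry inbound mailbox, so under this assumed layout the PPE can queue a whole description without waiting for the SPE to drain anything.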

Signal notification

The signal notification registers (there are two for each SPE) are 32-bit registers, but unlike the mailboxes, they are typically treated as bits rather than values. The PPE has read/write access to the signal notification registers. The SPE has read-only access, but a read by the SPE clears the value. Each of the registers may be set either in overwrite mode (the default in which new values written replace any existing value) or in "OR" mode in which new values are merged with any existing values using a bitwise OR. This might be useful in cases where multiple sources might wish to raise "signals" for the SPE to process.

If there are no signals, reading a signal register stalls the SPE. (The channel count can be queried, the same as with any other channel.) Reading a signal register atomically clears it; the SPE gets whatever flags were set and any incoming flags that were not yet set will be set after the clear operation occurs, so signals cannot be "lost" in this way.

While the incoming mailbox is a queue holding up to four messages, each signal register is a single value. In overwrite mode, it simply holds the most recent value written; in OR mode, it holds all of the bits which have been set in any values written since the last read.
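The two modes and the read-and-clear behavior can be illustrated with a small host-side model. This is a sketch using C11 atomics, not the hardware interface: model_signal and its functions are hypothetical names, and where a real SPE would stall on an empty register, the model just returns 0.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical host-side model of one signal notification register. */
typedef enum { SIGNAL_OVERWRITE, SIGNAL_OR } signal_mode;

typedef struct {
    _Atomic uint32_t bits;
    signal_mode mode;
} model_signal;

static void model_signal_init(model_signal *s, signal_mode mode) {
    atomic_init(&s->bits, 0);
    s->mode = mode;
}

/* A writer raises flags: replace in overwrite mode, merge in OR mode. */
static void model_signal_write(model_signal *s, uint32_t value) {
    if (s->mode == SIGNAL_OR)
        atomic_fetch_or(&s->bits, value);
    else
        atomic_store(&s->bits, value);
}

/* The reader takes the current flags and clears the register in one step,
 * so a flag written concurrently lands either in this read or the next. */
static uint32_t model_signal_read(model_signal *s) {
    return atomic_exchange(&s->bits, 0);
}
```

Writing 0x1 and then 0x4 before a read yields 0x4 in overwrite mode and 0x5 in OR mode; in both cases a second read finds the register cleared.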

It is possible for the PPE to configure memory access permissions allowing SPEs to send each other signals. This, coupled with OR mode, allows many-to-one usage where multiple sources can deliver notifications to the SPE of available workloads.

Although the phrase signal notification makes POSIX-oriented programmers think of interrupts, the signal notification registers do not necessarily trigger interrupts. They (as well as the inbound and outbound mailboxes) can be configured to generate events to which SPE software can react instead of constantly polling, but they can also be used without any kind of interrupt being generated.

Next up: Why is scalar slow?

In the next installment, I'll show you why your scalar code is so slow by introducing you to the SIMD-only architecture of the SPE (no scalar operations; all operations are performed on 16-byte vectors). I'll discuss potential challenges developers face in overcoming this and talk about designing code so that your compiler can make efficient use of the SPE.

