Changes in libspe: How libspe2 affects Cell Broadband Engine programming

Learn to do basic SPE process management and communication with libspe2

Jonathan Bartlett (johnnyb@eskimo.com), Director of Technology, New Medio
Jonathan Bartlett is the author of the book Programming from the Ground Up, an introduction to programming using Linux assembly language. He is the lead developer at New Media Worx, responsible for developing Web, video, kiosk, and desktop applications for clients.

Summary:  The standard library that Power Processor Element (PPE) programs use to access and manage Synergistic Processor Elements (SPEs), called libspe, has undergone a major revision. The Cell Broadband Engine™ (Cell/B.E.) SDK 2.1 officially changes the library interface from libspe1 to libspe2. In this article, Jonathan Bartlett introduces the libspe2 concepts and shows how to do basic SPE process management and communication with libspe2.

Date:  17 Jul 2007
Level:  Intermediate

libspe provides an interface for programmers to manage and communicate with SPE processes. The upgrade from libspe1 to libspe2 has involved an entire rethinking of the way that SPE processes are managed from the PPE. In this article, I'll refer to the general library without respect to version number as libspe, and I'll use libspe1 and libspe2 to refer to the specific versions.

Before diving into libspe2, you should note that the purpose of libspe is not to manage physical SPEs. Because the code that uses libspe is user-level code, it should not have the ability to directly modify hardware resources. Instead, the user-level code requests SPE resources from the operating system, which maps each request onto physical hardware in whatever way it deems best. The operating system, for instance, is allowed to schedule multiple contexts onto a single physical SPE by loading and unloading contexts as needed. Each SPE process gets access to the whole SPE for as long as it is running, but the operating system can freely suspend execution at any point (to be resumed later) and schedule another SPE context to run. All of this should be invisible to the user-level software on both the SPE and the PPE, except that excessive context switches on the SPE degrade performance.

In libspe1 the SPE resource the operating system managed was called a thread; in libspe2 this resource is called a context. The change is not simply a terminology change -- instead, the goal of libspe2 is to separate the threading model from the SPE resource management.

Creating and using SPE contexts

The two biggest differences between libspe1 and libspe2 are in the basic execution of SPE code. In libspe1, context creation and execution were handled simultaneously with spe_create_thread. In libspe2 these are separate steps (in fact, as you will see below, there are several steps).

In libspe1, calls to run SPE code were handled asynchronously: spe_create_thread started the SPE code running while the PPE code continued. In libspe2, calls to run SPE code are handled synchronously. This means that to run several SPE programs simultaneously, the program has to create that many threads in the PPE code and then run an SPE program in each of those threads. This might seem like more work, but it is simple to wrap with helper functions and provides quite a bit of flexibility in the way that SPE programs are run.

Here is a libspe1 program that starts a new SPE thread running (enter as ppe_example.c):


Listing 1. Simple libspe1 thread creation
#include <stdlib.h>
#include <stdio.h>
#include <libspe.h>

/* Assumes you have an embedded SPE program under the name test_handle */
extern spe_program_handle_t test_handle; 

int main() {
	/* Create the thread and record the thread ID */
	speid_t spe_id = spe_create_thread(0, &test_handle, NULL, NULL, -1, 0);
	int status;

	/* Check for errors */
	if(spe_id == 0) {
		perror("Unable to create SPE thread");
		exit(1);
	}

	/* Wait for completion */
	spe_wait(spe_id, &status, 0);

	return 0;
}

Here is the SPE code for this program (enter as spe_example.c):


Listing 2. SPE code for thread demonstration
#include <stdio.h>
int main(unsigned long long spe_id, unsigned long long pdata) {
        printf("Hello world!\n"); /* Calls back to PPE */

        return 0;
}  

To build the program, just enter:

spu-gcc spe_example.c -o spe_example
ppu-embedspu test_handle spe_example spe_example_csf.o
ppu-gcc ppe_example.c spe_example_csf.o -lspe -o ppe_example

The PPE part of the program can be rewritten using libspe2 like this (the SPE code can remain as is):


Listing 3. Simple libspe2 thread creation
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <libspe2.h>

/* NOTE -- the prototype is based on the standard pthread thread signature */
void *spe_test_function(void *data);

/* Assumes you have an embedded SPE program under the name test_handle */
extern spe_program_handle_t test_handle;

int main() {
	pthread_t my_thread;
	int retval;
	
	/* Create Thread */
	retval = pthread_create(
		&my_thread, /* Thread object */
		NULL, /* Thread attributes */
		spe_test_function, /* Thread function */
		NULL /* Thread argument */
	);

	/* Check for thread creation errors */
	if(retval) {
		fprintf(stderr, "Error creating thread! Exit code is: %d\n", retval);
		exit(1);
	}

	/* Wait for Thread Completion */
	retval = pthread_join(my_thread, NULL);
	/* Check for thread joining errors */
	if(retval) {
		fprintf(stderr, "Error joining thread! Exit code is: %d\n", retval);
		exit(1);
	}

	return 0;
}

/* NOTE -- the prototype is based on the standard pthread thread signature */
void *spe_test_function(void *data) {
	int retval;
	unsigned int entry_point = SPE_DEFAULT_ENTRY; /* Required for continuing 
      execution, SPE_DEFAULT_ENTRY is the standard starting offset. */
	spe_context_ptr_t my_context;

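	/* Create the SPE Context */
	my_context = spe_context_create(SPE_EVENTS_ENABLE|SPE_MAP_PS, NULL);
	if(my_context == NULL) {
		perror("Unable to create SPE context");
		pthread_exit(NULL);
	}

	/* Load the embedded code into this context */
	if(spe_program_load(my_context, &test_handle) != 0) {
		perror("Unable to load SPE program");
		spe_context_destroy(my_context);
		pthread_exit(NULL);
	}

	/* Run the SPE program until completion */
	do {
		retval = spe_context_run(my_context, &entry_point, 0, NULL, NULL, NULL);
	} while (retval > 0); /* Run until exit or error */
	if(retval < 0) {
		perror("An error occurred running the SPE program");
	}

	/* Release the context now that the program has finished */
	spe_context_destroy(my_context);

	pthread_exit(NULL);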
}

The build process for this program is only slightly different:

spu-gcc spe_example.c -o spe_example
ppu-embedspu test_handle spe_example spe_example_csf.o
ppu-gcc ppe_example.c spe_example_csf.o -lspe2 -lpthread -o ppe_example

As you can see, libspe2 is using pthreads to handle the threading rather than relying on libspe to handle it for you (to learn more about pthreads, see Resources at the end of the article). The procedure that the code is following is this:

  • Create and run the pthread (pthread_create).
  • Create an SPE context (spe_context_create).
  • Load the SPE code into the context (spe_program_load).
  • Run (and re-run) the SPE code until completion (spe_context_run).

Breaking the process up into steps gives the programmer direct control over each stage. Want to use a different thread model? Just replace the pthread calls with whatever threading system you want. Want to run an external program file rather than embedding one in the application? Just use spe_image_open before calling spe_program_load. Want to customize how PPE callbacks work? Just change the loop and flags around spe_context_run. Is all this too complicated? Just wrap the sequence that works best for your program into a helper function.

The whole process is pretty straightforward except for spe_context_run. This is the function that actually causes the code on the SPE to run. It is synchronous, meaning that it suspends the execution of the currently running thread on the PPE while it runs the SPE code. When the SPE code terminates or temporarily stops (for signalling, for example), it resumes the PPE thread. In the simple program shown here, you would do just fine to leave out the do..while loop around spe_context_run. However, in more complicated programs, you might want to keep it and check the return value to see if it is a normal program exit or if it is the result of a stop and signal instruction and needs special handling.

spe_context_run also has some interesting arguments. The first argument is simply a reference to the context that you want to run. The second argument, however, is a pointer to the offset into the program at which you want to begin execution. That's right, you literally tell the SPE which byte of your program you want to start executing. This is easier to manage than it sounds. First of all, the default starting point of all programs is SPE_DEFAULT_ENTRY. Second, since you are passing a pointer to the value, the spe_context_run function modifies the value for you upon return so that it points to the correct re-entry position. Therefore, as long as it is initialized correctly, the entry point is maintained by spe_context_run itself. However, this also gives you the flexibility to restart the SPE program after a signal at any location you wish.

The third argument is a set of run flags. It is normally fine to set this to 0. You can also pass the flag SPE_NO_CALLBACKS, which means that your program will handle all callback functions directly rather than letting the spe_context_run function auto-dispatch them and hide them from you. The fourth and fifth arguments are argp and envp. These are used at SPE program startup to pass arguments and environment information to the SPE program (they correspond to the parameters of the same names in libspe1's spe_create_thread). Finally, the last argument is an optional pointer to a structure of type spe_stop_info_t. For advanced usage, this gives you detailed information about the reason that spe_context_run returned. However, for most applications the return value has sufficient information.

The return code for spe_context_run is either 0 for successful termination of the program, a positive number indicating that a "stop and signal" instruction was executed (the actual value will be the value set by the signalling instruction), or a -1 indicating an error condition (errno will then be set and detailed information will be in the spe_stop_info_t structure, if provided).

PPE callback functions

In libspe1, SPE programs could easily call certain libc functions on the PPE and therefore essentially have access to the operating system's libc. However, the list of functions available for use by the SPE was essentially pre-defined. libspe2 provides an interface to include additional PPE callback functions that are available for the SPE to call.

To create additional callback functions on the PPE, you have to create both the function itself in your PPE code and a stub on the SPE that performs the thunk. When performing the callback, the SPE provides the PPE with a pointer to a single four-byte argument and provides no direct support for returning a value. The usual pattern for calling and returning on the callback is to create a struct to hold both the parameters to the function and the return value. The address of this struct is then what is passed as the data. Remember, though, that the callback function doesn't receive the pointer to the struct itself, but a pointer to that pointer.

The prototype of the callback function is this:

int the_callback_function(void *ls_base, unsigned int data_ptr);

The return value for this is 0 for success, and any other value is considered an error which will cause spe_context_run to return (the return value itself will be in the spe_stop_info_t structure if provided). The ls_base is the effective address (PPE process virtual address) that the SPE's local store has been memory-mapped into. data_ptr is the SPE pointer to the parameter that is passed from the SPE. Because the SPE's address space has been memory-mapped into the PPE's address space, it is very easy to move data back and forth from and to the SPE local store.

Since data_ptr is an SPE pointer, it needs to be treated specially. Remember that SPE pointers refer to the local store address space, not to the PPE process's address space. This means that SPE pointers must all be translated into the PPE address space before dereference. To do that, you need to:

  1. Have ls_base as a char * in order that arithmetic on it will treat it as bytes.
  2. Add the SPE pointer to the ls_base.
  3. Cast the result into the proper pointer type.
  4. Assign or dereference the new pointer.

For pointers to pointers (like data_ptr usually is), the procedure is slightly more difficult since you have to do the procedure I just mentioned for each stage of dereferencing.

Using the memory-mapped local store instead of DMA makes it easy to access the arguments to the callback (which are in the SPE's local store). Accessing the SPE's local store through the memory-mapped interface can be slow because it goes through MMIO instead of DMA. That cost matters for large data sets, but for storing and retrieving small bits of data it is rarely a problem. The memory-mapped interface also has two advantages that are particularly useful for implementing callback functions:

  • Memory-mapped data transfers do not have to keep 16-byte data alignments like DMA transfers do.
  • Memory-mapped data transfers are much easier to perform since it is simply a short address-space conversion followed by a pointer dereference rather than setting up a DMA transfer and waiting for it to finish.

In any case, these examples will access the local store through memory-mapped addresses -- you can feel free to implement them using whichever communication method you wish.

For this example, I will implement a callback function which calculates the length of a string. This is a bit redundant because strlen() is already implemented in the SPE's libc, but it makes a good example because it has a return value, simple arguments, and a simple implementation. Here is what the PPE function will look like:


Listing 4. Example PPE callback function
/* We are using this to make explicit what values are SPE addresses (which we treat as 
 offsets from ls_base) */
typedef unsigned int spe_offset_t; 

/* This is our input/output structure */
typedef struct {
	spe_offset_t str; /* Input param - this is an address on the SPE */
	int length; /* Output param */
} my_strlen_param_t;

int my_strlen(void *ls_base_tmp, unsigned int data_offset) {
	/* Convert the void pointer to a char pointer - this allows us to do byte offsets 
      into the memory-mapped address space */
	char *ls_base = (char *)ls_base_tmp; 

	/* Grab the PPE memory-mapped pointer to the "params" pointer variable */
	/* We sent &params to this function, but data_offset actually has the value of 
      &(&params) */
	spe_offset_t params_offset = *((spe_offset_t *)(ls_base + data_offset));

	/* Now use that to find the value of &params */
	my_strlen_param_t *params = (my_strlen_param_t *)(ls_base + params_offset);

	/* Convert the SPE address of the string into the PPE memory-mapped address */
	char *the_string = ls_base + params->str;

	/* Calculate the length of the string and return it */
	params->length = strlen(the_string);

	/* Successful termination */
	return 0;
}

Now that the actual code is written, I still need to write the stub on the SPE to run this function. According to the libspe2 specs, the SPE needs to have the following assembly language instruction sequence to use the callbacks:

stop 0x2110    #stop tells the SPE to stop running and signal the PPE, and passes the
               #value "0x2110"
               #The "21" part of "0x2110" tells the PPE that you are requesting a
               #callback
               #The "10" part of "0x2110" tells the PPE which callback you are
               #requesting

actual_data:
.word 0        #This is the "data" that will be transmitted to your program.  We added
               #the label actual_data so that values can be loaded into here before
               #signalling the PPE

#Here is where execution continues.  The callback mechanism jumps over the "data" word
#when returning.

You may have noticed that the code references my PPE code with a number, 0x2110. The hex digits 21 tell the signalling mechanism that this is a PPE callback, and the hex digits 10 reference the specific callback that I am trying to call. On the PPE side, each callback must be registered with the callback system on a specific number between 0x00 and 0xff. 0 through 3 are reserved for C99, POSIX, and Linux® functions (these use a secondary "opcode" representing the specific function in the highest-order byte). Callback functions can be registered using spe_callback_handler_register. The first parameter is the function address, the second parameter is the callback number to register, and the last parameter should be SPE_CALLBACK_NEW. Note that while registration is supposed to fail with an error if the number is already in use, this particular feature is a bit buggy. Therefore, if your callback isn't being called, try moving it to a different callback number.

Don't worry, you won't have to write assembly instructions yourself; there is a nice C function for that. To make the call from C, you need to use the C library's __send_to_ppe function. This performs the assembly language magic needed to stop the SPE, signal the PPE, and pass a single argument. __send_to_ppe is called with three arguments:

  • The first one is the signal to send to the PPE. PPE callbacks begin with the hex digits 21 and the next two hex digits refer to the callback number being called.
  • The second argument is used for callbacks that implement multiple functions; it lets the callback function switch functionality based on the value. This value gets plugged into the high-order byte of the data pointer. Most simple applications just use 0 for this argument.
  • The third argument is the data (usually a pointer) to pass to the PPE. Remember, however, that the high-order byte of the third parameter gets replaced with the value of the second parameter when the system actually passes the arguments.

__send_to_ppe makes it fairly simple to create stubs. In the string length example, the SPE stub looks like this:

#include <sys/send_to_ppe.h>

/* We could just use a pointer, but doing it this way will make it consistent in the 
 SPE and PPE */
typedef unsigned int spe_offset_t; 

/* This is our input/output structure */
typedef struct {
	spe_offset_t str; /* Input param - this is an address on the SPE */
	int length; /* Output param */
} my_strlen_param_t;

int my_strlen(char *str) {
	/* Construct parameters */
	my_strlen_param_t params = { (spe_offset_t)str, 0 };

	/* Signal PPE (returns when PPE is finished) */
	__send_to_ppe(0x2110, 0, &params);

	/* Return the result */
	return params.length;
}

Here is the complete listing for a program that uses the callback. First, I have the PPE code (enter as callback_ppe.c):


Listing 5. PPE code illustrating callback functions
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <libspe2.h>
#include <string.h>

/* Define Types */
typedef unsigned int spe_offset_t;
typedef struct {
	spe_offset_t str;
	int length;
} my_strlen_param_t;

/* Function Prototypes */
void *spe_test_function(void *data);
int my_strlen(void *ls_base, spe_offset_t data);

/* SPE Code Reference */
extern spe_program_handle_t test_handle;

/* PPE Callback Function */
int my_strlen(void *ls_base_tmp, unsigned int data) {
	char *ls_base = (char *)ls_base_tmp; 
	spe_offset_t params_offset = *((spe_offset_t *)(ls_base + data));
	my_strlen_param_t *params = (my_strlen_param_t *)(ls_base + params_offset);
	char *the_string = ls_base + params->str;
	params->length = strlen(the_string);
	return 0;
}

/* Code to test our function */
int main() {
	pthread_t my_thread;
	int retval;

	/* Register our callback */
	spe_callback_handler_register(my_strlen, 0x10, SPE_CALLBACK_NEW);
	
	retval = pthread_create(&my_thread, NULL, spe_test_function, NULL);
	if(retval) {
		fprintf(stderr, "Error creating thread! Exit code is: %d\n", retval);
		exit(1);
	}

	retval = pthread_join(my_thread, NULL);
	if(retval) {
		fprintf(stderr, "Error joining thread! Exit code is: %d\n", retval);
		exit(1);
	}

	return 0;
}

void *spe_test_function(void *data) {
	int retval;
	unsigned int entry_point = SPE_DEFAULT_ENTRY; 
	spe_context_ptr_t my_context;

	my_context = spe_context_create(SPE_EVENTS_ENABLE|SPE_MAP_PS, NULL);
	spe_program_load(my_context, &test_handle);
	retval = spe_context_run(my_context, &entry_point, 0, NULL, NULL, NULL);
	if(retval < 0) {
		perror("An error occurred running the SPE program");
	}

	pthread_exit(NULL);
}

Here is the SPE code that demonstrates both the stub and some example code that uses it (enter as callback_spe.c):


Listing 6. Program to create PPE callback stub and use it
#include <stdio.h>
#include <sys/send_to_ppe.h>

/* Type declarations */
typedef unsigned int spe_offset_t; 
typedef struct {
	spe_offset_t str; /* Input param */
	int length; /* Output param */
} my_strlen_param_t;

/* Function declarations */
int my_strlen(char *str);

/* Callback Stub */
int my_strlen(char *str) {
	my_strlen_param_t params = { (spe_offset_t)str, 0 };
	__send_to_ppe(0x2110, 0, &params);
	return params.length;
}

/* Example Usage */
int main(unsigned long long spe_id, unsigned long long argp, unsigned long long envp) {
	char *my_str = "Hello there!";
	printf("The length of '%s' is %d\n", my_str, my_strlen(my_str));
	return 0;
}

To build, just do:

spu-gcc callback_spe.c -o spe_callback 
ppu-embedspu test_handle spe_callback spe_callback_csf.o
ppu-gcc callback_ppe.c spe_callback_csf.o -lspe2 -lpthread -o ppe_callback_example

Problem state functions in libspe2

The problem state functions in libspe2 track the ones in libspe1 pretty closely. One difference is that most libspe2 functions use the return value for status rather than for returning results: they return zero for success, and on error the error code is stored in errno. (The mailbox read and write functions are exceptions and return a count.) Another difference is that libspe1 functions usually take the SPE thread ID (of type speid_t) as the first parameter, while in libspe2 it is usually the SPE context pointer (of type spe_context_ptr_t).

Here are some of the function differences between libspe1 and libspe2:


Table 1. Function differences between libspe1 and libspe2
libspe1 function | libspe2 function | Notes
spe_mfc_* (get, put, getb, etc.) | spe_mfcio_* | These functions operate the same on both systems.
spe_mfc_read_tag_status_* (all, any, or immediate) | spe_mfcio_tag_status_read | The libspe2 function combines the functionality of the three libspe1 functions by having the third parameter be a constant (SPE_TAG_ALL, SPE_TAG_ANY, or SPE_TAG_IMMEDIATE) telling which functionality to use. Also, the fourth parameter in libspe2 is a pointer that receives the tag status, rather than the function returning it.
spe_stat_*_mbox | spe_*_mbox_status | These functions are pretty much direct equivalents.
spe_read_out_mbox | spe_out_mbox_read | The libspe2 function has two additional parameters. The second parameter is a pointer to the results. Also, in libspe2 a single call to this function can return multiple results. The maximum number of results to read is put in the third parameter. This function also returns the number of results read. A negative result indicates an error.
spe_write_in_mbox | spe_in_mbox_write | The libspe2 function is much more flexible. The second parameter is a pointer to an array of values to write, with the number of writes stored in the third parameter. The fourth parameter is the behavior, which can be one of SPE_MBOX_ALL_BLOCKING, which blocks until all messages in this call have been written; SPE_MBOX_ANY_BLOCKING, which blocks until at least one of the messages has been written; or SPE_MBOX_ANY_NONBLOCKING, which only writes what it can without blocking. The function returns the number of messages that it could write under the given behavior, or -1 if there was an error.
spe_get_ls | spe_ls_area_get | The return value for this function is the pointer to the SPE's local store. On error, a NULL is returned and errno is set.

In conclusion

The transition to libspe2 should provide a more flexible environment for managing SPE processes. The 2.1 SDK officially deprecates libspe1, so transitioning soon is important. Also, libspe2 includes additional facilities, such as callback registration, that make SPE programming easier.


Resources

Learn

Get products and technologies

Discuss
