
Debugging Cell Broadband Engine systems

Essential tools and techniques for the Cell BE software developer

Michael Kistler (mkistler@us.ibm.com), Austin Research Laboratory, IBM, Software Group
Michael Kistler is a Senior Software Engineer in the IBM Austin Research Laboratory. He joined IBM in 1982 and has held technical and management positions in MVS, OS/2, and Lotus Notes development. He joined the IBM Austin Research Laboratory in May 2000 and is currently working on simulation technologies for IBM's Power and PowerPC processors and systems. His research interests are parallel and cluster computing, fault tolerance, and full system simulation of high-performance computing systems. Mr. Kistler received his BA in Computer Science from Susquehanna University in 1982 and MS in Computer Science from Syracuse University in 1990.
Sidney Manning (sid@us.ibm.com), Systems & Technology Group, IBM, Software Group
Sid Manning is a Development Programmer with IBM Systems & Technology Group.

Summary:  Software development for new architectures can be an intimidating prospect, but the Cell Broadband Engine™ (Cell BE) SDK 1.1 provides the debugging tools you need to tackle it for the Cell BE architecture. This article describes how to use new versions of the GNU Debugger (GDB) to diagnose problems in both PPU and SPU programs.

Date:  08 Aug 2006
Level:  Intermediate

Programmers face several new challenges in developing applications for the Cell BE processor. With nine cores, multiple Instruction Set Architectures (ISAs), and non-coherent memory, the design of the Cell BE processor presents an environment where debugging is both more important and more complex than in traditional architectures. The Cell BE SDK contains several tools to aid in debugging, the most important of which are the GNU Debugger, or GDB, and the IBM Full-System Simulator for the Cell Broadband Engine, or SystemSim.

GDB is a command-line debugger available as part of the GNU development environment. Because of the Cell BE processor's unique characteristics, GDB has been modified so that there are actually two versions of the debugger -- ppu-gdb for debugging PPE programs, and spu-gdb for debugging SPU programs. The IBM Full-System Simulator for the Cell BE processor can be used alone or in conjunction with GDB to observe and control program execution in fine detail to facilitate problem diagnosis. The simulator lets you view many aspects of the simulated system with an easy-to-use graphical user interface. You can also control many aspects of the simulator using Tcl commands.

This article describes how to begin debugging Cell BE software, starting with a description of how to debug PPE and SPU programs, followed by a brief description of some simulator features for debugging common problems in Cell BE applications. The final section describes how to debug the Linux® kernel for the Cell BE processor running on the IBM Full-System Simulator.

Debugging PPE programs

There are several approaches to debugging programs running on the PPE. If you have access to Cell BE hardware, you can use the standard approach of running the application under GDB. A similar approach is to run the application under GDB inside the simulator. The file system provided with the Cell BE SDK for use inside the simulator already has GDB installed. After the application is running under GDB, you can simply use the standard commands available in GDB to debug the application. Many excellent resources are available on GDB commands and debugging techniques (see Resources).

Another approach that is available both for hardware-based and simulator-based debugging is to run the application under gdbserver. gdbserver is a companion program to GDB that implements the GDB remote serial protocol. This is used to convert GDB into a client/server-style application, where gdbserver launches and controls the application on the target platform, and GDB connects to gdbserver to specify debugging commands. The connection between GDB and gdbserver can be either through a traditional serial line or through TCP/IP.

To exploit this feature, you must have a version of GDB that supports the 64-bit PowerPC® architecture. On 64-bit PowerPC host systems, this version of GDB might be available as part of the standard OS installation. Otherwise, download and build a version of GDB with the appropriate architecture support. Listing 1 illustrates the steps needed to configure, compile, and install the correct version of GDB. Copy the listing into a file and run it as a shell script (for example, sh file). If the wget of the GDB source fails, download the archive manually from one of the many mirror sites, place it in the base directory, and comment out the wget line of the script. By default, the install stage installs into /usr/local/; if you do not have write access to /usr/local, pass the --prefix option to configure to select a different installation directory (for example, configure --target=powerpc64-linux --prefix=/home/sdkuser/local).


Listing 1. The commands to download and build powerpc64-linux-gdb for x86
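#
# Script to download and build gdb for ppc64.
#
    # Directories for the downloaded archive and the out-of-tree build.
    mkdir -p base
    mkdir -p obj
    # Fetch the GDB source (-c resumes a partial download).
    wget -c ftp://ftp.gnu.org/pub/gnu/gdb/gdb-6.3.tar.bz2 -P base
    tar jxvf base/gdb-6.3.tar.bz2
    # Configure, build, and install from the obj directory.
    # cd is used instead of pushd/popd so the script also runs under plain sh.
    cd obj
    ../gdb-6.3/configure --target=powerpc64-linux
    make all
    make install
    cd ..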

Remote debugging using gdbserver can occasionally be useful when running applications on real hardware, but it is especially valuable for debugging applications on the simulator since it enables the use of graphical debuggers such as DDD and Eclipse. To employ this approach, you need a version of gdbserver for the target platform and network connectivity. gdbserver typically comes packaged with GDB and is installed on the file system for the simulated system in the Cell BE SDK. For network connectivity to the simulated system, you must enable bogusnet support in the simulator, which creates a special ethernet device that uses a "call-thru" interface to send and receive packets to the host system. See the simulator documentation for details on how to enable bogusnet (see Resources).

To start a remote debugging session, launch the application on the target platform (either hardware or inside the simulator) using gdbserver as follows:

gdbserver :2101 myprog arg1 arg2

where :2101 is a parameter to gdbserver specifying the TCP/IP port to be used for communication, myprog is the name of the program to be debugged, and arg1 arg2 are the command line arguments to myprog. Then start GDB from the client system (for the simulator this will be the host system of the simulator):

/usr/local/bin/powerpc64-linux-gdbtui myprog

You should have the source and compiled executable version for myprog on the host system. If your program links to dynamic libraries, GDB will attempt to locate these when it attaches to the program. If you are cross-debugging, you will need to direct GDB to the correct versions of the libraries or it will try to load the libraries of the host platform. For the Cell BE SDK 1.1, this is accomplished with the following gdb command:

set solib-absolute-prefix /opt/sce/toolchain-3.2/ppu/sysroot

Then at the (gdb) prompt connect to the server with the command:

target remote 172.20.0.2:2101

Note that the :2101 parameter in this command matches the TCP/IP port parameter used when starting gdbserver. The IP address of the simulator is generally fixed to the 172.20.0.2 address but you can verify this IP address by issuing the ifconfig command in the console window of the simulator. Giving the simulator a symbolic name is useful and can be done by editing the host system's /etc/hosts file as shown here:


Listing 2. The modified /etc/hosts file
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain   localhost
172.20.0.2      mambo
 

Figure 1 shows an example session.


Figure 1. GDB session for myprog attached to gdbserver running on the simulator


Debugging SPU programs

To debug SPU programs, you need to have Version 1.0.1 or later of the Cell BE SDK installed. This is the first version that includes SPU GDB and the necessary kernel and library support. As part of the SDK install process, SPU GDB is installed as spu-gdb on the file system to be used by the system running inside the simulator.

You can use SPU GDB to launch and debug stand-alone SPU programs in much the same way as GDB is used on PPE programs. Stand-alone SPU programs are self-contained applications that execute entirely on the SPU. Listing 3 presents a simple stand-alone SPU program, Listing 4 presents its trivial Makefile, and Figure 2 presents a sample debug session for this program using SPU GDB. (Note: Recursive SPU programs are generally a bad idea due to the limited size of local storage. We've made an exception here since it allows us to illustrate the backtrace command of GDB with a simple example.)


Listing 3. A simple stand-alone SPU program
#include <stdio.h>
#include <spu_intrinsics.h>

unsigned int
fibn(unsigned int n)
{
    if (n <= 2)
        return 1;
    return (fibn (n-1) + fibn (n-2));
}

int main(int argc, char **argv)
{
    unsigned int c;
    c = fibn (8);
    printf ("c=%u\n", c);
    return 0;
}


Listing 4. The simple Makefile
CC=/opt/sce/toolchain-3.2/spu/bin/spu-gcc
simple: simple.c
        $(CC) simple.c -g -o simple
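
The note above points out that recursion is generally a poor fit for the SPU because of the limited size of local storage. As a minimal sketch of the alternative, the function below computes the same values as fibn from Listing 3 with a loop, so its stack usage does not grow with n. The name fibn_iter is hypothetical and is not part of the original program; the recursive version remains the one used for the backtrace illustration.

```c
/* Iterative equivalent of fibn() from Listing 3.
 * Returns 1 for n <= 2, exactly as the recursive version does,
 * but uses a fixed amount of stack regardless of n. */
unsigned int
fibn_iter(unsigned int n)
{
    unsigned int prev = 1;  /* value for n == 1 */
    unsigned int curr = 1;  /* value for n == 2 */
    unsigned int i;

    for (i = 3; i <= n; i++) {
        unsigned int next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

Replacing the call fibn (8) in main with fibn_iter (8) yields the same result, 21.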

Source-level debugging of SPU programs with GDB is similar in nearly all aspects to source-level debugging for the PPE. For example, you can set breakpoints on source lines, display variables by name, display a stack trace, and single-step execution. Figure 2 illustrates the backtrace output for the simple stand-alone SPU program.


Figure 2. SPU GDB session for a stand-alone SPU program

GDB also supports many of the familiar techniques for debugging SPU programs at the assembler code level. For example, you can display register values, examine the contents of memory (which for the SPU means local storage), disassemble sections of the program, and step execution at the machine instruction level. Figure 3 illustrates some of these facilities.


Figure 3. Assembly language features for debugging SPU programs

One point that deserves special mention is the way GDB deals with the SPU registers. Since each SPU register can hold multiple fixed or floating point values of several different sizes, GDB treats each register as a data structure that can be accessed with multiple formats. The GDB ptype command, illustrated in Listing 5, shows the mapping used for SPU registers.


Listing 5. The SPU's registers as a data structure
(gdb) ptype $r80
type = union __gdb_builtin_type_vec128 {
    int128_t uint128;
    float v4_float[4];
    int32_t v4_int32[4];
    int16_t v8_int16[8];
    int8_t v16_int8[16];
}
 

To display or update a specific slot in an SPU register, specify the appropriate field in the data structure, as shown in Listing 6.


Listing 6. Modifying an SPU register
(gdb)  p $r80.uint128
$1 = 0x00018ff000018ff000018ff000018ff0
(gdb)  set $r80.v4_int32[2]=0xbaadf00d
(gdb)  p $r80.uint128
$2 = 0x00018ff000018ff0baadf00d00018ff0
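
To make the slot numbering in Listing 6 concrete, the following host-side sketch models a 128-bit register as 16 bytes with the most significant byte first, which is the order in which GDB prints the uint128 view. The type spu_reg_model and both function names are hypothetical; this is only an illustration of the layout, not code from GDB or the SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of one SPU register: 16 bytes, most significant first. */
typedef struct {
    uint8_t bytes[16];
} spu_reg_model;

/* Store a 32-bit value into word slot 0..3 (the v4_int32[slot] view). */
void
set_word_slot(spu_reg_model *reg, int slot, uint32_t value)
{
    assert(slot >= 0 && slot < 4);
    reg->bytes[slot * 4 + 0] = (uint8_t)(value >> 24);
    reg->bytes[slot * 4 + 1] = (uint8_t)(value >> 16);
    reg->bytes[slot * 4 + 2] = (uint8_t)(value >> 8);
    reg->bytes[slot * 4 + 3] = (uint8_t)value;
}

/* Format the whole register as 32 hex digits (the uint128 view).
 * out must have room for 32 digits plus the terminating NUL. */
void
format_uint128(const spu_reg_model *reg, char out[33])
{
    int i;

    for (i = 0; i < 16; i++)
        sprintf(out + 2 * i, "%02x", (unsigned int)reg->bytes[i]);
}
```

Filling all four slots with 0x00018ff0 and then setting slot 2 to 0xbaadf00d changes only the third group of eight hex digits, which matches the before and after values printed in Listing 6.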
 


Debugging Cell BE applications

Cell BE applications generally contain functions that execute on the PPE as well as on the SPUs. In some cases, the SPU portion of the application can be recast as a stand-alone SPU program so that the previous approach can be used. Otherwise you can attach SPU GDB to the SPU program after it has been created by the PPE portion of the application. To accomplish this, you must set the environment variable SPU_DEBUG_START to 1. This directs the libspe library to create each SPE thread in the stopped state and wait for a signal to start execution. After creating the SPE thread, libspe also prints a message with the thread ID of the SPU program, which you must specify to SPU GDB with the -p command line option. After SPU GDB has attached to the target thread, it signals the thread to begin execution.

The Full-System Simulator for the Cell Broadband Engine (SystemSim) provides only one console window for interaction with the simulated system, which means you have to either start the application in the background, or background it after it starts running. Whatever method you use, access to a shell prompt is required when it is time to attach to the SPE thread.

One of the most common things programmers do is transfer program data from Main Storage to Local Storage using DMA. If a buffer address is wrong, the program might hang or get a bus error. If mailboxes are used to coordinate communication between the PPE and SPE, and one of the two threads falls out of sync for some reason, the program might hang. In both of these instances it is helpful to be able to use a debugger to track down the source of the problem.

Figure 4 illustrates a typical bus error. The program is run and immediately gets a bus error. The environment variable SPU_DEBUG_START is then set and the program is run again; this stops the SPE threads just after they are loaded and gives the programmer a chance to attach to them. After the debugger has attached, the program is allowed to continue to the point of failure. Here the debugger reveals that the SPU program was at line 20 when the error occurred. Since line 20 waits for the completion of the DMA that was initiated at line 18, this suggests that this DMA is the cause of the bus error. In this case, the effective address specified on the DMA, ef_addr, is the address of the parameter supplied when the SPE thread was created. By inspecting the PPE portion of the application that started the SPE thread, shown in Figure 5, you can now identify the source of the problem. The PPE program malloc'ed the memory, but the address passed to the SPE thread looks like a stack address. The PPE source should have passed buffer[i]; the missing array subscript was the problem.
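
The code in Figure 5 is not reproduced here, so the following is only a sketch of the kind of mistake described, under the assumption that buffer is a local array of malloc'ed pointers. The function create_thread_stub is a hypothetical stand-in for the real thread-creation call; it merely records the argument pointer so that the buggy and fixed calls can be compared on any host.

```c
#include <stdlib.h>

#define NUM_BUFFERS  4      /* hypothetical number of SPE threads */
#define BUFFER_BYTES 4096   /* hypothetical buffer size */

/* Hypothetical stand-in for the thread-creation call: it only records the
 * argument pointer that the SPU side would later use as its DMA address. */
static void *recorded_argp;

static void
create_thread_stub(void *argp)
{
    recorded_argp = argp;
}

/* Returns 1 if every thread received its malloc'ed buffer, 0 if any thread
 * received some other address, and -1 if allocation failed. */
int
threads_get_heap_buffers(int use_fix)
{
    char *buffer[NUM_BUFFERS] = { NULL };
    int i;
    int ok = 1;

    for (i = 0; i < NUM_BUFFERS; i++) {
        buffer[i] = malloc(BUFFER_BYTES);
        if (buffer[i] == NULL)
            ok = -1;
    }

    for (i = 0; ok == 1 && i < NUM_BUFFERS; i++) {
        if (use_fix)
            create_thread_stub(buffer[i]);  /* fix: the heap buffer itself */
        else
            create_thread_stub(buffer);     /* bug: the on-stack pointer array */
        if (recorded_argp != (void *)buffer[i])
            ok = 0;
    }

    for (i = 0; i < NUM_BUFFERS; i++)
        free(buffer[i]);                    /* free(NULL) is a no-op */
    return ok;
}
```

With use_fix set to 0, the recorded pointer is the address of the local array rather than any heap buffer, which is consistent with the stack-like address seen in the debugger.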


Figure 4. Debugging a Bus error with spu-gdb

Figure 5. Code causing bus error. Red highlight shows error, yellow shows fix.

Beginning with the Cell BE SDK 1.1, it is also possible to do remote debugging of SPU programs using an approach similar to that described above for PPE programs. Remote debugging of SPU programs is performed with the spu-gdbserver application, which operates in a similar fashion to gdbserver. As with remote debugging for PPE programs, bogusnet should be enabled to allow network connectivity between the host and simulated system.

To start a remote debugging session for the SPU portion of an application, start the application just as you would for local debugging. When it is time to attach the debugger to the SPE thread, use spu-gdbserver in place of spu-gdb. For example, to attach to the SPE thread with ID 375, issue the command:

spu-gdbserver :2101 --attach 375

After spu-gdbserver has attached to the SPE thread, start SPU GDB on the simulator host system and use the target command to attach to the gdbserver that is running on the simulator. Figure 6 shows a GDB session running within the Data Display Debugger (DDD) that has connected to the gdbserver running in the simulator. Note the "target remote mambo:2101" GDB command in the lower window of the figure. In this example, mambo is a symbolic name for the IP address of the system running on the simulator. This name and the corresponding IP address were added to the host system's /etc/hosts file as described above.

Many of the graphical front-ends for GDB can easily be configured to use SPU GDB, enabling full graphical, source-level debugging of SPU programs. Figure 6 shows the use of DDD with SPU GDB. DDD is included with Fedora Core 5. To check whether it is installed, run rpm -q ddd; the result should be similar to:

ddd-3.3.11-5.2

If ddd is not already installed, you can install it from the FC-5 distribution CDs or with a yum install ddd. To run DDD with SPU GDB, use the -debugger argument to specify the name of the debugger program. If you are using the Cell BE SDK 1.1, this name is /opt/sce/toolchain-3.2/spu/bin/spu-gdb. For example:

ddd -debugger /opt/sce/toolchain-3.2/spu/bin/spu-gdb myprog


Figure 6. SPU GDB session running under DDD

Debugging features in SystemSim

The simulator has a vast array of debug facilities. This section explores some that are helpful for SPU debugging.

Detecting SPU stack overflow

The SPU Local Store has no memory protection, and memory access wraps from the end of Local Store back to the beginning. An SPU program is free to write anywhere in Local Store, including its own instruction space. A common problem in SPU programming is the corruption of the SPU program text when the stack area overflows into the program area. This problem typically does not become apparent until some later point in the program execution, when the program attempts to execute code in an area that was corrupted, which typically results in an illegal instruction exception. Even with a debugger, it can be difficult to track down this type of problem because the cause and effect can occur far apart in the program execution. Adding printf calls just moves the failure point around.

The simulator has a feature that monitors selected addresses or regions of Local Store for read or write accesses. This feature can identify stack overflow conditions. To make this easy to use, a Tcl procedure is provided with the simulator that creates the trigger functions to detect stack overflow for a given SPU program. The Tcl procedure is called enable_stack_checking, and is invoked in the simulator command window as follows:

enable_stack_checking [spu_number] [spu_executable_filename]

This procedure uses the nm system utility to determine the area of Local Store that will contain program code and creates trigger functions to trap writes by the SPU into this region (see the SystemSim documentation for further information on trigger functions). Figure 7 shows the simulator console window and command window from a simulator run that employed enable_stack_checking to detect a stack overflow in a Cell BE application.
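
The idea behind such a check can be sketched in a few lines of C. This is a conceptual model only and does not reflect how the simulator implements its triggers: it assumes a 256 KB Local Store, wraps each address the way Local Store accesses wrap, and reports whether a write touches a protected region. The names ls_wrap and write_hits_region and the region bounds are hypothetical.

```c
#include <stdint.h>

#define LS_SIZE 0x40000u   /* 256 KB of Local Store per SPU */

/* Wrap an address the way Local Store accesses wrap around. */
uint32_t
ls_wrap(uint32_t addr)
{
    return addr & (LS_SIZE - 1u);
}

/* Returns 1 if a write of size bytes starting at addr touches the
 * protected region [region_start, region_end) after wrapping. */
int
write_hits_region(uint32_t addr, uint32_t size,
                  uint32_t region_start, uint32_t region_end)
{
    uint32_t i;

    for (i = 0; i < size; i++) {
        uint32_t a = ls_wrap(addr + i);
        if (a >= region_start && a < region_end)
            return 1;
    }
    return 0;
}
```

For a program occupying the first 0x2000 bytes, a 16-byte write at 0x1ff8 is flagged, a write at 0x2000 is not, and a write at 0x3fff8 is flagged because its last eight bytes wrap around to address 0.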

Note: The simulator's method of detecting stack overflow only looks for stack overflow into the text and static data segments and thus does not detect stack overflows into the heap. Another approach (that currently only works using gcc) is to enable stack checking by the compiler. The -fstack-check compile flag results in the insertion of runtime tests which will detect both forms of stack overflow. The program halts in the event of overflow.


Figure 7. Example of the SystemSim's stack overflow detection facility

DMA alignment errors

Another common error in SPU programs is a DMA that specifies an invalid combination of Local Store address, effective address, and transfer size. The alignment rules for DMAs specify that transfers of less than 16 bytes must be "naturally aligned," meaning that the address must be divisible by the size. Transfers of 16 bytes or more must be 16-byte aligned. The size can have a value of 1, 2, 4, 8, 16, or a multiple of 16 bytes to a maximum of 16 KB. In addition, the low-order four bits of the Local Store address must match the low-order four bits of the effective address (in other words, they must have the same alignment within a quadword). Any DMA that violates one of these rules will generate an alignment exception, which is presented to the user as a bus error.
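
The rules above can be written down as a small check that a program might run on its DMA parameters before issuing a transfer. This is a minimal sketch, not part of the SDK: the function name dma_params_valid is hypothetical, and rejecting a size of 0 is an assumption made for simplicity, since the rules as stated list only sizes of 1 byte and up.

```c
#include <stdint.h>

#define DMA_MAX_BYTES (16u * 1024u)   /* maximum transfer size: 16 KB */

/* Returns 1 if the parameters satisfy the DMA size and alignment rules
 * described in the text, 0 otherwise. Size 0 is rejected (assumption). */
int
dma_params_valid(uint32_t ls_addr, uint64_t ea, uint32_t size)
{
    uint32_t align;

    if (size == 0 || size > DMA_MAX_BYTES)
        return 0;

    if (size < 16) {
        /* Small transfers: only 1, 2, 4, or 8 bytes, naturally aligned. */
        if (size != 1 && size != 2 && size != 4 && size != 8)
            return 0;
        align = size;
    } else {
        /* Larger transfers: a multiple of 16 bytes, 16-byte aligned. */
        if (size % 16 != 0)
            return 0;
        align = 16;
    }

    if (ls_addr % align != 0 || ea % align != 0)
        return 0;

    /* Both addresses must have the same offset within a quadword. */
    if ((ls_addr & 0xFu) != (uint32_t)(ea & 0xFu))
        return 0;

    return 1;
}
```

For example, a 128-byte transfer between 0x1000 and 0x20000 passes, a 12-byte transfer fails on size alone, and a 4-byte transfer from Local Store address 0x1004 to effective address 0x20008 fails because the two addresses sit at different offsets within a quadword.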

The simulator checks all these alignment requirements and raises alignment exceptions as necessary to match the behavior of the hardware. But in addition to this, the simulator also generates warning messages to aid the programmer in finding and correcting these problems. Figure 8 illustrates a warning message, "WARNING: 441391050: GET command with illegal size (12) (< 16 and not 0, 1, 2, 4, or 8)," (highlighted in red in the figure) issued by the simulator for a DMA alignment exception.


Figure 8. Warning message from simulator for DMA alignment exception

Kernel debugging

Debugging the Linux kernel can be a difficult task, in part because the kernel is a complex piece of software, but also because the debugger cannot rely on basic OS functions being available or working properly. On the Cell BE SDK, kernel debugging is simplified because the IBM Full-System Simulator, part of the Cell BE SDK, allows a debugger running on the host system to debug a Linux kernel running inside the simulator.

To debug the Linux kernel at the source level, you should build a version of the kernel that contains the debugging information. To do this, you need a version of the Linux kernel source that contains support for the Cell BE platform. The easiest way to get one is to download and install the kernel source RPM from the Linux on CBE-based Systems Web site at the Barcelona Supercomputing Center (BSC; see Resources). The process for building the kernel depends on the host system, installed tools, and other details, and is beyond the scope of this article, which covers only the steps necessary to enable the debugging information. The example commands shown illustrate these steps on a Linux x86 platform with Cell BE SDK 1.1 installed.

To enable debugging information in the kernel, go to the directory where you will build the kernel and type:

ARCH=powerpc PLATFORM=cell CROSS_COMPILE=/opt/sce/toolchain-3.2/ppu/bin/ppu- make xconfig

The make xconfig command brings up the configuration menu shown in Figure 9. Scroll down and click "Kernel hacking" in the left-hand set of options, then select "Compile the kernel with debug info" (DEBUG_INFO) in the right-hand set of options. This option specifies that symbols and source information are retained in the generated binary to allow source-level debugging. In some cases, you might also choose to turn off certain compiler optimizations to make debugging easier. In particular, disabling the -fomit-frame-pointer optimization allows the debugger's backtrace command to work reliably, and changing the optimization level from -Os to -O0 makes it easier for GDB to associate individual instructions with a line in the source code. After making all the desired changes, save the configuration, exit the configuration dialog, and then rebuild the kernel.


Figure 9. The make xconfig screen

Next, start the simulator with the newly built kernel. To ensure that the simulator is using the new kernel, create a symbolic link named vmlinux to the new kernel in the current directory before starting the simulator. To verify that the correct kernel is being used, check the name of the kernel file displayed by the simulator during start-up.

Now you are ready to start a debug session. First, start the simulator and click on the "Service GDB" button; notice that the text of the button changes to "Waiting for GDB... ." In another window, change directories to the location where you compiled vmlinux and start the GDB session with the command

/usr/local/bin/powerpc64-linux-gdbtui vmlinux

then at the (gdb) prompt type

break start_kernel
target remote :2345
continue

You should see something very similar to Figure 10.


Figure 10. The kernel debug session

Now GDB is attached to the simulator and can monitor and control the execution of the Linux kernel. From here it is possible to set additional breakpoints, display variables by name, display processor registers, display a stack trace, single-step execution, and so on.


Conclusion

The Cell BE SDK provides most of the standard GNU debug facilities. Combined with the facilities found in the simulation environment, they give developers the tools needed to debug many problems. The key to simplifying any debugging task is to program defensively from the beginning and use sound software engineering principles.


Resources

Learn

Get products and technologies

Discuss
