Core partners, Part 2: Using DDT to clean up Cell/B.E. app bugs

How to use Allinea's Distributed Debugging Tool to eliminate errors in Cell/B.E. applications

Level: Introductory

David Lecomber (david@allinea.com), CTO, Allinea Software

05 Feb 2008

Allinea Software's Distributed Debugging Tool (DDT) provides an easy-to-use, capable debugger that is able to debug complete Cell Broadband Engine applications, including multiple threads within a single Cell/B.E. processor and clusters of Cell/B.E. processors.

Introduction

The Cell Broadband Engine (Cell/B.E.) architecture is the result of collaboration among IBM, Sony, and Toshiba to design a high-performance and power-efficient processor that can drive applications in fields as diverse as gaming, HDTV, and supercomputing. The Cell/B.E. multiprocessor consists of:

  • The Power Processing Element (PPE) that contains a 64-bit Power Architecture core: the Power Processing Unit (PPU).
  • Eight Synergistic Processor Elements (SPEs), which are specialized coprocessor units, each containing a Synergistic Processing Unit (SPU). A coherent on-chip bus, the Element Interconnect Bus (EIB), carries communication between the elements.

While the PPE is a familiar processor with access to at least 256MB of RAM (global memory), each SPE has a large register file (128 registers, each 128 bits wide) and its own 256KB local store. The SPEs access global memory through Direct Memory Access (DMA) transfers over the on-chip bus, or exchange messages with the PPE through a mailbox mechanism.

Programming Cell/B.E. processors

Standard PowerPC™ programs can run unmodified, using just the PPE, on Cell/B.E. systems such as the IBM QS20 BladeCenter® or the Sony PS3, both of which run the Fedora or Yellow Dog Linux® distributions. However, high-performance computing users will want to exploit the full capability of the SPUs. The options fit broadly into three models:

  • Transparent model: Using libraries that have been optimized for the Cell/B.E. architecture. Many high-performance computing (HPC) codes use proprietary or open source libraries for the computational kernel of the application. Where these libraries have been ported to the Cell/B.E. platform, simply relinking lets an application use them.
  • Intermediate model: Using advanced languages or compiler directives. A number of third parties provide compilers and libraries that optimize the transfer of data and the division of computation between the SPUs and the PPU. This can involve rewriting the computational kernel of an application or just adding compiler directives, much like parallelizing with OpenMP.
  • Direct model: Writing application code for both the SPUs and the PPU. The application itself is responsible for data transfer and synchronization. This model is typically used to control the Cell/B.E. processor directly, to hand-optimize the performance of the computational kernel, or for code that does not fit the patterns that the advanced languages support.

Feeling the need to debug

Inevitably, where programming exists, there is debugging (or at least the need to debug). That's where Allinea Software's DDT (now in Version 2.1.1) comes in. DDT is a debugger for multithreaded and parallel applications. As with programming for the Cell/B.E. processor, debugging for it can differ from your previous experience. Hunting bugs by placing print statements in the code is popular, but it leads to a repeated modify-compile-run cycle that can be slow to produce results. The method also works poorly in multithreaded environments, particularly on the Cell/B.E. processor, where a print issued by an SPU requires communication between the SPU and the PPU to produce the output; that extra communication can change the behavior of the bug.

With graphical debugging tools like DDT, you can control the progress of a program at runtime, and you can see all the values of its variables, its memory, and the current execution stack. This capability provides far more information and flexibility than print statements do, making bug repair a quicker and less frustrating task.

The debugging need is greatest for applications written in the direct model. At this level, the ability to debug SPU and PPU code concurrently is essential: you need to be able to see variables and memory across every part of the processor.

For the intermediate model, in which you use the advanced compiler tools and languages for Cell/B.E. applications, a full Cell/B.E. debugger can give a better view of the program state than a standard debugger. For example, it lets you see the active SPE threads and check their progress while monitoring the PPU.

DDT is designed to debug programs with high degrees of parallelism, sometimes thousands of processors simultaneously running the same application, but it is also effective on smaller clusters of Linux computers. Its inherently parallel model of execution and its intuitive process controls make it a good choice for debugging Cell/B.E. applications, and its sophisticated memory debugging makes it easier to find common heap memory errors and detect illegal reads.

Using DDT to debug

DDT is a source-level debugger. It lets you view source files and examine the state of individual threads as they progress through execution.

Entree to Cell/B.E. applications

Cell/B.E. applications consist of two components:

  • Code for the PPU
  • Code for the SPUs

The SPU code is usually embedded in the PPU binary at link time using special tools in the IBM SDK for Multicore Acceleration 3.0 (Cell/B.E. SDK 3.0). When DDT loads an application for debugging, it automatically detects both the PPU and SPU code, and it finds the source files for both sections.

Initially the program stops at main(), the entry point to the code. Then, as each SPU thread is started, DDT can stop the process so that you can see how the thread was created. DDT can also stop at the end of an SPU thread, which makes the reason for termination easy to determine. Tracking thread termination is important because many errors inside SPU threads cause immediate, and often puzzling, thread termination. Examples include exhausting the stack or running out of space in the local store.

Seeing the at-a-glance program state

When a program is first paused in DDT, several parts of the interface can help you understand what the code is doing.

The source code is highlighted to show which lines have threads on them. Hovering the cursor over a highlighted line identifies the threads that are there.

The Parallel Stack View is another very popular feature for showing divergent behavior across processors and threads. This window displays the call stacks of each thread in a single tree view, optionally including every thread of every process for a program with multiple Cell/B.E. processors.

Threads that share a common stack (meaning they are in the same part of the program) appear on the same branch of the tree. Each branch shows its thread count, so you can find threads that are behaving abnormally just by looking for branches that contain only one thread.

Controlling the PPU and SPUs

The SPU and PPU threads are shown across the top of the DDT main window, indicated by the PPU and SPU acronyms beneath the thread numbers. Clicking on a thread to select it makes that thread current, which means that the variables shown are those present on that particular thread.

You can add breakpoints to the program to stop PPU and SPU threads when a particular point in the program is reached. DDT also allows you to set breakpoints that apply to all or just one of the threads.

DDT can control all the threads or just one at a time, which lets you focus on an individual PPU or SPU thread (by stepping through one line of code at a time while the other threads are paused). Following individual threads through a computation is one of the most useful capabilities of a debugger.

Cell/B.E. code often exhibits SIMD (Single Instruction Multiple Data) parallelism, with each SPU thread executing the same part of the code at the same time. In these situations, it can be instructive to step all the threads forward together, one line at a time; the point where the threads diverge can reveal the source of the problem. DDT supports this with a special mode that you select by toggling the step threads together button.

Examining variables

Each SPE and the PPE has its own addressable memory region. For an SPE, the local store holds the stack (with its automatic variables) and the global and static variables. It also contains the local heap, from which memory is allocated by calling the malloc() function within the SPE.

There are some traps in SPU programming. A global variable is global only within a single SPE: each SPE and the PPE has its own separate copy, unlike conventional multithreaded programming in which threads share the same memory. DDT can help locate problems like this with tools such as the Cross Thread Comparison window, which examines the variables with the same identifier across every SPU and PPU thread. This is ideal for spotting rogue values, and because common values are grouped together, you can focus on the differences.

If an individual SPU or PPU thread is selected, the Local Variables tab shows its variables. Selecting lines in the source code, by dragging the current line markers, also evaluates the variables on those lines in the Current Line tab. For user-defined data types (structures, classes, and derived types), click a variable to expand its contents.

Variables can also be dropped into the Evaluations box for more detailed analysis. You can follow pointers from the evaluation window by right-clicking and selecting dereference.

Viewing low level Cell/B.E. status

Additional tabs in DDT show the current status of DMA requests and mailbox events, as reported by the kernel. DDT can also show other important data, such as the current contents of the mailboxes used to communicate between the PPU and SPUs. Each SPE has three mailboxes: two for SPU-to-PPU communication (one interrupts the PPU, while the other must be polled) and one for PPU-to-SPU communication (polled).

Conclusion

This quick overview of DDT should show how its capabilities can make debugging Cell/B.E. applications easier. DDT for the Cell/B.E. platform is available now for the IBM QS20 BladeCenter and the Sony PS3 running Fedora Core 6 and the Cell SDK 2.1. Watch for a version of DDT that works with SDK 3.0 soon.

About the author

David's history in High Performance Computing began with the Oxford BSP group in 1993, working on an early alternative model for parallel programming to the emerging but complex MPI standard. He obtained a DPhil in Parallel Computing, producing work on the simulation of shared-memory systems using, and formal semantics for, distributed-memory clusters. He subsequently continued to work at Oxford in post-doctoral and teaching positions, researching parallel libraries and languages. After two years developing software for high-volume online services on clusters of Java computers, he returned to the fold of High Performance Computing at Allinea, researching the development tools needed for parallel programming.

IBM, BladeCenter, and PowerPC are trademarks of IBM Corporation in the United States, other countries, or both. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.