Core partners, Part 3: Transforming Gedae-built portable apps

Watch as an application goes from simulation to DSPs to the Cell/B.E. platform

James Widmore Steed (jsteed@gedae.com), Director of Software Development, Gedae, Inc.
A founding member of Gedae, Steed is the head of product development. Prior to joining Gedae, Steed worked with Gedae at Lockheed Martin, where he was primarily responsible for developing the embeddable library of functions, including testing and creating a database and search utility. Since helping to found Gedae, Steed has been responsible for new product development. His most prominent project is the development of Gedae's new RTL language. Steed earned a computer science degree from Cornell University and a master's in computer science from North Carolina State University.
William Lundgren (wlundgren@gedae.com), President and CEO, Gedae, Inc.
A co-founder of Gedae, Inc., in 2001, William Lundgren is the President and CEO. Prior to founding Gedae, Lundgren started his professional career at Corning Glass Works as a product development physicist. After leaving Corning, Major Lundgren was an active member of the US Air Force Institute of Technology and the USAF Research Laboratories, where he developed new speech and audio-processing technologies. Lundgren moved to RCA Advance Technology Laboratories (subsequently Lockheed Martin), where he spent 16 years leading the development of Gedae and acting as the program manager for eight different projects at ATL. He earned a BS degree in Physics from Rensselaer, and he earned BS and MS degrees in electrical engineering from USAF Institute of Technology. Lundgren is ABD for his PhD in electrical engineering from the University of Pennsylvania.
Kerry B. Barnes (kbarnes@gedae.com), Chief Scientist, Gedae, Inc.
A founding member of Gedae, Inc., Barnes was a Principal Member of the Engineering staff at Lockheed Martin, Advance Technologies Laboratories. At ATL, Barnes was responsible for signal-processing systems software and hardware, single-chip FFT design, design and implementation of a direct digital frequency synthesizer, mapping of algorithms to parallel hardware, OQPSK modulation and demodulation on the Thinking Machine CM2, and development of various software tools and applications. Barnes earned a degree in electrical engineering from Lehigh University and a master's degree in computer and information science from the University of Pennsylvania.

Summary:  This concise study examines the portability of applications developed in Gedae by analyzing the work required to move an example application from a simulation on a PC, to a DSP board (the Mercury Computer System AdapDev system), to a multicore Cell Broadband Engine™ (Cell/B.E.) system. The article illustrates the architecture considerations taken into account when porting the application to each system, the amount of work each port required, and the performance of the application on each system.

Date:  08 Apr 2008
Level:  Intermediate

Introduction

This article takes you on a tour of how portable an application designed with Gedae technology can be. The example takes an application running as a simulation on a PC, ports it to a working application running on a Mercury Computer System AdapDev system, and then ports it to a working application running on a Cell/B.E. system.

Each step shows the work required, with close attention to the architecture considerations that porting demands, and reports the performance of the application on that system.

This article:

  • Shows you the basic application
  • Looks at the simulation
  • Illuminates the multiprocessor and multicore implementations

Introducing the application

The example application involves tracking a model train as it goes in a circular path around its track. The application uses input audio data from four microphones placed in a circle around the track to locate the train in the audio field. Using this location, the application pans and tilts a camera to point at the engine of the train. An illustration of this environment is shown in Figure 1.


Figure 1. The tracking algorithm targeting a train running on a track with four microphones as sensor inputs

The algorithm is based on RADAR technology. A beamformer correlates a linear array of RADAR sensors to identify a target based on a beam of high correlation. In this application, the array is circular, so the high intensity in the correlation of the four channels forms a spot, as shown in Figure 2.


Figure 2. The correlation of the four audio channels forming a spot of high intensity

After spot formation produces this audio map, a detection algorithm identifies the high-intensity peak corresponding to the train. The pan and tilt angles are then computed from the peak location to reposition the camera.
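The article does not give the internals of the spot formation or detection primitives. As a rough illustration only, the following Python sketch assumes a simple delay-and-sum correlation over a grid of candidate positions, with the camera at the center of the microphone circle. The sample rate, speed of sound, function names, and grid are all assumptions made for this example, not part of the Gedae flow graph.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s (assumed)
SAMPLE_RATE = 48000.0   # Hz (assumed)

def mic_positions(radius, count=4):
    """Place `count` microphones evenly on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count))
            for k in range(count)]

def delay_samples(point, mic):
    """Propagation delay from a point to a microphone, in whole samples."""
    distance = math.hypot(point[0] - mic[0], point[1] - mic[1])
    return int(round(distance / SPEED_OF_SOUND * SAMPLE_RATE))

def simulate_channels(source_pos, mics, length, seed=0):
    """Synthesize one delayed copy of a noise source per microphone."""
    rng = random.Random(seed)
    source = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    channels = []
    for mic in mics:
        d = delay_samples(source_pos, mic)
        channels.append([0.0] * d + source[:length - d])
    return channels

def spot_map(channels, mics, grid):
    """Score each candidate grid point by the power of the aligned sum."""
    length = len(channels[0])
    scores = {}
    for point in grid:
        delays = [delay_samples(point, mic) for mic in mics]
        usable = length - max(delays)
        power = 0.0
        for t in range(usable):
            # Undo each channel's delay so a source at `point` adds coherently.
            total = sum(ch[t + d] for ch, d in zip(channels, delays))
            power += total * total
        scores[point] = power
    return scores

def detect(scores):
    """Return the strongest grid point and the pan angle toward it (degrees)."""
    peak = max(scores, key=scores.get)
    pan_deg = math.degrees(math.atan2(peak[1], peak[0]))
    return peak, pan_deg
```

Only at the true source position do all four channels line up, so the summed power there stands out as a spot. A real implementation would also derive a tilt angle from the camera height, which this sketch omits.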

Because this application must continue to work in noisy environments, several approaches are used to reduce jitter and ensure smooth tracking. The input channels are run through low-pass filtering to remove frequencies outside the desired band. In the detection algorithm, several peaks are identified and tested. Feedback is used to monitor the speed and direction of the train to help rule out spurious peaks in the correlation data.
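One way to picture the feedback step is as a gate around the bearing predicted from the train's last known angle and angular speed. The sketch below is a hypothetical illustration of that idea, not the detection logic used in the article's application; the function names, the gate width, and the rule of taking the strongest peak inside the gate are assumptions.

```python
import math

def wrap(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def select_peak(candidates, prev_angle, angular_rate, dt, gate):
    """Choose the strongest peak that is consistent with the train's motion.

    candidates: list of (angle_rad, strength) peaks found in the audio map.
    prev_angle, angular_rate: last accepted bearing and its rate (rad, rad/s).
    gate: largest allowed distance in radians from the predicted bearing.
    Returns (angle, accepted). If no peak falls inside the gate, the
    prediction is returned with accepted=False so the camera keeps moving.
    """
    predicted = wrap(prev_angle + angular_rate * dt)
    in_gate = [(strength, angle) for angle, strength in candidates
               if abs(wrap(angle - predicted)) <= gate]
    if not in_gate:
        return predicted, False
    return max(in_gate)[1], True
```

With this rule, a strong but implausible peak on the far side of the track is ignored in favor of a weaker peak near where the train is expected to be.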

Figure 3 shows how the application looks in Gedae.


Figure 3. How the application looks in Gedae

Notice that the same flow graph was used for the simulation, quad DSP board, and Cell/B.E. processor implementations.


Introducing the simulation

The application was first developed as a simulation. The environment of the train and camera was simulated, and the four channels of audio data for the microphone array were read from files. To show the results of the simulation, a 3D rendering of the scene is presented from the view angle of the camera, as shown in Figure 4.

Gedae trace table

Gedae collects trace information with low overhead in a circular buffer kept on each processor. Accurate clock synchronization between processors and nanosecond resolution enable you to correctly determine the causal relationships between processors and to quickly solve blocking problems that span processors.

The Gedae trace table provides the information needed to optimize performance. A summary timeline for each processor enables you to make load-balancing decisions, while timelines for each primitive enable you to identify slow primitives or primitives needing a granularity increase. The capability to zoom and scroll through the timeline and to collapse and reorder the location of hierarchical boxes in the trace table simplifies navigating the trace table for large graphs. The timeline gives you the information you need to choose the communication method that best meets the throughput and latency requirements of your application. After you optimize performance, you can rerun the graph and measure the improvements.
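To make the circular-buffer idea concrete, here is a small Python sketch of per-processor trace buffers that are merged into a single timeline. It is an illustration under assumed names (`TraceBuffer`, `merge_timelines`) and an assumed event layout, not Gedae's trace format.

```python
from collections import deque

class TraceBuffer:
    """Fixed-capacity trace buffer: the oldest events are overwritten."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)

    def record(self, timestamp_ns, primitive, event):
        """Store one event, for example ("fir", "start") at a given time."""
        self._events.append((timestamp_ns, primitive, event))

    def snapshot(self):
        """Return the retained events, oldest first."""
        return list(self._events)

def merge_timelines(buffers):
    """Merge per-processor buffers into one timeline ordered by timestamp.

    buffers: dict mapping a processor name to its TraceBuffer. The ordering
    is only meaningful if the processors' clocks are synchronized.
    """
    merged = [(ts, proc, prim, ev)
              for proc, buf in buffers.items()
              for ts, prim, ev in buf.snapshot()]
    merged.sort()
    return merged
```

Because the buffer has a fixed size, recording stays cheap and memory use stays bounded, at the cost of keeping only the most recent events.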


Figure 4. The simulation, including a 3D-model rendering of the environment

Using the Gedae simulation of the example application, experiments were done on multiprocessor implementations to prepare for the move to hardware that processes realtime data. Once created, the code of a Gedae application does not have to be changed in order to partition and map it to multiple processors. For the example application, several mappings to virtual processors were used in different configurations. The results were analyzed in the Gedae trace table.


Implementing multiprocessor DSP

To transition the example application to real-world data, it was ported to the Mercury Computer System AdapDev system. The MCS AdapDev system provides an Intel® Pentium host and two quad DSP boards where each DSP is a 500MHz AltiVec processor (see Figure 5).

  • Pentium III development host: 1.26GHz; 1GB SDRAM
  • Quad PowerPC® 500MHz (MCP7410): AltiVec instruction set; 2 MB L2 cache; 256 MB SDRAM; DMA engines
  • RACE++ switched-fabric architecture

Figure 5. The Mercury Computer System AdapDev system

Physical components for the camera, gimbal, microphones, and analog-to-digital converter (ADC) were assembled. The application was altered to remove the artificial audio source and scene rendering, which were replaced with an interface to the ADC (using PCI), the gimbal (using a serial port), and the camera (using USB).

While the sources and sinks were replaced to use real-world data, the algorithms and their coding did not require changing to create a realtime implementation. For the example, a partitioning and mapping scheme was entered into the partition and map partition tables, as shown in Figure 6.


Figure 6. The partitioning and mapping scheme

Notice that the application was mapped to multiple DSP processors without changing the code.

The communication protocols can be tweaked using the transfer table by picking direct schedule access transfers (equivalent to DMA) and removing the blocking of the host-to-DSP transfers.

Additionally, you can use automated strip mining to optimize vectorization and improve cache utilization. These changes, which include both changing the graph to use real-world data and setting the implementation parameters, take about one day of effort. The resulting implementation can process three frames per second, enough to track the train at its maximum speed with few errors and little jitter.
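Gedae performs strip mining automatically, so there is no code to write for it. For readers unfamiliar with the term, the following hand-written Python sketch shows the general transformation: a long vector is processed in fixed-size strips so that the working set stays within a memory budget. The function names and the byte budget are assumptions chosen for illustration.

```python
def max_strip_length(budget_bytes, bytes_per_sample, buffers_per_strip):
    """Largest strip, in samples, whose buffers fit in the memory budget."""
    return budget_bytes // (bytes_per_sample * buffers_per_strip)

def strip_mine(samples, strip_length, kernel):
    """Apply `kernel` to `samples` one strip at a time.

    This is only equivalent to kernel(samples) when the kernel treats each
    sample independently and keeps no state between strips.
    """
    out = []
    for start in range(0, len(samples), strip_length):
        strip = samples[start:start + strip_length]
        out.extend(kernel(strip))
    return out

def scale_by_two(strip):
    """Example element-wise kernel."""
    return [2 * x for x in strip]
```

For example, `strip_mine(list(range(10)), 4, scale_by_two)` processes strips of 4, 4, and 2 samples and returns the same values as applying the kernel to the whole vector at once.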


Partitioning and transferring

One of the most powerful features of Gedae is the ease of partitioning and mapping an application to run on multiple processing elements (not to mention repartitioning and remapping). Because an application is created as a flow graph, it is easy to partition the graph into sections and then map each of those sections to hardware. Partitioning the graph provides information to the Gedae compiler, which helps it plan the application's threads and adjust for the distribution you specified.

When a graph is ready to be distributed, you must first partition it. The partition table lists all the components in the flow graph. Simply select the components that should be broken off and assign them to a new partition name.

Mapping those partitions to target hardware is just as easy. The map partition table lists all the partitions you have just made in the partition table. For each partition, select from a predefined list the processor number to which that partition will be mapped.

Many transfer methods can be made available in Gedae, from DMA to shared memory to processor-specific protocols. Through the transfer table, you can easily select a transfer method for each communication. The transfer methods are fully parameterized, allowing for precise specification of buffer sizes and other parameters so that the most efficient transfer can be used.
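The three tables are edited in Gedae's GUI, and the article does not show their stored form. Purely as a mental model, they can be pictured as lookup tables layered on an unchanged flow graph, as in this Python sketch. Every component, partition, and processor name here is hypothetical.

```python
# Flow graph edges (producer, consumer). The graph itself never changes.
EDGES = [
    ("adc_source", "lowpass_0"),
    ("adc_source", "lowpass_1"),
    ("lowpass_0", "spot_formation"),
    ("lowpass_1", "spot_formation"),
    ("spot_formation", "detection"),
]

# Partition table: component -> partition name.
PARTITION_TABLE = {
    "adc_source": "host_io",
    "lowpass_0": "filter_a",
    "lowpass_1": "filter_b",
    "spot_formation": "correlate",
    "detection": "host_io",
}

# Map partition table: partition name -> processor number.
MAP_TABLE = {"host_io": 0, "filter_a": 1, "filter_b": 2, "correlate": 3}

def placement(partition_table, map_table):
    """Resolve which processor each component runs on."""
    return {comp: map_table[part] for comp, part in partition_table.items()}

def transfers_needed(edges, partition_table):
    """List the partition pairs that some flow graph edge crosses.

    Each pair is a communication for which a transfer method can be chosen.
    """
    pairs = []
    for src, dst in edges:
        pair = (partition_table[src], partition_table[dst])
        if pair[0] != pair[1] and pair not in pairs:
            pairs.append(pair)
    return pairs
```

In this model, remapping means editing `MAP_TABLE` and repartitioning means editing `PARTITION_TABLE`, while `EDGES` and the components stay as they are. That mirrors the article's point that redistribution does not require code changes.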


Reviewing multicore implementation

You are probably already familiar with the Cell/B.E. architecture, so here's a little refresher:

  • Power Processing Element (PPE)
  • Eight Synergistic Processing Elements (SPEs): 128-bit SIMD instruction set; DMA engines; 256 KB local storage (LS)
  • System memory
  • Element interconnect bus (EIB): Over 200 GBps

Figure 7. The Cell/B.E. architecture

To illustrate support for the Cell/B.E. platform, this application was ported to the Sony PlayStation 3 (PS3). The Cell/B.E. processor in the PS3 provides a dual-threaded PPE core and six enabled SPEs. The SPEs are very efficient vector processors, but they have strict memory restrictions, including only 256 KB of local storage and no cache. Programming a Cell/B.E. system by hand requires careful management and planning of memory and data movement between the SPEs.

Gedae addresses memory management and data movement directly, and automating them simplifies development for a Cell/B.E. system. After the application is altered to use a USB-based ADC (the PS3 does not have a PCI slot), it can be moved easily. Optimizing it for the multicore architecture takes about two hours.

To optimize the application, the compute-intensive signal-processing portion of the application is partitioned for the six SPEs. The memory footprint of the program and data is taken into account during this process using the Schedule Parameters dialog to analyze the size of the threads that will be created for each partition. To reduce the size of the memory footprint, the automated strip-mining capability is used, allowing a set of audio vectors to be processed independently on each SPE instead of simultaneously. Additionally, a primitive that performs a column-wise sum of a matrix is identified as pushing the thread memory size over the limit. To fix the issue, the primitive is replaced with one that integrates a series of row vectors.
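The primitive swap described above works because a column-wise sum can be accumulated one row at a time, so the whole matrix never has to sit in local storage. The following Python sketch contrasts the two forms. The function names are made up for this example and are not Gedae primitives.

```python
def column_sum_full(matrix):
    """Column-wise sum that needs the whole matrix in memory at once."""
    return [sum(column) for column in zip(*matrix)]

def column_sum_streaming(rows):
    """Same result, integrating row vectors as they arrive.

    Only one accumulator the length of a row is kept, regardless of how
    many rows are processed.
    """
    accumulator = None
    for row in rows:
        if accumulator is None:
            accumulator = list(row)
        else:
            for i, value in enumerate(row):
                accumulator[i] += value
    return accumulator
```

For a matrix of R rows and C columns, the first form holds R × C values while the second holds C values plus the current row, which is the kind of reduction that helps a thread fit in a 256 KB local store.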

The Gedae trace table is used to analyze the performance. During this process, one primitive can be identified as being slow, and you can recode it to use a unity stride. Based on the processor load, you can also alter the distribution of the work. After two hours of optimization, the final implementation uses four SPEs for most of the preprocessing (one SPE per audio channel), including the band filtering of the frequency spectrum. The other two SPEs combine the data in the correlation calculation of the spot formation. The PPE performs the detection algorithm and interfaces with the I/O devices. With this implementation, the application is able to process almost 15 frames each second on the Cell/B.E. system, providing much smoother tracking of the train.


Conclusion

This article started out with a compute-intensive application that was ported to these systems:

System | Processors | Sensors | Output | UI
PC-based simulation | 1 | Datafile of 4 recorded channels | Constellation display | Rendered scene
Multiprocessor DSP board (MCS AdapDev) | 4 500MHz PowerPC AltiVec, 1 Pentium | ICS 610 ADC PCI board, 4 microphones | Directed Perception D46-17 Pan-Tilt Unit | Matrix Vision BlueFOX USB camera displayed using video for Gedae
Multicore system (PS3-Cell/B.E.) | 1 PPE, 6 SPEs | M-Audio Quattro USB device, 4 microphones | Directed Perception D46-17 Pan-Tilt Unit | Matrix Vision BlueFOX USB camera displayed using video for Gedae

Here is the breakdown of the amount of time these tasks took:

  • Simulation (4 weeks programmer time, yielding no change in performance)
  • DSP (6 hours programmer time, yielding a frame rate of 3 Hz)
  • PS3 (4 hours programmer time, yielding a frame rate of almost 15 Hz)

By following this example, you can reach these conclusions:

  • Gedae helps you easily move the application to new hardware.
  • Changes to the implementation are handled by automation and simple GUIs, not changes to code.
  • With only minimal effort, you can achieve relatively high performance gains.
