Porting workshop 1: Processor porting strategies

Introducing the 7-part, fast-read series on porting compute-intense applications to the Cell Broadband Engine Architecture


Level: Introductory

John Easton (JKJ@uk.ibm.com), Senior Software Engineer, IBM Global Services
Ingo Meents, Architect for Cell Solutions, Advanced Planning, Simulation, and Optimization, IBM
Olaf Stephan, Server Specialist, DB2, Warehousing BI Solutions, IBM
Horst Zisgen, Program Manager Simulation/Operations Research, IBM
Sei Kato, Research Staff Member, IBM

07 Aug 2007

The seven quick-read parts of this series, "Porting workshop," take you on a real-world trip from strategy and planning, through workload execution, performance tweaking, and optimization, to a solid conclusion: how to port compute-intensive applications to the Cell Broadband Engine™ platform most effectively. In part one, discover the top three strategies for porting.

In this seven-part, quick-read workshop series, taken from the real-world case study whitepaper, "Porting Financial Markets Applications to the Cell Broadband Engine™ Architecture" (see Resources), you can spend minimal time reading each installment and complete the series with a strong basic knowledge of the requirements for effectively porting a compute-intensive application (in this case, a financial market application) to the Cell/B.E. processor.

Upcoming in this series:
  1. Porting strategies (August 7, 2007)
  2. Original code analysis (August 21, 2007)
  3. Initial performance results (September 4, 2007)
  4. Mersenne-Twister (September 18, 2007)
  5. Mixed-precision workloads (October 2, 2007)
  6. Tying it all together (October 16, 2007)
  7. Getting the most performance (November 6, 2007)

A description of the application

The example application modified in this series is a piece of code that prices a European Option, chosen to highlight the benefits of the Cell/B.E. blade. A European Option is a simple financial contract with strict terms and properties that gives the buyer the right to trade a given asset at a specific price on a specific date; in other words, it can be exercised only at the end of its life. In contrast, an American Option may be exercised at any time between its purchase date and the date on which the contract expires.

Because a European Option is exercised on a fixed date, it is simpler to price: the time variability of the American Option has been removed.

A number of different models can be used to price a European Option, depending on the type of asset that underlies it. For instance, an option based on currency is calculated using a slightly different model than an option based on futures. In the case described in this series, the calculation is based on a simple Monte Carlo simulation technique. A large number (200,000,000 in this case) of uniform pseudo-random numbers needs to be generated. These numbers are transformed into normally distributed values via a Box-Müller transform, which in turn yield log-normally distributed asset prices. Using the random numbers generated, the financial model is executed repeatedly to simulate a random walk. The final stage of the analysis is the calculation of the relevant statistics, namely the minimum, maximum, average, and 95 percent quantile for losses.

TOPIC: Porting strategies for the Cell/B.E. platform

The nature of the Cell/B.E. environment makes it an interesting platform to port code to. A number of strategies can be applied, each requiring more effort and delivering a greater return than the last.

Simply recompile

The most basic strategy is a simple recompilation of the existing source code for the Cell/B.E. platform. The IBM XLC compiler generates code for both the Power processing unit (PPU) core of the chip and the eight Synergistic Processing Unit (SPU) cores. The enhancements made to the Linux™ operating system that runs on the Cell/B.E. platform enable the Linux kernel to schedule processes and threads across the multiple functional units of the chip. This strategy appears to offer some potential performance improvement, simply by allowing work to be scheduled across more processing cores, without any significant investment of time and effort. The results, however, do not demonstrate any significant performance improvement, and the results achieved for this workload with the alternative GNU gcc compiler are extremely poor.

The gap between the two compilers is mainly due to the optimizations that XLC provides. For the vast majority of applications, simply recompiling the existing code is unlikely to show any performance improvement. However, that does not mean that this activity should be ignored. Getting the existing code running on the PPU core of the Cell/B.E. chip is the first step in exploiting the hardware, and a simple recompilation is often sufficient to achieve that first step.

Build structural changes

The next strategy for porting to the Cell/B.E. environment is to make some structural changes to the code. Rather than modifying the functional logic, you can create a simple framework to explicitly start separate execution threads on each SPU. By explicitly driving work to the SPUs, you can better exploit the "cluster" of processing cores on the Cell/B.E. blade.

This can be achieved with minimal code reengineering effort. In this case, it involves a basic port of the code that performs the random-number generation and valuation, with the simple goal of using the OS to keep each of the SPU processing cores active in the pricing loop. We do this by splitting the random-number generation across all of the Cell/B.E. cores. We still take advantage of the XLC compiler's ability to generate executable code for both the PPU and SPU elements of the chip. As such, we are not modifying the code that performs the random-number generation; rather, we are creating a simple framework to explicitly manage the threads on each SPU.

This thread management framework sets up the execution contexts for each thread, creates the threads on each of the SPUs, and then waits until all of the threads complete. It is a surprisingly simple yet effective way to better exploit the Cell/B.E. environment. We have found it useful in many different application cases and have successfully reused it on multiple engagements.
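The three steps of such a framework (set up a context per thread, create the threads, wait for them all to complete) can be sketched with plain POSIX threads. This is an illustration under stated assumptions, not the framework from the whitepaper: on the Cell/B.E. SDK each thread would typically load and run an SPE program (for example through libspe2), which is omitted here so that the sketch runs on any POSIX system. The names worker_ctx_t, worker_main, and run_workers are hypothetical, and the per-worker job (averaging uniform random draws) is a placeholder for the real pricing loop.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORKERS 8  /* assumption: one worker per SPU */

/* Hypothetical per-thread execution context. */
typedef struct {
    size_t   count;    /* how many random numbers this worker draws */
    uint64_t seed;     /* distinct nonzero seed per worker */
    double   partial;  /* sum of this worker's draws */
} worker_ctx_t;

/* Placeholder workload: sum `count` uniform draws in [0, 1). */
static void *worker_main(void *arg)
{
    worker_ctx_t *ctx = arg;
    uint64_t x = ctx->seed;
    double sum = 0.0;
    for (size_t i = 0; i < ctx->count; i++) {
        x ^= x >> 12;                      /* xorshift64 stand-in generator */
        x ^= x << 25;
        x ^= x >> 27;
        sum += (double)(x >> 12) / 4503599627370496.0;  /* divide by 2^52 */
    }
    ctx->partial = sum;
    return NULL;
}

/* Splits `total` draws across the workers and returns their mean,
   or -1.0 if total is zero or a thread could not be created. */
double run_workers(size_t total)
{
    pthread_t threads[NUM_WORKERS];
    worker_ctx_t ctx[NUM_WORKERS];
    size_t base = total / NUM_WORKERS;
    size_t extra = total % NUM_WORKERS;
    int started = 0;
    int failed = 0;

    /* Step 1: set up one execution context per thread. */
    for (int i = 0; i < NUM_WORKERS; i++) {
        ctx[i].count = base + ((size_t)i < extra ? 1 : 0);
        ctx[i].seed = 0x9E3779B97F4A7C15ULL * (uint64_t)(i + 1);
        ctx[i].partial = 0.0;
    }

    /* Step 2: create the threads. */
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&threads[i], NULL, worker_main, &ctx[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Step 3: wait until every started thread completes, then combine. */
    double sum = 0.0;
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        sum += ctx[i].partial;
    }

    if (failed || total == 0) return -1.0;
    return sum / (double)total;
}
```

On most toolchains this builds with the -pthread option. Because each context carries everything its thread needs, the worker function never touches shared state, which is what lets the same skeleton be reused when the placeholder loop is swapped for real work.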

Functional code changes

Our third and final strategy is to make functional changes to the code. In doing this, we re-engineer the selected library functions to exploit vectorization on the SPU core and use SPU intrinsics. In theory, this should come close to the theoretical speedup of 70 to 100 times, but this step does require the logic of the code to be changed so that it is vectorized. This exercise requires more effort than the previous two steps, but it delivers the greatest return. These vectorized library functions are then used in conjunction with the threading framework described earlier: rather than running multiple threads based on the original code in parallel, we now run multiple threads based on the new vectorized code.
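To show what changing the logic so that it is vectorized means in practice, the sketch below rewrites a trivial scalar loop to process four floats at a time. It is an illustration, not code from the whitepaper. Because SPU intrinsics need the Cell/B.E. SDK, the sketch uses the GCC/Clang vector_size extension as a portable stand-in for the SPU's 128-bit vector float type; on the SPU itself the multiply-add would typically be written with an intrinsic such as spu_madd. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Four packed floats (GCC/Clang extension); a stand-in for the SPU's
   128-bit vector float type. */
typedef float vec4f __attribute__((vector_size(16)));

/* Original scalar form: y[i] = a * x[i] + b, one element per iteration. */
void scale_shift_scalar(const float *x, float *y, size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + b;
}

/* Vectorized form: four elements per iteration, then a scalar tail
   for the elements left over when n is not a multiple of four. */
void scale_shift_vec(const float *x, float *y, size_t n, float a, float b)
{
    const vec4f va = {a, a, a, a};
    const vec4f vb = {b, b, b, b};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        vec4f vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* load that is safe for any alignment */
        vy = va * vx + vb;               /* element-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);   /* store four results */
    }
    for (; i < n; i++)
        y[i] = a * x[i] + b;
}
```

The two functions should produce the same output for the same input; the difference is that the vectorized loop issues one multiply-add per four elements. Real library functions such as the random-number generator need more than a mechanical rewrite, since branches and data dependencies have to be restructured before four lanes can proceed in lockstep, which is where the extra effort of this strategy comes from.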

Acknowledgements and notes

Many other individuals contributed (both knowingly and unknowingly) to this piece of work. The authors wish to acknowledge their kind contributions. Without their assistance this paper would never have been written.

So why should you read the original whitepaper? The original whitepaper combines the contents of this entire series -- everything's available now. The paper also provides a tidy intro to the Cell/B.E. architecture, and it explains why the processor is important, especially for compute-intensive financial market applications.



Resources

Learn

Get products and technologies
  • The OpenMP API, a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications, supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix and Windows NT platforms.

  • Here is the centerpiece of Cell/B.E. development, the latest Cell/B.E. SDK release, version 2.1.

  • We mentioned the IBM XLC compiler for porting efforts -- it is optimized for the Cell/B.E. processor.

  • The developerWorks Cell Broadband Engine Resource Center is your clearinghouse for Cell/B.E.-related resources, downloads, and news.




About the authors

John Easton has worked for IBM for 18 years in a variety of UNIX technical roles. He worked in Distributed Filesystems development in Austin during the development of the RS/6000 and holds several patents pertaining to security and distributed systems. From 1990 to 2002, he focused on high availability and clustering, becoming the worldwide technical support leader for these areas and part of the Poughkeepsie lab team responsible for architecting and developing the HACMP and HAGEO products. He designed carrier-grade Linux solutions for several major telecommunications companies and represented IBM to the Service Availability Forum. Since 2002, he has been part of IBM's Grid Computing organization and the senior grid architect for EMEA. He is responsible for designing and implementing grid solutions for major companies across Europe. He brings expertise from his previous role, designing mission-critical grid solutions and influencing IBM product strategy in these areas.


Ingo Meents joined IBM nine years ago and currently works as an IT Architect in IBM Global Engineering Solutions (GES). His current focus is to provide IBM's customers with knowledge of the latest Cell/B.E. software technology by consulting, educating, briefing, and creating solutions for this platform. Prior to his work on the Cell/B.E. platform, he was the lead architect for a modeling, simulation, and production planning solution used by IBM's 300mm semiconductor line in Fishkill. Starting as a research student at IBM, Ingo Meents received his doctorate from the University of Clausthal in 2001.


Olaf Stephan joined IBM in 1998 and currently works as an IT Specialist in IBM Global Engineering Solutions (GES). His focus is to provide IBM's customers with knowledge of the latest Cell/B.E. software technology by consulting, educating, briefing, and developing for this platform. Prior to his work on the Cell/B.E. platform, he worked in the areas of data management, data warehousing, business intelligence, and data integration. Olaf holds a Master's degree in Electrical Engineering, specializing in Communications Technology, from the University of Applied Sciences, Koblenz, Germany.


Horst Zisgen has more than 10 years of experience in the application of simulation methods and the development of mathematical models in different areas. In IBM's Global Engineering Solutions (GES) division, he currently leads the development team for a simulation and planning solution that is used by IBM's 300mm manufacturing site in Fishkill as well as by external customers. He is also the European subject matter expert for the GES supply chain offerings. In addition, he regularly gives university lectures in the field of simulation and mathematical modeling, and he is a member of a standardization group concerned with simulation and optimization.


Sei Kato is a research staff member at IBM Research, Tokyo Research Laboratory. He joined IBM in 2002 after receiving his PhD in Mathematical Science from the University of Tokyo. Since joining IBM, he has worked on modeling and simulating the performance of Web systems. His current research covers speeding up financial calculations and large-scale traffic simulation.



