Porting workshop, Part 7: Getting the most performance

Part 7 of 7: Evaluating the performance data from porting compute-intensive applications to the Cell Broadband Engine architecture

John Easton (JKJ@uk.ibm.com), Infrastructure Architect, Emerging Technologies, IBM
John is currently leading a worldwide emerging technologies team within IBM Systems and Technology Group. He has several roles competing for his time, all of which revolve around advising organizations on how best to exploit new technologies. John has been working for IBM for over 20 years in a variety of technical roles. He worked in Distributed Systems Development in Austin before the launch of the RS/6000, and he holds several patents in the areas of security and systems software. Before taking his current role, he was the European technical leader for grid computing.
Ingo Meents (MEENTS@de.ibm.com), Architect for Cell Solutions, Advanced Planning, Simulation, and Optimization, IBM
Ingo Meents joined IBM nine years ago and works currently as an IT Architect in IBM Global Engineering Solutions (GES). His current focus is to provide IBM customers with knowledge of the latest Cell/B.E. software technology by consulting, educating, briefing, and creating solutions for this platform. Before his work on the Cell/B.E. platform, he was lead architect for a modeling, simulation, and production planning solution used by the IBM 300mm semiconductor line in Fishkill. Starting as a research student at IBM, Ingo Meents received his doctor's degree from the University of Clausthal in 2001.
Olaf Stephan (STEPHANO@de.ibm.com), Server Specialist, DB2, Warehousing BI Solutions, IBM
Olaf Stephan joined IBM in 1998 and currently works as an IT Specialist in IBM Global Engineering Solutions (GES). His focus is to provide IBM customers with knowledge of the latest Cell/B.E. software technology by consulting, educating, briefing, and development for this platform. Before his work on the Cell/B.E. platform, he worked in the areas of data management, data warehousing, business intelligence, and data integration. Olaf holds a Masters degree in Electrical Engineering, specializing in Communications Technology, from the University of Applied Sciences, Koblenz, Germany.
Horst Zisgen (horst_zisgen@de.ibm.com), Program Manager Simulation/Operations Research, IBM
Horst has over 10 years of experience in the application of simulation methods and the development of mathematical models in different areas. He is currently leading a development team in IBM Global Engineering Solutions (GES) that is working on a simulation and planning solution used by IBM 300mm manufacturing in Fishkill and by external customers as well. Horst is also the European subject matter expert for the GES supply chain offerings. In addition, Horst regularly gives lectures at universities about simulation and mathematical modeling. Horst is a member of a standardization group for simulation and optimization.
Sei Kato (SEIKATO@jp.ibm.com), Research Staff Member, IBM
Sei Kato is a staff member in IBM Research, Tokyo Research Laboratory. He joined IBM in 2002 after receiving his PhD in Mathematical Science from the University of Tokyo. After joining IBM, Sei has worked on modeling and simulating the performance of Web systems. He is currently working on the acceleration of financial calculations and on large-scale traffic simulations.

Summary:  The seven quick-read parts of this "Porting workshop" series take you on a real-world trip from strategy and planning through workload execution, performance tweaking, optimization, and a solid conclusion. The series describes how to most effectively port compute-intensive applications to the Cell Broadband Engine platform. In part seven, the authors evaluate the performance data to date.

Date:  06 Nov 2007
Level:  Introductory

This seven-part, quick-read workshop series is taken from the real-world case study whitepaper, "Porting Financial Markets Applications to the Cell Broadband Engine Architecture" (written by John Easton, Ingo Meents, Olaf Stephan, Horst Zisgen, and Sei Kato, IBM Systems and Technology Group, June 2007; see Resources). You can probably spend less than 10 minutes reading each installment and come out at the end with a strong basic knowledge of the requirements for effectively porting a compute-intensive application (in this case, a financial market application) to the Cell/B.E. processor.

Editor's note: The performance results in this series were obtained using Versions 1 and 2.1 of the Cell Broadband Engine Software Developer Kit (SDK). The current version of the SDK, the IBM Software Development Kit for Multicore Acceleration, Version 3.0, has recently become available and offers many enhancements in functionality, ease of use, and performance over the earlier versions. While the results documented in this article are correct for the earlier versions of the SDK, different results will be obtained with SDK 3.0. Watch for updates to the articles in this series that will describe the latest performance improvements obtained using SDK 3.0.

Workshop series

Part 1: Porting strategies (developerWorks, August 2007)

Part 2: Analysis of the original code (developerWorks, August 2007)

Part 3: Initial performance results (developerWorks, September 2007)

Part 4: Mersenne-Twister (developerWorks, September 2007)

Part 5: Mixed-precision workloads (developerWorks, September 2007)

Part 6: Tying it all together (developerWorks, October 2007)

Part 7: Getting the most performance (developerWorks, November 2007)

Introducing the application

To highlight the benefits of the Cell/B.E. blade, the example application modified in this article is a piece of code used to price a European Option. A European Option is a simple financial contract with strict terms and properties that gives the buyer the right to trade a given asset at a specific price on a specific date. It can be exercised only at the end of its life. By contrast, an American Option can be exercised at any time between its purchase date and the date at which the contract expires. Because a European Option is exercised on a fixed date, the time variability of the American Option is removed and the calculation is simpler to perform.

You can use several different models to price a European Option, depending on the type of asset that underlies it. For example, an option based on currency is calculated using a slightly different model than an option based on futures. In the example described in this series, the calculation is based on a simple Monte Carlo simulation technique. You will generate 200,000,000 uniform, pseudo-random numbers. These numbers are transformed to normally distributed values using a Box-Müller transform, which gives the simulated asset prices their log-normal distribution. Using the random numbers generated, you will execute the financial model repeatedly to simulate a random walk. The final stage of the analysis will be the calculation of the relevant statistics, such as the minimum, maximum, and average and the 95 percent quantile for losses.


Getting the most performance out of Cell/B.E. technology

This work has led to a few observations to share with those readers who would like to perform a similar porting exercise.

Offloading for performance enhancement

The performance data shows that a single SPU is capable of delivering almost half the performance of an Intel® general-purpose processing core. The first recommendation, therefore, is to exploit the SPUs by offloading as much of the computation onto them as possible.

Exploiting the SIMD

To really make the SPU perform well, exploiting the SIMD nature of the SPU is vital. Although the compiler technology, especially that provided by XLC, provides some capability to auto-SIMDize the input program code, the very flexibility of a language such as C means that this technology does not always make a particularly good approximation of the best SIMD code.

As such, to take full advantage of the SIMD nature of the SPU, it might be more appropriate to write the SIMD code yourself rather than relying on the compiler to do it. The key to doing this is to make sure that the way you have chosen to rewrite the algorithm to exploit SIMD is optimal for the given application and the input data. In certain situations, such as the case where the source code is heavily templated C++, or where the code makes significant use of object-oriented constructs, you might find that starting from scratch is a much quicker way to implement application code.
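
To make the idea of hand-written SIMD concrete, here is a small sketch of the pattern: the loop is restructured so that each iteration works on four single-precision values at once, with a scalar loop for the leftover elements. It is an illustration only. It uses the GCC/Clang vector_size extension as a portable stand-in so that it can be compiled on a workstation; on an SPU you would more likely use the SDK's vector types and intrinsics (for example spu_splats and spu_madd), which are not shown here. The function names axpy_scalar and axpy_simd are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four single-precision lanes in 16 bytes, the width of an SPU register.
   vector_size is a GCC/Clang extension, used here as a stand-in for the
   SPU-specific vector types. */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar reference: y[i] = a * x[i] + y[i]. */
void axpy_scalar(float a, const float *x, float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hand-vectorized version: four elements per iteration. */
void axpy_simd(float a, const float *x, float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    const v4sf va = { a, a, a, a }; /* broadcast the scalar to all lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4sf vx, vy;
        /* memcpy avoids any alignment requirement on x and y. */
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;          /* one multiply-add across four lanes */
        memcpy(y + i, &vy, sizeof vy);
    }

    /* Scalar tail for the elements that do not fill a whole vector. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The design choice that matters here is the data layout: the inputs are contiguous arrays of floats, so four neighboring elements can be loaded into one vector. Code that keeps each value inside a separate object usually has to be reorganized before a loop like this becomes possible, which is one reason a rewrite from scratch can be quicker than adapting heavily object-oriented code.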


Drawing some conclusions

Many organizations are investigating the use of acceleration technologies to improve the performance of their algorithms, because their abilities to exploit ever-increasing numbers of general-purpose processors must be reconciled against the challenges that they face with space, power, and cooling issues. So, how is the Cell/B.E. system positioned against these other accelerator technologies?

At one end of the spectrum are the general-purpose processors from Intel and AMD that make up the majority of the computational infrastructures used by these organizations. The huge numbers of systems based on these processors give them several advantages over other technologies. There is a large supply of professionals skilled in working with them, which leads to lower skills costs, a lot of application development tooling, and therefore a large base of ISVs supporting these platforms. Coupled with the relative ease of porting code to these platforms, this means that the majority of application codes run by the business run on systems based on these processors.

These general-purpose processors are very much regarded as jack-of-all-trades devices. They do suffer, though, in that they deliver relatively low performance for the area of the chip, and they do this with relatively high power consumption per computation.

At the other end of the spectrum are a number of what can only be described as esoteric technologies. These include Field Programmable Gate Arrays (FPGAs), Complex Programmable Logic Devices (CPLDs), and Graphics Processing Units (GPUs). Though there is a wide range of different devices here, they all typically offer high performance for their chip area and consume much less power per computation than the general-purpose processors. Given this, these devices are highly efficient at addressing a very narrow task scope.

You might ask why their adoption has not been wider. This is basically because they are very much the opposite of the general-purpose devices: the skills to program them are rare and relatively expensive. Combined with the lack of application development tooling, this makes creating and porting code for these platforms quite hard.

The performance capabilities of these platforms are well understood, but realizing this performance is proving to be a challenge for many organizations. Assuming that you can find the skills to do this, the porting process is generally both slow and costly. This in turn leads to a minuscule ISV base for these platforms today.

So how does the Cell Broadband Engine technology compare? You might expect that the nature of the Cell Broadband Engine environment means that it would be closer to the FPGAs and GPUs than the general-purpose processors. This is not the case. The fact that the Cell/B.E. environment runs a standard Linux® operating system and has a growing range of development tools, ISV applications, and standard libraries means that it is actually much closer to the general-purpose processors than might be expected. Working with other customers in this space has simply confirmed this. They all regard the Cell/B.E. platform as significantly easier to port code to than FPGAs and GPUs.

In an industry where time-to-market is a key concern, this places the Cell/B.E. platform in a unique position: for the workloads that can exploit its capabilities, the Cell/B.E. environment offers excellent performance advantages at only a slightly longer time-to-market than the general-purpose devices.

Regardless of the technology used, this series has shown that code optimization is critical to get the best out of any chosen platform. The days of simply recompiling application code to fully exploit a new processor are over. A simple recompile will remain a second-class approach when it comes to effective porting. In the future, programmers will have to use increasingly advanced programming techniques that take into account the management of both code and data parallelism to get the best out of any architecture. The abilities of programmers to adapt to these new technologies will be critical to the success of organizations that are finding that they can no longer achieve commercial advantage simply by throwing more power at the problem.

Power, space, and cooling constraints and an increasing pressure to innovate are causing organizations to reconsider their strategies for designing the compute infrastructures of the future. A system based on a processor as capable as the Cell Broadband Engine processor has to be a strong contender, not only in terms of its computational power, but also its data movement and manipulation abilities. Couple this with a number of strong customer proof points, a growing ecosystem of tools, support from key Independent Software Vendors, and the results of experiments such as this one, and the Cell/B.E. platform emerges as an extremely credible option for taking on the technology challenges of these organizations.


Acknowledgments

Many other individuals contributed (both knowingly and unknowingly) to this piece of work. The authors wish to acknowledge their kind contributions. Without this assistance, this paper would never have been written.


Resources
