This seven-part, quick-read workshop series is taken from the real-world case study whitepaper, "Porting Financial Markets Applications to the Cell Broadband Engine Architecture" (written by John Easton, Ingo Meents, Olaf Stephan, Horst Zisgen, and Sei Kato, IBM Systems and Technology Group, June 2007; see Resources). You can probably spend less than 10 minutes reading each installment and come out at the end with a strong basic knowledge of the requirements for effectively porting a compute-intensive application (in this case, a financial market application) to the Cell/B.E. processor.
Editor's note: The performance results in this series were obtained using Versions 1 and 2.1 of the Cell Broadband Engine Software Developer Kit (SDK). The current version of the SDK, the IBM Software Development Kit for Multicore Acceleration, Version 3.0, has recently become available and offers many enhancements in functionality, ease of use, and performance over the earlier versions. While the results documented in this article are correct for the earlier versions of the SDK, different results will be obtained with SDK 3.0. Watch for updates to the articles in this series that will describe the latest performance improvements obtained using SDK 3.0.
The example application modified in this article is a piece of code used to price a European Option to highlight the benefits of the Cell/B.E. blade. A European Option is a simple financial contract with strict terms and properties that gives the buyer the right to trade a given asset at a specific price on a specific date. It is generally an option that can be exercised only at the end of its life. By contrast, an American Option can be traded at any time between its purchase date and the date at which the contract expires. Because a European Option is traded on a fixed date, it is a simpler calculation to perform because the time variability of the American Option is removed.
You can use several different models price a European Option, depending on the type of asset that underlies it. For example, an option based on currency is calculated using a slightly different model than an option based on futures. In the example described in this series, the calculation is based on a simple Monte Carlo simulation technique. You will generate 200,000,000 uniform, pseudo-random numbers. These numbers are transformed to a log-normal distribution using a Box-Müller transform. Using the random numbers generated, you will execute the financial model repeatedly to simulate a random walk. The final stage of the analysis will be the calculation of the relevant statistics, such as the minimum, maximum, and average and the 95 percent quartile for losses.
After the initial results from Part 3 of this series, it seemed appropriate to evaluate the potential impact that the choice of random-number generator (from the SDK) might have had on the results. To achieve these results, just add some code to drive work more effectively across the SPUs, and replace the customer-supplied random-number generator with one that has been already optimized for the Cell/B.E. environment and supplied as part of the SDK. No need to make significant changes to the programmatic logic.
Recall that the original customer code uses the Mersenne-Twister random-number generator. Mersenne-Twister comes from the University of Hiroshima. This random-number generator is based on an algorithm that is approximately four times more effective than the random-number generator in the SDK, so it might be interesting to compare the performance of the two algorithms side by side. Here are the results:
- Run-time with Mersenne-Twister (without optimization): 5 seconds
- Run-time with the Cell/B.E. SDK: 4.1 seconds
These tests were performed on a 2.4 GHz pre-production Cell/B.E. blade. The interesting observation to make from these results is that the unoptimized Mersenne-Twister random-number generator performs very similarly to the optimized random-number generator from the SDK. Optimizing the Mersenne-Twister code to take advantage of the threading framework and rewriting the code to use the SIMD capabilities of the SPUs should provide a suitable mechanism to improve the performance even more.
Before looking at the porting to Mersenne-Twister of the Cell/B.E. environment, consider some of the issues in generating uniform pseudo-random numbers on the Cell/B.E system:
- The Cell/B.E. SDK implementation is similar to many C library implementations.
- The Cell/B.E. SDK implementation uses a scalar linear congruency generator.
- The Cell/B.E. SDK implementation has been vectorized to handle the generation of four 32-bit random numbers in parallel.
- As with many implementations of this sort, the code is implemented to deliver one stream of random numbers. Consequently, when the random-number generator is run multiple times in parallel, there is a risk that the subsequently generated streams of random numbers might overlap.
A solution is to create a new implementation of the Mersenne-Twister random-number generator on the Cell/B.E. processor with the appropriate initialization to avoid the potential overlap between the streams of random numbers. You can also implement a vectorized Polar-Transform to get normally distributed random numbers. This implementation delivers a number of performance improvements as expected. Most notably, it was four times faster compared to the Cell/B.E. SDK random-number generator, as shown in Table 5. (Tables and figures are numbered throughout the series to match the versions in the original whitepaper.)
Table 5. Performance comparison between Cell/B.E. SDK and Mersenne-Twister random-number generators
| Precision | Runtime (seconds) SDK RNG (2.4 GHz) | Runtime (seconds) Mersenne-Twister RNG (2.4 GHz) | Runtime (seconds) Mersenne-Twister RNG 3.2 GHz (estimated) |
|---|---|---|---|
| Single | 4.1 | 1.02 | 0.76 |
| Double | 9.9 | 2.47 | 1.85 |
Many other individuals contributed (both knowingly and unknowingly) to this piece of work. The authors wish to acknowledge their kind contributions. Without this assistance, this paper would never have been written.
Learn
- Use an RSS
feed to request notification for the upcoming articles in this series. (Find out more about RSS feeds of developerWorks content.)
- Check out the original whitepaper,
"Porting Financial Markets Applications to the Cell Broadband Engine Architecture"
(alphaWorks, June 2007). The original whitepaper combines
the contents of this entire series. It also
provides a tidy introduction to the Cell/B.E. architecture, and it explains why the
processor is important, especially for compute-intensive financial market
applications.
- See
"Introduction to the Cell Multiprocessor"
(IBM Journal of Research and Development, 2005) for an introductory
overview of the Cell/B.E. multiprocessor's history, the program objectives and
challenges, the design concept, the architecture and programming models, and the
implementation.
- Go to "Porting practices: Compute-intensive applications"
(developerWorks, June 2007) for help bringing a compute-intensive
application to the Cell/B.E. architecture.
- Read "Tech tips: SPU vector intrinsics at your fingertips"
(developerWorks, May 2007) for a handy list to keep you on the right side of common
Cell/B.E. SPU vector intrinsics. This article was extracted from a longer article,
"Programming high-performance applications on the Cell BE processor, Part 5").
- Review "Cell Broadband Engine Architecture and its first implementation"
(developerWorks, November 2005) for an up-close look at the performance
figures and characteristics of the first implementation.
- Explore the
QuantLib project, which is a free/open-source library
written in C++ with a clean object model for modeling, trading, and risk
management in real-life. It is exported to different languages such as C#,
Objective Caml, Java, Perl, Python, GNU R, Ruby, and Scheme.
- Learn more about the
Mersenne-Twister.
It is a very fast, pseudo-random number-generating algorithm that uses memory very
efficiently. The algorithm has a far longer period and far higher order of
equidistribution than any other implemented generator.
- Refer to
"Implementation of a Mixed-Precision in Solving Systems of Linear Equations on the CELL Processor"
for details on the implementation of code to solve a linear system of equations
using Gaussian elimination in single precision with iterative refinement of the
solution to the full double-precision accuracy.
- Bring up the
Software Development Kit 2.1 Installation Guide Version 2.1
(PDF) to walk through installation, configuration, and many of the basics that
you need to know to get started with development. Two companion pieces,
"Cell/B.E. SDK 2.1: Setting up Fedora Core 6"
and
"Cell/B.E. SDK 2.1: Understanding the terminology"
(developerWorks, April 2007), can help you get the requisite FC6 up and running,
and they provide a quick reference to Cell/B.E. terminology.
- To learn more on Cell/B.E. programming, try the
developerWorks series:
- "Programming high-performance applications on the Cell/B.E. processor"
- "PS3 fab to lab"
- "The little broadband engine that could"
- Refer to the Cell
Broadband Engine documentation section of the IBM Semiconductor Solutions Technical Library for a wealth of downloadable manuals,
specifications, and more.
- Sign up for the developerWorks newsletter
and get the latest developer news and Cell/B.E. happenings delivered to your inbox each week.
Check Power Architecture when you sign up to receive Cell/B.E. news in your newsletter.
Get products and technologies
- Get the
OpenMP API to learn about a portable, scalable model
that gives shared-memory parallel programmers a simple and flexible interface for
developing parallel applications. The API supports multi-platform, shared-memory parallel
programming in C, C++, and FORTRAN on all architectures, including UNIX® and Windows
NT® platforms.
- Find the centerpiece
of Cell/B.E. development at
latest Cell/B.E. SDK release, version 2.1.
- Locate the
IBM XLC compiler for
porting efforts. It is optimized for the Cell/B.E. processor.
- Find all Cell/B.E.-related articles, discussion forums, downloads,
and more at the IBM developerWorks Cell
Broadband Engine resource center: your definitive resource for all
things Cell/B.E.
- Contact IBM about custom
Cell/B.E.-based or custom-processor based solutions.
Discuss
- Participate in the discussion forum.
- Check out the Cell Broadband
Engine Architecture forum to get your technical questions about the processor answered.
Juicy problems and answers from the forums are rounded up periodically and highlighted
in the "Forum watch" blog series.
- Go to the Power Architecture blog for news, downloads,
instructional resources, and event notifications for Cell/B.E. and other Power Architecture-related technologies. You can find
the popular "Forum watch" blog series (Q&A roundup) and the "FixIt" technology updates.
John is currently leading a worldwide emerging technologies team within IBM Systems and Technology Group. He has several roles competing for his time, all of which revolve around advising organizations on how best to exploit new technologies. John has been working for IBM for over 20 years in a variety of technical roles. He worked in Distributed Systems Development in Austin before the launch of the RS/6000, and he holds several patents in the areas of security and systems software. Before taking his current role, he was the European technical leader for grid computing.
Ingo Meents joined IBM nine years ago and works currently as an IT Architect in IBM Global Engineering Solutions (GES). His current focus is to provide IBM customers with knowledge of the latest Cell/B.E. software technology by consulting, educating, briefing, and creating solutions for this platform. Before his work on the Cell/B.E. platform, he was lead architect for a modeling, simulation, and production planning solution used by the IBM 300mm semiconductor line in Fishkill. Starting as a research student at IBM, Ingo Meents received his doctor's degree from the University of Clausthal in 2001.
Olaf Stephan joined IBM in 1998 and currently works as an IT Specialist in IBM Global Engineering Solutions (GES). His focus is to provide IBM customers with knowledge of the latest Cell/B.E. software technology by consulting, educating, briefing, and development for this platform. Before his work on the Cell/B.E. platform, he worked in the areas of data management, data warehousing, business intelligence, and data integration. Olaf holds a Masters degree in Electrical Engineering, specializing in Communications Technology, from the University of Applied Sciences, Koblenz, Germany.
Horst has over 10 years of experience in the application of simulation methods and the development of mathematical models in different areas. He is currently leading a development team in IBM Global Engineering Solutions (GES) that is working on a simulation and planning solution used by IBM 300mm manufacturing in Fishkill and by external customers as well. Horst is also the European subject matter expert for the GES supply chain offerings. In addition, Horst regularly gives lectures at universities about simulation and mathematical modeling. Horst is a member of a standardization group for simulation and optimization.
Sei Kato is a staff member in IBM Research, Tokyo Research Laboratory. He joined IBM in 2002 after receiving his PhD in Mathematical Science from the University of Tokyo. After joining IBM, Sei has worked on modeling and simulating the performance of Web systems. His is currently working on the acceleration of financial calculations and on large-scale traffic simulations.
Comments (Undergoing maintenance)




