This article provides an example that shows how task dependency is used in a two-stage pipeline application. The problem is a simple simulation. An object P is placed in the middle of a flat surface with a bounding rectangular box. On each simulation step, the object moves in a random distance in a random direction. It moves back to the initial position when it hits the side walls of the bounding box. The problem is to calculate the number of hits to the four walls in a given time period (also known as playing Pong).
Figure 1. Object P randomly hits the side wall of the bounding box

A two-stage pipeline is used to solve the problem so that the random number generation and the simulation can be paralleled:
- The first stage generates random numbers using a pseudo-random number generator.
- The second stage simulates the movements.
Because ALF currently does not support pipeline directly, a pipeline structure is simulated using task dependency. There are two tasks that correspond to the two pipeline stages.
For this problem, each simulation step needs only a small amount of data, just as a motion vector. Although ALF does not have a strict limit on how small the data can be, it is better to use larger data blocks for performance considerations. Therefore, the data for thousands of simulation steps is grouped into a single work block.
For the stage 1 task, a Lagged Fibonacci pseudo-random number generator (PRNG) is used for simplicity. (A Lagged Fibonacci generator is an example of a pseudo-random number generator: a class of random number generator that is aims to be an improvement over the standard linear congruential generator. These are based on a generalization of the Fibonacci sequence, which can be described by the recurrence relation, an equation that defines a sequence recursively, so that each term of the sequence is defined as a function of the preceding terms.)
In this example, the algorithm is
Sn=(Sn-j^Sn-k)%232,
where k > n > 0 and k = 71, j = 65. The
algorithm requires a length k history buffer to save the older
values. In this implementation, the task context is used for the history
buffer. Because no input data is needed, the work block for this task
has only output data.
For the stage 2 task, the task context is used to save the current status of the simulation, including the position of the object and the number of hits to the walls. The work block in this stage has only input data, which are the PRNG results from stage 1.
Another target of pipelining is to overlap the execution of different stages for performance improvement. This requires work-block-level task synchronization between stages, which ALF does not yet support. The alternative approach is to use multiple tasks in which each task only handles a percentage of the work blocks for the whole simulation.
So there are now two stage tasks. For each chunk of work blocks, the following two tasks are created:
- The stage 1 task generates the random numbers and writes out the results to a temporary buffer.
- The stage 2 task reads the random numbers from the temporary buffer to do the simulation.
A task dependency is set between the two tasks to make sure the stage 2 task can get the correct results from stage 1 task. Because both the PRNG and the simulation have internal states, you have to pass the states data between the succeeding tasks of the same stage to preserve the states. The approach described here enables the tasks for the same stage to share the same task context buffer. Dependencies are used to make sure the tasks access the shared task context in the correct order.
Figure 2 shows the task dependency.
Figure 2. Task dependency with an unlimited number of interstage buffers

To further reduce the use of temporary intermediate buffers, you can use double or multi-buffering technology for the intermediate buffers. The task dependency graph for double buffering the intermediate buffers is shown in Figure 3.
Figure 3. Task dependency using double buffers

In this example, a new dependency is added between the n-2nd stage 2 task and the nth stage 1 task to make sure the stage 1 task does not overwrite the data that might still be in use by the previous stage 2 task. This is what is implemented in the sample code. You can find the complete source code in the sample SDK directory pipe_line.
This article described how to use the ALF task dependency in a two-stage pipeline application.
Learn
- Use an
RSS
feed to request notification for the upcoming articles in this series. (Find out more about RSS feeds of developerWorks content.)
- Refer to Accelerated Library Framework for Cell Broadband Engine Programmerâs Guide and API Reference for the source material from which this article was extracted.
- Check out other articles in this Fun
with ALF series.
- Refer to the following ALF-related
quick-access guide series entry for an especially appropriate companion to this
article:
"Programming with ALF: When to use overlapped I/O buffers"
(developerWorks blog, April 2008) shows you when it's appropriate to use
the ALF overlapped I/O buffer to maximize accelerator memory use.
- Take a look at these other ALF-related
quick-read guides:
- "Introducing ALF."
- "10 major ALF concepts."
- "Programming with ALF: Basic ALF application structure."
- "Programming with ALF: Double buffering."
- "Programming with ALF: Handling ALF constraints."
- "Programming with ALF: Optimizing ALF applications."
- "ALF and hybrid x86."
- Learn more about Cell/B.E. programming
from the developerWorks series:
- "Programming high-performance applications on the Cell/B.E. processor"
- "PS3 fab-to-lab"
- "The little broadband engine that could"
- Refer to the Cell
Broadband Engine documentation section of the IBM Semiconductor Solutions Technical Library for a wealth of downloadable manuals,
specifications, and more.
- Sign up for the developerWorks newsletter
and get the latest developer news and Cell/B.E. happenings delivered to your inbox each week.
Check Power Architecture® when you sign up to receive Cell/B.E. news in your newsletter.
Get products and technologies
- Get your copy of the
IBM SDK for Multicore Acceleration 3.0
or browse through the library of Cell/B.E. documentation.
- Find all Cell/B.E.-related articles, discussion forums, downloads,
and more at the IBM developerWorks Cell
Broadband Engine resource center: your definitive resource for all
things Cell/B.E.
- Contact IBM about custom
Cell/B.E.-based or custom-processor based solutions.
Discuss
- Check out the Cell Broadband
Engine Architecture forum to get your technical questions about the processor answered.
Juicy problems and answers from the forums are rounded up periodically and highlighted
in the "Forum watch" blog series.
- Go to the Cell Broadband Engine/Power Architecture blog for
news, downloads,
instructional resources, and event notifications for Cell/B.E. and other Power Architecture-related technologies. You can find
the popular "Forum
watch" blog series (Q&A roundup), the "FixIt" technology updates, and the Infobomb
quick-read technology introductions.

Kane Scarlett is a technology journalist/analyst with 20 years in the business, working for such publishers as National Geographic, Population Reference Bureau, Miller Freeman, and IDG, and managing, editing, and writing for such august journals as JavaWorld, LinuxWorld, and of course, developerWorks.



