In this article, you'll learn how to use the ALF task context to keep the partial computing results for each task instance and then combine these partial results into the final result.
The example finds the minimum and maximum values in a large data set. The sequential code is a very simple textbook-style implementation. The code is a linear search across the whole data set that compares and updates the best known values with each step.
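As a point of reference, the sequential search described above might look like the following minimal sketch. The `min_max_t` type and the `find_min_max` name are illustrative assumptions and are not taken from the ALF sample code.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical result type; not part of the ALF sample. */
typedef struct {
    int min;
    int max;
} min_max_t;

/* Linear scan: compare every element against the best known values. */
static min_max_t find_min_max(const int *data, size_t n)
{
    /* Sentinels guarantee that the first element replaces both values. */
    min_max_t r = { INT_MAX, INT_MIN };
    size_t i;

    for (i = 0; i < n; i++) {
        if (data[i] < r.min)
            r.min = data[i];
        if (data[i] > r.max)
            r.max = data[i];
    }
    return r;
}
```

For an empty data set this sketch returns the sentinels unchanged, so a caller would need to treat `n == 0` as a special case.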
You can use the ALF framework to convert the sequential code into a parallel algorithm. First, partition the data set into smaller work blocks. These work blocks are then assigned to the task instances running on the accelerators. Each invocation of the computational kernel finds the minimum and maximum values in the work block assigned to it and updates the best known values in the task context. After all the work blocks are processed, the context of each task instance holds an intermediate pair of best values. The ALF runtime then calls the context merge function on the accelerators to reduce these intermediate results into the final result.
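The partition, process, and merge flow can be imitated on a single processor without any ALF calls. The sketch below is only a simulation under stated assumptions: `NUM_INSTANCES`, `BLOCK_SIZE`, `ctx_t`, and both function names are hypothetical, and the round-robin assignment stands in for whatever scheduling the ALF runtime actually performs.

```c
#include <limits.h>
#include <stddef.h>

#define NUM_INSTANCES 4   /* assumed number of task instances */
#define BLOCK_SIZE    8   /* assumed number of elements per work block */

typedef struct { int min; int max; } ctx_t;   /* hypothetical task context */

/* Stand-in for one kernel invocation on one work block. */
static void process_block(ctx_t *ctx, const int *block, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (block[i] > ctx->max) ctx->max = block[i];
        if (block[i] < ctx->min) ctx->min = block[i];
    }
}

/* Partition, process, then reduce; everything runs sequentially here. */
static ctx_t simulate_min_max(const int *data, size_t n)
{
    ctx_t ctx[NUM_INSTANCES];
    size_t off, b = 0;
    int k;

    /* Each simulated task instance starts with sentinel values. */
    for (k = 0; k < NUM_INSTANCES; k++) {
        ctx[k].min = INT_MAX;
        ctx[k].max = INT_MIN;
    }
    /* Hand out work blocks round-robin (an assumption of this sketch). */
    for (off = 0; off < n; off += BLOCK_SIZE, b++) {
        size_t len = (n - off < BLOCK_SIZE) ? n - off : BLOCK_SIZE;
        process_block(&ctx[b % NUM_INSTANCES], data + off, len);
    }
    /* Reduce the per-instance results into ctx[0]. */
    for (k = 1; k < NUM_INSTANCES; k++) {
        if (ctx[k].max > ctx[0].max) ctx[0].max = ctx[k].max;
        if (ctx[k].min < ctx[0].min) ctx[0].min = ctx[k].min;
    }
    return ctx[0];
}
```

An instance that never receives a work block keeps its sentinels, which leave the merged result unchanged.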
Figure 1. The minimum and maximum finder in action
Figure 1 shows you what the minimum and maximum finder is going to do. You can find the source code in the sample directory task_context/min_max. You can use this sample code as a template to build more complicated applications.
Coding the computational kernel
The following code section shows the computational kernel for this example application.
The computational kernel finds the maximum and minimum values in the provided input buffer, then updates the task_context with those values.
/* ---------------------------------------------- */
/* the accelerator side code */
/* ---------------------------------------------- */
/* the computation kernel function */
int comp_kernel(void *p_task_context, void *p_parm_ctx_buffer,
                void *p_input_buffer, void *p_output_buffer,
                void *p_inout_buffer, unsigned int current_count,
                unsigned int total_count)
{
    my_task_context_t *p_ctx = (my_task_context_t *) p_task_context;
    my_wb_parms_t *p_parm = (my_wb_parms_t *) p_parm_ctx_buffer;
    alf_data_int32_t *a = (alf_data_int32_t *) p_input_buffer;
    unsigned int size = p_parm->num_data;
    unsigned int i;

    /* update the best known values in context buffer;
       the two tests are independent so that one value can update
       both max and min (for example, the first value seen when the
       context starts out with sentinel values) */
    for (i = 0; i < size; i++) {
        if (a[i] > p_ctx->max)
            p_ctx->max = a[i];
        if (a[i] < p_ctx->min)
            p_ctx->min = a[i];
    }
    return 0;
}
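The kernel relies on two types that are not shown in this article. The following sketch gives one plausible shape for them, inferred only from the fields the kernel accesses; the real definitions in the sample directory may differ. The `alf_data_int32_t` typedef is a stand-in so that the sketch compiles without the ALF headers, and `init_task_context` is a hypothetical helper.

```c
#include <stdint.h>

/* Stand-in so the sketch compiles without the ALF headers (assumption). */
typedef int32_t alf_data_int32_t;

/* Assumed layout, inferred from the fields the kernel reads and writes. */
typedef struct {
    alf_data_int32_t min;
    alf_data_int32_t max;
} my_task_context_t;

/* Assumed per-work-block parameters. */
typedef struct {
    unsigned int num_data;   /* number of elements in this work block */
} my_wb_parms_t;

/* Hypothetical helper: seed the context before any work block is processed. */
static void init_task_context(my_task_context_t *ctx)
{
    ctx->min = INT32_MAX;   /* any real value is less than or equal to this */
    ctx->max = INT32_MIN;   /* any real value is greater than or equal to this */
}
```

Seeding the context this way means that a task instance that processes no work blocks contributes nothing when the contexts are merged.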
The following code segment shows the context_merge function for the example application. The ALF runtime automatically invokes this function after all the task instances have finished processing all the work blocks. The final minimum and maximum values stored in the task context per task instance are merged through this function.
/* the context merge function */
int ctx_merge(void *p_task_context_to_be_merged,
              void *p_task_context)
{
    my_task_context_t *p_ctx = (my_task_context_t *) p_task_context;
    my_task_context_t *p_mgr_ctx = (my_task_context_t *)
                                   p_task_context_to_be_merged;

    if (p_mgr_ctx->max > p_ctx->max)
        p_ctx->max = p_mgr_ctx->max;
    if (p_mgr_ctx->min < p_ctx->min)
        p_ctx->min = p_mgr_ctx->min;
    return 0;
}
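Because taking a minimum or a maximum is associative and commutative, the order in which contexts are merged should not affect the final result. The sketch below illustrates this with a merge function that follows the same argument order as ctx_merge (source first, destination second). The `demo_ctx_t`, `demo_merge`, and `fold_contexts` names are hypothetical, and the fold loop is only a stand-in for however the ALF runtime sequences its merge calls.

```c
#include <stddef.h>

typedef struct { int min; int max; } demo_ctx_t;   /* hypothetical context */

/* Same argument order as ctx_merge: source first, destination second. */
static int demo_merge(void *p_to_be_merged, void *p_ctx)
{
    demo_ctx_t *src = (demo_ctx_t *) p_to_be_merged;
    demo_ctx_t *dst = (demo_ctx_t *) p_ctx;

    if (src->max > dst->max) dst->max = src->max;
    if (src->min < dst->min) dst->min = src->min;
    return 0;
}

/* Fold n contexts (n >= 1) into one, visiting them in the given order. */
static demo_ctx_t fold_contexts(const demo_ctx_t *ctx,
                                const size_t *order, size_t n)
{
    demo_ctx_t acc = ctx[order[0]];
    size_t i;

    for (i = 1; i < n; i++) {
        demo_ctx_t src = ctx[order[i]];   /* copy: merge takes non-const pointers */
        demo_merge(&src, &acc);
    }
    return acc;
}
```

Folding the same contexts in forward and in reverse order yields identical results, which is the property that lets a runtime merge contexts in whatever order is convenient.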
Now you know how to use the ALF task context to keep the partial computing results for each task instance and then combine these partial results into the final result.
Learn
- Refer to Accelerated Library Framework for Cell Broadband Engine Programmer's Guide and API Reference for the source material from which this article was extracted.
- Check out other articles in this Fun with ALF series.
- Take a look at these other ALF-related quick-read guides:
- "Introducing ALF."
- "10 major ALF concepts."
- "Programming with ALF: Basic ALF application structure."
- "Programming with ALF: Double buffering."
- "Programming with ALF: Handling ALF constraints."
- "Programming with ALF: Optimizing ALF applications."
- "ALF and hybrid x86."
- To learn more on Cell/B.E. programming, try the developerWorks series:
- "Programming high-performance applications on the Cell/B.E. processor"
- "PS3 fab-to-lab"
- "The little broadband engine that could"
- Refer to the Cell Broadband Engine documentation section of the IBM Semiconductor Solutions Technical Library for a wealth of downloadable manuals, specifications, and more.
Get products and technologies
- Get your copy of the IBM SDK for Multicore Acceleration 3.0 or browse through the exhaustive library of Cell/B.E. documentation.

Kane Scarlett is a technology journalist/analyst with 20 years in the business, working for such publishers as National Geographic, Population Reference Bureau, Miller Freeman, and IDG, and managing, editing, and writing for such august journals as JavaWorld, LinuxWorld, and of course, developerWorks.




