Level: Introductory
Kane Scarlett (kane@us.ibm.com), Editor, Multicore acceleration, IBM
29 Apr 2008
In this Cell Broadband Engine™ (Cell/B.E.) series, learn how to use the Accelerated Library Framework (ALF) task context to keep the partial computing results for each task instance and then combine them. The "ALF for Cell/B.E. Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for this content.
Introduction
In this article, you'll learn how to use the ALF task context to keep
the partial computing results for each task instance and then combine these
partial results into the final result.
More fun with ALF
Look for more in the Fun with ALF series:
- In "Fun with ALF, Part 1: Adding large matrices together" (developerWorks, March 2008), see how to use ALF to add two large matrices together, with one example for host data partitioning and one for accelerator data partitioning.
- In "Fun with ALF, Part 2: Converting I/O" (developerWorks, March 2008), learn how the task context buffer is used as a large lookup table to convert 16-bit input data to 8-bit output data.
- In "Multiple vector dot products," uncover how to use the bundled work block distribution with the task context to handle situations where a work block cannot hold the partitioned data because of the local memory size limit.
- In "Overlapped I/O buffer," take a look at using overlapped I/O buffers to do matrix addition.
- In "Task dependency," check out a simple simulation that uses task dependency in a two-stage pipeline application.
And watch for similar Fun with series addressing DaCS, BLAS, and other technologies to make your Cell/B.E. programming easier.
The example finds the minimum and maximum values in a large data set. The sequential version is a simple textbook implementation: a linear search across the whole data set that compares each element against the best values found so far and updates them when a new extreme appears.
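For reference, the sequential version can be sketched in C as follows. The struct min_max_t and the function find_min_max are hypothetical names used only for illustration; the sketch seeds both best values with the first element and then makes a single pass over the rest of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    int32_t min;
    int32_t max;
} min_max_t;

/* Sequential linear search: one pass, comparing every element against
   the best values found so far. Requires at least one element. */
static min_max_t find_min_max(const int32_t *data, size_t n)
{
    assert(data != NULL && n > 0);
    min_max_t best = { data[0], data[0] };  /* seed with the first element */
    for (size_t i = 1; i < n; i++) {
        if (data[i] < best.min)
            best.min = data[i];
        if (data[i] > best.max)
            best.max = data[i];
    }
    return best;
}
```

The running time is linear in the data size, and every step depends on the best values from the previous step, which is what the ALF version breaks up into independent per-block searches.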
You can use ALF to convert the sequential code into a parallel algorithm. First, the data set is partitioned into smaller work blocks. These work blocks are then assigned to the task instances running on the accelerators. Each invocation of the computational kernel on a task instance finds the maximum and minimum values in the work block assigned to it and updates that instance's task context. After all the work blocks are processed, each task instance holds intermediate best values in its own context. The ALF runtime then calls the context merge function on the accelerators to reduce these intermediate results into the final result.
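The following host-only C sketch models these steps without calling any ALF API: it splits the data into work blocks, deals them round-robin to a fixed number of simulated task instances that each own a private context, and then merges the partial results. The names (ctx_t, parallel_min_max, MAX_INSTANCES) and the round-robin assignment are assumptions for illustration only; the ALF runtime decides on its own how work blocks are distributed to task instances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INSTANCES 16  /* hypothetical upper bound for this sketch */

/* Hypothetical per-instance context: the best values seen so far. */
typedef struct {
    int32_t min;
    int32_t max;
} ctx_t;

/* Seed values: any real element replaces them on first comparison. */
static void ctx_init(ctx_t *c)
{
    c->min = INT32_MAX;
    c->max = INT32_MIN;
}

/* Process one work block, updating only this instance's context. */
static void process_block(ctx_t *c, const int32_t *block, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (block[i] < c->min) c->min = block[i];
        if (block[i] > c->max) c->max = block[i];
    }
}

/* Fold the partial result in src into dst. */
static void ctx_combine(ctx_t *dst, const ctx_t *src)
{
    if (src->min < dst->min) dst->min = src->min;
    if (src->max > dst->max) dst->max = src->max;
}

/* Split data into work blocks of at most block_size elements, deal them
   round-robin to num_instances simulated task instances, then merge. */
static ctx_t parallel_min_max(const int32_t *data, size_t n,
                              size_t block_size, size_t num_instances)
{
    assert(block_size > 0);
    assert(num_instances > 0 && num_instances <= MAX_INSTANCES);
    ctx_t ctx[MAX_INSTANCES];
    for (size_t k = 0; k < num_instances; k++)
        ctx_init(&ctx[k]);

    size_t num_blocks = (n + block_size - 1) / block_size;  /* round up */
    for (size_t b = 0; b < num_blocks; b++) {
        size_t start = b * block_size;
        size_t len = (n - start < block_size) ? n - start : block_size;
        process_block(&ctx[b % num_instances], data + start, len);
    }

    /* Reduce the per-instance partial results into the final result. */
    for (size_t k = 1; k < num_instances; k++)
        ctx_combine(&ctx[0], &ctx[k]);
    return ctx[0];
}
```

Each simulated instance touches only its own context while blocks are processed, so the only cross-instance step is the final merge. An instance that receives no blocks keeps its seed values, which any real partial result overrides during the merge.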
Figure 1. The minimum and maximum finder in action
Figure 1 shows what the minimum and maximum finder does. You can find the source code in the sample directory task_context/min_max and use it as a template for building more complex applications.
Coding the computational kernel
The following code shows the computational kernel for this example application. The kernel finds the maximum and minimum values in the input buffer it is given, then updates the task context with those values.
/* ---------------------------------------------- */
/* the accelerator side code                      */
/* ---------------------------------------------- */
#include <alf_accel.h>

/* my_task_context_t (fields: max, min) and my_wb_parms_t (field: num_data)
   are defined elsewhere in the sample (not shown) */

/* the computation kernel function */
int comp_kernel(void *p_task_context, void *p_parm_ctx_buffer,
                void *p_input_buffer, void *p_output_buffer,
                void *p_inout_buffer, unsigned int current_count,
                unsigned int total_count)
{
    my_task_context_t *p_ctx = (my_task_context_t *) p_task_context;
    my_wb_parms_t *p_parm = (my_wb_parms_t *) p_parm_ctx_buffer;
    alf_data_int32_t *a = (alf_data_int32_t *) p_input_buffer;
    unsigned int size = p_parm->num_data;
    unsigned int i;

    /* update the best known values in the task context; test both bounds
       independently, because a single element can update both (for example,
       the first element when the context starts at INT32_MIN/INT32_MAX) */
    for (i = 0; i < size; i++) {
        if (a[i] > p_ctx->max)
            p_ctx->max = a[i];
        if (a[i] < p_ctx->min)
            p_ctx->min = a[i];
    }
    return 0;
}
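The kernel relies on the task context being seeded before the first work block arrives. The following host-only sketch exercises the same update rule outside the ALF runtime. The struct layouts, init_task_context, and kernel_sim are hypothetical stand-ins; the sample's real definitions may differ. With min seeded to INT32_MAX and max to INT32_MIN, each element must be tested against both bounds independently, because the very first element has to update both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the sample's real definitions may differ. */
typedef struct {
    int32_t max;
    int32_t min;
} my_task_context_t;

typedef struct {
    unsigned int num_data;  /* number of elements in this work block */
} my_wb_parms_t;

/* Seed the context so that any real element replaces both bounds. */
static void init_task_context(my_task_context_t *ctx)
{
    ctx->max = INT32_MIN;
    ctx->min = INT32_MAX;
}

/* Host-side stand-in for the kernel loop: same update rule, no ALF. */
static int kernel_sim(my_task_context_t *ctx, const my_wb_parms_t *parm,
                      const int32_t *a)
{
    assert(ctx != NULL && parm != NULL && a != NULL);
    for (unsigned int i = 0; i < parm->num_data; i++) {
        if (a[i] > ctx->max)
            ctx->max = a[i];
        if (a[i] < ctx->min)  /* independent test: one element may set both */
            ctx->min = a[i];
    }
    return 0;
}
```

Calling kernel_sim repeatedly on the same context mimics one task instance receiving several work blocks in turn: the context carries the best values from one block to the next.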
Merging the values
The following code shows the context merge function, ctx_merge, for the example application. The ALF runtime automatically invokes this function after all the task instances have finished processing all the work blocks. It merges the final minimum and maximum values that each task instance stored in its task context.
/* the context merge function: folds the partial result in
   p_task_context_to_be_merged into p_task_context */
int ctx_merge(void *p_task_context_to_be_merged,
              void *p_task_context)
{
    my_task_context_t *p_ctx = (my_task_context_t *) p_task_context;
    my_task_context_t *p_mgr_ctx =
        (my_task_context_t *) p_task_context_to_be_merged;

    /* keep the larger maximum and the smaller minimum */
    if (p_mgr_ctx->max > p_ctx->max)
        p_ctx->max = p_mgr_ctx->max;
    if (p_mgr_ctx->min < p_ctx->min)
        p_ctx->min = p_mgr_ctx->min;
    return 0;
}
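The article does not say in which order the runtime merges the task contexts, and the merge function does not need to know: taking the larger maximum and the smaller minimum is associative and commutative, so any merge order gives the same final values. The host-only sketch below checks this by reducing the same contexts with a left-to-right fold and with a pairwise tree. The names merge_into, merge_linear, and merge_tree and the struct layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context layout with the fields the article's code uses. */
typedef struct {
    int32_t max;
    int32_t min;
} my_task_context_t;

/* Same rule as ctx_merge: fold src into dst. */
static void merge_into(my_task_context_t *dst, const my_task_context_t *src)
{
    if (src->max > dst->max) dst->max = src->max;
    if (src->min < dst->min) dst->min = src->min;
}

/* Left-to-right fold of n contexts into ctx[0] (modifies the array). */
static my_task_context_t merge_linear(my_task_context_t *ctx, size_t n)
{
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        merge_into(&ctx[0], &ctx[i]);
    return ctx[0];
}

/* Pairwise (tree) reduction: each pass halves the number of live contexts
   (modifies the array). */
static my_task_context_t merge_tree(my_task_context_t *ctx, size_t n)
{
    assert(n > 0);
    for (size_t stride = 1; stride < n; stride *= 2)
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            merge_into(&ctx[i], &ctx[i + stride]);
    return ctx[0];
}
```

A tree reduction needs only about log2(n) sequential merge steps when independent pairs can be merged in parallel, while the fold needs n - 1; because the operation is order-independent, both produce identical results.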
Conclusion
Now you know how to use the ALF task context to keep
the partial computing results for each task instance and then combine these
partial results into the final result.
Resources
About the author
Kane Scarlett is a technology journalist/analyst with 20 years in the business, working for such publishers as National Geographic, Population Reference Bureau, Miller Freeman, and IDG, and managing, editing, and writing for such august journals as JavaWorld, LinuxWorld, and of course, developerWorks.