
Fun with ALF, Part 5: Using overlapped I/O buffers to add matrices

Two examples show you how to use the ALF overlapped I/O buffers to do matrix addition

Kane Scarlett, Editor, Multicore acceleration, IBM
Kane Scarlett is a technology journalist/analyst with 20 years in the business, working for such publishers as National Geographic, Population Reference Bureau, Miller Freeman, and IDG, and managing, editing, and writing for such august journals as JavaWorld, LinuxWorld, and of course, developerWorks.

Summary:  In this Cell Broadband Engine™ (Cell/B.E.) series, learn how to use the Accelerated Library Framework (ALF) overlapped input-output buffers to perform matrix addition. The "ALF for Cell/B.E. Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content.


Date:  03 Jun 2008
Level:  Introductory
Also available in:   Japanese


More fun with ALF

Look for more in the Fun with ALF series:

  • In "Fun with ALF, Part 1: Adding large matrices together" (developerWorks, March 2008), see how to use ALF to add two large matrices together, with one example each for host data partitioning and accelerator data partitioning.
  • In "Fun with ALF, Part 2: Converting I/O" (developerWorks, March 2008), learn how the task context buffer is used as a large lookup table to convert the 16-bit input data to 8-bit output data.
  • In "Fun with ALF, Part 3: Finding minimum and maximum values" (developerWorks, April 2008), discover how you can use the task context to keep the partial computing results for each task instance and then combine these partial results into the final result.
  • In "Fun with ALF, Part 4: Determining the dot product of large vectors" (developerWorks, April 2008), uncover how to use the bundled work block distribution with the task context to handle situations where the work block cannot hold the partitioned data because of a local memory size limit.
  • In "Task dependency," check out a simple simulation in which you can use task dependency in a two-stage pipeline application.

And watch for similar Fun with series addressing DaCS, BLAS, and other technologies to make your Cell/B.E. programming easier.

Introduction

The overlapped input and output buffer (overlapped I/O buffer) is a work block buffer that contains both input and output data. The input and output sections are dynamically designated for each work block. This buffer is especially useful when you want to maximize the use of accelerator memory and when the input buffer can be overwritten by the output data.

The data for the input buffer can come from distinct sections of a large data set in host memory. These distinct data segments are gathered into the input data buffer on the accelerators. The ALF framework minimizes performance overhead by not duplicating input data unnecessarily.

The output data buffer is a single, contiguous buffer in the memory of the accelerator. Output data can be transferred to distinct memory segments within a large output buffer in host memory. After the compute kernel returns from processing one work block, the data in this buffer is moved to the host memory locations specified by the alf_wb_dtl_entry_add routine when the work block was constructed.

The following two simple examples show how to use overlapped I/O buffers. Both perform matrix addition: the first computes C = A + B, and the second computes A = A + B, writing the result back into A.

Setting up the matrix

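The code is similar to the matrix_add example in "Fun with ALF, Part 1: Adding large matrices together." Listing 1 shows only the relevant code. The C_A_B macro selects the case: when C_A_B is defined, the program computes C = A + B; otherwise, it computes A = A + B.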


Listing 1. Setting up the matrix
/* ---------------------------------------------- */ 
/* matrix declaration for the two cases           */ 
/* ---------------------------------------------- */ 
#ifdef C_A_B // C = A + B 
       alf_data_int32_t mat_a[ROW_SIZE][COL_SIZE]; // the matrix a 
       alf_data_int32_t mat_b[ROW_SIZE][COL_SIZE]; // the matrix b 
       alf_data_int32_t mat_c[ROW_SIZE][COL_SIZE]; // the matrix c 
#else // A = A + B 
       alf_data_int32_t mat_a[ROW_SIZE][COL_SIZE]; // the matrix a 
       alf_data_int32_t mat_b[ROW_SIZE][COL_SIZE]; // the matrix b 
#endif

Setting up the work block

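The code segment in Listing 2 creates the work blocks for both cases. Each work block covers PART_SIZE rows, except possibly the last one, which takes whatever rows remain. In the C = A + B case, A and B are added to the overlapped buffer as input (ALF_BUF_OVL_IN) starting at offset 0, and C is added as output (ALF_BUF_OVL_OUT), also at offset 0, so C overlaps A. In the A = A + B case, A is added as both input and output (ALF_BUF_OVL_INOUT) at offset 0, and B is added as input at the byte offset immediately after A.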


Listing 2. Creating the work block
for (i = 0; i < ROW_SIZE; i += PART_SIZE) {
    if (i + PART_SIZE <= ROW_SIZE)
        wb_parm.num_data = PART_SIZE;
    else
        wb_parm.num_data = ROW_SIZE - i;

    alf_wb_create(task_handle, ALF_WB_SINGLE, 0, &wb_handle);

    #ifdef C_A_B // C = A + B
        // the input data A and B
        alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_IN, 0); // offset at 0
        alf_wb_dtl_entry_add(wb_handle, &mat_a[i][0], wb_parm.num_data*COL_SIZE,
                             ALF_DATA_INT32); // A
        alf_wb_dtl_entry_add(wb_handle, &mat_b[i][0], wb_parm.num_data*COL_SIZE,
                             ALF_DATA_INT32); // B
        alf_wb_dtl_end(wb_handle);

        // the output data C is overlapped with input data A
        // offset at 0, this is overlapped with A
        alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_OUT, 0);
        alf_wb_dtl_entry_add(wb_handle, &mat_c[i][0], wb_parm.num_data*COL_SIZE,
                             ALF_DATA_INT32); // C
        alf_wb_dtl_end(wb_handle);

    #else // A = A + B
        // the input and output data A
        alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_INOUT, 0); // offset 0
        alf_wb_dtl_entry_add(wb_handle, &mat_a[i][0], wb_parm.num_data*COL_SIZE,
                             ALF_DATA_INT32); // A
        alf_wb_dtl_end(wb_handle);

        // the input data B is placed after A
        alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_IN,
                         wb_parm.num_data*COL_SIZE*sizeof(alf_data_int32_t));
        alf_wb_dtl_entry_add(wb_handle, &mat_b[i][0], wb_parm.num_data*COL_SIZE,
                             ALF_DATA_INT32); // B
        alf_wb_dtl_end(wb_handle);
    #endif

    alf_wb_parm_add(wb_handle, (void *)&wb_parm, sizeof(wb_parm)/sizeof(unsigned int),
                    ALF_DATA_INT32, 0);
    alf_wb_enqueue(wb_handle);
}

Setting up the accelerator code

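Listing 3 shows the accelerator code. In both cases, A occupies the start of the overlapped buffer and B immediately follows it, so the kernel reaches both inputs through p_inout_buffer. Because each output element sc[i] depends only on sa[i] and sb[i], the output sc can safely be set to the same location in accelerator memory as sa, overwriting A as the sums are computed.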


Listing 3. The accelerator-side computation kernel
/* ---------------------------------------------- */ 
/* the accelerator side code                      */ 
/* ---------------------------------------------- */ 
/* the computation kernel function                */ 
int comp_kernel(void *p_task_context, void *p_parm_ctx_buffer, 
                void *p_input_buffer, void *p_output_buffer, 
                void *p_inout_buffer, unsigned int current_count, 
                unsigned int total_count) 
{ 
  unsigned int i, cnt; 
  int *sa, *sb, *sc; 
  my_wb_parms_t *p_parm = (my_wb_parms_t *) p_parm_ctx_buffer; /* work block parameters */

  cnt = p_parm->num_data * COL_SIZE; 

  sa = (int *) p_inout_buffer; 
  sb = sa + cnt; 
  sc = sa; 

  for (i = 0; i < cnt; i ++) 
       sc[i] = sa[i] + sb[i]; 

  return 0; 
}

Conclusion

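This article described two ways to add matrices with overlapped I/O buffers: computing C = A + B, where the output C overlaps the input A in accelerator memory, and computing A = A + B, where A serves as both input and output.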


Resources

Learn

Get products and technologies

Discuss

