
Fun with ALF, Part 5: Using overlapped I/O buffers to add matrices

Two examples show you how to use the ALF overlapped I/O buffers to do matrix addition

Kane Scarlett, Editor, Multicore acceleration, IBM
Kane Scarlett is a technology journalist/analyst with 20 years in the business, working for such publishers as National Geographic, Population Reference Bureau, Miller Freeman, and IDG, and managing, editing, and writing for such august journals as JavaWorld, LinuxWorld, and of course, developerWorks.

Summary:  In this Cell Broadband Engine™ (Cell/B.E.) series, learn how to use the Accelerated Library Framework (ALF) overlapped input-output buffers to perform matrix addition. The "ALF for Cell/B.E. Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content.


Date:  03 Jun 2008
Level:  Introductory

More fun with ALF

Look for more in the Fun with ALF series:

  • In "Fun with ALF, Part 1: Adding large matrices together" (developerWorks, March 2008), see how to use ALF to add two large matrices together, with one example of host data partitioning and one of accelerator data partitioning.
  • In "Fun with ALF, Part 2: Converting I/O" (developerWorks, March 2008), learn how the task context buffer is used as a large lookup table to convert the 16-bit input data to 8-bit output data.
  • In "Fun with ALF, Part 3: Finding minimum and maximum values" (developerWorks, April 2008), discover how you can use the task context to keep the partial computing results for each task instance and then combine these partial results into the final result.
  • In "Fun with ALF, Part 4: Determining the dot product of large vectors" (developerWorks, April 2008), uncover how to use the bundled work block distribution with the task context to handle situations where the work block cannot hold the partitioned data because of a local memory size limit.
  • In "Task dependency," see a simple simulation that uses task dependency in a two-stage pipeline application.

Also watch for similar "Fun with" series on DaCS, BLAS, and other technologies that make Cell/B.E. programming easier.

Introduction

The overlapped input and output buffer (overlapped I/O buffer) is a work block buffer that contains both input and output data. The input and output sections are dynamically designated for each work block. This buffer is especially useful when you want to maximize the use of accelerator memory and when the input buffer can be overwritten by the output data.
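To make the memory saving concrete, here is a minimal host-side sketch. It assumes a hypothetical ovl_section_t descriptor (not an ALF type) that records each section's byte offset and size within the accelerator buffer, and it compares the space needed when output sections may overlay input sections with the space needed for separate input and output buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one section of an overlapped buffer.
 * Illustration only; this is not part of the ALF API. */
typedef struct {
    size_t offset; /* byte offset from the start of the buffer */
    size_t size;   /* section length in bytes */
} ovl_section_t;

/* Largest end point (offset + size) over n sections, starting from end. */
static size_t max_end(const ovl_section_t *s, size_t n, size_t end)
{
    for (size_t i = 0; i < n; i++) {
        size_t e = s[i].offset + s[i].size;
        assert(e >= s[i].offset); /* guard against size_t overflow */
        if (e > end)
            end = e;
    }
    return end;
}

/* Bytes the overlapped buffer needs: sections may share space, so only
 * the furthest end point of any input or output section matters. */
size_t ovl_buffer_bytes(const ovl_section_t *in, size_t n_in,
                        const ovl_section_t *out, size_t n_out)
{
    return max_end(out, n_out, max_end(in, n_in, 0));
}

/* Bytes needed if input and output lived in separate buffers instead. */
size_t separate_buffer_bytes(const ovl_section_t *in, size_t n_in,
                             const ovl_section_t *out, size_t n_out)
{
    size_t total = 0;
    for (size_t i = 0; i < n_in; i++)
        total += in[i].size;
    for (size_t i = 0; i < n_out; i++)
        total += out[i].size;
    return total;
}
```

For a layout like the C = A + B case in this article (A at offset 0, B directly after it, and C at offset 0), 128-byte sections need 256 bytes of accelerator memory when overlapped, compared with 384 bytes for separate buffers.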

The data for the input buffer can come from distinct sections of a large data set in host memory. These distinct data segments are gathered into the input data buffer on the accelerators. The ALF framework minimizes performance overhead by not duplicating input data unnecessarily.
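The gather step can be pictured with an ordinary memcpy on the host. The sketch below is a host-only stand-in for the DMA transfers that ALF performs; the host_segment_t type and the gather_segments function are hypothetical names used for illustration. Each host segment is copied, in order, into one contiguous region of a local buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one host-memory source segment. */
typedef struct {
    const void *host_addr;
    size_t size; /* bytes */
} host_segment_t;

/* Copy each segment, in order, into the local buffer starting at byte
 * offset `offset`. Returns the offset just past the last byte written. */
size_t gather_segments(unsigned char *local, size_t local_size, size_t offset,
                       const host_segment_t *segs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(offset + segs[i].size <= local_size); /* must fit locally */
        memcpy(local + offset, segs[i].host_addr, segs[i].size);
        offset += segs[i].size;
    }
    return offset;
}
```

Two non-adjacent host segments end up side by side in the local buffer, which is the layout the accelerator kernel later relies on.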

The output data buffer is a single, contiguous buffer in the memory of the accelerator. Output data can be transferred to distinct memory segments within a large output buffer in host memory. After the compute kernel returns from processing a work block, the data in this buffer is moved to the host memory locations that were specified with alf_wb_dtl_entry_add when the work block was constructed.
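The reverse direction can be sketched the same way. In this host-only illustration, the hypothetical host_dest_t entries play the role of the addresses passed to alf_wb_dtl_entry_add, and scatter_segments splits one contiguous output region, in order, across them. ALF itself does this with DMA, not memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one host-memory destination segment. */
typedef struct {
    void *host_addr;
    size_t size; /* bytes */
} host_dest_t;

/* Split the contiguous output buffer, in order, across the destination
 * segments. Returns the number of bytes copied. */
size_t scatter_segments(const unsigned char *out_buf, size_t out_size,
                        const host_dest_t *dests, size_t n)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        assert(offset + dests[i].size <= out_size); /* never read past the output */
        memcpy(dests[i].host_addr, out_buf + offset, dests[i].size);
        offset += dests[i].size;
    }
    return offset;
}
```

Host memory outside the listed segments is left untouched, which is why several work blocks can each write their own slice of one large host matrix.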

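The following two examples show how to use overlapped I/O buffers. Both perform matrix addition: the first computes C = A + B and writes the result to a separate matrix C, and the second computes A = A + B and writes the result back into A. The C_A_B preprocessor macro selects between the two cases.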

Setting up the matrix

The code is similar to the matrix_add example in "Fun with ALF, Part 1: Adding large matrices together." Listing 1 shows only the relevant code: the matrix declarations for the two cases.


Listing 1. Setting up the matrix
/* ---------------------------------------------- */ 
/* matrix declaration for the two cases           */ 
/* ---------------------------------------------- */ 
#ifdef C_A_B // C = A + B 
       alf_data_int32_t mat_a[ROW_SIZE][COL_SIZE]; // the matrix a 
       alf_data_int32_t mat_b[ROW_SIZE][COL_SIZE]; // the matrix b 
       alf_data_int32_t mat_c[ROW_SIZE][COL_SIZE]; // the matrix c 
#else // A = A + B 
       alf_data_int32_t mat_a[ROW_SIZE][COL_SIZE]; // the matrix a 
       alf_data_int32_t mat_b[ROW_SIZE][COL_SIZE]; // the matrix b 
#endif

Setting up the work block

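Listing 2 shows how the work blocks are created for the two cases. The matrices are split by rows: each work block covers PART_SIZE rows, or fewer for the last block if ROW_SIZE is not a multiple of PART_SIZE. For C = A + B, A and B are gathered into an ALF_BUF_OVL_IN section at offset 0, and C is described as an ALF_BUF_OVL_OUT section that also starts at offset 0, so it overlays A. For A = A + B, A is an ALF_BUF_OVL_INOUT section at offset 0, and B is an ALF_BUF_OVL_IN section placed directly after it.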


Listing 2. Creating the work block
for (i = 0; i < ROW_SIZE; i+=PART_SIZE){ 
     if(i+PART_SIZE <= ROW_SIZE) 
        wb_parm.num_data = PART_SIZE; 
     else 
        wb_parm.num_data = ROW_SIZE - i; 

     alf_wb_create(task_handle, ALF_WB_SINGLE, 0, &wb_handle); 

     #ifdef C_A_B // C = A + B 
            // the input data A and B 
            alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_IN, 0); // offset at 0 
            alf_wb_dtl_entry_add(wb_handle, &mat_a[i][0], wb_parm.num_data*COL_SIZE, 
             ALF_DATA_INT32); // A 
            alf_wb_dtl_entry_add(wb_handle, &mat_b[i][0], wb_parm.num_data*COL_SIZE, 
             ALF_DATA_INT32); // B 
            alf_wb_dtl_end(wb_handle); 

            // the output data C is overlapped with input data A 
            // offset at 0, this is overlapped with A 
            alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_OUT, 0); 
            alf_wb_dtl_entry_add(wb_handle, &mat_c[i][0], wb_parm.num_data*COL_SIZE, 
             ALF_DATA_INT32); // C 
            alf_wb_dtl_end(wb_handle); 
     
      #else // A = A + B 
            // the input and output data A 
            alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_INOUT, 0); // offset 0 
            alf_wb_dtl_entry_add(wb_handle, &mat_a[i][0], wb_parm.num_data*COL_SIZE, 
             ALF_DATA_INT32); // A 
            alf_wb_dtl_end(wb_handle); 

            // the input data B is placed after A 
            alf_wb_dtl_begin(wb_handle, ALF_BUF_OVL_IN, 
             wb_parm.num_data*COL_SIZE*sizeof(alf_data_int32_t)); 
            alf_wb_dtl_entry_add(wb_handle, &mat_b[i][0], wb_parm.num_data*COL_SIZE, 
             ALF_DATA_INT32); // B 
            alf_wb_dtl_end(wb_handle); 
      #endif 
      alf_wb_parm_add(wb_handle, (void *)&wb_parm, sizeof(wb_parm)/sizeof(unsigned int), 
       ALF_DATA_INT32, 0); 
      alf_wb_enqueue(wb_handle); 
}
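The arithmetic in Listing 2 can be checked on its own. The sketch below pulls it into three small hypothetical helpers: the number of rows in a work block, the number of work blocks the loop enqueues, and the byte offset of B in the A = A + B layout. It assumes alf_data_int32_t is a 32-bit integer, so int32_t stands in for it here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rows covered by the work block that starts at row i: part_size rows,
 * or whatever remains for the last block (same rule as Listing 2). */
unsigned int wb_rows(unsigned int i, unsigned int part_size, unsigned int row_size)
{
    assert(i < row_size);
    return (i + part_size <= row_size) ? part_size : row_size - i;
}

/* Number of work blocks the loop enqueues (ceiling division). */
unsigned int wb_count(unsigned int part_size, unsigned int row_size)
{
    return (row_size + part_size - 1) / part_size;
}

/* Byte offset of B in the A = A + B case: B starts right after the
 * num_rows * col_size elements of A. Assumes 32-bit elements. */
size_t b_offset_bytes(unsigned int num_rows, unsigned int col_size)
{
    return (size_t)num_rows * col_size * sizeof(int32_t);
}
```

For example, with ROW_SIZE = 10 and PART_SIZE = 4, the loop enqueues three work blocks of 4, 4, and 2 rows; with COL_SIZE = 8, a full block places B at byte offset 128.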

Setting up the accelerator code

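Listing 3 shows the accelerator code, which is the same for both cases. In each work block, the overlapped buffer holds A at offset 0 followed by B, so the output sc can be placed at the same location in accelerator memory as sa. This is safe because each element sa[i] is read before sc[i] overwrites it.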


Listing 3. The accelerator-side computation kernel
/* ---------------------------------------------- */ 
/* the accelerator side code                      */ 
/* ---------------------------------------------- */ 
/* the computation kernel function                */ 
int comp_kernel(void *p_task_context, void *p_parm_context, 
                void *p_input_buffer, void *p_output_buffer, 
                void *p_inout_buffer, unsigned int current_count, 
                unsigned int total_count) 
{ 
  unsigned int i, cnt; 
  int *sa, *sb, *sc; 
  my_wb_parms_t *p_parm = (my_wb_parms_t *) p_parm_context; 

  /* number of elements of A (and of B) in this work block */
  cnt = p_parm->num_data * COL_SIZE; 

  /* the overlapped buffer holds A at offset 0 and B right after it */
  sa = (int *) p_inout_buffer; 
  sb = sa + cnt; 
  sc = sa;   /* the output (C, or the updated A) overlays A */

  for (i = 0; i < cnt; i ++) 
       sc[i] = sa[i] + sb[i]; 

  return 0; 
}
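Because the kernel body is plain C, its indexing can be checked on a host without ALF or Cell/B.E. hardware. The following sketch uses a hypothetical function, add_overlapped, that reproduces the loop in Listing 3 on an ordinary array laid out the way the overlapped buffer is: A followed by B, with the result written over A.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side sketch of the kernel's work on one overlapped buffer:
 * buf[0..cnt-1] holds A, buf[cnt..2*cnt-1] holds B, and the sum
 * overwrites A in place (sc == sa), as in Listing 3. */
void add_overlapped(int32_t *buf, size_t cnt)
{
    assert(buf != NULL);
    int32_t *sa = buf;
    const int32_t *sb = buf + cnt;
    int32_t *sc = sa; /* output section overlays input A */

    for (size_t i = 0; i < cnt; i++)
        sc[i] = sa[i] + sb[i]; /* sa[i] is read before sc[i] overwrites it */
}
```

After the call, the first cnt elements hold the sums and B is unchanged, which mirrors what ALF then transfers back to host memory as C or as the updated A.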

Conclusion

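This article described two ways to perform matrix addition with overlapped I/O buffers. In the C = A + B case, the output section for C overlays input A in accelerator memory, and the results are transferred to the separate host matrix C. In the A = A + B case, A occupies an in/out section, and the results are written back to A in host memory. Only the host-side data transfer lists differ; the accelerator kernel is the same for both.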


Resources

Learn

  • "ALF for Cell/B.E. Programmer's Guide and API Reference, Version 3.0," the source for the content of this article.


