
## OpenMP 4.0 about to be released and IWOMP 2013
So much has happened in OpenMP since SC 12 that I hope to capture it all in this post, which I am writing while flying to the ADC C++ meeting, where I will talk about C++14, ISOCPP.org, and Transactional Memory. First, the research arm of OpenMP is IWOMP, the annual research conference. You probably know by now that IWOMP 2013 will be held in Canberra, Australia, and not in its usual June summer time frame. This means that there is still time (until May 10) to submit new proposals. So if you have some research in OpenMP that should be exposed, please submit a... [More]
Tags: user tools defined 2003 4.0 iwomp atomics simd reduction fortran 2013 error openmp model tasks accelerators

## Implementing a Scalable Parallel Reduction in Unified Parallel C
A reduction is the process of combining the elements of a vector (or array) to yield a single aggregate element. It is commonly used in scientific computations. For instance, the inner product of two n-dimensional vectors x, y is given by: $x \cdot y = \sum_{i=1}^{n} x_i y_i$. This computation requires n multiplications and n-1 additions. The n multiplications are independent of each other and can therefore be executed concurrently. Once the additive terms have been computed, they can be summed together to yield the final result. Given the importance of reduction operations in... [More]
Tags: upc reduction upc_forall parallel_performance parallel upc_programming cppcafe parallel_computing
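
As a point of reference for the excerpt above, here is a minimal sequential sketch of the inner product written as a reduction in plain C. It is not taken from the original post; the function name `inner_product` is an assumption chosen for illustration. Seeding the sum with the first product keeps the operation count at exactly n multiplications and n-1 additions.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential inner product of two n-element vectors (illustrative sketch).
 * Each product x[i] * y[i] is independent of the others; only the running
 * sum ties the iterations together. */
double inner_product(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    if (n == 0) {
        return 0.0;                 /* empty vectors: nothing to combine */
    }
    double sum = x[0] * y[0];       /* first term, no addition needed */
    for (size_t i = 1; i < n; i++) {
        sum += x[i] * y[i];         /* one multiplication, one addition */
    }
    return sum;
}
```

The loop-carried dependence on `sum` is what a parallel reduction has to restructure: the multiplications can be split across threads freely, but the additions need a combining strategy.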

## Implementing a Scalable Parallel Reduction in Unified Parallel C (part 3)
Continued from the second parallel reduction blog post. To get better scalability (increased program performance as the number of threads increases), it is critical to remove the lock in the upc_forall loop. This can be done by accumulating the partial sum computed by each thread into a thread-local variable. A thread-local variable is allocated in the private memory space of each thread, so there are THREADS “instances” of the variable. Each instance of the thread-local variable can be used to accumulate the sum of the array elements having... [More]
Tags: cppcafe parallel_performance parallel_computing upc_forall upc parallel reduction parallel_programming
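
The partial-sum idea in the excerpt above can be sketched in plain C by emulating the threads serially. This is not the post's UPC code. `NUM_THREADS` is a hypothetical stand-in for UPC's `THREADS`, and the sketch assumes a cyclic layout in which element `i` belongs to thread `i % NUM_THREADS`, which is the default distribution for a shared array in UPC.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_THREADS 4   /* hypothetical stand-in for UPC's THREADS */

/* Serial emulation of a two-phase reduction with per-thread partial sums.
 * Assumes element i is owned by thread i % NUM_THREADS (cyclic layout). */
long reduce_with_partial_sums(const long *a, size_t n)
{
    assert(a != NULL);
    long partial[NUM_THREADS] = {0};   /* one private accumulator per thread */

    /* Phase 1: each thread sums only the elements it owns. No accumulator
     * is shared between threads, so no lock is required here. */
    for (size_t t = 0; t < NUM_THREADS; t++) {
        for (size_t i = t; i < n; i += NUM_THREADS) {
            partial[t] += a[i];
        }
    }

    /* Phase 2: combine the NUM_THREADS partial sums into the final result. */
    long sum = 0;
    for (size_t t = 0; t < NUM_THREADS; t++) {
        sum += partial[t];
    }
    return sum;
}
```

In a real parallel run, phase 1 proceeds without synchronization and only the short phase 2 needs coordination, so the synchronized work shrinks from one operation per array element to one per thread.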

## Implementing a Scalable Parallel Reduction in Unified Parallel C (part 2)
Continued from the previous parallel reduction blog post. The result is obviously wrong, but what is the problem? The keen reader might point out that the program as written contains a race condition. Multiple threads can write into the shared variable "sum" concurrently, possibly overwriting a partial value previously stored. To eliminate the race condition, we could protect writes to the variable "sum" with a critical section. In UPC this is accomplished by using a "lock" variable as follows: upc_forall ( i=0;i<N;i++;&A[i] ) { upc_lock (... [More]
Tags: cppcafe parallel_performance upc_forall parallel_programming upc reduction parallel parallel_computing
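
To make the lost update described above concrete, the following plain C sketch replays one bad interleaving deterministically, with no real threads involved. It is an illustration, not the post's program. The names `thread_ctx`, `racy_interleaving`, and `locked_interleaving` are hypothetical, and the update `sum += value` is split by hand into its load, add, and store steps.

```c
#include <assert.h>
#include <stddef.h>

/* Private state of one emulated thread: the copy of sum it last read. */
typedef struct {
    int local;
} thread_ctx;

static void load(thread_ctx *t, const int *sum) { assert(t != NULL); t->local = *sum; }
static void add(thread_ctx *t, int value)       { assert(t != NULL); t->local += value; }
static void store(const thread_ctx *t, int *sum) { assert(t != NULL); *sum = t->local; }

/* Unprotected update: both threads read sum before either writes it back,
 * so the second store overwrites the first thread's contribution. */
int racy_interleaving(int a0, int a1)
{
    int sum = 0;
    thread_ctx t0, t1;
    load(&t0, &sum);
    load(&t1, &sum);      /* t1 reads the stale value 0 */
    add(&t0, a0);
    add(&t1, a1);
    store(&t0, &sum);
    store(&t1, &sum);     /* a0 is lost here */
    return sum;
}

/* Critical section: each thread finishes load-add-store before the other
 * starts, which is the ordering a lock enforces. */
int locked_interleaving(int a0, int a1)
{
    int sum = 0;
    thread_ctx t0, t1;
    load(&t0, &sum); add(&t0, a0); store(&t0, &sum);
    load(&t1, &sum); add(&t1, a1); store(&t1, &sum);
    return sum;
}
```

With inputs 3 and 4, the unprotected ordering leaves 4 in `sum` while the serialized ordering yields 7. A lock removes the bad interleavings, at the price of serializing every update to the shared variable.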