The developerWorks Connections platform will be sunset on December 31, 2019. On January 1, 2020, this community and its apps will no longer be available.

## Implementing a Scalable Parallel Reduction in Unified Parallel C
A reduction combines the elements of a vector (or array) into a single aggregate value. Reductions are common in scientific computations. For instance, the inner product of two n-dimensional vectors x and y is given by:

$$x \cdot y = \sum_{i=1}^{n} x_i y_i$$

This computation requires n multiplications and n-1 additions. The n multiplications are independent of each other and can therefore be executed concurrently. Once the additive terms have been computed, they can be summed to yield the final result. Given the importance of reduction operations in...
Tags: upc reduction upc_forall parallel_performance parallel upc_programming cppcafe parallel_computing

## Implementing a Scalable Parallel Reduction in Unified Parallel C (part 3)
Continued from the second parallel reduction blog post. To get better scalability (higher program performance as the number of threads increases), it is critical to remove the lock from the upc_forall loop. This can be done by accumulating the partial sum computed by each thread into a thread-local variable. A thread-local variable is allocated in the private memory space of each thread, so there are THREADS “instances” of the variable. Each instance of the thread-local variable can be used to accumulate the sum of the array elements having...
Tags: cppcafe parallel_performance parallel_computing upc_forall upc parallel reduction parallel_programming

## Implementing a Scalable Parallel Reduction in Unified Parallel C (part 2)
Continued from the previous parallel reduction blog post. The result is obviously wrong, but what is the problem? The keen reader might point out that the program as written contains a race condition: multiple threads can write into the shared variable "sum" concurrently, possibly overwriting a partial value previously stored by another thread. To eliminate the race condition, we could protect writes into the variable "sum" with a critical section. In UPC this is accomplished by using a "lock" variable as follows: upc_forall ( i=0;i<N;i++;&A[i] ) { upc_lock (...
Tags: cppcafe parallel_performance upc_forall parallel_programming upc reduction parallel parallel_computing