The View (or trip report) from the Feb 2010 OpenMP Santa Clara meeting
Michael_Wong
At the last meeting, held Feb 1-3 in Santa Clara, California (I know I am very late filing this trip report), a number of companies and the OpenMP community came together to finalize details of the 3.1 specification.
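This will update the current specification from 3.0 to 3.1. The main features in this update improve a few aspects of the 3.0 specification. These include increased affinity support, control over the number of threads in nested parallel regions, and task enhancements, along with atomic enhancements, min/max reductions, and changes to the memory model.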
The 3.1 specification will also clarify many examples and make about 20 minor bug fixes to the specification, most of them technical and some editorial.
I will highlight what will be in 3.1.
First, let me talk about the increased affinity support.
There is now a new environment variable called OMP_PROC_BIND, which controls affinity between OpenMP threads and processors. If you do not want the runtime to move OpenMP threads between processors, set it to true. Otherwise, set it to false.
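The variable is read by the OpenMP runtime itself, typically after being set in the shell (for example, export OMP_PROC_BIND=true) before the program starts. A program may still want to log what was requested. The following is only a minimal sketch of that idea: parse_proc_bind and proc_bind_requested are hypothetical helper names, not part of the OpenMP API, and the case-insensitive comparison is an assumption about how lenient you want the check to be.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not an OpenMP routine): returns 1 if value
 * spells "true" in any letter case, and 0 otherwise (including NULL). */
int parse_proc_bind(const char *value)
{
    const char *expected = "true";

    if (value == NULL)
        return 0;
    while (*expected != '\0') {
        if (tolower((unsigned char)*value) != *expected)
            return 0;
        value++;
        expected++;
    }
    return *value == '\0';   /* reject trailing characters */
}

/* Hypothetical helper: reports what the environment asked for.
 * It does not change or query the runtime's actual binding. */
int proc_bind_requested(void)
{
    return parse_proc_bind(getenv("OMP_PROC_BIND"));
}
```

This only reflects the request in the environment; whether threads are really bound is up to the implementation.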
Previously, the environment variable OMP_NUM_THREADS, which sets the number of threads to use for parallel regions, offered no control over nested parallel regions. This effectively means all inner parallel regions have the same number of threads.
In 3.1, you will be able to control the number of threads used for the inner regions, for example by giving OMP_NUM_THREADS a list such as 4,2 for code with one parallel region nested inside another.
In this case, the outer parallel region will have 4 threads and the inner parallel region will have 2 threads, provided your implementation supports nested parallelism (it is an optional part of OpenMP).
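The nested setup described above can be sketched roughly as follows. This is an illustration under assumptions, not the original listing: nested_team_sizes is a hypothetical function name, the program is assumed to be launched with OMP_NUM_THREADS=4,2 and with nesting enabled (for example OMP_NESTED=true), and the small fallback at the top exists only so the sketch still builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP enabled. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: records the team size seen in the outer region
 * and in a parallel region nested inside it. */
void nested_team_sizes(int *outer, int *inner)
{
    int outer_size = 1;
    int inner_size = 1;

    #pragma omp parallel shared(outer_size, inner_size)
    {
        /* One thread of the outer team records the outer team size. */
        #pragma omp single
        outer_size = omp_get_num_threads();

        /* Every outer thread starts its own inner region. */
        #pragma omp parallel shared(inner_size)
        {
            #pragma omp single
            {
                /* Several inner teams may reach this point at once. */
                #pragma omp critical
                inner_size = omp_get_num_threads();
            }
        }
    }

    *outer = outer_size;
    *inner = inner_size;
}
```

If nested parallelism is unsupported or disabled, the inner team size should simply come back as 1.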
For tasks, there will be a few significant enhancements. These include the new taskyield directive, a hint to the task scheduler that the current task can be suspended so that another task can run.
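A common place for such a hint is a task that is waiting for a lock. The sketch below is one possible illustration, not code from the meeting: locked_sum is a hypothetical function, and the fallback definitions at the top are stand-ins so the example still builds when OpenMP is not enabled. Each task does some independent work, then polls the lock and yields while it is busy.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP enabled. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { *l = 0; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static int  omp_test_lock(omp_lock_t *l)
{
    if (*l) return 0;
    *l = 1;
    return 1;
}
#endif

/* Hypothetical example: sums i*i for i in [0, n) using one task per i. */
int locked_sum(int n)
{
    omp_lock_t lock;
    int total = 0;

    omp_init_lock(&lock);
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; i++) {
            #pragma omp task firstprivate(i) shared(total, lock)
            {
                int contribution = i * i;   /* independent work */

                /* Lock busy: tell the scheduler it may run another task. */
                while (!omp_test_lock(&lock)) {
                    #pragma omp taskyield
                }
                total += contribution;
                omp_unset_lock(&lock);
            }
        }
    }   /* the barrier ending the parallel region waits for all tasks */
    omp_destroy_lock(&lock);
    return total;
}
```

Since taskyield is only a hint, an implementation is free to ignore it; the result is the same either way.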
Probably the biggest enhancement in 3.1 will be the final clause for tasks. This is a performance enhancement for increasingly fine-grained task parallelism, where at some point the overhead of generating a child task outweighs the advantage of running the work as a task. People usually avoid the problem by switching to a serial version of the code. When the final expression evaluates to true, the task becomes a final task, and every descendant task generated within its region becomes an included task, which means it is undeferred and executed immediately by the encountering thread.
#pragma omp task final(i > 100)
{
    // if i > 100, tasks generated in here are included tasks
    #pragma omp task
    {
        // code executed immediately by the encountering thread
    }
}
So here, we must define a new term, the included task: a task whose execution is sequentially included in the generating task region.
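To make the cutoff idea concrete, here is a minimal sketch using a recursive Fibonacci computation. The names fib, fib_parallel, and CUTOFF, as well as the cutoff value, are assumptions chosen for illustration. Once n drops to the cutoff, the generated task is final, so everything below it runs as included tasks on the encountering thread.

```c
/* Assumed cutoff, picked only for illustration; tune for real workloads. */
#define CUTOFF 10

/* Hypothetical example: task-parallel Fibonacci with a final-clause cutoff. */
long fib(int n)
{
    long a, b;

    if (n < 2)
        return n;

    /* When n <= CUTOFF these tasks are final, so their descendant tasks
     * are included tasks and execute immediately. */
    #pragma omp task shared(a) final(n <= CUTOFF)
    a = fib(n - 1);

    #pragma omp task shared(b) final(n <= CUTOFF)
    b = fib(n - 2);

    #pragma omp taskwait
    return a + b;
}

/* Starts the recursion from a single thread inside a parallel region. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    #pragma omp single
    result = fib(n);

    return result;
}
```

The task constructs are still encountered below the cutoff, but they no longer pay the cost of being deferred and scheduled.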
After some further discussion, we decided to improve task performance further by reducing the overhead of handling the data-sharing attribute clauses and the Internal Control Variables (ICVs). With the mergeable clause, a given task can be completely merged into another task, reusing the data environment of the task that generated it. So it can look like this:
#pragma omp task final(i > 100)
{
    #pragma omp task mergeable
    { /* inherits the ICVs from the generating task */ }
}
Finally, there is an API routine called omp_in_final(), which reports whether the enclosing task region is final. This is useful for guarding code that is only needed when the task isn't final.
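One way to picture how final, mergeable, and omp_in_final() fit together is a recursive array sum. This is a sketch under assumptions rather than a recommended pattern: sum_range, sum_parallel, and GRAIN are hypothetical names, and the serial fallback for omp_in_final() returns 1 on the reasoning that without OpenMP everything runs sequentially anyway.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP enabled. */
static int omp_in_final(void) { return 1; }
#endif

/* Assumed grain size, picked only for illustration. */
#define GRAIN 16

/* Hypothetical example: sums v[lo..hi) with a divide-and-conquer scheme. */
long sum_range(const int *v, int lo, int hi)
{
    if (omp_in_final() || hi - lo <= 1) {
        /* Already in a final task (or trivially small): skip the task
         * bookkeeping below and just loop. */
        long s = 0;
        for (int i = lo; i < hi; i++)
            s += v[i];
        return s;
    }

    int mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;

    /* Small ranges become final tasks; mergeable lets the runtime reuse
     * the generating task's data environment where it can. */
    #pragma omp task shared(left) final(hi - lo < GRAIN) mergeable
    left = sum_range(v, lo, mid);

    #pragma omp task shared(right) final(hi - lo < GRAIN) mergeable
    right = sum_range(v, mid, hi);

    #pragma omp taskwait
    return left + right;
}

/* Starts the recursion from a single thread inside a parallel region. */
long sum_parallel(const int *v, int n)
{
    long result = 0;

    #pragma omp parallel
    #pragma omp single
    result = sum_range(v, 0, n);

    return result;
}
```

The guard at the top is the part that omp_in_final() makes possible: the splitting and taskwait are only paid for while the work is still being handed out as deferred tasks.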
That's it for now. In the next blog, I will talk about the other major 3.1 features, such as atomic enhancements, min/max reductions for C++, and changes to the memory model.