OpenMP 4.0 about to be released and IWOMP 2013
Michael_Wong
So much has been happening in OpenMP since SC 12 that I hope to capture it all in this post while flying to the ADC C++ meeting where I will talk about C++14, ISOCPP.org, and Transactional Memory.
First, the research arm of OpenMP is IWOMP, the annual research conference. You probably know by now that IWOMP 2013 will be held in Canberra, Australia, later in the year than its usual June time frame. This means there is still time (until May 10) to submit new papers. So if you have OpenMP research that deserves exposure, please submit a paper.
When we last spoke, you heard that OpenMP has introduced a Technical Report process to improve its agility at issuing interim specifications and, more importantly, to obtain user feedback. We used that process to introduce TR1 for accelerator support. We also released Release Candidate 1, which had 31 feature/defect fixes.
Since then, we had the Houston F2F meeting in January 2013, where we gathered to complete the work of
The outcome of that meeting was Release Candidate 2, which is currently going through public comment, with the potential of being released in June or July as OpenMP 4.0. OpenMP, the de facto standard for shared-memory systems, will extend its reach beyond pure HPC to include embedded systems, real-time systems, and accelerators.
OpenMP wants to become suitable for a wide range of applications, from simulation and medical to biotech, automation, robotics and financial analysis. I will describe some of the syntax that is coming in OpenMP 4.0. This will not be a complete description, as the features also include additional APIs that I may not cover. In the meantime, if this whets your appetite, I urge you to go to the OpenMP forum and give feedback on OpenMP 4.0 RC2 immediately, as the window will be closing rapidly. Of course, any of this can still change until ratification, but this is fairly close to final form.
The key features that OpenMP 4.0 will have are:
Target Data, Target
These constructs create a device data environment for the extent of the region. The Target construct also executes the construct on the same device.
#pragma omp target data [clause[ [, ]clause] ,...] structured-block
#pragma omp target [clause[ [, ]clause] ,...] structured-block
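To make the two constructs concrete, here is a minimal sketch, assuming the map clause with array sections as it appears in the RC2 draft. The function name saxpy_target and its signature are my own illustration, not something from the specification. If no device is present, or the pragmas are ignored, the loop simply runs on the host.

```c
/* y[i] = a * x[i] + y[i], offloaded when a device is available.
   saxpy_target is a hypothetical name used only for illustration. */
void saxpy_target(int n, float a, const float *x, float *y)
{
    /* Create the device data environment: x is copied in, y in and out. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* Run the loop on the same device; the data is already mapped there. */
        #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```

Splitting the data region from the execution region pays off when several target regions reuse the same arrays, because the transfers happen once at the boundaries of target data.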
Makes the corresponding list items in the device data environment consistent with their original list items.
#pragma omp target update motion-clause [, clause[ [, ]clause] ,...]
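As a rough sketch of where target update fits, the example below alternates device and host work on one array inside a single target data region. The function name two_phase and the arithmetic are hypothetical; the from and to motion clauses are the ones I expect from the draft.

```c
/* Computes v[i] = (v[i] + 1) * 2 - 3 in three phases: device, host, device.
   two_phase is a hypothetical name used only for illustration. */
void two_phase(int n, float *v)
{
    #pragma omp target data map(tofrom: v[0:n])
    {
        /* Phase 1 on the device. */
        #pragma omp target map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] += 1.0f;

        /* Make the host copy consistent with the device copy. */
        #pragma omp target update from(v[0:n])

        /* Phase 2 on the host. */
        for (int i = 0; i < n; ++i)
            v[i] *= 2.0f;

        /* Push the host changes back to the device copy. */
        #pragma omp target update to(v[0:n])

        /* Phase 3 on the device. */
        #pragma omp target map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] -= 3.0f;
    }
}
```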
Specifies that variables and functions are mapped to a device.
#pragma omp declare target
#pragma omp end declare target
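A small sketch of how the pair of directives might be used: everything between them gets a device version, so a target region can call the function and read the variable. The names device_gain, apply_gain and scale_all are invented for this example.

```c
/* Everything between the two directives is also made available on the device. */
#pragma omp declare target
float device_gain = 0.5f;                 /* hypothetical global */

float apply_gain(float v)                 /* hypothetical helper */
{
    return device_gain * v;
}
#pragma omp end declare target

/* Calls the mapped function from inside a target region. */
void scale_all(int n, float *v)
{
    #pragma omp target map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i)
        v[i] = apply_gain(v[i]);
}
```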
Creates a league of thread teams where the master thread of each team executes the region.
#pragma omp teams [clause[ [, ]clause] ,...] structured-block
default(shared | none)
Specifies that the iterations of one or more loops will be executed by the thread teams in the context of their implicit tasks. The iterations are distributed across the
#pragma omp distribute [clause[ [, ]clause] ,...] for-loops
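Teams and distribute are meant to be used together inside a target region, so one sketch covers both. This assumes the draft's nesting rule that teams sits directly inside target; the function name fill_squares and the choice of four teams are arbitrary.

```c
/* Writes out[i] = i * i, with iterations spread across a league of teams.
   fill_squares is a hypothetical name used only for illustration. */
void fill_squares(int n, int *out)
{
    #pragma omp target map(from: out[0:n])
    #pragma omp teams num_teams(4)
    #pragma omp distribute
    for (int i = 0; i < n; ++i)
        out[i] = i * i;
}
```

Only the master thread of each team runs its share of iterations here; putting a parallel for inside the distributed loop body would be the usual way to occupy the remaining threads of each team.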
Ensures that a specific storage location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
#pragma omp atomic [read | write | update | capture] [seq_cst] expression-stmt
#pragma omp atomic capture [seq_cst] structured-block
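The structured-block form of atomic capture is handy for fetch-and-add. The sketch below hands out unique ticket numbers to loop iterations; take_ticket and issue_tickets are hypothetical names, and the seq_cst clause is left out for simplicity.

```c
/* Atomically reads the old counter value and increments the counter. */
int take_ticket(int *counter)
{
    int old;
    #pragma omp atomic capture
    { old = *counter; (*counter)++; }
    return old;
}

/* Every iteration gets a distinct ticket in 0..n-1 and marks it as seen. */
void issue_tickets(int n, int *seen)
{
    int counter = 0;
    #pragma omp parallel for shared(counter)
    for (int i = 0; i < n; ++i) {
        int t = take_ticket(&counter);
        seen[t] = 1;
    }
}
```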
Requests cancellation of the innermost enclosing region of the type specified. The cancel directive may not be used in place of the statement following an if, while, do, switch, or label.
#pragma omp cancel clause[ [, ]clause]
Introduces a user-defined cancellation point at which tasks check if cancellation of the innermost enclosing region of the type specified has been requested. The cancellation point directive may not be used in place of the statement following an if, while, do, switch, or label.
#pragma omp cancellation point clause
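Here is a sketch of the typical use case, an early exit from a parallel search. It assumes the key occurs at most once, and the function name find_match is hypothetical. As far as I understand the draft, cancellation only takes effect when it is enabled in the runtime; otherwise the loop just runs to completion with the same result.

```c
/* Returns the index of key in a[0..n-1], or -1 if it is absent.
   Assumes key occurs at most once. find_match is a hypothetical name. */
int find_match(const int *a, int n, int key)
{
    int found = -1;
    #pragma omp parallel for shared(found)
    for (int i = 0; i < n; ++i) {
        if (a[i] == key) {
            #pragma omp critical
            found = i;
            /* Ask the other threads to abandon the rest of the loop. */
            #pragma omp cancel for
        }
        /* Threads check here whether cancellation has been requested. */
        #pragma omp cancellation point for
    }
    return found;
}
```

Note the braces after the if: both directives are standalone, so they cannot take the place of the statement that follows an if.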
OMP_PROC_BIND bind
Sets the value of the global bind-var ICV. The value of this environment variable must be true, false, or a comma-separated list of master, close, or spread.
OMP_PLACES places
Sets the place-partition-var ICV that defines the OpenMP places that are available to the execution environment.
Defines an explicit task. The data environment of the task is created according to data-sharing attribute clauses on task construct and any defaults that apply.
#pragma omp task [clause[ [, ]clause] ...] structured-block
default(shared | none)
The list items that appear in the depend clause may include array sections.
• out and inout: The generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an in, out, or inout clause
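A minimal sketch of task dependences, assuming the in, out and inout dependence types from the draft. Three sibling tasks are chained through the variables a and b, so they must run in order even though any thread may pick them up. The function name run_pipeline and the values are invented for illustration.

```c
/* Chains three sibling tasks through depend clauses and returns the result.
   run_pipeline is a hypothetical name used only for illustration. */
int run_pipeline(void)
{
    int a = 0, b = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: writes a. */
        #pragma omp task depend(out: a) shared(a)
        a = 20;

        /* Runs only after the producer: reads a, writes b. */
        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = a + 1;

        /* Runs only after the previous task: updates b in place. */
        #pragma omp task depend(inout: b) shared(b)
        b *= 2;
    }   /* the implicit barrier guarantees all three tasks have finished */
    return b;
}
```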
We introduced taskwait in 3.0, which specifies a wait on the completion of child tasks of the current task. Now you can also group tasks: taskgroup waits not only for the child tasks but also for their descendants.
#pragma omp taskgroup structured-block
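The difference from taskwait shows up with recursive task trees. In this sketch the recursive helper spawns tasks and returns without waiting, so only a taskgroup around the root task guarantees that every descendant has finished before the sum is taken. The names fill_range and fill_and_sum are hypothetical.

```c
/* Recursively fills v[lo..hi-1] with the index values, spawning child tasks
   and returning without waiting for them. */
void fill_range(int *v, int lo, int hi)
{
    if (hi - lo <= 4) {
        for (int i = lo; i < hi; ++i)
            v[i] = i;
        return;
    }
    int mid = lo + (hi - lo) / 2;
    #pragma omp task
    fill_range(v, lo, mid);
    #pragma omp task
    fill_range(v, mid, hi);
}

/* Fills v[0..n-1] with a task tree, then sums it once the tree is done. */
int fill_and_sum(int *v, int n)
{
    int sum = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup
        {
            #pragma omp task
            fill_range(v, 0, n);
        }   /* waits for the root task and all of its descendants */

        for (int i = 0; i < n; ++i)
            sum += v[i];
    }
    return sum;
}
```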
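Declares a reduction-identifier that can be used in a reduction clause.
#pragma omp declare reduction(reduction-identifier : typename-list : combiner) [initializer-clause]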
typename-list: A list of type names
combiner: An expression
initializer-clause: initializer ( omp_priv = initializer | function-name (argument-list ))
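A sketch of a user-defined reduction over a small struct, assuming the omp_in, omp_out and omp_priv identifiers from the draft. The type minmax_t, the reduction name minmax, and both functions are made up for this example.

```c
#include <limits.h>

typedef struct { int min; int max; } minmax_t;   /* hypothetical type */

/* Folds one partial result into another. */
void minmax_combine(minmax_t *out, const minmax_t *in)
{
    if (in->min < out->min) out->min = in->min;
    if (in->max > out->max) out->max = in->max;
}

/* The combiner merges omp_in into omp_out; each private copy starts
   from the identity value given in the initializer clause. */
#pragma omp declare reduction(minmax : minmax_t : \
        minmax_combine(&omp_out, &omp_in)) \
        initializer(omp_priv = { INT_MAX, INT_MIN })

/* Finds the smallest and largest element of a[0..n-1] in one pass. */
minmax_t find_minmax(const int *a, int n)
{
    minmax_t r = { INT_MAX, INT_MIN };
    #pragma omp parallel for reduction(minmax : r)
    for (int i = 0; i < n; ++i) {
        if (a[i] < r.min) r.min = a[i];
        if (a[i] > r.max) r.max = a[i];
    }
    return r;
}
```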
Instructs the runtime to display the OpenMP version number and the initial values of the ICVs, once, during initialization of the runtime.
Applied to a loop to indicate that the loop can be transformed into a SIMD loop (that is, multiple iterations of the loop can be executed concurrently using SIMD instructions).
#pragma omp simd [clause[ [, ]clause] ...] for-loops
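For instance, a dot product is a natural fit. This is only a sketch: dot_simd is a hypothetical name, and whether the compiler actually vectorizes the loop depends on the target and the options used.

```c
/* Dot product of a[0..n-1] and b[0..n-1]; the loop may be vectorized. */
float dot_simd(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    /* Each SIMD lane accumulates a partial sum, combined at the end. */
    #pragma omp simd reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```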
Applied to a function or a subroutine to enable the creation of one or more versions that can process multiple arguments using SIMD instructions from a single invocation from a SIMD loop.
#pragma omp declare simd [clause[ [, ]clause] ...] [#pragma omp declare simd [clause[ [, ]clause] ...]] [...] function definition or declaration
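A short sketch of a function prepared for SIMD calls, assuming the uniform and notinbranch clauses from the draft. The names scaled_square and apply_scaled_square are hypothetical.

```c
/* scale is the same for every lane; the function is never called
   conditionally inside the SIMD loop. */
#pragma omp declare simd uniform(scale) notinbranch
float scaled_square(float x, float scale)
{
    return scale * x * x;
}

/* The SIMD loop can call a vector version that handles several x at once. */
void apply_scaled_square(int n, float scale, const float *in, float *out)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = scaled_square(in[i], scale);
}
```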
Specifies a loop that can be executed concurrently using SIMD instructions and that those iterations will also be executed in parallel by threads in the team.
#pragma omp for simd [clause[ [, ]clause] ...] for-loops
clause: Any clause accepted by the simd or for directives, with identical meanings and restrictions.
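Combining the two levels looks roughly like this: the iterations are divided among the threads of the team, and each thread's chunk can then be executed with SIMD instructions. The function name add_arrays is hypothetical.

```c
/* c[i] = a[i] + b[i], split across threads and vectorized within each chunk. */
void add_arrays(int n, const float *a, const float *b, float *c)
{
    #pragma omp parallel
    {
        #pragma omp for simd
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }
}
```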
We already know that for the new release of OpenMP 4.0, we plan to decouple the examples from the specification to make their maintenance simpler. The examples will likely be issued as a separately maintained document.
There is more. OpenMP is also going through fundamental changes internally. We completed an internal survey of our members on the future of OpenMP, and we are redefining our mission statement. It used to be:
"Standardize and unify shared memory, thread-level parallelism for HPC"
Much of this probably needs to change in view of our penetration into the commercial market and the new memory architectures that accelerators bring.
Want even more? Once OpenMP 4.0 is out, we plan to begin work on 5.0 right away, starting at the June 2013 Niagara Falls meeting.
Where does OpenMP fit within the scheme of the world? Well, I and many others like to view it as a much more rapidly moving specification than an ISO Standard like C or C++. Yet it can popularize or commercialize experimental or company-specific features for parallelism by making such features widely available. As such, we are already seeing this transfer as more OpenMP features are moving into C and C++, as witness this C++
The parallel programming language world is clearly undergoing major tectonic shifts. But much of this effort is the work of hundreds of very talented experts who have been meeting weekly on the phone, meeting face-to-face and exchanging thousands of emails. Without them, this could not happen and we thank them for their dedication.