Where is OpenCL and OpenMP headed?
Michael_Wong 120000M1EH Visits (6434)
The following is a a private communication from an IBM engineer Matthew Markland who asked a great question. I do not claim great expertise but I feel that there is enough of an opinion piece that some folks may like to see this discussion or continue it. I have edited the response somewhat but it is largely in tact and reprinted with Matthew's permission. Note I have no insight into PGI or any other product other then what I read in public articles, and as such makes no product claim. Any opinion regarding other company remain necessarily my own and is not IBM's position.
Michael:Please join in the discussion, or even bring this up as other experts will chime in.
I hope that the new year finds everything well for you and yours.
I'm enjoying the C/C++ Cafe posts you guys put out immensely.
I just wanted to get your opinion on some things that have been
going through my mind with respect to the multicore/hybrid
programming models that are being put out by various entities. It
seems that many people believe that the best model is an extension
to the language model, be it a pure language extension like what
CUDA and OpenCL have, or with a new model of pragmas like PGI is
OpenCL/CUDA is mostly a library based model and a language extension(modulo the 4 memory annotations). But yes I see where you are going with this ...
adding. I'm wondering, especially in the case of the PGI extensions,I am assuming this is the pragma directives available in their technology preview:
whether they make sense given the existing OpenMP specSo there has been parallel languages that are directive based, language extensions, and library based. Usually they start off with library based because they are easy to port, and works on many vendors' compiler. Language-based solutions are harder to implement, and can not be easily corrected if wrong. Directive-based like OpenMP makes it easily adapted in an incremental manner, and keeps the base program running even on platforms that don't accept the directive. Today, we have examples of all three. MPI is a pure library based solution. Cilk is a pure language based solution and OpenMP is a directive-based solution (although it too has a library part).
Where do you
A mostly library based language like OpenCL is in a sense a step backwards. So PGI is trying a directive based approach to send the computational kernel to the accelerator/GPGPU. This is a bet from their part. I am familiar with their chief compiler engineer on the OpenMP Committee Michael Wolfe, and respects his opinion.
see this headed from a personal perspective.
Having some involvement in OpenCL, I can see where it falls somewhat short, but is nevertheless a tremendous accomplishment. It is designed for today's GPGPU architecture, assumes a weak memory model, implicitly have a dual layer of scheduling policy between the host (outer asynchronous layer) and the thread processors (inner synchronous processors with local memory). This is in addition to it being still relatively hard to program,( though easier then DirectX or OpenGL) and for people who have to port a 100,000 line of code is a large commitment on a technology that may not be around. OpenCL, is still a stream processing language and as such is limited in the scope of the parallel programs it can speed-up. What PGI is probably looking for is a more generalized programming model which works in broader situation. That is why they introduced the scheduling clause, and tied it to OpenMP. I would not be surprised if some kind of heterogenous programming support would be in OpenMP in future.
I don't have any significant personal insight but also is involved in adapting the OpenMP paradigm to fit in the next programming model without knowing where to go.
In the end (and this is based on Michael Wolfe's excellent analogy in an HPC paper), OpenCL is basically designed for a hardware that is a large wide body air carrier that can handle massive number of passengers in one run, but requires special airport transportation to get the passengers to the plane because the plane doesn't fit in the terminal. So the speed it has (in terms of # of passenger-miles) is mitigated by the wait time (DMA access)of loading the plane. It works when everything fits.
If you don't have that many passengers, or have a variable number of passengers, it doesn't buy you any extra benefit and may penalize you with a super wide-body jet. And there are lots of other kinds of air carriers out there, including the super-fast kind for the payload just has to get there by 9 am the next day and the medium sized ones that can carry your particular amount of load.
As such, there will still be a place for OpenMP, MPI, TBB, futures, UPC, TM. We are suffering under an alarming number of these so-called parallel lang