You might have heard it already: The next incarnation of the OpenMP specification, which is targeted to be released as version 3.1 around June in time for IWOMP 2011 in Chicago, has been published as a Draft for Public Comment. You may think of it as beta :-).
Back in October 2009, I already commented on some of the goals for versions 3.1 and 4.0. OpenMP 3.1 addresses some issues found in the 3.0 specification and brings only minor functional improvements, still it will be released with a delay of almost one year to our initially planned schedule. However, work on version 4.0 already made some significant progress, including support for accelerators (GPUs), further enhancements to the tasking model, and support for error handling. Taking the outline of my previous post on the development of OpenMP, this is the list of updates to be found in OpenMP 3.1 and the status of the development towards OpenMP 4.0 (expressed in my own words and stating my own beliefs and opinions):
1: Development of an OpenMP Error Model. There is nothing new on this topic in OpenMP 3.1. However, with respect to OpenMP 4.0, the so-called done directive has been discussed for quite some time already. It can be used to terminate the execution of a Parallel Region, or a Worksharing construct, or a Task construct, and it is a prominent candidate for the next OpenMP spec. It would provide necessary functionality towards full-featured error handling capabilities, for which there is no good proposal that could be agreed upon yet.
2: Interoperability and Composability. There is nothing new on this topic in OpenMP 3.1. We made several experiments, gained some insights, and the goal is to come up with a set of reliable expectations and assertions in the OpenMP 4.0 timeframe.
3: Incorporating Tools Support into the OpenMP Specification. There is currently no activity on this topic in the OpenMP Language Committee in general.
4: Associating Computation or Memory across Workshares. There is little progress in this direction to be found in OpenMP 3.1. The environment variable OMP_PROC_BIND has been added to control the binding of threads to processors, it accepts a boolean value. If enabled, the OpenMP runtime is instructed to not move OpenMP threads between processors. The mapping of threads to processors is unspecified and thus depends on the implementation. In general, introducing this variable that controls program-wide behavior was intended to standardize behavior found in almost all current OpenMP implementations.
5: Accelerators, GPUs and More. While there is nothing new on this topic in OpenMP 3.1, the Accelerator subcommittee put a lot of effort into coming up with a first (preliminary!) proposal. This is clearly interesting. From my personal point of view, OpenMP 4.0 might provide basic support for programming accelerators such as GPUs, thus delivering a vendor-neutral standard. Do not expect anything full-featured similar to CUDA, the current proposal is rather similar in spirit to the PGI Accelerator approach (which I do like). However, this is still far from being done, and may (or may not) change directions completely. The crucial aspects are to integrate well with the rest of OpenMP, and to provide an easy to use but still powerful approach to allow for bringing certain important code patterns to accelerator devices.
6: Transactional Memory and Thread Level Speculation. There is in general no activity on this topic in the OpenMP Language Committee and apparently it dropped from the set of important topics. Personally, (now) I do not think TM should be a target for OpenMP in the forseable future.
7: Refinements to the OpenMP Tasking Model. There have been some improvements to the Tasking model, with some more on the roadmap for OpenMP 4.0.
- The taskyield directive has been added to allow for user-defined task scheduling (tsp) points. A tsp is a point in the execution of a task at which is can be suspended to be resumed later; or the event of task completion, after which the executing thread may switch to a different task.
- The mergeable clause has been added to the list of possible task clauses, indicating that the task may have the same data region as the generating task region.
- The final clause has been added to the list of possible task clauses, denoting the execution of all descending tasks sequentially in the same region. This implies immediate execution of final tasks, and ignoring any untied task clauses. An optional scalar expression allows for conditioning the application of the final clause.
8: Extending OpenMP to C++0x and FORTRAN 2003. There is nothing new on this topic in OpenMP 3.1. We closely follow the development of the base language and it has to be seen what can (or has to) be done for OpenMP 4.0. Anyhow, the fact that base languages are introducing threading and a thread-aware memory model leads to some simplifications on the one hand, but also could lead to potential conflicts on the other hand. We are not aware of any such conflict, but digging through the details and implification of a base language such as C++ as well as OpenMP is a pretty complex task.
9: Extending OpenMP to Additional Languages. There is nothing new on this topic in OpenMP 3.1, and currently there is no intention of doing so inside the OpenMP Language Committee. Personally, I would like to see an OpenMP binding for Java, since it would really help teaching parallel programming, but I do not see this happen.
10: Clarifications to the Existing Specifications. There have been plenty of clarification, corrections, and micro-updates. Most notably the examples and description in the appendix have been corrected, clarified, and expanded.
11: Miscellaneous Extensions. A couple of miscellaneous extensions made it into OpenMP 3.1:
- The atomic construct has been extended to accept the following new clauses: read, write, update and capture. If none is given, it defaults to update. Specifying an atomic region allows to atomically read / write / update the value of the variable affected by the construct. Note that not everything inside an atomic region is performed atomically, i.e. the evaluation of “other” variables is not. For example in an atomic write construct, only the left-hand variable (the one that is written to) is written atomically.
- The firstprivate clause now accepts const-qualified types in C/C++ as well as intent(in) in Fortran. This is just a reaction to annoyances reported by some users.
- The reduction clause has been extended to allow for min and max reductions for built-in datatypes in C/C++. This still excludes aggregate types (including arrays) as well as pointer and reference types from being used in an OpenMP reduction. We had a proposal for powerful user-defined reductions (UDRs) on the table for a long time, it was discussed heavily, but did not make it into OpenMP 3.1. That would have made this release of the spec much stronger. Adding UDRs is a high priority for OpenMP 4.0 for many OpenMP Language Committee members, though.
- omp_in_final() is as new API routine to determine whether it is called from within a final (aka included) task region.
12: Additional Task / Threads Synchronization Mechanisms. There is nothing new on this topic in OpenMP 3.1, and not much interest in the OpenMP Language Committee that I have noticed. However, we are thinking of task dependencies and task reductions for OpenMP 4.0, and both feature would probably fall into this category (and then there would be a high interest).