This blog post has been sitting in my drafts folder for quite some time. I have been talking about these changes a lot over the last couple of days, and I finally found a few minutes to finish my notes and publish them here. From a C++ programmer’s point of view, OpenMP 3.0 comes with a few improvements, and since I was involved in getting them into the specification, this post explains what we achieved and why we did not (yet) go further with some features…
1: The for-Worksharing has been enhanced to include RandomAccessIterators and both signed and unsigned integers and even C-style pointers.
- RandomAccessIterators allow for constant-time increment, decrement, advance and distance computation, as they basically encapsulate pointer arithmetic. The 3.0 specification allows the use of loop variables of RandomAccessIterator type for loops used with for-Worksharing, as long as only the following relational operators are used inside the loop expression: <, <=, >, >=.
While it is certainly nice to have this in, many programmers (including me) will miss the != operator. It was excluded because not all language committee members could be convinced that the number of loop iterations can be computed beforehand if this operator were allowed (or: that the overflow behavior would be equivalent to the integer case). By the time we decided to stop adding / extending features, we did not have answers for all the questions and theoretical counter-examples, so != did not make it; nevertheless we hope to get it into the next specification.
- The 2.5 specification only allowed signed integer loop variables for loops used with for-Worksharing. This was bad, as it was incompatible with size_t, which is used to query the number of elements in an STL container, for example. The 3.0 specification allows both signed and unsigned integer variables, which is a rather small but nice improvement.
- C-style pointer loops have been allowed as well with the 3.0 specification. The same restrictions apply on the operator list as for the C++ RandomAccessIterator case.
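The three loop forms described above can be sketched as follows. This is only an illustration under my own assumptions: the function names are made up for this post, and the code needs an OpenMP 3.0 compiler with OpenMP enabled to actually run in parallel (without it, the pragmas are ignored and the loops run serially with the same result). Note the use of < instead of != in the iterator and pointer loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop variable is a RandomAccessIterator. The test uses '<',
// because '!=' is not permitted by the 3.0 specification.
long sum_with_iterator(const std::vector<int>& v) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (std::vector<int>::const_iterator it = v.begin(); it < v.end(); ++it)
        sum += *it;
    return sum;
}

// Loop variable is unsigned (size_t), which 2.5 did not allow.
long sum_with_unsigned(const std::vector<int>& v) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum;
}

// Loop variable is a C-style pointer; same operator restrictions
// as for the iterator case.
long sum_with_pointer(const int* first, const int* last) {
    assert(first <= last);  // the range must be well-formed
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (const int* p = first; p < last; ++p)
        sum += *p;
    return sum;
}
```

All three functions compute the same sum; they differ only in the type of the loop variable.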
2: It is now possible to threadprivatize static class member variables.
- It is important to note that only *static* class member variables can be made threadprivate. There have been certain use cases for this, for example the implementation of a Singleton pattern and thread-specific allocators, and the changes to the specification were minimal. This was probably just overlooked in the 2.5 specification process.
There were some requests to allow the privatization of class member variables in general, but this cannot be done with the threadprivate directive, since the address of a general (non-static) class member variable is not known at compile time.
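A minimal sketch of a threadprivate static class member could look like this. The class Tracker and the function count_iterations are hypothetical names I made up for illustration; the directive is placed inside the class definition, after the declaration of the member and before any reference to it.

```cpp
#include <cassert>

// Hypothetical class, for illustration only.
class Tracker {
public:
    static int hits;
    // Each thread gets its own copy of the static member.
    #pragma omp threadprivate(hits)
};

int Tracker::hits = 0;

// Every thread counts its own iterations in its private copy of
// Tracker::hits and then adds that count to the shared total.
int count_iterations(int n) {
    assert(n >= 0);
    int total = 0;
    #pragma omp parallel
    {
        Tracker::hits = 0;           // reset this thread's copy
        #pragma omp for
        for (int i = 0; i < n; ++i)
            ++Tracker::hits;         // no race: the copy is thread-local
        #pragma omp atomic
        total += Tracker::hits;
    }
    return total;
}
```

Because each thread increments only its own copy, the loop body needs no synchronization; only the final accumulation into total is protected.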
3: We specified the lifetime and initialization of non-POD data types used in privatization clauses.
- The lifetime and initialization of non-POD data types was rather unclear in the 2.5 specification, and because of that, the behavior differed between compilers. It was important to get this right, so we made some updates to the semantics of private variables of non-POD data types. There are a few things that are important to note:
- The order in which constructor calls and destructor calls for different threads happen is undefined. This is because we do not (want to) define the order in which threads are started, and you should never make any assumptions about that.
- The default constructor and destructor have to be accessible. If, for example, the default constructor is private, the program is non-conforming if such an object occurs in a private clause.
- We stated some things explicitly, e.g. that private objects have to be destructed at the end of a Parallel Region. This should force implementations to become consistent. Of course you should be aware that implementations are allowed to introduce additional objects of automatic storage duration if they “like”; this is granted by the C++ standard. The following is a brief overview of what happens with C++ non-POD data types for the different privatization clauses:
- private: There has to be an accessible, unambiguous default constructor which is called for each object, and the object is destructed at the end of the Parallel Region via the accessible, unambiguous destructor.
- firstprivate: The private instances are copy-constructed, the argument for the copy constructor call is the original list item. Of course it is required for such a data type to have an accessible, unambiguous copy constructor.
- lastprivate: The value is written back to the original list item by using the accessible, unambiguous copy assignment operator. An accessible default constructor has to be available, unless the data type is also used in a firstprivate clause, in which case an accessible copy constructor is needed instead.
- threadprivate: Here we have to differentiate three kinds of initialization: (i) no initialization, then the default constructor is called; (ii) direct initialization, then the constructor accepting the argument is called; (iii) copy initialization, then the copy constructor is called. In any case, the objects have to be constructed before the first reference, and have to be destructed after the last reference and before the program terminates.
- threadprivate+copyin and threadprivate+copyprivate: Regarding the initialization, the rules for threadprivate apply for the first encountered Parallel Region; at any following Parallel Region the copy assignment operator is invoked.
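To make the rules for private, firstprivate and lastprivate a bit more tangible, here is a small sketch. The type Value and the function last_iteration_value are hypothetical and only serve as an illustration; the special member functions are spelled out so that it is visible which one each clause relies on.

```cpp
#include <cassert>

// Hypothetical non-POD type, for illustration only.
struct Value {
    int value;
    Value() : value(0) {}                        // used by private (and lastprivate)
    Value(const Value& o) : value(o.value) {}    // used by firstprivate
    Value& operator=(const Value& o) {           // used by lastprivate write-back
        value = o.value;
        return *this;
    }
    ~Value() {}                                  // called for every private copy
};

int last_iteration_value(int n) {
    assert(n > 0);      // lastprivate needs at least one iteration
    Value seed;         // original item for firstprivate
    seed.value = 100;
    Value scratch;      // each thread gets a default-constructed copy
    Value last;         // receives the value of the sequentially last iteration

    #pragma omp parallel for private(scratch) firstprivate(seed) lastprivate(last)
    for (int i = 0; i < n; ++i) {
        scratch.value = i;                         // private working object
        last.value = seed.value + scratch.value;   // seed copy holds 100
    }
    // Copy-assigned from the thread that ran iteration n - 1.
    return last.value;
}
```

For n iterations the function returns 100 + (n - 1), regardless of how many threads are used. How many constructor and destructor calls happen in between is up to the implementation, so the sketch deliberately does not count them.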
This is just a brief summary of the changes we made; I hope it is of interest to at least one person other than me. From my point of view, there is still one “simple” thing missing: allowing non-POD data types in reductions. By the time we decided to stop adding / extending features, we had not reached a consensus on how the initialization for the reduction should occur. This is important, because with operator overloading you could basically implement user-defined reductions. We really hope to have that in the next specification update!