Tag Archives: Tasking

OpenMP 4.5 is on its way

Last week the first ever OpenMPCon and the IWOMP 2015 workshops took place here in Aachen and this week we hosted the OpenMP Language Committee face-to-face meeting. An important goal of this meeting was to address the remaining open issues and then to complete the specification work to make the next version of OpenMP available in time for SC15. The OpenMP 4.1 draft was released in July this year and the comment period was open until the end of September. However, after realizing how many changes and particularly improvements this next version will bring, we decided that we want to call it OpenMP 4.5. Keeping the major version at “4” assures that the changes will not break any existing code.

Within the next few weeks the Language Committee will make the final changes to the spec text and a final verification run, then the document will be handed over to the OpenMP Architecture Review Board (ARB) for the voting. It can be expected that the ARB will accept the document as the new version of the OpenMP standard. It will then be released early at SC15. If you like to know what it is and talk to us, come to our tutorial on Advanced OpenMP Programming on the Monday right before SC15.

Actually the number of changes is large, about 130 tickets have been passed. Any change to the spec text is represented by a ticket capturing the changes in LaTeX code and of course the corresponding discussion(s). However, there are tickets of different size: some are small and contain only minor corrections, others are large and bring lots of new functionality (I think the target stuff of OpenMP 4.0 was captured in two huge tickets).

OpenMP 4.5 will obviously bring many clarifications and some minor corrections, but also some notable enhancements:

  • We handled several items from the Fortran 2003 todo list. Fortran 2003 is now supported as a base language with a few exceptions mentioned explicitly.
  • SIMD and Tasking extensions and refinements made its way into OpenMP 4.5.
  • Finally, OpenMP will support reductions for C/C++ arrays and templates.
  • Runtime routines to support cancellation and affinity have been added.

We also introduced some new features:

  • Support for doacross loops.
  • Loops can now be divided into tasks with the taskloop construct.

I plan to talk about some of these here in detail.

At IWOMP, Bronis de Supinski from LLNL, who is the Chair of the OpenMP Language Committee, gave a talk on the State of OpenMP & Outlook on OpenMP 4.1 (back then we did not had decided to call it 4.5). We will make all the IWOMP talks available on the IWOMP homepage soon, but here are two of his slides outlining the most important new additions above from what I mentioned already:

Bronis: OpenMP 4.1 Features (1/2)

Bronis: OpenMP 4.1 Features (1/2)

Bronis: OpenMP 4.1 Features (2/2)

Bronis: OpenMP 4.1 Features (2/2)

During his talk he also outlined what is on the agenda for OpenMP 5.0:

Bronis: OpenMP 5.0 Plans

Bronis: OpenMP 5.0 Plans

OpenMP 4.0 RC2 is well on it’s way

Long time no blog post. But I have good news to share today: Yesterday the OpenMP Language Committee (LC) hold the final votes on a set of tickets (read: extensions or corrections) to find their way into OpenMP 4.0 RC2, the second release candidate of OpenMP 4.0, the anticipated next version of the specification. These tickets are basically the outcome of the last LC meeting in January plus some first feedback we received on RC1. And they bring some very nice new features to OpenMP (some of which are well overdue).

Before I give a brief overview of the new additions, some remarks on the procedure leading to the final OpenMP 4.0 specification. Our aim is to have the RC2 document ready and published in roughly two weeks from now. This is now hard work for our editor Richard Friedman, as all tickets are written as a diff to the currently latest spec, namely RC1. After the release of RC2, we will again solicit feedback on the new spec draft. This feedback is important for the final voting by all OpenMP members in the Architecture Review Board (ARB), as the ARB is the owner of the spec and has to formally accept the new spec proposed by the LC. Only then – given majority acceptance in the ARB vote – OpenMP 4.0 will be released. During the feedback period, the LC still has to complete some aspects of the spec, which are not ready yet, like the appendix. Especially the examples are not complete. And if anyone of the public reviewers finds a serious flaw in any of the new extensions, we will face the problem of fixing if quickly (if possible) or withdrawing it from OpenMP 4.0. This means the following additions have very high probability to be part of OpenMP 4.0, but nothing is guaranteed yet.

Cancellation. At so-called cancellation points the implicit and explicit tasks check for whether cancellation has been requested and if so, they abort the current region and jump right to the end of it. Cancellation can be requested via the new cancel construct, which is able to cancel either the whole parallel region, the innermost sections worksharing construct, the innermost for worksharing construct, or the innermost taskgroup, with innermost always defined regarding to the thread team encountering the cancel construct. Control flow will resume right after the end of the cancelled region. A thread or task requesting cancellation  does not lead to immediate abort of all other threads or tasks in the respective region, instead these will only abort execution once a cancellation points has been reached. Cancellation points are part of  barriers, the cancel construct itself, or used-defined via the cancellation point construct. This addition is the first step to a fully-featured error model in OpenMP.

Task Dependencies. The optional depend clause on a task enforces additional constraints on the scheduling of a task by enabling dependencies  between sibling tasks. It is of the form depend(dependency-type: list) accepting a list of variables. If an in dependence-type is given, the generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an out or inout clause. If an out or inout dependence-type is given, the generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an in, out, or inout clause. The principle model you should have in mind with this feature is the flow of data: if a variable appears in an in-type depend clause of a given task, this given task has to wait for all task in which this particular variable appears in an out-type depend clause, as these tasks first have to write to the variable as otherwise the update would be lost. And vice versa. This is why out/inout-type dependences also enforce “waiting” for out/inout-dependences, as hereby an ordering of the task execution is enforced. In the current form of this feature, there is no observable difference in task scheduling between the out and inout dependence types, but we forsee to allow certain types of optimizations in the future if there is a distinction between out and inout. Additionally we found that having both types better maps to the data-flow model.

Array Sectioning. Several clauses in OpenMP 4.0 require the ability to describe a subset of native arrays, especially the support for accelerators in the target construct (see below). Array sectioning allows to define a subset of the elements in an array via [lower-bound : length], or [lower-bound :], or [: length] or [:]. The use of array sectioning is restricted to selected constructs and clauses.

Support for Accelerators. This is really the big new thing in OpenMP 4.0 which the LC aimed for getting ready for inclusion. The target construct allows for the execution of OpenMP constructs on a device other than the current host/device. The target data construct creates a device data environment and allows for the “mapping” of data between the different devices – for current accelerators this means copying data from the host to the device and vice versa. The declare target directive instructs the OpenMP implementation to create device-specific versions of the variables or functions specified, meaning they are available (for execution) on the device. In order to support vector-style operations of current accelerators, there are two new constructs: the teams construct creates several OpenMP thread teams (then called a league) of which only the master executes the associated region and the distribute construct specifies that the corresponding loop iterations will be executed by the thread teams. — I understand this very brief description cannot serve as a good explanation for this new feature, but I don’t have my examples ready yet. If you know OpenACC and/or the PGI Accelerator programming model, you probably got a clue of what will be in OpenMP. Personally, I regard OpenMP 4.0 with this extension as a superset of OpenACC, in which the common roots are visible. More information and documentation on this new feature, which I like to call “OpenMP for Accelerators (OpenMP4Acc)”, should become available with the release of the OpenMP 4.0 RC2 spec draft. By the latest on March 14th for our next PPCES event I will have a more detailed introduction along with some examples ready and will put them here as well.

Print Environment Settings. The the OMP_DISPLAY_ENV environment variables is set to true, the execution environment is instructed to display the OpenMP version number as well as the values of all the ICVs (ICV = Internal Control Variable) after evaluating the user options before starting the actual program execution. This is very helpful if one uses multiple environments.

Adding so many things to OpenMP 4.0 (don’t forget the new features already present in OpenMP 4.0 RC1) also has at least one obvious downside: the specification itself has become almost unreadable for the average OpenMP user. I cannot completely exclude myself here, although I spent a reasonable amount of my work time dealing with OpenMP itself. This clearly underlines the need for good books on OpenMP 4.0 programming, but I am not aware of anyone currently working on such a thing. Several members of the LC as well as well-know instructors from academia will for sure add OpenMP 4.0 aspects to their lectures and tutorials soon, but this is only the first tiny step towards OpenMP 4.0 adoption. I am curious to see how programmers will pick up the new goodness…