Category Archives: Future of HPC

OpenMP 4.5 is on its way

Last week the first-ever OpenMPCon and the IWOMP 2015 workshop took place here in Aachen, and this week we hosted the OpenMP Language Committee face-to-face meeting. An important goal of this meeting was to address the remaining open issues and then complete the specification work, so that the next version of OpenMP is available in time for SC15. The OpenMP 4.1 draft was released in July this year and the comment period was open until the end of September. However, after realizing how many changes, and in particular improvements, this next version will bring, we decided to call it OpenMP 4.5. Keeping the major version at “4” signals that the changes will not break any existing code.

Within the next few weeks the Language Committee will make the final changes to the spec text and do a final verification run; then the document will be handed over to the OpenMP Architecture Review Board (ARB) for the vote. The ARB can be expected to accept the document as the new version of the OpenMP standard, which will then be released early at SC15. If you would like to know what is in it and talk to us, come to our tutorial on Advanced OpenMP Programming on the Monday right before SC15.

The number of changes is indeed large: about 130 tickets have been passed. Every change to the spec text is represented by a ticket capturing the changes in LaTeX code and, of course, the corresponding discussion(s). Tickets differ in size, though: some are small and contain only minor corrections, others are large and bring lots of new functionality (I think the target features of OpenMP 4.0 were captured in two huge tickets).

OpenMP 4.5 will obviously bring many clarifications and some minor corrections, but also some notable enhancements:

  • We handled several items from the Fortran 2003 to-do list. Fortran 2003 is now supported as a base language, with a few explicitly mentioned exceptions.
  • SIMD and tasking extensions and refinements made their way into OpenMP 4.5.
  • Finally, OpenMP will support reductions for C/C++ arrays and templates.
  • Runtime routines to support cancellation and affinity have been added.
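To make the array reduction item a bit more concrete, here is a minimal sketch in C of a histogram that uses an array section in a reduction clause. It assumes a compiler with OpenMP 4.5 support (for example GCC 6 or later with `-fopenmp`); the function name `histogram` and the bin count `NBINS` are made up for illustration. Each thread works on a private, zero-initialized copy of the bins, and the copies are summed into `hist` at the end of the loop.

```c
#define NBINS 4

/* Counts how many entries of values[0..n-1] fall into each of the NBINS
   bins; out-of-range entries are ignored. Requires OpenMP 4.5 for the
   array section in the reduction clause. */
void histogram(const int *values, int n, int *hist)
{
    for (int b = 0; b < NBINS; b++)
        hist[b] = 0;

    /* Each thread gets a private copy of hist[0..NBINS-1], initialized
       to 0 (the identity of +); the copies are added to hist at the end. */
    #pragma omp parallel for reduction(+: hist[0:NBINS])
    for (int i = 0; i < n; i++) {
        int b = values[i];
        if (b >= 0 && b < NBINS)
            hist[b]++;
    }
}
```

Before OpenMP 4.5, the same effect required a manually managed per-thread array or one atomic update per element.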

We also introduced some new features:

  • Support for doacross loops.
  • Loops can now be divided into tasks with the taskloop construct.
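The two new features could look roughly like this. This is a sketch assuming an OpenMP 4.5 compiler; the function names are hypothetical. The first loop is a doacross loop: `ordered(1)` together with `ordered depend(sink: i - 1)` and `ordered depend(source)` lets iteration i do its independent work in parallel but wait for iteration i-1 before updating the running sum. The second loop uses `taskloop` to turn the iterations into tasks, where `grainsize(1024)` asks for at least 1024 and fewer than 2048 iterations per task (or all of them, if there are fewer than 1024).

```c
/* Doacross loop: out[i] = out[i-1] + in[i]*in[i]. */
void prefix_sum_of_squares(long n, const double *in, double *out)
{
    if (n <= 0)
        return;
    out[0] = in[0] * in[0];

    #pragma omp parallel for ordered(1)
    for (long i = 1; i < n; i++) {
        double v = in[i] * in[i];        /* independent work, runs in parallel */
        /* Wait until iteration i-1 has signalled completion; for i == 1 the
           sink iteration 0 lies outside the iteration space and is ignored. */
        #pragma omp ordered depend(sink: i - 1)
        out[i] = out[i - 1] + v;
        /* Signal that out[i] is now available to iteration i+1. */
        #pragma omp ordered depend(source)
    }
}

/* taskloop: one thread creates tasks that cover the loop iterations;
   the implicit taskgroup of taskloop waits for all of them. */
void scale(long n, double a, double *x)
{
    #pragma omp parallel
    #pragma omp single
    #pragma omp taskloop grainsize(1024)
    for (long i = 0; i < n; i++)
        x[i] *= a;
}
```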

I plan to talk about some of these here in detail.

At IWOMP, Bronis de Supinski from LLNL, the Chair of the OpenMP Language Committee, gave a talk on the State of OpenMP & Outlook on OpenMP 4.1 (back then we had not yet decided to call it 4.5). We will make all the IWOMP talks available on the IWOMP homepage soon, but here are two of his slides outlining the most important new additions beyond those I have already mentioned:

Bronis: OpenMP 4.1 Features (1/2)

Bronis: OpenMP 4.1 Features (2/2)

During his talk he also outlined what is on the agenda for OpenMP 5.0:

Bronis: OpenMP 5.0 Plans

OpenMP 4.0 almost ready after recent F2F meeting

Last week’s OpenMP Language Committee face-to-face (F2F) meeting was meant to resolve the final outstanding issues to get the OpenMP 4.0 specification ready. With this week’s conference call I believe we achieved just that, and now it is our editor’s turn to apply all remaining tickets to the spec document. After that, the OpenMP ARB will hold the official vote on July 11th (if my calendar is correct), which, in case of a positive vote, will also be the release date of the OpenMP 4.0 spec. This vote is generally considered a formality, as the OpenMP member companies and institutions that send staff to the Language Committee also constitute the OpenMP ARB. OpenMP 4.0 will not break existing code.

If you are interested in learning about the new features, you may want to stop by the JARA-HPC booth #755 at ISC in Leipzig next week. We have (preliminary) OpenMP 4.0 syntax reference cards as handouts for you. If you want to meet me in person, you are welcome to visit the booth during my booth duties on Monday (11:30h to 13:00h), Tuesday (11:30h to 13:00h) or Wednesday (13:00h to 14:30h).

Advanced OpenMP Tutorial @ ISC in June in Leipzig

The International Supercomputing Conference (ISC) will take place in Leipzig, Germany, next week from June 16th to June 20th, 2013. This year the program contains tutorials again, and the team of Bronis de Supinski (LLNL), Michael Klemm (Intel) and myself will offer the Advanced OpenMP Programming tutorial on June 16th, 9:00 AM to 1:00 PM. If you are interested in learning about performance-focused OpenMP programming and the new features in OpenMP 4.0, this might be the right one for you, although we obviously cannot cover everything in detail in just the four hours we were given. We asked for a full day but got only half a day.

While we quickly review the basics of OpenMP programming, we assume attendees understand basic parallelization concepts and will easily grasp those basics. We focus on performance aspects, such as data and thread locality on NUMA architectures, false sharing, and exploitation of SIMD vector units. We discuss language features in-depth, with emphasis on features recently added to OpenMP such as tasking. We close with an overview of the new OpenMP 4.0 directives for attached compute accelerators. This is our detailed agenda:

  1. OpenMP Overview (15 minutes)
    1. Core Concepts: Parallel Region, Worksharing, Nesting
    2. Synchronization: Synchronization Constructs and the Memory Model
  2. Techniques to Obtain High Performance with OpenMP: Memory Access (45 minutes)
    1. Understanding Memory Access Patterns
    2. Memory Placement and Thread Binding
    3. Performance Tips and Tricks: Avoiding False Sharing, Private versus Shared Data
  3. Techniques to Obtain High Performance with OpenMP: Vectorization (30 minutes)
    1. Understanding Vector Microarchitectures
    2. Vectorization with OpenMP 4.0
  4. Advanced Language Features (60 minutes)
    1. The OpenMP Tasking Model
    2. Tasking in Detail: Final, Mergeable, and Dependencies
    3. Cancellation
    4. Misc. OpenMP 4.0 Features: Controlling the Implementation, Reduction Extensions, Improved Atomic Support
  5. OpenMP for Attached Compute Accelerators (45 minutes)
    1. The OpenMP Execution Model for Devices
    2. Target Construct
    3. OpenMP on the Intel Xeon Phi Coprocessor Examples
  6. Future OpenMP Directions (15 minutes)
    1. Comprehensive OpenMP new Features Overview
    2. OpenMP 4.0 and beyond Status, Directions and Schedule
    3. Open Discussion of Possible OpenMP Extensions (until we get thrown out of the room or people leave for lunch)

OpenMP 4.0 RC2 has been released

In addition to the new features already present in release candidate one (RC1), the second draft of the next OpenMP specification release contains the following additions (quoted from

  • Initial accelerators support: Device Data Environments (p16), target constructs (p68: target, target data, target update, declare target, teams, distribute; p151: map clause; and associated runtime routines (p191).)
  • Task dependency support through the new depend clause. (p91)
  • Initial error model support through cancel and cancellation point constructs to request cancellation of specified region types and to declare a user-defined cancellation point to  check for cancellation requests. (Section 2.13, p116: Cancellation Constructs)
  • Support for array sections in C, C++ and Fortran. (Section 2.4, p36: Array Sections)
  • Extended declare simd directive to allow multiple declarations. (p64)
  • New environment variable OMP_DISPLAY_ENV instructing the runtime to display the OpenMP version number and ICV values during initialization. (p219)
  • Additional enhancements to support Fortran 2003.

As we were not yet able to incorporate all the feedback reported so far, a few known issues are still in the document. Additionally, some more minor changes are already in preparation. Feedback and questions are of course still welcome, so head over to and download the new document.

OpenMP 4.0 RC2 is well on its way

Long time no blog post. But I have good news to share today: yesterday the OpenMP Language Committee (LC) held the final votes on a set of tickets (read: extensions or corrections) to be included in OpenMP 4.0 RC2, the second release candidate of OpenMP 4.0, the anticipated next version of the specification. These tickets are basically the outcome of the last LC meeting in January plus some first feedback we received on RC1. And they bring some very nice new features to OpenMP (some of which are long overdue).

Before I give a brief overview of the new additions, some remarks on the procedure leading to the final OpenMP 4.0 specification. Our aim is to have the RC2 document ready and published roughly two weeks from now. This means hard work for our editor Richard Friedman, as all tickets are written as a diff against the latest spec draft, namely RC1. After the release of RC2, we will again solicit feedback on the new spec draft. This feedback is important for the final vote by all OpenMP members in the Architecture Review Board (ARB), as the ARB is the owner of the spec and has to formally accept the new spec proposed by the LC. Only then, given majority acceptance in the ARB vote, will OpenMP 4.0 be released. During the feedback period, the LC still has to complete some parts of the spec that are not ready yet, such as the appendix; in particular, the examples are not complete. And if any of the public reviewers finds a serious flaw in one of the new extensions, we will have to either fix it quickly (if possible) or withdraw it from OpenMP 4.0. This means the following additions are very likely to be part of OpenMP 4.0, but nothing is guaranteed yet.

Cancellation. At so-called cancellation points, implicit and explicit tasks check whether cancellation has been requested and, if so, abort the current region and jump right to its end. Cancellation can be requested via the new cancel construct, which can cancel either the whole parallel region, the innermost sections worksharing construct, the innermost for worksharing construct, or the innermost taskgroup, where innermost is always defined with respect to the thread team encountering the cancel construct. Control flow resumes right after the end of the cancelled region. A thread or task requesting cancellation does not cause an immediate abort of all other threads or tasks in the respective region; instead, these abort execution only once they reach a cancellation point. Cancellation points are implied at barriers and at the cancel construct itself, and can be user-defined via the cancellation point construct. This addition is the first step towards a fully-featured error model in OpenMP.
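A minimal sketch of cancelling a worksharing loop, assuming an OpenMP 4.0 compiler; `find_any` is a made-up name. Note that cancellation is only active when the `OMP_CANCELLATION` environment variable is set to true; otherwise the `cancel` and `cancellation point` constructs are ignored and the loop simply runs to completion, which yields a valid result here as well.

```c
/* Returns an index i with a[i] == key, or -1 if key does not occur.
   If several elements match, any one of them may be returned. */
long find_any(const int *a, long n, int key)
{
    long found = -1;

    #pragma omp parallel shared(found)
    {
        #pragma omp for
        for (long i = 0; i < n; i++) {
            if (a[i] == key) {
                #pragma omp critical
                found = i;
                /* Request cancellation of the innermost for construct;
                   the encountering thread jumps to the end of the loop. */
                #pragma omp cancel for
            }
            /* User-defined cancellation point: other threads check here
               whether cancellation was requested and skip the rest. */
            #pragma omp cancellation point for
        }
    }
    return found;
}
```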

Task Dependencies. The optional depend clause on a task enforces additional constraints on the scheduling of a task by enabling dependences between sibling tasks. It has the form depend(dependence-type : list) and accepts a list of variables. If the in dependence-type is given, the generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an out or inout clause. If an out or inout dependence-type is given, the generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an in, out, or inout clause. The principal model you should have in mind with this feature is the flow of data: if a variable appears in an in-type depend clause of a given task, this task has to wait for all previously generated sibling tasks in which this variable appears in an out-type depend clause, as these tasks first have to write to the variable; otherwise the update would be lost. And vice versa: a task with an out-type dependence has to wait for earlier tasks that read the variable. This is also why out/inout-type dependences wait for earlier out/inout-type dependences on the same variable, as this enforces an ordering of the writing tasks. In the current form of this feature, there is no observable difference in task scheduling between the out and inout dependence types, but we foresee allowing certain types of optimizations in the future if there is a distinction between out and inout. Additionally, we found that having both types maps better to the data-flow model.
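The data-flow view can be illustrated with three sibling tasks that are ordered only through their depend clauses. This is a sketch assuming an OpenMP 4.0 compiler; `pipeline` is a hypothetical name, and the explicit `shared` clauses just make the default data-sharing visible. The inout task has to wait for the out task, and the in task has to wait for the inout task, so the result is deterministic.

```c
/* Three sibling tasks: producer (out), updater (inout), consumer (in). */
int pipeline(int seed)
{
    int x = 0, y = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(x) depend(out: x)
        x = seed;                 /* producer: writes x */

        #pragma omp task shared(x) depend(inout: x)
        x = x * 2;                /* waits for the producer */

        #pragma omp task shared(x, y) depend(in: x)
        y = x + 1;                /* waits for the inout task */

        #pragma omp taskwait      /* wait for all three child tasks */
    }
    return y;
}
```

For example, `pipeline(5)` computes 5, then 10, and returns 11 regardless of how many threads execute the tasks.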

Array Sectioning. Several clauses in OpenMP 4.0 require the ability to describe a subset of native arrays, especially for the accelerator support in the target construct (see below). Array sectioning allows one to define a subset of the elements in an array via [lower-bound : length], [lower-bound :], [: length] or [:]. The use of array sectioning is restricted to selected constructs and clauses.
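Array sections can also appear in depend clauses. The sketch below, again assuming an OpenMP 4.0 compiler and a hypothetical function `sum_halves`, uses the [: length] and [lower-bound :] forms on a fixed-size array (the length can only be omitted when the array size is known). The two producer tasks write disjoint halves, and the consumer names exactly the same sections, because list items in depend clauses of sibling tasks must refer to either identical or disjoint storage.

```c
#define N 8

/* Two producer tasks fill disjoint halves of a; the consumer task
   depends on both halves and sums the whole array. */
int sum_halves(void)
{
    int a[N];
    int sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* [: length]: lower bound defaults to 0, i.e. a[0..N/2-1] */
        #pragma omp task shared(a) depend(out: a[:N/2])
        {
            for (int i = 0; i < N / 2; i++)
                a[i] = i;
        }

        /* [lower-bound :]: length runs to the end of the array */
        #pragma omp task shared(a) depend(out: a[N/2:])
        {
            for (int i = N / 2; i < N; i++)
                a[i] = i;
        }

        /* Same two sections as the producers, so both must finish first. */
        #pragma omp task shared(a, sum) depend(in: a[:N/2], a[N/2:])
        {
            for (int i = 0; i < N; i++)
                sum += a[i];
        }

        #pragma omp taskwait
    }
    return sum;
}
```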

Support for Accelerators. This is really the big new thing in OpenMP 4.0, which the LC aimed to get ready for inclusion. The target construct allows for the execution of OpenMP constructs on a device other than the current host/device. The target data construct creates a device data environment and allows for the “mapping” of data between the different devices; for current accelerators this means copying data from the host to the device and vice versa. The declare target directive instructs the OpenMP implementation to create device-specific versions of the variables or functions specified, meaning they are available (for execution) on the device. To support the vector-style operation of current accelerators, there are two new constructs: the teams construct creates several OpenMP thread teams (then called a league), of which only the master thread of each team executes the associated region, and the distribute construct specifies that the corresponding loop iterations will be distributed across the thread teams. I understand this very brief description cannot serve as a good explanation of this new feature, but I don’t have my examples ready yet. If you know OpenACC and/or the PGI Accelerator programming model, you probably have an idea of what will be in OpenMP. Personally, I regard OpenMP 4.0 with this extension as a superset of OpenACC, in which the common roots are visible. More information and documentation on this new feature, which I like to call “OpenMP for Accelerators (OpenMP4Acc)”, should become available with the release of the OpenMP 4.0 RC2 spec draft. By March 14th at the latest, for our next PPCES event, I will have a more detailed introduction along with some examples ready and will put them here as well.
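A minimal sketch of how the accelerator constructs combine, assuming an OpenMP 4.0 compiler; `saxpy` is just the usual illustration, not an example from the spec. `target data` creates the device data environment, and the combined `target teams distribute parallel for` construct runs the loop on the default device, distributing the iterations across a league of teams and then across the threads of each team. Without an available device, the target region typically falls back to execution on the host. The map clauses are repeated on the target construct so that the pointers refer to the mapped arrays under OpenMP 4.0 rules as well; since the data is already present on the device, no additional copies are made.

```c
/* y = a*x + y, computed on the default target device. */
void saxpy(int n, float a, const float *x, float *y)
{
    /* Device data environment: x is copied to the device,
       y is copied to the device and back to the host at the end. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* The sections are already present, so these maps only
           associate the pointers with the device copies. */
        #pragma omp target teams distribute parallel for \
                map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
}
```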

Print Environment Settings. If the OMP_DISPLAY_ENV environment variable is set to true, the execution environment displays the OpenMP version number as well as the values of all ICVs (Internal Control Variables) after evaluating the user options and before starting the actual program execution. This is very helpful if one uses multiple environments.

Adding so many things to OpenMP 4.0 (don’t forget the new features already present in OpenMP 4.0 RC1) also has at least one obvious downside: the specification itself has become almost unreadable for the average OpenMP user. I cannot completely exclude myself here, although I spend a considerable amount of my work time dealing with OpenMP itself. This clearly underlines the need for good books on OpenMP 4.0 programming, but I am not aware of anyone currently working on such a thing. Several members of the LC as well as well-known instructors from academia will surely add OpenMP 4.0 aspects to their lectures and tutorials soon, but this is only the first tiny step towards OpenMP 4.0 adoption. I am curious to see how programmers will pick up the new goodness…