New article on OpenMP 4.0 online

A while ago I published a list of articles and tutorials on OpenMP 4.0, including the German article on heise Developer that I wrote together with Michael Klemm (Intel). A slightly modified English version of our text has now appeared in issue 16 of Intel’s Parallel Universe magazine, titled Full throttle: OpenMP 4.0.

The current and past issues of the Parallel Universe magazine are available online. If you are interested in developing parallel code for Intel architectures, you might find some interesting reads there.

Gaussian Elimination Squad wins Intel Parallel Universe Computing Challenge at SC13

The German team with the glorious name Gaussian Elimination Squad took first place in the Intel Parallel Universe Computing Challenge! Each round of the challenge consisted of two parts: the first was a trivia challenge with 20 questions about computing, computer history, programming languages, and the SC conference series; the second was a coding challenge, which gave each team ten minutes to speed up a piece of code they had never seen before as much as possible. On top of it all, the audience could watch what the teams were doing on two giant screens. Georg Hager, our team captain, has a blog post with all the details.

The competition was really a lot of fun and a nice distraction from an otherwise pretty busy SC13. There is a short video capturing the atmosphere during the final competition and also a brief article on insideHPC.

The Gaussian Elimination Squad represented the German HPC community, with members from RWTH Aachen (Christian Terboven and Joachim Protze), Jülich Supercomputing Center (Damian Alvarez), ZIH Dresden (Michael Kluge and Guido Juckeland), TU Darmstadt (Christian Iwainsky), Leibniz Supercomputing Center (Michael Ott), and Erlangen Regional Computing Center (Gerhard Wellein and Georg Hager). As only four team members were allowed per match, I was lucky to play together with Gerhard and Georg in all rounds, but the others helped us by shouting advice and answers they thought were correct.

Articles and Tutorials on OpenMP 4.0

You should have heard by now that OpenMP 4.0 has finally been released; the official statement is available online. It really is a major new release, and therefore it will take a while until all implementations have incorporated all new features. Nevertheless, as some implementers already offer beta releases of their compiler products with some new OpenMP 4.0 features available, you might be interested in learning more about the new standard to get your hands dirty. In this blog post I have collected links to the OpenMP 4.0 material I am currently aware of and give pointers to places and events at which you can learn more.

First, if you are fine with reading a German article, my friend Michael Klemm and I have written an overview piece discussing the most important changes and new additions (from our point of view), including some code examples. It has been published at heise Developer. Together we also gave a corresponding presentation at parallel 2013, of which I made the slides available on my blog (slides in English), again with several code snippets.

At the end of July / early August we held our “Parallel Programming Summer Course” in Aachen, during which OpenMP occupied two days of the agenda. The course material contains three slide decks on OpenMP, which give a thorough introduction (I hope) to OpenMP programming and touch on the following new OpenMP 4.0 features: device constructs, task dependencies, thread affinity, array sections and user-defined reductions. I gave very similar talks at the Hartree Centre Summer School 2013.
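To give an impression of one of these features, here is a minimal sketch of a user-defined reduction. It is not taken from the course material: the type range_t, the helper range_merge, the reduction identifier merge and the function find_range are all names I made up for illustration.

```c
#include <limits.h>

/* Hypothetical type for illustration: smallest and largest value seen. */
typedef struct { int min; int max; } range_t;

/* Combiner: fold the partial result *in into *out. */
static void range_merge(range_t *out, const range_t *in)
{
    if (in->min < out->min) out->min = in->min;
    if (in->max > out->max) out->max = in->max;
}

/* Teach OpenMP how to combine two range_t values (omp_out, omp_in)
   and how to initialize each thread's private copy (omp_priv). */
#pragma omp declare reduction(merge : range_t : range_merge(&omp_out, &omp_in)) \
    initializer(omp_priv = { INT_MAX, INT_MIN })

range_t find_range(const int *data, int n)
{
    range_t r = { INT_MAX, INT_MIN };

    /* Each thread works on a private r; the copies are merged at the end. */
    #pragma omp parallel for reduction(merge : r)
    for (int i = 0; i < n; i++) {
        range_t v = { data[i], data[i] };
        range_merge(&r, &v);
    }
    return r;
}
```

Calling find_range on an array holding 3, -1, 7 and 4 should yield a minimum of -1 and a maximum of 7. A compiler without OpenMP support simply ignores the pragmas and runs the loop serially, with the same result.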

Rolf Rabenseifner from HLRS also holds many very good courses on parallel programming. He is currently extending his material to cover selected OpenMP 4.0 topics, probably for the next course instance already.

If you attended ISC’13 in Leipzig, you had the chance to hear Bronis de Supinski, Michael Klemm and me in the half-day Advanced OpenMP Tutorial. Our slides are part of the tutorial proceedings.

At SC13 in Denver the same group plus Ruud van der Pas will talk about Advanced OpenMP: Performance and 4.0 Features. This will be the first time we focus in great detail on the new features of OpenMP 4.0 and how to exploit them for programmability and performance. And finally, at Euro-Par 2013, Tim Mattson and I will be giving a half-day tutorial on Advanced OpenMP again, this time focusing even more on lower-level system details like the memory model and cache coherency mechanisms.

OpenMP 4.0 almost ready after recent F2F meeting

Last week’s OpenMP Language Committee face-to-face (F2F) meeting was meant to resolve the final outstanding issues to get the OpenMP 4.0 specification ready. With this week’s concall I assume we achieved just that and now it is our editor’s turn to apply all remaining tickets to the spec document. After that, the OpenMP ARB will perform the official vote on July 11th (if my calendar is correct), which in case of a positive vote will then also be the release date of the OpenMP 4.0 spec. This voting is generally considered just a formality, as the OpenMP member companies and institutions sending staff to the Language Committee also constitute the OpenMP ARB. OpenMP 4.0 will not break existing codes.

If you are interested in learning about the new features, you may want to stop by the JARA-HPC booth #755 at ISC in Leipzig next week. We have (preliminary) OpenMP 4.0 syntax reference cards as handouts for you. If you want to meet me in person, you are welcome to visit the booth during my booth duties on Monday (11:30h to 13:00h), Tuesday (11:30h to 13:00h) or Wednesday (13:00h to 14:30h).

Advanced OpenMP Tutorial @ ISC in June in Leipzig

The International Supercomputing Conference (ISC) will take place in Leipzig, Germany, next week from June 16th to June 20th, 2013. This year the program contains tutorials again, and the team of Bronis de Supinski (LLNL), Michael Klemm (Intel) and me will offer the Advanced OpenMP Programming tutorial on June 16th, 9:00 AM to 1:00 PM. If you are interested in learning about performance-focused OpenMP programming and the new features in OpenMP 4.0, this might be the right one for you, although we obviously cannot cover everything in detail in just the four hours we were given. We asked for a full day but got only half of one.

While we quickly review the basics of OpenMP programming, we assume attendees understand basic parallelization concepts and will easily grasp those basics. We focus on performance aspects, such as data and thread locality on NUMA architectures, false sharing, and exploitation of SIMD vector units. We discuss language features in-depth, with emphasis on features recently added to OpenMP such as tasking. We close with an overview of the new OpenMP 4.0 directives for attached compute accelerators. This is our detailed agenda:

  1. OpenMP Overview (15 minutes)
    1. Core Concepts: Parallel Region, Worksharing, Nesting
    2. Synchronization: Synchronization Constructs and the Memory Model
  2. Techniques to Obtain High Performance with OpenMP: Memory Access (45 minutes)
    1. Understanding Memory Access Patterns
    2. Memory Placement and Thread Binding
    3. Performance Tips and Tricks: Avoiding False Sharing, Private versus Shared Data
  3. Techniques to Obtain High Performance with OpenMP: Vectorization (30 minutes)
    1. Understanding Vector Microarchitectures
    2. Vectorization with OpenMP 4.0
  4. Advanced Language features (60 minutes)
    1. The OpenMP Tasking Model
    2. Tasking in Detail: Final, Mergeable, and Dependencies
    3. Cancellation
    4. Misc. OpenMP 4.0 Features: Controlling the Implementation, Reduction Extensions, Improved Atomic Support
  5. OpenMP for Attached Compute Accelerators (45 minutes)
    1. The OpenMP Execution Model for Devices
    2. Target Construct
    3. OpenMP on the Intel Xeon Phi Coprocessor Examples
  6. Future OpenMP Directions (15 minutes)
    1. Comprehensive Overview of New OpenMP Features
    2. OpenMP 4.0 and Beyond: Status, Directions and Schedule
    3. Open Discussion of Possible OpenMP Extensions (until we get thrown out of the room or people leave for lunch)
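As a small taste of the vectorization part of the agenda, here is a hedged sketch of the two OpenMP 4.0 SIMD directives. It is not from the tutorial slides; the function names dot, scale_add and saxpy are hypothetical and chosen only for this example.

```c
/* Ask the compiler to vectorize the loop and treat sum as a reduction
   variable, so that partial sums can be kept in vector lanes. */
float dot(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Request a vector version of this function in addition to the scalar
   one; uniform(a) states that a is the same for all SIMD lanes. */
#pragma omp declare simd uniform(a)
float scale_add(float a, float x, float y)
{
    return a * x + y;
}

/* The simd loop can then call the vector variant of scale_add. */
void saxpy(float a, const float *x, float *y, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = scale_add(a, x[i], y[i]);
}
```

Whether the loops really get vectorized, and how well, depends on the compiler and the target architecture. The directives only state that vectorization is safe and desired. Note also that a SIMD reduction may sum the elements in a different order than the serial loop, so floating-point results can differ slightly in the last digits.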

OpenMP 4.0 RC2 has been released

In addition to the new features already present in release candidate one (RC1), the second draft of the next OpenMP specification release contains the following additions (quoted):

  • Initial accelerators support: Device Data Environments (p16), target constructs (p68: target, target data, target update, declare target, teams, distribute; p151: map clause; and associated runtime routines (p191).)
  • Task dependency support through the new depend clause. (p91)
  • Initial error model support through cancel and cancellation point constructs to request cancellation of specified region types and to declare a user-defined cancellation point to check for cancellation requests. (Section 2.13, p116: Cancellation Constructs)
  • Support for array sections in C, C++ and Fortran. (Section 2.4, p36: Array Sections)
  • Extended declare simd directive to allow multiple declarations. (p64)
  • New environment variable OMP_DISPLAY_ENV instructing the runtime to display the OpenMP version number and ICV values during initialization. (p219)
  • Additional enhancements to support Fortran 2003.
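To make two of these additions a bit more concrete, here is a minimal sketch of the depend clause and of a target construct with an array section in its map clause. This is my own illustration, not text from the draft; run_pipeline and scale_on_device are hypothetical names, and I assume that the target region falls back to the host if no accelerator is available.

```c
/* Task dependencies: three tasks ordered through depend clauses.
   The runtime may only start a task once the tasks producing its
   "in" variables have finished. */
int run_pipeline(int seed)
{
    int a = 0, b = 0, result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a) shared(a)
        a = seed + 1;

        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = a * 2;

        #pragma omp task depend(in: b) shared(b, result)
        result = b - 3;
    }   /* implicit barrier: all three tasks have completed here */

    return result;
}

/* Accelerator support: the array section x[0:n] (start index 0,
   length n) is copied to the device before the region and back
   to the host afterwards. */
void scale_on_device(double *x, int n, double factor)
{
    #pragma omp target map(tofrom: x[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] *= factor;
}
```

With a seed of 4, run_pipeline should return 7, no matter which threads execute the tasks. Both functions also compile and give the same results with OpenMP disabled, because the pragmas are then ignored.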

As we were not yet able to incorporate all the feedback that has been reported so far, a few known issues are still in the document. Additionally, some more minor changes are already in preparation. Feedback and questions are of course still welcome, so go ahead and download the new document.