A while ago I published a list of articles and tutorials on OpenMP 4.0, including the German article on heise Developer that I wrote together with Michael Klemm (Intel). A slightly modified English version of our text has now appeared in issue 16 of Intel’s Parallel Universe magazine, titled Full throttle: OpenMP 4.0.
The current issue and also past issues of the Parallel Universe magazine are available at http://software.intel.com/en-us/intel-parallel-universe-magazine. If you are interested in developing parallel code for Intel architectures you might find some interesting reads over there.
The German team with the glorious name Gaussian Elimination Squad took first place in the Intel Parallel Universe Computing Challenge! Each round of the challenge consisted of two parts: the first was a trivia challenge with 20 questions about computing, computer history, programming languages, and the SC conference series; the second part was a coding challenge, which gave each team ten minutes to speed up a piece of code they had never seen before as much as possible. On top of it all, the audience could watch what the teams were doing on two giant screens. Georg Hager, our team captain, has a blog post with all the details.
The competition was really a lot of fun and a nice distraction from an otherwise pretty busy SC13. There is a short video capturing the atmosphere during the final competition and also a brief article on insideHPC.
The Gaussian Elimination Squad represented the German HPC community, with members from RWTH Aachen (Christian Terboven and Joachim Protze), Jülich Supercomputing Center (Damian Alvarez), ZIH Dresden (Michael Kluge and Guido Juckeland), TU Darmstadt (Christian Iwainsky), Leibniz Supercomputing Center (Michael Ott), and Erlangen Regional Computing Center (Gerhard Wellein and Georg Hager). As only four team members were allowed per match, I was lucky to play together with Gerhard and Georg in all rounds, but the others helped us by shouting advice and answers they thought were correct.
Will you attend SC13 in Denver, and do you want to learn how to use the new OpenMP 4.0 features? Our tutorial will help you out.
SC13 Advanced OpenMP Tutorial: OpenMP 4.0 Features
You should have heard by now that OpenMP 4.0 has finally been released; you can find the official statement on openmp.org: http://openmp.org/wp/2013/07/openmp-40/. It really is a major new release, and therefore it will take a while until all implementations have incorporated all new features. Nevertheless, as some implementers already offer beta releases of their compiler products with some new OpenMP 4.0 features available, you might be interested in learning more about the new standard to get your hands dirty. In this blog post I have collected links to the OpenMP 4.0 material I am currently aware of and give pointers to places and events at which you can learn more.
First, if you are fine with reading a German article, my friend Michael Klemm and I have written an overview piece discussing the most important changes and new additions (from our point of view), including some code examples. It has been published at heise Developer here: http://www.heise.de/developer/artikel/Die-wichtigsten-Neuerungen-von-OpenMP-4-0-1915844.html. Together we also gave a corresponding presentation at parallel 2013, of which I made the slides available on my blog (slides in English), again with several code snippets.
At the end of July and in early August we held our “Parallel Programming Summer Course” in Aachen, during which OpenMP occupied two days of the agenda. The course material contains three slide decks on OpenMP which give a thorough introduction (I hope) to OpenMP programming and touch on the following new OpenMP 4.0 features: device constructs, task dependencies, thread affinity, array sections, and user-defined reductions. I gave very similar talks at the Hartree Centre Summer School 2013.
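To give a flavor of one of these features, here is a minimal sketch of a user-defined reduction. It is not taken from the course material; the type `min_loc` and the functions `min_loc_combine` and `find_min` are hypothetical names chosen for illustration. The sketch finds the smallest element of an array together with its index, which the built-in reduction operators cannot express directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: smallest value seen so far and its index. */
typedef struct {
    double value;
    size_t index;
} min_loc;

/* Combine two partial results. Ties go to the lower index, so the result
   does not depend on how the iterations are split across threads. */
static min_loc min_loc_combine(min_loc a, min_loc b)
{
    if (b.value < a.value || (b.value == a.value && b.index < a.index))
        return b;
    return a;
}

/* OpenMP 4.0 user-defined reduction named "minloc" for the type min_loc.
   Each private copy starts out as a copy of the original variable. */
#pragma omp declare reduction(minloc : min_loc : \
        omp_out = min_loc_combine(omp_out, omp_in)) \
        initializer(omp_priv = omp_orig)

/* Return the minimum of data[0..n-1] and the position of its first
   occurrence. Requires n > 0. */
min_loc find_min(const double *data, size_t n)
{
    assert(data != NULL && n > 0);
    min_loc best = { data[0], 0 };

    #pragma omp parallel for reduction(minloc : best)
    for (size_t i = 1; i < n; ++i) {
        min_loc candidate = { data[i], i };
        best = min_loc_combine(best, candidate);
    }
    return best;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function computes the same result serially, which makes the sketch easy to check.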
Rolf Rabenseifner from HLRS also holds many very good courses on parallel programming. He is currently extending his material to cover selected OpenMP 4.0 topics, probably for the next course instance already.
If you attended ISC’13 in Leipzig, you had the chance to hear Bronis de Supinski, Michael Klemm and myself in the half-day Advanced OpenMP Tutorial. Our slides are part of the tutorial proceedings.
At SC13 in Denver the same group plus Ruud van der Pas will talk about Advanced OpenMP: Performance and 4.0 Features, see http://sc13.supercomputing.org/content/tutorials. This will be the first time we focus in great detail on the new features of OpenMP 4.0 and how to exploit them for programmability and performance. And finally, at Euro-Par 2013, Tim Mattson and I will be giving a half-day tutorial on Advanced OpenMP again, this time focusing even more on lower-level system details like the memory model and cache coherency mechanisms.
Last week’s OpenMP Language Committee face-to-face (F2F) meeting was meant to resolve the final outstanding issues to get the OpenMP 4.0 specification ready. With this week’s concall I assume we achieved just that and now it is our editor’s turn to apply all remaining tickets to the spec document. After that, the OpenMP ARB will perform the official vote on July 11th (if my calendar is correct), which in case of a positive vote will then also be the release date of the OpenMP 4.0 spec. This voting is generally considered just a formality, as the OpenMP member companies and institutions sending staff to the Language Committee also constitute the OpenMP ARB. OpenMP 4.0 will not break existing codes.
If you are interested in learning about the new features, you may want to stop by the JARA-HPC booth #755 at ISC in Leipzig next week. We have (preliminary) OpenMP 4.0 syntax reference cards as handouts for you. If you want to meet me in person, you are welcome to visit the booth during my booth duties on Monday (11:30h to 13:00h), Tuesday (11:30h to 13:00h) or Wednesday (13:00h to 14:30h).
The International Supercomputing Conference (ISC) will take place in Leipzig, Germany, next week from June 16th to June 20th, 2013. This year the program contains tutorials again and the team of Bronis de Supinski (LLNL), Michael Klemm (Intel) and myself will offer the Advanced OpenMP Programming tutorial on June 16th, 9:00 AM to 1:00 PM. If you are interested in learning about performance-focused OpenMP programming and the new features in OpenMP 4.0, this might be the right one for you, although we obviously cannot cover everything in detail in just the 4 hours we got. We asked for a full day, but got only a half one.
While we quickly review the basics of OpenMP programming, we assume attendees understand basic parallelization concepts and will easily grasp those basics. We focus on performance aspects, such as data and thread locality on NUMA architectures, false sharing, and exploitation of SIMD vector units. We discuss language features in-depth, with emphasis on features recently added to OpenMP such as tasking. We close with an overview of the new OpenMP 4.0 directives for attached compute accelerators. This is our detailed agenda:
- OpenMP Overview (15 minutes)
  - Core Concepts: Parallel Region, Worksharing, Nesting
  - Synchronization: Synchronization Constructs and the Memory Model
- Techniques to Obtain High Performance with OpenMP: Memory Access (45 minutes)
  - Understanding Memory Access Patterns
  - Memory Placement and Thread Binding
  - Performance Tips and Tricks: Avoiding False Sharing, Private versus Shared Data
- Techniques to Obtain High Performance with OpenMP: Vectorization (30 minutes)
  - Understanding Vector Microarchitectures
  - Vectorization with OpenMP 4.0
- Advanced Language Features (60 minutes)
  - The OpenMP Tasking Model
  - Tasking in Detail: Final, Mergeable, and Dependencies
  - Misc. OpenMP 4.0 Features: Controlling the Implementation, Reduction Extensions, Improved Atomic Support
- OpenMP for Attached Compute Accelerators (45 minutes)
  - The OpenMP Execution Model for Devices
  - Target Construct
  - OpenMP on the Intel Xeon Phi Coprocessor: Examples
- Future OpenMP Directions (15 minutes)
  - Comprehensive OpenMP New Features Overview
  - OpenMP 4.0 and Beyond: Status, Directions, and Schedule
- Open Discussion of Possible OpenMP Extensions (until we get thrown out of the room or people have left for lunch)
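As a small taste of the vectorization part of the agenda, the following is a minimal sketch of the OpenMP 4.0 SIMD directives. It is not taken from the tutorial slides; the function names `axpy_element`, `saxpy` and `dot` are hypothetical and chosen for illustration. It assumes that `x` and `y` do not overlap, since `omp simd` tells the compiler the loop iterations can safely be executed in SIMD lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Ask the compiler to also generate a vector variant of this function.
   uniform(a) states that the scalar factor is the same in every lane. */
#pragma omp declare simd uniform(a)
float axpy_element(float a, float x, float y)
{
    return a * x + y;
}

/* y[i] = a * x[i] + y[i]; x and y are assumed not to overlap. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = axpy_element(a, x[i], y[i]);
}

/* Dot product; the reduction clause gives every SIMD lane its own
   partial sum, and the partial sums are combined after the loop. */
float dot(size_t n, const float *x, const float *y)
{
    assert(x != NULL && y != NULL);
    float sum = 0.0f;

    #pragma omp simd reduction(+ : sum)
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Whether the compiler actually emits vector instructions depends on the target architecture and the compiler flags. Note also that a SIMD reduction may add the floating-point terms in a different order than the serial loop does.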
In addition to the new features already present in release candidate one (RC1), the second draft of the next OpenMP specification release contains the following additions (quoted from openmp.org):
- Initial accelerators support: Device Data Environments (p16), target constructs (p68: target, target data, target update, declare target, teams, distribute), map clause (p151), and associated runtime routines (p191).
- Task dependency support through the new depend clause. (p91)
- Initial error model support through cancel and cancellation point constructs to request cancellation of specified region types and to declare a user-defined cancellation point to check for cancellation requests. (Section 2.13, p116: Cancellation Constructs)
- Support for array sections in C, C++ and Fortran. (Section 2.4, p36: Array Sections)
- Extended declare simd directive to allow multiple declarations. (p64)
- New environment variable OMP_DISPLAY_ENV instructing the runtime to display the OpenMP version number and ICV values during initialization. (p219)
- Additional enhancements to support Fortran 2003.
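To illustrate the depend clause from the list above, here is a minimal sketch of a small task graph. It is an illustration only and not taken from the specification; the function name `run_task_graph` and the computation itself are hypothetical. One task produces `a`, two tasks consume it independently of each other, and a final task joins both results.

```c
#include <assert.h>

/* Hypothetical four-task graph: a -> (b, c) -> result. */
int run_task_graph(int input)
{
    int a = 0, b = 0, c = 0, result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: writes a. */
        #pragma omp task depend(out: a) shared(a)
        a = input + 1;

        /* Two consumers: both read a and may run concurrently. */
        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = a * 2;

        #pragma omp task depend(in: a) depend(out: c) shared(a, c)
        c = a * 3;

        /* Join: starts only after both b and c have been written. */
        #pragma omp task depend(in: b, c) shared(b, c, result)
        result = b + c;
    }
    /* The implicit barrier at the end of the parallel region guarantees
       that all tasks have completed at this point. */

    /* Sanity check: with the dependences honored, result is 5 * a. */
    assert(result == 5 * (input + 1));
    return result;
}
```

Without the depend clauses the four tasks could run in any order, and `b`, `c` and `result` could be computed from stale values. Calling `run_task_graph(4)` returns 25.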
As we were not yet able to incorporate all the feedback that has been reported so far, a few known issues are still in the document. Additionally, some more minor changes are already in preparation. Feedback and questions are of course still welcome, so head over to http://openmp.org/wp/openmp-specifications/ and download the new document.