Upcoming OpenMP Tutorials

This post announces three OpenMP tutorial events that I have committed to. As usual, my OpenMP tutorials introduce Tasking early on, and when it comes to performance, I talk in detail about dealing with NUMA architectures and about thread and data affinity. If you are interested in learning more about these topics and in getting hands-on experience, the tutorials might be of interest to you.
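To give a flavour of the NUMA and data affinity topic, here is a minimal sketch of parallel first-touch initialization. It is not taken from the tutorial material: the function names alloc_first_touch and sum_array are made up for illustration, and the sketch assumes an operating system with a first-touch page placement policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and initialise them in parallel.
 * Assumption: the OS places each page on the NUMA node of the thread
 * that writes to it first (first-touch policy). */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* The static schedule gives every thread one contiguous chunk. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = value;

    return a;
}

/* Hypothetical helper: sum the array with the same static schedule, so
 * each thread mostly reads the chunk it initialised itself. */
double sum_array(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

The placement only helps if threads stay where they are, so one would typically also bind them, for instance by setting the OMP_PROC_BIND environment variable to true. Compiled without OpenMP support, the pragmas are ignored and the code runs sequentially.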

The first one takes place in about two weeks at the Hartree Centre in the UK and is part of the Hartree Summer School Series 2014. The summer school runs for three weeks: the first is dedicated to Visualization, the second to High Performance Computing (HPC), and the third is all about Big Data. The week on HPC covers all the HPC programming foundations you might need (I would say), including my part on OpenMP.

The second tutorial event is in September as part of the IWOMP 2014 workshop in Salvador, Brazil. This year’s IWOMP will host two tutorials. The first is a full-day Introduction to OpenMP given by my colleague Dirk Schmidl and myself. As an experiment this year, we will partition the tutorial into many small parts of roughly 20 minutes per topic. In each of these short slots we will present a specific topic, directly followed by practical hands-on exercises or live demos on that topic. The second tutorial at IWOMP 2014 will be a half-day tutorial on the OpenMP Accelerator Model given by Eric Stotzer. The plan is that attendees can choose their specialization: we teach the basics in the morning and go into performance tuning for “traditional” architectures in the afternoon, while Eric covers the target construct in detail in the afternoon.
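For readers who have not seen the target construct yet, the following is a minimal sketch of what offloading a loop can look like with OpenMP 4.0. It is not taken from either tutorial; the function name saxpy_target is made up for illustration, and whether the loop really runs on an accelerator depends on the compiler and the system (without a device, an implementation may simply execute the region on the host).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y = a * x + y, offloaded with the target construct.
 * The map clauses use array sections to say which data is copied to the
 * device (x) and which is copied there and back again (y). */
void saxpy_target(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Calling saxpy_target(4, 2.0f, x, y) with x = {1, 2, 3, 4} and y = {10, 20, 30, 40} leaves y = {12, 24, 36, 48}, regardless of where the loop was executed.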

Finally, the third tutorial will be at SC14 in New Orleans in November, as our Advanced OpenMP Tutorial has been accepted again. This tutorial is really about advanced OpenMP programming for performance: we want to enable an in-depth understanding of advanced OpenMP constructs and features and to provide attendees with a set of performance and scalability recipes that can be applied to OpenMP applications. We will also explain how to write new code for compute accelerators, and how to extend existing OpenMP code to them, with the new OpenMP 4.0 capabilities. To cover this aspect in detail, we extended the team of previous years (consisting of Bronis R. de Supinski, Michael Klemm, Ruud van der Pas and myself) with Eric Stotzer.

Posted in NUMA, OpenMP, Tasking

PPCES Video Lectures on OpenMP, MPI and Xeon Phi released

Since 2001, the IT Center (formerly: Center for Computing and Communication) of RWTH Aachen University has offered a one-week HPC workshop on Parallel Programming every spring. The course is not restricted to scientists and engineers from our university; in fact, about 30% of the attendees each time are external. This year we were very happy about a record attendance of up to 85 people for the OpenMP lectures on Wednesday. As usual we publish all course materials online, but this year we also created screencasts of all presentations. That means you see the slides and the live demos and you hear the presenter talk. This blog post contains links to both the screencasts and the other course material, sorted by topic.

OpenMP

We have three talks from Wednesday as an introduction to OpenMP and two talks from Thursday on selected topics, namely vectorization and tools.

Introduction to OpenMP Programming (part 1), by Christian Terboven

Getting OpenMP up to Speed, by Ruud van der Pas

Introduction to OpenMP Programming (part 2), by Christian Terboven

Vectorization with OpenMP, by Dirk Schmidl

Tools for OpenMP Programming, by Dirk Schmidl

MPI

We have two talks as an introduction to MPI and one on using the Vampir toolchain, all from Tuesday.

Introduction to MPI Programming (part 1), by Hristo Iliev

Introduction to MPI Programming (part 2), by Hristo Iliev

Introduction to VampirTrace and Vampir, by Hristo Iliev

Intel Xeon Phi

We put a special focus on presenting this architecture, with one overview talk and one talk on using OpenMP 4.0 constructs for it.

Programming the Intel Xeon Phi Coprocessor Overview, by Tim Cramer

OpenMP 4.0 for Accelerators, by Christian Terboven

Other talks

Some more talks, for instance on using our cluster or on the basics of parallel computer architectures, can be found on our YouTube channel: https://www.youtube.com/channel/UCtdrEoe46tD2IvJJRs_JH1A.

Posted in NUMA, OpenMP, Tasking, University

HPC matters

What is HPC?

This is a very nice, professionally made video on why HPC matters, aiming to motivate you to attend SC14: https://www.youtube.com/watch?v=zJybFF6PqEQ&feature=youtu.be

It pretty much captures why I like to work in HPC, namely that I come into contact with so much technology, so many different scientific topics, and so many different people from all over the world.

Posted in Future of HPC, University

New article on OpenMP 4.0 online

A while ago I published a list of articles and tutorials on OpenMP 4.0, including the German article on heise Developer that I wrote together with Michael Klemm (Intel). A slightly modified English version of our text has now appeared in issue 16 of Intel’s Parallel Universe magazine, titled Full throttle: OpenMP 4.0.

The current issue and also past issues of the Parallel Universe magazine are available at http://software.intel.com/en-us/intel-parallel-universe-magazine. If you are interested in developing parallel code for Intel architectures you might find some interesting reads over there.

Posted in C++, NUMA, OpenACC, OpenMP, Tasking

Gaussian Elimination Squad wins Intel Parallel Universe Computing Challenge at SC13

The German team with the glorious name Gaussian Elimination Squad took first place in the Intel Parallel Universe Computing Challenge! Each round of the challenge consisted of two parts: the first was a trivia challenge with 20 questions about computing, computer history, programming languages, and the SC conference series; the second was a coding challenge, which gave each team ten minutes to speed up a piece of code they had never seen before as much as possible. On top of it all, the audience could watch what the teams were doing on two giant screens. Georg Hager, our team captain, has a blog post with all the details.

The competition was really a lot of fun and a nice distraction from an otherwise pretty busy SC13. There is a short video capturing the atmosphere during the final competition and also a brief article on insideHPC.

The Gaussian Elimination Squad represented the German HPC community, with members from RWTH Aachen (Christian Terboven and Joachim Protze), Jülich Supercomputing Center (Damian Alvarez), ZIH Dresden (Michael Kluge and Guido Juckeland), TU Darmstadt (Christian Iwainsky), Leibniz Supercomputing Center (Michael Ott), and Erlangen Regional Computing Center (Gerhard Wellein and Georg Hager). As only four team members were allowed per match, I was lucky to play together with Gerhard and Georg in all rounds, but the others helped us by shouting advice and answers they thought were correct.

Posted in OpenMP, Private, University

SC13 Tutorial on Advanced OpenMP: Performance and 4.0 Features

Will you attend SC13 in Denver, and do you want to learn about using the new OpenMP 4.0 features? Our tutorial will help you out.

SC13 Advanced OpenMP Tutorial: OpenMP 4.0 Features

Posted in Future of HPC, NUMA, OpenACC, OpenMP, Tasking

Articles and Tutorials on OpenMP 4.0

You should have heard by now that OpenMP 4.0 has finally been released; you can find the official statement on openmp.org: http://openmp.org/wp/2013/07/openmp-40/. It really is a major new release, and therefore it will take a while until all implementations have incorporated all new features. Nevertheless, as some implementers already offer beta releases of their compiler products with some new OpenMP 4.0 features available, you might be interested in learning more about the new standard to get your hands dirty. In this blog post I have collected links to the OpenMP 4.0 material I am currently aware of, and I give pointers to places and events at which you can learn more.

First, if you are fine with reading a German article, my friend Michael Klemm and I have written an overview piece discussing the most important changes and new additions (from our point of view), including some code examples. It has been published at heise Developer here: http://www.heise.de/developer/artikel/Die-wichtigsten-Neuerungen-von-OpenMP-4-0-1915844.html. Together we also gave a corresponding presentation at parallel 2013, of which I made the slides available on my blog (slides in English), again with several code snippets.

At the end of July / early August we held our “Parallel Programming Summer Course” in Aachen, during which OpenMP occupied two days of the agenda. The course material contains three slide decks on OpenMP, which give a thorough introduction (I hope) to OpenMP programming and touch on the following new OpenMP 4.0 features: device constructs, task dependencies, thread affinity, array sections and user-defined reductions. I gave very similar talks at the Hartree Centre Summer School 2013.
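To make two of these features a bit more concrete, here is a minimal sketch of task dependencies and a user-defined reduction. It is not taken from the course material; all names in it (pipeline, range_t, range_merge, range_merge_op, find_range) are made up for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Task dependencies: the first two tasks may run concurrently, the third
 * one has to wait for both because of its depend(in: ...) clause. */
int pipeline(int input)
{
    int a = 0, b = 0, result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a) shared(a)
        a = input + 1;

        #pragma omp task depend(out: b) shared(b)
        b = input * 2;

        #pragma omp task depend(in: a, b) depend(out: result) shared(a, b, result)
        result = a + b;

        #pragma omp taskwait
    }
    return result;
}

/* User-defined reduction: combine per-thread (min, max) pairs. */
typedef struct { double min; double max; } range_t;

range_t range_merge(range_t x, range_t y)
{
    range_t r;
    r.min = (x.min < y.min) ? x.min : y.min;
    r.max = (x.max > y.max) ? x.max : y.max;
    return r;
}

#pragma omp declare reduction(range_merge_op : range_t : \
        omp_out = range_merge(omp_out, omp_in)) \
        initializer(omp_priv = { DBL_MAX, -DBL_MAX })

/* Find the smallest and largest value of v[0..n-1] in one parallel loop. */
range_t find_range(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    range_t r = { DBL_MAX, -DBL_MAX };

    #pragma omp parallel for reduction(range_merge_op : r)
    for (size_t i = 0; i < n; i++) {
        if (v[i] < r.min) r.min = v[i];
        if (v[i] > r.max) r.max = v[i];
    }
    return r;
}
```

With this sketch, pipeline(3) returns 10, and find_range applied to the values 3.0, -1.5, 7.25 and 0.0 yields a minimum of -1.5 and a maximum of 7.25. Both features require a compiler with OpenMP 4.0 support; without OpenMP, the pragmas are ignored and the code runs sequentially with the same results.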

Rolf Rabenseifner from HLRS also holds many very good courses on parallel programming. He is currently extending his material to cover selected OpenMP 4.0 topics, probably for the next course instance already.

If you attended ISC’13 in Leipzig, you had the chance to hear Bronis R. de Supinski, Michael Klemm and myself in the half-day Advanced OpenMP Tutorial. Our slides are part of the tutorial proceedings.

At SC13 in Denver the same group plus Ruud van der Pas will talk about Advanced OpenMP: Performance and 4.0 Features, see http://sc13.supercomputing.org/content/tutorials. This will be the first time we focus in great detail on the new features of OpenMP 4.0 and on how to exploit them for programmability and performance. And finally, at Euro-Par 2013, I will be giving a half-day tutorial on Advanced OpenMP together with Tim Mattson, this time focusing even more on lower-level system details like the memory model and cache coherency mechanisms.

Posted in NUMA, OpenACC, OpenMP