As a member of the hpc.nrw regional network, I have recorded 10 video sessions for an online OpenMP tutorial. Each part consists of a short video on one selected aspect of OpenMP, followed by a couple of quiz questions for self-assessment. The tutorial has been designed to be platform-independent and to work on any operating system for which an OpenMP-capable compiler is available. However, my examples are limited to C/C++.
All material is provided under a Creative Commons license. The topics that are currently available are:
Overview
This part provides a brief history of OpenMP and then introduces the concept of the parallel region: find it here.
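To illustrate the parallel region in code, here is a minimal sketch (the function name `count_threads` is mine, not from the video): every thread in the team executes the body of the region, and an atomic increment counts how many threads entered it. Compiled without OpenMP support, the pragmas are simply ignored and the count is 1.

```c
#include <stdio.h>

/* Count how many threads execute the parallel region.
 * Without OpenMP the pragmas are ignored and the result is 1. */
int count_threads(void) {
    int count = 0;
    #pragma omp parallel
    {
        /* atomic: the increment on the shared counter is race-free */
        #pragma omp atomic
        count++;
    }
    return count;
}
```

Compile with `gcc -fopenmp` (or your compiler's equivalent flag) to actually get a team of threads.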
Worksharing
This part introduces the concept of OpenMP worksharing, loop scheduling, and the first synchronization mechanisms: find it here.
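As a small sketch of these ideas (function name mine): the `for` worksharing construct splits the loop iterations among the threads of the team, and a `critical` section, one of the first synchronization mechanisms, serializes the update of the shared result.

```c
/* Sum 1..n with a worksharing loop.
 * schedule(static) assigns contiguous iteration chunks to threads. */
long worksharing_sum(int n) {
    long sum = 0;
    #pragma omp parallel
    {
        long local = 0;   /* per-thread partial sum */
        #pragma omp for schedule(static)
        for (int i = 1; i <= n; i++)
            local += i;
        /* critical: only one thread at a time touches the shared sum */
        #pragma omp critical
        sum += local;
    }
    return sum;
}
```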
Data Scoping
This part provides an overview of one of the most challenging parts of OpenMP (well, probably at first sight): data scoping. It discusses the differences between private, firstprivate, lastprivate, and shared variables and also explains the reduction operation: find it here.
False Sharing
This part explains the concept of caches in parallel computer architectures, discusses the problem of false sharing, and shows how to avoid it: find it here.
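A common remedy shown in sketches like the one below (names and the 64-byte line size are my assumptions, not from the video) is to pad per-thread data to a full cache line, so that two threads never write to the same line:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64   /* assumption: at most 64 threads */
#define CACHE_LINE  64   /* assumption: 64-byte cache lines */

/* Pad each per-thread counter to a full cache line; without the
 * padding, adjacent counters would share a line and every increment
 * would invalidate the other threads' cached copies (false sharing). */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

long count_even(int n) {
    struct padded_counter c[MAX_THREADS] = {{0}};
    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        #pragma omp for
        for (int i = 0; i < n; i++)
            if (i % 2 == 0)
                c[tid].value++;   /* each thread writes its own line */
    }
    long total = 0;
    for (int t = 0; t < MAX_THREADS; t++)
        total += c[t].value;
    return total;
}
```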
Tasking
This part introduces task parallelism in OpenMP. This concept enables the programmer to parallelize code regions with non-canonical loops, or regions that do not use loops at all (including recursive algorithms): find it here.
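The classic illustration of tasking is a recursive algorithm; a sketch along those lines (cutoff value and names are mine): one thread creates the initial task tree inside a `single` construct, and each recursive call may then be picked up by any thread of the team.

```c
/* Recursive Fibonacci with OpenMP tasks.
 * if(n > 20): below the cutoff, tasks run immediately (undeferred)
 * to avoid the overhead of creating many tiny tasks. */
long fib(int n) {
    if (n < 2) return n;
    long x, y;
    #pragma omp task shared(x) if(n > 20)
    x = fib(n - 1);
    #pragma omp task shared(y) if(n > 20)
    y = fib(n - 2);
    #pragma omp taskwait   /* wait for the two child tasks */
    return x + y;
}

long fib_parallel(int n) {
    long result = 0;
    #pragma omp parallel
    #pragma omp single     /* one thread creates the initial task */
    result = fib(n);
    return result;
}
```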
Tasking and Data Scoping
This part deepens your knowledge of OpenMP task parallelism and data scoping by means of a small artificial example: find it here.
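One point worth sketching here (this is my own toy example, not the one from the video): a `firstprivate` variable is copied at task *creation* time, so later changes by the creating thread do not affect the task's copy.

```c
/* firstprivate on a task: the task works on a copy of `captured`
 * taken when the task is created, not when it runs. */
int task_scoping_demo(void) {
    int captured = 1;
    int result   = 0;
    #pragma omp parallel
    #pragma omp single
    {
        captured = 2;   /* value at task-creation time */
        #pragma omp task shared(result) firstprivate(captured)
        result = captured * 10;
        captured = 99;  /* does NOT affect the task's private copy */
        #pragma omp taskwait
    }
    return result;      /* 20, regardless of when the task runs */
}
```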
Tasking and Synchronization
This session discusses different synchronization mechanisms for OpenMP task parallelism: find it here.
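Two of the mechanisms can be sketched as follows (function names are mine): `taskwait` blocks until the child tasks created so far have finished, while `depend` clauses express an ordering between individual tasks without blocking the creator.

```c
/* taskwait: wait for all child tasks created so far. */
int taskwait_demo(void) {
    int a = 0, b = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(a)
        a = 1;
        #pragma omp task shared(b)
        b = 2;
        #pragma omp taskwait   /* both tasks are complete here */
    }
    return a + b;              /* 3 */
}

/* depend: the consumer task is ordered after the producer task. */
int depend_demo(void) {
    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(x) depend(out: x)
        x = 1;                 /* producer */
        #pragma omp task shared(x) depend(inout: x)
        x += 10;               /* consumer, runs after the producer */
    }
    return x;                  /* 11 */
}
```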
Loops and Tasks
This part presents the taskloop construct in OpenMP: find it here.
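A minimal sketch of the construct (function name and grainsize are my choices): `taskloop` chops the iteration space into tasks rather than static worksharing chunks; `grainsize` controls roughly how many iterations end up in each task.

```c
/* taskloop: one thread creates tasks covering the iteration space;
 * any thread of the team may execute them. */
void taskloop_square(const int *in, int *out, int n) {
    #pragma omp parallel
    #pragma omp single                 /* one thread creates the tasks */
    #pragma omp taskloop grainsize(16) /* ~16 iterations per task */
    for (int i = 0; i < n; i++)
        out[i] = in[i] * in[i];
}
```

Note that `taskloop` requires an OpenMP 4.5-capable compiler.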
Task Scheduling
This part explains how task scheduling works in OpenMP: find it here.
Non-Uniform Memory Access
This part explains how a non-uniform memory access (NUMA) architecture may influence the performance of OpenMP programs. It illustrates how to distribute data and bind threads across NUMA domains and how to avoid uncontrolled data or thread migration: find it here.
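A sketch of the first-touch idea (function names are mine): on Linux, a memory page is typically placed on the NUMA node of the thread that first writes it, so data should be initialized with the same loop schedule that later accesses it.

```c
#include <stdlib.h>

/* Parallel first touch: with the same schedule(static) as the compute
 * loop below, each thread touches (and thus places) the pages it will
 * later read, keeping accesses NUMA-local. */
double *alloc_and_init(int n) {
    double *a = malloc((size_t)n * sizeof *a);
    if (!a) return NULL;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        a[i] = 0.0;
    return a;
}

double numa_sum(const double *a, int n) {
    double s = 0.0;
    /* identical schedule: each thread reads what it initialized */
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

To keep the threads from migrating between NUMA domains, pin them with the standard environment variables, e.g. `OMP_PLACES=cores OMP_PROC_BIND=close`.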
What is missing? Please let me know which aspects of OpenMP you would like to see covered in one of the next small bites. Just to let you know, some parts on GPU programming with OpenMP are already in preparation and will hopefully be released in the next lecture-free period.