Affinity on the taskloop Construct in OpenMP 6.0

During last week’s SC24 conference in Atlanta, GA, I briefly reported on the activity of the Affinity subcommittee of the OpenMP language committee. One topic was that, together with the Tasking subcommittee, we brought support for taskloop affinity to OpenMP 6.0, which I am going to describe here.

As you are probably well aware, the OpenMP specification has long allowed the depend and affinity clauses on task constructs. The depend clause provides a mechanism for expressing data dependencies among tasks, and the affinity clause serves as a hint that guides the OpenMP runtime to execute a task close to the data items specified in the clause. However, this functionality was not made available for the taskloop construct, which parallelizes a loop by creating a set of tasks, each typically handling one or more iterations of the loop. Specifically, the depend clause could not be used to express dependencies, either between the tasks within a taskloop or between tasks generated by a taskloop and other tasks, which limited its applicability.
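To recap, here is a minimal sketch of both clauses on regular task constructs; the array A, its size n, and the functions produce() and consume() are hypothetical and serve only as illustration:

// two tasks linked by a dependency on A[0]; the affinity clauses hint
// to the runtime that both tasks should execute close to the array A
#pragma omp task depend(out: A[0]) affinity(A[0:n])
produce(A, n);   // hypothetical producer

#pragma omp task depend(in: A[0]) affinity(A[0:n])
consume(A, n);   // hypothetical consumer, may only start after the producer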

OpenMP 6.0 introduced the task_iteration directive, which, when used with a taskloop construct, allows for fine-grained control over the creation and properties of individual tasks within the loop. Each task_iteration directive within a taskloop signals the creation of a new task with corresponding properties. With this functionality, one can express:

  • Dependencies: The depend clause on a task_iteration directive allows one to specify data dependencies between tasks generated by the taskloop, as well as between tasks of this taskloop and other tasks (standalone tasks or tasks generated by other taskloops, for example).
  • Affinity: The affinity clause can be used to specify data affinity for individual tasks. This enables optimizing data locality and improving cache utilization.
  • Conditions: The if clause can be used to conditionally generate tasks within the taskloop. This is helpful in situations where not every iteration of the loop needs to generate a dependency, in particular to reduce overhead.

Let’s consider the following artificial example code.

// TL1 taskloop
#pragma omp taskloop nogroup
for (int i = 1; i < n; i++)
{
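   // each task writes A[i] and reads A[i-1], yielding a dependency
   // chain across the tasks generated by this taskloop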
   #pragma omp task_iteration depend(inout: A[i]) depend(in: A[i-1])
   A[i] += A[i] * A[i-1];
}


// TL2 taskloop + grainsize
#pragma omp taskloop grainsize(strict: 4) nogroup
for (int i = 1; i < n; i++)
{
   // every task executes exactly four iterations (grainsize strict);
   // the if clause attaches dependencies only at chunk boundaries and to the last iteration
   #pragma omp task_iteration depend(inout: A[i]) depend(in: A[i-4]) \
      if ((i % 4) == 0 || i == n-1)
   A[i] += A[i] * A[i-1];
}


// T3 other task
#pragma omp task depend(in: A[n-1])
{
   // may only start once A[n-1] has been produced by the last task of TL2
}

The first taskloop construct, TL1, parallelizes a loop with an obvious dependency: every iteration i depends on the previous iteration i-1. This is expressed with the corresponding depend clauses and manifests as dependencies between the tasks generated by this taskloop.

The second taskloop, TL2, parallelizes the loop by creating tasks that each execute exactly four iterations, owing to the grainsize clause with the strict modifier. In addition, a task dependency is only created if the expression of the if clause evaluates to true, limiting the overall number of dependencies per task.

The remaining standalone task, T3, is a regular explicit task that depends on the final element of array A, which is produced by the last task of TL2; it therefore only starts once all previously generated tasks have completed.
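Since this post is about affinity, note that the affinity clause is used on task_iteration in the same fashion as depend. Here is a minimal sketch, assuming a hypothetical array B and a placeholder function compute():

// ask the runtime to execute each generated task close to B[i]
#pragma omp taskloop nogroup
for (int i = 0; i < n; i++)
{
   #pragma omp task_iteration affinity(B[i])
   B[i] = compute(B[i]);
}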


Webinar: Using OpenMP Tasking

With the increasing prevalence of multi-core processors, shared-memory programming models are essential. OpenMP is a popular, portable, widely supported, and easy-to-use shared-memory model. Since version 3.0, released in 2008, OpenMP has offered tasking to support the creation of composable parallel software blocks and the parallelization of irregular algorithms. However, the tasking concept requires a change in the way developers reason about the structure of their code and, hence, how they expose its parallelism. In this webinar, we give an overview of the OpenMP tasking language features and performance aspects, such as introducing cut-off mechanisms and exploiting task dependencies.
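To give a taste of the cut-off aspect, here is a minimal sketch of a recursive Fibonacci computation with a depth-based cut-off via the final clause; the threshold of 20 is an arbitrary assumption for illustration:

// below the cut-off threshold, final() makes all nested tasks included,
// i.e., they execute immediately and avoid further task-creation overhead
long fib(int n)
{
   if (n < 2) return n;
   long x, y;
   #pragma omp task shared(x) final(n < 20)
   x = fib(n - 1);
   #pragma omp task shared(y) final(n < 20)
   y = fib(n - 2);
   #pragma omp taskwait
   return x + y;
}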

The recording from the webinar is now available here: https://youtu.be/C8ekL2x4hZk.

Book: Using OpenMP – The Next Step

If everything goes according to plan, the book Using OpenMP – The Next Step will appear in time for SC17 (November 2017). The book is already available for pre-order on Amazon: https://www.amazon.de/Using-Openmp-Next-Step-Accelerators/dp/0262534789/ref=sr_1_1?ie=UTF8&qid=1504249007&sr=8-1&keywords=using+openmp.

Book Cover: Using OpenMP – The Next Step

From the book’s blurb:

This book offers an up-to-date, practical tutorial on advanced features in the widely used OpenMP parallel programming model. Building on the previous volume, Using OpenMP: Portable Shared Memory Parallel Programming (MIT Press), this book goes beyond the fundamentals to focus on what has been changed and added to OpenMP since the 2.5 specifications. It emphasizes four major and advanced areas: thread affinity (keeping threads close to their data), accelerators (special hardware to speed up certain operations), tasking (to parallelize algorithms with a less regular execution flow), and SIMD (hardware assisted operations on vectors).

As in the earlier volume, the focus is on practical usage, with major new features primarily introduced by example. Examples are restricted to C and C++, but are straightforward enough to be understood by Fortran programmers. After a brief recap of OpenMP 2.5, the book reviews enhancements introduced since 2.5. It then discusses in detail tasking, a major functionality enhancement; Non-Uniform Memory Access (NUMA) architectures, supported by OpenMP; SIMD, or Single Instruction Multiple Data; heterogeneous systems, a new parallel programming model to offload computation to accelerators; and the expected further development of OpenMP.