Using C++ with OpenMP in Jupyter Notebooks

Many people know Jupyter notebooks as an interactive, web-based environment for Python programming, often in the context of data analysis. However, with the cling kernel it becomes possible to work interactively with C/C++ code. The so-called kernel is the interpreter used to evaluate code cells, and cling is an interactive C++ interpreter. In a notebook, this can look as follows:

[Screenshot: C++ in a Jupyter notebook]
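
The screenshot is not reproduced here; as a minimal sketch (not the original notebook content), a cell evaluated by the cling kernel might simply contain ordinary C++, with declarations and statements evaluated directly at cell scope:

#include <iostream>
#include <vector>

// cling evaluates declarations and statements directly in the cell
std::vector<int> v = {1, 2, 3, 4};
int sum = 0;
for (int x : v) sum += x;
std::cout << "sum = " << sum << std::endl;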

In the IkapP project, funded by the Stifterverband and the state of North Rhine-Westphalia, one goal is to remove entry barriers faced by students using HPC systems in lectures. One step towards this goal is the creation of a virtual lab environment for parallel programming that can also be used for interactive experiments with different parallelization approaches. Users can, for example, interactively observe how code changes affect performance on real HPC systems. Many parallel programming models are in use and relevant for our lectures, but we wanted to start with OpenMP and MPI. However, cling does not support OpenMP out of the box.

At the time of this writing, the current version of xeus-cling is 0.8.1, which is not based on a recent version of Clang. In principle, OpenMP 3.1 should therefore be supported, which means tasking is available but offloading is not. OpenMP “in action” in a Jupyter notebook could look like this:

[Screenshot: C++ with OpenMP in a Jupyter notebook]
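
Again, the screenshot is not reproduced here; as a hedged sketch of such a cell (not the original content), a simple parallel region could be defined and then called directly:

#include <omp.h>
#include <cstdio>

void hello() {
    #pragma omp parallel
    {
        // each thread prints its ID; the output order is non-deterministic
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
}

hello();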

In order for such notebooks to work correctly, we had to fix a few things in the xeus-cling code, in particular to ensure correct output from multiple threads. The corresponding patches were created and submitted by Jonas Hahnfeld, a student worker in the IkapP project at RWTH. They have been accepted to mainline (#314, #315, #320, #332, #319, #316, #324, #325 (also submitted to xeus-python) and #329), but since our submission there has been no new release.

Compiling and Installing xeus-cling on CentOS 7.7

The production environment on RWTH’s HPC systems is CentOS 7.7. The build instructions were compiled by Jonas. In order to build xeus-cling for a Jupyter environment, perform the following steps for each of these projects, in this order:

https://github.com/jarro2783/cxxopts
https://github.com/nlohmann/json
https://github.com/zeux/pugixml
https://github.com/xtensor-stack/xtl
https://github.com/zeromq/libzmq
https://github.com/zeromq/cppzmq
https://github.com/jupyter-xeus/xeus
https://github.com/jupyter-xeus/xeus-cling

$ git clone https://github.com/org/repo src   # replace org/repo with the respective project from the list above
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/path/to/install/xeus-cling/ \
-DCMAKE_C_COMPILER=/path/to/install/xeus-cling/bin/clang \
-DCMAKE_CXX_COMPILER=/path/to/install/xeus-cling/bin/clang++ \
  ../src
$ make -j32
$ make install

After that, activate the kernels via

for k in xcpp11 xcpp14 xcpp17; do
cp -r ../../../../xeus-cling/share/jupyter/kernels/$k /path/to/install/jupyter/share/jupyter/kernels/;
done

and add -fopenmp to each kernel.json to enable OpenMP. Finally, let cling find the OpenMP runtime libraries by adding the following to jupyterhub_config.py:

c.Spawner.environment = {
  'LD_LIBRARY_PATH': '/path/to/install/xeus-cling/lib/',
}
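
For reference, the -fopenmp flag goes with the interpreter arguments in kernel.json. A hedged sketch of what a modified xcpp17 kernel spec might look like (the exact executable path, existing arguments and names depend on your installation):

{
  "display_name": "C++17 (OpenMP)",
  "argv": [
    "/path/to/install/xeus-cling/bin/xcpp",
    "-f",
    "{connection_file}",
    "-std=c++17",
    "-fopenmp"
  ],
  "language": "C++17"
}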

The Ongoing Evolution of OpenMP

Usually, I do not use this blog to talk directly about my work. I want to make one exception to point to the article titled The Ongoing Evolution of OpenMP. It appeared online at IEEE and is accessible here: https://ieeexplore.ieee.org/document/8434208/.

From the abstract:
This paper presents an overview of the past, present and future of the OpenMP application programming interface (API). While the API originally specified a small set of directives that guided shared memory fork-join parallelization of loops and program sections, OpenMP now provides a richer set of directives that capture a wide range of parallelization strategies that are not strictly limited to shared memory. As we look toward the future of OpenMP, we immediately see further evolution of the support for that range of parallelization strategies and the addition of direct support for debugging and performance analysis tools. Looking beyond the next major release of the specification of the OpenMP API, we expect the specification eventually to include support for more parallelization strategies and to embrace closer integration into its Fortran, C and, in particular, C++ base languages, which will likely require the API to adopt additional programming abstractions.

Webinar: Using OpenMP Tasking

With the increasing prevalence of multi-core processors, shared-memory programming models are essential. OpenMP is a popular, portable, widely supported and easy-to-use shared-memory model. Since version 3.0, released in 2008, OpenMP offers tasking to support the creation of composable parallel software blocks and the parallelization of irregular algorithms. However, the tasking concept requires a change in the way developers reason about the structure of their code and hence how they expose its parallelism. In this webinar, we will give an overview of the OpenMP tasking language features and performance aspects, such as introducing cut-off mechanisms and exploiting task dependencies.
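
As a hedged illustration of the concepts covered (a sketch, not material from the webinar), a recursive computation could be parallelized with tasks and a simple cut-off via the if clause:

#include <cstdio>

// Recursive Fibonacci with OpenMP tasks; the if clause serves as a simple
// cut-off so that small subproblems are executed without deferred tasks.
long fib(int n) {
    if (n < 2) return n;
    long x, y;
    #pragma omp task shared(x) if(n > 20)
    x = fib(n - 1);
    #pragma omp task shared(y) if(n > 20)
    y = fib(n - 2);
    #pragma omp taskwait
    return x + y;
}

int main() {
    long result;
    #pragma omp parallel
    #pragma omp single
    result = fib(30);
    printf("fib(30) = %ld\n", result);
    return 0;
}

Task dependencies, also mentioned above, would be expressed with the depend clause on the task construct.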

The recording from the webinar is now available here: https://youtu.be/C8ekL2x4hZk.

Book: Using OpenMP – The Next Step

If everything goes according to plan, the book Using OpenMP – The Next Step will appear in time for SC17 (November 2017). The book is already available for pre-order on Amazon: https://www.amazon.de/Using-Openmp-Next-Step-Accelerators/dp/0262534789/ref=sr_1_1?ie=UTF8&qid=1504249007&sr=8-1&keywords=using+openmp.

[Book cover: Using OpenMP – The Next Step]

From the book’s blurb:

This book offers an up-to-date, practical tutorial on advanced features in the widely used OpenMP parallel programming model. Building on the previous volume, Using OpenMP: Portable Shared Memory Parallel Programming (MIT Press), this book goes beyond the fundamentals to focus on what has been changed and added to OpenMP since the 2.5 specifications. It emphasizes four major and advanced areas: thread affinity (keeping threads close to their data), accelerators (special hardware to speed up certain operations), tasking (to parallelize algorithms with a less regular execution flow), and SIMD (hardware assisted operations on vectors).

As in the earlier volume, the focus is on practical usage, with major new features primarily introduced by example. Examples are restricted to C and C++, but are straightforward enough to be understood by Fortran programmers. After a brief recap of OpenMP 2.5, the book reviews enhancements introduced since 2.5. It then discusses in detail tasking, a major functionality enhancement; Non-Uniform Memory Access (NUMA) architectures, supported by OpenMP; SIMD, or Single Instruction Multiple Data; heterogeneous systems, a new parallel programming model to offload computation to accelerators; and the expected further development of OpenMP.

Webinar: Getting Performance from OpenMP Programs on NUMA Architectures

Most contemporary shared-memory systems expose non-uniform memory access (NUMA) characteristics with implications for application performance. However, the OpenMP programming model does not provide explicit support for this. This 30-minute live webinar will discuss approaches to getting the best performance from OpenMP applications on NUMA architectures.
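
As a hedged illustration of one common approach (a sketch, not material from the webinar), data can be initialized in parallel so that the first-touch policy places memory pages on the NUMA node of the threads that later use them:

#include <cstdio>
#include <cstdlib>

// First-touch initialization: pages are placed on the NUMA node of the
// thread that first writes them, so initializing with the same parallel
// loop schedule as the compute loop keeps data close to its threads.
int main() {
    const long n = 100000000;   // large enough to span multiple NUMA nodes
    double *a = static_cast<double*>(malloc(n * sizeof(double)));
    double *b = static_cast<double*>(malloc(n * sizeof(double)));

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {   // parallel first touch
        a[i] = 0.0;
        b[i] = static_cast<double>(i);
    }

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)     // compute loop with the same schedule
        a[i] = 2.0 * b[i];

    printf("a[n-1] = %f\n", a[n - 1]);
    free(a);
    free(b);
    return 0;
}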

The recording from the webinar is now available here: https://pop-coe.eu/blog/2nd-pop-webinar-getting-performance-from-openmp-programs-on-numa-architectures.