Monthly Archives: September 2008

Debugging parallel programs with Visual Studio: MPI (using Allinea DDTLite)

Just this week Allinea released its DDTLite plugin for Visual Studio 2008. I have been using a beta version for a couple of weeks now and, in my humble opinion, DDTLite extends the MPI cluster debugger of Visual Studio 2008 with a must-have feature for any parallel debugger: individual process control for MPI programs. With the capabilities provided by the MPI cluster debugger of Visual Studio alone, debugging MPI programs can be a pain, as it is not possible to control MPI processes individually. That means if you select one process and execute it step-by-step, the other processes will continue as well and there is no way of stopping them from doing so (e.g. no freezing, as you can do with threads). This blog post is not intended to become an Allinea commercial, but I want to briefly demonstrate what DDTLite can do for you.

In order to debug MPI programs, you have to go to the project properties, choose Debugging in the left column, and select the MPI Cluster Debugger as the debugger to launch. Additionally, you have to provide the following options (listed below along with my advice):

  • MPIRun Command: The location of mpirun. Specify the full path to the mpiexec program here, do not enclose it in quotation marks (“”), and do not omit the .exe extension.
  • MPIRun Arguments: Arguments to pass to mpirun, such as number of processes to start.
  • MPIShim Location: Location of mpishim.exe. As far as my experience goes, you avoid trouble if you copy mpishim.exe to a path that does not contain any white space (the original location is C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\Remote Debugger\x86 on a 32-bit system), again do not omit the .exe extension.

That said, your configuration could look like this:
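For instance (the values below are only an illustration from my setup, assuming mpiexec from the HPC Pack 2008 SDK default location and mpishim.exe copied to a white-space-free path as advised above; your paths will differ):

```
MPIRun Command:    C:\Program Files\Microsoft HPC Pack 2008 SDK\Bin\mpiexec.exe
MPIRun Arguments:  -n 2
MPIShim Location:  C:\mpishim\mpishim.exe
```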


If you then start the debugger (e.g. via F5), two MPI processes will be started and you can switch between them using the Processes window (you can enable that window via the menu: Debug –> Windows –> Processes):


From the menu via Tools –> Options… –> Debugging (in the left column) you can set the option Break all processes when one process breaks to influence what happens when a breakpoint is encountered. For the case of debugging MPI programs, you probably want this option to be enabled! But – as already mentioned above – when all processes have been interrupted after a breakpoint has been hit, you cannot continue step-by-step with just one process, as the other processes will always do a step as well. And this is where DDTLite comes into play…


After the plugin has been enabled (via the menu: Tools –> Add-in Manager…) you are presented with several additional windows, among them the Selected Processes and Threads window to select and switch between processes and threads, as shown above. Via the Groups – Parallel View window you can select individual processes (in the screenshot above you can see that only the MPI process with rank 0, out of two MPI processes, is selected) and then control the selection (selecting a group of processes is possible as well) using the Visual Studio debugger just as you would with a serial program. All MPI processes not currently selected stand still!

There is more in DDTLite: For example you can select a variable and go to the Variable – Parallel View window to receive a list of variable values by MPI rank (the screenshot below shows the iMyRank member of a struct type named data, which denotes the MPI rank).


Of course there are even more capabilities provided by DDTLite, but you can go to the product homepage and find out for yourself by grabbing a 30-day trial version (I used that trial to create the screenshots shown in this blog post). But I would like to add one additional note on the question of how many MPI processes you should use for debugging. Most parallel debuggers (including DDTLite and DDT) are advertised as being capable of controlling hundreds (and even thousands) of MPI processes. I think that you will hardly ever need that! Instead, I bet that in 99% of the cases in which your MPI program works fine with one or two processes but fails when using more, you will find the issue by using three or maybe five processes with your debugger. That is all you need for finding the usual off-by-one work-distribution error and similar things :-).

C++0x: OpenMP loop parallelization without pragmas?!

Some people are complaining that OpenMP’s approach of using pragmas to annotate a program is not very nice, as pragmas / the OpenMP directives are not well-integrated into the language. Personally, I like the OpenMP approach and think it has some specific advantages. But I am also very interested in researching how the OpenMP language bindings could be improved, especially for C++. This post is about using C++0x features to build parallelization constructs that have been praised in the context of other approaches (e.g. Parallel Extensions for C#, or Intel’s Threading Building Blocks), but using OpenMP constructs.

Let’s consider the following sequential loop which is very similar to the example used in the Microsoft Parallel Extensions to the .NET Framework 3.5 (June 2008) documentation:

01   double dStart, dEnd;
02   for (int rep = 0; rep < iNumRepetitions; rep++)
03   {
04       dStart = omp_get_wtime();
05       for (int i = 0; i < iNumElements; i++)
06       {
07           vec[i] = compute(vec[i], iNumIterations);
08       }
09       dEnd = omp_get_wtime();
10   }

The experiment loop (lines 05 to 08) is executed iNumRepetitions times, the time is taken in lines 04 and 09 using OpenMP time measurement functions (portability!), and the time required for each element can be controlled via iNumIterations. I will use this parametrization for my performance experiments – for now, let’s just look at how this would be parallelized in OpenMP:

#pragma omp parallel for shared(iNumElements, vec, iNumIterations)
        for (int i = 0; i < iNumElements; i++)
            vec[i] = compute(vec[i], iNumIterations);

Pretty straightforward – as this parallel loop is perfectly balanced, we do not need the schedule clause here. What could that loop look like without using pragmas? Maybe as shown here:

omp_pfor(0, iNumElements, [&](int i) {
    vec[i] = compute(vec[i], iNumIterations);
});

Do you like that? No OpenMP pragma is visible in the user’s code; the user just has to specify the loop iteration space and the loop variable, and the parallelization is done “under the hood”. The implementation of this is pretty simple using the lambda functions of C++0x:

template<typename F>
void omp_pfor(int start, int end, F x)
{
#pragma omp parallel for
    for (int __i = start; __i < end; __i++)
        x(__i);
}

Of course I am still using OpenMP directives here, but they are hidden as an implementation detail. The actual loop body is passed to omp_pfor as a lambda function argument, along with the loop boundaries. Please note that this is just a very simple example; of course one can handle all types of loops that are currently supported in OpenMP 3.0 (and maybe even more) as well as STL-type algorithms!

In this post I only talked about syntax, but there is more to it. A part of my research is looking into how programmers (especially those with a background in computational engineering science in Aachen) can be provided with more powerful language-based tools to ease writing parallel and reusable code / components. I am always happy to discuss such topics – if you like the Live Space comment functionality as little as I do, just drop me a mail at

You can download this example code from my website. In order to compile that code, I recommend using the latest Intel 11.0 beta compiler.

Building and Using BOOST.MPI on Windows HPC Server 2008

If you are an MPI programmer, you probably know that there are C++ bindings for MPI, which nowadays come with most MPI distributions. Personally, I find them ugly and do not think they provide any advantage over using the plain C bindings. In addition, there are even some disadvantages to using the C++ bindings, as I have encountered that they cause problems for several MPI analysis tools (under some circumstances). If you intend to use the MPI C++ bindings on Windows, you will find that MS-MPI does not come with them. So if you really need them, I would advise using Intel MPI on Windows – but if you are interested in better C++ bindings for MPI, I would advise taking a look at the BOOST.MPI bindings.

This brief blog post is intended to get you started building and using BOOST.MPI on Windows HPC Server 2008. I will provide just basic BOOST build instructions and point you to the “Getting Started on Windows” guide at the BOOST homepage for more details. This is how we typically build BOOST on our systems; our student Christopher Schleiden figured out the nifty details:

  • Download the BOOST package (53.4 MB) and the bjam tool (120 KB), which were the most current versions at the time of this writing (September 8, 2008).
  • Extract both archives, here I used X:\src\boost_1_36_0 as the destination path of the boost package itself and X:\src\bin as the destination path of the bjam tool.
  • Start a command shell and put the Visual Studio (2008) compiler in your path. You can easily get a suitable prompt via Start –> All Programs –> Microsoft Visual Studio 2008 –> Visual Studio Tools –> Visual Studio 2008 Command Prompt.
  • Put the bjam tool in your path, e.g. via set PATH=%PATH%;x:\src\bin.
  • Start the build process via X:\src\boost_1_36_0\boost_1_36_0> bjam --build-type=release --toolset=msvc --build-dir=x:\src\boost_1_36_0\build\90\32 --stagedir=x:\src\boost_1_36_0\stage\90-32 stage. This will take some time… A brief description of my options:
    • --build-type: You can choose between release and debug, or build both. Please take into account that the debug build will consume a significant amount of disk space.
    • --toolset: msvc stands for the Microsoft Visual Studio C/C++ compiler. On Windows you could also use the Intel C/C++ compiler and maybe even cygwin, but I never tried that.
    • --build-dir: In order to avoid trouble you should specify a directory to contain the intermediate files created during the build process. As we support Visual Studio versions 2005 and 2008 for 32-bit and 64-bit targets, we created an appropriate naming scheme.
    • --stagedir: This denotes the directory in which you intend to install boost.

Following this approach, the MPI bindings will be skipped! To enable the MPI library build process, you have to add --with-mpi to the bjam command line from above, or edit the user-config.jam file in subdirectory tools\build\v2 of your BOOST sources and add the line using mpi ; (notice the white spaces) to the bottom of that file – I prefer the latter approach. If you are building on a Windows Compute Cluster 2003 machine, you will end up with the desired library. If you are building on a Windows HPC Server 2008 machine, you will receive the following error message:

MPI auto-detection failed: unknown wrapper compiler mpic++

This is because the MPI configuration just looks for MS-MPI v1, which typically resides in directory C:\Program Files\Microsoft Compute Cluster Pack. In order to make the auto-configuration look for MS-MPI v2, you have to edit the file mpi.jam in subdirectory tools\build\v2\tools and replace line 235 with the following:

local cluster_pack_path_native = "C:\\Program Files\\Microsoft HPC Pack 2008 SDK" ;

Future versions of BOOST will probably look for MS-MPI v2 automatically. Of course this works on a Windows HPC Server 2008 machine as well as on a workstation with the HPC Pack SDK installed (e.g. on my notebook running Vista 32-bit). Once the stage target has completed, you will find five additional files in your BOOST target directory:

  • libboost_mpi-vc90-mt.lib
  • libboost_mpi-vc90-mt-1_36.lib
  • boost_mpi-vc90-mt.lib
  • boost_mpi-vc90-mt-1_36.dll
  • boost_mpi-vc90-mt-1_36.lib

You will find some examples for BOOST.MPI on the page describing the BOOST.MPI bindings. In order to build those, you have to make the following changes to your project:

  • Add include path (C:\Program Files\Microsoft HPC Pack 2008 SDK\Include) and library path (C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\i386 for 32-bit applications or C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\amd64 for 64-bit applications) for MS-MPI v2 and add msmpi.lib to the linker input.
  • Add BOOST to your include path (in my example: X:\src\boost_1_36_0\boost_1_36_0) and to your library path (in my example: X:\src\boost_1_36_0\stage\90-32\lib).

That’s all it takes! Note that if you have compiled only the release libraries of BOOST (as done in this example) and not the debug libraries, building a debug configuration of your project using BOOST.MPI will fail (or cause you trouble), so for development purposes you should build both.