Just this week Allinea released its DDTLite plugin for Visual Studio 2008. I have been using a beta version for a couple of weeks now, and in my humble opinion DDTLite extends the MPI cluster debugger of Visual Studio 2008 with a must-have feature for any parallel debugger: individual process control for MPI programs. With only the capabilities of the Visual Studio MPI cluster debugger, debugging MPI programs can be a pain, because MPI processes cannot be controlled individually. That means if you select one process and execute it step by step, the other processes continue as well, and there is no way to stop them from doing so (as you could by freezing threads in a multithreaded program). This blog post is not intended to become an Allinea commercial, but I want to briefly demonstrate what DDTLite can do for you.
In order to debug MPI programs, open the project properties, choose Debugging in the left column, and select the MPI Cluster Debugger as the debugger to launch. Additionally, you have to provide the following options (listed below along with my advice):
- MPIRun Command: The location of mpirun. Specify the full path to the mpiexec program here, do not enclose it in quotes (“”), and do not omit the .exe extension.
- MPIRun Arguments: Arguments to pass to mpirun, such as the number of processes to start.
- MPIShim Location: The location of mpishim.exe. In my experience, you avoid trouble if you copy mpishim.exe to a path that does not contain any white space (the original location on a 32-bit system is C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\Remote Debugger\x86). Again, do not omit the .exe extension.
That said, your configuration could look like this:
If you then start the debugger (e.g. via F5), two MPI processes will be started and you can switch between them using the Processes window (you can enable that window via the menu: Debug –> Windows –> Processes):
From the menu via Tools –> Options… –> Debugging (in the left column), you can set the option Break all processes when one process breaks, which controls what happens when a breakpoint is encountered. For debugging MPI programs, you probably want this option enabled! But, as already mentioned above, once all processes have been interrupted because a breakpoint was hit, you cannot continue step by step with just one process, as the other processes will always take a step as well. And this is where DDTLite comes into play…
After the plugin has been enabled (via the menu: Tools –> Add-in Manager…), you are presented with several additional windows, among them the Selected Processes and Threads window for selecting and switching between processes and threads, as shown above. In the Groups – Parallel View window you can select individual processes (in the screenshot above you can see that only the MPI process with rank 0, out of two MPI processes, is selected) and then control the selection with the Visual Studio debugger just as you would a serial program; selecting a whole group of processes is possible as well. All MPI processes not currently selected stand still!
There is more to DDTLite. For example, you can select a variable and go to the Variable – Parallel View window to get a list of its values by MPI rank (the screenshot below shows the iMyRank member of a struct type named data, which holds the MPI rank of each process).
Of course DDTLite provides even more capabilities, but you can go to the product homepage and find out for yourself by grabbing a 30-day trial version (I used that trial to create the screenshots shown in this blog post). I would like to add one more note on the question of how many MPI processes you should use for debugging. Most parallel debuggers (including DDTLite and DDT) are advertised as capable of controlling hundreds (or even thousands) of MPI processes. I think you will hardly ever need that! Instead, I bet that in 99% of the cases in which your MPI program works fine with one and two processes but fails with more, you will find the issue by running three or maybe five processes in your debugger. That is all you need to find the usual off-by-one error in the work distribution and similar bugs 🙂.