
On the future of HPC on Windows

Just a few weeks ago, during SC11, Microsoft released two new or updated HPC products, namely the Windows Azure HPC Scheduler and Windows HPC Server 2008 R2 SP3. However, what I saw and heard during the last few months, as well as during SC11, did not give me the best feeling about the future of Microsoft’s HPC Server product. This post covers my impressions and thoughts, not only on the product, but also on doing HPC on the Windows platform in general.

What disturbed me a little was the absence of any roadmap presentation. Granted, over the last few years Windows HPC Server has clearly become mature enough not to lack any significant feature necessary for deployment and use on a medium-sized HPC installation. However, Microsoft publicly outlining a product roadmap with several key features always felt reassuring, and its absence at SC11 has been noted by the community. Furthermore, they quietly killed their Dryad project (including LINQ to HPC), which was prominently displayed at SC10, now betting on a yet-to-be-released distribution of Apache Hadoop for Windows HPC Server and Azure. Finally, there have been several business restructuring activities inside Microsoft. For example, here in Germany Microsoft apparently shut down the HPC group and moved (some of) the people under the umbrella of Azure. From what I heard, all these activities caused some confusion in the community about how Microsoft sees the future of the Windows HPC Server product and how much support and innovation may be expected from the company in this regard.

What Microsoft now talks a lot about is the Azure integration. If you followed the development of Windows HPC Server up to release R2 SP3, you could clearly see this coming. From a technology point of view, I am impressed. However, I am not convinced yet, for several reasons – the most important one being that the offering is much too expensive for our application needs. Of course we are following what is going on regarding clouds and HPC, and in fact in one project we are extending an application to make use of both on-premise and off-premises compute power based on availability (and maybe even price). But for the time being, our local clusters, including the one running Windows, will clearly dominate (or, as we Germans say, set the tone).

Finally, I am missing a clear picture of HPC-related improvements in the Windows Server roadmap. Just recently we added a frontend system with 160 (logical) cores, that is, 8 sockets and 512 GB of memory. Windows just works on such a machine – but it could do better. It could serve HPC applications better. And given that next-generation ordinary (HPC) systems will probably have a similar core count, Windows really has to serve applications better on such machines in order to stay competitive. Furthermore, smooth and stable integration of accelerators – be it GPGPUs or something different but similar in spirit – will be at least as important.

Windows Task-Manager with 160 cores (8 sockets)

I will stop here. Our user base is clearly showing a demand for Windows HPC Server-based clusters, and in fact the demand is growing. Combining my personal opinion with the feedback and opinions I received from the (German) community, Microsoft has to improve its communication regarding Windows HPC Server. It is time for a clear statement on the future of the product and the direction it will take.

OpenMP and OpenACC

If you attended SC11, you might have noticed some buzz around OpenACC. Well, at least I did. For example, today’s OpenMP BOF had some information on this. I want to use this blog post to add some general comments and insights on the developments and direction of the OpenMP Language Committee, as well as on what has led to OpenACC. As always, these statements are mine alone; on this blog I do not speak in any official role.

For quite a while now, OpenMP has been moving into the accelerator space, with the work done by the OpenMP for Accelerators subcommittee of the OpenMP Language Committee. That subcommittee publicly presented the status of its work at the last IWOMP, where James Beyer et al. had a paper on that particular topic (PDF of their presentation). They have invested a lot of effort and made good progress since then. In order to make support for accelerators happen in OpenMP, they have to achieve three goals: (i) provide support for slicing and shaping expressions, (ii) provide support for data management constructs and clauses, and finally (iii) provide support to denote kernels and constructs for execution on the accelerator. For all three items the subcommittee looked at other existing proposals, particularly from PGI, BSC and CAPS, but also from others. There are good proposals underway for (i) and (ii), which are probably backed by a majority in the language committee, since this functionality may turn out to be very handy to drive other features and proposals as well. Just as an example, we are aiming for improved support for affinity of threads and data, which requires slicing and shaping of array expressions.
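To illustrate what goals (i) and (ii) amount to in practice, here is a minimal sketch using OpenACC-style syntax – the exact spelling of the eventual OpenMP directives is an assumption on my part, since the proposal is not public. The array section notation x[0:n] “shapes” a bare pointer into an n-element slice, and the data directive manages host-device transfers. Without an accelerator compiler the pragmas are simply ignored and the loop runs on the host, producing the same result.

```c
// Sketch only: OpenACC-style shaping and data management directives.
// With a plain host compiler the unknown pragmas are ignored.
void saxpy(int n, float a, const float *x, float *y)
{
    // copyin: transfer x to the device; copy: transfer y in and back out.
    // x[0:n] shapes the pointer x into an n-element array section.
    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```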

However, support for (iii) is really tough if one wants to integrate well with the rest of OpenMP and allow for future extensions. An important design goal is that OpenMP will support not just one particular type of accelerator, but rather be widely applicable to different kinds of devices from different vendors. These are the reasons OpenMP is developing at the slow pace it is. We are planning for a public draft of OpenMP 4.0 for SC12, one year from now.

In order to allow for faster development, ignoring the OpenMP integration just for a moment, the OpenACC standard initiative was formed; it basically is a spin-off of the OpenMP Language Committee. Personally, I see it as a beta of OpenMP for Accelerators, and I hope that this initiative will help to collect valuable feedback on what pragma-based accelerator programming should look like. Cray, PGI and CAPS have all announced that they will implement the specification as it currently stands. When it comes to securing the resources for that, it is much easier to implement this spin-off spec than an incomplete draft proposal. This is what I like the OpenACC effort for. And by the way, it was prominently promoted during the NVIDIA keynote at SC11 on Tuesday morning.
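As a small taste of goal (iii), denoting a kernel for accelerator execution, here is a sketch in OpenACC 1.0 syntax; the function and variable names are my own illustration, not taken from the spec. The kernels directive asks the compiler to generate device code for the loop, and the reduction clause makes the accumulation safe in parallel. Again, a plain host compiler ignores the pragma and runs the loop sequentially.

```c
// Sketch only: marking a loop as an accelerator kernel, OpenACC style.
// copyin shapes and transfers the input arrays; reduction(+:sum) tells
// the compiler to combine the per-thread partial sums.
double dot(int n, const double *a, const double *b)
{
    double sum = 0.0;
    #pragma acc kernels loop reduction(+:sum) copyin(a[0:n], b[0:n])
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```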

However, what I do not like is how it was marketed. People did not get the relation to OpenMP. The way it was published, it was not clear that effort from other parties was involved in the development as well, not just the ones mentioned on the website. In fact, many people who visited the booth thought that OpenACC is about to become a competitor to OpenMP in the accelerator domain. This is not true; the clear intent is to feed the OpenACC development back into the next OpenMP specification. We clearly hope to release a draft in the SC12 time frame, but until then we have several technical problems to solve.