All members of the Microsoft Technology Adoption Program (TAP) for Windows HPC Server 2008 R2 just got mail that build number 2369 is ready for release. It is already available via MS Connect and will be made available via the usual channels in the coming days and weeks. We have been trying various builds throughout our participation in the TAP – with varying success – and got a good overview of the new features in this product. As usual there are some features I really like and have been waiting for, and some features of questionable value.
The new product, both the HPC Pack 2008 R2 and the Windows Server 2008 R2 HPC Edition, will be available in two editions: Express (for traditional HPC usage including MS-MPI, the Job Scheduler and the Admin features you already know) and Enterprise (for SOA and Excel-based workloads, including everything from Express as well as the new Excel and Workstation Cycle Stealing functionality). The HPC Pack 2008 R2 will also be available as a Workstation-only edition (giving you the Cycle Stealing functionality). I still have no clue into which edition our licenses with Software Assurance will be converted; let's hope for Enterprise 🙂.
What is new for traditional HPC users (such as our center)?
- The MPI stack (MS-MPI) has been improved and, for example, has been equipped with several environment variables that allow more fine-grained control of its inner workings, e.g. which protocol scheme to use depending on the message size. Together with general performance improvements this offers some options for further performance tuning as well as analysis of the MPI behaviour.
- The option to boot compute nodes via iSCSI from the network has been introduced. What you need is a suitable iSCSI provider (ask your storage vendor, MS will offer an iSCSI provider development kit) and a suitable volume; Windows HPC Server 2008 R2 is intended to do the management for you. This is the feature I (personally) was most interested in. It took us until the appearance of the release candidate to get it working well with our NetApp installation, so our experience with this is still limited, but I am very keen on seeing how it behaves under heavy job loads.
- Improved Diagnostics have been made available. Especially on the network side, the options to (automatically) check the health of your cluster have been significantly improved, along with possibilities to test whether compute nodes are ok to run ISV codes. For the latter, we have written a lot of tests of our own, and it took us a lot of time to get them right in detecting the most prominent issues with ISV codes. Providing well-integrated and extensive diagnostics is a great opportunity for ISVs to save their users from a lot of pain!
- In addition there are several other things, like new Scheduling Policies and an improved Admin Console. Windows HPC Server 2008 R2 now supports 256 threads (I think they mean cores) instead of 64. It has become significantly easier to run pre- and post-job scripts, to enable email notifications when the job status changes, and things like that. Once the R2 cluster is in production I intend to share our experiences with this…
A special focus of this release lies on support for “emerging workloads” – this is what Microsoft calls them – based on Enterprise SOA, Excel and Desktop Cycle Stealing. I have not looked into the SOA improvements so far, therefore no comment on that. A better integration of Excel with the HPC server is very welcome, although we do not (yet) have real users for this in our center. You will be able to run distributed instances of Excel 2010 on a cluster where every instance computes an individual workbook (with a different dataset), or you can offload the computation of Excel 2010 user-defined functions to the cluster. In the past I (and a few others) experimented with using Excel to steer computations, for example optimizing a kernel with various parameters, and I am curious whether there will be more use of that in the future by attaching a computation directly to Excel.
Well, and then there is Desktop Cycle Stealing. The idea (as far as I got it) is to use Windows 7-based workstations to run jobs, without the tight integration into a cluster that regular compute nodes have. Admittedly my view is shaped by what we do in our center, but I do not think using desktops makes a lot of sense for what most people call HPC. We design our cluster in a way that applications run efficiently on it, e.g. by using special networks. The network connection to a workstation, even if it is Gigabit Ethernet (GE), is comparatively weak. Compute nodes are centrally managed, equipped with efficient cooling, etc. – workstations are distributed and often not reliable. There may be some applications that can profit from getting some cycles here and there. But promising desktop cycle stealing to save some money for HPC-type ISV codes will not result in satisfied users, since these codes just do not run efficiently on a weakly coupled network of inhomogeneous machines. JM2C; as always, I am happy to learn about counterexamples.
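To make the network argument a bit more concrete, here is a rough back-of-the-envelope sketch in Python using the simple latency-plus-bandwidth cost model for message transfers. All numbers in it (latencies, bandwidths, message counts and sizes, compute time per step) are assumptions I picked for illustration, not measurements from our cluster or from any workstation, and the function names are made up for this sketch.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the time for one message as latency + size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def parallel_efficiency(compute_s, messages, message_bytes,
                        latency_s, bandwidth_bytes_per_s):
    """Fraction of a time step spent computing rather than communicating.

    Assumes communication is not overlapped with computation, which is a
    pessimistic simplification.
    """
    comm_s = messages * transfer_time(message_bytes, latency_s,
                                      bandwidth_bytes_per_s)
    return compute_s / (compute_s + comm_s)


# Assumed, illustrative network parameters (not measured values).
CLUSTER_INTERCONNECT = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 2e9}
DESKTOP_GIGABIT = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}

# Assumed workload: 1 ms of computation and 100 messages of 8 KiB per step.
WORKLOAD = {"compute_s": 1e-3, "messages": 100, "message_bytes": 8192}

cluster_eff = parallel_efficiency(**WORKLOAD, **CLUSTER_INTERCONNECT)
desktop_eff = parallel_efficiency(**WORKLOAD, **DESKTOP_GIGABIT)

if __name__ == "__main__":
    print(f"cluster interconnect efficiency: {cluster_eff:.2f}")
    print(f"desktop gigabit efficiency:      {desktop_eff:.2f}")
```

With communication-heavy assumptions like these, the desktop network ends up dominating the time step. A code that rarely communicates (large `compute_s`, few `messages`) would show a much smaller gap, which is exactly the kind of application that might profit from stolen cycles.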