<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Christian&#039;s corner on the Web</title>
	<atom:link href="http://terboven.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://terboven.com</link>
	<description>Random Thoughts, mostly on Parallel Programming - by Christian Terboven.</description>
	<lastBuildDate>Wed, 10 Apr 2013 04:20:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='terboven.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Christian&#039;s corner on the Web</title>
		<link>http://terboven.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://terboven.com/osd.xml" title="Christian&#039;s corner on the Web" />
	<atom:link rel='hub' href='http://terboven.com/?pushpress=hub'/>
		<item>
		<title>OpenMP 4.0 RC2 has been released</title>
		<link>http://terboven.com/2013/03/14/openmp-4-0-rc2-has-been-released/</link>
		<comments>http://terboven.com/2013/03/14/openmp-4-0-rc2-has-been-released/#comments</comments>
		<pubDate>Thu, 14 Mar 2013 10:27:23 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[OpenACC]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[IWOMP]]></category>
		<category><![CDATA[Supercomputing]]></category>

		<guid isPermaLink="false">http://terboven.com/?p=10435</guid>
		<description><![CDATA[In addition to the new features already present in release candidate one (RC1), the second draft of the next OpenMP specification release contains the following additions (quoted from openmp.org): Initial accelerators support: Device Data Environments (p16), target constructs (p68: target, target data, target &#8230; <a href="http://terboven.com/2013/03/14/openmp-4-0-rc2-has-been-released/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10435&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In addition to the new features already present in release candidate one (RC1), the second draft of the next OpenMP specification release contains the following additions (quoted from <a href="http://www.openmp.org" target="_blank">openmp.org</a>):</p>
<ul>
<li>Initial accelerators support: Device Data Environments (p16), <strong>target</strong> constructs (p68: <strong>target, target data, target update, declare target, teams, distribute</strong>; p151: <strong>map</strong> clause; and associated runtime routines (p191).)</li>
<li>Task dependency support through the new <strong>depend</strong> clause. (p91)</li>
<li>Initial error model support through <strong>cancel</strong> and <strong>cancellation point</strong> constructs to request cancellation of specified region types and to declare a user-defined cancellation point to  check for cancellation requests. (Section 2.13, p116: Cancellation Constructs)</li>
<li>Support for array sections in C, C++ and Fortran. (Section 2.4, p36: Array Sections)</li>
<li>Extended <strong>declare simd</strong> directive to allow multiple declarations. (p64)</li>
<li>New environment variable <strong>OMP_DISPLAY_ENV</strong> instructing the runtime to display the OpenMP version number and ICV values during initialization. (p219)</li>
<li>Additional enhancements to support Fortran 2003.</li>
</ul>
<p>As we were not yet able to incorporate all the feedback that has been reported so far, a few known issues are still in the document. Additionally, some more minor changes are already in preparation. Feedback and questions are of course still welcome, so head over to <a href="http://openmp.org/wp/openmp-specifications/" target="_blank">http://openmp.org/wp/openmp-specifications/</a> and download the new document.</p>
]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2013/03/14/openmp-4-0-rc2-has-been-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>OpenMP 4.0 RC2 is well on its way</title>
		<link>http://terboven.com/2013/02/27/openmp-4-0-rc2-is-well-on-its-way/</link>
		<comments>http://terboven.com/2013/02/27/openmp-4-0-rc2-is-well-on-its-way/#comments</comments>
		<pubDate>Wed, 27 Feb 2013 15:48:29 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[OpenACC]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[GPGPU]]></category>
		<category><![CDATA[OpenMP4Acc]]></category>
		<category><![CDATA[Supercomputing]]></category>
		<category><![CDATA[Tasking]]></category>

		<guid isPermaLink="false">http://terboven.com/?p=10431</guid>
		<description><![CDATA[Long time no blog post. But I have good news to share today: Yesterday the OpenMP Language Committee (LC) held the final votes on a set of tickets (read: extensions or corrections) to find their way into OpenMP 4.0 RC2, &#8230; <a href="http://terboven.com/2013/02/27/openmp-4-0-rc2-is-well-on-its-way/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10431&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Long time no blog post. But I have good news to share today: Yesterday the OpenMP Language Committee (LC) held the final votes on a set of tickets (read: extensions or corrections) to find their way into OpenMP 4.0 RC2, the second release candidate of OpenMP 4.0, the anticipated next version of the specification. These tickets are basically the outcome of the last <a href="http://openmp.org/wp/2013/01/openmp-language-committee-meeting-jan-28-31/" target="_blank">LC meeting in January</a> plus some first feedback we received on RC1. And they bring some very nice new features to OpenMP (some of which are well overdue).</p>
<p>Before I give a brief overview of the new additions, some remarks on the procedure leading to the final OpenMP 4.0 specification. Our aim is to have the RC2 document ready and published in roughly two weeks from now. This is now hard work for our editor <a href="http://www.rchrd.com/" target="_blank">Richard Friedman</a>, as all tickets are written as a diff to the current latest spec, namely RC1. After the release of RC2, we will again solicit feedback on the new spec draft. This feedback is important for the final voting by all OpenMP members in the Architecture Review Board (ARB), as the ARB is the owner of the spec and has to formally accept the new spec proposed by the LC. Only then &#8211; given majority acceptance in the ARB vote &#8211; will OpenMP 4.0 be released. During the feedback period, the LC still has to complete some aspects of the spec which are not ready yet, like the appendix. The examples in particular are not complete. And if any of the public reviewers finds a serious flaw in any of the new extensions, we will face the problem of fixing it quickly (if possible) or withdrawing it from OpenMP 4.0. This means the following additions have a very high probability of being part of OpenMP 4.0, but nothing is guaranteed yet.</p>
<p><strong>Cancellation</strong>. At so-called cancellation points the implicit and explicit tasks check whether cancellation has been requested and, if so, they abort the current region and jump right to the end of it. Cancellation can be requested via the new <em>cancel</em> construct, which is able to cancel either the whole parallel region, the innermost sections worksharing construct, the innermost for worksharing construct, or the innermost taskgroup, with innermost always defined with respect to the thread team encountering the cancel construct. Control flow will resume right after the end of the cancelled region. A thread or task requesting cancellation does not lead to an immediate abort of all other threads or tasks in the respective region; instead, these will only abort execution once a cancellation point has been reached. Cancellation points are part of barriers, the cancel construct itself, or user-defined via the <em>cancellation point</em> construct. This addition is the first step to a fully-featured error model in OpenMP.</p>
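<p>To make the cancellation description above more concrete, here is a minimal sketch of a parallel search that requests cancellation of the for worksharing region once a match is found. It assumes the directive spelling of the final OpenMP 4.0 specification, which may differ from the RC2 draft, and the function name <em>find_index</em> is hypothetical. In the final specification cancellation only takes effect if the <em>OMP_CANCELLATION</em> environment variable is set to true; otherwise, or when compiled without OpenMP support, the loop simply runs to completion and still returns a correct result.</p>

```c
/* Hypothetical helper: returns an index i with data[i] == key, or -1 if
 * the key does not occur. If the key occurs more than once, which of the
 * matching indices is returned is unspecified. */
long find_index(const int *data, long n, int key)
{
    long found = -1;

    #pragma omp parallel for shared(found)
    for (long i = 0; i < n; ++i) {
        if (data[i] == key) {
            /* Record the hit; the critical section protects the write. */
            #pragma omp critical
            found = i;

            /* Request cancellation of the innermost for region. */
            #pragma omp cancel for
        }

        /* User-defined cancellation point: other threads check here
         * whether cancellation has been requested and, if so, leave
         * the loop without executing their remaining iterations. */
        #pragma omp cancellation point for
    }

    return found;
}
```

<p>Calling <em>find_index</em> on an array in which the key occurs exactly once returns the index of that element, no matter whether cancellation is active or not; cancellation only saves the work of the remaining iterations.</p>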
<p><strong>Task Dependencies</strong>. The optional <em>depend</em> clause on a task enforces additional constraints on the scheduling of a task by enabling dependencies between sibling tasks. It is of the form <em>depend(dependence-type: list)</em>, accepting a list of variables. If an <em>in</em> dependence-type is given, the generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in a depend clause of type <em>out</em> or <em>inout</em>. If an <em>out</em> or <em>inout</em> dependence-type is given, the generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in a depend clause of type <em>in</em>, <em>out</em>, or <em>inout</em>. The principal model you should have in mind with this feature is the flow of data: if a variable appears in an in-type depend clause of a given task, this task has to wait for all tasks in which this particular variable appears in an out-type depend clause, because these tasks first have to write to the variable; otherwise the update would be lost. And vice versa. This is why out/inout-type dependences also enforce &#8220;waiting&#8221; for out/inout-dependences, as hereby an ordering of the task execution is enforced. In the current form of this feature, there is no observable difference in task scheduling between the out and inout dependence types, but we foresee allowing certain types of optimizations in the future if there is a distinction between out and inout. Additionally, we found that having both types maps better to the data-flow model.</p>
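<p>The data-flow idea can be illustrated with a small sketch of three sibling tasks that are ordered purely by their depend clauses. The syntax follows the final OpenMP 4.0 specification and may differ in detail from the RC2 draft; the function name <em>pipeline</em> and the arithmetic are made up for illustration. Compiled without OpenMP support, the statements just run in program order, which gives the same result.</p>

```c
/* Hypothetical example: three sibling tasks chained via depend clauses.
 * Task 1 writes a, task 2 reads a and writes b, task 3 reads a and
 * updates b. The dependences force exactly this order. */
int pipeline(int seed)
{
    int a = 0, b = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer of a. */
        #pragma omp task depend(out: a) shared(a)
        a = seed + 1;

        /* Waits for the out-dependence on a, then produces b. */
        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = a * 2;

        /* Waits for the previous writer of b before updating it. */
        #pragma omp task depend(in: a) depend(inout: b) shared(a, b)
        b = b + a;

        /* Make sure all three tasks have finished. */
        #pragma omp taskwait
    }

    return b;
}
```

<p>With a seed of 3 the tasks compute a = 4, then b = 8, then b = 12, so <em>pipeline(3)</em> returns 12. Without the depend clauses the three tasks could run in any order and the result would not be well defined.</p>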
<p><strong>Array Sectioning</strong>. Several clauses in OpenMP 4.0 require the ability to describe a subset of native arrays, especially the support for accelerators in the target construct (see below). Array sectioning lets you define a subset of the elements in an array via <em>[lower-bound : length]</em>, or <em>[lower-bound :]</em>, or <em>[: length]</em> or <em>[:]</em>. The use of array sectioning is restricted to selected constructs and clauses.</p>
<p><strong>Support for Accelerators</strong>. This is really the big new thing in OpenMP 4.0 which the LC aimed to get ready for inclusion. The <em>target</em> construct allows for the execution of OpenMP constructs on a device other than the current host/device. The <em>target data</em> construct creates a device data environment and allows for the &#8220;mapping&#8221; of data between the different devices &#8211; for current accelerators this means copying data from the host to the device and vice versa. The <em>declare target</em> directive instructs the OpenMP implementation to create device-specific versions of the variables or functions specified, meaning they are available (for execution) on the device. In order to support vector-style operations of current accelerators, there are two new constructs: the <em>teams</em> construct creates several OpenMP thread teams (then called a league), of which only the master thread of each team executes the associated region, and the <em>distribute</em> construct specifies that the corresponding loop iterations will be executed by the thread teams. I understand this very brief description cannot serve as a good explanation for this new feature, but I don&#8217;t have my examples ready yet. If you know OpenACC and/or the PGI Accelerator programming model, you probably have an idea of what will be in OpenMP. Personally, I regard OpenMP 4.0 with this extension as a superset of OpenACC, in which the common roots are visible. More information and documentation on this new feature, which I like to call &#8220;OpenMP for Accelerators (OpenMP4Acc)&#8221;, should become available with the release of the OpenMP 4.0 RC2 spec draft. By March 14th at the latest, for our next PPCES event, I will have a more detailed introduction along with some examples ready and will put them here as well.</p>
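<p>As a rough illustration of how array sections and the accelerator constructs fit together, the following sketch offloads a simple scaling loop. It uses the directive spelling of the final OpenMP 4.0 specification, so details may differ from the RC2 draft, and the function name <em>scale_on_device</em> is hypothetical. If no device is available, or the code is compiled without OpenMP support, the loop runs on the host and produces the same result.</p>

```c
/* Hypothetical example: multiply the first n elements of x by factor.
 * The array section x[0:n] (lower bound 0, length n) tells the runtime
 * which part of the array to map to the device and back. */
void scale_on_device(float *x, int n, float factor)
{
    /* target: execute the region on the device;
     * map(tofrom: ...): copy the section in before and out after. */
    #pragma omp target map(tofrom: x[0:n])
    /* teams: create a league of thread teams;
     * distribute: split the iterations across the teams;
     * parallel for: share each team's chunk among its threads. */
    #pragma omp teams distribute parallel for
    for (int i = 0; i < n; ++i) {
        x[i] = x[i] * factor;
    }
}
```

<p>Calling <em>scale_on_device(x, 4, 2.0f)</em> on an array holding 1, 2, 3, 4 leaves 2, 4, 6, 8 in the array on the host. The mapping direction matters here: a section mapped with <em>to</em> only would not be copied back.</p>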
<p><strong>Print Environment Settings</strong>. If the <em>OMP_DISPLAY_ENV</em> environment variable is set to true, the execution environment is instructed to display the OpenMP version number as well as the values of all the ICVs (ICV = Internal Control Variable) after evaluating the user options and before starting the actual program execution. This is very helpful if one uses multiple environments.</p>
<p>Adding so many things to OpenMP 4.0 (don&#8217;t forget the new features already present in OpenMP 4.0 RC1) also has at least one obvious downside: the specification itself has become almost unreadable for the average OpenMP user. I cannot completely exclude myself here, although I spend a reasonable amount of my work time dealing with OpenMP itself. This clearly underlines the need for good books on OpenMP 4.0 programming, but I am not aware of anyone currently working on such a thing. Several members of the LC as well as well-known instructors from academia will for sure add OpenMP 4.0 aspects to their lectures and tutorials soon, but this is only the first tiny step towards OpenMP 4.0 adoption. I am curious to see how programmers will pick up the new goodness&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2013/02/27/openmp-4-0-rc2-is-well-on-its-way/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>OpenMP 4.0 RC1 and the Accelerator TR available</title>
		<link>http://terboven.com/2012/11/14/openmp-4-0-rc1-and-the-accelerator-tr-available/</link>
		<comments>http://terboven.com/2012/11/14/openmp-4-0-rc1-and-the-accelerator-tr-available/#comments</comments>
		<pubDate>Tue, 13 Nov 2012 23:44:52 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[NUMA]]></category>
		<category><![CDATA[OpenACC]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[Binding]]></category>
		<category><![CDATA[cc-NUMA]]></category>
		<category><![CDATA[Loop Parallelization]]></category>
		<category><![CDATA[SC12]]></category>
		<category><![CDATA[Supercomputing]]></category>
		<category><![CDATA[Teaching]]></category>

		<guid isPermaLink="false">http://terboven.com/?p=10428</guid>
		<description><![CDATA[Quoting from openmp.org: OpenMP, the de-facto standard for parallel programming on shared memory systems, continues to extend its reach beyond pure HPC to include embedded systems, real time systems, and accelerators. Release Candidate 1 of the OpenMP 4.0 API specifications &#8230; <a href="http://terboven.com/2012/11/14/openmp-4-0-rc1-and-the-accelerator-tr-available/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10428&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Quoting from openmp.org: OpenMP, the de-facto standard for parallel programming on shared memory systems, continues to extend its reach beyond pure HPC to include embedded systems, real time systems, and accelerators. Release Candidate 1 of the OpenMP 4.0 API specifications currently under development is now available for public discussion. This update includes thread affinity, initial support for Fortran 2003, SIMD constructs to vectorize both serial and parallelized loops, user-defined reductions, and sequentially consistent atomics. The OpenMP ARB plans to integrate the Technical Report on directives for attached accelerators, as well as more new features, in a final Release Candidate 2, to appear sometime in the first Quarter of 2013, followed by the finalized full 4.0 API specifications soon thereafter.</p>
<p>The OpenMP Language Committee really put a lot of effort and dedicated work into both documents and we hope for good, constructive feedback. Both documents are available at the <a href="http://openmp.org/wp/openmp-specifications/" target="_blank">OpenMP Specifications</a> webpage: <a href="http://openmp.org/wp/openmp-specifications/" target="_blank">http://openmp.org/wp/openmp-specifications/</a>.</p>
<p>Grab them now while they are hot <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2012/11/14/openmp-4-0-rc1-and-the-accelerator-tr-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>Expect big OpenMP 4.0 news for SC12</title>
		<link>http://terboven.com/2012/11/06/expect-big-openmp-4-0-news-for-sc12/</link>
		<comments>http://terboven.com/2012/11/06/expect-big-openmp-4-0-news-for-sc12/#comments</comments>
		<pubDate>Tue, 06 Nov 2012 19:28:57 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[NUMA]]></category>
		<category><![CDATA[OpenACC]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[Binding]]></category>
		<category><![CDATA[cc-NUMA]]></category>
		<category><![CDATA[Loop Parallelization]]></category>
		<category><![CDATA[SC12]]></category>
		<category><![CDATA[Supercomputing]]></category>
		<category><![CDATA[Threading]]></category>

		<guid isPermaLink="false">http://terboven.com/?p=10425</guid>
		<description><![CDATA[Expect big news on OpenMP 4.0 for next week&#8217;s SC12. The OpenMP Language Committee &#8211; responsible for developing the standard &#8211; always planned to release the next version of the standard as a draft for public comment in time for &#8230; <a href="http://terboven.com/2012/11/06/expect-big-openmp-4-0-news-for-sc12/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10425&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Expect big news on OpenMP 4.0 for next week&#8217;s <a href="http://sc12.supercomputing.org/schedule/event_detail.php?evid=tut124" target="_blank">SC12</a>. The OpenMP Language Committee &#8211; responsible for developing the standard &#8211; always planned to release the next version of the standard as a draft for public comment in time for SC12. We worked very hard during the last weeks to stay within our schedule. And we will do the following:</p>
<ul>
<li>Release OpenMP 4.0 RC1 as a draft for public review. This document will be in pretty good shape and will represent the foundation of OpenMP 4.0. It will contain several new features, to be discussed and explained during SC12 at our booth and/or the OpenMP BoF. Among these new features are the SIMD construct, to vectorize both serial as well as parallelized loops, taskgroups (no task dependencies yet), thread binding via places (I have talked a lot about this already), array sectioning, basic support for Fortran 2003, and some other minor corrections and improvements.</li>
<li>Publish a Technical Report on OpenMP for Accelerators, more specifically on &#8220;Directives for Attached Accelerators&#8221;. This was always planned to be the major addition for OpenMP 4.0. However, integrating support for accelerators with the rest of OpenMP is a hard task and a lot of work, and it is not 100% done yet. There were many discussions on how to deal with this situation: do as outlined here, wait for just some more weeks, come up with a completely new schedule and wait until we are completely done, &#8230; Almost all technical aspects have been discussed and answered. But the wording is not yet completed. And support for NVIDIA-like GPUs might not be optimal. However, I personally think the proposal is really good and the big opportunity in making the current state of work public is that the HPC community can take a look at it, think about it, comment on it, and possibly improve it. It is already online: <a href="http://openmp.org/wp/openmp-specifications/" target="_blank">http://openmp.org/wp/openmp-specifications/</a>.</li>
</ul>
<p>Hoping for constructive feedback and taking the additional time to work on the OpenMP for Accelerator extension, the current plan is to come up with a second draft for public comment (RC2) in January 2013 and then finalize the standard quickly after. Quickly in terms of a few weeks. This plan is still ambitious, but I think this is a good plan.</p>
<p>If you want to learn more, come to the OpenMP booth, and come to the BoF on Tuesday afternoon, 17:30h, which unfortunately I will not be able to attend myself :-/. Listen to what the people will show you and let us know what you like and what you dislike.</p>
]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2012/11/06/expect-big-openmp-4-0-news-for-sc12/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>A Glimpse at OpenMP for Accelerators (aka OpenACC v2?)</title>
		<link>http://terboven.com/2012/10/12/a-glimpse-at-openmp-for-accelerators/</link>
		<comments>http://terboven.com/2012/10/12/a-glimpse-at-openmp-for-accelerators/#comments</comments>
		<pubDate>Fri, 12 Oct 2012 07:23:15 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[OpenACC]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[Language Committee]]></category>

		<guid isPermaLink="false">http://terboven.com/?p=10415</guid>
		<description><![CDATA[During our OpenACC Workshop I contributed a brief talk on the current status of the OpenMP for Accelerators proposal. It caused some interest, because if successful, this proposal will be the de-facto successor of OpenACC, fully integrated into the rest &#8230; <a href="http://terboven.com/2012/10/12/a-glimpse-at-openmp-for-accelerators/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10415&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>During our <a href="http://www.rz.rwth-aachen.de/go/id/tmy?lang=en" target="_blank">OpenACC Workshop</a> I contributed a brief talk on the <a href="http://terboven.files.wordpress.com/2012/10/openmp_for_accelerators.pdf" target="_blank">current status of the OpenMP for Accelerators proposal</a>. It caused some interest, because if successful, this proposal will be the de-facto successor of OpenACC, fully integrated into the rest of OpenMP. Hence I wanted to share this slide deck, but please understand that some of the information presented in there is changing on a daily basis! We expect the concepts to remain valid, but, for example, the spelling seems to change quite quickly &#8211; understand this slide deck as a snapshot of October 11th, 2012. I once <a href="http://terboven.com/2011/11/16/openmp-and-openacc/" target="_blank">was critical of how OpenACC was born,</a> but have since come to realize it was a good move and helped a lot to gain experience with a pragma-based paradigm to program accelerators. Furthermore, users have something to work with already, instead of still waiting for a standard to be completed&#8230;</p>
<p>The OpenMP for Accelerators subcommittee is run by James Beyer (Cray) and Eric Stotzer (TI), who do a great job of documenting the current state of the discussion. This made it pretty easy for me to compile the slide deck and keep colleagues as well as users informed.</p>
]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2012/10/12/a-glimpse-at-openmp-for-accelerators/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>Event Announcement: Microsoft Azure Compute Tutorial</title>
		<link>http://terboven.com/2012/10/02/event-annoucement-microsoft-azure-compute-tutorial/</link>
		<comments>http://terboven.com/2012/10/02/event-annoucement-microsoft-azure-compute-tutorial/#comments</comments>
		<pubDate>Tue, 02 Oct 2012 12:42:26 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[Windows-HPC]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Supercomputing]]></category>
		<category><![CDATA[Visual Studio]]></category>
		<category><![CDATA[Windows-HPC UG]]></category>

		<guid isPermaLink="false">http://terboven.com/?p=10408</guid>
		<description><![CDATA[This time I am going to announce an event unlike any we have held before: On November 5th, 2012, we will conduct a Microsoft Azure Compute Tutorial with speakers from the European Microsoft Innovation Center in Aachen. What &#8230; <a href="http://terboven.com/2012/10/02/event-annoucement-microsoft-azure-compute-tutorial/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10408&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>This time I am going to announce an event unlike any we have held before: On November 5th, 2012, we will conduct a Microsoft Azure Compute Tutorial with speakers from the European Microsoft Innovation Center in Aachen. What we mean by &#8220;compute&#8221; is not quite what HPC people might think of as computing. The rationale is the following:</p>
<p>Cloud computing enables the usage of computing resources provided as a service via a network (e.g. the internet). One cloud platform is Microsoft&#8217;s Windows Azure. It can be used to build, deploy and manage applications in the cloud, which in this case consists of Microsoft-managed data centers. This workshop will introduce Microsoft Azure facilities with a focus on compute services. In the morning of this tutorial, we will introduce you to Azure computing, storage and services. For interested participants, there will be a hands-on session after lunch, in which an example application will be created step by step. More details and the link for registration can be found at the <a href="http://www.rz.rwth-aachen.de/go/id/tot/?lang=en" target="_blank">event website</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2012/10/02/event-annoucement-microsoft-azure-compute-tutorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>Revamped Web Presence</title>
		<link>http://terboven.com/2012/09/21/revamped-web-presence/</link>
		<comments>http://terboven.com/2012/09/21/revamped-web-presence/#comments</comments>
		<pubDate>Fri, 21 Sep 2012 07:35:56 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[Private]]></category>
		<category><![CDATA[WordPress]]></category>

		<guid isPermaLink="false">http://terboven.wordpress.com/?p=10400</guid>
		<description><![CDATA[I am not a designer. Neither am I an artist. But I appreciate well-designed, simple websites. Hence I decided to migrate completely to wordpress.com, which is where you have ended up now. The domain I own, terboven.com, will soon point here, &#8230; <a href="http://terboven.com/2012/09/21/revamped-web-presence/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10400&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I am not a designer. Neither am I an artist. But I appreciate well-designed, simple websites. Hence I decided to migrate completely to <a href="http://www.wordpress.com" target="_blank">wordpress.com</a>, which is where you have ended up now. The domain I own, <a href="http://www.terboven.com" target="_blank">terboven.com</a>, will soon point here, too. I transferred most of the content of my old homepage; my blog has been hosted here for quite a while already. I hope you like the new site as much as I do and that you will find here what you were searching for.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/terboven.wordpress.com/10400/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/terboven.wordpress.com/10400/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10400&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2012/09/21/revamped-web-presence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>Several Event Announcements</title>
		<link>http://terboven.com/2012/09/11/several-event-annoucements/</link>
		<comments>http://terboven.com/2012/09/11/several-event-annoucements/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 14:29:08 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[NUMA]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[Windows-HPC]]></category>
		<category><![CDATA[cc-NUMA]]></category>
		<category><![CDATA[Debugging]]></category>
		<category><![CDATA[MPI]]></category>
		<category><![CDATA[OpenACC]]></category>
		<category><![CDATA[SC12]]></category>
		<category><![CDATA[Supercomputing]]></category>
		<category><![CDATA[Teaching]]></category>
		<category><![CDATA[Windows HPC Server]]></category>
		<category><![CDATA[Windows-HPC UG]]></category>

		<guid isPermaLink="false">http://terboven.wordpress.com/?p=10322</guid>
		<description><![CDATA[These are just some announcements of upcoming events in which I am involved to varying degrees. The first two will take place at RWTH Aachen University and attendance is free of charge; the third is part of the &#8230; <a href="http://terboven.com/2012/09/11/several-event-annoucements/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10322&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>These are just some announcements of upcoming events in which I am involved to varying degrees. The first two will take place at RWTH Aachen University and attendance is free of charge; the third is part of the SC12 conference in Salt Lake City, UT in the US.</p>
<p><strong>Tuning for bigSMP HPC Workshop &#8211; aixcelerate (October 8th &#8211; 10th, 2012).</strong> The number of cores per processor chip is increasing. Today&#8217;s &#8220;fat&#8221; compute nodes are equipped with up to 16 eight-core Intel Xeon processors, resulting in 128 physical cores, with up to 2 TB of main memory. Furthermore, special solutions like a ScaleMP vSMP system may consist of 16 nodes with 4 eight-core Intel Xeon processors each and 4 TB of accumulated main memory, scaling the number of cores even further up to 1024 per machine. While message-passing with MPI is the dominating paradigm for parallel programming in the domain of high performance computing (HPC), with the growing number of cores per cluster node the combination of MPI with shared memory programming is gaining importance. The efficient use of these systems also requires NUMA-aware data management. Because different levels of parallelism have to be exploited, namely through shared memory programming within a node and message-passing across the nodes, obtaining good performance becomes increasingly difficult. This tuning workshop will cover in detail the tools and methods to program big SMP systems. The first day will focus on OpenMP programming on big NUMA systems, the second day will focus on Intel Performance Tools as well as the ScaleMP machine, and the third day will focus on Hybrid Parallelization. Attendees are kindly requested to prepare and bring in their own code, if applicable. If you do not have your own code, but you are interested in the presented topics, you may work on prepared exercises during the lab time (hands-on). Good knowledge of MPI and/or OpenMP is recommended. More details and the registration link can be found at the <a href="http://www.rz.rwth-aachen.de/go/id/tmv/?lang=en" target="_blank">event website</a>.</p>
<p><strong>OpenACC Tutorial Workshop (October 11th to 12th, 2012).</strong> OpenACC is a directive-based programming model for accelerators which enables delegating the responsibility for low-level (e.g. CUDA or OpenCL) programming tasks to the compiler. To this end, using the OpenACC API, the programmer can easily offload compute-intensive loops to an attached accelerator. The open industry standard OpenACC was introduced in November 2011 and supports accelerating regions of code in standard C, C++ and Fortran. It provides portability across operating systems, host CPUs and accelerators. Up to now, OpenACC compilers are available from Cray, PGI and CAPS. During this workshop, you will work with PGI&#8217;s OpenACC implementation on Nvidia Quadro 6000 GPUs. This OpenACC workshop is divided into two parts (with separate registrations!). In the first part, we will give an introduction to the OpenACC API while focusing on GPUs. It is open for everyone who is interested in the topic. In contrast to the first part, the second part will not contain any presentations or hands-on sessions. On the second day, we invite all programmers who have their own code and want to try accelerating it on a GPU using OpenACC, with the help of our team members and Nvidia staff. More details and the registration link can be found at the <a href="http://www.rz.rwth-aachen.de/go/id/tmy?lang=en" target="_blank">event website</a>.</p>
<p><strong>Advanced OpenMP Tutorial at SC12 (November 12th, 2012).</strong> With the increasing prevalence of multicore processors, shared-memory programming models are essential. OpenMP is a popular, portable, widely supported and easy-to-use shared-memory model. Developers usually find OpenMP easy to learn. However, they are often disappointed with the performance and scalability of the resulting code. This disappointment stems not from shortcomings of OpenMP but rather from the lack of depth with which it is employed. Our “Advanced OpenMP Programming” tutorial addresses this critical need by exploring the implications of possible OpenMP parallelization strategies, both in terms of correctness and performance. While we quickly review the basics of OpenMP programming, we assume attendees understand basic parallelization concepts and will easily grasp those basics. We discuss how OpenMP features are implemented and then focus on performance aspects, such as data and thread locality on NUMA architectures, false sharing, and private versus shared data. We discuss language features in depth, with emphasis on features recently added to OpenMP such as tasking. We close with debugging, compare various tools, and illustrate how to avoid correctness pitfalls. More details can be found on the <a href="http://sc12.supercomputing.org/schedule/event_detail.php?evid=tut124" target="_blank">event website</a>.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/terboven.wordpress.com/10322/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/terboven.wordpress.com/10322/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10322&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2012/09/11/several-event-annoucements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>
	</item>
		<item>
		<title>The Design of OpenMP Thread Affinity</title>
		<link>http://terboven.com/2012/06/21/the-design-of-openmp-thread-affinity/</link>
		<comments>http://terboven.com/2012/06/21/the-design-of-openmp-thread-affinity/#comments</comments>
		<pubDate>Thu, 21 Jun 2012 07:49:11 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[NUMA]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[Binding]]></category>
		<category><![CDATA[cc-NUMA]]></category>
		<category><![CDATA[IWOMP]]></category>
		<category><![CDATA[ScaleMP]]></category>
		<category><![CDATA[Threading]]></category>

		<guid isPermaLink="false">http://terboven.wordpress.com/?p=10280</guid>
		<description><![CDATA[Exascale machines will employ significantly more threads than today, but even on current architectures controlling thread affinity is crucial to fuel all the cores and to maintain data affinity, but both MPI and OpenMP lack a solution to this problem &#8230; <a href="http://terboven.com/2012/06/21/the-design-of-openmp-thread-affinity/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10280&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><em>Exascale machines will employ significantly more threads than today, but even on current architectures controlling thread affinity is crucial to fuel all the cores and to maintain data affinity, but both MPI and OpenMP lack a solution to this problem</em> &#8211; this is the first sentence of our IWOMP 2012 paper with the same title as this blog post. The need for thread affinity in OpenMP has been demonstrated on several occasions. Inside the OpenMP Language Committee we formed the Affinity Subcommittee, and we have been working on this topic for several years now. Meanwhile almost all vendors have introduced their own extensions to support thread affinity, but they are all different and thus offer a clearly suboptimal user experience. Furthermore, they do not support nested OpenMP and in general they are static, meaning that only one affinity setting can be used for the whole program. For OpenMP 4.0, which is expected to be released as a draft in November 2012, we have a good thread affinity proposal on the table that not only standardizes existing vendor extensions, but also adds further capabilities. This blog post will present this proposal along with some information on why things are the way they are. I welcome any comments or questions via <a href="mailto:christian@terboven.com">email</a>.</p>
<p>When we started thinking about Affinity in general, we first tried to define a machine model or rather a machine abstraction and intended to use that to bind threads to cores as well as to possibly define a data layout. Over time I became convinced that this is not the right approach. Whatever method we used to describe the machine topology, we always envisioned systems that would be very complicated to describe. Furthermore, describing the system could end up being a task to be performed by the user, which I think is too complicated for most of them. We also do not want to force users to think about an explicit mapping of threads to cores, because we think this is too low-level for 95 % of the OpenMP programmers. And last but not least, if a new machine came along that could not be comfortably described by our method, OpenMP would evolve too slowly to be extended in time to support it.</p>
<p>To overcome this problem, the current proposal as developed by Alexandre E. Eichenberger, myself and the members of the OpenMP Language Committee Affinity Subcommittee, introduces the concepts of a <em>place</em> and a <em>place-list</em>. A place is defined as a set of execution units capable of executing OpenMP threads. For now you may think of a place as a set of cores. A place-list is an ordered list of places; the <em>ordered</em> attribute is important. It can be defined either by using abstract names or by constructing the places explicitly by enumerating the cores. The place-list will be used together with an affinity policy to bind the OpenMP threads in a team of a parallel region to the places in the list. It can be specified via the new environment variable <em>OMP_PLACES</em> (the name might still change). Let&#8217;s illustrate that with an example: The figure below depicts a very standard system (<em>node 0</em>) with two sockets (<em>socket 0</em> and <em>socket 1</em>), every socket having four cores (<em>core 0</em> to <em>core 3</em> on <em>socket 0</em>) and finally every core has two hardware-threads (<em>t0</em> and <em>t1</em>), i.e. every core can execute two threads simultaneously.</p>
<div id="attachment_10298" class="wp-caption aligncenter" style="width: 310px"><a href="https://terboven.files.wordpress.com/2012/06/system_topology_example.png"><img class="size-medium wp-image-10298" title="System Topology Example: 2 sockets, 4 cores, 2 hw-threads" src="https://terboven.files.wordpress.com/2012/06/system_topology_example.png?w=300&#038;h=64" alt="System Topology Example: 2 sockets, 4 cores, 2 hw-threads" width="300" height="64" /></a><p class="wp-caption-text">System Topology Example: 2 sockets, 4 cores, 2 hw-threads</p></div>
<p>Let&#8217;s construct a place-list consisting of eight places, every place being a physical core consisting of two hardware-threads (I often call those logical-threads). All of the following methods are equivalent, but we expect almost all users to use the first option:</p>
<pre>OMP_PLACES=cores
OMP_PLACES="(0,1),(2,3),(4,5),(6,7),(8,9),(10,11),(12,13),(14,15)"
OMP_PLACES="(0:2):2:8"</pre>
<p>For now we will define three abstract names to describe the place-list: <em>hwthreads</em>, <em>cores</em> and <em>sockets</em>. It is up to the implementation to define what is meant by a &#8220;core&#8221;, for instance, but of course we will provide some hints. The wording on that is not yet completed, but it will be something along the lines of hwthreads := smallest unit of execution capable of executing an OpenMP thread; cores := set of execution units in which more than one hardware-thread share some resources such as caches; sockets := physical package of multiple cores.</p>
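<p>To make the abstract names a bit more tangible, here is a small Python sketch that builds the place-list for the example node from the figure. It is only an illustrative model and not part of the proposal or of any OpenMP runtime: the function names <em>build_place_list</em> and <em>format_place_list</em> are made up for this post, and the consecutive numbering of hardware-threads is an assumption chosen so that <em>cores</em> reproduces the explicit list shown above.</p>

```python
# Illustrative model only. The topology mirrors the example figure:
# 2 sockets, 4 cores per socket, 2 hardware-threads per core.
# Assumption: hardware-threads are numbered consecutively, core by core.
SOCKETS = 2
CORES_PER_SOCKET = 4
HWTHREADS_PER_CORE = 2


def build_place_list(abstract_name):
    """Return an ordered list of places; each place is a tuple of hw-thread ids."""
    total = SOCKETS * CORES_PER_SOCKET * HWTHREADS_PER_CORE
    if abstract_name == "hwthreads":
        size = 1
    elif abstract_name == "cores":
        size = HWTHREADS_PER_CORE
    elif abstract_name == "sockets":
        size = CORES_PER_SOCKET * HWTHREADS_PER_CORE
    else:
        raise ValueError("unknown abstract name: " + abstract_name)
    # Cut the consecutive id range into equally sized, ordered places.
    return [tuple(range(start, start + size)) for start in range(0, total, size)]


def format_place_list(places):
    """Render places in the parenthesised notation used in this post."""
    return ",".join("(" + ",".join(str(i) for i in place) + ")" for place in places)


if __name__ == "__main__":
    # Prints (0,1),(2,3),(4,5),(6,7),(8,9),(10,11),(12,13),(14,15)
    print(format_place_list(build_place_list("cores")))
```

<p>With the same assumptions, <em>hwthreads</em> yields sixteen single-element places and <em>sockets</em> yields two places of eight hardware-threads each.</p>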
<p>Of course defining a place-list does not lead to any thread affinity. As I said above, the place-list is just used to define the places the threads of a parallel region can be bound to. In our proposal, the user does not have to define an explicit mapping of threads to places (or execution units in a place) &#8211; instead, the user can specify a so-called <em>affinity policy</em> via the new <em>affinity</em> clause which can be put on a parallel region. Our proposal currently consists of three affinity policies that allow exploiting the place-list in several ways (the names might still change):</p>
<ul>
<li>SPREAD: spread OpenMP threads as evenly as possible among the places. The place-list will be partitioned, so that subsequent threads (i.e. nested OpenMP) will only be allocated within the partition. Given the place-list outlined above, this policy would provide the most dedicated hardware resources to the OpenMP program.</li>
<li>CLOSE: pack OpenMP threads near to the master thread. There is no partitioning. Given the place-list from above, this policy would be used if sharing of resources among threads is desirable.</li>
<li>MASTER: collocate OpenMP threads with the master thread (in the same place). This will ensure maximum locality to the master thread.</li>
</ul>
<p>It is important to understand that these affinity policies influence the allocation of threads to places &#8211; not directly to the system topology. In my example the (ordered!) place-list was designed so that two threads far apart from each other also end up on physical cores far apart in the system. Although we expect this to be the standard use case, it does not necessarily have to be this way.</p>
<p>Let&#8217;s take a closer look at what the affinity policies do by looking at some examples. The figure below shows what SPREAD will do. The green box denotes the place-list, and for every number of threads &gt;=2 the place-list will be partitioned when a parallel region with this affinity clause is encountered. This will support nested OpenMP, as we will see later on. Every thread will receive its own sub-place-list. If there are more threads than places, more than one thread has to be allocated per place. This is done so that threads sharing a place have consecutive OpenMP thread ids (in this example with 16 threads: the threads with OpenMP thread ids 0 and 1 are on place 0).</p>
<div id="attachment_10304" class="wp-caption aligncenter" style="width: 310px"><a href="http://terboven.files.wordpress.com/2012/06/affinity_example_spread.png"><img class="size-medium wp-image-10304" title="Affinity Example: SPREAD" src="http://terboven.files.wordpress.com/2012/06/affinity_example_spread.png?w=300&#038;h=165" alt="Affinity Example: SPREAD" width="300" height="165" /></a><p class="wp-caption-text">Affinity Example: SPREAD</p></div>
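<p>The partitioning described for SPREAD can be modelled in a few lines of Python. This is a sketch under my own assumptions, not the algorithm of the reference implementation: places are represented by their index in the place-list, the partitions are made as equal in size as integer division allows, and each thread is bound to the first place of its partition. The function name <em>spread</em> is hypothetical.</p>

```python
def spread(num_places, num_threads):
    """Map each thread id to (bound place, sub-place-list) under a SPREAD-like policy.

    Places are indices into the ordered place-list.
    """
    assignment = []
    for tid in range(num_threads):
        # Integer arithmetic cuts the place-list into contiguous chunks
        # that are as equal in size as possible.
        first = tid * num_places // num_threads
        end = (tid + 1) * num_places // num_threads
        # With more threads than places a chunk can be empty; the thread then
        # shares the single place at index `first` with its id-neighbours.
        partition = list(range(first, max(end, first + 1)))
        assignment.append((partition[0], partition))
    return assignment


if __name__ == "__main__":
    # Eight places (e.g. the cores of the example node) and four threads.
    for tid, (place, partition) in enumerate(spread(8, 4)):
        print(tid, place, partition)
```

<p>For eight places and four threads this model binds the threads to places 0, 2, 4 and 6, each with a sub-place-list of two places. For sixteen threads it puts the threads with ids 0 and 1 on place 0, ids 2 and 3 on place 1, and so on.</p>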
<p>Let&#8217;s also take a brief look at the two other affinity policies we are proposing, namely CLOSE and MASTER. Both are illustrated in the figure below. For CLOSE, threads i and i+1 are meant to reside on places j and j+1, unless more than one thread will be allocated per place. For MASTER, all threads will be put into the same place the master thread is running on, unless this cannot be fulfilled by the implementation for any reason.</p>
<div id="attachment_10307" class="wp-caption aligncenter" style="width: 310px"><a href="http://terboven.files.wordpress.com/2012/06/affinity_example_close_master.png"><img class="size-medium wp-image-10307" title="Affinity Example: CLOSE and MASTER" src="http://terboven.files.wordpress.com/2012/06/affinity_example_close_master.png?w=300&#038;h=187" alt="Affinity Example: CLOSE and MASTER" width="300" height="187" /></a><p class="wp-caption-text">Affinity Example: CLOSE and MASTER</p></div>
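<p>CLOSE and MASTER can be modelled in the same spirit. Again this is only a sketch: the function names are made up, places are indices into the place-list, and two details are my assumptions rather than statements of the proposal, namely that CLOSE wraps around at the end of the place-list and that oversubscribed places receive threads with consecutive ids.</p>

```python
def close(num_places, num_threads, master_place=0):
    """Return the place index of each thread id under a CLOSE-like policy."""
    placement = []
    for tid in range(num_threads):
        if num_threads > num_places:
            # More threads than places: consecutive ids are grouped on one place.
            offset = tid * num_places // num_threads
        else:
            # Thread i+1 lands on the place right after the place of thread i.
            offset = tid
        # Assumption: wrap around when the end of the place-list is reached.
        placement.append((master_place + offset) % num_places)
    return placement


def master(num_threads, master_place=0):
    """Return the place index of each thread id under a MASTER-like policy."""
    # Every thread of the team is collocated with the master thread.
    return [master_place] * num_threads


if __name__ == "__main__":
    print(close(8, 4))
    print(master(4))
```

<p>With eight places and the master thread on place 0, four threads under CLOSE end up on places 0 to 3, while under MASTER all four stay on place 0. Note that neither function partitions the place-list, in line with the description above.</p>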
<p>When discussing the proprietary support offered by OpenMP implementers, I said that their solutions are static for the whole program lifetime. In our proposal the initial place-list is fixed, but the affinity policy might of course be set dynamically. Furthermore, the figure below shows how nested OpenMP is supported. The outer parallel region uses the SPREAD affinity policy to create partitions and to maximize resource usage. The inner parallel region uses CLOSE to stay within the respective partition.</p>
<div id="attachment_10310" class="wp-caption aligncenter" style="width: 310px"><a href="http://terboven.files.wordpress.com/2012/06/affinity_example_nested_spread_close.png"><img class="size-medium wp-image-10310" title="Affinity Example with Nested OpenMP: SPREAD + CLOSE" src="http://terboven.files.wordpress.com/2012/06/affinity_example_nested_spread_close.png?w=300&#038;h=121" alt="Affinity Example with Nested OpenMP: SPREAD + CLOSE" width="300" height="121" /></a><p class="wp-caption-text">Affinity Example with Nested OpenMP: SPREAD + CLOSE</p></div>
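<p>The combination of an outer SPREAD and an inner CLOSE can be sketched as follows. The thread counts (two outer threads, four inner threads per team) are assumptions made for the sake of the example, and the helper names <em>spread_partitions</em> and <em>nested_binding</em> are hypothetical. The point of the model is merely that each inner team stays within the partition created for its outer thread.</p>

```python
def spread_partitions(places, num_threads):
    """Split the ordered place-list into one contiguous sub-list per outer thread.

    Only meaningful while there are at least as many places as threads.
    """
    n = len(places)
    return [places[t * n // num_threads:(t + 1) * n // num_threads]
            for t in range(num_threads)]


def nested_binding(places, outer_threads, inner_threads):
    """Return {(outer id, inner id): place} for SPREAD outside and CLOSE inside."""
    binding = {}
    for outer_id, partition in enumerate(spread_partitions(places, outer_threads)):
        for inner_id in range(inner_threads):
            if inner_threads > len(partition):
                # Oversubscribed partition: consecutive inner ids share a place.
                index = inner_id * len(partition) // inner_threads
            else:
                index = inner_id
            binding[(outer_id, inner_id)] = partition[index]
    return binding


if __name__ == "__main__":
    places = list(range(8))  # eight places, e.g. the cores of the example node
    for key, place in sorted(nested_binding(places, 2, 4).items()):
        print(key, place)
```

<p>In this model the inner team of outer thread 0 occupies places 0 to 3 and the inner team of outer thread 1 occupies places 4 to 7, so the two teams never compete for the same place.</p>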
<p>Whenever a new feature is intended to go into the OpenMP specification, we require the existence of at least one reference implementation, not only to prove implementability, but also to get an estimate of the implementation effort. The reference implementation for this proposal was done by Alexandre E. Eichenberger in an experimental OpenMP runtime for the IBM BlueGene/Q system. Our proposal does not affect performance-critical parts of the implementation, &#8220;just&#8221; the thread selection and allocation parts. According to Alexandre&#8217;s findings the total overhead was less than 1 %, which is on the order of system noise.</p>
<p>Finally, let me summarize a few important properties / implications that I did not discuss in detail so far:</p>
<ul>
<li>If the place-list is constructed by enumerating the cores, it will be done with the same naming scheme as used by the operating system. This approach is also used by all vendor-proprietary extensions and removes the need to define an explicit naming scheme, which might confuse users if it is different from the operating system and also might become inappropriate for future system topologies that we cannot foresee today.</li>
<li>Every implementation will provide a default place-list to an OpenMP program. It has to document what the default place-list is. I guess that implementations will provide something like cores or hwthreads as a default. This corresponds to the behavior that the number of threads to be used if not specified by the user is also implementation defined (some implementations use just 1 thread, others as many as there are cores in the system).</li>
<li>When one (or more) threads are allocated to a place, they are allowed to migrate within this place if it contains more than one execution unit (i.e. physical core). This will allow for both an explicit thread-to-core binding and a more flexible binding of threads to a socket, for example, depending on how the place-list is constructed as well as which affinity policy is used.</li>
<li>The binding of the initial thread may occur as early as the runtime decides to be appropriate, but not later than when the first parallel region is encountered.</li>
</ul>
<p>Thanks for reading this far. More details can be found in the paper which is published by <a href="http://www.springerlink.com/content/03m04n367322u274/" target="_blank">Springer in IWOMP 2012</a>. Again, I welcome any comments or questions via <a href="mailto:christian@terboven.com">email</a>.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/terboven.wordpress.com/10280/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/terboven.wordpress.com/10280/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10280&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2012/06/21/the-design-of-openmp-thread-affinity/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>

		<media:content url="https://terboven.files.wordpress.com/2012/06/system_topology_example.png?w=300" medium="image">
			<media:title type="html">System Topology Example: 2 sockets, 4 cores, 2 hw-threads</media:title>
		</media:content>

		<media:content url="http://terboven.files.wordpress.com/2012/06/affinity_example_spread.png?w=300" medium="image">
			<media:title type="html">Affinity Example: SPREAD</media:title>
		</media:content>

		<media:content url="http://terboven.files.wordpress.com/2012/06/affinity_example_close_master.png?w=300" medium="image">
			<media:title type="html">Affinity Example: CLOSE and MASTER</media:title>
		</media:content>

		<media:content url="http://terboven.files.wordpress.com/2012/06/affinity_example_nested_spread_close.png?w=300" medium="image">
			<media:title type="html">Affinity Example with Nested OpenMP: SPREAD + CLOSE</media:title>
		</media:content>
	</item>
		<item>
		<title>On the future of HPC on Windows</title>
		<link>http://terboven.com/2011/12/29/on-the-future-of-hpc-on-windows/</link>
		<comments>http://terboven.com/2011/12/29/on-the-future-of-hpc-on-windows/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 20:40:25 +0000</pubDate>
		<dc:creator>terboven</dc:creator>
				<category><![CDATA[Future of HPC]]></category>
		<category><![CDATA[Windows-HPC]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[PLINQ]]></category>
		<category><![CDATA[SC11]]></category>
		<category><![CDATA[Supercomputing]]></category>
		<category><![CDATA[Windows HPC Server]]></category>
		<category><![CDATA[Windows-HPC UG]]></category>

		<guid isPermaLink="false">http://terboven.wordpress.com/?p=10268</guid>
		<description><![CDATA[Just a few weeks ago during SC11 Microsoft released two new or updated HPC products, namely Windows Azure HPC Scheduler and Windows HPC Server 2008 R2 SP3. However, what I saw and heard during the last few months as well &#8230; <a href="http://terboven.com/2011/12/29/on-the-future-of-hpc-on-windows/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10268&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Just a few weeks ago during <a href="http://sc11.supercomputing.org" target="_blank">SC11</a> Microsoft released two new or updated HPC products, namely <em><a href="http://go.microsoft.com/fwlink/?LinkID=230449&amp;clcid=0x409" target="_blank">Windows Azure HPC Scheduler</a></em> and <a href="http://go.microsoft.com/fwlink/?LinkID=231891" target="_blank"><em>Windows HPC Server 2008 R2 SP3</em></a>. However, what I saw and heard during the last few months as well as during SC11 did not give me the best feeling for the future of Microsoft&#8217;s HPC Server product. This post is about my impressions and thoughts not only on the product, but also on doing HPC on the Windows platform in general.</p>
<p>What disturbed me a little was the absence of any roadmap presentation. Well, over the last few years Windows HPC Server clearly has become mature enough to not lack any significant feature necessary for deployment and use on a medium-sized HPC installation. However, Microsoft publicly outlining a product roadmap with several key features always felt right, and its absence at SC11 has been noted by the community. Furthermore, they quietly killed their Dryad project (including LINQ to HPC), which was prominently displayed at SC10, now betting on a yet-to-be-released distribution of <a href="http://hadoop.apache.org/" target="_blank">Apache Hadoop</a> for Windows HPC Server and Azure. Finally, there have been several business restructuring activities inside Microsoft. For example, here in Germany Microsoft apparently shut down the HPC group and moved (some of) the people under the umbrella of Azure. From what I heard, all these activities caused some confusion in the community on how Microsoft sees the future of the Windows HPC Server product and how much support and innovation may be expected from the company in this regard.</p>
<p>What Microsoft now talks a lot about is the Azure integration. If you followed the development of Windows HPC Server up to release R2 SP3, you could clearly see this coming. From a technology point of view, I am impressed. However, I am not convinced yet, for several reasons &#8211; the most important one being that the offer is much too expensive for our application needs. Of course we are following what is going on regarding Clouds and HPC, and in fact in one project we are extending one application to make use of both on-premises and off-premises compute power based on availability (and maybe even price). But for the time being, our local clusters, including the one running Windows, will clearly dominate (or, as we Germans say, set the tone).</p>
<p>Finally, I am missing a clear picture of HPC-related improvements in the Windows Server roadmap. Just recently we added a frontend system with 160 (logical) cores, that is, 8 sockets, and 512 GB of memory. Windows just works on such a machine &#8211; but it could do better. It could serve HPC applications better. And given that next-gen ordinary (HPC) systems will probably have a similar core count, Windows really has to serve applications better on such machines in order to stay competitive. Furthermore, smooth and stable integration of accelerators &#8211; be it GPGPUs, or something different but similar in spirit &#8211; will be at least as important.</p>
<div id="attachment_10275" class="wp-caption aligncenter" style="width: 310px"><a href="http://terboven.files.wordpress.com/2011/12/windows_8_socket_many_cores.jpg"><img class="size-medium wp-image-10275" title="Windows Task-Manager with 160 cores (8 sockets)" src="http://terboven.files.wordpress.com/2011/12/windows_8_socket_many_cores.jpg?w=300&#038;h=277" alt="Windows Task-Manager with 160 cores (8 sockets)" width="300" height="277" /></a><p class="wp-caption-text">Windows Task-Manager with 160 cores (8 sockets)</p></div>
<p>I will stop here. Our user base is clearly showing a demand for Windows HPC Server-based clusters, and in fact the demand is growing. Trying to combine my personal opinion with the feedback and opinions I got from the (German) community, Microsoft has to improve the communication regarding Windows HPC Server. It is time for a clear statement regarding the future of the product and the direction it will take.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/terboven.wordpress.com/10268/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/terboven.wordpress.com/10268/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=terboven.com&#038;blog=5383873&#038;post=10268&#038;subd=terboven&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://terboven.com/2011/12/29/on-the-future-of-hpc-on-windows/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7c6710b2a262527d1c3c69cd91296556?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">terboven</media:title>
		</media:content>

		<media:content url="http://terboven.files.wordpress.com/2011/12/windows_8_socket_many_cores.jpg?w=300" medium="image">
			<media:title type="html">Windows Task-Manager with 160 cores (8 sockets)</media:title>
		</media:content>
	</item>
	</channel>
</rss>
