<h1>Analysing .NET start-up time with Flamegraphs</h1>
<p>Performance is a Feature! (<a href="http://www.mattwarren.org">mattwarren.org</a>), Matt Warren, 2020-03-03, <a href="http://www.mattwarren.org/2020/03/03/Analysing-.NET-Runtime-Startup-with-Flamegraphs">permalink</a></p>
<p>Recently I gave a talk at the <a href="https://nyanconference.splashthat.com/">NYAN Conference</a> called <a href="https://nyanconference.splashthat.com/">‘From ‘dotnet run’ to ‘hello world’’</a>.</p>
<p>In the talk I demonstrate how you can use <a href="https://github.com/microsoft/perfview#perfview-overview">PerfView</a> to analyse <strong>where the .NET Runtime is spending its time during start-up</strong>:</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/xU98KRbWFvU2SC?startSlide=26" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen=""> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/mattwarren/from-dotnet-run-to-hello-world" title="From 'dotnet run' to 'hello world'" target="_blank">From 'dotnet run' to 'hello world'</a> </strong> from <strong><a href="//www.slideshare.net/mattwarren" target="_blank">Matt Warren</a></strong> </div>
<p><strong>This post is a step-by-step guide to that demo.</strong></p>
<hr />
<h2 id="code-sample">Code Sample</h2>
<p>For this exercise I <em>deliberately</em> only look at what the .NET Runtime is doing during program start-up, so I ensure the minimum amount of <em>user code</em> is running, hence the following ‘Hello World’:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System</span><span class="p">;</span>
<span class="k">namespace</span> <span class="nn">HelloWorld</span>
<span class="p">{</span>
<span class="k">class</span> <span class="nc">Program</span>
<span class="p">{</span>
<span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Hello World!"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Press &lt;ENTER&gt; to exit"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">ReadLine</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">Console.ReadLine()</code> call is added because I want to ensure the process doesn’t exit whilst PerfView is still collecting data.</p>
<h2 id="data-collection">Data Collection</h2>
<p>PerfView is a <em>very</em> powerful program, but not the most <em>user-friendly</em> of tools, so I’ve put together a step-by-step guide:</p>
<ol>
<li>Download and run a <a href="https://github.com/microsoft/perfview/releases/latest">recent version of ‘PerfView.exe’</a></li>
<li>Click ‘Run a command’ (or press ‘Alt+R’) to “collect data while the command is running”</li>
<li>Ensure that you’ve entered values for:
<ol>
<li>“<strong>Command</strong>”</li>
<li>“<strong>Current Dir</strong>”</li>
</ol>
</li>
<li>Tick ‘<strong>Cpu Samples</strong>’ if it isn’t already selected</li>
<li>Set ‘<strong>Max Collect Sec</strong>’ to 15 seconds (because our ‘HelloWorld’ app never exits, we need to ensure PerfView stops collecting data at some point)</li>
<li>Ensure that ‘<strong>.NET Symbol Collection</strong>’ is selected</li>
<li>Hit ‘<strong>Run Command</strong>’</li>
</ol>
<p><a href="/images/2020/03/PerfView - Collection Options - annotated.png"><img src="/images/2020/03/PerfView - Collection Options - annotated.png" alt="Collection Options" /></a></p>
<p>If you then inspect the log you can see that it’s collecting data, obtaining symbols and then finally writing everything out to a .zip file. Once the process is complete you should see the newly created file in the left-hand pane of the main UI; in this case it’s called ‘PerfViewData.etl.zip’.</p>
<h2 id="data-processing">Data Processing</h2>
<p>Once you have your ‘.etl.zip’ file, double-click on it and you will see a tree-view with all the available data. Now, select ‘CPU Stacks’ and you’ll be presented with a view like this:</p>
<p><a href="/images/2020/03/PerfView - Unresolved Symbols.png"><img src="/images/2020/03/PerfView - Unresolved Symbols.png" alt="Unresolved Symbols" /></a></p>
<p>Notice there are a lot of ‘?’ characters in the list. This means that PerfView is not able to work out the method names, as it hasn’t resolved the necessary symbols for the Runtime dlls. Let’s fix that:</p>
<ol>
<li>Open ‘<strong>CPU Stacks</strong>’</li>
<li>In the list, select the ‘<strong>HelloWorld</strong>’ process (PerfView collects data <em>machine-wide</em>)</li>
<li>In the ‘<strong>GroupPats</strong>’ drop-down, select ‘[no grouping]’</li>
<li><em>Optional</em>, change the ‘<strong>Symbol Path</strong>’ from the default to something else</li>
<li>In the ‘<strong>By name</strong>’ tab, hit ‘Ctrl+A’ to select all the rows</li>
<li>Right-click and select ‘<strong>Lookup Symbols</strong>’ (or just hit ‘Alt+S’)</li>
</ol>
<p>Now the ‘CPU Stacks’ view should look something like this:</p>
<p><a href="/images/2020/03/PerfView - Resolved Symbols.png"><img src="/images/2020/03/PerfView - Resolved Symbols.png" alt="Resolved Symbols" /></a></p>
<p>Finally, we can get the data we want:</p>
<ol>
<li>Select the ‘<strong>Flame Graph</strong>’ tab</li>
<li>Change ‘<strong>GroupPats</strong>’ to one of the following for a better flame graph:
<ol>
<li>[group module entries] {%}!=>module $1</li>
<li>[group class entries] {%!*}.%(=>class $1;{%!*}::=>class $1</li>
</ol>
</li>
<li>Change ‘<strong>Fold%</strong>’ to a higher number, maybe 3%, to get rid of any <em>thin</em> bars (any higher and you start to lose information)</li>
</ol>
<p><a href="/images/2020/03/PerfView - Flamegraph.png"><img src="/images/2020/03/PerfView - Flamegraph.png" alt="Flamegraph" /></a></p>
<p>Now, at this point I actually recommend exporting the PerfView data into a format that can be loaded into <a href="https://speedscope.app/">https://speedscope.app/</a> as it gives you a <em>much</em> better experience. To do this click <strong>File</strong> -> <strong>Save View As</strong> and then in the ‘Save as type’ box select <strong>Speed Scope Format</strong>. Once that’s done you can ‘browse’ that file at <a href="https://www.speedscope.app/">speedscope.app</a>, or if you want you can just take a look at one <a href="https://www.speedscope.app/#profileURL=https%3A%2F%2Fmattwarren.org%2Fdata%2F2020%2F03%2Fflamegraph.speedscope.json">I’ve already created</a>.</p>
<p><strong>Note:</strong> If you’ve never encountered ‘<strong>flamegraphs</strong>’ before, I really recommend reading this excellent explanation by <a href="https://twitter.com/b0rk">Julia Evans</a>:</p>
<blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">perf &amp; flamegraphs <a href="https://t.co/duzWs2hoLT">pic.twitter.com/duzWs2hoLT</a></p>— 🔎Julia Evans🔍 (@b0rk) <a href="https://twitter.com/b0rk/status/945680809712857090?ref_src=twsrc%5Etfw">December 26, 2017</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<hr />
<h2 id="anaylsis-of-net-runtime-startup">Analysis of .NET Runtime Startup</h2>
<p>Finally, we can answer our original question:</p>
<blockquote>
<p>Where does the .NET Runtime spend time during start-up?</p>
</blockquote>
<p>Here’s the data <a href="https://www.speedscope.app/#profileURL=https%3A%2F%2Fmattwarren.org%2Fdata%2F2020%2F03%2Fflamegraph.speedscope.json">from the flamegraph</a> summarised as text, with links to the corresponding functions in the ‘.NET Core Runtime’ source code:</p>
<ol>
<li>Entire Application - <strong>100%</strong> - 233.28ms</li>
<li>Everything except <code class="language-plaintext highlighter-rouge">helloworld!wmain</code> - <strong>21%</strong></li>
<li><code class="language-plaintext highlighter-rouge">helloworld!wmain</code> - <strong>79%</strong> - 184.57ms
<ol>
<li><code class="language-plaintext highlighter-rouge">hostpolicy!create_hostpolicy_context</code> - <strong>30%</strong> - 70.92ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/installer/corehost/cli/hostpolicy/hostpolicy.cpp#L98-L139">here</a></li>
<li><code class="language-plaintext highlighter-rouge">hostpolicy!create_coreclr</code> - <strong>22%</strong> - 50.51ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/installer/corehost/cli/hostpolicy/hostpolicy.cpp#L47-L96">here</a>
<ol>
<li><code class="language-plaintext highlighter-rouge">coreclr!CorHost2::Start</code> - <strong>9%</strong> - 20.98ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/coreclr/src/vm/corhost.cpp#L93-L173">here</a></li>
<li><code class="language-plaintext highlighter-rouge">coreclr!CorHost2::CreateAppDomain</code> - <strong>10%</strong> - 23.52ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/coreclr/src/vm/corhost.cpp#L632-L795">here</a></li>
</ol>
</li>
<li><code class="language-plaintext highlighter-rouge">hostpolicy!runapp</code> - <strong>20%</strong> - 46.20ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/installer/corehost/cli/hostpolicy/hostpolicy.cpp#L269-L276">here</a>, ends up calling into <code class="language-plaintext highlighter-rouge">Assembly::ExecuteMainMethod</code> <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/coreclr/src/vm/assembly.cpp#L1619-L1693">here</a>
<ol>
<li><code class="language-plaintext highlighter-rouge">coreclr!RunMain</code> - <strong>9.9%</strong> - 23.12ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/coreclr/src/vm/assembly.cpp#L1504-L1566">here</a></li>
<li><code class="language-plaintext highlighter-rouge">coreclr!RunStartupHooks</code> - <strong>8.1%</strong> - 19.00ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/coreclr/src/vm/assembly.cpp#L1604-L1617">here</a></li>
</ol>
</li>
<li><code class="language-plaintext highlighter-rouge">hostfxr!resolve_frameworks_for_app</code> - <strong>3.4%</strong> - 7.89ms <a href="https://github.com/dotnet/runtime/blob/9e93d094/src/installer/corehost/cli/fxr/fx_resolver.cpp#L504-L529">here</a></li>
</ol>
</li>
</ol>
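<p>Each percentage in the list is just that frame’s time divided by the 233.28ms total. As a quick sanity check, here’s a minimal sketch that recomputes a few of them. The <code class="language-plaintext highlighter-rouge">frames</code> array is simply the figures from the list typed in by hand and the class name is illustrative, none of this is a PerfView or speedscope API:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

class StartupBreakdown
{
    static void Main()
    {
        // 'Entire Application' time from the list above
        const double totalMs = 233.28;

        // Times copied by hand from the list above (not read from a trace file)
        (string Name, double Ms)[] frames =
        {
            ("helloworld!wmain", 184.57),
            ("hostpolicy!create_hostpolicy_context", 70.92),
            ("hostpolicy!create_coreclr", 50.51),
            ("hostpolicy!runapp", 46.20),
            ("hostfxr!resolve_frameworks_for_app", 7.89),
        };

        foreach (var (name, ms) in frames)
        {
            // Share of the total time, rounded to one decimal place
            double percent = Math.Round(ms / totalMs * 100, 1);
            Console.WriteLine($"{name}: {percent}%");
        }
    }
}
</code></pre></div></div>
<p>The results agree with the rounded figures in the list: roughly 79%, 30%, 22%, 20% and 3.4% respectively.</p>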
<p>So, the main places that the runtime spends time are:</p>
<ol>
<li><strong>30%</strong> of total time is spent <strong>Launching the runtime</strong>, controlled via the ‘host policy’, which mostly takes place in <code class="language-plaintext highlighter-rouge">hostpolicy!create_hostpolicy_context</code></li>
<li><strong>22%</strong> of time is spent on <strong>Initialisation of the runtime</strong> itself and the initial (and only) AppDomain it creates. This can be seen in <code class="language-plaintext highlighter-rouge">CorHost2::Start</code> (<em>native</em>) and <code class="language-plaintext highlighter-rouge">CorHost2::CreateAppDomain</code> (<em>managed</em>). For more info on this see <a href="/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/">The 68 things the CLR does before executing a single line of your code</a></li>
<li><strong>20%</strong> was used <strong>JITting and executing</strong> the <code class="language-plaintext highlighter-rouge">Main</code> method in our ‘Hello World’ code sample, which starts in <code class="language-plaintext highlighter-rouge">Assembly::ExecuteMainMethod</code> above.</li>
</ol>
<p>To confirm the last point, we can return to PerfView and take a look at the ‘JIT Stats Summary’ it produces. From the main menu, under ‘Advanced Group’ -> ‘JIT Stats’ we see that 23.1 ms or 9.1% of the total CPU time was spent JITting:</p>
<p><a href="/images/2020/03/PerfView - JIT Stats for HelloWorld.png"><img src="/images/2020/03/PerfView - JIT Stats for HelloWorld.png" alt="JIT Stats for HelloWorld" /></a></p>
<h1>Under the hood of "Default Interface Methods"</h1>
<p>2020-02-19, <a href="http://www.mattwarren.org/2020/02/19/Under-the-hood-of-Default-Interface-Methods">permalink</a></p>
<h2 id="background">Background</h2>
<p>‘Default Interface Methods’ (DIM), sometimes referred to as ‘Default Implementations in Interfaces’, appeared in C# 8. In case you’ve never heard of the feature, here are some links to get you started:</p>
<ul>
<li><a href="https://devblogs.microsoft.com/dotnet/default-implementations-in-interfaces/">Default implementations in interfaces</a> (official <em>announcement</em>)</li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-8.0/default-interface-methods">Default Interface Methods</a> (C# Language Proposal), here are some notable sections:
<ul>
<li><a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-8.0/default-interface-methods#diamond-inheritance-and-classes-closed">Diamond inheritance and classes</a></li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-8.0/default-interface-methods#interface-methods-vs-structs-closed">Interface methods vs structs</a></li>
<li><a href="https://github.com/dotnet/csharplang/blob/master/meetings/2017/LDM-2017-04-19.md#structs-and-default-implementations">Structs and default implementations</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/csharplang/issues/52">Champion “default interface methods”</a> (including links for ‘Language Design Meeting’ notes)</li>
<li><a href="https://docs.microsoft.com/en-gb/dotnet/csharp/tutorials/default-interface-methods-versions">Tutorial: Update interfaces with default interface methods in C# 8.0</a></li>
</ul>
<p>Also, there are quite a few other blog posts discussing this feature, but as you can see opinion is split on whether it’s useful or not:</p>
<ul>
<li><a href="https://www.infoq.com/articles/default-interface-methods-cs8/">Default Interface Methods in C# 8</a></li>
<li><a href="https://www.codejourney.net/2019/02/csharp-8-default-interface-methods/">C# 8: Default Interface Methods Implementation</a></li>
<li><a href="https://daveaglick.com/posts/default-interface-members-what-are-they-good-for">Default Interface Members, What Are They Good For?</a></li>
<li><a href="https://gunnarpeipman.com/csharp-interface-default-implementations/">C# 8: Default implementations in interfaces</a></li>
<li><a href="https://www.talkingdotnet.com/default-implementations-in-interfaces-in-c-sharp-8/">Interfaces in C# 8.0 gets a makeover</a></li>
<li><a href="https://stu.dev/csharp8-doing-unsupported-things/#default-interface-members">C# 8.0 and .NET Standard 2.0 - Doing Unsupported Things</a></li>
<li><a href="https://jeremybytes.blogspot.com/2019/09/interfaces-in-c-8-are-bit-of-mess.html">Interfaces in C# 8 are a Bit of a Mess</a></li>
<li><a href="https://www.reddit.com/r/dotnet/comments/asq3jl/the_most_controversial_c_80_feature_default/">The most controversial C# 8.0 feature: Default Interface Methods Implementation (Reddit discussion)</a></li>
</ul>
<hr />
<p>But this post isn’t about what they are, how you can use them or if they’re useful or not. Instead we will be exploring how ‘Default Interface Methods’ work <em>under-the-hood</em>, looking at what the .NET Core Runtime has to do to make them work and how the feature was developed.</p>
<hr />
<p><strong>Table of Contents</strong></p>
<ul>
<li><a href="#background">Background</a></li>
<li><a href="#development-timeline-and-prs">Development Timeline and PRs</a>
<ul>
<li><a href="#initial-work-prototype-and-timeline">Initial work, Prototype and Timeline</a></li>
<li><a href="#interesting-prs-done-after-the-prototype-newest---oldest">Interesting PRs done after the prototype (newest -> oldest)</a></li>
<li><a href="#bug-fixes-done-since-the-prototype-newest---oldest">Bug fixes done since the Prototype (newest -> oldest)</a></li>
<li><a href="#possible-future-work"><em>Possible</em> future work</a></li>
</ul>
</li>
<li><a href="#default-interface-methods-in-action">Default Interface Methods ‘in action’</a></li>
<li><a href="#enabling-methods-on-an-interface">Enabling Methods on an Interface</a></li>
<li><a href="#resolving-the-method-dispatch">Resolving the Method Dispatch</a></li>
<li><a href="#analysis-of-finddefaultinterfaceimplementation">Analysis of <code class="language-plaintext highlighter-rouge">FindDefaultInterfaceImplementation(..)</code></a></li>
<li><a href="#diamond-inheritance-problem">Diamond Inheritance Problem</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
<hr />
<h2 id="development-timeline-and-prs">Development Timeline and PRs</h2>
<p>First of all, there are a few places you can go to get a ‘high-level’ understanding of what was done:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/projects/6">GitHub Project for Default Interface Methods</a></li>
<li>List of <a href="https://github.com/dotnet/coreclr/pulls?q=is%3Aclosed+is%3Apr+project%3Adotnet%2Fcoreclr%2F6+sort%3Acreated-asc">all the PRs done during the Project</a></li>
<li>To see which parts of the runtime are affected, you can <a href="https://github.com/dotnet/runtime/search?q=FEATURE_DEFAULT_INTERFACES">search for ‘FEATURE_DEFAULT_INTERFACES’</a> in the .NET (Core) Runtime source code as the entire feature is behind a #define.</li>
<li>In addition, you can see the corresponding work being done in Mono, <a href="https://github.com/mono/mono/issues/6961">Epic: Default Interface Implementation #6961</a> and <a href="https://github.com/mono/mono/issues/11267">Update default interfaces support #11267</a></li>
</ul>
<h3 id="initial-work-prototype-and-timeline">Initial work, Prototype and Timeline</h3>
<ul>
<li>The entire prototype is split across several PRs, running from <strong>March - July 2017</strong>:
<ul>
<li><a href="https://github.com/dotnet/coreclr/pull/10505">Default Interface Method Prototype #10505</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/10818">More update for default interface methods #10818</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/11693">More update in /dev/defaultintf #11693</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/11940">Add RuntimeFeature detection for default interface method #11940</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/12753">Finalize override lookup algorithm #12753</a></li>
</ul>
</li>
<li>All the initial work was merged into master in <strong>December 2017</strong> in <a href="https://github.com/dotnet/coreclr/pull/15370">Merge dev/defaultintf to master #15370</a></li>
<li>The entire feature was turned on by default in <strong>March 2019</strong> in <a href="https://github.com/dotnet/coreclr/pull/23225">Enable FeatureDefaultInterfaces unconditionally #23225</a></li>
<li>It was then <a href="https://devblogs.microsoft.com/dotnet/default-implementations-in-interfaces/">announced/released</a> in <strong>May 2019</strong>.</li>
</ul>
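<p>The ‘RuntimeFeature detection’ PR in the list above points at how code can ask the runtime whether DIMs are available. Here’s a minimal sketch, assuming you’re targeting a framework that exposes the <code class="language-plaintext highlighter-rouge">RuntimeFeature.DefaultImplementationsOfInterfaces</code> constant (.NET Core 3.0 or later); the class name is purely illustrative:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.CompilerServices;

class DimSupportCheck
{
    static void Main()
    {
        // DefaultImplementationsOfInterfaces is a string constant naming the feature,
        // IsSupported reports whether the runtime we're executing on implements it
        bool supported = RuntimeFeature.IsSupported(RuntimeFeature.DefaultImplementationsOfInterfaces);
        Console.WriteLine($"Default interface methods supported: {supported}");
    }
}
</code></pre></div></div>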
<h3 id="interesting-prs-done-after-the-prototype-newest---oldest">Interesting PRs done after the prototype (newest -> oldest)</h3>
<p>Once the prototype was merged in, there was additional <em>feature</em> work done to ensure that DIMs worked across different scenarios:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/pull/25770">Use native code slot for default interface methods #25770</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/23313">Allow reabstraction of default interface methods #23313</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/22295">Throw the right exception when interface dispatch is ambiguous #22295</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/21355">Implement two pass algorithm for variant interface dispatch #21355</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/16257">Make it possible to Reflection.Emit default interface methods #16257</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/16034">Fix reflection to work with default interface methods #16034</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/15925">Stop treating all calls to instance interface methods as callvirt #15925</a></li>
<li><a href="https://github.com/dotnet/runtime/issues/9601">[Default Interfaces] Edit and Continue #9601</a></li>
</ul>
<h3 id="bug-fixes-done-since-the-prototype-newest---oldest">Bug fixes done since the Prototype (newest -> oldest)</h3>
<p>In addition, there were various bug fixes done to ensure that existing parts of the CLR played nicely with DIMs:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/pull/23970">Block usage of default interfaces feature in COM scenarios #23970</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/23032">Remove legacy behavior around non-virtual interface calls #23032</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/22464">Fix constrained call corner cases #22464</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/22427">Fix delegate creation for default interface methods on structs #22427</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/21525">Fix stack walking and reporting of default interface methods #21525</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/20458">Allow supressing exceptions in diamond inheritance cases #20458</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/20404">Handle generics in methodimpls for default interface methods #20404</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/15979">Do not devirtualize shared default interface methods #15979</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/15978">Catch ambiguous interface method resolution exceptions #15978</a></li>
</ul>
<h3 id="possible-future-work"><em>Possible</em> future work</h3>
<p>Finally, there’s no guarantee if or when this will be done, but here are the remaining issues associated with the project:</p>
<ul>
<li><a href="https://github.com/dotnet/runtime/issues/9588">Support for default interface method devirtualization #9588</a></li>
<li><a href="https://github.com/dotnet/runtime/issues/9556">Debugger support #9556</a></li>
<li><a href="https://github.com/dotnet/runtime/issues/9552">Interfaces implemented by arrays #9552</a></li>
<li><a href="https://github.com/dotnet/runtime/issues/9490">Support constrained interface calls on value types #9490</a></li>
<li><a href="https://github.com/dotnet/runtime/issues/9479">Add support for default interfaces in type generator #9479</a></li>
</ul>
<hr />
<h2 id="default-interface-methods-in-action">Default Interface Methods ‘in action’</h2>
<p>Now that we’ve seen what was done, let’s look at what that all means, starting with this code that simply demonstrates ‘Default Interface Methods’ in action:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">interface</span> <span class="nc">INormal</span> <span class="p">{</span>
<span class="k">void</span> <span class="nf">Normal</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">interface</span> <span class="nc">IDefaultMethod</span> <span class="p">{</span>
<span class="k">void</span> <span class="nf">Default</span><span class="p">()</span> <span class="p">=></span> <span class="nf">WriteLine</span><span class="p">(</span><span class="s">"IDefaultMethod.Default"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">CNormal</span> <span class="p">:</span> <span class="n">INormal</span> <span class="p">{</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">Normal</span><span class="p">()</span> <span class="p">=></span> <span class="nf">WriteLine</span><span class="p">(</span><span class="s">"CNormal.Normal"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">CDefault</span> <span class="p">:</span> <span class="n">IDefaultMethod</span> <span class="p">{</span>
<span class="c1">// Nothing to do here!</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">CDefaultOwnImpl</span> <span class="p">:</span> <span class="n">IDefaultMethod</span> <span class="p">{</span>
<span class="k">void</span> <span class="n">IDefaultMethod</span><span class="p">.</span><span class="nf">Default</span><span class="p">()</span> <span class="p">=></span> <span class="nf">WriteLine</span><span class="p">(</span><span class="s">"CDefaultOwnImpl.IDefaultMethod.Default"</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Test out the Normal/DefaultMethod Interfaces</span>
<span class="n">INormal</span> <span class="n">iNormal</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">CNormal</span><span class="p">();</span>
<span class="n">iNormal</span><span class="p">.</span><span class="nf">Normal</span><span class="p">();</span> <span class="c1">// prints "CNormal.Normal"</span>
<span class="n">IDefaultMethod</span> <span class="n">iDefault</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">CDefault</span><span class="p">();</span>
<span class="n">iDefault</span><span class="p">.</span><span class="nf">Default</span><span class="p">();</span> <span class="c1">// prints "IDefaultMethod.Default"</span>
<span class="n">IDefaultMethod</span> <span class="n">iDefaultOwnImpl</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">CDefaultOwnImpl</span><span class="p">();</span>
<span class="n">iDefaultOwnImpl</span><span class="p">.</span><span class="nf">Default</span><span class="p">();</span> <span class="c1">// prints "CDefaultOwnImpl.IDefaultMethod.Default"</span>
</code></pre></div></div>
<p>The first way we can understand how they are implemented is by using <a href="https://docs.microsoft.com/en-us/dotnet/api/system.type.getinterfacemap?view=netframework-4.8#examples"><code class="language-plaintext highlighter-rouge">Type.GetInterfaceMap(Type)</code></a> (which actually <a href="https://github.com/dotnet/coreclr/issues/15645">had to be fixed to work with DIMs</a>), this can be done with code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">ShowInterfaceMapping</span><span class="p">(</span><span class="n">Type</span> <span class="n">@implementation</span><span class="p">,</span> <span class="n">Type</span> <span class="n">@interface</span><span class="p">)</span> <span class="p">{</span>
<span class="n">InterfaceMapping</span> <span class="n">map</span> <span class="p">=</span> <span class="n">@implementation</span><span class="p">.</span><span class="nf">GetInterfaceMap</span><span class="p">(</span><span class="n">@interface</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">$"</span><span class="p">{</span><span class="n">map</span><span class="p">.</span><span class="n">TargetType</span><span class="p">}</span><span class="s">: GetInterfaceMap(</span><span class="p">{</span><span class="n">map</span><span class="p">.</span><span class="n">InterfaceType</span><span class="p">}</span><span class="s">)"</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">counter</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">counter</span> <span class="p">&lt;</span> <span class="n">map</span><span class="p">.</span><span class="n">InterfaceMethods</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">counter</span><span class="p">++)</span> <span class="p">{</span>
<span class="n">MethodInfo</span> <span class="n">im</span> <span class="p">=</span> <span class="n">map</span><span class="p">.</span><span class="n">InterfaceMethods</span><span class="p">[</span><span class="n">counter</span><span class="p">];</span>
<span class="n">MethodInfo</span> <span class="n">tm</span> <span class="p">=</span> <span class="n">map</span><span class="p">.</span><span class="n">TargetMethods</span><span class="p">[</span><span class="n">counter</span><span class="p">];</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">$" </span><span class="p">{</span><span class="n">im</span><span class="p">.</span><span class="n">DeclaringType</span><span class="p">}</span><span class="s">::</span><span class="p">{</span><span class="n">im</span><span class="p">.</span><span class="n">Name</span><span class="p">}</span><span class="s"> --> </span><span class="p">{</span><span class="n">tm</span><span class="p">.</span><span class="n">DeclaringType</span><span class="p">}</span><span class="s">::</span><span class="p">{</span><span class="n">tm</span><span class="p">.</span><span class="n">Name</span><span class="p">}</span><span class="s"> (</span><span class="p">{(</span><span class="n">im</span> <span class="p">==</span> <span class="n">tm</span> <span class="p">?</span> <span class="s">"same"</span> <span class="p">:</span> <span class="s">"different"</span><span class="p">)}</span><span class="s">)"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">" MethodHandle 0x{0:X} --> MethodHandle 0x{1:X}"</span><span class="p">,</span>
<span class="n">im</span><span class="p">.</span><span class="n">MethodHandle</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="nf">ToInt64</span><span class="p">(),</span> <span class="n">tm</span><span class="p">.</span><span class="n">MethodHandle</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="nf">ToInt64</span><span class="p">());</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">" FunctionPtr 0x{0:X} --> FunctionPtr 0x{1:X}"</span><span class="p">,</span>
<span class="n">im</span><span class="p">.</span><span class="n">MethodHandle</span><span class="p">.</span><span class="nf">GetFunctionPointer</span><span class="p">().</span><span class="nf">ToInt64</span><span class="p">(),</span> <span class="n">tm</span><span class="p">.</span><span class="n">MethodHandle</span><span class="p">.</span><span class="nf">GetFunctionPointer</span><span class="p">().</span><span class="nf">ToInt64</span><span class="p">());</span>
<span class="p">}</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Which gives the following output:</p>
<pre><code class="language-blank">//ShowInterfaceMapping(typeof(CNormal), @interface: typeof(INormal));
//ShowInterfaceMapping(typeof(CDefault), @interface: typeof(IDefaultMethod));
//ShowInterfaceMapping(typeof(CDefaultOwnImpl), @interface: typeof(IDefaultMethod));
TestApp.CNormal: GetInterfaceMap(TestApp.INormal)
TestApp.INormal::Normal --> TestApp.CNormal::Normal (different)
MethodHandle 0x7FF993916A80 --> MethodHandle 0x7FF993916B10
FunctionPtr 0x7FF99385FC50 --> FunctionPtr 0x7FF993861880
TestApp.CDefault: GetInterfaceMap(TestApp.IDefaultMethod)
TestApp.IDefaultMethod::Default --> TestApp.IDefaultMethod::Default (same)
MethodHandle 0x7FF993916BD8 --> MethodHandle 0x7FF993916BD8
FunctionPtr 0x7FF99385FC78 --> FunctionPtr 0x7FF99385FC78
TestApp.CDefaultOwnImpl: GetInterfaceMap(TestApp.IDefaultMethod)
TestApp.IDefaultMethod::Default --> TestApp.CDefaultOwnImpl::TestApp.IDefaultMethod.Default (different)
MethodHandle 0x7FF993916BD8 --> MethodHandle 0x7FF993916D10
FunctionPtr 0x7FF99385FC78 --> FunctionPtr 0x7FF9938663A0
</code></pre>
<p>So here we can see that in the case of the <code class="language-plaintext highlighter-rouge">IDefaultMethod</code> interface on the <code class="language-plaintext highlighter-rouge">CDefault</code> class, the interface method and the method implementation are the <em>same</em>. In the other scenarios the interface method maps to a <em>different</em> method implementation.</p>
<p>But let’s look a bit lower, making use of WinDBG and the <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/sos-dll-sos-debugging-extension">SOS extension</a> to get a peek into the internal ‘data structures’ that the runtime uses.</p>
<p>First, let’s take a look at the <code class="language-plaintext highlighter-rouge">MethodTable</code> (<code class="language-plaintext highlighter-rouge">dumpmt</code>) for the <code class="language-plaintext highlighter-rouge">INormal</code> interface:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> dumpmt -md 00007ff8bcc31dd8
EEClass: 00007FF8BCC2C420
Module: 00007FF8BCC0F788
Name: TestApp.INormal
mdToken: 0000000002000002
File: C:\DefaultInterfaceMethods\TestApp\bin\Debug\netcoreapp3.0\TestApp.dll
BaseSize: 0x0
ComponentSize: 0x0
Slots in VTable: 1
Number of IFaces in IFaceMap: 0
--------------------------------------
MethodDesc Table
Entry MethodDesc JIT Name
00007FF8BCB70580 00007FF8BCC31DC8 NONE TestApp.INormal.Normal()
</code></pre></div></div>
<p>So we can see that the interface has an entry for the <code class="language-plaintext highlighter-rouge">Normal()</code> method, as expected, but let’s look in more detail at the <code class="language-plaintext highlighter-rouge">MethodDesc</code> (<code class="language-plaintext highlighter-rouge">dumpmd</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> dumpmd 00007FF8BCC31DC8
Method Name: TestApp.INormal.Normal()
Class: 00007ff8bcc2c420
MethodTable: 00007ff8bcc31dd8
mdToken: 0000000006000001
Module: 00007ff8bcc0f788
IsJitted: no
Current CodeAddr: ffffffffffffffff
Version History:
ILCodeVersion: 0000000000000000
ReJIT ID: 0
IL Addr: 0000000000000000
CodeAddr: 0000000000000000 (MinOptJitted)
NativeCodeVersion: 0000000000000000
</code></pre></div></div>
<p>So whilst the method exists in the interface definition, it’s clear that the method has not been jitted (<code class="language-plaintext highlighter-rouge">IsJitted: no</code>) and in fact it never will be, as it can never be executed.</p>
<p>Now let’s compare that output with the one for the <code class="language-plaintext highlighter-rouge">IDefaultMethod</code> interface, again looking at the <code class="language-plaintext highlighter-rouge">MethodTable</code> (<code class="language-plaintext highlighter-rouge">dumpmt</code>) and the <code class="language-plaintext highlighter-rouge">MethodDesc</code> (<code class="language-plaintext highlighter-rouge">dumpmd</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> dumpmt -md 00007ff8bcc31e68
EEClass: 00007FF8BCC2C498
Module: 00007FF8BCC0F788
Name: TestApp.IDefaultMethod
mdToken: 0000000002000003
File: C:\DefaultInterfaceMethods\TestApp\bin\Debug\netcoreapp3.0\TestApp.dll
BaseSize: 0x0
ComponentSize: 0x0
Slots in VTable: 1
Number of IFaces in IFaceMap: 0
--------------------------------------
MethodDesc Table
Entry MethodDesc JIT Name
00007FF8BCB70590 00007FF8BCC31E58 JIT TestApp.IDefaultMethod.Default()
> dumpmd 00007FF8BCC31E58
Method Name: TestApp.IDefaultMethod.Default()
Class: 00007ff8bcc2c498
MethodTable: 00007ff8bcc31e68
mdToken: 0000000006000002
Module: 00007ff8bcc0f788
IsJitted: yes
Current CodeAddr: 00007ff8bcb765c0
Version History:
ILCodeVersion: 0000000000000000
ReJIT ID: 0
IL Addr: 0000000000000000
CodeAddr: 00007ff8bcb765c0 (MinOptJitted)
NativeCodeVersion: 0000000000000000
</code></pre></div></div>
<p>Here we see something very different: the <code class="language-plaintext highlighter-rouge">MethodDesc</code> entry in the <code class="language-plaintext highlighter-rouge">MethodTable</code> actually has jitted, executable code associated with it.</p>
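<p>The same distinction is visible from managed code, without a debugger. The snippet below is a minimal sketch, assuming C# 8 on .NET Core 3.0 or later. The interface bodies and the <code class="language-plaintext highlighter-rouge">ReflectionPeek</code> helper are made up for illustration and are not the <code class="language-plaintext highlighter-rouge">TestApp</code> sources used above. It asks reflection whether an interface method carries any IL of its own:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;

// Illustrative stand-ins for the interfaces in the post (bodies are assumptions)
interface INormal
{
    void Normal();                                      // abstract, no body
}

interface IDefaultMethod
{
    void Default() { Console.WriteLine("default"); }    // has a body
}

static class ReflectionPeek
{
    // True when the named public interface method has an IL body,
    // i.e. there is something for the JIT to compile.
    public static bool HasBody(Type iface, string name)
    {
        MethodInfo method = iface.GetMethod(name);
        return method.GetMethodBody() != null;
    }
}
</code></pre></div></div>
<p>With these definitions, <code class="language-plaintext highlighter-rouge">ReflectionPeek.HasBody(typeof(INormal), "Normal")</code> is <code class="language-plaintext highlighter-rouge">false</code> and <code class="language-plaintext highlighter-rouge">ReflectionPeek.HasBody(typeof(IDefaultMethod), "Default")</code> is <code class="language-plaintext highlighter-rouge">true</code>, which lines up with the <code class="language-plaintext highlighter-rouge">IsJitted: no</code> and <code class="language-plaintext highlighter-rouge">IsJitted: yes</code> results once the default method has been called.</p>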
<hr />
<h2 id="enabling-methods-on-an-interface">Enabling Methods on an Interface</h2>
<p>So we’ve seen that ‘default interface methods’ are wired up by the runtime, but how does that happen?</p>
<p>Firstly, it’s very illuminating to look at the initial prototype of the feature in <a href="https://github.com/dotnet/coreclr/pull/10505/">CoreCLR PR #10505</a>, because we can understand at the lowest level what the feature is actually enabling, from <a href="https://github.com/dotnet/coreclr/pull/10505/files#diff-711c484c34d9ba3361552c3f2e1a4246">/src/vm/classcompat.cpp</a>:</p>
<p><a href="/images/2020/02/Default Interface Methods - Relaxing class constraints.png"><img src="/images/2020/02/Default Interface Methods - Relaxing class constraints.png" alt="Default Interface Methods - Relaxing class constraints" /></a></p>
<p>Here we see why DIM didn’t require any changes to the .NET <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">‘Intermediate Language’ (IL)</a> op-codes: instead, <strong>they are enabled by relaxing a previous restriction</strong>. Before this change, you weren’t able to add ‘<em>virtual, non-abstract</em>’ or ‘<em>non-virtual</em>’ methods to an interface:</p>
<ul>
<li>“Virtual Non-Abstract Interface Method.” (<code class="language-plaintext highlighter-rouge">BFA_VIRTUAL_NONAB_INT_METHOD</code>)</li>
<li>“Nonvirtual Instance Interface Method.” (<code class="language-plaintext highlighter-rouge">BFA_NONVIRT_INST_INT_METHOD</code>)</li>
</ul>
<p>This ties in with the <em>proposed</em> changes to the <a href="https://www.ecma-international.org/publications/standards/Ecma-335.htm">ECMA-335 specification</a>, from the <a href="https://github.com/dotnet/coreclr/blob/release/3.1/Documentation/design-docs/default-interface-methods.md">‘Default interface methods’ design doc</a>:</p>
<blockquote>
<p>The major changes are:</p>
<ul>
<li><strong>Interfaces are now allowed to have instance methods (both virtual and non-virtual). Previously we only allowed abstract virtual methods.</strong>
<ul>
<li>Interfaces obviously still can’t have instance fields.</li>
</ul>
</li>
<li>Interface methods are allowed to MethodImpl other interface methods the interface requires (but we require the MethodImpls to be final to keep things simple) - i.e. an interface is allowed to provide (or override) an implementation of another interface’s method</li>
</ul>
</blockquote>
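<p>To make the two newly-allowed kinds of method concrete, here is a minimal sketch in C# 8 (assuming .NET Core 3.0 or later). <code class="language-plaintext highlighter-rouge">IGreeter</code>, <code class="language-plaintext highlighter-rouge">Greeter</code> and <code class="language-plaintext highlighter-rouge">GreeterDemo</code> are hypothetical names, not taken from the runtime or its tests:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

interface IGreeter
{
    // Abstract virtual method: the only kind an interface could hold before the change.
    string Name();

    // Virtual, non-abstract method: a 'default interface method' with a body.
    string Greet() => "Hello, " + Decorate();

    // Private methods cannot be virtual, so this is a non-virtual instance method.
    private string Decorate() => Name() + "!";
}

class Greeter : IGreeter
{
    // The class only supplies the abstract member and inherits the rest.
    public string Name() => "World";
}

static class GreeterDemo
{
    // Default methods are reached through the interface type, not the class.
    public static string Run() => ((IGreeter)new Greeter()).Greet();
}
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">GreeterDemo.Run()</code> returns <code class="language-plaintext highlighter-rouge">"Hello, World!"</code>, even though <code class="language-plaintext highlighter-rouge">Greeter</code> never declares <code class="language-plaintext highlighter-rouge">Greet()</code> itself.</p>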
<p>However, just allowing ‘<em>virtual, non-abstract</em>’ or ‘<em>non-virtual</em>’ methods to exist on an interface is only the start. The runtime then needs to allow code to call those methods, and that is far harder!</p>
<hr />
<h2 id="resolving-the-method-dispatch">Resolving the Method Dispatch</h2>
<p>In .NET, since version 2.0, all interface method calls have taken place via a mechanism known as <a href="https://github.com/dotnet/runtime/blob/master/docs/design/coreclr/botr/virtual-stub-dispatch.md">Virtual Stub Dispatch</a>:</p>
<blockquote>
<p>Virtual stub dispatching (VSD) is the technique of using <strong>stubs for virtual method invocations instead of the traditional virtual method table</strong>. In the past, interface dispatch required that interfaces had process-unique identifiers, and that every loaded interface was added to a global interface virtual table map. This requirement meant that all interfaces and all classes that implemented interfaces had to be restored at runtime in NGEN scenarios, causing significant startup working set increases. <strong>The motivation for stub dispatching was to eliminate much of the related working set, as well as distribute the remaining work throughout the lifetime of the process.</strong></p>
<p>Although it is possible for VSD to dispatch both virtual instance and interface method calls, <strong>it is currently used only for interface dispatch.</strong></p>
</blockquote>
<p>For more information I recommend reading the section on <a href="https://lukasatkinson.de/2018/interface-dispatch/#slotmaps">C#’s slotmaps</a> in the excellent article on ‘Interface Dispatch’ by <a href="https://twitter.com/latkde">Lukas Atkinson</a>.</p>
<p>So, to make DIM work, the runtime has to wire up any ‘default methods’, so that they integrate with the ‘virtual stub dispatch’ mechanism. We can see this in action by looking at the call stack from the hand-crafted assembly stub (<code class="language-plaintext highlighter-rouge">ResolveWorkerAsmStub</code>) all the way down to <code class="language-plaintext highlighter-rouge">FindDefaultInterfaceImplementation(..)</code>, which finds the correct method, given an interface (<code class="language-plaintext highlighter-rouge">pInterfaceMT</code>, a <code class="language-plaintext highlighter-rouge">MethodTable</code>) and the interface method to call (<code class="language-plaintext highlighter-rouge">pInterfaceMD</code>, a <code class="language-plaintext highlighter-rouge">MethodDesc</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- coreclr.dll!MethodTable::FindDefaultInterfaceImplementation(MethodDesc *pInterfaceMD, MethodTable *pInterfaceMT, MethodDesc **ppDefaultMethod, int allowVariance, int throwOnConflict) Line 6985 C++
- coreclr.dll!MethodTable::FindDispatchImpl(unsigned int typeID, unsigned int slotNumber, DispatchSlot *pImplSlot, int throwOnConflict) Line 6851 C++
- coreclr.dll!MethodTable::FindDispatchSlot(unsigned int typeID, unsigned int slotNumber, int throwOnConflict) Line 7251 C++
- coreclr.dll!VirtualCallStubManager::Resolver(MethodTable *pMT, DispatchToken token, OBJECTREF *protectedObj, unsigned __int64 *ppTarget, int throwOnConflict) Line 2208 C++
- coreclr.dll!VirtualCallStubManager::ResolveWorker(StubCallSite *pCallSite, OBJECTREF *protectedObj, DispatchToken token, VirtualCallStubManager::StubKind stubKind) Line 1874 C++
- coreclr.dll!VSD_ResolveWorker(TransitionBlock *pTransitionBlock, unsigned __int64 siteAddrForRegisterIndirect, unsigned __int64 token, unsigned __int64 flags) Line 1683 C++
- coreclr.dll!ResolveWorkerAsmStub() Line 42 Unknown
</code></pre></div></div>
<p>If you want to explore the call-stack in more detail, you can follow the links below:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">ResolveWorkerAsmStub</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/amd64/VirtualCallStubAMD64.asm#L40">here</a>
<ul>
<li>This is the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md#generic-resolver">‘Generic Resolver’</a> phase of ‘Virtual Stub Dispatch’.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">VSD_ResolveWorker(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/virtualcallstub.cpp#L1683">here</a></li>
<li><code class="language-plaintext highlighter-rouge">VirtualCallStubManager::ResolveWorker(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/virtualcallstub.cpp#L1874">here</a></li>
<li><code class="language-plaintext highlighter-rouge">VirtualCallStubManager::Resolver(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/virtualcallstub.cpp#L2204">here</a></li>
<li><code class="language-plaintext highlighter-rouge">MethodTable::FindDispatchSlot(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7459">here</a></li>
<li><code class="language-plaintext highlighter-rouge">MethodTable::FindDispatchImpl(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7065">here</a> or <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7075">here</a></li>
<li>Finally ending up in <code class="language-plaintext highlighter-rouge">MethodTable::FindDefaultInterfaceImplementation(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7173-L7444">here</a></li>
</ul>
<hr />
<h2 id="analysis-of-finddefaultinterfaceimplementation">Analysis of <code class="language-plaintext highlighter-rouge">FindDefaultInterfaceImplementation(..)</code></h2>
<p>So the code in <code class="language-plaintext highlighter-rouge">FindDefaultInterfaceImplementation(..)</code> is at the heart of the feature, but what does it need to do and how does it do it? This list from <a href="https://github.com/dotnet/coreclr/pull/12753">Finalize override lookup algorithm #12753</a> gives us some idea of the complexity:</p>
<blockquote>
<ul>
<li>properly detect diamond shape positive case (where I4 overrides both I2/I3 which both overrides I1) by keep tracking of a current list of best candidates. I went for the simplest algorithm and didn’t build any complex graph / DFS since the majority case the list of interfaces would be small, and interface dispatch cache would ensure majority of cases we don’t need to redo the (slow) dispatch. If needed we can revisit this to make it a proper topological sort.</li>
<li>VerifyVirtualMethodsImplemented now properly validates default interface scenarios - it is happy if there is at least one implementation and early returns. It doesn’t worry about conflicting overrides, for performance reasons.</li>
<li>NotSupportedException thrown in conflicting override scenario now has a proper error message</li>
<li>properly supports GVM when detecting method impl overrides</li>
<li>Revisited code that adds method impl for interfaces. added proper methodimpl validation and ensure methodimpl are virtual and final (and throw exception if it is not final)</li>
<li>Added test scenario with method that has multiple method impl. found and fixed a bug where the slot array is not big enough when building method impls for interfaces.</li>
</ul>
</blockquote>
<p>In addition, the ‘two-pass’ algorithm was implemented in <a href="https://github.com/dotnet/coreclr/pull/21355">Implement two pass algorithm for variant interface dispatch #21355</a>, which contains an interesting discussion of the <a href="https://github.com/dotnet/coreclr/pull/21355#discussion_r238893252">edge-cases that need to be handled</a>.</p>
<p>So onto the code, this is the high-level view of the algorithm:</p>
<ul>
<li>Which actually starts in <code class="language-plaintext highlighter-rouge">MethodTable::FindDispatchImpl(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7057-L7112">here</a>, where <code class="language-plaintext highlighter-rouge">FindDefaultInterfaceImplementation</code> can be called twice:
<ol>
<li>First time to try and find an ‘exact match’ (<code class="language-plaintext highlighter-rouge">allowVariance</code>=false)</li>
<li>Then if that fails, it’s called again to try and find a ‘variant match’ (<code class="language-plaintext highlighter-rouge">allowVariance</code>=true)</li>
</ol>
</li>
<li>The entire <code class="language-plaintext highlighter-rouge">FindDefaultInterfaceImplementation</code> method <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7173-L7444">is here</a>, it’s fairly straight-forward and relatively easy to understand, plus there’s only ~270 LOC and they’re all very well commented. The high-level algorithm is the following:
<ol>
<li>Walk the interfaces from <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7202-L7402">derived class to parent class here</a>, this is a straight-forward implementation that may be revisited <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7204-L7206">if it doesn’t scale well</a></li>
<li>Then <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7220-L7398">scan through each class</a> looking for a match:
<ol>
<li>an <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7227-L7234">‘exact match’</a></li>
<li>a <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7237-L7244">‘generic variance match’</a>, i.e. the interfaces match via ‘casting’, but ultimately have the same <code class="language-plaintext highlighter-rouge">TypeDef</code></li>
<li>a <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7276-L7303">‘more specific interface’</a> that matches, this match is made more complicated by the fact that <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7278-L7282">‘generic instantiations’ are involved</a></li>
<li>a <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7304-L7308">‘more specific interface’</a> matches, but without generics involved, so much simpler to calculate</li>
</ol>
</li>
<li>If the previous step produced a match, double-check that it is the <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7314-L7395"><em>most</em> specific interface match seen so far</a>, by keeping a <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7198-L7200">‘candidates list’</a> and classifying each scenario as:
<ol>
<li>a <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7354-L7357">‘tie’ which is ignored</a>, i.e. a ‘variant match’ on the same type</li>
<li>a <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7358-L7374">‘more specific’ match</a>, which is used to update the ‘candidates list’</li>
<li>a <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7375-L7381">‘less-specific’ match</a>, so no need to carry on with this candidate</li>
</ol>
</li>
<li>Finally, a scan is done to see if there are any conflicts <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7404-L7438">here</a>, which is acceptable when <code class="language-plaintext highlighter-rouge">allowVariance=true</code>, but otherwise <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7427">throws an exception</a></li>
<li>That’s it, the <a href="https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/methodtable.cpp#L7434-L7438">‘best-candidate’ is then returned to the caller</a> (assuming there is one)</li>
</ol>
</li>
</ul>
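<p>The ‘most specific candidate’ idea in the steps above can be sketched on a toy model. This is not the runtime’s code: <code class="language-plaintext highlighter-rouge">Iface</code> and <code class="language-plaintext highlighter-rouge">ToyDispatch</code> are hypothetical types, generics and variance are ignored, a single ‘best’ candidate stands in for the ‘candidates list’, and <code class="language-plaintext highlighter-rouge">InvalidOperationException</code> is just a stand-in for whatever the runtime throws on a conflict:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Toy model of an interface: its name, the interfaces it extends, and whether
// it supplies its own body for the method being dispatched.
class Iface
{
    public string Name;
    public Iface[] Parents = new Iface[0];
    public bool HasImpl;

    // True if 'other' is this interface or one of its (transitive) parents.
    public bool Extends(Iface other)
    {
        if (this == other) return true;
        foreach (var parent in Parents)
        {
            if (parent.Extends(other)) return true;
        }
        return false;
    }
}

static class ToyDispatch
{
    // Returns the name of the most specific interface that provides an
    // implementation, or null if none does. Throws when two unrelated
    // implementations remain (the 'conflict' case).
    public static string FindDefault(Iface[] implemented)
    {
        Iface best = null;

        // Keep whichever candidate is more specific than the best seen so far.
        foreach (var candidate in implemented)
        {
            if (!candidate.HasImpl) continue;
            if (best == null || candidate.Extends(best)) best = candidate;
        }
        if (best == null) return null;

        // Conflict scan: every other implementation must be less specific.
        foreach (var other in implemented)
        {
            if (!other.HasImpl || best.Extends(other)) continue;
            throw new InvalidOperationException(
                "Conflict between " + best.Name + " and " + other.Name);
        }
        return best.Name;
    }

    // The 'diamond shape positive case': I4 overrides I2 and I3, which both override I1.
    public static string DiamondDemo()
    {
        var i1 = new Iface { Name = "I1", HasImpl = true };
        var i2 = new Iface { Name = "I2", HasImpl = true, Parents = new[] { i1 } };
        var i3 = new Iface { Name = "I3", HasImpl = true, Parents = new[] { i1 } };
        var i4 = new Iface { Name = "I4", HasImpl = true, Parents = new[] { i2, i3 } };
        return FindDefault(new[] { i1, i2, i3, i4 });
    }
}
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">ToyDispatch.DiamondDemo()</code> returns <code class="language-plaintext highlighter-rouge">"I4"</code>. Dropping <code class="language-plaintext highlighter-rouge">i4</code> from the array leaves <code class="language-plaintext highlighter-rouge">I2</code> and <code class="language-plaintext highlighter-rouge">I3</code> as unrelated candidates, so the conflict scan throws.</p>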
<h2 id="diamond-inheritance-problem">Diamond Inheritance Problem</h2>
<p>Finally, the ‘diamond inheritance problem’ was mentioned in a few of the PRs/Issues related to the feature, but what is it?</p>
<p>A good place to start is one of the test cases, <a href="https://github.com/dotnet/coreclr/blob/release/3.1/tests/src/Loader/classloader/DefaultInterfaceMethods/diamondshape/diamondshape.cs">diamondshape.cs</a>. However there’s a more concise example in the <a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-8.0/default-interface-methods#diamond-inheritance-and-classes-closed">C#8 Language Proposal</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">interface</span> <span class="nc">IA</span>
<span class="p">{</span>
<span class="k">void</span> <span class="nf">M</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">interface</span> <span class="nc">IB</span> <span class="p">:</span> <span class="n">IA</span>
<span class="p">{</span>
<span class="k">override</span> <span class="k">void</span> <span class="nf">M</span><span class="p">()</span> <span class="p">{</span> <span class="nf">WriteLine</span><span class="p">(</span><span class="s">"IB"</span><span class="p">);</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">Base</span> <span class="p">:</span> <span class="n">IA</span>
<span class="p">{</span>
<span class="k">void</span> <span class="n">IA</span><span class="p">.</span><span class="nf">M</span><span class="p">()</span> <span class="p">{</span> <span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Base"</span><span class="p">);</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">Derived</span> <span class="p">:</span> <span class="n">Base</span><span class="p">,</span> <span class="n">IB</span> <span class="c1">// allowed?</span>
<span class="p">{</span>
<span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">IA</span> <span class="n">a</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Derived</span><span class="p">();</span>
<span class="n">a</span><span class="p">.</span><span class="nf">M</span><span class="p">();</span> <span class="c1">// what does it do?</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>So the issue is which of the matching interface methods should be used, in this case <code class="language-plaintext highlighter-rouge">IB.M()</code> or <code class="language-plaintext highlighter-rouge">Base.IA.M()</code>? The resolution, as outlined in the <a href="https://github.com/dotnet/csharplang/blob/master/proposals/csharp-8.0/default-interface-methods.md#diamond-inheritance-and-classes-closed">C#8 language proposal</a> was to use the <em>most specific override</em>:</p>
<blockquote>
<p><strong><em>Closed Issue:</em></strong> Confirm the draft spec, above, for <em>most specific override</em> as it applies to mixed classes and interfaces (a class takes priority over an interface). See <a href="https://github.com/dotnet/csharplang/blob/master/meetings/2017/LDM-2017-04-19.md#diamonds-with-classes">https://github.com/dotnet/csharplang/blob/master/meetings/2017/LDM-2017-04-19.md#diamonds-with-classes</a>.</p>
</blockquote>
<p>Which ties in with the ‘more-specific’ and ‘less-specific’ steps we saw in the outline of <code class="language-plaintext highlighter-rouge">FindDefaultInterfaceImplementation</code> above.</p>
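<p>The proposal’s snippet uses an <code class="language-plaintext highlighter-rouge">override</code> keyword inside the interface, which is not how the feature is spelled in C# 8 as shipped. The following is a minimal compilable sketch of the same shape, assuming C# 8 on .NET Core 3.0 or later, where the override in <code class="language-plaintext highlighter-rouge">IB</code> is written as an explicit interface implementation. The methods return strings rather than printing, and <code class="language-plaintext highlighter-rouge">OnlyDefault</code> and <code class="language-plaintext highlighter-rouge">DiamondDemo</code> are names added purely for illustration:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

interface IA
{
    string M();
}

interface IB : IA
{
    // Default implementation of IA.M supplied by a more derived interface
    string IA.M() => "IB";
}

class Base : IA
{
    // A class-level implementation of IA.M
    string IA.M() => "Base";
}

// Both a base class and an interface offer an implementation of IA.M
class Derived : Base, IB { }

// No class in the hierarchy implements IA.M, so only the default is available
class OnlyDefault : IB { }

static class DiamondDemo
{
    public static string Call(IA a) => a.M();
}
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">DiamondDemo.Call(new Derived())</code> returns <code class="language-plaintext highlighter-rouge">"Base"</code>, because a class implementation takes priority over an interface one, whereas <code class="language-plaintext highlighter-rouge">DiamondDemo.Call(new OnlyDefault())</code> falls back to <code class="language-plaintext highlighter-rouge">"IB"</code>.</p>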
<hr />
<h2 id="summary">Summary</h2>
<p>So there you have it, an entire feature delivered end-to-end, yay for .NET (Core) being open source! Thanks to the runtime engineers for making their Issues and PRs easy to follow and for adding such great comments to their code! Also kudos to the language designers for making their proposals and meeting notes available for all to see (e.g. <a href="https://github.com/dotnet/csharplang/blob/master/meetings/2017/LDM-2017-04-19.md#diamonds-with-classes">LDM-2017-04-19</a>).</p>
<p>Whether you think they are useful or not, it’s hard to argue that ‘Default Interface Methods’ aren’t well designed and well implemented.</p>
<p>But what makes it an even more unusual feature is that it required the <em>compiler</em> and <em>runtime</em> teams to work together to make it possible!</p>
Research based on the .NET Runtime2019-10-25T00:00:00+00:00http://www.mattwarren.org/2019/10/25/Research-based-on-the-.NET-Runtime
<p>Over the last few years, I’ve come across more and more research papers based, in some way, on the ‘Common Language Runtime’ (CLR).</p>
<p>So armed with <a href="https://scholar.google.com/">Google Scholar</a> and ably assisted by <a href="https://www.semanticscholar.org/">Semantic Scholar</a>, I put together the list below.</p>
<p><strong>Note:</strong> I put the papers into the following categories to make them easier to navigate (papers in each category are sorted by date, newest -> oldest):</p>
<ul>
<li>Using the .NET Runtime as a <strong><em>case-study</em></strong>
<ul>
<li>to prove its <em>correctness</em>, study <em>how it works</em> or analyse its <em>behaviour</em></li>
</ul>
</li>
<li>Research carried out by <a href="https://www.microsoft.com/en-us/research/"><strong>Microsoft Research</strong></a>, the research subsidiary of Microsoft.
<ul>
<li>“<em>It was formed in 1991, with the intent to advance state-of-the-art computing and solve difficult world problems through technological innovation in collaboration with academic, government, and industry researchers</em>” (<a href="https://en.wikipedia.org/wiki/Microsoft_Research">according to Wikipedia</a>)</li>
</ul>
</li>
<li>Papers based on the <a href="https://www.mono-project.com/"><strong>Mono Runtime</strong></a>
<ul>
<li>a ‘<em>Cross-Platform, open-source .NET framework</em>’</li>
</ul>
</li>
<li>Using <a href="https://blogs.msdn.microsoft.com/jasonz/2006/03/23/rotor-sscli-2-0-ships/"><strong>‘Rotor’</strong></a>, real name ‘Shared Source CLI (SSCLI)’
<ul>
<li>from <a href="https://en.wikipedia.org/wiki/Shared_Source_Common_Language_Infrastructure">Wikipedia</a> “<em>Microsoft provides the Shared Source CLI as a reference CLI implementation suitable for educational use</em>”</li>
</ul>
</li>
</ul>
<p><strong>Any papers I’ve missed? If so, please let me know in the comments or on <a href="https://twitter.com/matthewwarren">Twitter</a></strong></p>
<hr />
<ul>
<li><a href="#net-runtime-as-a-case-study"><strong>.NET Runtime as a Case-Study</strong></a>
<ul>
<li><a href="#pitfalls-of-c-generics-and-their-solution-using-concepts-belyakova--mikhalkovich-2015"><strong>Pitfalls of C# Generics and Their Solution Using Concepts</strong> (Belyakova & Mikhalkovich, 2015)</a></li>
<li><a href="#efficient-compilation-of-net-programs-for-embedded-systems-sallenaveab--ducournaub-2011"><strong>Efficient Compilation of .NET Programs for Embedded Systems</strong> (Sallenave & Ducournau, 2011)</a></li>
<li><a href="#type-safety-of-c-and-net-clr-fruja-2007"><strong>Type safety of C# and .Net CLR</strong> (Fruja, 2007)</a></li>
<li><a href="#modeling-the-net-clr-exception-handling-mechanism-for-a-mathematical-analysis-fruja--b%c3%b6rger-2006"><strong>Modeling the .NET CLR Exception Handling Mechanism for a Mathematical Analysis</strong> (Fruja & Börger, 2006)</a></li>
<li><a href="#analysis-of-the-net-clr-exception-handling-mechanism-fruja--b%c3%b6rger-2005"><strong>Analysis of the .NET CLR Exception Handling Mechanism</strong> (Fruja & Börger, 2005)</a></li>
<li><a href="#a-modular-design-for-the-common-language-runtime-clr-architecture-fruja-2005"><strong>A Modular Design for the Common Language Runtime (CLR) Architecture</strong> (Fruja, 2005)</a></li>
<li><a href="#cross-language-program-slicing-in-the-net-framework-p%c3%b3cza-bicz%c3%b3--porkol%c3%a1b-2005"><strong>Cross-language Program Slicing in the .NET Framework</strong> (Pócza, Biczó & Porkoláb, 2005)</a></li>
<li><a href="#design-and-implementation-of-a-high-level-multi-language--net-debugger-strein-2005"><strong>Design and Implementation of a high-level multi-language .NET Debugger</strong> (Strein, 2005)</a></li>
<li><a href="#a-high-level-modular-definition-of-the-semantics-of-c-b%c3%b6rger-fruja-gervasi--st%c3%a4rk-2004"><strong>A High-Level Modular Definition of the Semantics of C#</strong> (Börger, Fruja, Gervasi & Stärk, 2004)</a></li>
<li><a href="#an-asm-specification-of-c-threads-and-the-net-memory-model-st%c3%a4rk-and-b%c3%b6rger-2004"><strong>An ASM Specification of C# Threads and the .NET Memory Model</strong> (Stärk and Börger, 2004)</a></li>
<li><a href="#common-language-runtime--a-new-virtual-machine-ferreira-2004"><strong>Common Language Runtime : a new virtual machine</strong> (Ferreira, 2004)</a></li>
<li><a href="#jvm-versus-clr-a-comparative-study-singer-2003"><strong>JVM versus CLR: a comparative study</strong> (Singer, 2003)</a></li>
<li><a href="#runtime-code-generation-with-jvm-and-clr-sestoft-2002"><strong>Runtime Code Generation with JVM And CLR</strong> (Sestoft, 2002)</a></li>
</ul>
</li>
<li><a href="#microsoft-research"><strong>Microsoft Research</strong></a>
<ul>
<li><a href="#project-snowflake-non-blocking-safe-manual-memory-management-in-net-parkinson-vaswani-costa-deligiannis-blankstein-mcdermott-balkind--vytiniotis-2017"><strong>Project Snowflake: Non-blocking safe manual memory management in .NET</strong> (Parkinson, Vaswani, Costa, Deligiannis, Blankstein, McDermott, Balkind & Vytiniotis, 2017)</a></li>
<li><a href="#simple-fast-and-safe-manual-memory-management-kedia-costa-vytiniotis-parkinson-vaswani--blankstein-2017"><strong>Simple, Fast and Safe Manual Memory Management</strong> (Kedia, Costa, Vytiniotis, Parkinson, Vaswani & Blankstein, 2017)</a></li>
<li><a href="#uniqueness-and-reference-immutability-for-safe-parallelism-gordon-parkinson-parsons-bromfield--duffy-2012"><strong>Uniqueness and Reference Immutability for Safe Parallelism</strong> (Gordon, Parkinson, Parsons, Bromfield & Duffy, 2012)</a></li>
<li><a href="#a-study-of-concurrent-real-time-garbage-collectors-pizlo-petrank--steensgaard-2008"><strong>A study of concurrent real-time garbage collectors</strong> (Pizlo, Petrank & Steensgaard, 2008)</a></li>
<li><a href="#optimizing-concurrency-levels-in-the-net-threadpool-a-case-study-of-controller-design-and-implementation-hellerstein-morrison--eilebrecht-2008"><strong>Optimizing concurrency levels in the .NET ThreadPool: A case study of controller design and implementation</strong> (Hellerstein, Morrison & Eilebrecht, 2008)</a></li>
<li><a href="#stopless-a-real-time-garbage-collector-for-multiprocessors-pizlo-frampton-petrank--steensgaard-2007"><strong>Stopless: a real-time garbage collector for multiprocessors.</strong> (Pizlo, Frampton, Petrank & Steensgaard, 2007)</a></li>
<li><a href="#securing-the-net-programming-model-kennedy-2006"><strong>Securing the .NET Programming Model</strong> (Kennedy, 2006)</a></li>
<li><a href="#combining-generics-pre-compilation-and-sharing-between-software-based-processes-syme--kennedy-2004"><strong>Combining Generics, Pre-compilation and Sharing Between Software-Based Processes</strong> (Syme & Kennedy, 2004)</a></li>
<li><a href="#formalization-of-generics-for-the-net-common-language-runtime-yu-kennedy--syme-2004"><strong>Formalization of Generics for the .NET Common Language Runtime</strong> (Yu, Kennedy & Syme, 2004)</a></li>
<li><a href="#runtime-verification-of-net-contracts-barnett--schulte-2003"><strong>Runtime Verification of .NET Contracts</strong> (Barnett & Schulte, 2003)</a></li>
<li><a href="#design-and-implementation-of-generics-for-the-net-common-language-runtime-kennedy--syme-2001"><strong>Design and Implementation of Generics for the .NET Common Language Runtime</strong> (Kennedy & Syme, 2001)</a></li>
<li><a href="#typing-a-multi-language-intermediate-code-gordon--syme-2001"><strong>Typing a Multi-Language Intermediate Code</strong> (Gordon & Syme, 2001)</a></li>
</ul>
</li>
<li><a href="#mono-runtime"><strong>Mono Runtime</strong></a>
<ul>
<li><a href="#static-and-dynamic-analysis-of-android-malware-and-goodware-written-with-unity-framework-shim-lim-cho-han--park-2018"><strong>Static and Dynamic Analysis of Android Malware and Goodware Written with Unity Framework</strong> (Shim, Lim, Cho, Han & Park, 2018)</a></li>
<li><a href="#reducing-startup-time-of-a-deterministic-virtualizing-runtime-environment-d%c3%a4umler--werner-2013"><strong>Reducing startup time of a deterministic virtualizing runtime environment</strong> (Däumler & Werner, 2013)</a></li>
<li><a href="#detecting-clones-across-microsoft-net-programming-languages-al-omari-keivanloo-roy--rilling-2012"><strong>Detecting Clones Across Microsoft .NET Programming Languages</strong> (Al-Omari, Keivanloo, Roy & Rilling, 2012)</a></li>
<li><a href="#language-independent-sandboxing-of-just-in-time-compilation-and-self-modifying-code-ansel--marchenko-2012"><strong>Language-independent sandboxing of just-in-time compilation and self-modifying code</strong> (Ansel & Marchenko, 2012)</a></li>
<li><a href="#vmkit-a-substrate-for-managed-runtime-environments-geoffray-thomas-lawall-muller--folliot-2010"><strong>VMKit: a Substrate for Managed Runtime Environments</strong> (Geoffray, Thomas, Lawall, Muller & Folliot, 2010)</a></li>
<li><a href="#mmc-the-mono-model-checker-ruys--aan-de-brugh-2007"><strong>MMC: the Mono Model Checker</strong> (Ruys & Aan de Brugh, 2007)</a></li>
<li><a href="#numeric-performance-in-c-c-and-java-sestoft-2007"><strong>Numeric performance in C, C# and Java</strong> (Sestoft, 2007)</a></li>
<li><a href="#mono-versus-net-a-comparative-study-of-performance-for-distributed-processing-blajian-eggen-eggen--pitts-2006"><strong>Mono versus .Net: A Comparative Study of Performance for Distributed Processing.</strong> (Blajian, Eggen, Eggen & Pitts, 2006)</a></li>
<li><a href="#automated-detection-of-performance-regressions-the-mono-experience-kalibera-bulej--tuma-2005"><strong>Automated detection of performance regressions: the mono experience</strong> (Kalibera, Bulej & Tuma, 2005)</a></li>
</ul>
</li>
<li><a href="#shared-source-common-language-infrastructure-sscli---aka-rotor"><strong>Shared Source Common Language Infrastructure</strong> (SSCLI) - a.k.a ‘<strong>Rotor</strong>’</a>
<ul>
<li><a href="#efficient-virtual-machine-support-of-runtime-structural-reflection-ortina-redondoa--perez-schofield-2009"><strong>Efficient virtual machine support of runtime structural reflection</strong> (Ortin, Redondo & Perez-Schofield, 2009)</a></li>
<li><a href="#extending-the-sscli-to-support-dynamic-inheritance-redondo-ortin--perez-schofield-2008"><strong>Extending the SSCLI to Support Dynamic Inheritance</strong> (Redondo, Ortin & Perez-Schofield, 2008)</a></li>
<li><a href="#sampling-profiler-for-rotor-as-part-of-optimizing-compilation-system-chilingarova--safonov-2006"><strong>Sampling profiler for Rotor as part of optimizing compilation system</strong> (Chilingarova & Safonov, 2006)</a></li>
<li><a href="#to-jit-or-not-to-jit-the-effect-of-code-pitching-on-the-performance-of-net-framework-anthony-leung--srisa-an-2005"><strong>To JIT or not to JIT: The effect of code-pitching on the performance of .NET framework</strong> (Anthony, Leung & Srisa-an, 2005)</a></li>
<li><a href="#adding-structural-reflection-to-the-sscli-ortin-redondo-vinuesa--lovelle-2005"><strong>Adding structural reflection to the SSCLI</strong> (Ortin, Redondo, Vinuesa & Lovelle, 2005)</a></li>
<li><a href="#static-analysis-for-identifying-and-allocating-clusters-of-immortal-objects-ravindar--srikant-2005"><strong>Static Analysis for Identifying and Allocating Clusters of Immortal Objects</strong> (Ravindar & Srikant, 2005)</a></li>
<li><a href="#an-optimizing-just-intime-compiler-for-rotor-trindade--silva-2005"><strong>An Optimizing Just-In-Time Compiler for Rotor</strong> (Trindade & Silva, 2005)</a></li>
<li><a href="#software-interactions-into-the-sscli-platform-charfi--emsellem-2004"><strong>Software Interactions into the SSCLI platform</strong> (Charfi & Emsellem, 2004)</a></li>
<li><a href="#experience-integrating-a-new-compiler-and-a-new-garbage-collector-into-rotor-anderson-eng-glew-lewis-menon--stichnoth-2004"><strong>Experience Integrating a New Compiler and a New Garbage Collector Into Rotor</strong> (Anderson, Eng, Glew, Lewis, Menon & Stichnoth, 2004)</a></li>
</ul>
</li>
</ul>
<hr />
<h2 id="net-runtime-as-a-case-study"><strong>.NET Runtime as a Case-Study</strong></h2>
<h3 id="pitfalls-of-c-generics-and-their-solution-using-concepts-belyakova--mikhalkovich-2015"><a href="https://www.researchgate.net/publication/277564142_Pitfalls_of_C_Generics_and_Their_Solution_Using_Concepts"><strong>Pitfalls of C# Generics and Their Solution Using Concepts</strong> (Belyakova & Mikhalkovich, 2015)</a></h3>
<p><strong>Abstract</strong></p>
<p>In comparison with Haskell type classes and C++ concepts, such object-oriented languages as C# and Java provide much limited mechanisms of generic programming based on F-bounded polymorphism. Main pitfalls of C# generics are considered in this paper. Extending C# language with concepts which can be simultaneously used with interfaces is proposed to solve the problems of generics; a design and translation of concepts are outlined.</p>
<h3 id="efficient-compilation-of-net-programs-for-embedded-systems-sallenaveab--ducournaub-2011"><a href="https://www.researchgate.net/publication/260107282_Efficient_Compilation_of_NET_Programs_for_Embedded_Systems"><strong>Efficient Compilation of .NET Programs for Embedded Systems</strong> (Sallenave & Ducournau, 2011)</a></h3>
<p><strong>Abstract</strong></p>
<p>Compiling under the closed-world assumption (CWA) has been shown to be an appropriate way for implementing object-oriented languages such as Java on low-end embedded systems. In this paper, we explore the implications of using whole program optimizations such as Rapid Type Analysis (RTA) and coloring on programs targeting the .NET infrastructure. We extended RTA so that it takes into account .NET specific features such as (i) array covariance, a language feature also supported in Java, (ii) generics, whose specifications in .Net impacts type analysis and (iii) delegates, which encapsulate methods within objects. We also use an intraprocedural control flow analysis in addition to RTA. We evaluated the optimizations that we implemented on programs written in C#. Preliminary results show a noticeable reduction of the code size, class hierarchy and polymorphism of the programs we optimize. Array covariance is safe in almost all cases, and some delegate calls can be implemented as direct calls.</p>
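<p>To make the array covariance point concrete, here is a minimal sketch (mine, not taken from the paper): C# allows a <code>string[]</code> to be used where an <code>object[]</code> is expected, so the runtime has to check the element type on every store. The names <code>CovarianceDemo</code> and <code>TryStore</code> are made up for this example.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class CovarianceDemo
{
    // Returns true if storing 'value' into 'target' passes the runtime's element-type check.
    public static bool TryStore(object[] target, object value)
    {
        try
        {
            target[0] = value;   // the CLR verifies the element type on each store
            return true;
        }
        catch (ArrayTypeMismatchException)
        {
            return false;
        }
    }

    static void Main()
    {
        object[] items = new string[1];             // legal: arrays are covariant
        Console.WriteLine(TryStore(items, "ok"));   // True
        Console.WriteLine(TryStore(items, 42));     // False: an int is not a string
    }
}
</code></pre></div></div>
<p>The per-store check is the cost that a whole-program analysis could hope to remove when it can show that a store is always safe.</p>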
<h3 id="type-safety-of-c-and-net-clr-fruja-2007"><a href="https://www.research-collection.ethz.ch/handle/20.500.11850/72699"><strong>Type safety of C# and .Net CLR</strong> (Fruja, 2007)</a></h3>
<p><strong>Abstract</strong></p>
<p>Type safety plays a crucial role in the security enforcement of any typed programming language. This thesis presents a formal proof of C#’s type safety. For this purpose, we develop an abstract
framework for C#, comprising formal specifications of the language’s grammar, of the statically correct programs, and of the static and operational semantics. Using this framework, we prove that C# is type-safe, by showing that the execution of statically correct C# programs does not lead to type errors.</p>
<h3 id="modeling-the-net-clr-exception-handling-mechanism-for-a-mathematical-analysis-fruja--börger-2006"><a href="https://www.semanticscholar.org/paper/Modeling-the-.NET-CLR-Exception-Handling-Mechanism-Fruja-B%C3%B6rger/a6fe1d4763a70d5a5658f83e9f56135725c772a6"><strong>Modeling the .NET CLR Exception Handling Mechanism for a Mathematical Analysis</strong> (Fruja & Börger, 2006)</a></h3>
<p><strong>Abstract</strong></p>
<p>This work is part of a larger project which aims at establishing some important properties of C# and CLR by mathematical proofs. Examples are the correctness of the bytecode verifier of CLR, the type safety (along the lines of the first author’s correctness proof for the definite assignment rules) of C#, the correctness of a general compilation scheme.</p>
<h3 id="analysis-of-the-net-clr-exception-handling-mechanism-fruja--börger-2005"><a href="https://www.semanticscholar.org/paper/Analysis-of-the-.NET-CLR-Exception-Handling-Fruja-B%C3%83%C2%B6rger/0155d293eb444358542828e0405c8c754f543da0"><strong>Analysis of the .NET CLR Exception Handling Mechanism</strong> (Fruja & Börger, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>We provide a complete mathematical model for the exception handling mechanism of the Common Language Runtime (CLR), the virtual machine underlying the interpretation of .NET programs. The goal is to use this rigorous model in the corresponding part of the still-to-be-developed soundness proof for the CLR bytecode verifier.</p>
<h3 id="a-modular-design-for-the-common-language-runtime-clr-architecture-fruja-2005"><a href="https://www.semanticscholar.org/paper/A-Modular-Design-for-the-Common-Language-Runtime-Fruja/b2bd42f6ff8970777ae3c1cc87a65d963c891082"><strong>A Modular Design for the Common Language Runtime (CLR) Architecture</strong> (Fruja, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>This paper provides a modular high-level design of the Common Language Runtime (CLR) architecture. Our design is given in terms of Abstract State Machines (ASMs) and takes the form of an interpreter. We describe the CLR as a hierarchy of eight submachines, which correspond to eight submodules into which the Common Intermediate Language (CIL) instruction set can be decomposed.</p>
<h3 id="cross-language-program-slicing-in-the-net-framework-pócza-biczó--porkoláb-2005"><a href="https://www.semanticscholar.org/paper/Cross-language-Program-Slicing-in-the-.NET-P%C3%B3cza-Bicz%C3%B3/d0fffd3b754f2ab1181e108e7a416bb23c8612d7"><strong>Cross-language Program Slicing in the .NET Framework</strong> (Pócza, Biczó & Porkoláb, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>Dynamic program slicing methods are very attractive for debugging because many statements can be ignored in the process of localizing a bug. Although language interoperability is a key concept in modern development platforms, current slicing techniques are still restricted to a single language. In this paper a cross-language dynamic program slicing technique is introduced for the .NET environment. The method is utilizing the CLR Debugging Services API, hence it can be applied to large multi-language applications.</p>
<h3 id="design-and-implementation-of-a-high-level-multi-language--net-debugger-strein-2005"><a href="https://www.semanticscholar.org/paper/Design-and-Implementation-of-a-high-level-.-NET-Strein/811ae0dcda26249f7722a7315e9970be4f830b93"><strong>Design and Implementation of a high-level multi-language .NET Debugger</strong> (Strein, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>The Microsoft .NET Common Language Runtime (CLR) provides a low-level debugging application programmers interface (API), which can be used to implement traditional source code debuggers but can also be useful to implement other dynamic program introspection tools. This paper describes our experience in using this API for the implementation of a high-level debugger. The API is difficult to use from a technical point of view because it is implemented as a set of Component Object Model (COM) interfaces instead of a managed .NET API. Nevertheless, it is possible to implement a debugger in managed C# code using COM-interop. We describe our experience in taking this approach. We define a high-level debugging API and implement it in the C# language using COM-interop to access the low-level debugging API. Furthermore, we describe the integration of this high-level API in the multi-language development environment X-develop to enable source code debugging of .NET languages. This paper can be useful for anybody who wants to take the same approach to implement debuggers or other tools for dynamic program introspection.</p>
<h3 id="a-high-level-modular-definition-of-the-semantics-of-c-börger-fruja-gervasi--stärk-2004"><a href="https://www.sciencedirect.com/science/article/pii/S0304397504007765"><strong>A High-Level Modular Definition of the Semantics of C#</strong> (Börger, Fruja, Gervasi & Stärk, 2004)</a></h3>
<p><strong>Abstract</strong></p>
<p>We propose a structured mathematical definition of the semantics of programs to provide a platform-independent interpreter view of the language for the programmer, which can also be used for a precise analysis of the ECMA standard of the language and as a reference model for teaching. The definition takes care to reflect directly and faithfully—as much as possible without becoming inconsistent or incomplete—the descriptions in the standard to become comparable with the corresponding models for Java in Stärk et al. (Java and Java Virtual Machine—Definition, Verification, Validation, Springer, Berlin, 2001) and to provide for implementors the possibility to check their basic design decisions against an accurate high-level model. The model sheds light on some of the dark corners of and on some critical differences between the ECMA standard and the implementations of the language.</p>
<h3 id="an-asm-specification-of-c-threads-and-the-net-memory-model-stärk-and-börger-2004"><a href="https://link.springer.com/chapter/10.1007/978-3-540-24773-9_4"><strong>An ASM Specification of C# Threads and the .NET Memory Model</strong> (Stärk and Börger, 2004)</a></h3>
<p><strong>Abstract</strong></p>
<p>We present a high-level ASM model of C# threads and the .NET memory model. We focus on purely managed, fully portable threading features of C#. The sequential model interleaves the computation steps of the currently running threads and is suitable for uniprocessors. The parallel model addresses problems of true concurrency on multiprocessor systems. The models provide a sound basis for the development of multi-threaded applications in C#. The thread and memory models complete the abstract operational semantics of C# in.</p>
<h3 id="common-language-runtime--a-new-virtual-machine-ferreira-2004"><a href="https://www.semanticscholar.org/paper/Common-Language-Runtime-%3A-a-new-virtual-machine-Ferreira/fec4a355450eb0c935fe3ccff9d296529b11b873"><strong>Common Language Runtime : a new virtual machine</strong> (Ferreira, 2004)</a></h3>
<p><strong>Abstract</strong></p>
<p>Virtual Machines provide a runtime execution platform combining bytecode portability with a performance close to native code. An overview of current approaches precedes an insight into Microsoft CLR (Common Language Runtime), comparing it to Sun JVM (Java Virtual Machine) and to a native execution environment (IA 32). A reference is also made to CLR in a Unix platform and to techniques on how CLR improves code execution.</p>
<h3 id="jvm-versus-clr-a-comparative-study-singer-2003"><a href="https://www.semanticscholar.org/paper/JVM-versus-CLR%3A-a-comparative-study-Singer/b57aaf581e043fb63c56ebd662720190e3121220"><strong>JVM versus CLR: a comparative study</strong> (Singer, 2003)</a></h3>
<p><strong>Abstract</strong></p>
<p>We present empirical evidence to demonstrate that there is little or no difference between the Java Virtual Machine and the .NET Common Language Runtime, as regards the compilation and execution of object-oriented programs. Then we give details of a case study that proves the superiority of the Common Language Runtime as a target for imperative programming language compilers (in particular GCC).</p>
<h3 id="runtime-code-generation-with-jvm-and-clr-sestoft-2002"><a href="https://www.researchgate.net/publication/2831690_Runtime_Code_Generation_with_JVM_And_CLR"><strong>Runtime Code Generation with JVM And CLR</strong> (Sestoft, 2002)</a></h3>
<p><strong>Abstract</strong></p>
<p>Modern bytecode execution environments with optimizing just-in-time compilers, such as Sun’s Hotspot Java Virtual Machine, IBM’s Java Virtual Machine, and Microsoft’s Common Language Runtime, provide an infrastructure for generating fast code at runtime. Such runtime code generation can be used for efficient implementation of parametrized algorithms. More generally, with runtime code generation one can introduce an additional binding-time without performance loss. This permits improved performance and improved static correctness guarantees.</p>
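<p>As a rough illustration of runtime code generation for a parametrized algorithm, the sketch below builds a specialised "raise to the power n" function once <code>n</code> is known. It assumes <code>System.Linq.Expressions</code>, an API that postdates the paper, rather than whatever mechanism the paper itself uses; <code>PowerGen</code> and <code>MakePower</code> are hypothetical names.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq.Expressions;

static class PowerGen
{
    // Builds and compiles x =&gt; 1.0 * x * x * ... * x (n factors) at run time,
    // so the loop over n runs once at generation time, not on every call.
    public static Func&lt;double, double&gt; MakePower(int n)
    {
        if (n &lt; 0) throw new ArgumentOutOfRangeException(nameof(n));

        ParameterExpression x = Expression.Parameter(typeof(double), "x");
        Expression body = Expression.Constant(1.0);
        for (int i = 0; i &lt; n; i++)
        {
            body = Expression.Multiply(body, x);
        }
        return Expression.Lambda&lt;Func&lt;double, double&gt;&gt;(body, x).Compile();
    }

    static void Main()
    {
        Func&lt;double, double&gt; cube = MakePower(3);
        Console.WriteLine(cube(2.0));   // 8
    }
}
</code></pre></div></div>
<p>The exponent acts as the extra binding-time: it is fixed when the delegate is generated, and the compiled code contains only the multiplications.</p>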
<hr />
<h2 id="microsoft-research"><strong>Microsoft Research</strong></h2>
<h3 id="project-snowflake-non-blocking-safe-manual-memory-management-in-net-parkinson--vaswani-costa-deligiannis-blankstein-mcdermott-balkind--vytiniotis-2017"><a href="https://www.microsoft.com/en-us/research/publication/project-snowflake-non-blocking-safe-manual-memory-management-net/"><strong>Project Snowflake: Non-blocking safe manual memory management in .NET</strong> (Parkinson, Vaswani, Costa, Deligiannis, Blankstein, McDermott, Balkind & Vytiniotis, 2017)</a></h3>
<p><strong>Abstract</strong></p>
<p>Garbage collection greatly improves programmer productivity and ensures memory safety. Manual memory management on the other hand often delivers better performance but is typically unsafe and can lead to system crashes or security vulnerabilities. We propose integrating safe manual memory management with garbage collection in the .NET runtime to get the best of both worlds. In our design, programmers can choose between allocating objects in the garbage collected heap or the manual heap. All existing applications run unmodified, and without any performance degradation, using the garbage collected heap. Our programming model for manual memory management is flexible: although objects in the manual heap can have a single owning pointer, we allow deallocation at any program point and concurrent sharing of these objects amongst all the threads in the program. Experimental results from our .NET CoreCLR implementation on real-world applications show substantial performance gains especially in multithreaded scenarios: up to 3x savings in peak working sets and 2x improvements in runtime.</p>
<h3 id="simple-fast-and-safe-manual-memory-management-kedia-costa-vytiniotis-parkinson-vaswani--blankstein-2017"><a href="https://www.microsoft.com/en-us/research/publication/simple-fast-safe-manual-memory-management/"><strong>Simple, Fast and Safe Manual Memory Management</strong> (Kedia, Costa, Vytiniotis, Parkinson, Vaswani & Blankstein, 2017)</a></h3>
<p><strong>Abstract</strong></p>
<p>Safe programming languages are readily available, but many applications continue to be written in unsafe languages, because the latter are more efficient. As a consequence, many applications continue to have exploitable memory safety bugs. Since garbage collection is a major source of inefficiency in the implementation of safe languages, replacing it with safe manual memory management would be an important step towards solving this problem.</p>
<p>Previous approaches to safe manual memory management use programming models based on regions, unique pointers, borrowing of references, and ownership types. We propose a much simpler programming model that does not require any of these concepts. Starting from the design of an imperative type safe language (like Java or C#), we just add a delete operator to free memory explicitly and an exception which is thrown if the program dereferences a pointer to freed memory. We propose an efficient implementation of this programming model that guarantees type safety. Experimental results from our implementation based on the C# native compiler show that this design achieves up to 3x reduction in peak working set and run time.</p>
<h3 id="uniqueness-and-reference-immutability-for-safe-parallelism-gordon--parkinson-parsons-bromfield--duffy-2012"><a href="https://www.microsoft.com/en-us/research/publication/uniqueness-and-reference-immutability-for-safe-parallelism/"><strong>Uniqueness and Reference Immutability for Safe Parallelism</strong> (Gordon, Parkinson, Parsons, Bromfield & Duffy, 2012)</a></h3>
<p><strong>Abstract</strong></p>
<p>A key challenge for concurrent programming is that side-effects (memory operations) in one thread can affect the behavior of another thread. In this paper, we present a type system to restrict the updates to memory to prevent these unintended side-effects. We provide a novel combination of immutable and unique (isolated) types that ensures safe parallelism (race freedom and deterministic execution). The type system includes support for polymorphism over type qualifiers, and can easily create cycles of immutable objects. Key to the system’s flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking. Our type system models a prototype extension to C# that is in active use by a Microsoft team. We describe their experiences building large systems with this extension. We prove the soundness of the type system by an embedding into a program logic.</p>
<h3 id="a-study-of-concurrent-real-time-garbage-collectors-pizlo-petrank--steensgaard-2008"><a href="https://www.semanticscholar.org/paper/A-study-of-concurrent-real-time-garbage-collectors-Pizlo-Petrank/9d3f2b64fef6b8e66a081eae760bf3afed086687"><strong>A study of concurrent real-time garbage collectors</strong> (Pizlo, Petrank & Steensgaard, 2008)</a></h3>
<p><strong>Abstract</strong></p>
<p>Concurrent garbage collection is highly attractive for real-time systems, because offloading the collection effort from the executing threads allows faster response, allowing for extremely short deadlines at the microseconds level. Concurrent collectors also offer much better scalability over incremental collectors. The main problem with concurrent real-time collectors is their complexity. The first concurrent real-time garbage collector that can support fine synchronization, STOPLESS, has recently been presented by Pizlo et al. In this paper, we propose two additional (and different) algorithms for concurrent real-time garbage collection: CLOVER and CHICKEN. Both collectors obtain reduced complexity over the first collector STOPLESS, but need to trade a benefit for it. We study the algorithmic strengths and weaknesses of CLOVER and CHICKEN and compare them to STOPLESS. Finally, we have implemented all three collectors on the Bartok compiler and runtime for C# and we present measurements to compare their efficiency and responsiveness.</p>
<h3 id="optimizing-concurrency-levels-in-the-net-threadpool-a-case-study-of-controller-design-and-implementation-hellerstein-morrison--eilebrecht-2008"><a href="https://www.researchgate.net/publication/228977836_Optimizing_concurrency_levels_in_the_net_threadpool_A_case_study_of_controller_design_and_implementation"><strong>Optimizing concurrency levels in the .NET ThreadPool: A case study of controller design and implementation</strong> (Hellerstein, Morrison & Eilebrecht, 2008)</a></h3>
<p><strong>Abstract</strong></p>
<p>This paper presents a case study of developing a hill climbing concurrency controller (HC<sup>3</sup>) for the .NET ThreadPool. The intent of the case study is to provide insight into software considerations for controller design, testing, and implementation. The case study is structured as a series of issues encountered and approaches taken to their resolution. Examples of issues and approaches include: (a) addressing the need to combine a hill climbing control law with rule-based techniques by the use of hybrid control; (b) increasing the efficiency and reducing the variability of the test environment by using resource emulation; and (c) effectively assessing design choices by using test scenarios for which the optimal concurrency level can be computed analytically and hence desired test results are known a priori. We believe that these issues and approaches have broad application to controllers for resource management of software systems.</p>
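<p>For readers unfamiliar with hill climbing, here is a deliberately naive sketch of the basic idea: change the concurrency level by one, keep the change only if measured throughput improves, otherwise try the other direction. This is a toy of my own, not the controller from the paper or the one in the .NET ThreadPool, and the <code>measure</code> function stands in for a real throughput measurement.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class HillClimbDemo
{
    // Toy hill climber: nudge the level by one step, keep the move only if
    // the measured throughput improves, otherwise reverse direction.
    public static int Climb(Func&lt;int, double&gt; measure, int start, int min, int max, int iterations)
    {
        int level = Math.Clamp(start, min, max);
        int direction = 1;
        double best = measure(level);

        for (int i = 0; i &lt; iterations; i++)
        {
            int candidate = Math.Clamp(level + direction, min, max);
            double throughput = measure(candidate);
            if (throughput &gt; best)
            {
                level = candidate;
                best = throughput;
            }
            else
            {
                direction = -direction;
            }
        }
        return level;
    }

    static void Main()
    {
        // Hypothetical workload whose throughput peaks at 8 concurrent workers.
        Func&lt;int, double&gt; measure = n =&gt; -(n - 8) * (n - 8);
        Console.WriteLine(Climb(measure, start: 1, min: 1, max: 32, iterations: 50));   // 8
    }
}
</code></pre></div></div>
<p>With a noise-free measurement the climber settles on the peak; real measurements are noisy, which is one reason a production controller needs more than this.</p>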
<h3 id="stopless-a-real-time-garbage-collector-for-multiprocessors-pizlo-frampton-petrank--steensgaard-2007"><a href="https://www.researchgate.net/publication/221032935_Stopless_a_real-time_garbage_collector_for_multiprocessors"><strong>Stopless: a real-time garbage collector for multiprocessors.</strong> (Pizlo, Frampton, Petrank & Steensgaard, 2007)</a></h3>
<p><strong>Abstract</strong></p>
<p>We present STOPLESS: a concurrent real-time garbage collector suitable for modern multiprocessors running parallel multithreaded applications. Creating a garbage-collected environment that supports real-time on modern platforms is notoriously hard, especially if real-time implies lock-freedom. Known real-time collectors either restrict the real-time guarantees to uniprocessors only, rely on special hardware, or just give up supporting atomic operations (which are crucial for lock-free software). STOPLESS is the first collector that provides real-time responsiveness while preserving lock-freedom, supporting atomic operations, controlling fragmentation by compaction, and supporting modern parallel platforms. STOPLESS is adequate for modern languages such as C# or Java. It was implemented on top of the Bartok compiler and runtime for C# and measurements demonstrate high responsiveness (a factor of a 100 better than previously published systems), virtually no pause times, good mutator utilization, and acceptable overheads.</p>
<h3 id="securing-the-net-programming-model-kennedy-2006"><a href="https://www.microsoft.com/en-us/research/publication/securing-the-net-programming-model/"><strong>Securing the .NET Programming Model</strong> (Kennedy, 2006)</a></h3>
<p><strong>Abstract</strong></p>
<p>The security of the .NET programming model is studied from the standpoint of fully abstract compilation of C#. A number of failures of full abstraction are identified, and fixes described. The most serious problems have recently been fixed for version 2.0 of the .NET Common Language Runtime.</p>
<h3 id="combining-generics-pre-compilation-and-sharing-between-software-based-processes-syme--kennedy-2004"><a href="https://www.microsoft.com/en-us/research/publication/combining-generics-pre-compilation-and-sharing-between-software-based-processes/"><strong>Combining Generics, Pre-compilation and Sharing Between Software-Based Processes</strong> (Syme & Kennedy, 2004)</a></h3>
<p><strong>Abstract</strong></p>
<p>We describe problems that have arisen when combining the proposed design for generics for the Microsoft .NET Common Language Runtime (CLR) with two resource-related features supported by the Microsoft CLR implementation: application domains and pre-compilation. Application domains are “software based processes” and the interaction between application domains and generics stems from the fact that code and descriptors are generated on a per-generic-instantiation basis, and thus instantiations consume resources which are preferably both shareable and recoverable. Pre-compilation runs at install-time to reduce startup overheads. This interacts with application domain unloading: compilation units may contain shareable generated instantiations. The paper describes these interactions and the different approaches that can be used to avoid or ameliorate the problems.</p>
<h3 id="formalization-of-generics-for-the-net-common-language-runtime-yu-kennedy--syme-2004"><a href="https://www.microsoft.com/en-us/research/publication/formalization-of-generics-for-the-net-common-language-runtime/"><strong>Formalization of Generics for the .NET Common Language Runtime</strong> (Yu, Kennedy & Syme, 2004)</a></h3>
<p><strong>Abstract</strong></p>
<p>We present a formalization of the implementation of generics in the .NET Common Language Runtime (CLR), focusing on two novel aspects of the implementation: mixed specialization and sharing, and efficient support for run-time types. Some crucial constructs used in the implementation are dictionaries and run-time type representations. We formalize these aspects type-theoretically in a way that corresponds in spirit to the implementation techniques used in practice. Both the techniques and the formalization also help us understand the range of possible implementation techniques for other languages, e.g., ML, especially when additional source language constructs such as run-time types are supported. A useful by-product of this study is a type system for a subset of the polymorphic IL proposed for the .NET CLR.</p>
<h3 id="runtime-verification-of-net-contracts-barnett--schulte-2003"><a href="https://www.microsoft.com/en-us/research/publication/runtime-verification-of-net-contracts/"><strong>Runtime Verification of .NET Contracts</strong> (Barnett & Schulte, 2003)</a></h3>
<p><strong>Abstract</strong></p>
<p>We propose a method for implementing behavioral interface specifications on the .NET platform. Our interface specifications are expressed as executable model programs. Model programs can be run either as stand-alone simulations or used as contracts to check the conformance of an implementation class to its specification. We focus on the latter, which we call runtime verification. In our framework, model programs are expressed in the new specification language AsmL. We describe how AsmL can be used to describe contracts independently from any implementation language, how AsmL allows properties of component interaction to be specified using mandatory calls, and how AsmL is used to check the behavior of a component written in any of the .NET languages, such as VB, C#, or C++.</p>
<h3 id="design-and-implementation-of-generics-for-the-net-common-language-runtime-kennedy--syme-2001"><a href="https://www.microsoft.com/en-us/research/publication/design-and-implementation-of-generics-for-the-net-common-language-runtime/"><strong>Design and Implementation of Generics for the .NET Common Language Runtime</strong> (Kennedy & Syme, 2001)</a></h3>
<p><strong>Abstract</strong></p>
<p>The Microsoft .NET Common Language Runtime provides a shared type system, intermediate language and dynamic execution environment for the implementation and inter-operation of multiple source languages. In this paper we extend it with direct support for parametric polymorphism (also known as generics), describing the design through examples written in an extended version of the C# programming language, and explaining aspects of implementation by reference to a prototype extension to the runtime. Our design is very expressive, supporting parameterized types, polymorphic static, instance and virtual methods, “F-bounded” type parameters, instantiation at pointer and value types, polymorphic recursion, and exact run-time types. The implementation takes advantage of the dynamic nature of the runtime, performing just-in-time type specialization, representation-based code sharing and novel techniques for efficient creation and use of run-time types. Early performance results are encouraging and suggest that programmers will not need to pay an overhead for using generics, achieving performance almost matching hand-specialized code.</p>
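<p>Two of the features listed in the abstract are easy to see in today's C#. The sketch below (my own, using the shipped C# syntax rather than the paper's prototype) shows an "F-bounded" type parameter and exact run-time types; <code>GenericsDemo</code>, <code>Max</code> and <code>NameOf</code> are names invented for the example.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

static class GenericsDemo
{
    // "F-bounded" constraint: T must be comparable to itself.
    public static T Max&lt;T&gt;(T a, T b) where T : IComparable&lt;T&gt;
        =&gt; a.CompareTo(b) &gt;= 0 ? a : b;

    // Exact run-time types: T is still known at run time, it is not erased.
    public static string NameOf&lt;T&gt;() =&gt; typeof(T).Name;

    static void Main()
    {
        Console.WriteLine(Max(3, 7));          // 7, instantiated at a value type
        Console.WriteLine(NameOf&lt;int&gt;());      // Int32
        // Different instantiations are different types at run time.
        Console.WriteLine(new List&lt;int&gt;().GetType() == typeof(List&lt;string&gt;));   // False
    }
}
</code></pre></div></div>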
<h3 id="typing-a-multi-language-intermediate-code-gordon--syme-2001"><a href="https://www.microsoft.com/en-us/research/publication/typing-a-multi-language-intermediate-code/"><strong>Typing a Multi-Language Intermediate Code</strong> (Gordon & Syme, 2001)</a></h3>
<p><strong>Abstract</strong></p>
<p>The Microsoft .NET Framework is a new computing architecture designed to support a variety of distributed applications and web-based services. .NET software components are typically distributed in an object-oriented intermediate language, Microsoft IL, executed by the Microsoft Common Language Runtime. To allow convenient multi-language working, IL supports a wide variety of high-level language constructs, including class-based objects, inheritance, garbage collection, and a security mechanism based on type safe execution. This paper precisely describes the type system for a substantial fragment of IL that includes several novel features: certain objects may be allocated either on the heap or on the stack; those on the stack may be boxed onto the heap, and those on the heap may be unboxed onto the stack; methods may receive arguments and return results via typed pointers, which can reference both the stack and the heap, including the interiors of objects on the heap. We present a formal semantics for the fragment. Our typing rules determine well-typed IL instruction sequences that can be assembled and executed. Of particular interest are rules to ensure no pointer into the stack outlives its target. Our main theorem asserts type safety, that well-typed programs in our IL fragment do not lead to untrapped execution errors. Our main theorem does not directly apply to the product. Still, the formal system of this paper is an abstraction of informal and executable specifications we wrote for the full product during its development. Our informal specification became the basis of the product team’s working specification of type-checking. The process of writing this specification, deploying the executable specification as a test oracle, and applying theorem proving techniques, helped us identify several security critical bugs during development.</p>
<hr />
<h2 id="mono-runtime"><strong>Mono Runtime</strong></h2>
<h3 id="static-and-dynamic-analysis-of-android-malware-and-goodware-written-with-unity-framework-shim-lim-cho-han--park-2018"><a href="https://www.semanticscholar.org/paper/Static-and-Dynamic-Analysis-of-Android-Malware-and-Shim-Lim/e270ebdde1988b4d99d7721d664c046ffaa366f1"><strong>Static and Dynamic Analysis of Android Malware and Goodware Written with Unity Framework</strong> (Shim, Lim, Cho, Han & Park, 2018)</a></h3>
<p><strong>Abstract</strong></p>
<p>Unity is the most popular cross-platform development framework to develop games for multiple platforms such as Android, iOS, and Windows Mobile. While Unity developers can easily develop mobile apps for multiple platforms, adversaries can also easily build malicious apps based on the “write once, run anywhere” (WORA) feature. Even though malicious apps were discovered among Android apps written with Unity framework (Unity apps), little research has been done on analysing the malicious apps. We propose static and dynamic reverse engineering techniques for malicious Unity apps. We first inspect the executable file format of a Unity app and present an effective static analysis technique of the Unity app. Then, we also propose a systematic technique to analyse dynamically the Unity app. Using the proposed techniques, the malware analyst can statically and dynamically analyse Java code, native code in C or C++, and the Mono runtime layer where the C# code is running.</p>
<h3 id="reducing-startup-time-of-a-deterministic-virtualizing-runtime-environment-däumler--werner-2013"><a href="https://dl.acm.org/citation.cfm?id=2463604"><strong>Reducing startup time of a deterministic virtualizing runtime environment</strong> (Däumler & Werner, 2013)</a></h3>
<p><strong>Abstract</strong></p>
<p>Virtualized runtime environments like Java Virtual Machine (JVM) or Microsoft .NET’s Common Language Runtime (CLR) introduce additional challenges to real-time software development. Since applications for such environments are usually deployed in platform independent intermediate code, one issue is the timing of code transformation from intermediate code into native code. We have developed a solution for this problem, so that code transformation is suitable for real-time systems. It combines pre-compilation of intermediate code with the elimination of indirect references in native code. The gain of determinism comes with an increased application startup time. In this paper we present an optimization that utilizes an Ahead-of-Time compiler to reduce the startup time while keeping the real-time suitable timing behaviour. In an experiment we compare our approach with existing ones and demonstrate its benefits for certain application cases.</p>
<h3 id="detecting-clones-across-microsoft-net-programming-languages-al-omari--keivanloo-roy--rilling-2012"><a href="https://www.semanticscholar.org/paper/Detecting-Clones-Across-Microsoft-.NET-Programming-Al-Omari-Keivanloo/22241ada86ef977315cc6c5978ce0a0636e1850c#paper-header"><strong>Detecting Clones Across Microsoft .NET Programming Languages</strong> (Al-Omari, Keivanloo, Roy & Rilling, 2012)</a></h3>
<p><strong>Abstract</strong></p>
<p>The Microsoft .NET framework and its language family focus on multi-language development to support interoperability across several programming languages. The framework allows for the development of similar applications in different languages through the reuse of core libraries. As a result of such a multi-language development, the identification and traceability of similar code fragments (clones) becomes a key challenge. In this paper, we present a clone detection approach for the .NET language family. The approach is based on the Common Intermediate Language, which is generated by the .NET compiler for the different languages within the .NET framework. In order to achieve an acceptable recall while maintaining the precision of our detection approach, we define a set of filtering processes to reduce noise in the raw data. We show that these filters are essential for Intermediate Language-based clone detection, without significantly affecting the precision of the detection approach. Finally, we study the quantitative and qualitative performance aspects of our clone detection approach. We evaluate the number of reported candidate clone-pairs, as well as the precision and recall (using manual validation) for several open source cross-language systems, to show the effectiveness of our proposed approach.</p>
<h3 id="language-independent-sandboxing-of-just-in-time-compilation-and-self-modifying-code-ansel--marchenko-2012"><a href="https://www.researchgate.net/publication/314841984_Language-independent_sandboxing_of_just-in-time_compilation_and_self-modifying_code"><strong>Language-independent sandboxing of just-in-time compilation and self-modifying code</strong> (Ansel & Marchenko, 2012)</a></h3>
<p><strong>Abstract</strong></p>
<p>When dealing with dynamic, untrusted content, such as on the Web, software behavior must be sandboxed, typically through use of a language like JavaScript. However, even for such specially-designed languages, it is difficult to ensure the safety of highly-optimized, dynamic language runtimes which, for efficiency, rely on advanced techniques such as Just-In-Time (JIT) compilation, large libraries of native-code support routines, and intricate mechanisms for multi-threading and garbage collection. Each new runtime provides a new potential attack surface and this security risk raises a barrier to the adoption of new languages for creating untrusted content. Removing this limitation, this paper introduces general mechanisms for safely and efficiently sandboxing software, such as dynamic language runtimes, that make use of advanced, low-level techniques like runtime code modification. Our language-independent sandboxing builds on Software-based Fault Isolation (SFI), a traditionally static technique. We provide a more flexible form of SFI by adding new constraints and mechanisms that allow safety to be guaranteed despite runtime code modifications. We have added our extensions to both the x86-32 and x86-64 variants of a production-quality, SFI-based sandboxing platform; on those two architectures SFI mechanisms face different challenges. We have also ported two representative language platforms to our extended sandbox: the Mono common language runtime and the V8 JavaScript engine. In detailed evaluations, we find that sandboxing slowdown varies between different benchmarks, languages, and hardware platforms. Overheads are generally moderate and they are close to zero for some important benchmark/platform combinations.</p>
<h3 id="vmkit-a-substrate-for-managed-runtime-environments-geoffray-thomas-lawall-muller--folliot-2010"><a href="https://www.researchgate.net/publication/221137881_VMKit_a_Substrate_for_Managed_Runtime_Environments"><strong>VMKit: a Substrate for Managed Runtime Environments</strong> (Geoffray, Thomas, Lawall, Muller & Folliot, 2010)</a></h3>
<p><strong>Abstract</strong></p>
<p>Managed Runtime Environments (MREs), such as the JVM and the CLI, form an attractive environment for program execution, by providing portability and safety, via the use of a bytecode language and automatic memory management, as well as good performance, via just-in-time (JIT) compilation. Nevertheless, developing a fully featured MRE, including e.g. a garbage collector and JIT compiler, is a herculean task. As a result, new languages cannot easily take advantage of the benefits of MREs, and it is difficult to experiment with extensions of existing MRE based languages. This paper describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level MREs. We have successfully used VMKit to build two MREs: a Java Virtual Machine and a Common Language Runtime. We provide an extensive study of the lessons learned in developing this infrastructure, and assess the ease of implementing new MREs or MRE extensions and the resulting performance. In particular, it took one of the authors only one month to develop a Common Language Runtime using VMKit. VMKit furthermore has performance comparable to the well established open source MREs Cacao, Apache Harmony and Mono, and is 1.2 to 3 times slower than JikesRVM on most of the Dacapo benchmarks.</p>
<h3 id="mmc-the-mono-model-checker-ruys--aan-de-brugh-2007"><a href="https://www.sciencedirect.com/science/article/pii/S1571066107005348"><strong>MMC: the Mono Model Checker</strong> (Ruys & Aan de Brugh, 2007)</a></h3>
<p><strong>Abstract</strong></p>
<p>The Mono Model Checker (mmc) is a software model checker for cil bytecode programs. mmc has been developed on the Mono platform. mmc is able to detect deadlocks and assertion violations in cil programs. The design of mmc is inspired by the Java PathFinder (jpf), a model checker for Java programs. The performance of mmc is comparable to jpf. This paper introduces mmc and presents its main architectural characteristics.</p>
<h3 id="numeric-performance-in-c-c-and-java-sestoft-2007"><a href="https://www.researchgate.net/publication/228380860_Numeric_performance_in_C_C_and_Java"><strong>Numeric performance in C, C# and Java</strong> (Sestoft, 2007)</a></h3>
<p><strong>Abstract</strong></p>
<p>We compare the numeric performance of C, C# and Java on three small cases.</p>
<h3 id="mono-versus-net-a-comparative-study-of-performance-for-distributed-processing-blajian-eggen-eggen--pitts-2006"><a href="https://www.researchgate.net/publication/221134118_Mono_versus_net_A_Comparative_Study_of_Performance_for_Distributed_Processing"><strong>Mono versus .Net: A Comparative Study of Performance for Distributed Processing.</strong> (Blajian, Eggen, Eggen & Pitts, 2006)</a></h3>
<p><strong>Abstract</strong></p>
<p>Microsoft has released .NET, a platform dependent standard for the C# programming language. Sponsored by Ximian/Novell, Mono, the open source development platform based on the .NET framework, has been developed to be a platform independent version of the C# programming environment. While .NET is platform dependent, Mono allows developers to build Linux and crossplatform applications. Mono’s .NET implementation is based on the ECMA standards for C#. This paper examines both of these programming environments with the goal of evaluating the performance characteristics of each. Testing is done with various algorithms. We also assess the trade-offs associated with using a cross-platform versus a platform.</p>
<h3 id="automated-detection-of-performance-regressions-the-mono-experience-kalibera-bulej--tuma-2005"><a href="https://www.semanticscholar.org/paper/Automated-detection-of-performance-regressions%3A-the-Kalibera-Bulej/3b4c7756340df10a7c73a5646931763a9e81ee05"><strong>Automated detection of performance regressions: the mono experience</strong> (Kalibera, Bulej & Tuma, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>Engineering a large software project involves tracking the impact of development and maintenance changes on the software performance. An approach for tracking the impact is regression benchmarking, which involves automated benchmarking and evaluation of performance at regular intervals. Regression benchmarking must tackle the nondeterminism inherent to contemporary computer systems and execution environments and the impact of the nondeterminism on the results. On the example of a fully automated regression benchmarking environment for the mono open-source project, we show how the problems associated with nondeterminism can be tackled using statistical methods.</p>
<hr />
<h2 id="shared-source-common-language-infrastructure-sscli---aka-rotor"><strong>Shared Source Common Language Infrastructure</strong> (SSCLI) - a.k.a ‘<strong>Rotor</strong>’</h2>
<h3 id="efficient-virtual-machine-support-of-runtime-structural-reflection-ortina-redondoa--perez-schofield-2009"><a href="https://www.sciencedirect.com/science/article/pii/S0167642309000689"><strong>Efficient virtual machine support of runtime structural reflection</strong> (Ortin, Redondo & Perez-Schofield, 2009)</a></h3>
<p><strong>Abstract</strong></p>
<p>Increasing trends towards adaptive, distributed, generative and pervasive software have made object-oriented dynamically typed languages become increasingly popular. These languages offer dynamic software evolution by means of reflection, facilitating the development of dynamic systems. Unfortunately, this dynamism commonly imposes a runtime performance penalty. In this paper, we describe how to extend a production JIT-compiler virtual machine to support runtime object-oriented structural reflection offered by many dynamic languages. Our approach improves runtime performance of dynamic languages running on statically typed virtual machines. At the same time, existing statically typed languages are still supported by the virtual machine.</p>
<p>We have extended the .Net platform with runtime structural reflection adding prototype-based object-oriented semantics to the statically typed class-based model of .Net, supporting both kinds of programming languages. The assessment of runtime performance and memory consumption has revealed that a direct support of structural reflection in a production JIT-based virtual machine designed for statically typed languages provides a significant performance improvement for dynamically typed languages.</p>
<h3 id="extending-the-sscli-to-support-dynamic-inheritance-redondo-ortin--perez-schofield-2008"><a href="https://link.springer.com/chapter/10.1007/978-3-642-05201-9_2"><strong>Extending the SSCLI to Support Dynamic Inheritance</strong> (Redondo, Ortin & Perez-Schofield, 2008)</a></h3>
<p><strong>Abstract</strong></p>
<p>This paper presents a step forward on a research trend focused on increasing runtime adaptability of commercial JIT-based virtual machines, describing how to include dynamic inheritance into this kind of platforms. A considerable amount of research aimed at improving runtime performance of virtual machines has converted them into the ideal support for developing different types of software products. Current virtual machines do not only provide benefits such as application interoperability, distribution and code portability, but they also offer a competitive runtime performance.</p>
<p>Since JIT compilation has played a very important role in improving runtime performance of virtual machines, we first extended a production JIT-based virtual machine to support efficient language-neutral structural reflective primitives of dynamically typed programming languages. This article presents the next step in our research work: supporting language-neutral dynamic inheritance for both statically and dynamically typed programming languages. Executing both kinds of programming languages over the same platform provides a direct interoperation between them.</p>
<h3 id="sampling-profiler-for-rotor-as-part-of-optimizing-compilation-system-chilingarova--safonov-2006"><a href="https://www.semanticscholar.org/paper/Sampling-profiler-for-Rotor-as-part-of-optimizing-Chilingarova/99e874d7fd5ab5f890d599759a875ef2212e6a3b"><strong>Sampling profiler for Rotor as part of optimizing compilation system</strong> (Chilingarova & Safonov, 2006)</a></h3>
<p><strong>Abstract</strong></p>
<p>This paper describes a low-overhead self-tuning sampling-based runtime profiler integrated into SSCLI virtual machine. Our profiler estimates how “hot” a method is and builds a call context graph based on managed stack samples analysis. The frequency of sampling is tuned dynamically at runtime, based on the information of how often the same activation record appears on top of the stack. The call graph is presented as a novel Call Context Map (CC-Map) structure that combines compact representation and accurate information about the context. It enables fast extraction of data helpful in making compilation decisions, as well as fast placing data into the map. Sampling mechanism is integrated with intrinsic Rotor mechanisms of thread preemption and stack walk. A separate system thread is responsible for organizing data in the CC-Map. This thread gathers and stores samples quickly queued by managed threads, thus decreasing the time they must hold up their user-scheduled job.</p>
<h3 id="to-jit-or-not-to-jit-the-effect-of-code-pitching-on-the-performance-of-net-framework-anthony-leung--srisa-an-2005"><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.9138"><strong>To JIT or not to JIT: The effect of code-pitching on the performance of .NET framework</strong> (Anthony, Leung & Srisa-an, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>The .NET Compact Framework is designed to be a high-performance virtual machine for mobile and embedded devices that operate on Windows CE (version 4.1 and later). It achieves fast execution time by compiling methods dynamically instead of using interpretation. Once compiled, these methods are stored in a portion of the heap called code-cache and can be reused quickly to satisfy future method calls. While code-cache provides a high-level of reusability, it can also use a large amount of memory. As a result, the Compact Framework provides a “code pitching” mechanism that can be used to discard the previously compiled methods as needed. In this paper, we study the effect of code pitching on the overall performance and memory utilization of .NET applications. We conduct our experiments using Microsoft’s Shared-Source Common Language Infrastructure (SSCLI). We profile the access behavior of the compiled methods. We also experiment with various code-cache configurations to perform pitching. We find that programs can operate efficiently with a small code-cache without incurring substantial recompilation and execution overheads.</p>
<h3 id="adding-structural-reflection-to-the-sscli-ortin-redondo-vinuesa--lovelle-2005"><a href="https://www.researchgate.net/publication/249898327_Adding_structural_reflection_to_the_SSCLI"><strong>Adding structural reflection to the SSCLI</strong> (Ortin, Redondo, Vinuesa & Lovelle, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>Although dynamic languages are becoming widely used due to the flexibility needs of specific software products, their major drawback is their runtime performance. Compiling the source program to an abstract machine’s intermediate language is the current technique used to obtain the best performance results. This intermediate code is then executed by a virtual machine developed as an interpreter. Although JIT adaptive optimizing compilation is currently used to speed up Java and .net intermediate code execution, this practice has not been employed successfully in the implementation of dynamically adaptive platforms yet. We present an approach to improve the runtime performance of a specific set of structural reflective primitives, extensively used in adaptive software development. Looking for a better performance, as well as interaction with other languages, we have employed the Microsoft Shared Source CLI platform, making use of its JIT compiler. The SSCLI computational model has been enhanced with semantics of the prototype-based object-oriented computational model. This model is much more suitable for reflective environments. The initial assessment of performance results reveals that augmenting the semantics of the SSCLI model, together with JIT generation of native code, produces better runtime performance than the existing implementations.</p>
<h3 id="static-analysis-for-identifying-and-allocating-clusters-of-immortal-objects-ravindar--srikant-2005"><a href="https://www.researchgate.net/publication/252238722_Static_Analysis_for_Identifying_and_Allocating_Clusters_of_Immortal_Objects"><strong>Static Analysis for Identifying and Allocating Clusters of Immortal Objects</strong> (Ravindar & Srikant, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>Long living objects lengthen the trace time which is a critical phase of the garbage collection process. However, it is possible to recognize object clusters i.e. groups of long living objects having approximately the same lifetime and treat them separately to reduce the load on the garbage collector and hence improve overall performance. Segregating objects this way leaves the heap for objects with shorter lifetimes and now a typical collection can find more garbage than before. In this paper, we describe a compile time analysis strategy to identify object clusters in programs. The result of the compile time analysis is the set of allocation sites that contribute towards allocating objects belonging to such clusters. All such allocation sites are replaced by a new allocation method that allocates objects into the cluster area rather than the heap. This study was carried out for a concurrent collector which we developed for Rotor, Microsoft’s Shared Source Implementation of .NET. We analyze the performance of the program with combinations of the cluster and stack allocation optimizations. Our results show that the clustering optimization reduces the number of collections by 66.5% on average, even eliminating the need for collection in some programs. As a result, the total pause time reduces by 62.8% on average. Using both stack allocation and the cluster optimizations brings down the number of collections by 91.5% thereby improving the total pause time by 79.33%.</p>
<h3 id="an-optimizing-just-intime-compiler-for-rotor-trindade--silva-2005"><a href="https://www.semanticscholar.org/paper/An-Optimizing-Just-InTime-Compiler-for-Rotor-Trindade-Silva/6ef2de17e42fa2ad74799211d87ceaa6edc6e8bb"><strong>An Optimizing Just-InTime Compiler for Rotor</strong> (Trindade & Silva, 2005)</a></h3>
<p><strong>Abstract</strong></p>
<p>The Shared Source CLI (SSCLI), also known as Rotor, is an implementation of the CLI released by Microsoft in source code. Rotor includes a single pass just-in-time compiler that generates non-optimized code for Intel IA-32 and IBM PowerPC processors. We extend Rotor with an optimizing just-in-time compiler for IA-32. This compiler has three passes: control flow graph generation, data dependence graph generation and final code generation. Dominance relations in the control flow graph are used to detect natural loops. A number of optimizations are performed during the generation of the data dependence graph. During native code generation, the rich address modes of IA32 are used for instruction folding, reducing code size and usage of register names. Despite the overhead of three passes and optimizations, this compiler is only 1.4 to 1.9 times slower than the original SSCLI compiler and generates code that runs 6.4 to 10 times faster.</p>
<h3 id="software-interactions-into-the-sscli-platform-charfi--emsellem-2004"><a href="https://www.semanticscholar.org/paper/SOFTWARE-INTERACTIONS-INTO-THE-SSCLI-PLATFORM-Charfi-Emsellem/b51d387d9ed8fb73d6f2449469a8276d40293fb2"><strong>Software Interactions into the SSCLI platform</strong> (Charfi & Emsellem, 2004)</a></h3>
<p><strong>Abstract</strong></p>
<p>By using an Interaction Specification Language (ISL), interactions between components can be expressed in a language independent way. At class level, interaction pattern specified in ISL represent models of future interactions when applied on some component instances. The Interaction Server is in charge of managing the life cycle of interactions (interaction pattern registration and instantiation, destruction of interactions, merging). It acts as a central repository that keeps the global coherency of the adaptations realized on the component instances. The Interaction service allows creating interactions between heterogeneous components. Noah is an implementation of this Interaction Service. It can be thought as a dynamic aspect repository with a weaver that uses an aspect composition mechanism that insures commutable and associative adaptations. In this paper, we propose the implementation of the Interaction Service in the SSCLI. In contrast to other implementations such as Java where interaction management represents an additional layer, SSCLI enables us to integrate Interaction Management as an intrinsic part of the CLI runtime.</p>
<h3 id="experience-integrating-a-new-compiler-and-a-new-garbage-collector-into-rotor-anderson-eng-glew-lewis-menon--stichnoth-2004"><a href="https://www.semanticscholar.org/paper/Experience-Integrating-a-New-Compiler-and-a-New-Anderson-Eng/55f0fd31010e8b7a3d92cbb4ab012db75deff1aa"><strong>Experience Integrating a New Compiler and a New Garbage Collector Into Rotor</strong> (Anderson, Eng, Glew, Lewis, Menon & Stichnoth, 2004)</a></h3>
<p><strong>Abstract</strong></p>
<p>Microsoft’s Rotor is a shared-source CLI implementation intended for use as a research platform. It is particularly attractive for research because of its complete implementation and extensive libraries, and because its modular design allows different implementations of certain components such as just-in-time compilers (JITs). Our group has independently developed our own high-performance JIT and garbage collector (GC) and wanted to take advantage of Rotor to experiment with these components in a CLI environment. In this paper, we describe our experience integrating these components into Rotor and evaluate the flexibility of Rotor’s design toward this goal. We found it easier to integrate our JIT than our GC because Rotor has a well-defined interface for the former but not the latter. However, our JIT integration still required significant changes to both Rotor and our JIT. For example, we modified Rotor to support multiple JITs. We also added support for a second JIT manager in Rotor, and implemented a new code manager compatible with our JIT. We had to change our JIT compiler to support Rotor’s calling conventions, helper functions, and exception model. Our GC integration was complicated by the many places in Rotor where components make assumptions about how its garbage collector is implemented, as well as Rotor’s lack of a well-defined GC interface. We also had to reconcile the different assumptions made by Rotor and our garbage collector about the layout of objects, virtual-method tables, and thread structures.</p>
"Stubs" in the .NET Runtime2019-09-26T00:00:00+00:00http://www.mattwarren.org/2019/09/26/Stubs-in-the-.NET-Runtime
<p>As the saying goes:</p>
<blockquote>
<p>“All problems in computer science can be solved by another level of indirection”</p>
<p>- <a href="https://www2.dmst.aueb.gr/dds/pubs/inbook/beautiful_code/html/Spi07g.html#another_level_of_indirection">David Wheeler</a></p>
</blockquote>
<p>and it certainly seems like the ‘.NET Runtime’ Engineers took this advice to heart!</p>
<p><a href="https://en.wikipedia.org/wiki/Method_stub">‘Stubs’</a>, as they’re known in the runtime (sometimes <a href="https://en.wikipedia.org/wiki/Thunk">‘Thunks’</a>), provide a level of indirection throughout the source code; there are almost <a href="https://github.com/dotnet/coreclr/search?q=stub+OR+thunk&unscoped_q=stub+OR+thunk">500 mentions of them</a>!</p>
<p>This post will explore <strong>what</strong> they are, <strong>how</strong> they work and <strong>why</strong> they’re needed.</p>
<hr />
<p><strong>Table of Contents</strong></p>
<ul>
<li><a href="#what-are-stubs">What are stubs?</a>
<ul>
<li><a href="#why-are-stubs-needed">Why are stubs needed?</a></li>
<li><a href="#clr-application-binary-interface-abi">CLR ‘Application Binary Interface’ (ABI)</a></li>
<li><a href="#stub-management">Stub Management</a></li>
</ul>
</li>
<li><a href="#types-of-stubs">Types of stubs</a>
<ul>
<li><a href="#precode">Precode</a>
<ul>
<li><a href="#just-in-time-jit-and-tiered-compilation">‘Just-in-time’ (JIT) and ‘Tiered’ Compilation</a></li>
</ul>
</li>
<li><a href="#stubs-as-il">Stubs-as-IL</a></li>
<li><a href="#pinvoke-reverse-pinvoke-and-calli">P/Invoke, Reverse P/Invoke and ‘calli’</a>
<ul>
<li><a href="#marshalling">Marshalling</a></li>
</ul>
</li>
<li><a href="#generics">Generics</a></li>
<li><a href="#delegates">Delegates</a>
<ul>
<li><a href="#singlecast-delegates">Singlecast Delegates</a></li>
<li><a href="#shuffle-thunks">Shuffle Thunks</a></li>
</ul>
</li>
<li><a href="#unboxing">Unboxing</a></li>
<li><a href="#arrays">Arrays</a></li>
<li><a href="#tail-calls">Tail Calls</a></li>
<li><a href="#virtual-stub-dispatch-vsd">Virtual Stub Dispatch (VSD)</a></li>
</ul>
</li>
<li><a href="#other-types-of-stubs">Other Types of Stubs</a></li>
<li><a href="#stubs-in-the-mono-runtime">Stubs in the Mono Runtime</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
<hr />
<h2 id="what-are-stubs">What are stubs?</h2>
<p>In the context of the .NET Runtime, ‘stubs’ look something like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Call-site Callee
+--------------+ +---------+ +-------------+
| | | | | |
| +---------->+ Stub + - - - - ->+ |
| | | | | |
+--------------+ +---------+ +-------------+
</code></pre></div></div>
<p>So they sit between a method ‘<em>call-site</em>’ (i.e. code such as <code class="language-plaintext highlighter-rouge">var result = Foo(..);</code>) and the ‘<em>callee</em>’ (where the method itself is implemented, the native/assembly code) and I like to think of them as doing <strong>tidy-up</strong> or <strong>fix-up</strong> work. Note that moving from the ‘stub’ to the ‘callee’ isn’t another full method call (hence the dotted line), it’s often just a single <code class="language-plaintext highlighter-rouge">jmp</code> or <code class="language-plaintext highlighter-rouge">call</code> assembly instruction, so the 2nd transition doesn’t involve all the same work that was initially done at the call-site (pushing/popping arguments into registers, increasing the stack space, etc).</p>
<p>The stubs themselves can be as simple as just a few assembly instructions or something more complicated, we’ll look at individual examples <a href="#types-of-stubs">later on in this post</a>.</p>
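<p>To make the diagram a bit more concrete, here is a rough analogy written in plain C#. It is only a sketch: real stubs are small pieces of machine code created by the runtime, not delegates, and the names used here (<code class="language-plaintext highlighter-rouge">StubAnalogy</code>, <code class="language-plaintext highlighter-rouge">slot</code>, <code class="language-plaintext highlighter-rouge">FixupStub</code>) are made up for illustration. It models one possible flavour of ‘fix-up’ work, where the stub runs once, re-points the call at the real ‘callee’ and then forwards the call:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class StubAnalogy
{
    // The 'callee': the real implementation we eventually want to reach.
    static int Square(int x) =&gt; x * x;

    // The slot every call-site goes through; it starts out pointing at the stub.
    static Func&lt;int, int&gt; slot = FixupStub;

    // Counts how often the fix-up work ran (only here so we can observe it).
    public static int FixupCount;

    // The 'stub': does its one-off fix-up work, then forwards the call.
    static int FixupStub(int x)
    {
        FixupCount++;
        slot = Square;     // later calls skip the stub entirely
        return Square(x);  // forward this call to the callee
    }

    // The 'call-site' only knows about the slot, not what is behind it.
    public static int Call(int x) =&gt; slot(x);

    static void Main()
    {
        Console.WriteLine(Call(3));     // first call goes via the stub
        Console.WriteLine(Call(4));     // second call goes straight to Square
        Console.WriteLine(FixupCount);  // the fix-up work ran once
    }
}
</code></pre></div></div>
<p>The point to take away is that the call-site never changes: it always calls through the same slot, and whatever sits behind that slot can be swapped without the caller noticing.</p>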
<p>Now, to be clear, not <em>all</em> method calls require a stub: a regular call to a <em>static</em> or <em>instance</em> method just goes directly from the ‘call-site’ to the ‘callee’. But once you involve <a href="#virtual-stub-dispatch-vsd">virtual methods</a>, <a href="#delegates">delegates</a> or <a href="#generics">generics</a> things get a bit more complicated.</p>
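<p>As a quick reference, this is what those different kinds of calls look like in C#. The types and methods (<code class="language-plaintext highlighter-rouge">IShape</code>, <code class="language-plaintext highlighter-rouge">Circle</code>, <code class="language-plaintext highlighter-rouge">CallKinds</code>) are hypothetical, and the comments only repeat the rough categories from the paragraph above; whether a particular call really goes through a stub is up to the runtime and the JIT, which may optimise some of these away:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

interface IShape { double Area(); }

sealed class Circle : IShape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public double Area() =&gt; Math.PI * radius * radius;
    public double Diameter() =&gt; 2 * radius;
}

static class CallKinds
{
    static double Twice(double x) =&gt; 2 * x;
    static T Identity&lt;T&gt;(T value) =&gt; value;

    static void Main()
    {
        var circle = new Circle(1.0);

        // Regular static and instance calls: call-site straight to callee.
        double a = Twice(2.0);
        double b = circle.Diameter();

        // Call via an interface: the target depends on the runtime type.
        IShape shape = circle;
        double c = shape.Area();

        // Call via a delegate: the target is only known to the delegate object.
        Func&lt;double, double&gt; f = Twice;
        double d = f(2.0);

        // Call to a generic method instantiated over a reference type.
        string e = Identity("hello");

        Console.WriteLine($"{a} {b} {c} {d} {e}");
    }
}
</code></pre></div></div>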
<h3 id="why-are-stubs-needed">Why are stubs needed?</h3>
<p>There are several reasons that stubs need to be created by the runtime:</p>
<ul>
<li><strong>Required Functionality</strong>
<ul>
<li>For instance <a href="#delegates">Delegates</a> and <a href="#arrays">Arrays</a> <em>must</em> be provided by the runtime; their method bodies are not generated by the C#/F#/VB.NET compiler and neither do they exist in the <a href="https://github.com/dotnet/corefx">Base-Class Libraries</a>. This requirement is outlined in the <a href="/2018/04/06/Taking-a-look-at-the-ECMA-335-Standard-for-.NET/">ECMA 335 Spec</a>, for instance ‘Partition I’ in section ‘8.9.1 Array types’ says:
<blockquote>
<p>Exact array types are <strong>created automatically by the VES when they are required</strong>. Hence, the operations on an array type are defined by the CTS. These generally are: allocating the array based on size and lower-bound information, indexing the array to read and write a value, computing the address of an element of the array (a managed pointer), and querying for the rank, bounds, and the total number of values stored in the array.</p>
</blockquote>
<p>Likewise for delegates, which are covered in ‘I.8.9.3 Delegates’:</p>
<blockquote>
<p>While, for the most part, delegates appear to be simply another kind of user-defined class, they are tightly controlled. <strong>The implementations of the methods are provided by the VES, not user code</strong>. The only additional members that can be defined on delegate types are static or instance methods.</p>
</blockquote>
</li>
</ul>
</li>
<li><strong>Performance</strong>
<ul>
<li>Other types of ‘stubs’, such as <a href="#virtual-stub-dispatch-vsd">Virtual Stub Dispatch</a> and <a href="#generics">Generic Instantiation Stubs</a>, are there to make those operations perform well or to have a positive impact on the entire runtime, such as reducing the memory footprint (in the case of <a href="https://yizhang82.dev/dotnet-generics-sharing">‘shared generic code’</a>).</li>
</ul>
</li>
<li><strong>Consistent method calls</strong>
<ul>
<li>A final factor is that having ‘stubs’ makes the work of the JIT compiler easier. As we will see in the rest of the post, stubs deal with a variety of different types of method calls. This means that the JIT can generate more straightforward code for any given ‘call site’, because it (mostly) doesn’t care what’s happening in the ‘callee’. If stubs didn’t exist, for a given method call the JIT would have to generate different code depending on whether generics were involved or not, whether it was a virtual or non-virtual call, whether it was going via a delegate, etc. Stubs abstract a lot of this behaviour away from the JIT, allowing it to deal with a simpler ‘Application Binary Interface’ (ABI).</li>
</ul>
</li>
</ul>
<h3 id="clr-application-binary-interface-abi">CLR ‘Application Binary Interface’ (ABI)</h3>
<p>Therefore, another way to think about ‘stubs’ is that they are part of what makes the CLR-specific <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md">‘Application Binary Interface’ (ABI)</a> work.</p>
<p>All code needs to work with the ABI or ‘calling convention’ of the CPU/OS that it’s running on, for instance by following the <a href="https://en.wikipedia.org/wiki/X86_calling_conventions">x86 calling convention</a>, <a href="https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019">x64 calling convention</a> or <a href="https://wiki.osdev.org/System_V_ABI">System V ABI</a>. This applies across runtimes; for more on this see:</p>
<ul>
<li><a href="https://science.raphael.poss.name/go-calling-convention-x86-64.html">The Go low-level calling convention on x86-64</a></li>
<li><a href="https://stackoverflow.com/questions/41693637/whats-the-calling-convention-for-the-java-code-in-linux-platform">What’s the calling convention for the Java code in Linux platform?</a></li>
<li><a href="https://doc.rust-lang.org/nomicon/ffi.html">Rust docs - Foreign Function Interface</a></li>
<li><a href="https://doc.rust-lang.org/unstable-book/language-features/abi-thiscall.html">Rust docs - abi_thiscall</a></li>
<li><a href="https://blog.filippo.io/rustgo/">rustgo: calling Rust from Go with near-zero overhead</a></li>
</ul>
<p>As an aside, if you want more information about ‘calling conventions’, here are some links that I found useful:</p>
<ul>
<li><a href="https://www.codeproject.com/articles/1388/calling-conventions-demystified">Calling Conventions Demystified</a></li>
<li><a href="https://wiki.osdev.org/Calling_Conventions">OS Dev - Calling Conventions</a></li>
<li><a href="https://www.gamasutra.com/view/news/171088/x64_ABI_Intro_to_the_Windows_x64_calling_convention.php">x64 ABI: Intro to the Windows x64 calling convention</a></li>
<li><a href="https://blogs.msdn.microsoft.com/freik/2006/03/06/x64-abi-vs-x86-abi-aka-calling-conventions-for-amd64-em64t/">x64 ABI vs. x86 ABI (aka Calling Conventions for AMD64 & EM64T)</a></li>
</ul>
<p>However, on top of what the CLR <em>has</em> to support due to the CPU/OS conventions, it also has its own <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md">extended ABI</a> for <em>.NET-specific</em> use cases, including:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#the-this-pointer">“this” pointer</a>:
<blockquote>
<p><strong>The managed “this” pointer is treated like a new kind of argument not covered by the native ABI</strong>, so we chose to always pass it as the first argument in (AMD64) <code class="language-plaintext highlighter-rouge">RCX</code> or (ARM, ARM64) <code class="language-plaintext highlighter-rouge">R0</code>.
AMD64-only: Up to .NET Framework 4.5, the managed “this” pointer was treated just like the native “this” pointer (meaning it was the second argument when the call used a return buffer and was passed in <code class="language-plaintext highlighter-rouge">RDX</code> instead of <code class="language-plaintext highlighter-rouge">RCX</code>). Starting with .NET Framework 4.5, it is always the first argument.</p>
</blockquote>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#generics">Generics</a> or more specifically to handle ‘Shared generics’:
<blockquote>
<p><strong>In cases where the code address does not uniquely identify a generic instantiation of a method, then a ‘generic instantiation parameter’ is required</strong>. Often the “this” pointer can serve dual-purpose as the instantiation parameter. When the “this” pointer is not the generic parameter, <strong>the generic parameter is passed as an additional argument</strong>..</p>
</blockquote>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#hidden-parameters">Hidden Parameters</a>, covering ‘Stub dispatch’, ‘Fast Pinvoke’, ‘Calli Pinvoke’ and ‘Normal PInvoke’. For instance, here’s why ‘PInvoke’ has a hidden parameter:
<blockquote>
<p><em>Normal PInvoke</em> - The VM <strong>shares IL stubs based on signatures</strong>, but wants the <strong>right method to show up in call stack and exceptions</strong>, so the MethodDesc for the exact PInvoke is passed in the (x86) <code class="language-plaintext highlighter-rouge">EAX</code> / (AMD64) <code class="language-plaintext highlighter-rouge">R10</code> / (ARM, ARM64) <code class="language-plaintext highlighter-rouge">R12</code> (in the JIT: <code class="language-plaintext highlighter-rouge">REG_SECRET_STUB_PARAM</code>). Then in the IL stub, when the JIT gets <code class="language-plaintext highlighter-rouge">CORJIT_FLG_PUBLISH_SECRET_PARAM</code>, it must move the register into a compiler temp.</p>
</blockquote>
</li>
</ul>
<p>Not all of these scenarios need a stub (for instance, the ‘this’ pointer is handled directly by the JIT), but many do, as we’ll see in the rest of the post.</p>
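<p>To make the ‘generic instantiation parameter’ idea a bit more concrete, here’s a minimal C++ sketch. It is <em>not</em> CoreCLR code: <code class="language-plaintext highlighter-rouge">TypeHandle</code>, <code class="language-plaintext highlighter-rouge">DescribeShared</code> and the two wrapper functions are hypothetical names invented for illustration. The only point is that one shared body can serve several instantiations as long as something hands it an extra, hidden argument.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the runtime's per-instantiation type information.
struct TypeHandle {
    std::string name;
    std::size_t elementSize;
};

// One shared body serves every instantiation. The extra first argument plays
// the role of the hidden 'generic instantiation parameter'.
std::string DescribeShared(const TypeHandle* instantiation, std::size_t count) {
    assert(instantiation != nullptr);
    return instantiation->name + "[" + std::to_string(count) + "] uses " +
           std::to_string(instantiation->elementSize * count) + " bytes";
}

// Reference types all have pointer-sized elements, which is what makes it
// possible to share a single body between them in this sketch.
static const TypeHandle kStringHandle{"String", sizeof(void*)};
static const TypeHandle kObjectHandle{"Object", sizeof(void*)};

// Stand-ins for 'instantiating stubs': callers see the ordinary signature and
// the stub supplies the hidden argument before forwarding to the shared body.
std::string DescribeStrings(std::size_t count) {
    return DescribeShared(&kStringHandle, count);
}

std::string DescribeObjects(std::size_t count) {
    return DescribeShared(&kObjectHandle, count);
}
```

<p>A caller of <code class="language-plaintext highlighter-rouge">DescribeStrings(2)</code> never sees the extra argument, which is roughly the job an instantiating stub does for shared generic code.</p>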
<h3 id="stub-management">Stub Management</h3>
<p>So we’ve seen why stubs are needed and what type of functionality they can provide. But before we look at all the specific examples that exist in the <a href="https://github.com/dotnet/coreclr/tree/master/src">CoreCLR source</a>, I just wanted to take some time to understand the common or shared concerns that apply to all stubs.</p>
<p>Stubs in the CLR are snippets of assembly code, but they have to be stored in memory and have their life-time managed. Also, they have to play nice with the debugger, from <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-code-guide.md">What Every CLR Developer Must Know Before Writing Code</a>:</p>
<blockquote>
<p><strong>2.8 Is your code compatible with managed debugging?</strong></p>
<ul>
<li>..</li>
<li>If you add a new stub (or way to call managed code), make sure that you can <strong>source-level step-in (F11) it under the debugger</strong>. The debugger is not psychic. A source-level step-in needs to be able to go <strong>from the source-line before a call to the source-line after the call</strong>, or managed code developers will be very confused. If you make that call transition be a giant 500 line stub, you must cooperate with the debugger for it to know how to step-through it. (<strong>This is what StubManagers are all about. See <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/stubmgr.h">src\vm\stubmgr.h</a></strong>). Try doing a step-in through your new codepath under the debugger.</li>
</ul>
</blockquote>
<p>So every type of stub has a <code class="language-plaintext highlighter-rouge">StubManager</code> which deals with the allocation, storage and lookup of the stubs. The lookup is significant, as it provides the mapping from an arbitrary memory address to the type of stub (if any) that created the code. As an example, here’s what the <code class="language-plaintext highlighter-rouge">CheckIsStub_Internal(..)</code> method <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L2107-L2122">here</a> and <code class="language-plaintext highlighter-rouge">DoTraceStub(..)</code> method <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L2124-L2140">here</a> look like for the <code class="language-plaintext highlighter-rouge">DelegateInvokeStubManager</code>:</p>
<pre><code class="language-C++">BOOL DelegateInvokeStubManager::CheckIsStub_Internal(PCODE stubStartAddress)
{
    LIMITED_METHOD_DAC_CONTRACT;

    bool fIsStub = false;

#ifndef DACCESS_COMPILE
#ifndef _TARGET_X86_
    fIsStub = fIsStub || (stubStartAddress == GetEEFuncEntryPoint(SinglecastDelegateInvokeStub));
#endif
#endif // !DACCESS_COMPILE

    fIsStub = fIsStub || GetRangeList()->IsInRange(stubStartAddress);

    return fIsStub;
}

BOOL DelegateInvokeStubManager::DoTraceStub(PCODE stubStartAddress, TraceDestination *trace)
{
    LIMITED_METHOD_CONTRACT;

    LOG((LF_CORDB, LL_EVERYTHING, "DelegateInvokeStubManager::DoTraceStub called\n"));

    _ASSERTE(CheckIsStub_Internal(stubStartAddress));

    // If it's a MC delegate, then we want to set a BP & do a context-ful
    // manager push, so that we can figure out if this call will be to a
    // single multicast delegate or a multi multicast delegate
    trace->InitForManagerPush(stubStartAddress, this);

    LOG_TRACE_DESTINATION(trace, stubStartAddress, "DelegateInvokeStubManager::DoTraceStub");

    return TRUE;
}
</code></pre>
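<p>The ‘lookup’ part is the bit worth dwelling on, so here’s a much-simplified sketch of the idea in standalone C++. Everything in it (<code class="language-plaintext highlighter-rouge">SketchStubManager</code>, <code class="language-plaintext highlighter-rouge">AddRange</code>, <code class="language-plaintext highlighter-rouge">FindStubManager</code> taking a vector) is a made-up approximation for illustration, not the real runtime API. It only shows the shape of the problem: each manager owns some address ranges, and given an arbitrary address we ask each manager in turn whether it recognises it.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using PCODE = std::uintptr_t;

// Simplified stand-in for a stub manager: it owns a list of [start, end)
// address ranges and can answer "does this address belong to one of my stubs?".
class SketchStubManager {
public:
    explicit SketchStubManager(std::string name) : name_(std::move(name)) {}

    void AddRange(PCODE start, PCODE end) {
        assert(start < end);
        ranges_.push_back({start, end});
    }

    bool CheckIsStub(PCODE address) const {
        for (const auto& range : ranges_) {
            if (address >= range.first && address < range.second) {
                return true;
            }
        }
        return false;
    }

    const std::string& Name() const { return name_; }

private:
    std::string name_;
    std::vector<std::pair<PCODE, PCODE>> ranges_;
};

// Ask each registered manager in turn and return the first one that claims
// the address, or nullptr if the address isn't inside any known stub.
const SketchStubManager* FindStubManager(
    const std::vector<SketchStubManager>& managers, PCODE address) {
    for (const auto& manager : managers) {
        if (manager.CheckIsStub(address)) {
            return &manager;
        }
    }
    return nullptr;
}
```

<p>In this sketch a debugger-like caller would take the address it is about to step into, call <code class="language-plaintext highlighter-rouge">FindStubManager(..)</code> and, if it gets a manager back, ask that manager how to trace through the stub.</p>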
<p>The code to initialise the various stub managers is <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/appdomain.cpp#L1668-L1679">here</a> in <code class="language-plaintext highlighter-rouge">SystemDomain::Attach()</code> and by working through the list we can get a sense of what each category of stub does (plus the informative comments in the code help!)</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">PrecodeStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L972-L1123">here</a>
<ul>
<li>‘<em>Stub manager functions & globals</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">DelegateInvokeStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L2061-L2343">here</a>
<ul>
<li>‘<em>Since we don’t generate delegate invoke stubs at runtime on IA64, we can’t use the StubLinkStubManager for these stubs. Instead, we create an additional DelegateInvokeStubManager instead.</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">JumpStubStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L1334-L1381">here</a>
<ul>
<li>‘<em>Stub manager for jump stubs created by ExecutionManager::jumpStub() These are currently used only on the 64-bit targets IA64 and AMD64</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">RangeSectionStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L1383-L1591">here</a>
<ul>
<li>‘<em>Stub manager for code sections. It forwards the query to the more appropriate stub manager, or handles the query itself.</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ILStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L1593-L1893">here</a>
<ul>
<li>‘<em>This is the stub manager for IL stubs</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">InteropDispatchStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L1897-L2058">here</a>
<ul>
<li>‘<em>This is used to recognize GenericComPlusCallStub, VarargPInvokeStub, and GenericPInvokeCalliHelper.</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">StubLinkStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L1126-L1282">here</a></li>
<li><code class="language-plaintext highlighter-rouge">ThunkHeapStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L1286-L13321">here</a>
<ul>
<li>‘<em>Note, the only reason we have this stub manager is so that we can recgonize UMEntryThunks for IsTransitionStub. ..</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">TailCallStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L2350-L2488">here</a>
<ul>
<li>‘<em>This is the stub manager to help the managed debugger step into a tail call. It helps the debugger trace through JIT_TailCall().</em>’ (from stubmgr.h)</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ThePreStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1993-L2032">here</a> (in prestub.cpp)
<ul>
<li>‘<em>The following code manages the PreStub. All method stubs initially use the prestub.</em>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">VirtualCallStubManager</code> implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/virtualcallstub.cpp">here</a> (in virtualcallstub.cpp)
<ul>
<li>‘<em>VirtualCallStubManager is the heart of the stub dispatch logic. See the book of the runtime entry</em>’ (<a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md">BOTR - Virtual Stub Dispatch</a>)</li>
</ul>
</li>
</ul>
<p>Finally, we can also see the ‘StubManagers’ in action if we use the <code class="language-plaintext highlighter-rouge">eeheap</code> <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/sos-dll-sos-debugging-extension#commands">SOS command</a> to inspect the ‘heap dump’ of a .NET Process, as it helps report the size of the different ‘stub heaps’:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> !eeheap -loader
Loader Heap:
--------------------------------------
System Domain: 704fd058
LowFrequencyHeap: Size: 0x0(0)bytes.
HighFrequencyHeap: 002e2000(8000:1000) Size: 0x1000(4096)bytes.
StubHeap: 002ea000(2000:1000) Size: 0x1000(4096)bytes.
Virtual Call Stub Heap:
- IndcellHeap: Size: 0x0(0)bytes.
- LookupHeap: Size: 0x0(0)bytes.
- ResolveHeap: Size: 0x0(0)bytes.
- DispatchHeap: Size: 0x0(0)bytes.
- CacheEntryHeap: Size: 0x0(0)bytes.
Total size: 0x2000(8192)bytes
--------------------------------------
</code></pre></div></div>
<p>(output taken from <a href="https://blogs.msdn.microsoft.com/carlos/2009/11/09/net-generics-and-code-bloat-or-its-lack-thereof/">.NET Generics and Code Bloat (or its lack thereof)</a>)</p>
<p>You can see that in this case the entire ‘stub heap’ is taking up 4096 bytes and in addition there are more in-depth statistics covering the heaps used by <a href="#virtual-stub-dispatch-vsd">virtual call dispatch</a>.</p>
<hr />
<h2 id="types-of-stubs">Types of stubs</h2>
<p>The different stubs used by the runtime fall into 3 main categories:</p>
<ul>
<li><strong>Hand-written assembly code</strong> e.g. <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/amd64/PInvokeStubs.asm">/vm/amd64/PInvokeStubs.asm</a></li>
<li><strong>Dynamically emitted assembly code</strong>, implemented in C++, e.g. <code class="language-plaintext highlighter-rouge">StubLinkerCPU::EmitShuffleThunk(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/stubs.cpp#L1760-L1802">/vm/arm64/stubs.cpp</a></li>
<li>‘<strong>Stubs-as-IL</strong>’ which we discuss <a href="#stubs-as-il">later on in this post</a>, for example <code class="language-plaintext highlighter-rouge">COMDelegate::GetMulticastInvoke(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L2324-L2444">/vm/comdelegate.cpp</a></li>
</ul>
<p>Most stubs are wired up in <code class="language-plaintext highlighter-rouge">MethodDesc::DoPrestub(..)</code>, in this <a href="https://github.com/dotnet/coreclr/blob/964461ca69639003914fd4fedaf08baf1f388f7e/src/vm/prestub.cpp#L1891-L1941">section of code</a> or <a href="https://github.com/dotnet/coreclr/blob/964461ca69639003914fd4fedaf08baf1f388f7e/src/vm/prestub.cpp#L1801-LL1816">this section</a> for COM Interop. The stubs generated include the following (definitions taken from <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#kinds-of-methoddescs">BOTR - ‘Kinds of MethodDescs’</a>, also see <code class="language-plaintext highlighter-rouge">enum MethodClassification</code> <a href="https://github.com/dotnet/coreclr/blob/855491b895b187bdc396c491884a370b11d999e9/src/vm/method.hpp#L100-L125">here</a>):</p>
<ul>
<li><strong>Instantiating</strong> (<code class="language-plaintext highlighter-rouge">FEATURE_SHARE_GENERIC_CODE</code>, on by default) in <code class="language-plaintext highlighter-rouge">MakeInstantiatingStubWorker(..)</code> <a href="https://github.com/dotnet/coreclr/blob/964461ca69639003914fd4fedaf08baf1f388f7e/src/vm/prestub.cpp#L1505-L1552">here</a>
<ul>
<li><em>Used for less common IL methods that have generic instantiation or that do not have preallocated slot in method table.</em></li>
</ul>
</li>
<li><strong>P/Invoke</strong> (a.k.a <strong>NDirect</strong>) in <code class="language-plaintext highlighter-rouge">GetStubForInteropMethod(..)</code> <a href="https://github.com/dotnet/coreclr/blob/2832f54a6602cd4c0dff4fa65163345ab3ad953c/src/vm/dllimport.cpp#L5757-L5824">here</a>
<ul>
<li><em>P/Invoke methods. These are methods marked with DllImport attribute.</em></li>
</ul>
</li>
<li><strong>FCall</strong> methods in <code class="language-plaintext highlighter-rouge">ECall::GetFCallImpl(..)</code> <a href="https://github.com/dotnet/coreclr/blob/1f3f474a13bdde1c5fecdf8cd9ce525dbe5df000/src/vm/ecall.cpp#L355-L522">here</a>
<ul>
<li><em>Internal methods implemented in unmanaged code. These are methods marked with <code class="language-plaintext highlighter-rouge">MethodImplAttribute(MethodImplOptions.InternalCall)</code> attribute, delegate constructors and tlbimp constructors.</em></li>
</ul>
</li>
<li><strong>Array</strong> methods in <code class="language-plaintext highlighter-rouge">GenerateArrayOpStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/cab949098dcdab9d458f102eb59e81311bac45c4/src/vm/array.cpp#L1023-L1068">here</a>
<ul>
<li><em>Array methods whose implementation is provided by the runtime (Get, Set, Address)</em></li>
</ul>
</li>
<li><strong>EEImpl</strong> in <code class="language-plaintext highlighter-rouge">PCODE COMDelegate::GetInvokeMethodStub(EEImplMethodDesc* pMD)</code> <a href="https://github.com/dotnet/coreclr/blob/d3e39bc2f81e3dbf9e4b96347f62b49d8700336c/src/vm/comdelegate.cpp#L2075-L2118">here</a>
<ul>
<li><em>Delegate methods, implementation provided by the runtime</em></li>
</ul>
</li>
<li><strong>COM Interop</strong> (<code class="language-plaintext highlighter-rouge">FEATURE_COMINTEROP</code>, on by default) in <code class="language-plaintext highlighter-rouge">GetStubForInteropMethod(..)</code> <a href="https://github.com/dotnet/coreclr/blob/2832f54a6602cd4c0dff4fa65163345ab3ad953c/src/vm/dllimport.cpp#L5757-L5824">here</a>
<ul>
<li><em>COM interface methods. Since the non-generic interfaces can be used for COM interop by default, this kind is usually used for all interface methods.</em></li>
</ul>
</li>
<li><strong>Unboxing</strong> in <code class="language-plaintext highlighter-rouge">Stub * MakeUnboxingStubWorker(MethodDesc *pMD)</code> <a href="https://github.com/dotnet/coreclr/blob/964461ca69639003914fd4fedaf08baf1f388f7e/src/vm/prestub.cpp#L1470-L1502">here</a></li>
</ul>
<p>Right, now let’s look at the individual stubs in more detail.</p>
<h3 id="precode">Precode</h3>
<p>First up, we’ll take a look at ‘precode’ stubs, because they are used by all other types of stubs, as explained in the BotR page on <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode">Method Descriptors</a>:</p>
<blockquote>
<p>The precode is a small fragment of code used to implement temporary entry points and an <strong>efficient wrapper for stubs</strong>. Precode is a niche code-generator for these two cases, generating the most efficient code possible. In an ideal world, all native code dynamically generated by the runtime would be produced by the JIT. That’s not feasible in this case, given the specific requirements of these two scenarios. The basic precode on x86 may look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mov eax,pMethodDesc // Load MethodDesc into scratch register
jmp target // Jump to a target
</code></pre></div> </div>
<p><strong>Efficient Stub wrappers:</strong> The implementation of certain methods (e.g. P/Invoke, delegate invocation, multi dimensional array setters and getters) is provided by the runtime, typically as hand-written assembly stubs. Precode provides a <strong>space-efficient wrapper over stubs, to multiplex them for multiple callers</strong>.</p>
<p>The worker code of the stub is wrapped by a precode fragment that can be mapped to the MethodDesc and that jumps to the worker code of the stub. The worker code of the stub can be shared between multiple methods this way. It is an important optimization used to implement P/Invoke marshalling stubs.</p>
</blockquote>
<p>By providing a ‘pointer’ to the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/method.hpp#L178-L2041">MethodDesc class</a>, the precode allows any subsequent stub to have access to <em>a lot</em> of information about a method call and its containing <code class="language-plaintext highlighter-rouge">Type</code> via the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/methodtable.h#L601-L4175">MethodTable</a> (‘hot’) and <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/class.h#L797-L2050">EEClass</a> (‘cold’) data structures. The MethodDesc data-structure is one of the most fundamental types in the runtime, which is why it has its own <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md">BotR page</a>.</p>
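<p>As a rough illustration of the ‘space-efficient wrapper’ idea from the BotR quote, here’s a small C++ sketch. The types are hypothetical and hugely simplified (a real precode is a few machine instructions, not a struct with a method), but it shows how a tiny per-method fragment can carry the MethodDesc and let many methods share one worker.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily reduced MethodDesc: just enough to identify a method.
struct SketchMethodDesc {
    std::string name;
};

using StubWorker = std::string (*)(const SketchMethodDesc*);

// The 'precode': a tiny per-method record pairing a MethodDesc with the
// target to jump to.
struct SketchPrecode {
    const SketchMethodDesc* methodDesc;
    StubWorker target;

    std::string Invoke() const {
        assert(methodDesc != nullptr && target != nullptr);
        // Mirrors "mov eax,pMethodDesc; jmp target" from the BotR example.
        return target(methodDesc);
    }
};

// A single shared worker. It finds out which method it is running on behalf
// of from the MethodDesc that the precode passed along.
std::string SharedInteropWorker(const SketchMethodDesc* methodDesc) {
    return "marshalling call for " + methodDesc->name;
}
```

<p>Two precodes built over different <code class="language-plaintext highlighter-rouge">SketchMethodDesc</code> instances can both point at <code class="language-plaintext highlighter-rouge">SharedInteropWorker</code>, so the per-method cost is just the small precode record rather than a whole copy of the worker.</p>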
<p>Each ‘precode’ is created in <code class="language-plaintext highlighter-rouge">MethodDesc::GetOrCreatePrecode()</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/method.cpp#L4712-L4776">here</a> and there are several different types as we can see in this <code class="language-plaintext highlighter-rouge">enum</code> from <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/precode.h#L19-L31">/vm/precode.h</a>:</p>
<pre><code class="language-C++">enum PrecodeType {
PRECODE_INVALID = InvalidPrecode::Type,
PRECODE_STUB = StubPrecode::Type,
#ifdef HAS_NDIRECT_IMPORT_PRECODE
PRECODE_NDIRECT_IMPORT = NDirectImportPrecode::Type,
#endif // HAS_NDIRECT_IMPORT_PRECODE
#ifdef HAS_FIXUP_PRECODE
PRECODE_FIXUP = FixupPrecode::Type,
#endif // HAS_FIXUP_PRECODE
#ifdef HAS_THISPTR_RETBUF_PRECODE
PRECODE_THISPTR_RETBUF = ThisPtrRetBufPrecode::Type,
#endif // HAS_THISPTR_RETBUF_PRECODE
};
</code></pre>
<p>As always, the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#types-of-precode">BotR page</a> describes the different types in great detail, but in summary:</p>
<ul>
<li><strong>StubPrecode</strong> - <em>.. is the basic precode type. It loads MethodDesc into a scratch register and then jumps. It must be implemented for precodes to work. It is used as fallback when no other specialized precode type is available.</em></li>
<li><strong>FixupPrecode</strong> - <em>.. is used when the final target does not require MethodDesc in scratch register. The FixupPrecode saves a few cycles by avoiding loading MethodDesc into the scratch register. The most common usage of FixupPrecode is for method fixups in NGen images.</em></li>
<li><strong>ThisPtrRetBufPrecode</strong> - <em>.. is used to switch a return buffer and the this pointer for open instance delegates returning valuetypes. It is used to convert the calling convention of <code class="language-plaintext highlighter-rouge">MyValueType Bar(Foo x)</code> to the calling convention of <code class="language-plaintext highlighter-rouge">MyValueType Foo::Bar()</code>.</em></li>
<li><strong>NDirectImportPrecode</strong> (a.k.a P/Invoke) - <em>.. is used for lazy binding of unmanaged P/Invoke targets. This precode is for convenience and to reduce amount of platform specific plumbing.</em></li>
</ul>
<p>Finally, to give you an idea of some real-world scenarios for ‘precode’ stubs, take a look at <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/cgenamd64.cpp#L732-L736">this comment</a> from the <code class="language-plaintext highlighter-rouge">DoesSlotCallPrestub(..)</code> method (AMD64):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// AMD64 has the following possible sequences for prestub logic:
// 1. slot -> temporary entrypoint -> prestub
// 2. slot -> precode -> prestub
// 3. slot -> precode -> jumprel64 (jump stub) -> prestub
// 4. slot -> precode -> jumprel64 (NGEN case) -> prestub
</code></pre></div></div>
<h4 id="just-in-time-jit-and-tiered-compilation">‘Just-in-time’ (JIT) and ‘Tiered’ Compilation</h4>
<p>However, another piece of functionality that ‘precodes’ provide is related to ‘just-in-time’ (JIT) compilation, again from the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode">BotR page</a>:</p>
<blockquote>
<p><strong>Temporary entry points</strong>: Methods must provide entry points before they are jitted so that jitted code has an address to call them. These temporary entry points are provided by precode. They are a specific form of stub wrappers.</p>
<p><strong>This technique is a lazy approach to jitting</strong>, which provides a performance optimization in both space and time. Otherwise, the transitive closure of a method would need to be jitted before it was executed. This would be a waste, since only the dependencies of taken code branches (e.g. if statement) require jitting.</p>
<p>Each temporary entry point is much smaller than a typical method body. They need to be small since there are a lot of them, even at the cost of performance. The temporary entry points are executed just once before the actual code for the method is generated.</p>
</blockquote>
<p>So these ‘temporary entry points’ provide something concrete that can be referenced before a method has been JITted. They then trigger the JIT-compilation which does the job of generating the native code for a method. The entire process looks like this (dotted lines represent a pointer indirection, solid lines are a ‘control transfer’ e.g. a jmp/call assembly instruction):</p>
<p><strong>Before JITing</strong></p>
<p><img src="/images/2019/09/01 - Before JITing.svg" alt="Before JITing" /></p>
<p>Here we see the ‘temporary entry point’ pointing to the ‘fixup precode’, which ultimately calls into the <code class="language-plaintext highlighter-rouge">PreStubWorker()</code> function <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1555-L1668">here</a>.</p>
<p><strong>After JITing</strong></p>
<p><img src="/images/2019/09/02 - After JITing - Normal.svg" alt="After JITing - Normal" /></p>
<p>Once the method has been JITted, we can see that the <code class="language-plaintext highlighter-rouge">PreStubWorker</code> is now out of the picture and instead we have the native code for the function. In addition, there is now a ‘stable entry point’ that can be used by any other code that wants to execute the function. Also, we can see that the ‘fixup precode’ has been ‘back-patched’ to also point at the ‘native code’. For an idea of how this ‘back-patching’ works, see the <code class="language-plaintext highlighter-rouge">StubPrecode::SetTargetInterlocked(..)</code> method <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/cgencpu.h#L551-L610">here</a> (ARM64).</p>
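<p>The before/after picture can be sketched in a few lines of C++. This is only an analogy under my own assumptions: <code class="language-plaintext highlighter-rouge">SketchMethod</code>, <code class="language-plaintext highlighter-rouge">PrestubSketch</code> and <code class="language-plaintext highlighter-rouge">NativeCode</code> are hypothetical names, the ‘JIT’ is just a counter, and an atomic function pointer stands in for the interlocked patching of real machine code.</p>

```cpp
#include <atomic>
#include <cassert>

struct SketchMethod;
using EntryPoint = int (*)(SketchMethod&, int);

int PrestubSketch(SketchMethod& method, int x);

struct SketchMethod {
    // The slot that every caller goes through. Before 'JITing' it points at
    // the prestub, afterwards at the compiled code.
    std::atomic<EntryPoint> entry{&PrestubSketch};
    int timesCompiled = 0;
};

// Stands in for the JIT-compiled native code of the method.
int NativeCode(SketchMethod&, int x) { return x * 2; }

// Stands in for the prestub: 'compile' the method, back-patch the slot so
// later calls skip this function, then run the real code for this first call.
int PrestubSketch(SketchMethod& method, int x) {
    ++method.timesCompiled;           // pretend the JIT ran here
    method.entry.store(&NativeCode);  // back-patch the entry point
    return NativeCode(method, x);
}

// What a call site does: load whatever the slot currently holds and call it.
int Call(SketchMethod& method, int x) {
    EntryPoint target = method.entry.load();
    assert(target != nullptr);
    return target(method, x);
}
```

<p>The first <code class="language-plaintext highlighter-rouge">Call(..)</code> pays for the ‘compilation’, and every later call goes straight to <code class="language-plaintext highlighter-rouge">NativeCode</code>, which is the lazy behaviour the BotR quote describes.</p>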
<p><strong>After JITing - Tiered Compilation</strong></p>
<p><img src="/images/2019/09/03 - After JITing - Tiered Compilation.svg" alt="After JITing - Tiered Compilation" /></p>
<p>However, there is also another ‘after’ scenario, now that .NET Core has <a href="https://devblogs.microsoft.com/dotnet/tiered-compilation-preview-in-net-core-2-1/">‘Tiered Compilation’</a>. Here we see that the ‘stable entry point’ still goes via the ‘fixup precode’, it doesn’t directly call into the ‘native code’. This is because ‘tiered compilation’ counts how many times a method is called and once it decides the method is ‘<em>hot</em>’, it re-compiles a more optimised version that will give better performance. This ‘call counting’ takes place in <a href="https://github.com/dotnet/coreclr/blob/d8d6d8a5/src/vm/prestub.cpp#L1986">this code</a> in <code class="language-plaintext highlighter-rouge">MethodDesc::DoPrestub(..)</code> which calls into <code class="language-plaintext highlighter-rouge">CodeVersionManager::PublishNonJumpStampVersionableCodeIfNecessary(..)</code> <a href="https://github.com/dotnet/coreclr/blob/d8d6d8a5/src/vm/codeversion.cpp#L2295-L2474">here</a> and then if <code class="language-plaintext highlighter-rouge">shouldCountCalls</code> is true, it ends up calling <code class="language-plaintext highlighter-rouge">CallCounter::OnMethodCodeVersionCalledSubsequently(..)</code> <a href="https://github.com/dotnet/coreclr/blob/d8d6d8a5/src/vm/callcounter.cpp#L83-L109">here</a>.</p>
<p>What’s been interesting to watch during the development of ‘tiered compilation’ is that (not surprisingly) there has been a significant amount of work to ensure that the extra level of indirection doesn’t make the entire process slower, for instance see <a href="https://github.com/dotnet/coreclr/pull/21292">Patch vtable slots and similar when tiering is enabled #21292</a>.</p>
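<p>To show why the extra indirection is needed at all, here’s a toy C++ model of call counting. It is a sketch under stated assumptions rather than how the runtime does it: the threshold value, the <code class="language-plaintext highlighter-rouge">TieredMethod</code> type and the synchronous swap to ‘tier 1’ are all simplifications that I’ve made up for illustration (the real runtime does the re-compilation separately from the call that triggers it).</p>

```cpp
#include <cassert>

// Promotion threshold chosen for this sketch (an assumption, not read from
// the runtime's configuration).
constexpr int kCallCountThreshold = 30;

using Code = int (*)(int);

// 'Tier 0': quick to produce but slow to run (deliberately naive here).
int Tier0Code(int x) {
    int result = 0;
    for (int i = 0; i < x; ++i) {
        result += 2;
    }
    return result;
}

// 'Tier 1': the optimised version with the same observable behaviour.
int Tier1Code(int x) { return x * 2; }

struct TieredMethod {
    Code current = &Tier0Code;  // the indirection that every call goes through
    int callCount = 0;

    int Invoke(int x) {
        assert(x >= 0);  // the two tiers only agree for non-negative input
        // Count calls only while still on tier 0. Once the method is 'hot',
        // swap the target so this and all later calls use the optimised code.
        if (current == &Tier0Code && ++callCount >= kCallCountThreshold) {
            current = &Tier1Code;
        }
        return current(x);
    }
};
```

<p>Because callers always go through <code class="language-plaintext highlighter-rouge">current</code>, the code underneath can be swapped without touching any call site, at the cost of one extra hop per call, which is exactly the overhead the linked PR works to reduce.</p>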
<p>Like all the other stubs, ‘precodes’ have different versions for different CPU architectures. As a reference, the list below contains links to all of them:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">Precodes</code> (a.k.a ‘Precode Fixup Thunk’):
<ul>
<li>x86 in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/asmhelpers.S#L538-L556">/vm/i386/asmhelpers.S</a></li>
<li>x64 in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/AsmHelpers.asm#L251-L264">/vm/amd64/AsmHelpers.asm</a></li>
<li>ARM in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/asmhelpers.S#L468-L490">/vm/arm/asmhelpers.S</a></li>
<li>ARM64 in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/asmhelpers.asm#L209-L230">/vm/arm64/asmhelpers.asm</a></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ThePreStub</code>:
<ul>
<li>x86 in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/asmhelpers.S#L897-L933">/vm/i386/asmhelpers.S</a></li>
<li>x64 in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/ThePreStubAMD64.asm#L11-L32">/vm/amd64/ThePreStubAMD64.asm</a></li>
<li>ARM in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/asmhelpers.S#L401-L442">/vm/arm/asmhelpers.S</a></li>
<li>ARM64 in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/asmhelpers.asm#L230-L257">/vm/arm64/asmhelpers.asm</a></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">PreStubWorker(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1555-L1668">/vm/prestub.cpp</a></li>
<li><code class="language-plaintext highlighter-rouge">MethodDesc::DoPrestub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1706-L1989">here</a></li>
<li><code class="language-plaintext highlighter-rouge">MethodDesc::DoBackpatch(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L77-L257">here</a></li>
</ol>
<p>Finally, for even more information on the JITing process, see:</p>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/abhinaba/2014/09/29/net-just-in-time-compilation-and-warming-up-your-system/">.NET Just in Time Compilation and Warming up Your System</a></li>
<li><a href="https://xoofx.com/blog/2018/04/12/writing-managed-jit-in-csharp-with-coreclr/">Writing a Managed JIT in C# with CoreCLR</a></li>
<li><a href="https://ntcore.com/files/netint_native.htm">.NET Internals and Native Compiling</a></li>
<li><a href="https://www.codeproject.com/Articles/26060/NET-Internals-and-Code-Injection">.NET Internals and Code Injection</a></li>
<li><a href="https://habr.com/ru/post/307088/">Intercepting .NET / CLR Functions</a> (in Russian, <a href="https://translate.google.com/translate?hl=&sl=auto&tl=en&u=https%3A%2F%2Fhabr.com%2Fru%2Fpost%2F307088%2F&sandbox=1">Google Translate version</a>)</li>
</ul>
<h3 id="stubs-as-il">Stubs-as-IL</h3>
<p>‘Stubs as IL’ actually describes several types of individual stubs, but what they all have in common is they’re generated from <a href="https://www.techopedia.com/definition/24290/intermediate-language-il-net">‘Intermediate Language’</a> (IL) which is then compiled by the JIT, in exactly the same way it handles the code we write (after it’s first been compiled from C#/F#/VB.NET into IL by another compiler).</p>
<p>This makes sense: it’s far easier to write the IL once and then have the JIT worry about compiling it for different CPU architectures, rather than having to write raw assembly each time (for x86/x64/arm/etc). However, all stubs were hand-written assembly in <a href="https://github.com/dotnet/coreclr/pull/18476#issuecomment-400810704">.NET Framework 1.0</a>:</p>
<blockquote>
<p>What you have described is how it actually works. The only difference is that the shuffle thunk is hand-emitted in assembly and not generated by the JIT for historic reasons. All stubs (including all interop stubs) were hand-emitted like this in .NET Framework 1.0. <strong>Starting with .NET Framework 2.0, we have been converting the stubs to be generated by the JIT (the runtime generates IL for the stub, and then the JIT compiles the IL as regular method)</strong>. The shuffle thunk is one of the few remaining ones not converted yet. Also, we have the IL path on some platforms but not others - <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is related to it.</p>
</blockquote>
<p>In the CoreCLR source code, ‘stubs as IL’ are controlled by the feature flag <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code>, with the following additional flags for each specific type:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">StubsAsIL</code></li>
<li><code class="language-plaintext highlighter-rouge">ArrayStubAsIL</code></li>
<li><code class="language-plaintext highlighter-rouge">MulticastStubAsIL</code></li>
</ul>
<p>On <code class="language-plaintext highlighter-rouge">Windows</code> only some features are implemented with IL stubs, see <a href="https://github.com/dotnet/coreclr/blob/7df151664237b539de91f1394e97f145460d05b6/clr.featuredefines.props#L26-L28">this code</a>, e.g. ‘ArrayStubAsIL’ is disabled on ‘x86’, but enabled elsewhere.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><PropertyGroup</span> <span class="na">Condition=</span><span class="s">"'$(TargetsWindows)' == 'true'"</span><span class="nt">></span>
<span class="nt"><FeatureArrayStubAsIL</span> <span class="na">Condition=</span><span class="s">"'$(Platform)' != 'x86'"</span><span class="nt">></span>true<span class="nt"></FeatureArrayStubAsIL></span>
<span class="nt"><FeatureMulticastStubAsIL</span> <span class="na">Condition=</span><span class="s">"'$(Platform)' != 'x86'"</span><span class="nt">></span>true<span class="nt"></FeatureMulticastStubAsIL></span>
<span class="nt"><FeatureStubsAsIL</span> <span class="na">Condition=</span><span class="s">"'$(Platform)' == 'arm64'"</span><span class="nt">></span>true<span class="nt"></FeatureStubsAsIL></span>
...
<span class="nt"></PropertyGroup></span>
</code></pre></div></div>
<p>On <code class="language-plaintext highlighter-rouge">Unix</code> they are all done in IL, regardless of CPU Arch, as <a href="https://github.com/dotnet/coreclr/blob/7df151664237b539de91f1394e97f145460d05b6/clr.featuredefines.props#L20-L22">this code</a> shows:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><PropertyGroup</span> <span class="na">Condition=</span><span class="s">"'$(TargetsUnix)' == 'true'"</span><span class="nt">></span>
...
<span class="nt"><FeatureArrayStubAsIL></span>true<span class="nt"></FeatureArrayStubAsIL></span>
<span class="nt"><FeatureMulticastStubAsIL></span>true<span class="nt"></FeatureMulticastStubAsIL></span>
<span class="nt"><FeatureStubsAsIL></span>true<span class="nt"></FeatureStubsAsIL></span>
<span class="nt"></PropertyGroup></span>
</code></pre></div></div>
<p>Finally, here’s the complete list of stubs that can be implemented in IL from <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/ilstubresolver.h#L73-L93">/vm/ilstubresolver.h</a>:</p>
<pre><code class="language-C++"> enum ILStubType
{
Unassigned = 0,
CLRToNativeInteropStub,
CLRToCOMInteropStub,
CLRToWinRTInteropStub,
NativeToCLRInteropStub,
COMToCLRInteropStub,
WinRTToCLRInteropStub,
#ifdef FEATURE_ARRAYSTUB_AS_IL
ArrayOpStub,
#endif
#ifdef FEATURE_MULTICASTSTUB_AS_IL
MulticastDelegateStub,
#endif
#ifdef FEATURE_STUBS_AS_IL
SecureDelegateStub,
UnboxingILStub,
InstantiatingStub,
#endif
};
</code></pre>
<p>But the usage of IL stubs has grown over time and it seems that they are the preferred mechanism where possible as they’re easier to write and debug. See <a href="https://github.com/dotnet/coreclr/pull/9752">[x86/Linux] Enable FEATURE_ARRAYSTUB_AS_IL</a>, <a href="https://github.com/dotnet/coreclr/pull/11624">Switch multicast delegate stub on Windows x64 to use stubs-as-il</a> and <a href="https://github.com/dotnet/coreclr/pull/26169#issuecomment-521518184">Fix GenerateShuffleArray to support cyclic shuffles #26169 (comment)</a> for more information.</p>
<h3 id="pinvoke-reverse-pinvoke-and-calli">P/Invoke, Reverse P/Invoke and ‘calli’</h3>
<p>All these stubs have one thing in common: they allow a transition between ‘managed’ and ‘un-managed’ (or native) code. To make this safe and to preserve the guarantees that the .NET runtime provides, stubs are used every time the transition is made.</p>
<p>This entire process is outlined in great detail in the BotR page <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#per-call-site-pinvoke-work">CLR ABI - PInvokes</a>, from the ‘Per-call-site PInvoke work’ section:</p>
<blockquote>
<ol>
<li>For direct calls, the JITed code sets <code class="language-plaintext highlighter-rouge">InlinedCallFrame->m_pDatum</code> to the MethodDesc of the call target.
<ul>
<li>For JIT64, indirect calls within <strong>IL stubs</strong> sets it to the secret parameter (this seems redundant, but it might have changed since the per-frame initialization?).</li>
<li>For JIT32 (ARM) indirect calls, it sets this member to the size of the pushed arguments, according to the comments. The implementation however always passed 0.</li>
</ul>
</li>
<li>For JIT64/AMD64 only: Next for <strong>non-IL stubs</strong>, the InlinedCallFrame is ‘pushed’ by setting <code class="language-plaintext highlighter-rouge">Thread->m_pFrame</code> to point to the InlinedCallFrame (recall that the per-frame initialization already set <code class="language-plaintext highlighter-rouge">InlinedCallFrame->m_pNext</code> to point to the previous top). For <strong>IL stubs</strong> this step is accomplished in the per-frame initialization.</li>
<li>The Frame is made active by setting <code class="language-plaintext highlighter-rouge">InlinedCallFrame->m_pCallerReturnAddress</code>.</li>
<li>The code then toggles the GC mode by setting <code class="language-plaintext highlighter-rouge">Thread->m_fPreemptiveGCDisabled = 0</code>.</li>
<li>Starting now, no GC pointers may be live in registers. RyuJit LSRA meets this requirement by adding special refPositon <code class="language-plaintext highlighter-rouge">RefTypeKillGCRefs</code> before unmanaged calls and special helpers.</li>
<li><strong>Then comes the actual call/PInvoke.</strong></li>
<li>The GC mode is set back by setting <code class="language-plaintext highlighter-rouge">Thread->m_fPreemptiveGCDisabled = 1</code>.</li>
<li>Then we check to see if <code class="language-plaintext highlighter-rouge">g_TrapReturningThreads</code> is set (non-zero). If it is, we call <code class="language-plaintext highlighter-rouge">CORINFO_HELP_STOP_FOR_GC</code>.
<ul>
<li>For ARM, this helper call preserves the return register(s): <code class="language-plaintext highlighter-rouge">R0</code>, <code class="language-plaintext highlighter-rouge">R1</code>, <code class="language-plaintext highlighter-rouge">S0</code>, and <code class="language-plaintext highlighter-rouge">D0</code>.</li>
<li>For AMD64, the generated code must manually preserve the return value of the PInvoke by moving it to a non-volatile register or a stack location.</li>
</ul>
</li>
<li>Starting now, GC pointers may once again be live in registers.</li>
<li>Clear the <code class="language-plaintext highlighter-rouge">InlinedCallFrame->m_pCallerReturnAddress</code> back to 0.</li>
<li>For JIT64/AMD64 only: For <strong>non-IL stubs</strong> ‘pop’ the Frame chain by resetting <code class="language-plaintext highlighter-rouge">Thread->m_pFrame</code> back to <code class="language-plaintext highlighter-rouge">InlinedCallFrame.m_pNext</code>.</li>
</ol>
<p>Saving/restoring all the non-volatile registers helps by preventing any registers that are unused in the current frame from accidentally having a live GC pointer value from a parent frame. The argument and return registers are ‘safe’ because they cannot be GC refs. Any refs should have been pinned elsewhere and instead passed as native pointers.</p>
<p>For <strong>IL stubs</strong>, the Frame chain isn’t popped at the call site, so instead it must be popped right before the epilog and right before any jmp calls. It looks like we do not support tail calls from PInvoke <strong>IL stubs</strong>?</p>
</blockquote>
<p>As you can see, quite a bit of the work is to keep the Garbage Collector (GC) happy. This makes sense because once execution moves into un-managed/native code the .NET runtime has no control over what’s happening, so it needs to ensure that the GC doesn’t clean up or move around objects that are being used in the native code. It achieves this by constraining what the GC can do (on the current thread) from the time execution moves into un-managed code, and it keeps that constraint in place until execution returns to the managed side.</p>
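<p>The sequence of steps quoted above can be sketched in a few lines of C++. This is only an illustration: <code class="language-plaintext highlighter-rouge">Frame</code>, <code class="language-plaintext highlighter-rouge">Thread</code>, <code class="language-plaintext highlighter-rouge">StopForGC</code> and <code class="language-plaintext highlighter-rouge">CallNative</code> are hypothetical stand-ins that mirror the field names in the quote, not the real runtime types, and the real work is done by JIT-generated code rather than a helper function.</p>
<pre><code class="language-C++">#include <cassert>

// Hypothetical stand-ins for InlinedCallFrame and Thread; not the real layouts.
struct Frame {
    Frame* next = nullptr;                      // like InlinedCallFrame->m_pNext
    const void* callerReturnAddress = nullptr;  // non-null means the frame is 'active'
};

struct Thread {
    Frame* topFrame = nullptr;         // like Thread->m_pFrame
    bool preemptiveGCDisabled = true;  // true while running 'managed' code
};

bool g_trapReturningThreads = false;   // set when the runtime wants threads to pause
int g_stopForGCCalls = 0;

// Stand-in for CORINFO_HELP_STOP_FOR_GC: here it only records that it was called.
void StopForGC(Thread&) { ++g_stopForGCCalls; }

// Wraps a call to 'native' code (which must return a value) in the transition steps.
template <typename Fn, typename... Args>
auto CallNative(Thread& thread, Fn nativeFn, Args... args) {
    assert(thread.preemptiveGCDisabled);  // we must start on the managed side

    Frame frame;
    frame.next = thread.topFrame;         // remember the previous top frame
    thread.topFrame = &frame;             // 'push' our frame
    frame.callerReturnAddress = &frame;   // any non-null value marks it active here
    thread.preemptiveGCDisabled = false;  // a GC no longer has to wait for this thread

    auto result = nativeFn(args...);      // the actual call

    thread.preemptiveGCDisabled = true;   // back to the managed side's rules
    if (g_trapReturningThreads) StopForGC(thread);
    frame.callerReturnAddress = nullptr;  // frame is no longer active
    thread.topFrame = frame.next;         // 'pop' our frame
    return result;
}
</code></pre>
<p>Calling <code class="language-plaintext highlighter-rouge">CallNative(thread, someNativeFunction, args...)</code> leaves the thread exactly as it found it, with the frame chain restored and the GC flag set again, which is the invariant the per-call-site work exists to maintain.</p>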
<p>On top of that, there needs to be support for <a href="/2019/01/21/Stackwalking-in-the-.NET-Runtime/">‘stack walking’ or ‘unwinding’</a>, to allow debugging and to produce meaningful stack traces. This is done by setting up <em>frames</em> that are put in place when control transitions from managed -> un-managed, before being removed (‘popped’) when transitioning back. Here’s a list of the different scenarios that are covered, from <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/frames.h#L147-L193">/vm/frames.h</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>This is the list of Interop stubs & transition helpers with information
regarding what (if any) Frame they used and where they were set up:
P/Invoke:
JIT inlined: The code to call the method is inlined into the caller by the JIT.
InlinedCallFrame is erected by the JITted code.
Requires marshaling: The stub does not erect any frames explicitly but contains
an unmanaged CALLI which turns it into the JIT inlined case.
Delegate over a native function pointer:
The same as P/Invoke but the raw JIT inlined case is not present (the call always
goes through an IL stub).
Calli:
The same as P/Invoke.
PInvokeCalliFrame is erected in stub generated by GenerateGetStubForPInvokeCalli
before calling to GetILStubForCalli which generates the IL stub. This happens only
the first time a call via the corresponding VASigCookie is made.
ClrToCom:
Late-bound or eventing: The stub is generated by GenerateGenericComplusWorker
(x86) or exists statically as GenericComPlusCallStub[RetBuffArg] (64-bit),
and it erects a ComPlusMethodFrame frame.
Early-bound: The stub does not erect any frames explicitly but contains an
unmanaged CALLI which turns it into the JIT inlined case.
ComToClr:
Normal stub:
Interpreted: The stub is generated by ComCall::CreateGenericComCallStub
(in ComToClrCall.cpp) and it erects a ComMethodFrame frame.
Prestub:
The prestub is ComCallPreStub (in ComCallableWrapper.cpp) and it erects a ComPrestubMethodFrame frame.
Reverse P/Invoke (used for C++ exports & fixups as well as delegates
obtained from function pointers):
Normal stub:
x86: The stub is generated by UMEntryThunk::CompileUMThunkWorker
(in DllImportCallback.cpp) and it is frameless. It calls directly
the managed target or to IL stub if marshaling is required.
non-x86: The stub exists statically as UMThunkStub and calls to IL stub.
Prestub:
The prestub is generated by GenerateUMThunkPrestub (x86) or exists statically
as TheUMEntryPrestub (64-bit), and it erects an UMThkCallFrame frame.
Reverse P/Invoke AppDomain selector stub:
The asm helper is IJWNOADThunkJumpTarget (in asmhelpers.asm) and it is frameless.
</code></pre></div></div>
<p>The P/Invoke <strong>IL stubs</strong> are <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1895-L1899">wired up</a> in the <code class="language-plaintext highlighter-rouge">MethodDesc::DoPrestub(..)</code> method (note that P/Invoke is also known as <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#kinds-of-methoddescs">‘NDirect’</a>). In addition, they are also created <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1787-L1800">here</a> when being used for ‘COM Interop’. That code then calls into <code class="language-plaintext highlighter-rouge">GetStubForInteropMethod(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/dllimport.cpp#L5757-L5824">in /vm/dllimport.cpp</a>, before branching off to handle each case:</p>
<ul>
<li>P/Invoke calls into <code class="language-plaintext highlighter-rouge">NDirect::GetStubForILStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/dllimport.cpp#L5607-L5664">here</a></li>
<li>Reverse P/Invoke calls into another overload of <code class="language-plaintext highlighter-rouge">NDirect::GetStubForILStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/dllimport.cpp#L5584-L5605">here</a></li>
<li>COM Interop goes to <code class="language-plaintext highlighter-rouge">ComPlusCall::GetStubForILStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/clrtocomcall.cpp#L394-L471">here in /vm/clrtocomcall.cpp</a></li>
<li>EE implemented methods end up in <code class="language-plaintext highlighter-rouge">COMDelegate::GetStubForILStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L1502-L1518">here</a> (for more info on <code class="language-plaintext highlighter-rouge">EEImpl</code> methods see <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#kinds-of-methoddescs">‘Kinds of MethodDescs’</a>)</li>
</ul>
<p>There are also hand-written assembly stubs for the different scenarios, such as <code class="language-plaintext highlighter-rouge">JIT_PInvokeBegin</code>, <code class="language-plaintext highlighter-rouge">JIT_PInvokeEnd</code> and <code class="language-plaintext highlighter-rouge">VarargPInvokeStub</code>, which can be seen in the files below:</p>
<ul>
<li>x64 in <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/amd64/PInvokeStubs.asm">/vm/amd64/PInvokeStubs.asm</a></li>
<li>x86 in <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/i386/PInvokeStubs.asm">/vm/i386/PInvokeStubs.asm</a></li>
<li>ARM in <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/arm/PInvokeStubs.asm">/vm/arm/PInvokeStubs.asm</a></li>
<li>ARM64 in <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/arm64/PInvokeStubs.asm">/vm/arm64/PInvokeStubs.asm</a></li>
</ul>
<p>As an example, <code class="language-plaintext highlighter-rouge">calli</code> method calls (see <a href="https://docs.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.calli?view=netframework-4.8">OpCodes.Calli</a>) end up in <code class="language-plaintext highlighter-rouge">GenericPInvokeCalliHelper</code>, which has a nice bit of ASCII art in the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/asmhelpers.S#L636-L725">i386 version</a>:</p>
<pre><code class="language-C++">// stack layout at this point:
//
// | ... |
// | stack arguments | ESP + 16
// +----------------------+
// | VASigCookie* | ESP + 12
// +----------------------+
// | return address | ESP + 8
// +----------------------+
// | CALLI target address | ESP + 4
// +----------------------+
// | stub entry point | ESP + 0
// ------------------------
</code></pre>
<p>However, all these stubs can have an adverse impact on start-up time, see <a href="https://github.com/dotnet/coreclr/issues/22212">Large numbers of Pinvoke stubs created on startup</a> for example. This impact has been mitigated by compiling the stubs ‘Ahead-of-Time’ (AOT) and storing them in the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/readytorun-overview.md">‘Ready-to-Run’ images</a> (replacement format for <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/ngen-exe-native-image-generator">NGEN (Native Image Generator)</a>). From <a href="https://github.com/dotnet/coreclr/pull/24823">R2R ilstubs</a>:</p>
<blockquote>
<p>IL stub generation for interop takes measurable time at startup, and it is possible to generate some of them in an ahead of time</p>
<p>This change introduces ahead of time R2R compilation of IL stubs</p>
</blockquote>
<p>Related work was done in <a href="https://github.com/dotnet/coreclr/pull/22560">Enable R2R compilation/inlining of PInvoke stubs where no marshalling is required</a> and <a href="https://github.com/dotnet/coreclr/pull/24834">PInvoke stubs for Unix platforms</a> (‘Enables inlining of PInvoke stubs for Unix platforms’).</p>
<p>Finally, for even more information on the issues involved, see:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/issues/15465">Better diagnostic for collected delegate #15465</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/12731">Fill freed loader heap chunk with non-zero value #12731</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/13125">[Arm64] Implement Poison() #13125</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/15809">Collected delegate diagnostic #15809</a></li>
<li><a href="https://github.com/Firwood-Software/AdvancedDLSupport">AdvancedDLSupport</a>:
<ul>
<li>Delegate-based C# P/Invoke alternative - compatible with all platforms and runtimes.</li>
<li>Also see <a href="https://github.com/Firwood-Software/AdvancedDLSupport/blob/master/docs/developer_docs.md">‘The Developer Documentation’</a> for the project.</li>
</ul>
</li>
</ul>
<h4 id="marshalling">Marshalling</h4>
<p>However, dealing with the ‘managed’ to ‘un-managed’ transition is only one part of the story. The other is that there are also stubs created to deal with the ‘marshalling’ of arguments between the 2 sides. This process of ‘Interop Marshalling’ is explained nicely in the <a href="https://docs.microsoft.com/en-us/dotnet/framework/interop/interop-marshaling">Microsoft docs</a>:</p>
<blockquote>
<p>Interop marshaling governs how data is passed in method arguments and return values between managed and unmanaged memory during calls. Interop marshaling is a run-time activity performed by the common language runtime’s marshaling service.</p>
<p>Most data types have common representations in both managed and unmanaged memory. The interop marshaler handles these types for you. Other types can be ambiguous or not represented at all in managed memory.</p>
</blockquote>
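<p>As a rough illustration of what a marshalling stub has to do, the C++ sketch below converts a ‘managed-style’ UTF-16 string into a NUL-terminated 8-bit buffer, calls a ‘native’ function with it, and passes the return value straight back. Everything here is hypothetical: <code class="language-plaintext highlighter-rouge">native_strlen</code> and <code class="language-plaintext highlighter-rouge">NativeStrlenStub</code> are names invented for this example, and the ASCII-only conversion (anything else becomes <code class="language-plaintext highlighter-rouge">?</code>) is a deliberate simplification rather than a description of the runtime’s real conversion rules.</p>
<pre><code class="language-C++">#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// A 'native' function that only understands NUL-terminated 8-bit strings.
extern "C" std::size_t native_strlen(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Hypothetical marshalling stub sitting between the caller and native_strlen.
// 'Managed' strings are modelled as UTF-16 (std::u16string).
std::size_t NativeStrlenStub(const std::u16string& managed) {
    // 1. Marshal the argument: build a native copy in the representation the callee expects.
    std::vector<char> buffer;
    buffer.reserve(managed.size() + 1);
    for (char16_t ch : managed) {
        // Simplification: keep ASCII, replace every other code unit with '?'.
        buffer.push_back(ch < 0x80 ? static_cast<char>(ch) : '?');
    }
    buffer.push_back('\0');

    // 2. Call the native target with the marshalled argument.
    std::size_t result = native_strlen(buffer.data());

    // 3. The return value has the same representation on both sides, so it
    //    needs no conversion. The temporary buffer is freed on return.
    return result;
}
</code></pre>
<p>The caller just writes <code class="language-plaintext highlighter-rouge">NativeStrlenStub(u"hello")</code> and never sees the temporary buffer, which is the point: the stub hides the difference in representation between the two sides.</p>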
<p>Like many stubs in the CLR, the marshalling stubs have evolved over time. As we can read in the excellent post <a href="https://devblogs.microsoft.com/dotnet/improvements-to-interop-marshaling-in-v4-il-stubs-everywhere/">Improvements to Interop Marshaling in V4: IL Stubs Everywhere</a>:</p>
<blockquote>
<p><strong>History</strong>
The 1.0 and 1.1 versions of the CLR had several different techniques for creating and executing these stubs that were each designed for marshaling different types of signatures. These techniques ranged from <strong>directly generated x86 assembly instructions</strong> for simple signatures to <strong>generating specialized ML (an internal marshaling language)</strong> and running them through an <strong>internal interpreter</strong> for the most complicated signatures. This system worked well enough – although not without difficulties – in 1.0 and 1.1 but presented us with a serious maintenance problem when 2.0, and its support for multiple processor architectures, came around.</p>
</blockquote>
<p>That’s right, there was an internal interpreter built into early versions of the .NET CLR that had the job of running the ‘marshalling language’ (ML) code!</p>
<p>However, it then goes on to explain why this process wasn’t sustainable:</p>
<blockquote>
<p>We realized early in the process of adding 64 bit support to 2.0 that this approach was not sustainable across multiple architectures. <strong>Had we continued with the same strategy we would have had to create parallel marshaling infrastructures for each new architecture we supported (remember in 2.0 we introduced support for both x64 and IA64) which would, in addition to the initial cost, at least triple the cost of every new marshaling feature or bug fix</strong>. We needed one marshaling stub technology that would work on multiple processor architectures and could be efficiently executed on each one: enter IL stubs.</p>
</blockquote>
<p>The solution was to implement all stubs using ‘Intermediate Language’ (IL) that is CPU-agnostic. Then the JIT-compiler is used to convert the IL into machine code for each CPU architecture, which makes sense because it’s exactly what the JIT is good at. Also worth noting is that this work still continues today, for instance see <a href="https://github.com/dotnet/coreclr/pull/26340">Implement struct marshalling via IL Stubs instead of via FieldMarshalers #26340</a>.</p>
<p>Finally, there is a really nice investigation into the whole process in <a href="http://devops.lol/pinvoke-beyond-the-magic/">PInvoke: beyond the magic</a> (also <a href="http://devops.lol/compile-time-marshalling/">Compile time marshalling</a>). What’s also nice is that you can use PerfView to <a href="https://twitter.com/matthewwarren/status/1124268756413374465">see the stubs that the runtime generates</a>.</p>
<h3 id="generics">Generics</h3>
<p>It is reasonably well known that generics in .NET use ‘code sharing’ to save space. That is, given a generic method such as <code class="language-plaintext highlighter-rouge">public void Insert<T>(..)</code>, <strong>one method body</strong> of ‘native code’ will be created and shared by the <em>instantiated</em> types of <code class="language-plaintext highlighter-rouge">Insert<Foo>(..)</code> and <code class="language-plaintext highlighter-rouge">Insert<Bar>(..)</code> (assuming that <code class="language-plaintext highlighter-rouge">Foo</code> and <code class="language-plaintext highlighter-rouge">Bar</code> are <em>reference</em> types), but <strong>different</strong> versions will be created for <code class="language-plaintext highlighter-rouge">Insert<int>(..)</code> and <code class="language-plaintext highlighter-rouge">Insert<double>(..)</code> (as <code class="language-plaintext highlighter-rouge">int</code>/<code class="language-plaintext highlighter-rouge">double</code> are <em>value</em> types). This is possible for the <a href="https://stackoverflow.com/a/598738">reasons outlined by Jon Skeet</a> in a StackOverflow question:</p>
<blockquote>
<p>.. consider what the CLR needs to know about a type. It includes:</p>
<ul>
<li>The size of a value of that type (i.e. if you have a variable of some type, how much space will that memory need?)</li>
<li>How to treat the value in terms of garbage collection: is it a reference to an object, or a value which may in turn contain other references?</li>
</ul>
<p>For all reference types, the answers to these questions are the same. The size is just the size of a pointer, and the value is always just a reference (so if the variable is considered a root, the GC needs to recursively descend into it).</p>
<p>For value types, the answers can vary significantly.</p>
</blockquote>
<p>But this poses a problem. What if the ‘shared’ method needs to do something specific for each type, like call <code class="language-plaintext highlighter-rouge">typeof(T)</code>?</p>
<p>This whole issue is explained in these 2 great posts, which I really recommend you take the time to read:</p>
<ul>
<li><a href="https://yizhang82.dev/dotnet-generics-sharing">Sharing .NET generic code under the hood</a></li>
<li><a href="https://yizhang82.dev/dotnet-generics-typeof-t">typeof(TSecret) - the secret magic behind .NET generics</a></li>
</ul>
<p>I’m not going to repeat what they cover here, except to say that (not surprisingly) ‘stubs’ are used to solve this issue, in conjunction with a ‘hidden’ parameter. These stubs are known as ‘instantiating’ stubs and we can find out more about them in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/method.hpp#L811-L817">this comment</a>:</p>
<blockquote>
<p>Instantiating Stubs - Return TRUE if this is this a special stub used to implement an <strong>instantiated generic method or per-instantiation static method</strong>. The action of an instantiating stub is - pass on a <code class="language-plaintext highlighter-rouge">MethodTable</code> or <code class="language-plaintext highlighter-rouge">InstantiatedMethodDesc</code> extra argument to shared code</p>
</blockquote>
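<p>The idea of a stub that passes on an extra argument to shared code can be modelled in a few lines of C++. This is a sketch under assumptions, not how the runtime lays things out: <code class="language-plaintext highlighter-rouge">TypeInfo</code> is a made-up stand-in for a <code class="language-plaintext highlighter-rouge">MethodTable</code>, and <code class="language-plaintext highlighter-rouge">DescribeShared</code>, <code class="language-plaintext highlighter-rouge">Insert_Foo</code> and <code class="language-plaintext highlighter-rouge">Insert_Bar</code> are hypothetical names for the shared body and its two instantiating stubs.</p>
<pre><code class="language-C++">#include <cassert>
#include <string>

// Stand-in for a MethodTable: the per-type data the shared code cannot know by itself.
struct TypeInfo {
    const char* name;
};

const TypeInfo kFooType{"Foo"};
const TypeInfo kBarType{"Bar"};

// The single shared body used by every reference-type instantiation.
// 'hiddenArg' plays the role of the extra argument described in the comment above;
// it is how the shared code can answer a question like typeof(T).
std::string DescribeShared(const TypeInfo* hiddenArg, const void* item) {
    assert(hiddenArg != nullptr);
    return std::string("Insert<") + hiddenArg->name + ">(" +
           (item != nullptr ? "object" : "null") + ")";
}

// Instantiating stubs: they expose the signature the caller expects (no hidden
// argument) and do nothing except supply the right extra argument.
std::string Insert_Foo(const void* item) { return DescribeShared(&kFooType, item); }
std::string Insert_Bar(const void* item) { return DescribeShared(&kBarType, item); }
</code></pre>
<p>Both stubs end up in the same body, yet <code class="language-plaintext highlighter-rouge">Insert_Foo(nullptr)</code> and <code class="language-plaintext highlighter-rouge">Insert_Bar(nullptr)</code> produce different results because each one hands over a different hidden argument.</p>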
<p>The different scenarios are handled in <code class="language-plaintext highlighter-rouge">MakeInstantiatingStubWorker(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1510-L1534">/vm/prestub.cpp</a>, you can see the check for <code class="language-plaintext highlighter-rouge">HasMethodInstantiation</code> and the fall-back to a ‘per-instantiation static method’:</p>
<pre><code class="language-C++"> // It's an instantiated generic method
// Fetch the shared code associated with this instantiation
pSharedMD = pMD->GetWrappedMethodDesc();
_ASSERTE(pSharedMD != NULL && pSharedMD != pMD);
if (pMD->HasMethodInstantiation())
{
extraArg = pMD;
}
else
{
// It's a per-instantiation static method
extraArg = pMD->GetMethodTable();
}
Stub *pstub = NULL;
#ifdef FEATURE_STUBS_AS_IL
pstub = CreateInstantiatingILStub(pSharedMD, extraArg);
#else
CPUSTUBLINKER sl;
_ASSERTE(pSharedMD != NULL && pSharedMD != pMD);
sl.EmitInstantiatingMethodStub(pSharedMD, extraArg);
pstub = sl.Link(pMD->GetLoaderAllocator()->GetStubHeap());
#endif
</code></pre>
<p>As a reminder, <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is defined for <em>all</em> Unix versions of the CoreCLR, but on Windows it’s only used with ARM64.</p>
<ul>
<li>When <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is defined, the code calls into <code class="language-plaintext highlighter-rouge">CreateInstantiatingILStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1353-L1452">here</a>. To get an overview of what it’s doing, we can take a look at the steps called-out in the code comments:
<ul>
<li><code class="language-plaintext highlighter-rouge">// 1. Build the new signature</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1394">here</a></li>
<li><code class="language-plaintext highlighter-rouge">// 2. Emit the method body</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1398">here</a></li>
<li><code class="language-plaintext highlighter-rouge">// 2.2 Push the rest of the arguments for x86</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1406">here</a></li>
<li><code class="language-plaintext highlighter-rouge">// 2.3 Push the hidden context param</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1413">here</a></li>
<li><code class="language-plaintext highlighter-rouge">// 2.4 Push the rest of the arguments for not x86</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1418">here</a></li>
<li><code class="language-plaintext highlighter-rouge">// 2.5 Push the target address</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1425">here</a></li>
<li><code class="language-plaintext highlighter-rouge">// 2.6 Do the calli</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1428">here</a></li>
</ul>
</li>
<li>When <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is <strong>not</strong> defined, per-CPU/OS versions of <code class="language-plaintext highlighter-rouge">EmitInstantiatingMethodStub(..)</code> are used. They exist for:
<ul>
<li>i386 in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L3197-L3394">/vm/i386/stublinkerx86.cpp</a></li>
<li>ARM in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L2121-L2142">/vm/arm/stubs.cpp</a></li>
</ul>
</li>
</ul>
<p>In the last case (<code class="language-plaintext highlighter-rouge">EmitInstantiatingMethodStub(..)</code> on ARM), the stub shares code with the instantiating version of the <em>unboxing</em> stub, so the heavy-lifting is done in <code class="language-plaintext highlighter-rouge">StubLinkerCPU::ThumbEmitCallWithGenericInstantiationParameter(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L1677-L2118">here</a>. This method is over 400 lines of fairly complex code, although there is also a nice piece of <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L1917-L1951">ASCII art</a> (for info on why this ‘complex’ case is needed see <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L1687-L1701">this comment</a>):</p>
<pre><code class="language-C++">// Complex case where we need to emit a new stack frame and copy the arguments.
// Calculate the size of the new stack frame:
//
//            +------------+
//      SP -> |            | <-- Space for helper arg, if isRelative is true
//            +------------+
//            |            | <-+
//            :            :   | Outgoing arguments
//            |            | <-+
//            +------------+
//            | Padding    | <-- Optional, maybe required so that SP is 64-bit aligned
//            +------------+
//            | GS Cookie  |
//            +------------+
//        +-> | vtable ptr |
//        |   +------------+
//        |   | m_Next     |
//        |   +------------+
//        |   | R4         | <-+
//   Stub |   +------------+   |
// Helper |   :            :   |
//  Frame |   +------------+   | Callee saved registers
//        |   | R11        |   |
//        |   +------------+   |
//        |   | LR/RetAddr | <-+
//        |   +------------+
//        |   | R0         | <-+
//        |   +------------+   |
//        |   :            :   | Argument registers
//        |   +------------+   |
//        +-> | R3         | <-+
//            +------------+
//  Old SP -> |            |
//
</code></pre>
<h3 id="delegates">Delegates</h3>
<p>Delegates in .NET provide a nice abstraction over the top of a function call, from <a href="https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/delegates/">Delegates (C# Programming Guide)</a>:</p>
<blockquote>
<p>A delegate is a type that represents references to methods with a particular parameter list and return type. When you instantiate a delegate, you can associate its instance with any method with a compatible signature and return type. You can invoke (or call) the method through the delegate instance.</p>
</blockquote>
<p>But under the hood there is quite a bit going on. For the full story take a look at <a href="/2017/01/25/How-do-.NET-delegates-work/">How do .NET delegates work?</a>, but in summary, there are several different types of delegates, as shown in this table from <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L3145-L3175">/vm/comdelegate.cpp</a>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DELEGATE KINDS TABLE</span>
<span class="c1">//</span>
<span class="c1">//                                  _target         _methodPtr              _methodPtrAux       _invocationList     _invocationCount</span>
<span class="c1">//</span>
<span class="c1">// 1- Instance closed               'this' ptr      target method           null                null                0</span>
<span class="c1">// 2- Instance open non-virt        delegate        shuffle thunk           target method       null                0</span>
<span class="c1">// 3- Instance open virtual         delegate        Virtual-stub dispatch   method id           null                0</span>
<span class="c1">// 4- Static closed                 first arg       target method           null                null                0</span>
<span class="c1">// 5- Static closed (special sig)   delegate        specialSig thunk        target method       first arg           0</span>
<span class="c1">// 6- Static opened                 delegate        shuffle thunk           target method       null                0</span>
<span class="c1">// 7- Secure                        delegate        call thunk              MethodDesc (frame)  target delegate     creator assembly</span>
<span class="c1">//</span>
<span class="c1">// Delegate invoke arg count == target method arg count - 2, 3, 6</span>
<span class="c1">// Delegate invoke arg count == 1 + target method arg count - 1, 4, 5</span>
<span class="c1">//</span>
<span class="c1">// 1, 4 - MulticastDelegate.ctor1 (simply assign _target and _methodPtr)</span>
<span class="c1">// 5 - MulticastDelegate.ctor2 (see table, takes 3 args)</span>
<span class="c1">// 2, 6 - MulticastDelegate.ctor3 (take shuffle thunk)</span>
<span class="c1">// 3 - MulticastDelegate.ctor4 (take shuffle thunk, retrieve MethodDesc) ???</span>
<span class="c1">//</span>
<span class="c1">// 7 - Needs special handling</span>
</code></pre></div></div>
<p>The difference between <a href="https://blog.slaks.net/2011/06/open-delegates-vs-closed-delegates.html">Open Delegates vs. Closed Delegates</a> is nicely illustrated in this code sample from the linked post:</p>
<pre><code class="language-C#">Func<string> closed = new Func<string>("a".ToUpperInvariant);
Func<string, string> open = (Func<string, string>)
Delegate.CreateDelegate(
typeof(Func<string, string>),
typeof(string).GetMethod("ToUpperInvariant")
);
closed(); //Returns "A"
open("abc"); //Returns "ABC"
</code></pre>
<p>Stubs are used in several scenarios, including the intriguingly named ‘shuffle thunk’ whose job it is to literally shuffle arguments around! In the simplest case, this process looks a bit like the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Delegate Call: [delegateThisPtr, arg1, arg2, ...]
Method Call: [targetThisPtr, arg1, arg2, ...]
</code></pre></div></div>
<p>So when you invoke a delegate, the <code class="language-plaintext highlighter-rouge">Invoke(..)</code> method (generated by the CLR) expects a ‘this’ pointer of the delegate object itself. However, when the target method is called (i.e. the method the delegate ‘wraps’), the ‘this’ pointer needs to be the one for the type/class that the target method exists in, hence all the swapping/shuffling.</p>
<p>Of course things get more complicated when you deal with static methods (no ‘this’ pointer) and different CPU calling conventions, as this answer to the question <a href="https://github.com/dotnet/coreclr/pull/18476#issuecomment-400805569">‘<em>What in the world is a shuffle thunk cache?</em>’</a> explains:</p>
<blockquote>
<p>When you use a delegate to call a method, <strong>the JIT doesn’t know at the time it generates the code what the delegate points to</strong>. It can e.g. be a member method or a static method. So the JIT generates arguments to registers and stack based on the signature of the delegate and the call then doesn’t call the target method directly, <strong>but a shuffle thunk instead</strong>. This thunk is generated based on the caller side signature and the real target method signature and shuffles the arguments in registers and on stack to correspond to the target calling convention. <strong>So if it needs to add “this” pointer into the first argument register, it needs to move the first argument register to the second, the second to the third and the last to the stack (obviously in the right order so that nothing gets overwritten)</strong>. And e.g. Unix amd64 calling convention makes it even more interesting when there are arguments that are structs that can be passed in multiple registers.</p>
</blockquote>
<h4 id="singlecast-delegates">Singlecast Delegates</h4>
<p>‘Singlecast’ delegates (as opposed to the <a href="https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/delegates/how-to-combine-delegates-multicast-delegates">‘multicast’ variants</a>) are the most common scenario and so they’re written as optimised ‘stubs’, starting in:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">MethodDesc::DoPrestub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1706-L1989">here</a>, specifically when <code class="language-plaintext highlighter-rouge">IsEEImpl()</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1916-L1921">is true</a> which calls into</li>
<li><code class="language-plaintext highlighter-rouge">COMDelegate::GetInvokeMethodStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L2075-L2118">here</a>, that then calls</li>
<li><code class="language-plaintext highlighter-rouge">COMDelegate::TheDelegateInvokeStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L2043-L2073">here</a>
<ul>
<li>If <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is <strong>not</strong> defined, it calls into <code class="language-plaintext highlighter-rouge">EmitDelegateInvoke()</code> in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L4185-L4218">/vm/i386/stublinkerx86.cpp</a> (for x86)</li>
<li>If <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is defined, a per-CPU/OS version of <code class="language-plaintext highlighter-rouge">SinglecastDelegateInvokeStub</code> is wired up:
<ul>
<li><strong>Windows</strong>
<ul>
<li>AMD64 <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/AsmHelpers.asm#L746-L761">/vm/amd64/AsmHelpers.asm</a></li>
<li>ARM <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/asmhelpers.asm#L334-L348">/vm/arm/asmhelpers.asm</a></li>
<li>ARM64 <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/asmhelpers.asm#L615-L629">/vm/arm64/asmhelpers.asm</a></li>
</ul>
</li>
<li><strong>Unix</strong>
<ul>
<li>i386 <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/asmhelpers.S#L1055-L1069">/vm/i386/asmhelpers.S</a></li>
<li>AMD64 <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/unixasmhelpers.S#L216-L231">/vm/amd64/unixasmhelpers.S</a></li>
<li>ARM <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/asmhelpers.S#L269-L283">/vm/arm/asmhelpers.S</a></li>
<li>ARM64 <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/asmhelpers.S#L558-L572">/vm/arm64/asmhelpers.S</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>For example, this is the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/AsmHelpers.asm#L746-L761">AMD64 (Windows) version</a> of <code class="language-plaintext highlighter-rouge">SinglecastDelegateInvokeStub</code>:</p>
<pre><code class="language-assembly">LEAF_ENTRY SinglecastDelegateInvokeStub, _TEXT
test rcx, rcx
jz NullObject
mov rax, [rcx + OFFSETOF__DelegateObject___methodPtr]
mov rcx, [rcx + OFFSETOF__DelegateObject___target] ; replace "this" pointer
jmp rax
NullObject:
mov rcx, CORINFO_NullReferenceException_ASM
jmp JIT_InternalThrow
LEAF_END SinglecastDelegateInvokeStub, _TEXT
</code></pre>
<p>As you can see, it reaches into the internals of the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/object.h#L2223-L2266">DelegateObject</a>, pulls out the values in the <code class="language-plaintext highlighter-rouge">methodPtr</code> and <code class="language-plaintext highlighter-rouge">target</code> fields and puts them into the <code class="language-plaintext highlighter-rouge">rax</code> and <code class="language-plaintext highlighter-rouge">rcx</code> registers.</p>
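<p>To make those steps concrete, here is a minimal C++ sketch of the same sequence. It is only a model, not runtime code: <code class="language-plaintext highlighter-rouge">ToyDelegate</code>, <code class="language-plaintext highlighter-rouge">Counter</code> and <code class="language-plaintext highlighter-rouge">InvokeSinglecast</code> are hypothetical names, the exception is a plain C++ one, and the final step is an ordinary call rather than the <code class="language-plaintext highlighter-rouge">jmp</code> the real stub uses.</p>

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the runtime's DelegateObject: only the two fields
// that the stub reads are modelled here.
struct ToyDelegate {
    void* target;                          // plays the role of _target
    int (*methodPtr)(void* self, int arg); // plays the role of _methodPtr
};

// Illustrative target type and method.
struct Counter {
    int total;
};

int Counter_Add(void* self, int amount) {
    Counter* counter = static_cast<Counter*>(self);
    counter->total += amount;
    return counter->total;
}

// Mirrors the steps of the assembly stub: null check, load the method pointer,
// swap the delegate 'this' for the target 'this', then transfer control.
int InvokeSinglecast(ToyDelegate* delegateThis, int arg) {
    if (delegateThis == nullptr) {
        // The real stub jumps to JIT_InternalThrow at this point.
        throw std::runtime_error("NullReferenceException");
    }
    assert(delegateThis->methodPtr != nullptr);
    auto method = delegateThis->methodPtr;   // mov rax, [rcx + _methodPtr]
    void* targetThis = delegateThis->target; // mov rcx, [rcx + _target]
    return method(targetThis, arg);          // jmp rax (a plain call in this sketch)
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">InvokeSinglecast(..)</code> with a <code class="language-plaintext highlighter-rouge">ToyDelegate</code> that wraps a <code class="language-plaintext highlighter-rouge">Counter</code> and <code class="language-plaintext highlighter-rouge">Counter_Add</code> runs the method against the counter, never against the delegate itself, which is the whole point of replacing the ‘this’ pointer.</p>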
<h4 id="shuffle-thunks">Shuffle Thunks</h4>
<p>Finally, let’s look at ‘shuffle thunks’ in more detail (cases 2, 3, 6 from the table above).</p>
<ul>
<li>They are created in several places in the CoreCLR source, which all call into <code class="language-plaintext highlighter-rouge">COMDelegate::SetupShuffleThunk(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L580-L636">here</a>
<ol>
<li><code class="language-plaintext highlighter-rouge">COMDelegate::BindToMethod(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L864-L1037">here</a></li>
<li><code class="language-plaintext highlighter-rouge">COMDelegate::DelegateConstruct(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L1680-L1889">here</a></li>
<li><code class="language-plaintext highlighter-rouge">COMDelegate::GetDelegateCtor(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L3049-L3256">here</a></li>
</ol>
</li>
<li><code class="language-plaintext highlighter-rouge">COMDelegate::SetupShuffleThunk(..)</code> then calls <code class="language-plaintext highlighter-rouge">GenerateShuffleArray(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L243-L508">here</a></li>
<li>Followed by a call to <code class="language-plaintext highlighter-rouge">StubCacheBase::Canonicalize(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubcache.cpp#L70-L165">here</a>, that ends up in <code class="language-plaintext highlighter-rouge">ShuffleThunkCache::CompileStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.h#L234-L244">here</a></li>
<li>This ends up calling the CPU-specific method <code class="language-plaintext highlighter-rouge">EmitShuffleThunk(..)</code>:
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L3815-L4096">src/vm/i386</a> (also does AMD64 and UNIX_AMD64_ABI)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L1422-L1604">src/vm/arm</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/stubs.cpp#L1760-L1802">src/vm/arm64</a></li>
</ul>
</li>
</ul>
<p>Note how the stubs are cached in the <code class="language-plaintext highlighter-rouge">ShuffleThunkCache</code> where possible. This is because the thunks don’t have to be unique <em>per method</em>; they can be shared across multiple methods as long as the signatures are compatible.</p>
<p>However, these stubs are not straight-forward and sometimes they go wrong, for instance <a href="https://github.com/dotnet/coreclr/issues/26054">Infinite loop in GenerateShuffleArray on unix64 #26054</a>, fixed in <a href="https://github.com/dotnet/coreclr/pull/26169">PR #26169</a>. Also see <a href="https://github.com/dotnet/coreclr/issues/16833">Corrupted struct passed to delegate constructed via reflection #16833</a> and
<a href="https://github.com/dotnet/coreclr/pull/16904">Fix shuffling thunk for Unix AMD64 #16904</a> for more examples.</p>
<p>To give a flavour of what they need to do, here’s the code of the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/stubs.cpp#L1760-L1802">ARM64 version</a>, which is by far the simplest one!! If you want to understand the full complexities, take a look at the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L1422-L1604">ARM version</a> which is 182 LOC or the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L3815-L4096">x86 one</a> at 281 LOC!!</p>
<pre><code class="language-C++">// Emits code to adjust arguments for static delegate target.
VOID StubLinkerCPU::EmitShuffleThunk(ShuffleEntry *pShuffleEntryArray)
{
// On entry x0 holds the delegate instance. Look up the real target address stored in the MethodPtrAux
// field and save it in x16(ip). Tailcall to the target method after re-arranging the arguments
// ldr x16, [x0, #offsetof(DelegateObject, _methodPtrAux)]
EmitLoadStoreRegImm(eLOAD, IntReg(16), IntReg(0), DelegateObject::GetOffsetOfMethodPtrAux());
//add x11, x0, DelegateObject::GetOffsetOfMethodPtrAux() - load the indirection cell into x11 used by ResolveWorkerAsmStub
EmitAddImm(IntReg(11), IntReg(0), DelegateObject::GetOffsetOfMethodPtrAux());
for (ShuffleEntry* pEntry = pShuffleEntryArray; pEntry->srcofs != ShuffleEntry::SENTINEL; pEntry++)
{
if (pEntry->srcofs & ShuffleEntry::REGMASK)
{
// If source is present in register then destination must also be a register
_ASSERTE(pEntry->dstofs & ShuffleEntry::REGMASK);
EmitMovReg(IntReg(pEntry->dstofs & ShuffleEntry::OFSMASK), IntReg(pEntry->srcofs & ShuffleEntry::OFSMASK));
}
else if (pEntry->dstofs & ShuffleEntry::REGMASK)
{
// source must be on the stack
_ASSERTE(!(pEntry->srcofs & ShuffleEntry::REGMASK));
EmitLoadStoreRegImm(eLOAD, IntReg(pEntry->dstofs & ShuffleEntry::OFSMASK), RegSp, pEntry->srcofs * sizeof(void*));
}
else
{
// source must be on the stack
_ASSERTE(!(pEntry->srcofs & ShuffleEntry::REGMASK));
// dest must be on the stack
_ASSERTE(!(pEntry->dstofs & ShuffleEntry::REGMASK));
EmitLoadStoreRegImm(eLOAD, IntReg(9), RegSp, pEntry->srcofs * sizeof(void*));
EmitLoadStoreRegImm(eSTORE, IntReg(9), RegSp, pEntry->dstofs * sizeof(void*));
}
}
// Tailcall to target
// br x16
EmitJumpRegister(IntReg(16));
}
</code></pre>
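<p>The code above only <em>emits</em> instructions, so it can be hard to picture what the finished thunk does when it runs. The sketch below is a toy interpreter for a shuffle array, under some loud assumptions: <code class="language-plaintext highlighter-rouge">ToyShuffleEntry</code>, <code class="language-plaintext highlighter-rouge">ToyMachine</code>, <code class="language-plaintext highlighter-rouge">ApplyShuffle</code> and <code class="language-plaintext highlighter-rouge">BuildOpenStaticShuffle</code> are hypothetical names, only integer arguments are modelled, and the real <code class="language-plaintext highlighter-rouge">ShuffleEntry</code> packs its data differently (via <code class="language-plaintext highlighter-rouge">REGMASK</code>/<code class="language-plaintext highlighter-rouge">OFSMASK</code>).</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified version of the runtime's ShuffleEntry. Separate
// fields are used here instead of packed flags to keep the sketch readable.
struct ToyShuffleEntry {
    bool srcIsReg;
    int src; // register number, or stack slot index
    bool dstIsReg;
    int dst; // register number, or stack slot index
};

// Toy machine state: x0-x7 are the ARM64 integer argument registers.
struct ToyMachine {
    std::array<std::uint64_t, 8> regs{};
    std::vector<std::uint64_t> stack;
};

// Performs the moves that the emitted thunk would perform, in array order.
void ApplyShuffle(ToyMachine& machine, const std::vector<ToyShuffleEntry>& entries) {
    for (const ToyShuffleEntry& entry : entries) {
        if (entry.srcIsReg) {
            // Same rule as the _ASSERTE in the runtime code: a register
            // source only ever moves to a register destination.
            assert(entry.dstIsReg);
            machine.regs[entry.dst] = machine.regs[entry.src];
        } else if (entry.dstIsReg) {
            machine.regs[entry.dst] = machine.stack[entry.src];  // stack -> register
        } else {
            machine.stack[entry.dst] = machine.stack[entry.src]; // stack -> stack
        }
    }
}

// Builds the shuffle for a delegate over a static method that takes 8 integer
// arguments: the delegate in x0 is dropped, x1..x7 slide down one register
// and the first stack argument is promoted into x7.
std::vector<ToyShuffleEntry> BuildOpenStaticShuffle() {
    std::vector<ToyShuffleEntry> entries;
    for (int reg = 1; reg <= 7; ++reg) {
        entries.push_back({true, reg, true, reg - 1});
    }
    entries.push_back({false, 0, true, 7});
    return entries;
}
```

<p>The order of the entries matters: each register is read before anything overwrites it. It also shows why the real thunk loads the target address into <code class="language-plaintext highlighter-rouge">x16</code> first, because the very first move destroys the delegate pointer in <code class="language-plaintext highlighter-rouge">x0</code>.</p>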
<h3 id="unboxing">Unboxing</h3>
<p>I’ve written about this type of ‘stub’ before in <a href="/2017/08/02/A-look-at-the-internals-of-boxing-in-the-CLR/#unboxing-stub-creation">A look at the internals of ‘boxing’ in the CLR</a>, but in summary the unboxing stub needs to handle steps 2) and 3) from the diagram below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. MyStruct:         [0x05 0x00 0x00 0x00]
                     |   Object Header   |  MethodTable   |   MyStruct    |
2. MyStruct (Boxed): [0x40 0x5b 0x6f 0x6f 0xfe 0x7 0x0 0x0 0x5 0x0 0x0 0x0]
                                          ^
                    object 'this' pointer |
                     |   Object Header   |  MethodTable   |   MyStruct    |
3. MyStruct (Boxed): [0x40 0x5b 0x6f 0x6f 0xfe 0x7 0x0 0x0 0x5 0x0 0x0 0x0]
                                                           ^
                                   adjusted 'this' pointer |
</code></pre></div></div>
<p><strong>Key to the diagram</strong></p>
<ol>
<li>Original <code class="language-plaintext highlighter-rouge">struct</code>, on the <strong>stack</strong></li>
<li>The <code class="language-plaintext highlighter-rouge">struct</code> being <em>boxed</em> into an <code class="language-plaintext highlighter-rouge">object</code> that lives on the <strong>heap</strong></li>
<li>Adjustment made to <em>this</em> pointer so <code class="language-plaintext highlighter-rouge">MyStruct::ToString()</code> will work</li>
</ol>
<p>These stubs make it possible for ‘value types’ (structs) to override methods from <code class="language-plaintext highlighter-rouge">System.Object</code>, such as <code class="language-plaintext highlighter-rouge">ToString()</code> and <code class="language-plaintext highlighter-rouge">GetHashCode()</code>. The fix-up is needed because structs don’t have an ‘object header’, but when they’re <em>boxed</em> into an <code class="language-plaintext highlighter-rouge">Object</code> they do. So the stub has the job of moving or adjusting the ‘this’ pointer so that the code in the <code class="language-plaintext highlighter-rouge">ToString()</code> method can work the same, regardless of whether it’s operating on a regular ‘struct’ or one that’s been boxed into an ‘object’.</p>
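<p>As a rough illustration of that pointer adjustment, here is a small C++ model. The names (<code class="language-plaintext highlighter-rouge">BoxedMyStruct</code>, <code class="language-plaintext highlighter-rouge">MyStruct_GetHashCode</code>, <code class="language-plaintext highlighter-rouge">UnboxingStub_GetHashCode</code>) are made up for this sketch, and the layout is an assumption: it treats an object reference as pointing at the MethodTable pointer, with the struct’s fields one pointer-size later.</p>

```cpp
#include <cassert>
#include <cstddef>

// Illustrative value type; stands in for a C# struct that overrides GetHashCode().
struct MyStruct {
    int value;
};

// Toy layout of a boxed value as seen through an object reference: the
// reference points at the MethodTable pointer and the struct's fields follow.
// Anything stored before that address is not modelled.
struct BoxedMyStruct {
    const void* methodTable;
    MyStruct payload;
};

static_assert(offsetof(BoxedMyStruct, payload) == sizeof(void*),
              "payload is expected to start one pointer past the object reference");

// The struct's method is compiled to expect 'this' pointing at the raw fields.
int MyStruct_GetHashCode(const MyStruct* self) {
    return self->value;
}

// What an unboxing stub does, in spirit: bump 'this' past the MethodTable
// pointer, then hand over to the real method.
int UnboxingStub_GetHashCode(const void* boxedThis) {
    assert(boxedThis != nullptr);
    const char* raw = static_cast<const char*>(boxedThis);
    const MyStruct* unboxedThis = reinterpret_cast<const MyStruct*>(raw + sizeof(void*));
    return MyStruct_GetHashCode(unboxedThis);
}
```

<p>The method body itself never changes. Callers holding a boxed object go through the stub, while callers holding a plain struct call <code class="language-plaintext highlighter-rouge">MyStruct_GetHashCode(..)</code> directly.</p>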
<p>The unboxing stubs are created in <code class="language-plaintext highlighter-rouge">MethodDesc::DoPrestub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1455-L1487">here</a>, which in turn calls into <code class="language-plaintext highlighter-rouge">MakeUnboxingStubWorker(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1455-L1487">here</a></p>
<ul>
<li>when <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is <strong>disabled</strong> it then calls <code class="language-plaintext highlighter-rouge">EmitUnboxMethodStub(..)</code> to create the stub, which has per-CPU versions:
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L3135-L3193">i386</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L2145-L2189">ARM</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm64/stubs.cpp#L1831-L1841">ARM64</a></li>
</ul>
</li>
<li>when <code class="language-plaintext highlighter-rouge">FEATURE_STUBS_AS_IL</code> is <strong>enabled</strong> it instead calls into <code class="language-plaintext highlighter-rouge">CreateUnboxingILStubForSharedGenericValueTypeMethods(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1258-L1351">here</a></li>
</ul>
<p>For more information on some of the internal details of unboxing stubs and how they interact with <a href="#generics">‘generic instantiations’</a> see <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/genmeth.cpp#L641-L703">this informative comment</a> and one in the code for <code class="language-plaintext highlighter-rouge">MethodDesc::FindOrCreateAssociatedMethodDesc(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/genmeth.cpp#L843-L1036">here</a>.</p>
<h3 id="arrays">Arrays</h3>
<p>As discussed <a href="#why-are-stubs-needed">at the beginning</a>, the method bodies for arrays are provided by the runtime, that is, the array access methods ‘get’ and ‘set’ that allow <code class="language-plaintext highlighter-rouge">var a = myArray[5]</code> and <code class="language-plaintext highlighter-rouge">myArray[7] = 5</code> to work. Not surprisingly, these are done as stubs to allow them to be as small and efficient as possible.</p>
<p>Here is the flow for wiring up ‘array stubs’. It all starts up in <code class="language-plaintext highlighter-rouge">MethodDesc::DoPrestub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/prestub.cpp#L1914">here</a>:</p>
<ul>
<li>If <code class="language-plaintext highlighter-rouge">FEATURE_ARRAYSTUB_AS_IL</code> is defined (see <a href="#stubs-as-il">‘Stubs-as-IL’</a>), it happens in <code class="language-plaintext highlighter-rouge">GenerateArrayOpStub(ArrayMethodDesc* pMD)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/array.cpp#L1023-L1068">here</a>
<ul>
<li>Then <code class="language-plaintext highlighter-rouge">ArrayOpLinker::EmitStub()</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/array.cpp#L778-L1020">here</a>, which is responsible for generating 3 types of stubs <code class="language-plaintext highlighter-rouge">{ ILSTUB_ARRAYOP_GET, ILSTUB_ARRAYOP_SET, ILSTUB_ARRAYOP_ADDRESS }</code>.</li>
<li>Before calling <code class="language-plaintext highlighter-rouge">ILStubCache::CreateAndLinkNewILStubMethodDesc(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/ilstubcache.cpp#L84-L138">here</a></li>
<li>Finally ending up in <code class="language-plaintext highlighter-rouge">JitILStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/dllimport.cpp#L5666-L5712">here</a></li>
</ul>
</li>
<li>When <code class="language-plaintext highlighter-rouge">FEATURE_ARRAYSTUB_AS_IL</code> isn’t defined, it happens in another version of <code class="language-plaintext highlighter-rouge">GenerateArrayOpStub(ArrayMethodDesc* pMD)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/array.cpp#L1255-L1271">lower down</a>
<ul>
<li>Then <code class="language-plaintext highlighter-rouge">void GenerateArrayOpScript(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/array.cpp#L1071-L1226">here</a></li>
<li>Followed by a call to <code class="language-plaintext highlighter-rouge">StubCacheBase::Canonicalize(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubcache.cpp#L70-L165">here</a>, that ends up in <code class="language-plaintext highlighter-rouge">ArrayStubCache::CompileStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/array.cpp#L1273-L1279">here</a>.</li>
<li>Eventually, we end up in <code class="language-plaintext highlighter-rouge">StubLinkerCPU::EmitArrayOpStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L4786-L5636">here</a>, which does the heavy lifting (despite living under ‘\src\vm\i386’, it seems to support both x86 and AMD64)</li>
</ul>
</li>
</ul>
<p>I’m not going to include the code for the ‘stub-as-IL’ (<code class="language-plaintext highlighter-rouge">ArrayOpLinker::EmitStub()</code>) or the assembly code (<code class="language-plaintext highlighter-rouge">StubLinkerCPU::EmitArrayOpStub(..)</code>) versions of the array stubs because they’re both 100’s of lines long, dealing with type and bounds checking, computing addresses, multi-dimensional arrays and more. But to give an idea of the complexities, take a look at this comment from <code class="language-plaintext highlighter-rouge">StubLinkerCPU::EmitArrayOpStub(..)</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L4809">here</a>:</p>
<pre><code class="language-C++">// Register usage
//
//                                          x86                 AMD64
// Inputs:
//  managed array                           THIS_kREG (ecx)     THIS_kREG (rcx)
//  index 0                                 edx                 rdx
//  index 1/value                           <stack>             r8
//  index 2/value                           <stack>             r9
//  expected element type for LOADADDR      eax                 rax                 rdx
// Working registers:
//  total (accumulates unscaled offset)     edi                 r10
//  factor (accumulates the slice factor)   esi                 r11
</code></pre>
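<p>The two ‘working registers’ in that comment hint at how a multi-dimensional index is turned into a single offset. The sketch below shows one way to do that calculation in plain C++, assuming a row-major layout. It is not a transcription of the stub: <code class="language-plaintext highlighter-rouge">ComputeElementOffset</code> and its parameters are hypothetical, and the variable names <code class="language-plaintext highlighter-rouge">total</code> and <code class="language-plaintext highlighter-rouge">factor</code> are simply borrowed from the comment.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Returns the offset, counted in elements rather than bytes, of the element
// at 'indices' in a row-major multi-dimensional array.
std::size_t ComputeElementOffset(const std::vector<std::int32_t>& lengths,
                                 const std::vector<std::int32_t>& lowerBounds,
                                 const std::vector<std::int32_t>& indices) {
    assert(lengths.size() == lowerBounds.size());
    assert(lengths.size() == indices.size());

    std::size_t total = 0;  // accumulates the unscaled offset
    std::size_t factor = 1; // accumulates the slice factor

    // Walk from the last dimension to the first.
    for (std::size_t dim = lengths.size(); dim-- > 0;) {
        // Subtracting the lower bound as an unsigned value means a single
        // comparison rejects indices that are too small as well as too large.
        std::uint32_t adjusted = static_cast<std::uint32_t>(indices[dim]) -
                                 static_cast<std::uint32_t>(lowerBounds[dim]);
        if (adjusted >= static_cast<std::uint32_t>(lengths[dim])) {
            throw std::out_of_range("IndexOutOfRangeException");
        }
        total += adjusted * factor;
        factor *= static_cast<std::size_t>(lengths[dim]);
    }
    return total;
}
```

<p>For a 2 x 3 array with zero lower bounds, the element at <code class="language-plaintext highlighter-rouge">[1, 2]</code> lands at offset 5. A real stub still has to scale that by the element size and, for ‘set’, do the type check as well.</p>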
<p>Finally, these stubs are still being improved, for example see <a href="https://github.com/dotnet/coreclr/pull/22376/files">Use unsigned index extension in muldi-dimensional array stubs</a>.</p>
<h3 id="tail-calls">Tail Calls</h3>
<p>The .NET runtime provides a nice optimisation when doing ‘tail calls’, that (among other things) will prevent <code class="language-plaintext highlighter-rouge">StackOverflowException</code>s in recursive scenarios. For more on <em>why</em> these tail call optimisations are useful and how they work, take a look at:</p>
<ul>
<li><a href="https://dev.to/rohit/demystifying-tail-call-optimization-5bf3">Demystifying Tail Call Optimization</a></li>
<li><a href="https://blogs.msdn.microsoft.com/abhinaba/2008/09/26/tail-call-optimization/">Tail call optimization</a></li>
<li><a href="https://volgarev.me/2013/09/27/tail-recursion-and-trampolining-in-csharp.html">Tail Recursion And Trampolining In C#</a></li>
<li><a href="https://blogs.msdn.microsoft.com/davbr/2007/06/20/enter-leave-tailcall-hooks-part-2-tall-tales-of-tail-calls/">Enter, Leave, Tailcall Hooks Part 2: Tall tales of tail calls</a></li>
<li><a href="https://blogs.msdn.microsoft.com/davbr/2007/06/20/tail-call-jit-conditions/">Tail call JIT conditions</a></li>
</ul>
<p>In summary, a tail call optimisation allows the <em>same</em> stack frame to be re-used if in the <em>caller</em>, there is no work done after the function call to the <em>callee</em> (see <a href="https://blogs.msdn.microsoft.com/davbr/2007/06/20/tail-call-jit-conditions/">Tail call JIT conditions</a> (2007) for a more precise definition).</p>
<p>And why is this beneficial? From <a href="https://blogs.msdn.microsoft.com/clrcodegeneration/2009/05/11/tail-call-improvements-in-net-framework-4/">Tail Call Improvements in .NET Framework 4</a>:</p>
<blockquote>
<p>The primary reason for a tail call as an optimization is to improve data locality, memory usage, and cache usage. By doing a tail call the callee will use the same stack space as the caller. This reduces memory pressure. It marginally improves the cache because the same memory is reused for subsequent callers and thus can stay in the cache, rather than evicting some older cache line to make room for a new cache line.</p>
</blockquote>
<p>To make this clear, the code below <em>can</em> benefit from the optimisation, because both functions return straight after calling each other:</p>
<pre><code class="language-C#">public static long Ping(int cnt, long val)
{
if (cnt-- == 0)
return val;
return Pong(cnt, val + cnt);
}
public static long Pong(int cnt, long val)
{
if (cnt-- == 0)
return val;
return Ping(cnt, val + cnt);
}
</code></pre>
<p>However, if the code was changed to the version below, the optimisation would no longer work because <code class="language-plaintext highlighter-rouge">PingNotOptimised(..)</code> does some extra work between calling <code class="language-plaintext highlighter-rouge">Pong(..)</code> and when it returns:</p>
<pre><code class="language-C#">public static long PingNotOptimised(int cnt, long val)
{
if (cnt-- == 0)
return val;
var result = Pong(cnt, val + cnt);
result += 1; // prevents the Tail-call optimization
return result;
}
public static long Pong(int cnt, long val)
{
if (cnt-- == 0)
return val;
return PingNotOptimised(cnt, val + cnt);
}
</code></pre>
<p>You can see the difference in the code emitted by the JIT compiler for the different scenarios <a href="https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgAJiBGAdgFgAoAb0fLfIHoPyBhCAExjlc2fAAcANkIBmUCPnIALDBjG4QXAOYBLDIoCuwAHSR8HfhAwA7GBg5hci7FEnYrmjtty59MXB1IAVhQUAGIvHxhTfBgrDDgaAE4ABgAOVFJSVnZqJEoUcgBZbG0rAApqZIBtAF1yZ01cAEpsthYGdk7ODmojakSygAVSzTLSjCNihAA1bAlfNHJkpqaAblaurl7+oZGAOUsAeTEMbXwvGH4xuMnsGbmFpZX1jvYAX0YN4gBmSio8iQQdzkYbua4YchgOKLQHAgBucxarzaG062mk5DKUPicHIAF48U9UV02MQaOQERIXsTSeTBkDRtjFpTyABqSFxNYktgbD7I8hfX65ciwzQg/ZHE5nC5XcYcjAwhkUxEbdrc8jozHYuC4glE/nqsnKqmfA3sBFQciwHwSCGE+lgpnGtnytY0q1+fS2tmEqirbrkMSwOGxDC4ch6IQAFRKEjgYDmEnIECl5wAXthTkD3UbrV6MC9OnzOoK/gClQ7RnKnaLjUjOmruZqsXEdfjCct3Z0jZTqWbaeLHdDnezsW7+90jaDNAcMMdTudcJcWwqR67C+9GG8gA=">in SharpLab</a>.</p>
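<p>One way to picture what re-using the stack frame buys you is to write the optimised <code class="language-plaintext highlighter-rouge">Ping</code>/<code class="language-plaintext highlighter-rouge">Pong</code> pair out by hand. The C++ below is only a conceptual sketch, not what the JIT emits: it translates the two methods as-is and then adds a hypothetical <code class="language-plaintext highlighter-rouge">PingPongAsLoop</code> that overwrites the arguments in place and jumps back to the top.</p>

```cpp
#include <cassert>
#include <cstdint>

// Direct translations of Ping/Pong from the C# sample (a C# long is 64-bit).
std::int64_t Pong(std::int32_t cnt, std::int64_t val);

std::int64_t Ping(std::int32_t cnt, std::int64_t val) {
    if (cnt-- == 0)
        return val;
    return Pong(cnt, val + cnt);
}

std::int64_t Pong(std::int32_t cnt, std::int64_t val) {
    if (cnt-- == 0)
        return val;
    return Ping(cnt, val + cnt);
}

// What frame re-use amounts to: the arguments are updated in place and
// control goes back to the start, so the stack depth stays constant.
std::int64_t PingPongAsLoop(std::int32_t cnt, std::int64_t val) {
    assert(cnt >= 0);
    for (;;) {
        if (cnt-- == 0)
            return val;
        val = val + cnt;
    }
}
```

<p>Both forms compute the same value, but only the loop is guaranteed to cope with a very large <code class="language-plaintext highlighter-rouge">cnt</code>. The recursive pair depends on the compiler or runtime actually performing the optimisation.</p>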
<p>But where do the ‘tail call optimisation <em>stubs</em>’ come into play? Helpfully there is a tail call related <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/tailcalls-with-helpers.md">design doc</a> that explains, from ‘current way of handling tail-calls’:</p>
<blockquote>
<p><strong>Fast tail calls</strong>
These are tail calls that are handled directly by the jitter and no runtime cooperation is needed. They are limited to cases where:</p>
<ul>
<li>Return value and call target arguments are all either primitive types, reference types, or valuetypes with a single primitive type or reference type fields</li>
<li>The aligned size of call target arguments is less or equal to aligned size of caller arguments</li>
</ul>
</blockquote>
<p>So, the stubs aren’t always needed, sometimes the work can be done by the JIT, if the scenario is simple enough.</p>
<p>However for the more complex cases, a ‘helper’ stub is needed:</p>
<blockquote>
<p><strong>Tail calls using a helper</strong>
Tail calls in cases where we cannot perform the call in a simple way are implemented using a tail call helper. Here is a rough description of how it works:</p>
<ul>
<li>For each tail call target, the jitter asks runtime to <strong>generate an assembler argument copying routine</strong>. This routine reads vararg list of arguments and places the arguments in their proper slots in the CONTEXT or on the stack. Together with the argument copying routine, the runtime also builds a list of offsets of references and byrefs for return value of reference type or structs returned in a hidden return buffer and for structs passed by ref. The gc layout data block is stored at the end of the argument copying thunk.</li>
<li>At the time of the tail call, the caller generates a vararg list of all arguments of the tail called function and then calls <code class="language-plaintext highlighter-rouge">JIT_TailCall</code> runtime function. It passes it the copying routine address, the target address and the vararg list of the arguments.</li>
<li>The <code class="language-plaintext highlighter-rouge">JIT_TailCall</code> then performs the following:
…</li>
</ul>
</blockquote>
<p>To see the rest of the steps that <code class="language-plaintext highlighter-rouge">JIT_TailCall</code> takes you can read the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/tailcalls-with-helpers.md#tail-calls-using-a-helper">design doc</a> or if you’re really keen you can look at the code in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/jithelpers.cpp#L5978-L6195">/vm/jithelpers.cpp</a>. Also, there’s a useful explanation of what it needs to handle in the JIT code, see <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/jit/morph.cpp#L7491-L7528">here</a> and <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/jit/lower.cpp#L2125-L2150">here</a>.</p>
<p>However, we’re just going to focus on the stubs, referred to as an ‘assembler argument copying routine’. Firstly, we can see that they have their own stub manager, <code class="language-plaintext highlighter-rouge">TailCallStubManager</code>, which is implemented <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/stubmgr.cpp#L2350-L2488">here</a> and allows the stubs to play nicely with the debugger. Also interesting to look at is the <code class="language-plaintext highlighter-rouge">TailCallFrame</code> <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/frames.h#L3141-L3271">here</a> that is used to ensure that the ‘stack walker’ can work well with tail calls.</p>
<p>Now, onto the stubs themselves, the ‘copying routines’ are provided by the runtime via a call to <code class="language-plaintext highlighter-rouge">CEEInfo::getTailCallCopyArgsThunk(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/jitinterface.cpp#L13814-L13838">/vm/jitinterface.cpp</a>. This in turn calls the CPU specific versions of <code class="language-plaintext highlighter-rouge">CPUSTUBLINKER::CreateTailCallCopyArgsThunk(..)</code>:</p>
<ul>
<li>X86 <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L6042-L6435">/vm/i386/stublinkerx86.cpp</a></li>
<li>ARM <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L3089-L3339">/vm/arm/stubs.cpp</a></li>
</ul>
<p>These routines have the complex and hairy job of dealing with the CPU registers and calling conventions. They achieve this by dynamically emitting assembly instructions to create a function that looks like the <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/stublinkerx86.cpp#L6051-L6064">following pseudo code</a> (X86 version):</p>
<pre><code class="language-C++"> // size_t CopyArguments(va_list args, (RCX)
// CONTEXT *pCtx, (RDX)
// DWORD64 *pvStack, (R8)
// size_t cbStack) (R9)
// {
// if (pCtx != NULL) {
// foreach (arg in args) {
// copy into pCtx or pvStack
// }
// }
// return <size of stack needed>;
// }
</code></pre>
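<p>To show how a caller might use a routine of this shape, here is a small, portable C++ model of the pseudo code above. It is only a sketch under some loud assumptions: <code class="language-plaintext highlighter-rouge">FakeContext</code> is a made-up stand-in for <code class="language-plaintext highlighter-rouge">CONTEXT</code>, the <code class="language-plaintext highlighter-rouge">va_list</code> is replaced by a <code class="language-plaintext highlighter-rouge">std::vector</code>, every argument is assumed to be a single 64-bit integer and the <code class="language-plaintext highlighter-rouge">cbStack</code> bounds check is my own addition. The real stubs are emitted as machine code and deal with far more cases.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CONTEXT: just the four integer argument
// registers named in the pseudo code (RCX, RDX, R8, R9).
struct FakeContext {
    std::uint64_t rcx = 0, rdx = 0, r8 = 0, r9 = 0;
};

// Model of the copying routine. With ctx == nullptr nothing is copied and
// only the number of stack bytes needed is returned, so a caller can first
// ask for the size and then call again with a big enough buffer.
std::size_t CopyArguments(const std::vector<std::uint64_t>& args,
                          FakeContext* ctx,
                          std::uint64_t* stack,
                          std::size_t cbStack)
{
    constexpr std::size_t kRegArgs = 4;
    const std::size_t stackSlots = args.size() > kRegArgs ? args.size() - kRegArgs : 0;
    const std::size_t needed = stackSlots * sizeof(std::uint64_t);

    // A stack buffer is only required when something has to be spilled.
    assert(ctx == nullptr || needed == 0 || stack != nullptr);

    if (ctx != nullptr && needed <= cbStack) {
        std::uint64_t* regs[kRegArgs] = { &ctx->rcx, &ctx->rdx, &ctx->r8, &ctx->r9 };
        for (std::size_t i = 0; i < args.size(); ++i) {
            if (i < kRegArgs)
                *regs[i] = args[i];            // first four go to "registers"
            else
                stack[i - kRegArgs] = args[i]; // the rest go to the stack
        }
    }
    return needed;
}
```

<p>Calling it once with a null context and then again with a real one mirrors the ‘size first, copy second’ pattern that the <code class="language-plaintext highlighter-rouge">if (pCtx != NULL)</code> check in the pseudo code allows.</p>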
<p>In addition, one other type of stub is used. Known as the <code class="language-plaintext highlighter-rouge">TailCallHelperStub</code>, it also comes in per-CPU versions:</p>
<ul>
<li>AMD64 <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/JitHelpers_Fast.asm#L858-L956">/vm/amd64/JitHelpers_Fast.asm</a></li>
<li>ARM <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/asmhelpers.asm#L203-L315">/vm/arm/asmhelpers.asm</a></li>
</ul>
<p>Going forward, there are several limitations to this approach of using per-CPU stubs, <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/tailcalls-with-helpers.md#tail-calls-using-a-helper">as the design doc explains</a>:</p>
<blockquote>
<ul>
<li><strong>It is expensive to port to new platforms</strong>
<ul>
<li>Parsing the vararg list is not possible to do in a portable way on Unix. Unlike on Windows, the list is not stored a linear sequence of the parameter data bytes in memory. va_list on Unix is an opaque data type, some of the parameters can be in registers and some in the memory.</li>
<li>Generating the copying asm routine needs to be done for each target architecture / platform differently. And it is also very complex, error prone and impossible to do on platforms where code generation at runtime is not allowed.</li>
</ul>
</li>
<li><strong>It is slower than it has to be</strong>
<ul>
<li>The parameters are copied possibly twice - once from the vararg list to the stack and then one more time if there was not enough space in the caller’s stack frame.</li>
<li><code class="language-plaintext highlighter-rouge">RtlRestoreContext</code> restores all registers from the <code class="language-plaintext highlighter-rouge">CONTEXT</code> structure, not just a subset of them that is really necessary for the functionality, so it results in another unnecessary memory accesses.</li>
</ul>
</li>
<li><strong>Stack walking over the stack frames of the tail calls requires runtime assistance.</strong></li>
</ul>
</blockquote>
<p>Fortunately, it then goes into great depth discussing how a new approach <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/tailcalls-with-helpers.md#the-new-approach-to-tail-calls-using-helpers">could be implemented</a> and how it would solve these issues. Even better, work has already started and we can follow along in <a href="https://github.com/dotnet/coreclr/pull/26418">Implement portable tailcall helpers #26418</a> (currently sitting at ‘31 of 55’ tasks completed, with over 50 files modified, it’s not a small job!).</p>
<p>Finally, for other PRs related to tail calls, see:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/pull/703/files">Disable JIT_TailCall invocation on Unix #703</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/9405">JIT: enable implicit tail calls from inlined code #9405</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/2556">Full tailcall support on Unix #2556</a></li>
</ul>
<h3 id="virtual-stub-dispatch-vsd">Virtual Stub Dispatch (VSD)</h3>
<p>I’ve saved the best for last: ‘Virtual Stub Dispatch’ or VSD is such an in-depth topic that it has an entire <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md">BotR page devoted to it</a>!! From the introduction:</p>
<blockquote>
<p>Virtual stub dispatching (VSD) is the <strong>technique of using stubs for virtual method invocations instead of the traditional virtual method table</strong>. In the past, interface dispatch required that interfaces had process-unique identifiers, and that every loaded interface was added to a global interface virtual table map. This requirement meant that all interfaces and all classes that implemented interfaces had to be restored at runtime in NGEN scenarios, causing significant startup working set increases. <strong>The motivation for stub dispatching was to eliminate much of the related working set, as well as distribute the remaining work throughout the lifetime of the process</strong>.</p>
</blockquote>
<p>It then goes on to say:</p>
<blockquote>
<p>Although it is possible for VSD to dispatch both virtual instance and interface method calls, <strong>it is currently used only for interface dispatch</strong>.</p>
</blockquote>
<p>So despite having the word ‘virtual’ in the title, it’s not actually used for C# methods with the <code class="language-plaintext highlighter-rouge">virtual</code> modifier on them. However, if you look at the <a href="https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBLANgHwAEAmARgFgAoQgZgAIS6BhOkOgSQDEII6BvKnSEN6hFHQAiEbhAAUASn6DhKgPSrJvAM4QAtjAwALbADsA5gEJlQgL5U7lKrTqmMMKADMAhmBgcZSpR0AJBimjIKANzC9kA==">IL for interface methods</a> you can see why they are also known as ‘virtual’.</p>
<p>Virtual Stub Dispatch is so complex, it actually has several different stub types, from <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/virtualcallstub.h#L311--L318">/vm/virtualcallstub.h</a>:</p>
<pre><code class="language-C++">enum StubKind {
SK_UNKNOWN,
SK_LOOKUP, // Lookup Stubs are SLOW stubs that simply call into the runtime to do all work.
SK_DISPATCH, // Dispatch Stubs have a fast check for one type otherwise jumps to runtime. Works for monomorphic sites
SK_RESOLVE, // Resolve Stubs do a hash lookup before fallling back to the runtime. Works for polymorphic sites.
SK_VTABLECALL, // Stub that jumps to a target method using vtable-based indirections. Works for non-interface calls.
SK_BREAKPOINT
};
</code></pre>
<p>So there are the following types (these are links to the <code class="language-plaintext highlighter-rouge">AMD64</code> versions, <code class="language-plaintext highlighter-rouge">x86</code> versions are in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/i386/virtualcallstubcpu.hpp">/vm/i386/virtualcallstubcpu.hpp</a>):</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/virtualcallstubcpu.hpp#L50-L82">Lookup Stubs</a>:
<ul>
<li><code class="language-plaintext highlighter-rouge">// Virtual and interface call sites are initially setup to point at LookupStubs. This is because the runtime type of the <this> pointer is not yet known, so the target cannot be resolved.</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/virtualcallstubcpu.hpp#L194-L286">Dispatch Stubs</a>:
<ul>
<li><code class="language-plaintext highlighter-rouge">// Monomorphic and mostly monomorphic call sites eventually point to DispatchStubs. A dispatch stub has an expected type (expectedMT), target address (target) and fail address (failure). If the calling frame does in fact have the <this> type be of the expected type, then control is transfered to the target address, the method implementation. If not, then control is transfered to the fail address, a fail stub (see below) where a polymorphic lookup is done to find the correct address to go to.</code></li>
<li>There are also specific versions, <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/virtualcallstubcpu.hpp#L118-L143">DispatchStubShort</a> and <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/virtualcallstubcpu.hpp#L154-L183">DispatchStubLong</a>, see <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/virtualcallstubcpu.hpp#L110-L116">this comment</a> for why they are both needed.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/virtualcallstubcpu.hpp#L337-L435">Resolve Stubs</a>:
<ul>
<li><code class="language-plaintext highlighter-rouge">// Polymorphic call sites and monomorphic calls that fail end up in a ResolverStub. There is only one resolver stub built for any given token, even though there may be many call sites that use that token and many distinct <this> types that are used in the calling call frames. A resolver stub actually has two entry points, one for polymorphic call sites and one for dispatch stubs that fail on their expectedMT test. There is a third part of the resolver stub that enters the ee when a decision should be made about changing the callsite.</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/virtualcallstubcpu.hpp#L460-L494">V-Table or Virtual Call Stubs</a>:
<ul>
<li><code class="language-plaintext highlighter-rouge">//These are jump stubs that perform a vtable-base virtual call. These stubs assume that an object is placed in the first argument register (this pointer). From there, the stub extracts the MethodTable pointer, followed by the vtable pointer, and finally jumps to the target method at a given slot in the vtable.</code></li>
</ul>
</li>
</ul>
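<p>To make the hand-off between lookup, dispatch and resolve stubs a little more concrete, here is a rough C++ model of a single interface call site. It is only a sketch: <code class="language-plaintext highlighter-rouge">CallSite</code>, <code class="language-plaintext highlighter-rouge">TypeId</code>, <code class="language-plaintext highlighter-rouge">Target</code> and <code class="language-plaintext highlighter-rouge">SlowResolve</code> are names made up for illustration, a string stands in for a method address, and the call site switches to the resolve path on the very first miss, which is a simplification of the decision described in the comments quoted above.</p>

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using TypeId = int;          // stand-in for a MethodTable pointer
using Target = std::string;  // stand-in for the address of a method body

// Hypothetical model of one interface call site and the stubs behind it.
struct CallSite {
    enum class Kind { Lookup, Dispatch, Resolve };

    Kind kind = Kind::Lookup;                          // every site starts at a lookup stub
    TypeId expectedType = -1;                          // dispatch stub: the one expected type
    std::unordered_map<TypeId, Target> cache;          // resolve stub: hash lookup
    const std::unordered_map<TypeId, Target>* impls;   // the "runtime", i.e. the slow path
    int slowLookups = 0;

    explicit CallSite(const std::unordered_map<TypeId, Target>& table) : impls(&table) {}

    Target SlowResolve(TypeId type) {
        ++slowLookups;
        auto it = impls->find(type);
        assert(it != impls->end() && "type does not implement the interface");
        return it->second;
    }

    Target Call(TypeId type) {
        if (kind == Kind::Lookup) {
            // First call: ask the runtime, then behave like a dispatch stub.
            Target t = SlowResolve(type);
            expectedType = type;
            cache[type] = t;
            kind = Kind::Dispatch;
            return t;
        }
        if (kind == Kind::Dispatch) {
            if (type == expectedType)
                return cache.at(type);   // fast path: a single type check
            kind = Kind::Resolve;        // simplification: switch on the first miss
        }
        // Resolve: hash lookup first, falling back to the runtime.
        auto it = cache.find(type);
        if (it != cache.end())
            return it->second;
        Target t = SlowResolve(type);
        cache[type] = t;
        return t;
    }
};
```

<p>With two implementing types, the first call goes through the slow path, repeated calls with the same type take the single-check fast path, and a second type pushes the site onto the hash lookup, so the ‘runtime’ is only consulted once per type.</p>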
<p>The diagram below shows the general control flow between these stubs:</p>
<p><img src="/images/2019/09/virtualstubdispatch-fig2.png" alt="Virtual Stub Dispatch" /></p>
<p>(Image from <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md#design-of-virtual-stub-dispatch">‘Design of Virtual Stub Dispatch’</a>)</p>
<p>Finally, if you want <em>even</em> more in-depth information see <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/virtualcallstub.h#L176-L219">this comment</a>.</p>
<p>However, these stubs come at a cost, which makes <em>virtual</em> method calls more expensive than <em>direct</em> ones. This is why <em>de-virtualization</em> is so important, i.e. the process of the .NET JIT detecting when a <em>virtual</em> call can instead be replaced by a <em>direct</em> one. There has been some work done in .NET Core to improve this, see <a href="https://github.com/dotnet/coreclr/pull/9230">Simple devirtualization #9230</a> which covers <code class="language-plaintext highlighter-rouge">sealed</code> classes/methods and when the object type is known <em>exactly</em>. However there is still more to be done, as shown in <a href="https://github.com/dotnet/coreclr/issues/9908">JIT: devirtualization next steps #9908</a>, where ‘5 of 23’ tasks have been completed.</p>
<h2 id="other-types-of-stubs">Other Types of Stubs</h2>
<p>This post is already <em>way</em> too long, so I don’t intend to offer any analysis of the following stubs. Instead I’ve just included some links to more information so you can read up on any that interest you!</p>
<p><strong>‘Jump’ stubs</strong></p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/jump-stubs.md">‘Jump Stubs’ design doc</a></li>
<li><a href="https://support.microsoft.com/en-gb/help/3152158/out-of-memory-exception-in-a-managed-application-that-s-running-on-the#section-3">Out-of-memory exception in a managed application that’s running on the 64-bit .NET Framework</a></li>
</ul>
<p><strong>‘Function Pointer’ stubs</strong></p>
<ul>
<li>‘Function Pointer’ Stubs, see <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/fptrstubs.cpp">/vm/fptrstubs.cpp</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/fptrstubs.h">/vm/fptrstubs.h</a></li>
<li><code class="language-plaintext highlighter-rouge">// FuncPtrStubs contains stubs that is used by GetMultiCallableAddrOfCode() if the function has not been jitted. Using a stub decouples ldftn from the prestub, so prestub does not need to be backpatched. This stub is also used in other places which need a function pointer</code></li>
</ul>
<p><strong>‘Thread Hijacking’ stubs</strong></p>
<p>From the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/threading.md">BotR page on ‘Threading’</a>:</p>
<blockquote>
<ul>
<li>If fully interruptable, it is safe to perform a GC at any point, since the thread is, by definition, at a safe point. It is reasonable to leave the thread suspended at this point (because it’s safe) but various historical OS bugs prevent this from working, because the CONTEXT retrieved earlier may be corrupt). Instead, the thread’s instruction pointer is overwritten, redirecting it to a <strong>stub</strong> that will capture a more complete CONTEXT, leave cooperative mode, wait for the GC to complete, reenter cooperative mode, and restore the thread to its previous state.</li>
<li>If partially-interruptable, the thread is, by definition, not at a safe point. However, the caller will be at a safe point (method transition). Using that knowledge, the CLR “hijacks” the top-most stack frame’s return address (physically overwrite that location on the stack) with a <strong>stub</strong> similar to the one used for fully-interruptable code. When the method returns, it will no longer return to its actual caller, but rather to the <strong>stub</strong> (the method may also perform a GC poll, inserted by the JIT, before that point, which will cause it to leave cooperative mode and undo the hijack).</li>
</ul>
</blockquote>
<p>This is done with the <code class="language-plaintext highlighter-rouge">OnHijackTripThread</code> method in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64/AsmHelpers.asm#L431-L456">/vm/amd64/AsmHelpers.asm</a>, which calls into <code class="language-plaintext highlighter-rouge">OnHijackWorker(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/threadsuspend.cpp#L5691-L5734">/vm/threadsuspend.cpp</a>.</p>
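<p>The return-address trick from the second bullet can be modelled in a few lines of C++. This is purely illustrative: a function pointer plays the role of the return address on the stack, and <code class="language-plaintext highlighter-rouge">Thread</code>, <code class="language-plaintext highlighter-rouge">Hijack</code>, <code class="language-plaintext highlighter-rouge">HijackStub</code> and <code class="language-plaintext highlighter-rouge">ReturnFromMethod</code> are hypothetical names, not the runtime’s own.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Thread;
using ReturnAddress = void (*)(Thread&);

// Hypothetical model of a managed thread: one return-address slot and a log
// that records where control goes.
struct Thread {
    ReturnAddress returnAddress = nullptr;    // slot in the top-most stack frame
    ReturnAddress hijackedReturn = nullptr;   // original value, saved while hijacked
    std::vector<std::string> log;
};

void Caller(Thread& t) { t.log.push_back("back in caller"); }

// Stand-in for the hijack stub: wait at the safe point, undo the hijack and
// continue in the real caller.
void HijackStub(Thread& t) {
    t.log.push_back("stub: wait for GC");
    ReturnAddress original = t.hijackedReturn;
    assert(original != nullptr);
    t.hijackedReturn = nullptr;
    original(t);
}

// What the suspending side does: overwrite the return address with the stub.
void Hijack(Thread& t) {
    assert(t.hijackedReturn == nullptr);
    t.hijackedReturn = t.returnAddress;
    t.returnAddress = HijackStub;
}

// The method "returns" by jumping to whatever is in its return-address slot.
void ReturnFromMethod(Thread& t) {
    t.log.push_back("method returns");
    t.returnAddress(t);
}
```

<p>After <code class="language-plaintext highlighter-rouge">Hijack</code>, returning from the method lands in the stub first and only then in the original caller, which is the detour the quote describes.</p>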
<p><strong>‘NGEN Fixup’ stubs</strong></p>
<p>From <a href="https://web.archive.org/web/20090213104137/http://msdn.microsoft.com/en-us/magazine/cc163610.aspx#S7">CLR Inside Out - The Performance Benefits of NGen</a> (2006):</p>
<blockquote>
<p>Throughput of NGen-compiled code is lower than that of JIT-compiled code primarily for one reason: cross-assembly references. In JIT-compiled code, cross-assembly references can be implemented as direct calls or jumps since the exact addresses of these references are known at run time. For statically compiled code, however, <strong>cross-assembly references need to go through a jump slot that gets populated with the correct address at run time by executing a method pre-stub</strong>. The method pre-stub ensures, among other things, that the <strong>native images for assemblies referenced by that method are loaded into memory before the method is executed</strong>. The pre-stub only needs to be executed the first time the method is called; it is short-circuited out for subsequent calls. However, every time the method is called, cross-assembly references do need to go through a level of indirection. This is principally what accounted for the 5-10 percent drop in throughput for NGen-compiled code when compared to JIT-compiled code.</p>
</blockquote>
<p>Also see the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/jump-stubs.md#ngen">‘NGEN’ section</a> of the ‘jump stub’ design doc.</p>
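<p>The ‘jump slot populated by a pre-stub’ idea from the quote above is a form of lazy binding, and can be sketched in C++ with a function pointer that is patched on first use. Everything here (<code class="language-plaintext highlighter-rouge">g_slot</code>, <code class="language-plaintext highlighter-rouge">PreStub</code>, <code class="language-plaintext highlighter-rouge">RealTarget</code>, <code class="language-plaintext highlighter-rouge">CallThroughSlot</code>) is a made-up name for illustration, not how NGEN images are actually laid out.</p>

```cpp
#include <cassert>

// Hypothetical model of a cross-assembly call through an indirection cell.
using Method = int (*)(int);

int RealTarget(int x) { return x * 2; }   // the method in the other assembly
int PreStub(int x);                       // fixes up the slot on first use

Method g_slot = PreStub;   // jump slot, initially pointing at the pre-stub
int g_fixups = 0;          // how many times the pre-stub has run

int PreStub(int x) {
    assert(g_slot == PreStub);  // only reachable while the slot is unpatched
    ++g_fixups;                 // the real pre-stub would load the native image here
    g_slot = RealTarget;        // patch the slot so later calls skip the pre-stub
    return g_slot(x);           // complete the original call
}

// Every call, first or not, still pays for one indirection through the slot.
int CallThroughSlot(int x) { return g_slot(x); }
```

<p>The pre-stub runs once, but <code class="language-plaintext highlighter-rouge">CallThroughSlot</code> always goes through the pointer, which is the extra level of indirection the article blames for the drop in throughput.</p>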
<hr />
<h2 id="stubs-in-the-mono-runtime">Stubs in the Mono Runtime</h2>
<p>Mono refers to ‘Stubs’ as ‘Trampolines’ and they’re <a href="https://github.com/mono/mono/search?l=C&q=trampoline&type=Code">widely used</a> in the source code.</p>
<p>The Mono docs have an excellent page <a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/">all about ‘Trampolines’</a>, that lists the following types:</p>
<ol>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#jit-trampolines">JIT Trampolines</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#virtual-call-trampolines">Virtual Call Trampolines</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#jump-trampolines">Jump Trampolines</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#class-init-trampolines">Class Init Trampolines</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#generic-class-init-trampoline">Generic Class Init Trampoline</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#rgctx-lazy-fetch-trampolines">RGCTX Lazy Fetch Trampolines</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#aot-trampolines">AOT Trampolines</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#delegate-trampolines">Delegate Trampolines</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/trampolines/#monitor-enterexit-trampolines">Monitor Enter/Exit Trampolines</a></li>
</ol>
<p>Also the docs page on <a href="https://www.mono-project.com/docs/advanced/runtime/docs/generic-sharing/">Generic Sharing</a> has some good, in-depth information.</p>
<ul>
<li><a href="https://web.archive.org/web/20170628151528/http://www.advogato.org/person/lupus/diary/24.html">Memory savings with magic trampolines in Mono</a></li>
<li><a href="https://www.infoq.com/news/2007/10/Mono-JIT/">Mono JIT Enhancements: Trampolines and Code Sharing</a></li>
<li><a href="https://tirania.org/blog/archive/2007/Sep-21.html">Generics Improvements</a></li>
<li><a href="https://schani.wordpress.com/2007/09/22/generics-sharing-in-mono/">Generics Sharing in Mono</a></li>
<li><a href="https://schani.wordpress.com/2007/10/12/the-trouble-with-shared-generics/">The Trouble with Shared Generics</a></li>
<li><a href="https://blogs.unity3d.com/2015/06/16/il2cpp-internals-generic-sharing-implementation/">IL2CPP Internals: Generic sharing implementation</a></li>
<li><a href="https://github.com/mono/mono/blob/master/docs/jit-trampolines">How-to trigger JIT compilation</a></li>
</ul>
<hr />
<h2 id="conclusion">Conclusion</h2>
<p>So it turns out that ‘stubs’ are way more prevalent in the .NET Core Runtime than I imagined when I first started on this post. They are an interesting technique and they contain a fair amount of complexity. In addition, I only covered each stub in isolation; in reality many of them have to play nicely together. For instance, imagine a <code class="language-plaintext highlighter-rouge">delegate</code> calling a <code class="language-plaintext highlighter-rouge">virtual</code> method that has <code class="language-plaintext highlighter-rouge">generic</code> type parameters and you can see that things start to get complex! (that scenario <em>might</em> contain 3 separate stubs, although they are also <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/arm/stubs.cpp#L1677-L1684">shared where possible</a>). If you were then to add <code class="language-plaintext highlighter-rouge">array</code> methods, <code class="language-plaintext highlighter-rouge">P/Invoke</code> marshalling and <code class="language-plaintext highlighter-rouge">un-boxing</code> to the mix, things get <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/comdelegate.cpp#L3228-L3252">even more hairy</a> and <a href="https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/genmeth.cpp#L660-L703">even more complex</a>!</p>
<p><strong>If anyone has read this far and wants a fun challenge, try and figure out what’s the most stubs you can force a single method call to go via! If you do, let me know in the comments or <a href="https://twitter.com/matthewwarren">via twitter</a></strong></p>
<p>Finally, by knowing <strong>where</strong> and <strong>when</strong> stubs are involved in our method calls, we can start to understand the overhead of each scenario. For instance, it explains why <code class="language-plaintext highlighter-rouge">delegate</code> method calls are a bit slower than calling a method directly and why ‘de-virtualization’ is so important. Having the JIT be able to perform extra analysis to determine that a virtual call can be converted into a direct one skips an entire level of indirection, for more on this see:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/pull/9230">Simple devirtualization #9230</a> (already implemented)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/9908">JIT: devirtualization next steps #9908</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/GuardedDevirtualization.md">‘Guarded Devirtualization’ design doc</a></li>
</ul>
ASCII Art in .NET Code2019-04-25T00:00:00+00:00http://www.mattwarren.org/2019/04/25/ASCII-Art-in-.NET-Code
<p>Who doesn’t like a nice bit of ‘ASCII Art’? I know I certainly do!</p>
<p><a href="https://www.youtube.com/watch?v=bwSNyA1Nfz4&t=1477"><img src="/images/2019/04/ASCII Art - Matt's CLR.png" alt="ASCII Art - Matt's CLR" /></a></p>
<p>To see what <em>Matt’s CLR</em> was all about you can watch the recording of my talk <a href="https://www.youtube.com/watch?v=bwSNyA1Nfz4&t=1477">‘From ‘dotnet run’ to ‘Hello World!’’</a> (from about 24:30 in)</p>
<hr />
<p>So armed with a trusty regex <code class="language-plaintext highlighter-rouge">/\*(.*?)\*/|//(.*?)\r?\n|"((\\[^\n]|[^"\n])*)"|@("[^"]*")+</code>, I set out to find all the <strong>interesting ASCII Art</strong> used in source code comments in the following <em>.NET related</em> repositories:</p>
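<p>If you want to try the regex yourself, here is a minimal C++ sketch that runs it over a piece of source text with <code class="language-plaintext highlighter-rouge">std::regex</code>. This is not the tooling used for this post, and <code class="language-plaintext highlighter-rouge">ExtractComments</code> is a name made up for the example. One caveat: in <code class="language-plaintext highlighter-rouge">std::regex</code>’s default ECMAScript grammar <code class="language-plaintext highlighter-rouge">.</code> does not match a newline, so this version only picks up block comments that sit on a single line (in .NET you would add <code class="language-plaintext highlighter-rouge">RegexOptions.Singleline</code>).</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the text of every comment found in `source`. The pattern also
// matches string literals (groups 3 and 5) purely so that comment markers
// inside a string, such as the // in a URL, are consumed and skipped.
std::vector<std::string> ExtractComments(const std::string& source) {
    static const std::regex pattern(
        R"re(/\*(.*?)\*/|//(.*?)\r?\n|"((\\[^\n]|[^"\n])*)"|@("[^"]*")+)re");

    std::vector<std::string> comments;
    for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched)
            comments.push_back(m[1].str());   // group 1: /* block comment */
        else if (m[2].matched)
            comments.push_back(m[2].str());   // group 2: // line comment
    }
    return comments;
}
```

<p>The order of the alternatives matters less than the fact that strings are matched at all: because a whole string literal is one match, the scan resumes after its closing quote and never sees the <code class="language-plaintext highlighter-rouge">//</code> inside it.</p>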
<ul>
<li><a href="https://github.com/dotnet/coreclr/">dotnet/CoreCLR</a> - “<em>the runtime for .NET Core. It includes the garbage collector, JIT compiler, primitive data types and low-level classes.</em>”</li>
<li><a href="https://github.com/mono/mono">Mono</a> - “<em>open source ECMA CLI, C# and .NET implementation.</em>”</li>
<li><a href="https://github.com/dotnet/corefx">dotnet/CoreFX</a> - “<em>the foundational class libraries for .NET Core. It includes types for collections, file systems, console, JSON, XML, async and many others.</em>”</li>
<li><a href="https://github.com/dotnet/Roslyn">dotnet/Roslyn</a> - “<em>provides C# and Visual Basic languages with rich code analysis APIs</em>”</li>
<li><a href="https://github.com/aspnet/AspNetCore">aspnet/AspNetCore</a> - “<em>a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.</em>”</li>
</ul>
<p><strong>Note</strong>: Yes, I shamelessly ‘borrowed’ this idea from <a href="https://twitter.com/johnregehr/status/1095018518737637376">John Regehr</a>, I was motivated to write this because his excellent post <a href="https://blog.regehr.org/archives/1653">‘Explaining Code using ASCII Art’</a> didn’t have any <em>.NET related</em> code in it!</p>
<p><strong>If you’ve come across any interesting examples I’ve missed out, please let me know!</strong></p>
<hr />
<h2 id="table-of-contents">Table of Contents</h2>
<p>To make the examples easier to browse, I’ve split them up into categories:</p>
<ul>
<li><a href="#dave-cutler">Dave Cutler</a></li>
<li><a href="#syntax-trees">Syntax Trees</a></li>
<li><a href="#timelines">Timelines</a></li>
<li><a href="#logic-tables">Logic Tables</a></li>
<li><a href="#class-hierarchies">Class Hierarchies</a></li>
<li><a href="#component-diagrams">Component Diagrams</a></li>
<li><a href="#algorithms">Algorithms</a></li>
<li><a href="#bit-packing">Bit Packing</a></li>
<li><a href="#data-structures">Data Structures</a></li>
<li><a href="#state-machines">State Machines</a></li>
<li><a href="#rfcs-and-specs">RFC’s and Specs</a></li>
<li><a href="#dates--times">Dates & Times</a></li>
<li><a href="#stack-layouts">Stack Layouts</a></li>
<li><a href="#the-rest">The Rest</a></li>
</ul>
<hr />
<h2 id="dave-cutler">Dave Cutler</h2>
<p>There’s no <em>art</em> in this one, but it deserves its own category as it quotes the amazing <a href="https://en.wikipedia.org/wiki/Dave_Cutler">Dave Cutler</a> who led the development of Windows NT. Therefore there’s no better person to ask a deep, technical question about how <em>Thread Suspension</em> works on Windows, from <a href="https://github.com/dotnet/coreclr/blob/dc11162e1c36624d3cabb6e0bf6583a94ab2e30c/src/vm/threadsuspend.cpp#L102-L124">coreclr/src/vm/threadsuspend.cpp</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Message from David Cutler
/*
After SuspendThread returns, can the suspended thread continue to execute code in user mode?
[David Cutler] The suspended thread cannot execute any more user code, but it might be currently "running"
on a logical processor whose other logical processor is currently actually executing another thread.
In this case the target thread will not suspend until the hardware switches back to executing instructions
on its logical processor. In this case even the memory barrier would not necessarily work - a better solution
would be to use interlocked operations on the variable itself.
After SuspendThread returns, does the store buffer of the CPU for the suspended thread still need to drain?
Historically, we've assumed that the answer to both questions is No. But on one 4/8 hyper-threaded machine
running Win2K3 SP1 build 1421, we've seen two stress failures where SuspendThread returns while writes seem to still be in flight.
Usually after we suspend a thread, we then call GetThreadContext. This seems to guarantee consistency.
But there are places we would like to avoid GetThreadContext, if it's safe and legal.
[David Cutler] Get context delivers a APC to the target thread and waits on an event that will be set
when the target thread has delivered its context.
Chris.
*/
</code></pre></div></div>
<p>For more info on Dave Cutler, see this excellent interview <a href="https://dave.cheney.net/2018/10/06/internets-of-interest-6-dave-cutler-on-dave-cutler">‘Internets of Interest #6: Dave Cutler on Dave Cutler’</a> or <a href="https://news.microsoft.com/features/the-engineers-engineer-computer-industry-luminaries-salute-dave-cutlers-five-decade-long-quest-for-quality/">‘The engineer’s engineer: Computer industry luminaries salute Dave Cutler’s five-decade-long quest for quality’</a></p>
<hr />
<h2 id="syntax-trees">Syntax Trees</h2>
<p>The inner workings of the .NET ‘Just-in-Time’ (JIT) Compiler have always been a bit of a mystery to me. But informative comments like this one from <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/lsra.cpp#L6166-L6196">coreclr/src/jit/lsra.cpp</a> go some way to showing what it’s doing:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// For example, for this tree (numbers are execution order, lower is earlier and higher is later):
//
// +---------+----------+
// | GT_ADD (3) |
// +---------+----------+
// |
// / \
// / \
// / \
// +-------------------+ +----------------------+
// | x (1) | "tree" | y (2) |
// +-------------------+ +----------------------+
//
// generate this tree:
//
// +---------+----------+
// | GT_ADD (4) |
// +---------+----------+
// |
// / \
// / \
// / \
// +-------------------+ +----------------------+
// | GT_RELOAD (3) | | y (2) |
// +-------------------+ +----------------------+
// |
// +-------------------+
// | x (1) | "tree"
// +-------------------+
</code></pre></div></div>
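<p>The transformation in that comment is a small tree splice, which can be sketched in C++ as follows. This is a toy model, not the JIT’s code: <code class="language-plaintext highlighter-rouge">Node</code>, <code class="language-plaintext highlighter-rouge">MakeNode</code> and <code class="language-plaintext highlighter-rouge">InsertReload</code> are hypothetical names, and the way the execution order is renumbered is simply read off the two diagrams.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much simplified tree node.
struct Node {
    std::string op;
    int order = 0;                                // execution order, lower runs first
    std::vector<std::unique_ptr<Node>> operands;
};

std::unique_ptr<Node> MakeNode(std::string op, int order) {
    auto n = std::make_unique<Node>();
    n->op = std::move(op);
    n->order = order;
    return n;
}

// Splices a GT_RELOAD between `parent` and its operand at `index`.
// The reload takes the parent's old position in the execution order and the
// parent moves one step later, as in the diagrams above.
void InsertReload(Node& parent, std::size_t index) {
    assert(index < parent.operands.size());
    auto reload = MakeNode("GT_RELOAD", parent.order);
    reload->operands.push_back(std::move(parent.operands[index]));
    parent.operands[index] = std::move(reload);
    parent.order += 1;
}
```

<p>Building <code class="language-plaintext highlighter-rouge">GT_ADD (3)</code> over <code class="language-plaintext highlighter-rouge">x (1)</code> and <code class="language-plaintext highlighter-rouge">y (2)</code> and calling <code class="language-plaintext highlighter-rouge">InsertReload</code> on the first operand produces the second tree from the comment.</p>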
<p>There’s also a more in-depth example in <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/morph.cpp#L6170-L6236">coreclr/src/jit/morph.cpp</a></p>
<p>Also from <a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Compilers/VisualBasic/Portable/Semantics/TypeInference/RequiredConversion.vb#L87-L104">roslyn/src/Compilers/VisualBasic/Portable/Semantics/TypeInference/RequiredConversion.vb</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> '// These restrictions form a partial order composed of three chains: from less strict to more strict, we have:
'// [reverse chain] [None] < AnyReverse < ReverseReference < Identity
'// [middle chain] None < [Any,AnyReverse] < AnyConversionAndReverse < Identity
'// [forward chain] [None] < Any < ArrayElement < Reference < Identity
'//
'// = KEY:
'// / | \ = Identity
'// / | \ +r Reference
'// -r | +r -r ReverseReference
'// | +-any | +-any AnyConversionAndReverse
'// | /|\ +arr +arr ArrayElement
'// | / | \ | +any Any
'// -any | +any -any AnyReverse
'// \ | / none None
'// \ | /
'// none
'//
</code></pre></div></div>
<hr />
<h2 id="timelines">Timelines</h2>
<p>This example from <a href="https://github.com/dotnet/coreclr/blob/e277764916cbb740db199132be81701593820bb0/src/vm/comwaithandle.cpp#L129-L156">coreclr/src/vm/comwaithandle.cpp</a> was unique! I didn’t find another example of ASCII Art used to illustrate time-lines, it’s a really novel approach.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// In case the CLR is paused inbetween a wait, this method calculates how much
// the wait has to be adjusted to account for the CLR Freeze. Essentially all
// pause duration has to be considered as "time that never existed".
//
// Two cases exists, consider that 10 sec wait is issued
// Case 1: All pauses happened before the wait completes. Hence just the
// pause time needs to be added back at the end of wait
// 0 3 8 10
// |-----------|###################|------>
// 5-sec pause
// ....................>
// Additional 5 sec wait
// |=========================>
//
// Case 2: Pauses ended after the wait completes.
// 3 second of wait was left as the pause started at 7 so need to add that back
// 0 7 10
// |---------------------------|###########>
// 5-sec pause 12
// ...................>
// Additional 3 sec wait
// |==================>
//
// Both cases can be expressed in the same calculation
// pauseTime: sum of all pauses that were triggered after the timer was started
// expDuration: expected duration of the wait (without any pauses) 10 in the example
// actDuration: time when the wait finished. Since the CLR is frozen during pause it's
// max of timeout or pause-end. In case-1 it's 10, in case-2 it's 12
</code></pre></div></div>
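<p>The excerpt stops just before the calculation itself, so here is one formula that reproduces both worked cases, written as a small C++ function. Treat it as a sketch reconstructed from the two timelines, not as the code in <code class="language-plaintext highlighter-rouge">comwaithandle.cpp</code>; the function name and parameter order are my own.</p>

```cpp
#include <algorithm>
#include <cassert>

// All values use the same time unit (seconds in the timelines above).
//   expDuration: how long the wait was meant to last without any pauses
//   actDuration: when the wait actually finished, measured from its start
//   pauseTime:   sum of all pauses triggered after the wait started
int AdditionalWait(int expDuration, int actDuration, int pauseTime) {
    assert(pauseTime >= 0 && pauseTime <= actDuration);
    const int waited = actDuration - pauseTime;   // time that "really existed"
    return std::max(0, expDuration - waited);     // never wait a negative amount
}
```

<p>For Case 1 this gives <code class="language-plaintext highlighter-rouge">10 - (10 - 5) = 5</code> extra seconds and for Case 2 it gives <code class="language-plaintext highlighter-rouge">10 - (12 - 5) = 3</code>, matching the ‘Additional 5 sec wait’ and ‘Additional 3 sec wait’ labels.</p>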
<hr />
<h2 id="logic-tables">Logic Tables</h2>
<p>A sweet-spot for ASCII Art seems to be tables, there are so many examples. Starting with <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/methodtablebuilder.cpp#L4675-L4686">coreclr/src/vm/methodtablebuilder.cpp</a> (bonus points for combining comments and code together!)</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// | Base type</span>
<span class="c1">// Subtype | mdPrivateScope mdPrivate mdFamANDAssem mdAssem mdFamily mdFamORAssem mdPublic</span>
<span class="c1">// --------------+-------------------------------------------------------------------------------------------------------</span>
<span class="cm">/*mdPrivateScope | */</span> <span class="p">{</span> <span class="p">{</span> <span class="n">e_SM</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span> <span class="p">},</span>
<span class="cm">/*mdPrivate | */</span> <span class="p">{</span> <span class="n">e_SM</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span> <span class="p">},</span>
<span class="cm">/*mdFamANDAssem | */</span> <span class="p">{</span> <span class="n">e_SM</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_SA</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span> <span class="p">},</span>
<span class="cm">/*mdAssem | */</span> <span class="p">{</span> <span class="n">e_SM</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_SA</span><span class="p">,</span> <span class="n">e_SA</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_NO</span> <span class="p">},</span>
<span class="cm">/*mdFamily | */</span> <span class="p">{</span> <span class="n">e_SM</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_NO</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_NSA</span><span class="p">,</span> <span class="n">e_NO</span> <span class="p">},</span>
<span class="cm">/*mdFamORAssem | */</span> <span class="p">{</span> <span class="n">e_SM</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_SA</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_NO</span> <span class="p">},</span>
<span class="cm">/*mdPublic | */</span> <span class="p">{</span> <span class="n">e_SM</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span><span class="p">,</span> <span class="n">e_YES</span> <span class="p">}</span> <span class="p">};</span>
</code></pre></div></div>
<p>Also, <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/importer.cpp#L15265-L15283">coreclr/src/jit/importer.cpp</a> shows how the JIT deals with boxing/un-boxing:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/*
----------------------------------------------------------------------
| \ helper | | |
| \ | | |
| \ | CORINFO_HELP_UNBOX | CORINFO_HELP_UNBOX_NULLABLE |
| \ | (which returns a BYREF) | (which returns a STRUCT) |
| opcode \ | | |
|---------------------------------------------------------------------
| UNBOX | push the BYREF | spill the STRUCT to a local, |
| | | push the BYREF to this local |
|---------------------------------------------------------------------
| UNBOX_ANY | push a GT_OBJ of | push the STRUCT |
| | the BYREF | For Linux when the |
| | | struct is returned in two |
| | | registers create a temp |
| | | which address is passed to |
| | | the unbox_nullable helper. |
|---------------------------------------------------------------------
*/
</code></pre></div></div>
<p>Finally, there are some other nice examples showing the rules for <a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Compilers/CSharp/Portable/Binder/Semantics/Operators/BinaryOperatorEasyOut.cs#L104-L165">operator overloading</a> in the C# (Roslyn) Compiler and which .NET data-types <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Convert.cs#L46-L63">can be converted</a> via the <code class="language-plaintext highlighter-rouge">System.Convert.ToXXX()</code> functions.</p>
<hr />
<h2 id="class-hierarchies">Class Hierarchies</h2>
<p>Of course, most IDEs come with tools that will generate class-hierarchies for you, but it’s much nicer to see them in ASCII, from <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/object.h#L28-L55">coreclr/src/vm/object.h</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> * COM+ Internal Object Model
*
*
* Object - This is the common base part to all COM+ objects
* | it contains the MethodTable pointer and the
* | sync block index, which is at a negative offset
* |
* +-- code:StringObject - String objects are specialized objects for string
* | storage/retrieval for higher performance
* |
* +-- BaseObjectWithCachedData - Object Plus one object field for caching.
* | |
* | +- ReflectClassBaseObject - The base object for the RuntimeType class
* | +- ReflectMethodObject - The base object for the RuntimeMethodInfo class
* | +- ReflectFieldObject - The base object for the RtFieldInfo class
* |
* +-- code:ArrayBase - Base portion of all arrays
* | |
* | +- I1Array - Base type arrays
* | | I2Array
* | | ...
* | |
* | +- PtrArray - Array of OBJECTREFs, different than base arrays because of pObjectClass
* |
* +-- code:AssemblyBaseObject - The base object for the class Assembly
</code></pre></div></div>
<p>There’s also an <a href="https://github.com/dotnet/coreclr/blob/1f02c30e053b1da4410e20c3b715128e3d1e354a/src/vm/frames.h#L7-L197">even larger one</a> that I stumbled across when writing <a href="/2019/01/21/Stackwalking-in-the-.NET-Runtime/">“Stack Walking” in the .NET Runtime</a>.</p>
<hr />
<h2 id="component-diagrams">Component Diagrams</h2>
<p>When you have several different components in a code-base it’s always nice to see how they fit together. From <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/codeman.h#L14-L56">coreclr/src/vm/codeman.h</a> we can see how the top-level parts of the .NET JIT work together</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ExecutionManager
|
+-----------+---------------+---------------+-----------+--- ...
| | | |
CodeType | CodeType |
| | | |
v v v v
+---------------+ +--------+<---- R +---------------+ +--------+<---- R
|ICorJitCompiler|<---->|IJitMan |<---- R |ICorJitCompiler|<---->|IJitMan |<---- R
+---------------+ +--------+<---- R +---------------+ +--------+<---- R
| x . | x .
| \ . | \ .
v \ . v \ .
+--------+ R +--------+ R
|ICodeMan| |ICodeMan| (RangeSections)
+--------+ +--------+
</code></pre></div></div>
<p>Other notable examples are:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/compile.h#L14-L47">coreclr/src/vm/compile.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/inc/ceegen.h#L47-L92">coreclr/src/inc/ceegen.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/debug/di/divalue.cpp#L1432-L1451">coreclr/src/debug/di/divalue.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/ceeload.cpp#L10543-L10578">coreclr/src/vm/ceeload.cpp</a></li>
</ul>
<p>Finally, from <a href="https://github.com/dotnet/coreclr/blob/e6034d903f2608445a3f66e3694f461fad7b8b88/src/vm/ceeload.cpp#L10350-L10385">coreclr/src/vm/ceeload.cpp</a> we see the inner-workings of the <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/ngen-exe-native-image-generator">Native Image Generator (NGEN)</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> This diagram illustrates the layout of fixups in the ngen image.
This is the case where function foo2 has a class-restore fixup
for class C1 in b.dll.
zapBase+curTableVA+rva / FixupList (see Fixup Encoding below)
m_pFixupBlobs
+-------------------+
pEntry->VA +--------------------+ | non-NULL | foo1
|Handles | +-------------------+
ZapHeader.ImportTable | | | non-NULL |
| | +-------------------+
+------------+ +--------------------+ | non-NULL |
|a.dll | |Class cctors |<---+ +-------------------+
| | | | \ | 0 |
| | p->VA/ | |<---+ \ +===================+
| | blobs +--------------------+ \ +-------non-NULL | foo2
+------------+ |Class restore | \ +-------------------+
|b.dll | | | +-------non-NULL |
| | | | +-------------------+
| token_C1 |<--------------blob(=>fixedUp/0) |<--pBlob--------index |
| | \ | | +-------------------+
| | \ +--------------------+ | non-NULL |
| | \ | | +-------------------+
| | \ | . | | 0 |
| | \ | . | +===================+
+------------+ \ | . | | 0 | foo3
\ | | +===================+
\ +--------------------+ | non-NULL | foo4
\ |Various fixups that | +-------------------+
\ |need too happen | | 0 |
\| | +===================+
|(CorCompileTokenTable)
| |
pEntryEnd->VA +--------------------+
</code></pre></div></div>
<hr />
<h2 id="algorithms">Algorithms</h2>
<p>They say ‘<em>a picture paints a thousand words</em>’ and that definitely applies when describing complex algorithms, from <a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Workspaces/Core/Portable/Utilities/EditDistance.cs#L232-L287">roslyn/src/Workspaces/Core/Portable/Utilities/EditDistance.cs</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// If we fill out the matrix fully we'll get:
//
// s u n d a y <-- source
// ----------------
// |∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞
// |∞ 0 1 2 3 4 5 6
// s |∞ 1 0 1 2 3 4 5
// a |∞ 2 1 1 2 3 3 4
// t |∞ 3 2 2 2 3 4 4
// u |∞ 4 3 2 3 3 4 5
// r |∞ 5 4 3 3 4 4 5
// d |∞ 6 5 4 4 3 4 5
// a |∞ 7 6 5 5 4 3 4
// y |∞ 8 7 6 6 5 4 3 <--
// ^
// |
</code></pre></div></div>
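<p>The bottom-right cell of that matrix (3) is the edit distance between ‘sunday’ and ‘saturday’. To see where the numbers come from, here is a minimal sketch of the textbook Levenshtein algorithm. It is not the Roslyn implementation: it has no ∞ border, no transpositions and none of the optimisations, and the class and method names are my own.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class EditDistanceSketch
{
    // Plain Levenshtein distance: insert, delete and substitute each cost 1.
    public static int Levenshtein(string source, string target)
    {
        var d = new int[target.Length + 1, source.Length + 1];
        for (int col = 0; col <= source.Length; col++) d[0, col] = col;
        for (int row = 0; row <= target.Length; row++) d[row, 0] = row;

        for (int row = 1; row <= target.Length; row++)
        {
            for (int col = 1; col <= source.Length; col++)
            {
                int cost = source[col - 1] == target[row - 1] ? 0 : 1;
                d[row, col] = Math.Min(
                    Math.Min(d[row - 1, col] + 1, d[row, col - 1] + 1),
                    d[row - 1, col - 1] + cost);
            }
        }
        return d[target.Length, source.Length];
    }

    static void Main()
    {
        Console.WriteLine(Levenshtein("sunday", "saturday")); // 3
    }
}
</code></pre></div></div>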
<p>Next, this gem that explains how the DOS wild-card matching works, <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemName.cs#L104-L158">corefx/src/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemName.cs</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Matching routine description
// ============================
// (copied from native impl)
//
// This routine compares a Dbcs name and an expression and tells the caller
// if the name is in the language defined by the expression. The input name
// cannot contain wildcards, while the expression may contain wildcards.
//
// Expression wild cards are evaluated as shown in the nondeterministic
// finite automatons below. Note that ~* and ~? are DOS_STAR and DOS_QM.
//
// ~* is DOS_STAR, ~? is DOS_QM, and ~. is DOS_DOT
//
// S
// <-----<
// X | | e Y
// X * Y == (0)----->-(1)->-----(2)-----(3)
//
// S-.
// <-----<
// X | | e Y
// X ~* Y == (0)----->-(1)->-----(2)-----(3)
//
// X S S Y
// X ?? Y == (0)---(1)---(2)---(3)---(4)
//
// X . . Y
// X ~.~. Y == (0)---(1)----(2)------(3)---(4)
// | |________|
// | ^ |
// |_______________|
// ^EOF or .^
//
// X S-. S-. Y
// X ~?~? Y == (0)---(1)-----(2)-----(3)---(4)
// | |________|
// | ^ |
// |_______________|
// ^EOF or .^
//
// where S is any single character
// S-. is any single character except the final .
// e is a null character transition
// EOF is the end of the name string
//
// In words:
//
// * matches 0 or more characters.
// ? matches exactly 1 character.
// DOS_STAR matches 0 or more characters until encountering and matching
// the final . in the name.
// DOS_QM matches any single character, or upon encountering a period or
// end of name string, advances the expression to the end of the
// set of contiguous DOS_QMs.
// DOS_DOT matches either a . or zero characters beyond name string.
</code></pre></div></div>
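<p>The first two rules in the ‘In words’ section can be sketched in a few lines. This is a simplified, hypothetical matcher that only understands <code class="language-plaintext highlighter-rouge">*</code> and <code class="language-plaintext highlighter-rouge">?</code>. It deliberately ignores DOS_STAR, DOS_QM and DOS_DOT, and it uses naive backtracking, not the automaton the comment describes.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class WildcardSketch
{
    // '*' matches zero or more characters, '?' matches exactly one character.
    public static bool Matches(string expression, string name)
    {
        return Matches(expression, 0, name, 0);
    }

    private static bool Matches(string expression, int e, string name, int n)
    {
        if (e == expression.Length) return n == name.Length;

        if (expression[e] == '*')
        {
            // Try letting '*' consume 0, 1, 2, ... characters of the name.
            for (int next = n; next <= name.Length; next++)
            {
                if (Matches(expression, e + 1, name, next)) return true;
            }
            return false;
        }

        if (n == name.Length) return false;
        if (expression[e] != '?' && expression[e] != name[n]) return false;
        return Matches(expression, e + 1, name, n + 1);
    }

    static void Main()
    {
        Console.WriteLine(Matches("*.txt", "notes.txt"));   // True
        Console.WriteLine(Matches("file?.cs", "file1.cs")); // True
        Console.WriteLine(Matches("file?.cs", "file.cs"));  // False
    }
}
</code></pre></div></div>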
<p>Finally, from <a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Workspaces/Core/Portable/Shared/Collections/IntervalTree%601.Node.cs#L65-L125">roslyn/src/Workspaces/Core/Portable/Shared/Collections/IntervalTree`1.Node.cs</a> we have per-method comments with samples, which is a great idea!</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Sample:</span>
<span class="c1">// 1 1 3</span>
<span class="c1">// / \ / \ / \</span>
<span class="c1">// a 2 a 3 1 2</span>
<span class="c1">// / \ => / \ => / \ / \</span>
<span class="c1">// 3 d b 2 a b c d</span>
<span class="c1">// / \ / \</span>
<span class="c1">// b c c d</span>
<span class="k">internal</span> <span class="n">Node</span> <span class="nf">InnerRightOuterLeftRotation</span><span class="p">(</span><span class="n">IIntervalIntrospector</span><span class="p"><</span><span class="n">T</span><span class="p">></span> <span class="n">introspector</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="p">}</span>
<span class="c1">// Sample:</span>
<span class="c1">// 1 1 3</span>
<span class="c1">// / \ / \ / \</span>
<span class="c1">// 2 d 3 d 2 1</span>
<span class="c1">// / \ => / \ => / \ / \</span>
<span class="c1">// a 3 2 c a b c d</span>
<span class="c1">// / \ / \</span>
<span class="c1">// b c a b</span>
<span class="k">internal</span> <span class="n">Node</span> <span class="nf">InnerLeftOuterRightRotation</span><span class="p">(</span><span class="n">IIntervalIntrospector</span><span class="p"><</span><span class="n">T</span><span class="p">></span> <span class="n">introspector</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
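<p>The method bodies are elided above, so here is a rough sketch of what the first sample describes, using a stripped-down, hypothetical <code class="language-plaintext highlighter-rouge">Node</code> type. The real Roslyn node also takes the introspector parameter and tracks extra per-node data, all of which I’ve left out.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

class Node
{
    public string Value;
    public Node Left, Right;

    public Node(string value, Node left = null, Node right = null)
    {
        Value = value; Left = left; Right = right;
    }

    // First sample: 1(a, 2(3(b, c), d)) becomes 3(1(a, b), 2(c, d)).
    public Node InnerRightOuterLeftRotation()
    {
        Node oldRight = Right;        // 2
        Node newTop = oldRight.Left;  // 3

        oldRight.Left = newTop.Right; // c moves under 2
        Right = newTop.Left;          // b moves under 1
        newTop.Left = this;
        newTop.Right = oldRight;
        return newTop;
    }
}

static class RotationDemo
{
    static void Main()
    {
        var three = new Node("3", new Node("b"), new Node("c"));
        var two = new Node("2", three, new Node("d"));
        var one = new Node("1", new Node("a"), two);

        Node top = one.InnerRightOuterLeftRotation();
        Console.WriteLine(top.Value);            // 3
        Console.WriteLine(top.Left.Right.Value); // b
        Console.WriteLine(top.Right.Left.Value); // c
    }
}
</code></pre></div></div>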
<hr />
<h2 id="bit-packing">Bit Packing</h2>
<p>Maybe you can visualise which <em>individual</em> bits are set given a Hexadecimal value, but I can’t, so I’m always grateful for comments like this one from <a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Compilers/CSharp/Portable/Symbols/Source/SourceMemberContainerSymbol.cs#L28-L37">roslyn/src/Compilers/CSharp/Portable/Symbols/Source/SourceMemberContainerSymbol.cs</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// We current pack everything into two 32-bit ints; layouts for each are given below.
// First int:
//
// | |d|yy|xxxxxxxxxxxxxxxxxxxxxxx|wwwwww|
//
// w = special type. 6 bits.
// x = modifiers. 23 bits.
// y = IsManagedType. 2 bits.
// d = FieldDefinitionsNoted. 1 bit
</code></pre></div></div>
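<p>To make that layout concrete, here’s a small sketch of packing and unpacking the first int. I’m assuming the diagram reads right-to-left from bit 0 (so <code class="language-plaintext highlighter-rouge">w</code> is the lowest 6 bits and <code class="language-plaintext highlighter-rouge">d</code> is bit 31); the constant and method names are made up for illustration and aren’t taken from Roslyn.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class FlagsSketch
{
    // Assumed layout: w = bits 0-5, x = bits 6-28, y = bits 29-30, d = bit 31.
    public const int SpecialTypeOffset = 0, SpecialTypeSize = 6;
    public const int ModifiersOffset = SpecialTypeOffset + SpecialTypeSize, ModifiersSize = 23;
    public const int IsManagedTypeOffset = ModifiersOffset + ModifiersSize, IsManagedTypeSize = 2;
    public const int FieldDefinitionsNotedOffset = IsManagedTypeOffset + IsManagedTypeSize;

    public static uint Pack(uint specialType, uint modifiers, uint isManagedType, bool fieldDefinitionsNoted)
    {
        return (specialType << SpecialTypeOffset)
             | (modifiers << ModifiersOffset)
             | (isManagedType << IsManagedTypeOffset)
             | ((fieldDefinitionsNoted ? 1u : 0u) << FieldDefinitionsNotedOffset);
    }

    // Shift the field down to bit 0, then mask off everything above it.
    public static uint Unpack(uint flags, int offset, int size)
    {
        uint mask = (1u << size) - 1;
        return (flags >> offset) & mask;
    }

    static void Main()
    {
        uint flags = Pack(specialType: 5, modifiers: 0x10, isManagedType: 2, fieldDefinitionsNoted: true);
        Console.WriteLine(Unpack(flags, SpecialTypeOffset, SpecialTypeSize));     // 5
        Console.WriteLine(Unpack(flags, ModifiersOffset, ModifiersSize));         // 16
        Console.WriteLine(Unpack(flags, IsManagedTypeOffset, IsManagedTypeSize)); // 2
        Console.WriteLine(Unpack(flags, FieldDefinitionsNotedOffset, 1));         // 1
    }
}
</code></pre></div></div>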
<p>This one from <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.Runtime.WindowsRuntime/src/System/Threading/Tasks/TaskToAsyncInfoAdapter.cs#L26-L43">corefx/src/System.Runtime.WindowsRuntime/src/System/Threading/Tasks/TaskToAsyncInfoAdapter.cs</a> also does a great job of showing the different bit-flags and how they interact</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// ! THIS DIAGRAM ILLUSTRATES THE CONSTANTS BELOW. UPDATE THIS IF UPDATING THE CONSTANTS BELOW!:
// 3 2 1 0
// 10987654321098765432109876543210
// X............................... Reserved such that we can use Int32 and not worry about negative-valued state constants
// ..X............................. STATEFLAG_COMPLETED_SYNCHRONOUSLY
// ...X............................ STATEFLAG_MUST_RUN_COMPLETION_HNDL_WHEN_SET
// ....X........................... STATEFLAG_COMPLETION_HNDL_NOT_YET_INVOKED
// ................................ STATE_NOT_INITIALIZED
// ...............................X STATE_STARTED
// ..............................X. STATE_RUN_TO_COMPLETION
// .............................X.. STATE_CANCELLATION_REQUESTED
// ............................X... STATE_CANCELLATION_COMPLETED
// ...........................X.... STATE_ERROR
// ..........................X..... STATE_CLOSED
// ..........................XXXXXX STATEMASK_SELECT_ANY_ASYNC_STATE
// XXXXXXXXXXXXXXXXXXXXXXXXXX...... STATEMASK_CLEAR_ALL_ASYNC_STATES
// 3 2 1 0
// 10987654321098765432109876543210
</code></pre></div></div>
<p>Finally, we have some helpful explanations of how different encodings work. Firstly UTF-8 from <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Text/UTF8Encoding.cs#L38-L49">corefx/src/Common/src/CoreLib/System/Text/UTF8Encoding.cs</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/*
bytes bits UTF-8 representation
----- ---- -----------------------------------
1 7 0vvvvvvv
2 11 110vvvvv 10vvvvvv
3 16 1110vvvv 10vvvvvv 10vvvvvv
4 21 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
----- ---- -----------------------------------
Surrogate:
Real Unicode value = (HighSurrogate - 0xD800) * 0x400 + (LowSurrogate - 0xDC00) + 0x10000
*/
</code></pre></div></div>
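<p>That table translates almost directly into code. The following is an illustrative sketch only (the class name is my own and it does no validation, so for real work use <code class="language-plaintext highlighter-rouge">System.Text.Encoding.UTF8</code>), but it shows how the <code class="language-plaintext highlighter-rouge">v</code> bits are spread across the bytes:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class Utf8Sketch
{
    // Encodes a single code point, one branch per row of the table.
    // No validation: surrogate or out-of-range values are not rejected.
    public static byte[] Encode(int codePoint)
    {
        if (codePoint < 0x80)     // 7 bits
            return new[] { (byte)codePoint };
        if (codePoint < 0x800)    // 11 bits
            return new[] {
                (byte)(0xC0 | (codePoint >> 6)),
                (byte)(0x80 | (codePoint & 0x3F)) };
        if (codePoint < 0x10000)  // 16 bits
            return new[] {
                (byte)(0xE0 | (codePoint >> 12)),
                (byte)(0x80 | ((codePoint >> 6) & 0x3F)),
                (byte)(0x80 | (codePoint & 0x3F)) };
        return new[] {            // 21 bits
            (byte)(0xF0 | (codePoint >> 18)),
            (byte)(0x80 | ((codePoint >> 12) & 0x3F)),
            (byte)(0x80 | ((codePoint >> 6) & 0x3F)),
            (byte)(0x80 | (codePoint & 0x3F)) };
    }

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode(0x20AC)));  // E2-82-AC
        Console.WriteLine(BitConverter.ToString(Encode(0x1F600))); // F0-9F-98-80
    }
}
</code></pre></div></div>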
<p>and then UTF-32 in <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Text/UTF32Encoding.cs#L26-L35">corefx/src/Common/src/CoreLib/System/Text/UTF32Encoding.cs</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/*
words bits UTF-32 representation
----- ---- -----------------------------------
1 16 00000000 00000000 xxxxxxxx xxxxxxxx
2 21 00000000 000xxxxx hhhhhhll llllllll
----- ---- -----------------------------------
Surrogate:
Real Unicode value = (HighSurrogate - 0xD800) * 0x400 + (LowSurrogate - 0xDC00) + 0x10000
*/
</code></pre></div></div>
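<p>The surrogate formula at the bottom of both comments is easy to check for yourself. In this sketch <code class="language-plaintext highlighter-rouge">Combine</code> is a hypothetical helper that applies the formula as written, and the result is compared against the built-in <code class="language-plaintext highlighter-rouge">char.ConvertToUtf32</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class SurrogateSketch
{
    // Applies the formula from the comment; assumes a valid surrogate pair.
    public static int Combine(char highSurrogate, char lowSurrogate)
    {
        return (highSurrogate - 0xD800) * 0x400 + (lowSurrogate - 0xDC00) + 0x10000;
    }

    static void Main()
    {
        string s = "\uD83D\uDE00"; // U+1F600 stored as a surrogate pair
        Console.WriteLine(Combine(s[0], s[1]).ToString("X"));                // 1F600
        Console.WriteLine(Combine(s[0], s[1]) == char.ConvertToUtf32(s, 0)); // True
    }
}
</code></pre></div></div>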
<hr />
<h2 id="data-structures">Data Structures</h2>
<p>This comment from <a href="https://github.com/mono/mono/blob/2019-02/mono/utils/dlmalloc.c#L1509-L1564">mono/utils/dlmalloc.c</a> does a great job of showing how chunks of memory are arranged by <code class="language-plaintext highlighter-rouge">malloc</code></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> A chunk that's in use looks like:
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk (if P = 1) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |P|
| Size of this chunk 1| +-+
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+- -+
| |
+- -+
| :
+- size - sizeof(size_t) available payload bytes -+
: |
chunk-> +- -+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1|
| Size of next chunk (may or may not be in use) | +-+
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
And if it's free, it looks like this:
chunk-> +- -+
| User payload (must be in use, or we would have merged!) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |P|
| Size of this chunk 0| +-+
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Prev pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :
+- size - sizeof(struct chunk) unused bytes -+
: |
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of this chunk |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0|
| Size of next chunk (must be in use, or we would have merged)| +-+
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :
+- User payload -+
: |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|
+-+
</code></pre></div></div>
<p>Also, from <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/MemoryExtensions.cs#L1185-L1311">corefx/src/Common/src/CoreLib/System/MemoryExtensions.cs</a> we can see how overlapping memory regions are detected:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Visually, the two sequences are located somewhere in the 32-bit
// address space as follows:
//
// [----------------------------------------------) normal address space
// 0 2³²
// [------------------) first sequence
// xRef xRef + xLength
// [--------------------------) . second sequence
// yRef . yRef + yLength
// : . . .
// : . . .
// . . .
// . . .
// . . .
// [----------------------------------------------) relative address space
// 0 . . 2³²
// [------------------) : first sequence
// x1 . x2 :
// -------------) [------------- second sequence
// y2 y1
</code></pre></div></div>
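<p>The trick in the lower half of that diagram is to measure everything relative to the start of the first sequence and let unsigned arithmetic wrap around. Here’s a sketch of the idea using plain <code class="language-plaintext highlighter-rouge">uint</code> values in place of real references. The names follow the diagram, but the method itself is my own simplification and assumes neither sequence is empty.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class OverlapSketch
{
    // Addresses and lengths are plain 32-bit numbers; both lengths must be non-zero.
    public static bool Overlaps(uint xRef, uint xLength, uint yRef, uint yLength)
    {
        // Start of the second sequence relative to the first (x1 == 0, x2 == xLength).
        // Unsigned subtraction wraps at 2^32, giving the 'relative address space'.
        uint y1 = unchecked(yRef - xRef);

        return y1 < xLength                  // y starts inside x
            || y1 > unchecked(0u - yLength); // y starts before x and runs past x's start
    }

    static void Main()
    {
        Console.WriteLine(Overlaps(100, 10, 105, 10)); // True
        Console.WriteLine(Overlaps(100, 10, 95, 10));  // True
        Console.WriteLine(Overlaps(100, 10, 110, 10)); // False
        Console.WriteLine(Overlaps(100, 10, 90, 10));  // False
    }
}
</code></pre></div></div>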
<hr />
<h2 id="state-machines">State Machines</h2>
<p>This comment from <a href="https://github.com/mono/mono/blob/2019-02/mono/benchmark/zipmark.cs#L204-L237">mono/benchmark/zipmark.cs</a> gives a great overview of the implementation of <a href="https://www.ietf.org/rfc/rfc1951.txt">RFC 1951 - DEFLATE Compressed Data Format Specification</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/*
* The Deflater can do the following state transitions:
*
* (1) -> INIT_STATE ----> INIT_FINISHING_STATE ---.
* / | (2) (5) |
* / v (5) |
* (3)| SETDICT_STATE ---> SETDICT_FINISHING_STATE |(3)
* \ | (3) | ,-------'
* | | | (3) /
* v v (5) v v
* (1) -> BUSY_STATE ----> FINISHING_STATE
* | (6)
* v
* FINISHED_STATE
* \_____________________________________/
* | (7)
* v
* CLOSED_STATE
*
* (1) If we should produce a header we start in INIT_STATE, otherwise
* we start in BUSY_STATE.
* (2) A dictionary may be set only when we are in INIT_STATE, then
* we change the state as indicated.
* (3) Whether a dictionary is set or not, on the first call of deflate
* we change to BUSY_STATE.
* (4) -- intentionally left blank -- :)
* (5) FINISHING_STATE is entered, when flush() is called to indicate that
* there is no more INPUT. There are also states indicating, that
* the header wasn't written yet.
* (6) FINISHED_STATE is entered, when everything has been flushed to the
* internal pending output buffer.
* (7) At any time (7)
*
*/
</code></pre></div></div>
<p>This might be pushing the definition of ‘state machine’ a bit far, but I wanted to include it because it shows just how complex ‘exception handling’ can be, from <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/jiteh.cpp#L1935-L1966">coreclr/src/jit/jiteh.cpp</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// fgNormalizeEH: Enforce the following invariants:
//
// 1. No block is both the first block of a handler and the first block of a try. In IL (and on entry
// to this function), this can happen if the "try" is more nested than the handler.
//
// For example, consider:
//
// try1 ----------------- BB01
// | BB02
// |--------------------- BB03
// handler1
// |----- try2 ---------- BB04
// | | BB05
// | handler2 ------ BB06
// | | BB07
// | --------------- BB08
// |--------------------- BB09
//
// Thus, the start of handler1 and the start of try2 are the same block. We will transform this to:
//
// try1 ----------------- BB01
// | BB02
// |--------------------- BB03
// handler1 ------------- BB10 // empty block
// | try2 ---------- BB04
// | | BB05
// | handler2 ------ BB06
// | | BB07
// | --------------- BB08
// |--------------------- BB09
//
</code></pre></div></div>
<hr />
<h2 id="rfcs-and-specs">RFCs and Specs</h2>
<p>Next up, how the <a href="https://docs.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel?view=aspnetcore-2.2">Kestrel web-server</a> handles <a href="https://tools.ietf.org/html/rfc7540">RFC 7540 - Hypertext Transfer Protocol Version 2 (HTTP/2)</a>.</p>
<p>Firstly, from <a href="https://github.com/aspnet/AspNetCore/blob/ab3e0f953e537c71b3ba06966e6db1e88e33bc41/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.cs#L6-L16">aspnet/AspNetCore/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.cs</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* https://tools.ietf.org/html/rfc7540#section-4.1
+-----------------------------------------------+
| Length (24) |
+---------------+---------------+---------------+
| Type (8) | Flags (8) |
+-+-------------+---------------+-------------------------------+
|R| Stream Identifier (31) |
+=+=============================================================+
| Frame Payload (0...) ...
+---------------------------------------------------------------+
*/
</code></pre></div></div>
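<p>To show how that diagram maps onto bytes, here’s a rough sketch of reading the fixed 9-byte frame header. It isn’t Kestrel’s code, and the class name and tuple shape are my own, but the field widths come straight from the diagram and the fields are big-endian as RFC 7540 specifies.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class Http2FrameHeaderSketch
{
    // Reads Length (24), Type (8), Flags (8), R (1) and Stream Identifier (31).
    public static (int Length, byte Type, byte Flags, int StreamId) Parse(byte[] header)
    {
        int length = (header[0] << 16) | (header[1] << 8) | header[2];
        byte type = header[3];
        byte flags = header[4];
        // Mask off the reserved bit (R) to leave the 31-bit stream identifier.
        int streamId = ((header[5] & 0x7F) << 24) | (header[6] << 16) | (header[7] << 8) | header[8];
        return (length, type, flags, streamId);
    }

    static void Main()
    {
        // Length = 5, Type = 0x1 (HEADERS), Flags = 0x4 (END_HEADERS), Stream Identifier = 3
        byte[] header = { 0x00, 0x00, 0x05, 0x01, 0x04, 0x00, 0x00, 0x00, 0x03 };
        var frame = Parse(header);
        Console.WriteLine(frame.Length);   // 5
        Console.WriteLine(frame.Type);     // 1
        Console.WriteLine(frame.Flags);    // 4
        Console.WriteLine(frame.StreamId); // 3
    }
}
</code></pre></div></div>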
<p>and then in <a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.Headers.cs#L6-L18">aspnet/AspNetCore/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.Headers.cs</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* https://tools.ietf.org/html/rfc7540#section-6.2
+---------------+
|Pad Length? (8)|
+-+-------------+-----------------------------------------------+
|E| Stream Dependency? (31) |
+-+-------------+-----------------------------------------------+
| Weight? (8) |
+-+-------------+-----------------------------------------------+
| Header Block Fragment (*) ...
+---------------------------------------------------------------+
| Padding (*) ...
+---------------------------------------------------------------+
*/
</code></pre></div></div>
<p>There are other notable examples in <a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2FrameReader.cs#L15-L25">aspnet/AspNetCore/src/Servers/Kestrel/Core/src/Internal/Http2/Http2FrameReader.cs</a> and <a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2FrameWriter.cs#L145-L158">aspnet/AspNetCore/src/Servers/Kestrel/Core/src/Internal/Http2/Http2FrameWriter.cs</a>.</p>
<p>Also <a href="https://tools.ietf.org/html/rfc3986">RFC 3986 - Uniform Resource Identifier (URI)</a> is discussed in <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/System/Net/IPv4AddressHelper.Common.cs#L105-L113">corefx/src/Common/src/System/Net/IPv4AddressHelper.Common.cs</a></p>
<p>Finally, <a href="https://httpwg.org/specs/rfc7541.html">RFC 7541 - HPACK: Header Compression for HTTP/2</a>, is covered in <a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/HPack/HPackDecoder.cs#L26-L71">aspnet/AspNetCore/src/Servers/Kestrel/Core/src/Internal/Http2/HPack/HPackDecoder.cs</a></p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// http://httpwg.org/specs/rfc7541.html#rfc.section.6.1</span>
<span class="c1">// 0 1 2 3 4 5 6 7</span>
<span class="c1">// +---+---+---+---+---+---+---+---+</span>
<span class="c1">// | 1 | Index (7+) |</span>
<span class="c1">// +---+---------------------------+</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">IndexedHeaderFieldMask</span> <span class="p">=</span> <span class="m">0x80</span><span class="p">;</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">IndexedHeaderFieldRepresentation</span> <span class="p">=</span> <span class="m">0x80</span><span class="p">;</span>
<span class="c1">// http://httpwg.org/specs/rfc7541.html#rfc.section.6.2.1</span>
<span class="c1">// 0 1 2 3 4 5 6 7</span>
<span class="c1">// +---+---+---+---+---+---+---+---+</span>
<span class="c1">// | 0 | 1 | Index (6+) |</span>
<span class="c1">// +---+---+-----------------------+</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">LiteralHeaderFieldWithIncrementalIndexingMask</span> <span class="p">=</span> <span class="m">0xc0</span><span class="p">;</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">LiteralHeaderFieldWithIncrementalIndexingRepresentation</span> <span class="p">=</span> <span class="m">0x40</span><span class="p">;</span>
<span class="c1">// http://httpwg.org/specs/rfc7541.html#rfc.section.6.2.2</span>
<span class="c1">// 0 1 2 3 4 5 6 7</span>
<span class="c1">// +---+---+---+---+---+---+---+---+</span>
<span class="c1">// | 0 | 0 | 0 | 0 | Index (4+) |</span>
<span class="c1">// +---+---+-----------------------+</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">LiteralHeaderFieldWithoutIndexingMask</span> <span class="p">=</span> <span class="m">0xf0</span><span class="p">;</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">LiteralHeaderFieldWithoutIndexingRepresentation</span> <span class="p">=</span> <span class="m">0x00</span><span class="p">;</span>
<span class="c1">// http://httpwg.org/specs/rfc7541.html#rfc.section.6.2.3</span>
<span class="c1">// 0 1 2 3 4 5 6 7</span>
<span class="c1">// +---+---+---+---+---+---+---+---+</span>
<span class="c1">// | 0 | 0 | 0 | 1 | Index (4+) |</span>
<span class="c1">// +---+---+-----------------------+</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">LiteralHeaderFieldNeverIndexedMask</span> <span class="p">=</span> <span class="m">0xf0</span><span class="p">;</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">LiteralHeaderFieldNeverIndexedRepresentation</span> <span class="p">=</span> <span class="m">0x10</span><span class="p">;</span>
<span class="c1">// http://httpwg.org/specs/rfc7541.html#rfc.section.6.3</span>
<span class="c1">// 0 1 2 3 4 5 6 7</span>
<span class="c1">// +---+---+---+---+---+---+---+---+</span>
<span class="c1">// | 0 | 0 | 1 | Max size (5+) |</span>
<span class="c1">// +---+---------------------------+</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">DynamicTableSizeUpdateMask</span> <span class="p">=</span> <span class="m">0xe0</span><span class="p">;</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">DynamicTableSizeUpdateRepresentation</span> <span class="p">=</span> <span class="m">0x20</span><span class="p">;</span>
<span class="c1">// http://httpwg.org/specs/rfc7541.html#rfc.section.5.2</span>
<span class="c1">// 0 1 2 3 4 5 6 7</span>
<span class="c1">// +---+---+---+---+---+---+---+---+</span>
<span class="c1">// | H | String Length (7+) |</span>
<span class="c1">// +---+---------------------------+</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">byte</span> <span class="n">HuffmanMask</span> <span class="p">=</span> <span class="m">0x80</span><span class="p">;</span>
</code></pre></div></div>
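<p>Each mask/representation pair above is meant to be used as <code class="language-plaintext highlighter-rouge">(b &amp; Mask) == Representation</code>. The sketch below copies three of the pairs and adds a hypothetical <code class="language-plaintext highlighter-rouge">Classify</code> helper (my own, not part of Kestrel) to show how the leading bits of a byte select the representation:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class HPackSketch
{
    const byte IndexedHeaderFieldMask = 0x80;
    const byte IndexedHeaderFieldRepresentation = 0x80;
    const byte LiteralHeaderFieldWithIncrementalIndexingMask = 0xc0;
    const byte LiteralHeaderFieldWithIncrementalIndexingRepresentation = 0x40;
    const byte DynamicTableSizeUpdateMask = 0xe0;
    const byte DynamicTableSizeUpdateRepresentation = 0x20;

    // Hypothetical helper: names the representation selected by the leading bits.
    public static string Classify(byte b)
    {
        if ((b & IndexedHeaderFieldMask) == IndexedHeaderFieldRepresentation)
            return "indexed header field";
        if ((b & LiteralHeaderFieldWithIncrementalIndexingMask) == LiteralHeaderFieldWithIncrementalIndexingRepresentation)
            return "literal with incremental indexing";
        if ((b & DynamicTableSizeUpdateMask) == DynamicTableSizeUpdateRepresentation)
            return "dynamic table size update";
        return "literal without indexing or never indexed";
    }

    static void Main()
    {
        Console.WriteLine(Classify(0x82)); // indexed header field
        Console.WriteLine(Classify(0x41)); // literal with incremental indexing
        Console.WriteLine(Classify(0x3f)); // dynamic table size update
        Console.WriteLine(Classify(0x10)); // literal without indexing or never indexed
    }
}
</code></pre></div></div>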
<hr />
<h2 id="dates--times">Dates & Times</h2>
<p>It is pretty widely accepted that <a href="https://www.reddit.com/r/programming/comments/ln1tg/bad_timing_why_dates_and_times_are_hard/">dates and times are hard</a> and that’s reflected in the number of comments explaining different scenarios. For example from <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/TimeZoneInfo.cs#L1273-L1289">corefx/src/Common/src/CoreLib/System/TimeZoneInfo.cs</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// startTime and endTime represent the period from either the start of DST to the end and
// ***does not include*** the potentially overlapped times
//
//         -=-=-=-=-=- Pacific Standard Time -=-=-=-=-=-=-
//    April 2, 2006                            October 29, 2006
// 2AM            3AM                        1AM              2AM
// |      +1 hr     |                        |       -1 hr      |
// | <invalid time> |                        | <ambiguous time> |
//                  [========== DST ========>)
//
//         -=-=-=-=-=- Some Weird Time Zone -=-=-=-=-=-=-
//    April 2, 2006                           October 29, 2006
// 1AM              2AM                      2AM            3AM
// |       -1 hr      |                      |      +1 hr     |
// | <ambiguous time> |                      | <invalid time> |
//                    [======== DST ========>)
//
</code></pre></div></div>
<p>Also, from <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/TimeZoneInfo.Unix.cs#L1244-L1265">corefx/src/Common/src/CoreLib/System/TimeZoneInfo.Unix.cs</a> we see some details on how ‘leap-years’ are handled:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// should be n Julian day format which we don't support.
//
// This specifies the Julian day, with n between 0 and 365. February 29 is counted in leap years.
//
// n would be a relative number from the begining of the year. which should handle if the
// the year is a leap year or not.
//
// In leap year, n would be counted as:
//
// 0               30 31              59 60              90      335              365
// |-------Jan--------|-------Feb--------|-------Mar--------|....|-------Dec--------|
//
// while in non leap year we'll have
//
// 0               30 31              58 59              89      334              364
// |-------Jan--------|-------Feb--------|-------Mar--------|....|-------Dec--------|
//
//
// For example if n is specified as 60, this means in leap year the rule will start at Mar 1,
// while in non leap year the rule will start at Mar 2.
//
// If we need to support n format, we'll have to have a floating adjustment rule support this case.
</code></pre></div></div>
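<p>The arithmetic in that comment can be checked with a few lines of C#. This is only a sketch of the day counting, not how <code class="language-plaintext highlighter-rouge">TimeZoneInfo</code> works internally (the comment itself says the format isn’t supported), and the <code class="language-plaintext highlighter-rouge">JulianDaySketch</code> helper is a made-up name:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class JulianDaySketch
{
    // Hypothetical helper: maps a zero-based day-of-year 'n' onto a calendar date.
    public static DateTime FromZeroBasedDay(int year, int n) =>
        new DateTime(year, 1, 1).AddDays(n);

    public static void Main()
    {
        DateTime leap = FromZeroBasedDay(2020, 60);    // 2020 is a leap year
        DateTime nonLeap = FromZeroBasedDay(2019, 60); // 2019 is not

        Console.WriteLine($"{leap.Month}/{leap.Day}");       // 3/1
        Console.WriteLine($"{nonLeap.Month}/{nonLeap.Day}"); // 3/2
    }
}
</code></pre></div></div>
<p>So the same value of <code class="language-plaintext highlighter-rouge">n</code> lands on a different calendar day depending on the year, which is why the comment talks about needing a ‘floating’ adjustment rule.</p>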
<p>Finally, this comment from <a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.Runtime/tests/System/TimeZoneInfoTests.cs#L1512-L1524">corefx/src/System.Runtime/tests/System/TimeZoneInfoTests.cs</a> discusses invalid and ambiguous times that are covered in tests:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>//   March 26, 2006                            October 29, 2006
// 2AM            3AM                        2AM              3AM
// |      +1 hr     |                        |       -1 hr      |
// | <invalid time> |                        | <ambiguous time> |
//                  *========== DST ========>*
//
// * 00:59:59 Sunday March 26, 2006 in Universal converts to
// 01:59:59 Sunday March 26, 2006 in Europe/Amsterdam (NO DST)
//
// * 01:00:00 Sunday March 26, 2006 in Universal converts to
// 03:00:00 Sunday March 26, 2006 in Europe/Amsterdam (DST)
//
</code></pre></div></div>
<hr />
<h2 id="stack-layouts">Stack Layouts</h2>
<p>To finish off, I wanted to look at ‘stack layouts’ because they seem to be a favourite of the .NET/Mono Runtime Engineers; there are sooo many examples!</p>
<p>First-up, <code class="language-plaintext highlighter-rouge">x86</code> from <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/lclvars.cpp#L4309-L4374">coreclr/src/jit/lclvars.cpp</a> (you can also see the <a href="https://github.com/dotnet/coreclr/blob/e277764916cbb740db199132be81701593820bb0/src/jit/lclvars.cpp#L3574-L3658">x64</a>, <a href="https://github.com/dotnet/coreclr/blob/e277764916cbb740db199132be81701593820bb0/src/jit/lclvars.cpp#L3660-L3744">ARM</a> and <a href="https://github.com/dotnet/coreclr/blob/e277764916cbb740db199132be81701593820bb0/src/jit/lclvars.cpp#L3746-L3835">ARM64</a> versions).</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> *  The frame is laid out as follows for x86:
 *
 *              ESP frames
 *
 *      |                       |
 *      |-----------------------|
 *      |       incoming        |
 *      |       arguments       |
 *      |-----------------------| <---- Virtual '0'
 *      |    return address     |
 *      +=======================+
 *      |Callee saved registers |
 *      |-----------------------|
 *      |       Temps           |
 *      |-----------------------|
 *      |       Variables       |
 *      |-----------------------| <---- Ambient ESP
 *      |   Arguments for the   |
 *      ~    next function      ~
 *      |                       |
 *      |       |               |
 *      |       | Stack grows   |
 *              | downward
 *              V
 *
 *
 *              EBP frames
 *
 *      |                       |
 *      |-----------------------|
 *      |       incoming        |
 *      |       arguments       |
 *      |-----------------------| <---- Virtual '0'
 *      |    return address     |
 *      +=======================+
 *      |    incoming EBP       |
 *      |-----------------------| <---- EBP
 *      |Callee saved registers |
 *      |-----------------------|
 *      |   security object     |
 *      |-----------------------|
 *      |     ParamTypeArg      |
 *      |-----------------------|
 *      |  Last-executed-filter |
 *      |-----------------------|
 *      |                       |
 *      ~      Shadow SPs       ~
 *      |                       |
 *      |-----------------------|
 *      |                       |
 *      ~      Variables        ~
 *      |                       |
 *      ~-----------------------|
 *      |       Temps           |
 *      |-----------------------|
 *      |       localloc        |
 *      |-----------------------| <---- Ambient ESP
 *      |   Arguments for the   |
 *      |    next function      ~
 *      |                       |
 *      |       |               |
 *      |       | Stack grows   |
 *              | downward
 *              V
 *
</code></pre></div></div>
<p>Not to be left out, Mono has some nice examples covering <a href="https://github.com/mono/mono/blob/2019-02/mono/mini/mini-mips.c#L4682-L4705">MIPS</a> (below), <a href="https://github.com/mono/mono/blob/2019-02/mono/mini/mini-ppc.c#L4677-L4692">PPC</a> and <a href="https://github.com/mono/mono/blob/2019-02/mono/mini/mini-arm.c#L6137-L6149">ARM</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/*
* Stack frame layout:
*
* ------------------- sp + cfg->stack_usage + cfg->param_area
* param area incoming
* ------------------- sp + cfg->stack_usage + MIPS_STACK_PARAM_OFFSET
* a0-a3 incoming
* ------------------- sp + cfg->stack_usage
* ra
* ------------------- sp + cfg->stack_usage-4
* spilled regs
* ------------------- sp +
* MonoLMF structure optional
* ------------------- sp + cfg->arch.lmf_offset
* saved registers s0-s8
* ------------------- sp + cfg->arch.iregs_offset
* locals
* ------------------- sp + cfg->param_area
* param area outgoing
* ------------------- sp + MIPS_STACK_PARAM_OFFSET
* a0-a3 outgoing
* ------------------- sp
* red zone
*/
</code></pre></div></div>
<p>Finally, there’s another example <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/dllimportcallback.cpp#L254-L293">covering <code class="language-plaintext highlighter-rouge">[DllImport]</code> callbacks</a> and one more <a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/codegenarm64.cpp#L791-L873">involving funclet frames in ARM64</a>. I told you there were lots!!</p>
<hr />
<h2 id="the-rest">The Rest</h2>
<p>If you aren’t sick of ‘ASCII Art’ by now, here are a few more examples for you to look at!!</p>
<ul>
<li>CoreCLR
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/arm/stubs.cpp#L1934-L1966">coreclr/stubs.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/inlinetracking.h#L191-L203">coreclr/inlinetracking.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/inlinetracking.h#L248-L260">coreclr/inlinetracking.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/comcallablewrapper.h#L105-L131">coreclr/comcallablewrapper.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/vm/comcallablewrapper.cpp#L1986-L2012">coreclr/comcallablewrapper.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/4e2d07b5f592627530ee5645fd94325f17ee9487/src/System.Private.CoreLib/shared/System/Runtime/InteropServices/SafeHandle.cs#L36-L46">coreclr/SafeHandle.cs</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/gc/gcpriv.h#L375-L398">coreclr/gcpriv.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/compiler.hpp#L2081-L2104">coreclr/compiler.hpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/optimizer.cpp#L1004-L1019">coreclr/optimizer.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9d3f264b9ef8b4715017ec615dcb6f9d57e607cc/src/jit/codegencommon.cpp#L4858-L4911">coreclr/codegencommon.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/c4dca1072d15bdda64c754ad1ea474b1580fa554/src/jit/morph.cpp#L1768-L1785">coreclr/morph.cpp</a></li>
</ul>
</li>
<li>Roslyn
<ul>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Compilers/Test/Resources/Core/MetadataTests/Invalid/Signatures/SignatureCycle2.il#L3-L20">roslyn/SignatureCycle2.il</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Compilers/CSharp/Portable/Symbols/Source/SourceMemberContainerSymbol.cs#L1017-L1022">roslyn/SourceMemberContainerSymbol.cs</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Compilers/Core/CodeAnalysisTest/RealParserTests.cs#L529-L551">roslyn/RealParserTests.cs</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.9/src/Compilers/CSharp/Portable/Compilation/CSharpSemanticModel.cs#L2718-L2759">roslyn/CSharpSemanticModel.cs</a></li>
</ul>
</li>
<li>CoreFX
<ul>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Decimal.DecCalc.cs#L1433-L1453">corefx/Decimal.DecCalc.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Number.Grisu3.cs#L964-L991">corefx/Number.Grisu3.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Buffers/Binary/Reader.cs#L89-L107">corefx/Reader.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Globalization/Calendar.cs#L371-L400">corefx/Calendar.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/System/Collections/Generic/LargeArrayBuilder.SpeedOpt.cs#L196-L203">corefx/LargeArrayBuilder.SpeedOpt.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/Common/src/CoreLib/System/Runtime/Intrinsics/Vector128.cs#L610-L625">corefx/Vector128.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.Collections/src/System/Collections/Generic/SortedSet.cs#L18-L26">corefx/SortedSet.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.Data.Common/src/System/Data/RbTree.cs#L75-L81">corefx/RbTree.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.Numerics.Vectors/src/System/Numerics/Matrix4x4.cs#L818-L842">corefx/Matrix4x4.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs#L396-L410">corefx/BlobBuilder.cs</a></li>
<li><a href="https://github.com/dotnet/corefx/blob/4b9fff5c022269c7dbb000bd14c10be27400beb2/src/System.Runtime.Extensions/src/System/IO/BufferedStream.cs#L909-L918">corefx/BufferedStream.cs</a></li>
</ul>
</li>
<li>AspNetCore
<ul>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.Data.cs#L6-L14">AspNetCore/Http2Frame.Data.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.Ping.cs#L6-L12">AspNetCore/Http2Frame.Ping.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.GoAway.cs#L6-L14">AspNetCore/Http2Frame.GoAway.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.Priority.cs#L6-L12">AspNetCore/Http2Frame.Priority.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.Settings.cs#L6-L13">AspNetCore/Http2Frame.Settings.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.RstStream.cs#L6-L10">AspNetCore/Http2Frame.RstStream.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.Continuation.cs#L6-L10">AspNetCore/Http2Frame.Continuation.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/Http2Frame.WindowUpdate.cs#L6-L10">AspNetCore/Http2Frame.WindowUpdate.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/Core/src/Internal/Http2/HPack/HPackDecoder.cs#L26-L71">AspNetCore/HPackDecoder.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Components/Components/test/RenderTreeBuilderTest.cs#L188-L203">AspNetCore/RenderTreeBuilderTest.cs</a></li>
<li><a href="https://github.com/aspnet/AspNetCore/blob/9f1a978230cdd161998815c425bfd2d25e8436b6/src/Servers/Kestrel/test/FunctionalTests/MaxRequestBufferSizeTests.cs#L25-L45">AspNetCore/MaxRequestBufferSizeTests.cs</a></li>
</ul>
</li>
<li>Mono
<ul>
<li><a href="https://github.com/mono/mono/blob/2019-02/mono/sgen/sgen-qsort.c#L46-L53">mono/sgen/sgen-qsort.c</a></li>
</ul>
</li>
</ul>
<h1><a href="http://www.mattwarren.org/2019/03/01/Is-CSharp-a-low-level-language">Is C# a low-level language?</a></h1>
<p>2019-03-01</p>
<p>I’m a massive fan of everything <a href="http://fabiensanglard.net/">Fabien Sanglard</a> does. I love his blog and I’ve read <a href="http://fabiensanglard.net/gebbdoom/index.html">both</a> his <a href="http://fabiensanglard.net/gebbwolf3d/index.html">books</a> cover-to-cover (for more info on his books, check out the recent <a href="https://hanselminutes.com/666/episode-666-game-engine-black-book-doom-with-fabien-sanglard">Hanselminutes podcast</a>).</p>
<p>Recently he wrote an excellent post where he <a href="http://fabiensanglard.net/postcard_pathtracer/index.html">deciphered a postcard sized raytracer</a>, un-packing the obfuscated code and providing a fantastic explanation of the maths involved. I really recommend you take the time to read it!</p>
<p>But it got me thinking, <strong><em>would it be possible to port that C++ code to C#?</em></strong></p>
<p>Partly because in my <a href="https://raygun.com/platform/apm">day job</a> I’ve been having to write a fair amount of C++ recently and I’ve realised I’m a bit rusty, so I thought this might help!</p>
<p>But more significantly, I wanted to get a better insight into the question <strong>is C# a low-level language?</strong></p>
<p>A slightly different, but related question is <em>how suitable is C# for ‘systems programming’?</em> For more on that I really recommend Joe Duffy’s <a href="http://joeduffyblog.com/2013/12/27/csharp-for-systems-programming/">excellent post from 2013</a>.</p>
<hr />
<h2 id="line-by-line-port">Line-by-line port</h2>
<p>I started by simply porting the <a href="http://fabiensanglard.net/postcard_pathtracer/formatted_full.html">un-obfuscated C++ code</a> line-by-line <a href="https://gist.github.com/mattwarren/d17a0c356bd6fdb9f596bee6b9a5e63c">to C#</a>. It turns out that this was pretty straightforward, so I guess the <a href="https://stackoverflow.com/a/1991356">story about C# being C++++</a> is true after all!!</p>
<p>Let’s look at an example. The main data structure in the code is a ‘vector’; here’s the code side-by-side, C++ on the left and C# on the right:</p>
<p><a href="/images/2019/03/Diff%20-%20C++%20v.%20C%23%20-%20struct%20Vec.png"><img src="/images/2019/03/Diff%20-%20C++%20v.%20C%23%20-%20struct%20Vec.png" alt="Diff - C++ v. C# - struct Vec" /></a></p>
<p>So there are a few syntax differences, but because .NET lets you define <a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/value-types">your own ‘Value Types’</a> I was able to get the same functionality. This is significant because treating the ‘vector’ as a <code class="language-plaintext highlighter-rouge">struct</code> means we can get better ‘data locality’ and the .NET Garbage Collector (GC) doesn’t need to be involved as the data will go onto the <em>stack</em> (probably, yes I know it’s an implementation detail).</p>
<p>For more info on <code class="language-plaintext highlighter-rouge">structs</code> or ‘value types’ in .NET see:</p>
<ul>
<li><a href="http://tooslowexception.com/heap-vs-stack-value-type-vs-reference-type/">Heap vs stack, value type vs reference type</a></li>
<li><a href="https://adamsitnik.com/Value-Types-vs-Reference-Types/">Value Types vs Reference Types</a></li>
<li><a href="http://jonskeet.uk/csharp/memory.html">Memory in .NET - what goes where</a></li>
<li><a href="https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/">The Truth About Value Types</a></li>
<li><a href="https://blogs.msdn.microsoft.com/ericlippert/2009/04/27/the-stack-is-an-implementation-detail-part-one/">The Stack Is An Implementation Detail, Part One</a></li>
</ul>
<p>In particular, that last post from Eric Lippert contains this helpful quote that makes it clear what ‘value types’ really are:</p>
<blockquote>
<p>Surely the most relevant fact about value types is <strong>not the implementation detail of <em>how they are allocated</em></strong>, but rather the <em>by-design semantic meaning</em> of “value type”, <strong>namely that they are <em>always copied “by value”</em></strong>. If the relevant thing was their allocation details then we’d have called them “heap types” and “stack types”. But that’s not relevant most of the time. Most of the time the relevant thing is their copying and identity semantics.</p>
</blockquote>
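<p>To make the ‘copied by value’ point concrete, here’s a minimal sketch (the <code class="language-plaintext highlighter-rouge">PointStruct</code> and <code class="language-plaintext highlighter-rouge">PointClass</code> types are made up for this example, they’re not part of the ray tracer):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public struct PointStruct { public float X; } // value type
public class PointClass { public float X; }   // reference type

public static class CopySemanticsSketch
{
    public static void Main()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1; // copies the whole value
        s2.X = 2;

        var c1 = new PointClass { X = 1 };
        var c2 = c1; // copies only the reference
        c2.X = 2;

        Console.WriteLine(s1.X); // 1, s1 is unaffected by the change to s2
        Console.WriteLine(c1.X); // 2, c1 and c2 refer to the same object
    }
}
</code></pre></div></div>
<p>Nothing in this example depends on whether the struct ends up on the stack or the heap, which is exactly the point of the quote.</p>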
<p>Now let’s look at how some other methods look side-by-side (again C++ on the left, C# on the right), first up <code class="language-plaintext highlighter-rouge">RayMarching(..)</code>:</p>
<p><a href="/images/2019/03/Diff%20-%20C++%20v.%20C%23%20-%20RayMatching.png"><img src="/images/2019/03/Diff%20-%20C++%20v.%20C%23%20-%20RayMatching.png" alt="Diff - C++ v. C# - RayMarching" /></a></p>
<p>Next <code class="language-plaintext highlighter-rouge">QueryDatabase(..)</code>:</p>
<p><a href="/images/2019/03/Diff%20-%20C++%20v.%20C%23%20-%20QueryDatabase%20(partial).png"><img src="/images/2019/03/Diff%20-%20C++%20v.%20C%23%20-%20QueryDatabase%20(partial).png" alt="Diff - C++ v. C# - QueryDatabase" /></a></p>
<p>(see <a href="http://fabiensanglard.net/postcard_pathtracer/">Fabien’s post</a> for an explanation of what these 2 functions are doing)</p>
<p>But the point is that, again, C# lets us very easily write C++ code! In this case what helps us out the most is the <code class="language-plaintext highlighter-rouge">ref</code> keyword, which lets us pass a <a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref">value by reference</a>. We’ve been able to use <code class="language-plaintext highlighter-rouge">ref</code> in method calls for quite a while, but recently there’s been an effort to allow <code class="language-plaintext highlighter-rouge">ref</code> in more places:</p>
<ul>
<li><a href="https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/ref-returns">Ref returns and ref locals</a></li>
<li><a href="https://blogs.msdn.microsoft.com/mazhou/2018/03/02/c-7-series-part-9-ref-structs/">C# 7 Series, Part 9: ref structs</a></li>
</ul>
<p>Now <em>sometimes</em> using <code class="language-plaintext highlighter-rouge">ref</code> can provide a performance boost because it means that the <code class="language-plaintext highlighter-rouge">struct</code> doesn’t need to be copied, see the benchmarks in <a href="https://adamsitnik.com/ref-returns-and-ref-locals/#passing-arguments-to-methods-by-reference">Adam Sitnik’s post</a> and <a href="https://blogs.msdn.microsoft.com/seteplia/2018/04/11/performance-traps-of-ref-locals-and-ref-returns-in-c/">Performance traps of ref locals and ref returns in C#</a> for more information.</p>
<p>However, what’s most important for this scenario is that it allows us to have the same behaviour in our C# port as the original C++ code. I do want to point out, though, that ‘Managed References’, as they’re known, aren’t exactly the same as ‘pointers’; most notably you can’t do arithmetic on them. For more on this see:</p>
<ul>
<li><a href="http://mustoverride.com/refs-not-ptrs/">ref returns are not pointers</a></li>
<li><a href="http://mustoverride.com/managed-refs-CLR/">Managed pointers</a></li>
<li><a href="https://blogs.msdn.microsoft.com/ericlippert/2009/02/17/references-are-not-addresses/">References are not addresses</a></li>
</ul>
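<p>As a small sketch of the difference <code class="language-plaintext highlighter-rouge">ref</code> makes (the <code class="language-plaintext highlighter-rouge">Vec</code> here is a cut-down stand-in for the one in the port, and <code class="language-plaintext highlighter-rouge">ScaleCopy</code>/<code class="language-plaintext highlighter-rouge">ScaleInPlace</code> are hypothetical helpers, not methods from the ray tracer):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public struct Vec { public float X, Y, Z; }

public static class RefSketch
{
    // Receives a copy, so the caller's value is untouched.
    public static void ScaleCopy(Vec v, float f) { v.X *= f; v.Y *= f; v.Z *= f; }

    // Receives a managed reference, so the caller's value is updated in place.
    public static void ScaleInPlace(ref Vec v, float f) { v.X *= f; v.Y *= f; v.Z *= f; }

    public static void Main()
    {
        var v = new Vec { X = 1, Y = 2, Z = 3 };

        ScaleCopy(v, 10);
        Console.WriteLine(v.X); // 1

        ScaleInPlace(ref v, 10);
        Console.WriteLine(v.X); // 10

        ref float x = ref v.X;  // a 'ref local' that aliases a field of v
        x = 42;
        Console.WriteLine(v.X); // 42
    }
}
</code></pre></div></div>
<p>Note that there’s no way to write <code class="language-plaintext highlighter-rouge">x + 1</code> to get at the ‘next’ float in memory; that’s the pointer arithmetic that managed references don’t allow.</p>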
<hr />
<h2 id="performance">Performance</h2>
<p>So, it’s all well and good being able to port the code, but ultimately the performance also matters, especially in something like a ‘ray tracer’ that can take minutes to run! The C++ code contains a variable called <code class="language-plaintext highlighter-rouge">sampleCount</code> that controls the final quality of the image. With <code class="language-plaintext highlighter-rouge">sampleCount = 2</code> it looks like this:</p>
<p><a href="/images/2019/03/output-C%23%20-%20sampleCount%20=%202.png"><img src="/images/2019/03/output-C%23%20-%20sampleCount%20=%202.png" alt="output C# - sampleCount = 2" /></a></p>
<p>Which clearly isn’t that realistic!</p>
<p>However once you get to <code class="language-plaintext highlighter-rouge">sampleCount = 2048</code> things look a <em>lot</em> better:</p>
<p><a href="/images/2019/03/output-C%23%20-%20sampleCount%20=%202048.png"><img src="/images/2019/03/output-C%23%20-%20sampleCount%20=%202048.png" alt="output C# - sampleCount = 2048" /></a></p>
<p>But, running with <code class="language-plaintext highlighter-rouge">sampleCount = 2048</code> means the rendering takes a <strong>long time</strong>, so all the following results were run with it set to <code class="language-plaintext highlighter-rouge">2</code>, which means the test runs completed in ~1 minute. Changing <code class="language-plaintext highlighter-rouge">sampleCount</code> only affects the number of iterations of the outermost loop of the code, see <a href="https://gist.github.com/mattwarren/1580572d9d641147c61caf65c383c3a4">this gist</a> for an explanation.</p>
<h3 id="results-after-a-naive-line-by-line-port">Results after a ‘naive’ line-by-line port</h3>
<p>To be able to give a meaningful side-by-side comparison of the C++ and C# versions I used the <a href="https://code.google.com/archive/p/time-windows/source/default/source">time-windows</a> tool, which is a port of the Unix <code class="language-plaintext highlighter-rouge">time</code> command. My initial results looked like this:</p>
<table>
<thead>
<tr>
<th> </th>
<th>C++ (VS 2017)</th>
<th>.NET Framework (4.7.2)</th>
<th>.NET Core (2.2)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Elapsed time (secs)</td>
<td>47.40</td>
<td>80.14</td>
<td>78.02</td>
</tr>
<tr>
<td>Kernel time (secs)</td>
<td>0.14 (0.3%)</td>
<td>0.72 (0.9%)</td>
<td>0.63 (0.8%)</td>
</tr>
<tr>
<td>User time (secs)</td>
<td>43.86 (92.5%)</td>
<td>73.06 (91.2%)</td>
<td>70.66 (90.6%)</td>
</tr>
<tr>
<td>Page faults</td>
<td>1,143</td>
<td>4,818</td>
<td>5,945</td>
</tr>
<tr>
<td>Working set (KB)</td>
<td>4,232</td>
<td>13,624</td>
<td>17,052</td>
</tr>
<tr>
<td>Paged pool (KB)</td>
<td>95</td>
<td>172</td>
<td>154</td>
</tr>
<tr>
<td>Non-paged pool</td>
<td>7</td>
<td>14</td>
<td>16</td>
</tr>
<tr>
<td>Page file size (KB)</td>
<td>1,460</td>
<td>10,936</td>
<td>11,024</td>
</tr>
</tbody>
</table>
<p>So initially we see that the C# code is quite a bit slower than the C++ version, but it does get better (see below).</p>
<p>However, let’s first look at what the .NET JIT is doing for us even with this ‘naive’ line-by-line port. Firstly, it’s doing a nice job of in-lining the smaller ‘helper methods’. We can see this by looking at the output of the brilliant <a href="https://marketplace.visualstudio.com/items?itemName=StephanZehetner.InliningAnalyzer">Inlining Analyzer</a> tool (green overlay = inlined):</p>
<p><a href="/images/2019/03/Inlining Analyzer - QueryDatabase.png"><img src="/images/2019/03/Inlining Analyzer - QueryDatabase.png" alt="Inlining Analyzer - QueryDatabase" /></a></p>
<p>However, it doesn’t inline all methods, for example <code class="language-plaintext highlighter-rouge">QueryDatabase(..)</code> is skipped because of its complexity:</p>
<p><a href="/images/2019/03/Inlining Analyzer - RayMarching - with ToolTip.png"><img src="/images/2019/03/Inlining Analyzer - RayMarching - with ToolTip.png" alt="Inlining Analyzer - RayMarching - with ToolTip" /></a></p>
<p>Another feature that the .NET Just-In-Time (JIT) compiler provides is converting specific method calls into corresponding CPU instructions. We can see this in action with the <code class="language-plaintext highlighter-rouge">sqrt</code> wrapper function. Here’s the original C# code (note the call to <code class="language-plaintext highlighter-rouge">Math.Sqrt</code>):</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// inverse square root</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">Vec</span> <span class="k">operator</span> <span class="p">!(</span><span class="n">Vec</span> <span class="n">q</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">q</span> <span class="p">*</span> <span class="p">(</span><span class="m">1.0f</span> <span class="p">/</span> <span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="n">q</span> <span class="p">%</span> <span class="n">q</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And here’s the assembly code that the .NET JIT generates. There’s no call to <code class="language-plaintext highlighter-rouge">Math.Sqrt</code>; instead it makes use of the <code class="language-plaintext highlighter-rouge">vsqrtsd</code> <a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=vsqrtsd&expand=5236">CPU instruction</a>:</p>
<pre><code class="language-assembly">; Assembly listing for method Program:sqrtf(float):float
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) float -> mm0
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M8216_IG01:
vzeroupper
G_M8216_IG02:
vcvtss2sd xmm0, xmm0
vsqrtsd xmm0, xmm0
vcvtsd2ss xmm0, xmm0
G_M8216_IG03:
ret
; Total bytes of code 16, prolog size 3 for method Program:sqrtf(float):float
; ============================================================
</code></pre>
<p>(to get this output you need to follow <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md#useful-complus-variables">these instructions</a>, use the <a href="https://github.com/EgorBo/Disasmo">‘Disasmo’ VS2019 Add-in</a> or take a look at <a href="https://sharplab.io/#v2:EYLgHgbALANALiAhgZwLYB8ACAGABJgRgG4BYAKEwGZ8AmXAYVwG9zc39rMpcBZACgCUzVu1EA3RACdcYXAF5eiOAAsAdAGUAjpLh8C2AaTKjRhAJx8whkWwC+5W0A==">SharpLab.io</a>)</p>
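<p>Note that the listing is for a method called <code class="language-plaintext highlighter-rouge">Program:sqrtf(float):float</code> rather than the operator shown earlier. The exact source of that wrapper isn’t reproduced in this post, so the following is an assumption about its shape, but a wrapper that simply forwards to <code class="language-plaintext highlighter-rouge">Math.Sqrt</code> would explain the float to double, square root, double to float sequence (<code class="language-plaintext highlighter-rouge">vcvtss2sd</code>, <code class="language-plaintext highlighter-rouge">vsqrtsd</code>, <code class="language-plaintext highlighter-rouge">vcvtsd2ss</code>) in the assembly:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class Program
{
    // Assumed shape of the wrapper: widen to double, take the root, narrow back to float.
    public static float sqrtf(float x) => (float)Math.Sqrt(x);

    public static void Main()
    {
        Console.WriteLine(sqrtf(9.0f)); // 3
    }
}
</code></pre></div></div>
<p>The two conversion instructions are there because <code class="language-plaintext highlighter-rouge">Math.Sqrt</code> takes and returns a <code class="language-plaintext highlighter-rouge">double</code>.</p>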
<p>These replacements are also known as <a href="https://en.wikipedia.org/wiki/Intrinsic_function">‘intrinsics’</a> and we can see the JIT generating them in the code below. This snippet just shows the mapping for <code class="language-plaintext highlighter-rouge">AMD64</code>; the JIT also targets <code class="language-plaintext highlighter-rouge">X86</code>, <code class="language-plaintext highlighter-rouge">ARM</code> and <code class="language-plaintext highlighter-rouge">ARM64</code>, and the full method is <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/importer.cpp#L19144-L19217">here</a>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="n">Compiler</span><span class="o">::</span><span class="n">IsTargetIntrinsic</span><span class="p">(</span><span class="n">CorInfoIntrinsics</span> <span class="n">intrinsicId</span><span class="p">)</span>
<span class="p">{</span>
<span class="cp">#if defined(_TARGET_AMD64_) || (defined(_TARGET_X86_) && !defined(LEGACY_BACKEND))
</span> <span class="k">switch</span> <span class="p">(</span><span class="n">intrinsicId</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// AMD64/x86 has SSE2 instructions to directly compute sqrt/abs and SSE4.1</span>
<span class="c1">// instructions to directly compute round/ceiling/floor.</span>
<span class="c1">//</span>
<span class="c1">// TODO: Because the x86 backend only targets SSE for floating-point code,</span>
<span class="c1">// it does not treat Sine, Cosine, or Round as intrinsics (JIT32</span>
<span class="c1">// implemented those intrinsics as x87 instructions). If this poses</span>
<span class="c1">// a CQ problem, it may be necessary to change the implementation of</span>
<span class="c1">// the helper calls to decrease call overhead or switch back to the</span>
<span class="c1">// x87 instructions. This is tracked by #7097.</span>
<span class="k">case</span> <span class="n">CORINFO_INTRINSIC_Sqrt</span><span class="p">:</span>
<span class="k">case</span> <span class="n">CORINFO_INTRINSIC_Abs</span><span class="p">:</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="k">case</span> <span class="n">CORINFO_INTRINSIC_Round</span><span class="p">:</span>
<span class="k">case</span> <span class="n">CORINFO_INTRINSIC_Ceiling</span><span class="p">:</span>
<span class="k">case</span> <span class="n">CORINFO_INTRINSIC_Floor</span><span class="p">:</span>
<span class="k">return</span> <span class="n">compSupports</span><span class="p">(</span><span class="n">InstructionSet_SSE41</span><span class="p">);</span>
<span class="nl">default:</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>As you can see, some methods are implemented like this, e.g. <code class="language-plaintext highlighter-rouge">Sqrt</code> and <code class="language-plaintext highlighter-rouge">Abs</code>, but for others the CLR instead uses the C runtime functions, <a href="https://en.cppreference.com/w/c/numeric/math/pow">for instance <code class="language-plaintext highlighter-rouge">powf</code></a>.</p>
<p>This entire process is explained very nicely in <a href="https://stackoverflow.com/a/8870593">How is Math.Pow() implemented in .NET Framework?</a>, but we can also see it in action in the CoreCLR source:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">COMSingle::Pow</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/classlibnative/float/floatsingle.cpp#L205-L212">implementation</a>, i.e. the method that’s executed if you call <code class="language-plaintext highlighter-rouge">MathF.Pow(..)</code> from C# code</li>
<li>Mapping to <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/pal/inc/pal.h#L4094-L4198">C runtime method implementations</a></li>
<li>Cross-platform version of <code class="language-plaintext highlighter-rouge">powf</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/pal/src/cruntime/math.cpp#L755-L840">implementation</a> that ensures the same behaviour across OSes</li>
</ul>
<h3 id="results-after-simple-performance-improvements">Results after simple performance improvements</h3>
<p>However, I wanted to see if my ‘naive’ line-by-line port could be improved. After some profiling I made two main changes:</p>
<ul>
<li>Remove in-line array initialisation</li>
<li>Switch from <code class="language-plaintext highlighter-rouge">Math.XXX(..)</code> functions to the <code class="language-plaintext highlighter-rouge">MathF.XXX(..)</code> counterparts</li>
</ul>
<p>These changes are explained in more depth below.</p>
<h4 id="remove-in-line-array-initialisation">Remove in-line array initialisation</h4>
<p>For more information about why this is necessary see this excellent <a href="https://stackoverflow.com/a/39106675">Stack Overflow answer</a> from <a href="https://twitter.com/andrey_akinshin?lang=en">Andrey Akinshin</a> complete with benchmarks and assembly code! It comes to the following conclusion:</p>
<blockquote>
<p><strong>Conclusion</strong></p>
<ul>
<li><strong><em>Does .NET caches hardcoded local arrays?</em></strong> Kind of: the Roslyn compiler put it in the metadata.</li>
<li><strong><em>Do we have any overhead in this case?</em></strong> Unfortunately, yes: JIT will copy the array content from the metadata for each invocation; it will work longer than the case with a static array. Runtime also allocates objects and produce memory traffic.</li>
<li><strong><em>Should we care about it?</em></strong> It depends. If it’s a hot method and you want to achieve a good level of performance, you should use a static array. If it’s a cold method which doesn’t affect the application performance, you probably should write “good” source code and put the array in the method scope.</li>
</ul>
</blockquote>
<p>You can see the change I made <a href="https://gist.github.com/mattwarren/d17a0c356bd6fdb9f596bee6b9a5e63c/revisions#diff-ab5447b35812d457232030d7d2577458R114">in this diff</a>.</p>
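<p>As a minimal sketch of the difference (this is not the code from the ray-tracer; the <code class="language-plaintext highlighter-rouge">Lookup</code> class and its method names are hypothetical), the first method below builds a new array on every call, whereas the second reads from an array that is created once:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class Lookup
{
    // Allocated once, when the type is first used.
    private static readonly int[] Cached = { 1, 2, 3, 4 };

    // Allocates and fills a new array on every call.
    public static int SumInline()
    {
        int[] data = { 1, 2, 3, 4 };
        int sum = 0;
        foreach (int value in data) sum += value;
        return sum;
    }

    // Re-uses the single cached array, so there is no per-call allocation.
    public static int SumStatic()
    {
        int sum = 0;
        foreach (int value in Cached) sum += value;
        return sum;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Lookup.SumInline()); // 10
        Console.WriteLine(Lookup.SumStatic()); // 10
    }
}
</code></pre></div></div>
<p>Both methods return the same value; the only difference is where the array lives. The trade-off is that a <code class="language-plaintext highlighter-rouge">static</code> array is shared, mutable state, so this only makes sense when the contents are never modified.</p>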
<h4 id="using-mathf-functions-instead-of-math">Using MathF functions instead of Math</h4>
<p>Secondly, and most significantly, I got a big perf improvement by making the following changes:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if NETSTANDARD2_1 || NETCOREAPP2_0 || NETCOREAPP2_1 || NETCOREAPP2_2 || NETCOREAPP3_0
</span> <span class="c1">// inverse square root</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">Vec</span> <span class="k">operator</span> <span class="p">!(</span><span class="n">Vec</span> <span class="n">q</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">q</span> <span class="p">*</span> <span class="p">(</span><span class="m">1.0f</span> <span class="p">/</span> <span class="n">MathF</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="n">q</span> <span class="p">%</span> <span class="n">q</span><span class="p">));</span>
<span class="p">}</span>
<span class="cp">#else
</span> <span class="k">public</span> <span class="k">static</span> <span class="n">Vec</span> <span class="k">operator</span> <span class="p">!(</span><span class="n">Vec</span> <span class="n">q</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">q</span> <span class="p">*</span> <span class="p">(</span><span class="m">1.0f</span> <span class="p">/</span> <span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="n">q</span> <span class="p">%</span> <span class="n">q</span><span class="p">));</span>
<span class="p">}</span>
<span class="cp">#endif
</span></code></pre></div></div>
<p>As of ‘.NET Standard 2.1’ there are now specific <code class="language-plaintext highlighter-rouge">float</code> implementations of the common maths functions, located in the <a href="https://apisof.net/catalog/System.MathF">System.MathF class</a>. For more information on this API and its implementation see:</p>
<ul>
<li><a href="https://github.com/dotnet/corefx/issues/1151">New API for single-precision math</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/5492/files">Adding single-precision math functions</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/7690">Provide a set of unit tests over the new single-precision math APIs</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/14155">System.Math and System.MathF should be implemented in managed code, rather than as FCALLs to the C runtime</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/14156">Moving <code class="language-plaintext highlighter-rouge">Math.Abs(double)</code> and <code class="language-plaintext highlighter-rouge">Math.Abs(float)</code> to be implemented in managed code.</a></li>
<li><a href="https://github.com/dotnet/designs/issues/13">Design and process for adding platform dependent intrinsics to .NET</a></li>
</ul>
<p>After these changes, the C# code running on .NET Core is ~10% slower than the C++ version:</p>
<table>
<thead>
<tr>
<th> </th>
<th>C++ (VS C++ 2017)</th>
<th>.NET Framework (4.7.2)</th>
<th>.NET Core (2.2) TC OFF</th>
<th>.NET Core (2.2) TC ON</th>
</tr>
</thead>
<tbody>
<tr>
<td>Elapsed time (secs)</td>
<td>41.38</td>
<td>58.89</td>
<td>46.04</td>
<td>44.33</td>
</tr>
<tr>
<td>Kernel time (secs)</td>
<td>0.05 (0.1%)</td>
<td>0.06 (0.1%)</td>
<td>0.14 (0.3%)</td>
<td>0.13 (0.3%)</td>
</tr>
<tr>
<td>User time (secs)</td>
<td>41.19 (99.5%)</td>
<td>58.34 (99.1%)</td>
<td>44.72 (97.1%)</td>
<td>44.03 (99.3%)</td>
</tr>
<tr>
<td>Page fault #</td>
<td>1,119</td>
<td>4,749</td>
<td>5,776</td>
<td>5,661</td>
</tr>
<tr>
<td>Working set (KB)</td>
<td>4,136</td>
<td>13,440</td>
<td>16,788</td>
<td>16,652</td>
</tr>
<tr>
<td>Paged pool (KB)</td>
<td>89</td>
<td>172</td>
<td>150</td>
<td>150</td>
</tr>
<tr>
<td>Non-paged pool (KB)</td>
<td>7</td>
<td>13</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>Page file size (KB)</td>
<td>1,428</td>
<td>10,904</td>
<td>10,960</td>
<td>11,044</td>
</tr>
</tbody>
</table>
<p>TC = <a href="https://devblogs.microsoft.com/dotnet/tiered-compilation-preview-in-net-core-2-1/">Tiered Compilation</a> (I <em>believe</em> that it’ll be on by default in .NET Core 3.0)</p>
<p>For completeness, here are the elapsed times (secs) across several runs:</p>
<table>
<thead>
<tr>
<th>Run</th>
<th style="text-align: center">C++ (VS C++ 2017)</th>
<th style="text-align: center">.NET Framework (4.7.2)</th>
<th style="text-align: center">.NET Core (2.2) TC OFF</th>
<th style="text-align: center">.NET Core (2.2) TC ON</th>
</tr>
</thead>
<tbody>
<tr>
<td>TestRun-01</td>
<td style="text-align: center">41.38</td>
<td style="text-align: center">58.89</td>
<td style="text-align: center">46.04</td>
<td style="text-align: center">44.33</td>
</tr>
<tr>
<td>TestRun-02</td>
<td style="text-align: center">41.19</td>
<td style="text-align: center">57.65</td>
<td style="text-align: center">46.23</td>
<td style="text-align: center">45.96</td>
</tr>
<tr>
<td>TestRun-03</td>
<td style="text-align: center">42.17</td>
<td style="text-align: center">62.64</td>
<td style="text-align: center">46.22</td>
<td style="text-align: center">48.73</td>
</tr>
</tbody>
</table>
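<p>To put a number on the gap, the elapsed times from the three runs can be averaged and compared against the C++ baseline. The snippet below is just a sketch of that arithmetic (the <code class="language-plaintext highlighter-rouge">Report</code> helper is made up for illustration). It gives means of roughly 41.6, 59.7, 46.2 and 46.3 seconds, i.e. .NET Core comes out about 11% slower than C++ in both configurations and .NET Framework about 44% slower:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Elapsed times (secs) copied from the three test runs in the table above.
        double[] cpp       = { 41.38, 41.19, 42.17 };
        double[] framework = { 58.89, 57.65, 62.64 };
        double[] coreTcOff = { 46.04, 46.23, 46.22 };
        double[] coreTcOn  = { 44.33, 45.96, 48.73 };

        // The C++ mean is the baseline that everything else is compared to.
        double baseline = cpp.Average();

        Report("C++", cpp, baseline);
        Report(".NET Framework (4.7.2)", framework, baseline);
        Report(".NET Core (2.2) TC OFF", coreTcOff, baseline);
        Report(".NET Core (2.2) TC ON", coreTcOn, baseline);
    }

    // Prints the mean elapsed time and how it compares to the baseline.
    static void Report(string name, double[] runs, double baseline)
    {
        double mean = runs.Average();
        Console.WriteLine($"{name,-24} mean {mean:F2}s, {mean / baseline:F2}x the C++ time");
    }
}
</code></pre></div></div>
<p>Three runs is a very small sample (note the spread in the ‘TC ON’ column), so treat these ratios as indicative rather than precise.</p>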
<p><strong>Note:</strong> the difference between .NET Core and .NET Framework is due to the lack of the <code class="language-plaintext highlighter-rouge">MathF</code> API in .NET Framework v4.7.2. For more info see <a href="https://github.com/dotnet/standard/issues/859">Support .Net Framework (4.8?) for netstandard 2.1</a>.</p>
<hr />
<h2 id="further-performance-improvements">Further performance improvements</h2>
<p>However, I’m sure that others can do better!</p>
<p>If you’re interested in trying to close the gap, the <a href="https://gist.github.com/mattwarren/d17a0c356bd6fdb9f596bee6b9a5e63c">C# code is available</a>. For comparison, you can see the assembly produced by the C++ compiler courtesy of the brilliant <a href="https://godbolt.org/z/l2QZLY">Compiler Explorer</a>.</p>
<p>Finally, if it helps, here’s the output from the Visual Studio Profiler showing the ‘hot path’ (after the perf improvement described above):</p>
<p><a href="/images/2019/03/Call%20Tree%20(tidied%20up)%20-%20Report20190221-2029-After-MathF-Changes-NetCore.png"><img src="/images/2019/03/Call%20Tree%20(tidied%20up)%20-%20Report20190221-2029-After-MathF-Changes-NetCore.png" alt="Call Tree (tidied up) - Report20190221-2029-After-MathF-Changes-NetCore.png" /></a></p>
<hr />
<h2 id="is-c-a-low-level-language">Is C# a low-level language?</h2>
<p>Or more specifically:</p>
<blockquote>
<p><strong>What language features of C#/F#/VB.NET or BCL/Runtime functionality enable ‘low-level’* programming?</strong></p>
</blockquote>
<p>* yes, I know ‘low-level’ is a subjective term 😊</p>
<p><strong>Note</strong>: Any C# developer is going to have a different idea of what ‘low-level’ means, and these features would be taken for granted by C++ or Rust programmers.</p>
<p>Here’s the list that I came up with:</p>
<ul>
<li><a href="https://adamsitnik.com/ref-returns-and-ref-locals/">ref returns and ref locals</a>
<ul>
<li>“tl;dr Pass and return by reference to avoid large struct copying. It’s type and memory safe. It can be even <strong>faster</strong> than <code class="language-plaintext highlighter-rouge">unsafe!</code>”</li>
</ul>
</li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/unsafe-code">Unsafe code in .NET</a>
<ul>
<li>“The core C# language, as defined in the preceding chapters, differs notably from C and C++ in its omission of pointers as a data type. Instead, C# provides references and the ability to create objects that are managed by a garbage collector. This design, coupled with other features, makes C# a much safer language than C or C++.”</li>
</ul>
</li>
<li><a href="http://tooslowexception.com/managed-pointers-in-net/">Managed pointers in .NET</a>
<ul>
<li>“There is, however, another pointer type in CLR – a managed pointer. It could be defined as a more general type of reference, which may point to other locations than just the beginning of an object.”</li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/mazhou/2018/03/25/c-7-series-part-10-spant-and-universal-memory-management/">C# 7 Series, Part 10: Span&lt;T&gt; and universal memory management</a>
<ul>
<li>“<code class="language-plaintext highlighter-rouge">System.Span&lt;T&gt;</code> is a stack-only type (<code class="language-plaintext highlighter-rouge">ref struct</code>) that wraps all memory access patterns, it is the type for universal contiguous memory access. You can think the implementation of the Span&lt;T&gt; contains a dummy reference and a length, accepting all 3 memory access types.”</li>
</ul>
</li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/interop/">Interoperability (C# Programming Guide)</a>
<ul>
<li>“The .NET Framework enables interoperability with unmanaged code through platform invoke services, the <code class="language-plaintext highlighter-rouge">System.Runtime.InteropServices</code> namespace, C++ interoperability, and COM interoperability (COM interop).”</li>
</ul>
</li>
</ul>
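<p>To give a flavour of a few of the features above, here is a small sketch (assuming C# 7.2 or later and a runtime that ships <code class="language-plaintext highlighter-rouge">Span&lt;T&gt;</code>, e.g. .NET Core 2.1+) that combines <code class="language-plaintext highlighter-rouge">stackalloc</code>, <code class="language-plaintext highlighter-rouge">Span&lt;T&gt;</code> slicing and a ref local, all without an <code class="language-plaintext highlighter-rouge">unsafe</code> block:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

class Program
{
    static void Main()
    {
        // stackalloc into a Span&lt;T&gt;: the buffer lives on the stack, not the GC heap.
        Span&lt;int&gt; numbers = stackalloc int[8];
        for (int i = 0; i &lt; numbers.Length; i++)
            numbers[i] = i * i;

        // Slicing creates a view over the same memory, nothing is copied.
        Span&lt;int&gt; middle = numbers.Slice(2, 4);

        // A ref local aliases the element itself, so this writes through to numbers[2].
        ref int first = ref middle[0];
        first = 100;

        Console.WriteLine(numbers[2]); // 100
    }
}
</code></pre></div></div>
<p>The slice and the ref local both point at the original stack memory, which is why the write through <code class="language-plaintext highlighter-rouge">first</code> is visible via <code class="language-plaintext highlighter-rouge">numbers</code>.</p>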
<p>However, I know my limitations and so I <a href="https://twitter.com/matthewwarren/status/1097875987398828032">asked on Twitter</a> and got <em>a lot</em> more replies to add to the list:</p>
<ul>
<li><a href="https://twitter.com/ben_a_adams/status/1097876408775442432">Ben Adams</a> “Platform intrinsics (CPU instruction access)”</li>
<li><a href="https://twitter.com/marcgravell/status/1097877192745336837">Marc Gravell</a> “SIMD via Vector&lt;T&gt; (which mixes well with Span&lt;T&gt;) is *fairly* low; .NET Core should (soon?) offer direct CPU intrinsics for more explicit usage targeting particular CPU ops”</li>
<li><a href="https://twitter.com/marcgravell/status/1097878317875761153">Marc Gravell</a> “powerful JIT: things like range elision on arrays/spans, and the JIT using per-struct-T rules to remove huge chunks of code that it knows can’t be reached for that T, or on your particular CPU (BitConverter.IsLittleEndian, Vector.IsHardwareAccelerated, etc)”</li>
<li><a href="https://twitter.com/vcsjones/status/1097877294864056320">Kevin Jones</a> “I would give a special shout-out to the <code class="language-plaintext highlighter-rouge">MemoryMarshal</code> and <code class="language-plaintext highlighter-rouge">Unsafe</code> classes, and probably a few other things in the <code class="language-plaintext highlighter-rouge">System.Runtime.CompilerServices</code> namespace.”</li>
<li><a href="https://twitter.com/Pessimizations/status/1097877381296066560">Theodoros Chatzigiannakis</a> “You could also include <code class="language-plaintext highlighter-rouge">__makeref</code> and the rest.”</li>
<li><a href="https://twitter.com/damageboy/status/1097877247120326658">damageboy</a> “Being able to dynamically generate code that fits the expected input exactly, given that the latter will only be known at runtime, and might change periodically?”</li>
<li><a href="https://twitter.com/RobertHaken/status/1097880613988851712">Robert Haken</a> “dynamic IL emission”</li>
<li><a href="https://twitter.com/buybackoff/status/1097885830364966914">Victor Baybekov</a> “Stackalloc was not mentioned. Also ability to write raw IL (not dynamic, so save on a delegate call), e.g. to use cached <code class="language-plaintext highlighter-rouge">ldftn</code> and call them via <code class="language-plaintext highlighter-rouge">calli</code>. VS2017 has a proj template that makes this trivial via extern methods + MethodImplOptions.ForwardRef + ilasm.exe rewrite.”</li>
<li><a href="https://twitter.com/buybackoff/status/1097887318806093824">Victor Baybekov</a> “Also MethodImplOptions.AggressiveInlining “does enable ‘low-level’ programming” in a sense that it allows to write high-level code with many small methods and still control JIT behavior to get optimized result. Otherwise uncomposable 100s LOCs methods with copy-paste…”</li>
<li><a href="https://twitter.com/ben_a_adams/status/1097885533508980738">Ben Adams</a> “Using the same calling conventions (ABI) as the underlying platform and p/invokes for interop might be more of a thing though?”</li>
<li><a href="https://twitter.com/buybackoff/status/1097893756672581632">Victor Baybekov</a> “Also since you mentioned #fsharp - it does have <code class="language-plaintext highlighter-rouge">inline</code> keyword that does the job at IL level before JIT, so it was deemed important at the language level. C# lacks this (so far) for lambdas which are always virtual calls and workarounds are often weird (constrained generics).”</li>
<li><a href="https://twitter.com/xoofx/status/1097895771142320128">Alexandre Mutel</a> “new SIMD intrinsics, Unsafe Utility class/IL post processing (e.g custom, Fody…etc.). For C#8.0, upcoming function pointers…”</li>
<li><a href="https://twitter.com/xoofx/status/1097896059236466689">Alexandre Mutel</a> “related to IL, F# has support for direct IL within the language for example”</li>
<li><a href="https://twitter.com/0omari0/status/1097916897952235520">OmariO</a> “BinaryPrimitives. Low-level but safe.” (<a href="https://docs.microsoft.com/en-us/dotnet/api/system.buffers.binary.binaryprimitives?view=netcore-3.0">BinaryPrimitives documentation</a>)</li>
<li><a href="https://twitter.com/kozy_kekyo/status/1097982126190878720">Kouji (Kozy) Matsui</a> “How about native inline assembler? It’s difficult for how relation both toolchains and runtime, but can replace current P/Invoke solution and do inlining if we have it.”</li>
<li><a href="https://twitter.com/praeclarum/status/1098002275891642368">Frank A. Krueger</a> “Ldobj, stobj, initobj, initblk, cpyblk.”</li>
<li><a href="https://twitter.com/konradkokosa/status/1098155819340828672">Konrad Kokosa</a> “Maybe Thread Local Storage? Fixed Size Buffers? unmanaged constraint and blittable types should be probably mentioned:)”</li>
<li><a href="https://twitter.com/sebify/status/1098161110476312582">Sebastiano Mandalà</a> “Just my two cents as everything has been said: what about something as simple as struct layout and how padding and memory alignment and order of the fields may affect the cache line performance? It’s something I have to investigate myself too”</li>
<li><a href="https://twitter.com/NinoFloris/status/1098433286899146753">Nino Floris</a> “Constants embedding via readonlyspan, stackalloc, finalizers, WeakReference, open delegates, MethodImplOptions, MemoryBarriers, TypedReference, varargs, SIMD, Unsafe.AsRef can coerce struct types if layout matches exactly (used for a.o. TaskAwaiter and its &lt;T&gt; version)”</li>
</ul>
<hr />
<p><strong>So in summary, I would say that C# certainly lets you write code that looks a lot like C++ and, in conjunction with the Runtime and Base-Class Libraries, it gives you a lot of low-level functionality.</strong></p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=19280049">Hacker News</a>, <a href="https://old.reddit.com/r/programming/comments/aw4ig7/is_c_a_lowlevel_language/?sort=top">/r/programming</a>, <a href="https://old.reddit.com/r/dotnet/comments/aw4ilf/is_c_a_lowlevel_language/?sort=top">/r/dotnet</a> or <a href="https://old.reddit.com/r/csharp/comments/aw4ij6/is_c_a_lowlevel_language/?sort=top">/r/csharp</a></p>
<hr />
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://www.youtube.com/watch?v=7GTpwgsmHgU">Patterns for high-performance C#.</a> by <a href="https://twitter.com/federicolois">Federico Andres Lois</a></li>
<li><a href="https://blogs.msdn.microsoft.com/ricom/2005/05/10/performance-quiz-6-chineseenglish-dictionary-reader/">Performance Quiz #6 — Chinese/English Dictionary reader</a> (From 2005, 2 Microsoft bloggers have a ‘performance’ battle, C++ v. C#)</li>
<li><a href="https://blogs.msdn.microsoft.com/ricom/2005/05/20/performance-quiz-6-conclusion-studying-the-space/">Performance Quiz #6 — Conclusion, Studying the Space</a></li>
<li><a href="https://stackoverflow.com/a/138406">How much faster is C++ than C#?</a></li>
<li><a href="https://blogs.msdn.microsoft.com/jonathanh/2005/05/20/optimizing-managed-c-vs-native-c-code/">Optimizing managed C# vs. native C++ code</a> (2005)</li>
</ul>
<p>The Unity ‘Burst’ Compiler:</p>
<ul>
<li><a href="https://blogs.unity3d.com/2019/02/26/on-dots-c-c/">How Unity is making (a subset of) C# as fast as C++</a></li>
<li><a href="http://infalliblecode.com/unity-burst-compiler/">Unity Burst Compiler: Performance Optimization Made Easy</a></li>
<li><a href="http://aras-p.info/blog/2018/03/28/Daily-Pathtracer-Part-3-CSharp-Unity-Burst/">Daily Pathtracer Part 3: C# &amp; Unity &amp; Burst</a></li>
<li><a href="https://lucasmeijer.com/posts/cpp_unity/">C++, C# and Unity</a></li>
<li><a href="https://www.youtube.com/watch?v=QkM6zEGFhDY">Deep Dive into the Burst Compiler - Unite LA</a></li>
</ul>
"Stack Walking" in the .NET Runtime2019-01-21T00:00:00+00:00http://www.mattwarren.org/2019/01/21/Stackwalking-in-the-.NET-Runtime
<p>What is ‘stack walking’? Well, as always, the ‘Book of the Runtime’ (BotR) helps us. From the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md">relevant page</a>:</p>
<blockquote>
<p>The CLR makes heavy use of a technique known as stack walking (or stack crawling). This involves <strong>iterating the sequence of call frames for a particular thread</strong>, from the most recent (the thread’s current function) back down to the base of the stack.</p>
<p><strong>The runtime uses stack walks for a number of purposes</strong>:</p>
<ul>
<li>The runtime walks the stacks of all threads <strong>during garbage collection, looking for managed roots</strong> (local variables holding object references in the frames of managed methods that need to be reported to the GC to keep the objects alive and possibly track their movement if the GC decides to compact the heap).</li>
<li>On some platforms the stack walker is used during the <strong>processing of exceptions</strong> (looking for handlers in the first pass and unwinding the stack in the second).</li>
<li>The <strong>debugger uses the functionality</strong> when generating managed stack traces.</li>
<li>Various miscellaneous methods, usually those close to some public managed API, perform a stack walk <strong>to pick up information about their caller</strong> (such as the method, class or assembly of that caller).</li>
</ul>
</blockquote>
<p><strong>The rest of this post will explore what ‘Stack Walking’ is, how it works and why so many parts of the runtime need to be involved.</strong></p>
<hr />
<p><strong>Table of Contents</strong></p>
<ul>
<li><a href="#where-does-the-clr-use-stack-walking">Where does the CLR use ‘Stack Walking’?</a>
<ul>
<li><a href="#common-scenarios">Common Scenarios</a></li>
<li><a href="#debuggingdiagnostics">Debugging/Diagnostics</a></li>
<li><a href="#obscure-scenarios">Obscure Scenarios</a></li>
<li><a href="#stack-crawl-marks">Stack Crawl Marks</a></li>
<li><a href="#exception-handling">Exception Handling</a></li>
</ul>
</li>
<li><a href="#the-stack-walking-api">The ‘Stack Walking’ API</a>
<ul>
<li><a href="#how-to-use-it">How to use it</a></li>
<li><a href="#how-it-works">How it works</a></li>
<li><a href="#see-it-in-action">See it ‘in Action’</a></li>
</ul>
</li>
<li><a href="#unwinding-native-code">Unwinding ‘Native’ Code</a>
<ul>
<li><a href="#frames">Frames</a></li>
<li><a href="#helper-method-frames">‘Helper Method’ Frames</a></li>
<li><a href="#native-unwind-information">Native Unwind Information</a></li>
<li><a href="#differences-between-windows-and-unix">Differences between Windows and Unix</a></li>
</ul>
</li>
<li><a href="#unwinding-jitted-code">Unwinding ‘JITted’ Code</a>
<ul>
<li><a href="#help-from-the-jit-compiler">Help from the ‘JIT Compiler’</a></li>
</ul>
</li>
<li><a href="#further-reading">Further Reading</a>
<ul>
<li><a href="#stack-unwinding-general">Stack Unwinding (general)</a></li>
<li><a href="#stack-unwinding-other-runtimes">Stack Unwinding (other runtimes)</a></li>
</ul>
</li>
</ul>
<hr />
<h2 id="where-does-the-clr-use-stack-walking">Where does the CLR use ‘Stack Walking’?</h2>
<p>Before we dig into the ‘internals’, let’s take a look at where the runtime utilises ‘stack walking’. Below is the full list (as of .NET Core CLR ‘Release 2.2’). All these examples end up calling into the <code class="language-plaintext highlighter-rouge">Thread::StackWalkFrames(..)</code> method <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L978-L1042">here</a> and provide a <code class="language-plaintext highlighter-rouge">callback</code> that is triggered whenever the API encounters a new section of the stack (see <a href="#how-to-use-it">How to use it</a> below for more info).</p>
<h3 id="common-scenarios">Common Scenarios</h3>
<ul>
<li><strong>Garbage Collection (GC)</strong>
<ul>
<li><code class="language-plaintext highlighter-rouge">ScanStackRoots(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/gcenv.ee.cpp#L71-L151">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/gcenv.ee.common.cpp#L184-L293">callback</a></li>
</ul>
</li>
<li><strong>Exception Handling</strong> (unwinding)
<ul>
<li><code class="language-plaintext highlighter-rouge">x86</code> - <code class="language-plaintext highlighter-rouge">UnwindFrames(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/excep.cpp#L2199-L2232">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/i386/excepx86.cpp#L2718-L3119">callback</a></li>
<li><code class="language-plaintext highlighter-rouge">x64</code> - <code class="language-plaintext highlighter-rouge">ResetThreadAbortState(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/excep.cpp#L12770-L12868">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/excep.cpp#L12728-L12767">callback</a></li>
</ul>
</li>
<li><strong>Exception Handling</strong> (resumption):
<ul>
<li><code class="language-plaintext highlighter-rouge">ExceptionTracker::FindNonvolatileRegisterPointers(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/exceptionhandling.cpp#L357-L436">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/exceptionhandling.cpp#L249-L354">callback</a></li>
<li><code class="language-plaintext highlighter-rouge">ExceptionTracker::RareFindParentStackFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/exceptionhandling.cpp#L6991-L7031">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/exceptionhandling.cpp#L6924-L6989">callback</a></li>
</ul>
</li>
<li><strong>Threads</strong>:
<ul>
<li><code class="language-plaintext highlighter-rouge">Thread::IsRunningIn(..)</code> (AppDomain) <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threads.cpp#L8402-L8428">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threads.cpp#L8368-L8396">callback</a></li>
<li><code class="language-plaintext highlighter-rouge">Thread::DetectHandleILStubsForDebugger(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threads.cpp#L219-L282">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threads.cpp#L205-L217">callback</a></li>
</ul>
</li>
<li><strong>Thread Suspension</strong>:
<ul>
<li><code class="language-plaintext highlighter-rouge">Thread::IsExecutingWithinCer()</code> (‘Constrained Execution Region’) <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threadsuspend.cpp#L962-L1006">here</a> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threadsuspend.cpp#L831-L960">wrapper</a> and <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threadsuspend.cpp#L672-L829">callback</a>)</li>
<li><code class="language-plaintext highlighter-rouge">Thread::HandledJITCase(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threadsuspend.cpp#L6853-L6975">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threadsuspend.cpp#L6130-L6312">callback</a>, <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threadsuspend.cpp#L6498-L6544">alternative callback</a></li>
</ul>
</li>
</ul>
<h3 id="debuggingdiagnostics">Debugging/Diagnostics</h3>
<ul>
<li><strong>Debugger</strong>
<ul>
<li><code class="language-plaintext highlighter-rouge">DebuggerWalkStack(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/debug/ee/frameinfo.cpp#L2061-L2188">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/debug/ee/frameinfo.cpp#L1367-L1874">callback</a></li>
<li><code class="language-plaintext highlighter-rouge">DebuggerWalkStackProc()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/debug/ee/frameinfo.cpp#L1367-L1874">here</a> (called from <code class="language-plaintext highlighter-rouge">DebuggerWalkStack(..)</code>) -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/debug/ee/frameinfo.cpp#L952-L1240">callback</a></li>
</ul>
</li>
<li><strong>Managed APIs</strong> (e.g. <code class="language-plaintext highlighter-rouge">System.Diagnostics.StackTrace</code>)
<ul>
<li>Managed code calls via an <code class="language-plaintext highlighter-rouge">InternalCall</code> (C#) <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/mscorlib/src/System/Diagnostics/Stacktrace.cs#L317-L318">here</a> into <code class="language-plaintext highlighter-rouge">DebugStackTrace::GetStackFramesInternal(..)</code> (C++) <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/debugdebugger.cpp#L327-L800">here</a></li>
<li>Before ending up in <code class="language-plaintext highlighter-rouge">DebugStackTrace::GetStackFramesHelper(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/debugdebugger.cpp#L852-L956">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/debugdebugger.cpp#L976-L1060">callback</a></li>
</ul>
</li>
<li><strong>DAC (via SOS)</strong> - Scan for GC ‘Roots’
<ul>
<li><code class="language-plaintext highlighter-rouge">DacStackReferenceWalker::WalkStack&lt;..&gt;(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/debug/daccess/dacimpl.h#L1973-L2022">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/debug/daccess/daccess.cpp#L8466-L8638">callback</a></li>
</ul>
</li>
<li><strong>Profiling API</strong>
<ul>
<li><code class="language-plaintext highlighter-rouge">ProfToEEInterfaceImpl::ProfilerStackWalkFramesWrapper(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/proftoeeinterfaceimpl.cpp#L7624-L7652">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/proftoeeinterfaceimpl.cpp#L7177-L7286">callback</a></li>
</ul>
</li>
<li><strong>Event Pipe</strong> (Diagnostics)
<ul>
<li><code class="language-plaintext highlighter-rouge">EventPipe::WalkManagedStackForThread(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eventpipe.cpp#L971-L994">here</a> -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eventpipe.cpp#L996-L1029">callback</a></li>
</ul>
</li>
<li><strong>CLR prints a Stack Trace</strong> (to the console/log, DEBUG builds only)
<ul>
<li><code class="language-plaintext highlighter-rouge">PrintStackTrace()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/debughelp.cpp#L1015-L1109">here</a> (and other functions) -> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/debughelp.cpp#L881-L1013">callback</a></li>
</ul>
</li>
</ul>
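<p>The managed <code class="language-plaintext highlighter-rouge">System.Diagnostics.StackTrace</code> API mentioned above is probably the easiest way to trigger a stack walk from your own code. A minimal sketch (the method names are arbitrary, and <code class="language-plaintext highlighter-rouge">NoInlining</code> is only there so that the JIT keeps each method as its own frame):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

class Program
{
    static void Main() =&gt; Outer();

    // NoInlining keeps each method as a separate frame for the stack walk to find.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Outer() =&gt; Inner();

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Inner()
    {
        // Capturing a StackTrace asks the runtime to walk the current thread's stack.
        var trace = new StackTrace();
        foreach (StackFrame frame in trace.GetFrames())
        {
            // Print the method that each frame belongs to, most recent first.
            Console.WriteLine(frame.GetMethod()?.Name);
        }
    }
}
</code></pre></div></div>
<p>On a typical run this lists <code class="language-plaintext highlighter-rouge">Inner</code>, <code class="language-plaintext highlighter-rouge">Outer</code> and <code class="language-plaintext highlighter-rouge">Main</code>, although the exact frames can vary with the runtime and build configuration.</p>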
<h3 id="obscure-scenarios">Obscure Scenarios</h3>
<ul>
<li><strong>Reflection</strong>
<ul>
<li><code class="language-plaintext highlighter-rouge">RuntimeMethodHandle::GetCurrentMethod(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/reflectioninvocation.cpp#L1487-L1511">here</a> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/reflectioninvocation.cpp#L1449-L1485">callback</a>)</li>
</ul>
</li>
<li><strong>Application (App) Domains</strong> (See ‘Stack Crawl Marks’ below)
<ul>
<li><code class="language-plaintext highlighter-rouge">SystemDomain::GetCallersMethod(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/appdomain.cpp#L3389-L3417">here</a> (also <code class="language-plaintext highlighter-rouge">GetCallersType(..)</code> and <code class="language-plaintext highlighter-rouge">GetCallersModule(..)</code>) (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/appdomain.cpp#L3520-L3664">callback</a>)</li>
<li><code class="language-plaintext highlighter-rouge">SystemDomain::GetCallersModule(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/appdomain.cpp#L3494-L3518">here</a> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/appdomain.cpp#L3666-L3686">callback</a>)</li>
</ul>
</li>
<li><strong>‘Code Pitching’</strong>
<ul>
<li><code class="language-plaintext highlighter-rouge">CheckStacksAndPitch()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codepitchingmanager.cpp#L446-L501">here</a> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codepitchingmanager.cpp#L340-L347">wrapper</a> and <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codepitchingmanager.cpp#L304-L338">callback</a>)</li>
</ul>
</li>
<li><strong>Extensible Class Factory</strong> (<code class="language-plaintext highlighter-rouge">System.Runtime.InteropServices.ExtensibleClassFactory</code>)
<ul>
<li><code class="language-plaintext highlighter-rouge">RegisterObjectCreationCallback(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/extensibleclassfactory.cpp#L72-L130">here</a> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/extensibleclassfactory.cpp#L23-L69">callback</a>)</li>
</ul>
</li>
<li><strong>Stack Sampler</strong> (unused?)
<ul>
<li><code class="language-plaintext highlighter-rouge">StackSampler::ThreadProc()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stacksampler.cpp#L264-L331">here</a> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stacksampler.cpp#L217-L224">wrapper</a> and <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stacksampler.cpp#L226-L262">callback</a>)</li>
</ul>
</li>
</ul>
<h3 id="stack-crawl-marks">Stack Crawl Marks</h3>
<p>One of the above scenarios deserves a closer look, but first, why are ‘stack crawl marks’ used at all? From <a href="https://github.com/dotnet/coreclr/issues/21629#issuecomment-449225852">coreclr/issues/#21629 (comment)</a>:</p>
<blockquote>
<p>Unfortunately, there is a ton of legacy APIs that were added during netstandard2.0 push whose behavior depend on the caller. <strong>The caller is basically passed in as an implicit argument to the API</strong>. Most of these StackCrawlMarks are there to support these APIs…</p>
</blockquote>
<p>So we can see that multiple functions within the CLR itself need to have knowledge of their <strong>caller</strong>. To understand this some more, let’s look at an example, the <code class="language-plaintext highlighter-rouge">GetType(string typeName)</code> <a href="https://docs.microsoft.com/en-us/dotnet/api/system.type.gettype?view=netframework-4.7.2#System_Type_GetType_System_String_">method</a>. Here’s the flow from the externally-visible method all the way down to where the work is done. Note how a <code class="language-plaintext highlighter-rouge">StackCrawlMark</code> instance is passed through:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">Type::GetType(string typeName)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/System.Private.CoreLib/src/System/Type.CoreCLR.cs#L38-L43">implementation</a> (Creates <code class="language-plaintext highlighter-rouge">StackCrawlMark.LookForMyCaller</code>)</li>
<li><code class="language-plaintext highlighter-rouge">RuntimeType::GetType(.., ref StackCrawlMark stackMark)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/System.Private.CoreLib/src/System/RtType.cs#L1741-L1749">implementation</a></li>
<li><code class="language-plaintext highlighter-rouge">RuntimeTypeHandle::GetTypeByName(.., ref StackCrawlMark stackMark, ..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/System.Private.CoreLib/src/System/RuntimeHandles.cs#L431-L459">implementation</a></li>
<li><code class="language-plaintext highlighter-rouge">extern void GetTypeByName(.., ref StackCrawlMark stackMark, ..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/System.Private.CoreLib/src/System/RuntimeHandles.cs#L426-L429">definition</a> (call into native code, i.e. <code class="language-plaintext highlighter-rouge">[DllImport(JitHelpers.QCall, ..)]</code>)</li>
<li><code class="language-plaintext highlighter-rouge">RuntimeTypeHandle::GetTypeByName(.., QCall::StackCrawlMarkHandle pStackMark, ..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/runtimehandles.cpp#L1433-L1463">implementation</a></li>
<li><code class="language-plaintext highlighter-rouge">TypeHandle TypeName::GetTypeManaged(.., StackCrawlMark* pStackMark, ..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/typeparse.cpp#L1178-L1271">implementation</a></li>
<li><code class="language-plaintext highlighter-rouge">TypeHandle TypeName::GetTypeWorker(.. , StackCrawlMark* pStackMark, ..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/typeparse.cpp#L1405-L1662">implementation</a></li>
<li><code class="language-plaintext highlighter-rouge">SystemDomain::GetCallersAssembly(StackCrawlMark *stackMark,..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/appdomain.cpp#L3430-L3438">implementation</a></li>
<li><code class="language-plaintext highlighter-rouge">SystemDomain::GetCallersModule(StackCrawlMark* stackMark, ..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/appdomain.cpp#L3394-L3421">implementation</a></li>
<li><code class="language-plaintext highlighter-rouge">SystemDomain::CallersMethodCallbackWithStackMark(..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/appdomain.cpp#L3467-L3610">callback implementation</a></li>
</ul>
<p>In addition the JIT (via the VM) has to ensure that all relevant methods are available in the call-stack, i.e. they can’t be removed:</p>
<ul>
<li>Prevent in-lining <code class="language-plaintext highlighter-rouge">CEEInfo::canInline(..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/jitinterface.cpp#L7847-L7854">implementation</a></li>
<li>Prevent removal via a ‘tail call’ <code class="language-plaintext highlighter-rouge">CEEInfo::canTailCall(..)</code> <a href="https://github.com/dotnet/coreclr/blob/606c246/src/vm/jitinterface.cpp#L8321-L8332">implementation</a></li>
</ul>
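<p>To make the idea of a ‘stack crawl mark’ a bit more concrete, here is a minimal, self-contained sketch. It is <em>not</em> the CLR implementation: <code class="language-plaintext highlighter-rouge">SimFrame</code> and <code class="language-plaintext highlighter-rouge">FindCallerOfMark</code> are hypothetical names, and the ‘stack’ is simulated as a list of address ranges. It only illustrates the ‘look for my caller’ behaviour, assuming the mark is a local variable that lives in the frame of the public API method: skip frames until the one that owns the mark is found, then report the next frame up.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a stack frame: the method name plus the
// range of stack addresses that holds its locals.
struct SimFrame {
    std::string method;
    std::uintptr_t lo;  // inclusive
    std::uintptr_t hi;  // exclusive
};

// 'frames' is ordered innermost (most recent call) first, the same order a
// stack walk visits them. Returns the method that called the frame owning
// the mark, or an empty string if the mark (or its caller) was not found.
std::string FindCallerOfMark(const std::vector<SimFrame>& frames,
                             std::uintptr_t markAddress) {
    bool foundMark = false;
    for (const SimFrame& frame : frames) {
        assert(frame.lo < frame.hi);
        if (foundMark) {
            // First frame visited after the marked one, i.e. its caller.
            return frame.method;
        }
        if (markAddress >= frame.lo && markAddress < frame.hi) {
            foundMark = true;  // this frame declared the mark
        }
    }
    return "";
}
```

<p>Under this model it’s easy to see why in-lining and tail calls are a problem: if the method that declares the mark were merged into its caller, the mark would sit in the caller’s frame and the walk would report the wrong method.</p>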
<p>However, the <code class="language-plaintext highlighter-rouge">StackCrawlMark</code> feature is currently being <em>cleaned</em> up, so it may look different in the future:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/pull/9342">Remove NoInlining/StackCrawlMarks from Tasks</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/21812">Remove stack marks from GetSatelliteAssembly</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/21054">Delete unnecessary StackCrawlMarks in RtFieldInfo</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/21783">Avoid passing stack crawl mark unnecessarily deep in the call stack</a> (the example shown above!!)</li>
</ul>
<h3 id="exception-handling">Exception Handling</h3>
<p>The place that most .NET Developers will run into ‘stack traces’ is when dealing with exceptions. I originally intended to also describe ‘exception handling’ here, but then I opened up <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/exceptionhandling.cpp">/src/vm/exceptionhandling.cpp</a> and saw that it contained <strong>over 7,000</strong> lines of code!! So I decided that it can wait for a future post 😁.</p>
<p>However, if you want to learn more about the ‘internals’ I really recommend Chris Brumme’s post <a href="https://blogs.msdn.microsoft.com/cbrumme/2003/10/01/the-exception-model/">The Exception Model</a> (2003) which is the definitive guide on the topic (also see his <a href="https://channel9.msdn.com/Search?term=Christopher%20Brumme&lang-en=true">Channel9 Videos</a>) and as always, the ‘BotR’ chapter <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/exceptions.md">‘What Every (<em>Runtime</em>) Dev needs to Know About Exceptions in the Runtime’</a> is well worth a read.</p>
<p>Also, I recommend taking a look at the slides from the <a href="https://blog.adamfurmanek.pl/wp-content/uploads/2018/06/Internals_of_exceptions.pdf">‘Internals of Exceptions’ talk</a> and the related post <a href="https://blog.adamfurmanek.pl/blog/2016/10/01/handling-and-rethrowing-exceptions-in-c/">.NET Inside Out Part 2 — Handling and rethrowing exceptions in C#</a> both by <a href="https://twitter.com/furmanekadam">Adam Furmanek</a>.</p>
<hr />
<h2 id="the-stack-walking-api">The ‘Stack Walking’ API</h2>
<p>Now that we’ve seen <em>where</em> it’s used, let’s look at the ‘stack walking’ API itself. Firstly, <em>how</em> is it used?</p>
<h3 id="how-to-use-it">How to use it</h3>
<p>It’s worth pointing out that the only way you can access it from C#/F#/VB.NET code is via the <code class="language-plaintext highlighter-rouge">StackTrace</code> <a href="https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.stacktrace?view=netframework-4.7.2">class</a>; only the runtime itself can call into <code class="language-plaintext highlighter-rouge">Thread::StackWalkFrames(..)</code> directly. The simplest usage in the runtime is <code class="language-plaintext highlighter-rouge">EventPipe::WalkManagedStackForThread(..)</code> (see <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eventpipe.cpp#L971-L994">here</a>), which is shown below. As you can see, it’s as simple as specifying the relevant flags (in this case <code class="language-plaintext highlighter-rouge">ALLOW_ASYNC_STACK_WALK | FUNCTIONSONLY | HANDLESKIPPEDFRAMES | ALLOW_INVALID_OBJECTS</code>) and then providing the callback, which in the <code class="language-plaintext highlighter-rouge">EventPipe</code> class is the <code class="language-plaintext highlighter-rouge">StackWalkCallback</code> method (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eventpipe.cpp#L996-L102">here</a>).</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="n">EventPipe</span><span class="o">::</span><span class="n">WalkManagedStackForThread</span><span class="p">(</span><span class="n">Thread</span> <span class="o">*</span><span class="n">pThread</span><span class="p">,</span> <span class="n">StackContents</span> <span class="o">&</span><span class="n">stackContents</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CONTRACTL</span>
<span class="p">{</span>
<span class="n">NOTHROW</span><span class="p">;</span>
<span class="n">GC_NOTRIGGER</span><span class="p">;</span>
<span class="n">MODE_ANY</span><span class="p">;</span>
<span class="n">PRECONDITION</span><span class="p">(</span><span class="n">pThread</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">CONTRACTL_END</span><span class="p">;</span>
<span class="c1">// Calling into StackWalkFrames in preemptive mode violates the host contract,</span>
<span class="c1">// but this contract is not used on CoreCLR.</span>
<span class="n">CONTRACT_VIOLATION</span><span class="p">(</span> <span class="n">HostViolation</span> <span class="p">);</span>
<span class="n">stackContents</span><span class="p">.</span><span class="n">Reset</span><span class="p">();</span>
<span class="n">StackWalkAction</span> <span class="n">swaRet</span> <span class="o">=</span> <span class="n">pThread</span><span class="o">-></span><span class="n">StackWalkFrames</span><span class="p">(</span>
<span class="p">(</span><span class="n">PSTACKWALKFRAMESCALLBACK</span><span class="p">)</span> <span class="o">&</span><span class="n">StackWalkCallback</span><span class="p">,</span>
<span class="o">&</span><span class="n">stackContents</span><span class="p">,</span>
<span class="n">ALLOW_ASYNC_STACK_WALK</span> <span class="o">|</span> <span class="n">FUNCTIONSONLY</span> <span class="o">|</span> <span class="n">HANDLESKIPPEDFRAMES</span> <span class="o">|</span> <span class="n">ALLOW_INVALID_OBJECTS</span><span class="p">);</span>
<span class="k">return</span> <span class="p">((</span><span class="n">swaRet</span> <span class="o">==</span> <span class="n">SWA_DONE</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="n">swaRet</span> <span class="o">==</span> <span class="n">SWA_CONTINUE</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">StackWalkFrames(..)</code> function then does the <em>heavy-lifting</em> of actually walking the stack, triggering the callback shown below for each frame it finds. In this case the callback just records the ‘Instruction Pointer’ (IP/PC) and the ‘managed function’, which is an instance of the <code class="language-plaintext highlighter-rouge">MethodDesc</code> obtained via the <code class="language-plaintext highlighter-rouge">pCf->GetFunction()</code> call:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">StackWalkAction</span> <span class="n">EventPipe</span><span class="o">::</span><span class="n">StackWalkCallback</span><span class="p">(</span><span class="n">CrawlFrame</span> <span class="o">*</span><span class="n">pCf</span><span class="p">,</span> <span class="n">StackContents</span> <span class="o">*</span><span class="n">pData</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CONTRACTL</span>
<span class="p">{</span>
<span class="n">NOTHROW</span><span class="p">;</span>
<span class="n">GC_NOTRIGGER</span><span class="p">;</span>
<span class="n">MODE_ANY</span><span class="p">;</span>
<span class="n">PRECONDITION</span><span class="p">(</span><span class="n">pCf</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="n">PRECONDITION</span><span class="p">(</span><span class="n">pData</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">CONTRACTL_END</span><span class="p">;</span>
<span class="c1">// Get the IP.</span>
<span class="n">UINT_PTR</span> <span class="n">controlPC</span> <span class="o">=</span> <span class="p">(</span><span class="n">UINT_PTR</span><span class="p">)</span><span class="n">pCf</span><span class="o">-></span><span class="n">GetRegisterSet</span><span class="p">()</span><span class="o">-></span><span class="n">ControlPC</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">controlPC</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pData</span><span class="o">-></span><span class="n">GetLength</span><span class="p">()</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// This happens for pinvoke stubs on the top of the stack.</span>
<span class="k">return</span> <span class="n">SWA_CONTINUE</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">_ASSERTE</span><span class="p">(</span><span class="n">controlPC</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">);</span>
<span class="c1">// Add the IP to the captured stack.</span>
<span class="n">pData</span><span class="o">-></span><span class="n">Append</span><span class="p">(</span><span class="n">controlPC</span><span class="p">,</span> <span class="n">pCf</span><span class="o">-></span><span class="n">GetFunction</span><span class="p">());</span>
<span class="c1">// Continue the stack walk.</span>
<span class="k">return</span> <span class="n">SWA_CONTINUE</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
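<p>The general shape of this ‘walker plus callback’ pattern can be sketched outside of the runtime. The code below is a simplified stand-in, not the CLR API: <code class="language-plaintext highlighter-rouge">WalkAction</code>, <code class="language-plaintext highlighter-rouge">SimCrawlFrame</code>, <code class="language-plaintext highlighter-rouge">CapturedStack</code>, <code class="language-plaintext highlighter-rouge">WalkFrames</code> and <code class="language-plaintext highlighter-rouge">RecordFrame</code> are all hypothetical names, and the frames come from a pre-built list rather than from real unwinding. It assumes, as in the EventPipe example, that a zero IP is only tolerated before anything has been captured.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the runtime types used in the example above.
enum class WalkAction { Continue, Abort };

struct SimCrawlFrame {
    std::uintptr_t ip;   // instruction pointer for this frame
    std::string method;  // name of the method, standing in for a MethodDesc
};

struct CapturedStack {
    std::vector<std::uintptr_t> ips;
    std::vector<std::string> methods;
};

using WalkCallback = WalkAction (*)(const SimCrawlFrame* frame, void* data);

// The 'walker': visits each frame (innermost first) and lets the callback
// decide whether the walk should carry on.
WalkAction WalkFrames(const std::vector<SimCrawlFrame>& frames,
                      WalkCallback callback, void* data) {
    assert(callback != nullptr);
    for (const SimCrawlFrame& frame : frames) {
        if (callback(&frame, data) == WalkAction::Abort) {
            return WalkAction::Abort;
        }
    }
    return WalkAction::Continue;
}

// The 'callback': records the IP and method of every frame it is given.
WalkAction RecordFrame(const SimCrawlFrame* frame, void* data) {
    assert(frame != nullptr && data != nullptr);
    auto* captured = static_cast<CapturedStack*>(data);
    if (frame->ip == 0) {
        // Skip a zero IP at the very top of the stack; anywhere else this
        // sketch gives up (the runtime code asserts instead).
        return captured->ips.empty() ? WalkAction::Continue : WalkAction::Abort;
    }
    captured->ips.push_back(frame->ip);
    captured->methods.push_back(frame->method);
    return WalkAction::Continue;
}
```

<p>The untyped <code class="language-plaintext highlighter-rouge">void*</code> ‘data’ argument plays the same role as the <code class="language-plaintext highlighter-rouge">&stackContents</code> argument in the runtime code: the walker doesn’t need to know what the callback is collecting.</p>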
<h3 id="how-it-works">How it works</h3>
<p>Now onto the most interesting part, how the runtime actually walks the stack. Well, first let’s understand what the stack looks like, from the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md">‘BotR’ page</a>:</p>
<p><img src="/images/2019/01/Stack Description from BotR.png" alt="Stack Description from BotR" /></p>
<p>The main thing to note is that a .NET ‘stack’ can contain 3 types of methods:</p>
<ol>
<li><strong>Managed</strong> - this represents code that started off as C#/F#/VB.NET, was turned into IL and then finally compiled to native code by the ‘JIT Compiler’.</li>
<li><strong>Unmanaged</strong> - completely <em>native</em> code that exists outside of the runtime, i.e. an OS function the runtime calls into or a user call via <code class="language-plaintext highlighter-rouge">P/Invoke</code>. The runtime <em>only</em> cares about transitions <em>into</em> or <em>out of</em> regular unmanaged code, it doesn’t care about the stack frames within it.</li>
<li><strong>Runtime Managed</strong> - still <em>native</em> code, but this is slightly different because the runtime cares more about this code. For example there are quite a few parts of the Base-Class libraries that make use of <code class="language-plaintext highlighter-rouge">InternalCall</code> methods, for more on this see the <a href="#helper-method-frames">‘Helper Method’ Frames</a> section later on.</li>
</ol>
<p>So the ‘stack walk’ has to deal with these different scenarios as it proceeds. Now let’s look at the ‘code flow’ starting with the entry-point method <code class="language-plaintext highlighter-rouge">StackWalkFrames(..)</code>:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">Thread::StackWalkFrames(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L978-L1042">here</a>
<ul>
<li>the entry-point function, the type of ‘stack walk’ can be controlled via <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/threads.h#L3302-L3361">these flags</a></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">Thread::StackWalkFramesEx(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L899-L976">here</a>
<ul>
<li>worker-function that sets up the <code class="language-plaintext highlighter-rouge">StackFrameIterator</code>, via a call to <code class="language-plaintext highlighter-rouge">StackFrameIterator::Init(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L1150-L1274">here</a></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">StackFrameIterator::Next()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L1586-L1621">here</a>, then hands off to the primary <em>worker</em> method <code class="language-plaintext highlighter-rouge">StackFrameIterator::NextRaw()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L2291-L2761">here</a> that does 5 things:
<ol>
<li><code class="language-plaintext highlighter-rouge">CheckForSkippedFrames(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L3009-L3119">here</a>, deals with frames that may have been allocated inside a managed stack frame (e.g. an inlined p/invoke call).</li>
<li><code class="language-plaintext highlighter-rouge">UnwindStackFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eetwain.cpp#L4162-L4214">here</a>, in-turn calls:
<ul>
<li><strong><code class="language-plaintext highlighter-rouge">x64</code></strong> - <code class="language-plaintext highlighter-rouge">Thread::VirtualUnwindCallFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L553-L671">here</a>, then calls <code class="language-plaintext highlighter-rouge">VirtualUnwindNonLeafCallFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L711-L757">here</a> or <code class="language-plaintext highlighter-rouge">VirtualUnwindLeafCallFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L676-L708">here</a>. All of these functions make use of the <a href="https://docs.microsoft.com/en-us/windows/desktop/api/winnt/nf-winnt-rtllookupfunctionentry">Windows API function</a> <code class="language-plaintext highlighter-rouge">RtlLookupFunctionEntry(..)</code> to do the actual unwinding.</li>
<li><strong><code class="language-plaintext highlighter-rouge">x86</code></strong> - <code class="language-plaintext highlighter-rouge">::UnwindStackFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eetwain.cpp#L4012-L4107">here</a>, in turn calls <code class="language-plaintext highlighter-rouge">UnwindEpilog(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eetwain.cpp#L3528-L3557">here</a> and <code class="language-plaintext highlighter-rouge">UnwindEspFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/eetwain.cpp#L3663-L3721">here</a>. Unlike <code class="language-plaintext highlighter-rouge">x64</code>, under <code class="language-plaintext highlighter-rouge">x86</code> all the ‘stack-unwinding’ is done manually, within the CLR code.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">PostProcessingForManagedFrames(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L3193-L3229">here</a>, determines if the stack-walk is actually within a <strong>managed method</strong> rather than a <strong>native frame</strong>.</li>
<li><code class="language-plaintext highlighter-rouge">ProcessIp(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L2786-L2800">here</a> has the job of looking up the current <strong>managed method</strong> (if any) based on the current <strong>instruction pointer</strong> (IP). It does this by calling into <code class="language-plaintext highlighter-rouge">EECodeInfo::Init(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/jitinterface.cpp#L13948-L13976">here</a> and then ends up in one of:
<ul>
<li><code class="language-plaintext highlighter-rouge">EEJitManager::JitCodeToMethodInfo(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codeman.cpp#L3631-L3676">here</a>, that uses a very cool looking data structure referred to as a <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/inc/nibblemapmacros.h#L12-L26">‘nibble map’</a></li>
<li><code class="language-plaintext highlighter-rouge">NativeImageJitManager::JitCodeToMethodInfo(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codeman.cpp#L5428-L5616">here</a></li>
<li><code class="language-plaintext highlighter-rouge">ReadyToRunJitManager::JitCodeToMethodInfo(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codeman.cpp#L6875-L6953">here</a></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ProcessCurrentFrame(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L2802-L3007">here</a>, does some final house-keeping and tidy-up.</li>
</ol>
</li>
<li><code class="language-plaintext highlighter-rouge">CrawlFrame::GotoNextFrame()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L369-L390">here</a>
<ul>
<li>in-turn calls <code class="language-plaintext highlighter-rouge">pFrame->Next()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/frames.h#L836-L840">here</a> to walk through the ‘linked list’ of frames which drive the ‘stack walk’ (more on these ‘frames’ later)</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">StackFrameIterator::Filter()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L1623-L2289">here</a>
<ul>
<li>essentially a <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L1677-L2271">huge <code class="language-plaintext highlighter-rouge">switch</code> statement</a> that handles all the different <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.h#L602-L613">Frame States</a> and decides whether or not the ‘stack walk’ should continue.</li>
</ul>
</li>
</ul>
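<p>The ‘nibble map’ mentioned in step 4 is worth a small illustration. The sketch below is a heavily simplified model based on the description in the linked header comment, not the runtime’s code: <code class="language-plaintext highlighter-rouge">SimpleNibbleMap</code> and its members are hypothetical names, and it stores one value per byte rather than packing eight nibbles into a DWORD. The assumptions are that each entry covers a 32-byte ‘bucket’ of the code heap, that method starts are 4-byte aligned, that at most one method starts in a bucket, and that zero means ‘no method starts here’ (so the stored value is the offset in DWORDs plus one).</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of a 'nibble map': answers "which method start precedes
// this instruction pointer?" by scanning backwards through the buckets.
class SimpleNibbleMap {
public:
    static constexpr std::size_t kBucketSize = 32;  // bytes of code per entry
    static constexpr std::size_t kAlign = 4;        // method start alignment

    explicit SimpleNibbleMap(std::size_t heapSize)
        : nibbles_((heapSize + kBucketSize - 1) / kBucketSize, 0) {}

    // Record that a method starts at 'offset' (relative to the heap base).
    void SetMethodStart(std::size_t offset) {
        assert(offset % kAlign == 0);
        const std::size_t bucket = offset / kBucketSize;
        assert(bucket < nibbles_.size());
        assert(nibbles_[bucket] == 0);  // only one start per bucket
        // Values 1..8 fit in 4 bits; 0 is reserved for 'no start'.
        nibbles_[bucket] =
            static_cast<std::uint8_t>((offset % kBucketSize) / kAlign + 1);
    }

    // Find the closest method start at or before 'pc'. Returns false if no
    // method starts at or before that offset.
    bool FindMethodStart(std::size_t pc, std::size_t* start) const {
        assert(start != nullptr);
        std::size_t bucket = pc / kBucketSize;
        if (bucket >= nibbles_.size()) return false;
        for (;;) {
            const std::uint8_t nibble = nibbles_[bucket];
            if (nibble != 0) {
                const std::size_t candidate =
                    bucket * kBucketSize + (nibble - 1) * kAlign;
                // A start later in pc's own bucket doesn't count.
                if (candidate <= pc) {
                    *start = candidate;
                    return true;
                }
            }
            if (bucket == 0) return false;
            --bucket;
        }
    }

private:
    std::vector<std::uint8_t> nibbles_;
};
```

<p>Note that this sketch knows nothing about where a method <em>ends</em>, so it only finds a candidate start; checking that the IP really falls inside that method is left out here.</p>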
<p>When it gets a valid frame it triggers the callback in <code class="language-plaintext highlighter-rouge">Thread::MakeStackwalkerCallback(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L859-L891">here</a> and passes in a pointer to the current <code class="language-plaintext highlighter-rouge">CrawlFrame</code> class <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.h#L68-L496">defined here</a>, which exposes methods such as <code class="language-plaintext highlighter-rouge">IsFrameless()</code>, <code class="language-plaintext highlighter-rouge">GetFunction()</code> and <code class="language-plaintext highlighter-rouge">GetThisPointer()</code>. The <code class="language-plaintext highlighter-rouge">CrawlFrame</code> actually represents 2 scenarios, based on the current IP:</p>
<ul>
<li><strong>Native</strong> code, represented by a <code class="language-plaintext highlighter-rouge">Frame</code> class <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/frames.h#L378-L284">defined here</a>, which we’ll discuss more in a moment.</li>
<li><strong>Managed</strong> code, well technically ‘managed code’ that was JITted to ‘native code’, so more accurately a <strong>managed stack frame</strong>. In this situation the <code class="language-plaintext highlighter-rouge">MethodDesc</code> class <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/method.hpp#L187-L1879">defined here</a> is provided, you can read more about this key CLR data-structure in <a href="https://github.com/dotnet/coreclr/blob/release/2.2/Documentation/botr/method-descriptor.md">the corresponding BotR chapter</a>.</li>
</ul>
<h3 id="see-it-in-action">See it ‘in Action’</h3>
<p>Fortunately we’re able to turn on some nice diagnostics in a debug build of the CLR (<code class="language-plaintext highlighter-rouge">COMPLUS_LogEnable</code>, <code class="language-plaintext highlighter-rouge">COMPLUS_LogToFile</code> & <code class="language-plaintext highlighter-rouge">COMPLUS_LogFacility</code>). With that in place, given C# code like this:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">System.Diagnostics</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">System.Runtime.CompilerServices</span><span class="p">;</span>
<span class="k">internal</span> <span class="k">class</span> <span class="nc">Program</span> <span class="p">{</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">()</span> <span class="p">{</span>
<span class="nf">MethodA</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">NoInlining</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">MethodA</span><span class="p">()</span> <span class="p">{</span>
<span class="nf">MethodB</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">NoInlining</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">MethodB</span><span class="p">()</span> <span class="p">{</span>
<span class="nf">MethodC</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">NoInlining</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">MethodC</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">var</span> <span class="n">stackTrace</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">StackTrace</span><span class="p">(</span><span class="n">fNeedFileInfo</span><span class="p">:</span> <span class="k">true</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">stackTrace</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We get the output shown below, in which you can see the ‘stack walking’ process. It starts in <code class="language-plaintext highlighter-rouge">InitializeSourceInfo</code> and <code class="language-plaintext highlighter-rouge">CaptureStackTrace</code> which are methods internal to the <code class="language-plaintext highlighter-rouge">StackTrace</code> class (see <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/mscorlib/src/System/Diagnostics/Stacktrace.cs#L351-L407">here</a>), before moving up the stack <code class="language-plaintext highlighter-rouge">MethodC</code> -> <code class="language-plaintext highlighter-rouge">MethodB</code> -> <code class="language-plaintext highlighter-rouge">MethodA</code> and then finally stopping in the <code class="language-plaintext highlighter-rouge">Main</code> function. Along the way it does a ‘FILTER’ and ‘CONSIDER’ step before actually unwinding (‘finished unwind for …’):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>TID 4740: STACKWALK starting with partial context
TID 4740: STACKWALK: [000] FILTER : EXPLICIT : PC= 00000000`00000000 SP= 00000000`00000000 Frame= 00000002`9977cc48 vtbl= 00007ffd`74a105b0
TID 4740: STACKWALK: [001] CONSIDER: EXPLICIT : PC= 00000000`00000000 SP= 00000000`00000000 Frame= 00000002`9977cc48 vtbl= 00007ffd`74a105b0
TID 4740: STACKWALK: [001] FILTER : EXPLICIT : PC= 00000000`00000000 SP= 00000000`00000000 Frame= 00000002`9977cc48 vtbl= 00007ffd`74a105b0
TID 4740: STACKWALK: [002] CONSIDER: EXPLICIT : PC= 00000000`00000000 SP= 00000000`00000000 Frame= 00000002`9977cdd8 vtbl= 00007ffd`74995220
TID 4740: STACKWALK LazyMachState::unwindLazyState(ip:00007FFD7439C45C,sp:000000029977C338)
TID 4740: STACKWALK: [002] CALLBACK: EXPLICIT : PC= 00000000`00000000 SP= 00000000`00000000 Frame= 00000002`9977cdd8 vtbl= 00007ffd`74995220
TID 4740: STACKWALK HelperMethodFrame::UpdateRegDisplay cached ip:00007FFD72FE9258, sp:000000029977D300
TID 4740: STACKWALK: [003] CONSIDER: FRAMELESS: PC= 00007ffd`72fe9258 SP= 00000002`9977d300 method=InitializeSourceInfo
TID 4740: STACKWALK: [003] CALLBACK: FRAMELESS: PC= 00007ffd`72fe9258 SP= 00000002`9977d300 method=InitializeSourceInfo
TID 4740: STACKWALK: [004] about to unwind for 'InitializeSourceInfo', SP: 00000002`9977d300 , IP: 00007ffd`72fe9258
TID 4740: STACKWALK: [004] finished unwind for 'InitializeSourceInfo', SP: 00000002`9977d480 , IP: 00007ffd`72eeb671
TID 4740: STACKWALK: [004] CONSIDER: FRAMELESS: PC= 00007ffd`72eeb671 SP= 00000002`9977d480 method=CaptureStackTrace
TID 4740: STACKWALK: [004] CALLBACK: FRAMELESS: PC= 00007ffd`72eeb671 SP= 00000002`9977d480 method=CaptureStackTrace
TID 4740: STACKWALK: [005] about to unwind for 'CaptureStackTrace', SP: 00000002`9977d480 , IP: 00007ffd`72eeb671
TID 4740: STACKWALK: [005] finished unwind for 'CaptureStackTrace', SP: 00000002`9977d5b0 , IP: 00007ffd`72eeadd0
TID 4740: STACKWALK: [005] CONSIDER: FRAMELESS: PC= 00007ffd`72eeadd0 SP= 00000002`9977d5b0 method=.ctor
TID 4740: STACKWALK: [005] CALLBACK: FRAMELESS: PC= 00007ffd`72eeadd0 SP= 00000002`9977d5b0 method=.ctor
TID 4740: STACKWALK: [006] about to unwind for '.ctor', SP: 00000002`9977d5b0 , IP: 00007ffd`72eeadd0
TID 4740: STACKWALK: [006] finished unwind for '.ctor', SP: 00000002`9977d5f0 , IP: 00007ffd`14c620d3
TID 4740: STACKWALK: [006] CONSIDER: FRAMELESS: PC= 00007ffd`14c620d3 SP= 00000002`9977d5f0 method=MethodC
TID 4740: STACKWALK: [006] CALLBACK: FRAMELESS: PC= 00007ffd`14c620d3 SP= 00000002`9977d5f0 method=MethodC
TID 4740: STACKWALK: [007] about to unwind for 'MethodC', SP: 00000002`9977d5f0 , IP: 00007ffd`14c620d3
TID 4740: STACKWALK: [007] finished unwind for 'MethodC', SP: 00000002`9977d630 , IP: 00007ffd`14c62066
TID 4740: STACKWALK: [007] CONSIDER: FRAMELESS: PC= 00007ffd`14c62066 SP= 00000002`9977d630 method=MethodB
TID 4740: STACKWALK: [007] CALLBACK: FRAMELESS: PC= 00007ffd`14c62066 SP= 00000002`9977d630 method=MethodB
TID 4740: STACKWALK: [008] about to unwind for 'MethodB', SP: 00000002`9977d630 , IP: 00007ffd`14c62066
TID 4740: STACKWALK: [008] finished unwind for 'MethodB', SP: 00000002`9977d660 , IP: 00007ffd`14c62016
TID 4740: STACKWALK: [008] CONSIDER: FRAMELESS: PC= 00007ffd`14c62016 SP= 00000002`9977d660 method=MethodA
TID 4740: STACKWALK: [008] CALLBACK: FRAMELESS: PC= 00007ffd`14c62016 SP= 00000002`9977d660 method=MethodA
TID 4740: STACKWALK: [009] about to unwind for 'MethodA', SP: 00000002`9977d660 , IP: 00007ffd`14c62016
TID 4740: STACKWALK: [009] finished unwind for 'MethodA', SP: 00000002`9977d690 , IP: 00007ffd`14c61f65
TID 4740: STACKWALK: [009] CONSIDER: FRAMELESS: PC= 00007ffd`14c61f65 SP= 00000002`9977d690 method=Main
TID 4740: STACKWALK: [009] CALLBACK: FRAMELESS: PC= 00007ffd`14c61f65 SP= 00000002`9977d690 method=Main
TID 4740: STACKWALK: [00a] about to unwind for 'Main', SP: 00000002`9977d690 , IP: 00007ffd`14c61f65
TID 4740: STACKWALK: [00a] finished unwind for 'Main', SP: 00000002`9977d6d0 , IP: 00007ffd`742f9073
TID 4740: STACKWALK: [00a] FILTER : NATIVE : PC= 00007ffd`742f9073 SP= 00000002`9977d6d0
TID 4740: STACKWALK: [00b] CONSIDER: EXPLICIT : PC= 00007ffd`742f9073 SP= 00000002`9977d6d0 Frame= 00000002`9977de58 vtbl= 00007ffd`74a105b0
TID 4740: STACKWALK: [00b] FILTER : EXPLICIT : PC= 00007ffd`742f9073 SP= 00000002`9977d6d0 Frame= 00000002`9977de58 vtbl= 00007ffd`74a105b0
TID 4740: STACKWALK: [00c] CONSIDER: EXPLICIT : PC= 00007ffd`742f9073 SP= 00000002`9977d6d0 Frame= 00000002`9977e7e0 vtbl= 00007ffd`74a105b0
TID 4740: STACKWALK: [00c] FILTER : EXPLICIT : PC= 00007ffd`742f9073 SP= 00000002`9977d6d0 Frame= 00000002`9977e7e0 vtbl= 00007ffd`74a105b0
TID 4740: STACKWALK: SWA_DONE: reached the end of the stack
</code></pre></div></div>
<p>To find out more, you can search for these diagnostic messages in <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp">\vm\stackwalk.cpp</a>, e.g. in <code class="language-plaintext highlighter-rouge">Thread::DebugLogStackWalkInfo(..)</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L802-L856">here</a>.</p>
<hr />
<h2 id="unwinding-native-code">Unwinding ‘Native’ Code</h2>
<p>As explained in <a href="https://science.raphael.poss.name/go-calling-convention-x86-64.html#aside-exceptions-in-c-c">this excellent article</a>:</p>
<blockquote>
<p>There are fundamentally two main ways to implement exception propagation in an ABI (Application Binary Interface):</p>
<ul>
<li>
<p>“dynamic registration”, <strong>with frame pointers in each activation record, organized as a linked list</strong>. This makes stack unwinding fast at the expense of having to set up the frame pointer in each function that calls other functions. This is also simpler to implement.</p>
</li>
<li>
<p>“table-driven”, <strong>where the compiler and assembler create data structures alongside the program code to indicate which addresses of code correspond to which sizes of activation records</strong>. This is called “Call Frame Information” (CFI) data in e.g. the GNU tool chain. When an exception is generated, the data in this table is loaded to determine how to unwind. This makes exception propagation slower but the general case faster.</p>
</li>
</ul>
</blockquote>
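<p>To make the two approaches quoted above more concrete, here is a minimal sketch that walks the same toy stack both ways. Everything in it is an assumption made for illustration: the stack is modelled as a vector of 8-byte slots, the code addresses are made-up numbers, and names such as <code class="language-plaintext highlighter-rouge">WalkFramePointers</code>, <code class="language-plaintext highlighter-rouge">WalkWithTable</code> and <code class="language-plaintext highlighter-rouge">UnwindEntry</code> are hypothetical, not CLR or ABI types:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cstdint&gt;
#include &lt;vector&gt;

// Hypothetical model: the stack is a vector of 8-byte slots (lower index = more
// recent), and code addresses are small made-up numbers.
using Slots = std::vector&lt;uint64_t&gt;;

// Example stack while 'B' runs, called from 'A', called from 'Main'.
Slots ExampleStack()
{
    return {
        0,      // [0] unused
        4,      // [1] B: saved frame pointer -&gt; A's frame
        0x120,  // [2] return address into A
        0xAAAA, // [3] A: a local variable
        7,      // [4] A: saved frame pointer -&gt; Main's frame
        0x210,  // [5] return address into Main
        0xBBBB, // [6] Main: a local variable
        0,      // [7] Main: saved frame pointer (0 = end of chain)
        0x900,  // [8] return address into the (untracked) caller of Main
    };
}

// "Dynamic registration": follow the linked list of saved frame pointers.
Slots WalkFramePointers(const Slots&amp; stack, uint64_t fp)
{
    Slots returnAddresses;
    while (fp != 0)
    {
        returnAddresses.push_back(stack[fp + 1]); // return address sits above the saved fp
        fp = stack[fp];                           // move to the caller's frame
    }
    return returnAddresses;
}

// "Table-driven": a side table says how many slots each function's frame uses.
struct UnwindEntry { uint64_t begin, end, frameSlots; };

std::vector&lt;UnwindEntry&gt; ExampleTable()
{
    return {
        { 0x100, 0x200, 2 }, // A:    local + saved fp
        { 0x200, 0x300, 2 }, // Main: local + saved fp
        { 0x300, 0x400, 1 }, // B:    saved fp only
    };
}

Slots WalkWithTable(const Slots&amp; stack, const std::vector&lt;UnwindEntry&gt;&amp; table,
                    uint64_t ip, uint64_t sp)
{
    Slots returnAddresses;
    for (;;)
    {
        const UnwindEntry* entry = nullptr;
        for (const UnwindEntry&amp; e : table)
            if (ip &gt;= e.begin &amp;&amp; ip &lt; e.end) { entry = &amp;e; break; }
        if (entry == nullptr) break;   // no unwind data for this address: stop

        uint64_t returnSlot = sp + entry-&gt;frameSlots;
        ip = stack[returnSlot];        // resume in the caller...
        sp = returnSlot + 1;           // ...with its stack pointer restored
        returnAddresses.push_back(ip);
    }
    return returnAddresses;
}
</code></pre></div></div>
<p>With this data, <code class="language-plaintext highlighter-rouge">WalkFramePointers(ExampleStack(), 1)</code> and <code class="language-plaintext highlighter-rouge">WalkWithTable(ExampleStack(), ExampleTable(), 0x310, 1)</code> both recover the return addresses <code class="language-plaintext highlighter-rouge">0x120</code>, <code class="language-plaintext highlighter-rouge">0x210</code> and <code class="language-plaintext highlighter-rouge">0x900</code>. The first walk only follows pointers, but every function has to maintain the chain. The second needs a table lookup per frame, but the functions themselves carry no extra bookkeeping.</p>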
<p>It turns out that .NET uses the ‘table-driven’ approach, for the reason explained in the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md#the-stack-model">‘BotR’</a>:</p>
<blockquote>
<p>The exact definition of a frame varies from platform to platform and <strong>on many platforms there isn’t a hard definition of a frame format that all functions adhere to</strong> (x86 is an example of this). Instead the compiler is often free to optimize the exact format of frames. On such systems it is not possible to guarantee that a stackwalk will return 100% correct or complete results (for debugging purposes, debug symbols such as pdbs are used to fill in the gaps so that debuggers can generate more accurate stack traces).</p>
<p>This is not a problem for the CLR, however, since we do not require a fully generalized stack walk. <strong>Instead we are only interested in those frames that are managed (i.e. represent a managed method) or, to some extent, frames coming from unmanaged code used to implement part of the runtime itself</strong>. In particular there is no guarantee about fidelity of 3rd party unmanaged frames other than to note where such frames transition into or out of the runtime itself (i.e. one of the frame types we do care about).</p>
</blockquote>
<h3 id="frames">Frames</h3>
<p>To enable ‘unwinding’ of native code, or more strictly the transitions ‘into’ and ‘out of’ native code, the CLR uses a mechanism of <code class="language-plaintext highlighter-rouge">Frames</code>, which are defined in the source code <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/frames.h#L7-L143">here</a>. These frames are arranged into a hierarchy and there is one type of <code class="language-plaintext highlighter-rouge">Frame</code> for each scenario. For more info on the individual <code class="language-plaintext highlighter-rouge">Frames</code>, take a look at the excellent source-code comments <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/frames.h#L145-L195">here</a>.</p>
<ul>
<li><strong>Frame</strong> (abstract/base class)
<ul>
<li><strong>GCFrame</strong></li>
<li><strong>FaultingExceptionFrame</strong></li>
<li><strong>HijackFrame</strong></li>
<li><strong>ResumableFrame</strong>
<ul>
<li>RedirectedThreadFrame</li>
</ul>
</li>
<li><strong>InlinedCallFrame</strong></li>
<li><strong>HelperMethodFrame</strong>
<ul>
<li>HelperMethodFrame_1OBJ</li>
<li>HelperMethodFrame_2OBJ</li>
<li>HelperMethodFrame_3OBJ</li>
<li>HelperMethodFrame_PROTECTOBJ</li>
</ul>
</li>
<li><strong>TransitionFrame</strong>
<ul>
<li>StubHelperFrame</li>
<li>SecureDelegateFrame
<ul>
<li>MulticastFrame</li>
</ul>
</li>
<li>FramedMethodFrame
<ul>
<li>ComPlusMethodFrame</li>
<li>PInvokeCalliFrame</li>
<li>PrestubMethodFrame</li>
<li>StubDispatchFrame</li>
<li>ExternalMethodFrame</li>
<li>TPMethodFrame</li>
</ul>
</li>
</ul>
</li>
<li><strong>UnmanagedToManagedFrame</strong>
<ul>
<li>ComMethodFrame
<ul>
<li>ComPrestubMethodFrame</li>
</ul>
</li>
<li>UMThkCallFrame</li>
</ul>
</li>
<li><strong>ContextTransitionFrame</strong></li>
<li><strong>TailCallFrame</strong></li>
<li><strong>ProtectByRefsFrame</strong></li>
<li><strong>ProtectValueClassFrame</strong></li>
<li><strong>DebuggerClassInitMarkFrame</strong></li>
<li><strong>DebuggerSecurityCodeMarkFrame</strong></li>
<li><strong>DebuggerExitFrame</strong></li>
<li><strong>DebuggerU2MCatchHandlerFrame</strong></li>
<li><strong>FuncEvalFrame</strong></li>
<li><strong>ExceptionFilterFrame</strong></li>
</ul>
</li>
</ul>
<h3 id="helper-method-frames">‘Helper Method’ Frames</h3>
<p>But to make sense of this, let’s look at one type of <code class="language-plaintext highlighter-rouge">Frame</code>, known as <code class="language-plaintext highlighter-rouge">HelperMethodFrame</code> (above). This is used when .NET code in the runtime calls into C++ code to do the heavy-lifting, often for performance reasons. For example, if you call <code class="language-plaintext highlighter-rouge">Environment.GetCommandLineArgs()</code> you end up <a href="https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/src/System/Environment.cs#L151-L180">in this code</a> (C#), which in turn calls an <code class="language-plaintext highlighter-rouge">extern</code> method marked with <code class="language-plaintext highlighter-rouge">InternalCall</code>:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="nf">MethodImplAttribute</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">InternalCall</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">extern</span> <span class="kt">string</span><span class="p">[]</span> <span class="nf">GetCommandLineArgsNative</span><span class="p">();</span>
</code></pre></div></div>
<p>This means that the rest of the method is implemented in C++ inside the runtime. You can see how the method call is <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/ecalllist.h#L153">wired up</a>, before ending up in <code class="language-plaintext highlighter-rouge">SystemNative::GetCommandLineArgs</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/classlibnative/bcltype/system.cpp#L178-L221">here</a>, which is shown below:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FCIMPL0</span><span class="p">(</span><span class="n">Object</span><span class="o">*</span><span class="p">,</span> <span class="n">SystemNative</span><span class="o">::</span><span class="n">GetCommandLineArgs</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">FCALL_CONTRACT</span><span class="p">;</span>
<span class="n">PTRARRAYREF</span> <span class="n">strArray</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">HELPER_METHOD_FRAME_BEGIN_RET_1</span><span class="p">(</span><span class="n">strArray</span><span class="p">);</span> <span class="c1">// <-- 'Helper method Frame' started here</span>
<span class="c1">// Error handling and setup code removed for clarity</span>
<span class="n">strArray</span> <span class="o">=</span> <span class="p">(</span><span class="n">PTRARRAYREF</span><span class="p">)</span> <span class="n">AllocateObjectArray</span><span class="p">(</span><span class="n">numArgs</span><span class="p">,</span> <span class="n">g_pStringClass</span><span class="p">);</span>
<span class="c1">// Copy each argument into new Strings.</span>
<span class="k">for</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o"><</span><span class="n">numArgs</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">STRINGREF</span> <span class="n">str</span> <span class="o">=</span> <span class="n">StringObject</span><span class="o">::</span><span class="n">NewString</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="n">STRINGREF</span> <span class="o">*</span> <span class="n">destData</span> <span class="o">=</span> <span class="p">((</span><span class="n">STRINGREF</span><span class="o">*</span><span class="p">)(</span><span class="n">strArray</span><span class="o">-></span><span class="n">GetDataPtr</span><span class="p">()))</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
<span class="n">SetObjectReference</span><span class="p">((</span><span class="n">OBJECTREF</span><span class="o">*</span><span class="p">)</span><span class="n">destData</span><span class="p">,</span> <span class="p">(</span><span class="n">OBJECTREF</span><span class="p">)</span><span class="n">str</span><span class="p">,</span> <span class="n">strArray</span><span class="o">-></span><span class="n">GetAppDomain</span><span class="p">());</span>
<span class="p">}</span>
<span class="k">delete</span> <span class="p">[]</span> <span class="n">argv</span><span class="p">;</span>
<span class="n">HELPER_METHOD_FRAME_END</span><span class="p">();</span> <span class="c1">// <-- 'Helper method Frame' ended/closed here</span>
<span class="k">return</span> <span class="n">OBJECTREFToObject</span><span class="p">(</span><span class="n">strArray</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">FCIMPLEND</span>
</code></pre></div></div>
<p><strong>Note</strong>: this code makes heavy use of macros, see <a href="https://gist.github.com/mattwarren/36e52b3f80a411ca5a6b7211c9f1a3a9">this gist</a> for the original code and then the expanded versions (Release and Debug). In addition, if you want more information on these mysterious <code class="language-plaintext highlighter-rouge">FCalls</code> as they are known (and the related <code class="language-plaintext highlighter-rouge">QCalls</code>) see <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/mscorlib.md">Mscorlib and Calling Into the Runtime</a> in the ‘BotR’.</p>
<p>But the main thing to look at in the code sample is the <code class="language-plaintext highlighter-rouge">HELPER_METHOD_FRAME_BEGIN_RET_1()</code> macro, which ultimately installs an instance of the <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/frames.h#L1435-L1492">HelperMethodFrame_1OBJ class</a>. The macro expands into code like this:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FrameWithCookie</span> <span class="o"><</span> <span class="n">HelperMethodFrame_1OBJ</span> <span class="o">></span> <span class="n">__helperframe</span><span class="p">(</span><span class="n">__me</span><span class="p">,</span> <span class="n">Frame</span><span class="o">::</span><span class="n">FRAME_ATTR_NONE</span><span class="p">,</span> <span class="p">(</span><span class="n">OBJECTREF</span> <span class="o">*</span> <span class="p">)</span> <span class="o">&</span> <span class="n">strArray</span><span class="p">);</span>
<span class="p">{</span>
<span class="n">__helperframe</span><span class="p">.</span><span class="n">Push</span><span class="p">();</span> <span class="c1">// <-- 'Helper method Frame' pushed</span>
<span class="n">Thread</span> <span class="o">*</span> <span class="n">CURRENT_THREAD</span> <span class="o">=</span> <span class="n">__helperframe</span><span class="p">.</span><span class="n">GetThread</span><span class="p">();</span>
<span class="k">const</span> <span class="kt">bool</span> <span class="n">CURRENT_THREAD_AVAILABLE</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">CURRENT_THREAD_AVAILABLE</span><span class="p">;;</span> <span class="p">{</span>
<span class="n">Exception</span> <span class="o">*</span> <span class="n">__pUnCException</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">Frame</span> <span class="o">*</span> <span class="n">__pUnCEntryFrame</span> <span class="o">=</span> <span class="p">(</span> <span class="o">&</span> <span class="n">__helperframe</span><span class="p">);</span>
<span class="kt">bool</span> <span class="n">__fExceptionCatched</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;;</span>
<span class="k">try</span> <span class="p">{;</span>
<span class="c1">// Original code from SystemNative::GetCommandLineArgs goes in here</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">Exception</span> <span class="o">*</span> <span class="n">__pException</span><span class="p">)</span> <span class="p">{;</span>
<span class="k">do</span> <span class="p">{}</span> <span class="k">while</span> <span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="n">__pUnCException</span> <span class="o">=</span> <span class="n">__pException</span><span class="p">;</span>
<span class="n">UnwindAndContinueRethrowHelperInsideCatch</span><span class="p">(</span><span class="n">__pUnCEntryFrame</span><span class="p">,</span> <span class="n">__pUnCException</span><span class="p">);</span>
<span class="n">__fExceptionCatched</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">__fExceptionCatched</span><span class="p">)</span> <span class="p">{;</span>
<span class="n">UnwindAndContinueRethrowHelperAfterCatch</span><span class="p">(</span><span class="n">__pUnCEntryFrame</span><span class="p">,</span> <span class="n">__pUnCException</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="n">__helperframe</span><span class="p">.</span><span class="n">Pop</span><span class="p">();</span> <span class="c1">// <-- 'Helper method Frame' popped</span>
<span class="p">};</span>
</code></pre></div></div>
<p><strong>Note</strong>: the <code class="language-plaintext highlighter-rouge">Push()</code> and <code class="language-plaintext highlighter-rouge">Pop()</code> calls on <code class="language-plaintext highlighter-rouge">__helperframe</code> are what make it available for ‘stack walking’. You can also see the <code class="language-plaintext highlighter-rouge">try</code>/<code class="language-plaintext highlighter-rouge">catch</code> block that the CLR puts in place to ensure any exceptions from <em>native</em> code are turned into <em>managed</em> exceptions that C#/F#/VB.NET code can handle. If you’re interested, the full macro-expansion is available <a href="https://gist.github.com/mattwarren/36e52b3f80a411ca5a6b7211c9f1a3a9#expanded-code---release---81-loc">in this gist</a>.</p>
<p>So in summary, these <code class="language-plaintext highlighter-rouge">Frames</code> are <em>pushed onto</em> a ‘linked list’ when calling into native code and <em>popped off</em> the list when returning from native code. This means that at any moment the ‘linked list’ contains all the current or active <code class="language-plaintext highlighter-rouge">Frames</code>.</p>
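<p>The push/pop behaviour can be sketched in a few lines. This is a deliberately stripped-down model, not the CLR’s implementation: the <code class="language-plaintext highlighter-rouge">Thread</code> and <code class="language-plaintext highlighter-rouge">Frame</code> types below are hypothetical stand-ins, and the member names (<code class="language-plaintext highlighter-rouge">topFrame</code>, <code class="language-plaintext highlighter-rouge">m_next</code>, <code class="language-plaintext highlighter-rouge">ActiveFrames</code>) are made up for illustration:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical, much-simplified stand-ins for the runtime's Thread and Frame types.
struct Frame;

struct Thread
{
    Frame* topFrame = nullptr;   // head of the linked list of active Frames
};

struct Frame
{
    Frame(Thread* thread, const char* name) : m_thread(thread), m_name(name) {}

    // Link this Frame in as the new head of the thread's list.
    void Push() { m_next = m_thread-&gt;topFrame; m_thread-&gt;topFrame = this; }

    // Unlink this Frame; it must be the current head (Frames nest like a stack).
    void Pop() { m_thread-&gt;topFrame = m_next; }

    Thread*     m_thread;
    const char* m_name;
    Frame*      m_next = nullptr;
};

// What a stack walker would do: visit every active Frame, newest first.
std::vector&lt;std::string&gt; ActiveFrames(const Thread&amp; thread)
{
    std::vector&lt;std::string&gt; names;
    for (const Frame* f = thread.topFrame; f != nullptr; f = f-&gt;m_next)
        names.push_back(f-&gt;m_name);
    return names;
}
</code></pre></div></div>
<p>Because each <code class="language-plaintext highlighter-rouge">Frame</code> lives on the native stack of the function that pushed it, the list needs no allocation, and a walker can reach every active transition simply by starting at the thread’s head pointer.</p>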
<h3 id="native-unwind-information">Native Unwind Information</h3>
<p>In addition to creating ‘Frames’, the CLR also ensures that the C++ compiler emits ‘unwind info’ for native code. We can see this if we use the <a href="https://docs.microsoft.com/en-us/cpp/build/reference/dumpbin-reference?view=vs-2017">DUMPBIN tool</a> and run <code class="language-plaintext highlighter-rouge">dumpbin /UNWINDINFO coreclr.dll</code>. We get the following output for <code class="language-plaintext highlighter-rouge">SystemNative::GetCommandLineArgs(..)</code> (that we looked at before):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 0002F064 003789B0 00378B7E 004ED1D8 ?GetCommandLineArgs@SystemNative@@SAPEAVObject@@XZ (public: static class Object * __cdecl SystemNative::GetCommandLineArgs(void))
Unwind version: 1
Unwind flags: EHANDLER UHANDLER
Size of prologue: 0x3B
Count of codes: 13
Unwind codes:
29: SAVE_NONVOL, register=r12 offset=0x1C8
25: SAVE_NONVOL, register=rdi offset=0x1C0
21: SAVE_NONVOL, register=rsi offset=0x1B8
1D: SAVE_NONVOL, register=rbx offset=0x1B0
10: ALLOC_LARGE, size=0x190
09: PUSH_NONVOL, register=r15
07: PUSH_NONVOL, register=r14
05: PUSH_NONVOL, register=r13
Handler: 00148F14 __GSHandlerCheck_EH
EH Handler Data: 00415990
GS Unwind flags: EHandler UHandler
Cookie Offset: 00000180
0002F070 00378B7E 00378BB4 004ED26C
Unwind version: 1
Unwind flags: EHANDLER UHANDLER
Size of prologue: 0x0A
Count of codes: 2
Unwind codes:
0A: ALLOC_SMALL, size=0x20
06: PUSH_NONVOL, register=rbp
Handler: 0014978C __CxxFrameHandler3
EH Handler Data: 00415990
</code></pre></div></div>
<p>If you want to understand more of what’s going on here I really recommend reading the excellent article <a href="https://blogs.msdn.microsoft.com/ntdebugging/2010/05/12/x64-manual-stack-reconstruction-and-stack-walking/">x64 Manual Stack Reconstruction and Stack Walking</a>. But in essence the ‘unwind info’ describes which non-volatile registers a method saves and how much stack space it allocates. These pieces of information are enough to tell the runtime how to ‘unwind’ that particular method when walking the stack.</p>
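<p>As a rough worked example, the unwind codes listed for <code class="language-plaintext highlighter-rouge">SystemNative::GetCommandLineArgs</code> are enough to work out where its return address lives relative to <code class="language-plaintext highlighter-rouge">RSP</code>. The sketch below is a simplified model that assumes only the operations seen in the dump and ignores frame pointers, chained unwind info and epilogues. The <code class="language-plaintext highlighter-rouge">UnwindOp</code>, <code class="language-plaintext highlighter-rouge">UnwindCode</code> and <code class="language-plaintext highlighter-rouge">BytesToReturnAddress</code> names are hypothetical, not Windows or CLR types:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cstdint&gt;
#include &lt;vector&gt;

// Simplified model of the x64 unwind operations that appear in the dump.
enum class UnwindOp { PushNonVol, AllocSmall, AllocLarge, SaveNonVol };

struct UnwindCode
{
    UnwindOp op;
    uint32_t value;   // size for Alloc*, stack offset for SaveNonVol, unused for PushNonVol
};

// Bytes between RSP (after the prologue has run) and the return-address slot.
uint32_t BytesToReturnAddress(const std::vector&lt;UnwindCode&gt;&amp; codes)
{
    uint32_t bytes = 0;
    for (const UnwindCode&amp; code : codes)
    {
        switch (code.op)
        {
            case UnwindOp::PushNonVol: bytes += 8;          break; // one 8-byte register pushed
            case UnwindOp::AllocSmall:
            case UnwindOp::AllocLarge: bytes += code.value; break; // 'sub rsp, size'
            case UnwindOp::SaveNonVol:                      break; // written into existing space
        }
    }
    return bytes;
}

// The codes listed for SystemNative::GetCommandLineArgs in the dump.
std::vector&lt;UnwindCode&gt; GetCommandLineArgsCodes()
{
    return {
        { UnwindOp::SaveNonVol, 0x1C8 }, // r12
        { UnwindOp::SaveNonVol, 0x1C0 }, // rdi
        { UnwindOp::SaveNonVol, 0x1B8 }, // rsi
        { UnwindOp::SaveNonVol, 0x1B0 }, // rbx
        { UnwindOp::AllocLarge, 0x190 },
        { UnwindOp::PushNonVol, 0 },     // r15
        { UnwindOp::PushNonVol, 0 },     // r14
        { UnwindOp::PushNonVol, 0 },     // r13
    };
}
</code></pre></div></div>
<p>Three pushes (3 × 8 bytes) plus the <code class="language-plaintext highlighter-rouge">0x190</code> allocation give <code class="language-plaintext highlighter-rouge">0x1A8</code>, so in this model the return address sits at <code class="language-plaintext highlighter-rouge">RSP + 0x1A8</code>. The four <code class="language-plaintext highlighter-rouge">SAVE_NONVOL</code> offsets (<code class="language-plaintext highlighter-rouge">0x1B0</code> to <code class="language-plaintext highlighter-rouge">0x1C8</code>) land in the 32 bytes directly above it, which is consistent with those registers being saved in the caller-provided ‘home space’.</p>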
<h3 id="differences-between-windows-and-unix">Differences between Windows and Unix</h3>
<p>However, to further complicate things, the ‘native code unwinding’ uses a different mechanism for ‘Windows’ v. ‘Unix’, as explained in <a href="https://github.com/dotnet/coreclr/issues/177#issuecomment-73648128">coreclr/issues/#177 (comment)</a>:</p>
<blockquote>
<ol>
<li><strong>Stack walker for managed code</strong>. JIT will generate regular Windows style unwinding info. We will reuse Windows unwinder code that we currently have checked in for debugger components for unwinding calls in managed code on Linux/Mac. Unfortunately, this work requires changes in the runtime that currently cannot be tested in the CoreCLR repo so it is hard to do this in the public right now. But we are working on fixing that because, as I mentioned at the beginning, our goal is do most work in the public.</li>
<li><strong>Stack walker for native code</strong>. Here, in addition to everything else, we need to allow GC to unwind native stack of any thread in the current process until it finds a managed frame. Currently we are considering using libunwind (http://www.nongnu.org/libunwind) for unwinding native call stacks. @janvorli did some prototyping/experiments and it seems to do what we need. If you have any experience with this library or have any comments/suggestions please let us know.</li>
</ol>
</blockquote>
<p>This also shows that there are 2 different ‘unwind’ mechanisms, one for ‘managed’ and one for ‘native’ code. We will discuss how the “<em>stack walker for managed code</em>” works in <a href="#unwinding-jitted-code">Unwinding ‘JITted’ Code</a>.</p>
<p>There is also some more information in <a href="https://github.com/dotnet/coreclr/issues/177#issuecomment-73803242">coreclr/issues/#177 (comment)</a>:</p>
<blockquote>
<p>My current work has two parts, as @sergiy-k has already mentioned. The <strong>windows style unwinder that will be used for the jitted code</strong> and <strong>Unix unwinder for native code</strong> that uses the libunwind’s low level <code class="language-plaintext highlighter-rouge">unw_xxxx</code> functions like <code class="language-plaintext highlighter-rouge">unw_step</code> etc.</p>
</blockquote>
<p>So, for ‘native code’ the runtime uses an OS-specific mechanism, i.e. on Unix the <a href="https://github.com/libunwind/libunwind">Open Source ‘libunwind’ library</a> is used. You can see the difference in the code below (from <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/amd64/gmsamd64.cpp#L54-L74">here</a>): on Windows <code class="language-plaintext highlighter-rouge">Thread::VirtualUnwindCallFrame(..)</code> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp#L552-L671">implementation</a>) is called, but on Unix (i.e. <code class="language-plaintext highlighter-rouge">FEATURE_PAL</code>) <code class="language-plaintext highlighter-rouge">PAL_VirtualUnwind(..)</code> (<a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/pal/src/exception/seh-unwind.cpp#L249-L349">implementation</a>) is called instead:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifndef FEATURE_PAL
</span> <span class="n">pvControlPc</span> <span class="o">=</span> <span class="n">Thread</span><span class="o">::</span><span class="n">VirtualUnwindCallFrame</span><span class="p">(</span><span class="o">&</span><span class="n">ctx</span><span class="p">,</span> <span class="o">&</span><span class="n">nonVolRegPtrs</span><span class="p">);</span>
<span class="cp">#else // !FEATURE_PAL
</span> <span class="p">...</span>
<span class="n">BOOL</span> <span class="n">success</span> <span class="o">=</span> <span class="n">PAL_VirtualUnwind</span><span class="p">(</span><span class="o">&</span><span class="n">ctx</span><span class="p">,</span> <span class="o">&</span><span class="n">nonVolRegPtrs</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">pvControlPc</span> <span class="o">=</span> <span class="n">GetIP</span><span class="p">(</span><span class="o">&</span><span class="n">ctx</span><span class="p">);</span>
<span class="cp">#endif // !FEATURE_PAL
</span></code></pre></div></div>
<p>Before we move on, here are some links to the work that was done to support ‘stack walking’ when .NET Core CLR was <a href="https://blogs.msdn.microsoft.com/dotnet/2016/06/27/announcing-net-core-1-0/#the-net-core-journey">ported to Linux</a>:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/issues/8887">[x86/Linux] Support Simple Exception Catch</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/6698">[ARM/Linux] coreclr fails due to lack of DWARF feature in libunwind #6698</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/259">Modify the windows amd64 unwinder to work as jitted code unwinder on Uni… #259</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/284">Refactor libunwind to work on osx #284</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/308">Reimplement native exception handling for PAL #308</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/6c2c7994f1412e8aa504800c7164de875c350fc1">Move the windows unwinder code out of the debug folder.</a></li>
<li><a href="https://github.com/dotnet/core/blob/4c4642d548074b3fbfd425541a968aadd75fea99/release-notes/1.0/1.0.0.md#dependencies">.NET Core Dependencies</a> (includes ‘libunwind’)</li>
<li><a href="https://github.com/dotnet/coreclr/pull/437">The sos “ClrStack” command now works</a></li>
</ul>
<hr />
<h2 id="unwinding-jitted-code">Unwinding ‘JITted’ Code</h2>
<p>Finally, we’re going to look at what happens with ‘managed code’, i.e. code that started off as C#/F#/VB.NET, was turned into IL and then compiled into native code by the ‘JIT Compiler’. This is the code that you generally want to see in your ‘stack trace’, because it’s code you wrote yourself!</p>
<h3 id="help-from-the-jit-compiler">Help from the ‘JIT Compiler’</h3>
<p>Simply, what happens is that when the code is ‘JITted’, the compiler also emits some extra information, stored via the <code class="language-plaintext highlighter-rouge">EECodeInfo</code> class, which is defined <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/jitinterface.cpp#L13922-L14300">here</a>. Also see the <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/compiler.h#L7316-L7440">‘Unwind Info’ section</a> in the JIT Compiler <-> Runtime interface and note how it features separate sections for <code class="language-plaintext highlighter-rouge">TARGET_ARM</code>, <code class="language-plaintext highlighter-rouge">TARGET_ARM64</code>, <code class="language-plaintext highlighter-rouge">TARGET_X86</code> and <code class="language-plaintext highlighter-rouge">TARGET_UNIX</code>.</p>
<p>In addition, in <code class="language-plaintext highlighter-rouge">CodeGen::genFnProlog()</code> <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/codegencommon.cpp#L8832-L9299">here</a> the JIT emits a function ‘prologue’ that contains several pieces of ‘unwind’ related data. This is also implemented in <code class="language-plaintext highlighter-rouge">CEEJitInfo::allocUnwindInfo(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/jitinterface.cpp#L11275-L11300">this piece of code</a>, which behaves differently for each CPU architecture:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if defined(_TARGET_X86_)
</span> <span class="c1">// Do NOTHING</span>
<span class="cp">#elif defined(_TARGET_AMD64_)
</span> <span class="n">pUnwindInfo</span><span class="o">-></span><span class="n">Flags</span> <span class="o">=</span> <span class="n">UNW_FLAG_EHANDLER</span> <span class="o">|</span> <span class="n">UNW_FLAG_UHANDLER</span><span class="p">;</span>
<span class="n">ULONG</span> <span class="o">*</span> <span class="n">pPersonalityRoutine</span> <span class="o">=</span> <span class="p">(</span><span class="n">ULONG</span><span class="o">*</span><span class="p">)</span><span class="n">ALIGN_UP</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">pUnwindInfo</span><span class="o">-></span><span class="n">UnwindCode</span><span class="p">[</span><span class="n">pUnwindInfo</span><span class="o">-></span><span class="n">CountOfUnwindCodes</span><span class="p">]),</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ULONG</span><span class="p">));</span>
<span class="o">*</span><span class="n">pPersonalityRoutine</span> <span class="o">=</span> <span class="n">ExecutionManager</span><span class="o">::</span><span class="n">GetCLRPersonalityRoutineValue</span><span class="p">();</span>
<span class="cp">#elif defined(_TARGET_ARM64_)
</span> <span class="o">*</span><span class="p">(</span><span class="n">LONG</span> <span class="o">*</span><span class="p">)</span><span class="n">pUnwindInfo</span> <span class="o">|=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">20</span><span class="p">);</span> <span class="c1">// X bit</span>
<span class="n">ULONG</span> <span class="o">*</span> <span class="n">pPersonalityRoutine</span> <span class="o">=</span> <span class="p">(</span><span class="n">ULONG</span><span class="o">*</span><span class="p">)((</span><span class="n">BYTE</span> <span class="o">*</span><span class="p">)</span><span class="n">pUnwindInfo</span> <span class="o">+</span> <span class="n">ALIGN_UP</span><span class="p">(</span><span class="n">unwindSize</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ULONG</span><span class="p">)));</span>
<span class="o">*</span><span class="n">pPersonalityRoutine</span> <span class="o">=</span> <span class="n">ExecutionManager</span><span class="o">::</span><span class="n">GetCLRPersonalityRoutineValue</span><span class="p">();</span>
<span class="cp">#elif defined(_TARGET_ARM_)
</span> <span class="o">*</span><span class="p">(</span><span class="n">LONG</span> <span class="o">*</span><span class="p">)</span><span class="n">pUnwindInfo</span> <span class="o">|=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">20</span><span class="p">);</span> <span class="c1">// X bit</span>
<span class="n">ULONG</span> <span class="o">*</span> <span class="n">pPersonalityRoutine</span> <span class="o">=</span> <span class="p">(</span><span class="n">ULONG</span><span class="o">*</span><span class="p">)((</span><span class="n">BYTE</span> <span class="o">*</span><span class="p">)</span><span class="n">pUnwindInfo</span> <span class="o">+</span> <span class="n">ALIGN_UP</span><span class="p">(</span><span class="n">unwindSize</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ULONG</span><span class="p">)));</span>
<span class="o">*</span><span class="n">pPersonalityRoutine</span> <span class="o">=</span> <span class="p">(</span><span class="n">TADDR</span><span class="p">)</span><span class="n">ProcessCLRException</span> <span class="o">-</span> <span class="n">baseAddress</span><span class="p">;</span>
<span class="cp">#endif
</span></code></pre></div></div>
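<p>In the AMD64 branch above, the personality routine value is written straight after the unwind codes, rounded up to a <code class="language-plaintext highlighter-rouge">ULONG</code> boundary. Here is a minimal sketch of that offset arithmetic, assuming the Windows x64 unwind-info layout of a 4-byte header followed by 2-byte unwind-code slots. The names <code class="language-plaintext highlighter-rouge">AlignUp</code>, <code class="language-plaintext highlighter-rouge">kHeaderBytes</code>, <code class="language-plaintext highlighter-rouge">kUnwindCodeBytes</code> and <code class="language-plaintext highlighter-rouge">PersonalityRoutineOffset</code> are made up for illustration, and the real code aligns a pointer rather than an offset:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cstddef&gt;
#include &lt;cstdint&gt;

// Round 'value' up to the next multiple of 'alignment' (which must be a power of two).
constexpr size_t AlignUp(size_t value, size_t alignment)
{
    return (value + alignment - 1) &amp; ~(alignment - 1);
}

// Assumed AMD64 layout: a 4-byte header followed by 2-byte unwind-code slots.
constexpr size_t kHeaderBytes     = 4;
constexpr size_t kUnwindCodeBytes = 2;

// Offset, from the start of the unwind info, of the 4-byte personality routine value.
constexpr size_t PersonalityRoutineOffset(size_t countOfUnwindCodes)
{
    return AlignUp(kHeaderBytes + countOfUnwindCodes * kUnwindCodeBytes, sizeof(uint32_t));
}
</code></pre></div></div>
<p>For example, with 13 unwind-code slots the codes end at byte 30, so the value would be stored at offset 32. With 2 slots they end at byte 8, which is already aligned.</p>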
<p>Also, the JIT has several <code class="language-plaintext highlighter-rouge">Compiler::unwindXXX(..)</code> methods that are all implemented in per-CPU source files:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/unwind.cpp">/src/jit/unwind.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/unwindarm.cpp">/src/jit/unwind<strong>arm</strong>.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/unwindx86.cpp">/src/jit/unwind<strong>x86</strong>.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/unwindamd64.cpp">/src/jit/unwind<strong>amd64</strong>.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.2/src/jit/unwindarm64.cpp">/src/jit/unwind<strong>arm64</strong>.cpp</a></li>
</ul>
<p>Fortunately, we can <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md#useful-complus-variables">ask the JIT</a> to output the unwind info that it emits; however, this <em>only works</em> with a Debug version of the CLR. Given a simple method like this:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="k">void</span> <span class="nf">MethodA</span><span class="p">()</span> <span class="p">{</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nf">MethodB</span><span class="p">();</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">Exception</span> <span class="n">ex</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">ex</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>if we run <code class="language-plaintext highlighter-rouge">SET COMPlus_JitUnwindDump=MethodA</code>, we get the following output with 2 ‘Unwind Info’ sections, one for the <code class="language-plaintext highlighter-rouge">try</code> and the other for the <code class="language-plaintext highlighter-rouge">catch</code> block:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0x00004e (not in unwind data)
Version : 1
Flags : 0x00
SizeOfProlog : 0x07
CountOfUnwindCodes: 4
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 11 * 8 + 8 = 96 = 0x60
CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)
CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rdi (7)
CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)
Unwind Info:
>> Start offset : 0x00004e (not in unwind data)
>> End offset : 0x0000e2 (not in unwind data)
Version : 1
Flags : 0x00
SizeOfProlog : 0x07
CountOfUnwindCodes: 4
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 5 * 8 + 8 = 48 = 0x30
CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)
CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rdi (7)
CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)
</code></pre></div></div>
<p>This ‘unwind info’ is then looked up during a ‘stack walk’ as explained in the <a href="#how-it-works">How it works</a> section above.</p>
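<p>The <code class="language-plaintext highlighter-rouge">OpInfo</code> arithmetic in the dump above can be checked with a few lines of C#. This is only a sketch: the <code class="language-plaintext highlighter-rouge">OpInfo * 8 + 8</code> formula is taken straight from the <code class="language-plaintext highlighter-rouge">UWOP_ALLOC_SMALL</code> lines in the dump, adding 8 bytes for each <code class="language-plaintext highlighter-rouge">UWOP_PUSH_NONVOL</code> is my reading of what the prolog does on x64, and the class and method names are made up for illustration:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

class UnwindCodeSketch // hypothetical name, not part of the CLR
{
    // UWOP_ALLOC_SMALL encodes its size as OpInfo * 8 + 8 (see the dump above).
    static int AllocSmallSize(int opInfo)
    {
        return opInfo * 8 + 8;
    }

    static void Main()
    {
        // Both sections in the dump push rsi, rdi and rbp (one UWOP_PUSH_NONVOL each).
        const int pushedRegisters = 3;

        // OpInfo values from the two 'Unwind Info' sections above.
        int[] opInfos = { 11, 5 };

        foreach (int opInfo in opInfos)
        {
            int alloc = AllocSmallSize(opInfo);
            // Assumption: each pushed register takes 8 bytes of stack on x64.
            int total = alloc + pushedRegisters * 8;
            Console.WriteLine($"OpInfo {opInfo}: alloc = {alloc} (0x{alloc:X}), alloc + pushes = {total} bytes");
        }
    }
}
</code></pre></div></div>
<p>For the two sections above this gives allocations of 96 (0x60) and 48 (0x30) bytes, matching the dump, which is the amount of stack an unwinder has to ‘give back’ (along with the pushed registers) to get to the caller’s frame.</p>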
<hr />
<p><strong>So next time you encounter a ‘stack trace’ remember that a lot of work went into making it possible!!</strong></p>
<hr />
<h2 id="further-reading">Further Reading</h2>
<p>‘Stack Walking’ or ‘Stack Unwinding’ is a very large topic, so if you want to know more, here are some links to get you started:</p>
<h3 id="stack-unwinding-general">Stack Unwinding (general)</h3>
<ul>
<li><a href="https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/">Stack frame layout on x86-64</a> (also has a great list of links at the bottom)</li>
<li><a href="https://eli.thegreenplace.net/2011/02/04/where-the-top-of-the-stack-is-on-x86/">Where the top of the stack is on x86</a></li>
<li><a href="https://eli.thegreenplace.net/2015/programmatic-access-to-the-call-stack-in-c/">Programmatic access to the call stack in C++</a></li>
<li><a href="https://eli.thegreenplace.net/2011/02/07/how-debuggers-work-part-3-debugging-information">How debuggers work: Part 3 - Debugging information</a></li>
<li><a href="https://blog.tartanllama.xyz/writing-a-linux-debugger-unwinding/">Writing a Linux Debugger Part 8: Stack unwinding</a></li>
<li><a href="http://blog.reverberate.org/2013/05/deep-wizardry-stack-unwinding.html">Deep Wizardry: Stack Unwinding</a></li>
<li><a href="https://www.reddit.com/r/programming/comments/1ebswy/deep_wizardry_stack_unwinding/">Deep Wizardry: Stack Unwinding</a> (/r/programming)</li>
<li><a href="http://www.corsix.org/content/libunwind-dynamic-code-x86-64">On libunwind and dynamically generated code on x86-64</a></li>
<li><a href="https://news.ycombinator.com/item?id=11477039">On libunwind and dynamically generated code on x86-64</a> (HackerNews)</li>
<li><a href="https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames">x86 Disassembly/Functions and Stack Frames</a></li>
<li><a href="https://stackoverflow.com/questions/579262/what-is-the-purpose-of-the-ebp-frame-pointer-register">What is the purpose of the EBP frame pointer register?</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2011/07/20/manual-stack-walking/">Manual Stack Walking</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2011/08/22/walking-the-stack-without-symbols-and-with-fpo-frame-pointer-omission/">Walking the Stack Without Symbols and With FPO (Frame Pointer Omission)</a></li>
<li><a href="https://cshung.gitbooks.io/how-to-write-a-debuggable-programming-language/content/stack-unwinder.html">how to write a debuggable programming language - stack unwinding</a></li>
<li><a href="https://www.reddit.com/r/programming/comments/5v4ztx/how_the_net_runtime_walks_the_stack/">How the .NET Runtime Walks the Stack</a> (/r/programming discussion of the ‘BotR’ page)</li>
<li><a href="https://blog.slaks.net/2011/10/caller-info-attributes-vs-stack-walking.html">Caller Info Attributes vs. Stack Walking</a></li>
<li><a href="http://www.osronline.com/article.cfm?id=202">Stacking the Deck – Finding Your Way Through the Stack</a></li>
</ul>
<h3 id="stack-unwinding-other-runtimes">Stack Unwinding (other runtimes)</h3>
<p>In addition, it’s interesting to look at how other runtimes handle this process:</p>
<ul>
<li><strong>Mono</strong>
<ul>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/mini-porting/#unwind-info">Porting the Engine - Unwind Info</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/llvm-backend/#unwind-info">LLVM Backend - Unwind Info</a></li>
<li><a href="https://www.mono-project.com/docs/advanced/runtime/docs/exception-handling/#stack-unwinding-during-exception-handling">Stack unwinding during exception handling</a></li>
<li><a href="https://github.com/mono/mono/blob/master/mono/mini/unwind.c">/master/mono/mini/<strong>unwind.c</strong></a></li>
<li><a href="https://github.com/mono/mono/blob/master/mono/utils/mono-stack-unwinding.h">/master/mono/utils/<strong>mono-stack-unwinding.h</strong></a></li>
</ul>
</li>
<li><strong>CoreRT</strong> (<a href="/2018/06/07/CoreRT-.NET-Runtime-for-AOT/">A .NET Runtime for AOT</a>)
<ul>
<li><a href="https://github.com/dotnet/corert/blob/master/Documentation/high-level-engineering-plan.md#runtime">High-level Engineering Plan - Runtime</a></li>
<li><a href="https://github.com/dotnet/corert/blob/master/src/Native/Runtime/unix/UnwindHelpers.cpp">/src/Native/Runtime/unix/<strong>UnwindHelpers.cpp</strong></a></li>
<li><a href="https://github.com/dotnet/corert/blob/master/src/Native/Runtime/StackFrameIterator.cpp">/src/Native/Runtime/<strong>StackFrameIterator.cpp</strong></a> (see <code class="language-plaintext highlighter-rouge">StackFrameIterator::NextInternal()</code>)</li>
<li><a href="https://github.com/dotnet/corert/tree/master/src/Native/libunwind">/src/Native/<strong>libunwind</strong></a></li>
</ul>
</li>
<li><strong>Go</strong>
<ul>
<li><a href="https://science.raphael.poss.name/go-calling-convention-x86-64.html#aside-exceptions-in-c-c">The Go low-level calling convention on x86-64</a></li>
<li><a href="https://github.com/teh-cmc/go-internals/blob/master/chapter1_assembly_primer/README.md#dissecting-main">Go Internals - Chapter I: A Primer on Go Assembly</a></li>
<li><a href="https://stackimpact.com/blog/go-profiler-internals/">Go Profiler Internals</a></li>
<li><a href="https://golang.org/src/runtime/stack.go">golang.org/src/runtime/stack.go</a></li>
<li><a href="https://golang.org/src/runtime/traceback.go?h=gentraceback#L98">golang.org/src/runtime/traceback.go</a> (see <code class="language-plaintext highlighter-rouge">gentraceback(..)</code>)</li>
<li><a href="https://golang.org/src/runtime/symtab.go?h=findfunc#L659">golang.org/src/runtime/symtab.go</a> (see <code class="language-plaintext highlighter-rouge">findfunc(..)</code>)</li>
<li><a href="https://www.ardanlabs.com/blog/2017/05/language-mechanics-on-stacks-and-pointers.html">Language Mechanics On Stacks And Pointers</a></li>
<li><a href="http://technosophos.com/2014/03/19/generating-stack-traces-in-go.html">Generating Stack Traces in Go</a></li>
</ul>
</li>
<li><strong>Java</strong>
<ul>
<li><a href="http://openjdk.java.net/jeps/259">JEP 259: Stack-Walking API</a></li>
<li><a href="https://alvinalexander.com/scala/fp-book/recursion-visual-look-jvm-stack-frames">A Visual Look at JVM Stacks and Frames</a></li>
<li><a href="https://www.artima.com/insidejvm/ed2/jvm8.html">The Java Virtual Machine - The Java Stack</a></li>
<li><a href="https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.5.6">The Structure of the Java Virtual Machine - Native Method Stacks</a></li>
<li><a href="https://harmony.apache.org/subcomponents/drlvm/developers_guide.html#Stack_Walking">Stack Walking - Dynamic Runtime Layer Virtual Machine Developer’s Guide</a></li>
<li><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.346&rep=rep1&type=pdf">A Study of Exception Handling and Its Dynamic Optimization in Java</a> (pdf)</li>
<li><a href="https://books.google.co.uk/books?id=jZG_DQAAQBAJ&lpg=PA125&ots=KwhW3tYXUa&dq=chapter%208%20stack%20unwinding&pg=PA125#v=onepage&q=chapter%208%20stack%20unwinding&f=false">Chapter 8 of ‘Advanced Design and Implementation of Virtual Machines’</a></li>
</ul>
</li>
<li><strong>Rust</strong>
<ul>
<li><a href="https://doc.rust-lang.org/nomicon/unwinding.html">Unwinding</a></li>
<li><a href="http://lucumr.pocoo.org/2014/10/30/dont-panic/">Don’t Panic! The Hitchhiker’s Guide to Unwinding</a></li>
<li><a href="https://news.ycombinator.com/item?id=8537756">Stack unwinding in Rust</a> (Hacker News)</li>
<li><a href="https://github.com/rust-lang/rfcs/blob/master/text/1513-less-unwinding.md">RFC 1513 - Less unwinding</a></li>
<li><a href="https://internals.rust-lang.org/t/disabling-panic-handling/1834">Disabling panic! handling</a></li>
<li><a href="https://rust-lang-nursery.github.io/edition-guide/rust-2018/error-handling-and-panics/controlling-panics-with-std-panic.html">Controlling panics with std::panic</a></li>
<li><a href="https://doc.rust-lang.org/1.3.0/std/rt/unwind/">Module std::rt::unwind</a></li>
</ul>
</li>
</ul>
Exploring the .NET Core Runtime (in which I set myself a challenge)2018-12-13T00:00:00+00:00http://www.mattwarren.org/2018/12/13/Exploring-the-.NET-Core-Runtime
<p>It seems like this time of year anyone with a blog is doing some sort of ‘advent calendar’, i.e. 24 posts leading up to Christmas. For instance there’s an <a href="https://sergeytihon.com/2018/10/22/f-advent-calendar-in-english-2018/">F# one</a> which inspired a <a href="https://crosscuttingconcerns.com/The-Second-Annual-C-Advent">C# one</a> (<em>C# copying from F#, that never happens</em> 😉).</p>
<p>However, that’s a bit of a problem for me: I struggled to write 24 posts <a href="/postsByYear/#2016-ref">in my most productive year</a>, let alone a single month! Also, I mostly blog about <a href="/tags/#Internals">‘.NET Internals’</a>, a subject which doesn’t necessarily lend itself to the more ‘<em>light-hearted</em>’ posts you get in these ‘advent calendar’ blogs.</p>
<p><strong>Until now!</strong></p>
<hr />
<p>Recently I’ve been giving a talk titled <strong>from ‘dotnet run’ to ‘hello world’</strong>, which attempts to explain everything that the .NET Runtime does from the point you launch your application till “Hello World” is printed on the screen:</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/xU98KRbWFvU2SC?startSlide=6" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen=""> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/mattwarren/from-dotnet-run-to-hello-world" title="From 'dotnet run' to 'hello world'" target="_blank">From 'dotnet run' to 'hello world'</a> </strong> from <strong><a href="//www.slideshare.net/mattwarren" target="_blank">Matt Warren</a></strong> </div>
<p>But as I was researching and presenting this talk, it made me think about the <em>.NET Runtime</em> as a whole, <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/#high-level-overview"><em>what does it contain</em></a> and most importantly <strong>what can you do with it</strong>?</p>
<p><strong>Note:</strong> this is mostly for <em>informational</em> purposes; for the <em>recommended way</em> of achieving the same thing, take a look at this excellent <a href="https://natemcmaster.com/blog/2017/12/21/netcore-primitives/">Deep-dive into .NET Core primitives</a> by <a href="https://twitter.com/natemcmaster">Nate McMaster</a>.</p>
<hr />
<p>In this post I will explore what you can do <strong>using only the code in the <a href="https://github.com/dotnet/coreclr">dotnet/coreclr</a> repository</strong> and along the way we’ll find out more about how the runtime interacts with the wider <a href="https://dotnet.microsoft.com/">.NET Ecosystem</a>.</p>
<p>To make things clearer, there are <strong>3 challenges</strong> that will need to be solved before a simple “Hello World” application can be run. That’s because in the <a href="https://github.com/dotnet/coreclr">dotnet/coreclr</a> repository there is:</p>
<ol>
<li>No <strong>compiler</strong>, that lives in <a href="https://github.com/dotnet/roslyn/">dotnet/Roslyn</a></li>
<li>No <strong>Framework Class Library (FCL)</strong> a.k.a. ‘<a href="https://github.com/dotnet/corefx">dotnet/CoreFX</a>’</li>
<li>No <code class="language-plaintext highlighter-rouge">dotnet run</code> as it’s implemented in the <a href="https://github.com/dotnet/cli/tree/release/2.2.2xx/src/dotnet/commands/dotnet-run">dotnet/CLI</a> repository</li>
</ol>
<hr />
<h2 id="building-the-coreclr">Building the CoreCLR</h2>
<p>But before we even work through these ‘challenges’, we need to build the CoreCLR itself. Helpfully there is a really nice guide available in <a href="https://github.com/dotnet/coreclr#building-the-repository">‘Building the Repository’</a>:</p>
<blockquote>
<p>The build depends on Git, CMake, Python and of course a C++ compiler. Once these prerequisites are installed
the build is simply a matter of invoking the ‘build’ script (<code class="language-plaintext highlighter-rouge">build.cmd</code> or <code class="language-plaintext highlighter-rouge">build.sh</code>) at the base of the repository.</p>
<p>The details of installing the components differ depending on the operating system. See the following pages based on your OS. There is no cross-building across OS (only for ARM, which is built on X64). You have to be on the particular platform to build that platform.</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/windows-instructions.md">Windows Build Instructions</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/linux-instructions.md">Linux Build Instructions</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/osx-instructions.md">macOS Build Instructions</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/freebsd-instructions.md">FreeBSD Build Instructions</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/netbsd-instructions.md">NetBSD Build Instructions</a></li>
</ul>
</blockquote>
<p>If you follow these steps successfully, you’ll end up with the following files (at least on Windows, other OSes may produce something slightly different):</p>
<p><img src="/images/2018/12/CoreCLR Build Artifacts.png" alt="CoreCLR Build Artifacts" /></p>
<hr />
<h2 id="no-compiler">No Compiler</h2>
<p>First up, how do we get around the fact that we don’t have a compiler? After all we need some way of turning our simple “Hello World” code into a .exe?</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System</span><span class="p">;</span>
<span class="k">namespace</span> <span class="nn">Hello_World</span>
<span class="p">{</span>
<span class="k">class</span> <span class="nc">Program</span>
<span class="p">{</span>
<span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Hello World!"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Fortunately we do have access to the <a href="https://github.com/dotnet/coreclr/tree/master/src/ilasm">ILASM tool (IL Assembler)</a>, which can turn <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">Common Intermediate Language (CIL)</a> into a .exe file. But how do we get the correct IL code? Well, one way is to write it from scratch, maybe after reading <a href="https://amzn.to/2QPpiTY">Inside Microsoft .NET IL Assembler</a> and <a href="https://amzn.to/2Ca34UI">Expert .NET 2.0 IL Assembler</a> by Serge Lidin (yes, amazingly, 2 books have been written about IL!).</p>
<p>Another, much easier way, is to use the amazing <a href="https://sharplab.io/">SharpLab.io site</a> to do it for us! If you paste the C# code from above into it, you’ll <a href="https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxASwDYB8ACAGAAhwEYBuAWACgqA7CMOKABwgGM4CAJODDAewD6AdT4IMAEyoBvKgTlEATEWIB2WfJmV525QDYiAFgIBZCGhoAKEngDaAXQIQEAcygBKdToKavXkgE4LACJuXj4CETFxAEIgtwotXwBfTwIUyiSgA==">get the following IL code</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.class private auto ansi '<Module>'
{
} // end of class <Module>
.class private auto ansi beforefieldinit Hello_World.Program
extends [mscorlib]System.Object
{
// Methods
.method private hidebysig static
void Main (
string[] args
) cil managed
{
// Method begins at RVA 0x2050
// Code size 11 (0xb)
.maxstack 8
IL_0000: ldstr "Hello World!"
IL_0005: call void [mscorlib]System.Console::WriteLine(string)
IL_000a: ret
} // end of method Program::Main
.method public hidebysig specialname rtspecialname
instance void .ctor () cil managed
{
// Method begins at RVA 0x205c
// Code size 7 (0x7)
.maxstack 8
IL_0000: ldarg.0
IL_0001: call instance void [mscorlib]System.Object::.ctor()
IL_0006: ret
} // end of method Program::.ctor
} // end of class Hello_World.Program
</code></pre></div></div>
<p>Then, if we save this to a file called ‘HelloWorld.il’ and run the cmd <code class="language-plaintext highlighter-rouge">ilasm HelloWorld.il /out=HelloWorld.exe</code>, we get the following output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Microsoft (R) .NET Framework IL Assembler. Version 4.5.30319.0
Copyright (c) Microsoft Corporation. All rights reserved.
Assembling 'HelloWorld.il' to EXE --> 'HelloWorld.exe'
Source file is ANSI
HelloWorld.il(38) : warning : Reference to undeclared extern assembly 'mscorlib'. Attempting autodetect
Assembled method Hello_World.Program::Main
Assembled method Hello_World.Program::.ctor
Creating PE file
Emitting classes:
Class 1: Hello_World.Program
Emitting fields and methods:
Global
Class 1 Methods: 2;
Emitting events and properties:
Global
Class 1
Writing PE file
Operation completed successfully
</code></pre></div></div>
<p><strong>Nice, so part 1 is done, we now have our <code class="language-plaintext highlighter-rouge">HelloWorld.exe</code> file!</strong></p>
<h2 id="no-base-class-library">No Base Class Library</h2>
<p>Well, not exactly. One problem is that <code class="language-plaintext highlighter-rouge">System.Console</code> lives in <a href="https://github.com/dotnet/corefx/tree/release/2.2/src/System.Console/src/System">dotnet/corefx</a>, where you can see the different files that make up the implementation, such as <code class="language-plaintext highlighter-rouge">Console.cs</code>, <code class="language-plaintext highlighter-rouge">ConsolePal.Unix.cs</code>, <code class="language-plaintext highlighter-rouge">ConsolePal.Windows.cs</code>, etc.</p>
<p>Fortunately, the nice CoreCLR developers included a simple <code class="language-plaintext highlighter-rouge">Console</code> implementation in <code class="language-plaintext highlighter-rouge">System.Private.CoreLib.dll</code>, the <a href="https://github.com/dotnet/coreclr/tree/master/src/System.Private.CoreLib">managed part of the CoreCLR</a>, which was previously known as <a href="https://github.com/dotnet/coreclr/tree/release/2.2/src/mscorlib">‘mscorlib’</a> (before it <a href="https://github.com/dotnet/coreclr/pull/17926">was renamed</a>). This internal version of <code class="language-plaintext highlighter-rouge">Console</code> is <a href="https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/src/Internal/Console.cs">pretty small and basic</a>, but it provides enough for what we need.</p>
<p>To use this ‘workaround’ we need to edit our <code class="language-plaintext highlighter-rouge">HelloWorld.il</code> to look like this (note the change from <code class="language-plaintext highlighter-rouge">mscorlib</code> to <code class="language-plaintext highlighter-rouge">System.Private.CoreLib</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.class public auto ansi beforefieldinit C
extends [System.Private.CoreLib]System.Object
{
.method public hidebysig static void M () cil managed
{
.entrypoint
// Code size 11 (0xb)
.maxstack 8
IL_0000: ldstr "Hello World!"
IL_0005: call void [System.Private.CoreLib]Internal.Console::WriteLine(string)
IL_000a: ret
} // end of method C::M
...
}
</code></pre></div></div>
<p><strong>Note:</strong> You can achieve the same thing with C# code instead of raw IL, by invoking the C# compiler with the following cmd-line:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>csc -optimize+ -nostdlib -reference:System.Private.CoreLib.dll -out:HelloWorld.exe HelloWorld.cs
</code></pre></div></div>
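<p>For that command-line to work, the C# source has to call the internal <code class="language-plaintext highlighter-rouge">Console</code> rather than <code class="language-plaintext highlighter-rouge">System.Console</code>. A minimal sketch of what <code class="language-plaintext highlighter-rouge">HelloWorld.cs</code> might look like is shown below; it simply mirrors the edited IL above, and the class name is arbitrary:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// HelloWorld.cs: a sketch that mirrors the edited IL above.
// No 'using System;' because System.Console is not available in this setup.
public class C
{
    public static void Main()
    {
        // Internal.Console is the small console type inside System.Private.CoreLib,
        // the same one the IL calls via [System.Private.CoreLib]Internal.Console.
        Internal.Console.WriteLine("Hello World!");
    }
}
</code></pre></div></div>
<p>I’ve used <code class="language-plaintext highlighter-rouge">Main</code> here (instead of the <code class="language-plaintext highlighter-rouge">M</code> in the IL) so that the C# compiler picks it up as the entry point without any extra switches.</p>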
<p><strong>So we’ve completed part 2, we are able to at least print “Hello World” to the screen without using the CoreFX repository!</strong></p>
<hr />
<p>Now this is a nice little trick, but I wouldn’t ever recommend writing real code like this. Compiling against <code class="language-plaintext highlighter-rouge">System.Private.CoreLib</code> isn’t the right way of doing things. What the compiler normally does is compile against the publicly exposed surface area that lives in <a href="https://github.com/dotnet/corefx">dotnet/corefx</a>, but then at run-time a process called <a href="https://docs.microsoft.com/en-us/dotnet/framework/app-domains/type-forwarding-in-the-common-language-runtime">‘Type-Forwarding’</a> is used to make that ‘reference’ implementation in CoreFX map to the ‘real’ implementation in the CoreCLR. For more on this entire process see <a href="https://blog.lextudio.com/the-rough-history-of-referenced-assemblies-7d752d92c18c">The Rough History of Referenced Assemblies</a>.</p>
<p>However, only a <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/#high-level-overview">small amount of managed code</a> (i.e. C#) actually exists in the CoreCLR. To show this, the directory tree for <a href="https://github.com/dotnet/coreclr/tree/master/src/System.Private.CoreLib">/dotnet/coreclr/src/System.Private.CoreLib</a> is <a href="https://gist.github.com/mattwarren/6b36567b51e3adca6c1ca684e72b8f6f">available here</a> and the tree with all ~1280 .cs files included is <a href="https://gist.github.com/mattwarren/abc4e194b71e78eb9fa5a550a379a0a1">here</a>.</p>
<p>As a concrete example, if you look in CoreFX, you’ll see that the <a href="https://github.com/dotnet/corefx/tree/master/src/System.Reflection/src">System.Reflection implementation</a> is pretty empty! That’s because it’s a ‘partial facade’ that is eventually <a href="https://github.com/dotnet/corefx/blob/release/2.2/src/System.Reflection.Emit/src/System.Reflection.Emit.csproj#L19">‘type-forwarded’ to System.Private.CoreLib</a>.</p>
<p>If you’re interested, the entire API that is exposed in CoreFX (but actually lives in CoreCLR) is <a href="https://github.com/dotnet/corefx/blob/master/src/System.Runtime/ref/System.Runtime.cs">contained in System.Runtime.cs</a>. But back to our example, here is the code that describes all the <a href="https://github.com/dotnet/corefx/blob/master/src/System.Runtime/ref/System.Runtime.cs#L3035-L3048"><code class="language-plaintext highlighter-rouge">GetMethod(..)</code> functions</a> in the ‘System.Reflection’ API.</p>
<p>To learn more about ‘type forwarding’, I recommend watching <a href="https://www.youtube.com/watch?v=vg6nR7hS2lI">‘.NET Standard - Under the Hood’</a> (<a href="https://www.slideshare.net/terrajobst/net-standard-under-the-hood">slides</a>) by <a href="https://twitter.com/terrajobst">Immo Landwerth</a> and there is also some more in-depth information in <a href="https://github.com/dotnet/standard/blob/master/docs/history/evolution-of-design-time-assemblies.md">‘Evolution of design time assemblies’</a>.</p>
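<p>One way to see the end result of ‘type forwarding’ for yourself is to ask a few types which assembly they actually live in at run-time. This is just a sketch for a regular .NET Core console app (so it does use CoreFX) and the class name is made up; the exact assembly names printed will depend on the runtime you run it on:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;

class WhereDoTypesLive // hypothetical name
{
    static void Main()
    {
        // A mix of 'runtime' types and a type that is implemented in CoreFX.
        Type[] types = { typeof(object), typeof(string), typeof(MethodInfo), typeof(Console) };

        foreach (Type type in types)
        {
            // Assembly.GetName().Name is the simple name of the assembly that
            // really contains the type, not the reference assembly we compiled against.
            Console.WriteLine(type.FullName + " lives in " + type.Assembly.GetName().Name);
        }
    }
}
</code></pre></div></div>
<p>On .NET Core I’d expect the first three to report <code class="language-plaintext highlighter-rouge">System.Private.CoreLib</code>, even though the compiler only ever saw the ‘reference’ assemblies, whereas <code class="language-plaintext highlighter-rouge">System.Console</code> should report its own CoreFX assembly.</p>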
<p><strong>But why is this code split useful?</strong> From the <a href="https://github.com/dotnet/corefx#net-core-libraries-corefx">CoreFX README</a>:</p>
<blockquote>
<p><strong>Runtime-specific library code</strong> (<a href="https://github.com/dotnet/coreclr/tree/master/src/System.Private.CoreLib">mscorlib</a>) lives in the CoreCLR repo. It needs to be built and versioned in tandem with the runtime. The rest of CoreFX is <strong>agnostic of runtime-implementation and can be run on any compatible .NET runtime</strong> (e.g. <a href="https://github.com/dotnet/corert">CoreRT</a>).</p>
</blockquote>
<p>And from the other point-of-view, in the <a href="https://github.com/dotnet/coreclr#relationship-with-the-corefx-repository">CoreCLR README</a>:</p>
<blockquote>
<p>By itself, the <code class="language-plaintext highlighter-rouge">Microsoft.NETCore.Runtime.CoreCLR</code> package is actually not enough to do much. One reason for this is that the CoreCLR package tries to minimize the amount of the class library that it implements. <strong>Only types that have a strong dependency on the internal workings of the runtime are included</strong> (e.g, <code class="language-plaintext highlighter-rouge">System.Object</code>, <code class="language-plaintext highlighter-rouge">System.String</code>, <code class="language-plaintext highlighter-rouge">System.Threading.Thread</code>, <code class="language-plaintext highlighter-rouge">System.Threading.Tasks.Task</code> and most foundational interfaces).</p>
<p>Instead most of the class library is implemented as independent NuGet packages that simply use the .NET Core runtime as a dependency. Many of the most familiar classes (<code class="language-plaintext highlighter-rouge">System.Collections</code>, <code class="language-plaintext highlighter-rouge">System.IO</code>, <code class="language-plaintext highlighter-rouge">System.Xml</code> and so on), live in packages defined in the <a href="https://github.com/dotnet/corefx">dotnet/corefx</a> repository.</p>
</blockquote>
<p>One <strong>huge benefit</strong> of this approach is that <a href="https://www.mono-project.com/">Mono</a> can share <a href="https://mobile.twitter.com/matthewwarren/status/987292012520067072">large amounts of the CoreFX code</a>, as shown in this tweet:</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">How Mono reuses .NET Core sources for BCL (doesn't include runtime, tools, etc) according to my calculations 🙂 <a href="https://t.co/8JCDxqwnNi">pic.twitter.com/8JCDxqwnNi</a></p>— Egor Bogatov (@EgorBo) <a href="https://twitter.com/EgorBo/status/978737460061458432?ref_src=twsrc%5Etfw">March 27, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<hr />
<h2 id="no-launcher">No Launcher</h2>
<p>So far we’ve ‘compiled’ our code (well technically ‘assembled’ it) and we’ve been able to access a simple version of <code class="language-plaintext highlighter-rouge">System.Console</code>, but how do we actually run our <code class="language-plaintext highlighter-rouge">.exe</code>? Remember we can’t use the <code class="language-plaintext highlighter-rouge">dotnet run</code> command because that lives in the <a href="https://github.com/dotnet/cli/tree/release/2.2.2xx/src/dotnet/commands/dotnet-run">dotnet/CLI</a> repository (and that would be breaking the rules of this <em>slightly contrived</em> challenge!!).</p>
<p>Again, fortunately those clever runtime engineers have thought of this exact scenario and they built the very helpful <code class="language-plaintext highlighter-rouge">corerun</code> application. You can read more about it in <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/workflow/UsingCoreRun.md">Using corerun To Run .NET Core Application</a>, but the tl;dr is that it will only look for dependencies in the same folder as your .exe.</p>
<p>So, to complete the challenge, we can now run <code class="language-plaintext highlighter-rouge">CoreRun HelloWorld.exe</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># CoreRun HelloWorld.exe
Hello World!
</code></pre></div></div>
<p><strong>Yay, the least impressive demo you’ll see this year!!</strong></p>
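<p>If you want to see what a host has handed to the runtime, a rough sketch like the following can help. It assumes the host sets the <code class="language-plaintext highlighter-rouge">TRUSTED_PLATFORM_ASSEMBLIES</code> property (the conventional name for the list of assemblies the runtime is allowed to load), it uses the full <code class="language-plaintext highlighter-rouge">System.Console</code> so it steps outside the rules of the challenge, and the class name is made up:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;

class ShowHostProperties // hypothetical name
{
    static void Main()
    {
        Console.WriteLine("App base directory: " + AppContext.BaseDirectory);

        // Assumption: the host passed the assembly list under this property name,
        // other hosts may not set it at all, in which case GetData returns null.
        string tpa = AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES") as string;
        if (tpa == null)
        {
            Console.WriteLine("No TRUSTED_PLATFORM_ASSEMBLIES property was set by the host.");
            return;
        }

        // The list is separated with the platform's path separator (';' or ':').
        foreach (string path in tpa.Split(Path.PathSeparator))
        {
            Console.WriteLine(Path.GetFileName(path));
        }
    }
}
</code></pre></div></div>
<p>Comparing the output under different hosts is a quick way to see which assemblies each one makes available to your application.</p>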
<p>For more information on how you can ‘host’ the CLR in your application I recommend this excellent tutorial <a href="https://docs.microsoft.com/en-us/dotnet/core/tutorials/netcore-hosting">Write a custom .NET Core host to control the .NET runtime from your native code</a>. In addition, the docs page on <a href="https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/a51xd4ze(v=vs.100)">‘Runtime Hosts’</a> gives a nice overview of the different hosts that are available:</p>
<blockquote>
<p>The .NET Framework ships with a number of different runtime hosts, including the hosts listed in the following table.</p>
<table>
<thead>
<tr>
<th>Runtime Host</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ASP.NET</td>
<td>Loads the runtime into the process that is to handle the Web request. ASP.NET also creates an application domain for each Web application that will run on a Web server.</td>
</tr>
<tr>
<td>Microsoft Internet Explorer</td>
<td>Creates application domains in which to run managed controls. The .NET Framework supports the download and execution of browser-based controls. The runtime interfaces with the extensibility mechanism of Microsoft Internet Explorer through a mime filter to create application domains in which to run the managed controls. By default, one application domain is created for each Web site.</td>
</tr>
<tr>
<td>Shell executables</td>
<td>Invokes runtime hosting code to transfer control to the runtime each time an executable is launched from the shell.</td>
</tr>
</tbody>
</table>
</blockquote>
Open Source .NET – 4 years later2018-12-04T00:00:00+00:00http://www.mattwarren.org/2018/12/04/Open-Source-.Net-4-years-later.
<link rel="stylesheet" href="/datavis/dotnet-oss.css" />
<script src="/datavis/dotnet-oss.js" type="text/javascript"></script>
<p>A little over 4 years ago Microsoft announced that they were <a href="http://www.hanselman.com/blog/AnnouncingNET2015NETAsOpenSourceNETOnMacAndLinuxAndVisualStudioCommunity.aspx">open sourcing large parts of the .NET framework</a> and as this slide from <a href="https://www.slideshare.net/jongalloway/net-core-previews-new-features-in-net-core-and-aspnet-core-21-blazor-and-more#8">New Features in .NET Core and ASP.NET Core 2.1</a> shows, the community has been contributing in a significant way:</p>
<p><a href="https://twitter.com/jongalloway/status/974064785397395456"><img src="/images/2018/12/NET Open Source Success.jpg" alt=".NET Open Source Success" /></a></p>
<p><strong>Side-note</strong>: This post forms part of an on-going series, if you want to see how things have changed over time you can check out the previous ones:</p>
<ul>
<li><a href="/2017/12/19/Open-Source-.Net-3-years-later?recommended=1">Open Source .NET – 3 years later</a></li>
<li><a href="/2016/11/23/open-source-net-2-years-later?recommended=1">Open Source .NET – 2 years later</a></li>
<li><a href="/2016/01/15/open-source-net-1-year-later-now-with-aspnet?recommended=1">Open Source .NET – 1 year later - Now with ASP.NET</a></li>
<li><a href="/2015/12/08/open-source-net-1-year-later?recommended=1">Open Source .NET – 1 year later</a></li>
</ul>
<hr />
<h2 id="runtime-changes">Runtime Changes</h2>
<p>Before I look at the numbers, I just want to take a moment to look at the <strong>significant</strong> runtime changes that have taken place over the last 4 years. Partly because I really like looking at the <a href="/tags/#Internals">‘Internals’ of CoreCLR</a>, but also because the runtime is the one repository that makes all the others possible: they all rely on it!</p>
<p>To give some context, here are the slides from a presentation I did called <a href="https://www.updateconference.net/en/session/from--dotnet-run--to--hello-world--">‘From ‘dotnet run’ to ‘hello world’</a>. If you flick through them you’ll see what components make up the CoreCLR code-base and what they do to make your application run.</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/xU98KRbWFvU2SC?startSlide=8" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen=""> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/mattwarren/from-dotnet-run-to-hello-world" title="From 'dotnet run' to 'hello world'" target="_blank">From 'dotnet run' to 'hello world'</a> </strong> from <strong><a href="//www.slideshare.net/mattwarren" target="_blank">Matt Warren</a></strong> </div>
<p>So, after a bit of digging through the <a href="https://github.com/dotnet/coreclr">19,059 commits</a>, <a href="https://github.com/dotnet/coreclr/issues">5,790 issues</a> and <a href="https://github.com/dotnet/coreclr/projects">the 8 projects</a>, here’s the list of <strong>significant</strong> changes in the <strong>.NET Core Runtime (CoreCLR)</strong> over the last few years (if I’ve missed any out, please let me know!!):</p>
<ul>
<li><strong><code class="language-plaintext highlighter-rouge">Span&lt;T&gt;</code></strong> (<a href="https://msdn.microsoft.com/en-us/magazine/mt814808.aspx?f=255&MSPPError=-2147217396">more info</a>)
<ul>
<li><a href="https://github.com/dotnet/coreclr/issues/5851">Span&lt;T&gt;</a> (‘umbrella’ issue for the whole feature)
<ul>
<li>Includes changes to multiple parts of the runtime: the VM, JIT and GC</li>
</ul>
</li>
<li><a href="https://github.com/Microsoft/dotnet/issues/770">Will .NET Core 2.1’s Span-based APIs be made available on the .NET Framework? If so, when?</a></li>
<li>Also needed <strong>CoreFX</strong> work such as <a href="https://github.com/dotnet/corefx/issues/21281">Add initial Span/Buffer-based APIs across corefx</a> and <a href="https://github.com/dotnet/corefx/issues/21395">String-like extension methods to ReadOnlySpan&lt;char&gt; Epic</a> and <strong>Compiler</strong> changes, e.g. <a href="https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md">Compile time enforcement of safety for ref-like types</a></li>
</ul>
</li>
<li><strong><code class="language-plaintext highlighter-rouge">ref-like</code> types</strong> (to support <code class="language-plaintext highlighter-rouge">Span&lt;T&gt;</code>)
<ul>
<li><a href="https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md#generalized-ref-like-types-in-source-code">‘Generalized ref-like types in source code.’</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/15745">Detect ByRefLike types using attribute</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/18280">Interpretation of ByRefLikeAttribute in .NET Core 2.1 is a breaking change and a standard violation</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=IsByRefLike&type=">Search for ‘IsByRefLike’ in the CoreCLR source code</a></li>
</ul>
</li>
<li><strong>Tiered Compilation</strong> (<a href="https://blogs.msdn.microsoft.com/dotnet/2018/08/02/tiered-compilation-preview-in-net-core-2-1/">more info</a>)
<ul>
<li><a href="https://github.com/dotnet/coreclr/search?o=asc&p=3&q=tiered+compilation&s=author-date&type=Commits">Tiered Compilation step 1</a>, <a href="https://github.com/dotnet/coreclr/pull/14612">profiler changes for tiered compilation</a>, <a href="https://github.com/dotnet/coreclr/pull/17476">Fix x86 steady state tiered compilation performance</a></li>
<li>Also see the more general <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/code-versioning.md">‘Code Versioning’ design doc</a> and <a href="https://github.com/dotnet/coreclr/pull/19525">Enable Tiered Compilation by default</a></li>
</ul>
</li>
<li><strong>Cross-platform</strong> (Unix, OS X, etc, see list of all <a href="https://github.com/dotnet/coreclr/labels?utf8=%E2%9C%93&q=os-">‘os-xxx’ labels</a>)
<ul>
<li><a href="https://github.com/dotnet/coreclr/issues/170">Support building mscorlib on UNIX systems</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/177">Implement stack unwinding and exceptions for Linux</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/453">Inital build support for FreeBSD</a> and <a href="https://github.com/dotnet/coreclr/pull/827">Complete FreeBSD bringup</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/117">Initial Mac OSX Support (PR)</a> and the <a href="https://github.com/dotnet/coreclr/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Akangaroo+is%3Aclosed+OSX">rest of the work</a>!!</li>
<li><a href="https://praeclarum.org/2015/02/09/building-and-running-nets-coreclr-on-os-x.html">Building and Running .NET’s CoreCLR on OS X</a></li>
</ul>
</li>
<li><strong>New CPU Architectures</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/projects/2">ARM64 Project</a></li>
<li><a href="https://github.com/dotnet/coreclr/projects/4">ARM32 Project</a></li>
<li>List of all issues <a href="https://github.com/dotnet/coreclr/labels?utf8=%E2%9C%93&q=arch-">labelled ‘arch-xxx’</a></li>
</ul>
</li>
<li><strong>Hardware Intrinsics</strong> (<a href="https://github.com/dotnet/coreclr/projects/7">project</a>)
<ul>
<li><a href="https://github.com/dotnet/designs/blob/master/accepted/platform-intrinsics.md">Design Document</a></li>
<li><a href="https://blogs.msdn.microsoft.com/dotnet/2018/10/10/using-net-hardware-intrinsics-api-to-accelerate-machine-learning-scenarios/">Using .NET Hardware Intrinsics API to accelerate machine learning scenarios</a> contains a nice overview of the implementation</li>
</ul>
</li>
<li><strong>Default Interface Methods</strong> (<a href="https://github.com/dotnet/coreclr/projects/6">project</a>)
<ul>
<li>Runtime support for the <a href="https://github.com/dotnet/csharplang/blob/0a4aa03e3767805b85b606f8e58559f089bc9337/proposals/default-interface-methods.md">default interface methods</a> C# language feature.</li>
</ul>
</li>
<li><strong>Performance Monitoring</strong> and <strong>Diagnostics</strong> (<a href="https://github.com/dotnet/coreclr/projects/5">project</a>)
<ul>
<li><a href="https://github.com/dotnet/designs/blob/master/accepted/cross-platform-performance-monitoring.md">Cross-Platform Performance Monitoring Design</a> and <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/cross-platform-performance-and-eventing.md">NET Cross-Plat Performance and Eventing Design</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/1598">Enable Lttng Logging for CoreClr</a></li>
<li><a href="https://lttng.org/blog/2018/08/28/bringing-dotnet-perf-analysis-to-linux/">Bringing .NET application performance analysis to Linux</a></li>
</ul>
</li>
<li><strong>Ready-to-Run Images</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/readytorun-overview.md">ReadyToRun Overview</a></li>
<li><a href="https://blogs.msdn.microsoft.com/dotnet/2018/08/20/bing-com-runs-on-net-core-2-1/">Bing.com runs on .NET Core 2.1!</a> (section on ‘ReadyToRun Images’)</li>
</ul>
</li>
<li><strong>LocalGC</strong> (<a href="https://github.com/dotnet/coreclr/projects/3">project</a>)
<ul>
<li>See it in action in <a href="http://tooslowexception.com/tag/garbagecollector/">Zero Garbage Collector for .NET Core</a> and the follow-up <a href="http://tooslowexception.com/zero-garbage-collector-for-net-core-2-1-and-asp-net-core-2-1/">Zero Garbage Collector for .NET Core 2.1 and ASP.NET Core 2.1</a></li>
</ul>
</li>
<li><strong>Unloadability</strong> (<a href="https://github.com/dotnet/coreclr/projects/9">project</a>)
<ul>
<li>Support for unloading <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/assemblyloadcontext.md">AssemblyLoadContext</a> and all assemblies loaded into it.</li>
</ul>
</li>
</ul>
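<p>To make the first item in the list above a little more concrete, here is a minimal sketch of what <code class="language-plaintext highlighter-rouge">Span&lt;T&gt;</code> makes possible: working on a slice of existing memory without copying it. The class and method names (<code class="language-plaintext highlighter-rouge">SpanSketch</code>, <code class="language-plaintext highlighter-rouge">SumDigits</code>, <code class="language-plaintext highlighter-rouge">SumMiddle</code>) are made up for illustration, and it assumes .NET Core 2.1 or later, where <code class="language-plaintext highlighter-rouge">AsSpan()</code> is available for <code class="language-plaintext highlighter-rouge">string</code>.</p>

```csharp
using System;

// Hypothetical helper, only here to illustrate slicing with spans.
public static class SpanSketch
{
    // Adds up every digit character found in the given view of memory.
    public static int SumDigits(ReadOnlySpan<char> text)
    {
        int total = 0;
        foreach (char c in text)
        {
            if (c >= '0' && c <= '9')
            {
                total += c - '0';
            }
        }
        return total;
    }

    // Looks at only part of the input string. AsSpan() and Slice() create
    // a view over the string's existing memory, so no substring is allocated.
    public static int SumMiddle(string input, int start, int length)
    {
        ReadOnlySpan<char> window = input.AsSpan().Slice(start, length);
        return SumDigits(window);
    }
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">SpanSketch.SumMiddle("ab123cd", 2, 3)</code> only inspects the <code class="language-plaintext highlighter-rouge">"123"</code> part of the string and returns 6, with no intermediate string created along the way.</p>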
<p>So there have been quite a few large, fundamental changes to the runtime since it’s been open-sourced.</p>
<hr />
<h2 id="repository-activity-over-time">Repository activity over time</h2>
<p>But onto the data, first we are going to look at an overview of the <strong>level of activity in each repo</strong>, by analysing the total number of ‘<strong>Issues</strong>’ (created) or ‘<strong>Pull Requests</strong>’ (closed) per month. (<a href="http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR">Sparklines FTW!!</a>). If you are interested in <em>how</em> I got the data, see the previous post <a href="/2016/11/23/open-source-net-2-years-later#methodology---community-v-microsoft">because the process is the same</a>.</p>
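<p>The real data comes from GitHub (see the linked post for how), but the ‘per month’ bucketing itself is simple. Here is a rough sketch, assuming you already have the relevant timestamps (creation dates for Issues, closed dates for Pull Requests) as <code class="language-plaintext highlighter-rouge">DateTime</code> values. The <code class="language-plaintext highlighter-rouge">MonthlyActivity</code> class is hypothetical and is not the code used to produce the charts:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: buckets timestamps into "yyyy-MM" keys and counts them.
public static class MonthlyActivity
{
    public static SortedDictionary<string, int> CountPerMonth(IEnumerable<DateTime> timestamps)
    {
        // Ordinal ordering of "yyyy-MM" keys is also chronological ordering.
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (DateTime timestamp in timestamps)
        {
            string month = timestamp.ToString("yyyy-MM", CultureInfo.InvariantCulture);

            // TryGetValue leaves 'current' at 0 when the month has not been seen yet.
            counts.TryGetValue(month, out int current);
            counts[month] = current + 1;
        }
        return counts;
    }
}
```

<p>Each entry in the result corresponds to one point on a sparkline, so a repo with 2 issues created in November 2018 would end up with <code class="language-plaintext highlighter-rouge">counts["2018-11"] == 2</code>.</p>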
<p><strong>Note:</strong> Numbers in <span style="color:rgb(0,0,0);font-weight:bold;">black</span> are from the most recent month, with the <span style="color:#d62728;font-weight:bold;">red</span> dot showing the lowest and the <span style="color:#2ca02c;font-weight:bold;">green</span> dot the highest previous value. You can toggle between <strong>Issues</strong> and <strong>Pull Requests</strong> by clicking on the buttons, hover over individual sparklines to get a tooltip showing the per/month values and click on the project name to take you to the GitHub page for that repository.</p>
<section class="press" align="center">
<button id="btnIssues" class="active">Issues</button>
<button id="btnPRs">Pull Requests</button>
</section>
<div id="textbox" class="rChartHeader">
<!-- The Start/End dates are setup dynamically, once the data is loaded -->
<p id="dataStartDate" class="alignleft"></p>
<p id="dataEndDate" class="alignright"></p>
</div>
<div style="clear: both;"></div>
<!-- All the sparklines are added to this div -->
<div id="sparkLines" class="rChart nvd3">
</div>
<p>This data gives a good indication of how healthy different repos are, i.e. whether they are growing over time or staying the same. You can also see the different levels of activity each repo has and how they compare to other ones.</p>
<p>Whilst it’s clear that <a href="https://github.com/microsoft/vscode">Visual Studio Code</a> is way ahead of all the other repos (in ‘# of Issues’), it’s interesting to see that some of the .NET-only ones are still pretty large, notably CoreFX (base-class libraries), Roslyn (compiler) and CoreCLR (runtime).</p>
<hr />
<h2 id="overall-participation---community-v-microsoft">Overall Participation - Community v. Microsoft</h2>
<p>Next we will look at the <strong>total participation</strong> from the last 4 years, i.e. <strong>November 2014</strong> to <strong>November 2018</strong>. All <em>Pull Requests</em> and <em>Issues</em> are treated equally, so a large PR counts the same as one that fixes a speling mistake. Whilst this isn’t ideal it’s the simplest way to get an idea of the <strong>Microsoft/Community split</strong>. In addition, <em>Community</em> does include people paid by other companies to work on .NET Projects, for instance <a href="https://github.com/dotnet/coreclr/search?q=Samsung.com&unscoped_q=Samsung.com&type=Commits">Samsung Engineers</a>.</p>
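<p>As a rough illustration of the ‘everything counts equally’ approach, the split boils down to something like the sketch below. It assumes you have the author login for each Issue or PR and a set of logins known to belong to Microsoft employees (how that set is built is not shown here). <code class="language-plaintext highlighter-rouge">ParticipationSplit</code> and its members are hypothetical names, not the code behind the charts:</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: every Issue/PR counts as one, regardless of its size.
public static class ParticipationSplit
{
    // 'authors' holds one login per Issue/PR, so repeat contributors appear many times.
    public static (int Microsoft, int Community) Count(IEnumerable<string> authors, ISet<string> microsoftLogins)
    {
        int microsoft = 0;
        int community = 0;
        foreach (string author in authors)
        {
            if (microsoftLogins.Contains(author))
            {
                microsoft++;
            }
            else
            {
                // Anyone not in the Microsoft set is 'Community',
                // including people paid by other companies.
                community++;
            }
        }
        return (microsoft, community);
    }

    // Converts the raw counts into the community percentage.
    public static double CommunityPercentage(int microsoft, int community)
    {
        int total = microsoft + community;
        return total == 0 ? 0.0 : 100.0 * community / total;
    }
}
```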
<p><strong>Note:</strong> You can hover over the bars to get the actual numbers, rather than percentages.</p>
<body>
<div class="g-chart-issues">
<span style="font-weight:bold;font-size:large;"> Issues: </span>
<span style="color:#9ecae1;font-weight:bold;font-size:large;margin-left:5px;"> Microsoft </span>
<span style="color:#3182bd;font-weight:bold;font-size:large;margin-left:5px;"> Community </span>
</div>
<div class="g-chart-pull-requests">
<span style="font-weight:bold;font-size:large;"> Pull Requests: </span>
<span style="color:#a1d99b;font-weight:bold;font-size:large;margin-left:5px;"> Microsoft </span>
<span style="color:#31a354;font-weight:bold;font-size:large;margin-left:5px;"> Community </span>
</div>
</body>
<hr />
<h2 id="participation-over-time---community-v-microsoft">Participation over time - Community v. Microsoft</h2>
<p>Finally we can see the <strong>‘per-month’</strong> data from the last 4 years, i.e. <strong>November 2014</strong> to <strong>November 2018</strong>.</p>
<p><strong>Note</strong>: You can inspect different repos by selecting them from the pull-down list, but be aware that the y-axes on the graphs are re-scaled, so the maximum value will change each time.</p>
<div id="issuesGraph">
<span style="font-weight:bold;font-size:larger;margin-left:30px;"> Issues: </span>
<span style="color:#9ecae1;font-weight:bold;font-size:larger;margin-left:5px;"> Microsoft </span>
<span style="color:#3182bd;font-weight:bold;font-size:larger;margin-left:5px;"> Community </span>
</div>
<div id="pullRequestsGraph">
<span style="font-weight:bold;font-size:larger;margin-left:30px;"> Pull Requests: </span>
<span style="color:#a1d99b;font-weight:bold;font-size:larger;margin-left:5px;"> Microsoft </span>
<span style="color:#31a354;font-weight:bold;font-size:larger;margin-left:5px;"> Community </span>
</div>
<hr />
<h2 id="summary">Summary</h2>
<p>It’s clear that the community continues to be invested in the .NET-related, Open Source repositories, contributing significantly and for a sustained period of time. I think this is good for <em>all .NET developers</em>, whether you contribute to OSS or not: having .NET be a <strong>thriving, Open Source product</strong> has many benefits!</p>
A History of .NET Runtimes2018-10-02T00:00:00+00:00http://www.mattwarren.org/2018/10/02/A-History-of-.NET-Runtimes
<p>Recently I was fortunate enough to chat with <a href="https://twitter.com/Chrisdunelm">Chris Bacon</a> who wrote <a href="https://github.com/chrisdunelm/DotNetAnywhere">DotNetAnywhere</a> (<a href="/2017/10/19/DotNetAnywhere-an-Alternative-.NET-Runtime/">an alternative .NET Runtime</a>) and I quipped with him:</p>
<blockquote>
<p>.. you’re probably one of only a <strong>select group</strong>(*) of people who’ve written a .NET runtime, that’s pretty cool!</p>
</blockquote>
<p>* if you exclude people who were paid to work on one, i.e. Microsoft/Mono/Xamarin engineers, it’s a <em>very</em> select group.</p>
<p>But it got me thinking, <strong>how many .NET Runtimes are there</strong>? I put together my own list, then enlisted a crack team of highly-paid researchers, a.k.a my twitter followers:</p>
<blockquote class="twitter-tweet" data-cards="hidden" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/LazyWeb?src=hash&ref_src=twsrc%5Etfw">#LazyWeb</a>, fun Friday quiz, how many different .NET Runtimes are there? (that implement ECMA-335 <a href="https://t.co/76stuYZLrw">https://t.co/76stuYZLrw</a>)<br />- .NET Framework<br />- .NET Core<br />- Mono<br />- Unity<br />- .NET Compact Framework<br />- DotNetAnywhere<br />- Silverlight<br />What have I missed out?</p>— Matt Warren (@matthewwarren) <a href="https://twitter.com/matthewwarren/status/1040622340739088384?ref_src=twsrc%5Etfw">September 14, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>For the purposes of this post I’m classifying a ‘<em>.NET Runtime</em>’ as anything that implements the <a href="/2018/04/06/Taking-a-look-at-the-ECMA-335-Standard-for-.NET/">ECMA-335 Standard for .NET</a> (more info <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/dotnet-standards.md">here</a>). I don’t know if there’s a more precise definition or even some way of officially verifying conformance, but in practice it means that the runtimes can take a <strong>.NET exe/dll produced by any C#/F#/VB.NET compiler and run it</strong>.</p>
<p>Once I had the list, I made copious use of Wikipedia (see the <a href="#references">list of ‘References’</a>) and came up with the following timeline:</p>
<iframe width="100%" height="400" src="https://time.graphics/embed?v=1&id=132735" frameborder="0" allowfullscreen=""></iframe>
<div><a style="font-size: 12px; text-decoration: none;" title="Timeline maker" href="https://time.graphics">Timeline maker</a></div>
<p>(If the interactive timeline isn’t working for you, take a look at <a href="/images/2018/10/History of .NET Runtimes - Timeline.png">this version</a>)</p>
<p><strong>If I’ve missed out any runtimes, please let me know!</strong></p>
<p>To make the timeline a bit easier to understand, I put each runtime into one of the following categories:</p>
<ol>
<li>
<font color="#f56c00" style="font-weight: bold;">Microsoft .NET Frameworks</font>
</li>
<li>
<font color="#5b0be9" style="font-weight: bold;">Other Microsoft Runtimes</font>
</li>
<li>
<font color="#46cc12" style="font-weight: bold;">Mono/Xamarin Runtimes</font>
</li>
<li>
<font color="#e9140b" style="font-weight: bold;">'Ahead-of-Time' (AOT) Runtimes</font>
</li>
<li>
<font color="#587934" style="font-weight: bold;">Community Projects</font>
</li>
<li>
<font color="#ec1954" style="font-weight: bold;">Research Projects</font>
</li>
</ol>
<p><strong>The rest of the post will look at the different runtimes in more detail. <em>Why</em> they were created, <em>What</em> they can do and <em>How</em> they compare to each other.</strong></p>
<hr />
<h2><font color="#f56c00" style="font-weight: bold;">Microsoft .NET Frameworks</font></h2>
<p>The original ‘.NET Framework’ was started by Microsoft in the late 1990’s and has been going strong ever since. Recently they’ve changed course somewhat with the announcement of <a href="https://blogs.msdn.microsoft.com/dotnet/2016/06/27/announcing-net-core-1-0/">.NET Core</a>, which is ‘<em>open-source</em>’ and ‘<em>cross-platform</em>’. In addition, by creating the <a href="https://blogs.msdn.microsoft.com/dotnet/2017/08/14/announcing-net-standard-2-0/">.NET Standard</a> they’ve provided a way for different runtimes to remain compatible:</p>
<blockquote>
<p><strong>.NET Standard is for sharing code.</strong> .NET Standard is a set of APIs that all .NET implementations must provide to conform to the standard. This unifies the .NET implementations and prevents future fragmentation.</p>
</blockquote>
<p>As an aside, if you want more information on the ‘History of .NET’, I really recommend <a href="https://channel9.msdn.com/Blogs/TheChannel9Team/Anders-Hejlsberg-What-brought-about-the-birth-of-the-CLR">Anders Hejlsberg - What brought about the birth of the CLR?</a> and this presentation by <a href="https://twitter.com/richcampbell">Richard Campbell</a> who <em>really</em> knows how to tell a story!</p>
<iframe width="711" height="400" src="https://www.youtube.com/embed/FFCn_z7dn_A" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>
<p>(Also <a href="https://dotnetrocks.com/?show=1500">available as a podcast</a> if you’d prefer and he’s <a href="https://twitter.com/richcampbell/status/966199852278403072">working on a book covering the same subject</a>. If you want to learn more about the history of the entire ‘<em>.NET Ecosystem</em>’ not just the Runtimes, check out <a href="http://corefx.strikingly.com/">‘Legends of .NET’</a>)</p>
<h2><font color="#5b0be9" style="font-weight: bold;">Other Microsoft Runtimes</font></h2>
<p>But outside of the main <em>general purpose</em> ‘.NET Framework’, Microsoft have also released other runtimes, designed for specific scenarios.</p>
<h3><font color="#5b0be9" style="font-weight: bold;">.NET Compact Framework</font></h3>
<p>The <em>Compact</em> (.NET CF) and <em>Micro</em> (.NET MF) Frameworks were both attempts to provide cut-down runtimes that would run on more constrained devices, for instance <a href="https://en.wikipedia.org/wiki/.NET_Compact_Framework">.NET CF</a>:</p>
<blockquote>
<p>… is designed to run on resource constrained mobile/embedded devices such as personal digital assistants (PDAs), mobile phones, factory controllers, set-top boxes, etc. The .NET Compact Framework uses some of the same class libraries as the full .NET Framework and also a few libraries designed specifically for mobile devices such as .NET Compact Framework controls. However, the libraries are not exact copies of the .NET Framework; they are scaled down to use less space.</p>
</blockquote>
<h3><font color="#5b0be9" style="font-weight: bold;">.NET Micro Framework</font></h3>
<p>The <a href="https://en.wikipedia.org/wiki/.NET_Micro_Framework">.NET MF</a> is even more constrained:</p>
<blockquote>
<p>… for resource-constrained devices with at least 256 KB of flash and 64 KB of random-access memory (RAM). It includes a small version of the .NET Common Language Runtime (CLR) and supports development in C#, Visual Basic .NET, and debugging (in an emulator or on hardware) using Microsoft Visual Studio. NETMF features a subset of the .NET base class libraries (about 70 classes with about 420 methods),..
NETMF also features added libraries specific to embedded applications. It is free and open-source software released under Apache License 2.0.</p>
</blockquote>
<p>If you want to try it out, Scott Hanselman did a nice write-up <a href="https://www.hanselman.com/blog/TheNETMicroFrameworkHardwareForSoftwarePeople.aspx">The .NET Micro Framework - Hardware for Software People</a>.</p>
<h3><font color="#5b0be9" style="font-weight: bold;">Silverlight</font></h3>
<p>Although now only in <a href="https://support.microsoft.com/en-gb/lifecycle/search/12905">support mode</a> (or <a href="https://www.quora.com/Is-SilverLight-dead">‘dead’</a>/<a href="https://www.infragistics.com/community/blogs/b/engineering/posts/the-sunset-of-silverlight">‘sunsetted’</a> depending on your POV), it’s interesting to go back to the original announcement and see what <a href="https://weblogs.asp.net/scottgu/silverlight">Silverlight was trying to do</a>:</p>
<blockquote>
<p>Silverlight is a cross platform, cross browser .NET plug-in that enables designers and developers to build rich media experiences and RIAs for browsers. The preview builds we released this week currently support Firefox, Safari and IE browsers on both the Mac and Windows.</p>
</blockquote>
<p>Back in 2007, Silverlight 1.0 had <a href="https://weblogs.asp.net/scottgu/silverlight-1-0-released-and-silverlight-for-linux-announced">the following features</a> (it even worked on Linux!):</p>
<blockquote>
<ul>
<li>Built-in codec support for playing VC-1 and WMV video, and MP3 and WMA audio within a browser…</li>
<li>Silverlight supports the ability to progressively download and play media content from any web-server…</li>
<li>Silverlight also optionally supports built-in media streaming…</li>
<li>Silverlight enables you to create rich UI and animations, and blend vector graphics with HTML to create compelling content experiences…</li>
<li>Silverlight makes it easy to build rich video player interactive experiences…</li>
</ul>
</blockquote>
<h2><font color="#46cc12" style="font-weight: bold;">Mono/Xamarin Runtimes</font></h2>
<p>Mono came about when Miguel de Icaza and others explored the possibility of making .NET work on Linux (from <a href="https://www.mono-project.com/archived/mailpostearlystory/">Mono early history</a>):</p>
<blockquote>
<p>Who came first is not an important question to me, because Mono to me is a means to an end: a technology to help Linux succeed on the desktop.</p>
</blockquote>
<p>The <a href="https://www.mono-project.com/archived/mailpostearlystory/">same post</a> also talks about how it started:</p>
<blockquote>
<p>On the Mono side, the events were approximately like this:</p>
<p>As soon as the .NET documents came out in December 2000, I got really interested in the technology, and started where everyone starts: at the byte code interpreter, <strong>but I faced a problem: there was no specification for the metadata though</strong>.</p>
<p>The last modification to the early VM sources was done on January 22 2001, around that time I started posting to the .NET mailing lists asking for the missing information on the metadata file format.</p>
<p>…</p>
<p>About this time Sam Ruby was pushing at the ECMA committee to get the binary file format published, something that was not part of the original agenda. I do not know how things developed, but <strong>by April 2001 ECMA had published the file format</strong>.</p>
</blockquote>
<p>Over time, Mono (now <a href="https://tirania.org/blog/archive/2011/May-16.html">Xamarin</a>) has branched out into wider areas. It runs on <a href="https://github.com/xamarin/xamarin-android">Android</a> and <a href="https://github.com/xamarin/xamarin-macios">iOS/Mac</a> and was acquired by Microsoft in <a href="https://blogs.microsoft.com/blog/2016/02/24/microsoft-to-acquire-xamarin-and-empower-more-developers-to-build-apps-on-any-device/">Feb 2016</a>. In addition Unity & Mono/Xamarin have <a href="https://tirania.org/blog/archive/2009/Apr-09.html">long worked together</a>, to provide <a href="https://tirania.org/blog/archive/2007/Aug-31-1.html">C# support in Unity</a> and Unity is now a <a href="https://blogs.unity3d.com/2016/04/01/unity-joins-the-net-foundation/">member of the .NET Foundation</a>.</p>
<h2><font color="#e9140b" style="font-weight: bold;">'Ahead-of-Time' (AOT) Runtimes</font></h2>
<p>I wanted to include AOT runtimes as a separate category, because traditionally .NET has been <a href="/2017/12/15/How-does-.NET-JIT-a-method-and-Tiered-Compilation/#how-it-works">‘Just-in-Time’ Compiled</a>, but over time more and more ‘Ahead-of-Time’ compilation options have become available.</p>
<p>As far as I can tell, Mono was the first, with an <a href="https://tirania.org/blog/archive/2006/Aug-17.html">‘AOT’ mode since Aug 2006</a>, but recently, Microsoft have released <a href="https://docs.microsoft.com/en-us/dotnet/framework/net-native/">.NET Native</a> and they’re working on <a href="/2018/06/07/CoreRT-.NET-Runtime-for-AOT/">CoreRT - A .NET Runtime for AOT</a>.</p>
<h2><font color="#587934" style="font-weight: bold;">Community Projects</font></h2>
<p>However, not all ‘<em>.NET Runtimes’</em> were developed by Microsoft, or companies that they later acquired. There are some ‘<em>Community</em>’ owned ones:</p>
<ul>
<li>The oldest is <a href="http://www.gnu.org/software/dotgnu/pnet.html">DotGNU Portable.NET</a>, which started at the same time as Mono, with the goal ‘<em>to build a suite of Free Software tools to compile and execute applications for the Common Language Infrastructure (CLI)..</em>’.</li>
<li>Secondly, there is <a href="/2017/10/19/DotNetAnywhere-an-Alternative-.NET-Runtime/">DotNetAnywhere</a>, the work of just one person, <a href="https://twitter.com/Chrisdunelm">Chris Bacon</a>. DotNetAnywhere has the <em>claim to fame</em> that it provided the <a href="http://blog.stevensanderson.com/2017/11/05/blazor-on-mono/">initial runtime</a> for the Blazor project. However it’s also an excellent resource if you want to look at what makes up a ‘.NET Compatible-Runtime’ and don’t have the time to wade through the millions of lines-of-code that make up the <a href="https://github.com/dotnet/coreclr/">CoreCLR</a>!</li>
<li>Next comes <a href="https://www.gocosmos.org/">CosmosOS</a> (<a href="https://github.com/CosmosOS/Cosmos">GitHub project</a>), which is not just a .NET Runtime, but a ‘<em>Managed Operating System</em>’. If you want to see how it achieves this I recommend reading through the <a href="https://www.gocosmos.org/faq/">excellent FAQ</a> or taking a <a href="https://github.com/CosmosOS/Cosmos/wiki/Quick-look-under-the-hood">quick look under the hood</a>. Another similar effort is <a href="https://en.wikipedia.org/wiki/SharpOS">SharpOS</a>.</li>
<li>Finally, I recently stumbled across <a href="https://web.archive.org/web/20090425073609/http://crossnet.codeplex.com/">CrossNet</a>, which takes a different approach, it ‘<em>parses .NET assemblies and generates unmanaged C++ code that can be compiled on any standard C++ compiler.’</em> Take a look at the <a href="https://web.archive.org/web/20090426113345/http://crossnet.codeplex.com:80/Wiki/View.aspx?title=overview">overview docs</a> and <a href="https://web.archive.org/web/20090426114553/http://crossnet.codeplex.com:80/Wiki/View.aspx?title=Examples%20of%20generated%20code">example of generated code</a> to learn more.</li>
</ul>
<h2><font color="#ec1954" style="font-weight: bold;">Research Projects</font></h2>
<p>Finally, onto the more esoteric .NET Runtimes. These are the <em>Research Projects</em> run by Microsoft, with the aim of seeing just how far you can extend a ‘managed runtime’ and what it can be used for. Some of this research work has made its way back into commercial/shipping .NET Runtimes, for instance <a href="https://twitter.com/funcofjoe/status/943671450677927936">Span&lt;T&gt; came from Midori</a>.</p>
<p><a href="https://en.wikipedia.org/wiki/Shared_Source_Common_Language_Infrastructure"><strong>Shared Source Common Language Infrastructure (SSCLI)</strong></a> (a.k.a ‘Rotor’):</p>
<blockquote>
<p>is Microsoft’s shared source implementation of the CLI, the core of .NET. Although the SSCLI is not suitable for commercial use due to its license, it does make it possible for programmers to examine the implementation details of many .NET libraries and to create modified CLI versions. Microsoft provides the Shared Source CLI as a reference CLI implementation suitable for educational use.</p>
</blockquote>
<p>An interesting side-effect of releasing Rotor is that they were also able to release the <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52517">‘Gyro’ Project</a>, which gives an idea of how <a href="/2018/03/02/How-generics-were-added-to-.NET/#the-gyro-project---generics-for-rotor">Generics were added to the .NET Runtime</a>.</p>
<p><a href="https://en.wikipedia.org/wiki/Midori_(operating_system)"><strong>Midori</strong></a>:</p>
<blockquote>
<p>Midori was the code name for a managed code operating system being developed by Microsoft with joint effort of Microsoft Research. It had been reported to be a possible commercial implementation of the Singularity operating system, a research project started in 2003 to build a highly dependable operating system in which the <strong>kernel, device drivers, and applications are all written in managed code</strong>. It was designed for concurrency, and could run a program spread across multiple nodes at once. It also featured a security model that sandboxes applications for increased security. Microsoft had mapped out several possible migration paths from Windows to Midori. The operating system was discontinued some time in 2015, though many of its concepts were rolled into other Microsoft projects.</p>
</blockquote>
<p>Midori is the project that appears to have led to the most ideas making their way back into the ‘.NET Framework’. You can read more about this in <a href="https://twitter.com/funcOfJoe">Joe Duffy’s</a> excellent series <a href="http://joeduffyblog.com/2015/11/03/blogging-about-midori/">Blogging about Midori</a>:</p>
<ol>
<li><a href="http://joeduffyblog.com/2015/11/03/a-tale-of-three-safeties/">A Tale of Three Safeties</a></li>
<li><a href="http://joeduffyblog.com/2015/11/10/objects-as-secure-capabilities/">Objects as Secure Capabilities</a></li>
<li><a href="http://joeduffyblog.com/2015/11/19/asynchronous-everything/">Asynchronous Everything</a></li>
<li><a href="http://joeduffyblog.com/2015/12/19/safe-native-code">Safe Native Code</a></li>
<li><a href="http://joeduffyblog.com/2016/02/07/the-error-model">The Error Model</a></li>
<li><a href="http://joeduffyblog.com/2016/04/10/performance-culture">Performance Culture</a></li>
<li><a href="http://joeduffyblog.com/2016/11/30/15-years-of-concurrency/">15 Years of Concurrency</a></li>
</ol>
<p><a href="https://en.wikipedia.org/wiki/Singularity_(operating_system)"><strong>Singularity (operating system)</strong></a> (also <a href="https://archive.codeplex.com/?p=singularity">Singularity RDK</a>)</p>
<blockquote>
<p>Singularity is an experimental operating system (OS) which was built by Microsoft Research between 2003 and 2010. It was designed as a high dependability OS in which the <strong>kernel, device drivers, and application software were all written in managed code</strong>. Internal security uses type safety instead of hardware memory protection.</p>
</blockquote>
<p>Last, but not least, there is <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/glossary.md"><strong>Redhawk</strong></a>:</p>
<blockquote>
<p>Codename for experimental minimal managed code runtime that evolved into <a href="https://github.com/dotnet/corert">CoreRT</a>.</p>
</blockquote>
<hr />
<h2 id="references">References</h2>
<p>Below are the Wikipedia articles I referenced when creating the timeline:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/.NET_Framework">.NET Framework</a></li>
<li><a href="https://en.wikipedia.org/wiki/.NET_Framework_version_history">.NET Framework version history</a></li>
<li><a href="https://en.wikipedia.org/wiki/.NET_Core">.NET Core</a></li>
<li><a href="https://en.wikipedia.org/wiki/Shared_Source_Common_Language_Infrastructure">Shared Source Common Language Infrastructure</a></li>
<li><a href="https://en.wikipedia.org/wiki/Mono_(software)">Mono (software)</a></li>
<li><a href="https://en.wikipedia.org/wiki/Unity_(game_engine)">Unity (game engine)</a></li>
<li><a href="https://en.wikipedia.org/wiki/Microsoft_Silverlight">Microsoft Silverlight</a></li>
<li><a href="https://en.wikipedia.org/wiki/.NET_Compact_Framework">.NET Compact Framework</a></li>
<li><a href="https://en.wikipedia.org/wiki/.NET_Micro_Framework">.NET Micro Framework</a></li>
<li><a href="https://en.wikipedia.org/wiki/Singularity_(operating_system)">Singularity (operating system)</a></li>
<li><a href="https://en.wikipedia.org/wiki/Midori_(operating_system)">Midori (operating system)</a></li>
<li><a href="https://en.wikipedia.org/wiki/DotGNU">DotGNU Portable.NET</a></li>
</ul>
Fuzzing the .NET JIT Compiler2018-08-28T00:00:00+00:00http://www.mattwarren.org/2018/08/28/Fuzzing-the-.NET-JIT-Compiler
<p>I recently came across the <a href="https://github.com/jakobbotsch/Fuzzlyn">excellent ‘Fuzzlyn’ project</a>, created as part of the <a href="https://kursuskatalog.au.dk/en/course/82764/language-based-security">‘Language-Based Security’ course at Aarhus University</a>. As per the project description Fuzzlyn is a:</p>
<blockquote>
<p>… fuzzer which utilizes Roslyn to generate random C# programs</p>
</blockquote>
<p>And what is a ‘fuzzer’, from the <a href="https://en.wikipedia.org/wiki/Fuzzing">Wikipedia page for ‘<em>fuzzing</em>’</a>:</p>
<blockquote>
<p><strong>Fuzzing</strong> or <strong>fuzz testing</strong> is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program.</p>
</blockquote>
<p>Or in other words, <strong>a <em>fuzzer</em> is a program that tries to create <em>source code</em> that finds <em>bugs</em> in a compiler</strong>.</p>
<p>Massive kudos to the developers behind Fuzzlyn, <a href="https://twitter.com/jakobbotsch">Jakob Botsch Nielsen</a> (who helped answer my questions when writing this post), <a href="https://twitter.com/Chrizzz42">Chris Schmidt</a> and <a href="https://github.com/JonasSL">Jonas Larsen</a>. It’s an impressive project!! (To be clear, I have no link with the project and can’t take any of the credit for it.)</p>
<hr />
<h2 id="compilation-in-net">Compilation in .NET</h2>
<p>But before we dive into ‘Fuzzlyn’ and what it does, we’re going to take a quick look at <strong>‘compilation’ in the .NET Framework</strong>. When you write C#/VB.NET/F# code (delete as appropriate) and compile it, the compiler converts it into <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">Intermediate Language (IL)</a> code. The IL is then stored in a .exe or .dll, which the Common Language Runtime (CLR) reads and executes when your program is actually run. However it’s the job of the <a href="http://mattwarren.org/2017/12/15/How-does-.NET-JIT-a-method-and-Tiered-Compilation/#how-it-works">Just-in-Time (JIT) Compiler</a> to convert the IL code into machine code.</p>
<p><strong>Why is this relevant?</strong> Because Fuzzlyn works by comparing the output of a <strong>Debug</strong> and a <strong>Release</strong> version of a program and if they are different, there’s a bug! But it turns out that very few optimisations are actually done by the <a href="https://github.com/dotnet/roslyn">‘Roslyn’ compiler</a>, compared to what the JIT does. From Eric Lippert’s excellent post <a href="https://blogs.msdn.microsoft.com/ericlippert/2009/06/11/what-does-the-optimize-switch-do/">What does the optimize switch do?</a> (2009):</p>
<blockquote>
<p>The /optimize flag <strong>does not change a huge amount of our emitting and generation logic</strong>. We try to always generate straightforward, verifiable code and then <strong>rely upon the jitter to do the heavy lifting of optimizations</strong> when it generates the real machine code. But we will do some simple optimizations with that flag set. For example, with the flag set:</p>
</blockquote>
<p>He then goes on to list the 15 things that the C# Compiler will optimise, before finishing with this:</p>
<blockquote>
<p>That’s pretty much it. These are very straightforward optimizations; <strong>there’s no inlining of IL, no loop unrolling, no interprocedural analysis</strong> whatsoever. We let the jitter team worry about optimizing the heck out of the code when it is actually spit into machine code; <strong>that’s the place where you can get real wins</strong>.</p>
</blockquote>
<p>So in .NET, very few of the techniques that an <a href="https://en.wikipedia.org/wiki/Optimizing_compiler">‘Optimising Compiler’</a> uses are done at <em>compile-time</em>. They are almost all done at <em>run-time</em> by the JIT Compiler (leaving aside <a href="/2018/06/07/CoreRT-.NET-Runtime-for-AOT/">AOT scenarios for the time being</a>).</p>
<p>For reference, most of the differences in IL are there to make the code easier to debug, for instance given this C# code:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">void</span> <span class="nf">M</span><span class="p">()</span> <span class="p">{</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">item</span> <span class="k">in</span> <span class="k">new</span> <span class="p">[]</span> <span class="p">{</span> <span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">4</span> <span class="p">})</span> <span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">item</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The differences in IL are shown below (‘Release’ on the left, ‘Debug’ on the right). As you can see there are a few extra <code class="language-plaintext highlighter-rouge">nop</code> instructions to allow the debugger to ‘step-through’ more locations in the code, plus an extra local variable, which makes it easier/possible to see the value when debugging.</p>
<p><a href="/images/2018/08/Release v Debug - IL Differences.png"><img src="/images/2018/08/Release v Debug - IL Differences.png" alt="Release v Debug - IL Differences" /></a></p>
<p>(click for larger image or you can view the <a href="https://sharplab.io/#v2:EYLgZgpghgLgrgJwgZwLQBEJinANjASQDsYIFsBjCAgWwAdcIaITYBLAeyIBoYQ3cAHwACABgAEwgIwBuALAAoYQGZJAJnEBhcQG9F4g5NXCALOICyACgCUu/YYdgOSKBQAW4ywDcoCcW1IafyJxIggAd3EAbQBdXXEpbnE1JOUkkwBfa3sHAz0FXIcASGkATksApmsZQxzcjLqGhQygA===">‘Release’ version</a> and the <a href="https://sharplab.io/#v2:EYLgZgpghgLgrgJwgZwLQBEJinANjASQDsYIFsBjCAgWwAdcIaITYBLAeyIBoYQ3c3ACYgA1AB8AAgAYABJICMAbgCwAKEkBmeQCZZAYVkBvdbLPztkgCyyAsgAoAlMdPm3YDkigUAFrPsAblAIsmykNKFEskQQAO6yANoAusayCtyyOhmaGVYAvo6ubmYmasVuAJCKAJz2YUyOSuZFxXktbWp5QA===">‘Debug’ version</a> on the excellent <a href="https://sharplab.io/">SharpLab</a>)</p>
<p>For more information on the differences in Release/Debug code-gen see the ‘Release (optimized)’ section in this doc on <a href="https://github.com/dotnet/roslyn/blob/master/docs/compilers/CSharp/CodeGen%20Differences.md">CodeGen Differences</a>. Also, because Roslyn is open-source we can see how this is handled in the code:</p>
<ul>
<li><a href="https://github.com/dotnet/roslyn/search?p=3&q=OptimizationLevel+-path%3Asrc%2FCompilers%2FTest+-path%3Asrc%2FCompilers%2FCSharp%2FTest+-path%3Asrc%2FCompilers%2FVisualBasic%2FTest+-path%3A%2Fsrc%2FEditorFeatures%2FTest+-path%3A%2Fsrc%2FScripting+-path%3A%2Fsrc%2FWorkspaces+-path%3A%2Fsrc%2FExpressionEvaluator+-path%3A%2Fsrc%2FVisualStudio&unscoped_q=OptimizationLevel+-path%3Asrc%2FCompilers%2FTest+-path%3Asrc%2FCompilers%2FCSharp%2FTest+-path%3Asrc%2FCompilers%2FVisualBasic%2FTest+-path%3A%2Fsrc%2FEditorFeatures%2FTest+-path%3A%2Fsrc%2FScripting+-path%3A%2Fsrc%2FWorkspaces+-path%3A%2Fsrc%2FExpressionEvaluator+-path%3A%2Fsrc%2FVisualStudio">All usages of the ‘OptimizationLevel’ enum</a></li>
<li><a href="https://github.com/dotnet/roslyn/search?q=ILEmitStyle+path%3Asrc%2FCompilers%2FCore+path%3Asrc%2FCompilers%2FCSharp+path%3Asrc%2FCompilers%2FVisualBasic&unscoped_q=ILEmitStyle+path%3Asrc%2FCompilers%2FCore+path%3Asrc%2FCompilers%2FCSharp+path%3Asrc%2FCompilers%2FVisualBasic">All usage of the ‘ILEmitStyle’ enum</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.7/src/Compilers/CSharp/Portable/CodeGen/CodeGenerator.cs#L88-L117">In Debug builds, extra ‘sequence points’ are created (as shown above)</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.7/src/Compilers/CSharp/Portable/Lowering/StateMachineRewriter/StateMachineRewriter.cs#L163-L186">Extra field added to the the async/await ‘State Machine’ in Debug builds</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.7/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_TryStatement.cs#L19-L32">In Release builds, some ‘catch’ blocks are discarded</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.7/src/Compilers/CSharp/Portable/Lowering/StateMachineRewriter/MethodToStateMachineRewriter.cs#L424-L425">In Debug builds, hoisted variables aren’t re-used</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2017-Version-15.7/src/Compilers/CSharp/Portable/Symbols/AnonymousTypes/SynthesizedSymbols/AnonymousType.TemplateSymbol.cs#L464">Extra Attribute is inserted in Debug builds</a></li>
</ul>
<p><strong>This all means that the ‘Fuzzlyn’ project has actually been finding bugs in the .NET JIT, not in the Roslyn Compiler</strong></p>
<p>(well, except this one <a href="https://github.com/dotnet/roslyn/issues/29481">Finally block belonging to unexecuted try runs anyway</a>, which was <a href="https://github.com/dotnet/roslyn/pull/29517">fixed here</a>)</p>
<hr />
<h2 id="how-it-works">How it works</h2>
<p>At the simplest level, Fuzzlyn works by compiling and running a piece of randomly generated code in ‘Debug’ and ‘Release’ versions and comparing the output. If the 2 versions produce different results, then it’s a bug, specifically a bug in the <strong>optimisations</strong> that the JIT compiler has attempted.</p>
<p>The .NET JIT, known as ‘RyuJIT’, has several modes. It can produce <strong>fully optimised</strong> code that has the highest performance, or it can produce more <strong>‘debug’ friendly</strong> code that has no optimisations, but is much simpler. You can find out more about the different ‘optimisations’ that RyuJIT performs in this <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-tutorial.md">excellent tutorial</a>, in this <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/performance/JitOptimizerTodoAssessment.md">design doc</a> or you can search through the code for <a href="https://github.com/dotnet/coreclr/search?q=compDbgCode&unscoped_q=compDbgCode">usages of the ‘compDbgCode’ flag</a>.</p>
<p>From a high-level Fuzzlyn goes through the following steps:</p>
<ol>
<li><strong>Randomly</strong> generate a C# program</li>
<li><strong>Check</strong> if the code produces an error (Debug v. Release)</li>
<li><strong>Reduce</strong> the code to its simplest form</li>
</ol>
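<p>As a rough illustration, those three steps could be wired together like the sketch below. This is not Fuzzlyn’s code: <code class="language-plaintext highlighter-rouge">FuzzLoopSketch</code>, <code class="language-plaintext highlighter-rouge">FindAndReduce</code> and all of the delegate parameters are hypothetical names, and the generator, the two runners and the reducer are assumed to be supplied by the caller.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical sketch of the generate, check, reduce loop.
// None of these names come from Fuzzlyn itself.
public static class FuzzLoopSketch
{
    // Tries successive seeds until the Debug and Release results differ,
    // then hands the offending program to the reducer.
    public static string FindAndReduce(
        Func&lt;ulong, string&gt; generate,      // seed -&gt; program source
        Func&lt;string, string&gt; runDebug,     // source -&gt; observed result
        Func&lt;string, string&gt; runRelease,   // source -&gt; observed result
        Func&lt;string, Func&lt;string, bool&gt;, string&gt; reduce,
        ulong firstSeed,
        int maxAttempts)
    {
        // A program is "interesting" when the two modes disagree.
        bool IsInteresting(string source) =&gt; runDebug(source) != runRelease(source);

        for (int attempt = 0; attempt &lt; maxAttempts; attempt++)
        {
            string program = generate(firstSeed + (ulong)attempt);
            if (IsInteresting(program))
                return reduce(program, IsInteresting);
        }

        return null; // no divergence found within the attempt budget
    }
}
</code></pre></div></div>
<p>Passing the same ‘is it still interesting?’ check to the reducer is the important design point: reduction only needs to know whether the divergence survives, not why it happens.</p>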
<p>If you want to see this in action, I ran Fuzzlyn until it produced a randomly generated program with a bug. You can see the <a href="https://gist.github.com/mattwarren/2293de54e15da4f54ac557dae09de386#file-fuzzlyn-bad-program-original-cs">original source</a> (6,802 LOC) and the <a href="https://gist.github.com/mattwarren/7bf0fa2b762b906614babc3ecfd06a80#file-fuzzlyn-bad-program-reduced-cs">reduced version</a> (28 LOC). What’s interesting is that you can clearly see the buggy line-of-code in the <a href="https://gist.github.com/mattwarren/2293de54e15da4f54ac557dae09de386#file-fuzzlyn-bad-program-original-cs-L4547">original code</a>, before it’s turned into a <a href="https://gist.github.com/mattwarren/7bf0fa2b762b906614babc3ecfd06a80#file-fuzzlyn-bad-program-reduced-cs-L17">simplified version</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Generated by Fuzzlyn v1.1 on 2018-08-22 15:19:26</span>
<span class="c1">// Seed: 14928117313359926641</span>
<span class="c1">// Reduced from 256.3 KiB to 0.4 KiB in 00:01:58</span>
<span class="c1">// Debug: Prints 0 line(s)</span>
<span class="c1">// Release: Prints 1 line(s)</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">Program</span>
<span class="p">{</span>
<span class="k">static</span> <span class="kt">short</span> <span class="n">s_18</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">byte</span> <span class="n">s_33</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">int</span><span class="p">[]</span> <span class="n">s_40</span> <span class="p">=</span> <span class="k">new</span> <span class="kt">int</span><span class="p">[]{</span><span class="m">0</span><span class="p">};</span>
<span class="k">static</span> <span class="kt">short</span> <span class="n">s_74</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">s_18</span> <span class="p">=</span> <span class="p">-</span><span class="m">1</span><span class="p">;</span>
<span class="c1">// This comparision is the bug, in Debug it's False, in Release it's True</span>
<span class="c1">// However, '(ushort)(s_18 | 2L)' is 65,535 in Debug *and* Release</span>
<span class="k">if</span> <span class="p">(((</span><span class="kt">ushort</span><span class="p">)(</span><span class="n">s_18</span> <span class="p">|</span> <span class="m">2L</span><span class="p">)</span> <span class="p"><=</span> <span class="n">s_40</span><span class="p">[</span><span class="m">0</span><span class="p">]))</span>
<span class="p">{</span>
<span class="n">s_74</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="n">vr10</span> <span class="p">=</span> <span class="n">s_74</span> <span class="p"><</span> <span class="n">s_33</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">vr10</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">System</span><span class="p">.</span><span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="m">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
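<p>To see why the ‘Debug’ result is the expected one, it can help to evaluate the comparison by hand. The sketch below spells out the steps using the usual C# numeric promotion rules; the class and method names are mine and are not part of the reduced program.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Illustrative only: walks through the comparison from the reduced program.
public static class ReducedExampleSketch
{
    // Evaluates '(ushort)(value | 2L)' one step at a time.
    public static ushort LeftHandSide(short value)
    {
        long widened = value | 2L;          // the short is promoted to long; -1 | 2 is still -1
        return unchecked((ushort)widened);  // keeps the low 16 bits, so -1 becomes 65,535
    }

    // The comparison as written in the reduced program.
    public static bool Comparison(short value, int other)
    {
        return LeftHandSide(value) &lt;= other; // the ushort is promoted to int before comparing
    }
}
</code></pre></div></div>
<p>With <code class="language-plaintext highlighter-rouge">value = -1</code> and <code class="language-plaintext highlighter-rouge">other = 0</code> the comparison is <code class="language-plaintext highlighter-rouge">65535 &lt;= 0</code>, which is false. That matches the ‘Debug’ behaviour, where <code class="language-plaintext highlighter-rouge">s_74</code> stays at 1 and nothing is printed.</p>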
<h3 id="random-code-generation">Random Code Generation</h3>
<p>Fuzzlyn can’t produce every type of C# program, however it does support quite a few language features, from <a href="https://github.com/jakobbotsch/Fuzzlyn#supported-constructs">Supported constructs</a>:</p>
<blockquote>
<p>Fuzzlyn generates only a limited subset of C#. Most importantly, it does not support loops yet. It supports structs and classes, though it does not generate member methods in these. We make no attempt to fully support all kinds of expressions and statements.</p>
</blockquote>
<p>To see the code for these generators, follow the links below:</p>
<ul>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/CodeGenerator.cs">CodeGenerator</a></li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/LiteralGenerator.cs">LiteralGenerator</a></li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/master/Fuzzlyn/Methods/FuncGenerator.cs">FuncGenerator</a>, with specific generator for a:
<ul>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/FuncGenerator.cs#L124-L162">‘Statement’</a></li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/FuncGenerator.cs#L164-L231">‘Block’</a></li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/FuncGenerator.cs#L233-L341">‘Assignment’ statement</a></li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/FuncGenerator.cs#L343-L355">‘Call’ statement</a></li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/FuncGenerator.cs#L357-L376">‘If’ statement</a></li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/FuncGenerator.cs#L378-L393">‘Try/Catch’ statement</a></li>
</ul>
</li>
<li><a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/BinOpTable.cs">Binary Operation tables</a>, which are themselves <a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn.TableGen/Program.cs">generated using Roslyn</a></li>
</ul>
<p>All the statements and expressions that are currently supported are <a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Methods/FuncGenerator.cs#L921-L943">listed here</a>. Interestingly enough the <em>type</em> of statement/expression chosen is not completely random, instead they are chosen using <a href="https://github.com/jakobbotsch/Fuzzlyn/blob/bb4b4753ed3dcdcebec52cedec475010324e7688/Fuzzlyn/FuzzlynOptions.cs#L43-L64">probability tables</a>, that look like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="n">ProbabilityDistribution</span> <span class="n">StatementTypeDist</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">=</span> <span class="k">new</span> <span class="nf">TableDistribution</span><span class="p">(</span><span class="k">new</span> <span class="n">Dictionary</span><span class="p"><</span><span class="kt">int</span><span class="p">,</span> <span class="kt">double</span><span class="p">></span>
<span class="p">{</span>
<span class="p">[(</span><span class="kt">int</span><span class="p">)</span><span class="n">StatementKind</span><span class="p">.</span><span class="n">Assignment</span><span class="p">]</span> <span class="p">=</span> <span class="m">0.57</span><span class="p">,</span>
<span class="p">[(</span><span class="kt">int</span><span class="p">)</span><span class="n">StatementKind</span><span class="p">.</span><span class="n">If</span><span class="p">]</span> <span class="p">=</span> <span class="m">0.17</span><span class="p">,</span>
<span class="p">[(</span><span class="kt">int</span><span class="p">)</span><span class="n">StatementKind</span><span class="p">.</span><span class="n">Block</span><span class="p">]</span> <span class="p">=</span> <span class="m">0.1</span><span class="p">,</span>
<span class="p">[(</span><span class="kt">int</span><span class="p">)</span><span class="n">StatementKind</span><span class="p">.</span><span class="n">Call</span><span class="p">]</span> <span class="p">=</span> <span class="m">0.1</span><span class="p">,</span>
<span class="p">[(</span><span class="kt">int</span><span class="p">)</span><span class="n">StatementKind</span><span class="p">.</span><span class="n">TryFinally</span><span class="p">]</span> <span class="p">=</span> <span class="m">0.05</span><span class="p">,</span>
<span class="p">[(</span><span class="kt">int</span><span class="p">)</span><span class="n">StatementKind</span><span class="p">.</span><span class="n">Return</span><span class="p">]</span> <span class="p">=</span> <span class="m">0.01</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
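<p>For anyone wondering how a table like this can drive a random choice, here is a minimal sketch of weighted sampling. It is an assumption about the general technique, not a copy of Fuzzlyn’s <code class="language-plaintext highlighter-rouge">TableDistribution</code>; the <code class="language-plaintext highlighter-rouge">WeightedTableSketch</code> class and its <code class="language-plaintext highlighter-rouge">Sample</code> method are hypothetical names.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

// Hypothetical stand-in for a table-driven distribution; not Fuzzlyn's own class.
public sealed class WeightedTableSketch
{
    private readonly List&lt;KeyValuePair&lt;int, double&gt;&gt; _entries;
    private readonly double _total;

    public WeightedTableSketch(IDictionary&lt;int, double&gt; weights)
    {
        _entries = new List&lt;KeyValuePair&lt;int, double&gt;&gt;(weights);
        foreach (var entry in _entries)
            _total += entry.Value;
    }

    // Picks a key with probability proportional to its weight.
    public int Sample(Random rng)
    {
        double target = rng.NextDouble() * _total;
        foreach (var entry in _entries)
        {
            target -= entry.Value;
            if (target &lt; 0)
                return entry.Key;
        }

        return _entries[_entries.Count - 1].Key; // guard against floating-point rounding
    }
}
</code></pre></div></div>
<p>With the weights shown above, roughly 57% of the sampled statements would be assignments and only about 1% would be returns, which is what keeps the generated programs long.</p>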
<p>As we saw before, the initial program that Fuzzlyn produces is quite large (over 5,000 LOC), so why does it create and execute a very large program?</p>
<p>Partly because it’s quicker to do this compared to working with lots of smaller programs, i.e. the steps of generation, compilation and starting new processes can be reduced by running large programs.</p>
<p>In addition, Jakob explained the other reasons:</p>
<blockquote>
<ul>
<li><strong>Empirically, other similar projects have shown that larger programs are better</strong>. Csmith authors report that most bugs were found with examples of around 80 KB (I don’t remember the exact number). We actually found the same thing in v1.0 – our examples had an average size of 76 KB</li>
<li><strong>Small programs do not get as many opportunities to generate a lot of patterns</strong>. For example, it is very unlikely that a small program will have a method taking a <code class="language-plaintext highlighter-rouge">byte</code> parameter and at the same time, a method returning a <code class="language-plaintext highlighter-rouge">ref byte</code> (this pattern has a bug on Linux: <a href="https://github.com/dotnet/coreclr/issues/19256">dotnet/coreclr#19256</a>).</li>
<li>We mainly adjusted our probabilities based on how the examples looked. <strong>We strived for the generator to produce code that looked relatively like human code</strong>. This included going for a wide range of program sizes. By the way, you can run Fuzzlyn with <code class="language-plaintext highlighter-rouge">--stats --num-programs=10000</code> to get a view of the distribution of program sizes – it will output stats for every 500 programs generated.</li>
</ul>
</blockquote>
<h3 id="checking-for-bugs">‘Checking’ for bugs</h3>
<p>To check if the behaviour of 2 samples diverge (in ‘Release’ v ‘Debug’ mode), the tool inserts <a href="https://github.com/jakobbotsch/Fuzzlyn/blob/master/Fuzzlyn/Execution/ChecksumSite.cs">checksum-related code</a> throughout the program. For example here’s a randomly generated method, note the calls to the <code class="language-plaintext highlighter-rouge">Checksum(..)</code> function at the end:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">sbyte</span> <span class="nf">M15</span><span class="p">(</span><span class="kt">int</span> <span class="n">arg0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">var0</span> <span class="p">=</span> <span class="p">-</span><span class="m">71</span> <span class="p"><</span> <span class="n">s_1</span><span class="p">;</span>
<span class="kt">uint</span> <span class="n">var1</span> <span class="p">=</span> <span class="p">(</span><span class="kt">uint</span><span class="p">)(</span><span class="m">1U</span><span class="n">L</span> <span class="p">&</span> <span class="n">s_4</span><span class="p">++);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">var0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">var0</span> <span class="p">=</span> <span class="n">var0</span><span class="p">;</span>
<span class="n">arg0</span> <span class="p">=</span> <span class="n">arg0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="k">ref</span> <span class="kt">ushort</span> <span class="n">var2</span> <span class="p">=</span> <span class="k">ref</span> <span class="n">s_4</span><span class="p">;</span>
<span class="n">var2</span> <span class="p">=</span> <span class="n">var2</span><span class="p">;</span>
<span class="n">s_rt</span><span class="p">.</span><span class="nf">Checksum</span><span class="p">(</span><span class="s">"c_17"</span><span class="p">,</span> <span class="n">var2</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint</span> <span class="n">var3</span> <span class="p">=</span> <span class="n">var1</span><span class="p">;</span>
<span class="kt">short</span><span class="p">[]</span> <span class="n">var4</span> <span class="p">=</span> <span class="n">s_2</span><span class="p">[</span><span class="m">0</span><span class="p">][</span><span class="m">0</span><span class="p">];</span>
<span class="n">s_rt</span><span class="p">.</span><span class="nf">Checksum</span><span class="p">(</span><span class="s">"c_18"</span><span class="p">,</span> <span class="n">arg0</span><span class="p">);</span>
<span class="n">s_rt</span><span class="p">.</span><span class="nf">Checksum</span><span class="p">(</span><span class="s">"c_19"</span><span class="p">,</span> <span class="n">var0</span><span class="p">);</span>
<span class="n">s_rt</span><span class="p">.</span><span class="nf">Checksum</span><span class="p">(</span><span class="s">"c_20"</span><span class="p">,</span> <span class="n">var1</span><span class="p">);</span>
<span class="n">s_rt</span><span class="p">.</span><span class="nf">Checksum</span><span class="p">(</span><span class="s">"c_21"</span><span class="p">,</span> <span class="n">var3</span><span class="p">);</span>
<span class="n">s_rt</span><span class="p">.</span><span class="nf">Checksum</span><span class="p">(</span><span class="s">"c_22"</span><span class="p">,</span> <span class="n">var4</span><span class="p">[</span><span class="m">0</span><span class="p">]);</span>
<span class="k">return</span> <span class="m">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The checksum calls allow the execution of a program to be compared between ‘Release’ and ‘Debug’ modes: if a single variable has a different value at <em>any point during execution</em>, the checksums will be different.</p>
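<p>To make the idea concrete, here is a minimal sketch of a runtime object that folds every observed value into a running hash. It is only an assumption about how such a thing could look: the <code class="language-plaintext highlighter-rouge">ChecksumRuntimeSketch</code> class is hypothetical, it only handles integer values, and I’ve picked FNV-1a as the hash purely for illustration.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical sketch of a checksum-collecting runtime; Fuzzlyn's real one differs.
public sealed class ChecksumRuntimeSketch
{
    private ulong _hash = 14695981039346656037UL; // FNV-1a 64-bit offset basis

    // The combined checksum of everything recorded so far.
    public ulong Value =&gt; _hash;

    // Folds the site id and the observed value into the running hash.
    public void Checksum(string siteId, long value)
    {
        foreach (char c in siteId)
            Mix(unchecked((byte)c));

        for (int shift = 0; shift &lt; 64; shift += 8)
            Mix(unchecked((byte)(value &gt;&gt; shift)));
    }

    private void Mix(byte b)
    {
        unchecked
        {
            _hash ^= b;
            _hash *= 1099511628211UL; // FNV-1a 64-bit prime
        }
    }
}
</code></pre></div></div>
<p>Running the same program twice, once per mode, each with its own instance, then comparing the two <code class="language-plaintext highlighter-rouge">Value</code> properties is enough to detect a divergence without storing every intermediate value.</p>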
<p>It’s also worth pointing out that Roslyn provides in-memory compilation that helps speed up this process because you don’t have to <em>shell-out</em> to an external process. As <a href="https://twitter.com/jakobbotsch/status/1004384699840696320">Jakob explains</a>:</p>
<blockquote>
<p>Additionally since we don’t have to start processes for every invocation when we use Roslyn’s in-memory compilation, we can compile and check for interesting behavior <em>super</em> fast. This allows our reducer to be really simple and dumb, while still giving great results.</p>
</blockquote>
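<p>If you haven’t used Roslyn’s in-memory compilation before, the sketch below shows the general shape of it. It assumes the <code class="language-plaintext highlighter-rouge">Microsoft.CodeAnalysis.CSharp</code> NuGet package is referenced; the <code class="language-plaintext highlighter-rouge">InMemoryCompileSketch</code> class is my own name and this is not how Fuzzlyn itself is structured. It only references the core library, so the source being compiled can’t use anything outside of it.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Illustrative sketch, assuming the Microsoft.CodeAnalysis.CSharp package is available.
public static class InMemoryCompileSketch
{
    // Compiles the source into a MemoryStream and loads the resulting assembly,
    // without writing anything to disk or starting another process.
    public static Assembly Compile(string source, OptimizationLevel level)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var options = new CSharpCompilationOptions(
            OutputKind.DynamicallyLinkedLibrary,
            optimizationLevel: level);
        var references = new[]
        {
            MetadataReference.CreateFromFile(typeof(object).Assembly.Location)
        };
        var compilation = CSharpCompilation.Create("Sample", new[] { tree }, references, options);

        using (var stream = new MemoryStream())
        {
            var result = compilation.Emit(stream);
            if (!result.Success)
                throw new InvalidOperationException("Compilation failed");

            return Assembly.Load(stream.ToArray());
        }
    }
}
</code></pre></div></div>
<p>Calling <code class="language-plaintext highlighter-rouge">Compile</code> once with <code class="language-plaintext highlighter-rouge">OptimizationLevel.Debug</code> and once with <code class="language-plaintext highlighter-rouge">OptimizationLevel.Release</code> gives two assemblies built from the same source, which can then be run and compared in the same process.</p>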
<h3 id="reducing-the-output">‘Reducing’ the output</h3>
<p>However, the checksums also help Fuzzlyn ‘Reduce’ the program from the <a href="https://gist.github.com/mattwarren/2293de54e15da4f54ac557dae09de386#file-fuzzlyn-bad-program-original-cs">large initial version</a> to something <a href="https://gist.github.com/mattwarren/7bf0fa2b762b906614babc3ecfd06a80#file-fuzzlyn-bad-program-reduced-cs">much more readable</a>. By using a <a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Reduction/Reducer.cs#L306-L384">‘binary search’ technique</a> it can remove a section of code and compare the checksums of the remaining code. If the checksums still differ then the remaining code contains the error/bug and Fuzzlyn can carry on reducing it, otherwise it can be discarded.</p>
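<p>The core of that idea can be sketched without any Roslyn code at all. The example below is a generic chunk-removal reducer over a list of strings, written under my own assumptions. It is not Fuzzlyn’s <code class="language-plaintext highlighter-rouge">Reducer</code>, which works on syntax trees, and the <code class="language-plaintext highlighter-rouge">ReducerSketch</code> name and <code class="language-plaintext highlighter-rouge">isInteresting</code> callback are hypothetical.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

// Hypothetical chunk-removal reducer; a simplification of the general technique.
public static class ReducerSketch
{
    // Repeatedly tries to delete chunks of statements, keeping a deletion only when
    // the program is still "interesting" (for example, its checksums still differ).
    public static List&lt;string&gt; Reduce(
        List&lt;string&gt; statements,
        Func&lt;List&lt;string&gt;, bool&gt; isInteresting)
    {
        var current = new List&lt;string&gt;(statements);

        // Start with large chunks and halve the chunk size on each pass.
        for (int chunk = Math.Max(1, current.Count / 2); chunk &gt;= 1; chunk /= 2)
        {
            int start = 0;
            while (start &lt; current.Count)
            {
                int length = Math.Min(chunk, current.Count - start);
                var candidate = new List&lt;string&gt;(current);
                candidate.RemoveRange(start, length);

                if (isInteresting(candidate))
                    current = candidate;   // the chunk was irrelevant, drop it
                else
                    start += length;       // the chunk is needed, move past it
            }
        }

        return current;
    }
}
</code></pre></div></div>
<p>Because every candidate has to be compiled and run in both modes, the speed of that check matters far more than the cleverness of the search, which fits with Jakob’s comment about the reducer being ‘simple and dumb’.</p>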
<p>In addition, Fuzzlyn makes good use of the <a href="https://github.com/dotnet/roslyn/wiki/Roslyn-Overview#syntax-trees">Roslyn ‘syntax tree’ API</a> when removing code. For instance the <a href="https://github.com/jakobbotsch/Fuzzlyn/blob/b1391faf9f533d1613c46118d17b7bc2b1af2c3f/Fuzzlyn/Reduction/CoarseStatementRemover.cs#L11">CoarseStatementRemover class</a> makes use of the Roslyn <code class="language-plaintext highlighter-rouge">CSharpSyntaxRewriter</code> class, which is <a href="https://joshvarty.com/2014/08/15/learn-roslyn-now-part-5-csharpsyntaxrewriter/">designed to allow syntax re-writing</a> (also see <a href="https://johnkoerner.com/csharp/using-a-csharp-syntax-rewriter/">Using a CSharp Syntax Rewriter</a>).</p>
<hr />
<h2 id="the-results">The Results</h2>
<p>What initially drew me to the Fuzzlyn project (aside from the <a href="https://twitter.com/matthewwarren/status/1004013915876020225">great name</a>) was the <a href="https://twitter.com/matthewwarren/status/1027224393217449986">impressive results I saw it getting</a>. As of the end of Aug 2018, they’ve reported 22 bugs, of which 11 have already been fixed (kudos to the .NET JIT devs for fixing them so quickly).</p>
<p>Here’s a list of some of them, taken from the <a href="https://github.com/jakobbotsch/Fuzzlyn/blob/master/README.md#bugs-reported">project README</a>:</p>
<blockquote>
<ul>
<li><a href="https://github.com/dotnet/coreclr/issues/18232">NullReferenceException thrown for multi-dimensional arrays in release</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/18235">Wrong integer promotion in release</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/18238">Cast to ushort is dropped in release</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/18259">Wrong value passed to generic interface method in release</a></li>
<li><a href="https://github.com/dotnet/roslyn/issues/27348">Constant-folding int.MinValue % -1</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/18522">Deterministic program outputs indeterministic results on Linux in release</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/18770">RyuJIT incorrectly reorders expression containing a CSE, resulting in exception thrown in release</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/18780">RyuJIT incorrectly narrows value on ARM32/x86 in release</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/18850">Invalid value numbering when morphing casts that changes signedness after global morph</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/18867">RyuJIT spills 16 bit value but reloads as 32 bits in ARM32/x86 in release</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/18884">RyuJIT fails to preserve variable allocated to RCX around shift on x64 in release</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/19243">RyuJIT: Invalid ordering when assigning ref-return</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/19256">RyuJIT: Argument written to stack too early on Linux</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/19272">RyuJIT: Morph forgets about side effects when optimizing casted shift</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/19444">RyuJIT: By-ref assignment with null leads to runtime crash</a> (fixed)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/19558">RyuJIT: Mishandling of subrange assertion for rewritten call parameter</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/19583">RyuJIT: Incorrect ordering around Interlocked.Exchange and Interlocked.CompareExchange</a></li>
</ul>
</blockquote>
<p>(for the most up-to-date list see the <a href="https://github.com/dotnet/coreclr/issues?utf8=%E2%9C%93&q=is%3Aissue+author%3Ajakobbotsch">GitHub Issues created by @jakobbotsch</a>)</p>
<hr />
<h2 id="summary">Summary</h2>
<p>I think that Fuzzlyn is a fantastic project; anything that roots out bugs or undesired behaviour in the JIT is a great benefit to all .NET Developers. If you want to see what the <em>potential</em> side-effects of JIT bugs can be, take a look at <a href="https://nickcraver.com/blog/2015/07/27/why-you-should-wait-on-dotnet-46/">Why you should wait on upgrading to .Net 4.6</a> by <a href="https://twitter.com/Nick_Craver">Nick Craver</a> (one of the developers at Stack Overflow).</p>
<p>Now, you could argue that some of the code patterns that Fuzzlyn detects are not ones you’d normally write, e.g. <code class="language-plaintext highlighter-rouge">if (((ushort)(s_18 | 2L) <= s_40[0]))</code>. But the wider point is that it’s <em>valid C# code</em>, which isn’t behaving as it should. Also, if you ever wrote this code you’d have a horrible time tracking down the problem because:</p>
<ol>
<li>Everyone knows that <a href="https://blog.codinghorror.com/the-first-rule-of-programming-its-always-your-fault/">The First Rule of Programming: It’s Always Your Fault</a> or <a href="https://lingpipe-blog.com/2007/06/27/select-isnt-broken-or-horses-not-zebras/">“select” Isn’t Broken</a>, i.e. getting to the point where you’re sure it is the compiler’s fault could take a while!</li>
<li>If you tried to debug it, the problem would go away (Fuzzlyn only finds Debug v. Release differences). At which point you might begin to doubt your sanity!</li>
</ol>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=17863554">Hacker News</a>, <a href="https://www.reddit.com/r/dotnet/comments/9b0qeo/fuzzing_the_net_jit_compiler_performance_is_a/">/r/dotnet</a> or <a href="https://www.reddit.com/r/csharp/comments/9b0qq5/fuzzing_the_net_jit_compiler_performance_is_a/">/r/csharp</a></p>
<hr />
<h2 id="further-reading">Further Reading</h2>
<p>Jakob was kind enough to share some additional links with me:</p>
<ul>
<li><a href="http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf">Finding and Understanding Bugs in C Compilers (Csmith)</a> (pdf)</li>
<li><a href="http://www.cs.utah.edu/~regehr/papers/pldi12-preprint.pdf">Test-Case Reduction for C Compiler Bugs (C-reduce)</a> (pdf)</li>
<li><a href="http://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf">QuickCheck: a lightweight tool for random testing of Haskell programs</a> (pdf)
<ul>
<li>This deals with test-case generation for general programs, not for compilers, but still an interesting paper nonetheless. QuickCheck also includes test case reduction, but unfortunately not much about it in their papers.</li>
</ul>
</li>
</ul>
<p>Also I asked him “<em>Is any part of Fuzzlyn based on well known techniques, is it all implemented from scratch, or somewhere in-between</em>?”</p>
<blockquote>
<p>The state-of-the-art fuzzing techniques are unfortunately not well suited for testing the later stages of compilers (eg. the code output, optimization stages and so on). These techniques are for example symbolic execution, taint tracking, input length exploration, path slicing and more. The problem is that compilers use many intermediate representations, and it is hard to cross reference between what the fuzzer is passing in and what code is being executed at each stage. Even getting something to parse is hard without some kind of knowledge about what the structure needs to be. Fuzzlyn does not use these techniques.</p>
<p>On the other hand, Fuzzlyn was very inspired by Csmith, which is a similar tool. But most of the code was written from scratch, since there is a big difference in generating C code (Csmith) and C# code. It is much more complicated to generate interesting C code that is free from undefined behavior.</p>
</blockquote>
Monitoring and Observability in the .NET Runtime 2018-08-21T00:00:00+00:00 http://www.mattwarren.org/2018/08/21/Monitoring-and-Observability-in-the-.NET-Runtime
<p>.NET is a <a href="https://en.wikipedia.org/wiki/Managed_code"><em>managed runtime</em></a>, which means that it provides high-level features that ‘manage’ your program for you. From <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/intro-to-clr.md#fundamental-features-of-the-clr">Introduction to the Common Language Runtime (CLR)</a> (written in 2007):</p>
<blockquote>
<p>The runtime has many features, so it is useful to categorize them as follows:</p>
<ol>
<li><strong>Fundamental features</strong> – Features that have broad impact on the design of other features. These include:
<ol>
<li>Garbage Collection</li>
<li>Memory Safety and Type Safety</li>
<li>High level support for programming languages.</li>
</ol>
</li>
<li><strong>Secondary features</strong> – Features enabled by the fundamental features that may not be required by many useful programs:
<ol>
<li>Program isolation with AppDomains</li>
<li>Program Security and sandboxing</li>
</ol>
</li>
<li><strong>Other Features</strong> – Features that all runtime environments need but that do not leverage the fundamental features of the CLR. Instead, they are the result of the desire to create a complete programming environment. Among them are:
<ol>
<li>Versioning</li>
<li><strong>Debugging/Profiling</strong></li>
<li>Interoperation</li>
</ol>
</li>
</ol>
</blockquote>
<p>You can see that ‘Debugging/Profiling’, whilst not a Fundamental or Secondary feature, still makes it into the list because of a ‘<em>desire to create a complete programming environment</em>’.</p>
<p><strong>The rest of this post will look at <em>what</em> <a href="https://en.wikipedia.org/wiki/Application_performance_management">Monitoring</a>, <a href="https://en.wikipedia.org/wiki/Observability">Observability</a> and <a href="https://en.wikipedia.org/wiki/Virtual_machine_introspection">Introspection</a> features the Core CLR provides, <em>why</em> they’re useful and <em>how</em> it provides them.</strong></p>
<p>To make it easier to navigate, the post is split up into 3 main sections (with some ‘extra-reading material’ at the end):</p>
<ul>
<li><a href="#diagnostics">Diagnostics</a>
<ul>
<li>Perf View</li>
<li>Common Infrastructure</li>
<li>Future Plans</li>
</ul>
</li>
<li><a href="#profiling">Profiling</a>
<ul>
<li>ICorProfiler API</li>
<li>Profiling v. Debugging</li>
</ul>
</li>
<li><a href="#debugging">Debugging</a>
<ul>
<li>ICorDebug API</li>
<li>SOS and the DAC</li>
<li>3rd Party Debuggers</li>
<li>Memory Dumps</li>
</ul>
</li>
<li><a href="#further-reading">Further Reading</a></li>
</ul>
<hr />
<h2 id="diagnostics">Diagnostics</h2>
<p>Firstly we are going to look at the <strong>diagnostic</strong> information that the CLR provides, which has traditionally been supplied via <a href="https://docs.microsoft.com/en-us/windows/desktop/etw/about-event-tracing">‘Event Tracing for Windows’</a> (ETW).</p>
<p>There is quite a wide range of events that the <a href="https://docs.microsoft.com/en-us/dotnet/framework/performance/clr-etw-keywords-and-levels">CLR provides</a> related to:</p>
<ul>
<li>Garbage Collection (GC)</li>
<li>Just-in-Time (JIT) Compilation</li>
<li>Module and AppDomains</li>
<li>Threading and Lock Contention</li>
<li>and much more</li>
</ul>
<p>For example this is where the <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/corhost.cpp#L649">AppDomain Load event is fired</a>, this is the <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/exceptionhandling.cpp#L203">Exception Thrown event</a> and here is the <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/gctoclreventsink.cpp#L139-L144">GC Allocation Tick event</a>.</p>
<h3 id="perf-view">Perf View</h3>
<p>If you want to see the ETW Events coming from your .NET program I recommend using the excellent <a href="https://github.com/Microsoft/perfview">PerfView tool</a> and starting with these <a href="https://channel9.msdn.com/Series/PerfView-Tutorial">PerfView Tutorials</a> or this excellent talk <a href="https://www.slideshare.net/InfoQ/perfview-the-ultimate-net-performance-tool">PerfView: The Ultimate .NET Performance Tool</a>. PerfView is well regarded because it provides invaluable information; for instance, Microsoft engineers regularly use it for <a href="https://github.com/dotnet/corefx/issues/28834">performance investigations</a>.</p>
<p><img src="/images/2018/08/PerfView - CPU Stacks.jpg" alt="PerfView - CPU Stacks" /></p>
<h3 id="common-infrastructure">Common Infrastructure</h3>
<p>However, in case it wasn’t clear from the name, ETW events are only available on Windows, which doesn’t really fit into the new ‘cross-platform’ world of .NET Core. You can use <a href="https://github.com/dotnet/coreclr/blob/release/2.1/Documentation/project-docs/linux-performance-tracing.md">PerfView for Performance Tracing on Linux</a> (via <a href="https://lttng.org/">LTTng</a>), but that is only the cmd-line collection tool, known as ‘PerfCollect’; the analysis and rich UI (which includes <a href="https://github.com/Microsoft/perfview/pull/502">flamegraphs</a>) is currently Windows only.</p>
<p>But if you do want to analyse .NET performance on Linux, there are some other approaches:</p>
<ul>
<li><a href="https://blogs.microsoft.co.il/sasha/2018/02/06/getting-stacks-for-lttng-events-with-net-core-on-linux/">Getting Stacks for LTTng Events with .NET Core on Linux</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/18465">Linux performance problem</a></li>
</ul>
<p>The 2nd link above discusses the new <strong>‘EventPipe’ infrastructure</strong> that is being worked on in .NET Core (along with EventSources & EventListeners, can you spot a theme?). You can see its aims in <a href="https://github.com/dotnet/designs/blob/master/accepted/cross-platform-performance-monitoring.md">Cross-Platform Performance Monitoring Design</a>. At a high level it will provide a single place for the CLR to push ‘events’ related to diagnostics and performance. These ‘events’ will then be routed to one or more loggers, which may include ETW, LTTng and BPF, with the exact logger being determined by which OS/Platform the CLR is running on. There is also more background information in <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/cross-platform-performance-and-eventing.md">.NET Cross-Plat Performance and Eventing Design</a> that explains the pros/cons of the different logging technologies.</p>
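<p>To picture the ‘single place to push events, many possible loggers’ design, here is a small C++ sketch. Every name in it (<code>Event</code>, <code>EventSink</code>, <code>RecordingSink</code>, <code>EventRouter</code>) is made up for this illustration and is not taken from the CoreCLR sources; a real sink would forward to ETW, LTTng or similar rather than to an in-memory list:</p>

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical and exist only for this illustration.
struct Event
{
    std::string provider;  // e.g. "GC" or "JIT"
    std::string name;      // e.g. "AllocationTick"
};

// A logger back end, standing in for ETW, LTTng, BPF and so on.
class EventSink
{
public:
    virtual ~EventSink() = default;
    virtual void Write(const Event& e) = 0;
};

// Toy sink that just remembers what it was given.
class RecordingSink : public EventSink
{
public:
    explicit RecordingSink(std::string label) : label_(std::move(label)) {}
    void Write(const Event& e) override
    {
        lines.push_back(label_ + ":" + e.provider + "/" + e.name);
    }
    std::vector<std::string> lines;

private:
    std::string label_;
};

// The single place the runtime pushes events to. It fans each event out to
// whichever sinks were registered for the current platform.
class EventRouter
{
public:
    void AddSink(std::shared_ptr<EventSink> sink) { sinks_.push_back(std::move(sink)); }
    void Publish(const Event& e)
    {
        for (auto& sink : sinks_)
            sink->Write(e);
    }

private:
    std::vector<std::shared_ptr<EventSink>> sinks_;
};
```

<p>The code that raises an event only ever talks to the router, so choosing a different logger per OS comes down to which sinks get registered at start-up.</p>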
<p>All the work being done on ‘Event Pipes’ is being tracked in the <a href="https://github.com/dotnet/coreclr/projects/5">‘Performance Monitoring’ project</a> and the associated <a href="https://github.com/dotnet/coreclr/search?q=EventPipe&type=Issues">‘EventPipe’ Issues</a>.</p>
<h3 id="future-plans">Future Plans</h3>
<p>Finally, there are also future plans for a <a href="https://github.com/dotnet/designs/blob/master/accepted/performance-profiling-controller.md">Performance Profiling Controller</a> which has the following goal:</p>
<blockquote>
<p>The controller is responsible for control of the profiling infrastructure and exposure of performance data produced by .NET performance diagnostics components in a simple and cross-platform way.</p>
</blockquote>
<p>The idea is for it to expose the <a href="https://github.com/dotnet/designs/blob/master/accepted/performance-profiling-controller.md#functionality-exposed-through-controller">following functionality via an HTTP server</a>, by pulling all the relevant data from ‘Event Pipes’:</p>
<blockquote>
<p><strong>REST APIs</strong></p>
<ul>
<li>Pri 1: Simple Profiling: Profile the runtime for X amount of time and return the trace.</li>
<li>Pri 1: Advanced Profiling: Start tracing (along with configuration)</li>
<li>Pri 1: Advanced Profiling: Stop tracing (the response to calling this will be the trace itself)</li>
<li>Pri 2: Get the statistics associated with all EventCounters or a specified EventCounter.</li>
</ul>
<p><strong>Browsable HTML Pages</strong></p>
<ul>
<li>Pri 1: Textual representation of all managed code stacks in the process.
<ul>
<li>Provides an snapshot overview of what’s currently running for use as a simple diagnostic report.</li>
</ul>
</li>
<li>Pri 2: Display the current state (potentially with history) of EventCounters.
<ul>
<li>Provides an overview of the existing counters and their values.</li>
<li>OPEN ISSUE: I don’t believe the necessary public APIs are present to enumerate EventCounters.</li>
</ul>
</li>
</ul>
</blockquote>
<p>I’m excited to see where the ‘Performance Profiling Controller’ (PPC?) goes. I think it’ll be really valuable for .NET to have this built into the CLR; it’s something that <a href="https://github.com/golang/go/wiki/Performance">other runtimes have</a>.</p>
<hr />
<h2 id="profiling">Profiling</h2>
<p>Another powerful feature the CLR provides is the <a href="https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/ms404386(v%3dvs.100)">Profiling API</a>, which is (mostly) used by 3rd party tools to hook into the runtime at a very low-level. You can find out more about the API in <a href="https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/bb384493(v%3dvs.100)">this overview</a>, but at a high-level, it allows you to wire up callbacks that are triggered when:</p>
<ul>
<li>GC-related events happen</li>
<li>Exceptions are thrown</li>
<li>Assemblies are loaded/unloaded</li>
<li><a href="https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/ms230818%28v%3dvs.100%29">much, much more</a></li>
</ul>
<p><img src="/images/2018/08/profiling-overview.png" alt="profiling-overview" /></p>
<p><strong>Image from the BOTR page <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/profiling.md#profiling-api--overview">Profiling API – Overview</a></strong></p>
<p>In addition it has other <strong>very powerful features</strong>. Firstly you can <strong>set up hooks that are called every time a .NET method is executed</strong>, whether in the runtime or in user code. These callbacks are known as ‘Enter/Leave’ hooks and there is a <a href="https://github.com/Microsoft/clr-samples/tree/master/ProfilingAPI/ReJITEnterLeaveHooks">nice sample</a> that shows how to use them. However, to make them work you need to understand <a href="https://github.com/dotnet/coreclr/issues/19023">‘calling conventions’ across different OSes and CPU architectures</a>, which <a href="https://github.com/dotnet/coreclr/issues/18977">isn’t always easy</a>. Also, as a warning, the Profiling API is a COM component that can only be accessed via C/C++ code, you can’t use it from C#/F#/VB.NET!</p>
<p>Secondly, the Profiler is able to <strong>re-write the IL code of any .NET method before it is JITted</strong>, via the <a href="https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerfunctioncontrol-setilfunctionbody-method">SetILFunctionBody() API</a>. This API is hugely powerful and forms the basis of many .NET <a href="https://stackify.com/application-performance-management-tools/">APM Tools</a>. You can learn more about how to use it in my previous post <a href="/2014/08/14/how-to-mock-sealed-classes-and-static-methods/">How to mock sealed classes and static methods</a> and the <a href="https://github.com/mattwarren/DDD2011_ProfilerDemo/commit/9f804cec8ef11b802e020e648180b436a429833f?w=1">accompanying code</a>.</p>
<h3 id="icorprofiler-api">ICorProfiler API</h3>
<p>It turns out that the run-time has to perform all sorts of crazy tricks to make the Profiling API work, just look at what went into this PR <a href="https://github.com/dotnet/coreclr/pull/19054">Allow rejit on attach</a> (for more info on ‘ReJIT’ see <a href="https://blogs.msdn.microsoft.com/davbr/2011/10/12/rejit-a-how-to-guide/">ReJIT: A How-To Guide</a>).</p>
<p>The overall definition for all the Profiling API interfaces and callbacks is found in <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/corprof.idl">\inc\corprof.idl</a> (see <a href="https://en.wikipedia.org/wiki/Interface_description_language">Interface description language</a>). But it’s divided into 2 logical parts, one is the <strong>Profiler -> ‘Execution Engine’ (EE)</strong> interface, known as <code class="language-plaintext highlighter-rouge">ICorProfilerInfo</code>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Declaration of class that implements the ICorProfilerInfo* interfaces, which allow the</span>
<span class="c1">// Profiler to communicate with the EE. This allows the Profiler DLL to get</span>
<span class="c1">// access to private EE data structures and other things that should never be exported</span>
<span class="c1">// outside of the EE.</span>
</code></pre></div></div>
<p>Which is implemented in the following files:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/proftoeeinterfaceimpl.h">\vm\proftoeeinterfaceimpl.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/proftoeeinterfaceimpl.inl">\vm\proftoeeinterfaceimpl.inl</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/proftoeeinterfaceimpl.cpp">\vm\proftoeeinterfaceimpl.cpp</a></li>
</ul>
<p>The other main part is the <strong>EE -> Profiler</strong> callbacks, which are grouped together under the <code class="language-plaintext highlighter-rouge">ICorProfilerCallback</code> interface:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// This module implements wrappers around calling the profiler's </span>
<span class="c1">// ICorProfilerCallaback* interfaces. When code in the EE needs to call the</span>
<span class="c1">// profiler, it goes through EEToProfInterfaceImpl to do so.</span>
</code></pre></div></div>
<p>These callbacks are implemented across the following files:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/eetoprofinterfaceimpl.h">\vm\eetoprofinterfaceimpl.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/eetoprofinterfaceimpl.inl">\vm\eetoprofinterfaceimpl.inl</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/eetoprofinterfaceimpl.cpp">\vm\eetoprofinterfaceimpl.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/eetoprofinterfacewrapper.inl">\vm\eetoprofinterfacewrapper.inl</a></li>
</ul>
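<p>To make the two directions concrete, here is a toy C++ sketch. It is only loosely modelled on the split described above: the class and method names (<code>ToyProfilerInfo</code>, <code>ToyProfilerCallback</code>, <code>JitStarted</code> and so on) are invented for this example, and the real COM interfaces have different signatures and many more methods:</p>

```cpp
#include <map>
#include <string>
#include <vector>

using FunctionId = unsigned;

// Profiler -> EE direction: questions the profiler may ask the runtime.
// (Toy stand-in for the ICorProfilerInfo* family, not the real signatures.)
class ToyProfilerInfo
{
public:
    virtual ~ToyProfilerInfo() = default;
    virtual std::string GetFunctionName(FunctionId id) const = 0;
};

// EE -> Profiler direction: notifications the runtime sends to the profiler.
// (Toy stand-in for the ICorProfilerCallback* family.)
class ToyProfilerCallback
{
public:
    virtual ~ToyProfilerCallback() = default;
    virtual void Initialize(const ToyProfilerInfo* info) = 0;
    virtual void JitStarted(FunctionId id) = 0;
    virtual void JitFinished(FunctionId id) = 0;
};

// A profiler that turns the notifications into a readable trace, calling
// back into the runtime to resolve ids to names.
class TraceProfiler : public ToyProfilerCallback
{
public:
    void Initialize(const ToyProfilerInfo* info) override { info_ = info; }
    void JitStarted(FunctionId id) override { trace.push_back("jit start " + info_->GetFunctionName(id)); }
    void JitFinished(FunctionId id) override { trace.push_back("jit end " + info_->GetFunctionName(id)); }
    std::vector<std::string> trace;

private:
    const ToyProfilerInfo* info_ = nullptr;
};

// A pretend execution engine that owns the metadata and raises the callbacks.
class ToyRuntime : public ToyProfilerInfo
{
public:
    explicit ToyRuntime(ToyProfilerCallback* profiler) : profiler_(profiler)
    {
        profiler_->Initialize(this);  // hand the profiler its way back into the EE
    }
    FunctionId Define(const std::string& name)
    {
        FunctionId id = next_++;
        names_[id] = name;
        return id;
    }
    std::string GetFunctionName(FunctionId id) const override { return names_.at(id); }
    void Compile(FunctionId id)
    {
        profiler_->JitStarted(id);
        // ... the JIT would generate machine code here ...
        profiler_->JitFinished(id);
    }

private:
    ToyProfilerCallback* profiler_;
    std::map<FunctionId, std::string> names_;
    FunctionId next_ = 1;
};
```

<p>The runtime only knows about the callback interface and the profiler only knows about the info interface, which is why each direction can live in its own set of files.</p>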
<p>Finally, it’s worth pointing out that the Profiler APIs might not work across all OSes and CPU-archs that .NET Core runs on, e.g. <a href="https://github.com/dotnet/coreclr/issues/18977">ELT call stub issues on Linux</a>, see <a href="https://github.com/dotnet/coreclr/blob/release/2.1/Documentation/project-docs/profiling-api-status.md">Status of CoreCLR Profiler APIs</a> for more info.</p>
<h3 id="profiling-v-debugging">Profiling v. Debugging</h3>
<p>As a quick aside, ‘Profiling’ and ‘Debugging’ do have some overlap, so it’s helpful to understand what the different APIs provide <em>in the context of the .NET Runtime</em>, from <a href="https://blogs.msdn.microsoft.com/jmstall/2004/10/22/clr-debugging-vs-clr-profiling/">CLR Debugging vs. CLR Profiling</a>:</p>
<p><img src="/images/2018/08/Design Differences between CLR Debugging and CLR Profiling.png" alt="Design Differences between CLR Debugging and CLR Profiling" /></p>
<hr />
<h2 id="debugging">Debugging</h2>
<p>Debugging means different things to different people. For instance, I asked on Twitter “<em>what are the ways that you’ve debugged a .NET program</em>” and got a <a href="https://mobile.twitter.com/matthewwarren/status/1030444463385178113">wide range</a> of <a href="https://mobile.twitter.com/matthewwarren/status/1030580487969038344">different responses</a>. Both sets of responses contain a really good list of tools and techniques, so they’re worth checking out, thanks #LazyWeb!</p>
<p>But perhaps this quote best sums up what <strong>Debugging really is</strong> 😊</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Debugging is like being the detective in a crime movie where you are also the murderer.</p>— Filipe Fortes (@fortes) <a href="https://twitter.com/fortes/status/399339918213652480?ref_src=twsrc%5Etfw">November 10, 2013</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>The CLR provides a very extensive range of features related to Debugging, but why does it need to provide these services? The excellent post <a href="https://blogs.msdn.microsoft.com/jmstall/2004/10/10/why-is-managed-debugging-different-than-native-debugging/">Why is managed debugging different than native-debugging?</a> provides 3 reasons:</p>
<ol>
<li>Native debugging can be abstracted at the hardware level but <strong>managed debugging needs to be abstracted at the IL level</strong></li>
<li>Managed debugging needs a lot of information <strong>not available until runtime</strong></li>
<li>A managed debugger needs to <strong>coordinate with the Garbage Collector (GC)</strong></li>
</ol>
<p>So to give a decent experience, the CLR <em>has</em> to provide the <a href="https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/debugging/">higher-level debugging API</a> known as <code class="language-plaintext highlighter-rouge">ICorDebug</code>, which is shown in the image below of a ‘common debugging scenario’ from <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/dac-notes.md#marshaling-specifics">the BOTR</a>:</p>
<p><img src="/images/2018/08/common debugging scenario.png" alt="common debugging scenario" /></p>
<p>In addition, there is a nice description of how the different parts interact in <a href="https://blogs.msdn.microsoft.com/jmstall/2004/12/28/how-do-managed-breakpoints-work/">How do Managed Breakpoints work?</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Here’s an overview of the pipeline of components:
1) End-user
2) Debugger (such as Visual Studio or MDbg).
3) CLR Debugging Services (which we call "The Right Side"). This is the implementation of ICorDebug (in mscordbi.dll).
---- process boundary between Debugger and Debuggee ----
4) CLR. This is mscorwks.dll. This contains the in-process portion of the debugging services (which we call "The Left Side") which communicates directly with the RS in stage #3.
5) Debuggee's code (such as end users C# program)
</code></pre></div></div>
<h3 id="icordebug-api">ICorDebug API</h3>
<p>But how is all this implemented and what are the different components? From <a href="https://github.com/Microsoft/clrmd/blob/master/Documentation/GettingStarted.md#clr-debugging-a-brief-introduction">CLR Debugging, a brief introduction</a>:</p>
<blockquote>
<p>All of .Net debugging support is implemented on top of a dll we call “The Dac”. This file (usually named <code class="language-plaintext highlighter-rouge">mscordacwks.dll</code>) is the building block for both our public debugging API (<code class="language-plaintext highlighter-rouge">ICorDebug</code>) as well as the two private debugging APIs: The SOS-Dac API and IXCLR.</p>
<p>In a perfect world, everyone would use <code class="language-plaintext highlighter-rouge">ICorDebug</code>, our public debugging API. However a vast majority of features needed by tool developers such as yourself is lacking from <code class="language-plaintext highlighter-rouge">ICorDebug</code>. This is a problem that we are fixing where we can, but these improvements go into CLR v.next, not older versions of CLR. In fact, the <code class="language-plaintext highlighter-rouge">ICorDebug</code> API only added support for crash dump debugging in CLR v4. Anyone debugging CLR v2 crash dumps cannot use <code class="language-plaintext highlighter-rouge">ICorDebug</code> at all!</p>
</blockquote>
<p>(for an additional write-up, see <a href="https://github.com/dotnet/coreclr/blob/master/src/ToolBox/SOS/SOSAndICorDebug.md">SOS & ICorDebug</a>)</p>
<p>The <code class="language-plaintext highlighter-rouge">ICorDebug</code> API is actually split up into multiple interfaces, there are over 70 of them!! I won’t list them all here, but I will show the categories they fall into, for more info see <a href="https://blogs.msdn.microsoft.com/jmstall/2006/01/04/partition-of-icordebug/">Partition of ICorDebug</a> where this list came from, as it goes into much more detail.</p>
<ul>
<li><strong>Top-level:</strong> ICorDebug + ICorDebug2 are the top-level interfaces which effectively serve as a collection of ICorDebugProcess objects.</li>
<li><strong>Callbacks:</strong> Managed debug events are dispatched via methods on a callback object implemented by the debugger</li>
<li><strong>Process:</strong> This set of interfaces represents running code and includes the APIs related to eventing.</li>
<li><strong>Code / Type Inspection:</strong> Could mostly operate on a static PE image, although there are a few convenience methods for live data.</li>
<li><strong>Execution Control:</strong> Execution is the ability to “inspect” a thread’s execution. Practically, this means things like placing breakpoints (F9) and doing stepping (F11 step-in, F10 step-over, S+F11 step-out). ICorDebug’s Execution control only operates within managed code.</li>
<li><strong>Threads + Callstacks:</strong> Callstacks are the backbone of the debugger’s inspection functionality. The following interfaces are related to taking a callstack. ICorDebug only exposes debugging managed code, and thus the stacks traces are managed-only.</li>
<li><strong>Object Inspection:</strong> Object inspection is the part of the API that lets you see the values of the variables throughout the debuggee. For each interface, I list the “MVP” method that I think most succinctly conveys the purpose of that interface.</li>
</ul>
<p>One other note, as with the Profiling APIs the level of support for the Debugging API varies across OSes and CPU architectures. For instance, as of Aug 2018 there’s <a href="https://github.com/dotnet/diagnostics/issues/58#issuecomment-414182115">“no solution for Linux ARM of managed debugging and diagnostic”</a>. For more info on ‘Linux’ support in general, see this great post <a href="https://www.raydbg.com/2018/Debugging-Net-Core-on-Linux-with-LLDB/">Debugging .NET Core on Linux with LLDB</a> and check-out the <a href="https://github.com/dotnet/diagnostics">Diagnostics repository</a> from Microsoft that has the goal of making it easier to debug .NET programs on Linux.</p>
<p>Finally, if you want to see what the <code class="language-plaintext highlighter-rouge">ICorDebug</code> APIs look like in C#, take a look at the <a href="https://github.com/Microsoft/clrmd/blob/master/src/Microsoft.Diagnostics.Runtime/ICorDebug/ICorDebugWrappers.cs">wrappers included in CLRMD library</a>, including all the <a href="https://github.com/Microsoft/clrmd/blob/c81a592f3041a9ae86f4c09351d8183801e39eed/src/Microsoft.Diagnostics.Runtime/ICorDebug/ICorDebugHelpers.cs">available callbacks</a> (CLRMD will be covered in more depth, later on in this post).</p>
<h3 id="sos-and-the-dac">SOS and the DAC</h3>
<p>The ‘Data Access Component’ (DAC) is discussed in detail in the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/dac-notes.md">BOTR page</a>, but in essence it provides ‘out-of-process’ access to the CLR data structures, so that their internal details can be read from <em>another process</em>. This allows a debugger (via <code class="language-plaintext highlighter-rouge">ICorDebug</code>) or the <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/sos-dll-sos-debugging-extension">‘Son of Strike’ (SOS) extension</a> to reach into a running instance of the CLR or a memory dump and find things like:</p>
<ul>
<li>all the running threads</li>
<li>what objects are on the managed heap</li>
<li>full information about a method, including the machine code</li>
<li>the current ‘stack trace’</li>
</ul>
<p><strong>Quick aside</strong>, if you want an explanation of all the strange names and a bit of a ‘.NET History Lesson’ see <a href="https://stackoverflow.com/questions/21361602/what-the-ee-means-in-sos/21363245#21363245">this Stack Overflow answer</a>.</p>
<p>The full list of <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md#sos-commands">SOS Commands</a> is quite impressive and using it along-side WinDBG gives you a very low-level insight into what’s going on in your program and the CLR. To see how it’s implemented, let’s take a look at the <code class="language-plaintext highlighter-rouge">!HeapStat</code> command that gives you a summary of the size of different Heaps that the .NET GC is using:</p>
<p><img src="/images/2018/08/SOS-heapstat-cmd.png" alt="SOS-heapstat-cmd.png" /></p>
<p>(image from <a href="https://blogs.msdn.microsoft.com/tom/2008/06/30/sos-upcoming-release-has-a-few-new-commands-heapstat/">SOS: Upcoming release has a few new commands – HeapStat</a>)</p>
<p>Here’s the code flow, showing how SOS and the DAC work together:</p>
<ul>
<li><strong>SOS</strong> The full <code class="language-plaintext highlighter-rouge">!HeapStat</code> command (<a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/ToolBox/SOS/Strike/strike.cpp#L4605-L4782">link</a>)</li>
<li><strong>SOS</strong> The code in the <code class="language-plaintext highlighter-rouge">!HeapStat</code> command that deals with the ‘Workstation GC’ (<a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/ToolBox/SOS/Strike/strike.cpp#L4631-L4667">link</a>)</li>
<li><strong>SOS</strong> <code class="language-plaintext highlighter-rouge">GCHeapUsageStats(..)</code> function that does the heavy-lifting (<a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/ToolBox/SOS/Strike/eeheap.cpp#L768-L850">link</a>)</li>
<li><strong>Shared</strong> The <code class="language-plaintext highlighter-rouge">DacpGcHeapDetails</code> data structure that contains pointers to the main data in the GC heap, such as segments, card tables and individual generations (<a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/dacprivate.h#L690-L722">link</a>)</li>
<li><strong>DAC</strong> <code class="language-plaintext highlighter-rouge">GetGCHeapStaticData</code> function that fills-out the <code class="language-plaintext highlighter-rouge">DacpGcHeapDetails</code> struct (<a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/dacprivate.h#L690-L722">link</a>)</li>
<li><strong>Shared</strong> the <code class="language-plaintext highlighter-rouge">DacpHeapSegmentData</code> data structure that contains details for an individual ‘segment’ within the GC Heap (<a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/dacprivate.h#L738-L771">link</a>)</li>
<li><strong>DAC</strong> <code class="language-plaintext highlighter-rouge">GetHeapSegmentData(..)</code> that fills-out the <code class="language-plaintext highlighter-rouge">DacpHeapSegmentData</code> struct (<a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/debug/daccess/request.cpp#L2829-L2868">link</a>)</li>
</ul>
<h3 id="3rd-party-debuggers">3rd Party ‘Debuggers’</h3>
<p>Because Microsoft published the debugging API, 3rd parties have been able to make use of the <code class="language-plaintext highlighter-rouge">ICorDebug</code> interfaces. Here’s a list of some that I’ve come across:</p>
<ul>
<li><a href="https://github.com/Samsung/netcoredbg">Debugger for .NET Core runtime</a> from <a href="https://github.com/Samsung">Samsung</a>
<ul>
<li>The debugger provides GDB/MI or VSCode debug adapter interface and allows to debug .NET apps under .NET Core runtime.</li>
<li><em>Probably</em> written as part of their work on <a href="https://developer.tizen.org/blog/celebrating-.net-core-2.0-looking-forward-tizen-4.0">porting .NET Core to their Tizen OS</a></li>
</ul>
</li>
<li><a href="https://github.com/0xd4d/dnSpy">dnSpy</a> - “.NET debugger and assembly editor”
<ul>
<li>A <a href="https://github.com/0xd4d/dnSpy#features-see-below-for-more-detail"><strong>very</strong> impressive tool</a>, it’s a ‘debugger’, ‘assembly editor’, ‘hex editor’, ‘decompiler’ and much more!</li>
</ul>
</li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/mdbg-exe">MDbg.exe (.NET Framework Command-Line Debugger)</a>
<ul>
<li>Available as a <a href="https://www.nuget.org/packages/Microsoft.Samples.Debugging.MdbgEngine">NuGet package</a> and a <a href="https://github.com/SymbolSource/Microsoft.Samples.Debugging/tree/master/src">GitHub repo</a> or you can <a href="https://www.microsoft.com/en-us/download/details.aspx?id=2282">download it from Microsoft</a>.</li>
<li>However, at the moment MDBG doesn’t seem to work with .NET Core, see <a href="https://github.com/dotnet/coreclr/issues/1145">Port MDBG to CoreCLR</a> and <a href="https://github.com/dotnet/coreclr/issues/8999">ETA for porting mdbg to coreclr</a> for some more information.</li>
</ul>
</li>
<li><a href="https://blog.jetbrains.com/dotnet/2017/02/23/rider-eap-18-coreclr-debugging-back-windows/">JetBrains ‘Rider’</a> allows .NET Core debugging on Windows
<ul>
<li>Although <a href="https://blog.jetbrains.com/dotnet/2017/02/15/rider-eap-17-nuget-unit-testing-build-debugging/">there was some controversy</a> due to licensing issues</li>
<li>For more info, see <a href="https://news.ycombinator.com/item?id=17323911">this HackerNews thread</a></li>
</ul>
</li>
</ul>
<h3 id="memory-dumps">Memory Dumps</h3>
<p>The final area we are going to look at is ‘memory dumps’, which can be captured from a <em>live</em> system and analysed off-line. The .NET runtime has always had good support for <a href="https://msdn.microsoft.com/en-us/library/dn342825.aspx?f=255&MSPPError=-2147217396#BKMK_Collect_memory_snapshots">creating ‘memory dumps’ on Windows</a> and now that .NET Core is ‘cross-platform’, there are also tools available to <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/xplat-minidump-generation.md">do the same on other OSes</a>.</p>
<p>One of the issues with ‘memory dumps’ is that it can be tricky to get hold of the correct, matching versions of the SOS and DAC files. Fortunately Microsoft have just released the <a href="https://github.com/dotnet/symstore/tree/master/src/dotnet-symbol"><code class="language-plaintext highlighter-rouge">dotnet symbol</code> CLI tool</a> that:</p>
<blockquote>
<p>can download all the files needed for debugging (symbols, modules, SOS and DAC for the coreclr module given) for any given core dump, minidump or any supported platform’s file formats like ELF, MachO, Windows DLLs, PDBs and portable PDBs.</p>
</blockquote>
<p>Finally, if you spend any length of time <strong>analysing ‘memory dumps’</strong> you really should take a look at the excellent <a href="https://github.com/Microsoft/clrmd">CLR MD library</a> that Microsoft released a few years ago. I’ve <a href="/2016/09/06/Analysing-.NET-Memory-Dumps-with-CLR-MD/">previously written about</a> what you can do with it, but in a nutshell, it allows you to interact with memory dumps via an intuitive C# API, with classes that provide access to the <a href="https://github.com/Microsoft/clrmd/blob/master/src/Microsoft.Diagnostics.Runtime/ClrHeap.cs#L16">ClrHeap</a>, <a href="https://github.com/Microsoft/clrmd/blob/6735e1012d11c244874fa3ba3af6e73edc0da552/src/Microsoft.Diagnostics.Runtime/GCRoot.cs#L105">GC Roots</a>, <a href="https://github.com/Microsoft/clrmd/blob/master/src/Microsoft.Diagnostics.Runtime/ClrThread.cs#L103">CLR Threads</a>, <a href="https://github.com/Microsoft/clrmd/blob/master/src/Microsoft.Diagnostics.Runtime/ClrThread.cs#L37">Stack Frames</a> and <a href="https://github.com/Microsoft/clrmd/tree/master/src/Samples">much more</a>. In fact, aside from the time needed to do the work, CLR MD could <a href="https://github.com/Microsoft/clrmd/issues/33">implement <em>most</em> (if not all) of the SOS commands</a>.</p>
<p>But how does it work? From the <a href="https://blogs.msdn.microsoft.com/dotnet/2013/05/01/net-crash-dump-and-live-process-inspection/">announcement post</a>:</p>
<blockquote>
<p>The ClrMD managed library is a wrapper around CLR internal-only debugging APIs. Although those internal-only APIs are very useful for diagnostics, we do not support them as a public, documented release because they are incredibly difficult to use and tightly coupled with other implementation details of the CLR. ClrMD addresses this problem by providing an easy-to-use managed wrapper around these low-level debugging APIs.</p>
</blockquote>
<p>By making these APIs available, in an officially supported library, Microsoft have enabled developers to build a <a href="/2018/06/15/Tools-for-Exploring-.NET-Internals/#tools-based-on-clr-memory-diagnostics-clrmd">wide range of tools</a> on top of CLRMD, which is a great result!</p>
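<p>To give a flavour of the C# API described above, here’s a minimal sketch that lists the managed threads in a dump and then sums the size of the objects on the managed heap by type. It assumes the 1.x API of the <code class="language-plaintext highlighter-rouge">Microsoft.Diagnostics.Runtime</code> NuGet package (member names may differ in later versions), the dump path is a made-up placeholder and the <code class="language-plaintext highlighter-rouge">DumpSummary</code> class is purely illustrative, it isn’t taken from the CLR MD samples:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Diagnostics.Runtime; // CLR MD NuGet package (1.x API assumed)

class DumpSummary
{
    static void Main(string[] args)
    {
        // Hypothetical default path, pass the location of your own dump file instead
        string dumpPath = args.Length &gt; 0 ? args[0] : @"C:\dumps\HelloWorld.dmp";

        using (DataTarget target = DataTarget.LoadCrashDump(dumpPath))
        {
            // A dump can contain more than one CLR, this sketch just takes the first
            ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

            // Managed threads and their stack frames
            foreach (ClrThread thread in runtime.Threads)
            {
                Console.WriteLine($"Thread {thread.ManagedThreadId} (OS id {thread.OSThreadId:X})");
                foreach (ClrStackFrame frame in thread.StackTrace)
                    Console.WriteLine($"    {frame}");
            }

            // Total size of the objects on the managed heap, grouped by type name
            var sizeByType = new Dictionary&lt;string, ulong&gt;();
            ClrHeap heap = runtime.Heap;
            foreach (ulong address in heap.EnumerateObjectAddresses())
            {
                ClrType type = heap.GetObjectType(address);
                if (type == null)
                    continue; // the type can't always be resolved, e.g. with a corrupt heap

                sizeByType.TryGetValue(type.Name, out ulong total);
                sizeByType[type.Name] = total + type.GetSize(address);
            }

            // Print the 10 types that take up the most space
            foreach (var entry in sizeByType.OrderByDescending(e =&gt; e.Value).Take(10))
                Console.WriteLine($"{entry.Value,12:N0} bytes  {entry.Key}");
        }
    }
}
</code></pre></div></div>
<p>Note that CLR MD still needs a DAC that matches the runtime in the dump, so this is only likely to work when the correct files are available on the machine doing the analysis.</p>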
<hr />
<p><strong>So in summary, the .NET Runtime provides a wide range of diagnostic, debugging and profiling features that allow a deep insight into what’s going on inside the CLR.</strong></p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=17819352">HackerNews</a>, <a href="https://www.reddit.com/r/programming/comments/994119/monitoring_and_observability_in_the_net_runtime/">/r/programming</a> or <a href="https://www.reddit.com/r/csharp/comments/9940cm/monitoring_and_observability_in_the_net_runtime/">/r/csharp</a></p>
<hr />
<h1 id="further-reading">Further Reading</h1>
<p>Where appropriate I’ve included additional links that cover the topics discussed in this post.</p>
<p><strong>General</strong></p>
<ul>
<li><a href="https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c">Monitoring and Observability</a></li>
<li><a href="https://thenewstack.io/monitoring-and-observability-whats-the-difference-and-why-does-it-matter/">Monitoring and Observability — What’s the Difference and Why Does It Matter?</a></li>
</ul>
<p><strong>ETW Events and PerfView:</strong></p>
<ul>
<li><a href="https://assets.ctfassets.net/9n3x4rtjlya6/6A7ZxhamzKQI8cq0ikgYYO/d6430a29037100f73c235584ddada75f/Dina_Goldshtein_ETW_-_Monitor_Anything.pdf">ETW - Monitor Anything, Anytime, Anywhere</a> (pdf) by <a href="https://twitter.com/dinagozil?lang=en">Dina Goldshtein</a></li>
<li><a href="https://ruxcon.org.au/assets/2016/slides/ETW_16_RUXCON_NJR_no_notes.pdf">Make ETW Great Again</a> (pdf)</li>
<li><a href="https://www.cyberpointllc.com/posts/cp-logging-keystrokes-with-event-tracing-for-windows-etw.html">Logging Keystrokes with Event Tracing for Windows (ETW)</a></li>
<li>PerfView is based on <a href="https://github.com/Microsoft/perfview/blob/master/documentation/TraceEvent/TraceEventLibrary.md">Microsoft.Diagnostics.Tracing.TraceEvent</a>, which means you can easily write code to collect ETW events yourself, for example <a href="https://github.com/Microsoft/perfview/blob/master/src/TraceEvent/Samples/21_ObserveJitEvents.cs">‘Observe JIT Events’ sample</a></li>
<li>More info in the <a href="https://github.com/Microsoft/perfview/blob/master/documentation/TraceEvent/TraceEventProgrammersGuide.md">TraceEvent Library Programmers Guide</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/windows-performance-tracing.md">Performance Tracing on Windows</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/Documentation/coding-guidelines/EventLogging.md">CoreClr Event Logging Design</a></li>
<li><a href="https://blogs.msdn.microsoft.com/dotnet/2018/10/24/bringing-net-application-performance-analysis-to-linux/">Bringing .NET application performance analysis to Linux</a> (introduction on the .NET Blog)</li>
<li><a href="https://lttng.org/blog/2018/08/28/bringing-dotnet-perf-analysis-to-linux/">Bringing .NET application performance analysis to Linux</a> (more detailed post on the LTTng blog)</li>
</ul>
<p><strong>Profiling API:</strong></p>
<ul>
<li>Read all of <a href="https://blogs.msdn.microsoft.com/davbr/">David Broman’s CLR Profiling API Blog</a>, seriously if you want to use the Profiling API, this is the place to start!</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/Documentation/botr/profiling.md">BOTR - Profiling</a> - explains what the ‘Profiling API’ provides, what you can do with it and how to use it.</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/Documentation/botr/profilability.md">BOTR - Profilability</a> - discusses what needs to be done within the CLR <em>itself</em> to make profiling possible.</li>
<li>Interesting presentation <a href="https://dotnetstammtisch.at/slides/003/The-Profiling-API.pdf">The .NET Profiling API</a> (pdf)</li>
<li><a href="https://yaozhenhua.wordpress.com/2012/05/07/thought-on-managed-code-injection-and-interception/">Thought(s) on managed code injection and interception</a></li>
<li><a href="https://blogs.msdn.microsoft.com/rmbyers/2008/10/30/clr-4-0-advancements-in-diagnostics/">CLR 4.0 advancements in diagnostics</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/4382">Profiling: How to get GC Metrics in-process</a></li>
</ul>
<p><strong>Debugging:</strong></p>
<ul>
<li>Again, if you are serious about using the Debugging API, you must read all of <a href="https://blogs.msdn.microsoft.com/jmstall">Mike Stall’s .NET Debugging Blog</a>, great stuff, including:
<ul>
<li><a href="https://blogs.msdn.microsoft.com/jmstall/2004/12/28/how-do-managed-breakpoints-work/">How do Managed Breakpoints work?</a></li>
<li><a href="https://blogs.msdn.microsoft.com/jmstall/2005/02/23/debugging-any-net-language/">Debugging any .Net language</a></li>
<li><a href="https://blogs.msdn.microsoft.com/jmstall/2004/10/05/how-can-i-use-icordebug/">How can I use ICorDebug?</a></li>
<li><a href="https://blogs.msdn.microsoft.com/jmstall/2005/11/05/you-cant-debug-yourself/">You can’t debug yourself</a></li>
<li><a href="https://blogs.msdn.microsoft.com/jmstall/2005/11/28/tool-to-get-snapshot-of-managed-callstacks/">Tool to get snapshot of managed callstacks</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/dac-notes.md">BOTR Data Access Component (DAC) Notes</a></li>
<li><a href="http://blogs.microsoft.co.il/pavely/2012/04/03/whats-new-in-clr-45-debugging-api/">What’s New in CLR 4.5 Debugging API?</a></li>
<li><a href="https://lowleveldesign.org/2010/10/11/writing-a-net-debugger-part-1-starting-the-debugging-session/">Writing a .Net Debugger</a>, <a href="https://lowleveldesign.org/2010/10/22/writing-a-net-debugger-part-2-handling-events-and-creating-wrappers/">Part 2</a>, <a href="https://lowleveldesign.org/2010/11/08/writing-a-net-debugger-part-3-symbol-and-source-files/">Part 3</a> and <a href="https://lowleveldesign.org/2010/12/01/writing-a-net-debugger-part-4-breakpoints/">Part 4</a></li>
<li><a href="https://tripleemcoder.com/2011/12/10/writing-an-automatic-debugger-in-15-minutes-yes-a-debugger/">Writing an automatic debugger in 15 minutes (yes, a debugger!)</a></li>
<li>PR to <a href="https://github.com/dotnet/coreclr/pull/18160">add SOS DumpAsync command</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/8363">Question: what remaining SOS commands need to be ported to Linux/OS X</a></li>
</ul>
<p><strong>Memory Dumps:</strong></p>
<ul>
<li><a href="http://voneinem-windbg.blogspot.com/2007/03/creating-and-analyzing-minidumps-in-net.html">Creating and analyzing minidumps in .NET production applications</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2015/08/19/minidumper-smaller-dumps-net-applications/">Creating Smaller, But Still Usable, Dumps of .NET Applications</a> and <a href="http://blogs.microsoft.co.il/sasha/2015/09/30/more-on-minidumper-getting-the-right-memory-pages-for-net-analysis/">More on - MiniDumper: Getting the Right Memory Pages for .NET Analysis</a></li>
<li><a href="https://lowleveldesign.org/2018/02/22/minidumper-a-better-way-to-create-managed-memory-dumps/">Minidumper – A Better Way to Create Managed Memory Dumps</a></li>
<li><a href="http://www.debuginfo.com/tools/clrdump.html">ClrDump is a set of tools that allow to produce small minidumps of managed applications</a></li>
</ul>
Presentations and Talks covering '.NET Internals'2018-07-12T00:00:00+00:00http://www.mattwarren.org/2018/07/12/Presentations and Talks covering .NET Internals
<p>I’m constantly surprised at just <em>how popular</em> resources related to ‘.NET Internals’ are, for instance take this tweet and the thread that followed:</p>
<blockquote class="twitter-tweet" data-cards="hidden" data-lang="en"><p lang="en" dir="ltr">If you like learning about '.NET Internals' here's a few talks/presentations I've watched that you might also like. First 'Writing High Performance Code in .NET' by Bart de Smet <a href="https://t.co/L5S9BsBlWe">https://t.co/L5S9BsBlWe</a></p>— Matt Warren (@matthewwarren) <a href="https://twitter.com/matthewwarren/status/1016315333584531456?ref_src=twsrc%5Etfw">July 9, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>All I’d done was put together a list of Presentations/Talks (based on the criteria below) and people <strong>really seemed to appreciate it</strong>!!</p>
<hr />
<h2 id="criteria">Criteria</h2>
<p>To keep things focussed, the talks or presentations:</p>
<ul>
<li>Must explain some aspect of the <strong>‘internals’ of the .NET Runtime</strong> (CLR)
<ul>
<li>i.e. something ‘<em>under-the-hood</em>’, the more ‘<em>low-level</em>’ the better!</li>
<li>e.g. how the GC works, what the JIT does, how assemblies are structured, how to inspect what’s going on, etc</li>
</ul>
</li>
<li>Be entertaining and <strong>worth watching</strong>!
<ul>
<li>i.e. worth someone giving up 40-50 mins of their time for</li>
<li>this is hard when you’re talking about low-level details, not all speakers manage it!</li>
</ul>
</li>
<li>Needs to be a talk that I’ve <strong>watched myself</strong> and actually learnt something from
<ul>
<li>i.e. I don’t just hope it’s good based on the speaker/topic</li>
</ul>
</li>
<li>Doesn’t have to be unique, fine if it <strong>overlaps with another talk</strong>
<ul>
<li>it often helps having two people cover the same idea, from different perspectives</li>
</ul>
</li>
</ul>
<p>If you want more general lists of talks and presentations see <a href="https://github.com/JanVanRyswyck/awesome-talks">Awesome talks</a> and <a href="https://github.com/adamsitnik/awesome-dot-net-performance#conference-talks">Awesome .NET Performance</a></p>
<hr />
<h2 id="list-of-talks">List of Talks</h2>
<p>Here’s the complete list of talks, including a few bonus ones that weren’t in the tweet:</p>
<ol>
<li><a href="#perfview"><strong>PerfView: The Ultimate .NET Performance Tool</strong></a> by <a href="https://twitter.com/goldshtn">Sasha Goldshtein</a></li>
<li><a href="#highperfcode"><strong>Writing High Performance Code in .NET</strong></a> by <a href="https://channel9.msdn.com/Tags/bart+de+smet">Bart De Smet</a></li>
<li><a href="#stateofperf"><strong>State of the .NET Performance</strong></a> by <a href="https://twitter.com/sitnikadam">Adam Sitnik</a></li>
<li><a href="#benchmarking"><strong>Let’s talk about microbenchmarking</strong></a> by <a href="https://twitter.com/andrey_akinshin">Andrey Akinshin</a></li>
<li><a href="#systemprogramming"><strong>Safe Systems Programming in C# and .NET</strong></a> (<a href="https://www.infoq.com/news/2016/06/systems-programming-qcon">summary</a>) by <a href="https://twitter.com/funcOfJoe">Joe Duffy</a></li>
<li><a href="#flingos"><strong>FlingOS - Using C# for an OS</strong></a> by <a href="https://twitter.com/ednutting">Ed Nutting</a></li>
<li><a href="#netgc"><strong>Maoni Stephens on .NET GC</strong></a> by <a href="https://blogs.msdn.microsoft.com/maoni/">Maoni Stephens</a></li>
<li><a href="#netcoreperf"><strong>What’s new for performance in .NET Core 2.0</strong></a> by <a href="https://twitter.com/ben_a_adams">Ben Adams</a></li>
<li><a href="#opensourcehacking"><strong>Open Source Hacking the CoreCLR</strong></a> by <a href="https://twitter.com/geoffnorton">Geoff Norton</a></li>
<li><a href="#netcorexplat"><strong>.NET Core & Cross Platform</strong></a> by <a href="https://github.com/ellismg">Matt Ellis</a></li>
<li><a href="#netcoreunix"><strong>.NET Core on Unix</strong></a> by <a href="https://github.com/janvorli">Jan Vorlicek</a></li>
<li><a href="#multithreading"><strong>Multithreading Deep Dive</strong></a> by <a href="https://twitter.com/gfraiteur">Gael Fraiteur</a></li>
<li><a href="#netmemorylego"><strong>Everything you need to know about .NET memory</strong></a> by <a href="https://twitter.com/bcemmett">Ben Emmett</a></li>
</ol>
<p>I also added these 2 categories:</p>
<ul>
<li><a href="#channel-9"><strong>‘Channel 9’ Talks</strong></a>
<ul>
<li>So many great talks featuring the Microsoft Engineers who work on the .NET runtime</li>
</ul>
</li>
<li><a href="#future"><strong>Talks I plan to watch (but haven’t yet)</strong></a></li>
</ul>
<p><strong style="color:green">If I’ve missed any out, please let me know in the comments</strong> (or <a href="https://twitter.com/matthewwarren/">on twitter</a>)</p>
<hr />
<p><span id="perfview"></span>
<a href="https://www.infoq.com/presentations/perfview-net"><strong>PerfView: The Ultimate .NET Performance Tool</strong></a> by <a href="https://twitter.com/goldshtn">Sasha Goldshtein</a> (<a href="https://www.slideshare.net/InfoQ/perfview-the-ultimate-net-performance-tool">slides</a>)</p>
<p>In fact, just watch all the talks/presentations that Sasha has done, they’re great!! For example <a href="http://blogs.microsoft.co.il/sasha/2013/11/05/modern-garbage-collection-in-theory-and-practice/">Modern Garbage Collection in Theory and Practice</a> and <a href="https://vimeo.com/131636651">Making .NET Applications Faster</a></p>
<p>This talk is a great ‘how-to’ guide for <a href="https://github.com/Microsoft/perfview">PerfView</a>, what it can do and how to use it (JIT stats, memory allocations, CPU profiling). For more on PerfView see this interview with its creator, <a href="https://channel9.msdn.com/posts/Vance-Morrison-Performance-and-PerfView">Vance Morrison: Performance and PerfView</a>.</p>
<p><a href="https://www.infoq.com/presentations/perfview-net"><img src="/images/2018/07/01 - PerfView - The Ultimate .NET Performance Tool.png" alt="01 - PerfView - The Ultimate .NET Performance Tool" /></a></p>
<hr />
<p><span id="highperfcode"></span>
<a href="https://www.youtube.com/watch?v=r738tcIstck&feature=youtu.be"><strong>Writing High Performance Code in .NET</strong></a> by <a href="https://channel9.msdn.com/Tags/bart+de+smet">Bart De Smet</a> (he also has some <a href="https://www.pluralsight.com/authors/bart-desmet">Pluralsight Courses</a> on the same subject)</p>
<p>Features <a href="/2016/09/06/Analysing-.NET-Memory-Dumps-with-CLR-MD/">CLRMD</a>, WinDBG, ETW Events and PerfView, plus some great ‘real world’ performance issues</p>
<p><a href="https://www.youtube.com/watch?v=r738tcIstck&feature=youtu.be"><img src="/images/2018/07/03 - Writing High Performance Code in .NET.png" alt="03 - Writing High Performance Code in .NET" /></a></p>
<hr />
<p><span id="stateofperf">
<a href="https://www.youtube.com/watch?v=dVKUYP_YALg"><strong>State of the .NET Performance</strong></a> by <a href="https://twitter.com/sitnikadam">Adam Sitnik</a> (<a href="https://www.slideshare.net/yuliafast/adam-sitnik-state-of-the-net-performance">slides</a>)</span></p>
<p>How to write high-perf code that plays nicely with the .NET GC, covering Span&lt;T&gt;, Memory&lt;T&gt; & ValueTask</p>
<p><a href="https://www.youtube.com/watch?v=dVKUYP_YALg"><img src="/images/2018/07/02 - State of the .NET Performance.png" alt="02 - State of the .NET Performance" /></a></p>
<hr />
<p><span id="benchmarking"></span>
<a href="https://dotnext-helsinki.com/talks/lets-talk-about-microbenchmarking/"><strong>Let’s talk about microbenchmarking</strong></a> by <a href="https://twitter.com/andrey_akinshin">Andrey Akinshin</a> (<a href="https://www.slideshare.net/AndreyAkinshin/lets-talk-about-microbenchmarking">slides</a>)</p>
<p>Primarily a look at how to benchmark .NET code, but along the way it demonstrates some of the internal behaviour of the JIT compiler (Andrey is the creator of <a href="https://benchmarkdotnet.org/">BenchmarkDotNet</a>)</p>
<p><a href="https://dotnext-helsinki.com/talks/lets-talk-about-microbenchmarking/"><img src="/images/2018/07/12 - Let's talk about microbenchmarking.png" alt="12 - Let's talk about microbenchmarking" /></a></p>
<hr />
<p><span id="systemprogramming"></span>
<a href="https://www.infoq.com/presentations/csharp-systems-programming"><strong>Safe Systems Programming in C# and .NET</strong></a> (<a href="https://www.infoq.com/news/2016/06/systems-programming-qcon">summary</a>) by <a href="https://twitter.com/funcOfJoe">Joe Duffy</a> (<a href="https://www.slideshare.net/InfoQ/safe-systems-programming-in-c-and-net">slides</a> and <a href="http://joeduffyblog.com/">blog</a>)</p>
<p>Joe Duffy (worked on the <a href="http://joeduffyblog.com/2015/11/03/blogging-about-midori/">Midori project</a>) shows why C# is a good ‘System Programming’ language, including what low-level features it provides</p>
<p><a href="https://www.infoq.com/presentations/csharp-systems-programming"><img src="/images/2018/07/08%20-%20Safe%20Systems%20Programming%20in%20C%23%20and%20.NET.png" alt="08 - Safe Systems Programming in C# and .NET" /></a></p>
<hr />
<p><span id="flingos"></span>
<a href="https://www.youtube.com/watch?v=bnopbNS8Lnw"><strong>FlingOS - Using C# for an OS</strong></a> by <a href="https://twitter.com/ednutting">Ed Nutting</a> (<a href="https://github.com/FlingOS/FlingOS/tree/master/Documentation/Presentations/.NET%20South%20West">slides</a>)</p>
<p>Shows what you need to do if you want to write an entire OS in C# (!!) The <a href="http://www.flingos.co.uk/">FlingOS</a> project is worth checking out, it’s a great learning resource.</p>
<p><a href="https://www.youtube.com/watch?v=bnopbNS8Lnw"><img src="/images/2018/07/04%20-%20FlingOS%20-%20Using%20C%23%20for%20an%20OS.png" alt="04 - FlingOS - Using C# for an OS" /></a></p>
<hr />
<p><span id="netgc"></span>
<a href="https://channel9.msdn.com/Shows/On-NET/Maoni-Stephens-on-NET-GC"><strong>Maoni Stephens on .NET GC</strong></a> by <a href="https://blogs.msdn.microsoft.com/maoni/">Maoni Stephens</a> who is the main (only?) .NET GC developer. In addition <a href="https://channel9.msdn.com/posts/Maoni-Stephens-CLR-45-Server-Background-GC">CLR 4.5 Server Background GC</a> and <a href="https://channel9.msdn.com/Blogs/Charles/NET-45-in-Practice-Bing">.NET 4.5 in Practice: Bing</a> are also worth a watch.</p>
<p>An in-depth Q&A on how the .NET GC works, why it does what it does and how to use it efficiently</p>
<p><a href="https://channel9.msdn.com/Shows/On-NET/Maoni-Stephens-on-NET-GC"><img src="/images/2018/07/07 - Maoni Stephens on .NET GC.png" alt="07 - Maoni Stephens on .NET GC" /></a></p>
<hr />
<p><span id="netcoreperf">
<a href="https://www.ageofascent.com/2017/11/05/perfromance-dotnet-core-2-corestart-conference/"><strong>What’s new for performance in .NET Core 2.0</strong></a> by <a href="https://twitter.com/ben_a_adams">Ben Adams</a> (<a href="https://cdn.ageofascent.net/assets/2017/Corestart-Whats-new-performance-dotnet-core-2-0.pdf">slides</a>)</span></p>
<p>Whilst it <em>mostly</em> focuses on performance, there are some great internal details on how the JIT generates code for ‘de-virtualisation’, ‘exception handling’ and ‘bounds checking’</p>
<p><a href="https://www.youtube.com/watch?v=eOdhWTX3Ajk"><img src="/images/2018/07/13 - What's new for performance in .NET Core 2.0.png" alt="13 - What's new for performance in .NET Core 2.0" /></a></p>
<hr />
<p><span id="opensourcehacking"></span>
<a href="https://www.youtube.com/watch?v=iQRVJHab4MM"><strong>Open Source Hacking the CoreCLR</strong></a> by <a href="https://twitter.com/geoffnorton">Geoff Norton</a></p>
<p>Making .NET Core (the CoreCLR) work on OSX was mostly a ‘community contribution’, this talk is a ‘walk-through’ of what it took to make it happen</p>
<p><a href="https://www.youtube.com/watch?v=iQRVJHab4MM"><img src="/images/2018/07/09 - Open Source Hacking the CoreCLR.png" alt="09 - Open Source Hacking the CoreCLR" /></a></p>
<hr />
<p><span id="netcorexplat"></span>
<a href="https://channel9.msdn.com/Blogs/dotnet/NET-Foundations-2015-03-04"><strong>.NET Core & Cross Platform</strong></a> by <a href="https://github.com/ellismg">Matt Ellis</a>, one of the .NET Runtime Engineers (this one on how they made <a href="https://channel9.msdn.com/Blogs/dotnet/NET-Foundations-2015-02-25">.NET Core ‘Open Source’</a> is also worth a watch)</p>
<p>Discussion of the early work done to make CoreCLR ‘<em>cross-platform</em>’, including the build setup, ‘Platform Abstraction Layer’ (PAL) and OS differences that had to be accounted for</p>
<p><a href="https://channel9.msdn.com/Blogs/dotnet/NET-Foundations-2015-03-04"><img src="/images/2018/07/05 - .NET Core & Cross Platform.png" alt="05 - .NET Core & Cross Platform" /></a></p>
<hr />
<p><span id="netcoreunix"></span>
<a href="https://www.youtube.com/watch?v=JNmUz7C1usM"><strong>.NET Core on Unix</strong></a> by <a href="https://github.com/janvorli">Jan Vorlicek</a> a .NET Runtime Engineer (<a href="https://www.slideshare.net/KarelZikmund1/net-meetup-prague-portable-net-core-on-linux-jan-vorlicek">slides</a>)</p>
<p>This talk discusses which parts of the CLR had to be changed to run on Unix, including exception handling, calling conventions, runtime suspension and the PAL</p>
<p><a href="https://www.youtube.com/watch?v=JNmUz7C1usM"><img src="/images/2018/07/06 - .NET Core on Unix.png" alt="06 - .NET Core on Unix" /></a></p>
<hr />
<p><span id="multithreading"></span>
<a href="https://www.youtube.com/watch?v=z2QYa2RW9c8"><strong>Multithreading Deep Dive</strong></a> by <a href="https://twitter.com/gfraiteur">Gael Fraiteur</a> (creator of <a href="https://www.postsharp.net/">PostSharp</a>)</p>
<p>Takes a really in-depth look at the CLR memory-model and threading primitives</p>
<p><a href="https://www.youtube.com/watch?v=z2QYa2RW9c8"><img src="/images/2018/07/10 - Multithreading Deep Dive.png" alt="10 - Multithreading Deep Dive" /></a></p>
<hr />
<p><span id="netmemorylego"></span>
<a href="https://vimeo.com/113632451"><strong>Everything you need to know about .NET memory</strong></a> by <a href="https://twitter.com/bcemmett">Ben Emmett</a> (<a href="https://www.slideshare.net/benemmett/net-memory-management-ndc-london">slides</a>)</p>
<p>Explains how the .NET GC works using Lego! A very innovative and effective approach!!</p>
<p><a href="https://vimeo.com/113632451"><img src="/images/2018/07/11 - Everything you need to know about .NET memory.png" alt="11 - Everything you need to know about .NET memory" /></a></p>
<hr />
<p><span id="channel9"></span></p>
<h1 id="channel-9">Channel 9</h1>
<p>The <a href="https://channel9.msdn.com/">Channel 9</a> videos recorded by Microsoft deserve their own category, because there’s so much deep, technical information in them. This list is just a selection, including some of my favourites, there are <a href="https://channel9.msdn.com/Search?term=.net%20clr&lang-en=true">many, many more available</a>!!</p>
<ul>
<li><a href="https://channel9.msdn.com/Blogs/Charles/Ian-Carmichael-The-History-and-Future-of-CLR">Ian Carmichael: The History and Future of the CLR</a> (2009)</li>
<li><a href="https://channel9.msdn.com/Shows/Going+Deep/Maoni-Stephens-and-Andrew-Pardoe-CLR-4-Inside-Background-GC">Maoni Stephens and Andrew Pardoe: CLR 4 Garbage Collector - Inside Background GC</a> (2009)</li>
<li><a href="https://channel9.msdn.com/Shows/Going+Deep/Vance-Morrison-CLR-Through-the-Years">Vance Morrison: CLR Through the Years</a> (2009)</li>
<li><a href="https://channel9.msdn.com/Blogs/Charles/Surupa-Biswas-CLR-4-Resilient-NGen-and-Targeted-Patching">Surupa Biswas: CLR 4 - Resilient NGen with Targeted Patching</a> (2009)</li>
<li><a href="https://channel9.msdn.com/Shows/WM_IN/Suzanne-Cook-Developing-the-CLR-Part-I">Suzanne Cook - Developing the CLR, Part I</a> (2005)</li>
<li><a href="https://channel9.msdn.com/Blogs/TheChannel9Team/Kit-George-Tour-of-NET-CLR-Base-Class-Library-Team">Tour of .NET CLR Base Class Library Team</a> (2005)</li>
<li><a href="https://channel9.msdn.com/Blogs/TheChannel9Team/Christopher-Brumme-The-future-of-CLR-exceptions">Christopher Brumme - The future of CLR exceptions</a> (2004)</li>
<li><a href="https://channel9.msdn.com/Blogs/TheChannel9Team/Anders-Hejlsberg-What-brought-about-the-birth-of-the-CLR">Anders Hejlsberg - What brought about the birth of the CLR?</a> (2004)</li>
<li><a href="https://channel9.msdn.com/Blogs/TheChannel9Team/Jason-Zander-Discussing-the-architecture-and-quotsecretsquot-of-NET-and-the-CLR">Jason Zander - Discussing the architecture and secrets of .NET and the CLR</a> (2004)</li>
<li><a href="https://channel9.msdn.com/Blogs/TheChannel9Team/Brad-Abrams-What-is-missing-from-the-CLR">Brad Abrams - What is missing from the CLR?</a> (2004)</li>
<li><a href="https://channel9.msdn.com/Blogs/TheChannel9Team/Christopher-Brumme-Will-there-be-improvements-to-NETs-garbage-collector">Christopher Brumme – Will there be improvements to .NET’s garbage collector?</a> (2004)</li>
</ul>
<hr />
<p><span id="future"></span></p>
<h1 id="ones-to-watch">Ones to watch</h1>
<p>I can’t recommend these yet, because I haven’t watched them myself! (I can’t break my <em>own</em> rules!!).</p>
<p>But they all look really interesting and I will watch them as soon as I get a chance, so I thought they were worth including:</p>
<ul>
<li><a href="https://kalapos.net/Blog/ShowPost/Udemy-Advanced-DotNet-Course">C# and .NET - Advanced topics</a> (££) by <a href="https://twitter.com/gregkalapos">Gergely Kalapos</a></li>
<li><a href="https://www.udemy.com/high-performance-coding-with-net-core-and-csharp/?couponCode=KALPAOSNET-ADVCEDPST">High Performance Coding with .NET Core and C#</a> (££) also by <a href="https://twitter.com/gregkalapos">Gergely Kalapos</a></li>
<li><a href="https://www.youtube.com/watch?v=7GTpwgsmHgU">Patterns for high-performance C#</a> by <a href="https://twitter.com/federicolois">Federico Andres Lois</a></li>
<li><a href="https://www.youtube.com/playlist?list=PLV281NbnwQaJpaSSOoSI7oPLINjf2Ojak">Manual memory management in .NET Framework</a> by <a href="https://twitter.com/furmanekadam">Adam Furmanek</a> (<a href="https://blog.adamfurmanek.pl/">blog</a>)</li>
<li><a href="https://www.youtube.com/watch?v=rWZXjz_nnzs&index=9&list=PL03Lrmd9CiGfprrIjzbjdA2RRShJMzYIM">Internals of Exceptions</a> by <a href="https://twitter.com/furmanekadam">Adam Furmanek</a></li>
<li><a href="https://vimeo.com/223985297">Beyond step-by step debugging in Visual Studio</a> by <a href="https://twitter.com/TessFerrandez">Tess Ferrandez</a></li>
<li><a href="https://vimeo.com/68320501">Hacking .NET(C#) Application: Code of the Hacker</a> by Jon McCoy</li>
<li><a href="https://www.youtube.com/watch?v=jK8jYQ3ZKiI&index=22&list=PL03Lrmd9CiGfprrIjzbjdA2RRShJMzYIM">So you want to create your own .NET runtime?</a> (<a href="https://ndcoslo.com/talk/so-you-want-to-create-your-own-net-runtime/">abstract</a>) by Chris Bacon</li>
<li><a href="https://dotnext-piter.ru/2018/spb/talks/5mpiesdyfikoi86s2u0owq/">Advanced .NET debugging techniques from a real world investigation</a> by <a href="https://twitter.com/chnasarre">Christophe Nasarre</a> and <a href="https://twitter.com/KooKiz">Kevin Gosse</a> (<a href="https://www.youtube.com/watch?v=DD3w66Ff8Ms&t=11713s">recording</a> and <a href="https://github.com/chrisnas/SELAConference2018">slides</a>)</li>
<li><a href="http://www.seladeveloperpractice.com/sessions?selected=13">Staying Friendly with the GC</a> by <a href="https://twitter.com/ayende">Oren Eini (Ayende Rahien)</a> (<a href="https://www.slideshare.net/OrenEini/staying-friendly-with-the-gc-104205724">slides</a>)</li>
<li><a href="https://www.youtube.com/watch?v=DD3w66Ff8Ms">Scratched Metal</a> by <a href="https://twitter.com/federicolois">Federico Andres Lois</a></li>
<li><a href="https://www.slideshare.net/kekyo/beachhead-implements-new-opcode-on-clr-jit">Beachhead implements new opcode on CLR JIT</a> by <a href="https://twitter.com/kekyo2">Kouji Matsui</a></li>
<li><a href="https://pyrzyk.net/public-talks/">Everything what you (don’t) know about structures in .NET</a> by <a href="https://twitter.com/lukaszpyrzyk">Łukasz Pyrzyk</a> (<a href="https://pyrzyk.net/structures">slides</a>)</li>
</ul>
<hr />
<p>If this post causes you to go off and watch hours and hours of videos, ignoring friends, family and work for the next few weeks, <strong><a href="https://www.youtube.com/watch?v=lQPeThqrjws">Don’t Blame Me</a></strong></p>
.NET JIT and CLR - Joined at the Hip2018-07-05T00:00:00+00:00http://www.mattwarren.org/2018/07/05/.NET JIT and CLR - Joined at the Hip
<link rel="stylesheet" href="/datavis/treemap-coreclr.css" />
<script src="https://d3js.org/d3.v4.min.js"></script>
<script src="/datavis/treemap-coreclr.js" type="text/javascript"></script>
<p>I’ve been <a href="/2018/03/23/Exploring-the-internals-of-the-.NET-Runtime/">digging into .NET Internals</a> for a while now, but never really looked closely at how the ‘<em>Just-in-Time</em>’ (JIT) compiler works. In my mind, the interaction between the .NET Runtime and the JIT has always looked like this:</p>
<p><img src="/images/2018/07/JIT and EE Interaction - Expected.png" alt="JIT and EE Interaction - Expected" /></p>
<p>Nice and straightforward: the CLR asks the JIT to compile some ‘<em>Intermediate Language</em>’ (IL) code into machine code and the JIT hands back the bytes when it’s done.</p>
<p>However, it turns out the interaction is <em>much</em> more complicated. In reality it looks more like this:</p>
<p><img src="/images/2018/07/JIT and EE Interaction - Actual.png" alt="JIT and EE Interaction" /></p>
<p>The JIT and the CLR’s ‘<em>Execution Engine</em>’ (EE) or ‘<em>Virtual Machine</em>’ (VM) work closely with one another; they really are <a href="https://www.merriam-webster.com/dictionary/joined%20at%20the%20hip"><strong>‘joined at the hip’</strong></a>.</p>
<p><strong>The rest of this post will explore the interaction between the 2 components, how they work together and why they need to.</strong></p>
<hr />
<h3 id="the-jit-compiler">The JIT Compiler</h3>
<p>As a quick aside, this post will <strong>not</strong> be talking about the internals of the JIT compiler itself. If you want to find out more about how that works, I recommend reading the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md">fantastic overview in the BOTR</a> and this <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-tutorial.md">excellent tutorial</a>, where this very helpful diagram comes from:</p>
<p><img src="/images/2018/07/RyuJIT Phases.png" alt="RyuJIT Phases" /></p>
<p>After all that, if you still want more, you can take a look at the ‘JIT’ section in the <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/#jit-just-in-time-compiler">‘Hitchhikers-Guide-to-the-CoreCLR-Source-Code’</a>.</p>
<hr />
<h3 id="components-within-the-clr">Components within the CLR</h3>
<p>Before we go any further it’s helpful to discuss how the ‘Common Language Runtime’ (CLR) is composed. It’s made up of several different components including the VM/EE, JIT, GC and others. The treemap below shows the different areas of the source code, grouped by colour into the top-level sections they fall under. You can clearly see that the VM and JIT dominate, along with ‘mscorlib’, which is the only component written in C#.</p>
<p>You can hover over an individual box to get more detailed information and can click on the different radio buttons to toggle the sizing (LOC/Files/Commits).</p>
<div id="top-level-treemap">
<svg width="800" height="570"></svg>
<form>
<span style="padding-right: 5em">
<label><input type="radio" name="mode" value="sumByLinesOfCode" checked="" />
Total L.O.C
</label>
</span>
<span style="padding-right: 5em">
<label><input type="radio" name="mode" value="sumByNumFiles" />
# Files
</label>
</span>
<span style="padding-right: 5em">
<label><input type="radio" name="mode" value="sumByNumCommits" />
# Commits
</label>
</span>
</form>
</div>
<p><strong>Note:</strong> This treemap is from my previous post <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/">‘Hitchhikers-Guide-to-the-CoreCLR-Source-Code’</a> which was written over a year ago, so the exact numbers will have changed in the meantime.</p>
<p>You can also see these ‘components’ or ‘areas’ reflected in the <a href="https://github.com/dotnet/coreclr/labels?utf8=%E2%9C%93&q=area-">classification scheme</a> used for the CoreCLR GitHub issues (one difference is that <code class="language-plaintext highlighter-rouge">area-CodeGen</code> is used instead of <code class="language-plaintext highlighter-rouge">JIT</code>).</p>
<hr />
<h2 id="the-clr-and-the-jit-compiler">The CLR and the JIT Compiler</h2>
<p>On to the main subject: just how do the CLR and the JIT compiler work together to <a href="/2017/12/15/How-does-.NET-JIT-a-method-and-Tiered-Compilation/#how-it-works">transform a method from IL to machine code</a>? As always, the ‘Book of the Runtime’ is a good place to start. From the ‘Execution Environment and External Interface’ section of the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#execution-environment-and-external-interface">RyuJIT Overview</a>:</p>
<blockquote>
<p>RyuJIT provides the just in time compilation service for the .NET runtime. The runtime itself is variously called the EE (execution engine), the VM (virtual machine) or simply the CLR (common language runtime). Depending upon the configuration, the EE and JIT may reside in the same or different executable files. RyuJIT implements the JIT side of the JIT/EE interfaces:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">ICorJitCompiler</code> – this is the <strong>interface that the JIT compiler implements</strong>. This interface is defined in <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/corjit.h">src/inc/corjit.h</a> and its implementation is in <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/ee_il_dll.cpp">src/jit/ee_il_dll.cpp</a>. The following are the key methods on this interface:
<ul>
<li><code class="language-plaintext highlighter-rouge">compileMethod</code> is the main entry point for the JIT. The EE passes it a <code class="language-plaintext highlighter-rouge">ICorJitInfo</code> object, and the “info” containing the IL, the method header, and various other useful tidbits. It returns a pointer to the code, its size, and additional GC, EH and (optionally) debug info.</li>
<li><code class="language-plaintext highlighter-rouge">getVersionIdentifier</code> is the mechanism by which the JIT/EE interface is versioned. There is a single GUID (manually generated) which the JIT and EE must agree on.</li>
<li><code class="language-plaintext highlighter-rouge">getMaxIntrinsicSIMDVectorLength</code> communicates to the EE the largest SIMD vector length that the JIT can support.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ICorJitInfo</code> – this is the <strong>interface that the EE implements</strong>. It has many methods defined on it that allow the JIT to look up metadata tokens, traverse type signatures, compute field and vtable offsets, find method entry points, construct string literals, etc. This bulk of this interface is inherited from <code class="language-plaintext highlighter-rouge">ICorDynamicInfo</code> which is defined in <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/corinfo.h">src/inc/corinfo.h</a>. The implementation is defined in <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp">src/vm/jitinterface.cpp</a>.</li>
</ul>
</blockquote>
<p>So there are 2 main interfaces. The first is <code class="language-plaintext highlighter-rouge">ICorJitCompiler</code>, which is implemented by the JIT compiler and allows the EE to control how a method is compiled. The second is <code class="language-plaintext highlighter-rouge">ICorJitInfo</code>, which the EE implements to allow the JIT to request the information it needs during compilation.</p>
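<p>To make the two directions concrete, here is a minimal sketch of the pattern. Everything in it is an assumption made for illustration: <code class="language-plaintext highlighter-rouge">IToyJitInfo</code>, <code class="language-plaintext highlighter-rouge">IToyJitCompiler</code>, <code class="language-plaintext highlighter-rouge">ToyMethod</code> and <code class="language-plaintext highlighter-rouge">ToyCode</code> are hypothetical names and do not appear in CoreCLR, and the real interfaces are far larger.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical inputs and outputs, far simpler than CORINFO_METHOD_INFO.
struct ToyMethod { std::string name; std::string localType; };
struct ToyCode   { std::size_t frameSize = 0; };

// Stand-in for ICorJitInfo: implemented by the EE, called by the JIT.
struct IToyJitInfo {
    virtual ~IToyJitInfo() = default;
    virtual std::size_t getClassSize(const std::string& className) = 0;
};

// Stand-in for ICorJitCompiler: implemented by the JIT, called by the EE.
struct IToyJitCompiler {
    virtual ~IToyJitCompiler() = default;
    virtual bool compileMethod(IToyJitInfo* ee, const ToyMethod& method, ToyCode* out) = 0;
};

// The "EE" side: answers questions and counts how often it is asked.
struct ToyEE : IToyJitInfo {
    int callbacks = 0;
    std::size_t getClassSize(const std::string& className) override {
        ++callbacks;
        return className == "Point" ? 8 : 24;  // made-up sizes
    }
};

// The "JIT" side: cannot finish its work without calling back into the EE.
struct ToyJit : IToyJitCompiler {
    bool compileMethod(IToyJitInfo* ee, const ToyMethod& method, ToyCode* out) override {
        assert(ee != nullptr && out != nullptr);
        out->frameSize = ee->getClassSize(method.localType);  // JIT -> EE callback
        return true;
    }
};
```

<p>In this sketch the EE would create a <code class="language-plaintext highlighter-rouge">ToyEE</code>, pass a pointer to it into <code class="language-plaintext highlighter-rouge">ToyJit::compileMethod</code>, and the JIT would call back through that pointer while compiling, which is the same shape as the real interfaces.</p>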
<p>Let’s now look at these interfaces in more detail.</p>
<hr />
<h3 id="ee--jit-icorjitcompiler"><strong>EE ➜ JIT</strong> <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corjit.h#L243-L304">ICorJitCompiler</a></h3>
<p>Firstly, we’ll examine <code class="language-plaintext highlighter-rouge">ICorJitCompiler</code>, the interface exposed by the JIT. It’s actually pretty straight-forward and only contains 7 methods:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">CorJitResult __stdcall compileMethod (..)</code></li>
<li><code class="language-plaintext highlighter-rouge">void clearCache()</code></li>
<li><code class="language-plaintext highlighter-rouge">BOOL isCacheCleanupRequired()</code></li>
<li><code class="language-plaintext highlighter-rouge">void ProcessShutdownWork(ICorStaticInfo* info)</code></li>
<li><code class="language-plaintext highlighter-rouge">void getVersionIdentifier(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">unsigned getMaxIntrinsicSIMDVectorLength(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">void setRealJit(..)</code></li>
</ul>
<p>Of these, the most interesting one is <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/jit/ee_il_dll.cpp#L276-L309">compileMethod(..)</a>, which has the following signature:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">virtual</span> <span class="n">CorJitResult</span> <span class="kr">__stdcall</span> <span class="n">compileMethod</span> <span class="p">(</span>
<span class="n">ICorJitInfo</span> <span class="o">*</span><span class="n">comp</span><span class="p">,</span> <span class="cm">/* IN */</span>
<span class="k">struct</span> <span class="n">CORINFO_METHOD_INFO</span> <span class="o">*</span><span class="n">info</span><span class="p">,</span> <span class="cm">/* IN */</span>
<span class="kt">unsigned</span> <span class="cm">/* code:CorJitFlag */</span> <span class="n">flags</span><span class="p">,</span> <span class="cm">/* IN */</span>
<span class="n">BYTE</span> <span class="o">**</span><span class="n">nativeEntry</span><span class="p">,</span> <span class="cm">/* OUT */</span>
<span class="n">ULONG</span> <span class="o">*</span><span class="n">nativeSizeOfCode</span> <span class="cm">/* OUT */</span>
<span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</code></pre></div></div>
<p>The EE provides the JIT with information about the method it wants compiled (<code class="language-plaintext highlighter-rouge">CORINFO_METHOD_INFO</code>) as well as flags (<code class="language-plaintext highlighter-rouge">CorJitFlag</code>) which control things such as:</p>
<ul>
<li>Level of optimisation</li>
<li>Whether the code is compiled in <code class="language-plaintext highlighter-rouge">Debug</code> or <code class="language-plaintext highlighter-rouge">Release</code> mode</li>
<li>If the code needs to be ‘Profilable’ or support ‘Edit-and-Continue’</li>
<li>Alignment of loops, i.e. should they be aligned on byte-boundaries</li>
<li>If <code class="language-plaintext highlighter-rouge">SSE3</code>/<code class="language-plaintext highlighter-rouge">SSE4</code> should be used</li>
<li>and <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corjitflags.h#L24-L183">many other scenarios</a></li>
</ul>
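<p>The first parameter (<code class="language-plaintext highlighter-rouge">comp</code>) is a pointer to the <code class="language-plaintext highlighter-rouge">ICorJitInfo</code> interface, which is covered in the next section. The JIT hands back the compiled code and its size through the two <code class="language-plaintext highlighter-rouge">OUT</code> parameters, <code class="language-plaintext highlighter-rouge">nativeEntry</code> and <code class="language-plaintext highlighter-rouge">nativeSizeOfCode</code>.</p>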
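<p>As a rough illustration of how a single <code class="language-plaintext highlighter-rouge">unsigned flags</code> value can carry several independent switches, the sketch below decodes a bitmask into a set of options. The flag names and bit positions are hypothetical and are <strong>not</strong> the real <code class="language-plaintext highlighter-rouge">CorJitFlag</code> values; see the linked header for those.</p>

```cpp
#include <cassert>

// Hypothetical flags: names and bit positions are assumptions for this sketch.
enum ToyJitFlag : unsigned {
    TOY_FLAG_OPTIMISE    = 1u << 0,
    TOY_FLAG_DEBUG_CODE  = 1u << 1,
    TOY_FLAG_PROFILABLE  = 1u << 2,
    TOY_FLAG_ALIGN_LOOPS = 1u << 3,
};

constexpr unsigned TOY_FLAG_ALL = TOY_FLAG_OPTIMISE | TOY_FLAG_DEBUG_CODE |
                                  TOY_FLAG_PROFILABLE | TOY_FLAG_ALIGN_LOOPS;

// What the "JIT" would consult while generating code.
struct ToyCodegenOptions {
    bool optimise   = false;
    bool debuggable = false;
    bool profilable = false;
    bool alignLoops = false;
};

// Turn the packed flags passed by the "EE" into individual options.
inline ToyCodegenOptions decodeFlags(unsigned flags) {
    assert((flags & ~TOY_FLAG_ALL) == 0);  // reject bits this sketch doesn't know
    ToyCodegenOptions options;
    options.optimise   = (flags & TOY_FLAG_OPTIMISE) != 0;
    options.debuggable = (flags & TOY_FLAG_DEBUG_CODE) != 0;
    options.profilable = (flags & TOY_FLAG_PROFILABLE) != 0;
    options.alignLoops = (flags & TOY_FLAG_ALIGN_LOOPS) != 0;
    return options;
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">decodeFlags(TOY_FLAG_OPTIMISE | TOY_FLAG_ALIGN_LOOPS)</code> would ask for optimised code with aligned loops and nothing else.</p>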
<hr />
<h3 id="jit--ee-icorjithost-and-icorjitinfo"><strong>JIT ➜ EE</strong> <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corjithost.h#L8-L46">ICorJitHost</a> and <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corjit.h#L306-L484">ICorJitInfo</a></h3>
<p>The APIs that the EE has to implement to work with the JIT are not simple: there are almost 180 functions or callbacks!!</p>
<table>
<thead>
<tr>
<th>Interface</th>
<th style="text-align: right">Method Count</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corjithost.h#L8-L46">ICorJitHost</a></td>
<td style="text-align: right">5</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corjit.h#L306-L484">ICorJitInfo</a></td>
<td style="text-align: right">19</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2886-L3156">ICorDynamicInfo</a></td>
<td style="text-align: right">36</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L1971-L2884">ICorStaticInfo</a></td>
<td style="text-align: right">118</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td style="text-align: right"><strong>178</strong></td>
</tr>
</tbody>
</table>
<p><strong>Note:</strong> The links take you to the function ‘definitions’ for a given interface. Alternatively all the methods are listed together <a href="https://gist.github.com/mattwarren/375c34ed71c37f7e89bb425cf8f0f964">in this gist</a>.</p>
<p><code class="language-plaintext highlighter-rouge">ICorJitHost</code> makes available ‘functionality that would normally be provided by the operating system’, predominantly the ability to allocate the ‘pages’ of memory <a href="/2017/07/10/Memory-Usage-Inside-the-CLR/#jit-memory-usage">that the JIT uses during compilation</a>.</p>
<p><code class="language-plaintext highlighter-rouge">ICorJitInfo</code> (<code class="language-plaintext highlighter-rouge">class ICorJitInfo : public ICorDynamicInfo</code>) contains more specific memory allocation routines, including ones for the ‘GC Info’ data, a ‘method/funclet’s unwind information’, ‘.rdata and .pdata for a method’ and the ‘exception handler blocks’.</p>
<p><code class="language-plaintext highlighter-rouge">ICorDynamicInfo</code> (<code class="language-plaintext highlighter-rouge">class ICorDynamicInfo : public ICorStaticInfo</code>) provides data that can change from ‘invocation to invocation’, i.e. the JIT cannot cache the results of these method calls. It includes functions that provide:</p>
<ul>
<li>Thread Local Storage (TLS) index</li>
<li>Function Entry Point (address)</li>
<li>EE ‘helper functions’</li>
<li>Address of a Field</li>
<li>Constructor for a <code class="language-plaintext highlighter-rouge">delegate</code></li>
<li>and <a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2886-L3156">much more</a></li>
</ul>
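<p>The split between ‘static’ and ‘dynamic’ information can be sketched as follows. This is only an illustration under stated assumptions: the <code class="language-plaintext highlighter-rouge">IToy*</code> interfaces, <code class="language-plaintext highlighter-rouge">ToyRuntime</code> and <code class="language-plaintext highlighter-rouge">ToyJitQueries</code> are hypothetical, and whether the real JIT caches any particular answer is not something this sketch claims. It only shows why an answer that can change between invocations has to be asked for every time.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical mirror of the inheritance chain: dynamic info extends static info.
struct IToyStaticInfo {
    virtual ~IToyStaticInfo() = default;
    virtual unsigned getClassSize(const std::string& cls) = 0;  // stable answer
};
struct IToyDynamicInfo : IToyStaticInfo {
    virtual std::uintptr_t getFunctionEntryPoint(const std::string& method) = 0;  // may change
};

// Toy "EE": the entry point moves on every call, as if the method was re-compiled.
struct ToyRuntime : IToyDynamicInfo {
    int sizeQueries = 0;
    std::uintptr_t nextAddress = 0x1000;
    unsigned getClassSize(const std::string&) override { ++sizeQueries; return 16; }
    std::uintptr_t getFunctionEntryPoint(const std::string&) override {
        std::uintptr_t address = nextAddress;
        nextAddress += 0x100;
        return address;
    }
};

// Toy "JIT" helper: remembers static answers, always re-asks for dynamic ones.
class ToyJitQueries {
public:
    explicit ToyJitQueries(IToyDynamicInfo* ee) : ee_(ee) { assert(ee != nullptr); }

    unsigned classSize(const std::string& cls) {
        auto it = sizeCache_.find(cls);
        if (it != sizeCache_.end()) return it->second;  // safe: the answer is static
        unsigned size = ee_->getClassSize(cls);
        sizeCache_[cls] = size;
        return size;
    }

    std::uintptr_t entryPoint(const std::string& method) {
        return ee_->getFunctionEntryPoint(method);  // never cached
    }

private:
    IToyDynamicInfo* ee_;
    std::map<std::string, unsigned> sizeCache_;
};
```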
<p>Finally, there is <code class="language-plaintext highlighter-rouge">ICorStaticInfo</code>, which is further sub-divided into more specific interfaces:</p>
<table>
<thead>
<tr>
<th>Interface</th>
<th style="text-align: right">Method Count</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L1980-L2201">ICorMethodInfo</a></td>
<td style="text-align: right">28</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2203-L2270">ICorModuleInfo</a></td>
<td style="text-align: right">9</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2272-L2598">ICorClassInfo</a></td>
<td style="text-align: right">49</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2600-L2649">ICorFieldInfo</a></td>
<td style="text-align: right">7</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2651-L2712">ICorDebugInfo</a></td>
<td style="text-align: right">4</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2731-L2767">ICorArgInfo</a></td>
<td style="text-align: right">4</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2769-L2817">ICorErrorInfo</a></td>
<td style="text-align: right">7</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2834-L2882">Diagnostic methods</a></td>
<td style="text-align: right">6</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2819-L2832">General methods</a></td>
<td style="text-align: right">2</td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2714-L2729">Misc methods</a></td>
<td style="text-align: right">2</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td style="text-align: right"><strong>118</strong></td>
</tr>
</tbody>
</table>
<p>Because the interface is nicely composed we can easily see what it provides. The bulk of the functions are concerned with information about a <code class="language-plaintext highlighter-rouge">module</code>, <code class="language-plaintext highlighter-rouge">class</code>, <code class="language-plaintext highlighter-rouge">method</code> or <code class="language-plaintext highlighter-rouge">field</code>. For instance the JIT can query the class size, GC layout and obtain the address of a field within a class. It can also learn about a method’s signature, find its parent class and get ‘exception handling’ information (the full list of methods is available <a href="https://gist.github.com/mattwarren/375c34ed71c37f7e89bb425cf8f0f964">in this gist</a>).</p>
<p><strong>These interfaces and the methods they contain give a nice insight into what information the JIT requests from the runtime and therefore what knowledge it requires when compiling a single method.</strong></p>
<hr />
<p>Now, let’s look at the <em>end-to-end flow</em> of a couple of these methods and see where they are implemented in the CoreCLR source code.</p>
<h3 id="ee--jit-getfunctionentrypoint">EE ➜ JIT <code class="language-plaintext highlighter-rouge">getFunctionEntryPoint(..)</code></h3>
<p>First we’ll look at a method where the EE provides information to the JIT:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2936-L2942">/src/inc/corinfo.h</a> (shared definition)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/jit/lower.cpp#L3148">/src/jit/lower.cpp</a> (<strong>method call from the JIT</strong>)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/jitinterface.h#L953-L955">/src/vm/jitinterface.h</a> (VM definition)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/jitinterface.cpp#L9091-L9142">/src/vm/jitinterface.cpp</a> (<strong>implementation in the VM</strong>)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/zap/zapinfo.cpp#L1872-L1904">/src/zap/zapinfo.cpp</a> (ZAP/NGEN implementation)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/jit/ICorJitInfo_API_wrapper.hpp#L1136-L1144">/src/jit/ICorJitInfo_API_wrapper.hpp</a> (wrapper)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/ToolBox/superpmi/superpmi/icorjitinfo.cpp#L1304-L1313">/src/ToolBox/superpmi/superpmi/icorjitinfo.cpp</a> (SuperPMI implementation)</li>
</ul>
<h3 id="jit--ee-reportinliningdecision">JIT ➜ EE <code class="language-plaintext highlighter-rouge">reportInliningDecision()</code></h3>
<p>Next we’ll look at a scenario where the data flows from the JIT back to the EE:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/inc/corinfo.h#L2036-L2042">/src/inc/corinfo.h</a> (shared definition)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/jit/inline.cpp#L734">/src/jit/inline.cpp</a> (<strong>method call from the JIT</strong>)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/jitinterface.h#L700-L703">/src/vm/jitinterface.h</a> (VM definition)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/vm/jitinterface.cpp#L7953-L8070">/src/vm/jitinterface.cpp</a> (<strong>implementation in the VM</strong>)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/zap/zapinfo.cpp#L3610-L3623">/src/zap/zapinfo.cpp</a> (ZAP/NGEN implementation)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/jit/ICorJitInfo_API_wrapper.hpp#L61-L69">/src/jit/ICorJitInfo_API_wrapper.hpp</a> (wrapper)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.1/src/ToolBox/superpmi/superpmi/icorjitinfo.cpp#L100-L110">/src/ToolBox/superpmi/superpmi/icorjitinfo.cpp</a> (SuperPMI implementation)</li>
</ul>
<hr />
<h3 id="superpmi-tool">SuperPMI tool</h3>
<p>Finally, I just want to cover the ‘SuperPMI’ tool that showed up in the previous 2 scenarios. What is this tool and what does it do? From the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/glossary.md">CoreCLR glossary</a>:</p>
<blockquote>
<p><strong>SuperPMI</strong> - JIT component test framework (super fast JIT testing - it mocks/replays EE in EE-JIT interface)</p>
</blockquote>
<p>So in a nutshell it allows JIT development and testing to be de-coupled from the EE, which is useful because we’ve just seen that the 2 components are tightly integrated.</p>
<p>But how does it work? From the <a href="https://github.com/dotnet/coreclr/tree/master/src/ToolBox/superpmi">README</a>:</p>
<blockquote>
<p>SuperPMI works in two phases: collection and playback. In the collection phase, the system is configured to collect SuperPMI data. Then, run any set of .NET managed programs. When these managed programs invoke the JIT compiler, <strong>SuperPMI gathers and captures all information passed between the JIT and its .NET host</strong>. In the playback phase, SuperPMI loads the JIT directly, and causes it to compile all the functions that it previously compiled, but using the collected data to provide answers to various questions that the JIT needs to ask. <strong>The .NET execution engine (EE) is not invoked at all.</strong></p>
</blockquote>
<p>This explains why there is a SuperPMI implementation for every method that is part of the JIT &lt;-&gt; EE interface. SuperPMI needs to ‘record’ or ‘collect’ each interaction with the EE and store the information so that it can be ‘played back’ at a later time, when the EE isn’t present.</p>
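<p>The collect-then-playback idea can be sketched in a few lines. This is not how SuperPMI itself is written; it is a toy under stated assumptions, with a hypothetical one-method interface (<code class="language-plaintext highlighter-rouge">IToyJitInfo</code>), an in-memory <code class="language-plaintext highlighter-rouge">Recording</code> map instead of a collection file, and a stand-in <code class="language-plaintext highlighter-rouge">toyCompile</code> function playing the role of the JIT.</p>

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical one-method version of the JIT/EE interface.
struct IToyJitInfo {
    virtual ~IToyJitInfo() = default;
    virtual unsigned getClassSize(const std::string& cls) = 0;
};

// Question -> answer pairs captured during the collection phase.
using Recording = std::map<std::string, unsigned>;

// The "real" EE, only needed while collecting. The sizes are made up.
struct LiveEE : IToyJitInfo {
    unsigned getClassSize(const std::string& cls) override {
        return static_cast<unsigned>(8 + cls.size());
    }
};

// Collection phase: forward every call to the real EE and remember the answer.
class RecordingInfo : public IToyJitInfo {
public:
    RecordingInfo(IToyJitInfo* real, Recording* out) : real_(real), out_(out) {
        assert(real != nullptr && out != nullptr);
    }
    unsigned getClassSize(const std::string& cls) override {
        unsigned answer = real_->getClassSize(cls);
        (*out_)[cls] = answer;
        return answer;
    }
private:
    IToyJitInfo* real_;
    Recording* out_;
};

// Playback phase: answer purely from the recording, no EE involved.
class ReplayInfo : public IToyJitInfo {
public:
    explicit ReplayInfo(const Recording* recording) : recording_(recording) {
        assert(recording != nullptr);
    }
    unsigned getClassSize(const std::string& cls) override {
        auto it = recording_->find(cls);
        if (it == recording_->end())
            throw std::runtime_error("no recorded answer for " + cls);
        return it->second;
    }
private:
    const Recording* recording_;
};

// Stand-in "JIT": rounds the class size up to a multiple of 8 for a frame size.
inline unsigned toyCompile(IToyJitInfo* ee, const std::string& cls) {
    return (ee->getClassSize(cls) + 7u) & ~7u;
}
```

<p>Running <code class="language-plaintext highlighter-rouge">toyCompile</code> once through a <code class="language-plaintext highlighter-rouge">RecordingInfo</code> and then again through a <code class="language-plaintext highlighter-rouge">ReplayInfo</code> built from the same <code class="language-plaintext highlighter-rouge">Recording</code> gives the same result both times. A question that was never recorded fails loudly in this sketch, which also shows why every method on the interface needs a recorded counterpart.</p>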
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=17464054">Hacker News</a> or <a href="https://www.reddit.com/r/dotnet/comments/8wbdk1/net_jit_and_clr_joined_at_the_hip/">/r/dotnet</a></p>
<hr />
<h1 id="further-reading">Further Reading</h1>
<p>As always, if you’ve read this far, here’s some further information that you might find useful:</p>
<ul>
<li>Mono EE &lt;-&gt; JIT Interface</li>
<li>CoreRT implementation of the JIT/EE Interface (in C#)
<ul>
<li><a href="https://github.com/dotnet/corert/blob/master/src/Native/jitinterface/jitinterface.h">/src/Native/jitinterface/jitinterface.h</a> (auto-generated, how?)</li>
<li><a href="https://github.com/dotnet/corert/blob/master/src/JitInterface/src/CorInfoImpl.cs">/src/JitInterface/src/CorInfoImpl.cs</a> (partial class, the other part is in CorInfoBase.cs)</li>
<li><a href="https://github.com/dotnet/corert/blob/master/src/JitInterface/src/CorInfoBase.cs">/src/JitInterface/src/CorInfoBase.cs</a> (auto-generated by ThunkGenerator, using jitinterface.h)</li>
<li><a href="https://github.com/dotnet/corert/tree/master/src/JitInterface/src/ThunkGenerator">/src/JitInterface/src/ThunkGenerator</a></li>
<li><a href="https://github.com/dotnet/corert/blob/master/src/JitInterface/src/ThunkGenerator/ThunkInput.txt">/src/JitInterface/src/ThunkGenerator/ThunkInput.txt</a></li>
</ul>
</li>
</ul>
Tools for Exploring .NET Internals2018-06-15T00:00:00+00:00http://www.mattwarren.org/2018/06/15/Tools for Exploring .NET Internals
<p>Whether you want to look at what your code is doing ‘<em>under-the-hood</em>’ or you’re trying to see what the ‘<em>internals</em>’ of the CLR look like, there is a whole range of tools that can help you out.</p>
<p>To give ‘<em>credit where credit is due</em>’, this post is <a href="https://twitter.com/matthewwarren/status/973940550473797633">based on a tweet</a>, so thanks to everyone who contributed to the list and if I’ve <strong>missed out any tools, please let me know in the comments below</strong>.</p>
<hr />
<p>While you’re here, I’ve also written other posts that look at the ‘internals’ of the .NET Runtime:</p>
<ul>
<li><a href="/2018/03/23/Exploring-the-internals-of-the-.NET-Runtime/?recommended=1">Exploring the Internals of the .NET Runtime</a> (a ‘how-to’ guide)</li>
<li><a href="/2018/01/22/Resources-for-Learning-about-.NET-Internals/?recommended=1">Resources for Learning about .NET Internals</a> (other blogs that cover ‘internals’)</li>
</ul>
<hr />
<h2 id="honourable-mentions">Honourable Mentions</h2>
<p>Firstly I’ll start by mentioning that <a href="https://msdn.microsoft.com/en-us/library/sc65sadd.aspx?f=255&MSPPError=-2147217396">Visual Studio has a great debugger</a> and <a href="https://code.visualstudio.com/docs/editor/debugging">so does VSCode</a>. Also there are lots of very good (commercial) <a href="https://stackoverflow.com/questions/3927/what-are-some-good-net-profilers">.NET Profilers</a> and <a href="https://www.quora.com/What-is-the-best-NET-Application-Server-Monitoring-Tool">Application Monitoring Tools</a> available that you should also take a look at. For example I’ve recently been playing around with <a href="http://www.getcodetrack.com/">Codetrack</a> and I’m very impressed by what it can do!</p>
<p>However, the rest of the post is going to look at some more <strong>single-use tools</strong> that give an <strong>even deeper insight</strong> into what is going on. As an added bonus they’re all ‘<strong>open-source</strong>’, so you can take a look at the code and see how they work!!</p>
<h3 id="perfview-by-vance-morrison"><a href="https://github.com/Microsoft/perfview">PerfView</a> by <a href="https://blogs.msdn.microsoft.com/vancem/">Vance Morrison</a></h3>
<p>PerfView is simply an excellent tool and is the one that I’ve used most over the years. It uses <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/bb968803%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396">‘Event Tracing for Windows’ (ETW) Events</a> to provide a <strong>deep insight into what the CLR is doing</strong>, as well as allowing you to <strong>profile Memory and CPU usage</strong>. It does have a fairly steep learning curve, but there are some <a href="https://channel9.msdn.com/Series/PerfView-Tutorial">nice tutorials to help you along the way</a> and it’s absolutely worth the time and effort.</p>
<p>Also, if you need more proof of how useful it is, Microsoft Engineers themselves use it and many of the recent <a href="https://blogs.msdn.microsoft.com/dotnet/2018/02/02/net-core-2-1-roadmap/#user-content-build-time-performance">performance improvements in MSBuild</a> were carried out after using <a href="https://github.com/Microsoft/msbuild/search?q=PerfView&type=Issues">PerfView to find the bottlenecks</a>.</p>
<p>PerfView is built on top of the <a href="https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent/">Microsoft.Diagnostics.Tracing.TraceEvent library</a> which you can use in your own tools. In addition, since it’s been open-sourced the community has contributed and it has gained some really nice features, <a href="https://github.com/Microsoft/perfview/pull/502">including flame-graphs</a>:</p>
<p><a href="/images/2018/06/PerfView Flamegraphs.png"><img src="/images/2018/06/PerfView Flamegraphs.png" alt="PerfView Flamegraphs" /></a></p>
<p>(<strong>Click for larger version</strong>)</p>
<h3 id="sharplab-by-andrey-shchekin"><a href="https://sharplab.io/">SharpLab</a> by <a href="https://twitter.com/ashmind">Andrey Shchekin</a></h3>
<p>SharpLab started out as a tool for inspecting the IL code emitted by the Roslyn compiler, but has now grown <a href="https://github.com/ashmind/SharpLab">into much more</a>:</p>
<blockquote>
<p>SharpLab is a .NET code playground that shows intermediate steps and results of code compilation.
Some language features are thin wrappers on top of other features – e.g. <code class="language-plaintext highlighter-rouge">using()</code> becomes <code class="language-plaintext highlighter-rouge">try/catch</code>.
SharpLab allows you to see the code as compiler sees it, and get a better understanding of .NET languages.</p>
</blockquote>
<p>It supports C#, Visual Basic and F#, but most impressive are the ‘Decompilation/Disassembly’ features:</p>
<blockquote>
<p>There are currently four targets for decompilation/disassembly:</p>
<ol>
<li>C#</li>
<li>Visual Basic</li>
<li>IL</li>
<li>JIT Asm (Native Asm Code)</li>
</ol>
</blockquote>
<p>That’s right, it will output the <a href="https://sharplab.io/#v2:EYLgZgpghgLgrgJwgZwLQBEJinANjASQDsYIFsBjCAgWwAdcIaITYBLAeyIBoYQpkNAD4ABAAwACEQEYA3AFgAUCIDMUgEwSAwhIDeSiYalqRAFgkBZABQBKPQaOOAblAQTSyGBIC8EgKwAdGIKio6OMgCcVh4wNiGOAL5KCUA==">assembly code</a> that the .NET JIT generates from your C#:</p>
<p><img src="/images/2018/06/SharpLab - Assembly Output.png" alt="SharpLab - Assembly Output" /></p>
<h3 id="object-layout-inspector-by-sergey-teplyakov"><a href="https://github.com/SergeyTeplyakov/ObjectLayoutInspector">Object Layout Inspector</a> by <a href="https://twitter.com/STeplyakov">Sergey Teplyakov</a></h3>
<p>This tool gives you an insight into the memory layout of your .NET objects, i.e. it will show you how the JITter has <strong>decided to arrange the fields</strong> within your <code class="language-plaintext highlighter-rouge">class</code> or <code class="language-plaintext highlighter-rouge">struct</code>. This can be useful when writing high-performance code and it’s helpful to have a tool that does it for us because doing it manually is tricky:</p>
<blockquote>
<p>There is no official documentation about fields layout because the CLR authors reserved the right to change it in the future. But knowledge about the layout can be helpful if you’re curious or if you’re working on a performance critical application.</p>
<p>How can we inspect the layout? We can look at a raw memory in Visual Studio or use <code class="language-plaintext highlighter-rouge">!dumpobj</code> command in <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/sos-dll-sos-debugging-extension">SOS Debugging Extension</a>. These approaches are tedious and boring, so we’ll try to write a tool that will print an object layout at runtime.</p>
</blockquote>
<p>From the example in the <a href="https://github.com/SergeyTeplyakov/ObjectLayoutInspector#inspecting-a-value-type-layout-at-runtime">GitHub repo</a>, if you use <code class="language-plaintext highlighter-rouge">TypeLayout.Print&lt;NotAlignedStruct&gt;()</code> with code like this:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">struct</span> <span class="nc">NotAlignedStruct</span>
<span class="p">{</span>
<span class="k">public</span> <span class="kt">byte</span> <span class="n">m_byte1</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">m_int</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">byte</span> <span class="n">m_byte2</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">short</span> <span class="n">m_short</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>You’ll get the following output, showing exactly how the CLR will lay out the <code class="language-plaintext highlighter-rouge">struct</code> in memory, based on its padding and optimization rules.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Size: 12. Paddings: 4 (%33 of empty space)
|================================|
| 0: Byte m_byte1 (1 byte) |
|--------------------------------|
| 1-3: padding (3 bytes) |
|--------------------------------|
| 4-7: Int32 m_int (4 bytes) |
|--------------------------------|
| 8: Byte m_byte2 (1 byte) |
|--------------------------------|
| 9: padding (1 byte) |
|--------------------------------|
| 10-11: Int16 m_short (2 bytes) |
|================================|
</code></pre></div></div>
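<p>If you just want a quick sanity check of those numbers without pulling in the library, a rough sketch using <code class="language-plaintext highlighter-rouge">Marshal.SizeOf</code> and <code class="language-plaintext highlighter-rouge">Marshal.OffsetOf</code> is shown below. One assumption to be aware of: these APIs report the <em>marshalled</em> (unmanaged) layout, which for a simple struct of primitive fields like this one should match what the tool prints, but that is not guaranteed for every type. <code class="language-plaintext highlighter-rouge">ReorderedStruct</code> is a made-up variant, added only to show the effect of ordering the fields largest-first:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.InteropServices;

public struct NotAlignedStruct
{
    public byte m_byte1;
    public int m_int;
    public byte m_byte2;
    public short m_short;
}

// Hypothetical variant: same fields, ordered largest-first to avoid padding.
public struct ReorderedStruct
{
    public int m_int;
    public short m_short;
    public byte m_byte1;
    public byte m_byte2;
}

public static class LayoutCheck
{
    // Offset (in bytes) of a named field within T, as seen by the marshaller.
    public static int OffsetOf&lt;T&gt;(string field) =&gt; (int)Marshal.OffsetOf&lt;T&gt;(field);

    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf&lt;NotAlignedStruct&gt;());     // 12
        Console.WriteLine(OffsetOf&lt;NotAlignedStruct&gt;("m_int"));    // 4
        Console.WriteLine(OffsetOf&lt;NotAlignedStruct&gt;("m_byte2"));  // 8
        Console.WriteLine(OffsetOf&lt;NotAlignedStruct&gt;("m_short"));  // 10

        // Same four fields, but no padding is needed any more.
        Console.WriteLine(Marshal.SizeOf&lt;ReorderedStruct&gt;());      // 8
    }
}
</code></pre></div></div>
<p>The 4 bytes saved come from the 3 bytes of padding before <code class="language-plaintext highlighter-rouge">m_int</code> and the 1 byte before <code class="language-plaintext highlighter-rouge">m_short</code> in the original ordering.</p>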
<h3 id="the-ultimate-net-experiment-tune-by-konrad-kokosa"><a href="http://tooslowexception.com/the-ultimate-net-experiment-project/">The Ultimate .NET Experiment (TUNE)</a> by <a href="https://twitter.com/konradkokosa">Konrad Kokosa</a></h3>
<p>TUNE is a really intriguing tool. As it says on the <a href="https://github.com/kkokosa/Tune">GitHub page</a>, its purpose is to help you</p>
<blockquote>
<p>… learn .NET internals and performance tuning by experiments with C# code.</p>
</blockquote>
<p>You can find out more information about what it does <a href="http://tooslowexception.com/the-ultimate-net-experiment-project/">in this blog post</a>, but at a high-level it <a href="https://github.com/kkokosa/Tune">works like this</a>:</p>
<blockquote>
<ul>
<li>write a sample, valid C# script which contains at least one class with public method taking a single string parameter. It will be executed by hitting Run button. This script can contain as many additional methods and classes as you wish. Just remember that first public method from the first public class will be executed (with single parameter taken from the input box below the script). …</li>
<li>after clicking Run button, the script will be compiled and executed. Additionally, it will be <strong>decompiled both to IL (Intermediate Language) and assembly code</strong> in the corresponding tabs.</li>
<li>all the time Tune is running (including time during script execution) a graph with GC data is being drawn. It shows information about <strong>generation sizes and GC occurrences</strong> (illustrated as vertical lines with the number below indicating which generation has been triggered).</li>
</ul>
</blockquote>
<p>And looks like this:</p>
<p><a href="/images/2018/06/TUNE Screenshot.png"><img src="/images/2018/06/TUNE Screenshot.png" alt="TUNE Screenshot" /></a></p>
<p>(<strong>Click for larger version</strong>)</p>
<hr />
<h2 id="tools-based-on-clr-memory-diagnostics-clrmd">Tools based on CLR Memory Diagnostics (ClrMD)</h2>
<p>Finally, we’re going to look at a particular category of tools. Since .NET came out you’ve always been able to use <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/getting-started-with-windbg">WinDBG</a> and the <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/sos-dll-sos-debugging-extension">SOS Debugging Extension</a> to get deep into the .NET runtime. However it’s not always the easiest tool to <strong>get started with</strong> and as this tweet says, it’s not always the most <strong>productive</strong> way to do things:</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Besides how complex it is, the idea is to build better abstractions. Raw debugging at the low level is just usually too unproductive. That to me is the promise of ClrMD, that it lets us build specific extensions to extract quickly the right info</p>— Tomas Restrepo (@tomasrestrepo) <a href="https://twitter.com/tomasrestrepo/status/973924168365498370?ref_src=twsrc%5Etfw">March 14, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Fortunately Microsoft made the <a href="/2016/09/06/Analysing-.NET-Memory-Dumps-with-CLR-MD/">ClrMD library available</a> (a.k.a <a href="https://www.nuget.org/packages/Microsoft.Diagnostics.Runtime">Microsoft.Diagnostics.Runtime</a>), so now anyone can write a tool that analyses <strong>memory dumps</strong> of .NET programs. You can find out even more info in the <a href="https://blogs.msdn.microsoft.com/dotnet/2013/05/01/net-crash-dump-and-live-process-inspection/">official blog post</a> and I also recommend taking a look at <a href="https://github.com/JeffCyr/ClrMD.Extensions">ClrMD.Extensions</a> that “<em>.. provide integration with LINPad and to make ClrMD even more easy to use</em>”.</p>
<p>I wanted to pull together a list of all the existing tools, so I enlisted <a href="https://twitter.com/matthewwarren/status/973940550473797633">Twitter to help</a>. <strong>Note to self</strong>: be careful what you tweet, the WinDBG Product Manager might read your tweets and <a href="https://twitter.com/aluhrs13/status/973948038380109824">get a bit upset</a>!!</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Well this just hurts my feelings :(</p>— Andy Luhrs (@aluhrs13) <a href="https://twitter.com/aluhrs13/status/973948038380109824?ref_src=twsrc%5Etfw">March 14, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Most of these tools are based on ClrMD because it’s the easiest way to do things; however, you can use the <a href="https://twitter.com/goldshtn/status/973941389791809540">underlying COM interfaces directly</a> if you want. Also, it’s worth pointing out that any tool based on ClrMD is <strong>not cross-platform</strong>, because <a href="https://twitter.com/goldshtn/status/973942794296406017">ClrMD itself is Windows-only</a>. For cross-platform options see <a href="http://blogs.microsoft.co.il/sasha/2017/02/26/analyzing-a-net-core-core-dump-on-linux/">Analyzing a .NET Core Core Dump on Linux</a>.</p>
<p>Finally, in the interest of balance, there have been lots of recent <a href="https://blogs.msdn.microsoft.com/windbg/2017/08/28/new-windbg-available-in-preview/">improvements to WinDBG</a> and because it’s extensible there have been various efforts to add functionality to it:</p>
<ul>
<li><a href="http://labs.criteo.com/2017/09/extending-new-windbg-part-1-buttons-commands/">Extending the new WinDbg, Part 1 – Buttons and commands</a></li>
<li><a href="http://labs.criteo.com/2018/01/extending-new-windbg-part-2-tool-windows-command-output/">Extending the new WinDbg, Part 2 – Tool windows and command output</a></li>
<li><a href="http://labs.criteo.com/2018/05/extending-new-windbg-part-3-embedding-c-interpreter/">Extending the new WinDbg, Part 3 – Embedding a C# interpreter</a></li>
<li><a href="https://github.com/chrisnas/DebuggingExtensions">WinDBG extension + UI tool extensions</a> and <a href="https://github.com/kevingosse/windbg-extensions">here</a></li>
<li><a href="https://github.com/rodneyviana/netext">NetExt</a> a WinDBG application that <a href="https://blogs.msdn.microsoft.com/rodneyviana/2015/03/10/getting-started-with-netext/">makes .NET debugging much easier</a> as compared to the current options: sos or psscor, also see <a href="https://www.infoq.com/news/2013/11/netext">this InfoQ article</a></li>
</ul>
<p><strong>Having said all that, onto the list</strong>:</p>
<ul>
<li><a href="https://www.slideshare.net/ChristophNeumller/large-scale-crash-dump-analysis-with-superdump">SuperDump</a> (<a href="https://github.com/Dynatrace/superdump">GitHub</a>)
<ul>
<li>A service for automated crash-dump analysis (<a href="https://www.slideshare.net/ChristophNeumller/large-scale-crash-dump-analysis-with-superdump">presentation</a>)</li>
</ul>
</li>
<li><a href="https://github.com/goldshtn/msos/wiki">msos</a> (<a href="https://github.com/goldshtn/msos">GitHub</a>)
<ul>
<li>Command-line environment a-la WinDbg for executing SOS commands without having SOS available.</li>
</ul>
</li>
<li><a href="https://github.com/fremag/MemoScope.Net/wiki">MemoScope.Net</a> (<a href="https://github.com/fremag/MemoScope.Net">GitHub</a>)
<ul>
<li>A tool to analyze .NET process memory. It can dump an application’s memory to a file and read it later.</li>
<li>The dump file contains all data (objects) and threads (state, stack, call stack). MemoScope.Net will analyze the data and help you to find memory leaks and deadlocks</li>
</ul>
</li>
<li><a href="https://github.com/0xd4d/dnSpy#dnspy">dnSpy</a> (<a href="https://github.com/0xd4d/dnSpy">GitHub</a>)
<ul>
<li>.NET debugger and assembly editor</li>
<li>You can use it to edit and debug assemblies even if you don’t have any source code available!!</li>
</ul>
</li>
<li><a href="https://aloiskraus.wordpress.com/2017/08/17/memanalyzer-v2-5-released/">MemAnalyzer</a> (<a href="https://github.com/Alois-xx/MemAnalyzer">GitHub</a>)
<ul>
<li>A command line memory analysis tool for managed code.</li>
<li>Can show which objects use most space on the managed heap just like <code class="language-plaintext highlighter-rouge">!DumpHeap</code> from Windbg without the need to install and attach a debugger.</li>
</ul>
</li>
<li><a href="https://mycodingplace.wordpress.com/2016/11/24/dumpminer-ui-tool-for-playing-with-clrmd/">DumpMiner</a> (<a href="https://github.com/dudikeleti/DumpMiner">GitHub</a>)
<ul>
<li>UI tool for playing with ClrMD, with more features <a href="https://twitter.com/dudi_ke/status/973930633935409153">coming soon</a></li>
</ul>
</li>
<li><a href="http://devops.lol/tracecli-a-production-debugging-and-tracing-tool/">Trace CLI</a> (<a href="https://github.com/ruurdk/TraceCLI/">GitHub</a>)
<ul>
<li>A production debugging and tracing tool</li>
</ul>
</li>
<li><a href="https://github.com/enkomio/shed">Shed</a> (<a href="https://github.com/enkomio/shed">GitHub</a>)
<ul>
<li>Shed is an application that allows you to inspect the .NET runtime of a program in order to extract useful information. It can be used to inspect malicious applications to get a first general overview of what information is stored once the malware is executed. Shed is able to:
<ul>
<li>Extract all objects stored in the managed heap</li>
<li>Print strings stored in memory</li>
<li>Save the snapshot of the heap in a JSON format for post-processing</li>
<li>Dump all modules that are loaded in memory</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>You can also find many other tools that <a href="https://github.com/search?p=2&q=CLRMD&type=Repositories&utf8=%E2%9C%93">make use of ClrMD</a>; it was a very good move by Microsoft to make it available.</p>
<hr />
<h2 id="other-tools">Other Tools</h2>
<p>A few other tools that are also worth mentioning:</p>
<ul>
<li><a href="https://support.microsoft.com/en-gb/help/2895198/debug-diagnostics-tool-v2-0-is-now-available">DebugDiag</a>
<ul>
<li>The DebugDiag tool is designed to assist in troubleshooting issues such as hangs, slow performance, memory leaks or memory fragmentation, and crashes in any user-mode process (now with ‘CLRMD Integration’)</li>
</ul>
</li>
<li><a href="http://www.stevestechspot.com/SOSEXANewDebuggingExtensionForManagedCode.aspx">SOSEX</a> (might not be <a href="https://twitter.com/tomasrestrepo/status/974049014244171776">developed any more</a>)
<ul>
<li>… a debugging extension for managed code that begins to alleviate some of my frustrations with SOS</li>
</ul>
</li>
<li><a href="https://docs.microsoft.com/en-us/sysinternals/downloads/vmmap">VMMap</a> from Sysinternals
<ul>
<li>VMMap is a process virtual and physical memory analysis utility.</li>
<li>I’ve previously used it to look at <a href="/2017/07/10/Memory-Usage-Inside-the-CLR/">Memory Usage <em>Inside</em> the CLR</a></li>
</ul>
</li>
</ul>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=17323911">Hacker News</a> or <a href="https://www.reddit.com/r/programming/comments/8rel9m/tools_for_exploring_net_internals/">/r/programming</a></p>
<p><strong>CoreRT - A .NET Runtime for AOT</strong> (2018-06-07, <a href="http://www.mattwarren.org/2018/06/07/CoreRT-.NET-Runtime-for-AOT">http://www.mattwarren.org/2018/06/07/CoreRT-.NET-Runtime-for-AOT</a>)</p>
<p>Firstly, what exactly is <strong>CoreRT</strong>? From <a href="https://github.com/dotnet/corert">its GitHub repo</a>:</p>
<blockquote>
<p>.. a .NET Core runtime optimized for AOT (ahead of time compilation) scenarios, with the accompanying .NET native compiler toolchain</p>
</blockquote>
<p><strong>The rest of this post will look at what that actually means.</strong></p>
<hr />
<h1 id="contents">Contents</h1>
<ol>
<li><a href="#existing">Existing .NET ‘AOT’ Implementations</a></li>
<li><a href="#highlevel">High-Level Overview</a></li>
<li><a href="#compiler">The Compiler</a></li>
<li><a href="#runtime">The Runtime</a></li>
<li><a href="#helloworld">‘Hello World’ Program</a></li>
<li><a href="#limitations">Limitations</a></li>
<li><a href="#furtherreading">Further Reading</a></li>
</ol>
<hr />
<p><span id="existing"></span></p>
<h1 id="existing-net-aot-implementations">Existing .NET ‘AOT’ Implementations</h1>
<p>However, before we look at what <strong>CoreRT</strong> is, it’s worth pointing out there are existing .NET ‘Ahead-of-Time’ (AOT) implementations that have been around for a while:</p>
<p><strong>Mono</strong></p>
<ul>
<li><a href="http://tirania.org/blog/archive/2006/Aug-17.html">Ahead of Time Compilation in Mono</a> (August 2006)</li>
<li><a href="http://www.mono-project.com/docs/advanced/aot/">Mono Docs - AOT</a> (also see <a href="http://www.mono-project.com/docs/advanced/runtime/docs/aot/">this link</a>)</li>
<li><a href="https://xamarinhelp.com/xamarin-android-aot-works/">How Xamarin.Android AOT Works</a></li>
<li><a href="https://docs.microsoft.com/en-us/xamarin/ios/internals/architecture#aot">Xamarin.iOS - Architecture - AOT</a></li>
</ul>
<p><strong>.NET Native</strong> (Windows 10/UWP apps only, a.k.a <a href="https://www.zdnet.com/article/microsoft-releases-a-preview-build-of-its-mysterious-project-n/">‘Project N’</a>)</p>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/dotnet/2014/04/02/announcing-net-native-preview/">Announcing .NET Native Preview</a> (April 2014)</li>
<li><a href="https://blogs.msdn.microsoft.com/dotnet/2014/05/09/the-net-native-tool-chain/">The .NET Native Tool-Chain</a></li>
<li><a href="https://blogs.msdn.microsoft.com/dotnet/tag/dotnetnative/">Archive of ‘.NET Native’ Blogs Posts</a></li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/framework/net-native/">Compiling Apps with .NET Native</a> (docs)</li>
<li><a href="https://blogs.windows.com/buildingapps/2015/08/20/net-native-what-it-means-for-universal-windows-platform-uwp-developers/">.NET Native – What it means for Universal Windows Platform (UWP) developers</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2014/04/28/net-native-performance-internals/">Introduction to .NET Native</a></li>
</ul>
<p>So if there were existing implementations, why was CoreRT created? The <a href="https://blogs.msdn.microsoft.com/alphageek/2016/10/13/native-compilation-why-jit-when-you-can-codegen/">official announcement</a> gives us some idea:</p>
<blockquote>
<p>If we want to shortcut this two-step compilation process and deliver a 100% native application on Windows, Mac, and Linux, we need an alternative to the CLR. The project that is aiming to deliver that solution with an ahead-of-time compilation process is called CoreRT.</p>
</blockquote>
<p>The main difference is that CoreRT is designed to support <strong>.NET Core scenarios</strong>, i.e. <a href="https://blogs.msdn.microsoft.com/dotnet/2016/09/26/introducing-net-standard/">.NET Standard</a>, <a href="https://github.com/dotnet/corert#platform-support">cross-platform</a>, etc.</p>
<p>Also worth pointing out is that whilst <strong>.NET Native</strong> is a separate product, they are related and in fact <a href="https://github.com/dotnet/corert/issues/5780#issuecomment-387103170">“.NET Native shares many CoreRT parts”</a>.</p>
<hr />
<p><span id="highlevel"></span></p>
<h1 id="high-level-overview">High-Level Overview</h1>
<p>Because all the code is open source, we can very easily identify the main components and understand where the complexity is. Firstly, let’s look at where the most ‘<strong>lines of code</strong>’ are:</p>
<p><a href="/images/2018/06/Source Code - LOC in Main Components.png"><img src="/images/2018/06/Source Code - LOC in Main Components.png" alt="Source Code - LOC in Main Components" /></a></p>
<p>We clearly see that the majority of the code is written in C#, with only the <a href="https://github.com/dotnet/corert/tree/master/src/Native">Native</a> component written in C++. The largest single component is <a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib">System.Private.CoreLib</a> which is all C# code, although there are other sub-components that contribute to it (‘System.Private.XXX’), such as <a href="https://github.com/dotnet/corert/tree/master/src/System.Private.Interop/src">System.Private.Interop</a> (36,547 LOC), <a href="https://github.com/dotnet/corert/tree/master/src/System.Private.TypeLoader">System.Private.TypeLoader</a> (30,777) and <a href="https://github.com/dotnet/corert/tree/master/src/System.Private.Reflection.Core/src">System.Private.Reflection.Core</a> (24,964). Other significant components are the <a href="https://github.com/dotnet/corert/tree/master/src/ILCompiler">‘Intermediate Language (IL) Compiler’</a> and the <a href="https://github.com/dotnet/corert/tree/master/src/Common">Common code</a> that is re-used by everything else.</p>
<p>All these components are discussed in more detail below.</p>
<hr />
<p><span id="compiler"></span></p>
<h1 id="the-compiler">The Compiler</h1>
<p>So whilst CoreRT is a run-time, it also needs a compiler to put everything together, from <a href="https://github.com/dotnet/corert/blob/master/Documentation/intro-to-corert.md">Intro to .NET Native and CoreRT</a>:</p>
<blockquote>
<p><a href="https://msdn.microsoft.com/library/dn584397.aspx">.NET Native</a> is a native toolchain that compiles <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">CIL byte code</a> to machine code (e.g. X64 instructions). By default, .NET Native (for .NET Core, as opposed to UWP) uses RyuJIT as an ahead-of-time (AOT) compiler, the same one that CoreCLR uses as a just-in-time (JIT) compiler. It can also be used with other compilers, such as <a href="https://github.com/dotnet/llilc">LLILC</a>, UTC for UWP apps and <a href="https://github.com/dotnet/corert/tree/master/src/ILCompiler.CppCodeGen/src/CppCodeGen">IL to CPP</a> (an IL to textual C++ compiler we have built as a reference prototype).</p>
</blockquote>
<p>But what does this actually look like in practice? As they say, ‘<em>a picture paints a thousand words</em>’:</p>
<p><a href="/images/2018/06/CoreRT - compilation process.png"><img src="/images/2018/06/CoreRT - compilation process.png" alt="CoreRT - compilation process" /></a></p>
<p>(<strong>Click for larger version</strong>)</p>
<p>To give more detail, the main compilation phases (started from <a href="https://github.com/dotnet/corert/blob/39f518734c7712241ff332bce6c2f3585b7a5a42/src/ILCompiler/src/Program.cs#L218-L548">\ILCompiler\src\Program.cs</a>) are the following:</p>
<ol>
<li>Calculate the <strong>reachable modules/types/classes</strong>, i.e. the <a href="https://github.com/dotnet/corert/blob/39f518734c7712241ff332bce6c2f3585b7a5a42/src/ILCompiler/src/Program.cs#L315-L393">‘compilation roots’</a> using the <a href="https://github.com/dotnet/corert/blob/b2068273c52ca7392bb5ca3aac4299c007d9a743/src/ILCompiler.Compiler/src/Compiler/ILScanner.cs">ILScanner.cs</a></li>
<li>Allow for <strong>reflection</strong>, via an <a href="https://github.com/dotnet/corert/blob/39f518734c7712241ff332bce6c2f3585b7a5a42/src/ILCompiler/src/Program.cs#L387-L392">optional rd.xml file</a> and generate the <a href="https://github.com/dotnet/corert/blob/39f518734c7712241ff332bce6c2f3585b7a5a42/src/ILCompiler/src/Program.cs#L417-L456">necessary metadata</a> using <a href="https://github.com/dotnet/corert/tree/master/src/ILCompiler.MetadataWriter">ILCompiler.MetadataWriter</a></li>
<li><strong>Compile the IL</strong> using the specific back-end (generic/shared code is in <a href="https://github.com/dotnet/corert/blob/master/src/ILCompiler.Compiler/src/Compiler/Compilation.cs">Compilation.cs</a>)
<ul>
<li>RyuJIT <a href="https://github.com/dotnet/corert/blob/master/src/ILCompiler.Compiler/src/Compiler/RyuJitCompilation.cs">RyuJitCompilation.cs</a></li>
<li>Web Assembly (WASM) <a href="https://github.com/dotnet/corert/blob/master/src/ILCompiler.WebAssembly/src/Compiler/WebAssemblyCodegenCompilation.cs">WebAssemblyCodegenCompilation.cs</a></li>
<li>C++ Code <a href="https://github.com/dotnet/corert/blob/master/src/ILCompiler.CppCodeGen/src/Compiler/CppCodegenCompilation.cs">CppCodegenCompilation.cs</a></li>
</ul>
</li>
<li>Finally, <strong>write out</strong> the <a href="https://github.com/dotnet/corert/blob/39f518734c7712241ff332bce6c2f3585b7a5a42/src/ILCompiler/src/Program.cs#L488-L499">compiled methods</a> using <a href="https://github.com/dotnet/corert/blob/master/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/ObjectWriter.cs">ObjectWriter</a> which in turn uses <a href="https://github.com/dotnet/corert/tree/master/src/Native/ObjWriter">LLVM under-the-hood</a></li>
</ol>
<p>But it’s not just your code that ends up in the final .exe, along the way the CoreRT compiler also generates several ‘helper methods’ to cover the following scenarios:</p>
<ul>
<li><strong>IL Code</strong> (<a href="https://github.com/dotnet/corert/search?p=2&q=%22public+override+MethodIL+EmitIL%28%29%22&unscoped_q=%22public+override+MethodIL+EmitIL%28%29%22">via the ‘EmitIL()’ method</a>)
<ul>
<li><a href="https://github.com/dotnet/corert/blob/dfcd12f92c37d7533dcf7a48e9ab16295d84cf31/src/Common/src/TypeSystem/IL/Stubs/DelegateThunks.cs#L108">Delegates</a></li>
<li><a href="https://github.com/dotnet/corert/blob/bd7692c6ab69079fdaa543a0964fc0c1ebb17284/src/Common/src/TypeSystem/IL/Stubs/DelegateMarshallingMethodThunk.cs#L226">P/Invoke Delegates</a></li>
<li><a href="https://github.com/dotnet/corert/blob/61c403456e3199d4ef5098aa48f43cd79fb7feed/src/Common/src/TypeSystem/Interop/IL/InlineArrayType.cs#L328">Inlined Array methods</a></li>
<li><a href="https://github.com/dotnet/corert/blob/61c403456e3199d4ef5098aa48f43cd79fb7feed/src/ILCompiler.Compiler/src/Compiler/CompilerTypeSystemContext.BoxedTypes.cs#L440">Boxing</a></li>
<li><a href="https://github.com/dotnet/corert/blob/dfcd12f92c37d7533dcf7a48e9ab16295d84cf31/src/Common/src/TypeSystem/IL/Stubs/DynamicInvokeMethodThunk.cs#L293">Dynamically Invoked methods</a></li>
<li><a href="https://github.com/dotnet/corert/blob/da332710edc5387a79e298aa97f21e1feac56ceb/src/Common/src/TypeSystem/IL/Stubs/EnumThunks.cs#L80">Enum GetHashCode()</a></li>
<li><a href="https://github.com/dotnet/corert/blob/da332710edc5387a79e298aa97f21e1feac56ceb/src/Common/src/TypeSystem/IL/Stubs/AssemblyGetExecutingAssemblyMethodThunk.cs#L58">Assembly GetExecutingAssembly()</a></li>
</ul>
</li>
<li><strong>Assembly Code</strong> (<a href="https://github.com/dotnet/corert/search?q=%22override+void+EmitCode%28%22&unscoped_q=%22override+void+EmitCode%28%22">via the ‘EmitCode()’ method</a>) (different implementations for each CPU architecture)
<ul>
<li><a href="https://github.com/dotnet/corert/blob/b68c08e3ce8c7647cd6b8954f625aae4c706bd33/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/Target_X64/X64UnboxingStubNode.cs#L11">Unboxing</a> (x64)</li>
<li><a href="https://github.com/dotnet/corert/blob/5314ca27fbd6ca56c467d710c25e1e614ad5d625/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/Target_ARM64/ARM64JumpStubNode.cs#L11">Jump Stubs</a> (ARM64)</li>
<li><a href="https://github.com/dotnet/corert/blob/1f3d243d7b39c53e6bfb3cc81a25227d1b0dfb2e/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/Target_X64/X64ReadyToRunGenericHelperNode.cs#L62">‘Ready to Run’ Generic helper</a> (x64)</li>
</ul>
</li>
</ul>
<p>Fortunately the compiler doesn’t blindly include all the code it finds; it is intelligent enough to <a href="https://github.com/dotnet/corert/issues/5564#issuecomment-375625357">only include code that’s actually used</a>:</p>
<blockquote>
<p>We don’t use ILLinker, but everything gets naturally treeshaken by the compiler itself (we start with compiling <code class="language-plaintext highlighter-rouge">Main</code>/<code class="language-plaintext highlighter-rouge">NativeCallable</code> exports and continue compiling other methods and generating necessary data structures as we go). If there’s a type or method that is not used, the compiler doesn’t even look at it.</p>
</blockquote>
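<p>To make the idea in that quote a bit more concrete, here is a toy sketch of ‘only compile what is reachable from the roots’. It is <strong>not</strong> CoreRT’s actual implementation; the <code class="language-plaintext highlighter-rouge">TreeShaker</code> class, the method names and the hard-coded call graph are all invented for illustration:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

public static class TreeShaker
{
    // Returns every method reachable from the given roots, in discovery order.
    // 'calls' maps a method name to the methods it calls directly.
    public static List&lt;string&gt; Reachable(
        IEnumerable&lt;string&gt; roots,
        IReadOnlyDictionary&lt;string, string[]&gt; calls)
    {
        var seen = new HashSet&lt;string&gt;();
        var order = new List&lt;string&gt;();
        var worklist = new Queue&lt;string&gt;();

        foreach (var root in roots)
        {
            if (seen.Add(root)) worklist.Enqueue(root);
        }

        while (worklist.Count &gt; 0)
        {
            var method = worklist.Dequeue();
            order.Add(method); // stands in for "compile this method"

            // Compiling a method is what reveals the methods it depends on.
            if (!calls.TryGetValue(method, out var callees)) continue;
            foreach (var callee in callees)
            {
                if (seen.Add(callee)) worklist.Enqueue(callee);
            }
        }

        return order;
    }

    public static void Main()
    {
        var calls = new Dictionary&lt;string, string[]&gt;
        {
            ["Main"] = new[] { "Console.WriteLine", "Helper" },
            ["Helper"] = new[] { "Console.WriteLine" },
            ["Unused"] = new[] { "AlsoUnused" },
        };

        // 'Unused' and 'AlsoUnused' are never reached, so never "compiled".
        Console.WriteLine(string.Join(", ", Reachable(new[] { "Main" }, calls)));
        // Main, Console.WriteLine, Helper
    }
}
</code></pre></div></div>
<p>The point of the worklist is that nothing is ever removed after the fact: unused code is simply never discovered, which matches the ‘doesn’t even look at it’ description.</p>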
<hr />
<p><span id="runtime"></span></p>
<h1 id="the-runtime">The Runtime</h1>
<p>All the user/helper code then sits on-top of the <strong>CoreRT runtime</strong>, from <a href="https://github.com/dotnet/corert/blob/master/Documentation/intro-to-corert.md">Intro to .NET Native and CoreRT</a>:</p>
<blockquote>
<p>CoreRT is the .NET Core runtime that is optimized for AOT scenarios, which .NET Native targets. <strong>This is a refactored and layered runtime</strong>. The base is a small native execution engine that provides services such as garbage collection(GC). <strong>This is the same GC used in CoreCLR</strong>. Many other parts of the traditional .NET runtime, such as the type system, are implemented in C#. <strong>We’ve always wanted to implement runtime functionality in C#</strong>. We now have the infrastructure to do that. In addition, library implementations that were built deep into CoreCLR, have also been cleanly refactored and implemented as C# libraries.</p>
</blockquote>
<p>This last point is interesting: why is it advantageous to implement ‘runtime functionality in C#’? Well, it turns out that it’s hard to do in an un-managed language because there are some very subtle and hard-to-track-down ways that you can get it wrong:</p>
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">Reliability and performance. The C/C++ code has to manually managed. It means that one has to be very careful to report all GC references to the GC. The manually managed code is both very hard to get right and it has performance overhead.</p>— Jan Kotas (@JanKotas7) <a href="https://twitter.com/JanKotas7/status/988622367973720064?ref_src=twsrc%5Etfw">April 24, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>These are known as ‘GC Holes’ and the BOTR provides <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-code-guide.md#2.1">more detail on them</a>. The author of that tweet is significant: Jan Kotas has worked on the .NET runtime <a href="https://channel9.msdn.com/Blogs/funkyonex/Happy-Birthday-NET-with-Jan-Kotas">for a long time</a>, so if he thinks something is hard, it really is!!</p>
<h2 id="runtime-components">Runtime Components</h2>
<p>As previously mentioned, it’s a <em>layered runtime</em>, i.e. made up of several distinct components, as explained in <a href="https://github.com/dotnet/corert/issues/5523#issuecomment-374229675">this comment</a>:</p>
<blockquote>
<p><strong>At the core of CoreRT, there’s a runtime that provides basic services for the code to run (think: garbage collection, exception handling, stack walking)</strong>. This runtime is pretty small and mostly depends on C/C++ runtime (even the C++ runtime dependency is not a hard requirement as Jan pointed out - <a href="https://github.com/dotnet/corert/issues/3564">#3564</a>). This code mostly lives in <a href="https://github.com/dotnet/corert/tree/master/src/Native/Runtime">src/Native/Runtime</a>, <a href="https://github.com/dotnet/corert/tree/master/src/Native/gc">src/Native/gc</a>, and <a href="https://github.com/dotnet/corert/tree/master/src/Runtime.Base">src/Runtime.Base</a>. It’s structured so that the places that <em>do require</em> interacting with the underlying platform (allocating native memory, threading, etc.) go through a platform abstraction layer (PAL). We have a PAL for Windows, Linux, and macOS, but others can be added.</p>
</blockquote>
<p>And you can see the <strong>PAL Components</strong> in the following locations:</p>
<ul>
<li><a href="https://github.com/dotnet/corert/tree/master/src/Native/Runtime/windows">Windows</a></li>
<li><a href="https://github.com/dotnet/corert/tree/master/src/Native/Runtime/unix">Unix</a></li>
<li>MacOS <a href="https://github.com/dotnet/corert/search?utf8=%E2%9C%93&q=%23ifdef+__APPLE__&type=">‘Apple’</a> and <a href="https://github.com/dotnet/corert/search?utf8=%E2%9C%93&q=OSX&type=">‘OSX’</a></li>
</ul>
<h2 id="c-code-shared-with-coreclr">C# Code shared with CoreCLR</h2>
<p>One interesting aspect of the CoreRT runtime is that wherever possible it shares code with the <a href="https://github.com/dotnet/coreclr">CoreCLR runtime</a>. This is part of a larger effort to share code across <a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/shared">multiple repositories</a>:</p>
<blockquote>
<p>This directory contains the shared sources for System.Private.CoreLib. These are shared between <a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/shared">dotnet/corert</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/shared">dotnet/coreclr</a> and <a href="https://github.com/dotnet/corefx/tree/master/src/Common/src/CoreLib">dotnet/corefx</a>.
The sources are synchronized with a mirroring tool that watches for new commits on either side and creates new pull requests (as @dotnet-bot) in the other repository.</p>
</blockquote>
<p>Recently there has been a significant amount of work done to move more and more code over into the ‘shared partition’, so that work isn’t duplicated and any fixes are shared across both locations. You can see how this works by looking at the links below:</p>
<ul>
<li>CoreRT
<ul>
<li><a href="https://github.com/dotnet/corert/search?q=shared+partition&type=Commits&utf8=%E2%9C%93">‘shared partition’ commits</a></li>
<li><a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/src">Normal System.Private.Corelib</a></li>
<li><a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/shared">Shared System.Private.Corelib</a></li>
</ul>
</li>
<li>CoreCLR
<ul>
<li><a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=%22shared+partition%22&type=Commits">‘shared partition’ commits</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src">Normal mscorlib</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/shared">Shared mscorlib</a></li>
</ul>
</li>
</ul>
<p>What this means is that about 2/3 of the C# code in <code class="language-plaintext highlighter-rouge">System.Private.CoreLib</code> is shared with <code class="language-plaintext highlighter-rouge">CoreCLR</code> and only 1/3 is unique to <code class="language-plaintext highlighter-rouge">CoreRT</code>:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Group</th>
<th style="text-align: center">C# LOC (Files)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/shared">shared</a></td>
<td style="text-align: center"><strong>170,106 (759)</strong></td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/src">src</a></td>
<td style="text-align: center"><strong>96,733 (351)</strong></td>
</tr>
<tr>
<td style="text-align: left"><strong>Total</strong></td>
<td style="text-align: center"><strong>266,839 (1,110)</strong></td>
</tr>
</tbody>
</table>
<h2 id="native-code">Native Code</h2>
<p>Finally, whilst it is advantageous to write as much code as possible in C#, certain components have to be written in C++. These include the <a href="https://github.com/dotnet/corert/tree/master/src/Native/gc"><strong>GC</strong></a> (the majority of which is one file, <a href="https://github.com/dotnet/corert/blob/master/src/Native/gc/gc.cpp">gc.cpp</a>, which is almost 37,000 LOC!), the <a href="https://github.com/dotnet/corert/tree/master/src/Native/jitinterface"><strong>JIT Interface</strong></a>, <a href="https://github.com/dotnet/corert/tree/master/src/Native/ObjWriter"><strong>ObjWriter</strong></a> (based on LLVM) and, most significantly, the <strong><a href="https://github.com/dotnet/corert/tree/master/src/Native/Runtime">Core Runtime</a></strong>, which contains code for activities like:</p>
<ul>
<li>Threading</li>
<li>Stack Frame handling</li>
<li>Debugging/Profiling</li>
<li>Interfacing to the OS</li>
<li><a href="https://github.com/dotnet/corert/tree/master/src/Native/Runtime/arm64">CPU specific helpers</a> for:
<ul>
<li>Exception handling</li>
<li>GC Write Barriers</li>
<li>Stubs/Thunks</li>
<li>Optimised object allocation</li>
</ul>
</li>
</ul>
<hr />
<p><span id="helloworld"></span></p>
<h1 id="hello-world-program">‘Hello World’ Program</h1>
<p>One of the first things people asked about CoreRT is “<em>what is the size of a ‘Hello World’ app</em>” and the answer is ~3.93 MB (if you compile in Release mode), but there is work <a href="https://github.com/dotnet/corert/issues/5013">being done to reduce this</a>. At a ‘high-level’, the .exe that is produced looks like this:</p>
<p><a href="/images/2018/06/Exe Components.png"><img src="/images/2018/06/Exe Components.png" alt="Exe Components" /></a></p>
<p><strong>Note</strong>: the different colours correspond to the original format of each component; the output itself is a single, native executable file.</p>
<p>This file comes with a full .NET-specific ‘base runtime’ or ‘class libraries’ (‘System.Private.XXX’), so you get a lot of functionality; it is not the <a href="https://github.com/dotnet/corert/issues/5523#issuecomment-374229675">absolute bare-minimum app</a>. Fortunately there is a way to see what a ‘bare-minimum’ runtime would look like, by compiling against the <a href="https://github.com/dotnet/corert/tree/master/src/Test.CoreLib">Test.CoreLib</a> project included in the CoreRT source. By using this you end up with an .exe that looks like this:</p>
<p><a href="/images/2018/06/Exe Components - Reduced CoreLib.png"><img src="/images/2018/06/Exe Components - Reduced CoreLib.png" alt="Exe Components - Reduced CoreLib" /></a></p>
<p>But it’s so minimal that, out of the box, you can’t even write ‘Hello World’ to the console as there is no <code class="language-plaintext highlighter-rouge">System.Console</code> type! After a bit of hacking I was able to build a version that did have a working <code class="language-plaintext highlighter-rouge">Console</code> output (if you’re interested, this diff is <a href="https://gist.github.com/mattwarren/a248782078d15c4ca2999f986ba7eacb#file-corert-test-corelib-changes-diff">available here</a>). To make it work I had to include the following components:</p>
<ul>
<li><a href="https://gist.github.com/mattwarren/a248782078d15c4ca2999f986ba7eacb#gistcomment-2612860">System.Console</a></li>
<li><a href="https://gist.github.com/mattwarren/a248782078d15c4ca2999f986ba7eacb#gistcomment-2612864">System.Text.UnicodeEncoding</a></li>
<li><a href="https://gist.github.com/mattwarren/a248782078d15c4ca2999f986ba7eacb#gistcomment-2612862">String handling</a></li>
<li><a href="https://gist.github.com/mattwarren/a248782078d15c4ca2999f986ba7eacb#gistcomment-2612866">P/Invoke and Marshalling support</a> (to call an OS function)</li>
</ul>
<p>So <code class="language-plaintext highlighter-rouge">Test.CoreLib</code> really is a minimal runtime! But the difference in size is dramatic: the .exe shrinks down to <strong>0.49 MB</strong>, compared to <strong>3.93 MB</strong> for the fully-featured runtime!</p>
<table>
<thead>
<tr>
<th style="text-align: left">Type</th>
<th style="text-align: right">Standard (bytes)</th>
<th style="text-align: right">Test.CoreLib (bytes)</th>
<th style="text-align: right">Difference</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">.data</td>
<td style="text-align: right">163,840</td>
<td style="text-align: right">36,864</td>
<td style="text-align: right">-126,976</td>
</tr>
<tr>
<td style="text-align: left">.managed</td>
<td style="text-align: right">1,540,096</td>
<td style="text-align: right">65,536</td>
<td style="text-align: right">-1,474,560</td>
</tr>
<tr>
<td style="text-align: left">.pdata</td>
<td style="text-align: right">147,456</td>
<td style="text-align: right">20,480</td>
<td style="text-align: right">-126,976</td>
</tr>
<tr>
<td style="text-align: left">.rdata</td>
<td style="text-align: right">1,712,128</td>
<td style="text-align: right">81,920</td>
<td style="text-align: right">-1,630,208</td>
</tr>
<tr>
<td style="text-align: left">.reloc</td>
<td style="text-align: right">98,304</td>
<td style="text-align: right">4,096</td>
<td style="text-align: right">-94,208</td>
</tr>
<tr>
<td style="text-align: left">.text</td>
<td style="text-align: right">360,448</td>
<td style="text-align: right">299,008</td>
<td style="text-align: right">-61,440</td>
</tr>
<tr>
<td style="text-align: left">rdata</td>
<td style="text-align: right">98,304</td>
<td style="text-align: right">4,096</td>
<td style="text-align: right">-94,208</td>
</tr>
<tr>
<td style="text-align: left"> </td>
<td style="text-align: right"> </td>
<td style="text-align: right"> </td>
<td style="text-align: right"> </td>
</tr>
<tr>
<td style="text-align: left">Total (bytes)</td>
<td style="text-align: right">4,120,576</td>
<td style="text-align: right">512,000</td>
<td style="text-align: right">-3,608,576</td>
</tr>
<tr>
<td style="text-align: left">Total (MB)</td>
<td style="text-align: right">3.93</td>
<td style="text-align: right">0.49</td>
<td style="text-align: right">-3.44</td>
</tr>
</tbody>
</table>
<p>These data sizes were obtained by using the Microsoft <a href="https://msdn.microsoft.com/en-us/library/c1h23y6c.aspx">DUMPBIN tool</a> with the <code class="language-plaintext highlighter-rouge">/DISASM</code> command-line switch (<a href="/data/2018/06/HelloWorld.disasm.zip">zip file of the full output</a>), which produces the following summary (note: size values are in HEX):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Summary
28000 .data
178000 .managed
24000 .pdata
1A2000 .rdata
18000 .reloc
58000 .text
18000 rdata
</code></pre></div></div>
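<p>To relate the hexadecimal summary to the byte counts in the table, here is a minimal sketch that parses the values and adds them up. The <code class="language-plaintext highlighter-rouge">SectionSizes</code> class and its member names are made up for this illustration; only the section names and hex values come from the DUMPBIN summary.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Globalization;
using System.Linq;

public static class SectionSizes
{
    // Section names and sizes copied from the DUMPBIN summary (sizes are hexadecimal).
    static readonly (string Name, string HexSize)[] Summary =
    {
        (".data", "28000"), (".managed", "178000"), (".pdata", "24000"),
        (".rdata", "1A2000"), (".reloc", "18000"), (".text", "58000"), ("rdata", "18000"),
    };

    // Parse each hex value and sum the results to get the total size in bytes.
    public static long TotalBytes() =&gt;
        Summary.Sum(s =&gt; long.Parse(s.HexSize, NumberStyles.HexNumber, CultureInfo.InvariantCulture));

    public static void Main()
    {
        long total = TotalBytes();
        double megabytes = total / (1024.0 * 1024.0);
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "{0:N0} bytes = {1:F2} MB", total, megabytes));
    }
}
</code></pre></div></div>
<p>The total comes out at 4,120,576 bytes (3.93 MB), which agrees with the ‘Total’ rows in the table.</p>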
<p>Also contained in the output is the assembly code for a simple <code class="language-plaintext highlighter-rouge">Hello World</code> method:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>HelloWorld_HelloWorld_Program__Main:
0000000140004C50: 48 8D 0D 19 94 37 lea rcx,[__Str_Hello_World__E63BA1FD6D43904697343A373ECFB93457121E4B2C51AF97278C431E8EC85545]
00
0000000140004C57: 48 8D 05 DA C5 00 lea rax,[System_Console_System_Console__WriteLine_12]
00
0000000140004C5E: 48 FF E0 jmp rax
0000000140004C61: 90 nop
0000000140004C62: 90 nop
0000000140004C63: 90 nop
</code></pre></div></div>
<p>and if we dig further we can see the code for <code class="language-plaintext highlighter-rouge">System.Console.WriteLine(..)</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>System_Console_System_Console__WriteLine_12:
0000000140011238: 56 push rsi
0000000140011239: 48 83 EC 20 sub rsp,20h
000000014001123D: 48 8B F1 mov rsi,rcx
0000000140011240: E8 33 AD FF FF call System_Console_System_Console__get_Out
0000000140011245: 48 8B C8 mov rcx,rax
0000000140011248: 48 8B D6 mov rdx,rsi
000000014001124B: 48 8B 00 mov rax,qword ptr [rax]
000000014001124E: 48 8B 40 68 mov rax,qword ptr [rax+68h]
0000000140011252: 48 83 C4 20 add rsp,20h
0000000140011256: 5E pop rsi
0000000140011257: 48 FF E0 jmp rax
000000014001125A: 90 nop
000000014001125B: 90 nop
</code></pre></div></div>
<hr />
<p><span id="limitations"></span></p>
<h1 id="limitations">Limitations</h1>
<h2 id="missing-functionality">Missing Functionality</h2>
<p>Some people have successfully run <a href="https://www.youtube.com/watch?v=iaC67CUmEXs">complex apps using CoreRT</a> but, as it stands, CoreRT is still an alpha product, at least according to the <a href="https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.DotNet.ILCompiler">NuGet package ‘1.0.0-alpha-26529-02’</a> that the official samples <a href="https://github.com/dotnet/corert/tree/master/samples/HelloWorld">instruct you to use</a>. I’ve not seen any information about when a full 1.0 release will be available.</p>
<p>So there is some functionality that is not yet implemented, e.g. <a href="https://github.com/dotnet/corert/issues/5780">F# Support</a>, <a href="https://github.com/dotnet/corert/issues/5680">GC.GetMemoryInfo</a> or <a href="https://github.com/dotnet/corert/issues/5587">canGetCookieForPInvokeCalliSig</a> (a <code class="language-plaintext highlighter-rouge">calli</code> to a p/invoke). For more information on this I recommend this entertaining presentation on <a href="https://vimeo.com/262938007">Building Native Executables from .NET with CoreRT</a> by <a href="https://twitter.com/markrendle">Mark Rendle</a>. In the 2nd half he chronicles all the issues that he ran into when he was trying to run an ASP.NET app under CoreRT (some of which may well be fixed now).</p>
<iframe src="https://player.vimeo.com/video/262938007" width="640" height="360" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe>
<h2 id="reflection">Reflection</h2>
<p>But more fundamentally, because of the nature of AOT compilation, there are 2 main stumbling blocks that you may also run into: <strong>Reflection</strong> and <strong>Runtime Code-Generation</strong>.</p>
<p>Firstly, if you want to use reflection in your code you need to tell the CoreRT compiler about the types you expect to <em>reflect</em> over, because by default it only includes the types it knows about. You can do this by using a file called <code class="language-plaintext highlighter-rouge">rd.xml</code> as <a href="https://github.com/dotnet/corert/blob/master/samples/WebApi/README.md#using-reflection">shown here</a>. Unfortunately this will always require manual intervention for the reasons <a href="https://github.com/dotnet/corert/issues/5855#issuecomment-392605646">explained in this issue</a>. More information is available in this comment <a href="https://github.com/Microsoft/visualfsharp/issues/4954#issuecomment-390941777">‘…some details about CoreRT’s restriction on MakeGenericType and MakeGenericMethod’</a>.</p>
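<p>As a rough illustration of why the compiler needs help, consider the sketch below. The <code class="language-plaintext highlighter-rouge">ReflectionDemo</code> namespace and the <code class="language-plaintext highlighter-rouge">Greeter</code> type are hypothetical; the point is that the type is only ever mentioned inside a string, so nothing in the code references it statically.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

namespace ReflectionDemo
{
    // Hypothetical type that is never referenced directly, only by name.
    public class Greeter
    {
        public string Greet() =&gt; "Hello from a type found via reflection";
    }

    public static class Program
    {
        public static string GreetByName(string typeName)
        {
            // The type name is just data here, so a compiler that analyses the
            // code ahead of time cannot tell that 'Greeter' must be kept.
            Type type = Type.GetType(typeName, throwOnError: true);
            object instance = Activator.CreateInstance(type);
            return (string)type.GetMethod("Greet").Invoke(instance, null);
        }

        public static void Main() =&gt;
            Console.WriteLine(GreetByName("ReflectionDemo.Greeter"));
    }
}
</code></pre></div></div>
<p>With a JIT-based runtime this works because the full metadata is always available. Under an AOT compiler, code like this may fail at runtime unless the type is listed in a file such as <code class="language-plaintext highlighter-rouge">rd.xml</code>.</p>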
<p>To make reflection work the compiler adds the required metadata to the final .exe <a href="https://github.com/dotnet/corert/issues/2035#issuecomment-298418885">using this process</a>:</p>
<blockquote>
<p>This would reuse the same scheme we already have for the RyuJIT codegen path:</p>
<ul>
<li>The compiler generates a blob of bytes that describes the metadata (namespaces, types, their members, their custom attributes, method parameters, etc.). The data is generated as a byte array in the <a href="https://github.com/dotnet/corert/blob/79affc5f32c390e7f6a0d61b1446360fbad0ae5f/src/ILCompiler.Compiler/src/Compiler/CompilerGeneratedMetadataManager.cs#L62">ComputeMetadata method</a>.</li>
<li>The metadata gets embedded as a data blob into the executable image. This is achieved by <a href="https://github.com/dotnet/corert/blob/79affc5f32c390e7f6a0d61b1446360fbad0ae5f/src/ILCompiler.Compiler/src/Compiler/MetadataManager.cs#L71">adding the blob</a> to a “ready to run header”. Ready to run header is a well known data structure that can be located by the code in the framework at runtime.</li>
<li>The ready to run header along with the blobs it refers to is emitted into the final executable.</li>
<li>At runtime, pointer to the byte array <a href="https://github.com/dotnet/corert/blob/79affc5f32c390e7f6a0d61b1446360fbad0ae5f/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/ModuleList.cs#L702">is located using the RhFindBlob API</a>, and a parser is constructed over the array, to be used by the reflection stack.</li>
</ul>
</blockquote>
<h2 id="runtime-code-generation">Runtime Code-Generation</h2>
<p>In .NET you often use reflection once (because it <a href="/2016/12/14/Why-is-Reflection-slow/">can be slow</a>) followed by <a href="https://docs.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/dynamic-source-code-generation-and-compilation">‘dynamic’ or ‘runtime’ code-generation</a> with <code class="language-plaintext highlighter-rouge">Reflection.Emit</code>. This technique is widely used in .NET libraries for Serialisation/Deserialisation, Dependency Injection, Object Mapping and ORM.</p>
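<p>A minimal sketch of that ‘reflect once, then generate code’ pattern is shown below. It uses <code class="language-plaintext highlighter-rouge">DynamicMethod</code> from <code class="language-plaintext highlighter-rouge">System.Reflection.Emit</code>; the <code class="language-plaintext highlighter-rouge">Person</code> and <code class="language-plaintext highlighter-rouge">GetterFactory</code> types are invented for the example and are not taken from any particular library.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical type whose property we want to read quickly.
public class Person
{
    public string Name { get; set; }
}

public static class GetterFactory
{
    public static Func&lt;Person, string&gt; BuildNameGetter()
    {
        // Step 1: the (slow) reflection lookup, done only once.
        MethodInfo getter = typeof(Person).GetProperty("Name").GetGetMethod();

        // Step 2: emit IL at runtime that is equivalent to 'person =&gt; person.Name'.
        var method = new DynamicMethod("GetName", typeof(string), new[] { typeof(Person) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);          // load the Person argument
        il.Emit(OpCodes.Callvirt, getter); // call the property getter
        il.Emit(OpCodes.Ret);              // return the string

        // Step 3: wrap the generated method in a strongly-typed delegate.
        return (Func&lt;Person, string&gt;)method.CreateDelegate(typeof(Func&lt;Person, string&gt;));
    }

    public static void Main()
    {
        Func&lt;Person, string&gt; getName = BuildNameGetter();
        Console.WriteLine(getName(new Person { Name = "Ada" }));
    }
}
</code></pre></div></div>
<p>The delegate is typically cached and re-used, so the reflection cost is only paid once. Step 2 is the part that relies on a JIT compiler being present when the program runs.</p>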
<p>The issue is that ‘runtime’ code generation is <a href="https://github.com/dotnet/corert/issues/5720#issuecomment-382084927">problematic in an ‘AOT’ scenario</a>:</p>
<blockquote>
<p>ASP.NET dependency injection introduced dependency on Reflection.Emit in <a href="https://github.com/aspnet/DependencyInjection/pull/630">aspnet/DependencyInjection#630</a> unfortunately. It makes it incompatible with CoreRT.</p>
<p>We can make it functional in CoreRT AOT environment by introducing IL interpretter (<a href="https://github.com/dotnet/corert/issues/5011">#5011</a>), but it would still perform poorly. The dependency injection framework is using Reflection.Emit on performance critical paths.</p>
<p>It would be really up to ASP.NET to provide AOT-friendly flavor that generates all code at build time instead of runtime to make this work well. It would likely help the startup without CoreRT as well.</p>
</blockquote>
<p>I’m sure this will be solved one way or the other (see <a href="https://github.com/dotnet/corert/issues/5011">#5011</a>), but at the moment it’s still ‘work-in-progress’.</p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=17261117">HackerNews</a> and <a href="https://www.reddit.com/r/dotnet/comments/8pdt98/corert_a_net_runtime_for_aot_performance_is_a/">/r/dotnet</a></p>
<hr />
<p><span id="furtherreading"></span></p>
<h1 id="further-reading">Further Reading</h1>
<p>If you’ve got this far, here are some other links that you might be interested in:</p>
<ul>
<li><a href="https://stackoverflow.com/questions/34665026/whats-the-difference-between-net-coreclr-corert-roslyn-and-llilc/35044525#35044525">What’s the difference between .NET CoreCLR, CoreRT, Roslyn and LLILC</a></li>
<li><a href="https://blog.rendle.io/what-ive-learned-about-dotnet-native/">What I’ve learned about .NET Native</a></li>
<li><a href="https://channel9.msdn.com/Shows/On-NET/Mei-Chin-Tsai--Jan-Kotas-CoreRT--NET-Native">Channel 9 - CoreRT & .NET Native</a></li>
<li><a href="https://channel9.msdn.com/Shows/Going+Deep/Inside-NET-Native">Channel 9 - Going Deep - Inside .NET Native</a></li>
<li><a href="https://github.com/dotnet/corert/blob/master/Documentation/how-to-build-and-run-ilcompiler-in-visual-studio.md">Building ILCompiler in Visual Studio 2017</a></li>
<li><a href="https://github.com/dotnet/corert/blob/master/Documentation/botr/type-system.md">Type System Overview (botr)</a></li>
<li><a href="https://github.com/dotnet/corert/blob/master/Documentation/design-docs/typesystem/TypeSystemInterfacesApi.md">Interfaces API surface on Type System</a></li>
<li><a href="https://xamarinhelp.com/xamarin-android-aot-works/">How Xamarin.Android AOT Works</a></li>
<li><a href="https://blogs.unity3d.com/2015/05/06/an-introduction-to-ilcpp-internals/">An introduction to IL2CPP internals</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2014/04/28/net-native-performance-internals/">.NET Native Performance and Internals</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2018/02/08/dynamic-tracing-of-net-core-methods/">Dynamic Tracing of .NET Core Methods</a></li>
<li><a href="http://www.mono-project.com/docs/advanced/runtime/docs/gsharedvt/">Generic sharing for valuetypes</a> (Mono)</li>
<li><a href="http://www.ntcore.com/Files/netint_native.htm">.NET Internals and Native Compiling</a></li>
</ul>
Taking a look at the ECMA-335 Standard for .NET2018-04-06T00:00:00+00:00http://www.mattwarren.org/2018/04/06/Taking-a-look-at-the-ECMA-335-Standard-for-.NET
<p>It turns out that the .NET Runtime has a <em>technical standard</em> (or <em>specification</em>), known by its full name <strong>ECMA-335 - Common Language Infrastructure (CLI)</strong> (not to be confused with <a href="https://www.ecma-international.org/publications/standards/Ecma-334.htm">ECMA-334</a> which is the <em>‘C# Language Specification’</em>). The latest update is the <a href="https://www.ecma-international.org/publications/standards/Ecma-335.htm">6th edition from June 2012</a>.</p>
<p>The specification or standard was written before <a href="https://www.microsoft.com/net/learn/get-started/windows">.NET Core</a> existed, so it only applies to the <a href="https://www.microsoft.com/net/download/dotnet-framework-runtime">.NET Framework</a>. I’d be interested to know if there are any plans for an updated version.</p>
<hr />
<p>The rest of this post will take a look at the standard, exploring the contents and investigating what we can learn from it (hint: lots of <em>low-level details</em> and information about .NET <em>internals</em>).</p>
<hr />
<h2 id="why-is-it-useful">Why is it useful?</h2>
<p>Having a standard means that different implementations, such as <a href="http://www.mono-project.com/">Mono</a> and <a href="/2017/10/19/DotNetAnywhere-an-Alternative-.NET-Runtime/">DotNetAnywhere</a>, can exist. From the documentation on the <a href="https://docs.microsoft.com/en-us/dotnet/standard/clr">Common Language Runtime (CLR)</a>:</p>
<blockquote>
<p>Compilers and tools are able to produce output that the common language runtime can consume because the type system, the format of metadata, and the runtime environment (the virtual execution system) <strong>are all defined by a public standard</strong>, the ECMA Common Language Infrastructure specification. For more information, see <a href="https://www.visualstudio.com/license-terms/ecma-c-common-language-infrastructure-standards/">ECMA C# and Common Language Infrastructure Specifications</a>.</p>
</blockquote>
<p>and from the CoreCLR documentation on <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/dotnet-standards.md">.NET Standards</a>:</p>
<blockquote>
<p>There was a very early realization by the founders of .NET that they were creating a new programming technology that had broad applicability across operating systems and CPU types and that advanced the state of the art of late 1990s (when the .NET project started at Microsoft) programming language implementation techniques. This led to considering and then pursuing standardization as an important pillar of establishing .NET in the industry.</p>
<p>The key addition to the state of the art was support for multiple programming languages with a single language runtime, hence the name <em>Common Language Runtime</em>. There were many other smaller additions, such as value types, a simple exception model and attributes. Generics and language integrated query were later added to that list.</p>
<p><strong>Looking back, standardization was quite effective, leading to .NET having a strong presence on iOS and Android, with the Unity and Xamarin offerings, both of which use the Mono runtime. The same may end up being true for .NET on Linux.</strong></p>
<p>The various .NET standards have been made meaningful by the collaboration of multiple companies and industry experts that have served on the working groups that have defined the standards. In addition (and most importantly), the .NET standards have been implemented by multiple commercial (ex: Unity IL2CPP, .NET Native) and open source (ex: Mono) implementors. The presence of multiple implementations proves the point of standardization.</p>
</blockquote>
<p>As the last quote points out, the standard is not produced <em>solely</em> by Microsoft:</p>
<p><img src="/images/2018/04/Companies and Organizations that Participated.png" alt="Companies and Organizations that Participated" /></p>
<p>There is also a nice <a href="https://en.wikipedia.org/wiki/Common_Language_Infrastructure">Wikipedia page</a> that has some additional information.</p>
<hr />
<h2 id="what-is-in-it">What is in it?</h2>
<p>At a high level, the specification is divided into the following ‘partitions’:</p>
<ul>
<li><strong>I: Concepts and Architecture</strong>
<ul>
<li>A great introduction to the CLR itself, explaining many of the key concepts and components, as well as the rationale behind them</li>
</ul>
</li>
<li><strong>II: Metadata Definition and Semantics</strong>
<ul>
<li>An explanation of the format of .NET dll/exe files, the different sections within them and how they’re laid out in-memory</li>
</ul>
</li>
<li><strong>III: CIL Instruction Set</strong>
<ul>
<li>A complete list of all the <em>Intermediate Language (IL)</em> instructions that the CLR understands, along with a detailed description of what they do and how to use them</li>
</ul>
</li>
<li><strong>IV: Profiles and Libraries</strong>
<ul>
<li>Describes the various different ‘Base Class libraries’ that make-up the runtime and how they are grouped into ‘Profiles’</li>
</ul>
</li>
<li><strong>V: Binary Formats (Debug Interchange Format)</strong>
<ul>
<li>An overview of ‘Portable CILDB files’, which give a way for additional <em>debugging information</em> to be provided</li>
</ul>
</li>
<li><strong>VI: Annexes</strong>
<ul>
<li>Annex A - Introduction</li>
<li>Annex B - Sample programs</li>
<li>Annex C - CIL assembler implementation</li>
<li>Annex D - Class library design guidelines</li>
<li>Annex E - Portability considerations</li>
<li>Annex F - Imprecise faults</li>
<li>Annex G - Parallel library</li>
</ul>
</li>
</ul>
<p>However, working your way through the entire specification is a mammoth task, so I generally find it useful to search for a particular word or phrase and locate the parts I need that way. If you do want to read through one section, I recommend ‘Partition I: Concepts and Architecture’; at just over 100 pages it is much easier to fully digest! This section is a <a href="/images/2018/04/Partition I - Concepts and Architecture - Outline.png">very comprehensive overview</a> of the key concepts and components contained within the CLR and is well worth a read.</p>
<p>Also, I’m convinced that the authors of the spec wanted to <em>help out</em> any future readers, so to break things up they included lots of very helpful diagrams:</p>
<p><img src="/images/2018/04/Figure 1 - Type System.png" alt="Type System.png" /></p>
<p>For more examples see:</p>
<ul>
<li><a href="/images/2018/04/Arrays - Multi-dimensional v Jagged.png">Arrays - Multi-dimensional v Jagged</a></li>
<li><a href="/images/2018/04/Figure 1 - Relationship between correct and verifiable CIL.png">Relationship between correct and verifiable CIL</a></li>
<li><a href="/images/2018/04/High-level view of the CLI file format.png">High-level view of the CLI file format</a></li>
<li><a href="/images/2018/04/Layout information for a class or value type.png">Layout information for a class or value type</a></li>
<li><a href="/images/2018/04/Relationship between boxed and unboxed representations of a value type.png">Relationship between boxed and unboxed representations of a value type</a></li>
<li><a href="/images/2018/04/Roots of the inheritance hierarchies.png">Roots of the inheritance hierarchies</a></li>
</ul>
<p>On top of all that, they also dropped in some <a href="https://designforhackers.com/blog/comic-sans-hate/">Comic Sans</a> 😀, just to make it clear when the text is only ‘<em>informative</em>’:</p>
<p><img src="/images/2018/04/Informative Text.png" alt="Informative Text" /></p>
<hr />
<h2 id="how-has-it-changed">How has it changed?</h2>
<p>The spec has been through <a href="https://www.ecma-international.org/publications/standards/Ecma-335-arch.htm">6 editions</a> and it’s interesting to look at the changes over time:</p>
<table>
<thead>
<tr>
<th>Edition</th>
<th>Release Date</th>
<th>CLR Version</th>
<th>Significant Changes</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>1st</strong></td>
<td>December 2001</td>
<td><strong>1.0</strong> (February 2002)</td>
<td>N/A</td>
</tr>
<tr>
<td><strong>2nd</strong></td>
<td>December 2002</td>
<td><strong>1.1</strong> (April 2003)</td>
<td> </td>
</tr>
<tr>
<td><strong>3rd</strong></td>
<td>June 2005</td>
<td><strong>2.0</strong> (January 2006)</td>
<td>See below <a href="/data/2018/04/ECMA-335 - 3rd edition - Changes.pdf">(link)</a></td>
</tr>
<tr>
<td><strong>4th</strong></td>
<td>June 2006</td>
<td> </td>
<td>None, revision of 3rd edition <a href="/data/2018/04/ECMA-335 - 4th edition - Changes.pdf">(link)</a></td>
</tr>
<tr>
<td><strong>5th</strong></td>
<td>December 2010</td>
<td><strong>4.0</strong> (April 2010)</td>
<td>See below <a href="/data/2018/04/ECMA-335 - 5th edition - Changes.pdf">(link)</a></td>
</tr>
<tr>
<td><strong>6th</strong></td>
<td>June 2012</td>
<td> </td>
<td>None, revision of 5th edition <a href="/data/2018/04/ECMA-335 - 6th edition - Changes.pdf">(link)</a></td>
</tr>
</tbody>
</table>
<p>However, only 2 editions contained <strong>significant</strong> updates; they are explained in more detail below:</p>
<h3 id="3rd-edition-link">3rd Edition <a href="/data/2018/04/ECMA-335 - 3rd edition - Changes.pdf">(link)</a></h3>
<ul>
<li>Support for <em>generic</em> types and methods (see <a href="/2018/03/02/How-generics-were-added-to-.NET/">‘How generics were added to .NET’</a>)</li>
<li>New IL instructions - <code class="language-plaintext highlighter-rouge">ldelem</code>, <code class="language-plaintext highlighter-rouge">stelem</code> and <code class="language-plaintext highlighter-rouge">unbox.any</code></li>
<li>Added the <code class="language-plaintext highlighter-rouge">constrained.</code>, <code class="language-plaintext highlighter-rouge">no.</code> and <code class="language-plaintext highlighter-rouge">readonly.</code> IL instruction prefixes</li>
<li>Brand new ‘namespaces’ (with corresponding types) - <code class="language-plaintext highlighter-rouge">System.Collections.Generic</code>, <code class="language-plaintext highlighter-rouge">System.Threading.Parallel</code></li>
<li>New types added, including <code class="language-plaintext highlighter-rouge">Action&lt;T&gt;</code>, <code class="language-plaintext highlighter-rouge">Nullable&lt;T&gt;</code> and <code class="language-plaintext highlighter-rouge">ThreadStaticAttribute</code></li>
</ul>
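<p>To make the new IL instructions a bit more concrete, here is a small sketch of generic C# code that a C# compiler will typically turn into them. The <code class="language-plaintext highlighter-rouge">GenericIl</code> class and its method names are made up for this illustration, and the IL noted in the comments is what you would normally expect to see in a disassembler rather than something the standard guarantees for a given compiler.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class GenericIl
{
    // Reading an element of a T[] is typically compiled to 'ldelem !!T'.
    public static T ElementAt&lt;T&gt;(T[] items, int index) =&gt; items[index];

    // Casting an object to an unconstrained T is typically compiled to 'unbox.any !!T'.
    public static T Unwrap&lt;T&gt;(object boxed) =&gt; (T)boxed;

    // Calling a virtual method on a T is typically compiled to
    // the 'constrained. !!T' prefix followed by 'callvirt'.
    public static string Describe&lt;T&gt;(T value) =&gt; value.ToString();

    public static void Main()
    {
        Console.WriteLine(ElementAt(new[] { 10, 20, 30 }, 1)); // prints 20
        Console.WriteLine(Unwrap&lt;int&gt;(42));                     // prints 42
        Console.WriteLine(Describe(3));                         // prints 3
    }
}
</code></pre></div></div>
<p>In each case the instruction lets one piece of IL work for both value types and reference types, which is what generic code needs.</p>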
<h3 id="5th-edition-link">5th Edition <a href="/data/2018/04/ECMA-335 - 5th edition - Changes.pdf">(link)</a></h3>
<ul>
<li><a href="https://docs.microsoft.com/en-us/dotnet/framework/app-domains/type-forwarding-in-the-common-language-runtime">Type-forwarding</a> added</li>
<li>Semantics of <a href="https://blogs.msdn.microsoft.com/ericlippert/2009/12/03/exact-rules-for-variance-validity/">‘variance’</a> redefined, became a core feature</li>
<li>Multiple types added or updated, including <code class="language-plaintext highlighter-rouge">System.Action</code>, <code class="language-plaintext highlighter-rouge">System.MulticastDelegate</code> and <code class="language-plaintext highlighter-rouge">System.WeakReference</code></li>
<li><code class="language-plaintext highlighter-rouge">System.Math</code> and <code class="language-plaintext highlighter-rouge">System.Double</code> modified to better conform to IEEE</li>
</ul>
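<p>For anyone unfamiliar with ‘variance’, this is roughly how it surfaces in C#. The sketch below uses the standard <code class="language-plaintext highlighter-rouge">IEnumerable&lt;out T&gt;</code> and <code class="language-plaintext highlighter-rouge">Action&lt;in T&gt;</code> types; the <code class="language-plaintext highlighter-rouge">VarianceDemo</code> class is made up for the example, and it shows the language-level view rather than the rules as written in the spec.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

public static class VarianceDemo
{
    public static int CountAll(IEnumerable&lt;object&gt; items)
    {
        int count = 0;
        foreach (object item in items) count++;
        return count;
    }

    public static void Main()
    {
        // Covariance: IEnumerable&lt;out T&gt; allows a sequence of strings
        // to be passed where a sequence of objects is expected.
        IEnumerable&lt;string&gt; names = new List&lt;string&gt; { "CoreCLR", "CoreRT", "Mono" };
        Console.WriteLine(CountAll(names));

        // Contravariance: Action&lt;in T&gt; allows a handler that accepts any object
        // to be used where a handler for strings is expected.
        Action&lt;object&gt; printAny = o =&gt; Console.WriteLine(o);
        Action&lt;string&gt; printString = printAny;
        printString("ECMA-335");
    }
}
</code></pre></div></div>
<p>Both conversions are checked at compile-time and need no casts, because the runtime understands the variance annotations on the generic type parameters.</p>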
<hr />
<h2 id="microsoft-specific-implementation">Microsoft Specific Implementation</h2>
<p>Another interesting aspect to look at is the Microsoft-specific implementation details and notes. The following links are to PDF documents that are modified versions of the 4th edition:</p>
<ul>
<li><a href="http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20I.pdf">Partition I: Concepts and Architecture</a></li>
<li><a href="http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20II.pdf">Partition II: Meta Data Definition and Semantics</a></li>
<li><a href="http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20III.pdf">Partition III: CIL Instruction Set</a></li>
<li><a href="http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20IV.pdf">Partition IV: Profiles and Libraries</a></li>
<li><a href="http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20V.pdf">Partition V: Debug Interchange Format</a></li>
<li><a href="http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20VI.pdf">Partition VI: Annexes</a></li>
</ul>
<p>They all contain multiple occurrences of text like this ‘<em>Implementation Specific (Microsoft)</em>’:</p>
<p><a href="/images/2018/04/Microsoft Specific Implementation Notes - Partition I.png"><img src="/images/2018/04/Microsoft Specific Implementation Notes - Partition I.png" alt="Microsoft Specific Implementation Notes - Partition I" /></a></p>
<hr />
<h2 id="more-information">More Information</h2>
<p>Finally, if you want to find out more there’s a book available (affiliate link):</p>
<p><a href="https://www.amazon.co.uk/Common-Language-Infrastructure-Annotated-Standard/dp/0321154932/ref=as_li_ss_il?_encoding=UTF8&pd_rd_i=0321154932&pd_rd_r=B9W686JZFFZHB6G358Y5&pd_rd_w=0luDi&pd_rd_wg=IG2lU&psc=1&refRID=B9W686JZFFZHB6G358Y5&linkCode=li3&tag=mattonsoft-21&linkId=c99e84073532318dbca0d07dc9fcb19b" target="_blank"><img border="0" src="//ws-eu.amazon-adsystem.com/widgets/q?_encoding=UTF8&ASIN=0321154932&Format=_SL250_&ID=AsinImage&MarketPlace=GB&ServiceVersion=20070822&WS=1&tag=mattonsoft-21" /></a><img src="https://ir-uk.amazon-adsystem.com/e/ir?t=mattonsoft-21&l=li3&o=2&a=0321154932" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p>
Exploring the internals of the .NET Runtime2018-03-23T00:00:00+00:00http://www.mattwarren.org/2018/03/23/Exploring-the-internals-of-the-.NET-Runtime
<p>I recently appeared on <a href="http://herdingcode.com/herding-code-228-matt-warren-on-net-internals-and-open-source-contributions/">Herding Code</a> and <a href="https://stackify.com/developer-things-5-benchmarkdotnet/">Stackify ‘Developer Things’</a> podcasts and in both cases, the first question asked was ‘<strong><em>how do you figure out the internals of the .NET runtime</em></strong>’?</p>
<p>This post is an attempt to articulate that process, in the hope that it might be useful to others.</p>
<hr />
<p>Here are my suggested steps:</p>
<ol>
<li><a href="#decide">Decide what you want to investigate</a></li>
<li><a href="#double-check">See if someone else has already figured it out</a> (optional)</li>
<li><a href="#botr">Read the ‘Book of the Runtime’</a></li>
<li><a href="#build-from-source">Build from the source</a></li>
<li><a href="#debugging">Debugging</a></li>
<li><a href="#verify-net-framework">Verify against .NET Framework</a> (optional)</li>
</ol>
<p><strong>Note</strong>: As with all these types of lists, just because it worked for me <em>doesn’t</em> mean that it will for everyone. So, ‘<em>your mileage may vary</em>’.</p>
<hr />
<p><span id="decide"></span></p>
<h2 id="step-one---decide-what-you-want-to-investigate">Step One - Decide what you want to investigate</h2>
<p>For me, this means working out <strong>what question I’m trying to answer</strong>, for example here are some previous posts I’ve written:</p>
<ul>
<li><a href="/2017/01/25/How-do-.NET-delegates-work/">How do .NET delegates work?</a></li>
<li><a href="/2016/12/14/Why-is-Reflection-slow/">Why is reflection slow?</a></li>
<li><a href="/2016/10/26/How-does-the-fixed-keyword-work/">How does the ‘fixed’ keyword work?</a></li>
</ul>
<p>(it just goes to show, you don’t always need fancy titles!)</p>
<p>I put this as ‘Step 1’ because digging into .NET internals isn’t quick or easy work. Some of my posts take weeks to research, so I need a motivation to keep me going, something to focus on. In addition, the CLR isn’t a small runtime; there’s <em>a lot</em> in there, so blindly trying to find your way around it isn’t easy! That’s why having a specific focus helps: looking at one feature or section at a time is more manageable.</p>
<p>The very first post where I followed this approach was <a href="/2016/05/31/Strings-and-the-CLR-a-Special-Relationship/">Strings and the CLR - a Special Relationship</a>. I’d previously spent some time looking at the <a href="https://github.com/dotnet/coreclr">CoreCLR source</a> and I knew a bit about how <code class="language-plaintext highlighter-rouge">Strings</code> in the CLR worked, but not all the details. During the research of that post I then found more and more areas of the CLR that I didn’t understand and the rest of my blog grew from there (<a href="/2017/01/25/How-do-.NET-delegates-work/">delegates</a>, <a href="/2017/05/08/Arrays-and-the-CLR-a-Very-Special-Relationship/">arrays</a>, <a href="/2016/10/26/How-does-the-fixed-keyword-work/">fixed keyword</a>, <a href="/2017/06/15/How-the-.NET-Rutime-loads-a-Type/">type loader</a>, etc).</p>
<p><strong>Aside:</strong> I think this is generally applicable, if you want to start blogging, but you don’t think you have enough ideas to sustain it, I’d recommend that you <strong>start somewhere and other ideas will follow</strong>.</p>
<p>Another tip is to look at <a href="https://news.ycombinator.com/">HackerNews</a> or <a href="https://www.reddit.com/r/programming/">/r/programming</a> for posts about the ‘<em>internals</em>’ of other runtimes, e.g. Java, Ruby, Python, Go etc, then write the equivalent post about the CLR. One of my most popular posts <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/">A Hitchhikers Guide to the CoreCLR Source Code</a> was clearly influenced by <a href="https://hn.algolia.com/?query=hitchhikers%20guide%20to&sort=byPopularity&prefix=false&page=0&dateRange=all&type=story">equivalent articles</a>!</p>
<p>Finally, for more help with learning, ‘<em>figuring things out</em>’ and explaining them to others, I recommend that you read anything by <a href="https://twitter.com/b0rk">Julia Evans</a>. Start with <a href="https://jvns.ca/blog/2017/03/20/blogging-principles/">Blogging principles I use</a> and <a href="https://jvns.ca/blog/so-you-want-to-be-a-wizard/">So you want to be a wizard</a> (also available <a href="https://twitter.com/b0rk/status/941901614796943361?lang=en">as a zine</a>), then work your way through <a href="https://jvns.ca/">all the other posts related to blogging or writing</a>.</p>
<p><strong>I’ve been hugely influenced, for the better, by Julia’s approach to blogging</strong>.</p>
<script async="" class="speakerdeck-embed" data-slide="7" data-id="b32f2c13a1644e898379ac77e6ae73fb" data-ratio="1.49926793557833" src="//speakerdeck.com/assets/embed.js"></script>
<p><span id="double-check"></span></p>
<h2 id="step-two---see-if-someone-else-has-already-figured-it-out-optional">Step Two - See if someone else has already figured it out (optional)</h2>
<p>I put this in as ‘optional’, because it depends on your motivation. If you are trying to understand .NET internals for <strong>your own education</strong>, then feel free to write about whatever you want. If you are trying to do it to <strong>also help others</strong>, I’d recommend that you first see what’s already been written about the subject. If, once you’ve done that, you still think there is something <strong>new or different that you can add</strong>, then go ahead, but I try not to just re-hash what is already out there.</p>
<p>To see what’s already been written, you can start with <a href="/2018/01/22/Resources-for-Learning-about-.NET-Internals/">Resources for Learning about .NET Internals</a> or peruse the <a href="/tags/#Internals">‘Internals’ tag on this blog</a>. Another really great resource is all the <a href="https://stackoverflow.com/users/17034/hans-passant?tab=answers">answers by Hans Passant</a> on StackOverflow. He is prolific and amazingly knowledgeable; here are some examples to get you started:</p>
<ul>
<li><a href="https://stackoverflow.com/questions/8870442/how-is-math-pow-implemented-in-net-framework/8870593#8870593">How is Math.Pow() implemented in .NET Framework?</a></li>
<li><a href="https://stackoverflow.com/questions/17130382/understanding-garbage-collection-in-net/17131389#17131389">Understanding garbage collection in .NET</a></li>
<li><a href="https://stackoverflow.com/questions/4043821/performance-differences-between-debug-and-release-builds/4045073#4045073">Performance differences between debug and release builds</a></li>
<li><a href="https://stackoverflow.com/questions/2056948/net-jit-potential-error/2057228#2057228">.NET JIT potential error?</a></li>
<li><a href="https://stackoverflow.com/questions/8951836/why-large-object-heap-and-why-do-we-care/8953503#8953503">Why Large Object Heap and why do we care?</a></li>
<li><a href="https://stackoverflow.com/questions/1583050/performance-surprise-with-as-and-nullable-types/3076525#3076525">Performance surprise with “as” and nullable types</a></li>
<li><a href="https://stackoverflow.com/questions/28514373/what-is-the-size-of-a-boolean-in-c-does-it-really-take-4-bytes/28515361#28515361">What is the size of a boolean In C#? Does it really take 4-bytes?</a></li>
</ul>
<p><span id="botr"></span></p>
<h2 id="step-three---read-the-book-of-the-runtime">Step Three - Read the ‘Book of the Runtime’</h2>
<p>You won’t get far in investigating .NET internals without coming across the <a href="https://github.com/dotnet/coreclr/tree/master/Documentation/botr">‘Book of the Runtime’ (BOTR)</a> which is an invaluable resource, even <a href="https://www.hanselman.com/blog/TheBookOfTheRuntimeTheInternalsOfTheNETRuntimeThatYouWontFindInTheDocumentation.aspx">Scott Hanselman agrees</a>!</p>
<p>It was written by the .NET engineering team, for the .NET engineering team, as per <a href="https://news.ycombinator.com/item?id=15358571">this HackerNews comment</a>:</p>
<blockquote>
<p>Having worked for 7 years on the .NET runtime team, I can attest that the BOTR is <strong>the official reference</strong>. It was created as documentation for the engineering team, by the engineering team. And it was (supposed to be) kept up to date any time a new feature was added or changed.</p>
</blockquote>
<p>However, just a word of warning, this means that it’s an in-depth, non-trivial document and hard to understand when you are first learning about a particular topic. Several of my blog posts have consisted of the following steps:</p>
<ol>
<li>Read the BOTR chapter on ‘Topic X’</li>
<li>Understand about 5% of what I read</li>
<li>Go away and learn more (read the source code, read other resources, etc)</li>
<li>GOTO ‘Step 1’, understanding more this time!</li>
</ol>
<p>Related to this, the source code itself is often as helpful as the BOTR due to the extensive comments, for example <a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/inc/corinfo.h#L1426-L1514">this one describing the rules for prestubs</a> really helped me out. The downside of the source code comments is that they are a bit harder to find, whereas the BOTR is all in one place.</p>
<p><span id="build-from-source"></span></p>
<h2 id="step-four---build-from-the-source">Step Four - Build from the source</h2>
<p>However, at some point, just reading about the internals of the CLR isn’t enough, you actually need to ‘<em>get your hands dirty</em>’ and see it in action. Now that the Core CLR is open source it’s very easy to <a href="https://github.com/dotnet/coreclr#building-the-repository">build it yourself</a>. Once you’ve done that, there are <a href="https://github.com/dotnet/coreclr/tree/master/Documentation/building">even more docs to help you out</a> if you are building on different OSes, want to debug, test CoreCLR in conjunction with CoreFX, etc.</p>
<p><strong>But why is building from source useful?</strong></p>
<p>Because it lets you build a Debug/Diagnostic version of the runtime that gives you lots of additional information that isn’t available in the Release/Retail builds. For instance you can <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md#setting-configuration-variables">view JIT Dumps</a> using <code class="language-plaintext highlighter-rouge">COMPlus_JitDump=...</code>. However, this is just one of many <code class="language-plaintext highlighter-rouge">COMPlus_XXX</code> settings you can use; there are <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md">hundreds available</a>.</p>
<p>However, even more useful is the ability to turn on diagnostic logging for a particular area of the CLR. For instance, let’s imagine that we want to find out more about <code class="language-plaintext highlighter-rouge">AppDomains</code> and how they work under-the-hood; we can use the following <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md#log-configuration-knobs">logging configuration settings</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SET COMPLUS_LogEnable=1
SET COMPLUS_LogToFile=1
SET COMPLUS_LogFacility=02000000
SET COMPLUS_LogLevel=A
</code></pre></div></div>
<p>Here, <code class="language-plaintext highlighter-rouge">LogFacility</code> is set to <code class="language-plaintext highlighter-rouge">LF_APPDOMAIN</code>. There are many other values you can provide as a hex bit-mask; the full list is available <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/loglf.h">in the source code</a>. If you set these variables and then run an app, you will get a log output <a href="/data/2017/02/COMPLUS-AppDomain.log">like this one</a>. Once you have this log you can very easily search around in the code to find where the messages came from, for instance here are all the places that <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=LF_APPDOMAIN&type="><code class="language-plaintext highlighter-rouge">LF_APPDOMAIN</code> is logged</a>. This is a great technique to find your way into a section of the CLR that you aren’t familiar with; I’ve used it many times to great effect.</p>
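<p>On Linux or macOS the same settings can be applied with <code class="language-plaintext highlighter-rouge">export</code>. The following is a minimal sketch, assuming a Bash shell and a locally built Debug runtime (as noted above, this logging isn’t available in Release builds). <code class="language-plaintext highlighter-rouge">LF_APPDOMAIN</code> is the value used above, whereas <code class="language-plaintext highlighter-rouge">LF_EXAMPLE</code> is a hypothetical placeholder bit, included only to show how several facilities are OR-ed into one hex mask, so look up the real values in <code class="language-plaintext highlighter-rouge">loglf.h</code> before relying on it.</p>

```shell
#!/usr/bin/env bash
# Facility bit taken from the settings above: LF_APPDOMAIN = 0x02000000.
LF_APPDOMAIN=$((0x02000000))
# Hypothetical placeholder bit, purely to illustrate combining facilities.
# Check loglf.h for the real facility values before using this.
LF_EXAMPLE=$((0x00000008))

# Combine the facility bits and format the result as 8 upper-case hex digits.
mask=$(( LF_APPDOMAIN | LF_EXAMPLE ))
log_facility=$(printf '%08X' "$mask")

export COMPlus_LogEnable=1
export COMPlus_LogToFile=1
export COMPlus_LogFacility="$log_facility"
export COMPlus_LogLevel=A

echo "COMPlus_LogFacility=$COMPlus_LogFacility"
```

<p>Any app started from the same shell session inherits these variables. Note that environment variable names are case-sensitive on Unix-like systems, so the sketch uses the <code class="language-plaintext highlighter-rouge">COMPlus_</code> casing.</p>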
<p><span id="debugging"></span></p>
<h2 id="step-five---debugging">Step Five - Debugging</h2>
<p>For me, the biggest boon of <a href="/2017/12/19/Open-Source-.Net-3-years-later">Microsoft open sourcing .NET</a> is that you can discover so much more about the internals <strong>without</strong> having to resort to ‘old school’ <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/getting-started-with-windbg">debugging using WinDBG</a>. But there still comes a time when it’s useful to step through the code line-by-line to see what’s going on. The added advantage of having the source code is that you can build a copy locally and then debug through that <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md">using Visual Studio</a> which is slightly easier than WinDBG.</p>
<p>I always leave debugging to last, as it can be time-consuming and I only find it helpful when I already know where to set a breakpoint, i.e. I already know which part of the code I want to step through. I once tried to blindly step through the source of the CLR <a href="/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/">whilst it was starting up</a> and it was very hard to see what was going on. As I’ve said before, the CLR is a complex runtime with many things happening at once, so stepping through lots of code line-by-line can get tricky.</p>
<p><span id="verify-net-framework"></span></p>
<h2 id="step-six---verify-against-net-framework">Step Six - Verify against .NET Framework</h2>
<p>I put this final step in because the .NET CLR source <a href="https://github.com/dotnet/coreclr">available on GitHub</a> is the ‘.NET Core’ version of the runtime, which isn’t the same as the full/desktop .NET Framework that’s been around for years. So you may need to verify the behaviour matches, if you want to understand the internals ‘<em>as they were</em>’, not just ‘<em>as they will be</em>’ going forward. For instance .NET Core has removed the ability to <a href="https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/porting.md#app-domains">create App Domains</a> as a way to provide isolation but interestingly enough the <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/appdomain.cpp">internal class lives on</a>!</p>
<p>To verify the behaviour, your main option is to <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/getting-started-with-windbg">debug the CLR using WinDBG</a>. Beyond that, you can resort to looking at the <a href="https://msdn.microsoft.com/en-us/library/cc749640.aspx">‘Rotor’ source code</a> (roughly the same as .NET Framework 2.0), or petition Microsoft to release the .NET Framework Source Code (probably not going to happen)!</p>
<p>However, low-level internals don’t change all that often, so more often than not the way things behave in the CoreCLR is the same as they’ve always worked.</p>
<hr />
<h1 id="resources">Resources</h1>
<p>Finally, for your viewing pleasure, here are a few talks related to ‘<em>.NET Internals</em>’:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=iQRVJHab4MM">.NET Unboxed 2015 - Geoff Norton - Open Source Hacking the CoreCLR</a></li>
<li><a href="https://www.youtube.com/watch?v=JNmUz7C1usM">.NET Core on Unix - Jan Vorlicek</a></li>
<li><a href="https://channel9.msdn.com/Blogs/dotnet/NET-Foundations-2015-03-04">.NET Internals 2015-03-04: .NET Core & Cross Platform</a></li>
<li><a href="https://channel9.msdn.com/Blogs/dotnet/NET-Foundations-2015-02-25">.NET Internals 2015-02-25: Open Source</a></li>
</ul>
<hr />
<p>Discuss this post on <a href="https://www.reddit.com/r/programming/comments/86opzw/exploring_the_internals_of_the_net_runtime/">/r/programming</a> or <a href="https://www.reddit.com/r/dotnet/comments/86opun/exploring_the_internals_of_the_net_runtime/">/r/dotnet</a></p>
How generics were added to .NET2018-03-02T00:00:00+00:00http://www.mattwarren.org/2018/03/02/How-generics-were-added-to-.NET
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=16525244">HackerNews</a> and <a href="https://www.reddit.com/r/programming/comments/81ih8t/how_generics_were_added_to_net/">/r/programming</a></p>
<hr />
<p>Before we dive into the technical details, let’s start with a quick history lesson, courtesy of <a href="https://www.microsoft.com/en-us/research/people/dsyme/">Don Syme</a> who worked on adding generics to .NET and then went on to <a href="http://fsharp.org">design and implement F#</a>, which is a pretty impressive set of achievements!!</p>
<h2 id="background-and-history">Background and History</h2>
<ul>
<li><strong>1999</strong> Initial research, design and planning
<ul>
<li><a href="https://blogs.msdn.microsoft.com/dsyme/2011/03/15/netc-generics-history-some-photos-from-feb-1999/">.NET/C# Generics History: Some Photos From Feb 1999</a></li>
</ul>
</li>
<li><strong>1999</strong> First ‘white paper’ published
<ul>
<li><a href="https://blogs.msdn.microsoft.com/dsyme/2012/07/05/more-c-net-generics-research-project-history-the-msr-white-paper-from-mid-1999/">More C#/.NET Generics Research Project History – The MSR white paper</a></li>
<li><a href="https://msdnshared.blob.core.windows.net/media/MSDNBlogsFS/prod.evol.blogs.msdn.com/CommunityServer.Components.PostAttachments/00/10/32/72/38/Ext-VOS.pdf">MSR White Paper: Proposed Extensions to COM+ VOS (Draft)</a> (<strong>pdf</strong>)</li>
</ul>
</li>
<li><strong>2001</strong> C# Language Design Specification created
<ul>
<li><a href="https://blogs.msdn.microsoft.com/dsyme/2012/06/19/some-history-2001-gc-research-project-draft-from-the-msr-cambridge-team/">Some History: 2001 “GC#” (Generic C#) research project draft</a></li>
<li><a href="https://msdnshared.blob.core.windows.net/media/MSDNBlogsFS/prod.evol.blogs.msdn.com/CommunityServer.Components.PostAttachments/00/10/32/17/02/GCSharp-new-v16-12-Dec-2001-redist.pdf">MSR - .NET Generics Research Project - Generic C# Specification</a> (<strong>pdf</strong>)</li>
</ul>
</li>
<li><strong>2001</strong> Research paper published
<ul>
<li><a href="https://www.microsoft.com/en-us/research/publication/design-and-implementation-of-generics-for-the-net-common-language-runtime/">Design and Implementation of Generics for the .NET CLR</a> (<strong>pdf</strong>)</li>
</ul>
</li>
<li><strong>2004</strong> Work completed and all bugs fixed
<ul>
<li><a href="https://blogs.msdn.microsoft.com/dsyme/2012/06/26/some-more-netc-generics-research-project-history/">Some more .NET/C# Generics Research Project History</a></li>
</ul>
</li>
</ul>
<p><strong>Update:</strong> Don Syme, <a href="https://twitter.com/dsyme/status/969928172597858305">pointed out</a> another research paper related to .NET generics, <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/space2004generics.pdf">Combining Generics, Precompilation and Sharing Between Software Based Processes</a> (<strong>pdf</strong>)</p>
<p>To give you an idea of how these events fit into the bigger picture, here are the dates of <a href="https://en.wikipedia.org/wiki/.NET_Framework">.NET Framework Releases</a>, up to 2.0 which was the first version to have generics:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Version number</th>
<th style="text-align: center">CLR version</th>
<th style="text-align: center">Release date</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">1.0</td>
<td style="text-align: center">1.0</td>
<td style="text-align: center">2002-02-13</td>
</tr>
<tr>
<td style="text-align: center">1.1</td>
<td style="text-align: center">1.1</td>
<td style="text-align: center">2003-04-24</td>
</tr>
<tr>
<td style="text-align: center"><strong>2.0</strong></td>
<td style="text-align: center"><strong>2.0</strong></td>
<td style="text-align: center"><strong>2005-11-07</strong></td>
</tr>
</tbody>
</table>
<p>Aside from the historical perspective, what I find most fascinating is just how much the addition of generics in .NET was due to the work done by Microsoft Research, from <a href="https://blogs.msdn.microsoft.com/dsyme/2011/03/15/netc-generics-history-some-photos-from-feb-1999/">.NET/C# Generics History</a>:</p>
<blockquote>
<p>It was only through the total dedication of Microsoft Research, Cambridge during 1998-2004, to doing <strong>a complete, high quality implementation in both the CLR (including NGEN, debugging, JIT, AppDomains, concurrent loading and many other aspects), and the C# compiler</strong>, that the project proceeded.</p>
</blockquote>
<p>He then goes on to say:</p>
<blockquote>
<p>What would the cost of inaction have been? What would the cost of failure have been? <strong>No generics in C# 2.0? No LINQ in C# 3.0? No TPL in C# 4.0? No Async in C# 5.0? No F#?</strong> Ultimately, an erasure model of generics would have been adopted, as for Java, since the CLR team would never have pursued a in-the-VM generics design without external help.</p>
</blockquote>
<p>Wow, C# and .NET would look <strong>very</strong> different without all these features!!</p>
<h3 id="the-gyro-project---generics-for-rotor">The ‘Gyro’ Project - Generics for Rotor</h3>
<p>Unfortunately there doesn’t exist a publicly accessible version of the .NET 1.0 and 2.0 source code, so we can’t go back and look at the changes that were made (if I’m wrong, please let me know as I’d love to read it).</p>
<p>However, we do have the next best thing, the <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52517">‘Gyro’ project</a> in which the equivalent changes were made to the <a href="https://en.wikipedia.org/wiki/Shared_Source_Common_Language_Infrastructure">‘Shared Source Common Language Infrastructure’</a> (SSCLI) code base (a.k.a ‘Rotor’). As an aside, if you want to learn more about the Rotor code base I really recommend the excellent book by Ted Neward, which you can <a href="http://blogs.tedneward.com/post/revisiting-rotor/">download from his blog</a>.</p>
<p>Gyro 1.0 was <a href="http://www.servergeek.com/blogs/mickey/archive/2003_04_27_blog_arc.htm">released in 2003</a> which implies that it was created <em>after</em> the work had been done in the <em>real</em> .NET Framework source code. I assume that Microsoft Research wanted to publish the ‘Rotor’ implementation so it could be studied more widely. Gyro is also referenced in one of Don Syme’s posts, from <a href="https://blogs.msdn.microsoft.com/dsyme/2012/06/19/some-history-2001-gc-research-project-draft-from-the-msr-cambridge-team/">Some History: 2001 “GC#” research project draft, from the MSR Cambridge team</a>:</p>
<blockquote>
<p>With Dave Berry’s help we later published a version of the corresponding code as the “Gyro” variant of the “Rotor” CLI implementation.</p>
</blockquote>
<p><strong>The rest of this post will look at <em>how</em> generics were implemented in the Rotor source code.</strong></p>
<p><strong>Note</strong>: There are some significant differences between the Rotor source code and the real .NET framework. Most notably the <a href="https://blogs.msdn.microsoft.com/joelpob/2004/01/21/short-notes-on-the-rotor-jit/">JIT</a> and <a href="https://blogs.msdn.microsoft.com/joelpob/2004/02/26/explanatory-notes-on-rotors-garbage-collector/">GC</a> are completely different implementations (due to licensing issues, listen to <a href="https://www.dotnetrocks.com/?show=360">DotNetRocks show 360 - Ted Neward and Joel Pobar on Rotor 2.0</a> for more info). However, the Rotor source does give us an accurate idea about how other <em>core parts</em> of the CLR are implemented, such as the Type-System, Debugger, AppDomains and the VM itself. It’s interesting to compare the <a href="https://github.com/SSCLI/sscli20_20060311">Rotor source</a> with the current <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/">CoreCLR source</a> and see how much of the source code layout and class names have remained the same.</p>
<hr />
<h2 id="implementation">Implementation</h2>
<p>To make things easier for anyone who wants to follow-along, I created a <a href="https://github.com/mattwarren/GenericsInDotNet">GitHub repo</a> that contains the <a href="https://github.com/SSCLI/sscli_20021101">Rotor code for .NET 1.0</a> and then checked in the <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52517">Gyro source code</a> on top, which means that you can <a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1">see all the changes in one place</a>:</p>
<p><img src="/images/2018/03/Gyro changes to implement generics.png" alt="Gyro changes to implement generics" /></p>
<p>The first thing you notice in the Gyro source is that all the files contain this particular piece of legalese:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ; By using this software in any fashion, you are agreeing to be bound by the
; terms of this license.
;
<span class="gi">+; This file contains modifications of the base SSCLI software to support generic
+; type definitions and generic methods. These modifications are for research
+; purposes. They do not commit Microsoft to the future support of these or
+; any similar changes to the SSCLI or the .NET product. -- 31st October, 2002.
+;
</span> ; You must not remove this notice, or any other, from this software.
</code></pre></div></div>
<p>It’s funny that they needed to add the line ‘<em>They do not commit Microsoft to the future support of these or any similar changes to the SSCLI or the .NET product</em>’, even though they were just a few months away from doing just that!!</p>
<h3 id="components-directories-with-the-most-changes">Components (Directories) with the most changes</h3>
<p>To see where the work was done, let’s start with a high-level view, showing the directories with a <strong>significant amount of changes</strong> (> 1% of the total changes):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git diff --dirstat=lines,1 464bf98 2714cca
0.1% bcl/
14.4% csharp/csharp/sccomp/
9.1% debug/di/
11.9% debug/ee/
2.1% debug/inc/
1.9% debug/shell/
2.5% fjit/
21.1% ilasm/
1.5% ildasm/
1.2% inc/
1.4% md/compiler/
29.9% vm/
</code></pre></div></div>
<p><strong>Note</strong>: <code class="language-plaintext highlighter-rouge">fjit</code> is the “Fast JIT” compiler, i.e. the version released with Rotor, which was significantly different to the one available in the full .NET framework.</p>
<p>The full output from <code class="language-plaintext highlighter-rouge">git diff --dirstat=lines,0</code> is available <a href="/data/2018/03/dirstat output.txt">here</a> and the output from <code class="language-plaintext highlighter-rouge">git diff --stat</code> is <a href="/data/2018/03/diff stat output.txt">here</a>.</p>
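<p>If you want to slice these numbers further, the <code class="language-plaintext highlighter-rouge">dirstat</code> output is easy to post-process. The sketch below assumes Bash with <code class="language-plaintext highlighter-rouge">sort</code> and <code class="language-plaintext highlighter-rouge">awk</code> available, and it works on a copy of the rows shown above rather than on a live repository; the variable names are purely illustrative. It ranks the directories by share of changes and adds up the four <code class="language-plaintext highlighter-rouge">debug/</code> rows.</p>

```shell
#!/usr/bin/env bash
# Rows copied from the 'git diff --dirstat=lines,1' output shown above.
dirstat_output='0.1% bcl/
14.4% csharp/csharp/sccomp/
9.1% debug/di/
11.9% debug/ee/
2.1% debug/inc/
1.9% debug/shell/
2.5% fjit/
21.1% ilasm/
1.5% ildasm/
1.2% inc/
1.4% md/compiler/
29.9% vm/'

# Rank directories by their share of the changed lines, largest first.
sorted=$(printf '%s\n' "$dirstat_output" | LC_ALL=C sort -k1,1 -rn)
printf '%s\n' "$sorted"

# Add up the percentages of every directory under debug/.
debug_total=$(printf '%s\n' "$dirstat_output" |
  LC_ALL=C awk '$2 ~ /^debug\// { sum += $1 } END { printf "%.1f", sum }')
echo "debug/ total: ${debug_total}%"
```

<p>With the figures above, <code class="language-plaintext highlighter-rouge">vm/</code> comes out on top and the debugger directories together account for roughly a quarter of the changed lines. To run it against the real data, replace the copied rows with the output of the <code class="language-plaintext highlighter-rouge">git diff</code> command.</p>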
<p><code class="language-plaintext highlighter-rouge">0.1% bcl/</code> is included only to show that very few <strong>C# code</strong> changes were needed. These were <em>mostly</em> plumbing code to expose the underlying C++ methods and changes to the various <code class="language-plaintext highlighter-rouge">ToString()</code> methods to include <a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-4eff16b228185c6e80fd6325d6994ff9">generic type information</a>, e.g. ‘<code class="language-plaintext highlighter-rouge">Class[int,double]</code>’. However, there are 2 more significant ones:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">bcl/system/reflection/emit/opcodes.cs</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-cd44d74d6f3263cab42469a039ca2601">diff</a>)
<ul>
<li>Add the additional IL opcode needed to make generics work (this just mirrors the <a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-91e0675d515fc426f84d4e6465ad7f2d">main change made in the core of the runtime</a>, so that the opcodes available in C# are consistent)</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">bcl/system/reflection/emit/signaturehelper.cs</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-e6629d61becf92412984036207cb92f8">diff</a>)
<ul>
<li>Add the ability to parse method <em>metadata</em> that contains generic related information, such as methods with generic parameters.</li>
</ul>
</li>
</ul>
<h3 id="files-with-the-most-changes">Files with the most changes</h3>
<p>Next, we’ll take a look at the specific classes/files that had the most changes, as this gives us a really good idea about where the complexity was:</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: center">Added</th>
<th style="text-align: center">Deleted</th>
<th style="text-align: center">Net Change (Added - Deleted)</th>
<th>File (click to go directly to the diff)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">1794</td>
<td style="text-align: center">323</td>
<td style="text-align: center">1471</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-22234c906bfe132ec494932cf06e3fb1">debug/di/module.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">1418</td>
<td style="text-align: center">337</td>
<td style="text-align: center">1081</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-0e0d8fff6a020ec70ca77b2cb8b99647">vm/class.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">1335</td>
<td style="text-align: center">308</td>
<td style="text-align: center">1027</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-fea4cf9500609e43a8069a1dcfa43b71">vm/jitinterface.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">1616</td>
<td style="text-align: center">888</td>
<td style="text-align: center">728</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-13c4c633f56c04ff5faf6dce22560847">debug/ee/debugger.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">741</td>
<td style="text-align: center">46</td>
<td style="text-align: center">695</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-aa4f38f96ad3a77d5b09b8a991aa6cb8">csharp/csharp/sccomp/symmgr.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">693</td>
<td style="text-align: center">0</td>
<td style="text-align: center">693</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-552abe52e5c106c6362a1a1caea0f132">vm/genmeth.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">999</td>
<td style="text-align: center">362</td>
<td style="text-align: center">637</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-0952232ff4ff9b6e7dd3d0810c526384">csharp/csharp/sccomp/clsdrec.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">926</td>
<td style="text-align: center">321</td>
<td style="text-align: center">605</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-3a12049d560ad4f93e5ce65a316fd978">csharp/csharp/sccomp/fncbind.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">559</td>
<td style="text-align: center">0</td>
<td style="text-align: center">559</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-2112a77378a346f28c6a0a3a321e8f87">vm/typeparse.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">605</td>
<td style="text-align: center">156</td>
<td style="text-align: center">449</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-0a485aaa61cb18a87e48fa33a3857dc6">vm/siginfo.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">417</td>
<td style="text-align: center">29</td>
<td style="text-align: center">388</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-7934c88bd9924d3c8cbff690063da3d7">vm/method.hpp</a></td>
</tr>
<tr>
<td style="text-align: center">642</td>
<td style="text-align: center">255</td>
<td style="text-align: center">387</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-9f6e7a75bd6b1a7a0cdd5e8035890206">fjit/fjit.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">379</td>
<td style="text-align: center">0</td>
<td style="text-align: center">379</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-f74e814e74cc0b7f310d8899dd9572c6">vm/jitinterfacegen.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">3045</td>
<td style="text-align: center">2672</td>
<td style="text-align: center">373</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-f7f421904f275fdc51213ac75de92119">ilasm/parseasm.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">465</td>
<td style="text-align: center">94</td>
<td style="text-align: center">371</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-003b498fe92dffc37d31bb4e94fc82d4">vm/class.h</a></td>
</tr>
<tr>
<td style="text-align: center">515</td>
<td style="text-align: center">163</td>
<td style="text-align: center">352</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-7ceae3bfad44ef6e15c1211be9f537a5">debug/inc/cordb.h</a></td>
</tr>
<tr>
<td style="text-align: center">339</td>
<td style="text-align: center">0</td>
<td style="text-align: center">339</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-2a678baf192f81a25eab4bd85ef5bae6">vm/generics.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">733</td>
<td style="text-align: center">418</td>
<td style="text-align: center">315</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-a096d9aee517403abfd5c9171ee7ee9c">csharp/csharp/sccomp/parser.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">471</td>
<td style="text-align: center">169</td>
<td style="text-align: center">302</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-3abe6da78df285aff42ab5932f2dda93">debug/shell/dshell.cpp</a></td>
</tr>
<tr>
<td style="text-align: center">382</td>
<td style="text-align: center">88</td>
<td style="text-align: center">294</td>
<td><a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-6ff795bd0261cd4bd627968951cef1f3">csharp/csharp/sccomp/import.cpp</a></td>
</tr>
</tbody>
</table>
</span>
<h2 id="components-of-the-runtime">Components of the Runtime</h2>
<p>Now we’ll look at individual components in more detail so we can get an idea of how different parts of the runtime had to change to accommodate generics.</p>
<h3 id="type-system-changes">Type System changes</h3>
<p>Not surprisingly the bulk of the changes are in the Virtual Machine (VM) component of the CLR and related to the ‘Type System’. Obviously adding ‘parameterised types’ to a type system that didn’t already have them requires wide-ranging and significant changes, which are shown in the list below:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">vm/class.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-0e0d8fff6a020ec70ca77b2cb8b99647">diff</a>)
<ul>
<li>Allow the type system to distinguish between <a href="https://stackoverflow.com/questions/2173107/what-exactly-is-an-open-generic-type-in-net">open and closed generic types</a> and provide APIs for working with them, such as <code class="language-plaintext highlighter-rouge">IsGenericVariable()</code> and <code class="language-plaintext highlighter-rouge">GetGenericTypeDefinition()</code></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">vm/genmeth.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-552abe52e5c106c6362a1a1caea0f132">diff</a>)
<ul>
<li>Contains the bulk of the functionality to make ‘generic methods’ possible, i.e. <code class="language-plaintext highlighter-rouge">MyMethod&lt;T, U&gt;(T item, U filter)</code>, including the work done to enable <a href="#shared-instantiations">‘shared instantiation’</a> of generic methods</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">vm/typeparse.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-2112a77378a346f28c6a0a3a321e8f87">diff</a>)
<ul>
<li>Changes needed to allow generic types to be looked-up by name, i.e. ‘<code class="language-plaintext highlighter-rouge">MyClass[System.Int32]</code>’</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">vm/siginfo.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-0a485aaa61cb18a87e48fa33a3857dc6">diff</a>)
<ul>
<li>Adds the ability to work with ‘generic-related’ method signatures</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">vm/method.hpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-7934c88bd9924d3c8cbff690063da3d7">diff</a>) and <code class="language-plaintext highlighter-rouge">vm/method.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-c615bd9fa80c05ada3fa2c6aeb3f8f4c">diff</a>)
<ul>
<li>Provides the runtime with generics-related methods such as <code class="language-plaintext highlighter-rouge">IsGenericMethodDefinition()</code>, <code class="language-plaintext highlighter-rouge">GetNumGenericMethodArgs()</code> and <code class="language-plaintext highlighter-rouge">GetNumGenericClassArgs()</code></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">vm/generics.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-2a678baf192f81a25eab4bd85ef5bae6">diff</a>)
<ul>
<li>All the completely new ‘generics’ specific code is in here, mostly related to <a href="#shared-instantiations">‘shared instantiation’</a> which is explained below</li>
</ul>
</li>
</ul>
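<p>To make the open/closed distinction and the name-based lookup a bit more concrete, here is a minimal sketch using the managed reflection API. I’m assuming that <code class="language-plaintext highlighter-rouge">Type.IsGenericTypeDefinition</code>, <code class="language-plaintext highlighter-rouge">Type.IsGenericParameter</code> and <code class="language-plaintext highlighter-rouge">Type.GetGenericTypeDefinition()</code> are the present-day managed counterparts of the runtime-internal methods listed above; that mapping is my assumption rather than something taken from the Gyro source. The class name is just for illustration:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

class OpenClosedDemo
{
    static void Main()
    {
        Type open = typeof(List&lt;&gt;);       // open: the type parameter T is unbound
        Type closed = typeof(List&lt;int&gt;);  // closed: T is bound to System.Int32

        Console.WriteLine(open.IsGenericTypeDefinition);    // True
        Console.WriteLine(closed.IsGenericTypeDefinition);  // False

        // A closed type can always find its way back to the open definition
        Console.WriteLine(closed.GetGenericTypeDefinition() == open);  // True

        // The 'T' in List&lt;T&gt; is itself represented as a type (a generic variable)
        Type t = open.GetGenericArguments()[0];
        Console.WriteLine(t.IsGenericParameter);  // True

        // Look-up by name: arity suffix (`1) followed by the type arguments in square brackets
        Type byName = Type.GetType("System.Collections.Generic.List`1[System.Int32]");
        Console.WriteLine(byName == closed);  // True
    }
}
</code></pre></div></div>
<p>Note that the name format used by today’s <code class="language-plaintext highlighter-rouge">Type.GetType(..)</code> includes the arity suffix, which is slightly different to the Gyro-era <code class="language-plaintext highlighter-rouge">MyClass[System.Int32]</code> form shown above.</p>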
<h3 id="bytecode-or-intermediate-language-il-changes">Bytecode or ‘Intermediate Language’ (IL) changes</h3>
<p>The main way that the implementation of generics in the CLR differs from the JVM is that they are <a href="http://www.jprl.com/Blog/archive/development/2007/Aug-31.html">‘fully reified’ instead of using ‘type erasure’</a>. This was possible because the CLR designers were willing to break backwards compatibility, whereas the JVM had been around longer, so I assume that this was a much less appealing option. For more discussion on this issue see <a href="http://beust.com/weblog/2011/07/29/erasure-vs-reification/">Erasure vs reification</a> and <a href="http://gafter.blogspot.co.uk/2006/11/reified-generics-for-java.html">Reified Generics for Java</a>. <strong>Update</strong>: this <a href="https://news.ycombinator.com/item?id=14584359">HackerNews discussion</a> is also worth a read.</p>
<p>The specific changes made to the .NET Intermediate Language (IL) op-codes can be seen in <code class="language-plaintext highlighter-rouge">inc/opcode.def</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007?w=1#diff-91e0675d515fc426f84d4e6465ad7f2d">diff</a>). In essence, the following 3 instructions were added:</p>
<ul>
<li><a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.ldelem">ldelem</a></li>
<li><a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.stelem">stelem</a></li>
<li><a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox_any">unbox.any</a></li>
</ul>
<p>In addition, the <code class="language-plaintext highlighter-rouge">IL Assembler</code> tool (ILASM) needed <a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-f7f421904f275fdc51213ac75de92119">significant changes</a>, as did its counterpart, the <code class="language-plaintext highlighter-rouge">IL Disassembler</code> (ILDASM), so it could <a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-87680592860bf2d2e2a595434efa0016">handle the additional instructions</a>.</p>
<p>There is also a whole section titled ‘Support for Polymorphism in IL’ that explains these changes in greater detail in <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/designandimplementationofgenerics.pdf">Design and Implementation of Generics for the .NET Common Language Runtime</a>.</p>
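<p>As a rough guide to where the three instructions listed above show up, here is a small sketch of generic C# code, with comments showing the IL I’d expect a current C# compiler to emit for each method body. The class and method names are made up for illustration, and it’s worth confirming the output yourself with ILDASM or a similar tool:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class IlDemo
{
    // 'items[index]' on a T[] is expected to compile to: ldelem !!T
    static T Get&lt;T&gt;(T[] items, int index) =&gt; items[index];

    // 'items[index] = value' on a T[] is expected to compile to: stelem !!T
    static void Set&lt;T&gt;(T[] items, int index, T value) =&gt; items[index] = value;

    // '(T)boxed' is expected to compile to: unbox.any !!T
    // (an unbox when T is a value type, a type-checked cast when T is a reference type)
    static T Cast&lt;T&gt;(object boxed) =&gt; (T)boxed;

    static void Main()
    {
        var numbers = new int[2];
        Set(numbers, 0, 42);
        Console.WriteLine(Get(numbers, 0));        // 42
        Console.WriteLine(Cast&lt;int&gt;(42));          // 42
        Console.WriteLine(Cast&lt;string&gt;("hello"));  // hello
    }
}
</code></pre></div></div>
<p>The point is that a single instruction has to work for <em>any</em> <code class="language-plaintext highlighter-rouge">T</code>, which the older type-specific variants (<code class="language-plaintext highlighter-rouge">ldelem.i4</code>, <code class="language-plaintext highlighter-rouge">ldelem.ref</code>, etc.) could not do.</p>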
<h3 id="shared-instantiations">Shared Instantiations</h3>
<p>From <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/designandimplementationofgenerics.pdf">Design and Implementation of Generics for the .NET Common Language Runtime</a>:</p>
<blockquote>
<p>Two instantiations are compatible if for any parameterized class its
compilation at these instantiations gives rise to identical code and
other execution structures (e.g. field layout and GC tables), apart
from the dictionaries described below in Section 4.4. In particular,
<strong>all reference types are compatible with each other</strong>, because the
loader and JIT compiler make no distinction for the purposes of
field layout or code generation. On the implementation for the Intel
x86, at least, <strong>primitive types are mutually incompatible</strong>, even
if they have the same size (floats and ints have different parameter
passing conventions). That leaves <strong>user-defined struct types, which
are compatible if their layout is the same</strong> with respect to garbage
collection i.e. they share the same pattern of traced pointers</p>
</blockquote>
<ul>
<li><code class="language-plaintext highlighter-rouge">ClassLoader::NewInstantiation(..)</code> <a href="https://github.com/mattwarren/GenericsInDotNet/blob/master/vm/generics.cpp#L15-L202">source code</a></li>
<li><code class="language-plaintext highlighter-rouge">TypeHandle::GetCanonicalFormAsGenericArgument()</code> <a href="https://github.com/mattwarren/GenericsInDotNet/blob/2714ccac6f18f0f6ff885567b90484013b31e007/vm/class.cpp#L428-L490">source code</a></li>
</ul>
<p>From a <a href="https://github.com/mattwarren/GenericsInDotNet/blob/2714ccac6f18f0f6ff885567b90484013b31e007/vm/typehandle.h#L227-L237">comment with more info</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// For an generic type instance return the representative within the class of
// all type handles that share code. For example,
// &lt;int&gt; --&gt; &lt;int&gt;,
// &lt;object&gt; --&gt; &lt;object&gt;,
// &lt;string&gt; --&gt; &lt;object&gt;,
// &lt;List&lt;string&gt;&gt; --&gt; &lt;object&gt;,
// &lt;Struct&lt;string&gt;&gt; --&gt; &lt;Struct&lt;object&gt;&gt;
//
// If the code for the type handle is not shared then return
// the type handle itself.
</code></pre></div></div>
<p>In addition, <a href="https://github.com/mattwarren/GenericsInDotNet/blob/2714ccac6f18f0f6ff885567b90484013b31e007/vm/genmeth.cpp#L34-L83">this comment</a> explains the work that needs to take place to allow shared instantiations when working with <em>generic methods</em>.</p>
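<p>The mapping in the <code class="language-plaintext highlighter-rouge">typehandle.h</code> comment above can be modelled in a few lines of C# using reflection. To be clear, this is only a sketch of the <em>rule</em>, not the runtime’s implementation: <code class="language-plaintext highlighter-rouge">Canonicalize(..)</code> and <code class="language-plaintext highlighter-rouge">Pair&lt;T&gt;</code> are hypothetical names, and it ignores the ‘structs with the same GC layout’ case mentioned in the paper:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic struct, standing in for 'Struct&lt;T&gt;' in the comment
struct Pair&lt;T&gt; { public T First; public T Second; }

static class CanonicalFormModel
{
    // Hypothetical helper that models the mapping, NOT the code the runtime uses
    static Type Canonicalize(Type type)
    {
        // All reference types are compatible with each other, so they collapse to 'object'
        if (!type.IsValueType)
            return typeof(object);

        // A generic struct keeps its own shape, but its type arguments are canonicalised
        if (type.IsGenericType)
        {
            Type[] args = type.GetGenericArguments().Select(Canonicalize).ToArray();
            return type.GetGenericTypeDefinition().MakeGenericType(args);
        }

        // Primitives and non-generic structs are not shared, so return the type itself
        return type;
    }

    static void Main()
    {
        Console.WriteLine(Canonicalize(typeof(int)) == typeof(int));                          // True
        Console.WriteLine(Canonicalize(typeof(string)) == typeof(object));                    // True
        Console.WriteLine(Canonicalize(typeof(List&lt;string&gt;)) == typeof(object));              // True
        Console.WriteLine(Canonicalize(typeof(Pair&lt;string&gt;)) == typeof(Pair&lt;object&gt;));        // True
    }
}
</code></pre></div></div>
<p>The four checks in <code class="language-plaintext highlighter-rouge">Main()</code> correspond to four of the five examples in the comment (the <code class="language-plaintext highlighter-rouge">object</code> to <code class="language-plaintext highlighter-rouge">object</code> case is covered by the same reference-type rule).</p>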
<p><strong>Update</strong>: If you want more info on the ‘code-sharing’ that takes place, I recommend reading these 4 posts:</p>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/joelpob/2004/11/17/clr-generics-and-code-sharing/">CLR Generics and code sharing</a></li>
<li><a href="https://web.archive.org/web/20100723221307/http://www.bluebytesoftware.com/blog/2005/03/23/DGUpdateGenericsAndPerformance.aspx">DG Update: Generics and Performance</a></li>
<li><a href="http://joeduffyblog.com/2011/10/23/on-generics-and-some-of-the-associated-overheads/">On generics and (some of) the associated overheads</a></li>
<li><a href="http://yizhang82.me/dotnet-generics-sharing">Sharing .NET generic code under the hood</a></li>
</ul>
<h3 id="compiler-and-jit-changes">Compiler and JIT Changes</h3>
<p>It seems like almost every part of the compiler had to change to accommodate generics, which is not surprising given that they touch so many parts of the code we write: <code class="language-plaintext highlighter-rouge">Types</code>, <code class="language-plaintext highlighter-rouge">Classes</code> and <code class="language-plaintext highlighter-rouge">Methods</code>. Some of the biggest changes were:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">csharp/csharp/sccomp/clsdrec.cpp</code> - <strong>+999 -362</strong> - (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-0952232ff4ff9b6e7dd3d0810c526384">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">csharp/csharp/sccomp/emitter.cpp</code> - <strong>+347 -127</strong> - (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-58397e0e022ba5c8e98f1ea59eadefee">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">csharp/csharp/sccomp/fncbind.cpp</code> - <strong>+926 -321</strong> - (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-3a12049d560ad4f93e5ce65a316fd978">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">csharp/csharp/sccomp/import.cpp</code> - <strong>+382 -88</strong> - (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-6ff795bd0261cd4bd627968951cef1f3">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">csharp/csharp/sccomp/parser.cpp</code> - <strong>+733 -418</strong> - (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-a096d9aee517403abfd5c9171ee7ee9c">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">csharp/csharp/sccomp/symmgr.cpp</code> - <strong>+741 -46</strong> - (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-aa4f38f96ad3a77d5b09b8a991aa6cb8">diff</a>)</li>
</ul>
<p>In the ‘<em>just-in-time</em>’ (JIT) compiler extra work was needed because it’s responsible for implementing the additional <a href="#bytecode-or-intermediate-language-il-changes">‘IL Instructions’</a>. The bulk of these changes took place in <code class="language-plaintext highlighter-rouge">fjit.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-9f6e7a75bd6b1a7a0cdd5e8035890206">diff</a>) and <code class="language-plaintext highlighter-rouge">fjitdef.h</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-ddf200851d7fc0eb14bf1f64403cfae7">diff</a>).</p>
<p>Finally, a large amount of work was done in <code class="language-plaintext highlighter-rouge">vm/jitinterface.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-fea4cf9500609e43a8069a1dcfa43b71">diff</a>) to enable the JIT to access the extra information it needed to emit code for generic methods.</p>
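<p>To get a feel for why the JIT needs that extra information, consider a generic method whose result depends on its type argument. Going by the sharing rules quoted earlier, the <code class="language-plaintext highlighter-rouge">string</code> and <code class="language-plaintext highlighter-rouge">object</code> instantiations can share one compiled body, so the exact type argument has to be supplied to that body at runtime (via the ‘dictionaries’ the paper refers to). A minimal sketch, with hypothetical names:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class SharedCodeDemo
{
    // Hypothetical helper: one method body, but the answer differs per instantiation
    static string NameOf&lt;T&gt;() =&gt; typeof(T).Name;

    static void Main()
    {
        Console.WriteLine(NameOf&lt;string&gt;());  // String
        Console.WriteLine(NameOf&lt;object&gt;());  // Object
        Console.WriteLine(NameOf&lt;int&gt;());     // Int32
    }
}
</code></pre></div></div>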
<h3 id="debugger-changes">Debugger Changes</h3>
<p>Last, but by no means least, a significant amount of work was done to ensure that the debugger could understand and inspect generic types. It goes to show just how much <em>inside information</em> a debugger needs to have of the type system in a managed language.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">debug/ee/debugger.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-13c4c633f56c04ff5faf6dce22560847">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">debug/ee/debugger.h</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-f89efe7b1a060b67715d76a176830017">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">debug/di/module.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-22234c906bfe132ec494932cf06e3fb1">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">debug/di/rsthread.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-a0ed41f780929de1f626f8e7b4354dcb">diff</a>)</li>
<li><code class="language-plaintext highlighter-rouge">debug/shell/dshell.cpp</code> (<a href="https://github.com/mattwarren/GenericsInDotNet/commit/2714ccac6f18f0f6ff885567b90484013b31e007#diff-3abe6da78df285aff42ab5932f2dda93">diff</a>)</li>
</ul>
<hr />
<h1 id="further-reading">Further Reading</h1>
<p>If you want even more information about generics in .NET, there are also some very useful design docs available (included in the <a href="https://www.microsoft.com/en-us/download/details.aspx?id=52517">Gyro source code download</a>):</p>
<ul>
<li><a href="/data/2018/03/csharp.html">Generics in C#</a></li>
<li><a href="/data/2018/03/clrgen-types.html">Generics in the Common Type System</a></li>
<li><a href="/data/2018/03/clrgen-il.html">IL extensions for generics</a></li>
</ul>
<p>Also <a href="http://citeseerx.ist.psu.edu/viewdoc/download?rep=rep1&type=pdf&doi=10.1.1.124.3911">Pre-compilation for .NET Generics by Andrew Kennedy & Don Syme</a> (pdf) is an interesting read.</p>
Resources for Learning about .NET Internals2018-01-22T00:00:00+00:00http://www.mattwarren.org/2018/01/22/Resources-for-Learning-about-.NET-Internals
<p>It all started with a tweet, which seemed to resonate with people:</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">If you like reading my posts on .NET internals, you'll like all these other blogs. So I've put them together in a thread for you!!</p>— Matt Warren (@matthewwarren) <a href="https://twitter.com/matthewwarren/status/951799867038404608?ref_src=twsrc%5Etfw">January 12, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>The aim was to list blogs that <em>specifically</em> cover .NET internals at a low level or, to put it another way, blogs that answer the question <strong>how does feature ‘X’ work, under-the-hood</strong>. The list includes either <em>typical posts</em> for that blog, or just some of <em>my favourites</em>!</p>
<p><strong>Note:</strong> for a wider list of .NET and performance related blogs see <a href="https://github.com/adamsitnik/awesome-dot-net-performance#article-series">Awesome .NET Performance</a> by <a href="https://twitter.com/SitnikAdam">Adam Sitnik</a></p>
<p>I <strong>wouldn’t recommend reading through the entire list</strong>, at least not in one go, as your brain will probably melt. Pick some posts/topics that interest you and start with those.</p>
<p>Finally, bear in mind that some of the posts are over 10 years old, so there’s a chance that things have changed since then (however, in my experience, the low-level parts of the CLR are more stable). If you want to double-check the latest behaviour, your best option is to <a href="https://github.com/dotnet/coreclr">read the source</a>!</p>
<hr />
<h2 id="community-or-non-microsoft-blogs">Community or Non-Microsoft Blogs</h2>
<p>These blogs are all written by non-Microsoft employees (AFAICT), or if they do work for Microsoft, they don’t work directly on the CLR. If I’ve missed any interesting blogs out, please let me know!</p>
<p><strong>Special mention</strong> goes to <strong>Sasha Goldshtein</strong>, who has been blogging about this <a href="http://blogs.microsoft.co.il/sasha/tag/netinternals/">longer than anyone</a>!!</p>
<ul>
<li><a href="http://blogs.microsoft.co.il/sasha"><strong>All Your Base Are Belong To Us</strong></a> by <a href="https://twitter.com/goldshtn"><strong>Sasha Goldshtein</strong> (@goldshtn)</a>
<ul>
<li><a href="http://blogs.microsoft.co.il/sasha/2010/07/09/generic-method-dispatch/">Generic Method Dispatch</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2010/08/25/inspecting-local-root-lifetime/">Inspecting Local Root Lifetime</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2012/03/15/virtual-method-dispatch-and-object-layout-changes-in-clr-40/">Virtual Method Dispatch and Object Layout Changes in CLR 4.0</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2012/09/18/runtime-representation-of-genericspart-2/">Runtime Representation of Generics—Part 2</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2013/04/10/revisiting-value-types-vs-reference-types/">Revisiting Value Types vs. Reference Types</a></li>
</ul>
</li>
</ul>
<hr />
<ul>
<li><a href="https://blogs.msdn.microsoft.com/seteplia"><strong>Dissecting the code</strong></a> by <a href="https://twitter.com/STeplyakov"><strong>Sergey Teplyakov</strong> (@STeplyakov)</a> (<strong>M/S</strong>)
<ul>
<li><a href="https://blogs.msdn.microsoft.com/seteplia/2017/01/05/understanding-different-gc-modes-with-concurrency-visualizer/">Understanding different GC modes with Concurrency Visualizer</a></li>
<li><a href="https://blogs.msdn.microsoft.com/seteplia/2017/05/09/garbage-collection-and-variable-lifetime-tracking/">Garbage collection and variable lifetime tracking</a></li>
<li><a href="https://blogs.msdn.microsoft.com/seteplia/2017/05/26/managed-object-internals-part-1-layout/">Managed object internals, Part 1. The layout</a> (Also <a href="https://blogs.msdn.microsoft.com/seteplia/2017/09/06/managed-object-internals-part-2-object-header-layout-and-the-cost-of-locking/">part 2</a>, <a href="https://blogs.msdn.microsoft.com/seteplia/2017/09/12/managed-object-internals-part-3-the-layout-of-a-managed-array-3/">part 3</a> and <a href="https://blogs.msdn.microsoft.com/seteplia/2017/09/21/managed-object-internals-part-4-fields-layout/">part 4</a>)</li>
<li><a href="https://blogs.msdn.microsoft.com/seteplia/2017/05/17/box-or-not-to-box-that-is-the-question/">To box or not to Box? That is the question!</a></li>
<li><a href="https://blogs.msdn.microsoft.com/seteplia/2017/02/01/dissecting-the-new-constraint-in-c-a-perfect-example-of-a-leaky-abstraction/">Dissecting the new() constraint in C#: a perfect example of a leaky abstraction</a></li>
</ul>
</li>
<li><a href="http://adamsitnik.com"><strong>Adam Sitnik - .NET Performance and Reliability</strong></a> by <a href="https://twitter.com/SitnikAdam"><strong>Adam Sitnik</strong> (@SitnikAdam)</a> (<strong>M/S</strong>)
<ul>
<li><a href="http://adamsitnik.com/Value-Types-vs-Reference-Types/">Value Types vs Reference Types</a></li>
<li><a href="http://adamsitnik.com/Span/">Span</a></li>
<li><a href="http://adamsitnik.com/Array-Pool/">Pooling large arrays with ArrayPool</a></li>
<li><a href="http://adamsitnik.com/Hardware-Counters-Diagnoser/">Collecting Hardware Performance Counters with BenchmarkDotNet</a></li>
<li><a href="http://adamsitnik.com/Disassembly-Diagnoser/">Disassembling .NET Code with BenchmarkDotNet</a></li>
</ul>
</li>
<li><a href="http://aakinshin.net/blog"><strong>Andrey Akinshin’s blog</strong></a> by <a href="https://twitter.com/andrey_akinshin"><strong>Andrey Akinshin</strong> (@andrey_akinshin)</a>
<ul>
<li><a href="http://aakinshin.net/blog/post/stephen-toub-benchmarks-part1/">Measuring Performance Improvements in .NET Core with BenchmarkDotNet (Part 1)</a></li>
<li><a href="http://aakinshin.net/blog/post/blittable/">Blittable types</a></li>
<li><a href="http://aakinshin.net/blog/post/datetime/">DateTime under the hood</a></li>
<li><a href="http://aakinshin.net/blog/post/stopwatch/">Stopwatch under the hood</a></li>
</ul>
</li>
<li><a href="http://tooslowexception.com/"><strong>TooSlowException</strong></a> by <a href="https://twitter.com/konradkokosa"><strong>Konrad Kokosa</strong> (@konradkokosa)</a>
<ul>
<li><a href="http://tooslowexception.com/net-core-compilation-running-debugging/">.NET Core – compilation, running, debugging</a></li>
<li><a href="http://tooslowexception.com/how-does-gettype-work/">How does Object.GetType() really work?</a></li>
<li><a href="http://tooslowexception.com/zero-garbage-collector-for-net-core/">Zero Garbage Collector for .NET Core</a> and the follow-up <a href="http://tooslowexception.com/zero-garbage-collector-for-net-core-2-1-and-asp-net-core-2-1/">Zero Garbage Collector for .NET Core 2.1 and ASP.NET Core 2.1</a></li>
<li><a href="http://tooslowexception.com/the-ultimate-net-experiment-project/">The Ultimate .NET Experiment – open source project</a></li>
</ul>
</li>
<li><a href="https://marcinjuraszek.com"><strong>a little bit of programming</strong></a> by <a href="https://twitter.com/mmjuraszek"><strong>Marcin Juraszek</strong> (@mmjuraszek)</a> (<strong>M/S</strong>)
<ul>
<li><a href="https://marcinjuraszek.com/2017/10/string-split-and-int-array-allocations.html">String.Split and int[] allocations</a></li>
<li><a href="https://marcinjuraszek.com/2017/05/adding-matt-operator-to-roslyn-part-1.html">Adding Matt operator to Roslyn - Syntax, Lexer and Parser</a> (<a href="https://marcinjuraszek.com/2017/05/adding-matt-operator-to-roslyn-part-2.html">Part 2 - Binder</a>, <a href="https://marcinjuraszek.com/2017/06/adding-matt-operator-to-roslyn-part-3.html">Part 3 - Emitter</a>)</li>
</ul>
</li>
<li><a href="http://yizhang82.me"><strong>yizhang82’s blog</strong></a> by <a href="https://twitter.com/yizhang82"><strong>Yi Zhang</strong> (@yizhang82)</a> (<strong>M/S</strong>)
<ul>
<li><a href="http://yizhang82.me/dotnet-generics-sharing">Sharing .NET generic code under the hood</a></li>
<li><a href="http://yizhang82.me/value-type-boxing">C# value type boxing under the hood</a></li>
<li><a href="http://yizhang82.me/hosting-coreclr">Embedding CoreCLR in your C/C++ application</a></li>
</ul>
</li>
<li><a href="http://codingsight.com/author/timur-guev/"><strong>Timur Guev’s posts on {coding}Sight</strong></a> by <a href="https://twitter.com/timyrik20"><strong>Timur Guev</strong> (@timyrik200)</a>, also <em>appears</em> to have his own blog <a href="http://timyrguev.blogspot.co.uk/">Math and Programming</a> (in Russian)
<ul>
<li><a href="http://codingsight.com/the-origin-of-gethashcode-in-net/">The origin of GetHashCode in .NET</a></li>
<li><a href="http://codingsight.com/strings-in-dot-net/">Aspects of Strings in .NET</a></li>
<li><a href="http://codingsight.com/stringbuilder-the-past-and-the-future/">StringBuilder: the Past and the Future</a></li>
</ul>
</li>
<li><a href="https://alexandrnikitin.github.io/blog/"><strong>The mole is digging</strong></a> by <a href="https://twitter.com/nikitin_a_a"><strong>Alexandr Nikitin</strong> (@nikitin_a_a)</a>
<ul>
<li><a href="https://alexandrnikitin.github.io/blog/dotnet-generics-under-the-hood/">.NET Generics under the hood</a></li>
<li><a href="https://alexandrnikitin.github.io/blog/hoisting-in-net-explained/">Hoisting in .NET Explained</a></li>
<li><a href="https://alexandrnikitin.github.io/blog/hoisting-in-net-examples/">Hoisting in .NET Examples</a></li>
</ul>
</li>
<li><a href="https://mycodingplace.wordpress.com"><strong>My Coding Place</strong></a> by <a href="https://twitter.com/dudi_ke"><strong>Dudi Keleti</strong> (@dudi_ke)</a>
<ul>
<li><a href="https://mycodingplace.wordpress.com/2018/01/10/object-header-get-complicated/">Object header get complicated</a></li>
<li><a href="https://mycodingplace.wordpress.com/2014/04/22/call-vs-callvirt-instruction/">IL Call Vs. Callvirt Instruction</a> (<a href="https://mycodingplace.wordpress.com/2014/04/24/il-call-vs-callvirt-instruction-part-two/">Part 2</a>)</li>
<li><a href="https://mycodingplace.wordpress.com/2016/11/11/value-type-methods-call-callvirt-constrained-and-hidden-boxing/">Value type methods – call, callvirt, constrained and hidden boxing</a></li>
</ul>
</li>
<li><a href="http://xoofx.com/blog/"><strong>Alexandre Mutel’s blog</strong></a> by <a href="https://twitter.com/xoofx"><strong>Alexandre Mutel</strong> (@xoofx)</a>
<ul>
<li><a href="http://xoofx.com/blog/2015/10/08/stackalloc-for-class-with-roslyn-and-coreclr/">A new stackalloc operator for reference types with CoreCLR and Roslyn</a></li>
<li><a href="http://xoofx.com/blog/2015/09/27/struct-inheritance-in-csharp-with-roslyn-and-coreclr/">Struct inheritance in C# with CoreCLR and Roslyn</a></li>
</ul>
</li>
</ul>
<p><span id="Update"></span>
<strong>Update:</strong> I missed out a few blogs and learnt about some new ones:</p>
<p>Honourable mention goes to <a href="https://www.codeproject.com/Articles/20481/NET-Type-Internals-From-a-Microsoft-CLR-Perspecti">.NET Type Internals - From a Microsoft CLR Perspective</a> on CodeProject, it’s a great article!!</p>
<ul>
<li><a href="https://aloiskraus.wordpress.com"><strong>Performance is everything. But correctness comes first.</strong></a> by <a href="http://geekswithblogs.net/akraus1/Default.aspx"><strong>Alois Kraus</strong></a> (also includes some great posts on Windows Internals and Debugging, such as <a href="https://aloiskraus.wordpress.com/2016/10/03/windows-10-memory-compression-and-more/">Windows 10 Memory Compression And More</a> and <a href="https://aloiskraus.wordpress.com/2016/10/09/how-buffered-io-can-ruin-performance/">How Buffered IO Can Ruin Performance</a>)
<ul>
<li><a href="https://aloiskraus.wordpress.com/2016/07/18/the-non-contracting-code-contracts/">The Non Contracting Code Contracts</a></li>
<li><a href="https://aloiskraus.wordpress.com/2016/07/31/when-known-net-bugs-bite-you/">When Known .NET Bugs Bite You</a></li>
<li><a href="https://aloiskraus.wordpress.com/2017/04/23/the-definitive-serialization-performance-guide/">The Definitive Serialization Performance Guide</a></li>
<li><a href="https://aloiskraus.wordpress.com/2017/08/17/memanalyzer-v2-5-released/">MemAnalyzer v2.5 Released</a></li>
</ul>
</li>
<li><a href="http://blog.barrkel.com"><strong>Entropy Overload</strong></a> by <a href="https://stackoverflow.com/users/3712/barry-kelly"><strong>Barry Kelly</strong></a>
<ul>
<li><a href="http://blog.barrkel.com/2006/05/call-vs-callvirt-for-c-non-virtual.html">Call vs CallVirt for C# non-virtual instance methods</a></li>
<li><a href="http://blog.barrkel.com/2006/07/covariance-and-contravariance-in-net.html">Covariance and Contravariance in .NET, Java and C++</a></li>
<li><a href="http://blog.barrkel.com/2006/07/not-so-lazy-garbage-collector.html">The not so lazy garbage collector</a></li>
<li><a href="http://blog.barrkel.com/2009/12/commonly-confused-tidbits-re-net.html">Commonly Confused Tidbits re .NET Garbage Collector</a></li>
</ul>
</li>
<li><a href="https://blog.matthewskelton.net"><strong>Matthew Skelton’s blog</strong></a> by <a href="https://twitter.com/matthewpskelton"><strong>Matthew Skelton</strong></a>
<ul>
<li><a href="https://blog.matthewskelton.net/2012/01/29/advanced-call-processing-in-the-clr/">Advanced Call Processing in the CLR</a></li>
<li><a href="https://blog.matthewskelton.net/2012/01/29/clr-com-interop/">CLR-COM Interop</a></li>
<li><a href="https://blog.matthewskelton.net/2012/01/29/clr-contexts/">CLR Contexts</a></li>
</ul>
</li>
<li><a href="http://www.liranchen.com"><strong>.Net Internals, Debugging, Multithreading - and More!</strong></a> by <strong>Liran Chen</strong>
<ul>
<li><a href="http://www.liranchen.com/2010/08/accurately-measuring-gc-suspensions.html">Accurately Measuring GC Suspensions</a></li>
<li><a href="http://www.liranchen.com/2010/07/behind-locals-init-flag.html">Behind The .locals init Flag</a></li>
<li><a href="http://www.liranchen.com/2010/08/brain-teasing-with-strings.html">Brain Teasing With Strings</a></li>
</ul>
</li>
<li><a href="https://blog.maartenballiauw.be/"><strong>Maarten Balliauw {blog}</strong></a> by <a href="https://twitter.com/maartenballiauw"><strong>Maarten Balliauw</strong></a>
<ul>
<li><a href="https://blog.maartenballiauw.be/post/2017/01/03/exploring-.net-managed-heap-with-clrmd.html">Exploring .NET managed heap with ClrMD</a></li>
<li><a href="https://blog.maartenballiauw.be/post/2016/11/15/exploring-memory-allocation-and-strings.html">Exploring memory allocation and strings</a></li>
<li><a href="https://blog.maartenballiauw.be/post/2016/10/19/making-net-code-less-allocatey-garbage-collector.html">Making .NET code less allocatey - Allocations and the Garbage Collector</a></li>
</ul>
</li>
<li><a href="https://www.tabsoverspaces.com"><strong>tabs ↹ over ␣ ␣ ␣ spaces</strong></a> by <a href="https://twitter.com/cincura_net"><strong>Jiri Cincura</strong></a>
<ul>
<li><a href="https://www.tabsoverspaces.com/233660-are-static-methods-faster-in-execution-compared-to-instance-methods-dotnet/">Are static methods faster in execution compared to instance methods?</a></li>
<li><a href="https://www.tabsoverspaces.com/233661-where-are-the-differences-in-execution-speed-of-various-method-types-come-from-dotnet/">Where are the differences in execution speed of various method types come from?</a></li>
</ul>
</li>
<li><a href="http://www.ntcore.com/articles.php"><strong>NTCore</strong></a> (also writes on the <a href="http://cerbero-blog.com/?author=1">Cerbero Blog</a>) by <a href="https://twitter.com/dpistelli"><strong>Daniel Pistelli</strong></a>
<ul>
<li><a href="http://www.ntcore.com/Files/netint_native.htm">.NET Internals and Native Compiling</a></li>
<li><a href="http://www.ntcore.com/files/netint_injection.htm">.NET Internals and Code Injection</a></li>
<li><a href="http://www.ntcore.com/files/dotnetformat.htm">The .NET File Format</a></li>
</ul>
</li>
<li><a href="http://www.abhisheksur.com"><strong>DOT NET TRICKS</strong></a> by <a href="https://twitter.com/abhi2434"><strong>Abhishek Sur (@abhi2434)</strong></a>
<ul>
<li><a href="http://www.abhisheksur.com/2011/03/internals-to-net.html">Internals to .NET</a></li>
<li><a href="http://www.abhisheksur.com/2011/09/internals-of-net-objects-and-use-of-sos.html">Internals of .NET Objects and Use of SOS</a></li>
<li><a href="http://www.abhisheksur.com/2011/07/valuetypes-and-referencetypes-under.html">ValueTypes and ReferenceTypes : Under the Hood</a> (<a href="http://www.abhisheksur.com/2011/07/valuetype-and-referencetype-under-hood.html">part 2</a>)</li>
</ul>
</li>
<li><a href="https://blog.adamfurmanek.pl/"><strong>Random IT Utensils</strong></a> by <a href="https://twitter.com/furmanekadam"><strong>Adam Furmanek</strong></a>
<ul>
<li><a href="https://blog.adamfurmanek.pl/2016/04/23/custom-memory-allocation-in-c-part-1/">Custom memory allocation in C# Part 1 — Allocating object on a stack</a></li>
<li><a href="https://blog.adamfurmanek.pl/2016/07/09/custom-memory-allocation-in-c-part-6/">Custom memory allocation in C# Part 6 — Memory errors</a></li>
<li><a href="https://blog.adamfurmanek.pl/2016/05/21/virtual-and-non-virtual-calls-in-c/">.NET Inside Out Part 1 — Virtual and non-virtual calls in C#</a></li>
<li><a href="https://blog.adamfurmanek.pl/2017/05/27/how-to-override-sealed-function-in-c-revisited/">.NET Inside Out Part 4 — How to override sealed function in C# Revisited</a></li>
<li><a href="https://blog.adamfurmanek.pl/2018/03/24/generating-func-from-bunch-of-bytes-in-c/">.NET Inside Out Part 7 — Generating Func from a bunch of bytes in C#</a></li>
</ul>
</li>
<li><a href="https://www.red-gate.com/simple-talk/author/24200-simon-cooper/"><strong>Redgate ‘Simple Talk’ posts</strong></a> by <strong>Simon Cooper</strong>
<ul>
<li><a href="https://www.red-gate.com/Search/?s=%22Anatomy+of+a+.NET+Assembly%22&t=simpletalk">Series on ‘<strong>Anatomy of a .NET Assembly</strong>’</a> (<a href="https://www.google.co.uk/search?q=site%3Ahttps%3A%2F%2Fwww.red-gate.com%2Fsimple-talk%2F+%22Anatomy+of+a+.NET+Assembly%22&oq=site%3Ahttps%3A%2F%2Fwww.red-gate.com%2Fsimple-talk%2F+%22Anatomy+of+a+.NET+Assembly%22">Google search</a>)
<ul>
<li><a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-pe-headers/">PE Headers</a> (Intro)</li>
<li><a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-clr-metadata-1/">CLR metadata 1</a>, <a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-clr-metadata-2/">Part 2</a> and <a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-clr-metadata-3/">Part 3</a></li>
<li><a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-the-dos-stub/">The DOS stub</a> and <a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-the-clr-loader-stub/">The CLR Loader stub</a></li>
<li><a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-methods/">Methods</a> and <a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-type-forwards/">Type forwards</a></li>
</ul>
</li>
<li><a href="https://www.red-gate.com/Search/?s=%22Subterranean+IL%22&t=simpletalk">Series on ‘<strong>Subterranean IL</strong>’</a> (<a href="https://www.google.co.uk/search?q=site%3Ahttps%3A%2F%2Fwww.red-gate.com%2Fsimple-talk%2F+%22Subterranean+IL%22&oq=site%3Ahttps%3A%2F%2Fwww.red-gate.com%2Fsimple-talk%2F+%22Subterranean+IL%22">Google search</a>)
<ul>
<li><a href="https://www.red-gate.com/simple-talk/blogs/subterranean-il-introduction/">Introduction</a></li>
<li><a href="https://www.red-gate.com/simple-talk/blogs/subterranean-il-callvirt-and-virtual-methods/">Callvirt and virtual methods</a> and <a href="https://www.red-gate.com/simple-talk/blogs/subterranean-il-callvirt-and-generic-types/">Callvirt and generic types</a></li>
<li><a href="https://www.red-gate.com/simple-talk/blogs/subterranean-il-the-threadlocal-type/">The ThreadLocal type</a> and <a href="https://www.red-gate.com/simple-talk/blogs/subterranean-il-threadlocal-revisited/">ThreadLocal revisited</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="https://ayende.com"><strong>Ayende @ Rahien</strong></a> by <a href="https://twitter.com/ayende"><strong>Oren Eini</strong></a>
<ul>
<li><a href="https://ayende.com/blog/177986/de-virtualization-in-coreclr-part-i">De-virtualization in CoreCLR - Part I</a> and <a href="https://ayende.com/blog/177987/de-virtualization-in-coreclr-part-ii">Part II</a></li>
<li><a href="https://ayende.com/blog/174914/debugging-coreclr-applications-in-windbg">Debugging CoreCLR applications in WinDBG</a></li>
<li><a href="https://ayende.com/blog/174977/digging-into-the-coreclr-jit-introduction">Digging into the CoreCLR - JIT Introduction</a> (by <a href="https://twitter.com/federicolois">Federico Andres Lois</a>)</li>
<li><a href="https://ayende.com/blog/175009/digging-into-the-coreclr-exceptional-costs-part-i">Digging into the CoreCLR - Exceptional costs, Part I</a> and <a href="https://ayende.com/blog/175010/digging-into-the-coreclr-exceptional-costs-part-ii">Part II</a> (by <a href="https://twitter.com/federicolois">Federico Andres Lois</a>)</li>
</ul>
</li>
<li><a href="https://lowleveldesign.org"><strong>Low Level Design</strong></a> by <a href="https://twitter.com/lowleveldesign"><strong>Sebastian Solnica</strong></a> (he’s also done some <a href="https://lowleveldesign.org/presentations/">great presentations</a>)
<ul>
<li><a href="https://lowleveldesign.org/2010/10/11/writing-a-net-debugger-part-1-starting-the-debugging-session/">Writing a .Net Debugger</a>, also <a href="https://lowleveldesign.org/2010/10/22/writing-a-net-debugger-part-2-handling-events-and-creating-wrappers/">Part 2</a>, <a href="https://lowleveldesign.org/2010/11/08/writing-a-net-debugger-part-3-symbol-and-source-files/">Part 3</a> and <a href="https://lowleveldesign.org/2010/12/01/writing-a-net-debugger-part-4-breakpoints/">Part 4</a></li>
<li><a href="https://lowleveldesign.org/2018/08/15/randomness-in-net/">Randomness in .NET</a></li>
<li><a href="https://lowleveldesign.org/2016/08/23/enumerating-appdomains-in-a-remote-process/">Enumerating AppDomains in a remote process</a></li>
</ul>
</li>
<li><a href="https://ekasiswanto.wordpress.com/"><strong>Welcome to the Corner of Excellence</strong></a> by <a href="https://twitter.com/surya_rakanta"><strong>Eka Siswanto</strong></a> now hosted at <a href="https://excellentcorner.com/">https://excellentcorner.com/</a>
<ul>
<li><a href="https://excellentcorner.com/2018/06/21/how-to-perform-precise-breakpoint-on-net-method-in-windbg/">How to Perform Precise Breakpoint on .NET Method in WinDBG</a></li>
<li><a href="https://ekasiswanto.wordpress.com/2010/11/15/sos-internals-threads-command/">SOS Internals – threads Command</a></li>
<li><a href="https://ekasiswanto.wordpress.com/2010/11/17/sos-internals-dumpdomain-command/">SOS Internals – DumpDomain Command</a></li>
<li><a href="https://ekasiswanto.wordpress.com/2010/11/23/sos-internals-dumpmodule-command/">SOS Internals – DumpModule Command</a></li>
</ul>
</li>
<li><a href="http://blog.steveniemitz.com"><strong>Steve’s Tech Blog</strong></a> by <a href="https://twitter.com/steveniemitz"><strong>Steven Niemitz</strong></a>
<ul>
<li><a href="http://blog.steveniemitz.com/building-a-mixed-mode-stack-walker-part-1/">Building a mixed-mode stack walker - Part 1</a> and <a href="http://blog.steveniemitz.com/building-a-mixed-mode-stack-walker-part-2/">Part 2</a></li>
<li><a href="http://blog.steveniemitz.com/implementing-sos-with-spt-part-1-of-n-dumpobj/">Implementing SOS with SPT - Part 1 of N - <strong>DumpObj</strong></a>, <a href="http://blog.steveniemitz.com/implementing-sos-with-spt-part-2-of-n-dumpstackobjects/">Part 2 of N - <strong>DumpStackObjects</strong></a> and <a href="http://blog.steveniemitz.com/implementing-sos-with-spt-part-3-of-n-dumpmd-ip2md/">Part 3 of N - <strong>DumpMD & IP2MD</strong></a></li>
<li><a href="http://blog.steveniemitz.com/threads-cant-be-aborted-while-theyre-running-code-inside-a-catchfinally-block/">Threads can’t be aborted while they’re running code inside a catch/finally block</a></li>
</ul>
</li>
<li><a href="https://www.mode19.net/"><strong>Mode 13h</strong></a> by <a href="https://twitter.com/DustinMetzgar"><strong>Dustin Metzgar</strong></a> (author of <a href="https://www.manning.com/books/dotnet-core-in-action">.NET Core in Action</a>)
<ul>
<li><a href="https://www.mode19.net/posts/clrhostingold/">Hosting the CLR the <strong>Old</strong> Way</a></li>
<li><a href="https://www.mode19.net/posts/clrhostingright/">Hosting the CLR the <strong>Right</strong> Way</a></li>
</ul>
</li>
<li><a href="http://benbowen.blog"><strong>Ben Bowen’s Blog</strong></a> by <a href="https://twitter.com/Xenoprimate"><strong>Ben Bowen</strong></a>
<ul>
<li><a href="http://benbowen.blog/post/fun_with_makeref/">Fun With __makeref</a></li>
<li><a href="http://benbowen.blog/post/pinvoke_tips/">P/Invoke Tips</a></li>
<li><a href="http://benbowen.blog/post/tale_of_two_casts/#implementation_details">Postmortems - Tale of Two Casts</a></li>
</ul>
</li>
</ul>
<hr />
<h2 id="book-of-the-runtime-botr">Book of the Runtime (BotR)</h2>
<p>The BotR deserves its own section (thanks to <strong>svick</strong> for <a href="http://disq.us/p/1pkmyni">reminding me about it</a>).</p>
<p>If you haven’t heard of the BotR before, there’s a nice FAQ that <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/botr-faq.md#what-is-the-botr">explains what it is</a>:</p>
<blockquote>
<p>The Book of the Runtime is a set of documents that describe components in the CLR and BCL. They are intended to focus more on architecture and invariants and not an annotated description of the codebase.</p>
<p>It was originally created within Microsoft in ~2007, including this document. Developers were responsible to document their feature areas. This helped new devs joining the team and also helped share the product architecture across the team.</p>
</blockquote>
<p>To find your way around it, I recommend starting with the <a href="https://github.com/dotnet/coreclr/tree/master/Documentation/botr#the-book-of-the-runtime">table of contents</a> and then diving in.</p>
<p><strong>Note:</strong> It’s written for <em>developers working on the CLR</em>, so it’s not an introductory document. I’d recommend reading some of the other blog posts first, then referring to the BotR once you have the basic knowledge. For instance many of my blog posts started with me reading a chapter from the BotR, not fully understanding it, going away and learning some more, writing up what I found and then pointing people to the relevant BotR page for more information.</p>
<hr />
<h2 id="microsoft-engineers">Microsoft Engineers</h2>
<p>The blogs below are written by the <em>actual</em> engineers who worked on, designed or managed various parts of the CLR, so they give a deep insight (again, if I’ve missed any blogs out, please let me know):</p>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/maoni"><strong>Maoni’s WebLog - CLR Garbage Collector</strong></a> by <a href="https://channel9.msdn.com/Shows/On-NET/Maoni-Stephens-on-NET-GC"><strong>Maoni Stephens</strong></a>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/maoni/2006/06/07/suspending-and-resuming-threads-for-gc/">Suspending and resuming threads for GC</a></li>
<li><a href="https://blogs.msdn.microsoft.com/maoni/2015/07/15/allocating-on-the-stack-or-the-heap/">Allocating on the stack or the heap?</a></li>
<li><a href="https://blogs.msdn.microsoft.com/maoni/2006/04/19/large-object-heap/">Large Object Heap</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/cbrumme/"><strong>cbrumme’s WebLog</strong></a> by <a href="https://channel9.msdn.com/Search?term=Christopher%20Brumme#ch9Search&lang-en=en&pubDate=all"><strong>Christopher Brumme</strong></a>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/cbrumme/2003/05/17/memory-model/">Memory Model</a></li>
<li><a href="https://blogs.msdn.microsoft.com/cbrumme/2003/05/10/value-types/">Value Types</a></li>
<li><a href="https://blogs.msdn.microsoft.com/cbrumme/2003/04/25/virtual-and-non-virtual/">Virtual and non-virtual</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/abhinaba"><strong>A blog on coding, .NET, .NET Compact Framework and life in general..</strong></a> by <strong>Abhinaba Basu</strong>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/abhinaba/2014/09/29/net-just-in-time-compilation-and-warming-up-your-system/">.NET Just in Time Compilation and Warming up Your System</a></li>
<li><a href="https://blogs.msdn.microsoft.com/abhinaba/2008/04/30/trivia-how-does-clr-create-an-outofmemoryexception/">Trivia: How does CLR create an OutOfMemoryException</a></li>
<li><a href="https://blogs.msdn.microsoft.com/abhinaba/2009/01/25/back-to-basic-series-on-dynamic-memory-management/">Back to basic: Series on dynamic memory management</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/joelpob"><strong>Joel Pobar’s CLR weblog - CLR Program Manager: Reflection, LCG, Generics and the type system..</strong></a> by <a href="https://channel9.msdn.com/Events/Speakers/Joel-Pobar"><strong>Joel Pobar</strong></a>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/joelpob/2004/07/19/clr-type-system-notes/">CLR Type System notes</a></li>
<li><a href="https://blogs.msdn.microsoft.com/joelpob/2004/11/17/clr-generics-and-code-sharing/">CLR Generics and code sharing</a></li>
<li><a href="https://blogs.msdn.microsoft.com/joelpob/2004/02/26/explanatory-notes-on-rotors-garbage-collector/">Explanatory notes on Rotor’s Garbage Collector</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/davbr/"><strong>CLR Profiling API Blog - Info about the Common Language Runtime’s Profiling API</strong></a> by <a href="https://channel9.msdn.com/Search?term=David%20Broman#pubDate=all&ch9Search&lang-en=en"><strong>David Broman</strong></a> (slightly niche, but still worth a read)
<ul>
<li><a href="https://blogs.msdn.microsoft.com/davbr/2007/03/06/creating-an-il-rewriting-profiler/">Creating an IL-rewriting profiler</a></li>
<li><a href="https://blogs.msdn.microsoft.com/davbr/2009/09/30/type-forwarding/">Type Forwarding</a></li>
<li><a href="https://blogs.msdn.microsoft.com/davbr/2011/10/17/metadata-tokens-run-time-ids-and-type-loading/">Metadata Tokens, Run-Time IDs, and Type Loading</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/yunjin"><strong>Yun Jin’s WebLog CLR internals, Rotor code explanation, CLR debugging tips, trivial debugging notes, .NET programming pitfalls</strong></a> by <a href="https://social.msdn.microsoft.com/profile/Yun+Jin"><strong>Yun Jin</strong></a>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/yunjin/2004/02/09/fcall-and-gc-hole-first-post-about-rotor/">FCall and GC hole – first post about Rotor</a></li>
<li><a href="https://blogs.msdn.microsoft.com/yunjin/2005/07/05/special-threads-in-clr/">Special threads in CLR</a></li>
<li><a href="https://blogs.msdn.microsoft.com/yunjin/2004/02/21/dangerous-pinvokes-string-modification/">Dangerous PInvokes – string modification</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/clrcodegeneration"><strong>JIT, NGen, and other Managed Code Generation Stuff - Details about RyuJIT stuff of all sort..</strong></a> by various
<ul>
<li><a href="https://blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/array-bounds-check-elimination-in-the-clr/">Array Bounds Check Elimination in the CLR</a></li>
<li><a href="https://blogs.msdn.microsoft.com/clrcodegeneration/2007/11/02/how-are-value-types-implemented-in-the-32-bit-clr-what-has-been-done-to-improve-their-performance/">How are value types implemented in the 32-bit CLR? What has been done to improve their performance?</a></li>
<li><a href="https://blogs.msdn.microsoft.com/clrcodegeneration/2009/10/21/jit-etw-inlining-event-fail-reasons/">JIT ETW Inlining Event Fail Reasons</a></li>
<li><a href="https://blogs.msdn.microsoft.com/clrcodegeneration/2010/04/27/ngen-measuring-working-set-with-vmmap/">NGen: Measuring Working Set with VMMap</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/carlos"><strong>Distributed Matters - Troubleshooting issues in technologies available to developers for building distributed applications</strong></a> by <a href="https://blogs.msdn.microsoft.com/carlos/author/carcolo/"><strong>Carlo</strong></a>
<ul>
<li><a href="https://blogs.msdn.microsoft.com/carlos/2009/11/09/net-generics-and-code-bloat-or-its-lack-thereof/">.NET Generics and Code Bloat (or its lack thereof)</a></li>
<li><a href="https://blogs.msdn.microsoft.com/carlos/2008/12/10/heap-corruption-a-case-study/">Heap Corruption: A Case Study</a></li>
<li><a href="https://blogs.msdn.microsoft.com/carlos/2013/08/23/loading-multiple-clr-runtimes-inproc-sxs-sample-code/">Loading multiple CLR Runtimes (InProc SxS) – Sample Code</a></li>
</ul>
</li>
<li><a href="http://bartdesmet.net/blogs/bart/archive/2006/09/27/4472.aspx"><strong>B# .NET Blog - BART DE SMET’S on-line blog (0X2B | ~0X2B, THAT’S THE QUESTION)</strong></a> by <a href="https://channel9.msdn.com/Events/Speakers/Bart-De-Smet"><strong>Bart De Smet</strong></a>
<ul>
<li><a href="http://bartdesmet.net/blogs/bart/archive/2006/09/27/4472.aspx">.NET 2.0 string interning inside out</a></li>
<li><a href="http://bartdesmet.net/blogs/bart/archive/2007/02/19/inlining-yes-it-happens.aspx">Inlining - yes, it happens</a></li>
<li><a href="http://bartdesmet.net/blogs/bart/archive/2006/09/07/4395.aspx">Going Unsafe - An ADDRESSOF Operator in C#</a></li>
<li><a href="http://bartdesmet.net/blogs/bart/archive/2006/10/03/4491.aspx">A Beginner’s Guide to Cordbg</a></li>
</ul>
</li>
<li><a href="https://natemcmaster.com"><strong>Nate McMaster’s blog</strong></a> by <a href="https://twitter.com/natemcmaster"><strong>Nate McMaster</strong></a>
<ul>
<li><a href="https://natemcmaster.com/blog/2017/12/21/netcore-primitives/">Deep-dive into .NET Core primitives: deps.json, runtimeconfig.json, and dll’s</a></li>
<li><a href="https://natemcmaster.com/blog/2018/08/29/netcore-primitives-2/">Deep-dive into .NET Core primitives, part 2: the shared framework</a></li>
<li><a href="https://natemcmaster.com/blog/2018/07/25/netcore-plugins/">.NET Core Plugins</a></li>
</ul>
</li>
</ul>
<hr />
<h2 id="books">Books</h2>
<p>Finally, if you prefer reading off-line there are some decent books that discuss .NET Internals (Note: all links are Amazon Affiliate links):</p>
<ul>
<li><a href="http://amzn.to/2Ba0ytN">CLR via C#, 4ed by <strong>Jeffrey Richter</strong></a></li>
<li><a href="http://amzn.to/2DcscYY">Shared Source CLI Essentials Paperback by <strong>David Stutz, Ted Neward, Geoff Shilling</strong></a> (Ted Neward also made a pdf version available to <a href="http://www.newardassociates.com/files/SSCLI2.pdf">download from his web site</a>)</li>
<li><a href="http://amzn.to/2EOFX0e">Writing High-Performance .NET Code Paperback by <strong>Ben Watson</strong></a>
<ul>
<li>His <a href="http://www.philosophicalgeek.com">blog</a> is also worth reading, e.g. <a href="http://www.philosophicalgeek.com/2014/09/29/digging-into-net-object-allocation-fundamentals/">Digging Into .NET Object Allocation Fundamentals</a> and <a href="http://www.philosophicalgeek.com/2014/11/20/digging-into-net-loop-performance-bounds-checking-iteration-and-unrolling/">Digging Into .NET Loop Performance, Bounds-checking, Iteration, and Unrolling</a></li>
</ul>
</li>
<li><a href="http://amzn.to/2Djtplh">Pro .NET Performance: Optimize Your C# Applications by <strong>Sasha Goldshtein</strong></a></li>
</ul>
<p>I own copies of all the books listed above and I’ve read them cover-to-cover; they’re fantastic resources.</p>
<p>I’ve also recently been recommended the 2 books below. They look good and the authors certainly know their stuff, but I haven’t read them yet:</p>
<ul>
<li><a href="http://amzn.to/2ERV6Ol">The Common Language Infrastructure Annotated Standard by <strong>James S. Miller, Susann Ragsdale</strong></a></li>
<li><a href="http://amzn.to/2Dm1yAV">Essential .NET, Volume I: The Common Language Runtime by <strong>Don Box, Chris Sells</strong></a></li>
</ul>
<p><strong>*New Release*</strong></p>
<ul>
<li><a href="https://amzn.to/2PA50Jp">Pro .NET Memory Management: For Better Code, Performance, and Scalability by <strong>Konrad Kokosa</strong></a> (Nov 2018)</li>
</ul>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=16212220">HackerNews</a> and <a href="https://www.reddit.com/r/programming/comments/7s7rkq/resources_for_learning_about_net_internals/">/r/programming</a></p>
A look back at 20172017-12-31T00:00:00+00:00http://www.mattwarren.org/2017/12/31/A-look-back-at-2017
<p>I’ve now been blogging consistently for over 2 years (~2 times per month) and I decided it was time for my first ‘retrospective’ post.</p>
<p><strong style="color:red">Warning:</strong> this post contains a large number of <a href="https://www.urbandictionary.com/define.php?term=humblebrag"><strong>humble brags</strong></a>. If you’ve come here to read about <a href="/tags/#Internals">‘<em>.NET internals</em>’</a> you’d better check back in a few weeks, when normal service will be resumed!</p>
<hr />
<h2 id="overall-stats">Overall Stats</h2>
<p>Firstly, let’s look at my Google Analytics stats for 2017, showing <strong>Page Views</strong> and <strong>Sessions</strong>:</p>
<p><a href="/images/2017/12/Blog - Page Views & Sessions - 2017.png"><img src="/images/2017/12/Blog - Page Views & Sessions - 2017.png" alt="Blog - Page Views & Sessions - 2017" /></a></p>
<p>Which clearly shows that I took a bit of a break during the summer! But I still managed over 800K page views, mostly because I was fortunate enough to end up on the <a href="https://hn.algolia.com/?query=mattwarren.org%2F2017&sort=byPopularity&prefix&page=0&dateRange=pastYear&type=story">front page of HackerNews a few times</a>!</p>
<p>As a comparison, here’s what ‘2017 v 2016’ looks like:</p>
<p><a href="/images/2017/12/Blog - Page Views - 2016 v 2017.png"><img src="/images/2017/12/Blog - Page Views - 2016 v 2017.png" alt="Blog - Page Views - 2016 v 2017" /></a></p>
<p>This is cool because it shows a nice trend: more people read my blog posts in 2017 than in 2016 (but I have no idea if it will continue in 2018?!)</p>
<hr />
<h2 id="most-read-posts">Most Read Posts</h2>
<p>Next, here are my <strong>top 10 most read</strong> posts. Surprisingly enough, my most read post was literally just a list with 68 entries in it!!</p>
<table>
<thead>
<tr>
<th>Post</th>
<th style="text-align: right">Page Views</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/">The 68 things the CLR does before executing a single line of your code</a></td>
<td style="text-align: right">101,382</td>
</tr>
<tr>
<td><a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/">A Hitchhikers Guide to the CoreCLR Source Code</a></td>
<td style="text-align: right">61,169</td>
</tr>
<tr>
<td><a href="/2017/11/08/A-DoS-Attack-against-the-C-Compiler/">A DoS Attack against the C# Compiler</a></td>
<td style="text-align: right">50,884</td>
</tr>
<tr>
<td><a href="/2017/10/12/Analysing-C-code-on-GitHub-with-BigQuery/">Analysing C# code on GitHub with BigQuery</a></td>
<td style="text-align: right">40,165</td>
</tr>
<tr>
<td><a href="/2017/05/19/Adding-a-new-Bytecode-Instruction-to-the-CLR/">Adding a new Bytecode Instruction to the CLR</a></td>
<td style="text-align: right">39,101</td>
</tr>
<tr>
<td><a href="/2017/12/19/Open-Source-.Net-3-years-later">Open Source .NET – 3 years later</a></td>
<td style="text-align: right">36,316</td>
</tr>
<tr>
<td><a href="/2017/01/25/How-do-.NET-delegates-work/">How do .NET delegates work?</a></td>
<td style="text-align: right">36,047</td>
</tr>
<tr>
<td><a href="/2017/05/25/Lowering-in-the-C-Compiler/">Lowering in the C# Compiler (and what happens when you misuse it)</a></td>
<td style="text-align: right">34,375</td>
</tr>
<tr>
<td><a href="/2017/06/15/How-the-.NET-Rutime-loads-a-Type/">How the .NET Runtime loads a Type</a></td>
<td style="text-align: right">32,813</td>
</tr>
<tr>
<td><a href="/2017/10/19/DotNetAnywhere-an-Alternative-.NET-Runtime/">DotNetAnywhere: An Alternative .NET Runtime</a></td>
<td style="text-align: right">26,140</td>
</tr>
</tbody>
</table>
<hr />
<h2 id="traffic-sources">Traffic Sources</h2>
<p>I was going to do a write-up on where/how I get my blog traffic, but instead I’d encourage you to read <a href="https://henrikwarne.com/2017/11/26/6-years-of-thoughts-on-programming">6 Years of Thoughts on Programming</a> by <a href="https://twitter.com/henrikwarne">Henrik Warne</a> as his experience <strong>exactly</strong> matches mine. But in summary, getting onto the front-page of <a href="http://news.ycombinator.com/">HackerNews</a> drives <strong>a lot</strong> of traffic to your site/blog.</p>
<hr />
<p><strong>Finally, a big thanks to everyone who has read, commented on or shared my blog posts, it means a lot!!</strong></p>
Open Source .NET – 3 years later2017-12-19T00:00:00+00:00http://www.mattwarren.org/2017/12/19/Open-Source-.Net-3-years-later.
<link rel="stylesheet" href="/datavis/dotnet-oss.css" />
<script src="/datavis/dotnet-oss.js" type="text/javascript"></script>
<p>A little over 3 years ago Microsoft announced that they were <a href="http://www.hanselman.com/blog/AnnouncingNET2015NETAsOpenSourceNETOnMacAndLinuxAndVisualStudioCommunity.aspx">open sourcing large parts of the .NET framework</a> and as <a href="https://twitter.com/shanselman">Scott Hanselman</a> said in his <a href="https://channel9.msdn.com/Events/Connect/2016/Keynotes-Scott-Guthrie-and-Scott-Hanselman">Connect 2016 keynote</a>, the community has been contributing in a significant way:</p>
<p><a href="https://twitter.com/poweredbyaltnet/status/798942478195970048"><img src="/images/2016/11/Over 60 of the contributions to dotnetcore come from the community.jpg" alt="Over 60% of the contributions to .NET Core come from the community" /></a></p>
<p>This post forms part of an ongoing series; if you want to see how things have changed over time you can check out the previous ones:</p>
<ul>
<li><a href="/2016/11/23/open-source-net-2-years-later/?recommended=1">Open Source .NET – 2 years later</a></li>
<li><a href="/2016/01/15/open-source-net-1-year-later-now-with-aspnet/?recommended=1">Open Source .NET – 1 year later - Now with ASP.NET</a></li>
<li><a href="/2015/12/08/open-source-net-1-year-later/?recommended=1">Open Source .NET – 1 year later</a></li>
</ul>
<p>In addition, I’ve recently done a talk <a href="/2017/11/14/Microsoft-and-Open-Source-a-Brave-New-World-CORESTART/">covering this subject</a>, the slides are below:</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/bSYyRobLw3jMLq" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen=""> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/mattwarren/microsoft-open-source-a-brave-new-world-corestart-20" title="Microsoft & open source a 'brave new world' - CORESTART 2.0" target="_blank">Microsoft & open source a 'brave new world' - CORESTART 2.0</a> </strong> from <strong><a href="https://www.slideshare.net/mattwarren" target="_blank">Matt Warren</a></strong> </div>
<hr />
<h3 id="historical-perspective">Historical Perspective</h3>
<p>Now that we are 3 years down the line, it’s interesting to go back and see what the aims were when it all started. If you want to know more about this, I recommend watching the 2 Channel 9 videos below, made by the Microsoft Engineers involved in the process:</p>
<ul>
<li><a href="https://channel9.msdn.com/Blogs/dotnet/NET-Foundations-2015-02-25">.NET Internals 2015-02-25: Open Source</a></li>
<li><a href="https://channel9.msdn.com/Blogs/dotnet/NET-Foundations-2015-03-04">.NET Internals 2015-03-04: .NET Core & Cross Platform</a></li>
</ul>
<p>It hasn’t always been plain sailing; it’s fair to say that there have been a few bumps along the way (I guess that’s what happens if you get to see <a href="https://english.stackexchange.com/questions/120739/a-peek-into-the-sausage-factory">“how the sausage gets made”</a>), but I think that we’ve ended up in a good place.</p>
<p>During the past 3 years there have been a few notable events that I think are worth mentioning:</p>
<ul>
<li>Samsung developers have made <a href="https://github.com/dotnet/coreclr/issues/8496#issuecomment-351463875">significant contributions to the CoreCLR source code</a>, to support their Tizen OS</li>
<li>Microsoft really are developing ‘out in the open’, you can see this by how often <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=%22https%3A%2F%2Fgithub.com%2Fdotnet%2Fcoreclr%22+language%3AC%2B%2B+language%3AC%23&type=Code">GitHub issues are referenced</a> in the source code</li>
<li>We saw the <a href="https://msdn.microsoft.com/en-us/magazine/mt814808">new Span<T> APIs</a> move their way through the various repos, <a href="https://github.com/dotnet/corefxlab/search?q=Span&type=Commits&utf8=%E2%9C%93">CoreFXLab</a> -> <a href="https://github.com/dotnet/coreclr/search?q=Span&type=Commits&utf8=%E2%9C%93">CoreCLR</a> -> <a href="https://github.com/dotnet/roslyn/search?q=Span&type=Commits&utf8=%E2%9C%93">Roslyn</a> -> <a href="https://github.com/dotnet/corefx/search?q=Span&type=Commits&utf8=%E2%9C%93">CoreFX</a> before turning into a complete feature!</li>
<li>There’s been deeper integration between <a href="https://github.com/dotnet/corefx/issues/25379">.NET Core and Mono</a></li>
<li>Significant Performance Improvements <a href="https://blogs.msdn.microsoft.com/dotnet/2017/06/07/performance-improvements-in-net-core/">have been made in .NET Core</a></li>
<li>.NET Core and .NET Desktop have <a href="https://github.com/dotnet/coreclr/pull/9044#issuecomment-274543630">now sufficiently diverged</a> (even though they still share code, such as JIT, GC)</li>
<li>Microsoft have made a concerted effort to ensure that all their Open Source code can be built <a href="https://github.com/dotnet/coreclr/issues/14345">just using other Open Source code</a></li>
<li>The <a href="https://github.com/dotnet/coreclr/projects/3">Local GC</a> effort has been started, aiming to ‘decouple the GC from the rest of the runtime’</li>
<li>.NET will finally be getting <a href="/2017/12/15/How-does-.NET-JIT-a-method-and-Tiered-Compilation/">Tiered Compilation</a></li>
</ul>
<hr />
<h3 id="repository-activity-over-time">Repository activity over time</h3>
<p>But onto the data, first we are going to look at an overview of the <strong>level of activity in each repo</strong>, by looking at the total number of ‘<strong>Issues</strong>’ (created) or ‘<strong>Pull Requests</strong>’ (closed) per month. (<a href="http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR">yay sparklines FTW!!</a>). If you are interested in <em>how</em> I got the data, see the previous post <a href="/2016/11/23/open-source-net-2-years-later#methodology---community-v-microsoft">because the process is the same</a>.</p>
<p><strong>Note:</strong> Numbers in <span style="color:rgb(0,0,0);font-weight:bold;">black</span> are from the most recent month, with the <span style="color:#d62728;font-weight:bold;">red</span> dot showing the lowest and the <span style="color:#2ca02c;font-weight:bold;">green</span> dot the highest previous value. You can toggle between <strong>Issues</strong> and <strong>Pull Requests</strong> by clicking on the buttons, hover over individual sparklines to get a tooltip showing the per-month values, and click on the project name to go to the GitHub page for that repository.</p>
<section class="press" align="center">
<!-- <section class="gradient" align="center"> -->
<button id="btnIssues" class="active">Issues</button>
<button id="btnPRs">Pull Requests</button>
</section>
<div id="textbox" class="rChartHeader">
<!-- The Start/End dates are setup dynamically, once the data is loaded -->
<p id="dataStartDate" class="alignleft"></p>
<p id="dataEndDate" class="alignright"></p>
</div>
<div style="clear: both;"></div>
<!-- All the sparklines are added to this div -->
<div id="sparkLines" class="rChart nvd3">
</div>
<p>This data gives a good indication of how healthy the different repos are: are they growing over time, or staying the same? You can also see the different levels of activity each repo has and how they compare to the others.</p>
<p>Whilst it’s clear that <a href="https://github.com/microsoft/vscode">Visual Studio Code</a> is way ahead of all the other repos in terms of ‘Issues’, it’s interesting to see that the .NET-only ones have the most ‘Pull-Requests’, notably CoreFX (Base Class Libraries), Roslyn (Compiler) and CoreCLR (Runtime).</p>
<hr />
<h3 id="overall-participation---community-v-microsoft">Overall Participation - Community v. Microsoft</h3>
<p>Next we will look at the <strong>total participation from the last 3 years</strong>, i.e. <strong>November 2014</strong> to <strong>November 2017</strong>. All Pull Requests and Issues are treated equally, so a large PR counts the same as one that fixes a spelling mistake. Whilst this isn’t ideal, it’s the simplest way to get an idea of the <strong>Microsoft/Community split</strong>.</p>
<p><strong>Note:</strong> You can hover over the bars to get the actual numbers, rather than percentages.</p>
<body>
<!-- TODO do this in css styles, not inline!! -->
<div class="g-chart-issues">
<span style="font-weight:bold;font-size:large;margin-left:150px;"> Issues: </span>
<span style="color:#9ecae1;font-weight:bold;font-size:large;margin-left:5px;"> Microsoft </span>
<span style="color:#3182bd;font-weight:bold;font-size:large;margin-left:5px;"> Community </span>
</div>
<div class="g-chart-pull-requests">
<span style="font-weight:bold;font-size:large;margin-left:150px;"> Pull Requests: </span>
<span style="color:#a1d99b;font-weight:bold;font-size:large;margin-left:5px;"> Microsoft </span>
<span style="color:#31a354;font-weight:bold;font-size:large;margin-left:5px;"> Community </span>
</div>
</body>
<hr />
<h3 id="participation-over-time---community-v-microsoft">Participation over time - Community v. Microsoft</h3>
<p>Finally we can see the ‘per-month’ data from the last 3 years, i.e. <strong>November 2014</strong> to <strong>November 2017</strong>.</p>
<p><strong>Note</strong>: You can inspect different repos by selecting them from the pull-down list, but be aware that the y-axes of the graphs are re-scaled, so the maximum value will change each time.</p>
<div id="issuesGraph">
<!-- TODO do this in css styles, not inline!! -->
<span style="font-weight:bold;font-size:larger;margin-left:30px;"> Issues: </span>
<span style="color:#9ecae1;font-weight:bold;font-size:larger;margin-left:5px;"> Microsoft </span>
<span style="color:#3182bd;font-weight:bold;font-size:larger;margin-left:5px;"> Community </span>
<!-- <form>
<label><input type="radio" name="mode" value="stacked" checked> Stacked</label>
<label><input type="radio" name="mode" value="grouped"> Grouped</label>
</form> -->
</div>
<div id="pullRequestsGraph">
<!-- TODO do this in css styles, not inline!! -->
<span style="font-weight:bold;font-size:larger;margin-left:30px;"> Pull Requests: </span>
<span style="color:#a1d99b;font-weight:bold;font-size:larger;margin-left:5px;"> Microsoft </span>
<span style="color:#31a354;font-weight:bold;font-size:larger;margin-left:5px;"> Community </span>
<!-- <form>
<label><input type="radio" name="mode" value="stacked" checked> Stacked</label>
<label><input type="radio" name="mode" value="grouped"> Grouped</label>
</form> -->
</div>
<hr />
<h2 id="summary">Summary</h2>
<p>It’s clear that the community continues to be invested in the .NET-related, Open Source repositories, contributing significantly and for a sustained period of time. I think this is good for <em>all .NET developers</em>, whether you contribute to OSS or not, having .NET be a <strong>thriving, Open Source product</strong> has many benefits!</p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=15998856">Hacker News</a> and <a href="https://www.reddit.com/r/programming/comments/7lh19z/open_source_net_3_years_later/">/r/programming</a></p>
A look at the internals of 'Tiered JIT Compilation' in .NET Core2017-12-15T00:00:00+00:00http://www.mattwarren.org/2017/12/15/How-does-.NET-JIT-a-method-and-Tiered-Compilation
<p>The .NET runtime (CLR) has predominantly used a just-in-time (JIT) compiler to convert your executable into machine code (leaving aside <a href="https://github.com/dotnet/corert/">ahead-of-time (AOT) scenarios</a> for the time being), as the <a href="https://docs.microsoft.com/en-us/dotnet/standard/managed-execution-process">official Microsoft docs say</a>:</p>
<blockquote>
<p>At execution time, <strong>a just-in-time (JIT) compiler translates the MSIL into native code</strong>. During this compilation, code must pass a verification process that examines the MSIL and metadata to find out whether the code can be determined to be type safe.</p>
</blockquote>
<p><strong>But how does that process actually work?</strong></p>
<p>The same docs <a href="https://docs.microsoft.com/en-us/dotnet/standard/managed-execution-process">give us a bit more info</a>:</p>
<blockquote>
<p>JIT compilation takes into account the possibility that some code might never be called during execution. Instead of using time and memory to convert all the MSIL in a PE file to native code, it converts the MSIL as needed during execution and stores the resulting native code in memory so that it is accessible for subsequent calls in the context of that process. The loader <strong>creates and attaches a stub to each method</strong> in a type when the type is loaded and initialized. When a method is called for the first time, <strong>the stub passes control to the JIT compiler</strong>, which converts the MSIL for that method into native code and <strong>modifies the stub to point directly to the generated native code</strong>. Therefore, subsequent calls to the JIT-compiled method go directly to the native code.</p>
</blockquote>
<p><strong>Simple really!!</strong> However if you want to know more, the rest of this post will explore this process in detail.</p>
<p>In addition, we will look at a <strong>new feature that is making its way into the Core CLR</strong>, called ‘<strong>Tiered Compilation</strong>’. This is a big change for the CLR: up till now .NET methods have only been JIT compiled once, on their first usage. Tiered compilation is looking to change that, allowing methods to be re-compiled into a more optimised version, much like <a href="http://www.oracle.com/technetwork/java/whitepaper-135217.html">the Java Hotspot compiler</a>.</p>
<hr />
<h2 id="how-it-works">How it works</h2>
<p>But before we look at future plans, <strong>how does the current CLR allow the JIT to transform a method from IL to native code</strong>? Well, they say ‘a picture speaks a thousand words’:</p>
<h4 id="before-the-method-is-jited"><strong>Before the method is JITed</strong></h4>
<p><img src="/images/2017/12/01 - Before JITing.svg" alt="Step 1 - Before JITing" /></p>
<h4 id="after-the-method-has-been-jited"><strong>After the method has been JITed</strong></h4>
<p><img src="/images/2017/12/02 - After JITing - Normal.svg" alt="Step 2 - After JITing - Normal" /></p>
<p>The main things to note are:</p>
<ul>
<li>The CLR has put in a ‘precode’ and ‘stub’ to divert the initial method call to the <code class="language-plaintext highlighter-rouge">PreStubWorker()</code> method (which ultimately calls the JIT). These are hand-written assembly code fragments consisting of only a few instructions.</li>
<li>Once the method has been JITed into ‘native code’, a stable entry point is created. For the rest of the life-time of the method the CLR guarantees that this won’t change, so the rest of the run-time can depend on it remaining stable.</li>
<li>The ‘temporary entry point’ doesn’t go away, it’s still available because there may be other methods that are expecting to call it. However the associated ‘precode fixup’ has been re-written or ‘back patched’ to point to the newly created ‘native code’ instead of <code class="language-plaintext highlighter-rouge">PreStubWorker()</code>.</li>
<li>The CLR doesn’t change the address of the <code class="language-plaintext highlighter-rouge">call</code> instruction in the method that called the method being JITed; it only changes the address inside the ‘precode’. But because all method calls in the CLR go via a precode, the 2nd time the newly JITed method is called, the call will end up at the ‘native code’.</li>
</ul>
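<p>To make the back-patching idea a bit more concrete, here is a minimal C++ sketch of it. It is only an analogy: the real precode is a handful of machine instructions that get rewritten, whereas here it is modelled as a function pointer held in a struct. The names <code>Precode</code>, <code>PrecodeTarget</code>, <code>PreStub</code>, <code>SquareNative</code> and <code>CallSquare</code> are all made up for this example and don’t appear in the CoreCLR source.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for a method's 'precode': a patchable slot that
// every caller goes through. Not taken from the CoreCLR source.
struct Precode;
using PrecodeTarget = int (*)(Precode&, int);

struct Precode {
    PrecodeTarget target;  // where the precode currently jumps to
    int jitCount;          // how many times the pretend JIT has run
};

// The 'native code' our pretend JIT produces for `int Square(int x)`.
int SquareNative(Precode&, int x) { return x * x; }

// Plays the role of the prestub: "compiles" the method, back-patches the
// precode so that later calls skip this function, then runs the new code.
int PreStub(Precode& self, int x) {
    ++self.jitCount;
    self.target = &SquareNative;  // the 'back patch'
    return self.target(self, x);
}

// Callers never change: they always call through the precode.
int CallSquare(Precode& precode, int x) {
    assert(precode.target != nullptr);
    return precode.target(precode, x);
}
```

<p>If you start with <code>Precode p{&PreStub, 0};</code>, the first <code>CallSquare(p, 3)</code> goes via <code>PreStub</code>, which patches the slot. Every later call lands directly on <code>SquareNative</code>, and the calling code never had to be modified.</p>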
<p>For reference, the ‘stable entry point’ is the same memory location as the <code class="language-plaintext highlighter-rouge">IntPtr</code> that is returned when you call the <a href="https://msdn.microsoft.com/en-us/library/system.runtimemethodhandle.getfunctionpointer%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">RuntimeMethodHandle.GetFunctionPointer() method</a>.</p>
<p>If you want to see this process in action for yourself, you can either re-compile the CoreCLR source and add the relevant debug information as I did <strong>or</strong> just use WinDbg and follow the steps <a href="https://blogs.msdn.microsoft.com/abhinaba/2014/09/29/net-just-in-time-compilation-and-warming-up-your-system/">in this excellent blog post</a> (for more on the same topic see <a href="https://blog.matthewskelton.net/2012/01/29/advanced-call-processing-in-the-clr/">‘Advanced Call Processing in the CLR’</a> and Vance Morrison’s excellent write-up <a href="https://blogs.msdn.microsoft.com/vancem/2006/03/13/digging-into-interface-calls-in-the-net-framework-stub-based-dispatch/">‘Digging into interface calls in the .NET Framework: Stub-based dispatch’</a>).</p>
<p>Finally, the different parts of the Core CLR source code that are involved are listed below:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/inc/jithelpers.h#L295-L299">JIT Helpers for ‘PrecodeFixupThunk’</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/i386/asmhelpers.asm#L888-L907">PrecodeFixupThunk (i386 assembly)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/i386/asmhelpers.asm#L1739-L1769">ThePreStub (i386 assembly)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/prestub.cpp#L1027-L1140">PreStubWorker(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/prestub.cpp#L1178-L1707">MethodDesc::DoPrestub(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/prestub.cpp#L67-L193">MethodDesc::DoBackpatch(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/method.cpp#L5170-L5189">MethodDesc::SetStableEntryPointInterlocked(..)</a></li>
</ul>
<p><strong>Note:</strong> this post isn’t going to look at how the JIT itself works. If you are interested in that, take a look at this <a href="https://github.com/CarolEidt/coreclr/blob/master/Documentation/botr/ryujit-tutorial.md#ryujit-high-level-overview">excellent overview</a> written by one of the main developers.</p>
<hr />
<h3 id="jit-and-execution-engine-ee-interaction">JIT and Execution Engine (EE) Interaction</h3>
<p>To make all this work the JIT and the EE have to work together. To get an idea of what is involved, take a look at this comment describing the <a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/inc/corinfo.h#L1426-L1514">rules that determine which type of precode the JIT can use</a>. All this info is stored in the EE as it’s the only place that has the full knowledge of what a method does, so the JIT has to ask which mode to work in.</p>
<p>In addition, the JIT has to ask the EE what the address of a function’s entry point is; this is done via the following methods:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/jitinterface.cpp#L8872-L8923">CEEInfo::getFunctionEntryPoint(..)</a>
<ul>
<li>Then calls <a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/method.cpp#L2218-L2324">MethodDesc::TryGetMultiCallableAddrOfCode(..)</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/jitinterface.cpp#L8925-L8955">CEEInfo::getFunctionFixedEntryPoint(..)</a>
<ul>
<li>Then calls <a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/method.cpp#L2187-L2209">MethodDesc::GetMultiCallableAddrOfCode(..)</a></li>
</ul>
</li>
</ul>
<hr />
<h3 id="precode-and-stubs">Precode and Stubs</h3>
<p>There are different types of ‘precode’ available, ‘FIXUP’, ‘REMOTING’ or ‘STUB’, and you can see the rules for which one is used in <a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/method.cpp#L5747-L5773">MethodDesc::GetPrecodeType()</a>. In addition, because they are such a low-level mechanism, they are implemented differently across CPU architectures, from a <a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/method.hpp#L2023-L2035">comment in the code</a>:</p>
<blockquote>
<p>There are two implementation options for temporary entrypoints:</p>
<p>(1) Compact entrypoints. They provide as dense entrypoints as possible, but can’t be patched to point to the final code. The call to unjitted method is indirect call via slot.</p>
<p>(2) Precodes. The precode will be patched to point to the final code eventually, thus the temporary entrypoint can be embedded in the code.
The call to unjitted method is direct call to direct jump.</p>
<p>We use (1) for x86 and (2) for 64-bit to get the best performance on each platform. For ARM (1) is used.</p>
</blockquote>
<p>There’s also a whole lot more information about ‘precode’ available <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode">in the BOTR</a>.</p>
<p>Finally, it turns out that you can’t go very far into the internals of the CLR without coming across ‘stubs’ (or ‘trampolines’, ‘thunks’, etc), for instance they’re used in:</p>
<ul>
<li><a href="http://mattwarren.org/2017/01/25/How-do-.NET-delegates-work/#creation-of-the-delegate-invoke-method">Delegates</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md#stubs">Virtual Method (Interface) Dispatch</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/jump-stubs.md">Jump Stubs</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/Documentation/botr/clr-abi.md#generics">Shared Generics</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/release/2.0.0/src/vm/stubmgr.cpp#L1360-L1388">Dll Import callbacks</a></li>
<li>and probably some more I’ve missed!</li>
</ul>
<hr />
<h2 id="tiered-compilation">Tiered Compilation</h2>
<p>Before we go any further I want to point out that <strong>Tiered Compilation</strong> is very much a work-in-progress. As an indication, to get it working you currently have to set an environment variable called <code class="language-plaintext highlighter-rouge">COMPLUS_EXPERIMENTAL_TieredCompilation</code>. It appears that the current work is focussed on the infrastructure to make it possible (i.e. CLR changes); after that I assume there has to be a fair amount of testing and performance analysis before it’s enabled by default.</p>
<p>If you want to learn about the goals of the feature and how it fits into the wider process of ‘code versioning’, I recommend reading the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/code-versioning.md">excellent design docs</a>, including the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/code-versioning.md#future-roadmap-possibilities">future roadmap possibilities</a>.</p>
<p>To give an indication of what has been involved so far, there has been work going on in the:</p>
<ul>
<li><strong>Debugger</strong> (e.g. <a href="https://github.com/dotnet/coreclr/issues/14427">Breakpoints aren’t hit if tiered jitting recompiled the method before the debugger was attached</a> and <a href="https://github.com/dotnet/coreclr/issues/14423">Source line breakpoints stop working when tiered jitting replaces the code</a>)</li>
<li><strong>Profiling APIs</strong> - e.g. <a href="https://github.com/dotnet/coreclr/issues/12610">Tiered jitting: Implement additional profiler APIs</a></li>
<li><strong>Diagnostics</strong> - (all tracked via <a href="https://github.com/dotnet/coreclr/issues/12612">Tiered jitting: Design/Implement appropriate diagnostics</a>, e.g. <a href="https://github.com/dotnet/coreclr/issues/14947">Tiered Jitting: Fix IL to native mapping for ETW</a>)</li>
<li><strong>Interpreter</strong> - <a href="http://mattwarren.org/2017/03/30/The-.NET-IL-Interpreter/">yes the CLR has a built-in Interpreter</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=tiered+compilation&type=">Many other places</a></li>
</ul>
<p>If you want to follow along you can <a href="https://github.com/dotnet/coreclr/search?q=tiered+compilation&type=Issues&utf8=%E2%9C%93">take a look at the related issues/PRs</a>, here are the main ones to get you started:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/pull/10478">Tiered Compilation step 1</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/12193">WIP - Tiered Jitting Part Deux</a></li>
<li><a href="https://github.com/dotnet/coreclr/pulls?q=is%3Apr+author%3Anoahfalk">All PRs by Noah Falk</a> (main Microsoft Developer working on the feature)</li>
</ul>
<p>There is also some nice background information available in <a href="https://github.com/dotnet/coreclr/issues/4331">Introduce a tiered JIT</a> and if you want to understand how it will eventually make use of changes in the JIT (‘MinOpts’), take a look at <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/performance/JitOptimizerTodoAssessment.md#low-tier-back-off">Low Tier Back-Off</a> and <a href="https://github.com/dotnet/coreclr/pull/15046">JIT: enable aggressive inline policy for Tier1</a>.</p>
<hr />
<h3 id="history---rejit">History - ReJIT</h3>
<p>As a quick historical aside, you have previously been able to get the CLR to <a href="https://blogs.msdn.microsoft.com/davbr/2011/10/12/rejit-a-how-to-guide/">re-JIT a method for you</a>, but it only worked with the <a href="https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/profiling-overview#profiling_api">Profiling APIs</a>, which meant you had to write some C/C++ COM code to make it happen! In addition ReJIT only allowed the method to be re-compiled at the same level, so it wouldn’t ever produce more optimised code. It was mostly meant to help <a href="https://blogs.msdn.microsoft.com/davbr/2011/10/10/rejit-limitations-in-net-4-5/">monitoring or profiling tools</a>.</p>
<hr />
<h2 id="how-it-works-1">How it works</h2>
<p>Finally, how does it work? Again, let’s look at some diagrams. Firstly, as a recap, let’s take a look at how things end up once a method has been JITed, with <strong>tiered compilation turned off</strong> (the same diagram as above):</p>
<p><img src="/images/2017/12/02 - After JITing - Normal.svg" alt="Step 2 - After JITing - Normal" /></p>
<p>Now, as a comparison, here’s what the same stage looks like with <strong>tiered compilation enabled</strong>:</p>
<p><img src="/images/2017/12/03 - After JITing - Tiered Compilation.svg" alt="Step 3 - After JITing - Tiered Compilation" /></p>
<p>The main difference is that tiered compilation has forced the method call to go through another level of indirection, the ‘pre stub’. This is to make it possible to count the number of times the method is called, then once it has hit the threshold (<a href="https://github.com/dotnet/coreclr/blob/5d91c4d2cc8fe60bad20cdfdf2e5f239bc024061/src/vm/tieredcompilation.cpp#L84">currently 30</a>), the ‘pre stub’ is re-written to point to the ‘optimised native code’ instead:</p>
<p><img src="/images/2017/12/04 - After Tiered Compilation Optimisation.svg" alt="Step 4 - 04 - After Tiered Compilation Optimisation" /></p>
<p>Note that the original ‘native code’ is still available, so if needed the changes can be reverted and the method call can go back to the unoptimised version.</p>
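<p>The three stages in the diagrams can be modelled with a small C++ sketch. Treat it as a toy under several assumptions: <code>TieredMethod</code>, <code>SumToTier0</code> and <code>SumToTier1</code> are hypothetical names, the only value taken from above is the threshold of 30, and the switch to the optimised code happens immediately on the 30th call rather than after a real re-compilation.</p>

```cpp
#include <cassert>

// Counters so we can observe which version of the method actually ran.
static int g_tier0Runs = 0;
static int g_tier1Runs = 0;

// Two versions of the same method: a slow one and an 'optimised' one.
int SumToTier0(int n) {
    ++g_tier0Runs;
    int sum = 0;
    for (int i = 1; i <= n; ++i) sum += i;
    return sum;
}
int SumToTier1(int n) {
    ++g_tier1Runs;
    return n * (n + 1) / 2;
}

// Hypothetical model of one method under tiered compilation.
struct TieredMethod {
    using Code = int (*)(int);

    Code tier0;     // initial code, kept around after promotion
    Code tier1;     // optimised code
    Code active;    // what the stub currently dispatches to
    int callCount;  // calls seen while still on tier0

    static constexpr int kThreshold = 30;  // the value quoted in this post

    // Every call goes through here, like the extra level of indirection.
    int Invoke(int arg) {
        assert(active != nullptr);
        if (active == tier0 && ++callCount >= kThreshold) {
            active = tier1;  // re-point the stub at the optimised code
        }
        return active(arg);
    }

    // The original code is still there, so the change can be undone.
    void Revert() {
        active = tier0;
        callCount = 0;
    }
};
```

<p>Starting from <code>TieredMethod m{&SumToTier0, &SumToTier1, &SumToTier0, 0};</code>, the first 29 calls to <code>m.Invoke(..)</code> run the tier-0 version and the 30th call onwards runs the tier-1 version, until <code>Revert()</code> is called.</p>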
<hr />
<h3 id="using-a-counter">Using a counter</h3>
<p>We can see a bit more detail about the counter in this comment from <a href="https://github.com/dotnet/coreclr/blob/5d91c4d2cc8fe60bad20cdfdf2e5f239bc024061/src/vm/prestub.cpp#L1702-L1715">prestub.cpp</a>:</p>
<pre><code class="language-cplusplus">    /*************************** CALL COUNTER ***********************/
    // If we are counting calls for tiered compilation, leave the prestub
    // in place so that we can continue intercepting method invocations.
    // When the TieredCompilationManager has received enough call notifications
    // for this method only then do we back-patch it.
    BOOL fCanBackpatchPrestub = TRUE;
#ifdef FEATURE_TIERED_COMPILATION
    BOOL fEligibleForTieredCompilation = IsEligibleForTieredCompilation();
    if (fEligibleForTieredCompilation)
    {
        CallCounter * pCallCounter = GetCallCounter();
        fCanBackpatchPrestub = pCallCounter->OnMethodCalled(this);
    }
#endif
</code></pre>
<p>In essence the ‘stub’ calls back into the <a href="https://github.com/dotnet/coreclr/blob/5d91c4d2cc8fe60bad20cdfdf2e5f239bc024061/src/vm/tieredcompilation.cpp">TieredCompilationManager</a> until the ‘tiered compilation’ is triggered; once that happens the ‘stub’ is ‘back-patched’ to stop it being called any more.</p>
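<p>As a rough illustration of the contract implied by <code>fCanBackpatchPrestub = pCallCounter->OnMethodCalled(this)</code>, here is a simplified, single-threaded sketch of a per-method counter. It is my own approximation, not the CoreCLR implementation: <code>CallCounterSketch</code>, <code>CountFor</code> and the use of an <code>int</code> as a method id are assumptions, and a real counter would also have to be thread-safe.</p>

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical, simplified stand-in for a per-method call counter.
// Single-threaded on purpose; a real runtime needs synchronisation here.
class CallCounterSketch {
public:
    explicit CallCounterSketch(int threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Records one call for `methodId`. Returns true once enough calls have
    // been seen, meaning the prestub may now be back-patched; false means
    // "leave the prestub in place so we keep intercepting calls".
    bool OnMethodCalled(int methodId) {
        int count = ++counts_[methodId];  // missing keys start at 0
        return count >= threshold_;
    }

    // How many calls have been recorded for `methodId` so far.
    int CountFor(int methodId) const {
        auto it = counts_.find(methodId);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    int threshold_;
    std::unordered_map<int, int> counts_;  // method id -> call count
};
```

<p>With <code>CallCounterSketch counter(30);</code>, each method id is counted independently, so a hot method reaches the threshold without affecting one that has only been called once or twice.</p>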
<hr />
<h3 id="why-not-interpreted">Why not ‘Interpreted’?</h3>
<p>If you’re wondering why tiered compilation doesn’t have an interpreted mode, you’re not alone, I asked the <a href="https://github.com/dotnet/coreclr/pull/10478#issuecomment-289394905">same question</a> (for more info see <a href="http://mattwarren.org/2017/03/30/The-.NET-IL-Interpreter/">my previous post on the .NET Interpreter</a>)</p>
<p>And the <a href="https://github.com/dotnet/coreclr/pull/10478#issuecomment-289412414">answer I got was</a>:</p>
<blockquote>
<blockquote>
<p>There’s already an Interpreter available, or is it not considered suitable for production code?</p>
</blockquote>
<p>Its a fine question, but you guessed correctly - the interpreter is not in good enough shape to run production code as-is. There are also some significant issues if you want debugging and profiling tools to work (which we do). Given enough time and effort it is all solvable, it just isn’t the easiest place to start.</p>
<blockquote>
<p>How different is the overhead between non-optimised and optimised JITting?</p>
</blockquote>
<p>On my machine non-optimized jitting used about ~65% of the time that optimized jitting took for similar IL input sizes, but of course I expect results will vary by workload and hardware. Getting this first step checked in should make it easier to collect better measurements.</p>
</blockquote>
<p>But that’s from a few months ago, maybe <a href="http://www.mono-project.com/news/2017/11/13/mono-interpreter/">Mono’s New .NET Interpreter</a> will change things, <a href="https://twitter.com/matthewwarren/status/930397571478183937">who knows</a>?</p>
<hr />
<h3 id="why-not-llvm">Why not LLVM?</h3>
<p>Finally, why aren’t they using LLVM to compile the code? From <a href="https://github.com/dotnet/coreclr/issues/4331#issuecomment-313179155">Introduce a tiered JIT (comment)</a>:</p>
<blockquote>
<p>There were (and likely still are) <strong>significant differences in the LLVM support needed for the CLR versus what is needed for Java</strong>, both in GC and in EH, and in the restrictions one must place on the optimizer. To cite just one example: the CLRs GC currently cannot tolerate managed pointers that point off the end of objects. Java handles this via a base/derived paired reporting mechanism. We’d either need to plumb support for this kind of paired reporting into the CLR or restrict LLVM’s optimizer passes to never create these kinds of pointers. On top of that, the LLILC jit was slow and we weren’t sure ultimately what kind of code quality it might produce.</p>
<p>So, figuring out how LLILC might fit into a potential multi-tier approach that did not yet exist seemed (and still seems) premature. <strong>The idea for
now is to get tiering into the framework and use RyuJit for the second-tier jit</strong>. As we learn more, we may discover there is indeed room for higher tier jits, or, at least, understand better what else we need to do before such things make sense.</p>
</blockquote>
<p>There is more background info in <a href="https://github.com/dotnet/coreclr/issues/4331">Introduce a tiered JIT</a></p>
<hr />
<h2 id="summary">Summary</h2>
<p>One of my favourite side-effects of Microsoft making .NET Open Source and developing out in the open is that we can follow along with work-in-progress features. It’s great being able to download the latest code, try them out and see how they work under-the-hood, yay for OSS!!</p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=15955505">Hacker News</a></p>
Exploring the BBC micro:bit Software Stack2017-11-28T00:00:00+00:00http://www.mattwarren.org/2017/11/28/Exploring-the-BBC-microbit-Software-Stack
<p>If you grew up in the UK and went to school during the 1980s or 1990s there’s a good chance that this picture brings back fond memories:</p>
<p><img src="http://www.classicacorn.freeuk.com/8bit_focus/logo/logo_8.jpg" alt="BBC Micro and a Turtle" /></p>
<p>(image courtesy of <a href="http://www.classicacorn.freeuk.com/">Classic Acorn</a>)</p>
<p>I’d imagine that for a large number of computer programmers (currently in their 30s) the BBC Micro was their first experience of programming. If this applies to you and you want a trip down memory lane, have a read of <a href="https://www.geeksaresexy.net/2009/10/22/remembering-the-bbc-micro/">Remembering: The BBC Micro</a> and <a href="https://www.retro-kit.co.uk/page.cfm/content/The-BBC-Micro-in-Education/">The BBC Micro in my education</a>.</p>
<p>Programming the classic <a href="https://angrytechnician.wordpress.com/2009/07/23/relic/">Turtle</a> was done in <a href="http://www.walkingrandomly.com/?p=13">Logo</a>, with code like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FORWARD 100
LEFT 90
FORWARD 100
LEFT 90
FORWARD 100
LEFT 90
FORWARD 100
LEFT 90
</code></pre></div></div>
<p>Of course, once you knew what you were doing, you would re-write it like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>REPEAT 4 [FORWARD 100 LEFT 90]
</code></pre></div></div>
<hr />
<h2 id="bbc-microbit">BBC micro:bit</h2>
<p>The original Micro was launched as an education tool, as part of the <a href="http://www.swansea.ac.uk/library/archive-and-research-collections/hocc/computersandsoftware/earlyhomecomputers/bbcmicro/">BBC’s Computer Literacy Project</a> and by most accounts was a big success. As a follow-up, in March 2016 the <a href="http://www.bbc.co.uk/mediacentre/latestnews/2016/bbc-micro-bit-schools-launch">micro:bit was launched</a> as part of the BBC’s ‘Make it Digital’ initiative and 1 million devices were given out to schools and libraries in the UK to ‘help develop a new generation of digital pioneers’ (i.e. get them into programming!)</p>
<p><strong>Aside</strong>: I love the difference in branding across 30 years, ‘<em>BBC Micro</em>’ became ‘<em>BBC micro:bit</em>’ (you must include the colon) and ‘<em>Computer Literacy Project</em>’ changed to the ‘<em>Make it Digital Initiative</em>’.</p>
<p><a href="http://www.bbc.co.uk/mediacentre/mediapacks/microbit/specs"><img src="/images/2017/11/BBC microbit hardware specification.jpg" alt="BBC microbit hardware specification" /></a></p>
<p>A few weeks ago I walked into my local library, <a href="http://microbit.org/en/2017-10-23-libraries/">picked up a nice starter kit</a> and then spent a fun few hours watching my son play around with it (I’m worried about how quickly he picked up the basics of programming, I think I might be out of a job in a few years’ time!!)</p>
<p>However once he’d gone to bed it was all mine! The result of my ‘playing around’ is this post, in it I will be exploring the <strong>software stack</strong> that makes up the micro:bit, what’s in it, what it does and how it all fits together.</p>
<p>If you want to learn about how to program the micro:bit, its hardware or anything else, take a look at this <a href="https://github.com/carlosperate/awesome-microbit">excellent list of resources</a>.</p>
<hr />
<p>Slightly off-topic, but if you enjoy reading <strong>source code</strong> you might like these other posts:</p>
<ul>
<li><a href="/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/?recommended=1">The 68 things the CLR does before executing a single line of your code</a></li>
<li><a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/?recommended=1">A Hitchhikers Guide to the CoreCLR Source Code</a></li>
<li><a href="/2017/10/19/DotNetAnywhere-an-Alternative-.NET-Runtime/?recommended=1">DotNetAnywhere: An Alternative .NET Runtime</a></li>
</ul>
<hr />
<h1 id="bbc-microbit-software-stack">BBC micro:bit Software Stack</h1>
<p>If we take a <em>high-level</em> view of the stack, it divides into 3 discrete <strong>software</strong> components that all sit on top of the <strong>hardware</strong> itself:</p>
<p><img src="/images/2017/11/BBC Microbit Software Stack.png" alt="BBC Microbit Software Stack.png" /></p>
<p>If you would like to build this stack for yourself take a look at the <a href="https://lancaster-university.github.io/microbit-docs/offline-toolchains">Building with Yotta guide</a>. I also found this post describing <a href="https://hackernoon.com/the-first-video-game-on-the-bbc-micro-bit-probably-4175fab44da8">The First Video Game on the BBC micro:bit [probably]</a> very helpful.</p>
<hr />
<h2 id="runtimes">Runtimes</h2>
<p>There are several high-level <em>runtimes</em> available. These are useful because they let you write code in a language other than C/C++, or even create programs by <a href="https://www.microbit.co.uk/blocks/editor">dragging <em>blocks</em> around on a screen</a>. The main ones that I’ve come across are below (see <a href="https://github.com/carlosperate/awesome-microbit#programming">‘Programming’</a> for a full list):</p>
<ul>
<li><strong>Python</strong> via <a href="https://github.com/bbcmicrobit/micropython/">MicroPython</a></li>
<li><strong>JavaScript</strong> with <a href="https://github.com/Microsoft/pxt-microbit">Microsoft Programming Experience Toolkit (PXT)</a>
<ul>
<li>well actually it’s <a href="https://makecode.com/language"><strong>TypeScript</strong></a>, which is good, we wouldn’t want to rot the brains of impressionable young children with the <a href="https://www.destroyallsoftware.com/talks/wat">horrors of Javascript - Wat!!</a></li>
</ul>
</li>
</ul>
<p>They both work in a similar way: the user’s code (Python or TypeScript) is bundled up along with the C/C++ code of the runtime itself and then the entire binary (hex) file is deployed to the micro:bit. When the device starts up, the runtime looks for the user’s code at a known location in memory and starts interpreting it.</p>
<p><strong>Update:</strong> It turns out that I was wrong about the Microsoft PXT; it actually <a href="https://makecode.com/language#static-compilation-vs-a-dynamic-vm">compiles your TypeScript program to native code</a>, very cool! Interestingly, they did it that way because:</p>
<blockquote>
<p>Compared to a typical dynamic JavaScript engine, PXT compiles code statically, giving rise to significant time and space performance improvements:</p>
<ul>
<li>user programs are compiled directly to machine code, and are never in any byte-code form that needs to be interpreted; this results in much faster execution than a typical JS interpreter</li>
<li>there is no RAM overhead for user-code - all code sits in flash; in a dynamic VM there are usually some data-structures representing code</li>
<li>due to lack of boxing for small integers and static class layout the memory consumption for objects is around half the one you get in a dynamic VM (not counting the user-code structures mentioned above)</li>
<li>while there is some runtime support code in PXT, it’s typically around 100KB smaller than a dynamic VM, bringing down flash consumption and leaving more space for user code</li>
</ul>
<p><strong>The execution time, RAM and flash consumption of PXT code is as a rule of thumb 2x of compiled C code, making it competitive to write drivers and other user-space libraries.</strong></p>
</blockquote>
<hr />
<h2 id="memory-layout">Memory Layout</h2>
<p>Just before we go on to the other parts of the software stack I want to take a deeper look at the memory layout. This is important because memory is so constrained on the micro:bit: there is <em>only</em> 16KB of RAM. To put that into perspective, we’ll use the calculation from this StackOverflow question <a href="https://stackoverflow.com/questions/5999821/how-many-bytes-of-memory-is-a-tweet/5999852#5999852">How many bytes of memory is a tweet?</a></p>
<blockquote>
<p>Twitter uses UTF-8 encoded messages. UTF-8 code points can be up to <del>six</del> four octets long, making the maximum message size <strong>140 x 4 = 560 8-bit bytes</strong>.</p>
</blockquote>
<p>If we re-calculate for the newer, longer tweets we get <strong>280 x 4 = 1,120 bytes</strong>, so we could only fit <strong>10 tweets</strong> into the available RAM on the micro:bit (it turns out that only ~11K out of the total 16K is available for general use). That is why it’s worth using a <a href="https://github.com/lancaster-university/microbit-dal/issues/323">custom version of atoi() to save 350 bytes of RAM</a>!</p>
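<p>To make the arithmetic explicit, here is a small sketch of the calculation. The function and constant names are my own, made up for illustration, and it assumes the worst case from the quoted answer (every character taking 4 bytes) and takes ~11K to mean 11 x 1,024 bytes:</p>

```cpp
#include <cassert>
#include <cstddef>

// Worst case from the quoted answer: every character needs 4 bytes in UTF-8.
constexpr std::size_t kMaxBytesPerChar = 4;

// Size in bytes of a worst-case tweet with the given number of characters.
constexpr std::size_t worstCaseTweetBytes(std::size_t chars) {
    return chars * kMaxBytesPerChar;
}

// Number of whole worst-case tweets that fit into a RAM budget (rounds down).
constexpr std::size_t tweetsThatFit(std::size_t ramBytes, std::size_t chars) {
    return ramBytes / worstCaseTweetBytes(chars);
}
```

<p>With these, <code class="language-plaintext highlighter-rouge">tweetsThatFit(11 * 1024, 280)</code> gives 10, whereas the full 16KB would only stretch to 14.</p>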
<p>The memory layout is specified by the linker at compile-time using <a href="https://github.com/lancaster-university/microbit-targets/blob/master/bbc-microbit-classic-gcc-nosd/ld/NRF51822.ld#L6">NRF51822.ld</a>, and there is a <a href="/data/2017/11/microbit-samples.map">sample output available</a> if you want to take a look. Because it’s done at compile-time, you run into build errors such as <a href="https://github.com/bbcmicrobit/micropython/issues/363">“region RAM overflowed with stack”</a> if you configure it incorrectly.</p>
<p>The table below shows the memory layout from the ‘no SD’ version of a ‘Hello World’ app, i.e. with the maximum amount of RAM available as the Bluetooth (BLE) Soft-Device (SD) support has been removed. By comparison, with BLE enabled you instantly have <a href="https://github.com/lancaster-university/microbit-targets/blob/master/bbc-microbit-classic-gcc/ld/NRF51822.ld#L6">8K less RAM available</a>, so things start to get tight!</p>
<table>
<thead>
<tr>
<th style="text-align: right">Name</th>
<th style="text-align: right">Start Address</th>
<th style="text-align: right">End Address</th>
<th style="text-align: right">Size</th>
<th style="text-align: right">Percentage</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">.data</td>
<td style="text-align: right">0x20000000</td>
<td style="text-align: right">0x20000098</td>
<td style="text-align: right">152 bytes</td>
<td style="text-align: right">0.93%</td>
</tr>
<tr>
<td style="text-align: right">.bss</td>
<td style="text-align: right">0x20000098</td>
<td style="text-align: right">0x20000338</td>
<td style="text-align: right">672 bytes</td>
<td style="text-align: right">4.10%</td>
</tr>
<tr>
<td style="text-align: right">Heap (mbed)</td>
<td style="text-align: right">0x20000338</td>
<td style="text-align: right">0x20000b38</td>
<td style="text-align: right">2,048 bytes</td>
<td style="text-align: right">12.50%</td>
</tr>
<tr>
<td style="text-align: right">Empty</td>
<td style="text-align: right">0x20000b38</td>
<td style="text-align: right">0x20003800</td>
<td style="text-align: right">11,464 bytes</td>
<td style="text-align: right">69.97%</td>
</tr>
<tr>
<td style="text-align: right">Stack</td>
<td style="text-align: right">0x20003800</td>
<td style="text-align: right">0x20004000</td>
<td style="text-align: right">2,048 bytes</td>
<td style="text-align: right">12.50%</td>
</tr>
</tbody>
</table>
<p>For more info on the section names see the Wikipedia pages for <a href="https://en.wikipedia.org/wiki/Data_segment">.data</a> and <a href="https://en.wikipedia.org/wiki/.bss">.bss</a> as well as <a href="https://mcuoneclipse.com/2013/04/14/text-data-and-bss-code-and-data-size-explained/">text, data and bss: Code and Data Size Explained</a>.</p>
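<p>The ‘Size’ and ‘Percentage’ columns in the table follow directly from the start and end addresses. As a quick sanity check, here is a sketch that recomputes them; the addresses are copied from the table, but the <code class="language-plaintext highlighter-rouge">Region</code> struct and the helper names are my own and are not part of the micro:bit code:</p>

```cpp
#include <cassert>
#include <cstdint>

// One row of the table: a named range of RAM, [start, end).
struct Region {
    const char* name;
    std::uint32_t start;
    std::uint32_t end;
};

// Addresses copied from the table above ('no SD' build of 'Hello World').
const Region kRegions[] = {
    {".data",       0x20000000, 0x20000098},
    {".bss",        0x20000098, 0x20000338},
    {"Heap (mbed)", 0x20000338, 0x20000b38},
    {"Empty",       0x20000b38, 0x20003800},
    {"Stack",       0x20003800, 0x20004000},
};

// Total RAM on the device: 16KB.
constexpr std::uint32_t kRamBytes = 16 * 1024;

// Size of a region in bytes.
constexpr std::uint32_t regionBytes(const Region& r) {
    return r.end - r.start;
}

// Share of the total RAM taken by a region, as a percentage.
double percentOfRam(const Region& r) {
    return 100.0 * regionBytes(r) / kRamBytes;
}

// Sum of all region sizes; equals kRamBytes if the rows cover all of RAM.
std::uint32_t totalBytes() {
    std::uint32_t total = 0;
    for (const Region& r : kRegions) {
        total += regionBytes(r);
    }
    return total;
}
```

<p>The five regions add up to exactly 16,384 bytes, so the table accounts for every byte of RAM.</p>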
<p>As a comparison there is a nice image of the micro:bit RAM Layout <a href="https://hackernoon.com/the-first-video-game-on-the-bbc-micro-bit-probably-4175fab44da8#5fea">in this article</a>. It shows what things look like when running MicroPython and you can clearly see the main Python heap in the centre <a href="https://github.com/bbcmicrobit/micropython/blob/master/source/microbit/mprun.c#L95-L104">taking up all the remaining space</a>.</p>
<hr />
<h2 id="microbit-dal"><a href="https://github.com/lancaster-university/microbit-dal">microbit-dal</a></h2>
<p>Sitting in the stack below the high-level runtime is the <em>device abstraction layer</em> (DAL), created at <a href="https://github.com/lancaster-university">Lancaster University</a> in the UK. It’s made up of 5 main components:</p>
<ul>
<li><a href="https://github.com/lancaster-university/microbit-dal/tree/master/source/core"><strong>core</strong></a>
<ul>
<li>High-level components, such as <code class="language-plaintext highlighter-rouge">Device</code>, <code class="language-plaintext highlighter-rouge">Font</code>, <code class="language-plaintext highlighter-rouge">HeapAllocator</code>, <code class="language-plaintext highlighter-rouge">Listener</code> and <code class="language-plaintext highlighter-rouge">Fiber</code>, often implemented on-top of 1 or more <code class="language-plaintext highlighter-rouge">driver</code> classes</li>
</ul>
</li>
<li><a href="https://github.com/lancaster-university/microbit-dal/tree/master/source/types"><strong>types</strong></a>
<ul>
<li>Helper types such as <code class="language-plaintext highlighter-rouge">ManagedString</code>, <code class="language-plaintext highlighter-rouge">Image</code>, <code class="language-plaintext highlighter-rouge">Event</code> and <code class="language-plaintext highlighter-rouge">PacketBuffer</code></li>
</ul>
</li>
<li><a href="https://github.com/lancaster-university/microbit-dal/tree/master/source/drivers"><strong>drivers</strong></a>
<ul>
<li>For control of a specific hardware component, such as <code class="language-plaintext highlighter-rouge">Accelerometer</code>, <code class="language-plaintext highlighter-rouge">Button</code>, <code class="language-plaintext highlighter-rouge">Compass</code>, <code class="language-plaintext highlighter-rouge">Display</code>, <code class="language-plaintext highlighter-rouge">Flash</code>, <code class="language-plaintext highlighter-rouge">IO</code>, <code class="language-plaintext highlighter-rouge">Serial</code> and <code class="language-plaintext highlighter-rouge">Pin</code></li>
</ul>
</li>
<li><a href="https://github.com/lancaster-university/microbit-dal/tree/master/source/bluetooth"><strong>bluetooth</strong></a>
<ul>
<li>All the code for the <a href="https://www.kitronik.co.uk/blog/bbc-microbit-bluetooth-low-energy/">Bluetooth Low Energy</a> (BLE) stack that is <a href="https://lancaster-university.github.io/microbit-docs/ble/profile/">shipped with the micro:bit</a></li>
</ul>
</li>
<li><a href="https://github.com/lancaster-university/microbit-dal/tree/master/source/asm"><strong>asm</strong></a>
<ul>
<li>Just 4 functions are implemented in assembly, they are <code class="language-plaintext highlighter-rouge">swap_context</code>, <code class="language-plaintext highlighter-rouge">save_context</code>, <code class="language-plaintext highlighter-rouge">save_register_context</code> and <code class="language-plaintext highlighter-rouge">restore_register_context</code>. As the names suggest, they handle the ‘context switching’ necessary to make the <a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/core/MicroBitFiber.cpp">MicroBit Fiber scheduler</a> work</li>
</ul>
</li>
</ul>
<p>The image below shows the distribution of ‘Lines of Code’ (LOC). As you can see, the majority of the code is in the <code class="language-plaintext highlighter-rouge">drivers</code> and <code class="language-plaintext highlighter-rouge">bluetooth</code> components.</p>
<p><img src="/images/2017/11/LocMetricsPie-microbit-dal.png" alt="LOC Metrics Pie - microbit-dal" /></p>
<p>In addition to providing nice helper classes for working with the underlying devices, the DAL provides the <code class="language-plaintext highlighter-rouge">Fiber</code> abstraction to allow asynchronous functions to work. This is useful because you can asynchronously display text on the LED display and your code won’t block whilst it’s <em>scrolling</em> across the screen. In addition, the <code class="language-plaintext highlighter-rouge">Fiber</code> class is used to handle the interrupts that signal when the buttons on the micro:bit are pushed. This comment from the code clearly lays out what the <a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/core/MicroBitFiber.cpp">Fiber scheduler</a> does:</p>
<blockquote>
<p>This lightweight, <strong>non-preemptive scheduler</strong> provides a <strong>simple threading mechanism</strong> for two main purposes:</p>
<p>1) To provide a clean abstraction for application languages to use when building async behaviour (callbacks).
2) To provide ISR decoupling for EventModel events generated in an ISR context.</p>
</blockquote>
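<p>To give a feel for what ‘non-preemptive’ means, here is a deliberately tiny sketch of a cooperative round-robin scheduler. It is <strong>not</strong> how the DAL does it: the real scheduler saves and restores the CPU context using the assembly routines listed above, whereas this version models each task as a function that performs one step and then returns. All of the names (<code class="language-plaintext highlighter-rouge">Task</code>, <code class="language-plaintext highlighter-rouge">CooperativeScheduler</code>, <code class="language-plaintext highlighter-rouge">makeScroller</code>) are hypothetical:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A task performs one step per call and returns true once it has finished.
// Returning from the call is the "yield": a step is never interrupted part-way.
using Task = std::function<bool()>;

class CooperativeScheduler {
public:
    // Add a task to the back of the ready queue.
    void spawn(Task task) { ready_.push_back(std::move(task)); }

    // Round-robin: run one step of the task at the front of the queue,
    // then put it at the back again unless it reported that it is done.
    void run() {
        while (!ready_.empty()) {
            Task task = std::move(ready_.front());
            ready_.pop_front();
            if (!task()) {
                ready_.push_back(std::move(task));
            }
        }
    }

private:
    std::deque<Task> ready_;
};

// Hypothetical example task: "scrolls" a message onto a fake screen,
// one character per step, giving other tasks a turn in between.
Task makeScroller(std::string text, std::vector<char>& screen) {
    std::size_t next = 0;
    return [text, next, &screen]() mutable {
        if (next < text.size()) {
            screen.push_back(text[next++]);
        }
        return next >= text.size();
    };
}
```

<p>If you spawn two scrollers and call <code class="language-plaintext highlighter-rouge">run()</code>, their characters end up interleaved on the ‘screen’, because each task voluntarily hands control back after every step. The flip side of a non-preemptive design is that a task which never yields will starve all the others.</p>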
<p>Finally, the high-level classes <a href="https://github.com/lancaster-university/microbit/blob/master/source/MicroBit.cpp">MicroBit.cpp</a> and <a href="https://github.com/lancaster-university/microbit/blob/master/inc/MicroBit.h">MicroBit.h</a> are housed in the <a href="https://github.com/lancaster-university/microbit">microbit repository</a>. These classes define the API of the MicroBit runtime and set up the default configuration, as shown in the <code class="language-plaintext highlighter-rouge">Constructor</code> of <code class="language-plaintext highlighter-rouge">MicroBit.cpp</code>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
* Constructor.
*
* Create a representation of a MicroBit device, which includes member variables
* that represent various device drivers used to control aspects of the micro:bit.
*/</span>
<span class="n">MicroBit</span><span class="o">::</span><span class="n">MicroBit</span><span class="p">()</span> <span class="o">:</span>
<span class="n">serial</span><span class="p">(</span><span class="n">USBTX</span><span class="p">,</span> <span class="n">USBRX</span><span class="p">),</span>
<span class="n">resetButton</span><span class="p">(</span><span class="n">MICROBIT_PIN_BUTTON_RESET</span><span class="p">),</span>
<span class="n">storage</span><span class="p">(),</span>
<span class="n">i2c</span><span class="p">(</span><span class="n">I2C_SDA0</span><span class="p">,</span> <span class="n">I2C_SCL0</span><span class="p">),</span>
<span class="n">messageBus</span><span class="p">(),</span>
<span class="n">display</span><span class="p">(),</span>
<span class="n">buttonA</span><span class="p">(</span><span class="n">MICROBIT_PIN_BUTTON_A</span><span class="p">,</span> <span class="n">MICROBIT_ID_BUTTON_A</span><span class="p">),</span>
<span class="n">buttonB</span><span class="p">(</span><span class="n">MICROBIT_PIN_BUTTON_B</span><span class="p">,</span> <span class="n">MICROBIT_ID_BUTTON_B</span><span class="p">),</span>
<span class="n">buttonAB</span><span class="p">(</span><span class="n">MICROBIT_ID_BUTTON_A</span><span class="p">,</span><span class="n">MICROBIT_ID_BUTTON_B</span><span class="p">,</span> <span class="n">MICROBIT_ID_BUTTON_AB</span><span class="p">),</span>
<span class="n">accelerometer</span><span class="p">(</span><span class="n">i2c</span><span class="p">),</span>
<span class="n">compass</span><span class="p">(</span><span class="n">i2c</span><span class="p">,</span> <span class="n">accelerometer</span><span class="p">,</span> <span class="n">storage</span><span class="p">),</span>
<span class="n">compassCalibrator</span><span class="p">(</span><span class="n">compass</span><span class="p">,</span> <span class="n">accelerometer</span><span class="p">,</span> <span class="n">display</span><span class="p">),</span>
<span class="n">thermometer</span><span class="p">(</span><span class="n">storage</span><span class="p">),</span>
<span class="n">io</span><span class="p">(</span><span class="n">MICROBIT_ID_IO_P0</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P1</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P2</span><span class="p">,</span>
<span class="n">MICROBIT_ID_IO_P3</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P4</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P5</span><span class="p">,</span>
<span class="n">MICROBIT_ID_IO_P6</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P7</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P8</span><span class="p">,</span>
<span class="n">MICROBIT_ID_IO_P9</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P10</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P11</span><span class="p">,</span>
<span class="n">MICROBIT_ID_IO_P12</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P13</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P14</span><span class="p">,</span>
<span class="n">MICROBIT_ID_IO_P15</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P16</span><span class="p">,</span><span class="n">MICROBIT_ID_IO_P19</span><span class="p">,</span>
<span class="n">MICROBIT_ID_IO_P20</span><span class="p">),</span>
<span class="n">bleManager</span><span class="p">(</span><span class="n">storage</span><span class="p">),</span>
<span class="n">radio</span><span class="p">(),</span>
<span class="n">ble</span><span class="p">(</span><span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<hr />
<h2 id="mbed-classic"><a href="https://github.com/lancaster-university/mbed-classic">mbed-classic</a></h2>
<p>The software at the bottom of the stack is making use of the <a href="https://github.com/ARMmbed/mbed-os">ARM mbed OS</a> which is:</p>
<blockquote>
<p>.. an open-source embedded operating system designed for the “things” in the Internet of Things (IoT). mbed OS includes the features you need to develop a connected product using an ARM Cortex-M microcontroller.</p>
<p>mbed OS provides a platform that includes:</p>
<ul>
<li>Security foundations.</li>
<li>Cloud management services.</li>
<li>Drivers for sensors, I/O devices and connectivity.</li>
</ul>
<p>mbed OS is modular, configurable software that you can customize it to your device and to reduce memory requirements by excluding unused software.</p>
</blockquote>
<p>We can see this from the layout of its source: it’s based around <code class="language-plaintext highlighter-rouge">common</code> components, which can be combined with a <code class="language-plaintext highlighter-rouge">hal</code> (Hardware Abstraction Layer) and a <code class="language-plaintext highlighter-rouge">target</code> specific to the hardware you are running on.</p>
<ul>
<li><a href="https://github.com/lancaster-university/mbed-classic/tree/master/api"><strong>api</strong></a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/tree/master/common"><strong>common</strong></a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/tree/master/hal"><strong>hal</strong></a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/tree/master/targets"><strong>targets</strong></a></li>
</ul>
<p>More specifically, the micro:bit uses the <code class="language-plaintext highlighter-rouge">yotta target bbc-microbit-classic-gcc</code>, but it can also use <a href="https://github.com/lancaster-university/microbit-targets">other targets as needed</a>.</p>
<p>For reference, here are the files from the <code class="language-plaintext highlighter-rouge">common</code> section of <code class="language-plaintext highlighter-rouge">mbed</code> that are used by the <code class="language-plaintext highlighter-rouge">microbit-dal</code>:</p>
<ul>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/board.c">board.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/error.c">error.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/FileBase.cpp">FileBase.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/FilePath.cpp">FilePath.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/FileSystemLike.cpp">FileSystemLike.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/gpio.c">gpio.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/I2C.cpp">I2C.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/InterruptIn.cpp">InterruptIn.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/pinmap_common.c">pinmap_common.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/RawSerial.cpp">RawSerial.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/SerialBase.cpp">SerialBase.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/Ticker.cpp">Ticker.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/ticker_api.c">ticker_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/Timeout.cpp">Timeout.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/Timer.cpp">Timer.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/TimerEvent.cpp">TimerEvent.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/us_ticker_api.c">us_ticker_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/common/wait_api.c">wait_api.c</a></li>
</ul>
<p>And here are the hardware specific files, targeting the <code class="language-plaintext highlighter-rouge">NORDIC - MCU NRF51822</code>:</p>
<ul>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/analogin_api.c">analogin_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/gpio_api.c">gpio_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/gpio_irq_api.c">gpio_irq_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/i2c_api.c">i2c_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/pinmap.c">pinmap.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/port_api.c">port_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/pwmout_api.c">pwmout_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/cmsis/TARGET_NORDIC/TARGET_MCU_NRF51822/TOOLCHAIN_ARM_STD/sys.cpp">retarget.cpp</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/serial_api.c">serial_api.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/cmsis/TARGET_NORDIC/TARGET_MCU_NRF51822/TOOLCHAIN_GCC_ARM/startup_NRF51822.S">startup_NRF51822.S</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/cmsis/TARGET_NORDIC/TARGET_MCU_NRF51822/system_nrf51.c">system_nrf51.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/twi_master.c">twi_master.c</a></li>
<li><a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/us_ticker.c">us_ticker.c</a></li>
</ul>
<hr />
<h2 id="end-to-end-or-top-to-bottom">End-to-end (or top-to-bottom)</h2>
<p>Finally, let’s look at a few examples of how the different components within the stack are used in specific scenarios.</p>
<h3 id="writing-to-the-display">Writing to the Display</h3>
<ul>
<li><a href="https://github.com/lancaster-university/microbit-dal"><strong>microbit-dal</strong></a>
<ul>
<li><a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/drivers/MicroBitDisplay.cpp">MicroBitDisplay.cpp</a>, handles scrolling, asynchronous updates and other high-level tasks, before handing off to:
<ul>
<li><a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/core/MicroBitFont.cpp">MicroBitFont.cpp</a></li>
<li><a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/types/MicroBitImage.cpp">MicroBitImage.cpp</a></li>
<li><a href="https://github.com/lancaster-university/microbit-dal/blob/master/inc/drivers/MicroBitMatrixMaps.h">MicroBitMatrixMaps.h</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="https://github.com/lancaster-university/mbed-classic"><strong>mbed-classic</strong></a>
<ul>
<li><code class="language-plaintext highlighter-rouge">void port_write(port_t *obj, int value)</code> in <a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/port_api.c">port_api.c</a> (‘NORDIC NRF51822’ version), via a call to <code class="language-plaintext highlighter-rouge">void write(int value)</code> in <a href="https://github.com/lancaster-university/mbed-classic/blob/master/api/PortOut.h">PortOut.h</a>, using info from <a href="https://github.com/lancaster-university/mbed-classic/blob/master/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/TARGET_NRF51_MICROBIT/PinNames.h">PinNames.h</a></li>
</ul>
</li>
</ul>
<h3 id="storing-files-on-the-flash-memory">Storing files on the Flash memory</h3>
<ul>
<li><a href="https://github.com/lancaster-university/microbit-dal"><strong>microbit-dal</strong></a>
<ul>
<li>Provides the high-level abstractions, such as:</li>
<li><a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/drivers/MicroBitFileSystem.cpp">FileSystem</a></li>
<li><a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/drivers/MicroBitFile.cpp">File</a></li>
<li><a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/drivers/MicroBitFlash.cpp">Flash</a></li>
</ul>
</li>
<li><a href="https://github.com/lancaster-university/mbed-classic"><strong>mbed-classic</strong></a>
<ul>
<li>Allows low-level control of the hardware, such as writing to the flash itself either directly or via the SoftDevice (SD) layer</li>
</ul>
</li>
</ul>
<p>In addition, this comment from <a href="https://github.com/lancaster-university/microbit-dal/blob/master/source/drivers/MicroBitStorage.h">MicroBitStorage.h</a> gives a nice overview of how the persistent key-value store is laid out on top of the raw flash storage:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>* The first 8 bytes are reserved for the KeyValueStore struct which gives core
* information such as the number of KeyValuePairs in the store, and whether the
* store has been initialised.
*
* After the KeyValueStore struct, KeyValuePairs are arranged contiguously until
* the end of the block used as persistent storage.
*
* |-------8-------|--------48-------|-----|---------48--------|
* | KeyValueStore | KeyValuePair[0] | ... | KeyValuePair[N-1] |
* |---------------|-----------------|-----|-------------------|
</code></pre></div></div>
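<p>As a sketch of what that layout means in practice, the structs below reproduce the 8-byte header and 48-byte pairs from the comment. The field names and the 16/32 byte split between key and value are my assumptions for illustration; only the two overall sizes come from the comment itself, so check the real header before relying on any of this:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 8-byte header at the start of the storage block.
// Field names are assumed; the comment only says it records whether the
// store has been initialised and how many pairs it holds.
struct KeyValueStore {
    std::uint32_t magic;  // assumed: marker showing the store is initialised
    std::uint32_t size;   // assumed: number of KeyValuePairs in the store
};

// 48-byte entry. The 16/32 split between key and value is an assumption.
struct KeyValuePair {
    std::uint8_t key[16];
    std::uint8_t value[32];
};

static_assert(sizeof(KeyValueStore) == 8, "header must be 8 bytes");
static_assert(sizeof(KeyValuePair) == 48, "each pair must be 48 bytes");

// Byte offset of KeyValuePair[index] from the start of the block:
// pairs are packed contiguously straight after the header.
constexpr std::size_t pairOffset(std::size_t index) {
    return sizeof(KeyValueStore) + index * sizeof(KeyValuePair);
}

// Number of whole pairs that fit in a block of the given size.
constexpr std::size_t maxPairs(std::size_t blockBytes) {
    return blockBytes < sizeof(KeyValueStore)
               ? 0
               : (blockBytes - sizeof(KeyValueStore)) / sizeof(KeyValuePair);
}
```

<p>So <code class="language-plaintext highlighter-rouge">KeyValuePair[0]</code> starts at byte 8, <code class="language-plaintext highlighter-rouge">KeyValuePair[1]</code> at byte 56, and a hypothetical 1,024-byte block would have room for 21 pairs.</p>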
<hr />
<h2 id="summary">Summary</h2>
<p>All-in-all the micro:bit is a very nice piece of kit and hopefully will achieve its goal ‘to help develop a new generation of digital pioneers’. However, it also has a really nice software stack, one that is easy to understand and find your way around.</p>
<hr />
<h2 id="further-reading">Further Reading</h2>
<p>I’ve got nothing to add that isn’t already included in this <a href="https://github.com/carlosperate/awesome-microbit">excellent, comprehensive list of resources</a>, thanks <a href="https://twitter.com/carlosperate">Carlos</a> for putting it together!!</p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=15806004">Hacker News</a> or <a href="https://www.reddit.com/r/microbit/comments/7g5sgm/exploring_the_bbc_microbit_software_stack/">/r/microbit</a></p>
Microsoft & Open Source a 'Brave New World' - CORESTART 2.02017-11-14T00:00:00+00:00http://www.mattwarren.org/2017/11/14/Microsoft-and-Open-Source-a-Brave-New-World-CORESTART
<p>Recently I was fortunate enough to be invited to the <a href="https://www.corestart.cz/#page-speeches">CORESTART 2.0 conference</a> to give a talk on <a href="https://www.corestart.cz/#page-speeches">Microsoft & Open Source a ‘Brave New World’</a>. It was a great conference, well organised by <a href="https://twitter.com/hercegtomas">Tomáš Herceg</a> and the teams from <a href="https://www.dotnetcollege.cz/">.NET College</a> and <a href="https://www.riganti.cz/en">Riganti</a> and I had a great time.</p>
<p>I encourage you to attend <a href="http://www.updateconference.net/">next year’s ‘Update’ conference</a> if you can, and as a bonus you’ll get to see the sights of Prague, including the <a href="https://en.wikipedia.org/wiki/Head_of_Franz_Kafka">Head of Franz Kafka</a> as well as the amazing buildings, castles and bridges that all the guide-books will tell you about!</p>
<p><a href="/images/2017/11/Head of Franz Kafka.jpeg"><img src="/images/2017/11/Head of Franz Kafka.jpeg" alt="Head of Franz Kafka" /></a></p>
<p>I’ve not been ‘invited’ to speak at a conference before, so I wasn’t sure what to expect, but there was a great audience and they seemed happy to learn about the Open Source projects that Microsoft are running and what is being done to encourage us (the ‘Community’) to contribute.</p>
<p><a href="/images/2017/11/Speaking at CORESTART 2.0.jpg"><img src="/images/2017/11/Speaking at CORESTART 2.0.jpg" alt="Speaking at CORESTART 2.0" /></a></p>
<hr />
<p>The slides for my talk are embedded below and you can also ‘watch’ the <a href="https://www.youtube.com/watch?v=garlskQb8BU">entire recording</a> (audio and slides only, no video).</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/bSYyRobLw3jMLq" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen=""> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/mattwarren/microsoft-open-source-a-brave-new-world-corestart-20" title="Microsoft & open source a 'brave new world' - CORESTART 2.0" target="_blank">Microsoft & open source a 'brave new world' - CORESTART 2.0</a> </strong> from <strong><a href="https://www.slideshare.net/mattwarren" target="_blank">Matt Warren</a></strong> </div>
<hr />
<h1 id="talk-outline">Talk Outline</h1>
<p>But if you don’t fancy sitting through the whole thing, you can read the summary below and jump straight to the relevant parts.</p>
<h2 id="before">Before</h2>
<p>[<a href="https://www.slideshare.net/mattwarren/microsoft-open-source-a-brave-new-world-corestart-20/3">jump to slide</a>] [<a href="https://www.youtube.com/watch?v=garlskQb8BU&t=153">direct video link</a>]</p>
<ul>
<li><strong>Wait, didn’t that happen before?</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=300">direct link</a></li>
<li><strong>.NET goes ‘Open Source’ and onto Hacker News</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=478">direct link</a></li>
<li><strong>What did they Open Source?</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=570">direct link</a></li>
<li><strong>CoreFX, CoreCLR, CoreFX Labs, Roslyn</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=804">direct link</a></li>
<li><strong>TypeScript, VS Code and Kestrel</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=1256">direct link</a></li>
</ul>
<h2 id="during">During</h2>
<p>[<a href="https://www.slideshare.net/mattwarren/microsoft-open-source-a-brave-new-world-corestart-20/19">jump to slide</a>] [<a href="https://www.youtube.com/watch?v=garlskQb8BU&t=1470">direct video link</a>]</p>
<ul>
<li><strong>First PR</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=1475">direct link</a></li>
<li><strong>Comedy PRs</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=1577">direct link</a></li>
<li><strong>Good</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=1651">direct link</a></li>
<li><strong>Bad</strong> (‘we got to see how the sausage was made’) <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=1981">direct link</a></li>
<li><strong>Ugly</strong> <a href="https://youtu.be/garlskQb8BU?t=2274">direct link</a></li>
</ul>
<h2 id="after">After</h2>
<p>[<a href="https://www.slideshare.net/mattwarren/microsoft-open-source-a-brave-new-world-corestart-20/26">jump to slide</a>] [<a href="https://youtu.be/garlskQb8BU?t=2381">direct video link</a>]</p>
<ul>
<li><strong>Do .NET Developers Care?</strong> <a href="https://youtu.be/garlskQb8BU?t=2384">direct link</a></li>
<li><strong>Microsoft the organisation on GitHub</strong> <a href="https://youtu.be/garlskQb8BU?t=2410">direct link</a></li>
<li><strong>Over 60% of Contributions to .NET Core come from the Community</strong> <a href="https://youtu.be/garlskQb8BU?t=2449">direct link</a></li>
<li><strong>Are Microsoft telling the Truth?</strong> <a href="https://youtu.be/garlskQb8BU?t=2469">direct link</a></li>
<li><strong>Analysis of GitHub Repositories - ‘Community v. Microsoft’</strong> <a href="https://youtu.be/garlskQb8BU?t=2540">direct link</a></li>
<li><strong>Issues Opened</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=2600">direct link</a></li>
<li><strong>Pull Requests Created</strong> <a href="https://youtu.be/garlskQb8BU?t=2654">direct link</a></li>
<li><strong>Do .NET Developers Care? - Conclusions</strong> <a href="https://www.youtube.com/watch?v=garlskQb8BU&t=2710">direct link</a></li>
</ul>
<h2 id="what-now">What Now?</h2>
<p>[<a href="https://www.slideshare.net/mattwarren/microsoft-open-source-a-brave-new-world-corestart-20/37">jump to slide</a>] [<a href="https://youtu.be/garlskQb8BU?t=2741">direct video link</a>]</p>
<ul>
<li><strong>How do I Contribute?</strong> <a href="https://youtu.be/garlskQb8BU?t=2745">direct link</a></li>
<li><strong>Domino Chain Reaction</strong> <a href="https://youtu.be/garlskQb8BU?t=2868">direct link</a></li>
<li><strong>First CoreFX PR by Ben Adams</strong> <a href="https://youtu.be/garlskQb8BU?t=2918">direct link</a></li>
<li><strong>First CoreCLR PR by Ben Adams</strong> <a href="https://youtu.be/garlskQb8BU?t=2944">direct link</a></li>
<li><strong>My main Contributions to the CoreCLR</strong> <a href="https://youtu.be/garlskQb8BU?t=2967">direct link</a></li>
<li><strong>Will I get told to RTM?</strong> <a href="https://youtu.be/garlskQb8BU?t=3037">direct link</a></li>
</ul>
<hr />
<h2 id="domino-chain-reaction">Domino Chain Reaction</h2>
<p>Finally, if you’re wondering what the section on ‘Domino Chain Reaction’ is all about, you’ll have to listen to that <a href="https://youtu.be/garlskQb8BU?t=2868">part of the talk</a>, but the video itself is embedded below:</p>
<iframe width="754" height="420" src="https://www.youtube.com/embed/y97rBdSYbkg" frameborder="0" gesture="media" allowfullscreen=""></iframe>
<p>(Based on <strong>actual</strong> research, see <a href="https://www.technologyreview.com/s/509641/the-curious-mathematics-of-domino-chain-reactions/">The Curious Mathematics of Domino Chain Reactions</a>)</p>
A DoS Attack against the C# Compiler · 2017-11-08 · http://www.mattwarren.org/2017/11/08/A-DoS-Attack-against-the-C#-Compiler
<p>Generics in C# are certainly very useful and I find it amazing that <a href="https://blogs.msdn.microsoft.com/dsyme/2011/03/15/netc-generics-history-some-photos-from-feb-1999/">we almost didn’t get them</a>:</p>
<blockquote>
<p>What would the cost of inaction have been? What would the cost of failure have been? No generics in C# 2.0? No LINQ in C# 3.0? No TPL in C# 4.0? No Async in C# 5.0? No F#? Ultimately, an erasure model of generics would have been adopted, as for Java, since the CLR team would never have pursued a in-the-VM generics design without external help.</p>
</blockquote>
<p>So a big thanks is due to <a href="https://www.microsoft.com/en-us/research/people/dsyme/">Don Syme</a> and the rest of the team at Microsoft Research in Cambridge!</p>
<p>But as well as being useful, I also find some usages of generics mind-bending. For instance, I’m still not sure what this code <em>actually</em> means or how to explain it in words:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Blah</span><span class="p"><</span><span class="n">T</span><span class="p">></span> <span class="k">where</span> <span class="n">T</span> <span class="p">:</span> <span class="n">Blah</span><span class="p"><</span><span class="n">T</span><span class="p">></span>
</code></pre></div></div>
<p>As always, reading an Eric Lippert post <a href="https://blogs.msdn.microsoft.com/ericlippert/2011/02/03/curiouser-and-curiouser/">helps a lot</a>, but even he recommends against using this specific ‘circular’ pattern.</p>
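<p>For what it’s worth, here is a minimal sketch of how the ‘circular’ constraint typically gets used. <code class="language-plaintext highlighter-rouge">Blah</code> comes from the snippet above, but <code class="language-plaintext highlighter-rouge">Concrete</code>, <code class="language-plaintext highlighter-rouge">Self</code> and <code class="language-plaintext highlighter-rouge">Demo</code> are made-up names purely for illustration, so treat it as one possible reading rather than the definitive explanation:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical types, for illustration only.
abstract class Blah<T> where T : Blah<T>
{
    // The constraint lets the base class hand back 'this' as the derived type T.
    // The cast is still required: the compiler cannot prove that 'this' is a T.
    public T Self() => (T)this;
}

sealed class Concrete : Blah<Concrete>
{
    public string Name => "Concrete";
}

static class Demo
{
    public static string Run()
    {
        // Statically typed as Concrete, with no cast at the call site.
        Concrete c = new Concrete().Self();
        return c.Name;
    }
}
</code></pre></div></div>
<p>Note that the constraint doesn’t stop someone writing <code class="language-plaintext highlighter-rouge">class Other : Blah<Concrete></code>, in which case the cast inside <code class="language-plaintext highlighter-rouge">Self()</code> fails at run-time. That is a big part of why the pattern comes with a health warning.</p>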
<hr />
<p>Recently I spoke at the <a href="https://www.corestart.cz/">CORESTART 2.0</a> conference in Prague, giving a talk on <a href="https://www.corestart.cz/#page-speeches">‘Microsoft and Open-Source – A ‘Brave New World’’</a>. Whilst I was there I met the very knowledgeable <a href="https://twitter.com/cincura_net">Jiri Cincura</a>, who blogs at <a href="https://www.tabsoverspaces.com/">tabs ↹ over ␣ ␣ ␣ spaces</a>. He was giving a great talk on ‘C# 7.1 and 7.2 features’, but also shared with me an excellent code snippet that he called ‘Crazy Class’:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">></span>
<span class="p">{</span>
<span class="k">class</span> <span class="nc">Inner</span> <span class="p">:</span> <span class="n">Class</span><span class="p"><</span><span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">></span>
<span class="p">{</span>
<span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span> <span class="n">inner</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>He said:</p>
<blockquote>
<p>this is the class that takes crazy amount of time to compile. You can add more <code class="language-plaintext highlighter-rouge">Inner.Inner.Inner...</code> to make it even longer (and also generic parameters).</p>
</blockquote>
<p>After a bit of digging around I found that someone else had noticed this, see the StackOverflow question <a href="https://stackoverflow.com/questions/14177225/why-does-field-declaration-with-duplicated-nested-type-in-generic-class-results/14178014">Why does field declaration with duplicated nested type in generic class results in huge source code increase?</a> Helpfully, the ‘accepted answer’ explains what is going on:</p>
<blockquote>
<p>When you combine these two, the way you have done, something interesting happens. The type <code class="language-plaintext highlighter-rouge">Outer<T>.Inner</code> is not the same type as <code class="language-plaintext highlighter-rouge">Outer<T>.Inner.Inner</code>. <code class="language-plaintext highlighter-rouge">Outer<T>.Inner</code> is a subclass of <code class="language-plaintext highlighter-rouge">Outer<Outer<T>.Inner></code> while <code class="language-plaintext highlighter-rouge">Outer<T>.Inner.Inner</code> is a subclass of <code class="language-plaintext highlighter-rouge">Outer<Outer<Outer<T>.Inner>.Inner></code>, which we established before as being different from <code class="language-plaintext highlighter-rouge">Outer<T>.Inner</code>. So <code class="language-plaintext highlighter-rouge">Outer<T>.Inner.Inner</code> and <code class="language-plaintext highlighter-rouge">Outer<T>.Inner</code> <strong>are referring to different types</strong>.</p>
<p>When generating IL, the compiler always uses fully qualified names for types. You have cleverly found a way to refer to types with names whose lengths that grow at <strong>exponential rates</strong>. That is why as you increase the generic arity of <code class="language-plaintext highlighter-rouge">Outer</code> or add additional levels <code class="language-plaintext highlighter-rouge">.Y</code> to the field <code class="language-plaintext highlighter-rouge">field</code> in <code class="language-plaintext highlighter-rouge">Inner</code> the output IL size and compile time grow so quickly.</p>
</blockquote>
<p><strong>Clear? Good!!</strong></p>
<p>You probably have to be Jon Skeet, Eric Lippert or a member of the <a href="https://github.com/dotnet/csharplang/blob/057c1fde486803b9e7d33df70dcb84fefa6c89b1/meetings/2015/LDM-2015-01-21.md#design-team">C# Language Design Team</a> (yay, ‘Matt Warren’) to really understand what’s going on here, but that doesn’t stop the rest of us having fun with the code!!</p>
<p><strong style="color:red">I can’t think of any reason why you’d actually want to write code like this, so please don’t!! (or at least if you do, don’t blame me!!)</strong></p>
<p>For a simple idea of what’s actually happening, let’s take this code (with only 2 ‘Levels’):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">></span>
<span class="p">{</span>
<span class="k">class</span> <span class="nc">Inner</span> <span class="p">:</span> <span class="n">Class</span><span class="p"><</span><span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">,</span> <span class="n">Inner</span><span class="p">></span>
<span class="p">{</span>
<span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span> <span class="n">inner</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The ‘decompiled’ version actually looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">internal</span> <span class="k">class</span> <span class="nc">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">></span>
<span class="p">{</span>
<span class="k">private</span> <span class="k">class</span> <span class="nc">Inner</span> <span class="p">:</span> <span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">></span>
<span class="p">{</span>
<span class="k">private</span> <span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">,</span>
<span class="n">Class</span><span class="p"><</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">>.</span><span class="n">Inner</span><span class="p">>.</span><span class="n">Inner</span><span class="p">>.</span><span class="n">Inner</span> <span class="n">inner</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Wow, no wonder things go wrong quickly!!</p>
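<p>To get a feel for the numbers, the nesting in the decompiled field can be reproduced with a few lines of string handling. This is only a sketch: it builds the C#-style display name, not whatever encoding the compiler really writes into the IL metadata, and <code class="language-plaintext highlighter-rouge">CrazyClassNames</code>, <code class="language-plaintext highlighter-rouge">TypeName</code> and <code class="language-plaintext highlighter-rouge">NameLength</code> are hypothetical names. ‘Level 0’ here is <code class="language-plaintext highlighter-rouge">Class<A, B, C, D, E, F>.Inner</code>, so the field in the 2-level example above corresponds to <code class="language-plaintext highlighter-rouge">TypeName(2)</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq;

static class CrazyClassNames
{
    const int Arity = 6; // number of generic parameters on Class (A..F)
    const string Level0 = "Class<A, B, C, D, E, F>.Inner";

    // Builds the display name: each level wraps 'Arity' copies of the previous level.
    public static string TypeName(int level)
    {
        if (level == 0)
            return Level0;
        string previous = TypeName(level - 1);
        return "Class<" + string.Join(", ", Enumerable.Repeat(previous, Arity)) + ">.Inner";
    }

    // The same length, computed by recurrence so that no huge string is allocated.
    public static long NameLength(int level)
    {
        long length = Level0.Length;
        for (int i = 0; i < level; i++)
        {
            // 'Arity' copies, (Arity - 1) ", " separators, plus "Class<" and ">.Inner"
            length = Arity * length + 2 * (Arity - 1) + "Class<".Length + ">.Inner".Length;
        }
        return length;
    }
}
</code></pre></div></div>
<p>With these assumptions the name is 29 characters at level 0, 197 at level 1 and 1,205 at level 2, so each extra level multiplies the length by roughly 6 (the number of generic parameters).</p>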
<hr />
<h3 id="exponential-growth">Exponential Growth</h3>
<p>Firstly, let’s check the claim of <strong>exponential growth</strong>. If you don’t remember your <a href="https://en.wikipedia.org/wiki/Big_O_notation">Big O notation</a>, you can also think of this as <code class="language-plaintext highlighter-rouge">O(very, very bad)</code>!!</p>
<p>To test this out, I’m going to compile the code above, but vary the ‘level’ each time by adding a new <code class="language-plaintext highlighter-rouge">.Inner</code>, so ‘Level 5’ looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span> <span class="n">inner</span><span class="p">;</span>
</code></pre></div></div>
<p>‘Level 6’ looks like this, and so on:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span><span class="p">.</span><span class="n">Inner</span> <span class="n">inner</span><span class="p">;</span>
</code></pre></div></div>
<p>We then get the following results:</p>
<table>
<thead>
<tr>
<th>Level</th>
<th style="text-align: right">Compile Time (secs)</th>
<th style="text-align: right">Working set (KB)</th>
<th style="text-align: right">Binary Size (Bytes)</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td style="text-align: right">1.15</td>
<td style="text-align: right">54,288</td>
<td style="text-align: right">135,680</td>
</tr>
<tr>
<td>6</td>
<td style="text-align: right">1.22</td>
<td style="text-align: right">59,500</td>
<td style="text-align: right">788,992</td>
</tr>
<tr>
<td>7</td>
<td style="text-align: right">2.00</td>
<td style="text-align: right">70,728</td>
<td style="text-align: right">4,707,840</td>
</tr>
<tr>
<td>8</td>
<td style="text-align: right">6.43</td>
<td style="text-align: right">121,852</td>
<td style="text-align: right">28,222,464</td>
</tr>
<tr>
<td>9</td>
<td style="text-align: right">33.23</td>
<td style="text-align: right">405,472</td>
<td style="text-align: right">169,310,208</td>
</tr>
<tr>
<td>10</td>
<td style="text-align: right">202.10</td>
<td style="text-align: right">2,141,272</td>
<td style="text-align: right"><strong>CRASH</strong></td>
</tr>
</tbody>
</table>
<p>If we look at these results in graphical form, it’s very obvious what’s going on:</p>
<p><a href="/images/2017/11/Crazy Class - Compile Time.png"><img src="/images/2017/11/Crazy Class - Compile Time.png" alt="Crazy Class - Compile Time" /></a></p>
<p><a href="/images/2017/11/Crazy Class - Working Set.png"><img src="/images/2017/11/Crazy Class - Working Set.png" alt="Crazy Class - Working Set" /></a></p>
<p><a href="/images/2017/11/Crazy Class - Binary Size.png"><img src="/images/2017/11/Crazy Class - Binary Size.png" alt="Crazy Class - Binary Size" /></a></p>
<p>(the dotted lines are a ‘best fit’ trend-line and they are exponential)</p>
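<p>A quick way to sanity-check the ‘exponential’ claim without a chart is to divide each binary size by the one before it. The sketch below does that with the sizes copied from the table above (the <code class="language-plaintext highlighter-rouge">growth_ratio</code> helper is just a name I’ve made up for illustration). Each step should come out a little under 6 and get closer to 6 as the level rises, which is at least suggestive given that the class has six generic type parameters.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;

/* Binary sizes in bytes for levels 5..9, copied from the table above. */
static const double binary_size[] = { 135680, 788992, 4707840, 28222464, 169310208 };

/* Growth factor between one level and the previous one (hypothetical helper). */
static double growth_ratio(double current, double previous) {
    return current / previous;
}

int main(void) {
    int count = (int)(sizeof binary_size / sizeof binary_size[0]);
    /* Index 0 holds level 5, so index i holds level i + 5. */
    for (int i = 1; i &lt; count; i++) {
        printf("Level %d to %d: x%.3f\n", i + 4, i + 5,
               growth_ratio(binary_size[i], binary_size[i - 1]));
    }
    return 0;
}
</code></pre></div></div>
<p>A roughly constant ratio between successive levels is exactly what an exponential looks like, so this agrees with the trend-lines in the charts.</p>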
<p>If I compile the code with <code class="language-plaintext highlighter-rouge">dotnet build</code> (version 2.0.0), things go really wrong at ‘Level 10’ and the compiler throws an error (<a href="https://gist.github.com/mattwarren/d6fd747792cf1e98cba4679bf1398041">full stack trace</a>):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">System</span><span class="p">.</span><span class="n">ArgumentOutOfRangeException</span><span class="p">:</span> <span class="n">Specified</span> <span class="n">argument</span> <span class="n">was</span> <span class="k">out</span> <span class="n">of</span> <span class="n">the</span> <span class="n">range</span> <span class="n">of</span> <span class="n">valid</span> <span class="n">values</span><span class="p">.</span>
</code></pre></div></div>
<p>Which looks similar to <a href="https://github.com/Microsoft/visualfsharp/issues/3866">Internal compiler error when creating Portable PDB files #3866</a>.</p>
<p>However, your mileage may vary. When I ran the code in Visual Studio 2015 it threw an <code class="language-plaintext highlighter-rouge">OutOfMemoryException</code> instead and then promptly restarted itself!! I assume this is because <a href="https://blogs.msdn.microsoft.com/ricom/2009/06/10/visual-studio-why-is-there-no-64-bit-version-yet/">VS is a 32-bit application</a> and it runs out of memory before it can go really wrong!</p>
<hr />
<h3 id="mono-compiler">Mono Compiler</h3>
<p>As a comparison, here are the results from the <a href="https://github.com/mono/">Mono compiler</a>, thanks to <a href="https://twitter.com/EgorBo">Egor Bogatov</a> for putting them together.</p>
<table>
<thead>
<tr>
<th>Level</th>
<th style="text-align: right">Compile Time (secs)</th>
<th style="text-align: right">Binary Size (Bytes)</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td style="text-align: right">0.480</td>
<td style="text-align: right">134,144</td>
</tr>
<tr>
<td>6</td>
<td style="text-align: right">0.502</td>
<td style="text-align: right">786,944</td>
</tr>
<tr>
<td>7</td>
<td style="text-align: right">0.745</td>
<td style="text-align: right">4,706,304</td>
</tr>
<tr>
<td>8</td>
<td style="text-align: right">2.053</td>
<td style="text-align: right">28,220,928</td>
</tr>
<tr>
<td>9</td>
<td style="text-align: right">10.134</td>
<td style="text-align: right">169,308,672</td>
</tr>
<tr>
<td>10</td>
<td style="text-align: right">57.307</td>
<td style="text-align: right">1,015,835,136</td>
</tr>
</tbody>
</table>
<p>At ‘Level 10’ it <a href="https://twitter.com/EgorBo/status/928388080519741445">produced a 968.78 MB binary</a>!!</p>
<p><a href="/images/2017/11/Mono Compiler - Level 10.jpg"><img src="/images/2017/11/Mono Compiler - Level 10.jpg" alt="Mono Compiler - Level 10" /></a></p>
<hr />
<h3 id="profiling-the-compiler">Profiling the Compiler</h3>
<p>Finally, I want to look at just where the compiler is spending all its time. From the results above we saw that it was taking <strong>over 3 minutes</strong> to compile a simple program, with a peak memory usage of <strong>2.14 GB</strong>, so what was it actually doing??</p>
<p>Well, clearly there are lots of <code class="language-plaintext highlighter-rouge">Types</code> involved and the Compiler seems happy for you to write this code, so I guess it needs to figure it all out. Once it’s done that, it then needs to write all this <code class="language-plaintext highlighter-rouge">Type</code> metadata out to a .dll or .exe, which can be <strong>100s of MB</strong> in size.</p>
<p>At a high-level the profiling summary produced by VS looks like this (click for full-size image):</p>
<p><a href="/images/2017/11/Profiling Report.png"><img src="/images/2017/11/Profiling Report.png" alt="Profiling Report" /></a></p>
<p>However, if we take a closer look, we can see the ‘hot-path’ is inside the <code class="language-plaintext highlighter-rouge">SerializeTypeReference(..)</code> method in <a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/Core/Portable/PEWriter/MetadataWriter.cs#L3788-L3810">Compilers/Core/Portable/PEWriter/MetadataWriter.cs</a>:</p>
<p><a href="/images/2017/11/Profiling - Hot Path.png"><img src="/images/2017/11/Profiling - Hot Path.png" alt="Profiling - Hot Path" /></a></p>
<hr />
<h3 id="summary">Summary</h3>
<p>I’m a bit torn about this, it is clearly an ‘abuse’ of generics!!</p>
<p>In some ways I think that it <strong>shouldn’t</strong> be fixed. It seems better that the compiler encourages you to <strong>not</strong> write code like this, rather than making it possible!!</p>
<p><strong style="color:red">So if it takes 3 mins to compile your code, allocates 2GB of memory and then crashes, take that as a warning!!</strong></p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=15654970">Hacker News</a>, <a href="https://www.reddit.com/r/programming/comments/7bn21r/a_dos_attack_against_the_c_compiler_performance/">/r/programming</a> and <a href="https://www.reddit.com/r/csharp/comments/7bn206/a_dos_attack_against_the_c_compiler_performance/">/r/csharp</a></p>
<p>The post <a href="http://www.mattwarren.org/2017/11/08/A-DoS-Attack-against-the-C-Compiler/">A DoS Attack against the C# Compiler</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
DotNetAnywhere: An Alternative .NET Runtime2017-10-19T00:00:00+00:00http://www.mattwarren.org/2017/10/19/DotNetAnywhere-an-Alternative-.NET-Runtime
<p>Recently I was listening to the excellent <a href="https://www.dotnetrocks.com/">DotNetRocks podcast</a> and they had <a href="https://twitter.com/stevensanderson">Steven Sanderson</a> (of <a href="http://knockoutjs.com/">Knockout.js fame</a>) talking about <a href="https://www.dotnetrocks.com/?show=1455">‘WebAssembly and Blazor’</a>.</p>
<p>In case you haven’t heard about it, <a href="https://github.com/SteveSanderson/Blazor">Blazor</a> is an attempt to bring .NET to the browser, using the magic of <a href="https://developer.mozilla.org/en-US/docs/WebAssembly">WebAssembly</a>. If you want more info, Scott Hanselman has done a <a href="https://www.hanselman.com/blog/NETAndWebAssemblyIsThisTheFutureOfTheFrontend.aspx">nice write-up of the various .NET/WebAssembly projects</a>.</p>
<p>However, as much as the mention of WebAssembly was pretty cool, what interested me even more was how Blazor was using <a href="https://github.com/chrisdunelm/DotNetAnywhere">DotNetAnywhere</a> as the underlying .NET runtime. This post will look at what DotNetAnywhere is, what you can do with it and how it compares to the full .NET framework.</p>
<hr />
<h1 id="dotnetanywhere">DotNetAnywhere</h1>
<p>Firstly it’s worth pointing out that DotNetAnywhere (DNA) is designed to be a fully compliant .NET runtime, which means that it can run .NET dlls/exes that have been compiled to run against the full framework. On top of that (at least in theory) it <strong>supports</strong> all the following <a href="https://github.com/chrisdunelm/DotNetAnywhere#supported-net-runtime-features">.NET runtime features</a>, which is a pretty impressive list!</p>
<blockquote>
<ul>
<li>Generics</li>
<li>Garbage collection and finalization</li>
<li>Weak references</li>
<li>Full exception handling - try/catch/finally</li>
<li>PInvoke</li>
<li>Interfaces</li>
<li>Delegates</li>
<li>Events</li>
<li>Nullable types</li>
<li>Single-dimensional arrays</li>
<li>Multi-threading</li>
</ul>
</blockquote>
<p>In addition there is some <strong>partial support</strong> for <a href="https://docs.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/reflection">Reflection</a></p>
<blockquote>
<ul>
<li>Very limited read-only reflection
<ul>
<li>typeof(), .GetType(), Type.Name, Type.Namespace, Type.IsEnum(), &lt;object&gt;.ToString() only</li>
</ul>
</li>
</ul>
</blockquote>
<p>Finally, there are a few features that are currently <strong>unsupported</strong>:</p>
<blockquote>
<ul>
<li>Attributes</li>
<li>Most reflection</li>
<li>Multi-dimensional arrays</li>
<li>Unsafe code</li>
</ul>
</blockquote>
<p>There are <a href="https://github.com/chrisdunelm/DotNetAnywhere/issues?q=is%3Aissue+is%3Aclosed">various bugs or missing functionality</a> that might prevent your code running under DotNetAnywhere, however several of these have been <a href="https://github.com/SteveSanderson/Blazor/pulls?utf8=%E2%9C%93&q=is%3Apr">fixed since Blazor came along</a>, so it’s worth checking against the Blazor version of DotNetAnywhere.</p>
<p><strong>At this point in time the original DotNetAnywhere repo is <a href="https://github.com/chrisdunelm/DotNetAnywhere#this-project-is-inactive-no-issues-or-prs-will-be-dealt-with">no longer active</a> (the last sustained activity was in Jan 2012), so it seems that any future development or bug fixes will likely happen in the Blazor repo. If you have ever fixed something in DotNetAnywhere, consider sending a PR there, to help the effort.</strong></p>
<p><strong>Update:</strong> In addition there are other forks with various bug fixes and enhancements:</p>
<ul>
<li><a href="https://github.com/ncave/dotnet-js">https://github.com/ncave/dotnet-js</a></li>
<li><a href="https://github.com/memsom/dna">https://github.com/memsom/dna</a></li>
</ul>
<h2 id="source-code-layout">Source Code Layout</h2>
<p>What I find most impressive about the DotNetAnywhere runtime is that it was <strong>developed by one person</strong> and is <strong>less than 40,000 lines of code</strong>!! For comparison, the .NET framework Garbage Collector is <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/gc.cpp">almost 37,000 lines on its own</a> (more info available in my previous post <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/#overall-stats">A Hitchhikers Guide to the CoreCLR Source Code</a>).</p>
<p><strong style="color:green">This makes DotNetAnywhere an ideal learning resource!</strong></p>
<p>Firstly, let’s take a look at the Top-10 largest source files, to see where the complexity is:</p>
<h3 id="native-code---17710-lines-in-total">Native Code - <strong>17,710</strong> lines in total</h3>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: right">LOC</th>
<th style="text-align: left">File</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">3,164</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/JIT_Execute.c">JIT_Execute.c</a></td>
</tr>
<tr>
<td style="text-align: right">1,778</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/JIT.c">JIT.c</a></td>
</tr>
<tr>
<td style="text-align: right">1,109</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/PInvoke_CaseCode.h">PInvoke_CaseCode.h</a></td>
</tr>
<tr>
<td style="text-align: right">630</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/Heap.c">Heap.c</a></td>
</tr>
<tr>
<td style="text-align: right">618</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/MetaData.c">MetaData.c</a></td>
</tr>
<tr>
<td style="text-align: right">563</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/MetaDataTables.h">MetaDataTables.h</a></td>
</tr>
<tr>
<td style="text-align: right">517</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/Type.c">Type.c</a></td>
</tr>
<tr>
<td style="text-align: right">491</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/MetaData_Fill.c">MetaData_Fill.c</a></td>
</tr>
<tr>
<td style="text-align: right">467</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/MetaData_Search.c">MetaData_Search.c</a></td>
</tr>
<tr>
<td style="text-align: right">452</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/tree/master/dna/JIT_OpCodes.h">JIT_OpCodes.h</a></td>
</tr>
</tbody>
</table>
</span>
<h3 id="managed-code---28783-lines-in-total">Managed Code - <strong>28,783</strong> lines in total</h3>
<table>
<thead>
<tr>
<th style="text-align: right">LOC</th>
<th style="text-align: left">File</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">2,393</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System.Globalization/CalendricalCalculations.cs">corlib/System.Globalization/CalendricalCalculations.cs</a></td>
</tr>
<tr>
<td style="text-align: right">2,314</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System/NumberFormatter.cs">corlib/System/NumberFormatter.cs</a></td>
</tr>
<tr>
<td style="text-align: right">1,582</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/System.Drawing/System.Drawing/Pens.cs">System.Drawing/System.Drawing/Pens.cs</a></td>
</tr>
<tr>
<td style="text-align: right">1,443</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/System.Drawing/System.Drawing/Brushes.cs">System.Drawing/System.Drawing/Brushes.cs</a></td>
</tr>
<tr>
<td style="text-align: right">1,405</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/System.Core/System.Linq/Enumerable.cs">System.Core/System.Linq/Enumerable.cs</a></td>
</tr>
<tr>
<td style="text-align: right">745</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System/DateTime.cs">corlib/System/DateTime.cs</a></td>
</tr>
<tr>
<td style="text-align: right">693</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System.IO/Path.cs">corlib/System.IO/Path.cs</a></td>
</tr>
<tr>
<td style="text-align: right">632</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System.Collections.Generic/Dictionary.cs">corlib/System.Collections.Generic/Dictionary.cs</a></td>
</tr>
<tr>
<td style="text-align: right">598</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System/String.cs">corlib/System/String.cs</a></td>
</tr>
<tr>
<td style="text-align: right">467</td>
<td style="text-align: left"><a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System.Text/StringBuilder.cs">corlib/System.Text/StringBuilder.cs</a></td>
</tr>
</tbody>
</table>
<hr />
<h2 id="main-areas-of-functionality">Main areas of functionality</h2>
<p>Next, let’s look at the key components in DotNetAnywhere as this gives us a really good idea about what you need to implement a .NET compatible runtime. Along the way, we will also see how they differ from the implementation found in Microsoft’s .NET Framework.</p>
<h3 id="reading-net-dlls">Reading .NET dlls</h3>
<p>The first thing DotNetAnywhere has to do is read/understand/parse the .NET <em>Metadata and Code</em> that’s contained in a .dll/.exe. This all takes place in <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/MetaData.c">MetaData.c</a>, primarily within the <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/MetaData.c#L302-L484">LoadSingleTable(..)</a> function. By adding some debugging code, I was able to get a summary of all the different types of <em>Metadata</em> that are read in from a typical .NET dll. It’s quite an interesting list:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MetaData contains 1 Assemblies (MD_TABLE_ASSEMBLY)
MetaData contains 1 Assembly References (MD_TABLE_ASSEMBLYREF)
MetaData contains 0 Module References (MD_TABLE_MODULEREF)
MetaData contains 40 Type References (MD_TABLE_TYPEREF)
MetaData contains 13 Type Definitions (MD_TABLE_TYPEDEF)
MetaData contains 14 Type Specifications (MD_TABLE_TYPESPEC)
MetaData contains 5 Nested Classes (MD_TABLE_NESTEDCLASS)
MetaData contains 11 Field Definitions (MD_TABLE_FIELDDEF)
MetaData contains 0 Field RVA's (MD_TABLE_FIELDRVA)
MetaData contains 2 Properties (MD_TABLE_PROPERTY)
MetaData contains 59 Member References (MD_TABLE_MEMBERREF)
MetaData contains 2 Constants (MD_TABLE_CONSTANT)
MetaData contains 35 Method Definitions (MD_TABLE_METHODDEF)
MetaData contains 5 Method Specifications (MD_TABLE_METHODSPEC)
MetaData contains 4 Method Semantics (MD_TABLE_METHODSEMANTICS)
MetaData contains 0 Method Implementations (MD_TABLE_METHODIMPL)
MetaData contains 22 Parameters (MD_TABLE_PARAM)
MetaData contains 2 Interface Implementations (MD_TABLE_INTERFACEIMPL)
MetaData contains 0 Implementation Maps? (MD_TABLE_IMPLMAP)
MetaData contains 2 Generic Parameters (MD_TABLE_GENERICPARAM)
MetaData contains 1 Generic Parameter Constraints (MD_TABLE_GENERICPARAMCONSTRAINT)
MetaData contains 22 Custom Attributes (MD_TABLE_CUSTOMATTRIBUTE)
MetaData contains 0 Security Info Items? (MD_TABLE_DECLSECURITY)
</code></pre></div></div>
<p>For more information on the <em>Metadata</em> see <a href="https://iobservable.net/blog/2013/05/12/introduction-to-clr-metadata/">Introduction to CLR metadata</a>, <a href="https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-pe-headers/">Anatomy of a .NET Assembly – PE Headers</a> and the <a href="https://www.visualstudio.com/license-terms/ecma-c-common-language-infrastructure-standards/">ECMA specification itself</a>.</p>
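<p>To make the ‘which tables are present’ idea a bit more concrete, here is a minimal sketch of how row counts can be recovered from the header of the metadata tables stream. As I understand the ECMA-335 layout, that header carries a 64-bit <code class="language-plaintext highlighter-rouge">Valid</code> bitmask (one bit per table number) followed by one 4-byte row count for each table that is present, in table-number order. This is <strong>not</strong> DotNetAnywhere’s code: the function names and the hand-built sample values (40 Type References, 13 Type Definitions and 35 Method Definitions, taken from the list above) are purely illustrative.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* Table numbers as given in ECMA-335, Partition II. */
enum { TABLE_TYPEREF = 0x01, TABLE_TYPEDEF = 0x02, TABLE_FIELD = 0x04, TABLE_METHODDEF = 0x06 };

/* Counts the set bits below bit 'table', which is the position of that
   table's entry in the packed rows[] array. Assumes table &lt; 64. */
static unsigned rows_index(uint64_t valid, unsigned table) {
    uint64_t below = valid &amp; ((UINT64_C(1) &lt;&lt; table) - 1);
    unsigned n = 0;
    while (below) {
        below &amp;= below - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Returns the row count for 'table', or 0 if the table is absent. */
static uint32_t row_count(uint64_t valid, const uint32_t *rows, unsigned table) {
    if (!(valid &amp; (UINT64_C(1) &lt;&lt; table))) {
        return 0;
    }
    return rows[rows_index(valid, table)];
}

int main(void) {
    /* Hand-built sample: only TypeRef, TypeDef and MethodDef are present. */
    uint64_t valid = (UINT64_C(1) &lt;&lt; TABLE_TYPEREF) |
                     (UINT64_C(1) &lt;&lt; TABLE_TYPEDEF) |
                     (UINT64_C(1) &lt;&lt; TABLE_METHODDEF);
    uint32_t rows[] = { 40, 13, 35 };

    printf("TypeRef rows:   %u\n", (unsigned)row_count(valid, rows, TABLE_TYPEREF));
    printf("TypeDef rows:   %u\n", (unsigned)row_count(valid, rows, TABLE_TYPEDEF));
    printf("Field rows:     %u\n", (unsigned)row_count(valid, rows, TABLE_FIELD));
    printf("MethodDef rows: %u\n", (unsigned)row_count(valid, rows, TABLE_METHODDEF));
    return 0;
}
</code></pre></div></div>
<p>Because absent tables take up no space in the row-count array, a reader has to count the set bits below a table’s bit to find its entry, which is why a zero-row table such as ‘Field’ in this sample simply has no slot at all.</p>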
<hr />
<h3 id="executing-net-il">Executing .NET IL</h3>
<p>Another large piece of functionality within DotNetAnywhere is the ‘Just-in-Time’ Compiler (JIT), i.e. the code that is responsible for executing the IL. This takes place initially in <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/JIT_Execute.c">JIT_Execute.c</a> and then <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/JIT.c">JIT.c</a>. The main ‘execution loop’ is in the <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/JIT.c#L232-L1606">JITit(..) function</a> which contains an impressive 1,374 lines of code and over 200 <code class="language-plaintext highlighter-rouge">case</code> statements within a single <code class="language-plaintext highlighter-rouge">switch</code>!!</p>
<p>Taking a higher level view, the overall process that it goes through looks like this:</p>
<p><a href="/images/2017/10/NET IL - DNA JIT Op-Codes.png"><img src="/images/2017/10/NET IL - DNA JIT Op-Codes.png" alt="NET IL -> DNA JIT Op-Codes" /></a></p>
<p>Where the .NET IL Op-Codes (<code class="language-plaintext highlighter-rouge">CIL_XXX</code>) are defined in <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/CIL_OpCodes.h">CIL_OpCodes.h</a> and the DotNetAnywhere JIT Op-Codes (<code class="language-plaintext highlighter-rouge">JIT_XXX</code>) are defined in <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/JIT_OpCodes.h">JIT_OpCodes.h</a>.</p>
<p>Interestingly enough, the JIT is the only place in DotNetAnywhere that <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/JIT_Execute.c#L184-L204">uses assembly code</a> and even then it’s only for <code class="language-plaintext highlighter-rouge">win32</code>. It is used to allow a ‘jump’ or a <code class="language-plaintext highlighter-rouge">goto</code> to labels in the C source code. This means that as IL instructions are executed, it never actually leaves the <code class="language-plaintext highlighter-rouge">JITit(..)</code> function; control is just moved around without having to make a full method call.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef __GNUC__
</span>
<span class="cp">#define GET_LABEL(var, label) var = &&label
</span>
<span class="cp">#define GO_NEXT() goto **(void**)(pCurOp++)
</span>
<span class="cp">#else
#ifdef WIN32
</span>
<span class="cp">#define GET_LABEL(var, label) \
{ __asm mov edi, label \
__asm mov var, edi }
</span>
<span class="cp">#define GO_NEXT() \
{ __asm mov edi, pCurOp \
__asm add edi, 4 \
__asm mov pCurOp, edi \
__asm jmp DWORD PTR [edi - 4] }
</span>
<span class="cp">#endif
</span></code></pre></div></div>
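<p>If you’ve not seen this ‘jump to a label’ trick before, here is a toy version of it. It is only a sketch and not how DotNetAnywhere is actually structured: it needs GCC or Clang (it relies on the ‘labels as values’ extension), the three op-codes and the <code class="language-plaintext highlighter-rouge">run</code> function are made up, and it looks each label up in a table whereas the macros above store the label addresses directly in the op-code stream.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* Made-up op-codes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

/* Runs a program where each op-code is an index into the label table
   and operands are stored inline, straight after their op-code. */
static long run(const intptr_t *code) {
    /* Addresses of the labels below (GNU C 'labels as values'). */
    static void *const labels[OP_COUNT] = { &amp;&amp;do_push, &amp;&amp;do_add, &amp;&amp;do_halt };
    long stack[16];
    int sp = 0;
    const intptr_t *pc = code;

/* Fetch the next op-code and jump straight to its handler. */
#define GO_NEXT() goto *labels[*pc++]

    GO_NEXT();

do_push:
    stack[sp++] = (long)*pc++;   /* the operand follows the op-code */
    GO_NEXT();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];  /* pop two values, push their sum */
    GO_NEXT();
do_halt:
    return stack[sp - 1];

#undef GO_NEXT
}

int main(void) {
    const intptr_t program[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_HALT };
    printf("%ld\n", run(program)); /* prints 42 */
    return 0;
}
</code></pre></div></div>
<p>There is no central loop and no <code class="language-plaintext highlighter-rouge">switch</code> here: every handler ends by jumping directly to the next one, which is the same shape as the <code class="language-plaintext highlighter-rouge">GO_NEXT()</code> macro in the real code.</p>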
<p><strong style="color:orange">Differences with the .NET Framework</strong></p>
<p>In the full .NET framework all IL code is turned into machine code by the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-tutorial.md">Just-in-Time Compiler (JIT)</a> before being executed by the CPU.</p>
<p>However, as we’ve already seen, DotNetAnywhere ‘interprets’ the IL, instruction-by-instruction, and even though it’s done in a file called <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/JIT.c">JIT.c</a> <strong>no machine code</strong> is emitted, so the naming seems strange!?</p>
<p>Maybe it’s just a difference of perspective, but it’s not clear to me at what point you move from ‘interpreting’ code to ‘JITting’ it, even after reading the following links I’m not sure!! (can someone enlighten me?)</p>
<ul>
<li><a href="https://stackoverflow.com/questions/2426091/what-are-the-differences-between-a-just-in-time-compiler-and-an-interpreter">What are the differences between a Just-in-Time-Compiler and an Interpreter?</a></li>
<li><a href="https://softwareengineering.stackexchange.com/questions/246094/understanding-the-differences-traditional-interpreter-jit-compiler-jit-interp">Understanding the differences: traditional interpreter, JIT compiler, JIT interpreter and AOT compiler</a></li>
<li><a href="https://stackoverflow.com/questions/3718024/jit-vs-interpreters">JIT vs Interpreters</a></li>
<li><a href="https://www.quora.com/Why-do-we-call-it-JIT-compiler-and-not-JIT-interpreter-to-refer-to-the-thing-that-converts-the-Java-bytecode-to-the-machine-code">Why do we call it “JIT compiler” and not “JIT interpreter” to refer to the thing that converts the Java bytecode to the machine code?</a></li>
<li><a href="https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/geninfo/diagnos/underst_jit.html">Understanding JIT Compilation and Optimizations</a></li>
</ul>
<hr />
<h3 id="garbage-collector">Garbage Collector</h3>
<p>All the code for the DotNetAnywhere Garbage Collector (GC) is contained in <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/Heap.c">Heap.c</a> and is a very readable 600 lines of code. To give you an overview of what it does, here is the list of functions that it exposes:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">Heap_Init</span><span class="p">();</span>
<span class="kt">void</span> <span class="nf">Heap_SetRoots</span><span class="p">(</span><span class="n">tHeapRoots</span> <span class="o">*</span><span class="n">pHeapRoots</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">pRoots</span><span class="p">,</span> <span class="n">U32</span> <span class="n">sizeInBytes</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">Heap_UnmarkFinalizer</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">heapPtr</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">Heap_GarbageCollect</span><span class="p">();</span>
<span class="n">U32</span> <span class="nf">Heap_NumCollections</span><span class="p">();</span>
<span class="n">U32</span> <span class="nf">Heap_GetTotalMemory</span><span class="p">();</span>
<span class="n">HEAP_PTR</span> <span class="nf">Heap_Alloc</span><span class="p">(</span><span class="n">tMD_TypeDef</span> <span class="o">*</span><span class="n">pTypeDef</span><span class="p">,</span> <span class="n">U32</span> <span class="n">size</span><span class="p">);</span>
<span class="n">HEAP_PTR</span> <span class="nf">Heap_AllocType</span><span class="p">(</span><span class="n">tMD_TypeDef</span> <span class="o">*</span><span class="n">pTypeDef</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">Heap_MakeUndeletable</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">heapEntry</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">Heap_MakeDeletable</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">heapEntry</span><span class="p">);</span>
<span class="n">tMD_TypeDef</span><span class="o">*</span> <span class="nf">Heap_GetType</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">heapEntry</span><span class="p">);</span>
<span class="n">HEAP_PTR</span> <span class="nf">Heap_Box</span><span class="p">(</span><span class="n">tMD_TypeDef</span> <span class="o">*</span><span class="n">pType</span><span class="p">,</span> <span class="n">PTR</span> <span class="n">pMem</span><span class="p">);</span>
<span class="n">HEAP_PTR</span> <span class="nf">Heap_Clone</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">obj</span><span class="p">);</span>
<span class="n">U32</span> <span class="nf">Heap_SyncTryEnter</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">obj</span><span class="p">);</span>
<span class="n">U32</span> <span class="nf">Heap_SyncExit</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">obj</span><span class="p">);</span>
<span class="n">HEAP_PTR</span> <span class="nf">Heap_SetWeakRefTarget</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">target</span><span class="p">,</span> <span class="n">HEAP_PTR</span> <span class="n">weakRef</span><span class="p">);</span>
<span class="n">HEAP_PTR</span><span class="o">*</span> <span class="nf">Heap_GetWeakRefAddress</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">target</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">Heap_RemovedWeakRefTarget</span><span class="p">(</span><span class="n">HEAP_PTR</span> <span class="n">target</span><span class="p">);</span>
</code></pre></div></div>
<p><strong style="color:orange">Differences with the .NET Framework</strong></p>
<p>However, like the JIT/Interpreter, the GC has some fundamental differences when compared to the .NET Framework.</p>
<h4 id="conservative-garbage-collection"><strong>Conservative Garbage Collection</strong></h4>
<p>Firstly DotNetAnywhere implements what is known as a <a href="https://stackoverflow.com/questions/7629446/conservative-garbage-collector"><em>Conservative</em> GC</a>. In simple terms this means that it does not know (for sure) which areas of memory are actually references/pointers to objects and which are just a random number (that looks like a memory address). In the Microsoft .NET Framework the JIT calculates this information and stores it in the <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/gcinfo.h">GCInfo structure</a> so the GC can make use of it. But DotNetAnywhere doesn’t do this.</p>
<p>Instead, during the <code class="language-plaintext highlighter-rouge">Mark</code> phase the GC <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/Heap.c#L278-L345">gets all the available ‘roots’</a>, but it will consider all memory addresses within an object as ‘potential’ references (hence it is ‘<em>conservative</em>’). It then has to look up each possible reference, to see if it really points to an ‘object reference’. It does this by keeping track of all memory/heap references in a <a href="http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_andersson.aspx">balanced binary search tree</a> (ordered by memory address), which looks something like this:</p>
<p><a href="/images/2017/10/Binary Tree with Pointers into the Heap.png"><img src="/images/2017/10/Binary Tree with Pointers into the Heap.png" alt="Binary Tree with Pointers into the Heap" /></a></p>
<p>However, this means that all object references have to be stored in the binary tree when they are allocated, which adds some overhead to allocation. In addition extra memory is needed, 20 bytes per heap entry. We can see this by looking at the <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/Heap.c#L58-L83"><code class="language-plaintext highlighter-rouge">tHeapEntry</code> data structure</a> (all pointers are 4 bytes, <code class="language-plaintext highlighter-rouge">U8</code> = 1 byte and <code class="language-plaintext highlighter-rouge">padding</code> is ignored). <code class="language-plaintext highlighter-rouge">tHeapEntry *pLink[2]</code> is the extra data that is needed just to enable the binary tree lookup.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tHeapEntry_</span> <span class="p">{</span>
<span class="c1">// Left/right links in the heap binary tree</span>
<span class="n">tHeapEntry</span> <span class="o">*</span><span class="n">pLink</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="c1">// The 'level' of this node. Leaf nodes have lowest level</span>
<span class="n">U8</span> <span class="n">level</span><span class="p">;</span>
<span class="c1">// Used to mark that this node is still in use.</span>
<span class="c1">// If this is set to 0xff, then this heap entry is undeletable.</span>
<span class="n">U8</span> <span class="n">marked</span><span class="p">;</span>
<span class="c1">// Set to 1 if the Finalizer needs to be run.</span>
<span class="c1">// Set to 2 if this has been added to the Finalizer queue</span>
<span class="c1">// Set to 0 when the Finalizer has been run (or there is no Finalizer in the first place)</span>
<span class="c1">// Only set on types that have a Finalizer</span>
<span class="n">U8</span> <span class="n">needToFinalize</span><span class="p">;</span>
<span class="c1">// unused</span>
<span class="n">U8</span> <span class="n">padding</span><span class="p">;</span>
<span class="c1">// The type in this heap entry</span>
<span class="n">tMD_TypeDef</span> <span class="o">*</span><span class="n">pTypeDef</span><span class="p">;</span>
<span class="c1">// Used for locking sync, and tracking WeakReference that point to this object</span>
<span class="n">tSync</span> <span class="o">*</span><span class="n">pSync</span><span class="p">;</span>
<span class="c1">// The user memory</span>
<span class="n">U8</span> <span class="n">memory</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="p">};</span>
</code></pre></div></div>
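<p>To check the per-entry arithmetic, here is a minimal sketch that mirrors the header with fixed-width fields. <code class="language-plaintext highlighter-rouge">HeapEntryHeader32</code> and the helper functions are hypothetical names, not part of DotNetAnywhere, and every pointer is modelled as a 4-byte integer so the result does not depend on the machine that compiles it. Counting the one-byte padding slot that keeps the pointers aligned, the fields add up to 20 bytes, of which 8 are the tree links:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit mirror of tHeapEntry (without the trailing user memory).
// Pointers are modelled as 4-byte integers so the layout is the same on any
// platform that compiles this sketch.
struct HeapEntryHeader32 {
    std::uint32_t pLink[2];        // left/right links, only needed for the tree
    std::uint8_t  level;
    std::uint8_t  marked;
    std::uint8_t  needToFinalize;
    std::uint8_t  padding;
    std::uint32_t pTypeDef;
    std::uint32_t pSync;
};

// Bytes of bookkeeping stored in front of every allocated object.
constexpr std::size_t headerBytes() { return sizeof(HeapEntryHeader32); }

// Bytes that exist purely to support the binary tree lookup.
constexpr std::size_t treeLinkBytes() { return sizeof(HeapEntryHeader32::pLink); }

// Total bytes needed for an object with `payload` bytes of user memory.
constexpr std::size_t allocationBytes(std::size_t payload) {
    return headerBytes() + payload;
}
```

<p>So under these assumptions a 12-byte object costs 32 bytes in total, and 8 of the 20 header bytes are there only for the tree.</p>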
<p>But why does DotNetAnywhere work like this? Fortunately <a href="https://github.com/chrisdunelm">Chris Bacon</a>, the author of DotNetAnywhere, <a href="https://github.com/SteveSanderson/Blazor/pull/7#discussion_r136719427">explains</a>:</p>
<blockquote>
<p>Mind you, the whole heap code really needs a rewrite to reduce per-object memory overhead, and to remove the need for the binary tree of allocations. Not really thinking of a generational GC, that would probably add to much code. This was something I vaguely intended to do, but never got around to.
<strong>The current heap code was just the simplest thing to get GC working quickly.</strong> The very initial implementation did no GC at all. It was beautifully fast, but ran out of memory rather too quickly.</p>
</blockquote>
<p>For more info on ‘Conservative’ and ‘Precise’ GCs see:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Tracing_garbage_collection#Precise_vs._conservative_and_internal_pointers">Precise vs. conservative and internal pointers</a></li>
<li><a href="https://stackoverflow.com/questions/5096088/how-does-the-net-clr-distinguish-between-managed-from-unmanaged-pointers/5096824#5096824">How does the .NET CLR distinguish between Managed from Unmanaged Pointers?</a></li>
</ul>
<h4 id="gc-only-does-mark-sweep-it-doesnt-compact"><strong>GC only does ‘Mark-Sweep’, it doesn’t Compact</strong></h4>
<p>Another area in which the GC behaviour differs is that it doesn’t do any <strong>Compaction</strong> of memory after it’s cleaned up, as Steve Sanderson found out when <a href="https://github.com/SteveSanderson/Blazor/blob/master/src/Blazor.Runtime/Interop/ManagedGCHandle.cs#L40-L43">working on Blazor</a>:</p>
<blockquote>
<p>.. During server-side execution we don’t actually need to pin anything, because there’s no interop outside .NET. During client-side execution, everything is (in effect) pinned regardless, <strong>because DNA’s GC only does mark-sweep - it doesn’t have any compaction phase</strong>.</p>
</blockquote>
<p>In addition, when an object is allocated DotNetAnywhere just makes a call to <a href="http://www.cplusplus.com/reference/cstdlib/malloc/">malloc()</a> (see the <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/Heap.c#L468">Heap_Alloc(..) function</a>). So there is no concept of <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md#physical-representation-of-the-managed-heap">‘Generations’ or ‘Segments’</a> that you have in the .NET Framework GC, i.e. no ‘Gen 0’, ‘Gen 1’, or ‘Large Object Heap’.</p>
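<p>To make the ‘mark-sweep without compaction’ idea concrete, here is a rough sketch of a toy heap. It is <em>not</em> DotNetAnywhere’s code: <code class="language-plaintext highlighter-rouge">ToyHeap</code> and <code class="language-plaintext highlighter-rouge">ToyObject</code> are made-up names, the ‘roots’ are passed in by hand, and marking does not follow references. What it does show is that every object lives in its own <code class="language-plaintext highlighter-rouge">malloc()</code> block and that survivors keep their address after a collection:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical toy object: one mark bit plus some user data.
struct ToyObject {
    bool marked;
    int  payload;
};

// Hypothetical toy heap: each object is a separate malloc() block and the
// collector only ever frees blocks, it never moves them.
struct ToyHeap {
    std::vector<ToyObject*> objects;   // every allocation that is still alive

    ToyObject* alloc(int payload) {
        ToyObject* obj = static_cast<ToyObject*>(std::malloc(sizeof(ToyObject)));
        assert(obj != nullptr);
        obj->marked = false;
        obj->payload = payload;
        objects.push_back(obj);
        return obj;
    }

    // Mark phase: flag the given roots (a real GC would also follow references).
    void mark(const std::vector<ToyObject*>& roots) {
        for (ToyObject* root : roots) root->marked = true;
    }

    // Sweep phase: free unmarked blocks and clear the mark on survivors.
    // Returns the number of objects that were freed.
    std::size_t sweep() {
        std::size_t freed = 0;
        std::vector<ToyObject*> survivors;
        for (ToyObject* obj : objects) {
            if (obj->marked) {
                obj->marked = false;
                survivors.push_back(obj);   // same address as before the GC
            } else {
                std::free(obj);
                ++freed;
            }
        }
        objects.swap(survivors);
        return freed;
    }

    ~ToyHeap() { for (ToyObject* obj : objects) std::free(obj); }
};
```

<p>Because nothing is ever moved, any raw pointer to a surviving object stays valid, which is why everything is ‘in effect pinned’. The price is that the gaps left by freed objects are only reused if <code class="language-plaintext highlighter-rouge">malloc()</code> happens to hand them out again.</p>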
<hr />
<h3 id="threading-model">Threading Model</h3>
<p>Finally, let’s take a look at the threading model, which is fundamentally different from the one found in the .NET Framework.</p>
<p><strong style="color:orange">Differences with the .NET Framework</strong></p>
<p>Whilst DotNetAnywhere will happily create new threads and execute them for you, it’s only providing the illusion of true multi-threading. In reality it only runs on <strong>one thread</strong>, but <strong>context switches</strong> between the different threads that your program creates:</p>
<p><a href="/images/2017/10/Thread Usage Explanation.png"><img src="/images/2017/10/Thread Usage Explanation.png" alt="Thread Usage Explanation" /></a></p>
<p>You can see this in action in the code below (from the <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/Thread.c#L112-L236">Thread_Execute() function</a>). Note the call to <code class="language-plaintext highlighter-rouge">JIT_Execute(..)</code> with <code class="language-plaintext highlighter-rouge">numInst</code> set to <code class="language-plaintext highlighter-rouge">100</code>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
<span class="n">U32</span> <span class="n">minSleepTime</span> <span class="o">=</span> <span class="mh">0xffffffff</span><span class="p">;</span>
<span class="n">I32</span> <span class="n">threadExitValue</span><span class="p">;</span>
<span class="n">status</span> <span class="o">=</span> <span class="n">JIT_Execute</span><span class="p">(</span><span class="n">pThread</span><span class="p">,</span> <span class="mi">100</span><span class="p">);</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">status</span><span class="p">)</span> <span class="p">{</span>
<span class="p">....</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
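<p>The effect of that loop can be simulated with a small sketch. Everything here is an assumption for illustration: <code class="language-plaintext highlighter-rouge">FakeThread</code> and <code class="language-plaintext highlighter-rouge">runRoundRobin</code> are hypothetical names, a ‘thread’ is nothing more than a count of instructions still to run, and the threads are visited in plain round-robin order (the real scheduler also has to deal with sleeping and blocked threads):</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a managed thread: all we track is how many
// instructions it still has to execute.
struct FakeThread {
    int id;
    int remainingInstructions;
};

// Runs every thread to completion on the calling (single) OS thread, giving
// each one at most `sliceSize` instructions per turn. Returns the order in
// which thread ids were given a time slice.
std::vector<int> runRoundRobin(std::vector<FakeThread> threads, int sliceSize) {
    assert(sliceSize > 0);
    std::vector<int> schedule;
    bool anyRunnable = true;
    while (anyRunnable) {
        anyRunnable = false;
        for (FakeThread& t : threads) {
            if (t.remainingInstructions <= 0) continue;   // already finished
            int executed = t.remainingInstructions < sliceSize
                               ? t.remainingInstructions
                               : sliceSize;
            t.remainingInstructions -= executed;   // plays the role of JIT_Execute(t, sliceSize)
            schedule.push_back(t.id);
            anyRunnable = true;
        }
    }
    return schedule;
}
```

<p>With a slice of 100 instructions, a 250-instruction thread and a 120-instruction thread get interleaved as 1, 2, 1, 2, 1. The two never run at the same time, they just take turns.</p>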
<p>An interesting side-effect is that the threading code in the DotNetAnywhere <code class="language-plaintext highlighter-rouge">corlib</code> implementation is really simple. For instance the <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/dna/System.Threading.Interlocked.c#L26-L37">internal implementation</a> of the <a href="https://github.com/chrisdunelm/DotNetAnywhere/blob/master/corlib/System.Threading/Interlocked.cs#L28"><code class="language-plaintext highlighter-rouge">Interlocked.CompareExchange()</code> function</a> looks like the following, note the lack of synchronisation that you would normally expect:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tAsyncCall</span><span class="o">*</span> <span class="nf">System_Threading_Interlocked_CompareExchange_Int32</span><span class="p">(</span>
<span class="n">PTR</span> <span class="n">pThis_</span><span class="p">,</span> <span class="n">PTR</span> <span class="n">pParams</span><span class="p">,</span> <span class="n">PTR</span> <span class="n">pReturnValue</span><span class="p">)</span> <span class="p">{</span>
<span class="n">U32</span> <span class="o">*</span><span class="n">pLoc</span> <span class="o">=</span> <span class="n">INTERNALCALL_PARAM</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">U32</span><span class="o">*</span><span class="p">);</span>
<span class="n">U32</span> <span class="n">value</span> <span class="o">=</span> <span class="n">INTERNALCALL_PARAM</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">U32</span><span class="p">);</span>
<span class="n">U32</span> <span class="n">comparand</span> <span class="o">=</span> <span class="n">INTERNALCALL_PARAM</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="n">U32</span><span class="p">);</span>
<span class="o">*</span><span class="p">(</span><span class="n">U32</span><span class="o">*</span><span class="p">)</span><span class="n">pReturnValue</span> <span class="o">=</span> <span class="o">*</span><span class="n">pLoc</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">pLoc</span> <span class="o">==</span> <span class="n">comparand</span><span class="p">)</span> <span class="p">{</span>
<span class="o">*</span><span class="n">pLoc</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
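<p>For comparison, here is a sketch of the same operation written twice: once as a plain read, compare and write (the shape used above), and once using <code class="language-plaintext highlighter-rouge">std::atomic</code>, which is roughly what a runtime with real OS threads has to do. The function names are mine, and I’m assuming the plain version is safe in DotNetAnywhere only because an internal call cannot be interrupted by another managed thread part-way through:</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Plain read, compare and write. Only safe when nothing else can run
// between the read and the write.
std::uint32_t compareExchangePlain(std::uint32_t* location,
                                   std::uint32_t value,
                                   std::uint32_t comparand) {
    std::uint32_t original = *location;
    if (original == comparand) {
        *location = value;
    }
    return original;
}

// With real multi-threading the compare and the write must happen as one
// indivisible step, which is what compare_exchange_strong provides.
std::uint32_t compareExchangeAtomic(std::atomic<std::uint32_t>& location,
                                    std::uint32_t value,
                                    std::uint32_t comparand) {
    std::uint32_t expected = comparand;
    // On failure the current value is written into `expected`;
    // on success `expected` still holds the old value.
    location.compare_exchange_strong(expected, value);
    return expected;
}
```

<p>Both versions return the value that was in the location before the call, which matches the contract of <code class="language-plaintext highlighter-rouge">Interlocked.CompareExchange()</code>.</p>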
<hr />
<h2 id="benchmarks">Benchmarks</h2>
<p>As a simple test, I ran some benchmarks from <a href="http://benchmarksgame.alioth.debian.org/u64q/binarytrees.html">The Computer Language Benchmarks Game - binary-trees</a>, using the <a href="http://benchmarksgame.alioth.debian.org/u64q/program.php?test=binarytrees&lang=csharpcore&id=1">simplest C# version</a></p>
<p><strong>Note: DotNetAnywhere was designed to run on low-memory devices, so it was not meant to have the same performance as the full .NET Framework. Please bear that in mind when looking at the results!!</strong></p>
<h3 id="net-framework-461---036-seconds">.NET Framework, 4.6.1 - 0.36 seconds</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Invoked=TestApp.exe 15
stretch tree of depth 16 check: 131071
32768 trees of depth 4 check: 1015808
8192 trees of depth 6 check: 1040384
2048 trees of depth 8 check: 1046528
512 trees of depth 10 check: 1048064
128 trees of depth 12 check: 1048448
32 trees of depth 14 check: 1048544
long lived tree of depth 15 check: 65535
Exit code : 0
Elapsed time : 0.36
Kernel time : 0.06 (17.2%)
User time : 0.16 (43.1%)
page fault # : 6604
Working set : 25720 KB
Paged pool : 187 KB
Non-paged pool : 24 KB
Page file size : 31160 KB
</code></pre></div></div>
<h3 id="dotnetanywhere---5439-seconds">DotNetAnywhere - 54.39 seconds</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Invoked=dna TestApp.exe 15
stretch tree of depth 16 check: 131071
32768 trees of depth 4 check: 1015808
8192 trees of depth 6 check: 1040384
2048 trees of depth 8 check: 1046528
512 trees of depth 10 check: 1048064
128 trees of depth 12 check: 1048448
32 trees of depth 14 check: 1048544
long lived tree of depth 15 check: 65535
Total execution time = 54288.33 ms
Total GC time = 36857.03 ms
Exit code : 0
Elapsed time : 54.39
Kernel time : 0.02 (0.0%)
User time : 54.15 (99.6%)
page fault # : 5699
Working set : 15548 KB
Paged pool : 105 KB
Non-paged pool : 8 KB
Page file size : 13144 KB
</code></pre></div></div>
<p>So clearly DotNetAnywhere doesn’t work as fast in this benchmark (0.36 seconds v 54 seconds). However if we look at other benchmarks from the same site, it performs a lot better. It seems that DotNetAnywhere has a significant overhead when allocating objects (a <code class="language-plaintext highlighter-rouge">class</code>), which is less obvious when using <code class="language-plaintext highlighter-rouge">structs</code>.</p>
<table>
<thead>
<tr>
<th> </th>
<th><a href="http://benchmarksgame.alioth.debian.org/u64q/program.php?test=binarytrees&lang=csharpcore&id=1">Benchmark 1</a> (using <code class="language-plaintext highlighter-rouge">classes</code>)</th>
<th><a href="http://benchmarksgame.alioth.debian.org/u64q/program.php?test=binarytrees&lang=csharpcore&id=2">Benchmark 2</a> (using <code class="language-plaintext highlighter-rouge">structs</code>)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Elapsed Time (secs)</td>
<td>3.1</td>
<td>2.0</td>
</tr>
<tr>
<td>GC Collections</td>
<td>96</td>
<td>67</td>
</tr>
<tr>
<td>Total GC time (msecs)</td>
<td>983.59</td>
<td>439.73</td>
</tr>
</tbody>
</table>
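<p>To put the numbers above in perspective, here is the arithmetic as a tiny sketch (the helper names are mine). It only uses figures that appear in the output and table above:</p>

```cpp
#include <cassert>

// How many times slower `slowSeconds` is than `fastSeconds`.
double slowdownFactor(double slowSeconds, double fastSeconds) {
    assert(fastSeconds > 0.0);
    return slowSeconds / fastSeconds;
}

// Percentage of the total run time that was spent inside the GC.
double gcSharePercent(double gcMilliseconds, double totalMilliseconds) {
    assert(totalMilliseconds > 0.0);
    return 100.0 * gcMilliseconds / totalMilliseconds;
}
```

<p>For the binary-trees run, <code class="language-plaintext highlighter-rouge">slowdownFactor(54.39, 0.36)</code> is roughly 151 and <code class="language-plaintext highlighter-rouge">gcSharePercent(36857.03, 54288.33)</code> is roughly 68, so about two thirds of the time went on GC. For the two runs in the table the GC share is roughly 32% (983.59 ms of about 3.1 s) and 22% (439.73 ms of about 2.0 s), bearing in mind that the elapsed times are rounded.</p>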
<hr />
<p><strong>Finally, I really want to thank <a href="https://github.com/chrisdunelm">Chris Bacon</a>. DotNetAnywhere is a great code base and gives a fantastic insight into what needs to happen for a .NET runtime to work.</strong></p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=15514519">Hacker News</a> and <a href="https://www.reddit.com/r/programming/comments/77frgh/dotnetanywhere_an_alternative_net_runtime/">/r/programming</a></p>
<p>The post <a href="http://www.mattwarren.org/2017/10/19/DotNetAnywhere-an-Alternative-.NET-Runtime/">DotNetAnywhere: An Alternative .NET Runtime</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Analysing C# code on GitHub with BigQuery2017-10-12T00:00:00+00:00http://www.mattwarren.org/2017/10/12/Analysing-C#-code-on-GitHub-with-BigQuery
<p>Just over a year ago Google made all the <a href="https://medium.com/google-cloud/github-on-bigquery-analyze-all-the-code-b3576fd2b150">open source code on GitHub available for querying</a> within BigQuery and as if that wasn’t enough <a href="https://cloud.google.com/blog/big-data/2017/01/how-to-run-a-terabyte-of-google-bigquery-queries-each-month-without-a-credit-card">you can run a terabyte of queries each month for free</a>!</p>
<p>So in this post I am going to be looking at all the <strong>C#</strong> source code on GitHub and what we can find out from it. Handily, a smaller, C#-only dataset has been made available (in BigQuery you are charged per byte read). It is called <a href="https://bigquery.cloud.google.com/table/fh-bigquery:github_extracts.contents_net_cs">fh-bigquery:github_extracts.contents_net_cs</a> and has:</p>
<ul>
<li><strong>5,885,933</strong> unique ‘.cs’ files</li>
<li><strong>792,166,632</strong> lines of code (LOC)</li>
<li><strong>37.17 GB</strong> of data</li>
</ul>
<p>Which is a pretty comprehensive set of C# source code!</p>
<hr />
<p>The rest of this post will <em>attempt</em> to answer the following questions:</p>
<ol>
<li><a href="#tabs-or-spaces">Tabs or Spaces?</a></li>
<li><a href="#regions-should-be-banned-or-okay-in-some-cases"><code class="language-plaintext highlighter-rouge">regions</code>: ‘should be banned’ or ‘okay in some cases’?</a></li>
<li><a href="#kr-or-allman-where-do-c-devs-like-to-put-their-braces">‘K&R’ or ‘Allman’, where do C# devs like to put their braces?</a></li>
<li><a href="#do-c-developers-like-writing-functional-code">Do C# developers like writing functional code?</a></li>
</ol>
<p>Then moving onto some less controversial C# topics:</p>
<ol>
<li><a href="#which-using-statements-are-most-widely-used">Which <code class="language-plaintext highlighter-rouge">using</code> statements are most widely used?</a></li>
<li><a href="#what-nuget-packages-are-most-often-included-in-a-net-project">What NuGet packages are most often included in a .NET project?</a></li>
<li><a href="#how-many-lines-of-code-loc-are-in-a-typical-c-file">How many lines of code (LOC) are in a typical C# file?</a></li>
<li><a href="#what-is-the-most-widely-thrown-exception">What is the most widely thrown <code class="language-plaintext highlighter-rouge">Exception</code>?</a></li>
<li><a href="#asyncawait-all-the-things-or-not">‘async/await all the things’ or not?</a></li>
<li><a href="#do-c-developers-like-using-the-var-keyword">Do C# developers like using the <code class="language-plaintext highlighter-rouge">var</code> keyword?</a> (<strong>Updated</strong>)</li>
</ol>
<p>Before we end up looking at repositories, not just individual C# files:</p>
<ol>
<li><a href="#what-is-the-most-popular-repository-with-c-code-in-it">What is the most popular repository with C# code in it?</a></li>
<li><a href="#just-how-many-files-should-you-have-in-a-repository">Just how many files should you have in a repository?</a></li>
<li><a href="#what-are-the-most-popular-c-class-names">What are the most popular C# <code class="language-plaintext highlighter-rouge">class</code> names?</a></li>
<li><a href="#foocs-programcs-or-something-else-whats-the-most-common-file-name">‘Foo.cs’, ‘Program.cs’ or something else, what’s the most common file name?</a></li>
</ol>
<p>If you want to try the queries for yourself (or find my mistakes), all of them are available in <a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752">this gist</a>. There’s a good chance that my regular expressions miss out some edge-cases, after all <a href="https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/">Regular Expressions: Now You Have Two Problems</a>:</p>
<blockquote>
<p>Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.</p>
</blockquote>
<hr />
<h2 id="tabs-or-spaces">Tabs or Spaces?</h2>
<p>In the entire data-set there are 5,885,933 files, but here we only include ones that have more than 10 lines starting with a tab or a space:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Tabs</th>
<th style="text-align: center">Tabs %</th>
<th style="text-align: center">Spaces</th>
<th style="text-align: center">Spaces %</th>
<th style="text-align: center">Total</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">799,055</td>
<td style="text-align: center">17.15%</td>
<td style="text-align: center">3,859,528</td>
<td style="text-align: center">82.85%</td>
<td style="text-align: center">4,658,583</td>
</tr>
</tbody>
</table>
<p>Clearly, C# developers (on GitHub) prefer <strong>Spaces</strong> over <strong>Tabs</strong>, let the endless debates continue!! (I think <em>some</em> of this can be explained by the fact that Visual Studio <a href="https://blogs.msdn.microsoft.com/zainnab/2010/09/08/insert-spaces-vs-keep-tabs/">uses ‘spaces’ by default</a>)</p>
<p>If you want to see how C# compares to other programming languages, take a look at <a href="https://medium.com/@hoffa/400-000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs-7cfe0b5dd7fd">400,000 GitHub repositories, 1 billion files, 14 terabytes of code: Spaces or Tabs?</a>.</p>
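<p>The ‘more than 10 lines’ rule above can be sketched for a single file like so. This is not the BigQuery query itself, and one detail is my assumption: a file is counted as ‘Tabs’ when more of its indented lines start with a tab than with a space (ties go to ‘Spaces’). <code class="language-plaintext highlighter-rouge">classifyIndentation</code> is a hypothetical name:</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>

enum class Indentation { Tabs, Spaces, NotCounted };

// Classifies one file's contents. Files with `minIndentedLines` or fewer
// lines that start with a tab or a space are left out of the totals.
// Assumption: the majority style wins, and a tie counts as Spaces.
Indentation classifyIndentation(const std::string& content, int minIndentedLines = 10) {
    int tabLines = 0;
    int spaceLines = 0;
    std::istringstream lines(content);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;
        if (line[0] == '\t') ++tabLines;
        else if (line[0] == ' ') ++spaceLines;
    }
    if (tabLines + spaceLines <= minIndentedLines) return Indentation::NotCounted;
    return tabLines > spaceLines ? Indentation::Tabs : Indentation::Spaces;
}
```

<p>Files that come back as <code class="language-plaintext highlighter-rouge">NotCounted</code> are the reason the total in the table (4,658,583) is lower than the 5,885,933 files in the data-set.</p>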
<h2 id="regions-should-be-banned-or-okay-in-some-cases"><code class="language-plaintext highlighter-rouge">regions</code>: ‘should be banned’ or ‘okay in some cases’?</h2>
<p>It turns out that there are an impressive <strong>712,498</strong> C# files (out of 5.8 million) that contain at least one <code class="language-plaintext highlighter-rouge">#region</code> statement (<a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752#regions">query used</a>), that’s just over 12%. (I’m hoping that a lot of those files have been auto-generated by a tool!)</p>
<h2 id="kr-or-allman-where-do-c-devs-like-to-put-their-braces">‘K&R’ or ‘Allman’, where do C# devs like to put their braces?</h2>
<p>C# developers overwhelmingly prefer putting an opening brace <code class="language-plaintext highlighter-rouge">{</code> on its own line (<a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752#brace_placement">query used</a>):</p>
<table>
<thead>
<tr>
<th style="text-align: center">separate line</th>
<th style="text-align: center">same line</th>
<th>same line (initializer)</th>
<th> </th>
<th>total (with brace)</th>
<th>total (all code)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">81,306,320 (67%)</td>
<td style="text-align: center">40,044,603 (33%)</td>
<td>3,631,947 (2.99%)</td>
<td> </td>
<td>121,350,923 (15.32%)</td>
<td>792,166,632</td>
</tr>
</tbody>
</table>
<p>(‘same line initializers’ include code like <code class="language-plaintext highlighter-rouge">new { Name = "", .. }</code>, <code class="language-plaintext highlighter-rouge">new [] { 1, 2, 3.. }</code>)</p>
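<p>As a rough illustration of the two main categories, here is a sketch that classifies a single line of code. It is a simplification rather than the regular expressions from the query: I’m assuming only lines that <em>end</em> with <code class="language-plaintext highlighter-rouge">{</code> are of interest, and the ‘initializer’ case is not detected at all. The names are hypothetical:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class BraceStyle { SeparateLine, SameLine, NoOpeningBrace };

// Classifies one line of source code by where its trailing `{` sits.
BraceStyle classifyBraceLine(const std::string& line) {
    const std::string whitespace = " \t\r\n";
    std::size_t first = line.find_first_not_of(whitespace);
    if (first == std::string::npos) return BraceStyle::NoOpeningBrace;   // blank line
    std::size_t last = line.find_last_not_of(whitespace);
    if (line[last] != '{') return BraceStyle::NoOpeningBrace;
    // If the brace is the only non-whitespace character, it has its own line.
    return first == last ? BraceStyle::SeparateLine : BraceStyle::SameLine;
}
```

<p>Counting the results over every line gives the split in the table: 81,306,320 + 40,044,603 = 121,350,923 lines with an opening brace, which is 15.32% of the 792,166,632 lines of code.</p>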
<h2 id="do-c-developers-like-writing-functional-code">Do C# developers like writing functional code?</h2>
<p>This is slightly unscientific, but I wanted to see how widely the <a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/lambda-operator">Lambda Operator</a> <code class="language-plaintext highlighter-rouge">=></code> is used in C# code (<a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752#lambdas">query</a>). Yes, I know, if you want to write functional code on .NET you really should use F#, but C# has become more ‘functional’ over the years and I wanted to see how much code was taking advantage of that.</p>
<p>Here are the raw percentiles:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Percentile</th>
<th style="text-align: center">% of lines using lambdas</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">10</td>
<td style="text-align: center">0.51</td>
</tr>
<tr>
<td style="text-align: center">25</td>
<td style="text-align: center">1.14</td>
</tr>
<tr>
<td style="text-align: center">50</td>
<td style="text-align: center">2.50</td>
</tr>
<tr>
<td style="text-align: center">75</td>
<td style="text-align: center">5.26</td>
</tr>
<tr>
<td style="text-align: center">90</td>
<td style="text-align: center">9.95</td>
</tr>
<tr>
<td style="text-align: center">95</td>
<td style="text-align: center">14.29</td>
</tr>
<tr>
<td style="text-align: center">99</td>
<td style="text-align: center">28.00</td>
</tr>
</tbody>
</table>
<p>So we can say that:</p>
<ul>
<li>50% of all the C# files on GitHub use <code class="language-plaintext highlighter-rouge">=></code> on 2.50% (or less) of their lines.</li>
<li>10% of all C# files have lambdas on almost 1 in 10 of their lines (9.95%)</li>
<li>5% use <code class="language-plaintext highlighter-rouge">=></code> on 1 in 7 lines (14.29%)</li>
<li>1% of files have lambdas on over 1 in 4 (28%) of their lines of code, that’s pretty impressive!</li>
</ul>
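<p>The two steps behind those numbers (a percentage per file, then percentiles across all files) can be sketched as follows. The function names are mine, and the percentile uses the simple ‘nearest rank’ definition, which is an assumption: BigQuery may well compute its quantiles differently (and approximately):</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Percentage of a file's lines that contain the `=>` token.
double lambdaLinePercent(const std::string& content) {
    int total = 0;
    int withLambda = 0;
    std::istringstream lines(content);
    std::string line;
    while (std::getline(lines, line)) {
        ++total;
        if (line.find("=>") != std::string::npos) ++withLambda;
    }
    return total == 0 ? 0.0 : 100.0 * withLambda / total;
}

// Nearest-rank percentile: the smallest value such that at least
// `percent`% of the values are less than or equal to it.
double percentile(std::vector<double> values, double percent) {
    assert(!values.empty());
    assert(percent > 0.0 && percent <= 100.0);
    std::sort(values.begin(), values.end());
    std::size_t rank = static_cast<std::size_t>(std::ceil(percent / 100.0 * values.size()));
    return values[rank - 1];
}
```

<p>Bear in mind that a plain search for <code class="language-plaintext highlighter-rouge">=></code> also picks up expression-bodied members and any occurrence inside comments or strings, which is part of why this is ‘slightly unscientific’.</p>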
<hr />
<h2 id="which-using-statements-are-most-widely-used">Which <code class="language-plaintext highlighter-rouge">using</code> statements are most widely used?</h2>
<p>Now on to something a bit more substantial: what are the most widely used <code class="language-plaintext highlighter-rouge">using</code> statements in C# code?</p>
<p>The top 10 looks like this (the <a href="https://gist.github.com/mattwarren/be5df65729b0188d31463e3f143ba886">full results are available</a>):</p>
<table>
<thead>
<tr>
<th>using statement</th>
<th style="text-align: right">count</th>
</tr>
</thead>
<tbody>
<tr>
<td>using System.Collections.Generic;</td>
<td style="text-align: right">1,780,646</td>
</tr>
<tr>
<td>using System;</td>
<td style="text-align: right">1,477,019</td>
</tr>
<tr>
<td>using System.Linq;</td>
<td style="text-align: right">1,319,830</td>
</tr>
<tr>
<td>using System.Text;</td>
<td style="text-align: right">902,165</td>
</tr>
<tr>
<td>using System.Threading.Tasks;</td>
<td style="text-align: right">628,195</td>
</tr>
<tr>
<td>using System.Runtime.InteropServices;</td>
<td style="text-align: right">431,867</td>
</tr>
<tr>
<td>using System.IO;</td>
<td style="text-align: right">407,848</td>
</tr>
<tr>
<td>using System.Runtime.CompilerServices;</td>
<td style="text-align: right">338,686</td>
</tr>
<tr>
<td>using System.Collections;</td>
<td style="text-align: right">289,867</td>
</tr>
<tr>
<td>using System.Reflection;</td>
<td style="text-align: right">218,369</td>
</tr>
</tbody>
</table>
<p>However, <a href="https://twitter.com/davkean/status/917523113587257344">as was pointed out</a>, the top 5 are included by default when you add a new file in Visual Studio and many people wouldn’t remove them. The same applies to ‘System.Runtime.InteropServices’ and ‘System.Runtime.CompilerServices’, which are included in ‘AssemblyInfo.cs’ by default.</p>
<p>So if we adjust the list to take account of this, the top 10 looks like so:</p>
<table>
<thead>
<tr>
<th>using statement</th>
<th style="text-align: right">count</th>
</tr>
</thead>
<tbody>
<tr>
<td>using System.IO;</td>
<td style="text-align: right">407,848</td>
</tr>
<tr>
<td>using System.Collections;</td>
<td style="text-align: right">289,867</td>
</tr>
<tr>
<td>using System.Reflection;</td>
<td style="text-align: right">218,369</td>
</tr>
<tr>
<td>using System.Diagnostics;</td>
<td style="text-align: right">201,341</td>
</tr>
<tr>
<td>using System.Threading;</td>
<td style="text-align: right">179,168</td>
</tr>
<tr>
<td>using System.ComponentModel;</td>
<td style="text-align: right">160,681</td>
</tr>
<tr>
<td>using System.Web;</td>
<td style="text-align: right">160,323</td>
</tr>
<tr>
<td>using System.Windows.Forms;</td>
<td style="text-align: right">137,003</td>
</tr>
<tr>
<td>using System.Globalization;</td>
<td style="text-align: right">132,113</td>
</tr>
<tr>
<td>using System.Drawing;</td>
<td style="text-align: right">127,033</td>
</tr>
</tbody>
</table>
<p>Finally, an interesting list is the top 10 using statements that aren’t <code class="language-plaintext highlighter-rouge">System</code>, <code class="language-plaintext highlighter-rouge">Microsoft</code> or <code class="language-plaintext highlighter-rouge">Windows</code> namespaces:</p>
<table>
<thead>
<tr>
<th>using statement</th>
<th style="text-align: right">count</th>
</tr>
</thead>
<tbody>
<tr>
<td>using NUnit.Framework;</td>
<td style="text-align: right">119,463</td>
</tr>
<tr>
<td>using UnityEngine;</td>
<td style="text-align: right">117,673</td>
</tr>
<tr>
<td>using Xunit;</td>
<td style="text-align: right">99,099</td>
</tr>
<tr>
<td>using Newtonsoft.Json;</td>
<td style="text-align: right">81,675</td>
</tr>
<tr>
<td>using Newtonsoft.Json.Linq;</td>
<td style="text-align: right">29,416</td>
</tr>
<tr>
<td>using Moq;</td>
<td style="text-align: right">23,546</td>
</tr>
<tr>
<td>using UnityEngine.UI;</td>
<td style="text-align: right">20,355</td>
</tr>
<tr>
<td>using UnityEditor;</td>
<td style="text-align: right">19,937</td>
</tr>
<tr>
<td>using Amazon.Runtime;</td>
<td style="text-align: right">18,941</td>
</tr>
<tr>
<td>using log4net;</td>
<td style="text-align: right">17,297</td>
</tr>
</tbody>
</table>
<h2 id="what-nuget-packages-are-most-often-included-in-a-net-project">What NuGet packages are most often included in a .NET project?</h2>
<p>It turns out that there is also a separate dataset containing all the ‘packages.config’ files on GitHub. It’s called <a href="https://bigquery.cloud.google.com/table/fh-bigquery:github_extracts.contents_net_packages_config">contents_net_packages_config</a> and has 104,808 entries. By querying this we can see that <a href="https://www.newtonsoft.com/json">Json.Net</a> is the clear winner!!</p>
<table>
<thead>
<tr>
<th>package</th>
<th style="text-align: right">count</th>
</tr>
</thead>
<tbody>
<tr>
<td>Newtonsoft.Json</td>
<td style="text-align: right">45,055</td>
</tr>
<tr>
<td>Microsoft.Web.Infrastructure</td>
<td style="text-align: right">16,022</td>
</tr>
<tr>
<td>Microsoft.AspNet.Razor</td>
<td style="text-align: right">15,109</td>
</tr>
<tr>
<td>Microsoft.AspNet.WebPages</td>
<td style="text-align: right">14,495</td>
</tr>
<tr>
<td>Microsoft.AspNet.Mvc</td>
<td style="text-align: right">14,236</td>
</tr>
<tr>
<td>EntityFramework</td>
<td style="text-align: right">14,191</td>
</tr>
<tr>
<td>Microsoft.AspNet.WebApi.Client</td>
<td style="text-align: right">13,480</td>
</tr>
<tr>
<td>Microsoft.AspNet.WebApi.Core</td>
<td style="text-align: right">12,210</td>
</tr>
<tr>
<td>Microsoft.Net.Http</td>
<td style="text-align: right">11,625</td>
</tr>
<tr>
<td>jQuery</td>
<td style="text-align: right">10,646</td>
</tr>
<tr>
<td>Microsoft.Bcl.Build</td>
<td style="text-align: right">10,641</td>
</tr>
<tr>
<td>Microsoft.Bcl</td>
<td style="text-align: right">10,349</td>
</tr>
<tr>
<td>NUnit</td>
<td style="text-align: right">10,341</td>
</tr>
<tr>
<td>Owin</td>
<td style="text-align: right">9,681</td>
</tr>
<tr>
<td>Microsoft.Owin</td>
<td style="text-align: right">9,202</td>
</tr>
<tr>
<td>Microsoft.AspNet.WebApi.WebHost</td>
<td style="text-align: right">9,007</td>
</tr>
<tr>
<td>WebGrease</td>
<td style="text-align: right">8,743</td>
</tr>
<tr>
<td>Microsoft.AspNet.Web.Optimization</td>
<td style="text-align: right">8,721</td>
</tr>
<tr>
<td>Microsoft.AspNet.WebApi</td>
<td style="text-align: right">8,179</td>
</tr>
</tbody>
</table>
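<p>For anyone wondering how package names can be pulled out of a ‘packages.config’ file, here is a minimal sketch. It is not the query I ran: <code class="language-plaintext highlighter-rouge">packageIds</code> is a hypothetical helper, and it leans on a regular expression where a real tool should use a proper XML parser:</p>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns the `id` attribute of every <package ...> element, in document
// order. Elements without an `id` attribute are skipped.
std::vector<std::string> packageIds(const std::string& packagesConfig) {
    static const std::regex pattern(R"re(<package\s+[^>]*\bid="([^"]+)")re");
    std::vector<std::string> ids;
    for (std::sregex_iterator it(packagesConfig.begin(), packagesConfig.end(), pattern), end;
         it != end; ++it) {
        ids.push_back((*it)[1].str());   // capture group 1 is the package id
    }
    return ids;
}
```

<p>Running something like this over every file and counting how often each id appears gives a table like the one above.</p>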
<h2 id="how-many-lines-of-code-loc-are-in-a-typical-c-file">How many lines of code (LOC) are in a typical C# file?</h2>
<p>Are C# developers prone to creating huge files that go on for 1000s of lines? Well, some are, but fortunately it’s the minority of us!!</p>
<p><a href="/images/2017/10/Percentiles%20of%20lines%20of%20code%20per%20file.png"><img src="/images/2017/10/Percentiles%20of%20lines%20of%20code%20per%20file.png" alt="Percentiles of lines of code per file" /></a></p>
<p>Note the Y-axis is ‘lines of code’ and is logarithmic, the <a href="https://gist.github.com/mattwarren/c810abe0c1ea152b60632c5987161aa4">raw data is available</a>.</p>
<p>Oh dear, Uncle Bob isn’t going to be happy, whilst 96% of the files have 509 LOC or less, the other 4% don’t!! From <a href="http://amzn.to/2yezlZH">Clean Code</a>:</p>
<p><a href="/images/2017/10/Uncle Bob - Clean Code - Number of lines of code in a file.png"><img src="/images/2017/10/Uncle Bob - Clean Code - Number of lines of code in a file.png" alt="Uncle Bob - Clean Code - Number of lines of code in a file" /></a></p>
<p>And in case you’re wondering, here’s the Top 10 longest C# files!!</p>
<table>
<thead>
<tr>
<th>File</th>
<th style="text-align: right">Lines</th>
</tr>
</thead>
<tbody>
<tr>
<td>MarMot/Input/test.marmot.cs</td>
<td style="text-align: right">92663</td>
</tr>
<tr>
<td>src/CodenameGenerator/WordRepos/LastNamesRepository.cs</td>
<td style="text-align: right">88810</td>
</tr>
<tr>
<td>cs_inputtest/cs_02_7000.cs</td>
<td style="text-align: right">63004</td>
</tr>
<tr>
<td>cs_inputtest/cs_02_6000.cs</td>
<td style="text-align: right">54004</td>
</tr>
<tr>
<td>src/ML NET20/Utility/UserName.cs</td>
<td style="text-align: right">52014</td>
</tr>
<tr>
<td>MWBS/Dictionary/DefaultWordDictionary.cs</td>
<td style="text-align: right">48912</td>
</tr>
<tr>
<td>Sources/Accord.Math/Matrix/Matrix.Comparisons1.Generated.cs</td>
<td style="text-align: right">48407</td>
</tr>
<tr>
<td>UrduProofReader/UrduLibs/Utils.cs</td>
<td style="text-align: right">48255</td>
</tr>
<tr>
<td>cs_inputtest/cs_02_5000.cs</td>
<td style="text-align: right">45004</td>
</tr>
<tr>
<td>css/style.cs</td>
<td style="text-align: right">44366</td>
</tr>
</tbody>
</table>
<h2 id="what-is-the-most-widely-thrown-exception">What is the most widely thrown <code class="language-plaintext highlighter-rouge">Exception</code>?</h2>
<p>There are a few interesting results in <a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752#most-popular-execeptions">this query</a>. For instance, who knew that so many <code class="language-plaintext highlighter-rouge">ApplicationExceptions</code> were thrown? And <code class="language-plaintext highlighter-rouge">NotSupportedException</code> being so high up the list is a bit worrying!!</p>
<table>
<thead>
<tr>
<th>Exception</th>
<th style="text-align: right">count</th>
</tr>
</thead>
<tbody>
<tr>
<td>throw new ArgumentNullException</td>
<td style="text-align: right">699,526</td>
</tr>
<tr>
<td>throw new ArgumentException</td>
<td style="text-align: right">361,616</td>
</tr>
<tr>
<td>throw new NotImplementedException</td>
<td style="text-align: right">340,361</td>
</tr>
<tr>
<td>throw new InvalidOperationException</td>
<td style="text-align: right">260,792</td>
</tr>
<tr>
<td>throw new ArgumentOutOfRangeException</td>
<td style="text-align: right">160,640</td>
</tr>
<tr>
<td>throw new NotSupportedException</td>
<td style="text-align: right">110,019</td>
</tr>
<tr>
<td>throw new HttpResponseException</td>
<td style="text-align: right">74,498</td>
</tr>
<tr>
<td>throw new ValidationException</td>
<td style="text-align: right">35,615</td>
</tr>
<tr>
<td>throw new ObjectDisposedException</td>
<td style="text-align: right">31,129</td>
</tr>
<tr>
<td>throw new ApplicationException</td>
<td style="text-align: right">30,849</td>
</tr>
<tr>
<td>throw new UnauthorizedException</td>
<td style="text-align: right">21,133</td>
</tr>
<tr>
<td>throw new FormatException</td>
<td style="text-align: right">19,510</td>
</tr>
<tr>
<td>throw new SerializationException</td>
<td style="text-align: right">17,884</td>
</tr>
<tr>
<td>throw new IOException</td>
<td style="text-align: right">15,779</td>
</tr>
<tr>
<td>throw new IndexOutOfRangeException</td>
<td style="text-align: right">14,778</td>
</tr>
<tr>
<td>throw new NullReferenceException</td>
<td style="text-align: right">12,372</td>
</tr>
<tr>
<td>throw new InvalidDataException</td>
<td style="text-align: right">12,260</td>
</tr>
<tr>
<td>throw new ApiException</td>
<td style="text-align: right">11,660</td>
</tr>
<tr>
<td>throw new InvalidCastException</td>
<td style="text-align: right">10,510</td>
</tr>
</tbody>
</table>
<h2 id="asyncawait-all-the-things-or-not">‘async/await all the things’ or not?</h2>
<p>The addition of the <code class="language-plaintext highlighter-rouge">async</code> and <code class="language-plaintext highlighter-rouge">await</code> keywords to the C# language makes writing <a href="https://docs.microsoft.com/en-us/dotnet/csharp/async">asynchronous code much easier</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">async</span> <span class="n">Task</span><span class="p"><</span><span class="kt">int</span><span class="p">></span> <span class="nf">GetDotNetCountAsync</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">// Suspends GetDotNetCountAsync() to allow the caller (the web server)</span>
<span class="c1">// to accept another request, rather than blocking on this one.</span>
<span class="kt">var</span> <span class="n">html</span> <span class="p">=</span> <span class="k">await</span> <span class="n">_httpClient</span><span class="p">.</span><span class="nf">GetStringAsync</span><span class="p">(</span><span class="s">"http://dotnetfoundation.org"</span><span class="p">);</span>
<span class="k">return</span> <span class="n">Regex</span><span class="p">.</span><span class="nf">Matches</span><span class="p">(</span><span class="n">html</span><span class="p">,</span> <span class="s">@"\.NET"</span><span class="p">).</span><span class="n">Count</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>But how much is it used? Using the query below:</p>
<pre><code class="language-SQL">SELECT Count(*) count
FROM
[fh-bigquery:github_extracts.contents_net_cs]
WHERE
REGEXP_MATCH(content, r'\sasync\s|\sawait\s')
</code></pre>
<p>I found that there are <strong>218,643</strong> files (out of 5,885,933, so roughly 3.7%) that have at least one usage of <code class="language-plaintext highlighter-rouge">async</code> or <code class="language-plaintext highlighter-rouge">await</code> in them.</p>
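<p>To get a feel for what that regex does and doesn’t pick up, here is a minimal C# sketch that applies the same pattern to a few in-memory strings. The <code class="language-plaintext highlighter-rouge">AsyncUsage</code> class and the sample snippets are made up for illustration (the real query runs inside BigQuery against the <code class="language-plaintext highlighter-rouge">content</code> column), and it assumes .NET’s <code class="language-plaintext highlighter-rouge">\s</code> behaves closely enough to BigQuery’s for these inputs:</p>
<pre><code class="language-csharp">using System;
using System.Text.RegularExpressions;

public static class AsyncUsage
{
    // Same pattern as the REGEXP_MATCH call in the query above.
    private static readonly Regex AsyncOrAwait = new Regex(@"\sasync\s|\sawait\s");

    // True if the content has at least one whitespace-delimited 'async' or 'await'.
    public static bool UsesAsyncOrAwait(string content)
    {
        return AsyncOrAwait.IsMatch(content);
    }

    public static void Main()
    {
        // Hypothetical file contents, purely for illustration.
        string[] samples =
        {
            "public async Task Run() { }",
            "var html = await client.GetStringAsync(url);",
            "int asyncCount = 0;",
            "string s = \"no keywords here\";"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(UsesAsyncOrAwait(sample) + " : " + sample);
        }
    }
}
</code></pre>
<p>The first two samples match and the last two don’t. Because the pattern needs whitespace on both sides of the keyword, it skips identifiers such as <code class="language-plaintext highlighter-rouge">asyncCount</code>, but it would also miss a keyword that is immediately followed by a bracket, so the count is best read as an approximation.</p>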
<h2 id="do-c-developers-like-using-the-var-keyword">Do C# developers like using the <code class="language-plaintext highlighter-rouge">var</code> keyword?</h2>
<strike><p>Less than they use <code class="highlighter-rouge">async</code> and <code class="highlighter-rouge">await</code>, there are <strong>130,590</strong> files that have at least one usage of the <code class="highlighter-rouge">var</code> keyword</p></strike>
<p><strong>Update</strong>: thanks to <a href="https://twitter.com/jairbubbles">jairbubbles</a> for <a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752#gistcomment-2228956">pointing out</a> that my <code class="language-plaintext highlighter-rouge">var</code> regex was wrong and supplying a fixed version!</p>
<p><strong>More</strong> than they use <code class="language-plaintext highlighter-rouge">async</code> and <code class="language-plaintext highlighter-rouge">await</code>: there are <strong>1,457,154</strong> files that have at least one usage of the <code class="language-plaintext highlighter-rouge">var</code> keyword.</p>
<hr />
<h2 id="just-how-many-files-should-you-have-in-a-repository">Just how many files should you have in a repository?</h2>
<p>90% of the repositories (that have any C# files) have 95 files or fewer. 95% have 170 files or fewer and 99% have 535 files or fewer.</p>
<p><a href="/images/2017/10/Number of Files per Repository.png"><img src="/images/2017/10/Number of Files per Repository.png" alt="Number of C# Files per Repository" /></a></p>
<p>(again the Y-axis (# files) is logarithmic)</p>
<p>The top 10 largest repositories, by number of C# files, are shown below:</p>
<table>
<thead>
<tr>
<th>Repository</th>
<th style="text-align: right"># Files</th>
</tr>
</thead>
<tbody>
<tr>
<td>https://github.com/xen2/mcs</td>
<td style="text-align: right">23389</td>
</tr>
<tr>
<td>https://github.com/mater06/LEGOChimaOnlineReloaded</td>
<td style="text-align: right">14241</td>
</tr>
<tr>
<td>https://github.com/Microsoft/referencesource</td>
<td style="text-align: right">13051</td>
</tr>
<tr>
<td>https://github.com/dotnet/corefx</td>
<td style="text-align: right">10652</td>
</tr>
<tr>
<td>https://github.com/apo-j/Projects_Working</td>
<td style="text-align: right">10185</td>
</tr>
<tr>
<td>https://github.com/Microsoft/CodeContracts</td>
<td style="text-align: right">9338</td>
</tr>
<tr>
<td>https://github.com/drazenzadravec/nequeo</td>
<td style="text-align: right">8060</td>
</tr>
<tr>
<td>https://github.com/ClearCanvas/ClearCanvas</td>
<td style="text-align: right">7946</td>
</tr>
<tr>
<td>https://github.com/mwilliamson-firefly/aws-sdk-net</td>
<td style="text-align: right">7860</td>
</tr>
<tr>
<td>https://github.com/151706061/MacroMedicalSystem</td>
<td style="text-align: right">7765</td>
</tr>
</tbody>
</table>
<h2 id="what-is-the-most-popular-repository-with-c-code-in-it">What is the most popular repository with C# code in it?</h2>
<p>This time we are going to look at the most popular repositories (based on GitHub ‘stars’) that contain at least 50 C# files (<a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752#most_popular_c_repos">query used</a>):</p>
<table>
<thead>
<tr>
<th>repo</th>
<th style="text-align: right">stars</th>
<th style="text-align: right">files</th>
</tr>
</thead>
<tbody>
<tr>
<td>https://github.com/grpc/grpc</td>
<td style="text-align: right">11075</td>
<td style="text-align: right">237</td>
</tr>
<tr>
<td>https://github.com/dotnet/coreclr</td>
<td style="text-align: right">8576</td>
<td style="text-align: right">6503</td>
</tr>
<tr>
<td>https://github.com/dotnet/roslyn</td>
<td style="text-align: right">8422</td>
<td style="text-align: right">6351</td>
</tr>
<tr>
<td>https://github.com/facebook/yoga</td>
<td style="text-align: right">8046</td>
<td style="text-align: right">73</td>
</tr>
<tr>
<td>https://github.com/bazelbuild/bazel</td>
<td style="text-align: right">7123</td>
<td style="text-align: right">132</td>
</tr>
<tr>
<td>https://github.com/dotnet/corefx</td>
<td style="text-align: right">7115</td>
<td style="text-align: right">10652</td>
</tr>
<tr>
<td>https://github.com/SeleniumHQ/selenium</td>
<td style="text-align: right">7024</td>
<td style="text-align: right">512</td>
</tr>
<tr>
<td>https://github.com/Microsoft/WinObjC</td>
<td style="text-align: right">6184</td>
<td style="text-align: right">81</td>
</tr>
<tr>
<td>https://github.com/qianlifeng/Wox</td>
<td style="text-align: right">5674</td>
<td style="text-align: right">207</td>
</tr>
<tr>
<td>https://github.com/Wox-launcher/Wox</td>
<td style="text-align: right">5674</td>
<td style="text-align: right">142</td>
</tr>
<tr>
<td>https://github.com/ShareX/ShareX</td>
<td style="text-align: right">5336</td>
<td style="text-align: right">766</td>
</tr>
<tr>
<td>https://github.com/Microsoft/Windows-universal-samples</td>
<td style="text-align: right">5130</td>
<td style="text-align: right">1501</td>
</tr>
<tr>
<td>https://github.com/NancyFx/Nancy</td>
<td style="text-align: right">3701</td>
<td style="text-align: right">957</td>
</tr>
<tr>
<td>https://github.com/chocolatey/choco</td>
<td style="text-align: right">3432</td>
<td style="text-align: right">248</td>
</tr>
<tr>
<td>https://github.com/JamesNK/Newtonsoft.Json</td>
<td style="text-align: right">3340</td>
<td style="text-align: right">650</td>
</tr>
</tbody>
</table>
<p>Interestingly, the top spot goes to a Google repository! (The C# files in it are sample code for using the gRPC library from .NET.)</p>
<h2 id="what-are-the-most-popular-c-class-names">What are the most popular C# <code class="language-plaintext highlighter-rouge">class</code> names?</h2>
<p>Assuming that I got the <a href="https://gist.github.com/mattwarren/42100ffe488bce5d48be22b59124b752#class_names">regex correct</a>, the most popular C# <code class="language-plaintext highlighter-rouge">class</code> names are the following:</p>
<table>
<thead>
<tr>
<th>Class name</th>
<th style="text-align: right">Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>class C</td>
<td style="text-align: right">182480</td>
</tr>
<tr>
<td>class Program</td>
<td style="text-align: right">163462</td>
</tr>
<tr>
<td>class Test</td>
<td style="text-align: right">50593</td>
</tr>
<tr>
<td>class Settings</td>
<td style="text-align: right">40841</td>
</tr>
<tr>
<td>class Resources</td>
<td style="text-align: right">39345</td>
</tr>
<tr>
<td>class A</td>
<td style="text-align: right">34687</td>
</tr>
<tr>
<td>class App</td>
<td style="text-align: right">28462</td>
</tr>
<tr>
<td>class B</td>
<td style="text-align: right">24246</td>
</tr>
<tr>
<td>class Startup</td>
<td style="text-align: right">18238</td>
</tr>
<tr>
<td>class Foo</td>
<td style="text-align: right">15198</td>
</tr>
</tbody>
</table>
<p>Yay for <code class="language-plaintext highlighter-rouge">Foo</code>, just sneaking into the Top 10!!</p>
<h2 id="foocs-programcs-or-something-else-whats-the-most-common-file-name">‘Foo.cs’, ‘Program.cs’ or something else, what’s the most common file name?</h2>
<p>Finally, let’s look at the different file names used. As with the <code class="language-plaintext highlighter-rouge">using</code> statements, they are dominated by the default ones used in the Visual Studio templates:</p>
<table>
<thead>
<tr>
<th>File</th>
<th style="text-align: right">Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>AssemblyInfo.cs</td>
<td style="text-align: right">386822</td>
</tr>
<tr>
<td>Program.cs</td>
<td style="text-align: right">105280</td>
</tr>
<tr>
<td>Resources.Designer.cs</td>
<td style="text-align: right">40881</td>
</tr>
<tr>
<td>Settings.Designer.cs</td>
<td style="text-align: right">35392</td>
</tr>
<tr>
<td>App.xaml.cs</td>
<td style="text-align: right">21928</td>
</tr>
<tr>
<td>Global.asax.cs</td>
<td style="text-align: right">16133</td>
</tr>
<tr>
<td>Startup.cs</td>
<td style="text-align: right">14564</td>
</tr>
<tr>
<td>HomeController.cs</td>
<td style="text-align: right">13574</td>
</tr>
<tr>
<td>RouteConfig.cs</td>
<td style="text-align: right">11278</td>
</tr>
<tr>
<td>MainWindow.xaml.cs</td>
<td style="text-align: right">11169</td>
</tr>
</tbody>
</table>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=15464097">Hacker News</a> and <a href="https://www.reddit.com/r/csharp/comments/75ykfb/analysing_c_code_on_github_with_bigquery/">/r/csharp</a></p>
<hr />
<h2 id="more-information">More Information</h2>
<p>As always, if you’ve read this far your present is yet more blog posts to read, enjoy!!</p>
<h3 id="how-bigquery-works-only-put-in-at-the-end-of-the-blog-post">How BigQuery Works</h3>
<ul>
<li><a href="https://cloud.google.com/blog/big-data/2016/01/bigquery-under-the-hood">BigQuery under the hood</a></li>
<li><a href="https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format">Inside Capacitor, BigQuery’s next-generation columnar storage format</a></li>
<li><a href="https://cloud.google.com/blog/big-data/2016/08/in-memory-query-execution-in-google-bigquery">In-memory query execution in Google BigQuery</a></li>
<li><a href="https://cloud.google.com/blog/big-data/2017/07/counting-uniques-faster-in-bigquery-with-hyperloglog">Counting uniques faster in BigQuery with HyperLogLog++</a></li>
<li><a href="https://cloud.google.com/blog/big-data/2017/10/separation-of-compute-and-state-in-google-bigquery-and-cloud-dataflow-and-why-it-matters">Separation of compute and state in Google BigQuery and Cloud Dataflow (and why it matters)</a></li>
<li><a href="https://www.gcppodcast.com/post/episode-94-big-query-under-the-hood-with-tino-tereshko-and-jordan-tigani/">#94 BigQuery Under the Hood with Tino Tereshko and Jordan Tigani</a></li>
<li><a href="http://blog.atscale.com/bi-benchmarks-with-google-bigquery">TECH TALK: BI Performance Benchmarks with Google BigQuery</a></li>
</ul>
<h3 id="bigquery-analysis-of-other-programming-languages">BigQuery analysis of other Programming Languages</h3>
<ul>
<li><a href="https://medium.com/google-cloud/analyzing-go-code-with-bigquery-485c70c3b451">Analyzing Go code with BigQuery</a></li>
<li><a href="https://medium.com/@sAbakumoff/using-bigquery-github-data-to-rank-npm-repositories-ecf8947a1182">Using BigQuery GitHub data to rank npm repositories</a></li>
<li><a href="https://medium.com/@sAbakumoff/using-bigquery-github-data-to-find-out-open-source-software-development-trends-e288a2ca3e6b">Using BigQuery GitHub data to find out open source software development trends</a></li>
<li><a href="https://cloud.google.com/blog/big-data/2016/09/using-bigquery-to-analyze-php-on-github">Using BigQuery to Analyze PHP on GitHub</a></li>
<li><a href="https://labs.steren.fr/2017/08/17/extracting-all-go-regular-expressions-found-on-github/">Extracting all Go regular expressions found on GitHub</a></li>
<li><a href="https://kozikow.com/2016/06/05/more-advanced-github-code-search/">More advanced github code search</a></li>
<li><a href="https://kozikow.com/2016/07/01/top-angular-directives-on-github/">Top angular directives on github, including custom directives</a></li>
<li><a href="http://blog.takipi.com/779236-java-logging-statements-1313-github-repositories-error-warn-or-fatal/">779,236 Java Logging Statements, 1,313 GitHub Repositories: ERROR, WARN or FATAL?</a></li>
<li><a href="https://www.reddit.com/r/bigquery/">/r/BigQuery</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2017/10/12/Analysing-C-code-on-GitHub-with-BigQuery/">Analysing C# code on GitHub with BigQuery</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
A look at the internals of 'boxing' in the CLR2017-08-02T00:00:00+00:00http://www.mattwarren.org/2017/08/02/A-look-at-the-internals-of-boxing-in-the-CLR
<p>It’s a <a href="https://stackoverflow.com/search?q=boxing+c%23">fundamental part of .NET</a> and can often happen <a href="https://github.com/controlflow/resharper-heapview#resharper-heap-allocations-viewer-plugin">without you knowing</a>, but <strong>how does it actually work</strong>? What is the .NET Runtime doing to make <em>boxing</em> possible?</p>
<p><strong>Note</strong>: this post won’t be discussing how to detect boxing, how it can affect performance or how to remove it (speak to <a href="https://www.ageofascent.com/2016/02/18/asp-net-core-exeeds-1-15-million-requests-12-6-gbps/">Ben Adams</a> about that!). It will <strong>only</strong> be talking about <em>how it works</em>.</p>
<hr />
<p>As an aside, if you like reading about <strong>CLR internals</strong> you may find these other posts interesting:</p>
<ul>
<li><a href="/2017/06/15/How-the-.NET-Rutime-loads-a-Type/?recommended=1">How the .NET Runtime loads a Type</a></li>
<li><a href="/2017/05/08/Arrays-and-the-CLR-a-Very-Special-Relationship/?recommended=1">Arrays and the CLR - a Very Special Relationship</a></li>
<li><a href="/2017/04/13/The-CLR-Thread-Pool-Thread-Injection-Algorithm/?recommended=1">The CLR Thread Pool ‘Thread Injection’ Algorithm</a></li>
<li><a href="/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/?recommended=1">The 68 things the CLR does before executing a single line of your code</a></li>
<li><a href="/2017/01/25/How-do-.NET-delegates-work/?recommended=1">How do .NET delegates work?</a></li>
<li><a href="/2016/12/14/Why-is-Reflection-slow/?recommended=1">Why is reflection slow?</a></li>
<li><a href="/2016/10/26/How-does-the-fixed-keyword-work/?recommended=1">How does the ‘fixed’ keyword work?</a></li>
</ul>
<hr />
<h3 id="boxing-in-the-clr-specification">Boxing in the CLR Specification</h3>
<p>Firstly it’s worth pointing out that boxing is mandated by the <a href="http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf">CLR specification ‘ECMA-335’</a>, so the runtime <strong>has</strong> to provide it:</p>
<p><a href="/images/2017/08/ECMA Spec - I.8.2.4 Boxing and unboxing of values.png"><img src="/images/2017/08/ECMA Spec - I.8.2.4 Boxing and unboxing of values - cutdown.png" alt="ECMA Spec - I.8.2.4 Boxing and unboxing of values" /></a></p>
<p>This means that there are a few key things that the CLR needs to take care of, which we will explore in the rest of this post.</p>
<hr />
<h2 id="creating-a-boxed-type">Creating a ‘boxed’ Type</h2>
<p>The first thing that the runtime needs to do is create the corresponding reference type (‘boxed type’) for any <code class="language-plaintext highlighter-rouge">struct</code> that it loads. You can see this in action, right at the beginning of the ‘Method Table’ creation where it <a href="https://github.com/dotnet/coreclr/blob/4b49e4330441db903e6a5b6efab3e1dbb5b64ff3/src/vm/methodtablebuilder.cpp#L1425-L1445">first checks if it’s dealing with a ‘Value Type’</a>, then behaves accordingly. So the ‘boxed type’ for any <code class="language-plaintext highlighter-rouge">struct</code> is created up front, when the type is loaded, so it’s ready to be used by any ‘boxing’ that happens during program execution.</p>
<p>The comment in the linked code is pretty interesting, as it reveals some of the low-level details the runtime has to deal with:</p>
<pre><code class="language-Text">// Check to see if the class is a valuetype; but we don't want to mark System.Enum
// as a ValueType. To accomplish this, the check takes advantage of the fact
// that System.ValueType and System.Enum are loaded one immediately after the
// other in that order, and so if the parent MethodTable is System.ValueType and
// the System.Enum MethodTable is unset, then we must be building System.Enum and
// so we don't mark it as a ValueType.
</code></pre>
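<p>The same special-casing is visible from managed code via reflection. The sketch below relies only on the documented behaviour of <code class="language-plaintext highlighter-rouge">Type.IsValueType</code> and <code class="language-plaintext highlighter-rouge">Type.BaseType</code>; the <code class="language-plaintext highlighter-rouge">ValueTypeCheck</code> class name is just for illustration and is not something from the runtime:</p>
<pre><code class="language-csharp">using System;

public static class ValueTypeCheck
{
    public static void Main()
    {
        // System.Enum derives from System.ValueType...
        Console.WriteLine(typeof(Enum).BaseType == typeof(ValueType)); // True

        // ...but neither of the two is itself reported as a value type
        Console.WriteLine(typeof(Enum).IsValueType);      // False
        Console.WriteLine(typeof(ValueType).IsValueType); // False

        // A concrete enum or struct, on the other hand, is a value type
        Console.WriteLine(typeof(DayOfWeek).IsValueType); // True
        Console.WriteLine(typeof(int).IsValueType);       // True
    }
}
</code></pre>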
<hr />
<h2 id="cpu-specific-code-generation">CPU-specific code-generation</h2>
<p>But to see what happens during program execution, let’s start with a simple C# program. The code below creates a custom <code class="language-plaintext highlighter-rouge">struct</code> or <code class="language-plaintext highlighter-rouge">Value Type</code>, which is then ‘boxed’ and ‘unboxed’:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">struct</span> <span class="nc">MyStruct</span>
<span class="p">{</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">Value</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">var</span> <span class="n">myStruct</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">MyStruct</span><span class="p">();</span>
<span class="c1">// boxing</span>
<span class="kt">var</span> <span class="n">boxed</span> <span class="p">=</span> <span class="p">(</span><span class="kt">object</span><span class="p">)</span><span class="n">myStruct</span><span class="p">;</span>
<span class="c1">// unboxing</span>
<span class="kt">var</span> <span class="n">unboxed</span> <span class="p">=</span> <span class="p">(</span><span class="n">MyStruct</span><span class="p">)</span><span class="n">boxed</span><span class="p">;</span>
</code></pre></div></div>
<p>This gets turned into the following IL code, in which you can see the <code class="language-plaintext highlighter-rouge">box</code> and <code class="language-plaintext highlighter-rouge">unbox.any</code> IL instructions:</p>
<pre><code class="language-Text">L_0000: ldloca.s myStruct
L_0002: initobj TestNamespace.MyStruct
L_0008: ldloc.0
L_0009: box TestNamespace.MyStruct
L_000e: stloc.1
L_000f: ldloc.1
L_0010: unbox.any TestNamespace.MyStruct
</code></pre>
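<p>One observable consequence of the <code class="language-plaintext highlighter-rouge">box</code> instruction is that the value is <em>copied</em> into a new heap object. A minimal sketch of that behaviour, reusing the same <code class="language-plaintext highlighter-rouge">MyStruct</code> (the <code class="language-plaintext highlighter-rouge">BoxingCopyDemo</code> class and its method names are hypothetical):</p>
<pre><code class="language-csharp">using System;

public struct MyStruct
{
    public int Value;
}

public static class BoxingCopyDemo
{
    // Boxes a struct, changes the original, then reads the value held in the box.
    public static int ValueInBoxAfterChange()
    {
        var myStruct = new MyStruct();
        myStruct.Value = 5;

        // 'box' allocates an object on the heap and copies the struct into it
        object boxed = myStruct;

        // Changing the original afterwards does not affect the boxed copy
        myStruct.Value = 10;
        return ((MyStruct)boxed).Value;
    }

    // Boxing the same value twice gives two distinct heap objects.
    public static bool BoxesAreSameObject()
    {
        var myStruct = new MyStruct();
        object first = myStruct;
        object second = myStruct;
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(ValueInBoxAfterChange()); // 5
        Console.WriteLine(BoxesAreSameObject());    // False
    }
}
</code></pre>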
<h3 id="runtime-and-jit-code">Runtime and JIT code</h3>
<p>So what does the JIT do with these IL op codes? Well, in the normal case it <em>wires up</em> and then <em>inlines</em> the optimised, hand-written, assembly code versions of the ‘JIT Helper Methods’ provided by the runtime. The links below take you to the relevant lines of code in the CoreCLR source:</p>
<ul>
<li>CPU specific, optimised versions (which are <a href="https://github.com/dotnet/coreclr/blob/4b49e4330441db903e6a5b6efab3e1dbb5b64ff3/src/vm/jitinterfacegen.cpp#L217-L275">wired-up at run-time</a>):
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/vm/amd64/JitHelpers_InlineGetThread.asm#L86-L148">JIT_BoxFastMP_InlineGetThread</a> (AMD64 - multi-proc or Server GC, implicit TLS)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/8cc7e35dd0a625a3b883703387291739a148e8c8/src/vm/amd64/JitHelpers_Slow.asm#L201-L271">JIT_BoxFastMP</a> (AMD64 - multi-proc or Server GC)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/8cc7e35dd0a625a3b883703387291739a148e8c8/src/vm/amd64/JitHelpers_Slow.asm#L485-L554">JIT_BoxFastUP</a> (AMD64 - single-proc and Workstation GC)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/38a2a69c786e4273eb1339d7a75f939c410afd69/src/vm/i386/jitinterfacex86.cpp#L756-L886">JIT_TrialAlloc::GenBox(..)</a> (x86), which is <a href="https://github.com/dotnet/coreclr/blob/38a2a69c786e4273eb1339d7a75f939c410afd69/src/vm/i386/jitinterfacex86.cpp#L1503-L1504">independently wired-up</a></li>
</ul>
</li>
<li>JIT inlines the helper function call in the common case, see <a href="https://github.com/dotnet/coreclr/blob/a14608efbad1bcb4e9d36a418e1e5ac267c083fb/src/jit/importer.cpp#L5212-L5221">Compiler::impImportAndPushBox(..)</a></li>
<li>Generic, less-optimised version, used as a fall-back <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/methodtable.cpp#L3734-L3783">MethodTable::Box(..)</a>
<ul>
<li>Eventually calls into <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/object.cpp#L1514-L1581">CopyValueClassUnchecked(..)</a></li>
<li>Which ties in with the answer to this Stack Overflow question <a href="https://stackoverflow.com/questions/2437925/why-is-struct-better-with-being-less-than-16-bytes/2437938#2437938">Why is struct better with being less than 16 bytes?</a></li>
</ul>
</li>
</ul>
<p>Interestingly enough, the only other ‘JIT Helper Methods’ that get this special treatment are those for <code class="language-plaintext highlighter-rouge">object</code>, <code class="language-plaintext highlighter-rouge">string</code> or <code class="language-plaintext highlighter-rouge">array</code> allocations, which goes to show just how <em>performance sensitive</em> boxing is.</p>
<p>In comparison, there is only one helper method for ‘unboxing’, called <a href="https://github.com/dotnet/coreclr/blob/03bec77fb4efaa397248a2b9a35c547522221447/src/vm/jithelpers.cpp#L3603-L3626">JIT_Unbox(..)</a>, which falls back to <a href="https://github.com/dotnet/coreclr/blob/03bec77fb4efaa397248a2b9a35c547522221447/src/vm/jithelpers.cpp#L3574-L3600">JIT_Unbox_Helper(..)</a> in the uncommon case and is <a href="https://github.com/dotnet/coreclr/blob/4b49e4330441db903e6a5b6efab3e1dbb5b64ff3/src/inc/jithelpers.h#L105">wired up here</a> (<code class="language-plaintext highlighter-rouge">CORINFO_HELP_UNBOX</code> to <code class="language-plaintext highlighter-rouge">JIT_Unbox</code>). The JIT will also inline the helper call in the common case, to save the cost of a method call, see <a href="https://github.com/dotnet/coreclr/blob/11c911e6f49fdc95fc52bec8d930df7e5c50daa9/src/jit/importer.cpp#L14172-L14177">Compiler::impImportBlockCode(..)</a>.</p>
<p>Note that the ‘unbox helper’ only fetches a reference/pointer to the ‘boxed’ data; the value then has to be <a href="https://github.com/dotnet/coreclr/blob/11c911e6f49fdc95fc52bec8d930df7e5c50daa9/src/jit/importer.cpp#L14277-L14283">put onto the stack</a>. As we saw above, when the C# compiler does unboxing it uses the <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox_any(v=vs.110).aspx">‘Unbox_Any’</a> op-code, not just the <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox(v=vs.110).aspx">‘Unbox’</a> one; see <a href="https://stackoverflow.com/questions/3743762/unboxing-does-not-create-a-copy-of-the-value-is-this-right">Unboxing does not create a copy of the value</a> for more information.</p>
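<p>From the C# side, two effects of this are easy to observe: a cast back to the <code class="language-plaintext highlighter-rouge">struct</code> hands you a copy of the data, and unboxing to anything other than the exact boxed type fails. The following is only a sketch of that visible behaviour, not of the helper itself (<code class="language-plaintext highlighter-rouge">UnboxingDemo</code> and its methods are made-up names):</p>
<pre><code class="language-csharp">using System;

public struct MyStruct
{
    public int Value;
}

public static class UnboxingDemo
{
    // Unboxes, changes the copy, then reads the box again.
    public static int ValueLeftInBox()
    {
        var original = new MyStruct();
        original.Value = 5;
        object boxed = original;

        // The cast compiles to 'unbox.any', which copies the value out of the box
        var unboxed = (MyStruct)boxed;
        unboxed.Value = 99;

        // The boxed object still holds the original value
        return ((MyStruct)boxed).Value;
    }

    // Unboxing checks that the box holds exactly the requested type.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxedInt = 5;
        try
        {
            // Throws: the box holds an int, not a long
            long wrong = (long)boxedInt;
            Console.WriteLine(wrong);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ValueLeftInBox());         // 5
        Console.WriteLine(UnboxToWrongTypeThrows()); // True
    }
}
</code></pre>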
<hr />
<h2 id="unboxing-stub-creation">Unboxing Stub Creation</h2>
<p>As well as ‘boxing’ and ‘unboxing’ a <code class="language-plaintext highlighter-rouge">struct</code>, the runtime also needs to help out during the time that a type remains ‘boxed’. To see why, let’s extend <code class="language-plaintext highlighter-rouge">MyStruct</code> and <code class="language-plaintext highlighter-rouge">override</code> the <code class="language-plaintext highlighter-rouge">ToString()</code> method, so that it displays the current <code class="language-plaintext highlighter-rouge">Value</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">struct</span> <span class="nc">MyStruct</span>
<span class="p">{</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">Value</span><span class="p">;</span>
<span class="k">public</span> <span class="k">override</span> <span class="kt">string</span> <span class="nf">ToString</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="s">"Value = "</span> <span class="p">+</span> <span class="n">Value</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Now, if we look at the ‘Method Table’ the runtime creates for the <em>boxed</em> version of <code class="language-plaintext highlighter-rouge">MyStruct</code> (remember, an unboxed value type instance carries no ‘Method Table’ pointer), we can see something strange going on. Note that there are 2 entries for <code class="language-plaintext highlighter-rouge">MyStruct::ToString</code>, one of which I’ve labelled as an ‘Unboxing Stub’:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Method table summary for 'MyStruct':
Number of static fields: 0
Number of instance fields: 1
Number of static obj ref fields: 0
Number of static boxed fields: 0
Number of declared fields: 1
Number of declared methods: 1
Number of declared non-abstract methods: 1
Vtable (with interface dupes) for 'MyStruct':
Total duplicate slots = 0
SD: MT::MethodIterator created for MyStruct (TestNamespace.MyStruct).
slot 0: MyStruct::ToString 0x000007FE41170C10 (slot = 0) (Unboxing Stub)
slot 1: System.ValueType::Equals 0x000007FEC1194078 (slot = 1)
slot 2: System.ValueType::GetHashCode 0x000007FEC1194080 (slot = 2)
slot 3: System.Object::Finalize 0x000007FEC14A30E0 (slot = 3)
slot 5: MyStruct::ToString 0x000007FE41170C18 (slot = 4)
<-- vtable ends here
</code></pre></div></div>
<p>(<a href="\data\2017\08\Full Method Table info for MyStruct.txt">full output is available</a>)</p>
<p><strong>So what is this ‘unboxing stub’ and why is it needed?</strong></p>
<p>It’s there because if you call <code class="language-plaintext highlighter-rouge">ToString()</code> on a <em>boxed</em> version of <code class="language-plaintext highlighter-rouge">MyStruct</code>, it calls the <em>overridden</em> method declared within <code class="language-plaintext highlighter-rouge">MyStruct</code> itself (which is what you’d want it to do), not the <a href="https://msdn.microsoft.com/en-us/library/system.object.tostring(v=vs.110).aspx">Object::ToString()</a> version. But, <code class="language-plaintext highlighter-rouge">MyStruct::ToString()</code> expects to be able to access any fields within the <code class="language-plaintext highlighter-rouge">struct</code>, such as <code class="language-plaintext highlighter-rouge">Value</code> in this case. To make that possible, the runtime/JIT has to adjust the <code class="language-plaintext highlighter-rouge">this</code> pointer before <code class="language-plaintext highlighter-rouge">MyStruct::ToString()</code> is called, as shown in the diagram below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. MyStruct:          [0x05 0x00 0x00 0x00]
                      |   Object Header   |   MethodTable  |   MyStruct    |
2. MyStruct (Boxed):  [0x40 0x5b 0x6f 0x6f 0xfe 0x7 0x0 0x0 0x5 0x0 0x0 0x0]
                                          ^
                  object 'this' pointer   |
                      |   Object Header   |   MethodTable  |   MyStruct    |
3. MyStruct (Boxed):  [0x40 0x5b 0x6f 0x6f 0xfe 0x7 0x0 0x0 0x5 0x0 0x0 0x0]
                                                           ^
                                 adjusted 'this' pointer   |
</code></pre></div></div>
<p><strong>Key to the diagram</strong></p>
<ol>
<li>Original <code class="language-plaintext highlighter-rouge">struct</code>, on the <strong>stack</strong></li>
<li>The <code class="language-plaintext highlighter-rouge">struct</code> being <em>boxed</em> into an <code class="language-plaintext highlighter-rouge">object</code> that lives on the <strong>heap</strong></li>
<li>Adjustment made to <em>this</em> pointer so <code class="language-plaintext highlighter-rouge">MyStruct::ToString()</code> will work</li>
</ol>
<p>(If you want more information on .NET object internals, see <a href="https://alexandrnikitin.github.io/blog/dotnet-generics-under-the-hood/#net-memory-layout">this useful article</a>)</p>
<p>We can see this in action in the code linked below. Note that the stub <em>only</em> consists of a few assembly instructions (it’s not as heavy-weight as a method call) and there are CPU-specific versions:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/c61525b5883e883621f98d44f479b15d790b0533/src/vm/prestub.cpp#L1760-L1763">MethodDesc::DoPrestub(..)</a> (calls <code class="language-plaintext highlighter-rouge">MakeUnboxingStubWorker(..)</code>)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/c61525b5883e883621f98d44f479b15d790b0533/src/vm/prestub.cpp#L1332-L1364">MakeUnboxingStubWorker(..)</a> (calls <code class="language-plaintext highlighter-rouge">EmitUnboxMethodStub(..)</code> to create the stub)
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/1c9eb774950c98ae65ef5497d805cff2eb565971/src/vm/i386/stublinkerx86.cpp#L3305-L3363">i386</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/1c540c594cc55d8446086dcd979c48efa84e00a9/src/vm/arm/stubs.cpp#L2194-L2221">arm</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/1c540c594cc55d8446086dcd979c48efa84e00a9/src/vm/arm64/stubs.cpp#L1829-L1839">arm64</a></li>
</ul>
</li>
</ul>
<p>The runtime/JIT has to do these tricks to help maintain the illusion that a <code class="language-plaintext highlighter-rouge">struct</code> can behave like a <code class="language-plaintext highlighter-rouge">class</code>, even though under-the-hood they are very different. See Eric Lippert’s answer to <a href="https://stackoverflow.com/questions/1682231/how-do-valuetypes-derive-from-object-referencetype-and-still-be-valuetypes">How do ValueTypes derive from Object (ReferenceType) and still be ValueTypes?</a> for a bit more on this.</p>
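<p>To see that illusion holding up from the C# side, here is a small sketch that calls <code class="language-plaintext highlighter-rouge">ToString()</code> both directly on the <code class="language-plaintext highlighter-rouge">struct</code> and through an <code class="language-plaintext highlighter-rouge">object</code> reference. The second call is the kind that, as described above, needs the <code class="language-plaintext highlighter-rouge">this</code> pointer adjusting before the override runs (<code class="language-plaintext highlighter-rouge">UnboxingStubDemo</code> is a hypothetical name, and the code shows only the result, not the stub itself):</p>
<pre><code class="language-csharp">using System;

public struct MyStruct
{
    public int Value;

    public override string ToString()
    {
        return "Value = " + Value.ToString();
    }
}

public static class UnboxingStubDemo
{
    // Calls the override directly on the struct, 'this' already points at the fields.
    public static string CallOnStruct(MyStruct myStruct)
    {
        return myStruct.ToString();
    }

    // Calls the override via a boxed object, a virtual call on a heap object.
    public static string CallOnBoxed(MyStruct myStruct)
    {
        object boxed = myStruct;
        return boxed.ToString();
    }

    public static void Main()
    {
        var myStruct = new MyStruct();
        myStruct.Value = 5;

        Console.WriteLine(CallOnStruct(myStruct)); // Value = 5
        Console.WriteLine(CallOnBoxed(myStruct));  // Value = 5
    }
}
</code></pre>
<p>Both calls end up in the same <code class="language-plaintext highlighter-rouge">MyStruct::ToString()</code> code and both can read <code class="language-plaintext highlighter-rouge">Value</code> correctly, which is exactly what you’d expect as a C# developer, even though the memory layout behind the two calls is different.</p>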
<hr />
<p>Hopefully this post has given you some idea of what happens <em>under-the-hood</em> when ‘boxing’ takes place.</p>
<hr />
<h2 id="further-reading">Further Reading</h2>
<p>As before, if you’ve got this far you might find these other links interesting:</p>
<h3 id="useful-code-comments-related-to-boxingunboxing-stubs">Useful code comments related to boxing/unboxing stubs</h3>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/a14608efbad1bcb4e9d36a418e1e5ac267c083fb/src/vm/methodtablebuilder.cpp#L6748-L6760">MethodTableBuilder::AllocAndInitMethodDescChunk(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/fd3668c7c9b9f5d64b5e6d1edf8c55a307cd3c2d/src/vm/genmeth.cpp#L733-L750">MethodDesc::FindOrCreateAssociatedMethodDesc(..) (in genmeth.cpp)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/eeb1efd9394a5decd00078b06099d785a471c06d/src/jit/importer.cpp#L14229-L14247">Compiler::impImportBlockCode(..)</a></li>
<li><a href="https://github.com/AndyAyersMS/coreclr/blob/aa70c0c4b98c167b4b347df79e1765d6727dac5a/src/jit/importer.cpp#L5204-L5219">Note on different ‘Boxing’ modes</a>, added as part of the work on <a href="https://github.com/dotnet/coreclr/pull/13188">JIT: modify box/unbox/isinst/castclass expansions for fast jitting</a></li>
</ul>
<h3 id="github-issues">GitHub Issues</h3>
<ul>
<li><a href="https://github.com/dotnet/coreclr/issues/8735">Question: Boxing on stack for function calls</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/8423">Boxing Cache?</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/1341">Improve the default hash code for structs</a> (read the whole discussion)</li>
<li><a href="https://github.com/dotnet/coreclr/pull/13016">JIT: Fix value type box optimization</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/111">(Discussion) Lightweight Boxing?</a></li>
</ul>
<h3 id="other-similarrelated-articles">Other similar/related articles</h3>
<ul>
<li><a href="https://www.codeproject.com/Articles/20481/NET-Type-Internals-From-a-Microsoft-CLR-Perspecti#12">.NET Type Internals - From a Microsoft CLR Perspective</a> (section on ‘Boxing and Unboxing’)</li>
<li><a href="http://yizhang82.me/value-type-boxing#interface-call-into-the-value-type-instance-method">C# value type boxing under the hood</a> (section on ‘Interface call into the value type instance method’)</li>
<li><a href="https://mycodingplace.wordpress.com/2016/11/11/value-type-methods-call-callvirt-constrained-and-hidden-boxing/">Value type methods – call, callvirt, constrained and hidden boxing</a></li>
<li><a href="https://blogs.msdn.microsoft.com/ricom/2007/01/26/performance-quiz-12-the-cost-of-a-good-hash-solution/">Performance Quiz #12 – The Cost of a Good Hash – Solution</a> (Rico Mariani)</li>
<li><a href="https://ericlippert.com/2011/03/14/to-box-or-not-to-box/">To box or not to box</a> (Eric Lippert)</li>
<li><a href="http://theburningmonk.com/2015/07/beware-of-implicit-boxing-of-value-types/">Beware of implicit boxing of value types</a></li>
<li><a href="http://doogalbellend.blogspot.co.uk/2007/04/method-calls-on-value-types-and-boxing.html">Method calls on value types and boxing</a></li>
</ul>
<h3 id="stack-overflow-questions">Stack Overflow Questions</h3>
<ul>
<li><a href="https://stackoverflow.com/questions/7660605/clr-specification-on-boxing">CLR specification on boxing</a></li>
<li><a href="https://stackoverflow.com/questions/5494807/how-clr-works-when-invoking-a-method-of-a-struct">How CLR works when invoking a method of a struct</a></li>
<li><a href="https://stackoverflow.com/questions/1249086/boxing-on-structs-when-calling-tostring">boxing on structs when calling ToString()</a></li>
<li><a href="https://stackoverflow.com/questions/436363/does-calling-a-method-on-a-value-type-result-in-boxing-in-net">Does calling a method on a value type result in boxing in .NET?</a></li>
<li><a href="https://stackoverflow.com/questions/1359856/why-does-implicitly-calling-tostring-on-a-value-type-cause-a-box-instruction">Why does implicitly calling toString on a value type cause a box instruction</a></li>
<li><a href="https://stackoverflow.com/questions/2437925/why-is-struct-better-with-being-less-than-16-bytes/2437938#2437938">Why is struct better with being less than 16 bytes</a></li>
<li><a href="https://stackoverflow.com/questions/40217308/when-are-type-objects-for-value-types-created">When are Type Objects for Value Types created?</a></li>
<li><a href="https://stackoverflow.com/questions/2412981/if-my-struct-implements-idisposable-will-it-be-boxed-when-used-in-a-using-statem">If my struct implements IDisposable will it be boxed when used in a using statement?</a></li>
<li><a href="https://stackoverflow.com/questions/1330571/when-does-a-using-statement-box-its-argument-when-its-a-struct">When does a using-statement box its argument, when it’s a struct?</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2017/08/02/A-look-at-the-internals-of-boxing-in-the-CLR/">A look at the internals of 'boxing' in the CLR</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Memory Usage Inside the CLR2017-07-10T00:00:00+00:00http://www.mattwarren.org/2017/07/10/Memory-Usage-Inside-the-CLR
<p>Have you ever wondered where and why the .NET Runtime (CLR) allocates memory? I don’t mean the ‘<em>managed</em>’ memory that <em>your</em> code allocates, e.g. via <code class="language-plaintext highlighter-rouge">new MyClass(..)</code>, and that the Garbage Collector (GC) then cleans up. I mean the memory that the CLR <em>itself</em> allocates, all the internal data structures that it needs to make it possible for your code to run.</p>
<p><strong>Note</strong>: just to clarify, this post will <strong>not</strong> be telling you how you can analyse the memory usage of <em>your code</em>. For that I recommend using one of the excellent .NET Profilers available, such as <a href="https://www.jetbrains.com/dotmemory/features/">dotMemory by JetBrains</a> or the <a href="http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/">ANTS Memory Profiler from Redgate</a> (I’ve personally used both and they’re great).</p>
<hr />
<h2 id="the-high-level-view">The high-level view</h2>
<p>Fortunately there’s a fantastic tool that makes it very easy for us to get an overview of memory usage within the CLR itself. It’s called <a href="https://technet.microsoft.com/en-us/sysinternals/vmmap.aspx">VMMap</a> and it’s part of the excellent <a href="https://technet.microsoft.com/en-gb/sysinternals/bb842062">Sysinternals Suite</a>.</p>
<p>For this post I will just be using a simple <code class="language-plaintext highlighter-rouge">HelloWorld</code> program, so that we can observe what the CLR does in the simplest possible scenario. Obviously things may look a bit different in a more complex app.</p>
<p>Firstly, let’s look at the data over time, in 1 second intervals. The <code class="language-plaintext highlighter-rouge">HelloWorld</code> program just prints to the Console and then waits until you press <code class="language-plaintext highlighter-rouge">&lt;ENTER&gt;</code>, so once the memory usage has reached its peak it remains there till the program exits. (Click for a larger version)</p>
<p><a href="/images/2017/07/Overall Memory Usage - Timeline (Committed).png"><img src="/images/2017/07/Overall Memory Usage - Timeline (Committed).png" alt="Overall Memory Usage - Timeline (Committed)" /></a></p>
<p>However, to get a more detailed view, we will now look at the <em>snapshot</em> from 2 seconds into the timeline, when the memory usage has stabilised.</p>
<p><a href="/images/2017/07/Overall Memory Usage.png"><img src="/images/2017/07/Overall Memory Usage.png" alt="Overall Memory Usage" /></a></p>
<p><strong>Note</strong>: If you want to find out more about memory usage in general, but also <em>specifically</em> how to measure it in .NET applications, I recommend reading this excellent series of posts by <a href="https://twitter.com/goldshtn">Sasha Goldshtein</a>:</p>
<ul>
<li><a href="http://blogs.microsoft.co.il/sasha/2011/07/14/mapping-the-memory-usage-of-net-applications-part-1-windows-memory-recap/">Mapping the Memory Usage of .NET Applications: Part 1, Windows Memory Recap</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2011/07/18/mapping-the-memory-usage-of-net-applications-part-2-vmmap-and-memorydisplay/">Mapping the Memory Usage of .NET Applications: Part 2, VMMap and MemoryDisplay</a></li>
<li><a href="http://blogs.microsoft.co.il/sasha/2011/07/22/mapping-the-memory-usage-of-net-applications-part-3-clr-profiler/">Mapping the Memory Usage of .NET Applications: Part 3, CLR Profiler</a></li>
</ul>
<p>Also, if like me you always get the different types of memory mixed-up, please read this Stack Overflow answer first: <a href="https://stackoverflow.com/questions/1984186/what-is-private-bytes-virtual-bytes-working-set">What is private bytes, virtual bytes, working set?</a></p>
<h3 id="image-memory">‘Image’ Memory</h3>
<p>Now we’ve seen the high-level view, let’s take a closer look at the individual chunks, the largest of which is labelled <em>Image</em>, which according to the VMMap help page (see here for <a href="/images/2017/07/VMMap - Help for Memory Types.png">info on all memory types</a>):</p>
<blockquote>
<p>… represents an executable file such as a .exe or .dll and has been loaded into a process by the image loader. It does not include images mapped as data files, which would be included in the Mapped File memory type. Image mappings can include shareable memory like code. When data regions, like initialized data, is modified, additional private memory is created in the process.</p>
</blockquote>
<p><a href="/images/2017/07/Image Memory Usage.png"><img src="/images/2017/07/Image Memory Usage.png" alt="Image Memory Usage" /></a></p>
<p>At this point, it’s worth pointing out a few things:</p>
<ol>
<li>This memory takes up a large amount of the total process memory because I’m using a simple <code class="language-plaintext highlighter-rouge">HelloWorld</code> program; in other types of programs it wouldn’t dominate the memory usage as much.</li>
<li>I was using a <code class="language-plaintext highlighter-rouge">DEBUG</code> version of the <a href="https://github.com/dotnet/coreclr">CoreCLR</a>, so the CLR specific files System.Private.CoreLib.dll, coreclr.dll, clrjit.dll and CoreRun.exe may well be larger than if they were compiled in <code class="language-plaintext highlighter-rouge">RELEASE</code> mode.</li>
<li>Some of this memory is potentially ‘shared’ with other processes, compare the numbers in the ‘Total WS’, ‘Private WS’, ‘Shareable WS’ and ‘Shared WS’ columns to see this in action.</li>
</ol>
<h3 id="managed-heaps-created-by-the-garbage-collector">‘Managed Heaps’ created by the Garbage Collector</h3>
<p>The next largest usage of memory is the GC itself. It pre-allocates several <em>heaps</em> that it can then give out whenever your program allocates an object, for example via code such as <code class="language-plaintext highlighter-rouge">new MyClass()</code> or <code class="language-plaintext highlighter-rouge">new byte[]</code>.</p>
<p><a href="/images/2017/07/Managed Heap Memory Usage - Expanded.png"><img src="/images/2017/07/Managed Heap Memory Usage - Expanded.png" alt="Managed Heap Memory Usage - Expanded" /></a></p>
<p>The main thing to note about the image above is that you can clearly see the different heaps: there is 256 MB allocated for <em>Generations</em> (Gen 0, 1, 2) and 128 MB for the ‘Large Object Heap’. In addition, note the difference between the amounts in the <em>Size</em> and the <em>Committed</em> columns. Only the <em>Committed</em> memory is actually being used; the total <em>Size</em> is what the GC pre-allocates or reserves up front from the address space.</p>
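<p>The difference between <em>Size</em> (reserved) and <em>Committed</em> can be illustrated with a small model. This is only a sketch: it doesn’t call any OS API, and <code class="language-plaintext highlighter-rouge">ReservedRegion</code>, its members and the 4 KB page size are assumptions made up for illustration rather than anything taken from the GC or VMMap.</p>

```cpp
#include <cassert>
#include <cstddef>

// Toy model of an address-space reservation (hypothetical type, not CLR code).
// 'reserved' plays the role of VMMap's Size column, 'committed' of Committed.
struct ReservedRegion
{
    static constexpr std::size_t kPageSize = 4096; // assumed page size

    std::size_t reserved;      // bytes of address space set aside up front
    std::size_t committed = 0; // bytes actually backed by memory
    std::size_t used = 0;      // bytes handed out to callers

    explicit ReservedRegion(std::size_t reserveBytes) : reserved(reserveBytes) {}

    // Hand out 'bytes', committing whole pages only when they are needed.
    // Returns false if the reservation is exhausted.
    bool allocate(std::size_t bytes)
    {
        if (bytes > reserved - used)
            return false;
        used += bytes;
        if (used > committed)
        {
            // Round the committed size up to the next page boundary.
            committed = (used + kPageSize - 1) / kPageSize * kPageSize;
        }
        return true;
    }
};

// Illustration: reserve 256 MB, then make two small allocations.
inline ReservedRegion demoReserveCommit()
{
    ReservedRegion gen(256u * 1024 * 1024);
    gen.allocate(100);  // first allocation commits one page
    gen.allocate(5000); // crosses into a second page
    return gen;
}
```

<p>In this model the region still reports 256 MB as its size after the two allocations, but only two pages are committed, which is the same shape as the gap between the two VMMap columns.</p>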
<p>If you’re interested, the rules for <em>heap</em> or more specifically <em>segment</em> sizes are helpfully explained in the <a href="https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals#ephemeral-generations-and-segments">Microsoft Docs</a>, but simply put, it varies depending on the GC mode (Workstation v Server), whether the process is 32/64-bit and ‘Number of CPUs’.</p>
<hr />
<h2 id="internal-clr-heap-memory">Internal CLR ‘Heap’ memory</h2>
<p>However, the part that I’m going to look at for the rest of this post is the memory that is allocated by the CLR itself, that is <em>unmanaged memory</em> that it uses for all its internal data structures.</p>
<p>But if we just look at the VMMap UI view, it doesn’t really tell us that much!</p>
<p><a href="/images/2017/07/Heap Memory Usage.png"><img src="/images/2017/07/Heap Memory Usage.png" alt="Heap Memory Usage" /></a></p>
<p>However, using the excellent <a href="https://github.com/Microsoft/perfview/">PerfView tool</a> we can capture the full call-stack of any memory allocations, that is any calls to <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx">VirtualAlloc()</a> or <a href="https://msdn.microsoft.com/en-us/library/windows/hardware/ff552108(v=vs.85).aspx">RtlAllocateHeap()</a> (obviously these functions only apply when running the CoreCLR on Windows). If we do this, PerfView gives us the following data (yes, it’s not pretty, but it’s very powerful!!)</p>
<p><a href="/images/2017/07/PerfView - Net Virtual Alloc Stacks.png"><img src="/images/2017/07/PerfView - Net Virtual Alloc Stacks.png" alt="PerfView - Net Virtual Alloc Stacks" /></a></p>
<p>So let’s explore this data in more detail.</p>
<h3 id="notable-memory-allocations">Notable memory allocations</h3>
<p>There are a few places where the CLR allocates significant chunks of memory up-front and then uses them through its lifetime, they are listed below:</p>
<ul>
<li>GC related allocations (see <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/gc.cpp">gc.cpp</a>)
<ul>
<li>Mark List - <strong>1,052,672 Bytes (1,028 K)</strong> in <code class="language-plaintext highlighter-rouge">WKS::make_mark_list(..)</code>, used during the ‘mark’ phase of the GC, see <a href="https://blogs.msdn.microsoft.com/abhinaba/2009/01/30/back-to-basics-mark-and-sweep-garbage-collection/">Back To Basics: Mark and Sweep Garbage Collection</a></li>
<li>Card Table - <strong>397,312 Bytes (388 K)</strong> in <code class="language-plaintext highlighter-rouge">WKS::gc_heap::make_card_table(..)</code>, see <a href="/2016/02/04/learning-how-garbage-collectors-work-part-1/#marking-the-card-table">Marking the ‘Card Table’</a></li>
<li>Overall Heap Creation/Allocation - <strong>204,800 Bytes (200 K)</strong> in <code class="language-plaintext highlighter-rouge">WKS::gc_heap::make_gc_heap(..)</code></li>
<li>S.O.H Segment creation - <strong>65,536 Bytes (64 K)</strong> in <code class="language-plaintext highlighter-rouge">WKS::gc_heap::allocate(..)</code>, triggered by the first object allocation</li>
<li>L.O.H Segment creation - <strong>65,536 Bytes (64 K)</strong> in <code class="language-plaintext highlighter-rouge">WKS::gc_heap::allocate_large_object(..)</code>, triggered by the first ‘large’ object allocation</li>
<li>Handle Table - <strong>20,480 Bytes (20 K)</strong> in <a href="https://github.com/dotnet/coreclr/blob/74a3f9691e490e9732da55c46b678159c64fae74/src/gc/handletable.cpp#L110">HndCreateHandleTable(..)</a></li>
</ul>
</li>
<li>Stress Log - <strong>4,194,304 Bytes (4,096 K)</strong> in <a href="https://github.com/dotnet/coreclr/blob/74a3f9691e490e9732da55c46b678159c64fae74/src/utilcode/stresslog.cpp#L191">StressLog::Initialize(..)</a>. Only if the ‘stress log’ is activated, see <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/stresslog.h#L6-L22">this comment for more info</a></li>
<li>‘Watson’ error reporting - <strong>65,536 Bytes (64 K)</strong> in <a href="https://github.com/dotnet/coreclr/blob/3a24095610ecaba62495740bf8319ad467af4997/src/vm/ceemain.cpp#L1079-L1090">EEStartupHelper routine</a></li>
<li>Virtual Call Stub Manager - <strong>36,864 Bytes (36 K)</strong> in <a href="https://github.com/dotnet/coreclr/blob/74a3f9691e490e9732da55c46b678159c64fae74/src/vm/virtualcallstub.cpp#L877">VirtualCallStubManager::InitStatic()</a>, which in turn <a href="https://github.com/dotnet/coreclr/blob/74a3f9691e490e9732da55c46b678159c64fae74/src/vm/virtualcallstub.cpp#L3449-L3475">creates the DispatchCache</a>. See <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md">‘Virtual Stub Dispatch’ in the BOTR</a> for more info</li>
<li>Debugger Heap and Control-Block - <strong>28,672 Bytes (28K)</strong> (only if debugging support is needed) in <a href="https://github.com/dotnet/coreclr/blob/51e968b013e9b1582035f202e004ed024f747f4f/src/debug/ee/debugger.cpp#L16637-L16639">DebuggerHeap::Init(..)</a> and <a href="https://github.com/dotnet/coreclr/blob/51e968b013e9b1582035f202e004ed024f747f4f/src/debug/ee/rcthread.cpp#L402">DebuggerRCThread::Init(..)</a>, both called via <a href="https://github.com/dotnet/coreclr/blob/3a24095610ecaba62495740bf8319ad467af4997/src/vm/ceemain.cpp#L2759-L2839">InitializeDebugger(..)</a></li>
</ul>
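<p>To get a feel for how these up-front allocations add up, the figures from the list can be tallied. The snippet below is just arithmetic on the numbers quoted above; the struct and function names are made up for illustration, and the total includes the optional items (Stress Log and debugger support), so it only applies to a run where both are enabled.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record type, used only to hold the numbers quoted in the post.
struct UpFrontAllocation
{
    const char* name;
    std::size_t bytes;
};

// Byte counts copied from the list above.
static const UpFrontAllocation kAllocations[] = {
    {"Mark List", 1052672},
    {"Card Table", 397312},
    {"Overall Heap Creation/Allocation", 204800},
    {"S.O.H Segment creation", 65536},
    {"L.O.H Segment creation", 65536},
    {"Handle Table", 20480},
    {"Stress Log", 4194304},
    {"'Watson' error reporting", 65536},
    {"Virtual Call Stub Manager", 36864},
    {"Debugger Heap and Control-Block", 28672},
};

// Sum of all listed allocations, in bytes.
inline std::size_t totalBytes()
{
    std::size_t total = 0;
    for (const UpFrontAllocation& a : kAllocations)
        total += a.bytes;
    return total;
}

// Convert bytes to 'K' in the same way as the list (1 K = 1,024 bytes).
inline std::size_t toKilobytes(std::size_t bytes)
{
    return bytes / 1024;
}
```

<p>With these numbers the total comes to 6,131,712 bytes (5,988 K), of which the Stress Log alone accounts for 4,096 K.</p>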
<h3 id="execution-engine-heaps">Execution Engine Heaps</h3>
<p>However, another technique that it uses is to allocate ‘heaps’, often 64K at a time, and then perform individual allocations within the heaps as needed. These heaps are split up into individual use-cases, the most common being for ‘<strong>frequently accessed</strong>’ data and its counterpart, data that is ‘<strong>rarely accessed</strong>’; see the explanation from this comment in <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/loaderallocator.hpp#L73-L91">loaderallocator.hpp</a> for more. This is done to ensure that the CLR retains control over any memory allocations and can therefore prevent ‘fragmentation’.</p>
<p>These heaps are together known as ‘Loader Heaps’ as explained in <a href="https://web.archive.org/web/20080919091745/http://msdn.microsoft.com:80/en-us/magazine/cc163791.aspx#S5">Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects</a> (wayback machine version):</p>
<blockquote>
<p><strong>LoaderHeaps</strong>
LoaderHeaps are meant for loading various runtime CLR artifacts and optimization artifacts that live for the lifetime of the domain. These heaps grow by predictable chunks to minimize fragmentation. LoaderHeaps are different from the garbage collector (GC) Heap (or multiple heaps in case of a symmetric multiprocessor or SMP) in that the GC Heap hosts object instances while LoaderHeaps hold together the type system. Frequently accessed artifacts like MethodTables, MethodDescs, FieldDescs, and Interface Maps get allocated on a <strong>HighFrequencyHeap</strong>, while less frequently accessed data structures, such as EEClass and ClassLoader and its lookup tables, get allocated on a <strong>LowFrequencyHeap</strong>. The <strong>StubHeap</strong> hosts stubs that facilitate code access security (CAS), COM wrapper calls, and P/Invoke.</p>
</blockquote>
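<p>The idea of a heap that grows ‘by predictable chunks’ and hands out memory from within each chunk can be sketched as follows. This is a simplified model, not the CLR’s <code class="language-plaintext highlighter-rouge">LoaderHeap</code>: the class name <code class="language-plaintext highlighter-rouge">ChunkedBumpHeap</code>, the fixed 64K chunk size and the pointer-sized alignment are all assumptions chosen for illustration.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical 'loader heap' model: grows in fixed 64K chunks and hands out
// memory by bumping an offset. Nothing is freed until the heap is destroyed.
class ChunkedBumpHeap
{
public:
    static constexpr std::size_t kChunkSize = 64 * 1024;

    // Returns nullptr for empty requests or ones too big for a single chunk.
    void* allocate(std::size_t bytes)
    {
        // Round the request up so every allocation stays pointer-aligned.
        const std::size_t align = sizeof(void*);
        bytes = (bytes + align - 1) / align * align;
        if (bytes == 0 || bytes > kChunkSize)
            return nullptr;

        if (chunks_.empty() || offset_ + bytes > kChunkSize)
        {
            // No chunk yet, or the current one is full: grow by one chunk.
            chunks_.push_back(
                std::unique_ptr<unsigned char[]>(new unsigned char[kChunkSize]));
            offset_ = 0;
        }
        void* result = chunks_.back().get() + offset_;
        offset_ += bytes;
        return result;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
    std::size_t offset_ = 0; // next free byte in the newest chunk
};
```

<p>Because individual allocations are never returned, there are no holes to track, which is one way such a design avoids fragmentation. The cost is that memory is only reclaimed when the whole heap goes away.</p>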
<p>One of the main places you see this high/low-frequency of access is in the heart of the Type system, where different data items are either classified as ‘hot’ (high-frequency) or ‘cold’ (low-frequency), from the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/type-loader.md#key-data-structures">‘Key Data Structures’ section</a> of the BOTR page on ‘Type Loader Design’:</p>
<blockquote>
<p><strong>EEClass</strong></p>
<p><strong>MethodTable</strong> data are split into “hot” and “cold” structures to improve working set and cache utilization. <strong>MethodTable</strong> itself is meant to only store “hot” data that are needed in program steady state. <strong>EEClass</strong> stores “cold” data that are typically only needed by type loading, JITing or reflection. Each <strong>MethodTable</strong> points to one <strong>EEClass</strong>.</p>
</blockquote>
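<p>The ‘hot’/‘cold’ split described in the quote can be sketched with two small structs. These are hypothetical types that only mirror the idea; the real <code class="language-plaintext highlighter-rouge">MethodTable</code> and <code class="language-plaintext highlighter-rouge">EEClass</code> layouts are far more involved, and the field names here are invented for illustration.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 'Cold' data: only needed occasionally (think type loading or reflection).
struct ColdTypeData
{
    std::string fullName;
    std::vector<std::string> fieldNames;
};

// 'Hot' data: kept small so that many entries fit in the CPU cache.
struct HotTypeData
{
    std::uint32_t instanceSize; // read on the steady-state path
    std::uint32_t flags;
    const ColdTypeData* cold;   // one pointer to the rarely used part
};

// Steady-state path: touches only the compact 'hot' structs.
inline std::size_t totalInstanceBytes(const std::vector<HotTypeData>& types)
{
    std::size_t total = 0;
    for (const HotTypeData& t : types)
        total += t.instanceSize;
    return total;
}

// Rare path: follows the pointer into the 'cold' data.
inline const std::string& typeName(const HotTypeData& t)
{
    return t.cold->fullName;
}
```

<p>Code that loops over many types on the common path only ever reads the small struct, so the larger, rarely needed data doesn’t compete for working set or cache space.</p>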
<p>Further to this, listed below are some specific examples of when each heap type is used:</p>
<ul>
<li>List of all <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=GetLowFrequencyHeap&type="><strong>Low-Frequency Heap</strong> usages</a>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/b258792e59b09060f54e0c9bbd31edc3e67d1ae8/src/vm/class.cpp#L74">EEClass::operator new</a> (the ‘cold’ scenario above)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/cd95a2e99450f892e56d9703cc71ddd682421e62/src/vm/binder.cpp#L1135">MscorlibBinder::AttachModule(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/b258792e59b09060f54e0c9bbd31edc3e67d1ae8/src/vm/typehash.cpp#L46">EETypeHashTable::Create(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/7e4afb4fbf900b789f53ccb816c6ddba7807de68/src/vm/comutilnative.cpp#L3056">COMNlsHashProvider::InitializeDefaultSeed()</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/a3c193780b8e055678feb06b2499cf8e7b41810c/src/vm/clsload.cpp#L3647">ClassLoader::CreateTypeHandleForTypeKey(..)</a> (when creating function pointers)</li>
</ul>
</li>
<li>List of all <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=GetHighFrequencyHeap&type="><strong>High-Frequency</strong> Heap usages</a>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/4ee1c192d1638b4bc69db59c0807a2b8c2b5bd3c/src/vm/methodtablebuilder.cpp#L9888">MethodTableBuilder::AllocateNewMT(..)</a> (the ‘hot’ scenario mentioned above)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/array.cpp#L148">ArrayClass::GenerateArrayAccessorCallSig(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/0ee3b5e64a98dc71aefed2304fe4bcf7f66ca9f5/src/vm/generics.cpp#L335">ClassLoader::CreateTypeHandleForNonCanonicalGenericInstantiation(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/ecall.cpp#L414">ECall::GetFCallImpl(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/a9b25d4aa22a1f4ad5f323f6c826e318f5a720fe/src/vm/clrtocomcall.cpp#L77">ComPlusCall::PopulateComPlusCallMethodDesc(..)</a></li>
</ul>
</li>
<li>List of all <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=GetStubHeap&type="><strong>Stub Heap</strong> usages</a>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/8cc7e35dd0a625a3b883703387291739a148e8c8/src/vm/prestub.cpp#L1005">MethodDesc::DoPrestub(..)</a> (triggers JIT-ting of a method)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/44285ef65b626db7954066ff596d6be07c7dd7a2/src/vm/dllimportcallback.cpp#L953">UMEntryThunkCache::GetUMEntryThunk(..)</a> (a DLL Import callback)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/51e968b013e9b1582035f202e004ed024f747f4f/src/vm/comtoclrcall.cpp#L1858">ComCall::CreateGenericComCallStub(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/8cc7e35dd0a625a3b883703387291739a148e8c8/src/vm/prestub.cpp#L956">MakeUnboxingStubWorker(..)</a></li>
</ul>
</li>
<li>List of all <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=GetPrecodeHeap&type="><strong>Precode Heap</strong> usages</a>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/fd3668c7c9b9f5d64b5e6d1edf8c55a307cd3c2d/src/vm/method.cpp#L4693">MethodDescChunk::AllocateCompactEntryPoints(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/980c1204d68f54be77eb840cc3f2e4fe2df42a26/src/vm/precode.cpp#L378">Precode::Allocate(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/980c1204d68f54be77eb840cc3f2e4fe2df42a26/src/vm/precode.cpp#L542">Precode::AllocateTemporaryEntryPoints(..)</a></li>
</ul>
</li>
<li>List of all <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=GetExecutableHeap&type="><strong>Executable Heap</strong> usages</a>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/4f8be95166a30ea7c0b1d6aed4ef424ee47c425a/src/vm/i386/cgenx86.cpp#L1086">GenerateInitPInvokeFrameHelper(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/38a2a69c786e4273eb1339d7a75f939c410afd69/src/vm/i386/jitinterfacex86.cpp#L883">JIT_TrialAlloc::GenBox(..)</a> (x86 JIT)</li>
<li>From <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/loaderallocator.hpp#L329">comment on GetExecutableHeap()</a> ‘The executable heap is intended to only be used by the global loader allocator.’</li>
</ul>
</li>
</ul>
<p>All the general ‘Loader Heaps’ listed above are allocated in the <code class="language-plaintext highlighter-rouge">LoaderAllocator::Init(..)</code> function (<a href="https://github.com/dotnet/coreclr/blob/32b52269a270f9b7800da3ba119b92061f528789/src/vm/loaderallocator.cpp#L986-L1044">link to actual code</a>); the <code class="language-plaintext highlighter-rouge">executable</code> and <code class="language-plaintext highlighter-rouge">stub</code> heaps have the ‘executable’ flag set, all the rest don’t. The size of these heaps is <a href="https://github.com/dotnet/coreclr/blob/32b52269a270f9b7800da3ba119b92061f528789/src/vm/appdomain.hpp#L811-L818">configured in this code</a>: they ‘reserve’ different amounts up front, but they all have a ‘commit’ size that is equivalent to one OS ‘page’.</p>
<p>In addition to the ‘general’ heaps, there are some others that are specifically used by the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md">Virtual Stub Dispatch</a> mechanism, they are known as the <code class="language-plaintext highlighter-rouge">indcell_heap</code>, <code class="language-plaintext highlighter-rouge">cache_entry_heap</code>, <code class="language-plaintext highlighter-rouge">lookup_heap</code>, <code class="language-plaintext highlighter-rouge">dispatch_heap</code> and <code class="language-plaintext highlighter-rouge">resolve_heap</code>, they’re allocated <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/virtualcallstub.cpp#L690-L756">in this code</a>, using the <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/virtualcallstub.cpp#L521-L688">specified commit/reserve sizes</a>.</p>
<p>Finally, if you’re interested in the mechanics of how the heaps actually work <a href="https://github.com/dotnet/coreclr/blob/master/src/utilcode/loaderheap.cpp">take a look at LoaderHeap.cpp</a>.</p>
<h3 id="jit-memory-usage">JIT Memory Usage</h3>
<p>Last, but by no means least, there is one other component in the CLR that extensively allocates memory and that is the JIT. It does so in 2 main scenarios:</p>
<ol>
<li><strong>‘Transient’</strong> or temporary memory needed when it’s doing the job of converting IL code into machine code</li>
<li><strong>‘Permanent’</strong> memory used when it needs to emit the ‘machine code’ for a method</li>
</ol>
<h4 id="transient-memory"><strong>‘Transient’ Memory</strong></h4>
<p>This is needed by the JIT when it is doing the job of converting IL code into machine code for the current CPU architecture. This memory is only needed whilst the JIT is running and can be re-used/discarded later, it is used to hold the internal <a href="https://github.com/dotnet/coreclr/blob/bbf13d7e5e0764770cc0d55d727beb73a05d55f6/Documentation/botr/ryujit-overview.md#overview-of-the-ir">JIT data structures</a> (e.g. <code class="language-plaintext highlighter-rouge">Compiler</code>, <code class="language-plaintext highlighter-rouge">BasicBlock</code>, <code class="language-plaintext highlighter-rouge">GenTreeStmt</code>, etc).</p>
<p>For example, take a look at the following code from <a href="https://github.com/dotnet/coreclr/blob/74a3f9691e490e9732da55c46b678159c64fae74/src/jit/valuenum.cpp#L4489">Compiler::fgValueNumber()</a>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">...</span>
<span class="c1">// Allocate the value number store.</span>
<span class="n">assert</span><span class="p">(</span><span class="n">fgVNPassesCompleted</span> <span class="o">></span> <span class="mi">0</span> <span class="o">||</span> <span class="n">vnStore</span> <span class="o">==</span> <span class="nb">nullptr</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">fgVNPassesCompleted</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CompAllocator</span><span class="o">*</span> <span class="n">allocator</span> <span class="o">=</span> <span class="k">new</span> <span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">CMK_ValueNumber</span><span class="p">)</span> <span class="n">CompAllocator</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">CMK_ValueNumber</span><span class="p">);</span>
<span class="n">vnStore</span> <span class="o">=</span> <span class="k">new</span> <span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">CMK_ValueNumber</span><span class="p">)</span> <span class="n">ValueNumStore</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">allocator</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">...</span>
</code></pre></div></div>
<p>The line <code class="language-plaintext highlighter-rouge">vnStore = new (this, CMK_ValueNumber) ...</code> ends up calling the specialised <code class="language-plaintext highlighter-rouge">new</code> operator defined in <a href="https://github.com/dotnet/coreclr/blob/74a3f9691e490e9732da55c46b678159c64fae74/src/jit/compiler.hpp">compiler.hpp</a> (code shown below), which, as per the comment, uses a custom ‘Arena Allocator’ that is implemented in <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/alloc.cpp">/src/jit/alloc.cpp</a>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*****************************************************************************
* operator new
*
* Note that compGetMem is an arena allocator that returns memory that is
* not zero-initialized and can contain data from a prior allocation lifetime.
* it also requires that 'sz' be aligned to a multiple of sizeof(int)
*/</span>
<span class="kr">inline</span> <span class="kt">void</span><span class="o">*</span> <span class="kr">__cdecl</span> <span class="k">operator</span> <span class="nf">new</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">sz</span><span class="p">,</span> <span class="n">Compiler</span><span class="o">*</span> <span class="n">context</span><span class="p">,</span> <span class="n">CompMemKind</span> <span class="n">cmk</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">sz</span> <span class="o">=</span> <span class="n">AlignUp</span><span class="p">(</span><span class="n">sz</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span>
<span class="n">assert</span><span class="p">(</span><span class="n">sz</span> <span class="o">!=</span> <span class="mi">0</span> <span class="o">&&</span> <span class="p">(</span><span class="n">sz</span> <span class="o">&</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="n">context</span><span class="o">-></span><span class="n">compGetMem</span><span class="p">(</span><span class="n">sz</span><span class="p">,</span> <span class="n">cmk</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This technique (of overloading the <code class="language-plaintext highlighter-rouge">new</code> operator) is used in <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=%22operator+new%22&type=">lots of places throughout the CLR</a>, for instance there is a generic one implemented in <a href="https://github.com/dotnet/coreclr/blob/32b52269a270f9b7800da3ba119b92061f528789/src/utilcode/clrhost_nodependencies.cpp#L421-L440">the CLR Host</a>.</p>
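<p>For anyone who hasn’t come across this pattern before, here is a minimal, stand-alone sketch of an <code class="language-plaintext highlighter-rouge">operator new</code> overload that routes allocations to an arena. It is not the JIT’s code: <code class="language-plaintext highlighter-rouge">Arena</code>, its fixed 4K buffer and the <code class="language-plaintext highlighter-rouge">Point</code> type are assumptions for illustration, whereas the real <code class="language-plaintext highlighter-rouge">ArenaAllocator</code> grows by allocating new pages.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size arena: hands out memory by bumping an offset.
class Arena
{
public:
    void* allocate(std::size_t sz)
    {
        // Round up so each block stays suitably aligned for any basic type.
        const std::size_t align = alignof(std::max_align_t);
        sz = (sz + align - 1) / align * align;
        if (sz > sizeof(buffer_) - used_)
            throw std::bad_alloc();
        void* p = buffer_ + used_;
        used_ += sz;
        return p;
    }

    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[4096];
    std::size_t used_ = 0;
};

// Placement form of operator new: 'new (arena) T(...)' ends up here.
inline void* operator new(std::size_t sz, Arena& arena)
{
    return arena.allocate(sz);
}

// Matching placement delete, only called if T's constructor throws.
inline void operator delete(void*, Arena&) noexcept
{
    // Nothing to do: the arena releases all of its memory at once.
}

// Small example type to allocate from the arena.
struct Point
{
    int x;
    int y;
};
```

<p>A call such as <code class="language-plaintext highlighter-rouge">new (arena) Point{1, 2}</code> then takes its memory from the arena rather than the general-purpose heap. Note that objects created this way are never individually deleted, so this approach suits short-lived data that can all be thrown away together.</p>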
<h4 id="permanent-memory"><strong>‘Permanent’ Memory</strong></h4>
<p>The last type of memory that the JIT uses is ‘permanent’ memory to store the JITted machine code, this is done via calls to <a href="https://github.com/dotnet/coreclr/blob/44f57065649af5f8bcbb7c71d827221a7bc1bf7a/src/jit/compiler.cpp#L2163-L2200">Compiler::compGetMem(..)</a>, starting from <a href="https://github.com/dotnet/coreclr/blob/44f57065649af5f8bcbb7c71d827221a7bc1bf7a/src/jit/compiler.cpp#L5066-L5345">Compiler::compCompile(..)</a> via the call-stack shown below. Note that as before this uses a custom ‘Arena Allocator’ that is implemented in <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/alloc.cpp">/src/jit/alloc.cpp</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+ clrjit!ClrAllocInProcessHeap
+ clrjit!ArenaAllocator::allocateHostMemory
+ clrjit!ArenaAllocator::allocateNewPage
+ clrjit!ArenaAllocator::allocateMemory
+ clrjit!Compiler::compGetMem
+ clrjit!emitter::emitGetMem
+ clrjit!emitter::emitAllocInstr
+ clrjit!emitter::emitNewInstrTiny
+ clrjit!emitter::emitIns_R_R
+ clrjit!emitter::emitInsBinary
+ clrjit!CodeGen::genCodeForStoreLclVar
+ clrjit!CodeGen::genCodeForTreeNode
+ clrjit!CodeGen::genCodeForBBlist
+ clrjit!CodeGen::genGenerateCode
+ clrjit!Compiler::compCompile
</code></pre></div></div>
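<p>The shape of that call-stack (allocate memory, run out of room, ask for a new page, fall back to the host) can be sketched in a few lines of C++. This is only an illustration under my own assumptions: <code class="language-plaintext highlighter-rouge">PagedArena</code> is a hypothetical class, <code class="language-plaintext highlighter-rouge">std::malloc</code> stands in for the host memory call, and the default page size is arbitrary rather than the value the JIT uses:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical page-based arena. The names are illustrative, not the CLR's.
class PagedArena {
public:
    explicit PagedArena(std::size_t pageSize = 64 * 1024) : pageSize_(pageSize) {}
    PagedArena(const PagedArena&) = delete;
    PagedArena& operator=(const PagedArena&) = delete;

    // Everything is released in one go; there is no per-object free.
    ~PagedArena() {
        for (void* page : pages_) std::free(page);
    }

    // Hands out the next chunk of the current page, fetching a new page when needed.
    void* allocate(std::size_t size) {
        assert(size != 0);
        size = roundUp(size);
        if (size > remaining_) allocateNewPage(size);
        void* result = next_;
        next_ += size;
        remaining_ -= size;
        return result;
    }

    std::size_t pageCount() const { return pages_.size(); }

private:
    static std::size_t roundUp(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        return (size + align - 1) & ~(align - 1);
    }

    // Requests a fresh page from the 'host' (std::malloc in this sketch).
    // Whatever was left of the previous page is simply abandoned.
    void allocateNewPage(std::size_t minimumSize) {
        const std::size_t size = minimumSize > pageSize_ ? minimumSize : pageSize_;
        pages_.push_back(nullptr);  // reserve the slot first so a failure cannot leak a page
        void* page = std::malloc(size);
        if (page == nullptr) throw std::bad_alloc();
        pages_.back() = page;
        next_ = static_cast<unsigned char*>(page);
        remaining_ = size;
    }

    std::size_t pageSize_;
    std::vector<void*> pages_;
    unsigned char* next_ = nullptr;
    std::size_t remaining_ = 0;
};
```

<p>The trade-off is the usual one for arenas: allocation is little more than a pointer bump, at the cost of only being able to release memory all at once.</p>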
<hr />
<h2 id="real-world-example">Real-world example</h2>
<p>Finally, to prove that this investigation matches with more real-world scenarios, we can see similar memory usage breakdowns in this GitHub issue: <a href="https://github.com/dotnet/coreclr/issues/10380#issuecomment-288365180">[Question] Reduce memory consumption of CoreCLR</a></p>
<blockquote>
<p>Yes, we have profiled several Xamarin GUI applications on Tizen Mobile.</p>
<p>Typical profile of CoreCLR’s memory on the GUI applications is the following:</p>
<ol>
<li>Mapped assembly images - 4.2 megabytes (50%)</li>
<li>JIT-compiler’s memory - 1.7 megabytes (20%)</li>
<li>Execution engine - about 1 megabyte (11%)</li>
<li>Code heap - about 1 megabyte (11%)</li>
<li>Type information - about 0.5 megabyte (6%)</li>
<li>Objects heap - about 0.2 megabyte (2%)</li>
</ol>
</blockquote>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=14740169">HackerNews</a></p>
<hr />
<h2 id="further-reading">Further Reading</h2>
<p>See the links below for additional information on ‘Loader Heaps’</p>
<ul>
<li><a href="https://web.archive.org/web/20080919091745/http://msdn.microsoft.com:80/en-us/magazine/cc163791.aspx#S5">Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects</a> (wayback machine version)</li>
<li><a href="https://vivekcek.wordpress.com/2016/07/10/c-different-types-of-heap-memory/">C# Different Types Of Heap Memory</a></li>
<li><a href="https://social.msdn.microsoft.com/Forums/vstudio/en-US/24eac008-e6a2-4205-b551-68acb5bfb9f5/need-clarification-loader-heap-high-frequency-heap-and-method-tables?forum=clr">Need clarification : Loader Heap , High Frequency heap and method tables</a></li>
<li><a href="https://blogs.msdn.microsoft.com/alejacma/2009/08/24/managed-debugging-with-windbg-managed-heap-part-5/">MANAGED DEBUGGING with WINDBG. Managed Heap. Part 5</a></li>
<li><a href="https://stackoverflow.com/questions/10121943/net-process-memory-usage-5x-clr-heap-memory">.NET process memory usage = 5x CLR Heap Memory?</a></li>
<li><a href="https://stackoverflow.com/questions/4403506/what-is-the-difference-between-object-and-loader-heap-in-net-4-0/4517582#4517582">what is the difference between object and loader heap in .net 4.0</a></li>
<li><a href="https://csharp.2000things.com/2011/01/03/200-static-data-and-constants-are-stored-on-the-heap/">2,000 Things You Should Know About C# - #200 – Static Data and Constants Are Stored on the Heap</a></li>
<li><a href="https://stackoverflow.com/questions/4405627/high-frequency-heap">High Frequency Heap - Can anyone explain me the CLR’s “HighFrequencyHeap”?</a></li>
<li><a href="https://stackoverflow.com/questions/8479529/heap-memory-management-net/12062828#12062828">To help clarify the discussion on the heaps here, there are about 8 different heaps that the CLR uses</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=%22%5Bmemory+consumption%5D%22&type=Issues">Issues about ‘[Memory Consumption]’</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2017/07/10/Memory-Usage-Inside-the-CLR/">Memory Usage Inside the CLR</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
How the .NET Runtime loads a Type2017-06-15T00:00:00+00:00http://www.mattwarren.org/2017/06/15/How-the-.NET-Rutime-loads-a-Type
<p>It is something we take for granted every time we run a .NET program, but it turns out that loading a Type or <code class="language-plaintext highlighter-rouge">class</code> is a fairly complex process.</p>
<p>So how does the .NET Runtime (CLR) actually load a Type?</p>
<hr />
<p>If you want the tl;dr it’s done <strong>carefully</strong>, <strong>cautiously</strong> and <strong>step-by-step</strong></p>
<hr />
<h2 id="ensuring-type-safety">Ensuring Type Safety</h2>
<p>One of the key requirements of a ‘Managed Runtime’ is providing Type Safety, but what does that actually mean? From the MSDN page on <a href="https://msdn.microsoft.com/en-us/library/hbzz1a9a(v=vs.110).aspx">Type Safety and Security</a>:</p>
<blockquote>
<p>Type-safe code accesses only the memory locations it is authorized to access. (For this discussion, type safety specifically refers to memory type safety and should not be confused with type safety in a broader respect.) For example, type-safe code cannot read values from another object’s private fields. It accesses types only in well-defined, allowable ways.</p>
</blockquote>
<p>So in effect, the CLR has to ensure your Types/Classes are <strong>well-behaved</strong> and <strong>following the rules</strong>.</p>
<h3 id="compiler-prevents-you-from-creating-an-abstract-class">Compiler prevents you from creating an ‘abstract’ class</h3>
<p>But let’s look at a more concrete example, using the C# code below:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">abstract</span> <span class="k">class</span> <span class="nc">AbstractClass</span>
<span class="p">{</span>
<span class="k">public</span> <span class="nf">AbstractClass</span><span class="p">()</span> <span class="p">{</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">NormalClass</span> <span class="p">:</span> <span class="n">AbstractClass</span>
<span class="p">{</span>
<span class="k">public</span> <span class="nf">NormalClass</span><span class="p">()</span> <span class="p">{</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">test</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">AbstractClass</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The compiler quite rightly refuses to compile this and gives the following error, because <a href="https://msdn.microsoft.com/en-us/library/k535acbf(v=vs.71).aspx"><code class="language-plaintext highlighter-rouge">abstract</code> classes can’t be created</a>; you can only inherit from them:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error CS0144: Cannot create an instance of the abstract class or interface
'ConsoleApplication.AbstractClass'
</code></pre></div></div>
<p>So that’s all well and good, but the CLR can’t rely on <strong>all</strong> code being created via a well-behaved compiler, or in fact via a compiler at all. So it <strong>has</strong> to check for and prevent any attempt to create an <code class="language-plaintext highlighter-rouge">abstract</code> class.</p>
<h3 id="writing-il-code-by-hand">Writing IL code by hand</h3>
<p>One way to circumvent the compiler is to write IL code by hand using the <a href="https://docs.microsoft.com/en-us/dotnet/framework/tools/ilasm-exe-il-assembler">IL Assembler tool (ILAsm)</a> which will do <em>almost</em> no checks on the validity of the IL you give it.</p>
<p>For instance the IL below is the equivalent of writing <code class="language-plaintext highlighter-rouge">var test = new AbstractClass();</code> (if the C# compiler would let us):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.method public hidebysig static void Main(string[] args) cil managed
{
.entrypoint
.maxstack 1
.locals init (
[0] class ConsoleApplication.NormalClass class2)
// System.InvalidOperationException: Instances of abstract classes cannot be created.
newobj instance void ConsoleApplication.AbstractClass::.ctor()
stloc.0
ldloc.0
callvirt instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
callvirt instance string [mscorlib]System.Reflection.MemberInfo::get_Name()
call void [mscorlib]Internal.Console::WriteLine(string)
ret
}
</code></pre></div></div>
<p>Fortunately the CLR has got this covered and will throw an <code class="language-plaintext highlighter-rouge">InvalidOperationException</code> when you execute the code. This is due to <a href="https://github.com/dotnet/coreclr/blob/dde63bc1aa39aabae77fb89aad583483965c523e/src/vm/jitinterface.cpp#L5832-L5835">this check</a> which is hit when the JIT compiles the <code class="language-plaintext highlighter-rouge">newobj</code> IL instruction.</p>
<h3 id="creating-types-at-run-time">Creating Types at run-time</h3>
<p>One other way that you can attempt to create an <code class="language-plaintext highlighter-rouge">abstract</code> class is at run-time, using reflection (thanks to <a href="https://blogs.msdn.microsoft.com/seteplia/2017/02/01/dissecting-the-new-constraint-in-c-a-perfect-example-of-a-leaky-abstraction/">this blog post</a> for giving me some tips on other ways of creating Types).</p>
<p>This is shown in the code below:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">abstractType</span> <span class="p">=</span> <span class="n">Type</span><span class="p">.</span><span class="nf">GetType</span><span class="p">(</span><span class="s">"ConsoleApplication.AbstractClass"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">abstractType</span><span class="p">.</span><span class="n">FullName</span><span class="p">);</span>
<span class="c1">// System.MissingMethodException: Cannot create an abstract class.</span>
<span class="kt">var</span> <span class="n">abstractInstance</span> <span class="p">=</span> <span class="n">Activator</span><span class="p">.</span><span class="nf">CreateInstance</span><span class="p">(</span><span class="n">abstractType</span><span class="p">);</span>
</code></pre></div></div>
<p>The compiler is completely happy with this: it doesn’t do anything to prevent or warn you, and nor should it. However, when you run the code it will throw an exception, strangely enough a <code class="language-plaintext highlighter-rouge">MissingMethodException</code> this time, but it does the job!</p>
<p>The call stack is below:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Activator.cs#L45-L52">Activator CreateInstance(..)</a> (C# code)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/97c58ac4fce27b7796206a59eea0ca27cb49fe1a/src/mscorlib/src/System/RtType.cs#L4767-L4793">RtType CreateInstanceSlow(..)</a> (C# code)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/b479cee9fdcee2cb4035fda788d34e724e32a222/src/mscorlib/src/System/RuntimeHandles.cs#L223">RuntimeHandles CreateInstance(..)</a> (extern call)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/reflectioninvocation.cpp#L473-L631">RuntimeTypeHandle::CreateInstance(..)</a> (C++ implementation)</li>
<li>The <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/reflectioninvocation.cpp#L552">actual check</a> that throws a <code class="language-plaintext highlighter-rouge">MissingMethodException</code></li>
</ul>
<p>One final way (unless I’ve missed some out?) is to use <code class="language-plaintext highlighter-rouge">GetUninitializedObject(..)</code> in the <a href="https://msdn.microsoft.com/en-us/library/system.runtime.serialization.formatterservices.getuninitializedobject(v=vs.110).aspx">FormatterServices class</a> like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">object</span> <span class="nf">CreateInstance</span><span class="p">(</span><span class="n">Type</span> <span class="n">type</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">constructor</span> <span class="p">=</span> <span class="n">type</span><span class="p">.</span><span class="nf">GetConstructor</span><span class="p">(</span><span class="k">new</span> <span class="n">Type</span><span class="p">[</span><span class="m">0</span><span class="p">]);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">constructor</span> <span class="p">==</span> <span class="k">null</span> <span class="p">&&</span> <span class="p">!</span><span class="n">type</span><span class="p">.</span><span class="n">IsValueType</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nf">NotSupportedException</span><span class="p">(</span>
<span class="s">"Type '"</span> <span class="p">+</span> <span class="n">type</span><span class="p">.</span><span class="n">FullName</span> <span class="p">+</span> <span class="s">"' doesn't have a parameterless constructor"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">var</span> <span class="n">emptyInstance</span> <span class="p">=</span> <span class="n">FormatterServices</span><span class="p">.</span><span class="nf">GetUninitializedObject</span><span class="p">(</span><span class="n">type</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">constructor</span> <span class="p">==</span> <span class="k">null</span><span class="p">)</span>
<span class="k">return</span> <span class="k">null</span><span class="p">;</span>
<span class="k">return</span> <span class="n">constructor</span><span class="p">.</span><span class="nf">Invoke</span><span class="p">(</span><span class="n">emptyInstance</span><span class="p">,</span> <span class="k">new</span> <span class="kt">object</span><span class="p">[</span><span class="m">0</span><span class="p">])</span> <span class="p">??</span> <span class="n">emptyInstance</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">var</span> <span class="n">abstractType</span> <span class="p">=</span> <span class="n">Type</span><span class="p">.</span><span class="nf">GetType</span><span class="p">(</span><span class="s">"ConsoleApplication.AbstractClass"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">abstractType</span><span class="p">.</span><span class="n">FullName</span><span class="p">);</span>
<span class="c1">// System.MemberAccessException: Cannot create an abstract class.</span>
<span class="kt">var</span> <span class="n">abstractInstance</span> <span class="p">=</span> <span class="nf">CreateInstance</span><span class="p">(</span><span class="n">abstractType</span><span class="p">);</span>
</code></pre></div></div>
<p>Again the run-time stops you from doing this, although this time it decides to throw a <code class="language-plaintext highlighter-rouge">MemberAccessException</code> instead.</p>
<p>This happens via the following call stack:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/3ababc21ab334a2e37c6ba4115c946ea26a6f2fb/src/mscorlib/src/System/Runtime/Serialization/FormatterServices.cs#L42-L56">FormatterServices GetUninitializedObject(..)</a> (C# code)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/3ababc21ab334a2e37c6ba4115c946ea26a6f2fb/src/mscorlib/src/System/Runtime/Serialization/FormatterServices.cs#L59">FormatterServices nativeGetUninitializedObject(..)</a> (extern call)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/reflectioninvocation.cpp#L2676-L2739">ReflectionSerialization::GetUninitializedObject(..)</a> (C++ implementation)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/reflectioninvocation.cpp#L2705">Actual check</a> that throws a <code class="language-plaintext highlighter-rouge">MemberAccessException</code></li>
</ul>
<h3 id="further-type-safety-checks">Further Type-Safety Checks</h3>
<p>These checks are just one example of what the runtime has to validate when creating types; there are many more things it has to deal with. For instance you <strong>can’t</strong>:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/reflectioninvocation.cpp#L550">instantiate an interface</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/dde63bc1aa39aabae77fb89aad583483965c523e/src/vm/jitinterface.cpp#L5827-L5830">create a Function Pointer type</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/dde63bc1aa39aabae77fb89aad583483965c523e/src/vm/jitinterface.cpp#L7385-L7395">load a type with invalid IL</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/dde63bc1aa39aabae77fb89aad583483965c523e/src/vm/jitinterface.cpp#L6251-L6254">box a type containing stack pointers</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/3437a820fdc94caa3d1775bcee802b056f3adce2/src/vm/methodtablebuilder.cpp#L12298-L12301">load a type if any of its generic argument types failed to load</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/3437a820fdc94caa3d1775bcee802b056f3adce2/src/vm/methodtablebuilder.cpp#L12477-L12488">create a subclass of an Array</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/classcompat.cpp#L2612-L2615">create virtual, static methods</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/classcompat.cpp#L2560-L2563">have methods in an enum</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/classcompat.cpp#L2408-L2411">have a class with a method name that is too long</a> (<a href="https://github.com/dotnet/coreclr/blob/cb8cfba3b61f18f81787322f0a2563d118c26b8a/src/inc/corhdr.h#L170">1024 characters if you’re wondering</a>)</li>
<li>and many, many more (for instance, search <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/classcompat.cpp">classcompat.cpp</a> for <code class="language-plaintext highlighter-rouge">BuildMethodTableThrowException</code> and <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/methodtablebuilder.cpp">methodtablebuilder.cpp</a> for <code class="language-plaintext highlighter-rouge">ThrowTypeLoadException</code>)</li>
</ul>
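<p>To give a feel for what a few of the rules in the list above amount to, here is a deliberately tiny C++ sketch. Everything in it is hypothetical: <code class="language-plaintext highlighter-rouge">TypeInfo</code>, <code class="language-plaintext highlighter-rouge">MethodInfo</code> and both functions are invented for illustration and bear no relation to the real metadata structures. It also returns an error message where the runtime would throw, and whether a name of exactly 1024 characters is accepted is an assumption on my part:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified metadata for a method and a type.
struct MethodInfo {
    std::string name;
    bool isStatic = false;
    bool isVirtual = false;
};

struct TypeInfo {
    std::string name;
    bool isEnum = false;
    bool isAbstract = false;
    bool isInterface = false;
    std::vector<MethodInfo> methods;
};

// The 1024 character limit mentioned in the list above.
const std::size_t kMaxMethodNameLength = 1024;

// Returns an empty string if the type passes these load-time checks,
// otherwise a description of the first rule it breaks.
inline std::string findLoadError(const TypeInfo& type) {
    if (type.isEnum && !type.methods.empty()) {
        return "enums cannot have methods";
    }
    for (const MethodInfo& method : type.methods) {
        if (method.isStatic && method.isVirtual) {
            return "method '" + method.name + "' cannot be both virtual and static";
        }
        if (method.name.size() > kMaxMethodNameLength) {
            return "method name is too long";
        }
    }
    return "";
}

// Returns an empty string if an instance may be created,
// otherwise the reason it may not.
inline std::string findInstantiationError(const TypeInfo& type) {
    if (type.isInterface) return "cannot create an instance of an interface";
    if (type.isAbstract) return "cannot create an instance of an abstract class";
    return "";
}
```

<p>Note the split between the two functions: an <code class="language-plaintext highlighter-rouge">abstract</code> class is perfectly valid to <em>load</em>, it just can’t be <em>instantiated</em>.</p>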
<hr />
<h2 id="loading-types-step-by-step">Loading Types ‘step-by-step’</h2>
<p>So we’ve seen that the CLR has to do multiple checks when it’s loading types, but why does it have to load them ‘step-by-step’?</p>
<p>Well, in a nutshell, it’s because of circular references and recursion, particularly when dealing with generic types. If we take the code below from section ‘2.1 Load Levels’ in <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/type-loader.md#21-load-levels">Type Loader Design (BotR)</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">A</span><span class="p"><</span><span class="n">T</span><span class="p">></span> <span class="p">:</span> <span class="n">C</span><span class="p"><</span><span class="n">B</span><span class="p"><</span><span class="n">T</span><span class="p">>></span>
<span class="p">{</span> <span class="p">}</span>
<span class="k">class</span> <span class="nc">B</span><span class="p"><</span><span class="n">T</span><span class="p">></span> <span class="p">:</span> <span class="n">C</span><span class="p"><</span><span class="n">A</span><span class="p"><</span><span class="n">T</span><span class="p">>></span>
<span class="p">{</span> <span class="p">}</span>
<span class="k">class</span> <span class="nc">C</span><span class="p"><</span><span class="n">T</span><span class="p">></span>
<span class="p">{</span> <span class="p">}</span>
</code></pre></div></div>
<p>These are valid types and <code class="language-plaintext highlighter-rouge">class A</code> depends on <code class="language-plaintext highlighter-rouge">class B</code> and vice versa. So we can’t load <code class="language-plaintext highlighter-rouge">A</code> until we know that <code class="language-plaintext highlighter-rouge">B</code> is valid, but we can’t load <code class="language-plaintext highlighter-rouge">B</code> until we’re sure that <code class="language-plaintext highlighter-rouge">A</code> is valid: a classic deadlock!!</p>
<p>How does the run-time get round this? Well, from the same BotR page:</p>
<blockquote>
<p>The loader initially creates the structure(s) representing the type and initializes them with data that can be obtained without loading other types. When this “no-dependencies” work is done, the structure(s) can be referred from other places, usually by sticking pointers to them into another structures. <strong>After that the loader progresses in incremental steps and fills the structure(s) with more and more information until it finally arrives at a fully loaded type.</strong> In the above example, the base types of <strong>A</strong> and <strong>B</strong> will be approximated by something that does not include the other type, and substituted by the real thing later.</p>
</blockquote>
<p>(there is also some <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/methodtable.cpp#L5547-L5560">more info here</a>)</p>
<p>So it loads types in stages, step-by-step, ensuring each dependent type has reached the same stage before continuing. These ‘Class Load’ stages are shown in the image below and explained in detail in this very helpful <a href="https://github.com/dotnet/coreclr/blob/5d2a54449d6b9d8fecb788e741654d7dbd992a87/src/vm/classloadlevel.h#L11-L70">source-code comment</a> (Yay for Open-Sourcing the CoreCLR!!)</p>
<p><a href="/images/2017/06/Class Load flow.png"><img src="/images/2017/06/Class Load flow.png" alt="Class Load flow" /></a></p>
<p>The different levels are handled in the <a href="https://github.com/dotnet/coreclr/blob/a3c193780b8e055678feb06b2499cf8e7b41810c/src/vm/clsload.cpp#L3488-L3567">ClassLoader::DoIncrementalLoad(..)</a> method, which contains the <code class="language-plaintext highlighter-rouge">switch</code> statement that deals with them all in turn.</p>
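<p>As a rough illustration of why the staged approach gets round the circular reference, here is a small C++ sketch. It is a toy model built on my own assumptions and not the CLR’s algorithm: <code class="language-plaintext highlighter-rouge">PhasedLoader</code> and its methods are hypothetical, the set of levels is shortened, and a ‘type’ is nothing more than a name plus the names it depends on. The key idea it borrows is that every type first gets a placeholder entry, and only then is each one raised a level at a time:</p>

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A shortened, hypothetical set of load levels (ordered from least to most complete).
enum class LoadLevel { Begin, ApproxParents, ExactParents, DependenciesLoaded, Loaded };

class PhasedLoader {
public:
    // Declares a type and the names of the types it depends on.
    void declare(const std::string& name, std::vector<std::string> dependsOn) {
        declarations_[name] = std::move(dependsOn);
    }

    // Loads 'name' and everything reachable from it, one level at a time.
    void load(const std::string& name) {
        // Step 1: give every reachable type a placeholder entry at 'Begin'.
        std::vector<std::string> pending;
        createPlaceholders(name, pending);

        // Step 2: raise all pending types to each level before starting the next one.
        LoadLevel previous = LoadLevel::Begin;
        for (LoadLevel level : {LoadLevel::ApproxParents, LoadLevel::ExactParents,
                                LoadLevel::DependenciesLoaded, LoadLevel::Loaded}) {
            for (const std::string& type : pending) {
                // Each dependency only needs to have reached the previous level,
                // which is why A -> B -> A is no longer a problem.
                for (const std::string& dependency : declarations_.at(type)) {
                    assert(levels_.at(dependency) >= previous);
                }
                levels_[type] = level;
            }
            previous = level;
        }
    }

    LoadLevel levelOf(const std::string& name) const { return levels_.at(name); }

private:
    void createPlaceholders(const std::string& name, std::vector<std::string>& pending) {
        // A type that already has an entry is skipped; this is what breaks the cycle.
        if (levels_.count(name) != 0) return;
        const std::vector<std::string>& dependencies = declarations_.at(name);
        levels_[name] = LoadLevel::Begin;
        pending.push_back(name);
        for (const std::string& dependency : dependencies) {
            createPlaceholders(dependency, pending);
        }
    }

    std::map<std::string, std::vector<std::string>> declarations_;
    std::map<std::string, LoadLevel> levels_;
};
```

<p>Declaring <code class="language-plaintext highlighter-rouge">A</code> as depending on <code class="language-plaintext highlighter-rouge">C</code> and <code class="language-plaintext highlighter-rouge">B</code>, and <code class="language-plaintext highlighter-rouge">B</code> as depending on <code class="language-plaintext highlighter-rouge">C</code> and <code class="language-plaintext highlighter-rouge">A</code>, then calling <code class="language-plaintext highlighter-rouge">load("A")</code> terminates with all three types fully loaded, whereas a naive ‘fully load my dependencies first’ approach would recurse forever.</p>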
<p>However this is part of a bigger process, which controls loading an entire file, also known as a <code class="language-plaintext highlighter-rouge">Module</code> or <code class="language-plaintext highlighter-rouge">Assembly</code> in .NET terminology. The entire process for that is handled by another <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L535-L627">dispatch loop (switch statement)</a>, that works with the <code class="language-plaintext highlighter-rouge">FileLoadLevel</code> enum (<a href="https://github.com/dotnet/coreclr/blob/5d2a54449d6b9d8fecb788e741654d7dbd992a87/src/vm/domainfile.h#L28-L53">definition</a>). So in reality the whole process for loading an <code class="language-plaintext highlighter-rouge">Assembly</code> looks like this (the loading of one or more Types happens as sub-steps once the <code class="language-plaintext highlighter-rouge">Module</code> has reached the <code class="language-plaintext highlighter-rouge">FILE_LOADED</code> stage):</p>
<ol>
<li><strong>FILE_LOAD_CREATE</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L60-L86">DomainFile ctor()</a></li>
<li><strong>FILE_LOAD_BEGIN</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1725-L1736">Begin()</a></li>
<li><strong>FILE_LOAD_FIND_NATIVE_IMAGE</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1739-L1912">FindNativeImage()</a></li>
<li><strong>FILE_LOAD_VERIFY_NATIVE_IMAGE_DEPENDENCIES</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L631-L767">VerifyNativeImageDependencies()</a></li>
<li><strong>FILE_LOAD_ALLOCATE</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1973-L2125">Allocate()</a></li>
<li><strong>FILE_LOAD_ADD_DEPENDENCIES</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1038-L1049">AddDependencies()</a></li>
<li><strong>FILE_LOAD_PRE_LOADLIBRARY</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L966-L975">PreLoadLibrary()</a></li>
<li><strong>FILE_LOAD_LOADLIBRARY</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L979-L991">LoadLibrary()</a></li>
<li><strong>FILE_LOAD_POST_LOADLIBRARY</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L993-L1036">PostLoadLibrary()</a></li>
<li><strong>FILE_LOAD_EAGER_FIXUPS</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1051-L1083">EagerFixups()</a></li>
<li><strong>FILE_LOAD_VTABLE_FIXUPS</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1085-L1089">VtableFixups()</a></li>
<li><strong>FILE_LOAD_DELIVER_EVENTS</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L2145-L2186">DeliverSyncEvents()</a></li>
<li><strong>FILE_LOADED</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1091-L1183">FinishLoad()</a>
<ol>
<li><strong>CLASS_LOAD_BEGIN</strong></li>
<li><strong>CLASS_LOAD_UNRESTOREDTYPEKEY</strong></li>
<li><strong>CLASS_LOAD_UNRESTORED</strong></li>
<li><strong>CLASS_LOAD_APPROXPARENTS</strong></li>
<li><strong>CLASS_LOAD_EXACTPARENTS</strong></li>
<li><strong>CLASS_DEPENDENCIES_LOADED</strong></li>
<li><strong>CLASS_LOADED</strong></li>
</ol>
</li>
<li><strong>FILE_LOAD_VERIFY_EXECUTION</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1185-L1214">VerifyExecution()</a></li>
<li><strong>FILE_ACTIVE</strong> - <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/domainfile.cpp#L1216-L1322">Activate()</a>
<ul>
<li>calls <a href="https://github.com/dotnet/coreclr/blob/13e7c4368da664a8b50228b1a5ef01a660fbb2dd/src/vm/methodtable.cpp#L3648-L3686">MethodTable::CheckRunClassInitThrowing()</a> and <a href="https://github.com/dotnet/coreclr/blob/f853a04ea9c80bf63419a07fe3fe2fefb23d25aa/src/vm/ceeload.cpp#L7882-L8197">Module::ExpandAll()</a> which trigger/run the <code class="language-plaintext highlighter-rouge">static</code> constructors of all the classes in the file/module</li>
</ul>
</li>
</ol>
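<p>The ‘dispatch loop plus switch statement’ structure behind the stages listed above can be sketched in C++ as follows. The enum values mirror the list, but the rest is an assumption made for illustration: <code class="language-plaintext highlighter-rouge">DomainFileSketch</code>, <code class="language-plaintext highlighter-rouge">ensureLoadLevel</code> and <code class="language-plaintext highlighter-rouge">doIncrementalLoad</code> are names chosen for this sketch, and each step simply records its name rather than doing any real work:</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// The stages from the list above, in the order they are reached.
enum FileLoadLevel {
    FILE_LOAD_CREATE,
    FILE_LOAD_BEGIN,
    FILE_LOAD_FIND_NATIVE_IMAGE,
    FILE_LOAD_VERIFY_NATIVE_IMAGE_DEPENDENCIES,
    FILE_LOAD_ALLOCATE,
    FILE_LOAD_ADD_DEPENDENCIES,
    FILE_LOAD_PRE_LOADLIBRARY,
    FILE_LOAD_LOADLIBRARY,
    FILE_LOAD_POST_LOADLIBRARY,
    FILE_LOAD_EAGER_FIXUPS,
    FILE_LOAD_VTABLE_FIXUPS,
    FILE_LOAD_DELIVER_EVENTS,
    FILE_LOADED,
    FILE_LOAD_VERIFY_EXECUTION,
    FILE_ACTIVE
};

class DomainFileSketch {
public:
    // Advances one stage at a time until 'target' is reached or a stage fails.
    bool ensureLoadLevel(FileLoadLevel target) {
        while (level_ < target) {
            const FileLoadLevel next = static_cast<FileLoadLevel>(level_ + 1);
            if (!doIncrementalLoad(next)) return false;
            level_ = next;
        }
        return true;
    }

    FileLoadLevel level() const { return level_; }
    const std::vector<std::string>& trace() const { return trace_; }

private:
    // Dispatches to the work for a single stage.
    bool doIncrementalLoad(FileLoadLevel level) {
        switch (level) {
            case FILE_LOAD_CREATE:            return true;  // already done by the constructor
            case FILE_LOAD_BEGIN:             return step("Begin");
            case FILE_LOAD_FIND_NATIVE_IMAGE: return step("FindNativeImage");
            case FILE_LOAD_VERIFY_NATIVE_IMAGE_DEPENDENCIES:
                                              return step("VerifyNativeImageDependencies");
            case FILE_LOAD_ALLOCATE:          return step("Allocate");
            case FILE_LOAD_ADD_DEPENDENCIES:  return step("AddDependencies");
            case FILE_LOAD_PRE_LOADLIBRARY:   return step("PreLoadLibrary");
            case FILE_LOAD_LOADLIBRARY:       return step("LoadLibrary");
            case FILE_LOAD_POST_LOADLIBRARY:  return step("PostLoadLibrary");
            case FILE_LOAD_EAGER_FIXUPS:      return step("EagerFixups");
            case FILE_LOAD_VTABLE_FIXUPS:     return step("VtableFixups");
            case FILE_LOAD_DELIVER_EVENTS:    return step("DeliverSyncEvents");
            case FILE_LOADED:                 return step("FinishLoad");
            case FILE_LOAD_VERIFY_EXECUTION:  return step("VerifyExecution");
            case FILE_ACTIVE:                 return step("Activate");
        }
        return false;  // unknown level
    }

    // Stand-in for the real work: just remember which step ran.
    bool step(const char* name) {
        trace_.push_back(name);
        return true;
    }

    FileLoadLevel level_ = FILE_LOAD_CREATE;
    std::vector<std::string> trace_;
};
```

<p>Because the caller asks for a <em>target</em> level, the same loop serves both ‘load this file’ (<code class="language-plaintext highlighter-rouge">FILE_LOADED</code>) and ‘make it runnable’ (<code class="language-plaintext highlighter-rouge">FILE_ACTIVE</code>), and a file that is already far enough along does no extra work.</p>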
<p>We can see this in action if we <a href="https://github.com/dotnet/coreclr/tree/master/Documentation/building">build a Debug version of the CoreCLR</a> and enable the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md">relevant configuration knobs</a>. For a simple ‘Hello World’ program we get the log output shown below, where <code class="language-plaintext highlighter-rouge">LOADER:</code> messages correspond to <code class="language-plaintext highlighter-rouge">FILE_LOAD_XXX</code> stages and <code class="language-plaintext highlighter-rouge">PHASEDLOAD:</code> messages indicate which <code class="language-plaintext highlighter-rouge">CLASS_LOAD_XXX</code> step we are on.</p>
<p>You can also see some of the other events that happen at the same time. These include creation of <code class="language-plaintext highlighter-rouge">static</code> variables (<code class="language-plaintext highlighter-rouge">STATICS:</code>), thread-statics (<code class="language-plaintext highlighter-rouge">THREAD STATICS:</code>) and <code class="language-plaintext highlighter-rouge">PreStubWorker</code>, which indicates methods being prepared for the JITter.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-------------------------------------------------------------------------------------------------------
This is NOT the full output, it's only the parts that reference 'Program.exe' and its modules/classes
-------------------------------------------------------------------------------------------------------
PEImage: Opened HMODULE C:\coreclr\bin\Product\Windows_NT.x64.Debug\Program.exe
StoreFile: Add cached entry (000007FE65174540) with PEFile 000000000040D6E0
Assembly C:\coreclr\bin\Product\Windows_NT.x64.Debug\Program.exe: bits=0x2
LOADER: 439e30:***Program* >>>Load initiated, LOADED/LOADED
LOADER: 0000000000439E30:***Program* loading at level BEGIN
LOADER: 0000000000439E30:***Program* loading at level FIND_NATIVE_IMAGE
LOADER: 0000000000439E30:***Program* loading at level VERIFY_NATIVE_IMAGE_DEPENDENCIES
LOADER: 0000000000439E30:***Program* loading at level ALLOCATE
STATICS: Allocating statics for module Program
Loaded pModule: "C:\coreclr\bin\Product\Windows_NT.x64.Debug\Program.exe".
Module Program: bits=0x2
STATICS: Allocating 72 bytes for precomputed statics in module C:\coreclr\bin\Product\Windows_NT.x64.Debug\Program.exe in LoaderAllocator 000000000043AA18
StoreFile (StoreAssembly): Add cached entry (000007FE65174F28) with PEFile 000000000040D6E0
Completed Load Level ALLOCATE for DomainFile 000000000040D8C0 in AD 1 - success = 1
LOADER: 0000000000439E30:***Program* loading at level ADD_DEPENDENCIES
Completed Load Level ADD_DEPENDENCIES for DomainFile 000000000040D8C0 in AD 1 - success = 1
LOADER: 0000000000439E30:***Program* loading at level PRE_LOADLIBRARY
LOADER: 0000000000439E30:***Program* loading at level LOADLIBRARY
LOADER: 0000000000439E30:***Program* loading at level POST_LOADLIBRARY
LOADER: 0000000000439E30:***Program* loading at level EAGER_FIXUPS
LOADER: 0000000000439E30:***Program* loading at level VTABLE FIXUPS
LOADER: 0000000000439E30:***Program* loading at level DELIVER_EVENTS
DRCT::IsReady - wait(0x100)=258, GetLastError() = 42424
DRCT::IsReady - wait(0x100)=258, GetLastError() = 42424
D::LA: Load Assembly Asy:0x000000000040D8C0 AD:0x0000000000439E30 which:C:\coreclr\bin\Product\Windows_NT.x64.Debug\Program.exe
Completed Load Level DELIVER_EVENTS for DomainFile 000000000040D8C0 in AD 1 - success = 1
LOADER: 0000000000439E30:***Program* loading at level LOADED
Completed Load Level LOADED for DomainFile 000000000040D8C0 in AD 1 - success = 1
LOADER: 439e30:***Program* <<<Load completed, LOADED
In PreStubWorker for System.Environment::SetCommandLineArgs
Prestubworker: method 000007FEC2AE1160M
DoRunClassInit: Request to init 000007FEC3BACCF8T in appdomain 0000000000439E30
RunClassInit: Calling class contructor for type 000007FEC3BACCF8T
In PreStubWorker for System.Environment::.cctor
Prestubworker: method 000007FEC2AE1B10M
DoRunClassInit: Request to init 000007FEC3BACCF8T in appdomain 0000000000439E30
DoRunClassInit: returning SUCCESS for init 000007FEC3BACCF8T in appdomain 0000000000439E30
RunClassInit: Returned Successfully from class contructor for type 000007FEC3BACCF8T
DoRunClassInit: returning SUCCESS for init 000007FEC3BACCF8T in appdomain 0000000000439E30
PHASEDLOAD: LoadTypeHandleForTypeKey for type ConsoleApplication.Program to level LOADED
PHASEDLOAD: table contains:
LoadTypeHandle: Loading Class from Module 000007FE65174718 token 2000002
PHASEDLOAD: Creating loading entry for type ConsoleApplication.Program
PHASEDLOAD: About to do incremental load of type ConsoleApplication.Program (0000000000000000) from level BEGIN
Looking up System.Object by name.
Loading class "ConsoleApplication.Program" from module "C:\coreclr\bin\Product\Windows_NT.x64.Debug\Program.exe" in domain 0x0000000000439E30
SD: MT::MethodIterator created for System.Object.
EEC::IMD: pNewMD:0x65175178 for tok:0x6000001 (ConsoleApplication.Program::.cctor)
EEC::IMD: pNewMD:0x651751a8 for tok:0x6000002 (ConsoleApplication.Program::.ctor)
EEC::IMD: pNewMD:0x651751d8 for tok:0x6000003 (ConsoleApplication.Program::Main)
STATICS: Placing statics for ConsoleApplication.Program
STATICS: Field placed at non GC offset 0x38
Offset of staticCounter1: 56
STATICS: Field placed at non GC offset 0x40
Offset of staticCounter2: 64
STATICS: Static field bytes needed (0 is normal for non dynamic case)0
STATICS: Placing ThreadStatics for ConsoleApplication.Program
THREAD STATICS: Field placed at non GC offset 0x20
Offset of threadStaticCounter1: 32
THREAD STATICS: Field placed at non GC offset 0x28
Offset of threadStaticCounter2: 40
STATICS: ThreadStatic field bytes needed (0 is normal for non dynamic case)0
CLASSLOADER: AppDomainAgileAttribute for ConsoleApplication.Program is 0
MethodTableBuilder: finished method table for module 000007FE65174718 token 2000002 = 000007FE65175230T
PHASEDLOAD: About to do incremental load of type ConsoleApplication.Program (000007FE65175230) from level APPROXPARENTS
Notify: 000007FE65175230 ConsoleApplication.Program
Successfully loaded class ConsoleApplication.Program
PHASEDLOAD: Completed full dependency load of type (000007FE65175230)+ConsoleApplication.Program
PHASEDLOAD: Completed full dependency load of type (000007FE65175230)+ConsoleApplication.Program
LOADER: 439e30:***Program* >>>Load initiated, ACTIVE/ACTIVE
LOADER: 0000000000439E30:***Program* loading at level VERIFY_EXECUTION
LOADER: 0000000000439E30:***Program* loading at level ACTIVE
Completed Load Level ACTIVE for DomainFile 000000000040D8C0 in AD 1 - success = 1
LOADER: 439e30:***Program* <<<Load completed, ACTIVE
In PreStubWorker for ConsoleApplication.Program::Main
Prestubworker: method 000007FE651751D8M
In PreStubWorker, calling MakeJitWorker
CallCompileMethodWithSEHWrapper called...
D::gV: cVars=0, extendOthers=1
Looking up System.Console by name.
SD: MT::MethodIterator created for System.Console.
JitComplete completed successfully
Got through CallCompile MethodWithSEHWrapper
MethodDesc::MakeJitWorker finished. Stub is 000007fe`652d0480
DoRunClassInit: Request to init 000007FE65175230T in appdomain 0000000000439E30
RunClassInit: Calling class contructor for type 000007FE65175230T
In PreStubWorker for ConsoleApplication.Program::.cctor
Prestubworker: method 000007FE65175178M
In PreStubWorker, calling MakeJitWorker
CallCompileMethodWithSEHWrapper called...
D::gV: cVars=0, extendOthers=1
JitComplete completed successfully
Got through CallCompile MethodWithSEHWrapper
MethodDesc::MakeJitWorker finished. Stub is 000007fe`652d04c0
</code></pre></div></div>
<hr />
<p>So there you have it, the CLR loads your classes/Types <strong>carefully</strong>, <strong>cautiously</strong> and <strong>step-by-step</strong>!!</p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=14564962">HackerNews</a> and <a href="https://www.reddit.com/r/programming/comments/6hfgp7/how_the_net_runtime_loads_a_type/">/r/programming</a></p>
<hr />
<p>As always, here are some more links if you’d like to find out more:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/type-loader.md">Type Loader Design</a> (BotR)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/type-system.md">Type System Overview</a> (BotR)</li>
<li><a href="https://blogs.msdn.microsoft.com/davidnotario/2005/02/08/jit-compiler-and-type-constructors-cctors/">JIT compiler and type constructors (.cctors)</a> (i.e. ‘When do class constructors (.cctor) get run’?)</li>
<li><a href="https://blogs.msdn.microsoft.com/ericlippert/2008/02/18/why-do-initializers-run-in-the-opposite-order-as-constructors-part-two/">Why Do Initializers Run In The Opposite Order As Constructors? Part Two</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/969cea6a2ffff6c53a615d2fd398f9a7b8c73290">Disallow statics of spans and class instance members of span (PR)</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/8516">Span: Add tests to verify type loader checks for ref-like types #8516</a></li>
<li><a href="https://weblog.west-wind.com/posts/2012/Nov/03/Back-to-Basics-When-does-a-NET-Assembly-Dependency-get-loaded">Back to Basics: When does a .NET Assembly Dependency get loaded</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2017/06/15/How-the-.NET-Rutime-loads-a-Type/">How the .NET Runtime loads a Type</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Lowering in the C# Compiler (and what happens when you misuse it)2017-05-25T00:00:00+00:00http://www.mattwarren.org/2017/05/25/Lowering-in-the-C#-Compiler
<p>Turns out that what I’d always thought of as “<em>Compiler magic</em>” or “<em>Syntactic sugar</em>” is actually known by the technical term ‘<em>Lowering</em>’ and the C# compiler (a.k.a <a href="https://github.com/dotnet/roslyn">Roslyn</a>) uses it extensively.</p>
<p>But what is it? Well this quote from <a href="http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488?pgno=2">So You Want To Write Your Own Language?</a> gives us some idea:</p>
<blockquote>
<p><strong>Lowering</strong>
One semantic technique that is obvious in hindsight (but took Andrei Alexandrescu to point out to me) is called “lowering.” It consists of, internally, rewriting more complex semantic constructs in terms of simpler ones. For example, while loops and foreach loops can be rewritten in terms of for loops. Then, the rest of the code only has to deal with for loops. This turned out to uncover a couple of latent bugs in how while loops were implemented in D, and so was a nice win. It’s also used to rewrite scope guard statements in terms of try-finally statements, etc. Every case where this can be found in the semantic processing will be win for the implementation.</p>
<p>– by <a href="https://en.wikipedia.org/wiki/Walter_Bright">Walter Bright</a> (author of the D programming language)</p>
</blockquote>
<p>But if you’re still not sure what it means, have a read of Eric Lippert’s post on the subject, <a href="https://ericlippert.com/2014/04/28/lowering-in-language-design-part-one/">Lowering in language design</a>, which contains this quote:</p>
<blockquote>
<p>A common technique along the way though is to have the compiler “lower” from high-level language features to low-level language features in the <em>same language</em>.</p>
</blockquote>
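<p>To make the idea concrete, here is a minimal hand-written sketch of ‘lowering’ a <code class="language-plaintext highlighter-rouge">foreach</code> over an array into a plain <code class="language-plaintext highlighter-rouge">for</code> loop. The class and method names (<code class="language-plaintext highlighter-rouge">LoweringSketch</code>, <code class="language-plaintext highlighter-rouge">SumWithForeach</code>, <code class="language-plaintext highlighter-rouge">SumWithFor</code>) are made up for illustration and the second method is my own approximation of the rewritten form, not actual compiler output:</p>

```csharp
using System;

public static class LoweringSketch
{
    // High-level form: what we write by hand.
    public static int SumWithForeach(int[] values)
    {
        var total = 0;
        foreach (var value in values)
        {
            total += value;
        }
        return total;
    }

    // Hand-lowered form: the same loop expressed with the simpler 'for' construct.
    // This illustrates the idea only; it is not real compiler output.
    public static int SumWithFor(int[] values)
    {
        var total = 0;
        int[] array = values;                   // evaluate the collection expression once
        for (int i = 0; i < array.Length; i++)
        {
            int value = array[i];               // a fresh loop variable per iteration
            total += value;
        }
        return total;
    }

    public static void Main()
    {
        var data = new[] { 1, 2, 3, 4, 5 };
        Console.WriteLine(SumWithForeach(data)); // 15
        Console.WriteLine(SumWithFor(data));     // 15
    }
}
```

<p>Both methods are still valid C#, which is the point of the quote above: the ‘lowered’ version uses fewer language features, so later compiler stages only need to understand the simpler construct.</p>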
<hr />
<p>As an aside, if you like reading about the <strong>Roslyn compiler source</strong> you may like these other posts that I’ve written:</p>
<ul>
<li><a href="/2016/10/26/How-does-the-fixed-keyword-work/?recommended=1">How does the ‘fixed’ keyword work?</a></li>
<li><a href="/2014/06/05/roslyn-code-base-performance-lessons-part-1/?recommended=1">Roslyn code base - performance lessons (part 1)</a></li>
<li><a href="/2014/06/10/roslyn-code-base-performance-lessons-part-2/?recommended=1">Roslyn code base - performance lessons (part 2)</a></li>
</ul>
<hr />
<h2 id="what-does-lowering-look-like">What does ‘Lowering’ look like?</h2>
<p>The C# compiler has used lowering for a while. One of the oldest and most recognised examples is when this code:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System.Collections.Generic</span><span class="p">;</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">C</span> <span class="p">{</span>
<span class="k">public</span> <span class="n">IEnumerable</span><span class="p"><</span><span class="kt">int</span><span class="p">></span> <span class="nf">M</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="k">value</span> <span class="k">in</span> <span class="k">new</span> <span class="p">[]</span> <span class="p">{</span> <span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span> <span class="m">5</span> <span class="p">})</span>
<span class="p">{</span>
<span class="k">yield</span> <span class="k">return</span> <span class="k">value</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>is turned into this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">C</span>
<span class="p">{</span>
<span class="p">[</span><span class="n">CompilerGenerated</span><span class="p">]</span>
<span class="k">private</span> <span class="k">sealed</span> <span class="k">class</span> <span class="err"><</span><span class="nc">M</span><span class="p">></span><span class="n">d__0</span> <span class="p">:</span> <span class="n">IEnumerable</span><span class="p"><</span><span class="kt">int</span><span class="p">>,</span> <span class="n">IEnumerable</span><span class="p">,</span> <span class="n">IEnumerator</span><span class="p"><</span><span class="kt">int</span><span class="p">>,</span> <span class="n">IDisposable</span><span class="p">,</span> <span class="n">IEnumerator</span>
<span class="p">{</span>
<span class="k">private</span> <span class="kt">int</span> <span class="p"><></span><span class="m">1</span><span class="n">__state</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="p"><></span><span class="m">2</span><span class="n">__current</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="p"><></span><span class="n">l__initialThreadId</span><span class="p">;</span>
<span class="k">public</span> <span class="n">C</span> <span class="p"><></span><span class="m">4</span><span class="n">__this</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span><span class="p">[]</span> <span class="p"><></span><span class="n">s__1</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="p"><></span><span class="n">s__2</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="p"><</span><span class="k">value</span><span class="p">></span><span class="m">5</span><span class="n">__3</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">IEnumerator</span><span class="p"><</span><span class="kt">int</span><span class="p">>.</span><span class="n">Current</span>
<span class="p">{</span>
<span class="p">[</span><span class="n">DebuggerHidden</span><span class="p">]</span>
<span class="k">get</span>
<span class="p">{</span>
<span class="k">return</span> <span class="k">this</span><span class="p">.<></span><span class="m">2</span><span class="n">__current</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">object</span> <span class="n">IEnumerator</span><span class="p">.</span><span class="n">Current</span>
<span class="p">{</span>
<span class="p">[</span><span class="n">DebuggerHidden</span><span class="p">]</span>
<span class="k">get</span>
<span class="p">{</span>
<span class="k">return</span> <span class="k">this</span><span class="p">.<></span><span class="m">2</span><span class="n">__current</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">DebuggerHidden</span><span class="p">]</span>
<span class="k">public</span> <span class="p"><</span><span class="n">M</span><span class="p">></span><span class="nf">d__0</span><span class="p">(</span><span class="kt">int</span> <span class="p"><></span><span class="m">1</span><span class="n">__state</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">this</span><span class="p">.<></span><span class="m">1</span><span class="n">__state</span> <span class="p">=</span> <span class="p"><></span><span class="m">1</span><span class="n">__state</span><span class="p">;</span>
<span class="k">this</span><span class="p">.<></span><span class="n">l__initialThreadId</span> <span class="p">=</span> <span class="n">Environment</span><span class="p">.</span><span class="n">CurrentManagedThreadId</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">DebuggerHidden</span><span class="p">]</span>
<span class="k">void</span> <span class="n">IDisposable</span><span class="p">.</span><span class="nf">Dispose</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="n">IEnumerator</span><span class="p">.</span><span class="nf">MoveNext</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">num</span> <span class="p">=</span> <span class="k">this</span><span class="p">.<></span><span class="m">1</span><span class="n">__state</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">num</span> <span class="p">!=</span> <span class="m">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">num</span> <span class="p">!=</span> <span class="m">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">this</span><span class="p">.<></span><span class="m">1</span><span class="n">__state</span> <span class="p">=</span> <span class="p">-</span><span class="m">1</span><span class="p">;</span>
<span class="k">this</span><span class="p">.<></span><span class="n">s__2</span><span class="p">++;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="k">this</span><span class="p">.<></span><span class="m">1</span><span class="n">__state</span> <span class="p">=</span> <span class="p">-</span><span class="m">1</span><span class="p">;</span>
<span class="k">this</span><span class="p">.<></span><span class="n">s__1</span> <span class="p">=</span> <span class="k">new</span> <span class="kt">int</span><span class="p">[]</span> <span class="p">{</span> <span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span> <span class="m">5</span> <span class="p">};</span>
<span class="k">this</span><span class="p">.<></span><span class="n">s__2</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.<></span><span class="n">s__2</span> <span class="p">>=</span> <span class="k">this</span><span class="p">.<></span><span class="n">s__1</span><span class="p">.</span><span class="n">Length</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">this</span><span class="p">.<></span><span class="n">s__1</span> <span class="p">=</span> <span class="k">null</span><span class="p">;</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">this</span><span class="p">.<</span><span class="k">value</span><span class="p">></span><span class="m">5</span><span class="n">__3</span> <span class="p">=</span> <span class="k">this</span><span class="p">.<></span><span class="n">s__1</span><span class="p">[</span><span class="k">this</span><span class="p">.<></span><span class="n">s__2</span><span class="p">];</span>
<span class="k">this</span><span class="p">.<></span><span class="m">2</span><span class="n">__current</span> <span class="p">=</span> <span class="k">this</span><span class="p">.<</span><span class="k">value</span><span class="p">></span><span class="m">5</span><span class="n">__3</span><span class="p">;</span>
<span class="k">this</span><span class="p">.<></span><span class="m">1</span><span class="n">__state</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span>
<span class="k">return</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">DebuggerHidden</span><span class="p">]</span>
<span class="k">void</span> <span class="n">IEnumerator</span><span class="p">.</span><span class="nf">Reset</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nf">NotSupportedException</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">DebuggerHidden</span><span class="p">]</span>
<span class="n">IEnumerator</span><span class="p"><</span><span class="kt">int</span><span class="p">></span> <span class="n">IEnumerable</span><span class="p"><</span><span class="kt">int</span><span class="p">>.</span><span class="nf">GetEnumerator</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">C</span><span class="p">.<</span><span class="n">M</span><span class="p">></span><span class="n">d__0</span> <span class="p"><</span><span class="n">M</span><span class="p">></span><span class="n">d__</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.<></span><span class="m">1</span><span class="n">__state</span> <span class="p">==</span> <span class="p">-</span><span class="m">2</span> <span class="p">&&</span> <span class="k">this</span><span class="p">.<></span><span class="n">l__initialThreadId</span> <span class="p">==</span> <span class="n">Environment</span><span class="p">.</span><span class="n">CurrentManagedThreadId</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">this</span><span class="p">.<></span><span class="m">1</span><span class="n">__state</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="p"><</span><span class="n">M</span><span class="p">></span><span class="n">d__</span> <span class="p">=</span> <span class="k">this</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="p"><</span><span class="n">M</span><span class="p">></span><span class="n">d__</span> <span class="p">=</span> <span class="k">new</span> <span class="n">C</span><span class="p">.<</span><span class="n">M</span><span class="p">></span><span class="nf">d__0</span><span class="p">(</span><span class="m">0</span><span class="p">);</span>
<span class="p"><</span><span class="n">M</span><span class="p">></span><span class="n">d__</span><span class="p">.<></span><span class="m">4</span><span class="n">__this</span> <span class="p">=</span> <span class="k">this</span><span class="p">.<></span><span class="m">4</span><span class="n">__this</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p"><</span><span class="n">M</span><span class="p">></span><span class="n">d__</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">DebuggerHidden</span><span class="p">]</span>
<span class="n">IEnumerator</span> <span class="n">IEnumerable</span><span class="p">.</span><span class="nf">GetEnumerator</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="k">this</span><span class="p">.</span><span class="n">System</span><span class="p">.</span><span class="n">Collections</span><span class="p">.</span><span class="n">Generic</span><span class="p">.</span><span class="n">IEnumerable</span><span class="p"><</span><span class="n">System</span><span class="p">.</span><span class="n">Int32</span><span class="p">>.</span><span class="nf">GetEnumerator</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">[</span><span class="nf">IteratorStateMachine</span><span class="p">(</span><span class="k">typeof</span><span class="p">(</span><span class="n">C</span><span class="p">.<</span><span class="n">M</span><span class="p">></span><span class="n">d__0</span><span class="p">))]</span>
<span class="k">public</span> <span class="n">IEnumerable</span><span class="p"><</span><span class="kt">int</span><span class="p">></span> <span class="nf">M</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">C</span><span class="p">.<</span><span class="n">M</span><span class="p">></span><span class="n">d__0</span> <span class="n">expr_07</span> <span class="p">=</span> <span class="k">new</span> <span class="n">C</span><span class="p">.<</span><span class="n">M</span><span class="p">></span><span class="nf">d__0</span><span class="p">(-</span><span class="m">2</span><span class="p">);</span>
<span class="n">expr_07</span><span class="p">.<></span><span class="m">4</span><span class="n">__this</span> <span class="p">=</span> <span class="k">this</span><span class="p">;</span>
<span class="k">return</span> <span class="n">expr_07</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Yikes, I’m glad we don’t have to write that code ourselves!! There’s an entire state-machine in there, built to allow our original code to be halted/resumed each time round the loop (at the ‘yield’ statement).</p>
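<p>One way to observe the halt/resume behaviour from the outside is to drive an iterator by hand. The following is a small sketch (the <code class="language-plaintext highlighter-rouge">IteratorDemo</code> class and its <code class="language-plaintext highlighter-rouge">Log</code> list are hypothetical names added purely for illustration) showing that none of the method body runs until <code class="language-plaintext highlighter-rouge">MoveNext()</code> is called, and that each call only runs as far as the next <code class="language-plaintext highlighter-rouge">yield</code>:</p>

```csharp
using System;
using System.Collections.Generic;

public static class IteratorDemo
{
    // Records each time the iterator body actually executes.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        foreach (var value in new[] { 1, 2, 3 })
        {
            Log.Add("yielding " + value);
            yield return value;
        }
    }

    public static void Main()
    {
        IEnumerable<int> sequence = Numbers();    // only creates the state machine object
        Console.WriteLine(Log.Count);             // 0

        using (IEnumerator<int> e = sequence.GetEnumerator())
        {
            e.MoveNext();                         // runs the body up to the first 'yield'
            Console.WriteLine(e.Current);         // 1
            Console.WriteLine(Log.Count);         // 1

            e.MoveNext();                         // resumes just after the first 'yield'
            Console.WriteLine(e.Current);         // 2
            Console.WriteLine(Log.Count);         // 2
        }
    }
}
```

<p>The state field in the generated class is what remembers where to resume, which is why calling <code class="language-plaintext highlighter-rouge">Numbers()</code> on its own does no work.</p>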
<hr />
<h2 id="the-c-compiler-and-lowering">The C# compiler and ‘Lowering’</h2>
<p>But it turns out that the Roslyn compiler does <em>a lot</em> more ‘lowering’ than you might think. If you take a look at the code under <a href="https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Lowering">‘/src/Compilers/CSharp/Portable/Lowering’</a> (VB.NET <a href="https://github.com/dotnet/roslyn/tree/master/src/Compilers/VisualBasic/Portable/Lowering">equivalent here</a>), you see the following folders:</p>
<ul>
<li><a href="https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Lowering/AsyncRewriter">AsyncRewriter</a></li>
<li><a href="https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Lowering/IteratorRewriter">IteratorRewriter</a></li>
<li><a href="https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Lowering/LambdaRewriter">LambdaRewriter</a></li>
<li><a href="https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Lowering/StateMachineRewriter">StateMachineRewriter</a></li>
</ul>
<p>These correspond to some C# language features you might be familiar with, such as ‘lambdas’, i.e. <code class="language-plaintext highlighter-rouge">x => x.Name > 5</code>, ‘iterators’ used by <code class="language-plaintext highlighter-rouge">yield</code> (above) and the <code class="language-plaintext highlighter-rouge">async</code> keyword.</p>
<p>However, if we look a bit deeper, under the <a href="https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter">‘LocalRewriter’ folder</a> we can see lots more scenarios that we might never have considered ‘lowering’, such as:</p>
<ul>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_DelegateCreationExpression.cs">Delegate creation</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_Event.cs">Events</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_FixedStatement.cs">‘fixed’ keyword</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_ForEachStatement.cs">ForEach loops</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_IsOperator.cs">‘Is’ operator</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_LockStatement.cs">‘lock’ statement</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_NullCoalescingOperator.cs">‘??’ a.k.a. the null-coalescing operator</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StackAlloc.cs">‘stackalloc’ keyword</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs">‘String.Concat()’</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_SwitchStatement.cs">‘switch’ statement</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_ThrowStatement.cs">‘throw’ expression</a></li>
<li><a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_UsingStatement.cs">‘using’ statement</a></li>
<li>even a <a href="https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_WhileStatement.cs">‘while’ loop</a></li>
</ul>
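<p>To give a feel for what a few of these rewrites amount to, here is a hand-written sketch of roughly equivalent code for a <code class="language-plaintext highlighter-rouge">using</code> statement, a <code class="language-plaintext highlighter-rouge">lock</code> statement and the <code class="language-plaintext highlighter-rouge">??</code> operator. The shapes follow the expansions described in the C# language documentation, but the class and member names (<code class="language-plaintext highlighter-rouge">HandLowered</code>, <code class="language-plaintext highlighter-rouge">Gate</code>, <code class="language-plaintext highlighter-rouge">FirstLine</code> and so on) are invented for this example and the exact output of Roslyn may differ:</p>

```csharp
using System;
using System.IO;
using System.Threading;

public static class HandLowered
{
    private static readonly object Gate = new object();
    private static int counter;

    // Roughly: using (var reader = new StringReader(text)) { return reader.ReadLine(); }
    public static string FirstLine(string text)
    {
        StringReader reader = new StringReader(text);
        try
        {
            return reader.ReadLine();
        }
        finally
        {
            if (reader != null)
            {
                ((IDisposable)reader).Dispose();
            }
        }
    }

    // Roughly: lock (Gate) { return ++counter; }
    public static int Increment()
    {
        object gate = Gate;
        bool lockTaken = false;
        try
        {
            Monitor.Enter(gate, ref lockTaken);
            return ++counter;
        }
        finally
        {
            if (lockTaken)
            {
                Monitor.Exit(gate);
            }
        }
    }

    // Roughly: return name ?? "unknown";
    public static string NameOrDefault(string name)
    {
        string temp = name;                     // evaluate the left-hand side only once
        return temp != null ? temp : "unknown";
    }

    public static void Main()
    {
        Console.WriteLine(FirstLine("hello\nworld")); // hello
        Console.WriteLine(Increment());               // 1
        Console.WriteLine(NameOrDefault(null));       // unknown
    }
}
```

<p>In each case a single keyword or operator hides a <code class="language-plaintext highlighter-rouge">try</code>/<code class="language-plaintext highlighter-rouge">finally</code> block or a temporary variable that we would otherwise have to remember to write ourselves.</p>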
<p>So a big thank-you is due to all the past and present C# language developers and designers, who did all this work for us. Imagine if C# didn’t have all these high-level features: we’d be stuck writing them by hand.</p>
<p>It would be like writing <strong>Java</strong> :-)</p>
<hr />
<h2 id="what-happens-when-you-misuse-it">What happens when you misuse it</h2>
<p>But of course the real fun part is ‘misusing’ or outright ‘abusing’ the compiler. So I set up a little <a href="https://twitter.com/matthewwarren/status/867753577346985984">Twitter competition</a> to see just how much ‘lowering’ we could get the compiler to do for us (i.e. the highest ratio of ‘output’ lines of code to ‘input’ lines).</p>
<p>It had the following rules (see <a href="https://gist.github.com/mattwarren/3c7cfaa245effc0a318b87f1ee5dc153">this gist</a> for more info):</p>
<ol>
<li>You can have as many lines as you want within method <code class="language-plaintext highlighter-rouge">M()</code></li>
<li>No single line can be longer than 100 chars</li>
<li>To get your score, divide the ‘# of expanded lines’ by the ‘# of original line(s)’
<ol>
<li>Based on the default <strong>output</strong> formatting of <a href="https://sharplab.io/#b:master/f:r/">https://sharplab.io</a>, no re-formatting allowed!!</li>
<li>But you can format the <strong>input</strong> however you want, i.e. make use of the full 100 chars</li>
</ol>
</li>
<li>Must compile with no warnings on <a href="https://sharplab.io/#b:master/f:r/">https://sharplab.io</a> (allows C# 7 features)
<ol>
<li>But doesn’t have to do anything sensible when run</li>
</ol>
</li>
<li>You cannot modify the code that is already there, i.e. <code class="language-plaintext highlighter-rouge">public class C {}</code> and <code class="language-plaintext highlighter-rouge">public void M()</code>
<ol>
<li>Cannot just add <code class="language-plaintext highlighter-rouge">async</code> to <code class="language-plaintext highlighter-rouge">public void M()</code>, that’s too easy!!</li>
</ol>
</li>
<li>You can add new <code class="language-plaintext highlighter-rouge">using ...</code> declarations, these do not count towards the line count</li>
</ol>
<p>For instance with the following code (interactive version available on <a href="https://sharplab.io/#b:master/f:r/K4Zwlgdg5gBAygTxAFwKYFsDcBYAUAB2ACMAbMAYxnJIEMQQYBhGAbzxg5kNIpgDcA9mAAmMALIAKAJSt2neQDFgEcgB4UAJ0hQAfDDQoYAXhjTjegESkaACws5c8gL54nQA">sharplab.io</a>):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System</span><span class="p">;</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">C</span> <span class="p">{</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">M</span><span class="p">()</span> <span class="p">{</span>
<span class="n">Func</span><span class="p"><</span><span class="kt">string</span><span class="p">></span> <span class="n">test</span> <span class="p">=</span> <span class="p">()</span> <span class="p">=></span> <span class="s">"blah"</span><span class="p">?.</span><span class="nf">ToString</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This counts as <strong>1</strong> line of original code (only code inside method <code class="language-plaintext highlighter-rouge">M()</code> is counted).</p>
<p>This expands to <strong>23</strong> lines (again, only lines of code inside the braces (<code class="language-plaintext highlighter-rouge">{</code>, <code class="language-plaintext highlighter-rouge">}</code>) of <code class="language-plaintext highlighter-rouge">class C</code> are counted).</p>
<p>Giving a <strong>total score</strong> of <strong>23</strong> (23 / 1)</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">....</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">C</span>
<span class="p">{</span>
<span class="p">[</span><span class="n">CompilerGenerated</span><span class="p">]</span>
<span class="p">[</span><span class="n">Serializable</span><span class="p">]</span>
<span class="k">private</span> <span class="k">sealed</span> <span class="k">class</span> <span class="err"><></span><span class="nc">c</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">readonly</span> <span class="n">C</span><span class="p">.<></span><span class="n">c</span> <span class="p"><></span><span class="m">9</span><span class="p">;</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">Func</span><span class="p"><</span><span class="kt">string</span><span class="p">></span> <span class="p"><></span><span class="m">9</span><span class="n">__0_0</span><span class="p">;</span>
<span class="k">static</span> <span class="p"><></span><span class="nf">c</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">// Note: this type is marked as 'beforefieldinit'.</span>
<span class="n">C</span><span class="p">.<></span><span class="n">c</span><span class="p">.<></span><span class="m">9</span> <span class="p">=</span> <span class="k">new</span> <span class="n">C</span><span class="p">.<></span><span class="nf">c</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">internal</span> <span class="kt">string</span> <span class="p"><</span><span class="n">M</span><span class="p">></span><span class="nf">b__0_0</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="s">"blah"</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">M</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">C</span><span class="p">.<></span><span class="n">c</span><span class="p">.<></span><span class="m">9</span><span class="n">__0_0</span> <span class="p">==</span> <span class="k">null</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">C</span><span class="p">.<></span><span class="n">c</span><span class="p">.<></span><span class="m">9</span><span class="n">__0_0</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Func</span><span class="p"><</span><span class="kt">string</span><span class="p">>(</span><span class="n">C</span><span class="p">.<></span><span class="n">c</span><span class="p">.<></span><span class="m">9.</span><span class="p"><</span><span class="n">M</span><span class="p">></span><span class="n">b__0_0</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="results">Results</h3>
<p>First place went to the following entry from <a href="https://gist.github.com/mattwarren/3c7cfaa245effc0a318b87f1ee5dc153#gistcomment-2106237">Schabse Laks</a>, which contains 9 lines of code inside the <code class="language-plaintext highlighter-rouge">M()</code> method:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System.Linq</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">Y</span> <span class="p">=</span> <span class="n">System</span><span class="p">.</span><span class="n">Collections</span><span class="p">.</span><span class="n">Generic</span><span class="p">.</span><span class="n">IEnumerable</span><span class="p"><</span><span class="kt">dynamic</span><span class="p">>;</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">C</span> <span class="p">{</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">M</span><span class="p">()</span> <span class="p">{</span>
<span class="p">((</span><span class="n">Y</span><span class="p">)</span><span class="k">null</span><span class="p">).</span><span class="nf">Select</span><span class="p">(</span><span class="k">async</span> <span class="n">x</span> <span class="p">=></span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span>
<span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="k">await</span> <span class="n">x</span><span class="p">.</span><span class="nf">x</span><span class="p">()());</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This expands to an impressive <strong>7964</strong> lines of code (yep, you read that right!!) for a score of <strong>885</strong> (7964 / 9). The main trick he figured out was that adding more lines to the input increased the score, i.e. the output scales superlinearly with the input. Although if you <a href="https://twitter.com/Schabse/status/867809080714313729">take things too far</a> the compiler bails out with a pretty impressive error message:</p>
<blockquote>
<p>error CS8078: An expression is too long or complex to compile</p>
</blockquote>
<p>Here are the top 6 results:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Submitter</th>
<th>Entry</th>
<th style="text-align: right">Score</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><a href="https://twitter.com/Schabse">Schabse Laks</a></td>
<td><a href="https://twitter.com/Schabse/status/867808817655840768">link</a></td>
<td style="text-align: right"><strong>885</strong> (7964 / 9)</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://twitter.com/a_tessenr">Andrey Dyatlov</a></td>
<td><a href="https://twitter.com/a_tessenr/status/867776073735454721">link</a></td>
<td style="text-align: right"><strong>778</strong> (778 / 1)</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://twitter.com/alrz_h">alrz</a></td>
<td><a href="https://twitter.com/alrz_h/status/867780509627273216">link</a></td>
<td style="text-align: right"><strong>755</strong> (755 / 1)</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://twitter.com/andygocke">Andy Gocke</a> *</td>
<td><a href="https://twitter.com/andygocke/status/867773813907312640">link</a></td>
<td style="text-align: right"><strong>633</strong> (633 / 1)</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://twitter.com/jaredpar">Jared Parsons</a> *</td>
<td><a href="https://twitter.com/jaredpar/status/867772979698049024">link</a></td>
<td style="text-align: right"><strong>461</strong> (461 / 1)</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://twitter.com/jon_cham">Jonathan Chambers</a></td>
<td><a href="https://twitter.com/jon_cham/status/867759359803228162">link</a></td>
<td style="text-align: right"><strong>384</strong> (384 / 1)</td>
</tr>
</tbody>
</table>
<p><code class="language-plaintext highlighter-rouge">*</code> = member of the Roslyn compiler team (they’re not disqualified, but maybe they should have some kind of handicap applied to ‘even out’ the playing field?)</p>
<h3 id="honourable-mentions">Honourable mentions</h3>
<p>However, there were some other entries that, whilst they didn’t make it into the Top 6, are still worth a mention due to the ingenuity involved:</p>
<ul>
<li>Uncovering a <a href="https://twitter.com/a_tessenr/status/867765123745710080">compiler bug</a>, kudos to <a href="https://twitter.com/a_tessenr">@a_tessenr</a>
<ul>
<li><a href="https://github.com/dotnet/roslyn/issues/19778">GitHub bug report</a> and <a href="https://github.com/dotnet/roslyn/pull/19784/files">fix in the compiler</a> that was done within a few hours!!</li>
</ul>
</li>
<li>Hitting an <a href="https://twitter.com/Schabse/status/867809080714313729">internal compiler limit</a>, nice work by <a href="https://twitter.com/Schabse">@Schabse</a></li>
<li>The most <a href="https://twitter.com/NickPalladinos/status/867764488958857216">elegant attempt</a> featuring a <code class="language-plaintext highlighter-rouge">Y combinator</code> by <a href="https://twitter.com/NickPalladinos">@NickPalladinos</a></li>
<li><a href="https://twitter.com/AdamSpeight2008/status/867800480478515200">Using VB.NET</a> (hint: it didn’t end well!!), but still a valiant attempt by <a href="https://twitter.com/AdamSpeight2008">@AdamSpeight2008</a></li>
<li>The most <a href="https://twitter.com/leppie/status/867861870241226753">aesthetically pleasing</a> entry by <a href="https://twitter.com/leppie">@leppie</a></li>
</ul>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=14422944">HackerNews</a>, <a href="https://www.reddit.com/r/programming/comments/6dfsdl/lowering_in_the_c_compiler_and_what_happens_when/">/r/programming</a> or <a href="https://www.reddit.com/r/csharp/comments/6dgkpk/lowering_in_the_c_compiler_and_what_happens_when/">/r/csharp</a> (whichever takes your fancy!!)</p>
<p>The post <a href="http://www.mattwarren.org/2017/05/25/Lowering-in-the-C-Compiler/">Lowering in the C# Compiler (and what happens when you misuse it)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
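<h1>Adding a new Bytecode Instruction to the CLR</h1>
<p>2017-05-19T00:00:00+00:00 · <a href="http://www.mattwarren.org/2017/05/19/Adding-a-new-Bytecode-Instruction-to-the-CLR">http://www.mattwarren.org/2017/05/19/Adding-a-new-Bytecode-Instruction-to-the-CLR</a></p>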
<p>Now that the <a href="https://blogs.msdn.microsoft.com/dotnet/2015/02/03/coreclr-is-now-open-source/">CoreCLR is open-source</a> we can do fun things, for instance find out if it’s possible to add a new <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">IL (Intermediate Language)</a> instruction to the runtime.</p>
<p><strong>TL;DR</strong> it turns out that it’s easier than you might think!! Here are the steps you need to go through:</p>
<ul>
<li>Step 0 - <a href="#step-0">Introduction and Background</a></li>
<li>Step 1 - <a href="#step-1">Add the new IL instruction to the runtime</a></li>
<li>Step 2 - <a href="#step-2">Make the Interpreter work</a></li>
<li>Step 3 - <a href="#step-3">Ensure the JIT can recognise the new op-code</a></li>
<li>Step 4 - <a href="#step-4">Runtime code generation via Reflection.Emit</a></li>
<li>Step 5 - <a href="#step-5">Future Improvements</a></li>
</ul>
<p><strong>Update</strong>: turns out that I wasn’t the only person to have this idea, see <a href="https://www.slideshare.net/kekyo/beachhead-implements-new-opcode-on-clr-jit">Beachhead implements new opcode on CLR JIT</a> for another implementation by <a href="https://twitter.com/kekyo2">Kouji Matsui</a>.</p>
<hr />
<h3 id="step-0">Step 0</h3>
<p>But first a bit of background information. Adding a new IL instruction to the CLR is a pretty rare event; the last time it was done <em>for real</em> was in .NET 2.0, when support for generics was added. This is <em>part</em> of the reason why .NET code has good backwards-compatibility, from <a href="https://msdn.microsoft.com/en-us/library/ff602939(v=vs.110).aspx">Backward compatibility and the .NET Framework 4.5</a>:</p>
<blockquote>
<p>The .NET Framework 4.5 and its point releases (4.5.1, 4.5.2, 4.6, 4.6.1, 4.6.2, and 4.7) are backward-compatible with apps that were built with earlier versions of the .NET Framework. In other words, <strong>apps and components built with previous versions will work without modification on the .NET Framework 4.5</strong>.</p>
</blockquote>
<p><strong>Side note</strong>: The .NET Framework <em>did</em> break backwards compatibility when moving from 1.0 to 2.0, precisely so that support for generics could be added <em>deep</em> into the runtime, i.e. with support in the IL. Java took a different decision, I guess because it had been around longer, so breaking backwards-compatibility was a bigger issue. See the excellent blog post <a href="http://www.jprl.com/Blog/archive/development/2007/Aug-31.html">Comparing Java and C# Generics</a> for more info.</p>
<hr />
<h3 id="step-1">Step 1</h3>
<p>For this exercise I plan to add a new IL instruction (op-code) to the CoreCLR runtime and because I’m a raving narcissist (not really, see below) I’m going to name it after myself. So let me introduce the <code class="language-plaintext highlighter-rouge">matt</code> IL instruction, that you can use like so:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.method private hidebysig static int32 TestMattOpCodeMethod(int32 x, int32 y)
cil managed noinlining
{
.maxstack 2
ldarg.0
ldarg.1
matt // yay, my name as an IL op-code!!!!
ret
}
</code></pre></div></div>
<p>But because I’m actually a bit-British (i.e. I don’t like to <a href="http://www.phrases.org.uk/meanings/68800.html">‘blow my own trumpet’</a>), I’m going to make the <code class="language-plaintext highlighter-rouge">matt</code> op-code almost completely pointless: it’s going to do exactly the same thing as calling <code class="language-plaintext highlighter-rouge">Math.Max(x, y)</code>, i.e. just return the larger of the 2 numbers.</p>
<p>The other reason for naming it <code class="language-plaintext highlighter-rouge">matt</code> is that I’d really like someone to make a version of the <a href="https://github.com/dotnet/roslyn">C# (Roslyn) compiler</a> that allows you to write code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"{0} m@ {1} = {2}"</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">7</span><span class="p">,</span> <span class="m">1</span> <span class="n">m</span><span class="err">@</span> <span class="m">7</span><span class="p">);</span> <span class="c1">// prints '1 m@ 7 = 7'</span>
</code></pre></div></div>
<p>I definitely want the <code class="language-plaintext highlighter-rouge">m@</code> operator to be a thing (pronounced ‘matt’, not ‘m-at’); maybe the other <a href="https://blogs.msdn.microsoft.com/mattwar/2004/03/05/about-me/">‘Matt Warren’</a> who works at Microsoft on the <a href="https://github.com/dotnet/csharplang/blob/master/meetings/2015/LDM-2015-01-21.md#design-team">C# Language Design Team</a> can help out!! Seriously though, if anyone reading this would like to write a similar blog post, showing how you’d add the <code class="language-plaintext highlighter-rouge">m@</code> operator to the Roslyn compiler, please let me know, I’d love to read it.</p>
<p><strong>Update</strong>: Thanks to <a href="https://twitter.com/mmjuraszek">Marcin Juraszek (@mmjuraszek)</a> you can now use the <code class="language-plaintext highlighter-rouge">m@</code> operator in a C# program, see <a href="http://marcinjuraszek.com/2017/05/adding-matt-operator-to-roslyn-part-1.html">Adding Matt operator to Roslyn - Syntax, Lexer and Parser</a>, <a href="http://marcinjuraszek.com/2017/05/adding-matt-operator-to-roslyn-part-2.html">Adding Matt operator to Roslyn - Binder</a> and <a href="http://marcinjuraszek.com/2017/06/adding-matt-operator-to-roslyn-part-3.html">Adding Matt operator to Roslyn - Emitter</a> for the full details.</p>
<p>Now we’ve defined the op-code, the first step is to ensure that the run-time and tooling can recognise it. In particular we need <a href="https://msdn.microsoft.com/en-us/library/496e4ekx(v=vs.110).aspx">the IL Assembler</a> (a.k.a <code class="language-plaintext highlighter-rouge">ilasm</code>) to be able to take the IL code above (<code class="language-plaintext highlighter-rouge">TestMattOpCodeMethod(..)</code>) and produce a .NET executable.</p>
<p>As the .NET runtime source code is nicely structured (+1 to the runtime devs), to make this possible we only need to make changes in <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/opcode.def">opcode.def</a>:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- a/src/inc/opcode.def
</span><span class="gi">+++ b/src/inc/opcode.def
</span><span class="p">@@ -154,7 +154,7 @@</span> OPDEF(CEE_NEWOBJ, "newobj", VarPop, Pu
OPDEF(CEE_CASTCLASS, "castclass", PopRef, PushRef, InlineType, IObjModel, 1, 0xFF, 0x74, NEXT)
OPDEF(CEE_ISINST, "isinst", PopRef, PushI, InlineType, IObjModel, 1, 0xFF, 0x75, NEXT)
OPDEF(CEE_CONV_R_UN, "conv.r.un", Pop1, PushR8, InlineNone, IPrimitive, 1, 0xFF, 0x76, NEXT)
<span class="gd">-OPDEF(CEE_UNUSED58, "unused", Pop0, Push0, InlineNone, IPrimitive, 1, 0xFF, 0x77, NEXT)
</span><span class="gi">+OPDEF(CEE_MATT, "matt", Pop1+Pop1, Push1, InlineNone, IPrimitive, 1, 0xFF, 0x77, NEXT)
</span> OPDEF(CEE_UNUSED1, "unused", Pop0, Push0, InlineNone, IPrimitive, 1, 0xFF, 0x78, NEXT)
OPDEF(CEE_UNBOX, "unbox", PopRef, PushI, InlineType, IPrimitive, 1, 0xFF, 0x79, NEXT)
OPDEF(CEE_THROW, "throw", PopRef, Push0, InlineNone, IObjModel, 1, 0xFF, 0x7A, THROW)
</code></pre></div></div>
<p>I just picked the first available <code class="language-plaintext highlighter-rouge">unused</code> slot and added <code class="language-plaintext highlighter-rouge">matt</code> in there. It’s defined as <code class="language-plaintext highlighter-rouge">Pop1+Pop1</code> because it takes 2 values from the stack as input and <code class="language-plaintext highlighter-rouge">Push1</code> because after it has executed, a single result is pushed back onto the stack.</p>
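<p>If you’re wondering how a one-line change to a table can be enough for the tooling, it helps to know the ‘X-macro’ pattern, where a list of <code class="language-plaintext highlighter-rouge">OPDEF(...)</code> rows is expanded several times with different definitions of the macro. The following is a minimal sketch of that idea and not the actual CoreCLR code: <code class="language-plaintext highlighter-rouge">DEMO_OPCODES</code>, <code class="language-plaintext highlighter-rouge">DemoOp</code>, <code class="language-plaintext highlighter-rouge">StackDelta</code> and <code class="language-plaintext highlighter-rouge">Lookup</code> are hypothetical names made up for illustration, and only four of the columns are modelled:</p>

```cpp
#include <cstring>

// Hypothetical, cut-down opcode table: id, mnemonic, values popped, values pushed.
// The real opcode.def has more columns; this only mirrors its general shape.
#define DEMO_OPCODES(OPDEF)                  \
    OPDEF(DEMO_CONV_R_UN, "conv.r.un", 1, 1) \
    OPDEF(DEMO_MATT,      "matt",      2, 1) \
    OPDEF(DEMO_THROW,     "throw",     1, 0)

// Expansion 1: an enum with one member per row.
enum DemoOp
{
#define AS_ENUM(id, text, pops, pushes) id,
    DEMO_OPCODES(AS_ENUM)
#undef AS_ENUM
    DEMO_COUNT
};

// Expansion 2: a lookup table with the mnemonic and stack behaviour of each row.
struct DemoOpInfo
{
    const char* mnemonic;
    int pops;
    int pushes;
};

static const DemoOpInfo demoOpTable[] = {
#define AS_INFO(id, text, pops, pushes) { text, pops, pushes },
    DEMO_OPCODES(AS_INFO)
#undef AS_INFO
};

// Net change in stack depth after executing the op-code.
inline int StackDelta(DemoOp op)
{
    return demoOpTable[op].pushes - demoOpTable[op].pops;
}

// What an assembler needs: map a mnemonic back to its op-code.
// Returns DEMO_COUNT when the mnemonic is unknown.
inline DemoOp Lookup(const char* mnemonic)
{
    for (int i = 0; i < DEMO_COUNT; i++)
    {
        if (std::strcmp(demoOpTable[i].mnemonic, mnemonic) == 0)
            return static_cast<DemoOp>(i);
    }
    return DEMO_COUNT;
}
```

<p>In this sketch <code class="language-plaintext highlighter-rouge">Lookup("matt")</code> finds the new row without any other code changing, and <code class="language-plaintext highlighter-rouge">StackDelta(DEMO_MATT)</code> is -1, which is the ‘pop two, push one’ behaviour described above.</p>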
<p><strong>Note</strong>: all the changes I made are <a href="https://github.com/dotnet/coreclr/compare/master...mattwarren:newOpCode">available in one-place on GitHub</a> if you’d rather look at them like that.</p>
<p>Once this change was made, <code class="language-plaintext highlighter-rouge">ilasm</code> would successfully assemble the test code file <code class="language-plaintext highlighter-rouge">HelloWorld.il</code> that contains <code class="language-plaintext highlighter-rouge">TestMattOpCodeMethod(..)</code> as shown above:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>λ ilasm /EXE /OUTPUT=HelloWorld.exe -NOLOGO HelloWorld.il
Assembling 'HelloWorld.il' to EXE --> 'HelloWorld.exe'
Source file is ANSI
Assembled method HelloWorld::Main
Assembled method HelloWorld::TestMattOpCodeMethod
Creating PE file
Emitting classes:
Class 1: HelloWorld
Emitting fields and methods:
Global
Class 1 Methods: 2;
Resolving local member refs: 1 -> 1 defs, 0 refs, 0 unresolved
Emitting events and properties:
Global
Class 1
Resolving local member refs: 0 -> 0 defs, 0 refs, 0 unresolved
Writing PE file
Operation completed successfully
</code></pre></div></div>
<hr />
<h3 id="step-2">Step 2</h3>
<p>However at this point the <code class="language-plaintext highlighter-rouge">matt</code> op-code isn’t actually executed; at runtime the CoreCLR just throws an exception because it doesn’t know what to do with it. As a first (simpler) step, I just wanted to make the <a href="/2017/03/30/The-.NET-IL-Interpreter/">.NET Interpreter</a> work, so I made the following changes to wire it up:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- a/src/vm/interpreter.cpp
</span><span class="gi">+++ b/src/vm/interpreter.cpp
</span><span class="p">@@ -2726,6 +2726,9 @@</span> void Interpreter::ExecuteMethod(ARG_SLOT* retVal, __out bool* pDoJmpCall, __out
case CEE_REM_UN:
BinaryIntOp<BIO_RemUn>();
break;
<span class="gi">+ case CEE_MATT:
+ BinaryArithOp<BA_Matt>();
+ break;
</span> case CEE_AND:
BinaryIntOp<BIO_And>();
break;
--- a/src/vm/interpreter.hpp
<span class="gi">+++ b/src/vm/interpreter.hpp
</span><span class="p">@@ -298,10 +298,14 @@</span> void Interpreter::BinaryArithOpWork(T val1, T val2)
{
res = val1 / val2;
}
<span class="gd">- else
</span><span class="gi">+ else if (op == BA_Rem)
</span> {
res = RemFunc(val1, val2);
}
<span class="gi">+ else if (op == BA_Matt)
+ {
+ res = MattFunc(val1, val2);
+ }
</span> }
</code></pre></div></div>
<p>and then I added the methods that would actually implement the interpreted code:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- a/src/vm/interpreter.cpp
</span><span class="gi">+++ b/src/vm/interpreter.cpp
</span><span class="p">@@ -10801,6 +10804,26 @@</span> double Interpreter::RemFunc(double v1, double v2)
return fmod(v1, v2);
}
<span class="gi">+INT32 Interpreter::MattFunc(INT32 v1, INT32 v2)
+{
+ return v1 > v2 ? v1 : v2;
+}
+
+INT64 Interpreter::MattFunc(INT64 v1, INT64 v2)
+{
+ return v1 > v2 ? v1 : v2;
+}
+
+float Interpreter::MattFunc(float v1, float v2)
+{
+ return v1 > v2 ? v1 : v2;
+}
+
+double Interpreter::MattFunc(double v1, double v2)
+{
+ return v1 > v2 ? v1 : v2;
+}
</span></code></pre></div></div>
<p>So that’s fairly straightforward, and the bonus is that at this point the <code class="language-plaintext highlighter-rouge">matt</code> op-code is fully operational: you can actually write IL using it and it will run (interpreted only).</p>
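<p>For anyone who hasn’t looked inside an interpreter loop before, here is a toy version of the kind of dispatch that the diff above hooks into. It’s a sketch under heavy simplification: <code class="language-plaintext highlighter-rouge">ToyOp</code>, <code class="language-plaintext highlighter-rouge">ToyInstr</code>, <code class="language-plaintext highlighter-rouge">RunToy</code> and <code class="language-plaintext highlighter-rouge">MattMethod</code> are hypothetical names, operands are plain 32-bit integers, and none of it is taken from <code class="language-plaintext highlighter-rouge">interpreter.cpp</code>:</p>

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class ToyOp { LdArg, Matt, Ret };

struct ToyInstr
{
    ToyOp op;
    int index;   // argument index, only used by LdArg
};

// Same comparison as the INT32 overload of MattFunc in the diff.
inline int32_t ToyMatt(int32_t v1, int32_t v2)
{
    return v1 > v2 ? v1 : v2;
}

// Runs 'code' against 'args' on an evaluation stack and returns the value on top at Ret.
inline int32_t RunToy(const std::vector<ToyInstr>& code, const std::vector<int32_t>& args)
{
    std::vector<int32_t> stack;
    for (const ToyInstr& instr : code)
    {
        switch (instr.op)
        {
        case ToyOp::LdArg:
            stack.push_back(args.at(instr.index));
            break;
        case ToyOp::Matt:
        {
            if (stack.size() < 2)
                throw std::runtime_error("matt needs two values on the stack");
            int32_t v2 = stack.back(); stack.pop_back();
            int32_t v1 = stack.back(); stack.pop_back();
            stack.push_back(ToyMatt(v1, v2));   // Pop1+Pop1, then Push1
            break;
        }
        case ToyOp::Ret:
            if (stack.empty())
                throw std::runtime_error("ret with an empty stack");
            return stack.back();
        }
    }
    throw std::runtime_error("fell off the end without ret");
}

// The IL from Step 1: ldarg.0, ldarg.1, matt, ret
inline std::vector<ToyInstr> MattMethod()
{
    return { { ToyOp::LdArg, 0 }, { ToyOp::LdArg, 1 }, { ToyOp::Matt, 0 }, { ToyOp::Ret, 0 } };
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">RunToy(MattMethod(), {1, 7})</code> walks the same four instructions as <code class="language-plaintext highlighter-rouge">TestMattOpCodeMethod(..)</code> and leaves the larger argument on the stack for <code class="language-plaintext highlighter-rouge">Ret</code>.</p>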
<hr />
<h3 id="step-3">Step 3</h3>
<p>However not everyone wants to <a href="/2017/03/30/The-.NET-IL-Interpreter/">re-compile the CoreCLR</a> just to enable the Interpreter, so I want to also make it work <em>for real</em> via the Just-in-Time (JIT) compiler.</p>
<p>The full changes to make this work were spread across multiple files, but were mostly <em>housekeeping</em> so I won’t include them all here, <a href="https://github.com/dotnet/coreclr/compare/master...mattwarren:newOpCode">check-out the full diff</a> if you’re interested. But the significant parts are below:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- a/src/jit/importer.cpp
</span><span class="gi">+++ b/src/jit/importer.cpp
</span><span class="p">@@ -11112,6 +11112,10 @@</span> void Compiler::impImportBlockCode(BasicBlock* block)
oper = GT_UMOD;
goto MATH_MAYBE_CALL_NO_OVF;
<span class="gi">+ case CEE_MATT:
+ oper = GT_MATT;
+ goto MATH_MAYBE_CALL_NO_OVF;
+
</span> MATH_MAYBE_CALL_NO_OVF:
ovfl = false;
MATH_MAYBE_CALL_OVF:
--- a/src/vm/jithelpers.cpp
<span class="gi">+++ b/src/vm/jithelpers.cpp
</span><span class="p">@@ -341,6 +341,14 @@</span> HCIMPL2(UINT32, JIT_UMod, UINT32 dividend, UINT32 divisor)
HCIMPLEND
/*********************************************************************/
<span class="gi">+HCIMPL2(INT32, JIT_Matt, INT32 x, INT32 y)
+{
+ FCALL_CONTRACT;
+ return x > y ? x : y;
+}
+HCIMPLEND
+
+/*********************************************************************/
</span> HCIMPL2_VV(INT64, JIT_LDiv, INT64 dividend, INT64 divisor)
{
FCALL_CONTRACT;
</code></pre></div></div>
<p>In summary, these changes mean that during the JIT’s <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#morph-blocks">‘Morph phase’</a> the IL containing the <code class="language-plaintext highlighter-rouge">matt</code> op-code is converted from:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fgMorphTree BB01, stmt 1 (before)
[000004] ------------ ▌ return int
[000002] ------------ │ ┌──▌ lclVar int V01 arg1
[000003] ------------ └──▌ m@ int
[000001] ------------ └──▌ lclVar int V00 arg0
</code></pre></div></div>
<p>into this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fgMorphTree BB01, stmt 1 (after)
[000004] --C--+------ ▌ return int
[000003] --C--+------ └──▌ call help int HELPER.CORINFO_HELP_MATT
[000001] -----+------ arg0 in rcx ├──▌ lclVar int V00 arg0
[000002] -----+------ arg1 in rdx └──▌ lclVar int V01 arg1
</code></pre></div></div>
<p>Note the call to <code class="language-plaintext highlighter-rouge">HELPER.CORINFO_HELP_MATT</code></p>
<p>When this is finally compiled into assembly code it ends up looking like so:</p>
<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Assembly listing for method HelloWorld:TestMattOpCodeMethod(int,int):int </span>
<span class="c1">// Emitting BLENDED_CODE for X64 CPU with AVX </span>
<span class="c1">// optimized code </span>
<span class="c1">// rsp based frame </span>
<span class="c1">// partially interruptible </span>
<span class="c1">// Final local variable assignments </span>
<span class="c1">// </span>
<span class="c1">// V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx </span>
<span class="c1">// V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx </span>
<span class="c1">// V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] </span>
<span class="c1">// </span>
<span class="c1">// Lcl frame size = 40 </span>
<span class="nl">G_M9261_IG01:</span>
<span class="mi">4883</span><span class="n">EC28</span> <span class="n">sub</span> <span class="n">rsp</span><span class="o">,</span> <span class="mi">40</span>
<span class="nl">G_M9261_IG02:</span>
<span class="n">E8976FEB5E</span> <span class="n">call</span> <span class="n">CORINFO_HELP_MATT</span>
<span class="mi">90</span> <span class="n">nop</span>
<span class="nl">G_M9261_IG03:</span>
<span class="mi">4883</span><span class="n">C428</span> <span class="n">add</span> <span class="n">rsp</span><span class="o">,</span> <span class="mi">40</span>
<span class="n">C3</span> <span class="n">ret</span>
</code></pre></div></div>
<p>I’m not entirely sure why there is a <code class="language-plaintext highlighter-rouge">nop</code> instruction in there? But it works, which is the main thing!!</p>
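<p>Going back to the morph step for a moment, the rewrite shown in the two tree dumps can be pictured as turning a binary node into a call node that keeps the same operands. The sketch below only illustrates that shape: the <code class="language-plaintext highlighter-rouge">Node</code> type, <code class="language-plaintext highlighter-rouge">MorphMattToHelper</code>, <code class="language-plaintext highlighter-rouge">Evaluate</code> and <code class="language-plaintext highlighter-rouge">HelperMatt</code> are hypothetical and far simpler than the JIT’s real tree nodes:</p>

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

enum class NodeKind { Arg, Matt, HelperCall };

using Helper = int32_t (*)(int32_t, int32_t);

struct Node
{
    explicit Node(NodeKind k) : kind(k) {}
    NodeKind kind;
    int argIndex = 0;              // used by Arg
    Helper helper = nullptr;       // used by HelperCall
    std::unique_ptr<Node> op1;     // first operand
    std::unique_ptr<Node> op2;     // second operand
};

// Stand-in for the JIT_Matt helper added in jithelpers.cpp.
inline int32_t HelperMatt(int32_t x, int32_t y)
{
    return x > y ? x : y;
}

inline std::unique_ptr<Node> MakeArg(int index)
{
    std::unique_ptr<Node> node(new Node(NodeKind::Arg));
    node->argIndex = index;
    return node;
}

inline std::unique_ptr<Node> MakeMatt(std::unique_ptr<Node> op1, std::unique_ptr<Node> op2)
{
    std::unique_ptr<Node> node(new Node(NodeKind::Matt));
    node->op1 = std::move(op1);
    node->op2 = std::move(op2);
    return node;
}

// "Morph": rewrite every Matt node in place into a call to the helper, operands unchanged.
inline void MorphMattToHelper(Node& node)
{
    if (node.op1) MorphMattToHelper(*node.op1);
    if (node.op2) MorphMattToHelper(*node.op2);
    if (node.kind == NodeKind::Matt)
    {
        node.kind = NodeKind::HelperCall;
        node.helper = &HelperMatt;
    }
}

// Evaluates a morphed tree; an un-morphed Matt node is an error because nothing implements it.
inline int32_t Evaluate(const Node& node, const std::vector<int32_t>& args)
{
    switch (node.kind)
    {
    case NodeKind::Arg:
        return args.at(node.argIndex);
    case NodeKind::HelperCall:
        return node.helper(Evaluate(*node.op1, args), Evaluate(*node.op2, args));
    case NodeKind::Matt:
        break;
    }
    throw std::logic_error("Matt node was not morphed into a helper call");
}
```

<p>Building the tree with <code class="language-plaintext highlighter-rouge">MakeMatt(MakeArg(0), MakeArg(1))</code> and then calling <code class="language-plaintext highlighter-rouge">MorphMattToHelper</code> gives the ‘before’ and ‘after’ shapes in miniature. Routing the work through a helper call rather than emitting inline machine code is the simplest option, at the cost of a function call per <code class="language-plaintext highlighter-rouge">matt</code>.</p>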
<hr />
<h3 id="step-4">Step 4</h3>
<p>In the CLR you can also dynamically emit code at runtime using the methods that sit under the <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit(v=vs.110).aspx">‘System.Reflection.Emit’ namespace</a>, so the last task is to add the <code class="language-plaintext highlighter-rouge">OpCodes.Matt</code> field and have it emit the correct values for the <code class="language-plaintext highlighter-rouge">matt</code> op-code.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- a/src/mscorlib/src/System/Reflection/Emit/OpCodes.cs
</span><span class="gi">+++ b/src/mscorlib/src/System/Reflection/Emit/OpCodes.cs
</span><span class="p">@@ -139,6 +139,7 @@</span> internal enum OpCodeValues
Castclass = 0x74,
Isinst = 0x75,
Conv_R_Un = 0x76,
<span class="gi">+ Matt = 0x77,
</span> Unbox = 0x79,
Throw = 0x7a,
Ldfld = 0x7b,
<span class="p">@@ -1450,6 +1451,16 @@</span> private OpCodes()
(0 << OpCode.StackChangeShift)
);
<span class="gi">+ public static readonly OpCode Matt = new OpCode(OpCodeValues.Matt,
+ ((int)OperandType.InlineNone) |
+ ((int)FlowControl.Next << OpCode.FlowControlShift) |
+ ((int)OpCodeType.Primitive << OpCode.OpCodeTypeShift) |
+ ((int)StackBehaviour.Pop1_pop1 << OpCode.StackBehaviourPopShift) |
+ ((int)StackBehaviour.Push1 << OpCode.StackBehaviourPushShift) |
+ (1 << OpCode.SizeShift) |
+ (-1 << OpCode.StackChangeShift)
+ );
+
</span> public static readonly OpCode Unbox = new OpCode(OpCodeValues.Unbox,
((int)OperandType.InlineType) |
((int)FlowControl.Next << OpCode.FlowControlShift) |
</code></pre></div></div>
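<p>The constructor argument in that diff is a single integer with several small fields OR-ed together at different bit offsets. Here is a rough sketch of that packing technique. The shift values and field widths are assumptions picked for this example (the real constants live in the <code class="language-plaintext highlighter-rouge">OpCode</code> type and aren’t reproduced here), only four fields are modelled, and <code class="language-plaintext highlighter-rouge">PackFlags</code> and the <code class="language-plaintext highlighter-rouge">Unpack...</code> functions are hypothetical names:</p>

```cpp
#include <cstdint>

// Hypothetical layout, chosen for this sketch only:
//   bits 0-4   operand type
//   bits 5-8   flow control
//   bits 9-11  size in bytes
//   bits 12-31 net stack change (signed)
const int kFlowControlShift = 5;
const int kSizeShift = 9;
const int kStackChangeShift = 12;

inline uint32_t PackFlags(uint32_t operandType, uint32_t flowControl, uint32_t size, int32_t stackChange)
{
    // Converting to unsigned before shifting keeps a negative stack change well-defined in C++.
    return operandType
         | (flowControl << kFlowControlShift)
         | (size << kSizeShift)
         | (static_cast<uint32_t>(stackChange) << kStackChangeShift);
}

inline uint32_t UnpackOperandType(uint32_t flags) { return flags & 0x1Fu; }
inline uint32_t UnpackFlowControl(uint32_t flags) { return (flags >> kFlowControlShift) & 0xFu; }
inline uint32_t UnpackSize(uint32_t flags)        { return (flags >> kSizeShift) & 0x7u; }

inline int32_t UnpackStackChange(uint32_t flags)
{
    // The top 20 bits hold a signed value, so sign-extend it by hand.
    int32_t field = static_cast<int32_t>(flags >> kStackChangeShift);   // 0 .. 0xFFFFF
    const int32_t signBit = 1 << 19;
    return (field & signBit) ? field - (1 << 20) : field;
}
```

<p>With this layout, <code class="language-plaintext highlighter-rouge">PackFlags(3, 2, 1, -1)</code> round-trips through the four <code class="language-plaintext highlighter-rouge">Unpack...</code> functions, including the negative stack change. The <code class="language-plaintext highlighter-rouge">OpCodes.Matt</code> field in the diff above does the same kind of packing, just with more fields.</p>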
<p>With <code class="language-plaintext highlighter-rouge">OpCodes.Matt</code> defined, we can write the code shown below, which emits, compiles and then executes the <code class="language-plaintext highlighter-rouge">matt</code> op-code:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DynamicMethod</span> <span class="n">method</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">DynamicMethod</span><span class="p">(</span>
<span class="s">"TestMattOpCode"</span><span class="p">,</span>
<span class="n">returnType</span><span class="p">:</span> <span class="k">typeof</span><span class="p">(</span><span class="kt">int</span><span class="p">),</span>
<span class="n">parameterTypes</span><span class="p">:</span> <span class="k">new</span> <span class="p">[]</span> <span class="p">{</span> <span class="k">typeof</span><span class="p">(</span><span class="kt">int</span><span class="p">),</span> <span class="k">typeof</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="p">},</span>
<span class="n">m</span><span class="p">:</span> <span class="k">typeof</span><span class="p">(</span><span class="n">TestClass</span><span class="p">).</span><span class="n">Module</span><span class="p">);</span>
<span class="c1">// Emit the IL</span>
<span class="kt">var</span> <span class="n">generator</span> <span class="p">=</span> <span class="n">method</span><span class="p">.</span><span class="nf">GetILGenerator</span><span class="p">();</span>
<span class="n">generator</span><span class="p">.</span><span class="nf">Emit</span><span class="p">(</span><span class="n">OpCodes</span><span class="p">.</span><span class="n">Ldarg_0</span><span class="p">);</span>
<span class="n">generator</span><span class="p">.</span><span class="nf">Emit</span><span class="p">(</span><span class="n">OpCodes</span><span class="p">.</span><span class="n">Ldarg_1</span><span class="p">);</span>
<span class="n">generator</span><span class="p">.</span><span class="nf">Emit</span><span class="p">(</span><span class="n">OpCodes</span><span class="p">.</span><span class="n">Matt</span><span class="p">);</span> <span class="c1">// Use the new 'matt' IL OpCode</span>
<span class="n">generator</span><span class="p">.</span><span class="nf">Emit</span><span class="p">(</span><span class="n">OpCodes</span><span class="p">.</span><span class="n">Ret</span><span class="p">);</span>
<span class="c1">// Compile the IL into a delegate (uses the JITter under-the-hood)</span>
<span class="kt">var</span> <span class="n">mattOpCodeInvoker</span> <span class="p">=</span>
<span class="p">(</span><span class="n">Func</span><span class="p"><</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">>)</span><span class="n">method</span><span class="p">.</span><span class="nf">CreateDelegate</span><span class="p">(</span><span class="k">typeof</span><span class="p">(</span><span class="n">Func</span><span class="p"><</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">>));</span>
<span class="c1">// prints "1 m@ 7 = 7 (via IL Emit)"</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"{0} m@ {1} = {2} (via IL Emit)"</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">7</span><span class="p">,</span> <span class="nf">mattOpCodeInvoker</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">7</span><span class="p">));</span>
<span class="c1">// prints "12 m@ 9 = 12 (via IL Emit)"</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"{0} m@ {1} = {2} (via IL Emit)"</span><span class="p">,</span> <span class="m">12</span><span class="p">,</span> <span class="m">9</span><span class="p">,</span> <span class="nf">mattOpCodeInvoker</span><span class="p">(</span><span class="m">12</span><span class="p">,</span> <span class="m">9</span><span class="p">));</span>
</code></pre></div></div>
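<p>For comparison, here is a minimal sketch of the same emit, compile and invoke flow that should run on an unmodified runtime, because it only uses stock op-codes. The class name <code class="language-plaintext highlighter-rouge">StockOpCodeMax</code> and the method name <code class="language-plaintext highlighter-rouge">Build</code> are made up for this illustration; the branch-based IL is just one way to express ‘return the larger argument’:</p>

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper for illustration only (not part of the modified CoreCLR).
public static class StockOpCodeMax
{
    // Builds a delegate equivalent to "x > y ? x : y" using only standard IL op-codes.
    public static Func<int, int, int> Build()
    {
        var method = new DynamicMethod(
            "MaxViaStockOpCodes",
            typeof(int),
            new[] { typeof(int), typeof(int) },
            typeof(StockOpCodeMax).Module);

        ILGenerator il = method.GetILGenerator();
        Label returnFirst = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Bgt_S, returnFirst); // jump if arg0 > arg1
        il.Emit(OpCodes.Ldarg_1);            // fall through: arg1 is the larger (or equal) value
        il.Emit(OpCodes.Ret);
        il.MarkLabel(returnFirst);
        il.Emit(OpCodes.Ldarg_0);            // branch target: arg0 is the larger value
        il.Emit(OpCodes.Ret);

        // Compile the IL into a delegate (the JIT does the work on first call)
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> max = Build();
        Console.WriteLine("{0} max {1} = {2}", 1, 7, max(1, 7));   // prints "1 max 7 = 7"
        Console.WriteLine("{0} max {1} = {2}", 12, 9, max(12, 9)); // prints "12 max 9 = 12"
    }
}
```

<p>The only difference from the <code class="language-plaintext highlighter-rouge">matt</code> version is that the comparison is spelled out as a branch and two returns instead of a single custom instruction.</p>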
<hr />
<h3 id="step-5">Step 5</h3>
<p>Finally, you may have noticed that I cheated a little bit in <a href="#step-3">Step 3</a> when I made changes to the JIT. Even though what I did works, it is not the most efficient way due to the extra method call to <code class="language-plaintext highlighter-rouge">CORINFO_HELP_MATT</code>. Also the JIT generally doesn’t use helper functions in this way, instead preferring to emit assembly code directly.</p>
<p>As a <em>future exercise</em> for anyone who has read this far (any takers?), it would be nice if the JIT emitted more efficient code. For instance if you write C# code like this (which does the same thing as the <code class="language-plaintext highlighter-rouge">matt</code> op-code):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">MaxMethod</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">x</span> <span class="p">></span> <span class="n">y</span> <span class="p">?</span> <span class="n">x</span> <span class="p">:</span> <span class="n">y</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>It’s turned into the following IL by the C# compiler:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IL to import:
IL_0000 02 ldarg.0
IL_0001 03 ldarg.1
IL_0002 30 02 bgt.s 2 (IL_0006)
IL_0004 03 ldarg.1
IL_0005 2a ret
IL_0006 02 ldarg.0
IL_0007 2a ret
</code></pre></div></div>
<p>Then when the JIT runs it’s processed as 3 basic-blocks (BB01, BB02 and BB03):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Importing BB01 (PC=000) of 'TestNamespace.TestClass:MaxMethod(int,int):int'
[ 0] 0 (0x000) ldarg.0
[ 1] 1 (0x001) ldarg.1
[ 2] 2 (0x002) bgt.s
[000005] ------------ ▌ stmtExpr void (IL 0x000... ???)
[000004] ------------ └──▌ jmpTrue void
[000002] ------------ │ ┌──▌ lclVar int V01 arg1
[000003] ------------ └──▌ > int
[000001] ------------ └──▌ lclVar int V00 arg0
Importing BB03 (PC=006) of 'TestNamespace.TestClass:MaxMethod(int,int):int'
[ 0] 6 (0x006) ldarg.0
[ 1] 7 (0x007) ret
[000009] ------------ ▌ stmtExpr void (IL 0x006... ???)
[000008] ------------ └──▌ return int
[000007] ------------ └──▌ lclVar int V00 arg0
Importing BB02 (PC=004) of 'TestNamespace.TestClass:MaxMethod(int,int):int'
[ 0] 4 (0x004) ldarg.1
[ 1] 5 (0x005) ret
[000013] ------------ ▌ stmtExpr void (IL 0x004... ???)
[000012] ------------ └──▌ return int
[000011] ------------ └──▌ lclVar int V01 arg1
</code></pre></div></div>
<p>Finally, it’s turned into the following assembly code, which is far more efficient. It contains just a <code class="language-plaintext highlighter-rouge">cmp</code>, a <code class="language-plaintext highlighter-rouge">jg</code> and a couple of <code class="language-plaintext highlighter-rouge">mov</code> instructions, but crucially it’s all done in-line, so it doesn’t need to call out to another method.</p>
<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Assembly listing for method TestNamespace.TestClass:MaxMethod(int,int):int</span>
<span class="c1">// Emitting BLENDED_CODE for X64 CPU with AVX</span>
<span class="c1">// optimized code</span>
<span class="c1">// rsp based frame</span>
<span class="c1">// partially interruptible</span>
<span class="c1">// Final local variable assignments</span>
<span class="c1">//</span>
<span class="c1">// V00 arg0 [V00,T00] ( 4, 3.50) int -> rcx</span>
<span class="c1">// V01 arg1 [V01,T01] ( 4, 3.50) int -> rdx</span>
<span class="c1">// # V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00]</span>
<span class="c1">//</span>
<span class="c1">// Lcl frame size = 0</span>
<span class="nl">G_M32709_IG01:</span>
<span class="nl">G_M32709_IG02:</span>
<span class="mi">3</span><span class="n">BCA</span> <span class="n">cmp</span> <span class="n">ecx</span><span class="o">,</span> <span class="n">edx</span>
<span class="mi">7</span><span class="n">F03</span> <span class="n">jg</span> <span class="n">SHORT</span> <span class="n">G_M32709_IG04</span>
<span class="mi">8</span><span class="n">BC2</span> <span class="n">mov</span> <span class="n">eax</span><span class="o">,</span> <span class="n">edx</span>
<span class="nl">G_M32709_IG03:</span>
<span class="n">C3</span> <span class="n">ret</span>
<span class="nl">G_M32709_IG04:</span>
<span class="mi">8</span><span class="n">BC1</span> <span class="n">mov</span> <span class="n">eax</span><span class="o">,</span> <span class="n">ecx</span>
<span class="nl">G_M32709_IG05:</span>
<span class="n">C3</span> <span class="n">ret</span>
</code></pre></div></div>
<hr />
<h3 id="disclaimercredit">Disclaimer/Credit</h3>
<p>I got the idea for doing this from the Appendix of the excellent book <a href="https://www.amazon.co.uk/Shared-Source-Essentials-David-Stutz/dp/059600351X/ref=as_li_ss_tl?ie=UTF8&qid=1495146939&sr=8-1-fkmr0&keywords=shared+source+essentials+sscli&linkCode=ll1&tag=mattonsoft-21&linkId=033fb897262ad494f8f5322fd9f99f66">Shared Source CLI Essentials - Amazon</a>, you can also <a href="http://www.newardassociates.com/files/SSCLI2.pdf">download a copy of the 2nd edition</a> if you don’t want to purchase the print one.</p>
<p>In Appendix B the authors of the book reproduced the work that <a href="http://www.ugidotnet.org/eventi/28/Rotor">Peter Drayton</a> did to add an <em>Exponentiation</em> op-code to the SSCLI, which inspired this entire post, so thanks for that!!</p>
<p><img src="/images/2017/05/Appendix B - Add a new CIL opcode.png" alt="Appendix B - Add a new CIL opcode" /></p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=14379557">HackerNews</a> and <a href="https://www.reddit.com/r/programming/comments/6c3qsh/adding_a_new_bytecode_instruction_to_the_clr/">/r/programming</a></p>
<p>The post <a href="http://www.mattwarren.org/2017/05/19/Adding-a-new-Bytecode-Instruction-to-the-CLR/">Adding a new Bytecode Instruction to the CLR</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Arrays and the CLR - a Very Special Relationship2017-05-08T00:00:00+00:00http://www.mattwarren.org/2017/05/08/Arrays-and-the-CLR-a-Very-Special-Relationship
<p>A while ago I wrote about the ‘special relationship’ that <a href="/2016/05/31/Strings-and-the-CLR-a-Special-Relationship/">exists between Strings and the CLR</a>, well it turns out that Arrays and the CLR have an even deeper one, the type of closeness where you <em>hold hands on your first meeting</em></p>
<p><a href="http://www.telegraph.co.uk/news/2017/01/27/theresa-may-donald-trump-prove-opposites-can-attract-uk-us-leaders/"><img src="/images/2017/05/Donald-Trump-Theresa-May.jpg" alt="Donald Trump and Theresa May" /></a></p>
<hr />
<p>As an aside, if you like reading about <strong>CLR internals</strong> you may find these other posts interesting:</p>
<ul>
<li><a href="/2017/04/13/The-CLR-Thread-Pool-Thread-Injection-Algorithm/?recommended=1">The CLR Thread Pool ‘Thread Injection’ Algorithm</a></li>
<li><a href="/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/?recommended=1">The 68 things the CLR does before executing a single line of your code</a></li>
<li><a href="/2017/01/25/How-do-.NET-delegates-work/?recommended=1">How do .NET delegates work?</a></li>
<li><a href="/2016/12/14/Why-is-Reflection-slow/?recommended=1">Why is reflection slow?</a></li>
<li><a href="/2016/10/26/How-does-the-fixed-keyword-work/?recommended=1">How does the ‘fixed’ keyword work?</a></li>
</ul>
<hr />
<h2 id="fundamental-to-the-common-language-runtime-clr">Fundamental to the Common Language Runtime (CLR)</h2>
<p>Arrays are such a fundamental part of the CLR that they are included in the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/dotnet-standards.md">ECMA specification</a>, to make it clear that the <em>runtime</em> has to implement them:</p>
<p><img src="/images/2017/05/Single-Dimensions Arrays (Vectors) in the ECMA Spec.png" alt="Single-Dimensions Arrays (Vectors) in the ECMA Spec" /></p>
<p>In addition, there are several <a href="https://en.wikipedia.org/wiki/List_of_CIL_instructions">IL (Intermediate Language) instructions</a> that specifically deal with arrays:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">newarr</code> <etype>
<ul>
<li>Create a new array with elements of type etype.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ldelem.ref</code>
<ul>
<li>Load the element at index onto the top of the stack as an O. The type of the O is the same as the element type of the array pushed on the CIL stack.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">stelem</code> <typeTok>
<ul>
<li>Replace array element at index with the value on the stack (also <code class="language-plaintext highlighter-rouge">stelem.i</code>, <code class="language-plaintext highlighter-rouge">stelem.i1</code>, <code class="language-plaintext highlighter-rouge">stelem.i2</code>, <code class="language-plaintext highlighter-rouge">stelem.r4</code> etc)</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ldlen</code>
<ul>
<li>Push the length (of type native unsigned int) of array on the stack.</li>
</ul>
</li>
</ul>
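<p>To see these instructions in action, here is a small sketch that emits them by hand with <code class="language-plaintext highlighter-rouge">DynamicMethod</code>. It uses the <code class="language-plaintext highlighter-rouge">int</code>-specific variants (<code class="language-plaintext highlighter-rouge">stelem.i4</code> and <code class="language-plaintext highlighter-rouge">ldelem.i4</code>) rather than the exact forms listed above, and the class name <code class="language-plaintext highlighter-rouge">ArrayOpCodes</code> is an assumption made purely for this example:</p>

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical demo class: hand-emits the array-specific IL instructions.
public static class ArrayOpCodes
{
    // Equivalent C#: int f(int length) { var a = new int[length]; a[0] = 42; return a[0] + a.Length; }
    public static Func<int, int> Build()
    {
        var method = new DynamicMethod(
            "ArrayOpCodesDemo",
            typeof(int),
            new[] { typeof(int) },
            typeof(ArrayOpCodes).Module);

        ILGenerator il = method.GetILGenerator();
        LocalBuilder array = il.DeclareLocal(typeof(int[]));

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Newarr, typeof(int)); // newarr: allocate int[length]
        il.Emit(OpCodes.Stloc, array);

        il.Emit(OpCodes.Ldloc, array);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Ldc_I4, 42);
        il.Emit(OpCodes.Stelem_I4);           // stelem.i4: a[0] = 42 (bounds checked)

        il.Emit(OpCodes.Ldloc, array);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Ldelem_I4);           // ldelem.i4: push a[0]

        il.Emit(OpCodes.Ldloc, array);
        il.Emit(OpCodes.Ldlen);               // ldlen: push length as native unsigned int
        il.Emit(OpCodes.Conv_I4);             // convert the length to int32
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    public static void Main()
    {
        Console.WriteLine(Build()(3)); // prints "45" (42 + a length of 3)
    }
}
```

<p>Passing a length of <code class="language-plaintext highlighter-rouge">0</code> makes the <code class="language-plaintext highlighter-rouge">stelem.i4</code> throw an <code class="language-plaintext highlighter-rouge">IndexOutOfRangeException</code>, because the bounds check is part of the instruction itself.</p>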
<p>This makes sense because arrays are the building blocks of so many other data types, you want them to be available, well defined and efficient in a modern high-level language like C#. Without arrays you can’t have lists, dictionaries, queues, stacks, trees, etc, they’re all built on top of arrays, which provide low-level access to contiguous pieces of memory in a type-safe way.</p>
<h3 id="memory-and-type-safety">Memory and Type Safety</h3>
<p>This <em>memory</em> and <em>type-safety</em> is important because without it .NET couldn’t be described as a ‘managed runtime’ and you’d be left having to deal with the types of issues you get when you are writing code in a more low-level language.</p>
<p>More specifically, the CLR provides the following protections when you are using arrays (from the section on <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/intro-to-clr.md#memory-and-type-safety">Memory and Type Safety</a> in the BOTR ‘Intro to the CLR’ page):</p>
<blockquote>
<p>While a GC is necessary to ensure memory safety, it is not sufficient. The GC will not prevent the program from <strong>indexing off the end of an array</strong> or accessing a field off the end of an object (possible if you compute the field’s address using a base and offset computation). <strong>However, if we do prevent these cases, then we can indeed make it impossible for a programmer to create memory-unsafe programs</strong>.</p>
</blockquote>
<blockquote>
<p>While the common intermediate language (CIL) does have operators that can fetch and set arbitrary memory (and thus violate memory safety), it also has the <strong>following memory-safe operators</strong> and the CLR strongly encourages their use in most programming:</p>
<ol>
<li>Field-fetch operators (LDFLD, STFLD, LDFLDA) that fetch (read), set and take the address of a field by name.</li>
<li><strong>Array-fetch operators (LDELEM, STELEM, LDELEMA)</strong> that fetch, set and take the address of an array element by index. <strong>All arrays include a tag specifying their length</strong>. This facilitates an automatic bounds check before each access.</li>
</ol>
</blockquote>
<p>Also, from the section on <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/intro-to-clr.md#verifiable-code---enforcing-memory-and-type-safety">Verifiable Code - Enforcing Memory and Type Safety</a> in the same BOTR page:</p>
<blockquote>
<p>In practice, the number of run-time checks needed is actually very small. They include the following operations:</p>
<ol>
<li>Casting a pointer to a base type to be a pointer to a derived type (the opposite direction can be checked statically)</li>
<li><strong>Array bounds checks</strong> (just as we saw for memory safety)</li>
<li>Assigning an element in an <strong>array of pointers to a new (pointer) value</strong>. This particular check is only required because <strong>CLR arrays have liberal casting rules</strong> (more on that later…)</li>
</ol>
</blockquote>
<p>However you don’t get this protection for free, there’s a cost to pay:</p>
<blockquote>
<p>Note that the need to do these checks places requirements on the runtime. In particular:</p>
<ol>
<li>All memory in the GC heap must be tagged with its type (so the casting operator can be implemented). This type information must be available at runtime, and it must be rich enough to determine if casts are valid (e.g., the runtime needs to know the inheritance hierarchy). In fact, the first field in every object on the GC heap points to a runtime data structure that represents its type.</li>
<li><strong>All arrays must also have their size</strong> (for bounds checking).</li>
<li><strong>Arrays must have complete type information</strong> about their element type.</li>
</ol>
</blockquote>
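<p>Two of the run-time checks quoted above (array bounds checks and the check on storing into an array of references) are easy to observe from C#. The sketch below is only an illustration; the class and method names are invented for it:</p>

```csharp
using System;

// Hypothetical demo class showing two of the run-time checks the CLR performs on arrays.
public static class ArraySafetyChecks
{
    // Reading one past the end trips the bounds check.
    public static bool ThrowsIndexOutOfRange()
    {
        int[] numbers = { 1, 2, 3 };
        int index = numbers.Length; // one past the last valid index
        try
        {
            Console.WriteLine(numbers[index]);
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    // Array covariance lets a string[] be viewed as an object[],
    // so every store of a reference has to be type-checked at run-time.
    public static bool ThrowsArrayTypeMismatch()
    {
        object[] objects = new string[1];
        try
        {
            objects[0] = 42; // a boxed int is not a string
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsIndexOutOfRange());   // prints "True"
        Console.WriteLine(ThrowsArrayTypeMismatch()); // prints "True"
    }
}
```

<p>The second case is the price of the ‘liberal casting rules’: the compiler accepts the assignment, so the element type stored with the array is what lets the runtime reject it.</p>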
<hr />
<h2 id="implementation-details">Implementation Details</h2>
<p>It turns out that large parts of the internal implementation of arrays are best described as <em>magic</em>, this Stack Overflow <a href="http://stackoverflow.com/questions/19914523/mystery-behind-system-array#comment29631862_19914523">comment from Marc Gravell sums it up nicely</a>:</p>
<blockquote>
<p>Arrays are basically voodoo. Because they pre-date generics, yet must allow on-the-fly type-creation (even in .NET 1.0), they are implemented using tricks, hacks, and sleight of hand.</p>
</blockquote>
<p>Yep that’s right, arrays were parametrised (i.e. generic) before generics even existed. That means you could create arrays such as <code class="language-plaintext highlighter-rouge">int[]</code> and <code class="language-plaintext highlighter-rouge">string[]</code>, long before you were able to write <code class="language-plaintext highlighter-rouge">List<int></code> or <code class="language-plaintext highlighter-rouge">List<string></code>, which only became possible in .NET 2.0.</p>
<h3 id="special-helper-classes">Special helper classes</h3>
<p>All this <em>magic</em> or <em>sleight of hand</em> is made possible by 2 things:</p>
<ul>
<li>The CLR breaking all the usual type-safety rules</li>
<li>A special array helper class called <code class="language-plaintext highlighter-rouge">SZArrayHelper</code></li>
</ul>
<p>But first the why, why were all these tricks needed? From <a href="https://blogs.msdn.microsoft.com/bclteam/2004/11/19/net-arrays-ilistt-generic-algorithms-and-what-about-stl-brian-grunkemeyer/">.NET Arrays, IList<T>, Generic Algorithms, and what about STL?</a>:</p>
<blockquote>
<p>When we were designing our generic collections classes, one of the things that bothered me was how to write a generic algorithm that would work on both arrays and collections. To drive generic programming, of course we must make arrays and generic collections as seamless as possible. It felt that there should be a simple solution to this problem <strong>that meant you shouldn’t have to write the same code twice, once taking an IList<T> and again taking a T[]</strong>. The solution that dawned on me was that arrays needed to implement our generic IList. We made arrays in V1 implement the non-generic IList, which was rather simple due to the lack of strong typing with IList and our base class for all arrays (System.Array). <strong>What we needed was to do the same thing in a strongly typed way for IList<T></strong>.</p>
</blockquote>
<p>But it was only done for the common case, i.e. ‘single dimensional’ arrays:</p>
<blockquote>
<p>There were some restrictions here though – <strong>we didn’t want to support multidimensional arrays since IList<T> only provides single dimensional accesses</strong>. Also, arrays with non-zero lower bounds are rather strange, and probably wouldn’t mesh well with IList<T>, where most people may iterate from 0 to the return from the Count property on that IList. So, <strong>instead of making System.Array implement IList<T>, we made T[] implement IList<T></strong>. Here, T[] means a single dimensional array with 0 as its lower bound (often called an SZArray internally, but I think Brad wanted to promote the term ‘vector’ publically at one point in time), and the element type is T. So Int32[] implements IList<Int32>, and String[] implements IList<String>.</p>
</blockquote>
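<p>The practical effect of that design can be sketched in a few lines of C#. The class <code class="language-plaintext highlighter-rouge">ArrayInterfaces</code> and its methods are hypothetical names used only for this example. Going by the quote, I would expect the non-zero lower bound array to report <code class="language-plaintext highlighter-rouge">False</code> as well, but I have left that line unannotated because it depends on the runtime:</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class: which kinds of array can be used as IList<int>?
public static class ArrayInterfaces
{
    // One generic algorithm that accepts both T[] and List<T>.
    public static T LastItem<T>(IList<T> items)
    {
        return items[items.Count - 1];
    }

    // Taking 'object' avoids compile-time shortcuts, so the check happens at run-time.
    public static bool IsIntList(object candidate)
    {
        return candidate is IList<int>;
    }

    public static void Main()
    {
        int[] vector = { 1, 2, 3 };
        int[,] matrix = new int[2, 2];
        // A single-dimensional array whose indexes start at 1 instead of 0
        Array oneBased = Array.CreateInstance(typeof(int), new[] { 3 }, new[] { 1 });

        Console.WriteLine(LastItem(vector));                    // prints "3"
        Console.WriteLine(LastItem(new List<int> { 4, 5, 6 })); // prints "6"
        Console.WriteLine(IsIntList(vector));                   // prints "True"
        Console.WriteLine(IsIntList(matrix));                   // prints "False"
        Console.WriteLine(IsIntList(oneBased));
    }
}
```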
<p>Also, this comment from the <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/array.cpp#L1369-L1428">array source code</a> sheds some further light on the reasons:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>//----------------------------------------------------------------------------------
// Calls to (IList<T>)(array).Meth are actually implemented by SZArrayHelper.Meth<T>
// This workaround exists for two reasons:
//
// - For working set reasons, we don't want insert these methods in the array
// hierachy in the normal way.
// - For platform and devtime reasons, we still want to use the C# compiler to
// generate the method bodies.
//
// (Though it's questionable whether any devtime was saved.)
//
// ....
//----------------------------------------------------------------------------------
</code></pre></div></div>
<p>So it was done for <em>convenience</em> and <em>efficiency</em>, as they didn’t want every instance of <code class="language-plaintext highlighter-rouge">System.Array</code> to carry around all the code for the <code class="language-plaintext highlighter-rouge">IEnumerable<T></code> and <code class="language-plaintext highlighter-rouge">IList<T></code> implementations.</p>
<p>This mapping takes place via a call to <a href="https://github.com/dotnet/coreclr/blob/a9b25d4aa22a1f4ad5f323f6c826e318f5a720fe/src/vm/methodtable.cpp#L6870-L6873">GetActualImplementationForArrayGenericIListOrIReadOnlyListMethod(..)</a>, which wins the prize for the best method name in the CoreCLR source!! It’s responsible for wiring up the corresponding method from the <a href="https://github.com/dotnet/coreclr/blob/68f72dd2587c3365a9fe74d1991f93612c3bc62a/src/mscorlib/src/System/Array.cs#L2595-L2778">SZArrayHelper</a> class, i.e. <code class="language-plaintext highlighter-rouge">IList<T>.Count</code> -> <code class="language-plaintext highlighter-rouge">SZArrayHelper.Count<T></code>, or, if the method is part of the <code class="language-plaintext highlighter-rouge">IEnumerator<T></code> interface, the <a href="https://github.com/dotnet/coreclr/blob/68f72dd2587c3365a9fe74d1991f93612c3bc62a/src/mscorlib/src/System/Array.cs#L2718-L2776">SZGenericArrayEnumerator<T></a> is used.</p>
<p>But this has the potential to cause security holes, as it breaks the normal C# type system guarantees, specifically regarding the <code class="language-plaintext highlighter-rouge">this</code> pointer. To illustrate the problem, here’s the source code of the <a href="https://github.com/dotnet/coreclr/blob/68f72dd2587c3365a9fe74d1991f93612c3bc62a/src/mscorlib/src/System/Array.cs#L2627-L2633"><code class="language-plaintext highlighter-rouge">Count</code> property</a>, note the call to <code class="language-plaintext highlighter-rouge">JitHelpers.UnsafeCast<T[]></code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">internal</span> <span class="kt">int</span> <span class="n">get_Count</span><span class="p"><</span><span class="n">T</span><span class="p">>()</span>
<span class="p">{</span>
<span class="c1">//! Warning: "this" is an array, not an SZArrayHelper. See comments above</span>
<span class="c1">//! or you may introduce a security hole!</span>
<span class="n">T</span><span class="p">[]</span> <span class="n">_this</span> <span class="p">=</span> <span class="n">JitHelpers</span><span class="p">.</span><span class="n">UnsafeCast</span><span class="p"><</span><span class="n">T</span><span class="p">[</span><span class="k">]></span><span class="p">(</span><span class="k">this</span><span class="p">);</span>
<span class="k">return</span> <span class="n">_this</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Yikes, it has to remap <code class="language-plaintext highlighter-rouge">this</code> to be able to call <code class="language-plaintext highlighter-rouge">Length</code> on the correct object!!</p>
<p>And just in case those comments aren’t enough, there is a very strongly worded comment <a href="https://github.com/dotnet/coreclr/blob/68f72dd2587c3365a9fe74d1991f93612c3bc62a/src/mscorlib/src/System/Array.cs#L2572-L2594">at the top of the class</a> that further spells out the risks!!</p>
<p>Generally all this magic is hidden from you, but occasionally it leaks out. For instance if you run the code below, <code class="language-plaintext highlighter-rouge">SZArrayHelper</code> will show up in the <code class="language-plaintext highlighter-rouge">StackTrace</code> and <code class="language-plaintext highlighter-rouge">TargetSite</code> properties of the <code class="language-plaintext highlighter-rouge">NotSupportedException</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="p">{</span>
<span class="kt">int</span><span class="p">[]</span> <span class="n">someInts</span> <span class="p">=</span> <span class="p">{</span> <span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">4</span> <span class="p">};</span>
<span class="n">IList</span><span class="p"><</span><span class="kt">int</span><span class="p">></span> <span class="n">collection</span> <span class="p">=</span> <span class="n">someInts</span><span class="p">;</span>
<span class="c1">// Throws NotSupportedException 'Collection is read-only'</span>
<span class="n">collection</span><span class="p">.</span><span class="nf">Clear</span><span class="p">();</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">NotSupportedException</span> <span class="n">nsEx</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"{0} - {1}"</span><span class="p">,</span> <span class="n">nsEx</span><span class="p">.</span><span class="n">TargetSite</span><span class="p">.</span><span class="n">DeclaringType</span><span class="p">,</span> <span class="n">nsEx</span><span class="p">.</span><span class="n">TargetSite</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">nsEx</span><span class="p">.</span><span class="n">StackTrace</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="removing-bounds-checks">Removing Bounds Checks</h3>
<p>The runtime also provides support for arrays in more conventional ways, the first of which is related to performance. Array bounds checks are all well and good when providing <em>memory-safety</em>, but they have a cost, so where possible the JIT removes any checks that it knows are redundant.</p>
<p>It does this by calculating the <em>range</em> of values that a <code class="language-plaintext highlighter-rouge">for</code> loop accesses and comparing it to the actual length of the array. If it determines that there is <em>never</em> an attempt to access an item outside the permissible bounds of the array, the run-time checks are then removed.</p>
<p>For more information, the links below take you to the areas of the JIT source code that deal with this:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/ec80b02b61839af453ce297faf4ce074edeee9da/src/jit/compiler.cpp#L4524-L4525">JIT trying to remove range checks</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/27b2300f790793733e501497203316ccad390e2b/src/jit/rangecheck.cpp#L201-L303">RangeCheck::OptimizeRangeCheck(..)</a>
<ul>
<li>In turn calls <a href="https://github.com/dotnet/coreclr/blob/27b2300f790793733e501497203316ccad390e2b/src/jit/rangecheck.cpp#L1261-L1290">RangeCheck::GetRange(..)</a></li>
<li>Also calls <a href="https://github.com/dotnet/coreclr/blob/c06fb332e7bb77a55bda724a56b33d6094a0a042/src/jit/optimizer.cpp#L7255-L7322">Compiler::optRemoveRangeCheck(..)</a> to actually remove the range-check</li>
</ul>
</li>
<li>Really informative source code comment <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/rangecheck.h#L5-L58">explaining the range check removal logic</a></li>
</ul>
<p>And if you are really keen, take a look at <a href="https://gist.github.com/mattwarren/a72cdb3ae427957af10635153d79555b#gistcomment-2075030">this gist</a> that I put together to explore the scenarios where bounds checks are ‘removed’ and ‘not removed’.</p>
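<p>As a rough illustration of the kind of loop shapes involved, the sketch below contrasts a loop bounded by the array’s own <code class="language-plaintext highlighter-rouge">Length</code> with one bounded by an unrelated parameter. The names are invented for this example, and whether a particular check is actually removed depends on the JIT version and settings, so treat the comments as expectations to confirm by inspecting the generated assembly rather than as guarantees:</p>

```csharp
using System;

// Hypothetical demo class: two loop shapes with different prospects for bounds-check removal.
public static class BoundsCheckShapes
{
    // The upper bound is data.Length, so every index is provably in range.
    // This is the shape where the JIT is expected to drop the per-access check.
    public static int SumUsingLength(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i];
        }
        return sum;
    }

    // The upper bound is a separate value the JIT knows nothing about,
    // so the check is expected to stay (and it fires if count > data.Length).
    public static int SumUsingCount(int[] data, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += data[i];
        }
        return sum;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3 };
        Console.WriteLine(SumUsingLength(data));   // prints "6"
        Console.WriteLine(SumUsingCount(data, 2)); // prints "3"
    }
}
```

<p>Both methods are equally memory-safe; the only difference is how much work the JIT can prove is redundant.</p>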
<h3 id="allocating-an-array">Allocating an array</h3>
<p>Another task that the runtime helps with is allocating arrays, using hand-written assembly code so the methods are as optimised as possible, see:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/0ec02d7375a1aa96206fd755b02e553e075ac3ae/src/vm/i386/jitinterfacex86.cpp#L885-L1109">JIT_TrialAlloc::GenAllocArray(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/0ec02d7375a1aa96206fd755b02e553e075ac3ae/src/vm/i386/jitinterfacex86.cpp#L1082-L1104">Patching in the assembly code</a></li>
</ul>
<h3 id="run-time-treats-arrays-differently">Run-time treats arrays differently</h3>
<p>Finally, because arrays are so intertwined with the CLR, there are lots of places in which they are dealt with as a <em>special-case</em>. For instance <a href="https://github.com/dotnet/coreclr/search?l=C%2B%2B&q=path%3A%2Fsrc+IsArray%28%29&type=&utf8=%E2%9C%93">a search for ‘IsArray()’</a> in the CoreCLR source returns over 60 hits, including:</p>
<ul>
<li>The method table for an array is built differently
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/a9b25d4aa22a1f4ad5f323f6c826e318f5a720fe/src/vm/classcompat.cpp#L543-L608">MethodTableBuilder::BuildInteropVTableForArray(..)</a></li>
</ul>
</li>
<li>When you call <code class="language-plaintext highlighter-rouge">ToString()</code> on an array, you get special formatting, i.e. ‘System.Int32[]’ or ‘MyClass[,]’
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/typestring.cpp#L903-L937">TypeString::AppendType(..)</a></li>
</ul>
</li>
</ul>
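<p>Some of this special treatment is visible from managed code through reflection. The snippet below is a small sketch (the class name <code class="language-plaintext highlighter-rouge">ArrayTypeFacts</code> is made up for it) that prints the array-specific formatting and a few of the type facts the runtime tracks for arrays. Note that <code class="language-plaintext highlighter-rouge">Type.IsArray</code> is the managed API, not the C++ <code class="language-plaintext highlighter-rouge">IsArray()</code> method from the search above:</p>

```csharp
using System;

// Hypothetical demo class: array-specific type information seen via reflection.
public static class ArrayTypeFacts
{
    public static void Main()
    {
        Console.WriteLine(new int[0].ToString());              // prints "System.Int32[]"
        Console.WriteLine(new string[0, 0].ToString());        // prints "System.String[,]"

        Console.WriteLine(typeof(int[]).IsArray);              // prints "True"
        Console.WriteLine(typeof(int[]).BaseType);             // prints "System.Array"
        Console.WriteLine(typeof(int[]).GetElementType());     // prints "System.Int32"
        Console.WriteLine(typeof(int[,]).GetArrayRank());      // prints "2"
    }
}
```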
<hr />
<p>So yes, it’s fair to say that arrays and the CLR have a <strong>Very Special Relationship</strong></p>
<hr />
<h2 id="further-reading">Further Reading</h2>
<p>As always, here are some more links for your enjoyment!!</p>
<ul>
<li><a href="https://github.com/ljw1004/csharpspec/blob/gh-pages/arrays.md">CSharp Specification for Arrays</a></li>
<li><a href="https://www.codeproject.com/Articles/20481/NET-Type-Internals-From-a-Microsoft-CLR-Perspecti?fid=459323&fr=26#20">.NET Type Internals - From a Microsoft CLR Perspective - ARRAYS</a></li>
<li><a href="http://web.archive.org/web/20081203124917/http://msdn.microsoft.com/msdnmag/issues/06/11/CLRInsideOut/">CLR INSIDE OUT - Investigating Memory Issues</a></li>
<li><a href="http://www.abhisheksur.com/2011/06/internals-of-array.html">Internals of Array</a></li>
<li><a href="http://www.abhisheksur.com/2011/09/internals-of-net-objects-and-use-of-sos.html">Internals of .NET Objects and Use of SOS</a></li>
<li><a href="https://windowsdebugging.wordpress.com/2012/04/07/memorylayoutofarrays/">Memory layout of .NET Arrays</a></li>
<li><a href="https://windowsdebugging.wordpress.com/2012/04/24/memorylayoutofarraysx64/">Memory Layout of .NET Arrays (x64)</a></li>
<li><a href="http://stackoverflow.com/questions/468832/why-are-multi-dimensional-arrays-in-net-slower-than-normal-arrays">Why are multi-dimensional arrays in .NET slower than normal arrays?</a></li>
<li><a href="http://stackoverflow.com/questions/11163297/how-do-arrays-in-c-sharp-partially-implement-ilistt/11164210#11164210">How do arrays in C# partially implement IList<T>?</a></li>
<li><a href="http://stackoverflow.com/questions/33632073/purpose-of-typedependencyattributesystem-szarrayhelper-for-ilistt-ienumer/33632407#33632407">Purpose of TypeDependencyAttribute(“System.SZArrayHelper”) for IList<T>, IEnumerable<T> and ICollection<T>?</a></li>
<li><a href="http://stackoverflow.com/questions/15341882/what-kind-of-class-does-yield-return-return/15341925#15341925">What kind of class does ‘yield return’ return</a></li>
<li><a href="http://labs.developerfusion.co.uk/SourceViewer/browse.aspx?assembly=SSCLI&namespace=System&type=SZArrayHelper">SZArrayHelper implemented in Shared Source CLI (SSCLI)</a></li>
</ul>
<h3 id="array-source-code-references">Array source code references</h3>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs">Array.cs</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/vm/array.cpp">array.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/vm/array.h">array.h</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2017/05/08/Arrays-and-the-CLR-a-Very-Special-Relationship/">Arrays and the CLR - a Very Special Relationship</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
The CLR Thread Pool 'Thread Injection' Algorithm2017-04-13T00:00:00+00:00http://www.mattwarren.org/2017/04/13/The-CLR-Thread-Pool-Thread-Injection-Algorithm
<p><strong>If you’re near London at the end of April, I’ll be speaking at <a href="http://2017.progscon.co.uk/">ProgSCon 2017</a> on <a href="http://2017.progscon.co.uk/cr3ativconference/microsoft-and-open-source-a-brave-new-world/">Microsoft and Open-Source – A ‘Brave New World’</a>. ProgSCon is a 1-day conference, with talks <a href="http://2017.progscon.co.uk/home/talks/">covering an eclectic range of topics</a>, you’ll learn lots!!</strong></p>
<hr />
<p>As part of a never-ending quest to explore the <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/">CoreCLR source code</a> I stumbled across the intriguingly titled <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/hillclimbing.cpp">‘HillClimbing.cpp’</a> source file. This post explains what it does and why.</p>
<h3 id="what-is-hill-climbing">What is ‘Hill Climbing’</h3>
<p>It turns out that ‘Hill Climbing’ is a general technique, from the Wikipedia page on the <a href="https://en.wikipedia.org/wiki/Hill_climbing">Hill Climbing Algorithm</a>:</p>
<blockquote>
<p>In computer science, hill climbing is a mathematical optimization technique which belongs to the family of local search. <strong>It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by incrementally changing a single element of the solution</strong>. If the change produces a better solution, an incremental change is made to the new solution, repeating until no further improvements can be found.</p>
</blockquote>
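<p>To make the general idea concrete, here is a minimal sketch of hill climbing over a single integer value. It is purely illustrative and is not taken from the CLR: <code class="language-plaintext highlighter-rouge">HillClimb</code> and the <code class="language-plaintext highlighter-rouge">ExampleThroughput</code> curve (which peaks at 8) are hypothetical names made up for this example.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Generic integer hill climbing: start from 'start', look at the two
// neighbours (x - 1 and x + 1) and move to whichever scores higher.
// Stops as soon as neither neighbour improves on the current score.
int HillClimb(double (*score)(int), int start, int lo, int hi)
{
    int current = start;
    while (true)
    {
        int best = current;
        if (current &gt; lo &amp;&amp; score(current - 1) &gt; score(best))
            best = current - 1;
        if (current &lt; hi &amp;&amp; score(current + 1) &gt; score(best))
            best = current + 1;
        if (best == current)
            return current; // local maximum reached
        current = best;
    }
}

// Hypothetical throughput curve (an assumption for this example only),
// with its peak at 8 'threads'.
double ExampleThroughput(int threads)
{
    double distance = threads - 8.0;
    return 100.0 - distance * distance;
}
</code></pre></div></div>
<p>Calling <code class="language-plaintext highlighter-rouge">HillClimb(ExampleThroughput, 1, 1, 32)</code> walks upwards one step at a time until neither neighbour scores higher. The real thread pool can’t evaluate a clean function like this, it only has noisy throughput measurements to go on, which is a large part of why the CLR version is more involved.</p>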
<p>But in the context of the CoreCLR, ‘Hill Climbing’ (HC) is used to control the rate at which threads are added to the Thread Pool, from the <a href="https://msdn.microsoft.com/en-gb/library/ff963549.aspx">MSDN page on ‘Parallel Tasks’</a>:</p>
<blockquote>
<p><strong>Thread Injection</strong></p>
<p>The .NET thread pool automatically manages the number of worker threads in the pool. It adds and removes threads according to built-in heuristics. The .NET thread pool has two main mechanisms for injecting threads: a starvation-avoidance mechanism that adds worker threads if it sees no progress being made on queued items and a <strong>hill-climbing</strong> heuristic that tries to <strong>maximize throughput</strong> while using as <strong>few threads as possible</strong>.
…
A goal of the <strong>hill-climbing</strong> heuristic is to improve the utilization of cores when threads are blocked by I/O or other wait conditions that stall the processor
….
<strong>The .NET thread pool has an opportunity to inject threads every time a work item completes or at 500 millisecond intervals, whichever is shorter</strong>. The thread pool uses this opportunity to try adding threads (or taking them away), guided by feedback from previous changes in the thread count. If adding threads seems to be helping throughput, the thread pool adds more; otherwise, it reduces the number of worker threads. This technique is called the <strong>hill-climbing</strong> heuristic.</p>
</blockquote>
<p>For more specifics on what the algorithm is doing, you can read the research paper <a href="https://www.researchgate.net/publication/228977836_Optimizing_concurrency_levels_in_the_net_threadpool_A_case_study_of_controller_design_and_implementation">Optimizing Concurrency Levels in the .NET ThreadPool</a> published by Microsoft, although if you want a brief outline of what it’s trying to achieve, this summary from the paper is helpful:</p>
<blockquote>
<p>In addition the controller should have:</p>
<ol>
<li><strong>short settling times</strong> so that cumulative throughput is maximized</li>
<li><strong>minimal oscillations</strong> since changing control settings incurs overheads that reduce throughput</li>
<li><strong>fast adaptation</strong> to changes in workloads and resource characteristics.</li>
</ol>
</blockquote>
<p>So maximise throughput, don’t add and then remove threads too fast, but still adapt quickly to changing work-loads, simple really!!</p>
<p>As an aside, after reading (and re-reading) the research paper I found it interesting that a considerable amount of it was dedicated to testing, as the following excerpt shows:</p>
<p><img src="/images/2017/04/Research paper - issues encountered - approaches used to solve them.png" alt="Research paper - issues encountered - approaches used to solve them" /></p>
<p>In fact the approach to testing was considered so important that they wrote an entire follow-up paper that discusses it, see <a href="http://dl.acm.org/citation.cfm?id=1688934">Configuring Resource Managers Using Model Fuzzing</a>.</p>
<hr />
<h3 id="why-is-it-needed">Why is it needed?</h3>
<p>Because, in short, just adding new threads doesn’t always increase throughput and ultimately having lots of threads has a cost. As <a href="https://github.com/dotnet/corefx/issues/2329#issuecomment-146964909">this comment from Eric Eilebrecht</a>, one of the authors of the research paper explains:</p>
<blockquote>
<p>Throttling thread creation is not only about the cost of creating a thread; it’s mainly about the cost of having a <strong>large number of running threads on an ongoing basis</strong>. For example:</p>
<ul>
<li>More threads means more <strong>context-switching</strong>, which adds CPU overhead. With a large number of threads, this can have a significant impact.</li>
<li>More threads means more <strong>active stacks</strong>, which impacts data locality. The more stacks a CPU is having to juggle in its various caches, the less effective those caches are.</li>
</ul>
<p>The <strong>advantage</strong> of more threads than logical processors is, of course, that we can keep the CPU busy if some of the threads are blocked, and so get more work done. But we need to be careful not to “overreact” to blocking, and end up hurting performance by having <strong>too many</strong> threads.</p>
</blockquote>
<p>Or in other words, from <a href="https://msdn.microsoft.com/en-us/magazine/ff960958.aspx">Concurrency - Throttling Concurrency in the CLR 4.0 ThreadPool</a></p>
<blockquote>
<p>As opposed to what may be intuitive, concurrency control is about <strong>throttling</strong> and <strong>reducing</strong> the number of work items that can be run in parallel in order to improve the worker ThreadPool throughput (that is, controlling the degree of concurrency is <strong>preventing work from running</strong>).</p>
</blockquote>
<p>So the algorithm was designed with all these criteria in mind and was then tested over a large range of scenarios, to ensure it actually worked! This is why it’s often said that you should just leave the .NET ThreadPool alone, not try and tinker with it. It’s been heavily tested to work across multiple situations and it was designed to adapt over time, so it should have you covered! (although of course, there are times <a href="http://joeduffyblog.com/2006/07/08/clr-thread-pool-injection-stuttering-problems/">when it doesn’t work perfectly</a>!!)</p>
<hr />
<h2 id="the-algorithm-in-action">The Algorithm in Action</h2>
<p>As the source is now available, we can actually play with the algorithm and try it out in a few scenarios to see what it does. It needs very few dependencies and therefore all the relevant code is contained in the following files:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/vm/hillclimbing.cpp">/src/vm/hillclimbing.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/vm/hillclimbing.h">/src/vm/hillclimbing.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/inc/complex.h">/src/inc/complex.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/inc/random.h">/src/inc/random.h</a></li>
</ul>
<p>(For comparison, there’s an implementation of the same algorithm in the <a href="https://github.com/mono/mono/blob/master/mono/metadata/threadpool-worker-default.c">Mono source code</a>)</p>
<p>I have a project <a href="https://github.com/mattwarren/HillClimbingClrThreadPool">up on my GitHub page</a> that allows you to test the hill-climbing algorithm in a self-contained console app. If you’re interested you can see the <a href="https://github.com/mattwarren/HillClimbingClrThreadPool/commit/0941998aeda345aeaaa44f88e8d3b99f18e23abb">changes/hacks</a> I had to do to get it building, although in the end it was pretty simple! (<strong>Update</strong> Kudos to <a href="https://github.com/cklutz">Christian Klutz</a> who <a href="https://github.com/cklutz/HillClimbing">ported my self-contained app to C#</a>, nice job!!)</p>
<p>The algorithm is controlled via the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md">following <code class="language-plaintext highlighter-rouge">HillClimbing_XXX</code> settings</a>:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Setting</th>
<th style="text-align: center">Default Value</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">HillClimbing_WavePeriod</td>
<td style="text-align: center">4</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_TargetSignalToNoiseRatio</td>
<td style="text-align: center">300</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_ErrorSmoothingFactor</td>
<td style="text-align: center">1</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_WaveMagnitudeMultiplier</td>
<td style="text-align: center">100</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_MaxWaveMagnitude</td>
<td style="text-align: center">20</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_WaveHistorySize</td>
<td style="text-align: center">8</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_Bias</td>
<td style="text-align: center">15</td>
<td>The ‘cost’ of a thread. 0 means drive for increased throughput regardless of thread count; higher values bias more against higher thread counts</td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_MaxChangePerSecond</td>
<td style="text-align: center">4</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_MaxChangePerSample</td>
<td style="text-align: center">20</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_MaxSampleErrorPercent</td>
<td style="text-align: center">15</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_SampleIntervalLow</td>
<td style="text-align: center">10</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_SampleIntervalHigh</td>
<td style="text-align: center">200</td>
<td> </td>
</tr>
<tr>
<td style="text-align: left">HillClimbing_GainExponent</td>
<td style="text-align: center">200</td>
<td>The exponent to apply to the gain, times 100. 100 means to use linear gain, higher values will enhance large moves and damp small ones</td>
</tr>
</tbody>
</table>
<p>Because I was using the code in a self-contained console app, I just <a href="https://github.com/mattwarren/HillClimbingClrThreadPool/blob/a99db86a48309d569b221194ede0392d14eaa243/hillclimbing.cpp#L54-L91">hard-coded the default values</a> into the source, but in the CLR it <em>appears</em> that you can modify these values at runtime.</p>
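<p>The note for <code class="language-plaintext highlighter-rouge">HillClimbing_GainExponent</code> is easier to follow with some numbers. Below is a rough sketch, assuming the proposed move has already been normalised to the range -1 to 1. That normalisation and both function names are assumptions made for illustration, this is not a copy of the CLR code.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cmath&gt;

// The setting is stored as 'exponent times 100', so 200 means 2.0.
double GainExponentFromSetting(int setting)
{
    return setting / 100.0;
}

// Raises the size of the move to the configured exponent, keeping its sign.
// Assumes 'move' is already normalised to the range [-1, 1].
double ApplyGainExponent(double move, int gainExponentSetting)
{
    const double exponent = GainExponentFromSetting(gainExponentSetting);
    const double sign = (move &gt;= 0.0) ? 1.0 : -1.0;
    return sign * std::pow(std::fabs(move), exponent);
}
</code></pre></div></div>
<p>With the default of 200 (an exponent of 2.0) a move of 0.5 becomes 0.25, whereas a move of 1.0 stays at 1.0, so small moves are damped far more than large ones. With a setting of 100 the move passes through unchanged, which is the ‘linear gain’ case.</p>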
<h3 id="working-with-the-hill-climbing-code">Working with the Hill Climbing code</h3>
<p>There are several things I discovered when implementing a simple test app that works with the algorithm:</p>
<ol>
<li>The calculation is triggered by calling the function <code class="language-plaintext highlighter-rouge">HillClimbingInstance.Update(currentThreadCount, sampleDuration, numCompletions, &threadAdjustmentInterval)</code> and the return value is the new ‘maximum thread count’ that the algorithm is proposing.</li>
<li>It calculates the desired number of threads based on the ‘current throughput’, which is the ‘# of tasks completed’ (<code class="language-plaintext highlighter-rouge">numCompletions</code>) during the current time-period (<code class="language-plaintext highlighter-rouge">sampleDuration</code> in seconds).</li>
<li>It also takes the current thread count (<code class="language-plaintext highlighter-rouge">currentThreadCount</code>) into consideration.</li>
<li>The core calculations (excluding error handling and house-keeping) are <a href="https://github.com/dotnet/coreclr/blob/e5faef44cac6e86b12b3b586742183293bdd34a7/src/vm/hillclimbing.cpp#L162-L288">only just over 100 LOC</a>, so it’s not too hard to follow.</li>
<li>It works on the <a href="https://github.com/dotnet/coreclr/blob/e5faef44cac6e86b12b3b586742183293bdd34a7/src/vm/hillclimbing.cpp#L162">basis of ‘transitions’</a> (<code class="language-plaintext highlighter-rouge">HillClimbingStateTransition</code>), first <code class="language-plaintext highlighter-rouge">Warmup</code>, then <code class="language-plaintext highlighter-rouge">Stabilizing</code> and will only recommend a new value once it’s moved into the <code class="language-plaintext highlighter-rouge">ClimbingMove</code> state.</li>
<li>The real .NET Thread Pool only increases the thread-count by one thread every 500 milliseconds. It keeps doing this until the ‘# of threads’ has reached the amount that the hill-climbing algorithm suggests. See <a href="https://github.com/dotnet/coreclr/blob/e5994fa5507a5f08058193ff26dc3698cd2e6444/src/vm/win32threadpool.h#L1085-L1101">ThreadpoolMgr::ShouldAdjustMaxWorkersActive()</a> and <a href="https://github.com/dotnet/coreclr/blob/e5faef44cac6e86b12b3b586742183293bdd34a7/src/vm/win32threadpool.cpp#L910-L992">ThreadpoolMgr::AdjustMaxWorkersActive()</a> for the code that handles this.</li>
<li>If it hasn’t got enough samples to do a ‘statistically significant’ calculation this algorithm will indicate this via the <code class="language-plaintext highlighter-rouge">threadAdjustmentInterval</code> variable. This means that you should not call <code class="language-plaintext highlighter-rouge">HillClimbingInstance.Update(..)</code> until another <code class="language-plaintext highlighter-rouge">threadAdjustmentInterval</code> milliseconds have elapsed. (link to <a href="https://github.com/dotnet/coreclr/blob/e5faef44cac6e86b12b3b586742183293bdd34a7/src/vm/hillclimbing.cpp#L105-L134">source code that calculates this</a>)</li>
<li>The current thread count is only <strong>decreased</strong> when threads complete their current task. At that point the current count is compared to the desired amount and if necessary a thread is ‘retired’</li>
<li>The algorithm only returns values that respect the limits specified by <a href="https://msdn.microsoft.com/en-us/library/system.threading.threadpool.setminthreads(v=vs.110).aspx">ThreadPool.SetMinThreads(..)</a> and <a href="https://msdn.microsoft.com/en-us/library/system.threading.threadpool.setmaxthreads(v=vs.110).aspx">ThreadPool.SetMaxThreads(..)</a> (link to the <a href="https://github.com/dotnet/coreclr/blob/e5faef44cac6e86b12b3b586742183293bdd34a7/src/vm/hillclimbing.cpp#L301-L305">code that handles this</a>)</li>
<li>In addition, it will only recommend increasing the thread count if the <a href="https://github.com/dotnet/coreclr/blob/e5faef44cac6e86b12b3b586742183293bdd34a7/src/vm/hillclimbing.cpp#L271-L275">CPU Utilization is below 95%</a></li>
</ol>
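<p>Several of the points above (throughput as completions per second, the min/max limits, the 95% CPU check and adding one thread at a time) can be sketched in a few lines. This is a simplified model, not the CLR implementation, and every name in it (<code class="language-plaintext highlighter-rouge">PoolLimits</code>, <code class="language-plaintext highlighter-rouge">ConstrainSuggestion</code> and so on) is hypothetical. Retiring threads when they finish their work is not modelled.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;algorithm&gt;

// Hypothetical limits, standing in for the values given to
// ThreadPool.SetMinThreads(..) and ThreadPool.SetMaxThreads(..).
struct PoolLimits
{
    int minThreads;
    int maxThreads;
};

// 'Current throughput': tasks completed per second of sample time.
double Throughput(int numCompletions, double sampleDurationSeconds)
{
    return numCompletions / sampleDurationSeconds;
}

// Applies two guards to a raw suggestion from the algorithm:
//  - an increase is ignored unless CPU utilisation is below 95%
//  - the result always stays within [minThreads, maxThreads]
int ConstrainSuggestion(int currentThreads, int suggestedThreads,
                        int cpuUtilizationPercent, PoolLimits limits)
{
    int target = suggestedThreads;
    if (target &gt; currentThreads &amp;&amp; cpuUtilizationPercent &gt;= 95)
        target = currentThreads;
    return std::max(limits.minThreads, std::min(target, limits.maxThreads));
}

// One injection opportunity: add at most one thread towards the target.
int InjectOneIfNeeded(int currentThreads, int targetThreads)
{
    return (currentThreads &lt; targetThreads) ? currentThreads + 1 : currentThreads;
}
</code></pre></div></div>
<p>In a test loop you would call something like <code class="language-plaintext highlighter-rouge">ConstrainSuggestion(..)</code> on each value returned by the algorithm and then <code class="language-plaintext highlighter-rouge">InjectOneIfNeeded(..)</code> once per interval, so the thread count creeps up to the target rather than jumping straight to it.</p>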
<p>First let’s look at the graphs that were <strong>published in the research paper</strong> from Microsoft (<a href="https://www.researchgate.net/publication/228977836_Optimizing_concurrency_levels_in_the_net_threadpool_A_case_study_of_controller_design_and_implementation">Optimizing Concurrency Levels in the .NET ThreadPool</a>):</p>
<p><a href="/images/2017/04/Hill Climbing v Old Threadpool Algorithm.png"><img src="/images/2017/04/Hill Climbing v Old Threadpool Algorithm.png" alt="Hill Climbing v Old Threadpool Algorithm" /></a></p>
<p>They clearly show the thread-pool adapting the number of threads (up and down) as the throughput changes, so it appears the algorithm is doing what it promises.</p>
<p>Now for a similar image using the <strong>self-contained test app I wrote</strong>. Now, my test app only <a href="https://github.com/mattwarren/HillClimbingClrThreadPool/blob/fcb4bd27049b9cf8b5ddf2e5037611e36516642e/program.cpp#L63-L145">pretends to add/remove threads</a> based on the results for the Hill Climbing algorithm, so it’s only an approximation of the real behaviour, but it does provide a nice way to see it in action outside of the CLR.</p>
<p>In this simple scenario, the work-load that we are asking the thread-pool to do is just moving up and then down (click for full-size image):</p>
<p><a href="/images/2017/04/results-smooth.png"><img src="/images/2017/04/results-smooth.png" alt="Output from self-contained test app - smooth" /></a></p>
<p>Finally, we’ll look at what the algorithm does in a more noisy scenario, here the current ‘work load’ randomly jumps around, rather than smoothly changing:</p>
<p><a href="/images/2017/04/results-random.png"><img src="/images/2017/04/results-random.png" alt="Output from self-contained test app - random" /></a></p>
<p>So with a combination of a very detailed <a href="https://msdn.microsoft.com/en-gb/library/ff963549.aspx">MSDN article</a>, an easy-to-read <a href="https://www.researchgate.net/publication/228977836_Optimizing_concurrency_levels_in_the_net_threadpool_A_case_study_of_controller_design_and_implementation">research paper</a> and most significantly having the <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/hillclimbing.cpp">source code available</a>, we are able to get an understanding of what the .NET Thread Pool is doing ‘under-the-hood’!</p>
<hr />
<h2 id="references">References</h2>
<ol>
<li><a href="https://msdn.microsoft.com/en-us/magazine/ff960958.aspx">Concurrency - Throttling Concurrency in the CLR 4.0 ThreadPool</a> (I recommend reading this article <strong>before</strong> reading the research papers)</li>
<li><a href="https://www.researchgate.net/publication/228977836_Optimizing_concurrency_levels_in_the_net_threadpool_A_case_study_of_controller_design_and_implementation">Optimizing Concurrency Levels in the .NET ThreadPool: A case study of controller design and implementation</a>
<ul>
<li>direct link <a href="https://www.researchgate.net/profile/Joseph_Hellerstein2/publication/228977836_Optimizing_concurrency_levels_in_the_net_threadpool_A_case_study_of_controller_design_and_implementation/links/0c96052d441508cb45000000/Optimizing-concurrency-levels-in-the-net-threadpool-A-case-study-of-controller-design-and-implementation.pdf">to PDF file</a></li>
</ul>
</li>
<li><a href="http://dl.acm.org/citation.cfm?id=1688934">Configuring Resource Managers Using Model Fuzzing: A Case Study of the .NET Thread Pool</a>
<ul>
<li>direct link <a href="http://webcourse.cs.technion.ac.il/236635/Winter2009-2010/hw/WCFiles/2.pdf">to PDF file</a></li>
</ul>
</li>
<li><a href="https://msdn.microsoft.com/en-gb/library/ff963549.aspx">MSDN page on ‘Parallel Tasks’</a> (see section on ‘Thread Injection’)</li>
<li><a href="http://www.google.com/patents/US20100083272">Patent US20100083272 - Managing pools of dynamic resources</a></li>
</ol>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://channel9.msdn.com/Shows/Going+Deep/Erika-Parsons-and-Eric-Eilebrecht--CLR-4-Inside-the-new-Threadpool">Erika Parsons and Eric Eilebrecht - CLR 4 - Inside the Thread Pool - Channel 9</a></li>
<li><a href="http://www.danielmoth.com/Blog/New-And-Improved-CLR-4-Thread-Pool-Engine.aspx">New and Improved CLR 4 Thread Pool Engine</a> (Work-stealing and Local Queues)</li>
<li><a href="http://aviadezra.blogspot.co.uk/2009/06/net-clr-thread-pool-work.html">.NET CLR Thread Pool Internals</a> (compares the new Hill Climbing algorithm, to the previous algorithm used in the Legacy Thread Pool)</li>
<li><a href="http://joeduffyblog.com/2006/07/08/clr-thread-pool-injection-stuttering-problems/">CLR thread pool injection, stuttering problems</a></li>
<li><a href="http://joeduffyblog.com/2007/03/04/why-the-clr-20-sp1s-threadpool-default-max-thread-count-was-increased-to-250cpu/">Why the CLR 2.0 SP1’s threadpool default max thread count was increased to 250/CPU</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/1754">Use a more dependable policy for thread pool thread injection</a> (CoreCLR GitHub Issue)</li>
<li><a href="https://github.com/dotnet/corefx/issues/2329">Use a more dependable policy for thread pool thread injection</a> (CoreFX GitHub Issue)</li>
<li><a href="https://gist.github.com/JonCole/e65411214030f0d823cb">ThreadPool Growth: Some Important Details</a></li>
<li><a href="https://www.codeproject.com/articles/3813/net-s-threadpool-class-behind-the-scenes">.NET’s ThreadPool Class - Behind The Scenes</a> (Based on SSCLI source, not CoreCLR)</li>
<li><a href="http://chabster.blogspot.co.uk/2013/04/clr-execution-context.html">CLR Execution Context</a> (in Russian, but Google Translate does a reasonable job)</li>
<li><a href="https://github.com/benaadams/ThreadPoolTaskTesting">Thread Pool + Task Testing (by Ben Adams)</a></li>
<li><a href="http://belliottsmith.com/injector/">The Injector: A new Executor for Java</a> (an improved thread-injector for the Java Thread Pool)</li>
</ol>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=14111369">Hacker News</a> and <a href="https://www.reddit.com/r/programming/comments/655xg2/the_clr_thread_pool_thread_injection_algorithm/">/r/programming</a></p>
<p>The post <a href="http://www.mattwarren.org/2017/04/13/The-CLR-Thread-Pool-Thread-Injection-Algorithm/">The CLR Thread Pool 'Thread Injection' Algorithm</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
The .NET IL Interpreter2017-03-30T00:00:00+00:00http://www.mattwarren.org/2017/03/30/The-.NET-IL-Interpreter
<p>Whilst writing a <a href="/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code/">previous blog post</a> I stumbled across the .NET Interpreter, tucked away in the source code. Although, if I’d made even the smallest amount of effort to look for it, I’d have easily found it via the <a href="https://github.com/dotnet/coreclr/find/master"><em>GitHub ‘magic’ file search</em></a>:</p>
<p><img src="/images/2017/03/GitHub file search for 'Interpreter'.png" alt="GitHub file search for 'Interpreter'" /></p>
<h3 id="usage-scenarios">Usage Scenarios</h3>
<p>Before we look at how to use it and what it does, it’s worth pointing out that the Interpreter is not really meant for production code. As far as I can tell, its main purpose is to allow you to get the CLR up and running on a new CPU architecture. Without the interpreter you wouldn’t be able to test <em>any</em> C# code until you had a fully functioning JIT that could emit machine code for you. For instance see <a href="https://github.com/dotnet/coreclr/pull/8594">‘[ARM32/Linux] Initial bring up of FEATURE_INTERPRETER’</a> and <a href="https://github.com/dotnet/coreclr/commit/8c4e60054ddb42298f3eebaf20c970d665474ae3">‘[aarch64] Enable the interpreter on linux as well</a>.</p>
<p>Also it lacks a few key features, most notably debugging support. That is, you can’t debug through C# code that has been interpreted, although you can of course debug the interpreter itself. From <a href="https://github.com/dotnet/coreclr/pull/10478">‘Tiered Compilation step 1’</a>:</p>
<blockquote>
<p>…. - the interpreter is not in good enough shape to run production code as-is. There are also some significant issues if you want debugging and profiling tools to work (which we do).</p>
</blockquote>
<p>You can see an example of this in <a href="https://github.com/dotnet/coreclr/issues/34">‘Interpreter: volatile ldobj appears to have incorrect semantics?’</a> (thanks to <a href="https://www.reddit.com/r/programming/comments/62hcde/the_c_interpreter/dfn3ycc/">alexrp</a> for telling me about this issue). There is also a fair amount of <code class="language-plaintext highlighter-rouge">TODO</code> <a href="https://gist.github.com/mattwarren/a7e567c3aacd1c85da86206ea729c66f">comments in the code</a>, although I haven’t verified what (if any) specific C# code breaks due to the missing functionality.</p>
<p>However, I think another really useful scenario for the Interpreter is to help you learn about the inner workings of the CLR. It’s <em>only</em> 8,000 lines long, it’s all in one file and, most significantly, it’s written in C++. The code that the CLR/JIT uses when compiling <em>for real</em> is spread across many files (the JIT on its own is over 200,000 L.O.C, spread across 100’s of files) and large amounts of it are hand-written in <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/amd64">raw assembly</a>.</p>
<p>In theory the Interpreter should work in the same way as the <em>full</em> runtime, albeit not as optimised. This means that it is much simpler and those of us who aren’t CLR and/or assembly experts can have a chance of working out what’s going on!</p>
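<p>If you’ve not looked at an interpreter before, the basic idea is a loop that reads one instruction at a time and acts on it, using an evaluation stack for intermediate values. The toy example below shows the technique. It is not based on the CLR Interpreter source: the opcodes borrow their names from IL, but <code class="language-plaintext highlighter-rouge">Instr</code> and <code class="language-plaintext highlighter-rouge">Interpret</code> are hypothetical names and the code assumes a well-formed program.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cstdint&gt;
#include &lt;vector&gt;

// A handful of opcodes, named after their IL counterparts.
enum class Op { Nop, LdcI4, Add, Mul, Ret };

struct Instr
{
    Op op;
    int32_t operand; // only used by LdcI4
};

// Executes the instructions one at a time against an evaluation stack,
// rather than compiling them to machine code first.
// Assumes a well-formed program (no stack underflow checks).
int32_t Interpret(const std::vector&lt;Instr&gt;&amp; code)
{
    std::vector&lt;int32_t&gt; stack;
    for (const Instr&amp; instr : code)
    {
        switch (instr.op)
        {
        case Op::Nop:
            break;
        case Op::LdcI4:
            stack.push_back(instr.operand);
            break;
        case Op::Add:
        case Op::Mul:
        {
            // Pop both operands, push the result
            int32_t right = stack.back(); stack.pop_back();
            int32_t left = stack.back(); stack.pop_back();
            stack.push_back(instr.op == Op::Add ? left + right : left * right);
            break;
        }
        case Op::Ret:
            return stack.back();
        }
    }
    return 0; // fell off the end without reaching a 'ret'
}
</code></pre></div></div>
<p>Every instruction goes through the <code class="language-plaintext highlighter-rouge">switch</code> each time it is executed, which gives a feel for why interpreted code is so much slower than code that has been compiled once by the JIT.</p>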
<h2 id="enabling-the-interpreter">Enabling the Interpreter</h2>
<p>The Interpreter is disabled by default, so you have to <a href="https://github.com/dotnet/coreclr/tree/master/Documentation#build-coreclr-from-source">build the CoreCLR from source</a> to make it work (it used to be the <a href="https://github.com/dotnet/coreclr/commit/8a47eafa69614589eb86bbdf0c2c36aa690c1b15">fallback for ARM64</a> but that’s no longer the case), here’s the diff of the changes you need to make:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- a/src/inc/switches.h
</span><span class="gi">+++ b/src/inc/switches.h
</span><span class="p">@@ -233,5 +233,8 @@</span>
#define FEATURE_STACK_SAMPLING
#endif // defined (ALLOW_SXS_JIT)
+// Let's test the .NET Interpreter!!
<span class="gi">+#define FEATURE_INTERPRETER
+
</span> #endif // !defined(CROSSGEN_COMPILE)
</code></pre></div></div>
<p>You also need to enable some environment variables, the ones that I used are in the table below. For the full list, take a look at <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md">Host Configuration Knobs</a> and search for ‘Interpreter’.</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Interpret</strong></td>
<td>Selectively uses the interpreter to execute the specified methods</td>
</tr>
<tr>
<td><strong>InterpreterDoLoopMethods</strong></td>
<td>If set, don’t check for loops, start by interpreting <em>all</em> methods</td>
</tr>
<tr>
<td><strong>InterpreterPrintPostMortem</strong></td>
<td>Prints summary information about the execution to the console</td>
</tr>
<tr>
<td><strong>DumpInterpreterStubs</strong></td>
<td>Prints all interpreter stubs that are created to the console</td>
</tr>
<tr>
<td><strong>TraceInterpreterEntries</strong></td>
<td>Logs entries to interpreted methods to the console</td>
</tr>
<tr>
<td><strong>TraceInterpreterIL</strong></td>
<td>Logs individual instructions of interpreted methods to the console</td>
</tr>
<tr>
<td><strong>TraceInterpreterVerbose</strong></td>
<td>Logs interpreter progress with detailed messages to the console</td>
</tr>
<tr>
<td><strong>TraceInterpreterJITTransition</strong></td>
<td>Logs when the interpreter determines a method should be JITted</td>
</tr>
</tbody>
</table>
</span>
<p>To test out the Interpreter, I will be using the code below:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">max</span> <span class="p">=</span> <span class="m">1000</span> <span class="p">*</span> <span class="m">1000</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">Length</span> <span class="p">></span> <span class="m">0</span><span class="p">)</span>
<span class="kt">int</span><span class="p">.</span><span class="nf">TryParse</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="m">0</span><span class="p">],</span> <span class="k">out</span> <span class="n">max</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">timer</span> <span class="p">=</span> <span class="n">Stopwatch</span><span class="p">.</span><span class="nf">StartNew</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="p"><=</span> <span class="n">max</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="p">%</span> <span class="p">(</span><span class="m">1000</span> <span class="p">*</span> <span class="m">100</span><span class="p">)</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="kt">string</span><span class="p">.</span><span class="nf">Format</span><span class="p">(</span><span class="s">"Completed {0,10:N0} iterations"</span><span class="p">,</span> <span class="n">i</span><span class="p">));</span>
<span class="p">}</span>
<span class="n">timer</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="kt">string</span><span class="p">.</span><span class="nf">Format</span><span class="p">(</span><span class="s">"Performed {0:N0} iterations"</span><span class="p">,</span> <span class="n">max</span><span class="p">));</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="kt">string</span><span class="p">.</span><span class="nf">Format</span><span class="p">(</span><span class="s">"Took {0:N0} msecs"</span><span class="p">,</span> <span class="n">timer</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">));</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>which on my machine, gives the following results for <code class="language-plaintext highlighter-rouge">100,000</code> iterations:</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: left">Run</th>
<th style="text-align: right">Compiled (msecs)</th>
<th style="text-align: right">Interpreted (msecs)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">1</td>
<td style="text-align: right">11</td>
<td style="text-align: right">4,393</td>
</tr>
<tr>
<td style="text-align: left">2</td>
<td style="text-align: right">11</td>
<td style="text-align: right">4,089</td>
</tr>
<tr>
<td style="text-align: left">3</td>
<td style="text-align: right">9</td>
<td style="text-align: right">4,416</td>
</tr>
</tbody>
</table>
</span>
<p>So yeah, you don’t want to be using the interpreter for any performance-sensitive code!</p>
<h3 id="diagnostic-output">Diagnostic Output</h3>
<p>In addition, diagnostic output is produced. Note that this is from a single iteration of the loop; otherwise it becomes too verbose to read.</p>
<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">Generating interpretation stub (# 1 = 0x1, hash = 0x91b7d02e) for ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main.</span>
<span class="nl">Skipping ConsoleApplication.Program</span><span class="o">:</span><span class="nf">.cctor</span>
<span class="nl">Entering method #1 (= 0x1)</span><span class="o">:</span> <span class="nf">ConsoleApplication.Program:Main(class).</span>
<span class="nl">arguments</span><span class="o">:</span>
<span class="nl">0</span><span class="o">:</span> <span class="nf">class: 0x0000000002C50568 (System.String[]) [...]</span>
<span class="nl">START 1, ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span>
<span class="nl">0</span><span class="o">:</span> <span class="nf">nop</span>
<span class="nl">0x1</span><span class="o">:</span> <span class="nf">call</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">.cctor</span>
<span class="nl">Skipping DomainBoundILStubClass</span><span class="o">:</span><span class="nf">IL_STUB_PInvoke</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">StartNew</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">.ctor</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">Reset</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">Start</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">GetTimestamp</span>
<span class="nl">Returning to method ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span><span class="p">,</span><span class="nf"> stub num 1.</span>
<span class="nl">0x6</span><span class="o">:</span> <span class="nf">stloc.0</span>
<span class="nl">loc0 </span><span class="o">:</span> <span class="nf">class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]</span>
<span class="nl">loc1 </span><span class="o">:</span> <span class="nf">int: 0</span>
<span class="nl">loc2 </span><span class="o">:</span> <span class="nf">bool: false</span>
<span class="nl">0x7</span><span class="o">:</span> <span class="nf">ldc.i4.1</span>
<span class="nl">0x8</span><span class="o">:</span> <span class="nf">stloc.1</span>
<span class="nl">loc0 </span><span class="o">:</span> <span class="nf">class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]</span>
<span class="nl">loc1 </span><span class="o">:</span> <span class="nf">int: 1</span>
<span class="nl">loc2 </span><span class="o">:</span> <span class="nf">bool: false</span>
<span class="nl">0x9</span><span class="o">:</span> <span class="nf">br.s</span>
<span class="nl">0x27</span><span class="o">:</span> <span class="nf">ldloc.1</span>
<span class="nl">0x28</span><span class="o">:</span> <span class="nf">ldc.i4.2</span>
<span class="nl">0x29</span><span class="o">:</span> <span class="nf">clt</span>
<span class="nl">0x2b</span><span class="o">:</span> <span class="nf">stloc.2</span>
<span class="nl">loc0 </span><span class="o">:</span> <span class="nf">class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]</span>
<span class="nl">loc1 </span><span class="o">:</span> <span class="nf">int: 1</span>
<span class="nl">loc2 </span><span class="o">:</span> <span class="nf">bool: true</span>
<span class="nl">0x2c</span><span class="o">:</span> <span class="nf">ldloc.2</span>
<span class="nl">0x2d</span><span class="o">:</span> <span class="nf">brtrue.s</span>
<span class="nl">0xb</span><span class="o">:</span> <span class="nf">nop</span>
<span class="nl">0xc</span><span class="o">:</span> <span class="nf">ldstr</span>
<span class="nl">0x11</span><span class="o">:</span> <span class="nf">ldloc.1</span>
<span class="nl">0x12</span><span class="o">:</span> <span class="nf">box</span>
<span class="nl">0x17</span><span class="o">:</span> <span class="nf">call</span>
<span class="nl">Returning to method ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span><span class="p">,</span><span class="nf"> stub num 1.</span>
<span class="nl">0x1c</span><span class="o">:</span> <span class="nf">call</span>
<span class="err">Completed</span> <span class="err">1</span> <span class="err">iterations</span>
<span class="nl">Returning to method ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span><span class="p">,</span><span class="nf"> stub num 1.</span>
<span class="nl">0x21</span><span class="o">:</span> <span class="nf">nop</span>
<span class="nl">0x22</span><span class="o">:</span> <span class="nf">nop</span>
<span class="nl">0x23</span><span class="o">:</span> <span class="nf">ldloc.1</span>
<span class="nl">0x24</span><span class="o">:</span> <span class="nf">ldc.i4.1</span>
<span class="nl">0x25</span><span class="o">:</span> <span class="nf">add</span>
<span class="nl">0x26</span><span class="o">:</span> <span class="nf">stloc.1</span>
<span class="nl">loc0 </span><span class="o">:</span> <span class="nf">class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]</span>
<span class="nl">loc1 </span><span class="o">:</span> <span class="nf">int: 2</span>
<span class="nl">loc2 </span><span class="o">:</span> <span class="nf">bool: true</span>
<span class="nl">0x27</span><span class="o">:</span> <span class="nf">ldloc.1</span>
<span class="nl">0x28</span><span class="o">:</span> <span class="nf">ldc.i4.2</span>
<span class="nl">0x29</span><span class="o">:</span> <span class="nf">clt</span>
<span class="nl">0x2b</span><span class="o">:</span> <span class="nf">stloc.2</span>
<span class="nl">loc0 </span><span class="o">:</span> <span class="nf">class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]</span>
<span class="nl">loc1 </span><span class="o">:</span> <span class="nf">int: 2</span>
<span class="nl">loc2 </span><span class="o">:</span> <span class="nf">bool: false</span>
<span class="nl">0x2c</span><span class="o">:</span> <span class="nf">ldloc.2</span>
<span class="nl">0x2d</span><span class="o">:</span> <span class="nf">brtrue.s</span>
<span class="nl">0x2f</span><span class="o">:</span> <span class="nf">ldloc.0</span>
<span class="nl">0x30</span><span class="o">:</span> <span class="nf">callvirt</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">Stop</span>
<span class="nl">Returning to method ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span><span class="p">,</span><span class="nf"> stub num 1.</span>
<span class="nl">0x35</span><span class="o">:</span> <span class="nf">nop</span>
<span class="nl">0x36</span><span class="o">:</span> <span class="nf">ldstr</span>
<span class="nl">0x3b</span><span class="o">:</span> <span class="nf">ldloc.0</span>
<span class="nl">0x3c</span><span class="o">:</span> <span class="nf">callvirt</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">get_ElapsedMilliseconds</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">GetElapsedDateTimeTicks</span>
<span class="nl">Skipping ConsoleApplication.Stopwatch</span><span class="o">:</span><span class="nf">GetRawElapsedTicks</span>
<span class="nl">Returning to method ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span><span class="p">,</span><span class="nf"> stub num 1.</span>
<span class="nl">0x41</span><span class="o">:</span> <span class="nf">box</span>
<span class="nl">0x46</span><span class="o">:</span> <span class="nf">call</span>
<span class="nl">Returning to method ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span><span class="p">,</span><span class="nf"> stub num 1.</span>
<span class="nl">0x4b</span><span class="o">:</span> <span class="nf">call</span>
<span class="err">Took</span> <span class="err">33</span> <span class="err">msecs</span>
<span class="nl">Returning to method ConsoleApplication.Program</span><span class="o">:</span><span class="nf">Main(class)</span><span class="p">,</span><span class="nf"> stub num 1.</span>
<span class="nl">0x50</span><span class="o">:</span> <span class="nf">nop</span>
<span class="nl">0x51</span><span class="o">:</span> <span class="nf">ret</span>
</code></pre></div></div>
<p>So you can clearly see the interpreter in action, executing the individual IL instructions and showing the current values of any local variables as it goes along. Then, once the entire program has run, you also get some nice summary statistics (this time from a full run; the <code class="language-plaintext highlighter-rouge">add</code> and <code class="language-plaintext highlighter-rouge">rem</code> counts correspond to <code class="language-plaintext highlighter-rouge">1,000,000</code> loop iterations):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IL instruction profiling:
Instructions (24000085 total, 20000083 1-byte):
Instruction | execs | % | cum %
-------------------------------------------
ldloc.1 | 3000011 | 12.50% | 12.50%
ceq | 3000001 | 12.50% | 25.00%
ldc.i4.0 | 3000001 | 12.50% | 37.50%
nop | 2000013 | 8.33% | 45.83%
stloc.2 | 2000001 | 8.33% | 54.17%
ldc.i4 | 2000001 | 8.33% | 62.50%
brtrue.s | 2000001 | 8.33% | 70.83%
ldloc.2 | 2000001 | 8.33% | 79.17%
ldc.i4.1 | 1000001 | 4.17% | 83.33%
cgt | 1000001 | 4.17% | 87.50%
stloc.1 | 1000001 | 4.17% | 91.67%
rem | 1000000 | 4.17% | 95.83%
add | 1000000 | 4.17% | 100.00%
call | 23 | 0.00% | 100.00%
ldstr | 11 | 0.00% | 100.00%
box | 11 | 0.00% | 100.00%
ldloc.0 | 2 | 0.00% | 100.00%
callvirt | 2 | 0.00% | 100.00%
br.s | 1 | 0.00% | 100.00%
stloc.0 | 1 | 0.00% | 100.00%
ret | 1 | 0.00% | 100.00%
</code></pre></div></div>
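<p>As a rough illustration of how a table like this could be put together, here is a minimal C++ sketch that turns per-instruction execution counts into the ‘execs’, ‘%’ and ‘cum %’ columns. The names <code class="language-plaintext highlighter-rouge">ProfileRow</code> and <code class="language-plaintext highlighter-rouge">BuildProfile</code> are made up for this example; it is not the interpreter’s own profiling code:</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One line of the summary table (hypothetical type, for illustration only).
struct ProfileRow {
    std::string instruction;
    std::uint64_t execs;
    double percent;      // share of all executed instructions
    double cumulative;   // running total of 'percent'
};

// Builds the table from a map of "instruction name -> execution count".
std::vector<ProfileRow> BuildProfile(const std::map<std::string, std::uint64_t>& counts)
{
    std::uint64_t total = 0;
    for (const auto& entry : counts)
        total += entry.second;

    std::vector<ProfileRow> rows;
    if (total == 0)
        return rows;     // nothing executed, so there are no percentages to report

    for (const auto& entry : counts)
        rows.push_back({entry.first, entry.second, 0.0, 0.0});

    // Most frequently executed instructions first.
    std::stable_sort(rows.begin(), rows.end(),
        [](const ProfileRow& a, const ProfileRow& b) { return a.execs > b.execs; });

    double running = 0.0;
    for (auto& row : rows)
    {
        row.percent = 100.0 * row.execs / total;
        running += row.percent;
        row.cumulative = running;
    }
    return rows;
}
```

<p>With the counts shown above, <code class="language-plaintext highlighter-rouge">ldloc.1</code> accounts for 3,000,011 of the 24,000,085 executed instructions, which is where the 12.50% in the first row comes from.</p>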
<hr />
<h2 id="main-sections-of-the-interpreter-code">Main sections of the Interpreter code</h2>
<p>Now we’ve seen it in action, let’s take a look at the code within the Interpreter and see <strong>how</strong> it works.</p>
<h3 id="top-level-dispatcher">Top-level dispatcher</h3>
<p>At the heart of the Interpreter is a <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L2073-L3261">giant switch statement</a> (in <code class="language-plaintext highlighter-rouge">Interpreter::ExecuteMethod(..)</code>), that is almost 1,200 lines long! In it you’ll find <em>lots</em> of code like this:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">switch</span> <span class="p">(</span><span class="o">*</span><span class="n">m_ILCodePtr</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="n">CEE_NOP</span><span class="p">:</span>
<span class="n">m_ILCodePtr</span><span class="o">++</span><span class="p">;</span>
<span class="k">continue</span><span class="p">;</span>
<span class="k">case</span> <span class="n">CEE_BREAK</span><span class="p">:</span> <span class="c1">// TODO: interact with the debugger?</span>
<span class="n">m_ILCodePtr</span><span class="o">++</span><span class="p">;</span>
<span class="k">continue</span><span class="p">;</span>
<span class="k">case</span> <span class="n">CEE_LDARG_0</span><span class="p">:</span>
<span class="n">LdArg</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="n">CEE_LDARG_1</span><span class="p">:</span>
<span class="n">LdArg</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In total, there are 199 <code class="language-plaintext highlighter-rouge">case</code> statements, corresponding to all the available CLR <a href="https://en.wikipedia.org/wiki/List_of_CIL_instructions">Intermediate Language (IL) op-codes</a>, in all their different combinations, for instance <code class="language-plaintext highlighter-rouge">CEE_LDC_??</code>, i.e. <code class="language-plaintext highlighter-rouge">CEE_LDC_I4</code>, <code class="language-plaintext highlighter-rouge">CEE_LDC_I8</code>, <code class="language-plaintext highlighter-rouge">CEE_LDC_R4</code> and <code class="language-plaintext highlighter-rouge">CEE_LDC_R8</code>. The large majority of the <code class="language-plaintext highlighter-rouge">case</code> statements just call out to another function that does the actual work, although there are some exceptions, <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L2268-L2391">such as <code class="language-plaintext highlighter-rouge">CEE_RET</code></a>.</p>
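<p>To make the pattern concrete, here is a heavily simplified sketch of a switch-based dispatch loop for a made-up stack machine. The op-codes (<code class="language-plaintext highlighter-rouge">OP_NOP</code>, <code class="language-plaintext highlighter-rouge">OP_PUSH_I1</code>, etc.) and the <code class="language-plaintext highlighter-rouge">Execute</code> function are hypothetical and far smaller than the real <code class="language-plaintext highlighter-rouge">Interpreter::ExecuteMethod(..)</code>; only the overall shape (read the op-code at the instruction pointer, switch on it, advance, repeat) is borrowed from the code above:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical op-codes for illustration; these are not the CLR's CEE_* values.
enum Op : std::uint8_t { OP_NOP, OP_PUSH_I1, OP_ADD, OP_MUL, OP_RET };

// Runs a byte-code sequence and returns the value on top of the stack at OP_RET.
// Assumes the sequence is well-formed and ends with OP_RET (no verification here).
std::int32_t Execute(const std::vector<std::uint8_t>& code)
{
    std::vector<std::int32_t> stack;        // the evaluation stack
    const std::uint8_t* ip = code.data();   // plays the role of m_ILCodePtr

    for (;;)
    {
        switch (*ip)
        {
        case OP_NOP:
            ip++;
            continue;
        case OP_PUSH_I1:                    // a signed one-byte operand follows the op-code
            stack.push_back(static_cast<std::int8_t>(ip[1]));
            ip += 2;
            continue;
        case OP_ADD:
        case OP_MUL:
        {
            assert(stack.size() >= 2);
            std::int32_t rhs = stack.back(); stack.pop_back();
            std::int32_t lhs = stack.back(); stack.pop_back();
            stack.push_back(*ip == OP_ADD ? lhs + rhs : lhs * rhs);
            ip++;
            continue;
        }
        case OP_RET:
            assert(!stack.empty());
            return stack.back();
        default:
            assert(!"unknown op-code");
            return 0;
        }
    }
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">Execute({OP_PUSH_I1, 2, OP_PUSH_I1, 3, OP_ADD, OP_PUSH_I1, 4, OP_MUL, OP_RET})</code> evaluates <code class="language-plaintext highlighter-rouge">(2 + 3) * 4</code>. The real interpreter follows the same loop, but most of its cases hand off to helper methods rather than doing the work inline.</p>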
<h3 id="method-calls">Method calls</h3>
<p>The other task that takes up lots of code in the interpreter is handling method calls, over 2,500 lines of code in total! This is spread across several methods, each doing a particular part of the work:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L8965-L10027">void Interpreter::DoCallWork(..)</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">CALL</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.call%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Calls the method indicated by the passed method descriptor</a></li>
<li><code class="language-plaintext highlighter-rouge">CALLVIRT</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.callvirt%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Calls a late-bound method on an object, pushing the return value onto the evaluation stack.</a></li>
<li>Also via <code class="language-plaintext highlighter-rouge">Interpreter::NewObj()</code>, i.e. the <code class="language-plaintext highlighter-rouge">NEWOBJ</code> IL op-code</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L10032-L10427">void Interpreter::CallI()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">CALLI</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.calli%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Calls the method indicated on the evaluation stack (as a pointer to an entry point) with arguments described by a calling convention</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L660-L1600">CorJitResult Interpreter::GenerateInterpreterStub(..)</a>
<ul>
<li>The external entry point, i.e. the <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/jitinterface.cpp#L11969-L12012">JIT inserts a stub to this method</a></li>
<li>Also called via <code class="language-plaintext highlighter-rouge">Interpreter::InterpretMethodBody(..)</code></li>
<li>Actually emits <strong>assembly code</strong>!!</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L195-L424">void InterpreterMethodInfo::InitArgInfo(..)</a>
<ul>
<li>Called via <code class="language-plaintext highlighter-rouge">Interpreter::GenerateInterpreterStub(..)</code></li>
</ul>
</li>
</ul>
<p>In summary, this work involves <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md">dynamically generating stubs</a> and ensuring that method arguments are in the right registers (hence the assembly code). It handles virtual methods, static and instance calls, delegates, intrinsics and probably a few other scenarios as well! In addition, if the method being called needs to be interpreted, it also has to make sure that happens.</p>
<h3 id="creating-objects-and-arrays">Creating objects and arrays</h3>
<p>The interpreter needs to handle some of the key functionality of a runtime, that is creating and initialising objects. To do this it has to call into the GC, before finally calling the constructor:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L5833-L6012">void Interpreter::NewObj()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">NEWOBJ</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.newobj%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Creates a new object or a new instance of a value type, pushing an object reference (type O) onto the evaluation stack</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L6015-L6085">void Interpreter::NewArr()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">NEWARR</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.newarr%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Pushes an object reference to a new zero-based, one-dimensional array whose elements are of a specific type onto the evaluation stack</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L5761-L5811">void Interpreter::InitObj()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">INITOBJ</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.initobj%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Initializes each field of the value type at a specified address to a null reference or a 0 of the appropriate primitive type</a></li>
</ul>
</li>
</ul>
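<p>The semantics of two of these op-codes are simple enough to sketch in a few lines of C++. This is only an approximation under stated assumptions: the helper names <code class="language-plaintext highlighter-rouge">InitObj</code> and <code class="language-plaintext highlighter-rouge">NewArr</code> mirror the interpreter methods but the bodies are mine, a <code class="language-plaintext highlighter-rouge">std::vector</code> stands in for a GC-allocated array, and a C++ exception stands in for the <code class="language-plaintext highlighter-rouge">OverflowException</code> that <code class="language-plaintext highlighter-rouge">NEWARR</code> raises for a negative length:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// initobj: set every field of a value type to zero / null, in place.
template <typename T>
void InitObj(T* location)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only handles plain value types");
    assert(location != nullptr);
    std::memset(location, 0, sizeof(T));
}

// newarr: create a zero-based, one-dimensional, zero-initialised array.
// A std::vector stands in for an array that the real runtime allocates via the GC.
std::vector<std::int32_t> NewArr(std::int32_t length)
{
    if (length < 0)
        throw std::overflow_error("negative array length");
    return std::vector<std::int32_t>(static_cast<std::size_t>(length), 0);
}
```

<p>The real <code class="language-plaintext highlighter-rouge">NewObj()</code> is considerably more involved, because after allocating it also has to invoke the constructor, which takes it back into the method-call machinery described earlier.</p>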
<h3 id="boxing-and-unboxing">Boxing and Unboxing</h3>
<p>Another large chunk of code is dedicated to boxing/unboxing, that is converting ‘value types’ (<code class="language-plaintext highlighter-rouge">structs</code>) into <code class="language-plaintext highlighter-rouge">object</code> references when needed, and back again. The .NET IL provides specific op-codes to handle this:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L8497-L8562">void Interpreter::Box()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">BOX</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.box%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Converts a value type to an object reference (type O)</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L8602-L8693">void Interpreter::Unbox()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">UNBOX</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Converts the boxed representation of a value type to its unboxed form</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L8747-L8871">void Interpreter::UnboxAny()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">UNBOX_ANY</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox_any%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">Converts the boxed representation of a type specified in the instruction to its unboxed form</a></li>
</ul>
</li>
</ul>
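<p>Conceptually, boxing copies a value into a heap object that carries a type header, and unboxing checks that header before copying the value back out. The following C++ sketch shows that idea only; <code class="language-plaintext highlighter-rouge">TypeTag</code> and <code class="language-plaintext highlighter-rouge">BoxedObject</code> are invented for the example (the runtime uses its own object headers and type handles), and a <code class="language-plaintext highlighter-rouge">std::shared_ptr</code> stands in for a GC reference:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical type tags, standing in for the runtime's own type handles.
enum class TypeTag { Int32, Double };

// A heap object: a header naming the type, followed by a copy of the value's bytes.
struct BoxedObject {
    TypeTag type;
    std::vector<unsigned char> payload;
};

// box: copy a value type onto the heap and return a reference to it.
template <typename T>
std::shared_ptr<BoxedObject> Box(TypeTag type, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only boxes plain value types");
    auto box = std::make_shared<BoxedObject>();
    box->type = type;
    box->payload.resize(sizeof(T));
    std::memcpy(box->payload.data(), &value, sizeof(T));
    return box;
}

// unbox: check the type header, then copy the value back out.
template <typename T>
T Unbox(TypeTag expected, const BoxedObject& box)
{
    // Unboxing to the wrong type fails (an InvalidCastException in .NET).
    if (box.type != expected)
        throw std::runtime_error("invalid cast");
    assert(box.payload.size() == sizeof(T));   // a matching tag implies a matching size
    T value;
    std::memcpy(&value, box.payload.data(), sizeof(T));
    return value;
}
```

<p>The type check is the reason unboxing cannot be a plain pointer dereference, and the heap allocation is the reason boxing is worth avoiding in hot paths.</p>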
<h3 id="loading-and-storing-data">Loading and Storing data</h3>
<p>That is, reading/writing fields in an object or elements in an array:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L7533-L7690">void Interpreter::StFld()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">STFLD</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.stfld(v=vs.110).aspx">Replaces the value stored in the field of an object reference or pointer with a new value</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L8246-L8385">void Interpreter::StElem()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">STELEM</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.stelem(v=vs.110).aspx">Replaces the array element at a given index with the value on the evaluation stack, whose type is specified in the instruction</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L7248-L7477">void Interpreter::LdFld(FieldDesc* fldIn)</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">LDFLD</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.ldfld(v=vs.110).aspx">Finds the value of a field in the object whose reference is currently on the evaluation stack</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L8120-L8243">void Interpreter::LdElem()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">LDELEM</code> <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.ldelem(v=vs.110).aspx">Loads the element at a specified array index onto the top of the evaluation stack as the type specified in the instruction</a></li>
</ul>
</li>
</ul>
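<p>As a sketch of what the array op-codes boil down to, the C++ below pops operands from an evaluation stack, bounds-checks the index and then reads or writes the element. It is simplified on purpose: <code class="language-plaintext highlighter-rouge">EvalStack</code> is a made-up type, it only handles <code class="language-plaintext highlighter-rouge">int</code> elements, and the array is passed as a parameter whereas in real IL the array reference is itself an operand on the stack:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Array = std::vector<std::int32_t>;

// Hypothetical evaluation stack holding only 32-bit integers.
struct EvalStack {
    std::vector<std::int32_t> values;

    void Push(std::int32_t v) { values.push_back(v); }

    std::int32_t Pop()
    {
        assert(!values.empty());
        std::int32_t v = values.back();
        values.pop_back();
        return v;
    }
};

// Shared bounds check (an IndexOutOfRangeException in .NET).
static std::size_t CheckedIndex(std::int32_t index, const Array& array)
{
    if (index < 0 || static_cast<std::size_t>(index) >= array.size())
        throw std::out_of_range("index out of range");
    return static_cast<std::size_t>(index);
}

// stelem: pops the value, then the index, and stores array[index] = value.
void StElem(EvalStack& stack, Array& array)
{
    std::int32_t value = stack.Pop();
    std::int32_t index = stack.Pop();
    array[CheckedIndex(index, array)] = value;
}

// ldelem: pops the index and pushes array[index].
void LdElem(EvalStack& stack, const Array& array)
{
    std::int32_t index = stack.Pop();
    stack.Push(array[CheckedIndex(index, array)]);
}
```

<p>The real methods are much longer mainly because they have to cope with every element type (object references, structs, floats and so on) rather than just one.</p>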
<h3 id="other-specific-il-op-codes">Other Specific IL Op Codes</h3>
<p>There is also a significant amount of code (over 1,000 lines) that just deals with low-level operations, that is ‘comparisons’, ‘branching’ and ‘basic arithmetic’:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L6714-L7127">INT32 Interpreter::CompareOpRes(..)</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">CEQ</code>, <code class="language-plaintext highlighter-rouge">CGT</code>, <code class="language-plaintext highlighter-rouge">CGT_UN</code>, <code class="language-plaintext highlighter-rouge">CLT</code> and <code class="language-plaintext highlighter-rouge">CLT_UN</code> called via <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L6694-L6710">Interpreter::CompareOp()</a></li>
<li><code class="language-plaintext highlighter-rouge">BEQ</code>, <code class="language-plaintext highlighter-rouge">BGE</code>, <code class="language-plaintext highlighter-rouge">BGT</code>, <code class="language-plaintext highlighter-rouge">BLE</code>, <code class="language-plaintext highlighter-rouge">BLT</code>, <code class="language-plaintext highlighter-rouge">BNE_UN</code>, <code class="language-plaintext highlighter-rouge">BGE_UN</code>, <code class="language-plaintext highlighter-rouge">BGT_UN</code>, <code class="language-plaintext highlighter-rouge">BLE_UN</code>, <code class="language-plaintext highlighter-rouge">BLT_UN</code> called via <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L7199-L7245">Interpreter::BrOnComparison()</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L4353-L4608">void Interpreter::BinaryArithOp()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">ADD</code>, <code class="language-plaintext highlighter-rouge">SUB</code>, <code class="language-plaintext highlighter-rouge">MUL</code>, <code class="language-plaintext highlighter-rouge">DIV</code> and <code class="language-plaintext highlighter-rouge">REM</code></li>
<li>in turn calls <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.hpp#L266-L313">Interpreter::BinaryArithOpWork(..)</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L4612-L4823">void Interpreter::BinaryArithOvfOp()</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">ADD_OVF</code>, <code class="language-plaintext highlighter-rouge">ADD_OVF_UN</code>, <code class="language-plaintext highlighter-rouge">MUL_OVF</code>, <code class="language-plaintext highlighter-rouge">MUL_OVF_UN</code>, <code class="language-plaintext highlighter-rouge">SUB_OVF</code>, <code class="language-plaintext highlighter-rouge">SUB_OVF_UN</code></li>
<li>in turn calls <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L4825-L4866">Interpreter::BinaryArithOvfOpWork(..)</a></li>
</ul>
</li>
</ul>
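<p>The difference between the plain and the <code class="language-plaintext highlighter-rouge">_OVF</code> / <code class="language-plaintext highlighter-rouge">_UN</code> variants is easiest to see in code. The sketch below covers 32-bit integers only, and the function names and the <code class="language-plaintext highlighter-rouge">ArithOp</code> enum are illustrative rather than taken from the interpreter: comparisons push <code class="language-plaintext highlighter-rouge">1</code> or <code class="language-plaintext highlighter-rouge">0</code>, <code class="language-plaintext highlighter-rouge">add</code> silently wraps, and <code class="language-plaintext highlighter-rouge">add.ovf</code> throws (an <code class="language-plaintext highlighter-rouge">OverflowException</code> in .NET):</p>

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// ceq / clt produce an int32 (1 or 0), not a separate boolean type.
std::int32_t Ceq(std::int32_t a, std::int32_t b) { return a == b ? 1 : 0; }
std::int32_t Clt(std::int32_t a, std::int32_t b) { return a < b ? 1 : 0; }

// clt.un: the same bits, compared as unsigned values.
std::int32_t CltUn(std::int32_t a, std::int32_t b)
{
    return static_cast<std::uint32_t>(a) < static_cast<std::uint32_t>(b) ? 1 : 0;
}

// add: wraps around on overflow. The arithmetic is done in unsigned to avoid
// undefined behaviour; the conversion back is modular (guaranteed since C++20).
std::int32_t Add(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}

// add.ovf: computes in a wider type and throws if the result does not fit.
std::int32_t AddOvf(std::int32_t a, std::int32_t b)
{
    std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide < std::numeric_limits<std::int32_t>::min() ||
        wide > std::numeric_limits<std::int32_t>::max())
        throw std::overflow_error("arithmetic overflow");
    return static_cast<std::int32_t>(wide);
}

enum class ArithOp { Add, AddOvf };

// Pops two operands from the evaluation stack and pushes the result.
void BinaryArithOp(std::vector<std::int32_t>& stack, ArithOp op)
{
    assert(stack.size() >= 2);
    std::int32_t rhs = stack.back(); stack.pop_back();
    std::int32_t lhs = stack.back(); stack.pop_back();
    stack.push_back(op == ArithOp::Add ? Add(lhs, rhs) : AddOvf(lhs, rhs));
}
```

<p>The real code has to repeat this kind of logic for every combination of operand types that IL allows, which goes a long way towards explaining the line count.</p>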
<h3 id="working-with-the-garbage-collector-gc">Working with the Garbage Collector (GC)</h3>
<p>In addition, the interpreter has to provide the GC with the information it needs. This happens when the GC calls <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L3667-L3762">Interpreter::GCScanRoots(..)</a>, with additional work taking place in <a href="https://github.com/dotnet/coreclr/blob/48e244855c98c6f280c986d0981238f403a49ff3/src/vm/interpreter.cpp#L3765-L3795">Interpreter::GCScanRootAtLoc(..)</a>. Put simply, the interpreter has to let the GC know about any ‘root’ objects that are currently ‘live’. This includes static variables and any local variables in the function that is currently executing.</p>
<p>When the interpreter locates a ‘root’ object, it notifies the GC via a callback (<code class="language-plaintext highlighter-rouge">pf(..)</code>):</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="n">Interpreter</span><span class="o">::</span><span class="n">GCScanRootAtLoc</span><span class="p">(</span><span class="n">Object</span><span class="o">**</span> <span class="n">loc</span><span class="p">,</span> <span class="n">InterpreterType</span> <span class="n">it</span><span class="p">,</span> <span class="n">promote_func</span><span class="o">*</span> <span class="n">pf</span><span class="p">,</span> <span class="n">ScanContext</span><span class="o">*</span> <span class="n">sc</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">pinningRef</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">it</span><span class="p">.</span><span class="n">ToCorInfoType</span><span class="p">())</span>
<span class="p">{</span>
<span class="k">case</span> <span class="n">CORINFO_TYPE_CLASS</span><span class="p">:</span>
<span class="k">case</span> <span class="n">CORINFO_TYPE_STRING</span><span class="p">:</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">flags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pinningRef</span><span class="p">)</span> <span class="n">flags</span> <span class="o">|=</span> <span class="n">GC_CALL_PINNED</span><span class="p">;</span>
<span class="p">(</span><span class="o">*</span><span class="n">pf</span><span class="p">)(</span><span class="n">loc</span><span class="p">,</span> <span class="n">sc</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">....</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
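<p>To show the callback pattern in isolation, here is a small self-contained sketch. The <code class="language-plaintext highlighter-rouge">Object</code>, <code class="language-plaintext highlighter-rouge">Slot</code> and <code class="language-plaintext highlighter-rouge">ScanContext</code> types, and the <code class="language-plaintext highlighter-rouge">ScanRoots</code> and <code class="language-plaintext highlighter-rouge">MarkObject</code> functions, are toy stand-ins I’ve made up; only the idea of walking the locals and invoking a ‘promote’ callback with the <em>address</em> of each reference comes from the code above:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object { bool marked = false; };

// Hypothetical slot kinds, standing in for the CORINFO_TYPE_* values.
enum class SlotType { Int, Class };

struct Slot {
    SlotType type;
    Object* ref;      // only meaningful when type == SlotType::Class
};

struct ScanContext { std::size_t rootsReported = 0; };

// Same shape as the pf(..) callback: (location of the reference, context, flags).
using PromoteFunc = void(Object**, ScanContext*, unsigned);

// Reports every object reference held in a frame's locals to the GC's callback.
void ScanRoots(std::vector<Slot>& locals, PromoteFunc* pf, ScanContext* sc)
{
    assert(pf != nullptr && sc != nullptr);
    for (Slot& slot : locals)
    {
        switch (slot.type)
        {
        case SlotType::Class:
            // Pass the slot's address, so a moving collector could update the reference.
            (*pf)(&slot.ref, sc, 0);
            break;
        case SlotType::Int:
            break;    // primitives are not roots
        }
    }
}

// A toy 'GC' callback that just marks whatever the root points at.
void MarkObject(Object** loc, ScanContext* sc, unsigned /*flags*/)
{
    if (*loc != nullptr)
    {
        (*loc)->marked = true;
        sc->rootsReported++;
    }
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">ScanRoots(locals, &amp;MarkObject, &amp;sc)</code> marks only the objects that a <code class="language-plaintext highlighter-rouge">Class</code> slot actually refers to; anything left unmarked would be eligible for collection.</p>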
<h2 id="integration-with-the-virtual-machine-vm">Integration with the Virtual Machine (VM)</h2>
<p>Finally, whilst the Interpreter is fairly self-contained, there are times when it needs to work with the rest of the runtime:</p>
<ul>
<li>The runtime is responsible for <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/ceemain.cpp#L816-L818">starting</a> and <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/ceemain.cpp#L1824-L1826">stopping</a> the interpreter</li>
<li>The JIT <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/jitinterface.cpp#L11969-L12012">wires up interpreter stubs</a> or uses them as a fall-back if JIT compilation fails. In addition the JIT ‘pre-stubs’ allow for interpreted methods <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/prestub.cpp#L255-L655">when calling the JIT itself</a> and when <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/prestub.cpp#L1146-L1633">the ‘pre-stub’ is executed</a></li>
<li>Stack-walking <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/stackwalk.cpp#L80-L158">takes account of interpreter frames</a>, by utilising <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/frames.cpp#L1030-L1049">InterpreterFrame data structures</a></li>
<li>When looking up the <code class="language-plaintext highlighter-rouge">MethodDesc</code> for a given code address, the <a href="https://github.com/dotnet/coreclr/blob/1c4fda612e8a4f0d48346c477d058fa3fddf514e/src/vm/methodtable.cpp#L7524-L7535">interpreter stubs are accounted for</a></li>
</ul>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=14007489">HackerNews</a> and <a href="https://www.reddit.com/r/programming/comments/62hcde/the_c_interpreter/">/r/programming</a></p>
<p>The post <a href="http://www.mattwarren.org/2017/03/30/The-.NET-IL-Interpreter/">The .NET IL Interpreter</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
A Hitchhikers Guide to the CoreCLR Source Code2017-03-23T00:00:00+00:00http://www.mattwarren.org/2017/03/23/Hitchhikers-Guide-to-the-CoreCLR-Source-Code
<link rel="stylesheet" href="/datavis/treemap-coreclr.css" />
<script src="https://d3js.org/d3.v4.min.js"></script>
<script src="/datavis/treemap-coreclr.js" type="text/javascript"></script>
<p><a href="https://www.flickr.com/photos/toddle_email_newsletters/18056890646"><img src="/images/2017/03/Towel Day - Dont Panic - Douglas Adams - The Hitchhikers Guide to the Galaxy.jpg" alt="Towel Day - Dont Panic - Douglas Adams - The Hitchhikers Guide to the Galaxy" /></a></p>
<p><strong>photo by <a href="http://audiencestack.com/static/blog.html">Alan O’Rourke</a></strong></p>
<p>Just over 2 years ago Microsoft open-sourced the entire .NET framework. This post attempts to provide a ‘Hitchhikers Guide’ to the source code found in the <a href="https://github.com/dotnet/coreclr">CoreCLR GitHub repository</a>.</p>
<p>To make it easier for you to get to the information you’re interested in, this post is split into several parts:</p>
<ul>
<li><a href="#overall-stats">Overall Stats</a></li>
<li><a href="#top-10-lists">‘Top 10’ lists</a></li>
<li><a href="#high-level-overview">High-level Overview</a></li>
<li><a href="#deep-dive-into-individual-areas">Deep Dive into Individual Areas</a>
<ul>
<li><a href="#mscorlib">mscorlib (C# code)</a></li>
<li><a href="#vm-virtual-machine">Virtual Machine (VM)</a></li>
<li><a href="#jit-just-in-time-compiler">Just-in-Time compiler (JIT)</a></li>
<li><a href="#pal-platform-adaptation-layer">Platform Adaptation Layer (PAL)</a></li>
<li><a href="#gc-garbage-collector">Garbage Collector (GC)</a></li>
<li><a href="#debug">Debugger</a></li>
</ul>
</li>
<li><a href="#all-the-rest">All the rest</a></li>
</ul>
<p>It’s worth pointing out that the .NET developers have provided 2 excellent glossaries, the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/glossary.md">CoreCLR one</a> and the <a href="https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/glossary.md">CoreFX one</a>, so if you come across any unfamiliar terms or abbreviations, check these first. There is also extensive <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/">documentation available</a>, and if you are interested in the low-level details I <em>really</em> recommend checking out the <a href="https://github.com/dotnet/coreclr/tree/master/Documentation/botr">‘Book of the Runtime’ (BotR)</a>.</p>
<hr />
<h2 id="overall-stats">Overall Stats</h2>
<p>If you take a look at the repository on GitHub, it shows the following stats for the entire repo:</p>
<p><img src="/images/2017/03/CoreCLR GitHub repo info.png" alt="CoreCLR GitHub repo info" /></p>
<p>But most of the C# code is test code, so if we just look under <a href="https://github.com/dotnet/coreclr/tree/master/src"><code class="language-plaintext highlighter-rouge">/src</code></a> (i.e. ignore any code under <a href="https://github.com/dotnet/coreclr/tree/master/tests"><code class="language-plaintext highlighter-rouge">/tests</code></a>) there is the following mix of <strong>Source</strong> file types (i.e. excluding ‘.txt’, ‘.dat’, etc.):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> - 2,012 .cpp
- 1,183 .h
- 956 .cs
- 113 .inl
- 98 .hpp
- 51 .S
- 43 .py
- 42 .asm
- 24 .idl
- 20 .c
</code></pre></div></div>
<p>So by far the majority of the code is written in C++, but there is still also a fair amount of C# code (all under <a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib">‘mscorlib’</a>). Clearly there are low-level parts of the CLR that have to be written in C++ or Assembly code because they need to be ‘close to the metal’ or have high performance, but it’s interesting that there are large parts of the runtime written in managed code itself.</p>
<p><strong>Note</strong>: All stats/lists in the post were calculated using <a href="https://github.com/dotnet/coreclr/commit/51a6b5ce75c853e77266b8e1ce8c264736d2aabe">commit 51a6b5c</a> from the 9th March 2017.</p>
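<p>If you want to produce a similar breakdown for your own checkout, the following is a minimal sketch in Python. It is not the tool used for the numbers above; the function name <code class="language-plaintext highlighter-rouge">count_source_files</code> and the extension set (copied from the list above) are assumptions made for illustration, and the matching is case-sensitive, so ‘.S’ and ‘.s’ are treated as different extensions.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
from collections import Counter

# Extensions treated as "source" files, copied from the list above.
SOURCE_EXTENSIONS = {".cpp", ".h", ".cs", ".inl", ".hpp", ".S", ".py", ".asm", ".idl", ".c"}


def count_source_files(root, extensions=SOURCE_EXTENSIONS):
    """Count the files under root, grouped by extension.

    Files whose extension is not in `extensions` are ignored.
    """
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1]
            if extension in extensions:
                counts[extension] += 1
    return counts
</code></pre></div></div>
<p>Calling <code class="language-plaintext highlighter-rouge">count_source_files("coreclr/src")</code> (a hypothetical path to a local checkout) returns a <code class="language-plaintext highlighter-rouge">Counter</code>, so <code class="language-plaintext highlighter-rouge">.most_common()</code> lists the extensions from most to least frequent.</p>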
<h3 id="compared-to-rotor">Compared to ‘Rotor’</h3>
<p>As a comparison, here’s what the stats for <a href="https://en.wikipedia.org/wiki/Shared_Source_Common_Language_Infrastructure">‘Rotor’ the Shared Source CLI</a> looked like back in October 2002. Rotor was ‘Shared Source’, not truly ‘Open Source’, so it didn’t have the same community involvement as the CoreCLR.</p>
<p><a href="/images/2017/03/Shared Source CLI Stats - Oct 2002.jpg"><img src="/images/2017/03/Shared Source CLI Stats - Oct 2002.jpg" alt="Shared Source CLI Stats - Oct 2002" /></a></p>
<p><strong>Note:</strong> SSCLI aka ‘Rotor’ includes the fx or base class libraries (BCL), but the CoreCLR doesn’t, as they are now hosted separately in the <a href="https://github.com/dotnet/corefx">CoreFX GitHub repository</a>.</p>
<p>For reference, the equivalent stats for the CoreCLR source in March 2017 look like this:</p>
<ul>
<li>Packaged as a 61.2 MB .zip archive
<ul>
<li>Over 10.8 million lines of code (2.6 million of source code, under \src)</li>
<li>24,485 Files (7,466 source)
<ul>
<li>6,626 C# (956 source)</li>
<li>2,074 C and C++</li>
<li>3,701 IL</li>
<li>93 Assembler</li>
<li>43 Python</li>
<li>6 Perl</li>
</ul>
</li>
</ul>
</li>
<li>Over 8.2 million lines of test code</li>
<li>Build output expands to over 1.2 GB with tests
<ul>
<li>Product binaries 342 MB</li>
<li>Test binaries 909 MB</li>
</ul>
</li>
</ul>
<hr />
<h2 id="top-10-lists">Top 10 lists</h2>
<p>These lists are mostly just for fun, but they do give some insights into the code-base and how it’s structured.</p>
<h3 id="top-10-largest-files">Top 10 Largest Files</h3>
<p>You might have heard about the mammoth source file that is <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/gc.cpp">gc.cpp</a>, which is so large that GitHub refuses to display it.</p>
<p>But it turns out it’s not the only large file in the source; there are also several files in the JIT that are around 20K LOC. However, it seems that all the large files are C++ source code, so if you’re only interested in C# code, you don’t have to worry!</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: left">File</th>
<th style="text-align: left"># Lines of Code</th>
<th style="text-align: center">Type</th>
<th style="text-align: left">Location</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/gc/gc.cpp">gc.cpp</a></td>
<td style="text-align: left">37,037</td>
<td style="text-align: center">.cpp</td>
<td style="text-align: left">\src\gc\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/flowgraph.cpp">flowgraph.cpp</a></td>
<td style="text-align: left">24,875</td>
<td style="text-align: center">.cpp</td>
<td style="text-align: left">\src\jit\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/codegenlegacy.cpp">codegenlegacy.cpp</a></td>
<td style="text-align: left">21,727</td>
<td style="text-align: center">.cpp</td>
<td style="text-align: left">\src\jit\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/importer.cpp">importer.cpp</a></td>
<td style="text-align: left">18,680</td>
<td style="text-align: center">.cpp</td>
<td style="text-align: left">\src\jit\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/morph.cpp">morph.cpp</a></td>
<td style="text-align: left">18,381</td>
<td style="text-align: center">.cpp</td>
<td style="text-align: left">\src\jit\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/inc/isolationpriv.h">isolationpriv.h</a></td>
<td style="text-align: left">18,263</td>
<td style="text-align: center">.h</td>
<td style="text-align: left">\src\inc\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/pal/prebuilt/inc/cordebug.h">cordebug.h</a></td>
<td style="text-align: left">18,111</td>
<td style="text-align: center">.h</td>
<td style="text-align: left">\src\pal\prebuilt\inc\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/gentree.cpp">gentree.cpp</a></td>
<td style="text-align: left">17,177</td>
<td style="text-align: center">.cpp</td>
<td style="text-align: left">\src\jit\</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/master/src/debug/ee/debugger.cpp">debugger.cpp</a></td>
<td style="text-align: left">16,975</td>
<td style="text-align: center">.cpp</td>
<td style="text-align: left">\src\debug\ee\</td>
</tr>
</tbody>
</table>
</span>
<h3 id="top-10-longest-methods">Top 10 Longest Methods</h3>
<p>The large methods aren’t actually that hard to find, because they all have <code class="language-plaintext highlighter-rouge">#pragma warning(disable:21000)</code> before them, to keep the compiler happy! There are ~40 large methods in total; here’s the ‘Top 10’:</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: left">Method</th>
<th style="text-align: right"># Lines of Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/vm/mlinfo.cpp#L1501-L3008">MarshalInfo::MarshalInfo(Module* pModule,</a></td>
<td style="text-align: right">1,507</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/gc/gc.cpp#L21419-L22924">void gc_heap::plan_phase (int condemned_gen_number)</a></td>
<td style="text-align: right">1,505</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/debug/di/process.cpp#L4533-L5884">void CordbProcess::DispatchRCEvent()</a></td>
<td style="text-align: right">1,351</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/debug/shared/dbgtransportsession.cpp#L1264-L2502">void DbgTransportSession::TransportWorker()</a></td>
<td style="text-align: right">1,238</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/utilcode/ex.cpp#L211-L1427">LPCSTR Exception::GetHRSymbolicName(HRESULT hr)</a></td>
<td style="text-align: right">1,216</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/ildasm/dis.cpp#L872-L1953">BOOL Disassemble(IMDInternalImport *pImport, BYTE *ILHeader,…</a></td>
<td style="text-align: right">1,081</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/debug/ee/debugger.cpp#L10555-L11605">bool Debugger::HandleIPCEvent(DebuggerIPCEvent * pEvent)</a></td>
<td style="text-align: right">1,050</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/vm/i386/gmsx86.cpp#L367-L1268">void LazyMachState::unwindLazyState(LazyMachState* baseState…</a></td>
<td style="text-align: right">901</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/vm/fieldmarshaler.cpp#L223-L1109">VOID ParseNativeType(Module* pModule,</a></td>
<td style="text-align: right">886</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/blob/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/vm/i386/stublinkerx86.cpp#L4934-L5773">VOID StubLinkerCPU::EmitArrayOpStub(const ArrayOpScript* pAr…</a></td>
<td style="text-align: right">839</td>
</tr>
</tbody>
</table>
</span>
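<p>Because the marker is a fixed string, locating the candidate methods comes down to a text search. Here’s a rough sketch in Python, assuming the pragma is spelled exactly as shown above (any variation in spacing would be missed). The name <code class="language-plaintext highlighter-rouge">find_large_method_pragmas</code> is made up for illustration, and the function only reports where the pragma appears; it does not measure the length of the method that follows.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os

# The marker mentioned above; assumed to appear with exactly this spelling.
PRAGMA = "#pragma warning(disable:21000)"


def find_large_method_pragmas(root, extensions=(".cpp", ".h", ".hpp", ".inl")):
    """Return (path, line_number) pairs for every line under root containing PRAGMA."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue  # only look at C/C++ source and header files
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going if a file is not valid UTF-8
            with open(path, encoding="utf-8", errors="replace") as source:
                for line_number, line in enumerate(source, start=1):
                    if PRAGMA in line:
                        hits.append((path, line_number))
    return hits
</code></pre></div></div>
<p>Each hit is only a starting point: you still need to open the file at that line to see which method the pragma belongs to and how long it is.</p>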
<h3 id="top-10-files-with-the-most-commits">Top 10 files with the Most Commits</h3>
<p>Finally, let’s look at which files have been changed the most since the <a href="https://github.com/dotnet/coreclr/commit/ef1e2ab">initial commit on GitHub</a> back in January 2015 (ignoring ‘merge’ commits):</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: left">File</th>
<th style="text-align: right"># Commits</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/morph.cpp">src\jit\morph.cpp</a></td>
<td style="text-align: right">237</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/compiler.h">src\jit\compiler.h</a></td>
<td style="text-align: right">231</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/importer.cpp">src\jit\importer.cpp</a></td>
<td style="text-align: right">196</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/codegenxarch.cpp">src\jit\codegenxarch.cpp</a></td>
<td style="text-align: right">190</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/flowgraph.cpp">src\jit\flowgraph.cpp</a></td>
<td style="text-align: right">171</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/compiler.cpp">src\jit\compiler.cpp</a></td>
<td style="text-align: right">161</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/gentree.cpp">src\jit\gentree.cpp</a></td>
<td style="text-align: right">157</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/lower.cpp">src\jit\lower.cpp</a></td>
<td style="text-align: right">147</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/jit/gentree.h">src\jit\gentree.h</a></td>
<td style="text-align: right">137</td>
</tr>
<tr>
<td style="text-align: left"><a href="https://github.com/dotnet/coreclr/commits/51a6b5ce75c853e77266b8e1ce8c264736d2aabe/src/pal/inc/pal.h">src\pal\inc\pal.h</a></td>
<td style="text-align: right">136</td>
</tr>
</tbody>
</table>
</span>
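<p>One way to get per-file commit counts like these is to ask git for the paths touched by every non-merge commit and tally them. The Python sketch below is an assumption about how such a list could be built, not the script behind the table; <code class="language-plaintext highlighter-rouge">count_commits_per_file</code> and <code class="language-plaintext highlighter-rouge">read_git_log</code> are illustrative names. Note that a renamed file is counted separately under each of its names.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import subprocess
from collections import Counter


def count_commits_per_file(log_text):
    """Tally how often each path appears in `git log --name-only` output.

    With an empty --pretty format the output is just the changed paths,
    one per line, with blank lines between commits.
    """
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:
            counts[path] += 1
    return counts


def read_git_log(repo_dir):
    """Run git in repo_dir and return the paths changed by every non-merge commit."""
    result = subprocess.run(
        ["git", "log", "--no-merges", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
</code></pre></div></div>
<p>With a local clone in a (hypothetical) <code class="language-plaintext highlighter-rouge">coreclr</code> folder, <code class="language-plaintext highlighter-rouge">count_commits_per_file(read_git_log("coreclr")).most_common(10)</code> gives the ten most frequently changed paths. Keeping the parsing separate from the git call means the tally can be checked against a hand-written log.</p>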
<hr />
<h2 id="high-level-overview">High-level Overview</h2>
<p>Next we’ll take a look at how the source code is structured and what the main components are.</p>
<p>They say “A picture is worth a thousand words”, so below is a treemap with the source code files grouped by colour into the top-level sections they fall under. You can hover over an individual box to get more detailed information and can click on the different radio buttons to toggle the sizing (LOC/Files/Commits).</p>
<div id="top-level-treemap">
<svg width="800" height="570"></svg>
<form>
<span style="padding-right: 5em"><label><input type="radio" name="mode" value="sumByLinesOfCode" checked="" /> Total L.O.C</label></span>
<span style="padding-right: 5em"><label><input type="radio" name="mode" value="sumByNumFiles" /> # Files</label></span>
<span style="padding-right: 5em"><label><input type="radio" name="mode" value="sumByNumCommits" /> # Commits</label></span>
</form>
</div>
<h3 id="notes-and-observations">Notes and Observations</h3>
<ul>
<li>The ‘# Commits’ only represent the commits made on GitHub, in the 2 years or so since the CoreCLR was open-sourced, so they are skewed towards recent work and don’t represent changes made over the entire history of the CLR. However, it’s interesting to see which components have had more ‘churn’ in the last few years (e.g. ‘jit’) and which have been left alone (e.g. ‘pal’).</li>
<li>From the number of LOC/files it’s clear to see what the significant components are within the CoreCLR source, e.g. ‘vm’, ‘jit’, ‘pal’ and ‘mscorlib’ (these are covered in detail in the next part of this post).</li>
<li>In the ‘VM’ section it’s interesting to see how much code is generic (~650K LOC) and how much is per-CPU architecture (25K LOC for ‘i386’, 16K for ‘amd64’, 14K for ‘arm’ and 7K for ‘arm64’). This suggests that the code is nicely organised so that the per-architecture work is minimised and cleanly separated out.</li>
<li>It’s surprising (to me) that the ‘GC’ section is as small as it is. I always thought of the GC as a very complex component, but there is way more code in the ‘debugger’ and the ‘pal’.</li>
<li>Likewise, I never really appreciated the complexity of the ‘JIT’; it’s the 2nd largest component, comprising over 370K LOC.</li>
</ul>
<p>If you’re interested, the raw numbers for the code under ‘/src’ are available in <a href="https://gist.github.com/mattwarren/33ca0c20d36be5790578e71f67975514">this gist</a> and for the code under ‘/tests/src’ in <a href="https://gist.github.com/mattwarren/9125c637dc1eb8dba18b2ab70023c0e4">this gist</a>.</p>
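<p>The treemap sizes each box by lines of code, number of files or number of commits. For the first two, the grouping into top-level sections can be approximated with a short Python sketch. It assumes that every file under a folder counts towards that folder and that ‘lines of code’ means physical lines (blank lines and comments included), which may well differ from how the numbers in the gists were produced; <code class="language-plaintext highlighter-rouge">summarise_by_component</code> is a hypothetical name.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
from collections import defaultdict


def summarise_by_component(src_root):
    """Map each top-level folder under src_root to its file count and line count."""
    totals = defaultdict(lambda: {"files": 0, "lines": 0})
    for dirpath, _dirnames, filenames in os.walk(src_root):
        relative = os.path.relpath(dirpath, src_root)
        if relative == os.curdir:
            continue  # skip files that sit directly in src_root
        # The first path segment is the component, e.g. "vm" for "vm/i386".
        component = relative.split(os.sep)[0]
        for name in filenames:
            # Read as bytes so that files in any encoding can be counted.
            with open(os.path.join(dirpath, name), "rb") as handle:
                line_count = sum(1 for _ in handle)
            totals[component]["files"] += 1
            totals[component]["lines"] += line_count
    return dict(totals)
</code></pre></div></div>
<p>Per-architecture sub-folders are folded into their parent here, so code under ‘vm/i386’ counts towards ‘vm’; splitting on a deeper path segment would give a finer breakdown.</p>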
<hr />
<h2 id="deep-dive-into-individual-areas">Deep Dive into Individual Areas</h2>
<p>As the source code is well organised, the top-level folders (under <a href="https://github.com/dotnet/coreclr/tree/master/src">/src</a>) correspond to the logical components within the CoreCLR. We’ll start off by looking at the most significant components, i.e. the ‘<strong>Debugger</strong>’, ‘<strong>Garbage Collector</strong>’ (GC), ‘<strong>Just-in-Time compiler</strong>’ (JIT), ‘<strong>mscorlib</strong>’ (all the C# code), ‘<strong>Platform Adaptation Layer</strong>’ (PAL) and the CLR ‘<strong>Virtual Machine</strong>’ (VM).</p>
<h3 id="mscorlib"><a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib">mscorlib</a></h3>
<p>The ‘mscorlib’ folder contains all the C# code within the CoreCLR, so it’s the place that most C# developers would start looking if they wanted to contribute. For this reason it deserves its own treemap, so we can see how it’s structured:</p>
<div id="mscorlib-treemap">
<svg width="800" height="570"></svg>
<form>
<span style="padding-right: 5em"><label><input type="radio" name="mode" value="sumByLinesOfCode" checked="" /> Total L.O.C</label></span>
<span style="padding-right: 5em"><label><input type="radio" name="mode" value="sumByNumFiles" /> # Files</label></span>
<span style="padding-right: 5em"><label><input type="radio" name="mode" value="sumByNumCommits" /> # Commits</label></span>
</form>
</div>
<p>So by far the bulk of the code is at the ‘top-level’, i.e. directly in the <a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System">‘System’ namespace</a>. This contains the fundamental types that <a href="https://gist.github.com/mattwarren/07b38f39e2adc4acdd5ec53d10a50751">have to exist for the CLR to run</a>, such as:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">AppDomain</code>, <code class="language-plaintext highlighter-rouge">WeakReference</code>, <code class="language-plaintext highlighter-rouge">Type</code></li>
<li><code class="language-plaintext highlighter-rouge">Array</code>, <code class="language-plaintext highlighter-rouge">Delegate</code>, <code class="language-plaintext highlighter-rouge">Object</code>, <code class="language-plaintext highlighter-rouge">String</code></li>
<li><code class="language-plaintext highlighter-rouge">Boolean</code>, <code class="language-plaintext highlighter-rouge">Byte</code>, <code class="language-plaintext highlighter-rouge">Char</code>, <code class="language-plaintext highlighter-rouge">Int16</code>, <code class="language-plaintext highlighter-rouge">Int32</code>, etc</li>
<li><code class="language-plaintext highlighter-rouge">Tuple</code>, <code class="language-plaintext highlighter-rouge">Span</code>, <code class="language-plaintext highlighter-rouge">ArraySegment</code>, <code class="language-plaintext highlighter-rouge">Attribute</code>, <code class="language-plaintext highlighter-rouge">DateTime</code></li>
</ul>
<p>Where possible the CoreCLR is written in C#, because of the benefits that ‘managed code’ brings, so there is a significant amount of code within the ‘mscorlib’ section. Note that anything under here is not externally exposed: when you write C# code that runs against the CoreCLR, you actually access everything through <a href="https://github.com/dotnet/corefx">the CoreFX</a>, which then <a href="https://www.simple-talk.com/blogs/anatomy-of-a-net-assembly-type-forwards/">type-forwards</a> to the CoreCLR where appropriate.</p>
<p>I don’t know the rules for what lives in CoreCLR vs CoreFX, but based on what I’ve read on various GitHub issues, it seems that over time more and more code is moving from CoreCLR to CoreFX.</p>
<p>However, the <em>managed</em> C# code is often deeply entwined with <em>unmanaged</em> C++. For instance, several types are implemented across multiple files, e.g.</p>
<ul>
  <li>Arrays - <a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs">Array.cs</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/array.cpp">array.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/array.h">array.h</a></li>
<li>Assemblies - <a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Reflection/Assembly.cs">Assembly.cs</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/assembly.cpp">assembly.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/assembly.hpp">assembly.hpp</a></li>
</ul>
<p>From what I understand this is done for performance reasons: any code that is perf sensitive will end up being implemented in C++ (or even Assembly), unless the JIT can suitably optimise the C# code.</p>
<h4 id="code-shared-with-corert"><strong>Code shared with CoreRT</strong></h4>
<p>Recently there has been a significant amount of work done to move more and more code over into the ‘shared partition’. This is the area of the CoreCLR source code that is shared with <a href="https://github.com/dotnet/corert/">CoreRT</a> (‘the .NET Core runtime optimized for AOT compilation’). Because certain classes are implemented in both runtimes, they’ve ensured that the work isn’t duplicated and any fixes are shared in both locations. You can see how this works by looking at the links below:</p>
<ul>
<li>CoreCLR
<ul>
<li><a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=%22shared+partition%22&type=Commits">‘shared partition’ commits</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src">Normal mscorlib</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/shared">Shared mscorlib</a></li>
</ul>
</li>
<li>CoreRT
<ul>
<li><a href="https://github.com/dotnet/corert/search?q=shared+partition&type=Commits&utf8=%E2%9C%93">‘shared partition’ commits</a></li>
      <li><a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/src">Normal System.Private.CoreLib</a></li>
      <li><a href="https://github.com/dotnet/corert/tree/master/src/System.Private.CoreLib/shared">Shared System.Private.CoreLib</a></li>
</ul>
</li>
</ul>
<h4 id="other-parts-of-mscorlib"><strong>Other parts of mscorlib</strong></h4>
<p>All the other sections of mscorlib line up with <code class="language-plaintext highlighter-rouge">namespaces</code> available in the .NET runtime and contain functionality that <em>most</em> C# devs will have used at one time or another. The largest ones in there are shown below (click to go directly to the source code):</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Reflection">System.Reflection</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Reflection/Emit">System.Reflection.Emit</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">FieldInfo</code>, <code class="language-plaintext highlighter-rouge">PropertyInfo</code>, <code class="language-plaintext highlighter-rouge">MethodInfo</code>, <code class="language-plaintext highlighter-rouge">AssemblyBuilder</code>, <code class="language-plaintext highlighter-rouge">TypeBuilder</code>, <code class="language-plaintext highlighter-rouge">MethodBuilder</code>, <code class="language-plaintext highlighter-rouge">ILGenerator</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Globalization">System.Globalization</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">CultureInfo</code>, <code class="language-plaintext highlighter-rouge">CalendarInfo</code>, <code class="language-plaintext highlighter-rouge">DateTimeParse</code>, <code class="language-plaintext highlighter-rouge">JulianCalendar</code>, <code class="language-plaintext highlighter-rouge">HebrewCalendar</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Threading">System.Threading</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Threading/Tasks">System.Threading.Tasks</a>
<ul>
      <li><code class="language-plaintext highlighter-rouge">Thread</code>, <code class="language-plaintext highlighter-rouge">Timer</code>, <code class="language-plaintext highlighter-rouge">Semaphore</code>, <code class="language-plaintext highlighter-rouge">Mutex</code>, <code class="language-plaintext highlighter-rouge">AsyncLocal&lt;T&gt;</code>, <code class="language-plaintext highlighter-rouge">Task</code>, <code class="language-plaintext highlighter-rouge">Task&lt;T&gt;</code>, <code class="language-plaintext highlighter-rouge">CancellationToken</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Runtime/CompilerServices">System.Runtime.CompilerServices</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Runtime/InteropServices">System.Runtime.InteropServices</a>
<ul>
      <li><code class="language-plaintext highlighter-rouge">Unsafe</code>, <code class="language-plaintext highlighter-rouge">[CallerFilePath]</code>, <code class="language-plaintext highlighter-rouge">[CallerLineNumber]</code>, <code class="language-plaintext highlighter-rouge">[CallerMemberName]</code>, <code class="language-plaintext highlighter-rouge">GCHandle</code>, <code class="language-plaintext highlighter-rouge">LayoutKind</code>, <code class="language-plaintext highlighter-rouge">[MarshalAs(..)]</code>, <code class="language-plaintext highlighter-rouge">[StructLayout(LayoutKind ..)]</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Diagnostics">System.Diagnostics</a>
<ul>
      <li><code class="language-plaintext highlighter-rouge">Assert</code>, <code class="language-plaintext highlighter-rouge">Debugger</code>, <code class="language-plaintext highlighter-rouge">StackTrace</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Text">System.Text</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">StringBuilder</code>, <code class="language-plaintext highlighter-rouge">ASCIIEncoding</code>, <code class="language-plaintext highlighter-rouge">UTF8Encoding</code>, <code class="language-plaintext highlighter-rouge">UnicodeEncoding</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Collections">System.Collections</a>
<ul>
<li><code class="language-plaintext highlighter-rouge">ArrayList</code>, <code class="language-plaintext highlighter-rouge">Hashtable</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/Collections/Generic">System.Collections.Generic</a>
<ul>
      <li><code class="language-plaintext highlighter-rouge">Dictionary&lt;T,U&gt;</code>, <code class="language-plaintext highlighter-rouge">List&lt;T&gt;</code></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/mscorlib/src/System/IO">System.IO</a>
<ul>
      <li><code class="language-plaintext highlighter-rouge">Stream</code>, <code class="language-plaintext highlighter-rouge">MemoryStream</code>, <code class="language-plaintext highlighter-rouge">File</code>, <code class="language-plaintext highlighter-rouge">TextReader</code>, <code class="language-plaintext highlighter-rouge">TextWriter</code></li>
</ul>
</li>
</ul>
<h3 id="vm-virtual-machine"><a href="https://github.com/dotnet/coreclr/blob/master/src/vm">vm (Virtual Machine)</a></h3>
<p>The VM, not surprisingly, is the largest component of the CoreCLR, with over 640K L.O.C spread across 576 files, and it contains the <em>guts</em> of the runtime. The bulk of the code is OS and CPU independent and written in C++, however there is also a significant amount of architecture-specific assembly code, see the section <a href="#cpu-architecture-specific-code">‘CPU Architecture-specific code’</a> for more info.</p>
<p>The VM contains the main start-up routine of the entire runtime, <code class="language-plaintext highlighter-rouge">EEStartupHelper()</code> in <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L806-L1378">ceemain.cpp</a>; see <a href="/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/">‘The 68 things the CLR does before executing a single line of your code’</a> for all the details. In addition it provides the following functionality:</p>
<ul>
<li><strong>Type System</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/method.cpp">method.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/class.cpp">class.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/typedesc.cpp">typedesc.cpp</a></li>
</ul>
</li>
<li><strong>Loading types/classes</strong>
<ul>
      <li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/ceeload.cpp">ceeload.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/methodtable.cpp">methodtable.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/methodtablebuilder.cpp">methodtablebuilder.cpp</a></li>
</ul>
</li>
<li><strong>Threading</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/threads.cpp">threads.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/threadstatics.cpp">threadstatics.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/threadsuspend.cpp">threadsuspend.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/win32threadpool.cpp">win32threadpool.cpp</a></li>
</ul>
</li>
<li><strong>Exception Handling and Stack Walking</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/exceptionhandling.cpp">exceptionhandling.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/excep.cpp">excep.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/stackwalk.cpp">stackwalk.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/frames.cpp">frames.cpp</a></li>
</ul>
</li>
<li><strong>Fundamental Types</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/object.cpp">object.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/array.cpp">array.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/appdomain.cpp">appdomain.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/safehandle.cpp">safehandle.cpp</a></li>
</ul>
</li>
<li><strong>Generics</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/generics.cpp">generics.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/genericdict.cpp">genericdict.cpp</a></li>
</ul>
</li>
<li><strong>An entire Interpreter</strong> (yes .NET can run interpreted!!)
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/interpreter.cpp">interpreter.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/interpreter.hpp">interpreter.hpp</a></li>
</ul>
</li>
<li><strong>Function calling mechanisms</strong> (see <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/mscorlib.md#calling-from-managed-to-native-code">BotR</a> for more info)
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/ecall.cpp">ecall.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/fcall.cpp">fcall.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/qcall.cpp">qcall.cpp</a></li>
</ul>
</li>
<li><strong>Stubs</strong> (used for <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md">virtual dispatch</a> and <a href="/2017/01/25/How-do-.NET-delegates-work/">delegates</a> amongst other things)
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/arm/stubs.cpp">stubs.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/prestub.cpp">prestub.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/stubgen.cpp">stubgen.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/stubhelpers.cpp">stubhelpers.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/stubmgr.cpp">stubmgr.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/virtualcallstub.cpp">virtualcallstub.cpp</a></li>
</ul>
</li>
<li><strong>Event Tracing</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/eventtrace.cpp">eventtrace.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/eventreporter.cpp">eventreporter.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/eventstore.cpp">eventstore.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/nativeeventsource.cpp">nativeeventsource.cpp</a></li>
</ul>
</li>
<li><strong>Profiler</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/arm/profiler.cpp">profiler.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/profilermetadataemitvalidator.cpp">profilermetadataemitvalidator.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/profattach.cpp">profattach.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/profdetach.cpp">profdetach.cpp</a></li>
</ul>
</li>
<li><strong>P/Invoke</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/dllimport.cpp">dllimport.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/dllimportcallback.cpp">dllimportcallback.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/marshalnative.cpp">marshalnative.cpp</a></li>
</ul>
</li>
<li><strong>Reflection</strong>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/vm/reflectioninvocation.cpp">reflectioninvocation.cpp</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/dispatchinfo.cpp">dispatchinfo.cpp</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/invokeutil.cpp">invokeutil.cpp</a></li>
</ul>
</li>
</ul>
<h4 id="cpu-architecture-specific-code"><strong>CPU Architecture-specific code</strong></h4>
<p>All the architecture-specific code is kept separately in several sub-folders, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/amd64">amd64</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/arm">arm</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/arm64">arm64</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/vm/i386">i386</a>. For example here’s the various implementations of the <code class="language-plaintext highlighter-rouge">WriteBarrier</code> function used by the GC:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/amd64/JitHelpers_FastWriteBarriers.asm#L44-L81">amd64</a> (.asm), there is also <a href="https://github.com/dotnet/coreclr/blob/4a0a82a8dabaabb1e9a82af944d70aed210838a3/src/vm/amd64/jithelpers_fastwritebarriers.S#L10-L73">a .S version</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/a9b25d4aa22a1f4ad5f323f6c826e318f5a720fe/src/vm/arm/asmhelpers.asm#L1625-L2101">arm</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/9baa44aa334cf6f032e4abeae10dc1b960aaeb57/src/vm/arm64/asmhelpers.asm#L314-L397">arm64</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/05e35b9e4edb317ec0fcfbe622ae3d7621ef5ae4/src/vm/i386/jithelp.asm#L118-L281">i386</a></li>
</ul>
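<p>To give a rough idea of what a GC write barrier does, here’s a minimal sketch of a generic card-marking barrier. It is <em>not</em> the CoreCLR implementation (that lives in the hand-written assembly linked above); <code class="language-plaintext highlighter-rouge">ToyHeap</code>, its fields and the 256-byte card size are all assumptions made up for illustration:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy heap: every name here is invented for illustration.
struct ToyHeap {
    static constexpr std::size_t kCardShift = 8;  // assumed: one card covers 256 bytes
    std::uintptr_t heap_base;                     // lowest address managed by the heap
    std::uintptr_t young_low, young_high;         // address range of the young generation
    std::vector<std::uint8_t> cards;              // one byte per card, 0xFF means dirty

    ToyHeap(std::uintptr_t base, std::size_t size,
            std::uintptr_t y_low, std::uintptr_t y_high)
        : heap_base(base), young_low(y_low), young_high(y_high),
          cards((size >> kCardShift) + 1, 0) {}

    // Store `value` into the field at `slot`, then record the write if it
    // creates a pointer into the young generation.
    void write_barrier(std::uintptr_t* slot, std::uintptr_t value) {
        *slot = value;                                 // the actual store
        if (value < young_low || value >= young_high)  // not a young-gen pointer,
            return;                                    // so nothing to remember
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(slot);
        assert(addr >= heap_base);                     // slot must live inside the heap
        cards[(addr - heap_base) >> kCardShift] = 0xFF;  // mark the slot's card dirty
    }
};
```

<p>The point of the dirty cards is that a young-generation collection only has to scan the marked parts of the older generations when looking for references into the young generation, rather than the whole heap. Because this code runs on every reference store, it’s easy to see why a runtime would want a hand-tuned version per CPU architecture.</p>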
<h3 id="jit-just-in-time-compiler"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit">jit (Just-in-Time compiler)</a></h3>
<p>Before we look at the actual source code, it’s worth looking at the different ‘flavours’ of the JIT that are available:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/jit">clrjit</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/jit/standalone">standalone</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/jit/compatjit">compatjit</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/jit/legacyjit">legacyjit</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/jit/protojit">protojit</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/jit/protononjit">protononjit</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/jit/jitstd">jitstd</a></li>
</ul>
<p>Fortunately one of the Microsoft developers has <a href="https://github.com/dotnet/coreclr/pull/2214#issuecomment-161850464">clarified which one should be used</a>:</p>
<blockquote>
<p>Here’s my guidance on how non-MS contributors should think about contributing to the JIT: <strong>If you want to help advance the state of the production code-generators for .NET, then contribute to the new RyuJIT x86/ARM32 backend. This is our long term direction.</strong> If instead your interest is around getting the .NET Core runtime working on x86 or ARM32 platforms to do other things, <strong>by all means use and contribute bug fixes if necessary to the LEGACY_BACKEND paths in the RyuJIT code base today to unblock yourself.</strong> We do run testing on these paths today in our internal testing infrastructure and will do our best to avoid regressing it until we can replace it with something better. <strong>We just want to make sure that there will be no surprises or hard feelings for when the time comes to remove them from the code-base.</strong></p>
</blockquote>
<h4 id="jit-phases"><strong>JIT Phases</strong></h4>
<p>The JIT has almost 90 source files, but fortunately they correspond to the different phases it goes through, so it’s not too hard to find your way around. Using the table from <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#phases-of-ryujit">‘Phases of RyuJIT’</a>, I added the right-hand column so you can jump to the relevant source file(s):</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th><strong>Phase</strong></th>
<th><strong>IR Transformations</strong></th>
<th style="text-align: center"><strong>File</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#pre-import">Pre-import</a></td>
<td><code class="language-plaintext highlighter-rouge">Compiler->lvaTable</code> created and filled in for each user argument and variable. BasicBlock list initialized.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.hpp">compiler.hpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#importation">Importation</a></td>
<td><code class="language-plaintext highlighter-rouge">GenTree</code> nodes created and linked in to Statements, and Statements into BasicBlocks. Inlining candidates identified.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/importer.cpp">importer.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#inlining">Inlining</a></td>
<td>The IR for inlined methods is incorporated into the flowgraph.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/inline.cpp">inline.cpp</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/inlinepolicy.cpp">inlinepolicy.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#struct-promotion">Struct Promotion</a></td>
<td>New lclVars are created for each field of a promoted struct.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/morph.cpp">morph.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#mark-addr-exposed">Mark Address-Exposed Locals</a></td>
<td>lclVars with references occurring in an address-taken context are marked. This must be kept up-to-date.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.hpp">compiler.hpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#morph-blocks">Morph Blocks</a></td>
<td>Performs localized transformations, including mandatory normalization as well as simple optimizations.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/morph.cpp">morph.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#eliminate-qmarks">Eliminate Qmarks</a></td>
<td>All <code class="language-plaintext highlighter-rouge">GT_QMARK</code> nodes are eliminated, other than simple ones that do not require control flow.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.cpp">compiler.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#flowgraph-analysis">Flowgraph Analysis</a></td>
<td><code class="language-plaintext highlighter-rouge">BasicBlock</code> predecessors are computed, and must be kept valid. Loops are identified, and normalized, cloned and/or unrolled.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/flowgraph.cpp">flowgraph.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#normalize-ir">Normalize IR for Optimization</a></td>
<td>lclVar reference counts are set, and must be kept valid. Evaluation order of <code class="language-plaintext highlighter-rouge">GenTree</code> nodes (<code class="language-plaintext highlighter-rouge">gtNext</code>/<code class="language-plaintext highlighter-rouge">gtPrev</code>) is determined, and must be kept valid.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.cpp">compiler.cpp</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/lclvars.cpp">lclvars.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#ssa-vn">SSA and Value Numbering Optimizations</a></td>
<td>Computes liveness (<code class="language-plaintext highlighter-rouge">bbLiveIn</code> and <code class="language-plaintext highlighter-rouge">bbLiveOut</code> on <code class="language-plaintext highlighter-rouge">BasicBlocks</code>), and dominators. Builds SSA for tracked lclVars. Computes value numbers.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/liveness.cpp">liveness.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#licm">Loop Invariant Code Hoisting</a></td>
<td>Hoists expressions out of loops.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/optimizer.cpp">optimizer.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#copy-propagation">Copy Propagation</a></td>
<td>Copy propagation based on value numbers.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/copyprop.cpp">copyprop.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#cse">Common Subexpression Elimination (CSE)</a></td>
<td>Elimination of redundant subexpressions based on value numbers.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/optcse.cpp">optcse.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#assertion-propagation">Assertion Propagation</a></td>
<td>Utilizes value numbers to propagate and transform based on properties such as non-nullness.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/assertionprop.cpp">assertionprop.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#range-analysis">Range analysis</a></td>
<td>Eliminate array index range checks based on value numbers and assertions</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/rangecheck.cpp">rangecheck.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#rationalization">Rationalization</a></td>
<td>Flowgraph order changes from <code class="language-plaintext highlighter-rouge">FGOrderTree</code> to <code class="language-plaintext highlighter-rouge">FGOrderLinear</code>. All <code class="language-plaintext highlighter-rouge">GT_COMMA</code>, <code class="language-plaintext highlighter-rouge">GT_ASG</code> and <code class="language-plaintext highlighter-rouge">GT_ADDR</code> nodes are transformed.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/rationalize.cpp">rationalize.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#lowering">Lowering</a></td>
<td>Register requirements are fully specified (<code class="language-plaintext highlighter-rouge">gtLsraInfo</code>). All control flow is explicit.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/lower.cpp">lower.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/lowerarm.cpp">lowerarm.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/lowerarm64.cpp">lowerarm64.cpp</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/lowerxarch.cpp">lowerxarch.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#reg-alloc">Register allocation</a></td>
<td>Registers are assigned (<code class="language-plaintext highlighter-rouge">gtRegNum</code> and/or <code class="language-plaintext highlighter-rouge">gtRsvdRegs</code>), and the number of spill temps calculated.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/regalloc.cpp">regalloc.cpp</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/register_arg_convention.cpp">register_arg_convention.cpp</a></td>
</tr>
<tr>
<td><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md#code-generation">Code Generation</a></td>
<td>Determines frame layout. Generates code for each <code class="language-plaintext highlighter-rouge">BasicBlock</code>. Generates prolog & epilog code for the method. Emit EH, GC and Debug info.</td>
<td style="text-align: center"><a href="https://github.com/dotnet/coreclr/blob/master/src/jit/codegenarm.cpp">codegenarm.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/codegenarm64.cpp">codegenarm64.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/codegencommon.cpp">codegencommon.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/codegenlegacy.cpp">codegenlegacy.cpp</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/codegenlinear.cpp">codegenlinear.cpp</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/codegenxarch.cpp">codegenxarch.cpp</a></td>
</tr>
</tbody>
</table>
</span>
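<p>The key idea behind a phase-based design is that each phase takes the IR left behind by the previous one and transforms it in place. As a rough illustration, here’s a toy pipeline with just two phases. Everything in it (<code class="language-plaintext highlighter-rouge">Instr</code>, <code class="language-plaintext highlighter-rouge">Phase</code>, <code class="language-plaintext highlighter-rouge">fold_constants</code>, <code class="language-plaintext highlighter-rouge">remove_unused</code>) is invented for this sketch and is far simpler than RyuJIT’s <code class="language-plaintext highlighter-rouge">GenTree</code>-based IR:</p>

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy instruction: 'c' means dst = constant a, '+' means dst = local a + local b.
struct Instr {
    char op;
    int dst, a, b;
};
using IR = std::vector<Instr>;

// A phase is just a name plus a function that rewrites the IR in place.
struct Phase {
    std::string name;
    std::function<void(IR&)> run;
};

// Replace additions whose operands are both known constants with a constant load.
void fold_constants(IR& ir) {
    std::map<int, int> known;  // local number -> constant value
    for (Instr& i : ir) {
        if (i.op == '+' && known.count(i.a) && known.count(i.b))
            i = Instr{'c', i.dst, known[i.a] + known[i.b], 0};
        if (i.op == 'c') known[i.dst] = i.a;
        else known.erase(i.dst);
    }
}

// Drop instructions whose result is never read. The destination of the last
// instruction is treated as the method's result, so it is always kept.
void remove_unused(IR& ir) {
    if (ir.empty()) return;
    std::set<int> live{ir.back().dst};
    IR kept;
    for (auto it = ir.rbegin(); it != ir.rend(); ++it) {
        if (!live.count(it->dst)) continue;  // nobody reads this value later
        live.erase(it->dst);
        if (it->op == '+') { live.insert(it->a); live.insert(it->b); }
        kept.insert(kept.begin(), *it);
    }
    ir = kept;
}

// Run the phases in order and return the names of the phases that ran.
std::vector<std::string> run_pipeline(IR& ir, const std::vector<Phase>& phases) {
    std::vector<std::string> log;
    for (const Phase& p : phases) {
        p.run(ir);
        log.push_back(p.name);
    }
    return log;
}
```

<p>Note that the order matters: <code class="language-plaintext highlighter-rouge">remove_unused</code> can only discard the two constant loads once <code class="language-plaintext highlighter-rouge">fold_constants</code> has removed the addition that read them. That’s the same reason the table above lists so many ‘must be kept valid’ invariants, as later phases rely on what earlier ones established.</p>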
<h3 id="pal-platform-adaptation-layer"><a href="https://github.com/dotnet/coreclr/blob/master/src/pal">pal (Platform Adaptation Layer)</a></h3>
<p>The PAL provides an OS-independent layer that gives access to common low-level functionality such as:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/pal/src/file">File system</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/pal/src/thread">Threads</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/pal/src/sync">Critical Sections</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/pal/src/sharedmemory">Shared Memory</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/pal/src/safecrt">‘Safe’ C runtime-library (CRT)</a></li>
</ul>
<p>As .NET was originally written to run on Windows, all the APIs look very similar to the Win32 APIs. However for non-Windows platforms they are actually implemented using the functionality available on that OS. For example this is what PAL code to <a href="https://github.com/dotnet/coreclr/blob/master/src/pal/src/examples/example1.cpp">read/write a file</a> looks like:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="n">WCHAR</span> <span class="n">src</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="sc">'f'</span><span class="p">,</span> <span class="sc">'o'</span><span class="p">,</span> <span class="sc">'o'</span><span class="p">,</span> <span class="sc">'\0'</span><span class="p">};</span>
<span class="n">WCHAR</span> <span class="n">dest</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="sc">'b'</span><span class="p">,</span> <span class="sc">'a'</span><span class="p">,</span> <span class="sc">'r'</span><span class="p">,</span> <span class="sc">'\0'</span><span class="p">};</span>
<span class="n">WCHAR</span> <span class="n">dir</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="sc">'/'</span><span class="p">,</span> <span class="sc">'t'</span><span class="p">,</span> <span class="sc">'m'</span><span class="p">,</span> <span class="sc">'p'</span><span class="p">,</span> <span class="sc">'\0'</span><span class="p">};</span>
<span class="n">HANDLE</span> <span class="n">h</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">b</span><span class="p">;</span>
<span class="n">PAL_Initialize</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">**</span><span class="p">)</span><span class="n">argv</span><span class="p">);</span>
<span class="n">SetCurrentDirectoryW</span><span class="p">(</span><span class="n">dir</span><span class="p">);</span>
<span class="n">SetCurrentDirectoryW</span><span class="p">(</span><span class="n">dir</span><span class="p">);</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">CreateFileW</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">GENERIC_WRITE</span><span class="p">,</span> <span class="n">FILE_SHARE_READ</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">CREATE_NEW</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="n">WriteFile</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="s">"Testing</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="o">&</span><span class="n">b</span><span class="p">,</span> <span class="n">FALSE</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="n">CopyFileW</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">dest</span><span class="p">,</span> <span class="n">FALSE</span><span class="p">);</span>
<span class="n">DeleteFileW</span><span class="p">(</span><span class="n">src</span><span class="p">);</span>
<span class="n">PAL_Terminate</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
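<p>To show the general idea of ‘Win32-shaped API on top of another OS’, here’s a small sketch that maps a few file operations onto POSIX calls. It is not how the PAL is actually written (the real thing is considerably more involved); every <code class="language-plaintext highlighter-rouge">Sketch_</code> name is hypothetical and the ‘handle’ is simply the file descriptor:</p>

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>

// All names below are hypothetical; they only mimic the shape of Win32 calls.
using SKETCH_HANDLE = std::intptr_t;
using SKETCH_BOOL = int;
constexpr SKETCH_HANDLE SKETCH_INVALID_HANDLE = -1;

// Win32-style "create a new file for writing", mapped onto POSIX open().
SKETCH_HANDLE Sketch_CreateNewFile(const char* path) {
    // O_EXCL makes the call fail if the file already exists, similar to CREATE_NEW.
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    return fd < 0 ? SKETCH_INVALID_HANDLE : static_cast<SKETCH_HANDLE>(fd);
}

// Win32-style write: returns non-zero on success and reports the byte count.
SKETCH_BOOL Sketch_WriteFile(SKETCH_HANDLE h, const void* buf,
                             unsigned len, unsigned* written) {
    ssize_t n = write(static_cast<int>(h), buf, len);
    if (n < 0) return 0;
    *written = static_cast<unsigned>(n);
    return 1;
}

// Closing a handle is just close() on the underlying descriptor.
SKETCH_BOOL Sketch_CloseHandle(SKETCH_HANDLE h) {
    return close(static_cast<int>(h)) == 0;
}

// Deleting a file maps onto unlink().
SKETCH_BOOL Sketch_DeleteFile(const char* path) {
    return unlink(path) == 0;
}
```

<p>Code written against functions like these keeps the Win32 calling conventions (handles, non-zero for success), which is what lets the rest of the runtime stay largely unchanged when it’s built for a non-Windows OS.</p>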
<p>The PAL does contain some <a href="https://github.com/dotnet/coreclr/tree/master/src/pal/src/arch">per-CPU assembly code</a>, but it’s only for very low-level functionality, for instance here’s the different implementations of the <code class="language-plaintext highlighter-rouge">DebugBreak</code> function:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/pal/src/arch/amd64/debugbreak.S">amd64</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/pal/src/arch/arm/debugbreak.S">arm</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/pal/src/arch/arm64/debugbreak.S">arm64</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/pal/src/arch/i386/debugbreak.S">i386</a></li>
</ul>
<h3 id="gc-garbage-collector"><a href="https://github.com/dotnet/coreclr/blob/master/src/gc">gc (Garbage Collector)</a></h3>
<p>The GC is clearly a very complex piece of code, lying right at the heart of the CLR, so for more information about what it does I recommend reading the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md">BotR entry on ‘Garbage Collection Design’</a> and if you’re interested I’ve also written <a href="http://mattwarren.org/tags/#Garbage-Collectors">several blog posts</a> looking at its functionality.</p>
<p>However, from a source code point-of-view the GC is pretty simple: it’s spread across just 19 .cpp files, with the bulk of the work in <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/gc.cpp">gc.cpp</a> (<a href="https://raw.githubusercontent.com/dotnet/coreclr/master/src/gc/gc.cpp">raw version</a>), all ~37K L.O.C of it!!</p>
<p>If you want to get deeper into the GC code (warning, it’s pretty dense), a good way to start is to search for the occurrences of various <code class="language-plaintext highlighter-rouge">ETW</code> events that are fired as the GC moves through the phases outlined in the BotR post above. These events are listed below:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCTriggered(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCAllocationTick_V1(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCFullNotify_V1(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCJoin_V2(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCMarkWithType(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCPerHeapHistory_V3(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCGlobalHeapHistory_V2(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCCreateSegment_V1(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCFreeSegment_V1(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwBGCAllocWaitBegin(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwBGCAllocWaitEnd(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwBGCDrainMark(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwBGCRevisit(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwBGCOverflow(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwPinPlugAtGCTime(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCCreateConcurrentThread_V1(..)</code></li>
<li><code class="language-plaintext highlighter-rouge">FireEtwGCTerminateConcurrentThread_V1(..)</code></li>
</ul>
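<p>If you’d rather automate that search than eyeball ~37K lines, something along these lines would do it. This is just a sketch: <code class="language-plaintext highlighter-rouge">find_etw_calls</code> is a name I’ve made up, and the regular expression assumes the calls look like <code class="language-plaintext highlighter-rouge">FireEtwSomething(</code> on a single line:</p>

```cpp
#include <istream>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (line number, event name) for every FireEtw...( call found in the input.
std::vector<std::pair<int, std::string>> find_etw_calls(std::istream& source) {
    // An identifier starting with FireEtw, optional whitespace, then an open paren.
    static const std::regex call(R"(\b(FireEtw\w+)\s*\()");
    std::vector<std::pair<int, std::string>> hits;
    std::string line;
    int number = 0;
    while (std::getline(source, line)) {
        ++number;  // line numbers are 1-based
        for (std::sregex_iterator it(line.begin(), line.end(), call), end; it != end; ++it)
            hits.emplace_back(number, (*it)[1].str());
    }
    return hits;
}
```

<p>You could feed it a <code class="language-plaintext highlighter-rouge">std::ifstream</code> opened on a local copy of gc.cpp and then jump to each reported line to see which GC phase fires which event.</p>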
<p>But the GC doesn’t work in isolation, it also requires help from the Execution Engine (EE). This is done via the <code class="language-plaintext highlighter-rouge">GCToEEInterface</code> which is implemented in <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/gcenv.ee.cpp">gcenv.ee.cpp</a>.</p>
<h4 id="local-gc-and-gc-sample"><strong>Local GC and GC Sample</strong></h4>
<p>Finally, there are 2 other ways you can get into the GC code and understand what it does.</p>
<p>Firstly there is a <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/sample/GCSample.cpp"><strong>GC sample</strong></a> that lets you use the full GC independent of the rest of the runtime. It shows you how to ‘create type layout information in format that the GC expects’, ‘implement fast object allocator and write barrier’ and ‘allocate objects and work with GC handles’, all in under 250 LOC!!</p>
<p>Also worth mentioning is the ‘<strong>Local GC</strong>’ project, which is an ongoing effort to decouple the GC from the rest of the runtime; they even have a dashboard so you can <a href="https://github.com/dotnet/coreclr/projects/3">track its progress</a>. Currently the GC code is too intertwined with the runtime and vice versa, so ‘Local GC’ is aiming to break that link by providing a set of clear interfaces, <code class="language-plaintext highlighter-rouge">GCToOSInterface</code> and <code class="language-plaintext highlighter-rouge">GCToEEInterface</code>. This will help with the CoreCLR cross-platform efforts, making the GC easier to port to new OSes.</p>
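<p>Why does putting an interface between the GC and the runtime help? Here’s a minimal sketch of the pattern. The class and method names (<code class="language-plaintext highlighter-rouge">ToyGCToEE</code>, <code class="language-plaintext highlighter-rouge">SuspendManagedThreads</code> and so on) are invented for illustration and are not the members of the real <code class="language-plaintext highlighter-rouge">GCToEEInterface</code>:</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: what a GC might need from the execution engine.
class ToyGCToEE {
public:
    virtual ~ToyGCToEE() = default;
    virtual void SuspendManagedThreads() = 0;
    virtual void ResumeManagedThreads() = 0;
    // Reports every root (stack slot, static, handle) to the callback.
    virtual void EnumerateRoots(void (*callback)(void* root, void* context),
                                void* context) = 0;
};

// A GC that only talks to the runtime through the interface above.
class ToyGC {
public:
    explicit ToyGC(ToyGCToEE& ee) : ee_(ee) {}

    // Returns the number of roots seen during one pretend "collection".
    std::size_t Collect() {
        std::size_t roots = 0;
        ee_.SuspendManagedThreads();
        ee_.EnumerateRoots(
            [](void*, void* ctx) { ++*static_cast<std::size_t*>(ctx); }, &roots);
        ee_.ResumeManagedThreads();
        return roots;
    }

private:
    ToyGCToEE& ee_;
};

// A stand-in runtime, e.g. for exercising the GC on its own.
class FakeRuntime : public ToyGCToEE {
public:
    std::vector<std::string> log;
    std::vector<int> roots{1, 2, 3};

    void SuspendManagedThreads() override { log.push_back("suspend"); }
    void ResumeManagedThreads() override { log.push_back("resume"); }
    void EnumerateRoots(void (*callback)(void*, void*), void* context) override {
        for (int& r : roots) callback(&r, context);
    }
};
```

<p>Because <code class="language-plaintext highlighter-rouge">ToyGC</code> never reaches into the runtime directly, it can be driven by a fake such as <code class="language-plaintext highlighter-rouge">FakeRuntime</code>, or in principle by a completely different runtime, as long as the interface is implemented.</p>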
<h3 id="debug"><a href="https://github.com/dotnet/coreclr/blob/master/src/debug">debug</a></h3>
<p>The CLR is a ‘managed runtime’ and one of the significant components it provides is an advanced debugging experience, via Visual Studio or WinDBG. This debugging experience is very complex and I’m not going to go into it in detail here, however if you want to learn more I recommend you read <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/dac-notes.md">‘Data Access Component (DAC) Notes’</a>.</p>
<p>But what does the source look like, how is it laid out? Well, there are several main sub-components under the top-level <code class="language-plaintext highlighter-rouge">/debug</code> folder:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/debug/daccess">daccess</a> - this provides the ‘Data Access Component’ (DAC) functionality as outlined in the BotR page linked to above. The DAC is an abstraction layer over the internal structures in the runtime, which the debugger uses to inspect objects/classes</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/debug/di">di</a> - this contains the exposed APIs (or entry points) of the debugger, implemented by <code class="language-plaintext highlighter-rouge">CoreCLRCreateCordbObject(..)</code> in <a href="https://github.com/dotnet/coreclr/blob/master/src/debug/di/cordb.cpp">cordb.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/debug/ee">ee</a> - the section of debugger that works with the Execution Engine (EE) to do things like stack-walking</li>
<li><a href="https://github.com/dotnet/coreclr/tree/master/src/debug/inc">inc</a> - all the interfaces (.h) files that the debugger components implement</li>
</ul>
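<p>The ‘abstraction layer’ idea is easier to see with an example. A debugger can’t just dereference a pointer into the process it is inspecting, so every read has to go through some kind of ‘read memory from the target’ call. The sketch below shows that pattern only; <code class="language-plaintext highlighter-rouge">ToyDataTarget</code>, <code class="language-plaintext highlighter-rouge">ReadStruct</code> and the <code class="language-plaintext highlighter-rouge">ToyObjectHeader</code> layout are all made up and don’t correspond to the real DAC types:</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

using TargetAddr = std::uint64_t;  // an address in the target, not in this process

// Hypothetical "data target": the only way the debugger side may touch target memory.
class ToyDataTarget {
public:
    virtual ~ToyDataTarget() = default;
    virtual bool ReadMemory(TargetAddr addr, void* buffer, std::size_t size) const = 0;
};

// Copies a plain struct out of the target instead of dereferencing a pointer.
template <typename T>
bool ReadStruct(const ToyDataTarget& target, TargetAddr addr, T* out) {
    return target.ReadMemory(addr, out, sizeof(T));
}

// Invented layout of a runtime object header, for illustration only.
struct ToyObjectHeader {
    std::uint64_t method_table;  // target address of the type description
    std::uint32_t size;          // object size in bytes
};

// A fake target backed by a byte buffer, standing in for another process or a dump.
class BufferTarget : public ToyDataTarget {
public:
    BufferTarget(TargetAddr base, std::vector<std::uint8_t> bytes)
        : base_(base), bytes_(std::move(bytes)) {}

    bool ReadMemory(TargetAddr addr, void* buffer, std::size_t size) const override {
        if (addr < base_) return false;                // before the mapped range
        TargetAddr offset = addr - base_;
        if (offset > bytes_.size()) return false;      // past the mapped range
        if (size > bytes_.size() - offset) return false;  // read would run off the end
        std::memcpy(buffer, bytes_.data() + offset, size);
        return true;
    }

private:
    TargetAddr base_;
    std::vector<std::uint8_t> bytes_;
};
```

<p>The nice property of this arrangement is that the inspection code doesn’t care whether the bytes come from a live process or a crash dump, it only sees the read function.</p>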
<hr />
<h3 id="all-the-rest">All the rest</h3>
<p>As well as the main components, there are various other top-level folders in the source, the full list is below:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/binder">binder</a>
<ul>
<li>The ‘binder’ is responsible for loading assemblies within a .NET program (except the <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/binder.cpp">mscorlib binder</a> which is elsewhere). The ‘binder’ comprises low-level code that controls <a href="https://github.com/dotnet/coreclr/blob/master/src/binder/assembly.cpp">Assemblies</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/binder/applicationcontext.cpp">Application Contexts</a> and the all-important <a href="https://github.com/dotnet/coreclr/blob/master/src/binder/bindinglog.cpp">Fusion Log</a> for diagnosing why assemblies aren’t loading!</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative">classlibnative</a>
<ul>
<li>Code for native implementations of many of the core data types in the CoreCLR, e.g. <a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/arraynative.cpp">Arrays</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/objectnative.cpp">System.Object</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/stringnative.cpp">String</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/decimal.cpp">decimal</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative/float/floatsingle.cpp">float</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative/float/floatdouble.cpp">double</a>.</li>
<li>Also includes all the native methods exposed in the <a href="https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/system.cpp">‘System.Environment’</a> class, e.g. <code class="language-plaintext highlighter-rouge">Environment.ProcessorCount</code>, <code class="language-plaintext highlighter-rouge">Environment.TickCount</code>, <code class="language-plaintext highlighter-rouge">Environment.GetCommandLineArgs()</code>, <code class="language-plaintext highlighter-rouge">Environment.FailFast()</code>, etc</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/coreclr">coreclr</a>
<ul>
<li>Contains the different tools that can ‘host’ or run the CLR, e.g. <code class="language-plaintext highlighter-rouge">corerun</code>, <code class="language-plaintext highlighter-rouge">coreconsole</code> or <code class="language-plaintext highlighter-rouge">unixcorerun</code>. See <a href="/2016/07/04/How-the-dotnet-CLI-tooling-runs-your-code/">How the dotnet CLI tooling runs your code</a> for more info on how these tools work.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/corefx">corefx</a>
<ul>
<li>Several classes under the <a href="https://msdn.microsoft.com/en-us/library/system.globalization%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">‘System.Globalization’</a> namespace have native implementations, in here you will find the code for <a href="https://github.com/dotnet/coreclr/blob/master/src/corefx/System.Globalization.Native/calendarData.cpp">Calendar Data</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/corefx/System.Globalization.Native/locale.cpp">Locales</a>, <a href="https://github.com/dotnet/coreclr/blob/master/src/corefx/System.Globalization.Native/normalization.cpp">Text Normalisation</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/corefx/System.Globalization.Native/timeZoneInfo.cpp">Time Zone information</a>.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/dlls">dlls</a>
<ul>
<li>Wrapper code and build files that control how the various dlls are built. For instance <a href="https://github.com/dotnet/coreclr/tree/master/src/dlls/mscoree">mscoree</a> is the main Execution Engine (EE) and contains the <a href="https://github.com/dotnet/coreclr/blob/d905f67f12c6b2eed918894e0642ec972a1d9fec/src/dlls/mscoree/mscoree.cpp#L61-L116">CoreCLR DLL Entrypoint</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/dlls/mscoree/coreclr/CMakeLists.txt">CoreCLR build definition</a>, likewise <a href="https://github.com/dotnet/coreclr/blob/master/src/dlls/mscorrc">mscorrc</a> includes the <a href="https://github.com/dotnet/coreclr/blob/master/src/dlls/mscorrc/mscorrc.rc">resource file</a> that houses all the CoreCLR error messages.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/gcdump">gcdump</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/gcinfo">gcinfo</a>
<ul>
<li>Code that will write out the <code class="language-plaintext highlighter-rouge">GCInfo</code> that is produced by the JIT to help the GC do its job. This <code class="language-plaintext highlighter-rouge">GCInfo</code> includes information about the ‘liveness’ of variables within a section of code and whether the method is <a href="/2016/08/08/GC-Pauses-and-Safe-Points/#gc-suspension-in-user-code">fully or partially interruptible</a>, which enables the EE to suspend methods when the GC is working.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/ilasm">ilasm</a>
<ul>
<li>IL (Intermediate Language) Assembler is a tool for converting IL code into a .NET executable, see the <a href="https://msdn.microsoft.com/en-us/library/496e4ekx%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">MSDN page</a> for more info and usage examples.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/ildasm">ildasm</a>
<ul>
<li>Tool for disassembling a .NET executable into the corresponding IL source code, again, see the <a href="https://msdn.microsoft.com/en-us/library/f7dy01k1%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">MSDN page</a> for info and usage examples.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/inc">inc</a>
<ul>
<li>Header files that define the ‘interfaces’ between the sub-components that make up the CoreCLR. For example <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/corjit.h">corjit.h</a> covers all communication between the Execution Engine (EE) and the JIT, that is ‘EE -> JIT’ and <a href="https://github.com/dotnet/coreclr/blob/master/src/inc/corinfo.h">corinfo.h</a> is the interface going the other way, i.e. ‘JIT -> EE’</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/ipcman">ipcman</a>
<ul>
<li>Code that enables the ‘Inter-Process Communication’ (IPC) used in .NET (mostly legacy and <em>probably</em> not cross-platform)</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/md">md</a>
<ul>
<li>The MetaData (md) code provides the ability to gather information about methods, classes, types and assemblies and is what makes <a href="http://odetocode.com/Articles/288.aspx">Reflection possible</a>.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/nativeresources">nativeresources</a>
<ul>
<li>A simple tool that is responsible for converting/extracting resources from a Windows Resource File.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/palrt">palrt</a>
<ul>
<li>The PAL (Platform Adaptation Layer) Run-Time, contains specific parts of the PAL layer.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/scripts">scripts</a>
<ul>
<li>Several Python scripts for auto-generating various files in the source (e.g. ETW events).</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/strongname">strongname</a>
<ul>
<li>The code for handling <a href="https://msdn.microsoft.com/en-us/library/wd40t7ad%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">‘strong-naming’</a>, including the <a href="https://github.com/dotnet/coreclr/blob/master/src/strongname/inc/thekey.h">signing</a> <a href="https://github.com/dotnet/coreclr/blob/master/src/strongname/inc/ecmakey.h">keys</a> used by the CoreCLR itself.</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/ToolBox">ToolBox</a>
<ul>
<li>Contains 2 stand-alone tools
<ul>
<li><a href="https://blogs.msdn.microsoft.com/jasonz/2003/10/21/sos-debugging-of-the-clr-part-1/">SOS (son-of-strike)</a> the CLR debugging extension that enables reporting of .NET specific information when using WinDBG</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/ToolBox/superpmi/readme.txt">SuperPMI</a> which enables testing of the JIT without requiring the full Execution Engine (EE)</li>
</ul>
</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/tools">tools</a>
<ul>
<li>Several cmd-line tools that can be used in conjunction with the CoreCLR, e.g. <a href="https://github.com/dotnet/coreclr/blob/master/src/tools/metainfo/metainfo.cpp">‘Runtime Meta Data Dump Utility’</a> and <a href="https://github.com/dotnet/coreclr/blob/master/src/tools/crossgen/crossgen.cpp">‘Native Image Generator’</a> (also known as <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/crossgen.md">‘crossgen’</a>)</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/unwinder">unwinder</a>
<ul>
<li>Provides the low-level functionality to make it possible for the debugger and exception handling components to <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md">walk or unwind the stack</a>. This is done via 2 functions, <code class="language-plaintext highlighter-rouge">GetModuleBase(..)</code> and <code class="language-plaintext highlighter-rouge">GetFunctionEntry(..)</code> which are implemented in CPU architecture-specific code, see <a href="https://github.com/dotnet/coreclr/tree/master/src/unwinder/amd64">amd64</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/unwinder/arm">arm</a>, <a href="https://github.com/dotnet/coreclr/tree/master/src/unwinder/arm64">arm64</a> and <a href="https://github.com/dotnet/coreclr/tree/master/src/unwinder/i386">i386</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/utilcode">utilcode</a>
<ul>
<li>Shared utility code that is used by the VM, Debugger and JIT</li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/zap">zap</a>
<ul>
<li>‘ZAP’ is the original code name for <a href="https://msdn.microsoft.com/en-us/library/6t9t5wcf%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396">NGen (Native Image Generator)</a>, a tool that creates native images from .NET IL code.</li>
</ul>
</li>
</ul>
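<p>To make the two-way ‘EE -> JIT’ and ‘JIT -> EE’ relationship described under <code class="language-plaintext highlighter-rouge">inc</code> a bit more concrete, here is a minimal sketch of the idea. All the type and method names below (<code class="language-plaintext highlighter-rouge">IEEInfo</code>, <code class="language-plaintext highlighter-rouge">IJitCompiler</code>, <code class="language-plaintext highlighter-rouge">ToyEE</code>, <code class="language-plaintext highlighter-rouge">ToyJit</code>) are made up for illustration; they are not the real declarations from corjit.h or corinfo.h, which are far larger.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the 'JIT -> EE' direction:
// questions the JIT asks the Execution Engine while it compiles.
struct IEEInfo {
    virtual ~IEEInfo() = default;
    virtual std::string getMethodName(int methodHandle) = 0;
};

// Hypothetical stand-in for the 'EE -> JIT' direction:
// the request the Execution Engine makes of the JIT.
struct IJitCompiler {
    virtual ~IJitCompiler() = default;
    virtual std::string compileMethod(IEEInfo& ee, int methodHandle) = 0;
};

// Toy Execution Engine that only knows about one method handle.
struct ToyEE : IEEInfo {
    std::string getMethodName(int methodHandle) override {
        return methodHandle == 1 ? "Main" : "Unknown";
    }
};

// Toy JIT that calls back into the EE to resolve details about the method.
struct ToyJit : IJitCompiler {
    std::string compileMethod(IEEInfo& ee, int methodHandle) override {
        return "native code for " + ee.getMethodName(methodHandle);
    }
};
```

<p>Because each side only sees the other through an interface, either component could in principle be swapped out without the other needing to change, which is roughly what the header files in <code class="language-plaintext highlighter-rouge">inc</code> are there to allow.</p>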
<hr />
<p>If you’ve read this far <strong><a href="https://www.youtube.com/watch?v=N_dUmDBfp6k">‘So long and thanks for all the fish’</a></strong> (YouTube)</p>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=13949986">Hacker News</a> and <a href="https://www.reddit.com/r/programming/comments/6131kr/a_hitchhikers_guide_to_the_coreclr_source_code/">/r/programming</a></p>
The 68 things the CLR does before executing a single line of your code (*)2017-02-07T00:00:00+00:00http://www.mattwarren.org/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code
<p>Because the CLR is a managed environment there are several components within the runtime that need to be initialised before <em>any</em> of your code can be executed. This post will take a look at the EE (Execution Engine) start-up routine and examine the initialisation process in detail.</p>
<p>(*) 68 is only a rough guide; the exact number depends on which version of the runtime you are using, which features are enabled and a few other things.</p>
<hr />
<h2 id="hello-world">‘Hello World’</h2>
<p>Imagine you have the simplest possible C# program. What has to happen before the CLR prints ‘Hello World’ out to the console?</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System</span><span class="p">;</span>
<span class="k">namespace</span> <span class="nn">ConsoleApplication</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">Program</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Hello World!"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h2 id="the-code-path-into-the-ee-execution-engine">The code path into the EE (Execution Engine)</h2>
<p>When a .NET executable runs, control gets into the EE via the following code path:</p>
<ol>
<li><a href="https://github.com/dotnet/coreclr/blob/5c47caa806e6907df81e7a96864984df4d0f38cd/src/vm/ceemain.cpp#L2821-L2846">_CorExeMain()</a> (the external entry point)
<ul>
<li>call to <a href="https://github.com/dotnet/coreclr/blob/5c47caa806e6907df81e7a96864984df4d0f38cd/src/vm/ceemain.cpp#L2837">_CorExeMainInternal()</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L2856-L2934">_CorExeMainInternal()</a>
<ul>
<li>call to <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L2891">EnsureEEStarted()</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L366-L496">EnsureEEStarted()</a>
<ul>
<li>call to <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L429">EEStartup()</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L1419-L1451">EEStartup()</a>
<ul>
<li>call to <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L1436">EEStartupHelper()</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L806-L1378">EEStartupHelper()</a></li>
</ol>
<p>(if you’re interested in what happens before this, i.e. how a CLR Host can start-up the runtime, see my previous post <a href="/2016/07/04/How-the-dotnet-CLI-tooling-runs-your-code/">‘How the dotnet CLI tooling runs your code’</a>)</p>
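<p>The shape of that call chain can be sketched in a few lines of C++. This is only an illustration: the function bodies are invented, the leading underscores are dropped from the names, and a plain <code class="language-plaintext highlighter-rouge">bool</code> stands in for whatever locking and status checks the real <code class="language-plaintext highlighter-rouge">EnsureEEStarted()</code> performs. The point it shows is that the ‘ensure’ step can be called many times, but the start-up work only happens once.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the (hypothetical) start-up functions run.
static std::vector<std::string> g_trace;

// Assumption: a simple flag is enough for this single-threaded sketch.
static bool g_eeStarted = false;

void EEStartupHelper() { g_trace.push_back("EEStartupHelper"); }

void EEStartup() {
    g_trace.push_back("EEStartup");
    EEStartupHelper();
}

// Safe to call repeatedly: the start-up work only happens the first time.
void EnsureEEStarted() {
    g_trace.push_back("EnsureEEStarted");
    if (!g_eeStarted) {
        g_eeStarted = true;
        EEStartup();
    }
}

void CorExeMainInternal() {
    g_trace.push_back("CorExeMainInternal");
    EnsureEEStarted();
}

// The external entry point.
void CorExeMain() {
    g_trace.push_back("CorExeMain");
    CorExeMainInternal();
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">CorExeMain()</code> once records all five steps in order; a later call to <code class="language-plaintext highlighter-rouge">EnsureEEStarted()</code> adds only its own entry.</p>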
<p>And so we end up in <code class="language-plaintext highlighter-rouge">EEStartupHelper()</code>, which at a high-level does the following (from <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L1411-L1417">a comment in ceemain.cpp</a>):</p>
<blockquote>
<p>EEStartup is responsible for all the one time initialization of the runtime.<br />
Some of the highlights of what it does include</p>
<ul>
<li>Creates the default and shared, appdomains.</li>
<li>Loads mscorlib.dll and loads up the fundamental types (System.Object …)</li>
</ul>
</blockquote>
<hr />
<h2 id="the-main-phases-in-ee-execution-engine-start-up-routine">The main phases in EE (Execution Engine) start-up routine</h2>
<p>But let’s look at what it does in detail. The lists below contain all the individual function calls made from <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L806-L1378">EEStartupHelper()</a> (~500 L.O.C). To make them easier to understand, we’ll split them up into separate phases:</p>
<ul>
<li><a href="#phase-1---set-up-the-infrastructure-that-needs-to-be-in-place-before-anything-else-can-run">Phase 1</a> - Set-up the <strong>infrastructure</strong> that needs to be in place before anything else can run</li>
<li><a href="#phase-2---initialise-the-core-low-level-components">Phase 2</a> - Initialise the <strong>core, low-level</strong> components</li>
<li><a href="#phase-3---start-up-the-low-level-components-ie-error-handling-profiling-api-debugging">Phase 3</a> - Start-up the <strong>low-level components</strong>, i.e. error handling, profiling API, debugging</li>
<li><a href="#phase-4---start-the-main-components-ie-garbage-collector-gc-appdomains-security">Phase 4</a> - Start the <strong>main components</strong>, i.e. Garbage Collector (GC), AppDomains, Security</li>
<li><a href="#phase-5-final-setup-and-then-notify-other-components-that-the-ee-has-started">Phase 5</a> - Final setup and then <strong>notify other components</strong> that the EE has started</li>
</ul>
<p><strong>Note:</strong> some items in the lists below are only included if a particular <a href="https://github.com/dotnet/coreclr/blob/master/clr.defines.targets">feature</a> is <a href="https://github.com/dotnet/coreclr/blob/master/clr.props">defined at build-time</a>; these are indicated by the inclusion of an <code class="language-plaintext highlighter-rouge">ifdef</code> statement. Also note that the links take you to the code for the function being <em>called</em>, not the line of code within <code class="language-plaintext highlighter-rouge">EEStartupHelper()</code>.</p>
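<p>As a rough illustration of why the number of steps varies between builds, the sketch below shows how an <code class="language-plaintext highlighter-rouge">#ifdef</code> adds or removes a start-up step at compile time. The step names are borrowed from the lists in this post, but <code class="language-plaintext highlighter-rouge">RunStartupSteps()</code> is a made-up function and the feature symbols are defined by hand here, whereas in the real build they come from the build system.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: defined by hand for this sketch; normally set by the build.
#define FEATURE_EVENT_TRACE
// FEATURE_INTERPRETER is deliberately left undefined.

// Hypothetical helper that returns the steps compiled into this build.
std::vector<std::string> RunStartupSteps() {
    std::vector<std::string> steps;
    steps.push_back("SString::Startup");         // always compiled in
#ifdef FEATURE_EVENT_TRACE
    steps.push_back("InitializeEventTracing");   // present only when the feature is defined
#endif
#ifdef FEATURE_INTERPRETER
    steps.push_back("Interpreter::Initialize");  // compiled out in this sketch
#endif
    return steps;
}
```

<p>With the definitions above, only two of the three steps exist in the compiled program; the third is not skipped at run-time, it was never built.</p>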
<h3 id="phase-1---set-up-the-infrastructure-that-needs-to-be-in-place-before-anything-else-can-run">Phase 1 - <strong>Set-up the infrastructure that needs to be in place before anything else can run</strong></h3>
<ol>
<li>Wire-up <strong>console handling</strong> - <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx">SetConsoleCtrlHandler(..)</a> (<code class="language-plaintext highlighter-rouge">#ifndef FEATURE_PAL</code>)</li>
<li>Initialise the internal <strong><code class="language-plaintext highlighter-rouge">SString</code> class</strong> (everything uses strings!) - <a href="https://github.com/dotnet/coreclr/blob/f5cbe4c9cab2873b60cd3c991732a250d2e164a2/src/utilcode/sstring.cpp#L46-L67">SString::Startup()</a></li>
<li>Make sure the <strong>configuration</strong> is set up, so settings that control run-time options can be accessed - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/eeconfig.cpp#L140-L163">EEConfig::Setup()</a> and <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L568-L581">InitializeHostConfigFile()</a> (<code class="language-plaintext highlighter-rouge">#if !defined(CROSSGEN_COMPILE)</code>)</li>
<li>Initialize <strong>Numa and CPU group information</strong> - <a href="https://github.com/dotnet/coreclr/blob/3992010c31ffc9eb50359713f1c29fd29902e04a/src/utilcode/util.cpp#L793-L796">NumaNodeInfo::InitNumaNodeInfo()</a> and <a href="https://github.com/dotnet/coreclr/blob/3992010c31ffc9eb50359713f1c29fd29902e04a/src/utilcode/util.cpp#L1029-L1065">CPUGroupInfo::EnsureInitialized()</a> (<code class="language-plaintext highlighter-rouge">#ifndef CROSSGEN_COMPILE</code>)</li>
<li>Initialize <strong>global configuration settings</strong> based on startup flags - <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L584-L648">InitializeStartupFlags()</a></li>
<li>Set-up the <strong>Thread Manager</strong> that gives the runtime access to the OS threading functionality (<code class="language-plaintext highlighter-rouge">StartThread()</code>, <code class="language-plaintext highlighter-rouge">Join()</code>, <code class="language-plaintext highlighter-rouge">SetThreadPriority()</code> etc) - <a href="https://github.com/dotnet/coreclr/blob/496c33f0b5c6ad87257dd1ff1c42ea8db0a53ae0/src/vm/threads.cpp#L1550-L1692">InitThreadManager()</a></li>
<li>Initialize <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/bb968803(v=vs.85).aspx"><strong>Event Tracing (ETW)</strong></a> and fire off the CLR startup events - <a href="https://github.com/dotnet/coreclr/blob/38a0b157a1bad7080763009746cce92be2388b8e/src/vm/eventtrace.cpp#L4275-L4306">InitializeEventTracing()</a> and <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/inc/eventtracebase.h#L123">ETWFireEvent(EEStartupStart_V1)</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_EVENT_TRACE</code>)</li>
<li>Set-up the <a href="https://msdn.microsoft.com/en-us/library/8dbf701c.aspx"><strong>GS Cookie (Buffer Security Check)</strong></a> to help prevent buffer overruns - <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L693-L741">InitGSCookie()</a></li>
<li>Create the data-structures needed to hold the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md#the-stack-model"><strong>‘frames’ used for stack-traces</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/6ed21c52f25243b7cc1c64b19a47bbd4beb69314/src/vm/frames.cpp#L304-L321">Frame::Init()</a></li>
<li>Ensure initialization of <a href="https://blogs.msdn.microsoft.com/junfeng/2004/10/09/should-we-put-apphack-in-net-2-0/"><strong>Apphacks environment variables</strong></a> - <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=GetGlobalCompatibilityFlags">GetGlobalCompatibilityFlags()</a> (<code class="language-plaintext highlighter-rouge">#ifndef FEATURE_CORECLR</code>)</li>
<li>Create the <strong>diagnostic and performance logs</strong> used by the runtime - <a href="https://github.com/dotnet/coreclr/blob/f5cbe4c9cab2873b60cd3c991732a250d2e164a2/src/utilcode/log.cpp#L191-L200">InitializeLogging()</a> (<code class="language-plaintext highlighter-rouge">#ifdef LOGGING</code>) and <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/utilcode/perflog.cpp#L58-L148">PerfLog::PerfLogInitialize()</a> (<code class="language-plaintext highlighter-rouge">#ifdef ENABLE_PERF_LOG</code>)</li>
</ol>
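<p>To give a feel for what a ‘GS Cookie’ is for, here is a minimal sketch of the idea: pick a hard-to-guess value once at start-up, place a copy next to a buffer, and check the copy is unchanged before trusting the data. Everything here (<code class="language-plaintext highlighter-rouge">InitCookie()</code>, <code class="language-plaintext highlighter-rouge">GuardedFrame</code>, <code class="language-plaintext highlighter-rouge">CookieIntact()</code>) is hypothetical; in practice the compiler inserts the checks for you, and this is not how <code class="language-plaintext highlighter-rouge">InitGSCookie()</code> is implemented.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

static std::uintptr_t g_cookie = 0;  // hypothetical process-wide cookie value

// Picks a hard-to-guess, non-zero value once at start-up.
// Assumption: mixing a clock reading with a stack address is good enough
// for a sketch; a real runtime would want stronger sources of entropy.
void InitCookie() {
    int local = 0;
    auto ticks = std::chrono::steady_clock::now().time_since_epoch().count();
    g_cookie = static_cast<std::uintptr_t>(ticks) ^
               reinterpret_cast<std::uintptr_t>(&local);
    if (g_cookie == 0) {
        g_cookie = 1;
    }
}

// A buffer with a copy of the cookie sitting directly after it.
struct GuardedFrame {
    char buffer[16];
    std::uintptr_t cookie;
};

GuardedFrame EnterFrame() {
    GuardedFrame frame{};
    frame.cookie = g_cookie;
    return frame;
}

// If a write ran past the end of 'buffer' the cookie no longer matches.
bool CookieIntact(const GuardedFrame& frame) {
    return frame.cookie == g_cookie;
}
```

<p>The value has to be chosen before any code that relies on it runs, which fits with this step being part of the very first phase.</p>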
<h3 id="phase-2---initialise-the-core-low-level-components">Phase 2 - <strong>Initialise the core, low-level components</strong></h3>
<ol>
<li><strong>Write to the log</strong> <code class="language-plaintext highlighter-rouge">===================EEStartup Starting===================</code></li>
<li>Ensure that the <strong>Runtime Library functions</strong> (that interact with ntdll.dll) are enabled - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/rtlfunctions.cpp#L24-L47">EnsureRtlFunctions()</a> (<code class="language-plaintext highlighter-rouge">#ifndef FEATURE_PAL</code>)</li>
<li>Set-up the <strong>global store for</strong> <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/synch.h"><strong>events (mutexes, semaphores)</strong></a> used for synchronisation within the runtime - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/eventstore.cpp#L207-L212">InitEventStore()</a></li>
<li>Create the <strong>Assembly Binding logging</strong> mechanism a.k.a <a href="https://msdn.microsoft.com/en-us/library/e74a18c4(v=vs.110).aspx">Fusion</a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/fusioninit.cpp#L174-L490">InitializeFusion()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_FUSION</code>)</li>
<li>Then initialize the actual <strong>Assembly Binder infrastructure</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/binder/coreclrbindercommon.cpp#L18-L29">CCoreCLRBinderHelper::Init()</a> which in turn calls <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/binder/assemblybinder.cpp#L454-L472">AssemblyBinder::Startup()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_FUSION</code> is NOT defined)</li>
<li>Set-up the heuristics used to control <a href="https://github.com/dotnet/coreclr/blob/73b4f008866b153a4d86785b648de4a281981c9e/Documentation/coding-guidelines/clr-code-guide.md#262-using-crsts"><strong>Monitors, Crsts, and SimpleRWLocks</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/syncblk.h#L160-L170">InitializeSpinConstants()</a></li>
<li>Initialize the <strong>InterProcess Communication with COM</strong> (IPC) - <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L4209-L4317">InitializeIPCManager()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_IPCMAN</code>)</li>
<li>Set-up and enable <strong>Performance Counters</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/inc/perfcounters.h">PerfCounters::Init()</a> (<code class="language-plaintext highlighter-rouge">#ifdef ENABLE_PERF_COUNTERS</code>)</li>
<li>Set-up the <strong>CLR interpreter</strong> - <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/interpreter.cpp#L6612-L6635">Interpreter::Initialize()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_INTERPRETER</code>), turns out that the CLR has a mode where your code is interpreted instead of compiled!</li>
<li>Initialise the <strong>stubs that are used by the CLR for</strong> <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode"><strong>calling methods and triggering the JIT</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/stubmgr.cpp#L719-L729">StubManager::InitializeStubManagers()</a>, also <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/stublink.cpp#L2281-L2293">Stub::Init()</a> and <a href="https://github.com/dotnet/coreclr/blob/375948e39cf1a946b3d8048ca51cd4e548f94648/src/vm/i386/stublinkerx86.cpp#L841-L860">StubLinkerCPU::Init()</a></li>
<li>Set up the <strong>core handle map</strong>, used to load assemblies into memory - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/peimage.cpp#L39-L78">PEImage::Startup()</a></li>
<li>Startup the <strong>access checks options</strong>, used for granting/denying security demands on method calls - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/clsload.cpp#L4960-L4969">AccessCheckOptions::Startup()</a></li>
<li>Startup the <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/mscorlib.md#interface-between-managed--clr-code"><strong>mscorlib binder</strong></a> (used for loading “known” types from mscorlib.dll) - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/binder.cpp#L487-L491">MscorlibBinder::Startup()</a></li>
<li>Initialize <a href="https://msdn.microsoft.com/en-us/library/kwdt6w2k(v=vs.71).aspx"><strong>remoting</strong></a>, <strong>which allows out-of-process communication</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/remoting.cpp#L121-L129">CRemotingServices::Initialize()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_REMOTING</code>)</li>
<li>Set-up the data structures used by the GC for <a href="https://msdn.microsoft.com/en-us/library/ms404247(v=vs.110).aspx"><strong>weak, strong and no-pin references</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/38a0b157a1bad7080763009746cce92be2388b8e/src/gc/objecthandle.cpp#L612-L679">Ref_Initialize()</a></li>
<li>Set-up the contexts used to <a href="https://blogs.msdn.microsoft.com/suzcook/2003/06/12/executing-code-in-another-appdomain/"><strong>proxy method calls across App Domains</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/contexts.cpp#L139-L151">Context::Initialize()</a></li>
<li>Wire-up <strong>events that allow the EE to synchronise shut-down</strong> - <code class="language-plaintext highlighter-rouge">g_pEEShutDownEvent->CreateManualEvent(FALSE)</code></li>
<li>Initialise the process-wide data structures used for <strong>reader-writer lock implementation</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/rwlock.cpp#L115-L137">CRWLock::ProcessInit()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_RWLOCK</code>)</li>
<li>Initialize the <strong>debugger manager</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/corhost.cpp#L6090-L6100">CCLRDebugManager::ProcessInit()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_INCLUDE_ALL_INTERFACES</code>)</li>
<li>Initialize the <strong>CLR Security Attribute</strong> Manager - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/corhost.cpp#L6899-L6910">CCLRSecurityAttributeManager::ProcessInit()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_IPCMAN</code>)</li>
<li>Set-up the manager for <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md"><strong>Virtual call stubs</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/74967f89e0f43e156cf23cd88840e1f0fc94f997/src/vm/virtualcallstub.cpp#L859-L886">VirtualCallStubManager::InitStatic()</a></li>
<li>Initialise the lock that the <strong>GC uses when controlling memory pressure</strong> - <a href="https://github.com/dotnet/coreclr/blob/ffeef85a626d7344fd3e2031f749c356db0628d3/src/vm/comutilnative.cpp#L1634">GCInterface::m_MemoryPressureLock.Init(CrstGCMemoryPressure)</a></li>
<li>Initialize <strong>Assembly Usage Logger</strong> - <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L744-L772">InitAssemblyUsageLogManager()</a> (<code class="language-plaintext highlighter-rouge">#ifndef FEATURE_CORECLR</code>)</li>
</ol>
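<p>The ‘heuristics’ mentioned for Monitors, Crsts and SimpleRWLocks control how long a lock spins before it gives up and blocks. A minimal sketch of one such heuristic, spinning with exponential back-off, is shown below. The struct, its field names and the example values used later are assumptions for illustration, not the values <code class="language-plaintext highlighter-rouge">InitializeSpinConstants()</code> actually sets.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tuning values, chosen once during start-up.
struct SpinConstants {
    std::uint32_t initialDuration;  // iterations to spin on the first round
    std::uint32_t maximumDuration;  // stop spinning once a round would exceed this
    std::uint32_t backoffFactor;    // multiplier applied after each failed round
};

// Returns the spin length of every round a lock would try before
// falling back to a blocking wait.
std::vector<std::uint32_t> SpinSchedule(const SpinConstants& c) {
    assert(c.initialDuration > 0 && c.backoffFactor > 1);
    std::vector<std::uint32_t> rounds;
    // 64-bit counter so the multiplication cannot wrap around
    for (std::uint64_t d = c.initialDuration; d <= c.maximumDuration;
         d *= c.backoffFactor) {
        rounds.push_back(static_cast<std::uint32_t>(d));
    }
    return rounds;
}
```

<p>For example, an initial duration of 50, a back-off factor of 3 and a maximum of 20000 gives six rounds, the last one being 12150 iterations. Spinning briefly is worthwhile when locks are held for a short time, but the cap stops a thread burning CPU when they are not.</p>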
<h3 id="phase-3---start-up-the-low-level-components-ie-error-handling-profiling-api-debugging">Phase 3 - <strong>Start-up the low-level components, i.e. error handling, profiling API, debugging</strong></h3>
<ol>
<li>Set-up the <strong>App Domains</strong> used by the CLR - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L2229-L2287">SystemDomain::Attach()</a> (also creates the DefaultDomain and the SharedDomain by calling <a href="https://github.com/dotnet/coreclr/blob/93cb39e3c1bbd4407261926a7365949f288ebc37/src/vm/appdomain.cpp#L4505-L4536">SystemDomain::CreateDefaultDomain()</a> and <a href="https://github.com/dotnet/coreclr/blob/93cb39e3c1bbd4407261926a7365949f288ebc37/src/vm/appdomain.cpp#L11834-L11861">SharedDomain::Attach()</a>)</li>
<li>Start up the <strong>ECall interface</strong>, a private native calling interface used within the CLR - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/ecall.cpp#L510-L526">ECall::Init()</a></li>
<li>Set-up the <a href="/2017/01/25/How-do-.NET-delegates-work/"><strong>caches for the stubs used by <code class="language-plaintext highlighter-rouge">delegates</code></strong></a> - <a href="https://github.com/dotnet/coreclr/blob/c5abe8c5a3d74b8417378e03f560fd54799c17f2/src/vm/comdelegate.cpp#L524-L544">COMDelegate::Init()</a></li>
<li>Set-up all the <strong>global/static variables used by the EE itself</strong> - <a href="https://github.com/dotnet/coreclr/blob/b0e0168b65813f0067648966c81befff0a439da1/src/vm/codeman.cpp#L4164-L4187">ExecutionManager::Init()</a></li>
<li>Initialise <strong>Watson, for windows error reporting</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/dwreport.cpp#L166-L189">InitializeWatson(fFlags)</a> (<code class="language-plaintext highlighter-rouge">#ifndef FEATURE_PAL</code>)</li>
<li>Initialize the <strong>debugging services</strong>, this must be done before any EE thread objects are created, and before any classes or modules are loaded - <a href="https://github.com/dotnet/coreclr/blob/1d03b8fd8d650bd215623a7b035e68db96697e59/src/vm/ceemain.cpp#L4067-L4168">InitializeDebugger()</a> (<code class="language-plaintext highlighter-rouge">#ifdef DEBUGGING_SUPPORTED</code>)</li>
<li>Activate the <a href="https://msdn.microsoft.com/en-us/library/d21c150d(v=vs.110).aspx"><strong>Managed Debugging Assistants</strong></a> that the CLR provides - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/mda.cpp#L246-L270">ManagedDebuggingAssistants::EEStartupActivation()</a> (<code class="language-plaintext highlighter-rouge">#ifdef MDA_SUPPORTED</code>)</li>
<li>Initialise the <a href="https://msdn.microsoft.com/en-us/library/bb384493(v=vs.110).aspx"><strong>Profiling API</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/profilinghelper.cpp#L493-L591">ProfilingAPIUtility::InitializeProfiling()</a> (<code class="language-plaintext highlighter-rouge">#ifdef PROFILING_SUPPORTED</code>)</li>
<li>Initialise the <strong>exception handling mechanism</strong> - <a href="https://github.com/dotnet/coreclr/blob/d24162bd144b37b2b353797db846aab80bf13db1/src/vm/exceptionhandling.cpp#L145-L168">InitializeExceptionHandling()</a></li>
<li>Install the CLR <strong>global exception filter</strong> - <a href="https://github.com/dotnet/coreclr/blob/2fc44782c783f363c1a98e0767f6fa65b5548c95/src/vm/excep.cpp#L4894-L5001">InstallUnhandledExceptionFilter()</a></li>
<li>Ensure that the initial <strong>runtime thread</strong> is created - <a href="https://github.com/dotnet/coreclr/blob/496c33f0b5c6ad87257dd1ff1c42ea8db0a53ae0/src/vm/threads.h#L649-L653">SetupThread()</a> in turn calls <a href="https://github.com/dotnet/coreclr/blob/496c33f0b5c6ad87257dd1ff1c42ea8db0a53ae0/src/vm/threads.cpp#L822-L1085">SetupThread(..)</a></li>
<li>Initialise the <strong>PreStub manager</strong> (<a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode">PreStub’s trigger the JIT</a>) - <a href="https://github.com/dotnet/coreclr/blob/b1586fb32ae6bbb37966952c10308b328021db43/src/vm/prestub.cpp#L1688-L1702">InitPreStubManager()</a> and the corresponding helpers <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/stubhelpers.cpp#L46-L50">StubHelpers::Init()</a></li>
<li>Initialise the <strong>COM Interop layer</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/interoputil.cpp#L5346-L5368">InitializeComInterop()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_COMINTEROP</code>)</li>
<li>Initialise <strong>NDirect method calls</strong> (lazy binding of unmanaged P/Invoke targets) - <a href="https://github.com/dotnet/coreclr/blob/8c2db15331291324573d752fb3b6a3a9dae73b31/src/vm/dllimport.cpp#L7345-L7375">NDirect::Init()</a></li>
<li>Set-up the <strong>JIT Helper functions</strong>, so they are in place before the execution manager runs - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/jitinterfacegen.cpp#L193-L299">InitJITHelpers1()</a> and <a href="https://github.com/dotnet/coreclr/blob/3891c5f681eccd262f1ccca4bfa34a582573ce1d/src/vm/jithelpers.cpp#L6657-L6677">InitJITHelpers2()</a></li>
<li>Initialise and set-up the <strong>SyncBlock cache</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/syncblk.cpp#L826-L829">SyncBlockCache::Attach()</a> and <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/syncblk.cpp#L919-L949">SyncBlockCache::Start()</a></li>
<li>Create the cache used when <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md"><strong>walking/unwinding the stack</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/1f1f95dc7b5c33a23ccc4df42078d11eb72d52db/src/vm/stackwalk.cpp#L3366-L3371">StackwalkCache::Init()</a></li>
</ol>
<h3 id="phase-4---start-the-main-components-ie-garbage-collector-gc-appdomains-security">Phase 4 - <strong>Start the main components, i.e. Garbage Collector (GC), AppDomains, Security</strong></h3>
<ol>
<li>Start up the <strong>security system</strong> that handles <a href="https://msdn.microsoft.com/en-us/library/930b76w0(v=vs.90).aspx"><strong>Code Access Security (CAS)</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/security.inl#L17-L21">Security::Start()</a>, which in turn calls <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/securitypolicy.cpp#L94-L124">SecurityPolicy::Start()</a></li>
<li>Wire-up an event to allow <strong>synchronisation of AppDomain unloads</strong> - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L2617-L2630">AppDomain::CreateADUnloadStartEvent()</a></li>
<li>Initialise the <strong>‘Stack Probes’ used to set up stack guards</strong> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/stackprobe.cpp#L556-L631">InitStackProbes()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_STACK_PROBE</code>)</li>
<li>Initialise the <strong>GC and create the heaps that it uses</strong> - <a href="https://github.com/dotnet/coreclr/blob/ace6d1b728f4041d351cbf05e9356a23305be182/src/gc/gccommon.cpp#L136-L159">InitializeGarbageCollector()</a></li>
<li>Initialise the <strong>tables used to hold the locations of pinned objects</strong> - <a href="https://github.com/dotnet/coreclr/blob/81c42cecca5e1b0b802d4df980280750d2e1419e/src/vm/nativeoverlapped.cpp#L363-L371">InitializePinHandleTable()</a></li>
<li>Inform the <strong>debugger about the DefaultDomain</strong>, so it can interact with it - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L4529-L4547">SystemDomain::System()->PublishAppDomainAndInformDebugger(..)</a> (<code class="language-plaintext highlighter-rouge">#ifdef DEBUGGING_SUPPORTED</code>)</li>
<li>Initialise the existing <strong>OOB Assembly List</strong> (no idea?) - <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/assembly.cpp#L5062-L5067">ExistingOobAssemblyList::Init()</a> (<code class="language-plaintext highlighter-rouge">#ifndef FEATURE_CORECLR</code>)</li>
<li>Actually initialise the <strong>System Domain (which contains mscorlib)</strong>, so that it can start executing - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L2478-L2591">SystemDomain::System()->Init()</a></li>
</ol>
<h3 id="phase-5-final-setup-and-then-notify-other-components-that-the-ee-has-started">Phase 5 - <strong>Final setup and then notify other components that the EE has started</strong></h3>
<ol>
<li>Tell the <strong>profiler we’ve started up</strong> - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L4606-L4657">SystemDomain::NotifyProfilerStartup()</a> (<code class="language-plaintext highlighter-rouge">#ifdef PROFILING_SUPPORTED</code>)</li>
<li>Pre-create a thread to <strong>handle AppDomain unloads</strong> - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L12944-L13004">AppDomain::CreateADUnloadWorker()</a> (<code class="language-plaintext highlighter-rouge">#ifndef CROSSGEN_COMPILE</code>)</li>
<li>Set a flag to confirm that <strong>‘initialisation’ of the EE succeeded</strong> - <code class="language-plaintext highlighter-rouge">g_fEEInit = false</code></li>
<li>Load the <strong>System Assemblies (‘mscorlib’) into the Default Domain</strong> - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L6397-L6432">SystemDomain::System()->DefaultDomain()->LoadSystemAssemblies()</a></li>
<li>Set-up all the <strong>shared static variables (and <code class="language-plaintext highlighter-rouge">String.Empty</code>) in the Default Domain</strong> - <a href="https://github.com/dotnet/coreclr/blob/e90db7bdfde00932d04188aa9eb105442a3fa294/src/vm/appdomain.cpp#L7548-L7613">SystemDomain::System()->DefaultDomain()->SetupSharedStatics()</a>; these are all contained in the internal class <a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/SharedStatics.cs">SharedStatics.cs</a></li>
<li>Set-up the <strong>stack sampler feature</strong>, that identifies ‘hot’ methods in your code - <a href="https://github.com/dotnet/coreclr/blob/7250e6f6630839b09d54f2f71d858b33c018ae8b/src/vm/stacksampler.cpp#L85-L94">StackSampler::Init()</a> (<code class="language-plaintext highlighter-rouge">#ifdef FEATURE_STACK_SAMPLING</code>)</li>
<li>Perform any <strong>once-only</strong> <a href="https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.safehandle(v=vs.110).aspx"><strong>SafeHandle</strong></a> <strong>initialization</strong> - <a href="https://github.com/dotnet/coreclr/blob/0b064eef415468f50e7360256e42737d247eb677/src/vm/safehandle.cpp#L29-L51">SafeHandle::Init()</a> (<code class="language-plaintext highlighter-rouge">#ifndef CROSSGEN_COMPILE</code>)</li>
<li>Set flags to indicate that the <strong>CLR has successfully started</strong> - <code class="language-plaintext highlighter-rouge">g_fEEStarted = TRUE</code>, <code class="language-plaintext highlighter-rouge">g_EEStartupStatus = S_OK</code> and <code class="language-plaintext highlighter-rouge">hr = S_OK</code></li>
<li><strong>Write to the log</strong> <code class="language-plaintext highlighter-rouge">===================EEStartup Completed===================</code></li>
</ol>
<p><strong>Once this is all done, the CLR is now ready to execute your code!!</strong></p>
<hr />
<h2 id="executing-your-code">Executing your code</h2>
<p>Your code will be executed (after first being ‘JITted’) via the following code flow:</p>
<ol>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/corhost.cpp#L1267-L1365">CorHost2::ExecuteAssembly()</a>
<ul>
<li>calling <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/corhost.cpp#L1349">ExecuteMainMethod()</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/5ff10a5b41d5481e21df9bbf5a4e8b419895530d/src/vm/assembly.cpp#L2698-L2784">Assembly::ExecuteMainMethod()</a>
<ul>
<li>calling <a href="https://github.com/dotnet/coreclr/blob/5ff10a5b41d5481e21df9bbf5a4e8b419895530d/src/vm/assembly.cpp#L2762">RunMain()</a></li>
</ul>
</li>
<li><a href="https://github.com/dotnet/coreclr/blob/5ff10a5b41d5481e21df9bbf5a4e8b419895530d/src/vm/assembly.cpp#L2529-L2660">RunMain() (in assembly.cpp)</a>
<ul>
<li>eventually calling into your <a href="https://github.com/dotnet/coreclr/blob/5ff10a5b41d5481e21df9bbf5a4e8b419895530d/src/vm/assembly.cpp#L2633-L2646">main() method</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/callhelpers.h#L390-L430">full explanation of the ‘call’ process</a></li>
</ul>
</li>
</ol>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=13593210">Hacker News</a> and <a href="https://www.reddit.com/r/programming/comments/5slr5m/the_68_things_the_clr_does_before_executing_a/">/r/programming</a></p>
<hr />
<h2 id="further-information">Further information</h2>
<p>The CLR provides a huge amount of log information if you create a <a href="https://github.com/dotnet/coreclr#building-the-repository">debug build</a> and then enable the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md">right environment variables</a>. The links below take you to the various logs produced when running a simple ‘hello world’ program (shown at the top of this post); they give you a pretty good idea of the different things that the CLR is doing behind the scenes.</p>
<ul>
<li><a href="/data/2017/02/All Classes Loaded.txt">All Classes Loaded</a></li>
<li><a href="/data/2017/02/All Methods JITted.txt">All Methods JITted</a></li>
<li><a href="/data/2017/02/COMPLUS-EVERYTHING.log">Entire log</a> (warning ~68K lines long!!)</li>
<li><a href="/data/2017/02/COMPLUS-EVERYTHING-Just-EEStartup.log">Log produced during EEStartupHelper() only</a> (only ~48K lines!!)</li>
<li><a href="/data/2017/02/COMPLUS-AppDomain.log">AppDomain log</a></li>
<li><a href="/data/2017/02/COMPLUS-ClassLoader.log">Class Loader log</a></li>
<li><a href="/data/2017/02/COMPLUS-ClassLoader-ConsoleApplication.log">Class loader log for <code class="language-plaintext highlighter-rouge">ConsoleApplication</code> only</a></li>
<li><a href="/data/2017/02/COMPLUS-CodeSharing.log">Code Sharing log</a></li>
<li><a href="/data/2017/02/COMPLUS-CORDB-(CoreDebugging).log">Core Debugging log</a></li>
<li><a href="/data/2017/02/COMPLUS-EH-(ExceptionHandling).log">Exception Handling log</a></li>
<li><a href="/data/2017/02/COMPLUS-Jit.log">JIT log</a></li>
<li><a href="/data/2017/02/COMPLUS-Loader.log">Loader log</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2017/02/07/The-68-things-the-CLR-does-before-executing-a-single-line-of-your-code/">The 68 things the CLR does before executing a single line of your code (*)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
How do .NET delegates work?2017-01-25T00:00:00+00:00http://www.mattwarren.org/2017/01/25/How-do-.NET-delegates-work
<p>Delegates are a fundamental part of the .NET runtime and whilst you rarely create them directly, they are there <em>under-the-hood</em> every time you use a lambda in LINQ (<code class="language-plaintext highlighter-rouge">=></code>) or a <code class="language-plaintext highlighter-rouge">Func<T></code>/<code class="language-plaintext highlighter-rouge">Action<T></code> to <a href="https://blogs.msdn.microsoft.com/madst/2007/01/23/is-c-becoming-a-functional-language/">make your code more functional</a>. But how do they actually work and what’s going on in the CLR when you use them?</p>
<hr />
<h3 id="il-of-delegates-andor-lambdas">IL of delegates and/or lambdas</h3>
<p>Let’s start with a small code sample like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">delegate</span> <span class="kt">string</span> <span class="nf">SimpleDelegate</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">);</span>
<span class="k">class</span> <span class="nc">DelegateTest</span>
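<span class="p">{</span>
<span class="c1">// field read by 'InstanceMethod' below</span>
<span class="k">private</span> <span class="kt">string</span> <span class="n">name</span><span class="p">;</span>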
<span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">// create an instance of the class</span>
<span class="n">DelegateTest</span> <span class="n">instance</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">DelegateTest</span><span class="p">();</span>
<span class="n">instance</span><span class="p">.</span><span class="n">name</span> <span class="p">=</span> <span class="s">"My instance"</span><span class="p">;</span>
<span class="c1">// create a delegate</span>
<span class="n">SimpleDelegate</span> <span class="n">d1</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">SimpleDelegate</span><span class="p">(</span><span class="n">instance</span><span class="p">.</span><span class="n">InstanceMethod</span><span class="p">);</span>
<span class="c1">// call 'InstanceMethod' via the delegate (compiler turns this into 'd1.Invoke(5)')</span>
<span class="kt">string</span> <span class="n">result</span> <span class="p">=</span> <span class="nf">d1</span><span class="p">(</span><span class="m">5</span><span class="p">);</span> <span class="c1">// returns "My instance: 5"</span>
<span class="p">}</span>
<span class="kt">string</span> <span class="nf">InstanceMethod</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="kt">string</span><span class="p">.</span><span class="nf">Format</span><span class="p">(</span><span class="s">"{0}: {1}"</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>If you were to take a look at the IL of the <code class="language-plaintext highlighter-rouge">SimpleDelegate</code> class, the <code class="language-plaintext highlighter-rouge">ctor</code> and <code class="language-plaintext highlighter-rouge">Invoke</code> methods look like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="n">MethodCodeType</span><span class="p">=</span><span class="n">MethodCodeType</span><span class="p">.</span><span class="n">Runtime</span><span class="p">)]</span>
<span class="k">public</span> <span class="nf">SimpleDelegate</span><span class="p">(</span><span class="kt">object</span> <span class="n">@object</span><span class="p">,</span> <span class="n">IntPtr</span> <span class="n">method</span><span class="p">);</span>
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="n">MethodCodeType</span><span class="p">=</span><span class="n">MethodCodeType</span><span class="p">.</span><span class="n">Runtime</span><span class="p">)]</span>
<span class="k">public</span> <span class="k">virtual</span> <span class="kt">string</span> <span class="nf">Invoke</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">);</span>
</code></pre></div></div>
<p>It turns out that this behaviour is mandated by the spec, from <a href="http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf">ECMA 335 Standard - Common Language Infrastructure (CLI)</a>:</p>
<p><a href="/images/2017/01/Delegates in the Common Language Infrastructure (CLI) Spec - highlighted.png"><img src="/images/2017/01/Delegates in the Common Language Infrastructure (CLI) Spec - highlighted.png" alt="Delegates in the Common Language Infrastructure (CLI) Spec" /></a></p>
<p>So the internal implementation of a delegate, the part responsible for calling a method, is created by the runtime. The runtime needs complete control over these methods because delegates are a fundamental part of the CLR, so any security issues, performance overhead or other inefficiencies would be a big problem.</p>
<p>Methods that are created in this way are technically known as <code class="language-plaintext highlighter-rouge">EEImpl</code> methods (i.e. implemented by the ‘Execution Engine’), from the ‘Book of the Runtime’ (BOTR) section ‘<a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/method-descriptor.md#kinds-of-methoddescs">Method Descriptor - Kinds of MethodDescs</a>’:</p>
<blockquote>
<p><strong>EEImpl</strong>
Delegate methods whose implementation is provided by the runtime (Invoke, BeginInvoke, EndInvoke). See <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/project-docs/dotnet-standards.md">ECMA 335 Partition II - Delegates</a>.</p>
</blockquote>
<p>There’s also more information available in these two excellent articles <a href="https://www.codeproject.com/Articles/20481/NET-Type-Internals-From-a-Microsoft-CLR-Perspecti?fid=459323&fr=26#16">.NET Type Internals - From a Microsoft CLR Perspective</a> (section on ‘Delegates’) and <a href="https://www.codeproject.com/Articles/26936/Understanding-NET-Delegates-and-Events-By-Practice#Internal">Understanding .NET Delegates and Events, By Practice</a> (section on ‘Internal Delegates Representation’)</p>
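<p>The ‘implemented by the runtime’ marking can also be observed from managed code via reflection. The snippet below is a minimal sketch rather than anything taken from the CLR sources: the <code class="language-plaintext highlighter-rouge">RuntimeImplCheck</code> class and its <code class="language-plaintext highlighter-rouge">IsRuntimeImplemented</code> helper are names made up for this illustration, and it assumes the same <code class="language-plaintext highlighter-rouge">SimpleDelegate</code> type as the sample above.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;

public delegate string SimpleDelegate(int x);

static class RuntimeImplCheck
{
    // Hypothetical helper: true when the method body is supplied by the runtime rather than by IL
    public static bool IsRuntimeImplemented(MethodBase method)
    {
        MethodImplAttributes flags = method.GetMethodImplementationFlags();
        return (flags & MethodImplAttributes.CodeTypeMask) == MethodImplAttributes.Runtime;
    }

    static void Main()
    {
        Type type = typeof(SimpleDelegate);
        MethodInfo invoke = type.GetMethod("Invoke");
        // the compiler-generated ctor takes (object, IntPtr), as shown in the IL above
        ConstructorInfo ctor = type.GetConstructor(new[] { typeof(object), typeof(IntPtr) });

        Console.WriteLine("Invoke: " + IsRuntimeImplemented(invoke));
        Console.WriteLine(".ctor:  " + IsRuntimeImplemented(ctor));
    }
}
</code></pre></div></div>
<p>Given the IL shown earlier, I would expect both lines to report <code class="language-plaintext highlighter-rouge">True</code>, whereas an ordinary method with an IL body should report <code class="language-plaintext highlighter-rouge">False</code>.</p>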
<hr />
<h2 id="how-the-runtime-creates-delegates">How the runtime creates delegates</h2>
<h3 id="inlining-of-delegate-ctors">Inlining of delegate ctors</h3>
<p>So we’ve seen that the runtime has responsibility for creating the bodies of delegate methods, but how is this done? It starts by wiring up the delegate constructor (ctor), as per the BOTR page on <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md">‘method descriptors’</a>:</p>
<blockquote>
<p><strong>FCall</strong>
Internal methods implemented in unmanaged code. These are methods marked with MethodImplAttribute(MethodImplOptions.InternalCall) attribute, <strong>delegate constructors</strong> and tlbimp constructors.</p>
</blockquote>
<p>At runtime this happens when the JIT compiles a method that contains IL code for creating a delegate. In <a href="https://github.com/dotnet/coreclr/blob/0d04afc8f5919edcbb371c1e0c4f832f76aed09f/src/jit/flowgraph.cpp#L7031-L7167">Compiler::fgOptimizeDelegateConstructor(..)</a>, the JIT first obtains a reference to the <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/comdelegate.cpp#L3609">correct delegate ctor</a>, which in the simple case is <code class="language-plaintext highlighter-rouge">CtorOpened(Object target, IntPtr methodPtr, IntPtr shuffleThunk)</code> <a href="https://github.com/dotnet/coreclr/blob/01a9eaaa14fc3de8f11eafa6155af8ce4e44e9e9/src/mscorlib/src/System/MulticastDelegate.cs#L622-L627">(link to C# code)</a>, before finally wiring up the <code class="language-plaintext highlighter-rouge">ctor</code>, <a href="https://github.com/dotnet/coreclr/blob/0d04afc8f5919edcbb371c1e0c4f832f76aed09f/src/jit/importer.cpp#L7366">inlining it if possible</a> for maximum performance.</p>
<h3 id="creation-of-the-delegate-invoke-method">Creation of the delegate Invoke() method</h3>
<p>But what’s more interesting is the process that happens when creating the <code class="language-plaintext highlighter-rouge">Invoke()</code> method, using a technique involving ‘stubs’ of code (raw-assembly) that know how to locate the information about the target method and can transfer control to it. These ‘stubs’ are actually used in a wide variety of scenarios, for instance during <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md#stubs">Virtual Method Dispatch</a> and also <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode">by the JITter</a> (when a method is first called it hits a ‘pre-code stub’ that causes the method to be JITted; the ‘stub’ is then replaced by a call to the JITted ‘native code’).</p>
<p>In the particular case of delegates, these stubs are referred to as ‘shuffle thunks’. This is because part of the work they have to do is ‘shuffle’ the arguments that are passed into the <code class="language-plaintext highlighter-rouge">Invoke()</code> method, so that they are in the correct place (stack/register) by the time the ‘target’ method is called.</p>
<p>To understand what’s going on, it’s helpful to look at the following diagram taken from the BOTR page on <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode">Method Descriptors and Precode stubs</a>. The ‘shuffle thunks’ we are discussing are a particular case of a ‘stub’ and sit in the corresponding box in the diagram:</p>
<p><img src="/images/2017/01/Figure 3 The most complex case of Precode, Stub and Native Code.png" alt="Figure 3 The most complex case of Precode, Stub and Native Code" /></p>
<h3 id="how-shuffle-thunks-are-set-up">How ‘shuffle thunks’ are set-up</h3>
<p>So let’s look at the code flow for the delegate we created in the sample at the beginning of this post, specifically an ‘open’ delegate, calling an instance method (if you are wondering about the difference between open and closed delegates, have a read of <a href="http://blog.slaks.net/2011/06/open-delegates-vs-closed-delegates.html">‘Open Delegates vs. Closed Delegates’</a>).</p>
<p>We start off in the <code class="language-plaintext highlighter-rouge">impImportCall()</code> method, deep inside the .NET JIT, which is triggered when a <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.call(v=vs.110).aspx">‘call’ op-code</a> for a delegate is encountered. It then goes through the following functions:</p>
<ol>
<li><a href="https://github.com/dotnet/coreclr/blob/0d04afc8f5919edcbb371c1e0c4f832f76aed09f/src/jit/importer.cpp#L7348-L7353">Compiler::impImportCall(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/0d04afc8f5919edcbb371c1e0c4f832f76aed09f/src/jit/flowgraph.cpp#L7031-L7167">Compiler::fgOptimizeDelegateConstructor(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/comdelegate.cpp#L3440-L3691">COMDelegate::GetDelegateCtor(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/c5abe8c5a3d74b8417378e03f560fd54799c17f2/src/vm/comdelegate.cpp#L584-L632">COMDelegate::SetupShuffleThunk</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/stubcache.cpp#L70-L165">StubCacheBase::Canonicalize(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/c5abe8c5a3d74b8417378e03f560fd54799c17f2/src/vm/comdelegate.cpp#L473-L483">ShuffleThunkCache::CompileStub()</a></li>
<li>EmitShuffleThunk (specific assembly code for different CPU architectures)
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/arm/stubs.cpp#L1534-L1716">arm</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/arm64/stubs.cpp#L1634-L1676">arm64</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/375948e39cf1a946b3d8048ca51cd4e548f94648/src/vm/i386/stublinkerx86.cpp#L3989-L4240">i386</a></li>
</ul>
</li>
</ol>
<p>Below is the code from the <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/arm64/stubs.cpp#L1634-L1676">arm64 version</a> (chosen because it’s the shortest one of the three!). You can see that it emits assembly code to fetch the real target address from <code class="language-plaintext highlighter-rouge">MethodPtrAux</code>, loops through the method arguments and puts them in the correct register (i.e. ‘shuffles’ them into place) and finally emits a tail-call jump to the target method associated with the delegate.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">VOID</span> <span class="n">StubLinkerCPU</span><span class="o">::</span><span class="n">EmitShuffleThunk</span><span class="p">(</span><span class="n">ShuffleEntry</span> <span class="o">*</span><span class="n">pShuffleEntryArray</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// On entry x0 holds the delegate instance. Look up the real target address stored in the MethodPtrAux</span>
<span class="c1">// field and save it in x9. Tailcall to the target method after re-arranging the arguments</span>
<span class="c1">// ldr x9, [x0, #offsetof(DelegateObject, _methodPtrAux)]</span>
<span class="n">EmitLoadStoreRegImm</span><span class="p">(</span><span class="n">eLOAD</span><span class="p">,</span> <span class="n">IntReg</span><span class="p">(</span><span class="mi">9</span><span class="p">),</span> <span class="n">IntReg</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">DelegateObject</span><span class="o">::</span><span class="n">GetOffsetOfMethodPtrAux</span><span class="p">());</span>
<span class="c1">//add x11, x0, DelegateObject::GetOffsetOfMethodPtrAux() - load the indirection cell into x11 used by ResolveWorkerAsmStub</span>
<span class="n">EmitAddImm</span><span class="p">(</span><span class="n">IntReg</span><span class="p">(</span><span class="mi">11</span><span class="p">),</span> <span class="n">IntReg</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">DelegateObject</span><span class="o">::</span><span class="n">GetOffsetOfMethodPtrAux</span><span class="p">());</span>
<span class="k">for</span> <span class="p">(</span><span class="n">ShuffleEntry</span><span class="o">*</span> <span class="n">pEntry</span> <span class="o">=</span> <span class="n">pShuffleEntryArray</span><span class="p">;</span> <span class="n">pEntry</span><span class="o">-></span><span class="n">srcofs</span> <span class="o">!=</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">SENTINEL</span><span class="p">;</span> <span class="n">pEntry</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">srcofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">REGMASK</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// If source is present in register then destination must also be a register</span>
<span class="n">_ASSERTE</span><span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">dstofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">REGMASK</span><span class="p">);</span>
<span class="n">EmitMovReg</span><span class="p">(</span><span class="n">IntReg</span><span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">dstofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">OFSMASK</span><span class="p">),</span> <span class="n">IntReg</span><span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">srcofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">OFSMASK</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">dstofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">REGMASK</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// source must be on the stack</span>
<span class="n">_ASSERTE</span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">srcofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">REGMASK</span><span class="p">));</span>
<span class="n">EmitLoadStoreRegImm</span><span class="p">(</span><span class="n">eLOAD</span><span class="p">,</span> <span class="n">IntReg</span><span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">dstofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">OFSMASK</span><span class="p">),</span> <span class="n">RegSp</span><span class="p">,</span> <span class="n">pEntry</span><span class="o">-></span><span class="n">srcofs</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="c1">// source must be on the stack</span>
<span class="n">_ASSERTE</span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">srcofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">REGMASK</span><span class="p">));</span>
<span class="c1">// dest must be on the stack</span>
<span class="n">_ASSERTE</span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">pEntry</span><span class="o">-></span><span class="n">dstofs</span> <span class="o">&</span> <span class="n">ShuffleEntry</span><span class="o">::</span><span class="n">REGMASK</span><span class="p">));</span>
<span class="n">EmitLoadStoreRegImm</span><span class="p">(</span><span class="n">eLOAD</span><span class="p">,</span> <span class="n">IntReg</span><span class="p">(</span><span class="mi">8</span><span class="p">),</span> <span class="n">RegSp</span><span class="p">,</span> <span class="n">pEntry</span><span class="o">-></span><span class="n">srcofs</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="n">EmitLoadStoreRegImm</span><span class="p">(</span><span class="n">eSTORE</span><span class="p">,</span> <span class="n">IntReg</span><span class="p">(</span><span class="mi">8</span><span class="p">),</span> <span class="n">RegSp</span><span class="p">,</span> <span class="n">pEntry</span><span class="o">-></span><span class="n">dstofs</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Tailcall to target</span>
<span class="c1">// br x9</span>
<span class="n">EmitJumpRegister</span><span class="p">(</span><span class="n">IntReg</span><span class="p">(</span><span class="mi">9</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
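<p>To make the ‘shuffling’ more concrete, here is a toy model of the loop above written in C#. It is only a sketch under simplifying assumptions: registers are modelled as slots in an array, only the register-to-register case is handled, and the names <code class="language-plaintext highlighter-rouge">ShuffleThunkModel</code>, <code class="language-plaintext highlighter-rouge">ShuffleEntry</code> and <code class="language-plaintext highlighter-rouge">Sentinel</code> are invented stand-ins for illustration, not the runtime’s own types. The real thunk additionally loads the target address from <code class="language-plaintext highlighter-rouge">_methodPtrAux</code> and finishes with a tail-call.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

static class ShuffleThunkModel
{
    // Hypothetical end-of-array marker, standing in for ShuffleEntry::SENTINEL
    public const int Sentinel = -1;

    // Each entry means "copy the value in register Src into register Dst"
    public struct ShuffleEntry
    {
        public int Src, Dst;
        public ShuffleEntry(int src, int dst) { Src = src; Dst = dst; }
    }

    // Walk the entries in order until the sentinel, moving arguments into place
    public static void Shuffle(object[] registers, ShuffleEntry[] entries)
    {
        int i = 0;
        while (entries[i].Src != Sentinel)
        {
            registers[entries[i].Dst] = registers[entries[i].Src];
            i++;
        }
    }

    static void Main()
    {
        // On entry to Invoke(): r0 = the delegate, r1 = the instance, r2 = the int argument
        object[] registers = { "delegate", "instance", 5 };

        // The target instance method expects: r0 = 'this', r1 = the int argument
        ShuffleEntry[] entries =
        {
            new ShuffleEntry(1, 0),
            new ShuffleEntry(2, 1),
            new ShuffleEntry(Sentinel, Sentinel),
        };

        Shuffle(registers, entries);
        Console.WriteLine("this = " + registers[0] + ", x = " + registers[1]); // this = instance, x = 5
    }
}
</code></pre></div></div>
<p>Note that the order of the entries matters in this model: each destination register is overwritten only after its old value has already been copied elsewhere or is no longer needed.</p>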
<h3 id="other-functions-that-call-setupshufflethunk">Other functions that call <code class="language-plaintext highlighter-rouge">SetupShuffleThunk(..)</code></h3>
<p>The other places in code that also emit these ‘shuffle thunks’ are listed below. They are used in the various scenarios where a delegate is explicitly created, e.g. via <code class="language-plaintext highlighter-rouge">Delegate.CreateDelegate(..)</code>.</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L881-L1099">COMDelegate::BindToMethod(..)</a> - actual <a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L1019">call to <code class="language-plaintext highlighter-rouge">SetupShuffleThunk(..)</code></a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L1938-L2174">COMDelegate::DelegateConstruct(..)</a> (ECall impl) - actual <a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L2052">call to <code class="language-plaintext highlighter-rouge">SetupShuffleThunk(..)</code></a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L3440-L3691">COMDelegate::GetDelegateCtor(..)</a> - actual <a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L3618">call to <code class="language-plaintext highlighter-rouge">SetupShuffleThunk(..)</code></a></li>
</ul>
<hr />
<h2 id="different-types-of-delegates">Different types of delegates</h2>
<p>Now that we’ve looked at how one type of delegate works (#2 ‘Instance open non-virt’ in the table below), it will be helpful to see the other different types that the runtime deals with. From the very informative <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/comdelegate.cpp#L3547-L3567"><strong>DELEGATE KINDS TABLE</strong></a> in the CLR source:</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: left">#</th>
<th style="text-align: left">delegate type</th>
<th style="text-align: left">_target</th>
<th style="text-align: left">_methodPtr</th>
<th style="text-align: left">_methodPtrAux</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">1</td>
<td style="text-align: left">Instance closed</td>
<td style="text-align: left">‘this’ ptr</td>
<td style="text-align: left">target method</td>
<td style="text-align: left">null</td>
</tr>
<tr>
<td style="text-align: left">2</td>
<td style="text-align: left"><strong>Instance open non-virt</strong></td>
<td style="text-align: left"><strong>delegate</strong></td>
<td style="text-align: left"><strong>shuffle thunk</strong></td>
<td style="text-align: left"><strong>target method</strong></td>
</tr>
<tr>
<td style="text-align: left">3</td>
<td style="text-align: left">Instance open virtual</td>
<td style="text-align: left">delegate</td>
<td style="text-align: left">Virtual-stub dispatch</td>
<td style="text-align: left">method id</td>
</tr>
<tr>
<td style="text-align: left">4</td>
<td style="text-align: left">Static closed</td>
<td style="text-align: left">first arg</td>
<td style="text-align: left">target method</td>
<td style="text-align: left">null</td>
</tr>
<tr>
<td style="text-align: left">5</td>
<td style="text-align: left">Static closed (special sig)</td>
<td style="text-align: left">delegate</td>
<td style="text-align: left">specialSig thunk</td>
<td style="text-align: left">target method</td>
</tr>
<tr>
<td style="text-align: left">6</td>
<td style="text-align: left">Static opened</td>
<td style="text-align: left">delegate</td>
<td style="text-align: left">shuffle thunk</td>
<td style="text-align: left">target method</td>
</tr>
<tr>
<td style="text-align: left">7</td>
<td style="text-align: left">Secure</td>
<td style="text-align: left">delegate</td>
<td style="text-align: left">call thunk</td>
<td style="text-align: left">MethodDesc (frame)</td>
</tr>
</tbody>
</table>
</span>
<p><strong>Note:</strong> The columns map to the <a href="https://github.com/dotnet/coreclr/blob/b1f5c6acca00ca471818237d698baca317851b1f/src/mscorlib/src/System/Delegate.cs#L23-L38">internal fields of a delegate</a> (from <code class="language-plaintext highlighter-rouge">System.Delegate</code>)</p>
<p>So we’ve (deliberately) looked at the simple case, but the more complex scenarios all work along similar lines, just using different and more stubs/thunks as needed e.g. ‘virtual-stub dispatch’ or ‘call thunk’.</p>
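<p>To make the open/closed distinction a bit more concrete, here is a minimal sketch that creates four of the kinds from managed code via <code class="language-plaintext highlighter-rouge">Delegate.CreateDelegate(..)</code>. The <code class="language-plaintext highlighter-rouge">DelegateKindsSketch</code> class and its <code class="language-plaintext highlighter-rouge">Shout</code> helper are made up for illustration, and the mapping to table rows in the comments is my reading of the table, not something the program verifies:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;

static class DelegateKindsSketch
{
    // Hypothetical helper, only used for the two static cases
    static string Shout(string s) { return s.ToUpperInvariant() + "!"; }

    static void Main()
    {
        MethodInfo instanceMethod = typeof(string).GetMethod("ToUpperInvariant", Type.EmptyTypes);
        MethodInfo staticMethod = typeof(DelegateKindsSketch).GetMethod(
            "Shout", BindingFlags.NonPublic | BindingFlags.Static);

        // Instance closed (should be row 1): 'this' is captured at creation time
        var instanceClosed = (Func&lt;string&gt;)Delegate.CreateDelegate(
            typeof(Func&lt;string&gt;), "hello", instanceMethod);

        // Instance open, non-virtual (should be row 2): 'this' is passed on every call
        var instanceOpen = (Func&lt;string, string&gt;)Delegate.CreateDelegate(
            typeof(Func&lt;string, string&gt;), instanceMethod);

        // Static closed (should be row 4): the first argument is captured at creation time
        var staticClosed = (Func&lt;string&gt;)Delegate.CreateDelegate(
            typeof(Func&lt;string&gt;), "hello", staticMethod);

        // Static open (should be row 6): all arguments are passed on every call
        var staticOpen = (Func&lt;string, string&gt;)Delegate.CreateDelegate(
            typeof(Func&lt;string, string&gt;), staticMethod);

        Console.WriteLine(instanceClosed());      // HELLO
        Console.WriteLine(instanceOpen("hello")); // HELLO
        Console.WriteLine(staticClosed());        // HELLO!
        Console.WriteLine(staticOpen("hello"));   // HELLO!
    }
}
</code></pre></div></div>
<p>From the caller’s point of view all four behave the same; the differences in the table are purely in how the runtime wires up the delegate’s internal fields.</p>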
<hr />
<h2 id="delegates-are-special">Delegates are special!!</h2>
<p>As well as being responsible for creating delegates, the runtime also treats delegates specially, to enforce security and/or type-safety. You can see how this is implemented in the links below.</p>
<p>In MethodTableBuilder.cpp:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/methodtablebuilder.cpp#L3341-L3352">For delegates we don’t allow any non-runtime implemented bodies for any of the four special methods</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/methodtablebuilder.cpp#L6316-L6336">It is not allowed for EnC (edit-and-continue) to replace one of the runtime builtin methods</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/methodtablebuilder.cpp#L6706-L6719">Don’t allow overrides for any of the four special runtime implemented delegate methods</a></li>
</ul>
<p>In ClassCompat.cpp:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/classcompat.cpp#L2749-L2792">currently the only runtime implemented functions are delegate instance methods</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/classcompat.cpp#L2869-L2880">For delegates we don’t allow any non-runtime implemented bodies for any of the four special methods</a></li>
</ul>
<hr />
<p>Discuss this post in <a href="https://www.reddit.com/r/programming/comments/5q2w1t/how_do_net_delegates_work/">/r/programming</a> and <a href="https://www.reddit.com/r/csharp/comments/5q3ges/how_do_net_delegates_work/">/r/csharp</a></p>
<hr />
<h2 id="other-links">Other links:</h2>
<p>If you’ve read this far, good job!!</p>
<p>As a reward, below are some extra links that cover more than you could possibly want to know about delegates!!</p>
<h3 id="general-info">General Info:</h3>
<ul>
<li><a href="http://stackoverflow.com/questions/299703/delegate-keyword-vs-lambda-notation">delegate keyword vs. lambda notation - Stack Overflow</a></li>
<li><a href="http://stackoverflow.com/questions/73227/what-is-the-difference-between-lambdas-and-delegates-in-the-net-framework">What is the difference between lambdas and delegates in the .NET Framework?</a></li>
<li><a href="https://news.ycombinator.com/item?id=13198805">Why can’t the jit inline the generated code?</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/6737">Inline literal delegates passed to functions</a></li>
<li><a href="http://csharpindepth.com/Articles/Chapter2/Events.aspx">Delegates and Events</a></li>
<li><a href="http://blog.slaks.net/2011/06/open-delegates-vs-closed-delegates.html">Open Delegates vs. Closed Delegates</a></li>
<li><a href="http://www.philosophicalgeek.com/2014/07/25/using-windbg-to-answer-implementation-questions-for-yourself-can-a-delegate-invocation-be-inlined/">Using Windbg to answer implementation questions for yourself (Can a Delegate Invocation be Inlined?)</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/8819">[Question] Can Virtual Stub Dispatch be “inlined”?</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md">BOTR - Virtual Stub Dispatch</a></li>
</ul>
<h3 id="internal-delegate-info">Internal Delegate Info</h3>
<ul>
<li><a href="http://blog.monstuff.com/archives/000038.html">C# Delegates strike back · Curiosity is bliss</a> (mostly from a Mono P.O.V)</li>
<li><a href="http://stackoverflow.com/questions/7136615/open-delegate-for-generic-interface-method">Open delegate for generic interface method</a> (Bug in the CLR)</li>
<li><a href="https://github.com/dotnet/coreclr/issues/5275">[ARM/Linux] ARM-softfp delegate code generation issue</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/virtualcallstub.h#L122-L134">On x86 are four possible kinds of callsites when you take into account all features</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/virtualcallstub.h#L179-L222">VirtualCallStubManager is the heart of the stub dispatch logic. See the book of the runtime entry</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/virtualcallstub.h#L912-L948">StubDispatchNotes</a></li>
<li><a href="https://syslog.ravelin.com/anatomy-of-a-function-call-in-go-f6fc81b80ecc">Anatomy of a function call in Go</a></li>
</ul>
<h3 id="debugging-delegates">Debugging delegates</h3>
<ul>
<li><a href="http://geekswithblogs.net/akraus1/archive/2012/05/20/149699.aspx">Useful .NET Delegate Internals</a></li>
<li><a href="https://github.com/fremag/MemoScope.Net/wiki/Delegate-Targets">Delegate Targets from MemoScope.Net</a></li>
<li><a href="https://github.com/fremag/MemoScope.Net/blob/master/MemoScope/Modules/Delegates/DelegatesAnalysis.cs">DelegatesAnalysis.cs from MemoScope.Net</a></li>
<li><a href="https://github.com/Microsoft/clrmd/issues/35">Getting the method name of a Delegate instance</a></li>
<li><a href="http://stackoverflow.com/questions/3668642/get-method-name-from-delegate-with-windbg">Get method name from delegate with WinDbg</a></li>
<li><a href="http://julmar.com/blog/debugging/sos-finding-the-method-bound-to-an-eventhandler-with-windbg/">SOS: finding the method bound to an EventHandler with WinDbg.</a></li>
<li><a href="https://blogs.msdn.microsoft.com/abhinaba/2014/09/29/net-just-in-time-compilation-and-warming-up-your-system/">.NET Just in Time Compilation and Warming up Your System</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2017/01/25/How-do-.NET-delegates-work/">How do .NET delegates work?</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
Analysing Pause times in the .NET GC2017-01-13T00:00:00+00:00http://www.mattwarren.org/2017/01/13/Analysing-Pause-times-in-the-.NET-GC
<p>Over the last few months there have been several blog posts looking at GC pauses in different programming languages or runtimes. It all started with a post looking at the <a href="https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/">latency of the Haskell GC</a>, next came a follow-up that <a href="http://prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-ocaml-racket/">compared Haskell, OCaml and Racket</a>, followed by <a href="https://blog.pusher.com/golangs-real-time-gc-in-theory-and-practice/">Go GC in Theory and Practice</a>, before a final post looking at <a href="http://theerlangelist.com/article/reducing_maximum_latency">the situation in Erlang</a>.</p>
<p>After reading all these posts I wanted to see how the .NET GC compares to the other runtime implementations.</p>
<hr />
<p>The posts above all use a similar test program to exercise the GC, based on the message-bus scenario that <a href="https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/">Pusher initially described</a>. Fortunately <a href="https://gitlab.com/frje">Franck Jeannin</a> had <a href="https://gitlab.com/frje/gc-latency-experiment/blob/master/Main.cs">already started work on a .NET version</a>, so this blog post will make use of that.</p>
<p>At the heart of the test is the following code:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">msgCount</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">sw</span> <span class="p">=</span> <span class="n">Stopwatch</span><span class="p">.</span><span class="nf">StartNew</span><span class="p">();</span>
<span class="nf">pushMessage</span><span class="p">(</span><span class="n">array</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
<span class="n">sw</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sw</span><span class="p">.</span><span class="n">Elapsed</span> <span class="p">></span> <span class="n">worst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">worst</span> <span class="p">=</span> <span class="n">sw</span><span class="p">.</span><span class="n">Elapsed</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">unsafe</span> <span class="k">void</span> <span class="nf">pushMessage</span><span class="p">(</span><span class="kt">byte</span><span class="p">[][]</span> <span class="n">array</span><span class="p">,</span> <span class="kt">int</span> <span class="n">id</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">array</span><span class="p">[</span><span class="n">id</span> <span class="p">%</span> <span class="n">windowSize</span><span class="p">]</span> <span class="p">=</span> <span class="nf">createMessage</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p><a href="https://gist.github.com/mattwarren/086634ba83170ed984679e17a09167ec">The full code is available</a></p>
<p>So we are creating a ‘message’ (that is actually a <code class="language-plaintext highlighter-rouge">byte[1024]</code>) and then putting it into a data structure (<code class="language-plaintext highlighter-rouge">byte[][]</code>). This is repeated 10 million times (<code class="language-plaintext highlighter-rouge">msgCount</code>), but at any one time there are only 200,000 (<code class="language-plaintext highlighter-rouge">windowSize</code>) messages in memory, because we overwrite old ‘messages’ as we go along.</p>
<p>We are timing how long it takes to <em>add</em> the message to the array, which should be a very quick operation. It’s not guaranteed that this time will always equate to GC pauses, but it’s pretty likely. However, we can also double check the actual GC pause times by using the <a href="http://www.philosophicalgeek.com/2012/07/16/how-to-debug-gc-issues-using-perfview/">excellent PerfView tool</a>, to give us more confidence.</p>
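<p>A cheap in-process sanity check, short of firing up PerfView, is to see whether the runtime counted a collection while the slowest push was happening. The sketch below is not the code used for the results in this post: <code class="language-plaintext highlighter-rouge">GcCrossCheck</code>, <code class="language-plaintext highlighter-rouge">CreateMessage</code> and <code class="language-plaintext highlighter-rouge">GcCount</code> are names I’ve made up, and the message count is reduced to keep the run short. It only relies on <code class="language-plaintext highlighter-rouge">GC.CollectionCount(..)</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Diagnostics;

static class GcCrossCheck
{
    const int WindowSize = 200000;  // live messages, as in the post
    const int MsgCount = 1000000;   // fewer than the post's 10 million, to keep the run short

    // Hypothetical stand-in for the createMessage(..) helper in the full code
    static byte[] CreateMessage(int id)
    {
        var msg = new byte[1024];
        for (int i = 0; i &lt; msg.Length; i++)
            msg[i] = (byte)id;
        return msg;
    }

    // Sum of the per-generation counters; it changes whenever any collection is counted
    static int GcCount()
    {
        return GC.CollectionCount(0) + GC.CollectionCount(1) + GC.CollectionCount(2);
    }

    static void Main()
    {
        var array = new byte[WindowSize][];
        var worst = TimeSpan.Zero;
        var worstHadGc = false;

        for (var i = 0; i &lt; MsgCount; i++)
        {
            int before = GcCount();
            var sw = Stopwatch.StartNew();
            array[i % WindowSize] = CreateMessage(i);
            sw.Stop();

            if (sw.Elapsed &gt; worst)
            {
                worst = sw.Elapsed;
                // Did the runtime count a collection while this push was running?
                worstHadGc = GcCount() != before;
            }
        }

        Console.WriteLine("Worst push: {0:F3} ms, GC counted during it: {1}",
                          worst.TotalMilliseconds, worstHadGc);
    }
}
</code></pre></div></div>
<p>This only tells you that a collection overlapped with the slow push, not how long the pause itself was, so it complements the PerfView data rather than replacing it.</p>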
<hr />
<h3 id="workstation-gc-vs-server-gc">Workstation GC vs. Server GC</h3>
<p>Unlike the Java GC <a href="https://twitter.com/matthewwarren/status/819130794262298625">that is very configurable</a>, the .NET GC really only gives you a few options:</p>
<ul>
<li>Workstation</li>
<li>Server</li>
<li>Concurrent/Background</li>
</ul>
<p>So we will be comparing the Server and Workstation modes, but as we want to <em>reduce</em> pauses we are going to always leave <a href="https://msdn.microsoft.com/en-us/library/yhwwzef8(v=vs.110).aspx">Concurrent/Background mode enabled</a>.</p>
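<p>On the .NET Framework these modes are normally switched in the <code class="language-plaintext highlighter-rouge">&lt;runtime&gt;</code> section of the app config file, using the <code class="language-plaintext highlighter-rouge">&lt;gcServer enabled="true" /&gt;</code> and <code class="language-plaintext highlighter-rouge">&lt;gcConcurrent enabled="true" /&gt;</code> elements. Because it’s easy to run a test with a different mode than you intended, a small check like the following can be added to the start of the test program (a sketch, the <code class="language-plaintext highlighter-rouge">GcModeCheck</code> class name is made up):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime;

static class GcModeCheck
{
    static void Main()
    {
        // True when the Server GC is in use, false for Workstation
        Console.WriteLine("Server GC:    " + GCSettings.IsServerGC);

        // 'Interactive' means Concurrent/Background collections are enabled,
        // 'Batch' means they have been disabled
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);
    }
}
</code></pre></div></div>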
<p>As outlined in the excellent post <a href="https://blogs.msdn.microsoft.com/seteplia/2017/01/05/understanding-different-gc-modes-with-concurrency-visualizer/">Understanding different GC modes with Concurrency Visualizer</a>, the 2 modes are optimised for different things (emphasis mine):</p>
<blockquote>
<p><strong>Workstation GC is designed for desktop applications to minimize the time spent in GC</strong>. In this case GC will happen more frequently but with shorter pauses in application threads. <strong>Server GC is optimized for application throughput in favor of longer GC pauses</strong>. Memory consumption will be higher, but application can process greater volume of data without triggering garbage collection.</p>
</blockquote>
<p>Therefore Workstation mode should give us shorter pauses than Server mode, and the results bear this out. Below is a graph of the pause times at different percentiles, <a href="https://github.com/HdrHistogram/HdrHistogram.NET/">recorded with HdrHistogram.NET</a> (click for full-size image):</p>
<p><a href="/images/2017/01/Histogram - Array - WKS v SVR.png"><img src="/images/2017/01/Histogram - Array - WKS v SVR.png" alt="Histogram - Array - WKS v SVR" /></a></p>
<p>Note that the X-axis scale is logarithmic. The Workstation (WKS) pauses start increasing at the 99.99%’ile, whereas the Server (SVR) pauses only start increasing at the 99.9999%’ile, although they have a larger maximum.</p>
<p>Another way of looking at the results is the table below. Here we can clearly see that Workstation has a lot more GC pauses, although the max is smaller. But more significantly the total GC pause time is much higher and as a result the overall/elapsed time is roughly three times as long (about 21 seconds for WKS v. about 7 seconds for SVR).</p>
<p><strong>Workstation GC (Concurrent) vs. Server GC (Background)</strong> (On .NET 4.6 - Array tests - all times in milliseconds)</p>
<table>
<thead>
<tr>
<th>GC Mode</th>
<th style="text-align: right">Max GC Pause</th>
<th style="text-align: right"># GC Pauses</th>
<th style="text-align: right">Total GC Pause Time</th>
<th style="text-align: right">Elapsed Time</th>
<th style="text-align: right">Peak Working Set (MB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Workstation - 1</td>
<td style="text-align: right">28.0</td>
<td style="text-align: right">1,797</td>
<td style="text-align: right">10,266.2</td>
<td style="text-align: right">21,688.3</td>
<td style="text-align: right">550.37</td>
</tr>
<tr>
<td>Workstation - 2</td>
<td style="text-align: right">23.2</td>
<td style="text-align: right">1,796</td>
<td style="text-align: right">9,756.6</td>
<td style="text-align: right">21,018.2</td>
<td style="text-align: right">543.50</td>
</tr>
<tr>
<td>Workstation - 3</td>
<td style="text-align: right">19.3</td>
<td style="text-align: right">1,800</td>
<td style="text-align: right">9,676.0</td>
<td style="text-align: right">21,114.6</td>
<td style="text-align: right">531.24</td>
</tr>
<tr>
<td>Server - 1</td>
<td style="text-align: right">104.6</td>
<td style="text-align: right">7</td>
<td style="text-align: right">646.4</td>
<td style="text-align: right">7,062.2</td>
<td style="text-align: right">2,086.39</td>
</tr>
<tr>
<td>Server - 2</td>
<td style="text-align: right">107.2</td>
<td style="text-align: right">7</td>
<td style="text-align: right">664.8</td>
<td style="text-align: right">7,096.6</td>
<td style="text-align: right">2,092.65</td>
</tr>
<tr>
<td>Server - 3</td>
<td style="text-align: right">106.2</td>
<td style="text-align: right">6</td>
<td style="text-align: right">558.4</td>
<td style="text-align: right">7,023.6</td>
<td style="text-align: right">2,058.12</td>
</tr>
</tbody>
</table>
<p>Therefore if you only care about reducing the maximum pause time then Workstation mode is a suitable option, but you will experience more GC pauses overall and so the throughput of your application will be reduced. In addition, the working set is higher for Server mode as it allocates 1 heap per CPU.</p>
<p>Fortunately in .NET we have the choice of which mode we want to use. According to the fantastic article <a href="https://blog.plan99.net/modern-garbage-collection-911ef4f8bd8e">Modern garbage collection</a>, the Go runtime has optimised for pause time only:</p>
<blockquote>
<p>The reality is that Go’s GC does not really implement any new ideas or research. As their announcement admits, it is a straightforward concurrent mark/sweep collector based on ideas from the 1970s. <strong>It is notable only because it has been designed to optimise for pause times at the cost of absolutely every other desirable characteristic in a GC</strong>. Go’s <a href="https://talks.golang.org/2015/go-gc.pdf">tech talks</a> and marketing materials don’t seem to mention any of these tradeoffs, leaving developers unfamiliar with garbage collection technologies to assume that no such tradeoffs exist, and by implication, that Go’s competitors are just badly engineered piles of junk.</p>
</blockquote>
<hr />
<h3 id="max-gc-pause-time-compared-to-amount-of-live-objects">Max GC Pause Time compared to Amount of Live Objects</h3>
<p>To investigate things further, let’s look at how the maximum pause times vary with the number of <em>live objects</em>. If you refer back to the sample code, we will still be allocating 10,000,000 messages (<code class="language-plaintext highlighter-rouge">msgCount</code>), but we will vary the number that are kept around at any one time by changing the <code class="language-plaintext highlighter-rouge">windowSize</code> value. Here are the results (click for full-size image):</p>
<p><a href="/images/2017/01/GC Pause times compared to WindowSize.png"><img src="/images/2017/01/GC Pause times compared to WindowSize.png" alt="GC Pause times compared to WindowSize" /></a></p>
<p>So you can clearly see that the max pause time is proportional (linearly?) to the number of live objects, i.e. the number of objects that survive the GC. Why is this the case? Well, to get a bit more info we will again use PerfView to help us. If you compare the 2 tables below, you can see that the ‘Promoted MB’ is drastically different: a lot more memory is promoted when we have a larger <code class="language-plaintext highlighter-rouge">windowSize</code>, so the GC has more work to do and as a result the ‘Pause MSec’ times go up.</p>
<center><table border="1"><tbody><tr><th colspan="13">GC Events by Time - windowSize = 100,000</th></tr><tr><th colspan="13">All times are in msec. Hover over columns for help.</th></tr><tr><th>GC<br />Index</th><th title="N=NonConcurrent, B=Background, F=Foreground (while background is running) I=Induced i=InducedNotForced">Gen</th><th>Pause<br />MSec</th><th title="Amount allocated since the last GC occured">Gen0<br />Alloc<br />MB</th><th title="The peak size of the GC during GC. (includes fragmentation)">Peak<br />MB</th><th title="The size after GC (includes fragmentation)">After<br />MB</th><th title="Memory this GC promoted">Promoted<br />MB</th><th title="Size of gen0 at the end of this GC.">Gen0<br />MB</th><th title="Size of gen1 at the end of this GC.">Gen1<br />MB</th><th title="Size of Gen2 in MB at the end of this GC.">Gen2<br />MB</th><th title="Size of Large object heap (LOH) in MB at the end of this GC.">LOH<br />MB</th></tr><tr><td style="text-align: right;">2</td><td style="text-align: right;">1N</td><td style="text-align: right;">39.443</td><td style="text-align: right;">1,516.354</td><td style="text-align: right;">1,516.354</td><td style="text-align: right;">108.647</td><td style="text-align: right;">104.831</td><td style="text-align: right;">0.000</td><td style="text-align: right;">107.200</td><td style="text-align: right;">0.031</td><td style="text-align: right;">1.415</td></tr><tr><td style="text-align: right;">3</td><td style="text-align: right;">0N</td><td style="text-align: right;">38.516</td><td style="text-align: right;">1,651.466</td><td style="text-align: right;">0.000</td><td style="text-align: right;">215.847</td><td style="text-align: right;">104.800</td><td style="text-align: right;">0.000</td><td style="text-align: right;">214.400</td><td style="text-align: right;">0.031</td><td style="text-align: right;">1.415</td></tr><tr><td style="text-align: right;">4</td><td style="text-align: right;">1N</td><td style="text-align: 
right;">42.732</td><td style="text-align: right;">1,693.908</td><td style="text-align: right;">1,909.754</td><td style="text-align: right;">108.647</td><td style="text-align: right;">104.800</td><td style="text-align: right;">0.000</td><td style="text-align: right;">107.200</td><td style="text-align: right;">0.031</td><td style="text-align: right;">1.415</td></tr><tr><td style="text-align: right;">5</td><td style="text-align: right;">0N</td><td style="text-align: right;">35.067</td><td style="text-align: right;">1,701.012</td><td style="text-align: right;">1,809.658</td><td style="text-align: right;">215.847</td><td style="text-align: right;">104.800</td><td style="text-align: right;">0.000</td><td style="text-align: right;">214.400</td><td style="text-align: right;">0.031</td><td style="text-align: right;">1.415</td></tr><tr><td style="text-align: right;">6</td><td style="text-align: right;">1N</td><td style="text-align: right;">54.424</td><td style="text-align: right;">1,727.380</td><td style="text-align: right;">1,943.226</td><td style="text-align: right;">108.647</td><td style="text-align: right;">104.800</td><td style="text-align: right;">0.000</td><td style="text-align: right;">107.200</td><td style="text-align: right;">0.031</td><td style="text-align: right;">1.415</td></tr><tr><td style="text-align: right;">7</td><td style="text-align: right;">0N</td><td style="text-align: right;">35.208</td><td style="text-align: right;">1,603.832</td><td style="text-align: right;">1,712.479</td><td style="text-align: right;">215.847</td><td style="text-align: right;">104.800</td><td style="text-align: right;">0.000</td><td style="text-align: right;">214.400</td><td style="text-align: right;">0.031</td><td style="text-align: right;">1.415</td></tr></tbody></table></center>
<p><a href="/images/2017/01/GC Events by Time - windowSize 100,000.png">Full PerfView output</a></p>
<center><table border="1"><tbody><tr><th colspan="13" align="Center">GC Events by Time - windowSize = 400,000</th></tr><tr><th colspan="13" align="Center">All times are in msec. Hover over columns for help.</th></tr><tr><th>GC<br />Index</th><th title="N=NonConcurrent, B=Background, F=Foreground (while background is running) I=Induced i=InducedNotForced">Gen</th><th>Pause<br />MSec</th><th title="Amount allocated since the last GC occured">Gen0<br />Alloc<br />MB</th><th title="The peak size of the GC during GC. (includes fragmentation)">Peak<br />MB</th><th title="The size after GC (includes fragmentation)">After<br />MB</th><th title="Memory this GC promoted">Promoted<br />MB</th><th title="Size of gen0 at the end of this GC.">Gen0<br />MB</th><th title="Size of gen1 at the end of this GC.">Gen1<br />MB</th><th title="Size of Gen2 in MB at the end of this GC.">Gen2<br />MB</th><th title="Size of Large object heap (LOH) in MB at the end of this GC.">LOH<br />MB</th></tr><tr><td align="right">2</td><td align="right">0N</td><td align="right">10.319</td><td align="right">76.170</td><td align="right">76.170</td><td align="right">76.133</td><td align="right">68.983</td><td align="right">0.000</td><td align="right">72.318</td><td align="right">0.000</td><td align="right">3.815</td></tr><tr><td align="right">3</td><td align="right">1N</td><td align="right">47.192</td><td align="right">666.089</td><td align="right">0.000</td><td align="right">708.556</td><td align="right">419.231</td><td align="right">0.000</td><td align="right">704.016</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">4</td><td align="right">0N</td><td align="right">145.347</td><td align="right">1,023.369</td><td align="right">1,731.925</td><td align="right">868.610</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">864.070</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">5</td><td 
align="right">1N</td><td align="right">190.736</td><td align="right">1,278.314</td><td align="right">2,146.923</td><td align="right">433.340</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">428.800</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">6</td><td align="right">0N</td><td align="right">150.689</td><td align="right">1,235.161</td><td align="right">1,668.501</td><td align="right">862.140</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">857.600</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">7</td><td align="right">1N</td><td align="right">214.465</td><td align="right">1,493.290</td><td align="right">2,355.430</td><td align="right">433.340</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">428.800</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">8</td><td align="right">0N</td><td align="right">148.816</td><td align="right">1,055.470</td><td align="right">1,488.810</td><td align="right">862.140</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">857.600</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">9</td><td align="right">1N</td><td align="right">225.881</td><td align="right">1,543.345</td><td align="right">2,405.485</td><td align="right">433.340</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">428.800</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">10</td><td align="right">0N</td><td align="right">148.292</td><td align="right">1,077.176</td><td align="right">1,510.516</td><td align="right">862.140</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">857.600</td><td align="right">0.725</td><td align="right">3.815</td></tr><tr><td align="right">11</td><td align="right">1N</td><td 
align="right">225.917</td><td align="right">1,610.319</td><td align="right">2,472.459</td><td align="right">433.340</td><td align="right">419.200</td><td align="right">0.000</td><td align="right">428.800</td><td align="right">0.725</td><td align="right">3.815</td></tr></tbody></table></center>
<p><a href="/images/2017/01/GC Events by Time - windowSize 400,000.png">Full PerfView output</a></p>
<hr />
<h3 id="going-off-heap">Going ‘off-heap’</h3>
<p>Finally, if we really want to eradicate GC pauses in .NET, we can go off-heap. To do that we can write <code class="language-plaintext highlighter-rouge">unsafe</code> code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">dest</span> <span class="p">=</span> <span class="n">array</span><span class="p">[</span><span class="n">id</span> <span class="p">%</span> <span class="n">windowSize</span><span class="p">];</span>
<span class="n">IntPtr</span> <span class="n">unmanagedPointer</span> <span class="p">=</span> <span class="n">Marshal</span><span class="p">.</span><span class="nf">AllocHGlobal</span><span class="p">(</span><span class="n">dest</span><span class="p">.</span><span class="n">Length</span><span class="p">);</span>
<span class="kt">byte</span><span class="p">*</span> <span class="n">bytePtr</span> <span class="p">=</span> <span class="p">(</span><span class="kt">byte</span> <span class="p">*)</span> <span class="n">unmanagedPointer</span><span class="p">;</span>
<span class="c1">// Get the raw data into the bytePtr (byte *) </span>
<span class="c1">// in reality this would come from elsewhere, e.g. a network packet</span>
<span class="c1">// but for the test we'll just cheat and populate it in a loop</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">dest</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="p">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">*(</span><span class="n">bytePtr</span> <span class="p">+</span> <span class="n">i</span><span class="p">)</span> <span class="p">=</span> <span class="p">(</span><span class="kt">byte</span><span class="p">)</span><span class="n">id</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Copy the unmanaged byte array (byte*) into the managed one (byte[])</span>
<span class="n">Marshal</span><span class="p">.</span><span class="nf">Copy</span><span class="p">(</span><span class="n">unmanagedPointer</span><span class="p">,</span> <span class="n">dest</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="n">dest</span><span class="p">.</span><span class="n">Length</span><span class="p">);</span>
<span class="n">Marshal</span><span class="p">.</span><span class="nf">FreeHGlobal</span><span class="p">(</span><span class="n">unmanagedPointer</span><span class="p">);</span>
</code></pre></div></div>
<p>Note: I wouldn’t recommend this option unless you have first profiled and determined that GC pauses are a problem; it’s called <code class="language-plaintext highlighter-rouge">unsafe</code> for a reason.</p>
<p><a href="/images/2017/01/Histogram - Array - SVR v OffHeap.png"><img src="/images/2017/01/Histogram - Array - SVR v OffHeap.png" alt="Histogram - Array - SVR v OffHeap" /></a></p>
<p>As the graph shows, it clearly works (the off-heap values are there, honest!!). That’s not surprising though: off-heap memory isn’t tracked by the GC, so we are giving the GC nothing to do and we get no GC pauses!</p>
<hr />
<p>To finish, let’s get a final word from Maoni Stephens, the main GC dev on the .NET runtime, from <a href="https://blogs.msdn.microsoft.com/maoni/2014/12/25/gc-etw-events-2/">GC ETW events – 2 – Maoni’s WebLog</a>:</p>
<blockquote>
<p>It doesn’t even mean for the longest individual GC pauses you should always look at full GCs because full GCs can be done concurrently, which means you could have gen2 GCs whose pauses are shorter than ephemeral GCs. And even if full GCs did have longest individual pauses, it still doesn’t necessarily mean you should only look at them because you might be doing these GCs very infrequently, and ephemeral GCs actually contribute to most of the GC pause time if the total GC pauses are your problem.</p>
</blockquote>
<p>Note: <strong>Ephemeral</strong> generations and segments - Because objects in generations 0 and 1 are short-lived, these generations are known as the <strong>ephemeral</strong> generations.</p>
<p>So if GC pause times are a genuine issue in your application, make sure you analyse them correctly!</p>
<hr />
<p>Discuss this post in <a href="https://www.reddit.com/r/csharp/comments/5ns3dx/analysing_pause_times_in_the_net_gc/">/r/csharp</a>, <a href="https://www.reddit.com/r/programming/comments/5nrror/analysing_pause_times_in_the_net_gc/">/r/programming</a> and <a href="https://news.ycombinator.com/item?id=13397898">Hacker News</a></p>
<p>The post <a href="http://www.mattwarren.org/2017/01/13/Analysing-Pause-times-in-the-.NET-GC/">Analysing Pause times in the .NET GC</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Why Exceptions should be Exceptional2016-12-20T00:00:00+00:00http://www.mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional
<p><img src="/images/2016/12/Meteor-Hit-588072.jpg" alt="Meteor Hit on the Earth" /></p>
<p>According to the <a href="http://neo.jpl.nasa.gov/">NASA ‘Near Earth Object Program’</a> asteroid <a href="http://neo.jpl.nasa.gov/risk/a101955.html">‘<em>101955 Bennu (1999 RQ36)</em>’</a> has a Cumulative Impact Probability of 3.7e-04, i.e. there is a <strong>1 in 2,700</strong> (0.0370%) chance of Earth impact, but more reassuringly there is a 99.9630% chance the asteroid will miss the Earth completely!</p>
<p>But how does this relate to exceptions in the .NET runtime? Well, let’s take a look at the official .NET <a href="https://msdn.microsoft.com/en-us/library/ms229030(v=vs.110).aspx">Framework Design Guidelines for Throwing Exceptions</a> (which are based on the excellent book <a href="http://amzn.to/2hOOHsR">Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries</a>)</p>
<p><img src="/images/2016/12/Framework Design Guidelines for Exceptions.png" alt="Framework Design Guidelines for Exceptions" /></p>
<p><strong>So exceptions should be exceptional, unusual or rare, much like an asteroid strike!!</strong></p>
<h3 id="net-framework-tryxxx-pattern">.NET Framework TryXXX() Pattern</h3>
<p>In .NET, the recommended way to avoid exceptions in normal code flow is to use the <code class="language-plaintext highlighter-rouge">TryXXX()</code> pattern. As pointed out in the guideline section on <a href="https://msdn.microsoft.com/en-us/library/ms229009(v=vs.110).aspx">Exceptions and Performance</a>, rather than writing code like this, which has to catch the exception when the input string isn’t a valid integer:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">result</span> <span class="p">=</span> <span class="kt">int</span><span class="p">.</span><span class="nf">Parse</span><span class="p">(</span><span class="s">"IANAN"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">FormatException</span> <span class="n">fEx</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">fEx</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>You should instead use the <code class="language-plaintext highlighter-rouge">TryXXX</code> API, in the following pattern:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">result</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="kt">int</span><span class="p">.</span><span class="nf">TryParse</span><span class="p">(</span><span class="s">"IANAN"</span><span class="p">,</span> <span class="k">out</span> <span class="n">result</span><span class="p">))</span>
<span class="p">{</span>
<span class="c1">// SUCCESS!!</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="c1">// FAIL!!</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Fortunately large parts of the .NET runtime use this pattern for non-exceptional events, such as parsing a string, creating a URL or adding an item to a Concurrent Dictionary.</p>
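<p>As a minimal sketch of the other two cases mentioned above (the class name, the strings and the dictionary contents are made-up values for illustration), <code class="language-plaintext highlighter-rouge">Uri.TryCreate</code> and <code class="language-plaintext highlighter-rouge">ConcurrentDictionary.TryAdd</code> both report failure through their <code class="language-plaintext highlighter-rouge">bool</code> return value rather than by throwing:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Concurrent;

class TryPatternExamples
{
    static void Main()
    {
        // Creating a URL: TryCreate returns false instead of throwing UriFormatException
        Uri uri;
        bool validUrl = Uri.TryCreate("http://www.mattwarren.org", UriKind.Absolute, out uri);
        bool invalidUrl = Uri.TryCreate("not a url", UriKind.Absolute, out uri);
        Console.WriteLine("valid: " + validUrl + ", invalid: " + invalidUrl);

        // Adding to a ConcurrentDictionary: TryAdd returns false if the key already exists
        var cache = new ConcurrentDictionary&lt;string, int&gt;();
        bool firstAdd = cache.TryAdd("answer", 42);
        bool secondAdd = cache.TryAdd("answer", 43);
        Console.WriteLine("first: " + firstAdd + ", second: " + secondAdd);
    }
}
</code></pre></div></div>
<p>In both cases the ‘failure’ is an expected outcome, so the caller handles it with a plain <code class="language-plaintext highlighter-rouge">if</code> and no exception is ever created.</p>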
<h2 id="the-performance-costs-of-exceptions">The performance costs of exceptions</h2>
<p>So onto the performance costs. I was inspired to write this post after reading this tweet from <a href="https://twitter.com/clemensv">Clemens Vasters</a>:</p>
<p><a href="https://twitter.com/clemensv/status/722821904189362179"><img src="/images/2016/12/Clemens Vasters tweet.png" alt="Clemens Vasters tweet" /></a></p>
<p>I also copied/borrowed a large number of ideas from the excellent post <a href="https://shipilev.net/blog/2014/exceptional-performance/">‘The Exceptional Performance of Lil’ Exception’</a> by Java performance guru <a href="https://twitter.com/shipilev">Aleksey Shipilëv</a> (this post is in essence the .NET version of his post, which focuses exclusively on exceptions in the JVM)</p>
<p>So let’s start with the full results (click for full-size image):</p>
<p><a href="/images/2016/12/Exception Benchmark Results.png"><img src="/images/2016/12/Exception Benchmark Results.png" alt="Exception Benchmark Results" /></a></p>
<p>(<a href="https://gist.github.com/mattwarren/e3cdd278ba9c2cad03cc6b53ce6d47f6">Full Benchmark Code and Results</a>)</p>
<h3 id="rare-exceptions-v-error-code-handling">Rare exceptions v Error Code Handling</h3>
<p>Up front I want to be clear that nothing in this post is meant to contradict the best-practices outlined in the .NET Framework Guidelines (above), in fact I hope that it actually backs them up!</p>
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">StdErr</th>
<th style="text-align: right">StdDev</th>
<th style="text-align: right">Scaled</th>
</tr>
</thead>
<tbody>
<tr>
<td>ErrorCodeWithReturnValue</td>
<td style="text-align: right">1.4472 ns</td>
<td style="text-align: right">0.0088 ns</td>
<td style="text-align: right">0.0341 ns</td>
<td style="text-align: right">1.00</td>
</tr>
<tr>
<td>RareExceptionStackTrace</td>
<td style="text-align: right">22.0401 ns</td>
<td style="text-align: right">0.0292 ns</td>
<td style="text-align: right">0.1132 ns</td>
<td style="text-align: right">15.24</td>
</tr>
<tr>
<td>RareExceptionMediumStackTrace</td>
<td style="text-align: right">61.8835 ns</td>
<td style="text-align: right">0.0609 ns</td>
<td style="text-align: right">0.2279 ns</td>
<td style="text-align: right">42.78</td>
</tr>
<tr>
<td>RareExceptionDeepStackTrace</td>
<td style="text-align: right">115.3692 ns</td>
<td style="text-align: right">0.1795 ns</td>
<td style="text-align: right">0.6953 ns</td>
<td style="text-align: right">79.76</td>
</tr>
</tbody>
</table>
<p>Here we can see that as long as you follow the guidance and ‘DO NOT use exceptions for the normal flow of control’, they are actually not that costly. Yes, they’re 15 times slower than using error codes, but we’re only talking about 22 nanoseconds, i.e. 22 billionths of a second, so you would have to be throwing exceptions frequently for it to be noticeable. For reference, here’s what the code for the first 2 results looks like:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">struct</span> <span class="nc">ResultAndErrorCode</span><span class="p"><</span><span class="n">T</span><span class="p">></span>
<span class="p">{</span>
<span class="k">public</span> <span class="n">T</span> <span class="n">Result</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">ErrorCode</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="nf">Benchmark</span><span class="p">(</span><span class="n">Baseline</span> <span class="p">=</span> <span class="k">true</span><span class="p">)]</span>
<span class="k">public</span> <span class="n">ResultAndErrorCode</span><span class="p"><</span><span class="kt">string</span><span class="p">></span> <span class="nf">ErrorCodeWithReturnValue</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="k">new</span> <span class="n">ResultAndErrorCode</span><span class="p"><</span><span class="kt">string</span><span class="p">>();</span>
<span class="n">result</span><span class="p">.</span><span class="n">Result</span> <span class="p">=</span> <span class="k">null</span><span class="p">;</span>
<span class="n">result</span><span class="p">.</span><span class="n">ErrorCode</span> <span class="p">=</span> <span class="m">5</span><span class="p">;</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">RareExceptionStackTrace</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">try</span>
<span class="p">{</span>
<span class="nf">RareLevel20</span><span class="p">();</span> <span class="c1">// start all the way down</span>
<span class="k">return</span> <span class="k">null</span><span class="p">;</span> <span class="c1">//Prevent Error CS0161: not all code paths return a value</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">InvalidOperationException</span> <span class="n">ioex</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Force collection of a full StackTrace</span>
<span class="k">return</span> <span class="n">ioex</span><span class="p">.</span><span class="n">StackTrace</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Where the <code class="language-plaintext highlighter-rouge">RareLevelXX()</code> functions look like this (i.e. they will <strong>only</strong> trigger an exception once for every 2,700 times they’re called):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">NoInlining</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">RareLevel1</span><span class="p">()</span> <span class="p">{</span> <span class="nf">RareLevel2</span><span class="p">();</span> <span class="p">}</span>
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">NoInlining</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">RareLevel2</span><span class="p">()</span> <span class="p">{</span> <span class="nf">RareLevel3</span><span class="p">();</span> <span class="p">}</span>
<span class="p">...</span> <span class="c1">// several layers left out!!</span>
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">NoInlining</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">RareLevel19</span><span class="p">()</span> <span class="p">{</span> <span class="nf">RareLevel20</span><span class="p">();</span> <span class="p">}</span>
<span class="p">[</span><span class="nf">MethodImpl</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">NoInlining</span><span class="p">)]</span>
<span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">RareLevel20</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">counter</span><span class="p">++;</span>
<span class="c1">// will *rarely* happen (1 in 2700)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">counter</span> <span class="p">%</span> <span class="n">chanceOfAsteroidHit</span> <span class="p">==</span> <span class="m">1</span><span class="p">)</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nf">InvalidOperationException</span><span class="p">(</span><span class="s">"Deep Stack Trace - Rarely triggered"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Therefore <code class="language-plaintext highlighter-rouge">RareExceptionMediumStackTrace()</code> just calls <code class="language-plaintext highlighter-rouge">RareLevel10()</code> to get a medium stack trace and <code class="language-plaintext highlighter-rouge">RareExceptionDeepStackTrace()</code> calls <code class="language-plaintext highlighter-rouge">RareLevel1()</code> which triggers the full/deep one (the full <a href="https://gist.github.com/mattwarren/e3cdd278ba9c2cad03cc6b53ce6d47f6">benchmark code is available</a>).</p>
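<p>To make the ‘1 in 2,700’ behaviour concrete, here is a small stand-alone sketch. The declarations of <code class="language-plaintext highlighter-rouge">counter</code> and <code class="language-plaintext highlighter-rouge">chanceOfAsteroidHit</code> aren’t shown above, so the ones below are assumptions, as are the class name and the <code class="language-plaintext highlighter-rouge">Main</code> loop (the real benchmark is driven by BenchmarkDotNet, not a hand-written loop):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.CompilerServices;

class RareThrowSketch
{
    // Assumed declarations, the post doesn't show these fields
    private static long counter = 0;
    private const int chanceOfAsteroidHit = 2700;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void RareLevel20()
    {
        counter++;
        // will *rarely* happen (1 in 2700)
        if (counter % chanceOfAsteroidHit == 1)
            throw new InvalidOperationException("Deep Stack Trace - Rarely triggered");
    }

    static void Main()
    {
        int calls = 27000;
        int thrown = 0;
        for (int i = 0; i &lt; calls; i++)
        {
            try { RareLevel20(); }
            catch (InvalidOperationException) { thrown++; }
        }
        // counter hits 1, 2701, 5401, ... so 27,000 calls should give 10 exceptions
        Console.WriteLine(thrown + " exceptions in " + calls + " calls");
    }
}
</code></pre></div></div>
<p>The other 26,990 calls take the cheap path, which is why the <em>average</em> cost per call in the table stays in the tens of nanoseconds even though an individual throw is far more expensive.</p>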
<h3 id="stack-traces">Stack traces</h3>
<p>Now that we’ve seen the cost of throwing exceptions rarely, we’re going to look at the effect the stack trace depth has on performance. Here are the full, raw results:</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">StdErr</th>
<th style="text-align: right">StdDev</th>
<th style="text-align: right">Gen 0</th>
<th style="text-align: right">Allocated</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exception-Message</td>
<td style="text-align: right">9,187.9417 ns</td>
<td style="text-align: right">13.4824 ns</td>
<td style="text-align: right">48.6117 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">148 B</td>
</tr>
<tr>
<td>Exception-TryCatch</td>
<td style="text-align: right">9,253.0215 ns</td>
<td style="text-align: right">13.2496 ns</td>
<td style="text-align: right">51.3154 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">148 B</td>
</tr>
<tr>
<td>Exception<strong>Medium</strong>-Message</td>
<td style="text-align: right">14,911.7999 ns</td>
<td style="text-align: right">20.2448 ns</td>
<td style="text-align: right">78.4078 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">916 B</td>
</tr>
<tr>
<td>Exception<strong>Medium</strong>-TryCatch</td>
<td style="text-align: right">15,158.0940 ns</td>
<td style="text-align: right">147.4210 ns</td>
<td style="text-align: right">737.1049 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">916 B</td>
</tr>
<tr>
<td>Exception<strong>Deep</strong>-Message</td>
<td style="text-align: right">19,166.3524 ns</td>
<td style="text-align: right">30.0539 ns</td>
<td style="text-align: right">116.3984 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">916 B</td>
</tr>
<tr>
<td>Exception<strong>Deep</strong>-TryCatch</td>
<td style="text-align: right">19,581.6743 ns</td>
<td style="text-align: right">208.3895 ns</td>
<td style="text-align: right">833.5579 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">916 B</td>
</tr>
<tr>
<td>CachedException-StackTrace</td>
<td style="text-align: right">29,354.9344 ns</td>
<td style="text-align: right">34.8932 ns</td>
<td style="text-align: right">135.1407 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">1.82 kB</td>
</tr>
<tr>
<td>Exception-StackTrace</td>
<td style="text-align: right">30,178.7152 ns</td>
<td style="text-align: right">41.0362 ns</td>
<td style="text-align: right">158.9327 ns</td>
<td style="text-align: right">-</td>
<td style="text-align: right">1.93 kB</td>
</tr>
<tr>
<td>Exception<strong>Medium</strong>-StackTrace</td>
<td style="text-align: right">100,121.7951 ns</td>
<td style="text-align: right">129.0631 ns</td>
<td style="text-align: right">499.8591 ns</td>
<td style="text-align: right">0.1953</td>
<td style="text-align: right">15.71 kB</td>
</tr>
<tr>
<td>Exception<strong>Deep</strong>-StackTrace</td>
<td style="text-align: right">154,569.3454 ns</td>
<td style="text-align: right">205.2174 ns</td>
<td style="text-align: right">794.8034 ns</td>
<td style="text-align: right">3.6133</td>
<td style="text-align: right">27.42 kB</td>
</tr>
</tbody>
</table>
</span>
<p><strong>Note:</strong> in these tests we are triggering an exception <strong>every time</strong> a method is called; they aren’t the rare cases that we measured previously.</p>
<h4 id="exception-handling-without-collecting-the-full-stacktrace"><strong>Exception handling without collecting the full StackTrace</strong></h4>
<p>First we are going to look at the results measuring the scenario where we <strong>don’t</strong> explicitly collect the <code class="language-plaintext highlighter-rouge">StackTrace</code> after the exception is caught, so the benchmark code looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">ExceptionMessage</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">try</span>
<span class="p">{</span>
<span class="nf">Level20</span><span class="p">();</span> <span class="c1">// start *all* the way down the stack</span>
<span class="k">return</span> <span class="k">null</span><span class="p">;</span> <span class="c1">//Prevent Error CS0161: not all code paths return a value</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">InvalidOperationException</span> <span class="n">ioex</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Only get the simple message from the Exception </span>
<span class="c1">// (don't trigger a StackTrace collection)</span>
<span class="k">return</span> <span class="n">ioex</span><span class="p">.</span><span class="n">Message</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In the following graphs, <strong>shallow</strong> stack traces are in <font color="#5B9BD5" style="font-weight: bold;">blue bars</font>, <strong>medium</strong> in <font color="#ED7D31" style="font-weight: bold;">orange</font> and <strong>deep</strong> stacks are shown in <font color="#70AD47" style="font-weight: bold;">green</font></p>
<p><a href="/images/2016/12/Exception Handling - NOT Calculating StackTrace.png"><img src="/images/2016/12/Exception Handling - NOT Calculating StackTrace.png" alt="Exception Handling - NOT Calculating StackTrace" /></a></p>
<p>So we clearly see there is an extra cost for exception handling that increases the deeper the stack trace goes. This is because when an exception is thrown the runtime needs to search up the stack until it hits a method that can handle it. The further it has to look up the stack, the more work it has to do.</p>
<h4 id="exception-handling-including-collection-of-the-full-stacktrace"><strong>Exception handling including collection of the full StackTrace</strong></h4>
<p>Now for the final results, in which we <strong>explicitly ask</strong> the run-time to (lazily) fetch the full stack trace, by accessing the <code class="language-plaintext highlighter-rouge">StackTrace</code> property. The code looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">ExceptionStackTrace</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">try</span>
<span class="p">{</span>
<span class="nf">Level20</span><span class="p">();</span> <span class="c1">// start *all* the way down the stack</span>
<span class="k">return</span> <span class="k">null</span><span class="p">;</span> <span class="c1">//Prevent Error CS0161: not all code paths return a value</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">InvalidOperationException</span> <span class="n">ioex</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Force collection of a full StackTrace</span>
<span class="k">return</span> <span class="n">ioex</span><span class="p">.</span><span class="n">StackTrace</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p><a href="/images/2016/12/Exception Handling - Calculating StackTrace.png"><img src="/images/2016/12/Exception Handling - Calculating StackTrace.png" alt="Exception Handling - Calculating StackTrace" /></a></p>
<p>Finally we see that fetching the entire stack trace (via <code class="language-plaintext highlighter-rouge">StackTrace</code>) dominates the cost of just handling the exception (i.e. only accessing the exception message). But again, the deeper the stack trace, the higher the cost.</p>
<p>So thank goodness we’re in the .NET world, where huge stack traces are rare. Over in <a href="https://ptrthomas.wordpress.com/2006/06/06/java-call-stack-from-http-upto-jdbc-as-a-picture/">Java-land they have to deal with nonsense like this</a> (click to see the full-res version!!):</p>
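<p>As a rough illustration of why depth matters, the sketch below counts how many frames end up in the trace for a shallow and a deep throw. It is not the benchmark code: <code class="language-plaintext highlighter-rouge">ThrowAtDepth</code> and <code class="language-plaintext highlighter-rouge">FramesCaptured</code> are hypothetical helpers, and recursion stands in for the hand-written <code class="language-plaintext highlighter-rouge">LevelXX()</code> chain:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

class StackDepthSketch
{
    // Hypothetical helper: recurses 'depth' times, then throws.
    // The '+ 1' after the recursive call stops it being compiled as a tail call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int ThrowAtDepth(int depth)
    {
        if (depth == 0)
            throw new InvalidOperationException("Thrown at the bottom of the stack");
        return ThrowAtDepth(depth - 1) + 1;
    }

    static int FramesCaptured(int depth)
    {
        try
        {
            return ThrowAtDepth(depth);
        }
        catch (InvalidOperationException ioex)
        {
            // Building a StackTrace from the exception materialises every frame
            // between the throw and this catch block
            return new StackTrace(ioex).FrameCount;
        }
    }

    static void Main()
    {
        Console.WriteLine("shallow: " + FramesCaptured(1) + " frames");
        Console.WriteLine("deep:    " + FramesCaptured(20) + " frames");
    }
}
</code></pre></div></div>
<p>Every extra frame is more data for the runtime to walk, resolve and format, which fits the growth in both time and allocations seen in the table above.</p>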
<p><a href="/images/2016/12/Huge Java Stack Trace.png"><img src="/images/2016/12/Huge Java Stack Trace - smaller.png" alt="Huge Java Stack Trace" /></a></p>
<hr />
<h2 id="conclusion">Conclusion</h2>
<ol>
<li><strong>Rare or Exceptional exceptions are not hugely expensive</strong> and they should <strong>always</strong> be the preferred way of error handling in .NET</li>
<li>If you have code that is <strong>expected to fail often</strong> (such as parsing a string into an integer), use the <code class="language-plaintext highlighter-rouge">TryXXX()</code> pattern</li>
<li><strong>The deeper the stack trace, the more work that has to be done</strong>, so the more overhead there is when catching/handling exceptions</li>
<li>This is even more true if you are also fetching the entire stack trace, via the <code class="language-plaintext highlighter-rouge">StackTrace</code> property. <strong>So if you don’t need it, don’t fetch it.</strong></li>
</ol>
<p>Discuss this post in <a href="https://www.reddit.com/r/programming/comments/5jdosy/why_exceptions_should_be_exceptional/">/r/programming</a> and <a href="https://www.reddit.com/r/csharp/comments/5je0o3/why_exceptions_should_be_exceptional/">/r/csharp</a></p>
<hr />
<h3 id="further-reading">Further Reading</h3>
<p><a href="https://blogs.msdn.microsoft.com/ricom/2003/12/19/exception-cost-when-to-throw-and-when-not-to/">Exception Cost: When to throw and when not to</a>, a classic post on the subject by ‘.NET Perf Guru’ Rico Mariani.</p>
<h3 id="the-stack-trace-of-a-stacktrace">The stack trace of a StackTrace!!</h3>
<p>The full call-stack that the CLR goes through when fetching the data for the Exception <code class="language-plaintext highlighter-rouge">StackTrace</code> property:</p>
<ul>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/exception.cs,950d763693dd32d3">Exception - public virtual String StackTrace</a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/exception.cs,fd7466f7c15d31c7">Exception - private string GetStackTrace(..)</a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/environment.cs,40b558dbbbc4b07a">Environment - internal static String GetStackTrace(..)</a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/diagnostics/stacktrace.cs,15f43636ec9ec56f">Diagnostics - public StackTrace(..)</a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/diagnostics/stacktrace.cs,2938a79cef33dc28">Diagnostics - private void CaptureStackTrace(..)</a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/diagnostics/stacktrace.cs,3a7c9de344634c84">Diagnostics - internal static extern void GetStackFramesInternal(..)</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/32d03bb66a51c7ed6712c4cdd319de0cc7cbbf37/src/vm/debugdebugger.cpp#L391-L868">debugdebugger - DebugStackTrace::GetStackFramesInternal(..)</a> (c/c++)</li>
<li><a href="https://github.com/dotnet/coreclr/blob/32d03bb66a51c7ed6712c4cdd319de0cc7cbbf37/src/vm/debugdebugger.cpp#L1185-L1289">debugdebugger - DebugStackTrace::GetStackFramesFromException(..)</a> (c/c++)</li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional/">Why Exceptions should be Exceptional</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Why is reflection slow?2016-12-14T00:00:00+00:00http://www.mattwarren.org/2016/12/14/Why-is-Reflection-slow
<p>It’s common knowledge that <a href="http://stackoverflow.com/search?q=reflection+slow">reflection in .NET is slow</a>, but why is that the case? This post aims to figure that out by looking at what reflection does <em>under-the-hood</em>.</p>
<h3 id="clr-type-system-design-goals">CLR Type System Design Goals</h3>
<p>But first it’s worth pointing out that part of the reason reflection isn’t fast is that it was never designed to have <em>high-performance</em> as one of its goals, from <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/type-system.md#design-goals-and-non-goals">Type System Overview - ‘Design Goals and Non-goals’</a>:</p>
<blockquote>
<p><strong>Goals</strong></p>
<ul>
<li><strong>Accessing information needed at runtime from executing (non-reflection) code is very fast.</strong></li>
<li>Accessing information needed at compilation time for generating code is straightforward.</li>
<li>The garbage collector/stackwalker is able to access necessary information without taking locks, or allocating memory.</li>
<li>Minimal amounts of types are loaded at a time.</li>
<li>Minimal amounts of a given type are loaded at type load time.</li>
<li>Type system data structures must be storable in NGEN images.</li>
</ul>
</blockquote>
<blockquote>
<p><strong>Non-Goals</strong></p>
<ul>
<li>All information in the metadata is directly reflected in the CLR data structures.</li>
<li><strong>All uses of reflection are fast.</strong></li>
</ul>
</blockquote>
<p>and along the same lines, from <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/type-loader.md#key-data-structures">Type Loader Design - ‘Key Data Structures’</a>:</p>
<blockquote>
<p><strong>EEClass</strong></p>
<p>MethodTable data are split into “hot” and “cold” structures to improve working set and cache utilization. MethodTable itself is meant to only store “hot” data that are needed in program steady state. <strong>EEClass stores “cold” data that are typically only needed by type loading, JITing or reflection.</strong> Each MethodTable points to one EEClass.</p>
</blockquote>
<h2 id="how-does-reflection-work">How does Reflection work?</h2>
<p><strong>So we know that ensuring reflection was fast was not a design goal, but what is it doing that takes the extra time?</strong></p>
<p>Well, there are several things happening. To illustrate this, let’s look at the managed and unmanaged code call-stack that a reflection call goes through.</p>
<ul>
<li><strong>System.Reflection.RuntimeMethodInfo.Invoke</strong>(..) - <a href="https://github.com/dotnet/coreclr/blob/b638af3a4dd52fa7b1ea1958164136c72096c25c/src/mscorlib/src/System/Reflection/MethodInfo.cs#L619-L638">source code link</a>
<ul>
<li>calling <strong>System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal</strong>(..)</li>
</ul>
</li>
<li><strong>System.RuntimeMethodHandle.PerformSecurityCheck</strong>(..) - <a href="https://github.com/dotnet/coreclr/blob/e67851210d1c03d730a3bc97a87e8a6713bbf772/src/vm/reflectioninvocation.cpp#L949-L974">link</a>
<ul>
<li>calling <strong>System.GC.KeepAlive</strong>(..)</li>
</ul>
</li>
<li><strong>System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal</strong>(..) - <a href="https://github.com/dotnet/coreclr/blob/b638af3a4dd52fa7b1ea1958164136c72096c25c/src/mscorlib/src/System/Reflection/MethodInfo.cs#L651-L665">link</a>
<ul>
<li>calling stub for <strong>System.RuntimeMethodHandle.InvokeMethod</strong>(..)</li>
</ul>
</li>
<li>stub for <strong>System.RuntimeMethodHandle.InvokeMethod</strong>(..) - <a href="https://github.com/dotnet/coreclr/blob/e67851210d1c03d730a3bc97a87e8a6713bbf772/src/vm/reflectioninvocation.cpp#L1322-L1732">link</a></li>
</ul>
<p>Even if you don’t click the links and look at the individual C#/cpp methods, you can intuitively tell that there’s <em>a lot</em> of code being executed along the way. But to give you an example, the final method, where the bulk of the work is done, <a href="https://github.com/dotnet/coreclr/blob/e67851210d1c03d730a3bc97a87e8a6713bbf772/src/vm/reflectioninvocation.cpp#L1322-L1732"><code class="language-plaintext highlighter-rouge">System.RuntimeMethodHandle.InvokeMethod</code> is over 400 LOC</a>!</p>
<p><strong>That’s a nice overview, but what is it <em>specifically</em> doing?</strong></p>
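<p>For context, this is the kind of user code that ends up going through that call-stack. It’s a minimal sketch (the class name and the choice of <code class="language-plaintext highlighter-rouge">ToUpperInvariant</code> are just for illustration) that makes the same call directly and via <code class="language-plaintext highlighter-rouge">MethodInfo.Invoke</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;

class ReflectionInvokeSketch
{
    static void Main()
    {
        string input = "hello";

        // Direct call: the target method is known at compile time
        string direct = input.ToUpperInvariant();

        // Reflection call: look up the parameterless overload by name,
        // then go through MethodInfo.Invoke (target and result are passed as object)
        MethodInfo method = typeof(string).GetMethod("ToUpperInvariant", Type.EmptyTypes);
        string viaReflection = (string)method.Invoke(input, null);

        Console.WriteLine(direct == viaReflection);
    }
}
</code></pre></div></div>
<p>Both calls produce the same string, but only the second one has to pass through the checks and stubs listed above.</p>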
<h3 id="fetching-the-method-information">Fetching the Method information</h3>
<p>Before you can invoke a field/property/method via reflection you have to get the <code class="language-plaintext highlighter-rouge">FieldInfo/PropertyInfo/MethodInfo</code> handle for it, using code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Type</span> <span class="n">t</span> <span class="p">=</span> <span class="k">typeof</span><span class="p">(</span><span class="n">Person</span><span class="p">);</span>
<span class="n">FieldInfo</span> <span class="n">m</span> <span class="p">=</span> <span class="n">t</span><span class="p">.</span><span class="nf">GetField</span><span class="p">(</span><span class="s">"Name"</span><span class="p">);</span>
</code></pre></div></div>
<p>As shown in the previous section there’s a cost to this, because the relevant meta-data has to be fetched, parsed, etc. Interestingly enough the runtime helps us by keeping an internal cache of all the fields/properties/methods. This cache is implemented by the <a href="https://github.com/dotnet/coreclr/blob/b638af3a4dd52fa7b1ea1958164136c72096c25c/src/mscorlib/src/System/RtType.cs#L178-L248"><code class="language-plaintext highlighter-rouge">RuntimeTypeCache</code> class</a> and one example of its usage is in the <a href="https://github.com/dotnet/coreclr/blob/b638af3a4dd52fa7b1ea1958164136c72096c25c/src/mscorlib/src/System/Reflection/MethodInfo.cs#L95"><code class="language-plaintext highlighter-rouge">RuntimeMethodInfo</code> class</a>.</p>
<p>You can see the cache in action by running the code in <a href="https://gist.github.com/mattwarren/be21d80a016043ea5c462415b81d9b69">this gist</a>, which appropriately enough uses reflection to inspect the runtime internals!</p>
<p>Before you have done any reflection to obtain a <code class="language-plaintext highlighter-rouge">FieldInfo</code>, the code in the gist will print this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Type: ReflectionOverhead.Program
Reflection Type: System.RuntimeType (BaseType: System.Reflection.TypeInfo)
m_fieldInfoCache is null, cache has not been initialised yet
</code></pre></div></div>
<p>But once you’ve fetched even just one field, then the following will be printed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Type: ReflectionOverhead.Program
Reflection Type: System.RuntimeType (BaseType: System.Reflection.TypeInfo)
RuntimeTypeCache: System.RuntimeType+RuntimeTypeCache,
m_cacheComplete = True, 4 items in cache
[0] - Int32 TestField1 - Private
[1] - System.String TestField2 - Private
[2] - Int32 <TestProperty1>k__BackingField - Private
[3] - System.String TestField3 - Private, Static
</code></pre></div></div>
<p>where <code class="language-plaintext highlighter-rouge">ReflectionOverhead.Program</code> looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Program</span>
<span class="p">{</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">TestField1</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">string</span> <span class="n">TestField2</span><span class="p">;</span>
<span class="k">private</span> <span class="k">static</span> <span class="kt">string</span> <span class="n">TestField3</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">TestProperty1</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This means that repeated calls to <code class="language-plaintext highlighter-rouge">GetField</code> or <code class="language-plaintext highlighter-rouge">GetFields</code> are cheaper, as the runtime only has to filter the pre-existing list that’s already been created. The same applies to <code class="language-plaintext highlighter-rouge">GetMethod</code> and <code class="language-plaintext highlighter-rouge">GetProperty</code>: the first time you call them, the <code class="language-plaintext highlighter-rouge">MethodInfo</code> or <code class="language-plaintext highlighter-rouge">PropertyInfo</code> cache is built.</p>
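<p>If you want a rough feel for this without poking at runtime internals, a minimal sketch like the one below times two lookups on the same type. The <code class="language-plaintext highlighter-rouge">Sample</code> class and <code class="language-plaintext highlighter-rouge">TimeLookup</code> helper are made up for illustration, and the first measurement also includes JIT and other one-off costs, so treat the numbers as indicative only (use BenchmarkDotNet for anything serious).</p>

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical type, only here so we have some fields to look up
public class Sample
{
    private int first;
    private string second;
}

public static class Program
{
    private const BindingFlags Flags = BindingFlags.Instance | BindingFlags.NonPublic;

    // Times a single GetField call and returns the elapsed Stopwatch ticks
    public static long TimeLookup(Type type, string name)
    {
        Stopwatch timer = Stopwatch.StartNew();
        FieldInfo field = type.GetField(name, Flags);
        timer.Stop();
        if (field == null)
            throw new InvalidOperationException("Field not found: " + name);
        return timer.ElapsedTicks;
    }

    public static void Main()
    {
        // The first lookup on a type has to build the field cache,
        // the second one only has to filter the cached list
        Console.WriteLine("first lookup:  {0} ticks", TimeLookup(typeof(Sample), "first"));
        Console.WriteLine("second lookup: {0} ticks", TimeLookup(typeof(Sample), "second"));
    }
}
```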
<h3 id="argument-validation-and-error-handling">Argument Validation and Error Handling</h3>
<p>But once you’ve obtained the <code class="language-plaintext highlighter-rouge">MethodInfo</code>, there’s still a lot of work to be done when you call <code class="language-plaintext highlighter-rouge">Invoke</code> on it. Imagine you wrote some code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PropertyInfo</span> <span class="n">stringLengthField</span> <span class="p">=</span>
<span class="k">typeof</span><span class="p">(</span><span class="kt">string</span><span class="p">).</span><span class="nf">GetProperty</span><span class="p">(</span><span class="s">"Length"</span><span class="p">,</span>
<span class="n">BindingFlags</span><span class="p">.</span><span class="n">Instance</span> <span class="p">|</span> <span class="n">BindingFlags</span><span class="p">.</span><span class="n">Public</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">length</span> <span class="p">=</span> <span class="n">stringLengthField</span><span class="p">.</span><span class="nf">GetGetMethod</span><span class="p">().</span><span class="nf">Invoke</span><span class="p">(</span><span class="k">new</span> <span class="nf">Uri</span><span class="p">(</span><span class="s">"http://example.com"</span><span class="p">),</span> <span class="k">new</span> <span class="kt">object</span><span class="p">[</span><span class="m">0</span><span class="p">]);</span>
</code></pre></div></div>
<p>If you run it you would get the following exception:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>System.Reflection.TargetException: Object does not match target type.
at System.Reflection.RuntimeMethodInfo.CheckConsistency(..)
at System.Reflection.RuntimeMethodInfo.InvokeArgumentsCheck(..)
at System.Reflection.RuntimeMethodInfo.Invoke(..)
at System.Reflection.RuntimePropertyInfo.GetValue(..)
</code></pre></div></div>
<p>This is because we have obtained the <code class="language-plaintext highlighter-rouge">PropertyInfo</code> for the <code class="language-plaintext highlighter-rouge">Length</code> property on the <code class="language-plaintext highlighter-rouge">String</code> class, but invoked it with a <code class="language-plaintext highlighter-rouge">Uri</code> object, which is clearly the wrong type!</p>
<p>In addition to this, there also has to be validation of any arguments you pass through to the method you are invoking. To make argument passing work, reflection APIs take a parameter that is an array of <code class="language-plaintext highlighter-rouge">object</code>s, one per argument. So if you are using reflection to call the method <code class="language-plaintext highlighter-rouge">Add(int x, int y)</code>, you would invoke it by calling <code class="language-plaintext highlighter-rouge">methodInfo.Invoke(.., new object[] { 5, 6 })</code>. At run-time, checks need to be carried out on the number and types of the values passed in, in this case to ensure that there are 2 and that they are both <code class="language-plaintext highlighter-rouge">int</code>s. One down-side of all this work is that it often involves <em>boxing</em>, which has an additional cost, but hopefully this will be <a href="https://github.com/dotnet/corefx/issues/14021">minimised in the future</a>.</p>
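<p>To make those checks a bit more concrete, here is a small sketch that calls an <code class="language-plaintext highlighter-rouge">Add(int x, int y)</code> method via reflection with the right arguments, too few arguments and an argument of the wrong type. The <code class="language-plaintext highlighter-rouge">Calculator</code> class and <code class="language-plaintext highlighter-rouge">TryAdd</code> helper are hypothetical names used just for this example.</p>

```csharp
using System;
using System.Reflection;

// Hypothetical target, matching the Add(int x, int y) example in the text
public static class Calculator
{
    public static int Add(int x, int y) { return x + y; }
}

public static class Program
{
    // Invokes Calculator.Add via reflection and reports what happened
    public static string TryAdd(params object[] args)
    {
        MethodInfo add = typeof(Calculator).GetMethod("Add");
        try
        {
            // Static method, so there is no target object.
            // The int arguments were boxed when they were put into the object[]
            object result = add.Invoke(null, args);
            return "ok: " + result;
        }
        catch (TargetParameterCountException)
        {
            return "wrong number of arguments";
        }
        catch (ArgumentException)
        {
            return "wrong argument type";
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryAdd(5, 6));     // ok: 11
        Console.WriteLine(TryAdd(5));        // wrong number of arguments
        Console.WriteLine(TryAdd(5, "six")); // wrong argument type
    }
}
```

<p>Note that none of these mistakes can be caught by the compiler; they only show up when the reflection call is actually made, which is exactly why the runtime has to do the validation on every <code class="language-plaintext highlighter-rouge">Invoke</code>.</p>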
<h3 id="security-checks">Security Checks</h3>
<p>The other main task that is happening along the way is multiple security checks. For instance, it turns out that you aren’t allowed to use reflection to call just any method you feel like. There are some restricted or <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/dangerousapis.h#L7-L13">‘Dangerous Methods’</a>, that can only be called by trusted .NET framework code. In addition to a black-list, there are also dynamic security checks depending on the current <a href="https://msdn.microsoft.com/en-us/library/33tceax8(v=vs.110).aspx">Code Access Security permissions</a> that have to be <a href="https://github.com/dotnet/coreclr/blob/e67851210d1c03d730a3bc97a87e8a6713bbf772/src/vm/reflectioninvocation.cpp#L880-L947">checked during invocation</a>.</p>
<hr />
<h2 id="how-much-does-reflection-cost">How much does Reflection cost?</h2>
<p>So now that we know what reflection is doing <em>behind-the-scenes</em>, it’s a good time to look at what it costs us. Please note that these benchmarks compare reading/writing a property directly versus via reflection. In .NET, properties are actually a pair of <code class="language-plaintext highlighter-rouge">Get/Set</code> methods that <a href="http://stackoverflow.com/questions/23102639/are-c-sharp-properties-actually-methods/23102679#23102679">the compiler generates for us</a>; however, when the property has just a simple backing field, the .NET JIT inlines the method call for performance reasons. This means that using reflection to access a property will show reflection in the worst possible light, but it was chosen as it’s the most common use-case, showing up in <a href="https://github.com/StackExchange/dapper-dot-net">ORMs</a>, <a href="http://www.newtonsoft.com/json">Json serialisation/deserialisation libraries</a> and <a href="http://automapper.org/">object mapping tools</a>.</p>
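<p>As a quick illustration of the ‘properties are really methods’ point, the sketch below uses reflection to look at the accessor methods behind an auto-property. The <code class="language-plaintext highlighter-rouge">Person</code> class here is just an assumed example type.</p>

```csharp
using System;
using System.Reflection;

// Assumed example type with a single auto-property
public class Person
{
    public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        PropertyInfo property = typeof(Person).GetProperty("Name");

        // The compiler generated a pair of methods for the property
        MethodInfo getter = property.GetGetMethod();
        MethodInfo setter = property.GetSetMethod();
        Console.WriteLine(getter.Name); // get_Name
        Console.WriteLine(setter.Name); // set_Name

        // Reading/writing the property via reflection is just invoking those methods
        var person = new Person();
        setter.Invoke(person, new object[] { "Matt" });
        Console.WriteLine(getter.Invoke(person, null)); // Matt
    }
}
```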
<p>Below are the raw results as they are displayed by <a href="http://benchmarkdotnet.org/">BenchmarkDotNet</a>, followed by the same results displayed in 2 separate tables. (full <a href="https://gist.github.com/mattwarren/a8ae31a197f4716a9d65947f4a20a069">Benchmark code is available</a>)</p>
<p><a href="/images/2016/12/Reflection Benchmark Results.png"><img src="/images/2016/12/Reflection Benchmark Results.png" alt="Reflection Benchmark Results" /></a></p>
<h3 id="reading-a-property-get">Reading a Property (‘Get’)</h3>
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">StdErr</th>
<th style="text-align: right">Scaled</th>
<th style="text-align: right">Bytes Allocated/Op</th>
</tr>
</thead>
<tbody>
<tr>
<td>GetViaProperty</td>
<td style="text-align: right">0.2159 ns</td>
<td style="text-align: right">0.0047 ns</td>
<td style="text-align: right">1.00</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>GetViaDelegate</td>
<td style="text-align: right">1.8903 ns</td>
<td style="text-align: right">0.0082 ns</td>
<td style="text-align: right">8.82</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>GetViaILEmit</td>
<td style="text-align: right">2.9236 ns</td>
<td style="text-align: right">0.0067 ns</td>
<td style="text-align: right">13.64</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>GetViaCompiledExpressionTrees</td>
<td style="text-align: right">12.3623 ns</td>
<td style="text-align: right">0.0200 ns</td>
<td style="text-align: right">57.65</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>GetViaFastMember</td>
<td style="text-align: right">35.9199 ns</td>
<td style="text-align: right">0.0528 ns</td>
<td style="text-align: right">167.52</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>GetViaReflectionWithCaching</td>
<td style="text-align: right">125.3878 ns</td>
<td style="text-align: right">0.2017 ns</td>
<td style="text-align: right">584.78</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>GetViaReflection</td>
<td style="text-align: right">197.9258 ns</td>
<td style="text-align: right">0.2704 ns</td>
<td style="text-align: right">923.08</td>
<td style="text-align: right">0.01</td>
</tr>
<tr>
<td>GetViaDelegateDynamicInvoke</td>
<td style="text-align: right">842.9131 ns</td>
<td style="text-align: right">1.2649 ns</td>
<td style="text-align: right">3,931.17</td>
<td style="text-align: right">419.04</td>
</tr>
</tbody>
</table>
<h3 id="writing-a-property-set">Writing a Property (‘Set’)</h3>
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">StdErr</th>
<th style="text-align: right">Scaled</th>
<th style="text-align: right">Bytes Allocated/Op</th>
</tr>
</thead>
<tbody>
<tr>
<td>SetViaProperty</td>
<td style="text-align: right">1.4043 ns</td>
<td style="text-align: right">0.0200 ns</td>
<td style="text-align: right">6.55</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>SetViaDelegate</td>
<td style="text-align: right">2.8215 ns</td>
<td style="text-align: right">0.0078 ns</td>
<td style="text-align: right">13.16</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>SetViaILEmit</td>
<td style="text-align: right">2.8226 ns</td>
<td style="text-align: right">0.0061 ns</td>
<td style="text-align: right">13.16</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>SetViaCompiledExpressionTrees</td>
<td style="text-align: right">10.7329 ns</td>
<td style="text-align: right">0.0221 ns</td>
<td style="text-align: right">50.06</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>SetViaFastMember</td>
<td style="text-align: right">36.6210 ns</td>
<td style="text-align: right">0.0393 ns</td>
<td style="text-align: right">170.79</td>
<td style="text-align: right">0.00</td>
</tr>
<tr>
<td>SetViaReflectionWithCaching</td>
<td style="text-align: right">214.4321 ns</td>
<td style="text-align: right">0.3122 ns</td>
<td style="text-align: right">1,000.07</td>
<td style="text-align: right">98.49</td>
</tr>
<tr>
<td>SetViaReflection</td>
<td style="text-align: right">287.1039 ns</td>
<td style="text-align: right">0.3288 ns</td>
<td style="text-align: right">1,338.99</td>
<td style="text-align: right">115.63</td>
</tr>
<tr>
<td>SetViaDelegateDynamicInvoke</td>
<td style="text-align: right">922.4618 ns</td>
<td style="text-align: right">2.9192 ns</td>
<td style="text-align: right">4,302.17</td>
<td style="text-align: right">390.99</td>
</tr>
</tbody>
</table>
<p>So we can clearly see that regular reflection code (<code class="language-plaintext highlighter-rouge">GetViaReflection</code> and <code class="language-plaintext highlighter-rouge">SetViaReflection</code>) is considerably slower than accessing the property directly (<code class="language-plaintext highlighter-rouge">GetViaProperty</code> and <code class="language-plaintext highlighter-rouge">SetViaProperty</code>). But what about the other results? Let’s explore those in more detail.</p>
<h3 id="setup">Setup</h3>
<p>First we start with a <code class="language-plaintext highlighter-rouge">TestClass</code> that looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">TestClass</span>
<span class="p">{</span>
<span class="k">public</span> <span class="nf">TestClass</span><span class="p">(</span><span class="n">String</span> <span class="n">data</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Data</span> <span class="p">=</span> <span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">private</span> <span class="kt">string</span> <span class="n">data</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">string</span> <span class="n">Data</span>
<span class="p">{</span>
<span class="k">get</span> <span class="p">{</span> <span class="k">return</span> <span class="n">data</span><span class="p">;</span> <span class="p">}</span>
<span class="k">set</span> <span class="p">{</span> <span class="n">data</span> <span class="p">=</span> <span class="k">value</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>and the following common code, that all the options can make use of:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Setup code, done only once </span>
<span class="n">TestClass</span> <span class="n">testClass</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">TestClass</span><span class="p">(</span><span class="s">"A String"</span><span class="p">);</span>
<span class="n">Type</span> <span class="n">@class</span> <span class="p">=</span> <span class="n">testClass</span><span class="p">.</span><span class="nf">GetType</span><span class="p">();</span>
<span class="n">BindingFlags</span> <span class="n">bindingFlags</span> <span class="p">=</span> <span class="n">BindingFlags</span><span class="p">.</span><span class="n">Instance</span> <span class="p">|</span>
<span class="n">BindingFlags</span><span class="p">.</span><span class="n">NonPublic</span> <span class="p">|</span>
<span class="n">BindingFlags</span><span class="p">.</span><span class="n">Public</span><span class="p">;</span>
</code></pre></div></div>
<h3 id="regular-reflection">Regular Reflection</h3>
<p>First we use regular reflection code, which acts as our starting point and the ‘worst case’:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">GetViaReflection</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">PropertyInfo</span> <span class="n">property</span> <span class="p">=</span> <span class="n">@class</span><span class="p">.</span><span class="nf">GetProperty</span><span class="p">(</span><span class="s">"Data"</span><span class="p">,</span> <span class="n">bindingFlags</span><span class="p">);</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">string</span><span class="p">)</span><span class="n">property</span><span class="p">.</span><span class="nf">GetValue</span><span class="p">(</span><span class="n">testClass</span><span class="p">,</span> <span class="k">null</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="option-1---cache-propertyinfo">Option 1 - Cache PropertyInfo</h3>
<p>Next up, we can gain a small speed boost by keeping a reference to the <code class="language-plaintext highlighter-rouge">PropertyInfo</code>, rather than fetching it each time. But we’re still much slower than accessing the property directly, which demonstrates that there is a considerable cost in the ‘invocation’ part of reflection.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Setup code, done only once</span>
<span class="n">PropertyInfo</span> <span class="n">cachedPropertyInfo</span> <span class="p">=</span> <span class="n">@class</span><span class="p">.</span><span class="nf">GetProperty</span><span class="p">(</span><span class="s">"Data"</span><span class="p">,</span> <span class="n">bindingFlags</span><span class="p">);</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">GetViaReflectionWithCaching</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">string</span><span class="p">)</span><span class="n">cachedPropertyInfo</span><span class="p">.</span><span class="nf">GetValue</span><span class="p">(</span><span class="n">testClass</span><span class="p">,</span> <span class="k">null</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
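<p>In a real application you rarely have just one property to cache. One possible way to generalise the idea (a sketch only; the <code class="language-plaintext highlighter-rouge">PropertyCache</code> class is a made-up name, not something from the benchmark code) is to keep the <code class="language-plaintext highlighter-rouge">PropertyInfo</code> objects in a dictionary keyed by type and property name:</p>

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical helper that remembers every PropertyInfo it has looked up
public static class PropertyCache
{
    private const BindingFlags Flags =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    private static readonly ConcurrentDictionary<(Type, string), PropertyInfo> Cache =
        new ConcurrentDictionary<(Type, string), PropertyInfo>();

    // Returns the cached PropertyInfo, doing the GetProperty lookup only on a cache miss
    public static PropertyInfo Get(Type type, string name)
    {
        return Cache.GetOrAdd((type, name), key => key.Item1.GetProperty(key.Item2, Flags));
    }
}

public static class Program
{
    public static void Main()
    {
        PropertyInfo length = PropertyCache.Get(typeof(string), "Length");
        Console.WriteLine(length.GetValue("hello")); // 5
    }
}
```

<p>This only saves the ‘fetching’ cost; every call to <code class="language-plaintext highlighter-rouge">GetValue</code> still pays the full ‘invocation’ cost.</p>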
<h3 id="option-2---use-fastmember">Option 2 - Use FastMember</h3>
<p>Here we make use of Marc Gravell’s excellent <a href="http://blog.marcgravell.com/2012/01/playing-with-your-member.html">Fast Member library</a>, which as you can see is very simple to use!</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Setup code, done only once</span>
<span class="n">TypeAccessor</span> <span class="n">accessor</span> <span class="p">=</span> <span class="n">TypeAccessor</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span><span class="n">@class</span><span class="p">,</span> <span class="n">allowNonPublicAccessors</span><span class="p">:</span> <span class="k">true</span><span class="p">);</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">GetViaFastMember</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">string</span><span class="p">)</span><span class="n">accessor</span><span class="p">[</span><span class="n">testClass</span><span class="p">,</span> <span class="s">"Data"</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note that it’s doing something slightly different to the other options. It creates a <code class="language-plaintext highlighter-rouge">TypeAccessor</code> that allows access to <strong>all</strong> the Properties on a type, not just one. But the downside is that, as a result, it takes longer to run. This is because internally it first has to get the <code class="language-plaintext highlighter-rouge">delegate</code> for the Property you requested (in this case ‘Data’), before fetching its value. However, this overhead is pretty small: FastMember is still way faster than Reflection and it’s very easy to use, so I recommend you take a look at it first.</p>
<p>This option and all subsequent ones convert the reflection code into a <a href="https://msdn.microsoft.com/en-us/library/ms173171.aspx"><code class="language-plaintext highlighter-rouge">delegate</code></a> that can be directly invoked without the overhead of reflection every time, hence the speed boost!</p>
<p>Although it’s worth pointing out that the creation of a <code class="language-plaintext highlighter-rouge">delegate</code> has a cost (see <a href="#further-reading">‘Further Reading’</a> for more info). So in short, the speed boost is because we are doing the expensive work once (security checks, etc) and storing a strongly typed <code class="language-plaintext highlighter-rouge">delegate</code> that we can use again and again with little overhead. You wouldn’t use these techniques if you were doing reflection once, but if you’re only doing it once it wouldn’t be a performance bottleneck, so you wouldn’t care if it was slow!</p>
<p>The reason that reading a property via a <code class="language-plaintext highlighter-rouge">delegate</code> isn’t as fast as reading it directly is because the .NET JIT won’t inline a <code class="language-plaintext highlighter-rouge">delegate</code> method call like it will do with a Property access. So with a <code class="language-plaintext highlighter-rouge">delegate</code> we have to pay the cost of a method call, which direct access doesn’t.</p>
<h3 id="option-3---create-a-delegate">Option 3 - Create a Delegate</h3>
<p>In this option we use the <code class="language-plaintext highlighter-rouge">CreateDelegate</code> function to turn our PropertyInfo into a regular <code class="language-plaintext highlighter-rouge">delegate</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Setup code, done only once</span>
<span class="n">PropertyInfo</span> <span class="n">property</span> <span class="p">=</span> <span class="n">@class</span><span class="p">.</span><span class="nf">GetProperty</span><span class="p">(</span><span class="s">"Data"</span><span class="p">,</span> <span class="n">bindingFlags</span><span class="p">);</span>
<span class="n">Func</span><span class="p"><</span><span class="n">TestClass</span><span class="p">,</span> <span class="kt">string</span><span class="p">></span> <span class="n">getDelegate</span> <span class="p">=</span>
<span class="p">(</span><span class="n">Func</span><span class="p"><</span><span class="n">TestClass</span><span class="p">,</span> <span class="kt">string</span><span class="p">>)</span><span class="n">Delegate</span><span class="p">.</span><span class="nf">CreateDelegate</span><span class="p">(</span>
<span class="k">typeof</span><span class="p">(</span><span class="n">Func</span><span class="p"><</span><span class="n">TestClass</span><span class="p">,</span> <span class="kt">string</span><span class="p">>),</span>
<span class="n">property</span><span class="p">.</span><span class="nf">GetGetMethod</span><span class="p">(</span><span class="n">nonPublic</span><span class="p">:</span> <span class="k">true</span><span class="p">));</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">GetViaDelegate</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nf">getDelegate</span><span class="p">(</span><span class="n">testClass</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The drawback is that you need to know the concrete type at <strong>compile-time</strong>, i.e. the <code class="language-plaintext highlighter-rouge">Func<TestClass, string></code> part in the code above (no, you can’t use <code class="language-plaintext highlighter-rouge">Func<object, string></code>; if you do, it’ll throw an exception!). In the majority of situations when you are doing reflection you don’t have this luxury, otherwise you wouldn’t be using reflection in the first place, so it’s not a complete solution.</p>
<p>For a very interesting/mind-bending way to get round this, see the <code class="language-plaintext highlighter-rouge">MagicMethodHelper</code> code in the fantastic blog post from Jon Skeet <a href="https://codeblog.jonskeet.uk/2008/08/09/making-reflection-fly-and-exploring-delegates/">‘Making Reflection fly and exploring delegates’</a> or read on for Options 4 or 5 below.</p>
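<p>To give a flavour of that kind of trick, here is a rough sketch in the same spirit (it is <em>not</em> Jon Skeet’s actual code, and <code class="language-plaintext highlighter-rouge">GetterFactory</code> and <code class="language-plaintext highlighter-rouge">CreateTyped</code> are made-up names). It uses reflection once to call a generic helper, which builds the strongly typed <code class="language-plaintext highlighter-rouge">delegate</code> and wraps it in a <code class="language-plaintext highlighter-rouge">Func<object, object></code>. It assumes the declaring type is a reference type.</p>

```csharp
using System;
using System.Reflection;

public class TestClass
{
    public TestClass(string data) { Data = data; }
    public string Data { get; set; }
}

// Hypothetical helper, loosely inspired by the 'magic method' idea
public static class GetterFactory
{
    // Only needs a PropertyInfo, no compile-time knowledge of the types involved
    public static Func<object, object> Create(PropertyInfo property)
    {
        MethodInfo getter = property.GetGetMethod(nonPublic: true);
        MethodInfo helper = typeof(GetterFactory)
            .GetMethod(nameof(CreateTyped), BindingFlags.Static | BindingFlags.NonPublic)
            .MakeGenericMethod(property.DeclaringType, property.PropertyType);

        // One-off reflection call, the returned delegate is reflection-free
        return (Func<object, object>)helper.Invoke(null, new object[] { getter });
    }

    private static Func<object, object> CreateTyped<TTarget, TResult>(MethodInfo getter)
        where TTarget : class
    {
        // Inside the generic method the concrete types ARE known,
        // so CreateDelegate can bind a strongly typed delegate
        var typed = (Func<TTarget, TResult>)Delegate.CreateDelegate(
            typeof(Func<TTarget, TResult>), getter);

        // Wrap it so callers can work with plain objects
        return target => typed((TTarget)target);
    }
}

public static class Program
{
    public static void Main()
    {
        Func<object, object> getData =
            GetterFactory.Create(typeof(TestClass).GetProperty("Data"));
        Console.WriteLine(getData(new TestClass("A String"))); // A String
    }
}
```

<p>The price is an extra layer of indirection (the wrapper lambda plus a cast) on every call, so don’t expect it to be quite as fast as the raw typed <code class="language-plaintext highlighter-rouge">delegate</code>.</p>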
<h3 id="option-4---compiled-expression-trees">Option 4 - Compiled Expression Trees</h3>
<p>Here we generate a <code class="language-plaintext highlighter-rouge">delegate</code>, but the difference is that we can pass in an <code class="language-plaintext highlighter-rouge">object</code>, so we get round the limitation of ‘Option 3’. We make use of the .NET <a href="https://msdn.microsoft.com/en-us/library/mt654263.aspx"><code class="language-plaintext highlighter-rouge">Expression</code> tree API</a> that allows dynamic code generation:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Setup code, done only once</span>
<span class="n">PropertyInfo</span> <span class="n">property</span> <span class="p">=</span> <span class="n">@class</span><span class="p">.</span><span class="nf">GetProperty</span><span class="p">(</span><span class="s">"Data"</span><span class="p">,</span> <span class="n">bindingFlags</span><span class="p">);</span>
<span class="n">ParameterExpression</span> <span class="n">instance</span> <span class="p">=</span> <span class="n">Expression</span><span class="p">.</span><span class="nf">Parameter</span><span class="p">(</span><span class="k">typeof</span><span class="p">(</span><span class="kt">object</span><span class="p">),</span> <span class="s">"instance"</span><span class="p">);</span>
<span class="n">UnaryExpression</span> <span class="n">instanceCast</span> <span class="p">=</span>
<span class="p">!</span><span class="n">property</span><span class="p">.</span><span class="n">DeclaringType</span><span class="p">.</span><span class="n">IsValueType</span> <span class="p">?</span>
<span class="n">Expression</span><span class="p">.</span><span class="nf">TypeAs</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">property</span><span class="p">.</span><span class="n">DeclaringType</span><span class="p">)</span> <span class="p">:</span>
<span class="n">Expression</span><span class="p">.</span><span class="nf">Convert</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">property</span><span class="p">.</span><span class="n">DeclaringType</span><span class="p">);</span>
<span class="n">Func</span><span class="p"><</span><span class="kt">object</span><span class="p">,</span> <span class="kt">object</span><span class="p">></span> <span class="n">GetDelegate</span> <span class="p">=</span>
<span class="n">Expression</span><span class="p">.</span><span class="n">Lambda</span><span class="p"><</span><span class="n">Func</span><span class="p"><</span><span class="kt">object</span><span class="p">,</span> <span class="kt">object</span><span class="p">>>(</span>
<span class="n">Expression</span><span class="p">.</span><span class="nf">TypeAs</span><span class="p">(</span>
<span class="n">Expression</span><span class="p">.</span><span class="nf">Call</span><span class="p">(</span><span class="n">instanceCast</span><span class="p">,</span> <span class="n">property</span><span class="p">.</span><span class="nf">GetGetMethod</span><span class="p">(</span><span class="n">nonPublic</span><span class="p">:</span> <span class="k">true</span><span class="p">)),</span>
<span class="k">typeof</span><span class="p">(</span><span class="kt">object</span><span class="p">)),</span>
<span class="n">instance</span><span class="p">)</span>
<span class="p">.</span><span class="nf">Compile</span><span class="p">();</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">GetViaCompiledExpressionTrees</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">string</span><span class="p">)</span><span class="nf">GetDelegate</span><span class="p">(</span><span class="n">testClass</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Full code for the <code class="language-plaintext highlighter-rouge">Expression</code> based approach is available in the blog post <a href="http://geekswithblogs.net/Madman/archive/2008/06/27/faster-reflection-using-expression-trees.aspx">Faster Reflection using Expression Trees</a></p>
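<p>The code above only covers the ‘Get’ side. A setter can be built in much the same way; the following is a minimal sketch (the <code class="language-plaintext highlighter-rouge">SetterFactory</code> name is an assumption, and it is not the code used in the benchmarks) that compiles an <code class="language-plaintext highlighter-rouge">Action<object, object></code> for a property on a reference type:</p>

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class TestClass
{
    public string Data { get; set; }
}

// Hypothetical helper that builds a compiled setter for any property
public static class SetterFactory
{
    public static Action<object, object> Create(PropertyInfo property)
    {
        ParameterExpression instance = Expression.Parameter(typeof(object), "instance");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        // Equivalent to: ((DeclaringType)instance).set_Property((PropertyType)value)
        MethodCallExpression call = Expression.Call(
            Expression.Convert(instance, property.DeclaringType),
            property.GetSetMethod(nonPublic: true),
            Expression.Convert(value, property.PropertyType));

        return Expression.Lambda<Action<object, object>>(call, instance, value).Compile();
    }
}

public static class Program
{
    public static void Main()
    {
        Action<object, object> setData =
            SetterFactory.Create(typeof(TestClass).GetProperty("Data"));

        var testClass = new TestClass();
        setData(testClass, "A String");
        Console.WriteLine(testClass.Data); // A String
    }
}
```

<p>As with the getter, the expensive part is the call to <code class="language-plaintext highlighter-rouge">Compile()</code>, so this only pays off if you hold on to the resulting <code class="language-plaintext highlighter-rouge">delegate</code> and re-use it.</p>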
<h3 id="option-5---dynamic-code-gen-with-il-emit">Option 5 - Dynamic code-gen with IL Emit</h3>
<p>Finally we come to the lowest-level approach, emitting raw IL, although ‘<em>with great power, comes great responsibility</em>’:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Setup code, done only once</span>
<span class="n">PropertyInfo</span> <span class="n">property</span> <span class="p">=</span> <span class="n">@class</span><span class="p">.</span><span class="nf">GetProperty</span><span class="p">(</span><span class="s">"Data"</span><span class="p">,</span> <span class="n">bindingFlags</span><span class="p">);</span>
<span class="n">Emit</span><span class="p"><</span><span class="n">Func</span><span class="p"><</span><span class="kt">object</span><span class="p">,</span> <span class="kt">string</span><span class="p">>></span> <span class="n">getterEmiter</span> <span class="p">=</span> <span class="n">Emit</span><span class="p"><</span><span class="n">Func</span><span class="p"><</span><span class="kt">object</span><span class="p">,</span> <span class="kt">string</span><span class="p">>></span>
<span class="p">.</span><span class="nf">NewDynamicMethod</span><span class="p">(</span><span class="s">"GetTestClassDataProperty"</span><span class="p">)</span>
<span class="p">.</span><span class="nf">LoadArgument</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="p">.</span><span class="nf">CastClass</span><span class="p">(</span><span class="n">@class</span><span class="p">)</span>
<span class="p">.</span><span class="nf">Call</span><span class="p">(</span><span class="n">property</span><span class="p">.</span><span class="nf">GetGetMethod</span><span class="p">(</span><span class="n">nonPublic</span><span class="p">:</span> <span class="k">true</span><span class="p">))</span>
<span class="p">.</span><span class="nf">Return</span><span class="p">();</span>
<span class="n">Func</span><span class="p"><</span><span class="kt">object</span><span class="p">,</span> <span class="kt">string</span><span class="p">></span> <span class="n">getter</span> <span class="p">=</span> <span class="n">getterEmiter</span><span class="p">.</span><span class="nf">CreateDelegate</span><span class="p">();</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">GetViaILEmit</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nf">getter</span><span class="p">(</span><span class="n">testClass</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Using <code class="language-plaintext highlighter-rouge">Expression</code> trees (as shown in Option 4) doesn’t give you as much flexibility as emitting IL directly, although it does prevent you from emitting invalid code! Because of this, if you ever find yourself needing to emit IL, I really recommend using the excellent <a href="https://github.com/kevin-montrose/Sigil">Sigil library</a>, as it gives better error messages when you get things wrong!</p>
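<p>For comparison, here is a rough sketch of the same kind of getter written against the raw <code class="language-plaintext highlighter-rouge">System.Reflection.Emit</code> API instead of Sigil. It is an illustration only: the <code class="language-plaintext highlighter-rouge">Sample</code> class and the <code class="language-plaintext highlighter-rouge">EmitDemo.BuildGetter</code> helper are hypothetical names, not part of the benchmark code. The raw API does not validate the IL as you emit it, so a mistake such as an unbalanced evaluation stack generally only surfaces later, when the delegate is created or invoked, which is the gap Sigil aims to close.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical type with a private property, used only for this sketch
class Sample
{
    private string Data { get; set; } = "hello";
}

static class EmitDemo
{
    // Setup code, done only once: builds a delegate that reads Sample.Data
    public static Func&lt;object, string&gt; BuildGetter()
    {
        Type type = typeof(Sample);
        PropertyInfo property = type.GetProperty(
            "Data", BindingFlags.Instance | BindingFlags.NonPublic);
        MethodInfo getMethod = property.GetGetMethod(nonPublic: true);

        // Owned by Sample and skipping visibility checks,
        // so the dynamic method is allowed to call the private getter
        var method = new DynamicMethod(
            "GetSampleData", typeof(string), new[] { typeof(object) },
            type, skipVisibility: true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);         // push the 'object' argument
        il.Emit(OpCodes.Castclass, type); // cast it to Sample
        il.Emit(OpCodes.Call, getMethod); // call the property getter
        il.Emit(OpCodes.Ret);             // return the string

        return (Func&lt;object, string&gt;)method.CreateDelegate(
            typeof(Func&lt;object, string&gt;));
    }

    static void Main()
    {
        Func&lt;object, string&gt; getter = BuildGetter();
        Console.WriteLine(getter(new Sample()));
    }
}
</code></pre></div></div>
<p>As with the Sigil version, the expensive part (looking up the property and generating the method) happens once, and every later call just goes through the cached delegate.</p>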
<hr />
<h2 id="conclusion">Conclusion</h2>
<p>The take-away is that if (and only if) you find yourself with a performance issue when using reflection, there are several different ways you can make it faster. These speed gains are achieved by getting a <code class="language-plaintext highlighter-rouge">delegate</code> that allows you to access the Property/Field/Method directly, without all the overhead of going via reflection every time.</p>
<p>Discuss this post in <a href="https://www.reddit.com/r/programming/comments/5ie775/why_is_reflection_slow/">/r/programming</a> and <a href="https://www.reddit.com/r/csharp/comments/5igo67/why_is_reflection_slow/">/r/csharp</a></p>
<hr />
<h3 id="further-reading">Further Reading</h3>
<ul>
<li><a href="https://github.com/vivainio/FastExpressionKit">FastExpressionKit - A small library to make reflection-y things faster</a></li>
<li><a href="http://stackoverflow.com/questions/8846948/is-reflection-really-slow/8849503#8849503">Is Reflection really slow?</a></li>
<li><a href="http://stackoverflow.com/questions/3502674/why-is-reflection-slow/3502710#3502710">Why is reflection slow?</a></li>
<li><a href="http://stackoverflow.com/questions/25458/how-costly-is-net-reflection">How costly is .NET reflection?</a></li>
<li><a href="http://stackoverflow.com/questions/771524/how-slow-is-reflection/771533#771533">How slow is Reflection</a></li>
<li><a href="http://softwareengineering.stackexchange.com/questions/143205/reflection-is-using-reflection-still-bad-or-slow-what-has-changed-with-ref">Reflection: Is using reflection still “bad” or “slow”? What has changed with reflection since 2002?</a></li>
<li><a href="https://jeremybytes.blogspot.co.uk/2014/01/improving-reflection-performance-with.html">Improving Reflection Performance with Delegates</a></li>
<li><a href="http://kennethxu.blogspot.co.uk/2009/05/cnet-calling-grandparent-virtual-method.html">C#.Net Calling Grandparent’s Virtual Method (base.base in C#)</a> - <a href="http://kennethxu.blogspot.co.uk/2009/05/strong-typed-high-performance.html">Part I</a>, <a href="http://kennethxu.blogspot.co.uk/2009/05/strong-typed-high-performance_15.html">Part II</a>, <a href="http://kennethxu.blogspot.co.uk/2009/05/strong-typed-high-performance_18.html">Part III</a></li>
<li><a href="https://codeblog.jonskeet.uk/2008/08/09/making-reflection-fly-and-exploring-delegates/">‘Making Reflection fly and exploring delegates’</a></li>
<li><a href="http://theburningmonk.com/2015/08/fasterflect-vs-hyperdescriptor-vs-fastmember-vs-reflection/">Fasterflect vs HyperDescriptor vs FastMember vs Reflection</a></li>
<li><a href="https://github.com/naasking/Dynamics.NET">Dynamics.NET - Extensions for efficient runtime reflection and structural induction</a></li>
</ul>
<p>For reference, below is the call-stack or code-flow that the runtime goes through when <strong>Creating a Delegate</strong>:</p>
<ol>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/delegate.cs,0b7fb52ec60c22d3"><code class="language-plaintext highlighter-rouge">Delegate CreateDelegate(Type type, MethodInfo method)</code></a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/delegate.cs,944d5aaf940d71d0"><code class="language-plaintext highlighter-rouge">Delegate CreateDelegate(Type type, MethodInfo method, bool throwOnBindFailure)</code></a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/delegate.cs,2a6608b61df78396"><code class="language-plaintext highlighter-rouge">Delegate CreateDelegateInternal(RuntimeType rtType, RuntimeMethodInfo rtMethod, Object firstArgument, DelegateBindingFlags flags, ref StackCrawlMark stackMark)</code></a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/delegate.cs,432a6c045c0ce48d"><code class="language-plaintext highlighter-rouge">Delegate UnsafeCreateDelegate(RuntimeType rtType, RuntimeMethodInfo rtMethod, Object firstArgument, DelegateBindingFlags flags)</code></a></li>
<li><a href="https://referencesource.microsoft.com/#mscorlib/system/delegate.cs,06743cb3121175c1"><code class="language-plaintext highlighter-rouge">bool BindToMethodInfo(Object target, IRuntimeMethodInfo method, RuntimeType methodType, DelegateBindingFlags flags);</code></a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L802-L879"><code class="language-plaintext highlighter-rouge">FCIMPL5(FC_BOOL_RET, COMDelegate::BindToMethodInfo, Object* refThisUNSAFE, Object* targetUNSAFE, ReflectMethodObject *pMethodUNSAFE, ReflectClassBaseObject *pMethodTypeUNSAFE, int flags)</code></a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/7200e78258623eb889a46aa7a90818046bd1957d/src/vm/comdelegate.cpp#L885-L1099"><code class="language-plaintext highlighter-rouge">COMDelegate::BindToMethod(DELEGATEREF *pRefThis, OBJECTREF *pRefFirstArg, MethodDesc *pTargetMethod, MethodTable *pExactMethodType, BOOL fIsOpenDelegate, BOOL fCheckSecurity)</code></a></li>
</ol>
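<p>As a rough illustration of what sits at the top of that call-stack, the sketch below calls <code class="language-plaintext highlighter-rouge">Delegate.CreateDelegate(Type, MethodInfo)</code> (step 1 in the list) to bind an open-instance delegate to a property getter. The <code class="language-plaintext highlighter-rouge">Sample</code> class and the <code class="language-plaintext highlighter-rouge">DelegateDemo.BuildGetter</code> helper are hypothetical names used only for this example.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Reflection;

// Hypothetical type, used only for this sketch
class Sample
{
    public string Data { get; set; } = "hello";
}

static class DelegateDemo
{
    // Setup code, done only once: the reflection lookup and delegate binding
    public static Func&lt;Sample, string&gt; BuildGetter()
    {
        MethodInfo getMethod = typeof(Sample).GetProperty("Data").GetGetMethod();

        // Open-instance delegate: the target object is passed as the first argument
        return (Func&lt;Sample, string&gt;)Delegate.CreateDelegate(
            typeof(Func&lt;Sample, string&gt;), getMethod);
    }

    static void Main()
    {
        Func&lt;Sample, string&gt; getter = BuildGetter();
        Console.WriteLine(getter(new Sample()));
    }
}
</code></pre></div></div>
<p>The binding work described in the list happens inside <code class="language-plaintext highlighter-rouge">BuildGetter</code>; calling the resulting delegate afterwards does not go back through reflection.</p>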
<p>The post <a href="http://www.mattwarren.org/2016/12/14/Why-is-Reflection-slow/">Why is reflection slow?</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Research papers in the .NET source
2016-12-12T00:00:00+00:00
http://www.mattwarren.org/2016/12/12/Research-papers-in-the-.NET-source
<p>This post is completely inspired by (or ‘copied from’ depending on your point of view) a recent post titled <a href="http://lowlevelbits.org/java-papers/">JAVA PAPERS</a> (also see the <a href="https://news.ycombinator.com/item?id=13022649">HackerNews discussion</a>). However, instead of looking at Java and the JVM, I’ll be looking at references to research papers in the <strong>.NET language, runtime and compiler source code</strong>.</p>
<p>If I’ve missed any that you know of, please leave a comment below!</p>
<p>Note: I’ve deliberately left out links to specifications, standards documents or RFCs, instead concentrating only on <strong>Research Papers</strong>.</p>
<hr />
<h3 id="left-leaning-red-black-trees-by-robert-sedgewick---coreclr-source-reference"><a href="http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf"><strong>‘Left Leaning Red Black trees’ by Robert Sedgewick</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/tests/src/GC/Stress/Tests/RedBlackTree.cs#L7-L9">CoreCLR source reference</a></h3>
<p><strong>Abstract</strong>
The red-black tree model for implementing balanced search trees, introduced by Guibas and Sedgewick thirty years ago, is now found throughout our computational infrastructure. Red-black trees are described in standard textbooks and are the underlying data structure for symbol-table implementations within C++, Java, Python, BSD Unix, and many other modern systems. However, many of these implementations have sacrificed some of the original design goals (primarily in order to develop an effective implementation of the delete operation, which was incompletely specified in the original paper), so a new look is worthwhile.
In this paper, we describe a new variant of red-black trees that meets many of the original design goals and leads to substantially simpler code for insert/delete, less than one-fourth as much code as in implementations in common use.</p>
<h3 id="hopscotch-hashing-by-maurice-herlihy-nir-shavit-and-moran-tzafrir---coreclr-source-reference"><a href="http://mcg.cs.tau.ac.il/papers/disc2008-hopscotch.pdf"><strong>‘Hopscotch Hashing’ by Maurice Herlihy, Nir Shavit, and Moran Tzafrir</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/jit/smallhash.h#L48-L50">CoreCLR source reference</a></h3>
<p><strong>Abstract</strong>
We present a new class of resizable sequential and concurrent hash map algorithms directed at both uni-processor and multicore machines. The new hopscotch algorithms are based on a novel hopscotch multi-phased probing and displacement technique that has the flavors of chaining, cuckoo hashing, and linear probing, all put together, yet avoids the limitations and overheads of these former approaches. The resulting algorithms provide tables with very low synchronization overheads and high cache hit ratios.
In a series of benchmarks on a state-of-the-art 64-way Niagara II multicore machine, a concurrent version of hopscotch proves to be highly scalable, delivering in some cases 2 or even 3 times the throughput of today’s most efficient concurrent hash algorithm, Lea’s ConcurrentHashMap from java.concurr.util. Moreover, in tests on both Intel and Sun uni-processor machines, a sequential version of hopscotch consistently outperforms the most effective sequential hash table algorithms including cuckoo hashing and bounded linear probing.
The most interesting feature of the new class of hopscotch algorithms is that they continue to deliver good performance when the hash table is more than 90% full, increasing their advantage over other algorithms as the table density grows.</p>
<h3 id="automatic-construction-of-inlining-heuristics-using-machine-learning-by-kulkarni-cavazos-wimmer-and-simon---coreclr-source-reference"><a href="http://dl.acm.org/citation.cfm?id=2495914"><strong>‘Automatic Construction of Inlining Heuristics using Machine Learning’ by Kulkarni, Cavazos, Wimmer, and Simon.</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/design-docs/inlining-plans.md#profitability">CoreCLR source reference</a></h3>
<p><strong>Abstract</strong>
Method inlining is considered to be one of the most important optimizations in a compiler. However, a poor inlining heuristic can lead to significant degradation of a program’s running time. Therefore, it is important that an inliner has an effective heuristic that controls whether a method is inlined or not. An important component of any inlining heuristic are the features that characterize the inlining decision. These features often correspond to the caller method and the callee methods. However, it is not always apparent what the most important features are for this problem or the relative importance of these features. Compiler writers developing inlining heuristics may exclude critical information that can be obtained during each inlining decision. In this paper, we use a machine learning technique, namely neuro-evolution [18], to automatically induce effective inlining heuristics from a set of features deemed to be useful for inlining. Our learning technique is able to induce novel heuristics that significantly out-perform manually-constructed inlining heuristics. We evaluate the heuristic constructed by our neuro-evolutionary technique within the highly tuned Java HotSpot server compiler and the Maxine VM C1X compiler, and we are able to obtain speedups of up to 89% and 114%, respectively. In addition, we obtain an average speedup of almost 9% and 11% for the Java HotSpot VM and Maxine VM, respectively. However, the output of neuro-evolution, a neural network, is not human readable. We show how to construct more concise and readable heuristics in the form of decision trees that perform as well as our neuro-evolutionary approach.</p>
<h3 id="a-theory-of-objects-by-luca-cardelli--martín-abadi---coreclr-source-reference"><a href="http://dl.acm.org/citation.cfm?id=547964"><strong>‘A Theory of Objects’ by Luca Cardelli & Martín Abadi</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/5dbaa3cb2e2e11d98924afe9de472469b5136885/Documentation/botr/type-loader.md#11-related-reading">CoreCLR source reference</a></h3>
<p><strong>Abstract</strong>
Procedural languages are generally well understood. Their foundations have been cast in calculi that prove useful in matters of implementation and semantics. So far, an analogous understanding has not emerged for object-oriented languages. In this book the authors take a novel approach to the understanding of object-oriented languages by introducing object calculi and developing a theory of objects around them. The book covers both the semantics of objects and their typing rules, and explains a range of object-oriented concepts, such as self, dynamic dispatch, classes, inheritance, prototyping, subtyping, covariance and contravariance, and method specialization. Researchers and graduate students will find this an important development of the underpinnings of object-oriented programming.</p>
<h3 id="optimized-interval-splitting-in-a-linear-scan-register-allocator-by-wimmer-c-and-mössenböck-d---coreclr-source-reference"><a href="http://dl.acm.org/citation.cfm?id=1064998&dl=ACM&coll=ACM"><strong>‘Optimized Interval Splitting in a Linear Scan Register Allocator’ by Wimmer, C. and Mössenböck, D.</strong></a> - <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/ryujit-overview.md#register-allocation">CoreCLR source reference</a></h3>
<p><strong>Abstract</strong>
We present an optimized implementation of the linear scan register allocation algorithm for Sun Microsystems’ Java HotSpot™ client compiler. Linear scan register allocation is especially suitable for just-in-time compilers because it is faster than the common graph-coloring approach and yields results of nearly the same quality. Our allocator improves the basic linear scan algorithm by adding more advanced optimizations: It makes use of lifetime holes, splits intervals if the register pressure is too high, and models register constraints of the target architecture with fixed intervals. Three additional optimizations move split positions out of loops, remove register-to-register moves and eliminate unnecessary spill stores. Interval splitting is based on use positions, which also capture the kind of use and whether an operand is needed in a register or not. This avoids the reservation of a scratch register. Benchmark results prove the efficiency of the linear scan algorithm: While the compilation speed is equal to the old local register allocator that is part of the Sun JDK 5.0, integer benchmarks execute about 15% faster. Floating-point benchmarks show the high impact of the Intel SSE2 extensions on the speed of numeric Java applications: With the new SSE2 support enabled, SPECjvm98 executes 25% faster compared with the current Sun JDK 5.0.</p>
<h3 id="extensible-pattern-matching-via-a-lightweight-language-extension-by-don-syme-gregory-neverov-james-margetson---roslyn-source-reference"><a href="https://www.microsoft.com/en-us/research/publication/extensible-pattern-matching-via-a-lightweight-language-extension/"><strong>‘Extensible pattern matching via a lightweight language extension’ by Don Syme, Gregory Neverov, James Margetson</strong></a> - <a href="https://github.com/dotnet/roslyn/blob/614299ff83da9959fa07131c6d0ffbc58873b6ae/docs/features/patterns.md#pattern-matching-for-c">Roslyn source reference</a></h3>
<p><strong>Abstract</strong>
Pattern matching of algebraic data types (ADTs) is a standard feature in typed functional programming languages, but it is well known that it interacts poorly with abstraction. While several partial solutions to this problem have been proposed, few have been implemented or used. This paper describes an extension to the .NET language F# called active patterns, which supports pattern matching over abstract representations of generic heterogeneous data such as XML and term structures, including where these are represented via object models in other .NET languages. Our design is the first to incorporate both ad hoc pattern matching functions for partial decompositions and “views” for total decompositions, and yet remains a simple and lightweight extension. We give a description of the language extension along with numerous motivating examples. Finally we describe how this feature would interact with other reasonable and related language extensions: existential types quantified at data discrimination tags, GADTs, and monadic generalizations of pattern matching.</p>
<h3 id="some-approaches-to-best-match-file-searching-by-w-a-burkhard--r-m-keller---roslyn-source-reference"><a href="http://dl.acm.org/citation.cfm?doid=362003.362025"><strong>‘Some approaches to best-match file searching’ by W. A. Burkhard & R. M. Keller</strong></a> - <a href="https://github.com/dotnet/roslyn/blob/65cc61578e9646cf76a297d8a9e0005afa57378a/src/Workspaces/Core/Portable/Utilities/BKTree.cs#L22">Roslyn source reference</a></h3>
<p><strong>Abstract</strong>
The problem of searching the set of keys in a file to find a key which is closest to a given query key is discussed. After “closest,” in terms of a metric on the key space, is suitably defined, three file structures are presented together with their corresponding search algorithms, which are intended to reduce the number of comparisons required to achieve the desired result. These methods are derived using certain inequalities satisfied by metrics and by graph-theoretic concepts. Some empirical results are presented which compare the efficiency of the methods.</p>
<hr />
<p>For reference, the links below take you straight to the GitHub searches, so you can take a look yourself:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=pdf+OR+%22et+al.%22+OR+Proceedings+OR+Symposium+OR+Conference+OR+acm.org&type=Code">CoreCLR</a></li>
<li><a href="https://github.com/Microsoft/referencesource/search?utf8=%E2%9C%93&q=pdf+OR+%22et+al.%22+OR+Proceedings+OR+Symposium+OR+Conference+OR+acm.org&type=Code">.NET 4.5 Reference Source</a></li>
<li><a href="https://github.com/dotnet/corefx/search?utf8=%E2%9C%93&q=pdf+OR+%22et+al.%22+OR+Proceedings+OR+Symposium+OR+Conference+OR+acm.org&type=Code">CoreFX</a></li>
<li><a href="https://github.com/dotnet/roslyn/search?utf8=%E2%9C%93&q=pdf+OR+%22et+al.%22+OR+Proceedings+OR+Symposium+OR+Conference+OR+acm.org&type=Code">Roslyn</a></li>
</ul>
<hr />
<h2 id="research-produced-by-work-on-the-net-runtime-or-compiler">Research produced by work on the .NET Runtime or Compiler</h2>
<p>But what about the other way round: are there instances of work being done in .NET that is then turned into a research paper? Well, it turns out there are. The first example I came across was from a tweet by <a href="https://twitter.com/xjoeduffyx">Joe Duffy</a>:</p>
<p><a href="https://twitter.com/xjoeduffyx/status/801416374086029312?p=p"><img src="/images/2016/12/Joe Duffy Tweet.png" alt="Joe Duffy tweet about research paper" /></a></p>
<p>(As an aside, I recommend checking out <a href="http://joeduffyblog.com/2015/11/03/blogging-about-midori/">Joe Duffy’s blog</a>; it contains lots of information about <strong>Midori</strong>, the research project to build a managed OS!)</p>
<h3 id="applying-control-theory-in-the-real-world---experience-with-building-a-controller-for-the-net-thread-pool-by-joseph-l-hellerstein-vance-morrison-eric-eilebrecht"><a href="http://www.sigmetrics.org/conferences/sigmetrics/2009/workshops/papers_hotmetrics/session2_2.pdf"><strong>‘Applying Control Theory in the Real World - Experience With Building a Controller for the .NET Thread Pool’ by Joseph L. Hellerstein, Vance Morrison, Eric Eilebrecht</strong></a></h3>
<p><strong>Abstract</strong>
There has been considerable interest in using control theory to build web servers, database managers, and other systems. We claim that the potential value of using control theory cannot be realized in practice without a methodology that addresses controller design, testing, and tuning. Based on our experience with building a controller for the .NET thread pool, we develop a methodology that: (a) designs for extensibility to integrate diverse control techniques, (b) scales the test infrastructure to enable running a large number of test cases, (c) constructs test cases for which the ideal controller performance is known a priori so that the outcomes of test cases can be readily assessed, and (d) tunes controller parameters to achieve good results for multiple performance metrics. We conclude by discussing how our methodology can be extended, especially to designing controllers for distributed systems.</p>
<h3 id="uniqueness-and-reference-immutability-for-safe-parallelism-by-colin-s-gordon-matthew-parkinson-jared-parsons-aleks-bromfield--joe-duffy-alternative-link"><a href="http://dl.acm.org/citation.cfm?id=2384619"><strong>‘Uniqueness and Reference Immutability for Safe Parallelism’ by Colin S. Gordon, Matthew Parkinson, Jared Parsons, Aleks Bromfield & Joe Duffy</strong></a> (<a href="https://www.microsoft.com/en-us/research/publication/uniqueness-and-reference-immutability-for-safe-parallelism/">alternative link</a>)</h3>
<p><strong>Abstract</strong>
A key challenge for concurrent programming is that side-effects (memory operations) in one thread can affect the behavior of another thread. In this paper, we present a type system to restrict the updates to memory to prevent these unintended side-effects. We provide a novel combination of immutable and unique (isolated) types that ensures safe parallelism (race freedom and deterministic execution). The type system includes support for polymorphism over type qualifiers, and can easily create cycles of immutable objects. Key to the system’s flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking. Our type system models a prototype extension to C# that is in active use by a Microsoft team. We describe their experiences building large systems with this extension. We prove the soundness of the type system by an embedding into a program logic.</p>
<h3 id="design-and-implementation-of-generics-for-the-net-common-language-runtime-by-andrew-kennedy-don-syme"><a href="https://www.microsoft.com/en-us/research/publication/design-and-implementation-of-generics-for-the-net-common-language-runtime/"><strong>‘Design and Implementation of Generics for the .NET Common Language Runtime’ by Andrew Kennedy, Don Syme</strong></a></h3>
<p><strong>Abstract</strong>
The Microsoft .NET Common Language Runtime provides a shared type system, intermediate language and dynamic execution environment for the implementation and inter-operation of multiple source languages. In this paper we extend it with direct support for parametric polymorphism (also known as generics), describing the design through examples written in an extended version of the C# programming language, and explaining aspects of implementation by reference to a prototype extension to the runtime. Our design is very expressive, supporting parameterized types, polymorphic static, instance and virtual methods, “F-bounded” type parameters, instantiation at pointer and value types, polymorphic recursion, and exact run-time types. The implementation takes advantage of the dynamic nature of the runtime, performing just-in-time type specialization, representation-based code sharing and novel techniques for efficient creation and use of run-time types. Early performance results are encouraging and suggest that programmers will not need to pay an overhead for using generics, achieving performance almost matching hand-specialized code.</p>
<h3 id="securing-the-net-programming-model-industrial-application-by-andrew-kennedy"><a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2007/01/appsem-tcs.pdf"><strong>‘Securing the .NET Programming Model (Industrial Application)’ by Andrew Kennedy</strong></a></h3>
<p><strong>Abstract</strong>
The security of the .NET programming model is studied from the standpoint of fully abstract compilation of C#. A number of failures of full abstraction are identified, and fixes described. The most serious problems have recently been fixed for version 2.0 of the .NET Common Language Runtime.</p>
<h3 id="a-study-of-concurrent-real-time-garbage-collectors-by-filip-pizlo-erez-petrank--bjarne-steensgaard-this-features-work-done-as-part-of-midori"><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.353.9594&rep=rep1&type=pdf"><strong>‘A Study of Concurrent Real-Time Garbage Collectors’ by Filip Pizlo, Erez Petrank & Bjarne Steensgaard</strong></a> (this features work done as <a href="http://joeduffyblog.com/2015/12/19/safe-native-code/#gc">part of Midori</a>)</h3>
<p><strong>Abstract</strong>
Concurrent garbage collection is highly attractive for real-time systems, because offloading the collection effort from the executing threads allows faster response, allowing for extremely short deadlines at the microseconds level. Concurrent collectors also offer much better scalability over incremental collectors. The main problem with concurrent real-time collectors is their complexity. The first concurrent real-time garbage collector that can support fine synchronization, STOPLESS, has recently been presented by Pizlo et al. In this paper, we propose two additional (and different) algorithms for concurrent real-time garbage collection: CLOVER and CHICKEN. Both collectors obtain reduced complexity over the first collector STOPLESS, but need to trade a benefit for it. We study the algorithmic strengths and weaknesses of CLOVER and CHICKEN and compare them to STOPLESS. Finally, we have implemented all three collectors on the Bartok compiler and runtime for C# and we present measurements to compare their efficiency and responsiveness.</p>
<h3 id="stopless-a-real-time-garbage-collector-for-multiprocessors-by-filip-pizlo-daniel-frampton-erez-petrank-bjarne-steensgaard"><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.322&rep=rep1&type=pdf"><strong>‘STOPLESS: A Real-Time Garbage Collector for Multiprocessors’ by Filip Pizlo, Daniel Frampton, Erez Petrank, Bjarne Steensgaard</strong></a></h3>
<p><strong>Abstract</strong>
We present STOPLESS: a concurrent real-time garbage collector suitable for modern multiprocessors running parallel multithreaded applications. Creating a garbage-collected environment that supports real-time on modern platforms is notoriously hard, especially if real-time implies lock-freedom. Known real-time collectors either restrict the real-time guarantees to uniprocessors only, rely on special hardware, or just give up supporting atomic operations (which are crucial for lock-free software). STOPLESS is the first collector that provides real-time responsiveness while preserving lock-freedom, supporting atomic operations, controlling fragmentation by compaction, and supporting modern parallel platforms.
STOPLESS is adequate for modern languages such as C# or Java. It was implemented on top of the Bartok compiler and runtime for C# and measurements demonstrate high responsiveness (a factor of a 100 better than previously published systems), virtually no pause times, good mutator utilization, and acceptable overheads.</p>
<hr />
<p>Finally, a full list of MS Research publications related to <a href="https://www.microsoft.com/en-us/research/research-area/programming-languages-software-engineering/?q&content-type=publications&sort_by=most-relevant">‘programming languages and software engineering’</a> is available if you want to explore more of this research yourself.</p>
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=13335658">Hacker News</a></p>
Open Source .NET – 2 years later
2016-11-23T00:00:00+00:00
http://www.mattwarren.org/2016/11/23/open-source-net-2-years-later
<link rel="stylesheet" href="/datavis/dotnet-oss.css" />
<script src="/datavis/dotnet-oss.js" type="text/javascript"></script>
<p>A little over 2 years ago Microsoft announced that they were <a href="http://www.hanselman.com/blog/AnnouncingNET2015NETAsOpenSourceNETOnMacAndLinuxAndVisualStudioCommunity.aspx">open sourcing large parts of the .NET framework</a> and as <a href="https://twitter.com/shanselman">Scott Hanselman</a> said in his recent <a href="https://channel9.msdn.com/Events/Connect/2016/Keynotes-Scott-Guthrie-and-Scott-Hanselman">Connect keynote</a>, the community has been contributing in a significant way:</p>
<p><a href="https://twitter.com/poweredbyaltnet/status/798942478195970048"><img src="/images/2016/11/Over 60 of the contributions to dotnetcore come from the community.jpg" alt="Over 60% of the contributions to .NET Core come from the community" /></a></p>
<p>You can see some more detail on this number in the talk <a href="https://connectevent.microsoft.com/whats-new-in-the-net-platform/">‘What’s New in the .NET Platform’</a> by Scott Hunter:</p>
<p><img src="/images/2016/11/Connect talk - Community Contributions per month.png" alt="Connect talk - Community Contributions per month" /></p>
<p>This post aims to give more context to those numbers and allow you to explore patterns and trends across different repositories.</p>
<hr />
<h3 id="repository-activity-over-time">Repository activity over time</h3>
<p>First we are going to see an overview of the level of activity in each repo, by looking at the total number of ‘Issues’ (created) or ‘Pull Requests’ (closed) per month. (<a href="http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR">Yay sparklines FTW!!</a>)</p>
<p><strong>Note:</strong> Numbers in <span style="color:rgb(0,0,0);font-weight:bold;">black</span> are from the most recent month, with <span style="color:#d62728;font-weight:bold;">red</span> showing the lowest and <span style="color:#2ca02c;font-weight:bold;">green</span> the highest previous value. You can toggle between <strong>Issues</strong> and <strong>Pull Requests</strong> by clicking on the buttons, hover over individual sparklines to get a tooltip showing the per-month values, and click on the project name to go to the GitHub page for that repository.</p>
<section class="press" align="center">
<!-- <section class="gradient" align="center"> -->
<button id="btnIssues" class="active">Issues</button>
<button id="btnPRs">Pull Requests</button>
</section>
<div id="textbox" class="rChartHeader">
<!-- The Start/End dates are setup dynamically, once the data is loaded -->
<p id="dataStartDate" class="alignleft"></p>
<p id="dataEndDate" class="alignright"></p>
</div>
<div style="clear: both;"></div>
<!-- All the sparklines are added to this div -->
<div id="sparkLines" class="rChart nvd3">
</div>
<p>The main trend I see across all repos is that there’s been a sustained level of activity for the entire 2 years; things didn’t start with a bang and then tail off. In addition, many (but not all) repos have a trend of increased activity month-by-month. For instance, the PRs in <strong>CoreFX</strong> or the Issues in <strong>Visual Studio Code (vscode)</strong> are clear examples of this: their best months have been the most recent.</p>
<p>Finally, one interesting ‘story’ that jumps out of this data is the contrasting levels of activity (PRs) across the <strong>dnx</strong>, <strong>cli</strong> and <strong>msbuild</strong> repositories, as highlighted in the image below:</p>
<p><img src="/images/2016/11/Comparison of dnx v cli v msbuild.png" alt="Comparison of dnx v cli v msbuild" /></p>
<p>If you don’t know the full story, initially all the cmd-line tooling was known as <strong>dnx</strong>, but in RC2 it was <a href="https://docs.microsoft.com/en-us/dotnet/articles/core/migrating-from-dnx">migrated to .NET Core CLI</a>. You can see this on the chart: activity in the <strong>dnx</strong> repo decreased at the same time that work in <strong>cli</strong> ramped up.</p>
<p>Following that, in May this year, the whole idea of having ‘project.json’ files was <a href="https://blogs.msdn.microsoft.com/dotnet/2016/05/23/changes-to-project-json/">abandoned in favour of sticking with ‘msbuild’</a>. You can see this change happen towards the right of the chart, where there is a marked increase in the <strong>msbuild</strong> repo activity as the improvements that had been made in <strong>cli</strong> were ported over.</p>
<hr />
<h3 id="methodology---community-v-microsoft">Methodology - Community v. Microsoft</h3>
<p>But the main question I want to answer is:</p>
<blockquote>
<p>How much <strong>Community</strong> involvement has there been since Microsoft open sourced large parts of the .NET framework?</p>
</blockquote>
<p>(See my previous post to see how things <a href="/2016/01/15/open-source-net-1-year-later-now-with-aspnet/">looked after one year</a>)</p>
<p>To do this we need to look at who <strong>opened the Issue</strong> or <strong>created the Pull Request (PR)</strong> and specifically if they worked for Microsoft or not. This is possible because (almost) all Microsoft employees have indicated where they work on their GitHub profile, for instance:</p>
<p><a href="https://github.com/davidfowl"><img src="https://cloud.githubusercontent.com/assets/157298/12374944/b686820c-bca4-11e5-86c8-cf9f1076b45e.png" alt="David Fowler Profile" /></a></p>
<p>There are some notable exceptions, e.g. <a href="https://github.com/shanselman">@shanselman</a> clearly works at Microsoft, but it’s easy enough to allow for cases like this. Before you ask, I only analysed this data, <a href="https://www.troyhunt.com/8-million-github-profiles-were-leaked-from-geekedins-mongodb-heres-how-to-see-yours/">I did not keep a copy of it stored in MongoDB</a> to sell to recruiters!!</p>
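<p>As a rough illustration of that classification step (a minimal sketch only, not the code used to produce the charts in this post), the check could look something like the following. The <code class="language-plaintext highlighter-rouge">Contribution</code> type, the <code class="language-plaintext highlighter-rouge">KnownEmployees</code> allow-list and the way the profile text is matched are all assumptions made for the example:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for one Issue or PR: who opened it and the 'Company' text from their profile
class Contribution
{
    public string Author;
    public string Company;
}

static class Classifier
{
    // Hypothetical allow-list for employees whose profile doesn't mention Microsoft
    static readonly HashSet&lt;string&gt; KnownEmployees =
        new HashSet&lt;string&gt;(StringComparer.OrdinalIgnoreCase) { "shanselman" };

    public static bool IsMicrosoft(Contribution c)
    {
        return KnownEmployees.Contains(c.Author) ||
               (c.Company != null &amp;&amp;
                c.Company.IndexOf("Microsoft", StringComparison.OrdinalIgnoreCase) &gt;= 0);
    }

    // Percentage of items opened by the Community, with every item counted once
    public static double CommunityPercentage(IEnumerable&lt;Contribution&gt; items)
    {
        var all = items.ToList();
        if (all.Count == 0) return 0.0;
        return 100.0 * all.Count(c =&gt; !IsMicrosoft(c)) / all.Count;
    }
}
</code></pre></div></div>
<p>Running <code class="language-plaintext highlighter-rouge">CommunityPercentage</code> once over the Issues and once over the PRs for a repo would give the kind of Microsoft/Community split discussed in this post.</p>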
<h3 id="overall-participation---community-v-microsoft">Overall Participation - Community v. Microsoft</h3>
<p>This data represents the total participation from the last 2 years, i.e. <strong>November 2014</strong> to <strong>October 2016</strong>. All Pull Requests and Issues are treated equally, so a large PR counts the same as one that fixes a spelling mistake. Whilst this isn’t ideal, it’s the simplest way to get an idea of the Microsoft/Community split.</p>
<p><strong>Note:</strong> You can hover over the bars to get the actual numbers, rather than percentages.</p>
<body>
<!-- TODO do this in css styles, not inline!! -->
<div class="g-chart-issues">
<span style="font-weight:bold;font-size:large;margin-left:150px;"> Issues: </span>
<span style="color:#9ecae1;font-weight:bold;font-size:large;margin-left:5px;"> Microsoft </span>
<span style="color:#3182bd;font-weight:bold;font-size:large;margin-left:5px;"> Community </span>
</div>
<div class="g-chart-pull-requests">
<span style="font-weight:bold;font-size:large;margin-left:150px;"> Pull Requests: </span>
<span style="color:#a1d99b;font-weight:bold;font-size:large;margin-left:5px;"> Microsoft </span>
<span style="color:#31a354;font-weight:bold;font-size:large;margin-left:5px;"> Community </span>
</div>
</body>
<p>The general pattern these graphs show is that the Community is more likely to open an Issue than submit a PR, which I guess isn’t that surprising given the relative amount of work involved. However, it’s clear that the Community is still contributing a considerable amount of work. For instance, if you look at the <strong>CoreCLR</strong> repo, it <em>only</em> has 21% of PRs from the Community, but this still accounts for almost 900!</p>
<p>There are a few interesting cases that jump out here. For instance, <strong>Roslyn</strong> gets 35% of its issues from the Community, but only 6% of its PRs; clearly getting code into the compiler is a tough task. Likewise, it doesn’t seem like the Community is that interested in submitting code to <strong>msbuild</strong>, although it does have my <a href="https://github.com/Microsoft/msbuild/pull/1">favourite PR ever</a>:</p>
<p><a href="https://github.com/Microsoft/msbuild/pull/1"><img src="/images/2016/11/Fix legacy msbuild issues.png" alt="Fix legacy msbuild issues" /></a></p>
<hr />
<h3 id="participation-over-time---community-v-microsoft">Participation over time - Community v. Microsoft</h3>
<p>Finally we can see the ‘per-month’ data from the last 2 years, i.e. <strong>November 2014</strong> to <strong>October 2016</strong>.</p>
<p><strong>Note</strong>: You can inspect different repos by selecting them from the pull-down list, but be aware that the y-axis on the graphs is re-scaled, so the maximum value will change each time.</p>
<div id="issuesGraph">
<!-- TODO do this in css styles, not inline!! -->
<span style="font-weight:bold;font-size:larger;margin-left:30px;"> Issues: </span>
<span style="color:#9ecae1;font-weight:bold;font-size:larger;margin-left:5px;"> Microsoft </span>
<span style="color:#3182bd;font-weight:bold;font-size:larger;margin-left:5px;"> Community </span>
<!-- <form>
<label><input type="radio" name="mode" value="stacked" checked> Stacked</label>
<label><input type="radio" name="mode" value="grouped"> Grouped</label>
</form> -->
</div>
<div id="pullRequestsGraph">
<!-- TODO do this in css styles, not inline!! -->
<span style="font-weight:bold;font-size:larger;margin-left:30px;"> Pull Requests: </span>
<span style="color:#a1d99b;font-weight:bold;font-size:larger;margin-left:5px;"> Microsoft </span>
<span style="color:#31a354;font-weight:bold;font-size:larger;margin-left:5px;"> Community </span>
<!-- <form>
<label><input type="radio" name="mode" value="stacked" checked> Stacked</label>
<label><input type="radio" name="mode" value="grouped"> Grouped</label>
</form> -->
</div>
<p>Whilst not every repo is growing month-by-month, the majority are, and those that aren’t at least show sustained contributions across 2 years.</p>
<hr />
<h2 id="summary">Summary</h2>
<p>I think that it’s clear to see that the Community has got on-board with the new Open-Source Microsoft, producing a sustained level of contributions over the last 2 years. Let’s hope it continues!</p>
<p>Discuss this post in <a href="https://www.reddit.com/r/programming/comments/5eh17t/open_source_net_2_years_later/">/r/programming</a></p>
<p>The post <a href="http://www.mattwarren.org/2016/11/23/open-source-net-2-years-later/">Open Source .NET – 2 years later</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
How does the 'fixed' keyword work?2016-10-26T00:00:00+00:00http://www.mattwarren.org/2016/10/26/How-does-the-fixed-keyword-work
<p>Well, it turns out that it’s a really nice example of collaboration between the main parts of the .NET runtime. Here’s a list of all the components involved:</p>
<ul>
<li><a href="#compiler">Compiler</a></li>
<li><a href="#jitter">JITter</a></li>
<li><a href="#clr">CLR</a></li>
<li><a href="#garbage-collector">Garbage Collector (GC)</a></li>
</ul>
<p>Now you could argue that all of these are required to execute any C# code, but what’s interesting about the <code class="language-plaintext highlighter-rouge">fixed</code> keyword is that they all have a <em>specific</em> part to play.</p>
<hr />
<h3 id="compiler">Compiler</h3>
<p>To start with, let’s look at one of the most basic scenarios for using the <code class="language-plaintext highlighter-rouge">fixed</code> keyword, directly accessing the contents of a C# <code class="language-plaintext highlighter-rouge">string</code> (taken from a <a href="https://github.com/dotnet/roslyn/blob/614299ff83da9959fa07131c6d0ffbc58873b6ae/src/Compilers/CSharp/Test/Emit/CodeGen/UnsafeTests.cs#L1467-L1515">Roslyn unit test</a>):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">System</span><span class="p">;</span>
<span class="k">unsafe</span> <span class="k">class</span> <span class="nc">C</span>
<span class="p">{</span>
<span class="k">static</span> <span class="k">unsafe</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">fixed</span> <span class="p">(</span><span class="kt">char</span><span class="p">*</span> <span class="n">p</span> <span class="p">=</span> <span class="s">"hello"</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(*</span><span class="n">p</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Which the compiler then turns into the following IL:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Code size 34 (0x22)</span>
<span class="p">.</span><span class="nx">maxstack</span> <span class="mi">2</span>
<span class="p">.</span><span class="nx">locals</span> <span class="nx">init</span> <span class="p">(</span><span class="nx">char</span><span class="o">*</span> <span class="nx">V_0</span><span class="p">,</span> <span class="c1">//p</span>
<span class="nx">pinned</span> <span class="nx">string</span> <span class="nx">V_1</span><span class="p">)</span>
<span class="nx">IL_0000</span><span class="p">:</span> <span class="nx">nop</span>
<span class="nx">IL_0001</span><span class="p">:</span> <span class="nx">ldstr</span> <span class="dl">"</span><span class="s2">hello</span><span class="dl">"</span>
<span class="nx">IL_0006</span><span class="p">:</span> <span class="nx">stloc</span><span class="p">.</span><span class="mi">1</span>
<span class="nx">IL_0007</span><span class="p">:</span> <span class="nx">ldloc</span><span class="p">.</span><span class="mi">1</span>
<span class="nx">IL_0008</span><span class="p">:</span> <span class="nx">conv</span><span class="p">.</span><span class="nx">i</span>
<span class="nx">IL_0009</span><span class="p">:</span> <span class="nx">stloc</span><span class="p">.</span><span class="mi">0</span>
<span class="nx">IL_000a</span><span class="p">:</span> <span class="nx">ldloc</span><span class="p">.</span><span class="mi">0</span>
<span class="nx">IL_000b</span><span class="p">:</span> <span class="nx">brfalse</span><span class="p">.</span><span class="nx">s</span> <span class="nx">IL_0015</span>
<span class="nx">IL_000d</span><span class="p">:</span> <span class="nx">ldloc</span><span class="p">.</span><span class="mi">0</span>
<span class="nx">IL_000e</span><span class="p">:</span> <span class="nx">call</span> <span class="dl">"</span><span class="s2">int System.Runtime.CompilerServices.RuntimeHelpers.OffsetToStringData.get</span><span class="dl">"</span>
<span class="nx">IL_0013</span><span class="p">:</span> <span class="nx">add</span>
<span class="nx">IL_0014</span><span class="p">:</span> <span class="nx">stloc</span><span class="p">.</span><span class="mi">0</span>
<span class="nx">IL_0015</span><span class="p">:</span> <span class="nx">nop</span>
<span class="nx">IL_0016</span><span class="p">:</span> <span class="nx">ldloc</span><span class="p">.</span><span class="mi">0</span>
<span class="nx">IL_0017</span><span class="p">:</span> <span class="nx">ldind</span><span class="p">.</span><span class="nx">u2</span>
<span class="nx">IL_0018</span><span class="p">:</span> <span class="nx">call</span> <span class="dl">"</span><span class="s2">void System.Console.WriteLine(char)</span><span class="dl">"</span>
<span class="nx">IL_001d</span><span class="p">:</span> <span class="nx">nop</span>
<span class="nx">IL_001e</span><span class="p">:</span> <span class="nx">nop</span>
<span class="nx">IL_001f</span><span class="p">:</span> <span class="nx">ldnull</span>
<span class="nx">IL_0020</span><span class="p">:</span> <span class="nx">stloc</span><span class="p">.</span><span class="mi">1</span>
<span class="nx">IL_0021</span><span class="p">:</span> <span class="nx">ret</span>
</code></pre></div></div>
<p>Note the <code class="language-plaintext highlighter-rouge">pinned string V_1</code> that the compiler has created for us, it’s made a <em>hidden</em> local variable that holds a reference to the <code class="language-plaintext highlighter-rouge">object</code> we are using in the <code class="language-plaintext highlighter-rouge">fixed</code> statement, which in this case is the string “<em>hello</em>”. The purpose of this pinned local variable will be explained in a moment.</p>
<p>It’s also emitted a call to the <code class="language-plaintext highlighter-rouge">OffsetToStringData</code> getter method (from <code class="language-plaintext highlighter-rouge">System.Runtime.CompilerServices.RuntimeHelpers</code>), which we will cover in more detail when we discuss the <a href="#clr">CLR’s role</a>.</p>
<p>However, as an aside, the compiler is also performing an optimisation for us. Normally it would <em>wrap</em> the <code class="language-plaintext highlighter-rouge">fixed</code> statement in a <code class="language-plaintext highlighter-rouge">finally</code> block to ensure the pinned local variable is nulled out after control leaves the scope. But in this case it has determined that it can leave out the <code class="language-plaintext highlighter-rouge">finally</code> statement entirely, from <a href="https://github.com/dotnet/roslyn/blob/614299ff83da9959fa07131c6d0ffbc58873b6ae/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_FixedStatement.cs#L49-L54">LocalRewriter_FixedStatement.cs</a> in the Roslyn source:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// In principle, the cleanup code (i.e. nulling out the pinned variables) is always</span>
<span class="c1">// in a finally block. However, we can optimize finally away (keeping the cleanup</span>
<span class="c1">// code) in cases where both of the following are true:</span>
<span class="c1">// 1) there are no branches out of the fixed statement; and</span>
<span class="c1">// 2) the fixed statement is not in a try block (syntactic or synthesized).</span>
<span class="k">if</span> <span class="p">(</span><span class="nf">IsInTryBlock</span><span class="p">(</span><span class="n">node</span><span class="p">)</span> <span class="p">||</span> <span class="nf">HasGotoOut</span><span class="p">(</span><span class="n">rewrittenBody</span><span class="p">))</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
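<p>For contrast, here is a sketch of a method that fails the second condition in that comment, because the <code class="language-plaintext highlighter-rouge">fixed</code> statement sits inside a <code class="language-plaintext highlighter-rouge">try</code> block, so the compiler should keep the <code class="language-plaintext highlighter-rouge">finally</code> that nulls out the pinned local. The class and method names are made up for illustration and I haven’t reproduced the resulting IL here:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

unsafe class FixedInTry
{
    // Returns the first character of 'text', or '\0' if it is null or empty
    public static char FirstChar(string text)
    {
        char result = '\0';
        try
        {
            // Condition 2 fails: the fixed statement is inside a (syntactic) try block
            fixed (char* p = text)
            {
                // For a null string 'p' is a null pointer, so check before dereferencing
                if (p != null &amp;&amp; text.Length &gt; 0)
                    result = *p;
            }
        }
        finally
        {
            Console.WriteLine("Leaving the try block");
        }
        return result;
    }
}
</code></pre></div></div>
<p>As with the original sample, this needs unsafe code to be enabled (the <code class="language-plaintext highlighter-rouge">/unsafe</code> compiler option or <code class="language-plaintext highlighter-rouge">AllowUnsafeBlocks</code> in the project file).</p>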
<h3 id="what-is-this-pinned-identifier">What is this pinned identifier?</h3>
<p>Let’s start by looking at the authoritative source, from <a href="http://www.ecma-international.org/publications/standards/Ecma-335.htm">Standard ECMA-335 Common Language Infrastructure (CLI)</a></p>
<blockquote>
<p><strong>II.7.1.2 pinned</strong>
The signature encoding for <strong>pinned</strong> shall appear only in signatures that describe local variables (§II.15.4.1.3). While a method with a <strong>pinned</strong> local variable is executing, the VES shall not relocate the object to which the local refers. That is, if the implementation of the CLI uses a garbage collector that moves objects, the collector shall not move objects that are referenced by an active <strong>pinned</strong> local variable.</p>
<p>[<em>Rationale</em>: If unmanaged pointers are used to dereference managed objects, these objects shall be <strong>pinned</strong>. This happens, for example, when a managed object is passed to a method designed to operate with unmanaged data. <em>end rationale</em>]</p>
<p>VES = Virtual Execution System
CLI = Common Language Infrastructure
CTS = Common Type System</p>
</blockquote>
<p>But if you prefer an explanation in a more human-readable form (i.e. not from a spec), then this extract from <a href="https://www.amazon.co.uk/NET-Assembler-Serge-Lidin/dp/1430267615/ref=as_li_ss_tl?ie=UTF8&linkCode=sl1&tag=mattonsoft-21&linkId=062fce40a5e1895bba51689c80a6a163">.NET IL Assembler by Serge Lidin</a> is helpful:</p>
<p><img src="/images/2016/10/Explanation of pinned from .NET IL Assembler book.png" alt="Explanation of pinned from .NET IL Assembler book" /></p>
<p>(Also available on <a href="https://books.google.co.uk/books?id=Xv_0AwAAQBAJ&pg=PA140&lpg=PA140&dq=.net+il+pinned+local+variable&source=bl&ots=Yk262rHHNl&sig=nNmZtNncfcGAnMdBQ5uQLtggNQc&hl=en&sa=X&redir_esc=y#v=onepage&q&f=false">Google Books</a>)</p>
<hr />
<h3 id="clr">CLR</h3>
<p>Arguably the CLR has the easiest job to do (if you accept that it exists as a separate component from the JIT and GC), its job is to provide the offset of the raw <code class="language-plaintext highlighter-rouge">string</code> data via the <a href="https://github.com/dotnet/coreclr/blob/ffeef85a626d7344fd3e2031f749c356db0628d3/src/mscorlib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs#L177-L196"><code class="language-plaintext highlighter-rouge">OffsetToStringData</code> method</a> that is emitted by the compiler.</p>
<p>Now you might be thinking that this method does some complex calculations to determine the exact offset, but nope, it’s hard-coded!! (I told you that <a href="/2016/05/31/Strings-and-the-CLR-a-Special-Relationship/">Strings and the CLR have a <em>Special Relationship</em></a>):</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">int</span> <span class="n">OffsetToStringData</span>
<span class="p">{</span>
<span class="c1">// This offset is baked in by string indexer intrinsic, so there is no harm</span>
<span class="c1">// in getting it baked in here as well.</span>
<span class="p">[</span><span class="n">System</span><span class="p">.</span><span class="n">Runtime</span><span class="p">.</span><span class="n">Versioning</span><span class="p">.</span><span class="n">NonVersionable</span><span class="p">]</span>
<span class="k">get</span> <span class="p">{</span>
<span class="c1">// Number of bytes from the address pointed to by a reference to</span>
<span class="c1">// a String to the first 16-bit character in the String. Skip </span>
<span class="c1">// over the MethodTable pointer, & String length. Of course, the </span>
<span class="c1">// String reference points to the memory after the sync block, so </span>
<span class="c1">// don't count that. </span>
<span class="c1">// This property allows C#'s fixed statement to work on Strings.</span>
<span class="c1">// On 64 bit platforms, this should be 12 (8+4) and on 32 bit 8 (4+4).</span>
<span class="cp">#if BIT64
</span> <span class="k">return</span> <span class="m">12</span><span class="p">;</span>
<span class="cp">#else // 32
</span> <span class="k">return</span> <span class="m">8</span><span class="p">;</span>
<span class="cp">#endif // BIT64
</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
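<p>A quick way to sanity-check this on your own machine is to compare the property against the pointer size. This is only a sketch: the <code class="language-plaintext highlighter-rouge">OffsetCheck</code> class is made up for the example, the expected values are simply the ones from the comment above, and newer runtimes may flag the property as obsolete, so you might see a compiler warning:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.CompilerServices;

static class OffsetCheck
{
    // The values the CoreCLR comment says to expect: 12 on 64-bit, 8 on 32-bit
    public static int ExpectedOffset()
    {
        return IntPtr.Size == 8 ? 12 : 8;
    }

    public static void Report()
    {
        Console.WriteLine("Pointer size:       " + IntPtr.Size);
        Console.WriteLine("OffsetToStringData: " + RuntimeHelpers.OffsetToStringData);
        Console.WriteLine("Expected:           " + ExpectedOffset());
    }
}
</code></pre></div></div>
<p>Calling <code class="language-plaintext highlighter-rouge">OffsetCheck.Report()</code> from <code class="language-plaintext highlighter-rouge">Main</code> in both a 32-bit and a 64-bit process is an easy way to see the two hard-coded values side by side.</p>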
<hr />
<h3 id="jitter">JITter</h3>
<p>For the <code class="language-plaintext highlighter-rouge">fixed</code> keyword to work, the role of the JITter is to provide information to the GC/Runtime about the lifetimes of variables within a method, and in particular whether they are <em>pinned</em> locals. It does this via the <code class="language-plaintext highlighter-rouge">GCInfo</code> data it <a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/ryujit-overview.md#gc-info">creates for every method</a>:</p>
<p><a href="https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/Documentation/botr/ryujit-overview.md#gc-info"><img src="/images/2016/10/GC Info provided by the JIT.png" alt="GC Info provided by the JIT" /></a></p>
<p>To see this in action we have to enable the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md#useful-complus-variables">correct magic flags</a> and then we will see the following:</p>
<div class="language-racket highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">Compiling</span> <span class="mi">0</span> <span class="nv">ConsoleApplication</span><span class="o">.</span><span class="nv">Program::Main</span><span class="o">,</span> <span class="nv">IL</span> <span class="nv">size</span> <span class="nv">=</span> <span class="mi">30</span><span class="o">,</span> <span class="nv">hsh=0x8d66958e</span>
<span class="c1">; Assembly listing for method ConsoleApplication.Program:Main(ref)</span>
<span class="c1">; Emitting BLENDED_CODE for X64 CPU with AVX</span>
<span class="c1">; optimized code</span>
<span class="c1">; rsp based frame</span>
<span class="c1">; partially interruptible</span>
<span class="c1">; Final local variable assignments</span>
<span class="c1">;</span>
<span class="c1">;* V00 arg0 [V00 ] ( 0, 0 ) ref -> zero-ref </span>
<span class="c1">; V01 loc0 [V01,T00] ( 5, 4 ) long -> rcx </span>
<span class="c1">; V02 loc1 [V02 ] ( 3, 3 ) ref -> [rsp+0x20] must-init pinned</span>
<span class="c1">; V03 tmp0 [V03,T01] ( 2, 4 ) long -> rcx </span>
<span class="c1">; V04 OutArgs [V04 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] </span>
<span class="c1">;</span>
<span class="c1">; Lcl frame size = 40</span>
<span class="nv">G_M27250_IG01:</span>
<span class="mi">000000</span> <span class="mi">4883</span><span class="nv">EC28</span> <span class="nv">sub</span> <span class="nv">rsp</span><span class="o">,</span> <span class="mi">40</span>
<span class="mi">000004</span> <span class="mi">33</span><span class="nv">C0</span> <span class="nv">xor</span> <span class="nv">rax</span><span class="o">,</span> <span class="nv">rax</span>
<span class="mi">000006</span> <span class="mi">4889442420</span> <span class="nv">mov</span> <span class="nv">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nf">rsp+20H</span><span class="p">]</span><span class="o">,</span> <span class="nv">rax</span>
<span class="nv">G_M27250_IG02:</span>
<span class="mi">00000</span><span class="nv">B</span> <span class="mi">488</span><span class="nv">B0C256830B412</span> <span class="nv">mov</span> <span class="nv">rcx</span><span class="o">,</span> <span class="nv">gword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nf">12B43068H</span><span class="p">]</span> <span class="ss">'hello</span><span class="o">'</span>
<span class="mi">000013</span> <span class="mi">48894</span><span class="nv">C2420</span> <span class="nv">mov</span> <span class="nv">gword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nf">rsp+20H</span><span class="p">]</span><span class="o">,</span> <span class="nv">rcx</span>
<span class="mi">000018</span> <span class="mi">488</span><span class="nv">B4C2420</span> <span class="nv">mov</span> <span class="nv">rcx</span><span class="o">,</span> <span class="nv">gword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nf">rsp+20H</span><span class="p">]</span>
<span class="mi">00001</span><span class="nv">D</span> <span class="mi">4885</span><span class="nv">C9</span> <span class="nv">test</span> <span class="nv">rcx</span><span class="o">,</span> <span class="nv">rcx</span>
<span class="mi">000020</span> <span class="mi">7404</span> <span class="nv">je</span> <span class="nv">SHORT</span> <span class="nv">G_M27250_IG03</span>
<span class="mi">000022</span> <span class="mi">4883</span><span class="nv">C10C</span> <span class="nv">add</span> <span class="nv">rcx</span><span class="o">,</span> <span class="mi">12</span>
<span class="nv">G_M27250_IG03:</span>
<span class="mi">000026</span> <span class="mi">0</span><span class="nv">FB709</span> <span class="nv">movzx</span> <span class="nv">rcx</span><span class="o">,</span> <span class="nv">word</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nf">rcx</span><span class="p">]</span>
<span class="mi">000029</span> <span class="nv">E842FCFFFF</span> <span class="nv">call</span> <span class="nv">System</span><span class="o">.</span><span class="nv">Console:WriteLine</span><span class="p">(</span><span class="nf">char</span><span class="p">)</span>
<span class="mi">00002</span><span class="nv">E</span> <span class="mi">33</span><span class="nv">C0</span> <span class="nv">xor</span> <span class="nv">rax</span><span class="o">,</span> <span class="nv">rax</span>
<span class="mi">000030</span> <span class="mi">4889442420</span> <span class="nv">mov</span> <span class="nv">gword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nf">rsp+20H</span><span class="p">]</span><span class="o">,</span> <span class="nv">rax</span>
<span class="nv">G_M27250_IG04:</span>
<span class="mi">000035</span> <span class="mi">4883</span><span class="nv">C428</span> <span class="nv">add</span> <span class="nv">rsp</span><span class="o">,</span> <span class="mi">40</span>
<span class="mi">000039</span> <span class="nv">C3</span> <span class="nv">ret</span>
<span class="c1">; Total bytes of code 58, prolog size 11 for method ConsoleApplication.Program:Main(ref)</span>
<span class="c1">; ============================================================</span>
<span class="nv">Set</span> <span class="nv">code</span> <span class="nv">length</span> <span class="nv">to</span> <span class="mi">58</span><span class="o">.</span>
<span class="nv">Set</span> <span class="nv">Outgoing</span> <span class="nv">stack</span> <span class="nv">arg</span> <span class="nv">area</span> <span class="nv">size</span> <span class="nv">to</span> <span class="mi">32</span><span class="o">.</span>
<span class="nv">Stack</span> <span class="nv">slot</span> <span class="nv">id</span> <span class="nv">for</span> <span class="nv">offset</span> <span class="mi">32</span> <span class="p">(</span><span class="nf">0x20</span><span class="p">)</span> <span class="p">(</span><span class="nf">sp</span><span class="p">)</span> <span class="p">(</span><span class="nf">pinned</span><span class="o">,</span> <span class="nv">untracked</span><span class="p">)</span> <span class="nv">=</span> <span class="mi">0</span><span class="o">.</span>
<span class="nv">Defining</span> <span class="mi">1</span> <span class="nv">call</span> <span class="nv">sites:</span>
<span class="nv">Offset</span> <span class="mi">0</span><span class="nv">x29</span><span class="o">,</span> <span class="nv">size</span> <span class="mi">5</span><span class="o">.</span>
</code></pre></div></div>
<p>See how in the section titled “<em>Final local variable assignments</em>” it has indicated that the <code class="language-plaintext highlighter-rouge">V02 loc1</code> variable is <code class="language-plaintext highlighter-rouge">must-init pinned</code> and then down at the bottom it has this text:</p>
<blockquote>
<p>Stack slot id for offset 32 (0x20) (sp) (pinned, untracked) = 0.</p>
</blockquote>
<p><strong>Aside</strong>: The JIT has also done some extra work for us and optimised away the call to <code class="language-plaintext highlighter-rouge">OffsetToStringData</code> by inlining it as the assembly code <code class="language-plaintext highlighter-rouge">add rcx, 12</code>. On a slightly related note, previously the <code class="language-plaintext highlighter-rouge">fixed</code> keyword prevented a method from being inlined, but recently that changed, see <a href="https://github.com/dotnet/coreclr/issues/7774">Support inlining method with pinned locals</a> for the full details.</p>
<hr />
<h3 id="garbage-collector">Garbage Collector</h3>
<p>Finally we come to the GC which has an important “<em>role to play</em>”, or “<em>not to play</em>” depending on which way you look at it.</p>
<p>In effect the GC has to get out of the way and leave the pinned local variable alone for the life-time of the method. Normally the GC is concerned about which objects are <em>live</em> or <em>dead</em> so that it knows what it has to clean up. But with pinned objects it has to go one step further, not only must it <em>not clean up</em> the object, but it must <em>not move it around</em>. Generally the GC likes to relocate objects around during the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md#compact-phase">Compact Phase</a> to make memory allocations cheap, but pinning prevents that as the object is being accessed via a pointer and therefore its memory address <em>has</em> to remain the same.</p>
<p>There is a great visual explanation of what that looks like from the excellent presentation <a href="http://slideplayer.com/slide/6084514/">CLR: Garbage Collection Inside Out</a> by <a href="https://blogs.msdn.microsoft.com/maoni/">Maoni Stephens</a> (click for full-sized version):</p>
<p><a href="/images/2016/10/Fragmentation Problem Caused By Pinning.png"><img src="/images/2016/10/Fragmentation Problem Caused By Pinning.png" alt="Fragmentation Problem Caused By Pinning" /></a></p>
<p>Note how the pinned blocks (marked with a ‘P’) have remained where they are, forcing the Gen 0/1/2 segments to start at awkward locations. This is why pinning too many objects and keeping them pinned for too long can cause GC overhead: it has to perform extra bookkeeping and work around them.</p>
<p>In reality, when using the <code class="language-plaintext highlighter-rouge">fixed</code> keyword, your object will only remain pinned for a short period of time, i.e. until control leaves the scope. But if you are pinning objects via the <a href="https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.gchandle(v=vs.110).aspx"><code class="language-plaintext highlighter-rouge">GCHandle</code> class</a> then the lifetime could be longer.</p>
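<p>To make the difference concrete, here is a minimal sketch of pinning via <code class="language-plaintext highlighter-rouge">GCHandle</code>. The <code class="language-plaintext highlighter-rouge">GCHandlePinning</code> class and its method are hypothetical and it assumes a non-empty array; the point is that the object stays pinned until <code class="language-plaintext highlighter-rouge">Free()</code> is called, rather than until a scope ends:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.InteropServices;

static class GCHandlePinning
{
    // Assumes 'buffer' is non-null and has at least one element
    public static byte ReadFirstByte(byte[] buffer)
    {
        // Pin the array; it stays pinned until Free() is called, however long that takes
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // The address is stable for as long as the handle is allocated
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadByte(address);
        }
        finally
        {
            // Forgetting this keeps the object pinned (and alive) indefinitely
            handle.Free();
        }
    }
}
</code></pre></div></div>
<p>Here the handle is freed straight away, so the cost should be similar to <code class="language-plaintext highlighter-rouge">fixed</code>. The longer lifetimes come from code that stores the handle in a field and frees it much later.</p>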
<p>So to finish, let’s get the final word on pinning from Maoni Stephens, from <a href="https://blogs.msdn.microsoft.com/maoni/2004/12/19/using-gc-efficiently-part-3/">Using GC Efficiently – Part 3</a> (read the blog post for more details):</p>
<blockquote>
<p><strong>When you do need to pin, here are some things to keep in mind:</strong></p>
<ol>
<li>Pinning for a short time is cheap.</li>
<li>Pinning an older object is not as harmful as pinning a young object.</li>
<li>Creating pinned buffers that stay together instead of scattered around. This way you create fewer holes.</li>
</ol>
</blockquote>
<hr />
<h2 id="summary">Summary</h2>
<p>So that’s it, simple really!!</p>
<p>All the main parts of the .NET runtime do their bit and we get to use a handy feature that lets us drop down and perform some bare-metal coding!!</p>
<p>Discuss this post in <a href="https://www.reddit.com/r/programming/comments/59qa94/how_does_the_c_fixed_keyword_work/">/r/programming</a></p>
<hr />
<h3 id="further-reading">Further Reading</h3>
<p>If you’ve read this far, you might find some of these links useful:</p>
<ul>
<li>CoreCLR Repo Searches
<ul>
<li><a href="https://github.com/dotnet/coreclr/search?q=pinned&utf8=%E2%9C%93">‘pinned’</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?q=GC+lifetimes&utf8=%E2%9C%93">‘GC lifetimes’</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?q=path%3A%2Fsrc%2Fgc+pinned&type=Code&utf8=%E2%9C%93">‘path:/src/gc pinned’</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?q=path%3A%2FDocumentation+pinned&type=Code">‘path:/Documentation pinned’</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?q=path%3A%2FDocumentation+%22GC+Info%22">‘path:/Documentation “GC Info”’</a></li>
<li><a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=path%3A%2FDocumentation+GCInfo&type=Code">‘path:/Documentation GCInfo’</a></li>
</ul>
</li>
<li><a href="https://blogs.msdn.microsoft.com/maoni/2004/11/04/clearing-up-some-confusion-over-finalization-and-other-areas-in-gc/">Clearing up some confusion over finalization and other areas in GC</a></li>
<li><a href="https://blogs.msdn.microsoft.com/maoni/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc/">Gen2 free list changes in CLR 4.6 GC</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/7029">Improve StringBuilder ctor(), ctor(int), and ToString() performance.</a> - turns out doing <code class="language-plaintext highlighter-rouge">fixed (char* sourcePtr = &array[0])</code> instead of <code class="language-plaintext highlighter-rouge">fixed(char* sourcePtr = array)</code> can be faster!!</li>
<li><a href="http://stackoverflow.com/questions/26927243/gc-behavior-when-pinning-an-object">GC behavior when pinning an object</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/10/26/How-does-the-fixed-keyword-work/">How does the 'fixed' keyword work?</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
Adding a verb to the dotnet CLI tooling2016-10-03T00:00:00+00:00http://www.mattwarren.org/2016/10/03/Adding-a-verb-to-the-dotnet-CLI-tooling
<p>The <code class="language-plaintext highlighter-rouge">dotnet</code> CLI tooling comes with several built-in commands such as <code class="language-plaintext highlighter-rouge">build</code>, <code class="language-plaintext highlighter-rouge">run</code> and <code class="language-plaintext highlighter-rouge">test</code>, but it turns out it’s possible to add your own verb to that list.</p>
<h3 id="arbitrary-cmds">Arbitrary cmds</h3>
<p>From <a href="https://github.com/dotnet/cli/blob/rel/1.0.0/Documentation/intro-to-cli.md#design">Intro to .NET Core CLI - Design</a></p>
<blockquote>
<p>The way the <code class="language-plaintext highlighter-rouge">dotnet</code> driver finds the command it is instructed to run using <code class="language-plaintext highlighter-rouge">dotnet {command}</code> is via a convention; <strong>any executable that is placed in the PATH and is named <code class="language-plaintext highlighter-rouge">dotnet-{command}</code> will be available to the driver</strong>. For example, when you install the CLI toolchain there will be an executable called <code class="language-plaintext highlighter-rouge">dotnet-build</code> in your PATH; when you run <code class="language-plaintext highlighter-rouge">dotnet build</code>, the driver will run the <code class="language-plaintext highlighter-rouge">dotnet-build</code> executable. All of the arguments following the command are passed to the command being invoked. So, in the invocation of <code class="language-plaintext highlighter-rouge">dotnet build --native</code>, the <code class="language-plaintext highlighter-rouge">--native</code> switch will be passed to <code class="language-plaintext highlighter-rouge">dotnet-build</code> executable that will do some action based on it (in this case, produce a single native binary).</p>
<p>This is also the basics of the current extensibility model of the toolchain. <strong>Any executable found in the PATH named in this way, that is as <code class="language-plaintext highlighter-rouge">dotnet-{command}</code>, will be invoked by the <code class="language-plaintext highlighter-rouge">dotnet</code> driver.</strong></p>
</blockquote>
<p><strong>Fun fact:</strong> This means that it’s actually possible to make a <code class="language-plaintext highlighter-rouge">dotnet go</code> command! You just need to make a copy of <code class="language-plaintext highlighter-rouge">go.exe</code> and rename it to <code class="language-plaintext highlighter-rouge">dotnet-go.exe</code></p>
<p><img src="/images/2016/10/dotnet-go-cmd.png" alt="dotnet go cmd" /></p>
<p>Yay <code class="language-plaintext highlighter-rouge">dotnet go</code> (I know, completely useless, but fun none-the-less)!!</p>
<p><img src="/images/2016/10/dotnet-go-cmd-output.png" alt="dotnet go cmd output" /></p>
<p>(and yes before you ask, you can also make <code class="language-plaintext highlighter-rouge">dotnet dotnet</code> work, but please don’t do that!!)</p>
<p>With regard to documentation, there’s further information in the <a href="https://github.com/dotnet/cli/blob/rel/1.0.0/Documentation/developer-guide.md#adding-a-command">‘Adding a Command’ section</a> of the Developer Guide. Also the <a href="https://github.com/dotnet/cli/tree/rel/1.0.0/src/Microsoft.DotNet.Tools.Test">source code</a> of the <code class="language-plaintext highlighter-rouge">dotnet test</code> command is a really useful reference and helped me out several times.</p>
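<p>As a minimal sketch of that naming convention, assume a hypothetical verb called <code class="language-plaintext highlighter-rouge">hello</code>. If the console app below is compiled to an executable named <code class="language-plaintext highlighter-rouge">dotnet-hello</code> and placed somewhere on the PATH, then going by the design notes quoted above, running <code class="language-plaintext highlighter-rouge">dotnet hello some args</code> should end up invoking it with <code class="language-plaintext highlighter-rouge">some args</code>. The namespace, class and message are made up for illustration:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

namespace DotnetHello
{
    // Hypothetical verb: the output assembly must be named "dotnet-hello"
    public static class Program
    {
        // Builds the message separately so it is easy to check in isolation
        public static string Describe(string[] args)
        {
            if (args.Length == 0)
                return "dotnet hello called with no arguments";
            return "dotnet hello called with: " + string.Join(" ", args);
        }

        public static int Main(string[] args)
        {
            // Everything after the verb is passed straight through by the driver
            Console.WriteLine(Describe(args));
            return 0; // exit code 0 signals success
        }
    }
}
</code></pre></div></div>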
<hr />
<p>Before I go any further I just want to acknowledge the 2 blog posts listed below. They show you how to build a custom command that compresses all the images in the current directory and how to make it available to the <code class="language-plaintext highlighter-rouge">dotnet</code> tooling as a NuGet package:</p>
<ul>
<li><a href="http://dotnetthoughts.net/using-nuget-packages-in-aspnet-core/">Using nuget packages in ASP.NET Core</a></li>
<li><a href="http://dotnetthoughts.net/building-a-custom-dotnet-cli-tool/">Building a custom dotnet cli tool</a></li>
</ul>
<p>However, they don’t explain how to interact with the current project or access its output. This is what I wanted to do, so this post will pick up where those posts left off.</p>
<hr />
<h3 id="information-about-the-current-project">Information about the current Project</h3>
<p>Any effective <code class="language-plaintext highlighter-rouge">dotnet</code> verb needs to know about the project it is running in and helpfully those kind developers at Microsoft have created some useful classes that will parse and examine a <code class="language-plaintext highlighter-rouge">project.json</code> file (available in the <a href="https://www.nuget.org/packages/Microsoft.DotNet.ProjectModel">Microsoft.DotNet.ProjectModel</a> NuGet package). It’s pretty simple to work with, just a few lines of code and you’re able to access the entire <a href="https://github.com/dotnet/cli/blob/rel/1.0.0/src/Microsoft.DotNet.ProjectModel/Project.cs">Project model</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Project</span> <span class="n">project</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">currentDirectory</span> <span class="p">=</span> <span class="n">Directory</span><span class="p">.</span><span class="nf">GetCurrentDirectory</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ProjectReader</span><span class="p">.</span><span class="nf">TryGetProject</span><span class="p">(</span><span class="n">currentDirectory</span><span class="p">,</span> <span class="k">out</span> <span class="n">project</span><span class="p">))</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">project</span><span class="p">.</span><span class="n">Files</span><span class="p">.</span><span class="n">SourceFiles</span><span class="p">.</span><span class="nf">Any</span><span class="p">())</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Files:"</span><span class="p">);</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">file</span> <span class="k">in</span> <span class="n">project</span><span class="p">.</span><span class="n">Files</span><span class="p">.</span><span class="n">SourceFiles</span><span class="p">)</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">" {0}"</span><span class="p">,</span> <span class="n">file</span><span class="p">.</span><span class="nf">Replace</span><span class="p">(</span><span class="n">currentDirectory</span><span class="p">,</span> <span class="s">""</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">project</span><span class="p">.</span><span class="n">Dependencies</span><span class="p">.</span><span class="nf">Any</span><span class="p">())</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Dependencies:"</span><span class="p">);</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">dependency</span> <span class="k">in</span> <span class="n">project</span><span class="p">.</span><span class="n">Dependencies</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">" {0} - Line:{1}, Column:{2}"</span><span class="p">,</span>
<span class="n">dependency</span><span class="p">.</span><span class="n">SourceFilePath</span><span class="p">.</span><span class="nf">Replace</span><span class="p">(</span><span class="n">currentDirectory</span><span class="p">,</span> <span class="s">""</span><span class="p">),</span>
<span class="n">dependency</span><span class="p">.</span><span class="n">SourceLine</span><span class="p">,</span>
<span class="n">dependency</span><span class="p">.</span><span class="n">SourceColumn</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="building-a-project">Building a Project</h3>
<p>In addition to knowing about the current project, we need to ensure it successfully builds before we can do anything else with it. Fortunately this is also simple thanks to the <a href="https://www.nuget.org/packages/Microsoft.DotNet.Cli.Utils/">Microsoft.DotNet.Cli.Utils</a> NuGet package (along with further help from <code class="language-plaintext highlighter-rouge">Microsoft.DotNet.ProjectModel</code> which provides the <code class="language-plaintext highlighter-rouge">BuildWorkspace</code>):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Create a workspace</span>
<span class="kt">var</span> <span class="n">workspace</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">BuildWorkspace</span><span class="p">(</span><span class="n">ProjectReaderSettings</span><span class="p">.</span><span class="nf">ReadFromEnvironment</span><span class="p">());</span>
<span class="c1">// Fetch the ProjectContexts</span>
<span class="kt">var</span> <span class="n">projectPath</span> <span class="p">=</span> <span class="n">project</span><span class="p">.</span><span class="n">ProjectFilePath</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">runtimeIdentifiers</span> <span class="p">=</span>
<span class="n">RuntimeEnvironmentRidExtensions</span><span class="p">.</span><span class="nf">GetAllCandidateRuntimeIdentifiers</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">projectContexts</span> <span class="p">=</span> <span class="n">workspace</span><span class="p">.</span><span class="nf">GetProjectContextCollection</span><span class="p">(</span><span class="n">projectPath</span><span class="p">)</span>
<span class="p">.</span><span class="nf">EnsureValid</span><span class="p">(</span><span class="n">projectPath</span><span class="p">)</span>
<span class="p">.</span><span class="n">FrameworkOnlyContexts</span>
<span class="p">.</span><span class="nf">Select</span><span class="p">(</span><span class="n">c</span> <span class="p">=></span> <span class="n">workspace</span><span class="p">.</span><span class="nf">GetRuntimeContext</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">runtimeIdentifiers</span><span class="p">))</span>
<span class="p">.</span><span class="nf">ToList</span><span class="p">();</span>
<span class="c1">// Setup the build arguments</span>
<span class="kt">var</span> <span class="n">projectContextToBuild</span> <span class="p">=</span> <span class="n">projectContexts</span><span class="p">.</span><span class="nf">First</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">cmdArgs</span> <span class="p">=</span> <span class="k">new</span> <span class="n">List</span><span class="p"><</span><span class="kt">string</span><span class="p">></span>
<span class="p">{</span>
<span class="n">projectPath</span><span class="p">,</span>
<span class="s">"--configuration"</span><span class="p">,</span> <span class="s">"Release"</span><span class="p">,</span>
<span class="s">"--framework"</span><span class="p">,</span> <span class="n">projectContextToBuild</span><span class="p">.</span><span class="n">TargetFramework</span><span class="p">.</span><span class="nf">ToString</span><span class="p">()</span>
<span class="p">};</span>
<span class="c1">// Build!!</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Building Project for {0}"</span><span class="p">,</span> <span class="n">projectContextToBuild</span><span class="p">.</span><span class="n">RuntimeIdentifier</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="n">Command</span><span class="p">.</span><span class="nf">CreateDotNet</span><span class="p">(</span><span class="s">"build"</span><span class="p">,</span> <span class="n">cmdArgs</span><span class="p">).</span><span class="nf">Execute</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Build {0}"</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">ExitCode</span> <span class="p">==</span> <span class="m">0</span> <span class="p">?</span> <span class="s">"SUCCEEDED"</span> <span class="p">:</span> <span class="s">"FAILED"</span><span class="p">);</span>
</code></pre></div></div>
<p>When this runs you get the familiar <code class="language-plaintext highlighter-rouge">dotnet build</code> output if it successfully builds or any error/diagnostic messages if not.</p>
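<p>If you don’t want to take a dependency on <code class="language-plaintext highlighter-rouge">Microsoft.DotNet.Cli.Utils</code>, roughly the same thing can be sketched with nothing but <code class="language-plaintext highlighter-rouge">System.Diagnostics.Process</code>. This is an alternative under stated assumptions, not what the tool in this post does: <code class="language-plaintext highlighter-rouge">BuildSketch</code> and its methods are hypothetical names, and it assumes <code class="language-plaintext highlighter-rouge">dotnet</code> is on the PATH:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Diagnostics;

// Hypothetical helper, not part of the dotnet CLI libraries
static class BuildSketch
{
    // Assembles the same arguments that are passed to "dotnet build" above
    public static string BuildArguments(string projectPath, string configuration, string framework)
    {
        return "build \"" + projectPath + "\""
            + " --configuration " + configuration
            + " --framework " + framework;
    }

    // Runs "dotnet build" as a child process and returns its exit code (0 means success)
    public static int RunBuild(string projectPath, string configuration, string framework)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "dotnet", // assumes the dotnet driver is on the PATH
            Arguments = BuildArguments(projectPath, configuration, framework),
            UseShellExecute = false // output is not redirected, so it goes to our console
        };
        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
</code></pre></div></div>
<p>As with the <code class="language-plaintext highlighter-rouge">Command</code> version, an exit code of 0 is treated as a successful build.</p>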
<h3 id="integrating-with-benchmarkdotnet">Integrating with BenchmarkDotNet</h3>
<p>Now that we know the project has produced an .exe or .dll, we can finally wire-up <a href="https://perfdotnet.github.io/BenchmarkDotNet">BenchmarkDotNet</a> and get it to execute the benchmarks for us:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Running BenchmarkDotNet"</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">benchmarkAssemblyPath</span> <span class="p">=</span>
<span class="n">projectContextToBuild</span><span class="p">.</span><span class="nf">GetOutputPaths</span><span class="p">(</span><span class="n">config</span><span class="p">).</span><span class="n">RuntimeFiles</span><span class="p">.</span><span class="n">Assembly</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">benchmarkAssembly</span> <span class="p">=</span>
<span class="n">AssemblyLoadContext</span><span class="p">.</span><span class="n">Default</span><span class="p">.</span><span class="nf">LoadFromAssemblyPath</span><span class="p">(</span><span class="n">benchmarkAssemblyPath</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Successfully loaded: {0}\n"</span><span class="p">,</span> <span class="n">benchmarkAssembly</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">switcher</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">BenchmarkSwitcher</span><span class="p">(</span><span class="n">benchmarkAssembly</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">summary</span> <span class="p">=</span> <span class="n">switcher</span><span class="p">.</span><span class="nf">Run</span><span class="p">(</span><span class="n">args</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">Exception</span> <span class="n">ex</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Error running BenchmarkDotNet"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">ex</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Because BenchmarkDotNet is a command-line tool we don’t actually need to do much work. It’s just a case of creating a <code class="language-plaintext highlighter-rouge">BenchmarkSwitcher</code>, giving it a reference to the dll that contains the benchmarks and then passing in the command line arguments. BenchmarkDotNet will then do the rest of the work for us!</p>
<p>However, if you need to parse command line arguments yourself I’d recommend re-using the existing <a href="https://github.com/dotnet/cli/tree/a3a58423d19b01f113af0cc2cc2731c0e6e67082/src/dotnet/CommandLine">helper classes</a> as they make life much easier and will ensure that your tool fits in with the <code class="language-plaintext highlighter-rouge">dotnet</code> tooling ethos.</p>
<h3 id="the-final-result">The final result</h3>
<p>Finally, to test it out, we’ll use a <a href="https://gist.github.com/mattwarren/7a9628105a85274cb7d3236d43274ce4">simple test app</a> from the BenchmarkDotNet <a href="https://perfdotnet.github.io/BenchmarkDotNet/GettingStarted.htm">Getting Started Guide</a>, with the following in the <a href="https://gist.github.com/mattwarren/74b1be5baf812cc692b86f0987efd873">project.json</a> file (note the added <code class="language-plaintext highlighter-rouge">tools</code> section):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
"version": "1.0.0-*",
"buildOptions": {
"emitEntryPoint": true
},
"dependencies": {
"Microsoft.NETCore.App": {
"type": "platform",
"version": "1.0.0-rc2-3002702"
},
"BenchmarkDotNet": "0.9.9"
},
"frameworks": {
"netcoreapp1.0": {
"imports": "dnxcore50"
}
},
"tools": {
"BenchmarkCommand": "1.0.0"
}
}
</code></pre></div></div>
<p>Then after doing a <code class="language-plaintext highlighter-rouge">dotnet restore</code>, we can finally run our new <code class="language-plaintext highlighter-rouge">dotnet benchmark</code> command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>λ dotnet benchmark --class Md5VsSha256
Building Project - BenchmarkCommandTest
Project BenchmarkCommandTest (.NETCoreApp,Version=v1.0) will be compiled because expected outputs are missing
Compiling BenchmarkCommandTest for .NETCoreApp,Version=v1.0
Compilation succeeded.
0 Warning(s)
0 Error(s)
Time elapsed 00:00:00.9760886
Build SUCCEEDED
Running BenchmarkDotNet
C:\Projects\BenchmarkCommandTest\bin\Release\netcoreapp1.0\BenchmarkCommandTest.dll
Successfully loaded: BenchmarkCommandTest, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
Target type: Md5VsSha256
// ***** BenchmarkRunner: Start *****
// Found benchmarks:
// Md5VsSha256_Sha256
// Md5VsSha256_Md5
// Validating benchmarks:
// **************************
// Benchmark: Md5VsSha256_Sha256
// *** Generate ***
// Result = Success
// BinariesDirectoryPath = C:\Projects\BDN.Auto\binaries
// *** Build ***
// Result = Success
// *** Execute ***
// Launch: 1
// Benchmark Process Environment Information:
// CLR=CORE, Arch=64-bit ? [RyuJIT]
// GC=Concurrent Workstation
...
</code></pre></div></div>
<p>If you’ve used <a href="https://perfdotnet.github.io/BenchmarkDotNet">BenchmarkDotNet</a> before you’ll recognise its output; if not, its output is all the lines starting with <code class="language-plaintext highlighter-rouge">//</code>. A final note: currently the Console colours from the command aren’t displayed, but that <a href="https://github.com/dotnet/cli/issues/1977#issuecomment-248635335">should be fixed sometime soon</a>, which is great because BenchmarkDotNet looks way better in full colour!!</p>
<hr />
<p>Discuss this post in <a href="https://www.reddit.com/r/csharp/comments/55oljz/adding_a_verb_to_the_dotnet_cli_tooling/">/r/csharp</a></p>
<p>The post <a href="http://www.mattwarren.org/2016/10/03/Adding-a-verb-to-the-dotnet-CLI-tooling/">Adding a verb to the dotnet CLI tooling</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
Optimising LINQ2016-09-29T00:00:00+00:00http://www.mattwarren.org/2016/09/29/Optimising-LINQ
<h3 id="whats-the-problem-with-linq">What’s the problem with LINQ?</h3>
<p>As outlined by <a href="https://twitter.com/xjoeduffyx">Joe Duffy</a>, LINQ introduces inefficiencies in the form of <strong>hidden allocations</strong>, from <a href="http://joeduffyblog.com/2010/09/06/the-premature-optimization-is-evil-myth/">The ‘premature optimization is evil’ myth</a>:</p>
<blockquote>
<p>To take an example of a technology that I am quite supportive of, but that makes writing inefficient code very easy, let’s look at LINQ-to-Objects. Quick, how many inefficiencies are introduced by this code?</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span><span class="p">[]</span> <span class="nf">Scale</span><span class="p">(</span><span class="kt">int</span><span class="p">[]</span> <span class="n">inputs</span><span class="p">,</span> <span class="kt">int</span> <span class="n">lo</span><span class="p">,</span> <span class="kt">int</span> <span class="n">hi</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">var</span> <span class="n">results</span> <span class="p">=</span> <span class="k">from</span> <span class="n">x</span> <span class="k">in</span> <span class="n">inputs</span>
<span class="k">where</span> <span class="p">(</span><span class="n">x</span> <span class="p">>=</span> <span class="n">lo</span><span class="p">)</span> <span class="p">&&</span> <span class="p">(</span><span class="n">x</span> <span class="p"><=</span> <span class="n">hi</span><span class="p">)</span>
<span class="k">select</span> <span class="p">(</span><span class="n">x</span> <span class="p">*</span> <span class="n">c</span><span class="p">);</span>
<span class="k">return</span> <span class="n">results</span><span class="p">.</span><span class="nf">ToArray</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div> </div>
</blockquote>
<p>Good question! Who knows, probably only <a href="http://stackoverflow.com/users?tab=Reputation&filter=all">Jon Skeet</a> can tell just by looking at the code!! So to fully understand the problem we need to take a look at what the compiler is doing for us <em>behind the scenes</em>. The code above ends up looking something like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="kt">int</span><span class="p">[]</span> <span class="nf">Scale</span><span class="p">(</span><span class="kt">int</span><span class="p">[]</span> <span class="n">inputs</span><span class="p">,</span> <span class="kt">int</span> <span class="n">lo</span><span class="p">,</span> <span class="kt">int</span> <span class="n">hi</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
<span class="p"><></span><span class="n">c__DisplayClass0_0</span> <span class="n">CS</span><span class="p"><></span><span class="m">8</span><span class="n">__locals0</span><span class="p">;</span>
<span class="n">CS</span><span class="p"><></span><span class="m">8</span><span class="n">__locals0</span> <span class="p">=</span> <span class="k">new</span> <span class="p"><></span><span class="nf">c__DisplayClass0_0</span><span class="p">();</span>
<span class="n">CS</span><span class="p"><></span><span class="m">8</span><span class="n">__locals0</span><span class="p">.</span><span class="n">lo</span> <span class="p">=</span> <span class="n">lo</span><span class="p">;</span>
<span class="n">CS</span><span class="p"><></span><span class="m">8</span><span class="n">__locals0</span><span class="p">.</span><span class="n">hi</span> <span class="p">=</span> <span class="n">hi</span><span class="p">;</span>
<span class="n">CS</span><span class="p"><></span><span class="m">8</span><span class="n">__locals0</span><span class="p">.</span><span class="n">c</span> <span class="p">=</span> <span class="n">c</span><span class="p">;</span>
<span class="k">return</span> <span class="n">inputs</span>
<span class="p">.</span><span class="n">Where</span><span class="p"><</span><span class="kt">int</span><span class="p">>(</span><span class="k">new</span> <span class="n">Func</span><span class="p"><</span><span class="kt">int</span><span class="p">,</span> <span class="kt">bool</span><span class="p">>(</span><span class="n">CS</span><span class="p"><></span><span class="m">8</span><span class="n">__locals0</span><span class="p">.<</span><span class="n">Scale</span><span class="p">></span><span class="n">b__0</span><span class="p">))</span>
<span class="p">.</span><span class="n">Select</span><span class="p"><</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">>(</span><span class="k">new</span> <span class="n">Func</span><span class="p"><</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">>(</span><span class="n">CS</span><span class="p"><></span><span class="m">8</span><span class="n">__locals0</span><span class="p">.<</span><span class="n">Scale</span><span class="p">></span><span class="n">b__1</span><span class="p">))</span>
<span class="p">.</span><span class="n">ToArray</span><span class="p"><</span><span class="kt">int</span><span class="p">>();</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">CompilerGenerated</span><span class="p">]</span>
<span class="k">private</span> <span class="k">sealed</span> <span class="k">class</span> <span class="nc">c__DisplayClass0_0</span>
<span class="p">{</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">c</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">hi</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">lo</span><span class="p">;</span>
<span class="k">internal</span> <span class="kt">bool</span> <span class="p"><</span><span class="n">Scale</span><span class="p">></span><span class="nf">b__0</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">((</span><span class="n">x</span> <span class="p">>=</span> <span class="k">this</span><span class="p">.</span><span class="n">lo</span><span class="p">)</span> <span class="p">&&</span> <span class="p">(</span><span class="n">x</span> <span class="p"><=</span> <span class="k">this</span><span class="p">.</span><span class="n">hi</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">internal</span> <span class="kt">int</span> <span class="p"><</span><span class="n">Scale</span><span class="p">></span><span class="nf">b__1</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">x</span> <span class="p">*</span> <span class="k">this</span><span class="p">.</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>As you can see we have an extra <code class="language-plaintext highlighter-rouge">class</code> allocated and some <code class="language-plaintext highlighter-rouge">Func</code> delegates to perform the actual logic. But this doesn’t even account for the overhead of the <code class="language-plaintext highlighter-rouge">ToArray()</code> call, using iterators and calling LINQ methods via dynamic dispatch. As an aside, if you are interested in finding out more about closures it’s worth reading Jon Skeet’s excellent blog post <a href="http://csharpindepth.com/Articles/Chapter5/Closures.aspx">“The Beauty of Closures”</a>.</p>
<p>So there’s <em>a lot</em> going on behind the scenes, but it is actually possible to be shown these <em>hidden allocations</em> directly in Visual Studio. If you install the excellent <a href="https://blog.jetbrains.com/dotnet/2014/06/06/heap-allocations-viewer-plugin/">Heap Allocation Viewer</a> plugin for ReSharper, you will get the following tool-tip right in the IDE:</p>
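<p>For comparison, here is a hand-written sketch of the same <code class="language-plaintext highlighter-rouge">Scale</code> logic without LINQ. It is not from Joe Duffy’s post, and the class and method names are made up for illustration. It walks the array twice, once to count the matches and once to fill the result, so the only heap allocation is the returned array (no closure class, delegates or iterators):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical class name, for illustration only
static class ScaleSketch
{
    public static int[] ScaleWithLoops(int[] inputs, int lo, int hi, int c)
    {
        // First pass: count the matches so the result can be sized exactly
        int count = 0;
        for (int i = 0; i != inputs.Length; i++)
        {
            if (inputs[i] >= lo && inputs[i] <= hi)
                count++;
        }

        // The only allocation in this method
        var results = new int[count];

        // Second pass: scale the matching items into the result array
        int next = 0;
        for (int i = 0; i != inputs.Length; i++)
        {
            if (inputs[i] >= lo && inputs[i] <= hi)
                results[next++] = inputs[i] * c;
        }
        return results;
    }
}
</code></pre></div></div>
<p>The trade-off is plain to see: the LINQ version is three lines and reads like the problem statement, whereas this one is longer and evaluates the range check twice per item.</p>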
<p><a href="/images/2016/09/LINQ Optimisations - Heap Allocations Viewer - Joe Duffy Scale Method.png"><img src="/images/2016/09/LINQ Optimisations - Heap Allocations Viewer - Joe Duffy Scale Method.png" alt="Heap Allocations Viewer - Joe Duffy Scale Method" /></a></p>
<p>As useful as it is though, I wouldn’t recommend turning this on all the time as seeing all those <font color="#FF0000" style="font-weight: bold;">red lines</font> under your code tends to make you a bit paranoid!!</p>
<p><strong>Aside</strong>: If you don’t have ReSharper, there is a <a href="https://github.com/mjsabby/RoslynClrHeapAllocationAnalyzer">Roslyn based Heap Allocation Analyser</a> available that provides similar functionality.</p>
<p>Now before we look at some ways you can reduce the impact of LINQ, it’s worth pointing out that LINQ itself does some pretty neat tricks (HT to Oren Novotny for <a href="https://twitter.com/onovotny/status/777785367718141952">pointing this out to me</a>). For instance the common pattern of having a <code class="language-plaintext highlighter-rouge">Where(..)</code> followed by a <code class="language-plaintext highlighter-rouge">Select(..)</code> is <a href="https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Where.cs#L359-L422">optimised so that only a single iterator is used</a>, not two as you would expect. Likewise two <code class="language-plaintext highlighter-rouge">Select(..)</code> statements in a row are combined, <a href="https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Select.cs#L86-L89">so that only one iterator is needed</a>.</p>
<hr />
<h3 id="a-note-on-micro-optimisations">A note on micro-optimisations</h3>
<p>Whenever I write a post like this I inevitably get comments complaining that it’s a “<em>premature optimisation</em>” or something similar. So this time I just want to add the following caveat:</p>
<blockquote>
<p>I am <strong>not</strong> in any way advocating that LINQ is a bad thing, I think it’s a fantastic feature of the C# language!</p>
</blockquote>
<p>Also:</p>
<blockquote>
<p>Please <strong>do not</strong> re-write any of your code based purely on the results of some micro-benchmarks!</p>
</blockquote>
<p>As I explain in <a href="http://www.skillsmatter.com/skillscasts/7809-performance-is-a-feature">one of my talks</a>, you should always <strong>profile</strong> first and then <strong>benchmark</strong>. If you do it the other way round there is a temptation to optimise where it’s not needed.</p>
<div style="text-align:center;">
<iframe src="//www.slideshare.net/slideshow/embed_code/key/LdInjrOoAs9K7U?startSlide=22" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen=""> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/mattwarren/performance-is-a-feature-london-net-user-group" title="Performance is a feature! - London .NET User Group" target="_blank">Performance is a feature! - London .NET User Group</a> </strong> from <strong><a target="_blank" href="//www.slideshare.net/mattwarren">Matt Warren</a></strong> </div>
</div>
<p>Having said all that, the C# Compiler (Roslyn) <a href="https://github.com/dotnet/roslyn/wiki/Contributing-Code#coding-conventions">coding guidelines</a> do actually state the following:</p>
<blockquote>
<p>Avoid allocations in compiler hot paths:</p>
<ul>
<li><strong>Avoid LINQ.</strong></li>
<li>Avoid using foreach over collections that do not have a struct enumerator.</li>
<li>Consider using an object pool. There are many usages of object pools in the compiler to see an example.</li>
</ul>
</blockquote>
<p>Which is slightly ironic considering this advice comes from the same people who conceived and designed LINQ in the first place! But as outlined in the excellent talk <a href="https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/DEV-B333">“Essential Truths Everyone Should Know about Performance in a Large Managed Codebase”</a>, they found LINQ has a noticeable cost.</p>
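<p>The second guideline in that list is easy to miss, so here is a minimal sketch of what it means (the class and method names are mine, just for illustration). <code class="language-plaintext highlighter-rouge">List&lt;T&gt;.GetEnumerator()</code> returns a struct, but that only helps if the compiler can see the concrete type:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System.Collections.Generic;

public static class EnumeratorDemo
{
    // 'foreach' binds to List&lt;int&gt;.GetEnumerator(), which returns the struct
    // List&lt;int&gt;.Enumerator, so no enumerator object is allocated on the heap.
    public static int SumList(List&lt;int&gt; items)
    {
        var total = 0;
        foreach (var item in items)
            total += item;
        return total;
    }

    // Same loop, but through the interface: GetEnumerator() now returns
    // IEnumerator&lt;int&gt;, so the struct enumerator has to be boxed.
    public static int SumEnumerable(IEnumerable&lt;int&gt; items)
    {
        var total = 0;
        foreach (var item in items)
            total += item;
        return total;
    }
}
</code></pre></div></div>
<p>Both methods return the same value; the difference only shows up in an allocation profiler. Newer JITs may be able to remove the boxing in some cases, so treat this as a sketch of the general problem rather than a guarantee about any particular runtime.</p>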
<p>Note: <strong>Hot paths</strong> are another way of talking about the <strong>critical 3%</strong> from the <a href="http://c2.com/cgi/wiki?PrematureOptimization">famous Donald Knuth quote</a>:</p>
<blockquote>
<p>We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. <strong>Yet we should not pass up our opportunities in that critical 3%.</strong></p>
</blockquote>
<hr />
<h3 id="roslynlinqrewrite-and-linqoptimizer">RoslynLinqRewrite and LinqOptimizer</h3>
<p>Now clearly we could manually re-write any LINQ statement into an iterative version if we were concerned about performance, but wouldn’t it be much nicer if there were tools that could do the hard work for us? Well it turns out there are!</p>
<p>First up is <a href="https://github.com/antiufo/roslyn-linq-rewrite">RoslynLinqRewrite</a>, as per the project page:</p>
<blockquote>
<p>This tool compiles C# code by first rewriting the syntax trees of LINQ expressions using plain procedural code, minimizing allocations and dynamic dispatch.</p>
</blockquote>
<p>Also available is the <a href="http://nessos.github.io/LinqOptimizer/">Nessos LinqOptimizer</a> which is:</p>
<blockquote>
<p>An automatic query optimizer-compiler for Sequential and Parallel LINQ. LinqOptimizer compiles declarative LINQ queries into fast loop-based imperative code. The compiled code has fewer virtual calls and heap allocations, better data locality and speedups of up to 15x (Check the <a href="https://github.com/nessos/LinqOptimizer/wiki/Performance">Performance</a> page).</p>
</blockquote>
<p>At a high-level, the main differences between them are:</p>
<ul>
<li>RoslynLinqRewrite
<ul>
<li>works at <strong>compile</strong> time (but prevents incremental compilation of your project)</li>
<li>no code changes, except if you want to opt out via <code class="language-plaintext highlighter-rouge">[NoLinqRewrite]</code></li>
</ul>
</li>
<li>LinqOptimizer
<ul>
<li>works at <strong>run-time</strong></li>
<li>forces you to add <code class="language-plaintext highlighter-rouge">AsQueryExpr().Run()</code> to LINQ methods</li>
<li>optimises Parallel LINQ</li>
</ul>
</li>
</ul>
<p>In the rest of the post we will look at the tools in more detail and analyse their performance.</p>
<h3 id="comparison-of-linq-support">Comparison of LINQ support</h3>
<p>Obviously before choosing either tool you want to be sure that it’s actually going to optimise the LINQ statements you have in your code base. However neither tool supports the whole range of available <a href="https://msdn.microsoft.com/en-us/library/bb397927.aspx">LINQ Query Expressions</a>, as the chart below illustrates:</p>
<span class="compactTable">
<table>
<thead>
<tr>
<th style="text-align: right">Method</th>
<th style="text-align: center"><a href="https://github.com/antiufo/roslyn-linq-rewrite#supported-linq-methods">RoslynLinqRewrite</a></th>
<th style="text-align: center"><a href="https://github.com/nessos/LinqOptimizer/blob/master/src/LinqOptimizer.CSharp/Extensions.cs#L304-L604">LinqOptimizer</a></th>
<th style="text-align: center">Both?</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">Select</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: right">Where</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: right">ToList</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: right">ToArray</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: right">Count</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: right">ForEach</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: right">Reverse</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Cast</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">OfType</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">First/FirstOrDefault</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Single/SingleOrDefault</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Last/LastOrDefault</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">ToDictionary</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">LongCount</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Any</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">All</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">ElementAt/ElementAtOrDefault</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Contains</td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Aggregate</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Sum</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">SelectMany</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Take/TakeWhile</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">Skip/SkipWhile</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">GroupBy</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">OrderBy/OrderByDescending</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right">ThenBy/ThenByDescending</td>
<td style="text-align: center"><span class="False">✗</span></td>
<td style="text-align: center"><span class="True">✓</span></td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: right"><strong>Total</strong></td>
<td style="text-align: center"><strong>22</strong></td>
<td style="text-align: center"><strong>18</strong></td>
<td style="text-align: center"><strong>6</strong></td>
</tr>
</tbody>
</table>
</span>
<hr />
<h3 id="performance-results">Performance Results</h3>
<p>Finally we get to the main point of this blog post: how do the different tools perform, and do they achieve their stated goals of optimising LINQ queries and reducing allocations?</p>
<p>Let’s start with a very common scenario, using LINQ to filter and map a sequence of numbers, i.e. in C#:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">results</span> <span class="p">=</span> <span class="n">items</span><span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">i</span> <span class="p">=></span> <span class="n">i</span> <span class="p">%</span> <span class="m">10</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="p">.</span><span class="nf">Select</span><span class="p">(</span><span class="n">i</span> <span class="p">=></span> <span class="n">i</span> <span class="p">+</span> <span class="m">5</span><span class="p">);</span>
</code></pre></div></div>
<p>We will compare the LINQ code above with the 2 optimised versions, plus an iterative form that will serve as our baseline. Here are the results:</p>
<p><a href="/images/2016/09/LINQ Optimisations - Where Select Benchmarks.png"><img src="/images/2016/09/LINQ Optimisations - Where Select Benchmarks.png" alt="LINQ Optimisations - Where Select Benchmarks" /></a></p>
<p>(Full <a href="https://gist.github.com/mattwarren/e528bc7c43864baad93ff33eb038005b">benchmark code</a>)</p>
<p>The first thing that jumps out is that the <strong>LinqOptimizer</strong> version is allocating <strong>a lot</strong> of memory compared to the others. To see why this is happening we need to look at the code it generates, which looks something like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">IEnumerable</span><span class="p"><</span><span class="kt">int</span><span class="p">></span> <span class="nf">LinqOptimizer</span><span class="p">(</span><span class="kt">int</span> <span class="p">[]</span> <span class="n">input</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">collector</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Nessos</span><span class="p">.</span><span class="n">LinqOptimizer</span><span class="p">.</span><span class="n">Core</span><span class="p">.</span><span class="n">ArrayCollector</span><span class="p"><</span><span class="kt">int</span><span class="p">>();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">counter</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">counter</span> <span class="p"><</span> <span class="n">input</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">counter</span><span class="p">++)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="n">input</span><span class="p">[</span><span class="n">counter</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="p">%</span> <span class="m">10</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="n">i</span> <span class="p">+</span> <span class="m">5</span><span class="p">;</span>
<span class="n">collector</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">collector</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The issue is that by default, <code class="language-plaintext highlighter-rouge">ArrayCollector</code> allocates an <code class="language-plaintext highlighter-rouge">int[1024]</code> (4 KB of elements) as its <a href="https://github.com/nessos/LinqOptimizer/blob/7ccb3a5c032daab18a1438299cae5a7a53e7fc26/src/LinqOptimizer.Core/Collector.fs#L19-L20">backing storage</a>, hence the excessive allocations!</p>
<p>By contrast <strong>RoslynLinqRewrite</strong> optimises the code like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">IEnumerable</span><span class="p"><</span><span class="kt">int</span><span class="p">></span> <span class="nf">RoslynLinqRewriteWhereSelect_ProceduralLinq1</span><span class="p">(</span><span class="kt">int</span><span class="p">[]</span> <span class="n">_linqitems</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">_linqitems</span> <span class="p">==</span> <span class="k">null</span><span class="p">)</span>
<span class="k">throw</span> <span class="k">new</span> <span class="n">System</span><span class="p">.</span><span class="nf">ArgumentNullException</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">_index</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">_index</span> <span class="p"><</span> <span class="n">_linqitems</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">_index</span><span class="p">++)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">_linqitem</span> <span class="p">=</span> <span class="n">_linqitems</span><span class="p">[</span><span class="n">_index</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">_linqitem</span> <span class="p">%</span> <span class="m">10</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">_linqitem1</span> <span class="p">=</span> <span class="n">_linqitem</span> <span class="p">+</span> <span class="m">5</span><span class="p">;</span>
<span class="k">yield</span> <span class="k">return</span> <span class="n">_linqitem1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Which is much more sensible! By using the <code class="language-plaintext highlighter-rouge">yield</code> keyword it gets the compiler to do the hard work and so doesn’t have to allocate a temporary list to store the results in. This means that it is <em>streaming</em> the values, in the same way the original LINQ code does.</p>
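<p>To see what <em>streaming</em> buys you, here is a small sketch with the same shape as the rewritten method above. The <code class="language-plaintext highlighter-rouge">StreamingDemo</code> class and its <code class="language-plaintext highlighter-rouge">Examined</code> counter are hypothetical and only exist to show how much of the input actually gets touched when the caller stops early:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System.Collections.Generic;

public static class StreamingDemo
{
    // Counts how many input items the iterator has examined so far.
    public static int Examined;

    // Filter, map and 'yield return': nothing here runs until someone enumerates.
    public static IEnumerable&lt;int&gt; WhereSelect(int[] items)
    {
        for (int index = 0; index &lt; items.Length; index++)
        {
            Examined++;
            var item = items[index];
            if (item % 10 == 0)
                yield return item + 5;
        }
    }

    // Pulls only the first result, then stops enumerating.
    public static int FirstResult(int[] items)
    {
        foreach (var result in WhereSelect(items))
            return result;
        return -1; // no match
    }
}
</code></pre></div></div>
<p>With <code class="language-plaintext highlighter-rouge">Examined</code> starting at 0, <code class="language-plaintext highlighter-rouge">FirstResult(new[] { 1, 10, 20, 30 })</code> returns 15 and leaves <code class="language-plaintext highlighter-rouge">Examined</code> at 2, because the loop is suspended as soon as the first value has been yielded. A version that collects everything into a temporary list first would have visited all 4 items.</p>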
<p>Lastly we’ll look at one more example, this time using a <code class="language-plaintext highlighter-rouge">Count()</code> expression, i.e.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">items</span><span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">i</span> <span class="p">=></span> <span class="n">i</span> <span class="p">%</span> <span class="m">10</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="p">.</span><span class="nf">Count</span><span class="p">();</span>
</code></pre></div></div>
<p>Here we can clearly see that both tools significantly reduce the allocations compared to the original LINQ code:</p>
<p><a href="/images/2016/09/LINQ Optimisations - Count Benchmarks.png"><img src="/images/2016/09/LINQ Optimisations - Count Benchmarks.png" alt="LINQ Optimisations - Count Benchmarks" /></a></p>
<p>(Full <a href="https://gist.github.com/mattwarren/4c2b2e3585f8b9ad0f95a2a676c552bd">benchmark code</a>)</p>
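<p>For reference, this is the kind of hand-written loop the tools are aiming for. It is my own sketch, not the output of either tool, and the names are made up:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public static class CountDemo
{
    // Hand-written equivalent of items.Where(i =&gt; i % 10 == 0).Count():
    // a plain loop with no iterator object and no delegate calls.
    public static int CountMultiplesOfTen(int[] items)
    {
        var count = 0;
        for (int index = 0; index &lt; items.Length; index++)
        {
            if (items[index] % 10 == 0)
                count++;
        }
        return count;
    }
}
</code></pre></div></div>
<p>Because the result is a single <code class="language-plaintext highlighter-rouge">int</code>, there is nothing to stream or collect, which is why both tools can get the allocations right down in this scenario.</p>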
<hr />
<h3 id="future-options">Future options</h3>
<p>However even though using <strong>RoslynLinqRewrite</strong> or <strong>LinqOptimizer</strong> is pretty painless, we still have to install a third-party library into our project.</p>
<p>Wouldn’t it be even nicer if the .NET compiler, JITter and/or runtime did all the optimisations for us?</p>
<p>Well it’s certainly possible, as Joe Duffy explains in his <a href="https://www.infoq.com/news/2016/06/systems-programming-qcon">QCon New York talk</a>, and <a href="https://github.com/dotnet/coreclr/pull/6653">work has already started</a>, so maybe we won’t have to wait too long!!</p>
<hr />
<p>Discuss this post in <a href="https://www.reddit.com/r/programming/comments/551lqy/optimising_linq/">/r/programming</a></p>
<hr />
<h3 id="further-reading">Further Reading:</h3>
<ul>
<li>Options for LINQ optimisation from <a href="https://github.com/dotnet/roslyn/issues/10378#issuecomment-247538865">State / Direction of C# as a High-Performance Language</a>:
<ul>
<li>Escape analysis only (JIT)</li>
<li>LINQ calls are optimized by the JIT</li>
<li>LINQ calls are optimized by the compiler</li>
</ul>
</li>
<li>An attempt to <a href="https://github.com/dotnet/roslyn/issues/10378#issuecomment-248556947">manually optimise LINQ</a></li>
<li>LinqOptimizer <a href="https://github.com/nessos/LinqOptimizer/wiki/Performance">performance results</a></li>
<li>RoslynLinqRewrite
<ul>
<li><a href="https://www.reddit.com/r/csharp/comments/5310m4/roslynlinqrewrite_compiles_linq_expressions_to/">r/csharp discussion</a></li>
<li><a href="https://www.reddit.com/r/programming/comments/53nw6w/roslynlinqrewrite_optimize_linq_code_to/">r/programming discussion</a></li>
<li><a href="https://news.ycombinator.com/item?id=12544987">HackerNews discussion</a></li>
</ul>
</li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/09/29/Optimising-LINQ/">Optimising LINQ</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Compact strings in the CLR2016-09-19T00:00:00+00:00http://www.mattwarren.org/2016/09/19/Compact-strings-in-the-CLR
<p>In the CLR strings are stored as a sequence of UTF-16 code units, i.e. an array of <code class="language-plaintext highlighter-rouge">char</code> items. So if we have the string ‘testing’, in memory it looks like this:</p>
<p><img src="/images/2016/09/Testing - Unicode or UTF-16.png" alt="'Testing' - Unicode or UTF-16.png" /></p>
<p>But look at all those zeros! Wouldn’t it be more efficient if it could be stored like this instead?</p>
<p><img src="/images/2016/09/Testing - ASCII or UTF-8.png" alt="'Testing' - ASCII or UTF-8.png" /></p>
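<p>You can reproduce the two layouts pictured above with a few lines of code. This is just a sketch using the standard <code class="language-plaintext highlighter-rouge">System.Text.Encoding</code> classes; the <code class="language-plaintext highlighter-rouge">EncodingDemo</code> wrapper is a name I made up:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Text;

public static class EncodingDemo
{
    // Hex dump of the string as UTF-16 (little-endian), 2 bytes per char.
    public static string AsUtf16(string text) =&gt;
        BitConverter.ToString(Encoding.Unicode.GetBytes(text));

    // Hex dump of the string as ASCII, 1 byte per char
    // (any non-ASCII char would be replaced with '?').
    public static string AsAscii(string text) =&gt;
        BitConverter.ToString(Encoding.ASCII.GetBytes(text));
}
</code></pre></div></div>
<p>For <code class="language-plaintext highlighter-rouge">"testing"</code>, <code class="language-plaintext highlighter-rouge">AsUtf16</code> gives <code class="language-plaintext highlighter-rouge">74-00-65-00-73-00-74-00-69-00-6E-00-67-00</code> (14 bytes) whereas <code class="language-plaintext highlighter-rouge">AsAscii</code> gives <code class="language-plaintext highlighter-rouge">74-65-73-74-69-6E-67</code> (7 bytes).</p>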
<p>Now this is a contrived example, clearly not all strings are simple <code class="language-plaintext highlighter-rouge">ASCII</code> text that can be compacted this way. Also, even though I’m an English speaker, I’m well aware that there are other languages with character sets that can only be expressed in <code class="language-plaintext highlighter-rouge">Unicode</code>. However it turns out that even in a fully internationalised modern web-application, there are still a large number of strings that could be expressed as <code class="language-plaintext highlighter-rouge">ASCII</code>, such as:</p>
<ul>
<li><strong>Urls</strong> - <a href="https://en.wikipedia.org/wiki/Percent-encoding">Percent-encoding</a></li>
<li><strong>Http Headers</strong> - <a href="https://tools.ietf.org/html/rfc7230#section-3.2.4">RFC 7230 3.2.4. Field Parsing</a></li>
</ul>
<p>So there would still be an overall memory saving if the CLR provided an implementation that stored some strings in a more compact encoding that only takes <strong>1 byte</strong> per character (<code class="language-plaintext highlighter-rouge">ASCII</code> or even <code class="language-plaintext highlighter-rouge">ISO-8859-1 (Latin-1)</code>) and the rest as <code class="language-plaintext highlighter-rouge">Unicode</code> (<strong>2 bytes</strong> per character).</p>
<p><strong>Aside:</strong> If you are wondering “Why does C# use UTF-16 for strings?” Eric Lippert has a <a href="http://blog.coverity.com/2014/04/09/why-utf-16">great post on this exact subject</a> and Jon Skeet has something interesting to say about it in <a href="http://codeblog.jonskeet.uk/2011/04/05/of-memory-and-strings/">“Of Memory and Strings”</a>.</p>
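<p>Working out which of the three encodings a given string needs is straightforward. The following is a minimal sketch, with the <code class="language-plaintext highlighter-rouge">CompactEncoding</code> enum and <code class="language-plaintext highlighter-rouge">StringClassifier</code> class being hypothetical names of my own. It relies on the fact that the first 256 Unicode code points map directly onto ISO-8859-1:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public enum CompactEncoding { Ascii, Latin1, Utf16 }

public static class StringClassifier
{
    // Returns the narrowest of the three encodings that can hold every char.
    public static CompactEncoding Classify(string text)
    {
        var result = CompactEncoding.Ascii;
        foreach (char c in text)
        {
            if (c &gt; 0xFF)
                return CompactEncoding.Utf16;    // needs the full 2 bytes
            if (c &gt; 0x7F)
                result = CompactEncoding.Latin1; // fits in 1 byte, but is not ASCII
        }
        return result;
    }
}
</code></pre></div></div>
<p>So <code class="language-plaintext highlighter-rouge">Classify("testing")</code> is <code class="language-plaintext highlighter-rouge">Ascii</code>, <code class="language-plaintext highlighter-rouge">Classify("café")</code> is <code class="language-plaintext highlighter-rouge">Latin1</code> and <code class="language-plaintext highlighter-rouge">Classify("€")</code> is <code class="language-plaintext highlighter-rouge">Utf16</code>.</p>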
<h3 id="real-world-data">Real-world data</h3>
<p>In theory this is all well and good, but what about in practice, what about a real-world example?</p>
<p>Well, <a href="https://twitter.com/nick_craver">Nick Craver</a>, a developer at Stack Overflow, was kind enough to run my <a href="/2016/09/06/Analysing-.NET-Memory-Dumps-with-CLR-MD/">Heap Analyser tool</a> on one of their memory dumps:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.NET Memory Dump Heap Analyser - created by Matt Warren - github.com/mattwarren
Found CLR Version: v4.6.1055.00
...
Overall 30,703,367 "System.String" objects take up 4,320,235,704 bytes (4,120.10 MB)
Of this underlying byte arrays (as Unicode) take up 3,521,948,162 bytes (3,358.79 MB)
Remaining data (object headers, other fields, etc) is 798,287,542 bytes (761.31 MB), at 26 bytes per object
Actual Encoding that the "System.String" could be stored as (with corresponding data size)
3,347,868,352 bytes are ASCII
5,078,902 bytes are ISO-8859-1 (Latin-1)
169,000,908 bytes are Unicode (UTF-16)
Total: 3,521,948,162 bytes (expected: 3,521,948,162)
Compression Summary:
1,676,473,627 bytes Compressed (to ISO-8859-1 (Latin-1))
169,000,908 bytes Uncompressed (as Unicode/UTF-16)
30,703,367 bytes EXTRA to enable compression (one byte field, per "System.String" object)
Total: 1,876,177,902 bytes, compared to 3,521,948,162 before compression
</code></pre></div></div>
<p>(<a href="https://gist.github.com/NickCraver/a5e8e307702f92d343f8ec86e71646e6">The full output is available</a>)</p>
<p>Here we can see that there are over <strong>30 million</strong> strings in memory, taking up <strong>4,120 MB</strong> out of a total heap size of <strong>13,232 MB</strong> (just over 30%).</p>
<p>Furthermore we can see that the raw data used by the strings (excluding the CLR Object headers) takes up <strong>3,358 MB</strong> when encoded as <code class="language-plaintext highlighter-rouge">Unicode</code>. However if the relevant strings were compacted to <code class="language-plaintext highlighter-rouge">ASCII</code>/<code class="language-plaintext highlighter-rouge">Latin-1</code> only <strong>1,789 MB</strong> would be needed to store them (a reduction of roughly 47%), a pretty impressive saving!</p>
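<p>If you want to check the arithmetic, the compression summary can be recomputed from the figures in the output above. The <code class="language-plaintext highlighter-rouge">CompressionSummary</code> class is just a hypothetical holder for the numbers:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public static class CompressionSummary
{
    // Figures taken from the heap analyser output above.
    public const long AsciiBytes = 3_347_868_352;  // UTF-16 bytes holding ASCII-only strings
    public const long Latin1Bytes = 5_078_902;     // UTF-16 bytes holding Latin-1 strings
    public const long Utf16Bytes = 169_000_908;    // strings that have to stay as UTF-16
    public const long StringCount = 30_703_367;    // one extra byte each for the encoding field

    // Compactable strings halve in size, the rest are unchanged.
    public static long CompressedTotal() =&gt;
        (AsciiBytes + Latin1Bytes) / 2 + Utf16Bytes + StringCount;

    public static double ToMegabytes(long bytes) =&gt; bytes / (1024.0 * 1024.0);
}
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">CompressedTotal()</code> comes out as 1,876,177,902 bytes, matching the tool’s total, and <code class="language-plaintext highlighter-rouge">ToMegabytes</code> turns that into roughly 1,789 MB.</p>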
<hr />
<h3 id="a-proposal-for-compact-strings-in-the-clr">A proposal for compact strings in the CLR</h3>
<p>I learnt about the idea of “Compact Strings” when reading about how they were <a href="http://openjdk.java.net/jeps/254">implemented in Java</a> and so I put together a proposal for <a href="https://github.com/dotnet/coreclr/issues/7083">an implementation in the CLR</a> (isn’t .NET OSS Great!!).</p>
<p>Turns out that <a href="https://blogs.msdn.microsoft.com/vancem/">Vance Morrison</a> (Performance Architect on the .NET Runtime Team) has been thinking about the same idea for quite a while:</p>
<blockquote>
<p>To answer @mattwarren question on whether changing the internal representation of a string has been considered before, the short answer is YES. <strong>In fact it has been a pet desire of mine for probably over a decade now.</strong></p>
</blockquote>
<p>He also confirmed that they’ve done their homework and found that a significant number of strings could be compacted:</p>
<blockquote>
<p>What was clear now and has held true for quite sometime is that:
Typical apps have <strong>20% of their GC heap as strings</strong>. Most of the 16 bit characters have 0 in their upper byte. <strong>Thus you can save 10% of typical heaps</strong> by encoding in various ways that eliminate these pointless upper bytes.</p>
</blockquote>
<p>It’s worth reading <a href="https://github.com/dotnet/coreclr/issues/7083#issuecomment-246420765">his entire response</a> if you are interested in the full details of the proposal, including the trade-offs, benefits and drawbacks.</p>
<h3 id="implementation-details">Implementation details</h3>
<p>At a high-level the proposal would allow strings to be stored in 2 formats:</p>
<ul>
<li><strong>Regular</strong> - i.e. Unicode encoded, as they are currently stored by the CLR</li>
<li><strong>Compact</strong> - ASCII, ISO-8859-1 (Latin-1) or even another format</li>
</ul>
<p>When you create a string, the constructor would determine the most efficient encoding and encode the data in that format. The format used would then be stored in a field, so that the encoding is always known (CLR strings are immutable). That means that each method within the string class can use this field to determine how it operates, for instance the pseudo-code for the <code class="language-plaintext highlighter-rouge">Equals</code> method is shown below:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="kt">bool</span> <span class="nf">Equals</span><span class="p">(</span><span class="kt">string</span> <span class="n">other</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="n">type</span> <span class="p">!=</span> <span class="n">other</span><span class="p">.</span><span class="n">type</span><span class="p">)</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="p">==</span> <span class="n">ASCII</span><span class="p">)</span>
<span class="k">return</span> <span class="n">StringASCII</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">other</span><span class="p">);</span>
<span class="k">else</span>
<span class="k">return</span> <span class="n">StringLatinUTF16</span><span class="p">.</span><span class="nf">Equals</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">other</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This shows a nice property of having strings in two formats: some operations can be short-circuited. Because the constructor always picks the most compact encoding available, two strings stored in different encodings can never be equal.</p>
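<p>To show the idea end-to-end, here is a toy version written as an ordinary class. It is only a sketch under my own assumptions: the real proposal would change <code class="language-plaintext highlighter-rouge">System.String</code> inside the runtime, whereas the hypothetical <code class="language-plaintext highlighter-rouge">CompactString</code> below simply wraps a <code class="language-plaintext highlighter-rouge">byte[]</code> and only distinguishes Latin-1 from UTF-16:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public sealed class CompactString
{
    private readonly bool isLatin1; // the "format" field: true = 1 byte per char
    private readonly byte[] data;

    public CompactString(string text)
    {
        // Pick the most compact format that can represent every char.
        isLatin1 = true;
        foreach (char c in text)
            if (c &gt; 0xFF) { isLatin1 = false; break; }

        if (isLatin1)
        {
            data = new byte[text.Length];
            for (int i = 0; i &lt; text.Length; i++)
                data[i] = (byte)text[i]; // upper byte is zero, so nothing is lost
        }
        else
        {
            data = System.Text.Encoding.Unicode.GetBytes(text);
        }
    }

    public int ByteLength =&gt; data.Length;

    public bool Equals(CompactString other)
    {
        if (other == null) return false;
        // Short-circuit: different formats means different content.
        if (isLatin1 != other.isLatin1) return false;
        // Same format, so a plain byte-wise comparison is enough.
        if (data.Length != other.data.Length) return false;
        for (int i = 0; i &lt; data.Length; i++)
            if (data[i] != other.data[i]) return false;
        return true;
    }
}
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">new CompactString("testing").ByteLength</code> is 7, whereas <code class="language-plaintext highlighter-rouge">new CompactString("€uro").ByteLength</code> is 8, because a single character outside Latin-1 forces the whole string back to UTF-16.</p>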
<h4 id="advantages">Advantages</h4>
<ul>
<li>less overall <strong>memory usage</strong> (as-per @davidfowl <a href="https://twitter.com/davidfowl/status/767585518854938625">“At the top of every ASP.NET profile… strings!”</a>)</li>
<li>strings become more <strong>cache-friendly</strong>, which <em>may</em> give better performance</li>
</ul>
<h4 id="disadvantages">Disadvantages</h4>
<ul>
<li>Makes some <strong>operations slower</strong> due to the extra <code class="language-plaintext highlighter-rouge">if (type == ...)</code> check needed</li>
<li>Breaks the <code class="language-plaintext highlighter-rouge">fixed</code> keyword, as well as COM and P/Invoke interop that <strong>relies on the current string layout/format</strong></li>
<li>If very few strings in the application can be compacted, this will have an <strong>overhead for no gain</strong></li>
</ul>
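<p>To see why <code class="language-plaintext highlighter-rouge">fixed</code> is such a problem, consider this small sketch (the <code class="language-plaintext highlighter-rouge">FixedDemo</code> class is a made-up name, and the code has to be compiled with unsafe code enabled). Existing code like this gets a <code class="language-plaintext highlighter-rouge">char*</code> and assumes it points at 2-byte UTF-16 characters:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public static class FixedDemo
{
    // Requires compiling with unsafe code enabled (AllowUnsafeBlocks).
    public static unsafe int SumChars(string text)
    {
        var total = 0;
        // 'fixed' pins the string and hands out a pointer to its character
        // data, typed as char*: the caller assumes 2 bytes per character.
        fixed (char* p = text)
        {
            for (int i = 0; i &lt; text.Length; i++)
                total += p[i];
        }
        return total;
    }
}
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">SumChars("AB")</code> returns 131 (65 + 66) today. If the runtime stored <code class="language-plaintext highlighter-rouge">"AB"</code> as 2 bytes rather than 4, that pointer could no longer refer directly to the string’s own storage, so the runtime would presumably have to hand out a widened copy instead.</p>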
<hr />
<h3 id="next-steps">Next steps</h3>
<p>In his reply Vance Morrison highlighted that solving the issue with the <code class="language-plaintext highlighter-rouge">fixed</code> keyword was a first step, because that has a hard dependency on the current string layout. Once that’s done the real work of making large, sweeping changes to the CLR can be done:</p>
<blockquote>
<p>The main challenge is dealing with fixed, but there is also frankly at least a few man-months of simply dealing with the places in the runtime where we took a dependency on the layout of string (in the runtime, interop, and things like stringbuilder, and all the uses of ‘fixed’ in corefx).</p>
</blockquote>
<blockquote>
<p>Thus it IS doable, but it is at least moderately expensive (man months), and the payoff is non-trivial but not huge.</p>
</blockquote>
<p>So stay tuned, one day we might have a more compact, more efficient implementation of strings in the CLR, yay!!</p>
<hr />
<h3 id="further-reading">Further Reading</h3>
<ul>
<li>An implementation of this idea done in the <a href="http://www.mono-project.com/docs/advanced/runtime/docs/ascii-strings/">Mono runtime</a>, with <a href="https://lists.dot.net/pipermail/mono-devel-list/2016-July/043744.html">accompanying discussion</a></li>
<li>More info from Eric Lippert on <a href="https://blogs.msdn.microsoft.com/ericlippert/2011/07/19/strings-immutability-and-persistence/">why .NET strings are laid out as they are</a></li>
<li><a href="https://github.com/dotnet/corefxlab/tree/master/src/System.Text.Utf8/System/Text/Utf8">UTF-8 string Library</a> currently being developed in the CoreFX Labs.</li>
<li>Report produced by several Oracle Engineers: <a href="http://cr.openjdk.java.net/~shade/density/string-density-report.pdf">“String Density: Performance and Footprint”</a></li>
<li>Report on <a href="http://cr.openjdk.java.net/~shade/density/state-of-string-density-v1.txt">“State of String Density performance (May 5, 2015)”</a> in Java</li>
<li>What was involved in <a href="http://www.infoq.com/news/2016/02/compact-strings-Java-JDK9">optimising the Java implementation</a> (tl;dr quite a lot!!)</li>
<li><a href="https://www.python.org/dev/peps/pep-0393/">Python’s Flexible String Representation</a></li>
</ul>
<hr />
<p>Discuss this post on <a href="https://www.reddit.com/r/programming/comments/53hzrx/compact_strings_in_the_clr_a_proposal/">/r/programming</a></p>
Subverting .NET Type Safety with 'System.Runtime.CompilerServices.Unsafe'2016-09-14T00:00:00+00:00http://www.mattwarren.org/2016/09/14/Subverting-.NET-Type-Safety-with-System.Runtime.CompilerServices.Unsafe
<h4 id="in-which-we-use-systemruntimecompilerservicesunsafe-a-generic-api-type-safe-but-still-unsafe-and-mess-with-the-c-type-system"><strong>In which we use <code class="language-plaintext highlighter-rouge">System.Runtime.CompilerServices.Unsafe</code>, a generic API (“type-safe” but still “unsafe”), and mess with the C# Type System!</strong></h4>
<hr />
<p>The post covers the following topics:</p>
<ul>
<li><a href="#what-it-is-and-why-its-useful">What it is and why it’s useful</a></li>
<li><a href="#how-it-works">How it works</a></li>
<li><a href="#code-samples">Code samples</a></li>
<li><a href="#tricks-you-can-do-with-it">Tricks you can do with it</a></li>
<li><a href="#using-it-safely">Using it safely</a></li>
</ul>
<hr />
<h3 id="what-it-is-and-why-its-useful">What it is and why it’s useful</h3>
<p>The XML documentation comments for <code class="language-plaintext highlighter-rouge">System.Runtime.CompilerServices.Unsafe</code> state that it:</p>
<blockquote>
<p>Contains generic, low-level functionality for manipulating pointers.</p>
</blockquote>
<p>But we can get a better understanding of <em>what it is</em> by looking at the actual API definition from the <a href="https://www.nuget.org/packages/System.Runtime.CompilerServices.Unsafe/">current NuGet package (4.0.0)</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Contains generic, low-level functionality for manipulating pointers.</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">class</span> <span class="nc">Unsafe</span>
<span class="p">{</span>
<span class="c1">// Casts the given object to the specified type.</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">T</span> <span class="n">As</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="kt">object</span> <span class="n">o</span><span class="p">)</span> <span class="k">where</span> <span class="n">T</span> <span class="p">:</span> <span class="k">class</span><span class="p">;</span>
<span class="c1">// Returns a pointer to the given by-ref parameter. </span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span><span class="p">*</span> <span class="n">AsPointer</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="k">ref</span> <span class="n">T</span> <span class="k">value</span><span class="p">);</span>
<span class="c1">// Copies a value of type T to the given location. </span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="n">Copy</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="k">void</span><span class="p">*</span> <span class="n">destination</span><span class="p">,</span> <span class="k">ref</span> <span class="n">T</span> <span class="n">source</span><span class="p">);</span>
<span class="c1">// Copies a value of type T to the given location.</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="n">Copy</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="k">ref</span> <span class="n">T</span> <span class="n">destination</span><span class="p">,</span> <span class="k">void</span><span class="p">*</span> <span class="n">source</span><span class="p">);</span>
<span class="c1">// Copies bytes from the source address to the destination address.</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">CopyBlock</span><span class="p">(</span><span class="k">void</span><span class="p">*</span> <span class="n">destination</span><span class="p">,</span> <span class="k">void</span><span class="p">*</span> <span class="n">source</span><span class="p">,</span> <span class="kt">uint</span> <span class="n">byteCount</span><span class="p">);</span>
<span class="c1">// Initializes a block of memory at the given location with a given initial value. </span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">InitBlock</span><span class="p">(</span><span class="k">void</span><span class="p">*</span> <span class="n">startAddress</span><span class="p">,</span> <span class="kt">byte</span> <span class="k">value</span><span class="p">,</span> <span class="kt">uint</span> <span class="n">byteCount</span><span class="p">);</span>
<span class="c1">// Reads a value of type T from the given location.</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">T</span> <span class="n">Read</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="k">void</span><span class="p">*</span> <span class="n">source</span><span class="p">);</span>
<span class="c1">// Returns the size of an object of the given type parameter. </span>
<span class="k">public</span> <span class="k">static</span> <span class="kt">int</span> <span class="n">SizeOf</span><span class="p"><</span><span class="n">T</span><span class="p">>();</span>
<span class="c1">// Writes a value of type T to the given location.</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="n">Write</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="k">void</span><span class="p">*</span> <span class="n">destination</span><span class="p">,</span> <span class="n">T</span> <span class="k">value</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note: I edited the XML doc-comments for brevity; the full versions are available <a href="https://github.com/dotnet/corefx/blob/master/src/System.Runtime.CompilerServices.Unsafe/src/System.Runtime.CompilerServices.Unsafe.xml">in the source</a>. There are also some additional <a href="https://github.com/dotnet/corefx/issues/10451">methods that have been added to the API</a>, but to make use of them you have to use a version of the C# compiler with <a href="https://github.com/dotnet/roslyn/issues/118">support for ref returns and locals</a>.</p>
<p>However, this doesn’t really tell us <em>why it’s useful</em>. To get some background on that we can look at the GitHub issue <a href="https://github.com/dotnet/corefx/issues/5474">“Provide a generic API to read from and write to a pointer”</a>:</p>
<p><a href="https://github.com/dotnet/corefx/issues/5474"><img src="/images/2016/09/GitHub issue - Provide a generic API to read from and write to a pointer.png" alt="GitHub issue - Provide a generic API to read from and write to a pointer" /></a></p>
<p>So at a high-level the goals of the <code class="language-plaintext highlighter-rouge">System.Runtime.CompilerServices.Unsafe</code> library are to:</p>
<ol>
<li><strong>Provide a <em>safer</em> way of writing low-level <code class="language-plaintext highlighter-rouge">unsafe</code> code</strong>
<ul>
<li>Without this library you have to resort to <code class="language-plaintext highlighter-rouge">fixed</code> and pointer manipulation, which can be error prone</li>
</ul>
</li>
<li><strong>Allow access to functionality that can’t be expressed in C#, but is possible in IL</strong>
<ul>
<li>For instance <code class="language-plaintext highlighter-rouge">Unsafe.SizeOf<T>()</code> allows access to the <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.sizeof(v=vs.110).aspx">Sizeof IL Opcode</a></li>
</ul>
</li>
<li><strong>Save developers from having to repeatedly write the same <code class="language-plaintext highlighter-rouge">unsafe</code> code</strong>
<ul>
<li>There are already <a href="https://github.com/dotnet/corefxlab/pull/796">code-bases making use of it</a>, including <a href="https://github.com/aspnet/KestrelHttpServer/pull/1000">Kestrel, the high-performance web server based on libuv</a>.</li>
</ul>
</li>
</ol>
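<p>As a rough illustration of the second goal, here is a minimal sketch (the <code class="language-plaintext highlighter-rouge">SizeOfSketch</code> class and <code class="language-plaintext highlighter-rouge">SizeInBytes</code> helper are hypothetical names, not part of the library). C# rejects <code class="language-plaintext highlighter-rouge">sizeof(T)</code> for an unconstrained generic type parameter, whereas <code class="language-plaintext highlighter-rouge">Unsafe.SizeOf<T>()</code> can be called for any <code class="language-plaintext highlighter-rouge">T</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.CompilerServices;

public static class SizeOfSketch
{
    // Hypothetical helper: forwards to Unsafe.SizeOf<T>(),
    // which wraps the IL 'sizeof' opcode.
    public static int SizeInBytes<T>()
    {
        return Unsafe.SizeOf<T>();
    }
}
</code></pre></div></div>
<p>With this in place <code class="language-plaintext highlighter-rouge">SizeOfSketch.SizeInBytes<int>()</code> returns 4, <code class="language-plaintext highlighter-rouge">SizeInBytes<long>()</code> returns 8 and <code class="language-plaintext highlighter-rouge">SizeInBytes<Guid>()</code> returns 16.</p>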
<p>It’s also worth pointing out that the library is primarily for use with a Value Type (int, float, etc) rather than a <code class="language-plaintext highlighter-rouge">class</code> or Reference type. You can use it with classes, however you <a href="https://msdn.microsoft.com/en-us/library/23acw07k(v=vs.110).aspx">have to pin them first</a>, so they don’t move about in memory whilst you are working with the pointer.</p>
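<p>To make the pinning requirement concrete, here is a small sketch that pins a managed <code class="language-plaintext highlighter-rouge">byte[]</code> with <code class="language-plaintext highlighter-rouge">fixed</code> before reading from it. The <code class="language-plaintext highlighter-rouge">PinnedReadSketch</code> class and <code class="language-plaintext highlighter-rouge">ReadFirstInt32</code> method are names I made up for illustration, and the code needs to be compiled with <code class="language-plaintext highlighter-rouge">/unsafe</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.CompilerServices;

public static class PinnedReadSketch
{
    // Hypothetical helper: reads the first 4 bytes of a managed array as an int
    public static unsafe int ReadFirstInt32(byte[] buffer)
    {
        if (buffer == null || buffer.Length < sizeof(int))
            throw new ArgumentException("Need at least 4 bytes", nameof(buffer));

        // 'fixed' pins the array, so the GC can't move it while we hold the pointer
        fixed (byte* ptr = buffer)
        {
            return Unsafe.Read<int>(ptr);
        }
    }
}
</code></pre></div></div>
<p>The result uses the byte order of the machine, so it should match what <code class="language-plaintext highlighter-rouge">BitConverter.ToInt32(buffer, 0)</code> gives for the same array.</p>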
<p><strong>Update:</strong> It was pointed out to me that <a href="https://github.com/nietras">Niels</a> wrote an initial implementation of this library <a href="https://github.com/DotNetCross/Memory.Unsafe">in a separate project</a>, before Microsoft made their own version.</p>
<hr />
<h3 id="how-it-works">How it works</h3>
<p>Because the library allows access to functionality that can’t be expressed in C#, it has to be <a href="https://github.com/dotnet/corefx/blob/master/src/System.Runtime.CompilerServices.Unsafe/src/System.Runtime.CompilerServices.Unsafe.il">written in raw IL</a>, which is then compiled by a custom build-step. As an example we will look at the <code class="language-plaintext highlighter-rouge">AsPointer</code> method, which has the following signature:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="k">void</span><span class="p">*</span> <span class="n">AsPointer</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="k">ref</span> <span class="n">T</span> <span class="k">value</span><span class="p">)</span>
</code></pre></div></div>
<p>The IL for this is shown below; note how the <code class="language-plaintext highlighter-rouge">ref</code> keyword becomes <code class="language-plaintext highlighter-rouge">&</code> in IL and <code class="language-plaintext highlighter-rouge"><T></code> is expressed as <code class="language-plaintext highlighter-rouge">!!T</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.method public hidebysig static void* AsPointer<T>(!!T& 'value') cil managed aggressiveinlining
{
.custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
.maxstack 1
ldarg.0
conv.u
ret
} // end of method Unsafe::AsPointer
</code></pre></div></div>
<p>Here we can see that it’s making use of the <code class="language-plaintext highlighter-rouge">conv.u</code> IL instruction. For reference, an explanation of this, along with some of the other opcodes used by the library, is shown below:</p>
<ul>
<li><a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.conv_u(v=vs.110).aspx">Conv_U</a> - Converts the value on top of the evaluation stack to <strong>unsigned native int</strong>, and extends it to <strong>native int</strong>.</li>
<li><a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.ldobj(v=vs.110).aspx">Ldobj</a> - Copies the value type object pointed to by an address to the top of the evaluation stack.</li>
<li><a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.stobj(v=vs.110).aspx">Stobj</a> - Copies a value of a specified type from the evaluation stack into a supplied memory address.</li>
</ul>
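<p>As far as I can tell from the IL source, <code class="language-plaintext highlighter-rouge">Unsafe.Write</code> and <code class="language-plaintext highlighter-rouge">Unsafe.Read</code> are thin wrappers over <code class="language-plaintext highlighter-rouge">stobj</code> and <code class="language-plaintext highlighter-rouge">ldobj</code> respectively. The sketch below (the <code class="language-plaintext highlighter-rouge">Point</code> struct and <code class="language-plaintext highlighter-rouge">ReadWriteSketch</code> class are hypothetical) shows the round trip from the C# side, storing a struct at an address and loading it back:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System.Runtime.CompilerServices;

// Hypothetical value type, used only for this illustration
public struct Point
{
    public int X;
    public int Y;
}

public static class ReadWriteSketch
{
    public static unsafe Point RoundTrip(Point input)
    {
        // Scratch space on the stack, big enough for one Point
        byte* scratch = stackalloc byte[Unsafe.SizeOf<Point>()];

        Unsafe.Write(scratch, input);        // store the struct at the address
        return Unsafe.Read<Point>(scratch);  // load it back as a Point
    }
}
</code></pre></div></div>
<p>Because the memory lives on the stack, nothing needs to be pinned here.</p>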
<p>After searching around I found several other places in the .NET Runtime that make use of raw IL in this way:</p>
<ul>
<li><a href="https://github.com/dotnet/corefxlab/blob/master/src/System.Slices/System/Span.cs">System.Slices/System/Span.cs</a></li>
<li><a href="https://github.com/dotnet/corefxlab/blob/master/src/System.Slices/System/PtrUtils.cs">PtrUtils in CoreFX Labs</a></li>
<li><a href="https://github.com/joeduffy/slice.net/blob/master/src/PtrUtils.il">Joe Duffy’s slice.net - PtrUtils.il</a></li>
</ul>
<hr />
<h3 id="code-samples">Code samples</h3>
<p>There’s a <a href="https://github.com/dotnet/corefx/blob/e34ffcd5875d44f8dad10efc07d357a78175b264/src/System.Runtime.CompilerServices.Unsafe/tests/UnsafeTests.cs">nice set of unit tests</a> that show the main use-cases for the library. For instance, here is how to use <code class="language-plaintext highlighter-rouge">Unsafe.Write(..)</code> to directly change the value of an <code class="language-plaintext highlighter-rouge">int</code> via a pointer:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">Fact</span><span class="p">]</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">unsafe</span> <span class="k">void</span> <span class="nf">WriteInt32</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="k">value</span> <span class="p">=</span> <span class="m">10</span><span class="p">;</span>
<span class="kt">int</span><span class="p">*</span> <span class="n">address</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">*)</span><span class="n">Unsafe</span><span class="p">.</span><span class="nf">AsPointer</span><span class="p">(</span><span class="k">ref</span> <span class="k">value</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">expected</span> <span class="p">=</span> <span class="m">20</span><span class="p">;</span>
<span class="n">Unsafe</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">address</span><span class="p">,</span> <span class="n">expected</span><span class="p">);</span>
<span class="n">Assert</span><span class="p">.</span><span class="nf">Equal</span><span class="p">(</span><span class="n">expected</span><span class="p">,</span> <span class="k">value</span><span class="p">);</span>
<span class="n">Assert</span><span class="p">.</span><span class="nf">Equal</span><span class="p">(</span><span class="n">expected</span><span class="p">,</span> <span class="p">*</span><span class="n">address</span><span class="p">);</span>
<span class="n">Assert</span><span class="p">.</span><span class="nf">Equal</span><span class="p">(</span><span class="n">expected</span><span class="p">,</span> <span class="n">Unsafe</span><span class="p">.</span><span class="n">Read</span><span class="p"><</span><span class="kt">int</span><span class="p">>(</span><span class="n">address</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
<p>You can write something similar by manipulating pointers directly, but it’s not as straightforward (unless you are familiar with C or C++):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int value = 10;
int* ptr = &value;
*ptr = 30;
Console.WriteLine(value); // prints "30"
</code></pre></div></div>
<p>For a more real-world use case, the code below shows how you can access a <code class="language-plaintext highlighter-rouge">KeyValuePair<DateTime, decimal></code> directly as a <code class="language-plaintext highlighter-rouge">byte []</code> (taken from a <a href="https://github.com/dotnet/coreclr/issues/5870#issuecomment-240186556">GitHub discussion</a>):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">dt</span> <span class="p">=</span> <span class="k">new</span> <span class="n">KeyValuePair</span><span class="p"><</span><span class="n">DateTime</span><span class="p">,</span> <span class="kt">decimal</span><span class="p">>[</span><span class="m">2</span><span class="p">];</span>
<span class="k">ref</span> <span class="kt">byte</span> <span class="n">asRefByte</span> <span class="p">=</span> <span class="k">ref</span> <span class="n">Unsafe</span><span class="p">.</span><span class="n">As</span><span class="p"><</span><span class="n">KeyValuePair</span><span class="p"><</span><span class="n">DateTime</span><span class="p">,</span> <span class="kt">decimal</span><span class="p">>,</span> <span class="kt">byte</span><span class="p">>(</span><span class="k">ref</span> <span class="n">dt</span><span class="p">[</span><span class="m">0</span><span class="p">]);</span>
<span class="k">fixed</span> <span class="p">(</span><span class="kt">byte</span> <span class="p">*</span> <span class="n">ptr</span> <span class="p">=</span> <span class="p">&</span><span class="n">asRefByte</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Treat the KeyValuePair<DateTime, decimal> as if it were a byte []</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>(this example is based on the StackOverflow question: <a href="http://stackoverflow.com/questions/32864239/get-unsafe-pointer-to-array-of-keyvaluepairdatetime-decimal-in-c-sharp/38979981#38979981">“Get unsafe pointer to array of KeyValuePair<DateTime,decimal> in C#”</a>)</p>
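<p>The snippet above relies on the newer ref-returning methods. As a rough alternative that only uses the methods from the 4.0.0 API listed earlier, the sketch below copies a struct into a new <code class="language-plaintext highlighter-rouge">byte[]</code> rather than aliasing it. The <code class="language-plaintext highlighter-rouge">StructBytesSketch</code> class and <code class="language-plaintext highlighter-rouge">ToBytes</code> method are hypothetical names, and I’m assuming the struct contains no reference-type fields:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.CompilerServices;

public static class StructBytesSketch
{
    // Hypothetical helper: copies the raw bytes of a struct into a new array.
    // Only sensible for structs without reference-type fields.
    public static unsafe byte[] ToBytes<T>(ref T value) where T : struct
    {
        var bytes = new byte[Unsafe.SizeOf<T>()];

        // Pin the destination array whilst we write into it
        fixed (byte* destination = bytes)
        {
            Unsafe.Copy(destination, ref value);
        }
        return bytes;
    }
}
</code></pre></div></div>
<p>For an <code class="language-plaintext highlighter-rouge">int</code> this should give the same bytes as <code class="language-plaintext highlighter-rouge">BitConverter.GetBytes(..)</code>. For larger structs the array can also include any padding bytes the runtime adds to the layout.</p>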
<hr />
<h3 id="tricks-you-can-do-with-it">Tricks you can do with it</h3>
<p>Despite providing you with a nice strongly-typed API, you still have to mark your code as <code class="language-plaintext highlighter-rouge">unsafe</code>, which is a bit of a give-away that you can use it to do things that normal C# can’t!</p>
<h4 id="breaking-immutability"><strong>Breaking immutability</strong></h4>
<p>Strings in C# are immutable and the runtime goes to great lengths to ensure you can’t bypass this behaviour. However, under-the-hood the String data is just bytes which can be manipulated; indeed the runtime does this manipulation itself inside the <code class="language-plaintext highlighter-rouge">StringBuilder</code> class.</p>
<p>So using <code class="language-plaintext highlighter-rouge">Unsafe.Write(..)</code> we can modify the contents of a String - <strong>yay</strong>!! However it needs to be pointed out that this code will potentially break the behaviour of the String class in many subtle ways, <strong>so don’t ever use it in a real application!!</strong></p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">text</span> <span class="p">=</span> <span class="s">"ABCDEFGHIJKLMNOPQRSTUVWXYZ"</span><span class="p">;</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"String Length {0}"</span><span class="p">,</span> <span class="n">text</span><span class="p">.</span><span class="n">Length</span><span class="p">);</span> <span class="c1">// prints 26</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Text: \"{0}\""</span><span class="p">,</span> <span class="n">text</span><span class="p">);</span> <span class="c1">// "ABCDEFGHIJKLMNOPQRSTUVWXYZ"</span>
<span class="kt">var</span> <span class="n">pinnedText</span> <span class="p">=</span> <span class="n">GCHandle</span><span class="p">.</span><span class="nf">Alloc</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">GCHandleType</span><span class="p">.</span><span class="n">Pinned</span><span class="p">);</span>
<span class="kt">char</span><span class="p">*</span> <span class="n">textAddress</span> <span class="p">=</span> <span class="p">(</span><span class="kt">char</span><span class="p">*)</span><span class="n">pinnedText</span><span class="p">.</span><span class="nf">AddrOfPinnedObject</span><span class="p">().</span><span class="nf">ToPointer</span><span class="p">();</span>
<span class="c1">// Make an immutable string think that it is shorter than it actually is!!!</span>
<span class="n">Unsafe</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">textAddress</span> <span class="p">-</span> <span class="m">2</span><span class="p">,</span> <span class="m">5</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"String Length {0}"</span><span class="p">,</span> <span class="n">text</span><span class="p">.</span><span class="n">Length</span><span class="p">);</span> <span class="c1">// prints 5</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Text: \"{0}\""</span><span class="p">,</span> <span class="n">text</span><span class="p">);</span> <span class="c1">// prints "ABCDE"</span>
<span class="c1">// change the 2nd character 'B' to '@'</span>
<span class="n">Unsafe</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">textAddress</span> <span class="p">+</span> <span class="m">1</span><span class="p">,</span> <span class="sc">'@'</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Text: \"{0}\""</span><span class="p">,</span> <span class="n">text</span><span class="p">);</span> <span class="c1">// prints "A@CDE"</span>
<span class="n">pinnedText</span><span class="p">.</span><span class="nf">Free</span><span class="p">();</span>
</code></pre></div></div>
<h4 id="messing-with-the-clr-type-system"><strong>Messing with the CLR type-system</strong></h4>
<p>But we can go even further than that and do a really nasty trick to completely defeat the CLR type-system. This code is horrible and could potentially break the CLR in several ways, so as before <strong>don’t ever use it in a real application!!</strong></p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">intValue</span> <span class="p">=</span> <span class="m">5</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">floatValue</span> <span class="p">=</span> <span class="m">5.0f</span><span class="p">;</span>
<span class="kt">object</span> <span class="n">boxedInt</span> <span class="p">=</span> <span class="p">(</span><span class="kt">object</span><span class="p">)</span><span class="n">intValue</span><span class="p">,</span> <span class="n">boxedFloat</span> <span class="p">=</span> <span class="p">(</span><span class="kt">object</span><span class="p">)</span><span class="n">floatValue</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">pinnedFloat</span> <span class="p">=</span> <span class="n">GCHandle</span><span class="p">.</span><span class="nf">Alloc</span><span class="p">(</span><span class="n">boxedFloat</span><span class="p">,</span> <span class="n">GCHandleType</span><span class="p">.</span><span class="n">Pinned</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">pinnedInt</span> <span class="p">=</span> <span class="n">GCHandle</span><span class="p">.</span><span class="nf">Alloc</span><span class="p">(</span><span class="n">boxedInt</span><span class="p">,</span> <span class="n">GCHandleType</span><span class="p">.</span><span class="n">Pinned</span><span class="p">);</span>
<span class="kt">int</span><span class="p">*</span> <span class="n">floatAddress</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">*)</span><span class="n">pinnedFloat</span><span class="p">.</span><span class="nf">AddrOfPinnedObject</span><span class="p">().</span><span class="nf">ToPointer</span><span class="p">();</span>
<span class="kt">int</span><span class="p">*</span> <span class="n">intAddress</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">*)</span><span class="n">pinnedInt</span><span class="p">.</span><span class="nf">AddrOfPinnedObject</span><span class="p">().</span><span class="nf">ToPointer</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Type: {0}, Value: {1}"</span><span class="p">,</span> <span class="n">boxedInt</span><span class="p">.</span><span class="nf">GetType</span><span class="p">().</span><span class="n">FullName</span><span class="p">,</span> <span class="n">boxedInt</span><span class="p">);</span>
<span class="c1">// Make an int think it's a float!!!</span>
<span class="kt">int</span> <span class="n">floatType</span> <span class="p">=</span> <span class="n">Unsafe</span><span class="p">.</span><span class="n">Read</span><span class="p"><</span><span class="kt">int</span><span class="p">>(</span><span class="n">floatAddress</span> <span class="p">-</span> <span class="m">1</span><span class="p">);</span>
<span class="n">Unsafe</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">intAddress</span> <span class="p">-</span> <span class="m">1</span><span class="p">,</span> <span class="n">floatType</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Type: {0}, Value: {1}"</span><span class="p">,</span> <span class="n">boxedInt</span><span class="p">.</span><span class="nf">GetType</span><span class="p">().</span><span class="n">FullName</span><span class="p">,</span> <span class="n">boxedInt</span><span class="p">);</span>
<span class="n">pinnedFloat</span><span class="p">.</span><span class="nf">Free</span><span class="p">();</span>
<span class="n">pinnedInt</span><span class="p">.</span><span class="nf">Free</span><span class="p">();</span>
</code></pre></div></div>
<p>Which prints out:</p>
<blockquote>
<p>Type: System.Int32, Value: 5</p>
<p>Type: System.Single, Value: 7.006492E-45</p>
</blockquote>
<p>Yep, we’ve managed to convince an <code class="language-plaintext highlighter-rouge">int</code> (Int32) type that it’s actually a <code class="language-plaintext highlighter-rouge">float</code> (Single) and behave like one instead!!</p>
<p>This works by overwriting the <em>Method Table</em> pointer for the <code class="language-plaintext highlighter-rouge">int</code>, with the same value as the <code class="language-plaintext highlighter-rouge">float</code> one. So when it looks up its type or prints out its value, it uses the <code class="language-plaintext highlighter-rouge">float</code> methods instead! Note that the code reads and writes a 4-byte <code class="language-plaintext highlighter-rouge">int</code>, so as written it relies on a 32-bit process, where the Method Table pointer is 4 bytes wide. Thanks to <a href="https://github.com/Porges">@Porges</a> for the <a href="https://gist.github.com/Porges/4b5fb3f0d66093105422e9892177754f">example that motivated this</a>, his code does the same thing using <code class="language-plaintext highlighter-rouge">fixed</code> instead.</p>
<hr />
<h3 id="using-it-safely">Using it safely</h3>
<p>Despite the library requiring you to annotate your code with <code class="language-plaintext highlighter-rouge">unsafe</code>, there are still some <em>safe</em> or maybe more accurately <em>safer</em> ways to use it!</p>
<p>Fortunately one of the main .NET runtime developers provided a nice list of <a href="https://github.com/dotnet/coreclr/issues/5870#issuecomment-227007187">what you can and can’t do</a>:</p>
<p><a href="/images/2016/09/Safely using System.Runtime.CompilerServices.Unsafe.png"><img src="/images/2016/09/Safely using System.Runtime.CompilerServices.Unsafe.png" alt="Safely using System.Runtime.CompilerServices.Unsafe" /></a></p>
<p>But as with all <code class="language-plaintext highlighter-rouge">unsafe</code> code, you’re asking the runtime to let you do things that you are normally prevented from doing, things that it normally saves you from, so you have to be careful!</p>
<hr />
<p>Discuss this post in <a href="https://www.reddit.com/r/csharp/comments/52qs09/subverting_net_type_safety_with/">/r/csharp</a> or <a href="https://www.reddit.com/r/programming/comments/52viyd/subverting_net_type_safety_with/">/r/programming</a></p>
<p>The post <a href="http://www.mattwarren.org/2016/09/14/Subverting-.NET-Type-Safety-with-System.Runtime.CompilerServices.Unsafe/">Subverting .NET Type Safety with 'System.Runtime.CompilerServices.Unsafe'</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Analysing .NET Memory Dumps with CLR MD2016-09-06T00:00:00+00:00http://www.mattwarren.org/2016/09/06/Analysing-.NET-Memory-Dumps-with-CLR-MD
<p>If you’ve ever spent time debugging .NET memory dumps in WinDBG you will be familiar with the commands shown below, which aren’t always the most straight-forward to work with!</p>
<p><a href="http://www.codeproject.com/Articles/23589/Get-Started-Debugging-Memory-Related-Issues-in-Net"><img src="http://www.codeproject.com/KB/debug/WinDBGAndSOS/SOSHelp.PNG" alt="CodeProject - Debugging Memory Related Issues in .Net " /></a></p>
<p>However back in May 2013 Microsoft <a href="https://blogs.msdn.microsoft.com/dotnet/2013/05/01/net-crash-dump-and-live-process-inspection/">released the CLR MD library</a>, describing it as:</p>
<blockquote>
<p>… a set of advanced APIs for programmatically inspecting a crash dump of a .NET program much in the same way as the SOS Debugging Extensions (SOS). It allows you to write automated crash analysis for your applications and automate many common debugger tasks.</p>
</blockquote>
<p>This post explores some of the things you can achieve by instead using CLR MD, a C# library which is now available as a <a href="https://github.com/Microsoft/clrmd">NuGet Package</a>. If you’re interested, the <a href="https://github.com/mattwarren/HeapStringAnalyser">full source code</a> for all the examples is available.</p>
<hr />
<h3 id="getting-started-with-clr-md">Getting started with CLR MD</h3>
<p>This post isn’t meant to serve as a <em>Getting Started</em> guide; there’s already a great set of Tutorials <a href="https://github.com/Microsoft/clrmd#tutorials">linked from the project README</a> that serve that purpose:</p>
<ul>
<li><a href="https://github.com/microsoft/clrmd/blob/master/doc/GettingStarted.md">Getting Started</a> - A brief introduction to the API and how to create a CLRRuntime instance.</li>
<li><a href="https://github.com/microsoft/clrmd/blob/master/doc/ClrRuntime.md">The CLRRuntime Object</a> - Basic operations like enumerating AppDomains, Threads, the Finalizer Queue, etc.</li>
<li><a href="https://github.com/microsoft/clrmd/blob/master/doc/WalkingTheHeap.md">Walking the Heap</a> - Walking objects on the GC heap, working with types in CLR MD.</li>
<li><a href="https://github.com/microsoft/clrmd/blob/master/doc/TypesAndFields.md">Types and Fields in CLRMD</a> - More information about dealing with types and fields in CLRMD.</li>
<li><a href="https://github.com/microsoft/clrmd/blob/master/doc/MachineCode.md">Machine Code in CLRMD</a> - Getting access to the native code produced by the JIT or NGEN</li>
</ul>
<p>Instead, we will be looking at what else CLR MD allows you to achieve.</p>
<hr />
<h3 id="detailed-gc-heap-information">Detailed GC Heap Information</h3>
<p>I’ve previously written about the <a href="/#Garbage-Collectors">Garbage Collectors</a>, so the first thing that we’ll do is see what GC related information we can obtain. The .NET GC creates 1 or more <strong>Heaps</strong>, depending on the number of CPU cores available and the mode it is running in (Server/Workstation). These heaps are in turn made up of several <strong>Segments</strong>, for the different Generations (Gen0/Ephemeral, Gen1, Gen2 and Large). Finally it’s worth pointing out that the GC initially <strong>Reserves</strong> the memory it wants, but only <strong>Commits</strong> it when it actually needs to. So using the <a href="https://github.com/mattwarren/HeapStringAnalyser/blob/2161764b11d19a54ef1d0c2d78b796ee4c8bfd62/HeapStringAnalyser/HeapStringAnalyser/Program.cs#L318-L367">code shown here</a>, we can iterate through the different GC Heaps, printing out the information about their individual Segments as we go:</p>
<p><a href="/images/2016/09/HeapStringAnalyser - GC Info.png"><img src="/images/2016/09/HeapStringAnalyser - GC Info.png" alt="HeapStringAnalyser - GC Info" /></a></p>
<h3 id="analysing-string-usage">Analysing String usage</h3>
<p>But knowing what’s inside those heaps is more useful. As <a href="https://github.com/davidfowl">David Fowler</a> nicely summed up in a tweet, strings often contribute significantly to memory usage:</p>
<p><a href="https://twitter.com/davidfowl/status/767585518854938625"><img src="/images/2016/09/David Fowler tweet about Strings.png" alt="David Fowler tweet about Strings" /></a></p>
<p>Now we could analyse the memory dump to produce a list of the most frequently occurring strings, as <a href="http://nickcraver.com/">Nick Craver</a> did with a <a href="https://twitter.com/Nick_Craver/status/752822131889729536">memory dump from the App Pool of a Stack Overflow server</a> (click for larger image):</p>
<p><a href="/images/2016/09/String frequency analysis of a Stack Overflow memory dump.jpg"><img src="/images/2016/09/String frequency analysis of a Stack Overflow memory dump.jpg" alt="String frequency analysis of a Stack Overflow memory dump" /></a></p>
<p>However, we’re going to look more closely at the actual contents of the strings and in particular analyse what the underlying <em>encoding</em> is, i.e. <code class="language-plaintext highlighter-rouge">ASCII</code>, <code class="language-plaintext highlighter-rouge">ISO-8859-1 (Latin-1)</code> or <code class="language-plaintext highlighter-rouge">Unicode</code>.</p>
<p>By default the .NET Encoders don’t give an error when they meet a character they can’t convert. Instead they silently substitute a replacement character: ‘?’ when encoding to <code class="language-plaintext highlighter-rouge">ASCII</code> or <code class="language-plaintext highlighter-rouge">ISO-8859-1 (Latin-1)</code>, and ‘�’ (the <em>Unicode Replacement Character</em>) when decoding invalid bytes. So we need to force the encoder to throw an exception instead. This means we can detect the most <em>compact</em> encoding possible, by trying to convert the raw string data to <code class="language-plaintext highlighter-rouge">ASCII</code>, <code class="language-plaintext highlighter-rouge">ISO-8859-1 (Latin-1)</code> and then <code class="language-plaintext highlighter-rouge">Unicode</code> (sequence of UTF-16 code units) in turn. To see this in action, below is the code from the <a href="https://github.com/mattwarren/HeapStringAnalyser/blob/2161764b11d19a54ef1d0c2d78b796ee4c8bfd62/HeapStringAnalyser/HeapStringAnalyser/Program.cs#L165-L178"><code class="language-plaintext highlighter-rouge">IsASCII(..)</code> function</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="k">static</span> <span class="n">Encoding</span> <span class="n">asciiEncoder</span> <span class="p">=</span> <span class="n">Encoding</span><span class="p">.</span><span class="nf">GetEncoding</span><span class="p">(</span>
<span class="n">Encoding</span><span class="p">.</span><span class="n">ASCII</span><span class="p">.</span><span class="n">EncodingName</span><span class="p">,</span>
<span class="n">EncoderFallback</span><span class="p">.</span><span class="n">ExceptionFallback</span><span class="p">,</span>
<span class="n">DecoderFallback</span><span class="p">.</span><span class="n">ExceptionFallback</span><span class="p">);</span>
<span class="k">private</span> <span class="k">static</span> <span class="kt">bool</span> <span class="nf">IsASCII</span><span class="p">(</span><span class="kt">string</span> <span class="n">text</span><span class="p">,</span> <span class="k">out</span> <span class="kt">byte</span><span class="p">[]</span> <span class="n">textAsBytes</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">unicodeBytes</span> <span class="p">=</span> <span class="n">Encoding</span><span class="p">.</span><span class="n">Unicode</span><span class="p">.</span><span class="nf">GetBytes</span><span class="p">(</span><span class="n">text</span><span class="p">);</span>
<span class="k">try</span>
<span class="p">{</span>
<span class="n">textAsBytes</span> <span class="p">=</span> <span class="n">Encoding</span><span class="p">.</span><span class="nf">Convert</span><span class="p">(</span><span class="n">Encoding</span><span class="p">.</span><span class="n">Unicode</span><span class="p">,</span> <span class="n">asciiEncoder</span><span class="p">,</span> <span class="n">unicodeBytes</span><span class="p">);</span>
<span class="k">return</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">EncoderFallbackException</span> <span class="cm">/*efEx*/</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">textAsBytes</span> <span class="p">=</span> <span class="k">null</span><span class="p">;</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
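<p>To make the “try each encoding in turn” idea concrete, here is a minimal sketch of how the three checks could be chained. It is not the code from HeapStringAnalyser: the <code class="language-plaintext highlighter-rouge">EncodingProbe</code> class and its method names are made up for illustration, and it calls <code class="language-plaintext highlighter-rouge">GetBytes(..)</code> on a strict encoder rather than going through <code class="language-plaintext highlighter-rouge">Encoding.Convert(..)</code>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingProbe
{
    // "Strict" encoders throw EncoderFallbackException instead of substituting '?'.
    private static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    private static readonly Encoding StrictLatin1 = Encoding.GetEncoding(
        "iso-8859-1", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

    private static bool CanEncode(Encoding strictEncoding, string text)
    {
        try
        {
            strictEncoding.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no representation in this encoding.
            return false;
        }
    }

    // Tries the narrowest encoding first and falls through to the widest.
    public static string MostCompactEncoding(string text)
    {
        if (CanEncode(StrictAscii, text)) return "ASCII";
        if (CanEncode(StrictLatin1, text)) return "ISO-8859-1 (Latin-1)";
        return "Unicode";
    }
}
</code></pre></div></div>
<p>With this sketch <code class="language-plaintext highlighter-rouge">"hello"</code> is reported as <code class="language-plaintext highlighter-rouge">ASCII</code>, <code class="language-plaintext highlighter-rouge">"café"</code> as <code class="language-plaintext highlighter-rouge">ISO-8859-1 (Latin-1)</code> and <code class="language-plaintext highlighter-rouge">"€5"</code> as <code class="language-plaintext highlighter-rouge">Unicode</code>, because ‘€’ (U+20AC) has no Latin-1 representation.</p>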
<p>Next we run this on a memory dump of Visual Studio with the <a href="https://github.com/mattwarren/HeapStringAnalyser">HeapStringAnalyser source code</a> solution loaded and get the following output:</p>
<p><a href="/images/2016/09/HeapStringAnalyser - String Info.png"><img src="/images/2016/09/HeapStringAnalyser - String Info.png" alt="HeapStringAnalyser - String Info" /></a></p>
<p>The most interesting part is reproduced below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Overall 145,872 "System.String" objects take up 12,391,286 bytes (11.82 MB)
Of this underlying byte arrays (as Unicode) take up 10,349,078 bytes (9.87 MB)
Remaining data (object headers, other fields, etc) are 2,042,208 bytes (1.95 MB), at 14 bytes per object
Actual Encoding that the "System.String" could be stored as (with corresponding data size)
10,339,638 bytes ( 145,505 strings) as ASCII
3,370 bytes ( 65 strings) as ISO-8859-1 (Latin-1)
6,070 bytes ( 302 strings) as Unicode
Total: 10,349,078 bytes
</code></pre></div></div>
<p>So in this case we can see that out of the 145,872 string objects in memory, 145,505 of them could actually be stored as <code class="language-plaintext highlighter-rouge">ASCII</code>, a further 65 as <code class="language-plaintext highlighter-rouge">ISO-8859-1 (Latin-1)</code> and only 302 need the full <code class="language-plaintext highlighter-rouge">Unicode</code> encoding.</p>
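<p>As a back-of-the-envelope illustration of what those numbers mean, the sketch below works out how large the character data would be <em>if</em> the <code class="language-plaintext highlighter-rouge">ASCII</code> and <code class="language-plaintext highlighter-rouge">ISO-8859-1 (Latin-1)</code> strings were stored at 1 byte per character instead of 2. This is purely an assumption for the sake of the arithmetic (the .NET runtime always stores strings as UTF-16), and the <code class="language-plaintext highlighter-rouge">StringSavings</code> class is hypothetical:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical helper, for illustration only.
public static class StringSavings
{
    // All inputs are sizes in bytes of the character data when stored as UTF-16,
    // i.e. the figures printed in the output above.
    public static long CompactSize(long asciiBytes, long latin1Bytes, long unicodeBytes)
    {
        // ASCII and Latin-1 characters fit in 1 byte each, so their data halves;
        // strings that need full Unicode stay the same size.
        return (asciiBytes / 2) + (latin1Bytes / 2) + unicodeBytes;
    }
}
</code></pre></div></div>
<p>Plugging in the figures from the dump, <code class="language-plaintext highlighter-rouge">CompactSize(10339638, 3370, 6070)</code> gives 5,177,574 bytes, which is 5,171,504 bytes (roughly 4.9 MB) less than the 10,349,078 bytes the same data occupies as UTF-16.</p>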
<hr />
<h2 id="additional-resources">Additional resources</h2>
<p>Hopefully this post has demonstrated that CLR MD is a powerful tool. If you want to find out more, please refer to the links below:</p>
<ul>
<li><a href="http://blogs.microsoft.co.il/sasha/2013/05/20/traversing-the-gc-heap-with-clrmd/">Traversing the GC Heap with ClrMd </a></li>
<li><a href="https://github.com/goldshtn/msos">msos</a> - Command-line environment a-la WinDbg for executing SOS commands without having SOS available</li>
<li><a href="https://blogs.msdn.microsoft.com/dotnet/2013/05/01/net-crash-dump-and-live-process-inspection/">.NET Crash Dump and Live Process Inspection</a></li>
<li><a href="https://github.com/JeffCyr/ClrMD.Extensions">ClrMD.Extensions</a></li>
<li><a href="https://blogs.msdn.microsoft.com/kirillosenkov/2014/07/05/get-most-duplicated-strings-from-a-heap-dump-using-clrmd/">Get most duplicated strings from a heap dump using ClrMD</a></li>
<li><a href="https://github.com/jcdickinson/dumpty">Dumpty - A Dump tool for .Net.</a></li>
<li><a href="http://stackoverflow.com/questions/22150259/how-to-properly-work-with-non-primitive-clrinstancefield-values-using-clrmd/22229543#22229543">How to properly work with non-primitive ClrInstanceField values using ClrMD?</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/09/06/Analysing-.NET-Memory-Dumps-with-CLR-MD/">Analysing .NET Memory Dumps with CLR MD</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Analysing Optimisations in the Wire Serialiser2016-08-23T00:00:00+00:00http://www.mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser
<p>Recently <a href="http://www.twitter.com/RogerAlsing">Roger Johansson</a> wrote a post titled <a href="https://rogeralsing.com/2016/08/16/wire-writing-one-of-the-fastest-net-serializers/">Wire – Writing one of the fastest .NET serializers</a>, describing the optimisations that were implemented to make <a href="https://github.com/akkadotnet/Wire">Wire</a> as fast as possible. He also followed up that post with a set of <a href="https://twitter.com/RogerAlsing/status/767320145807147008">benchmarks</a>, showing how Wire compared to other .NET serialisers:</p>
<p><a href="/images/2016/08/Performance Graphs - Wire v. other Serialisers.jpg"><img src="/images/2016/08/Performance Graphs - Wire v. other Serialisers.jpg" alt="Wire compared to other .NET serialisers" /></a></p>
<p>Using <a href="https://perfdotnet.github.io/BenchmarkDotNet/">BenchmarkDotNet</a>, this post will analyse the individual optimisations and show how much faster each change is. For reference, the full list of optimisations in the <a href="https://rogeralsing.com/2016/08/16/wire-writing-one-of-the-fastest-net-serializers/">original blog post</a> is:</p>
<ul>
<li>Looking up value serializers by type</li>
<li>Looking up types when deserializing</li>
<li>Byte buffers, allocations and GC</li>
<li>Clever allocations</li>
<li>Boxing, Unboxing and Virtual calls</li>
<li>Fast creation of empty objects</li>
</ul>
<hr />
<h3 id="looking-up-value-serializers-by-type">Looking up value serializers by type</h3>
<p>This optimisation changed code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="n">ValueSerializer</span> <span class="nf">GetSerializerByType</span><span class="p">(</span><span class="n">Type</span> <span class="n">type</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">ValueSerializer</span> <span class="n">serializer</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">_serializers</span><span class="p">.</span><span class="nf">TryGetValue</span><span class="p">(</span><span class="n">type</span><span class="p">,</span> <span class="k">out</span> <span class="n">serializer</span><span class="p">))</span>
<span class="k">return</span> <span class="n">serializer</span><span class="p">;</span>
<span class="c1">//more code to build custom type serializers.. ignore for now.</span>
<span class="p">}</span>
</code></pre></div></div>
<p>into this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="n">ValueSerializer</span> <span class="nf">GetSerializerByType</span><span class="p">(</span><span class="n">Type</span> <span class="n">type</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nf">ReferenceEquals</span><span class="p">(</span><span class="n">type</span><span class="p">.</span><span class="nf">GetTypeInfo</span><span class="p">().</span><span class="n">Assembly</span><span class="p">,</span> <span class="n">ReflectionEx</span><span class="p">.</span><span class="n">CoreAssembly</span><span class="p">))</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="p">==</span> <span class="n">TypeEx</span><span class="p">.</span><span class="n">StringType</span><span class="p">)</span> <span class="c1">//we simply keep a reference to each primitive type</span>
<span class="k">return</span> <span class="n">StringSerializer</span><span class="p">.</span><span class="n">Instance</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="p">==</span> <span class="n">TypeEx</span><span class="p">.</span><span class="n">Int32Type</span><span class="p">)</span>
<span class="k">return</span> <span class="n">Int32Serializer</span><span class="p">.</span><span class="n">Instance</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="p">==</span> <span class="n">TypeEx</span><span class="p">.</span><span class="n">Int64Type</span><span class="p">)</span>
<span class="k">return</span> <span class="n">Int64Serializer</span><span class="p">.</span><span class="n">Instance</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>So it has replaced a <code class="language-plaintext highlighter-rouge">dictionary</code> lookup with a chain of <code class="language-plaintext highlighter-rouge">if</code> statements. In addition it is caching the <code class="language-plaintext highlighter-rouge">Type</code> instance of known types, rather than calculating them every time. As the results below show, the optimisation pays off in some circumstances but not in others, so it’s not a clear win. It depends on where the type is in the list of <code class="language-plaintext highlighter-rouge">if</code> statements. If it’s near the beginning (e.g. <code class="language-plaintext highlighter-rouge">System.String</code>) it’ll be quicker than if it’s near the end (e.g. <code class="language-plaintext highlighter-rouge">System.Byte[]</code>), which makes sense as all the other comparisons have to be done first.</p>
<p><a href="/images/2016/08/LookingUpValueSerializersByType-Results.png"><img src="/images/2016/08/LookingUpValueSerializersByType-Results.png" alt="LookingUpValueSerializersByType-Results" /></a></p>
<p><a href="https://gist.github.com/mattwarren/af0319dc908449239cd3d135e76de4a8">Full benchmark code and results</a></p>
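<p>If you want to experiment with the two approaches outside of Wire, a stripped-down sketch might look like the following. The class name is made up and plain strings stand in for the real serializer instances, so treat it as an illustration of the shape of the code rather than what Wire actually does:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

// Hypothetical stand-in for Wire's lookup; strings replace the serializer instances.
public static class SerializerLookupSketch
{
    // Cache the Type instances once, rather than evaluating typeof(..) on every call.
    private static readonly Type StringType = typeof(string);
    private static readonly Type Int32Type = typeof(int);
    private static readonly Type ByteArrayType = typeof(byte[]);

    private static readonly Dictionary&lt;Type, string&gt; Serializers = new Dictionary&lt;Type, string&gt;
    {
        { typeof(string), "StringSerializer" },
        { typeof(int), "Int32Serializer" },
        { typeof(byte[]), "ByteArraySerializer" },
    };

    // Before: every lookup pays for hashing the Type and probing the dictionary.
    public static string LookupWithDictionary(Type type)
    {
        string name;
        return Serializers.TryGetValue(type, out name) ? name : null;
    }

    // After: a chain of reference comparisons. Types near the top are found quickly,
    // types near the bottom pay for every comparison above them.
    public static string LookupWithIfChain(Type type)
    {
        if (type == StringType) return "StringSerializer";
        if (type == Int32Type) return "Int32Serializer";
        if (type == ByteArrayType) return "ByteArraySerializer";
        return null;
    }
}
</code></pre></div></div>
<p>Both methods return the same answer for every type; only the cost of getting there differs, which is why the position of a type in the chain matters.</p>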
<h3 id="looking-up-types-when-deserializing">Looking up types when deserializing</h3>
<p>The 2nd optimisation works by removing all unnecessary memory allocations. It did this by:</p>
<ul>
<li>Using a custom <code class="language-plaintext highlighter-rouge">struct</code> (value type) rather than a <code class="language-plaintext highlighter-rouge">class</code></li>
<li>Pre-calculating a hash code once, rather than each time a comparison is needed.</li>
<li>Doing string comparisons with raw <code class="language-plaintext highlighter-rouge">byte []</code>, rather than deserialising to a <code class="language-plaintext highlighter-rouge">string</code></li>
</ul>
<p><a href="/images/2016/08/LookingUpTypesWhenDeserializing-Results.png"><img src="/images/2016/08/LookingUpTypesWhenDeserializing-Results.png" alt="LookingUpTypesWhenDeserializing-Results" /></a></p>
<p><a href="https://gist.github.com/mattwarren/da62343df8fbdc5378df21e49df6a7c3">Full benchmark code and results</a></p>
<p><strong>Note:</strong> these results nicely demonstrate how BenchmarkDotNet can show you <a href="/2016/02/17/adventures-in-benchmarking-memory-allocations/">memory allocations</a> as well as the time taken.</p>
<p>Interestingly, they hadn’t actually removed all memory allocations, as the comparisons between <code class="language-plaintext highlighter-rouge">OptimisedLookup</code> and <code class="language-plaintext highlighter-rouge">OptimisedLookupCustomComparer</code> show. To fix this I <a href="https://github.com/akkadotnet/Wire/pull/76">sent a PR</a> which removes unnecessary boxing, by using a Custom Comparer rather than the default <code class="language-plaintext highlighter-rouge">struct</code> comparer.</p>
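<p>The three techniques, plus the custom comparer, can be sketched as follows. The type names and the FNV-1a style hash are my own choices for illustration and are not taken from Wire. The key point is that a <code class="language-plaintext highlighter-rouge">struct</code> that doesn’t implement <code class="language-plaintext highlighter-rouge">IEquatable&lt;T&gt;</code> gets boxed by the default comparer, whereas an explicit <code class="language-plaintext highlighter-rouge">IEqualityComparer&lt;T&gt;</code> works on the value directly:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

// Hypothetical key type: a struct, so creating one doesn't allocate on the heap.
public struct ByteArrayKey
{
    public readonly byte[] Bytes;
    public readonly int HashCode;

    public ByteArrayKey(byte[] bytes)
    {
        Bytes = bytes;
        unchecked
        {
            // FNV-1a style hash, calculated once here rather than on every comparison.
            int hash = (int)2166136261;
            foreach (byte b in bytes)
                hash = (hash ^ b) * 16777619;
            HashCode = hash;
        }
    }
}

// Custom comparer: compares the raw bytes and never boxes the struct.
public sealed class ByteArrayKeyComparer : IEqualityComparer&lt;ByteArrayKey&gt;
{
    public static readonly ByteArrayKeyComparer Instance = new ByteArrayKeyComparer();

    public bool Equals(ByteArrayKey x, ByteArrayKey y)
    {
        if (x.HashCode != y.HashCode) return false;
        if (x.Bytes.Length != y.Bytes.Length) return false;
        for (int i = 0; i &lt; x.Bytes.Length; i++)
        {
            if (x.Bytes[i] != y.Bytes[i]) return false;
        }
        return true;
    }

    public int GetHashCode(ByteArrayKey key)
    {
        return key.HashCode;
    }
}
</code></pre></div></div>
<p>A lookup table would then be created with <code class="language-plaintext highlighter-rouge">new Dictionary&lt;ByteArrayKey, Type&gt;(ByteArrayKeyComparer.Instance)</code>, so that a type name read from the wire as raw bytes can be resolved without first turning it into a <code class="language-plaintext highlighter-rouge">string</code>.</p>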
<h3 id="byte-buffers-allocations-and-gc">Byte buffers, allocations and GC</h3>
<p>Again, removing unnecessary memory allocations was key in this optimisation, most of which can be seen in the <a href="https://github.com/akkadotnet/Wire/blob/dev/Wire/NoAllocBitConverter.cs">NoAllocBitConverter</a>. Clearly serialisation spends <em>a lot</em> of time converting from the in-memory representation of an object to the serialised version, i.e. a <code class="language-plaintext highlighter-rouge">byte []</code>. So several tricks were employed to ensure that temporary memory allocations were either removed completely or, if that wasn’t possible, they were done by re-using a buffer from a pool rather than allocating a new one each time (see <a href="https://rogeralsing.com/2016/08/16/wire-writing-one-of-the-fastest-net-serializers/">“Buffer recycling”</a>).</p>
<p><a href="/images/2016/08/StringSerialisationDeserialisation-Results.png"><img src="/images/2016/08/StringSerialisationDeserialisation-Results.png" alt="StringSerialisationDeserialisation-Results" /></a></p>
<p><a href="https://gist.github.com/mattwarren/e6856ab4625d4e306cc04b9349edd869">Full benchmark code and results</a></p>
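<p>To give a flavour of the idea, compare <code class="language-plaintext highlighter-rouge">BitConverter.GetBytes(..)</code>, which allocates a new <code class="language-plaintext highlighter-rouge">byte []</code> on every call, with writing into a buffer that the caller already owns. The sketch below is my own minimal version, not the code from <code class="language-plaintext highlighter-rouge">NoAllocBitConverter</code>, and the class and method names are hypothetical:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical helper, for illustration only.
public static class NoAllocWriterSketch
{
    // Writes 'value' as 4 little-endian bytes into an existing buffer,
    // so no temporary array is allocated.
    public static void WriteInt32(int value, byte[] buffer, int offset)
    {
        unchecked
        {
            buffer[offset] = (byte)value;
            buffer[offset + 1] = (byte)(value &gt;&gt; 8);
            buffer[offset + 2] = (byte)(value &gt;&gt; 16);
            buffer[offset + 3] = (byte)(value &gt;&gt; 24);
        }
    }
}
</code></pre></div></div>
<p>The buffer can be allocated once (or rented from a pool) and then re-used for every value that is serialised, which is the essence of the “Buffer recycling” approach.</p>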
<h3 id="clever-allocations">Clever allocations</h3>
<p>This optimisation is perhaps the most interesting, because it’s implemented by creating a custom data structure, tailored to the specific needs of Wire. So, rather than using the default <a href="https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.110).aspx">.NET dictionary</a>, they implemented <a href="https://github.com/akkadotnet/Wire/blob/36b93703b003d70744fc97e3e400cca411dce1c9/Wire/FastDictionary.cs">FastTypeUShortDictionary</a>. In essence this data structure optimises for having only 1 item, but falls back to a regular dictionary when it grows larger. To see this in action, here is the code from the <a href="https://github.com/akkadotnet/Wire/blob/36b93703b003d70744fc97e3e400cca411dce1c9/Wire/FastDictionary.cs#L13-L31">TryGetValue(..) method</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="kt">bool</span> <span class="nf">TryGetValue</span><span class="p">(</span><span class="n">Type</span> <span class="n">key</span><span class="p">,</span> <span class="k">out</span> <span class="kt">ushort</span> <span class="k">value</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">_length</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="m">0</span><span class="p">:</span>
<span class="k">value</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="k">case</span> <span class="m">1</span><span class="p">:</span>
<span class="k">if</span> <span class="p">(</span><span class="n">key</span> <span class="p">==</span> <span class="n">_firstType</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">value</span> <span class="p">=</span> <span class="n">_firstValue</span><span class="p">;</span>
<span class="k">return</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">value</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="k">default</span><span class="p">:</span>
<span class="k">return</span> <span class="n">_all</span><span class="p">.</span><span class="nf">TryGetValue</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="k">out</span> <span class="k">value</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Like we’ve seen before, the performance gains aren’t clear-cut. For instance it depends on whether <code class="language-plaintext highlighter-rouge">FastTypeUShortDictionary</code> contains the item you are looking for (<code class="language-plaintext highlighter-rouge">Hit</code> v <code class="language-plaintext highlighter-rouge">Miss</code>), but generally it is faster:</p>
<p><a href="/images/2016/08/FastTypeUShortDictionary-Alternative-Results.png"><img src="/images/2016/08/FastTypeUShortDictionary-Alternative-Results.png" alt="FastTypeUShortDictionary-Alternative-Results" /></a></p>
<p><a href="https://gist.github.com/mattwarren/ed18d27c66e3e539b068371a0dca98f2">Full benchmark code and results</a></p>
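<p>The <code class="language-plaintext highlighter-rouge">TryGetValue(..)</code> method only shows the read side, so for completeness here is a sketch of how the write side of such a data structure could work. The <code class="language-plaintext highlighter-rouge">Add(..)</code> method and the class name are my own guess at one possible implementation, not the code from Wire; the field names simply mirror the ones used above:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

// Hypothetical re-creation of the idea, for illustration only.
public sealed class SingleItemTypeDictionary
{
    private int _length;
    private Type _firstType;
    private ushort _firstValue;
    private Dictionary&lt;Type, ushort&gt; _all;   // only created when a 2nd item arrives

    public void Add(Type key, ushort value)
    {
        switch (_length)
        {
            case 0:
                // First item: store it in fields, no dictionary needed.
                _firstType = key;
                _firstValue = value;
                break;
            case 1:
                // Second item: fall back to a regular dictionary holding both.
                _all = new Dictionary&lt;Type, ushort&gt; { { _firstType, _firstValue }, { key, value } };
                break;
            default:
                _all.Add(key, value);
                break;
        }
        _length++;
    }

    public bool TryGetValue(Type key, out ushort value)
    {
        switch (_length)
        {
            case 0:
                value = 0;
                return false;
            case 1:
                if (key == _firstType)
                {
                    value = _firstValue;
                    return true;
                }
                value = 0;
                return false;
            default:
                return _all.TryGetValue(key, out value);
        }
    }
}
</code></pre></div></div>
<p>The trade-off is an extra <code class="language-plaintext highlighter-rouge">switch</code> on every call in exchange for never allocating or hashing in the single-item case.</p>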
<h3 id="boxing-unboxing-and-virtual-calls">Boxing, Unboxing and Virtual calls</h3>
<p>This optimisation is based on the widely used trick that I imagine almost all .NET serialisers employ. For a serialiser to be generic, it has to be able to handle any type of object that is passed to it. Therefore the first thing it does is use <a href="https://msdn.microsoft.com/en-us/library/f7ykdhsy(v=vs.110).aspx">reflection</a> to find the public fields/properties of that object, so that it knows the data it has to serialise. Doing reflection like this time-and-time again gets expensive, so the way to get round it is to do reflection once and then use <a href="https://blogs.msdn.microsoft.com/csharpfaq/2009/09/14/generating-dynamic-methods-with-expression-trees-in-visual-studio-2010/">dynamic code generation</a> to compile a <code class="language-plaintext highlighter-rouge">delegate</code> that you can then call again and again.</p>
<p>If you are interested in how to implement this, see the <a href="https://github.com/akkadotnet/Wire/blob/dev/Wire/Compilation/Compiler.cs">Wire compiler source</a> or <a href="http://stackoverflow.com/questions/17949208/whats-the-easiest-way-to-generate-code-dynamically-in-net-4-5/17949447#17949447">this Stack Overflow question</a>. As shown in the results below, compiling code dynamically is much faster than reflection and only a little bit slower than if you read/write the property directly in C# code:</p>
<p><a href="/images/2016/08/DynamicCodeGeneration-Results.png"><img src="/images/2016/08/DynamicCodeGeneration-Results.png" alt="DynamicCodeGeneration-Results" /></a></p>
<p><a href="https://gist.github.com/mattwarren/9fb3084306f065e95b4712d51fe36217">Full benchmark code and results</a></p>
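<p>As a rough illustration of the “reflect once, then compile a delegate” approach, the sketch below uses Expression trees to build a property getter. It is a simplified example with made-up names (<code class="language-plaintext highlighter-rouge">GetterCompilerSketch</code>, <code class="language-plaintext highlighter-rouge">Person</code>), not the Wire compiler. Note that it returns <code class="language-plaintext highlighter-rouge">object</code> and so still boxes value types, it only demonstrates replacing repeated reflection with a compiled delegate:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type to serialise.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical helper, for illustration only.
public static class GetterCompilerSketch
{
    // Builds the equivalent of: (object target) =&gt; (object)((DeclaringType)target).Property
    public static Func&lt;object, object&gt; CompileGetter(PropertyInfo property)
    {
        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        Expression typedTarget = Expression.Convert(target, property.DeclaringType);
        Expression read = Expression.Property(typedTarget, property);
        Expression asObject = Expression.Convert(read, typeof(object));
        return Expression.Lambda&lt;Func&lt;object, object&gt;&gt;(asObject, target).Compile();
    }
}
</code></pre></div></div>
<p>The expensive parts, <code class="language-plaintext highlighter-rouge">typeof(Person).GetProperty("Name")</code> and <code class="language-plaintext highlighter-rouge">Compile()</code>, happen once; after that the returned delegate can be cached and invoked for every object that needs serialising.</p>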
<h3 id="fast-creation-of-empty-objects">Fast creation of empty objects</h3>
<p>The final optimisation trick used is also based on dynamic code creation, but this time it is purely dealing with creating empty objects. Again this is something that a serialiser does many times, so any optimisations or savings are worth it.</p>
<p>Basically the benchmark is comparing code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FormatterServices</span><span class="p">.</span><span class="nf">GetUninitializedObject</span><span class="p">(</span><span class="n">type</span><span class="p">);</span>
</code></pre></div></div>
<p>with dynamically generated code, based on <a href="https://msdn.microsoft.com/en-us/library/mt654263.aspx">Expression trees</a>:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">newExpression</span> <span class="p">=</span> <span class="n">ExpressionEx</span><span class="p">.</span><span class="nf">GetNewExpression</span><span class="p">(</span><span class="n">typeToUse</span><span class="p">);</span>
<span class="n">Func</span><span class="p"><</span><span class="n">TestClass</span><span class="p">></span> <span class="n">optimisation</span> <span class="p">=</span> <span class="n">Expression</span><span class="p">.</span><span class="n">Lambda</span><span class="p"><</span><span class="n">Func</span><span class="p"><</span><span class="n">TestClass</span><span class="p">>>(</span><span class="n">newExpression</span><span class="p">).</span><span class="nf">Compile</span><span class="p">();</span>
</code></pre></div></div>
<p>However this trick only works if the <code class="language-plaintext highlighter-rouge">constructor</code> of the type being created is empty, otherwise it has to fall back to the slow version. But as shown in the results below, we can see that the optimisation is a clear win and worth implementing:</p>
<p><a href="/images/2016/08/FastCreationOfEmptyObjects-Results.png"><img src="/images/2016/08/FastCreationOfEmptyObjects-Results.png" alt="FastCreationOfEmptyObjects-Results" /></a></p>
<p><a href="https://gist.github.com/mattwarren/b48b3e5a720b174e64f16353d8ce9960">Full benchmark code and results</a></p>
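<p>If you don’t have Wire’s <code class="language-plaintext highlighter-rouge">ExpressionEx</code> helper to hand, a self-contained approximation of the same technique looks like this. The class and method names are hypothetical. Unlike <code class="language-plaintext highlighter-rouge">GetUninitializedObject(..)</code>, the compiled delegate <em>does</em> run the parameterless constructor, which is why the two are only interchangeable when that constructor is empty:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq.Expressions;

// Hypothetical helper, for illustration only.
public static class ObjectFactorySketch
{
    // Builds the equivalent of: () =&gt; (object)new T(), where T is only known at runtime.
    // Expression.New(type) throws if 'type' has no parameterless constructor.
    public static Func&lt;object&gt; CompileFactory(Type type)
    {
        NewExpression create = Expression.New(type);
        Expression asObject = Expression.Convert(create, typeof(object));
        return Expression.Lambda&lt;Func&lt;object&gt;&gt;(asObject).Compile();
    }
}
</code></pre></div></div>
<p>As with the property getter, the cost of <code class="language-plaintext highlighter-rouge">Compile()</code> is paid once per type, so the factory delegate needs to be cached for the optimisation to pay off.</p>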
<hr />
<h2 id="summary">Summary</h2>
<p>So it’s obvious that <a href="https://twitter.com/rogeralsing">Roger Johansson</a> and <a href="https://twitter.com/Scooletz">Szymon Kulec</a> (who also <a href="https://blog.scooletz.com/2016/08/09/wire-improvements/">contributed performance improvements</a>) know their optimisations and as a result they have steadily made the Wire serialiser faster, which makes it an interesting project to learn from.</p>
<p>The post <a href="http://www.mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser/">Analysing Optimisations in the Wire Serialiser</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Preventing .NET Garbage Collections with the TryStartNoGCRegion API2016-08-16T00:00:00+00:00http://www.mattwarren.org/2016/08/16/Preventing-dotNET-Garbage-Collections-with-the-TryStartNoGCRegion-API
<p>Pauses are a known problem in runtimes that have a Garbage Collector (GC), such as Java or .NET. GC Pauses can last several milliseconds, during which your application is <a href="/2016/08/08/GC-Pauses-and-Safe-Points/">blocked or suspended</a>. One way you can alleviate the pauses is to modify your code so that it doesn’t allocate, i.e. so the GC has nothing to do. But this can require lots of work and you really have to understand the runtime, as many allocations are hidden.</p>
<p>Another technique is to temporarily suspend the GC, during a critical region of your code where you don’t want any pauses and then start it up again afterwards. This is exactly what the <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion</code> API (<a href="https://blogs.msdn.microsoft.com/dotnet/2015/07/20/announcing-net-framework-4-6/">added in .NET 4.6</a>) allows you to do.</p>
<p>From the <a href="https://msdn.microsoft.com/en-us/library/dn906201(v=vs.110).aspx">MSDN docs</a>:</p>
<blockquote>
<p>Attempts to disallow garbage collection during the execution of a critical path if a specified amount of memory is available.</p>
</blockquote>
<h2 id="trystartnogcregion-in-action">TryStartNoGCRegion in Action</h2>
<p>To see how the API works, I ran some simple tests using the .NET GC <strong>Workstation</strong> mode, on a 32-bit CPU. The tests simply call <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion</code> and then verify how much memory can be allocated before a Collection happens. The <a href="https://gist.github.com/mattwarren/c9a87c40301f12084d0ab9ba43c01908">code is available</a> if you want to try it out for yourself.</p>
<h3 id="test-1-regular-allocation-trystartnogcregion-not-called">Test 1: Regular allocation, <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion</code> not called</h3>
<p>You can see that a garbage collection happens after the 2nd allocation (indicated by “**”):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prevent GC: False, Over Allocate: False
Allocated: 3.00 MB, Mode: Interactive, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 6.00 MB, Mode: Interactive, Gen0: 1, Gen1: 1, Gen2: 1, **
Allocated: 9.00 MB, Mode: Interactive, Gen0: 1, Gen1: 1, Gen2: 1,
Allocated: 12.00 MB, Mode: Interactive, Gen0: 1, Gen1: 1, Gen2: 1,
Allocated: 15.00 MB, Mode: Interactive, Gen0: 1, Gen1: 1, Gen2: 1,
</code></pre></div></div>
<h3 id="test-2-trystartnogcregion-with-size-set-to-15mb">Test 2: <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion(..)</code> with size set to 15MB</h3>
<p>Here we see that despite allocating the same amount as in the first test, no garbage collections are triggered during the run.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prevent GC: True, Over Allocate: False
TryStartNoGCRegion: Size=15 MB (15,360 K or 15,728,640 bytes) SUCCEEDED
Allocated: 3.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 6.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 9.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 12.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 15.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
</code></pre></div></div>
<h3 id="test-3-trystartnogcregion-size-of-15mb-but-allocating-more-than-15mb">Test 3: <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion(..)</code> size of 15MB, but allocating more than 15MB</h3>
<p>Finally we see that once we’ve allocated more than the <code class="language-plaintext highlighter-rouge">size</code> we asked for, the mode switches from <code class="language-plaintext highlighter-rouge">NoGCRegion</code> to <code class="language-plaintext highlighter-rouge">Interactive</code> and garbage collections can now happen.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prevent GC: True, Over Allocate: True
TryStartNoGCRegion: Size=15 MB (15,360 K or 15,728,640 bytes) SUCCEEDED
Allocated: 3.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 6.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 9.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 12.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 15.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 18.00 MB, Mode: NoGCRegion, Gen0: 0, Gen1: 0, Gen2: 0,
Allocated: 21.00 MB, Mode: Interactive, Gen0: 1, Gen1: 1, Gen2: 1, **
Allocated: 24.00 MB, Mode: Interactive, Gen0: 1, Gen1: 1, Gen2: 1,
Allocated: 27.00 MB, Mode: Interactive, Gen0: 2, Gen1: 2, Gen2: 2, **
Allocated: 30.00 MB, Mode: Interactive, Gen0: 2, Gen1: 2, Gen2: 2,
</code></pre></div></div>
<p>So this shows that at least in the simple test we’ve done, the API works as advertised. As long as you don’t subsequently allocate more memory than you asked for, no Garbage Collections will take place.</p>
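<p>For reference, here is a minimal sketch of how the API might be wrapped. It is not the test harness used above: <code class="language-plaintext highlighter-rouge">NoGcRegionDemo</code> and <code class="language-plaintext highlighter-rouge">RunWithoutGc</code> are hypothetical names, and only <code class="language-plaintext highlighter-rouge">GC.TryStartNoGCRegion</code>, <code class="language-plaintext highlighter-rouge">GC.EndNoGCRegion</code> and <code class="language-plaintext highlighter-rouge">GCSettings.LatencyMode</code> are real framework members.</p>

```csharp
using System;
using System.Runtime;

public static class NoGcRegionDemo
{
    // Runs 'work' inside a No GC Region sized for 'totalSize' bytes.
    // Returns true only if the region was still active when 'work' finished.
    public static bool RunWithoutGc(long totalSize, Action work)
    {
        if (!GC.TryStartNoGCRegion(totalSize))
            return false; // the runtime could not set up the region

        try
        {
            work();
            return GCSettings.LatencyMode == GCLatencyMode.NoGCRegion;
        }
        finally
        {
            // EndNoGCRegion throws InvalidOperationException if the region has
            // already ended, so only call it while the mode is still NoGCRegion.
            if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                GC.EndNoGCRegion();
        }
    }

    public static void Main()
    {
        const long budget = 15 * 1024 * 1024; // 15 MB, as in Test 2

        bool stayedInRegion = RunWithoutGc(budget, () =>
        {
            // Allocate about 3 MB in small arrays, well inside the budget.
            var blocks = new byte[3 * 1024][];
            for (int i = 0; i < blocks.Length; i++)
                blocks[i] = new byte[1024];
            GC.KeepAlive(blocks);
        });

        Console.WriteLine("Stayed in No GC Region: {0}", stayedInRegion);
    }
}
```

<p>Checking the latency mode before calling <code class="language-plaintext highlighter-rouge">EndNoGCRegion</code> matters because, as Test 3 shows, the region can end on its own once the budget is exceeded.</p>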
<h3 id="object-size">Object Size</h3>
<p>However there are a few caveats when using <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion</code>. The first is that you are required to know, up-front, the total size in bytes of the objects you will be allocating. As we’ve seen <a href="#test-3-trystartnogcregion-size-of-15mb-but-allocating-more-than-15mb">previously</a>, if you allocate more than <code class="language-plaintext highlighter-rouge">totalSize</code> bytes, the <em>No GC Region</em> will no longer be active and it will then be possible for garbage collections to happen.</p>
<p>It’s not straightforward to get the size of an object in .NET: it’s a managed runtime and it tries its best to hide that sort of detail from you. To further complicate matters, the size varies depending on the CPU architecture and even the version of the runtime.</p>
<p>But you do have a few options:</p>
<ol>
<li>Guess?!</li>
<li><a href="http://stackoverflow.com/questions/631825/net-object-size">Search</a> on <a href="http://stackoverflow.com/questions/1128315/find-size-of-object-instance-in-bytes-in-c-sharp">Stack</a> <a href="http://stackoverflow.com/questions/207592/getting-the-size-of-a-field-in-bytes-with-c-sharp">Overflow</a></li>
<li>Start-up <a href="https://en.wikipedia.org/wiki/WinDbg">WinDbg</a> and use the <code class="language-plaintext highlighter-rouge">!objsize</code> command on a memory dump of your process</li>
<li>Get an estimate using the technique that <a href="https://codeblog.jonskeet.uk/2011/04/05/of-memory-and-strings/">Jon Skeet proposes</a></li>
<li>Use <a href="https://www.nuget.org/packages/DotNetEx/">DotNetEx</a>, which relies on inspecting the <a href="https://github.com/mumusan/dotnetex/blob/master/System.Runtime.CLR/GCEx.cs#L67-L125">internal fields of the CLR object</a></li>
</ol>
<p>Personally I would go with a variation of 3), use WinDbg, but automate it using the excellent <a href="https://github.com/Microsoft/clrmd/blob/master/Documentation/WalkingTheHeap.md#a-non-linear-heap-walk">CLRMD</a> C# library.</p>
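<p>As an illustration of the estimation approach in option 4, the sketch below measures the managed heap before and after allocating a batch of objects and divides the difference by the count. This is my own rough take on that idea rather than code from the linked post: <code class="language-plaintext highlighter-rouge">ObjectSizeEstimator</code> and <code class="language-plaintext highlighter-rouge">EstimateSize</code> are made-up names, and the result is only an average that other allocations in the process can skew.</p>

```csharp
using System;

public static class ObjectSizeEstimator
{
    // Estimates the average heap cost of one object produced by 'factory'
    // by measuring the managed heap before and after allocating 'count' of them.
    public static long EstimateSize(Func<object> factory, int count = 100000)
    {
        var keep = new object[count];          // allocated before the first measurement
        long before = GC.GetTotalMemory(true); // true = force a collection first

        for (int i = 0; i < count; i++)
            keep[i] = factory();

        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep);                    // the objects must survive until measured

        return (after - before) / count;
    }

    public static void Main()
    {
        Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size);
        Console.WriteLine("object:       ~{0} bytes", EstimateSize(() => new object()));
        Console.WriteLine("byte[100]:    ~{0} bytes", EstimateSize(() => new byte[100]));
    }
}
```

<p>Running it as both a 32-bit and a 64-bit process is a quick way to see the architecture-dependent differences mentioned above.</p>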
<h3 id="segment-size">Segment Size</h3>
<p><strong>Update:</strong> It turns out that I completely missed the section on segment sizes on the MSDN page, thanks to Maoni for <a href="https://github.com/dotnet/coreclr/issues/6809#issuecomment-241238416">pointing this out to me</a>. In the <a href="https://msdn.microsoft.com/en-us/library/ee787088(v=vs.110).aspx#generations">section on “Generations”</a> there is the following chart (which fortunately correlates with my findings below):</p>
<p><img src="/images/2016/08/Default Segment Sizes (from MSDN page).png" alt="Default Segment Sizes" /></p>
<p><del>However even when you know how many bytes will be allocated within the <em>No GC Region</em>, you still need to ensure that it’s less than the maximum amount allowed, because if you specify a value too large an <code class="language-plaintext highlighter-rouge">ArgumentOutOfRangeException</code> is thrown. From the <a href="https://msdn.microsoft.com/en-us/library/dn906201(v=vs.110).aspx">MSDN docs</a> (emphasis mine):</del></p>
<blockquote>
<p><del> The amount of memory in bytes to allocate without triggering a garbage collection. <strong>It must be less than or equal to the size of an ephemeral segment</strong>. For information on the size of an ephemeral segment, see the “Ephemeral generations and segments” section in the <a href="https://msdn.microsoft.com/en-us/library/ee787088(v=vs.110).aspx">Fundamentals of Garbage Collection article</a>.</del></p>
</blockquote>
<p><del>However if you visit the linked article on <em>GC Fundamentals</em>, it has no exact figure for the size of an <em>ephemeral segment</em>, it does however have <a href="https://msdn.microsoft.com/en-us/library/ee787088(v=vs.110).aspx#Anchor_2">this stark warning</a>:</del></p>
<blockquote>
<p><del><strong>Important</strong>
The size of segments allocated by the garbage collector is implementation-specific and is subject to change at any time, including in periodic updates. <strong>Your app should never make assumptions about or depend on a particular segment size</strong>, nor should it attempt to configure the amount of memory available for segment allocations.</del></p>
</blockquote>
<p><del><strong>Excellent, that’s very helpful!?</strong></del></p>
<p><del><strong>So let me get this straight, to prevent <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion</code> from throwing an exception, we have to pass in a <code class="language-plaintext highlighter-rouge">totalSize</code> value that isn’t larger than the size of an ephemeral segment, but we’re not allowed to know the actual value of an ephemeral segment, in-case we assume too much!!</strong></del></p>
<p><del>So where does that leave us?</del></p>
<p>Well fortunately it’s possible to figure out the size of an ephemeral or Small Object Heap (SOH) segment using either <a href="http://blogs.microsoft.co.il/sasha/2011/07/18/mapping-the-memory-usage-of-net-applications-part-2-vmmap-and-memorydisplay/">VMMap</a>, or the previously mentioned <a href="https://github.com/Microsoft/clrmd/blob/master/Documentation/WalkingTheHeap.md">CLRMD library</a> (<a href="https://gist.github.com/mattwarren/3dce1aea76c50da850af53a2d453e3c0">code sample available</a>).</p>
<p>Here are the results I got with the .NET Framework 4.6.1, running on a <a href="http://ark.intel.com/products/75128/Intel-Core-i7-4800MQ-Processor-6M-Cache-up-to-3_70-GHz">4 Core (HT) - Intel® Core™ i7-4800MQ</a>, i.e. <a href="https://msdn.microsoft.com/en-us/library/system.environment.processorcount(v=vs.110).aspx">Environment.ProcessorCount = 8</a>. If you click on the links for each row heading, you can see the full breakdown as reported by <a href="https://technet.microsoft.com/en-us/sysinternals/vmmap.aspx">VMMap</a>.</p>
<table>
<thead>
<tr>
<th>GC Mode</th>
<th>CPU Arch</th>
<th style="text-align: right">SOH Segment</th>
<th style="text-align: right">LOH Segment</th>
<th style="text-align: right">Initial GC Size</th>
<th style="text-align: right">Largest <em>No GC Region</em> <code class="language-plaintext highlighter-rouge">totalSize</code> value</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="/images/2016/08/GC Heaps - Workstation - 32-bit.png">Workstation</a></td>
<td>32-bit</td>
<td style="text-align: right">16 MB</td>
<td style="text-align: right">16 MB</td>
<td style="text-align: right">32 MB</td>
<td style="text-align: right">16 MB</td>
</tr>
<tr>
<td><a href="/images/2016/08/GC Heaps - Workstation - 64-bit.png">Workstation</a></td>
<td>64-bit</td>
<td style="text-align: right">256 MB</td>
<td style="text-align: right">128 MB</td>
<td style="text-align: right">384 MB</td>
<td style="text-align: right">244 MB</td>
</tr>
<tr>
<td><a href="/images/2016/08/GC Heaps - Server - 32-bit.png">Server</a></td>
<td>32-bit</td>
<td style="text-align: right">32 MB</td>
<td style="text-align: right">16 MB</td>
<td style="text-align: right">384 MB</td>
<td style="text-align: right">256 MB</td>
</tr>
<tr>
<td><a href="/images/2016/08/GC Heaps - Server - 64-bit.png">Server</a></td>
<td>64-bit</td>
<td style="text-align: right">2,048 MB</td>
<td style="text-align: right">256 MB</td>
<td style="text-align: right">18,432 MB</td>
<td style="text-align: right">16,384 MB</td>
</tr>
</tbody>
</table>
<p>The final column is the largest <code class="language-plaintext highlighter-rouge">totalSize</code> value that can be passed into <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion(long totalSize)</code>; I found these values by trial and error.</p>
<p><strong>Note:</strong> The main difference between <strong>Server</strong> and <strong>Workstation</strong> is that in Workstation mode there is <a href="/images/2016/08/GC Heaps - Workstation - 32-bit.png">only one heap</a>, whereas in Server mode there is <a href="/images/2016/08/GC Heaps - Server - 32-bit.png">one heap per logical CPU</a>.</p>
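<p>The trial-and-error approach can be sketched roughly as follows. <code class="language-plaintext highlighter-rouge">NoGcRegionLimitProbe</code> and <code class="language-plaintext highlighter-rouge">IsAccepted</code> are hypothetical names, and the candidate sizes are just a few values taken from the table above. The limits you see will depend on your own runtime version, GC mode and bitness, so treat the output as something to observe rather than something to rely on.</p>

```csharp
using System;
using System.Runtime;

public static class NoGcRegionLimitProbe
{
    // Returns true if the runtime accepts 'totalSize', false if it rejects the
    // value as too large or cannot set up the region. The region is ended again
    // straight away so that the probe has no lasting effect.
    public static bool IsAccepted(long totalSize)
    {
        try
        {
            if (!GC.TryStartNoGCRegion(totalSize))
                return false;
            if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                GC.EndNoGCRegion();
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false; // larger than this runtime configuration allows
        }
    }

    public static void Main()
    {
        Console.WriteLine("Server GC: {0}, 64-bit: {1}, logical CPUs: {2}",
            GCSettings.IsServerGC, Environment.Is64BitProcess, Environment.ProcessorCount);

        const long MB = 1024 * 1024;
        foreach (long sizeMb in new long[] { 1, 16, 244, 256 })
            Console.WriteLine("{0,4} MB accepted: {1}", sizeMb, IsAccepted(sizeMb * MB));
    }
}
```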
<hr />
<h2 id="trystartnogcregion-under-the-hood">TryStartNoGCRegion under-the-hood</h2>
<p>What’s nice is that the <a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a">entire feature is in a single GitHub commit</a>, so it’s easy to see what code changes were made:</p>
<p><a href="/images/2016/08/Github commit for the feature.png"><img src="/images/2016/08/Github commit for the feature.png" alt="Github commit for the feature" /></a></p>
<p>Around half of the files modified (listed below) are the changes needed to set up the plumbing and error handling involved in adding an API to the <a href="https://msdn.microsoft.com/en-us/library/system.gc(v=vs.110).aspx()">System.GC class</a>. They also give an interesting overview of what’s involved in having the external <code class="language-plaintext highlighter-rouge">C#</code> code talk to the internal <code class="language-plaintext highlighter-rouge">C++</code> code in the CLR (click on a link to go directly to the diff):</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-6dec79513185e5c912cb878e0858d41c">src/mscorlib/src/System/GC.cs</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-1817fbf34d63e01e6b9ae4908e459f36">src/mscorlib/src/System/Runtime/GcSettings.cs</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-ca326b8cd58d6642f56aa054c221c22a">src/vm/comutilnative.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-b8ebb0f0bef52890d69facf86688870e">src/vm/comutilnative.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-3667dffbd11675529c85670ef344242e">src/vm/ecalllist.h</a></li>
</ul>
<p>The rest of the changes are where the actual work takes place, with all the significant heavy-lifting happening in <code class="language-plaintext highlighter-rouge">gc.cpp</code>:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-9b1cf8b32169db5abb15e28386d99a10">src/gc/gc.cpp</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-f27aec4c298a7df8ff654eff47e7c0dd">src/gc/gc.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-3001bb7a5fd2ac11b928c223e44a2b95">src/gc/gcimpl.h</a></li>
<li><a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-295f0ed467af7d7d972f659a633bf8b9">src/gc/gcpriv.h</a></li>
</ul>
<h3 id="trystartnogcregion-implementation">TryStartNoGCRegion Implementation</h3>
<p>When you call <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion</code> the following things happen:</p>
<ul>
<li>The maximum required heap sizes are calculated based on the <code class="language-plaintext highlighter-rouge">totalSize</code> parameter passed in. These calculations take place in <a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-9b1cf8b32169db5abb15e28386d99a10R15196">gc_heap::prepare_for_no_gc_region</a></li>
<li>If the current heaps aren’t large enough to accommodate the new value, they are re-sized. To achieve this a full collection is triggered (see <a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-9b1cf8b32169db5abb15e28386d99a10R34831">GCHeap::StartNoGCRegion</a>)</li>
</ul>
<p><strong>Note:</strong> Due to the way the GC uses <a href="#segment-size">segments</a>, it won’t always <em>allocate</em> memory. It will however ensure that it <em>reserves</em> the maximum amount of memory required, so that it can be <em>committed</em> when actually needed.</p>
<p>Then next time the GC wants to perform a collection it checks:</p>
<ol>
<li>Is the current mode set to <em>No GC Region</em>?
<ul>
<li>By checking <code class="language-plaintext highlighter-rouge">gc_heap::settings.pause_mode == pause_no_gc</code>, <a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-9b1cf8b32169db5abb15e28386d99a10R14638">relevant code here</a></li>
</ul>
</li>
<li>Can we stay in the <em>No GC Region</em> mode?
<ul>
<li>This is done by calling <a href="https://github.com/dotnet/coreclr/commit/4f74a99e296d929945413c5a65d0c61bb7f2c32a#diff-9b1cf8b32169db5abb15e28386d99a10R15448">gc_heap::should_proceed_for_no_gc()</a>, which performs a sanity-check to ensure that we haven’t allocated more than the number of bytes we asked for when <code class="language-plaintext highlighter-rouge">TryStartNoGCRegion</code> was called</li>
</ul>
</li>
</ol>
<p>If 1) and 2) are both true then a collection <strong>does not</strong> take place because the GC knows that it has already <em>reserved</em> enough memory to fulfil future allocations, so it doesn’t need to clean up any existing garbage to make space.</p>
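<p>To make the two checks concrete, here is a deliberately simplified model of that decision in C#. It is not the CLR’s code (the real logic is C++ inside <code class="language-plaintext highlighter-rouge">gc.cpp</code>), and every name in it (<code class="language-plaintext highlighter-rouge">NoGcRegionModel</code>, <code class="language-plaintext highlighter-rouge">PauseMode</code>, <code class="language-plaintext highlighter-rouge">OnCollectionRequested</code>) is invented for illustration.</p>

```csharp
using System;

// Hypothetical model of the checks described above; not the CLR implementation.
public enum PauseMode { Interactive, NoGC }

public sealed class NoGcRegionModel
{
    private long _budget;     // bytes requested via totalSize
    private long _allocated;  // bytes allocated since the region started

    public PauseMode Mode { get; private set; }
    public int Collections { get; private set; }

    public void StartNoGcRegion(long totalSize)
    {
        _budget = totalSize;
        _allocated = 0;
        Mode = PauseMode.NoGC;
    }

    public void RecordAllocation(long bytes)
    {
        _allocated += bytes;
    }

    // Models the point where the GC would like to collect.
    // Returns true if the collection goes ahead.
    public bool OnCollectionRequested()
    {
        bool inNoGcRegion = Mode == PauseMode.NoGC; // check 1
        bool withinBudget = _allocated <= _budget;  // check 2
        if (inNoGcRegion && withinBudget)
            return false; // memory was reserved up-front, so skip the collection

        Mode = PauseMode.Interactive; // the region is over once the budget is exceeded
        Collections++;
        return true;
    }
}

public static class NoGcRegionModelDemo
{
    public static void Main()
    {
        const long MB = 1024 * 1024;
        var model = new NoGcRegionModel();
        model.StartNoGcRegion(15 * MB);

        for (int step = 1; step <= 7; step++)
        {
            model.RecordAllocation(3 * MB);
            bool collected = model.OnCollectionRequested();
            Console.WriteLine("Allocated: {0} MB, Mode: {1}, Collected: {2}",
                step * 3, model.Mode, collected);
        }
    }
}
```

<p>This model is stricter than the real runtime: in Test 3 above the mode was still <code class="language-plaintext highlighter-rouge">NoGCRegion</code> after 18 MB had been allocated, whereas the model leaves the region as soon as the running total passes 15 MB.</p>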
<hr />
<h3 id="further-reading">Further Reading:</h3>
<ul>
<li><a href="http://thrivingapp.com/?p=33">You can now tell the .NET GC to stop collecting during critical code paths</a></li>
<li><a href="http://stackoverflow.com/questions/31560471/prevent-gc-collections-in-certain-spots-to-improve-performance/31561180#31561180">Prevent GC Collections In Certain Spots To Improve Performance</a></li>
<li><a href="https://blogs.msdn.microsoft.com/maoni/2005/10/04/so-whats-new-in-the-clr-2-0-gc/">So, what’s new in the CLR 2.0 GC?</a></li>
<li><a href="https://blogs.msdn.microsoft.com/tess/2008/04/17/how-does-the-gc-work-and-what-are-the-sizes-of-the-different-generations/">How does the GC work and what are the sizes of the different generations?</a></li>
<li><a href="https://blogs.msdn.microsoft.com/tess/2006/09/06/net-memory-usage-a-restaurant-analogy/">.NET Memory usage – A restaurant analogy</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/08/16/Preventing-dotNET-Garbage-Collections-with-the-TryStartNoGCRegion-API/">Preventing .NET Garbage Collections with the TryStartNoGCRegion API</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
GC Pauses and Safe Points2016-08-08T00:00:00+00:00http://www.mattwarren.org/2016/08/08/GC-Pauses-and-Safe-Points
<p>GC pauses are a popular topic: if you do a <a href="https://www.google.com/#q=gc+pauses+in+.net">Google search</a>, you’ll see lots of articles explaining how to measure and, more importantly, how to reduce them. The issue is that in most runtimes that have a GC, allocating objects is a quick operation, but at some point the GC will need to clean up all the garbage, and to do this it has to <em>pause</em> the entire runtime (except if you happen to be using <a href="https://www.azul.com/products/zing/pgc/">Azul’s pauseless GC for Java</a>).</p>
<p>The GC needs to pause the entire runtime so that it can move around objects as part of its <em>compaction</em> phase. If these objects were being referenced by code that was simultaneously executing then all sorts of bad things would happen. So the GC can only make these changes when it knows that no other code is running, hence the need to <em>pause</em> the entire runtime.</p>
<h2 id="gc-flow">GC Flow</h2>
<p>In a <a href="/2016/06/20/Visualising-the-dotNET-Garbage-Collector/">previous post</a> I demonstrated how you can use ETW Events to visualise what the .NET Garbage Collector (GC) is doing. That post included the following GC flow for a Foreground/Blocking Collection (info taken from the <a href="https://blogs.msdn.microsoft.com/maoni/2014/12/25/gc-etw-events-3/">excellent blog post</a> by <a href="https://github.com/Maoni0/">Maoni Stephens</a>, the main developer on the .NET GC):</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEE_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code> <– <strong>suspension is done</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCStart_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCEnd_V1</code> <– <strong>actual GC is done</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEBegin_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEEnd_V1</code> <– <strong>resumption is done.</strong></li>
</ol>
<p>This post is going to be looking at <strong>how</strong> the .NET Runtime brings all the threads in an application to a <strong>safe-point</strong> so that the GC can do its work. This corresponds to what happens between step 1) <code class="language-plaintext highlighter-rouge">GCSuspendEE_V1</code> and 2) <code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code> in the flow above.</p>
<p>For some background, this passage from the excellent <a href="https://www.amazon.co.uk/Pro-NET-Performance-Optimize-Applications/dp/1430244585/ref=as_li_ss_tl?ie=UTF8&linkCode=ll1&tag=mattonsoft-21&linkId=f18fd47630f046ab8e28512acc728fbb">Pro .NET Performance: Optimize Your C# Applications</a> explains what’s going on:</p>
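<p>To show how the events in that flow map onto the parts of a pause, here is a small worked example in C#. The timestamps are made up for illustration (real ones would come from an ETW trace) and <code class="language-plaintext highlighter-rouge">GcPauseBreakdown</code> and <code class="language-plaintext highlighter-rouge">GcEvents</code> are hypothetical names, not part of any tracing library.</p>

```csharp
using System;

public static class GcPauseBreakdown
{
    // Timestamps (in milliseconds) of the six events in the flow above.
    public struct GcEvents
    {
        public double SuspendEE, SuspendEEEnd, GCStart, GCEnd, RestartEEBegin, RestartEEEnd;
    }

    // Step 1 to step 2: how long it took to bring every thread to a safe-point.
    public static double TimeToSafePoint(GcEvents e) { return e.SuspendEEEnd - e.SuspendEE; }

    // Step 3 to step 4: the collection itself.
    public static double GcWork(GcEvents e) { return e.GCEnd - e.GCStart; }

    // Step 5 to step 6: resuming the suspended threads.
    public static double Resume(GcEvents e) { return e.RestartEEEnd - e.RestartEEBegin; }

    // Step 1 to step 6: the pause as seen by the application.
    public static double TotalPause(GcEvents e) { return e.RestartEEEnd - e.SuspendEE; }

    public static void Main()
    {
        // Made-up timestamps, purely for illustration.
        var e = new GcEvents
        {
            SuspendEE = 100.0, SuspendEEEnd = 100.5,
            GCStart = 100.5, GCEnd = 103.5,
            RestartEEBegin = 103.5, RestartEEEnd = 103.75
        };

        Console.WriteLine("Time to safe-point: {0} ms", TimeToSafePoint(e));
        Console.WriteLine("GC work:            {0} ms", GcWork(e));
        Console.WriteLine("Resume:             {0} ms", Resume(e));
        Console.WriteLine("Total pause:        {0} ms", TotalPause(e));
    }
}
```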
<p><a href="https://books.google.co.uk/books?id=fhpYTbos8OkC&pg=PA103&lpg=PA103&dq=GC+safepoints+.NET&source=bl&ots=OcEbYCaMor&sig=XNDl1pSuKRcDU_xc1M6Go64ot2Q&hl=en&sa=X&redir_esc=y#v=onepage&q&f=false"><img src="/images/2016/08/Suspending Threads for GC.png" alt="Suspending Threads for GC" /></a></p>
<p>Technically the GC itself doesn’t perform the suspension; it calls <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/gcenv.ee.cpp#L26-L36">into the <em>Execution Engine</em> (EE)</a> and asks it to suspend all the running threads. This suspension needs to be as quick as possible, because the time taken contributes to the overall <em>GC pause</em>. This <em>Time To Safe Point</em> (TTSP), as it’s known, therefore needs to be minimised, and the CLR uses several techniques to do so.</p>
<h2 id="gc-suspension-in-runtime-code">GC suspension in Runtime code</h2>
<p>Inside code that it controls, the runtime inserts method calls to ensure that threads can regularly <em>poll</em> to determine when they need to suspend. For instance take a look at the following code snippet from the <a href="https://github.com/dotnet/coreclr/blob/deb00ad58acf627763b6c0a7833fa789e3bb1cd0/src/classlibnative/bcltype/stringnative.cpp#L351-L400"><code class="language-plaintext highlighter-rouge">IndexOfCharArray()</code></a> method (which is called internally by <a href="https://msdn.microsoft.com/en-us/library/system.string.indexofany(v=vs.110).aspx"><code class="language-plaintext highlighter-rouge">String.IndexOfAny(..)</code></a>). Notice that it contains multiple calls to the macro <code class="language-plaintext highlighter-rouge">FC_GC_POLL_RET()</code>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FCIMPL4</span><span class="p">(</span><span class="n">INT32</span><span class="p">,</span> <span class="n">COMString</span><span class="o">::</span><span class="n">IndexOfCharArray</span><span class="p">,</span> <span class="n">StringObject</span><span class="o">*</span> <span class="n">thisRef</span><span class="p">,</span> <span class="n">CHARArray</span><span class="o">*</span> <span class="n">valueRef</span><span class="p">,</span> <span class="n">INT32</span> <span class="n">startIndex</span><span class="p">,</span> <span class="n">INT32</span> <span class="n">count</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// <OTHER CODE REMOVED></span>
<span class="c1">// use probabilistic map, see (code:InitializeProbabilisticMap)</span>
<span class="kt">int</span> <span class="n">charMap</span><span class="p">[</span><span class="n">PROBABILISTICMAP_SIZE</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="n">InitializeProbabilisticMap</span><span class="p">(</span><span class="n">charMap</span><span class="p">,</span> <span class="n">valueChars</span><span class="p">,</span> <span class="n">valueLength</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">startIndex</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">endIndex</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">WCHAR</span> <span class="n">thisChar</span> <span class="o">=</span> <span class="n">thisChars</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ProbablyContains</span><span class="p">(</span><span class="n">charMap</span><span class="p">,</span> <span class="n">thisChar</span><span class="p">))</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ArrayContains</span><span class="p">(</span><span class="n">thisChars</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">valueChars</span><span class="p">,</span> <span class="n">valueLength</span><span class="p">)</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">FC_GC_POLL_RET</span><span class="p">();</span>
<span class="k">return</span> <span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">FC_GC_POLL_RET</span><span class="p">();</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>There are <a href="https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=FC_GC_POLL+FC_GC_POLL_RET&type=Code">lots of other places</a> in the runtime where these calls are inserted, to ensure that a GC suspension can happen as soon as possible. However having these calls spread throughout the code has an overhead, so the runtime uses a special trick to ensure the cost is only paid when a suspension has actually been requested. From <a href="https://github.com/dotnet/coreclr/blob/deb00ad58acf627763b6c0a7833fa789e3bb1cd0/src/vm/i386/jithelp.asm#L472-L480">jithelp.asm</a> you can see that the method call is re-written to a <code class="language-plaintext highlighter-rouge">nop</code> routine when not needed and only calls the <a href="https://github.com/dotnet/coreclr/blob/deb00ad58acf627763b6c0a7833fa789e3bb1cd0/src/vm/jithelpers.cpp#L6331-L6536">actual <code class="language-plaintext highlighter-rouge">JIT_PollGC()</code> function</a> when absolutely required:</p>
<pre><code class="language-assembly">; Normally (when we're not trying to suspend for GC), the
; CORINFO_HELP_POLL_GC helper points to this nop routine. When we're
; ready to suspend for GC, we whack the Jit Helper table entry to point
; to the real helper. When we're done with GC we whack it back.
PUBLIC @JIT_PollGC_Nop@0
@JIT_PollGC_Nop@0 PROC
ret
@JIT_PollGC_Nop@0 ENDP
</code></pre>
<p>However calls to <code class="language-plaintext highlighter-rouge">FC_GC_POLL</code> need to be carefully inserted in the correct locations: too few and the EE won’t be able to suspend quickly enough, which causes excessive GC pauses, as this comment from one of the .NET JIT devs confirms:</p>
<p><a href="https://github.com/dotnet/coreclr/pull/36#discussion_r24088949"><img src="/images/2016/08/FC_GC_POLL call location.png" alt="FC_GC_POLL call location" /></a></p>
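<p>The “swap the helper” trick described above can be modelled in a few lines of C#. This is only an analogy using a delegate field; the runtime patches a native helper table entry instead. <code class="language-plaintext highlighter-rouge">GcPollModel</code> and all of its members are hypothetical names.</p>

```csharp
using System;
using System.Threading;

// Hypothetical model of the poll-helper swap; not runtime code.
public static class GcPollModel
{
    private static readonly Action Nop = () => { };
    private static int _suspendCount;

    // Stands in for the helper table entry: normally it points at the no-op.
    private static volatile Action _pollGc = Nop;

    private static void RealPoll()
    {
        // In the runtime this is where the thread would wait for the GC to finish.
        Interlocked.Increment(ref _suspendCount);
    }

    public static int SuspendCount
    {
        get { return Volatile.Read(ref _suspendCount); }
    }

    public static void RequestSuspension() { _pollGc = RealPoll; } // point at the real helper
    public static void SuspensionDone() { _pollGc = Nop; }         // point back at the no-op

    // Code under the runtime's control calls this at its poll locations.
    public static void Poll() { _pollGc(); }

    // Mirrors the shape of IndexOfCharArray: poll just before each return.
    public static int IndexOf(char[] chars, char value)
    {
        for (int i = 0; i < chars.Length; i++)
        {
            if (chars[i] == value)
            {
                Poll();
                return i;
            }
        }
        Poll();
        return -1;
    }

    public static void Main()
    {
        char[] text = "hello".ToCharArray();

        IndexOf(text, 'l');
        Console.WriteLine("No suspension requested, polls that did work: {0}", SuspendCount);

        RequestSuspension();
        IndexOf(text, 'l');
        SuspensionDone();
        Console.WriteLine("After a suspension request, polls that did work: {0}", SuspendCount);
    }
}
```

<p>The point of the design is that the common case, where no GC is pending, costs only a call to an empty routine at each poll location.</p>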
<h2 id="gc-suspension-in-user-code">GC suspension in User code</h2>
<p>In code that the runtime doesn’t control, things are a bit different. Here the JIT analyses the code and classifies it as either:</p>
<ul>
<li><strong>Partially interruptible</strong></li>
<li><strong>Fully interruptible</strong></li>
</ul>
<p><strong>Partially interruptible</strong> code can only be suspended at explicit GC poll locations (i.e. <code class="language-plaintext highlighter-rouge">FC_GC_POLL</code> calls) or when it calls into other methods. On the other hand <strong>fully interruptible</strong> code can be interrupted or suspended at any time, as every line within the method is considered a GC safe-point.</p>
<p>I’m not going to talk about how the <em>thread-hijacking</em> mechanism works (used with <em>fully interruptible</em> code), as it’s a complex topic, but as always there’s an in-depth <a href="https://github.com/dotnet/coreclr/blob/775003a4c72f0acc37eab84628fcef541533ba4e/Documentation/botr/threading.md#hijacking">section in the BOTR</a> that gives all the gory details. If you don’t want to read the whole thing, in summary it suspends the underlying native thread, via the <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms686345(v=vs.85).aspx">Win32 SuspendThread API</a>.</p>
<p>You can see <a href="https://github.com/dotnet/coreclr/blob/deb00ad58acf627763b6c0a7833fa789e3bb1cd0/src/jit/flowgraph.cpp#L7382-L7462">some of the heuristics</a> that the JIT uses to decide whether code is fully or partially interruptible as it seeks to find the best trade-off between code quality/size and GC suspension latency. But as a concrete example, if we take the following code that accumulates a counter in a tight loop:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">long</span> <span class="nf">TestMethod</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">long</span> <span class="n">counter</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="m">1000</span> <span class="p">*</span> <span class="m">1000</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">j</span> <span class="p"><</span> <span class="m">2000</span><span class="p">;</span> <span class="n">j</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="p">%</span> <span class="m">10</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="n">counter</span><span class="p">++;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Loop exited, counter = {0:N0}"</span><span class="p">,</span> <span class="n">counter</span><span class="p">);</span>
<span class="k">return</span> <span class="n">counter</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And then execute it with the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md#useful-complus-variables">JIT diagnostics turned on</a> you get the following output, which shows that this code is classified as <em>fully interruptible</em>:</p>
<pre><code class="language-assembly">; Assembly listing for method ConsoleApplication.Program:TestMethod():long
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
</code></pre>
<p>(<a href="https://gist.github.com/mattwarren/71adb255e4b35a92a060029aef4d1728#file-testmethod-fully-interruptible-md">full JIT diagnostic output of <strong>Fully</strong> Interruptible method</a>)</p>
<p>Now, if we run the same test again, but tweak the code by adding a few <code class="language-plaintext highlighter-rouge">Console.WriteLine(..)</code> method calls:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">long</span> <span class="nf">TestMethod</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">long</span> <span class="n">counter</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="m">1000</span> <span class="p">*</span> <span class="m">1000</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">j</span> <span class="p"><</span> <span class="m">2000</span><span class="p">;</span> <span class="n">j</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="p">%</span> <span class="m">10</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="n">counter</span><span class="p">++;</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Inside Inner Loop, counter = {0:N0}"</span><span class="p">,</span> <span class="n">counter</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"After Inner Loop, counter = {0:N0}"</span><span class="p">,</span> <span class="n">counter</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Thread loop exited cleanly, counter = {0:N0}"</span><span class="p">,</span> <span class="n">counter</span><span class="p">);</span>
<span class="k">return</span> <span class="n">counter</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The method is then classified as <em>Partially Interruptible</em>, due to the additional <code class="language-plaintext highlighter-rouge">Console.WriteLine(..)</code> calls:</p>
<pre><code class="language-assembly">; Assembly listing for method ConsoleApplication.Program:TestMethod():long
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
</code></pre>
<p>(<a href="https://gist.github.com/mattwarren/06dd970b5364c80d445da4252558a5d3#file-testmethod-partially-interruptible-md">full JIT diagnostic output of <strong>Partially</strong> Interruptible method</a>)</p>
<p>Interestingly enough, there seems to be functionality that enables <code class="language-plaintext highlighter-rouge">JIT_PollGC()</code> calls to be inserted into <strong>user</strong> code as it is compiled by the .NET JIT; this is controlled by the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md"><code class="language-plaintext highlighter-rouge">GCPollType</code> CLR Configuration flag</a>. However by default it’s disabled, and in my tests turning it on caused the CoreCLR to exit with some interesting errors. So it appears that currently, the default or supported behaviour is to use thread-hijacking on user code, rather than inserting explicit <code class="language-plaintext highlighter-rouge">JIT_PollGC()</code> calls.</p>
<hr />
<h3 id="further-reading">Further Reading</h3>
<ul>
<li><a href="http://blogs.microsoft.co.il/sasha/2013/11/05/modern-garbage-collection-in-theory-and-practice/">Modern Garbage Collection in Theory and Practice</a></li>
<li><a href="http://flyingfrogblog.blogspot.co.uk/2012/03/gc-safe-points-mutator-suspension-and.html">GC-safe points, mutator suspension and barriers</a></li>
<li><a href="http://stackoverflow.com/questions/30416520/how-local-variable-usage-infomation-is-maintained-in-net-clr-source-code">How local variable usage infomation is maintained in .net clr source code</a></li>
<li><a href="https://msdn.microsoft.com/en-us/library/678ysw69(v=vs.110).aspx">Thread.Suspend, Garbage Collection, and Safe Points</a></li>
<li><a href="http://llvm.org/devmtg/2015-04/slides/LLILC_Euro_LLVM_2015.pptx">LLVM as a code generator for the CoreCLR - With a particular emphasis on GC</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/6f26329518b08055c090315eee5db533e42f39ae/src/vm/threadsuspend.cpp#L4784-L4822">Comments on “SuspendRuntime” and “Redirection vs. Hijacking:” in <code class="language-plaintext highlighter-rouge">threadsuspend.cpp</code></a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/6f26329518b08055c090315eee5db533e42f39ae/src/vm/threads.h#L36-L132">Comments on “Suspending The Runtime”, “Cooperative Mode” and “Partially/Fully Interruptible Code” in <code class="language-plaintext highlighter-rouge">threads.h</code></a></li>
<li><a href="http://geekswithblogs.net/akraus1/archive/2014/03/24/155766.aspx">What Every Developer Must Know About Fast Garbage Collection (+ more)</a></li>
<li><a href="http://stackoverflow.com/questions/16655948/does-the-net-garbage-collectors-stop-the-world-effect-halt-or-delay-the-execut">Does the .NET Garbage Collector’s stop-the-world effect halt or delay the execution of unmanaged threads and timer callbacks?</a></li>
<li><a href="http://stackoverflow.com/questions/8404245/gc-behavior-and-clr-thread-hijacking/8405187#8405187">GC Behavior and CLR Thread Hijacking</a></li>
<li><a href="http://stackoverflow.com/questions/4418356/safely-pausing-of-thread-during-gc-in-net/4418520#4418520">Safely pausing of thread during GC in .NET</a></li>
<li><a href="http://osdir.com/ml/windows.devel.dotnet.rotor/2002-08/msg00006.html">CLR and Thread Safe Points</a></li>
<li><a href="https://blogs.msdn.microsoft.com/maoni/2006/06/07/suspending-and-resuming-threads-for-gc/">Suspending and resuming threads for GC</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/08/08/GC-Pauses-and-Safe-Points/">GC Pauses and Safe Points</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
How the dotnet CLI tooling runs your code2016-07-04T00:00:00+00:00http://www.mattwarren.org/2016/07/04/How-the-dotnet-CLI-tooling-runs-your-code
<p>Just over a week ago the <a href="https://blogs.msdn.microsoft.com/dotnet/2016/06/27/announcing-net-core-1-0/">official 1.0 release of .NET Core</a> was announced. The release includes:</p>
<blockquote>
<p>the .NET Core runtime, libraries and tools and the ASP.NET Core libraries.</p>
</blockquote>
<p>However, alongside a completely new, revamped, xplat version of the .NET runtime, the development experience has been changed, with the <a href="https://docs.microsoft.com/en-us/dotnet/articles/core/tools/dotnet"><code class="language-plaintext highlighter-rouge">dotnet</code> based tooling</a> now available (<strong>Note</strong>: the <em>tooling</em> itself is currently still in preview and it’s <a href="https://github.com/dotnet/core/blob/master/roadmap.md#planned-11-features">expected to be RTM</a> later this year).</p>
<p>So you can now write:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dotnet new
dotnet restore
dotnet run
</code></pre></div></div>
<p>and at the end you’ll get the following output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hello World!
</code></pre></div></div>
<p>It’s the <code class="language-plaintext highlighter-rouge">dotnet</code> CLI (Command Line Interface) tooling that is the focus of this post and more specifically <em>how it actually runs your code</em>, although if you want a <strong>tl;dr</strong> version see this tweet from <a href="https://twitter.com/citizenmatt">@citizenmatt</a>:</p>
<p><a href="https://twitter.com/citizenmatt/status/747874853135466496"><img src="/images/2016/07/Tweet explaining dotnet CLI runtime.png" alt="Tweet explaining dotnet CLI runtime" /></a></p>
<hr />
<h2 id="traditional-way-of-running-net-executables">Traditional way of running .NET executables</h2>
<p>As a brief reminder, .NET executables can’t be run directly (they’re just <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">IL</a>, not machine code), so the Windows OS has always needed to do a few tricks to execute them. From <a href="http://amzn.to/29baVly">CLR via C#</a>:</p>
<blockquote>
<p>After Windows has examined the EXE file’s header to determine whether to create a 32-bit process, a 64-bit process, or a WoW64 process, Windows loads the x86, x64, or IA64 version of MSCorEE.dll into the process’s address space.
…
Then, the process’ primary thread calls a method defined inside MSCorEE.dll. This method initializes the CLR, loads the EXE assembly, and then calls its entry point method (Main). At this point, the managed application is up and running.</p>
</blockquote>
<h2 id="new-way-of-running-net-executables">New way of running .NET executables</h2>
<h3 id="dotnet-run"><code class="language-plaintext highlighter-rouge">dotnet run</code></h3>
<p>So how do things work now that we have the new CoreCLR and the CLI tooling? Firstly, to understand what is going on under the hood, we need to set a few environment variables (<code class="language-plaintext highlighter-rouge">COREHOST_TRACE</code> and <code class="language-plaintext highlighter-rouge">DOTNET_CLI_CAPTURE_TIMING</code>) so that we get more verbose output:</p>
<p><img src="/images/2016/07/dotnet run - with cli timings and verbose output.png" alt="dotnet run - with cli timings and verbose output" /></p>
<p>Here, amongst all the pretty ASCII-art, we can see that <code class="language-plaintext highlighter-rouge">dotnet run</code> actually executes the following command:</p>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">dotnet exec --additionalprobingpath C:\Users\matt\.nuget\packages c:\dotnet\bin\Debug\netcoreapp1.0\myapp.dll</code></p>
</blockquote>
<p><strong>Note</strong>: this is what happens when running a Console Application. The CLI tooling <a href="https://docs.microsoft.com/en-us/dotnet/articles/core/app-types">supports other scenarios</a>, such as self-hosted web sites, which work differently.</p>
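<p>If you’d rather not set the two environment variables by hand, they can also be applied to a single <code class="language-plaintext highlighter-rouge">dotnet run</code> invocation from code. This is only a minimal sketch: the value <code class="language-plaintext highlighter-rouge">"1"</code> is my assumption (the variables are named above, their values are not), and <code class="language-plaintext highlighter-rouge">TracedDotnetRun</code> is a made-up helper name.</p>
<pre><code class="language-csharp">using System.Diagnostics;
using System.IO;

static class TracedDotnetRun
{
    // Builds the start info for 'dotnet run' with both tracing variables set.
    // ASSUMPTION: "1" switches each variable on; the post only names the variables.
    public static ProcessStartInfo Build(string projectDirectory)
    {
        var startInfo = new ProcessStartInfo("dotnet", "run")
        {
            WorkingDirectory = projectDirectory,
            UseShellExecute = false // needed so the custom environment is passed on
        };
        startInfo.Environment["COREHOST_TRACE"] = "1";
        startInfo.Environment["DOTNET_CLI_CAPTURE_TIMING"] = "1";
        return startInfo;
    }

    static void Main(string[] args)
    {
        // Use the directory given on the command line, or the current one
        var directory = args.Length &gt; 0 ? args[0] : Directory.GetCurrentDirectory();
        using (var process = Process.Start(Build(directory)))
        {
            process.WaitForExit();
        }
    }
}
</code></pre>
<p>Because nothing is redirected, the verbose output ends up in the same console window, just as if the variables had been set in the shell beforehand.</p>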
<h3 id="dotnet-exec-and-corehost"><code class="language-plaintext highlighter-rouge">dotnet exec</code> and <code class="language-plaintext highlighter-rouge">corehost</code></h3>
<p>Up to this point everything was happening within managed code, however once <code class="language-plaintext highlighter-rouge">dotnet exec</code> is called we <a href="https://github.com/dotnet/core-setup/blob/release/1.0.0/src/corehost/corehost.cpp#L105-L119">jump over to unmanaged code</a> within <a href="https://github.com/dotnet/core-setup/tree/release/1.0.0/src/corehost">the corehost application</a>. In addition, several other .dlls are loaded, the last of which is the CoreCLR runtime itself (click to go to the main source file for each module):</p>
<ul>
<li><a href="https://github.com/dotnet/core-setup/blob/release/1.0.0/src/corehost/cli/hostpolicy.cpp"><code class="language-plaintext highlighter-rouge">hostpolicy.dll</code></a></li>
<li><a href="https://github.com/dotnet/core-setup/blob/release/1.0.0/src/corehost/cli/fxr/hostfxr.cpp"><code class="language-plaintext highlighter-rouge">hostfxr.dll</code></a></li>
<li><a href="https://github.com/dotnet/coreclr"><code class="language-plaintext highlighter-rouge">coreclr.dll</code></a></li>
</ul>
<p>The main task that the <code class="language-plaintext highlighter-rouge">corehost</code> application performs is to calculate and locate all the dlls needed to run the application, along with their dependencies. The full <a href="https://gist.github.com/mattwarren/f527b06c4579ebb414d6e182b910c474">output is available</a>, but in summary it processes:</p>
<ul>
<li>99 <strong>Managed</strong> dlls <a href="https://gist.github.com/mattwarren/428234f1f4508486f4ba3a4e6543bf2e">(“Adding runtime asset..”)</a></li>
<li>136 <strong>Native</strong> dlls <a href="https://gist.github.com/mattwarren/919f54d760f045c47b4833a345abde57">(“Adding native asset..”)</a></li>
</ul>
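<p>The two counts above can be reproduced from a saved copy of the trace output. The sketch below simply counts the lines containing each marker; the marker text is taken from the gist titles above, while the log file path and the <code class="language-plaintext highlighter-rouge">TraceAssetCounter</code> name are hypothetical.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TraceAssetCounter
{
    // Counts the lines that contain the given marker text (case-sensitive)
    public static int CountLines(IEnumerable&lt;string&gt; lines, string marker)
    {
        return lines.Count(line =&gt; line.IndexOf(marker, StringComparison.Ordinal) &gt;= 0);
    }

    static void Main(string[] args)
    {
        // args[0]: path to a file holding the captured trace output (hypothetical)
        var lines = File.ReadAllLines(args[0]);
        Console.WriteLine("Managed: {0}", CountLines(lines, "Adding runtime asset"));
        Console.WriteLine("Native:  {0}", CountLines(lines, "Adding native asset"));
    }
}
</code></pre>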
<p>There are so many individual files because the CoreCLR operates on a “pay-for-play” model, from <a href="https://docs.asp.net/en/1.0.0-rc1/conceptual-overview/dotnetcore.html#motivation-behind-net-core">Motivation Behind .NET Core</a>:</p>
<blockquote>
<p>By factoring the CoreFX libraries and allowing individual applications to pull in only those parts of CoreFX they require (a so-called <strong>“pay-for-play” model</strong>), server-based applications built with ASP.NET 5 can minimize their dependencies.</p>
</blockquote>
<p>Finally, once all the housekeeping is done, control is handed off to <a href="https://github.com/dotnet/core-setup/blob/release/1.0.0/src/corehost/corehost.cpp"><code class="language-plaintext highlighter-rouge">corehost</code></a>, but not before the following <a href="https://github.com/dotnet/core-setup/blob/release/1.0.0/src/corehost/cli/hostpolicy.cpp#L91-L123">properties are set</a> to control the execution of the CoreCLR itself:</p>
<ul>
<li><strong>TRUSTED_PLATFORM_ASSEMBLIES</strong> =
<ul>
<li>Paths to 235 .dlls (99 managed, 136 native), from <code class="language-plaintext highlighter-rouge">C:\Program Files\dotnet\shared\Microsoft.NETCore.App\1.0.0-rc2-3002702</code></li>
</ul>
</li>
<li><strong>APP_PATHS</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">c:\dotnet\bin\Debug\netcoreapp1.0</code></li>
</ul>
</li>
<li><strong>APP_NI_PATHS</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">c:\dotnet\bin\Debug\netcoreapp1.0</code></li>
</ul>
</li>
<li><strong>NATIVE_DLL_SEARCH_DIRECTORIES</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">C:\Program Files\dotnet\shared\Microsoft.NETCore.App\1.0.0-rc2-3002702</code></li>
<li><code class="language-plaintext highlighter-rouge">c:\dotnet\bin\Debug\netcoreapp1.0</code></li>
</ul>
</li>
<li><strong>PLATFORM_RESOURCE_ROOTS</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">c:\dotnet\bin\Debug\netcoreapp1.0</code></li>
<li><code class="language-plaintext highlighter-rouge">C:\Program Files\dotnet\shared\Microsoft.NETCore.App\1.0.0-rc2-3002702</code></li>
</ul>
</li>
<li><strong>AppDomainCompatSwitch</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">UseLatestBehaviorWhenTFMNotSpecified</code></li>
</ul>
</li>
<li><strong>APP_CONTEXT_BASE_DIRECTORY</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">c:\dotnet\bin\Debug\netcoreapp1.0</code></li>
</ul>
</li>
<li><strong>APP_CONTEXT_DEPS_FILES</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">c:\dotnet\bin\Debug\netcoreapp1.0\dotnet.deps.json</code></li>
<li><code class="language-plaintext highlighter-rouge">C:\Program Files\dotnet\shared\Microsoft.NETCore.App\1.0.0-rc2-3002702\Microsoft.NETCore.App.deps.json</code></li>
</ul>
</li>
<li><strong>FX_DEPS_FILE</strong> =
<ul>
<li><code class="language-plaintext highlighter-rouge">C:\Program Files\dotnet\shared\Microsoft.NETCore.App\1.0.0-rc2-3002702\Microsoft.NETCore.App.deps.json</code></li>
</ul>
</li>
</ul>
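<p>Some of these properties can be inspected from inside the running application. The sketch below assumes that the runtime exposes the host-supplied properties through <code class="language-plaintext highlighter-rouge">AppContext.GetData</code> and that the list is joined with the platform’s path separator; neither is shown in the trace output above, so treat both as assumptions. <code class="language-plaintext highlighter-rouge">TrustedPlatformAssemblies</code> is just an illustrative name.</p>
<pre><code class="language-csharp">using System;
using System.IO;

static class TrustedPlatformAssemblies
{
    // Splits the raw property value into individual paths.
    // Returns an empty array if the property was not set.
    public static string[] Split(string propertyValue, char separator)
    {
        if (string.IsNullOrEmpty(propertyValue))
            return new string[0];
        return propertyValue.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // ASSUMPTION: the host property is visible via AppContext.GetData
        var raw = AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES") as string;
        var paths = Split(raw, Path.PathSeparator);
        Console.WriteLine("{0} entries", paths.Length);
        foreach (var path in paths)
            Console.WriteLine(path);
    }
}
</code></pre>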
<p><strong>Note</strong>: You can also run your app by invoking <code class="language-plaintext highlighter-rouge">corehost.exe</code> directly with the following command:</p>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">corehost.exe C:\dotnet\bin\Debug\netcoreapp1.0\myapp.dll</code></p>
</blockquote>
<h3 id="executing-a-net-assembly">Executing a .NET Assembly</h3>
<p>At last we get to the point at which the .NET dll/assembly is loaded and executed, via the code shown below, taken from <a href="https://github.com/dotnet/coreclr/blob/release/1.0.0/src/dlls/mscoree/unixinterface.cpp#L156-L244">unixinterface.cpp</a>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hr</span> <span class="o">=</span> <span class="n">host</span><span class="o">-></span><span class="n">SetStartupFlags</span><span class="p">(</span><span class="n">startupFlags</span><span class="p">);</span>
<span class="n">IfFailRet</span><span class="p">(</span><span class="n">hr</span><span class="p">);</span>
<span class="n">hr</span> <span class="o">=</span> <span class="n">host</span><span class="o">-></span><span class="n">Start</span><span class="p">();</span>
<span class="n">IfFailRet</span><span class="p">(</span><span class="n">hr</span><span class="p">);</span>
<span class="n">hr</span> <span class="o">=</span> <span class="n">host</span><span class="o">-></span><span class="n">CreateAppDomainWithManager</span><span class="p">(</span>
<span class="n">appDomainFriendlyNameW</span><span class="p">,</span>
<span class="c1">// Flags:</span>
<span class="c1">// APPDOMAIN_ENABLE_PLATFORM_SPECIFIC_APPS</span>
<span class="c1">// - By default CoreCLR only allows platform neutral assembly to be run. To allow</span>
<span class="c1">// assemblies marked as platform specific, include this flag</span>
<span class="c1">//</span>
<span class="c1">// APPDOMAIN_ENABLE_PINVOKE_AND_CLASSIC_COMINTEROP</span>
<span class="c1">// - Allows sandboxed applications to make P/Invoke calls and use COM interop</span>
<span class="c1">//</span>
<span class="c1">// APPDOMAIN_SECURITY_SANDBOXED</span>
<span class="c1">// - Enables sandboxing. If not set, the app is considered full trust</span>
<span class="c1">//</span>
<span class="c1">// APPDOMAIN_IGNORE_UNHANDLED_EXCEPTION</span>
<span class="c1">// - Prevents the application from being torn down if a managed exception is unhandled</span>
<span class="c1">//</span>
<span class="n">APPDOMAIN_ENABLE_PLATFORM_SPECIFIC_APPS</span> <span class="o">|</span>
<span class="n">APPDOMAIN_ENABLE_PINVOKE_AND_CLASSIC_COMINTEROP</span> <span class="o">|</span>
<span class="n">APPDOMAIN_DISABLE_TRANSPARENCY_ENFORCEMENT</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">// Name of the assembly that contains the AppDomainManager implementation</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">// The AppDomainManager implementation type name</span>
<span class="n">propertyCount</span><span class="p">,</span>
<span class="n">propertyKeysW</span><span class="p">,</span>
<span class="n">propertyValuesW</span><span class="p">,</span>
<span class="p">(</span><span class="n">DWORD</span> <span class="o">*</span><span class="p">)</span><span class="n">domainId</span><span class="p">);</span>
</code></pre></div></div>
<p>This is making use of the <a href="https://msdn.microsoft.com/en-us/library/ms164408(v=vs.110).aspx">ICLRRuntimeHost Interface</a>, which is part of the COM-based hosting API for the CLR. Despite the file name, it is actually from the Windows version of the CLI tooling. In the xplat world of the CoreCLR, the hosting API that was originally written for Unix has been replicated across all the platforms so that a common interface is available for any tools that want to use it. See the following GitHub issues for more information:</p>
<ul>
<li><a href="https://github.com/dotnet/coreclr/issues/1234">Refactor the Unix hosting API</a></li>
<li><a href="https://github.com/dotnet/coreclr/issues/1256">Expose the Unix hosting API on Windows too</a></li>
<li><a href="https://github.com/dotnet/coreclr/pull/1295">Expose Unix hosting API on Windows</a></li>
<li><a href="https://github.com/dotnet/coreclr/blob/master/src/dlls/mscoree/mscorwks_ntdef.src#L20-L24">Unix Hosting API</a></li>
</ul>
<p><strong>And that’s it, your .NET code is now running, simple really!!</strong></p>
<hr />
<h2 id="additional-information">Additional information:</h2>
<ul>
<li><a href="https://docs.microsoft.com/en-us/dotnet/articles/core/tools/dotnet-run">Official dotnet cli tooling documentation</a></li>
<li><a href="https://github.com/dotnet/cli/blob/rel/1.0.0/Documentation/specs/corehost.md">corehost runtime assembly resolution</a></li>
<li><a href="https://github.com/dotnet/cli/blob/rel/1.0.0/Documentation/specs/runtime-configuration-file.md">Runtime Configuration File specification</a></li>
<li><a href="https://github.com/dotnet/cli/blob/rel/1.0.0/Documentation/specs/runtime-configuration-file.md#sections">CoreCLR runtime options</a></li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/07/04/How-the-dotnet-CLI-tooling-runs-your-code/">How the dotnet CLI tooling runs your code</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Visualising the .NET Garbage Collector2016-06-20T00:00:00+00:00http://www.mattwarren.org/2016/06/20/Visualising-the-dotNET-Garbage-Collector
<p>As part of an ongoing attempt to learn more about how a real-life Garbage Collector (GC) works (see <a href="/2016/02/04/learning-how-garbage-collectors-work-part-1/">part 1</a>) and after being inspired by <a href="https://twitter.com/b0rk">Julia Evans’</a> excellent post <a href="http://jvns.ca/blog/2013/10/24/day-16-gzip-plus-poetry-equals-awesome/">gzip + poetry = awesome</a>, I spent some time writing a tool to enable a live visualisation of the .NET GC in action.</p>
<p>The output from the tool is shown below, click to Play/Stop (<a href="/images/2016/06/GC Visualisation.gif">direct link to gif</a>). The <a href="https://github.com/mattwarren/GCVisualisation">full source is available</a> if you want to take a look.</p>
<p><img class="gifplayer" data-label="Play" gifid="GC-Visualisation" src="/images/2016/06/GC Visualisation.png" /></p>
<p><img src="/images/2016/06/Key to visualisation symbols.png" alt="Key to visualisation symbols" /></p>
<h2 id="capturing-gc-events-in-net">Capturing GC Events in .NET</h2>
<p>Fortunately there is a straightforward way to capture the raw GC-related events, using the excellent <a href="https://blogs.msdn.microsoft.com/vancem/2013/08/15/traceevent-etw-library-published-as-a-nuget-package/">TraceEvent library</a> that provides a wrapper over the underlying <a href="https://msdn.microsoft.com/en-us/library/ff356162(v=vs.110).aspx">ETW Events</a> the .NET GC outputs.</p>
<p>It’s as simple as writing code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">session</span><span class="p">.</span><span class="n">Source</span><span class="p">.</span><span class="n">Clr</span><span class="p">.</span><span class="n">GCAllocationTick</span> <span class="p">+=</span> <span class="n">allocationData</span> <span class="p">=></span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ProcessIdsUsedInRuns</span><span class="p">.</span><span class="nf">Contains</span><span class="p">(</span><span class="n">allocationData</span><span class="p">.</span><span class="n">ProcessID</span><span class="p">)</span> <span class="p">==</span> <span class="k">false</span><span class="p">)</span>
<span class="k">return</span><span class="p">;</span>
<span class="n">totalBytesAllocated</span> <span class="p">+=</span> <span class="n">allocationData</span><span class="p">.</span><span class="n">AllocationAmount</span><span class="p">;</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="s">"."</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Here we are wiring up a callback that runs each time a <code class="language-plaintext highlighter-rouge">GCAllocationTick</code> event is fired. Other events that are available include <code class="language-plaintext highlighter-rouge">GCStart</code>, <code class="language-plaintext highlighter-rouge">GCEnd</code>, <code class="language-plaintext highlighter-rouge">GCSuspendEEStart</code>, <code class="language-plaintext highlighter-rouge">GCRestartEEStart</code> and <a href="https://msdn.microsoft.com/en-us/library/ff356162(v=vs.110).aspx">many more</a>.</p>
<p>As well as outputting a visualisation of the raw events, the tool also aggregates them so that a summary can be produced:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Memory Allocations:
1,065,720 bytes currently allocated
1,180,308,804 bytes have been allocated in total
GC Collections:
16 in total (12 excluding B/G)
2 - generation 0
9 - generation 1
1 - generation 2
4 - generation 2 (B/G)
Time in GC: 1,300.1 ms (108.34 ms avg)
Time under test: 3,853 ms (33.74 % spent in GC)
Total GC Pause time: 665.9 ms
Largest GC Pause: 75.99 ms
</code></pre></div></div>
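<p>The derived figures in that summary are simple ratios of the raw totals. As a worked example, the sketch below recomputes them from the numbers shown; note that dividing by the 12 non-background collections is my inference from the fact that it reproduces the reported average, and <code class="language-plaintext highlighter-rouge">GcSummaryMaths</code> is an illustrative name rather than code from the tool.</p>
<pre><code class="language-csharp">using System;
using System.Globalization;

static class GcSummaryMaths
{
    public static double Average(double totalMs, int count)
    {
        return totalMs / count;
    }

    public static double PercentOf(double partMs, double wholeMs)
    {
        return partMs / wholeMs * 100.0;
    }

    static void Main()
    {
        double timeInGcMs = 1300.1;      // "Time in GC"
        double timeUnderTestMs = 3853;   // "Time under test"
        int blockingGcs = 2 + 9 + 1;     // gen 0 + gen 1 + gen 2, excluding B/G

        // 1300.1 / 12 = 108.34 ms on average
        Console.WriteLine(Average(timeInGcMs, blockingGcs).ToString("F2", CultureInfo.InvariantCulture));
        // 1300.1 / 3853 = 33.74 % of the run spent in GC
        Console.WriteLine(PercentOf(timeInGcMs, timeUnderTestMs).ToString("F2", CultureInfo.InvariantCulture));
    }
}
</code></pre>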
<h2 id="gc-pauses">GC Pauses</h2>
<p>Most of the visualisation and summary information is relatively easy to calculate, however the timings for the GC <em>pauses</em> are not always straightforward. Since .NET 4.5 the Server GC has 2 main modes available: the new <strong>Background</strong> GC mode and the existing <strong>Foreground/Non-Concurrent</strong> one. The .NET Workstation GC has had a <strong>Background</strong> GC mode since .NET 4.0 and a <strong>Concurrent</strong> mode before that.</p>
<p>The main benefit of the <strong>Background</strong> mode is that it reduces <em>GC pauses</em>, or more specifically it reduces the time that the GC has to suspend all the user threads running inside the CLR. The problem with these “stop-the-world” pauses, as they are also known, is that during this time your application can’t continue with whatever it was doing and if the pauses last long enough <a href="http://blog.marcgravell.com/2011/10/assault-by-gc.html">users will notice</a>.</p>
<p>As you can see in the image below (courtesy of the <a href="https://blogs.msdn.microsoft.com/dotnet/2012/07/20/the-net-framework-4-5-includes-new-garbage-collector-enhancements-for-client-and-server-apps/">.NET Blog</a>), with the newer <strong>Background</strong> mode in .NET 4.5 the time during which user-threads are <em>suspended</em> is much smaller (the dark blue arrows). They only need to be suspended for part of the GC process, not the entire duration.</p>
<p><a href="/images/2016/06/Background GC - .NET 4.0 v 4.5.png"><img src="/images/2016/06/Background GC - .NET 4.0 v 4.5.png" alt="Background GC - .NET 4.0 v 4.5" /></a></p>
<h3 id="foreground-blocking-gc-flow">Foreground (Blocking) GC flow</h3>
<p>So calculating the pauses for a <strong>Foreground</strong> GC (this means all Gen 0/1 GCs and full blocking GCs) is relatively straightforward, using the info from the <a href="https://blogs.msdn.microsoft.com/maoni/2014/12/25/gc-etw-events-3/">excellent blog post</a> by <a href="https://github.com/Maoni0/">Maoni Stephens</a>, the main developer on the .NET GC:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEE_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code> <– <strong>suspension is done</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCStart_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCEnd_V1</code> <– <strong>actual GC is done</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEBegin_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEEnd_V1</code> <– <strong>resumption is done.</strong></li>
</ol>
<p>So the pause is just the difference between the timestamp of the <code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code> event and that of the <code class="language-plaintext highlighter-rouge">GCRestartEEEnd_V1</code>.</p>
<h3 id="background-gc-flow">Background GC flow</h3>
<p>However, for <strong>Background</strong> GC (Gen 2) it is more complicated, again from <a href="https://blogs.msdn.microsoft.com/maoni/2014/12/25/gc-etw-events-3/">Maoni’s blog post</a>:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEE_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCStart_V1</code> <– <strong>Background GC starts</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEBegin_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEEnd_V1</code> <– <strong>done with the initial suspension</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEE_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEBegin_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEEnd_V1</code> <– <strong>done with Background GC’s own suspension</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEE_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code> <– <strong>suspension for Foreground GC is done</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCStart_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCEnd_V1</code> <– <strong>Foreground GC is done</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEBegin_V1</code></li>
<li><code class="language-plaintext highlighter-rouge">GCRestartEEEnd_V1</code> <– <strong>resumption for Foreground GC is done</strong></li>
<li><code class="language-plaintext highlighter-rouge">GCEnd_V1</code> <– <strong>Background GC ends</strong></li>
</ol>
<p>It’s a bit easier to understand these steps by using an annotated version of the image from the <a href="https://msdn.microsoft.com/en-us/library/ee787088(v=vs.110).aspx#background_garbage_collection">MSDN page on GC</a> (the numbers along the bottom correspond to the steps above).</p>
<p><a href="/images/2016/06/BackgroundGarbageCollection-Annotated.jpeg"><img src="/images/2016/06/BackgroundGarbageCollection-Annotated.jpeg" alt="Background Garbage Collection" /></a></p>
<p>But there are a few caveats that make it <a href="https://blogs.msdn.microsoft.com/maoni/2014/12/25/gc-etw-events-3/">trickier to calculate the actual time</a>:</p>
<blockquote>
<p>Of course there could be more than one foreground GC, there could be 0+ between line 5) and 6), and more than one between line 9) and 16).</p>
</blockquote>
<blockquote>
<p>We may also decide to do an ephemeral GC before we start the BGC (as BGC is meant for gen2) so you might also see an ephemeral GC between line 3) and 4) – the only difference between it and a normal ephemeral GC is you wouldn’t see its own suspension and resumption events as we already suspended/resumed for BGC purpose.</p>
</blockquote>
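<p>One way to cope with the variable number of suspensions is to ignore which GC caused them and simply add up every interval between a <code class="language-plaintext highlighter-rouge">GCSuspendEEEnd_V1</code> and the next <code class="language-plaintext highlighter-rouge">GCRestartEEEnd_V1</code>, which is the same rule used for the Foreground case. The sketch below does this over a plain list of events; the <code class="language-plaintext highlighter-rouge">GcEvent</code> type, the <code class="language-plaintext highlighter-rouge">GcPauseCalculator</code> name and the timestamps are made up for illustration and it is not necessarily how the visualisation tool does it.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a raw GC event
struct GcEvent
{
    public string Name;
    public double TimestampMs;

    public GcEvent(string name, double timestampMs)
    {
        Name = name;
        TimestampMs = timestampMs;
    }
}

static class GcPauseCalculator
{
    // Sums every GCSuspendEEEnd_V1 -&gt; GCRestartEEEnd_V1 interval in the stream
    public static double TotalPauseMs(IEnumerable&lt;GcEvent&gt; events)
    {
        double total = 0;
        double? suspendedAt = null;
        foreach (var gcEvent in events)
        {
            if (gcEvent.Name == "GCSuspendEEEnd_V1")
            {
                suspendedAt = gcEvent.TimestampMs;
            }
            else if (gcEvent.Name == "GCRestartEEEnd_V1" &amp;&amp; suspendedAt.HasValue)
            {
                total += gcEvent.TimestampMs - suspendedAt.Value;
                suspendedAt = null;
            }
        }
        return total;
    }

    static void Main()
    {
        // Made-up timestamps for steps 1 to 9 of the Background GC flow
        var events = new List&lt;GcEvent&gt;
        {
            new GcEvent("GCSuspendEE_V1", 0),
            new GcEvent("GCSuspendEEEnd_V1", 1),
            new GcEvent("GCStart_V1", 2),
            new GcEvent("GCRestartEEBegin_V1", 3),
            new GcEvent("GCRestartEEEnd_V1", 4),     // first pause: 4 - 1 = 3 ms
            new GcEvent("GCSuspendEE_V1", 10),
            new GcEvent("GCSuspendEEEnd_V1", 11),
            new GcEvent("GCRestartEEBegin_V1", 12),
            new GcEvent("GCRestartEEEnd_V1", 13.5)   // second pause: 13.5 - 11 = 2.5 ms
        };
        // Total is 3 + 2.5 = 5.5 ms
        Console.WriteLine(TotalPauseMs(events));
    }
}
</code></pre>
<p>An ephemeral GC that runs between steps 3 and 4 has no suspension events of its own, so this approach folds it into the surrounding interval rather than double-counting it.</p>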
<hr />
<h3 id="age-of-ascent---gc-pauses">Age of Ascent - GC Pauses</h3>
<p>Finally, if you want a more dramatic way of visualising a “<em>Stop the World</em>” or more accurately a “<em>Stop the Universe</em>” GC pause, take a look at the video below. The GC pause starts at around 7 seconds in (credit to <a href="https://twitter.com/ben_a_adams">Ben Adams</a> and <a href="https://twitter.com/ageofascent">Age of Ascent</a>).</p>
<iframe width="774" height="435" src="https://www.youtube.com/embed/BTHimgTauwQ" frameborder="0" allowfullscreen=""></iframe>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=11941874">Hacker News</a></p>
<p>The post <a href="http://www.mattwarren.org/2016/06/20/Visualising-the-dotNET-Garbage-Collector/">Visualising the .NET Garbage Collector</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Strings and the CLR - a Special Relationship2016-05-31T00:00:00+00:00http://www.mattwarren.org/2016/05/31/Strings-and-the-CLR-a-Special-Relationship
<p>Strings and the Common Language Runtime (CLR) have a <em>special relationship</em>, but it’s a bit different (and way less political) than the UK <-> US <em>special relationship</em> that is often talked about.</p>
<p><a href="http://www.bbc.com/news/uk-36084672"><img src="/images/2016/05/UK and US - Special Relationship.png" alt="UK and US - Special Relationship" /></a></p>
<p>This relationship means that <a href="https://msdn.microsoft.com/en-us/library/system.string(v=vs.110).aspx">Strings</a> can do things that aren’t possible in the C# code that you and I can write and they also get a helping hand from the runtime to achieve maximum performance, which makes sense when you consider how ubiquitous they are in .NET applications.</p>
<h2 id="string-layout-in-memory">String layout in memory</h2>
<p>Firstly, strings differ from any other data type in the CLR (other than arrays) in that their size isn’t fixed. Normally the .NET GC knows the size of an object when it’s being allocated, because it’s based on the size of the fields/properties within the object and they don’t change. However, a .NET string object doesn’t hold a pointer to string data that is stored elsewhere on the heap. Instead the raw data, the actual bytes that make up the text, is contained within the string object itself. That means that the memory representation of a string looks like this:</p>
<p><img src="/images/2016/05/Memory Layout - CLR String.png" alt="Memory Layout - CLR String" /></p>
<p>The benefit is that this gives excellent memory locality and ensures that when the CLR wants to access the raw string data it doesn’t have to do another pointer lookup. For more information, see the Stack Overflow question <a href="http://stackoverflow.com/questions/5240971/where-does-net-place-the-string-value">“Where does .NET place the String value?”</a> and Jon Skeet’s excellent post on <a href="http://csharpindepth.com/Articles/General/Strings.aspx">strings</a>.</p>
<p>Whereas if you were to implement your own string class, like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">MyString</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">Length</span><span class="p">;</span>
<span class="kt">byte</span> <span class="p">[]</span> <span class="n">Data</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>It would look like this in memory:</p>
<p><img src="/images/2016/05/Memory Layout - Custom String.png" alt="Memory Layout - Custom String" /></p>
<p>In this case, the actual string data would be held in the <code class="language-plaintext highlighter-rouge">byte []</code>, located elsewhere in memory and would therefore require a pointer reference and lookup to locate it.</p>
<p>This is summarised nicely in the excellent Book of the Runtime (BOTR), in the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/mscorlib.md#interface-between-managed--clr-code">mscorlib section</a>:</p>
<blockquote>
<p>The managed mechanism for calling into native code must also support the special managed calling convention used by <strong>String’s constructors, where the constructor allocates the memory used by the object</strong> (instead of the typical convention where the constructor is called after the GC allocates memory).</p>
</blockquote>
<h2 id="implemented-in-un-managed-code">Implemented in un-managed code</h2>
<p>Despite the <a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/String.cs">String class</a> being a managed C# source file, large parts of it are implemented in un-managed code, that is in C++ or even Assembly. For instance, there are 15 methods in <a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/String.cs">String.cs</a> that have no method body, are marked as <code class="language-plaintext highlighter-rouge">extern</code> and have <code class="language-plaintext highlighter-rouge">[MethodImplAttribute(MethodImplOptions.InternalCall)]</code> applied to them. This indicates that their implementations are provided elsewhere by the runtime. Again from the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/mscorlib.md#calling-from-managed-to-native-code">mscorlib section of the BOTR</a> (emphasis mine):</p>
<blockquote>
<p>We have two techniques for calling into the CLR from managed code. FCall allows you to call directly into the CLR code, and provides a lot of flexibility in terms of manipulating objects, though it is easy to cause GC holes by not tracking object references correctly. QCall allows you to call into the CLR via the P/Invoke, and is much harder to accidentally mis-use than FCall. <strong>FCalls are identified in managed code as extern methods with the MethodImplOptions.InternalCall bit set</strong>. QCalls are static extern methods that look like regular P/Invokes, but to a library called “QCall”.</p>
</blockquote>
<h3 id="types-with-a-managedunmanaged-duality">Types with a Managed/Unmanaged Duality</h3>
<p>A consequence of Strings being implemented in unmanaged and managed code is that they <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/mscorlib.md#types-with-a-managedunmanaged-duality">have to be defined in both</a> and those definitions must be kept in sync:</p>
<blockquote>
<p>Certain managed types must have a representation available in both managed & native code. You could ask whether the canonical definition of a type is in managed code or native code within the CLR, but the answer doesn’t matter – the key thing is they must both be identical. <strong>This will allow the CLR’s native code to access fields within a managed object in a very fast, easy to use manner</strong>. There is a more complex way of using essentially the CLR’s equivalent of Reflection over MethodTables & FieldDescs to retrieve field values, but this probably doesn’t perform as well as you’d like, and it isn’t very usable. For commonly used types, it makes sense to declare a data structure in native code & attempt to keep the two in sync.</p>
</blockquote>
<p>So in <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/mscorlib/src/System/String.cs#L50-L56">String.cs</a> we can see:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">//NOTE NOTE NOTE NOTE</span>
<span class="c1">//These fields map directly onto the fields in an EE StringObject. </span>
<span class="c1">//See object.h for the layout.</span>
<span class="p">[</span><span class="n">NonSerialized</span><span class="p">]</span><span class="k">private</span> <span class="kt">int</span> <span class="n">m_stringLength</span><span class="p">;</span>
<span class="p">[</span><span class="n">NonSerialized</span><span class="p">]</span><span class="k">private</span> <span class="kt">char</span> <span class="n">m_firstChar</span><span class="p">;</span>
</code></pre></div></div>
<p>Which corresponds to the following in <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/vm/object.h#L1095-L1101">object.h</a>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">private:</span>
<span class="n">DWORD</span> <span class="n">m_StringLength</span><span class="p">;</span>
<span class="n">WCHAR</span> <span class="n">m_Characters</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
</code></pre></div></div>
<h2 id="fast-string-allocations">Fast String Allocations</h2>
<p>In a typical .NET program, one of the most common ways that you would allocate strings dynamically is either via <code class="language-plaintext highlighter-rouge">StringBuilder</code> or <code class="language-plaintext highlighter-rouge">String.Format</code> (which uses <code class="language-plaintext highlighter-rouge">StringBuilder</code> under the hood).</p>
<p>So you may have some code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">builder</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">StringBuilder</span><span class="p">();</span>
<span class="p">...</span>
<span class="n">builder</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">valueX</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">builder</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="s">"Some text"</span><span class="p">);</span>
<span class="p">...</span>
<span class="kt">var</span> <span class="n">text</span> <span class="p">=</span> <span class="n">builder</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
</code></pre></div></div>
<p>or</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">text</span> <span class="p">=</span> <span class="kt">string</span><span class="p">.</span><span class="nf">Format</span><span class="p">(</span><span class="s">"{0}, {1}"</span><span class="p">,</span> <span class="n">valueX</span><span class="p">,</span> <span class="n">valueY</span><span class="p">);</span>
</code></pre></div></div>
<p>Then, when the <code class="language-plaintext highlighter-rouge">StringBuilder</code> <code class="language-plaintext highlighter-rouge">ToString()</code> <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/mscorlib/src/System/Text/StringBuilder.cs#L336">method is called</a>, it internally calls the <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/mscorlib/src/System/String.cs#L1556">FastAllocateString</a> method on the String class, which is declared like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">System</span><span class="p">.</span><span class="n">Security</span><span class="p">.</span><span class="n">SecurityCritical</span><span class="p">]</span> <span class="c1">// auto-generated</span>
<span class="p">[</span><span class="nf">MethodImplAttribute</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">InternalCall</span><span class="p">)]</span>
<span class="k">internal</span> <span class="k">extern</span> <span class="k">static</span> <span class="n">String</span> <span class="nf">FastAllocateString</span><span class="p">(</span><span class="kt">int</span> <span class="n">length</span><span class="p">);</span>
</code></pre></div></div>
<p>This method is marked as <code class="language-plaintext highlighter-rouge">extern</code> and has the <code class="language-plaintext highlighter-rouge">[MethodImplAttribute(MethodImplOptions.InternalCall)]</code> attribute applied and, as we saw earlier, this implies it will be implemented in un-managed code by the CLR. It turns out that the call stack eventually ends up in a hand-written assembly function called <strong>AllocateStringFastMP_InlineGetThread</strong> from <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/vm/amd64/JitHelpers_InlineGetThread.asm#L159-L204">JitHelpers_InlineGetThread.asm</a>.</p>
<p>This also shows something else we talked about earlier. The assembly code is actually allocating the memory needed for the string, based on the required length that was passed in by the calling code.</p>
<div class="language-clojure highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">LEAF_ENTRY</span><span class="w"> </span><span class="n">AllocateStringFastMP_InlineGetThread,</span><span class="w"> </span><span class="n">_TEXT</span><span class="w">
</span><span class="c1">; We were passed the number of characters in ECX</span><span class="w">
</span><span class="c1">; we need to load the method table for string from the global</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">r9,</span><span class="w"> </span><span class="p">[</span><span class="n">g_pStringClass</span><span class="p">]</span><span class="w">
</span><span class="c1">; Instead of doing elaborate overflow checks, we just limit the number of elements</span><span class="w">
</span><span class="c1">; to (LARGE_OBJECT_SIZE - 256)/sizeof(WCHAR) or less.</span><span class="w">
</span><span class="c1">; This will avoid avoid all overflow problems, as well as making sure</span><span class="w">
</span><span class="c1">; big string objects are correctly allocated in the big object heap.</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">ecx,</span><span class="w"> </span><span class="p">(</span><span class="nf">ASM_LARGE_OBJECT_SIZE</span><span class="w"> </span><span class="nb">-</span><span class="w"> </span><span class="mi">256</span><span class="p">)</span><span class="n">/2</span><span class="w">
</span><span class="n">jae</span><span class="w"> </span><span class="n">OversizedString</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">edx,</span><span class="w"> </span><span class="p">[</span><span class="n">r9</span><span class="w"> </span><span class="nb">+</span><span class="w"> </span><span class="n">OFFSET__MethodTable__m_BaseSize</span><span class="p">]</span><span class="w">
</span><span class="c1">; Calculate the final size to allocate.</span><span class="w">
</span><span class="c1">; We need to calculate baseSize + cnt*2, </span><span class="w">
</span><span class="c1">; then round that up by adding 7 and anding ~7.</span><span class="w">
</span><span class="n">lea</span><span class="w"> </span><span class="n">edx,</span><span class="w"> </span><span class="p">[</span><span class="n">edx</span><span class="w"> </span><span class="nb">+</span><span class="w"> </span><span class="n">ecx*2</span><span class="w"> </span><span class="nb">+</span><span class="w"> </span><span class="mi">7</span><span class="p">]</span><span class="w">
</span><span class="nb">and</span><span class="w"> </span><span class="n">edx,</span><span class="w"> </span><span class="mi">-8</span><span class="w">
</span><span class="n">PATCHABLE_INLINE_GETTHREAD</span><span class="w"> </span><span class="n">r11,</span><span class="w"> </span><span class="n">AllocateStringFastMP_InlineGetThread__PatchTLSOffset</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">r10,</span><span class="w"> </span><span class="p">[</span><span class="n">r11</span><span class="w"> </span><span class="nb">+</span><span class="w"> </span><span class="n">OFFSET__Thread__m_alloc_context__alloc_limit</span><span class="p">]</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">rax,</span><span class="w"> </span><span class="p">[</span><span class="n">r11</span><span class="w"> </span><span class="nb">+</span><span class="w"> </span><span class="n">OFFSET__Thread__m_alloc_context__alloc_ptr</span><span class="p">]</span><span class="w">
</span><span class="n">add</span><span class="w"> </span><span class="n">rdx,</span><span class="w"> </span><span class="n">rax</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">rdx,</span><span class="w"> </span><span class="n">r10</span><span class="w">
</span><span class="n">ja</span><span class="w"> </span><span class="n">AllocFailed</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="p">[</span><span class="n">r11</span><span class="w"> </span><span class="nb">+</span><span class="w"> </span><span class="n">OFFSET__Thread__m_alloc_context__alloc_ptr</span><span class="p">]</span><span class="n">,</span><span class="w"> </span><span class="n">rdx</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="p">[</span><span class="n">rax</span><span class="p">]</span><span class="n">,</span><span class="w"> </span><span class="n">r9</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="p">[</span><span class="n">rax</span><span class="w"> </span><span class="nb">+</span><span class="w"> </span><span class="n">OFFSETOF__StringObject__m_StringLength</span><span class="p">]</span><span class="n">,</span><span class="w"> </span><span class="n">ecx</span><span class="w">
</span><span class="n">ifdef</span><span class="w"> </span><span class="n">_DEBUG</span><span class="w">
</span><span class="n">call</span><span class="w"> </span><span class="n">DEBUG_TrialAllocSetAppDomain_NoScratchArea</span><span class="w">
</span><span class="n">endif</span><span class="w"> </span><span class="c1">; _DEBUG</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span><span class="n">OversizedString</span><span class="err">:</span><span class="w">
</span><span class="n">AllocFailed</span><span class="err">:</span><span class="w">
</span><span class="n">jmp</span><span class="w"> </span><span class="n">FramedAllocateString</span><span class="w">
</span><span class="n">LEAF_END</span><span class="w"> </span><span class="n">AllocateStringFastMP_InlineGetThread,</span><span class="w"> </span><span class="n">_TEXT</span><span class="w">
</span></code></pre></div></div>
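<p>To make the size calculation and the “bump the pointer” allocation easier to follow, here is a minimal C# sketch of the same arithmetic. It is only an illustration, not how the runtime is implemented: the class and method names, and the idea of passing <code class="language-plaintext highlighter-rouge">baseSize</code>, <code class="language-plaintext highlighter-rouge">allocPtr</code> and <code class="language-plaintext highlighter-rouge">allocLimit</code> in as plain numbers, are assumptions made for the example.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class StringAllocSketch
{
    // Mirrors "lea edx, [edx + ecx*2 + 7]" followed by "and edx, -8":
    // baseSize + charCount * 2 (2 bytes per char), rounded up to a multiple of 8.
    public static uint AlignedSize(uint baseSize, uint charCount)
    {
        return (baseSize + charCount * 2 + 7) & ~7u;
    }

    // Mirrors the alloc_ptr / alloc_limit comparison: if the object fits in the
    // thread's allocation context, hand back the old pointer and advance it;
    // otherwise report failure (the assembly jumps to FramedAllocateString).
    public static bool TryBumpAllocate(ref ulong allocPtr, ulong allocLimit, uint size, out ulong result)
    {
        ulong newPtr = allocPtr + size;
        if (newPtr > allocLimit)
        {
            result = 0;
            return false;
        }
        result = allocPtr;
        allocPtr = newPtr;
        return true;
    }
}
</code></pre></div></div>
<p>For example, with a hypothetical base size of 22 bytes, a 5-character string needs 22 + 10 = 32 bytes, which is already a multiple of 8, whereas a 6-character string needs 34 bytes and is rounded up to 40.</p>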
<p>There is also a less optimised version called <strong>AllocateStringFastMP</strong> from <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/vm/amd64/JitHelpers_Slow.asm#L274-L322">JitHelpers_Slow.asm</a>. The reason for the different versions is explained in <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/vm/jitinterfacegen.cpp#L31-L46">jitinterfacegen.cpp</a>. The decision as to which one to use is then made at run-time, <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/vm/jitinterfacegen.cpp#L234-L254">depending on the state of the Thread-local storage</a>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// These are the fastest(?) versions of JIT helpers as they have the code to </span>
<span class="c1">// GetThread patched into them that does not make a call.</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_TrialAllocSFastMP_InlineGetThread</span><span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">typeHnd_</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_BoxFastMP_InlineGetThread</span> <span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">type</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">unboxedData</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">AllocateStringFastMP_InlineGetThread</span> <span class="p">(</span><span class="n">CLR_I4</span> <span class="n">cch</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_NewArr1OBJ_MP_InlineGetThread</span> <span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">arrayTypeHnd_</span><span class="p">,</span> <span class="n">INT_PTR</span> <span class="n">size</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_NewArr1VC_MP_InlineGetThread</span> <span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">arrayTypeHnd_</span><span class="p">,</span> <span class="n">INT_PTR</span> <span class="n">size</span><span class="p">);</span>
<span class="c1">// This next set is the fast version that invoke GetThread but is still faster </span>
<span class="c1">// than the VM implementation (i.e. the "slow" versions).</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_TrialAllocSFastMP</span><span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">typeHnd_</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_TrialAllocSFastSP</span><span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">typeHnd_</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_BoxFastMP</span> <span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">type</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">unboxedData</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">JIT_BoxFastUP</span> <span class="p">(</span><span class="n">CORINFO_CLASS_HANDLE</span> <span class="n">type</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">unboxedData</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">AllocateStringFastMP</span> <span class="p">(</span><span class="n">CLR_I4</span> <span class="n">cch</span><span class="p">);</span>
<span class="n">EXTERN_C</span> <span class="n">Object</span><span class="o">*</span> <span class="nf">AllocateStringFastUP</span> <span class="p">(</span><span class="n">CLR_I4</span> <span class="n">cch</span><span class="p">);</span>
</code></pre></div></div>
<h2 id="optimised-string-length">Optimised String Length</h2>
<p>The final example of the “special relationship” is shown by how the string <code class="language-plaintext highlighter-rouge">Length</code> property is optimised by the run-time. Finding the length of a string is a very common operation and, because .NET <a href="https://msdn.microsoft.com/en-us/library/362314fe.aspx">strings are immutable</a>, it should also be very quick, as the value can be calculated once and then cached.</p>
<p>As we can see in the comment from <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/mscorlib/src/System/String.cs#L963-L975">String.cs</a>, the CLR ensures that this is true by implementing it in such a way that the JIT can optimise for it:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Gets the length of this string</span>
<span class="c1">//</span>
<span class="c1">/// This is a EE implemented function so that the JIT can recognise is specially</span>
<span class="c1">/// and eliminate checks on character fetches in a loop like:</span>
<span class="c1">/// for(int i = 0; i < str.Length; i++) str[i]</span>
<span class="c1">/// The actually code generated for this will be one instruction and will be inlined.</span>
<span class="c1">//</span>
<span class="c1">// Spec#: Add postcondition in a contract assembly. Potential perf problem.</span>
<span class="k">public</span> <span class="k">extern</span> <span class="kt">int</span> <span class="n">Length</span> <span class="p">{</span>
<span class="p">[</span><span class="n">System</span><span class="p">.</span><span class="n">Security</span><span class="p">.</span><span class="n">SecuritySafeCritical</span><span class="p">]</span> <span class="c1">// auto-generated</span>
<span class="p">[</span><span class="nf">MethodImplAttribute</span><span class="p">(</span><span class="n">MethodImplOptions</span><span class="p">.</span><span class="n">InternalCall</span><span class="p">)]</span>
<span class="k">get</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This code is implemented in <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/classlibnative/bcltype/stringnative.cpp#L492-L504">stringnative.cpp</a>, which in turn calls <code class="language-plaintext highlighter-rouge">GetStringLength</code>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FCIMPL1</span><span class="p">(</span><span class="n">INT32</span><span class="p">,</span> <span class="n">COMString</span><span class="o">::</span><span class="n">Length</span><span class="p">,</span> <span class="n">StringObject</span><span class="o">*</span> <span class="n">str</span><span class="p">)</span> <span class="p">{</span>
<span class="n">FCALL_CONTRACT</span><span class="p">;</span>
<span class="n">FC_GC_POLL_NOT_NEEDED</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">str</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="n">FCThrow</span><span class="p">(</span><span class="n">kNullReferenceException</span><span class="p">);</span>
<span class="n">FCUnique</span><span class="p">(</span><span class="mh">0x11</span><span class="p">);</span>
<span class="k">return</span> <span class="n">str</span><span class="o">-></span><span class="n">GetStringLength</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">FCIMPLEND</span>
</code></pre></div></div>
<p>Which is a <a href="https://github.com/dotnet/coreclr/blob/19a88d8a92e08c8506f6e69c3964dc77329c108a/src/vm/object.h#L1113">simple method call</a> that the JIT can inline:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DWORD</span> <span class="nf">GetStringLength</span><span class="p">()</span> <span class="p">{</span> <span class="n">LIMITED_METHOD_DAC_CONTRACT</span><span class="p">;</span> <span class="k">return</span><span class="p">(</span> <span class="n">m_StringLength</span> <span class="p">);}</span>
</code></pre></div></div>
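<p>The loop shape mentioned in the String.cs comment looks like this in ordinary C#. This is just a small sketch with made-up names (<code class="language-plaintext highlighter-rouge">LengthLoopSketch</code>, <code class="language-plaintext highlighter-rouge">CountChar</code>); whether the bounds checks are actually removed depends on the JIT and runtime version you are using.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class LengthLoopSketch
{
    // Counts occurrences of 'target' using the loop shape described in the
    // String.cs comment: the index is compared directly against str.Length,
    // which is the pattern the comment says the JIT can recognise.
    public static int CountChar(string str, char target)
    {
        int count = 0;
        for (int i = 0; i < str.Length; i++)
        {
            if (str[i] == target)
                count++;
        }
        return count;
    }
}
</code></pre></div></div>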
<h2 id="why-have-a-special-relationship">Why have a special relationship?</h2>
<p>In one word: <strong>performance</strong>. Strings are widely used in .NET programs and therefore need to be as optimised, space efficient and cache-friendly as possible. That’s why the CLR developers have gone to great lengths to make this happen, including implementing methods in assembly and ensuring that the JIT can optimise code as much as possible.</p>
<p>Interestingly enough, one of the .NET developers recently made a comment about this on a <a href="https://github.com/dotnet/coreclr/issues/4703#issuecomment-216071622">GitHub issue</a>. In response to a query asking why more string functions weren’t implemented in managed code, they said:</p>
<blockquote>
<p>We have looked into this in the past and moved everything that could be moved without significant perf loss. Moving more depends on having pretty good managed optimizations for all coreclr architectures.
This makes sense to consider only once RyuJIT or better codegen is available for all architectures that coreclr runs on (x86, x64, arm, arm64).</p>
</blockquote>
<hr />
<p>Discuss this post on <a href="https://news.ycombinator.com/item?id=11811061">Hacker News</a> or <a href="https://www.reddit.com/r/programming/comments/4ly6uy/strings_and_the_clr_a_special_relationship/">/r/programming</a></p>
<p>The post <a href="http://www.mattwarren.org/2016/05/31/Strings-and-the-CLR-a-Special-Relationship/">Strings and the CLR - a Special Relationship</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Adventures in Benchmarking - Performance Golf (2016-05-16T00:00:00+00:00, http://www.mattwarren.org/2016/05/16/adventures-in-benchmarking-performance-golf)
<p>Recently <a href="http://nickcraver.com">Nick Craver</a>, one of the developers at Stack Overflow, has been <a href="https://twitter.com/hashtag/StackCode?src=hash">tweeting snippets of code</a> from their source. The other week the following code was posted:</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">A daily screenshot from the Stack Overflow codebase (checking strings for tokens without allocations). <a href="https://twitter.com/hashtag/StackCode?src=hash">#StackCode</a> <a href="https://t.co/sDPqviHgD0">pic.twitter.com/sDPqviHgD0</a></p>— Nick Craver (@Nick_Craver) <a href="https://twitter.com/Nick_Craver/status/722741298575319040">April 20, 2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>This code is an optimised version of what you would normally write, specifically written to ensure that it doesn’t allocate memory. Previously Stack Overflow have encountered issues with <a href="http://blog.marcgravell.com/2011/10/assault-by-gc.html">large pauses caused by the .NET GC</a>, so it appears that where appropriate, they make a concerted effort to write code that doesn’t needlessly allocate.</p>
<p>I also have to give Nick credit for making me aware of the term <a href="https://twitter.com/Nick_Craver/status/722795460302385153">“Performance Golf”</a>. I’d heard of <a href="http://stackoverflow.com/questions/tagged/code-golf">Code Golf</a>, but not the Performance version.</p>
<p><strong>Aside:</strong> If you want to see the full discussion and the code for all the different entries, take a look at <a href="https://gist.github.com/mattwarren/f0594a9f3afa9377a4bbc2bcf8e573c5">this gist</a>. Also, for a really in-depth explanation of what the fastest version is actually doing, I recommend checking out <a href="https://twitter.com/kevinmontrose">Kevin Montrose’s</a> blog post <a href="https://kevinmontrose.com/2016/04/26/an-optimization-exercise/">“An Optimisation Exercise”</a>. There are some very cool tricks in there, although by this point he is basically writing C/C++ code rather than anything you would recognise as C#!</p>
<h2 id="good-benchmarking-tools">Good Benchmarking Tools</h2>
<p>In this post I’m not going to concentrate too much on this particular benchmark, but instead I’m going to use it as an example of what I believe a good benchmarking library should provide for you. Full disclosure, I’m one of the authors of <a href="https://github.com/PerfDotNet/BenchmarkDotNet#team">BenchmarkDotNet</a>, so I admit I might be biased!</p>
<p>I think that a good benchmarking tool should offer the following features:</p>
<ul>
<li><a href="#benchmark-scaffolding">Benchmark Scaffolding</a></li>
<li><a href="#diagnose-what-is-going-on">Diagnose what is going on</a></li>
<li><a href="#consistent-reliable-and-clear-results">Consistent, Reliable and Clear Results</a></li>
</ul>
<h3 id="benchmark-scaffolding">Benchmark Scaffolding</h3>
<p>By using <a href="https://www.nuget.org/packages/BenchmarkDotNet/">BenchmarkDotNet</a>, or indeed any benchmarking tool, you can just get on with the business of actually writing the benchmark and not worry about any of the mechanics of accurately measuring the code. This is important because often when someone has posted an optimisation and accompanying benchmark on Stack Overflow, several of the comments then point out why their measurements are inaccurate or plain wrong.</p>
<p>In the case of BenchmarkDotNet, it’s as simple as adding a <code class="language-plaintext highlighter-rouge">[Benchmark]</code> attribute to the methods that you want to benchmark and then a few lines of code to launch the run:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="nf">Benchmark</span><span class="p">(</span><span class="n">Baseline</span> <span class="p">=</span> <span class="k">true</span><span class="p">)]</span>
<span class="k">public</span> <span class="kt">bool</span> <span class="nf">StringSplit</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">tokens</span> <span class="p">=</span> <span class="n">Value</span><span class="p">.</span><span class="nf">Split</span><span class="p">(</span><span class="n">delimeter</span><span class="p">);</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">token</span> <span class="k">in</span> <span class="n">tokens</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">token</span> <span class="p">==</span> <span class="n">Match</span><span class="p">)</span>
<span class="k">return</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">summary</span> <span class="p">=</span> <span class="n">BenchmarkRunner</span><span class="p">.</span><span class="n">Run</span><span class="p"><</span><span class="n">Program</span><span class="p">>();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>It also offers a few more tools for advanced scenarios, for instance you can decorate a field/property with the <code class="language-plaintext highlighter-rouge">[Params]</code> attribute like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="nf">Params</span><span class="p">(</span><span class="s">"Foo;Bar"</span><span class="p">,</span>
<span class="s">"Foo;FooBar;Whatever"</span><span class="p">,</span>
<span class="s">"Bar;blaat;foo"</span><span class="p">,</span>
<span class="s">"blaat;foo;Bar"</span><span class="p">,</span>
<span class="s">"foo;Bar;Blaat"</span><span class="p">,</span>
<span class="s">"foo;FooBar;Blaat"</span><span class="p">,</span>
<span class="s">"Bar1;Bar2;Bar3;Bar4;Bar"</span><span class="p">,</span>
<span class="s">"Bar1;Bar2;Bar3;Bar4;NoMatch"</span><span class="p">,</span>
<span class="s">"Foo;FooBar;Whatever"</span><span class="p">,</span>
<span class="s">"Some;Other;Really;Interesting;Tokens"</span><span class="p">)]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">Value</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
</code></pre></div></div>
<p>and then each benchmark will be run multiple times, with <code class="language-plaintext highlighter-rouge">Value</code> set to the different strings. This gives you a really easy way of trying out benchmarks across different inputs. For instance, some methods were consistently fast, whereas others performed badly on inputs that were a worst-case scenario for them.</p>
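<p>Putting the two snippets together, a complete benchmark class might look something like the sketch below. The class name <code class="language-plaintext highlighter-rouge">TokenBenchmarks</code>, the <code class="language-plaintext highlighter-rouge">Match</code> value and the <code class="language-plaintext highlighter-rouge">Delimiters</code> field are assumptions made for this example (they are not taken from the original entries), and only two of the input strings are included to keep it short. It needs the BenchmarkDotNet NuGet package.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class TokenBenchmarks
{
    // Hypothetical values, chosen only for this sketch
    private const string Match = "Bar";
    private static readonly char[] Delimiters = { ';' };

    // Each benchmark is run once per value listed here
    [Params("Foo;Bar", "Bar1;Bar2;Bar3;Bar4;NoMatch")]
    public string Value { get; set; }

    // The straightforward (allocating) approach, used as the baseline
    [Benchmark(Baseline = true)]
    public bool StringSplit()
    {
        foreach (var token in Value.Split(Delimiters))
        {
            if (token == Match)
                return true;
        }
        return false;
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Non-generic overload of BenchmarkRunner.Run, taking the benchmark type
        BenchmarkRunner.Run(typeof(TokenBenchmarks));
    }
}
</code></pre></div></div>
<p>Benchmarks should be built and run in Release mode, otherwise the numbers won’t reflect optimised code.</p>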
<h3 id="diagnose-what-is-going-on">Diagnose what is going on</h3>
<p>If you state that the aim of optimising your code is to “check a string for tokens, <strong>without</strong> allocations”, you would really like to be able to prove if that is true or not. I’ve previously written about how BenchmarkDotNet can <a href="/2016/02/17/adventures-in-benchmarking-memory-allocations/">give you this information</a> and in this case we get the following results (click for full-size image):</p>
<p><a href="/images/2016/05/Results showing memory allocations.png"><img src="/images/2016/05/Results showing memory allocations.png" alt="Results showing memory allocations" /></a></p>
<p>So you can see that the <code class="language-plaintext highlighter-rouge">ContainTokenFransBouma</code> benchmark isn’t allocation free, which in this scenario is a problem.</p>
<h3 id="consistent-reliable-and-clear-results">Consistent, Reliable and Clear Results</h3>
<p>Another important aspect is that you should be able to rely on the results. Part of this is trusting the tool and hopefully people will come to <a href="https://github.com/PerfDotNet/BenchmarkDotNet/wiki/People-using-BenchmarkDotNet">trust BenchmarkDotNet over time</a>.</p>
<p>Also you should be able to get clear results, so as well as providing a text-based result table that you can easily paste into a GitHub issue or Stack Overflow answer, BenchmarkDotNet will provide several graphs using the <a href="https://www.r-project.org/">R statistics and graphing library</a>. Sometimes a wall of text isn’t the easiest thing to interpret, but colourful graphs can help (click for full image).</p>
<p><a href="/images/2016/05/Graph of different benchmarks - with varying inputs - large.png"><img src="/images/2016/05/Graph of different benchmarks - with varying inputs.png" alt="Graph of different benchmarks - with varying inputs" /></a></p>
<p>Here we can see that the original <code class="language-plaintext highlighter-rouge">ContainsToken</code> code is “slower” in some scenarios (although it’s worth pointing out that the Y-axis is in nanoseconds).</p>
<h2 id="summary">Summary</h2>
<p>Would I recommend writing code like any of these optimisations for normal day-to-day scenarios? No.</p>
<p>Without exception the optimised versions of the code are less readable, harder to debug and probably contain more errors. Certainly, by the time you get to the <a href="https://gist.github.com/mattwarren/f0594a9f3afa9377a4bbc2bcf8e573c5#file-containstokenbenchmark-cs-L201-L363">fastest version</a> you are no longer writing recognisable C# code, it’s basically C++/C masquerading as C#.</p>
<p>However, for the purposes of learning, a bit of fun or just because you like a spot of competition, then it’s fine. Just make sure you use a decent tool that lets you get on with the fun part of writing the most optimised code possible!</p>
<p>The post <a href="http://www.mattwarren.org/2016/05/16/adventures-in-benchmarking-performance-golf/">Adventures in Benchmarking - Performance Golf</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
Coz: Finding Code that Counts with Causal Profiling - An Introduction2016-03-30T00:00:00+00:00http://www.mattwarren.org/2016/03/30/Coz-Finding-Code-that-Counts-with-Causal-Profiling
<p>A while ago I came across an interesting and very readable paper titled <a href="http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/090-curtsinger.pdf">“COZ Finding Code that Counts with Causal Profiling”</a> that was presented at <a href="http://www.ssrc.ucsc.edu/sosp15/">SOSP 2015</a> (and was the recipient of a Best Paper Award). This post is my attempt to provide an introduction to <em>Causal Profiling</em> for anyone who doesn’t want to go through the entire paper.</p>
<h2 id="what-is-causal-profiling">What is “Causal Profiling”</h2>
<p>Here’s the explanation from the paper itself:</p>
<blockquote>
<p>Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running <em>performance experiments</em> during program execution.
Each experiment calculates the impact of any potential optimization by <em>virtually speeding</em> up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same <em>relative</em> effect as running that line faster, thus “virtually” speeding it up.</p>
</blockquote>
<p>Or if you prefer, below is an image from the paper explaining what it does (click to enlarge)</p>
<p><a href="/images/2016/03/Coz - virtual speedup explanation - large.png"><img src="/images/2016/03/Coz - virtual speedup explanation.png" alt="Virtual speedup explanation" /></a></p>
<p>The key part is that it tries to find the <strong>effect</strong> of speeding up a given block of code on the <strong>overall</strong> running time of the program. But being able to speed up arbitrary pieces of code is very hard and if the authors could do that, then they would be better off making lots of money selling code optimisation tools. So instead of <strong>speeding up</strong> a given piece of code, they <strong>artificially slow down</strong> all the other code that is running at the same time, which has exactly the same <strong>relative</strong> effect.</p>
<p>In the diagram above Coz is trying to determine the effect that optimising the code in block <code class="language-plaintext highlighter-rouge">f</code> would have on the overall runtime. Instead of making <code class="language-plaintext highlighter-rouge">f</code> run quicker, as shown in part (b), they instead make <code class="language-plaintext highlighter-rouge">g</code> run slower by inserting pauses, see part (c). Then Coz is able to infer that the speed-up seen in (c) will have the same relative effect if <code class="language-plaintext highlighter-rouge">f</code> was to run faster, therefore the “Actual Speedup” as shown in (b) is possible.</p>
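<p>The equivalence between parts (b) and (c) can be checked with a little arithmetic. The sketch below is a deliberately simplified model and not how Coz itself is implemented: it assumes <code class="language-plaintext highlighter-rouge">f</code> and <code class="language-plaintext highlighter-rouge">g</code> each run once on their own thread, that the program finishes when the slower one finishes, and that the inserted pause is subtracted from the measured runtime. The class name, method names and durations are all made up for illustration.</p>

```csharp
using System;

public static class VirtualSpeedup
{
    // Simplified model: f and g run on separate threads, so the program
    // finishes when the slower of the two finishes.
    public static double Runtime(double f, double g)
    {
        return Math.Max(f, g);
    }

    // Part (b), actual speedup: f really does run `delta` seconds faster.
    public static double ActualSpeedup(double f, double g, double delta)
    {
        return Runtime(f - delta, g);
    }

    // Part (c), virtual speedup: f is untouched, g is paused for `delta`
    // seconds, and the inserted pause is subtracted from the measured runtime.
    public static double VirtualSpeedupRuntime(double f, double g, double delta)
    {
        return Runtime(f, g + delta) - delta;
    }

    public static void Main()
    {
        double f = 4.0, g = 3.0; // hypothetical durations in seconds
        foreach (double delta in new[] { 0.5, 1.0, 2.0 })
        {
            Console.WriteLine("delta=" + delta
                + " actual=" + ActualSpeedup(f, g, delta)
                + " virtual=" + VirtualSpeedupRuntime(f, g, delta));
        }
    }
}
```

<p>In this model the two columns always agree, because <code class="language-plaintext highlighter-rouge">max(f, g + delta) - delta</code> is the same value as <code class="language-plaintext highlighter-rouge">max(f - delta, g)</code>. Once <code class="language-plaintext highlighter-rouge">delta</code> is large enough that <code class="language-plaintext highlighter-rouge">g</code> becomes the slower thread, further speed-ups of <code class="language-plaintext highlighter-rouge">f</code> stop helping.</p>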
<p>Unfortunately Coz doesn’t tell you how to speed up your code, that’s left up to you, but it does tell you which parts of the code you should focus on to get the best overall improvements. Or another way of saying it is, Coz tells you:</p>
<blockquote>
<p><strong>If you speed up a given block of code by this much, the program will run this much faster</strong></p>
</blockquote>
<h2 id="existing-profilers">Existing profilers</h2>
<p>In the paper, the authors argue that existing profilers only tell you about:</p>
<ul>
<li>Frequently executed code (# of calls)</li>
<li>Code that runs for a long time (% of total time)</li>
</ul>
<p>What they don’t help you with is finding important code in parallel programs and this is the problem that Coz solves. The (contrived) example they give is:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">a</span><span class="p">()</span> <span class="p">{</span> <span class="c1">// ~6.7 seconds</span>
<span class="k">for</span><span class="p">(</span><span class="k">volatile</span> <span class="kt">size_t</span> <span class="n">x</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">x</span><span class="o"><</span><span class="mi">2000000000</span><span class="p">;</span> <span class="n">x</span><span class="o">++</span><span class="p">)</span> <span class="p">{}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">b</span><span class="p">()</span> <span class="p">{</span> <span class="c1">// ~6.4 seconds</span>
<span class="k">for</span><span class="p">(</span><span class="k">volatile</span> <span class="kt">size_t</span> <span class="n">y</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">y</span><span class="o"><</span><span class="mi">1900000000</span><span class="p">;</span> <span class="n">y</span><span class="o">++</span><span class="p">)</span> <span class="p">{}</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// Spawn both threads and wait for them.</span>
<span class="kr">thread</span> <span class="n">a_thread</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">b_thread</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>
<span class="n">a_thread</span><span class="p">.</span><span class="n">join</span><span class="p">();</span> <span class="n">b_thread</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>which they state is a:</p>
<blockquote>
<p>.. simple multi-threaded program that illustrates the shortcomings of existing profilers. Optimizing f<code class="language-plaintext highlighter-rouge">a</code> will improve performance by no more than 4.5%, while optimizing f<code class="language-plaintext highlighter-rouge">b</code> would have no effect on performance.</p>
</blockquote>
<p>As shown in the comparison below (click for larger version), a regular profiler shows that f<code class="language-plaintext highlighter-rouge">a</code> and f<code class="language-plaintext highlighter-rouge">b</code> both comprise similar fractions of the total runtime (55.20% and 45.19% respectively). However a Causal Profiler predicts that optimising line 2 from f<code class="language-plaintext highlighter-rouge">a</code> will improve the overall runtime by 4-6%, whereas optimising f<code class="language-plaintext highlighter-rouge">b</code> will only improve it by less than 2%.</p>
<p><a href="/images/2016/03/Profiling - Conventional v Causal - large.png"><img src="/images/2016/03/Profiling - Conventional v Causal.png" alt="Profiling - Conventional v Causal" /></a></p>
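<p>The “no more than 4.5%” figure quoted above can be reproduced by hand. This is a rough sketch of the arithmetic only, assuming the two threads run fully in parallel and using the approximate timings from the comments in the sample (6.7 and 6.4 seconds); the class and method names are hypothetical.</p>

```csharp
using System;

public static class ParallelExample
{
    // main() joins both threads, so total runtime is the longer of the two.
    public static double Runtime(double a, double b)
    {
        return Math.Max(a, b);
    }

    // Percentage improvement in total runtime when each function's time is
    // reduced by the given fraction (0.0 = unchanged, 1.0 = removed entirely).
    public static double ImprovementPercent(double a, double b, double fractionA, double fractionB)
    {
        double before = Runtime(a, b);
        double after = Runtime(a * (1 - fractionA), b * (1 - fractionB));
        return (before - after) / before * 100.0;
    }

    public static void Main()
    {
        const double a = 6.7, b = 6.4; // seconds, taken from the code comments
        Console.WriteLine("Remove a entirely: " + ImprovementPercent(a, b, 1.0, 0.0).ToString("F1") + "%");
        Console.WriteLine("Remove b entirely: " + ImprovementPercent(a, b, 0.0, 1.0).ToString("F1") + "%");
    }
}
```

<p>Even if <code class="language-plaintext highlighter-rouge">a</code> took no time at all, the program would still have to wait roughly 6.4 seconds for <code class="language-plaintext highlighter-rouge">b</code>, which is an improvement of about 4.5%. Removing <code class="language-plaintext highlighter-rouge">b</code> changes nothing, because <code class="language-plaintext highlighter-rouge">a</code> is still the slower thread.</p>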
<h2 id="results">Results</h2>
<p>However their research was not only done on contrived programs, they also looked at several real-world projects including:</p>
<ul>
<li><a href="https://www.sqlite.org/">SQLite</a></li>
<li><a href="http://parsec.cs.princeton.edu/overview.htm">PARSEC benchmark suite</a>
<ul>
<li>dedup - Next-generation compression with data deduplication</li>
<li>ferret - Content similarity search server</li>
</ul>
</li>
</ul>
<p>Results taken from a <a href="http://www.cs.grinnell.edu/~curtsinger/files/coz_slides.pdf">presentation by Charlie Curtsinger</a> (one of the authors of Coz) show that there are several situations where Coz identifies an area for optimisation that a conventional profiler would miss. For instance they identified a function in SQLite that when optimised provided a 25% speed-up, however very little time was actually spent in the function, only 0.15%, so it would not have shown up in the output from a conventional profiler.</p>
<table>
<thead>
<tr>
<th style="text-align: left"><strong>Project</strong></th>
<th style="text-align: right"><strong>Speedup with Coz</strong></th>
<th style="text-align: right"><strong>% Runtime reported via a Profiler</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">SQLite</td>
<td style="text-align: right">25%</td>
<td style="text-align: right">0.15%</td>
</tr>
<tr>
<td style="text-align: left">dedup</td>
<td style="text-align: right">9%</td>
<td style="text-align: right">14.38%</td>
</tr>
<tr>
<td style="text-align: left">ferret</td>
<td style="text-align: right">21%</td>
<td style="text-align: right">0.00%</td>
</tr>
</tbody>
</table>
<p>You can explore these results in the <a href="http://plasma-umass.github.io/coz/">interactive viewer</a> that has been developed alongside the tool. For instance the image below shows the lines of code in the SQLite source base that Coz identifies as having the maximum impact, positive or negative (click for full-size version):</p>
<p><a href="/images/2016/03/SQLite - lines of code with max impact.png"><img src="/images/2016/03/SQLite - lines of code with max impact.png" alt="SQLite - lines of code with max impact" /></a></p>
<h2 id="summary">Summary</h2>
<p>It’s worth pointing out that Coz is currently a <em>prototype</em> causal profiler, that at the moment only runs on Linux, but doesn’t require you to modify your executable. However the ideas presented in the paper could be ported to other OSes, programming languages or runtimes. For instance work has already begun on a <a href="https://morsmachine.dk/causalprof">Go version</a> that only required a <a href="https://github.com/golang/go/compare/master...DanielMorsing:causalprof">few modifications to the runtime</a> to get a prototype up and running.</p>
<p>It would be great to see something like this for .NET, any takers?</p>
<hr />
<h2 id="further-information">Further Information</h2>
<p>If you want to find out any more information about Coz, here is a list of useful links:</p>
<ul>
<li>The Coz paper <a href="http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/090-curtsinger.pdf">“Finding Code that Counts with Causal Profiling”</a></li>
<li><a href="http://blog.acolyer.org/2015/10/14/coz-finding-code-that-counts-with-causal-profling/">Comprehensive (and more in-depth) write-up</a> on the paper from “the morning paper” blog</li>
<li><a href="https://github.com/plasma-umass/coz">Coz GitHub repository</a>
<ul>
<li><a href="https://github.com/plasma-umass/coz#using-coz">Getting started with Coz</a></li>
<li><a href="https://github.com/plasma-umass/coz#profiling-modes">Coz profiling modes</a></li>
</ul>
</li>
<li>Presentation by <a href="http://www.cs.grinnell.edu/~curtsinger/research/">Charlie Curtsinger</a> (one of the authors of Coz)
<ul>
<li><a href="https://www.youtube.com/watch?v=jE0V-p1odPg">Video</a></li>
<li><a href="http://www.cs.grinnell.edu/~curtsinger/files/coz_slides.pdf">Slides</a></li>
</ul>
</li>
<li><a href="https://morsmachine.dk/causalprof">Causal Profiling for Go</a> is an attempt to implement Coz within the Go runtime</li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/03/30/Coz-Finding-Code-that-Counts-with-Causal-Profiling/">Coz: Finding Code that Counts with Causal Profiling - An Introduction</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Adventures in Benchmarking - Method Inlining2016-03-09T00:00:00+00:00http://www.mattwarren.org/2016/03/09/adventures-in-benchmarking-method-inlining
<p>In a <a href="/2016/02/17/adventures-in-benchmarking-memory-allocations/">previous post</a> I looked at how you can use <a href="https://github.com/PerfDotNet/BenchmarkDotNet/">BenchmarkDotNet</a> to help diagnose <em>why</em> one benchmark is running slower than another. The post outlined how ETW Events are used to give you an accurate measurement of the <em># of Bytes allocated</em> and the <em># of GC Collections</em> per benchmark.</p>
<h3 id="inlining">Inlining</h3>
<p>In addition to memory allocation, BenchmarkDotNet can also give you information about which methods were inlined by the JITter. <a href="http://en.wikipedia.org/wiki/Inline_expansion">Inlining</a> is the process by which code is copied from one function (the <em>inlinee</em>) directly into the body of another function (the <em>inliner</em>). The reason for this is to save the overhead of a method call and the associated work that needs to be done when control is passed from one method to another.</p>
<p>To see this in action we are going to run the following benchmark:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">int</span> <span class="nf">Calc</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nf">WithoutStarg</span><span class="p">(</span><span class="m">0x11</span><span class="p">)</span> <span class="p">+</span> <span class="nf">WithStarg</span><span class="p">(</span><span class="m">0x12</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">private</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">WithoutStarg</span><span class="p">(</span><span class="kt">int</span> <span class="k">value</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="k">value</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">private</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">WithStarg</span><span class="p">(</span><span class="kt">int</span> <span class="k">value</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="k">value</span> <span class="p"><</span> <span class="m">0</span><span class="p">)</span>
<span class="k">value</span> <span class="p">=</span> <span class="p">-</span><span class="k">value</span><span class="p">;</span>
<span class="k">return</span> <span class="k">value</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
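<p>To make the idea of inlining concrete, here is a hand-written sketch of what the JIT is effectively doing when it inlines both helpers into <code class="language-plaintext highlighter-rouge">Calc()</code>. The JIT does this on machine code rather than on C# source, so <code class="language-plaintext highlighter-rouge">CalcInlinedByHand()</code> is only an illustration and the name is made up.</p>

```csharp
using System;

public static class InliningByHand
{
    // The original shape: two small helper methods called from Calc().
    public static int Calc()
    {
        return WithoutStarg(0x11) + WithStarg(0x12);
    }

    private static int WithoutStarg(int value)
    {
        return value;
    }

    private static int WithStarg(int value)
    {
        if (value < 0)
            value = -value; // writes to the argument (the IL 'starg' instruction)
        return value;
    }

    // Roughly what Calc() amounts to once both helpers are inlined:
    // their bodies are copied in and no method calls remain.
    public static int CalcInlinedByHand()
    {
        int first = 0x11;
        int second = 0x12;
        if (second < 0)
            second = -second;
        return first + second;
    }

    public static void Main()
    {
        // Both versions return 35 (0x11 + 0x12).
        Console.WriteLine(Calc() == CalcInlinedByHand());
    }
}
```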
<p>BenchmarkDotNet also gives you the ability to run Benchmarks against different versions of the .NET JITter and on various CPU Platforms. So in this test we will ask it to run against the following configurations:</p>
<ul>
<li>Legacy JIT - x86</li>
<li>Legacy JIT - x64</li>
</ul>
<p>Once this is all set-up, we can run the benchmark and we get the following results:</p>
<p><img src="/images/2016/03/Method Inlining - Benchmark Results.png" alt="Method Inlining - Benchmark Results" /></p>
<p>The interesting thing to note is that <code class="language-plaintext highlighter-rouge">Legacy JIT - x64</code> runs significantly faster than the <code class="language-plaintext highlighter-rouge">x86</code> version, even though they are both running the same C# code (from the <code class="language-plaintext highlighter-rouge">Calc()</code> function above).</p>
<p>So now we are going to ask BenchmarkDotNet to give us the JIT inlining diagnostics. These diagnostics are available <a href="https://msdn.microsoft.com/library/ff356158(v=vs.100).aspx">via ETW Events</a> and are collected, parsed and displayed at the end of the output, as shown below:</p>
<p><img src="/images/2016/03/Method Inlining - Explanation.png" alt="Method Inlining - Explanation" /></p>
<p>Here we can see that when the <code class="language-plaintext highlighter-rouge">x64</code> JITter runs, the <code class="language-plaintext highlighter-rouge">WithStarg()</code> function is successfully inlined into the <code class="language-plaintext highlighter-rouge">Calc()</code> function, whereas with the <code class="language-plaintext highlighter-rouge">x86</code> version it is not. So the same code is being executed, but because the <code class="language-plaintext highlighter-rouge">WithStarg()</code> function is relatively simple, when it is not inlined the cost of the method call dominates and causes the <code class="language-plaintext highlighter-rouge">Calc()</code> function to take more time. For comparison, the <code class="language-plaintext highlighter-rouge">WithoutStarg()</code> function is always inlined, because it simply returns the <code class="language-plaintext highlighter-rouge">value</code> that is passed into it without modifying it.</p>
<p>For a full explanation of why there is a difference in behaviour between the 2 versions of the JITter, I recommend reading <a href="http://aakinshin.net/en/blog/dotnet/inlining-and-starg/">Andrey Akinshin’s blog post on the subject</a>. But in summary the <code class="language-plaintext highlighter-rouge">x64</code> version is more efficient and it’s a bug/regression that the <code class="language-plaintext highlighter-rouge">x86</code> version doesn’t have the same behaviour.</p>
<h2 id="net-jit-inlining-rules">.NET JIT inlining rules</h2>
<p>In this case the specific reason that the <code class="language-plaintext highlighter-rouge">Legacy JIT - x86</code> gives for not inlining the <code class="language-plaintext highlighter-rouge">WithStarg()</code> method is:</p>
<blockquote>
<p><strong>Fail Reason: Inlinee writes to an argument.</strong></p>
</blockquote>
<p>For reference, there is a comprehensive list of <a href="https://blogs.msdn.microsoft.com/clrcodegeneration/2009/10/21/jit-etw-inlining-event-fail-reasons/">JIT ETW Inlining Event Fail Reasons</a> available on MSDN, although interestingly enough it doesn’t include this reason!</p>
<p>However, inlining isn’t always a win-win scenario. Because you are copying the same code to 2 locations, it can bloat the amount of memory that your program needs.</p>
<p><strong>Update:</strong> A more recent list of justifications that the <a href="https://github.com/dotnet/coreclr/blob/master/src/jit/inline.def">.NET JITter provides for not inlining a method</a> is available, thanks to <a href="https://github.com/AndyAyersMS">Andy Ayers</a> from Microsoft for pointing it out to me.</p>
<p>So there are some rules that the .NET JITter follows <a href="https://blogs.msdn.microsoft.com/davidnotario/2004/11/01/jit-optimizations-inlining-ii/">when deciding whether or not to inline a method</a> (Note this list is from 2004, so the rules may well have changed since then)</p>
<blockquote>
<p>These are some of the reasons for which we won’t inline a method:</p>
<ul>
<li>
<p><strong>Method is marked as not inline</strong> with the CompilerServices.MethodImpl attribute.</p>
</li>
<li>
<p><strong>Size of inlinee is limited to 32 bytes of IL</strong>: This is a heuristic, the rationale behind it is that usually, when you have methods bigger than that, the overhead of the call will not be as significative compared to the work the method does. Of course, as a heuristic, it fails in some situations. There have been suggestions for us adding an attribute to control these threshold. For Whidbey, that attribute has not been added (it has some very bad properties: it’s x86 JIT specific and it’s longterm value, as compilers get smarter, is dubious).</p>
</li>
<li>
<p><strong>Virtual calls</strong>: We don’t inline across virtual calls. The reason for not doing this is that we don’t know the final target of the call. We could potentially do better here (for example, if 99% of calls end up in the same target, you can generate code that does a check on the method table of the object the virtual call is going to execute on, if it’s not the 99% case, you do a call, else you just execute the inlined code), but unlike the J language, most of the calls in the primary languages we support, are not virtual, so we’re not forced to be so aggressive about optimizing this case.</p>
</li>
<li>
<p><strong>Valuetypes</strong>: We have several limitations regarding value types an inlining. We take the blame here, this is a limitation of our JIT, we could do better and we know it. Unfortunately, when stack ranked against other features of Whidbey, getting some statistics on how frequently methods cannot be inlined due to this reason and considering the cost of making this area of the JIT significantly better, we decided that it made more sense for our customers to spend our time working in other optimizations or CLR features. Whidbey is better than previous versions in one case: value types that only have a pointer size int as a member, this was (relatively) not expensive to make better, and helped a lot in common value types such as pointer wrappers (IntPtr, etc).</p>
</li>
<li>
<p><strong>MarshalByRef</strong>: Call targets that are in MarshalByRef classes won’t be inlined (call has to be intercepted and dispatched). We’ve got better in Whidbey for this scenario</p>
</li>
<li>
<p><strong>VM restrictions</strong>: These are mostly security, the JIT must ask the VM for permission to inline a method (see CEEInfo::canInline in Rotor source to get an idea of what kind of things the VM checks for).</p>
</li>
<li>
<p><strong>Complicated flowgraph</strong>: We don’t inline loops, methods with exception handling regions, etc…</p>
</li>
<li>
<p>If basic block that has the call is <strong>deemed as it won’t execute frequently</strong> (for example, a basic block that has a throw, or a static class constructor), inlining is much less aggressive (as the only real win we can make is code size)</p>
</li>
<li>
<p><strong>Other</strong>: Exotic IL instructions, security checks that need a method frame, etc…</p>
</li>
</ul>
</blockquote>
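<p>The first rule in the list is one you can try out yourself using the <code class="language-plaintext highlighter-rouge">MethodImpl</code> attribute. The sketch below is a rough illustration rather than a proper benchmark: it uses a plain <code class="language-plaintext highlighter-rouge">Stopwatch</code> loop, the class and method names are hypothetical, and the timings you see will depend on your runtime, JIT and hardware. Note that <code class="language-plaintext highlighter-rouge">AggressiveInlining</code> is only a hint, so the JIT may still decline to inline.</p>

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class InliningHints
{
    // Tells the JIT that it must not inline this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static long AddNoInline(long a, long b) { return a + b; }

    // Asks the JIT to inline this method where possible (a hint, not a guarantee).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static long AddInline(long a, long b) { return a + b; }

    public static long SumNoInline(int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
            sum = AddNoInline(sum, i);
        return sum;
    }

    public static long SumInline(int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
            sum = AddInline(sum, i);
        return sum;
    }

    public static void Main()
    {
        const int iterations = 100000000;

        // Call each method once so it has been JIT-compiled before timing.
        SumNoInline(1000);
        SumInline(1000);

        Stopwatch timer = Stopwatch.StartNew();
        long first = SumNoInline(iterations);
        timer.Stop();
        Console.WriteLine("NoInlining:         " + timer.ElapsedMilliseconds + " ms (sum " + first + ")");

        timer.Restart();
        long second = SumInline(iterations);
        timer.Stop();
        Console.WriteLine("AggressiveInlining: " + timer.ElapsedMilliseconds + " ms (sum " + second + ")");
    }
}
```

<p>Returning and printing the sums stops the JIT from discarding the loops as dead code. For measurements you actually want to rely on, a tool like BenchmarkDotNet is a better choice than a hand-rolled loop like this one.</p>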
<h2 id="summary">Summary</h2>
<p>So we can see that BenchmarkDotNet will display multiple pieces of information that allow you to diagnose why your benchmarks take the time they do:</p>
<ol>
<li>Amount of Bytes allocated per Benchmark</li>
<li>Number of GC Collections triggered (Gen 0/1/2)</li>
<li>Whether a method was inlined or not</li>
</ol>
<p>The post <a href="http://www.mattwarren.org/2016/03/09/adventures-in-benchmarking-method-inlining/">Adventures in Benchmarking - Method Inlining</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Adventures in Benchmarking - Memory Allocations2016-02-17T00:00:00+00:00http://www.mattwarren.org/2016/02/17/adventures-in-benchmarking-memory-allocations
<p>For a while now I’ve been involved in the Open Source <a href="https://github.com/PerfDotNet/BenchmarkDotNet">BenchmarkDotNet</a> library along with <a href="https://github.com/AndreyAkinshin">Andrey Akinshin</a> the project owner. Our goal has been to produce a .NET Benchmarking library that is:</p>
<ol>
<li>Accurate</li>
<li>Easy-to-use</li>
<li>Helpful</li>
</ol>
<p>First and foremost we do everything we can to ensure that BenchmarkDotNet gives you accurate measurements, everything else is just <a href="http://www.brainyquote.com/quotes/quotes/p/paulwalker185136.html">“sprinkles on the sundae”</a>. That is, without accurate measurements, a benchmarking library is pretty useless, especially one that displays results in nanoseconds.</p>
<p>But once point 1) <a href="https://github.com/PerfDotNet/BenchmarkDotNet#how-it-works">has been dealt with</a>, 2) is a bit more subjective. Using BenchmarkDotNet involves little more than adding a <code class="language-plaintext highlighter-rouge">[Benchmark]</code> attribute to your method and then running it as per the <a href="https://github.com/PerfDotNet/BenchmarkDotNet#getting-started">Step-by-step guide</a> in the GitHub README. I’ll let you decide if that is <em>easy-to-use</em> or not, but again it’s something we strive for. Once you’re done with the “Getting Started” guide, there is also a complete set of <a href="https://github.com/PerfDotNet/BenchmarkDotNet/tree/master/BenchmarkDotNet.Samples/Intro">Tutorial Benchmarks</a> available, as well as some more <a href="https://github.com/PerfDotNet/BenchmarkDotNet/tree/master/BenchmarkDotNet.Samples">real-world examples</a> for you to take a look at.</p>
<h2 id="being-helpful">Being “Helpful”</h2>
<p>But this post isn’t going to be a general BenchmarkDotNet tutorial, instead I’m going to focus on some of the specific tools that it gives you to diagnose what is going on in a benchmark, or to put it another way, to help you answer the question “Why is Benchmark A slower than Benchmark B?”</p>
<h3 id="string-concat-vs-stringbuilder">String Concat vs StringBuilder</h3>
<p>Let’s start with a simple benchmark:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">Framework_StringConcatVsStringBuilder</span>
<span class="p">{</span>
<span class="p">[</span><span class="nf">Params</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span> <span class="m">5</span><span class="p">,</span> <span class="m">10</span><span class="p">,</span> <span class="m">15</span><span class="p">,</span> <span class="m">20</span><span class="p">)]</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">Loops</span><span class="p">;</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">StringConcat</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">string</span> <span class="n">result</span> <span class="p">=</span> <span class="kt">string</span><span class="p">.</span><span class="n">Empty</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">Loops</span><span class="p">;</span> <span class="p">++</span><span class="n">i</span><span class="p">)</span>
<span class="n">result</span> <span class="p">=</span> <span class="kt">string</span><span class="p">.</span><span class="nf">Concat</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="kt">string</span> <span class="nf">StringBuilder</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">StringBuilder</span> <span class="n">sb</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">StringBuilder</span><span class="p">(</span><span class="kt">string</span><span class="p">.</span><span class="n">Empty</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">Loops</span><span class="p">;</span> <span class="p">++</span><span class="n">i</span><span class="p">)</span>
<span class="n">sb</span><span class="p">.</span><span class="nf">Append</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="nf">ToString</span><span class="p">());</span>
<span class="k">return</span> <span class="n">sb</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note: In case it’s not obvious the <code class="language-plaintext highlighter-rouge">[Params(..)]</code> attribute lets you run the same benchmark for a set of different input values. In this case the <code class="language-plaintext highlighter-rouge">Loops</code> field is set to each of the values in turn, i.e. <code class="language-plaintext highlighter-rouge">1, 2, 3, 4, 5, 10, 15, 20</code>, before another instance of the benchmark is run.</p>
<p>If you’ve been programming in C# for long enough, you’ll no doubt have been given the guidance <a href="http://www.yoda.arachsys.com/csharp/stringbuilder.html">“use StringBuilder to concatenate strings”</a>, but what is the actual difference?</p>
<p><img src="/images/2016/02/Framework_StringConcatVsStringBuilder - Basic Results.png" alt="StringConcat Vs StringBuilder - Basic Results" /></p>
<p>Well, in terms of time taken there <em>is</em> a difference, but even with <code class="language-plaintext highlighter-rouge">20</code> loops it’s not huge. We are talking about roughly <code class="language-plaintext highlighter-rouge">500 ns</code>, i.e. <code class="language-plaintext highlighter-rouge">0.0005 ms</code>, so you would have to be doing it <em>a lot</em> to notice a slow-down.</p>
<p>However, this time let’s see what the results look like if we have the BenchmarkDotNet “Garbage Collection” (GC) Diagnostics enabled:</p>
<p><img src="/images/2016/02/Framework_StringConcatVsStringBuilder - GC Results - cutdown.png" alt="StringConcat Vs StringBuilder - Results with GC Diagnostic" /></p>
<p>Here we can clearly see a difference between the benchmarks. Once we get beyond 10 loops, the <code class="language-plaintext highlighter-rouge">StringBuilder</code> benchmark is far more efficient than <code class="language-plaintext highlighter-rouge">StringConcat</code>. It causes far fewer “Generation 0” collections and allocates roughly 50% fewer bytes for each <code class="language-plaintext highlighter-rouge">Operation</code>, i.e. each invocation of the benchmark method.</p>
<p>It’s worth noting that <strong>in this case</strong>, 10 loops is the break-even point. Before that point <code class="language-plaintext highlighter-rouge">StringConcat</code> is marginally faster and allocates less memory, but after that point <code class="language-plaintext highlighter-rouge">StringBuilder</code> is more efficient. The reason is that there is a memory overhead for the <code class="language-plaintext highlighter-rouge">StringBuilder</code> class itself, which dominates the cost when you are only appending a few short strings (as we are in this particular benchmark). Interestingly enough, the .NET Runtime developers noticed this overhead and so <a href="http://referencesource.microsoft.com/#mscorlib/system/text/stringbuildercache.cs,a6dbe82674916ac0">introduced a StringBuilder Cache</a> to enable re-use of existing instances, rather than allocating a new one every time.</p>
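<p>To make the caching idea a bit more concrete, here is a minimal sketch of a per-thread cache written in the same spirit. The class name <code class="language-plaintext highlighter-rouge">SimpleStringBuilderCache</code> and the <code class="language-plaintext highlighter-rouge">MaxCachedCapacity</code> limit are made up for illustration, so treat it as an approximation of the idea rather than the runtime’s own code:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Text;

// Hypothetical helper, for illustration only
static class SimpleStringBuilderCache
{
    // Assumed limit: very large builders are not worth keeping alive
    const int MaxCachedCapacity = 360;

    // One cached instance per thread, so no locking is needed
    [ThreadStatic]
    static StringBuilder cached;

    public static StringBuilder Acquire()
    {
        StringBuilder sb = cached;
        if (sb == null)
            return new StringBuilder();

        cached = null;   // hand the instance out to one caller at a time
        sb.Clear();      // reset the contents but keep the allocated buffer
        return sb;
    }

    public static string GetStringAndRelease(StringBuilder sb)
    {
        string result = sb.ToString();
        if (MaxCachedCapacity >= sb.Capacity)
            cached = sb; // keep it for the next Acquire() on this thread
        return result;
    }
}
</code></pre></div></div>
<p>A caller would use <code class="language-plaintext highlighter-rouge">Acquire()</code> in place of <code class="language-plaintext highlighter-rouge">new StringBuilder()</code> and finish with <code class="language-plaintext highlighter-rouge">GetStringAndRelease(sb)</code> in place of <code class="language-plaintext highlighter-rouge">sb.ToString()</code>, so the only allocation left in the common case is the resulting string.</p>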
<h3 id="dictionary-vs-idictionary">Dictionary vs IDictionary</h3>
<p>But what about a less well-known example? Imagine that after some refactoring you noticed that your application was triggering a lot more Gen 0/1/2 collections (you do monitor this in your live systems, right?). After looking at the recent code commits and carrying out some profiling, you narrow the problem down to a refactoring that changed a variable declaration from <code class="language-plaintext highlighter-rouge">Dictionary</code> to <code class="language-plaintext highlighter-rouge">IDictionary</code>, i.e. exactly the type of refactoring that this <a href="http://stackoverflow.com/questions/1595498/a-difference-in-style-idictionary-vs-dictionary">Stack Overflow question is discussing</a>.</p>
<p>To benchmark what’s actually going on here, we can write some code like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">Framework_DictionaryVsIDictionary</span>
<span class="p">{</span>
<span class="n">Dictionary</span><span class="p"><</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">></span> <span class="n">dict</span><span class="p">;</span>
<span class="n">IDictionary</span><span class="p"><</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">></span> <span class="n">idict</span><span class="p">;</span>
<span class="p">[</span><span class="n">Setup</span><span class="p">]</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">Setup</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">dict</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Dictionary</span><span class="p"><</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">>();</span>
<span class="n">idict</span> <span class="p">=</span> <span class="p">(</span><span class="n">IDictionary</span><span class="p"><</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">>)</span><span class="n">dict</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="n">Dictionary</span><span class="p"><</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">></span> <span class="nf">DictionaryEnumeration</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">item</span> <span class="k">in</span> <span class="n">dict</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">dict</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="n">Benchmark</span><span class="p">]</span>
<span class="k">public</span> <span class="n">IDictionary</span><span class="p"><</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">></span> <span class="nf">IDictionaryEnumeration</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">item</span> <span class="k">in</span> <span class="n">idict</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">idict</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note: we are deliberately not doing anything with the items inside the <code class="language-plaintext highlighter-rouge">foreach</code> loop because we just want to see what the difference in iteration of the 2 collections is. Also note that we are using the <strong>same underlying data structure</strong>; we are just accessing it via an <code class="language-plaintext highlighter-rouge">IDictionary</code> cast in the 2nd benchmark.</p>
<p>So what results do we get:</p>
<p><img src="/images/2016/02/Dictionary v IDictionary - GC Results.png" alt="Dictionary v IDictionary - GC Results.png" /></p>
<p>Nice and clear: accessing the same data via the <code class="language-plaintext highlighter-rouge">IDictionary</code> interface causes a lot of extra allocations, roughly 22 bytes per <code class="language-plaintext highlighter-rouge">foreach</code> loop. This in turn triggers a lot of extra GC collections. It’s worth pointing out that when BenchmarkDotNet executes, it will run the same benchmark method, <code class="language-plaintext highlighter-rouge">IDictionaryEnumeration()</code> in this case, millions of times, so that we can obtain an accurate measurement. Therefore the actual # of <code class="language-plaintext highlighter-rouge">Gen 0</code> collections isn’t so important; it is the relative amount compared to the <code class="language-plaintext highlighter-rouge">DictionaryEnumeration()</code> benchmark that should be looked at.</p>
<p>Now this scenario might seem a bit contrived and I have to admit that I knew the answer before I started investigating it, however it did originate from a real-life issue, discovered by <a href="https://twitter.com/ben_a_adams">Ben Adams</a>. For the full background take a look at the CoreCLR GitHub issue, <a href="https://github.com/dotnet/coreclr/issues/1579">Avoid enumeration allocation via interface</a>. As shown below, this was identified because in Kestrel/ASP.NET the request/response headers are kept in an <code class="language-plaintext highlighter-rouge">IDictionary</code> data structure and so cause an additional 128 MBytes of garbage per second, when running at 1 Million requests per second.</p>
<p><a href="https://github.com/dotnet/coreclr/issues/1579#issuecomment-141432753"><img src="/images/2016/02/Dictionary v IDictionary - In Kestrel and ASPNET.png" alt="Dictionary v IDictionary - In Kestrel and ASPNET" /></a></p>
<p>Finally, what is the technical explanation for the additional allocations? Quoting from <a href="https://github.com/dotnet/coreclr/issues/1579#issuecomment-141133843">Stephen Toub of Microsoft</a>:</p>
<blockquote>
<p>… But when accessed via the interface, you’re using the interface method that’s typed to return <code class="language-plaintext highlighter-rouge">IEnumerator<KeyValuePair<TKey,TValue>></code> rather than <code class="language-plaintext highlighter-rouge">Dictionary<TKey, TValue>.Enumerator</code>, <strong>so the struct gets boxed</strong>.</p>
</blockquote>
<p>and then <a href="https://github.com/dotnet/coreclr/issues/1579#issuecomment-142953036">further down the same issue</a>:</p>
<blockquote>
<p>Yes, the issue isn’t just enumerator allocations, it’s also interface-based dispatch. In addition to boxing the enumerator, the <code class="language-plaintext highlighter-rouge">MoveNext</code> and <code class="language-plaintext highlighter-rouge">Current</code> calls made per element <strong>go from being potentially-inlineable non-virtual calls to being interface calls</strong>.</p>
</blockquote>
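<p>The boxing described in those quotes can be observed outside of BenchmarkDotNet with a small console program. This is only a rough sketch: the class and method names are made up, it assumes a runtime that provides <code class="language-plaintext highlighter-rouge">GC.GetAllocatedBytesForCurrentThread()</code>, and the exact numbers will vary by runtime version (a JIT that manages to avoid the boxing would report little or no difference):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

class EnumeratorBoxingDemo
{
    static void Main()
    {
        var dict = new Dictionary<string, string> { { "key", "value" } };
        IDictionary<string, string> idict = dict;   // same object, interface-typed reference

        // Warm up both paths so one-off start-up costs are not measured
        EnumerateDirect(dict);
        EnumerateViaInterface(idict);

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i != 1000; i++)
            EnumerateDirect(dict);
        long direct = GC.GetAllocatedBytesForCurrentThread() - before;

        before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i != 1000; i++)
            EnumerateViaInterface(idict);
        long viaInterface = GC.GetAllocatedBytesForCurrentThread() - before;

        Console.WriteLine("Dictionary:  " + direct + " bytes allocated");
        Console.WriteLine("IDictionary: " + viaInterface + " bytes allocated");
    }

    // foreach binds to Dictionary's own GetEnumerator(), which returns a struct
    static void EnumerateDirect(Dictionary<string, string> d)
    {
        foreach (var item in d) { }
    }

    // foreach binds to the interface's GetEnumerator(), which returns an
    // IEnumerator, so the struct enumerator has to be boxed on the heap
    static void EnumerateViaInterface(IDictionary<string, string> d)
    {
        foreach (var item in d) { }
    }
}
</code></pre></div></div>
<p>Both loops walk exactly the same dictionary, so any difference between the two figures comes purely from the static type used to access it.</p>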
<h2 id="implementation-details">Implementation Details</h2>
<p><strong>Update Feb 2017</strong> - This section is now out-of-date as the implementation details have now changed, please see Adam Sitnik’s <a href="http://adamsitnik.com/the-new-Memory-Diagnoser/">blog post</a> for all the details</p>
<p>This is all made possible by the excellent <a href="https://msdn.microsoft.com/en-us/library/ff356162(v=vs.110).aspx">Garbage Collection ETW Events</a> that the .NET runtime produces, in particular the <a href="https://msdn.microsoft.com/en-us/library/ff356162(v=vs.110).aspx#gcallocationtick_v2_event">GCAllocationTick_V2 Event</a> that is fired each time approximately 100 KB is allocated. An XML representation of a typical event is shown below; you can see that <code class="language-plaintext highlighter-rouge">0x1A060</code> or 106,592 bytes have just been allocated.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><UserData></span>
<span class="nt"><GCAllocationTick_V3</span> <span class="na">xmlns=</span><span class="s">'myNs'</span><span class="nt">></span>
<span class="nt"><AllocationAmount></span>0x1A060<span class="nt"></AllocationAmount></span>
<span class="nt"><AllocationKind></span>0<span class="nt"></AllocationKind></span>
<span class="nt"><ClrInstanceID></span>34<span class="nt"></ClrInstanceID></span>
<span class="nt"><AllocationAmount64></span>0x1A060<span class="nt"></AllocationAmount64></span>
<span class="nt"><TypeID></span>0xEE05D18<span class="nt"></TypeID></span>
<span class="nt"><TypeName></span>LibGit2Sharp.Core.GitDiffFile<span class="nt"></TypeName></span>
<span class="nt"><HeapIndex></span>0<span class="nt"></HeapIndex></span>
<span class="nt"><Address></span>0x32056CD0<span class="nt"></Address></span>
<span class="nt"></GCAllocationTick_V3></span>
<span class="nt"></UserData></span>
</code></pre></div></div>
<p>To collect these events BenchmarkDotNet uses the <a href="https://technet.microsoft.com/en-gb/library/cc753820.aspx">logman tool</a> that is built into Windows. This runs in the background and collects the specified ETW events until you ask it to stop. These events are continuously written to an <code class="language-plaintext highlighter-rouge">.etl</code> file that can then be read by tools such as <a href="https://msdn.microsoft.com/en-us/library/windows/hardware/hh448170.aspx">Windows Performance Analyzer</a>. Once the ETW events have been collected, BenchmarkDotNet then parses them using the excellent <a href="https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent">TraceEvent</a> library, using code like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="p">(</span><span class="kt">var</span> <span class="n">source</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">ETWTraceEventSource</span><span class="p">(</span><span class="n">fileName</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">source</span><span class="p">.</span><span class="n">Clr</span><span class="p">.</span><span class="n">GCAllocationTick</span> <span class="p">+=</span> <span class="p">(</span><span class="n">gcData</span> <span class="p">=></span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">statsPerProcess</span><span class="p">.</span><span class="nf">ContainsKey</span><span class="p">(</span><span class="n">gcData</span><span class="p">.</span><span class="n">ProcessID</span><span class="p">))</span>
<span class="n">statsPerProcess</span><span class="p">[</span><span class="n">gcData</span><span class="p">.</span><span class="n">ProcessID</span><span class="p">].</span><span class="n">AllocatedBytes</span> <span class="p">+=</span> <span class="n">gcData</span><span class="p">.</span><span class="n">AllocationAmount64</span><span class="p">;</span>
<span class="p">});</span>
<span class="n">source</span><span class="p">.</span><span class="n">Clr</span><span class="p">.</span><span class="n">GCStart</span> <span class="p">+=</span> <span class="p">(</span><span class="n">gcData</span> <span class="p">=></span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">statsPerProcess</span><span class="p">.</span><span class="nf">ContainsKey</span><span class="p">(</span><span class="n">gcData</span><span class="p">.</span><span class="n">ProcessID</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">genCounts</span> <span class="p">=</span> <span class="n">statsPerProcess</span><span class="p">[</span><span class="n">gcData</span><span class="p">.</span><span class="n">ProcessID</span><span class="p">].</span><span class="n">GenCounts</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">gcData</span><span class="p">.</span><span class="n">Depth</span> <span class="p">>=</span> <span class="m">0</span> <span class="p">&&</span> <span class="n">gcData</span><span class="p">.</span><span class="n">Depth</span> <span class="p"><</span> <span class="n">genCounts</span><span class="p">.</span><span class="n">Length</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// ignore calls to GC.Collect(..) from BenchmarkDotNet itself</span>
<span class="k">if</span> <span class="p">(</span><span class="n">gcData</span><span class="p">.</span><span class="n">Reason</span> <span class="p">!=</span> <span class="n">GCReason</span><span class="p">.</span><span class="n">Induced</span><span class="p">)</span>
<span class="n">genCounts</span><span class="p">[</span><span class="n">gcData</span><span class="p">.</span><span class="n">Depth</span><span class="p">]++;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="n">source</span><span class="p">.</span><span class="nf">Process</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<hr />
<p>Hopefully this has shown you some of the power of BenchmarkDotNet. Please consider giving it a go next time you need to (micro-)benchmark some .NET code; it should save you from having to hand-roll your own benchmarking code.</p>
<p>The post <a href="http://www.mattwarren.org/2016/02/17/adventures-in-benchmarking-memory-allocations/">Adventures in Benchmarking - Memory Allocations</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Technically Speaking - Anniversary Mentoring2016-02-16T00:00:00+00:00http://www.mattwarren.org/2016/02/16/technically-speaking-anniversary-mentoring
<p>I’ve been reading the excellent <a href="https://tinyletter.com/techspeak/archive">Technically Speaking</a> newsletter for a while now and when they announced they would be running a <a href="http://www.catehuston.com/blog/2015/12/07/running-a-mentoring-program/">mentoring program</a>, I jumped at the chance and applied straight away. The idea was that each applicant had to set themselves speaking goals or identify areas they wanted to improve and then if you were selected <a href="https://twitter.com/techspeakdigest">@techspeakdigest</a> would set you up with a mentor.</p>
<p>I was fortunate enough to be chosen and assigned to <a href="https://twitter.com/catehstn">Cate</a>, one of the authors of the newsletter, who is also a prolific <a href="http://www.catehuston.com/blog/talks/">conference speaker</a>. As part of the scheme I had to identify the areas that I wanted to improve during the hour-long mentoring session, which for me were:</p>
<ul>
<li>Turning an outline into a good abstract.</li>
<li>Tips for getting a talk accepted via a CFP submission</li>
</ul>
<p>I’ve previously done <a href="/speaking">some talks</a> and they seemed to be well received, but I wanted to expand the range of topics I talked about and try and speak at some other conferences.</p>
<h2 id="writing-a-good-abstract">Writing a Good Abstract</h2>
<hr />
<p>At the start of the session Cate looked through an existing submission and offered some advice, which started with the initial comment of:</p>
<blockquote>
<p>Good idea, not well pitched</p>
</blockquote>
<p>She then went on to offer some really great tips about what conferences were looking for and how I could develop my abstract. I’ve put the rest of my notes below and left them as I wrote them down, so they are a bit jumbled, but they reflect what happened during the conversation!</p>
<h3 id="tips-for-an-abstract-after-reading-mine">Tips for an abstract (after reading mine):</h3>
<ol>
<li>Be pragmatic, too much “<em>one true way</em>” can put people off. Maybe a bit too opinionated.</li>
<li>Don’t tie your talk to just one library, might alienate people too much.</li>
</ol>
<h3 id="talk-outlinestructure">Talk outline/structure</h3>
<ol>
<li><strong>Explain</strong> - what does it mean to write faster code</li>
<li><strong>Situate</strong> - optimisation - what is it? how do you do it? benchmark, etc</li>
<li><strong>Apply</strong> - specific examples</li>
</ol>
<h3 id="other-suggestions">Other suggestions</h3>
<blockquote>
<p>If listeners (or conference organisation committee) <strong>agree with your assumptions</strong>, they might be more likely to choose your pitch</p>
</blockquote>
<ul>
<li>
<p>Be careful about being too specific in the abstract</p>
</li>
<li>
<p>Don’t put too much in the abstract, leave some specifics out</p>
</li>
</ul>
<blockquote>
<p><em>be compelling, but a little bit vague</em></p>
</blockquote>
<ul>
<li>
<p>1 or 2 examples of what <strong>not</strong> to do is okay, but must give them something to <strong>do</strong> afterwards, otherwise you could put them off.</p>
</li>
<li>Broad v. Narrow talks
<ul>
<li>Most conferences will want “<em>broader talks</em>”</li>
</ul>
</li>
<li><strong>Bio</strong> is pitch for you
<ul>
<li><strong>Abstract</strong> is pitch for your talk</li>
</ul>
</li>
</ul>
<hr />
<p>Finally, as well as offering general advice, Cate also took the time to help me re-write an existing abstract I’d put together. I’ve included the “before” and “after” below, so you can see the difference. Whilst it’s hard to see someone pick apart what you’ve written, I do agree that the “after” reads much better and sounds more compelling than the “before”!</p>
<h3 id="before">Before</h3>
<blockquote>
<p><strong>Microbenchmarks and Optimisations</strong></p>
<p>We all want to write faster code right, but how do we know it really is faster, how do we measure it correctly?</p>
<p>During this talk we will look at what mistakes to avoid when benchmarking .NET code and how to do it accurately. Along the way we will also discover some surprising code optimisations and explore why they are happening</p>
</blockquote>
<h3 id="after">After</h3>
<blockquote>
<p><strong>Where the Wild Things Are - Finding Performance Problems Before They Bite You</strong></p>
<p>You don’t want to prematurely optimize, but sometimes you want to optimize, the question is - where to start? Benchmarking can help you figure out what your application is doing and where performance problems could arise - allowing you to find (and fix!) them before your customers do.</p>
<p>If you aren’t already benchmarking your code this talk will offer some starting points. We’ll look at how to accurately benchmark in .NET and things to avoid. Along the way we’ll also discover some surprising code optimisations!</p>
</blockquote>
<h2 id="the-end-result">The End Result</h2>
<p>After the mentoring with Cate took place I was accepted to talk at <a href="http://www.progscon.co.uk/talks">ProgSCon London 2016</a>, so obviously the tips and re-write of my abstract made a big difference!!</p>
<p><a href="http://www.progscon.co.uk/program"><img src="/images/2016/02/Talk at ProgSCon London.png" alt="Talk at ProgSCon London" /></a></p>
<p>So thanks to <a href="https://twitter.com/chiuki">Chiu-Ki Chan</a> and <a href="https://twitter.com/catehstn">Cate</a> for producing Technically Speaking every week, it’s certainly helped me out!</p>
Learning How Garbage Collectors Work - Part 12016-02-04T00:00:00+00:00http://www.mattwarren.org/2016/02/04/learning-how-garbage-collectors-work-part-1
<p>This series is an attempt to learn more about how a real-life “Garbage Collector” (GC) works internally, i.e. not so much “<em>what it does</em>”, but “<em>how it does it</em>” at a low-level. I will mostly be concentrating on the .NET GC, because I’m a .NET developer and also because it’s recently been <a href="/2015/12/08/open-source-net-1-year-later/">Open Sourced</a> so we can actually look at the code.</p>
<p><strong>Note:</strong> If you do want to learn about what a GC does, I really recommend the talk <a href="https://vimeo.com/113632451">Everything you need to know about .NET memory</a> by Ben Emmett. It’s a fantastic talk that uses Lego to explain what the .NET GC does (the <a href="http://www.slideshare.net/benemmett/net-memory-management-ndc-london">slides are also available</a>).</p>
<p>Well, trying to understand what the .NET GC does by looking at the source was my original plan, but if you go and take a look at the <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/gc.cpp">code on GitHub</a> you will be presented with the message “<em>This file has been truncated,…</em>”:</p>
<p><a href="https://github.com/dotnet/coreclr/blob/master/src/gc/gc.cpp"><img src="https://cloud.githubusercontent.com/assets/157298/12352478/49f74242-bb7e-11e5-8028-5df72943f58a.png" alt="gc.cpp on GitHub" /></a></p>
<p>This is because the file is <strong>36,915</strong> lines long and <strong>1.19MB</strong> in size! Now before you send a PR to Microsoft that chops it up into smaller bits, you might want to read this <a href="https://github.com/dotnet/coreclr/issues/408">discussion on reorganizing gc.cpp</a>. It turns out you are not the only one who’s had that idea and your PR will probably be rejected, for some <a href="https://github.com/dotnet/coreclr/issues/408#issuecomment-78014795">specific reasons</a>.</p>
<h2 id="goals-of-the-gc">Goals of the GC</h2>
<p>So, as I’m not going to be able to read and understand a 36 KLOC .cpp source file any time soon, instead I tried a different approach and started off by looking through the excellent Book-of-the-Runtime (BOTR) section on the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md#design-of-the-collector">“Design of the Collector”</a>. This very helpfully lists the following goals of the .NET GC (<strong>emphasis</strong> mine):</p>
<blockquote>
<p>The GC strives to manage memory <strong>extremely efficiently</strong> and require <strong>very little effort from people who write managed code</strong>. Efficient means:</p>
<ul>
<li>GCs should occur often enough to <strong>avoid the managed heap containing a significant amount (by ratio or absolute count) of unused but allocated objects</strong> (garbage), and therefore use memory unnecessarily.</li>
<li>GCs should happen as <strong>infrequently as possible to avoid using otherwise useful CPU time</strong>, even though frequent GCs would result in lower memory usage.</li>
<li><strong>A GC should be productive</strong>. If GC reclaims a small amount of memory, then the GC (including the associated CPU cycles) was wasted.</li>
<li><strong>Each GC should be fast</strong>. Many workloads have low latency requirements.</li>
<li><strong>Managed code developers shouldn’t need to know much about the GC to achieve good memory utilization</strong> (relative to their workload). – The GC should tune itself to satisfy different memory usage patterns.</li>
</ul>
</blockquote>
<p>So there are some interesting points in there; in particular they twice included the goal of ensuring developers don’t have to know much about the GC to make it efficient. This is probably one of the main differences between the .NET and Java GC implementations, as explained in an answer to the Stack Overflow question <a href="http://stackoverflow.com/questions/492703/net-vs-java-garbage-collector/492821#492821">“<em>.Net vs Java Garbage Collector</em>”</a>:</p>
<blockquote>
<p>A difference between Oracle’s and Microsoft’s GC implementation ‘ethos’ is one of configurability.</p>
<p>Oracle provides a vast number of options (at the command line) to tweak aspects of the GC or switch it between different modes. Many options are of the -X or -XX to indicate their lack of support across different versions or vendors. The CLR by contrast provides next to no configurability; your only real option is the use of the server or client collectors which optimise for throughput verses latency respectively.</p>
</blockquote>
<hr />
<h2 id="net-gc-sample">.NET GC Sample</h2>
<p>So now we have an idea about what the goals of the GC are, let’s take a look at how it goes about things. Fortunately those nice developers at Microsoft released a <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/sample/GCSample.cpp">GC Sample</a> that shows you, at a basic level, how you can use the full .NET GC in your own code. After building the sample (and <a href="https://github.com/dotnet/coreclr/pull/2582">finding a few bugs in the process</a>), I was able to get a simple, single-threaded Workstation GC up and running.</p>
<p>What’s interesting about the sample application is that it clearly shows you what actions the <a href="https://github.com/mattwarren/GCSample/blob/master/sample/GCSample.cpp#L11-L37">.NET Runtime has to perform to make the GC work</a>. So for instance, at a high-level the runtime needs to go through the following process to allocate an object:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">AllocateObject(..)</code>
<ul>
<li>See below for the code and explanation of the allocation process</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">CreateGlobalHandle(..)</code>
<ul>
<li>If we want to store the object in a “strong handle/reference”, as opposed to a “weak” one. In C# code this would typically be a static variable. This is what tells the GC that the object is referenced, so that it knows the object shouldn’t be cleaned up when a GC collection happens.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">ErectWriteBarrier(..)</code>
<ul>
<li>For more information see “Marking the Card Table” below</li>
</ul>
</li>
</ol>
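<p>As a rough managed-code analogy for the strong versus weak distinction in step 2, the public <code class="language-plaintext highlighter-rouge">GCHandle</code> type lets you create both kinds of handle from C#. Mapping it onto the sample’s <code class="language-plaintext highlighter-rouge">CreateGlobalHandle(..)</code> is my own assumption, and the outcome of a forced collection is not guaranteed (a Debug build, for instance, can keep objects alive for longer), so the sketch below deliberately doesn’t promise a particular output:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.InteropServices;

class HandleDemo
{
    static void Main()
    {
        // A "Normal" handle is a strong reference: it keeps its target alive
        GCHandle strong = GCHandle.Alloc(new object(), GCHandleType.Normal);

        // A "Weak" handle tracks its target but does not keep it alive
        GCHandle weak = GCHandle.Alloc(new object(), GCHandleType.Weak);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // The strong target must still be there. The weak target is usually
        // gone by now, in which case Target returns null.
        Console.WriteLine("strong target alive: " + (strong.Target != null));
        Console.WriteLine("weak target alive:   " + (weak.Target != null));

        // Handles are not collected automatically, so free them explicitly
        strong.Free();
        weak.Free();
    }
}
</code></pre></div></div>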
<h3 id="allocating-an-object">Allocating an Object</h3>
<p><a href="https://github.com/dotnet/coreclr/blob/master/src/gc/sample/GCSample.cpp#L55-L79"><code class="language-plaintext highlighter-rouge">AllocateObject(..)</code> code from GCSample.cpp</a></p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Object</span> <span class="p">*</span> <span class="nf">AllocateObject</span><span class="p">(</span><span class="n">MethodTable</span> <span class="p">*</span> <span class="n">pMT</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">alloc_context</span> <span class="p">*</span> <span class="n">acontext</span> <span class="p">=</span> <span class="nf">GetThread</span><span class="p">()-></span><span class="nf">GetAllocContext</span><span class="p">();</span>
<span class="n">Object</span> <span class="p">*</span> <span class="n">pObject</span><span class="p">;</span>
<span class="n">size_t</span> <span class="n">size</span> <span class="p">=</span> <span class="n">pMT</span><span class="p">-></span><span class="nf">GetBaseSize</span><span class="p">();</span>
<span class="n">uint8_t</span><span class="p">*</span> <span class="n">result</span> <span class="p">=</span> <span class="n">acontext</span><span class="p">-></span><span class="n">alloc_ptr</span><span class="p">;</span>
<span class="n">uint8_t</span><span class="p">*</span> <span class="n">advance</span> <span class="p">=</span> <span class="n">result</span> <span class="p">+</span> <span class="n">size</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">advance</span> <span class="p"><=</span> <span class="n">acontext</span><span class="p">-></span><span class="n">alloc_limit</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">acontext</span><span class="p">-></span><span class="n">alloc_ptr</span> <span class="p">=</span> <span class="n">advance</span><span class="p">;</span>
<span class="n">pObject</span> <span class="p">=</span> <span class="p">(</span><span class="n">Object</span> <span class="p">*)</span><span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">pObject</span> <span class="p">=</span> <span class="n">GCHeap</span><span class="p">::</span><span class="nf">GetGCHeap</span><span class="p">()-></span><span class="nf">Alloc</span><span class="p">(</span><span class="n">acontext</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="m">0</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pObject</span> <span class="p">==</span> <span class="n">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="n">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pObject</span><span class="p">-></span><span class="nf">RawSetMethodTable</span><span class="p">(</span><span class="n">pMT</span><span class="p">);</span>
<span class="k">return</span> <span class="n">pObject</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>To understand what’s going on here, the BOTR again comes in handy as it gives us a clear overview of the process, from <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md#design-of-allocator">“Design of Allocator”</a>:</p>
<blockquote>
<p>When the GC gives out memory to the allocator, it does so in terms of allocation contexts. The size of an allocation context is defined by the allocation quantum.</p>
</blockquote>
<blockquote>
<ul>
<li>Allocation contexts are smaller regions of a given heap segment that are each dedicated for use by a given thread. On a single-processor (meaning 1 logical processor) machine, a single context is used, which is the generation 0 allocation context.</li>
<li>The Allocation quantum is the size of memory that the allocator allocates each time it needs more memory, in order to perform object allocations within an allocation context. The allocation is typically 8k and the average size of managed objects are around 35 bytes, enabling a single allocation quantum to be used for many object allocations.</li>
</ul>
</blockquote>
<p>This shows how it is possible for the .NET GC to make allocating an object (or memory) such a cheap operation. Because of all the work that it has done in the background, the majority of the time an object allocation happens, it is just a case of incrementing a pointer by the number of bytes needed to hold the new object. This is what the code in the first half of the <code class="language-plaintext highlighter-rouge">AllocateObject(..)</code> function (above) is doing: it bumps up the free-space pointer (<code class="language-plaintext highlighter-rouge">acontext->alloc_ptr</code>) and gives out a pointer to the newly created space in memory.</p>
<p>It’s only when the current <strong>allocation context</strong> doesn’t have enough space that things get more complicated and potentially more expensive. At this point <code class="language-plaintext highlighter-rouge">GCHeap::GetGCHeap()->Alloc(..)</code> is called which may in turn trigger a GC collection before a new allocation context can be provided.</p>
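<p>The fast path/slow path split described above can be sketched in a few lines. This is only a toy model, not the runtime’s code: <code class="language-plaintext highlighter-rouge">AllocContext</code>, <code class="language-plaintext highlighter-rouge">ToyHeap</code> and <code class="language-plaintext highlighter-rouge">kQuantumBytes</code> are hypothetical names, the 8k quantum is taken from the BOTR quote, and alignment, object headers and triggering a GC are all left out.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed quantum size, based on the "typically 8k" figure in the BOTR quote.
constexpr size_t kQuantumBytes = 8 * 1024;

// Hypothetical, simplified stand-in for the runtime's allocation context.
struct AllocContext {
    uint8_t* alloc_ptr = nullptr;    // next free byte in the current quantum
    uint8_t* alloc_limit = nullptr;  // one past the last usable byte
};

// Toy heap that hands out zeroed, fixed-size allocation quanta.
class ToyHeap {
public:
    int slow_path_calls = 0;

    // Slow path: give the context a fresh quantum and allocate from it.
    // (A real GC might decide to run a collection at this point.)
    void* Alloc(AllocContext& ctx, size_t size) {
        assert(size <= kQuantumBytes);
        quanta_.push_back(std::vector<uint8_t>(kQuantumBytes, 0));
        ctx.alloc_ptr = quanta_.back().data();
        ctx.alloc_limit = ctx.alloc_ptr + kQuantumBytes;
        ++slow_path_calls;
        uint8_t* result = ctx.alloc_ptr;
        ctx.alloc_ptr += size;
        return result;
    }

private:
    std::vector<std::vector<uint8_t>> quanta_;
};

// Fast path: if the request fits in the current context, just bump the pointer.
void* AllocateObject(ToyHeap& heap, AllocContext& ctx, size_t size) {
    if (ctx.alloc_ptr != nullptr &&
        size <= static_cast<size_t>(ctx.alloc_limit - ctx.alloc_ptr)) {
        uint8_t* result = ctx.alloc_ptr;
        ctx.alloc_ptr += size;
        return result;
    }
    return heap.Alloc(ctx, size);
}
```

<p>With this sketch, two back-to-back 32 byte requests on the same context only hit the slow path once, and the second object sits directly after the first in memory.</p>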
<p>Finally, it’s worth looking at the goals that the allocator was designed to achieve, again from the BOTR:</p>
<blockquote>
<ul>
<li><strong>Triggering a GC when appropriate:</strong> The allocator triggers a GC when the allocation budget (a threshold set by the collector) is exceeded or when the allocator can no longer allocate on a given segment. The allocation budget and managed segments are discussed in more detail later.</li>
<li><strong>Preserving object locality:</strong> Objects allocated together on the same heap segment will be stored at virtual addresses close to each other.</li>
<li><strong>Efficient cache usage:</strong> The allocator allocates memory in allocation quantum units, not on an object-by-object basis. It zeroes out that much memory to warm up the CPU cache because there will be objects immediately allocated in that memory. The allocation quantum is usually 8k.</li>
<li><strong>Efficient locking:</strong> The thread affinity of allocation contexts and quantums guarantee that there is only ever a single thread writing to a given allocation quantum. As a result, there is no need to lock for object allocations, as long as the current allocation context is not exhausted.</li>
<li><strong>Memory integrity:</strong> The GC always zeroes out the memory for newly allocated objects to prevent object references pointing at random memory.</li>
<li><strong>Keeping the heap crawlable:</strong> The allocator makes sure to make a free object out of left over memory in each allocation quantum. For example, if there is 30 bytes left in an allocation quantum and the next object is 40 bytes, the allocator will make the 30 bytes a free object and get a new allocation quantum.</li>
</ul>
</blockquote>
<p>One of the interesting items this highlights is an advantage of GC systems, namely that you get efficient <a href="http://mechanical-sympathy.blogspot.co.uk/2012/08/memory-access-patterns-are-important.html">CPU cache usage or good object locality</a> because memory is allocated in units. This means that objects created one after the other (on the same thread), will sit next to each other in memory.</p>
<h3 id="marking-the-card-table">Marking the “Card Table”</h3>
<p>The 3rd part of the process of allocating an object was a call to <a href="https://github.com/dotnet/coreclr/blob/master/src/gc/sample/GCSample.cpp#L90-L105">ErectWriteBarrier(..)</a>, which looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>inline void ErectWriteBarrier(Object ** dst, Object * ref)
{
// if the dst is outside of the heap (unboxed value classes) then we simply exit
if (((uint8_t*)dst < g_lowest_address) || ((uint8_t*)dst >= g_highest_address))
return;
if ((uint8_t*)ref >= g_ephemeral_low && (uint8_t*)ref < g_ephemeral_high)
{
// volatile is used here to prevent fetch of g_card_table from being reordered
// with g_lowest/highest_address check above.
uint8_t* pCardByte = (uint8_t *)*(volatile uint8_t **)(&g_card_table) +
card_byte((uint8_t *)dst);
if(*pCardByte != 0xFF)
*pCardByte = 0xFF;
}
}
</code></pre></div></div>
<p>Now explaining what is going on here is probably an entire post on its own and fortunately other people have already done the work for me. If you are interested in finding out more, take a look at the <a href="#further-information">links at the end of this post</a>.</p>
<p>But in summary, the card-table is an optimisation that allows the GC to collect a single Generation (e.g. Gen 0), but still know about objects that are referenced from other, older generations. For instance if you had an array, <code class="language-plaintext highlighter-rouge">myArray = new MyClass[100]</code> that was in Gen 1 and you wrote the following code <code class="language-plaintext highlighter-rouge">myArray[5] = new MyClass()</code>, a write barrier would be set-up to indicate that the <code class="language-plaintext highlighter-rouge">MyClass</code> object was referenced by a given section of Gen 1 memory.</p>
<p>Then, when the GC wants to perform the mark phase for a Gen 0, in order to find all the live-objects it uses the card-table to tell it in which memory section(s) of other Generations it needs to look. This way it can find references from those older objects to the ones stored in Gen 0. This is a space/time tradeoff: the card-table represents 4KB sections of memory, so the GC still has to scan through that 4KB chunk, but it’s better than having to scan the entire contents of the Gen 1 memory when it wants to carry out a Gen 0 collection.</p>
<p>If it didn’t do this extra check (via the card-table), then any Gen 0 objects that were only referenced by older objects (i.e. those in Gen 1/2) would not be considered “live” and would then be collected. See the image below for what this looks like in practice:</p>
<p><img src="https://msdnshared.blob.core.windows.net/media/TNBlogsFS/BlogFileStorage/blogs_msdn/abhinaba/WindowsLiveWriter/BackToBasicsGenerationalGarbageCollectio_115F4/image_18.png" alt="Write barrier + card-table" /></p>
<p>Image taken from <a href="http://blogs.msdn.com/b/abhinaba/archive/2009/03/02/back-to-basics-generational-garbage-collection.aspx">Back To Basics: Generational Garbage Collection</a></p>
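<p>To make the two sides of the card-table a bit more concrete, here is a minimal sketch of the idea. It is not how the runtime lays things out: <code class="language-plaintext highlighter-rouge">CardTable</code>, <code class="language-plaintext highlighter-rouge">MarkCard</code>, <code class="language-plaintext highlighter-rouge">DirtyCardStarts</code> and <code class="language-plaintext highlighter-rouge">kCardBytes</code> are hypothetical names, and the 4KB card size is simply the figure used in the text above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed card granularity, matching the 4KB figure used in the text.
constexpr size_t kCardBytes = 4 * 1024;

// Toy card table covering the address range [base, base + size).
class CardTable {
public:
    CardTable(uintptr_t base, size_t size)
        : base_(base), cards_((size + kCardBytes - 1) / kCardBytes, 0) {}

    // Write-barrier side: mark the card that covers the slot being written.
    void MarkCard(uintptr_t slot_address) {
        uint8_t& card = cards_[CardIndex(slot_address)];
        if (card != 0xFF)  // same check-then-set shape as ErectWriteBarrier
            card = 0xFF;
    }

    // GC side: start addresses of the sections that must be scanned for
    // references into the younger generation.
    std::vector<uintptr_t> DirtyCardStarts() const {
        std::vector<uintptr_t> starts;
        for (size_t i = 0; i < cards_.size(); ++i) {
            if (cards_[i] == 0xFF)
                starts.push_back(base_ + i * kCardBytes);
        }
        return starts;
    }

private:
    size_t CardIndex(uintptr_t address) const {
        assert(address >= base_);
        size_t index = (address - base_) / kCardBytes;
        assert(index < cards_.size());
        return index;
    }

    uintptr_t base_;
    std::vector<uint8_t> cards_;
};
```

<p>Two writes into the same 4KB section dirty a single card, so a Gen 0 collection in this model would only have to scan that one section rather than the whole older generation.</p>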
<hr />
<h2 id="gc-and-execution-engine-interaction">GC and Execution Engine Interaction</h2>
<p>The final part of the GC sample that I will be looking at is the way in which the GC needs to interact with the .NET Runtime Execution Engine (EE). The EE is responsible for actually running or coordinating all the low-level things that the .NET runtime needs to do, such as creating threads and reserving memory, so it acts as an interface to the OS, via <a href="https://github.com/mattwarren/GCSample/blob/master/sample/gcenv.windows.cpp">Windows</a> and <a href="https://github.com/mattwarren/GCSample/blob/master/sample/gcenv.unix.cpp">Unix</a> implementations.</p>
<p>To understand this interaction between the GC and the EE, it’s helpful to look at all the functions the GC expects the EE to make available:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">void SuspendEE(GCToEEInterface::SUSPEND_REASON reason)</code></li>
<li><code class="language-plaintext highlighter-rouge">void RestartEE(bool bFinishedGC)</code></li>
<li><code class="language-plaintext highlighter-rouge">void GcScanRoots(promote_func* fn, int condemned, int max_gen, ScanContext* sc)</code></li>
<li><code class="language-plaintext highlighter-rouge">void GcStartWork(int condemned, int max_gen)</code></li>
<li><code class="language-plaintext highlighter-rouge">void AfterGcScanRoots(int condemned, int max_gen, ScanContext* sc)</code></li>
<li><code class="language-plaintext highlighter-rouge">void GcBeforeBGCSweepWork()</code></li>
<li><code class="language-plaintext highlighter-rouge">void GcDone(int condemned)</code></li>
<li><code class="language-plaintext highlighter-rouge">bool RefCountedHandleCallbacks(Object * pObject)</code></li>
<li><code class="language-plaintext highlighter-rouge">bool IsPreemptiveGCDisabled(Thread * pThread)</code></li>
<li><code class="language-plaintext highlighter-rouge">void EnablePreemptiveGC(Thread * pThread)</code></li>
<li><code class="language-plaintext highlighter-rouge">void DisablePreemptiveGC(Thread * pThread)</code></li>
<li><code class="language-plaintext highlighter-rouge">void SetGCSpecial(Thread * pThread)</code></li>
<li><code class="language-plaintext highlighter-rouge">alloc_context * GetAllocContext(Thread * pThread)</code></li>
<li><code class="language-plaintext highlighter-rouge">bool CatchAtSafePoint(Thread * pThread)</code></li>
<li><code class="language-plaintext highlighter-rouge">void AttachCurrentThread()</code></li>
<li><code class="language-plaintext highlighter-rouge">void GcEnumAllocContexts (enum_alloc_context_func* fn, void* param)</code></li>
<li><code class="language-plaintext highlighter-rouge">void SyncBlockCacheWeakPtrScan(HANDLESCANPROC, uintptr_t, uintptr_t)</code></li>
<li><code class="language-plaintext highlighter-rouge">void SyncBlockCacheDemote(int /*max_gen*/)</code></li>
<li><code class="language-plaintext highlighter-rouge">void SyncBlockCachePromotionsGranted(int /*max_gen*/)</code></li>
</ul>
<p>If you want to see how the .NET Runtime performs these “tasks”, you can take a look at the <a href="https://github.com/dotnet/coreclr/blob/master/src/vm/gcenv.ee.cpp">real implementation</a>. However in the GC Sample these methods are mostly <a href="https://github.com/mattwarren/GCSample/blob/90d07fdff32d370a3977978854d2d221027e1780/sample/gcenv.ee.cpp#L147-L165">stubbed out</a> as no-ops. So that I could get an idea of the flow of the GC during a collection, I added simple <code class="language-plaintext highlighter-rouge">print(..)</code> statements to each one, then when I ran the GC Sample I got the following output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SuspendEE(SUSPEND_REASON = 1)
GcEnumAllocContexts(..)
GcStartWork(condemned = 0, max_gen = 2)
GcScanRoots(condemned = 0, max_gen = 2)
AfterGcScanRoots(condemned = 0, max_gen = 2)
GcScanRoots(condemned = 0, max_gen = 2)
GcDone(condemned = 0)
RestartEE(bFinishedGC = TRUE)
</code></pre></div></div>
<p>Which fortunately corresponds nicely with the GC phases for <strong><a href="https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md#wks-gc-with-concurrent-gc-off">WKS GC with concurrent GC off</a></strong> as outlined in the BOTR:</p>
<blockquote>
<ol>
<li>User thread runs out of allocation budget and triggers a GC.</li>
<li>GC calls SuspendEE to suspend managed threads.</li>
<li>GC decides which generation to condemn.</li>
<li>Mark phase runs.</li>
<li>Plan phase runs and decides if a compacting GC should be done.</li>
<li>If so relocate and compact phase runs. Otherwise, sweep phase runs.</li>
<li>GC calls RestartEE to resume managed threads.</li>
<li>User thread resumes running.</li>
</ol>
</blockquote>
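<p>The “stub everything out and log each call” approach can be sketched as follows. This is a hypothetical stand-in rather than the real <code class="language-plaintext highlighter-rouge">GCToEEInterface</code>: <code class="language-plaintext highlighter-rouge">LoggingEE</code> only records the calls, the signatures are simplified, and <code class="language-plaintext highlighter-rouge">RunToyGen0Collection</code> just replays the order seen in the output above instead of doing any GC work.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical recording stub for the EE side of the interface.
struct LoggingEE {
    std::vector<std::string> calls;

    void SuspendEE(int reason) {
        calls.push_back("SuspendEE(SUSPEND_REASON = " + std::to_string(reason) + ")");
    }
    void GcEnumAllocContexts() { calls.push_back("GcEnumAllocContexts(..)"); }
    void GcStartWork(int condemned, int max_gen) {
        calls.push_back("GcStartWork" + Args(condemned, max_gen));
    }
    void GcScanRoots(int condemned, int max_gen) {
        calls.push_back("GcScanRoots" + Args(condemned, max_gen));
    }
    void AfterGcScanRoots(int condemned, int max_gen) {
        calls.push_back("AfterGcScanRoots" + Args(condemned, max_gen));
    }
    void GcDone(int condemned) {
        calls.push_back("GcDone(condemned = " + std::to_string(condemned) + ")");
    }
    void RestartEE(bool finished_gc) {
        calls.push_back(std::string("RestartEE(bFinishedGC = ") +
                        (finished_gc ? "TRUE" : "FALSE") + ")");
    }

private:
    static std::string Args(int condemned, int max_gen) {
        return "(condemned = " + std::to_string(condemned) +
               ", max_gen = " + std::to_string(max_gen) + ")";
    }
};

// Hypothetical driver: replays the callback order from the sample output
// for a Gen 0 collection. No marking, planning or compacting happens here.
void RunToyGen0Collection(LoggingEE& ee) {
    ee.SuspendEE(1);
    ee.GcEnumAllocContexts();
    ee.GcStartWork(0, 2);
    ee.GcScanRoots(0, 2);
    ee.AfterGcScanRoots(0, 2);
    ee.GcScanRoots(0, 2);  // appears a second time in the observed output
    ee.GcDone(0);
    ee.RestartEE(true);
}
```

<p>After calling <code class="language-plaintext highlighter-rouge">RunToyGen0Collection(..)</code>, the <code class="language-plaintext highlighter-rouge">calls</code> vector holds the eight lines in the same order as the output above, bracketed by <code class="language-plaintext highlighter-rouge">SuspendEE</code> and <code class="language-plaintext highlighter-rouge">RestartEE</code> just as in steps 2 and 7 of the BOTR list.</p>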
<hr />
<h2 id="further-information">Further Information</h2>
<p>If you want to find out any more information about Garbage Collectors, here is a list of useful links:</p>
<ul>
<li>General
<ul>
<li><a href="http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/">Baby’s First Garbage Collector</a></li>
<li><a href="http://web.engr.illinois.edu/%7Emaplant2/gc.html">Writing a Simple Garbage Collector in C</a></li>
</ul>
</li>
<li>Marking the Card Table
<ul>
<li><a href="https://msdn.microsoft.com/en-us/library/ms973837.aspx">Making Generations Work with Write Barriers</a></li>
<li><a href="http://patshaughnessy.net/2013/10/30/generational-gc-in-python-and-ruby">Generational GC in Python and Ruby</a></li>
<li><a href="https://www.jetbrains.com/dotmemory/help/NET_Memory_Management_Concepts.html">NET Memory Management Concepts</a></li>
<li><a href="http://blogs.msdn.com/b/abhinaba/archive/2009/03/02/back-to-basics-generational-garbage-collection.aspx">Back-to-basics Generational GC</a></li>
<li><a href="http://www.devx.com/Java/Article/21977">Garbage Collection in the Java HotSpot Virtual Machine</a></li>
<li><a href="http://www.cncoders.net/article/6981/">Understanding GC pauses in JVM, HotSpot’s minor GC</a></li>
</ul>
</li>
</ul>
<hr />
<h2 id="gc-sample-code-layout-for-reference">GC Sample Code Layout (for reference)</h2>
<p><strong>GC Sample Code (under \sample)</strong></p>
<ul>
<li>GCSample.cpp</li>
<li>gcenv.h</li>
<li>gcenv.ee.cpp</li>
<li>gcenv.windows.cpp</li>
<li>gcenv.unix.cpp</li>
</ul>
<p><strong>GC Sample Environment (under \env)</strong></p>
<ul>
<li>common.cpp</li>
<li>common.h</li>
<li>etmdummy.g</li>
<li>gcenv.base.h</li>
<li>gcenv.ee.h</li>
<li>gcenv.interlocked.h</li>
<li>gcenv.interlocked.inl</li>
<li>gcenv.object.h</li>
<li>gcenv.os.h</li>
<li>gcenv.structs.h</li>
<li>gcenv.sync.h</li>
</ul>
<p><strong>GC Code (top-level folder)</strong></p>
<ul>
<li>gc.cpp (36,911 lines long!!)</li>
<li>gc.h</li>
<li>gccommon.cpp</li>
<li>gcdesc.h</li>
<li>gcee.cpp</li>
<li>gceewks.cpp</li>
<li>gcimpl.h</li>
<li>gcrecord.h</li>
<li>gcscan.cpp</li>
<li>gcscan.h</li>
<li>gcsvr.cpp</li>
<li>gcwks.cpp</li>
<li>handletable.h</li>
<li>handletable.inl</li>
<li>handletablecache.cpp</li>
<li>handletablecore.cpp</li>
<li>handletablepriv.h</li>
<li>handletablescan.cpp</li>
<li>objecthandle.cpp</li>
<li>objecthandle.h</li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2016/02/04/learning-how-garbage-collectors-work-part-1/">Learning How Garbage Collectors Work - Part 1</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Open Source .NET – 1 year later - Now with ASP.NET2016-01-15T00:00:00+00:00http://www.mattwarren.org/2016/01/15/open-source-net-1-year-later-now-with-aspnet
<p>In the <a href="/2015/12/08/open-source-net-1-year-later/">previous post</a> I looked at the community involvement in the year since Microsoft open-sourced large parts of the .NET framework.</p>
<p>As a follow-up I’m going to repeat that analysis, but this time focussing on the repositories that sit under the <a href="https://github.com/aspnet"><strong>ASP.NET</strong></a> umbrella project:</p>
<ul>
<li><a href="https://github.com/aspnet/mvc/"><strong>MVC</strong></a> - Model view controller framework for building dynamic web sites with clean separation of concerns, including the merged MVC, Web API, and Web Pages w/ Razor.</li>
<li><a href="https://github.com/aspnet/dnx/"><strong>DNX</strong></a> - The DNX (a .NET Execution Environment) contains the code required to bootstrap and run an application, including the compilation system, SDK tools, and the native CLR hosts.</li>
<li><a href="https://github.com/aspnet/EntityFramework/"><strong>EntityFramework</strong></a> - Microsoft’s recommended data access technology for new applications in .NET.</li>
<li><a href="https://github.com/aspnet/KestrelHttpServer/"><strong>KestrelHttpServer</strong></a> - A web server for ASP.NET 5 based on libuv.</li>
</ul>
<h3 id="methodology"><a name="Methodology"></a><strong>Methodology</strong></h3>
<p>In the first part I classified the Issues/PRs as <strong>Owner</strong>, <strong>Collaborator</strong> or <strong>Community</strong>. However this turned out to have some problems, as was pointed out to me in the comments. There are several people who are not Microsoft employees, but have been made “Collaborators” due to their extensive contributions to a particular repository, for instance <a href="https://github.com/kangaroo">@kangaroo</a> and <a href="https://github.com/benpye/">@benpye</a>.</p>
<p>To address this, I decided to change to just the following 2 categories:</p>
<ul>
<li><strong>Microsoft</strong></li>
<li><strong>Community</strong></li>
</ul>
<p>This is possible because (almost) all Microsoft employees have indicated where they work on their GitHub profile, for instance:</p>
<p><a href="https://github.com/davidfowl"><img src="https://cloud.githubusercontent.com/assets/157298/12374944/b686820c-bca4-11e5-86c8-cf9f1076b45e.png" alt="David Fowler Profile" /></a></p>
<p>There are some notable exceptions, e.g. <a href="https://github.com/shanselman">@shanselman</a> clearly works at Microsoft, but it’s easy enough to allow for cases like this.</p>
<h2 id="results"><a name="Results"></a>Results</h2>
<p>So after all this analysis, what results did I get? Overall, the Community involvement accounts for just over <strong>60%</strong> of the “Issues Created” and <strong>33%</strong> of the “Merged Pull Requests (PRs)”. However the number of PRs is skewed by Entity Framework, which has a much higher involvement from Microsoft employees; if this is ignored the Community proportion of PRs increases to <strong>44%</strong>.</p>
<h3 id="issues-created-nov-2013---dec-2015">Issues Created (Nov 2013 - Dec 2015)</h3>
<table>
<thead>
<tr>
<th style="text-align: left"><strong>Project</strong></th>
<th style="text-align: right"><strong>Microsoft</strong></th>
<th style="text-align: right"><strong>Community</strong></th>
<th style="text-align: right"><strong>Total</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">aspnet/<strong>MVC</strong></td>
<td style="text-align: right">716</td>
<td style="text-align: right">1380</td>
<td style="text-align: right">2096</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>dnx</strong></td>
<td style="text-align: right">897</td>
<td style="text-align: right">1206</td>
<td style="text-align: right">2103</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>EntityFramework</strong></td>
<td style="text-align: right">1066</td>
<td style="text-align: right">1427</td>
<td style="text-align: right">2493</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>KestrelHttpServer</strong></td>
<td style="text-align: right">89</td>
<td style="text-align: right">176</td>
<td style="text-align: right">265</td>
</tr>
<tr>
<td style="text-align: left"><strong>Total</strong></td>
<td style="text-align: right"><strong>2768</strong></td>
<td style="text-align: right"><strong>4189</strong></td>
<td style="text-align: right"><strong>6957</strong></td>
</tr>
</tbody>
</table>
<h3 id="merged-pull-requests-nov-2013---dec-2015">Merged Pull Requests (Nov 2013 - Dec 2015)</h3>
<table>
<thead>
<tr>
<th style="text-align: left"><strong>Project</strong></th>
<th style="text-align: right"><strong>Microsoft</strong></th>
<th style="text-align: right"><strong>Community</strong></th>
<th style="text-align: right"><strong>Total</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">aspnet/<strong>MVC</strong></td>
<td style="text-align: right">385</td>
<td style="text-align: right">228</td>
<td style="text-align: right">613</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>dnx</strong></td>
<td style="text-align: right">406</td>
<td style="text-align: right">368</td>
<td style="text-align: right">774</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>EntityFramework</strong></td>
<td style="text-align: right">937</td>
<td style="text-align: right">225</td>
<td style="text-align: right">1162</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>KestrelHttpServer</strong></td>
<td style="text-align: right">69</td>
<td style="text-align: right">88</td>
<td style="text-align: right">157</td>
</tr>
<tr>
<td style="text-align: left"><strong>Total</strong></td>
<td style="text-align: right"><strong>1797</strong></td>
<td style="text-align: right"><strong>909</strong></td>
<td style="text-align: right"><strong>2706</strong></td>
</tr>
</tbody>
</table>
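<p>The headline percentages can be re-derived from the rows of the two tables above. The snippet below is just a small sketch of that arithmetic: <code class="language-plaintext highlighter-rouge">ProjectCounts</code> and <code class="language-plaintext highlighter-rouge">CommunityShare</code> are names made up for this example, and the figures are copied from the per-project rows.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the tables above: counts per project.
struct ProjectCounts {
    std::string project;
    int microsoft;
    int community;
};

// Community share (in percent) across all rows, optionally skipping one project.
double CommunityShare(const std::vector<ProjectCounts>& rows,
                      const std::string& excluded_project = "") {
    long community = 0;
    long total = 0;
    for (const ProjectCounts& row : rows) {
        if (row.project == excluded_project)
            continue;
        community += row.community;
        total += row.microsoft + row.community;
    }
    assert(total > 0);
    return 100.0 * community / total;
}

// Figures copied from the "Issues Created" table.
const std::vector<ProjectCounts> kIssues = {
    {"MVC", 716, 1380},
    {"dnx", 897, 1206},
    {"EntityFramework", 1066, 1427},
    {"KestrelHttpServer", 89, 176},
};

// Figures copied from the "Merged Pull Requests" table.
const std::vector<ProjectCounts> kMergedPRs = {
    {"MVC", 385, 228},
    {"dnx", 406, 368},
    {"EntityFramework", 937, 225},
    {"KestrelHttpServer", 69, 88},
};
```

<p>Here <code class="language-plaintext highlighter-rouge">CommunityShare(kIssues)</code> comes out at roughly 60.2, <code class="language-plaintext highlighter-rouge">CommunityShare(kMergedPRs)</code> at roughly 33.6, and <code class="language-plaintext highlighter-rouge">CommunityShare(kMergedPRs, "EntityFramework")</code> at roughly 44.3.</p>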
<p>Note: I included the <a href="https://github.com/aspnet/KestrelHttpServer">Kestrel Http Server</a> because it is an interesting case. Currently the #1 contributor is not a Microsoft employee, it is <a href="https://twitter.com/ben_a_adams/status/684503094810525696/photo/1">Ben Adams</a>, who is doing a great job of <a href="http://www.hanselman.com/blog/WhenDidWeStopCaringAboutMemoryManagement.aspx">improving the memory usage</a> and in the process helping Kestrel handle more and more requests per second.</p>
<p>By looking at the results over time, you can see that there is a clear and sustained Community involvement (the lighter section of the bars) over the past 2 years (Nov 2013 - Dec 2015) and it doesn’t look like it’s going to stop.</p>
<h3 id="issues-per-month---by-submitter-click-for-full-size-image"><a name="IssuesPerMonthBySubmitter"></a><strong>Issues Per Month - By Submitter (click for full-size image)</strong></h3>
<p><a href="https://cloud.githubusercontent.com/assets/157298/12142495/6f746e92-b470-11e5-97fd-bf0d59a74875.png"><img src="https://cloud.githubusercontent.com/assets/157298/12142495/6f746e92-b470-11e5-97fd-bf0d59a74875.png" alt="Issues Per Month - By Submitter (Microsoft or Community)" /></a></p>
<p>In addition, whilst the Community involvement is easier to see with the Issues per month, it is still visible in the Merged PRs and again it looks like it has been sustained over the 2 years.</p>
<h3 id="merged-pull-request-per-month---by-submitter-click-for-full-size-image"><a name="MergedPullRequestPerMonthBySubmitter"></a><strong>Merged Pull Request Per Month - By Submitter (click for full-size image)</strong></h3>
<p><a href="https://cloud.githubusercontent.com/assets/157298/12142522/9f72726a-b470-11e5-8333-aec772ff9f6b.png"><img src="https://cloud.githubusercontent.com/assets/157298/12142522/9f72726a-b470-11e5-8333-aec772ff9f6b.png" alt="Merged Pull Requests Per Month - By Submitter (Microsoft or Community)" /></a></p>
<h3 id="total-number-of-people-contributing"><a name="TotalNumberOfPeopleContributing"></a><strong>Total Number of People Contributing</strong></h3>
<p>It’s also interesting to look at the total number of different people who contributed to each project. By doing this you get a real sense of the size of the Community contribution: it’s not just a small number of people doing a lot of work, it’s spread across a large number of people.</p>
<p>This table shows the number of different GitHub users (per project) who opened an Issue or created a PR that was Merged:</p>
<table>
<thead>
<tr>
<th style="text-align: left"><strong>Project</strong></th>
<th style="text-align: right"><strong>Microsoft</strong></th>
<th style="text-align: right"><strong>Community</strong></th>
<th style="text-align: right"><strong>Total</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">aspnet/<strong>MVC</strong></td>
<td style="text-align: right">39</td>
<td style="text-align: right">395</td>
<td style="text-align: right">434</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>dnx</strong></td>
<td style="text-align: right">46</td>
<td style="text-align: right">421</td>
<td style="text-align: right">467</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>EntityFramework</strong></td>
<td style="text-align: right">31</td>
<td style="text-align: right">570</td>
<td style="text-align: right">601</td>
</tr>
<tr>
<td style="text-align: left">aspnet/<strong>KestrelHttpServer</strong></td>
<td style="text-align: right">22</td>
<td style="text-align: right">95</td>
<td style="text-align: right">117</td>
</tr>
<tr>
<td style="text-align: left"><strong>Total</strong></td>
<td style="text-align: right"><strong>138</strong></td>
<td style="text-align: right"><strong>1481</strong></td>
<td style="text-align: right"><strong>1619</strong></td>
</tr>
</tbody>
</table>
<h2 id="-fsharp"><a name="FSharp"></a> <strong>FSharp</strong></h2>
<p>In the comments of my first post, Isaac Abraham correctly pointed out:</p>
<blockquote>
<p>parts of .NET have been open source for quite a bit more than a year – the F# compiler and FSharp.Core have been for quite a while now.</p>
</blockquote>
<p>So, to address this, I will take a quick look at the main FSharp repositories:</p>
<ul>
<li><a href="https://github.com/microsoft/visualfsharp"><strong>microsoft/visualfsharp</strong></a></li>
<li><a href="https://github.com/fsharp/fsharp"><strong>fsharp/fsharp</strong></a></li>
</ul>
<p>As Isaac explained, their relationship is:</p>
<blockquote>
<p>… visualfsharp is the Microsoft-owned repo Visual F#. The other is the community owned one. The former one feeds directly into tools like Visual F# tooling in Visual Studio etc.; the latter feeds into things like Xamarin etc. There’s a (slightly out of date) <a href="http://fsharp.github.io/2014/06/18/fsharp-contributions.html">diagram that explains the relationship</a>, and this is another useful resource http://fsharp.github.io/.</p>
</blockquote>
<h3 id="fsharp---issues-created-dec-2010---dec-2015">FSharp - Issues Created (Dec 2010 - Dec 2015)</h3>
<table>
<thead>
<tr>
<th style="text-align: left"><strong>Project</strong></th>
<th style="text-align: right"><strong>Microsoft</strong></th>
<th style="text-align: right"><strong>Community</strong></th>
<th style="text-align: right"><strong>Total</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">fsharp/fsharp</td>
<td style="text-align: right">9</td>
<td style="text-align: right">312</td>
<td style="text-align: right">321</td>
</tr>
<tr>
<td style="text-align: left">microsoft/visualfsharp</td>
<td style="text-align: right">161</td>
<td style="text-align: right">367</td>
<td style="text-align: right">528</td>
</tr>
<tr>
<td style="text-align: left"><strong>Total</strong></td>
<td style="text-align: right"><strong>170</strong></td>
<td style="text-align: right"><strong>679</strong></td>
<td style="text-align: right"><strong>849</strong></td>
</tr>
</tbody>
</table>
<h3 id="fsharp---merged-pull-requests-may-2011---dec-2015">FSharp - Merged Pull Requests (May 2011 - Dec 2015)</h3>
<table>
<thead>
<tr>
<th style="text-align: left"><strong>Project</strong></th>
<th style="text-align: right"><strong>Microsoft</strong></th>
<th style="text-align: right"><strong>Community</strong></th>
<th style="text-align: right"><strong>Total</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">fsharp/fsharp</td>
<td style="text-align: right">27</td>
<td style="text-align: right">134</td>
<td style="text-align: right">161</td>
</tr>
<tr>
<td style="text-align: left">microsoft/visualfsharp</td>
<td style="text-align: right">36</td>
<td style="text-align: right">33</td>
<td style="text-align: right">69</td>
</tr>
<tr>
<td style="text-align: left"><strong>Total</strong></td>
<td style="text-align: right"><strong>63</strong></td>
<td style="text-align: right"><strong>167</strong></td>
<td style="text-align: right"><strong>230</strong></td>
</tr>
</tbody>
</table>
<h2 id="conclusion"><a name="Conclusion"></a>Conclusion</h2>
<p>I think that it’s fair to say that the Community has responded to Microsoft making more and more of their code Open Source. There has been a significant number of Community contributions across several projects, over a decent amount of time. Whilst you could argue that it took Microsoft a long time to open source their code, it seems that .NET developers are happy they have done it, as shown by a sizeable Community response.</p>
<p>The post <a href="http://www.mattwarren.org/2016/01/15/open-source-net-1-year-later-now-with-aspnet/">Open Source .NET – 1 year later - Now with ASP.NET</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Open Source .NET – 1 year later2015-12-08T00:00:00+00:00http://www.mattwarren.org/2015/12/08/open-source-net-1-year-later
<p>A little over a year ago Microsoft announced that they were <a href="http://www.hanselman.com/blog/AnnouncingNET2015NETAsOpenSourceNETOnMacAndLinuxAndVisualStudioCommunity.aspx">open sourcing large parts of the .NET framework</a>. At the time Scott Hanselman did a <a href="http://www.hanselman.com/blog/TheNETCoreCLRIsNowOpenSourceSoIRanTheGitHubRepoThroughMicrosoftPowerBI.aspx">nice analysis of the source</a>, using Microsoft Power BI. Inspired by this and now that a year has passed, I wanted to try and answer the question:</p>
<blockquote>
<p>How much <strong>Community</strong> involvement has there been since Microsoft open sourced large parts of the .NET framework?</p>
</blockquote>
<p>I will be looking at the 3 following projects, as they are all highly significant parts of the .NET ecosystem and are also some of the <a href="https://github.com/dotnet/">most active/starred/forked projects</a> within the .NET Foundation:</p>
<ul>
<li><a href="https://github.com/dotnet/roslyn/"><strong>Roslyn</strong></a> - The .NET Compiler Platform (“Roslyn”) provides open-source C# and Visual Basic compilers with rich code analysis APIs.</li>
<li><a href="https://github.com/dotnet/coreclr/"><strong>CoreCLR</strong></a> - the .NET Core runtime, called CoreCLR, and the base library, called mscorlib. It includes the garbage collector, JIT compiler, base .NET data types and many low-level classes.</li>
<li><a href="https://github.com/dotnet/corefx/"><strong>CoreFX</strong></a> - the .NET Core foundational libraries, called CoreFX. It includes classes for collections, file systems, console, XML, async and many others.</li>
</ul>
<h2 id="available-data"><a name="AvailableData"></a><strong>Available Data</strong></h2>
<p>GitHub itself has some nice graphs built-in, for instance you can see the <strong>Commits per Month</strong> over an entire year:</p>
<p><a href="https://github.com/dotnet/roslyn/graphs/contributors"><img src="https://cloud.githubusercontent.com/assets/157298/11634181/f451abce-9d06-11e5-8940-d133d1931422.png" alt="Commits Per Month" /></a></p>
<p>Also you can get a nice dashboard showing the <strong>Monthly Pulse</strong></p>
<p><a href="https://github.com/dotnet/roslyn/pulse/monthly"><img src="https://cloud.githubusercontent.com/assets/157298/11634411/35085a4a-9d08-11e5-8995-02c65d9ee12d.png" alt="github stats - monthly pulse" /></a></p>
<p>However to answer the question above, I needed more data. Fortunately GitHub provides a <a href="https://developer.github.com/v3/">really comprehensive API</a>, which combined with the excellent <a href="https://github.com/octokit/octokit.net">Octokit.net library</a> and the <a href="https://www.linqpad.net/">brilliant LINQPad</a>, meant I was able to easily get all the data I needed. Here’s a <a href="https://gist.github.com/mattwarren/894aa5f46ca62a63764a">sample LINQPad script</a> if you want to start playing around with the API yourself.</p>
<p>However, knowing the “<em># of Issues</em>” or “<em>Merged Pull Requests</em>” per month on its own isn’t that useful, as it doesn’t tell us anything about <em>who</em> created the issue or submitted the PR. Fortunately GitHub classifies users into categories, for instance in the image below from <a href="https://github.com/dotnet/roslyn/issues/670">Roslyn Issue #670</a> we can see what type of user posted each comment, an “Owner”, “Collaborator” or blank which signifies a “Community” member, i.e. someone who (AFAICT) doesn’t work at Microsoft.</p>
<p><a href="https://cloud.githubusercontent.com/assets/157298/11634101/8abd7210-9d06-11e5-82b0-570f296cf433.png"><img src="https://cloud.githubusercontent.com/assets/157298/11634101/8abd7210-9d06-11e5-82b0-570f296cf433.png" alt="owner collaborator or community" /></a></p>
<h2 id="results"><a name="Results"></a><strong>Results</strong></h2>
<p>So now that we can get the data we need, what results do we get?</p>
<h3 id="total-issues---by-submitter"><a name="TotalIssuesBySubmitter"></a><strong>Total Issues - By Submitter</strong></h3>
<table>
<thead>
<tr>
<th><strong>Project</strong></th>
<th><strong>Owner</strong></th>
<th><strong>Collaborator</strong></th>
<th><strong>Community</strong></th>
<th><strong>Total</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Roslyn</td>
<td>481</td>
<td>1867</td>
<td>1596</td>
<td>3944</td>
</tr>
<tr>
<td>CoreCLR</td>
<td>86</td>
<td>298</td>
<td>487</td>
<td>871</td>
</tr>
<tr>
<td>CoreFX</td>
<td>334</td>
<td>911</td>
<td>735</td>
<td>1980</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td><strong>901</strong></td>
<td><strong>3076</strong></td>
<td><strong>2818</strong></td>
<td><strong>6795</strong></td>
</tr>
</tbody>
</table>
<p>Here you can see that the Owners and Collaborators do in some cases dominate, e.g. in Roslyn where almost 60% of the issues were opened by them. But in other cases the Community is very active, especially in CoreCLR where Community members are opening more issues than Owners and Collaborators combined. Part of the reason for this is the nature of the different repositories: CoreCLR is the most visible part of the .NET framework as it encompasses most of the libraries that .NET developers would use on a day-to-day basis, so it’s not surprising that the Community has lots of suggestions for improvements or bug fixes. In addition, the CoreCLR has been around for a much longer time and so the Community has had more time to use it and find out the parts it doesn’t like. Roslyn, by contrast, is a much newer project so there has been less time to use it, plus finding bugs in a compiler is by its nature harder to do.</p>
<h3 id="total-merged-pull-requests---by-submitter"><a name="TotalMergedPullRequestsBySubmitter"></a><strong>Total Merged Pull Requests - By Submitter</strong></h3>
<table>
<thead>
<tr>
<th><strong>Project</strong></th>
<th><strong>Owner</strong></th>
<th><strong>Collaborator</strong></th>
<th><strong>Community</strong></th>
<th><strong>Total</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Roslyn</td>
<td>465</td>
<td>2093</td>
<td>118</td>
<td>2676</td>
</tr>
<tr>
<td>CoreCLR</td>
<td>378</td>
<td>567</td>
<td>201</td>
<td>1146</td>
</tr>
<tr>
<td>CoreFX</td>
<td>516</td>
<td>1409</td>
<td>464</td>
<td>2389</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td><strong>1359</strong></td>
<td><strong>4069</strong></td>
<td><strong>783</strong></td>
<td><strong>6211</strong></td>
</tr>
</tbody>
</table>
<p>However if we look at Merged Pull Requests, we can see that the overall amount of Community contributions across the 3 projects is much lower, only accounting for roughly 12%. This however isn’t that surprising, as there’s a much higher bar for getting a pull request accepted. Firstly, if the project is using this mechanism, you have to pick an issue that is <a href="https://github.com/dotnet/corefx/labels/up%20for%20grabs">“<em>up for grabs</em>”</a>, then you have to get any <a href="http://blogs.msdn.com/b/dotnet/archive/2015/01/08/api-review-process-for-net-core.aspx">API changes through a review</a>, then finally you have to meet any compatibility/performance/correctness issues that come up during the code review itself. So actually 12% is a pretty good result as there is a non-trivial amount of work involved in getting your PR merged, especially considering most Community members will be working in their spare time.</p>
<p><strong>Update:</strong> I was wrong about the “up for grabs” requirement, see <a href="/2015/12/08/open-source-net-1-year-later/#comment-7091">this comment</a> from <a href="https://github.com/davkean">David Kean</a> and <a href="https://twitter.com/leppie/status/674285812146675714">this tweet</a> for more information. “Up for grabs” is a guideline and meant to help new users, but it is not a requirement, you can submit PRs for issues that don’t have that label.</p>
<p>Finally if you look at the amount per month (see the 2 graphs below, click for larger images), it’s hard to pick up any definite trends or say if the Community is <em>definitely</em> contributing more or less over time. But you can say that over a year the Community has consistently contributed and it doesn’t look like that contribution is going to end. It is not just an initial burst that only happened straight after the projects were open sourced, it is a sustained level of contributions over an entire year.</p>
<h3 id="issues-per-month---by-submitter"><a name="IssuesPerMonthBySubmitter"></a><strong>Issues Per Month - By Submitter</strong></h3>
<p><a href="https://cloud.githubusercontent.com/assets/157298/11596712/ad28f518-9aae-11e5-81d9-42bc22903d09.png"><img src="https://cloud.githubusercontent.com/assets/157298/11596712/ad28f518-9aae-11e5-81d9-42bc22903d09.png" alt="Issues Per Month - By Submitter (Owner, Collaborator or Community)" /></a></p>
<h3 id="merged-pull-request-per-month---by-submitter"><a name="MergedPullRequestPerMonthBySubmitter"></a><strong>Merged Pull Request Per Month - By Submitter</strong></h3>
<p><a href="https://cloud.githubusercontent.com/assets/157298/11652755/785d0d20-9d91-11e5-9802-834bb3955718.png"><img src="https://cloud.githubusercontent.com/assets/157298/11652755/785d0d20-9d91-11e5-9802-834bb3955718.png" alt="Merged Pull Requests Per Month - By Submitter (Owner, Collaborator or Community)" /></a></p>
<h2 id="top-20-issue-labels"><a name="Top20IssuesLabels"></a><strong>Top 20 Issue Labels</strong></h2>
<p>The last thing that I want to do whilst I have the data is to take a look at the most popular <em>Issue Labels</em> and see what they tell us about the <em>type</em> of work that has been going on since the 3 projects were open sourced.</p>
<p><a href="https://cloud.githubusercontent.com/assets/157298/11633496/8505205a-9d03-11e5-89fd-33384b20306c.png"><img src="https://cloud.githubusercontent.com/assets/157298/11633496/8505205a-9d03-11e5-89fd-33384b20306c.png" alt="Top 20 Issue Labels" /></a></p>
<p>Here are a few observations about the results:</p>
<ul>
<li>Having <a href="https://github.com/dotnet/coreclr/labels/CodeGen"><strong>CodeGen</strong></a> so high on the list is not that surprising considering that <a href="http://blogs.msdn.com/b/dotnet/archive/2013/09/30/ryujit-the-next-generation-jit-compiler.aspx">RyuJIT - the next-gen .NET JIT Compiler</a> was only released 2 years ago. However, it’s a bit worrying that there were so <em>many</em> issues, especially considering that some of them have <a href="https://github.com/dotnet/coreclr/issues/1296">severe consequences</a> as the <a href="http://nickcraver.com/blog/2015/07/27/why-you-should-wait-on-dotnet-46/">devs at Stack Overflow</a> found out! (On a related note, if you want to find out lots of low-level details about what the JIT does, just take a look at all the issues that <a href="https://github.com/dotnet/coreclr/issues?utf8=%E2%9C%93&q=commenter%3Amikedn+type%3Aissue+label%3Acodegen+">@MikeDN has commented on</a>. Unbelievably for someone with that much knowledge, he doesn’t actually work on the product itself, or even on another team at Microsoft!!)</li>
<li>It’s nice to see that all 3 projects have lots of <strong>“Up for Grabs”</strong> issues, see <a href="https://github.com/dotnet/roslyn/labels/Up%20for%20Grabs">Roslyn</a>, <a href="https://github.com/dotnet/coreclr/labels/up-for-grabs">CoreCLR</a> and <a href="https://github.com/dotnet/corefx/labels/up%20for%20grabs">CoreFX</a>, plus the Community seems to be <a href="https://github.com/dotnet/corefx/labels/grabbed%20by%20community">grabbing them back!</a></li>
<li>Finally, I love the fact that <a href="https://github.com/dotnet/corefx/labels/performance"><strong>Performance</strong></a> and <a href="https://github.com/dotnet/coreCLR/labels/optimization"><strong>Optimisation</strong></a> are being taken seriously, after all <a href="/speaking/">Performance is a Feature!!</a></li>
</ul>
<p>Discuss on <a href="https://www.reddit.com/r/programming/comments/3vyezb/open_source_net_1_year_later/">/r/programming</a> and <a href="https://news.ycombinator.com/item?id=10697993">Hacker News</a></p>
<p>The post <a href="http://www.mattwarren.org/2015/12/08/open-source-net-1-year-later/">Open Source .NET – 1 year later</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
The Stack Overflow Tag Engine – Part 3 | 2015-10-29T00:00:00+00:00 | http://www.mattwarren.org/2015/10/29/the-stack-overflow-tag-engine-part-3
<p>This is part 3 of a mini-series looking at what it <em>might</em> take to build the Stack Overflow Tag Engine. If you haven’t read <a href="/2014/11/01/the-stack-overflow-tag-engine-part-1/">part 1</a> or <a href="/2015/08/19/the-stack-overflow-tag-engine-part-2/">part 2</a>, I recommend reading them first.</p>
<hr />
<h2 id="complex-boolean-queries"><a name="ComplexBooleanQueries"></a><strong>Complex boolean queries</strong></h2>
<p>One of the most powerful features of the Stack Overflow Tag Engine is that it allows you to do complex boolean queries against multiple Tags, for instance:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/tagged/.net+or+jquery-">.net OR (NOT jquery)</a></li>
<li><a href="http://stackoverflow.com/questions/tagged/.net+or+jquery-+javascript">.net OR (NOT jquery) AND javascript</a></li>
</ul>
<p>A simple way of implementing this is to write code like below, which makes use of a <a href="https://msdn.microsoft.com/en-us/library/bb359438(v=vs.110).aspx"><code class="language-plaintext highlighter-rouge">HashSet</code></a> to let us efficiently do lookups to see if a particular question should be included or excluded.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="k">new</span> <span class="n">List</span><span class="p"><</span><span class="n">Question</span><span class="p">>(</span><span class="n">pageSize</span><span class="p">);</span>
<span class="c1">// Ids of every question tagged with tag2, loaded up-front for fast lookups</span>
<span class="kt">var</span> <span class="n">andHashSet</span> <span class="p">=</span> <span class="k">new</span> <span class="n">HashSet</span><span class="p"><</span><span class="kt">int</span><span class="p">>(</span><span class="n">queryInfo</span><span class="p">[</span><span class="n">tag2</span><span class="p">]);</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">id</span> <span class="k">in</span> <span class="n">queryInfo</span><span class="p">[</span><span class="n">tag1</span><span class="p">])</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">Count</span> <span class="p">>=</span> <span class="n">pageSize</span><span class="p">)</span>
        <span class="k">break</span><span class="p">;</span>
    <span class="n">baseQueryCounter</span><span class="p">++;</span>
    <span class="c1">// Skip questions that carry any of the excluded tags</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">questions</span><span class="p">[</span><span class="n">id</span><span class="p">].</span><span class="n">Tags</span><span class="p">.</span><span class="nf">Any</span><span class="p">(</span><span class="n">t</span> <span class="p">=></span> <span class="n">tagsToExclude</span><span class="p">.</span><span class="nf">Contains</span><span class="p">(</span><span class="n">t</span><span class="p">)))</span>
    <span class="p">{</span>
        <span class="n">excludedCounter</span><span class="p">++;</span>
    <span class="p">}</span>
    <span class="c1">// Remove(..) returns true only if the id was in the set, i.e. the question also has tag2</span>
    <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">andHashSet</span><span class="p">.</span><span class="nf">Remove</span><span class="p">(</span><span class="n">id</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">itemsSkipped</span> <span class="p">>=</span> <span class="n">skip</span><span class="p">)</span>
            <span class="n">result</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">questions</span><span class="p">[</span><span class="n">id</span><span class="p">]);</span>
        <span class="k">else</span>
            <span class="n">itemsSkipped</span><span class="p">++;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The main problem is that we have to scan through all the ids for <code class="language-plaintext highlighter-rouge">tag1</code> until we have enough matches, i.e. <code class="language-plaintext highlighter-rouge">foreach (var id in queryInfo[tag1])</code>. In addition we have to initially load up the <code class="language-plaintext highlighter-rouge">HashSet</code> with all the ids for <code class="language-plaintext highlighter-rouge">tag2</code>, so that we can check matches. So this method takes longer as we skip more and more questions, i.e. for larger values of <code class="language-plaintext highlighter-rouge">skip</code> or if there are a large number of <code class="language-plaintext highlighter-rouge">tagsToExclude</code> (i.e. “<em>Ignored Tags</em>”), see <a href="/2015/08/19/the-stack-overflow-tag-engine-part-2/#IgnoredTags">Part 2 for more information</a>.</p>
<h2 id="bitmaps"><a name="Bitmaps"></a><strong>Bitmaps</strong></h2>
<p>So can we do any better? Well yes, there is a fairly established mechanism for doing these types of queries, known as <a href="http://lemire.me/blog/archives/2008/08/20/the-mythical-bitmap-index/"><strong>Bitmap indexes</strong></a>. To use these you have to pre-calculate an index in which each bit is set to <code class="language-plaintext highlighter-rouge">1</code> to indicate a match and <code class="language-plaintext highlighter-rouge">0</code> otherwise. In our scenario this looks like so:</p>
<p><a href="/images/2015/10/bit-map-indexing-explanation.png"><img src="/images/2015/10/bit-map-indexing-explanation.png" alt="Bit Map Indexing explanation" /></a></p>
<p>Then it is just a case of doing the relevant bitwise operations against the bits (a <code class="language-plaintext highlighter-rouge">byte</code> at a time), for example if you want to get the questions that have the <code class="language-plaintext highlighter-rouge">C#</code> <code class="language-plaintext highlighter-rouge">AND</code> <code class="language-plaintext highlighter-rouge">Java</code> Tags, you do the following:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">numBits</span> <span class="p">/</span> <span class="m">8</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">result</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">=</span> <span class="n">bitSetCSharp</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">&</span> <span class="n">bitSetJava</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>
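<p>To make the byte-at-a-time idea a bit more concrete, here is a minimal, self-contained sketch of an <em>uncompressed</em> bitmap index. The class and method names (<code class="language-plaintext highlighter-rouge">BitmapSketch</code>, <code class="language-plaintext highlighter-rouge">Build</code>, <code class="language-plaintext highlighter-rouge">And</code>, <code class="language-plaintext highlighter-rouge">AndNot</code>, <code class="language-plaintext highlighter-rouge">ToIds</code>) are made up for illustration and are not taken from the Tag Engine code. It assumes each question is identified by a small integer position and that both bitmaps have the same length.</p>

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a tiny uncompressed bitmap index (hypothetical names).
public static class BitmapSketch
{
    // Build a bitmap in which bit n is 1 when question n carries the tag.
    public static byte[] Build(IEnumerable<int> questionIds, int numBits)
    {
        var bits = new byte[(numBits + 7) / 8]; // round up to whole bytes
        foreach (var id in questionIds)
            bits[id / 8] |= (byte)(1 << (id % 8));
        return bits;
    }

    // left AND right, one byte at a time (arrays assumed to be the same length).
    public static byte[] And(byte[] left, byte[] right)
    {
        var result = new byte[left.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = (byte)(left[i] & right[i]);
        return result;
    }

    // left AND (NOT right), e.g. "has tag A but does not have tag B".
    public static byte[] AndNot(byte[] left, byte[] right)
    {
        var result = new byte[left.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = (byte)(left[i] & ~right[i]);
        return result;
    }

    // Turn the set bits back into question positions, in ascending order.
    public static List<int> ToIds(byte[] bits)
    {
        var ids = new List<int>();
        for (int i = 0; i < bits.Length * 8; i++)
            if ((bits[i / 8] & (1 << (i % 8))) != 0)
                ids.Add(i);
        return ids;
    }
}
```

<p>For example, with <code class="language-plaintext highlighter-rouge">Build(new[] { 1, 3, 5, 8 }, 16)</code> as one tag and <code class="language-plaintext highlighter-rouge">Build(new[] { 3, 4, 8, 9 }, 16)</code> as another, <code class="language-plaintext highlighter-rouge">And</code> leaves positions 3 and 8 set, while <code class="language-plaintext highlighter-rouge">AndNot</code> leaves positions 1 and 5.</p>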
<p>The main drawback is that we have to create a Bitmap index for <em>each</em> tag (<code class="language-plaintext highlighter-rouge">C#</code>, <code class="language-plaintext highlighter-rouge">.NET</code>, <code class="language-plaintext highlighter-rouge">Java</code>, etc) for <em>every</em> sort order (<code class="language-plaintext highlighter-rouge">LastActivityDate</code>, <code class="language-plaintext highlighter-rouge">CreationDate</code>, <code class="language-plaintext highlighter-rouge">Score</code>, <code class="language-plaintext highlighter-rouge">ViewCount</code>, <code class="language-plaintext highlighter-rouge">AnswerCount</code>), so we soon use up a <em>lot</em> of memory. The Sept 2014 Stack Overflow dataset contains just under 8 million questions and so at 8 questions per byte, a single Bitmap needs 976KB or 0.95MB. This adds up to an impressive <strong>149GB</strong> (0.95MB * 32,000 Tags * 5 sort orders).</p>
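<p>The arithmetic behind those figures can be checked with a few lines of code. This is just a back-of-the-envelope sketch: the names are invented for illustration, and the inputs are the rounded numbers quoted above (8 million questions, 32,000 Tags, 5 sort orders).</p>

```csharp
using System;

// Illustrative only: rough memory estimate for uncompressed bitmaps.
public static class BitmapMemoryEstimate
{
    // One bit per question, so 8 questions fit into each byte (rounded up).
    public static long BytesPerBitmap(long questions)
    {
        return (questions + 7) / 8;
    }

    // One bitmap per tag, per sort order, reported in GB (1 GB = 1024^3 bytes).
    public static double TotalGigabytes(long questions, int tags, int sortOrders)
    {
        double totalBytes = (double)BytesPerBitmap(questions) * tags * sortOrders;
        return totalBytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```

<p>With those inputs <code class="language-plaintext highlighter-rouge">BytesPerBitmap(8000000)</code> gives 1,000,000 bytes (roughly 976KB) and <code class="language-plaintext highlighter-rouge">TotalGigabytes(8000000, 32000, 5)</code> comes out at roughly 149GB.</p>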
<h2 id="compressed-bitmaps"><a name="CompressedBitmaps"></a><strong>Compressed Bitmaps</strong></h2>
<p>Fortunately there is a way to heavily compress the Bitmaps using a form of <a href="http://en.wikipedia.org/wiki/Run-length_encoding">Run-length encoding</a>. To do this I made use of the <a href="https://github.com/lemire/csharpewah">C# version</a> of the excellent <a href="https://github.com/lemire/javaewah">EWAH library</a>. This library is based on the research carried out in the paper <a href="http://arxiv.org/abs/0901.3751">Sorting improves word-aligned bitmap indexes</a> by <a href="https://twitter.com/lemire">Daniel Lemire</a> and others. Using EWAH has the added benefit that you don’t need to uncompress the Bitmap to perform the bitwise operations, they can be done in-place (for an idea of how this is done take a look at <a href="https://github.com/mattwarren/StackOverflowTagServer/commit/20561e60e1b7d90ff0bb023ec8cf89494d0705f5">this commit where I added a single in-place <code class="language-plaintext highlighter-rouge">AndNot</code> function</a> to the existing library).</p>
<p>However if you don’t want to read the <a href="http://arxiv.org/abs/0901.3751">research paper</a>, the diagram below shows how the Bitmap is compressed into 64-bit <code class="language-plaintext highlighter-rouge">words</code> that have 1 or more bits set, plus runs of repeating zeros or ones. So <code class="language-plaintext highlighter-rouge">31 0x00</code> indicates that 31 instances of a <code class="language-plaintext highlighter-rouge">64-bit word</code> (with all the bits set to <code class="language-plaintext highlighter-rouge">0</code>) have been encoded as a single value, rather than as 31 individual <code class="language-plaintext highlighter-rouge">words</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 0x00
1 words
[ 0]= 17, 2 bits set ->
{0000000000000000000000000000000000000000000000000000000000010001}
31 0x00
1 words
[ 0]= 2199023255552, 1 bits set ->
{0000000000000000000000100000000000000000000000000000000000000000}
18 0x01
1 words
[ 0]= 64, 1 bits set ->
{0000000000000000000000000000000000000000000000000000000001000000}
48 0x01
3 words
[ 0]= 1048576, 1 bits set ->
{0000000000000000000000000000000000000000000100000000000000000000}
[ 1]= 9007199254740992, 1 bits set ->
{0000000000100000000000000000000000000000000000000000000000000000}
[ 2]= 9007199304740992, 13 bits set ->
{0000000000100000000000000000000000000010111110101111000010000000}
131 0x00
1 words
[ 0]= 536870912, 1 bits set ->
{0000000000000000000000000000000000100000000000000000000000000000}
....
</code></pre></div></div>
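<p>The general idea shown in the dump above can be sketched in a few lines. To be clear, this is <strong>not</strong> the EWAH format or its API, just a deliberately simplified word-aligned run-length encoder with invented names: runs of identical all-zero or all-one 64-bit words collapse into a single entry, and any other word is stored as a literal.</p>

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a simplified word-aligned run-length encoder.
// It mimics the idea behind EWAH but is not the real on-disk/in-memory format.
public static class WordRunLengthSketch
{
    public struct Entry
    {
        public bool IsRun;   // true for a run of identical all-0 or all-1 words
        public ulong Word;   // 0 or ulong.MaxValue for a run, otherwise the literal word
        public int Count;    // number of words represented (always 1 for a literal)
    }

    public static List<Entry> Compress(ulong[] words)
    {
        var output = new List<Entry>();
        int i = 0;
        while (i < words.Length)
        {
            ulong word = words[i];
            if (word == 0UL || word == ulong.MaxValue)
            {
                // Consume the whole run of identical "clean" words
                int start = i;
                while (i < words.Length && words[i] == word)
                    i++;
                output.Add(new Entry { IsRun = true, Word = word, Count = i - start });
            }
            else
            {
                // A word with a mix of 0s and 1s is kept verbatim
                output.Add(new Entry { IsRun = false, Word = word, Count = 1 });
                i++;
            }
        }
        return output;
    }

    public static ulong[] Decompress(List<Entry> entries)
    {
        var words = new List<ulong>();
        foreach (var entry in entries)
            for (int n = 0; n < entry.Count; n++)
                words.Add(entry.Word);
        return words.ToArray();
    }
}
```

<p>Feeding it the first few values from the dump (the word <code class="language-plaintext highlighter-rouge">17</code>, then 31 all-zero words, then <code class="language-plaintext highlighter-rouge">2199023255552</code>, then 18 all-one words, then <code class="language-plaintext highlighter-rouge">64</code>) turns 52 words into just 5 entries. The real library goes further, for instance by packing the run lengths and literal counts into marker words.</p>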
<p>To give an idea of the space savings that can be achieved, the table below shows the size in bytes for compressed Bitmaps that have varying numbers of individual bits set to <code class="language-plaintext highlighter-rouge">1</code> (for comparison, uncompressed Bitmaps are 1,000,000 bytes or 0.95MB).</p>
<table>
<thead>
<tr>
<th># Bits Set</th>
<th>Size in Bytes</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>24</td>
</tr>
<tr>
<td>10</td>
<td>168</td>
</tr>
<tr>
<td>25</td>
<td>408</td>
</tr>
<tr>
<td>50</td>
<td>808</td>
</tr>
<tr>
<td>100</td>
<td>1,608</td>
</tr>
<tr>
<td>200</td>
<td>3,208</td>
</tr>
<tr>
<td>400</td>
<td>6,408</td>
</tr>
<tr>
<td>800</td>
<td>12,808</td>
</tr>
<tr>
<td>1,600</td>
<td>25,608</td>
</tr>
<tr>
<td>32,000</td>
<td>512,008</td>
</tr>
<tr>
<td>64,000</td>
<td>1,000,008</td>
</tr>
<tr>
<td>128,000</td>
<td>1,000,008</td>
</tr>
</tbody>
</table>
<p>As you can see it’s not until we get close to 64,000 bits set (62,016 to be precise) that we match the size of the regular Bitmaps. <strong>Note:</strong> in these tests I was setting the bits with an evenly spaced distribution across the entire range of 8 million possible bits. The compression is also dependent on which bits are set, so this is a worst case. The more the bits are clumped together (within the same <code class="language-plaintext highlighter-rouge">byte</code>), the more it will be compressed.</p>
<p>So over the entire Stack Overflow data set of 32,000 Tags, the Bitmaps compress down to an impressive <strong>1.17GB</strong>, compared to 149GB uncompressed!</p>
<h2 id="results"><a name="Results"></a><strong>Results</strong></h2>
<p>But do queries against compressed Bitmaps actually perform faster than the naive queries using <code class="language-plaintext highlighter-rouge">HashSets</code> (see code above)? Well yes they do, and in some cases the difference is significant.</p>
<p>As you can see below, for <code class="language-plaintext highlighter-rouge">AND NOT</code> queries they are much faster, especially compared to the worst case where the regular/naive code takes over 150 ms and the compressed Bitmap code takes ~5 ms (the x-axis is <code class="language-plaintext highlighter-rouge"># of excluded/skipped questions</code> and the y-axis is <code class="language-plaintext highlighter-rouge">time in milliseconds</code>).</p>
<p><a href="/images/2015/10/and-not-queries-with-exclusions.png"><img src="/images/2015/10/and-not-queries-with-exclusions.png" alt="AND NOT Queries with Exclusions" /></a></p>
<p>For reference there are 194,384 questions tagged with <code class="language-plaintext highlighter-rouge">.net</code> and 528,490 tagged with <code class="language-plaintext highlighter-rouge">jquery</code>.</p>
<p>To ensure I’m being fair, I should point out that the compressed Bitmap queries are <em>slower</em> for <code class="language-plaintext highlighter-rouge">OR</code> queries, as shown below. But note the scale, they take ~5 ms compared to ~1-2 ms for the regular queries, so the compressed Bitmap queries are still fast! The nice thing about the compressed Bitmap queries is that they take the same amount of time, regardless of how many questions we skip, whereas the regular queries get slower as <code class="language-plaintext highlighter-rouge"># of excluded/skipped questions</code> increases.</p>
<p><a href="/images/2015/10/or-queries-with-exclusions.png"><img src="/images/2015/10/or-queries-with-exclusions.png" alt="OR Queries with Exclusions" /></a></p>
<p>If you are interested, the results for all the query types are available:</p>
<ul>
<li><a href="/images/2015/10/and-queries-with-exclusions.png">AND Queries</a></li>
<li><a href="/images/2015/10/and-not-queries-with-exclusions.png">AND NOT Queries</a></li>
<li><a href="/images/2015/10/or-queries-with-exclusions.png">OR Queries</a></li>
<li><a href="/images/2015/10/or-not-queries-with-exclusions.png">OR NOT Queries</a></li>
<li><a href="/images/2015/10/or-not-queries-with-exclusions.png">OR NOT Queries</a></li>
</ul>
<h2 id="further-reading"><a name="FurtherReading"></a><strong>Further Reading</strong></h2>
<ul>
<li>Bitmaps
<ul>
<li><a href="http://lemire.me/blog/archives/2008/08/20/the-mythical-bitmap-index/">The mythical bitmap index</a></li>
<li><a href="http://roaringbitmap.org/">Roaring Bitmaps</a> (a newer/faster compressed Bit Map implementation)</li>
<li><a href="http://lemire.me/blog/archives/2012/10/23/when-is-a-bitmap-faster-than-an-integer-list/">When is a bitmap faster than an integer list</a></li>
<li><a href="http://kellabyte.com/2013/03/05/using-bitmap-indexes-in-databases/">Using bitmap indexes in databases</a></li>
<li><a href="https://news.ycombinator.com/item?id=8796997">Interesting Hacker News discussion on Roaring Bitmaps</a></li>
<li><a href="http://ascr-discovery.science.doe.gov/2008/12/more-than-a-bit-faster/">Research into different Bitmap implementations</a></li>
</ul>
</li>
<li>Real-world usage
<ul>
<li><a href="http://githubengineering.com/counting-objects/">How GitHub used Bitmaps to speed up repository cloning</a></li>
<li><a href="https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps">Roaring Bitmap implementation in Elastic Search</a></li>
<li><a href="https://issues.apache.org/jira/browse/LUCENE-5983">Usage of Bitmaps indexes in Lucene</a></li>
<li><a href="https://groups.google.com/forum/m/#!topic/druid-development/_kw2jncIlp0">Compressed Bitmaps implemented in Druid</a></li>
</ul>
</li>
</ul>
<h2 id="future-posts"><a name="FuturePosts"></a><strong>Future Posts</strong></h2>
<p>But there are still more things to implement. In future posts I hope to cover the following:</p>
<ul>
<li>Currently my implementation doesn’t play nicely with the Garbage Collector and it does lots of allocations. I will attempt to replicate the “no-allocations” rule that Stack Overflow have after <a href="http://blog.marcgravell.com/2011/10/assault-by-gc.html">their battle with the .NET GC</a></li>
</ul>
<p><a href="https://twitter.com/Nick_Craver/status/636516399435923456"><img src="/images/2015/10/nick_craver-tweet.png" alt="Nick_Craver Tweet" /></a></p>
<ul>
<li><a href="http://stackstatus.net/post/107352821074/outage-postmortem-january-6th-2015">How a DDOS attack on TagServer</a> <em>might</em> have been caused</li>
</ul>
<blockquote>
<p>In October, we had a situation where a flood of crafted requests were causing high resource utilization on our Tag Engine servers, which is our internal application for associating questions and tags in a high-performance way.</p>
</blockquote>
<p>The post <a href="http://www.mattwarren.org/2015/10/29/the-stack-overflow-tag-engine-part-3/">The Stack Overflow Tag Engine – Part 3</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
The Stack Overflow Tag Engine – Part 2 | 2015-08-19T00:00:00+00:00 | http://www.mattwarren.org/2015/08/19/the-stack-overflow-tag-engine-part-2
<p>I’ve added a <a href="/resources/"><strong>Resources</strong></a> and <a href="/speaking/"><strong>Speaking</strong></a> page to my site, check them out if you want to learn more. There’s also a video available of my NDC London 2014 talk <a href="/speaking/#NDCLondon2014">“Performance is a Feature!”</a>.</p>
<hr />
<h2 id="recap-of-stack-overflow-tag-engine"><a name="Recap"></a><strong>Recap of Stack Overflow Tag Engine</strong></h2>
<p>This is the long-delayed part 2 of a mini-series looking at what it <em>might</em> take to build the Stack Overflow Tag Engine. If you haven’t read <a href="/2014/11/01/the-stack-overflow-tag-engine-part-1/" target="_blank">part 1</a>, I recommend reading it first.</p>
<p>Since the first part was published, Stack Overflow published a nice performance report, giving some more stats on the Tag Engine Servers. As you can see they run the Tag Engine on some pretty powerful servers, but only have a peak CPU usage of 10%, which means there’s plenty of headroom available. It’s a nice way of being able to cope with surges in demand or busy times of the day.</p>
<p><a href="https://stackexchange.com/performance" target="_blank"><img src="/images/2015/08/tag-server-infographic.png" alt="Tag Engine infographic" /></a></p>
<h2 id="ignored-tag-preferences"><a name="IgnoredTags"></a><strong>Ignored Tag Preferences</strong></h2>
<p>In <a href="/2014/11/01/the-stack-overflow-tag-engine-part-1/" target="_blank">part 1</a>, I only really covered the simple things, i.e. a basic search for all the questions that contain a given tag, along with multiple sort orders (by score, view count, etc). But the real Tag Engine does much more than that, for instance:</p>
<p><a href="https://twitter.com/marcgravell/status/522515630248189953" target="_blank"><img src="/images/2015/08/tweet-wildcard-exclusions.png" alt="Tweet - Wildcard exclusions" /></a></p>
<p>What is he talking about here? Well, any time you do a <em>tag</em> search, per-user exclusions can be applied after the actual search has been done. These exclusions are configurable and allow you to set <em>“Ignored Tags”</em>, i.e. tags that you don’t want to see questions for. Then when you do a search, it will exclude these questions from the results.</p>
<p>Note: it will let you know if there were questions excluded due to your preferences, which is a pretty nice user-experience. If that happens, you get the message below (it can also be configured so that matching questions are greyed out instead):</p>
<p><a href="/images/2015/08/questions-hidden-due-to-ignored-tag-preferences.png" target="_blank"><img src="/images/2015/08/questions-hidden-due-to-ignored-tag-preferences.png" alt="Questions hidden due to Ignored Tag preferences" /></a></p>
<p>Now most people probably have just a few exclusions, maybe tens at most, but fortunately <a href="https://twitter.com/leppie" target="_blank">@leppie</a>, a Stack Overflow <a href="http://stackoverflow.com/users/15541/leppie" target="_blank"><em>power-user</em></a>, got in touch with me and shared his list of preferences.</p>
<script src="https://gist.github.com/leppie/4d9b84abd8c2d06d6ef4.js"></script>
<p>You’ll need to scroll across to appreciate the full extent of this list, but here are some statistics to help you:</p>
<blockquote>
<ul>
<li>It contains <strong>3,753</strong> items, of which <strong>210</strong> are wildcards (e.g. cocoa* or *hibernate*)</li>
<li>The tags and wildcards expand to <strong>7,677</strong> tags in total (out of a possible 30,529 tags)</li>
<li>There are <strong>6,428,251</strong> questions (out of 7,990,787) that have at least one of the 7,677 tags in them!</li>
</ul>
</blockquote>
<h2 id="wildcards"><a name="Wildcards"></a><strong>Wildcards</strong></h2>
<p>If you want to see the wildcard expansion in action you can visit the URLs below:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/tagged/*java*?sort=votes" target="_blank">*java*</a>
<ul>
<li>[facebook-javascript-sdk] [java] [java.util.scanner] [java-7] [java-8] [javabeans] [javac] [javadoc] [java-ee] [java-ee-6] [javafx] [javafx-2] [javafx-8] [java-io] [javamail] [java-me] [javascript] [javascript-events] [javascript-objects] [java-web-start]</li>
</ul>
</li>
<li><a href="http://stackoverflow.com/questions/tagged/.net*?sort=votes" target="_blank">.net*</a>
<ul>
<li>[.net] [.net-1.0] [.net-1.1] [.net-2.0] [.net-3.0] [.net-3.5] [.net-4.0] [.net-4.5] [.net-4.5.2] [.net-4.6] [.net-assembly] [.net-cf-3.5] [.net-client-profile] [.net-core] [.net-framework-version] [.net-micro-framework] [.net-reflector] [.net-remoting] [.net-security] [.nettiers]</li>
</ul>
</li>
</ul>
<p>Now a simple way of doing these matches is the following, i.e. loop through the wildcards and compare each one with every single tag to see if it could be expanded to match that tag. (<code class="language-plaintext highlighter-rouge">IsActualMatch(..)</code> is a simple method that does a basic string <a href="https://msdn.microsoft.com/en-us/library/baketfxw(v=vs.110).aspx" target="_blank">StartsWith</a>, <a href="https://msdn.microsoft.com/en-us/library/2333wewz(v=vs.110).aspx" target="_blank">EndsWith</a> or <a href="https://msdn.microsoft.com/en-us/library/dy85x1sa(v=vs.110).aspx" target="_blank">Contains</a> as appropriate)</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">expandedTags</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">HashSet</span><span class="p">();</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">wildcard</span> <span class="k">in</span> <span class="n">wildcardsToExpand</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nf">IsWildCard</span><span class="p">(</span><span class="n">tagToExpand</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">rawTagPattern</span> <span class="p">=</span> <span class="n">tagToExpand</span><span class="p">.</span><span class="nf">Replace</span><span class="p">(</span><span class="s">"*"</span><span class="p">,</span> <span class="s">""</span><span class="p">);</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">tag</span> <span class="k">in</span> <span class="n">allTags</span><span class="p">.</span><span class="n">Keys</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nf">IsActualMatch</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span> <span class="n">tagToExpand</span><span class="p">,</span> <span class="n">rawTagPattern</span><span class="p">))</span>
<span class="n">expandedTags</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">tag</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">allTags</span><span class="p">.</span><span class="nf">ContainsKey</span><span class="p">(</span><span class="n">tagToExpand</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">expandedTags</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">tagToExpand</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This works fine with a few wildcards, but it’s not very efficient. Even on a relatively small data-set containing 32,000 tags, expanding 210 <code class="language-plaintext highlighter-rouge">wildcardsToExpand</code> this way is slow, taking over a second. According to a few of the Stack Overflow developers I chatted to on Twitter, a Tag Engine query that takes longer than 500 milliseconds is considered slow, so a second just to apply the wildcards is unacceptable.</p>
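<p>For reference, here is a minimal sketch of what the two helpers used in the loop <em>might</em> look like. The bodies below are my assumption of the “StartsWith, EndsWith or Contains as appropriate” logic, not a copy of the real implementation, and they only handle a <code class="language-plaintext highlighter-rouge">*</code> at the start and/or end of the pattern:</p>

```csharp
using System;

public static class WildcardMatching
{
    // Hypothetical helper: a tag counts as a wildcard if it contains '*'.
    public static bool IsWildCard(string tag)
    {
        return tag.Contains("*");
    }

    // Hypothetical sketch of IsActualMatch(..).
    // 'pattern' is the wildcard as typed (e.g. "*java*"),
    // 'rawPattern' is the same text with every '*' removed (e.g. "java").
    public static bool IsActualMatch(string tag, string pattern, string rawPattern)
    {
        bool leading = pattern.StartsWith("*", StringComparison.Ordinal);
        bool trailing = pattern.EndsWith("*", StringComparison.Ordinal);

        if (leading && trailing)
            return tag.Contains(rawPattern);                             // *java*
        if (trailing)
            return tag.StartsWith(rawPattern, StringComparison.Ordinal); // .net*
        if (leading)
            return tag.EndsWith(rawPattern, StringComparison.Ordinal);   // *script

        return tag == rawPattern;                                        // no wildcard at either end
    }
}
```

<p>Each call is cheap on its own; the cost comes from making one call per wildcard per tag, which is what the nested loops above do.</p>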
<h2 id="trigram-index"><a name="TrigramIndex"></a><strong>Trigram Index</strong></h2>
<p>So can we do any better? Well, it turns out that there is a really nice technique for doing <a href="https://swtch.com/~rsc/regexp/regexp4.html" target="_blank">Regular Expression Matching with a Trigram Index</a> that is used in <a href="https://code.google.com/p/chromium/codesearch" target="_blank">Google Code Search</a>. I’m not going to explain all the details, as the linked page has a very readable explanation. But basically what you do is create an <em>inverted index</em> of the tags and search the index instead. That way you aren’t affected so much by the number of wildcards, because each lookup goes via the index rather than running over the whole list of tags.</p>
<p>When using Trigrams, the tags are initially split into 3-letter chunks; for instance, the expansion for the tag <em>javascript</em> is shown below (‘_’ is added to denote the start/end of a word):</p>
<blockquote>
<p>_ja, jav, ava, vas, asc, scr, cri, rip, ipt, pt_</p>
</blockquote>
<p>Next you create an index of all the tags as trigrams and include the position of the tag they came from, so that you can reference back to it later:</p>
<blockquote>
<ul>
<li>_ja -> { 0, 5, 6 }</li>
<li>jav -> { 0, 5, 12 }</li>
<li>ava -> { 0, 5, 6 }</li>
<li>va_ -> { 0, 5, 11, 13 }</li>
<li>_ne -> { 1, 10, 12 }</li>
<li>net -> { 1, 10, 12, 15 }</li>
<li>…</li>
</ul>
</blockquote>
<p>For example, if you want to match any tags that contain <em>java</em> anywhere in the tag, i.e. a *java* wildcard query, you fetch the index values for <code class="language-plaintext highlighter-rouge">jav</code> and <code class="language-plaintext highlighter-rouge">ava</code>, which gives you (from above) these 2 matching index items:</p>
<blockquote>
<ul>
<li>jav -> { 0, 5, 12 }</li>
<li>ava -> { 0, 5, 6 }</li>
</ul>
</blockquote>
<p>and you now know that the tags with index <em>0</em> and <em>5</em> are the only matches because they have <code class="language-plaintext highlighter-rouge">jav</code> and <code class="language-plaintext highlighter-rouge">ava</code> (<em>6</em> and <em>12</em> don’t have both)</p>
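<p>To make the steps above concrete, here is a small sketch of building and querying such an index. All the names (<code class="language-plaintext highlighter-rouge">TrigramIndexSketch</code>, <code class="language-plaintext highlighter-rouge">Trigrams</code>, <code class="language-plaintext highlighter-rouge">BuildIndex</code>, <code class="language-plaintext highlighter-rouge">ExpandContains</code>) are made up for illustration and it only covers <em>*text*</em> style queries. The final <code class="language-plaintext highlighter-rouge">Contains</code> check is my own addition: a tag can hold both <code class="language-plaintext highlighter-rouge">jav</code> and <code class="language-plaintext highlighter-rouge">ava</code> without holding <code class="language-plaintext highlighter-rouge">java</code>, so the intersection is treated as a list of candidates.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrigramIndexSketch
{
    // Split a tag into 3-character chunks, padding with '_' to mark the start/end.
    public static IEnumerable<string> Trigrams(string tag)
    {
        var padded = "_" + tag + "_";
        for (int i = 0; i + 3 <= padded.Length; i++)
            yield return padded.Substring(i, 3);
    }

    // Inverted index: trigram -> positions (within allTags) of the tags that contain it.
    public static Dictionary<string, HashSet<int>> BuildIndex(IList<string> allTags)
    {
        var index = new Dictionary<string, HashSet<int>>();
        for (int position = 0; position < allTags.Count; position++)
        {
            foreach (var trigram in Trigrams(allTags[position]))
            {
                if (!index.TryGetValue(trigram, out var positions))
                    index[trigram] = positions = new HashSet<int>();
                positions.Add(position);
            }
        }
        return index;
    }

    // Expand a "*text*" query, where 'text' is the pattern with the '*' characters removed.
    public static List<string> ExpandContains(
        string text, IList<string> allTags, Dictionary<string, HashSet<int>> index)
    {
        // Too short to form a trigram: fall back to scanning every tag.
        if (text.Length < 3)
            return allTags.Where(tag => tag.Contains(text)).ToList();

        HashSet<int> candidates = null;
        for (int i = 0; i + 3 <= text.Length; i++)
        {
            // A trigram that is missing from the index means no tag can match.
            if (!index.TryGetValue(text.Substring(i, 3), out var positions))
                return new List<string>();

            if (candidates == null)
                candidates = new HashSet<int>(positions);
            else
                candidates.IntersectWith(positions);
        }

        // Map the surviving positions back to tags and confirm each candidate.
        return candidates
            .OrderBy(position => position)
            .Select(position => allTags[position])
            .Where(tag => tag.Contains(text))
            .ToList();
    }
}
```

<p>The index is built once up-front, so each wildcard then costs a handful of dictionary lookups and set intersections, rather than a pass over every tag.</p>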
<h2 id="results"><a name="Results"></a><strong>Results</strong></h2>
<p>On my laptop I get the results shown below, where <code class="language-plaintext highlighter-rouge">Contains</code> is the naive way shown above and <code class="language-plaintext highlighter-rouge">Regex</code> is an <em>attempt</em> to make it faster by using compiled Regex queries (which was actually slower)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Expanded to 7,677 tags (Contains), took 721.51 ms
Expanded to 7,677 tags (Regex), took 1,218.69 ms
Expanded to 7,677 tags (Trigrams), took 54.21 ms
</code></pre></div></div>
<p>As you can see, the inverted index using Trigrams is a clear winner. If you are interested, the <a href="https://github.com/mattwarren/StackOverflowTagServer/blob/master/TagServer/WildcardProcessor.cs" target="_blank">source code</a> is available on GitHub.</p>
<p>In this post I showed <em>one way</em> that the Tag Engine could implement wildcard matching. As I don’t work at Stack Overflow there’s no way of knowing if they use the same method or not, but at the very least my method is pretty quick!</p>
<h2 id="future-posts"><a name="FuturePosts"></a><strong>Future Posts</strong></h2>
<p>But there are still more things to implement; in future posts I hope to cover the following:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/tagged/.net+or+jquery-" target="_blank">Complex boolean queries</a>, i.e. questions tagged “c# OR .NET”, “.net AND (NOT jquery)” and how to make them fast</li>
<li><a href="http://stackstatus.net/post/107352821074/outage-postmortem-january-6th-2015" target="_blank">How a DDOS attack on TagServer</a> <em>might</em> have been caused</li>
</ul>
<blockquote>
<p>In October, we had a situation where a flood of crafted requests were causing high resource utilization on our Tag Engine servers, which is our internal application for associating questions and tags in a high-performance way.</p>
</blockquote>
<p>The post <a href="http://www.mattwarren.org/2015/08/19/the-stack-overflow-tag-engine-part-2/">The Stack Overflow Tag Engine – Part 2</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
The Stack Overflow Tag Engine – Part 1 2014-11-01T00:00:00+00:00 http://www.mattwarren.org/2014/11/01/the-stack-overflow-tag-engine-part-1
<p>I’ve added a <a href="/resources/"><strong>Resources</strong></a> and <a href="/speaking/"><strong>Speaking</strong></a> page to my site, check them out if you want to learn more.</p>
<hr />
<h2 id="stack-overflow-tag-engine"><a name="Introduction"></a><strong>Stack Overflow Tag Engine</strong></h2>
<p>I first heard about the Stack Overflow <a href="http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector" target="_blank"><em>Tag engine of doom</em></a> when I read about <a href="http://blog.marcgravell.com/2011/10/assault-by-gc.html" target="_blank">their battle with the .NET Garbage Collector</a>. If you haven’t heard of it before I recommend reading the previous links and then this interesting <a href="http://blog.marcgravell.com/2014/04/technical-debt-case-study-tags.html" target="_blank">case-study on technical debt</a>.</p>
<p>But if you’ve ever visited <a href="http://www.stackoverflow.com" target="_blank">Stack Overflow</a> you will have used it, maybe without even realising. It powers the pages under <code class="language-plaintext highlighter-rouge">stackoverflow.com/questions/tagged</code>, for instance you can find the questions tagged <a href="http://stackoverflow.com/questions/tagged/.net" target="_blank">.NET</a>, <a href="http://stackoverflow.com/questions/tagged/c%23" target="_blank">C#</a> or <a href="http://stackoverflow.com/questions/tagged/java" target="_blank">Java</a> and you get a page like this (note the related tags down the right-hand side):</p>
<p><a href="http://stackoverflow.com/questions/tagged/.net" target="_blank"><img src="/images/2014/10/dotnet-tag.png" alt="dotNet Tag" class="aligncenter" /></a></p>
<h2 id="tag-api"><a name="TagAPI"></a><strong>Tag API</strong></h2>
<p>As well as simple searches, you can also tailor the results with more complex queries (you may need to be logged into the site for these links to work), so you can search for:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/tagged/.net+or+jquery-" target="_blank">questions tagged with .NET but not jQuery</a></li>
<li><a href="http://stackoverflow.com/questions/tagged/c%23?order=desc&sort=votes" target="_blank">the most popular C# questions (by votes)</a></li>
<li><a href="http://stackoverflow.com/questions/tagged/xml?sort=frequent&page=10&pagesize=5" target="_blank">page 10 of the most frequently linked to XML questions</a></li>
<li><a href="http://stackoverflow.com/questions/tagged/.net?page=197709&sort=newest&pagesize=1" target="_blank">the oldest .NET question</a></li>
</ul>
<p>It’s worth noting that all these searches take your personal preferences into account. So if you have asked to have any tags excluded, questions containing these tags are filtered out. You can see your preferences by going to your account page and clicking on <em>Preferences</em>; the <em>Ignored Tags</em> are then listed at the bottom of the page. Apparently some power-users on the site have hundreds of ignored tags, so dealing with these is a non-trivial problem.</p>
<h2 id="publicly-available-question-data-set"><a name="DataSet"></a><strong>Publicly available Question Data set</strong></h2>
<p>As I said I wanted to see what was involved in building a version of the Tag Engine. Fortunately, data from <a href="https://archive.org/details/stackexchange" target="_blank">all the Stack Exchange sites</a> is available to download. To keep things simple I just worked with the posts (not their entire history of edits), so I downloaded <a href="https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z" target="_blank">stackoverflow.com-Posts.7z</a> (warning: direct link to a 5.7 GB file), which appears to contain data up to the middle of September 2014. To give an idea of what is in the data set, a typical question looks like the XML below. For the Tag Engine we only need the items highlighted in red, because it is only providing an index into the actual questions themselves, so we ignore any <strong>content</strong> and just look at the <strong>meta-data</strong>.</p>
<p><a href="/images/2014/10/sample-question-parts-used-highlighted-in-red.png" target="_blank"><img src="/images/2014/10/sample-question-parts-used-highlighted-in-red.png" alt="Sample Question" class="aligncenter" /></a></p>
<p>Below is the output of the code that runs on start-up and processes the data. You can see there are just over 7.9 million questions in the data set, taking up just over 2GB of memory when read into a <a href="https://github.com/mattwarren/StackOverflowTagServer/blob/master/Shared/Question.cs" target="_blank"><code class="language-plaintext highlighter-rouge">List<Question></code></a>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Took 00:00:31.623 to DE-serialise 7,990,787 Stack Overflow Questions, used 2136.50 MB
Took 00:01:14.229 (74,229 ms) to group all the tags, used 2799.32 MB
Took 00:00:34.148 (34,148 ms) to create all the "related" tags info, used 362.57 MB
Took 00:01:31.662 (91,662 ms) to sort the 191,025 arrays
After SETUP - Using 4536.21 MB of memory in total
</code></pre></div></div>
<p>So it takes roughly <em>31 seconds</em> to de-serialise the data from disk (yay <a href="https://code.google.com/p/protobuf-net/">protobuf-net</a>!) and another <em>3 1/2 minutes</em> to process and sort it. At the end we are using roughly 4.5GB of memory.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Max LastActivityDate 14/09/2014 03:07:29
Min LastActivityDate 18/08/2008 03:34:29
Max CreationDate 14/09/2014 03:06:45
Min CreationDate 31/07/2008 21:42:52
Max Score 8596 (Id 11227809)
Min Score -147
Max ViewCount 1917888 (Id 184618)
Min ViewCount 1
Max AnswerCount 518 (Id 184618)
Min AnswerCount 0
</code></pre></div></div>
<p>Yes that’s right, there is actually a Stack Overflow question with <a href="http://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered" target="_blank">1.9 million views</a>. Not surprisingly it’s locked for editing, but it’s also considered “not constructive”! The same question also has 518 answers, the most of any on the site, and if you’re wondering, the highest-scoring question has an impressive score of 8,596 and is titled <a href="http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array" target="_blank">Why is processing a sorted array faster than an unsorted array?</a></p>
<h2 id="creating-an-index"><a name="CreatingAnIndex"></a><strong>Creating an Index</strong></h2>
<p>So what does the index actually look like? Well, it’s basically a series of sorted lists (<code class="language-plaintext highlighter-rouge">List<int></code>) that contain offsets into the main <code class="language-plaintext highlighter-rouge">List<Question></code> that contains all the <a href="https://github.com/mattwarren/StackOverflowTagServer/blob/master/Shared/Question.cs" target="_blank"><code class="language-plaintext highlighter-rouge">Question</code></a> data. Or in a diagram, something like this:</p>
<p><a href="/images/2014/09/indexing-explanation.png" target="_blank"><img src="/images/2014/09/indexing-explanation.png" alt="Indexing explanation" /></a></p>
<p><strong>Note:</strong> This is very similar to the way that <a href="http://lucene.apache.org/" target="_blank">Lucene</a> indexes data.</p>
<p>It turns out that the code to do this isn’t that complex:</p>
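<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tagsByLastActivityDate</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Dictionary</span><span class="p"><</span><span class="kt">string</span><span class="p">,</span> <span class="kt">int</span><span class="p">[</span><span class="k">]></span><span class="p">(</span><span class="n">groupedTags</span><span class="p">.</span><span class="n">Count</span><span class="p">);</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">tag</span> <span class="k">in</span> <span class="n">groupedTags</span><span class="p">)</span> <span class="c1">// enclosing loop and Add(..) below are assumed, the excerpt only showed the copy and the sort</span>
<span class="p">{</span>
<span class="c1">// start with a copy of the tag's positions, with Id's in order, { 0, 1, 2, 3, 4, 5, ..... }</span>
<span class="kt">var</span> <span class="n">byLastActivityDate</span> <span class="p">=</span> <span class="n">tag</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">Positions</span><span class="p">.</span><span class="nf">ToArray</span><span class="p">();</span>
<span class="n">Array</span><span class="p">.</span><span class="nf">Sort</span><span class="p">(</span><span class="n">byLastActivityDate</span><span class="p">,</span> <span class="n">comparer</span><span class="p">.</span><span class="n">LastActivityDate</span><span class="p">);</span>
<span class="n">tagsByLastActivityDate</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">tag</span><span class="p">.</span><span class="n">Key</span><span class="p">,</span> <span class="n">byLastActivityDate</span><span class="p">);</span>
<span class="p">}</span>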
</code></pre></div></div>
<p>Where the comparer is as simple as the following (note that it is sorting the <code class="language-plaintext highlighter-rouge">byLastActivityDate</code> array, using the values in the <code class="language-plaintext highlighter-rouge">questions</code> array to determine the sort order):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="kt">int</span> <span class="nf">LastActivityDate</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">questions</span><span class="p">[</span><span class="n">y</span><span class="p">].</span><span class="n">LastActivityDate</span> <span class="p">==</span> <span class="n">questions</span><span class="p">[</span><span class="n">x</span><span class="p">].</span><span class="n">LastActivityDate</span><span class="p">)</span>
<span class="k">return</span> <span class="nf">CompareId</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>
<span class="c1">// Compare LastActivityDate DESCENDING, i.e. most recent is first</span>
<span class="k">return</span> <span class="n">questions</span><span class="p">[</span><span class="n">y</span><span class="p">].</span><span class="n">LastActivityDate</span><span class="p">.</span><span class="nf">CompareTo</span><span class="p">(</span><span class="n">questions</span><span class="p">[</span><span class="n">x</span><span class="p">].</span><span class="n">LastActivityDate</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>So once we’ve created the sorted lists on the left and right of the diagram above (<code class="language-plaintext highlighter-rouge">Last Edited</code> and <code class="language-plaintext highlighter-rouge">Score</code>), we can just traverse them <em>in order</em> to get the indexes of the <code class="language-plaintext highlighter-rouge">Questions</code>. For instance, if we walk through the <code class="language-plaintext highlighter-rouge">Score</code> array in order <code class="language-plaintext highlighter-rouge">(1, 2, .., 7, 8)</code>, collecting the Ids as we go, we end up with <code class="language-plaintext highlighter-rouge">{ 8, 4, 3, 5, 6, 1, 2, 7 }</code>, which are the array indexes for the corresponding <code class="language-plaintext highlighter-rouge">Questions</code>. The code to do this is the following, taking account of the <code class="language-plaintext highlighter-rouge">pageSize</code> and <code class="language-plaintext highlighter-rouge">skip</code> values:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="n">queryInfo</span><span class="p">[</span><span class="n">tag</span><span class="p">]</span>
<span class="p">.</span><span class="nf">Skip</span><span class="p">(</span><span class="n">skip</span><span class="p">)</span>
<span class="p">.</span><span class="nf">Take</span><span class="p">(</span><span class="n">pageSize</span><span class="p">)</span>
<span class="p">.</span><span class="nf">Select</span><span class="p">(</span><span class="n">i</span> <span class="p">=></span> <span class="n">questions</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="p">.</span><span class="nf">ToList</span><span class="p">();</span>
</code></pre></div></div>
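<p>Putting the sort and the paging together, a self-contained sketch of the idea could look like the following. The <code class="language-plaintext highlighter-rouge">Question</code> class here is a cut-down stand-in for the real one, and <code class="language-plaintext highlighter-rouge">IndexSketch</code>, <code class="language-plaintext highlighter-rouge">BuildScoreIndex</code> and <code class="language-plaintext highlighter-rouge">Page</code> are names invented for this example. Breaking ties on <code class="language-plaintext highlighter-rouge">Id</code> is an assumption that mirrors the <code class="language-plaintext highlighter-rouge">CompareId(..)</code> call in the comparer above.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the real Question class, only the fields this sketch needs.
public class Question
{
    public int Id;
    public int Score;
}

public static class IndexSketch
{
    // Returns the given positions (offsets into 'questions') ordered by Score DESCENDING.
    // Only the small int[] is sorted, the questions themselves never move.
    public static int[] BuildScoreIndex(List<Question> questions, IEnumerable<int> positions)
    {
        var byScore = positions.ToArray();
        Array.Sort(byScore, (x, y) =>
        {
            int result = questions[y].Score.CompareTo(questions[x].Score);
            // Equal scores: fall back to Id so the order is stable and repeatable.
            return result != 0 ? result : questions[x].Id.CompareTo(questions[y].Id);
        });
        return byScore;
    }

    // Walk the pre-sorted index in order and map each offset back to its Question.
    public static List<Question> Page(List<Question> questions, int[] index, int skip, int pageSize)
    {
        return index
            .Skip(skip)
            .Take(pageSize)
            .Select(i => questions[i])
            .ToList();
    }
}
```

<p>Because the expensive sorting happens once at start-up, answering a query is just a <code class="language-plaintext highlighter-rouge">Skip</code>/<code class="language-plaintext highlighter-rouge">Take</code> over an array that is already in the right order.</p>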
<p>Once that’s all done, I ended up with an API that you can query in the browser. Note that the timing shown is the time taken on the server-side, but it is correct: basic queries against a single tag are lightning quick!</p>
<p><a href="/images/2014/10/API Usage in Chrome.png" target="_blank"><img src="/images/2014/10/API Usage in Chrome.png" /></a></p>
<h2 id="next-time"><a name="NextTime"></a><strong>Next time</strong></h2>
<p>Now that the basic index is set up, next time I’ll be looking at how to handle:</p>
<ul>
<li>Complex boolean queries <code class="language-plaintext highlighter-rouge">.net or jquery- and c#</code></li>
<li>Power users who have hundreds of excluded tags</li>
</ul>
<p>and anything else that I come up with in the meantime.</p>
<p>The post <a href="http://www.mattwarren.org/2014/11/01/the-stack-overflow-tag-engine-part-1/">The Stack Overflow Tag Engine – Part 1</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
The Art of Benchmarking (Updated 2014-09-23) 2014-09-19T00:00:00+00:00 http://www.mattwarren.org/2014/09/19/the-art-of-benchmarking
<h4 id="tldr"><strong>tl;dr</strong></h4>
<p>Benchmarking is hard; it’s very easy to end up “<em>not measuring what you think you are measuring</em>”.</p>
<hr />
<p><strong>Update (2014-09-23):</strong> Sigh - I made a pretty big mistake in these benchmarks, fortunately Reddit user <a href="http://www.reddit.com/user/zvrba" target="_blank">zvrba</a> corrected me:</p>
<p><a href="http://www.reddit.com/r/programming/comments/2guj0t/the_art_of_benchmarking_aka_fighting_the_jit/" target="_blank"><img src="/images/2014/09/reddit-post-showing-my-mistake.png" alt="Reddit post showing my mistake" /></a></p>
<p>Yep, can’t argue with that, see <a href="#results">Results</a> and <a href="#resources">Resources</a> below for the individual updates.</p>
<h4 id="-intro-to-benchmarks"><a name="intro_to_benchmarks"></a> <strong>Intro to Benchmarks</strong></h4>
<p>To start with, let’s clarify what types of benchmarks we are talking about. Below is a table from the <a href="http://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf" target="_blank">DEVOXX talk</a> by <a href="http://shipilev.net/" target="_blank">Aleksey Shipilev</a>, who works on the <a href="http://openjdk.java.net/projects/code-tools/jmh/" target="_blank">Java Microbenchmark Harness</a> (JMH):</p>
<ul>
<li>kilo: > 1000 s, Linpack</li>
<li>????: 1…1000 s, SPECjvm2008, SPECjbb2013</li>
<li>milli: 1…1000 ms, SPECjvm98, SPECjbb2005</li>
<li>micro: 1…1000 us, single webapp request</li>
<li>nano: 1…1000 ns, single operations</li>
<li>pico: 1…1000 ps, pipelining</li>
</ul>
<p>He then goes on to say:</p>
<ul>
<li><strong>Milli</strong>benchmarks are not really hard</li>
<li><strong>Micro</strong>benchmarks are challenging, but OK</li>
<li><strong>Nano</strong>benchmarks are the damned beasts!</li>
<li><strong>Pico</strong>benchmarks…</li>
</ul>
<p>This post is talking about <strong>micro</strong> and <strong>nano</strong> benchmarks, that is ones where the code we are measuring takes <strong>microseconds</strong> or <strong>nanoseconds</strong> to execute.</p>
<h4 id="-first-attempt"><a name="first_attempt"></a> <strong>First attempt</strong></h4>
<p>Let’s start with a <a href="http://stackoverflow.com/questions/1047218/benchmarking-small-code-samples-in-c-can-this-implementation-be-improved/1048708#1048708" target="_blank">nice example</a> available from Stack Overflow:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">void</span> <span class="nf">Profile</span><span class="p">(</span><span class="kt">string</span> <span class="n">description</span><span class="p">,</span> <span class="kt">int</span> <span class="n">iterations</span><span class="p">,</span> <span class="n">Action</span> <span class="n">func</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// clean up</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">WaitForPendingFinalizers</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="c1">// warm up </span>
<span class="nf">func</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">watch</span> <span class="p">=</span> <span class="n">Stopwatch</span><span class="p">.</span><span class="nf">StartNew</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">iterations</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="nf">func</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">watch</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">description</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Time Elapsed {0} ms"</span><span class="p">,</span>
<span class="n">watch</span><span class="p">.</span><span class="n">Elapsed</span><span class="p">.</span><span class="n">TotalMilliseconds</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>You then use it like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">Profile</span><span class="p">(</span><span class="s">"a description"</span><span class="p">,</span> <span class="n">how_many_iterations_to_run</span><span class="p">,</span> <span class="p">()</span> <span class="p">=></span>
<span class="p">{</span>
<span class="c1">// ... code being profiled</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Now there are a lot of good things that this code sample is doing:</p>
<ul>
<li>Eliminating the overhead of the .NET GC (as much as possible), by making sure it has run before the timing takes place</li>
<li>Calling the function that is being profiled, outside the timing loop, so that the overhead of the .NET JIT Compiler isn’t included in the benchmark itself. The first time a function is called the JITter steps in and converts the code from IL into machine code, so that it can actually be executed by the CPU.</li>
<li>Using <code class="language-plaintext highlighter-rouge">Stopwatch</code> rather than <code class="language-plaintext highlighter-rouge">DateTime.Now</code>; Stopwatch is a high-precision timer with low overhead, DateTime.Now isn’t!</li>
<li>Running a lot of iterations of the code (hundreds of thousands), to give an accurate measurement</li>
</ul>
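<p>As a quick aside on the <code class="language-plaintext highlighter-rouge">Stopwatch</code> point, you can ask it how precise it is on your machine. The sketch below uses the real <code class="language-plaintext highlighter-rouge">Stopwatch.IsHighResolution</code>, <code class="language-plaintext highlighter-rouge">Stopwatch.Frequency</code> and <code class="language-plaintext highlighter-rouge">ElapsedTicks</code> members; the <code class="language-plaintext highlighter-rouge">TimerInfo</code> class and its method names are just made up for this example, and the values you get will vary from machine to machine.</p>

```csharp
using System;
using System.Diagnostics;

public static class TimerInfo
{
    // Summarise the timer that Stopwatch is using on this machine.
    public static string Describe()
    {
        // Frequency is the number of Stopwatch ticks per second.
        double nanosecondsPerTick = 1000000000.0 / Stopwatch.Frequency;
        return string.Format(
            "High resolution: {0}, frequency: {1:N0} ticks/second, {2:0.##} ns per tick",
            Stopwatch.IsHighResolution, Stopwatch.Frequency, nanosecondsPerTick);
    }

    // Convert a Stopwatch reading into nanoseconds, Stopwatch ticks are NOT the
    // same unit as the 100 ns ticks used by DateTime and TimeSpan.
    public static double ElapsedNanoseconds(Stopwatch watch)
    {
        return watch.ElapsedTicks * (1000000000.0 / Stopwatch.Frequency);
    }
}
```

<p>Printing <code class="language-plaintext highlighter-rouge">TimerInfo.Describe()</code> before running a benchmark is a cheap sanity check that the measurements aren’t smaller than the timer can resolve.</p>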
<p>Now far be it from me to criticise a highly voted Stack Overflow answer, but that’s exactly what I’m going to do! I should add that for a whole range of scenarios the Stack Overflow code is absolutely fine, but it does have its limitations. There are several situations where this code doesn’t work, because it fails to actually profile the code you want it to.</p>
<h4 id="-baseline-benchmark"><a name="baseline_benchmark"></a> <strong>Baseline benchmark</strong></h4>
<p>But first let’s take a step back and look at the simplest possible case, with all the code inside the function. We’re going to measure the time that <a href="http://msdn.microsoft.com/en-us/library/system.math.sqrt(v=vs.110).aspx" target="_blank"><code class="language-plaintext highlighter-rouge">Math.Sqrt(..)</code></a> takes to execute, nice and simple:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">void</span> <span class="nf">ProfileDirect</span><span class="p">(</span><span class="kt">string</span> <span class="n">description</span><span class="p">,</span> <span class="kt">int</span> <span class="n">iterations</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// clean up</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">WaitForPendingFinalizers</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="c1">// warm up</span>
<span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">watch</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Stopwatch</span><span class="p">();</span>
<span class="n">watch</span><span class="p">.</span><span class="nf">Start</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">iterations</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">watch</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"ProfileDirect - "</span> <span class="p">+</span> <span class="n">description</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span>
<span class="s">"{0:0.00} ms ({1:N0} ticks) (over {2:N0} iterations), {3:N0} ops/milliseconds"</span><span class="p">,</span>
<span class="n">watch</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">,</span> <span class="n">watch</span><span class="p">.</span><span class="n">ElapsedTicks</span><span class="p">,</span> <span class="n">iterations</span><span class="p">,</span>
<span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">iterations</span> <span class="p">/</span> <span class="n">watch</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And the results:</p>
<blockquote>
<p>ProfileDirect - 2.00 ms (7,822 ticks) (over 10,000,000 iterations), <strong>5,000,000 ops/millisecond</strong></p>
</blockquote>
<p>That’s 5 million operations per millisecond, or 0.2 nanoseconds per call, which is less than a single clock cycle on a CPU running at a few GHz. I know CPUs are fast, but that seems quite high!</p>
<p>For reference, the assembly code that the JITter produced is below. From this you can see that there is no <code class="language-plaintext highlighter-rouge">sqrt</code> instruction, even though we’d expect one. So in effect we are timing an empty loop!</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>; 91: var watch = new Stopwatch();
000000a6 lea rcx,[5D3EBA90h]
000000ad call 000000005F6722F0
000000b2 mov r12,rax
000000b5 mov rcx,r12
000000b8 call 000000005D284EF0
; 92: watch.Start();
000000bd mov rcx,r12
000000c0 call 000000005D284E60
; 93: for (int i = 0; i < iterations; i++)
000000c5 mov r13d,dword ptr [rbp+58h]
000000c9 test r13d,r13d
000000cc jle 00000000000000D7
000000ce xor eax,eax
000000d0 inc eax
000000d2 cmp eax,r13d
000000d5 jl 00000000000000D0
; 97: }
; 98: watch.Stop();
000000d7 mov rcx,r12
000000da call 000000005D32CBD0
; 99: Console.WriteLine(description + " (ProfileDirect)");
</code></pre></div></div>
<p><strong>Note:</strong> To be able to get the optimised version of the assembly code that the JITter produces, see <a href="http://msdn.microsoft.com/en-us/library/ms241594.aspx" target="_blank">this MSDN page</a>. If you just debug the code normally in Visual Studio, you only get the un-optimised code, which doesn’t help at all.</p>
<h4 id="-dead-code-elimination"><a name="dead_code_elimination"></a> <strong>Dead-code elimination</strong></h4>
<p>One of the main problems with writing benchmarks is that you are often fighting against the just-in-time (JIT) compiler, which is trying to optimise the code as much as it can. One of the many things it does is remove code that it thinks is not needed or, to be more specific, code it thinks has no <em>side-effects</em>. This is non-trivial to do: there are some really tricky <a href="http://stackoverflow.com/questions/10943370/could-the-net-jitter-optimise-away-a-while-xmlreader-read-loop/10943403#10943403" target="_blank">edge-cases to worry about</a>, aside from the more obvious problem of knowing which code causes side-effects and which doesn’t. But this is exactly what is happening in the original profiling code: the result of <code class="language-plaintext highlighter-rouge">Math.Sqrt</code> is never used, so the call is removed.</p>
<p><strong>Aside:</strong> For a full list of all the optimisations that the .NET JIT Compiler performs, see this <a href="http://stackoverflow.com/questions/4043821/performance-differences-between-debug-and-release-builds/4045073#4045073" target="_blank">very thorough SO answer</a>.</p>
<p>So let’s fix the original code, by storing the result of <code class="language-plaintext highlighter-rouge">Math.Sqrt</code> in a variable:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="k">static</span> <span class="kt">double</span> <span class="n">result</span><span class="p">;</span>
<span class="k">static</span> <span class="k">void</span> <span class="nf">ProfileDirect</span><span class="p">(</span><span class="kt">string</span> <span class="n">description</span><span class="p">,</span> <span class="kt">int</span> <span class="n">iterations</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// clean up</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">WaitForPendingFinalizers</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="c1">// warm up</span>
<span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">watch</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Stopwatch</span><span class="p">();</span>
<span class="n">watch</span><span class="p">.</span><span class="nf">Start</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">iterations</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">watch</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"ProfileDirect - "</span> <span class="p">+</span> <span class="n">description</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span>
<span class="s">"{0:0.00} ms ({1:N0} ticks) (over {2:N0} iterations), {3:N0} ops/milliseconds"</span><span class="p">,</span>
<span class="n">watch</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">,</span> <span class="n">watch</span><span class="p">.</span><span class="n">ElapsedTicks</span><span class="p">,</span>
<span class="n">iterations</span><span class="p">,</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">iterations</span> <span class="p">/</span> <span class="n">watch</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p><strong>Note</strong>: <code class="language-plaintext highlighter-rouge">result</code> has to be a class-level field; it can’t be local to the method, i.e. <code class="language-plaintext highlighter-rouge">double result = Math.Sqrt(123.456)</code>. This is because the JITter is clever enough to figure out that a local variable isn’t accessed outside of the method and optimise it away. Again, you are always fighting against the JITter.</p>
<p>So now the results look like this, which is a bit more sane!</p>
<blockquote>
<p>ProfileDirectWithStore - 68.00 ms (180,801 ticks) (over 10,000,000 iterations), <strong>147,059 ops/millisecond</strong></p>
</blockquote>
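<p>The ops/millisecond figure is simply the iteration count divided by the elapsed time, i.e. 10,000,000 / 68 ≈ 147,059. Note that <code class="language-plaintext highlighter-rouge">Stopwatch.ElapsedMilliseconds</code> is a whole number of milliseconds, so for very short runs the rounding can distort the result. The sketch below is not part of the original benchmark; it assumes you would rather work with fractional milliseconds, and the <code class="language-plaintext highlighter-rouge">Throughput</code> class and its method names are made up for illustration.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Diagnostics;

static class Throughput
{
    // Converts raw Stopwatch ticks to fractional milliseconds.
    // Stopwatch.Frequency is the number of ticks per second on this machine.
    public static double TicksToMilliseconds(long ticks)
    {
        return ticks * 1000.0 / Stopwatch.Frequency;
    }

    // Operations per millisecond for a given iteration count and elapsed time.
    public static double OpsPerMillisecond(long iterations, double elapsedMs)
    {
        return iterations / elapsedMs;
    }

    static void Main()
    {
        // The figures reported above: 10,000,000 iterations in 68 ms
        Console.WriteLine("{0:N0} ops/millisecond", OpsPerMillisecond(10000000, 68.0));

        // Timing a small piece of work using fractional milliseconds
        var watch = Stopwatch.StartNew();
        double sum = 0;
        for (int i = 0; i &lt; 1000000; i++)
        {
            sum += Math.Sqrt(i);
        }
        watch.Stop();

        // Printing sum means the loop has a visible side-effect
        Console.WriteLine("{0:0.000} ms (sum = {1})",
            TicksToMilliseconds(watch.ElapsedTicks), sum);
    }
}
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">watch.Elapsed.TotalMilliseconds</code> gives the same fractional value without the manual conversion.</p>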
<h4 id="loop-unrolling"><strong>Loop-unrolling</strong></h4>
<p>One other thing you have to look out for is whether or not the time spent running the loop is dominating the code you want to profile. In this case <code class="language-plaintext highlighter-rouge">Math.Sqrt()</code> ends up as a few assembly instructions, so less time is spent executing that than the instructions needed to make the <code class="language-plaintext highlighter-rouge">for (..)</code> loop happen.</p>
<p>To fix this we can unroll the loop, so that we execute <code class="language-plaintext highlighter-rouge">Math.Sqrt(..)</code> multiple times per loop, but to compensate we run the loop fewer times. The code now looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">void</span> <span class="nf">ProfileDirectWithStoreUnrolledx10</span><span class="p">(</span><span class="kt">string</span> <span class="n">description</span><span class="p">,</span> <span class="kt">int</span> <span class="n">iterations</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// clean up</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">WaitForPendingFinalizers</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">Collect</span><span class="p">();</span>
<span class="c1">// warm up</span>
<span class="kt">var</span> <span class="n">temp</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">watch</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Stopwatch</span><span class="p">();</span>
<span class="n">watch</span><span class="p">.</span><span class="nf">Start</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">loops</span> <span class="p">=</span> <span class="n">iterations</span> <span class="p">/</span> <span class="m">10</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">loops</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="m">123.456</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">watch</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"ProfileDirectWithStoreUnrolled x10 - "</span> <span class="p">+</span> <span class="n">description</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span>
<span class="s">"{0:0.00} ms ({1:N0} ticks) (over {2:N0} iterations), {3:N0} ops/milliseconds"</span><span class="p">,</span>
<span class="n">watch</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">,</span> <span class="n">watch</span><span class="p">.</span><span class="n">ElapsedTicks</span><span class="p">,</span> <span class="n">iterations</span><span class="p">,</span>
<span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">iterations</span> <span class="p">/</span> <span class="n">watch</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And now the result is:</p>
<blockquote>
<p>ProfileDirectWithStoreUnrolled x10 -
47.00 ms (124,582 ticks) (over 10,000,000 iterations), <strong>212,766 ops/millisecond</strong></p>
</blockquote>
<p>So we are now doing 212,766 ops/millisecond, compared to 147,059 when we didn’t unroll the loop. I did some further tests to see if unrolling the loop 20 or 40 times made any further difference. It did continue to get slightly faster, but the change was not significant.</p>
<h4 id="-results"><a name="results"></a> <strong>Results</strong></h4>
<p>These results were produced by running the code in RELEASE mode and launching the application from outside Visual Studio. The .exe’s were also explicitly compiled in x86/x64 mode with optimisations turned on. To ensure I didn’t mess up, I included some <a href="https://gist.github.com/mattwarren/69070616cf0efbb68a79#file-benchmarking-cs-L344" target="_blank">diagnostic code in the application</a> that prints out a message in red if anything is set up wrong. Finally, these tests were run with .NET 4.5, so the results will be different under other versions, as the JIT compilers have brought in more and more optimisations over time.</p>
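<p>As a rough illustration of what such a check might involve, here is a minimal sketch. It is <em>not</em> the code from the linked gist, and the <code class="language-plaintext highlighter-rouge">BenchmarkEnvironment</code> class is a made-up name. It assumes that an attached debugger or an assembly built with the JIT optimiser disabled are the things worth warning about.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Diagnostics;
using System.Reflection;

static class BenchmarkEnvironment
{
    // Prints a red warning for each setting that could skew the timings
    public static void Check()
    {
        if (Debugger.IsAttached)
        {
            Warn("A debugger is attached, so the JIT may not fully optimise the code");
        }

        // A DEBUG build is normally marked with a DebuggableAttribute
        // that has the JIT optimiser disabled
        var assembly = Assembly.GetEntryAssembly() ?? Assembly.GetExecutingAssembly();
        var debuggable = (DebuggableAttribute)Attribute.GetCustomAttribute(
            assembly, typeof(DebuggableAttribute));
        if (debuggable != null &amp;&amp; debuggable.IsJITOptimizerDisabled)
        {
            Warn("The assembly was built with JIT optimisations disabled");
        }

        // Report which JIT compiler (32-bit or 64-bit) is in play
        Console.WriteLine("Running as a {0}-bit process",
            Environment.Is64BitProcess ? 64 : 32);
    }

    static void Warn(string message)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine(message);
        Console.ResetColor();
    }

    static void Main()
    {
        Check();
    }
}
</code></pre></div></div>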
<p>As seen in the chart below, the best results for <strong>64-bit</strong> (red) were achieved when we unrolled the loop (“ProfileDirectWithStoreUnrolled”). There are other results that were faster, but in these the actual code we wanted to profile was optimised away by the JITter (“Profile via an Action”, “ProfileDirect” and “ProfileDirectWithConsume”).</p>
<p><a href="/images/2014/09/math-sqrt-results-graph.png" target="_blank"><img src="/images/2014/09/math-sqrt-results-graph.png" alt="Math.Sqrt() - results graph" /></a></p>
<p><strong>Update (2014-09-23):</strong> The correct results are in the chart below</p>
<p><a href="/images/2014/09/math-sqrt-results-graph-after-reddit-fixes.png" target="_blank"><img src="/images/2014/09/math-sqrt-results-graph-after-reddit-fixes.png" alt="Math.Sqrt() - results graph - AFTER Reddit fixes" /></a></p>
<h4 id="clr-jit-compiler---32-bit-v-64-bit"><strong>CLR JIT Compiler - 32-bit v. 64-bit</strong></h4>
<p>You might have noticed that the 32-bit and 64-bit results in the graph vary per test. Why is this? One reason is the fundamental difference between 32-bit and 64-bit: 64-bit has 8 byte pointers compared to 4 byte ones in 32-bit. But the larger difference is that in .NET there are <a href="http://blogs.msdn.com/b/dotnet/archive/2013/09/30/ryujit-the-next-generation-jit-compiler.aspx" target="_blank">2 different JIT compilers, with different goals</a>:</p>
<blockquote>
<p>The .NET 64-bit JIT was originally designed to <strong>produce very efficient code throughout the long run of a server process</strong>. This differs from the .NET x86 JIT, which was optimized to <strong>produce code quickly so that the program starts up fast</strong>. Taking time to compile efficient code made sense when 64-bit was primarily for server code. But “server code” today includes web apps that have to start fast. The 64-bit JIT currently in .NET isn’t always fast to compile your code, meaning you have to rely on other technologies such as NGen or background JIT to achieve fast program startup.</p>
</blockquote>
<p>However one benefit of <a href="http://blogs.msdn.com/b/dotnet/archive/2013/09/30/ryujit-the-next-generation-jit-compiler.aspx" target="_blank">RyuJIT (the next generation JIT Compiler)</a> is that it’s a common code base for 32-bit and 64-bit, so when it comes out, everything may change! (BTW <em>RyuJIT</em>, <a href="https://twitter.com/matthewwarren/status/512176548678742016" target="_blank">what a great name</a>)</p>
<p>For reference the assembly code that is generated in both cases is available:</p>
<ul>
<li><a href="https://gist.github.com/mattwarren/c44a08eedb46c01aad51" target="_blank">32-bit version</a> where the <a href="http://x86.renejeschke.de/html/file_module_x86_id_116.html" target="_blank"><strong>fsqrt</strong> instruction</a> is used</li>
<li><a href="https://gist.github.com/mattwarren/faa0ebf6a1b5ff81a08e" target="_blank">64-bit version</a> where the <a href="http://x86.renejeschke.de/html/file_module_x86_id_300.html" target="_blank"><strong>sqrtsd</strong> instruction</a> is used</li>
</ul>
<h4 id="-but-theres-still-more-to-do"><a name="still_more_to_do"></a> <strong>But there’s still more to do</strong></h4>
<p>Even though this post is over 2000 words long, it still hasn’t covered:</p>
<ul>
<li>How you store and present the results</li>
<li>How users can write their own benchmarks</li>
<li>Multi-threaded benchmarks</li>
<li>Allowing state in benchmarks</li>
</ul>
<p>And there’s even more than that to worry about, see the complete list below, taken from <a href="https://groups.google.com/d/msg/mechanical-sympathy/m4opvy4xq3U/7lY8x8SvHgwJ" target="_blank">this discussion thread</a> on the excellent <em>mechanical sympathy</em> group:</p>
<ol>
<li>Dynamic selection of benchmarks.</li>
<li>Loop optimizations.</li>
<li>Dead-code elimination.</li>
<li>Constant foldings</li>
<li>Non-throughput measures</li>
<li>Synchronize iterations</li>
<li>Multi-threaded sharing</li>
<li>Multi-threaded setup/teardown</li>
<li>False-sharing</li>
<li>Asymmetric benchmarks</li>
<li>Inlining</li>
</ol>
<p>Note: these are only the headings, the discussion goes into a lot of detail about how these issues are solved in JMH. But whilst the JVM and the CLR do differ in a number of ways, a lot of what is said applies to writing benchmarks for the CLR.</p>
<p>The summary from <a href="https://twitter.com/shipilev" target="_blank">Aleksey</a> sums it all up really!</p>
<blockquote>
<p>The benchmarking harness business is <strong>very hard, and very non-obvious</strong>. My own
experience tells me even the smartest people make horrible mistakes in them,
myself included. We try to get around that by fixing more and more things
in JMH as we discover more, even if that means significant API changes….</p>
</blockquote>
<blockquote>
<p><strong>The job for a benchmark harness is to provide [a] reliable benchmarking
environment</strong> …</p>
</blockquote>
<h4 id="-resources"><a name="resources"></a> <strong>Resources</strong></h4>
<p>Here’s a list of all the code samples and other data used in making this post:</p>
<ol>
<li><a href="https://gist.github.com/mattwarren/69070616cf0efbb68a79" target="_blank">The full benchmarking code</a> <strong>Updated (2014-09-23)</strong></li>
<li><a href="https://gist.github.com/mattwarren/69070616cf0efbb68a79#file-benchmarking-cs-L94" target="_blank">Profile via an <code class="language-plaintext highlighter-rouge">Action</code></a></li>
<li><a href="https://gist.github.com/mattwarren/69070616cf0efbb68a79#file-benchmarking-cs-L248">Profile Direct</a></li>
<li><a href="https://gist.github.com/mattwarren/69070616cf0efbb68a79#file-benchmarking-cs-L270" target="_blank">Profile Direct, storing the result (BROKEN)</a></li>
<li><a href="https://gist.github.com/mattwarren/69070616cf0efbb68a79#file-benchmarking-cs-L292" target="_blank">Profile Direct, storing the result (FIXED)</a></li>
<li><a href="https://gist.github.com/mattwarren/69070616cf0efbb68a79#file-benchmarking-cs-L339" target="_blank">Profile Direct, storing the result, unrolled 10 times</a></li>
<li><a href="/images/2014/09/benchmark-results-math-sqrt1.xlsx" target="_blank">Spreadsheet of results</a> <strong>Updated (2014-09-23)</strong></li>
<li>Generated assembly code <strong>Updated (2014-09-23)</strong>:
<ul>
<li><a href="https://gist.github.com/mattwarren/02ca1567cecbd6ea68a0" target="_blank">Profile via an <code class="language-plaintext highlighter-rouge">Action</code></a></li>
<li><a href="https://gist.github.com/mattwarren/dcd546babf76986125ea" target="_blank">Profile Direct</a></li>
<li><a href="https://gist.github.com/mattwarren/e2bdb25a17eb785295d1" target="_blank">Profile Direct and storing the result (BROKEN)</a></li>
<li><a href="https://gist.github.com/mattwarren/0a5a52c57bb82d296f16" target="_blank">Profile Direct and storing the result (FIXED)</a></li>
</ul>
</li>
</ol>
<h4 id="-further-reading"><a name="further_reading"></a> <strong>Further Reading</strong></h4>
<p>There’s lots of really good information out there related to writing benchmarks and understanding what the .NET JIT compiler is doing. Below are just some of the links I’ve found:</p>
<ul>
<li><strong>Writing good benchmarks</strong>
<ul>
<li>http://www.yoda.arachsys.com/csharp/benchmark.html</li>
<li>http://blogs.msmvps.com/jonskeet/2009/01/26/benchmarking-made-easy/</li>
<li>http://blogs.msdn.com/b/vancem/archive/2009/02/06/measureit-update-tool-for-doing-microbenchmarks.aspx</li>
<li>http://measureitdotnet.codeplex.com/</li>
</ul>
</li>
<li><strong>JIT Optimisations, including method in-lining and dead code eliminations</strong>
<ul>
<li>http://blogs.microsoft.co.il/sasha/2007/02/27/jit-optimizations-inlining-and-interface-method-dispatching-part-1-of-n/</li>
<li>http://blogs.microsoft.co.il/sasha/2007/08/12/jit-optimizations-inlining-and-interface-method-dispatching-part-2-of-n/</li>
<li>http://blogs.microsoft.co.il/sasha/2012/01/20/aggressive-inlining-in-the-clr-45-jit/</li>
<li>http://blogs.microsoft.co.il/sasha/2012/06/22/micro-benchmarking-done-wrong-and-for-the-wrong-reasons/</li>
<li>http://blogs.msdn.com/b/ericgu/archive/2004/01/29/64717.aspx</li>
<li>http://blogs.msdn.com/b/jmstall/archive/2006/03/13/dead-code-elimination.aspx</li>
<li>http://blogs.msdn.com/b/vancem/archive/2008/08/19/to-inline-or-not-to-inline-that-is-the-question.aspx</li>
<li>http://stackoverflow.com/questions/4043821/performance-differences-between-debug-and-release-builds/4045073#4045073</li>
</ul>
</li>
<li><strong>Inspecting generated assembly code</strong>
<ul>
<li>http://blogs.msdn.com/b/vancem/archive/2006/02/20/535807.aspx</li>
<li>http://www.cuttingedge.it/blogs/steven/downloads/Program_InlinableMethodTests.cs</li>
<li>http://www.philosophicalgeek.com/2014/07/25/using-windbg-to-answer-implementation-questions-for-yourself-can-a-delegate-invocation-be-inlined/</li>
</ul>
</li>
</ul>
<p>The post <a href="http://www.mattwarren.org/2014/09/19/the-art-of-benchmarking/">The Art of Benchmarking (Updated 2014-09-23)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Stack Overflow - performance lessons (part 2)2014-09-05T00:00:00+00:00http://www.mattwarren.org/2014/09/05/stack-overflow-performance-lessons-part-2
<p>In <a href="/2014/09/01/stackoverflow-performance-lessons-part-1/" target="_blank">Part 1</a> I looked at some of the more general performance lessons that can be learnt from Stack Overflow (the team/product). In Part 2 I’m looking at some examples of <strong>coding</strong> performance lessons.</p>
<hr />
<p>Please don’t take these blog posts as blanket recommendations of techniques that you should go away and apply to your code base. They are specific optimisations that you can use if you want to squeeze every last drop of performance out of your CPU.</p>
<p>Also, don’t optimise anything unless you have measured and profiled first, otherwise you will probably optimise the wrong thing!</p>
<hr />
<h4 id="battles-with-the-net-garbage-collector"><strong>Battles with the .NET Garbage Collector</strong></h4>
<p>I first learnt about the performance work done in Stack Overflow (the site/company) when I read the post on their <a href="http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector" target="_blank">battles with the .NET Garbage Collector (GC)</a>. If you haven’t read it, the short summary is that they were experiencing page load times that would suddenly spike to 100s of milliseconds, compared to the normal sub-10 milliseconds they were used to. After investigating for a few days they narrowed the problem down to the behaviour of the GC. GC pauses are a real issue and even the new modes available in .NET 4.5 don’t fully eliminate them, see my <a href="/2014/06/23/measuring-the-impact-of-the-net-garbage-collector-an-update/" target="_blank">previous investigation for more information</a>.</p>
<p>One thing to remember is that to make this all happen, they needed the following items in place:</p>
<ul>
<li><strong>Monitoring in production</strong> - these issues would only show up under load, once the application had been running for a while, so they would be very hard to recreate in staging or during development.</li>
<li><strong>Multiple measurements</strong> - they recorded both ASP.NET and IIS web server response times and were able to cross-reference them (see image below).</li>
<li><strong>Storing outliers</strong> - these spikes rarely happened, so <a href="http://blog.serverfault.com/2011/07/25/a-non-foolish-consistency/" target="_blank">having detailed metrics was needed</a>; averages hide too much information.</li>
<li><strong>Good knowledge of the .NET GC</strong> - according to the article, it took them 3 weeks to identify and fix this issue: <em>“So Marc and I set off on a 3 week adventure to resolve the memory pressure.”</em></li>
</ul>
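<p>To see why averages hide too much, consider the small sketch below. The numbers are invented purely for illustration (they are not Stack Overflow’s data): 99 requests at 8 ms and a single 400 ms spike. The mean works out at (99 × 8 + 400) / 100 = 11.92 ms, which looks healthy, whereas the maximum shows the spike straight away.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq;

static class OutlierExample
{
    // Hypothetical page load times in milliseconds:
    // 99 fast requests plus one pause-style spike
    public static double[] SampleTimes()
    {
        return Enumerable.Repeat(8.0, 99).Concat(new[] { 400.0 }).ToArray();
    }

    static void Main()
    {
        double[] timesMs = SampleTimes();

        // The mean is (99 * 8 + 400) / 100 = 11.92, so the spike is barely visible
        Console.WriteLine("mean = {0:0.00} ms", timesMs.Average());

        // The maximum (or a high percentile) exposes the outlier
        Console.WriteLine("max  = {0:0.00} ms", timesMs.Max());
    }
}
</code></pre></div></div>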
<p><a href="http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector" target="_blank"><img src="/images/2014/09/Stack Overflow - Battle with the .NET GC.png" /></a></p>
<p>You can read all the gory details of the fix and the follow-up in the posts below, but the <strong>tl;dr</strong> is that they removed all of the work that the .NET Garbage Collector had to do, thus eliminating the pauses:</p>
<ul>
<li><a href="http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector" target="_blank">In managed code we trust, our recent battles with the .NET Garbage Collector</a></li>
<li><a href="http://blog.marcgravell.com/2011/10/assault-by-gc.html" target="_blank">Assault by GC</a></li>
<li><a href="http://blog.marcgravell.com/2014/04/technical-debt-case-study-tags.html" target="_blank">Technical Debt, a case study : tags</a> (a follow-up post)</li>
</ul>
<h4 id="jil---a-fast-json-deserializer-with-a-number-of-somewhat-crazy-optimization-tricks"><strong>Jil - A fast JSON (de)serializer, with a number of somewhat crazy optimization tricks.</strong></h4>
<p>But if you think that the <code class="language-plaintext highlighter-rouge">struct</code> based code they wrote is crazy, their JSON serialisation library, Jil, takes things to a new level. This is all in pursuit of maximum performance and, based on their benchmarks, it seems to be working!
Note: protobuf-net is a binary serialisation library that doesn’t support JSON; it’s only included as a baseline:</p>
<p><a href="https://github.com/kevin-montrose/Jil#deserialization" target="_blank"><img src="/images/2014/09/Jil Benchmarks.png" class="aligncenter" /></a></p>
<p>For instance, instead of writing code like this</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="n">T</span> <span class="n">Serialise</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="kt">string</span> <span class="n">json</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">isJSONP</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isJSONP</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// code to handle JSONP</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="c1">// code to handle regular JSON</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>They write code like this, which is a classic <a href="https://github.com/kevin-montrose/Jil#trade-memory-for-speed" target="_blank">memory/speed trade-off</a>.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="n">ISerialiser</span> <span class="nf">GetSerialiser</span><span class="p">(</span><span class="kt">bool</span> <span class="n">isJSONP</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isJSONP</span><span class="p">)</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">SerialiserWithJSONP</span><span class="p">();</span>
<span class="k">else</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">Serialiser</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">SerialiserWithJSONP</span> <span class="p">:</span> <span class="n">ISerialiser</span>
<span class="p">{</span>
<span class="k">public</span> <span class="n">T</span> <span class="n">Serialise</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="kt">string</span> <span class="n">json</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// code to handle JSONP </span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">Serialiser</span> <span class="p">:</span> <span class="n">ISerialiser</span>
<span class="p">{</span>
<span class="k">public</span> <span class="n">T</span> <span class="n">Serialise</span><span class="p"><</span><span class="n">T</span><span class="p">>(</span><span class="kt">string</span> <span class="n">json</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// code to handle regular JSON</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This means that during serialisation there don’t need to be any <em>“feature switches”</em>. They just emit the different versions of the code at <em>creation time</em> and, based on the options you specify, hand you the correct one. Of course the classes (<code class="language-plaintext highlighter-rouge">SerialiserWithJSONP</code> and <code class="language-plaintext highlighter-rouge">Serialiser</code> in this case) are dynamically created just once and then cached for later re-use, so the cost of the dynamic code generation is only paid once.</p>
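<p>The “create once, then cache” idea can be sketched roughly as follows. This is only an illustration of the pattern and not how Jil itself is implemented. The <code class="language-plaintext highlighter-rouge">SerialiserCache</code> class and the <code class="language-plaintext highlighter-rouge">Describe</code> method are made-up names, and a plain constructor call stands in for the expensive dynamic code generation.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Concurrent;

public interface ISerialiser
{
    string Describe();
}

public sealed class Serialiser : ISerialiser
{
    public string Describe() { return "regular JSON"; }
}

public sealed class SerialiserWithJSONP : ISerialiser
{
    public string Describe() { return "JSONP"; }
}

public static class SerialiserCache
{
    // One cached serialiser per combination of options
    // (here the only option is isJSONP)
    private static readonly ConcurrentDictionary&lt;bool, ISerialiser&gt; Cache =
        new ConcurrentDictionary&lt;bool, ISerialiser&gt;();

    public static ISerialiser GetSerialiser(bool isJSONP)
    {
        // Create is only needed on a cache miss; under contention it may run
        // more than once, but only one instance is stored and handed out
        return Cache.GetOrAdd(isJSONP, Create);
    }

    // Stand-in for the expensive, one-off code generation step
    private static ISerialiser Create(bool isJSONP)
    {
        if (isJSONP)
            return new SerialiserWithJSONP();
        return new Serialiser();
    }
}

static class Program
{
    static void Main()
    {
        ISerialiser first = SerialiserCache.GetSerialiser(true);
        ISerialiser second = SerialiserCache.GetSerialiser(true);

        // Both calls return the same cached instance
        Console.WriteLine(object.ReferenceEquals(first, second));
        Console.WriteLine(SerialiserCache.GetSerialiser(false).Describe());
    }
}
</code></pre></div></div>
<p>The branch on <code class="language-plaintext highlighter-rouge">isJSONP</code> now only happens when a serialiser is requested, rather than on every serialisation call.</p>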
<p>By doing this the code plays nicely with <a href="//stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array/11227902#11227902">CPU branch prediction</a>, because it has a nice predictable pattern that the CPU can easily work with. It also has the added benefit of making the methods smaller, which <em>may</em> make them candidates for <a href="http://blogs.msdn.com/b/ericgu/archive/2004/01/29/64717.aspx" target="_blank">in-lining by the .NET JITter</a>.</p>
<p>For more examples of optimisations used, see the links below</p>
<ul>
<li><a href="https://github.com/kevin-montrose/Jil/commit/de8d5d49722a0eb3b5f3791ee67f1d55c1d4e3a1" target="_blank">fast skip white space optimisation</a></li>
<li><a href="https://github.com/kevin-montrose/Jil/commit/11e5dd8049225cb81352178517d55315b92705cf" target="_blank">signed integers optimisation</a></li>
</ul>
<h4 id="jil---marginal-gains"><strong>Jil - Marginal Gains.</strong></h4>
<p>On top of this they measure everything to ensure that the optimisations actually work! These tests are all run as unit-tests, allowing easy generation of the results. Take a look at <a href="https://github.com/kevin-montrose/Jil/blob/master/JilTests/SpeedProofTests.cs#L266" target="_blank">ReorderMembers</a> for instance.</p>
<p><strong>Note:</strong> All the times are in milliseconds, but timed over <strong>1000’s of runs</strong>, not per call.</p>
<table>
<thead>
<tr>
<th><strong>Feature name</strong></th>
<th><strong>Original (ms)</strong></th>
<th><strong>Improved (ms)</strong></th>
<th><strong>Difference (ms)</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>ReorderMembers</td>
<td>2721</td>
<td>2712</td>
<td>9</td>
</tr>
<tr>
<td>SkipNumberFormatting</td>
<td>166</td>
<td>163</td>
<td>3</td>
</tr>
<tr>
<td>UseCustomIntegerToString</td>
<td>589</td>
<td>339</td>
<td>250</td>
</tr>
<tr>
<td>SkipDateTimeMathMethods</td>
<td>108</td>
<td>100</td>
<td>8</td>
</tr>
<tr>
<td>UseCustomISODateFormatting</td>
<td>399</td>
<td>269</td>
<td>130</td>
</tr>
<tr>
<td>UseFastLists</td>
<td>277</td>
<td>267</td>
<td>10</td>
</tr>
<tr>
<td>UseFastArrays</td>
<td>486</td>
<td>469</td>
<td>17</td>
</tr>
<tr>
<td>UseFastGuids</td>
<td>744</td>
<td>304</td>
<td>440</td>
</tr>
<tr>
<td>AllocationlessDictionaries</td>
<td>134</td>
<td>127</td>
<td>7</td>
</tr>
<tr>
<td>PropagateConstants</td>
<td>77</td>
<td>35</td>
<td>42</td>
</tr>
<tr>
<td>AlwaysUseCharBufferForStrings</td>
<td>63</td>
<td>56</td>
<td>7</td>
</tr>
<tr>
<td>UseHashWhenMatchingMembers</td>
<td>141</td>
<td>131</td>
<td>10</td>
</tr>
<tr>
<td>DynamicDeserializer_UseFastNumberParsing</td>
<td>94</td>
<td>51</td>
<td>43</td>
</tr>
<tr>
<td>DynamicDeserializer_UseFastIntegerConversion</td>
<td>131</td>
<td>131</td>
<td>2</td>
</tr>
<tr>
<td>UseHashWhenMatchingEnums</td>
<td>38</td>
<td>10</td>
<td>28</td>
</tr>
<tr>
<td>UseCustomWriteIntUnrolledSigned</td>
<td>2182</td>
<td>1765</td>
<td>417</td>
</tr>
</tbody>
</table>
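<p>As a rough illustration of how figures like these can be produced, here is a minimal sketch of timing a before/after pair over many iterations and reporting the totals in milliseconds. It is not taken from Jil's test suite: the <code>TimeOverManyRuns</code> helper, the iteration count and the two sample workloads are all assumptions made for the example.</p>
<pre><code class="language-csharp">using System;
using System.Diagnostics;

public static class SpeedProofSketch
{
    // Hypothetical helper: returns the total elapsed milliseconds for 'runs' calls,
    // i.e. a total over all the runs rather than a per-call figure.
    public static long TimeOverManyRuns(Action action, int runs)
    {
        action(); // warm-up call, so JIT compilation is not included in the timing

        var timer = Stopwatch.StartNew();
        for (int i = 0; i != runs; i++)
        {
            action();
        }
        timer.Stop();
        return timer.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int runs = 100000; // assumed iteration count
        int value = 123456;

        // Two ways of producing the same string, standing in for "original" and "improved"
        long original = TimeOverManyRuns(() =&gt; string.Format("{0}", value), runs);
        long improved = TimeOverManyRuns(() =&gt; value.ToString(), runs);

        Console.WriteLine("Original: {0} ms, Improved: {1} ms, Difference: {2} ms",
            original, improved, original - improved);
    }
}
</code></pre>
<p>Timing the total over many runs smooths out the noise of any single call, which matters when the difference you are looking for is only a few milliseconds in total.</p>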
<p>Measuring every small optimisation like this is very similar to the “<a href="http://www.bbc.co.uk/sport/0/olympics/19174302" target="_blank">Marginal Gains</a>” approach that worked so well for British Cycling in the last Olympics:</p>
<blockquote>
<p>There’s fitness and conditioning, of course, but there are other things that might seem on the periphery, like sleeping in the right position, having the same pillow when you are away and training in different places.
Do you really know how to clean your hands? Without leaving the bits between your fingers? If you do things like that properly, you will get ill a little bit less.
<strong>“They’re tiny things but if you clump them together it makes a big difference.”</strong></p>
</blockquote>
<h4 id="summary"><strong>Summary</strong></h4>
<p>All in all, there is a lot to be learnt from the code and blog posts that have come from Stack Overflow developers, and I’m glad they’ve shared everything so openly. Also, having a high-profile website running on .NET helps to counter the argument that .NET is inherently slow.</p>
<p>The post <a href="http://www.mattwarren.org/2014/09/05/stack-overflow-performance-lessons-part-2/">Stack Overflow - performance lessons (part 2)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Stack Overflow - performance lessons (part 1)2014-09-01T00:00:00+00:00http://www.mattwarren.org/2014/09/01/stackoverflow-performance-lessons-part-1
<p>This post is part of a semi-regular series; you can find the other entries <a href="/2014/06/05/roslyn-code-base-performance-lessons-part-1/" target="_blank">here</a> and <a href="/2014/06/10/roslyn-code-base-performance-lessons-part-2/" target="_blank">here</a>.</p>
<hr />
<p>Before diving into any of the technical or coding aspects of performance, it is really important to understand that the main lesson to take away from Stack Overflow (the team/product) is that they <strong>take performance seriously</strong>. You can see this from the <a href="http://blog.codinghorror.com/performance-is-a-feature/" target="_blank">blog post</a> that Jeff Atwood wrote; it’s a part of their culture and has been from the beginning:
<a href="http://blog.codinghorror.com/performance-is-a-feature/" target="_blank"><img src="/images/2014/08/performance-is-a-feature-coding-horror-blog.png" alt="performance is a feature - coding horror blog" /></a></p>
<p>But anyone can come up with a catchy line like <strong>“Performance is a Feature!!”</strong>; it only means something if you actually carry it out. It’s clear that Stack Overflow have done just this: not only is it a <a href="http://www.alexa.com/siteinfo/stackoverflow.com" target="_blank">Top 100 website</a>, but they’ve done the whole thing with <a href="http://highscalability.com/blog/2014/7/21/stackoverflow-update-560m-pageviews-a-month-25-servers-and-i.html" target="_blank">very few servers</a>, and several of those are running <a href="http://blog.cellfish.se/2014/07/lying-with-statistics-and-stackoverflow.html" target="_blank">at only 15% of their capacity</a>, so they can scale up if needed and/or deal with large traffic bursts.</p>
<p><strong>Update (2/9/2014 9:25:35 AM):</strong> Nick Craver <a href="https://twitter.com/Nick_Craver/status/506452974647140352" target="_blank">tweeted me</a> to say that the High Scalability post is a bad summarisation (apparently they have got things wrong before), so take what it says with a grain of salt!</p>
<p><strong>Aside:</strong> If you want even more information about their set-up, I definitely recommend reading the <a href="https://news.ycombinator.com/item?id=8064534" target="_blank">Hacker News discussion</a> and <a href="http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/" target="_blank">this post</a> from <a href="https://twitter.com/Nick_Craver" target="_blank">Nick Craver</a>, one of the Stack Overflow developers.</p>
<p>Interestingly they have gone for <strong>scale-up</strong> rather than <strong>scale-out</strong>, by building their own servers instead of using cloud hosting. The reason for this? <a href="http://blog.codinghorror.com/building-servers-for-fun-and-prof-ok-maybe-just-for-fun/" target="_blank">Just to get better performance</a>!</p>
<blockquote>
<p>Why do I choose to build and colocate servers? <strong>Primarily to achieve maximum performance</strong>. That’s the one thing you consistently just do not get from cloud hosting solutions unless you are willing to pay a massive premium, per month, forever: raw, unbridled performance….</p>
</blockquote>
<h3 id="taking-performance-seriously"><strong>Taking performance seriously</strong></h3>
<p>It’s also worth noting that they are even prepared to sacrifice the ability to unit test their code, <a href="http://highscalability.com/blog/2014/7/21/stackoverflow-update-560m-pageviews-a-month-25-servers-and-i.html" target="_blank">because it gives them better performance.</a></p>
<blockquote>
<ul>
<li><strong>Garbage collection driven programming.</strong> SO goes to great lengths to reduce garbage collection costs, skipping practices like TDD, avoiding layers of abstraction, and using static methods. While extreme, the result is highly performing code. When you’re doing hundreds of millions of objects in a short window, you can actually measure pauses in the app domain while GC runs. These have a pretty decent impact on request performance.</li>
</ul>
</blockquote>
<p>Now, this isn’t for everyone, and even suggesting that unit testing isn’t needed or useful tends to produce <a href="http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html" target="_blank">strong reactions</a>. But you can see that they are making an informed trade-off and they are prepared to go against the conventional wisdom (<em>“write code that is unit-testing friendly”</em>), because it gives them the extra performance they want. One caveat is that they are in a fairly unique position: they have passionate users who are willing to act as beta-testers, so having fewer unit tests might not harm them. Not everyone has that option!</p>
<blockquote>
<ul>
<li>To get around garbage collection problems, only one copy of a class used in templates are created and kept in a cache. <strong>Everything is measured, including GC operation,</strong> from statistics it is known that layers of indirection increase GC pressure to the point of noticeable slowness.</li>
</ul>
</blockquote>
<p>For a more detailed discussion on why this approach to coding can make a difference to GC pressure, see <a href="https://news.ycombinator.com/item?id=8065987" target="_blank">here</a> and <a href="https://news.ycombinator.com/item?id=8066394" target="_blank">here</a>.</p>
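<p>To make the GC-pressure point a little more concrete, here is a minimal sketch (not Stack Overflow's code) that compares a "layered" style, which allocates a short-lived object per request just to reach a method, with a plain static method. The type names <code>LayeredCalculator</code> and <code>StaticCalculator</code> and the request count are invented for the example. It uses <code>GC.CollectionCount(0)</code> to count gen-0 collections; the numbers will vary by runtime, and newer JIT compilers may be able to optimise the allocation away.</p>
<pre><code class="language-csharp">using System;

// Layered style: an instance has to be created before the method can be called
public sealed class LayeredCalculator
{
    public long Compute(int id) { return id * 2L; }
}

// Static style: no object is needed, so nothing is allocated per call
public static class StaticCalculator
{
    public static long Compute(int id) { return id * 2L; }
}

public static class GcPressureSketch
{
    public static void Main()
    {
        const int requests = 10000000; // assumed workload size
        long total = 0;

        int before = GC.CollectionCount(0);
        for (int i = 0; i != requests; i++)
        {
            var calculator = new LayeredCalculator(); // one short-lived object per "request"
            total += calculator.Compute(i);
        }
        int layeredCollections = GC.CollectionCount(0) - before;

        before = GC.CollectionCount(0);
        for (int i = 0; i != requests; i++)
        {
            total += StaticCalculator.Compute(i);
        }
        int staticCollections = GC.CollectionCount(0) - before;

        Console.WriteLine("Gen-0 collections, layered: {0}, static: {1} (checksum {2})",
            layeredCollections, staticCollections, total);
    }
}
</code></pre>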
<h3 id="sharing-and-doing-everything-out-in-the-open"><strong>Sharing and doing everything out in the open</strong></h3>
<p>Another non-technical lesson is that Stack Overflow are committed to doing things out in the open and sharing what they create as code or <em>lessons-learnt</em> blog posts. Their list of open source projects includes:</p>
<ul>
<li><a href="http://blog.marcgravell.com/2011/04/practical-profiling.html" target="_blank">MiniProfiler</a> - which gives developers an overview of where the time is being spent when a page renders (front-end, back-end, database, etc)</li>
<li><a href="http://samsaffron.com/archive/2011/03/30/How+I+learned+to+stop+worrying+and+write+my+own+ORM" target="_blank">Dapper</a> - developed because Entity Framework imposed too large an overhead when materialising the results of a SQL query into <a href="http://en.wikipedia.org/wiki/Plain_Old_CLR_Object" target="_blank">POCO’s</a></li>
<li><a href="https://github.com/kevin-montrose/Jil" target="_blank">Jil</a> - a newly released JSON serialisation library, developed so that they can get the best possible performance. JSON parsing and serialisation must be a very common operation across their web-servers, so shaving off <a href="https://github.com/kevin-montrose/Jil#serialization" target="_blank">microseconds from the existing libraries</a> is justified.</li>
<li><a href="http://blog.marcgravell.com/2014/04/technical-debt-case-study-tags.html" target="_blank">TagServer</a> - a custom .NET service that was written to make the <a href="http://stackoverflow.com/tags" target="_blank">complex tag searches</a> quicker than they would be if done directly in SQL Server.</li>
<li><a href="https://github.com/opserver/Opserver" target="_blank">Opserver</a> - fully featured monitoring tool, giving their operation engineers a deep-insight into what their servers are doing in production.</li>
</ul>
<p><a href="http://miniprofiler.com" target="_blank"><img src="/images/2014/08/MiniProfiler.png" /></a></p>
<p>All these examples show that they are not afraid to write their own tools when the existing ones aren’t up to scratch, don’t have the features they need or don’t give the performance they require.</p>
<h3 id="measure-profile-and-display"><strong>Measure, profile and display</strong></h3>
<p>As shown by the development of Opserver, they care about measuring performance accurately even (or especially) in production. Take a look at the images below and you can see not only the detailed level of information they keep, but how it is displayed in a way that makes it easy to see what is going on (there are also <a href="http://imgur.com/a/dawwf" target="_blank">more screenshots</a> available).</p>
<p><a href="http://imgur.com/a/dawwf" target="_blank"><img src="/images/2014/08/opserver-MiniProfiler.png" /></a></p>
<p><a href="http://imgur.com/a/dawwf" target="_blank"><img src="/images/2014/08/opserver.png" /></a></p>
<p>Finally, I really like their guidelines for achieving good observability in a production system. They serve as a really good check-list of things you need to do if you want to have any chance of knowing what your system is up to in production. I would imagine these steps, and the resulting screens they designed into Opserver, have been built up over several years of monitoring and fixing issues in the Stack Overflow sites, so they are battle-hardened!</p>
<blockquote>
<p><strong>5 Steps to Achieving Good Observability:</strong>
In order to achieve good observability an SRE team (often in conduction with the rest of the organization) needs to do the following steps.</p>
<ul>
<li>Instrument your systems by publishing metrics and events</li>
<li>Gather those metrics and events in a queryable data store(s)</li>
<li>Make that data readily accessible</li>
<li>Highlight metrics that are, or are trending towards abnormal or out of bounds behavior</li>
<li>Establish the resources to drill down into abnormal or out of bounds behavior</li>
</ul>
</blockquote>
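<p>As a very small illustration of the first, second and fourth steps, the sketch below times an operation, keeps the latest value in a queryable in-memory store and flags anything over a threshold. It is only a toy under stated assumptions: the <code>MetricsSketch</code> class, its method names and the threshold are made up for this example and have nothing to do with how Opserver is implemented.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;
using System.Diagnostics;

public sealed class MetricsSketch
{
    // Latest timing per metric name, standing in for a real queryable data store
    private readonly Dictionary&lt;string, long&gt; latestMs = new Dictionary&lt;string, long&gt;();
    private readonly long thresholdMs;

    public MetricsSketch(long thresholdMs) { this.thresholdMs = thresholdMs; }

    // Instrument: time an operation and publish the result
    public void Measure(string name, Action operation)
    {
        var timer = Stopwatch.StartNew();
        operation();
        timer.Stop();
        Record(name, timer.ElapsedMilliseconds);
    }

    // Gather: store the metric so that it can be queried later
    public void Record(string name, long elapsedMs) { latestMs[name] = elapsedMs; }

    // Highlight: return the names of metrics that are out of bounds
    public List&lt;string&gt; OutOfBounds()
    {
        var names = new List&lt;string&gt;();
        foreach (var pair in latestMs)
        {
            if (pair.Value &gt; thresholdMs) names.Add(pair.Key);
        }
        return names;
    }

    public static void Main()
    {
        var metrics = new MetricsSketch(100); // flag anything slower than 100 ms
        metrics.Record("db-query", 12);
        metrics.Record("page-render", 250);

        foreach (string name in metrics.OutOfBounds())
        {
            Console.WriteLine("Out of bounds: " + name); // prints "Out of bounds: page-render"
        }
    }
}
</code></pre>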
<h3 id="next-time"><strong>Next time</strong></h3>
<p>Next time I’ll look at some concrete examples of performance lessons from the open source projects that SO have set up, including the crazy tricks they use in Jil, their <a href="https://github.com/kevin-montrose/Jil" target="_blank">JSON serialisation library</a>.</p>
<p>The post <a href="http://www.mattwarren.org/2014/09/01/stackoverflow-performance-lessons-part-1/">Stack Overflow - performance lessons (part 1)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
How to mock sealed classes and static methods2014-08-14T00:00:00+00:00http://www.mattwarren.org/2014/08/14/how-to-mock-sealed-classes-and-static-methods
<p><a href="http://www.typemock.com/" target="_blank">Typemock</a> &amp; <a href="http://www.telerik.com/products/mocking.aspx" target="_blank">JustMock</a> are 2 commercially available mocking tools that let you achieve something that should be impossible. Unlike all other mocking frameworks, they let you mock <strong>sealed classes, static</strong> and <strong>non-virtual methods</strong>, but how do they do this?</p>
<h4><strong>Dynamic Proxies</strong></h4>
<p>Firstly it’s worth covering how regular mocking frameworks work with virtual methods or interfaces. Suppose you have a class you want to mock, like so:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">TestingMocking</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">virtual</span> <span class="k">void</span> <span class="nf">MockMe</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">..</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>At runtime the framework will generate a <em>mocked</em> class like the one below. As it inherits from <code>TestingMocking</code> you can use it instead of your original class, but the <em>mocked</em> method will be called instead.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">DynamicProxy</span> <span class="p">:</span> <span class="n">TestingMocking</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">override</span> <span class="k">void</span> <span class="nf">MockMe</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">..</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This is achieved using runtime code generation, for example the <a href="http://msdn.microsoft.com/en-us/library/system.reflection.emit.dynamicmethod(v=vs.110).aspx" target="_blank">DynamicMethod</a> class and the other types available in <a href="http://msdn.microsoft.com/en-us/library/System.Reflection.Emit(v=vs.110).aspx" target="_blank">System.Reflection.Emit</a>. This <a href="http://www.mindscapehq.com/blog/index.php/2011/11/27/reflection-performance-and-runtime-code-generation/" target="_blank">blog post</a> contains a nice overview and <a href="https://twitter.com/billwagner" target="_blank">Bill Wagner</a> has put together a <a href="https://bitbucket.org/BillWagner/codemashstuntcoding/src/c449bf1c6b703b34d1e086f1a0f527757f4720c2/StuntCodingUtilities/DynamicConverter.cs?at=default#cl-14" target="_blank">more complete example</a> that gives you a better idea of what is involved. I found that once you discover dynamic code generation is possible, you realise that it is used everywhere, for instance:</p>
<ul>
<li><a href="http://samsaffron.com/archive/2011/03/30/How+I+learned+to+stop+worrying+and+write+my+own+ORM" target="_blank">Dapper</a> (see <a href="https://gist.github.com/SamSaffron/893878" target="_blank">this gist</a> for ver1)</li>
<li><a href="http://www.codingodyssey.com/2010/04/08/viewing-generated-proxy-code-in-the-entity-framework/" target="_blank">Entity Framework</a> (it enables lazy-loading when doing Code-First)</li>
<li><a href="https://github.com/mgravell/protobuf-net/blob/15174a09ee3223c8805b3ef81c1288879c746dfa/protobuf-net/Compiler/CompilerContext.cs#L309" target="_blank">protobuf-net</a></li>
<li><a href="https://github.com/JamesNK/Newtonsoft.Json/blob/bbe7eaf852b41ecdfb4817b9bd2f1fc9432abc1a/Src/Newtonsoft.Json/Utilities/DynamicReflectionDelegateFactory.cs#L43" target="_blank">Json.NET</a></li>
<li><a href="https://github.com/AutoMapper/AutoMapper/blob/f6bce50e7040db6142f19eef5dff9dd4e6071168/src/AutoMapper/Mappers/DataReaderMapper.cs#L121" target="_blank">AutoMapper</a> </li>
<li>and many more!</li>
</ul>
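<p>To give a feel for what emitting code at runtime looks like, here is a minimal sketch that uses <code>DynamicMethod</code> to build a method that simply returns 42. It is far simpler than what a mocking framework generates (no proxy class is created) and the names <code>DynamicMethodSketch</code>, <code>IntProducer</code> and <code>CreateReturn42</code> are invented for the example.</p>
<pre><code class="language-csharp">using System;
using System.Reflection.Emit;

public static class DynamicMethodSketch
{
    // Delegate type matching the method we are about to generate
    public delegate int IntProducer();

    public static IntProducer CreateReturn42()
    {
        // A method with no parameters that returns an int, built at runtime
        var method = new DynamicMethod("Return42", typeof(int), Type.EmptyTypes);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 42); // push the constant 42 onto the evaluation stack
        il.Emit(OpCodes.Ret);        // return the value on top of the stack

        return (IntProducer)method.CreateDelegate(typeof(IntProducer));
    }

    public static void Main()
    {
        IntProducer generated = CreateReturn42();
        Console.WriteLine(generated()); // prints 42
    }
}
</code></pre>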
<p>BTW if you ever find yourself needing to dynamically emit IL code, I’d recommend using the <a href="http://kevinmontrose.com/2013/02/14/sigil-adding-some-more-magic-to-il/" target="_blank">Sigil library</a> that was created by some of the developers at StackOverflow. It takes away a lot of the pain associated with writing and debugging IL.</p>
<p>However dynamically generated proxies will always run into the limitation that <a href="http://msdn.microsoft.com/en-us/library/aa645767(v=vs.71).aspx" target="_blank">you can’t override non-virtual methods</a> and they also can’t do anything with static methods or sealed classes (i.e. classes that can’t be inherited).</p>
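<p>The non-virtual limitation is easy to see in plain C#. In the sketch below (the <code>Service</code> and <code>ServiceProxy</code> names are made up for the example) the hand-written "proxy" can replace the virtual method, but the best it can do for the non-virtual one is hide it with <code>new</code>, which callers holding a reference of the base type never see.</p>
<pre><code class="language-csharp">using System;

public class Service
{
    public string NonVirtual() { return "real"; }
    public virtual string Virtual() { return "real"; }
}

// Stands in for the class a mocking framework would generate at runtime
public class ServiceProxy : Service
{
    // 'new' only hides the base method, it does not replace it
    public new string NonVirtual() { return "mocked"; }

    public override string Virtual() { return "mocked"; }
}

public static class ProxyLimitSketch
{
    public static void Main()
    {
        // Code under test normally sees the proxy through the base type
        Service service = new ServiceProxy();

        Console.WriteLine(service.Virtual());    // prints "mocked"
        Console.WriteLine(service.NonVirtual()); // prints "real"
    }
}
</code></pre>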
<h4><strong>.NET Profiling API and JITCompilationStarted() Method</strong></h4>
<p>How Typemock and JustMock achieve what they do is hinted at in a <a href="http://stackoverflow.com/questions/5556115/open-source-free-alternative-of-typemock-isolator/5563750#5563750" target="_blank">StackOverflow answer by a Typemock employee</a> and is also discussed in <a href="http://www.codethinked.com/static-method-interception-in-net-with-c-and-monocecil" target="_blank">this blog post</a>. But they only talk about the solution, and I wanted to actually write a small proof-of-concept myself to see what is involved.</p>
<p>To start with, the <a href="http://msdn.microsoft.com/en-us/library/ms404386(v=vs.110).aspx" target="_blank">.NET profiling API</a> is what makes this possible. But a word of warning: it is a C++ API and it requires you to write a <a href="http://msdn.microsoft.com/en-us/library/bb384493(v=vs.110).aspx#com" target="_blank">COM component</a> to be able to interact with it; you can’t work with it from C#. To get started I used the excellent <a href="https://github.com/sawilde/DDD2011_ProfilerDemo" target="_blank">profiler demo project</a> from Shaun Wilde. If you want to learn more about the profiling API, and in particular how you can use it to re-write methods, I really recommend looking at this code step-by-step and also reading the <a href="http://www.slideshare.net/shaun_wilde/net-profilers-and-il-rewriting-ddd-melbourne-2" target="_blank">accompanying slides</a>.</p>
<p>By using the profiling API and in particular the <a href="http://msdn.microsoft.com/en-us/library/ms230586(v=vs.110).aspx" target="_blank">JITCompilationStarted method</a>, we are able to modify the IL of any method being run by the CLR (user code or the .NET runtime), before the JITter compiles it to machine code and it is executed. This means that we can modify a method that originally looks like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">sealed</span> <span class="k">class</span> <span class="nc">ClassToMock</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">StaticMethodToMock</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"StaticMethodToMock called, returning 42"</span><span class="p">);</span>
<span class="k">return</span> <span class="m">42</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>So that instead it does this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">sealed</span> <span class="k">class</span> <span class="nc">ClassToMock</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">StaticMethodToMock</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">// Inject the IL to do this instead!!</span>
<span class="k">if</span> <span class="p">(</span><span class="n">Mocked</span><span class="p">.</span><span class="nf">ShouldMock</span><span class="p">(</span><span class="s">"Profilier.ClassToMock.StaticMethodToMock"</span><span class="p">))</span>
<span class="k">return</span> <span class="n">Mocked</span><span class="p">.</span><span class="nf">MockedMethod</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"StaticMethodToMock called, returning 42"</span><span class="p">);</span>
<span class="k">return</span> <span class="m">42</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>For reference, the original IL looks like this:</p>
<pre><code class="language-asm">IL_0000 ( 0) nop
IL_0001 ( 1) ldstr (70)00023F //"StaticMethodToMockWhatWeWantToDo called, returning 42"
IL_0006 ( 6) call (06)000006 //call Console.WriteLine(..)
IL_000B (11) nop
IL_000C (12) ldc.i4.s 2A //return 42;
IL_000E (14) stloc.0
IL_000F (15) br IL_0014
IL_0014 (20) ldloc.0
IL_0015 (21) ret
</code></pre>
<p>and after code injection, it ends up like this:</p>
<pre><code class="language-asm">IL_0000 ( 0) ldstr (70)000135
IL_0005 ( 5) call (0A)00001B //call ShouldMock(string methodNameAndPath)
IL_000A (10) brfalse.s IL_0012
IL_000C (12) call (0A)00001C //call MockedMethod()
IL_0011 (17) ret
IL_0012 (18) nop
IL_0013 (19) ldstr (70)00023F //"StaticMethodToMockWhatWeWantToDo called, returning 42"
IL_0018 (24) call (06)000006 //call Console.WriteLine(..)
IL_001D (29) nop
IL_001E (30) ldc.i4.s 2A //return 42;
IL_0020 (32) stloc.0
IL_0021 (33) br IL_0026
IL_0026 (38) ldloc.0
IL_0027 (39) ret
</code></pre>
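<p>Comparing the two listings, every original instruction has moved down by the size of the injected prologue. The sketch below just does that arithmetic, using the instruction sizes that can be read off the offsets in the second listing; the <code>IlOffsetSketch</code> class is purely illustrative and is not part of the demo. Because IL branch operands are encoded relative to the following instruction, prepending code like this should not require the existing <code>br</code> to be patched, although that is my reading rather than something the demo states.</p>
<pre><code class="language-csharp">using System;

public static class IlOffsetSketch
{
    // Encoded sizes (in bytes) of the five injected instructions
    private static readonly int[] InjectedSizes =
    {
        5, // ldstr     : 1-byte opcode + 4-byte string token
        5, // call      : 1-byte opcode + 4-byte method token
        2, // brfalse.s : 1-byte opcode + 1-byte relative offset
        5, // call      : 1-byte opcode + 4-byte method token
        1  // ret       : 1-byte opcode
    };

    public static int PrologueSize()
    {
        int total = 0;
        foreach (int size in InjectedSizes) total += size;
        return total;
    }

    public static void Main()
    {
        int shift = PrologueSize(); // 18 bytes, i.e. 0x12

        // Offsets of the instructions in the original method body
        int[] originalOffsets = { 0x00, 0x01, 0x06, 0x0B, 0x0C, 0x0E, 0x0F, 0x14, 0x15 };
        foreach (int offset in originalOffsets)
        {
            Console.WriteLine("IL_{0:X4} becomes IL_{1:X4}", offset, offset + shift);
        }
    }
}
</code></pre>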
<p>And those are the basics of how you can modify any .NET method; it seems relatively simple when you know how! In my simple demo I just add in the relevant IL so that a mocked method is called instead, and you can see the C++ code needed to achieve this <a href="https://github.com/mattwarren/DDD2011_ProfilerDemo/blob/master/step5_main_injected_method_object_array/DDDProfiler/CodeInjection.cpp#L279" target="_blank">here</a>. Of course in reality it’s much more complicated: my <a href="https://github.com/mattwarren/DDD2011_ProfilerDemo/commit/9f804cec8ef11b802e020e648180b436a429833f" target="_blank">simple demo</a> only deals with a very simplistic scenario, a static method that returns an <code>int</code>. The commercial products that do this are way more powerful and have to deal with all the issues that you can encounter when you are <strong>re-writing code at the IL level</strong>, for instance if you aren’t careful you get exceptions like this:</p>
<p><a href="https://twitter.com/matthewwarren/status/497876741650907136" target="_blank"><img src="/images/2014/12/exception-when-things-go-wrong.jpg" /></a></p>
<h4><strong>Running the demo code</strong></h4>
<p>If you want to run my demo, you need to open the solution file under <a href="https://github.com/mattwarren/DDD2011_ProfilerDemo/tree/master/step5_main_injected_method_object_array" target="_blank">step5_main_injected_method_object_array</a> and set “ProfilerHost” as the “Start-up Project” (right-click on the project in VS) before you run. When you run it, you should see something like this:</p>
<p><a href="/images/2014/12/mocking-in-action.png" target="_blank"><img src="/images/2014/12/mocking-in-action.png" alt="Mocking in action" /></a></p>
<p>You can see the C# code that controls the mocking below. At the moment the API in the demo is fairly limited, it only lets you turn mocking on/off and set the value that is returned from the mocked method.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">void</span> <span class="nf">Main</span><span class="p">(</span><span class="kt">string</span><span class="p">[]</span> <span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Without mocking enabled (the default)</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="k">new</span> <span class="kt">string</span><span class="p">(</span><span class="sc">'#'</span><span class="p">,</span> <span class="m">90</span><span class="p">));</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Calling ClassToMock.StaticMethodToMock() (a static method in a sealed class)"</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="n">ClassToMock</span><span class="p">.</span><span class="nf">StaticMethodToMock</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Result: "</span> <span class="p">+</span> <span class="n">result</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="k">new</span> <span class="kt">string</span><span class="p">(</span><span class="sc">'#'</span><span class="p">,</span> <span class="m">90</span><span class="p">)</span> <span class="p">+</span> <span class="s">"\n"</span><span class="p">);</span>
<span class="c1">// With mocking enabled, doesn't call the static method, calls mocked version instead</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="k">new</span> <span class="kt">string</span><span class="p">(</span><span class="sc">'#'</span><span class="p">,</span> <span class="m">90</span><span class="p">));</span>
<span class="n">Mocked</span><span class="p">.</span><span class="n">SetReturnValue</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Turning ON mocking of Profilier.ClassToMock.StaticMethodToMock"</span><span class="p">);</span>
<span class="n">Mocked</span><span class="p">.</span><span class="nf">Configure</span><span class="p">(</span><span class="s">"ProfilerTarget.ClassToMock.StaticMethodToMock"</span><span class="p">,</span> <span class="n">mockMethod</span><span class="p">:</span> <span class="k">true</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Calling ClassToMock.StaticMethodToMock() (a static method in a sealed class)"</span><span class="p">);</span>
<span class="n">result</span> <span class="p">=</span> <span class="n">ClassToMock</span><span class="p">.</span><span class="nf">StaticMethodToMock</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">"Result: "</span> <span class="p">+</span> <span class="n">result</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="k">new</span> <span class="kt">string</span><span class="p">(</span><span class="sc">'#'</span><span class="p">,</span> <span class="m">90</span><span class="p">)</span> <span class="p">+</span> <span class="s">"\n"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
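<p>For illustration, the managed side of such an API could be as small as the sketch below. This is a guess at a minimal shape and not the code from the demo repository: the class is deliberately called <code>MockedSketch</code>, it only mirrors the member names that appear in the snippets in this post (<code>Configure</code>, <code>ShouldMock</code>, <code>MockedMethod</code>, <code>SetReturnValue</code>), and it makes no attempt at thread-safety.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;

// Hypothetical sketch, the real Mocked class in the demo may look quite different
public static class MockedSketch
{
    // Full names of the methods that currently have mocking switched on
    private static readonly HashSet&lt;string&gt; enabled = new HashSet&lt;string&gt;();

    // The value handed back by MockedMethod()
    public static int SetReturnValue { get; set; }

    public static void Configure(string methodNameAndPath, bool mockMethod)
    {
        if (mockMethod) enabled.Add(methodNameAndPath);
        else enabled.Remove(methodNameAndPath);
    }

    // The injected IL would call this at the top of the re-written method
    public static bool ShouldMock(string methodNameAndPath)
    {
        return enabled.Contains(methodNameAndPath);
    }

    // The injected IL would return this value instead of running the original body
    public static int MockedMethod()
    {
        return SetReturnValue;
    }

    public static void Main()
    {
        const string name = "Example.ClassToMock.StaticMethodToMock"; // made-up name
        Console.WriteLine(ShouldMock(name)); // prints False

        SetReturnValue = 1;
        Configure(name, mockMethod: true);
        Console.WriteLine(ShouldMock(name)); // prints True
        Console.WriteLine(MockedMethod());   // prints 1
    }
}
</code></pre>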
<h4><strong>Other Uses for IL re-writing</strong></h4>
<p>Again, once you learn about this mechanism, you realise that it is used in lots of places, for instance:</p>
<ul>
<li>profilers, see <a href="http://stackoverflow.com/questions/6527597/how-does-the-redgate-profiler-actually-work/6528758#6528758" target="_blank">this SO answer</a> for more info (<a href="http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/" target="_blank">Ants</a> and <a href="http://www.jetbrains.com/profiler/" target="_blank">JetBrains</a>)</li>
<li>test coverage (<a href="http://www.ncover.com/" target="_blank">NCover</a>)</li>
<li>production monitoring systems</li>
</ul>
<p><a href="http://www.reddit.com/r/csharp/comments/2dk0zt/how_to_mock_sealed_classes_and_static_methods/" target="_blank">Discuss on /r/csharp</a></p>
<p>The post <a href="http://www.mattwarren.org/2014/08/14/how-to-mock-sealed-classes-and-static-methods/">How to mock sealed classes and static methods</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Know thy .NET object memory layout (Updated 2014-09-03)2014-07-04T00:00:00+00:00http://www.mattwarren.org/2014/07/04/know-thy-net-object-memory-layout
<p>Apologies to <a href="https://twitter.com/nitsanw" target="_blank">Nitsan Wakart</a>, from whom I shamelessly stole the <a href="http://psy-lob-saw.blogspot.co.uk/2013/05/know-thy-java-object-memory-layout.html" target="_blank">title of this post</a>!</p>
<h4><strong>tl;dr</strong></h4>
<p>The .NET port of <a href="https://github.com/HdrHistogram/HdrHistogram" target="_blank">HdrHistogram</a> can control the field layout within a class, using the same technique that the original Java code does.</p>
<hr />
<p>Recently I’ve spent some time porting HdrHistogram from <a href="https://github.com/HdrHistogram/HdrHistogram/tree/master/src/main/java/org/HdrHistogram" target="_blank">Java</a> to <a href="https://github.com/HdrHistogram/HdrHistogram/tree/master/src/main/csharp" target="_blank">.NET</a>. It’s been great to learn a bit more about Java and get a better understanding of some low-level code. In case you’re not familiar with it, the goals of HdrHistogram are to:</p>
<ol>
<li>Provide an accurate mechanism for measuring latency at a full-range of percentiles (99.9%, 99.99% etc)</li>
<li>Minimise the overhead needed to perform the measurements, so as to not impact your application</li>
</ol>
<p>You can find a full explanation of what it does and how point 1) is achieved in the <a href="http://giltene.github.io/HdrHistogram/" target="_blank">project readme</a>.</p>
<h3><strong>Minimising overhead</strong></h3>
<p>But it’s the 2nd of the points that I’m looking at in this post, by answering the question:</p>
<blockquote>
How does HdrHistogram minimise its overhead?
</blockquote>
<p>But first it makes sense to start with the why, and it turns out it’s pretty simple. HdrHistogram is meant for measuring low-latency applications; if it had a large overhead or caused the GC to do extra work, then it would negatively affect the performance of the application it was meant to be measuring.</p>
<p>Also imagine for a minute that HdrHistogram took <em>1/10,000th</em> of a second (0.1 milliseconds or 100,000 nanoseconds) to record a value. If this was the case you could only hope to accurately record events lasting down to a millisecond (<em>1/1,000th</em> of a second), anything faster would not be possible as the overhead of recording the measurement would take up too much time.</p>
<p>As it is HdrHistogram is much faster than that, so we don’t have to worry! From the <a href="http://giltene.github.io/HdrHistogram/" target="_blank">readme</a>:</p>
<blockquote>
Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs.
</blockquote>
<p>So how does it achieve this? Well, it does a few things:</p>
<ol>
<li>It doesn't do any memory allocations when storing a value; all allocations are done up front when you create the histogram. Upon creation you have to specify the range of measurements you would like to record and the precision. For instance if you want to record timings covering the range from 1 nanosecond (ns) to 1 hour (3,600,000,000,000 ns), with 3 significant digits of precision, you would do the following:<br />
<code>Histogram histogram = new Histogram(3600000000000L, 3);</code></li>
<li>Uses a few low-level tricks to ensure that storing a value can be done as fast as possible. For instance putting the value in the right bucket (array location) is a <a href="https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/csharp/AbstractHistogram.cs#L1600" target="_blank">constant lookup</a> (no searching required) and on top of that it makes use of some nifty <a href="https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/csharp/Utilities/MiscUtilities.cs#L16" target="_blank">bit-shifting</a> to ensure it happens as fast as possible.</li>
<li>Implements a slightly strange class-hierarchy to ensure that fields are laid out in the right location. If you look at the source you have <a href="https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/AbstractHistogram.java#L78" target="_blank">AbstractHistogram</a> and then the seemingly redundant class <a href="https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/AbstractHistogram.java#L32" target="_blank">AbstractHistogramBase</a>. Why split the fields up like that? <del datetime="2014-09-03T08:35:56+00:00">Well the comments give it away a little bit, it's due to <strong>false-sharing</strong></del></li>
</ol>
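<p>To make the "constant lookup" in point 2 a bit more concrete, here is a simplified sketch of how a value can be mapped straight to an array slot using nothing but bit operations. This is not the actual HdrHistogram code: the class name <code>BucketSketch</code>, the fixed 2048 sub-buckets and the loop-based <code>LeadingZeros</code> helper are all assumptions made for illustration, and it ignores the configurable lowest trackable value.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class BucketSketch
{
    // Assumed for illustration: 2048 sub-buckets per bucket
    const int SubBucketCountMagnitude = 11;
    const int SubBucketCount = 1 &lt;&lt; SubBucketCountMagnitude;         // 2048
    const int SubBucketHalfCountMagnitude = SubBucketCountMagnitude - 1;
    const int SubBucketHalfCount = SubBucketCount &gt;&gt; 1;              // 1024
    const long SubBucketMask = SubBucketCount - 1;                    // 2047

    // Counts the zero bits above the highest set bit (a plain loop, for clarity)
    static int LeadingZeros(long value)
    {
        ulong bits = (ulong)value;
        int zeros = 0;
        for (ulong probe = 1UL &lt;&lt; 63; probe != 0 &amp;&amp; (bits &amp; probe) == 0; probe &gt;&gt;= 1)
            zeros++;
        return zeros;
    }

    // Maps a non-negative value to its slot in a flat counts array, no searching
    public static int GetCountsIndex(long value)
    {
        // OR-ing with the mask puts every value below 2048 into bucket 0
        int bucketIndex = (64 - SubBucketCountMagnitude) - LeadingZeros(value | SubBucketMask);
        // Each successive bucket halves the resolution, so drop the low bits
        int subBucketIndex = (int)(value &gt;&gt; bucketIndex);
        int bucketBase = (bucketIndex + 1) &lt;&lt; SubBucketHalfCountMagnitude;
        return bucketBase + (subBucketIndex - SubBucketHalfCount);
    }
}
</code></pre></div></div>
<p>With these assumed constants, values below 2048 each get their own slot, whereas 2048 and 2049 share one, because the second bucket only keeps every other value. The work done is the same regardless of how many values have already been recorded.</p>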
<h3><strong>False sharing</strong></h3>
<p><strong>Update (2014-09-03):</strong> As pointed out by Nitsan in <a href="/2014/07/04/know-thy-net-object-memory-layout/comment-page-1/#comment-152">the comments</a>, I got the wrong end of the stick with this entire section. It’s not about false-sharing at all, it’s the opposite, I’ll quote him to make sure I get it right this time!</p>
<blockquote>
The effort made in HdrHistogram towards controlling field ordering is not about False Sharing but rather towards ensuring certain fields are more likely to be loaded together as they are clumped together, thus avoiding a potential extra read miss.
</blockquote>
<p><del datetime="2014-09-03T08:35:56+00:00">So what is false sharing, to find out more I recommend reading Martin Thompson’s <a href="http://mechanical-sympathy.blogspot.co.uk/2011/07/false-sharing.html" target="_blank">excellent post</a> and this <a href="http://psy-lob-saw.blogspot.co.uk/2014/06/notes-on-false-sharing.html" target="_blank">equally good one</a> from Nitsan Wakart. But if you’re too lazy to do that, it’s summed up by the image below (from Martin’s post).</del></p>
<p><a href="http://mechanical-sympathy.blogspot.co.uk/2011/07/false-sharing.html" target="_blank"><img src="/images/2014/07/8ad85-cache-line.png" alt="CPU Cache lines" class="aligncenter" /></a></p>
<p align="center"><b>Image from the Mechanical Sympathy blog</b></p>
<p><del datetime="2014-09-03T08:35:56+00:00">The problem is that a CPU pulls data into its cache in lines, even if your code only wants to read a single variable/field. If 2 threads are reading from 2 fields (X and Y in the image) that are next to each other in memory, the CPU running a thread will invalidate the cache of the other CPU when it pulls in a line of memory. This invalidation costs time and in high-performance situations can slow down your program.</del></p>
<p><del datetime="2014-09-03T08:35:56+00:00">The opposite is also true, you can gain performance by ensuring that fields you know are accessed in succession are located together in memory. This means that once the first field is pulled into the CPU cache, subsequent accesses will be cheaper as the fields will be <em>“Hot”</em>. It is this scenario HdrHistogram is trying to achieve, but how do you know that fields in a .NET object are located together in memory?</del></p>
<h3><a name="analysing_memory_layout"></a> <strong>Analysing the memory layout of a .NET Object</strong></h3>
<p>To do this you need to drop down into the debugger and use the excellent <a href="http://msdn.microsoft.com/en-us/library/bb190764(v=vs.110).aspx" target="_blank">SOS or Son-of-Strike extension</a>. This is because the <a href="http://msdn.microsoft.com/en-us/library/ht8ecch6(v=vs.90).aspx" target="_blank">.NET JITter</a> is free to reorder fields as it sees fit, so the order you put the fields in your class does not determine the order they end up. The JITter changes the layout to minimise the space needed for the object and to make sure that fields are aligned on byte boundaries, it does this by packing them in the most efficient way.</p>
<p>To test out the difference between the Histogram with a class-hierarchy and without, the following code was written (you can find HistogramAllInOneClass in <a href="//gist.github.com/mattwarren/d7e56a3709d347862141" target="_blank">this gist</a>):</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Histogram</span> <span class="n">testHistogram</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Histogram</span><span class="p">(</span><span class="m">3600000000000L</span><span class="p">,</span> <span class="m">3</span><span class="p">);</span>
<span class="n">HistogramAllInOneClass</span> <span class="n">combinedHistogram</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">HistogramAllInOneClass</span><span class="p">();</span>
<span class="n">Debugger</span><span class="p">.</span><span class="nf">Launch</span><span class="p">();</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">KeepAlive</span><span class="p">(</span><span class="n">combinedHistogram</span><span class="p">);</span> <span class="c1">// put a breakpoint on this line</span>
<span class="n">GC</span><span class="p">.</span><span class="nf">KeepAlive</span><span class="p">(</span><span class="n">testHistogram</span><span class="p">);</span>
</code></pre></div></div>
<p>Then to actually test it, you need to perform the following steps:</p>
<ol>
<li>Set the build to <strong>Release</strong> and <strong>x86</strong></li>
<li>Build the test and then launch your .exe from <strong>OUTSIDE</strong> Visual Studio (VS), i.e. by double-clicking on it in Windows Explorer. You must not be debugging in VS when it starts up, otherwise the .NET JITter won't perform any optimisations.</li>
<li>When the "Just-In-Time Debugger" prompt pops up, select the instance of VS that is already opened (not a NEW one)</li>
<li>Then check "Manually choose the debugging engines." and click "Yes"</li>
<li>Finally make sure "Managed (...)", "Native" AND <strong>"Managed Compatibility Mode"</strong> are checked</li>
</ol>
<p>Once the debugger has connected back to VS, you can type the following commands in the “Immediate Window”:</p>
<ol>
<li><code>.load sos</code></li>
<li><code>!DumpStackObjects</code></li>
<li><code>!DumpObj &lt;ADDRESS&gt;</code> (where ADDRESS is the value from the "Object" column in Step 2.)</li>
</ol>
<p>If all that works, you will end up with an output like below:</p>
<p><a href="/images/2014/07/hdrhistogram-field-layout.png"><img src="/images/2014/07/hdrhistogram-field-layout.png" alt="HdrHistogram - field layout" /></a></p>
<h3><strong>Update (2014-09-03)</strong></h3>
<p>Since first writing this blog post, I came across a really clever technique for getting the offsets of fields <strong>in code</strong>, something that I initially thought was impossible. The full <a href="https://github.com/kevin-montrose/Jil/blob/519a0c552e9fb93a4df94eed0b2f9804271f2fef/Jil/Serialize/Utils.cs#L320" target="_blank">code to achieve this</a> comes from the Jil JSON serialiser and was written to ensure that it accessed fields in the <a href="https://github.com/kevin-montrose/Jil#optimizing-member-access-order" target="_blank">most efficient order</a>.</p>
<p>It is based on a very clever trick: it dynamically emits IL code, making use of the <a href="http://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.ldflda(v=vs.110).aspx" target="_blank"><strong>Ldflda</strong></a> instruction. This is code you cannot write in C#, but can write directly in IL.</p>
<blockquote>
The <strong>ldflda</strong> instruction pushes the address of a field located in an object onto the stack. The object must be on the stack as an object reference (type O), a managed pointer (type &amp;), an unmanaged pointer (type native int), a transient pointer (type *), or an instance of a value type. The use of an unmanaged pointer is not permitted in verifiable code. The object's field is specified by a metadata token that must refer to a field member.
</blockquote>
<p>By putting this code into my project, I was able to verify that it gives exactly the same field offsets that you can see when using the SOS technique (above). So it’s a nice technique and the only option if you want to get this information <em>without</em> having to drop down into a debugger.</p>
<h3><strong>Results</strong></h3>
<p>After all these steps we end up with the results shown in the images below, where the rows are ordered by the “Offset” value.</p>
<p><a href="/images/2014/07/hdrhistogram-with-hierachy2.png"><img src="/images/2014/07/hdrhistogram-with-hierachy2.png" alt="HdrHistogram (with Hierarchy)" /></a></p>
<p align="center"><b>AbstractHistogramBase.cs -> AbstractHistogram.cs -> Histogram.cs</b></p>
<p>You can see that with the class hierarchy in place, the fields remain grouped as we want them to (shown by the orange/green/blue highlighting). What is interesting is that the JITter has still rearranged fields within a single group, preferring to put Int64 (long) fields before Int32 (int) fields in this case. This is seen by comparing the ordering of the “Field” column with the “Offset” one, where the values in the “Field” column represent the original ordering of the fields as they appear in the source code.</p>
<p>However when we put all the fields in a single class, we lose the grouping:</p>
<p><a href="/images/2014/07/histogramallinoneclass2.png"><img src="/images/2014/07/histogramallinoneclass2.png" alt="HistogramAllInOneClass" /></a></p>
<p align="center"><b>Equivalent fields all in one class</b></p>
<h3><strong>Alternative Technique</strong></h3>
<p>To achieve the same effect you can use the <a href="http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute(v=vs.110).aspx" target="_blank">StructLayout attribute</a>, but this requires that you calculate all the offsets yourself, which can be cumbersome:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="nf">StructLayout</span><span class="p">(</span><span class="n">LayoutKind</span><span class="p">.</span><span class="n">Explicit</span><span class="p">,</span> <span class="n">Size</span> <span class="p">=</span> <span class="m">28</span><span class="p">,</span> <span class="n">CharSet</span> <span class="p">=</span> <span class="n">CharSet</span><span class="p">.</span><span class="n">Ansi</span><span class="p">)]</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">HistogramAllInOneClass</span>
<span class="p">{</span>
<span class="c1">// "Cold" accessed fields. Not used in the recording code path:</span>
<span class="p">[</span><span class="nf">FieldOffset</span><span class="p">(</span><span class="m">0</span><span class="p">)]</span>
<span class="k">internal</span> <span class="kt">long</span> <span class="n">identity</span><span class="p">;</span>
<span class="p">[</span><span class="nf">FieldOffset</span><span class="p">(</span><span class="m">8</span><span class="p">)]</span>
<span class="k">internal</span> <span class="kt">long</span> <span class="n">highestTrackableValue</span><span class="p">;</span>
<span class="p">[</span><span class="nf">FieldOffset</span><span class="p">(</span><span class="m">16</span><span class="p">)]</span>
<span class="k">internal</span> <span class="kt">long</span> <span class="n">lowestTrackableValue</span><span class="p">;</span>
<span class="p">[</span><span class="nf">FieldOffset</span><span class="p">(</span><span class="m">24</span><span class="p">)]</span>
<span class="k">internal</span> <span class="kt">int</span> <span class="n">numberOfSignificantValueDigits</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
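<p>If you go down this route, it is worth checking that the offsets you calculated by hand are the ones that are actually applied. The snippet below is a minimal sketch of one way to do that, using <code>Marshal.OffsetOf</code>. Note the assumptions: it uses a cut-down <em>struct</em> (<code>ColdFieldsSketch</code>, a made-up name) containing only the four fields shown above, and <code>Marshal.OffsetOf</code> reports the marshalled layout, which for a simple blittable type like this one should line up with the explicit offsets.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime.InteropServices;

// Hypothetical, cut-down version of the "cold" fields shown above
[StructLayout(LayoutKind.Explicit, Size = 28)]
public struct ColdFieldsSketch
{
    [FieldOffset(0)]  public long identity;
    [FieldOffset(8)]  public long highestTrackableValue;
    [FieldOffset(16)] public long lowestTrackableValue;
    [FieldOffset(24)] public int numberOfSignificantValueDigits;
}

public static class LayoutCheck
{
    // Returns the byte offset of the named field within ColdFieldsSketch
    public static int OffsetOf(string fieldName)
    {
        return Marshal.OffsetOf(typeof(ColdFieldsSketch), fieldName).ToInt32();
    }
}
</code></pre></div></div>
<p>Calling <code>LayoutCheck.OffsetOf("numberOfSignificantValueDigits")</code> should give back the same 24 that was written in the attribute, which makes it easy to catch a mistake if a field is added later and the hand-calculated offsets are not updated.</p>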
<p>If you are interested, the full results of this test <a href="/images/2014/07/hdrhistogram-field-layout1.xlsx" target="_blank">are available</a>.</p>
<p>The post <a href="http://www.mattwarren.org/2014/07/04/know-thy-net-object-memory-layout/">Know thy .NET object memory layout (Updated 2014-09-03)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Measuring the impact of the .NET Garbage Collector - An Update2014-06-23T00:00:00+00:00http://www.mattwarren.org/2014/06/23/measuring-the-impact-of-the-net-garbage-collector-an-update
<h4><strong>tl;dr</strong></h4>
<p>Measuring performance accurately is hard. But it is a whole lot easier if someone with experience takes the time to explain your mistakes to you!!</p>
<hr />
<p>This is an update to my <a href="/2014/06/18/measuring-the-impact-of-the-net-garbage-collector/" title="Measuring the impact of the .NET Garbage Collector" target="_blank">previous post</a>. If you haven’t read that, you might want to go back and read it first.</p>
<p>After I published that post, Gil Tene (<a href="http://twitter.com/giltene" title="Gil Tene - Twitter" target="_blank">@GilTene</a>) the author of <a href="http://www.azulsystems.com/downloads/jHiccup" target="_blank">jHiccup</a>, was kind enough to send me an email pointing out a few things I got wrong! It’s great that he took the time to do this and so (with his permission), I’m going to talk through his comments.</p>
<p>Firstly he pointed out that the premise for my investigation wasn’t in line with what jHiccup reports. So instead of answering the question:</p>
<blockquote>
<strong>what % of pauses do what?</strong>
</blockquote>
<p>jHiccup answers a different question:</p>
<blockquote>
<strong>what % of my operations will see what minimum possible latency levels?</strong>
</blockquote>
<p>He also explained that I wasn’t measuring only GC pauses. This was something which I alluded to in my post, but didn’t explicitly point out.</p>
<blockquote>
...I suspect that your current data is somewhat contaminated by hiccups that are not GC pauses (normal blips of 2+ msec due to scheduling, etc.). Raising the 2 msec recording threshold (e.g. to 5 or 10msec) may help with that, but then you may miss some actual GC pauses in your report. There isn't really a good way around this, since "very short" GC pauses and "other system noise" overlap in magnitude.
</blockquote>
<p>So in summary, it is better to describe my tests as measuring <strong>any pauses in a program</strong>, not just GC pauses. Again quoting from Gil:</p>
<blockquote>
Over time (and based on experience), I think you may find that just using the jHiccup approach of <strong>"whatever is stopping my apps from running"</strong> will become natural, and that you'll stop analyzing the pure "what percent of GC pauses do what" question (if you think about it, the answer to that question is meaningless to applications).
</blockquote>
<p>This is so true, it really doesn’t matter what is slowing your app down or causing the user to experience unacceptable pauses. What matters is finding out if and how often this is happening and then doing something about it.</p>
<h4><strong>Tweaks made</strong></h4>
<p>He also suggested some tweaks to make to the code (emphasis mine):</p>
<blockquote>
<ol>
<li><strong>Record everything (good and bad):</strong>
You current code only records pauses (measurements above 2msec). To report from a "% of operations" viewpoint, you need to record everything, unconditionally. As you probably see in jHiccup, <strong>what I record as hiccups is the measured time minus the expected sleep time</strong>. Recording everything will have the obvious effect of shifting the percentile levels to the right.</li>
<li><strong>Correct for coordinated omission.</strong>
My "well trained" eye sees clear evidence of coordinated omission in your current charts (which is fine for "what % of pauses" question, but not for a "what % of operations" question): <strong>any vertical jumps in latency on a percentile chart are a strong indication of coordinated omission</strong>. While it is possible to have such jumps be "valid" and happening without coordinated omission in cases where the concurrently measured transactions are "either fast or slow, without blocking anything else" (e.g. a web page takes either 5msec or 250msec, and never any other number in between), these are very rare in the wild, and never happen in a jHiccup-like measurement. <strong>Then, whenever you see a 200 msec measurement, it also means that you "should have seen" measurements with the values 198, 196, 194, ... 4, but never got a chance to</strong>.</li>
</ol>
</blockquote>
<p>Based on these 2 suggestions, the code to record the timings becomes the following:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">timer</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Stopwatch</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">sleepTimeInMsecs</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="k">true</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">timer</span><span class="p">.</span><span class="nf">Restart</span><span class="p">();</span>
<span class="n">Thread</span><span class="p">.</span><span class="nf">Sleep</span><span class="p">(</span><span class="n">sleepTimeInMsecs</span><span class="p">);</span>
<span class="n">timer</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="c1">// Record the pause (using the old method, for comparison)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">timer</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span> <span class="p">></span> <span class="m">2</span><span class="p">)</span>
<span class="n">_oldhistogram</span><span class="p">.</span><span class="nf">recordValue</span><span class="p">(</span><span class="n">timer</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">);</span>
<span class="c1">// more accurate method, correct for coordinated omission</span>
<span class="n">_histogram</span><span class="p">.</span><span class="nf">recordValueWithExpectedInterval</span><span class="p">(</span>
<span class="n">timer</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span> <span class="p">-</span> <span class="n">sleepTimeInMsecs</span><span class="p">,</span> <span class="m">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>To see what difference this made to the graphs I re-ran the test, this time just in Server GC mode. You can see the changes on the graph below: the dotted lines are the original (inaccurate) mode and the solid lines show the results after they have been corrected for coordinated omission.
<a href="/images/2014/06/gc-pause-times-comparision-corrected-for-coordinated-omission.png" target="_blank"><img src="/images/2014/06/gc-pause-times-comparision-corrected-for-coordinated-omission.png" alt="GC Pause Times - comparison (Corrected for Coordinated Omission)" /></a></p>
<h4><strong>Correcting for Coordinated Omission</strong></h4>
<p>This is an interesting subject and after becoming aware of it, I’ve spent some time reading up on it and trying to understand it more deeply. One way to comprehend it is to take a look at the code in HdrHistogram that handles it:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">recordCountAtValue</span><span class="p">(</span><span class="n">count</span><span class="p">,</span> <span class="k">value</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">expectedIntervalBetweenValueSamples</span> <span class="p"><=</span> <span class="m">0</span><span class="p">)</span>
<span class="k">return</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">long</span> <span class="n">missingValue</span> <span class="p">=</span> <span class="k">value</span> <span class="p">-</span> <span class="n">expectedIntervalBetweenValueSamples</span><span class="p">;</span>
<span class="n">missingValue</span> <span class="p">>=</span> <span class="n">expectedIntervalBetweenValueSamples</span><span class="p">;</span>
<span class="n">missingValue</span> <span class="p">-=</span> <span class="n">expectedIntervalBetweenValueSamples</span><span class="p">)</span>
<span class="p">{</span>
<span class="nf">recordCountAtValue</span><span class="p">(</span><span class="n">count</span><span class="p">,</span> <span class="n">missingValue</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
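<p>As you can see, it fills in the missing values: starting one expected interval below the value you are actually storing, it steps downwards one interval at a time and stops once it reaches the expected interval itself.</p>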
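<p>To see the effect on a single measurement, the loop can be lifted out into a small standalone helper that just lists the values that would end up being recorded. This is only a sketch that mirrors the loop above: <code>BackFillSketch</code> and <code>ValuesRecorded</code> are made-up names and it collects the values in a list rather than updating a real histogram.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

public static class BackFillSketch
{
    // Lists every value that would be recorded for one measurement
    public static List&lt;long&gt; ValuesRecorded(long value, long expectedInterval)
    {
        // The measured value itself is always recorded
        var recorded = new List&lt;long&gt; { value };
        if (expectedInterval &lt;= 0)
            return recorded;

        // Back-fill the samples that "should have been seen" during the stall
        for (long missingValue = value - expectedInterval;
             missingValue &gt;= expectedInterval;
             missingValue -= expectedInterval)
        {
            recorded.Add(missingValue);
        }
        return recorded;
    }
}
</code></pre></div></div>
<p>For example, <code>ValuesRecorded(10, 2)</code> gives 10, 8, 6, 4 and 2, so one 10 msec stall contributes 5 samples rather than 1. A measurement that is no bigger than the expected interval, such as <code>ValuesRecorded(1, 2)</code>, is recorded once and nothing is back-filled.</p>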
<p>It is comforting to know that I’m not alone in making this mistake, the authors of Disruptor and log4j2 both made the <a href="https://groups.google.com/forum/#!msg/mechanical-sympathy/icNZJejUHfE/BfDekfBEs_sJ" target="_blank">same mistake</a> when measuring percentiles in their high-performance code.</p>
<p>Finally if you want some more information on Coordinated Omission and the issue it is trying to prevent, take a look at <a href="http://www.javaadvent.com/2013/12/how-not-to-measure-latency.html" target="_blank">this post</a> from the Java Advent calendar (you need to scroll down past the calendar to see the actual post). The main point is that without correcting for it, you will be getting inaccurate percentile values, which kind-of defeats the point of making accurate performance measurements in the first place!</p>
<p>The post <a href="http://www.mattwarren.org/2014/06/23/measuring-the-impact-of-the-net-garbage-collector-an-update/">Measuring the impact of the .NET Garbage Collector - An Update</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
Measuring the impact of the .NET Garbage Collector2014-06-18T00:00:00+00:00http://www.mattwarren.org/2014/06/18/measuring-the-impact-of-the-net-garbage-collector
<p>There is an <a href="/2014/06/23/measuring-the-impact-of-the-net-garbage-collector-an-update/" title="Measuring the impact of the .NET Garbage Collector – An Update" target="_blank">update to this post</a>, based on feedback I received.</p>
<hr />
<p>In my <a href="/2014/06/10/roslyn-code-base-performance-lessons-part-2/" title="Roslyn code base – performance lessons (part 2)" target="_blank">last post</a> I talked about the techniques that the Roslyn team used to minimise the effect of the Garbage Collector (GC). Firstly I guess it’s worth discussing what the actual issue is.</p>
<h4><strong>GC Pauses and Latency</strong></h4>
<p>In early versions of the .NET CLR, garbage collection was a “Stop the world” event, i.e. before a GC could happen all the threads in your program had to be brought to a safe place and suspended. If your ASP.NET MVC app was in the middle of serving a request, it would not complete until after the GC finished and the latency for that user would be much higher than normal. This is exactly the issue that Stackoverflow ran into a few years ago, in their <a href="http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector" title="Stackoverflow battles with the .NET GC" target="_blank">battles with the .NET Garbage Collector</a>. If you look at the image below (from that blog post), you can see the spikes in response times of over 1 second, caused by Gen 2 collections.</p>
<p><a href="http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector" target="_blank"><img src="/images/2014/06/SO-Battle-with-the-GC.png" alt="Spikes in Stackoverflow response times due to Gen 2 collections" class="aligncenter" /></a></p>
<p>However in the .NET Framework 4.5 there were <a href="http://blogs.msdn.com/b/dotnet/archive/2012/07/20/the-net-framework-4-5-includes-new-garbage-collector-enhancements-for-client-and-server-apps.aspx" title=".NET 4.5 GC Enhancements" target="_blank">enhancements to the GC</a> brought in that can help mitigate these (emphasis mine):</p>
<blockquote>
The new background server GC in the .NET Framework 4.5 offloads <strong>much</strong> of the GC work associated with a full blocking collection to dedicated background GC threads that can run concurrently with user code, resulting in <strong>much shorter</strong> (less noticeable) pauses. One customer reported a 70% decrease in GC pause times.
</blockquote>
<p>But as you can see from the quote, this doesn’t get rid of pauses completely, it just minimises them. Even the <a href="http://msdn.microsoft.com/library/system.runtime.gclatencymode(v=vs.110).aspx" title="Sustained low-latency GC mode" target="_blank">SustainedLowLatency</a> mode isn’t enough, <em>“The collector <strong>tries</strong> to perform only generation 0, generation 1, and concurrent generation 2 collections. <strong>Full blocking collections may still occur</strong> if the system is under memory pressure.”</em> If you want a full understanding of the different modes, you can see some nice diagrams on <a href="http://msdn.microsoft.com/en-us/library/ee787088.aspx#background_server_garbage_collection" title="GC modes" target="_blank">this MSDN page.</a></p>
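<p>For reference, the latency mode is something you opt into from code, via the <code>GCSettings</code> class in the <code>System.Runtime</code> namespace. The snippet below is a minimal sketch of doing that and then reporting what is actually in effect. The <code>LatencyModeSketch</code> wrapper is a made-up name, and whether the request is honoured depends on how the GC is configured for the process, so the returned text is worth checking rather than assuming.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime;

public static class LatencyModeSketch
{
    // Asks the GC to avoid full blocking collections where it can,
    // then reports which settings are actually in effect
    public static string RequestSustainedLowLatency()
    {
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
        return string.Format("Server GC: {0}, latency mode: {1}",
                             GCSettings.IsServerGC, GCSettings.LatencyMode);
    }
}
</code></pre></div></div>
<p>Logging the result of <code>LatencyModeSketch.RequestSustainedLowLatency()</code> at start-up is a cheap way of confirming which GC flavour a test run was really using.</p>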
<p>I’m not in any way being critical or dismissive of these improvements. GC is a really hard engineering task: you need to detect and clean up the unused memory of a program whilst it’s running, ensuring that you don’t affect its correctness in any way and making sure you add as little overhead as possible. Take a look at <a href="http://channel9.msdn.com/Shows/Going+Deep/Maoni-Stephens-and-Andrew-Pardoe-CLR-4-Inside-Background-GC" title="Inside background GC" target="_blank">this video</a> for some idea of what’s involved. The .NET GC is a complex and impressive piece of engineering, but there are still some scenarios where it can introduce pauses to your program.</p>
<p><strong>Aside:</strong> In the Java world there is a commercial <a href="http://www.azulsystems.com/zing/pgc" target="_blank">Pauseless Garbage Collector</a> available from Azul Systems. It uses a <a href="http://www.azulsystems.com/sites/default/files//images/wp_pgc_zing_v5.pdf" title="Zing white paper" target="_blank">patented technique</a> to offer <em>“Predictable, consistent garbage collection (GC) behavior”</em> and <em>“Predictable, consistent application response times”</em>, but there doesn’t seem to be anything like that in the .NET space.</p>
<h4><strong>Detecting GC Pauses</strong></h4>
<p>But how do you detect GC pauses? The first thing to do is take a look at the properties of the process using the excellent <a href="http://technet.microsoft.com/en-gb/sysinternals/bb896653.aspx" title="Process Explorer" target="_blank">Process Explorer</a> tool from <a href="http://technet.microsoft.com/en-gb/sysinternals" title="Sysinternals" target="_blank">Sysinternals</a> (imagine Task Manager on steroids). It will give you a summary like the one below, where the number of <em>Gen 0/1/2 Collections</em> and <em>% Time in GC</em> are the most interesting values to look at.</p>
<p><a href="/images/2014/06/time-in-gc.png" target="_blank"><img src="/images/2014/06/time-in-gc.png" alt="Time in GC" /></a></p>
<p>But the limitation of this is that it has no context, what <em>% of time in GC</em> is too high, how many <em>Gen 2 collections</em> are too many? What effect does GC actually have on your program, in terms of pauses that a customer will experience?</p>
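<p>As an aside, the collection counts (although not the <em>% Time in GC</em> figure) can also be read from inside the process with <code>GC.CollectionCount</code>. The helper below is a small sketch of that, with <code>GcCountersSketch</code> being a made-up name. It has exactly the same limitation as the Process Explorer view: it tells you how many collections have happened, not what they cost.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class GcCountersSketch
{
    // Returns the number of collections so far, indexed by generation
    public static int[] CollectionCounts()
    {
        var counts = new int[GC.MaxGeneration + 1];
        for (int generation = 0; generation &lt;= GC.MaxGeneration; generation++)
            counts[generation] = GC.CollectionCount(generation);
        return counts;
    }
}
</code></pre></div></div>
<p>Taking a snapshot before and after a test run and subtracting the two gives the number of Gen 0/1/2 collections that the run itself triggered.</p>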
<h4><strong>jHiccup and HdrHistogram</strong></h4>
<p>To gain a better understanding, I’ve used some of the ideas from the excellent <a href="http://www.azulsystems.com/downloads/jHiccup" target="_blank">jHiccup</a> Java tool. Very simply, it starts a new thread in which the following code runs:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">timer</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Stopwatch</span><span class="p">();</span>
<span class="k">while</span> <span class="p">(</span><span class="k">true</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">timer</span><span class="p">.</span><span class="nf">Restart</span><span class="p">();</span>
<span class="n">Thread</span><span class="p">.</span><span class="nf">Sleep</span><span class="p">(</span><span class="m">1</span><span class="p">);</span>
<span class="n">timer</span><span class="p">.</span><span class="nf">Stop</span><span class="p">();</span>
<span class="c1">// allow a little bit of leeway</span>
<span class="k">if</span> <span class="p">(</span><span class="n">timer</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span> <span class="p">></span> <span class="m">2</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Record the pause</span>
<span class="n">_histogram</span><span class="p">.</span><span class="nf">recordValue</span><span class="p">(</span><span class="n">timer</span><span class="p">.</span><span class="n">ElapsedMilliseconds</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Any pauses that this thread experiences will also be seen by the other threads running in the program and whilst these pauses aren’t <em>guaranteed</em> to be caused by the GC, it’s the most likely culprit.</p>
<p><strong>Note:</strong> this uses the <a href="https://github.com/HdrHistogram/HdrHistogram/tree/master/src/main/csharp" target="_blank">.NET port</a> of the Java <a href="https://github.com/HdrHistogram/HdrHistogram" target="_blank">HdrHistogram</a>, a full explanation of what HdrHistogram offers and how it works is available in the <a href="https://github.com/HdrHistogram/HdrHistogram/blob/master/README" target="_blank">Readme</a>. But the summary is that it offers a non-intrusive way of collecting samples in a histogram, so that you can then produce a graph of the <a href="http://www.azulsystems.com/sites/www.azulsystems.com/azul/images/jhiccup/3gb-hotspot-hiccup.gif" target="_blank">50%/99%/99.9%/99.99% percentiles</a>. It does this by allocating all the memory it needs up front, so after start-up it performs no allocations during usage. The benefit of recording full percentile information like this is that you get a much fuller view of any outlying values, compared to just recording a simple average.</p>
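<p>To show why percentiles give a fuller picture than an average, here is a tiny sketch that works them out by sorting a small array of samples. This is deliberately <em>not</em> how HdrHistogram does it (sorting and LINQ both allocate, which is exactly what it avoids); <code>PercentileSketch</code> is a made-up helper using the simple nearest-rank method.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq;

public static class PercentileSketch
{
    // Nearest-rank percentile over a small, non-empty, in-memory sample
    public static long Percentile(long[] samples, double percentile)
    {
        var sorted = samples.OrderBy(x =&gt; x).ToArray();
        int rank = (int)Math.Ceiling(percentile / 100.0 * sorted.Length);
        // Ranks are 1-based, so clamp and convert to a 0-based index
        return sorted[Math.Max(rank, 1) - 1];
    }
}
</code></pre></div></div>
<p>Take 99 samples of 1 msec plus a single 500 msec pause. The average comes out at just under 6 msec, which describes none of the samples. The 50% percentile is 1 msec and the 99.9% percentile is 500 msec, so the outlier is clearly visible instead of being smeared across everything else.</p>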
<p>To trigger garbage collection, the test program also runs several threads, each executing the code below. In a loop, each thread creates a large <code>string</code> and a <code>byte</code> array, to simulate what a web server might be doing when generating a response to a web request (for instance de-serialising some JSON and creating an HTML page). Then to ensure that the objects are kept around long enough, they are both put into a Least Recently Used (LRU) cache that holds the 2000 most recent items.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">processingThreads</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Thread</span><span class="p">(()</span> <span class="p">=></span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">threadCounter</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="k">true</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">text</span> <span class="p">=</span> <span class="k">new</span> <span class="kt">string</span><span class="p">((</span><span class="kt">char</span><span class="p">)</span><span class="n">random</span><span class="p">.</span><span class="nf">Next</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="p">+</span> <span class="m">1</span><span class="p">),</span> <span class="m">1000</span><span class="p">);</span>
<span class="n">stringCache</span><span class="p">.</span><span class="nf">Set</span><span class="p">(</span><span class="n">text</span><span class="p">.</span><span class="nf">GetHashCode</span><span class="p">(),</span> <span class="n">text</span><span class="p">);</span>
<span class="c1">// Use 80K, If we are > 85,000 bytes = LOH and we don't want these there</span>
<span class="kt">var</span> <span class="n">bytes</span> <span class="p">=</span> <span class="k">new</span> <span class="kt">byte</span><span class="p">[</span><span class="m">80</span> <span class="p">*</span> <span class="m">1024</span><span class="p">];</span>
<span class="n">random</span><span class="p">.</span><span class="nf">NextBytes</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span>
<span class="n">bytesCache</span><span class="p">.</span><span class="nf">Set</span><span class="p">(</span><span class="n">bytes</span><span class="p">.</span><span class="nf">GetHashCode</span><span class="p">(),</span> <span class="n">bytes</span><span class="p">);</span>
<span class="n">threadCounter</span><span class="p">++;</span>
<span class="n">Thread</span><span class="p">.</span><span class="nf">Sleep</span><span class="p">(</span><span class="m">1</span><span class="p">);</span> <span class="c1">// So we don't thrash the CPU!!!!</span>
<span class="p">}</span>
<span class="p">});</span>
</code></pre></div></div>
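<p>The LRU cache itself isn’t shown in this post. As a rough idea of what such a cache could look like, here is a minimal sketch. The <code>LruCache</code> class name, its <code>TryGet</code> method and the use of a single lock are all assumptions made for illustration; only the <code>Set(key, value)</code> call shape is taken from the code above.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

// Hypothetical LRU cache, a sketch only (not the class used in the test program)
public sealed class LruCache&lt;TKey, TValue&gt;
{
    private readonly int _capacity;
    private readonly object _sync = new object();
    // Maps each key to its node in the recency list, so lookups avoid a list scan
    private readonly Dictionary&lt;TKey, LinkedListNode&lt;KeyValuePair&lt;TKey, TValue&gt;&gt;&gt; _index;
    // Most recently used entries live at the front, least recently used at the back
    private readonly LinkedList&lt;KeyValuePair&lt;TKey, TValue&gt;&gt; _order =
        new LinkedList&lt;KeyValuePair&lt;TKey, TValue&gt;&gt;();

    public LruCache(int capacity)
    {
        if (capacity &lt; 1) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
        _index = new Dictionary&lt;TKey, LinkedListNode&lt;KeyValuePair&lt;TKey, TValue&gt;&gt;&gt;(capacity);
    }

    public int Count
    {
        get { lock (_sync) { return _index.Count; } }
    }

    public void Set(TKey key, TValue value)
    {
        lock (_sync)
        {
            LinkedListNode&lt;KeyValuePair&lt;TKey, TValue&gt;&gt; existing;
            if (_index.TryGetValue(key, out existing))
            {
                // Replacing an existing entry: drop the old node first
                _order.Remove(existing);
            }
            else if (_index.Count &gt;= _capacity)
            {
                // Cache is full: evict the least recently used entry
                var oldest = _order.Last;
                _order.RemoveLast();
                _index.Remove(oldest.Value.Key);
            }
            _index[key] = _order.AddFirst(new KeyValuePair&lt;TKey, TValue&gt;(key, value));
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_sync)
        {
            LinkedListNode&lt;KeyValuePair&lt;TKey, TValue&gt;&gt; node;
            if (_index.TryGetValue(key, out node))
            {
                // A read counts as a use, so move the node to the front
                _order.Remove(node);
                _order.AddFirst(node);
                value = node.Value.Value;
                return true;
            }
            value = default(TValue);
            return false;
        }
    }
}
</code></pre></div></div>
<p>With a capacity of 2000, something like <code>new LruCache&lt;int, byte[]&gt;(2000)</code> keeps each array reachable until 2000 newer items have been added, which is long enough for many of them to survive at least one collection.</p>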
<h4><strong>Test Results</strong></h4>
<p>The test was left running for 10 minutes, in each of the following GC modes:</p>
<ul>
<li>Workstation Batch (non-concurrent)</li>
<li>Workstation Interactive (concurrent)</li>
<li>Server Batch (non-concurrent)</li>
<li>Server Interactive (concurrent)</li>
</ul>
<p>The results are below. You can clearly see that the Server modes offer lower pauses than the Workstation modes and that Interactive (concurrent) mode is also an improvement over Batch mode. The graph shows pause times on the Y axis (so lower is better) and the X axis plots the percentiles, scaled logarithmically.</p>
<p><a href="/images/2014/06/gc-pause-times-comparision.png" target="_blank"><img src="/images/2014/06/gc-pause-times-comparision.png" alt="GC Pause Times - comparison" /></a></p>
<p>If we take a closer look at just the 99th percentile, i.e. the value that 99 in 100 pauses are below (so only “1 in 100” pauses are longer), the difference is even clearer. Here you can see that the Workstation modes have pauses of up to 25 milliseconds, compared to 10 milliseconds for the Server modes.</p>
<p><a href="/images/2014/06/gc-pause-times-upto-99-comparision.png" target="_blank"><img src="/images/2014/06/gc-pause-times-upto-99-comparision.png" alt="GC Pause Times - up to 99% comparison" /></a></p>
<h4><strong>SustainedLowLatency Mode</strong></h4>
<p>As a final test, the program was run using the new <a href="http://msdn.microsoft.com/library/system.runtime.gclatencymode(v=vs.110).aspx" title="Sustained low-latency GC mode" target="_blank">SustainedLowLatency</a> mode, to see what effect that has. In the graph below you can see this offers lower pause times, although it isn’t able to sustain these for an unlimited period of time. After 10 minutes we start to see longer pauses compared to those we saw when running the test for just 5 minutes.</p>
<p><a href="/images/2014/06/gc-pause-times-comparision-including-sustainedlowlatency.png" target="_blank"><img src="/images/2014/06/gc-pause-times-comparision-including-sustainedlowlatency.png" alt="GC Pause Times - comparison including SustainedLowLatency" /></a></p>
<p>It’s worth noting that there is a trade-off to take into account when using this mode. According to the documentation, <a href="http://msdn.microsoft.com/en-US/library/bb384202(v=vs.110).aspx" title="Sustained low-latency GC mode" target="_blank">SustainedLowLatency mode is</a>:</p>
<blockquote>
For applications that have time-sensitive operations for a contained but potentially longer duration of time during which interruptions from the garbage collector could be disruptive. For example, applications that need quick response times as market data changes during trading hours.
This mode results in a larger managed heap size than other modes. Because it does not compact the managed heap, higher fragmentation is possible. Ensure that sufficient memory is available.
</blockquote>
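<p>For reference, the latency mode is switched at runtime through the <code>GCSettings.LatencyMode</code> property in the <code>System.Runtime</code> namespace. The snippet below is only a sketch of how that might be wrapped up: the <code>LatencyModeExample</code> class and its method are hypothetical names and this is not the code used in the test program.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Runtime;

// Hypothetical helper, for illustration only
public static class LatencyModeExample
{
    // Runs the given work with SustainedLowLatency requested,
    // then restores whatever latency mode was in effect before
    public static void RunWithSustainedLowLatency(Action work)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            work();
        }
        finally
        {
            GCSettings.LatencyMode = previous;
        }
    }
}
</code></pre></div></div>
<p>Restoring the previous mode in a <code>finally</code> block keeps the larger heap size described above limited to the time-sensitive part of the program. The Workstation/Server and concurrent/non-concurrent choices, by contrast, are normally made in the application’s configuration file (the <code>gcServer</code> and <code>gcConcurrent</code> elements) rather than in code.</p>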
<p>All the data used in these tests can be found in the spreadsheet <a href="/images/2014/06/gc-pause-times-comparision.xlsx">GC Pause Times - comparison</a>.</p>
<p><a href="http://www.reddit.com/r/csharp/comments/28ghp8/measuring_the_impact_of_the_net_garbage_collector/" target="_blank">Discuss on the csharp sub-reddit</a></p>
<p><a href="https://news.ycombinator.com/item?id=8282310" target="_blank">Discuss on Hacker News</a></p>
<p>The post <a href="http://www.mattwarren.org/2014/06/18/measuring-the-impact-of-the-net-garbage-collector/">Measuring the impact of the .NET Garbage Collector</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
Roslyn code base - performance lessons (part 2)2014-06-10T00:00:00+00:00http://www.mattwarren.org/2014/06/10/roslyn-code-base-performance-lessons-part-2
<p>In my <a href="/2014/06/05/roslyn-code-base-performance-lessons-part-1/" target="_blank">previous post</a>, I talked about some of the general performance lessons that can be learnt from the <a href="https://roslyn.codeplex.com/" target="_blank">Roslyn</a> project. This post builds on that and looks at specific examples from the code base.</p>
<p>Generally the performance gains within Roslyn come down to one thing:</p>
<blockquote>
<strong>Ensuring the garbage collector does the least possible amount of work</strong>
</blockquote>
<p>.NET is a managed runtime and one of the features that it provides is memory management, via the garbage collector (GC). However, the GC doesn’t come for free: it has to find and inspect all the <em>live</em> objects (and their descendants) in the “mark” phase, before cleaning up any <em>dead</em> objects in the “sweep” phase.</p>
<p>This is backed up by the guidance provided for <a href="https://roslyn.codeplex.com/wikipage?title=How%20to%20Contribute&referringTitle=Documentation" target="_blank">contributing to Roslyn</a>, from the <strong>Coding Conventions</strong> section:</p>
<blockquote>
<ul>
<li>Avoid allocations in compiler hot paths:
<ul>
<li>Avoid LINQ.</li>
<li>Avoid using foreach over collections that do not have a struct enumerator.</li>
<li>Consider using an object pool. There are many usages of object pools in the compiler to see an example.</li>
</ul></li>
</ul>
</blockquote>
<p>It’s interesting to see LINQ specifically called out. I think it’s great: it allows you to write much more declarative and readable code, and in fact I’d find it hard to write C# code without it. But behind the scenes there are lots of hidden allocations going on and they are not always obvious. If you don’t believe me, have a go at <a href="http://joeduffyblog.com/2010/09/06/the-premature-optimization-is-evil-myth/" target="_blank">Joe Duffy’s quiz</a> (about halfway through the blog post).</p>
<h2><strong>Techniques used</strong></h2>
<p>There are several techniques used in the Roslyn code base that either minimise or eliminate allocations, thus giving the GC less work to do. One important characteristic all of them share is that they are only applied to “hot paths” within the code. <a href="http://c2.com/cgi/wiki?PrematureOptimization" target="_blank">Optimising prematurely</a> is never recommended, nor is using optimisations on parts of your code that are rarely exercised. You need to measure and identify the <strong>bottlenecks</strong> and understand what the <strong>hot paths</strong> through your code are, <strong>before</strong> you apply any optimisations.</p>
<h4><strong>Avoiding allocations altogether</strong></h4>
<p>Within the .NET framework there are many methods that cause allocations, for instance String.Trim(..) or any LINQ methods. To combat this we can find several examples where code was specifically re-written, for example:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">// PERF: Avoid calling string.Trim() because that allocates a new substring</code>
<ul>
<li>from <a href="http://source.roslyn.codeplex.com/#Microsoft.CodeAnalysis.CSharp/Compiler/DocumentationCommentCompiler.cs#731" target="_blank">DocumentationCommentCompiler.cs</a></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">// PERF: Expansion of "assemblies.Any(a => a.NamespaceNames.Contains(namespaceName))" to avoid allocating a lambda.</code>
<ul>
<li>from <a href="http://source.roslyn.codeplex.com/#Microsoft.CodeAnalysis.Workspaces/Shared/Extensions/IAssemblySymbolExtensions.cs#17" target="_blank">IAssemblySymbolExtensions.cs</a></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">// PERF: Beware ImmutableArray.Builder.Sort allocates a Comparer wrapper object</code>
<ul>
<li>from <a href="http://source.roslyn.codeplex.com/#Microsoft.CodeAnalysis/Collections/ImmutableArrayExtensions.cs#439" target="_blank">ImmutableArrayExtensions.cs</a></li>
</ul>
</li>
</ul>
<p>Another good lesson is that each improvement is annotated with a “<code>// PERF:</code>” comment to explain the reasoning. I guess this is to prevent another developer coming along and re-factoring the code into something more readable (at the expense of performance).</p>
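<p>To make the lambda example a bit more concrete, the sketch below shows the general shape of such an expansion. It is not the Roslyn code: the <code>NamespaceLookup</code> class, its method names and the use of plain <code>HashSet&lt;string&gt;</code> collections are assumptions chosen to keep the example self-contained.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System.Collections.Generic;
using System.Linq;

// Hypothetical types and names, for illustration only
public static class NamespaceLookup
{
    // LINQ version: the lambda captures 'namespaceName', so each call allocates
    // a closure object and a delegate, plus a boxed enumerator inside Any()
    public static bool ContainsWithLinq(IEnumerable&lt;HashSet&lt;string&gt;&gt; namespaceSets, string namespaceName)
    {
        return namespaceSets.Any(set =&gt; set.Contains(namespaceName));
    }

    // PERF: Expansion of the LINQ call above. List&lt;T&gt; exposes a struct enumerator,
    // so this foreach does not allocate and no lambda is needed
    public static bool ContainsWithLoop(List&lt;HashSet&lt;string&gt;&gt; namespaceSets, string namespaceName)
    {
        foreach (var set in namespaceSets)
        {
            if (set.Contains(namespaceName))
            {
                return true;
            }
        }
        return false;
    }
}
</code></pre></div></div>
<p>Both methods return the same answer; the second one is simply more verbose, which is exactly why a “<code>// PERF:</code>” comment is useful to explain why it was written that way.</p>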
<h4><strong>Object pooling with a Cache</strong></h4>
<p>Another strategy used is object pooling, where rather than <em>newing</em> up objects each time, old ones are re-used. Again this helps relieve pressure on the GC, as fewer objects are allocated and the ones that are stick around for a while (often the lifetime of the program). This is a sweet-spot for the .NET GC because it avoids the pitfall described in Rico Mariani’s excellent <a href="http://msdn.microsoft.com/en-us/library/ms973837.aspx#dotnetgcbasics_topic4" target="_blank">Garbage Collector Basics and Performance Hints</a>:</p>
<blockquote>
<strong>Too Many Almost-Long-Life Objects</strong>
Finally, perhaps the biggest pitfall of the generational garbage collector is the creation of many objects, which are neither exactly temporary nor are they exactly long-lived. These objects can cause a lot of trouble, because they will not be cleaned up by a gen0 collection (the cheapest), as they will still be necessary, and they might even survive a gen1 collection because they are still in use, but they soon die after that.
</blockquote>
<p>We can see how this was handled in Roslyn in the code below from <a href="http://source.roslyn.codeplex.com/#Microsoft.CodeAnalysis.Workspaces/Formatting/StringBuilderPool.cs" target="_blank">StringBuilderPool</a>, which makes use of the more generic <a href="http://source.roslyn.codeplex.com/#Microsoft.CodeAnalysis.Workspaces/Utilities/ObjectPools/PooledObject.cs#12" target="_blank">ObjectPool</a> infrastructure and <a href="http://source.roslyn.codeplex.com/#Microsoft.CodeAnalysis.Workspaces/Utilities/ObjectPools/SharedPools.cs#c5905bf81da0a7e8" target="_blank">helper classes</a>. Obviously it was such a widely used pattern that they built a generic class to handle the bulk of the work, making it easy to write an implementation for a specific type, including StringBuilder, Dictionary, HashSet and Stream.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">internal</span> <span class="k">static</span> <span class="k">class</span> <span class="nc">StringBuilderPool</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">StringBuilder</span> <span class="nf">Allocate</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">SharedPools</span><span class="p">.</span><span class="n">Default</span><span class="p"><</span><span class="n">StringBuilder</span><span class="p">>().</span><span class="nf">AllocateAndClear</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">Free</span><span class="p">(</span><span class="n">StringBuilder</span> <span class="n">builder</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SharedPools</span><span class="p">.</span><span class="n">Default</span><span class="p"><</span><span class="n">StringBuilder</span><span class="p">>().</span><span class="nf">ClearAndFree</span><span class="p">(</span><span class="n">builder</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">static</span> <span class="kt">string</span> <span class="nf">ReturnAndFree</span><span class="p">(</span><span class="n">StringBuilder</span> <span class="n">builder</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SharedPools</span><span class="p">.</span><span class="n">Default</span><span class="p"><</span><span class="n">StringBuilder</span><span class="p">>().</span><span class="nf">ForgetTrackedObject</span><span class="p">(</span><span class="n">builder</span><span class="p">);</span>
<span class="k">return</span> <span class="n">builder</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Having a class like this makes sense, because a large part of compiling is parsing and building strings. Not only do they use a StringBuilder to save lots of temporary String allocations, but they also re-use StringBuilder objects to save the GC the work of having to clean them up.</p>
<p>Interestingly enough, this technique has also been used inside the .NET framework itself, as you can see in the code below from <a href="http://referencesource.microsoft.com/#mscorlib/system/text/stringbuildercache.cs#40" target="_blank">StringBuilderCache.cs</a>. Again, the comment shows that the optimisation was debated and a trade-off between memory usage and efficiency was weighed up.</p>
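<p>The generic pool that sits behind a class like this isn’t reproduced here, but the basic idea can be sketched in a few lines. Note that this is a deliberately simplified stand-in and not Roslyn’s <code>ObjectPool</code>: the <code>SimplePool</code> name, the <code>ConcurrentBag</code> storage and the size limit are all assumptions made for the example.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Concurrent;

// Hypothetical, simplified pool (not the Roslyn implementation)
public sealed class SimplePool&lt;T&gt; where T : class
{
    private readonly ConcurrentBag&lt;T&gt; _items = new ConcurrentBag&lt;T&gt;();
    private readonly Func&lt;T&gt; _factory;
    private readonly int _maxSize;

    public SimplePool(Func&lt;T&gt; factory, int maxSize)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        _factory = factory;
        _maxSize = maxSize;
    }

    // Hands out a pooled instance if one is available, otherwise creates a new one
    public T Allocate()
    {
        T item;
        return _items.TryTake(out item) ? item : _factory();
    }

    // Returns an instance to the pool; the size check is only approximate
    // under contention, surplus instances are simply left for the GC
    public void Free(T item)
    {
        if (_items.Count &lt; _maxSize)
        {
            _items.Add(item);
        }
    }
}
</code></pre></div></div>
<p>A caller would create one long-lived pool, for example <code>new SimplePool&lt;StringBuilder&gt;(() =&gt; new StringBuilder(), 16)</code>, then call <code>Allocate()</code>, clear and use the builder, and hand it back with <code>Free(..)</code> once the final string has been produced.</p>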
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">internal</span> <span class="k">static</span> <span class="k">class</span> <span class="nc">StringBuilderCache</span>
<span class="p">{</span>
<span class="c1">// The value 360 was chosen in discussion with performance experts as a compromise between using</span>
<span class="c1">// as little memory (per thread) as possible and still covering a large part of short-lived</span>
<span class="c1">// StringBuilder creations on the startup path of VS designers.</span>
<span class="k">private</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">MAX_BUILDER_SIZE</span> <span class="p">=</span> <span class="m">360</span><span class="p">;</span>
<span class="p">[</span><span class="n">ThreadStatic</span><span class="p">]</span>
<span class="k">private</span> <span class="k">static</span> <span class="n">StringBuilder</span> <span class="n">CachedInstance</span><span class="p">;</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">StringBuilder</span> <span class="nf">Acquire</span><span class="p">(</span><span class="kt">int</span> <span class="n">capacity</span> <span class="p">=</span> <span class="n">StringBuilder</span><span class="p">.</span><span class="n">DefaultCapacity</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="n">capacity</span> <span class="p"><=</span> <span class="n">MAX_BUILDER_SIZE</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">StringBuilder</span> <span class="n">sb</span> <span class="p">=</span> <span class="n">StringBuilderCache</span><span class="p">.</span><span class="n">CachedInstance</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sb</span> <span class="p">!=</span> <span class="k">null</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Avoid stringbuilder block fragmentation by getting a new StringBuilder</span>
<span class="c1">// when the requested size is larger than the current capacity</span>
<span class="k">if</span> <span class="p">(</span><span class="n">capacity</span> <span class="p"><=</span> <span class="n">sb</span><span class="p">.</span><span class="n">Capacity</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">StringBuilderCache</span><span class="p">.</span><span class="n">CachedInstance</span> <span class="p">=</span> <span class="k">null</span><span class="p">;</span>
<span class="n">sb</span><span class="p">.</span><span class="nf">Clear</span><span class="p">();</span>
<span class="k">return</span> <span class="n">sb</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">StringBuilder</span><span class="p">(</span><span class="n">capacity</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">Release</span><span class="p">(</span><span class="n">StringBuilder</span> <span class="n">sb</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sb</span><span class="p">.</span><span class="n">Capacity</span> <span class="p"><=</span> <span class="n">MAX_BUILDER_SIZE</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">StringBuilderCache</span><span class="p">.</span><span class="n">CachedInstance</span> <span class="p">=</span> <span class="n">sb</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">static</span> <span class="kt">string</span> <span class="nf">GetStringAndRelease</span><span class="p">(</span><span class="n">StringBuilder</span> <span class="n">sb</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">string</span> <span class="n">result</span> <span class="p">=</span> <span class="n">sb</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
<span class="nf">Release</span><span class="p">(</span><span class="n">sb</span><span class="p">);</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Which you then use like this:</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">builder</span> <span class="p">=</span> <span class="n">StringBuilderCache</span><span class="p">.</span><span class="nf">Acquire</span><span class="p">();</span>
<span class="c1">// use the builder as normal, i.e. builder.Append(..)</span>
<span class="kt">string</span> <span class="n">data</span> <span class="p">=</span> <span class="n">StringBuilderCache</span><span class="p">.</span><span class="nf">GetStringAndRelease</span><span class="p">(</span><span class="n">builder</span><span class="p">);</span>
</code></pre></div></div>
<h4><strong>Specialised Collections</strong> <a name="SpecialisedCollections"></a></h4>
<p>Finally, there are several examples where custom collections were written to ensure that excessive memory overhead wasn’t created. For instance, in the code below from <a href="http://source.roslyn.codeplex.com/#Microsoft.CodeAnalysis.CSharp/Symbols/Metadata/PE/PENamedTypeSymbol.cs#673" target="_blank">PENamedTypeSymbol.cs</a>, you can clearly see that specific collections are used whenever there are 0, 1 or up to 6 items.
The comment clearly spells out the trade-off: whilst these collections aren’t as efficient when doing lookups (<em>O(n)</em> v <em>O(1)</em> for a <code>HashSet</code>), they are more efficient in terms of space and so the trade-off is worth it. It’s also interesting to note that the size of <em>6</em> wasn’t chosen randomly. In their tests they found that 50% of the time there were 6 items or fewer, so these optimisations will give a gain in roughly half of all cases.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="k">static</span> <span class="n">ICollection</span><span class="p"><</span><span class="kt">string</span><span class="p">></span> <span class="nf">CreateReadOnlyMemberNames</span><span class="p">(</span><span class="n">HashSet</span><span class="p"><</span><span class="kt">string</span><span class="p">></span> <span class="n">names</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">names</span><span class="p">.</span><span class="n">Count</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="m">0</span><span class="p">:</span>
<span class="k">return</span> <span class="n">SpecializedCollections</span><span class="p">.</span><span class="n">EmptySet</span><span class="p"><</span><span class="kt">string</span><span class="p">>();</span>
<span class="k">case</span> <span class="m">1</span><span class="p">:</span>
<span class="k">return</span> <span class="n">SpecializedCollections</span><span class="p">.</span><span class="nf">SingletonCollection</span><span class="p">(</span><span class="n">names</span><span class="p">.</span><span class="nf">First</span><span class="p">());</span>
<span class="k">case</span> <span class="m">2</span><span class="p">:</span>
<span class="k">case</span> <span class="m">3</span><span class="p">:</span>
<span class="k">case</span> <span class="m">4</span><span class="p">:</span>
<span class="k">case</span> <span class="m">5</span><span class="p">:</span>
<span class="k">case</span> <span class="m">6</span><span class="p">:</span>
<span class="c1">// PERF: Small collections can be implemented as ImmutableArray.</span>
<span class="c1">// While lookup is O(n), when n is small, the memory savings are more valuable.</span>
<span class="c1">// Size 6 was chosen because that represented 50% of the names generated in the Picasso end to end.</span>
<span class="c1">// This causes boxing, but that's still superior to a wrapped HashSet</span>
<span class="k">return</span> <span class="n">ImmutableArray</span><span class="p">.</span><span class="nf">CreateRange</span><span class="p">(</span><span class="n">names</span><span class="p">);</span>
<span class="k">default</span><span class="p">:</span>
<span class="k">return</span> <span class="n">SpecializedCollections</span><span class="p">.</span><span class="nf">ReadOnlySet</span><span class="p">(</span><span class="n">names</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h4><strong>Summary</strong></h4>
<p>All in all there are some really nice tricks and examples of high-performance code to be found in the Roslyn code base. But the main lesson is that you should <strong>never</strong> apply these for the sake of it or because they look clever. They should only be used in conjunction with proper performance testing that identifies the parts of your code that cause it to run slower than your performance goals.</p>
<p>Interestingly enough, Stack Overflow faced a similar issue a few years back, see <a href="http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector" target="_blank">In managed code we trust, our recent battles with the .NET Garbage Collector</a>. But that’s a subject for another post, stay tuned!</p>
<hr />
<p><strong>Update:</strong> Since first writing this post, I’ve found out about an excellent talk, <a href="http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/DEV-B333" target="_blank">Essential Truths Everyone Should Know about Performance in a Large Managed Codebase</a>, in which Dustin Campbell (a Roslyn Program Manager) talks about how they improved the performance of Roslyn. I can’t recommend it enough.</p>
<p>The post <a href="http://www.mattwarren.org/2014/06/10/roslyn-code-base-performance-lessons-part-2/">Roslyn code base - performance lessons (part 2)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>
Roslyn code base - performance lessons (part 1)2014-06-05T00:00:00+00:00http://www.mattwarren.org/2014/06/05/roslyn-code-base-performance-lessons-part-1
<p>At <a href="http://www.buildwindows.com/" target="_blank">Build 2014</a> Microsoft open sourced their next-generation C#/VB.NET compiler, called <a href="http://msdn.microsoft.com/en-us/vstudio/roslyn.aspx" target="_blank">Roslyn</a>. The project is <a href="https://roslyn.codeplex.com/" target="_blank">hosted on CodePlex</a> and you can even <a href="http://source.roslyn.codeplex.com/" target="_blank">browse the source</a>, using the new Reference Source browser, which is itself <a href="http://www.hanselman.com/blog/AnnouncingTheNewRoslynpoweredNETFrameworkReferenceSource.aspx" target="_blank">powered by Roslyn</a> (that’s some crazy meta-recursion going on there!).</p>
<p><a href="http://source.roslyn.codeplex.com/" target="_blank"><img src="/images/2014/05/roslyn-reference-source-browser.png" alt="Roslyn reference source browser" /></a></p>
<p><strong>Easter Eggs</strong></p>
<p>There’s also some nice info available. For instance, you can <a href="http://source.roslyn.codeplex.com/i.txt" target="_blank">get a summary</a> of the number of lines of code, files etc., and you can also list the <a href="http://source.roslyn.codeplex.com/Projects.txt" target="_blank">projects</a> and <a href="http://source.roslyn.codeplex.com/Assemblies.txt" target="_blank">assemblies</a>.</p>
<blockquote>
<pre><strong>ProjectCount=50
DocumentCount=4,366
LinesOfCode=2,355,329
BytesOfCode=96,850,461
DeclaredSymbols=124,312
DeclaredTypes=6,649
PublicTypes=2,076</strong></pre>
</blockquote>
<p>That’s ~2.3 million lines of code, across over 4300 files! (HT to Slaks for <a href="http://blog.slaks.net/2014-02-24/inside-the-new-net-reference-source/#toc_2" target="_blank">pointing out this functionality</a>)</p>
<p><strong>Being part of the process</strong></p>
<p>If you are in any way interested in new C# language features or just want to find out how a compiler is built, this is really great news. On top of this, not only have Microsoft open sourced the code, the entire process is there for everyone to see. You can get a peek behind the scenes of the <a href="https://roslyn.codeplex.com/discussions/546465" target="_blank">C# Design Meetings</a>, debate possible new features <a href="https://roslyn.codeplex.com/discussions/542963" target="_blank">with some of the designers</a> and best of all, they seem <a href="https://roslyn.codeplex.com/discussions/541194#post1240018" target="_blank">genuinely interested</a> in getting community feedback.</p>
<p><strong>Taking performance seriously</strong></p>
<p>But what I find really interesting is the performance lessons that can be learned. As outlined in <a href="http://blogs.msdn.com/b/csharpfaq/archive/2014/01/15/roslyn-performance-matt-gertz.aspx" target="_blank">this post</a>, performance is something they take seriously. It’s not really surprising: the new compiler can’t afford to be slower than the old C++ one, and developers are pretty demanding customers, so any performance issues would be noticed and complained about.</p>
<p>To give you an idea of what’s involved, here’s the list of scenarios that they measure the performance against.</p>
<ul style="color:#424242;">
<li>Build timing of small, medium, and (very) large solutions</li>
<li>Typing speed when working in the above solutions, including “goldilocks” tests where we slow the typing entry to the speed of a human being</li>
<li>IDE feature speed (navigation, rename, formatting, pasting, find all references, etc…)</li>
<li>Peak memory usage for the above solutions</li>
<li>All of the above for multiple configurations of CPU cores and available memory</li>
</ul>
<p>And to make sure that they have accurate measurements and that they know as soon as performance has degraded (<strong>emphasis mine</strong>):</p>
<blockquote>
<p style="color:#424242;">These are all <strong>assessed & reported daily</strong>, so that we can identify & repair any check-in that introduced a regression as soon as possible, before it becomes entrenched. Additionally, we don’t just check for the average time elapsed on a given metric; <strong>we also assess the 98<sup>th</sup> & 99.9<sup>th</sup> percentiles</strong>, because we want good performance all of the time, not just some of the time.</p>
</blockquote>
<p>There’s lots of information about why <a href="http://filipspagnoli.wordpress.com/2009/11/13/lies-damned-lies-and-statistics-21-misleading-averages/" target="_blank">just using averages is a bad idea</a>, particularly when <a href="http://mvolo.com/why-average-latency-is-a-terrible-way-to-track-website-performance-and-how-to-fix-it/" target="_blank">dealing with response times</a>, so it’s good to see that they are using percentiles as well. But running performance tests as part of their daily builds and tracking those numbers over time is a really good example of taking performance seriously: <strong>performance testing wasn’t left till the end, as an after-thought</strong>.</p>
<p>I’ve worked on projects where the performance targets were at best vague and ensuring they were met was left till right at the end, after all the features had been implemented. It’s much harder to introduce performance testing at that stage. We certainly don’t leave unit testing until the end, so why do it with performance testing?</p>
<p>This ties in with the <a href="http://blog.codinghorror.com/performance-is-a-feature/" target="_blank">Stack Overflow mantra</a>:</p>
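<p>As a small illustration of why an average can hide the problem, here is a sketch that computes both a mean and a percentile over a set of timings. The <code>LatencyStats</code> class is a hypothetical helper written for this example (using the simple “nearest-rank” definition of a percentile); it is not how the Roslyn team compute their numbers.</p>
<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq;

// Hypothetical helper, for illustration only
public static class LatencyStats
{
    public static double Mean(double[] samples)
    {
        return samples.Average();
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // 'percentile' percent of all samples are less than or equal to it
    public static double Percentile(double[] samples, double percentile)
    {
        if (samples == null || samples.Length == 0)
            throw new ArgumentException("At least one sample is required", "samples");
        if (percentile &lt;= 0 || percentile &gt; 100)
            throw new ArgumentOutOfRangeException("percentile");

        // Sort a copy so the caller's array is left untouched
        var sorted = (double[])samples.Clone();
        Array.Sort(sorted);

        int rank = (int)Math.Ceiling(percentile * sorted.Length / 100.0);
        return sorted[Math.Max(rank, 1) - 1];
    }
}
</code></pre></div></div>
<p>Given 98 samples of 10 ms and 2 samples of 500 ms, <code>Mean(..)</code> gives 19.8 ms, which sounds perfectly healthy, whereas <code>Percentile(.., 99)</code> gives 500 ms and immediately shows that some requests are having a much worse time.</p>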
<blockquote>
<h4><strong>Performance is a feature</strong></h4>
</blockquote>
<p>Next time I’ll be looking at specific examples of performance enhancements made in the code base and what problems they are trying to solve.</p>
<p>The post <a href="http://www.mattwarren.org/2014/06/05/roslyn-code-base-performance-lessons-part-1/">Roslyn code base - performance lessons (part 1)</a> first appeared on my blog <a href="http://mattwarren.org">Performance is a Feature!</a></p>
<a href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=641373" rel="tag" style="display:none">CodeProject</a>