<p><strong>Cosmo’s Blog</strong>: Thoughts about Computer Science, Mathematics, Music and More.</p>
<h1>Setting Up a Private Limited Company in Ireland</h1>
<p><em>2020-03-28</em></p>
<p><strong>Disclaimer.</strong> This post’s only intent is to share my personal experience with setting up a private limited company in Ireland. You should not take it as legal or tax advice, and you should seek the advice of professionals if you plan to set up a company in Ireland. <strong>No warranties</strong> are given that my experience applies to your project and <strong>no liability</strong> will be accepted.</p>
<h1 id="what-is-a-private-limited-company-ltd-">What is a Private Limited Company (Ltd.)?</h1>
<h2 id="private">Private</h2>
<p>Private as opposed to Public. A private limited company cannot sell shares on the stock market to raise capital. A private company can become public when it meets the <a href="https://www.enterprise-ireland.com/en/Invest-in-Emerging-Companies/Source-of-Private-Capital/Public-Listing.html">adequate financial criteria</a>.</p>
<h2 id="limited">Limited</h2>
<p>Limited as in “liability <strong>limited by shares</strong>”. This aspect is maybe the most important part of what a <strong>Ltd.</strong> is.</p>
<p>A private limited company has <strong>shareholders</strong> who share the ownership of the company. Being a shareholder grants several <strong>privileges</strong>, among them:</p>
<ul>
<li>
<p>A <strong>decision power</strong>: shareholders participate in business critical decisions and their decision power is proportional to their share</p>
</li>
<li>
<p>The ability to <strong>benefit</strong> from the company’s profit: when the company generates a profit, this profit can be re-distributed to shareholders in proportion to their share</p>
</li>
</ul>
<p>However, these privileges come with a <strong>liability</strong>: the shareholders are <strong>liable to the company</strong> for the price their shares were set at when they joined the business. In other words, the shareholders <strong>owe the company</strong> the money they agreed to put into the business in order to get their share of it.</p>
<p>Most importantly, this liability is <strong>limited</strong> to the amount of their shares. If the company goes bust and owes money to creditors, shareholders are only liable for the price of their shares, which was agreed when they joined. They do not owe anything else.</p>
<p><strong>Limited liability</strong> offers great <strong>protection</strong> to shareholders. For instance, as long as they have paid for their shares, their personal property (house, cars, other businesses, etc.) is not at risk if the business goes bad. Sole traders (individuals who trade without a company) do not benefit from this protection at all: the money owed by their business is directly owed by them as individuals.</p>
<h2 id="company">Company</h2>
<p>A <strong>company</strong> is an entity distinct from its owners. It has its own administrative existence: it has a date of birth, it pays its own taxes, and eventually it dies. A big part of managing a company is keeping up to date with the mandatory administrative tasks required by various state authorities.</p>
<p>This post will cover the steps, benefits and costs of setting up a private limited company in Ireland, and the mandatory administrative tasks that need to be performed while it is alive.</p>
<h1 id="why-ireland">Why Ireland?</h1>
<p>In my particular case, the main reason is that I live in Ireland. But, let’s face it, Ireland is well known for its extremely low <strong>Corporation Tax Rate</strong> of 12.5% (see this <a href="https://www.investopedia.com/articles/personal-finance/051915/corporate-tax-rates-highs-and-lows.asp">ranking</a>). It means that the profit generated by an Irish company (the money left from sales after paying for products and overhead) is taxed at 12.5%.</p>
<p>Apart from the tax aspects, the Irish administrative processes are fairly straightforward to understand for any English-speaking person.</p>
<p><strong>However</strong> you should have the three following <strong>caveats</strong> in mind:</p>
<ul>
<li>
<p>In order to qualify for the 12.5% Corporation Tax rate, an Irish-incorporated company needs to prove that it is <strong>centrally controlled and managed in Ireland</strong>. The <a href="https://revenue.ie/">Revenue Commissioners</a> are very strict when assessing this condition.</p>
</li>
<li>
<p>There is a <strong>20% Withholding Tax</strong> on dividends in Ireland (with some exceptions). It means that if your company were to pay you €1000 in dividends, you would receive €800 and €200 would be paid in tax. Hence, a low Corporation Tax rate does not mean that you can easily benefit from your company’s profit.</p>
</li>
<li>
<p>There is a <strong>30% Capital Gains Tax</strong> which applies if you decide to sell your shares of the business.</p>
</li>
</ul>
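<p>To see how these taxes compound, here is a toy calculation of my own (an illustration only, not tax advice), assuming the company’s full after-tax profit is distributed as a dividend at the rates quoted above:</p>

```python
# Toy illustration of the caveats above, NOT tax advice: effective
# take-home if an Irish company's profit is fully paid out as dividends,
# using the rates quoted in this post (check current rates!).
CORPORATION_TAX = 0.125   # tax on the company's profit
WITHHOLDING_TAX = 0.20    # withholding tax on dividends

def net_dividend(profit):
    """Profit -> corporation tax -> dividend -> withholding tax."""
    after_ct = profit * (1 - CORPORATION_TAX)
    return after_ct * (1 - WITHHOLDING_TAX)

print(net_dividend(1000))  # 1000 -> 875 after CT -> 700.0 net
```

So even with a 12.5% Corporation Tax, only €700 of each €1000 of profit would reach you in this simplified scenario.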
<p>Again, given your particular situation, a professional will give you advice on these points.</p>
<h1>About Free Energy</h1>
<p><em>2019-07-18</em></p>
<h2 id="introduction">Introduction</h2>
<p>Free energy is a concept which seems central in statistical physics, thermodynamics, chemistry and physical biology. More specifically, the idea that free energy is a quantity which systems naturally tend to <strong>minimize</strong> is well developed in those fields.</p>
<p>However, how can we build an intuition about the following textbook definition of free energy:</p>
\[G = E - T S\]
<p>Where:</p>
<ul>
<li>$E$ is the energy (sometimes enthalpy $H$ is used instead of $E$)</li>
<li>$T$ the temperature</li>
<li>$S$ the entropy</li>
</ul>
<p>To begin with, this expression of $G$ seems to be ill-typed because “energy” and “entropy” are not properties of the same kind of objects. Said differently, $E$ is the energy of what? $S$ is the entropy of what? Indeed, as presented in our <a href="/2019/04/21/shannon-entropy.html">article on entropy</a>, entropy is a property of a <strong>probability distribution</strong> (or a random variable) while the energy $E$ seems to be a property of a <strong>state</strong> of a physical system. So how can we make sense of the expression of $G$ when $E$ and $S$ seem to be about very different things?</p>
<p>In this article, we answer that question by presenting a formalism of free energy with random variables. We’ll rewrite $G$ as being the property of a system <strong>state random variable</strong> $X$ together with an <strong>energetic valuation function</strong> $v$. We will write:</p>
\[G(X,v) = \mathbb{E}[v(X)] - \tau H(X)\]
<p>Where:</p>
<ul>
<li>$X$ is the random variable indicating in which state (microstate) our system is currently in</li>
<li>$v:\Omega \to \mathbb{R}$ is a function indicating what is the energy of a given state $s\in\Omega$ ($\Omega$ is the set of all states)</li>
<li>$\mathbb{E}[v(X)]$ is the expectation of $v(X)$, i.e. the average energy of the system with valuation $v$</li>
<li>$\tau$ is a non-negative parameter called the “threshold”</li>
<li>$H(X)$ is the <a href="/2019/04/21/shannon-entropy.html">Shannon entropy</a> of $X$ (not to be mistaken with some enthalpy $H$)</li>
</ul>
<p>We intentionally use the term “threshold” instead of the term “temperature” because, in this mathematical model we don’t deal with units and physical scales.</p>
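<p>To make the definition concrete, here is a minimal sketch of mine (with made-up energies, not part of the original derivation) that evaluates $G(X,v) = \mathbb{E}[v(X)] - \tau H(X)$ for an explicit finite distribution:</p>

```python
import numpy as np
from scipy.special import xlogy

def free_energy(p, v, tau):
    """G(X, v) = E[v(X)] - tau * H(X) for a finite distribution p
    over states with energies v (entropy in natural-log units)."""
    p, v = np.asarray(p, float), np.asarray(v, float)
    avg_energy = p.dot(v)                    # E[v(X)]
    shannon_entropy = -np.sum(xlogy(p, p))   # xlogy handles p_i = 0
    return avg_energy - tau * shannon_entropy

# Two states with energies 0 and -1:
print(free_energy([0.0, 1.0], [0.0, -1.0], tau=0.0))   # -1.0
print(free_energy([0.5, 0.5], [0.0, -1.0], tau=10.0))  # -0.5 - 10*ln(2)
```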
<p>Thanks to this formulation we’ll get some intuition on what minimizing $G$ means: finding the best compromise between stability (being in a low-energy state) and “chaos”, the natural tendency of a system to explore all its possible states. Using the theory of optimization, we will show that the probability distribution of $X$ which minimizes $G$ corresponds exactly to the <a href="https://en.wikipedia.org/wiki/Boltzmann_distribution">Boltzmann distribution</a>. Indeed, in this formalism, the principle of free energy minimization directly leads to the Boltzmann distribution and justifies why, in statistical physics, the probability for a system to be at an energy level $E_i$ is given by:</p>
\[p_i = \frac{1}{Z}e^{-\frac{E_i}{k_B T}}\]
<p>Finally, throughout the article, we’ll illustrate this formalism on a concrete toy example of hybridization of two particles in a 1D-world.</p>
<h2 id="toy-model-of-hybridization">Toy Model of Hybridization</h2>
<h3 id="two-particles-1d-world">Two Particles 1D-World</h3>
<div class="imgcap" style="border: 0px">
<div>
<img src="/assets/free_energy/world.svg" />
</div>
<div class="thecap">Figure 1. Three of the $N^2 + N$ states of the two particles 1D-world of size $N=5$.<br /> The two particles can possibly bind if they are on the same cell.</div>
</div>
<p><br /></p>
<p>Our toy model of hybridization is as follows: we take two distinct particles (a circle and a square in Figures 1 and 2) living in a discrete, one-dimensional world composed of $N$ cells. These two particles can freely move to any cell of the world; in particular they can both be on the same cell (Figure 1, first and second rows). When they are on the same cell these two particles can “hybridize”, i.e. make a bond between themselves (Figure 1, third row).</p>
<h3 id="micro-and-macro-states">Micro and Macro States</h3>
<p>As shown in Figure 2, this system can be in $N^2+N$ states which we call <strong>microstates</strong>:</p>
<div class="imgcap" style="border: 0px">
<div>
<img src="/assets/free_energy/world2.svg" style="width:70%" />
</div>
<div class="thecap">Figure 2. The $N^2 + N$ microstates of the system and its two macrostates: <em>bonded</em> or <em>not bonded</em>. Here, $N=5$.</div>
</div>
<p><br /></p>
<p>These $N^2 + N$ microstates can be grouped into two distinct <strong>macrostates</strong> according to whether or not there is a bond between the particles. Macrostates are collections of microstates sharing a common property. Here we have two macrostates: <em>bonded</em> and <em>not bonded</em>.</p>
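<p>The counting above can be checked with a few lines of code (a sketch of mine, not from the original post), enumerating every microstate of the two-particle world:</p>

```python
from collections import Counter

# "Not bonded" microstates: any ordered pair of cells for the two
# particles. "Bonded" microstates: both particles share a cell and are
# bonded, so there are exactly N of them.
N = 5
not_bonded = [(i, j) for i in range(N) for j in range(N)]  # N^2 states
bonded = [(i, i) for i in range(N)]                        # N states

microstates = [("not bonded", s) for s in not_bonded] + \
              [("bonded", s) for s in bonded]
assert len(microstates) == N**2 + N   # 30 microstates when N = 5

macro_sizes = Counter(label for label, _ in microstates)
assert macro_sizes["not bonded"] == N**2 and macro_sizes["bonded"] == N
```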
<h3 id="our-system-as-a-random-variable">Our System as a Random Variable</h3>
<p>Let $X$ be the random variable indicating in which microstate the system is. The set of all microstates, $\Omega = \{ S_1, \ldots, S_{N^2 + N} \}$, is given in Figure 2. Formally, $X$ is the identity map on $\Omega$.</p>
<p>The question we ask is: <strong>“What is the probability of being in a given microstate?”</strong> Equivalently: what is the distribution $p_X$ of the state variable $X$?</p>
<p>In order to answer this question we introduce the concept of <em>energy of a state $S_i$</em>.</p>
<h2 id="free-energy-or-the-fight-between-energy-and-entropy">Free Energy or the Fight Between Energy and Entropy</h2>
<h3 id="some-states-are-more-favored-than-others">Some States are more Favored than Others</h3>
<p>We are going to look at each of our microstates $S_i$ and state how <em>favorable</em> it is. In other words, for each microstate, we are going to give a score encompassing how much our system favors this state, with the intuition that the more a state is favored, the more likely our system is to end up in it.</p>
<p>This score is the <strong>energy</strong> of the microstate. By convention, energies are numbers in $\mathbb{R}$ and the lower the score, the more favored the state. A microstate with energy $E=1000$ will be less favored than a microstate with energy $E=-1000$. Formally, we are going to construct an energetic valuation function $v:\Omega \to \mathbb{R}$ by defining $v(S_i)$ for $1 \leq i \leq N^2+N$.</p>
<p>In our 1D-world, any microstate where the two particles are <em>bonded</em> will be considered more favored, i.e. as having a lower energy, than any microstate where they are <em>not bonded</em>. Furthermore, in our model there is no reason to give different energetic scores to two microstates belonging to the same macrostate. Indeed, energetically speaking, nothing distinguishes microstates $1$ to $N^2$ (the <em>not bonded</em> case) from one another, and likewise nothing distinguishes microstates $N^2+1$ to $N^2+N$ (the <em>bonded</em> case).</p>
<p>Hence we have:</p>
\[\begin{align*} v(S_1) &= v(S_2) = \dots v(S_{N^2}) = E_{\text{not bonded}} = E_0 \\ v(S_{N^2+1}) &= v(S_{N^2+2}) = \dots v(S_{N^2+N}) = E_{\text{bonded}} = E_1 \end{align*}\]
<p>In the following we take $E_{0} = 0$ as a reference energy. The (negative) value of $E_{1}$ will account for how intense the bond between the two particles is. For instance, $E_{1} = -100$ corresponds to a bond 10 times stronger than in a scenario where $E_{1} = -10$. In a case like this, where energy refers to the intensity of a bond, the term <strong>enthalpy</strong> is often used instead of energy.</p>
<h3 id="what-is-a-fair-distribution-on-microstates">What is a Fair Distribution on Microstates?</h3>
<p>Our goal is to construct $p_X = (p_1, \ldots, p_{N^2 + N})$, the probability distribution over microstates: $p_i$ is the probability that the system is in microstate $S_i$. In order to get there we must describe what is a <em>good</em> (or a fair) distribution over the microstates space.</p>
<p>For instance, would it be fair if $p_X$ were uniform, i.e. if all microstates were equally likely, $p_i = \frac{1}{N^2 + N}$? No, because of the <strong>energetic</strong> argument. Indeed, microstates corresponding to the <em>bonded</em> macrostate are more favored by the system than <em>not bonded</em> microstates. Hence, our distribution $p_X$ must be biased in favor of the microstates in the <em>bonded</em> macrostate: $S_{N^2 + 1} \dots S_{N^2 + N}$.</p>
<p>Conversely, would it be fair if $p_X$ were concentrated on one particular microstate, for instance if we set $p_{N^2+1} = 1$? No, because of the <strong>entropic</strong> argument. The entropic argument accounts for the chaotic nature of microscopic systems: molecular agitation drives the system to explore its different possible configurations and limits our ability to predict in which microstate the system is. This entropic effect is, in physics, proportional to the temperature. In our mathematical model, it will be proportional to the <em>threshold</em>.</p>
<p>Gibbs free energy will provide a way to achieve a good compromise between the energetic and the entropic arguments.</p>
<h3 id="minimizing-free-energy-a-compromise-between-energy-and-entropy">Minimizing Free Energy: a Compromise between Energy and Entropy</h3>
<p>Gibbs (or Helmholtz) free energy is a mathematical formalisation of the intuitive idea of a fight between energy and entropy in microscopic systems. We define it as follows:</p>
\[G(X,v) = \mathbb{E}[v(X)] - \tau H(X)\]
<p>With $\mathbb{E}[v(X)] = \sum_{i} v(S_i) p_i$, the weighted average energy, $\tau \geq 0$ a parameter called the <em>threshold</em> and $H(X)$ the <a href="/2019/04/21/shannon-entropy.html">Shannon entropy</a> of $X$.</p>
<p>If we minimize $G(X,v)$, i.e. find the probability distribution $p_X$ which gives the smallest value of $G$, we achieve an interesting compromise. Indeed, we minimize the <strong>weighted average energy</strong> of the system while <strong>maximizing</strong> the corresponding entropy of the microstates distribution. Note that maximizing Shannon entropy matches the intuitive idea of the entropic argument since we maximize the <strong>lack of predictability</strong> of the random variable $X$ (see our <a href="/2019/04/21/shannon-entropy.html">article</a>). The energetic argument is formalised by the idea of minimizing $\mathbb{E}[v(X)]$, the weighted average energy of our system.</p>
<p>The parameter $\tau$ allows us to linearly control the lack of predictability (or chaos) of the system. If $\tau = 0$, there is no chaos: minimizing free energy then corresponds to deterministically setting the system to one of its most favorable (lowest-energy) states. If $\tau \to +\infty$, the system’s microstate is totally unpredictable: minimizing free energy leads to the uniform distribution on microstates, and the system is so unstable that the energetic argument no longer holds. In physics, the threshold $\tau$ corresponds, up to the normalization constant $k_{B}$, to temperature. Temperature linearly controls the molecular agitation of the system, which determines its ability to explore its state space.</p>
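<p>These two limiting regimes can be checked numerically. Below is a small sketch of mine (with made-up energies, not from the original post) comparing the free energy of a deterministic lowest-energy distribution with that of the uniform distribution at small and large $\tau$:</p>

```python
import numpy as np
from scipy.special import xlogy

def G(p, v, tau):
    """Free energy of a finite distribution p with energies v."""
    p = np.asarray(p, float)
    return p.dot(v) - tau * (-np.sum(xlogy(p, p)))

v = np.array([0.0, 0.0, -1.0])      # a single lowest-energy state
delta = np.array([0.0, 0.0, 1.0])   # deterministic: lowest-energy state
uniform = np.full(3, 1/3)           # maximal-entropy distribution

# tau = 0: only the energetic term counts, the delta distribution wins
assert G(delta, v, 0.0) < G(uniform, v, 0.0)
# large tau: the entropic term dominates, the uniform distribution wins
assert G(uniform, v, 100.0) < G(delta, v, 100.0)
```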
<h2 id="experimental-solution-to-free-energy-minimization">Experimental Solution to Free Energy Minimization</h2>
<p>In the case of our 1D-world we can write some code to minimize $G(X,v)$. Free energy becomes:</p>
\[G(X,v) = q_0E_0 + q_1E_1 - \tau H(X)\]
<p>With:</p>
\[H(X) = -( q_{0} \text{log}(\frac{q_0}{N^2}) + q_{1} \text{log}(\frac{q_{1}}{N}))\]
<p>And with $q_0 = N^2 p_1$ and $q_1 = N p_{N^2+1}$ the probabilities of the macrostates <em>not bonded</em> and <em>bonded</em> (recall that $(p_1,\ldots,p_{N^2},p_{N^2+1},\ldots,p_{N^2+N})$ is the probability distribution over microstates, i.e. the distribution of the variable $X$).</p>
<p>Note that we have made the implicit assumption that microstates with the same energy have the same probability (i.e. $p_1 = p_2 = \dots = p_{N^2}$ and $p_{N^2+1} = p_{N^2+2} = \dots = p_{N^2+N}$). This assumption will be confirmed later on by the calculation leading to the Boltzmann distribution. For now, it makes the optimization feasible since we only have to optimize over $(q_0, q_1)$ and not over the whole $(p_1,\ldots,p_{N^2},p_{N^2+1},\ldots,p_{N^2+N})$.</p>
<p>The following code minimizes \(G(X,v)\) in our grid-world context for different threshold values. You can play with this code interactively in <a href="https://github.com/tcosmo/tcosmo.github.io/tree/master/assets/free_energy/FreeEnergyMinimization.ipynb">this notebook</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import scipy.optimize as opt
from scipy.optimize import LinearConstraint
from scipy.special import xlogy
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('dark_background')

def entropy(proba_dist, world_weights):
    # Shannon entropy of the microstate distribution, written in terms of
    # the macrostate probabilities (q0, q1) and the number of microstates
    # in each macrostate (world_weights); xlogy handles q = 0 gracefully
    return -(xlogy(proba_dist[0], proba_dist[0]/world_weights[0]) +
             xlogy(proba_dist[1], proba_dist[1]/world_weights[1]))

def free_energy(proba_dist, thresh, energies, world_weights):
    # G = q0*E0 + q1*E1 - tau * H(X)
    return proba_dist.dot(energies) - thresh*entropy(proba_dist, world_weights)

N_cells = 100
N_not_bonded = N_cells**2   # number of "not bonded" microstates
N_bonded = N_cells          # number of "bonded" microstates
E_not_bonded = 0
E_bonded = -1000

thresh_space = np.linspace(0, 1000, 200)
proba_mat = []
for thresh in thresh_space:
    # minimize G over (q0, q1) under the constraint q0 + q1 = 1
    result = opt.minimize(free_energy, x0=np.array([0.5, 0.5]),
                          args=(thresh,
                                np.array([E_not_bonded, E_bonded]),
                                np.array([N_not_bonded, N_bonded])),
                          constraints=LinearConstraint(np.array([1.0, 1.0]), 1.0, 1.0))
    proba_mat.append(result.x)
proba_mat = np.array(proba_mat)

plt.figure(figsize=(10, 5))
plt.plot(thresh_space, proba_mat[:, 0], label="Proba Not Bonded")
plt.plot(thresh_space, proba_mat[:, 1], label="Proba Bonded")
plt.legend()
plt.xlabel('Threshold')
plt.ylabel('Probability')
plt.title('Energy/Entropy Trade-off')
plt.show()
</code></pre></div></div>
<p>This code produces the following output:</p>
<div class="imgcap">
<div>
<img width="80%" src="/assets/free_energy/graph.png" alt="free energy" />
</div>
<div class="thecap">Figure 3. Compromise between energy and entropy in the 1D-world with $N=100$</div>
</div>
<p><br />
Figure 3 illustrates the minimization of free energy as a function of the threshold parameter. This graph shows the compromise free energy makes between energy and entropy. When the threshold is low, the energetic term wins and the microstates with the lowest energy (i.e., those in the <em>bonded</em> macrostate) are strongly favored ($q_{1} \simeq 1$). However, when the threshold gets very large, the entropic term wins and the solution to the minimization problem approaches the uniform distribution on microstates. In that case, since there are more microstates in the <em>not bonded</em> macrostate, the system behaves like a biased coin with $q_0 \simeq \frac{N^2}{N^2 + N}$ and $q_1 \simeq \frac{N}{N^2 + N}$.</p>
<p>In <a href="https://github.com/tcosmo/tcosmo.github.io/tree/master/assets/free_energy/FreeEnergyMinimization.ipynb">the notebook</a>, you can explore the effect of other parameters on the overall result. You can for instance try modifying $E_{1}=E_{\text{bonded}}$ or $N$.</p>
<p>For instance, if you take $N=10$ instead of $N=100$, the difference between $\frac{N^2}{N^2 + N}$ and $\frac{N}{N^2 + N}$ becomes less important, hence the <em>not bonded</em> macrostate is less favored at large $\tau$ than when $N=100$:</p>
<div class="imgcap">
<div>
<img width="80%" src="/assets/free_energy/graph2.png" alt="free energy" />
</div>
<div class="thecap">Figure 4. Compromise between energy and entropy in the 1D-world with $N=10$</div>
</div>
<p><br /></p>
<p>The melting point, where the two curves meet in Figures 3 and 4, plays an important role in wet-lab experiments since, as the next section will show, it allows one to determine the value of $E_1-E_0$ (which is $E_1$ if we set $E_0=0$ as a reference value).</p>
<h2 id="analytical-solution-to-free-energy-minimization-boltzmann-distribution">Analytical Solution to Free Energy Minimization: Boltzmann Distribution</h2>
<p>In the last section, we gave a numerical solution to the problem of free energy minimization, which is: given a valuation $v$, find the microstates distribution $p_X$ which minimizes $G(X,v)$. In fact, thanks to the theory of <a href="https://en.wikipedia.org/wiki/Lagrange_multiplier">Lagrange multipliers</a>, this problem admits an analytical solution.</p>
<p>Indeed, the solution is uniquely given by:</p>
\[p_i = \frac{1}{Z} e^{-v(S_i)/\tau}\]
<p>With $Z=\sum_{i} e^{-v(S_i)/\tau}$ a normalization factor, also called the <em>partition function</em>. If you are interested in how to solve this problem with Lagrange multipliers, please see <a href="/assets/free_energy/solution.jpg">this</a> (thanks to Scott Pesme!).</p>
<p>Also note that, as we assumed in the experimental section, microstates with the same energy have the same probability.</p>
<p>Finally, in the case of our grid world, and according to the Boltzmann distribution, the probabilities of the macrostates <em>not bonded</em> and <em>bonded</em> are given by:</p>
\[\begin{align*}
q_0 &= \sum_{i=1}^{N^2}p_i = \frac{N^2}{Z}e^{-E_0/\tau}\\
q_1 &= \sum_{i=N^2+1}^{N^2+N}p_i = \frac{N}{Z}e^{-E_1/\tau}
\end{align*}\]
<p>The curves of Figures 3 and 4 should thus match the above expressions given by the Boltzmann distribution.</p>
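<p>This match can also be verified numerically. The following sanity check of mine (not part of the original notebook) compares the closed-form macrostate probabilities with the output of the minimization routine at one threshold value:</p>

```python
import numpy as np
import scipy.optimize as opt
from scipy.optimize import LinearConstraint
from scipy.special import xlogy

N, E0, E1, tau = 100, 0.0, -1000.0, 300.0

# Boltzmann closed form for the two macrostates (q0, q1)
weights = np.array([N**2 * np.exp(-E0 / tau), N * np.exp(-E1 / tau)])
q_boltzmann = weights / weights.sum()

# Direct numerical minimization of G over (q0, q1) with q0 + q1 = 1
def free_energy(q):
    H = -(xlogy(q[0], q[0] / N**2) + xlogy(q[1], q[1] / N))
    return q.dot([E0, E1]) - tau * H

res = opt.minimize(free_energy, x0=np.array([0.5, 0.5]),
                   bounds=[(0.0, 1.0), (0.0, 1.0)],
                   constraints=LinearConstraint(np.array([1.0, 1.0]), 1.0, 1.0))
assert np.allclose(res.x, q_boltzmann, atol=1e-3)
```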
<p>The melting threshold $\tau^{\star}$, at which $q_0=q_1=0.5$, is interesting because we get:</p>
\[\frac{N^2}{Z}e^{-E_0/\tau^{\star}} = \frac{N}{Z}e^{-E_1/\tau^{\star}} \Leftrightarrow E_1 - E_0 = \tau^{\star}\text{log}(\frac{1}{N})\]
<p>In a physical system, the threshold $\tau$ is in fact $k_{B}T$. Thus, if our toy model were physically meaningful for the hybridization of some particles, and if we had an experimental way to determine the melting temperature $T^{\star}$, we could evaluate the energetics of the <em>bonded</em> state (taking $E_0=0$ as a reference value) by:</p>
\[E_{\text{bonded}} = E_1 = k_{B}T^{\star}\text{log}(\frac{1}{N})\]
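<p>Numerically, with hypothetical values of $T^{\star}$ and $N$ chosen by me purely for illustration:</p>

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, in J/K
T_star = 330.0          # hypothetical measured melting temperature, in K
N = 100                 # size of the toy world

E_bonded = k_B * T_star * math.log(1 / N)
assert E_bonded < 0     # bonding lowers the energy, as expected
print(E_bonded)         # about -2.1e-20 J
```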
<p>In a real-world experiment, $N$ would be replaced by some equivalent volumetric quantity.</p>
<h1>About Undecidability in Logic</h1>
<p><em>2019-05-23</em></p>
<p>The concept of undecidability in logic is often seen as something mysterious or even mystical: a statement (on natural numbers for instance) could be true, false or… undecidable. The goal of this post is to ground the notion of undecidability and to show that it is in fact a very natural concept. We will illustrate it on a very rudimentary system of axioms: that of <a href="https://en.wikipedia.org/wiki/Monoid">monoids</a>.</p>
<p>We hope that after reading this post, the reader will be convinced that undecidability is not an alternative to truth (a statement is either true or false in a given model) but that undecidability is just a mirror of the weakness of a system of axioms.</p>
<p>A big part of the mystery surrounding undecidability perhaps comes from the fact that it plays a major role in Gödel’s incompleteness theorems, and that these theorems are often seen as mystical, infinitely deep results. Gödel’s incompleteness theorems are not our subject today, but we will, at the end of this post, give a brief description of how they relate to the concept of undecidability.</p>
<p>The take-home message of this blog post is:</p>
<p style="text-align: center;">
Undecidability is a property of a logical statement with respect to a set of axioms.
</p>
<h1 id="monoïds">Monoids</h1>
<p>Monoids are objects containing elements which can interact with each other thanks to a composition law. Take two elements $x,y$ of a monoid $\mathcal{M}$; you can construct $z=x\cdot y$, another element of $\mathcal{M}$, which is called the <em>composition</em> of $x$ and $y$. A monoid also has a special element denoted by $e$, called the “neutral” element.</p>
<p>To be a monoid, $\mathcal{M}$ must satisfy the two following axioms:</p>
<ol>
<li>$ \forall x\forall y \forall z,\; x\cdot(y\cdot z) = (x\cdot y )\cdot z $</li>
<li>$ \forall x, \; x\cdot e = e\cdot x = x $</li>
</ol>
<p>The first axiom specifies that parentheses don’t matter when you compose elements with each other, $x\cdot(y\cdot z) = (x\cdot y )\cdot z = x\cdot y \cdot z$, while the second one justifies why $e$ is called “neutral”: composing an element $x$ with $e$ won’t affect $x$.</p>
<p>The definition of monoïds is very abstract. As a consequence, loads of different objects are monoïds. For instance, the set of natural numbers is a monoïd! Take $\mathbb{N}=\{0,1,\dots\}$, use addition as the composition law and 0 as the neutral element. Then both axioms 1 and 2 are satisfied:</p>
<ol>
<li>$\forall x,y,z \in \mathbb{N},\; x+(y+z) = (x+y)+z = x + y + z$</li>
<li>$\forall x \in \mathbb{N},\; x + 0 = 0 + x = x$</li>
</ol>
<p>Weirder objects are monoïds. Consider $\mathcal{A}$, the set of words you can make with the letters “a” and “b”. You can make:</p>
<ol>
<li>The empty word $\epsilon$ (you write nothing)</li>
<li>The word “a”</li>
<li>The word “ab”</li>
<li>The word “aba”</li>
<li>etc…</li>
</ol>
<p>The composition operation becomes string concatenation: $\text{abba}\cdot \text{aab} = \text{abbaaab}$.</p>
<p>This object $\mathcal{A}$ together with the law $\cdot$ meets all the requirements to be a monoïd. The neutral element is the empty word: for instance, $\text{aaab}\cdot \epsilon = \text{aaab}$ (concatenating with nothing does not change the string), and parentheses don’t matter when concatenating. However, this object looks quite different from the first monoïd we saw, the set of natural numbers $\mathbb{N}$!</p>
<p>In other words, the very minimal set of axioms that monoïds satisfy allows for the construction of a wide variety of different-looking objects. This set of axioms has significant expressive power because it can shape such different objects.</p>
<p>Undecidability will arise from the fact that some of these objects will have properties that the others don’t have.</p>
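As a concrete aside (not part of the original argument), both monoïd axioms can be checked mechanically on a sample of words of $\mathcal{A}$; here is a minimal, purely illustrative Python sketch:

```python
import itertools

# The monoïd A of words over {"a", "b"}: composition is string
# concatenation and the neutral element is the empty word.
e = ""

def compose(x, y):
    return x + y

words = ["", "a", "b", "ab", "aba", "abba"]

# Axiom 1: associativity, checked on all triples from the sample.
for x, y, z in itertools.product(words, repeat=3):
    assert compose(x, compose(y, z)) == compose(compose(x, y), z)

# Axiom 2: the empty word is neutral.
for x in words:
    assert compose(x, e) == x and compose(e, x) == x

print("both monoïd axioms hold on the sample")
```

Of course, a finite check proves nothing in general; here associativity and neutrality happen to hold for all strings, so the sample is only for illustration.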
<h1 id="groups">Groups</h1>
<p>Let’s consider the following statement, called the “inverse property”:</p>
\[\forall x\, \exists y,\; x\cdot y = y\cdot x = e\]
<p>The element $y$ is called the “inverse” of $x$.</p>
<p>The question that arises is: is this statement satisfied by all monoïds?</p>
<p>The answer is NO! Indeed, the two examples of monoïds we gave don’t have that property:</p>
<ol>
<li>Take any strictly positive $x\in\mathbb{N}$: there is no element $y\in\mathbb{N}$ such that $x+y=0$</li>
<li>Take a non-empty string like “aabaa”: there is no string you can concatenate to it in order to get back to the empty string $\epsilon$.</li>
</ol>
<p>However, some monoïds do have the “inverse property”! For instance, in the set of negative and positive integers $\mathbb{Z}$ we have:</p>
\[\forall x \in \mathbb{Z},\; x + (-x) = 0\]
<p>Monoïds which have the “inverse property” are called <a href="https://en.wikipedia.org/wiki/Group_(mathematics)">Groups</a>.</p>
<p>What we just saw is that any group is a monoïd but not all monoïds are groups!</p>
<p>For that reason, the sentence $ \forall x\, \exists y\,\, x\cdot y = y\cdot x = e$ is <strong>undecidable</strong> under the axioms of monoïds. The axioms of monoïds are not <em>strong enough</em> to enforce this property in all the objects that satisfy them.</p>
<p>Finally, keep in mind that in a given model of the axioms of monoïds (a model is an object satisfying the axioms), the “inverse property”, as any property, is either true or false.</p>
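To make the contrast concrete, here is a small brute-force Python sketch (illustrative only, not from the original post) that looks for additive inverses in finite samples of $\mathbb{Z}$ and $\mathbb{N}$:

```python
def has_inverse_in(x, elements, compose, e):
    """True if some y in `elements` is a two-sided inverse of x."""
    return any(compose(x, y) == e and compose(y, x) == e for y in elements)

def add(x, y):
    return x + y

Z_sample = range(-100, 101)   # a finite window of Z
N_sample = range(0, 101)      # a finite window of N

# In Z, every sampled x has its inverse -x available in the window...
assert all(has_inverse_in(x, Z_sample, add, 0) for x in range(-50, 51))

# ...but in N, no strictly positive x has an inverse: x + y > 0 always.
assert not any(has_inverse_in(x, N_sample, add, 0) for x in range(1, 101))

print("Z behaves like a group on this sample; N does not")
```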
<h1 id="wrapping-up">Wrapping Up</h1>
<p>From the monoïds vs. groups example, we understand that undecidability is a very natural concept. A sentence is undecidable with respect to a system of axioms if not all models of these axioms satisfy it. Put differently, a property is undecidable with respect to a system of axioms when this system of axioms is <strong>too weak</strong> to logically imply this property. From the axioms of monoïds you cannot logically deduce the inverse property!</p>
<p>Undecidability outlines the degrees of freedom that a system of axioms gives you. An undecidable property can live in harmony (i.e. without causing any logical contradiction) with the system of axioms as well as the negation of that property: we are free to have either.</p>
<p>Undecidability is not at all an alternative to being true or false, it is a property of a logical sentence with respect to a given system of axioms. In a specific model of these axioms, a property is always either true or false. However, if the property is undecidable, we will find models of the axioms where the property is true and models where the property is false.</p>
<p>As a side remark, note that characterizing all the objects that satisfy a system of axioms is not an easy task. Consider the axioms of groups (the axioms of monoïds plus the inverse property). It was a major undertaking of 20th-century mathematics merely to describe the finite <em>simple</em> groups, the building blocks of all finite groups. This description resulted in the <a href="https://en.wikipedia.org/wiki/Classification_of_finite_simple_groups">classification of finite simple groups</a>, whose proof is tens of thousands of pages long.</p>
<h1 id="link-with-gödel-">Link With Gödel ?</h1>
<p>Gödel’s theorems deal with a specific set of axioms: the axioms of arithmetic, called the <a href="https://en.wikipedia.org/wiki/Peano_axioms">Peano Axioms</a>. These axioms formalize the concept of numbers and the operations we are used to performing on them: addition and multiplication.</p>
<p>These theorems construct an undecidable property within the axioms of arithmetic. It was easy for us to exhibit an undecidable property when looking at monoïds, but it is much trickier when looking at arithmetic. More powerfully, Gödel’s incompleteness theorems show that you will always be able to construct undecidable properties in systems of axioms which “embed” arithmetic.</p>
<p>In other words, if you have a system of axioms which allows you to do something as elementary as counting, you will be able to construct undecidable properties with respect to this system of axioms.</p>Demystifying the concept of undecidability through monoïds and groups.About Shannon’s Entropy2019-04-21T18:05:37+00:002019-04-21T18:05:37+00:00/2019/04/21/shannon-entropy<h2 id="introduction">Introduction</h2>
<p>The word and the concept of <strong>entropy</strong> might be familiar to you. It is usually associated with disorder, chaos, randomness, unpredictability.
It originally came from physics, where it was introduced in 1865 with the theory of thermodynamics. It was revisited in 1948 by the computer scientist <a href="https://en.wikipedia.org/wiki/Claude_Shannon">Claude Shannon</a>,
who reinterpreted it in terms of <strong>lack of information</strong>. This post aims to present this <strong>informational</strong> point
of view on entropy.</p>
<p>Entropy is at the root of Shannon’s <a href="https://en.wikipedia.org/wiki/Information_theory">Information Theory</a>. You might never have heard of Information Theory but you use it every day!! Indeed, among others:</p>
<ul>
<li>It provides the mathematical tools for <a href="https://en.wikipedia.org/wiki/Data_compression">Data Compression</a>, used in practice by <em>zip</em>, <em>jpeg</em>, <em>mp3</em>, <em>mp4</em>, etc. Huge internet infrastructures such as YouTube or
Facebook use them extensively to store data.</li>
<li>It’s the source of <a href="https://en.wikipedia.org/wiki/Error_detection_and_correction">Error-Correcting Codes</a>, which make it possible to communicate data over channels that may lose or corrupt it along the way. The Internet’s efficiency deeply depends on them, as its communication channels are highly unreliable.</li>
</ul>
<p>The first part of this post is purely informal and requires no particular background. In a second part we present a minimal formalization that leads to the construction of Shannon’s entropy.</p>
<h2 id="an-intuitive-approach-to-shannons-entropy">An Intuitive Approach to Shannon’s Entropy</h2>
<h3 id="random-phenomenons">Random phenomena</h3>
<p>Entropy is a property of <em>random phenomena</em>. More precisely, entropy is a property of the probability distribution of a random phenomenon.</p>
<p>A random phenomenon is any event whose outcome can be described by a probability distribution. For instance, the roll of a fair die can be seen as a random phenomenon with six possible outcomes (from “one” to “six”) and probability distribution $(1/6,1/6,1/6,1/6,1/6,1/6)$: each outcome is as likely to happen as any other. If I bias my die so that the outcome “one” shows up more frequently and neither “five” nor “six” ever shows up, I could end up with the following distribution: $(3/6,1/6,1/6,1/6,0,0)$.</p>
<p>In our case, a probability distribution is a finite set of non-negative numbers summing to one, we can think of them as percentages.</p>
<h3 id="weather-forecasts">Weather Forecasts</h3>
<div class="imgcap">
<div>
<img src="/assets/H/forecast.png" style=" height:200px;" />
<img src="/assets/H/forecast_equi.png" style=" height:200px;" />
</div>
<div class="thecap">Two possible forecasts for tomorrow's weather</div>
</div>
<p><br /></p>
<p>Another example of a random phenomenon is tomorrow’s weather: let’s say it’s either rainy or sunny. Now consider the two forecasts above. If you were to plan a hiking trip with your friends, which of the two forecasts would give you the most information?</p>
<p>Very certainly the first one: you know you can go safely on your trip without taking any umbrella since it is so unlikely to rain. Similarly, if the forecast was 99% rainy, 1% sunny, you would know to definitely take an umbrella with you.</p>
<p>However, the second forecast is of very little help in planning your trip. This forecast gives you the highest uncertainty about tomorrow’s weather: you have no idea whether it’s going to rain or not!</p>
<p>We could imagine a third forecast, for instance 30% sunny and 70% rainy, which does not convey as much information as the first forecast but which still gives you better reason to take an umbrella than the second forecast does.</p>
<h3 id="the-intuition-behind-entropy">The intuition behind entropy</h3>
<p>Entropy is exactly the metric which encompasses the above intuition. A probability distribution (a weather forecast in the above example) will have a high entropy if it gives you very little information about the outcome of the underlying random phenomenon. Conversely, it will have a low entropy if it pins the outcome down more tightly.</p>
<p>In that sense, <strong>entropy measures the lack of information</strong> or <strong>uncertainty</strong> conveyed by a probability distribution on the outcome of a random phenomenon. To follow this intuition, the entropy of a random phenomenon with $k$ outcomes will be:</p>
<ul>
<li>Maximal if the distribution is <strong>uniform</strong>: $(1/k,1/k, \dots, 1/k)$. For instance, a fair coin or die.</li>
<li>Minimal if the distribution is <strong>deterministic</strong>, that is, has one entry equal to one and all the others equal to zero. For instance, $(1,0,0,0,0,0)$ is the probability distribution of a die which has been biased to always output “one”.</li>
</ul>
<p>Note that we expect entropy to be symmetric. Indeed, a forecast of 80% rainy, 20% sunny carries as much information as a forecast of 20% rainy, 80% sunny.</p>
<h2 id="mathematical-formalism">Mathematical formalism</h2>
<h3 id="shannon-entropy-formula">Shannon entropy formula</h3>
<p>Let $X$ be the random variable associated to a given random phenomenon (for instance $X$ is the variable “tomorrow’s weather”), and $p_X=(p_1,\dots,p_k)$ its distribution. Shannon’s entropy $H(X)$, which satisfies the intuition developed in the section above, is given by:</p>
\[H(X) = \sum_{i=1}^k p_i \text{log}(1/p_i) = -\sum_{i=1}^k p_i \text{log}(p_i) \label{eq:shannon}\]
<p>Computer scientists will tend to choose the log in base two while physicists may prefer to use the log in base $e$. This choice changes nothing with respect to the intuitive interpretation of entropy. Indeed we have:</p>
\[0 \leq H(X) \leq \text{log}(k)\]
<p>With $H(X) = 0$ if and only if $p_X$ is deterministic ($p_X$ of the form $(0,\dots,1,\dots,0)$) and $H(X) = \text{log}(k)$ if and only if $p_X$ is uniform $p_X=(1/k,\dots,1/k)$. Furthermore, the expression of Shannon’s entropy exhibits the symmetry we were expecting: take any permutation of $p_X$, you will get the same entropy.</p>
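These bounds and the symmetry are easy to check numerically. Here is a minimal Python implementation of the formula (base-2 logarithms; an illustrative sketch, not from the original post):

```python
import math

def shannon_entropy(p, base=2):
    """Shannon entropy of a distribution p = (p_1, ..., p_k).

    Terms with p_i = 0 are skipped, since p * log(1/p) -> 0 as p -> 0.
    """
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to one"
    return sum(pi * math.log(1.0 / pi, base) for pi in p if pi > 0)

k = 6
uniform = [1.0 / k] * k              # a fair die: maximal entropy
deterministic = [1, 0, 0, 0, 0, 0]   # always "one": minimal entropy
biased = [3 / 6, 1 / 6, 1 / 6, 1 / 6, 0, 0]

print(shannon_entropy(uniform))        # log2(6), about 2.585
print(shannon_entropy(deterministic))  # 0.0

# Symmetry: permuting the distribution leaves the entropy unchanged.
assert abs(shannon_entropy(biased) - shannon_entropy(biased[::-1])) < 1e-12
```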
<div class="imgcap">
<div>
<img src="/assets/H/Figure_1.png" />
</div>
<div class="thecap">Shannon's entropy curve for the weather forecast problem</div>
</div>
<p><br /></p>
<p>In the above graph, we plot $H(X)$ when there are only two outcomes, as in the weather forecast problem: $p_X = (p_\text{sunny},p_\text{rainy}) = (p_1,p_2)$. Because $p_2 = 1-p_1$, we can plot $H$ as a function of $p_1$ alone. We notice the maximum at $p_1 = p_2 = 0.5$ and the minima at $(p_1,p_2) = (0,1)$ and $(p_1,p_2) = (1,0)$, as expected. We also clearly see the symmetry in $p_1,p_2$.</p>
<h3 id="axiomatic-fundations">Axiomatic foundations</h3>
<p>We can legitimately ask the following question: where does the expression of Shannon’s entropy ($\ref{eq:shannon}$) come from? More formally, from which axioms can this expression be uniquely derived?</p>
<p>As highlighted in the different <a href="#ref">references</a> at the end of this article, the answer to this question is not unique. We find in the literature at least three different axiomatic approaches to the definition of entropy. None of them matches our intuition in a completely straightforward way; the approach which comes closest might be Khinchin’s (Khinchin, 1957).</p>
<p>Khinchin gives four axioms to define $H$. Here are the first three, which are really close to our intuition about $H$:</p>
<ul>
<li>$H(X)$ should only depend on $p_X$</li>
<li>If $X$ is a random variable with $k$ outcomes, $H(X)$ should be maximal when $p_X$ is uniform, that is when $p_X=(1/k,\dots,1/k)$</li>
<li>If $X$ is such that $p_X = (p_1,\dots,p_k)$ and $Y$ such that $p_Y=(p_1,\dots,p_k,0,0,\dots,0)$ then $H(X) = H(Y)$</li>
</ul>
<p>The fourth axiom would require too many details to be fully covered. We are going to give a weaker version which can more easily match our intuition. For the full details, refer to this <a href="/assets/H/lecture-06a.pdf">course</a> (<a href="/assets/H/lecture-06a.pdf">[1]</a>). The weaker version states that:</p>
<ul>
<li>If $X$ and $Y$ are two independent random variables then the entropy of their product $(X,Y)$ is given by:</li>
</ul>
\[H(X,Y) = H(X)+H(Y)\label{eq:sum}\]
<p>Intuitively, this states that if $X$ and $Y$ don’t share any information (knowing $X$ doesn’t give you any information on $Y$, and conversely) then their uncertainties add up when you pair them together.</p>
<p>For instance, take $X$ tomorrow’s weather and $Y$ the outcome of a 6-faced die you have on your desk. It seems reasonable to say that $X$ and $Y$ are independent of each other. Then the variable $(X,Y)$ represents all the possible outcomes of this pair of phenomena: (sunny, “one”), (sunny, “two”), (sunny, “three”), …, (rainy, “five”), (rainy, “six”). What equation $\ref{eq:sum}$ states is that the uncertainty of this pair of phenomena is the sum of the uncertainties of the two individual phenomena.</p>
<p>Why not. This requirement seems pretty reasonable. At the very least, our uncertainty about the pair of phenomena depends on both of them and is at least as large as either individual uncertainty. Also, the transformation of a product into a sum guides our intuition as to why the $\text{log}$ function is involved in the expression of Shannon’s entropy.</p>
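This additivity can be checked numerically on the weather-and-die example. Here is a short Python sketch (base-2 entropy, illustrative only):

```python
import math

def H(p):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

weather = [0.99, 0.01]   # tomorrow's weather: sunny / rainy
die = [1 / 6] * 6        # an independent fair six-faced die

# By independence, the pair (X, Y) has the outer-product distribution
# over its 2 * 6 = 12 outcomes: (sunny, "one"), ..., (rainy, "six").
joint = [pw * pd for pw in weather for pd in die]

# Uncertainties add up: H(X, Y) = H(X) + H(Y).
assert abs(H(joint) - (H(weather) + H(die))) < 1e-9
print(round(H(weather), 3), round(H(die), 3), round(H(joint), 3))
```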
<p>However, as mentioned earlier, this fourth axiom is too weak to lead uniquely to Shannon’s entropy. It gives a broader class of functions known as <em>Rényi entropies</em>:</p>
\[H_{\alpha}(X) = \frac{1}{1-\alpha}\text{log}\sum_{i=1}^k p_i^\alpha\]
<p>With $\alpha \geq 0$. Shannon’s entropy is recovered when $\alpha = 1$: the limit of the above expression in $\alpha=1$ exists and is Shannon’s entropy.</p>
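The convergence to Shannon’s entropy as $\alpha \to 1$ can be observed numerically. A short Python sketch (natural logarithms, illustrative only):

```python
import math

def shannon(p):
    """Shannon entropy with natural logarithms (the alpha = 1 case)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi(p, alpha):
    """Rényi entropy H_alpha, for alpha >= 0 and alpha != 1."""
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

p = [0.5, 0.3, 0.2]
# The gap |H_alpha(X) - H(X)| shrinks as alpha approaches 1.
for alpha in (0.5, 0.9, 0.99, 0.999):
    print(alpha, abs(renyi(p, alpha) - shannon(p)))
```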
<h2 id="wrapping-up">Wrapping up</h2>
<p>A random phenomenon, through its probability distribution, has an intrinsic property: entropy. Entropy measures the lack of knowledge (predictability) we have about this random phenomenon. It is a measure of <strong>uncertainty</strong>.</p>
<p>Different formalisms all lead to the definition of Shannon’s entropy, which matches our intuition about the principal characteristics of entropy:</p>
<ul>
<li>Entropy is maximal when the distribution is uniform, minimal when the distribution is deterministic</li>
<li>Entropy is symmetric with respect to the probability distribution</li>
<li>Entropy grows when two independent phenomena are considered together</li>
</ul>
<p>Shannon’s entropy naturally arises when notions such as optimal compression or communication over a noisy channel are considered. It is at the root of <em>Information Theory</em>, which is a crucial element of our understanding of communication processes from both the theoretical and the practical point of view. Information theory introduces other very interesting quantities such as the <em>Kullback-Leibler Divergence</em>, <em>Mutual Information</em> or <em>Conditional Entropy</em>.</p>
<p>We invite the curious reader to read an Information Theory course :) !</p>
<h3 id="what-about-physics">What about Physics?</h3>
<p>Entropy is an important quantity in physics and chemistry, for instance in thermodynamics or statistical physics. So what is the link between this information-theoretic point of view and physics?</p>
<p>Ask a physicist!!! (or look at our <a href="/2019/07/18/free-energy.html">post on Free Energy</a>.. )</p>
<p><a name="ref"></a></p>
<h2 id="references">References</h2>
<p><a href="/assets/H/lecture-06a.pdf">[1]</a>, original link: <a href="https://www.stat.cmu.edu/~cshalizi/350/2008/lectures/06a/lecture-06a.pdf">https://www.stat.cmu.edu/~cshalizi/350/2008/lectures/06a/lecture-06a.pdf</a></p>
<p><a href="/assets/H/0511171.pdf">[2]</a>, original link: <a href="https://arxiv.org/pdf/quant-ph/0511171.pdf">https://arxiv.org/pdf/quant-ph/0511171.pdf</a></p>
<p><a href="Lesson1_h.pdf">[3]</a>, original link: <a href="http://www.cs.tau.ac.il/~iftachh/Courses/Info/Fall15/Printouts/Lesson1_h.pdf">http://www.cs.tau.ac.il/~iftachh/Courses/Info/Fall15/Printouts/Lesson1_h.pdf</a></p>Some intuition on entropy from the Information Theory point of view.Differentiating Primes2018-01-18T16:13:46+00:002018-01-18T16:13:46+00:00/2018/01/18/diff-primes<ul id="markdown-toc">
<li><a href="#warning" id="markdown-toc-warning">Warning</a></li>
<li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
<li><a href="#experiments" id="markdown-toc-experiments">Experiments</a> <ul>
<li><a href="#pertubing-primes" id="markdown-toc-pertubing-primes">Perturbing primes</a></li>
<li><a href="#taking-nlnn" id="markdown-toc-taking-nlnn">Taking $n\ln(n)$</a></li>
<li><a href="#taking-ntann" id="markdown-toc-taking-ntann">Taking $n\tan(n)$</a></li>
<li><a href="#back-to-primes-from-order-to-chaos-" id="markdown-toc-back-to-primes-from-order-to-chaos-">Back to primes: from order to chaos ?</a></li>
</ul>
</li>
</ul>
<h2 id="warning">Warning</h2>
<p>Note that all the experimental results shown in this post might be biased by numerical errors; they all still need to be double-checked. You can find the Python code that generated them here: <a href="/assets/primes/diff_primes.py">Code</a>.</p>
<h2 id="introduction">Introduction</h2>
<p>
We play the following game: you are given a finite sequence of numbers and the goal is to predict the next one.
For instance, let's say we take:
<center>
0, 1, 4, 9, 16, 25, ?
</center>
<br />
If you are familiar with maths you would certainly come up with 36. <br />
If not, here's a way we could proceed: through iterated differentiations.
The idea is the following: we subtract each term from the following one and do it again on the new sequence. Here we get: <br />
<center>
<br />
0, 1, 4, 9, 16, 25<br />
1, 3, 5, 7, 9<br />
2, 2, 2, 2<br />
0, 0, 0<br />
0, 0<br />
0
</center> <br />
From this pyramidal construction we can make the probable hypothesis that the third line will always be constant, equal to 2 and propagate
from there in order to decide the next term in the first line:
<center>
<br />
0, 1, 4, 9, 16, 25, <span style="color:red">36</span><br />
1, 3, 5, 7, 9, <span style="color:red">11</span><br />
2, 2, 2, 2, <span style="color:red">2</span><br />
</center> <br />
Which is consistent with the first mathematical guess that was based on the recognition of the sequence of squares:
<center>
<br />
$ U_n = n^2 $
</center>
<br />
Formally, our method through differentiation was to repeatedly construct $V_n = U_{n+1}-U_n$, setting $U=V$ at each stage. Even more formally, we have constructed a sequence of sequences $W$ such that:
<center>
<br />
$W^0 = U$ <br />
and <br />
$W^{k+1}_{n} = W^{k}_{n+1}-W^{k}_{n} = D(W^k)$
<br />
</center>
With $D$ the differentiation operator on sequences. <br /><br />
Since we have only a finite number of elements of $U$ (the sequence from which we want to guess the next element), note that we "lose" one term at each iteration of the differentiation and thus end up with a pyramidal structure. <br />
</p>
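The pyramid above is mechanical to compute. Here is a minimal Python sketch of the operator $D$ and its iterates (illustrative only; the post's actual experiment code is linked in the Warning section):

```python
def diff(seq):
    """The difference operator D: (D u)_n = u_{n+1} - u_n."""
    return [b - a for a, b in zip(seq, seq[1:])]

u = [0, 1, 4, 9, 16, 25]      # the squares example from above
pyramid = [u]
while len(pyramid[-1]) > 1:   # we lose one term per iteration
    pyramid.append(diff(pyramid[-1]))

for row in pyramid:
    print(row)
# Once a row is constant ([2, 2, 2, 2] here), extend it and propagate
# back up to predict the next term of the original sequence: 36.
```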
<p><br /></p>
<p>
In this example we were lucky because we could guess the next number even without using this method of differentiation. This is because the sequence of squares is well known. To reinforce the idea that this method can be generalized, let's apply it to the following $U$:
<center>
13, 10, 5, 4, 13, 38, ?
</center>
<br />
We get:
<center>
<br />
$W^0:$ 13, 10, 5, 4, 13, 38<br />
$W^1:$ -3, -5, -1, 9, 25<br />
$W^2:$ -2, 4, 10, 16<br />
$W^3:$ 6, 6, 6<br />
$W^4:$ 0, 0<br />
$W^5:$ 0
</center> <br />
By propagation we guess the next number:
<center>
<br />
$W^0:$ 13, 10, 5, 4, 13, 38, <span style="color:red">85</span><br />
$W^1:$ -3, -5, -1, 9, 25, <span style="color:red">47</span><br />
$W^2:$ -2, 4, 10, 16, <span style="color:red">22</span><br />
$W^3:$ 6, 6, 6, <span style="color:red">6</span>
</center> <br />
Which is consistent with the fact that the underlying formula we chose to generate this sequence was:
<center>
<br />
$ U_n = n^3 - 4n^2 + 13$
</center>
<br />
Which would have been much harder to recognize than $U_n=n^2$!
</p>
<p><br /><br />
OK, so this method of differentiation gave us a tool to predict the underlying structure of our example sequences. <br />
When thinking of integer sequences, there is one whose structure is very mysterious: <strong>prime numbers</strong>:</p>
<center>
$(p_n):$ 2, 3, 5, 7, 11, ...
<br />
<br />
</center>
<p>Why not do the same, that is, take $W^0 = (p_n)$ and see what happens? This time, however, we’re not going to do it by hand but program it. Here’s what we obtain:</p>
<center>
<div class="imgcap">
<video width="70%" controls="">
<source type="video/mp4" src="/assets/primes/videos/primes.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Iterated differentiations of the primes below $10^4$</div>
</div>
<br />
</center>
<p>This video was made by successively plotting, for each $k$, $n \mapsto W^{k}_{n}$. <br /><br />
Isn’t it <strong>super strange</strong>? <br /><br />
This blog post aims at compiling experiments around this iterated differentiation idea and at making a formal link with cellular automata; we neither prove nor conjecture anything. <br />
We found very little literature on the subject, please <strong>feel free to add some in the comment section if these plots ring a bell</strong>.
<br /><br /></p>
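To reproduce the first rows of the pyramid for the primes, here is a self-contained Python sketch (a basic sieve plus the difference operator; the post's actual experiment code is linked in the Warning section above):

```python
def primes_below(n):
    """All primes < n, via a simple sieve of Eratosthenes."""
    sieve = [True] * n
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def diff(seq):
    """(D u)_n = u_{n+1} - u_n."""
    return [b - a for a, b in zip(seq, seq[1:])]

w = primes_below(30)   # W^0 = (p_n), truncated for display
for k in range(4):
    print(k, w)
    w = diff(w)
```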
<h1 id="experiments">Experiments</h1>
<h2 id="pertubing-primes">Perturbing primes</h2>
<p>The first question that came to our mind after seeing the video shown in the introduction was: is this phenomenon characteristic of prime numbers? <br />
Without any experiments we can already say: <strong>no</strong>, because translating the primes by a constant, for instance $p’_n=p_n+53$, won’t perturb the differentiations. However, this perturbation is quite mild and not very harmful to the structure of the primes. Let’s perturb them quite a lot.
We add to each prime below $10^4$ a different random number between $-1000$ and $1000$ and sort the obtained sequence. We take this as our $W^0$. Here’s what we obtain:</p>
<center>
<div class="imgcap">
<video width="70%" controls="">
<source type="video/mp4" src="/assets/primes/videos/primes2.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Iterated differentiations of sorted primes below $10^4$ with random perturbations</div>
</div>
<br />
</center>
<p>Which is quite a similar behavior! <br />
<br />
OK, so these fancy shapes are certainly not specific to the primes.
<br />
In this perturbative spirit, let’s try something even harsher: we similarly add random numbers to each prime but <strong>do not sort</strong> the sequence after this operation. That is, $W^0$ is not increasing at all and has random fluctuations. Here’s what we get:</p>
<center>
<div class="imgcap">
<video width="70%" controls="">
<source type="video/mp4" src="/assets/primes/videos/primes3.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Iterated differentiations of unsorted primes below $10^4$ with random perturbations</div>
</div>
<br />
</center>
<p>Again, these shapes.<br />
So for which other sequences should we expect these shapes?</p>
<h2 id="taking-nlnn">Taking $n\ln(n)$</h2>
<p>The prime number theorem states that $p_n \sim n\ln(n)$, so it would be natural to see this kind of behavior with $W^0_n = n\ln(n)$. <br />
It gives:</p>
<center>
<div class="imgcap">
<video width="70%" controls="">
<source type="video/mp4" src="/assets/primes/videos/nln.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Iterated differentiations of $W^0_n = n\ln(n)$</div>
</div>
<br />
</center>
<p>To stay with integers only, what happens with $W^0_n = \text{int}(n\ln(n))$? The following:</p>
<center>
<div class="imgcap">
<video width="70%" controls="">
<source type="video/mp4" src="/assets/primes/videos/nlnent.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Iterated differentiations of $W^0_n = \text{int}(n\ln(n))$</div>
</div>
<br />
</center>
<p>The behavior seems much more geometric, at least for the earlier derivatives (low $k$).</p>
<h2 id="taking-ntann">Taking $n\tan(n)$</h2>
<p>We thought of $n\ln(n)$ because it was somehow related to the primes. What if we take something that does not look especially related?
For instance $W^0_n = n\tan(n)$.</p>
<center>
<div class="imgcap">
<video width="70%" controls="">
<source type="video/mp4" src="/assets/primes/videos/ntan.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Iterated differentiations of $W^0_n = n\tan(n)$</div>
</div>
<br />
</center>
<h2 id="back-to-primes-from-order-to-chaos-">Back to primes: from order to chaos ?</h2>
<p>Let’s try something weird on our primes. Let’s take \(W^0_n = p_{p_n}\), the sequence of primes indexed by primes.
In order to have enough data, we extend $(p_n)$ to all the primes below $10^5$. It gives:</p>
<center>
<div class="imgcap">
<video width="70%" controls="">
<source type="video/mp4" src="/assets/primes/videos/chaos.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Iterated differentiations of $W^0_n = p_{p_n}$</div>
</div>
<br />
</center>
<p>At the beginning everything seems “as usual”. But suddenly it breaks and it looks very chaotic…</p>What happens if you iterate the discrete differentiation operator on the sequence of prime numbers?Having Fun with Self-Organizing Maps2017-07-27T15:46:31+00:002017-07-27T15:46:31+00:00/2017/07/27/fun-with-som<ul id="markdown-toc">
<li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
<li><a href="#whats-a-som-" id="markdown-toc-whats-a-som-">What’s a SOM ?</a> <ul>
<li><a href="#what-are-we-going-to-do-with-soms-" id="markdown-toc-what-are-we-going-to-do-with-soms-">What are we going to do with SOMs ?</a> <ul>
<li><a href="#finding-the-best-matching-unit" id="markdown-toc-finding-the-best-matching-unit">Finding the Best Matching Unit</a></li>
<li><a href="#updating-the-bmu" id="markdown-toc-updating-the-bmu">Updating the BMU</a></li>
<li><a href="#updating-the-rest-of-the-som" id="markdown-toc-updating-the-rest-of-the-som">Updating the rest of the SOM</a></li>
</ul>
</li>
<li><a href="#how-to-start-how-to-stop-them-" id="markdown-toc-how-to-start-how-to-stop-them-">How to start, how to stop them ?</a> <ul>
<li><a href="#initialization" id="markdown-toc-initialization">Initialization</a></li>
<li><a href="#stopping" id="markdown-toc-stopping">Stopping</a></li>
</ul>
</li>
<li><a href="#assessing-the-quality-of-learning-and-choosing-hyperparameters" id="markdown-toc-assessing-the-quality-of-learning-and-choosing-hyperparameters">Assessing the quality of learning and choosing hyperparameters</a></li>
</ul>
</li>
<li><a href="#having-fun-with-soms" id="markdown-toc-having-fun-with-soms">Having fun with SOMs!!!</a> <ul>
<li><a href="#2d-feature-vectors" id="markdown-toc-2d-feature-vectors">2D feature vectors</a> <ul>
<li><a href="#inputing-a-square" id="markdown-toc-inputing-a-square">Inputing a square</a></li>
<li><a href="#inputing-a-circle" id="markdown-toc-inputing-a-circle">Inputing a circle</a></li>
</ul>
</li>
<li><a href="#colors" id="markdown-toc-colors">Colors</a></li>
</ul>
</li>
<li><a href="#whats-next-" id="markdown-toc-whats-next-">What’s next ?</a></li>
<li><a href="#references" id="markdown-toc-references">References</a> <ul>
<li><a href="#articles" id="markdown-toc-articles">Articles</a></li>
<li><a href="#code" id="markdown-toc-code">Code</a></li>
<li><a href="#report" id="markdown-toc-report">Report</a></li>
<li><a href="#websites" id="markdown-toc-websites">Websites</a></li>
</ul>
</li>
</ul>
<h2 id="introduction">Introduction</h2>
<p>Self-Organizing Maps (<strong>SOM</strong>), or <a href="http://www.scholarpedia.org/article/Kohonen_network">Kohonen Networks</a> (<a href="#ref">[1]</a>), are an unsupervised learning method that can be applied to a wide range of problems such as data visualization, dimensionality reduction or clustering.
They were introduced in the 1980s by computer scientist Teuvo Kohonen as a type of neural network (<a href="/assets/soms/doc/kohonen1982.pdf">[Kohonen 82]</a>,<a href="http://sci2s.ugr.es/keel/pdf/algorithm/articulo/1990-Kohonen-PIEEE.pdf">[Kohonen 90]</a>).</p>
<p>In this post we are going to present the basics of the SOM model and build a minimal Python implementation based on <code class="language-plaintext highlighter-rouge">numpy</code>. It leads to visualizations such as:</p>
<center>
<div class="imgcap">
<video width="50%" controls="">
<source type="video/mp4" src="/assets/soms/video/square.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">A SOM trained on square data</div>
</div>
</center>
<p><br /></p>
<p>There is a huge literature on SOMs (see <a href="#ref">[2]</a>), both theoretical and applied; this post only aims at having fun with this model through a tiny implementation.
The approach is very much inspired by this <a href="http://www.ai-junkie.com/ann/som/som1.html">post</a> (<a href="#ref">[3]</a>).</p>
<p>We first introduce the maths of the model and then present concrete examples to get more familiar with it.
<br /></p>
<h2 id="whats-a-som-">What’s a SOM ?</h2>
<p>For the sake of simplicity we’ll describe SOMs as being supported by a <strong>h</strong>*<strong>w</strong> set of 2D points that we call cells, regularly spaced in the plane, also called a <em>point lattice</em>.
We refer to them by assigning them coordinates, here top-down in the spirit of <code class="language-plaintext highlighter-rouge">numpy</code> matrices.</p>
<div class="imgcap">
<div>
<img width="45%" src="/assets/soms/images/point_lattice.png" alt="point lattice" />
</div>
<div class="thecap">A point lattice, support of a (5,6)-shaped SOM</div>
</div>
<p><br /></p>
<p>To each of these cells is associated a <strong>feature vector</strong> of dimension <strong>d</strong> (d is the same for all cells).
In that setting a SOM is entirely described by a <strong>h</strong>*<strong>w</strong> matrix containing <strong>d</strong> dimensional vectors, that is a
<strong>(h,w,d)</strong> tensor.</p>
<p>We thus can construct a zero-initialized SOM with the following code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="rouge-code"><pre><span class="c1"># jdc for step by step class construction
</span><span class="kn">import</span> <span class="nn">jdc.jdc</span> <span class="k">as</span> <span class="n">jdc</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">itertools</span>
<span class="k">class</span> <span class="nc">SOM</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">):</span>
<span class="s">"""
Construction of a zero-filled SOM.
h,w,dim_feat: constructs a (h,w,dim_feat) SOM.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">shape</span> <span class="o">=</span> <span class="p">(</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">som</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p><br /></p>
<p><strong>Note on the code.</strong> We construct the code iteratively; the blocks you see are the cells of this <a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/SOM_construction.ipynb">notebook</a>.
We use the <a href="https://github.com/alexhagen/jdc">jdc</a> <code class="language-plaintext highlighter-rouge">%%add_to</code> magic command (see <a href="#ref">[4]</a>) in order to construct the SOM class step by step. If you’re not familiar with notebooks,
just consider each block beginning with <code class="language-plaintext highlighter-rouge">%%add_to SOM</code> as an update to the methods of the class <code class="language-plaintext highlighter-rouge">SOM</code>.</p>
<p>The entire final code is to be found <a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/som.py">here</a>.</p>
<p>When a function appears in several blocks, such as the <code class="language-plaintext highlighter-rouge">__init__</code> function <a href="#b1">here</a> and <a href="#b2">here</a>, it means it has been updated from one block to the next.
Otherwise, function calls always refer to the most recent definition of the function.</p>
<p><strong>Disclaimer.</strong> The code we present could be highly optimized, in particular in the way it interacts with numpy.</p>
<p><br /></p>
<h3 id="what-are-we-going-to-do-with-soms-">What are we going to do with SOMs ?</h3>
<p>We are going to <strong>train</strong> them!!</p>
<p><a name="bef"></a>
The idea of a SOM is to feed it with <strong>d</strong>-dimensional data and let it update
its feature vectors in order to match this data. Furthermore, we want data which are
originally nearby in the <strong>d</strong>-dimensional space to end up nearby on the lattice. This will be done by spreading information in <strong>neighbourhoods</strong>.
This neighbourhood approach concentrates similar data in <strong>clusters</strong> around close cells on the SOM.</p>
<p>Furthermore, with a SOM we can <strong>visualize</strong> on a 2D lattice data potentially lying in a very high-dimensional space. That’s why
it can be seen as a <strong>dimensionality reduction</strong> technique.</p>
<p>All of this being <strong>unsupervised</strong> as we give our data to the model without additional information.</p>
<p>The best way to have an intuitive understanding of these aspects of SOMs is to code one.
Let’s go!!</p>
<p><br /></p>
<h4 id="finding-the-best-matching-unit">Finding the Best Matching Unit</h4>
<p>At each time step t of the training, we present to the SOM a randomly chosen input vector from our data. Then,
we find the cell whose feature vector is closest (in the L2 norm) to that input vector. We call this cell the Best Matching Unit (<strong>BMU</strong>).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">add_to</span> <span class="n">SOM</span>
<span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">data</span><span class="p">):</span>
<span class="s">"""
Training procedure for a SOM.
data: a N*d matrix, N the number of examples,
d the same as dim_feat=self.shape[2].
"""</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">count</span><span class="p">():</span>
<span class="n">i_data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)))</span>
<span class="n">bmu</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">find_bmu</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">find_bmu</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_vec</span><span class="p">):</span>
<span class="s">"""
Find the BMU of a given input vector.
input_vec: a d=dim_feat=self.shape[2] input vector.
"""</span>
<span class="n">list_bmu</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):</span>
<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
<span class="n">dist</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">((</span><span class="n">input_vec</span><span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">som</span><span class="p">[</span><span class="n">y</span><span class="p">,</span><span class="n">x</span><span class="p">]))</span>
<span class="n">list_bmu</span><span class="p">.</span><span class="n">append</span><span class="p">(((</span><span class="n">y</span><span class="p">,</span><span class="n">x</span><span class="p">),</span><span class="n">dist</span><span class="p">))</span>
<span class="n">list_bmu</span><span class="p">.</span><span class="n">sort</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="k">return</span> <span class="n">list_bmu</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p><br /></p>
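<p>As the disclaimer above suggests, the double loop in <code class="language-plaintext highlighter-rouge">find_bmu</code> can be replaced by a single vectorized numpy computation. A possible sketch (the name <code class="language-plaintext highlighter-rouge">find_bmu_vectorized</code> is ours, not part of the notebook):</p>

```python
import numpy as np

def find_bmu_vectorized(som, input_vec):
    """Vectorized BMU search over a (h, w, d) SOM tensor.
    Returns the (y, x) coordinates of the cell whose feature
    vector is closest (L2 norm) to input_vec."""
    dists = np.linalg.norm(som - input_vec, axis=2)  # (h, w) distance grid
    return np.unravel_index(np.argmin(dists), dists.shape)

# toy check: the cell holding [1, 1] should win for a nearby input
som = np.zeros((3, 4, 2))
som[1, 2] = [1.0, 1.0]
bmu = find_bmu_vectorized(som, np.array([0.9, 1.1]))  # (1, 2)
```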
<h4 id="updating-the-bmu">Updating the BMU</h4>
<p>Now that we have found the BMU, we want its current feature vector \(V_t\) to be closer to the input data vector at time t,
\(D_t\). Here is the update rule we are going to use:</p>
\[V_{t+1} = V_{t} + L(t)*(D_{t}-V_{t})\label{eq:bmu}\]
<p>\(L(t)\) is called the <em>learning rate</em> and depends on time t. Remark that if \(L(t) = 1\), we directly replace our feature vector by the
input vector. That is the kind of behavior we would like to see at the beginning of the training, but if we want the SOM to “stabilize”
we’ll need to decrease L(t) over time. That’s exactly why we define:</p>
\[L(t) = L_0*e^{-\frac{t}{\lambda}} \label{eq:L}\]
<p>With \(L_0\) the initial learning rate and \(\lambda\) a time scaling constant.</p>
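<p>To get a concrete feel for this decay, here is a standalone sketch with made-up values for \(L_0\) and \(\lambda\) (for illustration only, not the parameters used later):</p>

```python
import numpy as np

L0, lam = 0.8, 100.0   # hypothetical L_0 and lambda, for illustration only

def L(t):
    """Learning rate L(t) = L0 * exp(-t / lambda)."""
    return L0 * np.exp(-t / lam)

# L(0) equals L0, and the rate then shrinks monotonically towards 0
rates = [L(t) for t in (0, 100, 300)]
```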
<p>Let’s implement these new ideas:</p>
<p><a name="b1"></a></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">add_to</span> <span class="n">SOM</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">):</span>
<span class="s">"""
Construction of a zero-filled SOM.
h,w,dim_feat: constructs a (h,w,dim_feat) SOM.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">shape</span> <span class="o">=</span> <span class="p">(</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">som</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">))</span>
<span class="c1"># Training parameters
</span> <span class="bp">self</span><span class="p">.</span><span class="n">L0</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">data</span><span class="p">,</span><span class="n">L0</span><span class="p">,</span><span class="n">lam</span><span class="p">):</span>
<span class="s">"""
Training procedure for a SOM.
data: a N*d matrix, N the number of examples,
d the same as dim_feat=self.shape[2].
L0,lam: training parameters.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">L0</span> <span class="o">=</span> <span class="n">L0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="n">lam</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">count</span><span class="p">():</span>
<span class="n">i_data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)))</span>
<span class="n">bmu</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">find_bmu</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">])</span>
<span class="bp">self</span><span class="p">.</span><span class="n">update_bmu</span><span class="p">(</span><span class="n">bmu</span><span class="p">,</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">],</span><span class="n">t</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">update_bmu</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">bmu</span><span class="p">,</span><span class="n">input_vector</span><span class="p">,</span><span class="n">t</span><span class="p">):</span>
<span class="s">"""
Update rule for the BMU.
bmu: (y,x) BMU's coordinates.
input_vector: current data vector.
t: current time.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">som</span><span class="p">[</span><span class="n">bmu</span><span class="p">]</span> <span class="o">+=</span> <span class="bp">self</span><span class="p">.</span><span class="n">L</span><span class="p">(</span><span class="n">t</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">input_vector</span><span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">som</span><span class="p">[</span><span class="n">bmu</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">L</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">t</span><span class="p">):</span>
<span class="s">"""
Learning rate formula.
t: current time.
"""</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">L0</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">t</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">lam</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p><br /></p>
<h4 id="updating-the-rest-of-the-som">Updating the rest of the SOM</h4>
<p>As suggested <a href="#bef">before</a>, we are going to communicate the BMU’s update to its <strong>neighbourhood</strong>.
This will lead to close cells having close feature vectors. That’s why we can see SOMs as a clustering method:
they group together similar information.</p>
<p>Let $W_t$ be the feature vector of a cell $\boldsymbol c = (x,y)$ different from the BMU $\boldsymbol c_{0} = (x_0,y_0)$.
We are going to update it as follows:</p>
\[W_{t+1} = W_{t} + N(\delta,t)*L(t)*(D_{t}-W_{t})\label{eq:non_bmu}\]
<p>With $N(\delta,t)$ the <em>neighbouring penalty</em> depending on $\delta = || \boldsymbol c - \boldsymbol c_{0}||_{2}$, the distance
between that cell and the BMU, and $t$ the current time.</p>
<p>Indeed, we would like farther cells ($\delta$ high) to be less affected by the BMU’s update. Also, we would like this update to be
more and more localized near the BMU over time: as with the learning rate, we want the SOM to “stabilize” over time.</p>
<p>It leads to the following formulation for $N(\delta,t)$:</p>
\[N(\delta,t) = e^{-\frac{\delta^2}{2\sigma(t)^2}}\]
<p>This is a spatial <em>Gaussian decay</em> scaled by a radius factor $\sigma(t)$.
For the reasons we just mentioned, we want $\sigma(t)$ to get smaller over time so that the “influence area” of the BMU shrinks.
We’ll do that in the exact same way as in equation $\ref{eq:L}$. It gives:</p>
\[\sigma(t) = \sigma_0 * e^{-\frac{t}{\lambda}}\label{eq:sigma}\]
<p>We use the same time scaling $\lambda$ parameter for both $L$ and $\sigma$.</p>
<p>By looking at the update equations $\ref{eq:bmu}$ and $\ref{eq:non_bmu}$ for the BMU cell and non-BMU cells, we can notice that the only thing
that differs is the $N(\delta,t)$ term. Furthermore, for the BMU itself $\delta=0$, which gives $N(\delta,t)=1$ and recovers exactly equation $\ref{eq:bmu}$.
Thus we can use equation $\ref{eq:non_bmu}$ as a unique update rule for all cells, BMU and non-BMUs alike.</p>
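<p>This reduction can be checked numerically; a standalone sketch with made-up parameters and vectors (none of these values come from the post):</p>

```python
import numpy as np

# made-up parameters, only to test the delta = 0 special case
L0, lam, sigma0, t = 0.8, 100.0, 3.0, 10

L = L0 * np.exp(-t / lam)           # learning rate at time t
sigma = sigma0 * np.exp(-t / lam)   # neighbourhood radius at time t

def N(delta):
    """Neighbouring penalty at the fixed time t above."""
    return np.exp(-delta**2 / (2 * sigma**2))

V = np.array([0.2, 0.4])            # some feature vector W_t
D = np.array([1.0, 0.0])            # some input vector D_t
bmu_rule = V + L * (D - V)                 # BMU-only update rule
unified_rule = V + N(0.0) * L * (D - V)    # unified rule at delta = 0
```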
<p>Let’s implement all that:</p>
<p><a name="b2"></a></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">add_to</span> <span class="n">SOM</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">):</span>
<span class="s">"""
Construction of a zero-filled SOM.
h,w,dim_feat: constructs a (h,w,dim_feat) SOM.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">shape</span> <span class="o">=</span> <span class="p">(</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">som</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">h</span><span class="p">,</span><span class="n">w</span><span class="p">,</span><span class="n">dim_feat</span><span class="p">))</span>
<span class="c1"># Training parameters
</span> <span class="bp">self</span><span class="p">.</span><span class="n">L0</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">sigma0</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">data</span><span class="p">,</span><span class="n">L0</span><span class="p">,</span><span class="n">lam</span><span class="p">,</span><span class="n">sigma0</span><span class="p">):</span>
<span class="s">"""
Training procedure for a SOM.
data: a N*d matrix, N the number of examples,
d the same as dim_feat=self.shape[2].
L0,lam,sigma0: training parameters.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">L0</span> <span class="o">=</span> <span class="n">L0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="n">lam</span>
<span class="bp">self</span><span class="p">.</span><span class="n">sigma0</span> <span class="o">=</span> <span class="n">sigma0</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">count</span><span class="p">():</span>
<span class="n">i_data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)))</span>
<span class="n">bmu</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">find_bmu</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">])</span>
<span class="bp">self</span><span class="p">.</span><span class="n">update_som</span><span class="p">(</span><span class="n">bmu</span><span class="p">,</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">],</span><span class="n">t</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">update_som</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">bmu</span><span class="p">,</span><span class="n">input_vector</span><span class="p">,</span><span class="n">t</span><span class="p">):</span>
<span class="s">"""
Calls the update rule on each cell.
bmu: (y,x) BMU's coordinates.
input_vector: current data vector.
t: current time.
"""</span>
<span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):</span>
<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
<span class="n">dist_to_bmu</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">((</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">bmu</span><span class="p">)</span><span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">((</span><span class="n">y</span><span class="p">,</span><span class="n">x</span><span class="p">))))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">update_cell</span><span class="p">((</span><span class="n">y</span><span class="p">,</span><span class="n">x</span><span class="p">),</span><span class="n">dist_to_bmu</span><span class="p">,</span><span class="n">input_vector</span><span class="p">,</span><span class="n">t</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">update_cell</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">cell</span><span class="p">,</span><span class="n">dist_to_bmu</span><span class="p">,</span><span class="n">input_vector</span><span class="p">,</span><span class="n">t</span><span class="p">):</span>
<span class="s">"""
Computes the update rule on a cell.
cell: (y,x) cell's coordinates.
dist_to_bmu: L2 distance from cell to bmu.
input_vector: current data vector.
t: current time.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">som</span><span class="p">[</span><span class="n">cell</span><span class="p">]</span> <span class="o">+=</span> <span class="bp">self</span><span class="p">.</span><span class="n">N</span><span class="p">(</span><span class="n">dist_to_bmu</span><span class="p">,</span><span class="n">t</span><span class="p">)</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">L</span><span class="p">(</span><span class="n">t</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">input_vector</span><span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">som</span><span class="p">[</span><span class="n">cell</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">N</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">dist_to_bmu</span><span class="p">,</span><span class="n">t</span><span class="p">):</span>
<span class="s">"""
Computes the neighbouring penalty.
dist_to_bmu: L2 distance to bmu.
t: current time.
"""</span>
<span class="n">curr_sigma</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">sigma</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="n">dist_to_bmu</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">curr_sigma</span><span class="o">**</span><span class="mi">2</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">sigma</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">t</span><span class="p">):</span>
<span class="s">"""
Neighbouring radius formula.
t: current time.
"""</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">sigma0</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">t</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">lam</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p><br /></p>
<h3 id="how-to-start-how-to-stop-them-">How to start, how to stop them ?</h3>
<p>Two things we have left aside are the beginning and the ending of the training procedure.
For the moment we start with an all-zero SOM and our <code class="language-plaintext highlighter-rouge">train</code> method is an infinite loop.</p>
<p>As often in machine learning, these two stages of the process are foremost a matter of choice.</p>
<h4 id="initialization">Initialization</h4>
<p>Intuitively it is in our interest to have diversity in our initial SOM so that BMUs are strong matches and form clusters early.
Initialization is a very important part of the SOM process and there are plenty of options:</p>
<ul>
<li>Initialize randomly: uniformly, Gaussian or in any other way. Efficiency will highly depend on our data’s distribution.</li>
<li>Initialize by sampling: if we have a lot of data compared to the number of cells we can pick some of it randomly to be our initial feature vectors.</li>
</ul>
<p>Given all these possibilities we are going to leave the choice to the code’s user, implementing a default uniform random initialization in $[0,1]^d$.
In practice we provide the possibility to specify an <code class="language-plaintext highlighter-rouge">initializer(h,w,dim_feat)</code> function that returns the initial <strong>(h,w,d)</strong> SOM tensor:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">add_to</span> <span class="n">SOM</span>
<span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">data</span><span class="p">,</span><span class="n">L0</span><span class="p">,</span><span class="n">lam</span><span class="p">,</span><span class="n">sigma0</span><span class="p">,</span><span class="n">initializer</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">):</span>
<span class="s">"""
Training procedure for a SOM.
data: a N*d matrix, N the number of examples,
d the same as dim_feat=self.shape[2].
L0,lam,sigma0: training parameters.
initializer: a function taking h,w and dim_feat (*self.shape) as
parameters and returning an initial (h,w,dim_feat) tensor.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">L0</span> <span class="o">=</span> <span class="n">L0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="n">lam</span>
<span class="bp">self</span><span class="p">.</span><span class="n">sigma0</span> <span class="o">=</span> <span class="n">sigma0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">som</span> <span class="o">=</span> <span class="n">initializer</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">count</span><span class="p">():</span>
<span class="n">i_data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)))</span>
<span class="n">bmu</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">find_bmu</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">])</span>
<span class="bp">self</span><span class="p">.</span><span class="n">update_som</span><span class="p">(</span><span class="n">bmu</span><span class="p">,</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">],</span><span class="n">t</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p><br /></p>
<h4 id="stopping">Stopping</h4>
<p>We choose to stop the process when $\sigma(t) < 1$, that is, when updates only concern BMUs and no
more significant information spreads across the SOM. Notice that with this criterion, given equation $\ref{eq:sigma}$, we can
predict the total training time from the $\lambda$ and $\sigma_0$ parameters alone. This gives:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">add_to</span> <span class="n">SOM</span>
<span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">data</span><span class="p">,</span><span class="n">L0</span><span class="p">,</span><span class="n">lam</span><span class="p">,</span><span class="n">sigma0</span><span class="p">,</span><span class="n">initializer</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">):</span>
<span class="s">"""
Training procedure for a SOM.
data: a N*d matrix, N the number of examples,
d the same as dim_feat=self.shape[2].
L0,lam,sigma0: training parameters.
initializer: a function taking h,w and dim_feat (*self.shape) as
parameters and returning an initial (h,w,dim_feat) tensor.
"""</span>
<span class="bp">self</span><span class="p">.</span><span class="n">L0</span> <span class="o">=</span> <span class="n">L0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="n">lam</span>
<span class="bp">self</span><span class="p">.</span><span class="n">sigma0</span> <span class="o">=</span> <span class="n">sigma0</span>
<span class="bp">self</span><span class="p">.</span><span class="n">som</span> <span class="o">=</span> <span class="n">initializer</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">count</span><span class="p">():</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">sigma</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="o"><</span> <span class="mf">1.0</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">i_data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)))</span>
<span class="n">bmu</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">find_bmu</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">])</span>
<span class="bp">self</span><span class="p">.</span><span class="n">update_som</span><span class="p">(</span><span class="n">bmu</span><span class="p">,</span><span class="n">data</span><span class="p">[</span><span class="n">i_data</span><span class="p">],</span><span class="n">t</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p><br /></p>
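<p>As a sanity check, assuming the exponential decay $\sigma(t) = \sigma_0 e^{-t/\lambda}$ of equation $\ref{eq:sigma}$, the stopping time has a closed form (a small hypothetical helper, not part of the class):</p>

```python
import math

def predicted_stopping_time(lam, sigma0, threshold=1.0):
    """First integer t with sigma(t) = sigma0 * exp(-t / lam) < threshold,
    i.e. the smallest integer strictly greater than lam * ln(sigma0 / threshold)."""
    return math.floor(lam * math.log(sigma0 / threshold)) + 1
```

<p>With $\lambda = 10^2$ and $\sigma_0 = 10$ this predicts $t = 231$; with the relaxed $0.5$ threshold used for the videos later on, it predicts $t = 300$, consistent with the final iteration count reported there.</p>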
<h3 id="assessing-the-quality-of-learning-and-choosing-hyperparameters">Assessing the quality of learning and choosing hyperparameters</h3>
<p>We have several hyperparameters to our model: $L_0$, $\lambda$, $\sigma_0$ and the way
to initialize the SOM. Each instantiation of these parameters will lead to a different model.
In order to choose them we need a criterion to assess the quality of learning.
This criterion can really depend on the task we are solving with the SOM.
However, a general approach is to look at the <strong>quantization error</strong>.</p>
<p>It is defined as the mean of the distances between each input vector and its BMU’s feature vector:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">add_to</span> <span class="n">SOM</span>
<span class="k">def</span> <span class="nf">quant_err</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="s">"""
Computes the quantization error of the SOM.
It uses the data fed at last training.
"""</span>
<span class="n">bmu_dists</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">input_vector</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">data</span><span class="p">:</span>
<span class="n">bmu</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">find_bmu</span><span class="p">(</span><span class="n">input_vector</span><span class="p">)</span>
<span class="n">bmu_feat</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">som</span><span class="p">[</span><span class="n">bmu</span><span class="p">]</span>
<span class="n">bmu_dists</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">input_vector</span><span class="o">-</span><span class="n">bmu_feat</span><span class="p">))</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">bmu_dists</span><span class="p">).</span><span class="n">mean</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>To compute this error we have internalized the dataset in the SOM class, storing it in <code class="language-plaintext highlighter-rouge">self.data</code> at training time.</p>
<p><strong>In practice.</strong> It is very common to set the initial neighbourhood radius, $\sigma_0$, to half the size of the SOM. The
$\lambda$ parameter controls the duration of the learning process as well as the decay speed of the other parameters.
Values between $10$ and $10^3$ are often a good fit. Finally, you should try different orders of magnitude for the $L_0$ parameter;
$0.5 \leq L_0 \leq 10$ is often fine.</p>
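<p>Put together, a reasonable starting configuration for a <strong>(20,20,d)</strong> SOM following these rules of thumb might look like this (illustrative values, not prescriptions):</p>

```python
h, w = 20, 20
sigma0 = max(h, w) / 2   # initial neighbourhood radius: half the size of the SOM
lam = 1e2                # controls learning duration and decay speed
L0 = 0.8                 # initial learning rate; try other orders of magnitude too
```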
<p>The whole code of this part is <a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/som.py">here</a>. It has a few extra features that are useful for visualization purposes.
They are detailed in the following part.</p>
<p><br /></p>
<h2 id="having-fun-with-soms">Having fun with SOMs!!!</h2>
<p>All of these visualizations come from this <a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/SOM_viz.ipynb">visualization notebook</a>.</p>
<h3 id="2d-feature-vectors">2D feature vectors</h3>
<p>Having 2D feature vectors in a SOM, that is, having a <strong>(*,*,2)</strong> SOM, isn’t an example of dimensionality reduction.
However, it’s a very nice way to visualize how the SOM actually organizes itself. In that sense it’s quite meta: we do not use a SOM
to visualize data, we use data to visualize a SOM.</p>
<p>The underlying idea is quite simple: we are going to feed the SOM with shapes and see how it responds.</p>
<p>Please note that we have used the <code class="language-plaintext highlighter-rouge">(y,x)</code> convention to refer to the coordinates of the SOM’s cells, however we’ll
use the <code class="language-plaintext highlighter-rouge">(x,y)</code> convention to interpret the content of their 2D feature vector.</p>
<h4 id="inputing-a-square">Inputting a square</h4>
<p>In order to input a square we simply generate uniformly random data in $[0,1]^2$:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="n">square_data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">5000</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>We slightly modify <code class="language-plaintext highlighter-rouge">SOM.train</code> to save intermediate SOMs so that we can make a little movie of the training.
To make the movie last a bit longer we change the stopping condition from <code class="language-plaintext highlighter-rouge">sigma(t) < 1.0</code> to <code class="language-plaintext highlighter-rouge">sigma(t) < 0.5</code>.
We also add a print on stdout so that we know the total number of iterations.</p>
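<p>The frame-saving change itself is essentially a one-liner inside the training loop; the only subtlety is copying the tensor. Here it is as a standalone sketch (assuming the loop has access to a <code class="language-plaintext highlighter-rouge">frames</code> list):</p>

```python
import numpy as np

def record_frame(som_tensor, frames):
    """Snapshot the SOM at the current iteration. The .copy() matters:
    appending the tensor itself would make every frame alias the same,
    final array, since the training updates it in place."""
    frames.append(som_tensor.copy())
```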
<p>We train a <strong>(20,20,2)</strong> SOM on that task:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">time</span>
<span class="n">som_square</span> <span class="o">=</span> <span class="n">SOM</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">20</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span>
<span class="n">frames_square</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">som_square</span><span class="p">.</span><span class="n">train</span><span class="p">(</span><span class="n">square_data</span><span class="p">,</span><span class="n">L0</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span><span class="n">lam</span><span class="o">=</span><span class="mf">1e2</span><span class="p">,</span><span class="n">sigma0</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span><span class="n">frames</span><span class="o">=</span><span class="n">frames_square</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">%%time</code> magic command gives us running time information. We get:</p>
<pre>
final t: 300
CPU times: user 2.91 s, sys: 12.4 ms, total: 2.93 s
Wall time: 2.95 s
</pre>
<p>Let’s look at the quantization error:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="o">%%</span><span class="n">time</span>
<span class="k">print</span><span class="p">(</span><span class="s">"quantization error:"</span><span class="p">,</span> <span class="n">som_square</span><span class="p">.</span><span class="n">quant_err</span><span class="p">())</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>We get:</p>
<pre>
quantization error: 0.0432528457322
CPU times: user 10.3 s, sys: 282 ms, total: 10.6 s
Wall time: 10.3 s
</pre>
<p>The quantization error seems low! The running time, however, shows that it is not a very efficient computation, at least
in the way we implemented it.</p>
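<p>A faster variant, shown here as a sketch (a hypothetical <code class="language-plaintext highlighter-rouge">quant_err_vectorized</code> helper, not part of the original class), replaces the Python-level loop over examples with a single broadcasted distance computation:</p>

```python
import numpy as np

def quant_err_vectorized(som, data):
    """Quantization error without a Python loop over examples.
    som: (h, w, d) tensor; data: (N, d) matrix."""
    h, w, d = som.shape
    codebook = som.reshape(h * w, d)
    # (N, h*w) matrix of distances from every example to every cell feature.
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    # Minimum over cells = distance to each example's BMU; then average.
    return dists.min(axis=1).mean()
```

<p>Note that this builds an $N \times hw$ distance matrix, so it trades memory for speed.</p>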
<p>Thanks to <code class="language-plaintext highlighter-rouge">frames_square</code> we can use the plotting routines implemented in the <a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/SOM_viz.ipynb">notebook</a> to produce a movie of the learning.</p>
<p>We get:</p>
<center>
<div class="imgcap">
<video width="50%" controls="">
<source type="video/mp4" src="/assets/soms/video/square2.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">Our square-trained SOM's feature vectors at each step of the learning process</div>
</div>
</center>
<p><br />
Notice that it was taken from a different learning process than the introductory video.</p>
<p>The first frame shows the random initialization of our SOM. We can see that the SOM somehow converges to the square shape. As expected, because of our loose stopping condition, there are almost no changes by the end of the video.</p>
<p>Intuitively, it’s very likely that the SOM grows in the direction of “brand new” examples. Indeed, sometimes we see it move a corner a long way in one direction. This should happen when an example from that area is fed while the learning rate is still close to $1$: the corresponding BMU quickly moves towards it.</p>
<p>Furthermore, what’s really impressive with SOMs is that the original “topology” of the SOM, here the <em>point lattice</em>, is preserved:</p>
<div class="imgcap">
<div>
<img src="/assets/soms/images/top.png" alt="topology" />
</div>
<div class="thecap">ID of each feature vector on the SOM point lattice</div>
</div>
<p><br />
This large image plots our feature vectors at the end of the training and labels each of them with the “ID” of the corresponding
cell on the SOM’s point lattice. If a cell has coordinates <code class="language-plaintext highlighter-rouge">(y,x)</code> on the lattice, its ID is $20y+x$.
We see that here the <code class="language-plaintext highlighter-rouge">(0,0)</code> cell corresponds to the top right corner of the square.</p>
<p>It’s very much as if the SOM were a piece of cloth physically matching the points it has been fed.</p>
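<p>For reference, the ID convention used on this plot can be written down explicitly (small hypothetical helpers mirroring the $20y+x$ formula above):</p>

```python
W = 20  # width of the SOM used here

def cell_id(y, x, w=W):
    """Flat ID of the cell at lattice coordinates (y, x)."""
    return w * y + x

def cell_coords(i, w=W):
    """Inverse mapping: flat ID back to (y, x)."""
    return divmod(i, w)
```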
<p><br /></p>
<h4 id="inputing-a-circle">Inputting a circle</h4>
<p>Let’s challenge our 2D SOM. We saw that we retrieved the <em>point lattice</em> structure at the end of training.
Intuitively, it was a good fit because our square input data was more or less easily representable with this structure.
Now, what if we feed the SOM with data that has nothing to do with a square? For instance, the rim of a circle.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="c1">#Reference: https://stackoverflow.com/questions/8487893/generate-all-the-points-on-the-circumference-of-a-circle
</span><span class="k">def</span> <span class="nf">PointsInCircum</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">n</span><span class="o">=</span><span class="mi">5000</span><span class="p">):</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="n">math</span><span class="p">.</span><span class="n">cos</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">pi</span><span class="o">/</span><span class="n">n</span><span class="o">*</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="n">r</span><span class="o">+</span><span class="mf">0.5</span><span class="p">,</span><span class="n">math</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">pi</span><span class="o">/</span><span class="n">n</span><span class="o">*</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="n">r</span><span class="o">+</span><span class="mf">0.5</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">n</span><span class="o">+</span><span class="mi">1</span><span class="p">)])</span>
<span class="n">circle_data</span> <span class="o">=</span> <span class="n">PointsInCircum</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>It gives, for instance, the following set of input points:</p>
<div class="imgcap">
<div>
<img src="/assets/soms/images/circle.png" alt="points on the circumference of a circle" />
</div>
<div class="thecap">Input points on the circumference of a circle</div>
</div>
<p><br />
Let’s train the SOM with the same parameters:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre>
<span class="n">som_circle</span> <span class="o">=</span> <span class="n">SOM</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">20</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span>
<span class="n">frames_circle</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">som_circle</span><span class="p">.</span><span class="n">train</span><span class="p">(</span><span class="n">circle_data</span><span class="p">,</span><span class="n">L0</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span><span class="n">lam</span><span class="o">=</span><span class="mf">1e2</span><span class="p">,</span><span class="n">sigma0</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span><span class="n">frames</span><span class="o">=</span><span class="n">frames_circle</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>It gives this final set of feature vectors:</p>
<div class="imgcap">
<div>
<img src="/assets/soms/images/final_circle.png" alt="trained SOM on circle" />
</div>
<div class="thecap">Feature vectors of the SOM after training</div>
</div>
<p><br />
The SOM “imitated” the circle as best it could!
Here’s the whole training video:</p>
<center>
<div class="imgcap">
<video width="50%" controls="">
<source type="video/mp4" src="/assets/soms/video/circle.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">SOM training on circumference circle data</div>
</div>
</center>
<p><br />
And, again, the SOM’s <em>point lattice</em> topology was somehow preserved:</p>
<div class="imgcap">
<div>
<img src="/assets/soms/images/circle_top.png" alt="topology with circle data" />
</div>
<div class="thecap">ID of each feature vector on the SOM point lattice with circle data</div>
</div>
<p><br /></p>
<p>Therefore, while it seems impossible for the SOM to match the exact circumference of a circle, it fits it
by expanding like a disc. Neat…</p>
<p>It even looks better than our square, which was supposed to be a really good fit for the <em>point lattice</em>.
However, let’s take into account that during the square training we also had data from inside the square. Thus not all steps were
informative in terms of shape: when an input data point already lay inside the area covered by the SOM, it changed almost nothing.
It’s quite likely that redoing the square training with only points on the border of the square would lead to
“more visual” results.</p>
<p>Conversely, feeding input data points from all over a disc may lead to weird results.</p>
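<p>If you want to try that experiment, border-only training data can be generated like this (a hypothetical <code class="language-plaintext highlighter-rouge">square_border_points</code> helper, in the same spirit as the circle generator below):</p>

```python
import numpy as np

def square_border_points(n=5000):
    """Uniform random points on the border of the unit square."""
    t = np.random.rand(n) * 4.0        # position along the perimeter, in [0, 4)
    side, u = t.astype(int), t % 1.0   # side index 0..3 and offset along that side
    # side 0: bottom, 1: right, 2: top, 3: left (walking counter-clockwise)
    x = np.where(side == 0, u, np.where(side == 1, 1.0, np.where(side == 2, 1.0 - u, 0.0)))
    y = np.where(side == 0, 0.0, np.where(side == 1, u, np.where(side == 2, 1.0, 1.0 - u)))
    return np.column_stack([x, y])
```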
<h3 id="colors">Colors</h3>
<p>We used 2D feature vectors in order to visualize the training of our SOM. In the same spirit, there’s a type
of 3D feature that can be visualized: colors.</p>
<p>However, our approach is going to be a bit different. While we had a lot of data in the previous examples, we are going
to feed the SOM with only 3 colors. We’ll then plot the SOM with the colors on it, which is straightforward using
the <code class="language-plaintext highlighter-rouge">plt.imshow</code> <code class="language-plaintext highlighter-rouge">matplotlib</code> routine.</p>
<p>We first generate our random colors:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="n">color_data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<div class="imgcap">
<div>
<img src="/assets/soms/images/random_colors.png" alt="3 random colors" />
</div>
<div class="thecap">Our dataset: 3 random colors</div>
</div>
<p><br /></p>
<p>Then we train the SOM:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre><span class="n">som_color</span> <span class="o">=</span> <span class="n">SOM</span><span class="p">(</span><span class="mi">40</span><span class="p">,</span><span class="mi">40</span><span class="p">,</span><span class="mi">3</span><span class="p">)</span>
<span class="n">frames_color</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">som_color</span><span class="p">.</span><span class="n">train</span><span class="p">(</span><span class="n">color_data</span><span class="p">,</span><span class="n">L0</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span><span class="n">lam</span><span class="o">=</span><span class="mf">1e2</span><span class="p">,</span><span class="n">sigma0</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span><span class="n">frames</span><span class="o">=</span><span class="n">frames_color</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>At the end of the learning we get:</p>
<div class="imgcap">
<div>
<img src="/assets/soms/images/final_colors.png" alt="SOM trained on random colors" />
</div>
<div class="thecap">Trained SOM on 3 random colors</div>
</div>
<p><br /></p>
<p>It looks very much like a Voronoï diagram. Three regions have been created, one for each color.</p>
<p>Here’s the video of the training:</p>
<center>
<div class="imgcap">
<video width="50%" controls="">
<source type="video/mp4" src="/assets/soms/video/colors.mp4" />
Your browser does not support the video tag.
</video>
<div class="thecap">A SOM trained on 3 random colors</div>
</div>
</center>
<p><br /></p>
<p>In that example, the <code class="language-plaintext highlighter-rouge">sigma(t) < 0.5</code> condition and the $\lambda = 10^2$ we chose made
the learning quite long, even though the SOM stabilizes early (around 0:25 of the 2:27 video).</p>
<!-- ### Classification
#### Training
Let's move to a more practical example: classification.
We are going to use the 4D data of the `sklearn` `iris` dataset.
This dataset describes the features of 150 iris flowers. There's three types of flowers,
labelled from 0 to 2, and we have 50 examples per class.
```python
iris_data = sklearn.datasets.load_iris()
```
Feature vectors are stocked in `iris_data['data']` and there corresponding targets
in `iris_data['target']`.
You can get plenty of info concerning this dataset by outputting `iris_data['DESCR']`.
We cut this dataset into a train and a test set, keeping 75% of the examples per class in the train set:
```python
train_percent_per_class = 0.75
train_0 = np.random.choice(range(50),int(50*train_percent_per_class),replace=False)
train_1 = np.random.choice(range(50),int(50*train_percent_per_class),replace=False)
train_2 = np.random.choice(range(50),int(50*train_percent_per_class),replace=False)
indices_train = np.append(train_0,np.append(50+train_1,100+train_1))
indices_test = list(set(range(150))-set(indices_train))
```
Then we feed the train set to an empirically chosen **(6,6,4)** SOM.
```python
som_iris = SOM(6,6,4)
som_iris.train(iris_data['data'][indices_train],L0=0.8,lam=1e2,sigma0=5)
```
And finally, we plot the *majority matrix* after training:
<div class="imgcap">
<div>
<img src="/assets/soms/images/iris_maj.png" alt="majority matrix on the iris dataset"/>
</div>
<div class="thecap">Majority matrix on the iris dataset</div>
</div>
<br/>
This *majority matrix* is built by first grouping all our data per bmu: we associate to each cell
the input vectors for which it's the bmu. rThen we compute the most frequent target associated to each cell.
On the image, 0 means that there are no input vectors associated to the cell and classes
0,1,2 are respectively 1,2,3.
It's looks quite nice, we have 3 distinct zones, one for each target. The SOM has performed
an unsupervised clustering of our data!
However the *majority* information is not very informational alone. Indeed it could hide
mixed cells, with more than one target. We're not sure yet how trustable is our SOM.
That's why we plot an *entropy matrix*. It consists in the *entropy* of each cell target's distribution.
<div class="imgcap">
<div>
<img src="/assets/soms/images/iris_ent.png" alt="entropy matrix on the iris dataset"/>
</div>
<div class="thecap">Entropy matrix on the iris dataset</div>
</div>
<br/>
If you're not familiar with entropy (a post is coming on that subject!) here's how to interpret it:
- A value close to 0 means that the target's distribution is *pure*. There's mainly one unique target in it,
which is what we want.
- A value close to 1 means that the distribution is not *pure*, it's *mixed*. The least pure distribution
being one with the same number of elements of each target: you cannot trust the majority winner, it
has been chosen at random.
Here it looks really good! Indeed, almost all cells are 100% sure of their associated target. It's also
quite nice that mixed cells are located on the border of two clusters.
As retraining or reshaping the SOM also leads to mixed cells, they might reflect the [Bayesian error] of the dataset.
### Testing
Now, if we want to classify new data, we simply feed it to the SOM and output the majority target of its BMU.
The entropy matrix tells us how close to randomness is our answer depending on the BMU.
TODOTODO:
Find points satisfying set of distance constraints -->
<h2 id="whats-next-">What’s next?</h2>
<p>If you want to see an even more concrete example of how to use SOMs, check out a project I did on recognizing apparel in pictures <a href="http://perso.ens-lyon.fr/tristan.sterin/reports/sterin_MSc_internship_DeepNeuralFeatures.pdf">here</a>.</p>
<p><a name="ref"></a></p>
<h2 id="references">References</h2>
<h3 id="articles">Articles</h3>
<p><a href="/assets/soms/doc/kohonen1982.pdf">[Kohonen 82]</a> <br />
<a href="http://sci2s.ugr.es/keel/pdf/algorithm/articulo/1990-Kohonen-PIEEE.pdf">[Kohonen 90]</a></p>
<h3 id="code">Code</h3>
<p><a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/som.py">[SOM code]</a> <br />
<a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/SOM_construction.ipynb">[SOM notebook]</a> <br />
<a href="https://github.com/tcosmo/tcosmo.github.io/blob/master/assets/soms/code/SOM_viz.ipynb">[SOM visualization]</a></p>
<h3 id="report">Report</h3>
<p><a href="http://perso.ens-lyon.fr/tristan.sterin/reports/sterin_MSc_internship_DeepNeuralFeatures.pdf">[Deep Neural Features]</a></p>
<h3 id="websites">Websites</h3>
<p><a href="http://www.scholarpedia.org/article/Kohonen_network">[1]</a> <br />
<a href="http://cis.legacy.ics.tkk.fi/research/som-bibl/vol1_4.pdf">[2]</a> <br />
<a href="http://www.ai-junkie.com/ann/som/som1.html">[3]</a> <br />
<a href="https://github.com/alexhagen/jdc">[4]</a></p>Implementing SOMs in Python.Welcome to Jekyll!2017-05-23T14:10:47+00:002017-05-23T14:10:47+00:00/jekyll/update/2017/05/23/welcome-to-jekyll<p>You’ll find this post in your <code class="language-plaintext highlighter-rouge">_posts</code> directory. Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run <code class="language-plaintext highlighter-rouge">jekyll serve</code>, which launches a web server and auto-regenerates your site when a file is updated.</p>
<p>To add new posts, simply add a file in the <code class="language-plaintext highlighter-rouge">_posts</code> directory that follows the convention <code class="language-plaintext highlighter-rouge">YYYY-MM-DD-name-of-post.ext</code> and includes the necessary front matter. Take a look at the source for this post to get an idea about how it works.</p>
<p>Jekyll also offers powerful support for code snippets:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span class="k">def</span> <span class="nf">print_hi</span><span class="p">(</span><span class="nb">name</span><span class="p">)</span>
<span class="nb">puts</span> <span class="s2">"Hi, </span><span class="si">#{</span><span class="nb">name</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="n">print_hi</span><span class="p">(</span><span class="s1">'Tom'</span><span class="p">)</span>
<span class="c1">#=> prints 'Hi, Tom' to STDOUT.</span></code></pre></figure>
<p>Check out the <a href="https://jekyllrb.com/docs/home">Jekyll docs</a> for more info on how to get the most out of Jekyll. File all bugs/feature requests at <a href="https://github.com/jekyll/jekyll">Jekyll’s GitHub repo</a>. If you have questions, you can ask them on <a href="https://talk.jekyllrb.com/">Jekyll Talk</a>.</p>