Akash VaishEntity from Middle Earth
http://akash9712.github.io/
Mon, 13 Aug 2018 13:20:10 +0000Mon, 13 Aug 2018 13:20:10 +0000Jekyll v3.7.3Week 13: Wrapping up (not really)<p>I am still working on implementing stochastic processes with an API similar to the existing one, but unfortunately I haven’t made much progress this week. On the other hand, expectations of expressions in the various ‘components’ of a joint random variable can now be calculated (implemented in <a href="https://github.com/sympy/sympy/pull/15079">#15079</a>). One issue that came up while implementing this was how to handle arithmetic expressions of a joint random variable itself. Consider the following:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> N = Normal('N', [0, 1], [[1, 0], [0, 1]])
>>> E(N)
</code></pre></div></div>
<p>Now, <code class="highlighter-rouge">N[1]</code> and <code class="highlighter-rouge">N[0]</code> take values in the set of real numbers, but <code class="highlighter-rouge">N</code> itself takes values in <code class="highlighter-rouge">R*R</code>, where <code class="highlighter-rouge">R</code> is the set of real numbers. Francesco suggested that we might consider a joint random variable to be a vector, but this issue has been left for later. As of now, running the code in the example simply raises a <code class="highlighter-rouge">NotImplementedError</code>.</p>
<hr />
<p>The GSoC 2018 coding period officially ends on August 14th, which means this is the last post in this GSoC blog series. I plan to keep contributing to the SymPy statistics module, and to implement the work that could not be completed over the summer in the near future. Hopefully, the features added over the summer will be useful to anyone using SymPy/stats.</p>
<p>A summary of the work I did as a part of GSoC 2018 can be found <a href="https://github.com/akash9712/GSoC-Report/wiki/GSoC-2018:-Improving-Probability-and-Random-processes:-Report">here</a>.</p>
Sun, 12 Aug 2018 00:00:00 +0000
http://akash9712.github.io/2018/week13/
http://akash9712.github.io/2018/week13/GSoCGSoCWeek 12: Stochastic Processes<p>This last week turned out to be particularly tedious for me. The GSoC period is about to come to an end, the final evals are here, and yet again, my laptop malfunctioned. But that’s a whole other story.
This week’s task was to complete the implementation of stochastic processes, and hopefully get it merged into SymPy master before the coding period ends. This was one of the first things my mentors discussed with me, and something I got almost completely wrong in my project proposal. However, with joint distributions already a part of the master branch, the task seems quite feasible.
Right now, I have made <a href="https://github.com/sympy/sympy/pull/15058">a PR</a> that implements a Bernoulli process. The process is implemented so that indexing it returns a random variable with the Bernoulli distribution, and it contains a method which returns a joint distribution given a set of keys. For example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> B = BernoulliProcess('x', S(1)/3)
>>> x, y = symbols('x y', integer=True)
>>> assert E(B[1]) == S(1)/3
>>> assert E(B[x + y], B[x]) == S(1)/3
</code></pre></div></div>
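<p>To make the indexing idea concrete, here is a minimal sketch that does not use SymPy at all. The class <code class="highlighter-rouge">ToyBernoulliProcess</code>, its <code class="highlighter-rouge">joint</code> method and the <code class="highlighter-rouge">expectation</code> helper are hypothetical names made up for this illustration; they are not the API from the PR. The sketch assumes that distinct indices are independent, which is what makes the joint pmf a simple product.</p>

```python
from fractions import Fraction
from itertools import product


class ToyBernoulliProcess:
    """Hypothetical stand-in for a Bernoulli process (not SymPy's class)."""

    def __init__(self, p):
        self.p = Fraction(p)

    def __getitem__(self, key):
        # Every index has the same Bernoulli(p) marginal: {outcome: probability}.
        return {1: self.p, 0: 1 - self.p}

    def joint(self, *keys):
        # Distinct indices are assumed independent, so the joint pmf is a product.
        pmf = {}
        for outcome in product((0, 1), repeat=len(keys)):
            prob = Fraction(1)
            for key, value in zip(keys, outcome):
                prob *= self[key][value]
            pmf[outcome] = prob
        return pmf


def expectation(pmf):
    # Expected value of a single-index marginal given as {outcome: probability}.
    return sum(value * prob for value, prob in pmf.items())


B = ToyBernoulliProcess(Fraction(1, 3))
joint_pmf = B.joint(1, 2)
```

<p>Here <code class="highlighter-rouge">expectation(B[1])</code> plays the role of <code class="highlighter-rouge">E(B[1])</code> in the example above, and <code class="highlighter-rouge">joint_pmf</code> is a dictionary keyed by outcome tuples.</p>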
<p>One other change made this week was in PR <a href="https://github.com/sympy/sympy/pull/15045">#15045</a>. With the changes in this PR, random variables are no longer needed to create a marginal distribution; it can be created from just a joint distribution and the indices that should be part of the marginal distribution. This means that compound distributions, whose PDF was previously implemented by creating a marginal distribution and later marginalising out all the latent distributions, now calculate their PDFs independently. This makes the code in <code class="highlighter-rouge">MarginalDistribution.pdf</code> a little uglier, but it is more in line with how the other random distributions in SymPy are created, i.e., without the use of random variables.</p>
Wed, 08 Aug 2018 00:00:00 +0000
http://akash9712.github.io/2018/week12/
http://akash9712.github.io/2018/week12/GSoCGSoCWeek 11: Compound distributions (4)<p>My mentors were not very convinced about creating a <code class="highlighter-rouge">JointDistributionHandmade</code> object out of each compound distribution and marginalising over the latent distribution.
Finally, after some discussion on the Gitter channel and on <a href="https://github.com/sympy/sympy/pull/14989">#14989</a>, the API that was decided on handles compound distributions as follows:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>N1 = Normal('N1', 0, 1)
N2 = Normal('N2', N1, 2)
assert density(N2)(0).doit() == sqrt(10)/(10*sqrt(pi))
assert simplify(density(N2, Eq(N1, 1))(x)) == \
sqrt(2)*exp(-(x - 1)**2/8)/(4*sqrt(pi))
</code></pre></div></div>
<p>This result doesn’t break the previous API, and gives the correct results in terms of density. This is done by creating a <code class="highlighter-rouge">CompoundDistribution</code> object instead of <code class="highlighter-rouge">SingleDistribution</code>
objects, and calculating the PDFs using <code class="highlighter-rouge">MarginalDistribution</code>. There are checks in place to ensure that a <code class="highlighter-rouge">CompoundDistribution</code> object indeed has an RV as an argument; otherwise, a single distribution object is returned.
Though compound distributions took a longer time than expected, the PR was merged into the master branch, and compound distributions are implemented in the development version.</p>
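<p>The two values asserted in the example above can be sanity-checked numerically without SymPy. The sketch below integrates the latent mean out with a plain trapezoidal rule; the helper names, the integration range and the step count are my own choices for illustration and have nothing to do with how <code class="highlighter-rouge">CompoundDistribution</code> computes its PDF.</p>

```python
import math


def normal_pdf(x, mean, std):
    """Density of Normal(mean, std) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def compound_density(x, lo=-12.0, hi=12.0, steps=24000):
    """Marginal density of N2 at x: integrate N1 out with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        m = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # N1 ~ Normal(0, 1) is the latent mean; N2 given N1 = m is Normal(m, 2).
        total += weight * normal_pdf(m, 0.0, 1.0) * normal_pdf(x, m, 2.0)
    return total * h


marginal_at_zero = compound_density(0.0)
expected_marginal = math.sqrt(10) / (10 * math.sqrt(math.pi))

# Conditioning on N1 = 1 leaves an ordinary Normal(1, 2) density.
x = 0.5
conditional = normal_pdf(x, 1.0, 2.0)
expected_conditional = (
    math.sqrt(2) * math.exp(-(x - 1) ** 2 / 8) / (4 * math.sqrt(math.pi))
)
```

<p>Both pairs agree to within floating-point and quadrature error, which is consistent with the marginal of <code class="highlighter-rouge">N2</code> being a normal distribution with standard deviation <code class="highlighter-rouge">sqrt(5)</code>.</p>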
Sun, 29 Jul 2018 00:00:00 +0000
http://akash9712.github.io/2018/week11/
http://akash9712.github.io/2018/week11/GSoCGSoCWeek 9/10: Compound distributions (3)<p>I couldn’t update the blog last week due to some issues with my laptop (I hope they don’t cause more trouble any time soon), so this post contains the updates from last week as well as the one before it.</p>
<p>I tried another approach to handling compound distributions (<a href="https://github.com/sympy/sympy/pull/14888">#14888</a>), which was basically hardcoding the known results for compound distributions, i.e., writing combinations of <code class="highlighter-rouge">if/else</code> statements to identify whether a combination of an ‘outer’ distribution and a latent distribution is known to give a certain resultant distribution. Here’s a piece of code from the PR doing that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if cls == NormalDistribution:  # cls -> outer distribution of the compound RV
    if isinstance(args[0], RandomSymbol) and \
            isinstance(distribution(args[0]), NormalDistribution):
        mu, sigma = distribution(args[0]).args
        return NormalDistribution, (mu, sqrt(sigma**2 + args[1]**2))
</code></pre></div></div>
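<p>The rule hard-coded in that snippet is the standard result that a normal distribution whose mean is itself normal is again normal, with the two variances added. As a quick illustration, under the assumption that the compound variable can be written as the latent mean plus independent noise, Python’s standard-library <code class="highlighter-rouge">statistics.NormalDist</code> reproduces the same parameters. This is only a cross-check and is unrelated to the code in the PR.</p>

```python
import math
from statistics import NormalDist

# Latent mean N1 ~ Normal(0, 1); given N1, N2 ~ Normal(N1, 2).
latent = NormalDist(mu=0.0, sigma=1.0)
noise = NormalDist(mu=0.0, sigma=2.0)

# N2 has the same law as N1 + noise with independent noise, and NormalDist
# addition assumes independence, so the sum is the compound distribution.
compound = latent + noise

# The hard-coded rule: keep the latent mean, add the variances.
rule_std = math.sqrt(latent.stdev ** 2 + noise.stdev ** 2)
```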
<p>The issue that remained unaddressed was how to reflect the conditions imposed on latent distributions, which resulted in failing tests. An approach suggested by Francesco to solve this issue was to use joint distributions. The current implementation is written to return an RV whose distribution is not a <code class="highlighter-rouge">SingleDistribution</code>, but rather a <code class="highlighter-rouge">MarginalDistribution</code>. The advantage of doing this instead of marginalising at the very beginning is that, since the random variable is still a part of the joint distribution, <code class="highlighter-rouge">given</code> can modify it according to any condition provided by the user. The outcome is that the result is mathematically correct. It would, however, require some changes to the current API, because of the following failing test (or similar ones):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sympy.abc import l, x
rate = Beta(l, 2, 3)
X = Poisson(x, rate)
assert density(X, Eq(rate, rate.symbol)) == PoissonDistribution(l)
</code></pre></div></div>
<p>Contrary to what the test expects, the output for this test case would be the PDF of <code class="highlighter-rouge">Poisson(x, l)</code>, <code class="highlighter-rouge">l</code> being a <code class="highlighter-rouge">Symbol</code>.</p>
Sun, 22 Jul 2018 00:00:00 +0000
http://akash9712.github.io/2018/week9-10/
http://akash9712.github.io/2018/week9-10/GSoCGSoCWeek 8: Compound distributions (2)<p>After some changes, <a href="https://github.com/sympy/sympy/pull/14847">#14847</a> was merged. This means that SymPy now supports 4 different kinds of pre-defined joint distributions, as listed in the PR description. A ‘TODO’ that’s best left for later is adding support for the methods that single probability distributions support: <code class="highlighter-rouge">variance( )</code>, <code class="highlighter-rouge">expectation( )</code> and <code class="highlighter-rouge">probability( )</code>. While I am not sure about all the challenges that might arise while implementing these, implementing <code class="highlighter-rouge">probability</code> might be a bit trickier than the others, because it will require SymPy to solve inequalities in multiple variables, which is not yet supported.</p>
<p>Compound distributions are turning out to be more complicated than I expected. To start with, I am not very sure what the result should look like. Simply leaving a compound random variable in the state given by the user could be one way, but it might not be mathematically correct. For example, suppose a user types the following into a Python console:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sympy.stats import Poisson
Y1 = Poisson('Y1', 2)
Y2 = Poisson('Y2', Y1)
</code></pre></div></div>
<p>Since Y2 is not a Poisson random variable with a constant mean, it would be incorrect for <code class="highlighter-rouge">Y2.pspace.distribution</code> to return a <code class="highlighter-rouge">PoissonDistribution</code> object. Changing it to <code class="highlighter-rouge">JointDistribution</code> at the time of its creation, however, is an issue because I cannot yet figure out how to reflect a given condition imposed on the latent distribution (<code class="highlighter-rouge">Y1</code> in the given example).
Here’s an example of what is returned with the current changes in PR <a href="https://github.com/sympy/sympy/pull/14855">#14855</a>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>N1 = Normal('N1', 0, 1)
N2 = Normal('N2', N1, 2)
N = Symbol('N2')
assert simplify(N2.pspace.pdf) == sqrt(10)*exp(-N**2/10)/(10*sqrt(pi))
</code></pre></div></div>
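<p>Going back to the Poisson example, one way to see that Y2 is not Poisson is to compare its mean and variance, which are equal for any Poisson distribution. The sketch below builds the marginal pmf of Y2 by summing over the latent Y1. The truncation points (60 for Y1, 80 for Y2) are assumptions chosen so that the neglected tail mass is negligible; the function names are made up for this illustration.</p>

```python
import math


def poisson_pmf(k, rate):
    """P(K = k) for K ~ Poisson(rate); a rate of 0 puts all mass on k = 0."""
    if rate == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))


def compound_pmf(k, outer_rate=2.0, latent_cutoff=60):
    """P(Y2 = k) where Y1 ~ Poisson(outer_rate) and Y2 given Y1 is Poisson(Y1)."""
    return sum(
        poisson_pmf(j, outer_rate) * poisson_pmf(k, j)
        for j in range(latent_cutoff)
    )


support = range(80)
pmf = [compound_pmf(k) for k in support]
mean = sum(k * p for k, p in zip(support, pmf))
variance = sum((k - mean) ** 2 * p for k, p in zip(support, pmf))
```

<p>The mean comes out as 2 while the variance comes out as 4, as the law of total variance predicts, so no single <code class="highlighter-rouge">PoissonDistribution</code> can describe Y2.</p>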
Sun, 08 Jul 2018 00:00:00 +0000
http://akash9712.github.io/2018/week8/
http://akash9712.github.io/2018/week8/GSoCGSoCWeek 7: Starting with Compound distributions<p>I was finally able to get <a href="https://github.com/sympy/sympy/pull/14764">#14764</a> merged, thus having implemented joint probability distributions on the master branch of SymPy. While I have made contributions to the stats module before, this was my first major implementation, and the first new feature I have added to the module, or to SymPy as a whole. I also opened <a href="https://github.com/sympy/sympy/pull/14847">#14847</a>, which contains certain other predefined joint probability distributions.</p>
<p>I also started working on compound probability distributions, and opened <a href="https://github.com/sympy/sympy/pull/14855">#14855</a>. The PR implements a method to calculate the PDF of any random variable which may have one or more of its parameters randomly distributed. However, I believe several changes will have to be made before this can be merged into the master branch. The first decision that needs to be made concerns the probability spaces of such distributions: whether they should belong to <code class="highlighter-rouge">SinglePSpace</code> or <code class="highlighter-rouge">JointPSpace</code>/<code class="highlighter-rouge">ProductPSpace</code>. As of now, with the code in <a href="https://github.com/sympy/sympy/pull/14855">#14855</a>, the PDF expression of a compounded RV can be obtained as seen in the following example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> from sympy.stats import *
>>> E1 = Exponential('E1', 1)
>>> E2 = Exponential('E2', E1)
>>> pprint(E2.pspace.pdf)
⎧    1                         π
⎪─────────                 for ─ ≥ │arg(E₂)│
⎪        2                     2
⎪(E₂ + 1)
⎪
⎨∞
⎪⌠
⎪⎮     -E₁  -E₁⋅E₂
⎪⎮ E₁⋅ℯ   ⋅ℯ       d(E₁)   otherwise
⎪⌡
⎩0
</code></pre></div></div>
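<p>The first branch of that piecewise result says that, for real non-negative values of E2, the density is <code class="highlighter-rouge">1/(E2 + 1)**2</code>. A rough numerical check of this, independent of SymPy, is to evaluate the integral in the second branch with the trapezoidal rule. The upper limit of 60 and the step count are assumptions picked so that the truncated tail and the quadrature error stay well below the tolerance used.</p>

```python
import math


def compound_exponential_pdf(x, upper=60.0, steps=60000):
    """Integrate the latent rate E1 ~ Exponential(1) out of Exponential(E1)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        rate = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # Prior density exp(-rate) times conditional density rate * exp(-rate * x).
        total += weight * math.exp(-rate) * rate * math.exp(-rate * x)
    return total * h


points = [0.0, 0.5, 1.0, 3.0]
numeric = [compound_exponential_pdf(x) for x in points]
closed_form = [1.0 / (x + 1.0) ** 2 for x in points]
```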
Sun, 01 Jul 2018 00:00:00 +0000
http://akash9712.github.io/2018/week7/
http://akash9712.github.io/2018/week7/GSoCGSoCWeek 6: Joint Probability distributions (continued)<p>This week, too, was spent working on joint probability distributions. I should have been working on compound probability distributions by now, but multivariate distributions took more time than I anticipated. I think I’ll be able to wrap up joint probability distributions soon, and hopefully catch up with the timeline I suggested in my project proposal.
As for this week’s progress, a class <code class="highlighter-rouge">MarginalPSpace</code> has been implemented, along with some multivariate distributions. These include the multivariate normal, multivariate Laplace, multivariate Student, and normal gamma distributions. The user can call the respective functions to return an object of <code class="highlighter-rouge">JointPSpace</code>, and can calculate the density or the marginal density at any point in the domain, as per the code in <a href="https://github.com/sympy/sympy/pull/14764">#14764</a>. Here’s an example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> from sympy.stats.joint_rv_types import MultivariateNormal
>>> m = MultivariateNormal((x, y), [1, 2], [[1, 0], [0, 1]])
>>> density(m)(1, 2)
1/(2*pi)
>>> n = MultivariateNormal(('x', 'y', 'z'), [1, 2, 3], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> marginal_density(n, x, y)(1, 2)
1/(2*pi)
</code></pre></div></div>
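<p>Both outputs can be reproduced by hand, because the covariance matrices in the example are identity matrices and the density therefore factorises into univariate normal densities. The sketch below is a SymPy-free check under that diagonal-covariance assumption; the helper names are hypothetical, and the marginal is obtained by numerically integrating the third coordinate out.</p>

```python
import math


def diagonal_mvn_density(point, mean, variances):
    """Density of a multivariate normal with a diagonal covariance matrix."""
    density = 1.0
    for x, mu, var in zip(point, mean, variances):
        density *= math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(
            2.0 * math.pi * var
        )
    return density


def marginal_xy(point_xy, mean, variances, half_width=10.0, steps=20000):
    """Integrate the third coordinate out with the trapezoidal rule."""
    lo = mean[2] - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        z = lo + i * h
        total += weight * diagonal_mvn_density(point_xy + (z,), mean, variances)
    return total * h


# Two-dimensional density evaluated at its mean.
joint_value = diagonal_mvn_density((1, 2), (1, 2), (1, 1))

# Three-dimensional case with z marginalised out, evaluated at (1, 2).
marginal_value = marginal_xy((1, 2), (1, 2, 3), (1, 1, 1))
```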
<p>Hopefully this PR can get merged without many modifications, and I can move on to implementing compound probability distributions.</p>
Sun, 24 Jun 2018 00:00:00 +0000
http://akash9712.github.io/2018/week6/
http://akash9712.github.io/2018/week6/GSoCGSoCWeek 5: Joint Probability distributions<p><a href="https://github.com/sympy/sympy/pull/14777">#14777</a> was merged earlier this week by Francesco, allowing SymPy to handle mixed expressions, i.e., expressions containing both continuous and discrete random variables. The expression returned is not simplified as of now, and there were some questions about the correctness of the result. However, we were able to verify that the expression returned is indeed the correct answer by checking it for some trivial cases.</p>
<p>I am still working on joint probability spaces, and my approach to the functionality has changed, as recommended by my project mentors. I will add some multivariate distributions that the user will be allowed to initialize, similar to the single-variable cases.
For example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#input individual symbols, mean vector, and covariance matrix for multivariate
#normal distribution.
>>> m = MultivariateNormal('m', ('m1', 'm2'), [1, 2], [[1, 0], [0, 1]])
>>> density(m)(1, 2)
1/(2*pi)
</code></pre></div></div>
Sun, 17 Jun 2018 00:00:00 +0000
http://akash9712.github.io/2018/week5/
http://akash9712.github.io/2018/week5/GSoCGSoCWeek 4: Mixed product spaces<p>The SymPy stats module implements the class <code class="highlighter-rouge">ProductPSpace</code>, which can be used for calculations where multiple random variables are involved. Here’s an example.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> from sympy.stats import *
>>> N1, N2 = Normal('N1', 0, 1), Normal('N2', 1, 1)
>>> P(N1 + N2 > 1)
1/2
</code></pre></div></div>
<p>However, the implementation does not allow the user to do calculations on expressions that are a mixture of discrete and continuous random variables. With the PR <a href="https://github.com/sympy/sympy/pull/14777">#14777</a>, the user can calculate probabilities of conditions consisting of both discrete and continuous random variables. This was done by removing the subclasses <code class="highlighter-rouge">ProductDiscretePSpace</code> and <code class="highlighter-rouge">ProductContinuousPSpace</code>, and giving <code class="highlighter-rouge">ProductPSpace</code> its own <code class="highlighter-rouge">Probability</code> and <code class="highlighter-rouge">compute_density</code> functions. Apart from adding the functionality, one added benefit of doing this was reducing the complexity caused by having too many classes and subclasses in the stats module, an issue raised earlier by Francesco and Kalevi.</p>
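<p>For intuition about what a mixed probability means, here is a small SymPy-free sketch. It conditions on each value of the discrete variable and uses the continuous CDF for the rest. This is just one textbook way of computing such a probability for independent variables, and I am not claiming it is how <code class="highlighter-rouge">ProductPSpace</code> does it; the function name and the example variables are made up.</p>

```python
from statistics import NormalDist


def prob_sum_exceeds(discrete_pmf, continuous, threshold):
    """P(K + X > threshold) for independent discrete K and continuous X."""
    total = 0.0
    for k, p_k in discrete_pmf.items():
        # Condition on K = k, then use the continuous CDF for X > threshold - k.
        total += p_k * (1.0 - continuous.cdf(threshold - k))
    return total


coin = {0: 0.5, 1: 0.5}  # Bernoulli(1/2) written out as an explicit pmf
standard_normal = NormalDist(0.0, 1.0)
result = prob_sum_exceeds(coin, standard_normal, 0.5)
```

<p>By symmetry of the normal distribution around 0, the two conditional terms add up to exactly one half here, which makes the example easy to verify by hand.</p>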
Sun, 10 Jun 2018 00:00:00 +0000
http://akash9712.github.io/2018/week4/
http://akash9712.github.io/2018/week4/GSoCGSoCWeek 3: Joint Probability spaces (1)<p>I spent most of this week working on joint probability spaces, and have uploaded an incomplete pull request, <a href="https://github.com/sympy/sympy/pull/14764">#14764</a>. This PR implements a basic structure for the joint probability classes. As I said, the PR is still incomplete and requires a lot of work before it can be merged, but hopefully I should be able to complete the implementation on time. Right now, the work primarily focuses on creating joint probability spaces, and the underlying domain and density, out of independent random variables.
Example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> from sympy.stats.joint_rv import *
>>> from sympy.stats import Geometric, Poisson
>>> from sympy import S
>>> from sympy.abc import x, y
>>> X, Y = Geometric('X', S(1)/2), Poisson('Y', 4)
>>> Z = Joint('Z', (X, Y))
>>> density(Z)(x, y)
2**(-x + 1)*4**y*exp(-4)/(2*factorial(y))
</code></pre></div></div>
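<p>Since X and Y are independent, the joint density printed above should simply be the product of the Geometric(1/2) and Poisson(4) pmfs. The following stdlib-only sketch checks that at a few points; the function names are my own, and the Geometric distribution is assumed to be supported on 1, 2, 3, and so on.</p>

```python
import math


def geometric_pmf(x, p):
    """P(X = x) for X ~ Geometric(p) supported on 1, 2, 3, ..."""
    return (1.0 - p) ** (x - 1) * p


def poisson_pmf(y, rate):
    """P(Y = y) for Y ~ Poisson(rate)."""
    return rate ** y * math.exp(-rate) / math.factorial(y)


def joint_density(x, y):
    # Independence means the joint pmf is the product of the two marginals.
    return geometric_pmf(x, 0.5) * poisson_pmf(y, 4.0)


def printed_expression(x, y):
    # The expression from the output above, evaluated numerically.
    return 2.0 ** (-x + 1) * 4.0 ** y * math.exp(-4.0) / (2.0 * math.factorial(y))


pairs = [(1, 0), (2, 3), (5, 7)]
differences = [abs(joint_density(x, y) - printed_expression(x, y)) for x, y in pairs]
```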
<p>Next week, I plan to extend joint spaces to support user-defined distributions, for which individual random variables have not been declared beforehand.</p>
Sun, 03 Jun 2018 00:00:00 +0000
http://akash9712.github.io/2018/week3/
http://akash9712.github.io/2018/week3/GSoCGSoC