1. Week 13: Wrapping up (not really)

    I am still working on implementing stochastic processes with an API similar to the current one, but unfortunately haven't made much progress this week. On the other hand, expectations of expressions in the various 'components' of a joint random variable can now be calculated (implemented in #15079). One issue that came up while implementing this was how to approach arithmetic expressions of a joint random variable itself. Consider the following:

        >>> N = Normal('N', [0, 1], [[1, 0], [0, 1]])
        >>> E(N)

    Now, N[1] and N[0] take values in the set of real numbers, but N itself takes values in R*R, where R is the set of real numbers. Francesco suggested that we might consider a joint random variable to be a vector, but this issue has been left for a later time. As of now, running a piece of code like the example above will simply raise a NotImplementedError. …
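
    For reference, here is a minimal sketch of the component-wise expectations that #15079 enables; the values in the comments are the mathematically expected results for a mean vector of [0, 1], not verified output of the PR:

        from sympy.stats import Normal, E

        # indexing into the joint random variable, as in the example above
        N = Normal('N', [0, 1], [[1, 0], [0, 1]])
        E(N[0])         # 0
        E(N[1])         # 1
        E(N[0] + N[1])  # 0 + 1 = 1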


  2. Week 12: Stochastic Processes

    This last week turned out to be particularly tedious for me. The GSoC period is about to come to an end, the final evaluations are here, and yet again my laptop malfunctioned. But that's a whole other story. This week's task was to complete the implementation of stochastic processes, and hopefully get it merged into the SymPy master before the coding period ends. This was one of the first things my mentors discussed with me, and something I got almost completely wrong in my project proposal. However, with joint distributions already being part of the master branch, the task seems quite feasible. Right now, I have made a PR that implements the Bernoulli process. The process is implemented so that indexing it returns a random variable with the Bernoulli distribution, and it contains a method which returns a joint distribution given a set of keys. For example:

        >>> B = BernoulliProcess('x', S(1)/3)
        >>> x, y = symbols('x y', integer=True)
        >>> assert E(B[1]) == S(1)/3
        >>> assert E(B[x + y], B[x]) == S(1)/3

    …
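
    As a point of reference, an indexed value of the process is meant to behave like a plain Bernoulli random variable with the same success probability. Here is a small sketch using the existing univariate Bernoulli API, standing in for B[1] above rather than showing the process implementation itself:

        from sympy import S, Eq
        from sympy.stats import Bernoulli, E, P

        # a univariate Bernoulli RV with success probability 1/3,
        # standing in for an indexed value of the process
        B1 = Bernoulli('B1', S(1)/3)
        E(B1)         # 1/3
        P(Eq(B1, 1))  # 1/3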


  3. Week 11: Compound distributions(4)

    My mentors were not very convinced about creating a JointDistributionHandmade object out of each compound distribution and marginalising over the latent distribution. Finally, after some discussion on the Gitter channel and on #14989, the API that was decided upon handles compound distributions so that they give results as follows:

        N1 = Normal('N1', 0, 1)
        N2 = Normal('N2', N1, 2)
        assert density(N2)(0).doit() == sqrt(10)/(10*sqrt(pi))
        assert simplify(density(N2, Eq(N1, 1))(x)) == \
            sqrt(2)*exp(-(x - 1)**2/8)/(4*sqrt(pi))

    This doesn't break the previous API, and gives the correct results in terms of density. It is done by creating a CompoundDistribution object instead of a SingleDistribution object, and calculating the PDF using MarginalDistribution. There are checks in place to ensure that CompoundDistribution objects indeed have a random variable as an argument, and single distribution objects are returned otherwise. Though compound distributions took longer than expected, the PR was merged into the master branch, and compound distributions are available in the development version. …
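
    For intuition, the same unconditional density can be reproduced by writing the two normal densities by hand and integrating out the latent variable N1 with plain SymPy; this is only a consistency check of the value above, not the code path the module uses:

        from sympy import symbols, exp, sqrt, pi, integrate, oo, simplify

        x, n1 = symbols('x n1')
        pdf_N1 = exp(-n1**2/2)/sqrt(2*pi)                     # N1 ~ Normal(0, 1)
        pdf_N2_given_N1 = exp(-(x - n1)**2/8)/(2*sqrt(2*pi))  # N2 | N1 ~ Normal(N1, 2)

        # marginalise over the latent variable N1
        marginal = integrate(pdf_N1 * pdf_N2_given_N1, (n1, -oo, oo))
        simplify(marginal.subs(x, 0))  # equals sqrt(10)/(10*sqrt(pi))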


  4. Week 9/10: Compound distributions(3)

    I couldn't update the blog last week due to some issues with my laptop (I hope they don't cause more trouble any time soon), so this post contains the updates for the last week as well as the one before it. …


  5. Week 8: Compound distributions(2)

    After some changes, #14847 was merged. This means that SymPy now supports 4 different kinds of pre-defined joint distributions, as listed in the PR description. A 'TODO' that's best left for later is adding support for the methods that single probability distributions provide: variance(), expectation() and probability(). While I am not sure about all the challenges that might arise while implementing these, probability might be trickier than the others because it will require SymPy to solve inequalities in multiple variables, which is not yet supported. …
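
    For comparison, this is the behaviour the univariate distributions already provide, and which the joint classes would eventually need to mirror; a minimal sketch with the standard Normal API:

        from sympy.stats import Normal, E, variance, P

        X = Normal('X', 0, 1)
        E(X)         # 0
        variance(X)  # 1
        P(X > 0)     # 1/2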


  6. Week 7: Starting with Compound distributions

    I was finally able to get #14764 merged, thus bringing joint probability distributions to the master branch of SymPy. While I have made contributions to the stats module before, this was my first major implementation, and the first new feature I have added to the module, or to SymPy as a whole. I have also opened #14847, which contains certain other predefined joint probability distributions. …


  7. Week 6: Joint Probability distributions (continued)

    This week, too, was spent working on joint probability distributions. I should have been working on compound probability distributions by now, but multivariate distributions took more time than I anticipated. I think I'll be able to wrap up joint probability distributions soon, and hopefully catch up with the timeline I suggested in my project proposal. As for this week's progress, a class MarginalPSpace has been implemented, along with some multivariate distributions. These include the multivariate normal, multivariate Laplace, multivariate Student's t, and normal-gamma distributions. The user can call the respective functions to get an object of JointPSpace, and can calculate the density or the marginal density at any point in the domain, as per the code in #14764. Here's an example:

        >>> from sympy.stats.joint_rv_types import MultivariateNormal
        >>> m = MultivariateNormal((x, y), [1, 2], [[1, 0], [0, 1]])
        >>> density(m)(1, 2)
        1/(2*pi)
        >>> n = MultivariateNormal(('x', 'y', 'z'), [1, 2, 3], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
        >>> marginal_density(n, x, y)(1, 2)
        1/(2*pi)

    Hopefully this PR can get merged without many modifications, and I can move on to implementing compound probability distributions. …
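
    As a sanity check of the 1/(2*pi) value above, the multivariate normal density can be written out by hand and evaluated at the mean with an identity covariance; this is just the textbook formula, not the module's implementation:

        from sympy import Matrix, exp, pi, sqrt, S

        mu = Matrix([1, 2])
        sigma = Matrix([[1, 0], [0, 1]])
        xv = Matrix([1, 2])  # evaluate at the mean
        k = 2                # dimension

        quad = ((xv - mu).T * sigma.inv() * (xv - mu))[0, 0]
        pdf = exp(-quad/2) / ((2*pi)**(S(k)/2) * sqrt(sigma.det()))
        # pdf simplifies to 1/(2*pi)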


  8. Week 5: Joint Probability distributions

    [#14777] was merged earlier this week by Francesco, allowing SymPy to handle mixed expressions, i.e., expressions containing both continuous and discrete random variables. The expression returned is not simplified as of now, and there were some issues regarding the correctness of the result. However, we were able to verify that the returned expression is indeed the correct answer by checking it for some trivial cases. …
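
    A minimal illustration of the kind of mixed expression involved; by linearity the expectation should come out as 0 + 4 = 4, though, as noted above, the raw result may need further simplification:

        from sympy.stats import Normal, Poisson, E

        X = Normal('X', 0, 1)  # continuous
        Y = Poisson('Y', 4)    # discrete
        E(X + Y)               # should evaluate to 4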


  9. Week 4: Mixed product spaces

    The SymPy stats module implements the class ProductPSpace, which can be used for calculations where multiple random variables are involved. Here's an example:

        >>> from sympy.stats import *
        >>> N1, N2 = Normal('N1', 0, 1), Normal('N2', 1, 1)
        >>> P(N1 + N2 > 1)
        1/2

    However, the implementation did not allow the user to do calculations on expressions that mix discrete and continuous random variables. With PR #14777, the user can calculate probabilities of conditions consisting of both discrete and continuous random variables. This was done by removing the subclasses ProductDiscretePSpace and ProductContinuousPSpace, and giving ProductPSpace its own probability and compute_density functions. Apart from adding the functionality, one added benefit of doing this was reducing the complexity of the code caused by too many classes and subclasses in the stats module, an issue raised earlier by Francesco and Kalevi. …
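
    A sketch of the kind of query this enables, mixing a die roll with a normal random variable; the returned expression may still contain unevaluated sums or integrals:

        from sympy.stats import Normal, Die, P

        N = Normal('N', 0, 1)
        D = Die('D', 6)
        P(N + D > 3)  # condition over both a discrete and a continuous RV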


  10. Week 3: Joint Probability spaces (1)

    I spent most of this week working on joint probability spaces, and have uploaded an incomplete pull request, #14764. This PR implements a basic structure for the joint probability classes. Like I said, the PR is still incomplete and requires a lot of work before it can be merged, but hopefully I should be able to complete the implementation on time. Right now, the work done primarily focuses on creating joint probability spaces, and the underlying domain and density, out of independent random variables. Example:

        >>> from sympy.stats.joint_rv import *
        >>> from sympy.stats import Geometric, Poisson
        >>> from sympy import S
        >>> from sympy.abc import x, y
        >>> X, Y = Geometric('X', S(1)/2), Poisson('Y', 4)
        >>> Z = Joint('Z', (X, Y))
        >>> density(Z)(x, y)
        2**(-x + 1)*4**y*exp(-4)/(2*factorial(y))

    For the next week, I plan to extend joint spaces to include user-defined distributions, for which individual random variables have not been declared earlier. …
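
    For independent random variables, the joint density should factor into the product of the marginals; a small sketch of that check using the existing univariate API, which should agree, up to simplification, with the output of density(Z)(x, y) above:

        from sympy import S
        from sympy.stats import Geometric, Poisson, density
        from sympy.abc import x, y

        X, Y = Geometric('X', S(1)/2), Poisson('Y', 4)
        joint_pdf = density(X)(x) * density(Y)(y)
        # product of the marginals; should match the joint density above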