previous   next   up   top

Embedded probabilistic domain-specific language Hakaru10 (discrete and continuous distributions, MCMC (MH) sampling)

 


 

Probabilistic Programming Language and its Incremental Evaluation

[The Abstract of the paper]
This system description paper introduces the probabilistic programming language Hakaru10, for expressing, and performing inference on (general) graphical models. The language supports discrete and continuous distributions, mixture distributions and conditioning. Hakaru10 is a DSL embedded in Haskell and supports Monte-Carlo Markov Chain (MCMC) inference.

Hakaru10 is designed to address two main challenges of probabilistic programming: performance and correctness. It implements the incremental Metropolis-Hastings method, avoiding all redundant computations. In the presence of conditional branches, efficiently maintaining dependencies and correctly computing the acceptance ratio are non-trivial problems, solved in Hakaru10. The implementation is unique in being explicitly designed to satisfy the common equational laws of probabilistic programs. Hakaru10 is typed; specifically, its type system statically prevents meaningless conditioning, enforcing that the values to condition upon must indeed come from outside the model.

References
design.pdf [315K]
The paper published in Springer's LNCS 10017: the Proceedings of the 14th Asian Symposium on Programming Languages and Systems (APLAS 2016). November 21-23, 2016. Hanoi, Vietnam.
DOI: 10.1007/978-3-319-47958-3_19

 

The implementation of Hakaru10

The language Hakaru10 is shallowly embedded in Haskell and is, therefore, a regular Haskell library. Here is the code.
Version
The current version is November 2016.
References
Metropolis.hs [52K]
Metropolis-Hastings (MH) algorithm, the main body of Hakaru10

Syntax.hs [12K]
Syntax sugar: making the library more convenient to use

Distribution.hs [14K]
Primitive distributions

Util.hs [2K]
Utilities: computing histograms, momenta, KL divergence, etc.

Memory.hs [4K]
First-class storage

 

Introductory examples and regression tests

The `factorial example' of the Bayesian net is the lawn/grass model, with three random variables representing the events of rain, of a switched-on sprinkler and wet grass. The (a priori) probabilities of the first two events are judged to be 30% and 50% correspondingly. Rain almost certainly (90%) wets the grass. The sprinkler also makes the grass wet, in 80% of the cases. The grass may also be wet for some other reason. The modeler gives such an unaccounted event 10% of a chance. Having observed in the morning that the grass is wet, we want to find out the chance it was raining overnight. This model is encoded in Hakaru10 as follows:
     grass = do
      rain        <- dist bern 0.3
      sprinkler   <- dist bern 0.5
      --  dist (nor 0.9 0.8 0.1) rain sprinkler
      grass_is_wet <- dist (True `condition` nor 0.9 0.8 0.1) rain sprinkler
      return rain
     
     -- noisy-or function
     nor :: Double -> Double -> Double -> Bool -> Bool -> Distribution Bool
     nor strengthX strengthY noise x y =
       bern $ 1 - nnot (1-strengthX) x * nnot (1-strengthY) y * (1-noise)
     
     -- noisy not function
     nnot :: Num a => a -> Bool -> a
     nnot p True  = p
     nnot p False = 1

The model is an ordinary Haskell function grass whose inferred type is Model Bool. The function nor is a custom parameterized distribution, the noisy-or function. Sampling (20000 times) from the model -- evaluating mcmC 20000 grass -- and counting the number of True gives the posterior estimate of rain having observed that the grass is wet. See the introduction part of the paper for more explanation of the example.

References
Examples1.hs [8K]
All examples from the paper, and many more

Bench.hs [2K]
The benchmark code

 

Example: colored balls

This is an elegant example of a model with indefinite number of objects whose `real' identity is unobservable. The example is a simplified version of population estimation. HANSEI's implementation of the example makes the case for memoization of stochastic functions. The example was introduced as example 1.1 in Milch's et al. chapter on BLOG:

``An urn contains an unknown number of balls--say, a number chosen from a Poisson or a uniform distributions. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?''

References
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< http://people.csail.mit.edu/milch/papers/blog-chapter.pdf >

ColoredBalls.hs [7K]
The implementation of the model and sample inferences: e.g., of the posterior distribution of the number of balls in the urn given that we drew ten balls and all appeared blue. Our results seem in excellent agreement with those by BLOG.



Last updated January 1, 2017

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML