previous next up top

Embedded probabilistic domain-specific language Hakaru10 (discrete and continuous distributions, MCMC (MH) sampling)

Probabilistic Programming Language and its Incremental Evaluation
The implementation of Hakaru10
Introductory examples and regression tests
Example: colored balls
Example: Bird migration problem

Probabilistic Programming Language and its Incremental Evaluation

[The Abstract of the paper]
This system description paper introduces the probabilistic programming language Hakaru10, for expressing, and performing inference on (general) graphical models. The language supports discrete and continuous distributions, mixture distributions and conditioning. Hakaru10 is a DSL embedded in Haskell and supports Monte-Carlo Markov Chain (MCMC) inference.
Hakaru10 is designed to address two main challenges of probabilistic programming: performance and correctness. It implements the incremental Metropolis-Hastings method, avoiding all redundant computations. In the presence of conditional branches, efficiently maintaining dependencies and correctly computing the acceptance ratio are non-trivial problems, solved in Hakaru10. The implementation is unique in being explicitly designed to satisfy the common equational laws of probabilistic programs. Hakaru10 is typed; specifically, its type system statically prevents meaningless conditioning, enforcing that the values to condition upon must indeed come from outside the model.
References: design.pdf [315K]
The paper published in Springer's LNCS 10017: the Proceedings of the 14th Asian Symposium on Programming Languages and Systems (APLAS 2016). November 21-23, 2016. Hanoi, Vietnam.
DOI: 10.1007/978-3-319-47958-3_19

The implementation of Hakaru10

The language Hakaru10 is shallowly embedded in Haskell and is, therefore, a regular Haskell library. Here is the code.

Version

The current version is November 2016.

References

Metropolis.hs [52K]
Metropolis-Hastings (MH) algorithm, the main body of Hakaru10

Syntax.hs [12K]
Syntax sugar: making the library more convenient to use

Distribution.hs [16K]
Primitive distributions

Util.hs [2K]
Utilities: computing histograms, momenta, KL divergence, etc.

Memory.hs [4K]
First-class storage

Introductory examples and regression tests

The `factorial example' of the Bayesian net is the lawn/grass model, with three random variables representing the events of rain, of a switched-on sprinkler and wet grass. The (a priori) probabilities of the first two events are judged to be 30% and 50% correspondingly. Rain almost certainly (90%) wets the grass. The sprinkler also makes the grass wet, in 80% of the cases. The grass may also be wet for some other reason. The modeler gives such an unaccounted event 10% of a chance. Having observed in the morning that the grass is wet, we want to find out the chance it was raining overnight. This model is encoded in Hakaru10 as follows:

     grass = do
      rain        <- dist bern 0.3
      sprinkler   <- dist bern 0.5
      --  dist (nor 0.9 0.8 0.1) rain sprinkler
      grass_is_wet <- dist (True `condition` nor 0.9 0.8 0.1) rain sprinkler
      return rain
     
     -- noisy-or function
     nor :: Double -> Double -> Double -> Bool -> Bool -> Distribution Bool
     nor strengthX strengthY noise x y =
       bern $ 1 - nnot (1-strengthX) x * nnot (1-strengthY) y * (1-noise)
     
     -- noisy not function
     nnot :: Num a => a -> Bool -> a
     nnot p True  = p
     nnot p False = 1

The model is an ordinary Haskell function grass whose inferred type is Model Bool. The function nor is a custom parameterized distribution, the noisy-or function. Sampling (20000 times) from the model -- evaluating mcmC 20000 grass -- and counting the number of True gives the posterior estimate of rain having observed that the grass is wet. See the introduction part of the paper for more explanation of the example.

References

Examples1.hs [8K]
All examples from the paper, and many more

Bench.hs [2K]
The benchmark code

Example: colored balls

This is an elegant example of a model with indefinite number of objects whose `real' identity is unobservable. The example is a simplified version of population estimation. HANSEI's implementation of the example makes the case for memoization of stochastic functions. The example was introduced as example 1.1 in Milch's et al. chapter on BLOG:
``An urn contains an unknown number of balls--say, a number chosen from a Poisson or a uniform distributions. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?''
References: Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< http://people.csail.mit.edu/milch/papers/blog-chapter.pdf >
ColoredBalls.hs [7K]
The implementation of the model and sample inferences: e.g., of the posterior distribution of the number of balls in the urn given that we drew ten balls and all appeared blue. Our results seem in excellent agreement with those by BLOG.

Example: Bird migration problem

Bird migration problem was one of the Challenge problems of the DARPA PPAML (Probabilistic Programming for Advanced Machine Learning) Program. The goal was to estimate the model parameters from the set of bird migration observations over the period of thirty years. The migration model is a log-linear model with four parameters that gives the probability of a bird flying from cell i to cell j under given conditions.

Bird migration problem was also the topic of the 2018 bachelor thesis of Sasaki Shoichiro. He was to learn Haskell, Hakaru10, try to implement the model and estimate its parameters, and reflect on his work. The first conclusion of his thesis is that the bird migration model is expressible in Hakaru10 -- in the form that rather closely reflects the model specification:

     model :: Features -> ObsData -> Model [Double]
     model features obs = do
       b1  <- abs <$> dist normal 0 10 -- Priors of the four parameters, b1..b4
       b2  <- abs <$> dist normal 0 10 -- see Statement of Problem, p9
       b3  <- abs <$> dist normal 0 10
       b4  <- abs <$> dist normal 0 10
       let bs = collect [b1,b2,b3,b4]
       forM_ [1..nYears] $ \year -> do -- each year
         foldM_ (\n day -> do -- one bird exists in cell n on each day(1..nDays-1)
           let trans n bs = [ (j,theta year day n j features bs) | j<-neighbors n]
           -- n' is the cell where the bird is on day+1
           let day' = day + 1
           n' <- dist categorical $ liftA2 trans n bs 
           forM_ [1..nCells] $ \i ->
             dist (fromIntegral (obs ! (year,day',i)) `condition` poisson)
                ((\nv -> if i==nv then 1 else 0) <$> n')
           return n'
           ) (1 :: SExp Int) [1..nDays-1]
       return bs

where theta computes the dot-product of the parameters and the feature vector at the day and year in question, and exponentiates the result. Due to the extreme time pressure, only the first of the three parts of the Challenge problem was implemented: the 1-bird dataset part. The second conclusion is that it turns out possible to learn Haskell and Hakaru10 within a month, leaving time to implement and run the model, analyze its results and compare with other solutions (BLOG). It did take time to get used to the `applicative' character of Hakaru10, and to use liftA and applicative combinators appropriately. The third conclusion is that Hakaru10 turns out performant: on his Mac laptop, he could obtain 1 million samples of the MCMC chain within 9 minutes. He could comfortably obtain even 10 million samples, which gave him enough material to investigate and verify the convergence. For comparison, the published BLOG solution to the same problem needed 49 minutes for 1000 samples on the same laptop.

Overall we conclude that Hakaru10 is good at least for a bachelor thesis.

References

BirdMigrationProblem.pdf [715K]
Tom Dietterich and Shahed Sorower: PPAML Challenge Problem: Bird Migration
Problem description. Version 6, August 30, 2014

Birds1.hs [7K]
The Hakaru10 bird migration model and the main inference module

Last updated July 14, 2018

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML