Hakaru10 is designed to address two main challenges of probabilistic programming: performance and correctness. It implements the incremental Metropolis-Hastings method, avoiding all redundant computations. In the presence of conditional branches, efficiently maintaining dependencies and correctly computing the acceptance ratio are non-trivial problems, solved in Hakaru10. The implementation is unique in being explicitly designed to satisfy the common equational laws of probabilistic programs. Hakaru10 is typed; specifically, its type system statically prevents meaningless conditioning, enforcing that the values to condition upon must indeed come from outside the model.
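For reference, the acceptance ratio mentioned above is, in its textbook form, the following. The symbols are standard notation rather than names from Hakaru10: $p$ is the (possibly unnormalized) target density, $q$ the proposal distribution, $x$ the current state and $x'$ the proposed one. This is only the generic formula; it does not show how the ratio is computed when a proposal switches conditional branches.

```latex
\alpha(x \to x') \;=\; \min\!\left(1,\;
  \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right)
```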
Syntax sugar: making the library more convenient to use
Utilities: computing histograms, moments, KL divergence, etc.
    grass = do
      rain         <- dist bern 0.3
      sprinkler    <- dist bern 0.5
      -- dist (nor 0.9 0.8 0.1) rain sprinkler
      grass_is_wet <- dist (True `condition` nor 0.9 0.8 0.1) rain sprinkler
      return rain

    -- noisy-or function
    nor :: Double -> Double -> Double -> Bool -> Bool -> Distribution Bool
    nor strengthX strengthY noise x y =
      bern $ 1 - nnot (1-strengthX) x * nnot (1-strengthY) y * (1-noise)

    -- noisy not function
    nnot :: Num a => a -> Bool -> a
    nnot p True  = p
    nnot p False = 1
The model is an ordinary Haskell function grass, whose inferred type is Model Bool. The function nor is a custom parameterized distribution, the noisy-or function. Sampling 20000 times from the model (mcmC 20000 grass) and taking the fraction of True samples gives the posterior estimate of rain, having observed that the grass is wet. See the introduction part of the paper for more explanation of the example.
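Because the model has only four possible worlds, the sampled estimate can be cross-checked against the exact posterior. The following is a minimal sketch in plain Haskell that does not use Hakaru10 at all: it repeats the noisy-or formula and the priors 0.3 and 0.5 from the model above, while the names norProb, bernP, jointWet and posteriorRain are hypothetical helpers introduced only for this illustration.

```haskell
-- Exact posterior of rain given wet grass, by enumerating all worlds.
-- Plain Haskell; none of these definitions come from Hakaru10.

-- noisy-not: contributes the factor p only when the cause is present
nnot :: Double -> Bool -> Double
nnot p True  = p
nnot _ False = 1

-- probability that the noisy-or of x and y is True
norProb :: Double -> Double -> Double -> Bool -> Bool -> Double
norProb strengthX strengthY noise x y =
  1 - nnot (1 - strengthX) x * nnot (1 - strengthY) y * (1 - noise)

-- probability mass of one Bernoulli outcome
bernP :: Double -> Bool -> Double
bernP p True  = p
bernP p False = 1 - p

-- joint probability of (rain, sprinkler) and grass_is_wet == True
jointWet :: Bool -> Bool -> Double
jointWet rain sprinkler =
  bernP 0.3 rain * bernP 0.5 sprinkler * norProb 0.9 0.8 0.1 rain sprinkler

-- P(rain | grass is wet): mass of the rainy worlds over the total mass
posteriorRain :: Double
posteriorRain =
  sum [ jointWet True s | s <- bools ]
    / sum [ jointWet r s | r <- bools, s <- bools ]
  where
    bools = [False, True]
```

Loading this into GHCi and evaluating posteriorRain gives about 0.468 (that is, 0.2838/0.6058), which is the value the fraction of True samples should approach as the chain grows.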
The benchmark code
``An urn contains an unknown number of balls--say, a number chosen from a Poisson or a uniform distributions. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?''
The implementation of the model and sample inferences: e.g., of the posterior distribution of the number of balls in the urn given that we drew ten balls and all appeared blue. Our results seem in excellent agreement with those by BLOG.
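The posterior in question is small enough to compute exactly, which gives a reference point for any sampler. The sketch below is plain Haskell, not the Hakaru10 model. It assumes a uniform prior on 1..maxBalls (the problem statement also allows a Poisson prior, and the actual bound used in the benchmark is not stated here), and the names choose, obsBlue, likelihood and posterior are hypothetical. The key observation is that, once the colors of the balls are fixed, draws with replacement are independent.

```haskell
-- Exact posterior over the number of balls, given that every draw
-- was observed as blue. Assumption: uniform prior on 1..maxBalls.

-- probability that an observed color is wrong (from the problem statement)
noise :: Double
noise = 0.2

-- binomial coefficient, computed in floating point
choose :: Int -> Int -> Double
choose n k =
  product [ fromIntegral (n - i) / fromIntegral (i + 1) | i <- [0 .. k - 1] ]

-- probability that a single draw is observed blue when k of n balls are blue
obsBlue :: Int -> Int -> Double
obsBlue n k = frac * (1 - noise) + (1 - frac) * noise
  where
    frac = fromIntegral k / fromIntegral n

-- likelihood of `draws` blue observations in a row from an urn of n balls,
-- summing over the unknown number k of blue balls (each ball blue w.p. 0.5)
likelihood :: Int -> Int -> Double
likelihood draws n =
  sum [ choose n k * 0.5 ^ n * obsBlue n k ^ draws | k <- [0 .. n] ]

-- normalized posterior; the uniform prior cancels in the normalization
posterior :: Int -> Int -> [(Int, Double)]
posterior maxBalls draws = [ (n, l / total) | (n, l) <- ls ]
  where
    ls    = [ (n, likelihood draws n) | n <- [1 .. maxBalls] ]
    total = sum (map snd ls)
```

For example, posterior 8 10 in GHCi lists the posterior probability of each urn size from 1 to 8 after ten blue observations; under these assumptions the mass decreases with the number of balls, since a single blue ball explains the data best.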
j under given conditions.
The bird migration problem was also the topic of the 2018 bachelor thesis of Sasaki Shoichiro. He was to learn Haskell and Hakaru10, try to implement the model and estimate its parameters, and reflect on his work. The first conclusion of his thesis is that the bird migration model is expressible in Hakaru10, in a form that rather closely reflects the model specification:
    model :: Features -> ObsData -> Model [Double]
    model features obs = do
      b1 <- abs <$> dist normal 0 10   -- Priors of the four parameters, b1..b4
      b2 <- abs <$> dist normal 0 10   -- see Statement of Problem, p9
      b3 <- abs <$> dist normal 0 10
      b4 <- abs <$> dist normal 0 10
      let bs = collect [b1,b2,b3,b4]
      forM_ [1..nYears] $ \year -> do  -- each year
        foldM_ (\n day -> do  -- one bird exists in cell n on each day(1..nDays-1)
            let trans n bs = [ (j,theta year day n j features bs) | j<-neighbors n]
            -- n' is the cell where the bird is on day+1
            let day' = day + 1
            n' <- dist categorical $ liftA2 trans n bs
            forM_ [1..nCells] $ \i ->
              dist (fromIntegral (obs ! (year,day',i)) `condition` poisson)
                   ((\nv -> if i==nv then 1 else 0) <$> n')
            return n'
          ) (1 :: SExp Int) [1..nDays-1]
      return bs

where theta computes the dot-product of the parameters and the feature vector at the day and year in question, and exponentiates the result. Due to the extreme time pressure, only the first of the three parts of the Challenge problem was implemented: the 1-bird dataset part.

The second conclusion is that it turns out possible to learn Haskell and Hakaru10 within a month, leaving time to implement and run the model, analyze its results and compare with other solutions (BLOG). It did take time to get used to the `applicative' character of Hakaru10, and to use liftA and applicative combinators appropriately.

The third conclusion is that Hakaru10 turns out performant: on his Mac laptop, he could obtain 1 million samples of the MCMC chain within 9 minutes. He could comfortably obtain even 10 million samples, which gave him enough material to investigate and verify the convergence. For comparison, the published BLOG solution to the same problem needed 49 minutes for 1000 samples on the same laptop.
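To make the role of theta in the model concrete, here is a minimal sketch in plain Haskell of the computation it is described as performing: a dot-product of the parameters with a feature vector, followed by exponentiation. The names weight and transitionProbs are hypothetical, the feature vectors are passed in directly rather than looked up by year, day and cell, and the explicit normalization is an assumption made for illustration; whether categorical in Hakaru10 expects normalized weights is not stated here.

```haskell
-- Hypothetical stand-in for theta: the unnormalized weight of one
-- transition, exp of the dot-product of parameters and features.
weight :: [Double] -> [Double] -> Double
weight bs fs = exp (sum (zipWith (*) bs fs))

-- Turn the weights of all neighboring cells into transition probabilities.
-- Each neighbor is given as (cell index, feature vector for that move).
transitionProbs :: [Double] -> [(Int, [Double])] -> [(Int, Double)]
transitionProbs bs nbrs = [ (j, w / total) | (j, w) <- ws ]
  where
    ws    = [ (j, weight bs fs) | (j, fs) <- nbrs ]
    total = sum (map snd ws)
```

With all parameters zero, every neighbor gets the same weight, so transitionProbs [0,0,0,0] applied to a list of neighbors yields a uniform distribution over them. Larger parameters shift the mass toward cells whose features align with them.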
Overall we conclude that Hakaru10 is good at least for a bachelor thesis.
The Hakaru10 bird migration model and the main inference module