
Embedded probabilistic domain-specific language HANSEI (discrete distributions, importance sampling, nested inference)




HANSEI is an embedded domain-specific language for probabilistic programming: for writing potentially infinite discrete-distribution models and performing exact inference, importance sampling, and inference about inference.

HANSEI is an ordinary OCaml library, with probability distributions represented as ordinary OCaml programs. Delimited continuations let us reify non-deterministic programs as lazy search trees, which we may then traverse, explore, or sample. Thus an inference procedure and a model invoke each other as co-routines. Thanks to delimited control, deterministic expressions look exactly like ordinary OCaml expressions, and are evaluated as such, without any overhead.
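To give the flavor of the reification, here is a minimal sketch in plain OCaml of the sort of lazy search tree a probabilistic program reifies to, and of its exhaustive traversal. The type is modeled on HANSEI's Ptypes.pV; the two-coin tree and the explore function are our illustration, not the library code.

```ocaml
(* A reified program is a weighted list of children: a leaf V carries a final
   answer; a branch C is a thunk, forced only if an explorer descends into it. *)
type 'a vc = V of 'a | C of (unit -> 'a pv)
and 'a pv = (float * 'a vc) list

(* Exact inference by exhaustive traversal, accumulating path probabilities *)
let rec explore (p : float) (tree : 'a pv) : (float * 'a) list =
  List.concat_map
    (fun (q, node) -> match node with
       | V v -> [(p *. q, v)]
       | C t -> explore (p *. q) (t ()))
    tree

(* The reified tree of flipping two fair coins and counting the heads *)
let two_coins : int pv =
  [ (0.5, C (fun () -> [(0.5, V 2); (0.5, V 1)]));   (* first coin: heads *)
    (0.5, C (fun () -> [(0.5, V 1); (0.5, V 0)])) ]  (* first coin: tails *)

let answers = explore 1.0 two_coins
(* [(0.25, 2); (0.25, 1); (0.25, 1); (0.25, 0)] *)
```

An approximate-inference procedure would traverse the same tree but force only some of the C thunks, which is how sampling co-exists with exact enumeration.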

Inference procedures and probabilistic models are both ordinary OCaml functions. Both may be defined and extended by library users; both may appear in each other's code. Performing inference on a model that includes calls to inference procedures lets us parameterize distributions by distributions, and lets inference procedures measure their own accuracy. One application is modeling agents reasoning about each other's limited reasoning.

This is joint work with Chung-chieh Shan.

Probabilistic modeling in a general-purpose programming language
The web page also explains the advantage of delimited continuations for representing non-deterministic and stochastic computations.

nips2008.pdf [266K]
Abstract of the poster at the NIPS2008 workshop ``Probabilistic Programming: Universal Languages, Systems and Applications''


Embedded probabilistic programming

[The Abstract of the paper]
Two general techniques for implementing a domain-specific language (DSL) with less overhead are the finally-tagless embedding of object programs and the direct-style representation of side effects. We use these techniques to build a DSL for probabilistic programming, for expressing countable probabilistic models and performing exact inference and importance sampling on them. Our language is embedded as an ordinary OCaml library and represents probability distributions as ordinary OCaml programs. We use delimited continuations to reify probabilistic programs as lazy search trees, which inference algorithms may traverse without imposing any interpretive overhead on deterministic parts of a model. We thus take advantage of the existing OCaml implementation to achieve competitive performance and ease of use. Inference algorithms can easily be embedded in probabilistic programs themselves.

The paper, in particular, presents a step-by-step refinement of the finally-tagless embedding into a very shallow embedding of the probabilistic DSL. That is, probabilistic functions are embedded as host-language functions, calls to them are embedded as host function calls, and random integers can be added using the addition operation of the host language. This approach lets us take advantage of the existing infrastructure of the host language while also reducing notational and interpretive overhead.
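To illustrate what `very shallow' means, here is a hypothetical, stripped-down signature for such a DSL together with a trivial one-sample implementation (the names PROB, dist, fail and the module Sample are ours, not the paper's actual interface): the only probabilistic primitives are dist and fail; everything else, including addition and function calls, is plain OCaml.

```ocaml
module type PROB = sig
  val dist : (float * 'a) list -> 'a  (* make a weighted random choice *)
  val fail : unit -> 'a               (* observe contradictory evidence *)
end

(* A trivial implementation that merely draws one sample *)
module Sample : PROB = struct
  let dist choices =
    let r = Random.float 1.0 in
    let rec pick acc = function
      | [] -> failwith "dist: empty distribution"
      | [(_, v)] -> v                        (* last choice absorbs rounding *)
      | (p, v) :: rest ->
          if r < acc +. p then v else pick (acc +. p) rest
    in pick 0.0 choices
  let fail () = failwith "fail"
end

(* A model is an ordinary OCaml function; note the host-language (+) *)
let two_dice () =
  let die () =
    Sample.dist (List.map (fun v -> (1. /. 6., v)) [1; 2; 3; 4; 5; 6]) in
  die () + die ()
```

In HANSEI the same model code runs unchanged under exact inference, because dist and fail are implemented with delimited control rather than a global random source.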

This is joint work with Chung-chieh Shan.

dsl-paper.pdf [189K]
In proceedings of the IFIP working conference on domain-specific languages, ed. Walid Taha. LNCS 5658, Springer, 2009, pp. 360-384.

Delimited control and breadth-first, depth-first, and iterative deepening search
Another illustration, only in Haskell rather than OCaml, and without probabilities


Monolingual Probabilistic Programming Using Generalized Coroutines

[The Abstract of the paper]
Probabilistic programming languages and modeling toolkits are two modular ways to build and reuse stochastic models and inference procedures. Combining strengths of both, we express models and inference as generalized coroutines in the same general-purpose language. We use existing facilities of the language, such as rich libraries, optimizing compilers, and types, to develop concise, declarative, and realistic models with competitive performance on exact and approximate inference. In particular, a wide range of models can be expressed using memoization. Because deterministic parts of models run at full speed, custom inference procedures are trivial to incorporate, and inference procedures can reason about themselves without interpretive overhead. Within this framework, we introduce a new, general algorithm for importance sampling with look-ahead.

This is joint work with Chung-chieh Shan.

embedpp.pdf [125K]
In proceedings of the 25th conference on uncertainty in artificial intelligence (UAI), 2009.


Probabilistic programming using first-class stores and first-class continuations

[The Summary]
Probabilistic inference is a popular way to deal with uncertainty in many areas of science and engineering. Following the declarative approach of expressing probabilistic models and inference algorithms as separate, reusable modules, we built a probabilistic programming language as an OCaml library and applied it to several problems. We describe this embedded domain-specific language using aircraft tracking as an example. We focus on our use of first-class stores and first-class continuations to perform faster inference (with local laziness and full-speed determinism) and express more models (with stochastic memoization and nested inference).

This is joint work with Chung-chieh Shan.

ML2010.pdf [60K]
The extended abstract published in the informal proceedings of the 2010 ACM SIGPLAN Workshop on ML.


The implementation of HANSEI

The language HANSEI is shallowly embedded in OCaml and is, therefore, a regular OCaml library. Here is the code.
The current version is April 2009.
ptypes.mli [<1K]
Library type declarations

probM.mli [4K]
The interface of the library: the `syntax' of HANSEI

probM.ml [18K]
The implementation of the library core: The dist function, evidence assertion, reification and reflection, call-by-need evaluation, sharing and memoization, reification and memoization in nested scopes

inference.mli [3K]
inference.ml [13K]
Exact and approximate inference procedures: the explorers of the lazy search tree resulting from reifying probabilistic programs

pMap.mli [5K]
pMap.ml [7K]
Dependency: PMap - Polymorphic maps, by Xavier Leroy, Nicolas Cannasse, and Markus Mottl. The original code was augmented with the function insert_with

Dependency: Delimited continuations in OCaml

Makefile [3K]
How to compile the library and run the tests and examples


Introductory examples and regression tests

References
< > [4K]
The lawn/grass model: a simple Bayesian net. A taste of nested inference
< > [11K]
Exact inference, variable elimination, likening to IBAL
< > [16K]
Importance sampling with look-ahead: a new, general algorithm for approximate inference; contrast with rejection sampling
< > [27K]
Lazy evaluation (or, evidence pulling) and sharing. Probabilistic context-free grammars. Memoization of stochastic functions
< > [16K]
Many examples and regression tests of nested inference and nested call-by-need probabilistic computations


Example: colored balls

This is an elegant example of a model with an indefinite number of objects whose `real' identity is unobservable. The example is a simplified version of population estimation. HANSEI's implementation of the example makes the case for memoization of stochastic functions. The example was introduced as Example 1.1 in the chapter by Milch et al. on BLOG:

``An urn contains an unknown number of balls--say, a number chosen from a Poisson or a uniform distribution. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?''

Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< > [5K]
The implementation of the model and sample inferences: e.g., of the posterior distribution of the number of balls in the urn given that we drew ten balls and all appeared blue. Our results seem in excellent agreement with those by BLOG.
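As a sanity check on such posteriors, a tiny instance of the model can be computed by direct enumeration in plain OCaml, outside any probabilistic library. The sketch below assumes made-up parameters -- at most three balls, a uniform prior on the number of balls, and ten all-blue observations -- and is only a hand-rolled check, not the HANSEI model.

```ocaml
(* P(one draw is observed blue | b of the n balls are blue): draw a ball
   uniformly, then flip the observed color with probability 0.2 *)
let p_obs_blue b n =
  let pb = float_of_int b /. float_of_int n in
  pb *. 0.8 +. (1.0 -. pb) *. 0.2

let binom n k =                         (* small binomial coefficients *)
  let rec go acc i =
    if i > k then acc else go (acc * (n - i + 1) / i) (i + 1) in
  go 1 1

(* Unnormalized posterior weight of there being n balls, given k independent
   draws all observed blue; each ball is blue with prior probability 1/2 *)
let weight n k =
  let acc = ref 0.0 in
  for b = 0 to n do
    acc := !acc
      +. float_of_int (binom n b) /. (2.0 ** float_of_int n)
      *. p_obs_blue b n ** float_of_int k
  done;
  !acc

(* Posterior over n in {1,2,3}: uniform prior, ten all-blue draws *)
let posterior =
  let ws = List.map (fun n -> (n, weight n 10)) [1; 2; 3] in
  let z = List.fold_left (fun a (_, w) -> a +. w) 0.0 ws in
  List.map (fun (n, w) -> (n, w /. z)) ws
```

With these assumed parameters, an urn with fewer balls is the more likely explanation of ten blue observations, since a single blue ball yields blue observations with probability 0.8 on every draw.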


Example: blood types

This is an example from Manfred Jaeger's benchmark of probabilistic logic languages and systems: ``A simple genetic model for the inheritance of bloodtypes in a pedigree. Input domains consist of pedigrees with individuals and father and mother relation. The reference model comes from Balios.'' The hidden variables are the presence of genes A and B in the chromosomes inherited from the mother and the father. The observed variable is the blood type, which is a stochastic function of the genotype. A typical query asks for the distribution of blood types for a particular individual, possibly conditioned on the known genotype of some of the ancestors.
Manfred Jaeger: Comparative Study of Probabilistic Logic Languages and Systems.
< > [9K]
The well-commented code of the model and sample inferences


Example: Hidden Markov Model (HMM)

A simple HMM with a single hidden 8-state variable. Our implementation demonstrates investigating the time complexity of inference -- in an interactive session at the OCaml top level -- and changing the model by introducing variable elimination, which speeds up inference exponentially.
Manfred Jaeger: Comparative Study of Probabilistic Logic Languages and Systems.
< > [11K]
The well-commented code of the model and the record of time complexity investigations and optimization
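For HMMs, the variable elimination in question is the textbook forward algorithm: instead of enumerating all S^T state sequences, we sum out the previous state at each step, keeping only an S-element distribution over the current state, at cost O(T*S^2). Here is a self-contained OCaml sketch with a made-up 2-state chain (not the benchmark's 8-state model):

```ocaml
(* One elimination step:
   alpha'(s) = obs s o *. sum over s' of alpha(s') *. trans.(s').(s) *)
let step trans obs alpha o =
  Array.init (Array.length alpha) (fun s ->
    let sum = ref 0.0 in
    Array.iteri (fun s' a -> sum := !sum +. a *. trans.(s').(s)) alpha;
    obs s o *. !sum)

(* Likelihood of an observation sequence: fold the elimination step *)
let likelihood trans obs init = function
  | [] -> 1.0
  | o :: os ->
      let alpha0 = Array.mapi (fun s p -> p *. obs s o) init in
      let alpha = List.fold_left (step trans obs) alpha0 os in
      Array.fold_left (+.) 0.0 alpha

(* A made-up example: sticky 2-state chain, noisy binary observations *)
let trans = [| [| 0.9; 0.1 |]; [| 0.1; 0.9 |] |]
let obs s o = if s = o then 0.8 else 0.2    (* P(observe o | state s) *)
let init = [| 0.5; 0.5 |]                   (* initial state distribution *)
let l = likelihood trans obs init [0; 0; 1]
```

In HANSEI the same transformation is performed on the model itself: reifying the distribution over the hidden state at each step and reflecting it back collapses the exponentially many branches into one weighted choice.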


Example: Noisy OR

This is another problem from Manfred Jaeger's benchmark: ``A very basic model that uses the noisy-or function for combining several causal inputs. Input structures are directed acyclic graphs on which a random unary attribute is propagated from the roots downward. Reference model comes from Primula.''

We write the model to be independent of the structure of the graph. Typical queries ask for the (distribution of the) values of the attribute at a particular node. Therefore, we express the attribute propagation backwards: rather than specifying how the attribute of a node propagates to its descendants, we describe how a node obtains the attribute from its ancestors. The process is trivially declared in OCaml, as a recursive function that, given a non-root node, finds its ancestors and combines their attributes using noisy-or. We easily extend this function with assertions of available evidence, to support conditional queries.

Although the graph does not have directed cycles, it does have many undirected ones. As we work backwards, we may encounter a given node several times. Each time, the value of the node's random attribute must be the same. We have to use memoization.
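The combination of noisy-or and memoization can be sketched in plain OCaml, with an explicit hash table standing in for HANSEI's memo facility. The diamond-shaped 4-node graph below is made up for illustration, not one of the benchmark inputs; each node's boolean attribute is sampled at most once per world, so the two paths through the diamond see the same value at the shared root.

```ocaml
(* noisy-or: the node is true if at least one true parent transmits its
   value; parent i transmits with probability p_i *)
let noisy_or parents =    (* parents: (transmission prob, parent value) list *)
  1.0 -. List.fold_left
           (fun acc (p, v) -> if v then acc *. (1.0 -. p) else acc)
           1.0 parents

(* A diamond DAG: d depends on b and c, both of which depend on root a *)
let parents = function
  | "a" -> []
  | "b" -> [ (0.9, "a") ]
  | "c" -> [ (0.9, "a") ]
  | "d" -> [ (0.5, "b"); (0.5, "c") ]
  | _ -> invalid_arg "unknown node"

(* Sample one possible world: attr is a stochastic *function* of the node,
   thanks to the memo table -- revisiting a node reuses its sampled value *)
let sample_world () =
  let memo = Hashtbl.create 8 in
  let rec attr node =
    match Hashtbl.find_opt memo node with
    | Some v -> v
    | None ->
        let p = match parents node with
          | [] -> 0.5                   (* prior for the root attribute *)
          | ps -> noisy_or (List.map (fun (q, n) -> (q, attr n)) ps) in
        let v = Random.float 1.0 < p in
        Hashtbl.add memo node v; v
  in
  attr
```

In HANSEI the hash table is replaced by memo, which makes the per-world caching compatible with backtracking over all worlds during exact inference.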

Manfred Jaeger: Comparative Study of Probabilistic Logic Languages and Systems.
< > [8K]
The well-commented code of several variations of the model. All in all, our exact inference scales to the large version of the benchmark, with 19 nodes.


Realistic Example: Motivic development in classical music

This model, developed by Avi Pfeffer, is said to be of realistic complexity. It impressively illustrates the advanced importance sampling of IBAL. The model describes the transformation of melodic motives in classical music as a combination of transposition, inversion, deletion and insertion operations. The number of operations and their sequence are random, as is the effect of transposition on each note. Given two motives (sequences of notes) we wish to determine how likely one motive could have been developed from the other.

The very large search space and the very low probability of evidence make exact inference or rejection sampling hopeless. IBAL was able to handle the model using a combination of very clever and extensive analyses of the model's code. Since HANSEI is embedded in OCaml and is a regular OCaml library, it cannot obtain the code of the whole program. It turns out that with the combination of reification and lazy evaluation, we are able to handle this model with accuracy roughly comparable to that of IBAL.

Avi Pfeffer: A General Importance Sampling Algorithm for Probabilistic Programs.
Harvard University Technical Report TR-12-07, 2007. Dagstuhl Seminar 07161.
< > [6K]
Warm-up: splitting a note sequence into two parts at a random place, and applying a randomly chosen operation (deletion, identity, or a note transposition by various amounts) to both parts of the sequence. Crucially, this simplified model does not recursively subdivide and transform the sequence. Therefore, the state space remains sufficiently small even for exact inference. Lazy evaluation was essential.
< > [16K]
The full model, with recursive subdivision. The state space is very large, and the probability of evidence is very low, on the order of 10^(-7). Only importance sampling is feasible.


Realistic Example: Multitarget tracking

The BLOG chapter by Milch et al., Example 1.3, briefly describes the problem of radar tracking of several aircraft. David Poole posted the complete AILog2 code for a simple version of the example. After removing the over-simplification, we obtain the following model.

A flat region of space is observed by a radar on a 10x10 screen. A number of aircraft fly through the region. The state (the location and the velocity) of an aircraft at each time step is a stochastic function of its state at the previous time step. With certain probability, the radar detects an aircraft, displaying a blip on the screen. Due to the limited resolution of the radar, several aircraft can give rise to the same blip. A number of false alarms may occur, each producing a blip. We observe the screen at several time moments and note the blips. The problem is to estimate the number of aircraft. We may also want to identify the trajectory of each aircraft. A more general model allows aircraft to disappear and new aircraft to appear at each time step.

We found it crucial to sample the state of an aircraft and the location of a false alarm lazily. If the x-coordinate of the detected aircraft does not match the x-coordinate of any observed blip, we immediately report the contradiction (of the possible world with the evidence). We do not need to compute the y-coordinate of the aircraft, let alone the locations of the other aircraft and false alarms. Because of laziness, extending our model to 3D would increase the computation time by 50% (rather than ten-fold, for the 10x10x10 grid).

Sequential Monte Carlo was another necessary technique. We compute the complete distribution for the state of all the aircraft at each time step. We can use exact inference or our importance sampling. For the ideal case without noise, we did both and observed good agreement, even with only 1500 samples. Computing the state of the system requires roughly the same amount of work per time step. This fact lets us trace the evolution of the system for as long as needed. As the noise decreases, the sampling results seem to converge to the `ground truth' (which can be inferred exactly). For simple cases, the computed results match calculations by hand.

The very end of the code file shows the results for the general case, of three blips moving horizontally. As time progresses, the observed evidence increasingly favors there being three aircraft.

References
< > [30K]
The successful attempt to express the model and do the inference. We used laziness systematically, almost mechanically, resisting the temptation to ``optimize.''

Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< >

David Poole: the AILog2 web page
< >
The web page refers to the complete AILog2 code for the airplane-blip example. The code simplifies the problem: first, it restricts aircraft movements to a plane. Then it assumes that observations of the radar screen are incomplete: the screen may contain other blips. The latter assumption is an over-simplification. For our model we used the observation procedure as given in Example 1.3 of the BLOG chapter by Milch et al. Alas, that BLOG code did not specify various priors; we picked what we considered reasonable priors, taking a hint from the AILog model whenever possible.
< > [9K] [13K]
The re-implementation of the AILog code and the naive but ultimately unsuccessful attempt at the full model.


Realistic Example: sorted list

This problem may appear too trivial to be called realistic. What makes it interesting is the astoundingly large search space. The problem was posed by Georges Harik:

Draw N independent random samples from the uniform distribution of integers within the interval 0..M-1. What is the probability that the sequence of samples is sorted?

Theoretically, when M is large (and so duplicates are unlikely), the probability of the drawn samples being ordered is 1/N!. The task is to estimate that probability in one's favorite probabilistic system, setting M to 1 million and N to at least 10. With these parameters, the size of the search space is 10^60. Clearly, exact inference and rejection sampling are preposterous to consider.
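The expected result is easy to compute in closed form: the number of nondecreasing length-N sequences over M values is C(M+N-1, N), so the probability is C(M+N-1, N)/M^N, which tends to 1/N! as M grows. Here is a small OCaml check of that arithmetic (assuming `sorted' means nondecreasing; for M of one million, duplicates are negligible anyway):

```ocaml
(* P(n iid uniform draws from 0..m-1 are nondecreasing) = C(m+n-1, n) / m^n,
   computed stably in floating point as the product of (m+i) / (m * (i+1)) *)
let p_sorted m n =
  let p = ref 1.0 in
  for i = 0 to n - 1 do
    p := !p *. float_of_int (m + i)
         /. (float_of_int m *. float_of_int (i + 1))
  done;
  !p

(* Brute force over all m^n sequences, feasible for small m and n only *)
let p_sorted_brute m n =
  let total = ref 0 and sorted = ref 0 in
  let rec go seq depth =
    if depth = 0 then begin
      incr total;
      let s = List.rev seq in
      if List.sort compare s = s then incr sorted
    end else
      for v = 0 to m - 1 do go (v :: seq) (depth - 1) done
  in
  go [] n;
  float_of_int !sorted /. float_of_int !total
```

p_sorted 1_000_000 10 evaluates to about 2.756e-7, just above 1/10!, in agreement with the table of estimates below.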

Harik's problem is ideal for a benchmark: it is easy to explain and to model. The expected result is also known. Getting any useful estimate of the result is rather difficult, especially without extensive fine-tuning of the inference algorithm.

Here are the results of solving the problem in HANSEI, using the importance sampling procedure from the HANSEI library. In the model, the numbers are represented as lazy lists of bits, which proved crucial. Run times are in seconds on an EeePC 701 (633 MHz CPU).

     M = 2^20, N = 10
      1000 samples  runtime 8 secs   estimated prob is 1.69e-07
      2000 samples  runtime 16 secs  estimated prob is 1.83e-07
      4000 samples  runtime 32 secs  estimated prob is 1.83e-07
      8000 samples  runtime 64 secs  estimated prob is 2.23e-07
         1/10! is 2.76e-07
     M = 2^20, N = 20
      24000 samples  runtime 468 secs  estimated prob is 3.64e-20
        1/20! is 4.11e-19
The convergence is a bit slow. We stress the use of a general-purpose importance sampling procedure, which knows nothing about numbers and numeric order, and which has not been tuned for the problem at hand.
References
< > [9K]
The source code for the model expressed in HANSEI. Most of the code consists of various tests. The full sorted-list problem is tackled near the end of the file. OCaml comments show the results of sample runs.


Bounded-Rational Theory of Mind for Conversational Implicature

[The Abstract of the talk]
Game-theoretic and relevance-theoretic accounts of pragmatic reasoning often implement Grice's maxim of Manner by penalizing utterances for their syntactic complexity or processing effort. In ongoing work, we propose to explain this penalty as the risk of misinterpretation in a bounded-rational theory of mind. The core of this explanation is an executable representation of probabilistic inference in which probabilistic models and inference algorithms are expressed as interacting programs in the same language. The payoff is that the same cognitive machinery that the hearer uses to decide what to do, by interpreting utterances, can also be used by the speaker to decide what to say, by predicting how potential utterances would be interpreted or misinterpreted by the hearer.

This is joint work with Chung-chieh Shan.

Chung-chieh Shan. The abstract of the invited talk at the Texas Linguistics Society XII, November 13-15, 2009.
< > [17K]
The code for the model and for several experiments with it


Correlated memoization and lazy permutations

Sampling of permutations, stochastic bijective functions, sequences of correlated random variables, the all-different constraint -- these are all facets of the same problem. A problem with so many names must be important, and so must be its efficient solution. We explain the efficient solution through a sequence of simple examples, stressing correlated stochastic memoization. We then discuss connections with sorting and applications to various puzzles, such as the zebra puzzle and permutation sort.

Our goal thus is to build a sequence of N all-different random variables with the same range: a sample from the sequence should have no duplicates. If the value of each variable is an integer from 0 through N-1, the problem is then that of building a stochastic bijective function [0..N-1] -> [0..N-1]. Defining a stochastic relation that is a function demands care: the naively defined [0,1] -> [0,1] stochastic relation f1 below is not a function, which is clearly seen from the probability table for the tuple (f1 0, f1 1, f1 0, f1 1) computed by Hansei. (In the OCaml session transcript below, the responses of the OCaml top level are indented.)

     open ProbM;;
     let uniformly_list lst = List.nth lst (uniform (List.length lst));;
         val uniformly_list : 'a list -> 'a = <fun>
     exact_reify (fun () ->
       let f1 = function
         | 0 -> uniformly_list [0;1]
         | 1 -> uniformly_list [0;1]
       in
       (f1 0, f1 1, f1 0, f1 1));;
         - : (int * int * int * int) Ptypes.pV =
         [(0.0625, Ptypes.V (1, 1, 1, 1)); (0.0625, Ptypes.V (1, 1, 1, 0));
          (0.0625, Ptypes.V (1, 1, 0, 1)); (0.0625, Ptypes.V (1, 1, 0, 0));
          (0.0625, Ptypes.V (1, 0, 1, 1)); (0.0625, Ptypes.V (1, 0, 1, 0));
          (0.0625, Ptypes.V (1, 0, 0, 1)); (0.0625, Ptypes.V (1, 0, 0, 0));
          (0.0625, Ptypes.V (0, 1, 1, 1)); (0.0625, Ptypes.V (0, 1, 1, 0));
          (0.0625, Ptypes.V (0, 1, 0, 1)); (0.0625, Ptypes.V (0, 1, 0, 0));
          (0.0625, Ptypes.V (0, 0, 1, 1)); (0.0625, Ptypes.V (0, 0, 1, 0));
          (0.0625, Ptypes.V (0, 0, 0, 1)); (0.0625, Ptypes.V (0, 0, 0, 0))]
The transcript shows importing the Hansei library and defining a procedure to uniformly sample from a list. The probability table computed next makes it patent that repeated evaluations of f1 0 may give different results: f1 is not a function. We have to use the memo facility of Hansei to define stochastic functions, such as f2:
     exact_reify (fun () ->
      let f2 = memo (function
        | 0 -> uniformly_list [0;1]
        | 1 -> uniformly_list [0;1])
      in
      (f2 0, f2 1, f2 0, f2 1));;
         - : (int * int * int * int) Ptypes.pV =
         [(0.25, Ptypes.V (1, 1, 1, 1)); (0.25, Ptypes.V (1, 0, 1, 0));
          (0.25, Ptypes.V (0, 1, 0, 1)); (0.25, Ptypes.V (0, 0, 0, 0))]
The value of f2 0 is non-deterministic: it can be 0 in one possible world and 1 in another. Once the value of f2 0 is chosen, however, it stays the same. Therefore, f2 is a function: repeated applications of f2 to the same argument give the same value. The stochastic function f2 however is not injective: there are possible worlds (the first and the last one, in the above table) in which f2 0 and f2 1 have the same value.

The simplest way to turn f2 into an injective function is to reject the worlds in which it turns out not injective. We define f3 by asserting that applying it to different arguments must give different values:

     exact_reify (fun () ->
      let f2 = memo (function
        | 0 -> uniformly_list [0;1]
        | 1 -> uniformly_list [0;1])
      in
      let f3 x =
        if f2 0 == f2 1 then fail ();
        f2 x
      in
      (f3 0, f3 1, f3 0, f3 1));;
        - : (int * int * int * int) Ptypes.pV =
        [(0.25, Ptypes.V (1, 0, 1, 0)); (0.25, Ptypes.V (0, 1, 0, 1))]
We have obtained f3, which is stochastic, is a function (applying it to the same argument gives the same value) and is injective (applying it to different arguments gives different values). Alas, it is not efficient. The weights in the resulting probability table sum to 0.5 rather than to one, betraying the waste of half of the choice space. We would like to avoid making choices that we come to regret later. It behooves us to generate only the `right' worlds.

An injective and surjective function [0..N-1] -> [0..N-1] is a permutation of the sequence [0..N-1]. Hence we merely need to compute all permutations of the sequence and sample from them (using the already defined uniformly_list). For N=2, there are only two permutations, leading us to f4 (here letlazy is a version of memo specialized to thunks):

     exact_reify (fun () ->
      let perms = letlazy (fun () -> uniformly_list [ [0;1]; [1;0] ]) in
      let f4 x = List.nth (perms ()) x in
      (f4 0, f4 1, f4 0, f4 1));;
       - : (int * int * int * int) Ptypes.pV =
       [(0.5, Ptypes.V (1, 0, 1, 0)); (0.5, Ptypes.V (0, 1, 0, 1))]

The function f4 is stochastic, is a function, is injective, and is defined without wasting any choices. Yet we are still not satisfied. For general N, we will have to generate all permutations of the sequence [0..N-1] so as to sample from them. The first application of f4 to any argument x involves sampling from N! choices. (Further applications of f4 reuse the choices already made.) If the value of f4 x turns out inconsistent with further constraints on the world, we have to reject not only the choice that has led to f4 x but also the (N-1)! choices that have led to f4 y for all other arguments y. We will have wasted a lot of choice space.

The waste can be reduced by computing the permutations on demand. We introduce a function lazy_permutation : 'a list -> (unit -> 'a) list that maps a list lst of values to a list of random variables. The variables are correlated. Sampling the first random variable gives a value uniformly chosen from lst. Sampling the second variable gives an element of lst that is different from the one chosen by the first variable, etc. Sampling all the random variables gives a permutation of lst. Crucially, if we reject the sample of one random variable, all further random variables are left unsampled: the permutation is computed only to the needed extent. If we handle the result of lazy_permutation as a list, in sequence, we can save quite a few choices. The source code below demonstrates that lazy_permutation indeed samples from all permutations, uniformly. The code defines the lazy stochastic injective function f5 such that determining f5 0 requires only N choices, f5 1 only N*(N-1) choices, etc.

The code for lazy_permutation relies on the function uniformly_uncons : 'a list -> 'a * 'a list that uniformly selects one element from a list, returning the element and the remainder of the list. The code thus illustrates an acute observation by Dylan Thurston and Chung-chieh Shan that a sorting algorithm corresponds to a permutation algorithm. To further illustrate the relationship, the code implements a lazy permutation function that is based on insertion sort, non-deterministically inserting elements one-by-one into the initially empty list. It turns out slower than the selection-sort--based permutation.
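The deterministic skeleton of this construction is easy to exhibit in plain OCaml (the function names below are ours, not the library's): select_each lists every way to pick one element out of a list, which is the shape of uniformly_uncons with the probabilities stripped, and repeating the selection yields exactly the N! permutations.

```ocaml
(* All ways of removing one element: the element and the remainder *)
let select_each lst =
  let rec go before = function
    | [] -> []
    | x :: after ->
        (x, List.rev_append before after) :: go (x :: before) after
  in
  go [] lst

(* All permutations by repeated selection (the selection-sort shape);
   making each selection a weighted choice instead of enumerating them all
   turns this into uniform sampling of permutations *)
let rec perms = function
  | [] -> [ [] ]
  | lst ->
      List.concat_map
        (fun (x, rest) -> List.map (fun p -> x :: p) (perms rest))
        (select_each lst)
```

Because each level of the recursion commits to one selection before descending, rejecting a partial choice prunes the whole subtree of later selections -- which is precisely the saving that the lazy, probabilistic lazy_permutation exploits.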

The current version is December 2010.
lazy_permutations.mli [2K]
lazy_permutations.ml [12K]
The complete source code and the tests

Purely Functional Lazy Non-deterministic Programming
The cited paper describes an alternative implementation of lazy permutations, which corresponds to insertion sort rather than selection sort. This alternative method requires defining a data structure for lists with a non-deterministic tail. The alternative method is more general since it represents lists whose size is random as well. On the other hand, the selection-sort--based method here works with the ordinary lists of OCaml, and is faster.

Solving the zebra puzzle with a sequence of all-different random variables.

Last updated March 1, 2016

This site's top page is
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML