previous  next  contents  top

Embedded domain-specific language HANSEI for probabilistic models and (nested) inference

 


 

Introduction

HANSEI is the the embedded domain-specific language for probabilistic programming: for writing potentially infinite discrete-distribution models and performing exact inference, importance sampling and inference of inference.

HANSEI is an ordinary OCaml library, with probability distributions represented as ordinary OCaml programs. Delimited continuations let us reify non-deterministic programs as lazy search trees, which we may then traverse, explore, or sample. Thus an inference procedure and a model invoke each other as co-routines. Thanks to the delimited control, deterministic expressions look exactly like ordinary OCaml expressions, and are evaluated as such, without any overhead.

Inference procedures and probabilistic models are both ordinary OCaml functions. Both may be defined and extended by library users; both may appear in each other's code. Performing inference on a model that includes calls to inference procedures lets us parameterize distributions by distributions, and lets inference procedures measure their own accuracy. One application is modeling agents reasoning about each other's limited reasoning.

This is joint work with Chung-chieh Shan.

References
Probabilistic modeling in a general-purpose programming language
The web page also explains the advantage of delimited continuations for representing non-deterministic and stochastic computations.

nips2008.pdf [266K]
Abstract of the poster at the NIPS2008 workshop ``Probabilistic Programming: Universal Languages, Systems and Applications''

Chung-chieh Shan: Self-applicable probabilistic inference without interpretive overhead.
Talk at the meeting of the IFIP Working Group 2.11 (program generation), April 2009.
< http://www.cs.rutgers.edu/~ccshan/rational/bay-slides.pdf >

 

Embedded probabilistic programming

[The Abstract of the paper]
Two general techniques for implementing a domain-specific language (DSL) with less overhead are the finally-tagless embedding of object programs and the direct-style representation of side effects. We use these techniques to build a DSL for probabilistic programming, for expressing countable probabilistic models and performing exact inference and importance sampling on them. Our language is embedded as an ordinary OCaml library and represents probability distributions as ordinary OCaml programs. We use delimited continuations to reify probabilistic programs as lazy search trees, which inference algorithms may traverse without imposing any interpretive overhead on deterministic parts of a model. We thus take advantage of the existing OCaml implementation to achieve competitive performance and ease of use. Inference algorithms can easily be embedded in probabilistic programs themselves.

The paper, in particular, presents a step-by-step refining of the finally-tagless to a very shallow embedding of the probabilistic DSL. That is, probabilistic functions are embedded as host-language functions, calls to them are embedded as host function calls, and random integers can be added using the addition operation of the host language. This approach lets us take advantage of the existing infrastructure of the host language while also reducing notational and interpretive overhead.

This is joint work with Chung-chieh Shan.

References
dsl-paper.pdf [189K]
In proceedings of the IFIP working conference on domain-specific languages, ed. Walid Taha. LNCS 5658, Springer, 2009, pp. 360-384.

Chung-chieh Shan. Slides of the talk at DSL-WC. July 17, 2009, Oxford, UK.
< http://www.cs.rutgers.edu/~ccshan/rational/dsl-slides.pdf >

 

Monolingual Probabilistic Programming Using Generalized Coroutines

[The Abstract of the paper]
Probabilistic programming languages and modeling toolkits are two modular ways to build and reuse stochastic models and inference procedures. Combining strengths of both, we express models and inference as generalized coroutines in the same general-purpose language. We use existing facilities of the language, such as rich libraries, optimizing compilers, and types, to develop concise, declarative, and realistic models with competitive performance on exact and approximate inference. In particular, a wide range of models can be expressed using memoization. Because deterministic parts of models run at full speed, custom inference procedures are trivial to incorporate, and inference procedures can reason about themselves without interpretive overhead. Within this framework, we introduce a new, general algorithm for importance sampling with look-ahead.

This is joint work with Chung-chieh Shan.

References
embedpp.pdf [125K]
In proceedings of the 25th conference on uncertainty in artificial intelligence (UAI), 2009.

Chung-chieh Shan. Slides of the talk at UAI. June 19, 2009, Montreal, Canada.
< http://www.cs.rutgers.edu/~ccshan/rational/embedpp-slides.pdf >

 

The implementation of HANSEI

The language HANSEI is shallowly embedded in OCaml and is, therefore, a regular OCaml library. Here is the code.
Version
The current version is April 2009.
References
ptypes.mli [<1K]
Library type declarations

probM.mli [4K]
The interface of the library: the `syntax' of HANSEI

probM.ml [18K]
The implementation of the library core: The dist function, evidence assertion, reification and reflection, call-by-need evaluation, sharing and memoization, reification and memoization in nested scopes

inference.mli [3K]
inference.ml [11K]
Exact and approximate inference procedures: the explorers of the lazy search tree resulting from reifying probabilistic programs

pMap.ml [6K]
pMap.mli [5K]
Dependency: PMap - Polymorphic maps, by Xavier Leroy, Nicolas Cannasse, and Markus Mottl. The original code was augmented with the function insert_with

Dependency: Native delimited continuations in (byte-code) OCaml

Makefile [2K]
How to compile the library and run the tests and examples

 

Introductory examples and regression tests

References
paper_examples.ml [4K]
The lawn/grass model: a simple Bayesian net. A taste of nested inference

exactInfM.ml [11K]
Exact inference, variable elimination, likening to IBAL

samplingM.ml [15K]
Importance sampling with look-ahead: a new, general algorithm for approximate inference; contrast with rejection sampling

slazy.ml [27K]
Lazy evaluation (or, evidence pulling) and sharing. Probabilistic context-free grammars. Memoization of stochastic functions

nested.ml [16K]
Many examples and regression tests of nested inference and nested call-by-need probabilistic computations

 

Example: colored balls

This is an elegant example of a model with indefinite number of objects whose `real' identity is unobservable. The example is a simplified version of population estimation. HANSEI's implementation of the example makes the case for memoization of stochastic functions. The example was introduced as example 1.1 in Milch's et al. chapter on BLOG:

``An urn contains an unknown number of balls--say, a number chosen from a Poisson or a uniform distributions. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?''

References
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< http://people.csail.mit.edu/milch/papers/blog-chapter.pdf >

colored_balls.ml [5K]
The implementation of the model and sample inferences: e.g., of the posterior distribution of the number of balls in the urn given that we drew ten balls and all appeared blue. Our results seem in excellent agreement with those by BLOG.

 

Example: blood types

This is an example from Manfred Jaeger's benchmark of probabilistic logic languages and systems: ``A simple genetic model for the inheritance of bloodtypes in a pedigree. Input domains consist of pedigrees with individuals and father and mother relation. The reference model comes from Balios.'' The hidden variables are the presence of genes A and B in the chromosomes inherited from the mother and the father. The observed variable is the blood type, which is a stochastic function of the genotype. A typical query asks for the distribution of blood types for a particular individual, possibly conditioned on the known genotype of some of the ancestors.
References
Manfred Jaeger: Comparative Study of Probabilistic Logic Languages and Systems.
< http://www.cs.aau.dk/~jaeger/plsystems/models.html >

bloodtype.ml [9K]
The well-commented code of the model and sample inferences

 

Example: Hidden Markov Model (HMM)

A simple HMM model with a single hidden 8-state variable. Our implementation demonstrates the investigation of time complexity of the inference -- in an interactive session at the OCaml top level -- and changing the model by introducing variable elimination, which speeds up inference exponentially.
References
Manfred Jaeger: Comparative Study of Probabilistic Logic Languages and Systems.
< http://www.cs.aau.dk/~jaeger/plsystems/models.html >

hmm.ml [11K]
The well-commented code of the model and the record of time complexity investigations and optimization

 

Example: Noisy OR

This is another problem from Manfred Jaeger's benchmark: ``A very basic model that uses the noisy-or function for combining several causal inputs. Input structures are directed acyclic graphs on which a random unary attribute is propagated from the roots downward. Reference model comes from Primula.''

We write the model to be independent from the structure of the graph. Typical queries ask for the (distribution of the) values of the attribute at a particular node. Therefore, we express the attribute propagation backwards: rather the specifying how the attribute of a node propagates to its descendants, we describe how a node obtains the attribute from its ancestors. The process is trivially declared in OCaml, as a recursive function that, given a non-root node, finds its ancestors and combines their attributes using noisy-or. We easily extend this function with assertions of available evidence, to support conditional queries.

Although the graph does not have directed cycles, it does have many undirected ones. As we work backwards, we may encounter a given node several times. Each time the value of node's random attribute must be the same. We have to use memoization.

References
Manfred Jaeger: Comparative Study of Probabilistic Logic Languages and Systems.
< http://www.cs.aau.dk/~jaeger/plsystems/models.html >

noisy_or.ml [8K]
The well-commented code of several variations of the model. All in all, our exact inference scales to the large version of the benchmark, with 19 nodes.

 

Realistic Example: Motivic development in classical music

This model developed by Avi Pfeffer is said to be of realistic complexity. It illustrates impressively advanced importance sampling of IBAL. The model describes the transformation of melodic motives in classical music as a combination of transposition, inversion, deletion and insertion operations. The number of operations and their sequence are random, as is the effect of transposition on each note. Given two motives (sequences of notes) we wish to determine how likely one motive could have been developed from the other.

The very large search space and the very low probability of evidence make exact inference or rejection sampling hopeless. IBAL was able to handle the model using a combination of very clever and extensive analyses of model's code. Since HANSEI is embedded in OCaml and is a regular OCaml library, it cannot obtain the code of the whole program. It turns out that with the combination of reification and lazy evaluation we are able to handle this model with accuracy roughly comparable to that of IBAL.

References
Avi Pfeffer: A General Importance Sampling Algorithm for Probabilistic Programs.
Harvard University Technical Report TR-12-07, 2007. Dagstuhl Seminar 07161.
< ftp://ftp.deas.harvard.edu/techreports/tr-12-07.pdf >

music1a.ml [6K]
Warm-up: splitting a note sequence in two parts at a random place, and applying a randomly chosen operation (deletion, identity, or a note transposition by various amounts) to both parts of the sequence. Crucially this simplified model does not recursively subdivide and transform the sequence. Therefore, the state space remains sufficiently small even for exact inference. The lazy evaluation was essential.

music2.ml [13K]
The full model, with recursive subdivision. The state space is very large, and the probability of evidence is very low, of the order 10^(-7). Only importance sampling is feasible.

 

Realistic Example: Multitarget tracking

Milch's et al. BLOG chapter, Example 1.3, briefly described the problem of radar tracking of several aircrafts. David Poole posted the complete AILog2 code for a simple version of the example. After removing the over-simplification, we obtain the following model.

A flat region of space is observed by a radar on a 10x10 screen. A number of aircrafts fly through the region. The state (the location and the velocity) of an aircraft at each time step is a stochastic function of its state at the previous time step. With certain probability, the radar detects an aircraft, displaying a blip on the the screen. Due to the limited resolution of the radar, several aircrafts can give rise to the same blip. A number of false alarms may occur, each producing a blip. We observe the screen at several time moments and note the blips. The problem is to estimate the number of aircrafts. We may also want to identify the trajectory of each aircraft. A more general model allows aircrafts to disappear and new aircrafts to appear at each time step.

We found it crucial to sample the state of an aircraft and the location of a false alarm lazily. If the x-coordinate of the detected aircraft does not match the x-coordinate of any observed blip, we immediately report the contradiction (of the possible world with the evidence). We do not need to compute the y-coordinate of the aircraft, let alone the locations of other aircrafts and false alarms. Because of laziness, extending our model to 3D would increase the computation time by 50% (rather than ten-fold, for the 10x10x10 grid).

Sequential Monte Carlo was another necessary technique. We compute the complete distribution for the state of all the aircrafts at each time step. We can use exact inference or our importance sampling. For the ideal case without noise, we did both and observed good agreement, even with only 1500 samples. Computing the state of the system requires roughly the same amount of work per time step. This fact lets us trace the evolution of the system for as long as needed. As the noise decreases, the sampling results seem to converge to the `ground truth' (which can be inferred exactly). For simple cases, the computed results match calculations by hand.

The very end of the code file shows the results for the general case, of three blips moving horizontally. As the time progresses, the observed evidence favors more and more there being three aircrafts.

References
blip.ml [30K]
The successful attempt to express the model and do the inference. We used laziness systematically, almost mechanically, resisting the temptation to ``optimize.''

Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< http://people.csail.mit.edu/milch/papers/blog-chapter.pdf >

David Poole: the AILog2 web page
< http://www.cs.ubc.ca/spider/poole/aibook/code/ailog/ailog2.html >
The web page refers to the complete AILog2 code for the airplane-blip example. The code simplifies the problem: first, it restricts aircraft movements to a plane. Then it assumes that observations of the radar screen are incomplete: the screen may contain other blips. The latter assumption is an over-simplification. For our model we used the observation procedure as given in Example 1.3 of Milch's et al. BLOG chapter. Alas, that BLOG code did not specify various priors; we picked what we considered reasonable priors, taking a hint from the AILog model whenever possible.

blip_ailog.ml [9K]
blip_naive.ml [13K]
The re-implementation of the AILog code and the naive but ultimately unsuccessful attempt at the full model.

 

Realistic Example: sorted list

This problem may appear trivial to be called realistic. What makes it interesting is the astoundingly large search space. The problem has been posed by Georges Harik:

Draw N independent random samples from the uniform distribution of integers within the interval 0..M-1. What is the probability that the sequence of samples is sorted?

Theoretically, when M is large (and so duplicates are unlikely), the probability of the drawn samples being ordered is 1/N!. The task is to estimate that probability in one's favorite probabilistic system, setting M to be 1 million and N at least 10. With these parameters, the size of the search space is 10^60. Clearly exact inference and rejection sampling are preposterous to consider.

Harik's problem is ideal for a benchmark: it is easy to explain and to model. The expected result is also known. Getting any useful estimate of the result is rather difficult, especially without extensive fine-tuning of the inference algorithm.

Here are the results of solving the problem in HANSEI, using the importance sampling procedure from HANSEI library. In the model, the numbers are represented as lazy lists of bits, which proved crucial. Run times are in seconds on EeePC 701 (633 MHz CPU).

     M = 2^20, N = 10
      1000 samples  runtime 8 secs   estimated prob is 1.69e-07
      2000 samples  runtime 16 secs  estimated prob is 1.83e-07
      4000 samples  runtime 32 secs  estimated prob is 1.83e-07
      8000 samples  runtime 64 secs  estimated prob is 2.23e-07
     
         1/10! is 2.76e-07
     
     M = 2^20, N = 20
      24000 samples  runtime 468 secs  estimated prob is 3.64e-20
     
        1/20! is 4.11e-19
The convergence is a bit slow. We stress the use of a general-purpose importance sampling procedure, which knows nothing about numbers and numeric order, and which has not been tuned for the problem at hand.
References
sorted.ml [9K]
The source code for the model expressed in HANSEI. Most of the code is comprised of various tests. The full sorted problem is tackled near the end of the file. OCaml comments show the results of sample runs.

 

Bounded-Rational Theory of Mind for Conversational Implicature

[The Abstract of the talk]
Game-theoretic and relevance-theoretic accounts of pragmatic reasoning often implement Grice's maxim of Manner by penalizing utterances for their syntactic complexity or processing effort. In ongoing work, we propose to explain this penalty as the risk of misinterpretation in a bounded-rational theory of mind. The core of this explanation is an executable representation of probabilistic inference in which probabilistic models and inference algorithms are expressed as interacting programs in the same language. The payoff is that the same cognitive machinery that the hearer uses to decide what to do, by interpreting utterances, can also be used by the speaker to decide what to say, by predicting how potential utterances would be interpreted or misinterpreted by the hearer.

This is joint work with Chung-chieh Shan.

References
Chung-chieh Shan. The abstract of the Invited talk at Texas Linguistics Society XII. November 13-15 2009.
< http://uts.cc.utexas.edu/~tls/tls2009/abstracts/invited_shan.pdf >

Chung-chieh Shan. Slides of the talk at the Texas Linguistics Society (UT Austin) and the workshop on Logical Methods for Discourse (LORIA), 2009.
< http://www.cs.rutgers.edu/~ccshan/rational/discourse-slides.pdf >

incite.ml [17K]
The code for the model and for several experiments with it



Last updated February 5, 2010

This site's top page is http://okmij.org/ftp/

oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML