# Embedded domain-specific languages for probabilistic programming

## Introduction

Broadly speaking, probabilistic programming languages are meant to express computations with degrees of uncertainty, which comes from imprecision in the input data, from incomplete knowledge, or is inherent in the domain. More precisely, the goal of probabilistic programming languages is to represent, and automate reasoning about, probabilistic models, which describe uncertain quantities -- random variables -- and the relationships among them. Probabilities are non-negative real numbers that may be regarded as weights on non-deterministic choices.

The canonical example is the grass model, with three random variables representing the events of rain, of a switched-on sprinkler, and of wet grass. The (a priori) probabilities of the first two events are judged to be 30% and 50%, respectively. Rain almost certainly (90%) wets the grass. The sprinkler also makes the grass wet, in 80% of the cases. The grass may also be wet for some other reason; the modeler gives such an unaccounted-for event a 10% chance. This model is often depicted as a directed acyclic graph (DAG) -- a so-called Bayesian, or belief, network -- with nodes representing random variables and edges representing conditional dependencies. Associated with each node is a distribution (such as the Bernoulli distribution: the flip of a biased coin), or a function that computes a distribution from the node's inputs (such as the noisy disjunction `nor`).
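To make the noisy disjunction concrete, here is a minimal Python sketch of the conditional distribution attached to the wet-grass node. It assumes that each cause wets the grass independently of the others, and that the 10% "other reason" acts as one more independent cause. The names `noisy_or`, `leak` and `wet_probability` are invented for this illustration and do not come from any particular system.

```python
def noisy_or(causes, leak=0.0):
    """Probability that the effect occurs under a noisy disjunction.

    causes: iterable of (is_active, strength) pairs; each active cause
            independently triggers the effect with probability `strength`.
    leak:   probability that the effect occurs for an unmodeled reason.
    (Hypothetical helper, written for this illustration only.)
    """
    p_no_effect = 1.0 - leak
    for is_active, strength in causes:
        if is_active:
            p_no_effect *= 1.0 - strength
    return 1.0 - p_no_effect


def wet_probability(rain, sprinkler):
    """P(grass is wet | rain, sprinkler), using the numbers from the text."""
    return noisy_or([(rain, 0.9), (sprinkler, 0.8)], leak=0.1)


# Print the conditional probability table of the wet-grass node.
for rain in (True, False):
    for sprinkler in (True, False):
        p = wet_probability(rain, sprinkler)
        print(f"rain={rain!s:5} sprinkler={sprinkler!s:5} P(wet)={p:.3f}")
```

Under these assumptions the grass is wet with probability 0.982 when it rains and the sprinkler is on, 0.91 with rain alone, 0.82 with the sprinkler alone, and 0.1 with neither.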

The sort of reasoning we wish to perform on the model is finding the probability distribution of some of its random variables. For example, we can work out from the model that the probability of the grass being wet is 60.6%. Such reasoning is called probabilistic inference. Often we are interested in the distribution conditioned on the fact that some random variables have been observed to have particular values. In our example, having observed that the grass is wet, we want to find out the chance that it was raining on that day. For background on statistical modeling and inference, the reader is referred to Pearl's classic text and to Getoor and Taskar's collection.
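For a model this small, both queries can be answered by brute-force enumeration of all possible worlds. The Python sketch below does that, assuming the causes of wet grass act independently (a noisy disjunction). The function `worlds` and the constant names are made up for this example; this is ordinary exact enumeration, not the inference algorithm of any particular probabilistic language.

```python
from itertools import product

# Numbers taken from the grass model described in the text.
P_RAIN, P_SPRINKLER = 0.3, 0.5
RAIN_WETS, SPRINKLER_WETS, OTHER_WETS = 0.9, 0.8, 0.1


def worlds():
    """Yield (rain, sprinkler, wet, probability) for every possible world."""
    for rain, sprinkler in product((True, False), repeat=2):
        p_causes = (P_RAIN if rain else 1 - P_RAIN) * \
                   (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
        # The grass stays dry only if every active cause fails to wet it.
        p_dry = 1 - OTHER_WETS
        if rain:
            p_dry *= 1 - RAIN_WETS
        if sprinkler:
            p_dry *= 1 - SPRINKLER_WETS
        yield rain, sprinkler, True, p_causes * (1 - p_dry)
        yield rain, sprinkler, False, p_causes * p_dry


# Marginal query: sum the weights of all worlds where the grass is wet.
p_wet = sum(p for _, _, wet, p in worlds() if wet)

# Conditional query: renormalize over the worlds consistent with the evidence.
p_rain_and_wet = sum(p for rain, _, wet, p in worlds() if rain and wet)
p_rain_given_wet = p_rain_and_wet / p_wet

print(f"P(wet)        = {p_wet:.4f}")
print(f"P(rain | wet) = {p_rain_given_wet:.4f}")
```

The marginal comes out at about 0.6058, matching the 60.6% quoted above. The conditional probability of rain given wet grass is 0.2838 / 0.6058, roughly 46.8%.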

References
Judea Pearl: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
Morgan Kaufmann. Revised 2nd printing, 1998.

Lise Getoor and Ben Taskar: Introduction to Statistical Relational Learning
MIT Press, 2007.

## Problems of the Lightweight Implementation of Probabilistic Programming

We identify two problems and an open research question with the lightweight implementation technique for probabilistic programming by Wingate et al. Simple examples demonstrate that common program transformations, which should be semantics-preserving, drastically alter the program behavior. We briefly describe an alternative technique that does respect program refactoring. There remains the question of how to be really, formally sure that the MCMC acceptance ratio is computed correctly, especially for models with conditioning and branching.

References
PPS2016.pdf [141K]
The extended abstract published in the informal proceedings of the 2016 ACM SIGPLAN Workshop on Probabilistic Programming Semantics (PPS2016). January 23, 2016.

David Wingate, Andreas Stuhlmueller and Noah D. Goodman: Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation.
AISTATS2011. Revision 3. February 8, 2014.

## Metropolis-Hastings for Mixtures of Conditional Distributions

Models with embedded conditioning operations -- especially with conditioning within conditional branches -- are a challenge for Markov chain Monte Carlo (MCMC) inference. They are out of scope of the popular Wingate et al. algorithm and of many of its variations. Computing the MCMC acceptance ratio in this case has been an open problem. First, we demonstrate why such models are needed. Second, we derive the acceptance ratio formula. The corresponding Metropolis-Hastings (MH) algorithm is implemented in the Hakaru10 system, which thus can handle mixtures of conditional distributions.
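For orientation, here is what the acceptance ratio looks like in the easy, textbook case: a model with a fixed set of random choices and a single observation at the top level. The Python sketch below is not the formula derived in the abstract and is not Hakaru10 code. It is a plain single-site Metropolis-Hastings sampler for the grass model from the Introduction, conditioned on the grass being wet. It assumes proposals are drawn from the prior of one randomly chosen variable, in which case the ratio reduces to a ratio of likelihoods. All names are made up for this illustration.

```python
import random

# Prior probabilities of rain and of the sprinkler being on (from the text).
PRIORS = [0.3, 0.5]


def likelihood_wet(rain, sprinkler):
    """P(wet | rain, sprinkler): a noisy disjunction with a 10% leak."""
    p_dry = 0.9
    if rain:
        p_dry *= 0.1
    if sprinkler:
        p_dry *= 0.2
    return 1.0 - p_dry


def mh_rain_given_wet(n_samples, seed=0):
    """Estimate P(rain | wet) by single-site Metropolis-Hastings."""
    rng = random.Random(seed)
    state = [rng.random() < p for p in PRIORS]
    rain_count = 0
    for _ in range(n_samples):
        # Propose: resample one randomly chosen variable from its prior.
        site = rng.randrange(len(state))
        proposal = list(state)
        proposal[site] = rng.random() < PRIORS[site]
        # With prior proposals the prior terms cancel, so the acceptance
        # ratio is just the ratio of the likelihoods of the observation.
        ratio = likelihood_wet(*proposal) / likelihood_wet(*state)
        if rng.random() < min(1.0, ratio):
            state = proposal
        rain_count += state[0]
    return rain_count / n_samples


estimate = mh_rain_given_wet(200_000)
print(f"MH estimate of P(rain | wet): {estimate:.3f}")
```

The estimate should approach the exact value 0.2838 / 0.6058, about 0.468. The cancellation used in the comment relies on every run making the same random choices and on the observation sitting outside any branch; once conditioning occurs inside conditional branches, that simplification is no longer available.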

References
PPS2017.pdf [199K]
The extended abstract published in the informal proceedings of the 2017 ACM SIGPLAN Workshop on Probabilistic Programming Semantics (PPS2017). January 17, 2017.

PPS2017-poster.pdf [98K]
Poster at PPS 2017

### Last updated February 4, 2017

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
