- Introduction
- **Hansei:** Embedded probabilistic DSL with discrete distributions, importance sampling, nested inference
- **Hakaru10:** Embedded probabilistic DSL with discrete and continuous distributions, MCMC (MH) sampling
- Problems of the Lightweight Implementation of Probabilistic Programming
- Metropolis-Hastings for Mixtures of Conditional Distributions
- Meta-Programming for Statistical Machine Learning
- Lifted inference: normalizing loops by evaluation

- Broadly speaking, probabilistic programming languages serve to express computations with degrees of uncertainty, which comes from imprecision in the input data, from lack of complete knowledge, or is inherent in the domain. More precisely, the goal of probabilistic programming languages is to represent and automate reasoning about *probabilistic models*, which describe uncertain quantities -- random variables -- and relationships among them.

The canonical example is the grass model, with three random variables representing the events of rain, of a switched-on sprinkler, and of wet grass. The (a priori) probabilities of the first two events are judged to be 30% and 50% respectively. Probabilities are non-negative real numbers that may be regarded as weights on non-deterministic choices. Rain almost certainly (90%) wets the grass. The sprinkler also makes the grass wet, in 80% of the cases. The grass may also be wet for some other reason; the modeler gives such an unaccounted event a 10% chance. This model is often depicted as a directed acyclic graph (DAG) -- a so-called Bayesian, or belief, network -- with nodes representing random variables and edges representing conditional dependencies. Associated with each node is a distribution (such as the Bernoulli distribution: the flip of a biased coin), or a function that computes a distribution from the node's inputs (such as the noisy disjunction `nor`).

The sort of reasoning we wish to perform on the model is finding the probability distribution of some of its random variables. For example, we can work out from the model that the probability of the grass being wet is 60.6%. Such reasoning is called probabilistic *inference*. Often we are interested in the distribution conditioned on the fact that some random variables have been observed to hold a particular value. In our example, having observed that the grass is wet, we want to find out the chance it was raining on that day. For background on statistical modeling and inference, the reader is referred to Pearl's classic text and to Getoor and Taskar's collection.

**References**

- Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. Revised 2nd printing, 1998.
- Lise Getoor and Ben Taskar. Introduction to Statistical Relational Learning. MIT Press, 2007.
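The figures just quoted can be checked by brute-force enumeration of the model's discrete joint distribution. The following sketch is plain Python, not any particular probabilistic language; the parameter names are my own, with the values taken from the description above:

```python
from itertools import product

# Parameters of the grass model (names are mine, values from the text)
P_RAIN, P_SPRINKLER = 0.3, 0.5
P_WET_IF_RAIN, P_WET_IF_SPRINKLER, P_WET_OTHER = 0.9, 0.8, 0.1

def p_wet_given(rain, sprinkler):
    """Noisy-OR: the grass stays dry only if every active cause fails."""
    dry = 1 - P_WET_OTHER
    if rain:
        dry *= 1 - P_WET_IF_RAIN
    if sprinkler:
        dry *= 1 - P_WET_IF_SPRINKLER
    return 1 - dry

# Exact inference by enumerating the discrete joint distribution
p_wet = p_wet_and_rain = 0.0
for rain, sprinkler in product([True, False], repeat=2):
    prior = (P_RAIN if rain else 1 - P_RAIN) * \
            (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    joint = prior * p_wet_given(rain, sprinkler)
    p_wet += joint
    if rain:
        p_wet_and_rain += joint

p_rain_given_wet = p_wet_and_rain / p_wet
print(f"P(wet)      = {p_wet:.3f}")             # 0.606
print(f"P(rain|wet) = {p_rain_given_wet:.3f}")  # 0.468
```

Enumeration reproduces the 60.6% figure for wet grass, and conditioning on the observation (dividing the joint probability of rain-and-wet by the probability of wet) answers the second question: rain rises from a 30% prior to roughly 47% once the grass is seen to be wet.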

- We identify two problems and an open research question with Wingate et al.'s lightweight implementation technique for probabilistic programming. Simple examples demonstrate that common, supposedly semantics-preserving program transformations drastically alter program behavior. We briefly describe an alternative technique that does respect program refactoring. There remains the question of how to be formally sure that the MCMC acceptance ratio is computed correctly, especially for models with conditioning and branching.
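The root of the problem can be illustrated with a toy version of the naming scheme: the lightweight technique keys each random choice in a database by its (program-text) position, so refactoring moves the keys. The sketch below is my own simplification, not Wingate et al.'s actual implementation; `Trace`, `flip` and the choice names are hypothetical:

```python
import random

class Trace:
    """Toy database of named random choices, in the spirit of lightweight
    (Wingate-et-al.-style) implementations: a choice is reused across MH
    proposals iff its name is already present in the database."""
    def __init__(self, db=None):
        self.db = dict(db or {})

    def flip(self, name, p=0.5):
        if name not in self.db:
            self.db[name] = random.random() < p
        return self.db[name]

# Two programs denoting the same distribution (a fair coin):
def model_a(t):                      # a flip in each branch arm
    return t.flip("then") if t.flip("c") else t.flip("else")

def model_b(t):                      # the flip hoisted out of the branches
    t.flip("c")
    return t.flip("x")

# Simulate one MH proposal that toggles "c" to False, keeping the rest
# of the database, as the lightweight technique prescribes.
random.seed(1)
t_a = Trace({"c": False, "then": True})  # previous run took the then-branch
out_a = model_a(t_a)   # else-branch now runs: "then" is stranded and a
                       # *fresh* value is drawn at the new name "else"
t_b = Trace({"c": False, "x": True})
out_b = model_b(t_b)   # reuses "x": the proposal leaves the output intact
```

Although `model_a` and `model_b` denote the same distribution, the same proposal resamples the output of one and preserves the output of the other, so the resulting Markov chains behave differently.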
**References**

- PPS2016.pdf [141K]
  The extended abstract published in the informal proceedings of the 2016 ACM SIGPLAN Workshop on Probabilistic Programming Semantics (PPS2016). January 23, 2016.
- David Wingate, Andreas Stuhlmueller and Noah D. Goodman: Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation. AISTATS 2011. Revision 3. February 8, 2014.

- Models with embedded conditioning operations -- especially with conditioning within conditional branches -- are a challenge for Markov chain Monte Carlo (MCMC) inference. They are out of scope for the popular Wingate et al. algorithm and many of its variations; computing the MCMC acceptance ratio in this case has been an open problem. We first demonstrate why such models are needed; we then derive the acceptance-ratio formula. The corresponding MH algorithm is implemented in the Hakaru10 system, which can thus handle mixtures of conditional distributions.
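To make "conditioning within a branch" concrete, here is a hypothetical discrete model (my own illustration, not an example from the abstract) in which the observation fires in only one branch, evaluated by exact weighted enumeration; the two branches contribute weights that are normalized differently, which is exactly what makes the MCMC acceptance ratio non-obvious:

```python
from itertools import product

# A mixture of conditional distributions:
#   c ~ Bernoulli(0.5)
#   if c:  x ~ Bernoulli(0.9); observe(x == True)   # conditioning here only
#   else:  x ~ Bernoulli(0.2)                       # no observation
def weight(c, x):
    w = 0.5                          # P(c)
    if c:
        w *= 0.9 if x else 0.1       # x ~ Bernoulli(0.9)
        if not x:                    # observe(x == True): runs violating
            w = 0.0                  # the observation get weight zero
    else:
        w *= 0.2 if x else 0.8       # x ~ Bernoulli(0.2)
    return w

total = sum(weight(c, x) for c, x in product([True, False], repeat=2))
p_c = sum(weight(True, x) for x in [True, False]) / total
print(f"P(c | observation) = {p_c:.4f}")
```

Exact enumeration is feasible here only because the model is tiny and discrete; for the general (continuous, branching) case one needs the acceptance-ratio formula derived in the abstract above.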
**References**

- PPS2017.pdf [199K]
  The extended abstract published in the informal proceedings of the 2017 ACM SIGPLAN Workshop on Probabilistic Programming Semantics (PPS2017). January 17, 2017.
- PPS2017-poster.pdf [98K]
  Poster at PPS 2017.

- Statistical machine learning (ML) is a broad branch of machine learning that aims at drawing conclusions and learning from inherently uncertain data, using ``ideas from probability theory and statistics to address uncertainty while incorporating tools from logic, databases, and programming languages to represent structure'' (Getoor and Taskar, 2007).

As the importance of ML increases, the scalability problem of developing ML applications becomes ever more pressing. Currently, applying machine learning to a non-trivial task requires expertise both in the modeled domain and in probabilistic inference methods and their efficient implementations on modern hardware. The tight coupling between the model and the efficient inference procedure hinders making changes and precludes reuse. When the model changes significantly, the inference procedure often has to be rewritten from scratch.

Probabilistic programming -- which decouples modeling and inference and lets them be separately written, composed and reused -- has the potential to make it remarkably easier to develop new ML tasks and keep adjusting them, while increasing confidence in the accuracy of the results. That promise has been recognized by the U.S. Defense Advanced Research Projects Agency (DARPA), which has initiated the broad program ``Probabilistic Programming for Advancing Machine Learning (PPAML)'', started in March 2013 and running through 2017.

Developing the potential of probabilistic programming requires applying recent insights from programming-language (PL) research, such as supercompilation, and from metaprogramming. A surprising challenge is correctness: it turns out that a number of well-known and widely used libraries and systems such as STAN may produce patently wrong results on some problems (as demonstrated in Hur et al., FSTTCS 2015).

We proposed a discussion-heavy workshop to promote the evident and growing interest of the developers of ML/probabilistic domain-specific languages in program generation and transformation, and of programming-language researchers in ML applications. Its aim was to discuss the not-yet-answered challenges and the issues raised at the POPL-affiliated ``Probabilistic Programming Semantics'' workshops, but in more depth and detail. The meeting took place at the Shonan Village Center, Kanagawa, Japan, on May 22-25, 2018.

The following questions were raised repeatedly during the meeting:

- What is correctness and how to measure it?
- Different communities have different takes. In programming languages, we consider correctness to mean a program (e.g., an implementation of an inference algorithm) satisfying its specification. The modeling and machine-learning community, on the other hand, takes correctness to mean a model adequately representing the data and generating useful predictions. That community is hence rather accepting of `faulty' implementations (which fail in some cases) so long as they are useful in evaluating the model. We have to be very clear which notion of correctness is at hand at each point in a discussion.
- How to represent complex data?
- The ML community needs to represent complex data (for example, complicated graphs); in contrast, many probabilistic programming languages support only very simple data types (in the extreme, only collections of floating-point numbers). Related is the question of a suitable notation for these complex data types.
- How to compose inference algorithms and control them?
- On the one hand, we would like to compose inference procedures as programs from library modules, where modules are treated as black boxes and are freely substitutable (provided they support the same interface). On the other hand, there are compelling examples for knowing the details of the inference algorithm so as to `guide' it (e.g., to `weight' the distributions used therein).
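The black-box side of the last question can be sketched as inference kernels sharing one interface and composed without looking inside them. The names (`Kernel`, `sequence`) and the toy kernels are hypothetical, not any particular system's API; real kernels would transform traces or parameter vectors rather than a bare float:

```python
import random
from typing import Callable, TypeVar

State = TypeVar("State")            # whatever the inference state is
Kernel = Callable[[State], State]   # one black-box transition step

def sequence(*kernels: Kernel) -> Kernel:
    """Compose kernels by running them in turn; each is a black box,
    substitutable for any other kernel over the same state type."""
    def composed(state):
        for k in kernels:
            state = k(state)
        return state
    return composed

# Two toy kernels over a float state, freely substitutable:
random.seed(0)
walk  = lambda x: x + random.gauss(0.0, 1.0)  # random-walk proposal step
clamp = lambda x: max(min(x, 10.0), -10.0)    # keep the state in bounds

step = sequence(walk, clamp)
print(step(0.0))
```

The tension noted above is that a combinator like `sequence` sees only the `State -> State` interface: there is no hook through which a user could `guide' or re-weight what happens inside `walk` without breaking the black-box abstraction.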

**References**

- <http://shonan.nii.ac.jp/seminar/113/>
  The web page of the seminar
- shonan-113-report.pdf [158K]
  The final report