Classical Prolog is an enchanting misconception. It ought to be studied, for its ideas and lessons. One lesson is that guessing -- non-determinism -- is fundamental, but should not be the default mode of execution. One should guess, but guess lazily.
The strong points of Prolog can be brought into an ordinary functional programming language. Using OCaml as a representative, we implement lazy guessing as a library, with which we reproduce classical Prolog examples. Furthermore, we demonstrate parser combinators that use committed choice (maximal munch) and can still be run forwards and backwards. They cannot be written in Classical Prolog. Logic variables, unification, and the WAM compilation strategy naturally emerge as a ``mere optimization'' of the Herbrand universe enumeration.
rethinking-talk.pdf [261K]
Annotated slides of the talk at the 31st JSSST meeting, Sep 9, 2014
prolog_drawbacks.pro [6K]
Source code for the Prolog examples in the first part of the talk
logic_vars.ml [13K]
The complete source code for the Hansei examples in the second part of the talk
NII2012.pdf [198K]
Non-deterministic choice in a conventional programming language: Enough for logic programming?
Annotated slides of the presentation at the 55th Tokyo Programming Seminar
(ToPS) at the National Institute of Informatics, Japan. December 18, 2012.
An inspiring conversation with Mikael More brought up unthought-of questions. What if we do try to use Hansei to solve typical logic programming puzzles? Can we even do it? Can we do it efficiently? How much of logic programming will remain? In particular, can we read the Hansei code as a clear, declarative statement of the problem, without being distracted by the details of the solution? The present web page documents the attempts to answer all these questions. Hansei turns out to be suitable for logic programming after all.
We discuss logic programming without logic variables in a separate article below. Here we argue why one might want to consider Hansei when solving a typical logic programming problem.
Hansei adheres to declarative programming, by separating building probabilistic models from inference. A model describes the properties of a possible world and defines computations and constraints on them. Inference finds possible worlds satisfying the properties and the constraints. Building models and writing inference (search) procedures are distinct activities with distinct abstractions; writing custom inference procedures is rare since the ones in Hansei often suffice.
Many uses of cuts, negation and committed choice in Prolog are a way of telling that a certain predicate is (semi-)deterministic; in other words, the underlying relation is functional. In Hansei, determinism is the default; deterministic computations run at the `full speed' of OCaml, with no overhead. Hansei is thus particularly suitable for logic programs with a large portion of deterministic, especially numeric, computations. Hansei, like the underlying OCaml, is typed. Types are notably useful in logic programming, to avoid a typical Prolog problem: a mis-spelled constant or variable leading to an incorrect answer rather than an error message. Without types, such errors are frustratingly difficult to find.
Probability estimations turn out useful too: if we prematurely terminate a Hansei search, we obtain an estimate of the probability of missing a solution. Assigning different weights to different choices (that is, using non-uniform probability distributions) is a way to supply heuristics to bias the search.
Except for probability, the arguments for Hansei are the arguments for functional logic programming, equally applicable to languages like Curry. The strong case for functional-logic programming has been made elsewhere; see in particular the surveys by Michael Hanus. The advantage of Hansei is that it is not a separate language: it is just a user-extensible OCaml library. The arguments for Hansei also apply to embeddings of non-determinism in a typed functional language (for example, MonadPlus in Haskell), provided the crucial feature of lazy sharing is supported.
HANSEI: Embedded Probabilistic Programming
Purely Functional Lazy Non-deterministic Programming
The paper describes the importance, the equational theory and the implementation of lazy sharing. The implementation of letlazy
in Hansei is based on the technique explained in the paper.
Guessing is a part of life and science. We form a hypothesis, work out the consequences and compare with observations. Lots of problems are formulated by first assuming that the solution exists and then describing the properties it should have. Planning, scheduling, diagnostics, learning problems and Sudoku all follow this pattern. Guessing is good not only for describing these problems but also for solving them. We make a guess -- often a series of guesses -- to build, for example, a schedule, and check if it satisfies resource, timeliness and other constraints. Often, we guess again. How do we write ``guess the value of this variable'' in code? How do we code ``guess again''? How do we put in prior knowledge favoring some guesses? The talk will first answer these questions.
Naive guessing however is hopeless even for toy problems. We often have to make lots of guesses before getting a candidate solution to check against the constraints. Only a tiny or even infinitesimal proportion of these guesses yield a successful candidate. How to make good guesses? That is very hard to know: Most physical, biological, sociological, etc. models are set up to compute the consequences of causes rather than the other way around. It helps to reformulate the question: how to avoid too many bad guesses? The talk will describe and illustrate a general principle, found in any serious logic, non-deterministic or probabilistic programming system.
The techniques explained in the talk are not tied to any language and can be used even in C. However, functional languages, especially typed functional languages, have a serious advantage, as we shall see. No prior knowledge of logic or non-deterministic programming is required. The ability to read introductory OCaml or Haskell code will be helpful. The participants will learn how to guess in their favorite language, and what it takes to succeed. They will see laziness, unification and constraint propagation in the same light.
There are several programming languages built around non-determinism: standalone -- such as Icon, Prolog, Curry -- or deeply embedded like Kanren. The talk is not about them. It is about your language. Although the talk uses OCaml for concreteness, the main points are not OCaml-specific and have been realized in other languages. We will not be talking about implementing a dialect of Prolog in your language. Rather, we argue for using non-determinism in your language directly. There is no new language to learn or a strict convention to follow. You write code as you normally do; you call any function in any library as usual -- without any wrappers and without pondering how to turn that third-party function into a relation or to make it reversible. Likewise, you can give non-deterministic callbacks to unsuspecting third-party functions.
Does this `direct guessing' measure up to bona fide logic programming? After all, your language, like OCaml, may have no logic variables, unification, or constraint solvers. The talk -- and this page -- describe several classical examples of logic programming, all implemented in OCaml with the Hansei library. You can do logic programming in your language. All you need is fork(2) -- a form of clonable threads with a backtrackable (world-local) state.
Tim Menzies, David Owen and Julian Richardson: ``The Strangest Thing About Software''
IEEE Computer, January 2007, pp. 54-60.
< http://menzies.us/pdf/07strange.pdf >
Abstract: ``Although there are times when random search is dangerous and should
be avoided, software analysis should start with random methods because
they are so cheap, moving to the more complex methods only when random
methods fail.''
The article makes the compelling case for guessing as a way of solving
problems. It details two examples, scheduling and finite-model
checking, where random search turns out faster than sophisticated
methods. These examples are not accidental, the authors argue.
puzzle.ml [6K]
The code for the puzzle at the beginning of the talk
`Reversible' parser combinators
The second example in the talk
type_inference.ml [10K]
The code for the reversible type inference, the last example in the talk
Logic programming without logic variables should be obvious. The minimal model semantics of Prolog needs no logic variables; after all, Herbrand basis is by definition a set of ground atoms. The fixpoint construction (iterating the immediate consequence operator) gives the algorithm of finding the minimal model with no use for variables or unification. The reality dims the hope: although the fixpoint construction is viable for definite Datalog programs, it is difficult to perform when the Herbrand basis is infinite and the logic program contains negation and committed choice.
Two recent theoretical results in functional logic programming, Antoy and Hanus (2006) and Dios Castro and Lopez-Fraguas (2006), point out that logic variables and non-determinism due to overlapping rules have the same computational power. The main idea is representing a logic variable by a generator of its domain, with sharing of the generator expressing repeated occurrences of the variable. The idea is intuitively clear: a logic variable indeed stands for some term, non-deterministically chosen from a suitable domain. The papers derive a systematic transformation of a program with logic variables into one without, while preserving the set of derivations. Alas, preserving the set of derivations (reductions) is a rather weak notion of program equivalence. For example, it does not account for the multiplicity of results, considering a program that yields, say, a single 0 equivalent to a program yielding the infinite sequence of 0s. In logic programming practice, we usually do distinguish such programs, valuing programs that produce a finite set of answers in finite time. The other drawback of the theoretical frameworks used in both papers is the inability to reason about finite failure. A logic program that returns "NO" and a program that loops forever are regarded as equivalent in these frameworks. In practice we consider such programs as very different. Since the papers are theoretical, the details about sharing and fairness of generators are sketchy.
Aware of the theoretical hopes and practical shortcomings, we decided to press ahead and see just how far we can go expressing standard Prolog idioms in a language without logic variables. Pending adequate theoretical analysis, empirical case studies seem the only choice. The present web page documents our results. The somewhat unexpected conclusion is that we go quite far. Hansei as it is appears suitable for logic programming.
We have gone back to Herbrand: we build the Herbrand universe (the set of
all ground terms) and explore it to find a model of a program
(that is, tuples of terms that model all declared relations).
We build the universe by modeling `logic variables'
as generators for their domains. It greatly helps that Hansei is
typed; the type of a `logic variable' specifies its domain and, hence,
the generator. Since the Herbrand universe for most logic programs
is infinite, we need
non-strict evaluation. Furthermore, since logic variables may
occur several times, we must be able to correlate the generators. Although
Hansei is embedded in OCaml, which is strict, Hansei supports
on-demand evaluated non-deterministic computations with sharing:
so-called letlazy
computations. Finally, since the search space
is generally infinite, we need a systematic way of exploring it,
without getting stuck in one infinite sub-region. Again Hansei provides
a number of search strategies, including sampling and iterative deepening.
(We shall use the latter in all our examples.)
Logic variables and unification were introduced by Robinson as a way to `lift' Herbrand's ground resolution proofs, to avoid generating the vast number of ground terms. Logic variables effectively delay the generation of ground terms to the last possible moment and to the least extent. Doing computations only as far as needed is also the goal of lazy evaluation. It appears that lazy evaluation can make up for logic variables, rendering Herbrand's original approach practical.
John Alan Robinson: ``Computational Logic: Memories of the Past and Challenges for the Future''
Proceedings of the First International Conference on Computational Logic, LNAI 1861, pp. 1-24, 2000.
John Alan Robinson reminisces about Herbrand, unification and the discovery of resolution.
Sergio Antoy and Michael Hanus: Overlapping rules and logic variables in Functional Logic Programs
ICLP 2006.
J. de Dios Castro and F.J. Lopez-Fraguas: Elimination of Extra Variables in Functional Logic Programs
Proc. V Jornadas sobre Programacion y Lenguajes (PROLE'06), pp. 121-135, 2006.
Our first example is the append relation. Append relates three lists l1, l2 and l3 such that l3 is the concatenation of l1 and l2. Throughout, we take our lists to be lists of booleans. We will contrast Prolog code with Hansei code, developing both incrementally and interactively. The responses of the Prolog resp. OCaml top-levels are indented.

In Prolog, the append relation is stated as:
    append([],L,L).
    append([H|T],L,[H|R]) :- append(T,L,R).

declaring that an empty list is a prefix of any list, a list is a suffix of itself, and prepending to the prefix of a list prepends to the list. Certainly append can concatenate two lists:

    ?- append([t,t,t],[f,f],X).
    X = [t, t, t, f, f].

By passing a free logic variable as one of the first two arguments, we can concatenate not fully determined lists:

    ?- append([t,t,t],X,R).
    R = [t, t, t|X].

We see the single result, which stands for every list with the prefix [t,t,t]. Such a compact representation of an infinite set is an asset. Alas, it is not always available, and is often a mirage. For example, if we pass a free variable as the first argument of append, Prolog responds differently:

    ?- append(X,[f,f],R).
    R = [f, f] ;
    R = [_G328, f, f] ;
    R = [_G328, _G334, f, f] ;
    R = [_G328, _G334, _G340, f, f] .

with an infinite set of answers, to be enumerated indefinitely. It may not be fully apparent that our Prolog code has not faithfully represented the original problem: to relate boolean lists. The last two answers describe lists of more than mere booleans. We have to impose the restriction, by defining the type predicates
    bool(t). bool(f).
    boollist([]).
    boollist([H|T]) :- bool(H), boollist(T).

and re-writing our queries:

    ?- append([t,t,t],X,R), boollist(X), boollist(R).
    R = [t, t, t] ;
    R = [t, t, t, t] ;
    R = [t, t, t, t, t] ;
    R = [t, t, t, t, t, t] ;
    R = [t, t, t, t, t, t, t, t, t] .

    ?- append(X,[f,f],R), boollist(X), boollist(R).
    R = [f, f] ;
    R = [t, f, f] ;
    R = [f, f, f] .

One of boollist(X) or boollist(R) would have been enough: if X is a boolean list, so is R. Alas, Prolog has no type inference and is unable to infer or take advantage of that fact. To be safe, we add both predicates. The compact representation for the lists with the [t, t, t] prefix has disappeared. More seriously, the default depth-first search strategy of Prolog gives us only half of the answers; we won't ever see the lists with an f after the first three t.
We switch to Hansei. In a typed language, types -- the specification of the problem -- come first. We start by defining the type blist of boolean lists with a non-deterministic spine, along with the convenient constructors:

    type bl = Nil | Cons of bool * blist
    and blist = unit -> bl;;
    let nil : blist = fun () -> Nil;;
    let cons : bool -> blist -> blist = fun h t () -> Cons (h,t);;

and the conversion to ordinary, fully deterministic lists, to see the results:

    let rec list_of_blist bl = match bl () with
      | Nil -> []
      | Cons (h,t) -> h :: list_of_blist t;;
    val list_of_blist : blist -> bool list = <fun>
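Going the other way is occasionally convenient as well. The following helper, blist_of_list, is our own addition rather than part of the accompanying source: it injects an ordinary list into a blist with a fully deterministic spine.

    (* our own convenience helper: an ordinary list as a blist *)
    let rec blist_of_list : bool list -> blist = function
      | []     -> nil
      | h :: t -> cons h (blist_of_list t);;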
We now define append, seemingly as a function, in the fully standard, also declarative, way:

    let rec append l1 l2 = match l1 () with
      | Nil -> l2
      | Cons (h,t) -> cons h (fun () -> append t l2 ());;
    val append : blist -> blist -> blist = <fun>
An attempt to concatenate two sample lists:

    let t3 = cons true (cons true (cons true nil));;
    let f2 = cons false (cons false nil);;
    append t3 f2;;
    - : blist = <fun>

turns out not informative. Recall that we use the Hansei library to build a probabilistic model, which we then have to run. Running the model determines the set of possible worlds consistent with the probabilistic model: the model of the model. The set of model outputs in these worlds is the set of answers. Hansei offers a number of ways to run models and obtain the answers and their weights. We will be using iterative deepening:

    val reify_part : int option -> (unit -> 'a) -> (Ptypes.prob * 'a) list

The first argument is the search depth bound (infinite, if None).

    reify_part None (fun () -> list_of_blist (append t3 f2));;
    - : (Ptypes.prob * bool list) list = [(1., [true; true; true; false; false])]

Running our first append model has given the expected result, with the expected weight. Following along the Prolog code, we turn to appending to an indeterminate list.
Indeterminate lists are represented by generators. Hence, we need generators for the bool and bool list domains. The generators must use letlazy, to delay the generation until the value is needed; once the value has been determined, it stays the same.

    let a_boolean () = letlazy (fun () -> flip 0.5);;
    val a_boolean : unit -> unit -> bool = <fun>
    let rec a_blist () = letlazy (fun () ->
      dist [(0.5, Nil); (0.5, Cons(flip 0.5, a_blist ()))]);;
    val a_blist : unit -> blist = <fun>
Let us see a sample of generated values, as a spot-check:

    reify_part (Some 5) (fun() -> list_of_blist (a_blist ()));;
    - : (Ptypes.prob * bool list) list =
    [(0.5, []); (0.125, [false]); (0.03125, [false; false]);
     (0.03125, [false; true]); (0.125, [true]); (0.03125, [true; false]);
     (0.03125, [true; true])]

The lists of different lengths have different probabilities, that is, different likelihoods of encountering them. We could use the accumulated probability mass as the measure of the explored search space, terminating the search once a threshold is reached. One is reminded of probabilistic algorithms such as the Rabin-Miller primality test.
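For instance, the accumulated mass of the answers returned by reify_part can be computed with an ordinary fold (our own sketch; Ptypes.prob is a float):

    (* total probability mass of the answers found so far *)
    let mass answers = List.fold_left (fun acc (p,_) -> acc +. p) 0. answers;;

Applied to the spot-check above, mass gives 0.875: the depth-5 search may have missed lists of total weight at most 0.125.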
We are now ready to reproduce the Prolog code, passing a `free' variable as an argument to append and obtaining a sequence of lists with the given prefix or suffix. Unlike Prolog, we are not stuck in a corner of the search space. We generate all lists with the given prefix [true; true; true].

    reify_part (Some 5) (fun() ->
      let x = a_blist () in
      list_of_blist (append t3 x));;
    - : (Ptypes.prob * bool list) list =
    [(0.5, [true; true; true]); (0.125, [true; true; true; false]);
     (0.03125, [true; true; true; false; false]);
     (0.03125, [true; true; true; false; true]);
     (0.125, [true; true; true; true]);
     (0.03125, [true; true; true; true; false]);
     (0.03125, [true; true; true; true; true])]

    reify_part (Some 5) (fun() ->
      let x = a_blist () in
      list_of_blist (append x f2));;
    - : (Ptypes.prob * bool list) list =
    [(0.5, [false; false]); (0.125, [false; false; false]);
     (0.03125, [false; false; false; false]);
     (0.03125, [false; true; false; false]);
     (0.125, [true; false; false]);
     (0.03125, [true; false; false; false]);
     (0.03125, [true; true; false; false])]
Prolog's append is so elegant because it defines a relation among three lists. One can specify any two lists and query for the other one that makes the relation hold. For example, we check if a given list has a given prefix, and if so, remove it:

    ?- append([t,t],X,[t,t,t,f,f]).
    X = [t, f, f].

We can do the same in Hansei, effectively `running append backwards.' We introduce a sample list to deconstruct and a function to compare two lists. It is an ordinary function, not a built-in.

    let t3f2 = append t3 f2;;
    let rec bl_compare l1 l2 = match (l1 (), l2 ()) with
      | (Nil,Nil) -> true
      | (Cons (h1,t1), Cons (h2,t2)) -> h1 = h2 && bl_compare t1 t2
      | _ -> false;;
    val bl_compare : blist -> blist -> bool = <fun>

The prefix-removal example is as follows. In words: we assert that there exists a boolean list x such that appending it to [true; true] yields the list t3f2. We ask Hansei to find all possible worlds in which the assertion holds and report the value of x in these worlds.

    reify_part None (fun() ->
      let x = a_blist () in
      let r = append (cons true (cons true nil)) x in
      if not (bl_compare r t3f2) then fail ();
      list_of_blist x);;
    - : (Ptypes.prob * bool list) list = [(0.0078125, [true; false; false])]

Since the search bound was None, the returned single result came from the exhaustive enumeration of the entire search space. The search space is indeed infinite, but amenable to efficient exploration because most of the possible worlds are inconsistent with the assertion and could be rejected wholesale, without generating them. For example, when comparing an indeterminate list to an empty list, bl_compare returns the result after forcing a single choice. One can tell if a list is empty by checking for the first Cons cell; one does not need to know how many more elements there may be in the list. In general, if the lists are not equal, that can be determined by examining a finite prefix. Furthermore, if one of bl_compare's arguments has a fixed length (as is the case in our example: t3f2 is a list of 5 elements), the comparison finishes after forcing finitely many choices.
The ability of bl_compare to deal with partially determined lists relates it to unification: unifying [X|Y] with the empty list fails without the need to instantiate X or Y, in effect performing the comparison with the infinite set of concrete lists.
We can likewise use append to check for, and remove, a given suffix. For example (a sketch of our own, following the prefix-removal pattern above; f2 is the suffix [false; false]):
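    reify_part None (fun() ->
      let x = a_blist () in
      (* condition: x ++ [false;false] equals [true;true;true;false;false] *)
      if not (bl_compare (append x f2) t3f2) then fail ();
      list_of_blist x);;

The single surviving world should have x = [true; true; true], with weight 0.0078125, the probability of generating that three-element list.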
We can also split a given list in all possible ways, returning its prefixes and suffixes. If the list is finite, we obtain a finite number of answers, in Prolog:
    ?- append(X,Y,[t,t,t,f,f]).
    X = [], Y = [t, t, t, f, f] ;
    X = [t], Y = [t, t, f, f] ;
    X = [t, t], Y = [t, f, f] ;
    X = [t, t, t], Y = [f, f] ;
    X = [t, t, t, f], Y = [f] ;
    X = [t, t, t, f, f], Y = [] ;
    false.

and, similarly, in Hansei:

    reify_part None (fun() ->
      let x = a_blist () in
      let y = a_blist () in
      let r = append x y in
      if not (bl_compare r t3f2) then fail ();
      (list_of_blist x, list_of_blist y));;
    - : (Ptypes.prob * (bool list * bool list)) list =
    [(0.000244140625, ([], [true; true; true; false; false]));
     (0.000244140625, ([true], [true; true; false; false]));
     (0.000244140625, ([true; true], [true; false; false]));
     (0.000244140625, ([true; true; true], [false; false]));
     (0.000244140625, ([true; true; true; false], [false]));
     (0.000244140625, ([true; true; true; false; false], []))]

The Hansei code follows the demonstrated pattern, of building a probabilistic model returning a pair of two `random' lists x and y, conditioned upon the asserted evidence that their concatenation is t3f2. In the above code, the OCaml variable x is used twice. Although it is bound to a generator, its two use occurrences refer to the same generated value -- thanks to letlazy in the definition of a_blist.
Our final example is last, relating a list to its last element. We contrast Prolog code

    last(E,L) :- append(_,[E],L).
    ?- last(E,[t,t,f]).
    E = f ;

to Hansei code:

    let last bl =
      let x = a_blist () in
      let e = a_boolean () in
      if not (bl_compare (append x (cons (e ()) nil)) bl) then fail ();
      e;;
    val last : blist -> unit -> bool = <fun>

    reify_part None (fun() -> last t3f2 ());;
    - : (Ptypes.prob * bool) list = [(0.0009765625, false)]
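The Hansei last is just as relational. For instance, we can enumerate lists whose last element is true -- a sketch of our own, where the depth bound keeps the enumeration finite:

    reify_part (Some 5) (fun () ->
      let bl = a_blist () in        (* an indeterminate list          *)
      let e = last bl in            (* its guessed last element       *)
      if not (e ()) then fail ();   (* require the last element: true *)
      list_of_blist bl);;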
There are still open questions. The following code produces the result with no restrictions on the search space:

    reify_part None (fun () ->
      let x = a_blist () in
      let y = a_blist () in
      if not (bl_compare x t3f2) then fail ();
      if not (bl_compare x y) then fail ();
      list_of_blist y);;
    - : (Ptypes.prob * bool list) list =
    [(2.384185791015625e-07, [true; true; true; false; false])]
A slightly modified code, with (bl_compare x y) first, gives the same result but requires a restriction on the search depth: the search space can no longer be efficiently enumerated. Can we automatically reorder the comparison statements, to improve the search? Another open problem is the `occurs check.' Its absence in Hansei again necessitates the restriction on the search space.
Neither of the two open issues is fatal: restricting the search by depth or by the remaining probability mass (effectively, by the probability of missing a solution if one exists) is always possible and even natural. It is nevertheless interesting to see if the occurs check and the unification of two unbound variables have a clear analogue in the variable-less case.
Which are all the possible daily schedules for Monday to Wednesday, regarding those four events?

- One of the days I'll shop;
- One of the days I'll take a walk;
- One of the days I'll go to the barber;
- One of the days I'll go to the supermarket;
- The same day as I go to the supermarket, I'll shop;
- The same day as I take a walk I'll go to the barber;
- I'll go to the supermarket the day before the day I'll take a walk;
- I'll take a walk Tuesday.

We take the phrase `One of the days I'll shop.' to mean `I shop only on one day out of the three.' Otherwise, there are too many solutions: for example, I can do all the activities on each day. We develop our examples interactively, by submitting expressions to the top-level of the OCaml interpreter after loading the Hansei library. In the transcript below, the OCaml top-level responses are indented.
The gist of the problem is selecting a subset of actions to do on one day from the set of possible actions. In other words, we need to sample from a powerset. Such a sampling function could have been provided by Hansei. It is however easy and illustrative to define from scratch:
    open ProbM;;
    let powerset lst = List.fold_left
      (fun acc e -> if flip 0.5 then e::acc else acc) [] lst;;
    val powerset : 'a list -> 'a list = <fun>

In words: to choose a subset of the set lst we decide, randomly and independently, for each element of lst, whether to take it or not. We use OCaml lists to represent sets. To spot-check our sampling procedure we compute the probability table for all samples from powerset [1;2;3]:

    exact_reify (fun () -> powerset [1;2;3]);;
    - : int list Ptypes.pV =
    [(0.125, Ptypes.V [3; 2; 1]); (0.125, Ptypes.V [3; 2]);
     (0.125, Ptypes.V [3; 1]); (0.125, Ptypes.V [3]);
     (0.125, Ptypes.V [2; 1]); (0.125, Ptypes.V [2]);
     (0.125, Ptypes.V [1]); (0.125, Ptypes.V [])]

Indeed all subsets are present, with equal probabilities.
We define another simple but helpful function, which asserts the proposition x in the current world:

    let mustbe x = if not x then fail ();;
    val mustbe : bool -> unit = <fun>
We are ready to encode our problem. We define the datatype enumerating the possible actions:

    type action = Shop | Walk | Barber | Supermarket;;

To define the model we specify the sampling procedure for the actions and enumerate the constraints, following the most straightforward generate-and-test approach.

    let schedule_model () =
      let ndays = 3 in

We determine the set of actions to do on day d. The set is chosen randomly from the set of all possible actions. However, if we ask again which actions were chosen for day d, we should get the same answer. Thus actions d acts as a logic variable: its value, initially indeterminate, once determined, does not change. We use the memo facility of Hansei to define such memoized stochastic functions.

      let actions = memo (fun d -> powerset [Shop; Walk; Barber; Supermarket]) in

The proposition action_on a d states that on day d I do action a. The proposition only_on a d asserts that I do action a only on day d (and not on any other day):

      let action_on a d = List.mem a (actions d) in
      let only_on a d = [d] = List.filter (action_on a) [0;1;2] in

We now state the constraints. For example, the first two lines below assert that there exists a day on which I shop, and that I do not shop on any other day.

      let d = uniform ndays in
      let _ = mustbe (only_on Shop d) in
      let d = uniform ndays in
      let _ = mustbe (only_on Walk d) in
      let d = uniform ndays in
      let _ = mustbe (only_on Barber d) in
      let d = uniform ndays in
      let _ = mustbe (only_on Supermarket d) in
      (* The same day as I go to the supermarket, I'll shop. That is, there
         is a day on which I perform the action Shop and the action
         Supermarket *)
      let d = uniform ndays in
      let _ = mustbe (action_on Supermarket d && action_on Shop d) in
      let d = uniform ndays in
      let _ = mustbe (action_on Walk d && action_on Barber d) in
      (* I'll go to the supermarket the day before the day I'll take a walk. *)
      let d = uniform ndays in
      let _ = mustbe (action_on Walk d && d > 0 && action_on Supermarket (d-1)) in
      (* I'll take a walk Tuesday. *)
      let _ = mustbe (action_on Walk 1) in

Finally, we return the schedule: the array of actions for each day. This is the output of the model.

      Array.init ndays actions;;
    val schedule_model : unit -> action list array = <fun>

The model is defined. We now `solve' it, determining if there exists a schedule satisfying all constraints. Hansei's reply is nearly instantaneous.
    exact_reify schedule_model;;
    - : action list array Ptypes.pV =
    [(1.11632658893461385e-07,
      Ptypes.V [|[Supermarket; Shop]; [Barber; Walk]; []|])]

There is only one schedule satisfying the constraints: on Monday I go to the supermarket and shop; on Tuesday I walk and get a haircut. Hansei has also computed the `probability' of the solution, which is an estimate of the search space. Each choice in the model corresponds to a possible world; many possible worlds are not consistent with the stated constraints and facts and are rejected. The reported 1.11632658893461385e-07 is the weight of the surviving possible world(s), the one(s) that satisfied all constraints. Since all our choices were uniform, the weight represents one world in 1/1.11632658893461385e-07 = 8957952 of them. Since worlds can be rejected wholesale, Hansei did not have to generate all of them. If one fact in a world contradicts the evidence, we do not need to know the other facts; that is, we do not need to make the choices for them.

The all_different constraint is a hassle to state and a chore to evaluate; constraint logic programming (sub)systems (e.g., SICStus) typically rely on external constraint solvers, where all-different is built-in. Even without `all-different' the puzzle is hard; the naive generate-and-test approach is too slow; a bit of creativity is required.

Hansei's formulation of the problem is straightforward and declarative: the naive generate-and-test. That was the point: to see how well Hansei copes without the benefit of a creative formulation or fine-tuning. Hansei copes quite well. It helped that the deterministic parts of the puzzle are more naturally expressed in a functional language with data structures and pattern-matching (Hansei's host language, OCaml) than in Prolog. We use non-determinism only when we need it. It also greatly helped that the all-different constraint, or sampling all permutations, turns out to be efficiently, lazily, implementable in Hansei.
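To illustrate the idea in its simplest form, here is a naive permutation sampler in Hansei, picking each successive element uniformly among those remaining. This is our own sketch, not the lazier technique actually used in zebra.ml:

    (* naively sample a permutation of a list of distinct elements:
       choose one of the remaining elements, then permute the rest *)
    let rec a_permutation = function
      | [] -> []
      | lst ->
          let x = uniformly (Array.of_list lst) in
          x :: a_permutation (List.filter (fun y -> y <> x) lst);;

For distinct elements, each permutation is produced with probability 1/n!.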
zebra.ml [6K]
The complete, commented source code, with every significant line annotated.
Hansei has determined that the puzzle has only one solution,
whose weight is 5e-12
. The search space is large indeed.
The Hansei parser combinator library looks quite like the famous Parsec, by which it is inspired. To be sure, the sort of non-determinism needed for Parsec is easily emulated in many languages -- that's why Parsec has spread so widely. And yet Hansei brings something new: running Parsec parsers not only forwards, parsing a given stream, but also effectively backwards, to generate a stream.
As usual, Hansei parsers digest a stream of characters; the characters do not need to be all present in memory, but can be read on demand. That's why stream is a thunk. A parser takes a stream and returns the parsing result -- the result of a semantic action -- and the remainder of the stream:

    type stream_v = Eof | Cons of char * stream
    and stream = unit -> stream_v
    type 'a parser = stream -> 'a * stream
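The function stream_of_string, used below to run parsers forwards, converts an in-memory string to such a stream. A possible definition (our own sketch; the actual one is in the accompanying source):

    (* read the string character by character, on demand *)
    let stream_of_string : string -> stream = fun s ->
      let rec go i () =
        if i >= String.length s then Eof else Cons (s.[i], go (i+1))
      in go 0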
The two primitive parsers are pure:

    let pure : 'a -> 'a parser = fun x st -> (x,st)

which parses the empty string, returning its argument as the parsing result, and p_sat:

    let p_sat : (char -> bool) -> char parser = fun pred st ->
      match st () with
      | Cons (c,st) when pred c -> (c,st)
      | _ -> fail ()

which checks that the current element of the stream satisfies a given predicate -- and if it does not, reports failure using Hansei's fail. Other parsers are written in terms of the above, for example, the parser of a character:

    let p_char : char -> char parser = fun c -> p_sat (fun x -> x = c)
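The palindrome example below also uses empty, the parser of the empty string with the unit semantic action. We assume it is defined in terms of pure:

    (* our assumed definition: consume nothing, return unit *)
    let empty : unit parser = pure ()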
Combinators combine parsers and their semantic actions and express the rules of the grammar:

    val (<*>)  : ('a -> 'b) parser -> 'a parser -> 'b parser
    val (<$>)  : ('a -> 'b) -> 'a parser -> 'b parser
    val ( *> ) : 'a parser -> 'b parser -> 'b parser
    val ( <* ) : 'a parser -> 'b parser -> 'a parser
    val p_fix  : ('a parser -> 'a parser) -> 'a parser
    val many   : 'a parser -> 'a list parser

For example, (<*>) combines parsers sequentially. The parsers (<$>), ( *> ), ( <* ) are the variations with respect to semantic actions; their types show what they do. Alternative rules in a grammar are expressed with the alternation combinators

    let (<|>) : 'a parser -> 'a parser -> 'a parser =
      fun p1 p2 st -> uniformly [|p1;p2|] st
    let alt : 'a parser array -> 'a parser = fun pa st -> uniformly pa st

which rely on the Hansei operator for non-deterministic choice, of a parser from a given set.
Here is the first example: recognizing palindromes. Since we build a recognizer, the semantic actions do nothing, returning unit. The grammar reads quite like BNF: A palindrome over the two-character alphabet is either the empty string, a single character, or a palindrome flanked on both sides with the same character:
    let pali = p_fix (fun pali -> alt [|
        empty;
        (fun _ -> ()) <$> p_char 'a';
        (fun _ -> ()) <$> p_char 'b';
        p_char 'a' *> pali <* p_char 'a';
        p_char 'b' *> pali <* p_char 'b' |])
To parse a string according to the grammar -- to `run the parser forwards' -- we convert a string to a stream, apply the parser and check that the stream is fully consumed:
    let run_fwd : 'a parser -> string -> 'a Ptypes.pV = fun p s ->
      exact_reify (fun () ->
        let (v,s) = p (stream_of_string s) in
        if s () <> Eof then fail ();
        v)

We could just as well parse a file rather than an in-memory string. Evaluating run_fwd pali "abaaba" shows that "abaaba" is a palindrome.

More interestingly, the very same parser can be run effectively backwards, telling us all the strings it parses.
    let run_bwd : int option -> 'a parser -> (unit -> stream) ->
                  ('a * string) Ptypes.pV = fun n p stm ->
      Inference.explore n (reify0 (fun () ->
        let st = stm () in
        let (v,st') = p st in
        if st' () <> Eof then fail ();
        (v,string_of_stream st)))

Since the set of such strings can be infinite, we have to bound the search, specifying the inference depth limit as the first argument of run_bwd (None means no limit). For example, to see a few palindromes, we evaluate:

    run_bwd (Some 10) pali (fun () -> stream_over [|'a';'b'|])

To run a parser effectively backwards, all we need is to give the parser a random stream to parse. The function stream_over generates all streams over a given alphabet -- the infinite number of them. The parser filters the set of streams; the surviving streams are then reported as parseable.
The next example finds all 5-letter palindromes. We add the predicate stream_len st 5, which fails if the stream does not have 5 elements:

    run_bwd None pali (fun () ->
      let st = stream_over [|'a';'b'|] in
      stream_len st 5; st)

Although the code indeed generates all eight 5-letter palindromes (equally likely), it is quite puzzling. No search limit was imposed, and yet the search terminated, even though stream_over produces the infinite sequence of streams. Since stream_len rejects all streams whose length is not 5, we should get stuck generating longer and longer streams, and rejecting them all. But we don't. Let's see the code for stream_len and recall the type of the stream:

    type stream_v = Eof | Cons of char * stream
    and stream = unit -> stream_v

    let rec stream_len st n = match (st (),n) with
      | (Eof,0) -> ()
      | (Cons (_,t),n) when n > 0 -> stream_len t (n-1)
      | _ -> fail ()
All the characters of the stream indeed do not have to be in memory at once: they can be read on demand, when the thunk representing the stream is forced. In the stream_len code, this demand is expressed as st (). Forcing the stream brings in a new character, and the thunk for the tail of the stream. Clearly, stream_len st 5 forces no more than 6 thunks. If the 6th thunk yields a Cons, the function fails and never examines the stream any further. The rest of the stream is never demanded, and is never chosen.
There remains one more mystery of run_bwd: in its code, shown earlier, the identifier st is used three times, as if it were the same sequence of characters. However, st is a thunk, choosing the sequence of characters on demand. Forcing the thunk st () in one place (e.g., inside stream_len) does not guarantee that the evaluation of st () somewhere else would give exactly the same choice. Sharing a procedure for making choices does not mean that the choices themselves are shared. For that reason, the stream generator stream_over employs a trick:

    val letlazy : (unit -> 'a) -> (unit -> 'a)

    let stream_over : char array -> stream = fun ca ->
      let rec loop () =
        if flip 0.5 then Cons (uniformly ca, letlazy loop) else Eof
      in letlazy loop

The function letlazy, which from its type looks like an identity function, takes a thunk and returns a thunk. When the result of letlazy thunk is repeatedly forced, it repeatedly makes the same choice. Such non-deterministic laziness requires first-class, or world-local, memory -- which is what Hansei implements. In functional-logic programming, this is called ``call-time choice''. In quantum mechanics, it is called ``wavefunction collapse''. Before we observe a system, for example, before we look at a flipped coin, there could indeed be several choices for the result. After we have looked at the coin, all further looks give the same value.
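A minimal demonstration of the difference, as our own sketch: with a plain thunk, two forcings are two independent coin flips; with letlazy, they always agree.

    exact_reify (fun () -> let x = fun () -> flip 0.5 in x () = x ());;
    (* two independent flips: true only with probability 0.5 *)

    exact_reify (fun () -> let x = letlazy (fun () -> flip 0.5) in x () = x ());;
    (* the flip is shared: true with probability 1. *)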
The source code gives another example of reversible parsing: parsing an arithmetic expression and computing its value. The grammar looks like the one shown in nearly every book on compiler construction:

    let digit = (fun c -> Char.code c - Char.code '0') <$>
                p_sat (fun x -> x >= '0' && x <= '9')
    let num = (List.fold_left (fun acc d -> d + acc * 10) 0) <$> many1 digit

    let rec expression st =
      ( ((fun t1 op t2 -> op t1 t2) <$> term <*> plus_minus <*> expression)
        <|> term ) st
    and term st =
      ( ((fun t1 op t2 -> op t1 t2) <$> factor <*> prod_div <*> term)
        <|> factor ) st
    and factor st =
      ( (p_char '(' *> expression <* p_char ')') <|> num ) st

The code demonstrates both the parsing of a string and obtaining all valid arithmetic expressions, or expressions that evaluate to 5.
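The combinators plus_minus, prod_div and many1 are not shown above; the first two could be defined along the following lines (our own sketch; the actual definitions are in the accompanying source):

    (* parse an operator symbol, returning the operation as the semantic
       value (division by zero aside) *)
    let plus_minus : (int -> int -> int) parser =
      ((fun _ -> ( + )) <$> p_char '+') <|> ((fun _ -> ( - )) <$> p_char '-')
    let prod_div : (int -> int -> int) parser =
      ((fun _ -> ( * )) <$> p_char '*') <|> ((fun _ -> ( / )) <$> p_char '/')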
The parsing combinator library demonstrated that reversible parsers are just as possible in Hansei as they are in Prolog. Prolog is a separate language whereas Hansei is just an OCaml library. The parsing combinator library shows off the laziness principle: delaying the choices until the last possible moment, hoping that the moment will never arrive. We have seen how the principle has made the infinite search finite. Less haste, infinitely more speed.
Guess Lazily! Making a program guess, and guess well
The parsing combinator library was the main example in the talk and was discussed in detail.
What is maximal munch, and why is it so common that it is hardly ever mentioned? A programming language specification cliche defines the syntax of an identifier as a letter followed by a sequence of letters and digits, or, in the extended BNF,

    identifier ::= letter letter_or_digit*

where *, the Kleene star, denotes zero or more repetitions of letter_or_digit. Thus in a program fragment var1 + var2, var1 and var2 are identifiers. Why not v, va, and var? According to the above grammar every prefix of an identifier is also an identifier. To avoid such conclusions and the need to complicate the grammar, the maximal munch rule is assumed: letter_or_digit* denotes the longest sequence of letters and digits. Without the maximal munch, we would have to write

    identifier ::= letter letter_or_digit* [look-ahead: not letter_or_digit]

which is not only awkward and requires notation for look-ahead, but is also much less efficient. If * means mere zero or more occurrences, letter_or_digit* on the input "var1 " will match the empty string, "a", "ar" and "ar1". Only the last match leads to the successful parse of the identifier, recognizing var1. Maximal munch cuts the irrelevant choices. It has proved so useful that it is rarely explicitly stated when describing grammars.
Maximal munch, however, all but destroys reversible parsing: the ability to run the parser forward (as a parser or recognizer) and backward (as a language generator). We illustrate the problem in Prolog. A recognizer in Prolog is a relation between two streams (lists of characters) S and Srem such that Srem is a suffix of S. In a functional language, we would say that a recognizer recognizes a prefix of S, returning the remaining stream as Srem. Here is the recognizer for the character 'a':
    charA([a|Srem],Srem).

The Kleene-star combinator (typically called many) takes as an argument a recognizer and repeats it zero or more times. Without the maximal munch, it looks as follows:

    many0(P,S,S).
    many0(P,S,Rest) :- call(P,S,Srem), many0(P,Srem,Rest).

Thus many0(charA,S,R) will recognize or generate the prefix of S with zero or more 'a' characters. Thanks to the first clause, many0(P,S,R) always recognizes the empty string. Here is how we recognize a* in the sample input stream [a,a,b]:

    ?- many0(charA,[a,a,b],R).
    R = [a, a, b] ;
    R = [a, b] ;
    R = [b]

and generate the language of a*:

    ?- many0(charA,S,[]).
    S = [] ;
    S = [a] ;
    S = [a, a] ;
    S = [a, a, a] ;
    S = [a, a, a, a] ;
    ...
To implement the maximal munch, many should call the argument parser for as long as it succeeds. We need a way to tell if the parser fails or succeeds. To branch on success or failure of a goal, Prolog offers soft-cut. Recall, the soft-cut P *-> Q; R is equivalent to the conjunction P, Q if P succeeds at least once. Soft-cut commits to that choice and totally discards R in that case. R is evaluated only when P fails from the outset. Soft-cut lets us write many with maximal munch:

    many(P,S,Rest) :- call(P,S,Srem) *-> many(P,Srem,Rest) ; S = Rest.

Now the empty string is recognized (i.e., S = Rest) only if the parser P fails. Recognizing a* in the sample input

    ?- many(charA,[a,a,b],R).
    R = [b].

becomes much more efficient. There is only one choice, for the longest sequence of a's. However, attempting to generate the language a*:
    ?- many(charA,S,[]).
    <loops>

leads to an infinite loop. The argument recognizer, charA, when asked to generate, always succeeds. Therefore, the recursion in many never terminates. When running backwards, the recognizer tries to generate, disregarding all other choices, the longest string of a's -- the infinite string. Although the empty string is part of the language a*, we fail to generate it.
The appearance of the soft-cut in the definition of many should have already raised the alarm. After all, soft-cut embodies negation, which is required to detect failure and proceed. In fact, negation can be expressed via soft-cut:

    not(P) :- P *-> fail; true.
The Hansei parser combinator library supports many
, which, unlike
the one in Prolog, no longer forces the trade-off between efficient
parsing and generation. Hansei's many
obeys maximal munch and
generates the complete language, with no omissions. Hansei lets us
have it both ways. Before describing the implementation, we show a few
representative examples.
    #  Parser                                                           Stream         Result
    1  many (p_char 'a')                                                aaaa           uniquely
    2  many (p_char 'a') <*> p_char 'b'                                 b              uniquely
    3  many (p_char 'a') <* p_char 'a'                                  aaa            NO parse
    4  many (p_char 'a') <*> many (p_char 'a')                          aaa            uniquely
    5  many ((p_char 'a') <|> (p_char 'b'))                             ababb          uniquely
    6  many ((many1 (p_char 'a')) <|> (many1 (p_char 'b')))             aaabab         uniquely
    7  many ((p_char 'a' <* p_char 'a') <|> p_char 'a') <* p_char 'a'   aaa            NO parse
    8  many (p_char 'a')                                                random stream  "" "a" "aa" ...
As Example 6 shows, many (actually, many1, definable as many1 p = (fun h t -> h::t) <$> p <*> many p) may nest. In Example 3, a*a does not recognize "aaa" since the a* munches the entire stream, leaving nothing for the parser of the final a. This is the expected behavior under maximal munch. Example 7 shows no parse for the same reason. Incidentally, the example demonstrates that under maximal munch,

    NOT (many (p1 <|> p2) >= many p1 <|> many p2)

where >= is to be understood as language inclusion. This inequality does hold for many without maximal munch (think of (a+b)^n >= a^n + b^n for non-negative a, b and n). Finally, in the last example we generate the complete language for a*, including the empty string. That is, many p has the property

    many p (s1 | s2) = many p s1 | many p s2

where | means non-deterministic choice.
To implement the maximal munch in Hansei we need something like soft-cut: the ability to detect a failure and proceed. Hansei has exactly the right tools, reify0 and reflect:

    type 'a vc = V of 'a                  (* leaf *)
               | C of (unit -> 'a pV)     (* unexpanded branch *)
    and 'a pV = (prob * 'a vc) list

    val reify0  : (unit -> 'a) -> 'a pV
    val reflect : 'a pV -> 'a

The primitive reify0 converts a probabilistic computation to a lazy tree of choices 'a pV, whose nodes contain found solutions V of 'a or not-yet-explored branches. The primitive reflect, the inverse of reify0, turns a tree of choices into a probabilistic program that will make those choices. The primitive reify0 is fundamental in Hansei: probabilistic inference is implemented by first reifying a program (the generative model) to the tree of choices and then exploring the tree in various ways. For instance, the full tree traversal corresponds to exact inference. Soft-cut can also be implemented as a choice-tree traversal, first_success below, which explores the branches looking for the first V leaf. It returns the tree resulting from the exploration, which could be empty if no V leaf was ever found. Soft-cut then is simply:

    val first_success : 'a pV -> 'a pV

    let soft_cut : (unit -> 'a) -> ('a -> 'w) -> (unit -> 'w) -> 'w =
      fun p q r ->
        match first_success (reify0 p) with
        | [] -> r ()
        | t  -> q (reflect t)

We write many in terms of the soft-cut as we did in Prolog:

    let many : 'a parser -> 'a list parser = fun p ->
      let rec self st =
        soft_cut (fun () -> p st)         (* check parser's success      *)
          (fun (v,st) ->                  (* continue parsing with p     *)
            let (vs,st) = self st in (v::vs,st))
          (fun () -> ([],st))             (* total failure of the parser *)
      in self
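As an aside, negation as failure is expressible via this soft_cut, mirroring the Prolog not/1 above (our own sketch):

    (* succeed, with unit, exactly when the computation p fails *)
    let not_ : (unit -> 'a) -> unit = fun p ->
      soft_cut p (fun _ -> fail ()) (fun () -> ())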
The second question is avoiding the loss of solutions when running the parser ``backwards''. Unlike Prolog, Hansei parsers are functions rather than relations. They take a stream, attempt to recognize its prefix and return the rest of the stream on success. They cannot be run backwards. However, we achieve the same result -- generating the set of parseable streams -- by generating all streams, feeding them to the parser and returning the streams that parsed completely. Since the number of possible streams is generally infinite, we have to generate them lazily, on demand. To ensure completeness -- to avoid losing any solutions -- the parsers should have the property

    many p (s1 | s2) = many p s1 | many p s2

Surprisingly, many p already satisfies it. The trick is laziness in the stream and Hansei's support of nested inference. The primitive reify0 may appear in probabilistic programs -- in other words, a probabilistic model may itself perform inference, over an inner model. In order for this to work correctly, we had to ensure that

    let x = letlazy (s1 | s2) in reify0 (fun () -> model x)
      ===
    let x = letlazy s1 in reify0 (fun () -> model x)
      | let x = letlazy s2 in reify0 (fun () -> model x)

where x is demanded in model. That is, reify0 should reify only the choices made by the inner program, and let the outer choices take effect. The stream generator has to be lazy, so it has the form letlazy (s1 | s2). Comparing the nested inference property with the code for many reveals that the key property of many p (s1 | s2) is satisfied without us needing to do anything. Our many has exactly the right semantics.

In conclusion, standard logic programming all too often forces us to choose between efficiency and expressiveness, on one hand, and completeness, on the other. Negation and committed choice make logic programs easier to write and, in some modes, faster to run. Alas, some other modes (informally, running `backwards') become unusable or impossible. The Kleene star is a perfect example of the trade-off: maximal munch simplifies the grammar and makes parsing efficient, but destroys the ability to generate the grammar's language. Functional logic programming systems can remove the trade-off. Properly implemented encapsulated search (nested inference, in Hansei) lets us distinguish the choices of the parser from the choices of the stream and cut only the former. Perhaps surprisingly, this distinction just falls out of the need for the non-deterministic stream to be lazy. Thus the Kleene star with maximal munch lets us parse and generate the complete language. In Hansei, we can have it both ways.
The accompanying source code exercises many. In addition, the arithmetic expression grammar relies on many to parse integer literals.

`Reversible' parser combinators
LogicT - backtracking monad transformer with fair operations and pruning
Extensive discussion of committed choice