A short tutorial on embedding effectful DSLs, or: An effectful computation as a small, first-order DSL
do
In truth, effects are not married to monads and approachable directly. The profound insight behind monads is the structuring, the separation of `pure' (context-independent) and effectful computations. The structuring can be done without explicating mathematical monads, and especially without resorting to vernacular monads such as State, etc. This article gives an example: a simple, effectful, domain-specific sublanguage embedded into an expressive `macro' metalanguage. Abstraction facilities of the metalanguage such higher-order functions and modules help keep the DSL to the bare minimum, often to the first order, easier to reason about and implement.
The key insight predates monads and goes all the way back to the origins of ML, as a scripting language for the Edinburgh LCF theorem prover. What has not been clear is how simple an effectful DSL may be while remaining useful. How convenient it is, especially compared to the monadic encodings. How viable it is to forsake the generality of first-class functions and monads and what benefits it may bring. We report on an experiment set out to explore these questions.
We pick a rather complex effect -- non-determinism -- and use it in OCaml, which at first blush seems unsuitable since it is call-by-value and has no monadic sugar. And yet, we can write non-deterministic programs just as naturally and elegantly as in Haskell or Curry.
The running tutorial example is computing all permutations of a given
list of integers. The reader may want to try doing that in their
favorite language or plain OCaml. Albeit a simple exercise, the code
is often rather messy and not obviously correct.
In the functional-logic language Curry, it is
strikingly elegant: mere foldr insert []
. It is the re-statement
of the specification:
a permutation is moving the elements of the
source list one-by-one into some position in the
initially empty list. The code immediately tells that the number of
possible permutations of n
elements is n!
. From its very
conception in the 1959 Rabin and Scott's paper, non-determinism was
called for
to write clear specifications -- and then to make them executable.
That is what will shall do.
nondet-paper.pdf [161K]
The extended abstract, submitted to the ML Family Workshop 2017
Eugenio Moggi and Sonia Fagorzi: A Monadic Multi-stage Metalanguage.
FoSSaCS 2003: Foundations of Software Science and Computational Structures
Moggi and Fagorzi described monads as a tool for structuring -- staging -- of effectful computations. There are other ways to introduce sublanguages, such as the tagless-final style shown off in this article.
M. Gordon, R. Milner, L. Morris, M. Newey and C. Wadsworth: A Metalanguage for Interactive Proof in LCF
Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, 1978
The origins of non-determinism
The separation of `pure' (context-independent) and effectful computations
is akin to the distinction between expressions and commands in Algol
(with one important difference), as Robert Harper pointed out.
That distinction is modal but not monadic: specifically,
in a so-called lax modality.
Robert Harper: Commentary on Practical Foundations for Programming Languages (Second Edition). December 30, 2017 < http://www.cs.cmu.edu/~rwh/pfpl/commentary.pdf >
Section 20. Modalities and Monads in Algol
Frank Pfenning and Rowan Davies: A judgmental reconstruction of modal logic.
Mathematical Structures in Computer Science, 11(4):511-540, 2001
Matt Fairtlough, Michael Mendler: Propositional Lax Logic
Information and Computation, v137, N1, 25 August 1997, pp. 1-33
I am grateful to Robert Harper for pointing out the lax modality and its discussion.
int_t
and ilist_t
below, the semantic domains of
integer and integer list expressions). The meaning of a complex expression
is computed by combining the meaning of immediate sub-expressions, that is,
compositionally. A language is thus defined by specifying the semantic
domain types and the meaning computations for its syntactic forms.
These definitions
are typically collected into a signature, such as:module type NDet = sigwhich we will be filling in. Since we will be talking about integer lists, we need the integer type and at least the integer literals:
type int_t val int: int -> int_tWe can add the standard operations on integers, but they are not needed for the problem at hand. They can always be added later. After all, the extensibility is the strong suit of the tagless-final embedding.
We also need integer lists, with the familiar constructors.
type ilist_t val nil: ilist_t val cons: int_t -> ilist_t -> ilist_t val list: int list -> ilist_tThe
list
primitive makes an OCaml list to be the list in our DSL:
although every DSL list can be expressed through nil
and cons
, the
special notation for literal DSL lists is convenient. We also need
pattern-matching on lists, or
the deconstructor. The syntax is admittedly ungainly: we are trying to
represent match ... with
as an applicative expression:val decon: ilist_t -> (unit -> ilist_t) -> (* if nil *) (int_t -> ilist_t -> ilist_t) -> (* if cons h t *) ilist_t
We also need foldr
. Strictly speaking, we do not need it: it is expressible
through the features already defined. But foldr
is so fundamental,
it is convenient to have as a primitive.
val foldr: (int_t -> ilist_t -> ilist_t) -> ilist_t -> ilist_t -> ilist_t
Finally, we define the operations for non-determinism: failure and the binary choice. The latter takes two non-deterministic expressions and non-deterministically executes one of them.
val fail: ilist_t val (|||): ilist_t -> ilist_t -> ilist_t
And we are done.
end (* of the signature NDet *)
An attentive reader may get the feeling that something is amiss: where are functions? We have defined neither the DSL function type nor operations to create and apply functions. Our DSL is not a lambda-calculus; it is essentially first order. Please hold your wonder.
NDet
DSL may be, it is enough for the task at hand.
We now use it to write the list permutation as elegantly
as in Curry.
First, we need the non-deterministic list insertion: insert x lst
is
to insert the element x
somewhere in lst
, returning the extended list.
That is, it inserts x
at the front of lst
, or after the first element of lst
, or after the second element of lst
, etc. The algorithm can be
formulated, and hence implemented, inductively: insert x lst
either
inserts x
at the front of lst
or within lst
, i.e.,
somewhere in its tail.
Computing the list permutation is now accomplished. The following is the complete code, which also includes a simple test.
module Perm(S:NDet) = struct open S let rec insert x l = cons x l ||| decon l (fun () -> fail) (* l is empty, no tail *) (fun h t -> cons h (insert x t)) let perm = foldr insert nil let test1 = perm (list [1;2;3]) endThe DSL primitives such as
foldr
, fail
, nil
etc. are all defined
in the implementation S
of the signature NDet
. The code does not depend
on any particular implementation, which is hence abstracted over as an
argument S
. DSL code is hence typically represented as an OCaml functor,
parameterized by the DSL implementation.
Although our code looks like the Curry code and is exceedingly simple,
there is something odd about it. We have said that NDet
has no functions:
no function types, no way to create or apply functions,
let alone recursive functions. What is insert
then?
Isn't foldr
a higher-order function? They are functions --
in the metalanguage but not in NDet
. From the DSL point of
view, insert
is a `macro'. Our code then is a combination
of a trivial, non-deterministic DSL with a very expressive,
higher-order `macro' system (*).
Moreover, the DSL evaluation and the `macro-expansion' run like coroutines.
It is not unheard of: after all, coroutines were invented
as a communication mechanism among phases of a Cobol compiler
(hat tip to Aggelos Biboudis). This coroutine-like evaluation is the
essence of Moggi's computational calculus.
Melvin E. Conway. Design of a Separable Transition-diagram Compiler.
Commun. ACM 6.7 (July 1963), pp. 396-408.
Our motivation hence is to make the preprocessor less human.Much of the power of C comes from having a powerful preprocessor. The preprocessor is called a programmer.
P.J. Moylan: The Case against C. Technical Report EE9240. July 1992. The University of Newcastle
This is the source for the above C joke. The section ``Modularity'' of Moylan's paper reads like the compelling motivation for the tagless-final approach.
Writing the DSL code as functors is ungainly. Lighter, functorless notation: blending DSL into OCaml shows what we can do to make the DSL programming more pleasant.
Perm
code we need an implementation of the NDet
signature,
which we develop in this section. Since we are interested in the list
of all permutations, it seems natural to take int_t
and ilist_t
to be lists of all choices an integer or a list DSL expression
may produce.
One can hear the shouts: this is the List monad! Yes, it is -- after all,
it is one of the implementations of non-determinism, envisioned already
by Rabin and Scott in the the 1950s. But it is not the only one.
The accompanying code shows another implementation of NDet
, in terms
of delimited continuations, the delimcc
library. One may easily think of
others, e.g., using the operating system threads. See also a definitely
non-monadic code-generation implementation below.
The list implementation is the easiest to explain -- and also the most
performant for our simple task. It is has to be stressed: our NDet
DSL has no bind
or return
operations. The user does not know if ilist_t
is the `list monad' or other monad or no monad at all. Nor
should the user care: they want non-deterministic operations on lists,
and that is what NDet
provides, directly.
Thus we will be implementing
module NDetL = structfilling in the definitions one-by-one. As already said, we take a non-deterministic expression to mean the list of all values it may produce. For now we assume finitely non-deterministic expressions, which is enough for our task:
type int_t = int list type ilist_t = int list listLiteral expressions are deterministic: have exactly one value:
let int x = [x] let list x = [x] let nil = [[]]
Two auxiliary functions will make further definitions easier:
let concatmap: ('a -> 'b list) -> 'a list -> 'b list = fun f l -> List.concat @@ List.map f l let liftm2 : ('a -> 'b -> 'c) -> 'a list -> 'b list -> 'c list = fun f l1 l2 -> concatmap (fun x -> List.map (f x) l2) l1The types and names should hint at their purpose, which becomes clear below. Since we are not concerned with monads, we do not need to prove any monad laws. (Stating equational laws is generally a good idea. But these laws should be formulated in terms of
NDet
rather than a particular
implementation. Also, the standard monad laws are the least interesting.)
Now we can implement the constructor and deconstructor for the NDet
lists:let cons: int_t -> ilist_t -> ilist_t = liftm2 (fun h t -> h::t) let decon: ilist_t -> (unit -> ilist_t) -> (int_t -> ilist_t -> ilist_t) -> ilist_t = fun l onnil oncons -> concatmap (function | [] -> onnil () | h::t -> oncons (int h) [t]) l let rec foldr: (int_t -> ilist_t -> ilist_t) -> ilist_t -> ilist_t -> ilist_t = (* exercise for the reader *)
In the list implementation of non-determinism, the two non-deterministic operations have a particularly simple expression:
let fail = [] let (|||) = (@) end
We are finished, and ready to run our Perm
code. The result is in
the comments:
let _ = let module M = Perm(NDetL) in M.test1 (* - : NDetL.ilist_t = [[1; 2; 3]; [1; 3; 2]; [2; 1; 3]; [2; 3; 1]; [3; 1; 2]; [3; 2; 1]] *)
do
sort
just like that -- and execute it, too.
It is called `slow sort' -- one of the benchmarks
of functional-logic programming. Although not usually fast,
it is correct by definition. The actual
performance depends on the implementation and could be quite good.To express sorting we need two more non-deterministic primitives. Extending a language defined in the tagless-final style is easy: we just add new definitions reusing the old ones:
module type NDetComm = sig include NDet val rId : (int list -> bool) -> ilist_t -> ilist_t val once : ilist_t -> ilist_t endThe operation
rId
is a form of a logical conditional: it imposes a
guard (a predicate constraint) on a non-deterministic expression.
It is hence akin to List.filter
. The name
is chosen to match the Curry standard library. The primitive once
(called head
in Curry) expresses the so-called `don't care non-determinism':
if an expression has several latent choices, once
picks one of them.The sorting is written literally as ``a sorted permutation'':
module Sort(Nd:NDetComm) = struct open Nd include Perm(Nd) let rec sorted = function | [] -> true | [_] -> true | h1 :: h2 :: t -> h1 >= h2 && sorted (h2::t) let sort l = once @@ rId sorted @@ perm l let tests = sort (list [3;1;4;1;5;9;2]) endAn attentive reader must have noticed that the sortedness is expressed `meta-theoretically' as one might say (why?).
Extending a DSL implementation is just as easy as extending the language definition: we just add the code for the new primitives, which are indeed primitive:
module NDetLComm = struct include NDetL let rId = List.filter let once = function [] -> [] | h::_ -> [h] end
And we can really sort: let module M = Sort(NDetLComm) in M.tests
As a warm-up, let us take one particular DSL implementation, such as NDetL
described earlier. Let us write perm
without any functors this time,
as an ordinary OCaml function:
let perm : int list -> int list list = fun l -> let open NDetL in let rec insert x l = cons x l ||| decon l (fun () -> fail) (* l is empty, no tail *) (fun h t -> cons h (insert x t)) in foldr insert nil (list l)
It is the ordinary OCaml function, no kidding: we can apply it just like perm [1;2;3]
.
We now abstract over the DSL implementation. First, we add to NDet
the
observation function, which should have been there from the very beginning.
But it is never too late.
module type NDetO = sig include NDet val run : ilist_t -> int list list end
The permutation function will receive the DSL implementation as the (first-class) module argument:
let perm : (module NDetO) -> int list -> int list list = fun (module S:NDetO) l -> let open S in let rec insert x l = cons x l ||| decon l (fun () -> fail) (* l is empty, no tail *) (fun h t -> cons h (insert x t)) in run @@ foldr insert nil (list l)
Modular implicits, currently an OCaml branch, can even save us the trouble of
passing the NDet
implementation explicitly.
DSLs become convenient:
DSL primitives look like the ordinary OCaml operations,
but can be distinguished by their types. Instead of first-class
modules we could have used plain records. Our
approach hence easily applies to other ML(-like) languages.
do
NDet
DSL is not monadic. We program in it just like in ML,
directly operating on effectful expressions, without any binds and
returns. To be sure, monads are not without benefits: clear
separation of pure and effectful computations in types and syntax;
uniformity; easy extension to higher-order functions. Let us see
how much if any we have lost without monads.
Our DSL approach just as clearly separates pure and effectful:
anything of the type int_t
or ilist_t
is
potentially non-deterministic; everything else is deterministic.
Thus from the type of insert : int_t -> ilist_t -> ilist_t
we immediately tell that insert
deterministically combines
non-deterministic computations.
Thanks to the richness of the metalanguage, NDet
did not need
higher-order (or even first-order) functions.
Hence we do not
need to worry about dealing with them non-deterministically. There are
certainly problems requiring higher-order effectful computations.
The experience with Lightweight Modular Staging in Scala shows,
however, that for rather many
practical problems, a first-order DSL is sufficient.
Finally, forsaking monads permits more implementations of our NDet
. Suppose we wish to generate code for list
permutations and realize the types as
type int_t = int list code type ilist_t = int list list code(see the accompanying code for the full implementation). Had
NDet
been monadic, such an implementation is impossible: 'a code
is not a monad. First, return : 'a -> 'a code
is
problematic since not every value can be lifted to code (think of
reference cells and I/O channels). Second, we cannot generally execute
the code until we have finished generating it (because intermediate
code may be open). Therefore, bind
is not expressible either.An interested reader might want to ponder:
foldr
is not actually necessary: it can be written
using the other features of NDet
. Write it.'a repr
,
a set of OCaml values that represent DSL expressions of the type 'a
.
We have managed to do without 'a repr
. What have we lost?NDet
signature introducing 'a repr
and
implement this language.NDet
: e.g., using the
free(r) monad or threads. Besides the depth-first search (underlying the
list implementation), try to implement complete search strategies such
as breadth-first search or iterative deepening.NDet
. Why? How to speed it up?
Hint: this web site points to an article on this very topic.oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML