We adopt and argue the position of factoring an effectful program into a first-order effectful DSL with a rich, higher-order `macro' system. Not all programs can be thus factored. Although the approach is not general-purpose, it does admit interesting programs. The effectful DSL is likewise rather problem-specific and lacks general-purpose monadic composition, or even functions. On the upside, it expresses the problem elegantly, is simple to implement and reason about, and lends itself to non-standard interpretations such as code generation (compilation) and abstract interpretation. A specialized DSL is liable to be frequently extended; the experience with the tagless-final style of DSL embedding shown that the DSL evolution can be made painless, with the maximum code reuse.
We illustrate the argument on a simple but representative example of a rather complicated effect -- non-determinism, including committed choice. Unexpectedly, it turns out we can write interesting non-deterministic programs in an ML-like language just as naturally and elegantly as in the functional-logic language Curry -- and not only run them but also statically analyze, optimize and compile. The richness of the Meta Language does, in reality, compensate for the simplicity of the effectful DSL.
The key idea goes back to the origins of ML as the Meta Language for the Edinburgh LCF theorem prover. Instead of using ML to build theorems, we now build (DSL) programs.
This position paper seeks to draw attention to a non-monadic alternative: Rather than structuring (effectful) programs as monads -- or applicatives, arrows, etc., -- we approach programming as a (micro) language design. We determine what data structures and (effectful) operations seem indispensable for the problem at hand -- and design a no-frills language with just these domain-specific features. We embed this bare-bone DSL into OCaml, relying on OCaml's extensive facilities for abstraction and program composition (modules, objects, higher-order functions), as well as on its parsing and type checking.
We state the main points of our argument in Points to argue. We hasten to say that the key insight is rather old (Wand, Th.Comp.Sci. 1982), quite resembling algebraic specifications (see Wirsing's comprehensive survey in the Handbook of Theoretical Computer Science B). In fact, it is the insight behind the original ML, as a scripting language for the Edinburgh LCF theorem prover -- only applied to programs rather than theorems. What has not been clear is how simple an effectful DSL may be while remaining useful. How convenient is it, especially compared to the monadic encodings? How viable is it to forsake the generality of first-class functions and monads and what benefits may come? We report on an experiment set out to explore these questions.
The present paper follows the style of Hughes (Why functional programming matters, 1989), Hudak (Building domain-specific embedded languages, 1996) and Goguen (Higher order functions considered unnecessary for higher order programming, 1988) -- or, for mathematically inclined, Polya (How to solve it, 1945) -- arguing from examples. Just like those arguments, it is indeed hard to grasp the limitations and applicability, and it is hard to formalize. Problem solving in general is a skill to learn rather than an algorithm to implement; it is inherently informal. Even in mathematics, how to prove theorems is an art and a judgement; one acquires it not by following rigorous descriptions but by reading existing proofs and doing exercises. This is the format we follow in the paper.
M. Gordon, R. Milner, L. Morris, M. Newey and C. Wadsworth:
A Metalanguage for Interactive Proof in LCF
Conference Record of the Fifth Annual ACM Symposium
on Principles of Programming Languages, 1978
The message left me puzzled about its author's goals and motivations. It is hard to imagine he preferred the monadic program for its verbose notation, replete with irrelevant names like``Assume a simple OCaml program with two primitives that can cause side-effects:let counter = ref 0 let incr x = counter := !counter + x ; !counter let put n = counter := n; !counter put (5 + let f x = incr x in f 3)This example can be transformed into a pure program using a counter monad (usingppx_monadic
syntax):do_; i <-- let f x = incr x in f 3 ; p <-- put (5 + i) return pFor a suitable definition of bind and return, both programs behave equivalently. My question is: How can one automatically translate a program of the former kind to the latter?''
i
and
p
. Was the author after purity? That is a mirage we lay bare in
Sec. 6. Was he attracted to the separation of effectful
and non-effectful code and the possibility of multiple interpretations
of effects (`the overriding of the semicolon')? These good properties
are not unique to monads. The other, often forgotten, ways of dealing
with side-effects ought to be more widely known.
The second cue came about a month later, observing students
solving an exercise to compute all permutations of a given list of
integers. The reader may want to try doing that in their
favorite language. Albeit a simple exercise, the code
is often rather messy and not obviously correct. In the
functional-logic language Curry built around
non-determinism, computing a permutation is strikingly elegant:
mere foldr insert []
. It is the re-statement of the specification:
a permutation
is moving the elements of the source list one-by-one into some
position in the initially empty list. The code immediately tells that
the number of possible permutations (possible choices of permutations)
of n
elements is n!
. From its
very conception in the 1950s, non-determinism
was called for to write clear specifications -- and then to make them
executable. Can we write the list permutation code just as elegantly
in a language that
was not designed with non-determinism in mind and offers no support
for it? How far can we extend it?
Along with the describing the experiment and its ramifications, we also offer an argument: why this approach is worth exploring in the first place. The points of the argument, reverberating throughout the paper, are collected below:
Although this point may be obvious to some, the caml-list (as noted in earlier), Reddit, Stack Overflow, etc. discussion places are awash with misunderstandings and irrational exuberance towards monads. The argument pointing out their proper place and limitations is worth repeating.
Since the DSL is intentionally without abstraction or syntactic sugar -- just enough to express the problem at hand, however ungainly -- programming in it directly is a chore. That is why we endow it with a very expressive `preprocessor', by embedding into a metalanguage with rich abstractions like functions, definitions, modules, etc.
Not all problems can be thus factored into a first-order DSL and a higher-order metalanguage (e.g., the factoring does not support arbitrary higher-order effectful functions). Therefore, how well the factoring works in practice and if it is worth paying attention to become empirical questions. The paper describes one case study, exploring how far we can push this approach.
The compelling case for (embedded) DSLs has been already made, by Hudak, 1996. The present paper is another case study. We also demonstrate, in Sec. 5, one more advantage of an embedded DSL: the ability to evaluate the same DSL code in several ways. We can not only (slowly) interpret the code, but also perform static analyses such as abstract interpretation, and generate (faster) code.
The advantages of the domain-specific approach have to be balanced against the applicability (how wide is the domain, does it let us do something interesting or practically significant) and extensibility (how easy it is to extend the program and the domain and reuse the existing code). This is a trade-off, which the case study in the present paper is to help evaluate.
Thus we do not argue that the domain-specific approach is `better'. We do argue, however, against the presumption (evidenced in the received comments on the drafts of this paper) that one always has to strive for the general solution. Premature generalization and abstraction, like premature optimization, is not a virtue.
Our language is truly domain-specific: for example, it offers no abstraction mechanism of its own and no general monadic interface for writing effectful computations. As an upside, the DSL admits useful non-standard implementations. Sec. 5 shows three. In particular, Sec. 5.2 describes an abstract interpretation, to statically estimate the degree of non-determinism of a DSL term. The code-generation interpretation -- the DSL compiler -- is presented in Sec. 5.3.
Sec. 6 discusses the presented factoring approach, answering several commonly heard objections. The main theoretical ideas have all been known in isolation, often for many decades, as we review in Sec. 7.
0README.dr [<1K]
The complete source for all the code described in the paper
We have reported only one experiment, which -- combined with the related LMS experience -- suggests that the DSL-metalanguage factoring approach to effectful programming is viable. More experiments are needed to better grasp its usefulness. Specifically we would like to try examples in the scope of Async or Lwt libraries. A bigger exercise would be to re-implement the programming language Icon -- another language with built-in non-determinism.
One may wonder how the history of (meta) programming might have turned out if the ML evolution had taken a different turn: kept using ML as the Meta Language as it was initially designed for, but building objects other than formulas and theorems -- in particular, programs.