Optimizations of tagless-final embedded DSLs are possible after all. They are typed and patently type-preserving. We present an optimization framework inspired by normalization-by-evaluation and demonstrate that tagless-final optimizations are also modular, convenient and composable. We contrast them with optimizations over the familiar deep DSL embedding as a data type.
This article demonstrates not only that context-sensitive transformations of tagless-final DSLs are possible, but also that such optimizations are modular, convenient and composable. They are moreover typed and type-preserving, which is clear not only to us but also to the compiler. As we extend the DSL with new expression forms (such as first-class functions, conditionals, exceptions, etc.) we may still be able to use the previously written optimizations as they are. Many optimizations, such as arithmetic expression simplifications, deal only with a subset of the language. The tagless-final approach lets us express them succinctly, abstracting from the irrelevant parts of the language. Not only are the optimizations clearer; they are also robust with respect to language extensions, applicable to any extended language.
We will clearly see the advantages of the tagless-final DSL embedding compared to the ``deep embedding'', that is, representing the DSL as a data type (AST) in the host language. Extensibility is not the only advantage. Although optimizations are AST transformations, each optimization prefers its own AST form. (The abstract syntax tree may have several subtly different realizations, as we shall see later.) The tagless-final framework lets each optimization work with its own preferred AST view.
This article is written as a tutorial on the tagless-final optimization framework and uses a very simple example. The framework has been successfully used in a complex, practical application: compiling and optimizing language-integrated queries. Although this tutorial uses OCaml, it could just as well be presented in Haskell.
Tagless-final, systematic optimizations in Haskell and OCaml: Tutorial
The extended version of the tutorial on optimizations in the tagless-final style -- with a different running example: the DSL of logic circuits. The tutorial has been given in Haskell and OCaml.
< http://logic.cs.tsukuba.ac.jp/~ken/quel/ >
The present framework has been used in real-life projects, such as compositional and extensible language-integrated queries. The user-composed query expression is optimized into a form from which efficient SQL (without nested subqueries) is generated.
type expr = Int of int | Add of expr * expr
This data type represents the ``deep'' embedding of the DSL in OCaml. This is not the road we take. Rather, we introduce the signature for the two basic forms of our language so far: integer literals and the addition operation:
module type SYM = sig
  type 'a repr
  val int: int -> int repr
  val add: int repr -> int repr -> int repr
end
Here, the (for now, abstract) type
'a repr
stands for some
representation of language expressions, indexed by the expression
type. One may think of SYM
as specifying the DSL syntax as a
context-free grammar: just read int repr
as ``the start symbol'' and val int
and val add
as two productions labeled int
and add
. We
do not need to know what 'a repr
really is to write sample DSL
expressions, such as:
module Ex2(I:SYM) = struct open I let res = add (add (int 1) (int 2)) (add (int 3) (int 4)) end
(As above, the name
I
will be used for a generic interpreter,
an instance of SYM
.) To evaluate that expression we write an interpreter for it:
module Ru = struct type 'a repr = 'a let int x = x let add = (+) end
The DSL operations are mapped directly to the corresponding OCaml
operations: Ru
is a meta-circular interpreter for the tiny subset of OCaml.
Correspondingly, a DSL expression is represented by the OCaml value
it evaluates to. SYM
hence can also be viewed as the signature
of the DSL interpreters -- or, in loftier terms, as the specification
of the denotational semantics for the language: 'a repr
defines the domain, and int
and add
give the meaning to DSL integer
literals and the addition in this domain. The denotation
for complex expressions is determined compositionally, from the denotations
of their sub-expressions. To evaluate the sample Ex2
we
interpret it with the Ru
interpreter: let module M = Ex2(Ru) in M.res
,
which gives the OCaml value 10, the denotation of Ex2
in Ru
as the OCaml integer.
The interpreter Ru
is not the only way to give the meaning to DSL
expressions. We may also interpret them as strings, so as to display them.
The denotation for an expression is hence its printed representation:
module Sh = struct type 'a repr = string let int = string_of_int let add x y = "(" ^ x ^ " + " ^ y ^ ")" end
Interpreting the same
Ex2
using Sh
, as let module M = Ex2(Sh) in M.res
, now gives the string "((1 + 2) + (3 + 4))"
.
Besides evaluating DSL expressions we may also want to transform them. A simple example is negating the expression. The negation transformation can also be written as an interpreter (after all, interpreting is the only thing we can do with tagless-final expressions).
module Neg(From:SYM) = struct type 'a repr = 'a From.repr let int x = From.int (-x) let add e1 e2 = From.add e1 e2 end
Neg
interprets the DSL in terms of another interpreter, From
(which we will be calling just F
).
On the surface of it, Neg
is an interpreter transformer. Dually,
it can also be regarded as an expression transformer:
module Ex2Neg(F:SYM) = Ex2(Neg(F))
Ex2Neg
has the same type as the original Ex2
: given an interpreter SYM
it computes the denotation of its DSL expression in that SYM
's domain. That is, Ex2Neg
is a tagless-final representation
of a DSL expression -- namely, the negated one. It can be interpreted
with the existing Ru
and Sh
interpreters, or even the Neg
interpreter
(i.e., it can be negated again):
let module M = Ex2Neg(Ru) in M.res
  -10
let module M = Ex2Neg(Sh) in M.res
  "((-1 + -2) + (-3 + -4))"
let module Ex2NegNeg(F:SYM) = Ex2Neg(Neg(F)) in let module M = Ex2NegNeg(Ru) in M.res
  10
(The evaluation result is shown, indented, underneath each expression.)
In the following we write more such DSL transformers -- interpreters that not only evaluate or print an expression but also rewrite it, in generally context-sensitive, apparently non-compositional ways.
The sample optimization is simple: apply the algebraic laws of
addition x+0 = 0+x = x
to eliminate the additions to zero. We assume for
simplicity that all integer literals are non-negative. It is an
exercise to the reader to remove that assumption.
The optimization is expressed as an interpretation in the domain of partially-known values, which track and propagate the available static information. In our case, this information is whether the value is statically known to be zero.
module SuppressZeroPE(F:SYM) = struct
  type 'a repr = {dynamic: 'a F.repr; known_zero : bool}
  let int x = {dynamic = F.int x; known_zero = (x = 0)}
  let add e1 e2 = match (e1,e2) with    (* add is not recursive! *)
    | ({known_zero=true},x) -> x
    | (x,{known_zero=true}) -> x
    | ({dynamic=x},{dynamic=y}) -> {dynamic=F.add x y; known_zero=false}
end
As typical of partially-known values,
'a repr
has two components.
The dynamic
part is the (transformed) expression to be interpreted
by the source F
interpreter. Since this interpreter is abstract at
this point, nothing is known about 'a F.repr
. The
second component of 'a repr
tells if this partially-known value is
in fact fully known, as zero. We find this out when
interpreting DSL integer literals, which are known literally.
The interpretation of the addition propagates this knowledge
and applies the algebraic laws in the straightforward way.
Let us take a moment to convince ourselves of the correctness.
The inductive structure of interpreters makes structural
induction proofs straightforward. First we need a lemma that if e
is
a value of the type 'a repr
and e.known_zero
is true, e
represents zero. The proof is trivial. The soundness can be stated as
the theorem that if E
is a tagless-final encoding of an expression
and I
is an interpreter in which the monoid law of addition
holds, then E(I).res
and E(SuppressZeroPE(I)).res.dynamic
have the same value. We can also prove a stronger property: if e
is
a value of the type 'a repr
, it represents zero if and only if e.known_zero
is true. The theorem clearly holds for the base case int x
. In the inductive case add e1 e2
we apply the inductive
hypothesis to e1
and e2
and the assumption that literals are
non-negative. The stronger property entails the completeness of the
optimization: if e
is a value of the type 'a repr
then e.dynamic
represents the expression that has no additions to zero. The reader is
encouraged to write the detailed proof, perhaps with the help of
their favorite proof assistant.
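For the curious, one way to discharge the non-negativity assumption (a hypothetical sketch of the exercise mentioned above, not used in the rest of the article) is to track the exact statically known value rather than mere zeroness, folding constants along the way:
module SuppressZeroPE'(F:SYM) = struct
  type 'a repr = {dynamic: 'a F.repr; known: int option}
  let int x = {dynamic = F.int x; known = Some x}
  let add e1 e2 = match (e1,e2) with
    | ({known = Some 0}, x) | (x, {known = Some 0}) -> x
    | ({known = Some a}, {known = Some b}) ->
        (* constant folding: the sum is again statically known *)
        {dynamic = F.int (a+b); known = Some (a+b)}
    | ({dynamic=x}, {dynamic=y}) -> {dynamic = F.add x y; known = None}
end
With this variant even a sum like add (int 1) (int (-1)) is recognized as zero, so the ``if and only if'' property can be recovered without restricting the literals.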
The following example will demonstrate the optimization:
module Ex3(I:SYM) = struct open I let res = add (add (int 3) (add (int 5) (int 0))) (add (int 1) (int 0)) end
Optimizing it and then getting the
Sh
interpreter to print out the result
produceslet module Ex3NoZerosPE(F:SYM) = Ex3(SuppressZeroPE(F)) in let module M = Ex3NoZerosPE(Sh) in M.res - : int SuppressZeroPE(Sh).repr = {SuppressZeroPE(Sh).dynamic = "((3 + 5) + 1)"; known_zero = false}The optimization has worked: zero summands have disappeared. The OCaml output also shows the problem: what is printed is the partially known value. These values are used during the optimization; at the end we would like just the final result. There should be a way to observe the optimization result and spare the user the internals of the optimizer. The second problem is extensibility: if we extend our DSL with, say, first-class functions, we likewise have to extend
SuppressZeroPE
to interpret the new expression forms
(function abstraction and applications) in the 'a
SuppressZeroPE(Sh).repr
domain. These new interpretations are simple
homomorphisms. They are irrelevant for the zero-addition elimination and
are annoying boilerplate. More seriously, they break modularity: every time we
extend the language we have to change all optimizations, even those
that are not affected by the extension. Our optimization framework
overcomes these problems.In the previous section we have seen that tagless-final DSL transformations are written as interpreter transformations. The transformed interpreter uses the suitable representation for DSL expressions, from which the optimization result is extracted, or `observed', at the end. It makes sense therefore to add the observation function to the interpreter signature:
module type SYMOBS = sig include SYM type 'a obs val observe: 'a repr -> 'a obs end
In our
Ru
and Sh
interpreters, the representation type is already suitable
for observation and so observe
is just the identity. The following functor
will add such a trivial observation:
module AddObs(I:SYM) = struct include I type 'a obs = 'a repr let observe x = x end
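For instance (an illustrative check; RuObs is a hypothetical name, not used later), the Ru interpreter with the trivial observation still evaluates Ex2 to 10:
module RuObs = AddObs(Ru)
let module M = Ex2(RuObs) in RuObs.observe M.res
  10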
Let us recall again the SuppressZeroPE
optimizer from the warm-up
section: SuppressZeroPE(F)
interprets DSL expressions as values
of the data type
type 'a repr = {dynamic: 'a F.repr; known_zero : bool}
that includes
'a F.repr
, the interpretation of the expression in
some F
interpreter (which evaluates the optimized expression). In
addition, 'a repr
contains extra data used during the
optimization. The zero-elimination optimization evaluates the DSL
expression and hence builds 'a repr
bottom-up. The dynamic
part is
the optimized F
interpretation and is extracted at the end. The
following captures the pattern of two expression representations, one
with the extra data for the optimization:
module type Trans = sig
  type 'a from
  type 'a term
  val fwd : 'a from -> 'a term   (* reflection *)
  val bwd : 'a term -> 'a from   (* reification *)
end
The type
'a from
is the representation of a DSL (sub)expression in
the From
interpreter; one may think of it as the optimization result
so far. Since the From
interpreter is abstract, nothing is known
about 'a from
values and they cannot be inspected. The type 'a
term
represents the same DSL (sub)expression; it is also the
optimization result so far. Unlike 'a from
, the values of 'a term
are known, at least to some extent, and can be inspected. These values
contain the working data for the optimization, its state. The
optimization process builds the 'a term
values bottom up, inspecting
those for the already optimized sub-expressions. The next two sections
show the examples. The two representations are related, by the
operations fwd
and bwd
. The latter is used in particular at the
very end to extract the optimization result from the working data 'a
term
. The operation fwd
builds the working data. The two operations
do not generally define a bijection: usually fwd
is not surjective and bwd
is not injective. That is, although the composition bwd . fwd
should be the identity, fwd . bwd
is typically not. One may think of bwd
as an `abstraction' function: it throws away the data used
internally during the optimization, the optimization state. Then fwd
is a concretization function, which initializes the optimization
state, to some default, empty value. The composition fwd . bwd
is
not the identity because the thrown away optimization state is
typically richer than the default state.
The two types and their relationship remind one of
normalization-by-evaluation (NBE). In NBE, 'a from
will be called an
opaque object representation; 'a term
will then be a meta-language
representation, the domain of the (non-standard) evaluation. The
function fwd
is then reflection and bwd
is reification.
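To make the Trans signature concrete, here is its degenerate instance (a hypothetical sketch, not used later in the tutorial): the working representation carries no optimization state at all, so fwd and bwd are identities and, exceptionally, do form a bijection.
module TransId(F:SYM) = struct
  type 'a from = 'a F.repr
  type 'a term = 'a from     (* no optimization state *)
  let fwd x = x              (* reflection: nothing to initialize *)
  let bwd x = x              (* reification: nothing to forget *)
end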
To complete the framework we define a default, generic optimizer:
given a particular instance of Trans
(we reserve the name X
for
such instances), the optimizer builds the 'a term
bottom-up,
from which the 'a from
value can be extracted, or observed.
module SYMT(X:Trans)(F:SYMOBS with type 'a repr = 'a X.from) = struct
  open X
  type 'a repr = 'a term
  type 'a obs = 'a F.obs
  let int x = fwd (F.int x)
  let add x y = fwd (F.add (bwd x) (bwd y))
  let observe x = F.observe (bwd x)
end
The reader may have noticed that this optimizer is the silly identity transformer: the optimization state initialized by
fwd
is not used
for anything and is later thrown away by bwd
. Therefore, the SYMT
-optimized Ex2
is the same as the original Ex2
: to be
precise, let module S = SYMT(X)(I) in let module M = Ex2(S) in S.observe M.res will always
give the same result as let module M = Ex2(I) in M.res, for any X
and
interpreter I
. Concrete optimizers are built as 'deltas' to SYMT
, overriding the interpretations of some DSL forms and actually
using the optimization state. Interpretations of the other forms,
irrelevant for the optimization, remain homomorphisms. The example is
next.
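As a quick sanity check of the identity claim (an illustrative aside, using the hypothetical TransId instance sketched earlier; it is not part of the development):
let module S = SYMT(TransId(Ru))(AddObs(Ru)) in
let module M = Ex2(S) in S.observe M.res
  10
which is indeed the same 10 as Ex2(Ru) gives.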
Defining an optimization pass within the framework means specifying
two things. First is the optimization state, the data that govern the
optimization. They are defined by writing an instance of Trans
.
Next we need to tell how to use this optimization state and actually
do the optimization. It is specified as a `partial interpreter': an
interpreter for only those DSL forms that are affected by the
optimization. We will be calling that second part IDelta
. It is
convenient to put the two parts inside a functor, since they are both
parameterized by the From
interpreter, which interprets the
optimized expression. In the case of the zero-addition elimination, the
optimization pass looks as follows.
module SuppressZeroPass(F:SYM) = struct
  module X = struct
    type 'a from = 'a F.repr
    type 'a term =
      | Unk  : 'a from -> 'a term
      | Zero : int term              (* if the expression is zero *)
    let fwd x = Unk x                (* generic reflection *)
    let bwd : type a. a term -> a from = function
      | Unk x -> x
      | Zero  -> F.int 0
  end
  open X
  module IDelta = struct
    let int n = if n = 0 then Zero else Unk (F.int n)
    let add : int term -> int term -> int term = fun x y ->
      match (x,y) with
      | (Zero,x)       -> x
      | (x,Zero)       -> x
      | (Unk x, Unk y) -> Unk (F.add x y)
  end
end
The type 'a term
is the GADT with two variants: Zero
for the
literal zero and Unk
for a DSL expression whose zeroness is
statically unknown or forgotten. The GADT hence tells the minimum we
need to know to carry out the zero-addition elimination. The operation fwd
marks its argument as unknown, and bwd
forgets all static
knowledge, producing the opaque 'a from
value. This is the typical
pattern. It should be clear that 'a term
, albeit a data type, is not
at all the deep DSL embedding (such as the expr
data type from the
Primer section). For one, 'a term
is not recursive. Second, it
makes explicit only the bare minimum -- in our case, if the expression
is literally zero or not.
The partial interpreter shows how the literal zeroness is determined
and used. The match
statement in the interpretation of add
expresses the rules of optimization clearly and directly. Since our
language is so simple, IDelta
is in fact the full interpreter. Later
on, we extend the DSL with more expression forms; the IDelta
however
will remain the same.
The complete optimizer is built by combining the optimization pass with
the default optimizer SYMT
and overriding the default interpretations by
those in IDelta
. The result is the full, optimizing interpreter:
module SuppressZero(F:SYMOBS) = struct
  module OptM = SuppressZeroPass(F)
  include SYMT(OptM.X)(F)
  include OptM.IDelta    (* overriding int and add in SYMT *)
end
Ex3
from the previous section demonstrates the optimization. The result
is shown underneath the expression, indented.
module SuppressZeroSh = SuppressZero(AddObs(Sh))
let module M = Ex3(SuppressZeroSh) in SuppressZeroSh.observe M.res
  "((3 + 5) + 1)"
The next section shows a more interesting optimization and its composition with the present zero-elimination pass.
The next optimization re-associates all additions to the right, converting an expression into the form (add 1 (add 2 (add 3 ...))). In this form, the
first argument of any addition is not itself an addition. Such
expressions can be compiled to a sequence of simple addition
instructions over a single accumulator (think of Z80). We later
compose this optimization with the zero-elimination of the previous
section.
The implementation follows the established pattern, of defining an
instance of Trans
with the data type 'a term
representing the optimization state, and the partial interpreter IDelta
that builds and propagates this state.
module LinearizeAddPass(F:SYM) = struct
  module X = struct
    type 'a from = 'a F.repr
    type 'a term =
      | Unk : 'a from -> 'a term
      (* non-empty difference list *)
      | Add : (int from -> int from) * int from -> int term
    let fwd x = Unk x               (* generic reflect *)
    let bwd : type a. a term -> a from = function
      | Unk x -> x
      | Add (f,n) -> f n
  end
  open X
  module IDelta = struct
    let int n = Add ((fun x -> x), F.int n)
    let add : int term -> int term -> int term = fun x y ->
      match (x,y) with
      | (Add(fx,nx), Add(fy,ny)) -> Add ((fun nil -> fx (F.add nx (fy nil))), ny)
      | (Unk x, Add(fy,ny))      -> Add ((fun nil -> F.add x (fy nil)), ny)
      | (Add(fx,nx), Unk y)      -> Add ((fun nil -> fx (F.add nx nil)), y)
      | _                        -> Unk (F.add (bwd x) (bwd y))
  end
end
Again, some may think that the data type 'a term
is the deep DSL
embedding. It cannot be: it is not recursive. Further, the addition
is represented differently from the AST data type expr
of the Primer
section: Add (f,n) : int term
is not a node in a tree but a
non-empty difference list. The 'a term
here is different from the 'a term
in the previous section. Unlike the typical deep embedding
optimizations that traverse the same AST data type, tagless-final
optimizations employ whatever representation is the most convenient for
them. Different optimizations will likely use different expression
representations.
Right-associated additions are isomorphic to a non-empty list of
summands. We encode it as a non-empty difference list and
build bottom-up. Again the readers are encouraged to prove that
the code above really does so. The proof is an easy induction: after
all, the structural induction is apparent in the IDelta
code, at least
from the fact add
is not recursive.
Taking Ex3
again as the example, we linearize it and pretty-print the
result:
module LinearizeAdd(F:SYMOBS) = struct
  module OptM = LinearizeAddPass(F)
  include SYMT(OptM.X)(F)
  include OptM.IDelta
end
module SLin = LinearizeAdd(AddObs(Sh))
let module M = Ex3(SLin) in SLin.observe M.res
  "(3 + (5 + (0 + (1 + 0))))"
The result is indeed right-associated, but with zero summands. We have a pass for that, which we tack on:
module SLinZero = SuppressZero(SLin)
let module M = Ex3(SLinZero) in SLinZero.observe M.res
  "(3 + (5 + 1))"
Composing the LinearizeAdd
optimization with the earlier defined SuppressZero
is as simple as the ordinary functional composition of
the transformers.
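The passes also compose in the other order: first re-associate, then suppress zeros. An illustrative check (SZeroLin is a hypothetical name; for Ex3 the printed result happens to coincide with the one above):
module SZeroLin = LinearizeAdd(SuppressZero(AddObs(Sh)))
let module M = Ex3(SZeroLin) in SZeroLin.observe M.res
  "(3 + (5 + 1))"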
In the tagless-final approach, a DSL is defined by the
signature of its interpreters, like the SYM
signature from the Primer.
The new DSL of higher-order functions and addition expressions has
the following signature:
module type SYMHO = sig
  include SYMOBS
  val lam: ('a repr -> 'b repr) -> ('a->'b) repr
  val app: ('a->'b) repr -> ('a repr -> 'b repr)
end
It reuses the already defined simple DSL of arithmetic expressions, and its observations, adding to it two new expression forms. Here is a sample expression of the extended language, with lambda-abstractions and applications:
module Ex5(I:SYMHO) = struct
  open I
  let res = observe @@ lam @@ fun x ->
    add (add x (add (int 5) (int 0)))
        (add (add (add (int 1) (int 0)) (int 7))
             (add (app (lam (fun y->y)) (int 4)) (int 2)))
end
The inferred signature
module Ex5 : functor (I : SYMHO) -> sig val res : (int -> int) I.obs end
tells that our DSL has grown beyond solely
int
expressions: Ex5
represents an int->int
function.
We need an Ru
-like interpreter to evaluate it. We do not write it
from scratch: rather, we extend the Ru
interpreter:
module RuHO = struct include AddObs(Ru) let lam f = f let app f x = f x end
As a proper extension, it can handle all the earlier examples in the original, simple DSL; it also interprets the extended language:
let module M = Ex2(RuHO) in M.res
  10
let module M = Ex5(RuHO) in M.res 6
  25
RuHO
, like any other tagless-final interpreter, is typed, evaluating an 'a repr
DSL expression into the OCaml value of the same type. We did
not talk about types before as
the DSL had only int
expressions. It has grown: Ex5
represents an int->int
expression, and is evaluated into the corresponding OCaml
function; the latter is then applied to 6
. The evaluation result has
no tags or other wrappers, justifying the name of the approach.
The ShHO
interpreter pretty-prints expressions:
let module M = Ex5(ShHO) in M.res
  "Lx. (x + (5 + 0)) + (((1 + 0) + 7) + ((Ly. y) 4 + 2))"
(We rewrote the interpreter with precedence-aware pretty-printing, to omit useless parentheses.)
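The article relies on ShHO without showing its definition. A minimal sketch, assuming a global counter for fresh variable names and no precedence handling (so, unlike the output above, it prints all the parentheses):
module ShHO = struct
  include AddObs(Sh)
  let counter = ref 0                        (* for fresh variable names *)
  let lam f =
    incr counter;
    let v = "x" ^ string_of_int !counter in
    "(L" ^ v ^ ". " ^ f v ^ ")"
  let app f x = "(" ^ f ^ " " ^ x ^ ")"
end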
Let us see how to optimize the extended expressions. First, we need
a new default interpreter (which is again obtained by extending SYMT
and reusing all of its code):
module SYMTHO(X:Trans)(F:SYMHO with type 'a repr = 'a X.from) = struct
  open X
  include SYMT(X)(F)
  let app x y = fwd (F.app (bwd x) (bwd y))
  let lam f = fwd (F.lam (fun x -> bwd (f (fwd x))))
end
There is no second step. The old optimization passes
SuppressZeroPass
and LinearizeAddPass
can be used as they are:module SuppressZeroHO(F:SYMHO) = struct module OptM = SuppressZeroPass(F) include SYMTHO(OptM.X)(F) include OptM.IDelta end module SuppressZeroHOSh = SuppressZeroHO(ShHO) let module M = Ex5(SuppressZeroHOSh) in M.res "Lx. (x + 5) + ((1 + 7) + ((Ly. y) 4 + 2))"Adding the next pass, to linearize the additions, is a simple composition, as before:
module LinearizeAddHO(F:SYMHO) = struct
  module OptM = LinearizeAddPass(F)
  include SYMTHO(OptM.X)(F)
  include OptM.IDelta
end
module SLinZeroHO = LinearizeAddHO(SuppressZeroHOSh)
let module M = Ex5(SLinZeroHO) in M.res
  "Lx. x + (5 + (1 + (7 + ((Ly. y) 4 + 2))))"
The additions are all associated to the right, and there are no zeros in sight. Incidentally,
Ex5(SLinZeroHO)
has the type sig val res : (int -> int) SLinZeroHO.obs end
. The result, just like
the representation of the original Ex5
, is indexed by the same type int->int
. The optimization has preserved the type. All
tagless-final optimizations have this property by their very construction.
The intended audience is users, researchers and students of functional programming. Experience with typed functional programming and only basic familiarity with Haskell or OCaml are assumed. The tutorial should thus be applicable across modern functional programming languages. Rather than a lecture, the tutorial is designed to be an interactive session, explaining the technique on a series of progressively more complex examples and discussing the implementation points and problems with the audience.
Just as a math talk cannot avoid looking at proofs, in a programming tutorial we cannot help but stare at the code. Befitting its informal style, the tutorial was presented by showing, describing, explaining and discussing the code. Below are the code files, in Haskell and OCaml, with many comments, questions, remarks and quizzes.
The tutorial has numerous quizzes.
A good exercise is implementing the optimization of
applying a statically known function, but only in the case when the function's
argument occurs in its body at most once. Further problems to think about are:
modeling circuits with multiple inputs and outputs; implementing and
composing adders (ripple-adders and more advanced ones); writing the
optimizations of the sort AND x x -> x
; complete
Boolean circuit simplifications; converting to DNF; handling sharing.
slides-PPL.pdf [82K]
A few introductory slides (in Japanese)
T4: Embedding and optimizing domain-specific languages in the typed final style
The more extensive (3.5hrs) tutorial, presented at CUFP 2015 in Vancouver, BC, Canada on September 3, 2015.
Functional Thursday #30 October 1, 2015. Taipei, Taiwan
The recorded lecture at the Functional Thursday Meetup Taipei
International Summer School on Metaprogramming
Robinson College, Cambridge, 8th to 12th August 2016
Three-lecture course using Haskell as the meta-language