We explain how to write an online partial evaluator for a typed higher-order embedded language in the tagless-final style. De Bruijn-index presentation enables ``specializations under lambda''; it also makes the partial evaluator agonizingly tricky albeit insightful to write. The partial evaluator is type-preserving by construction. Hence many problems during its construction were type errors and became easier to find.
We take our embedded language (DSL) to be typed and higher-order,
letting us define functions (lambda-abstractions) and apply them. To
represent them in the metalanguage (here, Haskell), we can use either
higher-order abstract syntax (HOAS) or De Bruijn indices. For example,
the DSL function to add two integers is encoded as lam (\x -> lam (\y
-> x `add` y))
using HOAS, or lam (lam (s z `add` z))
with De
Bruijn indices.
The JFP 2009 paper that introduced the tagless-final approach has already presented an online partial evaluator for the HOAS embedding. Another HOAS partial evaluator, relying on the type-directed PE, is described elsewhere on this site. Since a tagless-final encoding with De Bruijn indices can always be converted to the higher-order abstract syntax, De Bruijn-index--based PE seems a solved problem.
And yet there is a compelling reason for developing a partial
evaluator for tagless-final terms with De Bruijn indices. Reducing the
problem to the HOAS case does not quite work: although the above-mentioned
HOAS partial
evaluators do perform constant propagation and folding within function
bodies, these improvements are delayed until the function is applied --
and so will be done over and over again if the function is applied
several times. For example, a HOAS partial evaluator will transform lam (\x -> lam (\y -> x + y)) `app` (int 1)
to the more
efficient lam (\y -> int 1 + y)
, reducing the outer function application.
Alas, a HOAS partial evaluator will not immediately
improve lam (\y -> lam (\x -> x + y) `app` (int 1))
because the
redex is hidden within the outer function
body. The PE does notice the redex and arranges to reduce
it: lam (\y -> (\x -> x + y) (int 1))
-- by converting the embedded
language redex to the Haskell redex (notice the disappearance of app
and the inner lam
). The reduction will be delayed until
the outer function is
applied. If we pass the PE result to a code generator or
pretty-printer -- as is often the case -- the function will be applied
and the improvements will be carried out. The pretty-printed code will
look efficient, like \y -> 1 + y
. However, if we evaluate the
PE result as the Haskell Int->Int
function, we see no improvements
and perhaps even a slow-down, since the PE specializations will be run
every time the function is applied. In contrast, a De Bruijn partial evaluator
will do the static computations (i.e., the ones that can be done without
knowing functions' arguments) without delay, at the PE time -- whether
these computations are hidden in function bodies or not. The improvements
will be clear no matter which interpreter runs the specialization result.
Although HOAS terms are fun to write, De Bruijn terms are easier
to automatically manipulate; therefore, in practice various
intermediate higher-order languages tend to use De Bruijn
representation. For example, the output of the tagless-final type
checker will be in a De Bruijn form. Finally, although the De Bruijn
partial evaluator is much harder to write, it is more insightful. This
article aims to expose this insight.
TTFdB_pe.hs [13K]
The complete Haskell code with very many examples and test cases
Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages
TTFdBHO.hs [2K]
Converting from the De-Bruijn--based Symantics to the one based on the higher-order abstract syntax
class Symantics repr where int :: Int -> repr h Int -- int literal add :: repr h Int -> repr h Int -> repr h Int z :: repr (a,h) a -- variables: z and s ... (s z) s :: repr h a -> repr (any,h) a lam :: repr (a,h) b -> repr h (a->b) app :: repr h (a->b) -> repr h a -> repr h bAn embedded language (DSL) expression of the type
a
is represented
as the Haskell value of the type repr h a
where h
describes the
types of free variables in the expression. Functions are embedded
using De Bruijn indices: z
stands for the variable with the index 0, s z
for the variable with the index 1, s (s z)
for index 2,
etc. We will see examples in the next section. The semantics is
defined by the instances of the type class, for concrete repr
. Thus
the type variable repr
represents an interpreter for the
language. The signature of the class Symantics may be read as the
typing rules for our DSL.\y (\x -> y + x) 0
t1S :: Symantics reprS => reprS h (Int->Int) t1S = lam (lam (add z (s z)) `app` int 0) -- \y (\x -> y + x) 0and produces the generally improved target term. In our case:
t1T :: Symantics reprT => reprT h (Int->Int) t1T = lam z -- \y -> yTo be precise, a partial evaluator interprets the original tagless-final term
t1S
using the primitives of another interpreter reprT
.
To clarify, we start with the silly identity PE that does no improvement.newtype IdT reprT h a = IdT (reprT h a) instance Symantics reprT => Symantics (IdT reprT) where int x = IdT (int x) add (IdT x) (IdT y) = IdT (add x y) z = IdT z s (IdT x) = IdT (s x) lam (IdT f) = IdT (lam f) app (IdT x) (IdT y) = IdT (app x y)
If we overlook the ubiquitous IdT
, all the definitions look like the
identity, mapping the primitives of the (IdT reprT)
interpreter directly
to the primitives of the target interpreter reprT
with no further ado.
If we instantiate reprS
in t1S
with IdT reprT
, the result, after IdT
unwrapping will be the same as t1S
.
The improvement comes from noticing `static' computations, that can be done
already during the partial evaluation, and doing them. The key idea is
to notice and carry around all statically available information. For
example, when the IdT reprT
interpreter handles int 1
, it merely
builds the integer literal term int 1::reprT Int
.
Yet the interpreter gets to know at this point
that the source expression represents the literal value 1; we
should carry that information for later. When the IdT reprT
interpreter handles add (int 1) (int 2)
, it again creates the
summation expression add (int 1) (int 2) :: reprT Int
for the
target interpreter. Had the subexpressions int 1
and int 2
carried around the known literal values, we could have done the
addition already and created int 3::reprT Int
. The
target interpreter no longer has to do the addition; it gets the
addition result instead.
It seems no static knowledge can be gained from z
-- the variable
whose value will be known only when the function we are building is
applied to an argument. Yet there is: z
will be substituted with the
argument. If the application of our function is visible to PE, PE will
do it. Knowing where to substitute the argument will be useful
then.
We are ready to write the partial evaluator. It interprets the source
expressions as `partially known values' P reprT h a
, which are
essentially reprT h a
annotated with the known static
information. Since the reprT h a
value is completely opaque (after
all, reprT
is just a type variable rather than any concrete type),
we call it `dynamic'. On the other hand, if the PE knows exactly or
roughly what reprT h a
actually represents, we call this knowledge
static. It can always be forgotten: the function dynamic
below
converts P reprT h a
to just reprT h a
. The data type P
reprT h a
is a sum (union) data type with the variants for the
different amounts of static knowledge:
data P repr h a where Unk :: repr h a -> P repr h a Static :: a -> (forall h. repr h a) -> P repr h a StaFun :: (forall hout. EnvT repr (a,h) hout -> P repr hout r) -> P repr h (a->r) Open :: (forall hout. EnvT repr h hout -> P repr hout a) -> P repr h aThe first variant,
Unk
, tells that we know nothing about the
embedded expression. On the other extreme, Static
tells that we
completely know the value of the expression and can compute with it as
we partially evaluate. This complete static knowledge is by nature
unconditional, that is, does not depend on the free-variable
environment h
: statically known expressions are closed. StaFun
describes a function that can be statically applied. The argument may
be dynamic, but its substitution for the free variable in the
function's body can be carried out statically. Unlike Static
, the
function body may be open and hence depends on h
. (The data type EnvT
responsible for the substitution is described later.) The last
alternative describes an open expression -- which, like a function,
permits static substitution for its free variables. The
substitution returns a new, rebuilt P repr hout a
term. If it still has
free variables, the substitution can be carried out again.P repr h a
so difficult is that their type includes the environment h
with the types of the free variables that may occur in the
term. The two key operations of partial evaluation, substitution for
free variables and weakening, do change the environment h
. (The type a
of the embedded expression of course stays the same.)
Furthermore, these operations have to adjust the De Bruijn indices in
a complex way. Three simple examples should make the challenges
clear. In the first examplet2S :: P repr h (Int->Int) t2S = lam $ (lam (s z `add` z `add` int 1)) `app` z -- representing: \x -> (\y -> x + y + 1) xthe inner function
lam (s z `add` z `add` int 1)
of the type P repr (Int,h) (Int -> Int)
is applied to z :: P repr (Int,h) Int
.
Both the function and the argument are open expression, as the environment (Int,h)
shows. The application is statically visible and so PE does it,
substituting in (s z `add` z `add` int 1) :: P (Int,(Int,h)) Int
the top free variable with z
. This seems like the identity substitution,
replacing z
with z
, but it is not. The replaced z
is the free variable
of the inner function, with the type P (Int,(Int,h)) Int
, whereas the replacement is the free
variable of the outer function, of the type P (Int,h) Int
. The substitution
happens at a different type. Correspondingly, the result of the substitution (z `add` z `add` int 1) :: P (Int,h) Int
has the different (environment)
type. The type changes during the substitution are the first challenge.
If we compare the expression before and after the substitution we see that s z
, unrelated to the substituted variable, is changed to just z
.
The adjustment of all indices during the substitution is another challenge.In the second example,
t3S :: P repr h ((Int->Int)->Int) t3S = (lam (lam (z `app` (s z `add` int 1)))) `app` (int 2) -- (\x y -> y (x + 1)) 2the partial evaluator will apply the outer function and hence substitute the free variable in
(lam (z `app` (s z `add` int 1))) ::
P (Int,h) (((Int->Int)->Int))
with int 2
. That means, PE will have
to substitute int 2
within (z `app` (s z `add` int 1)) :: P
((Int->Int),(Int,h)) Int
. That expression has two free variables,
and the one that is substituted, of the type Int
, is not at the top
of the environment. The result (z `app` (int 2 `add` int 1)) :: P
((Int->Int),h) Int
has the addition of two integer literals, which
can be performed statically: a substitution enables further
simplifications. The final
result is lam (z `app` int 3)
representing \y -> y 3
.
The challenge: the variable to substitute may be s (s ... (s z))
rather
just z
.The final challenge is substituting functions, as in:
t4S :: P repr h (Int->Int->Int) t4S = lam $ (lam (lam (s z `app` z))) `app` (lam $ z `add` s z) -- \x -> (\u v -> u v) (\y z -> z + y)Doing the outer application and substituting
(lam $ z `add` s z)
produces at first lam ( (s (lam $ z `add` s z)) `app` z) :: P repr
(Int,h) (Int->Int)
. The next step is weakening the substituted
function, `pushing s
inside': lam ( (lam $ z `add` s (s z)) `app`
z)
. This complex update of indices (z
remained the same but s z
turned into s (s z)
) is the final challenge. The substituted
function can now be statically applied; the final result is: lam (z
`add` (s z))
.
Answering the challenges demands a general way to transform
the type environment of a term. The data type EnvT
represents the
transformation and the function app_open
below carries it out.
data EnvT repr hin hout where Dyn :: EnvT repr hin hin Arg :: P repr hout a -> EnvT repr (a,hout) hout Weak :: EnvT repr h (a,h) Next :: EnvT repr hin hout -> EnvT repr (a,hin) (a,hout)
EnvT repr hin hout
describes the transformation of the type environment
from hin
to hout
. Dyn
is the identity transformer; it also requests
the forgetting of all statically known data, converting P repr h a
to the form Unk (x::repr h a)
. Arg p
asks to substitute p
for the z
free variable and hence removes its type a
from the environment.
On the other hand, Weak
requests weakening of the environment, adding
some type a
at the top. Since we may need to weaken and substitute
for free variables other than z
, Next
increments the environment level
at which Weak
and Arg
should apply. The function app_open
carries out the environment update and returns the rebuilt, transformed
term.app_open :: Symantics repr => P repr hin r -> EnvT repr hin hout -> P repr hout r app_open e Dyn = Unk (dynamic e) app_open (Static es ed) _ = Static es ed app_open (Open fs) h = fs h app_open (StaFun fs) h = lam (fs (Next h)) app_open (Unk e) h = Unk (app_unk e h) where ...Static values do not depend on the environment and hence the transformation is vacuous. Transformation under
lam
increments the update level.
The real transformation is done by the interpreters for z
and s
below.
The final auxiliary function, dynamic
, forgets the static information
and returns the purely dynamic term:dynamic:: Symantics repr => P repr h a -> repr h a dynamic (Unk x) = x dynamic (Static _ x) = x dynamic (StaFun f) = lam $ dynamic (f Dyn) dynamic (Open f) = dynamic (f Dyn)
P reprT h a
values. We describe primitive-by-primitive.instance Symantics repr => Symantics (P repr) where int n = Static n (int n)From the integer literal we statically know its value, in any environment. The interpretation of addition takes advantage of this knowledge:
add (Static n1 _) (Static n2 _) = int (n1+n2)If the values of both summands are statically known, the addition can be done right away, producing the literal carrying the sum. If only one argument is statically known, the improvement is still possible if that argument is 0:
add (Static 0 _) x = x add x (Static 0 _) = xIf both arguments are unknown (cannot be substituted into and hence improved) there is nothing else we can do:
add (Unk x) (Unk y) = Unk (add x y)Otherwise, at least one argument may be improved by a substitution. We will look again through the result to see if the static addition becomes possible:
add e1 e2 = Open (\h -> add (app_open e1 h) (app_open e2 h))
The primitive z
is interpreted as the primitive improvable
(substitutable) term, a term that reacts to EnvT
requests:
z = Open f where f :: EnvT repr (a,h) hout -> P repr hout a f Dyn = Unk z -- turn to dynamic as requested f (Arg x) = x -- substitution f (Next _) = z -- not my level f Weak = s z -- weaken
The interpretation of the weakening s p
is more involved:
s :: forall h a any. P repr h a -> P repr (any, h) a s (Unk x) = Unk (s x) s (Static a ar) = Static a ar s (StaFun fs) = lam (fs (Next Weak)) s p = Open f where f :: EnvT repr (any,h) hout -> P repr hout a f Dyn = Unk (s (dynamic p)) f (Arg _) = p f (Next h) = s (app_open p h) f Weak = s (s p)If the expression to weaken is
Open
(that is, reacts to EnvT
requests)
we forward the request but remove one Next
level. The subtle point
is weakening lambda-expressions, that is, StaFun fs :: P repr h (ba->br)
. To answer the last
challenge, we have to rebuild the function with the weakened body: P repr (ba,h) br
is weakened to P repr (ba,(any,h)) br
. It is an exercise
to the reader to see that this procedure indeed shifts the De Bruijn indices
as needed. (A mistake in shifting the indices would be a type error.)
The most complex part of PE is over. Interpreting a lam
expression is
simple:
lam (Static k ks) = StaFun $ \_ -> Static k ks lam (Unk x) = Unk (lam x) lam body = StaFun (\args -> app_open body args)If the body is statically known, regardless of the argument, it must be a constant function. If the body can be substituted into, we let this knowledge known, as
StaFun
. The pay-off comes when
interpreting an application:app (StaFun fs) p = fs (Arg p) app (Unk f) (Unk x) = Unk (app f x) app e1 e2 = Open (\h -> app (app_open e1 h) (app_open e2 h))If the body of the function is known, it can be statically applied, by substituting the argument for the free variable in its body. If the function and the argument are dynamic, no further improvements are possible. If the function or the argument are open (hence, improvable), we repeat
app
after the improvement.The result of the partial evaluation is the created dynamic term:
pe :: Symantics repr => P repr () a -> repr () a pe = dynamic
This partial evaluator indeed improves our sample term t1S
t1T
(which is just the identity function) and the challenge
terms t2S
through t4S
. The source code has many, many more examples
of partially evaluated terms.
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML