previous next up top

Manifestly type-preserving, effective partial evaluation in the tagless-final style

We explain how to write an online partial evaluator for a typed higher-order embedded language in the tagless-final style. De Bruijn-index presentation enables ``specializations under lambda''; it also makes the partial evaluator agonizingly tricky albeit insightful to write. The partial evaluator is type-preserving by construction. Hence many problems during its construction were type errors and became easier to find.

Introduction
The object language: the embedded DSL
Principles of the online PE in the tagless-final style
Three challenges of partial evaluation with De Bruijn indices
Partial Evaluator itself
Conclusions

Introduction

The tagless-final embedding of a language lends itself to online partial evaluation (PE) -- that is, converting an embedded language expression to an equivalent but more efficient one. The efficiency is gained because of, essentially, function inlining, constant propagation and folding. A tagless-final PE is manifestly type-preserving: the partially evaluated expression is well-typed and has the same type as the original expression -- and this correctness property holds by construction. Therefore, when writing a tagless-final PE we do not have to worry about possible unbound variables or producing an ill-typed result. Freed from these safety concerns we can concentrate on the efficiency of the result.

We take our embedded language (DSL) to be typed and higher-order, letting us define functions (lambda-abstractions) and apply them. To represent them in the metalanguage (here, Haskell), we can use either higher-order abstract syntax (HOAS) or De Bruijn indices. For example, the DSL function to add two integers is encoded as lam (\x -> lam (\y -> x `add` y)) using HOAS, or lam (lam (s z `add` z)) with De Bruijn indices.

The JFP 2009 paper that introduced the tagless-final approach has already presented an online partial evaluator for the HOAS embedding. Another HOAS partial evaluator, relying on the type-directed PE, is described elsewhere on this site. Since a tagless-final encoding with De Bruijn indices can always be converted to the higher-order abstract syntax, De Bruijn-index--based PE seems a solved problem.

And yet there is a compelling reason for developing a partial evaluator for tagless-final terms with De Bruijn indices. Reducing the problem to the HOAS case does not quite work: although the above-mentioned HOAS partial evaluators do perform constant propagation and folding within function bodies, these improvements are delayed until the function is applied -- and so will be done over and over again if the function is applied several times. For example, a HOAS partial evaluator will transform lam (\x -> lam (\y -> x + y)) `app` (int 1) to the more efficient lam (\y -> int 1 + y), reducing the outer function application. Alas, a HOAS partial evaluator will not immediately improve lam (\y -> lam (\x -> x + y) `app` (int 1)) because the redex is hidden within the outer function body. The PE does notice the redex and arranges to reduce it: lam (\y -> (\x -> x + y) (int 1)) -- by converting the embedded language redex to the Haskell redex (notice the disappearance of app and the inner lam). The reduction will be delayed until the outer function is applied. If we pass the PE result to a code generator or pretty-printer -- as is often the case -- the function will be applied and the improvements will be carried out. The pretty-printed code will look efficient, like \y -> 1 + y. However, if we evaluate the PE result as the Haskell Int->Int function, we see no improvements and perhaps even a slow-down, since the PE specializations will be run every time the function is applied. In contrast, a De Bruijn partial evaluator will do the static computations (i.e., the ones that can be done without knowing functions' arguments) without delay, at the PE time -- whether these computations are hidden in function bodies or not. The improvements will be clear no matter which interpreter runs the specialization result. Although HOAS terms are fun to write, De Bruijn terms are easier to automatically manipulate; therefore, in practice various intermediate higher-order languages tend to use De Bruijn representation. For example, the output of the tagless-final type checker will be in a De Bruijn form. Finally, although the De Bruijn partial evaluator is much harder to write, it is more insightful. This article aims to expose this insight.

References

Tagless-final approach: Type-preserving embedding of higher-order, typed DSLs

TTFdB_pe.hs [13K]
The complete Haskell code with very many examples and test cases

Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages

TTFdBHO.hs [2K]
Converting from the De-Bruijn--based Symantics to the one based on the higher-order abstract syntax

The object language: the embedded DSL

The object language -- in which the programs to partially evaluate are written -- will be a simple, typed higher-order language: simply typed lambda-calculus with integer literals and addition. The tagless-final embedding lets us add more operations to the language later. The partial evaluator is likewise extensible. The syntax of the language, the constructors of its expressions, are defined by the following type class:

     class Symantics repr where
         int :: Int -> repr h Int                -- int literal
         add :: repr h Int -> repr h Int -> repr h Int
     
         z   :: repr (a,h) a			-- variables: z and s ... (s z)
         s   :: repr h a -> repr (any,h) a
         lam :: repr (a,h) b  -> repr h (a->b)
         app :: repr h (a->b) -> repr h a -> repr h b

An embedded language (DSL) expression of the type a is represented as the Haskell value of the type repr h a where h describes the types of free variables in the expression. Functions are embedded using De Bruijn indices: z stands for the variable with the index 0, s z for the variable with the index 1, s (s z) for index 2, etc. We will see examples in the next section. The semantics is defined by the instances of the type class, for concrete repr. Thus the type variable repr represents an interpreter for the language. The signature of the class Symantics may be read as the typing rules for our DSL.

Principles of the online PE in the tagless-final style

A tagless-final partial evaluator is a transformer: it takes a source tagless-final term such as the one below, representing the function \y (\x -> y + x) 0

     t1S :: Symantics reprS => reprS h (Int->Int)
     t1S = lam (lam (add z (s z)) `app` int 0) -- \y (\x -> y + x) 0

and produces the generally improved target term. In our case:

     t1T :: Symantics reprT => reprT h (Int->Int)
     t1T = lam z -- \y -> y

To be precise, a partial evaluator interprets the original tagless-final term t1S using the primitives of another interpreter reprT. To clarify, we start with the silly identity PE that does no improvement.

     newtype IdT reprT h a = IdT (reprT h a)
     
     instance Symantics reprT => Symantics (IdT reprT) where
       int x               = IdT (int x)
       add (IdT x) (IdT y) = IdT (add x y)
       z                   = IdT z
       s (IdT x)           = IdT (s x)
       lam (IdT f)         = IdT (lam f)
       app (IdT x) (IdT y) = IdT (app x y)

If we overlook the ubiquitous IdT, all the definitions look like the identity, mapping the primitives of the (IdT reprT) interpreter directly to the primitives of the target interpreter reprT with no further ado. If we instantiate reprS in t1S with IdT reprT, the result, after IdT unwrapping will be the same as t1S.

The improvement comes from noticing `static' computations, that can be done already during the partial evaluation, and doing them. The key idea is to notice and carry around all statically available information. For example, when the IdT reprT interpreter handles int 1, it merely builds the integer literal term int 1::reprT Int. Yet the interpreter gets to know at this point that the source expression represents the literal value 1; we should carry that information for later. When the IdT reprT interpreter handles add (int 1) (int 2), it again creates the summation expression add (int 1) (int 2) :: reprT Int for the target interpreter. Had the subexpressions int 1 and int 2 carried around the known literal values, we could have done the addition already and created int 3::reprT Int. The target interpreter no longer has to do the addition; it gets the addition result instead.

It seems no static knowledge can be gained from z -- the variable whose value will be known only when the function we are building is applied to an argument. Yet there is: z will be substituted with the argument. If the application of our function is visible to PE, PE will do it. Knowing where to substitute the argument will be useful then.

We are ready to write the partial evaluator. It interprets the source expressions as `partially known values' P reprT h a, which are essentially reprT h a annotated with the known static information. Since the reprT h a value is completely opaque (after all, reprT is just a type variable rather than any concrete type), we call it `dynamic'. On the other hand, if the PE knows exactly or roughly what reprT h a actually represents, we call this knowledge static. It can always be forgotten: the function dynamic below converts P reprT h a to just reprT h a. The data type P reprT h a is a sum (union) data type with the variants for the different amounts of static knowledge:

     data P repr h a where
       Unk    :: repr h a ->
                    P repr h a
       Static :: a -> (forall h. repr h a) ->
                    P repr h a
       StaFun :: (forall hout. EnvT repr (a,h) hout -> P repr hout r) ->
                    P repr h (a->r)
       Open   :: (forall hout. EnvT repr h hout -> P repr hout a) ->
                    P repr h a

The first variant, Unk, tells that we know nothing about the embedded expression. On the other extreme, Static tells that we completely know the value of the expression and can compute with it as we partially evaluate. This complete static knowledge is by nature unconditional, that is, does not depend on the free-variable environment h: statically known expressions are closed. StaFun describes a function that can be statically applied. The argument may be dynamic, but its substitution for the free variable in the function's body can be carried out statically. Unlike Static, the function body may be open and hence depends on h. (The data type EnvT responsible for the substitution is described later.) The last alternative describes an open expression -- which, like a function, permits static substitution for its free variables. The substitution returns a new, rebuilt P repr hout a term. If it still has free variables, the substitution can be carried out again.

Three challenges of partial evaluation with De Bruijn indices

What makes partial evaluation of De Bruijn-indexed terms P repr h a so difficult is that their type includes the environment h with the types of the free variables that may occur in the term. The two key operations of partial evaluation, substitution for free variables and weakening, do change the environment h. (The type a of the embedded expression of course stays the same.) Furthermore, these operations have to adjust the De Bruijn indices in a complex way. Three simple examples should make the challenges clear. In the first example

     t2S :: P repr h (Int->Int)
     t2S = lam $ (lam (s z `add` z `add` int 1)) `app` z    -- representing: \x -> (\y -> x + y + 1) x

the inner function lam (s z `add` z `add` int 1) of the type P repr (Int,h) (Int -> Int) is applied to z :: P repr (Int,h) Int. Both the function and the argument are open expression, as the environment (Int,h) shows. The application is statically visible and so PE does it, substituting in (s z `add` z `add` int 1) :: P (Int,(Int,h)) Int the top free variable with z. This seems like the identity substitution, replacing z with z, but it is not. The replaced z is the free variable of the inner function, with the type P (Int,(Int,h)) Int, whereas the replacement is the free variable of the outer function, of the type P (Int,h) Int. The substitution happens at a different type. Correspondingly, the result of the substitution (z `add` z `add` int 1) :: P (Int,h) Int has the different (environment) type. The type changes during the substitution are the first challenge. If we compare the expression before and after the substitution we see that s z, unrelated to the substituted variable, is changed to just z. The adjustment of all indices during the substitution is another challenge.

In the second example,

     t3S :: P repr h ((Int->Int)->Int)
     t3S = (lam (lam (z `app` (s z `add` int 1)))) `app` (int 2) -- (\x y -> y (x + 1)) 2

the partial evaluator will apply the outer function and hence substitute the free variable in

(lam (z `app` (s z `add` int 1))) ::
P (Int,h) (((Int->Int)->Int))

with int 2. That means, PE will have to substitute int 2 within

(z `app` (s z `add` int 1)) :: P
((Int->Int),(Int,h)) Int

. That expression has two free variables, and the one that is substituted, of the type Int, is not at the top of the environment. The result

(z `app` (int 2 `add` int 1)) :: P
((Int->Int),h) Int

has the addition of two integer literals, which can be performed statically: a substitution enables further simplifications. The final result is lam (z `app` int 3) representing \y -> y 3. The challenge: the variable to substitute may be s (s ... (s z)) rather just z.

The final challenge is substituting functions, as in:

     t4S :: P repr h (Int->Int->Int)
     t4S = lam $ (lam (lam (s z `app` z))) `app` (lam $ z `add` s z) -- \x -> (\u v -> u v) (\y z -> z + y)

Doing the outer application and substituting (lam $ z `add` s z) produces at first

lam ( (s (lam $ z `add` s z)) `app` z) :: P repr
(Int,h) (Int->Int)

. The next step is weakening the substituted function, `pushing s inside':

lam ( (lam $ z `add` s (s z)) `app`
z)

. This complex update of indices (z remained the same but s z turned into s (s z)) is the final challenge. The substituted function can now be statically applied; the final result is:

lam (z
`add` (s z))

Answering the challenges demands a general way to transform the type environment of a term. The data type EnvT represents the transformation and the function app_open below carries it out.

     data EnvT repr hin hout where
       Dyn  :: EnvT repr hin hin
       Arg  :: P repr hout a -> EnvT repr (a,hout) hout
       Weak :: EnvT repr h (a,h)
       Next :: EnvT repr hin hout -> EnvT repr (a,hin) (a,hout)

EnvT repr hin hout describes the transformation of the type environment from hin to hout. Dyn is the identity transformer; it also requests the forgetting of all statically known data, converting P repr h a to the form Unk (x::repr h a). Arg p asks to substitute p for the z free variable and hence removes its type a from the environment. On the other hand, Weak requests weakening of the environment, adding some type a at the top. Since we may need to weaken and substitute for free variables other than z, Next increments the environment level at which Weak and Arg should apply. The function app_open carries out the environment update and returns the rebuilt, transformed term.

     app_open :: Symantics repr =>
                 P repr hin r -> EnvT repr hin hout -> P repr hout r
     app_open e Dyn            = Unk (dynamic e)
     app_open (Static es ed) _ = Static es ed
     app_open (Open fs) h      = fs h
     app_open (StaFun fs) h    = lam (fs (Next h))
     app_open (Unk e) h        = Unk (app_unk e h) where ...

Static values do not depend on the environment and hence the transformation is vacuous. Transformation under lam increments the update level. The real transformation is done by the interpreters for z and s below. The final auxiliary function, dynamic, forgets the static information and returns the purely dynamic term:

     dynamic:: Symantics repr => P repr h a -> repr h a
     dynamic (Unk x)      = x
     dynamic (Static _ x) = x
     dynamic (StaFun f)   = lam $ dynamic (f Dyn)
     dynamic (Open f)     = dynamic (f Dyn)

Partial Evaluator itself

We are ready to write the PE itself. It is defined by how it interprets the primitives of the embedded language as P reprT h a values. We describe primitive-by-primitive.

     instance Symantics repr => Symantics (P repr) where
       int n = Static n (int n)

From the integer literal we statically know its value, in any environment. The interpretation of addition takes advantage of this knowledge:

     add (Static n1 _) (Static n2 _) = int (n1+n2)

If the values of both summands are statically known, the addition can be done right away, producing the literal carrying the sum. If only one argument is statically known, the improvement is still possible if that argument is 0:

     add (Static 0 _) x = x
     add x (Static 0 _) = x

If both arguments are unknown (cannot be substituted into and hence improved) there is nothing else we can do:

     add (Unk x) (Unk y)   = Unk (add x y)

Otherwise, at least one argument may be improved by a substitution. We will look again through the result to see if the static addition becomes possible:

     add e1 e2 = Open (\h -> add (app_open e1 h) (app_open e2 h))

The primitive z is interpreted as the primitive improvable (substitutable) term, a term that reacts to EnvT requests:

     z = Open f where
       f :: EnvT repr (a,h) hout -> P repr hout a
       f Dyn       = Unk z       -- turn to dynamic as requested
       f (Arg x)   = x           -- substitution
       f (Next _)  = z           -- not my level
       f Weak      = s z         -- weaken

The interpretation of the weakening s p is more involved:

     s :: forall h a any. P repr h a -> P repr (any, h) a
     s (Unk x)       = Unk (s x)
     s (Static a ar) = Static a ar
     s (StaFun fs)   = lam (fs (Next Weak))
     s p = Open f where
       f :: EnvT repr (any,h) hout -> P repr hout a
       f Dyn              = Unk (s (dynamic p))
       f (Arg _)          = p
       f (Next h)         = s (app_open p h)
       f Weak             = s (s p)

If the expression to weaken is Open (that is, reacts to EnvT requests) we forward the request but remove one Next level. The subtle point is weakening lambda-expressions, that is, StaFun fs :: P repr h (ba->br). To answer the last challenge, we have to rebuild the function with the weakened body: P repr (ba,h) br is weakened to P repr (ba,(any,h)) br. It is an exercise to the reader to see that this procedure indeed shifts the De Bruijn indices as needed. (A mistake in shifting the indices would be a type error.)

The most complex part of PE is over. Interpreting a lam expression is simple:

     lam (Static k ks) = StaFun $ \_ -> Static k ks
     lam (Unk x)       = Unk (lam x)
     lam body          = StaFun (\args -> app_open body args)

If the body is statically known, regardless of the argument, it must be a constant function. If the body can be substituted into, we let this knowledge known, as StaFun. The pay-off comes when interpreting an application:

     app (StaFun fs) p   = fs (Arg p)
     app (Unk f) (Unk x) = Unk (app f x)
     app e1 e2           = Open (\h -> app (app_open e1 h) (app_open e2 h))

If the body of the function is known, it can be statically applied, by substituting the argument for the free variable in its body. If the function and the argument are dynamic, no further improvements are possible. If the function or the argument are open (hence, improvable), we repeat app after the improvement.

The result of the partial evaluation is the created dynamic term:

     pe :: Symantics repr => P repr () a -> repr () a
     pe = dynamic

This partial evaluator indeed improves our sample term t1S t1T (which is just the identity function) and the challenge terms t2S through t4S. The source code has many, many more examples of partially evaluated terms.

Conclusions

We have described the manifestly type-preserving online partial evaluator for tagless-final terms with De Bruijn indices and demonstrated two main principles. First is noticing available static information, propagating it and taking advantage of it when an opportunity arises. Second, an effective partial evaluator should be able to ``reduce under lambda'' -- carry out improvements within function bodies and hence manipulate free variables symbolically. This manipulation does a non-trivial shifting of the De Bruijn indices, which is quite a challenge to code. Luckily, shifting mistakes are caught by the Haskell type-checker.

Last updated November 9, 2014

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML