Stack as the variable binding environment: lexical binding via dynamic binding, strict and lazy

 
We demonstrate how lexical binding can be realized in terms of dynamic binding. Therefore, higher-order lexically scoped functions can be introduced into an existing first-order language as is, without re-writing its compiler/interpreter to deal with the variable binding environment. The approach works for strict and lazy evaluation, and is particularly instructive for delimited dynamic binding. Applications involve extensible interpreters and compilers, and staging.

 

Introduction

The most familiar, explained in any textbook approach to dealing with variable references in higher-order programs is variable binding environment: an association of variable names with their bound values. In an interpreter (the eval function), the environment is one of its arguments. Compiled functions receive the environment as an extra argument. Terms with variables are commonly thought to mean functions from the environment to the value domain.

This article demonstrates a different, unconventional approach to variable references, coming from the observation that a (let-)bound variable can only be used while the control remains within the corresponding let-expression body. In interpreter terms, a variable may be accessed only while the interpreter is still handling the let-form, which remains on the interpreter stack. Therefore, to find the value associated with the bound variable we search the stack -- or, better, just point to the right place on the stack that stores that value.

That is a very attractive idea, which seems first to come to mind when implementing higher-order languages. It was indeed first that came to mind to J. McCarthy and A. Turing. Alas, the idea -- now called dynamic binding -- is flawed (as Turing soon realized). To regard functions as first-class, to be able to pass them as arguments and return as results, lexical binding is indispensable.

This article demonstrates that, perhaps surprisingly, lexical binding can be implemented via dynamic binding, in strict and lazy settings.

The main advantage of this alternative approach is making variables a modular feature: variables and (first-class) functions can be introduced to a first-order language without any changes to the latter, without any need to re-write the interpretation of the first-order fragment (numbers, strings, etc.) to pass the environment. We may write extensible interpreters and compilers. Another application is the surprisingly simple implementation of staging.

Dynamic binding can be implemented in many ways. One particular implementation, in terms of delimited control, turns out particularly insightful. Incidentally, it demonstrates the need for multi-shot (non-affine) delimited continuations -- perhaps the first non-trivial use of such delimited continuations aside from non-determinism.

Another insight is that a let-expression -- often considered a mere syntax sugar in lambda calculus -- turns out more fundamental than the lambda abstraction.

References
lexdyn.ml [20K]
The complete code accompanying the article, with many tests and examples

Dynamic Binding

A surprisingly simple implementation of staging

 

Sample languages

To make the point we introduce a progression of (simply typed) languages, starting from the basic and adding more and more features. We shall see that their implementations are just as extensible.

We implement the languages by writing their interpreters -- several, in fact, for various evaluation strategies -- in OCaml. It helps therefore to embed the languages in OCaml as DSLs. We will be using tagless-final embedding, as more illuminating and easier to extend. The latter is particularly important: the extensibility of implementations is one of the main points.

The base language is the simplest first-order language with integers and integer operations, as well as applications (of built-in functions, since there is no way to define functions yet.) As a tagless-final DSL, the language (its operations -- syntax -- and the types) are described by the following OCaml signature.

    module type fo = sig
      type 'a exp                           (* expressions *)
    
      val int : int -> int exp              (* integer literals *)
      val ( + ) : int exp -> int exp -> int exp
      val ( - ) : int exp -> int exp -> int exp
    
      val (//)  : ('a -> 'b) exp -> 'a exp -> 'b exp   (* application, left-assoc *)
    end

A DSL expression of type t is represented by an OCaml expression of (an abstract) type t exp. For example, an OCaml expression int 1 represents DSL's integer literal 1. Here is a (OCaml representation) of a more elaborate DSL expression:

    let exfo = int 1 + int 2 - int 3 - int 4
Applications are denoted by the (left-associative) operator //. Built-in functions are easy to add but they will not be needed for our examples.

The second language, fol, extends fo with definitions: let-expressions and variables:

    module type fol = sig
      include fo
      type 'a var
      val newvar : string -> 'a var
      val var    : 'a var -> 'a exp
      val let_   : 'a var -> 'a exp -> (unit -> 'b exp) -> 'b exp
    end
The type 'a var describes variables, created by newvar from an unlimited supply. The first argument of var is the name, used only for tracing, which is quite useful especially for call-by-need later on. Variables per se are not expressions: use var to reference them (which may, and actually will be, an effectful operation). The body of a let-expression is a thunk: the timing of its evaluation is controlled by let_.

A sample fol term may look as follows:

    let x = newvar "x"
    let y = newvar "y"
    
    let exfol = 
      let_ x (int 10 - int 5) @@ fun () -> 
      let_ y (int 1 + var x)  @@ fun () -> 
      let_ x (var x + var y)  @@ fun () ->
      var x + var y
The first two lines create the variables for use in exfol. (Hereafter we assume that all variables appearing in examples have been created similarly).

Finally, we make the language higher-order by adding first-class functions, built by lambda-expressions:

    module type ho = sig
      include fol
      val lam : 'a var -> (unit -> 'b exp) -> ('a -> 'b) exp
    end
For example, an anonymous function of adding two to its argument is represented as lam x (fun () -> x + int 2).

Here are a few sample expressions using all ho facilities, including those inherited from fo and fol:

    let ex0 = 
       (lam x @@ fun () -> let_ y (var x + var x) @@ fun () -> var y + var y) 
       // int 10
    
    let ex1 = (lam x @@ fun () ->
                 let_ y (var x + var x) @@ fun () ->
                   lam z @@ fun () -> var z + (var z + (var y + var y)))
               // (int 10 - int 5) 
               // (int 20 - int 10)
    
    let exd =
      let_ h (lam f @@ fun () -> let_ x (int 10) @@ fun () -> var f // var x) @@ fun () ->
      let_ x (int 1) @@ fun () ->
      var h // (lam y (fun () -> var x + var y))
The last example, exd, deliberately uses the same variable x in different contexts, to highlight the difference between lexical and dynamic bindings.

 

Implementation: first-order fragment

We will be writing many interpreters of our DSLs, starting with fo in this section. For a tagless-final DSL, an interpreter is not a single (eval) function; rather, it is a realization of the fo signature, showing the implementation (interpretation, or, denotation) of each expression form of the DSL.

We chose the simplest, meta-circular interpreter, which interprets the fo DSL as a subset of OCaml. Therefore, the type t exp is realized as t, DSL integer literals are OCaml integers, DSL addition and subtraction are OCaml addition and subtraction (modulo printing the trace), DSL application is the OCaml application. Other interpreters are possible: pretty-printer or code generator.

    module FO = struct
      type 'a exp = 'a
      
      let int : int -> int exp = Fun.id
      let ( + ) : int exp -> int exp -> int exp = fun x y -> 
        Printf.printf "Adding %d and %d\n" x y; x + y
      let ( - ) : int exp -> int exp -> int exp = fun x y -> 
        Printf.printf "Subtracting %d and %d\n" x y; x - y
    
      let (//) : ('a -> 'b) exp -> 'a exp -> 'b exp = (@@)
    end

Thus defined FO can be immediately used to evaluate the sample term exfo, producing -4 and printing the trace of additions and subtractions.

 

Dynamic binding

As remarked earlier, dynamic binding is first to come to mind when thinking of variables. It can be described by the following operations:
    module type dynbind = sig
      type 'a dynvar
      val dnew : ?init:'a -> unit -> 'a dynvar
      val dlet : 'a dynvar -> 'a -> (unit -> 'w) -> 'w
      val dref : 'a dynvar -> 'a
    end
The operation dnew creates a new dynvar: dynamically-bound variable, possibly with the default value. The operation dlet, having received a dynvar, a value and a thunk, evaluates the thunk in the dynamic environment extended with the binding of the given value to the given dynvar. Finally, dref obtains the value associated with the dynvar by the closest dynamically-enclosing dlet (or the default value, if it was specified and the variable is not bound).

There are several implementations of this signature. The simplest uses reference cells: so-called `shallow binding'.

    module DBShallow = struct
      type 'a dynvar = 'a option ref
    
      let dnew : ?init:'a -> unit -> 'a dynvar = fun ?init () -> ref init
    
      let dlet : 'a dynvar -> 'a -> (unit -> 'w) -> 'w = fun dv e body -> 
        let vold = !dv in
        dv := Some e;
        match body () with
        | r -> dv := vold; r
        | exception e -> dv := vold; raise e
    
      let dref : 'a dynvar -> 'a = function
        | {contents = Some x} -> x
        | _ -> failwith "unbound dynamic variable"
    end

One may also implement dynamic binding via delimited control -- so-called delimited dynamic binding (DDB). In this implementation, dereferencing a dynvar is literally reaching back to the place on the evaluation stack that stores the bound value (as part of the dlet stack frame). Finally, dynamic binding is what's called a `reader effect', and may be implemented with any algebraic effect facility (e.g., `one-shot' algebraic effects of OCaml 5).

References
Delimited Dynamic Binding

 

Pure let-binding is dynamic binding

Let us turn to the language fol, which extends fo with variables and let-expressions. A variable may only be accessed while in scope of the let-expression that bound it. In fol, the scope of a let-expression -- when control remains within its body -- cannot be interrupted, captured or resumed. Therefore, lexical scope and dynamic scope coincide, and we may use dynamic binding to implement let-expressions and variables. The advantage of dynamic binding is that it may added modularly to an existing language without any changes to the latter -- as we now demonstrate.

The interpreter of fol is a proper extension of FO, including it as is and adding the interpretation of the new facilities: variables and binding:

    module FOL(D:dynbind) = struct
      include FO
    
      type 'a var = string * 'a D.dynvar    (* name for tracing *)
    
      let newvar : string -> 'a var = fun name -> (name, D.dnew ())
    
      let var : 'a var -> 'a exp = fun (name,dv) -> 
          Printf.printf "Needing %s...\n" name; D.dref dv
    
      let let_ : 'a var -> 'a exp -> (unit -> 'b exp) -> 'b exp = 
        fun (name,dv) e body -> 
          Printf.printf "Binding %s...\n" name; D.dlet dv e body
    end
Essentially, DSL variables are realized as dynvar, var is dref and let_ is dlet. In particular, accessing a variable is a (reader) effect.

The FOL interpreter is parameterized by the implementation of dynamic binding. See the accompanying code for tests, using DBShallow and DDB implementations of dynamic binding.

We must stress again the absence of any explicit environment, and that the first-order fragment FO is used as is, without any modification. The first-order fragment remains unaware of any variables.

 

Higher-order functions: variable capture

It may be tempting to add higher-order functions similarly, as a simple generalization of let:
    module HODB(D:dynbind) = struct
      include FOL(D)
      let lam : 'a var -> (unit -> 'b exp) -> ('a -> 'b) exp = fun v body -> 
        fun e -> let_ v e body
    end

which is what the original Lisp did. It is indeed very intuitive and elegant -- in fact, too intuitive. Trying to interpret our examples shows the problem: whereas ex0 runs and gives the expected result, interpreting ex1, repeated below

    let ex1 = (lam x @@ fun () ->
                 let_ y (var x + var x) @@ fun () ->
                   lam z @@ fun () -> var z + (var z + (var y + var y)))
               // (int 10 - int 5) 
               // (int 20 - int 10)
ends in a ``unbound dynamic variable'' when accessing y, according to the trace. Looking closely, we see that although lam z is defined within the scope of let_ y, it is applied outside the dynamic scope of that let-expression. The example exd shows a more insidious error:
    let exd =
      let_ h (lam f @@ fun () -> let_ x (int 10) @@ fun () -> var f // var x) @@ fun () ->
      let_ x (int 1) @@ fun () ->
      var h // (lam y (fun () -> var x + var y))
Interpreting it with HODB finishes without problems; the result, 20, is probably not what one may have expected. In the presence of first-class functions, lexical binding no longer coincides with dynamic binding.

The solution is also old, proposed by Landin back in 1960s: closures, which carry the variable environment at the point of function definition. When the function is later applied, its body is evaluated within the environment of its definition rather than the environment of the application. We can implement this standard, textbook solution literally, if we use an extended dynamic binding facility that permits capturing and restoring of the dynamic environment:

    module type db_capture = sig
      include dynbind
    
      type env
      val current_env : unit -> env
      val with_env    : env -> (unit -> 'w) -> 'w
    end
The operation current_env captures the current environment and returns it as an opaque value (of an abstract type env). The operation with_env receives an env and a thunk and evaluates the thunk in the given dynamic environment env rather than the current one.

Given an implementation of db_capture, lexical first-class functions are implemented literally as the textbooks say:

    module HO(D:db_capture) = struct
      include FOL(D)
    
      let lam : 'a var -> (unit -> 'b exp) -> ('a -> 'b) exp = fun v body -> 
        let env = D.current_env () in   (* env of the lam definition *)
        fun e -> 
          D.with_env env @@ fun () -> let_ v e body
    end
The body of the lambda-expression is evaluated in the environment of its definition.

 

Capturing dynamic environment

We have thus seen that lexical scope is characterized by an extended dynamic binding db_capture -- extended with the operations for capturing and restoring of the dynamic environment. We now demonstrate that these operations are realizable in terms of the ordinary dynamic binding (plus the facility to build sequences, or at least something like function compositions).

The proof is constructive: the following functor delivers an implementation of db_capture given an implementation of dynbind:

    module DBCapture(D:dynbind) = struct
      include D
      type env = {k: 'w. (unit -> 'w) -> 'w}
      
      let with_env : env -> (unit -> 'w) -> 'w = fun {k} body -> k body
    
      let env_acc = dnew ~init:{k=fun b -> b ()} ()
    
      let current_env : unit -> env = fun () -> dref env_acc
      
      let dlet : 'a dynvar -> 'a -> (unit -> 'w) -> 'w = fun dv e body -> 
        let env = current_env () in
        D.dlet env_acc {k = fun b -> env.k (fun () -> D.dlet dv e b)} @@ fun () -> 
          D.dlet dv e body
    end
The trick is the dynamic variable env_acc, to accumulate the current dynamic environment. A dlet expression records in env_acc the binding it is about to make.

The functor DBCapture builds the implementation of db_capture generically, given any implementation of dynbind. For a particular implementation of dynbind, however, there may be a more efficient realization of db_capture. For example, the accompanying code shows such efficient implementation of db_capture it terms of DDB, using multiprompt delimited control (supported in OCaml as the library delimcc). Incidentally, capturing of the environment requires multi-shot delimited control, which is not supported by OCaml 5 algebraic effects. It turns out that multi-shot delimited control is needed not just to implement non-determinism.

References
lexdyn.ml [20K]
The complete code accompanying the article

 

Call-by-name and call-by-need

In the call-by-name evaluation strategy, variables are bound to expressions rather than values; the evaluation occurs when the value is required, to return the overall result or perform an arithmetic operation. Making our DSLs call-by-name is messier, because the the evaluation strategy of the DSL differs from that of the host language, and we may no longer piggy-back on the latter: straightforward meta-circular interpretation is no longer possible. Still, we may introduce variables and variable binding modularly, as described earlier. A complication is that we need to capture the environment already in the fol interpreter: in call-by-name, lexical and dynamic scope differ even without first-class functions.

Call-by-need, or lazy, evaluation is call-by-name with the memoization of the result. A lazy expression is not evaluated until its result is needed. At that point, the expression is evaluated and the result is memoized. Thus a lazy argument expression is evaluated at most once. Dynamic binding often includes the operation dset to change the binding of a dynamic variable in scope (in particular, DDB supports such an operation; it is easy to add to DBShallow). Using dset in the implementation of the var operation is all that is needed to make call-by-name into call-by-need.

References
lexdyn.ml [20K]
The complete code accompanying the article

 

Call-by-need via delimited continuations

The two old pieces of code, mostly of historical interest by now, implement lazy evaluation in a call-by-value lambda-calculus with shift and reset. The first code was written in 2005 in Scheme. An improved version, in OCaml and using delimcc, was implemented in July 2017 over discussions with Petros Barbagiannis. That version has eventually lead to the present article.

The goal at that time was to related call-by-need with delimited control. I have eventually realized that it is not about delimited control but about dynamic binding (which may be implemented via delimited control, among other things).

References
cbn_delimcc1.ml [8K]
The OCaml implementation, July 7, 2017. It uses a more conventional representation of an embedded DSL as an algebraic data type (deep embedding). The interpreter (eval) is the fold over that data type.

lazy-eval.scm [6K]
The complete implementation and several illustrative examples, including the recursively-defined Fibonacci stream. The code was originally written on Feb 21, 2005. The version as of Dec 28, 2008 adds a convenient automatic identification of lazy expressions and more examples.

Ronald Garcia, Andrew Lumsdaine and Amr Sabry: Lazy Evaluation and Delimited Control. Logical Methods in Computer Science, July 11, 2010, Volume 6, Issue 3

 

Conclusions

We have thus seen that