Free Variable as Effect, in Practice

 
The paper and the full accompanying code illustrating practical benefits of treating variable reference as an effect. The code also demonstrates the incremental development of an interpreter or a compiler: adding new features one-by-one, without modifying or even touching the older code.

 

HOPE'23 Paper

Variable environment is the time-honored way of making sense of free variables, used in programming language theory as well when writing interpreters and some compilers. Algebraic effects give another way, as was pointed already at HOPE 2017. Although a theoretical curiosity, it may have surprising practical benefits: a new way of writing compilers, with the incremental type-checking, with easy variable usage, leaf function analyses. This work-in-progress report prototypes and illustrates the idea.

In the environment semantics the meaning of an expression is a function from the environment, which is opaque and cannot be examined. We cannot tell which variables in the environment have actually been used, and how many times. Algebraic effects make the denotation more observable: a handler can watch questions and find out which variables have been asked about, and how many times. Thus we obtain the variable usage analysis in the ordinary course of compilation, almost for free, so to speak.

It remains to be seen how this promise holds for a real compiler for a realistic programming language. I intend to find out by trying this technique out in the new installment of the compiler class (which is underway).

Version
The current version is December 2023
References
var-effect.pdf [500K]
The extended version of the paper presented at the ACM SIGPLAN HOPE'23 workshop, Seattle, WA, USA, September 2023. doi:10.48550/arXiv.2312.16446

HOPE.pdf [114K]
Higher-order Programming is an Effect (HOPE 2017 submission)

Having an Effect
Detailed explanation of using effects for variable reference

 

step1: Warmup -- Interpreter of a simple language with numbers and addition

We start, as a warm-up, with an interpreter -- to turn later into a Wasm compiler by changing the domain of the interpretation. We also start with an exceedingly simple language: only numbers and the addition operation, as in a simple desk calculator. Variables -- let-expressions, and then functions -- are added in the next steps.

Each step is in its own directory.

References
lang.mli [<1K]
The source language definition: abstract syntax

lexer.mll [<1K]
parser.mly [2K]
The concrete syntax of the source language: integer literals, addition, parenthesized expressions -- in the form of the (ocaml)lex/(ocaml)yacc grammar. The parser relates the concrete and the abstract syntaxes. The lexer.mll is used in in all further steps as is.

eval1.ml [<1K]
The most straightforward implementation of lang.mli: the meaning of an expression is its value. It is hence an evaluator of the source language (terms).

eval.ml [<1K]
The same as eval1.ml but with the printing of intermediary results. Our interpreter then truly acts as the familiar desk calculator. The reason to introduce the printing of intermediary results is to make it easy to see how eager or delayed the evaluation is. The delay correlates with memory usage.

main.ml [<1K]
The driver that puts everything -- the parser, the abstract syntax evaluator and the observer -- together. This driver is used all throughout.

Makefile [<1K]
How to make it

Makefile.common [<1K]
The common Makefile code

 

step2: let-bound variables with the Environment Semantics

We now extend the language of step1 with variables and let-expressions. We use the environment to give meaning to terms with free variables.

By extension we mean re-use, in the compiled, binary form -- rather than copy-paste, let alone in-place modification of the earlier code. For example, suppose step1/lang.mli -- the abstract syntax of step1's language -- is available under the name langInt.mli (as a symbolic link) in the current directory. Its extension with variables and let-expressions is hence merely the addition of the new language forms:

    include module type of LangInt
    
    type name = string
    
    val var : name -> repr
    val let_ : name * repr -> repr -> repr  (* local let *)
References
<step2/>
The directory for this step

lang.mli [<1K]
Extending the abstract syntax of step1 with variables and let-expressions (as shown above)

parser.mly [2K]
The parser for the extended language

eval.ml [<1K]
The extended evaluator. The evaluator step1/eval1.ml is assumed symbolically linked as evalInt.ml. It is included and extended.

test1.sh [<1K]

Makefile [<1K]

 

step3: let-bound variables with the Effect Semantics

This step evaluates the language of step2 using effects (rather than the environment) for variable references. The effect semantics for free variables is used in all further steps.

After we build the interpreter, we can observe that it evaluates constant expressions as soon as possible -- in contrast to the step2 interpreter, which evaluates only when the whole program is parsed. See the paper for details.

References
<step3/>
The directory for this step. Most of its files are the same as in step2 (symbolically linked from there). The new files are as follows.

varEff.ml [3K]
Reader and other effects: Freer-monad implementation

eval.ml [<1K]
Extension of step1/eval1.ml in a different way: using effects, rather than the environment, to determine the meaning of a free variable

Makefile [<1K]

 

step4: Adding local functions

We now a add to the language second-class function declarations and function calls. The declarations may nest. The functions are not recursive: recursive functions make no sense in a language with no conditional expressions.
References
<step4/>
The directory for this step

lang.mli [<1K]
Extending the abstract syntax of step2 language with function declarations and function calls (step2/lang.mli is assumed symbolically linked under the name langLet.mli)

parser.mly [2K]

eval.ml [3K]
Extending the step3 evaluator, assumed symbolically linked under the name evalLet.ml. Although we are using effects for variable reference, we truly implement lexical scoping, by evaluating function bodies in the handling environment of their definition rather than the invocation.

test1.sh [2K]
Test cases, now including functions and lexical scoping

Makefile [<1K]

 

step5: Alternatively, adding global functions

As an alternative to step4, we add to the language of step3 second-class global functions. Their declarations must come first in a program, and they may not nest. Such C-like language is easier to compile to Wasm. (The Pascal-like language with nested function declarations may also be compiled to Wasm. However, it makes our prototype complicated and less suitable as an illustration.)
References
<step5/>
The directory for this step

lang.mli [<1K]
Extending the abstract syntax of step2 language with global function declarations and function calls (step2/lang.mli is assumed symbolically linked under the name langLet.mli)

parser.mly [2K]

eval.ml [3K]
Extending the step3 evaluator, assumed symbolically linked under the name evalLet.ml. Since the functions are global and closed, they are not closures and the question of lexical scoping is moot.

test1.sh [<1K]

Makefile [<1K]

 

Wasm as a DSL

References
wasm.mli [4K]
wasm.ml [12K]
Language definition and its implementation: generating Wasm modules (.wat files)

ex_low1.ml [5K]
Several examples of generating (and even executing) Wasm code

 

step7: Wasm compiler of a simple language with let-bound variables

We turn the interpreter of step3 into a compiler, by changing the domain of the interpretation: rather than an integer, as in step3, it is now a sequence of Wasm instructions leaving an integer on the stack.
References
<step7/>
The directory for this step. The source language -- its abstract and concrete syntaxes and the parser -- are the same as in step3. The new files are as follows.

eval_wasm.ml [<1K]
The interpreter from the basic step1 language turned into a compiler

eval_var.ml [<1K]
Lifting the basic eval_wasm.ml compiler to let-bound variables. It is exactly step3/eval.ml with EvalInt replaced with Eval_wasm. This compiler inlines let-expressions, as explained in the paper.

eval.ml [4K]
The proper compiler for let-expressions, extending eval_var.ml with variable usage analyses and memory (Wasm locals) allocation

test1.sh [<1K]

Makefile [<1K]

 

step8: Adding global functions

This step turns the step5 interpreter into a compiler. See the paper for the demo of incremental type checking.
References
<step8/>
The directory for this step. Almost all files come from step5, except for eval_wasm.ml and eval_var.ml from step7. The only new files are as follows.

eval.ml [4K]
The extension of eval_var.ml (naive let-expression compiler) with global second-class functions and their type-checking.

test1.sh [2K]
Test cases and examples

testw.ml [2K]
Tests running Wasm

Makefile [<1K]