Whenever one writes an interpreter or a compiler, or studies logic,
model theory, programming language theory -- one soon has to face
variables. The well-known way to deal with them is by introducing
*environment*, or variable assignment in logic. There is, however,
another, more general approach,
related to algebraic effects. One should not be too surprised. Algebraic
effects originate from studying terms with variables, and equations on
them: free algebras.

This article explains both approaches on an exasperatingly simple example. The example scales up, to something rigorous, realistic and interesting: see Not by Equations Alone: Reasoning with Extensible Effects.

- Simple reminder of denotational semantics
- Free variables and Free semantics
- Environment semantics
- Environment, Free, and Initial semantics
- Simple State
- Free Variable as Effect, in Practice

practical benefits of treating variable reference as an effect, in writing interpreters or compilers - Having an Effect
- Stack as the variable binding environment

- Consider an expression
`"(1+(3-2))"`

. It is a well-parenthesized character string: a sequence of characters with no particular meaning. Let's give them a meaning -- the one that springs to mind when seeing this string. (That is not the only possible meaning, as we see soon.) That is, digits mean integers, parentheses mean what is inside them, and`"+"`

and`"-"`

mean integer addition and subtraction. To state this formally (and hence concisely and unambiguously), it is common to use the so-called semantic brackets 〖 〗 as the abbreviation for ``the meaning of''. For example, what we said earlier can be re-stated formally as:〖`"1"`

〗 = 1 〖`"2"`

〗 = 2 〖`"3"`

〗 = 3 〖`"+"`

〗 = + 〖`"-"`

〗 = -〖`"(`

*e*_{1}`+`

*e*_{2}`)"`

〗 = 〖*e*_{1}〗 + 〖*e*_{2}〗 〖`"(`

*e*_{1}`-`

*e*_{2}`)"`

〗 = 〖*e*_{1}〗 - 〖*e*_{2}〗*e*,*e*_{1},*e*_{2}stand for arbitrary well-parenthesized sequences of characters. These equations express*compositionality*: the meaning of a composite expression is computed from the meaning of its parts -- but not the shape or other information about those parts. The meaning of the original expression then works out as:〖`"(1+(3-2))"`

〗 = 〖"1"〗 + 〖`"(3-2)"`

〗 = 1 + (3 - 2) = 2`"(`

*e*_{1}`+`

*e*_{2}`)"`

〗 depends solely on the meaning of*e*_{1}and*e*_{2}-- but not on the structure of these subexpressions. It follows then, for example, that 〖`"(1+(3-2))"`

〗 must mean the same as 〖`"(1+1)"`

〗, because 〖`"(3-2)"`

〗 = 〖`"1"`

〗. In other words, substituting a sub-expression with another (perhaps, simpler) sub-expression of the same meaning does not change the overall meaning.In our example, the meaning of the expression is an integer. Said loftier, the semantic domain is the set of integers. We call it Ｎ semantics. This is not the only possible way to assign meaning to parenthesized expressions of digits and plus/minus characters. One may, for example, chose as the semantic domain the two-element set

*{odd, even*}. We leave the meaning assignment (semantic equations) as an exercise, and consider yet another semantic domain: sequences of WASM instructions. The semantic equations are〖`"1"`

〗 =`i32.const 1`

〖`"2"`

〗 =`i32.const 2`

〖`"3"`

〗 =`i32.const 3`

〖`"(`

*e*_{1}`+`

*e*_{2}`")`

〗 = 〖*e*_{1}〗 〖*e*_{2}〗`i32.add`

〖`"(`

*e*_{1}`-`

*e*_{2}`")`

〗 = 〖*e*_{1}〗 〖*e*_{2}〗`i32.sub`

`"(1+(3-2))"`

in the WASM domain is the instruction sequencei32.const 1 i32.const 3 i32.const 2 i32.sub i32.add

We have obtained a simple compiler to WASM. (The approach does scale to much more complicated expressions.)

- Let us add variables to our expressions. Our language of
parenthesized strings of digits and plus/minus operators will now
include identifiers: sequences of letters and digits starting with a
letter. Here is a sample expression in the extended language:
`"(y-(x+1))"`

. The earlier semantics giving expressions the meaning as integers has the obvious trouble: there is no clear way to tell which integer should be the meaning of`"x"`

or`"y"`

.One may give meaning in a different way: as a collection of algebraic identities such as

(1) "(x+1)" === "(1+x)"

that is, assertions that some expressions are ``equal''. We use the sign`===`

to signify such equivalence of expressions (in contrast to the`=`

sign used for the semantic equality, e.g., equality of integers). Here, variables pose no problems: this is just an elementary school algebra. The above algebraic identity specifies one aspect of the meaning. Along with the congruence lawsif*e*_{1}`===`

*e*_{2}then`"`

*e*_{1}`+`

*e*`" === "`

*e*_{2}`+`

*e*`"`

Variables are problematic only for the most naive denotational approach. Giving a compositional semantics to a language with variables is straightforward: we merely need a more sophisticated semantic domain than mere integers. Since we do not know the meaning of variables, we ask for it. A domain that accommodates questions can be described by the following (OCaml) inductive data type declaration

type d = V of int | E of var * (int -> d)

where`var`

is the type of identifiers. This is the set of well-founded infinitely-branching trees of finite height with integer leaves. We call this Ｄ semantics. For convenience, we introducelet an n = V n let qu v k = E(v,k)

In the Ｄ domain,`V n`

is an*answer*: the meaning as an integer`n`

. On the other hand,`E(v,k)`

is a*question*, about the meaning of the variable`v`

, where`k`

of the type`int->d`

is an infinite tuple, indexed by integers. For each possible answer to the question -- for each possible integer value`n`

the variable`v`

may take --`k n`

tells the meaning of the expression for that particular variable value assignment.The denotational semantics is then given as (we attach the superscript for easy reference):

〖`"1"`

〗^{d}=`an`

1 〖`"2"`

〗^{d}=`an`

2 etc.

〖*v*〗^{d}=`qu`

*v*`an`

*v*stands for an arbitrary identifier. The meaning of a digit is, hence, an answer: the number the digit stands for. The meaning of a variable is a question about that variable, immediately returning the answer. The meaning of a composite expression becomes:〖`"(`

*e*_{1}`+`

*e*_{2}`")`

〗^{d}=`lift2`

(+) 〖*e*_{1}〗^{d}〖*e*_{2}〗^{d}

〖`("`

*e*_{1}`-`

*e*_{2}`")`

〗^{d}=`lift2`

(-) 〖*e*_{1}〗^{d}〖*e*_{2}〗^{d}`lift2 : (int->int->int) -> (d->d->d)`

. It transforms/adjusts the meaning of binary operations: extends them from Ｎ to Ｄ. It is defined generically for any binary operation on integers, by induction and case analysis:lift2 op (V n1) (V n2) = an (op n1 n2) lift2 op (E (v,k)) d2 = qu v (fun n -> lift2 op (k n) d2) lift2 op d1 (E (v,k)) = qu v (fun n -> lift2 op d1 (k n))

If we know the answer to both operands, we can compute the answer to the operation. If an operand is a question, the operation is also a question. When it gets answered, the operation is tried again. The function`lift2`

hence propagates questions; the first operand gets to ask questions first.All in all, the above semantic equations map an expression to a question-answer tree with integer leaves, which tells the meaning of the expression for each possible answer about the meaning of its variables. For example, for our sample term

`"(y-(x+1))"`

we compute its meaning asE("y", fun n -> E("x", fun m -> V(n-(m+1))))

The just defined semantics lets us verify the earlier equational law "1+x===x+1", by computing the denotations of the both sides and checking they are the same -- relying on the fact that the (semantic) addition on integers is commutative. The semantics lets us pose and verify many more identities, from the properties of integer addition. For example:

(2)`"(`

*e*`+`

*i*`)" === "(`

*i*`+`

*e*`)"`

`"((`

*e*_{1}`+`

*e*_{2}`)+`

*e*_{3}`)" === "(`

*e*_{1}`+(`

*e*_{2}`+`

*e*_{3}`))"`

*i*and well-parenthesized expressions*e*,*e*_{1}, etc. For example, we obtain`"(y-(x+1))" === "(y-(1+x))"`

, because both sides have the same meaning (shown above).

- The just described denotational semantics for the toy language with
variables is not the common one. The most familiar semantics
uses
*environment*, also called variable assignment in logic and typically denoted by ρ. The environment is a`var->int`

mapping:`ρ(v)`

tells the meaning (i.e., integer) associated with a variable`v`

in this environment. Since environment is a finite map (any program has only a finite number of variables), it is often implemented as an associative list. The meaning of an expression hence is a function`env -> int`

, which tells which integer the expression stands for given a particular variable assignment. We call this semantic domain Ｅ. The semantic equations are:〖`"1"`

〗^{e}= fun ρ -> 1 〖*v*〗^{e}= fun ρ -> ρ(*v*)

〖`"(`

*e*_{1}`+`

*e*_{2}`)"`

〗^{e}= fun ρ -> 〖*e*_{1}〗^{e}ρ + 〖*e*_{2}〗^{e}ρ 〖`"(`

*e*_{1}`-`

*e*_{2}`)"`

〗^{e}= fun ρ -> 〖*e*_{1}〗^{e}ρ - 〖*e*_{2}〗^{e}ρThe Ｅ semantics above is built over the Ｎ domain. We may just as well use the WASM domain as the basis. The environment in this case maps variable names to WASM instructions. The meaning of the sample expression

`"(y-(x+1))"`

is thenfun ρ -> ρ("y") ρ("x") i32.const 1 i32.add i32.sub

In the environment that maps`"x"`

to`local.get 0`

and`"y"`

to`local.get 1`

, our running example meanslocal.get 1 local.get 0 i32.const 1 i32.add i32.sub

- The semantics Ｅ looks rather different from the earlier Ｄ. But only
on first sight. Let's define
let ane n = fun ρ -> n let que v k = fun ρ -> k (ρ(v)) ρ let lift2e op e1 e2 = fun ρ -> op (e1 ρ) (e2 ρ)

then the semantic equations〖`"1"`

〗^{e}=`ane`

1 〖*v*〗^{e}=`que`

*v*`ane`

〖`"(`

*e*_{1}`+`

*e*_{2}`)"`

〗^{e}=`lift2e`

(+) 〖*e*_{1}〗^{e}〖*e*_{2}〗^{e}〖`"(`

*e*_{1}`-`

*e*_{2}`)"`

〗^{e}=`lift2e`

(-) 〖*e*_{1}〗^{e}〖*e*_{2}〗^{e}^{d}. Only the choice of the three functions`an`

,`qu`

,`lift2`

differs, reflecting the different semantic domain (Ｅ rather that Ｄ). One may therefore say that Ｄ and Ｅ semantics, and whatever else one may come up with, are all instances of the single general semantics of variables (call it Ｇ):〖`"1"`

〗^{g}=`ang`

1 〖*v*〗^{g}=`qug`

*v*`ang`

〖`"(`

*e*_{1}`+`

*e*_{2}`)"`

〗^{g}=`lift2g`

(+) 〖*e*_{1}〗^{g}〖*e*_{2}〗^{g}〖`"(`

*e*_{1}`-`

*e*_{2}`)"`

〗^{g}=`lift2g`

(-) 〖*e*_{1}〗^{g}〖*e*_{2}〗^{g}`ang`

,`qug`

and`lift2g`

(and`lift1g`

if the language has unary operations). This general semantics is an*initial algebra*semantics of variables. All other semantics are obtained by appropriately choosing the parameters`ang`

,`qug`

and`lift2g`

.Semantics Ｅ and Ｄ are thus two different instances of Ｇ. They can also be related in another way:

〖*e*〗^{e}= hnd 〖*e*〗^{d}`hnd`

(which we call the handler) is inductively defined by the two cases:hnd V(n) = ane n hnd E(v,k) = que v (hnd k)

In other words, the handler takes a question-answer tree and a variable assignment ρ and uses the latter to answer all the questions in the former. Another way to understand the handler is the wholesale replacement of`V`

and`E`

nodes in the question-answer tree with`ane`

and`que`

. (One may also note that`hnd`

is what is called fold in initial algebra.)Since

`hnd`

is fold, and`ane`

and`que`

could just as well be parameters`ang`

and`qug`

rather than fixed functions, the semantics Ｄ is also the initial algebra semantics (and, hence, isomorphic to Ｇ).The question-answering strategy provided by

`hnd`

(and hence, by the environment semantics) can verify more identities, for example"x+y" === "y+x"

which cannot be verified in the Ｄ semantics because nothing guarantees that the question about the meaning of, say,`x`

is answered with the same integer no matter when this question is asked relative to other questions. The handler`hnd`

makes this guarantee but other handlers may not. On the other hand, (1) and (2) are valid for any question-answering strategy.This is the taste of things to come in the paper Not by Equations Alone: Reasoning with Extensible Effects, where one encounters two sorts of semantics. One is the initial algebra, question-answer--tree semantics like Ｄ that presupposes no question-answering strategy. We often call such semantics `free'; the identities (equational laws) verified it are to be called `free' as well. The other semantics, such as Ｅ, rely on a particular question-answering strategy, and hence can verify more identities.

- Let us turn to effects. Let's go back to the toy language with digits
and addition/subtraction, and instead of variables, introduce
`"Get"`

and`"(Put`

*i*`)"`

(where*i*is a string of digits). We may tell the story of the language with free variables again, with hardly any changes. The effect operations, like the variables`x`

and`y`

earlier, are just names, with no definite interpretation. Therefore, the expression`"(Get+1)"`

, like`"(x+1)"`

earlier, is open, and poses problem for the naive denotational semantics. Just as before, we may specify the semantics via identities, such as"((Put 1)+Get)" === "((Put 1)+1)"

(in which one may recognize the so-called Put-Get law).As in the case of variables, effects call for the extension of the semantic domain. The question-answer trees work here as well:

type p = V of int | E of ops * (int -> p)

where`ops`

is the type of effect operations: in our case,type ops = Get | Put of int

Again we introduce the convenient functionslet anp n = V n let qup op k = E(op,k)

The semantics (call it Ｐ)

〖`"1"`

〗^{p}=`anp`

1 〖`"2"`

〗^{p}=`anp`

2 etc.

〖`"Get"`

〗^{p}=`qup`

`Get`

`anp`

〖`"(Put`

*i*`)"`

〗^{p}=`qup`

(`Put`

*i*)`anp`

〖`"(`

*e*_{1}`+`

*e*_{2}`")`

〗^{p}=`lift2p`

(+) 〖*e*_{1}〗^{p}〖*e*_{2}〗^{p}〖`("`

*e*_{1}`-`

*e*_{2}`")`

〗^{p}=`lift2p`

(-) 〖*e*_{1}〗^{p}〖*e*_{2}〗^{p}`lift2p`

is`lift2`

with`anp`

/`qup`

instead of`an`

/`qu`

. The semantics Ｐ is hence almost literally the same as Ｄ. It is also an instance of Ｇ.Since the question-answer semantics of state Ｐ turns out almost literally Ｄ, one may wonder what an environment-like semantics for state may look like, and how it may relate to Ｅ. In our simple state language, there is only one piece of state. Therefore, the environment ρ is the function

`unit -> int`

, which is isomorphic to an integer. The environment-like semantics of state (call it Ｓ) uses the following question operation:let qus Get k = fun ρ -> k ρ ρ let qus (Put i) k = fun ρ -> k i i

(`ans`

is the same as`ane`

). The semantics equations then〖*e*〗^{s}= hnd 〖*e*〗^{p}`hnd`

is essentially the same as before:hnd V(n) = ans n hnd E(op,k) = qus op (hnd k)

Just like Ｅ is an instance of the initial algebra semantics Ｄ, Ｓ is an instance of Ｐ. Clearly, Ｓ is the familiar state-passing semantics of state.The free semantics of state Ｐ lets us verify

`"(Get + 1)" === "(1 + Get)"`

and, in general, the identities similar to (2). On the other hand, it does not verify the Put-Get law. But the more specific semantics Ｓ does, as one can easily work out.One may now ask, what should

`an`

/`qu`

operations be to verify the Put-Get law. This is the central question for the algebraic effects and handlers approach. One may also ask: given a particular way of answering Get-Put questions (that is, particular`an`

/`qu`

operations) what are the corresponding additional identities that may be verified? This is the question of the `Not by Equations Alone' paper (in particular, for the Get/Put effect, it is answered in Sec 5 of the paper).We have thus confirmed the analogy between free variables and effect operations. It can be traced to Reynolds' Idealized Algol and explored in more detail in Having an Effect. The free, question-answer--tree semantics makes the analogy particularly easy to see. A variable access may indeed be an effect when the variable resides far off core (maybe in another host in a distributed system) or in a memory-mapped i/o. The question-answer--tree semantics could be particularly useful there.