
Typed tagless-final interpretations: Lecture notes


The course on typed tagless-final embeddings of domain-specific languages was presented at the Spring School on Generic and Indexed Programming (SSGIP) at Wadham College, Oxford, UK, on March 22-26, 2010. This page collects the notes for the course in the form of extensively commented Haskell and OCaml code.



The topic of the course is the embedding of domain-specific languages (DSLs) in a host language such as Haskell or OCaml. We will often call the language to embed `the object language' and the host language `the metalanguage'. Throughout the course we will repeatedly encounter the following points:
Multiple interpretations:
writing a DSL program once, and interpreting it many times, in standard and non-standard ways;
enriching the syntax of the object language, reusing but not breaking the existing interpreters.
Typed implementation language:
getting the typechecker to verify properties of interpreters, such as not getting stuck.
Typed object language:
getting the metalanguage typechecker to enforce properties of DSL programs, such as being well-typed.
Connections with logic:
preferring lower-case
preferring elimination over introduction
connecting to denotational semantics


Lecture Notes

[The Abstract of the paper]
The so-called `typed tagless final' approach of Carette et al. has collected and polished a number of techniques for representing typed higher-order languages in a typed metalanguage, along with type-preserving interpretation, compilation and partial evaluation. The approach is an alternative to the traditional, or `initial' encoding of an object language as a (generalized) algebraic data type. Both approaches permit multiple interpretations of an expression: to evaluate it, pretty-print it, etc. The final encoding represents all and only typed object terms without resorting to generalized algebraic data types, dependent or other fancy types. The final encoding lets us add new language forms and interpretations without breaking the existing terms and interpreters.

These lecture notes introduce the final approach slowly and in detail, highlighting extensibility, the solution to the expression problem, and the seemingly impossible pattern-matching. We develop the approach further, to type-safe cast, run-time-type representation, Dynamics, and type reconstruction. We finish with telling examples of type-directed partial evaluation and encodings of type-and-effect systems and linear lambda-calculus.

The current version is August 2012.
lecture.pdf [275K]
Typed Tagless Final Interpreters
Generic and Indexed Programming: International Spring School, SSGIP 2010, Oxford, UK, March 22-26, 2010, Revised Lectures
Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science 7470, 2012, pp. 130-174 doi:10.1007/978-3-642-32202-0_3

First-order languages (generic programming)

We will be talking about ordinary data types and (generic) operations on them. The expression problem will make its appearance. The first-order case makes it easier to introduce de-serialization and seemingly non-compositional operations.


Initial and final, deep and shallow: the first-order case

Intro1.hs [2K]
Algebraic data type/initial representation of expressions
Constructor functions: the intimation of the final representation (or, shallow embedding)

Intro2.hs [3K]
Symantics: parameterization of terms by interpreters
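The gist can be sketched as follows, in a condensed variant of what Intro2.hs develops in detail (particulars may differ from the file): the object language is a type class, and each interpreter is an instance of it.

```haskell
-- The syntax (and semantics) of the object language, as a type class:
-- a term is whatever can be built from these three operations.
class ExpSYM repr where
  lit :: Int -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr

-- One interpreter evaluates...
instance ExpSYM Int where
  lit n = n
  neg e = negate e
  add e1 e2 = e1 + e2

-- ...another pretty-prints. The term below is written only once
-- and can be interpreted by both, and by any future instance.
instance ExpSYM String where
  lit n = show n
  neg e = "(-" ++ e ++ ")"
  add e1 e2 = "(" ++ e1 ++ " + " ++ e2 ++ ")"

tf1 :: ExpSYM repr => repr
tf1 = add (lit 8) (neg (add (lit 1) (lit 2)))

eval :: Int -> Int        -- selecting the evaluating interpreter
eval = id

view :: String -> String  -- selecting the printing interpreter
view = id
```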

Intro3.hs [2K]
Initial and Final, Deep and Shallow, First-class

ExtI.hs [<1K]
Algebraic data types are indeed not extensible

ExtF.hs [2K]
Adding a new expression form to the final view: solving the expression problem
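A minimal sketch of the extension technique (the base language is repeated here to keep the sketch self-contained; ExtF.hs has the full, commented version): a new expression form is a new, separate type class, so no existing code is touched or recompiled.

```haskell
-- The base language, as before
class ExpSYM repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr

instance ExpSYM Int where
  lit n = n
  add e1 e2 = e1 + e2

-- The extension: multiplication is a *separate* class, added without
-- modifying, or even seeing, the code above.
class MulSYM repr where
  mul :: repr -> repr -> repr

instance MulSYM Int where
  mul e1 e2 = e1 * e2

-- Extended terms merely accumulate constraints.
tfm :: (ExpSYM repr, MulSYM repr) => repr
tfm = mul (lit 7) (add (lit 1) (lit 2))

eval :: Int -> Int
eval = id
```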

Serialize.hs [7K]
Serialization and de-serialization
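A sketch of the idea, under an assumed wire format (the actual format, and the subtlety of interpreting the parsed term many times, are treated in Serialize.hs): de-serialization parses a tree of strings back into an overloaded tagless-final term.

```haskell
-- The object language, as before
class ExpSYM repr where
  lit :: Int -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr

instance ExpSYM Int where
  lit = id
  neg = negate
  add = (+)

-- An assumed, illustrative wire format
data Tree = Leaf String | Node String [Tree] deriving Show

-- De-serialization may fail, hence Either; the successfully parsed
-- term is overloaded and can be interpreted as any ExpSYM instance.
fromTree :: ExpSYM repr => Tree -> Either String repr
fromTree (Node "Lit" [Leaf n])  = Right (lit (read n))
fromTree (Node "Neg" [e])       = fmap neg (fromTree e)
fromTree (Node "Add" [e1, e2])  = add <$> fromTree e1 <*> fromTree e2
fromTree e = Left ("Invalid tree: " ++ show e)
```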

SerializeExt.hs [4K]
De-serializing the extended language


Final embeddings in OCaml

We demonstrate several encodings of extensible first-order languages in OCaml. Objects turn out to be handy in emulating the composition of type-class dictionaries. [2K]
The traditional application of objects to represent extensible data types. Alas, the set of operations on these data types is not extensible. [3K]
Tagless-final embedding using modules [3K]
Tagless-final embedding with objects emulating type-class dictionaries. Both the language and the set of its interpretations are extensible.


Non-compositionality: Fold-unlike processing

Interpreters are well suited for compositional, fold-like operations on terms. The tagless-final representation of terms makes writing interpreters quite convenient. One may wonder about operations on terms that do not look like folds. Can we even pattern-match on terms represented in the tagless-final style? Can we compare such terms for equality? We answer the first question here, deferring the equality test till the part on implementing a type checker for a higher-order language. Our running examples are term transformations, converting an expression into a simpler, optimized, or canonical form. The result is an uncrippled expression, which we can feed into any of the existing or future interpreters. Our sample term transformations look like simplified versions of the conversion of a boolean formula into conjunctive normal form.

PushNegI.hs [3K]
Pushing the negation down: the initial implementation

PushNegF.hs [3K]
Pushing the negation down: the final implementation
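The trick can be sketched as follows, reusing the two-operation setup from the earlier files (PushNegF.hs has the commented version): the seemingly non-compositional transformation becomes an ordinary interpreter once terms are interpreted relative to a context, here, whether the term occurs under a negation.

```haskell
{-# LANGUAGE FlexibleInstances #-}

class ExpSYM repr where
  lit :: Int -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr

instance ExpSYM String where
  lit n = show n
  neg e = "(-" ++ e ++ ")"
  add e1 e2 = "(" ++ e1 ++ " + " ++ e2 ++ ")"

data Ctx = Pos | Neg

-- Interpret a term as a function from the context to any other
-- interpretation: negation merely flips the context, literals
-- consult it, addition is homomorphic.
instance ExpSYM repr => ExpSYM (Ctx -> repr) where
  lit n Pos = lit n
  lit n Neg = neg (lit n)
  neg e Pos = e Neg
  neg e Neg = e Pos
  add e1 e2 ctx = add (e1 ctx) (e2 ctx)

-- The transformation: run in the positive context. The result is
-- again an overloaded term, usable with any interpreter.
push_neg :: (Ctx -> repr) -> repr
push_neg e = e Pos

view :: String -> String
view = id
```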

PushNegFExt.hs [4K]
Pushing the negation down for extended tagless-final terms

FlatI.hs [2K]
FlatF.hs [4K]
Flattening of additions, the initial and the final implementations

PushNegFI.hs [4K]
The final-initial isomorphism, and its use for implementing arbitrary pattern-matching operations on tagless-final terms.
Ralf Hinze, in Part 7 of his Spring School course, has derived this `initial-final' isomorphism rigorously, generally and elegantly from the point of view of category theory. In the first-order case, `initial' and `final' are the left and the right views of the same initial algebra. The `final' view is, in the first-order case, the ordinary Church encoding.
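The isomorphism can be sketched as follows (function names such as finalize are ours; PushNegFI.hs works out the details): the initial, data-type representation is itself just another tagless-final interpreter, and the fold over the data type takes us back.

```haskell
class ExpSYM repr where
  lit :: Int -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr

instance ExpSYM Int where
  lit = id
  neg = negate
  add = (+)

-- The initial representation
data Exp = Lit Int | Neg Exp | Add Exp Exp deriving (Eq, Show)

-- final -> initial: interpret a tagless-final term into the data type
instance ExpSYM Exp where
  lit = Lit
  neg = Neg
  add = Add

-- initial -> final: the fold over Exp re-creates an overloaded term;
-- composing the two directions lets us pattern-match on final terms.
finalize :: ExpSYM repr => Exp -> repr
finalize (Lit n)     = lit n
finalize (Neg e)     = neg (finalize e)
finalize (Add e1 e2) = add (finalize e1) (finalize e2)
```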

Interpreters for higher-order languages

Higher-order languages are data types with binding, so to speak. In the first part, only the interpreters were typed; we could get away with our object language being unityped. Now, the object language itself becomes typed, bringing in the interesting issues of interpreting a typed language in a typed metalanguage while ensuring type preservation. It is this part that explains the attributes `typed' and `tagless' in the title of the course.


Type-preserving embedding of higher-order, typed DSLs

Using simply-typed lambda-calculus with constants as a sample DSL, we demonstrate its various embeddings into Haskell. We aim at a type-preserving embedding and efficient and type-preserving evaluations. The tagless-final embedding not only achieves this goal, it also makes the type-preservation patently clear. Tagless-final evaluators are well-typed Haskell programs with no pattern-matching on variant types. It becomes impossible for the evaluators to get stuck. Since the type preservation of the evaluators is apparent not only to us but also to a Haskell compiler, the evaluators can be efficiently compiled. Tagless-final embeddings are also extensible, letting us add to the syntax of the DSL, preserving and reusing old interpreters.

IntroHOT.hs [3K]
The illustration of problems of embedding a typed DSL into a typed metalanguage
Either the Universal type (and hence spurious partiality, type tags and inefficiency), or fancy type systems seem inevitable. The problem stems from algebraic data types' being too broad: they express not only well-typed DSL terms but also ill-typed ones.
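A small illustration of the problem, with assumed names (IntroHOT.hs has the full story): since the data type also represents ill-typed terms, the evaluator must return a tagged universal type and becomes partial.

```haskell
-- The data type is too broad: Add (IntE 1) (BoolE True) is a legal
-- value, though it represents an ill-typed object term.
data Exp = IntE Int | BoolE Bool | Add Exp Exp

-- The universal type, with type tags
data U = UInt Int | UBool Bool deriving (Eq, Show)

-- The evaluator: spuriously partial, tag-checking at run time
eval :: Exp -> Maybe U
eval (IntE n)    = Just (UInt n)
eval (BoolE b)   = Just (UBool b)
eval (Add e1 e2) = case (eval e1, eval e2) of
  (Just (UInt n1), Just (UInt n2)) -> Just (UInt (n1 + n2))
  _                                -> Nothing   -- the stuck case
```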

Term.agda [2K]
Shin-Cheng Mu: Typed Lambda-Calculus Interpreter in Agda. September 24, 2008
Shin-Cheng Mu solves the problem of the type-preserving tagless interpretation of simply-typed lambda-calculus, relying on dependent types and type functions in full glory.

IntroHOIF.hs [6K]
Tagless-initial and Tagless-final evaluators

TTFdB.hs [7K]
Typed, tagless, final, with De Bruijn indices: Expressing the type system of simply-typed lambda-calculus in Haskell98. No dependent types are necessary after all. The types of methods in the Symantics type class read like the axioms and inference rules of the implication fragment of minimal logic.
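The core of the encoding can be sketched as follows (TTFdB.hs has the complete, commented class): the environment type h grows at each lam, so only well-typed and well-scoped terms can be written at all, and the evaluator has no pattern-matching on which to get stuck.

```haskell
-- Method types read like the rules of minimal logic:
-- z is the axiom, s is weakening, lam is implication introduction,
-- app is modus ponens.
class Symantics repr where
  int :: Int -> repr h Int
  add :: repr h Int -> repr h Int -> repr h Int
  z   :: repr (a, h) a
  s   :: repr h a -> repr (any, h) a
  lam :: repr (a, h) b -> repr h (a -> b)
  app :: repr h (a -> b) -> repr h a -> repr h b

-- The evaluator: the meaning of a term is a function of its
-- environment, represented as a nested tuple.
newtype R h a = R { unR :: h -> a }

instance Symantics R where
  int x             = R (const x)
  add (R e1) (R e2) = R (\h -> e1 h + e2 h)
  z                 = R fst
  s (R e)           = R (e . snd)
  lam (R e)         = R (\h -> \x -> e (x, h))
  app (R f) (R e)   = R (\h -> f h (e h))

-- Closed terms are evaluated in the empty environment
eval :: R () a -> a
eval e = unR e ()

td1 :: Symantics repr => repr h Int
td1 = app (lam (add z z)) (int 10)    -- (\x -> x + x) 10
```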

TTF.hs [7K]
Typed, tagless, final, in the higher order abstract syntax (HOAS). We illustrate extending the DSL with more constants, types, and expression forms.
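The HOAS variant can be sketched as follows (TTF.hs has the complete version): object-language binding is represented by metalanguage binding, so the De Bruijn bookkeeping disappears and the evaluator is meta-circular.

```haskell
class Symantics repr where
  int :: Int -> repr Int
  add :: repr Int -> repr Int -> repr Int
  lam :: (repr a -> repr b) -> repr (a -> b)
  app :: repr (a -> b) -> repr a -> repr b

-- The meta-circular evaluator: again total and tag-free
newtype R a = R { unR :: a }

instance Symantics R where
  int = R
  add (R e1) (R e2) = R (e1 + e2)
  lam f = R (unR . f . R)     -- object abstraction is a Haskell function
  app (R f) (R e) = R (f e)

eval :: R a -> a
eval = unR

th1 :: Symantics repr => repr Int
th1 = app (lam (\x -> add x x)) (int 21)
```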

TTIF.hs [8K]
Initial-final isomorphism, in the higher-order case

TTFdBHO.hs [2K]
Converting from the De-Bruijn--based Symantics to the one based on the higher-order abstract syntax
This is the first example of a transformer, which translates from one Symantics to another. The transformer works as an interpreter, whose primitive operations are implemented in terms of another Symantics. In the tagless final approach, transformers are manifestly typed and type-preserving.


De-serialization and type-checking

Since we represent DSL expressions as well-typed Haskell terms, we can place DSL terms in Haskell code or enter them at the GHCi prompt. However, we also want to interpret DSL expressions that are read from files or received from communication pipes. We can then no longer rely on GHC to convert DSL expressions from a text format into the typed embedding. We have to do the type-checking of DSL expressions ourselves. Our goal is to type-check an expression once, during de-serialization, and evaluate the result many times. Since a type checker needs to represent types and reason about type equality, we develop type representations and type-safe cast. We regard the language of types, too, as a typed DSL, which we embed in Haskell in the tagless-final style.

TypeCheck.hs [12K]
De-serialization: (Dynamic) Type Checking
In contrast to an earlier version of the type checker, we use De Bruijn indices and obtain much clearer code. The code is quite similar to Baars and Swierstra's ``Typing Dynamic Typing'' (ICFP02). However, the result of our type-checking is an embedded DSL expression that can be interpreted many times and in many ways, rather than being merely evaluated. The set of possible interpretations is open. Also, our code is written to expose more properties of the type-checker for verification by the Haskell type-checker; for example, that closed source terms are de-serialized into closed target terms.

Typ.hs [8K]
Type representation, equality and the type-safe generalized cast
We present an above-the-board version of Data.Typeable, in the tagless-final style. Our implementation uses no internal GHC operations, no questionable extensions, nor even a hint of unsafe operations.
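The first layer of the idea can be sketched as follows (names ShowT and view_t are ours; the type-equality witness and the safe cast proper need more machinery than shown, which Typ.hs supplies): the language of types is itself a tagless-final DSL with its own interpreters.

```haskell
-- Type representations as a tagless-final DSL: the index a of
-- trepr a is the Haskell type being represented.
class TSYM trepr where
  tint :: trepr Int
  tarr :: trepr a -> trepr b -> trepr (a -> b)

-- One interpreter of type representations: the printer.
-- The index a is phantom here.
newtype ShowT a = ShowT String

instance TSYM ShowT where
  tint = ShowT "Int"
  tarr (ShowT t1) (ShowT t2) = ShowT ("(" ++ t1 ++ " -> " ++ t2 ++ ")")

view_t :: ShowT a -> String
view_t (ShowT s) = s
```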

Stephanie Weirich some time ago wrote a very similar type-checker, but in the initial style, using GADTs. The comparison with the tagless-final style here is illuminating.

Applications and Extensions


Ordinary and one-pass CPS transformation

We demonstrate ordinary and administrative-redex--less call-by-value Continuation Passing Style (CPS) transformation that assuredly produces well-typed terms and is patently total.

Our goal here is not to evaluate, view or serialize a tagless-final term, but to transform it to another one. The result is a tagless-final term, which we can interpret in multiple ways: evaluate, view, or transform again. We first came across transformations of tagless-final terms when we discussed pushing the negation down in the simple, unityped language of addition and negation. The general case is more complex. It is natural to require the result of transforming a well-typed term to be well-typed. In the tagless-final approach that requirement is satisfied automatically: after all, only well-typed terms are expressible. We impose a more stringent requirement that a transformation be total. In particular, the fact that the transformation handles all possible cases of the source terms must be patently, syntactically clear. The complete coverage must be so clear that the metalanguage compiler can see it, without the aid of extra tools.

Since the only thing we can do with tagless-final terms is to interpret them, the CPS transformer is written in the form of an interpreter. It interprets source terms yielding transformed terms, which can be interpreted in many ways. In particular, the terms can be interpreted by the CPS transformer again, yielding 2-CPS terms. CPS transformers are composable, as expected.

A particular complication of the CPS transform is that the type of the result is different from the type of the source term: the CPS transform translates not only terms but also types. Moreover, base types and the arrow type are translated in different ways. To express CPS, we need an interpreter that gives the meaning not only to terms but also to types. In particular, what the function types denote should be up to a particular interpreter. It turns out the existing tagless-final framework is up to that task: with the help of type families, we can after all define an instance of Symantics that interprets source types as CPS-transformed types.

The ordinary (Fischer or Plotkin) CPS transform introduces many administrative redices, which make the result too hard to read. Danvy and Filinski proposed a one-pass CPS transform, which relies on the metalanguage to get rid of the administrative redices. The one-pass CPS transform can be regarded as an example of normalization by evaluation.

CPS.hs [10K]
Ordinary and one-pass CPS transforms and their compositions

TTFdBHO.hs [2K]
The simplest tagless-final transformer, from the De-Bruijn--based Symantics to the one based on the higher-order abstract syntax

Olivier Danvy and Andrzej Filinski. Representing Control: A Study of the CPS Transformation.
Mathematical Structures in Computer Science, 1992.


Type-directed partial evaluation

Olivier Danvy's original POPL96 paper on type-directed partial evaluation used an untyped target language, represented as an algebraic data type. Type preservation was not apparent and had to be proved. In our presentation, the result of reification is a typed expression, in the tagless-final form. Type preservation of reification is now syntactically apparent and is verified by the Haskell type-checker. In the tagless-final presentation, reification and reflection appear particularly symmetric, elegant and insightful.
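The symmetry can be sketched as follows (names TD, tint, tarr and the printer are ours; TDPE.hs differs in details): reify and reflect are defined by induction on the type, and the residual term is again tagless-final, so GHC checks its type preservation.

```haskell
class Symantics repr where
  int :: Int -> repr Int
  add :: repr Int -> repr Int -> repr Int
  lam :: (repr a -> repr b) -> repr (a -> b)
  app :: repr (a -> b) -> repr a -> repr b

-- b is the residualizing meaning of the object type a:
-- reify turns a meaning into code, reflect turns code into a meaning.
data TD repr a b = TD { reify :: b -> repr a, reflect :: repr a -> b }

tint :: TD repr Int (repr Int)
tint = TD id id

-- The two directions call each other at the arrow type
tarr :: Symantics repr
     => TD repr a ra -> TD repr b rb -> TD repr (a -> b) (ra -> rb)
tarr t1 t2 = TD
  { reify   = \f -> lam (reify t2 . f . reflect t1)
  , reflect = \e -> reflect t2 . app e . reify t1
  }

-- A printing interpreter, to see the reified code
newtype S a = S { unS :: Int -> String }

instance Symantics S where
  int n = S (\_ -> show n)
  add (S e1) (S e2) = S (\d -> "(" ++ e1 d ++ " + " ++ e2 d ++ ")")
  lam f = S (\d -> let x = "x" ++ show d
                   in "\\" ++ x ++ " -> " ++ unS (f (S (const x))) (d + 1))
  app (S f) (S e) = S (\d -> "(" ++ f d ++ " " ++ e d ++ ")")

view :: S a -> String
view e = unS e 0
```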

TDPE.hs [6K]
Tagless-final presentation of type-directed partial evaluation
ToTDPE.hs [<1K]
The imported module with sample functions to reify. Compiling this module makes for a nicer example.

Olivier Danvy: Lecture notes on type-directed partial evaluation. The lecture notes are based on his POPL96 paper.


Linear and affine lambda-calculi

One may think that only those DSLs whose type system is a subset of Haskell's can be embedded in Haskell. To counter that impression we show how to faithfully embed typed linear lambda-calculus. Any bound variable must be referenced exactly once in the abstraction's body. As before, only well-typed and well-formed terms are representable. Haskell as the metalanguage will statically reject as ill-typed any attempt to represent a term with a bound variable referenced several times -- or, as in the K combinator, never.

We build on the embedding of the ordinary simply-typed lambda-calculus with De Bruijn indices described earlier. An object term of the type a was represented as a value of the type repr h a, where the binary type constructor repr is a member of the class Symantics. Here h stands for the type environment, assigning types to free variables (`hypotheses') of a term. Linear lambda-calculus regards bound variables as representing resources; referencing a variable consumes the resource. We use the type environment for tracking the state of resources: available or consumed. The type environment becomes the type state. We follow the approach described in Variable (type)state `monad'.

We represent linear lambda terms by Haskell values of the type repr hi ho a, where hi stands for the variable state before evaluating the term and ho for the state after. To be more precise, hi and ho are the types of the variable states. We can determine the types and hence the state of the variables statically: as usual, the type checker does abstract interpretation. In our tagless-final encoding, lam has the following type:

     lam :: repr (F a,hi) (U,ho) b -> repr hi ho (a->b)
The expression (lam e) has the type repr hi ho (a->b) provided the body of the abstraction, e, has the type repr (F a,hi) (U,ho) b. That is, in the environment extended with a term of the type a, the body must produce a value of the type b. The body must consume the term at the top of the environment, changing the type of the first environment cell from F a to U (the type of a used variable).
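The signatures can be sketched as follows, in a condensed variant (LinearLC.hs has the complete, commented version with evaluators): linearity is enforced entirely by the state indices in the class, so a printer can keep them phantom.

```haskell
{-# LANGUAGE EmptyDataDecls #-}

-- F a marks an available variable of type a; U marks a used one.
data F a
data U

class LSymantics repr where
  int :: Int -> repr h h Int
  add :: repr hi h Int -> repr h ho Int -> repr hi ho Int
  z   :: repr (F a, h) (U, h) a            -- using a variable consumes it
  s   :: repr hi ho a -> repr (any, hi) (any, ho) a
  lam :: repr (F a, hi) (U, ho) b -> repr hi ho (a -> b)
  app :: repr hi h (a -> b) -> repr h ho a -> repr hi ho b

-- A printer with phantom state indices; linearity is enforced by the
-- signatures above, not by the printer.
newtype S hi ho a = S { unS :: Int -> String }

instance LSymantics S where
  int n = S (\_ -> show n)
  add (S e1) (S e2) = S (\d -> "(" ++ e1 d ++ " + " ++ e2 d ++ ")")
  z = S (\d -> "x" ++ show (d - 1))
  s (S v) = S (\d -> v (d - 1))
  lam (S e) = S (\d -> "\\x" ++ show d ++ " -> " ++ e (d + 1))
  app (S f) (S e) = S (\d -> "(" ++ f d ++ " " ++ e d ++ ")")

view :: S hi ho a -> String
view e = unS e 0

tl1 :: LSymantics repr => repr h h (Int -> Int)
tl1 = lam (add z (int 1))

-- Rejected by the typechecker, as they should be:
--   lam (add z z)     -- z used twice: U does not match F Int
--   lam (lam (s z))   -- inner variable never used: F b does not match U
```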

A trivial modification turns the embedding of the linear lambda-calculus into that of the affine lambda-calculus, which allows bound variables to be ignored. The K combinator becomes expressible.

LinearLC.hs [11K]
Commented code defining the typed linear lambda calculus and its two interpreters, to evaluate and to show linear lambda terms. Later we add general abstractions imposing no constraints on the use of bound variables.

Jeff Polakow: Embedding a full linear Lambda calculus in Haskell
Proceedings of the ACM SIGPLAN Haskell Symposium, 2015, pp. 177-188
Polakow's tagless-final linear lambda-calculus interpreter relies on higher-order abstract syntax, rather than the De Bruijn indices of LinearLC.hs. He implements the full linear lambda calculus with additives and units.


Lambek calculi

Lambek calculus is a resource-sensitive calculus introduced by Lambek in 1958, almost three decades before linear logic. Like linear logic, Lambek calculus does not have the weakening rule. In fact, in the non-associative Lambek calculus NL, the antecedent of a sequent is a tree and there are no structural rules at all. Lambek calculus is hence the purest, simplest, and the earliest substructural logic. Adding the associativity and commutativity rules (that is, treating the antecedent as a sequence rather than a tree, and allowing exchange) turns Lambek calculus into a fragment of the Multiplicative Linear Logic (MILL).

The most noticeable difference of Lambek calculus from the conventional or linear lambda-calculus is its directional implications and abstractions. There are two function types typically written as B / A and A \ B, called left/right slash-types rather than arrow types. The function of the type B / A accepts the argument of the type A on the right; A \ B accepts the A argument on the left. There are hence two rules for eliminating implications and, correspondingly, two rules for introducing them, which bring in the power of hypothetical reasoning. Although the NL calculus per se has no structural rules, various NL theories add so-called structural postulates: the ways to rearrange the antecedent structure in particular limited ways.

All these features set Lambek calculi even farther apart from the Haskell type system. And yet the calculi can be embedded in Haskell, in the tagless-final approach. All and only valid derivations are represented by Haskell values of a particular repr type. One tagless-final interpreter prints the ``yield'': the constants used in the derivation. In linguistic applications the yield spells out the sentence whose parse is represented by the derivation. Other interpreters transform the derivation into a logical formula that stands for the meaning of the sentence.

A so-called symmetric, Lambek-Grishin calculus has, in addition to directional implications, directional co-implications and interesting symmetric structural rules for moving between the antecedent and the consequent structures. It too can be represented in the tagless-final style. The semantic interpretation builds the meaning formula in the continuation-passing style.

Michael Moortgat: Typelogical Grammar
The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), Edward N. Zalta (ed.)

HOCCG.hs [28K]
Explanation of the code
Non-associative Lambek calculus NL with the non-standard semantic interpretation. Applications to quantification, non-canonical coordination and gapping.

LG.hs [10K]
Tagless-final embedding of the Lambek-Grishin symmetric calculus and its 1-CPS translation
Our starting point is the regular CBV CPS translation for lambda-LG, described on p. 697 of the paper by Michael Moortgat ``Symmetric Categorial Grammar''. J. Philos. Logic, 2009. The original translation (eq (20) of the paper) yields many administrative beta redices. The present translation uses lightweight staging to remove such redices in the process.


Further applications

Last updated December 4, 2014

Your comments, problem reports, questions are very welcome!
