Typed Type Checking and Typed Compilation

Introduction
Metatypechecking: Staged Typed Compilation into GADT using typeclasses
Typed compilation to HOAS as emulation of staging
Typed compilation via GADTs
Relating Final and Initial typed tagless representations

Introduction

Strictly speaking, type checking is the verification that an expression has the claimed type. As the name of a compiler phase, however, type checking is the transformation from an untyped representation of the source language to a typed-annotated representation. The former is the data type -- the Abstract Syntax Tree (AST) -- that normally comes from parsing a file. The type-annotated AST is, usually, also a data type, with extra fields (`type tags') describing the type of each node and its type environment. This `typed tree' does not ensure by construction that the type annotations are placed consistently, respecting the rules of the type system. The correctness comes only from `careful programming.'

This web page describes an alternative, realizable when the compiler and the type checker for the (object) language are written in a typed (meta) language. In that case we can `bake in' the typing rules of the object language into the typed tree representation. Such typed trees, by their very construction, realize only and all correct typing derivations of the object language. Whatever bugs a type checker may have, it now cannot produce an invalid typing derivation or inconsistent annotations. Otherwise, the type-checker code will not compile. Various transformations -- optimizations -- on the typed tree are likewise assured to preserve the validity of typing. Such static guarantees of correctness are especially important when the compiler is frequently modified or extended, which happens when implementing domain-specific languages.

We will refer to the type checking implemented in a typed metalanguage with static validity guarantees as `typed compilation'. We rely on the typed compilation when dynamically loading (typed) code or implementing typed DSL. Typed compilation is an inherently partial operation (a term being type-checked may well be ill-typed) and it has to deal with (run-time) type representations. The result of compilation, however, has no type-related partiality and needs no type information. `Typecheck once, assuredly interpret many times' is our main goal. It is achievable, albeit not simply. We follow the seminal work of Baars and Swierstra on Typing Dynamic Typing.

Conceptually, the typed compiler is a function of the signature typecheck :: Expr -> Term t, whose three concrete examples are given below. One must distinguish this function from a similar, also partial deserializer my_read :: Expr -> Term t. Although the latter has the same signature, the intent is different. When the user writes my_read exp, the user is supposed to specify the type of the desired result. Just as when we write read "1" in Haskell, we are supposed to specify the type of the expected value: Int, Double, etc. The function typecheck, in contrast, computes the result type and builds the term of that type, the compilation result. The function indeed acts as a compiler of a typed language. The computed result type Term t depends on the structure of the untyped source expression: it is, on the face of it, a dependent type. Our solutions are expressed in the language (Haskell) that is not normally thought of as a dependently typed language.

References

Arthur I. Baars and S. Doaitse Swierstra: Typing Dynamic Typing. ICFP 2002.

Metatypechecking: Staged Typed Compilation into GADT using typeclasses

We demonstrate a typed compiler, the function whose type is literally Expr -> Term t. Here expr is an ordinary algebraic data type of untyped source terms (the first-order language described in Peyton-Jones et al, ICFP 2006), and Term t is a GADT, the typed representation.

     data Expr = ELit Int | EInc Expr | EIsZ Expr | ...
     data Term t where
        Lit  :: Int -> Term Int
        Inc  :: Term Int -> Term Int
        IsZ  :: Term Int -> Term Bool ...

The result type t depends on the value of expr. Thus we demonstrate the Haskell solution of a truly dependent-type problem.

Although the result of the typed compilation is a GADT, the compiler itself uses neither GADTs nor representation types. The compiler is made of two parts. The first is the conversion from Expr to `lifted' terms:

     newtype FLit   = FLit Int
     newtype FInc e = FInc e
     newtype FIsZ e = FIsZ e

implemented with the help of Template Haskell:

     type F = Q TH.Exp
     parse :: Expr -> F
     parse (ELit x) = [e| FLit $(litE . integerL . fromIntegral $ x) | ]
     parse (EInc x) = [e| FInc $(parse x) | ]
     parse (EIsZ x) = [e| FIsZ $(parse x) | ]

The only inconvenience of using the Template Haskell is the necessity of splitting the whole code into two modules.

The main part is the typechecker class, which computes both the type of the result t and the corresponding value of the type Term t, the compilation result. The typechecking rules are stated as TypeCheck instances:

     class TypeCheck e t | e -> t where
       typecheck :: e -> Term t
      
     instance TypeCheck FLit Int where
       typecheck (FLit x) = Lit x
      
     instance TypeCheck e Int => TypeCheck (FInc e) Int where
       typecheck (FInc e) = Inc (typecheck e)
      
     instance TypeCheck e Int => TypeCheck (FIsZ e) Bool where
       typecheck (FIsZ e) = IsZ (typecheck e)

It is remarkable that the instances express the type checking rules as judgments, almost in the form they are commonly presented in papers. For example, the IsZ rule would be written in a paper as

     |- e : int
     ---------------
     |- IsZ e : bool

(We do not need the environment Gamma as our language is first order.)

Given two sample terms (as strings), we parse and typed-compile them:

     e1 = "EIf (ELit 1) (ELit 2) (ELit 3)"
     e2 = "(EIf (EIsZ (ELit 0))          " ++
          "     (EInc (ELit 1))          " ++
          "     (EFst (EPair (ELit 42) (ELit 43))))"
     t1' = $(parse . read $ e1)
     t2' = $(parse . read $ e2)
      
     -- tt1 = typecheck t1' -- gives a type error
     tt2 = typecheck t2'
       *G> :t tt2
       tt2 :: Term Int

Obviously, the term e1 is ill-typed: the test expression of a conditional should be a boolean rather than an integer. Therefore, typecheck t1 raises a type error: ``Couldn't match expected type Bool against inferred type Int''. In contrast, the typechecking (of the result of the parsing) of e2 succeeds. The computed type of the compilation result is Term Int.

We stress that the typechecking of the embedded DSL is carried out by the Haskell typechecker. It is GHC that applies the typing judgments of our DSL expressed as the instances of the class TypeCheck. We thus succeeded in embedding into Haskell not only DSL terms but also the DSL type system.

Version

The current version is 1.1, Nov 2006.

References

TypeCheck.hs [3K]
TermLift.hs [2K]
The complete implementation. The code was originally posted as An example of dependent types [was: Simple GADT parser for the eval example] on the Haskell-Cafe mailing list on Wed, 1 Nov 2006 00:30:56 -0800 (PST).

Typed compilation to HOAS as emulation of staging

We present the typed compilation from an untyped higher-order language to the typed tagless final representation. The latter can be evaluated by all the interpreters in the Tagless Final paper. The typechecking happens only once, regardless of the number of interpretations of the term. The typed compiler has the signature

     typecheck :: forall repr. Symantics repr => 
                  UExp -> Gamma -> Either String (DynTerm repr)
     data DynTerm repr = forall a. (Show a, Typeable a) => DynTerm (repr a)

Given the untyped term and the type environment, typecheck returns either a type error message, or the compiled term and the representation of its computed type. The compilation result can be interpreted by any Symantics interpreter assuredly without any pattern-match errors.

The biggest problem is compiling higher-order untyped terms such as Lam "x" (UAdd (UVar "x") (UInt 1)) into typed higher-order abstract syntax: e.g., lam (\x -> add x (int 1)). The naive approach such as

     DynTerm (lam (\x -> unDynTerm $ typecheck (UAdd (UVar "x") (UInt 1)) [("x",x)]))

fails in our goal of compiling only once: here, typecheck is invoked every time the compiled function is applied. Since the typechecking is partial, we can no longer assure that the result of a successful compilation can be interpreted without any type-related errors.

Nevertheless, we have accomplished our task, without relying on any built-in staging, GADTs or type classes with functional dependencies. Of all the extensions to Haskell98, we only use existentials and Typeable. The code shows uncanny similarities with staging. Furthermore, it seems we need staging or its emulation for the typed compilation into higher-order abstract syntax. Both type-checking and staging have to manipulate open code, whose free variables are hypotheses, or type abstractions.

Typed compilation is closely related to logic: Typechecking is truly building a proof, using hypothetical reasoning. Moreover, our typed compiler must be explicit with weakening (to extend the typing environment) and the application of structural rules.

Our DynTerms are essentially Data.Dynamic, only with more operations and implemented in pure Haskell.

Version

The current version is 1.2, Nov 21, 2007.

References

A new, much simpler version of the code, using De Bruijn indices

IncopeTypecheck.hs [18K]
Extensively commented Haskell code. The title comments explain the problem of HOAS typechecking, the solution, and its connection to staging. The code requires Incope.hs, see below.

Incope.hs [21K]
Tagless final interpreters. The archive includes Incope.hs, which defines our source language and several its interpreters.

Closing the Stage: from staged code to typed closures
The translation of the staged code to an unstaged language, too, requires manipulation of `open' code and the explicit weakening of the binding environment. The weakening is called coercion in the paper `Closing the Stage'.

Typed compilation via GADTs

We describe the typed compiler into a GADT-based typed representation. The compiler itself is implemented via GADT. It has the signature

     typecheck :: Gamma -> Exp -> Either String TypedTerm
     data TypedTerm = forall t. TypedTerm (Typ t) (Term t)

The compiler takes the type environment and the untyped term, the value of the ordinary algebraic data type Exp, and returns either the type error message, or the compiled term and the representation of its computed type. Both Typ t and Term t are GADTs. Although to Haskell TypedTerm is just as `untyped' as Exp, the latter is `deeply' untyped whereas the former is only superficially. The former value is built out of typed components, and the type is erased only at the end. That fact guarantees that an evaluator of Term t, e.g., eval :: Term t -> t never gets stuck. Combining the evaluator with the typed compiler and the show function gives the complete DSL interpreter teval :: Exp -> String.

     te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0)))
                (EApp (EPrim "inc") (EDouble 20.0))
     te4 = EApp (EPrim "rev") te3
      
        *TypecheckedDSL> teval te3
        "type Double, value 32.0"
      
        *TypecheckedDSL> teval te4
        "Type error: incompatible type of the application: (String->String) and Double"

Upon success, the compiler returns a typed term wrapped in an existential envelope. Although we can evaluate that term, the result is not truly satisfactory because the existential type is not `real'. We cannot write

     case (typecheck te3) of Right (TypedTerm t e) -> sin (eval e)

although we know that e has the type Term Double and so applying the function sin to the result of eval e is well-typed. To the Haskell typechecker however, the type of eval e is an unknown abstract type t rather than Double. Fortunately, Template Haskell lets us convert from an existential to a concrete type. This is equivalent to implementing an embedded compiler for our DSL. Since TypedTerm has already been typechecked, we are guaranteed the absence of errors during the Template-Haskell-based `code generation'. The compiler, of the signature tevall :: Exp -> ExpQ can be used as follows

     tte3 = $(tevall te3)
       :t tte3
       tte3 :: Term Double
     ev_tte3 = eval tte3
       -- 32.0
     testr = sin (eval tte3)
       -- 0.5514266812416906

The key part of the implementation is the Equality GADT and checking if two DSL types are the same, and if so, computing the proof (the witness).

     data EQ a b where Refl :: EQ a a
     eqt :: Typ a -> Typ b -> Maybe (EQ a b)

Version

The current version is 1.2, October 2007.

References

TypecheckedDSL.hs [5K]
The complete implementation of a typed DSL with the typed evaluator and the typed compiler from untyped terms to GADT. The code was originally posted as Typechecker to GADT: the full implementation of a typed DSL on the Haskell-Cafe mailing list on Thu Oct 4 02:02:32 EDT 2007.

TypecheckedDSLTH.hs [8K]
TypedTermLiftTest.hs [<1K]
The complete code for the compiler of the typed DSL, using Template Haskell to `compile' GADT to `machine code'. The code was originally posted as Typed DSL compiler, or converting from an existential to a concrete type on the Haskell-Cafe mailing list on Sat Oct 6 03:55:36 EDT 2007.

Last updated August 19, 2016

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML