This web page describes an alternative, realizable when the compiler and the type checker for the (object) language are written in a typed (meta) language. In that case we can `bake in' the typing rules of the object language into the typed tree representation. Such typed trees, by their very construction, realize only and all correct typing derivations of the object language. Whatever bugs a type checker may have, it now cannot produce an invalid typing derivation or inconsistent annotations. Otherwise, the type-checker code will not compile. Various transformations -- optimizations -- on the typed tree are likewise assured to preserve the validity of typing. Such static guarantees of correctness are especially important when the compiler is frequently modified or extended, which happens when implementing domain-specific languages.
We will refer to the type checking implemented in a typed metalanguage with static validity guarantees as `typed compilation'. We rely on the typed compilation when dynamically loading (typed) code or implementing typed DSL. Typed compilation is an inherently partial operation (a term being type-checked may well be ill-typed) and it has to deal with (run-time) type representations. The result of compilation, however, has no type-related partiality and needs no type information. `Typecheck once, assuredly interpret many times' is our main goal. It is achievable, albeit not simply. We follow the seminal work of Baars and Swierstra on Typing Dynamic Typing.
Conceptually, the typed compiler is a function of the signature
typecheck :: Expr -> Term t
, whose three concrete
examples are given below. One must distinguish this function from
a similar, also partial deserializer my_read :: Expr -> Term t
.
Although the latter has the same signature, the intent is different.
When the user writes my_read exp
, the user is supposed
to specify the type of the desired result. Just as when we
write read "1"
in Haskell, we are supposed
to specify the type of the expected value: Int
, Double
, etc. The
function typecheck
, in contrast,
computes the result type and builds the term of that
type, the compilation result. The function indeed acts as a compiler
of a typed language. The computed result type Term t
depends on the structure of the untyped source expression:
it is, on the face of it, a dependent type. Our solutions
are expressed in the language (Haskell) that is not normally thought
of as a dependently typed language.
Arthur I. Baars and S. Doaitse Swierstra: Typing Dynamic Typing. ICFP 2002.
Expr -> Term t
.
Here expr
is an ordinary algebraic data type of untyped source terms
(the first-order language described in Peyton-Jones et al, ICFP 2006),
and Term t
is a GADT, the typed representation.data Expr = ELit Int | EInc Expr | EIsZ Expr | ... data Term t where Lit :: Int -> Term Int Inc :: Term Int -> Term Int IsZ :: Term Int -> Term Bool ...The result type
t
depends on the
value of expr
. Thus we demonstrate the
Haskell solution of a truly dependent-type problem.Although the result of the typed compilation is a GADT, the compiler itself uses neither GADTs nor representation types. The compiler is made of two parts. The first is the conversion from Expr
to `lifted' terms:
newtype FLit = FLit Int newtype FInc e = FInc e newtype FIsZ e = FIsZ eimplemented with the help of Template Haskell:
type F = Q TH.Exp parse :: Expr -> F parse (ELit x) = [e| FLit $(litE . integerL . fromIntegral $ x) | ] parse (EInc x) = [e| FInc $(parse x) | ] parse (EIsZ x) = [e| FIsZ $(parse x) | ]The only inconvenience of using the Template Haskell is the necessity of splitting the whole code into two modules.
The main part is the typechecker class, which computes both the type
of the result t
and the corresponding value of the type
Term t
, the compilation result. The typechecking rules
are stated as TypeCheck
instances:
class TypeCheck e t | e -> t where typecheck :: e -> Term t instance TypeCheck FLit Int where typecheck (FLit x) = Lit x instance TypeCheck e Int => TypeCheck (FInc e) Int where typecheck (FInc e) = Inc (typecheck e) instance TypeCheck e Int => TypeCheck (FIsZ e) Bool where typecheck (FIsZ e) = IsZ (typecheck e)It is remarkable that the instances express the type checking rules as judgments, almost in the form they are commonly presented in papers. For example, the
IsZ
rule would be written in a paper as|- e : int --------------- |- IsZ e : bool(We do not need the environment Gamma as our language is first order.)
Given two sample terms (as strings), we parse and typed-compile them:
e1 = "EIf (ELit 1) (ELit 2) (ELit 3)" e2 = "(EIf (EIsZ (ELit 0)) " ++ " (EInc (ELit 1)) " ++ " (EFst (EPair (ELit 42) (ELit 43))))" t1' = $(parse . read $ e1) t2' = $(parse . read $ e2) -- tt1 = typecheck t1' -- gives a type error tt2 = typecheck t2' *G> :t tt2 tt2 :: Term IntObviously, the term
e1
is ill-typed: the test expression of a conditional
should be a boolean rather than an integer. Therefore,
typecheck t1
raises a type error:
``Couldn't match expected type Bool against inferred type Int''.
In contrast,
the typechecking (of the result of the parsing) of e2
succeeds.
The computed type of the compilation result is Term Int
.
We stress that the typechecking of the embedded DSL is carried out by the
Haskell typechecker. It is GHC that applies the typing
judgments of our DSL expressed as the instances of the class
TypeCheck
. We thus succeeded in embedding into Haskell
not only DSL terms but also the DSL type system.
TypeCheck.hs [3K]
TermLift.hs [2K]
The complete implementation. The code was originally posted as An example of dependent types [was: Simple GADT parser for the eval example] on the Haskell-Cafe mailing list on Wed, 1 Nov 2006 00:30:56 -0800 (PST).
typecheck :: forall repr. Symantics repr => UExp -> Gamma -> Either String (DynTerm repr) data DynTerm repr = forall a. (Show a, Typeable a) => DynTerm (repr a)Given the untyped term and the type environment,
typecheck
returns either a type error message, or the compiled term and the representation of its computed type. The compilation result can be interpreted by any Symantics interpreter assuredly without any pattern-match errors.
The biggest problem is compiling higher-order untyped terms such as
Lam "x" (UAdd (UVar "x") (UInt 1))
into typed higher-order abstract syntax: e.g.,
lam (\x -> add x (int 1))
. The naive approach such as
DynTerm (lam (\x -> unDynTerm $ typecheck (UAdd (UVar "x") (UInt 1)) [("x",x)]))fails in our goal of compiling only once: here,
typecheck
is invoked every time the compiled function is applied. Since the typechecking is partial, we can no longer assure that the result of a successful compilation can be interpreted without any type-related errors.Nevertheless, we have accomplished our task, without relying on any built-in staging, GADTs or type classes with functional dependencies. Of all the extensions to Haskell98, we only use existentials and Typeable. The code shows uncanny similarities with staging. Furthermore, it seems we need staging or its emulation for the typed compilation into higher-order abstract syntax. Both type-checking and staging have to manipulate open code, whose free variables are hypotheses, or type abstractions.
Typed compilation is closely related to logic: Typechecking is truly building a proof, using hypothetical reasoning. Moreover, our typed compiler must be explicit with weakening (to extend the typing environment) and the application of structural rules.
Our DynTerm
s are essentially Data.Dynamic
, only with more operations and
implemented in pure Haskell.
A new, much simpler version of the code, using De Bruijn indices
IncopeTypecheck.hs [18K]
Extensively commented Haskell code. The title comments explain the problem of HOAS typechecking, the solution, and its connection to staging. The code requires Incope.hs, see below.
Incope.hs [21K]
Tagless final interpreters. The archive includes Incope.hs, which defines our source language and several its interpreters.
Closing the Stage: from staged code to typed closures
The translation of the staged code to an unstaged language, too, requires manipulation of `open' code and the explicit weakening of the binding environment. The weakening is called coercion in the paper `Closing the Stage'.
typecheck :: Gamma -> Exp -> Either String TypedTerm data TypedTerm = forall t. TypedTerm (Typ t) (Term t)The compiler takes the type environment and the untyped term, the value of the ordinary algebraic data type Exp, and returns either the type error message, or the compiled term and the representation of its computed type. Both
Typ t
and Term t
are GADTs. Although to Haskell TypedTerm
is just as
`untyped' as Exp
, the latter is `deeply' untyped
whereas the former is only superficially. The former value is built
out of typed components, and the type is erased only at the end. That
fact guarantees that an evaluator of Term t
,
e.g., eval :: Term t -> t
never gets stuck. Combining
the evaluator with the typed compiler and the show function gives
the complete DSL interpreter teval :: Exp -> String
.te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0))) (EApp (EPrim "inc") (EDouble 20.0)) te4 = EApp (EPrim "rev") te3 *TypecheckedDSL> teval te3 "type Double, value 32.0" *TypecheckedDSL> teval te4 "Type error: incompatible type of the application: (String->String) and Double"
Upon success, the compiler returns a typed term wrapped in an existential envelope. Although we can evaluate that term, the result is not truly satisfactory because the existential type is not `real'. We cannot write
case (typecheck te3) of Right (TypedTerm t e) -> sin (eval e)although we know that
e
has the type Term Double
and so applying the function sin
to the result of eval e
is well-typed. To the Haskell typechecker however, the type of eval e
is an unknown abstract type t
rather than Double
. Fortunately, Template Haskell
lets us convert from an existential to a concrete type. This is
equivalent to implementing an embedded compiler for our DSL.
Since TypedTerm
has already been typechecked, we are
guaranteed the absence of errors during the Template-Haskell-based
`code generation'. The compiler, of the signature tevall :: Exp -> ExpQ
can be used as followstte3 = $(tevall te3) :t tte3 tte3 :: Term Double ev_tte3 = eval tte3 -- 32.0 testr = sin (eval tte3) -- 0.5514266812416906
The key part of the implementation is the Equality GADT and checking if two DSL types are the same, and if so, computing the proof (the witness).
data EQ a b where Refl :: EQ a a eqt :: Typ a -> Typ b -> Maybe (EQ a b)
TypecheckedDSL.hs [5K]
The complete implementation of a typed DSL with the typed evaluator and the typed compiler from untyped terms to GADT. The code was originally posted as Typechecker to GADT: the full implementation of a typed DSL on the Haskell-Cafe mailing list on Thu Oct 4 02:02:32 EDT 2007.
TypecheckedDSLTH.hs [8K]
TypedTermLiftTest.hs [<1K]
The complete code for the compiler of the typed DSL, using Template Haskell to `compile' GADT to `machine code'. The code was originally posted as Typed DSL compiler, or converting from an existential to a concrete type on the Haskell-Cafe mailing list on Sat Oct 6 03:55:36 EDT 2007.
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML