- How OCaml type checker works -- or what polymorphism and garbage collection have in common
- Implementing, and understanding type classes
- Interpreting types as abstract values: A tutorial on Hindley-Milner type inference
- Leibniz equality can, after all, be made injective
- Advanced Polymorphism in Simpler-Typed Languages
- Motivation and development of let-polymorphism, value restriction and its relaxation (Section 2.1 of the paper)
- Attractive Haskell Types: dependent, higher-ranked, existential, etc. types and logical type programming in Haskell
- Type inference and the undead code
- Type checking as a small-step abstract evaluation
- The challenge of first-class memory
- Lightweight Dependent-type Programming
- Polymorphic stanamically balanced AVL trees
- Genuine dependent types and faking them on the example of type-safe printf
- Number-parameterized types and decimal type arithmetic
- Abstraction and polymorphism: Type-level lambda calculator in three lines of Haskell
- Extensible Effects and Extensible Type-State
- Inverse type checking and theorem proving in intuitionistic and classical logics
- The syntactic approach to type soundness, proofs in Twelf
- First-class modules: Leibniz equality, GADTs and generic programming
- Representing existential data types with isomorphic simple types
- How to reify fresh type variables?
- Universally polymorphic lists and a polymorphic FOLD in ANSI C without casts
- Typed Tagless Interpretations and Typed Compilation: type checking, staging, and dependent types

- We expound a view of type checking as evaluation with `abstract values'. Whereas the dynamic semantics, evaluation, deals with (dynamic) values like 0, 1, etc., the static semantics, type checking, deals with approximations like `int`. A type system is sound if it correctly approximates the dynamic behavior and predicts its outcome: if the static semantics predicts that a term has the type `int`, the dynamic evaluation of the term, if it terminates, will yield an integer.

Conventional type checking is big-step evaluation in the abstract: to find the type of an expression, we fully `evaluate' its immediate subterms to their types. We propose a different kind of type checking that is small-step evaluation in the abstract: it unzips (pun intended: cf. Huet's zipper) an expression into a context and a redex. The type-checking algorithms in the paper are implemented in Twelf; the complete code is available. In particular, the well-commented file `lfix-calc.elf` implements, by way of introduction and comparison, small-step type checking for the simply typed lambda calculus.

The benefit of small-step type checking is that, by `following' the control flow, it is more suitable for effectful computations. A more interesting result comes from the fact that static evaluation, unlike the dynamic one, goes `under lambda'. Thus the small-step static evaluation context is a binding context.

Colon becomes the new turnstile! Since the evaluation context is neither commutative nor associative but is subject to focusing rules, we obtain a framework for expressing various substructural and modal logics.
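The `evaluation with abstract values' view can be sketched in code. The following is a conventional, big-step abstract evaluator for the simply typed lambda calculus, written in Haskell rather than the paper's Twelf; all names and the exact term representation are mine, for illustration only. It `evaluates' a term not to a number or a closure but to its type:

```haskell
-- A big-step `abstract evaluator' for the simply typed lambda calculus:
-- evaluation in the abstract yields a type rather than a value.
data Ty = TInt | Ty :-> Ty deriving (Eq, Show)
data Term = Lit Int | Var String | Lam String Ty Term | App Term Term

type Env = [(String, Ty)]   -- the abstract environment maps variables to types

eval :: Env -> Term -> Maybe Ty
eval _   (Lit _)     = Just TInt    -- the abstract value of any int literal
eval env (Var x)     = lookup x env
eval env (Lam x t e) = (t :->) <$> eval ((x,t):env) e  -- static evaluation goes under lambda
eval env (App f a)   = do
  tf <- eval env f                  -- fully `evaluate' both subterms to their types
  ta <- eval env a
  case tf of
    t1 :-> t2 | t1 == ta -> Just t2 -- abstract application
    _                    -> Nothing -- the approximation fails: a type error

main :: IO ()
main = print (eval [] (App (Lam "x" TInt (Var "x")) (Lit 1)))
```

Unlike the small-step checker in `lfix-calc.elf`, this sketch fully evaluates the subterms of an application before combining them, which is exactly the big-step discipline the paper departs from.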

Joint work with Chung-chieh Shan.

**Version**- The current version is 1.1, June 2007
**References**- A Substructural Type System for Delimited Continuations

  This paper introduced small-step type checking when designing the first sound type system for dynamic delimited continuations.
- small-step-typechecking.tar.gz [10K]

  Commented Twelf code with the small-step type-checking algorithms and many sample derivations.
- Abstract interpretation in the denotational context:

  Patrick Cousot: Types as abstract interpretations. POPL 1997, 316-331.
- Abstract interpretation as an aid during the development of logic programs:

  Manuel Hermenegildo, Germán Puebla, and Francisco Bueno: Using global analysis, partial specifications, and an extensible assertion language for program validation and debugging. In: The Logic Programming Paradigm: A 25-Year Perspective, ed. Krzysztof R. Apt, Victor W. Marek, Mirek Truszczynski, and David S. Warren, 161-192, 1999. Berlin: Springer-Verlag.
- Interpreting types as abstract values: A tutorial on Hindley-Milner type inference

*First-class memory* (or, first-class store) is a program value that represents a region of mutable memory. In other words, it is a collection of mutable cells, with an interface to allocate new cells and to get and update the contents of a cell. First-class memory helps implement versioned arrays, replay debugging and nested transactions (Morrisett, 1993), among others. It is particularly useful in non-deterministic and probabilistic programming. A non-deterministic choice is in many ways akin to Unix `fork(2)`, splitting the current computational process into several, one per choice alternative. Each process has its own memory, whose contents is inherited from the parent but can be mutated without affecting the other processes -- thus ensuring that independent non-deterministic choices are indeed evaluated independently. As in Unix, making copies of process memory is best done lazily, as `copy-on-write'. To manage process memory and copy-on-write, the Unix kernel maintains a special data structure. First-class memory is like such a data structure, letting an ordinary programmer do kernel-like memory management.

First-class memory has at least the following interface (described below in Haskell):

    type Memory             -- the abstract type of first-class memory
    empty :: Memory         -- allocate new, initially empty memory
    size  :: Memory -> Int  -- get the current memory size (the number of allocated cells)
    type Ref a              -- the abstract reference to a memory cell holding a value of type a
    new :: a -> Memory -> (Ref a, Memory)
    get :: Ref a -> Memory -> a
    set :: Ref a -> a -> Memory -> Memory

The interface is similar to the familiar `Data.STRef` interface, with its `newSTRef`, `readSTRef` and `writeSTRef` operations. However, our `new`, `get` and `set` explicitly use the `Memory` in which the cell is (to be) located.

The interface comes with a performance requirement: the execution time of `get` and `set` (or, more precisely, the number of accesses to the real computer memory) should not exceed a small constant. In other words, the implementation of `Memory` should be at least as efficient as the `Data.IntMap` trie.

The challenge is implementing the interface efficiently -- and, at the same time, statically guaranteeing some degree of correctness. The correct implementation should satisfy the familiar laws:

    let (r,m') = new x m in get r m'   === x             for all x
    let m' = set r' x m  in get r m'   === get r m       if r /= r', r' exists
    let m' = set r x m   in get r m'   === x             if r exists
    let m' = set r x m   in set r y m' === set r y m

That is: we get what we store; storing in a memory cell overrides its previous value without affecting the other cells. Ensuring these laws in types seems to require too complicated a type system to be practical. Still, we would like the types to guarantee at the very least an approximation of these laws, where `===` is understood as equating all values of the same type. In other words, we want to ensure that `get`-ting and `set`-ting of existing references are *total* and type-preserving. We want to use types so that the correctness guarantees are clear even to the compiler, which would then optimize out redundant checks. Implementing practical first-class memory with such typing guarantees is still an open challenge.

Let's consider a straightforward non-solution. Recall, `Memory` should store values of arbitrary types. The first impulse is to represent memory cells as `Dynamic`, `dynamically-typed' values. Hence the implementation

    type Memory = IntMap Dynamic
    type Ref a  = Int

a finite map from locations (`Int`) to values of *some* type. Alas, the operation `fromDynamic :: Dynamic -> Maybe a` to read from this `Dynamic` reference cell is partial. `Dynamic` stores a value and the representation of its type; `fromDynamic` checks that the (representation of the) requested result type `a` matches the type (representation) of the stored value. Trying to extract from the `Dynamic` a value of a different type than was stored returns `Nothing`. In a correctly implemented `Memory`, `get` for an existing reference is total and type-preserving. Therefore, the type-matching check of `fromDynamic` should always succeed and is hence redundant. The `Dynamic` representation is unsatisfactory because it imposes a run-time check that should never fail. Again, the challenge is to ensure correctness statically and avoid run-time assertions.

Besides the performance, `Dynamic` is unsatisfactory theoretically. A `Dynamic` stores a value and a representation of its type. On the other hand, the memory address -- the location of a memory cell -- uniquely identifying a cell also uniquely describes the type of the stored value. That is, within a given `Memory`, the location is itself a type representation: location equality implies type equality. Therefore, the type representation used in `Dynamic` is redundant.

The pragmatic way to avoid redundant checks is to use `unsafeCoerce`. Although it helps performance, it forces us to rely on pen-and-paper correctness proofs. We had better make no mistakes in the proof, or else risk a segmentation fault.

Let's recall the outline of the pen-and-paper correctness argument, to see why it is so hard to express in types. First, the type of reference cells

`Ref a` includes the type of the stored value. The type signatures of `new`, `get` and `set` show that once a reference `r` is created to hold a value of some type `a`, `get r` and `set r` can only read and write values of that type `a`. To complete the proof we have to check that the reference `r` created by `new x m` is `fresh' -- it names a cell that is different from any cell in the parent memory `m` -- and that `get r m'` and `set r y m'` really access that very cell `r`. The correctness of `get` and `set` is hence tied to the third function, `new`: that `new` really creates fresh references. Expressing and using this freshness property in types is the real challenge.

**References**- J. Gregory Morrisett: Refining First-Class Stores

  ACM SIGPLAN Workshop on State in Programming Languages, 1993
- Memory.hs [4K]

  First-class memory module used in the probabilistic programming language Hakaru10. In this implementation the reference Ref acts as a cache for the stored value.
- Probabilistic programming using first-class stores and first-class continuations
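To make the `Dynamic`-based non-solution above concrete, here is a minimal executable sketch of the interface over `Data.IntMap` (the helper names and the `newtype` wrappers are mine, not those of Memory.hs). Note the extra `Typeable` constraints that the representation forces on the interface, and the run-time check in `get` that the discussion wants to render unnecessary:

```haskell
-- A minimal sketch of the Dynamic-based non-solution (names are mine).
import qualified Data.IntMap as IM
import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable (Typeable)

newtype Memory = Memory (IM.IntMap Dynamic)
newtype Ref a  = Ref Int          -- the phantom a records the stored type

empty :: Memory
empty = Memory IM.empty

size :: Memory -> Int
size (Memory m) = IM.size m

-- The Typeable constraints are the price of the Dynamic representation.
new :: Typeable a => a -> Memory -> (Ref a, Memory)
new x (Memory m) =
  let loc = IM.size m             -- fresh location: cells are never deallocated here
  in (Ref loc, Memory (IM.insert loc (toDyn x) m))

get :: Typeable a => Ref a -> Memory -> a
get (Ref loc) (Memory m) =
  case IM.lookup loc m >>= fromDynamic of
    Just x  -> x
    -- the run-time check that, in a correct implementation, never fails:
    Nothing -> error "get: impossible"

set :: Typeable a => Ref a -> a -> Memory -> Memory
set (Ref loc) x (Memory m) = Memory (IM.insert loc (toDyn x) m)

main :: IO ()
main = do
  let (r, m1) = new (42 :: Int) empty
      m2      = set r 43 m1       -- the parent memory m1 is unaffected
  print (get r m1, get r m2)      -- prints (42,43)
```

The sharing between `m1` and `m2` in the example is exactly the copy-on-write behavior described above: `set` produces a new `Memory` value without disturbing the parent.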

- Type inference is generally non-compositional: it is
context-dependent. Hindley-Milner--like type inference, especially in
its `collect constraints and solve them' formulation, is explicitly
designed to let the context influence/determine the type of an
expression such as a bare variable. Nevertheless, the
non-compositionality of type inference does sometimes catch us by
surprise. For example, when the `clearly dead' well-typed code affects
the typeability of an expression -- or even its result. The present
article comes from a discussion with Andreas Abel, started on
the Haskell-Cafe list in July 2012. He mentioned a wish to preserve
the examples raised during the discussion. This article is to fulfill
the wish -- by collecting, extending and polishing the examples and drawing
connections.
`Dead code' generally means a subexpression that has no information flow -- neither data nor control -- into the expression's result. In this article we use the term specifically for irrelevant definitions: well-typed and side-effect-free definitions with no information flow into the expression of interest; the defined identifiers do not or even cannot appear in the expression in question.

We start with the simplest, well-known example of an irrelevant definition making an expression untypeable. One has certainly seen such examples in any language with Hindley-Milner-like inference. For concreteness, we use the simple subset of OCaml below:

    let exp =
      let id = fun u -> u in
      let f = id id in
      let z = fun () -> f true in
      f 1

The code does not type-check, with the type error reported for `f 1`. However, if we remove the definition of `z` -- which clearly does not appear within `f 1` -- the latter expression becomes well-typed.

Exercise: what other transformation on `exp`, valid from the dynamic-semantics point of view, will make it well-typed?

In Haskell98 and above, a similar problem is usually associated with the infamous MonomorphismRestriction:

    f = print
    exp = f 1
    z = f True

The above code fails to type-check, with the compiler blaming `1` in `f 1`. The definition of `z` is obviously not relevant to `exp` (in realistic programs, it could have appeared many lines down the program). Yet if we remove that definition, the program type-checks and runs. Although the MonomorphismRestriction is often the culprit, it is not always to blame. For example:

    exp = do
      f <- return id
      r <- return (f 1)
      z <- return (f True)
      print r

Although `z` is bound after `r`, it nevertheless manages to get the type checker to complain about the `r` definition, specifically, about the literal `1`. This time, the problem has nothing to do with the MonomorphismRestriction, or the presence or absence of explicit type signatures.

In all these examples, the type of `f` cannot be fully determined from its definition; we need to know the context of its use. The dead code -- the `z` definition -- although contributing nothing to `exp`'s result, uses `f` in a `misleading', from the point of view of the live code, context, thus causing the type error. It is worth contemplating how come the type error is reported in the `live' expression rather than in the dead code (even when the dead code appears after the live one).

It is likely less known that dead code can have the opposite effect: removing it makes the program untypeable. As far as the computation and its result are concerned, such code does nothing. Still, it does something useful, from the type checker's point of view. For example, the following program compiles successfully.

    let id = fun u -> u
    let f = id id
    let exp x = f x
    let z () = f true

If we remove the definition of `z`, it will not: the compiler will blame the definition of `f`, saying its type ``contains type variables that cannot be generalized''. Here is a similar example in Haskell98:

    main = do
      x <- return []
      print (x == x)
      let z = (x == "")
      return ()

As written, it type-checks and runs. If we delete the definition of the seemingly irrelevant (and also later-bound) `z`, the type checker complains about ``Ambiguous type variable `t0' arising from a use of `==' in x==x''. Sometimes the dead may help the alive, by constraining them.

Most surprisingly, dead code may affect the result of the program -- even though it is never executed. Here is a Haskell98 example:

    instance Num Char
    main = do
      x <- return []
      print x
      let y = abs (head x)
      let z = (x == "")
      return ()

As written, the program prints `""`. If we remove the irrelevant `z` definition, the printed result changes to `[]`. Incidentally, the program has another irrelevant definition, of `y`. What happens if that is removed?

Here is an even simpler Haskell98 example, distilling the problem that Andreas Abel's student ran into when doing RGB conversion in some graphics-related code. It is not just an academic problem.

    main = do
      x <- return (2^65)
      print (x == 0)
      let z = x > length []
      return ()

The program prints `True`; if we remove the `z` definition, the result changes to `False`.

The context-sensitivity of type inference per se should not be surprising. Type inference, in essence, guesses the type of each subexpression so as to make the whole expression typeable. What makes the algorithm deterministic is `lazy guessing': representing the guess by a fresh type variable. As the inference proceeds and more context is seen, something, hopefully, will tell the concrete type to be substituted for the type variable.

The problem behind all our examples is the scope of the introduced type variables. One `natural' scope is the scope of the current definition: the let-binding. If some type variables are left undetermined at the end of type checking the definition, they are generalized. What if the generalization is impossible or inapplicable? One design choice is to reject the definition as ill-typed. As Andreas Abel indicated, that is what Agda does, in effect. So does MetaOCaml, but for generated-code variables: as soon as a variable is caught leaving the scope of its intended binder, the code generation process is terminated.

I guess confining type variables to the scope of program definitions was considered draconian by language designers. They allowed the type variables to float into a larger context, in the hope that a concrete type for them will eventually be found. Therefore, the type of a defined identifier gets determined not only by its definition but also by its use. This is the non-compositionality that some, Andreas Abel in particular, find objectionable.

Still, Haskell and OCaml do impose a strict limit on type variables: the compilation unit. If some variables are left unbound and ungeneralized when the whole unit has been type-checked, something has to be done. The OCaml compiler rejects the unit. Haskell attempts to apply the type defaulting rules; if they do not help, the program is declared ambiguous.

Most of the time the non-compositionality is `benign': the contexts of use for an identifier, if conflicting, may make the program untypeable, but will not change its result. However, when type defaulting, reflection, overlapping instances, etc. enter the scene, we are in for real surprises.

**References**- The Haskell Wiki page on MonomorphismRestriction collects many more
surprises caused by that Haskell feature

  <https://wiki.haskell.org/Monomorphism_restriction>
- Refined Environment Classifiers: Type- and Scope-Safe Code Generation with Mutable Cells

  The global, unrestricted scope of type variables has much in common with variable names in generated code. The calculus `<NJ>` developed in the cited paper uses a special `name heap' to track such future-stage names.

- A logic programming system with first-class relations and a complete decision procedure (e.g., Kanren) can define the pure Hindley-Milner type checking *relation* for a language with polymorphic let, sums and products. The type checking relation relates a term and its type: given a term, we obtain its type. It can also work in reverse: given a type, we can obtain terms that have this type. Or, we can give a term with blanks and a type with blanks, and ask the relation to fill in the blanks.

The code below implements this approach. We use Scheme notation for the source language (we could just as well have supported ML or Haskell-like notations). The notation for type terms is infix, with the right-associative arrow. As an example, the end of the file type-inference.scm shows the derivation for `call/cc`, `shift` and `reset` from their types in the continuation monad. Given the type:

    (define (cont a r) `((,a -> . ,r) -> . ,r))
    `(((a -> . ,(cont 'b 'r)) -> . ,(cont 'b 'b)) -> . ,(cont 'a 'b))

within 2 milliseconds, we obtain the term for shift:

    (lambda (_.0)
      (lambda (_.1)
        ((_.0 (lambda (_.2) (lambda (_.3) (_.3 (_.1 _.2)))))
         (lambda (_.4) _.4))))

From the Curry-Howard correspondence, determining a term for a type is tantamount to proving a theorem -- in intuitionistic logic, as far as our language is concerned. We formulate the proposition in types, for example:

    (define (neg x) `(,x -> . F))
    `(,(neg '(a * b)) -> . ,(neg (neg `(,(neg 'a) + ,(neg 'b)))))

This is one direction of the de Morgan law. In intuitionistic logic, the de Morgan law is more involved:

    NOT (A & B) == NOTNOT (NOT A | NOT B)

The system gives us the corresponding term, the proof:

    (lambda (_.0)
      (lambda (_.1)
        (_.1 (inl (lambda (_.2)
          (_.1 (inr (lambda (_.3)
            (_.0 (cons _.2 _.3))))))))))

The de-typechecker can also prove theorems in classical logic, via the double-negation (aka CPS) translation. We formulate the proposition:

    (neg (neg `(,(neg 'a) + ,(neg (neg 'a)))))

and, within 403 ms, obtain its proof:

    (lambda (_.0) (_.0 (inr (lambda (_.1) (_.0 (inl _.1))))))

The proposition is the statement of the Law of Excluded Middle, in the double-negation translation. Programming languages may indeed help in studying logic.
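For comparison, the two derived proof terms can be transliterated into Haskell. The encoding of falsehood as an abstract type parameter `f`, and the names `deMorgan` and `lem`, are mine; the bodies follow the Kanren-derived terms directly:

```haskell
-- Transliteration of the derived proof terms; falsehood is an abstract f.
type Not f a = a -> f

-- NOT (A & B) -> NOTNOT (NOT A | NOT B): the intuitionistic de Morgan proof
deMorgan :: Not f (a, b) -> Not f (Not f (Either (Not f a) (Not f b)))
deMorgan nab k = k (Left (\a -> k (Right (\b -> nab (a, b)))))

-- Law of Excluded Middle, in the double-negation translation
lem :: Not f (Not f (Either (Not f a) (Not f (Not f a))))
lem k = k (Right (\na -> k (Left na)))

main :: IO ()
main = do
  -- exercise lem at f = Int, a = Int with a concrete continuation:
  -- lem routes the Left branch through the Right one, so k (Left (+1))
  -- is eventually invoked and applied to 5
  let k :: Either (Not Int Int) (Not Int (Not Int Int)) -> Int
      k (Left na)   = na 5
      k (Right nna) = nna (+ 1)
  print (lem k)   -- prints 6
```

That both definitions type-check is precisely the Curry-Howard reading of the two proofs above.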

**Version**- The current version is 1.2, Jan 26, 2006
**References**- type-inference.scm [19K]

  Kanren code for the pure Hindley-Milner type inference *relation*, relating a term in the lambda calculus with fixpoint, polymorphic `let`, sums and products -- and its type. The code contains many comments that explain the notation, the incremental composition of the type checking relation, and taking advantage of the first-class status of Kanren relations for type checking of polymorphic `let`.
- logic.scm [9K]

  Kanren code that illustrates the application of the inverse typechecker to proving theorems in intuitionistic and classical logics.
- Deriving a program from its incomplete specification, such as its type, using constraints, background knowledge, heuristics or a training set is the subject of inductive programming:

  <http://www.inductive-programming.org>

- To type-check a use of rank-2 polymorphism, a type checker generates (`gensym`s) a fresh type variable and watches for its scope, following the proof rule for universal introduction. Due to this implementation, rank-2 polymorphism has been used (some would say abused) to express static capabilities and assure a wide variety of safety properties statically, including:
- isolation of mutable state threads,
- bounds of array indices,
- orderly access and timely disposal of resources such as file handles,
- evaluating only closed code while manipulating open code,
- lexical scoping of variables in generated code, and
- distinction of perturbation variables in automatic differentiation.
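The scoping discipline behind all these applications can be shown in a minimal sketch (the names `Handle`, `withRegion` and `use` are hypothetical, not from any library). The rank-2 quantifier stands for the freshly gensym'd type variable and delimits its scope:

```haskell
{-# LANGUAGE RankNTypes #-}
-- A minimal sketch of the gensym'd-type-variable trick: the rank-2 type
-- of withRegion introduces a fresh s and watches its scope.
newtype Handle s = Handle Int    -- a capability usable only within its region

withRegion :: (forall s. Handle s -> r) -> r
withRegion body = body (Handle 0)
-- The body must work for every s, while r is quantified outside s, so a
-- Handle s cannot occur in the result type r: the capability cannot escape.

use :: Handle s -> Int
use (Handle n) = n + 1

ok :: Int
ok = withRegion use              -- well-typed: the handle stays inside

-- leak = withRegion id          -- rejected: s would escape into the result type

main :: IO ()
main = print ok                  -- prints 1
```

The ST monad's `runST` has exactly this shape, with `s` isolating each mutable state thread; the other applications listed above vary what the `s`-tagged capability stands for.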

Joint work with Chung-chieh Shan.

**References**- Chung-chieh Shan: talk at the Shonan Seminar 007 on dependently typed programming. Shonan Village Center, Japan. September 15, 2011. Lightweight Dependent-type Programming
- Lightweight monadic regions

  A few applications of fresh type variables, not counting the ST monad