Lightweight Dependent-type Programming

This is a joint work with Chung-chieh Shan. We describe several approaches to lightweight dependent-type programming, letting us gain experience with dependent types on existing programming language systems, such as the available Haskell or ML compilers.

All these lightweight approaches rely on type-level proxies for values, so we can statically express properties (e.g., equality, inequality) of the values that are not generally known until the run time.

This much is clear: many programmers are already finding practical uses for the approximants to dependent types which mainstream functional languages (especially Haskell) admit, by hook or by crook.''[WDTM].

What is a dependent type

In general, dependent type is a type that is predicated upon a (dynamic) value [WDTM], [LtU-DT]. Most commonly, this dependency of a type takes the form of a term indexing within a family of types. As Neel Krishnaswami well put it, dependent types are the Curry-Howard interpretation of first-order logic'' [LtU-DT]. That is, propositions correspond to types, and so a formula P(n) expressing an assertion about a particular element n in the domain of discourse corresponds to a type that depends on the particular (dynamic) value -- or on an expression, term, that will compute that dynamic value. Therefore, any program specification that can be written in first-order logic can be expressed as a typechecking constraint. Dependently typed programs are, by their nature, proof carrying code'' [WDTM].

The obvious main issue with dependent typechecking is deciding type equality. In the type indexing approach, types may include terms, and so deciding if two types are equal requires determination if two terms compute (or denote) the same value. The extensional equality of terms is, in presence of general recursion, generally undecidable. The issue of dependent types and general recursion, and various solutions, are elaborated in [WDTM].

ADT -- the simplest dependent-type programming technique

Because most (statically typed) programming languages do not permit value expressions appear in type expressions, one may think that these languages cannot encode in their types the fact, e.g., that some proposition P(n) holds over all or some integers n. The type system of Haskell and ML permits types (predicates) that are parameterized over type variables. However, the type variables range over types -- sets of values -- rather than over individual values in the domain of discourse.

Abstract data types, however, already offer the simplest way to encode some propositions about values. For example, in OCaml, evaluating

     module S1 =
Set.Make(struct type t = int
let compare x y = if x = y then 0
else if x > y then -1 else 1 end);;

yields
     module S1 :
sig
type elt = int
type t
val empty : t
val is_empty : t -> bool
val mem : elt -> t -> bool
val add : elt -> t -> t
val singleton : elt -> t
...
end

where the type of the set, S1.t, is abstract. The only way to obtain values of that type is to use the members of the module S1. That is, whatever set-related invariant is enforced by the code of module S1, remains in force throughout the program. In particular, if S1 enforces the invariants
     P(n) === S1.mem n (S1.singleton n) = true
Q(n) === forall m \in int. m /= n -> S1.mem m (S1.singleton n) = false

these invariants hold all throughout, for all integers. The abstract type S1.t represents, at the level of types, the above properties of integers. The static property Q(n) is actually quite strong and cannot be efficiently represented by a run-time contract, due to the forall m \in int part.

Of course the invariant actually enforced by an implementation of the data type may differ from the invariant we think it enforces. Thus an implementation becomes a trusted kernel'', which has to be verified, using formal tools if necessary. It goes without saying that verifying an implementation of the Set interface is far easier than verifying the whole program that uses Set.

It should be pointed out that guarantees of abstract data types come from the lack of reflection, extension, and introspection of abstract data types. OCaml, incidentally, is weaker in this respect because polymorphic equality can break some of the invariants. So, reflection, extension and introspection facilities of a language can be viewed as vices rather than virtues.

The paper and the talk on lightweight static capabilities (see below) discuss this topic in more (formal) detail. The idea of abstract data type's statically representing and enforcing sophisticated properties of run-time values was Robin Milner's principal idea in the design of Edinburgh LCF back in the early 1970s.

Singleton types

The type system of Haskell or ML lets us express propositions of the form P(n) where n is either a type or a type variable that ranges over types. Seemingly, we cannot express a proposition P(n) where n ranges over individual members of a set (e.g., individual integers) rather than over sets (i.e., types of integers). This inexpressibility is superficial however: if we define particular types such that the corresponding sets of values consist of only one value, the conceptual barrier between elements and sets of elements disappears. For example:

     data Zero   = Zero
data Succ a = Succ a


Here, there is only one (non-bottom) value of the type Zero, there is only one (non-bottom) value of the type Succ (Succ Zero), etc. There is the obvious correspondence between the family of types induced by the two declarations and non-negative integers. With more sophistication (well-formedness constraints expressed in type classes) we can easily represent, in types, propositions about integers''. Quite sophisticated propositions can indeed be expressed this way: [Blume01], [number-parameterized-types]. Singleton types have been introduced by Hayashi (1991) and first used for dependent type programming by Xi.

Branding: type proxies

A language with higher-rank types or existential type quantification lets us introduce type proxies for the values. We can associate a value with some type in such a way that type equality entails value equality, even if the values in question are not known statically (e.g., read from a file).

The paper [Implicit-Config] used singleton-type families to generate such proxies. The paper demonstrated, for example, how to write arithmetic expressions over implicit modulus and statically guarantee that within a (potentially, quite complex and impure expression) the modulus is the same -- even if the modulus is not known until the run time.

One can build type proxies for values without resorting to singleton types. Existential quantification suffices. This approach shares with abstract data types the insight that opaque, unforgeable types can represent propositions on values. This idea is extended and formalized in the next section.

Lightweight static capabilities

Abstract and Introduction

This paper [lightweight-static-capabilities] is written with Chung-chieh Shan, who presented it at the PLPV 2006 workshop. We describe a modular programming style that harnesses modern type systems to verify safety conditions in practical systems. This style has three ingredients: (i) A compact kernel of trust that is specific to the problem domain; (ii) Unique names (capabilities) that confer rights and certify properties, so as to extend the trust from the kernel to the rest of the application; (iii) Static (type) proxies for dynamic values. We illustrate our approach using examples from the dependent-type literature, but our programs are written in Haskell and OCaml today, so our techniques are compatible with imperative code, native mutable arrays, and general recursion. The three ingredients of this programming style call for (1) an expressive core language, (2) higher-rank polymorphism, and (3) phantom types.

This paper demonstrates a lightweight notion of static capabilities that brings together increasingly expressive type systems and increasingly accessible program verification. Like many programmers, we want to assure safety conditions: array indices remain within bounds; modular arithmetic operates on numbers with the same modulus; a file or database handle is used only while open; and so on. The safety conditions protect objects such as arrays, modular numbers, and files. Our overarching view is that a static capability authorizes access to a protected object and simultaneously certifies that a safety condition holds. Rather than proposing a new language or system, our contribution is to substantiate the slogan that types are capabilities, today: we use concrete and straightforward code in Haskell and OCaml to illustrate that a programming language with an appropriately expressive type system is a static capability language.

The follow-up paper [lightweight-static-resources] describes further applications, for safe embedded and systems programming, ensuring, among other properties, proper alignment when accessing raw memory areas. The paper also introduces kind-level static capabilities to enforce invariants of type-level programming.

Formalization and proofs

• The so-called strict' type system that assures safety of list or array accesses is presented in [lsc-talk]. The system has a dependent-type flavor. The talk then shows how a relaxation of the system to the Hindley-Milner system with higher-ranked types can preserve the safety properties, provided branding operations are isolated in a separate kernel, and the kernel correctly implements the formally stated semantics of these operations.
• safety.elf [17K]
Twelf code with proofs of soundness of the strict' type system for the language made of System F plus lists and non-empty lists. The progress theorem assures that a well-typed program shall not attempt to take the head or tail of an empty list.
• safety-array.elf [32K]
Twelf code verifying manual proofs of soundness of the strict' type system for the language made of System F plus arrays' whose type reflects their size. The progress theorem states the safety property: a well-typed program shall not attempt to access an out-of-bounds element of an array.

Accompanying source code

• Binary search with safe array access and no array bound checks: the example from Xi and Pfenning's PLDI 1998 paper.
eliminating-array-bound-check-literally.ml [12K]
The OCaml code patterned after Xi's Dependent ML code from the PLDI 1998 paper.
eliminating-array-bound-check-literally.hs [8K]
The corresponding Haskell code. The code is written in Haskell98 with a sole extension for higher-ranked types.
eliminating-array-bound-check.lhs [9K]
A slightly more general literate Haskell version that accounts for Haskell arrays with arbitrary lower and upper bounds.
• Eliminating array bound checks in multiple arrays
eliminating-mult-array-bound-check.lhs [13K]
folding over multiple arrays of various sizes.
• KMP-deptype.hs [14K]
The Haskell code for Knuth-Morris-Pratt (packed) string matching. The safety of all string/array access operations is guaranteed statically.
• reverse.ml [2K]
The OCaml code for the introductory example of the paper: list reversal with safe list access.
• reverse.ml [2K]
The OCaml code for the introductory example of the paper: list reversal with safe list access.

The question of verification

The lightweight approaches depend on a trusted kernel. How to make sure the trusted library deserves our trust? The same question exists for any other dependent-type system: how we make sure that our Oracle is correct, the typing rules are correct, and, mainly, that the implementation of those rules in a compiler is correct?

I have heard a similar question asked of J. Strother Moore and J. Harrison. J. Strother Moore said that most of ACL2 is built by bootstrapping, from lemmas and strategies that ACL2 itself has proven. However, the core of ACL2 just has to be trusted. ACL2 has been used for quite a while and so there is a confidence in its soundness. Incidentally, both NSA and NIST found this argument persuasive, when they accepted proofs by ACL2 as evidence of high assurance, in awarding Orange book A1 and IFIPS 140-1 ratings -- the highest security ratings -- to some products.

In a great more detail, this issue is explored in Randy Pollack's How to believe a machine-checked proof''[Believing].

Incidentally, even in Mathematics, it is not at all resolved what exactly constitutes a mathematical proof and how much trust, including personal trust, is involved: [Theoretical-Math].

Genuine Dependent-type systems

Section 2, Related work'' of [WDTM] gives a comprehensive list of genuine Dependent-Type systems, such as Cayenne, ATS, Epigram, DML, Omega.

References

[WDTM] Thorsten Altenkirch, Conor McBride, and James McKinna: Why Dependent Types Matter. April 2005.
<http://www.cs.nott.ac.uk/~txa/publ/ydtm.pdf>

[LtU-DT] Dependent types: literature, implementations and limitations ?
A thread on Lambda the Ultimate started on Nov 21, 2005
<http://lambda-the-ultimate.org/node/view/1129>

[Implicit-Config] Implicit configurations -- or, type classes reflect the values of types
Joint work with Chung-chieh Shan.

[Blume01] Matthias Blume: No-Longer-Foreign: Teaching an ML compiler to speak C natively''
In BABEL'01: First workshop on multi-language infrastructure and interoperability, September 2001, Firenze, Italy.
<http://people.cs.uchicago.edu/~blume/pub.html>

[lightweight-static-capabilities] lightweight-static-capabilities.pdf [334K]
Lightweight static capabilities. Joint work with Chung-chieh Shan.
Electr. Notes Theor. Comput. Sci, 174(7), pp. 79-104, 2007.

[lsc-talk] Lightweight static capabilities. Slides of the talk at the IJCAR'06 workshop Programming Languages meet Program Verification'' (PLPV'06). Seattle, WA, August 21, 2006.
<http://www.cs.rutgers.edu/~ccshan/capability/talk.pdf>

[tagless-typed-tc] Metatypechecking: Staged Typed Compilation into GADT via typeclasses. Computing the type from a value

[XiThesis] Hongwei Xi: Dependent Types in Practical Programming
Ph.D thesis, Carnegie Mellon University, September 1998.
<http://www.cs.bu.edu/~hwxi/>

[Believing] Randy Pollack: How to believe a machine-checked proof
<http://www.dcs.ed.ac.uk/home/rap/export/believing.ps.gz>

[lightweight-static-resources] Lightweight static resources: Typed memory areas and time-parameterized monads for safe embedded and systems programming

[Theoretical-Math] Arthur Jaffe, Frank Quinn: Theoretical mathematics'': Toward a cultural synthesis of mathematics and theoretical physics
Bull.Am.Math.Soc. 29 (1993) 1-13.
<http://arxiv.org/abs/math.HO/9307227>
William P. Thurston: On Proof And Progress In Mathematics
Bull.Am.Math.Soc. 30 (1994) 167-177.
<http://arxiv.org/abs/math.HO/9404236>
Michael Atiyah, Armand Borel, G. J. Chaitin, Daniel Friedan, James Glimm, Jeremy J. Gray, Morris W. Hirsch, Saunder MacLane, Benoit B. Mandelbrot, David Ruelle, Albert Schwarz, Karen Uhlenbeck, Rene' Thom, Edward Witten, Christopher Zeeman: Responses to Theoretical Mathematics: Toward a cultural synthesis of mathematics and theoretical physics'', by A. Jaffe and F. Quinn
Bull.Am.Math.Soc. 30 (1994) 178-207.
<http://arxiv.org/abs/math.HO/9404229>
Arthur Jaffe, Frank Quinn: Response To Comments On Theoretical Mathematics''
Bull.Am.Math.Soc. 30 (1994) 208-211.
<http://arxiv.org/abs/math/9404231>

Last updated January 1, 2008

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org