Haskell is not a dependently-typed language -- at least it was not designed that way. Nevertheless, even Haskell98 admits some form of dependent types, enough to statically prevent the annoying ``head of empty list'' error. After multi-parameter type classes with functional dependencies were introduced, it was quickly (Hallgren, 2001) realized that we can compute with types. HList (2004) has lifted the entire list library to the type level, opening the floodgates of type computation, all the way to SkewLists (Martinez et al., PEPM 2013) and RSA. Later on, type families have made programming with types deceptively more functional. 2004 brought another key realization: polymorphic recursion lets us dynamically select among the family of types and hence makes the selected type in effect depend on a run-time value. Singleton programming and GADTs perfected this pattern. ``This much is clear: many programmers are already finding practical uses for the approximants to dependent types which mainstream functional languages (especially Haskell) admit, by hook or by crook.'' (Altenkirch et al., 2005)
This page is a panorama of approximations to dependently-typed programming. We will see how closely Haskell comes to dependent types, yet not reaching them. No worry: cruder approximations turn out just as, if not more, practically useful as the overweight approaches.
[a]
may depend on other
types -- in our case, the types of list elements -- but not on the
values of those elements or their number. A dependent type does depend
on such dynamic values. For example, in the hypothetical dependent
Haskell, the function to produce a list with n
copies of a given
value could have the following signaturereplicate :: (n::Nat) -> a -> List n a
List n a
is the type of lists with elements of type a
and the
length exactly n
, where n
is a non-negative integer Nat
. To be
more precise, List n a
is a family of types (of lists of various
lengths with elements of type a
); the natural number n
selects one
type of that family: ``indexes within the family''. Such an
indexing is one of the common manifestations of dependent types. The
index n
is the argument of replicate
: it is a value that is not
known until the run-time, when the function is applied.
The function to append length-indexed lists List n a
naturally
has the signature:
append :: List n a -> List m a -> List (n+m) awhere
n+m
is the ordinary addition of two natural numbers n
and m
.
Thus not only values may appear in types but also arbitrary expressions (terms).
The pay-off for including the length of the list in its type is being
able to give head
and tail
the signatures:head :: List (succ n) a -> a tail :: List (succ n) a -> List n athat make their applications to the empty list ill-typed. Taking the head/tail of an empty list -- the functional programming equivalent of the infamous NullPointerException -- is hence statically prevented. We no longer have to puzzle out the location of the
Prelude.head: empty list
exceptions or wonder how to deal with it
in the production code. Such an error simply occur in a well-typed
program.
Curry-Howard correspondence regards types as propositions and programs
as proofs. In simply-typed systems, types (propositions) are made of
type constants such as Int
(atomic propositions), combined by
function arrow (implication), pairing (conjunction), etc. Simply-typed
systems hence correspond to propositional logic. First-order predicate
logic distinguishes between terms (denoting ``things'': elements of
some domain) and propositions, stating properties and relations among
terms. Propositions like Prime(n)
naturally contain terms, n
in
our case. Dependent types thus are the Curry-Howard interpretation of
the first-order predicate logic. Therefore, any program specification
written in first-order logic can be expressed as a
dependent type, to be checked by the compiler. ``Dependently
typed programs are, by their nature, proof carrying code.''
(Altenkirch et al.)
One should distinguish a dependent type (which
depends on a dynamic value) from a polymorphic type such as Maybe
a
. The type List n a
from our running example is indexed by the
value (the list length) and by the type (of its elements): it is both
dependent (in n
) and polymorphic (or, technically, parametric), in a
. In Agda, List
would have the type Nat -> Set -> Set
, which
clearly shows the distinction between the dependency and the
parametric polymorphism. The first argument of List
is a natural
number, an element of the domain Nat
. The second argument of List
is a Set
-- itself a proposition. Parametric polymorphism hence
has the Curry-Howard correspondence to (a fragment of a)
second-order logic. The different nature of the two arguments in List n a
has many consequences: since the list length n
is a
natural number, one may apply to it any operation on natural numbers
such as addition, increment, multiplication, etc. -- as we have seen
in the append
example. On the other hand, there is much less one
can do with types. List concatenation and many similar operations keep
the type of list elements but change its length.
The main issue of dependently-typed systems is illustrated by the following simple example (which will be our running example):
pcomm :: Bool -> List n a -> List m a -> List (n+m) a pcomm b l1 l2 = if b then append l1 l2 else append l2 l1
The type checker has to verify that append l1 l2
and append l2 l1
in the conditional branches have the same type. Recalling the
signature of append
, the type checker has to ascertain that n+m
is
equal to m+n
. Now, deciding if two types are the same involves
determining if two expressions are equal, which is generally undecidable
(think of functions or recursive expressions). There is a bigger
problem: in our example, n
and m
are just variables, whose values
will be known only at run-time. The type-checker, which runs at
compile-time, therefore has to determine that n+m
is equal to m+n
without knowing the concrete values of n
and m
. We know that
natural addition is commutative, but the type-checker does not. It is
usually not so smart to figure out the commutativity from the
definition of addition. Therefore, we have to somehow supply the proof of the commutativity to the type-checker. Programming with
dependent types involves a great deal of theorem proving.
head
or tail
of the empty list in Haskell
are equivalent to the dereferencing of the zero pointer in C/C++ or NullPointerException
in Java. These errors occur because the
domain of the function is smaller than the function's type
suggests. For example, the type of head
says that the function
applies to any list. In reality, it can be meaningfully applied only
to a non-empty list. One can eliminate such errors by giving
functions head
and tail
a more precise type, such as FullList a
.
Languages like Cyclone and Cw do exactly that.We stress that the head-of-empty-list errors can be eliminated now, without any modification to the Haskell type system, without developing any new tool. Already Haskell98 can do that. The same technique applies to OCaml and even Java and C++. The only required advancement is in our thinking and programming style.
Thinking of full lists as a separate type from ordinary, potentially empty lists does affect our programming style -- but it does not have to break the existing code. The new style is easy to introduce gradually. Besides safety, its explicitness makes list processing algorithms more insightful, separating out algorithmically meaningful empty list checks from the redundant safety checks. Let us see some examples.
Assume the following interface
type FullList a -- assume abstract. For an implementation, see below fromFL :: FullList a -> [a] indeedFL :: [a] -> w -> (FullList a -> w) -> w -- an analogue of `maybe' headS :: FullList a -> a tailS :: FullList a -> [a] -- Adding something to a general list surely gives a non-empty list infixr 5 !: class Listable l where (!:) :: a -> l a -> FullList aAll the above functions are total, in particular,
headS
and tailS
.
The operation fromFL
lets us forget that the list is FullList
, giving
the ordinary list. The application headS []
obviously does not type-check.
Less obvious is that the application headS $ some_expression
will also
fail to type check unless we can statically assure some_expression
produces a FullList
. To see how difficult it is to obtain that assurance,
let's take an example. First is the accumulating list reversal function,
written to explicitly use head
and tail
:regular_reverse :: [a] -> [a] regular_reverse l = loop l [] where loop [] accum = accum loop l accum = loop (Prelude.tail l) (Prelude.head l : accum)Let us re-write it using safe head and tail functions:
safe_reverse :: [a] -> [a] safe_reverse l = loop l [] where loop l accum = indeedFL l accum $ (\l -> loop (tailS l) (headS l : accum)) test1 = safe_reverse [1,2,3]That was straightforward: we relied on
indeedFL
to perform case
analysis on the argument list (if it is empty or not). We had to do
this analysis anyway, according to the algorithm. In the case branch
for the non-empty list, we statically know that l
is non-empty and
so the applications of headS
and tailS
are well-typed. Even the
type-checker can see that. Here is another example, which should be
self-explanatory.safe_append :: [a] -> FullList a -> FullList a safe_append [] l = l safe_append (h:t) l = h !: safe_append t l l1 :: FullList Int l1 = 1 !: 2 !: [] -- We can apply safe_append on two FullList without any problems test5 = tailS $ safe_append (fromFL l1) l1 -- [2,1,2]
Before further discussing this programming style and the problem of
backwards compatibility (``do we have to re-write all the code?''),
let us see the implementation. Recall, our goal is to define the type
of FullList
, of lists that are guaranteed to be non-empty. One can see
two approaches: one particular and insightful, and the other is
generic and insightful. We start with the first one: a non-empty list
for sure has at least one element, to be carried around explicitly:
data FullList a = FullList a [a] -- carry the list head explicitly fromFL :: FullList a -> [a] fromFL (FullList x l) = x : l headS :: FullList a -> a headS (FullList x _) = x tailS :: FullList a -> [a] tailS (FullList _ x) = xThe other functions of our interface are equally straightforward, see the source code for details. The undisputed advantage of the implementation is its obvious correctness:
FullList
represents a
non-empty list by its very construction, without any pre-conditions
and reservations. FullList
truly is a non-empty list. There is a
deep satisfaction in finding a data structure that ensures a desired
property by its very construction, where the property is built-in. In
this respect, one can't help but think of Chris Okasaki's work.
There is another approach of representing FullList
, which easily
generalizes to other structures.
module NList (FullList, fromFL, headS, tailS, ...) where newtype FullList a = FullList [a] -- data constructor is not exported! fromFL (FullList x) = x -- The following are _total_ functions -- They are guaranteed to be safe, and so we could have used -- unsafeHead# and unsafeTail# if GHC provided them. headS :: FullList a -> a headS (FullList (x:_)) = x tailS :: FullList a -> [a] tailS (FullList (_:x)) = xWe introduce the abstract data type
FullList
-- abstract in the
sense that its constructor, also named FullList
, is not exported.
The only way to construct and manipulate the values of that type is to
use the operations exported by the module. All the exported operations
ensure that FullList
represents a non-empty list.
One may regard the abstract type FullList
as standing for the proposition
(invariant) that represented list is non-empty. Now we have something to
prove: we have to verify, manually or semi-automatically, that all
operations within the module NList
whose return type is FullList
respect the invariant and ensure the truth of the non-emptiness proposition.
Once we have verified these exported constructors, all operations
that consume FullList
, within NList
or outside,
can take this non-emptiness proposition for granted. Therefore, we are
justified in using unsafe-head operations in implementing headS
. Compared
to the first implementation, the fact that FullList
represents a
non-empty list is no longer obvious and has to be proven. Fortunately,
we only have to prove the operations within NList
, that
is, the ones that make use of the data constructor FullList
. All other
functions, which produce FullList
merely by invoking the operations
of NList
, ensure the non-emptiness invariant by construction and do not
need a proof. The advantage of the second implementation is the easy
generalization to ByteString
-like
packed lists, etc. The data construction FullList
is merely a newtype
wrapper, with no run-time overhead. Thus the second implementation
provides the safety of head and tail operations without sacrificing
efficiency.
In the old (2006) discussion of non-empty lists on Haskell-Cafe,
Jan-Willem Maessen wrote: ``In addition, we have this rather nice
assembly of functions which work on ordinary lists. Sadly, rewriting
them all to also work on NonEmptyList or MySpecialInvariantList is a
nontrivial task.'' Backwards compatibility is indeed a serious concern:
no matter how better a new programming style may be, the mere thought
of re-writing the existing code is a deterrent. Suppose we have a function foo:: [a] -> [a]
(whose code, if available, we'd rather not change) and we want to
write something like
\l -> [head l, head (foo l)]The first attempt of using the safe functions
\l -> indeedFL l onempty (\l -> [headS l, headS (foo l)])does not type:
foo
applies to [a]
rather than FullList a
;
furthermore, the result of foo
is not FullList a
, required by headS
. The first problem is easy to solve: FullList a
can always
be cast into the general list, with fromFL
. We insist on writing the
latter function explicitly, which keeps the type system simple, free of
subtyping and implicit coercions. One may regard fromFL
as an
analogue of fromIntegral
-- which, too, we have to write explicitly,
in any code with more than one sort of integral numbers (e.g., Int
and Integer
, or Int
and CInt
).
If we are not sure if our foo
maps non-empty lists
to non-empty lists, we really should handle the empty list case:
\l -> indeedFL l onempty $ \l -> [headS l, indeedFL (foo $ fromFL l) onempty' headS]If we have a hunch that
foo
indeed maps non-empty lists to non-empty lists,
but we are too busy to verify it, we can write\l -> indeedFL l onempty $ \l -> [headS l, indeedFL (foo $ fromFL l) (error msg) headS] where msg = "I'm quite sure foo maps non-empty lists to " ++ "non-empty lists. I'll be darned if it doesn't."That would get the code running. Possibly at some future date (during the code review?) we will be called to justify the hunch, to whatever required degree of formality (informal argument, formal proof). If we fail at this justification, we'd better think what to do if the result of
foo
turns out empty. If we
succeed, we would be given permission to add to the module NList
the following definition:nfoo (FullList x) = FullList $ foo xafter which we can write
\l -> indeedFL l onempty (\l -> [headS l, headS (nfoo l)])with no extra empty list checks.
In conclusion, we have demonstrated the programming style that ensures safety without sacrificing efficiency. The key idea is that an abstract data type ensures (possibly quite sophisticated) propositions about the data -- so long as the very limited set of basic constructors satisfy the propositions. This main idea is very old, advocated by Milner and Morris in the mid-1970s. If there is a surprise in this, it is in the triviality of approach. One can't help but wonder why we do not program in this style.
James H. Morris Jr.: Protection in Programming Languages Comm. of the ACM, 1973, V16, N1, pp. 15-21
< http://www.erights.org/history/morris73.pdf >
Lightweight dependent typing
Longer explanations of the technique, justification, formalization, and
more complex examples
The FullList library was first presented and discussed on Haskell-Cafe in November 2006 and later summarized on Haskell Wiki
< https://www.haskell.org/haskellwiki/Non-empty_list >
The present article is the expanded and elaborated version.
We solve the configurations problem in Haskell using only
stable and widely implemented language features like the type-class
system. In our approach, a term expression can refer to run-time
configuration parameters as if they were compile-time constants in
global scope. Besides supporting such intuitive term notation and
statically guaranteeing separation, our solution also helps improve
the program's performance by transparently dispatching to specialized
code at run-time. We can propagate any type of configuration
data -- numbers, strings, IO
actions, polymorphic functions,
closures, and abstract data types. No previous approach to
propagating configurations implicitly in any language provides the
same static separation guarantees.
The enabling technique behind our solution is to propagate values via types, with the help of polymorphic recursion and higher-rank polymorphism. The technique essentially emulates local type-class instance declarations while preserving coherence. Configuration parameters are propagated throughout the code implicitly as part of type inference rather than explicitly by the programmer. Our technique can be regarded as a portable, coherent, and intuitive alternative to implicit parameters. It motivates adding local instances to Haskell, with a restriction that salvages principal types.
Joint work with Chung-chieh Shan.
term4'
on page 9 should
be read as test4'
.Prepose.hs [10K]
The complete code for the paper, updated for the current GHC
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML