From oleg-at-okmij.org Thu Aug 5 20:31:36 2004 To: haskell@haskell.org Subject: Eliminating Array Bound Checking through Non-dependent types From: oleg-at-pobox.com Message-ID: <20040806023136.1A5A8AB7E@Adric.metnet.navy.mil> Date: Thu, 5 Aug 2004 19:31:36 -0700 (PDT) Status: OR There is a view that in order to gain static assurances such as an array index being always in range or tail being applied to a non-empty list, we must give up on something significant: on data structures such as arrays (to be replaced with nested tuples), on general recursion, on annotation-free programming, on clarity of code, on well-supported programming languages. That does not have to be the case. This message shows a non-trivial example involving native Haskell arrays, index computations, and general recursion. All arrays indexing operations are statically guaranteed to be safe -- and so we can safely use an efficient unsafeAt provided by GHC seemingly for that purpose. The code is efficient; the static assurances cost us no run-time overhead. The example uses only Haskell98 + higher-ranked types. No new type classes are introduced. The safety is based on: Haskell type system, quantified type variables, and a compact general-purpose trusted kernel. Our example is `bsearch', taken from the famous paper "Eliminating Array Bound Checking Through Dependent Types" by Hongwei Xi and Frank Pfenning (PLDI'98). Hongwei Xi's code was written in SML extended with a restricted form of dependent types. Here is the original code of the example (taken from Figure 3 of that paper, see also http://www-2.cs.cmu.edu/~hwxi/DML/examples/) ] datatype 'a answer = NONE | SOME of int * 'a ] ] assert sub <| {n:nat, i:nat | i < n } 'a array(n) * int(i) -> 'a ] assert length <| {n:nat} 'a array(n) -> int(n) ] ] fun('a){size:nat} ] bsearch cmp (key, arr) = ] let ] fun look(lo, hi) = ] if hi >= lo then ] let ] val m = (hi + lo) div 2 ] val x = sub(arr, m) ] in ] case cmp(key, x) of ] LESS => look(lo, m-1) ] | EQUAL => (SOME(m, x)) ] | GREATER => look(m+1, hi) ] end ] else NONE ] where look <| ] {l:nat, h:int | 0 <= l <= size /\ 0 <= h+1 <= size } int(l) * int(h) ] -> 'a answer ] in ] look (0, length arr - 1) ] end ] where bsearch <| ('a * 'a -> order) -> 'a * 'a array(size) -> 'a answer The text after `<|' are dependent type annotations. They _must_ be specified by the programmer -- even for internal functions such as `look'. Here's our code, deliberately written to be close to Hongwei Xi's code (This message is the complete code). The file eliminating-array-bound-check-literally.hs presents another, more literal translation of the same bsearch code in Haskell98 + higher-rank types. > {-# OPTIONS -fglasgow-exts #-} > module Dep where > import Data.Array > > bsearch cmp (key, arr) > = brand arr (\arrb -> bsearch' cmp (key, arrb)) Nothing > > bsearch' cmp (key,(arr,lo,hi)) = look lo hi > where > look lo hi = let m = bmiddle lo hi > x = arr !. m > in case cmp (key,x) of > LT -> bpred lo m (look lo) Nothing > EQ -> Just (unbi m, x) > GT -> bsucc hi m (\m' -> look m' hi) Nothing This code is just as algorithmically efficient as the Dependent SML code: one middle index computation, one element comparison, one index comparison, one index increment or decrement per iteration. There are no type annotations as none are needed. Operator (!.) is a statically safe array indexing operator. The type system and the trust properties of the kernel below guarantee that in the expression "arr !. m" the index `m' is positively in range of the array `arr' bounds. > barr1 = listArray (5,5 + (length s)-1) s where s = "abcdefgh" > btest1 = bsearch (uncurry compare) ('c',barr1) > btest2 = bsearch (uncurry compare) ('x',barr1) > btest3 = bsearch (uncurry compare) ('x',listArray (0,-10) []) The code relies on a compact general-purpose trusted kernel explained below. That code should be preferably put into a separate module. First we introduce tags for Branded arrays and Branded indices: > -- those two must *not* be exported! > newtype BArray s i a = BArray (Array i a) > newtype BIndex s i = BIndex i > > unbi (BIndex i) = i These are `newtype's and so impose no run-time overhead. Of interest is a phantom type variable 's', which marks a _brand_ of an array and of an array index. An index is branded if it is certainly within the range of the array of its brand. The type variable 's' is similar to that in the ST monad. The latter relies on 's' to enforce serialization. We, OTH, do not impose any linearity constraints on 's' -- it may be freely duplicated (see bbounds below) and discarded (see `unbi'). It is created however under controlled conditions. The safety depends on the trusted way of creating branded types: the constructors BIndex and BArray should be used in the trusted kernel only, and should not be available anywhere else. The uniqueness of 's' afforded by the explicit universal quantification prevents mixing up of different brands. We must re-iterate that safety depends on assurances of the code that constructs BIndex values. Because of the high assurance, we must formulate the safety properties as propositions, and prove them. Fortunately, the code below is compact and straightforward, as well as general purpose. We start with an introduction rule for BArray: > brand:: (Ix i, Integral i) => Array i e > -> (forall s. (BArray s i e, BIndex s i, BIndex s i) -> w) > -> w -> w > brand (a::Array i e) k kempty = > let (l,h) = bounds a > in if l <= h then k ((BArray a)::BArray () i e, BIndex l, BIndex h) > else kempty The function takes an array and two continuations. If the array is empty (that is, its lower bound is greater than its upper bound), kempty is returned. Otherwise, the continuation k is invoked, which receives the branded array along with two branded indices, for the lower and the upper bounds of the array. Proposition: For the argument (a,l,h) of the continuation k, the branded indices l and h are within the bounds of the non-empty branded array a. Proof: from the semantics of the function `bounds', taken here as an axiom, and the code itself. The function brand has a higher-rank type. It is the existential quantification of 's' as well as the absence of BArray constructor elsewhere guarantee that the same brand entails the same bounds. > bmiddle:: (Integral i) => BIndex s i -> BIndex s i -> BIndex s i > bmiddle (BIndex i1) (BIndex i2) = BIndex ((i1 + i2) `div` 2) Proposition: l <= i1 <= h, l <= i2 <= h |- l <= (i1 + i2) `div` 2 <= h Proof: plain arithmetics. We should stress that the type of bmiddle assures that all indices involved have the same brand -- that is, the same lower and upper boundaries. A brand 's' is a (type-level) representation of index bounds. At compile time, we don't know what they are, but the unforgeability of 's' statically guarantees that the same 's' represents the same bounds. > bsucc:: (Integral i) > => BIndex s i -> BIndex s i -> (BIndex s i -> r) -> r -> r > bsucc (BIndex upb) (BIndex i) on_within on_out > = let i' = i + 1 > in if i' <= upb then (on_within (BIndex i')) else on_out The function `bsucc' takes two branded indices that correspond to the same bounds (see the variable 's'). The function also takes two continuations, on_within and on_out. The first index is considered to be an upper limit. The function increments the second index. If the result does not exceed the upper limit, we invoke on_within and pass it the result. Otherwise, we invoke `on_out'. Safety Proposition: l <= upb <= h, l <= i <= h, (i+1) <= upb |- l <= (i+1) <= h Proof: from i < (i+1) and properties of inequalities. The safety proposition justifies our use of the data constructor BIndex. > bpred:: (Integral i) > => BIndex s i -> BIndex s i -> (BIndex s i -> r) -> r -> r > bpred (BIndex lwb) (BIndex i) on_within on_out > = let i' = i - 1 > in if i' >= lwb then (on_within (BIndex i')) else on_out The dual of `bsucc'. Safety Proposition: l <= lwb <= h, l <= i <= h, (i-1) >= lwb |- l <= (i-1) <= h Because a branded index is assuredly within the bounds of the array of the same brand, we can write > infixl 5 !. > (!.):: (Ix i) => BArray s i e -> (BIndex s i) -> e > (BArray a) !. (BIndex i) = a ! i actually, we may _safely_ replace a ! i with `unsafeAt a i'