From oleg-at-okmij.org Tue Oct 11 17:25:24 2005 To: haskell-at-haskell.org Subject: How to zip folds: A library of fold transformers In-Reply-To: <20051006183104.GB595@sleepingsquirrel.org> Message-ID: <20051012002524.7A387ACB2@Adric.metnet.navy.mil> Date: Tue, 11 Oct 2005 17:25:24 -0700 (PDT) X-Comment: updated on Jun 16, 2008: added the standard list API (cons, empty, null, head, tail) and two examples of surprisingly simple expressions of list intersperse and Fibonacci in terms of fold. Status: RO We show how to merge two folds into another fold `elementwise'. Furthermore, we present a library of (potentially infinite) ``lists'' represented as folds (aka streams, aka success-failure-continuation--based generators). Whereas the standard Prelude functions such as |map| and |take| transform lists, we transform folds. We implement the range of progressively more complex transformers -- from |map|, |filter|, |takeWhile| to |take|, to |drop| and |dropWhile|, and finally, |zip| and |zipWith|. The standard list API is also provided. Emphatically we never convert a stream to a list and so we never use recursion or recursive types. All iterative processing is driven by the fold itself. The implementation of zip also solves the problem of ``parallel loops''. One can think of a fold as an accumulating loop. One can easily represent a nested loop as a nested fold. Representing parallel loop as a fold is a challenge, answered at the end of the message. This library is inspired by Greg Buchholz' message on the Haskell-Cafe list and is meant to answer open questions posed at the end of that message http://www.haskell.org/pipermail/haskell-cafe/2005-October/011575.html This message a complete literate Haskell code. > {-# LANGUAGE Rank2Types #-} > module Folds where First we define the representation of a list as a fold: > newtype FR a = FR (forall ans. (a -> ans -> ans) -> ans -> ans) > unFR (FR x) = x It has a rank-2 type. The defining equations are: if flst is a value of a type |FR a|, then unFR flst f z = z if flst represents an empty list unFR flst f z = f e (unFR flst' f z) if flst represents the list with the head 'e' and flst' represents the rest of that list From another point of view, |unFR flst| can be considered a _stream_ that takes two arguments: the success continuation of the type |a -> ans -> ans| and the failure continuation of the type |ans|. This is the encoding of lists in a (non-simply-typed) lambda-calculus. The LogicT paper discusses such types in detail, and shows how to find that "rest of the list" flst'. The slides of the ICFP05 presentation by Chung-chieh Shan point out to more related work in that area. But we are here to drop, take, dropWhile, etc. Our functions will take a stream and return another stream, of the |FR a| type, which represents truncated, filtered, etc. source stream. Let us define two sample streams: a finite and an infinite one: > stream1 :: FR Char > stream1 = FR (\f unit -> foldr f unit ['a'..'i']) > stream2 :: FR Int > stream2 = FR (\f unit -> foldr f unit [1..]) We may also consider an integer to be a stream (of all of its predecessors, starting from zero): > nstream :: (Ord a, Enum a, Num a) => a -> FR a > nstream n = FR (fold_num n) > where fold_num n f unit | n <= 0 = unit > fold_num n f unit = let n' = pred n in f n' (fold_num n' f unit) The internal function fold_num is primitively recursive. It may as well have been a primitive, like foldr. We define a way to show the stream. This is the only time we convert |FR a| to a list -- so we can more easily show it. > instance Show a => Show (FR a) where > show l = show $ unFR l (:) [] *Folds> show stream1 "\"abcdefghi\"" *Folds> show (nstream 5) "[4,3,2,1,0]" We start by implementing the primitive list operations: constructors and deconstructors: > scons :: a -> FR a -> FR a > scons x l = FR(\f z -> f x (unFR l f z)) > > sempty :: FR a > sempty = FR (\f z -> z) > > snull :: FR a -> Bool > snull l = unFR l (\a z -> False) True > > shead :: FR a -> Maybe a > shead l = unFR l (\a z -> Just a) Nothing > > stail :: FR a -> FR a > stail = sdrop 1 -- see below The functions shead and stail, unlike the corresponding Data.List functions, are total. The function shead has the Maybe result type, returning Nothing if the list is empty; stail returns the empty list when applied to the empty list. The function sdrop is defined below. Here is another stream: > stream3 = True `scons` (False `scons` sempty) *Folds> show stream3 "[True,False]" *Folds> shead stream3 Just True *Folds> shead (stail stream3) Just False *Folds> shead (stail (stail stream3)) Nothing *Folds> snull stream3 False *Folds> snull (stail (stail stream3)) True Since we support the minimal API for lists (cons, empty, null, head, tail), we are done. However, any non-trivial list processing using only these operations involves explicit recursion -- which we would like to avoid. It is not generally possible in Haskell to statically restrict a recursive definition to be primitively recursive. If we avoid explicit recursion relying only on |FR a|, and trust that the corresponding stream is well-founded, our list processing operation will surely terminate. Thus |FR a| can be regarded as a restricted, `good' form of recursion (either induction or co-induction, depending on the stream). Thus we must strive to write list processing operations using the fold implicit in |FR a| rather than using |shead| and |stail|. Granted, there are some list processing operations (non-primitively recursive) that cannot be expressed via |FR a| without additional recursion. In that case, |shead| and |stail| are useful. In some other cases, using |FR a| directly may appear quite awkward. In these cases, the derived functions below like |sdrop|, |szipWith|, etc. may help. At the end we show that even some more interesting list processing tasks such as intersperse or computing Fibonacci numbers can be _simply_ expressed via folds. The map function is trivial: > smap :: (a->b) -> FR a -> FR b *> smap f l = FR(\g -> unFR l (g . f)) which can also be written as > smap f l = FR((unFR l) . (flip (.) f)) For example, > test1 = show $ smap succ stream1 Next is the filter function: > sfilter :: (a -> Bool) -> FR a -> FR a > sfilter p l = FR(\f -> unFR l (\e r -> if p e then f e r else r)) > test2 = sfilter (not . (`elem` "ch")) stream1 The function takeWhile is quite straightforward, too > stakeWhile :: (a -> Bool) -> FR a -> FR a > stakeWhile p l = FR(\f z -> unFR l (\e r -> if p e then f e r else z) z) > test3 = stakeWhile (< 'z') stream1 > test3' = stakeWhile (< 10) stream2 As we can see, stakeWhile well applies to an infinite stream. The functions take, drop, dropWhile ask for more complexity. > stake :: (Ord n, Num n) => n -> FR a -> FR a > stake n l = FR(\f z -> > unFR l (\e r n -> if n <= 0 then z else f e (r (n-1))) (const z) n) > test4 = stake 20 stream1 > test4' = stake 5 stream1 > test4'' = stake 11 stream2 > test4''' = (stake 11 . smap (^2)) stream2 The function sdrop shows the major deficiency: we're stuck with the (<=0) test for the rest of the stream. In this case, some delimited continuation operators like `control' can help, in limited circumstances. > sdrop :: (Ord n, Num n) => n -> FR a -> FR a > sdrop n l = FR(\f z -> > unFR l (\e r n -> if n <= 0 then f e (r n) else r (n-1)) (const z) n) > test5 = sdrop 20 stream1 > test5' = sdrop 5 stream1 > test5'' = stake 5 $ sdrop 11 stream2 The function dropWhile becomes straightforward > sdropWhile :: (a -> Bool) -> FR a -> FR a > sdropWhile p l = FR(\f z -> > unFR l (\e r done -> > if done then f e (r done) > else if p e then r done else f e (r True)) (const z) False) > test6 = sdropWhile (< 'z') stream1 > test6' = sdropWhile (< 'd') stream1 > test6'' = stake 5 $ sdropWhile (< 10) stream2 The zip function might appear complex. However, if we recall that we already have stake and sdrop (so that we can `split' an FR stream into the head element (stake 1) and the rest (sdrop 1)), the code for szipWith becomes trivial: > szipWith :: (a->b->c) -> FR a -> FR b -> FR c > szipWith t l1 l2 = > FR(\f z -> > unFR l1 (\e1 r r2 -> > unFR r2 (\e2 _ -> f (t e1 e2) (r (sdrop 1 r2))) z) (const z) l2) > szip :: FR a -> FR b -> FR (a,b) > szip = szipWith (,) The code for szipWith is not recursive and needs no recursive types. Let us take a closer look at the code: at first blush it looks like a nested fold (``nested loop''). But then we see that the nested expression operates on the progressively shorter list l2. That is, the 'tail' of l2 becomes the 'seed' of the outer fold represented by l1. The szipWith code demonstrates that `unFR l1' instantiates 'ans' to the type of 'l2' -- that is, to the type FR itself. We definitely have impredicativity -- the quantified type variable in FR may be instantiated to the type that is being defined by FR. Thus the higher-rank type of FR is not a nicety and is not merely a safety property -- it is the necessity. The latter follows from the general fact that a predecessor of a Church numeral (aka, `tail') is not expressible in simply-typed lambda-calculus. One can easily prove that the function szip does correspond to zip for all finite streams. The proof for infinite streams requires more elaboration. > test81 = szip stream1 stream1 > test82 = szip stream1 stream2 > test83 = szip stream2 stream1 > test84 = stake 5 $ szip stream2 (sdrop 10 stream2) As one may expect (or not), these tests give the right results *Folds> test83 [(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),(6,'f'),(7,'g'),(8,'h'),(9,'i')] *Folds> test84 [(1,11),(2,12),(3,13),(4,14),(5,15)] Simon Peyton-Jones (in a follow-up message posted on the Haskell mailing list on Oct 28, 2005) pointed out that ``John Launchbury et al had a paper about hyper-functions which tackled the zip problem too.'' http://citeseer.ist.psu.edu/krstic01hyperfunctions.html. But hyper-functions imply the fix-point operator. I specifically wanted to avoid Y in any guise! There are several reasons: - In the pure case, there will be nothing to prove. We can always convert an FR-list to an ordinary Haskell (i.e., co-inductive) list, and back. This is an iso-morphism, so the library of FR lists is a trivial consequence of the Haskell list library. If we banish Y however, we can no longer convert Haskell lists to FR lists, and so iso-morphism is broken. BTW, the iso-morphism between FR lists and Haskell lists no longer holds if we do all operations in some monad 'm' and that monad is strict. - It would be interesting to see, constructively, how to build the full-fledged (FR)-list library and avoid general recursion. Let the FR-list be the only driver of iterations. - Related to the above: how to build a list library without Y, without recursive types. We need only higher-ranked types. The zip function is particularly interesting because it relates to 'parallel loops'. It becomes especially interesting in the case of general backtracking computations, or backtracking computations in direct style, with delimited continuations modeling `lists'. The following two examples have been inspired by the message posted on Lambda-the-Ultimate by Neelakantan Krishnaswami on Thu, 2008-06-12 00:05. Given a list of strings, we wish to intersperse ", " concatenating result. In SML, this function can be written as fun concat [] = "" | concat [x] = x | concat (x :: xs) = x ^ ", " ^ (concat xs) It has two base cases (for the empty list and the one-element list). It may seems expressing this function via fold is awkward. Actually, it is not, since the induction is quite apparent. > concat_with_commas l = > case h of > Nothing -> "" > Just x | snull t -> x > Just x -> x ++ unFR t (\e seed -> ", " ++ e ++ seed) "" > where h = shead l > t = stail l *Folds> concat_with_commas sempty "" *Folds> concat_with_commas ("s1" `scons` sempty) "s1" *Folds> concat_with_commas ("s1" `scons` ("s2" `scons` ("abc" `scons` sempty))) "s1, s2, abc" The second example is Fibonacci fun fib 0 = 1 | fib 1 = 1 | fib n + 2 = (fib n) + (fib (n + 1)) which calls itself recursively twice. Still, the induction is transparent here, and so a simple first-order fold suffices. > sfib :: FR Integer -> Integer > sfib l = fst (unFR l (\_ (a,b) -> (a+b,a)) (1,0)) *Folds> sfib (nstream 0) 1 *Folds> sfib (nstream 1) 1 *Folds> sfib (nstream 2) 2 *Folds> sfib (nstream 3) 3 *Folds> sfib (nstream 4) 5 *Folds> sfib (nstream 5) 8 *Folds> sfib (nstream 8) 34