From oleg-at-okmij.org Tue Oct 11 17:25:24 2005
To: haskell-at-haskell.org
Subject: How to zip folds: A library of fold transformers
In-Reply-To: <20051006183104.GB595@sleepingsquirrel.org>
Message-ID: <20051012002524.7A387ACB2@Adric.metnet.navy.mil>
Date: Tue, 11 Oct 2005 17:25:24 -0700 (PDT)
X-Comment: updated on Jun 16, 2008: added the standard list API
  (cons, empty, null, head, tail) and two examples of
  surprisingly simple expressions of list intersperse and Fibonacci
  in terms of fold.
Status: RO

We show how to merge two folds into another fold
`elementwise'. Furthermore, we present a library of (potentially
infinite) ``lists'' represented as folds (aka streams, aka
success-failure-continuation--based generators). Whereas the standard
Prelude functions such as |map| and |take| transform lists, we
transform folds. We implement the range of progressively more complex
transformers -- from |map|, |filter|, |takeWhile| to |take|, to |drop|
and |dropWhile|, and finally, |zip| and |zipWith|. The standard list API
is also provided.

Emphatically we never convert a stream to a list and so we never use
recursion or recursive types. All iterative processing is driven by
the fold itself.

The implementation of zip also solves the problem of ``parallel
loops''.  One can think of a fold as an accumulating loop. One can
easily represent a nested loop as a nested fold. Representing parallel
loop as a fold is a challenge, answered at the end of the message. 

This library is inspired by Greg Buchholz' message on the Haskell-Cafe list
and is meant to answer open questions posed at the end of that message
 http://www.haskell.org/pipermail/haskell-cafe/2005-October/011575.html

This message a complete literate Haskell code.

> {-# LANGUAGE Rank2Types #-}
> module Folds where

First we define the representation of a list as a fold:

> newtype FR a = FR (forall ans. (a -> ans -> ans) -> ans -> ans)
> unFR (FR x) = x

It has a rank-2 type. The defining equations are: if flst is a value
of a type |FR a|, then
	unFR flst f z = z if flst represents an empty list
	unFR flst f z = f e (unFR flst' f z)
			if flst represents the list with the head 'e'
			and flst' represents the rest of that list

From another point of view, |unFR flst| can be considered a _stream_
that takes two arguments: the success continuation of the type 
|a -> ans -> ans| and the failure continuation of the type |ans|.  This is
the encoding of lists in a (non-simply-typed) lambda-calculus.  The
LogicT paper discusses such types in detail, and shows how to find
that "rest of the list" flst'. The slides of the ICFP05 presentation
by Chung-chieh Shan point out to more related work in that area.

But we are here to drop, take, dropWhile, etc. Our functions will
take a stream and return another stream, of the |FR a| type, which
represents truncated, filtered, etc. source stream.

Let us define two sample streams: a finite and an infinite one:

> stream1 :: FR Char
> stream1 = FR (\f unit -> foldr f unit ['a'..'i'])
> stream2 :: FR Int
> stream2 = FR (\f unit -> foldr f unit [1..])

We may also consider an integer to be a stream (of all of its predecessors,
starting from zero):

> nstream :: (Ord a, Enum a, Num a) => a -> FR a
> nstream n = FR (fold_num n)
>  where fold_num n f unit | n <= 0 = unit 
>        fold_num n f unit = let n' = pred n in f n' (fold_num n' f unit)

The internal function fold_num is primitively recursive. It may as well have
been a primitive, like foldr.


We define a way to show the stream. This is the only time we convert |FR a|
to a list -- so we can more easily show it.

> instance Show a => Show (FR a) where
>   show l = show $ unFR l (:) []

*Folds> show stream1
"\"abcdefghi\""
*Folds> show (nstream 5)
"[4,3,2,1,0]"


We start by implementing the primitive list operations: constructors
and deconstructors:

> scons :: a -> FR a -> FR a
> scons x l = FR(\f z -> f x (unFR l f z))
>
> sempty :: FR a
> sempty = FR (\f z -> z)
>
> snull :: FR a -> Bool
> snull l = unFR l (\a z -> False) True
>
> shead :: FR a -> Maybe a
> shead l = unFR l (\a z -> Just a) Nothing
>
> stail :: FR a -> FR a
> stail = sdrop 1			-- see below

The functions shead and stail, unlike the corresponding Data.List functions,
are total. The function shead has the Maybe result type, returning Nothing
if the list is empty; stail returns the empty list when applied to the
empty list. The function sdrop is defined below. Here is another stream:

> stream3 = True `scons` (False `scons` sempty)

*Folds> show stream3
"[True,False]"
*Folds> shead stream3
Just True
*Folds> shead (stail stream3)
Just False
*Folds> shead (stail (stail stream3))
Nothing
*Folds> snull stream3
False
*Folds> snull (stail (stail stream3))
True

Since we support the minimal API for lists (cons, empty, null, head, tail),
we are done. However, any non-trivial list processing using only these
operations involves explicit recursion -- which we would like to avoid.
It is not generally possible in Haskell to statically restrict
a recursive definition to be primitively recursive. If we avoid
explicit recursion relying only on |FR a|, and trust that the 
corresponding stream is well-founded, our list processing operation
will surely terminate. Thus |FR a| can be regarded as a restricted,
`good' form of recursion (either induction or co-induction, depending on the
stream). Thus we must strive to write list processing operations using the 
fold implicit in |FR a| rather than using |shead| and |stail|. Granted,
there are some list processing operations (non-primitively recursive) that
cannot be expressed via |FR a| without additional recursion. In that case,
|shead| and |stail| are useful. In some other cases, using |FR a| directly
may appear quite awkward. In these cases, the derived functions below like
|sdrop|, |szipWith|, etc. may help. At the end we show that even some
more interesting list processing tasks such as intersperse or computing
Fibonacci numbers can be _simply_ expressed via folds. 


The map function is trivial:

> smap :: (a->b) -> FR a -> FR b

*> smap f l = FR(\g -> unFR l (g . f))

which can also be written as

> smap f l = FR((unFR l) . (flip (.) f))

For example,

> test1 = show $ smap succ stream1


Next is the filter function:

> sfilter :: (a -> Bool) -> FR a -> FR a
> sfilter p l = FR(\f -> unFR l (\e r -> if p e then f e r else r))

> test2 = sfilter (not . (`elem` "ch")) stream1

The function takeWhile is quite straightforward, too

> stakeWhile :: (a -> Bool) -> FR a -> FR a
> stakeWhile p l = FR(\f z -> unFR l (\e r -> if p e then f e r else z) z)

> test3  = stakeWhile (< 'z') stream1
> test3' = stakeWhile (< 10) stream2

As we can see, stakeWhile well applies to an infinite stream.

The functions take, drop, dropWhile ask for more complexity.

> stake :: (Ord n, Num n) => n -> FR a -> FR a
> stake n l = FR(\f z -> 
>	 unFR l (\e r n -> if n <= 0 then z else f e (r (n-1))) (const z) n)

> test4    = stake 20 stream1
> test4'   = stake 5 stream1
> test4''  = stake 11 stream2
> test4''' = (stake 11 . smap (^2)) stream2

The function sdrop shows the major deficiency: we're stuck with the
(<=0) test for the rest of the stream. In this case, some delimited
continuation operators like `control' can help, in limited
circumstances.

> sdrop :: (Ord n, Num n) => n -> FR a -> FR a
> sdrop n l = FR(\f z -> 
>	 unFR l (\e r n -> if n <= 0 then f e (r n) else r (n-1)) (const z) n)

> test5    = sdrop 20 stream1
> test5'   = sdrop 5 stream1
> test5''  = stake 5 $ sdrop 11 stream2

The function dropWhile becomes straightforward

> sdropWhile :: (a -> Bool) -> FR a -> FR a
> sdropWhile p l = FR(\f z -> 
>	 unFR l (\e r done -> 
>	   if done then f e (r done) 
>	      else if p e then r done else f e (r True)) (const z) False)

> test6   = sdropWhile (< 'z') stream1
> test6'  = sdropWhile (< 'd') stream1
> test6'' = stake 5 $ sdropWhile (< 10) stream2

The zip function might appear complex. However, if we recall that we
already have stake and sdrop (so that we can `split' an FR stream into
the head element (stake 1) and the rest (sdrop 1)), the code for
szipWith becomes trivial:

> szipWith :: (a->b->c) -> FR a -> FR b -> FR c
> szipWith t l1 l2 = 
>   FR(\f z -> 
>        unFR l1 (\e1 r r2 -> 
>	  unFR r2 (\e2 _ -> f (t e1 e2) (r (sdrop 1 r2))) z) (const z) l2)


> szip :: FR a -> FR b -> FR (a,b)
> szip = szipWith (,)

The code for szipWith is not recursive and needs no recursive types.
Let us take a closer look at the code: at first blush it looks like a
nested fold (``nested loop''). But then we see that the nested
expression operates on the progressively shorter list l2. That is, the
'tail' of l2 becomes the 'seed' of the outer fold represented by l1.

The szipWith code demonstrates that `unFR l1' instantiates 'ans' to
the type of 'l2' -- that is, to the type FR itself. We definitely have
impredicativity -- the quantified type variable in FR may be
instantiated to the type that is being defined by FR. Thus the
higher-rank type of FR is not a nicety and is not merely a safety
property -- it is the necessity. The latter follows from the general
fact that a predecessor of a Church numeral (aka, `tail') is not
expressible in simply-typed lambda-calculus.


One can easily prove that the function szip does correspond to zip for
all finite streams. The proof for infinite streams requires more
elaboration.

> test81 = szip stream1 stream1
> test82 = szip stream1 stream2
> test83 = szip stream2 stream1
> test84 = stake 5 $ szip stream2 (sdrop 10 stream2)

As one may expect (or not), these tests give the right results

*Folds> test83
[(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),(6,'f'),(7,'g'),(8,'h'),(9,'i')]
*Folds> test84
[(1,11),(2,12),(3,13),(4,14),(5,15)]


Simon Peyton-Jones (in a follow-up message posted on the Haskell
mailing list on Oct 28, 2005) pointed out that ``John Launchbury et al
had a paper about hyper-functions which tackled the zip problem too.''
  http://citeseer.ist.psu.edu/krstic01hyperfunctions.html. 

But hyper-functions imply the fix-point operator. I specifically
wanted to avoid Y in any guise!  There are several reasons:

        - In the pure case, there will be nothing to prove. We can
always convert an FR-list to an ordinary Haskell (i.e., co-inductive)
list, and back. This is an iso-morphism, so the library of FR lists is
a trivial consequence of the Haskell list library. If we banish Y
however, we can no longer convert Haskell lists to FR lists, and so
iso-morphism is broken. BTW, the iso-morphism between FR lists and
Haskell lists no longer holds if we do all operations in some monad
'm' and that monad is strict.

        - It would be interesting to see, constructively, how to build
the full-fledged (FR)-list library and avoid general recursion. Let
the FR-list be the only driver of iterations.

        - Related to the above: how to build a list library without Y,
without recursive types. We need only higher-ranked types. 

The zip function is particularly interesting because it relates to
'parallel loops'. It becomes especially interesting in the case of
general backtracking computations, or backtracking computations in
direct style, with delimited continuations modeling `lists'.


The following two examples have been inspired by the message
posted on Lambda-the-Ultimate by Neelakantan Krishnaswami
on Thu, 2008-06-12 00:05.

Given a list of strings, we wish to intersperse ", " concatenating result. 
In SML, this function can be written as

    fun concat [] = ""
      | concat [x] = x
      | concat (x :: xs) = x ^ ", " ^ (concat xs)

It has two base cases (for the empty list and the one-element list). It may
seems expressing this function via fold is awkward. Actually, it is not,
since the induction is quite apparent.

> concat_with_commas l = 
>    case h of
>      Nothing -> ""
>      Just x | snull t -> x
>      Just x -> x ++ unFR t (\e seed -> ", " ++ e ++ seed) ""
>  where h = shead l
>        t = stail l

*Folds> concat_with_commas sempty
""
*Folds> concat_with_commas ("s1" `scons` sempty)
"s1"
*Folds> concat_with_commas ("s1" `scons` ("s2" `scons` ("abc" `scons` sempty)))
"s1, s2, abc"

The second example is Fibonacci

      fun fib 0 = 1
        | fib 1 = 1
        | fib n + 2 = (fib n) + (fib (n + 1))

which calls itself recursively twice. Still, the induction is
transparent here, and so a simple first-order fold suffices.

> sfib :: FR Integer -> Integer
> sfib l = fst (unFR l (\_ (a,b) -> (a+b,a)) (1,0))

*Folds> sfib (nstream 0)
1
*Folds> sfib (nstream 1)
1
*Folds> sfib (nstream 2)
2
*Folds> sfib (nstream 3)
3
*Folds> sfib (nstream 4)
5
*Folds> sfib (nstream 5)
8
*Folds> sfib (nstream 8)
34