The generators in our story come from the formula yield = exceptions + non-determinism
. We will mostly use the Haskell
implementation because of more revealing types;
porting to OCaml or other languages is straightforward. The basic
generator interface defines one (abstract) type GenT e m a
and two functions, for yielding and iterating -- along with the observation
function.
-- Basic interface type GenT e m a -- abstract instance Monad m => Monad (GenT e m) instance MonadTrans (GenT e) type LoopBody e m = e -> m () yield :: Monad m => e -> GenT e m () for_loop :: (MonadTrans t, Monad m, Monad (t m)) => GenT e (t m) () -> LoopBody e (t m) -> t m () observe :: (MonadTrans t, Monad m) => t m a -> m a
Gen e m a
is the type of computations that yield
values of the type e
in the monad m
, eventually returning the
value of the type a
. The particular implementation of generators used here
requires the monad m
be MonadPlus
. Likewise the monad transformer t
that appears in the signatures of for_loop
and observe
must be LogicT
. Other implementations of the basic generator interface
lift these requirements.
The primitive generator yield
(along with the fact that Gen e m
is a monad)
lets us write other generators. In this article, we will use two.
One is the generator that enumerates values within a given range.
Compared to the familiar Enum a => enumFromTo :: a -> a -> [a]
,
the generator does not create any list, even lazily.
Rather, it yields elements as they are being enumerated:
enumFromToG :: (MonadPlus m, Enum a) => a -> a -> GenT a m () enumFromToG xbeg xend | fromEnum xbeg <= fromEnum xend = yield xbeg >> enumFromToG (succ xbeg) xend enumFromToG xbeg xend = return ()The second sample generator yields labels from an in-order traversal of a tree with integer-labeled leaves. We have seen this generator many times in an earlier page on generators.
type Label = Int in_order2 :: (MonadIO m, MonadPlus m) => Tree -> GenT Label m ()
To perform the iteration, we connect a generator with the consumer of the generated values, that is, with a loop body. The sample loop:
sl1 = for_loop (enumFromToG 'a' 'e') $ \i -> lift . print $ icorresponds to the for-loop in Pascal
for i:='a' to 'e' do writeln(i)The second sample loop
for_loop (in_order2 (make_full_tree 3)) $ \i -> lift . print $ icannot be idiomatically expressed in Pascal. It can be written however, roughly in the same form, in CLU, which had generators. To execute the overall iterative computation we have to
observe
it,
in a base monad such as IO
.
The basic interface is an abstraction for simple generators, such as those
in CLU, which suffice for many cases. Parallel loops however need
more powerful generators, described by the advanced
interface below. The basic interface was the abstraction of the advanced one.
The advanced interface exposes the implementation of generators as
yield = exceptions + non-determinism
.
-- Advanced interface type GenT e m a = MonadPlus m => ErrorT e m a -- now the concrete type yield :: MonadPlus m => e -> GenT e m () yield x = raise x `mplus` return () for_loop :: (LogicT t, MonadPlus (t m), Monad m) => GenT e (t m) () -> LoopBody e (t m) -> t m () for_loop gen body = iter Nothing $ catchError' gen >>= body catchError' :: MonadPlus m => GenT e m () -> m e iter:: (LogicT t, MonadPlus (t m), Monad m) => Maybe Int -> t m () -> t m () msplit :: (LogicT t, MonadPlus (t m), Monad m) => t m a -> t m (Maybe (a, t m a))Since generators combine exceptions (the
ErrorT
monad) and non-determinism
(the LogicT
monad), they inherit the operations of those monads.
Specifically, catchError'
, an instance of catch
, obtains the value
yielded by a generator. The combinator iter
, a particular case of
bagOfN
, takes a non-deterministic computation t m ()
and
executes all of its choices, one-by-one. If the first argument
of iter
is Just n
, only the first n
choices are executed and the rest
are discarded. Effectively, iter
`runs' a non-deterministic computation,
actualizing all its non-deterministic choices. For example, desugaring
the first sample loop gives the following advanced code:iter Nothing $ do i <- catchError' $ enumFromToG 'a' 'e' lift . print $ iThe loop body non-deterministically obtains the value yielded by the generator, and prints it. The
iter Nothing
line -- the loop header, so
to speak -- makes the body run through all choices, that is,
all generated values.Generators: yield = exceptions + non-determinism
The detailed description of the generators used on this page
GenLoop.hs [5K]
The complete code for this article
nl1 = observe $ for_loop (in_order2 (make_full_tree 3)) $ \i -> for_loop (enumFromToG 'a' 'e') $ \j -> lift . print $ (i,j)The print-out confirms -- although we have never doubted -- that for each value of
i
, j
ranges over all its possible values.
Since the inner j
-loop is within the i
-loop body, the variable i
scopes over the entire inner loop, including its generator. That is,
the inner loop generator may refer to the outer-loop variable, making
loop dependence obvious:
nl2 = observe $ for_loop (in_order2 (make_full_tree 3)) $ \i -> for_loop (enumFromToG i 3) $ \j -> lift . print $ (i,j)The binding scope of a loop variable matches the scope of iteration of the corresponding loop.
The story picks up as we desugar the nested loop nl1
and rearrange the code
using equational reasoning. We obtain a different way of writing
nested loops, now using the advanced interface:
nl1' = observe $ iter Nothing $ do -- loop header i <- catchError' (in_order2 (make_full_tree 3)) j <- catchError' (enumFromToG 'a' 'e') lift . print $ (i,j) -- loop bodywhich we will write in the abbreviated form as
nl1' = observe $ iter Nothing $ do i <- e1 j <- e2 bodyThe binding and control scopes are a bit harder to see now. Recall that the Haskell notation
do i <- e1 j <- e2 bodyis a syntactic sugar for
e1 >>= (\i -> e2 >>= (\j -> body))exposing that
i
scopes over body
and over e2
-- as was the
case in the original nl1
. The repeated do
-bindings corresponds to
what other languages call `nested let' or `repeated let':let i = e1 in let j = e2 in body
The control scope also becomes clear if we look at the evaluation context
of the first generator e1
:
do i <- [] j <- e2 bodyThe context includes the generator
e2
. The non-deterministic
expression e1
replicates its context; it plugs the first yielded
value into the first context replica, the second yielded value into
the second replica, etc. Each replica has the copy of e2
, the generator
for the j
-loop, which runs anew in each replicated copy. The body
is
thus executed at each point in the Cartesian product of the domain of i
and the domain of j
. The loop nl1'
indeed has the same behavior
as the original nl1
. The binding scope of a loop variable and the
control scope -- the evaluation context of the generator --
still match.
Desugaring of nl2
re-iterates the point that the second generator
is within the binding scope of the loop variable i
associated
with the first generator.
nl2' = observe $ iter Nothing $ do -- loop header i <- catchError' (in_order2 (make_full_tree 3)) j <- catchError' (enumFromToG i 3) lift . print $ (i,j) -- loop bodyThe binding scope is lexical and hence is easier to see than the control scope. Since the two scopes match, we can tell which generator holds control over which by examining the binding scopes of their loop variables.
Although the desugared form of for_loop
is less pretty to look at,
it is more insightful. There is only one binding form for loop
variable: do
-binding, which is just the lambda-binding. Lambda
provides binding and generators provide iteration. The binding scope
of a loop variable matches the control scope of the iteration.
let
- or do
-bindings
with generators as binding expressions correspond to nested loops.
Many programming languages offer so-called parallel bindings,
such as the following OCaml form:let i = e1 and j = e2 in bodyAlthough
i
still scopes over body
it no longer scopes
over e2
. If e2
contains an occurrence of i
, it would
remain free. The following two expressions should help see the difference
in binding scope:let i = 1 in let i = i + 1 in let j = i + i in (i,j) let i = 1 in let i = i + 1 and j = i + i in (i,j)The former, with all nested
let
s, evaluates to (2,4)
; the latter
evaluates to (2,2)
.
The bindings are called parallel because e1
and e2
no longer have
any data dependency and may be evaluated in any order: e2
can be
evaluated without waiting for the value to be bound to i
.
The expressions e1
and e2
may even be evaluated in parallel, and the
program should not -- ought not to -- notice.
Our story comes to a decisive point. Can we realize parallel bindings,
first in Haskell, to maintain not only data independence of e1
and e2
but also their control independence? Can we use parallel bindings
to implement parallel loops?
Parallel let
such as let i = e1 and j = e2 in body
may be regarded, in the first approximation, as a syntactic sugar for (fun (i,j) -> body) (e1,e2)
. The Haskell implementation starts off easy: pair e1 e2 >>= \ (i,j) -> body
. What remains is to determine the zip
-like combinator to pair-up two generators. The naive implementation:
seq2G :: Monad m => m t -> m t1 -> m (t, t1) seq2G e1 e2 = do i <- e1 j <- e2 return (i,j)alas is not satisfactory:
e2
is in the evaluation context of e1
, and
hence control-dependent on e1
. Somehow we have to exclude e1
and e2
from the evaluation contexts of each other, we have to isolate
non-deterministic effects of e1
and e2
and prevent them from
affecting each other.
Fortunately, LogicT
lets us run a non-deterministic computation
in an isolated `sand-box'.
msplit :: (LogicT t, MonadPlus (t m), Monad m) => t m a -> t m (Maybe (a, t m a))That is,
msplit e1
runs e1
in the sandbox and reports the outcome: Nothing
if e1
fails or Just (x,e1')
if e1
produced x
;
the computation e1'
represents the remaining non-deterministic choices.
Technically speaking, msplit
reifies the non-determinism effect
as a Maybe (a, t m a)
value. This sandbox isolation is exactly what is
needed for the parallel combinator:par2G :: (LogicT t, MonadPlus (t m), Monad m) => t m t1 -> t m t2 -> t m (t1, t2) par2G e1 e2 = do iR <- msplit e1 jR <- msplit e2 case (iR,jR) of (Just (i,e1'), Just (j,e2')) -> return (i,j) `mplus` par2G e1' e2' _ -> mzeroWe determine the first choice of
e1
and, separately, the first choice
of e2
, and pair them up as the first choice of the par2G e1 e2
.The story concludes with the parallel loop
pl1' = observe $ iter Nothing $ do -- loop header (i,j) <- par2G (catchError' (in_order2 (make_full_tree 3))) (catchError' (enumFromToG 'a' 'e')) lift . print $ (i,j) -- loop bodyLexically, the two generator expressions are not in scope of either loop variable binding. The print-out shows the two generators indeed run in parallel. Neither of the two generators is in the control scope of the other. Binding and control scopes match again.
Our pairing combinator par2G
does not report which of the two generators
finished first. We describe a more general combinator below.
The Haskell and OCaml generators derived on this page effectively
run on the spaghetti stack, one may say, which they inherit from
the implementation of non-determinism. The Haskell and OCaml
generators hence can run side-by-side. Recall that
our generators can be represented as -- reified into -- a lazy stream
(see msplit
). Writing a zipWith
-like function to process two
streams in parallel is straightforward then.
We demonstrate the parallel composition of Haskell and OCaml generators on the famous ``same-fringe problem'', determining if two binary trees have the same fringe, that is, yield the same sequence of labels when traversed in a particular order. Crucially, we must stop the traversal as soon as the mismatch is found, returning it. To make clear the early traversal termination, we print the labels as we traverse the tree. The printing operation makes the traversal effectful and hence precludes lazy lists.
We solve the problem by defining the parallel loop -- the
zipWith
-like combinator for streams. We show the Haskell code first.
To let the zipping function deal with the streams of unequal size, we
make its argument types to be Maybe a
, with Nothing
signifying stream's end:
zipWithL :: (Monad m, LogicT t, MonadPlus (t m)) => (Maybe a -> Maybe b -> c) -> t m a -> t m b -> t m c zipWithL f m1 m2 = do r1 <- msplit m1 r2 <- msplit m2 case (r1,r2) of (Nothing, Nothing) -> return $ f Nothing Nothing (Just (v1,_),Nothing) -> return $ f (Just v1) Nothing (Nothing,Just (v2,_)) -> return $ f Nothing (Just v2) (Just (v1,t1),Just (v2,t2)) -> return (f (Just v1) (Just v2)) `mplus` zipWithL f t1 t2The function
same_fringe
below takes two generator expressions and
runs them side-by-side, generating the stream of mismatches.
The one-element stream of (Nothing,Nothing)
denotes the complete match.same_fringe :: (Monad m, LogicT t, MonadPlus (t m), Eq a) => ErrorT a (t m) () -> ErrorT a (t m) () -> t m (Maybe a, Maybe a) same_fringe m1 m2 = zipWithL (,) (catchError' m1) (catchError' m2) >>= check where check r@(Nothing,_) = return r check r@(_,Nothing) = return r check r@(Just x,Just y) | x /= y = return r check _ = mzeroThe following function compares the in-order sequences of labels of two trees, returning the first mismatch.
sfringe_test :: MonadIO m => Tree -> Tree -> m (Maybe Label, Maybe Label) sfringe_test t1 t2 = observe $ same_fringe (in_order2 t1) (in_order2 t2)The generator
in_order2
, described earlier on this page,
prints the trace of the traversal. The trace reveals that the two
trees are indeed traversed side-by-side; the traversal stops
as soon as the first mismatch is found. We may also print, say,
the first three mismatches.The OCaml implementation is the same. We define a parallel loop over two generators:
let for_loop_2 p gen1 gen2 body : unit = let rec loop = function | (Nil,Nil) -> body None None | (Cons (v1,_),Nil) -> body (Some v1) None | (Nil,Cons (v2,_)) -> body None (Some v2) | (Cons (v1,t1), Cons (v2,t2)) -> body (Some v1) (Some v2); loop (t1(),t2()) in loop (msplit p gen1, msplit p gen2)employing it to write the generator for mismatches of two other generators
gen1
and gen2
:let pmismatch : ('a option * 'a option) stream Delimcc.prompt = new_prompt () let same_fringe p gen1 gen2 = for_loop_2 p gen1 gen2 (fun x y -> match (x,y) with | (None,_) | (_,None) as r -> yield pmismatch r | (Some x,Some y) as r when not (x = y) -> yield pmismatch r | _ -> ())We may run
same_fringe
as any other generator, to obtain the first
mismatch or the first n
mismatches.test_gen.ml [7K]
The complete code for the OCaml examples.
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML