Generators have become popular because they give the benefit of lazy
evaluation -- modular on-demand processing -- in strict languages or
when lazy evaluation does not apply because the computation has
observable effects such as IO. Our main result clarifies the meaning
of generators and may lead to new ways of implementing them:
computations that yield decompose into exceptions and non-determinism. To be precise if abstruse, a monad of generators is an exception-monad--transformed LogicT monad. The one-line Haskell definition of yield should clarify:

    yield x = raise x `mplus` return ()
We come to the main result after an overview of the origins of generators as the encapsulation of traversal of a real or virtual collection; a non-deterministic computation being an example of the latter. We focus on the convenience of defining generators for new collections and on building generators by composition. Yield then arises as a mode of composition.
Our main result sprouts the implementations of generators in Haskell, logic programming and ML -- implementations that are correct and working on the first try. Iterators like the for-loop are programmed-in rather than built-in. We program not only customary traversals but also accumulating ones, not only generator statements but also value-producing generator expressions, not only nesting of generators but also running them side-by-side.
I am grateful to Roman Cheplyaka for helpful discussions and suggestions.
About two years later, generators as the fundamental control structure appeared in the programming language Icon. Icon, like its predecessor SL5, focused on goal-directed evaluation, expressing non-deterministic computations. A generator in Icon is an expression that may produce several values on demand. The expression's context determines how many values will actually be produced. Again there was a separation between obtaining (candidate) values and consuming the values, possibly demanding more. A non-deterministic operation hides the details of generating the candidates and encapsulates the generation state. It represents the traversal of a virtual collection whose elements are computed during the enumeration, as needed.
However different Alphard and Icon are, they both have a for-loop, implemented in terms of the generator producing integers from i through j on demand -- called upto(i,j) in Alphard and i to j in Icon.
I thank Richard O'Keefe for pointing out early AI languages.
Barbara Liskov: A History of CLU.
MIT LCS Technical Report 561. April 1992.
Section 3.10. Iterators
< http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-561.pdf >
Ralph E. Griswold: An Overview of the Icon Programming Language; Version 9. March 2, 1996.
Section 2.2. Generators
< http://www.cs.arizona.edu/icon/docs/ipd266.htm >
Daniel G. Bobrow and Bertram Raphael: New Programming languages for AI research
Xerox Palo Alto Research Center Report CSL-73-2. August 20, 1973.
< http://www.bitsavers.org/pdf/xerox/parc/techReports/CSL-73-2_New_Programming_Languages_For_AI_Research.pdf >
Generators also appeared in many AI languages of the early 1970s such as SAIL, CONNIVER, QLISP/INTERLISP, POPLER -- alongside backtracking, co-routines and other control facilities. Icon stands out in its systematic use of generators as the fundamental control structure, on which backtracking and other control facilities are built.
Generators in Alphard and CLU encapsulate the enumeration of a collection. They have a simple interface with operations to test for the end of the enumeration, to obtain the current element, and to advance the enumeration. This is the interface of streams, or lazy lists (lists whose cons-cells and elements are produced on demand).
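This three-operation interface can be sketched in Python (our illustration; the names are hypothetical, not from Alphard or CLU), with a stream represented as either None (the end) or a pair of the current element and a thunk for the rest:

```python
# A minimal sketch of the generator interface described above:
# test for the end, obtain the current element, advance.
# A "stream" is None (end) or a pair (element, thunk producing the rest).

def count_from(n):
    """A virtual collection: the integers from n on, produced on demand."""
    return (n, lambda: count_from(n + 1))

def is_done(s):  return s is None    # test for the end of the enumeration
def current(s):  return s[0]         # obtain the current element
def advance(s):  return s[1]()       # advance the enumeration

def take(k, s):
    """Consume at most k elements; the rest of the stream is never built."""
    out = []
    while k > 0 and not is_done(s):
        out.append(current(s))
        s = advance(s)
        k -= 1
    return out

print(take(3, count_from(5)))   # → [5, 6, 7]
```

Even though count_from describes an infinite virtual collection, only the demanded prefix is ever computed.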
Generators in Icon at first blush seem different. Here is an excerpt from Section 2.2 of `An Overview of the Icon Programming Language.'
The results that a generator produces depend on context. In a situation where only one result is needed, the first is produced, as in

    sentence := "Store it in the neighboring harbor"
    i := find("or", sentence)

which assigns the value 3 to i. If the result produced by a generator does not lead to the success of an enclosing expression, however, the generator is resumed to produce another value. An example is

    if (i := find("or", sentence)) > 5 then write(i)

Here the first result produced by the generator, 3, is assigned to i, but this value is not greater than 5 and the comparison operation fails. At this point, the generator is resumed and produces the second position, 23, which is greater than 5. The comparison operation then succeeds and the value 23 is written.
These simple examples are straightforward to implement with lazy lists, even more so in Haskell, whose native lists are lazy and which has a rich (lazy) list processing library. First we write the analogue of Icon's find function, to obtain the index of an occurrence of the pattern pat in the string str.

    findIL :: String -> String -> [Int]
    findIL pat str = findIndices (isPrefixOf pat) . tails $ str

Since there may be several such occurrences, or none, Icon's find is non-deterministic, and so is our findIL, as seen from its type. We take advantage of the standard library function tails to compute the list of all suffixes of str, from str itself down to the empty string. The standard library function findIndices checks each element against the predicate isPrefixOf pat and returns the indices of the matching elements, that is, the positions of all occurrences of pat. The clarity of the algorithm gives confidence in its correctness, but also worry about its efficiency. The appeal of lazy lists is the removal of that worry: none of these suffix and index computations are done unless demanded. The demand, in Icon and in Haskell, comes from the context. In our example, the context needs a value to bind to i.

    sentence = "Store it in the neighboring harbor"

    find_ex1 = do
      case filter (>4) $ findIL "or" sentence of
        (i:_) -> print i
        []    -> return ()

The case expression asks for the first element in the list of indices that is greater than 4 (indexing in Haskell starts with 0). The filter expression asks for the first index, filters it out, and asks for the second one, 22. The latter index passes the test, is incorporated in the result list, and is eventually bound to i and printed. Since no more indices are demanded, the sentence is not searched past the second occurrence of "or". The example Icon and Haskell code indeed execute similarly.

The context of a generator may also ask for all values in the generator's collection. Here is Icon again:
A generator can be resumed repeatedly to produce all its results by using the every-do control structure. An example is

    every i := find("or", sentence) do write(i)

which writes all the positions at which "or" occurs in sentence. For the example above, these are 3, 23, and 33. Generation is inherited like failure, and this expression can be written more concisely by omitting the optional do clause:

    every write(find("or", sentence))
A lazy list likewise can actually produce all of its elements, in the context of an operation that fully traverses the list and prints its elements. The similarity of the Haskell and Icon code here is striking.
    mapM_ print $ findIL "or" sentence
It is regrettable that the similarity between Haskell lazy lists and Icon generators holds only for simple examples. We need better abstractions to chain and combine the generators. Haskell lazy lists become less suitable for generators with a lot of state -- and of no use when the generation has an observable effect such as IO.
The interface of MonadPlus is not sufficient either. At the very least, we need LogicT.

We again turn to `An Overview of the Icon Programming Language,' this time for larger examples:
There are several other control structures related to generation. One is alternation,

    expr1 | expr2

which generates the results of expr1 followed by the results of expr2. Thus

    every write(find("or", sentence1) | find("or", sentence2))

writes the positions of "or" in sentence1 followed by the positions of "or" in sentence2. Again, this sentence can be written more compactly by using alternation in the second argument of find():

    every write(find("or", sentence1 | sentence2))

Another use of alternation is illustrated by

    (i | j | k) = (0 | 1)

which succeeds if any of i, j, or k has the value 0 or 1.

Writing nested generators like those above with Haskell lists becomes sufficiently cumbersome, prompting us to adopt a better abstraction. First, we change the primitive building blocks: instead of the standard list constructors [] and (:), we will use mzero for the empty generator, return for the singleton one, mplus to model alternation and (>>=) to model chaining. These are the four operations of MonadPlus. Although the operations can be expressed in terms of lists -- lists are one implementation of MonadPlus -- we shall maintain the abstraction and treat the operations as primitive.
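As a concrete (and deliberately naive) illustration of these four operations, here is a small Python model using plain lists; the names unit, mplus, bind and find are ours, chosen to mirror the Haskell operations:

```python
# A Python model (our sketch, not from the original) of the four
# MonadPlus operations, with plain lists as the implementation.

mzero = []                         # the empty generator

def unit(x):                       # 'return': the singleton generator
    return [x]

def mplus(m1, m2):                 # alternation: results of m1, then of m2
    return m1 + m2

def bind(m, k):                    # chaining ('>>='): feed each result to k
    return [y for x in m for y in k(x)]

def find(pat, s):                  # all positions of pat in s (0-based)
    return [i for i in range(len(s)) if s.startswith(pat, i)]

# Icon's  every write(find("or", sentence1 | sentence2))  becomes:
sentence = "Store it in the neighboring harbor"
positions = bind(mplus(unit(sentence), unit("Sort of")),
                 lambda s: find("or", s))
print(positions)   # → [2, 22, 32, 1]
```

Unlike Haskell lists, these Python lists are strict; the model captures only the algebra of the operations, not the on-demand evaluation.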
We easily write the last Icon example as tcomp below. First we define the equivalent of Icon's equality comparison, which fails if the equality does not hold.

    equal :: MonadPlus m => m Int -> m Int -> m Int
    equal i j = do
      iv <- i
      jv <- j
      if iv == jv then return iv else mzero

    tcomp i j k = (i `mplus` j `mplus` k) `equal` (return 0 `mplus` return 1)

We could have simplified the notation by writing 0 instead of return 0, taking advantage of overloaded numeric literals in Haskell. Although MonadPlus lets us write tcomp, it does not, however, let us use it, for example, in a conditional operator. We cannot implement the conditional, the negation-as-failure, or the collector of all generated values -- all the operations that test, query and advance the generator. MonadPlus lets us build non-deterministic computations, but not examine them. We need a richer interface, such as LogicT. The latter extends MonadPlus with the operation msplit that implements the minimal generator interface, of testing for emptiness, obtaining the current element and advancing. The LogicT library includes the Icon-style conditional, called ifte, written in terms of msplit. We are now able to use our tcomp:

    tcomp_ex1 :: (LogicT t, Monad m, MonadPlus (t m)) => t m String
    tcomp_ex1 = ifte (tcomp (return 2) (return 1) (return 3))
                     (\i -> return $ "Yes: " ++ show i)
                     (return "No")

To implement the first Icon example in the quoted excerpt, we need the analogue of Icon's find. We `lift' our earlier implementation in terms of lists to an arbitrary MonadPlus:

    -- The MonadPlus analogue of the lazy cons
    mcons :: MonadPlus m => a -> m a -> m a
    mcons h t = return h `mplus` t

    -- The foldr below replaces nil with mzero and cons with mcons
    findIM :: MonadPlus m => String -> String -> m Int
    findIM pat str = foldr mcons mzero $ findIL pat str

To make the example better, we pretend that sentence1 and sentence2 read their sentences from a file. (We only pretend, printing a trace message instead.) Our generators now have an observable effect, ruling out Haskell lists.
Fortunately, LogicT has other implementations. In fact, LogicT is a monad transformer, parameterized by the monad responsible for the underlying effects.

    sentence1, sentence2 :: (MonadIO (t IO), LogicT t) => t IO String
    sentence1 = liftIO (putStrLn "sentence1") >> return sentence
    sentence2 = liftIO (putStrLn "sentence2") >> return "Sort of"

    twosen = observe $ iter Nothing $
      (liftIO . print =<< findIM "or" =<< sentence1 `mplus` sentence2)

The Haskell code looks quite similar to the Icon code, especially if we view =<< as a (call-by-value) function application. The function iter is a variant of bagofN of the LogicT library. The latter accumulates all (if the first argument is Nothing) or the first several values produced by a non-deterministic computation, returning the accumulated list. The function iter throws away the result, since the computations were executed for their side effects. Both bagofN and iter are written in terms of msplit.

GenCode.hs [15K]
The complete code for the Haskell examples.
We now turn to yield. We introduce typical examples and show several unsatisfactory attempts to implement them with the already introduced tools such as LogicT. In the languages with generators, yield is a keyword, a special syntactic form with dedicated support by the run-time system. We would like, however, to define yield as a regular library function, in terms of existing language features. In this section, we try several such features, without full satisfaction. We achieve our goal in the next section.

The generators so far have been built `by hand' (as lists) or by chaining and alternation. Although theoretically sufficient, these methods become ungainly when building generators encapsulating traversals with a large amount of state. In a recursive traversal function, the state is implicit in the local variables of the recursive invocations of the function. When writing generators, or zippers, the state has to be explicated, to be manipulated by the three operations of the generator interface (emptiness test, query and advance). Generators encapsulating traversal are closely related to zippers. Huet zippers, however, deal with algebraic data structures; the algebraic structure helps in writing zippers, essentially by differentiation. Generators are more general, encapsulating traversals of virtual collections, with arbitrary effects.
When CLU borrowed generators from Alphard, it introduced a new way of building them. Rather than writing the emptiness test, the query and the advance operations and packaging them up, a CLU programmer would use yields. The only way to write a generator (called iterator) in CLU was by suspending, usually, an iterative computation. Since all iterative computations in CLU were built around iterators, yields was a mode of composing them. The inspiration for yields came from Simula 67 coroutines. Icon has a similar form, called suspend, borrowed from SL5, also inspired by Simula's coroutines.
Here is an example of suspend in Icon from the Icon overview. Applying the procedure findodd defined below to the strings s1 and s2 non-deterministically gives all odd-valued positions at which s1 occurs in s2.

    procedure findodd(s1, s2)
      every i := find(s1, s2) do
        if i % 2 = 1 then suspend i
    end

Since find is itself a generator in Icon, findodd is a derived generator (in our case, a `filtered' generator), which gives us the idea of writing findodd with just a MonadPlus:

    findodd :: MonadPlus m => String -> String -> m Int
    findodd s1 s2 = do
      i <- findIM s1 s2
      if i `mod` 2 == 1 then return i else mzero

Alas, the translation from Icon to Haskell is not idiomatic; we would like to reproduce the explicit iteration over find's collection. We accomplish that in the next section.
The construct suspend, often called yield, has become the emblem of generators, spreading to many other languages. Our next example comes from Python, which, since version 2.2, has generators and the keyword yield to build them. The example is quoted from David Mertz's Charming Python article, as an elegant, exemplary generator. The similarity to Icon is quite striking.

    >>> # A recursive generator that generates Tree leaves in in-order.
    >>> def inorder(t):
    ...     if t:
    ...         for x in inorder(t.left):
    ...             yield x
    ...         yield t.label
    ...         for x in inorder(t.right):
    ...             yield x
We describe several attempts at implementing this example in Haskell. First, we need a tree datatype

    type Label = Int
    data Tree  = Leaf | Node Label Tree Tree deriving Show

and a sample complete tree tree1 that looks as follows:

    Node 1 (Node 2 (Node 4 Leaf Leaf) (Node 5 Leaf Leaf))
           (Node 3 (Node 6 Leaf Leaf) (Node 7 Leaf Leaf))
The first attempt builds the in_order generator by alternation. The current element may come from the left branch, from Node's label, or from the right branch:
    in_order1 Leaf = mzero
    in_order1 (Node label left right) =
      in_order1 left `mplus` return label `mplus` in_order1 right

We can either collect the labels of all traversed nodes in a list, printing it afterwards, or print the labels as we go:

    in_order1_r  = observe (bagofN Nothing $ in_order1 tree1) >>= print
    -- [4,2,5,1,6,3,7]
    in_order1_r' = observe $ iter Nothing $ in_order1 tree1 >>= liftIO . print

This attempt is not satisfactory. Our translation has replaced chaining, implicit in the Python code, by alternation. That trick will not work in general, e.g., when traversing the tree post-order and yielding at each node the sum of the labels, including all descendants. We need the result of a branch traversal, and so we cannot replace chaining (bind) with alternation (mplus).
Traversing with yield looks quite similar to the logging of values as we compute -- to the Writer monad. It is tantalizing to implement yield as tell. It indeed works, in the first approximation. Here is our traversal generator:
    in_orderW :: (MonadWriter [Label] m, MonadIO m) => Tree -> m ()
    in_orderW Leaf = return ()
    in_orderW (Node label left right) = do
      in_orderW left
      liftIO . putStrLn $ "traversing: " ++ show label
      tell [label]
      in_orderW right
We have added the statement to print the label, the trace, right before `yielding.' Alas, this code is too strict, making the traversal not incremental. We have to complete the traversal before we get hold of the accumulated `log'. The following makes the problem clear:
    in_orderW_r = do
      (_, labels) <- runWriterT $ in_orderW tree1
      let some_labels = take 3 labels
      print some_labels
We only wanted the first three labels encountered during the traversal. The printed trace shows that we have traversed the whole tree nevertheless.
Our last attempt uses `final zippers', implemented with the library of delimited control, the CC monad.

    in_orderCC :: Monad m => Tree -> CC (P m Label) m ()
    in_orderCC Leaf = return ()
    in_orderCC (Node label left right) = do
      in_orderCC left
      yield label
      in_orderCC right

The code is clear and idiomatic. Yet we hoped to avoid using delimited control explicitly, counting on gaining insight into delimited control and its idioms. We hoped that LogicT might turn out to be sufficient.
Generator1.hs [3K]
Generator2.hs [7K]
Implementing generators directly with delimited continuations, using the CCMonad libraries.
GenCode.hs [15K]
The complete code for the Haskell examples.
We now derive yield by equational reasoning.
The earlier examples have likened yield to raising an exception: both yield and raise set up a side-channel for `side-results' of a computation. Both yield and raise leak values from the depths of an expression. Unlike raise, yield gives a choice to continue the computation. We need non-determinism then. Thus, we need a monad with the operations raise and mplus. The former should satisfy the law

    raise e >>= k  ===  raise e

formalizing that raising an exception aborts the rest, k, of the computation. The operation mplus should satisfy the law of left-distributivity:

    (m1 `mplus` m2) >>= k  ===  (m1 >>= k) `mplus` (m2 >>= k)

formalizing that a choice splits the computation. Monad transformers let us build the monad with the required properties, by applying the exception monad transformer, ErrorT, to an instance of MonadPlus.
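The two laws can be checked concretely in a small model of our own devising: we represent a computation in the exception-transformed non-deterministic monad as a Python list of tagged values, ('L', e) for a raised exception and ('R', a) for a normal result. This is only a sketch of the semantics, not the Haskell implementation:

```python
# Our model of the exception transformer applied to the list MonadPlus:
# a computation is a list of ('L', e) and ('R', a) entries.

def unit(x):       return [('R', x)]      # return
def raise_(e):     return [('L', e)]      # raise
def mplus(m1, m2): return m1 + m2         # non-deterministic choice

def bind(m, k):                           # '>>=': Lefts abort, Rights continue
    out = []
    for tag, v in m:
        out.extend(k(v) if tag == 'R' else [(tag, v)])
    return out

k = lambda x: unit(x * 10)

# The law of exceptions:  raise e >>= k  ===  raise e
assert bind(raise_('e'), k) == raise_('e')

# Left distributivity: (m1 `mplus` m2) >>= k === (m1 >>= k) `mplus` (m2 >>= k)
m1, m2 = unit(1), mplus(raise_('x'), unit(2))
assert bind(mplus(m1, m2), k) == mplus(bind(m1, k), bind(m2, k))
print("both laws hold in the model")
```

The opposite layering, in which the choices live under the exception (essentially Either e [a]), loses the raised exception's neighbors and fails left distributivity.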
(Traditionally, the transformer ErrorT e has required the type of the exception e to be a member of the Error class. Since a generator should be able to yield values of any type, we need an error monad transformer free from this silly restriction. Such a transformer is called EitherT.) Since both monads have control effects, the order of the composition matters: applying a MonadPlus transformer to an exception monad fails the left distributivity (we leave showing that as an exercise).
Let e be a complex computation, which can be written as

    e = m >>= k

or, eta-expanded,

    e = m >>= \x -> k x

Let us regard m as an `already computed' part of e, with k being the remaining part. Since m is already computed, it is equivalent to return v, where v is the (intermediate) result, to be bound to x and used in the remainder of e. Let us calculate:
    ey =def= m >>= \x -> (raise x `mplus` return ()) >> k x
        ===  m >>= \x -> (raise x `mplus` return ()) >>= \() -> k x
          {left distributivity}
        ===  m >>= \x -> (raise x >>= \() -> k x) `mplus` (return () >>= \() -> k x)
          {the law of exception}
        ===  m >>= \x -> raise x `mplus` (return () >>= \() -> k x)
          {return is the left unit of bind}
        ===  m >>= \x -> raise x `mplus` (k x)
          {recall that m === return v}
        ===  (raise v) `mplus` (m >>= k)
        ===  (raise v) `mplus` e
Thus ey is a non-deterministic computation that gives us a choice of e and of the intermediate result of e, up to the point m. We have obtained that

    yield :: MonadPlus m => e -> EitherT e m ()
    yield x = raise x `mplus` return ()

`splits' the computation at the point of its use, yielding as the first choice the intermediate result incorporated in the exception. We have unwittingly obtained the typing of generators: MonadPlus m => EitherT e m a is the type of a generator returning the result of type a while yielding intermediate results of type e. The derived yield lets us implement the examples of generators from the previous section -- this time, idiomatically.
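The derived equation ey === (raise v) `mplus` e can also be checked concretely, in a small Python model of our own (a sketch, not the Haskell implementation), where a computation is a list of ('L', yielded) and ('R', result) entries:

```python
# Checking the derivation in a tagged-list model (our sketch):
# ('L', x) plays raise x, ('R', a) plays return a, list append plays mplus.

def unit(x):       return [('R', x)]
def raise_(e):     return [('L', e)]
def mplus(m1, m2): return m1 + m2

def bind(m, k):                      # Lefts abort the rest; Rights feed k
    out = []
    for tag, v in m:
        out.extend(k(v) if tag == 'R' else [(tag, v)])
    return out

def yield_(x):                       # yield x = raise x `mplus` return ()
    return mplus(raise_(x), unit(()))

v = 7
m = unit(v)                          # m is 'already computed': return v
k = lambda x: unit(x + 1)            # the remainder of the computation

e  = bind(m, k)                      # e  = m >>= k
ey = bind(m, lambda x: bind(yield_(x), lambda _: k(x)))

assert ey == mplus(raise_(v), e)     # ey === (raise v) `mplus` e
print(ey)   # → [('L', 7), ('R', 8)]
```

The result exhibits both channels at once: the yielded intermediate value as the first choice, and the final result of the computation as the second.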
We have derived yield, expressing it in terms of exceptions and alternation. We can now implement the examples of building generators by suspension -- this time, with satisfaction. We start with the example of the in-order tree traversal, yielding the node labels encountered along the way. The code below reads as it sounds. We have added the statement to print the trace of the encountered nodes, so we can later tell how far we have gone.

    in_order2 :: (MonadIO m, MonadPlus m) => Tree -> EitherT Label m ()
    in_order2 Leaf = return ()
    in_order2 (Node label left right) = do
      in_order2 left
      liftIO . putStrLn $ "traversing: " ++ show label
      yield label
      in_order2 right
We emphasize that in Python, Ruby, etc., yield is a keyword and generators are built-in. In Haskell, yield is a regular, small, user-defined function, and generators are programmed-in.
Recall that a yielded value is transmitted as an exception. To use the value, we have to catch the exception -- to `run' the EitherT monad:

    catchError :: Monad m => EitherT e m e -> m e
    catchError (EitherT m) = m >>= check
      where check (Left x)  = return x
            check (Right x) = return x

A reader familiar with delimited continuations may notice that catchError is essentially reset. Merging the `side-channel' into the `main channel' is the function of the exception handling form, or reset.
If the generator only yields, returning no useful result, the following variant of catchError is fitting. The code supports the convention, standard in CLU, Icon and other languages, that the normal return from a generator is the end of the traversal.

    catchError' :: MonadPlus m => EitherT e m () -> m e
    catchError' (EitherT m) = m >>= check
      where check (Left x)  = return x
            check (Right x) = mzero

We traverse our sample tree1 completely:

    in_order2_r :: IO ()
    in_order2_r = observe $ iter Nothing $ do
      i <- catchError' (in_order2 tree1)
      liftIO . putStrLn $ "Generated: " ++ show i

or partially, asking only for the first two elements:

    in_order2_r' :: IO ()
    in_order2_r' = observe $ iter (Just 2) $ do
      i <- catchError' (in_order2 tree1)
      liftIO . putStrLn $ "Generated: " ++ show i

The trace confirms the incremental, on-demand traversal. The trace of using a node label is printed right after the trace of encountering the label during the traversal. In the second example, the traversal stopped right after the desired two elements had been consumed.
We implement the more complex post-order example, yielding at each node the sum of the labels for the subtree rooted at the node. The generator now returns a useful value (the computed sum for all labels in the tree), in addition to yielding the intermediate results. We print the intermediate results, as they are computed, and the final result.
    post_order :: MonadPlus m => Tree -> EitherT Label m Label
    post_order Leaf = return 0
    post_order (Node label left right) = do
      sum_left  <- post_order left
      sum_right <- post_order right
      let sum = sum_left + sum_right + label
      yield sum
      return sum

    post_order_r = observe $ iter Nothing $
      catchError (post_order tree1) >>= liftIO . print
We finish with the Icon example of yielding while consuming the results of another generator.
    procedure findodd(s1, s2)
      every i := find(s1, s2) do
        if i % 2 = 1 then suspend i
    end

Our translation now preserves the structure of Icon's code. The function iterE below is our old iter `lifted' to the EitherT monad, propagating exceptions.

    findodd2 :: (Monad m, LogicT t, MonadPlus (t m)) =>
                String -> String -> EitherT Int (t m) ()
    findodd2 s1 s2 = iterE Nothing $ do
      i <- findIM s1 s2
      if i `mod` 2 == 1 then yield i else return ()
Thus yield turns any traversal into a generator. We confirm this result, literally. Any abstract data type that implements the Data.Foldable interface implements the generator interface. In fact, any (collection) type T with an operation like

    mapM_ :: Monad m => (Element T -> m b) -> T -> m ()

that lets a monadic action examine its elements (of the type Element T) implements the generator interface. The claim holds regardless of the implementation of the data type, whether it is a data structure or whether the elements to traverse are computed on the fly.

The proof of the claim is constructive, short, and trivial:

    foldable_gen :: (MonadPlus m, F.Foldable t) => t a -> EitherT a m ()
    foldable_gen = F.mapM_ yield

The type says it all: any Foldable is a generator.
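The claim can be illustrated in Python (our sketch, with names of our choosing): any object supporting element-wise traversal becomes a generator by yielding each visited element, and the traversal proceeds only as far as demanded:

```python
import itertools

# Our Python analogue of foldable_gen: anything traversable becomes
# a generator by yielding each visited element.
def foldable_gen(t):
    for x in t:
        yield x

def naturals():            # an infinite 'virtual collection'
    n = 0
    while True:
        yield n
        n += 1

# The traversal is done only to the extent needed: three elements.
first3 = list(itertools.islice(foldable_gen(naturals()), 3))
print(first3)   # → [0, 1, 2]
```

That the traversal of an infinite collection terminates shows the on-demand character of the generated stream.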
As an example, the following code uses the generator to extract the first three elements of a foldable and return them in a list. The traversal is done only to the extent needed to obtain the desired three elements.
    trav :: (Monad m, F.Foldable t) => t a -> m [a]
    trav t = observe $ bagofN (Just 3) $ catchError' $ foldable_gen t
Zipper from any Traversable
The zipper does need Traversable, or an operation like

    mapM :: Monad m => (a -> m b) -> t a -> m (t b)

since the result of unzipping has to be a (possibly modified) input data structure t b. A generator merely visits the elements of the data structure; it does not have to rebuild and return it. Thanks to Roman Cheplyaka for this observation of the difference between the generator and the zipper.
Simple Generators v. Lazy Evaluation
The operation foldG is one of the basic operations of simple generators.

Prolog computation with exceptions takes place, in effect, in the monad BacktrackT (EitherT e Identity) a, essentially in the monad Either e [a]. The computation with the raised exception becomes simply Left e; the list of choices [a] is forgotten. It does not have to be that way: we have implemented exception handling in the logic programming language Kanren that preserves choice points. In other words, Kanren's monad is essentially [Either e a]; raising an exception when evaluating a goal does not affect the other choices. It immediately follows that Kanren supports generators, which we demonstrate below. Hopefully the implementors of other logic programming systems might consider an option to preserve choice points upon exceptions.
The poster example of generators, the in-order traversal of a tree, is still implementable in Prolog. The example in_order1 described earlier on this page made clear that a stateless traversal with suspensions is implementable with non-determinism only. Here is the code in Prolog. Assume that make_full_tree(+Depth,-Tree) makes the full node-labeled binary tree of the given depth:

    ?- make_full_tree(3,T).
    % T = node(1, node(2, node(4, leaf, leaf), node(5, leaf, leaf)),
    %          node(3, node(6, leaf, leaf), node(7, leaf, leaf))).
We define the predicate in_order1(+Tree,?Label) that holds if the Tree has a node with the given Label. The tree is traversed in-order.

    in_order1(node(LabN,Left,Right),Label) :-
        in_order1(Left,Label);
        LabN = Label;
        in_order1(Right,Label).

We can traverse the tree interactively

    ?- make_full_tree(3,Tree1), !, in_order1(Tree1,L).
    % L = 4 ;
    % L = 2 ;
    % L = 5 ;
    % ...

accumulate all traversal results in a list

    ?- make_full_tree(3,Tree1), !, findall(L,in_order1(Tree1,L),B).
    % B = [4, 2, 5, 1, 6, 3, 7].

or print labels until the label 5 is encountered, which stops the traversal.

    ?- make_full_tree(3,Tree1), !, in_order1(Tree1,L), write(L), nl, L == 5, !.
    % 4
    % 2
    % 5
Alas, we cannot implement the post_order example, of an accumulating traversal yielding intermediate results. The running sum is the state of the traversal, threaded through the whole computation. Prolog's disjunctions launch `processes' that execute, at least conceptually, in parallel and so do not share state (without stooping to imperative features like record). Accumulating traversals cannot be implemented using only the `parallel composition'.
Discarding choice points upon raising an exception was Prolog's design choice rather than a necessity. The choice points can be preserved, as we demonstrated in Kanren. To implement generators in Kanren, the only thing that remains is to define yield:

    (define (yield term) (anye (throw term) succeed))

The definition transcribes the earlier derived formula yield x = raise x `mplus` return () in the most straightforward way.
All examples of generators on the present page can thus be written in Kanren. We show the challenging post_order, traversing a tree post-order, yielding the running sum of the current label and the labels in the left and the right branches (that is, the sum of the labels for the subtree rooted at the current node).

    (define (post-order* tree running-sum)
      (conde
        ((== tree #f) (== running-sum 0))
        (else
          (fresh (label left right sum-left sum-right)
            (== tree (list label left right))
            (post-order* left sum-left)
            (post-order* right sum-right)
            (project (sum-left sum-right label)
              (let ((sum (+ sum-left sum-right label)))
                (printf "traversing: sum ~a~n" sum)
                (== running-sum sum)))
            (yield `(*post-order* ,running-sum))))))

The predicate post-order +Tree ?Sum below holds if there is a branch in a Tree whose labels sum to Sum.

    (define (post-order tree sum)
      (fresh (s)
        (catch (post-order* tree sum)
               `(*post-order* ,s)
               (== s sum))))

We traverse the sample tree1 completely, obtaining the running sums of its labels:

    tree1
    ;; (1 (2 (4 #f #f) (5 #f #f))
    ;;    (3 (6 #f #f) (7 #f #f)))

    (run #f (q) (post-order tree1 q))
    ;; traversing: sum 4
    ;; traversing: sum 5
    ;; traversing: sum 11
    ;; traversing: sum 6
    ;; traversing: sum 7
    ;; traversing: sum 16
    ;; traversing: sum 28
    ;; (4 5 11 6 7 16 28 28)

The second argument of post-order does not have to be a variable. In other words, it can serve as the output or the input parameter. If we specify Sum as a number, the very same procedure searches the tree for the occurrence of a branch whose labels sum to the given number. We stop the search as soon as the branch is found.

    (run 1 (q) (post-order tree1 16) (== q 'yes))
    ;; traversing: sum 4
    ;; traversing: sum 5
    ;; traversing: sum 11
    ;; traversing: sum 6
    ;; traversing: sum 7
    ;; traversing: sum 16
    ;; (yes)
We now implement yield and a variety of iterator operators in OCaml. We implement simple and accumulating traversals and the challenging Icon examples.

Generators are in essence lazy streams of possibly effectful computations. OCaml has so-called streams (whose special syntax is now supported by a camlp4 syntax extension), used for writing parsers. These streams are limited: for example, there does not seem to be a robust way to join two streams, to implement Icon's alternation or Haskell's mplus. OCaml also supports lazy evaluation; however, throwing exceptions from lazy computations (let alone performing other control effects) is undefined.
The formula ``yield = exceptions + non-determinism'' gives us generators with well-defined behavior in the presence of effects, and which we can concatenate and transform. For non-determinism, the initial implementation uses Hansei, a probabilistic programming language embedded as a library in OCaml. For exceptions, we use native OCaml exceptions. The general theory applies immediately:

    let yield x = if flip 0.5 then raise x

It just works. The ordinary in_order tree traversal and the challenging, and rare, accumulating tree traversal, post_order, are straightforward. They work on the first try.
We systematically optimize the implementation, to avoid the overhead of Hansei and of probabilities, not relevant for generators. The semantics of the Hansei primitives leads, after a bit of equational rewriting, to the specification for the generator primitives yield and msplit (msplit reifies a generator expression into a stream):

    type 'a stream = Nil | Cons of 'a * (unit -> 'a stream)

    msplit (fun () -> exp : unit)  ===  Nil

if exp is a value or evaluates without yield, and

    msplit (fun () -> ctx[yield x])
      ===  Cons (x, fun () -> msplit (fun () -> ctx[()]))

if exp has the form ctx[yield x], where ctx[] is the evaluation context, not crossing the msplit boundary. Contrasting the equations with the laws of the delimited control operators shift0 and push_prompt from the delimcc library gives the implementation below. (Although Hansei is written in terms of delimcc as well, the Hansei implementation differs.)

    (* val yield : 'a stream Delimcc.prompt -> 'a -> unit *)
    let yield p x = shift0 p (fun k -> Cons (x,k))

    (* val msplit : 'a stream Delimcc.prompt -> (unit -> unit) -> 'a stream *)
    let msplit p thunk = push_prompt p (fun () -> thunk (); Nil)
In Python, yield is a keyword. In OCaml, it is a regular OCaml function -- a user-defined one-liner. Various loop iterators also become regular user-defined functions. For example, here is the for-loop:

    let rec stream_iter (f : 'a -> unit) = function
      | Nil        -> ()
      | Cons (x,t) -> f x; stream_iter f (t ())

    let for_loop p gen f = stream_iter f (msplit p gen)
We may likewise define the while loop, or the loop that takes at most n elements from the stream, etc.

Generator expressions and streams may nest. A prompt, a value of the type 'a stream Delimcc.prompt, identifies the enumerator of a stream for yield to generate to. The prompt plays the role of a `loop label' in Perl, for example, letting loop control primitives reach beyond the inner-most loop.
The challenging post_order example, of traversing a tree post-order yielding the running sums of the labels of tree branches, takes the following form:

    let pint = new_prompt ()

    let rec post_order = function
      | Leaf -> 0
      | Node (label, left, right) ->
          let sum_left  = post_order left in
          let sum_right = post_order right in
          let sum = sum_left + sum_right + label in
          yield pint sum;
          sum

    for_loop pint
      (fun () -> Printf.printf "Final: %d\n" (post_order tree1))
      (fun x  -> Printf.printf "Got %d\n" x)

Unlike Python generators, for example, post_order tree1 is a generator expression: besides yielding intermediate results, it also produces a useful value.

We finish with the Icon example of yielding while consuming the results of another generator:
    procedure findodd(s1, s2)
      every i := find(s1, s2) do
        if i % 2 = 1 then suspend i
    end

In OCaml, the example reads:

    let findodd s1 s2 = for_loop pfind (fun () -> find s1 s2)
      (fun i -> if i mod 2 == 1 then yield pfind i)

where find s1 s2 generates the positions at which the string s1 occurs in s2. The function find is not a built-in; rather, it is written in terms of yield. The OCaml translation preserves the structure of the Icon code. We run the findodd example as follows, showing off the nested generation:

    for_loop pfind (fun () -> findodd "a" "abracadabra")
      (fun i -> Printf.printf "found index %d\n" i)
generator.mli [2K]
The interface of the generator library
generator.ml [8K]
The complete derivation of the implementation of the generator library, from the general theory of generators and the semantics of Hansei primitives.
test_gen.ml [7K]
Makefile [2K]
Many examples of using generators, borrowed from Python and Icon, and the challenging post_order example. The test code illustrates defining custom loop operators.
HANSEI: Embedded Probabilistic Programming
A direct implementation of delimited continuations for byte- and native-code OCaml
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML