MonadIOAND without pattern-matching
O(n^2) time. A more sophisticated code has O(n*logn) time complexity. The article contains the full code to
arrange a sequence in a complete binary tree and to extract an
arbitrary element from the tree maintaining the completeness of the
remaining tree.perfect-shuffle.txt [plain text file]
The article with the full source code for the two
implementations, including the code for building and deconstructing a
complete binary tree. It was originally posted as Provably
perfect shuffle algorithms on a newsgroup comp.lang.functional on Mon, 3 Sep 2001 13:59:46 -0700
Dave Bayer, Persi Diaconis: Trailing the dovetail shuffle to its lair
Ann. Appl. Probab. 2 (1992), no. 2, pp. 294-313.
In the context of card shuffling, `perfect shuffle' means
a deterministic interlacing of the two halves of the deck.
The paper studies the perfect and other card shuffles in detail.
Cycles certainly make it difficult to transform graphs in a pure non-strict language. Cycles in a source graph require us to devise a way to mark traversed nodes -- however we cannot mutate nodes and cannot even compare nodes with a generic ('derived') equality operator. Cycles in a destination graph call for keeping track of already constructed nodes so we can make back-references. An obvious solution is to resort to a state monad or IORefs. There is also a monad-less solution, which involves maintaining a dictionary of already constructed nodes. This solution is less obvious: seemingly we cannot add a node to the dictionary until we have built the node. This fact means that we cannot use the updated dictionary when building the descendants of the node -- which need the updated dictionary to link back. The problem can be overcome however with a credit card transform (a.k.a. "buy now, pay later" transform). To avoid hitting the bottom, we just have to "pay" by the "due date".
For illustration, the article below considers the problem of printing out a non-deterministic finite automaton (NFA) and transforming it into a deterministic finite automaton (DFA). Both NFA and DFA are represented as cyclic graphs.
CCard-transform-DFA.lhs [10K]
A literate Haskell code expounding the technique. The code was first posted in a message Transformations of cyclic graphs [Was: Efficiency of list indices in definitions] on Thu, 08 Aug 2002 16:11:30 -0700 on the Haskell-Cafe mailing list.
Tying the knot: How to build a cyclic data structure
<http://haskell.org/wiki/wiki?TyingTheKnot>
A CommonHaskellIdioms Haskell Wiki page.
Trying to tie the knot
<http://www.mail-archive.com/haskell@haskell.org/msg10687.html>
A message on the Haskell mailing list about a fixed-point combinator helping to tie a knot with forward links. Posted on Fri, 12 Apr 2002 18:44:07 -0700.
Deforestation is usually defined as the elimination of an intermediate data structure, often a collection of items, sent from a single producer to a single consumer. Jorge Adriano showed an interesting example of one producer feeding two independent consumers. The two streams of data are mutually dependent. Furthermore, the rate of production is non-uniform: Generating an element in one stream may require generating trillion items in the other stream. If the first consumer is being evaluated, that consumer causes the evaluation of the producer until the latter yields an item. The trillion of items incidentally produced in the other stream have to be stored somewhere. In more OS-centric terms, the evaluator can't generally reduce two expressions in parallel. If one stream consumer is running, the other consumer sits in the ready queue. Therefore, data sent to the second consumer must be buffered, sometimes at great expense.
We will consider the problem of one producer and two consumers in the general setting. We derive a deforested version, which no longer needs to buffer any produced items. When we run that version, the memory retaining profile shows no space leaks. We also discuss a ``parallel'' writing of several streams into two distinct files. Our solution is safe and yet the i/o is effectively interleaved. The deforested solution is derived, via a sequence of equivalent transformations. In fact, the derived code worked on the first try.
de-biforestation.lhs [12K]
The article as a well-commented literate Haskell code. An earlier version was posted on the Haskell-Cafe mailing list
on Tue, 3 Dec 2002 22:05:05 -0800 (PST).
Jorge Adriano: producing and consuming lists
A message posted on the Haskell-Cafe mailing list
on Tue, 5 Nov 2002 16:04:58 +0000. The message describes the problem
of consuming two inter-dependent streams while avoiding space leaks.
We consider three examples of such transformations, including two pure functional implementations of a bubble-sort. The imperative code is actually quite complex: nested loops, data dependencies across iterations, reading from and writing into the array, conditional branches that depend on data and the iteration history, control dependencies across nested loops. We show two solutions in Haskell. Both use fast, imperative STArrays; both try to separate the imperative part from a functional part. The imperative part is performed "in a monad" while the functional part is written in ordinary Haskell. The two solutions are closely related, and written to emphasize their similarity. One of the solution is an interpreter for a trivial "virtual machine".
The analysis of data and control dependencies across loop iterations, scheduling of instructions and interlocking of their results, pre-fetching of data -- is the job of an advanced compiler or the CPU. On the other hand, we have to do the same kind of analyses to separate the functional part of a computation from its imperative part. It appears functional programming is closer to "the bare metal" as one would have thought.
The assertion that a dependency analysis of imperative array code is related to extracting a functional program from an imperative one has been stated by Andrew W. Appel, in his paper "SSA is Functional Programming". The present article has shown several illustrations of the thesis. We have done manual dependency analyses of several code snippets that update arrays in place. The analyses let us extract pure-functional parts of imperative programs. The rest requires serialization and can be handled by an ST monad.
The article with the full source code [plain text file]
The article itself. It was originally posted as FP vs assembly programming for super-scalar CPUs [Was: Nested Loops in Clean] on a newsgroup comp.lang.functionalon Thu, 16 Aug 2001 18:24:24 -0700
Andrew W. Appel SSA is Functional Programming
ACM SIGPLAN Notices v. 33, no. 4, pp. 17-20, April 1998.
Takusen is a joint project with Alistair Bayley. Takusen is a DBMS access library. It has a backend for Oracle on Unix, Linux or Windows via OCI, and a stub to test the library without any database. The infrastructure and the stub let one interface natively with other databases.
The distinguished feature of Takusen is processing query
results using a left-fold enumerator. The user supplies an iteratee
function, which receives rows one-at-a-time from the result-set. The
number of the arguments to the iteratee is the number of the columns in the
result-set, plus the seed. Each column in the result-set has
its own Haskell type. The latter could be a Maybe type if the
particular iteratee wishes to process NULLs.
The benefits are: more convenient and intuitive enumeration, iteration, and accumulation (see tests for examples); the retrieved data are not merely strings but have Haskell types: Int, Float, Date, etc.; buffer preallocation; pre-fetching; support for both enumerators and cursors, proper handling of all errors including various IO errors. No unsafe operations are used.
In lazy evaluation, an expression is not evaluated unless its result is needed. Once an expression is evaluated, its result is remembered, or memoized, in case the value of the same expression is needed again. This on-demand evaluation and memoization happens transparently and `for free', sparing us from explicitly programming these features. Another great attraction of lazy evaluation is the potential to improve algorithmic complexity: As Bird et al. wrote, ``for some problems at least -- lazy evaluators are strictly more powerful than eager ones.''
The implicit memoization however is a trade-off. We trade space to store expression's result for time gained in avoiding recomputations. The trade-off is particularly worthy if the result takes much less memory than needed for the closure (thunk) of the expression. This is often the case with numeric code. Non-deterministic, probabilistic programming and general AI search problems are the opposite case. A non-deterministic expression is typically represented as a lazy search tree, which is often huge even for small expressions. It becomes a better trade-off to re-evaluate an expression rather than to fill all memory with results.
Alas, GHC is designed for the opposite trade-off. Therefore, using Haskell even for simple search problems is quite a challenge since memoization gets in the way. Preventing the memoization is surprisingly hard, since GHC is very good at finding the opportunities for it, even within thunks. This article uses a typical example of non-deterministic search to illustrate the problem posed by lazy evaluation and to describe a few tricks to prevent memoization. Some of them are unexpected.
Our running example computes and prints the first n
elements of the infinite stream of Pythagorean triples pyth, using three infinite streams of integers from 1. As
typical for non-deterministic programs, the example generates candidate
solutions and rejects most of them.
from :: MonadPlus m => Int -> m Int
from i = return i `mplus` from (i+1)
pyth :: MonadPlus m => m (Int,Int,Int)
pyth = do
x <- from 1
y <- from 1
z <- from 1
if x*x + y*y == z*z then return (x,y,z) else mzero
The interleaving of three infinite streams precludes the List
monad (depth-first search), see below. Instead of a lazy list, we use a search
tree to represent the result of a non-deterministic computation:data Tree1 a = Fail1 | Val1 a | Node1 (Tree1 a) (Tree1 a)We rely on non-strictness of Haskell to prevent evaluation of tree nodes until we traverse to them: after all, the tree may be infinite -- as is the case in our example.
Tree1 is an
instance of Monad and MonadPlus; here are the most complex
parts of these instances (see the accompanying code for the rest): Node1 e1 e2 >>= f = Node1 (e1 >>= f) (e2 >>= f)
mplus = Node1
To `run' the non-deterministic computation and produce the the stream
of triples, we traverse the Tree, extract the
successfully produced results from the Val leaves and
return them as a lazy list. Different tree traversals correspond to
different non-deterministic search strategies. Depth-first
traversal (DFS) is the most efficient, needing only O(d)
space to examine a node at depth d. Alas, an infinite
branch in the tree traps DFS. In our pyth tree, DFS will
get stuck chasing an infinite chain of Fail.
Breadth-first traversal (BFS) in contrast shall visit any node in a
tree, given time. BFS is a complete strategy: if a
solution (leaf Val) exists, BFS will find it. Alas, BFS
needs a lot of space to maintain the job queue, the frontier of the
search. At search depth d the frontier may
take O(2^d) space. Iterative deepening is a hybrid
method, complete as BFS yet needing as little of working space as
DFS. Iterative deepening explores the progressively long `prefix' of
tree with DFS. Each new exploration phase repeats all the work of the
previous explorations of shallower prefixes. Iterative deepening
clearly trades time for space. Despite its gross wastefulness, the
method is quite popular, for example, in automated theorem proving.
Its trade-off has proved worthwhile.
Here are the results of computing and printing the first n
Pythagorean triples. The code was compiled by GHC 7.0.4 with optimization
-O2.
| Mutator time, sec | CG time, sec | Memory in use, MB | Average residency, KB | |
|---|---|---|---|---|
| BFS, n=30 | 13.0 | 5.0 | 3 | 465 |
| Iter Deep, n=30 | 0.15 | 0.06 | 5 | 1506 |
| Iter Deep, n=100 | 4.8 | 1.3 | 56 | 20832 |
Recall that iterative deepening keeps re-traversing the tree. Each
exploration cycle redoes all the previous explorations. Lazy
evaluation helps, it seems. When we first reach Node e1
e2, we evaluate e1 and e2 that were
stored in the node unevaluated (otherwise, we would have diverged
constructing the tree, which is infinite). Lazy evaluation
replaces e1 and e2 with their results. When
iterative deepening comes across the same node in a new cycle, it gets
the results of e1 and e2 right away. That
seems like a good thing, until we look at the space. As iterative
deepening explores the Tree, it needs more and more memory to
store the explored prefix, which is about twice the size of the BFS
frontier. Lazy evaluation thus defeats the purpose of iterative
deepening, of recomputing the revisited tree nodes to avoid
storing them. Lazy evaluation does exactly the wrong thing.
In a strict language, we would have used thunks to represent infinite trees. If tree nodes store thunks, lazy evaluation would memoize thunks -- which evaluate to themselves rather than to trees. It seems therefore the following modification should stop lazy evaluation's meddling in iterative deepening.
data Tree2 a = Fail2 | Val2 a | Node2 (() -> Tree2 a) (() -> Tree2 a)
Node2 e1 e2 >>= f = Node2 (\() -> e1 () >>= f) (\() -> e2 () >>= f)
mplus e1 e2 = Node2 (\() -> e1) (\() -> e2)
Every time we need traverse through a Node, we have to
force the thunks and re-compute the branches. At least, in theory.
Here is the practice.
| Mutator time, sec | CG time, sec | Memory in use, MB | Average residency, KB | |
|---|---|---|---|---|
| BFS, n=30 | 13.0 | 5.0 | 3 | 509 |
| Iter Deep, n=30 | 0.3 | 0.1 | 8 | 2964 |
| Iter Deep, n=100 | 10.6 | 1.7 | 96 | 39244 |
Such an unexpected result was quite a puzzle. It seems GHC is
just too smart. Apparently it notices that a thunk
(\() -> e) can only be applied to the same argument.
Therefore, the first time the thunk is forced by applying it
to (), the result can justifiably be memoized: the next
time around the thunk will be applied to the same (), and
hence, will give the same result anyway.
The new fix is to deliberately confuse GHC. We obfuscate the
tree-construction operations (>>=) and mplus
with auxiliary functions app and app1.
Node3 e1 e2 >>= f = Node3 (app1 e1 f) (app1 e2 f)
mplus e1 e2 = Node3 (app e1) (app e2)
{-# NOINLINE app #-}
app e () = e
{-# NOINLINE app1 #-}
app1 e f () = e () >>= f
That does the trick. Here are the results.
| Mutator time, sec | CG time, sec | Memory in use, MB | Average residency, KB | |
|---|---|---|---|---|
| BFS, n=30 | 13.2 | 4.7 | 3 | 413 |
| Iter Deep, n=30 | 0.4 | 0.03 | 2 | 78 |
| Iter Deep, n=100 | 13.4 | 0.9 | 2 | 413 |
We have seen that lazy evaluation is a trade-off, which may be hurtful in some cases, in particular, in search problems over huge data structures, where it is often beneficial to recompute the result than to store it. Preventing lazy evaluation is possible but surprisingly tricky.
Richard S. Bird, Geraint Jones and Oege de Moor:
More Haste, Less Speed: Lazy versus Eager Evaluation
J. Functional Programming, 1997, v7(5), pp. 541--547
STrees.hs [11K]
Complete code for our example and the benchmark.
Delimited control and breadth-first, depth-first, and iterative deepening search with more details on BFS and iterative deepening
The problem is adding memoization to a recursive function
Int -> Int. That seems simple, until we realize that
finding the value of that particular function at the argument
x requires evaluating the function for smaller arguments
and may require evaluating the function for larger arguments. Thus
straight-forward memoization leads to a large and sparsely populated
memoization table. As the original poster observed, the memoized
function begins thrashing sooner than the original code.
The solution is to balance the cost of recomputing the value
versus the cost of maintaining the memoization table and doing the
lookup. It is worth memoize the values of the function for some small
x -- which are being used frequently. For larger x, re-computation is better. The parameter sc_upb in
the code determines the balance, the largest argument value for which
we still memoize. The simple benchmark shows that this controlled
memoization does improve both the space and time performances.
A notable feature of the code is the transparency of adding the controlled memoization. The body of the algorithm remains essentially the same.
This is a simple trick of making a function strict in some or all of its arguments, without affecting the body of the function.
iterate2 f x n | seq f $ seq x $ seq n $ False = undefined
iterate2 f x n = --- as before
That is, we add one simple line to the definition of the
function, without changing the proper body of the function at all. We
prepend an extra clause with the guard that always yields False. Before failing, however, the guard forces the evaluation
of the selected arguments of the function. If needed, seq
can be replaced with deepSeq.
Making (some of the) arguments of a function strict can make
the function run in constant space, and so an iteration (Runge-Kutta
iteration in the above example) can proceed without running out
of (stack) space. The function iterate2 (and a similar
function rk4Next) were applied to arguments like
(n-1) and y + a1/6 -- which is a red
flag. These are exactly the kinds of expressions that lead to memory
exhaustion. Perhaps it is because the size of an unevaluated thunk for
(n-1) is so much bigger than the size of the evaluated
thunk. It seems that arithmetic expressions are the best candidates
for some opportunistic evaluation...
Re: How do I get a long iteration to run in constant space
<http://www.haskell.org/pipermail/haskell-cafe/2004-October/006983.html>
Discussion thread on the Haskell-Cafe mailing list, where the
message was originally posted on Tue, 5 Oct 2004 00:44:40 -0700 (PDT)
Partial signatures
Another application of the trick of adding a clause with an
always failing guard
We demonstrate an executable model of the evaluation of definite logic programs, i.e., of resolving Horn clauses presented in the form of definitional trees. Our implementation, DefinitionTree, is yet another embedding of Prolog in Haskell. It is distinguished not by speed or convenience. Rather, it is explicitly designed to formalize evaluation strategies such as SLD and SLD-interleaving, to be easier to reason about and so help prove termination and other properties of the evaluation strategies. The main difference of DefinitionTree from other embeddings of Prolog in Haskell is the absence of name-generation effects. We need neither gensym nor the state monad to ensure the unique naming of logic variables. Since naming and evaluation are fully separated, the evaluation strategy is no longer concerned with fresh name generation and so is easier to reason about equationally.
DefinitionTree.hs [14K]
The commented source code
Pure, declarative, and constructive arithmetic relations
DefinitionTree has been used to describe SLD and
SLD-interleave evaluation strategies and to prove the basic properties
of solution sets.
We present an efficient library for writing CGI and FastCGI programs. Our goal is the least latency and the least memory consumption. The library underlies a production web application server Metcast. The server has been running since February 2007, sending large amounts of text and binary data in response to a continuous stream of requests. The server processes a request in constant and small memory imposing little latency, encoding and sending to the client chunks of data as soon as the database delivers them. The server handles hundreds of such requests without allocating memory; in fact, it uses only one 16KB buffer for all of its I/O. With the exception of an occasional rank-2 type and extended pattern guards, all of the code is in Haskell98. Not a single unsafeHaskell function occurs in the code.
The library is originally based on the code written by Jesse Tov, who in turn used NewCGI by Bjorn Bringert et al. Most of the code has been re-written. The biggest change is minimizing the amount of state. The library is centered around generalized input and output ports: a simple IO interface to read, write and copy via a single, large, once allocated buffer. The buffer and its dimensions are hidden, to discourage aliasing. We use this interface for the IO to and from the client, to and from files or the PostgreSQL database, and for reading from external servers.
A generalized input port is a procedure to read at most the
specified number of bytes into a given buffer. The procedure should
return the number of bytes actually read, or 0 on EOF. It may throw
various exceptions if reading didn't go well. The procedure is like
hGetBuf partially applied to a handle.
newtype Input = Input (forall m. EMonadIO m => Ptr Word8 -> Int -> m Int)
inputFd :: Fd -> Input
inputStr :: EMonadIO m => String -> m Input
inputCombined :: EMonadIO m => Input -> Input -> m Input
One can construct an Input from a Posix file descriptor
or a String. One can combine two Inputs into
one, which reads from the first generalized port until EOF and then
reads from the second. The frequently mentioned EMonadIO is a class of monads permitting IO along with throwing and catching of arbitrary errors. The IO monad and most of its transformations
are in that class. EMonadIO lets us write
gthrow, gcatch, gbracket, etc.
without even thinking of the current monad.
Dually, a generalized output port is a procedure to write the specified number of bytes from the given buffer. It may throw various exceptions if writing didn't go well.
newtype Output = Output (forall m. EMonadIO m => Ptr Word8 -> Int -> m ())
outputFd :: Fd -> Output
newtype BCopy = BCopy (forall m. EMonadIO m =>
Input -> Output -> Maybe Int -> m (Maybe Int))
The CGI monad exports (Fast)CGI output and error streams as Output, and lets us access request content as Input. We can use the generalized ports for reading and writing
strings. The ports are intended though to be connected via BCopy, which copies from Input to Output the desired number of bytes or till EOF.NewerCGI.hs [29K]
The commented source code of the library
for writing CGI and FastCGI programs
The code was first mentioned in a message Takusen and large PostgreSQL blobs [was: Handling custom types in Takusen] on the Haskell-Cafe mailing list on
Fri, 27 Jul 2007 20:34:29 -0700 (PDT)
FastCGI.hsc [3K]
Bindings to the FastCGI C client library
MonadIOHandling exceptions via catch, bracket, catchDyn, etc. in a MonadIO other than the IO itself
has been a fairly frequently requested feature. The requests tend to recur,
probably because these functions cannot be
defined for a general MonadIO. However, we can define generic
exception handling for a large and interesting subset of MonadIO,
which includes various (repeated) transformations of IO by
ReaderT, WriterT, ErrorT,
StateT, and newtype wrapping.
The generic catch has been successfully used since 2006 in a
database library Takusen, where many operations work in a monad
ReaderT Session IO; the database session data are always
available as the environment. Many other foreign libraries are structured
around sessions, connections, library environments, which can be
encapsulated in a monad. We should nevertheless be able to handle IO errors
and user exceptions that arise in such computations.
Back in 2006 Jesse Tov has done an admirably thorough job of
implementing exception handling in general monads. His code defines
two classes: EMonad and EMonadIO -- which
contain most of the interesting monads. The latter is the subclass of
the former, permitting arbitrary IO via liftIO. In either
case, we use gthrow, gbracket, gcatch,
ghandle, gfinally, etc. -- without even
thinking which Monad we are in and how error handling is actually
implemented, via ErrorT or via IO exceptions. It works
universally for most of monads of interest. The experience of
using EMonadIO since 2006 in large production code has
been most positive.
CaughtMonadIO.lhs [6K]
The complete literate Haskell code
The code includes two tests, illustrating throwing and
catching of (dynamic) exceptions in monads obtained from
IO by several applications of ReaderT, WriterT and ErrorT.
The code was originally posted as generic catch in a MonadIO on the Haskell mailing list on Tue Feb 7 22:48:24 EST 2006
Takusen's MonadIO, with gtry, gtryJust, gbracket, gfinally.The code supports both the new- and the old-style exceptions. Joint work with Alistair Bayley.
<http://code.haskell.org/takusen/Control/Exception/MonadIO.hs>
Discussion threads:
Haskell' ticket, introduced as the result of the above Haskell-Cafe discussions:
<http://hackage.haskell.org/trac/haskell-prime/ticket/110>
MySysOpen module offers a reliable, proven way of interacting
with another local or remote process via a unidirectional or
bidirectional channel. It supports pipes and Unix and TCP sockets.
MySysOpen is a simple and explicit alternative to
the multi-threaded IO processing of the GHC run-time
system. The module is the Haskell binding to
sys_open -- the extended, user-level file opening interface.
The second half of MySysOpen.hs contains several
bi-directional channel interaction tests. One checks repeated
sending and receiving of data; the amount of received data is
intentionally large, about 510K. Two other tests interact with
programs that are not specifically written for interactive use,
such as sort. The latter cannot produce any
output before it has read all of the input, accepting no input terminator
other than the EOF condition. One test uses shutdown to set
the EOF condition. The other test programs the handler
for a custom EOF indicator, literally in the file name of
the communication pipe.
MySysOpen.hs [7K]
The code for the sys_open interface and tests
sys_open.c [20K]
The complete, well-commented code for sys_open
The following code is a space-efficient implementation of the
game Mastermind. It is efficient not in terms of the heap or stack
space but in the source text space. Jan Rochel wrote the original
code and put it in his e-mail signature. He asked for a shorter
version. The present code relies on a different algorithm to compute
exact and inexact matches in two strings. The new algorithm made the
code a bit faster, but more importantly, a bit (7%) shorter. The new
code is also more obscure and exhibits some interesting patterns, in
particular f x=(x\\).(x\\).
A version to put in a .signature (285 chars):
import Random;import List;import Char;p=putStrLn;u=uncurry;f x=(x\\).(x\\)
main=mapM(\x->randomRIO(49,54))[1..4]>>=n 0.map chr>>=p.("Tries: "++).show
e=((partition$u(==)).).zip;h(p,q)=['*'|x<-p]++['+'|x<-(u f)$unzip q]
n a s=getLine>>=m where{m i|i==s=return a;m i=p(h$e i s)>>n(a+1)s}
An expanded version that is easier to read:
import Random;import List;import Char
main = mapM(\x->randomRIO(49,54))[1..4]
>>= n 0 . map chr >>= p . ("Tries: "++) . show
p=putStrLn
u=uncurry
e=((partition$u(==)).) . zip
f x=(x\\).(x\\)
h(p,q)=['*'|x<-p] ++ ['+'|x<-(u f) $ unzip q]
n a s=getLine>>=m a s
m a s i | i==s = return a
m a s i = p(h$e i s) >> n(a+1)s
The code runs on GHC(i).comp.lang.functional on Sat, 24 May 2003 00:04:40 -0700AND without pattern-matchingAND, i.e., (&&), without explicit or even implicit pattern-matching? Note that the functions not and (||) are defined in the Prelude with the use of pattern-matching. The if function may rely on pattern-matching as well. The article below gives several answers.
comp.lang.functional on Fri, 13 Jul 2001 17:45:22 -0700This site's top page is http://okmij.org/ftp/
Converted from SXML by SXML->HTML