unsafeInterleaveST
MonadIO
AND
without pattern-matchingThe implicit memoization however is a trade-off. We trade space to store expression's result for time gained in avoiding recomputations. The trade-off is particularly worthy if the result takes much less memory than needed for the closure (thunk) of the expression. This is often the case with numeric code. Non-deterministic, probabilistic programming and general AI search problems are the opposite case. A non-deterministic expression is typically represented as a lazy search tree, which is often huge even for small expressions. It becomes a better trade-off to re-evaluate an expression rather than to fill all memory with results.
Alas, GHC is designed for the opposite trade-off. Therefore, using Haskell even for simple search problems is quite a challenge since memoization gets in the way. Preventing the memoization is surprisingly hard, since GHC is very good at finding the opportunities for it, even within thunks. This article uses a typical example of non-deterministic search to illustrate the problem posed by lazy evaluation and to describe a few tricks to prevent memoization. Some of them are unexpected.
Our running example computes and prints the first n
elements of the infinite stream of Pythagorean triples pyth
,
using three infinite streams of integers from 1
. As
typical for non-deterministic programs, the example generates candidate
solutions and rejects most of them.
from :: MonadPlus m => Int -> m Int from i = return i `mplus` from (i+1) pyth :: MonadPlus m => m (Int,Int,Int) pyth = do x <- from 1 y <- from 1 z <- from 1 if x*x + y*y == z*z then return (x,y,z) else mzeroThe interleaving of three infinite streams precludes the List monad (depth-first search), see below. Instead of a lazy list, we use a search tree to represent the result of a non-deterministic computation:
data Tree1 a = Fail1 | Val1 a | Node1 (Tree1 a) (Tree1 a)We rely on the non-strictness of Haskell to prevent evaluation of tree nodes until we traverse to them: after all, the tree may be infinite -- as is the case in our example.
Tree1
is an
instance of Monad and MonadPlus; here are the most complex
parts of these instances (see the accompanying code for the rest):Node1 e1 e2 >>= f = Node1 (e1 >>= f) (e2 >>= f) mplus = Node1
To `run' the non-deterministic computation and produce the the stream
of triples, we traverse the Tree
, extract the
successfully produced results from the Val
leaves and
return them as a lazy list. Different tree traversals correspond to
different non-deterministic search strategies. Depth-first
traversal (DFS) is the most efficient, needing only O(d)
space to examine a node at depth d
. Alas, an infinite
branch in the tree traps DFS. In our pyth
tree, DFS will
get stuck chasing an infinite chain of Fail
.
Breadth-first traversal (BFS) in contrast shall visit any node in a
tree, given time. BFS is a complete strategy: if a
solution (leaf Val
) exists, BFS will find it. Alas, BFS
needs a lot of space to maintain the job queue, the frontier of the
search. At search depth d
the frontier may
take O(2^d)
space. Iterative deepening is a hybrid
method, complete as BFS yet needing as little of working space as
DFS. Iterative deepening explores the progressively long `prefix' of
tree with DFS. Each new exploration phase repeats all the work of the
previous explorations of shallower prefixes. Iterative deepening
clearly trades time for space. Despite its gross wastefulness, the
method is quite popular, for example, in automated theorem proving.
Its trade-off has proved worthwhile.
Here are the results of computing and printing the first n
Pythagorean triples. The code was compiled by GHC 7.0.4 with optimization -O2
.
Mutator time, sec | CG time, sec | Memory in use, MB | Average residency, KB | |
---|---|---|---|---|
BFS, n=30 | 13.0 | 5.0 | 3 | 465 |
Iter Deep, n=30 | 0.15 | 0.06 | 5 | 1506 |
Iter Deep, n=100 | 4.8 | 1.3 | 56 | 20832 |
Recall that iterative deepening keeps re-traversing the tree. Each
exploration cycle redoes all the previous explorations. Lazy
evaluation helps, it seems. When we first reach Node e1 e2
,
we evaluate e1
and e2
that were
stored in the node unevaluated (otherwise, we would have diverged
constructing the tree, which is infinite). Lazy evaluation
replaces e1
and e2
with their results. When
iterative deepening comes across the same node in a new cycle, it gets
the results of e1
and e2
right away. That
seems like a good thing, until we look at the space. As iterative
deepening explores the Tree
, it needs more and more memory to
store the explored prefix, which is about twice the size of the BFS
frontier. Lazy evaluation thus defeats the purpose of iterative
deepening, of recomputing the revisited tree nodes to avoid
storing them. Lazy evaluation does exactly the wrong thing.
In a strict language, we would have used thunks to represent infinite trees. If tree nodes store thunks, lazy evaluation would memoize thunks -- which evaluate to themselves rather than to trees. It seems therefore the following modification should stop lazy evaluation's meddling in iterative deepening.
data Tree2 a = Fail2 | Val2 a | Node2 (() -> Tree2 a) (() -> Tree2 a) Node2 e1 e2 >>= f = Node2 (\() -> e1 () >>= f) (\() -> e2 () >>= f) mplus e1 e2 = Node2 (\() -> e1) (\() -> e2)Every time we need traverse through a
Node
, we have to
force the thunks and re-compute the branches. At least, in theory.
Here is the practice.Mutator time, sec | CG time, sec | Memory in use, MB | Average residency, KB | |
---|---|---|---|---|
BFS, n=30 | 13.0 | 5.0 | 3 | 509 |
Iter Deep, n=30 | 0.3 | 0.1 | 8 | 2964 |
Iter Deep, n=100 | 10.6 | 1.7 | 96 | 39244 |
Such an unexpected result was quite a puzzle. It seems GHC is
just too smart. Apparently it notices that a thunk (\() -> e)
can only be applied to the same argument.
Therefore, the first time the thunk is forced by applying it
to ()
, the result can justifiably be memoized: the next
time around the thunk will be applied to the same ()
, and
hence, will give the same result anyway.
The new fix is to deliberately confuse GHC. We obfuscate the
tree-construction operations (>>=)
and mplus
with auxiliary functions app
and app1
.
Node3 e1 e2 >>= f = Node3 (app1 e1 f) (app1 e2 f) mplus e1 e2 = Node3 (app e1) (app e2) {-# NOINLINE app #-} app e () = e {-# NOINLINE app1 #-} app1 e f () = e () >>= fThat does the trick. Here are the results.
Mutator time, sec | CG time, sec | Memory in use, MB | Average residency, KB | |
---|---|---|---|---|
BFS, n=30 | 13.2 | 4.7 | 3 | 413 |
Iter Deep, n=30 | 0.4 | 0.03 | 2 | 78 |
Iter Deep, n=100 | 13.4 | 0.9 | 2 | 413 |
We have seen that lazy evaluation is a trade-off, which may be hurtful in some cases, in particular, in search problems over huge data structures, where it is often beneficial to recompute the result than to store it. Preventing lazy evaluation is possible but surprisingly tricky.
STrees.hs [11K]
Complete code for our example and the benchmark.
Delimited control and breadth-first, depth-first, and iterative deepening search
with more details on BFS and iterative deepening
unsafeInterleaveST
unsafeInterleaveST
(and its close relative,
or specialization, unsafeInterleaveIO
) is often viewed as `mostly
harmless'. The function unsafeInterleaveIO
underlies Lazy IO, an
(unfortunately) widely used feature. The interleaving functions
bear hardly any stigma. On the contrary: Hackage has the monad-interleave
package dedicated to them, encouraging their use.
It is unfortunate that the admonition of the people who introduced interleaveST
has gone unheeded. In their 1995 paper ``State in Haskell''
Launchbury and Peyton Jones wrote:
``It should be clear by now that interleaveST
has very undesirable
properties. It duplicates and discards the state, which gives rise to
a very subtle class of programming errors. We have so far failed to
develop good techniques for reasoning about its correctness''
[Sec 10.5]. Their fears are justified: unsafeInterleaveST
is unsafe.
It lets us write Bool
expressions with observable mutable-state
side-effects, hence proving that True
equals False
.
In other words, referential transparency, or the substitution
of equals for equals, may fail even for Bool
expressions.
Our example is about Boolean equality, defined as
(==) :: Bool -> Bool -> Bool True == True = True False == False = True _ == _ = FalseGHC's standard Prelude derivation is identical. Clearly, it is symmetric, as behooves of equality:
x == y
is the same as y == x
--
even if either x::Bool
or y::Bool
(or both) are undefined.
And yet there exists a context that distinguishes x == y
from y == x
.
That is, there exists
bad_ctx :: ((Bool,Bool) -> Bool) -> Boolsuch that
*R> bad_ctx $ \(x,y) -> x == y True *R> bad_ctx $ \(x,y) -> y == x False
Here is the complete code:
module R where import Control.Monad.ST.Lazy (runST) import Control.Monad.ST.Lazy.Unsafe (unsafeInterleaveST) import Data.STRef.Lazy bad_ctx :: ((Bool,Bool) -> Bool) -> Bool bad_ctx body = body $ runST (do r <- newSTRef False x <- unsafeInterleaveST (writeSTRef r True >> return True) y <- readSTRef r return (x,y)) t1 = bad_ctx $ \(x,y) -> x == y t2 = bad_ctx $ \(x,y) -> y == x
To see how bad this is, recall the claims about ST
from Launchbury and
Peyton Jones' paper, excerpted from Section 1 below:
unsafeInterleaveST
breaks each of these claims. It is unsafe.
(The authors define safety as the Church-Rosser property:
the program giving the same result regardless of the evaluation order provided
data dependencies are respected; see the beginning of Sec. 10.)
Launchbury and
Peyton Jones knew that very well. In the hindsight, the bad_ctx
example
has the same kernel of badness as the unique-supply tree example
in Sec 10.2 of their paper. The authors
point out a subtlety in that code, admitting that they fell into the
trap themselves. The fact that unsafeInterleaveST
is unsafe and dangerous,
and also apparently useful lead the authors to pessimistically
conclude Sec 10.5:
``We fear that there may be no absolutely secure system -- that is, one
which guarantees the Church-Rosser property -- which is also
expressive enough to describe the programs which systems programmers
(at least) want to write, such as those above. We do, however, regard interleaveST
as useful primarily for systems programmers.''
The function unsafeInterleaveST
ought to bear the same stigma as
does unsafePerformIO
. After all, both masquerade side-effecting
computations as pure. Both break the equational reasoning, the
greatest asset of Haskell.
ST
monad and also lazy IO.True
if the function
is strict and False
otherwise. The article shows several implementations
of the test. After a brief reminder of strictness, we explain
why the strictness test is impossible to implement in Haskell.
Nonetheless several implementations exist -- which should tell us about
the features of GHC and Haskell to be leery of.
In a non-strict language, a function generally receives as its
argument an unevaluated computation rather than a value. If a function
can produce the result value without executing the argument
computation -- without really needing the value of the argument -- the
function is called non-strict. It is strict otherwise. For example,
the function const 1 :: Int -> Int
is non-strict (since it returns 1
without looking at its argument), and succ :: Int -> Int
is strict
since it needs the value to increment. The function id :: Int ->
Int
is also strict: obtaining the value (the weak head normal form) of the
result requires the value of the argument. A well-known non-strict
function is cons
, or (:)
. For the reason why, see the famous paper
that begat lazy evaluation, on why cons
should not evaluate its
arguments.
To tell if a function is strict we should be able to observe the
evaluation of its argument. Functions are opaque and we cannot look
inside them. Hence the only way to tell if the argument has been
evaluated is by observing the side-effect of the evaluation. However,
the type of the argument may preclude side effects. For example, the
argument of an Int->Int
function has the pure type Int
. Computations of that type cannot have observable side-effects
in a pure language. To put this another way: if x::Int
is a
computation whose evaluation is observable and if v
is the value of x
, then v
and x
are not substitutable for each other since f x
and f v
are observably different if f
is strict. Therefore, the
strictness test should not be possible. If it were, it would break the
referential transparency.
There is an escape hatch in Haskell: a non-termination is not considered
a side-effect. Therefore, a computation of the type Int
is allowed
not to evaluate to anything and loop forever. Haskell has
a convenient term for such a looping computation: undefined
.
Thus we come to the conventional definition of non-strict functions:
a function f
is non-strict if f undefined
returns a value.
This escape hatch does not contradict the impossibility to observe
strictness: if f undefined
takes long time to compute a value, we
cannot be sure that it really loops (and the cause of looping is
really the attempt to evaluate the looping argument). Perhaps f
just
takes a very long time to finish. In short, we cannot really observe
divergence -- otherwise, we would solve the Halting problem.
And yet there are at least two ways to write the strictness test in
practice, because GHC lets us create observably side-effecting
computations of pure types, and IO contexts are not
referentially transparent. Lazy IO makes the test possible already in Haskell98.
The computation length str
is side-effecting
if str::String
is the result of getContents
. Evaluating length
str
has the side-effect of reading the file and closing the
handle. The status of the handle can be detected. GHC brings imprecise
exceptions, thus permitting `multiple bottoms'. Here is a simple test
of strictness with imprecise exceptions.
import Control.Exception handler :: SomeException -> IO Bool handler _ = return True is_strict :: (Int -> Int) -> IO Bool is_strict f = handle handler $ if f (error "Bang!") > 0 then return False else return False main_s = is_strict succ -- True main_ns = is_strict (const 1) -- False
Finally, even `mildly unsafe' functions like unsafeInterleaveST
(let
alone really unsafe ones) break referential transparency in any contexts.
Here is the strictness test based in unsafeInterleaveST
.
import Control.Monad.ST.Lazy import Data.STRef.Lazy is_strict :: (Int -> Int) -> Bool is_strict f = f (snd t4) `seq` fst t4 where t4 = runST (do r <- newSTRef False y <- unsafeInterleaveST (writeSTRef r True >> return 0) x <- readSTRef r return (x,y)) tr1 = is_strict id -- True tr2 = is_strict (const 1) -- False
The demonstrated implementations of the strictness test may be practically useful, but they are theoretically worrying.
Daniel P. Friedman and David S. Wise: CONS should not Evaluate its Arguments
ICALP 1976 and also Indiana University Technical Report TR44. January 1976.
< http://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR44 >
ST
monad, interleaveST
, and
lazy IO. We demonstrate that the problem is as bad as they
feared.UnsafePerformIO
is known to be unsafe, breaking equational
reasoning; unsafeInterleaveIO
, which underlies Lazy IO, gets a free
pass because any computation with it has to be embedded in the IO
context in order to be evaluated -- and we can expect anything from
IO. But unsafeInterleaveIO
has essentially the same code as unsafeInterleaveST
: compare unsafeInterleaveST
from GHC/ST.lhs
with unsafeDupableInterleaveIO
from GHC/IO.hs
keeping in mind that IO
and ST
have the same representation, as described in GHC/IO.hs
. And unsafeInterleaveST
is really unsafe -- not just
mildly or somewhat or vaguely unsafe. In breaks equational reasoning,
in pure contexts.
Lazy IO is likewise problematic: On one hand, a simple
equational proof shows that for all Boolean x
and y
, x < y
is the same as not (y <= x)
. On the other hand,
we exhibit a context that distinguishes the two expressions: plugging
them into the context gives two programs that print different
results. Equational reasoning is thus unsound.
Equational reasoning is regarded as the greatest advantage of pure
functional languages: we may derive programs or prove their
correctness using the elementary, high-school algebra. Equational
reasoning is based on the principle of
`substitution of equals for equals'. If an expression e1
is equal
to e2
(which we write as e1 === e2
) and e1
occurs as part
of a larger expression e
(written as e[e1]
), then we may replace
that e1
with e2
obtaining the equal expression: e[e1] === e[e2]
.
The familiar symmetry and transitivity properties of equality can
be easily derived from the substitution principle. Here a few more
examples of substitution: 2+2 === 4
derives (2+2) * 5 === 4 * 5
, (\x -> (2 + 2) + x) === (\x -> 4 + x)
, and print (2+2) === print 4
. We may thus substitute within a function
and within an action. The expression to substitute into
may be as big as the entire program. Therefore, we may reason equationally about
program behavior -- which is the reason of existence of equational reasoning
in Computer Science. The whole program in Haskell
has the type of IO ()
and its behavior is doing an observable action,
for example, printing something. If the equational reasoning is sound,
the programs that are proven equal must behave the same: print the same
results.
As an example of equational reasoning, we prove that for all
Boolean x
and y
, x < y === not (y <= x)
, using elementary algebra:
x < y === {- total order on Bool -} not (x >= y) === {- x >= y === y <= x -} not (y <= x)We could have proved by cases: there are only 4 to examine.
The fact that x < y
is equal to not (y <= x)
lets us substitute
the former expression with the latter in any program it may
appear. The result should be an equal program. Alas, the enclosed code
shows a program in which such a substitution changes the printed
result. The program
main = counterex_ctx $ \(x,y) -> x < yprints
True
whereasmain = counterex_ctx $ \(x,y) -> not (y <= x)prints
False
.
The problematic counterex_ctx
used lazy IO, which, recall, creates an
illusion that file data is just a String
. Pattern-matching on such a
string may however cause IO, to read a chunk of a file. IO
has side-effects, affecting other, side, expressions in
context. Therefore, one may not freely substitute side-effecting
expressions. Although reading a file may appear a benign effect, it
has side-effects still: the file-descriptor and buffer allocation,
locking, advancing the file position, or consuming data from a
pipe. They are all observable. Our expression x < y
appeared pure and had the pure type Bool
, and so its substitution looked justified. With lazy IO, even a
pure expression may have an observable side-effect. Equational
reasoning becomes unsound.
Some object by saying that IO is inherently non-deterministic. If
there is a race condition, one should not be surprised if a program
that printed True
prints False
on the next run. The
non-determinism of IO will hence explain why unsafeInterleaveIO
is `fine'
whereas unsafeInterleaveST
is truly unsafe: ST
is supposed to be
deterministic.
Recall however that IO
is just an instance of ST
, by
design. Launchbury and Peyton Jones, in the paper ``State in Haskell''
(1995) that introduced ST
and Lazy IO, stated as one of the
contributions: ``Input/output takes its place as a specialised form of
stateful computation. Indeed, the type of I/O-performing
computations is an instance of the (more polymorphic) type of stateful
computations. Along with I/O comes the ability to call imperative
procedures written in other languages.''
To expand on this connection, here is the IO
version of the earlier ST
demonstration that the equality on Booleans is, worryingly,
not symmetric.
import Data.IORef import System.IO.Unsafe bad_ctx :: ((Bool,Bool) -> Bool) -> IO Bool bad_ctx body = do r <- newIORef False x <- unsafeInterleaveIO (writeIORef r True >> return True) y <- unsafeInterleaveIO (readIORef r) return $ body (x,y) t1 = bad_ctx $ \(x,y) -> x == y -- True t2 = bad_ctx $ \(x,y) -> y == x -- FalseThis code has no ``real'' IO, no interaction with the external non-deterministic world. Furthermore, all the
IO
operations in the code
are the type-specialized versions of the corresponding ST
operations, with
essentially the same code. For example, the type IORef
is defined in GHC/IORef.hs
asnewtype IORef a = IORef (STRef RealWorld a)It is hard to believe that the supposedly deterministic
ST
code becomes
non-deterministic when the fully polymorphic state type is
instantiated to RealWorld
(recall, types are erased at run-time.)
We have seen that, disturbingly, replacing one pure Bool
expression
with another equal expression changes the result printed by
the program. Now that
there are many alternatives to lazy IO for incremental file
processing, it is high time to banish lazy IO from Haskell.
Breaking referential transparency with unsafeInterleaveST
unsafeInterleaveST
is the general version of unsafeInterleaveIO
, which underlies Lazy IO.
John O'Donnell, Cordelia Hall, Rex Page: Discrete Mathematics Using a Computer
Springer, 2006 (Second Edition) Chaper 2. Equational Reasoning
describe.pdf [334K]
The full version of the paper presented at FLOPS 2012. A shorter version is published in FLOPS Proceedings, Springer's LNCS 7294, pp. 166-181.
Section 2 of the paper illustrates many practical problems with Lazy IO.
David Sabel and Manfred Schmidt-Schauss: Conservative Concurrency in Haskell
Logic in Computer Science (LICS), 2012
< http://www.ki.informatik.uni-frankfurt.de/papers/sabel/chf-conservative-lics.pdf >
< http://www.ki.informatik.uni-frankfurt.de/persons/sabel/chf-conservative.pdf >
Their conclusions:
Program transformations valid for pure, deterministic, core Haskell
(with no futures or IO) remain valid if concurrency (threads), MVar
s
and futures are added. However, adding unsafeInterleaveIO
or even
lazy futures breaks this conservativity property because
the order of evaluation becomes observable.
The goal of this old project was to put the AIS data, as they are received, into a database, to serve to remote clients on request. A Google Earth application was developed to display ship positions and tracks in real time. Clicking on ship's icon would show its destination, full name, dimensions, etc.
The first step in dealing with AIS messages is to parse them. That is not an easy task. The AIS format was designed quite a long time ago, for the telecommunication equipment of those times (still used today). Since AIS messages are typically transmitted in VHS maritime band using time-division multiple access, they have to be compact. AIS encoding is hence at the level of bits, not bytes. For example, the following two records (transmitted separately, one after another) encode an AIS message 5 (Class A vessel data report):
!AIVDM,2,1,8,A,577n`T02<teAI88;J20`Tr0d4pN2222222222216@0vB?7?aN?43lU30,0*55,r000006098,1167638398 !AIVDM,2,2,8,A,CQ8888888888880,2*3E,r000006098,1167638398Can you make any sense of it? You probably cannot even tell the name of the ship. The 6-bit ASCII characters of the name are tightly packed in the message body, which is then ASCII encoded (in AIS-specific, BASE64-like encoding). Here are all the data from the message, returned by the parser:
[Destination "PORTLAND", Draught 6.0, ETAMin 30, ETAHour 9, ETADay 31, ETAMonth 12, ShipDimD 15, ShipDimC 18, ShipDimB 62, ShipDimA 128, CatShipCargo "CARGO", ShipName "JIN KANG", Callsign "VRBB6", IMOnumber 9237204, MMSI 477997200, TStampRec 1167638398, MsgId 5]
To give the further sense of encoding, longitude is represented as a signed 28-bit (!) number, in 1/10000 minutes. Latitude is encoded as a signed 27-bit number, also in 1/10000 minutes.
The parser below parses AIS message 1 through 5: position reports and ship data/voyage reports. The bit-stream parsing engine is general and can be easily extended to other messages. The parser produces a CSV file, ready to be uploaded into a database.
The parser was written as an experiment in `pure' parsing, without
applicatives or monads. We could therefore take the full advantage of
(extended) pattern guards, clausal definitions and other
niceties of Haskell. The monadic sub-language, in contrast, is less
expressive and ugly. The problem for pure parsing is error reporting.
Errors are expected (and actually rather frequent): after all, AIS messages
are transmitted by radio. To indicate a parsing error and
abort the processing of the current message, we
rely Haskell's error
, which may be used in pure code. Although elegant,
such errors cannot be handled in pure code. More interestingly, such
errors are not even guaranteed to be raised! If an error is encountered
when parsing a datum that turns out unnecessary
for the final result, no error is actually raised.
Hence call-by-need is truly problematic:
rather than returning valid parsed data or
an error indicator, our parser just returns data -- which may be invalid.
That is, when trying to use the data one can run into a parsing error.
Luckily, this very late reporting of errors was not a problem for our parser.
Recall, it converts one or
several lines of input into one line of the output (the CSV file to load
into the database). At the point of writing the output line, the
evaluation is forced, and whatever the problem was encountered in
processing the input message becomes manifest.
To assure this `delimited laziness',
we merely need to note the position in the output stream before
writing a line, so we can back to it if writing the output line ends
in error (see the function write_parsed
in the code).
For this particular application, it turns out possible to write most
of the code outside a monad.
The parser code also contrives poor-man's extensible records for parsing results: a list fields. This lets us easily omit a field.
When the parser was developed back in July 2007, I also evaluated its performance. It took the compiled Haskell program 59 secs (on a typical 2GHz desktop computer of that time) to parse 100,000 messages (about 1/17 of the daily amount) and create the output CSV file. If the input is already in the file system cache, the parsing time drops down to 45 secs. The output CSV file has 78,934 rows. The unparsed 21% correspond mostly to link management messages and proprietary reports.
AISDecoder.hs [34K]
The complete code, which includes many tests
AIS-20070101-small.log [2K]
AIS-20070101-small.out [3K]
AIS-20070101-small.csv [2K]
Sample of AIS messages and the corresponding parsed data
MonadIO
Handling exceptions via catch
, bracket
, catchDyn
, etc. in a
MonadIO
other than the IO
itself has been a fairly frequently
requested feature. The requests tend to recur, probably because these
functions cannot be defined for a general MonadIO
. However, we can
define the generic exception handling for a large and interesting
subset of MonadIO
, which includes various (repeated) transformations
of IO by ReaderT
, WriterT
, ErrorT
, StateT
, and newtype
wrapping.
The generic catch
has been successfully used since 2006 in a
database library Takusen, where many operations work in the monad
ReaderT Session IO
; the database session data are always
available as the environment. Many other foreign libraries are structured
around sessions, connections, library environments, which can be
encapsulated in a monad. We should nevertheless be able to handle IO errors
and user exceptions that arise in such computations.
Back in 2006 Jesse Tov has done an admirably thorough job of
implementing exception handling in general monads. His code defines
two classes: EMonad
and EMonadIO
-- which
contain most of the interesting monads. The latter is the subclass of
the former, permitting arbitrary IO via liftIO
. In either
case, we use gthrow
, gbracket
, gcatch
, ghandle
, gfinally
,
etc. -- without even thinking which Monad we are in and how error
handling is actually implemented, via ErrorT
or via IO exceptions. It works
universally for most of monads of interest. The experience of
using EMonadIO
since 2006 in large production code has
been most positive.
CaughtMonadIO.lhs [6K]
[OBSOLETE] The complete literate Haskell code
The code includes two tests, illustrating throwing and
catching of (dynamic) exceptions in monads obtained from
IO
by several applications of
ReaderT
, WriterT
and ErrorT
.
The code was originally posted as generic catch in a MonadIO on the Haskell mailing list on Tue Feb 7 22:48:24 EST 2006
Takusen's MonadIO
, with gtry
, gtryJust
, gbracket
, gfinally
.
The code supports both the new- and the old-style exceptions.
Joint work with Alistair Bayley.
< http://code.haskell.org/takusen/Control/Exception/MonadIO.hs >
Discussion threads:
Haskell' ticket, introduced as the result of the above Haskell-Cafe discussions:
< http://hackage.haskell.org/trac/haskell-prime/ticket/110 >
iterate2 f x n | seq f $ seq x $ seq n $ False = undefined iterate2 f x n = --- as beforeThat is, we add one simple line to the definition of the function, without changing the proper body of the function at all. We prepend an extra clause with the guard that always yields
False
. Before failing, however, the guard forces the evaluation
of the selected arguments of the function. If needed, seq
can be replaced with deepSeq
.
Making (some of the) arguments of a function strict can make
the function run in constant space, and so an iteration (Runge-Kutta
iteration in the above example) can proceed without running out
of (stack) space. The function iterate2
(and a similar
function rk4Next
) were applied to arguments like n-1
and y + a1/6
-- which is a red
flag. These are exactly the kinds of expressions that lead to memory
exhaustion. Perhaps it is because the size of an unevaluated thunk for n-1
is so much bigger than the size of the evaluated
thunk. It seems that arithmetic expressions are the best candidates
for some opportunistic evaluation...
Partial signatures
Another application of the trick of adding a clause with an always failing guard
The MySysOpen
module offers a reliable, proven way of interacting
with another local or remote process via a unidirectional or
bidirectional channel. It supports pipes and Unix and TCP sockets. MySysOpen
is a simple and explicit alternative to
the multi-threaded IO processing of the GHC run-time
system. The module is the Haskell binding to sys_open
-- the extended, user-level file opening interface.
The second half of MySysOpen.hs
contains several
bi-directional channel interaction tests. One checks repeated
sending and receiving of data; the amount of received data is
intentionally large, about 510K. Two other tests interact with
programs that are not specifically written for interactive use,
such as sort
. The latter cannot produce any
output before it has read all of the input, accepting no input terminator
other than the EOF condition. One test uses shutdown
to set
the EOF
. The other test programs the handler
for a custom EOF
indicator, literally in the file name of
the communication pipe. The final example tests the handling
of the forceful closure of the communication channel in the middle
of writing.
sys_open.c [21K]
The complete, well-commented code for sys_open
The library is originally based on the code written by Jesse Tov, who in turn used NewCGI by Bjorn Bringert et al. Most of the code has been re-written. The biggest change is minimizing the amount of state. The library is centered around generalized input and output ports: a simple IO interface to read, write and copy via a single, large, once allocated buffer. The buffer and its dimensions are hidden, to discourage aliasing. We use this interface for the IO to and from the client, to and from files or the PostgreSQL database, and for reading from external servers.
A generalized input port is a procedure to read at most the
specified number of bytes into a given buffer. The procedure should
return the number of bytes actually read, or 0 on EOF. It may throw
various exceptions if reading didn't go well. The procedure is like hGetBuf
partially applied to a handle.
newtype Input = Input (forall m. EMonadIO m => Ptr Word8 -> Int -> m Int) inputFd :: Fd -> Input inputStr :: EMonadIO m => String -> m Input inputCombined :: EMonadIO m => Input -> Input -> m InputOne can construct an
Input
from a Posix file descriptor
or a String
. One can combine two Input
s into
one, which reads from the first generalized port until EOF and then
reads from the second. The frequently mentioned EMonadIO
is a class of monads permitting IO along with throwing and
catching of arbitrary errors. The IO monad and most of its transformations
are in that class. EMonadIO
lets us write gthrow
, gcatch
, gbracket
, etc.
without even thinking of the current monad.Dually, a generalized output port is a procedure to write the specified number of bytes from the given buffer. It may throw various exceptions if writing didn't go well.
newtype Output = Output (forall m. EMonadIO m => Ptr Word8 -> Int -> m ()) outputFd :: Fd -> Output newtype BCopy = BCopy (forall m. EMonadIO m => Input -> Output -> Maybe Int -> m (Maybe Int))The CGI monad exports (Fast)CGI output and error streams as
Output
, and lets us access request content as Input
.
We can use the generalized ports for reading and writing
strings. The ports are intended though to be connected via BCopy
,
which copies from Input
to Output
the desired number of bytes or
till EOF.NewerCGI.hs [29K]
The commented source code of the library for writing CGI and FastCGI programs
The code was first mentioned in a message Takusen and large PostgreSQL blobs [was: Handling custom types in Takusen] on the Haskell-Cafe mailing list on Fri, 27 Jul 2007 20:34:29 -0700 (PDT)
FastCGI.hsc [3K]
Bindings to the FastCGI C client library
The distinguished feature of Takusen is processing query
results using a left-fold enumerator. The user supplies an iteratee
function, which receives rows one-at-a-time from the result-set. The
number of the arguments to the iteratee is the number of the columns in the
result-set, plus the seed. Each column in the result-set has
its own Haskell type. The latter could be a Maybe
type if the
particular iteratee wishes to process NULL
s.
The benefits are: more convenient and intuitive enumeration,
iteration, and accumulation (see tests for examples); the retrieved
data are not merely strings but have native Haskell types: Int
, Float
, Date
, etc.; buffer preallocation; pre-fetching; support for
both enumerators and cursors, proper handling of all errors including
various IO errors. No unsafe operations are used.
AND
without pattern-matchingAND
, i.e., (&&)
, without the explicit or
even the implicit pattern-matching? Keep in mind that the standard
Prelude not
and (||)
are defined in terms
of pattern-matching. The article below gives several answers.
One of them relies on pointer arithmetic, in Haskell.oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML