Monads in Haskell and other languages



LogicT - backtracking monad transformer with fair operations and pruning

We design and implement a library for adding backtracking computations to any Haskell monad. Inspired by logic programming, our library provides, in addition to the operations required by the MonadPlus interface, constructs for fair disjunctions, fair conjunctions, conditionals, pruning, and an expressive top-level interface. Implementing these additional constructs is well-known in models of backtracking based on streams, but not known to be possible in continuation-based models. We show that all these additional constructs can be generically and monadically realized using a single primitive which we call msplit. We present two implementations of the library: one using success and failure continuations; and the other using control operators for manipulating delimited continuations.

It seems that something similar (but not identical) to msplit has been discovered at least twice before: under the names unbuild (Andrew Gill, 1996) and destroy (Josef Svenningsson, ICFP02). Our function reflect is also related to unfoldr. One of the earlier versions of Kanren did indeed implement something like destroy, internally using regular list constructors to split a stream.

However, the present function msplit is significantly more general: it has nothing to do with Haskell lists and never uses any list constructors, even internally. The functions msplit and reflect operate over any backtracking computation over any base monad. The presence of the base monad makes things trickier, as we must now ensure that we do not repeat effects. Therefore, the equivalent of the destroy/unfoldr rule of Josef Svenningsson's destroy . unfoldr === id is, in ``small-step semantics'':

    rr (lift m >> mzero)     ===  lift m >> mzero
    rr (lift m `mplus` tma)  ===  lift m `mplus` (rr tma)
    where rr tm      =   msplit tm >>= reflect
One can read mzero and mplus as generalized list constructors [] and (:), and read (>>=) as a flipped application. The more complex laws ensure that effects are not duplicated. It is fair to say that whereas the foldr/build and destroy/unfoldr fusion techniques work for Haskell lists, msplit/reflect in the LogicT paper is a fusion law for a general backtracking computation over an arbitrary, perhaps even strict, base monad. That computation may look nothing like a list; in fact, the LogicT paper gives an example with delimited continuations, with abort playing the role of nil and shift the role of cons.
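For intuition, here is what msplit and reflect look like when specialized to the plain list monad -- the degenerate case with no base-monad effects, where the laws above reduce to pattern-matching on list constructors. (This sketch is only illustrative: as stressed above, the general msplit works over any backtracking computation over any base monad.)

```haskell
-- msplit and reflect specialized to the list monad.  In general,
-- msplit :: m a -> m (Maybe (a, m a)) for any LogicT computation m.
msplit :: [a] -> [Maybe (a, [a])]
msplit []     = [Nothing]        -- mzero splits to Nothing
msplit (x:xs) = [Just (x, xs)]   -- an mplus-cons splits to head and tail

reflect :: Maybe (a, [a]) -> [a]
reflect Nothing        = []
reflect (Just (x, xs)) = x : xs

-- the fusion law: rr === id
rr :: [a] -> [a]
rr tm = msplit tm >>= reflect
```

In this degenerate case rr is the identity on lists; the small-step laws above generalize exactly this to streams interleaved with base-monad effects.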

Joint work with Chung-chieh Shan, Dan Friedman and Amr Sabry.

The current version is 1.3, September 2010
LogicT.pdf [186K]
Backtracking, Interleaving, and Terminating Monad Transformers
The Functional Pearl paper presented at ICFP 2005.

LogicT as an extensible effect
The Haskell Symposium 2015 paper describes in Sec 5 the extensible-effect implementation of LogicT.

LogicT.tar.gz [12K]
The Haskell code accompanying the paper
The code includes two implementations of LogicT: one based on the two-continuation model of streams, and a direct-style implementation with first-class delimited continuations, using the CCCxe or CCExc libraries. The code has two larger examples: the Missionaries and Cannibals problem, and playing Tic-Tac-Toe with minimax search and two heuristics. The latter code is rather heavily based on code by Andrew Bromage.

Generators and LogicT

How to take a tail of a functional stream
A brief and de-sugared explanation of splitting a functional stream

Simple fair and terminating backtracking Monad Transformer
A more efficient implementation of LogicT

Josef Svenningsson: Shortcut Fusion for Accumulating Parameters and Zip-like Functions.
Proc. ICFP'02, 2002, pp. 124-132


Simple fair and terminating backtracking Monad Transformer

The present code is an implementation of MonadPlus and the MonadPlus transformer with fair conjunctions and disjunctions and with notably good termination properties. This implementation terminates where more conventional MonadPlus realizations (e.g., the List monad) may diverge. Here's an example:
    pythagorean_triples :: MonadPlus m => m (Int,Int,Int)
    pythagorean_triples = do
        let number = (return 0) `mplus` (number >>= (return . (+1)))
        i <- number
        guard $ i > 0
        j <- number
        guard $ j > 0
        k <- number
        guard $ k > 0
        guard $ i*i + j*j == k*k
        return (i,j,k)
    test = take 7 $ runM Nothing pythagorean_triples
We stress that i, j, and k each range over number, an infinite stream of numbers. If we run this example with the standard List monad or with list comprehensions, the program diverges, because the backtracking ``gets stuck'' on k. Our fair stream implementation terminates. Our implementation also terminates and gives a stream of non-trivial answers if we remove the positivity guards and make number left-recursive. See the example at the end of the source code.
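To make the mechanism concrete, here is a stripped-down sketch in the spirit of FBackTrack.hs (the constructor names and details are assumptions of this sketch; consult the actual source below). The Incomplete constructor marks a suspended branch of the search, and the interleaving disjunction keeps swapping its arguments as it yields answers, which is what makes the search fair and lets the example above terminate:

```haskell
import Control.Applicative
import Control.Monad

-- Nil: failure; One: a single answer; Choice: an answer plus more;
-- Incomplete: a suspended, not-yet-explored branch
data Stream a = Nil | One a | Choice a (Stream a) | Incomplete (Stream a)

-- fair disjunction: note how the arguments are swapped in recursive calls
interleaveS :: Stream a -> Stream a -> Stream a
interleaveS Nil            r = Incomplete r
interleaveS (One a)        r = Choice a r
interleaveS (Choice a l)   r = Choice a (interleaveS r l)
interleaveS (Incomplete l) r = case r of
  Nil          -> Incomplete l
  One b        -> Choice b l
  Choice b r'  -> Choice b (interleaveS l r')
  Incomplete j -> Incomplete (interleaveS l j)

instance Functor Stream     where fmap = liftM
instance Applicative Stream where pure = One; (<*>) = ap
instance Monad Stream where
  Nil          >>= _ = Nil
  One a        >>= f = f a
  Choice a l   >>= f = f a `interleaveS` Incomplete (l >>= f)
  Incomplete l >>= f = Incomplete (l >>= f)
instance Alternative Stream where empty = Nil; (<|>) = interleaveS
instance MonadPlus Stream   -- mzero, mplus via the defaults

runStream :: Stream a -> [a]
runStream Nil            = []
runStream (One a)        = [a]
runStream (Choice a l)   = a : runStream l
runStream (Incomplete l) = runStream l

pythagorean :: Stream (Int,Int,Int)
pythagorean = do
  let number = return 0 `mplus` (number >>= return . (+1))
  i <- number; guard (i > 0)
  j <- number; guard (j > 0)
  k <- number; guard (k > 0)
  guard (i*i + j*j == k*k)
  return (i,j,k)
```

Taking a prefix of runStream pythagorean terminates: the interleaving guarantees that a branch which never produces an answer (say, i=1, j=1 for all k) cannot starve the branch containing (3,4,5).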

The function runM :: Maybe Int -> Stream a -> [a] of our monad takes an extra argument: runM (Just n) m runs the computation m for at most n ``backtracking'' steps. This limit is another way to improve termination.

In the present monad the laws are satisfied modulo the order of the answers: we should treat the result of runM as a multi-set rather than a list. This issue is discussed in the LogicT paper.

I used to think that this monad is less capable than those described in the LogicT paper. It turns out that FBackTrackT implements the LogicT interface -- rather trivially. Thus FBackTrackT is an implementation of LogicT, with quite good performance.

The current version is Oct 31, 2005; LogicT added January 2014
FBackTrack.hs [3K]
The complete Haskell source code for the plain monad, and an example

FBackTrackT.hs [6K]
The corresponding monadic transformer (in Haskell98), with examples

LogicT - backtracking monad transformer with fair operations and pruning


Purely Functional Lazy Non-deterministic Programming

Functional logic programming and probabilistic programming have demonstrated the broad benefits of combining laziness (non-strict evaluation with sharing of the results) with non-determinism. Yet these benefits are seldom enjoyed in functional programming, because the existing features for non-strictness, sharing, and non-determinism in functional languages are tricky to combine.

We present a practical way to write purely functional lazy non-deterministic programs that are efficient and perspicuous. We achieve this goal by embedding the programs into existing languages (such as Haskell, SML, and OCaml) with high-quality implementations, by making choices lazily and representing data with non-deterministic components, by working with custom monadic data types and search strategies, and by providing equational laws for the programmer to reason about their code.
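The central construct is an explicit sharing combinator. As a degenerate illustration (an assumption of this sketch: the plain list monad over total values only, ignoring the laziness and partial values that the paper is careful to handle), call-time choice can be modeled by making the choice once and returning a deterministic computation of the chosen value:

```haskell
coin :: [Int]
coin = [0, 1]

-- share makes the non-deterministic choice up front and yields a
-- deterministic computation of the chosen value
share :: [a] -> [[a]]
share m = do { x <- m; return (return x) }

-- two occurrences of coin choose independently
unshared :: [Int]
unshared = do { x <- coin; y <- coin; return (x + y) }

-- both occurrences see one and the same choice (call-time choice)
shared :: [Int]
shared = do { xx <- share coin; x <- xx; y <- xx; return (x + y) }
```

Here unshared evaluates to [0,1,1,2], whereas shared evaluates to [0,2]: the choice is made once and its result is shared.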

This is joint work with Sebastian Fischer and Chung-chieh Shan.

The current version is August 2011
lazy-nondet.pdf [283K]
Purely functional lazy nondeterministic programming
Journal of Functional Programming 21(4-5):413--465, 2011
This is the greatly extended, journal version of the paper first published in the Proc. of ICFP 2009, pp. 11-22

The laws of sharing with non-determinism were stated `modulo observation', but that notion was not clearly defined. The proof that the laws are congruences was also left to future work. That work has been completed: the paper Reasoning with Extensible Effects formally defines the contextual equivalence modulo handler (`observer') and introduces the framework to prove equational laws to be congruences.

Deriving1.hs [17K]
Executable small-step semantics based on an alternative set of laws

The laws of non-determinism with explicit sharing described in the paper are concise but not constructive because of pattern-matching on bottom. The present code introduces an alternative set of laws. The code implements small-step semantics of a small monadic language with non-determinism and sharing. The semantics is based on the oriented version of the alternative equational laws. The code is hence the proof that the laws are constructive.

The datatype MP a represents the syntax of the language. The clauses of the function step define the reduction relation, in declarative form. The semantics is executable; there are functions to evaluate a term, observing intermediate results if desired. The code includes an extensive test suite.

HANSEI: Embedded Probabilistic Programming
Memoization of non-deterministic functions in HANSEI is implemented using thread-local storage, according to the principles explained in the paper.

Efficient Set monads
A different way to share possible intermediate results of non-deterministic computations


Parameterized `monad'

The familiar State monad lets us represent computations with a state that can be queried and updated. However, the state must have the same type during the entire computation. One sometimes wants to express a computation where not only the value but also the type of the state can be updated -- while maintaining static typing. We wish for a parameterized `monad' that indexes each monadic type by an initial (type)state and a final (type)state. The effect of an effectful computation thus becomes apparent in the type of the computation, and so can be statically reasoned about.
    class Monadish m where
        gret  :: a -> m p p a
        gbind :: m p q a -> (a -> m q r b) -> m p r b
Unlike ordinary monads (where m is a type constructor of one argument), here m is a type constructor of three type arguments: p, q and a. The argument a is the type of values produced by the monadic computation; the two other arguments describe the type of the computation state before and after the computation is executed. As with ordinary monads, gbind is required to be associative and gret to be the left and the right unit of gbind. It is clear that all ordinary monads are a particular case of Monadish. Like the ordinary Monad class, Monadish is implementable in Haskell98.
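As a small illustration (the names VST, vsget and vsput are made up for this sketch; see VarStateM.hs for the actual code), here is a state transformer whose state type may change from p to q:

```haskell
-- the parameterized-monad class, repeated so the sketch compiles standalone
class Monadish m where
  gret  :: a -> m p p a
  gbind :: m p q a -> (a -> m q r b) -> m p r b

-- a state transformer whose state type may change from p to q
newtype VST p q a = VST { runVST :: p -> (a, q) }

instance Monadish VST where
  gret a    = VST (\s -> (a, s))
  gbind m k = VST (\p -> let (a, q) = runVST m p in runVST (k a) q)

vsget :: VST s s s        -- read the current state
vsget = VST (\s -> (s, s))

vsput :: t -> VST s t ()  -- replace the state, possibly changing its type
vsput t = VST (\_ -> ((), t))

-- start with an Int state, finish with a String state:
-- the type records the change
demo :: VST Int String String
demo = vsget `gbind` \i -> vsput (show (i + 1)) `gbind` \_ -> vsget
```

Running runVST demo 41 yields ("42","42"): the state began as the Int 41 and ended as the String "42", and the types say so.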

The need for such a type-state indexed `monad' has been observed on many occasions. For example, we can introduce monadish LIO p q a where p and q are either Locked or Unlocked, representing the state of a particular lock. Just by looking at the type of a computation, we can see if the computation acquires the lock (its (inferred) type is LIO Unlocked Locked a), acquires and releases the lock (its type is LIO Unlocked Unlocked a), or does not use the lock (its type is LIO p p a, polymorphic over p). It becomes easy to statically ensure that the lock can be released only if it is being held. The locking problem can also be solved with an ordinary monad (by defining a function like with_lock which uses the monad in a contra-variant position). The latter solution is more cumbersome, and becomes increasingly cumbersome as we wish to represent more complex effects, such as control effects due to delimited control operators with answer-type modification.
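The lock example can be sketched as follows. To keep the sketch self-contained and easily checkable, the class is repeated and the `IO-like' monad merely accumulates a trace of actions instead of performing real IO; the actual LIO of VarStateM.hs wraps genuine IO.

```haskell
class Monadish m where   -- repeated so the sketch is self-contained
  gret  :: a -> m p p a
  gbind :: m p q a -> (a -> m q r b) -> m p r b

data Locked   = Locked     -- phantom type-states of the lock
data Unlocked = Unlocked

-- an `IO-like' monad that merely logs actions, indexed by the lock
-- state before and after the computation
newtype LIO p q a = LIO { unLIO :: [String] -> (a, [String]) }

instance Monadish LIO where
  gret a          = LIO (\tr -> (a, tr))
  gbind (LIO m) k = LIO (\tr -> let (a, tr') = m tr in unLIO (k a) tr')

say :: String -> LIO p p ()      -- lock-state-neutral action
say s = LIO (\tr -> ((), tr ++ [s]))

lock :: LIO Unlocked Locked ()   -- acquire: only when not already held
lock = LIO (\tr -> ((), tr ++ ["lock"]))

unlock :: LIO Locked Unlocked () -- release: only when held
unlock = LIO (\tr -> ((), tr ++ ["unlock"]))

-- only computations that leave the lock released may be run
runLIO :: LIO Unlocked Unlocked a -> (a, [String])
runLIO m = unLIO m []

critical :: LIO Unlocked Unlocked ()
critical = lock `gbind` \_ -> say "work" `gbind` \_ -> unlock

-- Rejected by the type checker: the lock would be left held.
-- bad = runLIO (lock `gbind` \_ -> say "oops")
```

Uncommenting bad produces a type error, which is precisely the static enforcement of the locking protocol.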

The variable-state, parameterized `monad' has been thoroughly investigated by Robert Atkey (MSFP2006). The message below is an independent discovery.

The current version is 1.1, Dec 18, 2006
VarStateM.hs [5K]
The complete Haskell98 code defining the parameterized monad and using it to implement a variable-state monad and the IO-like monad that statically enforces a locking protocol
The first part of the code was originally posted as Simple variable-state `monad' on the Haskell mailing list on Mon Dec 18 00:02:02 EST 2006

Robert Atkey: Parameterised Notions of Computation. MSFP2006

Lightweight static resources, for safe embedded and systems programming
RealTime.hs [8K]
Using Monadish to statically track ticks so to enforce timing and protocol restrictions when writing device drivers and their generators

Delimited continuations with effect typing, full soundness, answer-type modification and polymorphism
The genuine shift/reset control operators are implemented in terms of the parameterized monad.

Matthieu Sozeau: Type-safe printf using delimited continuations, in Coq (Jan 2008)
The implementation of the parameterized `monad' (called GMonad) in Coq. The Coq code not only defines the operations gret and gbind but also states their laws, which Coq will verify for each GMonad instance.

Lightweight monadic regions
Section 6 of the paper describes a Haskell library for manual resource management, where deallocation is explicit and safety is assured by a form of linear types. We implement the linear typing in Haskell with the help of phantom types and a parameterized monad to statically track the type-state of resources.

Linear and affine lambda-calculi
The tagless-final encoding uses the variable-state `monad' to track resources associated with bound variables, to enforce the linearity of variable references.

Tracking dynamic values in types


Laws of MonadPlus

Alongside the well-known Monad type class
    class Monad m where
      return :: a -> m a
      (>>=)  :: m a -> (a -> m b) -> m b
the Haskell Report also introduces
    class Monad m => MonadPlus m where
      mzero :: m a
      mplus :: m a -> m a -> m a
Whereas Monad expressions represent general effectful computations, MonadPlus is meant specifically for expressing non-deterministic choice (mplus) and failure (mzero). The Haskell Report states the equational laws for (>>=) and return that any monad implementation has to satisfy. The laws are meticulously described in numerous monad tutorials and inscribed on many T-shirts. What are the laws of MonadPlus?

It is often claimed that mplus must be associative and mzero must be its left and right unit. That is, supposedly mplus and mzero should satisfy the same laws as list concatenation and the empty list -- that is, the monoid laws. It should be stressed and stressed again that the Haskell Report says nothing about the laws of MonadPlus. Absent any other authority, the status of the MonadPlus laws is undetermined.

Do we want the monoid laws for MonadPlus? The answer is surprisingly complicated. In short (and confusingly): it may be reasonable to use the commutative monoid laws to contemplate non-determinism and to approximate the result of a program; yet if one objects to the commutativity of mplus, associativity should not be a requirement either. This article shows simple examples of this complex issue. Along the way, we discuss why we may want equational laws to start with.

An equational law is a statement of equality: for example, the left-unit monoid law for MonadPlus states that mzero `mplus` m and m must be `the same'. This may mean that given the particular definitions of mzero and mplus, the expression mzero `mplus` m reduces to m for any m. Such a fine-grained, intensional view of equality is most helpful when writing implementations and proofs. Let's take the following simple model of non-deterministic computation:

    data NDet a = Fail | One a | Choice (NDet a) (NDet a)
with One a standing for the deterministic computation whose result is a, Fail denoting failure and Choice representing non-deterministic choice. If we implement mzero as Fail, then the left-unit monoid law, on this fine-grained reading, immediately gives us one clause in the implementation of mplus:
    instance MonadPlus NDet where
      mzero = Fail
      Fail `mplus` m = m
       -- ... continued below
the associativity of mplus gives the second clause:
    (Choice m1 m2) `mplus` m = Choice m1 (m2 `mplus` m)
The remainder is trivial:
    (One v) `mplus` m = Choice (One v) m
Thus the monoid laws gave us an implementation of MonadPlus. Building a prototype, an `executable specification', is one, very good reason to specify equational laws. Hinze's ICFP 2000 Functional Pearl is the most elegant and inspiring example of using equational laws to derive and improve an implementation -- of, as it happens, backtracking monad transformers.

One does not have to take equational laws too literally. For example, with the following, even more lucid instance

    instance MonadPlus NDet where
      mzero = Fail
      mplus = Choice
the expression mzero `mplus` m reduces to Choice Fail m, which is clearly different from m. One may still insist that these expressions should be treated as `morally' the same. That is, replacing mzero `mplus` m with m, in any context and for any m, should be observationally equivalent: it should not change the results of a program. We now have to define `the result' of our program: how to observe an NDet computation. Commonly (see Hinze 2000 again for an example), a program of the form return v `mplus` m is regarded as resulting in v; mzero is to be observed as failure. (To generalize, one may take return v1 `mplus` (return v2 `mplus` m) as having two results: first v1 and then v2.) Therefore, the observation function for NDet, typically called run, is to be defined as follows:
    run :: NDet a -> a
    run Fail    = error "failure"
    run (One x) = x   -- the intended meaning of One
    run (Choice (One x) m) = x
     -- ... continued 
If we insist that replacing mzero `mplus` m with just m should preserve the observations, we have to add the following clause to run:
    run (Choice Fail m) = run m
The associativity of mplus lets us complete the definition:
    run (Choice (Choice m1 m2) m) = run (Choice m1 (Choice m2 m))
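Collecting the pieces, here is the second implementation as one runnable unit, with the evident Monad instance (which the derivation above did not need to spell out) filled in:

```haskell
import Control.Applicative
import Control.Monad

data NDet a = Fail | One a | Choice (NDet a) (NDet a)

instance Functor NDet     where fmap = liftM
instance Applicative NDet where pure = One; (<*>) = ap

-- the evident Monad instance, needed only to actually run programs
instance Monad NDet where
  Fail       >>= _ = Fail
  One a      >>= f = f a
  Choice l r >>= f = Choice (l >>= f) (r >>= f)

instance Alternative NDet where empty = Fail; (<|>) = Choice
instance MonadPlus NDet   -- mzero = Fail, mplus = Choice, via the defaults

run :: NDet a -> a
run Fail                      = error "failure"
run (One x)                   = x
run (Choice (One x) _)        = x
run (Choice Fail m)           = run m
run (Choice (Choice m1 m2) m) = run (Choice m1 (Choice m2 m))
```

For instance, run (mzero `mplus` One 5) and run (One 5) both observe 5, even though the two terms are structurally different.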

To recap, in this implementation, mzero `mplus` m does not reduce to m. Likewise, m1 `mplus` (m2 `mplus` m3), which is Choice m1 (Choice m2 m3), is a structurally different NDet value from (m1 `mplus` m2) `mplus` m3. Still, as programs, mzero `mplus` m and m produce the same results. Furthermore, replacing the former expression with the latter, as part of any larger expression, preserves the result of the overall program. (That claim, while intuitively true for our implementation, is very hard to prove formally: see JFP2021 for the proof and the rigorous discussion of equivalence modulo observation.) The same can be said for m1 `mplus` (m2 `mplus` m3) and (m1 `mplus` m2) `mplus` m3. This preservation of the results, the observational equivalence, lets a compiler, a programmer, or a tool optimize mzero `mplus` m to m. Deriving and justifying optimizations is another compelling application of equational laws.

Finally, the equational laws can be used to predict and understand the behavior of a program without regard to any particular implementation. Let us consider the programs ones and main:

    ones = return 1 `mplus` ones
    main = ones >>= \x -> if x == 1 then mzero else return x
Recall that in our notion of observation, return v `mplus` m is regarded as resulting in v. Thus ones has 1 as the result -- as the first and all the following results. As for main,
    main =
     ones >>= \x -> if x == 1 then mzero else return x
     === {- inline ones -}
     (return 1 `mplus` ones) >>= \x -> if x == 1 then mzero else return x
     === {- distributivity of mplus over bind -}
     (return 1 >>= \x -> if x == 1 then mzero else return x)
      `mplus` (ones >>= \x -> if x == 1 then mzero else return x)
     === {- the second operand of mplus is just main -}
     (return 1 >>= \x -> if x == 1 then mzero else return x) `mplus` main
     === {- monad law -}
     mzero `mplus` main
     === {- monoid law -}
     main

That chain of equational re-writes has led us nowhere: by the monoid law, mzero `mplus` main is just main again. Clearly, no other chain does any better: there is no way to convert main either to the form return v `mplus` something or to mzero. Our intuition agrees: ones keeps offering 1, which main rejects.

Now consider onestoo and the program maintoo:

    onestoo = ones `mplus` return 2
    maintoo = onestoo >>= \x -> if x == 1 then mzero else return x
If mplus is associative, onestoo can be re-written to
    onestoo = ones `mplus` return 2
     === {- inlining ones -}
     (return 1 `mplus` ones) `mplus` return 2
     === {- associativity -}
     return 1 `mplus` (ones `mplus` return 2)
     === {- definition of onestoo -}
     return 1 `mplus` onestoo

Absent any other laws, we must regard maintoo as just as divergent as main. This time the conclusion goes against our intuitions. We would like to interpret m `mplus` return 2 as a non-deterministic choice that includes 2. There is therefore a chance that maintoo may produce that result. If MonadPlus is meant to represent non-determinism and to implement non-deterministic search, a search procedure ought to be complete: if maintoo can possibly produce 2, that result should be found sooner or later. A MonadPlus instance with an associative mplus cannot be an implementation of a complete search procedure, unless we assume some sort of commutativity.

To derive the result 2 in our maintoo example, we have to posit additional laws for mplus; at the very least, m `mplus` return v === return v `mplus` m. One can even take mplus to be commutative for arbitrary arguments. A commutative and associative mplus agrees with our intuitions of non-deterministic choice. But it violently disagrees with the notion of observation used so far: return v `mplus` something is to be observed as resulting in v. With a commutative mplus, return 1 `mplus` return 2 is to be equal to return 2 `mplus` return 1, yet the two are observed as 1 and 2 respectively. It is not reasonable to treat 1 and 2 as the same observation. We have to re-define observation and treat return v `mplus` m as a program that might produce the result v in one particular run; the program may finish with some other result. Intuitive as it is, such a notion of observation is weak, reading like a typical disclaimer clause: the program we sold you may do nothing at all. It feels, well, non-deterministic.

It is still reasonable to assume the commutative monoid laws as an over-broad specification. If a MonadPlus implementation produces one of the possibly many answers predicted by the specification, the implementation should be deemed correct. We have to be, however, explicit about such an interpretation. To avoid confusion with the common notion of observation -- what the program actually returns -- we should not be saying that a MonadPlus is required to be associative and commutative. A program m1 `mplus` (m2 `mplus` m3) may still print different results from (m1 `mplus` m2) `mplus` m3.

Which laws, then, is MonadPlus to obey? It depends on what we want to use the laws for, and on how precise we want to be about program behavior. As a specification, mplus may be regarded as associative. It should then also be commutative; otherwise it does not represent non-deterministic choice. If it is commutative, we have to significantly weaken the notion of observation. Generally, a specification is supposed to be an approximation of the behavior of a real program. What is special about MonadPlus is just how imprecise the commutative monoid specification is. A reasonable alternative -- a non-associative mplus -- is described in the ICFP05 and JFP2011 papers below, with many examples of equational reasoning with such monads.

The JFP2021 paper `Not by Equations Alone' discusses the laws of non-determinism in great detail, including proving that the laws are congruences. The paper takes the approach opposite to the one shown in the present article: rather than postulating the laws and then deriving an observation function to uphold them, the paper starts with the observation function as given, and checks which equational laws hold up to that observation. The paper hence shows which commonly used observations for non-determinism give rise to which equational laws, and what one has to sacrifice for a richer equational theory.

The current version is January 2015
Ralf Hinze: Deriving Backtracking Monad Transformers. ICFP 2000

LogicT - backtracking monad transformer with fair operations and pruning

Purely Functional Lazy Non-deterministic Programming

Not by Equations Alone: Reasoning with Extensible Effects, JFP2021


Monte Carlo Monad: arithmetic of `probabilistic numbers'

The following article discusses a (sub)language of ``fuzzy'', probabilistic numbers. The language lets us write ordinarily-looking arithmetic expressions; variables in these expressions however denote random values rather than numbers. A user can specify the exact shape of the probability distribution curve for each random quantity. Arithmetic expressions over fuzzy numbers often arise in physics and statistics. Physical quantities often cannot be described by a single number; rather they are random values, due to measurement errors or the very nature of the corresponding physical process. We can use the language of fuzzy numbers to automate the error analysis and to estimate the resulting distribution.

The problem is more complex than it appears. An apparent solution is to represent fuzzy numbers as distribution functions. The operator + should be generalized to take two distribution functions as arguments and yield the distribution function of the sum; other arithmetical operators and functions should be lifted as well. As Jerzy Karczmarczuk noted, this naive approach is fraught with problems. For example, fuzzy numbers cannot be an instance of the Num class, as the latter required its members to belong to the Eq class (a superclass GHC has since dropped); comparing functions is in general undecidable. A more immediate problem is that the arithmetic of distributions is far more complex. Distributions are not linear: the distribution of a sum is the convolution of the distributions of its terms. Division is especially troublesome if the denominator has a non-zero probability of being zero.

The following article develops a different approach. The key insight is that we do not need to ``symbolically'' manipulate distributions. We represent a random variable by a computation, which, when executed, generates one number. If executed many times, the computation generates many numbers -- a sample of its distribution. We can lift the standard arithmetical operators and functions (including comparisons) to such computations, and write ordinarily-looking arithmetical expressions over these random computations. We can then execute the resulting expression many times and obtain a sample distribution of the resulting quantity. The sample lets us estimate the mathematical expectation, the variance and other moments of the resulting quantity. Our method is thus a Monte Carlo estimation of a complex distribution.
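The idea can be sketched in a few lines of Haskell (everything below -- the RV type, the toy linear congruential generator, the starting seed -- is an assumption of this sketch, not the code of the article):

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Applicative (liftA2)

-- a random value is a computation producing one sample, threading the
-- seed of a toy linear congruential generator
newtype RV a = RV { runRV :: Int -> (a, Int) }

instance Functor RV where
  fmap f (RV g) = RV (\s -> let (a, s') = g s in (f a, s'))
instance Applicative RV where
  pure a = RV (\s -> (a, s))
  RV f <*> RV g = RV (\s -> let (h, s')  = f s
                                (a, s'') = g s'
                            in (h a, s''))

unit :: RV Double                -- one generator step, scaled to [0,1)
unit = RV (\s -> let s' = (1103515245 * s + 12345) `mod` 2147483648
                 in (fromIntegral s' / 2147483648, s'))

uniform :: Double -> Double -> RV Double
uniform lo hi = fmap (\u -> lo + (hi - lo) * u) unit

-- lift arithmetic pointwise, so expressions over random values look
-- ordinary; modern GHC no longer imposes an Eq superclass on Num
instance Num (RV Double) where
  (+) = liftA2 (+); (-) = liftA2 (-); (*) = liftA2 (*)
  abs = fmap abs; signum = fmap signum; fromInteger = pure . fromInteger

sample :: Int -> RV a -> [a]     -- run the computation n times
sample n rv = take n (go 42)
  where go s = let (a, s') = runRV rv s in a : go s'

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)
```

For example, mean (sample 2000 (uniform 0 1 + uniform 0 1)) comes out close to 1, the expectation of the sum of two standard uniforms.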

As an example, we compute the center and the variance of sqrt( (Lw - 1/Cw)^2 + R^2 ), the impedance of an electric circuit, with the inductance being a normal variable (center 100, variance 1) and the frequency uniformly distributed from 10 through 20 kHz. The resistance is uniformly distributed from 10 through 50 kOhm, and the capacitance is distributed as the square root of a normal. (The latter is just to make the example more interesting.)

In the follow-up, summarizing article, Jerzy Karczmarczuk designed a complementary approach, based on approximating a distribution by linear combinations of normal distributions. The arithmetic of normal distributions is well-understood.

The current version is 1.1, Oct 31, 2001
random-var-monad.txt [6K]
Re: Non-local arithmetic. The article with the source code and examples, posted on a newsgroup comp.lang.functional on Wed, 31 Oct 2001 17:48:33 -0800.

HANSEI: Embedded domain-specific language for probabilistic models and (nested) inference


How to prove monad laws

When introducing a new monad, it is not enough to define a type constructor and the bind and return operations. One also has to prove the three monad laws: return is the left and right unit of bind, and bind is associative. Proving the associativity law is less straightforward.

The following article demonstrates proving associativity for a particular Error monad. Filinski's `star' notation and a variable-free formulation of the associativity law turn out to be quite helpful.
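The proof itself is equational; still, the law is easy to sanity-check mechanically on sample inputs (a check, emphatically not a proof), here for an Either-style Error monad:

```haskell
-- an Either-style Error monad, with bind written as a plain function
data Error a = Ok a | Err String deriving (Eq, Show)

ret :: a -> Error a
ret = Ok

bind :: Error a -> (a -> Error b) -> Error b
bind (Ok a)  f = f a
bind (Err e) _ = Err e

-- associativity: (m `bind` f) `bind` g  ==  m `bind` (\x -> f x `bind` g)
assocHolds :: Eq c => Error a -> (a -> Error b) -> (b -> Error c) -> Bool
assocHolds m f g =
  ((m `bind` f) `bind` g) == (m `bind` (\x -> f x `bind` g))
```

Checking assocHolds on failing and succeeding computations exercises both clauses of bind; the article supplies the actual proof for all inputs.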

The current version is 1.1, Sep 2, 2003
proving-monad-laws.txt [4K]
The article with the complete proof. It was originally posted as Re: proving the monad laws on the Haskell mailing list on Tue, 2 Sep 2003 19:26:39 -0700 (PDT)


Monadic reflection and reification

A brief essay on Andrzej Filinski's paper ``Representing Monads'' (POPL 1994), discussing the interpretation of the bind operator in terms of reflection and reification of monadic effects.
The current version is 1.1, Aug 26, 2003
monadic-reflection.txt [3K]
The article was originally posted as Re: monads and reflection [was: overloading] on the newsgroup comp.lang.scheme on Tue, 26 Aug 2003 12:00:45 -0700

Scheme's multiple values as an effect
Another discussion of monadic reflection and reification


A serially-numbering monad

We want to build trees such that every node is tagged with a unique integer. Ideally every term/node has to have its id associated with it at construction time. As the original poster remarked, ``An obvious way to implement it in this way is to have a collection of state which is passed along all the time to every function. But this seems rather tedious.''

Is there a way to solve the problem purely functionally and elegantly? The article proposes one solution. It relies on monads to effectively hide the counter that uniquely tags every node. Building a tagged tree now looks just as simple as building an untagged one. The counter variable is well-hidden. And yet it is there, invisibly threading through all computations.
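The technique can be sketched as follows (an illustration only: the article's own code may differ in details). The counter lives inside a state-threading monad; fresh hands out the next serial number, and the tree-building code never mentions it:

```haskell
-- a state-threading monad whose hidden state is the counter
newtype Numbered a = Numbered { runNumbered :: Int -> (a, Int) }

instance Functor Numbered where
  fmap f (Numbered g) = Numbered (\n -> let (a, n') = g n in (f a, n'))
instance Applicative Numbered where
  pure a = Numbered (\n -> (a, n))
  Numbered f <*> Numbered g =
    Numbered (\n -> let (h, n')  = f n
                        (a, n'') = g n'
                    in (h a, n''))
instance Monad Numbered where
  Numbered g >>= k =
    Numbered (\n -> let (a, n') = g n in runNumbered (k a) n')

fresh :: Numbered Int            -- hand out the next serial number
fresh = Numbered (\n -> (n, n + 1))

data Tree = Leaf Int | Node Int Tree Tree deriving (Eq, Show)

leaf :: Numbered Tree
leaf = fmap Leaf fresh

node :: Numbered Tree -> Numbered Tree -> Numbered Tree
node l r = do { t1 <- l; t2 <- r; i <- fresh; return (Node i t1 t2) }

-- building a tagged tree reads like building an untagged one;
-- the counter threads through invisibly
build :: Tree
build = fst (runNumbered (node (node leaf leaf) leaf) 0)
```

Here build comes out as Node 4 (Node 2 (Leaf 0) (Leaf 1)) (Leaf 3): every node carries a unique tag, yet no counter appears in the building code.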

The current version is 1.1, Oct 9, 2001
numbered-monad.txt [4K]
The article with the source code. It was originally posted as Re: How to do this functionally? on the newsgroup comp.lang.functional on Tue, 9 Oct 2001 22:42:03 -0700
The discussion thread shows, in particular, an implementation of the serially-numbering monad in terms of GHC's State monad.