General-purpose algorithms and data structures, illustrated in Haskell.
Part II

- Reflection without Remorse: A conceptual sequence as a tangible and efficient data structure
- Pure functional, mutation-free, efficient double-linked lists
- Total stream processors and their applications to all infinite streams
- Sparse Grids: multi-dimensional interpolation, approximation and integration
- Fast computation of Bernoulli numbers
- Surprisingly simple fast typed-aligned queues
- Left fold vs. right fold: theory vs. practice
- Efficient integer logarithm of large numbers in any base
- Iterated zipping puzzles
- Generating interesting lambda-terms
- The curious case of the test-driven development
**Part I ...**

- Test-driven development (TDD) has become popular -- which is
fortunate. Writing tests, before code, not only guides the
implementation and gives the confidence that it works, at least in
some cases. Perhaps the biggest benefit of TDD is getting us to think
about program correctness. Before starting on the code we got to
have a rather concrete idea of what the code is to do: we have to
work out the ideal program and the way to check that the finished code
conforms. Testing is one way to check the conformance -- to
some degree, but not fully: ``Testing can
only show the presence of errors, not their absence.'' Like many
maxims, this Dijkstra's saying sounds abstract, removed from
real life: just too true to be good. It is not until
we actually run into a situation where a well-tested code inexplicably and
embarrassingly fails that we begin to see what Dijkstra must have
meant. We have all heard of Intel's FDIV and many similar stories.
That can happen to
everyone, everywhere. This article relays a particularly clear-cut
illustration -- once again prompting the thought that testing ought
to be complemented by other criteria of program correctness.
The sample problem comes from a programming contest; it may just as well be given at an interview. Its core -- modular arithmetic with large numbers -- is at the heart of cryptographic libraries. The problem is: given a natural number

`n`

that can be as big as`10^6`

, print the nine rightmost decimal digits of`n!`

disregarding the trailing zeros. Although the final code is in C, we will be using Haskell for all development.We chose Haskell because it lets us easily write down the well-defined, unambiguous specification:

spec :: Int -> Int spec = fromIntegral . lastnz9 . fact . fromIntegral fact :: Integer -> Integer -- n! fact n = product [1..n] lastnz9 :: Integral a => a -> a -- Last 9 decimal digits of a number, not counting trailing zeros lastnz9 n = drop0s n `mod` 10^9 drop0s n | (q,0) <- quotRem n 10 = drop0s q | otherwise = n

Here`fact`

computes the factorial,`drop0s`

drops the trailing zeroes from the decimal representation of the number (that is, keeps dividing the number by 10 until it no longer divides) and the`mod`

operation gives the last nine decimal digits of the result, if there are that many. This is our ideal program. Another benefit of Haskell is that the specification is executable -- at least in some cases. We can, therefore, use it as the `ground truth' in unit tests.The specification

`spec`

is perfect as the ideal program, clearly describing the intended result. It also runs. However, it is not something we want to run in real life:`spec 10^6`

will take a lot of time, and a whole lot of space. Indeed,`10^6!`

is about`(2*10^5)^(10^6)`

, with at least 5 million decimal digits.As the first step and with an eye towards for-loops in the final C code, we rewrite

`fact`

in the accumulator-passing style (the threaded accumulator corresponds to a mutable variable in C):fact_accum' :: Int -> Int -> Integer -> Integer fact_accum' n i acc | i > n = acc | otherwise = fact_accum' n (i+1) (acc * fromIntegral i) fact_accum n = fact_accum' n 1 1

Since we only need the nine rightmost digits not counting trailing zeros, let's accumulate only those:fact_accum_9' :: Int -> Int -> Int -> Int fact_accum_9' n i acc | i > n = acc | otherwise = fact_accum_9' n (i+1) (lastnz9 (acc * i)) fact_accum_9 n = fact_accum_9' n 1 1

To see that

`fact_accum_9`

conforms to the ideal`spec`

, we test it on a few sample numbers: 37, 53, 100. We get the correct result. Moreover, we do the `smallcheck': exhaustively check for small values of`n`

:all (\n -> spec n == fact_accum_9 n) [0..21]

The idea of the Small Check is that errors, if any, tend to show up for small arguments. The above test returns`True`

. It seems we are done: we can re-write our solution in C, and submit it as the answer.The contest-running software rejects our solution as erroneous, without any details. After more and more testing, luckily we eventually stumble on the problem: although

`fact_accum_9 n`

gives the same answers as`spec n`

for very many`n`

, it deviates from the ideal for some special, isolated`n`

, for example,`25`

:*FactLimit> spec 25 330985984 *FactLimit> fact_accum_9 25 80985984

What went wrong? Let us look at

`fact_accum_9`

carefully. Clearly it is derived from`spec`

by substituting-in the accumulator-passing implementation`fact_accum`

and pushing the operation`lastnz9`

`inside'. We have thus assumed that`lastnz9`

distributes through multiplication:lastnz9 (i*y) === lastnz9 (i * lastnz9 y)

where`i`

is smaller than`10^9`

. That seems correct: the`mod`

that underlies`lastnz9`

indeed has the required property. If`i`

has any factors of`10`

, they will be removed by the`drop0s`

in`lastnz9`

. Let us, however, take`i`

to be`25`

and`y`

to be divisible by`4`

and without trailing zeros, that is,`y`

represented as`k*10^9 + 4r`

for some`k>=0`

and`0 < 4r < 10^9`

. Thenlastnz9 (25 * (k*10^9 + 4r)) === lastnz9 (100 * (k*10^9/4 + r)) === lastnz9 (k*10^9/4 + r) | === r if k is divisible by 4 | === lastnz9 (k*25*10^7 + r) otherwise lastnz9 (25 * lastnz9 (k*10^9 + 4r)) === lastnz9 (25 * 4r) === r

The assumption is hence violated if`k`

is not divisible by`4`

. In our case,`drop0s (fact 24) `div` 10^9`

is`62044840173`

and it is indeed not divisible by four. Thus the property we took for granted does not actually holds. What is worse, it fails in rather special circumstances, which took some Math to find out.Let us make a few observations:

- The
`mod`

operation distributes over multiplication:`(x*y) `mod` p === ((x `mod` p) * (y `mod` p)) `mod` p`

. If`p=10^9`

, we merely have to deal with the numbers up to`10^9`

, which comfortably fit within 32-bit signed integers, with a 64-bit accumulator for the intermediate result of the multiplication. The problem in the earlier program was the subtle interaction of`mod`

and`drop0s`

. - If
`n!`

has any factors of`5`

, it has even more factors of`2`

. That is,`drop0s n!`

is divisible by`2`

but not divisible by`5`

. - All zeros of
`n!`

come from the factors of`5`

contributed by the multiplicands`n`

,`n-1`

,`n-2`

,... If we suppress the factors`5`

during the accumulation,`drop0s`

becomes redundant. We can take the full advantage of point 1. - To elaborate on the last point: suppose we have already computed
`drop0s n!`

. By point 2, it is representable in the form`2^k * r`

, for some odd`r`

not divisible by`5`

. Suppose`n+1`

factors as`2^i * 5^j * s`

where`s`

is likewise odd and not divisible by`5`

. Then`drop (n+1)!`

is`2^(k+i-j) * (r*s)`

. We are sure, by point 2, that`k+i>j`

. It is enough to compute`r`

,`s`

and their product`mod 10^9`

, using the property of`mod`

from point 1.

-- return (n',k) such that n == n'*f^k factors n f = go 0 n where go k n | (q,0) <- quotRem n f = go (k+1) q | otherwise = (n,k) modulus = 10^9 factl :: Int -> Int factl n = go 2 0 1 where go i twos acc | i > n = shifts acc twos go i twos acc = let (i1,k1) = factors i 2 (i2,k2) = factors i1 5 in go (i+1) (twos + k1 - k2) (acc * i2 `mod` modulus) -- compute (acc * 2^twos) `mod` modulus shifts acc 0 = acc shifts acc twos | twos >= 4 = shifts ((acc `shift` 4) `mod` modulus) (twos - 4) | otherwise = (shift acc twos) `mod` modulus

The function`shifts`

performs the delayed multiplication by`2^twos`

, by a sequence of shifts. We again rely on the property of`mod`

to limit the precision of all operations to nine decimal digits. For speed, we shift in batches of four.It is rather easy to transcribe the above algorithm in C (see below). The C code worked on first try and was accepted as the answer to the challenge.

Thus, tests -- and types -- are necessary and should be used, as a spot-check for corner cases and silly mistakes made in transcribing the idea into code. Useful the tests and the types as they are, for verification, they are not sufficient. They have to be complemented with other ways to be sure in the code, such as thinking about the problem mathematically. Proofs are not sufficient either and have to be complemented with tests to spot-check for realism of their assumptions and adequacy of their conclusions. All in all, there is no royal road -- or any road, it seems -- to the correct code, as there are none to an invention or discovery.

- The
**Version**- The current version is December 2015
**References**- FactLimit.hs [3K]

The Haskell code used in the articlefactlimit.c [<1K]

The C code, based on the correct version of the optimal Haskell algorithmRobert P. Colwell. The Pentium Chronicles: The people, passion and politics behind Intel's landmark chips

IEEE Society Press, 2006. Published by John Wiley & Sons, Inc. 187pp.Bob Colwell was the first manager within Intel who learned about the FDIV problem, when a member of his team demonstrated the bug in Pentium (but not in his P6). From talking to Pentium developers Bob Colwell found out how the bug came about. It turns out that the change to FDIV was introduced very late in the Pentium design process, in response to a call to save the die space. The changed algorithm indeed fit on a smaller die (which was however irrelevant since that saved space was inside an already laid out block). The management was at first reluctant to put the change in so late in the process. But it was convinced by the mathematical proof of correctness. The proof also made the validators test the change less stressfully. Alas, the proof turned out faulty.

It is probably not coincidental that after the FDIV problem Intel has developed keen interest in automated formal theorem proving.

We were lucky to find the problematic case without too many tries. For floating-point computations (used in the Intel FPU) the numbers that may cause problems are rare and special. Finding them is a mathematical problem, like the one addressed in the paper:

John Harrison: Isolating critical cases for reciprocals using integer factorization.

Proc. 16th IEEE Symposium on Computer Arithmetic, Santiago de Compostela, Spain 2003, IEEE Computer Society Press, pp. 148-157 2003.

- The difference between theory and practice is a butt of many jokes,
and, hence, the indication of a profound difficulty. How can we apply
a theoretical insight -- say, that the left fold can be written in
terms of the right fold -- to the practical question of designing data
structure traversal libraries. Should we take a stand, as one comes
across, that the left fold, as theoretically redundant, has no place
in a well-designed library? Applying theoretical insights (from moral
and political theory) to messy reality has also worried Kant, among
other people. In his 1793 essay on theory and practice, Kant
emphasized that theoretical rules ``are thought of as principles
possessing a certain generality and, consequently, as being abstracted
from a multitude of conditions that nonetheless necessarily influence
their application.'' Judgment is therefore required to discern when
the abstraction is justified and hence the rule is applicable, and
when it is not. This article will expound this point, reminding
that the theory that makes the left fold redundant abstracts away
memory consumption and (often significant) constant factors in the
running time. Running out of memory and k-times slowdowns do make us
worry. Therefore, in practice the left fold is not redundant
and should be provided (in a form with early termination) alongside the
right fold in a data traversal library. And yet, the abstract theory
is not at all practically futile: it guides program transformations
and reasoning. These transformations, implemented in GHC,
do make
`Prelude.foldr`

subsume the left fold even in practice.Right fold, such the

`foldr`

on listsfoldr :: (a -> b -> b) -> b -> [a] -> b foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs)

is one of the most common and fundamental patterns of processing a data structure. In fact, a data structure can be faithfully represented by -- is isomorphic to -- its right fold. For example, the entire list processing library, from`head`

and`tail`

to`drop`

and`take`

and even`zipWith`

can be written entirely in terms of`foldr`

. The left fold, often called `reduce',foldl :: (b -> a -> b) -> b -> [a] -> b foldl f z [] = z foldl f z (x:xs) = foldl f (f z x) xs

can likewise be written as the right fold:foldl_via_foldr f z l = foldr (\e a z -> a (f z e)) id l z

Therefore, providing both folds within the same library seems redundant. Let us however take a second look at`foldl_via_foldr`

: the`foldr`

in that expression is a so-called `second order fold', building a closure which is then applied to`z`

. To appreciate the costs, let us consider the evaluation of a simple example`foldl_via_foldr f z [e1,e2,e3]`

for some`f`

,`z`

and the three list elements, in call-by-name/need and call-by-value.In call-by-name (call-by-need is the same), our example evaluates as follows:

foldl_via_foldr f z [e1,e2,e3] === foldr (\e a z -> a (f z e)) id [e1,e2,e3] z === -- constructing thunk (foldr (\e a z -> a (f z e)) id [e2,e3]) (\a z -> a (f z e1)) (foldr (\e a z -> a (f z e)) id [e2,e3]) z === -- and a thunk (f z e1) foldr (\e a z -> a (f z e)) id [e2,e3] (f z e1)

We have constructed two thunks and about to force the first one, unrolling`foldr`

once more. A few more steps lead to=== -- a big thunk which duplicates the original list f (f (f z e1) e2) e3

a thunk whose structure mirrors the original list`[e1,e2,e3]`

. We have essentially duplicated the list -- worse than duplicated since a thunk (closure) typically takes more space than a cons cell. We are not done yet: we have to reduce the thunk to the weak-head-normal form. If`f`

is strict in its left argument (such as addition), we have to reduce`(f z e1)`

, then`(f (f z e1) e2)`

, etc. -- effectively traversing the original list once again, this time to compute the result of`f`

reductions.For comparison we evaluate the same example with the native left fold, actually, using its strict version

`foldl'`

:foldl' :: (b -> a -> b) -> b -> [a] -> b foldl' f !z [] = z foldl' f !z (x:xs) = foldl' f (f z x) xs

The reductions, again in call-by-name, look as follows:foldl' f z [e1,e2,e3] === foldl' f (f z e1) [e2,e3] === -- forcing f z x let !v1 = f z e1 in foldl' f v1 [e2,e3] === {- ... -} let !v1 = f z e1 in let !v2 = f v1 e2 in let !v3 = f v2 e3 in v3

No closures or thunks are constructed; we first compute`f z e1`

as`v1`

, then`f v1 e2`

, etc. -- reducing the list element-by-element in constant space. In contrast,`foldl_via_foldr`

had built two sets of thunks (whose size is proportional to the length of the list) and effectively traversed the list twice. One can express`foldl'`

via`foldr`

by adding a strictness annotation to`foldl_via_foldr`

. The applications like`f z e1`

are then reduced immediately; the thunks`foldr (\e a z -> a (f z e)) id [e2,e3]`

, etc. still have to be built. The benchmarks in the accompanying code bear the conclusions out. The strict left fold`foldl'`

is the fastest and the most space-economical way to reduce a list with a strict function: when interpreting the Haskell code and when compiling with all optimizations. In the latter case,`foldl'`

reductions are more than the order-of-magnitude faster than`foldr`

and`foldl_via_foldr`

reductions and three times as faster then`foldl'_via_foldr`

. Furthermore, summing an integer list with`foldl'`

runs truly in constant space, with no allocations. In contrast, emulating`foldl`

or`foldl'`

via`foldr`

looks, from the amount of allocated memory, like duplicating the original list. (See however below for`Prelude.foldr`

.) The case for providing`foldl'`

as a primitive in a data traversal library is compelling.The case for the left fold is even more compelling in strict languages. Our example of reducing a three-element list runs under call-by-value thusly:

foldl_via_foldr f z [e1,e2,e3] === foldr (\e a z -> a (f z e)) id [e1,e2,e3] z === (\a z -> a (f z e1)) (foldr (\e a z -> a (f z e)) id [e2,e3]) z

Before the application of`(\a z -> a (f z e1))`

can proceed the argument`(foldr (\e a z -> a (f z e)) id [e2,e3])`

has to be evaluated first, pushing the pending application onto stack:=== -- evaluating the argument, application on stack let f23 = foldr (\e a z -> a (f z e)) id [e2,e3] in (\a z -> a (f z e1)) f23 z === {- ... -} let f0 = foldr (\e a z -> a (f z e)) id [] in let f3 = (\a z -> a (f z e3)) f0 in let f23 = (\a z -> a (f z e2)) f3 in (\a z -> a (f z e1)) f23 z

At this point the stack looks quite like the original list itself, with list elements in closures rather than cons cells. We have in effect copied the whole list onto stack. For long lists, stack overflow is imminent. Now we have to unwind the stack:=== let f0 = id in let f3 = (\a z -> a (f z e3)) f0 in let f23 = (\a z -> a (f z e2)) f3 in (\a z -> a (f z e1)) f23 z === let f3 = (\z -> id (f z e3)) in -- constructing closure, a value let f23 = (\a z -> a (f z e2)) f3 in (\a z -> a (f z e1)) f23 z === let f3 = (\z -> id (f z e3)) in -- constructing closure, a value let f23 = (\z -> f3 (f z e2)) in -- constructing closure, a value (\a z -> a (f z e1)) f23 z

Unwinding the stack has built the closures`f3`

and`f23`

. The list got copied again, from the stack to the heap, as closures. The application`(\a z -> a (f z e1))`

can proceed, and the list is traversed once more, this time computing the`f`

reductions:=== let f3 = (\z -> id (f z e3)) in let f23 = (\z -> f3 (f z e2)) in f23 (f z e1) -- evaluating (f z e1) === let f3 = (\z -> id (f z e3)) in let f23 = (\z -> f3 (f z e2)) in let v1 = f z e1 in f23 v1 === {- ... -} let f3 = (\z -> id (f z e3)) in let f23 = (\z -> f3 (f z e2)) in let v1 = f z e1 in let v2 = f v1 e2 in let v3 = f v2 e3 in v3

The total cost: copying the list onto the stack and then to the heap, requiring O(N) stack and heap space, and effectively three traversals. On the other hand, the primitive left fold accomplishes the job in constant space, with a single pass through the list. Thus emulating left fold via right fold is a bad idea in practice.An interesting variation is folding with a non-strict function. For example, indexing of a list

index :: Int -> [a] -> a index _ [] = error $ "No such index" index 0 (x:_) = x index n (_:xs) = index (n-1) xs

is also theoretically expressible through the right fold:index_foldr :: Int -> [a] -> a index_foldr n xs = foldr (\x r n -> if n == 0 then x else r (n - 1)) (const (error $ "No such index")) xs n

An argument similar to the one for the left fold may convince us that writing`index`

via`foldr`

is not good in practice, due to the cost of closure construction and extra traversal. Writing`index`

in terms of`foldl`

is not a good idea either since the left fold always traverses the list through the very end. Query functions like`index`

should stop scanning the data structure as soon as the desired element is found. Therefore, we need something like`foldl`

with early termination -- which is what iteratees are. In strict languages, whose folds are generally effectful, that is perhaps the only choice. However, in Haskell there is another, surprising choice.We have seen that

`foldl_via_foldr`

is not practically viable and hence`foldl'`

is better be provided by a data processing library. And yet the theory that neglects memory consumption is not practically futile. In GHC, compiling`foldl_via_foldr`

using specifically Prelude's`foldr`

and full optimizations produces the same code as the hand-written`foldl`

with the explicit recursion. GHC, implementing theoretically justified transformations,*can*afford to drop`foldl`

and`foldl'`

from the standard library without any loss of performance.We have to, however, re-write

`foldr`

as follows, which is how it is programmed in the standard Prelude:foldr_Prelude :: (a -> b -> b) -> b -> [a] -> b foldr_Prelude f z = go where go [] = z go (x:xs) = f x (go xs)

We have applied the equivalence-preserving transformation called `lambda-dropping', the opposite of `lambda-lifting'. The correctness of lambda-dropping is shown theoretically. Lambda-lifting is pervasive in compiling higher-order languages; lambda-dropping is of limited applicability and so is not done automatically. GHC authors had to write`foldr_Prelude`

by hand.When compiling

`foldl_via_foldr`

, GHC first inlines`foldr_Prelude`

, obtaining:foldl_via_foldr f z l = go l z where go [] = id go (x:xs) = (\e a z -> a (f z e)) x (go xs)

The next step are two beta-reductions, the second of which substitutes`go xs`

. Generally substituting an expression is only sound in call-by-name/need, or when we can prove the expression is effect-free and terminating. In this particular case, the substitution is clearly always valid. (It is not clear how smart a call-by-value compiler has to be to see that.) The result:foldl_via_foldr f z l = go l z where go [] = id go (x:xs) = \z -> (go xs) (f z x)

or, equivalentlyfoldl_via_foldr f z l = go l z where go [] z = z go (x:xs) z = (go xs) (f z x)

is literally the left fold. A look at the Core code produced by GHC shows that`foldl_via_foldr`

and`foldl`

compile to the same code. After inlining and code substitutions GHC has derived the left fold. Likewise, GHC derives the optimal, hand-written`index`

from`index_foldr`

.Applying theory to practice is indeed a messy business. One has to know the assumptions of the theory and use one's judgment. The theoretical expressivity of the left fold in terms of the right fold does not usually apply to practice, where we care about running out of memory and k-times slowdowns. Therefore, the left fold, albeit theoretically redundant, should be provided in a data structure library along with the right fold, especially in call-by-value languages. In the latter case, the left fold with an early termination makes for a better primitive.

We have also seen that a theory that abstract away memory consumption and constant factors is not at all practically useless. The abstraction uncovers the essentials of

`foldr`

and help derive very powerful optimizations, which do make, in some cases, the left fold redundant after all, even in practice. It seems befitting to end with two quotes from Kant: ``[T]he general rule must be supplemented by an act of judgment whereby the practitioner distinguishes instances where the rule applies from those where it does not. ... No-one can pretend to be practically versed in a branch of knowledge and yet treat theory with scorn, without exposing the fact that he is an ignoramus in his subject.'' **Version**- The current version is May 2014
**References**- Folds.hs [9K]

Complete code for the article, with detailed derivations and a few benchmarksAlgebraic data types and measuring strictness

The article that counts closure and cons allocations for different folds, including the left-via-right fold, for different evaluation strategies and list strictnessTim Sheard and Leonidas Fegaras: A Fold for All Seasons. FPCA 1993

Beyond Church encoding: Boehm-Berarducci isomorphism of algebraic data types "and polymorphic lambda-terms

Boehm and Berarducci's motivation was to translate programs with data structures into pure System F programs, with just functions.Jeremy Gibbons and Geraint Jones: The Under-Appreciated Unfold. ICFP 1998

One of the points of the paper is the argument against using foldr as a big hammer. Looking at other combinators, such as unfold, leads to the insightful and efficient code.Olivier Danvy and Ulrik P. Schultz. Lambda-Dropping: Transforming Recursive Equations into Programs with Block Structure

BRICS Report Series 1999, RS-99-27

<http://www.brics.dk/RS/99/27/BRICS-RS-99-27.pdf>How to zip folds: A library of fold transformers (streams) The list processing library implemented in terms of the right fold

From enumerators to cursors: turning the left fold inside out

An early argument for the left fold with early termination as a traversal primitiveImmanuel Kant: Theory and Practice

(The full title, literally from German: ``On the Proverb: That May be True in Theory but Is of No Practical Use'')

A few salient quotesAn aggregation of rules, even of practical rules, is called a theory, as long as these rules are thought of as principles possessing a certain generality and, consequently, as being abstracted from a multitude of conditions that nonetheless necessarily influence their application. Conversely, not every undertaking [Hantierung] is a practice [Praxis]; rather, only such ends as are thought of as being brought about in consequence of certain generally conceived [vorgestellten] principles of procedure [Verfahrens] are designated practices.For to the concept of the understanding that contains the rule must be added an act of judgment by means of which the practitioner decides whether or not something is an instance of the rule. And since further rules cannot always be added to guide judgment in its subsumptions (for that could go on infinitely), there can be theoreticians who, lacking judgment, can never be practical in their lives.

<http://homepages.law.asu.edu/~jeffriem/kantarticlea.htm>the general rule must be supplemented by an act of judgement whereby the practitioner distinguishes instances where the rule applies from those where it does not.No-one can pretend to be practically versed in a branch of knowledge and yet treat theory with scorn, without exposing the fact that he is an ignoramus in his subject.

Encyclopedia of Ethics, 2nd edition, vol. 3, pp. 1706-1708.

<http://www.jamesrachels.org/theory.pdf>

- It is always an interesting challenge to write a
pure functional and efficient implementation of an imperative
algorithm destructively operating on a data
structure. The functional implementation has a significant benefit
of equational reasoning and modularity. We can comprehend
the algorithm without keeping the implicit global state in mind. The
mutation-free, functional realization also has practical benefits:
the ease of adding checkpointing, undo and redo. The absence of
mutations makes the code multi-threading-safe and helps in porting
to distributed or non-shared-memory parallel
architectures. On the other hand, an imperative implementation has the
advantage of optimality: mutating a component in a complex data
structure is a constant-time operation, at least on conventional
architectures. Imperative code makes sharing explicit, and so permits
efficient implementation of cyclic data structures.
We show a simple example of achieving all the benefits of an imperative data structure -- including sharing and the efficiency of updates -- in a pure functional program. Our data structure is a doubly-linked, possibly cyclic list, with the standard operations of adding, deleting and updating elements; traversing the list in both directions; iterating over the list, with cycle detection. The code:

- uniformly handles both cyclic and terminated lists;
- does not rebuild the whole list on updates;
- updates the value in the current node in time bound by a small constant;
- does not use or mention any monads;
- does not use any
`IORef`

,`STRef`

,`TVars`

, or any other destructive updates; - permits logging, undoing and redoing of updates, checkpointing;
- easily generalizes to two-dimensional meshes.

The algorithm is essentially imperative -- hence supporting the identity check and the in-place `update' -- but implemented purely functionally. Although the code relies on many

*local*, type safe `heaps', there is emphatically no global heap and no global state.It is not for nothing that Haskell has been called the best imperative language. Imperative algorithms are implementable as they are -- yet genuinely functionally, without resorting to the monadic sub-language but taking the full advantage of clausal definitions, pattern guards, laziness.

**Version**- The current version is 1.2, Jan 7, 2009
**References**- FDList.hs [8K]

The complete, commented Haskell98 code and many testsHaskell-Cafe discussion ``Updating doubly linked lists''. January 2009.

- We present a pure-functional queue data structure with the operations:
- adding an element to the left or to the right:
`cons`

and`snoc`

; - concatenation;
- emptyness testing;
- taking from the left-edge: obtaining head and the tail.

The implementation stands out by its sheer simplicity, and hence time- and space-efficiency: it is the ordinary tree, which is made more and more left-leaning by left-edge deconstruction operations.

The queue also fits elements of different types. However, unlike `freely' heterogenous data structures, our queue is

*type-aligned*: each element has the type of the form`r ai bi`

where`r`

is a two-place type constructor, and`ai = b{i-1}`

and`bi = a{i+1}`

. The type alignment is better explained by example: suppose`r`

is the arrow type constructor. Then a type-aligned queue with`n`

elements contains functions of the following types and in the following order:a0 -> a1, a1 -> a2, ... a{n-2} -> a{n-1}, a{n-1} -> an

The type-aligned queue hence represents the composition of the above`n`

functions. Unlike the ordinary functional composition, the queue lets us access individual members, i.e., uncompose.The need to split or peel off a composition of functions comes, in particular, when the composition represents the stack of an abstract machine, or a continuation. Each member of the composition stands for a stack frame. The type-aligned queue then becomes the ideal data structure to represent the continuation of an effectful operation, in the free(er) monad implementation of algebraic effects.

- adding an element to the left or to the right:
**Version**- The current version is July 2015
**References**- FTCQueue.hs [2K]

The complete Haskell implementationFTCQueue1.hs [2K]

Fast type-aligned queue specialized to the functions of the type`a -> m b`

(the type of `stack frames' of monadic computations) and further optimized. It underlies the implementation of extensible effects.Reflection without Remorse: A conceptual sequence as a tangible and efficient data structure

The paper also describes type-aligned queues, based on Okasaki's pure-functional queues. However, those queues are rather heavy-weight. Although the algorithmic complexity is the same as for our queues, Okasaki-based type-aligned queues have bigger constant factors, which made the first freer monad implementation of extensible effects noticeably slower.

- Sparse Grids is a multi-dimensional interpolation, approximation and
integration technique that offers a way to cope with the curse of
dimensionality and is hence scalable to a larger number of dimensions than
the standard grid approximations. Sparse grids are constructed as a
tensor product (Cartesian product) of one-dimensional
*multi-scale*bases. If we take the Cartesian product of`d`

ordinary grids with`N`

points, we end up with`O(N^d)`

points (degrees of freedom). The sparse grid construction gives only`O(N * (log N)^(dā1))`

degrees of freedom, without sacrificing the accuracy of the approximation.Sparse Grids were first proposed as a numerical integration technique by the Soviet mathematician Smolyak in 1963, and re-discovered in Germany in 1990s for solving partial differential equations. It is now used for integral equations, stochastic differential equations, interpolation and approximation, and many other applications.

The enclosed code builds the hat-function hierarchical basis and performs the sparse-grid interpolation for functions vanishing at the boundaries. The code includes a few tests. They clearly show that the interpolant is computed on-the-fly -- as an on-demand sequence of successively more precise approximations, to be cut off when reaching the desired precision.

I'm grateful to Viktor Winschel for introducing me to sparse grids and helpful conversations.

**Version**- The current version is November 2009
**References**- SparseGrid.hs [13K]

Complete, essentially Haskell98 codeHans-Joachim Bungartz and Michael Griebel: Sparse grids

Acta Numerica (2004), pp. 1ā123. DOI: 10.1017/S0962492904000182Jochen Garcke: Sparse Grid Tutorial. August 2006

- In the article on seemingly impossible functional programs,
Marti'n Escardo' wrote about decidable checking of the
satisfaction of a total computable predicate on Cantor numerals. The
latter represent infinite bit strings, or all real numbers
within
`[0,1]`

. Mart'n Escardo's technique can tell,*in finite time*, if a given total computable predicate is satisfied over*all*possible infinite bit strings. Furthermore, for so-called sparse predicates, Marti'n Escardo's technique is very fast.We re-formulate the problem in terms of streams and depth-limited depth-first search, and thus cast off the mystery of deciding the satisfiability of a total computable predicate over the set of all Cantor numerals, which are uncountable.

As an additional contribution, we show how to write functions over Cantor numerals in a `natural' monadic style so that those functions become self-partially evaluating. The instantiation of the functions in an appropriate

*pure*monad gives us transparent memoization, without any changes to the functions themselves. The monad in question is pure and involves no reference cells.On `dense' functions on numerals (i.e., those that need to examine most of the bits of their argument, up to a limit), our technique performs about 9 times faster than the most sophisticated one by Marti'n Escardo'.

**Version**- The current version is October 2, 2007
**References**- StreamPEval.lhs [14K]

Total stream processors and quantification over the infinite number of infinite streams. The complete article.StreamPEval.hs [12K]

Extensively commented Haskell98 codeMarti'n Escardo': Seemingly impossible functional programs.

<http://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/>

- It is straightforward to generate lambda-terms, with the help of the
simplest non-deterministic operations for choice and failure that are
available in many language systems. With a little
more care one can ensure that any lambda-term occur in the generated
stream sooner or later. It takes remarkably more care to make
interesting terms come sooner than much, much later. This article
gives a few tips on how to relatively quickly
generate interesting terms such the S and Y
combinators, the divergent omega-term, a Church numeral and its successor.
We limit attention to closed terms since all interesting terms mentioned earlier are closed. De Bruijn indices let us avoid wasting time on building alpha-equivalent terms. (The terms will be pretty-printed in the conventional, named-variable notation, for clarity.) Thus we will be generating terms described by the following grammar (data type declaration), with the side condition that the terms are closed.

data Exp = V Int | A Exp Exp | L Exp

The generator is a truly straightforward application of the simplest non-deterministic operations in the

`MonadPlus`

interface.a_term_naive :: MonadPlus m => m Exp a_term_naive = L `liftM` go 0 where -- go n: generate a lambda term with free variables 0..n go n = choose_var n `mplus` choose_lam n `mplus` choose_app n choose_var 0 = return (V 0) choose_var n = return (V n) `mplus` choose_var (n-1) choose_app n = liftM2 A (go n) (go n) choose_lam n = L `liftM` go (n+1)

Absent constants, a closed lambda-term must start with a lambda-binder. The rest could be either a variable, an application of two non-deterministically chosen terms, or an abstraction with a random body.

The equally straightforward implementation of

`MonadPlus`

, the List monad, fails to generate anything interesting within the first 10 000 terms. It is not difficult to see why. Here are the first 8 generated terms:Lx.x, Lx.Ly.x, Lx.Ly.y, Lx.Ly.Lz.x, Lx.Ly.Lz.y, Lx.Ly.Lz.z, Lx.Ly.Lz.Lu.x, Lx.Ly.Lz.Lu.y

The List monad realizes an incomplete search strategy: there is no guarantee that any given term will ever come. No applications are indeed forthcoming with the List monad. A better implementation of non-determinism is needed, with a complete search strategy: for example, iterative deepening or FBackTrack. With iterative deepening (which in our implementation produces the same sequence as breadth-first search but without taking Gbytes of memory), the first 10 generated terms

Lx.x, Lx.Ly.x, Lx.Ly.y, Lx.x x, Lx.Ly.Lz.x, Lx.Ly.Lz.y, Lx.Ly.Lz.z, Lx.x (Ly.x), Lx.x (Ly.y), Lx.x (x x)

look quite hopeful. Within the first 10 000 generated terms, the self-application`Lx.x x`

, or`delta`

, comes 4th; the third Church numeral comes 716th and`omega`

comes 3344th. Alas, there is no successor or the S combinator, let alone the Y combinator. With the FBackTrack implementation, the first few terms may look even more hopefulLx.x, Lx.x x, Lx.Ly.x, Lx.x x x, Lx.Ly.x x, Lx.x (x x), Lx.Ly.Lz.x, Lx.(Ly.x) x, Lx.x (Ly.x)

The term`delta`

comes the second, the third Church numeral comes 695th. Alas, within the first 99 977 terms nothing else interesting comes. Generating interesting terms is not at all as straightforward as it seems. Even sophisticated MonadPlus implementations did not help.It is crucial to recognize that the straightforward search for lambda-terms is biased, and that bias is not in favor of interesting terms. Generally, the fewer non-deterministic choices it takes to produce a term, the sooner the term comes. The straightforward generator clearly favors selectors (terms of the form

`Lx.Ly...Lu.x`

) and simple applications such as`Lx. (x x) (x x)`

. We should put the brake on generating abstractions: each new`L`

adds a new variable and hence dramatically many more possible terms. We should encourage the generator to play more with the variables it already has. To prevent the string of applications of a variable to itself, we should produce terms in the general order of their size. Again the goal is to explore the search space more uniformly. It is tempting to generate only normal forms. Alas, the Y combinator and`omega`

do not have normal forms. Therefore, we do generate redices, but only of interesting kinds (whose argument is an abstraction), and only occasionally. The following code incorporates these ideas:a_term :: MonadPlus m => m Exp a_term = L `liftM` go 0 where -- go n: generate a lambda term with free variables 0..n go n = do size <- iota 1 gen n True size -- gen n l s: generate a lambda term with free variables 0..n and -- of the size exactly s. The second argument tells if to generate -- abstractions gen _ _ s | s <= 0 = mzero gen n _ 1 = choose_var n gen n True 2 = choose_lam n 2 gen n False 2 = mzero gen n True s = choose_app n s `mplus` choose_lam n s gen n False s = choose_app n s choose_var n = msum . map (return . V) $ [0..n] choose_lam n s = penalty (40*n) $ L `liftM` gen (n+1) True (s-1) choose_app n s = do let s' = s - 1 -- Account for the 'A' constructor let gen_redex = do lefts <- range 4 (s' - 3) liftM2 A (choose_lam n lefts) (choose_lam n (s' - lefts)) let gen_noredex = do lefts <- range 1 (s' - 1) liftM2 A (gen n False lefts) (gen n True (s'-lefts)) gen_noredex `mplus` penalty 4 gen_redex

The auxiliary

`iota i`

produces an integer greater or equal to`i`

;`range i j`

chooses an integer between`i`

and`j`

, inclusive. The operation`yield`

-- Lowers the priority of m, so choices of m will be tried less often yield :: MonadPlus m => m a -> m a yield m = mzero `mplus` m

and its n-th iterate`penalty n`

produce a string of failures before trying the argument computation`m`

. When a complete search strategy sees many failures it tends to turn away and to pay more attention to other parts of the search space.Here are the first 10 terms produced by the sophisticated generator:

Lx.x, Lx.Ly.y, Lx.Ly.x, Lx.x x, Lx.x (Ly.y), Lx.Ly.y y, Lx.Ly.x y, Lx.x (x x), Lx.Ly.y x, Lx.x (Ly.x)

The term`delta`

comes fourth,`omega`

comes 54th, the Y combinator 303d, the third Church numeral 393d, the S combinator 1763d and the successor 4865th.The conclusion, although obvious in hindsight, is still thought-provoking: interesting lambda-terms are really hard to encounter by accident. They are exquisitely rare in the space of possible lambda-terms and distributed non-uniformly. A monkey banging on even the sophisticated lambda-typewriter may have printed the Y combinator, but would unlikely to print even the addition combinator within its lifetime.

**References**- EnumFix.hs [12K]

The complete code for the simple and sophisticated generators, as part of the procedure to search for fixpoint combinatorsFBackTrack.hs [3K]

Simple fair and terminating backtracking monadSearches.hs [14K]

Iterative deepening search

- The integer logarithm of the number
`n`

in base`b`

is the integer`l`

such that`b^l <= n < b^(l+1)`

. The number one greater than the integer logarithm base`b`

is the size of`n`

, that is, the number of digits, in its base-`b`

representation. We present the Haskell98 code that is just as efficient as the internal`GHC.Integer.Logarithms.integerLogBase#`

function but uses no unboxed data, no optimizations, and is not even compiled.Naively, the integer logarithm is computed by repeated divisions of

`n`

by`b`

, until the result reaches 1. This procedure requires`l`

divisions, where`l`

is the logarithm. We present two efficient algorithms: the first uses only multiplications, no more than`2*log_2 l`

of them, and`(log_2 l) * sizeof(n)`

extra memory for the powers of`b`

. The second algorithm does`log_2 l`

multiplications and no more than`log_2 l`

integer divisions (and the same amount of extra memory) to compute`l`

.Here is the multiplication-only procedure that returns

`l+1`

where`l`

is the integer logarithm of`n`

in base`b`

. It is a composition of two functions, which together compute the bits of`l`

:`major_bit`

determines the upper bound and`other_bits`

improves it.data BaseP = BaseP !Integer -- b^k !Int -- k mul_bp :: BaseP -> BaseP -> BaseP mul_bp (BaseP bk1 k1) (BaseP bk2 k2) = BaseP (bk1*bk2) (k1+k2) basewidth :: Integer -> Integer -> Int basewidth b _ | b < 1 = error "basewidth: base must be greater than 1" basewidth b n | n < b = 1 basewidth b n | n == b = 2 basewidth b n = major_bit [BaseP b 1] where major_bit :: [BaseP] -> Int major_bit bases@(bp:bps) = let bpnext@(BaseP bk2 k2) = bp `mul_bp` bp in case compare bk2 n of EQ -> k2 + 1 -- n == b^(2k) GT -> other_bits bp bps -- b^(2k) > n LT -> major_bit (bpnext : bases) -- b^(2k) < n other_bits (BaseP _ i) [] = i+1 -- b^i < n < b^(i+1) other_bits bp (bphalf:bps) = let bpup@(BaseP bik ik) = bp `mul_bp` bphalf in case compare bik n of EQ -> ik + 1 -- n == b^(i+k) GT -> other_bits bp bps -- b^i < n < b^(i+k) LT -> other_bits bpup bps -- b^(i+k) < n < b^(i+2k)

The correctness and the complexity analysis follow from the invariants of the two auxiliary functions. In any call`major_bit bases`

, the list`bases`

is`[BaseP b^k k | j <- [d,d-1..0], k=2^j]`

for some`d`

and`n > b^(2^d)`

. In the first invocation,`d`

is 0 and progressively increases until such`d>0`

is found that`b^(2^(d-1)) < n <= b^(2^d)`

. In any call`other_bits (BaseP bi i) bases`

, the list`bases`

is`[BaseP b^k k | j <- [d,d-1..0], k=2^j]`

for some`d`

,`k=2^d`

and`b^i < n < b^(i+2k)`

. Each invocation of`other_bits`

halfs`k`

until it reaches one. The other, more efficient as it turns out, algorithm modifies`other_bits`

to divide`n`

by the candidate lower bound`b^i`

. That integer logarithm function computes that the 48th Mersenne prime`2^57885161-1`

has 17425170 digits in its decimal representation -- in`1.2`

seconds. **Version**- The current version is December 2013
**References**- BaseWidth.hs [9K]

Complete commented Haskell98 source code and tests

- This Haskell98 code quickly computes Bernoulli numbers. The
code avoids explicit recursion, explicit factorials and (most)
computing with rationals, demonstrating stream-wise processing and CAF
memoization. For example, the following snippet defines a 2D table of
pre-computed powers
`r^n`

for*all*`r>=2`

and`n>1`

. Thanks to lazy evaluation, the table is automatically sized as needed. There is no need to guess the maximal size of the table so to allocate it.powers = [2..] : map (zipWith (*) (head powers)) powers

The rest of the algorithm exhibits similar stream-wise processing and computations of tables in terms of themselves. **Version**- The current version is 1.1, March 2003
**References**- Bernoulli.hs [3K]

Commented Haskell98 source code and testsMessages speedup help by Damien R. Sullivan, Bill Wood, Andrew J Bromage, Mark P Jones, and many others posted on the Haskell-Cafe mailing list on March 6-8, 2003

- The following is the refined code for the challenge
originally posted by Joe English on a Haskell-Cafe thread about
homework-like puzzles. The challenge was to figure out what the code
does without first loading it up in a Haskell interpreter.
s f g x = f x (g x) puzzle = (!!) $ iterate (s (lzw (+)) (0:)) [1] where lzw op [] ys = ys lzw op (x:xs) (y:ys) = op x y : lzw op xs ys

Incidentally, a small change gives a different series:puzzle1 = (!!) $ iterate (s ((lzw (+)).(0:)) (1:)) [] where lzw op [] ys = ys lzw op (x:xs) (y:ys) = op x y : lzw op xs ys

Finally, how can we possibly live without the following:puzzle2 = (!!) $ iterate (s ((lzw (+)).(1:).(0:)) (0:)) [1,1] where lzw op xs [] = [] lzw op (x:xs) (y:ys) = op x y : lzw op xs ys

Hints:

*Main> puzzle 5 [1,5,10,10,5,1] *Main> puzzle1 5 [1,2,4,8,16] *Main> puzzle2 5 [1,1,2,3,5,8,13]

**Version**- The current version is August 2003
**References**- Message Homework posted on
the Haskell-Cafe mailing list on Mon, 25 Aug 2003 18:50:42 -0700 (PDT)
Discussion thread, started on Haskell-Cafe by Thomas Bevan on Aug 22, 2003.