General-purpose algorithms and data structures, illustrated in Haskell
The article first formally defines the perfect random shuffle,
outlines the algorithm, and discusses the well-known imperative
shuffler that swaps elements in the array. We present and analyze two
purely functional implementations. The naive one runs in O(n^2)
time. A more sophisticated code has the O(n*logn)
time complexity. The article contains the full code to
arrange a sequence in a complete binary tree and to extract an
arbitrary element from the tree maintaining the completeness of the
remaining tree.
Dave Bayer, Persi Diaconis: Trailing the dovetail shuffle to its lair
Ann. Appl. Probab. 2 (1992), no. 2, pp. 294-313.
In the context of card shuffling, `perfect shuffle' means a deterministic interlacing of the two halves of the deck. The paper studies the perfect and other card shuffles in detail.
int
. A type system is sound if it
correctly approximates the dynamic behavior and predicts its outcome:
if the static semantics predicts that a term has the
type int
, the dynamic evaluation of the term, if it
terminates, will yield an integer.As object language, we use simply-typed and let-polymorphic lambda calculi with integers and integer operations as constants. We use Haskell as a metalanguage in which to write evaluators, type checkers, type reconstructors and inferencers for the object language.
We explore the deep relation between parametric polymorphism and `inlining'. Polymorphic type checking then is an optimization allowing us to type check a polymorphic term at the place of its definition rather than at the places of its use.
Joint work with Chung-chieh Shan.
EvalN.hs [4K]
Evaluator for the untyped (nominal) lambda-calculus with integers and zero-comparison
TEvalNC.hs [4K]
Type reconstruction for the simply-typed Church-style lambda-calculus with integers and zero-comparison
Source terms have type annotations on bound variables. Type reconstruction is implemented as an evaluator with a non-standard semantics.
TEvalNR.hs [4K]
A version of TEvalNC.hs
that reports not only the inferred type of the whole term but also the inferred types for each subterm
TInfT.hs [7K]
Type reconstruction for the simply-typed Curry-style lambda-calculus with integers and zero-comparison
Source terms have no type annotations whatsoever.
TInfTM.hs [10K]
A version of TInfT.hs
hiding the single-threaded state through simple algebraic transformations (beta-expansions)
TInfLetI.hs [10K]
The Curry-style calculus with Let
: polymorphic type checking via `inlining'
TInfLetP.hs [14K]
Let-polymorphism via type schemes
This is a non-canonical presentation of Hindley-Milner type inference, treated as an optimization of polymorphic type checking by inlining.
TInfTEnv.hs [10K]
Simply-typed Curry-style lambda-calculus with integers and zero-comparison: reconstructing not only types but also the type environment
This approach lets us type check open terms. A type checking failure means the term cannot be typed in any type environment.
One often hears a complaint against typing: we can evaluate terms we can't type check. This code shows that we can type check terms we can't even evaluate. We cannot evaluate an open term at all, but we can type check it, inferring both its type as well as the requirement on the environments in which the term must be used.
EvalTaglessI.hs [9K]
Initial Tagless Typed lambda-calculus with integers and the conditional in the higher-order abstract syntax
Haskell itself ensures the object terms are well-typed. This code relies on GADT.
EvalTaglessF.hs [8K]
Final Tagless Typed lambda-calculus with integers and the conditional in the higher-order abstract syntax
Haskell itself ensures the object terms are well-typed. No GADTs are used. Apart from switching off the monomorphism restriction for convenience, the code is in Haskell98.
IORef
s. There is also a
monad-less solution, which maintains a dictionary of
already constructed nodes. The latter is less obvious: seemingly we
cannot add a node to the dictionary until we have built the node. Therefore,
we cannot use the updated dictionary when building the
descendants of the node -- which need the updated dictionary to link
back. The problem can be overcome however with a credit card
transform (a.k.a. "buy now, pay later" transform). To avoid
hitting the bottom, we just have to "pay" by the "due date".For illustration, the article below considers the problem of printing out a non-deterministic finite automaton (NFA) and transforming it into a deterministic finite automaton (DFA). Both NFA and DFA are represented as cyclic graphs.
Tying the knot: How to build a cyclic data structure
A CommonHaskellIdioms Haskell Wiki page.
Trying to tie the knot
< http://www.mail-archive.com/haskell@haskell.org/msg10687.html >
A message on the Haskell mailing list about a fixed-point combinator helping to tie a knot with forward links. Posted on Fri, 12 Apr 2002 18:44:07 -0700.
We consider the problem of one producer and two consumers in the general setting. We derive a deforested version, which no longer needs to buffer any produced items. When we run that version, the memory retaining profile shows no space leaks. We also discuss a ``parallel'' writing of several streams into two distinct files. Our solution is safe and yet the i/o is effectively interleaved. The deforested solution is derived, via a sequence of equivalent transformations. The derived code worked on the first try.
Jorge Adriano: producing and consuming lists
A message posted on the Haskell-Cafe mailing list posted on Tue, 5 Nov 2002 16:04:58 +0000. The message describes the problem of consuming two inter-dependent streams while avoiding space leaks.
DefinitionTree
, is yet another
embedding of Prolog in Haskell. It is distinguished not by speed or
convenience. Rather, it is explicitly designed to formalize
evaluation strategies such as SLD and SLD-interleaving, to be easier
to reason about and so help prove termination and other properties of
the evaluation strategies. The main difference of the DefinitionTree
from
other embeddings of Prolog in Haskell is the absence of
name-generation effects. We need neither gensym
nor the state monad to
ensure the unique naming of logic variables. Since naming and
evaluation are fully separated, the evaluation strategy is no longer
concerned with fresh name generation and so is easier to reason
about equationally.Pure, declarative, and constructive arithmetic relations
DefinitionTree
has been used to describe the SLD and SLD-interleave evaluation strategies and to prove the basic properties of solution sets.
Int->Int
function.
That seems simple until we realize that
finding the value of that particular function at the argument x
requires evaluating the function for smaller arguments and may require evaluating the function for larger arguments. Thus
straight-forward memoization leads to a large and sparsely populated
memoization table. As the original poster observed, the memoized
function begins thrashing sooner than the original code.
The solution is to balance the cost of recomputing the value versus
the cost of maintaining the memoization table and doing the lookup. It
is worth memoize the values of the function for some small x
--
which are used frequently. For larger x
, re-computation is
better. The parameter sc_upb
in the code determines the balance, the
largest argument value for which we still memoize. The simple
benchmark shows that this controlled memoization does improve both the
space and time performances.
A notable feature of the code is the transparency of adding the controlled memoization. The body of the algorithm remains essentially the same.
Our problem is an instance of a more general one, of attribute, or property lists. In Common Lisp, a property list is a list of arbitrary key-value pairs attached to a symbol as an annotation. Crucially, the set of possible keys and the types of possible associated values are not known beforehand. One can always add an arbitrarily new annotation to a symbol. Property lists are especially handy in a compiler, letting each node of an abstract syntax tree be annotated with source location data, with the inferred type of the corresponding sub-expression, or with the results of various analyses. MLton, for example, uses property lists extensively for that purpose. However, MLton destructively modifies the tree as new annotations are added, and MLton resorts to the Universal type. Pure functional, typed implementation of post-factum--added property lists is an interesting problem.
Our solution relies on the observation that each node in
an (ordered) tree can be identified by a path -- a sequence of
integers. The root node has the empty sequence as its path. The
path (h:t)
identifies the node that is the h
-th child of the
node reachable by the path t
. The path
is the `natural' identification for each tree node; it can be
constructed and resolved purely functionally. Furthermore, there is a
decidable notion of identity and order on paths. To annotate the tree
with values of a certain type, we build a separate
finite map data structure associating paths with the annotations on
the corresponding nodes. To add annotations of a different type, we
build another map.
We have used this solution in a type checking code, to annotate each
node of the syntax tree with the type. Each sub-expression gets
annotated with its reconstructed type. The annotations are attached
without re-defining the data type of expressions or even rebuilding
the syntax tree. Since the code was intended for teaching, clarity was
important and so was the avoidance of STRefs
let alone StableNames
and any IO. Our
code, extending the base type checker TEvalNC.hs
, is in
the file TEvalNR.hs
.
A bonus exercise was to modify the code to make the type checker return the reconstructed types of sub-terms even if the entire term turns out ill-typed. The intention was to model OCaml -- which, given a special flag, can dump the inferred types of all sub-expressions, even if the overall type checking failed. In Emacs and vi, one can highlight an expression or variable and see its inferred type. If the type checking failed, one can still explore what the type checker did manage to figure out.
TEvalNR.hs [4K]
A version of TEvalNC.hs
that reports not only
the inferred type of the whole term but also the inferred types for
each subterm. The latter data are
returned in a `virtual' typed AST -- virtual because the original AST
is not modified and the inferred types are attached to AST nodes,
well, virtually.
Interpreting types as abstract values: a tutorial on Hindley-Milner type inference
Lecture notes with the explanation of the type checking code.
MLton: Property Lists
< http://mlton.org/PropertyList >
The description of property lists and their implementation using mutation and the Universal type.
We consider three examples of such transformations,
including two pure functional implementations of a bubble-sort. The
imperative code is quite complex: nested loops, data
dependencies across iterations, reading from and writing into the
array, conditional branches that depend on data
and the iteration history, control dependencies across nested loops.
We show two solutions in Haskell. Both use fast, imperative STArrays
;
both try to separate the imperative part from a functional part. The
imperative part is performed in a monad while the functional part
is written in ordinary Haskell. The two solutions are closely related,
and written to emphasize their similarity. One of the solution is
an interpreter for a trivial virtual machine.
The analysis of data and control dependencies across loop iterations, scheduling of instructions and interlocking of their results, pre-fetching of data -- all are the job of an advanced compiler or the CPU. On the other hand, we have to do the same kind of analyses to separate the functional part of a computation from its imperative part. Functional programming appears closer to ``the bare metal'' than one would have thought.
The relationship of the dependency analysis of the imperative array
code with the extraction of a functional program from an imperative
one has been stated by Andrew W. Appel, in his paper "SSA is
Functional Programming". The present article has shown several
illustrations of his thesis. We have done manual dependency analyses
of several code snippets that update arrays in place. The analyses let
us extract pure-functional parts of imperative programs. The rest
requires serialization and can be handled by an ST
monad.
fp-arrays-assembly.txt [12K]
The article with the full source code
It was originally posted as FP vs assembly programming for super-scalar CPUs [Was: Nested Loops in Clean] on the newsgroup comp.lang.functional on Thu, 16 Aug 2001 18:24:24 -0700
Andrew W. Appel SSA is Functional Programming
ACM SIGPLAN Notices v. 33, no. 4, pp. 17-20, April 1998.
The following code is a space-efficient implementation of the
game Mastermind. It is efficient not in terms of the heap or stack
space but in the source text space. Jan Rochel wrote the original
code and put it in his e-mail signature. He asked for a shorter
version. The present code relies on a different algorithm to compute
exact and inexact matches in two strings. The new algorithm made the
code a bit faster, but more importantly, a bit (7%) shorter. The new
code is also more obscure and exhibits some interesting patterns, in
particular f x=(x\\).(x\\)
.
The version to put in a .signature (275 chars):
import Random;import List;p=putStrLn;u=uncurry;f x=(x\\).(x\\) main=mapM(\x->randomRIO(49,54))[1..4]>>=n 0.map toEnum>>=p.("Tries: "++).show e=((partition$u(==)).).zip;h(p,q)=['*'|x<-p]++['+'|x<-u f$unzip q] n a s=getLine>>=m where{m i|i==s=return a;m i=p(h$e i s)>>n(a+1)s}
An expanded version that is easier to read:
import Random;import List main = mapM(\x->randomRIO(49,54))[1..4] >>= n 0 . map toEnum >>= p . ("Tries: "++) . show p=putStrLn u=uncurry e=((partition$u(==)).) . zip f x=(x\\).(x\\) h(p,q)=['*'|x<-p] ++ ['+'|x<-u f $ unzip q] n a s=getLine>>=m a s m a s i | i==s = return a m a s i = p(h$e i s) >> n(a+1)sThe code runs on GHC(i).
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML