We implement the type-directed memoization from the paper in OCaml. Quite a few adjustments had to be made. First of all, OCaml does not have type families, or even type classes. We use type-indexed--value approach pioneered by Zhe Yang. Second, whereas Haskell relied on laziness, we explicitly use reference cells for recording the computed results. Finally, recursive types require an additional level of indirection via the reference cell. The trick is not unlike the eta-expansion that converts the ordinary fixpoint combinator to the applicative one: We have to delay the computation of the fixpoint until we receive the argument for the fixpointed function. Still, the computed fixpoint should be shared among all applications of the memoized function.
Here is a simple example: memoizing functions on boolean lists. A boolean list has a recursive type (recursive sum of product), which can be written as:
BList = 1 + Bool * BListor, with the explicit fixpoint,
blist = mu self.(unit + bool * self)In OCaml, we write it as follows:
module BLST = FIX(struct type 'self t = (unit,bool*'self) either let mdt self = md_sum md_unit (md_prod md_bool self) end) let nil = BLST.Fix (Left ()) let cons h t = BLST.Fix (Right (h,t))The type expression for the type of
BLST
, (unit,bool*'self) either
matches the mathematical notation. Mainly, the expression
md_sum md_unit (md_prod md_bool self)
--
which computes the memoizer for the functions on boolean lists
from the memoizers for unit, booleans and memoizer combinators, --
also matches the mathematical notation for the recursive type of
boolean lists. Our memo tables are indeed type-directed.memo_type_directed.ml [6K]
The complete OCaml code and tests
The original HList offered a range of heterogeneous collections, from lists to open records and variants, indexed by types or type-level labels. It also required heavy type-level computations involving type-level type equality functions -- with the accompanying fragile type inference, incomprehensible error messages and long compile times. We present a lightweight heterogeneous list library. It provides only lists, but also requires none of the heavy type level machinery and is hence implementable in pure OCaml.
A nested tuple is already a heterogeneous list. It is not clear how to iterate or map over it, though. In fact, what should be the type of a map over a heterogeneous list is already a challenge. A bigger challenge is the operation to update an element deep inside a heterogeneous list with a value of a different type. We implement it (the comments underneath an expression show its result):
module HL = HList(struct type 'a t = 'a end) let l3 = HL.(cons "abc" @@ cons true @@ cons 1 @@ nil) (* val l3 : (string * (bool * (int * unit))) HLLens.hlist = HL.S ("abc", HL.S (true, HL.S (1, HL.Z))) *) let n0 = LHere let n1 = LNext n0 let n2 = LNext n1 HLLens.proj n1 l3 (* - : bool = true *) let l31 = HLLens.rplc n1 l3 false (* val l31 : (string * (bool * (int * unit))) HLLens.hlist = HLLens.S ("abc", HLLens.S (false, HLLens.S (1, HLLens.Z))) *) let l32 = HLLens.update n1 l3 (function true -> 'a' | false -> 'b') (* val l32 : (string * (char * (int * unit))) HLLens.hlist = HLLens.S ("abc", HLLens.S ('a', HLLens.S (1, HLLens.Z))) *) HLLens.proj n1 l32 (* - : char = 'a' *)
Another example is a heterogeneous list whose elements are ordinary lists, such as:
module HLL = HList(struct type 'a t = 'a list end) let test_list = HLL.(cons [1;2;3] @@ cons ["A";"B"] @@ cons [0.3;0.2;0.1] @@ nil)We then compute the Cartesian product over
test_list
, producing:
- : (int * (string * (float * unit))) list = [(1, ("A", (0.3, ()))); (1, ("A", (0.2, ()))); (1, ("A", (0.1, ())));...]
A quite more interesting example is a tagless-final embedding of simply-typed lambda calculus with De Bruijn levels.
Embedding of lambda calculus with De Bruijn levels
The original Strongly typed heterogeneous collections, in Haskell
One may think that making captured continuations persistent is
trivial: after all, OCaml already supports marshaling of values
including closures. If one actually tries to marshal a captured
delimited continuation, one quickly discovers that the naive
marshaling fails with the error on attempting to serialize an abstract
data type. One may even discover that the troublesome abstract data
type is _chan
. The captured delimited continuation (a
piece of stack along with administrative data) refers to heap data
structures created by delimcc and other OCaml libraries; some of these
data structures are closures, which contain module's environment and
may refer to standard OCaml functions like prerr
. That
function is a closure over the channel stderr
, which is
non-serializable. This points out the first problem: if we serialize
all the data reachable from the captured continuation, we may end up
marshaling a large part of the heap and the global environment. This
is not only inefficient but also lethal, as we are liable to
encounter channels and other non-serializable data structures.
There is a more serious problem however. If we serialize all data reachable from the captured delimited continuation, we also serialize two pieces of global state used by the delimcc library itself. When the stored continuation is deserialized, a fresh copy of these global data is created, referenced from within the restored continuation. Thus the whole program will have two copies of delimcc global data: one for use in the main program and one for use by the deserialized continuation. Although such an isolation may be desirable in many cases, it is precisely wrong in our case: the captured and the host continuations do not have the common view of the system and cannot work together. It may be instructive to contemplate process checkpointing offered by some operating systems (see also `undump' typically used by Emacs and TeX). When checkpointing a process, we wish to save the continuation of the process only (rather than the continuation of the scheduler that created the process, and the rest of the OS continuation). We also wish to save data associated with the process, for example, the process control block and the description of allocated memory and other resources. Control blocks of all processes are typically linked in; when saving the control block of one process, we definitely do not wish to save everything that is reachable from it. When saving the state of a process in a checkpoint, we do not usually save the state of the file system -- or even of all files used by the process. First of all, that is impractical. Mainly, it is sometimes wrong. For example, a process might write to a log file, e.g., syslog. We specifically do not wish to save the contents of the syslog along with the process image. We want the restored process append to the system log rather than replace it!
Of course resuming a suspended process after modifying its input files may also be wrong. It is a hard question of what should be saved by value and what should be saved by reference only. It is clear however that both mechanisms are needed. The serialization code of the delimcc library does offer both mechanisms. The inspiration comes from the fact that OCaml's own marshaling function, when handling closures, serializes OCaml code by reference. The delimcc library extends this approach to data. The library supports the registration of data (which currently must be closures in the old heap) in a global array. When serializing a continuation, the library traverses it and replaces all references to registered closures with indices in the global array; we then invoke OCaml's own serialization routine to marshal the result. After that, we undo the replacement of closures with indices. Such value mangling is not without precedent: to detect sharing, OCaml's own marshaling routine too mangles the input value. The use of the global array is akin to the implementation of cross-staged persistence in MetaOCaml.
Persistent delimited continuations for CGI programming with nested
transactions
The salient application of persistent delimited continuations
is the library for writing CGI scripts as if they were interactive
console applications using read
and printf
.
The above library implements the minimal CGI programming framework
with form validation. The library also supports nested transactions. The
captured continuations are relatively compact: the essentially empty
captured continuation takes 491 bytes when serialized. Serialized
continuations of the unoptimized blog application have the typical
size of 10K (depending on the size of the posts); bzip can
compress them to one third of the original size.
Dynamic binding is implemented as a regular library, dependent
on the delimited continuations library. No changes to the OCaml system
and no code transformations are required; (parts of the) code that do
not use dynamic variables incur no overhead and run at the same speed
as before. Our dynamic variables are mutable; mutations however are
visible only within the scope of the dlet
statement where
they occurred. It is also possible to obtain not only the latest
binding to the dynamic variable, but also any of the shadowed
bindings.
Because dynamic binding is implemented in terms of delimited continuations, the two features harmoniously interact. We can use dynamic variables in shift-based, cooperative threads, and support partial inheritance of the dynamic environment, with both shared and thread-private (mutable) dynamic variables.
dynvar.mli [2K]
dynvar.ml [2K]
The interface and the implementation of the library
vdynvar.ml [6K]
The test code
The example at the end of that file demonstrates the partial
inheritance of the dynamic environment among the parent and two
cooperatively run threads.
Delimited Dynamic Binding The ICFP 2006 paper justifying the implementation
Luc Moreau: A Syntactic Theory of Dynamic Binding
Higher-Order and Symbolic Computation, 11, 233-279 (1998)
Printing the outline of a pruned tree, using the extension to obtain shadowed dynamic bindings.
Delimited continuations in OCaml: required dependency
We show a conservative and elementary implementation of resumable exceptions in OCaml: the implementation is a self-contained 100% pure OCaml library; makes no changes to the OCaml system; supports the existing style of defining exceptions; is compatible with the ordinary exceptions; works in byte- or natively-compiled code; uses the most basic facilities of ML and so can easily be translated to SML.
We impose no extra restrictions on the resumable exception
raising and handling forms. Like with ordinary exceptions, resumable
ones may encapsulate values of arbitrary types; the same exception
handler may process exceptions of many types -- and send resumption
replies of many types. The raise form may appear within
the guarded code at many places; different raise forms
may resume with the values of different types. Furthermore, resumable
exceptions are declared just like the ordinary ones,
with the exception
keyword. If
the resumable exception handler never resumes, resumable exceptions
act and feel precisely as the regular ones.
resumable.ml [6K]
The complete implementation, interface documentation, explanation,
and an illustrative example
Rainer Joswig's message with several examples of usefulness of resumable exceptions. It is posted on `Lambda the Ultimate' on June 15, 2006.
amb
operator, first introduced by John
McCarthy and well described by Dorai Sitaram in the context of Scheme,
takes zero or more expressions (thunks) and nondeterministically
returns the value of one of them. This implies that at least one of
amb
's expressions must yield a value, that is, does not
fail. If amb
has no expressions to evaluate or all of
them fail, amb
itself fails. One may think that amb
is easily
implementable by taking a list of thunks and
evaluating the thunks in some order within the try
block. The value of the thunk finishing without raising an exception
is returned. However, that simple implementation is wrong. It is not
enough that amb
's chosen expression itself evaluates
successfully. The chosen expression must be such that its value causes
the whole program finish without errors, if at all
possible. The amb
operator must `anticipate' how the
value of the chosen expression will be used in the rest of the
computation. Therefore, amb is called an angelic
nondeterministic choice operator.
Andrej Bauer gave the following example on the discussion thread:
if (amb [(fun _ -> false); (fun _ -> true)]) then 7 else failwith "failure"This program, he explained, should return 7: ``the
amb
inside the conditional should `know' (be told by an angel) that the
right choice is the second element of the list because it leads to 7,
whereas choosing the first one leads to failure.''
Therefore, we need the ability to examine (or speculatively
execute) the rest of the computation. In Scheme, amb
is
implementable in terms of call/cc
, as well explained by
Dorai Sitaram. OCaml has more appropriate delimited control operators,
which implement amb
in two lines of code. We also need a `toplevel
function', to tell us if the overall computation succeeded. One may
think of it as St. Peter at the gate. For now, we take a computation
that raises no exception as successful. In general, even
non-termination within a branch can be dealt with
intelligently (cf. `cooperative' threading which must yield from time to time).
Andrej Bauer's test now looks in full as
let test1 () = let v = if (amb [(fun _ -> false); (fun _ -> true)]) then 7 else failwith "Sinner!" in Printf.printf "the result: %d\n" v;; let test1r = toplevel test1;; (* "the result: 7" *)Speculatively evaluating amb's expressions or the rest of the computation may incur effects, such as mutation or IO. We can deal with them using one of the standard transaction implementation techniques: prohibit effects, log the updates, log the state at the beginning to roll back to, use zipper for functional `mutations'.
Here is a more advanced test, requiring a three-step-ahead clairvoyance
from amb
:
let numbers = List.map (fun n -> (fun () -> n)) [1;2;3;4;5];; let pyth () = let (v1,v2,v3) = let i = amb numbers in let j = amb numbers in let k = amb numbers in if i*i + j*j = k*k then (i,j,k) else failwith "too bad" in Printf.printf "the result: (%d,%d,%d)\n" v1 v2 v3;; let pythr = toplevel pyth;; (* the result: (3,4,5) *)In monadic terms,
amb
is equivalent to msum
in MonadPlus
. Even though we are interested in the first
result of the entire MonadPlus
computation, along the way
we have to keep track of many possible worlds. That is, we need
something like a List
monad rather than a Maybe
monad
(the latter should not even be regarded as MonadPlus
).Dorai Sitaram: Teach Yourself Scheme in Fixnum Days. 1998-2004.
Chapter 14. Nondeterminism.
<http://www.ccs.neu.edu/home/dorai/t-y-scheme/t-y-scheme-Z-H-16.html#node_chap_14>
amb.ml [3K]
The commented OCaml implementation
It was posted on the above discussion thread on
Sat, 10 Feb 2007 03:15:56 -0800 (PST)
fun 'a embed () = let exception E of 'a fun project (e: t): 'a option = ...At first sight, that SML feature seems impossible in OCaml (prior to OCaml 3.12). Although we can declare local exceptions in OCaml via local
struct
ures, before OCaml 3.12 we could not use type variables
quantified outside the structure: a structure limits the scope of all
of its type variables. We show that local globally-quantified
exceptions are macro-expressible in all versions of OCaml and
demonstrate two translations. The first one relies on multi-prompt
delimited continuations, whose implementation leads to the second
translation. The latter represents a polymorphic exception mere by a
parameter-less exception and one reference cell.
The caml-list thread referenced below gave a good motivation
for locally polymorphic exceptions: writing an efficient library
function fold_file
of the following interface:
module type FoldFile = sig val fold_file : in_channel -> (* file *) (in_channel -> 'a) -> (* read_func *) ('a -> 'b -> 'b) -> (* combiner *) 'b -> (* seed *) 'b endWe can use this general folding over a file to, for example, count the number of lines in a file:
module TestFold(F:FoldFile) = struct let line_count filename = (* string->int *) let f = open_in filename in let counter _ count = count + 1 in F.fold_file f input_line counter 0 let test = line_count "/etc/motd" endThe following tentative implementation has been outlined on ocaml-list:
module Attempt0 = struct exception Done of 'a let fold_file file read_func elem_func seed = let rec loop prev_val = let input = try read_func file with End_of_file -> raise (Done prev_val) in let combined_val = elem_func input prev_val in loop combined_val in try loop seed with Done x -> x endThe
loop
is properly tail-recursive (NB: the body of a
try
block is not in a tail position) and avoids
any administrative data structures. Alas, the typechecker does not
accept the exception declaration, which says that Done
should carry a value of all types. There is no such value in OCaml,
and if it were, it wouldn't be useful. That was not our intention
anyway: we want the value of Done
to have the same type as
the result of the polymorphic function fold_file
. We should have
declared the exception inside fold_file
.
Surprisingly, that can be done:
Delimcc.prompt
is precisely this type of `local
exceptions'. We need only a slight and local adjustment to the above
code to make it compile. This is our first translation.
open Delimcc let abort p v = take_subcont p (fun sk () -> v);; module AttemptA : FoldFile = struct let fold_file file read_func elem_func seed = let result = new_prompt () in (* here is our local exn declaration *) let rec loop prev_val = let input = try read_func file with End_of_file -> abort result prev_val in let combined_val = elem_func input prev_val in loop combined_val in push_prompt result (fun () -> loop seed) end let module TestA = TestFold(AttemptA) in TestA.test;; (* - : int = 24 *)The analogy between exceptions and delimited continuations is profound: local exceptions are commonly used to implement multi-prompt delimited continuations in SML. We see the converse is also true. Furthermore, delimited continuations in OCaml are implemented in terms of exceptions. Abort is essentially raise. If we `inline' the gist of the delimited continuation library we arrive at our second translation. The result requires no libraries and works with both byte-code and native compiler.
module AttemptR : FoldFile = struct exception Done let fold_file file read_func elem_func seed = let result = ref None in (* here is our local exn declaration *) let rec loop prev_val = let input = try read_func file with End_of_file -> result := Some prev_val; raise Done in let combined_val = elem_func input prev_val in loop combined_val in try loop seed with Done -> (match !result with Some x -> x | _ -> failwith "impossible!} end;; let module TestR = TestFold(AttemptR) in TestR.test;; (* - : int = 24 *)The code is still properly tail-recursive and deforested. In contrast to other imperative implementations of
fold_file
, ours is
almost pure: the reference cell result
is written to
and immediately after read from only once during the whole folding
-- namely, at the very end. The bulk of the iteration is functional.
A mutable cell is the trick behind typing of
multi-prompt delimited continuations. One may even say that if a type
system supports reference cells, it shall support multi-prompt
delimited continuations -- and vice versa.Caml-list discussion thread Locally-polymorphic exceptions October 3-4, 2007. Of special note is the PML implementation posted by Christophe Raffalli.
The MLton team. UniversalType
Source for the SML example of local exceptions
<http://mlton.org/UniversalType>