This page describes various Objective Caml (OCaml), SML, and MetaOCaml code.
amb
Dynamic binding adds expressiveness; as Luc Moreau emphasized, it is, when used in moderation, quite useful: for parameterizing a function deep in the code without changing the interface of all callers; for propagating the environment information like the current working directory, printing-depth, etc. Dynamic binding is inescapable in mobile code or continuation-based web servers. Dynamic binding, in some restricted form, is already present in OCaml: exception handlers are dynamically scoped. This message announces the availability of general dynamic binding. This is joint work with Chung-chieh Shan and Amr Sabry.
Dynamic binding is implemented as a regular library, dependent
on the delimited continuations library. No changes to the OCaml system
and no code transformations are required; (parts of the) code that do
not use dynamic variables incur no overhead and run at the same speed
as before. Our dynamic variables are mutable; mutations however are
visible only within the scope of the dlet statement where
they occurred. It is also possible to obtain not only the latest
binding to the dynamic variable, but also any of the shadowed
bindings.
Because dynamic binding is implemented in terms of delimited continuations, the two features harmoniously interact. We can use dynamic variables in shift-based, cooperative threads, and support partial inheritance of the dynamic environment, with both shared and thread-private (mutable) dynamic variables.
The announcement with a few examples [plain text file]
It was originally posted on the caml-list on
Mon, 10 Apr 2006 18:25:29 -0700
dynvar.mli [2K]
The library interface
dynvar.ml [2K]
The library implementation
vdynvar.ml [6K]
The test code. The example at the end of that file demonstrates the partial
inheritance of the dynamic environment among the parent and two
cooperatively run threads.
Delimited Dynamic Binding
The ICFP 2006 paper justifying the implementation
Luc Moreau: A Syntactic Theory of Dynamic Binding. Higher-Order and Symbolic Computation, 11, 233-279 (1998)
Printing the outline of a pruned tree, using the extension to obtain shadowed dynamic bindings.
The facility that prints results and types of expressions evaluated at the top-level is now available anywhere in the program -- in bytecode- or natively compiled programs. Generic printing is a, perhaps unintentional, `side-effect' of MetaOCaml -- of the fact that a code value is not merely AST; the code value also captures the type and the type environment of variables and other values. Generic printing is a library that works with the unmodified MetaOCaml (which is fully compatible with the regular OCaml).
val fprint : Format.formatter -> ('a,'b) code -> string
is the core function which takes a code value of any type, and
pretty-prints it on the given formatter. The printed result is exactly the same as that by the top-level value printing. The
function fprint returns the representation of the
expression's type, as a string. The latter is arguably a frill, but it
was easy to do, so just as well.
For example,
let pr_type et = Format.printf "\n%s@." et
let () =
let x = Some ([|(10,true);(11,false)|]) in
pr_type (print .<x>.)
prints the following two lines: Some [|(10, true); (11, false)|]
(int * bool) array option
The first line is the value, and the latter (printed by pr_type)
is the type. There was no need to define any custom printer for the
value or its components. A more involved example is given at the end
of the announcement article referenced below.
Informally, an OCaml function of the type 'a-> ... corresponds to the Haskell function a -> .... OTH, an
OCaml function of the type ('a,'b) code -> ... corresponds to Haskell's Typeable b => b -> .... The latter
enables generic programming, similar to Haskell's
`Scrap Your Boilerplate' (Laemmel and Peyton-Jones).
Motivation, design and examples of generic print [plain text file]
This announcement article was originally posted on the caml-list on Sun, 16 Apr 2006 19:09:10 -0700
gprint.mli [<1K]
The library interface
gprint.ml [4K]
The library implementation
vgprint.ml [2K]
The test code
Makefile [5K]
The Makefile for building the generic printing library and
testing it at the MetaOCaml top level, in byte-code- and natively-compiled
programs.
Resumable exceptions are the strict generalization of regular exceptions, letting the exception raising form return a value and so the computation may continue. It's the exception handler that decides either to abort the exceptional computation or to resume it with a particular value. Resumable exceptions are made popular by Common Lisp, where they are widely used. Rainer Joswig's message, cited below, lists several real-life examples of resumable exceptions.
We show a conservative and elementary implementation of resumable exceptions in OCaml: the implementation is a self-contained 100% pure OCaml library; makes no changes to the OCaml system; supports the existing style of defining exceptions; is compatible with the ordinary exceptions; works in byte- or natively-compiled code; uses the most basic facilities of ML and so can easily be translated to SML.
We impose no extra restrictions on the resumable exception
raising and handling forms. Like with ordinary exceptions, resumable
ones may encapsulate values of arbitrary types; the same exception
handler may process exceptions of many types -- and send resumption
replies of many types. The raise form may appear within
the guarded code at many places; different raise forms
may resume with the values of different types. Furthermore, resumable
exceptions are declared just like the ordinary ones,
with the exception keyword. If
the resumable exception handler never resumes, resumable exceptions
act and feel precisely as the regular ones.
Motivation, design and an example of resumable exceptions [plain text file]
This announcement article was originally posted on the
caml-list on Wed, 14 Jun 2006 15:54:03 -0700. This file
also includes follow-ups, discussing syntactic sugar and the ways to
implement resumable exceptions in a multi-threaded system.
resumable.ml [6K]
Complete implementation, interface
documentation, explanation, and an illustrative example.
Rainer Joswig's message with several examples of usefulness
of resumable exceptions. It is posted on `Lambda the Ultimate' on June
15, 2006.
<http://lambda-the-ultimate.org/node/1544#comment-18632>
The new version of the delimcc library of first-class delimited continuations for byte-code OCaml now supports serializing and storing of captured continuations. The stored continuation can be invoked in the same or a different process running the same executable. The persistence feature supports process checkpointing, migration -- and specifically CGI programming on any, unmodified web server, e.g., Apache. Serialized continuations are twice delimited -- both in `control' and in `data'; the latter makes them compact and possible. The delimcc library is a pure library; it makes no changes to the OCaml system and has no effect on the code that does not capture delimited continuations.
One may think that making captured continuations persistent is
trivial: after all, OCaml already supports marshaling of values
including closures. If one actually tries to marshal a captured
delimited continuation, one quickly discovers that the naive
marshaling fails with the error on attempting to serialize an abstract
data type. One may even discover that the troublesome abstract data
type is _chan. The captured delimited continuation (a
piece of stack along with administrative data) refers to heap data
structures created by delimcc and other OCaml libraries; some of these
data structures are closures, which contain module's environment and
may refer to standard OCaml functions like prerr. That
function is a closure over the channel stderr, which is
non-serializable. This points out the first problem: if we serialize
all the data reachable from the captured continuation, we may end up
marshaling a large part of the heap and the global environment. This
is not only inefficient but also lethal, as we are liable to
encounter channels and other non-serializable data structures.
There is a more serious problem however. If we serialize all data reachable from the captured delimited continuation, we also serialize two pieces of global state used by the delimcc library itself. When the stored continuation is deserialized, a fresh copy of these global data is created, referenced from within the restored continuation. Thus the whole program will have two copies of delimcc global data: one for use in the main program and one for use by the deserialized continuation. Although such an isolation may be desirable in many cases, it is precisely wrong in our case: the captured and the host continuations do not have the common view of the system and cannot work together. It may be instructive to contemplate process checkpointing offered by some operating systems (see also `undump' typically used by Emacs and TeX). When checkpointing a process, we wish to save the continuation of the process only (rather than the continuation of the scheduler that created the process, and the rest of the OS continuation). We also wish to save data associated with the process, for example, the process control block and the description of allocated memory and other resources. Control blocks of all processes are typically linked in; when saving the control block of one process, we definitely do not wish to save everything that is reachable from it. When saving the state of a process in a checkpoint, we do not usually save the state of the file system -- or even of all files used by the process. First of all, that is impractical. Mainly, it is sometimes wrong. For example, a process might write to a log file, e.g., syslog. We specifically do not wish to save the contents of the syslog along with the process image. We want the restored process append to the system log rather than replace it!
Of course resuming a suspended process after modifying its input files may also be wrong. It is a hard question of what should be saved by value and what should be saved by reference only. It is clear however that both mechanisms are needed. The serialization code of the delimcc library does offer both mechanisms. The inspiration comes from the fact that OCaml's own marshaling function, when handling closures, serializes OCaml code by reference. The delimcc library extends this approach to data. The library supports the registration of data (which currently must be closures in the old heap) in a global array. When serializing a continuation, the library traverses it and replaces all references to registered closures with indices in the global array; we then invoke OCaml's own serialization routine to marshal the result. After that, we undo the replacement of closures with indices. Such value mangling is not without precedent: to detect sharing, OCaml's own marshaling routine too mangles the input value. The use of the global array is akin to the implementation of cross-staged persistence in MetaOCaml.
Native delimited continuations in (byte-code) OCaml
The description of the delimcc library
The persistent twice-delimited continuations have been demonstrated
at the Continuation Fest 2008 (April 13, Tokyo, Japan) and described
in the message to the Caml-List posted on
Sun, 27 Apr 2008 17:53:36 -0700 (PDT).
Persistent delimited continuations for CGI programming with nested
transactions
The salient application of persistent delimited continuations
is the library for writing CGI scripts as if they were interactive
console applications using read and printf. The above
library implements the minimal CGI programming framework with form
validation. The library also supports nested transactions. The
captured continuations are relatively compact: the essentially empty
captured continuation takes 491 bytes when serialized. Serialized
continuations of the unoptimized blog application have the typical
size of 10K (depending on the size of the posts); bzip can
compress them to one third of the original size.
We show how to list all top-level bindings defined since the start of the OCaml toplevel session. We can list the names and the types of the bindings. We can also print the values of top-level bindings of a specific type. There is no need to recompile anything. However, we need the OCaml installation directory with the object files left after the making of the toplevel. If we are willing to remake the toplevel, the installation directory is no longer required.
After the preparation step or the alternative preparation step
described below, we enter or #use, in the toplevel, the
code print-toplevel-bindings.ml. We define a few sample
bindings:
# let x = 1;;
# let x = 2;;
# let y = 10;;
After that, evaluating# print_bindings Format.std_formatter (get_value_bindings (!Toploop.toplevel_env));;will print all the top-level bindings defined since the start of the toplevel session:
binding: get_value_bindings/79
val get_value_bindings : Env.t -> (Ident.t * Types.value_description) list
binding: print_bindings/107
val print_bindings :
Format.formatter -> (Ident.t * Types.value_description) list -> unit
binding: type_to_str/177 val type_to_str : Types.type_expr -> string
binding: print_int_toplevel/179
val print_int_toplevel :
Format.formatter -> (Ident.t * Types.value_description) list -> unit
binding: x/186 val x : int
binding: x/187 val x : int
binding: y/188 val y : int
Done
For each binding, its name and type are printed. We see that
the type environment keeps track of all the previous
definitions of a name. Because x was defined twice,
there are two entries in the type environment: x/186 and x/187. The counters are the timestamp.
We can also print the values associated with the bindings of
one particular type, for example, int:
# print_int_toplevel Format.std_formatter (get_value_bindings (!Toploop.toplevel_env));;which gives the following output:
binding: x/186 value: 2
binding: x/187 value: 2
binding: y/188 value: 10
Done
If a variable is defined several times, the top-level value
environment keeps the last associated value, however. The function
print_int_toplevel cannot, generally, be polymorphic
-- unless we are willing to assume responsibility that our type
representation string matches the desired type -- or we are willing
to use MetaOCaml.Caml-list discussion thread Listing toplevel bindings started by John Harrison. Sep 26-27, 2006.
Preparation step
This step requires the OCaml installation directory with the
object files left after the building of the toplevel. First, we need
to retrieve the file gprint_toplevel.ml [2K]
and adjust the paths in the #directory directives to
point to our OCaml installation directory. We start the OCaml
toplevel and execute all #directory and the
#load directives in that file up to, but not including
the loading of genprintval.cmo. Please do not change the order of the load directives! It took about half an hour
to find the right order....
Alternative preparation step
On the discussion thread, Jonathan Roewen suggested an alternative.
It requires rebuilding of the toplevel;
the OCaml distribution is no longer needed then.
We need to skip the expunge step after the toplevel is built: grep
for expunge in the base Makefile.
That step
erases the mentioning of many internal components from toplevel's
module dictionary. Therefore, these OCaml system modules act as if they are
not loaded. We need these module for the present application however.
print-toplevel-bindings.ml [2K]
The implementation file. It was posted on the above discussion thread
on Tue, 26 Sep 2006 01:01:20 -0700 (PDT)
ambThe amb operator, first introduced by John
McCarthy and well described by Dorai Sitaram in the context of Scheme,
takes zero or more expressions (thunks) and nondeterministically
returns the value of one of them. This implies that at least one of
amb's expressions must yield a value, that is, does not
fail. If amb has no expressions to evaluate or all of
them fail, amb itself fails. One may think that amb is easily implementable by taking a list of thunks and
evaluating the thunks in some order within the try
block. The value of the thunk finishing without raising an exception
is returned. However, that simple implementation is wrong. It is not
enough that amb's chosen expression itself evaluates
successfully. The chosen expression must be such that its value causes
the whole program finish without errors, if at all
possible. The amb operator must `anticipate' how the
value of the chosen expression will be used in the rest of the
computation. Therefore, amb is called an angelic nondeterministic choice operator.
Andrej Bauer gave the following example on the discussion thread:
if (amb [(fun _ -> false); (fun _ -> true)]) then
7
else failwith "failure"
This program, he explained, should return 7: ``the amb
inside the conditional should "know" (be told by an angel) that the
right choice is the second element of the list because it leads to 7,
whereas choosing the first one leads to failure.''
Therefore, we need the ability to examine (or speculatively
execute) the rest of the computation. In Scheme, amb is
implementable in terms of call/cc, as well explained by
Dorai Sitaram. OCaml has more appropriate delimited control operators,
which implement amb in two lines of code. We also need a `toplevel
function', to tell us if the overall computation succeeded. One may
think of it as St. Peter at the gate. For now, we take a computation
that raises no exception as successful. In general, even
non-termination within a branch can be dealt with
intelligently (cf. `cooperative' threading which must yield from time to time).
Andrej Bauer's test now looks in full as
let test1 () =
let v =
if (amb [(fun _ -> false); (fun _ -> true)]) then
7
else failwith "Sinner!"
in Printf.printf "the result: %d\n" v;;
let test1r = toplevel test1;; (* "the result: 7" *)
Speculatively evaluating amb's expressions or the rest of the
computation may incur effects, such as mutation or IO. We can deal
with them using one of the standard transaction implementation
techniques: prohibit effects, log the updates, log the state at
the beginning to roll back to, use zipper for functional `mutations'.
Here is a more advanced test, requiring a three-step-ahead clairvoyance
from the amb
let numbers = List.map (fun n -> (fun () -> n)) [1;2;3;4;5];;
let pyth () =
let (v1,v2,v3) =
let i = amb numbers in
let j = amb numbers in
let k = amb numbers in
if i*i + j*j = k*k then (i,j,k) else failwith "too bad"
in Printf.printf "the result: (%d,%d,%d)\n" v1 v2 v3;;
let pythr = toplevel pyth;; (* the result: (3,4,5) *)
In monadic terms, amb is equivalent to msum
in MonadPlus. Even though we are interested in the first
result of the entire MonadPlus computation, along the way
we have to keep track of many possible worlds. That is, we need
something like a List monad rather than a Maybe monad (the latter should not even be regarded as
MonadPlus).Caml-list discussion thread Amb started by Jonathan Bryant. February 09-10, 2007.
Dorai Sitaram: Teach Yourself Scheme in Fixnum Days. 1998-2004. Chapter 14. Nondeterminism.
<http://www.ccs.neu.edu/home/dorai/t-y-scheme/t-y-scheme-Z-H-16.html#node_chap_14>
amb.ml [3K]
Commented OCaml implementation
It was posted on the above discussion thread on
Sat, 10 Feb 2007 03:15:56 -0800 (PST)
This message shows the translation of Haskell98 non-constructor classes to OCaml, which gives us bounded polymorphism and a sort of polymorphic recursion in OCaml. The code easily extends to SML.
We implement a simplified version of the numeric
Num typeclass, called Numb, and use it to write
a polymorphic function summ to sum a list of any
Numb-ers. For clarity, we develop code both in Haskell
and OCaml. It is instructive to compare the inferred types of the
summ function, in Haskell and OCaml. They are,
respectively:
summ :: (Numb a) => [a] -> a
val summ : 'a numb -> 'a list -> 'a = <fun>
The only difference is in the shape of the arrow: (Numb a =>) vs. ('a numb ->). In both Haskell and OCaml, the function summ is bounded polymorphic: it applies to lists of values of any
type, provided that we have the evidence, or the
`dictionary', that the type is in the class Numb. We must
pass that evidence, the witness of the Numb constraint,
as the first argument of summ. The OCaml type of summ is revealing about the nature of bounded polymorphism. An
unbounded polymorphic function such as id : 'a -> 'a
corresponds to the universally quantified proposition
forall a. a -> a. The function summ
corresponds to the proposition (List(a) -> a) quantified
only over a part of the domain of discourse. That part is decided
by the predicate Numb(a). In other words, we assert
List(a) -> a only when Numb(a). The whole
proposition reads forall a. Numb(a) -> (List(a) -> a) --
which is literally the type of summ modulo stylistic
differences.
Although the OCaml implementation is a faithful translation of Haskell code, the explicit use of dictionary passing is quite an annoyance. Since the translation replaces double arrow with a single arrow, we have to explicitly pass the dictionary argument to all bounded polymorphic functions in OCaml. In Haskell, these arguments are (most of the time) inferred.
Implementing Haskell's constructor classes -- classes parameterized on type constructors -- require the use of the module system of OCaml.
Operator overloading [plain text file]
Implementation of the simplified version of the Num typeclass in OCaml, in comparison with Haskell. This article was originally posted on the caml-list on Thu, 8 Mar 2007 23:36:58 -0800 (PST)
P. J. Stuckey and M. Sulzmann: A theory of overloading
ACM Transactions on Programming Languages and Systems (TOPLAS), 27(6):1-54, 2005.
<http://www.comp.nus.edu.sg/~sulzmann/research/ms_bib.html#overloading-journal>
Applicative translucent functors in Haskell
Messages relating the module system of OCaml with Haskell typeclasses.
Posted on the Haskell mailing list on Fri Aug 27 19:51:43 EDT 2004 and
Mon Sep 13 15:20:33 EDT 2004.
<http://www.haskell.org/pipermail/haskell/2004-August/014463.html>
<http://www.haskell.org/pipermail/haskell/2004-September/014515.html>
SML has local exception declarations. Furthermore, it lets us define a polymorphic function whose body declares a local exception with the type tied to that of the whole function. For example:
fun 'a embed () = let exception E of 'a
fun project (e: t): 'a option = ...
At first sight, that SML feature seems impossible in
OCaml. Although we can declare local exceptions in OCaml via local
structures, we cannot use type variables quantified outside the
structure: a structure limits the scope of all of its type variables.
We show that local globally-quantified exceptions are macro-expressible in OCaml and demonstrate two translations. The first one relies on
multi-prompt delimited continuations, whose implementation leads to
the second translation. The latter represents a
polymorphic exception mere by a parameter-less exception and one reference
cell.
The caml-list thread referenced below gave a good motivation
for locally polymorphic exceptions: writing an efficient library
function fold_file of the following interface:
module type FoldFile = sig
val fold_file : in_channel -> (* file *)
(in_channel -> 'a) -> (* read_func *)
('a -> 'b -> 'b) -> (* combiner *)
'b -> (* seed *)
'b
end;;
We can use this general folding over a file to, for example, count the
number of lines in a file: module TestFold(F:FoldFile) = struct
let line_count filename = (* string->int *)
let f = open_in filename in
let counter _ count = count + 1 in
F.fold_file f input_line counter 0
let test = line_count "/etc/motd"
end;;
The following tentative implementation has been outlined on ocaml-list: module Attempt0 = struct
exception Done of 'a
let fold_file file read_func elem_func seed =
let rec loop prev_val =
let input = try read_func file
with End_of_file -> raise (Done prev_val) in
let combined_val = elem_func input prev_val in
loop combined_val
in
try loop seed with Done x -> x
end;;
The loop is properly tail-recursive (NB: the body of a try block is not in a tail position) and avoids
any administrative data structures. Alas, the typechecker does not
accept the exception declaration, which says that Done
should carry a value of all types. There is no such value in OCaml,
and if it were, it wouldn't be useful. That was not our intention
anyway: we want the value of Done to have the same type as
the result of the polymorphic function fold_file. We should have
declared the exception
inside fold_file. Surprisingly, that can be done:
Delimcc.prompt is precisely this type of `local
exceptions'. We need only a slight and local adjustment to the above
code to make it compile. This is our first translation. open Delimcc let abort p v = take_subcont p (fun sk () -> v);;
module AttemptA : FoldFile = struct
let fold_file file read_func elem_func seed =
let result = new_prompt () in (* here is our local exn declaration *)
let rec loop prev_val =
let input = try read_func file
with End_of_file -> abort result prev_val in
let combined_val = elem_func input prev_val in
loop combined_val
in
push_prompt result (fun () -> loop seed)
end;;
let module TestA = TestFold(AttemptA) in TestA.test;; (* - : int = 24 *)
The analogy between exceptions and delimited continuations is profound: local exceptions are commonly used to implement multi-prompt delimited continuations in SML. We see the converse is also true. Furthermore, delimited continuations in OCaml are implemented in terms of exceptions. Abort is essentially raise. If we `inline' the gist of the delimited continuation library we arrive at our second translation. The result requires no libraries and works with both byte-code and native compiler.
module AttemptR : FoldFile = struct
exception Done
let fold_file file read_func elem_func seed =
let result = ref None in (* here is our local exn declaration *)
let rec loop prev_val =
let input = try read_func file
with End_of_file -> result := Some prev_val; raise Done in
let combined_val = elem_func input prev_val in
loop combined_val
in
try loop seed with Done -> (match !result with Some x -> x
| _ -> failwith "impossible!")
end;;
let module TestR = TestFold(AttemptR) in TestR.test;; (* - : int = 24 *)
The code is still properly tail-recursive and deforested. In
contrast to other imperative implementations of fold_file, ours is
almost pure: the reference cell result is written to
and immediately after read from only once during the whole folding
-- namely, at the very end. The bulk of the iteration is functional.
A mutable cell is the trick behind typing of
multi-prompt delimited continuations. One may even say that if a type
system supports reference cells, it shall support multi-prompt
delimited continuations -- and vice versa.Yin-so Chen, Olivier Roussel, kirillkh, et al. best and fastest way to read lines from a file? Caml-list discussion thread. October 1-2, 2007.
Caml-list discussion thread Locally-polymorphic exceptions. October 3-4, 2007.
Of special note is the PML implementation posted by
Christophe Raffalli.
The MLton team. UniversalType. Source for the SML example of local exceptions.
<http://mlton.org/UniversalType>
This site's top page is http://pobox.com/~oleg/ftp/
Converted from SXML by SXML->HTML