ML Programming

The index of DSLs embedded in OCaml (mostly, in tagless-final style)

Introduction to tagless-final embedding

Tagless-final Primer

Gates and circuits: Hardware-description DSL and its optimizations

Lecture notes

Algebraic Data Types

Algebraic Data Types and Pattern-Matching

Embedding, analyzing and compiling DSLs with effects

Effects Without Monads: Non-determinism
DSL of algebraic effects and handlers
DSLs with reference types, pointers, mutable variables: translating OCaml-like reference-type variables to C-like mutable variables

Language-integrated query: database-specific languages

QUEL Language-integrated query and its compilation to SQL. Both the language and the transformation rules to SQL are safely user-extensible
SQUR Language-integrated query with ranking operations ORDER BY and LIMIT. The compilation to SQL is now performed by normalization-by-evaluation rather than by rewriting

Lambda calculus

Evaluators, Normalizers, Reducers: from denotational to operational semantics
Lambda to SKI: (typed and untyped) lambda calculus DSL, combinator calculus DSL, and the translation of the former to the latter
Embedding of lambda calculus with De Bruijn levels
Convenient to use embedding of lambda calculus and its (normal-order) normalization

Two-staged languages: combinator-based code generation

A surprisingly simple implementation of staging
<NJ> A DSL for code generation with mutable state, achieving for first time type-safety and hygiene without ad hoc restrictions
Embedded two-staged DSL with let-, letrec- and mutual-letrec--insertion in pure plain OCaml
C-like DSL with arrays, loops and mutable variables, and its compilation to C

Streams

DSL for stream processing pipelines, compiling to the hand-optimized--like imperative code
DSL for infinite streams, abstracting over laziness, delaying and forcing
shell/AWK/Perl-like scripting in OCaml

Session types

Session types without sophistry

Multiprompt delimited control

Executable direct denotational semantics of multiprompt delimited control, or, delimcc library as a DSL exclusively in OCaml
Answer-type modification without tears: DSLs of delimited control with answer-type modification, and the compilation to the delimcc interface

Natural language semantics

Basic Linear Algebra DSL and modular optimizations

Reconciling Abstraction with High Performance: A MetaOCaml approach, Chapters 4 and 5

Image Manipulation: from an interpreter to a compiler

Reconciling Abstraction with High Performance: A MetaOCaml approach, Chapter 6

Probabilistic programming

Embedded probabilistic programming The DSL2009 paper develops, in Sec 2, two embeddings of stochastic programs: tagless-final and direct (or, `very shallow'). The latter relies on the delimcc library of delimited control

Type-directed memoization

Memoization of a function is recording and reusing already computed associations between the argument and the result of the function. The associations are recorded in a memo table, which can be constructed generically, with the type determined automatically from the type of function's argument. The paper ``Fun with type functions'' (Sections 3.1 and 3.2) describes one such generic construction in Haskell, using type functions, and refers to other constructions. Essentially the same approach leads to generic finite maps, see Section 3.3 of the paper.

We implement the type-directed memoization from the paper in OCaml. Quite a few adjustments had to be made. First of all, OCaml does not have type families, or even type classes. We use type-indexed--value approach pioneered by Zhe Yang. Second, whereas Haskell relied on laziness, we explicitly use reference cells for recording the computed results. Finally, recursive types require an additional level of indirection via the reference cell. The trick is not unlike the eta-expansion that converts the ordinary fixpoint combinator to the applicative one: We have to delay the computation of the fixpoint until we receive the argument for the fixpointed function. Still, the computed fixpoint should be shared among all applications of the memoized function.

Here is a simple example: memoizing functions on boolean lists. A boolean list has a recursive type (recursive sum of product), which can be written as:

    BList = 1 + Bool * BList

or, with the explicit fixpoint,

    blist = mu self.(unit + bool * self)

In OCaml, we write it as follows:

    module BLST =
        FIX(struct type 'self t = (unit,bool*'self) either
                   let mdt self = md_sum md_unit (md_prod md_bool self) 
            end)
    
    let nil      = BLST.Fix (Left ())
    let cons h t = BLST.Fix (Right (h,t))

The type expression for the type of BLST, (unit,bool*'self) either matches the mathematical notation. Mainly, the expression md_sum md_unit (md_prod md_bool self) -- which computes the memoizer for the functions on boolean lists from the memoizers for unit, booleans and memoizer combinators, -- also matches the mathematical notation for the recursive type of boolean lists. Our memo tables are indeed type-directed.

Version

The current version is January 2009

References

Fun with type functions
Joint work with Simon Peyton Jones and Chung-chieh Shan
<http://research.microsoft.com/en-us/um/people/simonpj/papers/assoc-types/fun-with-type-funs/typefun.pdf>

memo_type_directed.ml [6K]
The complete OCaml code and tests

Typed heterogeneous collections: Lightweight HList

A heterogeneous collection is a datatype that is capable of storing data of different types, while providing operations for look-up, update, iteration, and others. There are various kinds of heterogeneous collections, differing in representation, invariants, and access operations.

The original HList offered a range of heterogeneous collections, from lists to open records and variants, indexed by types or type-level labels. It also required heavy type-level computations involving type-level type equality functions -- with the accompanying fragile type inference, incomprehensible error messages and long compile times. We present a lightweight heterogeneous list library. It provides only lists, but also requires none of the heavy type level machinery and is hence implementable in pure OCaml.

A nested tuple is already a heterogeneous list. It is not clear how to iterate or map over it, though. In fact, what should be the type of a map over a heterogeneous list is already a challenge. A bigger challenge is the operation to update an element deep inside a heterogeneous list with a value of a different type. We implement it (the comments underneath an expression show its result):

    module HL = HList(struct type 'a t = 'a end)
    let l3 = HL.(cons "abc" @@ cons true @@ cons 1 @@ nil)
    (*
      val l3 : (string * (bool * (int * unit))) HLLens.hlist =
        HL.S ("abc", HL.S (true, HL.S (1, HL.Z)))
    *)
    
    let n0 = LHere
    let n1 = LNext n0
    let n2 = LNext n1
    HLLens.proj n1 l3
    (* - : bool = true *)
    
    let l31 = HLLens.rplc n1 l3 false
    (*
      val l31 : (string * (bool * (int * unit))) HLLens.hlist =
        HLLens.S ("abc", HLLens.S (false, HLLens.S (1, HLLens.Z)))
    *)
    
    let l32 = HLLens.update n1 l3 (function true -> 'a' | false -> 'b')
    (*
      val l32 : (string * (char * (int * unit))) HLLens.hlist =
        HLLens.S ("abc", HLLens.S ('a', HLLens.S (1, HLLens.Z)))
    *)
    
    HLLens.proj n1 l32
    (* - : char = 'a' *)

Another example is a heterogeneous list whose elements are ordinary lists, such as:

    module HLL = HList(struct type 'a t = 'a list end)
    let test_list = HLL.(cons [1;2;3] @@ cons ["A";"B"] @@ cons [0.3;0.2;0.1] @@ nil)

We then compute the Cartesian product over test_list, producing:

    - : (int * (string * (float * unit))) list =
    [(1, ("A", (0.3, ()))); (1, ("A", (0.2, ()))); (1, ("A", (0.1, ())));...]

A quite more interesting example is a tagless-final embedding of simply-typed lambda calculus with De Bruijn levels.

Version

The current version is November 2018

References

hlist_simple.ml [6K]
The complete OCaml code and a few examples

Embedding of lambda calculus with De Bruijn levels

The original Strongly typed heterogeneous collections, in Haskell

do-while loops have always been in OCaml

Whereas OCaml has for and while loops, C and Java-like languages also have do...while; the Pascal family has the closely-related repeat...until loops. There are cases (reminded below) where do-while comes in quite naturally, and with performance advantages. For quite a while I've been wishing for do-while -- as is, without any performance-degrading emulation or extra booleans. Until I suddenly realized that do-while has always been in OCaml. Most likely this is well-known to many -- but probably not to everyone. This message is meant to be a reminder and a reference, of how do-while, with the expected performance, is represented in OCaml.

As a bit of motivation, the real need for the performant do-while came in the development of the recent version of strymonas: the stream library with the utmost performance. Beside OCaml, strymonas also generates C code, which has do-while natively. I wanted an interface that supports do-while and produces the best code for both languages. The method in this article has been used in strymonas and indeed gave the desired performance -- matching, or surpassing in some cases, hand-written stream-processing code.

The most common use case for do-while is executing some bit of (imperative, IO) code that turns out needing re-execution. The typical example is menu-selection. Here is how it looks in C:

    int c;
    do {
      puts("\nPlease enter");
      puts("1. choice A");
      puts("2. choice B");
      puts("q. exit");
      switch (c = getchar()) {
      case '1': puts("\n*** doing A"); break;
      case '2': puts("\n*** doing B"); break;
      }
    } while (!(c=='q' || c == EOF));

The menu has to be displayed first -- and then re-displayed, if an invalid choice is entered, and also to request another choice.

In strymonas, a typical case for do-while is zipping of complex (nested, filtered, etc.) streams. At some point we execute the generator of one stream, which is to produce an item and let the further code deal with it. If the generator for some reason has not produced an item (e.g., because of the filtering) and has not set the termination trigger, it needs to be re-executed before we turn to the other stream.

As well known, the lack of do-while loop does not limit expressiveness: After all, according to the Structured Programming theorem, every control flow may be converted to the one using sequential composition, branching and while-iteration -- provided one may introduce an arbitrary amount of auxiliary (often boolean) state. For our running example, the menu-selection C code can be written in OCaml as follows:

    let again = ref true in
    while !again do
      print_endline "\nPlease enter";
      print_endline "1. choice A";
      print_endline "2. choice B";
      print_endline "q. exit";
      match input_char stdin with
      | exception End_of_file -> again := false
      | '1'   -> print_endline "\n*** doing A"
      | '2'   -> print_endline "\n*** doing B"
      | 'q'   -> again := false
      |  _    -> ()
    done

(Juxtaposing C and OCaml code could be confusing because the operator ! in C is negation whereas in OCaml it is dereference.) We have to introduce an extra boolean flag, and test for it, even before the first iteration where we know its value for sure. The compiled assembly code looks, schematically, as

          movq  $1,again
    loop: test  again, again
          jz    exit
          mov   $msg1, %edi
          call  print_endline
          ...
          jmp   loop
    exit:

In contrast, the original do-while C code gets compiled to:

    loop: mov   $msg1, %edi
          call  puts
          ...
          call  getc
          ...
          cmp   $-1, %eax
          jne   loop

It is more efficient: there is no need for a conditional branch at the beginning of the loop. An extra branch may seem insignificant -- but not when it is repeated literally hundreds of millions of times in inner loops. The lack of do-while indeed hinders the performance, as I have confirmed in strymonas benchmarks.

It turns out, however, that OCaml does have do-while. The menu-selection can be written to perfectly match the control flow of the original C code, also avoiding auxiliary booleans. In fact, we can avoid introducing any mutable state:

    while 
      print_endline "\nPlease enter";
      print_endline "1. choice A";
      print_endline "2. choice B";
      print_endline "q. exit";
      match input_char stdin with
      | exception End_of_file -> false
      | '1'   -> print_endline "\n*** doing A"; true
      | '2'   -> print_endline "\n*** doing B"; true
      | 'q'   -> false
      |  _    -> true
    do () done

That is the trick. To write it schematically, the do-while of C

    do stm while(test)

is perfectly represented in OCaml as

    while stm; test do () done

It compiles to the desired assembly, with no extra branching.

References

strymonas: Highest-performance stream processing

Persistent twice-delimited continuations

The new version of the delimcc library of first-class delimited continuations for byte-code OCaml now supports serializing and storing of captured continuations. The stored continuation can be invoked in the same or a different process running the same executable. The persistence feature supports process checkpointing, migration -- and specifically CGI programming on any, unmodified web server, e.g., Apache. Serialized continuations are twice delimited -- both in `control' and in `data'; the latter makes them compact and possible. The delimcc library is a pure library; it makes no changes to the OCaml system and has no effect on the code that does not capture delimited continuations.

One may think that making captured continuations persistent is trivial: after all, OCaml already supports marshaling of values including closures. If one actually tries to marshal a captured delimited continuation, one quickly discovers that the naive marshaling fails with the error on attempting to serialize an abstract data type. One may even discover that the troublesome abstract data type is _chan. The captured delimited continuation (a piece of stack along with administrative data) refers to heap data structures created by delimcc and other OCaml libraries; some of these data structures are closures, which contain module's environment and may refer to standard OCaml functions like prerr. That function is a closure over the channel stderr, which is non-serializable. This points out the first problem: if we serialize all the data reachable from the captured continuation, we may end up marshaling a large part of the heap and the global environment. This is not only inefficient but also lethal, as we are liable to encounter channels and other non-serializable data structures.

There is a more serious problem however. If we serialize all data reachable from the captured delimited continuation, we also serialize two pieces of global state used by the delimcc library itself. When the stored continuation is deserialized, a fresh copy of these global data is created, referenced from within the restored continuation. Thus the whole program will have two copies of delimcc global data: one for use in the main program and one for use by the deserialized continuation. Although such an isolation may be desirable in many cases, it is precisely wrong in our case: the captured and the host continuations do not have the common view of the system and cannot work together. It may be instructive to contemplate process checkpointing offered by some operating systems (see also `undump' typically used by Emacs and TeX). When checkpointing a process, we wish to save the continuation of the process only (rather than the continuation of the scheduler that created the process, and the rest of the OS continuation). We also wish to save data associated with the process, for example, the process control block and the description of allocated memory and other resources. Control blocks of all processes are typically linked in; when saving the control block of one process, we definitely do not wish to save everything that is reachable from it. When saving the state of a process in a checkpoint, we do not usually save the state of the file system -- or even of all files used by the process. First of all, that is impractical. Mainly, it is sometimes wrong. For example, a process might write to a log file, e.g., syslog. We specifically do not wish to save the contents of the syslog along with the process image. We want the restored process append to the system log rather than replace it!

Of course resuming a suspended process after modifying its input files may also be wrong. It is a hard question of what should be saved by value and what should be saved by reference only. It is clear however that both mechanisms are needed. The serialization code of the delimcc library does offer both mechanisms. The inspiration comes from the fact that OCaml's own marshaling function, when handling closures, serializes OCaml code by reference. The delimcc library extends this approach to data. The library supports the registration of data (which currently must be closures in the old heap) in a global array. When serializing a continuation, the library traverses it and replaces all references to registered closures with indices in the global array; we then invoke OCaml's own serialization routine to marshal the result. After that, we undo the replacement of closures with indices. Such value mangling is not without precedent: to detect sharing, OCaml's own marshaling routine too mangles the input value. The use of the global array is akin to the implementation of cross-staged persistence in MetaOCaml.

Version

The current version is 1.7, April 2008

References

Delimited continuations in OCaml
The description of the delimcc library
The persistent twice-delimited continuations have been demonstrated at the Continuation Fest 2008 (April 13, Tokyo, Japan) and described in the message to the caml-list posted on Sun, 27 Apr 2008 17:53:36 -0700 (PDT).

Persistent delimited continuations for CGI programming with nested transactions
The salient application of persistent delimited continuations is the library for writing CGI scripts as if they were interactive console applications using read and printf. The above library implements the minimal CGI programming framework with form validation. The library also supports nested transactions. The captured continuations are relatively compact: the essentially empty captured continuation takes 491 bytes when serialized. Serialized continuations of the unoptimized blog application have the typical size of 10K (depending on the size of the posts); bzip can compress them to one third of the original size.

(Delimited) Dynamically scoped variables in OCaml

Dynamic binding adds expressiveness; as Luc Moreau emphasized, it is, when used in moderation, quite useful: for parameterizing a function deep in the code without changing the interface of all callers; for propagating the environment information like the current working directory, printing-depth, etc. Dynamic binding is inescapable in mobile code or continuation-based web servers. Dynamic binding, in some restricted form, is already present in OCaml: exception (and now, effect) handlers are dynamically scoped. This message announces the availability of general dynamic binding. This is joint work with Chung-chieh Shan and Amr Sabry.

Dynamic binding is implemented as a regular library, using Stdlib.Effect. No changes to the OCaml system and no code transformations are required; code that does not use dynamic variables incurs no overhead. Our dynamic variables are mutable; mutations however are visible only within the scope of the dlet statement where they occurred. It is also possible to obtain not only the latest binding to the dynamic variable, but also any of the earlier, shadowed bindings. It is also possible to capture the current dynamic environment, and re-instate it.

Because dynamic binding is implemented in terms of effects, the two features harmoniously interact. We can use dynamic variables in effect-based, cooperative threads, and support partial inheritance of the dynamic environment, with both shared and thread-private (mutable) dynamic variables.

Version

The current version is 2.1, June 2025

References

dynvar.txt [5K]
The (old) announcement with a few examples
It was originally posted on the caml-list on Mon, 10 Apr 2006 18:25:29 -0700. At that time, the implementation used the delimcc library. Now it relies on the built-in effects.

dynvar.mli [2K]
dynvar.ml [3K]
The interface and the implementation of the library

vdynvar.ml [6K]
The test code
The example at the end of that file demonstrates a partial inheritance of the dynamic environment among the parent and two cooperatively run threads.

Delimited Dynamic Binding
The ICFP 2006 paper justifying the implementation

Luc Moreau: A Syntactic Theory of Dynamic Binding
Higher-Order and Symbolic Computation, 11, 233-279 (1998)

Printing the outline of a pruned tree, using the extension to obtain shadowed dynamic bindings

dynvar_ext.mli [<1K]
dynvar_ext.ml [2K]
vdynvar_ext.ml [4K]
Capturing and re-instating the dynamic environment. The test code demonstrates a sample application: implementing lexical binding in terms of dynamic binding.

Resumable exceptions

Resumable exceptions are the strict generalization of regular exceptions, letting the exception raising form return a value and so the computation may continue. It's the exception handler that decides either to abort the exceptional computation or to resume it with a particular value. Resumable exceptions are made popular by Common Lisp, where they are widely used. Rainer Joswig's message, cited below, lists several real-life examples of resumable exceptions.

We show a conservative and elementary implementation of resumable exceptions in OCaml: the implementation is a self-contained 100% pure OCaml library; makes no changes to the OCaml system; supports the existing style of defining exceptions; is compatible with the ordinary exceptions; works in byte- or natively-compiled code; uses the most basic facilities of ML and so can easily be translated to SML.

We impose no extra restrictions on the resumable exception raising and handling forms. Like with ordinary exceptions, resumable ones may encapsulate values of arbitrary types; the same exception handler may process exceptions of many types -- and send resumption replies of many types. The raise form may appear within the guarded code at many places; different raise forms may resume with the values of different types. Furthermore, resumable exceptions are declared just like the ordinary ones, with the exception keyword. If the resumable exception handler never resumes, resumable exceptions act and feel precisely as the regular ones.

Version

The current version is 1.2, Jun 14, 2006

References

resumable.txt [7K]
Motivation, design and an example of resumable exceptions
This announcement article was originally posted on the caml-list on Wed, 14 Jun 2006 15:54:03 -0700. This file also includes follow-ups, discussing syntactic sugar and the ways to implement resumable exceptions in a multi-threaded system.

resumable.ml [6K]
The complete implementation, interface documentation, explanation, and an illustrative example

Rainer Joswig's message with several examples of usefulness of resumable exceptions. It is posted on `Lambda the Ultimate' on June 15, 2006.

Non-deterministic choice amb

The amb operator, first introduced by John McCarthy and well described by Dorai Sitaram in the context of Scheme, takes zero or more expressions (thunks) and nondeterministically returns the value of one of them. This implies that at least one of amb's expressions must yield a value, that is, does not fail. If amb has no expressions to evaluate or all of them fail, amb itself fails. One may think that amb is easily implementable by taking a list of thunks and evaluating the thunks in some order within the try block. The value of the thunk finishing without raising an exception is returned. However, that simple implementation is wrong. It is not enough that amb's chosen expression itself evaluates successfully. The chosen expression must be such that its value causes the whole program finish without errors, if at all possible. The amb operator must `anticipate' how the value of the chosen expression will be used in the rest of the computation. Therefore, amb is called an angelic nondeterministic choice operator.

Andrej Bauer gave the following example on the discussion thread:

    if (amb [(fun _ -> false); (fun _ -> true)]) then
       7
    else failwith "failure"

This program, he explained, should return 7: ``the amb inside the conditional should `know' (be told by an angel) that the right choice is the second element of the list because it leads to 7, whereas choosing the first one leads to failure.''

Therefore, we need the ability to examine (or speculatively execute) the rest of the computation. In Scheme, amb is implementable in terms of call/cc, as well explained by Dorai Sitaram. OCaml has more appropriate delimited control operators, which implement amb in two lines of code. We also need a `toplevel function', to tell us if the overall computation succeeded. One may think of it as St. Peter at the gate. For now, we take a computation that raises no exception as successful. In general, even non-termination within a branch can be dealt with intelligently (cf. `cooperative' threading which must yield from time to time). Andrej Bauer's test now looks in full as

    let test1 () =
      let v = 
        if (amb [(fun _ -> false); (fun _ -> true)]) then
          7
        else failwith "Sinner!"
      in Printf.printf "the result: %d\n" v;;
    let test1r = toplevel test1;; (* "the result: 7" *)

Speculatively evaluating amb's expressions or the rest of the computation may incur effects, such as mutation or IO. We can deal with them using one of the standard transaction implementation techniques: prohibit effects, log the updates, log the state at the beginning to roll back to, use zipper for functional `mutations'.

Here is a more advanced test, requiring a three-step-ahead clairvoyance from amb:

    let numbers = List.map (fun n -> (fun () -> n)) [1;2;3;4;5];;
    let pyth () =
      let (v1,v2,v3) =
        let i = amb numbers in
        let j = amb numbers in
        let k = amb numbers in
        if i*i + j*j = k*k then (i,j,k) else failwith "too bad"
      in Printf.printf "the result: (%d,%d,%d)\n" v1 v2 v3;;
    
    let pythr = toplevel pyth;; (* the result: (3,4,5) *)

In monadic terms, amb is equivalent to msum in MonadPlus. Even though we are interested in the first result of the entire MonadPlus computation, along the way we have to keep track of many possible worlds. That is, we need something like a List monad rather than a Maybe monad (the latter should not even be regarded as MonadPlus).

Version

The current version is 1.1, Feb 10, 2007

References

Caml-list discussion thread Amb started by Jonathan Bryant. February 09-10, 2007.

Dorai Sitaram: Teach Yourself Scheme in Fixnum Days. 1998-2004. Chapter 14. Nondeterminism.
<http://www.ccs.neu.edu/home/dorai/t-y-scheme/t-y-scheme-Z-H-16.html#node_chap_14>

amb.ml [3K]
The commented OCaml implementation
It was posted on the above discussion thread on Sat, 10 Feb 2007 03:15:56 -0800 (PST)

Zipper File system and transactional storage

Local globally-quantified exceptions

SML has local exception declarations. Furthermore, it lets us define a polymorphic function whose body declares a local exception with the type tied to that of the whole function. For example:

    fun 'a embed () = let exception E of 'a
                          fun project (e: t): 'a option =  ...

At first sight, that SML feature seems impossible in OCaml (prior to OCaml 3.12). Although we can declare local exceptions in OCaml via local structures, before OCaml 3.12 we could not use type variables quantified outside the structure: a structure limits the scope of all of its type variables. We show that local globally-quantified exceptions are macro-expressible in all versions of OCaml and demonstrate two translations. The first one relies on multi-prompt delimited continuations, whose implementation leads to the second translation. The latter represents a polymorphic exception mere by a parameter-less exception and one reference cell.

The caml-list thread referenced below gave a good motivation for locally polymorphic exceptions: writing an efficient library function fold_file of the following interface:

    module type FoldFile = sig
     val fold_file : in_channel ->         (* file      *)
                     (in_channel -> 'a) -> (* read_func *)
                     ('a -> 'b -> 'b)   -> (* combiner  *)
                     'b ->                 (* seed      *)
                     'b
    end

We can use this general folding over a file to, for example, count the number of lines in a file:

    module TestFold(F:FoldFile) = struct
     let line_count filename = (* string->int *)
       let f = open_in filename in
       let counter _ count = count + 1 in
       F.fold_file f input_line counter 0
     let test = line_count "/etc/motd"
    end

The following tentative implementation has been outlined on ocaml-list:

    module Attempt0 = struct
      exception Done of 'a
      let fold_file file read_func elem_func seed =
       let rec loop prev_val =
         let input = try read_func file
                     with End_of_file -> raise (Done prev_val) in
         let combined_val = elem_func input prev_val in
         loop combined_val
       in
       try loop seed with Done x -> x
    end

The loop is properly tail-recursive (NB: the body of a try block is not in a tail position) and avoids any administrative data structures. Alas, the typechecker does not accept the exception declaration, which says that Done should carry a value of all types. There is no such value in OCaml, and if it were, it wouldn't be useful. That was not our intention anyway: we want the value of Done to have the same type as the result of the polymorphic function fold_file. We should have declared the exception inside fold_file. Surprisingly, that can be done: Delimcc.prompt is precisely this type of `local exceptions'. We need only a slight and local adjustment to the above code to make it compile. This is our first translation.

    open Delimcc   let abort p v = take_subcont p (fun sk () -> v);;
    
    module AttemptA : FoldFile = struct
      let fold_file file read_func elem_func seed =
       let result = new_prompt () in (* here is our local exn declaration *)
       let rec loop prev_val =
         let input = try read_func file
                     with End_of_file -> abort result prev_val in
         let combined_val = elem_func input prev_val in
         loop combined_val
       in
       push_prompt result (fun () -> loop seed)
    end
    
    let module TestA = TestFold(AttemptA) in TestA.test;; (* - : int = 24 *)

The analogy between exceptions and delimited continuations is profound: local exceptions are commonly used to implement multi-prompt delimited continuations in SML. We see the converse is also true. Furthermore, delimited continuations in OCaml are implemented in terms of exceptions. Abort is essentially raise. If we `inline' the gist of the delimited continuation library we arrive at our second translation. The result requires no libraries and works with both byte-code and native compiler.

    module AttemptR : FoldFile = struct
      exception Done
      let fold_file file read_func elem_func seed =
       let result = ref None in (* here is our local exn declaration *)
       let rec loop prev_val =
         let input = try read_func file
                     with End_of_file -> result := Some prev_val; raise Done in
         let combined_val = elem_func input prev_val in
         loop combined_val
       in
       try loop seed with Done -> (match !result with Some x -> x
                                     | _ -> failwith "impossible!}
    end;;
    let module TestR = TestFold(AttemptR) in TestR.test;; (* - : int = 24 *)

The code is still properly tail-recursive and deforested. In contrast to other imperative implementations of fold_file, ours is almost pure: the reference cell result is written to and immediately after read from only once during the whole folding -- namely, at the very end. The bulk of the iteration is functional. A mutable cell is the trick behind typing of multi-prompt delimited continuations. One may even say that if a type system supports reference cells, it shall support multi-prompt delimited continuations -- and vice versa.

Version

The current version is 1.2, Oct 4, 2007

References

Yin-so Chen, Olivier Roussel, kirillkh, et al. best and fastest way to read lines from a file? Caml-list discussion thread. October 1-2, 2007.

Caml-list discussion thread Locally-polymorphic exceptions October 3-4, 2007. Of special note is the PML implementation posted by Christophe Raffalli.

The MLton team. UniversalType Source for the SML example of local exceptions
<http://mlton.org/UniversalType>

ML Programming

Domain-specific languages (DSLs) embedded in OCaml

Type-directed memoization

Version

References

Typed heterogeneous collections: Lightweight HList

Version

References

do-while loops have always been in OCaml

References

Persistent twice-delimited continuations

Version

References

(Delimited) Dynamically scoped variables in OCaml

Version

References

Resumable exceptions

Version

References

Non-deterministic choice amb

Version

References

Local globally-quantified exceptions

Version

References

Last updated July 1, 2025