Time travel stories are often about stupefying outcomes of changing the past. This story is no exception. It accidentally exposes a dirty trick, a dubious operation that noone can see. And yet... Let's go to the beginning.
Once I received a message from a Hansei user: His probabilistic
program gave clearly wrong results.
(Hansei is a library for probabilistic programming in OCaml. It works
by altering past choices and counting alternative histories.)
He eventually traced the problem
to the library function List.map
: when he wrote the list map
himself, the problem disappeared. I was puzzled. Mapping over a list is
so simple, with no place for bugs to hide.
My correspondent mentioned using Batteries, the
alternative, almost standard OCaml library. I went to the Batteries
repository to look at the source code and found this:
HANSEI: Embedded Probabilistic Programming
Hayo Thielecke: Using a Continuation Twice and Its Implications for the Expressive Power of call/cc
Higher-Order and Symbolic Computation, volume 12, April 1999,pp. 47-73.
More stories of time-travel with unexpected consequences.
For example, as it turns out, time travel is noticeable
even if one does not change anything in the past (see
``the refutation of the idempotency hypothesis'').
List.map
in Batteries is not what one would have expected...
The standard, familiar List.map
function:
let rec map : ('a -> 'b) -> 'a list -> 'b list = fun f -> function | [] -> [] | h :: t -> f h :: map f tis non-tail-recursive. It hence consumes stack space proportional to the size of the input list. When mapping over long lists, the program may run out of stack (or cause time-consuming stack reallocations) since stack is typically much smaller than the heap.
OCaml Batteries is a well-optimized library. Its List.map
is
tail-recursive and runs in constant stack space. Strictly speaking,
this is impossible to do efficiently. That's where a secret comes in.
Let's look at the source code. After desugaring, it is:
type 'a mut_list = { hd: 'a; mutable tl: 'a list } let map_mut : ('a -> 'b) -> 'a list -> 'b list = fun f -> function | [] -> [] | h :: t -> let rec loop dst = function | [] -> () | h :: t -> let cell = {hd = f h; tl = []} in dst.tl <- Obj.magic cell; loop cell t in let r = {hd = f h; tl = []} in loop r t; Obj.magic rThe function
loop
is clearly tail-recursive. Just as clearly,
there is magic. Let's try to understand the code -- by deriving it
ourselves.List.map
maps one element of the input into one element of
the output. It needs no buffers and can, in principle, work in
constant space. Still, it is expected to maintain the list
order. Therefore, if it takes an element from the head of the input
list it should add the transformed result to the end of the output
list.
module V1 = struct let snoc : 'a -> 'a list -> 'a list = fun x l -> l @ [x] let map : ('a -> 'b) -> 'a list -> 'b list = fun f l -> let rec loop acc = function | [] -> acc | h :: t -> loop (snoc (f h) acc) t in loop [] l endThis version V1 of list mapping is tail recursive, relying on
snoc
to add an element to the tail of a list. Alas, snoc
needs space
proportional to the input list size. Our goal is to make snoc
run in constant time.
In preparation for an optimization, we split out the empty list case,
so that snoc
deals only with non-empty lists:
module V2 = struct let snoc : 'a -> 'a list -> 'a list = fun x l -> l @ [x] let map : ('a -> 'b) -> 'a list -> 'b list = fun f -> function | [] -> [] | h :: t -> let rec loop acc = function | [] -> acc | h :: t -> loop (snoc (f h) acc) t in loop [f h] t end
Next we look very carefully at map
to see which list operations it really
needs -- and abstract them away in NList
, a module of non-empty lists:
module type NList = sig type 'a t val singleton : 'a -> 'a t val to_list : 'a t -> 'a list val snoc : 'a -> 'a t -> 'a t end
In terms of NList
, the list mapping takes the following general form
-- which we will use for the rest of this web page:
module ListMap(NL:NList) = struct let map : ('a -> 'b) -> 'a list -> 'b list = fun f -> function | [] -> [] | h :: t -> let rec loop acc = function | [] -> NL.to_list acc | h :: t -> loop (NL.snoc (f h) acc) t in loop (NL.singleton (f h)) t end
Ordinary lists are one, naive, realization of NList
:
module NL_list : NList = struct type 'a t = 'a list let singleton x = [x] let to_list x = x let snoc x l = l @ [x] end
ListMap(NL_list)
is the inefficient version V2 shown earlier.
Let's find a better implementation of NList
. The modular abstraction
ensures the representation independence: the map code cannot know or
depend on a particular NList
implementation. We are free to change
the representation of non-empty lists to make snoc
simple.Let's try the familiar, from tech interviews, linked list: a chain of links with a mutable tail (called cons-cells below, for historical reasons). A list is represented by a pair of the first and the last link in the chain.
module NL_twocons : NList = struct type 'a cons = {hd : 'a; mutable tl : 'a cons option} type 'a t = 'a cons * 'a cons let new_cell : 'a -> 'a cons = fun x -> {hd=x; tl=None} let singleton x = let l = new_cell x in (l,l) let to_list (first,_) = let rec go = function | {hd;tl=Some t} -> hd :: go t | {hd;tl=None} -> [hd] in go first let snoc x (first, last) = let cell = new_cell x in last.tl <- Some cell; (first,cell) endAdding to the tail becomes the trivial mutation of the
tl
pointer
in the last cons-cell, to point to the new cell. Alas, the benefits of
the now optimal snoc
are negated by to_list
.
It takes arcane knowledge to make to_list
also constant-time:
module NL_magic : NList = struct type 'a cons = {hd : 'a; mutable tl : 'a list} type 'a t = 'a cons * 'a cons let from_cons : 'a cons -> 'a list = Obj.magic let new_cell : 'a -> 'a cons = fun x -> {hd=x; tl=[]} let singleton x = let l = new_cell x in (l,l) let to_list (first,_) = from_cons first let snoc x (first, last) = let cell = new_cell x in last.tl <- from_cons cell; (first,cell) endThe structure
'a NL_magic.cons
has the same memory layout as the
cons-cell h :: t
of the ordinary list. Therefore, we can `cast'
one from the other. (Strictly speaking, casting from an
immutable h :: t
to 'a cons
is not safe: the OCaml compiler does take immutability
into account in optimizing. But we only need the other direction, which
is safe.) With all NList
operations taking constant time, we have achieved
our goal. The map_mut
code at the beginning of this section is ListMap(NL_magic).map
, after inlining and simple optimizations
(which could have been done by the compiler).
The code has the frequently seen `initialization pattern':
an initially allocated structure has some uninitialized (set to []
,
in our case) fields, later filled-in by mutation.
The structure is then `frozen' and becomes immutable. The
operation from_cons
is such `freezing'. So long as the not-yet-filled-in structure is not
revealed to others, so long as the unique ownership is preserved,
such delayed in-place initialization
is safe. In our case, 'a cons
is internal to NL_magic
; no pointers to it are visible
outside ListMap(NL_magic)
. The mutation to the private
data is really a secret. And yet it came to light. How?
List.map
problem
was first detected. Hansei was also used in the discussions
on the Caml-list. In this article, however, we explain the problem
using a quite simpler library, with only two operations:val choose : 'a -> 'a -> 'a val top : (unit -> 'a) -> 'a list
Here, choose x y
non-deterministically chooses x
or y
; top thunk
gives the list of thunk
results, in all
possible histories. For example,
top (fun () -> not (choose false true))produces
[true;false]
, for the two possible choices.
Backtracking has to capture the program state, or context, to
reinstate it later. Time travel invariably involves continuations.
That is how choose
and top
can be implemented, in terms of
the delimited control operators of the delimcc library, in a couple of
lines of code:
open Delimcc let p = new_prompt () let choose : 'a -> 'a -> 'a = fun x y -> shift p (fun k -> k x @ k y) let top : (unit -> 'a) -> 'a list = fun th -> push_prompt p (fun () -> [th ()])The coin flip
choose false true
first captures the context in k
and makes the false
choice: the entire choose false true
expression is replaced with false
and the program continues as usual,
eventually producing [true]
. We then backtrack and make the different choice,
re-running the program, which now gives [false]
. The results are collected
at the end.Let's take a more complex example, of two independent choices:
top (fun () -> map (fun _ -> choose false true) [1;2]) (* [[false; false]; [true; false]; [false; true]; [true; true]] *)The result, put in comments, shows four possible outcomes of two independent coin flips. We have used
List.map
from
the standard library. Any inefficient -- V1, V2, ListMap(NL_list)
-- version of map gives the same result. However,
the efficient tail-recursive version, ListMap(NL_magic)
or map_mut
,
differs:top (fun () -> let module M = ListMap(NL_magic) in M.map (fun _ -> choose false true) [1;2]) (* [[false; false]; [false; false]; [true; false]; [true; false]] *)Two coin flips no longer come up both
true
! In the
nutshell, that is the problem that was reported by the Hansei user.
What went wrong? Let us look at ListMap
again:
module ListMap(NL:NList) = struct let map : ('a -> 'b) -> 'a list -> 'b list = fun f -> function | [] -> [] | h :: t -> let rec loop acc = function | [] -> NL.to_list acc | h :: t -> loop (NL.snoc (f h) acc) t in loop (NL.singleton (f h)) t endIf the function in
f h
makes a choice, backtracks and makes another choice, NL.snoc (f h) acc
is executed twice: with different f h
but the
same acc
. If snoc
does mutation, as in the optimized versions:let snoc x (first, last) = let cell = new_cell x in last.tl <- from_cons cell; (first,cell)the second execution of the assignment overrides and destroys the result of the first one. That is how the
true
choices became
lost.
It is not wrong to think that choose false true
`splits the world'
(pretty much like an experiment
in many-worlds interpretation of Quantum Mechanics).
In one world, choose false true
returns false
and in the other true
. However, the split worlds share the same heap. A mutation
in one world is hence visible in all the others. In programming
language terms, a delimited continuation
captures the stack, but not the heap. If the captured
continuation modifies the heap, its reinstatement brings
a race condition. From yet another angle, map
in the split worlds share the same partially constructed output
list. The mutation is only safe when the list is uniquely owned.
When the worlds split, private remains private but no longer unique.
What makes the problem elusive and fascinating is its arising not because of an external tempering with data. Tempering is impossible: map's private data are truly inaccessible and temper-proof. Rather, the map function is tricked to temper with its data itself.
List.map
is not thread safe. The fix below is hence a thread-safe and lock-free
version. It is lock-free because
the implementation of non-determinism provides no locks.
The idea for the fix is taken from a message by Gabriel Scherer on the
Caml-list. However, it is implemented and explained simpler, without
a canary value.
Luckily, non-determinism is not as bad as the general shared memory
parallelism. First, `thread preemption' can only occur at specific
points, when we call a callback such as the mapping function. Mainly,
we still have the
guarantee that private data are modified only by their
owners. Therefore, we need to examine only the map
code;
we do not need to know what a callback or a consumer
of the resulting list may do.
Looking carefully at the NL_magic
code
let snoc x (first, last) = let cell = new_cell x in last.tl <- from_cons cell; (first,cell)shows that
snoc
has a pre-condition that we neglected to check: last
should be the last cons-cell of the list, as the name implies.
That is, last.tl
must be []
. In the single-threaded version,
the precondition was guaranteed by the fact that map
never calls snoc
twice on the same list.
If snoc
is called twice on the same list argument, it will find the
the list already extended by someone else. Hence snoc
is no longer
the sole owner of the list -- rather, it holds the shared
fragment, from first
through the last
cons-cell. We ought to take this
fragment private, by deep copying, thus restoring the unique ownership.
That code below implements this idea:
let snoc_simple : 'a -> 'a cons -> 'a cons = fun x dst -> let cell = new_cell x in dst.tl <- from_cons cell; cell (* Copy from first to last inclusive *) let deep_copy : 'a t -> 'a t = fun (first,last) -> let rec loop first last dst = if first == last then dst else match first with | _ :: ((h :: _) as t) -> loop t last (snoc_simple h dst) | _ -> assert false in let res = new_cell first.hd in let last = loop (from_cons first) (from_cons last) res in (res,last) let snoc x ((first, last) as l) = if last.tl = [] then (first,snoc_simple x last) else let (first,last) = deep_copy l in (first, snoc_simple x last)We have only shown the changed parts of
NL_magic
. The new version
is still just as fast in the `single-threaded' case
(only last.tl = []
check is added); it has no race conditions in the
non-deterministic case. Our deep-copying of the list is
what the world splitting should have done, to keep the two new worlds
independent.maps.ml [6K]
The complete OCaml code for the derivation, with tests.
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML