Representing existential data types with isomorphic simple types

We describe various ways of eliminating explicit existential quantification in data types, replacing such data types with isomorphic structures. We stress the encoding of existentials in terms of the equivalent first-rank types. The techniques let us work with existentials in a language with a simple type system and ordinary algebraic data types.


Introduction and motivation

Existential types exists a. t along with universal types forall a. t are two well-known examples of second-order types. Universals represent type parameterization and are used to give type to parametrically polymorphic functions such as the identity function. Existentials express type abstraction; their typical application is typing abstract data types, objects, and implicitly heterogeneous lists.

A sample existential type exists a b. (a,b) is the type of any pair whatsoever. We can take the pair apart using fst and snd, put it back together. We can pass the components of the pair around, but we cannot do anything else with them: we do not know their exact types and so have no idea which operations may apply to them. Existential types are supported as a widely implemented extension to Haskell98, and nowadays in OCaml. In Haskell, our sample data type is declared as data Pair = forall a b. Pair a b (using the forall quantifier, which may be confusing at first).

Existentials are most known as an implementation technique for abstract data types, aka packages. Packages are data structures whose realization is hidden and which can be manipulated only by a closed set of pre-defined operations. In other words, packages expose the interface but hide the implementation. One of the simplest packages is

    data SE = forall x. SE x (x -> String)
consisting of a value of some type x and the single pre-defined operation to manipulate this value, namely, to observe this value as a String. We can declare the package differently, but equivalently as
    data SE' = forall x. Show x => SE' x
using so-called bounded quantification, where the unknown type x is assured to be a member of the type class Show. The two declarations are equivalent; once can always convert the declaration with the type-class qualified quantification into the declaration with the unrestricted quantification by listing all type class methods (such as show :: x -> String) as the operations of the package. Therefore, we do not consider bounded quantification any further on this page.

Here are two sample values of the SE type:

    se1 = SE (5::Int) show
    se2 = SE "xx" id
    tle = [se1, se2]
    tle_run = map (\(SE x f) -> f x) tle
Although the two values represent the type x differently -- Int resp. String -- the difference is abstracted over and so the two values have the same type, SE. Therefore, the two values can be placed into the same list and uniformly manipulated. We thus observe the other, related benefit of existentials: supporting implicitly heterogeneous lists -- lists of elements of different types but with a common set of operations. The running example relied on existentials to `place' an Int and a String in the same list, and map the common `to-string' operation over the list. Existentials are only one of several ways of implementing implicitly heterogeneous lists; the others are described elsewhere on this web page.
John C. Mitchell, Gordon D. Plotkin: Abstract types have existential type
ACM Transactions on Programming Languages and Systems, 1988.

Luca Cardelli: Type systems
The Computer Science and Engineering Handbook (Allen B. Tucker (Ed.)). CRC Press, 2004. Chapter 97


Eliminating simple existentials

The introduction section used an existential to define a trivial abstract data type with a single operation of observing the hidden, abstract value as a String:
    data SE = forall x. SE x (x -> String)
By matching an SE value against the pattern SE x f we can get hold of the value x of the abstracted type. The only non-trivial operation possible on x is applying to it the function f obtained as the result of the same pattern-match. One may wonder therefore why not to apply f to x to begin with, observing x right away. There is nothing to hide any more and we do not need any existentials.

The gist of eliminating an existential, therefore, is representing it by a tuple of its possible observations. For example, SE is represented by the following ordinary data type:

    data S1 = S1 String
In a non-strict language like Haskell, the observed String is not computed unless needed. In a strict language such as OCaml, thunks are needed to delay the observations until needed; SE hence is represented in OCaml as type s1 = unit -> string. The implicitly heterogeneous list of an Int and a String with the common `to-string' operation, cf. tle of the Introduction, is written in Haskell simply as
    ts1 = [S1 (show 5), S1 "xx"]
    ts1_run = map (\(S1 x) -> x) ts1
or as following in OCaml
    let ts1 = [(fun () -> string_of_int 5); (fun () -> "xx")] (fun f -> f ()) ts1
To prove that SE and S1 are equivalent, we exhibit the following total functions to convert the value of one type to the other:
    iso_SES1 :: SE -> S1
    iso_SES1 (SE x to_string) = S1 (to_string x)
    iso_S1SE :: S1 -> SE
    iso_S1SE (S1 s) = SE () (\() -> s)
It is clear that the composition iso_SES1 . iso_S1SE :: S1 -> S1 is the identity. Proving that there is no context that can distinguish the value tse::SE from the value (iso_SES1 . iso_S1SE) tse is more involved. We have to appeal to the opacity of the existentially quantified type. The technique of Sumii and Pierce seems best for such equivalence proofs.
Eijiro Sumii and Benjamin C. Pierce: A Bisimulation for Type Abstraction and Recursion
Journal of the ACM, vol. 54, issue 5, article 26, pp. 1-43, October 2007.
The paper presents ``a bisimulation method for proving the contextual equivalence of packages in lambda-calculus with full existential and recursive types.'' Unlike other bisimulations, it is complete even for existential types.


Eliminating existential packages

A package packs a value of an abstract type together with the set of operations to manipulate the value. Since the type of the value is abstract, the value can only be manipulated by the operations of the same package. Here is a typical package, described as an OCaml module type:
    module type COUNTER = sig
      type t
      val init    : unit -> t
      val observe : t -> int
      val step    : t -> t
The package interface introduces the abstract type t of an unknown representation and three operations: the producer, the observer, and the transformer. The producer makes instances of the counter in the initial state; the step operation advances the counter. We can also observe the current value of the counter. The package itself, which implements the COUNTER signature, is a module. It used to be that modules were second-class values, which cannot be passed as function arguments or results, or stored in lists. Nowadays, OCaml has first-class modules (as well as existentials).

Let us still try to implement the COUNTER package as simple as possible. We need nothing beyond the ordinary algebraic data types. By the very definition of a package, its set of operations is pre-defined and closed. The possible observations are hence enumerable; the existential can hence be eliminated by replacing it with the tuple of all its observations.

    type counter = C of (unit -> int) * (unit -> counter)
We represent the package `counter' as a tuple of the observed current value of the counter and the advanced counter. These are the only possible observations of a counter. Thunks delay the observations until their results are required. Here are two counters with two different internal states, a pair of integers vs. a floating-point number:
    let counter1 =
      (* internal function, unavailable outside *)
      let rec make seed upper =
        C ((fun () -> seed),
           (fun () -> let seed = if seed >= upper then 0 else succ seed in make seed upper))
      in make 0 5
    let counter2 =
      (* internal function, unavailable outside *)
      let observe seed = int_of_float (10.0 *. seed) in
      let step seed = cos seed in
      let rec make seed = C ((fun () -> observe seed),
                             (fun () -> make (step seed)))
      in make 0.0
We can place the two counters into the same list and iterate over:
    let lst = [counter1; counter2]
    let advance_all = (function C (_,adv) -> adv ())
    let show_all = List.iter (function C (s,_) -> Printf.printf "%d\n" (s()))
    let test = let () = show_all lst in
               let () = show_all (advance_all (advance_all lst))
               in ()
We have demonstrated the very old idea: closures are poor-man objects. The realization of this idea back in 1976 begat Scheme.

More generally, a package can be described as the following Haskell existential data type:

    data CE i m o =
        forall x. CE x                      -- current state
                     (i -> x)               -- producer
                     (x -> m -> x)          -- transformer (`step')
                     (x -> o)               -- observer
Here, the type i is the type of the arguments to the constructor; the type m describes additional arguments to the transformer, and o is the type of observations. The corresponding simple data type is
    data C1 m o = C1 (m -> C1 m o) o
To show the isomorphism between the existential package and its existential-free representation, we exhibit the following pair of total functions. The function iso_CEC1 converts the existential data type CE into a pair of the corresponding simple data type C1 and the constructor function. The latter creates the (initial) value of C1. The function iso_C1CE is the observational inverse.
    iso_CEC1 :: CE i m o -> (C1 m o, i -> C1 m o)
    iso_CEC1 (CE x init step observe) = (makeC1 x, \i -> makeC1 (init i))
     where makeC1 x = C1 (makeC1 . step x) (observe x)
    iso_C1CE :: (C1 m o, i -> C1 m o) -> CE i m o
    iso_C1CE (c1, makeC1) = CE c1 init step observe
     where init = makeC1
           step (C1 step1 _)   = step1
           observe (C1 _ obs1) = obs1
One may call C1 m o a Skolem function for the quantified type variable x.
The current version is 1.3, June 28, 2008
In the final form, the idea of using a recursive first-order data type was explained in the message Eliminating existentials, finally posted on the Caml list on Wed, 14 Nov 2007 00:37:10 -0800 (PST). The early version of the idea was discussed in the thread ``Using existential types'' on Haskell-Cafe in October 2003.

Bernhard Beckert and Joachim Posegga: leanTAP: Lean Tableau-based Deduction
Journal of Automated Reasoning, v15, N3, pp. 339-358, 1995
Taking a formula itself as the Skolem-term is used in the leanTAP theorem prover. The authors credit Hilbert and Bernays, 1939.


Eliminating recursive existential variants

Existentials also prove useful in embedding domain-specific languages, including monadic languages. Here is the simplest example: a DSL of applications. It has primitive values and lets us apply them.
    data Expr a = Val a | forall b. Apply (Expr (b->a)) (Expr b)
    ex1 = Apply (Val fromEnum) (Apply (Apply (Val (==)) (Val 'd')) (Val 'c'))
Expr a describes the result of zero or more applications as well as the structure of this series of applications. Jan-Willem Maessen observed that the two aspects can be represented separately, in the following existential-free data type:
    data E1 a = E1 a Tree
    data Tree = Leaf | Node Tree Tree
We again exhibit a pair of total functions to convert between the two data types:
    iso_EXE1 :: Expr a -> E1 a
    iso_EXE1 (Val v) = E1 v Leaf
    iso_EXE1 (Apply opr opa) = E1 (opr' opa') (Node tr ta)
     where E1 opr' tr = iso_EXE1 opr
           E1 opa' ta = iso_EXE1 opa
    iso_E1EX :: E1 a -> Expr a
    iso_E1EX (E1 v Leaf) = Val v
    iso_E1EX (E1 v (Node t1 t2)) = Apply (iso_E1EX (E1 (\() -> v) t1))
                                         (iso_E1EX (E1 () t2))
The first-order representation given here first appeared as the last encoding in the message
Jan-Willem Maessen: Using existential types
posted on the Haskell-Cafe on Fri Oct 10 16:57:12 EDT 2003. The whole discussion thread is also relevant. No attempt to establish isomorphism was made back then.


Eliminating translucent existentials

Some existentials are translucent, exposing part of their internal structure. The epitome of such existentials was given by Chung-chieh Shan:
    data LE = forall x. LE [x] ([x] -> String)
This data type exposes the list [x] while hiding all information about the elements themselves. There are no operations to examine the elements individually; in particular, we cannot compare them. We can tell, however, the size of the list.

The LE existential is quite similar to the SE existential we have seen earlier. Matching an LE value against the pattern LE xs f gives the list xs and the observation function f. As with SE, we may immediately observe xs by passing it to f. Unlike SE, we may do more observations: find out the length of xs; transform xs before passing it to f. When transforming xs, we cannot examine the elements, but we may arbitrarily permute, drop and duplicate them. Generally, we may regard xs as a set of distinct `letters': the alphabet Sigma. The permissible transformations then map Sigma to Sigma* -- all strings over the alphabet.

Since we may not in any way examine the elements of the list xs given as the first component of LE, we may only refer to them by their position in xs. We may just as well use the positions (natural numbers) as the stand-in for the elements. One might think, therefore, that LE is isomorphic to the following

    data LN = LN Nat ([Nat] -> String)
where the first component of LN tells the size of the alphabet, and the second is the observation function, taking a sequence of letters each represented as a Nat. Here Nat is the type of natural numbers -- e.g., Peano numerals
    data Nat = Zero | Succ of Nat
However, LN is not isomorphic to LE: it is `too large'. Indeed, the type [Nat]->String contains functions that take a list of any Nats as the argument. To represent strings over the alphabet of size n, these Nats all have to be within the range [0..n-1]. In particular, if the alphabet is empty, the observation function should accept only the empty list as the argument. For the alphabet of size one, the observation function should accept arbitrary lists, but made entirely of zeros.

Size-limited natural numbers are easy to represent. Whereas arbitrary-size Nats are the least-fixed point of the functor F X = 1 + X, or

    data F x = Z | S x
in the Haskell notation (isomorphic to the familiar Maybe), the numbers under n are the n-th iteration of the functor (the n-th approximant to the fixpoint): Fn Empty, where Empty is the type with no inhabitants. For example, the non-bottom values of the type F2 Empty, or F (F Empty), are Z and S Z.

Thus the faithful representation of LE would look like

    data L' = Sigma (n::Nat, [F^n Empty] -> String)
with the full-blown dependent pair. It is quite more advanced than LE we started with. However, L' admits the simple inductive characterization:
    data L' a where
      LZ' :: ([a] -> String) -> L' a
      LS' :: L' (F a)        -> L' a
which, at least, can now be written in Haskell, as a GADT. Looking closely at L' we notice that the two constructors LZ' and LS' build values of the same type. Therefore, L' is not really a GADT: it is an ordinary data type, but nested. Here it is, written in the conventional data type notation:
    data L a = LZ ([a] -> String) | LS (L (F a))
    type L1 = L Empty
It reminds of the nested data type encoding of perfect binary trees.

Thus, L1 is the ordinary data type that faithfully represents LE with no junk. We prove it by showing the maps that witness the isomorphism between L1 and LE below. The operations one can do on LE and the operations on L1 also clearly correspond.

    iso_LEL1 :: LE -> L1
    iso_LEL1 (LE xs f) = go (reverse xs) f []
        -- build the dictionary, the correspondence between x and F^n Empty
        go :: Eq a => [x] -> ([x] -> String) -> [(a,x)] -> L a
        go []    f dict = LZ (decode f dict)
        go (h:t) f dict = LS (go t f ((Z,h):map (\ (a,x) -> (S a,x)) dict))
        -- decode according to the dictionary
        -- By construction, the dictionary contains all a, and so
        -- the lookup is total.
        decode :: Eq a => ([x]->String) -> [(a,x)] -> ([a] -> String)
        decode f dict = f . map (\a -> let Just x = lookup a dict in x)
    iso_L1LE :: L1 -> LE
    iso_L1LE = go []
       go :: [a] -> L a -> LE
       go ps (LZ f) = LE ps f
       go ps (LS l) = go (Z:map S ps) l
Here is a test of the equivalence of LE and (iso_LEL1 . iso_LEL1) LE on two sample points:
    test_eqLE (LE xs1 obs1) (LE xs2 obs2) =
        zip (map obs1 (tails xs1)) (map obs2 (tails xs2))
    test_LE1E = test_eqLE anLE (iso_L1LE . iso_LEL1 $ anLE)
      where anLE = LE "abc" reverse
    test_LE1E' = test_eqLE anLE (iso_L1LE . iso_LEL1 $ anLE)
      where anLE = LE "abc" (drop 1)


Bringing different types into union

A common problem is the uniform manipulation of a collection of elements that do not have the same type but support a common set of operations. For example, a GUI form may include several widgets -- scrollbars, buttons, text entry fields -- of different structure and generally of different types. The child widgets share operations such as repositioning, redrawing, showing and hiding. A typical implementation of the GUI form has a list of child widgets, and functions to iterate over the list, e.g., to show all widgets.

To place elements of different types into the same homogeneous list we have to inject the elements into the appropriate union type: the universe. Variant types are the simplest way to build the universe. For example,

    type univ = VInt of int | VFloat of float | VBool of bool
lets us place integers, floating-point numbers and booleans into the same list: [VInt 1; VFloat 2.0; VBool false]. The variant type, however, is not extensible. One often needs open unions.

Existentials are one way of building open unions, with the advantage of type abstraction. If exposing the type of a union member is not a problem, one may build open unions in other ways. For example, ML supports open unions in the form of exception types. OCaml has polymorphic open unions (and now, extensible variants). The type of mutable cells is an open union!

Yet another way of building open unions is by iterating the sum type. Here is an example in Haskell:

    data Sum a b = InL a | InR b
    instance (Show a, Show b) => Show (Sum a b) where
        show (InL x) = show x
        show (InR x) = show x
    ex_sum = [InL (1::Int), InR (InL (2.0::Float)), InR (InR False)]
    test_sum = map show ex_sum  -- ["1","2.0","False"]
One may define type classes for other common operations on the elements of ex_sum. Building such lists by hand is inconvenient. OOHaskell demonstrates automatic iterated sums with the corresponding injection and projection operations.

Implicitly heterogeneous lists without existentials offer no value abstraction, let alone type abstraction. On the upside, open unions support a projection operation, aka safe downcast.

The MLton team: UniversalType
The detailed explanation of implementing open unions in SML in terms of exceptions or reference cells.

Open Unions in Haskell


Existentials as universals

This approach is beside our goal of representing existentials in simple types: universals are not simple types. We consider this well-known encoding for contrast.

Recall the SE existential from the Introduction:

    data SE = forall x. SE x (x -> String)
    tle = [SE (5::Int) show, SE "xx" id]
    tle_run = map (\(SE x f) -> f x) tle
It can be realized as the following universal type:
    newtype SU = SU (forall w. (forall x. (x,x->String) -> w) -> w)
    tlu = [SU (\k -> k (5::Int, show)),
           SU (\k -> k ("x", id))]
    tlu_run = map (\(SU k) -> k (\(x,f) -> f x)) tlu
SU is the wrapper over the polymorphic function that takes as the argument the polymorphic function (forall x. (x,x->String) -> w). The expressions tle_run and tlu_run both evaluate to the same result, ["5","x"].

It is tempting to think of the correspondence between SE and SU in terms of the classical equivalence between an (existential) proposition and its double negation. This equivalence however holds only classically, and hence has no relevance for the polymorphic lambda calculus or programming languages based on it, which are not models of classical logic.

To convince oneself that the representation of existentials via universals is correct, it is enough to exhibit the isomorphic mappings between them. Here they are:

    data Ex c = forall x. Ex (c x)
    data Un c = Un (forall w. (forall x. c x -> w) -> w)
    -- The mappings are clearly total and the inverses of each other
    e2u :: Ex c -> Un c
    e2u (Ex v) = Un (\k -> k v)
    u2e :: Un c -> Ex c
    u2e (Un f) = f Ex
Instead of the specific SE, we took the most general form of existentials: Ex c, parameterized by the `constructor' c :: * -> *. For example, SE is an instance of the Ex c schema: type ES = Ex SC where newtype SC x = SC (x, x -> String).

One way to understand the universal encoding is to consider the elimination rule for existential types (in intuitionistic and classical logics):

    (Ex x. A)   (A[x:=T] -> C)
where T is an eigenvariable: a type that should not occur in A and C and should be used only within the derivation of (A[x:=T] -> C). The eigenvariable side-condition can be expressed in a different way:
    (Ex x. A)   (forall x. A -> C)
    ------------------------------    where x is not free in C
from which follows that (Ex x. A) acts as an implication (function) that takes (forall x. A -> C) and produces C, for arbitrary C. The u2e mapping above says the same, but in much fewer words. The mapping e2u relates to the introduction rule for Ex x. A, and its universal encoding.
Benjamin C. Pierce: Types and Programming Languages (MIT Press, 2002). Section 24.3
A different explanation of the universal encoding.


Objects as an encoding of existentials

This approach is also beside our goal of representing existentials in simple types. Object types are implicitly existential, with respect to the types of their private fields. The very motivation for objects is the encapsulation of their state, which includes the abstraction over the types of their private fields so to prevent the outsiders from finding out not only the values of the fields but also their types. Outsiders should only use the methods of the object and should know nothing about object's fields.

Here is the familiar `counter' -- this time as an object. We first declare the interface, the object type, for the sake of (optional) type annotations and clarity:

    class type counter = object ('self)
      method observe : int
      method step    : unit -> 'self
The following are two objects of the type counter -- two implementations of the above interface. The two counters differ in their internal state: a pair of integers vs. a floating-point number.
    let counter1 = object
       val seed  = 0
       val upper = 5
       method observe = seed
       method step () =
         let new_seed = if seed >= upper then 0 else succ seed in
         {< seed = new_seed >}
    let counter2 : counter = object
       val seed = 0.0
       method observe = int_of_float (10.0 *. seed)
       method step () = {< seed = cos seed >}
The state of the objects is not directly observable. Therefore, despite the different type of the internal state, the two counters have the same type. We can put the counters in the same list and uniformly process them by iterating over the list.
    let lst = [counter1; counter2]
    let advance_all = (fun c -> c#step ())
    let show_all = List.iter (fun c -> Printf.printf "%d\n" c#observe)
    let test = let () = show_all lst in
               let () = show_all (advance_all (advance_all lst))
               in ()
The test prints 0 0 2 5.
The current version is 1.1, Feb 18, 2008
The code was originally posted in the message Re: existentials [was: Formal specifications of programming languages] on the Caml mailing list on Tue, 19 Feb 2008 00:54:23 -0800 (PST)

Didier Re'my: Programming Objects with ML-ART: An extension to ML with Abstract and Record Types.
International Symposium on Theoretical Aspects of Computer Software, 1994, LNCS 789, pp. 321-346.
The paper is the best exposition of the motivation and the design of objects in OCaml. The paper discusses existentials and type abstraction in great detail.

OCaml now has first-class modules, letting us `package' a module hiding the implementation details. The package is a first-class value and can be placed into lists and passed around.