previous  next  contents  top

Representing existential data types with isomorphic simple types


We describe various ways of eliminating explicit existential quantification in data types, replacing such data types with isomorphic structures. We stress the encoding of existentials in terms of equivalent simple, first-order types. All the techniques let us work with existentials in a language like OCaml, which does not support existentials directly.

Representing existentials via isomorphic complex types (for contrast with the above)


Introduction and motivation

Existential types exists a. t along with universal types forall a. t are two well-known examples of second-order types. Universals represent type parameterization and so are used to give type to parametrically polymorphic functions such as the identity function. Existentials express type abstraction; their typical application is typing abstract data types, objects, and implicitly heterogeneous lists.

A sample existential type exists a b. (a,b) is the type of any pair whatsoever. We can take the pair apart using fst and snd, put it back together. We can pass the components of the pair around, but we cannot do anything else with them: we do not know their exact types and so have no idea what operations may apply to them. Existential types are supported as a widely implemented extension to Haskell98. In Haskell, our sample data type is declared as data Pair = forall a b. Pair a b (using the forall quantifier, which may be confusing at first).

Existentials are most known as an implementation technique for abstract data types, aka packages. Packages are data structures whose realization is hidden and which can be manipulated only by a closed set of pre-defined operations. In other words, packages expose the interface but hide the implementation. One of the simplest packages is

     data SE = forall x. SE x (x -> String)
consisting of a value of some type x and a single pre-defined operation to manipulate this value, namely, to observe this value as a String. We can declare the package differently, but equivalently as
     data SE' = forall x. Show x => SE' x
using so-called bounded quantification, where the unknown type x is assured to be a member of the type class Show. The two declarations are equivalent; we can always convert the declaration with the type-class qualified quantification into the declaration with the unrestricted quantification by including all the type class methods (such as show :: x -> String) as explicit operations of the package. Therefore, we do not consider bounded quantification any further on this page.

We can define two values of our existential type

     se1 = SE (5::Int) show
     se2 = SE "xx" id
     tle = [se1, se2]
     tle_run = map (\(SE x f) -> f x) tle
Although the two values represent the type x differently, Int vs. String, the difference is abstracted over and so the two values have the same type, SE. Therefore, the two values can be placed into the same list and uniformly manipulated. We thus observe the other, related benefit of existentials: supporting implicitly heterogeneous lists -- lists of elements of different types but with a common set of operations. In the above example we relied on existentials to `place' an Int and a String in the same list, and map the common `to-string' operation over the list. Existentials are only one of several ways of implementing implicitly heterogeneous lists; the others are described elsewhere on this web page.
John C. Mitchell, Gordon D. Plotkin: Abstract types have existential type.
ACM Transactions on Programming Languages and Systems, 1988.
< >

Luca Cardelli: Type systems.
The Computer Science and Engineering Handbook (Allen B. Tucker (Ed.)). CRC Press, 2004. Chapter 97.
< >


Eliminating simple existentials

In the introduction section we used existential to define a trivial abstract data type with a single operation of observing the hidden, abstract value as a String.
     data SE = forall x. SE x (x -> String)

By pattern-matching the value of the existential against the pattern (SE x f) we can get hold of the value x of the abstracted type and pass the value around. The only non-trivial operation possible on x is applying to x the function f obtained as the result of the same pattern-match operation. One may wonder therefore why not to apply f to x from the outset and so to observe x right away. There is nothing to hide any more and we do not need any existentials.

The gist of eliminating an existential is representing it by a tuple of its possible observations. We represent SE as a simple data type

     data S1 = S1 String
In a non-strict language like Haskell, the observation, the String, is not computed unless needed. In a strict language such as OCaml, we should use thunks to delay computations, and so we represent SE as type s1 = unit -> string. The implicitly heterogeneous list of an Int and a String with the common `to-string' operation, cf. tle above, can be written in Haskell simply as
     ts1 = [S1 (show 5), S1 "xx"]
     ts1_run = map (\(S1 x) -> x) ts1
or as following in OCaml
     let ts1 = [(fun () -> string_of_int 5); (fun () -> "xx")] (fun f -> f ()) ts1

To prove that SE and S1 are equivalent, we exhibit the following total functions to convert the value of one type to the other:

     iso_SES1 :: SE -> S1
     iso_SES1 (SE x to_string) = S1 (to_string x)
     iso_S1SE :: S1 -> SE
     iso_S1SE (S1 s) = SE () (\() -> s)
It is clear that the composition iso_SES1 . iso_S1SE :: S1 -> S1 is the identity. Proving that there is no context that can distinguish a value tse::SE from the value (iso_SES1 . iso_S1SE) tse is more involved. We have to appeal to the opacity of the existentially quantified type. The technique of Sumii and Pierce seems best for such equivalence proofs.
Eijiro Sumii and Benjamin C. Pierce: A Bisimulation for Type Abstraction and Recursion.
Journal of the ACM, vol. 54, issue 5, article 26, pp. 1-43, October 2007.
The paper presents ``a bisimulation method for proving the contextual equivalence of packages in lambda-calculus with full existential and recursive types.'' Unlike other bisimulations, it is complete even for existential types.


Eliminating existential packages

A package packs a value of an abstract type together with a set of operations to manipulate the value. Since the type of the value is abstract, the value can only be manipulated by the operations of the same package. Here is a typical package, described as an OCaml module type:
     module type COUNTER = sig
       type t
       val init    : unit -> t
       val observe : t -> int
       val step    : t -> t
The package interface introduces an abstract type t of an unknown representation and three operations: the producer, the observer, and the transformer. The producer makes instances of the counter in the initial state; the step operation advances the counter. We can also observe the current value of the counter. Our package is not an OCaml value. With the exception of MoscowML and AliceML, modules are not first-class values in ML and cannot be passed around or stored in lists.

To implement packages in the language itself, one commonly uses existentials -- which are not directly supported in OCaml. Nevertheless, we are able to realize packages as first-class values in ML or other simply-typed language with data types. By the very definition of a package, its set of operations is pre-defined and closed. We can enumerate the possible observations, thus eliminating the existential by replacing it with the tuple of its possible observations.

     type counter = C of (unit -> int) * (unit -> counter)
We represent the package `counter' as a tuple of the observed current value of the counter and the advanced counter. These are the only possible observations of a counter. Thunks delay the observations until their results are required. Here are two counters with two different internal states, a pair of integers vs. a floating-point number:
     let counter1 = 
       (* internal function, unavailable outside *)
       let rec make seed upper = 
         C ((fun () -> seed),
            (fun () -> let seed = if seed >= upper then 0 else succ seed in make seed upper))
       in make 0 5
     let counter2 = 
       (* internal function, unavailable outside *)
       let observe seed = int_of_float (10.0 *. seed) in
       let step seed = cos seed in
       let rec make seed = C ((fun () -> observe seed),
                              (fun () -> make (step seed)))
       in make 0.0
We can place the two counters into the same list and iterate over:
     let lst = [counter1; counter2];;
     let advance_all = (function C (_,adv) -> adv ());;
     let show_all = List.iter (function C (s,_) -> Printf.printf "%d\n" (s()));;
     let test = let () = show_all lst in
                let () = show_all (advance_all (advance_all lst))
                in ();;

We thus demonstrated a very old idea -- whose realization in 1976 was the birth of Scheme -- that closures are poor-man objects.

More generally, we can describe a package as the following Haskell existential data type:

     data CE i m o = 
         forall x. CE x                      -- current state
                      (i -> x)               -- producer
                      (x -> m -> x)          -- transformer (`step')
                      (x -> o)               -- observer
Here, the type i is the type of the arguments to the constructor; the type m describes the additional arguments to the transformer, and o is the type of observations. The corresponding simple data type is
     data C1 m o = C1 (m -> C1 m o) o
To show the isomorphism between the existential package and the existential-free representation, we exhibit the following pair of total functions. The function iso_CEC1 converts the existential data type CE into a pair of the corresponding simple data type C1 and the constructor function. The latter creates the (initial) value of C1. The function iso_C1CE is the observational inverse.
     iso_CEC1 :: CE i m o -> (C1 m o, i -> C1 m o)
     iso_CEC1 (CE x init step observe) = (makeC1 x, \i -> makeC1 (init i))
      where makeC1 x = C1 (makeC1 . step x) (observe x)
     iso_C1CE :: (C1 m o, i -> C1 m o) -> CE i m o
     iso_C1CE (c1, makeC1) = CE c1 init step observe
      where init = makeC1
            step (C1 step1 _)   = step1
            observe (C1 _ obs1) = obs1
One may call C1 m o the Skolem function for the quantified type variable x.
The current version is 1.3, June 28, 2008.
In the final form, the idea of using the recursive first-order data type was explained in the message Eliminating existentials, finally posted on the Caml list on Wed, 14 Nov 2007 00:37:10 -0800 (PST). The early version of the idea was discussed in the thread ``Using existential types'' on Haskell-Cafe in October 2003.

Bernhard Beckert and Joachim Posegga: leanTAP: Lean Tableau-based Deduction.
Journal of Automated Reasoning, v15, N3, pp. 339-358, 1995.
Taking a formula itself as the Skolem-term is used in the leanTAP theorem prover. The authors credit Hilbert and Bernays, 1939.


Eliminating recursive existential variants

Existentials also prove useful in embedding domain-specific languages, including monadic languages. Here is a sample representation of a trivial first-order language with application:
     data Expr a = Val a | forall b. Apply (Expr (b->a)) (Expr b)
     ex1 = Apply (Val fromEnum) (Apply (Apply (Val (==)) (Val 'd')) (Val 'c'))
Expr a describes the result of zero or more applications as well as the structure of this series of applications. Jan-Willem Maessen observed that the two aspects can be represented separately, in the following existential-free data type:
     data E1 a = E1 a Tree
     data Tree = Leaf | Node Tree Tree

We again exhibit a pair of total functions to convert between the two data types:

     iso_EXE1 :: Expr a -> E1 a
     iso_EXE1 (Val v) = E1 v Leaf
     iso_EXE1 (Apply opr opa) = E1 (opr' opa') (Node tr ta) 
      where E1 opr' tr = iso_EXE1 opr
            E1 opa' ta = iso_EXE1 opa
     iso_E1EX :: E1 a -> Expr a
     iso_E1EX (E1 v Leaf) = Val v
     iso_E1EX (E1 v (Node t1 t2)) = Apply (iso_E1EX (E1 (\() -> v) t1))
                                          (iso_E1EX (E1 () t2))
The first-order representation given here first appeared as the last encoding in the message
Jan-Willem Maessen: Using existential types
posted on Haskell-Cafe on Fri Oct 10 16:57:12 EDT 2003.
The whole discussion thread is also relevant. No attempt to establish isomorphism was made back then.


Eliminating translucent existentials

Some existentials are translucent, exposing part of their internal structure. The epitome of such existentials was given by Chung-chieh Shan:
     data LE = forall x. LE [x] ([x] -> String)

This data type exposes the list as part of LE and the number of elements in the list, while hiding all information about the elements themselves. There are no operations to examine the elements individually; in particular, we cannot compare the elements. We cannot tell if the list contains duplicates. We can do only three sorts of observations on LE: (i) observe the number of elements in the list, the first component of LE; (ii) observe the current `state', by applying the second component of LE to the first component; (iii) transform the first component to a different list and then observe the result. When transforming the list, we cannot examine the elements, but we may arbitrarily permute, drop and duplicate the elements. The permissible transformations then are scramblings of the list. Thus the data type LE is equivalent to the following package:

     data LEP =
         forall xs. LEP xs            -- current state, hidden
                        (n::Int)      -- the exposed part of the curr state
                        (xs -> (forall a. [a]^n->[a]) -> xs) -- transformer
                        (xs -> String)                       -- observer

One may think that scramblings can be described by a type (forall a. [a] -> [a]), but this is not precise. The set of transformations depends on the size of the list. For example, the only possible `scrambling' of the empty list is the identity: We cannot create any new elements of the list, and we are given no elements to drop or replicate. If the first component of LE is the list of one element, we could either drop or arbitrarily replicate this one element, producing lists of any size all containing copies of that one element. Generally, if the first component of LE is the list of n elements, we may regard these elements as a set of distinct `letters', the alphabet Sigma. The scramblings then map Sigma to Sigma*, to all strings over the alphabet Sigma. Thus we need the dependent type [a]n, denoting a list of elements of the type a and the length n.

Using the general technique of representing an existential package by the set of available observations, we can derive the equivalent first-order data type

     data L1P = L1P (n::Int)
                    (forall a. [a]^n->[a]) -> String
The second component of L1P maps each possible scrambling on the list of n distinct elements to a string. It is clear that we need the precise characterization of scramblings if we wish to establish the isomorphism of L1P with LE. The non-dependent type (forall a. [a]->[a]) is just `too large'.

Because of the dependent type, neither LEP nor L1P can be written in Haskell, at least not directly. Chung-chieh Shan showed the way around. We can write the type of scramblings of the list of n elements, as a nested data type. Recall that the set of scramblings is the set of all strings over the alphabet of n distinct letters. We can write the type of the alphabet as follows: the type () represents the singleton set, with one element (). The type Maybe () then represents a set with two elements, Nothing and Just (); Maybe (Maybe ()) represents a set with exactly three elements: Nothing, Just Nothing, Just (Just ()). Then all scramblings for the list of three elements are precisely described by the set of (non-bottom) values of the type [Maybe (Maybe ())]. For example, the identity scamble is represented as [Nothing, Just Nothing, Just (Just ())]; the scamble that drops the last element and replaces it with the first element is [Nothing, Just Nothing, Nothing], etc. Thus Maybe(n-1) () represents the alphabet of exactly n elements, and the type [Maybe(n-1) ()] precisely represents the set of all possible strings over that alphabet. That is, there exists a 1-1 correspondence between a non-bottom value of the type Maybe(n-1) () and a string over the n-letter alphabet.

The following is a simple embellishment of the code originally written by Chung-chieh Shan in December 2003. It builds the encoding of the alphabet and of the above correspondence. For example, the list of three elements of the unknown type is represented by the list of their 'codes': [Nothing, Just Nothing, Just (Just ())].

     -- First-order representation, via a nested data type
     data L1 = Lempty String | Lmany (Ln ())
     data Ln a = Ln ([a] -> String) | Ln1 (Ln (Maybe a))
     iso_LEL1 :: LE -> L1
     iso_LEL1 (LE [] to_string)     = Lempty (to_string [])
     iso_LEL1 (LE (x:xs) to_string) = Lmany (build to_string xs [x] [()])
      build :: Eq a => ([x] -> String) -> [x] -> [x] -> [a] -> Ln a
      build tos [] xs codes = Ln (tos . map (encode xs (reverse codes)))
      build tos (x:t) xs codes = Ln1 (build tos t (x:xs) (Nothing:map Just codes))
      encode xs codes y = maybe (error "can't happen") id $ lookup y (zip codes xs)
     iso_L1LE :: L1 -> LE
     iso_L1LE (Lempty s) = LE ([]::[()]) (const s)
     iso_L1LE (Lmany ls) = make ls [()]
      make :: Ln a -> [a] -> LE
      make (Ln f) codes   = LE codes f
      make (Ln1 ls) codes = make ls (Nothing:map Just codes)
     -- test the equivalence of two LE values on sample points
     test_eqLE (LE xs1 obs1) (LE xs2 obs2) =
        zip (map obs1 (tails xs1)) (map obs2 (tails xs2))
     test_LE1E = test_eqLE anLE (iso_L1LE . iso_LEL1 $ anLE)
      where anLE = LE "abc" reverse
     test_LE1E' = test_eqLE anLE (iso_L1LE . iso_LEL1 $ anLE)
      where anLE = LE "abc" (drop 1)


Bringing different types into union

A common problem is the uniform manipulation of a collection of elements that do not have the same type but permit a common set of operations. For example, a GUI form may include several widgets -- scrollbars, buttons, text entry fields -- of different structure and generally of different types. These child widgets share common operations such as repositioning, redrawing, showing and hiding. A typical implementation of the GUI form includes a list of the child widgets, and functions to iterate over the list, e.g., to show all widgets.

To place elements of different types into the same homogeneous list we have to inject the elements into the appropriate union type (the universe). Variant types are the simplest way to build the universe. For example,

     type univ = VInt of int | VFloat of float | VBool of bool
lets us place integers, floating-point numbers and booleans into the same list: [VInt 1; VFloat 2.0; VBool false]. The variant type has the clear disadvantage of being non-extensible. One often needs open unions.

Existentials are one way of building open unions, with an advantage of type abstraction. If encapsulation is not required, one can always build open unions in other ways. For example, ML supports open unions in the form of exception types. OCaml has polymorphic open unions. Even simpler, the type of mutable cells is an open union!

Yet another way of building open unions is by iterating the sum type. Here is an example in Haskell:

     data Sum a b = InL a | InR b
     instance (Show a, Show b) => Show (Sum a b) where
         show (InL x) = show x
         show (InR x) = show x
     ex_sum = [InL (1::Int), InR (InL (2.0::Float)), InR (InR False)]
     test_sum = map show ex_sum  -- ["1","2.0","False"]
One may define type classes for other common operations on the elements of ex_sum. Building such lists by hand is inconvenient. OOHaskell demonstrates automatic iterated sums with the corresponding injection and projection operations.

Implicitly heterogeneous lists without existentials offer no value abstraction, let alone type abstraction. On the upside, open unions support a projection operation, aka safe downcast.

The MLton team: UniversalType
< >
The detailed explanation of implementing open unions in SML in terms of exceptions or reference cells.


Existentials as universals

This approach is beside our goal of representing existentials in simple types: universals are not simple types, they are too of higher rank. We consider this well-known encoding for contrast.
     data ShowableE = forall x. ShowableE x (x -> String)
     tle = [ShowableE 5 show, ShowableE "x" id]
     tle_run = map (\(ShowableE x f) -> f x) tle
     newtype ShowableU = ShowableU (forall w. (forall x. (x,x->String) -> w) -> w)
     tlu = [ShowableU (\k -> k (5::Int, show)), 
            ShowableU (\k -> k ("x", id))]
     tlu_run = map (\(ShowableU k) -> k (\(x,f) -> f x)) tlu
Here, ShowableE is a simple existential (the value of some type packed along with the function to show it), and ShowableU is the corresponding universal: the wrapper over the polymorphic function that takes as an argument a polymorphic function (forall a. (a,a->String) -> w). The expressions tle_run and tlu_run both evaluate to the same result, ["5","x"].

Derek Elkins has lucidly explained the transformation as classical double-negation:

     exists x. P x
     <=> ~~(exists x. P x)
     <=> ~(forall x. ~(P x))
     <=> ~(forall x. P x -> _|_)
     <=> (forall x. P x -> _|_) -> _|_
where ~A means the negation of the formula A and _|_ means `bottom' or falsity.
Benjamin C. Pierce: Types and Programming Languages (MIT Press, 2002). Section 24.3

Derek Elkins: Re: Using existential types
A message posted on the Haskell-Cafe mailing list on Tue, 14 Oct 2003 03:49:32 -0400


Objects as an encoding of existentials

This approach is too beside our goal of representing existentials in simple types. Object types are implicitly existential, with respect to the types of their private fields. The very motivation for objects is the encapsulation of their state, which includes the abstraction over the types of their private fields so to prevent the outsiders from finding out not only the values of the fields but also their types. Outsiders should only use the methods of the object and should know nothing about object's fields.

We use the familiar `counter' as an illustration. We first declare the interface, the object type, for the sake of (optional) type annotations and clarity:

     class type counter = object ('self)
       method observe : int
       method step    : unit -> 'self
We introduce two objects of the type counter -- two implementations of the above interface. The two counters differ in their internal state: a pair of integers vs. a floating-point number.
     let counter1 = object
        val seed  = 0
        val upper = 5
        method observe = seed
        method step () = 
          let new_seed = if seed >= upper then 0 else succ seed in
          {< seed = new_seed >}
     let counter2 : counter = object
        val seed = 0.0
        method observe = int_of_float (10.0 *. seed)
        method step () = {< seed = cos seed >}

The state of the objects is not directly observable. Therefore, despite the different type of the internal state, the two counters have the same type. We can put the counters in the same list and uniformly process them by iterating over the list.

     let lst = [counter1; counter2];;
     let advance_all = (fun c -> c#step ());;
     let show_all = List.iter (fun c -> Printf.printf "%d\n" c#observe);;
     let test = let () = show_all lst in
                let () = show_all (advance_all (advance_all lst))
                in ();;

The test prints 0 0 2 5

The current version is 1.1, Feb 18, 2008.
The code was originally posted in a message Re: existentials [was: Formal specifications of programming languages] on the Caml mailing list on Tue, 19 Feb 2008 00:54:23 -0800 (PST)

Didier Re'my: Programming Objects with ML-ART: An extension to ML with Abstract and Record Types.
International Symposium on Theoretical Aspects of Computer Software, 1994, LNCS 789, pp. 321-346.
The paper is the best exposition of the motivation and the design of objects in OCaml. The paper discusses existentials and type abstraction in great detail.

Moscow ML: a light-weight implementation of Standard ML (SML)
< >
MoscowML has first-class modules, letting us `package' a module hiding the implementation details. The package is a first-class value and can be placed into lists and passed around.

Last updated October 5, 2008

This site's top page is
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML