From oleg at okmij.org Fri Aug 27 16:51:43 2004
To: haskell@haskell.org
Subject: Applicative translucent functors in Haskell
Message-ID: <20040827235143.2BDF7AB7E@Adric.metnet.navy.mil>
Date: Fri, 27 Aug 2004 16:51:43 -0700 (PDT)
X-comment: (July 2010) Added LANGUAGE pragma, corrected URLs
Status: RO


ML is known for its sophisticated, higher-order module system, which
is formalized in Dreyer-Crary-Harper language. A few months ago
Chung-chieh Shan showed a complete translation of that language into
System F-omega:
	http://www.cs.rutgers.edu/~ccshan/xlate/xlate.pdf

Chung-chieh Shan has concluded that languages based on F-omega, such
as Haskell with common extensions, already support higher-order
modular programming. In fact, his paper showed a sample Haskell
translation of the most complex and interesting ML module expression:
a translucent applicative functor. Different instantiations of the
functor with respect to type-compatible arguments are type-compatible;
and yet the functor hides the representation details behind the
unbreakable abstraction barrier.

Dreyer-Crary-Harper language and System F-omega are deliberately
syntactically starved, to simplify formal analyses. Therefore, using
the results of Chung-chieh Shan's paper _literally_ may be slightly
cumbersome. This message is an attempt to interpret some of
Chung-chieh Shan's results in idiomatic Haskell with the full use of
type classes. We will also show that type sharing constraints can be
expressed in a scalable manner, so that the whole translation is
practically usable. Thus we can enjoy the sophisticated, first-class
higher-order module system in today's Haskell. No new extensions are
required; furthermore, even undecidable instances (let alone
overlapping instances) are not used. This message has been inspired by
Chung-chieh Shan's paper and has greatly benefited from several
conversations with him.


Throughout this message we will be using an example of polymorphic
sets parameterized by an element-comparison function. Our example is
an extended version of the example in Chung-chieh Shan's paper. We will also be
using OCaml syntax for module expressions, which, IMHO, makes a little
bit more sense.  We abuse the terminology and use the word 'module' to
also mean what ML calls 'structure'.

This message is both Haskell and OCaml literate code. It can be loaded in
GHCi or Hugs -98 as it is. To get the OCaml code, please filter the text
of the message with "sed -n -e '/^}/ s/^} //p'"

> {-# LANGUAGE Rank2Types, ExistentialQuantification, ScopedTypeVariables #-}
> {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
> {-# LANGUAGE FunctionalDependencies #-}
> module TranslucentAppFunctor where

Our goal in this message is to produce implementations of a SET
interface:

} module type SET = sig
}   type element
}   type set
}   val empty  : set
}   val add    : element -> set -> set
}   val member : element -> set -> bool
} end;;

This is the OCaml declaration of a type of a collection made of two
type declarations and three value declarations. Such collections are
called structures (or, modules, by abuse of the terminology) and
their types are called signatures. The concise description of the
OCaml module language can be found at

	http://caml.inria.fr/pub/docs/manual-ocaml/manual004.html

We should point out that the types 'element' and 'set' are abstract --
the right-hand side of the corresponding type declarations is
empty. In ML, a single colon adds a type annotation whereas a double
colon is a list constructor -- in Haskell, it is just the opposite.
The corresponding SET signature in Haskell is well-known:

> class SET element set | set -> element where
>     empty  :: set
>     add    :: element -> set -> set
>     member :: element -> set -> Bool

We shall build an implementation of SET parameterized by the element
comparison function. The comparison function's interface is to be
described by its own signature, ORD. We shall define two different
instances of ORD and instantiate the SET functor with those two
instances. To make the game more interesting, our implementation of
the ORD interface will itself be parameterized by the ENUM interface,
which maps elements in a totally ordered domain into integers. So, the
comparison function will use that mapping to derive the element
comparison. Thus our game plan is: 
  - introduce the ENUM interface, 
  - define two implementations of the ENUM interface,
  - introduce two different ENUM->ORD transparent functors,
  - instantiate the functors yielding four implementations of the
    ORD interface,
  - introduce a translucent ORD->SET applicative functor,
  - instantiate it obtaining different implementations of SET
  - run a few tests, to illustrate applicativity of the functor and
    the abstraction barrier

We start with the ENUM interface:

} module type ENUM = sig
}   type e
}   val fromEnum: e -> int
} end;;

and its two implementations. One of them is

} module IntEnum : (ENUM with type e = int) = struct
}   type e = int
}   let fromEnum x = x
} end;;

We wrote a module expression -- a collection of one type definition
and one (functional) value definition, and told the compiler its
explicit type, ENUM. The stuff after 'with' is a type sharing
constraint: the type of IntEnum is ENUM such that the type ENUM.e is
int. The explicit type annotation can be dropped:

} module CharEnum = struct
}   type e = char
}   let fromEnum x = Char.code x
} end;;


The corresponding code in Haskell is
  <empty>

Indeed, the standard Enum type class in Haskell serves our
purpose. The class declaration plays the role of the signature
declaration, with instances providing the implementation. It's all
part of the Prelude, so we have nothing to write here.

One may ask -- what if I want several instances that correspond to the
same type? How to do that in Haskell? Please read on.

We use the ENUM module to build element comparison functions -- or
modules of the signature:

} module type ORD = sig
}   type el
}   val compr: el -> el -> bool
} end;;

The interface includes the type 'el' and the value of the comparison
function. Our comparison function is actually an equality predicate --
but this is enough for our purposes here. This message isn't about the
efficient implementation of sets. We promised to build two
implementations of ORD, both of which parameterized by ENUM. The
first implementation is:

} module Ord_LR(Elt: ENUM) : (ORD with type el = Elt.e) = struct
}   type el = Elt.e
}   let compr x y = (Elt.fromEnum x) = (Elt.fromEnum y)
} end;;

The type sharing constraint became more interesting: the result of the
Ord_LR functor (a mapping from modules to modules) is the module of
the signature ORD whose type 'el' is not explicitly specified and
remains abstract. Yet the type is constrained to be the same as the
element type of the argument ENUM module -- whatever that happens to
be. The other functor is:

} module Ord_LE(Elt: ENUM) = struct
}   type el = Elt.e
}   let compr x y = ((Elt.fromEnum x) mod 2) = ((Elt.fromEnum y) mod 2)
} end;;

Now, our ML code gives two different implementations of ORD, both of
which may have the same element type. How can we do that in Haskell?
Very simple: by introducing a discriminator label type.

> class ORD label elem where
>     compr:: label -> elem -> elem -> Bool

We translate Ord_LR and Ord_LE into the following Haskell code:

> -- Labels
> data LR = LR
> data LE = LE

> instance Enum a => ORD LR a where
>     compr _ a b = fromEnum a == fromEnum b
> instance Enum a => ORD LE a where
>     compr _ a b = even (fromEnum a) == even (fromEnum b)

As before, a class declaration corresponds to an ML signature, and
instances correspond to implementations of the signature. The two
Haskell instances above are actually parameterized implementations --
that is, functors -- parameterized by Enum. The type sharing
constraint ENUM.e = ORD.el is expressed by using the same type
variable 'a' in the instance declarations. We thus observe that an
abstract type of ML corresponds to a type variable (here:
uninstantiated type variable) and the type sharing is expressed 
by sharing of the type variable names.


Our goal, the ORD->SET functor, is as follows:

} module SETF(Elt: ORD) : (SET with type element = Elt.el) = struct
}   type element = Elt.el
}   type set = element list
}   let empty = []
}   let member element set = List.fold_left
}       (fun seed e -> Elt.compr element e || seed) false set
}   let add element set = if member element set then set else (element::set)
} end;;

Recall that the signature SET had two abstract types: 'element' and
'set'. The type sharing constraint makes the former to be the same as
ORD.el. The type SET.set remains abstract. Here's the complete type
that OCaml infers for SETF

     module SETF :
       functor (Elt : ORD) ->
	 sig
	   type element = Elt.el
	   and set
	   val empty : set
	   val add : element -> set -> set
	   val member : element -> set -> bool
	 end

Although our implementation of the functor used "element list" for the
type set, in the result of the functor, 'set' remains abstract. The
precise implementation of 'set' is hidden behind the abstraction
barrier. The functor SETF is therefore translucent -- the user can see
SET.element (if he can see ORD.el) -- but SET.set is hidden. The
functor SETF is also applicative -- if Ord1 == Ord2, then SETF(Ord1)
== SETF(Ord2). Applications of an applicative functor to type
compatible arguments yield type-compatible results. Let us illustrate
that point first, before considering the Haskell implementation.

} module Set1 = SETF(Ord_LR(IntEnum));;
} module Set2 = SETF(Ord_LR(IntEnum));;

We create two Sets, by instantiating SETF over ORD, which, in turn,
are instantiated over IntEnum. We have nested functor
applications. Both functors are applicative, so that Set1 and Set2 are
type compatible. That is, we can freely use the methods of Set1 on
Set2, and vice versa:

} let s1 = Set1.add 1 Set1.empty;;
} let s2 = Set2.add 2 s1;;
} let r1 = Set1.member 1 s2;;

Set 's1' was created by the module Set1 -- and yet we use Set2.add to
add more elements to it. We then have Set1.member check for
membership in the resulting set.

Of course if we apply the functor to type-incompatible modules, the
results aren't type-compatible either. Given

} module Set3 = SETF(Ord_LE(IntEnum));;
} let s3 = Set3.add 3 Set3.empty;;

An attempt to evaluate
       let r2 = Set1.member 3 s3;;
			      ^^
     This expression has type Set3.set = SETF(Ord_LE(IntEnum)).set
     but is here used with type Set1.set = SETF(Ord_LR(IntEnum)).set

leads to a type error. And so does this:

} module Set4 = SETF(Ord_LE(CharEnum));;
} let s4 = Set4.add 'a' (Set4.add 'b' Set4.empty);;
} let r4 = Set4.member 'c' s4;;

       let rw = Set1.add 1 Set4.empty;;
			   ^^^^^^^^^^
     This expression has type Set4.set = SETF(Ord_LE(CharEnum)).set
     but is here used with type Set1.set = SETF(Ord_LR(IntEnum)).set

The error message itself is remarkable. It says that the type 
SETF(Ord_LR(IntEnum)).set is different from SETF(Ord_LE(IntEnum)).set
and from the type SETF(Ord_LE(CharEnum)).set. And yet
SETF(Ord_LR(IntEnum)).set produced in two different applications of
SETF(Ord_LR(IntEnum)) is the same. However, 'set' itself remains
abstract. Indeed, if we attempt to break the barrier and access 'set'
as if it were a list (which it is, operationally), we get a type
error:

       List.length Set1.empty;;
		   ^^^^^^^^^^
     This expression has type Set1.set = SETF(Ord_LR(IntEnum)).set
     but is here used with type 'a list

How can we implement such an applicative and translucent functor in
Haskell? As Chung-chieh Shan pointed out, we need higher-ranked types, and
skolemization.

First, we introduce an auxiliary class, over a higher-ranked type
parameter:

> class SETE r where
>     lempty ::  r l elem
>     ladd   :: (ORD l elem) => elem -> r l elem -> r l elem
>     lmember:: (ORD l elem) => elem -> r l elem -> Bool

The class describes the signature of a transparent functor: notice the
parameterization by ORD. Now we have to claim that the result of
applying SETE to the ORD `module' is a SET. In Haskell terms, we have
to make the result of `applying' SETE an instance of the class SET:

> newtype SetESet a = SetESet a
> instance (SETE setlabel, ORD ordlabel element) => 
>     SET element (SetESet (setlabel ordlabel element)) where
>   empty = SetESet $ lempty
>   add e (SetESet s) = SetESet $ ladd e s 
>   member e (SetESet s)  = lmember e s

This is clearly the boilerplate. One may hope that it could be
automated, eventually. It is worth pointing out the sharing
constraint: the fact that 'element' type of a set is the same as the
element type of ORD is expressed here by using the same type variable
'element' in both places. So, the type sharing constraint of ML is
expressed by sharing a name in Haskell.

Now, we need to make the 'set' type hidden while at the same type
making sure we do not hide the element type:

> data SETFE = forall f. SETE f => SETFE (forall a b. (SetESet (f a b)))

As before, an abstract type of ML corresponds to a type variable in
Haskell -- only here it is an explicitly quantified type variable.  The
existential quantification provides the unbreakable abstraction
barrier. As Chung-chieh Shan repeatedly emphasized in his paper, the insight
here is skolemization, needed to universally quantify _under_ the
existential quantification. And for that, we need a higher-ranked type.

We haven't yet provided the implementation for the functor SETE:

> newtype SLE l e = SLE [e]
>
> instance SETE SLE where
>     lempty = SLE []
>     ladd e s@(SLE set) = if lmember e s then s else SLE$ e:set
>     lmember e (SLE set::SLE l e) = 
>              not (null (filter (compr (undefined::l) e) set))

And here is the applicative functor itself:

> setfe = SETFE (SetESet(lempty::SLE l e))

We must emphasize that it is an ordinary Haskell value. Thus we gained
not only a higher-order module system, but also a first-class one.

We only need to introduce a way to appropriately instantiate the functor

> inst:: w (f a b) -> a -> b -> w (f a b)
> inst f a b = f

and we are ready for examples:

> testf = case setfe of
>   (SETFE fs) -> let set1_empty = inst fs LR (undefined::Int)
> 	              s1  = add 1 (add 0 set1_empty)
> 		      rm1  = member 2 s1

Let us make another instantiation of the functor, to the same
arguments: LR and Int:

> 		      set2_empty = inst fs LR (undefined::Int)
>                     set2_add e s = add e (s `asTypeOf` set2_empty)
>                     set1_member e s = member e (s `asTypeOf` set1_empty)
>                     s2 = set2_add 3 s1
>                     r1 = set1_member 1 s2

we observe applicativity: two instantiations of the functor to type
compatible arguments are type compatible: set1_empty and set2_empty
are of the same type. Of course, if we instantiate the functor to type
incompatible arguments, the results are not type compatible:

> 		      set3_empty = inst fs LE (undefined::Int)
> 		      s3  = add 1 (add 0 set3_empty)
> 		      r3  = member 2 s3
>                     -- rw1 = set1_member 3 s3

If we uncomment the last line, we get a type error. The error message
is especially revealing in Hugs:

     ERROR - Type error in application
     *** Expression     : set1_member 3 s3
     *** Term           : s3
     *** Type           : SetESet (_3 LE Int)
     *** Does not match : SetESet (_3 LR Int)

It is worth comparing with the corresponding OCaml message cited
earlier. Note that the type indicate the particular implementation of
the set is abstract: printed as _3 here. We can't know which
implementation of set we're dealing with.

> 		      set4_empty = inst fs LE 'x'
> 		      s4  = add  'a' (add 'b' set4_empty)
> 		      r4  = member 'c' s4
> 		      -- rw2  = set1_member 'a' s4

Again, if we try to break the abstraction barrier, as in the following
statement, we just get a type error.

>                     -- rw3 = null x where (SetESet x) = set1_empty
> 
>                 in (rm1,r1,r3,r4)


The example illustrates that Haskell already has a higher-order module
language integrated with the core language and with the module
checking being a part of the regular typechecking.


The upshot of the translation:

  ML signatures correspond to Haskell type classes, and their
  implementations to the instances

  Abstract types in ML correspond to either uninstantiated or
  explicitly quantified type variables in Haskell

  Type sharing is expressed via type variable name sharing

  Functor (signatures or structures) correspond to Haskell (class
  declarations or instances) with type class constraints

  The argument of functors is expressed via types, with additional labels
  when needed for finer differentiation

  Functor applications are done by instance selection based on types
  at hand plus the additional labels

  OCaml signature attribution operation -- casting the module or
  the result of the functor into a desired signature and hiding
  the extras -- sometimes involves additional tagging/untagging tricks
  (cf. SetESet). This tagging, done via newtype, is syntactic only and
  has no run-time  effect.

  Hiding of information (`sealing', in ML-speak) is done by
  existential quantification. To gain applicativity, we quantify over
  a higher-ranked type variable (Skolem function proper).