From oleg at okmij.org Fri Aug 27 16:51:43 2004 To: haskell@haskell.org Subject: Applicative translucent functors in Haskell Message-ID: <20040827235143.2BDF7AB7E@Adric.metnet.navy.mil> Date: Fri, 27 Aug 2004 16:51:43 -0700 (PDT) X-comment: (July 2010) Added LANGUAGE pragma, corrected URLs Status: RO ML is known for its sophisticated, higher-order module system, which is formalized in Dreyer-Crary-Harper language. A few months ago Chung-chieh Shan showed a complete translation of that language into System F-omega: http://www.cs.rutgers.edu/~ccshan/xlate/xlate.pdf Chung-chieh Shan has concluded that languages based on F-omega, such as Haskell with common extensions, already support higher-order modular programming. In fact, his paper showed a sample Haskell translation of the most complex and interesting ML module expression: a translucent applicative functor. Different instantiations of the functor with respect to type-compatible arguments are type-compatible; and yet the functor hides the representation details behind the unbreakable abstraction barrier. Dreyer-Crary-Harper language and System F-omega are deliberately syntactically starved, to simplify formal analyses. Therefore, using the results of Chung-chieh Shan's paper _literally_ may be slightly cumbersome. This message is an attempt to interpret some of Chung-chieh Shan's results in idiomatic Haskell with the full use of type classes. We will also show that type sharing constraints can be expressed in a scalable manner, so that the whole translation is practically usable. Thus we can enjoy the sophisticated, first-class higher-order module system in today's Haskell. No new extensions are required; furthermore, even undecidable instances (let alone overlapping instances) are not used. This message has been inspired by Chung-chieh Shan's paper and has greatly benefited from several conversations with him. Throughout this message we will be using an example of polymorphic sets parameterized by an element-comparison function. Our example is an extended version of the example in Chung-chieh Shan's paper. We will also be using OCaml syntax for module expressions, which, IMHO, makes a little bit more sense. We abuse the terminology and use the word 'module' to also mean what ML calls 'structure'. This message is both Haskell and OCaml literate code. It can be loaded in GHCi or Hugs -98 as it is. To get the OCaml code, please filter the text of the message with "sed -n -e '/^}/ s/^} //p'" > {-# LANGUAGE Rank2Types, ExistentialQuantification, ScopedTypeVariables #-} > {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-} > {-# LANGUAGE FunctionalDependencies #-} > module TranslucentAppFunctor where Our goal in this message is to produce implementations of a SET interface: } module type SET = sig } type element } type set } val empty : set } val add : element -> set -> set } val member : element -> set -> bool } end;; This is the OCaml declaration of a type of a collection made of two type declarations and three value declarations. Such collections are called structures (or, modules, by abuse of the terminology) and their types are called signatures. The concise description of the OCaml module language can be found at http://caml.inria.fr/pub/docs/manual-ocaml/manual004.html We should point out that the types 'element' and 'set' are abstract -- the right-hand side of the corresponding type declarations is empty. In ML, a single colon adds a type annotation whereas a double colon is a list constructor -- in Haskell, it is just the opposite. The corresponding SET signature in Haskell is well-known: > class SET element set | set -> element where > empty :: set > add :: element -> set -> set > member :: element -> set -> Bool We shall build an implementation of SET parameterized by the element comparison function. The comparison function's interface is to be described by its own signature, ORD. We shall define two different instances of ORD and instantiate the SET functor with those two instances. To make the game more interesting, our implementation of the ORD interface will itself be parameterized by the ENUM interface, which maps elements in a totally ordered domain into integers. So, the comparison function will use that mapping to derive the element comparison. Thus our game plan is: - introduce the ENUM interface, - define two implementations of the ENUM interface, - introduce two different ENUM->ORD transparent functors, - instantiate the functors yielding four implementations of the ORD interface, - introduce a translucent ORD->SET applicative functor, - instantiate it obtaining different implementations of SET - run a few tests, to illustrate applicativity of the functor and the abstraction barrier We start with the ENUM interface: } module type ENUM = sig } type e } val fromEnum: e -> int } end;; and its two implementations. One of them is } module IntEnum : (ENUM with type e = int) = struct } type e = int } let fromEnum x = x } end;; We wrote a module expression -- a collection of one type definition and one (functional) value definition, and told the compiler its explicit type, ENUM. The stuff after 'with' is a type sharing constraint: the type of IntEnum is ENUM such that the type ENUM.e is int. The explicit type annotation can be dropped: } module CharEnum = struct } type e = char } let fromEnum x = Char.code x } end;; The corresponding code in Haskell is Indeed, the standard Enum type class in Haskell serves our purpose. The class declaration plays the role of the signature declaration, with instances providing the implementation. It's all part of the Prelude, so we have nothing to write here. One may ask -- what if I want several instances that correspond to the same type? How to do that in Haskell? Please read on. We use the ENUM module to build element comparison functions -- or modules of the signature: } module type ORD = sig } type el } val compr: el -> el -> bool } end;; The interface includes the type 'el' and the value of the comparison function. Our comparison function is actually an equality predicate -- but this is enough for our purposes here. This message isn't about the efficient implementation of sets. We promised to build two implementations of ORD, both of which parameterized by ENUM. The first implementation is: } module Ord_LR(Elt: ENUM) : (ORD with type el = Elt.e) = struct } type el = Elt.e } let compr x y = (Elt.fromEnum x) = (Elt.fromEnum y) } end;; The type sharing constraint became more interesting: the result of the Ord_LR functor (a mapping from modules to modules) is the module of the signature ORD whose type 'el' is not explicitly specified and remains abstract. Yet the type is constrained to be the same as the element type of the argument ENUM module -- whatever that happens to be. The other functor is: } module Ord_LE(Elt: ENUM) = struct } type el = Elt.e } let compr x y = ((Elt.fromEnum x) mod 2) = ((Elt.fromEnum y) mod 2) } end;; Now, our ML code gives two different implementations of ORD, both of which may have the same element type. How can we do that in Haskell? Very simple: by introducing a discriminator label type. > class ORD label elem where > compr:: label -> elem -> elem -> Bool We translate Ord_LR and Ord_LE into the following Haskell code: > -- Labels > data LR = LR > data LE = LE > instance Enum a => ORD LR a where > compr _ a b = fromEnum a == fromEnum b > instance Enum a => ORD LE a where > compr _ a b = even (fromEnum a) == even (fromEnum b) As before, a class declaration corresponds to an ML signature, and instances correspond to implementations of the signature. The two Haskell instances above are actually parameterized implementations -- that is, functors -- parameterized by Enum. The type sharing constraint ENUM.e = ORD.el is expressed by using the same type variable 'a' in the instance declarations. We thus observe that an abstract type of ML corresponds to a type variable (here: uninstantiated type variable) and the type sharing is expressed by sharing of the type variable names. Our goal, the ORD->SET functor, is as follows: } module SETF(Elt: ORD) : (SET with type element = Elt.el) = struct } type element = Elt.el } type set = element list } let empty = [] } let member element set = List.fold_left } (fun seed e -> Elt.compr element e || seed) false set } let add element set = if member element set then set else (element::set) } end;; Recall that the signature SET had two abstract types: 'element' and 'set'. The type sharing constraint makes the former to be the same as ORD.el. The type SET.set remains abstract. Here's the complete type that OCaml infers for SETF module SETF : functor (Elt : ORD) -> sig type element = Elt.el and set val empty : set val add : element -> set -> set val member : element -> set -> bool end Although our implementation of the functor used "element list" for the type set, in the result of the functor, 'set' remains abstract. The precise implementation of 'set' is hidden behind the abstraction barrier. The functor SETF is therefore translucent -- the user can see SET.element (if he can see ORD.el) -- but SET.set is hidden. The functor SETF is also applicative -- if Ord1 == Ord2, then SETF(Ord1) == SETF(Ord2). Applications of an applicative functor to type compatible arguments yield type-compatible results. Let us illustrate that point first, before considering the Haskell implementation. } module Set1 = SETF(Ord_LR(IntEnum));; } module Set2 = SETF(Ord_LR(IntEnum));; We create two Sets, by instantiating SETF over ORD, which, in turn, are instantiated over IntEnum. We have nested functor applications. Both functors are applicative, so that Set1 and Set2 are type compatible. That is, we can freely use the methods of Set1 on Set2, and vice versa: } let s1 = Set1.add 1 Set1.empty;; } let s2 = Set2.add 2 s1;; } let r1 = Set1.member 1 s2;; Set 's1' was created by the module Set1 -- and yet we use Set2.add to add more elements to it. We then have Set1.member check for membership in the resulting set. Of course if we apply the functor to type-incompatible modules, the results aren't type-compatible either. Given } module Set3 = SETF(Ord_LE(IntEnum));; } let s3 = Set3.add 3 Set3.empty;; An attempt to evaluate let r2 = Set1.member 3 s3;; ^^ This expression has type Set3.set = SETF(Ord_LE(IntEnum)).set but is here used with type Set1.set = SETF(Ord_LR(IntEnum)).set leads to a type error. And so does this: } module Set4 = SETF(Ord_LE(CharEnum));; } let s4 = Set4.add 'a' (Set4.add 'b' Set4.empty);; } let r4 = Set4.member 'c' s4;; let rw = Set1.add 1 Set4.empty;; ^^^^^^^^^^ This expression has type Set4.set = SETF(Ord_LE(CharEnum)).set but is here used with type Set1.set = SETF(Ord_LR(IntEnum)).set The error message itself is remarkable. It says that the type SETF(Ord_LR(IntEnum)).set is different from SETF(Ord_LE(IntEnum)).set and from the type SETF(Ord_LE(CharEnum)).set. And yet SETF(Ord_LR(IntEnum)).set produced in two different applications of SETF(Ord_LR(IntEnum)) is the same. However, 'set' itself remains abstract. Indeed, if we attempt to break the barrier and access 'set' as if it were a list (which it is, operationally), we get a type error: List.length Set1.empty;; ^^^^^^^^^^ This expression has type Set1.set = SETF(Ord_LR(IntEnum)).set but is here used with type 'a list How can we implement such an applicative and translucent functor in Haskell? As Chung-chieh Shan pointed out, we need higher-ranked types, and skolemization. First, we introduce an auxiliary class, over a higher-ranked type parameter: > class SETE r where > lempty :: r l elem > ladd :: (ORD l elem) => elem -> r l elem -> r l elem > lmember:: (ORD l elem) => elem -> r l elem -> Bool The class describes the signature of a transparent functor: notice the parameterization by ORD. Now we have to claim that the result of applying SETE to the ORD `module' is a SET. In Haskell terms, we have to make the result of `applying' SETE an instance of the class SET: > newtype SetESet a = SetESet a > instance (SETE setlabel, ORD ordlabel element) => > SET element (SetESet (setlabel ordlabel element)) where > empty = SetESet $ lempty > add e (SetESet s) = SetESet $ ladd e s > member e (SetESet s) = lmember e s This is clearly the boilerplate. One may hope that it could be automated, eventually. It is worth pointing out the sharing constraint: the fact that 'element' type of a set is the same as the element type of ORD is expressed here by using the same type variable 'element' in both places. So, the type sharing constraint of ML is expressed by sharing a name in Haskell. Now, we need to make the 'set' type hidden while at the same type making sure we do not hide the element type: > data SETFE = forall f. SETE f => SETFE (forall a b. (SetESet (f a b))) As before, an abstract type of ML corresponds to a type variable in Haskell -- only here it is an explicitly quantified type variable. The existential quantification provides the unbreakable abstraction barrier. As Chung-chieh Shan repeatedly emphasized in his paper, the insight here is skolemization, needed to universally quantify _under_ the existential quantification. And for that, we need a higher-ranked type. We haven't yet provided the implementation for the functor SETE: > newtype SLE l e = SLE [e] > > instance SETE SLE where > lempty = SLE [] > ladd e s@(SLE set) = if lmember e s then s else SLE$ e:set > lmember e (SLE set::SLE l e) = > not (null (filter (compr (undefined::l) e) set)) And here is the applicative functor itself: > setfe = SETFE (SetESet(lempty::SLE l e)) We must emphasize that it is an ordinary Haskell value. Thus we gained not only a higher-order module system, but also a first-class one. We only need to introduce a way to appropriately instantiate the functor > inst:: w (f a b) -> a -> b -> w (f a b) > inst f a b = f and we are ready for examples: > testf = case setfe of > (SETFE fs) -> let set1_empty = inst fs LR (undefined::Int) > s1 = add 1 (add 0 set1_empty) > rm1 = member 2 s1 Let us make another instantiation of the functor, to the same arguments: LR and Int: > set2_empty = inst fs LR (undefined::Int) > set2_add e s = add e (s `asTypeOf` set2_empty) > set1_member e s = member e (s `asTypeOf` set1_empty) > s2 = set2_add 3 s1 > r1 = set1_member 1 s2 we observe applicativity: two instantiations of the functor to type compatible arguments are type compatible: set1_empty and set2_empty are of the same type. Of course, if we instantiate the functor to type incompatible arguments, the results are not type compatible: > set3_empty = inst fs LE (undefined::Int) > s3 = add 1 (add 0 set3_empty) > r3 = member 2 s3 > -- rw1 = set1_member 3 s3 If we uncomment the last line, we get a type error. The error message is especially revealing in Hugs: ERROR - Type error in application *** Expression : set1_member 3 s3 *** Term : s3 *** Type : SetESet (_3 LE Int) *** Does not match : SetESet (_3 LR Int) It is worth comparing with the corresponding OCaml message cited earlier. Note that the type indicate the particular implementation of the set is abstract: printed as _3 here. We can't know which implementation of set we're dealing with. > set4_empty = inst fs LE 'x' > s4 = add 'a' (add 'b' set4_empty) > r4 = member 'c' s4 > -- rw2 = set1_member 'a' s4 Again, if we try to break the abstraction barrier, as in the following statement, we just get a type error. > -- rw3 = null x where (SetESet x) = set1_empty > > in (rm1,r1,r3,r4) The example illustrates that Haskell already has a higher-order module language integrated with the core language and with the module checking being a part of the regular typechecking. The upshot of the translation: ML signatures correspond to Haskell type classes, and their implementations to the instances Abstract types in ML correspond to either uninstantiated or explicitly quantified type variables in Haskell Type sharing is expressed via type variable name sharing Functor (signatures or structures) correspond to Haskell (class declarations or instances) with type class constraints The argument of functors is expressed via types, with additional labels when needed for finer differentiation Functor applications are done by instance selection based on types at hand plus the additional labels OCaml signature attribution operation -- casting the module or the result of the functor into a desired signature and hiding the extras -- sometimes involves additional tagging/untagging tricks (cf. SetESet). This tagging, done via newtype, is syntactic only and has no run-time effect. Hiding of information (`sealing', in ML-speak) is done by existential quantification. To gain applicativity, we quantify over a higher-ranked type variable (Skolem function proper).